Friday, November 19, 2004

A Connectionist Model of Metaphor

In my posts on metaphor (I, II, III, and IV), I focused primarily on theories of metaphor, with some empirical work thrown in for good measure. I didn't discuss any of the many models that implement these theories, primarily because there are so many. Today I was reading a paper on a connectionist model, called the Metaphor by Pattern Completion (MPC) model (Thomas & Mareschal, 1999), that implements the attributive categorization model (see posts III and IV), and I thought it might be interesting to give a quick description of it.

The model uses a pretty normal three-level connectionist architecture (see here for a good primer on connectionism), consisting of input and output nodes, along with intervening hidden nodes. The model was trained with exemplars from three separate categories (Apple, Fork, and Ball), and the representations of these three categories are stored in the hidden units. Here is a graphic representation of the architecture:

Figure 1 from Thomas and Mareschal (1999).

Given an exemplar, the model autoassociates the features that make up the representation of the category to which the exemplar belongs. This just means that the model reproduces the features of the category, in the form of semantic vectors, as output. So, given an exemplar with a set of features, the model will determine which category the exemplar belongs to, and output the features of that category. To model metaphors, an exemplar serving as the topic of the metaphor is fed into the node(s) representing the category which serves as the vehicle. For instance, the metaphor "An apple is a ball" would involve feeding an apple exemplar into the node(s) representing the category Ball. The output, in this case, would be a representation of apples (the topic) which has been altered by the vehicle representation, so that the output is more similar to the ball representation than the original exemplar was. Here is the authors' description of why this alteration of the topic representation occurs:
"Pattern completion is a property of connectionist networks that derives from their non-linear processing (Rumelhart & McClelland, 1986). A network trained to respond to a given input set will still respond adequately given noisy versions of the input patterns. For example, if an autoassociator is trained to reproduce the vector <0> and is subsequently given the input <.2 .6 .2 .2>, its output is likely to be much closer to the vector it 'knows', perhaps <.0 .9 .0 .0>. An input is transformed so as to make it more consistent with the knowledge that the network has been previously trained on. The connection weights store the feature correlation information in previously experienced examples. If a partial input is presented to the network, it can use that correlation information to reconstruct the missing features."
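
To make the quoted example a little more concrete, here is a minimal sketch of pattern completion in a toy autoassociator. It is not the MPC model: it has no hidden layer, it is trained on a single pattern, and the learning rule (a simple delta rule for logistic units), the learning rate, and the number of epochs are all my own assumptions. The trained vector [0, 1, 0, 0] is also an assumption, chosen to fit the input and output vectors mentioned in the quote.

```python
import math


def sigmoid(z):
    """Logistic activation: the non-linearity that makes completion possible."""
    return 1.0 / (1.0 + math.exp(-z))


def recall(weights, biases, x):
    """Run one input vector through the single-layer network."""
    return [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def train_autoassociator(patterns, epochs=2000, lr=0.5):
    """Train the network to reproduce each pattern as its own output.

    Weights start at zero, so training is fully deterministic.
    The hyperparameters are illustrative assumptions, not values from the paper.
    """
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    biases = [0.0] * n
    for _ in range(epochs):
        for x in patterns:
            y = recall(weights, biases, x)
            for i in range(n):
                err = x[i] - y[i]  # the target is the input itself
                biases[i] += lr * err
                for j in range(n):
                    weights[i][j] += lr * err * x[j]
    return weights, biases


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


known = [0.0, 1.0, 0.0, 0.0]   # assumed training vector
noisy = [0.2, 0.6, 0.2, 0.2]   # noisy input from the quoted passage

weights, biases = train_autoassociator([known])
completed = recall(weights, biases, noisy)

print("completed output:", [round(v, 2) for v in completed])
print("distance of noisy input to known vector:", round(distance(noisy, known), 3))
print("distance of output to known vector:    ", round(distance(completed, known), 3))
```

The point to look for is simply that the output lands closer to the trained vector than the noisy input did, which is what the authors mean by an input being made "more consistent" with what the network already knows.
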
When the model is actually run, it exhibits properties of metaphors consistent with the attributive categorization theory. First, the representations of the topic and vehicle interact to determine which features of the topic are represented in the output of the metaphor. Second, the output of the metaphorical comparison differs depending on the direction of the comparison. "An apple is a ball" will produce a representation that is different from "A ball is an apple," indicating that the model produces results consistent with the irreversibility of metaphorical statements.

In addition to the results consistent with the predictions of the attributive categorization theory, the model itself produces three new predictions. The first is that the smaller the range of features found in exemplars of a vehicle category, the fewer interactive effects metaphorical comparisons using that category as a vehicle will produce. In other words, when a vehicle with a small range of features is used in metaphors, the features it transfers to the topic will tend to be the same regardless of what the topic is. The second is that the same will be true when the vehicle category is highly familiar. Finally, metaphorical comparisons will involve the transfer of attributes from the vehicle to the topic that are not likely to be reported. This is because some features are transferred even though they make no sense when applied to the topic (e.g., "A ball is an apple" may transfer the feature "edible," even though very few balls are actually edible). As a result, Thomas & Mareschal predict that participants will be slower to respond to questions about features which are transferred but not otherwise reported.
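
The direction effect and the "edible ball" case can be illustrated with a rough sketch along the same lines. Everything specific here is hypothetical: the four features (round, red, edible, bouncy), the exemplars, the use of one small single-layer autoassociator per category, and the training settings are my own simplifications for illustration, not the architecture or data of Thomas & Mareschal's three-layer model.

```python
import math

# Hypothetical feature order: [round, red, edible, bouncy]
BALL_EXEMPLARS = [[1.0, 0.0, 0.0, 1.0], [1.0, 1.0, 0.0, 1.0]]   # plain ball, red ball
APPLE_EXEMPLARS = [[1.0, 1.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0]]  # red apple, green apple


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recall(weights, biases, x):
    """Run one input vector through a single-layer network."""
    return [
        sigmoid(sum(w * xj for w, xj in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def train_autoassociator(patterns, epochs=2000, lr=0.5):
    """Delta-rule training from zero weights (settings are assumptions)."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    biases = [0.0] * n
    for _ in range(epochs):
        for x in patterns:
            y = recall(weights, biases, x)
            for i in range(n):
                err = x[i] - y[i]  # autoassociation: target equals input
                biases[i] += lr * err
                for j in range(n):
                    weights[i][j] += lr * err * x[j]
    return weights, biases


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# One network per category stands in for the vehicle's "knowledge".
ball_net = train_autoassociator(BALL_EXEMPLARS)
apple_net = train_autoassociator(APPLE_EXEMPLARS)

red_apple = [1.0, 1.0, 1.0, 0.0]
plain_ball = [1.0, 0.0, 0.0, 1.0]

# "An apple is a ball": the topic (apple) is fed into the vehicle (Ball) network.
apple_is_ball = recall(*ball_net, red_apple)
# "A ball is an apple": the topic (ball) is fed into the vehicle (Apple) network.
ball_is_apple = recall(*apple_net, plain_ball)

print("An apple is a ball:", [round(v, 2) for v in apple_is_ball])
print("A ball is an apple:", [round(v, 2) for v in ball_is_apple])
print("difference between directions:", round(distance(apple_is_ball, ball_is_apple), 3))
```

In this toy version, the apple fed through the Ball network loses "edible" and gains "bouncy" while keeping its redness (balls vary in color, so the topic's own value survives), and the ball fed through the Apple network picks up "edible" whether or not that makes any sense. The two directions give clearly different outputs, which is the weak sense of irreversibility the model captures.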

So, there you have it, a connectionist model of metaphor implementing the attributive categorization theory. There are several obvious problems and limitations with the model. For one, it's not really doing what the attributive categorization theory says is involved in metaphor. Recall that in the attributive categorization model, the vehicle is a member of a category. The topic is placed into that category, not into the category specifically referred to by the vehicle label. For example, "My job is a jail" doesn't involve categorizing my job as a jail, but as a member of a common category, confining places. However, in the MPC model, the topic is placed into the category referred to by the vehicle label, rather than a common category. I imagine the model could be modified to do this, but as it's described, it isn't doing what the attributive categorization theory says it should be. In addition, the model can only deal with metaphors that involve the transfer of attributes, and cannot handle any metaphors that involve the transfer of structure. This is a big problem, since research has shown that most metaphors involve structural information. The model's version of the irreversibility of metaphorical comparisons is also pretty weak. The model can perform comparisons in either direction, and has no way of determining which direction is better. This hints at a limitation common to most (if not all) connectionist models: it's not clear how the model fits within a larger system which, in this case, would ultimately be needed to explain the full range of metaphorical behaviors we humans exhibit. The model's simulation of metaphor is therefore uncomfortably elliptical. Finally, the first two predictions are hardly novel, and are pretty much self-evident. If the model had not demonstrated these properties, it would certainly have been in trouble, but the fact that it does make these predictions is hardly a case for taking the model seriously. The third prediction is interesting, but it is also a prediction that would be made by most comparison theories, and therefore wouldn't allow us to differentiate between this model and comparison models.


