Thursday, February 10, 2005

Concepts III: Exemplars

Research on prototype theories was designed to show that they could deal with the issues listed in the last post, including typicality effects, fuzzy concepts, relationships between features, and relationships between concepts. While prototype theories handle those issues well, they ran into another set of problems, which any adequate theory of concepts now has to explain as well. They include:
  • Intra-category variance: Prototypes are just averages, but the variance within a category appears to matter too.
  • Interrelations between specific features: Prototypes capture category-wide feature correlations, but often miss local ones (e.g., the correlation between wooden and large in the category SPOONS).
  • Conceptual combinations: Why is a goldfish a typical pet fish, while it's neither a typical pet nor a typical fish?
  • Ad hoc categories: How do people produce and represent categories on the fly in particular contexts (e.g., things to take on a picnic)?
  • Linearly separable categories: For some natural kind and artifact concepts, all members have the same (or similar) values on a particular dimension.
For some, the way around these problems with prototype theories was to develop two-process theories that incorporated both rules and prototypes. This provides an easy solution to the problem of linear separability, and over the last 10 or so years, prototype models of this sort have done well when modeling experiments that use linearly separable categories, but the other problems are more difficult to handle. In the late 70s, some researchers argued that the way to deal with these problems was to treat category representations as a collection (or cluster in high-dimensional space) of all of the members of that category that we have encountered. These theories have generally been called exemplar theories.

As with prototypes, the fundamentals of exemplar theories are pretty straightforward. We encounter an instance, and to categorize it, we compare it to all (or some subset) of the stored exemplars for categories that meet some initial similarity requirement. The comparison is generally considered to be between features, which are usually represented in a multidimensional space defined by various "psychological" dimensions (on which the values of particular features vary). Some features are more salient, or relevant, than others, and are thus given more attention and weight during the comparison. Thus, we can use an equation like the following1 to determine the similarity of an instance to a stored exemplar:
$$\mathrm{dist}(s, m) = \sum_{i} a_i \left| y^{\mathrm{stim}}_{i} - y^{\mathrm{ex}}_{mi} \right|$$
Here, the distance in the space between an instance, s, and an exemplar in memory, m, is the sum, over all of the dimensions (indexed by i), of the absolute differences between the stimulus's feature value and the exemplar's feature value on each dimension. Each term in the sum is weighted by a_i, which represents the salience of the corresponding feature. The distance is converted into similarity by feeding it into a function in which similarity decreases exponentially as distance increases.
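To make the calculation concrete, here is a minimal Python sketch of that computation. The attention weights, the exponential sensitivity parameter c, and the example feature values are my own illustrative assumptions, not values from the equation's source.

```python
from math import exp

def distance(stimulus, exemplar, weights):
    """Weighted city-block distance between an instance and a stored exemplar:
    dist(s, m) = sum_i a_i * |y_i_stim - y_mi_ex|."""
    return sum(a * abs(y_s - y_m) for a, y_s, y_m in zip(weights, stimulus, exemplar))

def similarity(stimulus, exemplar, weights, c=1.0):
    """Similarity falls off exponentially as distance grows."""
    return exp(-c * distance(stimulus, exemplar, weights))

# Example: four binary dimensions, with the second dimension treated as more salient.
print(similarity([1, 0, 1, 0], [1, 1, 1, 1], weights=[1.0, 2.0, 1.0, 1.0]))
```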

There are a couple of ways to determine which category an instance belongs to using the similarity calculated from the equation above. First, we could set some similarity threshold, and the first category to reach that threshold is the one into which we place the instance. This is how random-walk models of classification work2. Exemplars from different categories are retrieved from memory, roughly in the order of their similarity to the instance, and contribute incrementally to the similarity of their category to the instance. More commonly, the instance is classified as a member of the category that has the highest similarity relative to the total similarity of all of the retrieved categories to the instance.
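The relative-similarity rule is easy to sketch in code. Below is a minimal Python version, assuming the weighted city-block distance and exponential similarity described above; the function names, the default sensitivity parameter c, and the idea of passing categories as a dictionary are illustrative assumptions rather than details of any particular published model.

```python
from math import exp

def total_similarity(stimulus, exemplars, weights, c=1.0):
    """Summed exponential similarity between an instance and every stored
    exemplar of one category, using weighted city-block distance."""
    return sum(
        exp(-c * sum(a * abs(y_s - y_m) for a, y_s, y_m in zip(weights, stimulus, ex)))
        for ex in exemplars
    )

def classify(stimulus, categories, weights, c=1.0):
    """Return each category's probability: its summed similarity divided by
    the total summed similarity across all retrieved categories."""
    sims = {name: total_similarity(stimulus, exs, weights, c)
            for name, exs in categories.items()}
    total = sum(sims.values())
    return {name: s / total for name, s in sims.items()}
```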

The classic experiment used to argue for an exemplar model, the context model3, over a prototype model employed stimuli of the following form:

Category A          Category B
1. 1111             4. 0000
2. 1110             5. 1100
3. 0001             6. 0011

Each series of four numbers represents an exemplar, and each 1 or 0 represents its value on one of four dimensions. If we take the averages on each dimension for the two categories, we get prototypes of 1111 for A and 0000 for B. If participants learn these two categories during a training phase and, during the testing phase, are presented with a new exemplar, 7, with the feature values 0101, then, absent any information about the salience of the dimensions, prototype theory predicts that the probabilities of classifying 7 as a member of A and of B are equal. However, using an exemplar model equation like the one above, and relative similarity (rather than a similarity threshold), the probability of classifying 7 as a member of A would be 61%. The experiments showed that people actually classified stimuli like 7 at rates more consistent with the predictions of the exemplar model than with those of prototype models.
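As a rough check on where a figure like that can come from, the sketch below runs the test stimulus 0101 through the summed-similarity calculation with equal attention weights. The sensitivity value is my assumption (chosen so that each mismatching dimension scales similarity by 0.3); with that choice the relative similarity to Category A comes out near 61%, though other parameter values give other numbers.

```python
from math import exp, log

category_a = [(1, 1, 1, 1), (1, 1, 1, 0), (0, 0, 0, 1)]   # training exemplars 1-3
category_b = [(0, 0, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1)]   # training exemplars 4-6
stimulus_7 = (0, 1, 0, 1)

c = -log(0.3)  # assumed sensitivity: each mismatching dimension multiplies similarity by 0.3

def summed_similarity(stimulus, exemplars):
    # Equal attention weights, so distance is just the number of mismatching dimensions.
    return sum(exp(-c * sum(abs(s - m) for s, m in zip(stimulus, ex)))
               for ex in exemplars)

sim_a = summed_similarity(stimulus_7, category_a)
sim_b = summed_similarity(stimulus_7, category_b)
print(round(sim_a / (sim_a + sim_b), 2))  # -> 0.61

# By contrast, 0101 mismatches each prototype (1111 and 0000) on exactly two
# dimensions, so a pure prototype comparison rates A and B as equally similar.
```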

As you might imagine, exemplar models are incredibly powerful. By storing all, or many, of the exemplars of a category in memory, and comparing new instances to those in memory, we capture all of the important features of concepts, including:
  • Interrelations between specific features: since most wooden spoon exemplars that we've stored in memory share the value "large" on the size dimension, these features are automatically associated with each other in any similarity calculation.
  • Variation within categories: when you've got all the exemplars, you've got all the variation.
  • The way linearly separable categories are categorized: on the whole, if all members of a category share a value on a particular dimension, then new instances with that value will be more similar to members of that category than to members of other categories with different values on that dimension.
  • The role of context in classification: the probability of classifying an exemplar into a particular category depends on both its similarity to the members of that category and its similarity to the members of all of the other retrieved categories.
Exemplar models can also explain conceptual combinations, because instead of simply combining two prototypes, we can combine specific exemplars. In fact, since exemplar models store all of the instances of a concept that we've encountered, it's hard to imagine any feature of permanently-stored concepts that they can't capture.

While that is a blessing when it comes to fitting exemplar models to data, it's not much of one when attempting to use the models to say something positive about concepts and their representations, and this turns out to be the biggest flaw of exemplar models. In many cases, they look more like statistical tools for analyzing classification data than actual theories of concepts, because they can model any possible data set (since they contain all the information!). What have we learned about categories if exemplar models can explain any data set, be it logically possible but empirically impossible, or actually obtained through empirical research? Not a whole hell of a lot. This problem doesn't make exemplar models any worse than prototype models, however. It turns out that if we tweak the parameters of prototype models, we can model pretty much any data set with them that we can with exemplar models. So, we have a problem. We have two types of models that are too powerful to tell us anything about concepts.

Exemplar models also have a rough time with ad hoc concepts. By definition, these categories are produced on-line, rather than stored permanently in memory. If people actually can and do use these categories, as research suggests, then how do we account for them with a theory that requires that we compare new instances to old instances that have been previously stored in memory? Part of the problem here is that exemplar models tell us nothing about the relationships between features, other than that the relationships exist. For many concepts, especially ad hoc concepts, the thematic and causal nature of relationships between features can be important for distinguishing members from non-members. Without accounting for this information, our theory of concepts will be incomplete.

For the most part in the concept literature, the problems that prototype and exemplar models share have been ignored. Instead, there has been an often violent battle between the prototype and exemplar camps, focusing more on what one type of model can do that the other can't (which quickly becomes what the latter type can do better, and on and on). Fortunately, over the last decade or so, some concept researchers have become fed up with both types of theories, and with similarity-based approaches in general. This has led to the construction of several alternative types of theories, some of which actually predate the similarity-based approaches (e.g., rule theories), and some of which are fairly new (e.g., theory theories, decision-bound theories, multiple-systems theories, and causal reasoning theories). In the next couple posts, I'll describe some of these alternatives, and try to wrap all of this up. It won't be easy, because as you've probably noticed already, things are pretty messy. It's hard to say, at this point, exactly what it is that we know about concepts.

1 Kruschke, J.K. (2005). Category Learning. In: K. Lamberts and R.L. Goldstone (eds.), The Handbook of Cognition, pp. 183-201. London: Sage. This equation describes most exemplar models fairly well.
2 Nosofsky, R.M., & Palmeri, T.J. (1997). An exemplar-based random walk model of speeded classification. Psychological Review, 104, 266-300.
3 Medin, D.L. & Schaffer, M.M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.