Mixing Memory: The Effects of Color Names on Color Concepts, or Like Lazarus Raised from the Tomb

Sunday, August 14, 2005

After I finally finished Language in Mind, about which I posted the other day, I went back and looked at some of the literature on linguistic relativity that I had read over the years, but had mostly forgotten. And since linguistic relativity has always been a favorite topic of mine, I thought I'd post a little more about it (it may not be a favorite topic of yours, but hey, this is my blog!). In the early days of cognitive science, the majority of the studies designed to test the Sapir-Whorf hypothesis, or other linguistic relativity hypotheses, looked at the effects of color terms on color concepts. Early on, much of this work produced promising results for supporters of the S-W hypothesis. But in 1972, E.R. Heider published a paper titled "Universals in color naming and memory" that effectively killed the Sapir-Whorf hypothesis¹. Heider compared memory for colors in speakers of two different languages, English and the language of the Dugum Dani. The Dugum Dani is a remote hunter-gatherer tribe in New Guinea that has had little exposure to western culture. They have two basic color terms, compared to 11 in English. Thus, the comparison between the two presented a particularly strong test of the Sapir-Whorf hypothesis: if the speakers of a language with 2 color terms and the speakers of a language with 11 both have the same or highly similar color concepts, then we're justified in dropping the idea of linguistic relativity, at least for color. And that's what Heider found: English speakers and members of the Dugum Dani tribe displayed highly similar color memory. Other researchers found similar evidence in the comparisons of speakers of several other languages with varying numbers of color terms, but for all intents and purposes, linguistic relativity was dead after Heider.

Until the 1990s, that is. Looking at color terms presents a test of a particularly strong version of linguistic relativity. Color is a concrete, physically-defined category with a well-known neural basis. Thus, the explanation for Heider's data, which was widely accepted, was that color concepts are determined not by color names, but by the physiology of color perception. However, in the 90s, researchers began to think color was too strong a test, and that for more abstract domains, language does play a constraining role. Over the last several years, a growing body of evidence has shown that this is in fact the case. For abstract domains like time, number, space, and substance, language can be highly influential. But anyone who's really interested in linguistic relativity always has color in the back of his or her mind. If evidence for universal color categories can kill the Sapir-Whorf hypothesis, then evidence for cultural variance in color categories can bring it back to life.

But there are difficulties in studying cultural variation in color. You can't just study the speakers of languages spoken by people in industrialized nations, because there has been a lot of interaction between the speakers of those languages. You can't even study languages spoken exclusively by people in unindustrialized nations who have had a lot of exposure to western culture, because speakers of those languages tend to adopt color terms from western languages (especially from English). So, you have to find remote tribes that speak languages that have had relatively little outside influence. That takes money and time. Furthermore, there is always the problem of running the same experiment in multiple languages. You never know whether the experiment is exactly the same to speakers of different languages (and if you're an adherent of some version of linguistic relativity, you have to believe that it isn't!). That's a particularly big problem when you're studying members of remote hunter-gatherer tribes. Psychology experiments seem weird to American undergraduates who are taking a course about psychology and its experiments. Imagine how odd they must seem to hunter-gatherers who've never heard of psychology. But Heider's experiments suffered from these problems, too, so if there's reason to doubt any cross-cultural research on color concepts, then there's reason to doubt Heider's. For some, that doubt is all the motivation they need to do more research.

Enter Debi Roberson and her colleagues. Roberson believes that there is evidence in Heider's data that Heider's conclusions may have been a bit hasty. For instance, Dani color memory was much worse than that of English speakers, even though the error patterns were highly similar. Heider has no explanation for this. Perhaps it is an indication that Dani color concepts really are different from those of English speakers. Armed with her doubt of Heider's data, Roberson set out to replicate Heider's results, and further test the effects of color names on color concepts using new methods. For this post, I'll describe two of her methods: color memory, which attempts to replicate Heider's findings, and categorical perception.

The experiments on color memory go like this. First, you have to determine the number of basic color terms in a language. You do this by having people name Munsell color chips, which depict colors across the visible spectrum. You then determine the color names that were used to describe the bulk of the spectrum. In doing so, you get graphs that look like the following for English speakers (from Roberson et al. 2000²):

The numbers at the top and on the side (which are hard to see, I know) are the numbers and labels for the Munsell chips. While you're eliciting color names, you also ask the participants to indicate the best example of each color (the most common answer to this question for each name is represented in the above graph by the dots). Roberson and her colleagues have done this for three groups: English speakers (in the graph above, which gives 10 basic color terms, as compared to the 11 that Heider found), the Berinmo tribe, which is also from New Guinea, and the Himba tribe from Namibia. Both the Berinmo and Himba tribes have had very little exposure to western culture, and their languages lack any color terms borrowed from other languages. Both of them have five color terms. Here are the graphs of their color names, which you can compare to the graph of English color names above³:

The Berinmo and Himba graphs look somewhat similar, but are different enough for comparison. The numbers on these two graphs indicate the number of participants who said that that chip was the best example.

After you've got the naming data, you can do the memory task. The materials for the task include low-saturation color chips (i.e., chips that are near naming boundaries, or otherwise far from the best examples of a color name) from the English color categories. The participants are shown a chip by itself for five seconds, and then the chip is covered. After thirty seconds, the participants are shown a full array of chips (40 total) and asked to identify the color they had just seen. The low-saturation chips are chosen because they will produce high error rates. The key data are the sorts of errors participants make. If they tend to mistakenly choose other chips from the same color category in English, as English participants do, then we can infer that color categories are universal. However, if their errors tend to involve choosing chips that have the same name in Berinmo or Himba, then we can reason that color terms affect color perception and memory, and thus that color categorization is not universal. This would be strong evidence for linguistic relativity in color concepts. And that's what Roberson and her colleagues found. Berinmo participants tended to make errors consistent with their color names, while Himba participants made errors consistent with theirs. Neither made errors consistent with English color names (the correlations between naming and memory for Himba participants were r = .559 for memory and Himba names and r = .036 for memory and English names; the correlations were similar for Berinmo vs. English for Berinmo speakers).
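
To make the logic of the error analysis concrete, here is a minimal Python sketch. It is not the statistic Roberson and her colleagues report (they report correlations between naming and memory); it just counts how often a wrong choice stays inside the same named category under each naming system. The chip ids, the category labels "A" and "B", and the error list are all invented for illustration.

```python
def within_category_error_rate(errors, naming):
    """Fraction of memory errors whose chosen chip shares a name with the target.

    errors: list of (target_chip, chosen_chip) pairs from the memory task.
    naming: dict mapping chip id -> color name in one language.
    """
    # Keep only genuine errors (the participant picked a different chip).
    wrong = [(target, chosen) for target, chosen in errors if target != chosen]
    if not wrong:
        return 0.0
    same_name = sum(1 for target, chosen in wrong if naming[target] == naming[chosen])
    return same_name / len(wrong)


# Hypothetical chips along one strip of the array (ids are invented).
english_names = {"c1": "green", "c2": "green", "c3": "blue", "c4": "blue"}
# Hypothetical second language that draws its boundary in a different place.
other_names = {"c1": "A", "c2": "B", "c3": "B", "c4": "B"}

# Hypothetical (target, chosen) errors made by speakers of the second language.
errors = [("c2", "c3"), ("c3", "c2"), ("c2", "c4"), ("c1", "c2")]

english_rate = within_category_error_rate(errors, english_names)
other_rate = within_category_error_rate(errors, other_names)
print(english_rate, other_rate)
```

In this toy data the errors respect the second language's boundary far more often than the English one, which is the pattern the relativity account predicts; the universalist account predicts the reverse.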

To provide further evidence that color names affect color categories, Roberson et al. (2000) and Roberson et al. (2005) conducted an experiment on categorical perception with the Berinmo and Himba. In categorical perception, within-category exemplars tend to be treated as more similar than between-category exemplars, even when the between-category exemplars are more similar physically. This is particularly interesting in color perception: a color exemplar classified as red will be judged more similar to other exemplars of red, particularly the best examples of red, than it will be to exemplars of neighboring colors, even if the exemplar falls on the physical boundary between the two colors and is thus closer, physically, to exemplars from the neighboring colors than it is to the best example of red. If Berinmo and Himba speakers demonstrate categorical perception effects consistent with their labels, but not with other labels (particularly English), then we can conclude that their color naming affects their color concepts.

To test this with Berinmo speakers, they tested participants on a category distinction present in English, but not Berinmo (green-blue) and one present in Berinmo, but not English ("nor" and "wor," as in the graph above). They presented participants with three Munsell color chips, and asked them which two were the most similar to each other. Two of the chips were highly physically similar, i.e., they were close to each other in the physical color space. One of those two chips also shared the same label as the third chip. Thus, you might have two "wor" chips and one "nor" chip, with the "nor" chip being physically more similar to one of the "wor" chips than that "wor" chip is to the other "wor" chip. If participants consistently answer that the two "wor" chips are more similar than the physically similar "nor" and "wor" chips, then they will have exhibited a categorical perception effect. Furthermore, if they do not exhibit a categorical perception effect for the green-blue distinction (i.e., they pick the more physically similar chips when the choices are two classified as green and one as blue), then we can conclude that it is the naming, rather than any universal physiological aspect of color perception, that is driving the categorical perception effect. And that's what Roberson et al. (2000) found for Berinmo. Roberson et al. (2005) found similar categorical perception effects for the Himba speakers.
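
The scoring of a single triad can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the procedure from the papers: it assumes each chip can be placed on a single hypothetical hue coordinate (real Munsell distances are not one-dimensional), and the chip names, positions, and the function `classify_choice` are invented.

```python
from itertools import combinations


def classify_choice(chips, chosen_pair):
    """Label a participant's 'most similar pair' choice on one triad.

    chips: dict mapping chip name -> (hue_position, label); exactly three chips.
    chosen_pair: the two chip names the participant picked as most similar.
    """
    pairs = list(combinations(chips, 2))
    # The physically closest pair, using the assumed 1-D hue coordinate.
    physical = min(pairs, key=lambda p: abs(chips[p[0]][0] - chips[p[1]][0]))
    # The pair (if any) that shares a color name.
    same_label = [p for p in pairs if chips[p[0]][1] == chips[p[1]][1]]
    chosen = frozenset(chosen_pair)
    if chosen == frozenset(physical):
        return "physical"
    if same_label and chosen == frozenset(same_label[0]):
        return "categorical"
    return "other"


# Hypothetical triad: two "wor" chips and one "nor" chip, where the "nor"
# chip sits physically closer to one "wor" chip than the two "wor" chips
# sit to each other.
triad = {"a": (0.0, "wor"), "b": (3.0, "wor"), "c": (4.0, "nor")}

print(classify_choice(triad, ("a", "b")))  # same-name pair chosen
print(classify_choice(triad, ("b", "c")))  # physically closest pair chosen
```

A participant who mostly produces "categorical" choices on nor-wor triads but "physical" choices on green-blue triads is showing the pattern described above for Berinmo speakers.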

It's also interesting that the Berinmo and Himba tribes have the same number of color terms, because that rules out one possible alternative explanation of their data. It could be that as languages develop, they develop a more sophisticated color vocabulary, which eventually approximates the color categories that are actually innately present in our visual systems. We would expect, then, that two languages that are at similar levels of development (in other words, they both have the same number of color categories) would exhibit similar effects, but the speakers of the two languages remembered and perceived the colors differently. Thus it appears that languages do not develop towards any single set of universal color categories. In fact, Roberson et al. (2004) reported a longitudinal study that implies that exactly the opposite may be the case⁴. They found that children in the Himba tribe, and English-speaking children in the U.S., initially categorized color chips in a similar way, but as they grew older and more familiar with the color terms of their languages, their categorizations diverged, and became more consistent with their color names. This is particularly strong evidence that color names affect color concepts.

It appears, then, that Roberson and her colleagues have laid their hands on the Sapir-Whorf hypothesis and raised it from the dead with their experiments using members of the Berinmo and Himba tribes. We should, of course, take these results with a healthy dose of skepticism, because they involve testing people in very different languages and cultures and comparing their results, which, as I said earlier, is a big problem. However, the growing body of evidence from Roberson and her colleagues' experiments is hard to deny. I don't know about you folks, but I find the revival of Sapir-Whorf incredibly exciting.

¹ Heider, E.R. (1972). Universals in color naming and memory. Journal of Experimental Psychology, 93, 10-20.
² Roberson, D., Davies, I. & Davidoff, J. (2000). Colour categories are not universal: Replications and new evidence from a Stone-age culture. Journal of Experimental Psychology: General, 129, 369-398.
³ The Himba graph is from Roberson, D., Davidoff, J., Davies, I. & Shapiro, L. (2005). Colour categories in Himba: Evidence for the cultural relativity hypothesis. Cognitive Psychology, 50, 378-411.
⁴ Roberson, D., Davidoff, J., Davies, I.R.L. & Shapiro, L.R. (2004). The development of color categories in two languages: A longitudinal study. Journal of Experimental Psychology: General, 133, 554-571.

13 comments:

Anonymous said...: I don't see the excitement. It's a terribly weak version of the Sapir-Whorf hypothesis. People are shown a color, which is terribly hard to remember in any great detail, and so they think "red" to themselves. Then when they are forced to pick from dozens of chips that all look alike, they rely on this verbal memory to pick a chip, since they do not have a reliable visual memory for such fine-grained distinctions.

You'd expect the same anchoring effect for ANY categorical task, regardless of whether the subjects spoke different languages or not. All this seems to show is that they do actually group different colored objects into different verbal categories, which I don't think anybody would deny. Am I missing something?; 8/14/2005 9:27 PM
Anonymous said...: I'm with you -- any evidence for the Sapir-Whorf hypothesis delights me. I wish you'd comment on the work that Dan Slobin has been doing in this regard lately for manners of motion; I'd be interested in your reaction. (If you've already done that and I've missed it, my apologies.)

Suzette Haden Elgin; 8/15/2005 7:24 AM
Chris said...: Tim, there are two reasons for considering it strong evidence, and I think a stronger version of linguistic relativity than those that have been discussed by people like Boroditsky or Slobin over the last several years:

1.) It's color. Almost all of the literature on linguistic relativity and determinism prior to the late 1970s was on color, and after 1968 or so, it was all pointing towards universal color categories that were unaffected by a language's system of color terms. Any evidence indicating that there is, in fact, cultural variability that correlates with color naming is strong evidence for linguistic relativity simply because the primary evidence against it came from color.
2.) It's color. Color was, and for many still is, thought to be determined entirely by the physical properties of the stimulus and the makeup of the color processing parts of the visual system. It's about as concrete as a category can get. If you can show that it varies, even a little, as a result of color naming, then you've strengthened the contemporary version of linguistic relativity significantly.; 8/15/2005 1:29 PM
Anonymous said...: Aha. So it isn't the memory task that is the most interesting, but the very fact that they do use different names to divide up the color space. How can you qualify that as an effect of the language rather than an effect of the culture though?; 8/15/2005 2:08 PM
Chris said...: Yes, and that their linguistic divisions appear to affect their color perception.

But you're right, it's impossible from this data to tease apart the effects of culture and language. The fact that color memory and categorical perception correlate so well with naming indicates that language plays a role, but it may play an indirect role through cultural learning. Determining how language affects color concepts will take future research (and I'm not exactly sure how to do it).; 8/15/2005 2:11 PM
Anonymous said...: See, I'm not sure I buy that this proves their linguistic divisions affect their color _perception_. It could just as easily be that they are explicitly verbally coding what they see, and that is affecting their later pick in the memory experiment.; 8/15/2005 4:00 PM
Chris said...: That is possible, but I'm wondering how that works with the categorical perception task. Categorical perception is a pretty well-understood phenomenon, and it doesn't appear to be caused by simple verbal encoding.; 8/15/2005 4:02 PM
Anonymous said...: Hmm. You're saying the categorical perception task is more than the explicit thought "that one is nor, and so is that one, but that one is wor, so I'll pick the two that share a categorical label as most similar"? I'm not sure I see that.

How do you think of linguistic relativity neurologically? It seems to me, at least, that the visual system in both sets of people is going to be the same. The linguistic relativity seems possible, but it must come in at the semantic/categorization or verbal level, no? This is where my thought that it's a fairly weak hypothesis comes in: it seems to be reducible to "people in different cultures group things into different semantic categories, which influences them in some categorization tasks". I still don't see where that's wrong.; 8/15/2005 9:03 PM
Chris said...: Tim,

I really don't know how it works neurologically, though as popular as cognitive neuroscience is these days, I'm sure someone's studying it.

However, I think we can be pretty sure that the semantic categories are pretty ingrained, and that it's not just people explicitly choosing two cards in the cat. perception task because they know they have the same label.

I should have made it more clear that I was only describing two of the several studies they ran. They also included other short and long-term memory tasks for basic colors (in the paper they're called focal colors), as well as for low-saturation colors; they used a forced-choice recognition task (which shows categorical perception effects); and they attempted to train people on the distinctions present in one language but not the other, and found that people had a very hard time learning those distinctions. I'm just not sure how we can explain all of that without saying that there's some top-down influence on perception.

By the way, I'd be happy to send copies of any of the papers to anyone who's interested in reading about all of the experiments. I warn you, Roberson's writing can be... sloppy at times, which makes her a bit hard to read, but it's interesting nonetheless.; 8/16/2005 11:29 AM
Anonymous said...: I see your point, I think we are just disagreeing about what "top down influence on perception" really means.

Nobody would say it is a top down influence on perception that I treat dogs differently from cats, even if we found a culture where they did not distinguish between them in their actions, but did distinguish between orange cats and brown cats (which I don't). It's simply a different semantic grouping of the same perception, causing me to treat them differently (and do memory tasks differently, etc.). I fail to see what makes you believe this color effect is any different, other than the fact that 'color' is a lower level property. Without some evidence that what they experience is different (e.g., they have a harder time distinguishing between two chips of the same 'color name' than people who use different names for the two chips, even with them right in front of them) I just don't see why you would want to call this an influence on 'perception', rather than simply categorization.

Anyway, I'll look up the papers when I have more time next week and see if they can convince me.; 8/16/2005 5:47 PM
Chris said...: The experiment in which they tried to teach individuals categories that weren't present in their language (but were in another) may be similar to the discrimination task you describe.

I still think that this is about the strongest version of linguistic relativity that we're going to get. It's not as strong as the strong versions of Sapir-Whorf, but it's stronger than anything that's been floating around since the 60s. Their effects occur in non-verbal tasks (and in other experiments, with verbal interference), indicating that what they find is not thinking-for-speaking, even implicitly (i.e., they're not just verbally encoding the memory stimuli -- in fact, they have a hard time naming them, such that some participants don't even give them a name). These are observable effects on nonverbal cognitive tasks, and given that the effects occur on several different tasks, they seem pretty robust. Of course, it doesn't mean that language changes color perception radically, but it does appear to constrain it. These experiments don't rule out all alternative explanations, but I think they at least establish that much, and if nothing else, they put to rest Heider's experiments and interpretations.; 8/16/2005 11:06 PM
Stephen said...: I remember a simpler test, done years ago, with the Navaho using a red ball, a yellow ball and an orange ball. When given the orange ball to play with and then asked to pick out the ball played with fifteen minutes later, they always picked yellow or red.

Much simpler test, and it fit with the way witnesses remember colors in automobile collisions.

I used to do the same test with little kids using a blue, green and teal ball.; 8/18/2005 9:58 PM
Stephen said...: Oh, got here by a recommendation by Dr. Elgin ( http://www.livejournal.com/users/ozarque/ )

She mentioned that she reads this blog daily.; 8/18/2005 9:59 PM