Stereotype Accuracy: Toward Appreciating Group Differences by Yueh-Ting Lee, Lee J. Jussim, and Clark R. McCauley (1995, review)

I recently tweeted about this book without having read it. Since it is short (330 pages), I decided to give it a try, knowing that older works in psychology are generally risky to read given the replication crisis. However, since the replication crisis has only reinforced the validity of the stereotype accuracy field, I am not so concerned here (for those who don’t know: we have replicated typical stereotype accuracy findings in multiple pre-registered studies, e.g. Kirkegaard & Bjerrekær 2016, and we have another 2-3 on the way; see also this replication of sex stereotypes and movie preferences). With that said, I will do the usual thing of presenting a bunch of quotes with some comments.

Stereotypes are based in prejudice. This is actually a variant of the “illogical in origin” charge, and it reflects an assumption underlying much of the first 30 years of research in stereotypes (e.g., Adorno, Frenkel-Brunswick, Levinson, & Sanford, 1950; Katz & Braly, 1933; LaPiere, 1936). Especially if prejudice is considered an affective predisposition to a group (an attitude of liking or disliking a social group), there is considerable historical evidence suggesting that stereotypes may sometimes serve to justify prejudice. National stereotypes, in particular, can change quickly with changing international attitudes and alliances (e.g., Americans had negative views of Germans and positive views of Russians during World War II, but positive views of Germans and negative views of Russians after World War II; see Oakes, Haslam, & Turner, 1994, for a review).

Interestingly, however, there has been little empirical study of the relation between strength or accuracy of stereotyping and attitude toward the stereotyped group. One example of this kind of inquiry is a study by Eagly and Mladinic (1989), which found that strength of gender stereotyping correlated only about .2 to .3 with attitudes toward men and women (although the study found considerably higher correlations between stereotypes of and attitudes toward Democrats and Republicans). Similarly, McCauley and Thangavelu (1991) found that strength of gender stereotyping of occupations was unrelated to attitude toward women in nontraditional occupations, although stereotype strength was positively correlated with accuracy.

The first example is poor because it is about affection for some group, which is not a stereotype with objective criterion data. The second set of results is very interesting. We have been planning to do such a study of various sex differences and stereotypes of them, while relating this to such opinions, so I am happy that others have already taken some steps in this direction that one can build upon and replicate.

As noted earlier, however, we know of no research documenting the existence of people who believe all members of any stereotyped group have any particular attribute. In casual conversation, when people say things like “New Yorkers are loud and aggressive,” we doubt that they mean all New Yorkers. Instead, they most likely mean that in general, or on average, New Yorkers are louder and more aggressive than most other people. Are these people being irrational if they do not change their belief when confronted with a calm, passive New Yorker? We do not think so. Should you change your belief that Alaska is colder than New York, even if we can show you evidence that one day last month, it was warmer in Alaska? Again, we do not think so. In fact, it would be irrational in a statistical sense if you did change your belief on such minimal evidence (see Tversky & Kahneman, 1971, on the “law of small numbers”). Similarly, if 12 million people live in the New York area, and if “New Yorkers are loud and aggressive,” means something like “three fourths of all New Yorkers are loud and aggressive,” then there are still 3 million New Yorkers who are not loud and aggressive. It would be irrational to change a belief about millions of New Yorkers on the basis of a few disconfirming individuals.

This is linked to the broader issue of interpreting what we might call under-specified quantifiers in logic. If we say “Xers are Y”, we leave out the quantifier, e.g. “all” or “most”, which could also be a specific proportion (e.g. 45%). In general, the speaker of such sentences expects the listener to apply the principle of charity and use a reasonable interpretation. But somehow when it comes to statistical links that people dislike (anything that says your ingroup or protected groups are bad), people suddenly lose their ability to apply the principle of charity, and instead apply a principle of uncharity, meaning they put in the least plausible quantifiers such as “all”. No one who says stuff like “New Yorkers are loud and aggressive” ever means “All New Yorkers are loud and aggressive”; they always mean something along the lines of “New Yorkers are louder and more aggressive than most other people”, where the latter refers to some suitable comparison group, say, people from Michigan, or Americans in general. This relates back to some of my blogging years ago on philosophy of language. See in particular this post on the principles of propaganda.
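
To make the arithmetic in the quoted passage concrete, here is a minimal Python sketch. The first function reproduces the 12 million and three-fourths calculation from the quote. The second is my own illustration, not something from the book: it treats a belief as if it rested on a number of earlier observations (the pseudo-counts 75 and 25 are made-up assumptions) and shows how little three calm New Yorkers should move it.

```python
def exceptions_in_population(population: int, share_with_trait: float) -> int:
    """Number of group members who do NOT have the trait."""
    return round(population * (1.0 - share_with_trait))


def updated_share(prior_with: int, prior_without: int,
                  seen_with: int, seen_without: int) -> float:
    """Believed share with the trait after pooling prior and new observations.

    This is the posterior mean of a beta-binomial model, with the prior
    expressed as pseudo-counts. The pseudo-counts are an assumption.
    """
    with_trait = prior_with + seen_with
    without_trait = prior_without + seen_without
    return with_trait / (with_trait + without_trait)


if __name__ == "__main__":
    # 12 million people, three fourths loud and aggressive (from the quote).
    print(exceptions_in_population(12_000_000, 0.75))
    # Hypothetical prior worth 100 observations, then 3 calm New Yorkers.
    print(round(updated_share(75, 25, 0, 3), 3))
```

Under these assumed numbers the believed share only drops from 0.75 to about 0.73, which is the quote's point: a few disconfirming individuals are weak evidence against a belief about millions.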

Stereotypes imply genetic origins of group differences. This charge implies that we already know that many or most group differences do not have substantial genetic foundations. The truth is, of course, that we do not know any such thing. Indeed, many have been surprised by recent evidence suggesting that even political and religious views may be more similar in separated monozygotic twins than in separated dizygotic twins (Bouchard, Lykken, McGue, Segal, et al., 1990). Although many psychologists may prefer environmental explanations to biological ones, most researchers also agree that it is exceedingly difficult to distinguish biological and environmental contributions to group differences (e.g., Gould, 1981; Mackenzie, 1984).

If we do not know the extent to which genetics causes differences among groups, we are in no position to declare that people who believe in genetic differences are inaccurate. Their beliefs may not be supported by scientific evidence, but this is because the evidence is sparse or its interpretation unclear, not because the evidence disproves genetic sources of group differences.

Even more important, this charge suffers a fundamentally flawed assumption: that people actually assume a genetic basis for group differences. We are aware of only one recent study that examined the degree to which nonpsychologists attribute group differences to biological as opposed to environmental causes (Martin & Parker, 1995; cf. Buchanan & Cantril, 1953). This study showed that a sample of undergraduate students believed that differences in socialization and opportunities were a stronger basis for gender and race differences than were differences in biology. Whether people other than undergraduates hold similar beliefs is currently an open question.

It is curious to read this 1995 passage given Jussim’s current stance in favor of a moratorium. Somehow, mounting evidence for race realism and hereditarian models has made Jussim less amenable to their scientific study. Weird!

“They all look alike to me” (outgroup homogeneity). Another more sophisticated accusation against stereotypes is that they lead people to assume that members of outgroups are more similar to one another than they really are (the outgroup is seen as more homogeneous than it really is). This is one of the few accusations against stereotypes that have received some empirical attention. Although there may be some tendency for people to see outgroups as less diverse than they really are (see Judd & Park, 1993, for a review), outgroup homogeneity is far from universal (see, e.g., Linville, Fischer, & Salovey, 1989; Simon & Pettigrew, 1990). In fact, outgroup and minority group members often see themselves as more homogeneous than they see ingroup or majority group members (Brewer, 1993; Lee, 1993; Simon & Brown, 1987). Americans and Chinese perceivers both judge Americans to be more diverse than are the Chinese, which, on many dimensions, they really are (Lee & Ottati, 1993). Like many of the other charges, outgroup homogeneity seems to be a hypothesis worth pursuing rather than an established fact.

Sounds like this kind of finding might have resulted from bias in the choice of target groups. If one compares white people’s beliefs about the homogeneity of, e.g., the political views of whites and blacks in the USA, whites are a lot more diverse: they vote ~55% Republican, while blacks vote ~10% Republican. Funny to see that the Chinese agree on who is more variable. This seems to be a research question ripe for exploration. With regard to target groups, it was recently shown that the biased selection of these has misled researchers about the strength of prejudice for decades. I refer to the findings from this great study:

People are motivated to protect their worldviews. One way to protect one’s worldviews is through prejudice towards worldview-dissimilar groups and individuals. The traditional hypothesis predicts that people with more traditional and conservative worldviews will be more likely to protect their worldviews with prejudice than people with more liberal and progressive worldviews, whereas the worldview conflict hypothesis predicts that people with both traditional and liberal worldviews will protect their worldviews through prejudice. We review evidence across both political and religious domains, as well as evidence using disgust sensitivity, Big Five personality traits, and cognitive ability as measures of individual differences historically associated with prejudice. We discuss four core findings that are consistent with the worldview conflict hypothesis: (1) The link between worldview conflict and prejudice is consistent across worldviews. (2) The link between worldview conflict and prejudice is found across various expressions of prejudice. (3) The link between worldview conflict and prejudice is found in multiple countries. (4) Openness, low disgust sensitivity, and cognitive ability – traits and individual differences historically associated with less prejudice – may in fact also show evidence of worldview conflict. We discuss how worldview conflict may be rooted in value dissimilarity, identity, and uncertainty management, as well as potential routes for reducing worldview conflict.

Scientific research on stereotype accuracy is in its infancy. Few studies have addressed accuracy by comparing stereotypes to any sort of criterion (see Judd & Park, 1993; Jussim, 1990; Ottati & Lee, chapter 2, this volume, for reviews). As yet, there is little in the way of shared theory, questions, methods, or paradigms for investigating stereotype accuracy and inaccuracy. Nor should there be: In such an early stage, inquiry into stereotype accuracy should be open to many different approaches. We hope that this book helps initiate a revival of scientific interest in stereotype accuracy. Perhaps 15 years from now, research will have led us to better questions, improved methods, and even some unifying theories.

This book is from 1995, so 15 years later is 2010. Was there much progress by 2010? Not really, but Jussim did at least take up the topic again in his 2012 book (Social Perception and Social Reality: Why Accuracy Dominates Bias and Self-Fulfilling Prophecy), which inspired my research program in this field. I actually learned about it from Steven Pinker’s Blank Slate, which referred to an older book chapter by Jussim et al. (The Unbearable Accuracy of Stereotypes).

Although no single definition of stereotype is unanimously accepted, most researchers agree that stereotypes involve ascribing characteristics to social groups or segments of society (Lee & Ottati, 1995; D. Mackie & Hamilton, 1993; Oakes, Haslam, & Turner, 1994; Zanna & Olson, 1994). These characteristics may include traits (e.g., industrious), physical attributes, societal role (e.g., occupation), or even specific behaviors. Stereotypic characterizations of a social group are implicitly comparative. For example, the belief that the “Chinese are industrious” implies that the Chinese are more industrious than most other ethnic groups. Many scholars make a distinction between the mean and variance of each dimension composing a stereotype. For example, an individual may believe that the average basketball player is extremely tall but also recognize that there is considerable variability among basketball players along this dimension. A stereotype may be accurate or inaccurate in either of these respects.

Formal analysis of stereotyping began with Lippmann’s (1922/1965) seminal book, entitled Public Opinion. Lippmann stressed that stereotypic representations of social groups were both incomplete and biased. Moreover, he emphasized that stereotypes were insensitive to individual variability within social groups and persisted even in the face of contradictory evidence. At the same time, Lippmann acknowledged that stereotypes serve a basic and necessary function: economization of cognitive resources. Katz and Braly (1933) performed one of the earliest empirical investigations of social stereotyping. In their study, subjects were given a list of 84 psychological trait adjectives (e.g., sly, alert, aggressive, superstitious, and quiet) and were asked to “characterize . . . ten racial and national groups” (Katz & Braly, 1933, p. 282). The stereotype of each group was defined as the set of traits most frequently assigned to the group. For instance, the Chinese stereotype included superstitious (35%), conservative (30%), and industrious (19%). Katz and Braly (1933) were primarily interested in the link between stereotypes and prejudice. Stereotypes, in their view, were public fictions with little factual basis. These public fictions served to justify unwarranted negative emotional reactions toward social groups.

From about 1940 to 1970, debate concerning the accuracy of stereotypes became prevalent. Some researchers argued that stereotypes existed without any realistic basis or kernel of truth (Fishman, 1956; Klineberg, 1954; LaPiere, 1936; Schoenfeld, 1942). It was noted that certain social stereotypes directly contradicted more objective social observations. For example, Armenian laborers residing in southern California were stereotyped as dishonest, deceitful liars and troublemakers during the 1920s. In fact, LaPiere (1936) found that Armenians in this locale appeared less often in legal cases and possessed credit ratings that rivaled those of other ethnic groups. Other psychologists (Campbell, 1967; Ichheiser, 1943, 1970; Schuman, 1966; Triandis & Vassiliou, 1967; Vinacke, 1956) argued that stereotypes possess a kernel of truth. Vinacke (1956), for example, postulated that it would be ridiculous to assert that groups of a given national or cultural origin do not have certain general characteristics that differentiate them from groups of different origin. Triandis and Vassiliou (1967) empirically demonstrated that Greek and American stereotypes possess a substantial component of veridicality, especially when they are elicited from people who have firsthand knowledge of the group being stereotyped.

Some cool studies people used to do in social psychology before it became massively overrun by lefties. It would be interesting to revisit some of these studies.

Accuracy as Convergence Across Heterostereotypes

As noted previously, a heterostereotype is simply one group’s stereotype of another group. In many cases, different perceiver groups share a similar heterostereotype of a particular target group. This pattern of perceptual convergence is consistent with the notion that stereotypes can accurately reflect the target group’s objective characteristics. According to G. W. Allport’s (1954) “earned reputation theory,” this form of convergence may be especially prevalent when perceivers have had the experience of interacting directly with the target group.

Vinacke (1949) examined the stereotypes of eight interacting groups at the University of Hawaii. These were Japanese, Chinese, White, Korean, Filipino, Hawaiian, Samoan, and Black students. Vinacke reported that the different subject groups agreed on essential aspects of group images. For example, there was strong agreement that Hawaiians are musical, easygoing, and friendly. Analogous findings were obtained by Prothro and Melikian (1954, 1955). They found convergence in stereotypes held by Arab and American students with reference to Germans, Blacks, and Jews.

Accuracy as Convergence Between a Heterostereotype and an Autostereotype

In some cases, heterostereotypes of a target group correspond to the target group’s self-image, or autostereotype. Vinacke (1949), in addition to finding convergence across heterostereotypes, obtained convergence between heterostereotypes and the autostereotype of the target group. For example, in keeping with the image held by other groups, Hawaiian students perceived themselves as musical, easygoing, and friendly. Almost two decades later, Schuman (1966) reported similar findings when investigating stereotype accuracy in Bangladesh (previously East Pakistan). In this study, East Pakistani students were asked to describe the general characteristics of people in four districts (i.e., Noakhali, Comila, Barisal, and Mymensingh). One third of the student sample characterized the people of Noakhali as pious, shrewd, and money loving. More important, the Noakhali agreed with the perceivers of the other three districts (i.e., Comila, Barisal, and Mymensingh). That is, the Noakhali, who were perceived by the other three groups as pious, shrewd, and money loving, perceived themselves to be more religious and more concerned with job payment and advancement.

Abate and Berrien (1967) used 15 behavioral orientations (e.g., achievement, deference, order, exhibition, and autonomy) from the Edwards Personal Preference Schedule to study stereotypes between Americans and Japanese. They found relatively high agreement between self-stereotypes and those provided by the opposite cultures. For example, both the subjects at Rutgers University and the subjects at universities in Tokyo reported that Japanese people are more likely to follow orders and to be less autonomy oriented than Americans. In a study of Greek and American stereotypes, Triandis and Vassiliou (1967) demonstrated that this form of perceptual convergence can increase when members of the two groups experience firsthand contact with each other.

Almost two decades later, Bond (1986) examined the mutual stereotypes of two interacting groups at the Chinese University of Hong Kong. American exchange students and local Chinese undergraduates were asked to rate a typical ingroup member (autostereotype) and a typical outgroup member (heterostereotype) on 30 bipolar trait scales. He reported that both groups agreed that the typical Chinese student is more emotionally controlled, but less open and extraverted, than the typical American exchange student (also see Bond, 1986, p. 239). Convergence among Sino-American autostereotypes has also been reported along other dimensions (Lee, 1995; Triandis, 1990). For example, Lee (1995) reported that both American and Chinese individuals perceive the government of the United States of America as more democratic and open to critical opinion than the government of the People’s Republic of China.

Accuracy as Sensitivity to Intragroup Variation

As noted previously, many scholars assume that stereotypic images of a social group contain a representation of both the group mean and variance along each attribute dimension. Most of the previously cited studies suggest that representations of the target group mean possess an accuracy component. Lee and Ottati (1993) have recently presented evidence that indicates that stereotypic representations of target group variability also possess an accuracy component.

Lee and Ottati (1993) began by citing a wide range of anthropological studies that suggest that Chinese people are, objectively speaking, more homogeneous than American people. In keeping with the kernel-of-truth hypothesis, Lee and Ottati (1993) reported that both Chinese and American subjects perceived Chinese people to be more homogeneous than American. Mean perceptions of ingroup and outgroup homogeneity are shown in Table 1 for both the Chinese and the American sample. In accordance with the kernel-of-truth hypothesis, both samples agreed that Americans are more heterogeneous than Chinese. This suggests that both groups were capable of accurately perceiving the amount of intragroup variation within both cultures.

It’s amazing that people keep forming the same stereotypes about groups across time and place if these have no relation to reality. Until recently, I did not have many Chinese friends owing to the lack of Chinese people in Denmark. Now that I have a bunch of them, I see the point of the stereotypes.

In Studying Stereotype Accuracy, Does One Begin With the Stereotype, With Objective Group Characteristics, or With Both Simultaneously?

Three basic approaches to conducting stereotype accuracy research can be identified: (a) Begin with objectively determined group characteristics, (b) identify variables that have both stereotype and actual group-differences measures, and (c) begin with the documented stereotype. Although one of the earliest accuracy studies (LaPiere, 1936) took the third approach, recent accuracy researchers have emphasized the first two approaches. Empirically determined stereotype accuracy may vary with type of sequential approach, and each strategy has both strengths and weaknesses.

Some accuracy researchers first identify objective and quantitative indexes of group characteristics, especially those that differentiate logical contrast categories, and then ask a sample of people to estimate these indexes. For example, McCauley and Stitt (1978), in their investigation of the accuracy of racial stereotypes, began by finding census data on demographic variables for Black Americans and White Americans (e.g., welfare rate and percentage unemployed) and then asked subjects to estimate these rates. They found that stereotypic perceptions matched reality fairly well, but underestimated actual racial differences. This research certainly showed that people seem relatively knowledgeable about some aspects of publicly available data about societally important behaviors. But did it show that racial stereotypes are accurate (albeit conservative) representations of reality? Do the social perception indexes that McCauley and Stitt used constitute a good operationalization of the structured sets of beliefs that Americans have about Whites and Blacks? In some ways, the answer seems obviously yes, because the measured variables included welfare rate (Blacks are often stereotyped as lazy and dependent on welfare) and percentage who were high school graduates (a long-standing feature of Whites’ beliefs about Blacks is that they are not intelligent). At the same time, census data were not available for other facets of shared beliefs about Black Americans (e.g., the stereotype that Blacks are musical), and as a consequence, the accuracy of these stereotypic conceptions was not investigated.

This highlights one way one can bias the findings in some desired direction. It is likely that some traits are generally associated with less accurate stereotypes, and that these are also the traits for which the criterion data are weaker. Doing a bunch of studies with these traits would then generally result in findings of poor accuracy. Similarly, one can sample traits with high accuracy to bias the results in the other direction. So the statistical problem is that we want to generalize to traits in general, but there doesn’t seem to be any obvious way to sample these. In my research, I have generally focused on sampling traits for which good quality criterion data exist. This means I am not generally investigating personality differences between groups, because I consider the data suspect for these. And indeed, studies of national personality differences are the only ones which fail to produce high accuracies when comparing with self-report personality data. Curiously, the accuracy seems a lot higher when researchers attempt objective measures of the same traits. It would be of interest to have people rate countries on a diverse collection of country indicators and see how well they would do. I suggest one samples indicators from the Social Progress Index at random, and then has independent groups rate them.
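
As a rough illustration of that proposal, here is a minimal Python sketch. Everything in it is an assumption for the sake of example: the indicator names are placeholders rather than real Social Progress Index indicators, the numbers are made up, and accuracy is operationalized as the Pearson correlation between the mean rating per country and the criterion value, which is one common choice but not the only one.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / sqrt(var_x * var_y)


def sample_indicators(all_indicators, k, seed):
    """Draw k indicators at random, without replacement, reproducibly."""
    return random.Random(seed).sample(list(all_indicators), k)


def aggregate_accuracy(ratings_by_rater, criterion):
    """Correlate the mean rating per group with the criterion per group.

    ratings_by_rater: list of dicts mapping group -> estimate
    criterion: dict mapping group -> real value
    """
    groups = sorted(criterion)
    mean_rating = [
        sum(r[g] for r in ratings_by_rater) / len(ratings_by_rater)
        for g in groups
    ]
    return pearson_r(mean_rating, [criterion[g] for g in groups])


if __name__ == "__main__":
    # Hypothetical placeholder names, not real index indicators.
    pool = [f"indicator_{i}" for i in range(1, 11)]
    print(sample_indicators(pool, 3, seed=1))

    # Made-up values for three hypothetical countries on one indicator.
    criterion = {"A": 10.0, "B": 20.0, "C": 40.0}
    raters = [
        {"A": 12.0, "B": 18.0, "C": 30.0},
        {"A": 15.0, "B": 25.0, "C": 35.0},
    ]
    print(round(aggregate_accuracy(raters, criterion), 3))
```

Fixing the random seed before data collection means the choice of indicators cannot be tuned afterwards to push the accuracy estimate in either direction.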

In our first test of the shifting standards perspective, subjects viewed a series of 40 full-body photographs of male and female targets, whose true heights were known to us (Biernat et al., 1991, Study 1). Although concealed from the subjects, the male and female targets had been matched for height, that is, for every female of a certain height there was a corresponding male in the stimulus set. One third of the subjects were asked to judge height in feet and inches (objective condition), and the other two thirds judged height using subjective response scales, from short (1) to tall (7). Of the subjective-judgment group, half were asked to judge the heights of the targets compared with the height of the average person (average-person condition), and half made height judgments in relation to the targets’ sex category, that is, compared with the height of the average man or woman (average-for-sex condition).

The shifting standards predictions were that objective height judgments would clearly reveal the operation of the sex stereotype (male targets judged taller than female targets), average-person judgments would reduce this difference (because subjects were being evaluated in relation to a common standard), and average-for-sex judgments would completely eliminate this judged sex difference (because targets were compared only with same-sex others). This was precisely the pattern we found: In feet and inches, the perceived male-female height differential was about 1.0 standardized units; in average-person units, this difference dropped to 0.2; and in average-for-sex units, it was reduced to 0.0.

Given that this volume concerns accuracy, the reader should note the difficulty of drawing conclusions about accurate judgment in these data. Subjects in the objective condition did show evidence of their (accurate) belief that men are taller than women. However, in this particular set of targets, there was no sex difference in height, and therefore the judgment was inaccurate. By this latter standard, subjects in the average-for-sex condition appeared to be more accurate: Their height ratings did not differ for women and men. On the other hand, if these subjects were responsive to height cues and truly used within-sex height standards, they should have judged the women as taller than the men: In this sample, the female targets were taller than the average female, and the male targets shorter than the average male. This example serves to point out the difficulty of setting an accuracy criterion and of making confident statements about judgmental accuracy. The main point of the shifting standards model in this regard is that whether individual judgments are accurate or not, objective rating scales will be more likely than are subjective ratings to reflect perceivers’ mental representations with reasonable fidelity. In this case, we know that people believe that men are taller than women, and the objective height judgments reflect that.

In a second study testing this perspective, we extended our analysis to other sex-relevant beliefs that are also accurate and clearly associated with objective indexes of measurement. Along with height judgments, our subjects were asked to judge a different sample of 40 male and female targets on the dimensions of weight, income, and age (Biernat et al., 1991, Study 2). Half of the subjects used objective rating units (feet and inches, pounds, dollars, and years), and half used subjective units, comparing the targets to the average person (short-tall, light-heavy, financially unsuccessful-financially successful, and young-old). This study also allowed us to test an important aspect of the shifting standards model, namely, that standard shifts should occur only on stereotyped judgment dimensions. Sex is clearly differentially associated with height, weight, and income, but not with age; that is, there is no stereotype that says “men and women differ, on average, in age.” Therefore, different judgment scales should produce different male-female judgment patterns only on the first three dimensions, but not with regard to age.

Again, this is precisely the pattern we found. For height, weight, and income judgments, objective ratings revealed stereotype-consistent judgment effects: Male targets were judged as taller, heavier, and richer than female targets. Subjective ratings, however, generally showed reductions of these effects, and in the case of financial judgments, a reversal: Although men were perceived as earning more money than women, the women were judged as more financially successful than the men. In contrast, for age judgments (the non-sex-linked attribute), the comparable effect was not significant.

In looking at these data using “target” as the unit of analysis, we discovered some evidence that the cognitive process described earlier in this chapter may indeed have been contributing to subjects’ judgments. Specifically, we noted that female and male targets who were rated the same in subjective units were nonetheless seen to differ substantially in objective units. For example, a female target rated a “4” on the subjective scale of financial success was objectively rated as earning about $23,000 a year. In comparison, a male target, also subjectively rated a “4,” was perceived to earn about $40,000 a year (see Biernat et al., 1991, Figure 7). Clearly, subjects used their subjective rating scales differentially to judge women and men on stereotype-relevant attributes; they did not do this on judgments irrelevant to gender (i.e., age). Other work has also replicated this latter effect by demonstrating that standard shifts did not occur on judgment dimensions for which subjects reported holding no sex stereotypes (e.g., hours of studying or number of movies seen; see Biernat et al., 1991, Study 3).

Nowadays this kind of thing is also called the reference group problem or reference group effect. When using interval scales, who is the reference group? It matters quite a bit! People generally use sensible but somewhat varying implicit reference norms. Generally, then, we should use objective scales that are intuitive to people.
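
A toy Python sketch of the reference group effect, using the two income figures from the quoted example. The rating rule is my own simplifying assumption, not the model from the chapter: the midpoint of a 1-7 scale stands for the reference group's mean, and each scale point is one reference-group standard deviation. The reference means and the standard deviation of $10,000 are made up.

```python
def subjective_rating(value, ref_mean, ref_sd, low=1, high=7):
    """Map an objective value onto a 1-7 scale relative to a reference group.

    The scale midpoint stands for the reference group's mean, and each
    scale point for one reference-group standard deviation. Both choices
    are simplifying assumptions.
    """
    midpoint = (low + high) / 2
    raw = midpoint + (value - ref_mean) / ref_sd
    # Round to the nearest scale point and clamp to the scale's range.
    return int(min(high, max(low, round(raw))))


if __name__ == "__main__":
    # Incomes from the quoted example; reference means and SDs are made up.
    woman_income, man_income = 23_000, 40_000

    # Within-sex reference groups: both targets land on the same rating.
    print(subjective_rating(woman_income, ref_mean=23_000, ref_sd=10_000))
    print(subjective_rating(man_income, ref_mean=40_000, ref_sd=10_000))

    # One common reference group: the objective gap reappears.
    print(subjective_rating(woman_income, ref_mean=31_500, ref_sd=10_000))
    print(subjective_rating(man_income, ref_mean=31_500, ref_sd=10_000))
```

With within-sex reference groups both targets get a 4 despite a $17,000 gap; with a common reference group they get a 3 and a 5. The same subjective number can hide very different objective values, which is why objective units are safer.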

We realize that something is a snake and immediately respond to it on the basis of what we believe to be true about snakes in general. We realize somebody is a librarian, or an extravert, and respond to that person on the basis of what we believe to be true about librarians or extraverts. Every new object, and every new person, that we encounter is almost immediately categorized in the light of the similarity we perceive between it, or him or her, and other objects or persons we have encountered in the past. Is this kind of categorization and subsequent response on the basis of preexisting knowledge a good thing to do?

The social psychological literature contains two firm and unequivocal answers to this question: no and yes. That is, in a fairly amazing twist of scientific progress, over the past half-century, social psychology has managed to develop two independent research literatures (both active, influential, and even famous) that reach diametrically opposite conclusions on this matter. The contradiction does not seem to be noted very often; an informal survey of several social psychology textbooks indicates that the two literatures are generally safely ensconced away from each other, in separate chapters. These two literatures comprise research on stereotypes and research on base rates, respectively.


The literature that says no, we should not categorize those we meet and treat them according to our knowledge of their categories is, of course, the classic stereotyping literature that is the topic of so much of this book (e.g., Allport, 1954; Bar-Tal, Graumann, Kruglanski, & Stroebe, 1989; Ehrlich, 1973; Hamilton, 1981; Katz & Braly, 1933; LaPiere, 1936). The term stereotype is usually attributed in this context to the commentator Walter Lippmann (1922). He used the term to describe the fixed and harmful images that various European nationalities stubbornly held of each other and which, he believed, helped incline them to go to war. The more general definition, the one that has gained common currency both in everyday speech and in the psychological literature, is that a stereotype is a preexisting representation of a type of person. The connotation, almost uniformly, is that this representation has an overly powerful effect on human judgment (see Ottati & Lee, chapter 2, this volume).

Since at least 1970, the nearly constant message of the stereotypes literature has been that as soon as we know what category a person belongs to (an ethnic minority, an occupation, or a place of residence), we “rush to judgment” and conclude that the individual possesses all or most of the prototypic traits that we tend to associate with his or her category. Whatever the poor individual is actually like, by contrast, and in particular whatever he or she actually does in our presence, will tend to be ignored. Indeed, social psychology’s belief in this principle is so strong that when Locksley (Locksley, Borgida, Brekke, & Hepburn, 1980; Locksley, Hepburn, & Ortiz, 1982) and, more recently, Jussim (1993) tried to argue that people do not always ignore individuating information in favor of group stereotypes, their arguments were viewed as controversial (e.g., Rasinski, Crocker, & Hastie, 1985).

So the message is that we use stereotypes too much. Whenever we can, we should ignore them. Wouldn’t the world be a wonderful place if only we could judge every person, and indeed every instance, on the basis of individual merits and not be misled by all that baggage of usually incorrect information about the attributes of his or her or its category?


The literature that says yes, we should categorize those we meet and treat them according to our knowledge of their categories comprises the somewhat more recent, but nearly as popular, research on base-rate utilization. The message of this literature is that we do not use our categorical knowledge, here called base rates instead of stereotypes, nearly as much as we should. Indeed, our failure to use base rates sufficiently has been dubbed the “base-rate fallacy” (e.g., Bar-Hillel, 1980; Kahneman & Tversky, 1973).

The message of the base-rate literature is that we often possess pre-existing, probabilistic representations of what people (or things) are generally like. But we do not use that information enough. As soon as we are confronted with an individual instance of a category, such as an actual person, we are overwhelmed by the salience of this stimulus, and our more general knowledge goes out the window. We base our judgment entirely on what we see directly; more “pallid” information, such as categorical data, is ignored. Indeed, social psychology’s belief in this conclusion is so strong that when some commentators, such as Koehler (1993, 1994, in press), state that we sometimes can use base rates appropriately, their statements are also treated as highly controversial.

The conventional view, rather, is that we are generally trapped by our cognitive limitations, such as the protagonist of Nisbett, Borgida, Crandall, and Reed’s (1976) classic “Volvo” story (see also Ross, 1977). This story is not an experiment, nor is it even apparently a true anecdote. It is a hypothetical case, used to illustrate how salient, individualized information overwhelms base rates. To abbreviate, the story is that you have just read up on Volvos in Consumer Reports or some other reliable source and found out that there were, say, 10,000 satisfied Volvo owners and only 1,000 dissatisfied owners. You are considering buying one yourself when your neighbor tells you a vivid story of his brother-in-law’s Volvo, which broke down on the freeway, caught on fire, and had to be sold for scrap. What you should rationally do, say Nisbett et al., is update your statistics to be 10,000 happy owners and 1,001 unhappy owners. Instead, though, the individual case is so overwhelming that you suddenly have deep doubts about something about which you really have little more information than you did before.

The conclusion, then, is that we do not use base rates enough. But wouldn’t the world be wonderful if we could only learn to make judgments like true Bayesian statisticians and give more weight to what we know to be true about the world in general (e.g., group and probabilistic data) rather than allowing ourselves to be so distracted by the salient aspects of particular cases?


So, on the one hand, we have a literature that tells us that we apply our categorical knowledge to our evaluation of individual cases too much. On the other hand, we have a literature that says we apply such knowledge to our judgments not nearly enough. How is such a contradiction possible? I can offer three answers.

The first and most general answer is that this is just yet another example of a sort of all-too-common scientific myopia. It stems from a failure of individual investigators to appreciate, or often even to know about, any literature beyond the narrow segment in which they are directly working. As a result, literatures can develop along contradictory tracks for years before it occurs to anybody that they need to be reconciled or integrated in some way. This is a fairly routine happening; such myopia is found throughout science (certainly not just psychology) and, in this light, there is nothing unprecedented, or even very remarkable, about the contradiction that motivates this chapter.

A second, more specific answer is that these two contradictory literatures both developed under the influence of a common and more basic assumption, which has permeated social psychology for decades: People are usually wrong. Social psychology is an extraordinarily cynical field, it sometimes seems, and is particularly prone to the bias of seeing any general human judgmental tendency as likely to be misguided (Funder, 1987, 1992; Lopes, 1991). In the present case, we can see one similarity between the belief that people use categorical information way too much and the belief that people use categorical information not nearly enough. In both cases, people are portrayed as judgmentally hapless, making errors that are frequent, basic, and consequential. The small problem, however, is that what we really seem to have here is a case of “damned if you do and damned if you don’t.”

[I don’t think his third answer is good, but you can read it in the book yourself.]

This is the best description of the incoherence of psychology on stereotypes & base rates I’ve seen.
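
To put a number on the Volvo story quoted above, here is a minimal Python sketch of the update that Nisbett et al. prescribe. The counts (10,000 satisfied, 1,000 dissatisfied, then one more dissatisfied owner) come straight from the quote; the function and variable names are my own, and treating the neighbor's anecdote as exactly one more data point is the story's assumption, not an established rule.

```python
from fractions import Fraction


def satisfied_share(satisfied: int, dissatisfied: int) -> Fraction:
    """Share of owners who are satisfied, as an exact fraction."""
    return Fraction(satisfied, satisfied + dissatisfied)


# Counts as given in the quoted Volvo story.
before = satisfied_share(10_000, 1_000)

# Same counts after adding the brother-in-law's car as one more unhappy owner.
after = satisfied_share(10_000, 1_001)

# How far the satisfaction rate moves, in percentage points.
drop_in_points = float(before - after) * 100

print(f"before: {float(before):.4%}")
print(f"after:  {float(after):.4%}")
print(f"drop:   {drop_in_points:.4f} percentage points")
```

The satisfaction rate moves by less than a hundredth of a percentage point, which is why the story treats a sudden change of heart as neglect of the base rate.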


Black American Versus All American Census Statistics

One of the earliest studies to look for exaggeration of group differences found underestimation instead. McCauley and Stitt (1978) asked five groups of White subjects to estimate the percentage of Black Americans and the percentage of all Americans with each of seven characteristics (e.g., “completed high school” or “illegitimate”; see Table 1). Criterion percentages for these characteristics were available from the U.S. census. Stereotyping of Black Americans was measured as the extent to which percentage estimates were different for Black Americans than for all Americans.

The results were originally published in terms of mean ratios of percentage of Black Americans divided by corresponding percentage of all Americans, but here the same data are presented in terms of mean difference in the percentage of Blacks and all Americans having each characteristic. (Psychometrically, the diagnostic-ratio measure and the percentage-difference measure tell pretty much the same story; the difference measure is preferable for its relative simplicity in calculation and interpretation. See McCauley & Thangavelu, 1991.)

Table 1 shows that the five different groups of subjects produced generally similar estimates of Black versus all American percentage differences, similar stereotypes, for each characteristic. Perceived differences for each characteristic were almost always (34 of 35 differences) in the same direction as differences taken from the U.S. census. The first characteristic in Table 1, “completed high school,” was seen as substantially less likely for Black Americans than for all Americans (i.e., counterstereotypic for Blacks). The remaining six characteristics were seen as more likely for Blacks (stereotypic for Blacks).

Entries with the superscript a in Table 1 are the 10 perceived differences that are substantially in error (perceived Black American vs. all American difference at least 10 percentage points away from census difference). All 10 of the substantial errors were in the direction of underestimating the difference between Black Americans and all Americans.

Overall, the results in Table 1 indicate considerable accuracy. Correlation between mean estimated percentage differences (stereotypes) and corresponding criterion percentage differences was high (rs = .85 to .90 for the five groups of subjects). Perceived difference departed substantially from census difference for 10 of 35 comparisons, but the errors were all underestimates rather than exaggerations.

This picture of accuracy with some underestimation of real differences might be misleading if, at the same time, percentage estimates for Black Americans were consistently exaggerated on characteristics stereotypic of Blacks. For example, estimates of the percentage of Black families headed by a female might be exaggerated (e.g., 50% vs. census 33%) even if the estimates of the Black versus the all American difference were underestimates. This result could occur if estimates of the percentage of all American families with a female head were, for some reason, even more exaggerated (e.g., 40% vs. census 12%, perceived difference then 10 percentage points vs. census difference of 21 percentage points).

To examine this possibility, Table 2 presents for each characteristic the mean percentage estimates for Black Americans and all Americans and the corresponding census percentages. All five groups of subjects tended to overestimate the percentage of Black Americans who had completed high school, although exaggeration of a counterstereotypic trait would predict underestimation. For the remaining six characteristics, which tended to be seen as more likely for Black Americans than all Americans, stereotypic exaggeration would predict exaggeration of estimates for Black Americans and underestimation of estimates for all Americans. For these six characteristics, 21 of 30 mean estimates for Black Americans were numerically greater than their corresponding criterion percentages; 30 of 30 mean estimates for all Americans were numerically greater than corresponding criterion percentages. These results indicate a general tendency toward overestimation, no matter what the target group; they do not show stereotypic exaggeration.

Taken together, Tables 1 and 2 indicate that diverse groups of subjects agree in estimating differences between Black Americans and all Americans that (a) are sensitive to real differences (high correlations with criterion differences), (b) are underestimates of real differences, and (c) are based on target estimates for Black and all American percentages that are generally overestimated rather than stereotypically exaggerated. Judd and Park (1993, p. 123) suggested that Black subjects might show the same pattern of underestimation of real differences as the five groups of White subjects in McCauley and Stitt (1978), but this surmise has not yet been substantiated.
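
To make the arithmetic in the quoted female-headed-family example concrete, here is a small Python sketch of the two stereotype measures the chapter mentions: the percentage difference and the diagnostic ratio. The four percentages are the hypothetical figures from the quote (50% vs. census 33% for Black families, 40% vs. census 12% for all families); they are illustrations from the book, not results from Table 1 or Table 2, and the function names are mine.

```python
# Illustrative percentages from the quoted passage (families headed by a female).
# The "estimate" values are the book's hypothetical example, not survey results.
estimate = {"black": 50, "all": 40}
census = {"black": 33, "all": 12}


def percentage_difference(p: dict) -> int:
    """Black American percentage minus all American percentage."""
    return p["black"] - p["all"]


def diagnostic_ratio(p: dict) -> float:
    """Black American percentage divided by all American percentage."""
    return p["black"] / p["all"]


# Each group's percentage is overestimated on its own...
overestimate_black = estimate["black"] - census["black"]
overestimate_all = estimate["all"] - census["all"]

# ...yet the perceived group difference is smaller than the census difference.
perceived_difference = percentage_difference(estimate)
census_difference = percentage_difference(census)

print("overestimation (Black, all):", overestimate_black, overestimate_all)
print("difference (perceived, census):", perceived_difference, census_difference)
print("ratio (perceived, census):", diagnostic_ratio(estimate), diagnostic_ratio(census))
```

Both measures point the same way here: each group's percentage is too high, but the gap between the groups is underestimated, which is the pattern the chapter describes.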

This pattern of results, overestimating minority groups, has been found independently in studies of immigration. For many years, researchers have been complaining that voters (the wrong ones!) are irrational about immigration (e.g. this paper). The evidence for this is that voters overestimate the number of Muslims in their countries quite a bit. It turns out, however, that people overestimate the proportions of all salient groups in the population. This is not specific to Muslim immigrants, and thus not evidence of any irrationality on the part of the population. They presumably base their estimates on whatever they hear about in the media frequently, which will tend to be groups that the media obsess about (i.e. lefties care about) and groups that cause many social problems (hence blacks, Muslims etc.). It would be interesting to do a study where one asks people directly how much they hear about various groups in the public conversation in general, instead of asking them to estimate proportions. If this model is right, these two variables should be very strongly correlated.

When it comes to knowledge of demographic facts, misinformation appears to be the norm. Americans massively overestimate the proportions of their fellow citizens who are immigrants, Muslim, LGBTQ, and Latino, but underestimate those who are White or Christian. Previous explanations of these estimation errors have invoked topic-specific mechanisms such as xenophobia or media bias. We reconsidered this pattern of errors in the light of more than 30 years of research on the psychological processes involved in proportion estimation and decision-making under uncertainty. In two publicly available datasets featuring demographic estimates from 14 countries, we found that proportion estimates of national demographics correspond closely to what is found in laboratory studies of quantitative estimates more generally. Biases in demographic estimation, therefore, are part of a very general pattern of human psychology—independent of the particular topic or demographic under consideration—that explains most of the error in estimates of the size of politically salient populations. By situating demographic estimates within a broader understanding of general quantity estimation, these results demand reevaluation of both topic-specific misinformation about demographic facts and topic-specific explanations of demographic ignorance, such as media bias and xenophobia.

As scientists concerned with improving the social condition, we must be wary of arguments that can be used to justify the use of stereotypes. While it may be tempting to argue that a person’s beliefs that most Blacks are stupid, lazy, and aggressive represents a “social reality” and, thus, that these beliefs enrich, inform, and enhance his or her social perception, we cannot allow a bigot to continue to use his or her stereotypes, even if those beliefs seem to them to be accurate. Allowing this would be to ignore the potential damage that can result when stereotypes are misapplied. This argument is not the same as saying that it is only other people’s beliefs that are incorrect (see Oakes, Haslam, & Turner, p. 206). All stereotypes, if inappropriately applied, are unfair to the targets of judgment.

There are certainly social, political, and economic causes of group inequalities that do not involve stereotyping. And it is possible that we have overestimated the role of stereotypes as determinants of inequities in our society. Certainly, not all people who hold stereotypes practice discrimination, nor is all discrimination the result of stereotypes. Yet the role of individual and collective perceptions, such as social stereotypes, as determinants of group relationships has been well documented in the social psychological literature, and research has clearly shown that stereotypes have powerful, and frequently unintended, influence on social behavior (Macrae et al., in press).

Arguments that stereotypes are by and large accurate are premature. It is tempting, from a purely scientific point of view, to argue that stereotypes are just other pieces of social information that should be used to the extent that they provide diagnostic information about a target. Yet the cost of misusing these beliefs is potentially high, and most important, these costs are not equally distributed between the perceiver and the receiver of the judgment. It is relatively easy for individuals with power and status to convince themselves that their social beliefs are accurate and that their use is appropriate. But it is the stigmatized and the powerless for whom the inappropriate use of stereotypes really matters. The misuse of stereotypes can have grave consequences for the victims of stereotyping; thus, it behooves every one of us to think twice or even three times before using category memberships as a basis of thinking about others.

Confused? This is an edited book, and this chapter is one written by a typical social justice social psychologist. You can sense it easily. It begins with the false consensus-like statement of the readers sharing a common purpose, namely, the policy preference of this guy. It does not occur to him that not using stereotypes is unfair to other people: those who have to pay for the problems that the troublemakers cause. There is no value-free stance on stereotyping. Someone has to pay for the uncertainty in decision making. If we ban using stereotypes rationally, then we are just saying implicitly that decision makers have to bear the burden of uncertainty. Since decision makers are generally the good citizens (landlords, employers, police etc.), and these are generally of European ancestry (this is America), the author is just implicitly assuming that social costs of uncertainty should by default be assigned to this group of people instead of the problem people, in this case, blacks and Hispanics. This is bullshit. If you want to argue such things (I of course think it is fair to argue such things!), then be explicit about it. One can advance genetic lottery arguments: yes, it is unfair that some people are born with a genetic disposition for violence, laziness, and low intelligence. It was not their fault in some metaphysically defensible sense of cosmic justice. It’s also unfair that the genetic variants for such traits are not evenly distributed by group. But then again, why is that other people’s problem? Does not everybody thrive relatively better in a well-functioning country? And if so, then it makes sense to promote meritocratic ideals even if they are not philosophically justified in the cosmic justice sense; nothing is (my opinion!). Similar discussion by Sesardić: