Stereotype threat: current evidence in January 2020

Stereotype threat is:

a situational predicament in which people are or feel themselves to be at risk of conforming to stereotypes about their social group.[1][2] Stereotype threat is purportedly a contributing factor to long-standing racial and gender gaps in academic performance. It may occur whenever an individual’s performance might confirm a negative stereotype because stereotype threat is thought to arise from a particular situation, rather than from an individual’s personality traits or characteristics. Since most people have at least one social identity which is negatively stereotyped, most people are vulnerable to stereotype threat if they encounter a situation in which the stereotype is relevant. Situational factors that increase stereotype threat can include the difficulty of the task, the belief that the task measures their abilities, and the relevance of the stereotype to the task. Individuals show higher degrees of stereotype threat on tasks they wish to perform well on and when they identify strongly with the stereotyped group. These effects are also increased when they expect discrimination due to their identification with a negatively stereotyped group.[3] Repeated experiences of stereotype threat can lead to a vicious circle of diminished confidence, poor performance, and loss of interest in the relevant area of achievement.[4]

At least, that’s the theory. What is the evidence? It’s the usual thing: a bunch of small studies with various p-hacking issues, and then some larger ones with null results. Below I summarize the large-sample studies and meta-analyses. There is also a published review by skeptical academics:

The stereotype threat literature primarily comprises lab studies, many of which involve features that would not be present in high-stakes testing settings. We meta-analyze the effect of stereotype threat on cognitive ability tests, focusing on both laboratory and operational studies with features likely to be present in high stakes settings. First, we examine the features of cognitive ability test metric, stereotype threat cue activation strength, and type of nonthreat control group, and conduct a focal analysis removing conditions that would not be present in high stakes settings. We also take into account a previously unrecognized methodological error in how data are analyzed in studies that control for scores on a prior cognitive ability test, which resulted in a biased estimate of stereotype threat. The focal sample, restricting the database to samples utilizing operational testing-relevant conditions, displayed a threat effect of d = −.14 (k = 45, N = 3,532, SDδ = .31). Second, we present a comprehensive meta-analysis of stereotype threat. Third, we examine a small subset of studies in operational test settings and studies utilizing motivational incentives, which yielded d-values ranging from .00 to −.14. Fourth, the meta-analytic database is subjected to tests of publication bias, finding nontrivial evidence for publication bias. Overall, results indicate that the size of the stereotype threat effect that can be experienced on tests of cognitive ability in operational scenarios such as college admissions tests and employment testing may range from negligible to small.
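To get a feel for what d = −.14 with SDδ = .31 means in practice, here is a minimal sketch. It assumes the Hunter-Schmidt convention of an 80% credibility interval (mean ± 1.28 × SDδ, treating true effects as normally distributed). Purely for illustration, it also converts d to an IQ-style scale with SD 15. The function names are mine, not the paper's.

```python
from statistics import NormalDist


def credibility_interval(mean_d, sd_delta, coverage=0.80):
    """Range expected to contain `coverage` of true effects, assuming normality."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.28 for 80%
    return mean_d - z * sd_delta, mean_d + z * sd_delta


def d_to_scale_points(d, scale_sd=15.0):
    """Convert a standardized mean difference to points on a scale with the given SD."""
    return d * scale_sd


# Values taken from the abstract's focal analysis.
low, high = credibility_interval(-0.14, 0.31)
points = d_to_scale_points(-0.14)

print(f"80% credibility interval: {low:.2f} to {high:.2f}")
print(f"Mean effect on an SD-15 scale: {points:.1f} points")
```

Under those assumptions the interval runs from roughly −0.54 to +0.26, and the mean effect corresponds to about 2 points on an SD-15 scale.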

Sex: females and math

For sex differences, researchers picked women and math as the claim to try to establish. The reason for this choice is that women’s relatively worse math performance is a major factor in their lower STEM representation, which feminists desperately want to change. Jelte Wicherts’ former PhD student, Paulette Flore, basically destroyed this idea with her dissertation. Some of it has been published as articles:

Although the effect of stereotype threat concerning women and mathematics has been subject to various systematic reviews, none of them have been performed on the sub-population of children and adolescents. In this meta-analysis we estimated the effects of stereotype threat on performance of girls on math, science and spatial skills (MSSS) tests. Moreover, we studied publication bias and four moderators: test difficulty, presence of boys, gender equality within countries, and the type of control group that was used in the studies. We selected study samples when the study included girls, samples had a mean age below 18 years, the design was (quasi-)experimental, the stereotype threat manipulation was administered between-subjects, and the dependent variable was a MSSS test related to a gender stereotype favoring boys. To analyze the 47 effect sizes, we used random effects and mixed effects models. The estimated mean effect size equaled -0.22 and significantly differed from 0. None of the moderator variables was significant; however, there were several signs for the presence of publication bias. We conclude that publication bias might seriously distort the literature on the effects of stereotype threat among schoolgirls. We propose a large replication study to provide a less biased effect size estimate.
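For readers unfamiliar with random-effects pooling, here is a minimal sketch using the DerSimonian-Laird estimator. That is a common textbook choice, not necessarily the estimator the authors used, and the effect sizes and variances below are made-up illustrative numbers, not data from the paper.

```python
import math


def random_effects_dl(effects, variances):
    """Pool effect sizes with the DerSimonian-Laird random-effects estimator.

    Returns (pooled effect, standard error, between-study variance tau^2).
    """
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw

    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # truncated at zero

    # Random-effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Hypothetical studies: small samples with large effects, large samples near zero.
demo_effects = [-0.60, -0.45, -0.30, -0.05, 0.02]
demo_variances = [0.09, 0.08, 0.05, 0.01, 0.008]

pooled, se, tau2 = random_effects_dl(demo_effects, demo_variances)
print(f"pooled d = {pooled:.3f}, SE = {se:.3f}, tau^2 = {tau2:.3f}")
```

Note the pattern built into the toy data: the imprecise studies report the biggest effects. That is the kind of small-study pattern that publication bias checks are designed to flag, and it pulls a random-effects estimate away from what the large studies show.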

And then she did this replication study:

The effects of gender stereotype threat on mathematical test performance in the classroom have been extensively studied in several cultural contexts. Theory predicts that stereotype threat lowers girls’ performance on mathematics tests, while leaving boys’ math performance unaffected. We conducted a large-scale stereotype threat experiment in Dutch high schools (N = 2064) to study the generalizability of the effect. In this registered report, we set out to replicate the overall effect among female high school students and to study four core theoretical moderators, namely domain identification, gender identification, math anxiety, and test difficulty. Among the girls, we found neither an overall effect of stereotype threat on math performance, nor any moderated stereotype threat effects. Most variance in math performance was explained by gender, domain identification, and math identification. We discuss several theoretical and statistical explanations for these findings. Our results are limited to the studied population (i.e. Dutch high school students, age 13–14) and the studied domain (mathematics).
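Why does sample size matter so much here? A rough power calculation makes the point. The sketch below uses a normal approximation for a two-sided, two-group comparison and ignores the clustering of students in classes, so treat it as a back-of-the-envelope figure. The per-group sizes are hypothetical, since the abstract gives only the total N, and d = 0.22 is borrowed from the meta-analytic estimate quoted above.

```python
import math
from statistics import NormalDist


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a standardized mean difference d."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    ncp = abs(d) / math.sqrt(2.0 / n_per_group)  # expected z under the alternative
    # Probability of landing in either rejection region.
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


# Hypothetical per-group sample sizes, from a typical lab study to a large field study.
for n in (20, 50, 200, 500):
    print(f"n per group = {n:4d}: power to detect d = 0.22 is {power_two_groups(0.22, n):.2f}")
```

With a few dozen participants per condition, a true effect of that size would be missed most of the time, so a literature of small studies that are nearly all significant should raise suspicion. With several hundred per condition, a null result is informative.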

Various groups and GRE-like scores

A little-known report (12 years old, 13 citations!) presents some strong evidence, based on 2 previous papers:

The figures speak for themselves:

These are all based on large samples in high-stakes tests, i.e. in contexts that matter in real life.

Academic stereotypes and tracking — in China

Educational tracks create differential expectations of student ability, raising concerns that the negative stereotypes associated with lower tracks might threaten student performance. The authors test this concern by drawing on a field experiment enrolling 11,624 Chinese vocational high school students, half of whom were randomly primed about their tracks before taking technical skill and math exams. As in almost all countries, Chinese students are sorted between vocational and academic tracks, and vocational students are stereotyped as having poor academic abilities. Priming had no effect on technical skills and, contrary to hypotheses, modestly improved math performance. In exploring multiple interpretations, the authors highlight how vocational tracking may crystallize stereotypes but simultaneously diminishes stereotype threat by removing academic performance as a central measure of merit. Taken together, the study implies that reminding students about their vocational or academic identities is unlikely to further contribute to achievement gaps by educational track.

Immigrants

Normie authors:

In many regions around the world students with certain immigrant backgrounds underachieve in educational settings. This paper provides a review and meta-analysis on one potential source of the immigrant achievement gap: stereotype threat, a situational predicament that may prevent students to perform up to their full abilities. A meta-analysis of 19 experiments suggests an overall mean effect size of 0.63 (random effects model) in support of stereotype threat theory. The results are complemented by moderator analyses with regard to circulation (published or unpublished research), cultural context (US versus Europe), age of immigrants, type of stereotype threat manipulation, dependent measures, and means for identification of immigrant status; evidence on the role of ethnic identity strength is reviewed. Theoretical and practical implications of the findings are discussed.

Their funnel plot says it all:

Black-White gap in USA

An in-depth analysis of the first, famous paper is given by Ulrich Schimmack.

Wicherts has an old meta-analysis that he is hiding for whatever reason.

But we will soon get another big replication, similar to Flore’s above:

According to stereotype threat theory, the possibility of confirming a negative group stereotype can evoke feelings of threat, leading people to underperform in the domains in which they are stereotyped as lacking ability. This theory has immense theoretical and practical implications, but many studies supporting it include small samples and varying operational definitions of “stereotype threat”. We address the first challenge by leveraging a network of psychology labs to recruit a large Black student sample (N_anticipated = 2700) from multiple US sites (N_anticipated = 27). We address the second challenge by identifying three threat-increasing and three threat-decreasing procedures that could plausibly affect performance and use an adaptive Bayesian design to determine which “stereotype threat” operationalization yields the strongest evidence for underperformance. This project has the potential to advance our knowledge of a scientifically and socially important topic: whether and under what conditions stereotype threat affects current US Black students.

Which of course I am looking forward to!


A reasonable prior is that anything from social psychology is most likely bullshit, and more so the more left-wing-friendly it is. Stereotype threat gets a doubly bad prior here. The evidence for it is laughably bad, so a reasonable person’s posterior probability that the effect is real will be close to 0.
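The prior-to-posterior step can be written out with Bayes' rule in odds form. The numbers below are hypothetical placeholders chosen only to show the arithmetic; they are not estimates derived from any of the studies above.

```python
def posterior_probability(prior, bayes_factor):
    """Update P(hypothesis) given a Bayes factor (evidence for H relative to not-H)."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: a skeptical prior, then large studies favoring the null 10:1.
prior = 0.10
bayes_factor = 0.10  # values below 1 count against the hypothesis

print(f"posterior = {posterior_probability(prior, bayes_factor):.3f}")
```

The same function can be applied repeatedly, feeding each posterior back in as the next prior, if one wants to update on several independent studies in turn.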