intelligence / IQ / cognitive ability

Magic (American) Racism Theory

Aside from the usual Magic Dirt Theory, there is also the Magic (American) Racism Theory. James Flynn explains the conundrum for environmentalists:

There is no doubt that high h² [heritability] estimates force environmentalists to find a factor or factors that are relatively uniform in their presence within the black population – and within the white population as well if they operate there. After all, if an environmental factor is potent enough to account for the 15-point performance gap between black and white, and if it varies much from person to person within the black population, it would be extremely odd if it accounted for none of the variable performance within the black population! And if it did, it would of course increase the role of environmental factors in explaining IQ variance and thus lower the h² estimate for blacks. There is also no doubt this criterion, the criterion of uniform presence, is the most crippling of those the environmentalist is forced to accept. If we seize on SES as a between-population explanation, who can deny that there are large differences in SES within black America; if we seize on education, who can deny that blacks differ significantly in terms of quality of education? The usual candidate brought forward for the role of blindfold is racism: after all every black suffers from racial bias, and no white suffers from at least that kind of handicap, and racism is very potent. But this too is simply an escape from hard thinking and hard research. Racism is not some magic force that operates without a chain of causality. Racism harms people because of its effects and when we list those effects, lack of confidence, low self-image, emasculation of the male, the welfare mother home, poverty, it seems absurd to claim that any one of them does not vary significantly within both black and white America. Certainly there are some blacks who have self-confidence, enjoy a stable home, a reasonable income, good housing; and certainly we all know whites who have a poor self-image, suffer from emasculation, or suffer from poverty. [p. 59-60 in Race, IQ, and Jensen, 1980]

This illustrates what Jensen wrote back in 1973:

The very ad hoc nature of environmentalist explanations seems to me antithetical to the ways of science. Scientific progress is won through an unrelenting battle against ad hoc explanations of natural phenomena. Therefore, in studying subpopulation differences in mental abilities, does it not seem a more scientific approach to consider all factors which are known to cause individual differences within groups? And is it not reasonable, if for practical reasons of research strategy we must assign some priority to the hypothesized causes we wish to consider, that the evidence derived from studies within groups should serve as a guide to the kinds of hypotheses most worth entertaining about the causes of differences between groups? And does not this lead us directly to the hypothesis of genetic factors as being among the undoubtedly multiple causes of racial subpopulation differences in mental abilities? Furthermore, it is practically axiomatic in biology that any characteristics showing individual variation within subgroups of a species will also show variation between subgroups of the species. [p. 129 in Educability and group differences, 1973]

Jensen (p. 509ff in The g factor, 1998) nevertheless provides a reply to people who offer the Unique American Legacy of Slavery model of the African American-White IQ gap:

Those behavioral scientists who attribute the difference entirely to the environment typically hypothesize factors that are unique to the historical experience of blacks in the United States, such as a past history of slavery, minority status, caste status, white racism, social prejudice and discrimination, a lowered level of aspiration resulting from restricted opportunity, peer pressure against “acting white,” and the like. The obvious difficulty with these variables is that we lack independent evidence that they have any effect on g or other mental ability factors, although in some cases one can easily imagine how they might adversely affect motivation for certain kinds of achievement. But as yet no mechanism has been identified that causally links them to g or other psychometric factors. There are several other problems with attributing causality to this class of variables:

1. Some of the variables (e.g., a past history of slavery, minority or caste status) do not explain the W-B 1 sd to 1.5 sd mean difference on psychometric tests in places where blacks have never been slaves in a nonblack society, or where they have never been a minority population, or where there has not been a color line.

2. These theories are made questionable by the empirical findings for other racial or ethnic groups that historically have experienced as much discrimination as have blacks, in America and other parts of the world, but do not show any deficit in mean IQ. Asians (Chinese, Japanese, East Indian) and Jews, for example, are minorities (some are physically identifiable) in the United States and in other countries, and have often experienced discrimination and even persecution, yet they perform as well or better on g-loaded tests and in g-loaded occupations than the majority population of any of the countries in which they reside. Social discrimination per se obviously does not cause lower levels of g. One might even conclude the opposite, considering the minority subpopulations in the United States and elsewhere that show high g and high g-related achievements, relative to the majority population.

3. The causal variable posited by these theories is unable to explain the detailed empirical findings, such as the large variability in the size of the W-B difference on various kinds of psychometric tests. As noted in Chapter 11, most of this variability is quite well explained by the modified Spearman hypothesis. It states that the size of the W-B difference on various psychometric tests is mainly related to the tests’ g loadings, and the difference is increased if the test is also loaded on a spatial factor and it is decreased if the test is also loaded on a short-term memory factor. It is unlikely that broad social variables would produce, within the black and white populations, the ability to rank-order the various tests in a battery in terms of their loadings on g and the spatial and memory factors and then to distribute their effort on these tests to accord with the prediction of the modified Spearman hypothesis. (Even Ph.D. psychologists cannot do this.) Such a possibility is simply out of the question for three-year-olds, whose performance on a battery of diverse tests has been found to accord with Spearman’s hypothesis (see Chapter 11, p. 385). It is hard to even imagine a social variable that could cause systematic variation in the size of the W-B difference across different tests that is unrelated to the specific informational or cultural content of the tests, but is consistently related to the tests’ g loadings (which can only be determined by performing a factor analysis).

4. Test scores have the same validity for predicting educational and occupational performance for all American-born, English-speaking subpopulations whatever their race or ethnicity. Blacks, on average, do not perform at a higher level educationally or on the job, relative to other groups, than is predicted by g-loaded tests. An additional ad hoc hypothesis is required, namely, that the social variables that depress blacks’ test scores must also depress blacks’ performance on a host of nonpsychometric variables to a degree predicted by the regression of the nonpsychometric variables on the psychometric variables within the white population. This seems highly improbable. In general, the social variables hypothesized to explain the lower average IQ of blacks would have to simulate consistently all of the effects predicted by the default hypothesis and Spearman’s hypothesis. To date, the environmental theories of the W-B IQ difference put forward have been unable to do this. Moreover, it is difficult or impossible to perform an empirical test of their validity.

This explanation has not seen any serious work since these words were written. If anything, it has lost further ground with the decline of stereotype threat theory. Granted, there is not yet a large pre-registered replication for the black-white version (though one has been pre-registered!), but there is an excellent one for the male-female math version (Flore, 2018). There is also a meta-analysis of black-white stereotype threat that finds publication bias, and there are already some large-sample studies (not cited much, of course).


New paper out: Filling in the Gaps: The Association between Intelligence and Both Color and Parent-Reported Ancestry in the National Longitudinal Survey of Youth 1997

We’ve got an important new paper out in the special issue on race and intelligence in Psych:

Little research has dealt with intragroup ancestry-related differences in intelligence in Black and White Americans. To help fill this gap, we examined the association between intelligence and both color and parent-reported ancestry using the NLSY97. We used a nationally-representative sample, a multidimensional measure of cognitive ability, and a sibling design. We found that African ancestry was negatively correlated with general mental ability scores among Whites (r = −0.038, N = 3603; corrected for attenuation, rc = −0.245). In contrast, the correlation between ability and parent-reported European ancestry was positive among Blacks (r = 0.137, N = 1788; rc = 0.344). Among Blacks, the correlation with darker skin color, an index of African ancestry, was negative (r = −0.112, N = 1455). These results remained with conspicuous controls. Among Blacks, both color and parent-reported European ancestry had independent effects on general cognitive ability (color: β = −0.104; ancestry: β = 0.118; N = 1445). These associations were more pronounced on g-loaded subtests, indicating a Jensen Effect for both color and ancestry (rs = 0.679 to 0.850). When we decomposed the color results for the African ancestry sample between and within families, we found an association between families, between singletons (β = −0.153; N = 814), and between full sibling pairs (β = −0.176; N = 225). However, we found no association between full siblings (β = 0.027; N = 225). Differential regression to the mean results indicated that the factors causing the mean group difference acted across the cognitive spectrum, with high-scoring African Americans no less affected than low-scoring ones. We tested for measurement invariance and found that strict factorial invariance was tenable. We then found that the weak version of Spearman’s hypothesis was tenable while the strong and contra versions were not. 
The results imply that the observed cognitive differences are primarily due to differences in g and that the Black-White mean difference is attributable to the same factors that cause differences within both groups. Further examination revealed comparable intraclass correlations and absolute differences for Black and White full siblings. This implied that the non-shared environmental variance components were similar in magnitude for both Blacks and Whites.

So, to sum up:

  • Measurement invariance holds per MGCFA, as Wicherts et al. demand. The result is also positive with Jensen’s method.
  • We confirm yet again that skin color is correlated with IQ among blacks. This replicates findings going back to the early 1900s.
  • We used a sibling design to test whether the skin color-IQ relationship is causal. Causality was not found: skin color did not predict IQ between full siblings. This is predicted by the hereditarian model, whereas colorism predicts a within-family relationship. Hence, 1-0 in favor of hereditarianism from this test.
  • The sample size was not impressive, so the result will need replication. Add Health has 1200 normal sibling pairs and 452 dizygotic twin pairs, and the sample is ~25% black, hence there should be ~410 black sibling pairs to use.
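The ~410 figure in the last bullet is simple arithmetic, sketched here in Python. It assumes (my assumption, not something Add Health documents here) that pairs are same-race and that the ~25% black share of the overall sample carries over to the sibling and twin pairs. The variable names are made up for illustration.

```python
# Pair counts and black share as stated in the bullet above.
SIBLING_PAIRS = 1200      # "normal" (non-twin) sibling pairs
DZ_TWIN_PAIRS = 452       # dizygotic twin pairs
BLACK_SHARE = 0.25        # "~25% black", assumed to apply to pairs as well

# Expected number of usable black sibling pairs under that assumption.
EXPECTED_BLACK_PAIRS = (SIBLING_PAIRS + DZ_TWIN_PAIRS) * BLACK_SHARE

if __name__ == "__main__":
    print(EXPECTED_BLACK_PAIRS)
```

The product is 413, which is where the rounded ~410 comes from. The real usable count will depend on missing data for skin color and test scores.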

Careful readers will know that this is the stuff based on Dalliard’s and Fuerst’s old blogposts (2013!) over at Human Varieties. One of my goals in starting OpenPsych was to get a lot of the science produced by bloggers into journals, so that academics would take notice. This undertaking has been moderately successful, I think.

Genomics intelligence / IQ / cognitive ability

The mysterious PC1 ancestry component

Remember “poverty shrinks the brain from birth”? From birth! Fast acting that poverty, actually before birth too. Anyway, Kimberly Noble is back with another noble attempt:

Early adversity and socioeconomic disadvantage are risk factors associated with diminished cognitive outcomes during development. Recent studies also provide evidence that upbringings characterized by stressful experiences and markers of disadvantage during childhood, such as lower parental education or household income, are associated with variation in brain structure. Although disadvantage often confers adversity, these are distinct risk factors whose differential influences on neurodevelopment and neurocognitive outcomes are not well characterized. We examined pathways linking parental education, adverse experiences, brain structure, and cognitive performances through an analysis of 1,413 typically-developing youth, ages 8 through 21, in the Philadelphia Neurodevelopmental Cohort. Parental education and adverse experiences had unique associations with cortical surface area and subcortical volume as well as cognitive performance across several domains. Associations between parental education and several cognitive tasks were explained, in part, by variation in cortical surface area. In contrast, associations between adversity and cognitive tasks were explained primarily by variation in subcortical volume. A composite neurodevelopmental factor derived from principal component analysis of cortical thickness, cortical surface area, and subcortical volume mediated independent associations between both parental education and adverse experiences with reading, geometric reasoning, verbal reasoning, attention, and emotional differentiation tasks. Our analysis provides novel evidence that socioeconomic disadvantage and adversity influence neurodevelopmental pathways associated with cognitive outcomes through independent mechanisms.

Sounds benign? Discrimination and parental education are associated with brain size metrics and cognitive test scores. She’s mining the MRI subsample of the PNC (aka TCP) for new insights. The authors conclude:

Our results do not necessarily imply causal relationships between PE, AE, brain structure, and cognitive skills. First, PE is a distal marker for environmental factors such as nutrition, toxins, lack of caregiver support, learning impairments such as dyslexia or attention disorders that are linked to genetics, and fewer opportunities for cognitive enrichment that were not assessed in the present sample [1, 2]. Similarly, it is unlikely that (a) all types of AE have equal effects on neurocognitive development and (b) all individuals accurately report whether they have undergone an AE. Further investigations incorporating more complete descriptions of adverse experiences may provide greater insight into their influences on neurocognitive development in relation to SES.

Our study contributes to a growing literature on the role played by brain structure and physiology in explaining associations between environmental factors and cognition among youth [3, 46]. If disadvantage and adversity have unique implications for neurocognitive development, as our results suggest, then remediation is likely to be more effective at improving educational outcomes when targeted to specific risk profiles. Devising targeted early intervention strategies would help achieve the public policy goal of reducing effects of socioeconomic disparities on academic achievement among youth [47].

Sounds perfectly normal, only a bare mention of genetic confounding (sort of) in a previous sentence, and some references to “genetic ancestry” as a covariate. What do the results show?

Alright, so the PC1-6 are genetic ancestry covariates. Along with sex, PC1 sure does predict a lot of variance in first 2 brain measures and has negative coefficients. But what does it mean? Tell me what does it mean. Well, the authors are nice enough to report the distributions of the ancestry components in their supplementary materials. Looks like this:

Right, so looking at PC1, we see that populations called AFR-1KG and AA-PNC have really high values for this and not so much the other populations. Reading more around, we realize these are both African groups, so PC1 essentially measures Africanness in the genomic space. So, it means that authors are nice enough to hide the inconvenient finding that African ancestry predicts smaller brains, controlling for adverse events and parental education. In terms of relative effect sizes, African ancestry was 11.3/0.4=28 times more important than adverse events for predicting cortical surface area, and 5.2 times more important for predicting subcortical volume.

History intelligence / IQ / cognitive ability

How long do we need affirmative action?

1976, Constance Baker Motley, African American judge:

Judicial predictions of reduction or elimination of the RAG through color-based decisions approached the ludicrous. In rendering the decisive vote on the High Court decision Grutter vs. Bolling (539 U.S. 2003) and endorsing a continuing legality of quotas, Justice Sandra Day O’Connor averred, “…the Court expects that 25 years from now, the use of racial preferences of social performance will no longer be necessary.” In 2012 and having concurred with Justice O’Connor in the 2003 ruling, Justice Breyer acknowledged evidence of the unchanging RAG but noted only nine of the 25 years had passed. Puzzled by remarks of Justices O’Connor and Breyer, Otis Graham, writing in the editorial page of the Wall Street Journal, recalled the 1976 statement of Constance Baker Motley, an African-American judge, at a Conference on Affirmative Action at the Center for Studies of Democratic Institutions: “I despise the necessity of reverse discrimination but I swear to you we will end it in 25 years.” Twenty years had passed when Graham noted this in 1997, and it is now 16 years since then.

42 years so far.

1986, Euluis Simien writing for a law journal:

The purpose of this article is to review the accomplishments and failures of affirmative action in the legal educational arena over the last fifteen to twenty years. The most substantial accomplishments have been significant increases in the enrollment of minorities, other than Blacks, into American law schools. At the same time, affirmative action efforts have wholly failed to significantly increase Black enrollment in law school. In an effort to review these accomplishments and failures, this article reviews statistics on representation of females, Blacks, and other minorities in the bar and law schools. These statistics will show that although females and other minorities are not yet proportionally represented in the bar, an end to this disparity is in sight.

I was unable to get the full text of the article, so I am not sure whether he made some predictions for blacks too, or just for females and non-black minorities.

In 2003, the US Supreme Court upheld pro-black etc. race discrimination (affirmative action) in admissions:

The Court’s majority ruling, authored by Justice Sandra Day O’Connor, held that the United States Constitution “does not prohibit the law school’s narrowly tailored use of race in admissions decisions to further a compelling interest in obtaining the educational benefits that flow from a diverse student body.” The Court held that the law school’s interest in obtaining a “critical mass” of minority students was indeed a “tailored use”. O’Connor noted that sometime in the future, perhaps twenty-five years hence, racial affirmative action would no longer be necessary in order to promote diversity. It implied that affirmative action should not be allowed permanent status and that eventually a “colorblind” policy should be implemented. The opinion read, “race-conscious admissions policies must be limited in time.” “The Court takes the Law School at its word that it would like nothing better than to find a race-neutral admissions formula and will terminate its use of racial preferences as soon as practicable. The Court expects that 25 years from now, the use of racial preferences will no longer be necessary to further the interest approved today.” The phrase “25 years from now” was echoed by Justice Thomas in his dissent. Justice Thomas, writing that the system was “illegal now”, concurred with the majority only on the point that he agreed the system would still be illegal 25 years hence.

Well, 10 years to go!

Getting it right

Someone mentioned that a judge wrote a remarkably prescient letter to Dean Louis Pollak in 1969:

Judge Macklin Fleming (quoted from Powerline):

From your remarks and those of Dean Poor, I understand that 43 black students have been admitted to next fall’s class, of whom 5 qualified under the regular standards and 38 did not. … You also said that the future policy of the Law School will be to admit 10 per cent of each entering class without regard to qualification under regular standards.
With the adoption of its new admission policy the Law School has taken a long step toward the practice of apartheid and the maintenance of two law schools under one roof. Already there has been established in the Law School building a Black Law Students Union lounge with furniture and law books provided by the school. And I learned from Dean Poor that the 12 black students in the present first year class who were admitted under relaxed standards have not done well academically. Dean Poor attributed this deficiency to the pre-occupation of these students with racial activities. I think it equally logical to attribute their preoccupation with racial activities to their lack of qualification to compete on even terms in the study of law.
The immediate damage to the standards of Yale Law School needs no elaboration. But beyond this, it seems to me the admission policy adopted by the Law School faculty will serve to perpetuate the very ideas and prejudices it is designed to combat. If in a given class the great majority of the black students are at the bottom of the class, this factor is bound to instill, unconsciously at least, some sense of intellectual superiority among the white students and some sense of intellectual inferiority among the black students.


No one can be expected to accept an inferior status willingly. The black students, unable to compete on even terms in the study of law, inevitably will seek other means to achieve recognition and self-expression. This is likely to take two forms. First, agitation to change the environment from one in which they are unable to compete to one in which they can. Demands will be made for elimination of competition, reduction in standards of performance, adoption of courses of study which do not require intensive legal analysis, and recognition for academic credit of sociological activities which have only an indirect relationship to legal training.

Second, it seems probable that this group will seek personal satisfaction and public recognition by aggressive conduct, which, although ostensibly directed at external injustices and problems, will in fact be primarily motivated by the psychological needs of the members of the group to overcome feelings of inferiority caused by lack of success in their studies. Since the common denominator of the group of students with lower qualifications is one of race this aggressive expression will undoubtedly take the form of racial demands–the employment of faculty on the basis of race, a marking system based on race, the establishment of a black curriculum and a black law journal, an increase in black financial aid, and a rule against expulsion of black students who fail to satisfy minimum academic standards.


The original texts of the Eyferth study

Often discussed, occasionally cited, but rarely read — certainly a highly suspicious combo — I post here the original German language final version (we think):

(There are a few more earlier reports, which I will post later.)

The numbers in the final version seem to match those given by Flynn 1980.

My guess is that there are more of these kinds of old studies around, but not enough curious people to dig them up (or people brave enough to post them if they turn out to have the wrong results). If you have time, try looking around for studies of US soldiers from WW1 and WW2.


Comments on Fagan and Holland (2007)’s racial equality paper

Someone sent me some questions on this study:

African-Americans and Whites were asked to solve problems typical of those administered on standard tests of intelligence. Half of the problems were solvable on the basis of information generally available to either race and/or on the basis of information newly learned. Such knowledge did not vary with race. Other problems were only solvable on the basis of specific previous knowledge, knowledge such as that tested on conventional IQ tests. Such specific knowledge did vary with race and was shown to be subject to test bias. Differences in knowledge within a race and differences in knowledge between races were found to have different determinants. Race was unrelated to the g factor. Cultural differences in the provision of information account for racial differences in IQ.

I don’t recall reading it before, but I can see why environmentalists would love it. It has 44 citations according to GScholar.

Fagan (1992, 2000) assumes that the IQ score is a measure of knowledge. Knowledge depends on information processing ability and on the information given by the culture for processing. The term intelligence, in Fagan’s theory, means information processing ability. Fagan assumes that not all have had equal opportunity for exposure to the information underlying the knowledge being quizzed on standard tests of IQ. Given such assumptions, if group differences in IQ are not accompanied by group differences in information processing ability, then group differences in IQ are due to differences in access to information.

Timeline of US Black-White gap based on ~100 datapoints of heterogeneous data. Orange line = linear fit (slope is ~0). Mean gap is different from the intercept because some datapoints had no test year, so weren’t used for the fit for the plot but were used for calculating the mean in the top left corner.

So, it’s an “equality of opportunity” argument. The basic idea doesn’t even pass the smell test. The US Black-White IQ gap has been about constant for ~150 years (see figure), including through the advent of public libraries and the internet. Obviously, anyone who wants to and is able to learn stuff can very easily do so on the internet for free. Yet the racial gaps stay the same. So, we know already that access to information is not a big independent factor. Rather, exposure to information is something that comes from within: smarter and more curious people seek out exposure to information. It’s textbook active gene-environment correlation.

The paper has a few fairly obvious methodological errors, aside from the small samples in some of the substudies (n’s = 77, 65, 86, and 223 students). One of the more obvious ones is that the authors disregard group factors and differences in them. If we control for item performance on tests with “specific knowledge”, the group difference on other items may be zero, or it may not. It depends on various things, like the g-loading of the “specific knowledge” items, measurement error, and group differences in whatever abilities underlie performance on these kinds of items. Blacks have been found to have an advantage on some memory tests, for instance, when controlling for g, indicating some advantage on related non-g abilities. See e.g.:

where they report group differences of:

These are from a bi-factor model, so the non-g gaps are independent of g (~same as controlling for g in a hierarchical model framework). We see that Whites seem to be quite a bit better at visual processing (d = 0.8) and maybe at verbal comprehension (d = 0.23), while Blacks did somewhat better on long-term retrieval (d = -0.35). The latter two are quite uncertain (the confidence intervals are wide and close to 0).

The general issue with papers like Fagan’s is that they attack a strawman, namely that the Black-White gap in IQ is solely due to a gap in the g factor. Here’s Jensen in 1985:

For the sake of precision, Spearman’s hypothesis should be stated in two forms that can be termed strong and weak, respectively, although Spearman himself did not suggest this distinction. The strong form of the hypothesis holds that the magnitudes of the black-white differences (in standard score units) on a variety of tests are directly related to the tests’ g loadings, because black and white populations differ only on g and on no other cognitive factors. The weak form of the hypothesis holds that the black-white difference in various mental tests is predominantly a difference in g, although the populations also differ, but to a much lesser degree, in certain other ability factors besides g.

Jensen has never subscribed to the strong form as far as I know, and indeed in his 1985 (!) study, he wrote that:

A study of the national standardization sample of the WISC-R (Jensen & Reynolds 1982), based on 1,868 white and 305 black children, bears out Spearman’s hypothesis but contradicts it in its strong form, because significant, but small, black-white differences were found on other factors besides g. When the WISC-R is subjected to a Schmid-Leiman hierarchical factor analysis, it yields four factors that are virtually identical for both populations: g, verbal, spatial, and memory. When factor scores on each of these four factors are computed for every black and white subject, the populations show significant mean differences on all four factors, a finding that contradicts the strong form of Spearman’s hypothesis. But the weak form is strongly upheld, as the g factor accounts for more than seven times as much of the between-population variance as the other three factors combined. Black testees exceed white testees on the Memory factor (0.32σ), whereas white testees exceed black testees on the g (1.14σ), Verbal (0.20σ), and Performance (0.20σ) factors.
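As a rough check on the “more than seven times” claim, here is a small Python sketch using the four factor gaps quoted above (1.14, 0.20, 0.20 and 0.32, in σ units). It assumes the factors are orthogonal and that each factor's between-population variance is proportional to its squared standardized gap. That is my reading of the passage, not necessarily how Jensen computed it. The names are illustrative.

```python
# Standardized mean differences (sigma units) as quoted from Jensen (1985).
# The Memory gap runs in the opposite direction, but squaring ignores sign.
FACTOR_GAPS = {"g": 1.14, "Verbal": 0.20, "Performance": 0.20, "Memory": 0.32}


def variance_ratio(gaps, main="g"):
    """Squared gap on `main` divided by the summed squared gaps on all other factors.

    Assumes orthogonal factors, so between-group variance on each factor is
    proportional to d**2 with the same constant, which cancels in the ratio.
    """
    main_sq = gaps[main] ** 2
    others_sq = sum(d ** 2 for name, d in gaps.items() if name != main)
    return main_sq / others_sq


RATIO = variance_ratio(FACTOR_GAPS)

if __name__ == "__main__":
    print(round(RATIO, 2))
```

Under these assumptions the ratio comes out at about 7.1, in line with the quoted statement.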

See also:

Spearman’s Hypothesis holds that the magnitude of mean White–Black differences on cognitive tests covaries with the extent to which a test is saturated with g. This paper evaluates Spearman’s Hypothesis by manipulating the g saturation of cognitive composites. Using a sample of 16,384 people from the General Aptitude Test Battery database, we show that one can decrease mean racial differences in a g test by altering the g saturation of the measure. Consistent with Spearman’s Hypothesis, the g saturation of a test is positively and strongly related to the magnitude of White–Black mean racial differences in test scores. We demonstrate that the reduction in mean racial differences accomplished by reducing the g saturation in a measure is obtained at the cost of lower validity and increased prediction errors. We recommend that g tests varying in mean racial differences be examined to determine if the Spearman’s Hypothesis is a viable explanation for the results.

Differential psychology/psychometrics

The US Black-White cognitive ability gap in 1850, 1870 and 1900 censuses

Using the crude measures of literacy and numeracy discussed in a previous post, it is possible to quantify the cognitive ability gap for US Black-White in the 1800s. The data come from:

And look like this:

Following the pass rate to normal conversion (qnorm in R), we can derive relative z-score gaps from the pass rates. These are:

Year   W literacy   W numeracy   B literacy   B numeracy   B-W literacy gap (d)   B-W numeracy gap (d)   Mean B-W gap (d)
1850   0.86         0.88         0.58         0.70         0.87                   0.64                   0.75
1870   0.89         0.88         0.35         0.70         1.63                   0.68                   1.15
1900   0.94         0.95         0.62         0.86         1.22                   0.60                   0.91
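
For readers who want to check the conversion, here is a minimal sketch in Python. `NormalDist().inv_cdf` from the standard library plays the role of `qnorm` in R. The function name `pass_rate_gap` is just an illustrative choice, and the sketch assumes both groups are normally distributed with equal SDs and face the same pass threshold. Because it uses the rounded pass rates printed in the table, the results differ slightly from the tabled gaps, which were presumably computed from unrounded rates.

```python
from statistics import NormalDist


def pass_rate_gap(rate_a: float, rate_b: float) -> float:
    """Gap in z-score units implied by two pass rates.

    Assumes both groups are normal with equal SD and share one cut-off.
    """
    inv = NormalDist().inv_cdf  # same role as qnorm() in R
    return inv(rate_a) - inv(rate_b)


# Rounded pass rates copied from the table:
# year -> (W literacy, W numeracy, B literacy, B numeracy)
rates = {
    1850: (0.86, 0.88, 0.58, 0.70),
    1870: (0.89, 0.88, 0.35, 0.70),
    1900: (0.94, 0.95, 0.62, 0.86),
}

gaps = {}
for year, (w_lit, w_num, b_lit, b_num) in rates.items():
    lit = pass_rate_gap(w_lit, b_lit)
    num = pass_rate_gap(w_num, b_num)
    gaps[year] = (lit, num, (lit + num) / 2)
    print(year, round(lit, 2), round(num, 2), round((lit + num) / 2, 2))
```

The values should land within a few hundredths of the table.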



  • The 1850 data are based on free Blacks only, who were likely of above average cognitive ability, which would explain the smaller gap. A corollary is that Black literacy decreases from 1850 to 1870 because the freed southern slaves enter the data.
  • Samples were small.
  • The White pass rates are close to the ceiling introducing ceiling effects and extra imprecision.
  • Large-scale immigration of Whites probably reduces their literacy, despite these data being based on natives only.

All in all, the mean gap in the 1850-1900 period comes out at 0.94 d.

Differential psychology/psychometrics intelligence / IQ / cognitive ability

How to do a meta-analysis of Black-White IQ gap

There’s been some talk about whether the SIRE (self-identified race/ethnicity) gaps are closing and, if so, by how much and when. The matter is complicated for many reasons. The last major published review is very dated, as it is from 2001.

  • Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P. (2001). Ethnic Group Differences in Cognitive Ability in Employment and Educational Settings: A Meta-Analysis. Personnel Psychology, 54(2), 297–330.

John Fuerst took a stab at it a few years ago (in 2013), but apparently never finished the project (sound familiar?). I dug up his datafile and plotted it, which looks like this:

The wiggly fit is from LOESS. Both the mean and the median gaps are exactly 1.00. The wiggly pattern is possibly entirely artifactual.

The data are messy. If we take a look at the file, we find:

  • Meta-analyses of specific, selected groups, e.g. criminals. Those reviewed in Shuey (1966) found a gap of 0.73 d based on 46 studies from 1919-1965.
  • IQ batteries for special contexts, e.g. the GATB used for employment testing. A review found a gap of 0.90 d based on data from 1940-1970.
  • Standardized IQ batteries, e.g. Wonderlic. A review found a gap of 1.00 d based on data from 1970.
  • Achievement test data for primary education, e.g. NAEP. There are many datapoints from this, e.g. one datapoint found a gap of 1.35 d using 1970s data and the NAEP LTT.
  • Achievement test data for tertiary education selection, e.g. SAT. For the years 1987-2009, the mean gap was 1.06 d.
  • Poor/short tests e.g. WORDSUM (10 items only).

Wiggly patterns can easily arise if the type of test influences the gap size (surely), and the distribution of data sources is not equal across time, which it is very unlikely to be given that there are only some 100 included datapoints.
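
To make the point concrete, here is a toy sketch in Python. All numbers are hypothetical and chosen only for illustration: two test types with gaps that stay constant over time, and a mix of datapoints that shifts from one type to the other. The names `TRUE_GAP`, `counts_by_decade` and `pooled_mean_gap` are made up for this example.

```python
# Hypothetical, time-constant gaps (in d) for two kinds of test.
TRUE_GAP = {"type_a": 0.8, "type_b": 1.2}

# Hypothetical number of datapoints of each kind per period.
counts_by_decade = {
    1960: {"type_a": 8, "type_b": 2},
    1980: {"type_a": 5, "type_b": 5},
    2000: {"type_a": 2, "type_b": 8},
}


def pooled_mean_gap(counts: dict) -> float:
    """Mean gap across all datapoints in one period, ignoring test type."""
    total = sum(counts.values())
    return sum(TRUE_GAP[kind] * n for kind, n in counts.items()) / total


pooled = {decade: pooled_mean_gap(c) for decade, c in counts_by_decade.items()}
for decade, gap in pooled.items():
    print(decade, round(gap, 2))
```

The pooled mean drifts across periods even though neither test type changed at all, so any serious analysis would need to model test type (e.g. as a moderator) rather than pool everything.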

Going forward

If one wants to properly study the topic, one has to collect a lot of data. The following approach seems reasonable to me. One should collect data from, in order of preference:

  1. All standardized lengthy IQ tests and batteries commonly used in the USA. These include WAIS, WISC, WPPSI, AFQT/ASVAB, DAT, DAS, Wonderlic, PPVT, WJ, Raven’s, CCF (Cattell), CAT, GATB, CAB, HB.
  2. All standardized achievement tests and batteries commonly used in the USA. These include NAEP, PISA, TIMSS, SAT, ACT, GMAT, GRE, MCAT, LSAT, WRAT, KAT.
  3. Data from the commonly used/discussed ad hoc/short IQ-ish tests. This is primarily the WORDSUM which has been included in the GSS for many decades.

When collecting data for (1-2), it makes sense to also collect the subtest gaps. These can then be used in a combined analysis for Spearman’s hypothesis. One does not need case-level data to do this as Jensen’s method does not require this (it has other problems tho). In fact, neither does SEM, but SEM does require that one can estimate a complete correlation matrix. While one cannot find a paper that used every possible test combination (but a good starting point is the MISTRA data), one may be able to find papers that reported every possible two-way pairing which can then be used to build a larger correlation matrix.
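
As a rough illustration of why summary statistics suffice for Jensen’s method (the method of correlated vectors), here is a minimal Python sketch. It simply correlates a vector of subtest g-loadings with a vector of subtest gaps. The numbers are hypothetical placeholders, not real data, and this bare version leaves out refinements such as correcting for subtest reliability.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Plain Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical subtest-level summary statistics (one entry per subtest).
g_loadings = [0.45, 0.55, 0.65, 0.75, 0.80]    # loading of each subtest on g
subtest_gaps = [0.50, 0.65, 0.70, 0.90, 0.95]  # group gap on each subtest, in d

r = pearson(g_loadings, subtest_gaps)
print(round(r, 2))
```

Only one loading and one gap per subtest are needed, which is why published tables are enough and no case-level data is required.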


The IQ batteries usually, but not always, have a nationally representative standardization sample. There’s usually such a sample for each iteration of the test, which makes it possible to examine historical trends. These are the best source of data for the research questions and one should try to find all of these, as well as contact all the test publishers for any additional data. If they decline, this needs to be noted as well. Note that two samples can both be fairly nationally representative in the broad sense without being entirely equal. As such, even differences in sampling for these can alter results. There is probably no easy way to deal with this problem.

The achievement tests often have problematic sampling. NAEP has good sampling (everybody in school at age 17 I think?), but may have problems with test cheating related to No Child Left Behind and similar legislation. The tertiary education-related tests (SAT, ACT, GRE, etc.) have self-selected samples and these change over time (more people take the tests now) which affects the observed gaps even if the ability distributions don’t change.

Test changes

Another source of trouble is when the test or scoring changes over time. For instance, the NAEP data indicate that the reading gap was 1.31 d in 1980 and then 0.82 d in 1984! Are we to believe that the gap narrowed some .50 d in 4 years?

Since the group differences are a source of embarrassment for many testing companies, they probably took extensive steps to minimize the gaps (sometimes they are open about this too, and it also applies to sex gaps). While one solution simply involves lowering the g-loading of the test, this will also make the test comparatively less useful (per this study). A better solution is to increase the size of the group factors that the less bright groups are better at. For instance, 2015 evidence indicates that Blacks do relatively better at the non-g memory factor, with an advantage of 0.35 d.

As such, one could swap around some tests in a battery to add more memory factor tests, and one could probably do this without altering the overall test g-loading much. This would then decrease the IQ gap on the test by some amount. In the test above (WAIS-4 + some additional tests), the g-gap was 1.16 while the IQ gap was ??? (not reported).
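
The arithmetic behind this can be sketched with the standard formula for the gap on a unit-weighted sum of standardized subtests: the sum of the subtest gaps divided by the square root of the sum of all entries in the within-group correlation matrix. The Python below is a minimal sketch under strong assumptions: the same correlation matrix in both groups, a uniform intercorrelation, and purely hypothetical subtest gaps. The helper names `composite_gap` and `uniform_corr` are made up for the example.

```python
from math import sqrt


def composite_gap(gaps: list, corr: list) -> float:
    """Gap (in d) on a unit-weighted sum of standardized subtests.

    gaps: per-subtest gaps in d.
    corr: within-group correlation matrix (list of rows), assumed equal
          in both groups.
    """
    return sum(gaps) / sqrt(sum(sum(row) for row in corr))


def uniform_corr(k: int, r: float) -> list:
    """k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]


corr = uniform_corr(4, 0.5)  # hypothetical intercorrelation of 0.5

baseline = composite_gap([0.9, 0.9, 0.9, 0.9], corr)  # hypothetical gaps
swapped = composite_gap([0.9, 0.9, 0.9, 0.3], corr)   # one subtest swapped for
                                                      # a hypothetical low-gap one
print(round(baseline, 2), round(swapped, 2))
```

In this toy setup the composite gap shrinks after the swap while the correlation structure is left untouched. Whether a real battery's g-loading would stay the same after such a swap is an empirical question the sketch does not answer.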


One can begin using the Dickens and Flynn collection from 2006. It looks like this:

Based on these, it does appear that Blacks made some gains in IQ scores and maybe in GCA (the underlying ability trait) as well. Note that the same general trend is also found in Fuerst’s dataset: look at the trend for the period 1970 to 2000. In fact, this is the period with the largest gains, but they don’t seem to continue in the post-2000 data, as the gap then moved upwards again (e.g. d = 1.13 in WAIS-4 which is not in the dataset).

Black gains are expected over time given a purely genetic model of differences because of the increased rates of inter-racial marriage, which add more European ancestry to the Black group. This also brings us to the problem of who is included in the Black/African American as well as White categories over time. These problems cannot be solved by such an analysis as this one, but one can solve them using genomic data, which does not care about the SIRE categories. Given a large sample of African Americans (say, n=1000, assuming it is representative), one can regress IQ on African ancestry and look at the predicted mean IQ for a person with 100% African ancestry.