Clear Language, Clear Mind

January 3, 2019

Environmentalists like admixture analysis too (until they don’t)

Filed under: Genetics / behavioral genetics,intelligence / IQ / cognitive ability — Tags: , — Emil O. W. Kirkegaard @ 17:10

See previous post about quotes from the medical genetics and physical anthropology literature on admixture analysis and the causal interpretation.

There are quite a few older admixture studies that examined relationships between racial ancestry and intelligence. Most of these used quite crude methods such as interviewer judgement. Some used a better method, namely objectively measured skin tone/color/reflectance. A small few used something approaching modern technology, namely blood groups. Curiously, you only hear about the ones with the smallest samples, probably because these found no relationships and are thus perceived to provide evidence (sometimes seen as conclusive) for non-genetic models. Thus, we can find environmentalists defending this kind of analysis, now that they like the results.

Templeton provides a particularly good quote, my bolding below.

There is a way of testing if differences in phenotypic means between two populations have a genetic basis. The test was developed by Mendel and requires that the populations be crossed and that the hybrids and their descendants be raised in a “common garden” (i.e., a common environment). Despite the extreme interest in the genetic basis of between population differences in intelligence, only a handful of studies have even attempted to use this standard research design of genetics. These few studies (Green, 1972; Loehlin, Vandenberg, & Osborne, 1973; Scarr, Pakstis, Katz, & Barker, 1977) have several common features. First, they take advantage of the strong tendency of humans to interbreed when brought into physical proximity. For example, in the Americas, geographically differentiated human populations of European and sub-Saharan African origin were brought together and began to hybridize. However, most matings still occurred within populations. Given this assortative mating, the genetic impact of hybridization is extremely sensitive to the cultural environment. In North America, the hybrids were culturally classified as blacks, and hence most subsequent matings involving the hybrids were into the population of African origin. Therefore, a broad range of variation in degree of European and African ancestry can be found among North American individuals who are all culturally classified as being members of the same “race”, in this case blacks (a “common garden” cultural classification). In Latin America, different cultures have different ways of classifying hybrids, but in general a number of alternative categories are available and social class is a more powerful determinant of mating than is physical appearance (e.g., skin color). As a consequence, individuals in Latin America can be culturally classified into a single social entity that genetically represents a broad range of variation in amount of European and African ancestry.
Thus, these studies use a “common garden” design in a cultural sense that nevertheless includes hybrid individuals and their descendants. Second, these studies quantify the degree of European and African ancestry in a population of individuals that is culturally classified as being a single “race.” Because the original geographically disparate populations do show genetic differences due to isolation by distance, the degree of European and African ancestry of a specific individual can be estimated using blood group and molecular genetic markers. Finally, the shared premise of these studies is that if a trait that differentiates European and sub-Saharan Africans has a genetic basis, it should show variation in the hybrid population that correlates with the degree of African ancestry. This is indeed the case for many morphological traits, such as skin color (Scarr et al., 1977). However, there is no significant correlation with the degree of African ancestry for any cognitive test result, either within the cultural environment of being “black” (Loehlin et al., 1973; Scarr et al., 1977) or in the cultural environment of being “white” (Green, 1972). Hence, even though these populations differ in their average test scores, there is no evidence for any genetic differentiation among these populations at genetic loci that influence these IQ test scores.

So if such admixture patterns were to be found, Templeton would have to agree that this constitutes evidence for “genetic differentiation among these populations at genetic loci that influence these IQ test scores”.

Nisbett spends an entire appendix trying to argue for a 0% genetic contribution to the US black-white gap. After ignoring most of the evidence on the topic (even in his included categories), he specifically advocates using admixture studies in his discussion (again my bolding):

Racial Ancestry and IQ
All of the research reported above is most consistent with the proposition that the genetic contribution to the black/white difference is nil, but the evidence is not terribly probative one way or the other because it is indirect. The only direct evidence on the question of genetics concerns the racial ancestry of a given individual. The genes in the U.S. “black” population are about 20 percent European (Parra et al., 1998; Parra, Kittles, and Shriver, 2004). Some blacks have completely African ancestry, many have at least some European ancestry, and some—about 10 percent—have mostly European ancestry. Does it make a difference how African versus European a black person is? A hereditarian model demands that blacks with more European genes have higher IQs. Herrnstein and Murray (1994) and Rushton and Jensen (2005), as it happens, scarcely deal with this direct evidence.

[discussion of various studies]

So what do we have in the way of studies that examine the effects of racial ancestry—by far the most direct way to assess the contribution of genes versus the environment to the black/white IQ gap? We have one flawed adoption study with results consistent with the hypothesis that the gap is substantially genetic in origin, and we have two less-flawed adoption studies, one of which indicates slightly superior African genes and one of which suggests no genetic difference. We have dozens of studies looking at racial ancestry as indicated by skin color and “negroidness” of features that provide scant support for the genetic theory. In addition, three different studies of Europeanness of blood groups, using two different designs, indicate no support for the genetic theory. One study of illegitimate children in Germany demonstrates no superiority for children of white fathers as compared to children of black fathers. One study shows that exceptionally bright black children have no more European ancestry than the best-available estimate for the population as a whole. And one study indicates that it is more advantageous for a mixed-race child to be raised by a family having a white mother than by a family having a black mother. All of these racial ancestry studies are subject to alternative interpretations. Most of these alternatives boil down to the possibility that there was self-selection for IQ in black-white unions. If whites who mated with blacks had much lower IQs than whites in general, their European genes would convey little IQ advantage. Similarly, if blacks who mated with whites had much higher IQs than blacks in general, their African genes would not have been a drawback.
Yet the extent to which white genes contributing to mixed-race unions would have to be inferior to white genes in general, or black genes would have to be superior to black genes in general, would have to be very extreme to result in no IQ difference at all between children of purely African heritage and those of partially European origin. Moreover, self-selection by IQ was probably not very great during the slave era, when most black-white unions probably took place. It is unlikely, for example, that the white males who mated with black females had on average a lower IQ than other white males. Indeed, if such unions mostly involved white male slave-owners and black female slaves, which seems likely to be the case (Parra et al., 1998), and if economic status was slightly positively related to IQ (as it is now), these whites probably had IQs slightly above average. The black female partners were not likely chosen on the basis of IQ, as opposed to comeliness. Similarly, it scarcely seems likely that either black or white soldiers in World War II were selecting their German mates on the basis of IQ. Several studies, moreover, are immune to the self-selection hypothesis. In particular, the study involving black and white children raised in an institutional setting, and the study involving black children adopted into either black or white middle-class homes, could not be explained by self-selection for IQ in mating. In short, though one would never know it by reading Herrnstein and Murray’s book (1994) or Rushton and Jensen’s article (2005), the great mass of evidence on racial ancestry—the only direct evidence we have—points toward no contribution at all of genetics to the black/white gap.


December 27, 2018

Admixture analysis and genetic causation: some quotes from the literature

Filed under: Genomics,Metascience — Tags: , , — Emil O. W. Kirkegaard @ 09:42

A common comment on bias in scientific peer review is that reviewers don’t usually say openly that they are applying double standards. Instead, they just silently increase their standards. If their bias against some finding is strong, the evidential burden to meet goes to infinity, making sure that nothing is rigorous enough to pass review. A clear case in point came up in our attempts to get an admixture analysis of race and intelligence published. Although admixture analysis is commonplace in medical genetics and in scientific anthropology, somehow the interpretation of such findings is totally different when one changes the trait.

For instance, one really hostile reviewer recently wrote:

2. Second, I made the point that the fundamental logic of the study is weak. The authors simply state that they are following accepted protocol in genetic epidemiology. For one thing, the authors provide no basis to believe that their study follows accepted protocol in genetic epidemiology – their approach is certainly not widely accepted as a means of demonstrating a causal influence of continental ancestry on cognitive/behavioural traits; for another, saying ‘this is the way things are done’ does not rebut my point that the logic of the study is flawed.


4. but I do think that any such enquiries have to be held to a very high standard of evidence, given the potential social harms of misguided findings. The evidence presented here is not of a high standard at all.

I have discussed the topic of causality and admixture results at length (e.g. in my long PING write-up, and in other places), and it’s also done in this version of the paper (the reviewer never commented on that, as was to be expected). However, we can easily disprove his claim that admixture findings are not generally taken to indicate causality. We thank this particular reviewer for openly admitting his double standards.

The collection of quotes below is obviously not exhaustive. Indeed, I compiled most of these in about two hours. One can find 100s of such quotes if determined to spend a day or two. To find such quotes, one can use search queries like this one for African Americans.

Medical genetics

African Americans and health outcomes

In the Atherosclerosis Risk in Communities (ARIC) Study, African Americans are twice as likely as whites to develop incident type 2 diabetes—a disparity which persists even after extensive adjustment for socioeconomic status (SES) and behavioral risk factors [4]. This persistent disparity suggests that genetic factors may contribute to ethnic differences in susceptibility to type 2 diabetes.


Given the observed ethnic/racial disparities in diabetes prevalence, we hypothesized that some diabetes susceptibility alleles are present at higher frequency in African Americans than in European Americans, resulting in association between genetic ancestry and diabetes risk that is independent of its association with other non-genetic risk factors for type 2 diabetes. Thus we sought 1) to establish the association of genetic ancestry with diabetes and related quantitative traits in African Americans, after accounting for the non-genetic risk factors, and 2) to identify diabetes susceptibility loci by conducting a genome-wide admixture mapping scan.


In summary, in community-based populations with more than 7,000 African Americans, we found that genetic ancestry is significant associated with type 2 diabetes above and beyond the effects of markers of SES, and we detected several suggestive loci that may harbor genetic variants modulating diabetes risk. These results suggest that in African Americans, genetic ancestry has a significant effect on the risk of type 2 diabetes that are independent of the contribution of SES, but that no single locus with a major effect explains a large portion of the observed disparity in diabetes risk between African Americans and European Americans. In addition, they suggest that genetic measured African ancestry contributes to the risk of type 2 diabetes via both genetic and non-genetic pathways. The effect of ancestry on any individual locus in the genome is likely to be modest, but in aggregate, differences in ancestry may contribute substantially to the observed ethnic disparity in risk of type 2 diabetes.

This study is particularly noteworthy in that the authors explicitly present SIRE gaps that remain after extensive (sociologist’s fallacy style) controls as being evidence of genetic causation. They then hypothesize an association between genetically measured ancestry and outcome risk, which they go on to find. This is basically the same reasoning used by Jensen from 1969 onward.

We have demonstrated that genetic ancestry may serve as a biomarker for identifying smokers who would benefit from targeted counseling regarding smoking cessation [41], [42]. One important implication of our findings is that there may be rare genetic variants relevant to smoking associated lung function decline that are population-specific and which co-vary with genetic ancestry [43]. While we cannot rule out that some of these associations may be in part due to environmental factors which co-vary with ancestry, these results highlight the scientific advantages of studying racially mixed populations. Future analyses should include admixture mapping to identify genomic regions associated with rate of lung function decline.

In summary, a consistent association of African ancestry with asthma risk was observed in a large case-control sample of self-reported African American subjects. Although confounding effects attributable to other relevant risk factors cannot be ruled out, we replicate previous findings and support the notion that ethnic disparities in asthma incidence are affected, in part, by genetic determinants. Frequency differences for risk alleles across populations and/or differential gene-environmental interactions may lead to differential disease susceptibility.

Due to this heterogeneity, genetic admixture analysis offers a unique opportunity for studying the role of genetic factors within a single, admixed population, independent of social factors, and comorbidities. Ancestry informative markers, AIMs, are genetic loci showing alleles with large frequency differences between populations that can be used to estimate bio-geographical ancestry at the level of the population and individual. Ancestry estimates at both the subgroup and individual level can be directly instructive regarding the genetics of the phenotypes that differ qualitatively or in frequency between populations (Shriver et al., 2003). Specifically, an association between genetic ancestry and a disease phenotype within an admixed group such as AAs may be an indicator of genetic factors underlying differential expression among racial groups (Peralta et al., 2010).
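To make the role of ancestry informative markers concrete, here is a minimal sketch of how an individual ancestry proportion could be estimated from AIM genotypes. It is not the method of any study quoted here: the two-population setup, the simulated allele frequencies, the function name `estimate_ancestry` and the grid-search likelihood are all assumptions made for illustration.

```python
import numpy as np

def estimate_ancestry(genotypes, freq_pop_a, freq_pop_b, grid_size=1001):
    """Grid-search maximum-likelihood estimate of the ancestry fraction from population A.

    genotypes  : reference-allele counts (0, 1 or 2) at each marker
    freq_pop_a : reference-allele frequency at each marker in population A
    freq_pop_b : reference-allele frequency at each marker in population B
    """
    g = np.asarray(genotypes, dtype=float)
    pa = np.asarray(freq_pop_a, dtype=float)
    pb = np.asarray(freq_pop_b, dtype=float)
    q_grid = np.linspace(0.0, 1.0, grid_size)
    # Expected allele frequency for an individual with ancestry fraction q from A
    p = q_grid[:, None] * pa[None, :] + (1.0 - q_grid[:, None]) * pb[None, :]
    p = np.clip(p, 1e-9, 1.0 - 1e-9)
    # Binomial log-likelihood summed over markers (constant term dropped)
    loglik = (g * np.log(p) + (2.0 - g) * np.log(1.0 - p)).sum(axis=1)
    return q_grid[np.argmax(loglik)]

# Simulated (hypothetical) markers with large frequency differences between populations
rng = np.random.default_rng(0)
n_markers = 2000
pa = rng.uniform(0.60, 0.95, n_markers)
pb = rng.uniform(0.05, 0.40, n_markers)
true_q = 0.25
g = rng.binomial(2, true_q * pa + (1.0 - true_q) * pb)

q_hat = estimate_ancestry(g, pa, pb)
print(f"true q = {true_q:.2f}, estimated q = {q_hat:.3f}")
```

The larger the frequency difference at each marker, the fewer markers are needed for a tight estimate, which is why AIMs are chosen the way the quote describes. Programs such as STRUCTURE and ADMIXTURE fit more general models than this sketch.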

A greater proportion of African genetic ancestry is independently associated with higher FG levels in a non-diabetic community-based cohort, even accounting for other ancestry proportions, obesity and SES. The results suggest that differences between African-Americans and whites in type 2 diabetes risk may include genetically mediated differences in glucose

The mechanisms that underlie differences in sleep characteristics between European Americans (EA) and African Americans (AA) are not fully known. Although social and psychological processes that differ by race are possible mediators, the substantial heritability of sleep characteristics also suggests genetic underpinnings of race differences. We hypothesized that racial differences in sleep phenotypes would show an association with objectively measured individual genetic ancestry in AAs.


Ancestry-phenotype association tests, which quantify associations between measured genetic ancestry and a phenotype in an admixed population, like AAs, can be used to test the extent to which the genetic characteristics underlying race may be responsible for observed population level differences.39,40 In the context of sleep, ancestry-phenotype association tests assume that multiple genetic variants, each with small effects on sleep, may have different allele frequencies in different continental populations that contributed to the admixed population. Because individuals from the admixed population inherit varying proportions of their genome from different ancestral populations, one expects contribution from any one ancestral population to show a wide range of variation (theoretically, spread between 0–100%). Any association between ancestry and a phenotype in an admixed group, then, indicates that multiple variants across the genome that have been inherited from one particular ancestral population are related to variation in the phenotype.3945 In this manner, objectively measured genetic ancestry enables us to test the uniquely genetic facet of “race” parsed from the cultural, behavioral, and psychosocial aspects that may be responsible for the observed phenotypic differences.


By utilizing the genetic variability attributable to continental admixture, we show for the first time that visually scored percent SWS and NREM EEG delta are associated with %AF in AAs. Even after adjusting for several demographic, socioeconomic and clinical covariates, %AF explained between 9% and 11% of the variance in SWS in AAs. These results show that AAs have inherited multiple alleles (either few alleles of large effect sizes or several alleles of moderate to low effect sizes) from their African ancestors that may pre-dispose them to lower percent SWS. This association between measured genetic ancestry and SWS clearly establishes a partial genetic basis underlying the observed racial differences in this dimension of sleep.

The authors even write out the Jensen logic in the abstract.
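The ancestry-phenotype association test described in the quotes above boils down to regressing a phenotype on measured ancestry, with and without covariates. The following is a minimal sketch on simulated data; the effect sizes, the covariate named `ses`, and the helper `ols` are hypothetical choices for illustration and are not taken from any of the quoted studies.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical individual ancestry fractions in [0, 1]
ancestry = rng.beta(2, 6, n)
# A covariate that is correlated with ancestry (assumed for illustration)
ses = 0.5 * ancestry + rng.normal(0.0, 1.0, n)

# Simulated phenotype with assumed, arbitrary effect sizes
beta_ancestry, beta_ses = -2.0, 1.0
phenotype = beta_ancestry * ancestry + beta_ses * ses + rng.normal(0.0, 1.0, n)

def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

unadjusted = ols(phenotype, ancestry)      # ancestry only
adjusted = ols(phenotype, ancestry, ses)   # ancestry plus the covariate

print(f"unadjusted ancestry slope: {unadjusted[1]:.2f}")
print(f"adjusted ancestry slope:   {adjusted[1]:.2f}")
```

In this simulation the unadjusted slope mixes the ancestry effect with the covariate's effect, while the adjusted slope lands near the simulated value. That only holds because the covariate is measured and included; the sketch says nothing about covariates that were not measured.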

The role of genetic predisposition in this disparity is supported by two admixture mapping studies of AAs which demonstrated that greater proportion of European ancestry was inversely associated with fibroids in AA women.

Latin Americans and health

Some recent studies, which also used AIMs, demonstrated/suggested that the genomic Amerindian ancestry may be protective against hypertension in women from the United States [10], protective against metabolic syndrome in the population of Costa Rica [11] and protective against Alzheimer’s disease in Brazilian population [12]. Furthermore, recent studies in the Brazilian population showed that Amerindian individuals had lesser arterial stiffness and hypertension [13–14]. These studies suggest that lower risk of diseases studied in individuals with Amerindian ancestry may be due to the existence of protective genetic factors associated with this ancestry.

Significant questions remain unanswered regarding the genetic versus environmental contributions to racial/ethnic differences in sleep and circadian rhythms. We addressed this question by investigating the association between diurnal preference, using the Morningness-Eveningness questionnaire (MEQ), and genetic ancestry within the Baependi Heart Study cohort, a highly admixed Brazilian population based in a rural town. Analysis was performed using measures of ancestry, using the Admixture program, and MEQ from 1,453 individuals. We found an association between the degree of Amerindian (but not European or African) ancestry and morningness, equating to 0.16 units for each additional percent of Amerindian ancestry, after adjustment for age, sex, education, and residential zone. To our knowledge, this is the first published report identifying an association between genetic ancestry and MEQ, and above all, the first one based on ancestral contributions within individuals living in the same community. This previously unknown ancestral dimension of diurnal preference suggests a stratification between racial/ethnic groups in an as yet unknown number of genetic polymorphisms.

The authors essentially cover the entire reasoning in their abstract.


Pygmy height

Note the title “genetic determination”!

Considering a subset of 213 individuals for which DNA was available, we were able to formally compare the individual variation in height with the neutral genetic variation among individuals from the different Pygmy and Non-Pygmy populations.

Controlling for the binary categorization of individuals as Pygmies or Non-Pygmies, as well as for population substructure, we found strongly significant positive correlations between Pygmy individuals’ stature and their levels of admixture with the Non-Pygmy gene-pool estimated using the clustering software STRUCTURE. This result suggests that the major difference in average stature observed between Central African Pygmy and Non-Pygmy populations is likely determined by complex genetic factors.

In this context, Genome Wide Association studies and Admixture Mapping methods will likely reveal the genetic loci involved in the determination of the differences of average height found in existing African Pygmy and Non-Pygmy populations. This will further help us to better understand the determination and evolution of height variation among human populations.

We observed extensive and significant genetic and phenotypic differentiation (Figure 1, Figure 2, Figure S1) and varying levels of admixture among the Pygmy and Bantu populations. Average levels of Bantu ancestry, as determined by STRUCTURE (K = 2), in the three Western Pygmy populations were 27% (Bakola), 35% (Baka), and 49% (Bedzan) with individual values ranging from 16–73%. Average levels of Pygmy ancestry in the three Bantu populations were <1% (Lemande), 2% (Tikar), and 7% (Ngumba), with individual values ranging from 0–39%. We also observed a highly significant correlation between ancestry and height (p = 5.047×10^−18) after correcting for the effect of sex (full model r² = 0.7411, r² for sex = 0.4247; r² for ancestry = 0.3164). In addition, the effect of ancestry remains significant in a model that also includes Pygmy-Bantu ethnicity as a covariate (p = 3.8×10^−5). These results are consistent with Becker et al. [21] and indicate a strong genetic influence on height. Similar findings were also observed using Pygmy samples only (p for ancestry = 0.000216; full model r² = 0.5066; r² for sex = 0.3744; r² for ancestry = 0.1322) and the independent set of genome-wide microsatellite markers described in Tishkoff et al. [9] (data not shown).
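The r² figures in this quote add up exactly (0.4247 + 0.3164 = 0.7411, and 0.3744 + 0.1322 = 0.5066 for the Pygmy-only sample), which is what one would see if ancestry's share were computed as the increment in r² after sex is already in the model. Whether the authors computed it that way is an assumption on my part. A minimal sketch of such a sequential decomposition, on simulated heights with made-up effect sizes:

```python
import numpy as np

def r_squared(y, *predictors):
    """R^2 of an ordinary least squares fit of y on the given predictors."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1.0 - resid.var() / y.var()

# Simulated (hypothetical) data: sex coded 0/1, ancestry fraction in [0, 1]
rng = np.random.default_rng(2)
n = 400
sex = rng.integers(0, 2, n).astype(float)
ancestry = rng.uniform(0.0, 1.0, n)
height = 150.0 + 8.0 * sex + 12.0 * ancestry + rng.normal(0.0, 4.0, n)

r2_sex = r_squared(height, sex)              # sex entered first
r2_full = r_squared(height, sex, ancestry)   # sex plus ancestry
r2_ancestry = r2_full - r2_sex               # increment attributed to ancestry

print(f"r2 sex = {r2_sex:.4f}, r2 ancestry = {r2_ancestry:.4f}, full = {r2_full:.4f}")

# Consistency check on the figures reported in the quote
reported_all = (0.7411, 0.4247, 0.3164)     # full, sex, ancestry (all samples)
reported_pygmy = (0.5066, 0.3744, 0.1322)   # full, sex, ancestry (Pygmy only)
for full, sex_part, anc_part in (reported_all, reported_pygmy):
    print(full, round(sex_part + anc_part, 4))
```

With this kind of decomposition the increment credited to ancestry depends on the order in which predictors are entered, so the split is a description of the fitted models rather than a unique partition of the variance.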

We used the results from ADMIXTURE to estimate individual ancestry proportion (K = 5 for estimating pygmy ancestry, and K = 8 for Asian ancestry) and its correlation with adult height for 43 men and 27 women from the different pygmy groups of the Philippines (Aeta, Agta, and Batak) and for the nonpygmy groups (Tagbanua, Zambales, Casiguran). Because K = 5 separates negritos and Asians, we used individual “negrito” ancestry proportion to correlate with their adult height. This procedure allows us to estimate the effect of genetic contribution on adult height.

As expected, mean stature estimates for the Batwa (66 males, 152.9 cm; 103 females, 145.7 cm) were lower than those for the Bakiga (20 males, 165.4 cm; 41 females, 155.1 cm; Fig. 2B). Batwa stature is significantly positively correlated with the proportion of Bakiga admixture: for males, females, and for all samples combined after regressing out the sex effect (Fig. 2C–E), confirming a genetic basis for the African pygmy phenotype (6, 12).


We can draw four primary conclusions from our analyses. (i) The African pygmy phenotype has a genetic basis, rather than a solely environmental one, based on the positive correlation between stature and Bakiga admixture for Batwa individuals raised in Batwa communities (Fig. 2C–E). These results confirm those obtained from other African rainforest hunter-gatherer populations by Becker et al. (12) and Jarvis et al. (6) and are consistent with individual case observations from Cavalli-Sforza (4).

Although environmental variation is an important factor influencing adult height, such influences are considered insufficient to account fully for observed population differences. Some African populations are considerably taller than others, for example, despite experiencing poorer nutrition and elevated levels of pathogen exposure (Deaton, 2007), suggesting that such differences may have a genetic basis. To date, very few studies have addressed this issue. Notable exceptions are studies investigating the difference in height observed between the Baka pygmies of Cameroon and taller neighbouring non-Pygmy populations (Becker et al., 2011; Jarvis et al., 2012). Both of these studies showed that Pygmy individuals who were genetically more similar to non-Pygmy individuals (i.e. higher levels of genetic admixture) were taller. Most recently, Perry et al. (2014) have shown that the pygmy phenotype likely arose several times independently due to positive natural selection for short stature. Additional evidence for genetic factors underlying population differences in height come from a Korean population (Cho et al., 2009).[…]

Substantial levels of non-Pygmy genetic admixture have been observed across Central African Pygmy populations [24–27, 29–31], correlating positively with adult standing height [32–34]. The general genetic difference between Pygmies and non-Pygmies together with the correlation of genetic admixture and standing height suggests that adult body size differences among Central African Pygmies and neighboring non-Pygmies are attributable in large part to genetic factors, arguing against a view that diminutive Central African pygmy body size is the consequence solely of phenotypic plasticity in a challenging nutritional and parasitic environment [8].


Our findings accord with prior observations [59, 63] that while Pygmy body size is generally proportionally reduced relative to non-Pygmies, their leg lengths are significantly shorter relative to their trunk length. Importantly, our results provide further support for an appreciable genetic component to the determination of body size differences between Pygmies and non-Pygmies, as implied by the correlations observed between the different measures and inferred levels of non-Pygmy admixture that replicate those reported previously for adult standing height [32–34].

Note that the authors changed their wording a bit in the published version. Maybe they too encountered some funny reviewers!

Amerindian-descent physical appearance

A number of studies look at admixture in Amerindian populations, relating both macro-race/continental ancestry to phenotypes as well as sub-Amerindian clusters.

Furthermore, studies of regional human genome diversity, and its bearing on phenotypic variation, have so far been strongly biased towards European-derived populations [17]. The study of populations with non-European ancestry is essential if we are to obtain a more complete picture of human diversity. Latin America represents an advantageous setting in which to examine regional genetic variation and its bearing on human phenotypic diversity [18], considering that the extensive admixture resulted in a marked genetic and phenotypic heterogeneity [2,3,19]. Relative to disease phenotypes, the genetics of physical appearance can be viewed as a model setting with distinct advantages for analyzing patterns of genetic and phenotypic variation. Many physical features are relatively simple to evaluate, show substantial geographic diversity and are highly heritable. We have previously shown that variation at a range of physical features correlates with continental ancestry in Latin Americans [19] and have identified genetic variants with specific effects for a number of features [20,21,22].


We infer the timings of these genetic contributions and relate them to historically-attested migrations, for example providing compelling new evidence of widespread ancestry from undocumented migrants during the colonial era. We further show how differences in Native and European sub-continental ancestry components are associated with variation in physical appearance traits in Latin Americans, highlighting the impact of regional genetic variation on human phenotypic diversity.

August 16, 2017

Some more reviews for our PING paper

Filed under: Differential psychology/psychometrics,Genomics — Tags: , , , — Emil O. W. Kirkegaard @ 19:31

How easy is it to get provocative findings using mainstream methods published? Well, it depends on how provocative. Here’s a second round of generally nonsensical reviews for our PING paper (which you can read here and judge yourself: a good chunk of readers of this blog are themselves researchers and don’t need others to review for them!).

As I recall, Cochran and Harpending spent a few years getting their excellent Jewish paper out. Guess we’ll see how long this takes.

Copied verbatim.

Reviewer #1: MS#PAID-D-17-01184  “Bio-Ancestry, Cognitive Ability, and Socioeconomic Outcomes”
Reviewer #1

Comments for the Authors

This paper is so poorly written, peppered with a large number of unexplained and undefined acronyms throughout, that it is excruciatingly difficult to wade through the text and comprehend just what in the world the authors are attempting to do.  But I surmise that the main goal of this paper is to introduce a new word “bio-ancestry” as something somehow different from race or ethnicity.

Rachel Dolezal notwithstanding, most (sane) people are aware of their ancestral background and perceive it accurately.  So, for the most part, “bio-ancestry” and “self-identified race” are the same thing.  And, quite remarkably, that’s exactly what the authors’ data show!  Ancestors of most whites come from Europe, ancestors of most blacks come from Africa, and ancestors of most Asians come from Asia!  How remarkable!  Do we need quantitative data analyses to know whites are European in ancestral origin?

To make matters worse, while the concept of “bio-ancestry” is supposedly central to (and the primary raison d’être of) this paper, the authors do not at all explain how they measured it in their sample!  The authors simply state that the data are “available,” without explaining how the variable was measured at all.  Thus the central concept of the paper remains entirely mysterious.

So, when all is said and done, what the authors have managed to show with their (unexplained) data are:  1) there are race differences in intelligence; and 2) there are race differences in socioeconomic status.  We have known both of these for at least half a century.

Contrary to what the authors may believe, race is not a “cultural identity” or a “social construction.”  Race is a biological reality, and there are real, biological, genetic differences between races in all measurable quantitative traits.  I would recommend that the authors stay away from the likes of Rachel Dolezal and introductory sociology textbooks, and familiarize themselves with scientific (not sociological) literature on race and race differences.  There are observable race differences among human groups, because they separately evolved in different regions of the world.  Race is not a social construction or a matter of self-identity (once again, contra Rachel Dolezal).

Reviewer #2: This article seeks to demonstrate that people of African descent have lower cognitive ability than people of European descent, and this difference has a genetic explanation. The authors utilize data from a separate study (Pediatric Imaging, Neurocognition, and Genetics) that collected cognitive test scores and “bio-ancestry” data from racially diverse 12 year-old children in several major urban areas in the US.

There are many flaws and omissions in this work that attempt to lead the reader to the notion that Europeans are genetically smarter than people of African descent – a notion that has been largely rejected by the mainstream scientific community. Given this fact, it is odd that the authors do not make a case at all for the importance of this work in the introduction. It remains unclear why this work would be important and how the research world could or should use the findings.

This paper is extremely weak scientifically. Beyond the questionable thesis of the study, I will describe the most egregious scientific problems below.

The authors admit that they did not adjust or weight the sample to be demographically representative, which is problematic given the sweeping and general conclusions made about entire ethnic groups. They also leave many gaps in the methodology and did not adequately describe the sample or how the sample was acquired. There should be a table of demographic information, by SIRE grouping, showing relevant data, such as parent’s education, SES, the 7 test scores, “g” as they compute it, etc. The authors speak at length about the relevance or non-relevance of these demographic factors, making it even more odd that the data is omitted, despite the fact that they include an excessive number of tables of alternate calculations. Because of the weakness of the methods section, we don’t know if there were biases in recruitment that differentially impacted the ethnoracial composition of the sample. It is well known that people of color may be underrepresented or overrepresented in research studies for any number of reasons, and the authors do not mention this at all or make any attempts to correct for it.

According to the authors, “psych 1.6.9 (Revelle, 2017), factor analysis was used to extract a g factor from the cognitive data (Jensen, 1998). Using this method, g scores for 1,369 individuals were derived.” The seven cognitive tests used to compute g were: Dimensional Change Card Sort Test (Card Sort); Flanker Inhibitory Control and Attention Test (Flanker); Picture Sequence Memory Test (Picture Sequence); Pattern Comparison Processing Speed Test (Pattern Recognition); Oral Reading Recognition Test (Pattern Recognition); List Sorting Working Memory Test (List Sort); and Picture Vocabulary Test (Vocabulary).

However, the authors need to provide some legitimate precedents (peer-reviewed papers) to show that those 7 tests are a legitimate means of computing “g”. Most scholars recognize that intelligence is multidimensional, so there is a problem with assigning it a single numerical value without any justification for this approach. They also need to give more description and explanation for their approach to finding “g,” backed up by peer-reviewed sources, rather than drowning readers in pages of supplementary data tables. It looks like they just made up this method on their own.

Additionally, the authors do not report any psychometric data on the 7 tests. It would be important to describe the validity of these scales for each of the ethnic groups examined, and describe the tests norms for those groups. It is possible that adjustments may be required by group (for example IQ tests are frequently gender normed).  If the test has not been validated on any of the groups under study, then that should be listed as an important limitation to the study. The author’s also need to consider that any test of cognitive ability or IQ will reflect that culture’s priorities and markers of what they consider intelligent behavior. Therefore, it is not accurate to refer to the test results or their determination of “g” as a “cognitive ability,” rather they should simply call it “test performance.”

The ethnicity of the person giving the test to the child was not reported. Any person doing trace and ethnicity research should be aware of the literature showing that a cultural mismatch can affect outcomes/scores. If all the testers were European American, then we would expect a bias in test scores in favor of European American children. Likewise, if all the tests were given in English, we’d expect lower scores among those for whom English is a second language. The authors attempt to control for language with a single dummy variable, but a continuous measure of English exposure would be more appropriate. Finally, stereotype threat also needs to be considered and controlled for in any study of race and IQ (Steele & Aronson, 1995), and this concept was not even mentioned in the paper.

In general, there are too many redundant tables and analyses. The authors should choose those that are most salient rather than dumping so many analyses on the reader, and better explain their results so that exhaustive figures and tables are not needed.

One strength of the paper is that the authors have attempted to disentangle culture from race using SIRE and genetic data. However, these variables are not independent. More African genes lead to both darker skin and classification in a non-White ethnic group, and both lead to discrimination and fewer opportunities, which leads to lower income, which leads to poorer nutrition and fewer educational opportunities, which leads to lower test scores. So, the fact that both SIRE and genetic data both find lower test scores in children is not unexpected or extraordinary.

Previous studies have used blood markers to estimate the percentage of European in a Black child’s family tree and no correlation was found between the number of “European” blood markers and IQ. When ‘Black genes’ are not visible to the naked eye and are not associated with membership in a Black community, they do not have much effect on young children’s test scores. Growing up in an African American rather than a European American family substantially reduces a young child’s test performance. When Black Americans raised in White families have higher IQ scores, but when they reach adolescence, their test scores fall, indicating differences are environmental, not genetic. The authors do not discuss the discrepant findings of other researchers, and how this fits with the current study.

Most egregiously, the authors wrongly dismiss the well-known harms of racism. They state “it is not clear that race-phenotype discrimination is actually a potent force in the US,” despite a plethora of findings to the contrary from the NSAL epidemiological study of Black Americans and others. The only source given for their fantastic notion about the non-harm caused of discrimination is research by Mill and Stein (2016), which actually shows the opposite of what they claim (e.g., light-skinned Blacks who choose to pass as Whites make more money than those who choose not to).

They also wrongly state, “it is unlikely that such discrimination, which is typically envisioned as market-based, could directly lead to the association between ancestry and cognitive ability, given the ages of the participants.” Classic research first published by Clark and Clark (1939; doll studies) show that even children are aware of racial hierarchies and preferences, and these studies have been replicated in recent times. Children as young as six years old are aware of race and start to show implicit bias (Baron & Banaji, 2006). Thus, the authors need to acknowledge the negative impact of being a stigmatized minority child in a socioracial hierarchy.

Further, they state “it is unlikely that e.g., European ancestry would be related to SES among native Mexicans and native Puerto Ricans” without any citations. It seems like the authors must not be very knowledgeable diversity researchers if they are unaware of colorism rampant in those societies. In the manuscript, too many claims like this are not backed up with peer-reviewed research or any citations at all.

The authors make are extraordinary claims, and such claims require extraordinary data. These findings are predictable and actually tells us nothing about cognitive ability and race. Furthermore, the authors fail to provide an adequate review of the literature, and do not even acknowledge alternative explanations such as stereotype threat, or the fact that individuals living in minority and/or low income communities may be exposed to lead and other toxins that influence cognitive functioning more than others. There is literature on how poor maternal nutrition during pregnancy can influence children’s brain development, which is also not mentioned.

The authors need to reconcile their findings with established research on racial issues like stereotype threat, achievement gaps, racial discrimination, structural racism, and colorism, before resorting to unsound and largely debunked theories about genetic racial inferiority/superiority.

Reviewer #3: Given the march of research over the past 20 years or so into the impact of genetics on group mean differences in intelligence, it was inevitable that this manuscript was going to be written. I think that the manuscript should be published eventually, but there need to be some moderate changes in the manuscript before it is fully acceptable.

Major issues:
*       I thought that the intercorrelations in Table 1 looked weak compared to most cognitive test batteries. My suspicions were confirmed in Table S14, which seems to indicate that the general cognitive ability factor seems to have captured 26% of variance. (In contrast, the socioeconomic status seems to have captured 63% of variance, according to Table S17.) This is much less than what is usually captured by g, and the authors need to state this.
*       Yes, the reliability of the g factor scores is probably high, but is there any information about the reliability of the factor’s constituent tests? If so, please provide this information.
*       In the first paragraph of p. 14, the authors summarize their data from Tables 7-10 and S1-S4. Some of their summaries (e.g., range of Central Asian betas for socioeconomic outcomes) do not seem to match the data in the tables. The authors need to double-check all the numbers in this summary.
*       One reason that SIRE is an inferior predictor to genomic ancestry is undoubtedly statistical. SIRE is a dichotomous variable, while genomic ancestry is continuous. Nominal variables almost always have more variability and statistical power than continuous variables. The authors should discuss this.
*       I would like the authors to highlight some of the anomalies in more in depth. For example, Table 16 might indicate that SIRE is a better predictor of cognitive ability for Hispanics than genomic ancestry, which is not in line with the other results. Also, a discussion of the inconsistent East Asian results should be expanded on.

Minor issues:
*       Similar prior research (e.g., Fuerst & Kirkegaard, 2016) has all been conducted at the national or subnational level—not with individual data. Skeptics of genetic hypotheses of group mean differences could state that past studies suffered from the ecological fallacy. The current study overcomes the shortcomings of past work, and the authors should state this.
*       Please elaborate more on the “logic [that] has been intricately detailed by Templeton (2001)” (p. 4, lines 69-70).
*       The authors should also explain the factor analysis better. What is the extraction method? Did the authors do any factor rotation?
*       I agree with the author’s use of the EFL variable. But please provide information about which cognitive tests would require English mastery.
*       Please don’t use the phrase “coefficient of determination” for r2. The r2 statistic is just a correlation coefficient squared. Because correlation is not causation, the phrase implies that one variable causes the other.
*       I’m thankful for the authors’ concern about research degrees of freedom. To fully rest any concerns, please indicate the standards by which “. . . ancestry components with high standard errors were not dropped” (p. 11, lines 233-234).
*       Tables 7-10 would be easier to read if the independent variables were always in the same order.
*       Ranges are typically reported from minimum to maximum, but many ranges are reported in the reverse order, which is awkward.
*       Figure 1 would be easier to read if the dependent variables were a table heading, instead of a right axis label.
*       Please revise “validity” on p. 15, line 288 to “predictive ability.” The term “validity” has too many meetings to be exact enough in this context.
*       Independent variables in Tables 11, 12, 15, 16 are not clear. Which variables are SIRE variables?
*       Please eliminate the phrase “. . . and well beyond chance levels” (p. 21, line 370). Statistical significance can be caused by more than just “chance.”
*       The “market-based” phrase on p. 24 needs to be explained more clearly.
*       In Table S11, what does the unlabeled column (with rows numbered 1-6) mean? I think it’s just extraneous information, but I’m not sure.

June 13, 2017

New paper out: Admixture in Argentina (with John Fuerst)

We have a new big paper out:

  • Kirkegaard, E. O. W., & Fuerst, J. (2017). Admixture in Argentina. Mankind Quarterly, 57(4). Retrieved from


Analyses of the relationships between cognitive ability, socioeconomic outcomes, and European ancestry were carried out at multiple levels in Argentina: individual (max. n = 5,920), district (n = 437), municipal (n = 299), and provincial (n = 24). Socioeconomic outcomes correlated in expected ways such that there was a general socioeconomic factor (S factor). The structure of this factor replicated across four levels of analysis, with a mean congruence coefficient of .96. Cognitive ability and S were moderately to strongly correlated at the four levels of analyses: individual r=.55 (.44 before disattenuation), district r=.52, municipal r=.66, and provincial r=.88. European biogeographic ancestry (BGA) for the provinces was estimated from 25 genomics papers. These estimates were validated against European ancestry estimated from self-identified race/ethnicity (SIRE; r=.67) and interviewer-rated skin brightness (r=.33). On the provincial level, European BGA correlated strongly with scholastic achievement-based cognitive ability and composite S-factor scores (r’s .48 and .54, respectively). These relationships were not due to confounding with latitude or mean temperature when analyzed in multivariate analyses. There were no BGA data for the other levels, so we relied on %White, skin brightness, and SIRE-based ancestry estimates instead, all of which were related to cognitive ability and S at all levels of analysis. At the individual level, skin brightness was related to both cognitive ability and S. Regression analyses showed that SIRE had little detectable predictive validity when skin brightness was included in models. Similarly, the correlations between skin brightness, cognitive ability, and S were also found inside SIRE groups. The results were similar when analyzed within provinces. In general, results were congruent with a familial model of individual and regional outcome differences.
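The gap between the two individual-level figures (.55 versus .44) comes from correcting for measurement unreliability. A minimal sketch of the classical Spearman correction follows. Whether the paper used exactly this formula is an assumption here, and the abstract does not report the reliabilities, so the 0.64 that the code computes is only the reliability product implied by the two reported correlations, not a figure taken from the paper.

```python
import math

def disattenuate(r_observed, rel_x, rel_y):
    """Spearman's correction: divide the observed r by sqrt(rel_x * rel_y)."""
    return r_observed / math.sqrt(rel_x * rel_y)

def implied_reliability_product(r_observed, r_corrected):
    """Back out rel_x * rel_y from an observed and a corrected correlation."""
    return (r_observed / r_corrected) ** 2

# .44 (observed) and .55 (corrected) are the values quoted in the abstract.
product = implied_reliability_product(0.44, 0.55)
print(round(product, 2))

# Hypothetical split: all of the unreliability is assigned to one measure.
print(round(disattenuate(0.44, product, 1.0), 2))
```

The point of the sketch is only that a crude 2-5 item cognitive measure attenuates correlations noticeably, which is why the corrected figure is the more informative one.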


We carried out our usual thorough analysis of the predictions of genetic models of cognitive ability/social inequality with regard to admixture. We combined a variety of data sources to estimate mean racial admixture by subnational units, and related these to estimates of cognitive ability (CA) and S factor scores. In this case, we were also able to find individual-level skin tone/color data, as well as really crude cognitive ability data (2-5 items) and a decent number of social measures (>10). All in all, everything was more or less as expected: substantial correlations between European ancestry, CA and S, and some relationships to skin tone as well. The most outlying results were those for the smaller subnational units (districts, municipalities), for which our estimates of European ancestry based on SIRE were not strongly related to CA/S. Presumably this was due to a variety of factors including sampling error, SIRE x location interactions for predicting ancestry (as seen in Brazil), and ancestry x location interactions for predicting CA/S.

The paper is thus another replication of the general patterns we have already seen in most of the other places we have looked. There are still some large American countries left to cover (e.g. Bolivia, Venezuela), but they are hard to get decent data for. We will probably have to rely on the LAPOP survey to estimate many of them.

Some figures of interest

Maps for those who like them.

Main regressions.

March 5, 2017

Nisbett’s 2009 book on Intelligence: reviews

Nisbett’s 2009 book on intelligence, Intelligence and how to get it, is a goldmine of stupid claims that one can quote-mine for introduction and discussion sections of papers. For instance, Nisbett tries to argue that brain size is not causal for intelligence! He writes:

The correlation between cranial capacity and IQ is probably about .30-.40 in the white population (McDaniel, 2005; Schoenemann, Budinger, Sarich, and Wang, 1999). Rushton and Jensen (2005) claim that cranial capacity for blacks is on average smaller than that for whites. A difference between black and white brain size is not always found, however (National Aeronautics and Space Administration, 1978). More important, the correlation found within the white population probably does not indicate that greater brain size causes higher IQ. Within a given family, the sibling with the larger brain has no higher IQ on average than the sibling with the smaller brain (Schoenemann, Budinger, Sarich, and Wang, 1999).

That’s a little surprising given that this association does show up within families when we use the cruder proxy of head size (n = very large). It is also surprising because we know the brain uses some 20% of the body’s energy while taking up some 2% of its weight, making it about 10 times as metabolically expensive per unit of weight as the body as a whole. Human brain size also increased enormously during recent evolution (and in fact continues to do so, in line with the Flynn effect). Some species get rid of their brain when they no longer need it, to save energy. The large brain is problematic for childbirth, since humans have the bizarre design of being bipedal and giving birth between the legs too. Many deaths in childbirth must be related to the unusually large infant head, which means there is direct selection against head size. If brain size got larger nonetheless, there must obviously have been strong selection pressure for it. And so on. Our prior that this relationship is causal is nearly 100%. The study in question thus seems suspect. If we read the abstract, we immediately see the problem:

Hominid brain size increased dramatically in the face of apparently severe associated evolutionary costs. This suggests that increasing brain size must have provided some sort of counterbalancing adaptive benefit. Several recent studies using magnetic resonance imaging (MRI) have indicated that a substantial correlation (mean r = ≈0.4) exists between brain size and general cognitive performance, consistent with the hypothesis that the payoff for increasing brain size was greater general cognitive ability. However, these studies confound between-family environmental influences with direct genetic/biological influences. To address this problem, within-family (WF) sibling differences for several neuroanatomical measures were correlated to WF scores on a diverse battery of cognitive tests in a sample of 36 sibling pairs. WF correlations between neuroanatomy and general cognitive ability were essentially zero, although moderate correlations were found between prefrontal volumes and the Stroop test (known to involve prefrontal cortex). These findings suggest that nongenetic influences play a role in brain volume/cognitive ability associations. Actual direct genetic/biological associations may be quite small, and yet still may be strong enough to account for hominid brain evolution.

Who the hell tries to establish a null relationship with a sample size of 36??? Published in a top journal, of course: PNAS. Top journals routinely publish studies that try to disprove obvious theses with tiny samples, e.g. this adoption study published in Nature, with a 3-group comparison with n’s of 36, 22 and 24. Such tiny samples can produce almost any result, so no conclusions can be drawn from them alone. Nisbett is a reverse thermometer. Whenever he says something controversial, he is always wrong. So, one can simply take his opinion and reverse it to get the truth. Wrong about heritability of IQ, wrong about transracial adoption studies, wrong about brain size’s causal effect on IQ, wrong about race being a social construct in a problematic way, wrong about the morality of studying group differences, wrong about the genetic contribution to the Black-White IQ gap, wrong about…
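To put a rough number on how little 36 pairs can pin down, here is a minimal sketch using the Fisher z approximation for the confidence interval of a correlation. The choice of method is an assumption made for illustration (the study does not report such an interval), and treating the 36 sibling pairs as 36 independent observations is a simplification.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)

# An observed correlation of exactly zero at n = 36 versus n = 1000.
for n in (36, 1000):
    low, high = fisher_ci(0.0, n)
    print(f"n={n}, r=0: 95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions an observed correlation of zero with n = 36 is compatible with true values out to roughly ±.33, so it cannot distinguish a null within-family effect from one about as large as the population-level correlations Nisbett cites.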

In general, Rushton and Jensen were not amused, so they wrote a very long combined rebuttal and review:

However, unknown to me was that James Lee had also written a pretty thorough and interesting review of the book:

Lee writes:

If Nisbett is truly confident that degree of European ancestry shows no association whatsoever with IQ, he should call for studies employing superior ancestry estimates of the kind displayed in Fig. 3. Note that the increased reliability of ancestry estimation does not obviate the need for a large sample. Even under an extreme hereditarian hypothesis assigning mean genotypic IQs of 80 and 100 respectively to the African and European ancestors of African Americans, we can only expect an increase of .2 IQ points for every percentage increase in European ancestry. The considerable IQ variation among African Americans makes an effect of this size difficult to detect in small samples.

The ultimate test of the hereditarian hypothesis is of course the identification of the genetic variants affecting IQ and a tally of their frequencies in the two populations. Because of their likely small effects, we may have to identify dozens of such variants before we are able to make any confident inferences regarding the overall genotypic means of different populations. Although this task is currently within our technological means, it seems practically out of reach in the very short term. Ancestry estimation is much less costly than gene-trait association research and thus offers the advantage of an immediate increment toward the resolution of this issue.
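Lee's arithmetic in the first quoted paragraph, and his point about sample size, can be sketched as follows. The 80 and 100 means are his illustrative "extreme hereditarian" values. The two standard deviations are assumptions added here purely for illustration and do not come from Lee or from any dataset, and the sample-size formula is the usual Fisher z approximation.

```python
import math
from statistics import NormalDist

def iq_points_per_percent(mean_african=80.0, mean_european=100.0):
    """Expected IQ change per percentage point of European ancestry."""
    return (mean_european - mean_african) / 100.0

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n needed to detect correlation r (two-sided, Fisher z)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

slope = iq_points_per_percent()   # Lee's .2 IQ points per percent
sd_ancestry = 12.0  # ASSUMPTION: SD of European ancestry, in percentage points
sd_iq = 15.0        # ASSUMPTION: SD of IQ within the admixed group
r_expected = slope * sd_ancestry / sd_iq
print(slope, round(r_expected, 2), n_for_correlation(r_expected))
```

With these hypothetical inputs the expected correlation is small and the required sample runs to a few hundred people even under the extreme hypothesis. Weaker hypotheses, or noisier ancestry estimates, push the requirement higher still.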

I believe we have nearly fulfilled this goal with the following studies:

The only possible non-genetic hypothesis (I can think of) that remains is the colorism model, where ancestry is non-causal but confounded with skin tone (correlation is ~.50), which then provokes (White) racism which somehow manages to lower IQ scores of some groups, while apparently not harming the others (e.g. US Indians who do very well but are dark skinned), and also leaving unscathed non-IQ traits like self-esteem. This model can be ruled out via sibling studies, or simply by controlling for skin tone. Unfortunately, the dataset used for the admixture study above (PING) does not have a skin tone variable, so one cannot directly rule out colorism in that dataset. I’m pretty confident that colorism can be ruled out in other datasets, however; see the extended discussion of colorism at Human Varieties. I think that in general, colorism does not fit the numbers at all even if real and sizable, so it can be rejected as being the entire explanation solely on numerical grounds.

The issue of using identified variants (usually SNPs) is problematic due to the differential LD decay, which I have discussed at length.

October 10, 2016

Individual genomic admixture and cognitive ability

So, I posted this:

We used data from the PING study (n≈1200) to examine the relationship between cognitive ability, socioeconomic outcomes and genomic racial ancestry. We found that when genomic ancestry was not included in models, self-reported race/ethnicity (SIRE) was a useful predictor of cognitive ability/S, but when genomic ancestry was included, SIRE lost much or most of its validity. In particular, for African Americans, the SIRE standardized beta changed from about -1.00 to .20. European genomic ancestry was found to be positively related to cognitive ability/S (r’s .26/.33) including when SIRE was controlled, while African genomic ancestry was found to be negatively related to cognitive ability/S (r’s -.36/-.30) also when SIRE was controlled.

Key words: race differences, group differences, intelligence, IQ, cognitive ability, social status, inequality, ancestry, admixture, genomics

It’s not quite definitive because it lacks the ability to distinguish between colorism and genetic effects. Environmentalists are very stubborn, so one needs very strong evidence. So, we need datasets that have:

  • Genomic ancestry, preferably not using just a few AIMs
  • Racial appearance data: skin brightness in particular
  • Cognitive ability
  • Socioeconomic indicators (for S factor)

What datasets are there like this? Let’s find them and start applying so we can settle this question once and for all.

Pelotas (Brazil) Birth Cohort Study

  • n=3700.
  • Mixed race.
  • All four variables of interest.
  • Requires approval.
  • In Portuguese.

The Coronary Artery Risk Development in Young Adults Study (CARDIA)

  • “It began in 1985-6 with a group of 5115 black and white men and women aged 18-30 years”
  • 3 cognitive tests: Rey Auditory-Verbal Learning Test (RAVLT), Digit Symbol Substitution Test (DSST) and Stroop Test
  • Skin brightness. Can’t see on the website, but used in studies that used the dataset.
  • Socioeconomic outcomes. Yes, same as above.
  • Requires approval.

UK Biobank

  • n=500k planned. Currently about 100k.
  • Mostly Britons, but some British Africans and others.
  • Crappy cognitive tests, good genomic ancestry and no racial phenotype data (I think).
  • Requires approval. Must be for medical reasons.

The Multi-Ethnic Study of Atherosclerosis (MESA)

Add Health

May 27, 2015

The sibling control design

Filed under: Uncategorized — Tags: , , , — Emil O. W. Kirkegaard @ 23:48

A friend of mine and his brother just received their 23andme results.



In a table they look like this (I have added myself for comparison):

Macrorace | Bro1 | Bro2 | Emil
European | 52.6 | 53 | 99.8
MENA | 42.5 | 41.3 | 0.2
South Asian | 2.8 | 3.4 | 0
East Asian & Amerindian | 1.1 | 0.7 | 0
Sub-Saharan African | 0.5 | 0.5 | 0
Oceanian | 0.5 | 0 | 0
Unassigned | 0 | 1.1 | 0.1
Sum | 100 | 100 | 100.1

Mesorace | Bro1 | Bro2 | Emil
Northern | 51.5 | 51.5 | 91.3
Southern | 1 | 1.2 | 0
Ashkenazi | 0.1 | 0 | 2.9
Eastern | 0 | 0 | 4
Common European | 0.1 | 0.4 | 1.5
Middle Eastern | 42 | 40.8 | 0
North African | 0.3 | 0.2 | 0.2
Common MENA | 0.2 | 0.3 | 0
South Asian | 2.8 | 3.4 | 0
East Asian & Amerindian
East Asian | 0.7 | 0.4 | 0
Southeast Asian | 0.2 | 0 | 0
Amerindian | 0 | 0.1 | 0
Common East Asian & Amerindian | 0.1 | 0.1 | 0
Sub-Saharan African
East | 0.3 | 0.3 | 0
West | 0.2 | 0.4 | 0
Central & South | 0 | 0 | 0
Common Sub-Sahara African | 0.1 | 0.1 | 0
Oceanian | 0 | 0 | 0
Unassigned | 0.5 | 1.1 | 0.1
Sum | 100.1 | 100.3 | 100

Microrace | Bro1 | Bro2 | Emil
Scandinavian | 21.3 | 24.2 | 37.3
French & German | 10.5 | 14.9 | 0.8
British and Irish | 8.9 | 4.9 | 11
Finnish | 0 | 0 | 0.3
Common Northern | 10.7 | 7.5 | 42
Italian | 0.9 | 0.8 | 0
Sardinian | 0 | 0 | 0
Iberian | 0 | 0 | 0
Balkan | 0 | 0 | 0
Common Southern | 0.1 | 0.4 | 0
Ashkenazi | 0.1 | 0 | 2.9
Eastern | 0 | 0 | 4
Common European | 0.1 | 0.4 | 1.5
Middle Eastern | 42 | 40.8 | 0
North African | 0.3 | 0.2 | 0.2
Common MENA | 0.2 | 0.3 | 0
South Asian | 2.8 | 3.4 | 0
East Asian & Amerindian
East Asian
Japanese | 0.2 | 0 | 0
Mongolian | 0.1 | 0.2 | 0
Korean | 0 | 0 | 0
Yakut | 0 | 0 | 0
Chinese | 0 | 0 | 0
Common East Asian | 0.5 | 0.2 | 0
Southeast Asian | 0.2 | 0 | 0
Amerindian | 0 | 0.1 | 0
Common East Asian & Amerindian | 0.1 | 0.1 | 0
Sub-Saharan African
East | 0.3 | 0.3 | 0
West | 0.2 | 0.4 | 0
Central & South | 0 | 0 | 0
Common Sub-Sahara African | 0.1 | 0.1 | 0
Oceanian | 0 | 0 | 0
Unassigned | 0.5 | 1.1 | 0.1
Sum | 100.1 | 100.3 | 100.1


Note that I have used data from all three zoom levels. Sometimes people will ask the nonsensical question “How many races are there?” Well, it depends on how much you want to zoom in. 23andme supports three zoom-levels. I have called the groups identified macro-, meso- and microraces.

So we see that the siblings are almost but not exactly the same. As Jason Malloy has pointed out, this is a very important fact because it allows for a sibling-control study akin to Murray (2002). In this design, researchers find full siblings, measure some predictor variable(s) from each sibling and compare them on the outcome variable(s). This is an important design because it removes the common environment (between-family effects) confound that makes interpretation of regression results difficult, e.g. those in The Bell Curve (Herrnstein and Murray, 1994). Murray (2002) used each sibling’s IQ to predict socioeconomic outcomes at adulthood (age 30-38): income, marriage and birth out of wedlock. I reproduce the tables below:


The results are similar to the results from regression modeling presented in The Bell Curve. In other words, for this question, the effects were not due to the common environment confound.

The same design can be used for the question of whether racial ancestry predicts outcome variables such as general cognitive ability (g factor, IQ, etc.), income, educational attainment and crime rate. Since siblings differ somewhat in their ancestry (as was shown in the tables and figures above), if the genetic hypothesis for the trait is true, the differences in ancestry will slightly predict the level of the trait.

In practice, for this to work, one will need a large sample of sibling sets (pairs, triples, etc.). To make it easy, they should not be admixed from more than 2 genetic clusters/races. So e.g. African Americans in the US are good for this purpose as they are mostly a mix of European and African genes, but there are other similar groups in the world: Colored in South Africa, Greenlanders in Denmark and Greenland (Moltke et al, 2015), admixed Hawaiians, basically everybody in South America (see admixture project, part I).
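The logic of the sibling-control design can be illustrated with a small simulation. This is a sketch on made-up data only: every parameter value is hypothetical, and "x" stands in for any predictor that varies between siblings (IQ in Murray's study, ancestry in the design proposed here). A shared family factor inflates the pooled regression slope, while regressing sibling differences on sibling differences removes it.

```python
import random
from statistics import fmean

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n_families=20000, true_slope=0.5, seed=1):
    """Return (pooled slope, within-family slope) for simulated sibling pairs."""
    rng = random.Random(seed)
    xs, ys, dxs, dys = [], [], [], []
    for _ in range(n_families):
        family = rng.gauss(0, 1)  # shared environment, confounded with x
        pair = []
        for _ in range(2):
            x = family + rng.gauss(0, 0.3)  # siblings differ only slightly
            y = true_slope * x + family + rng.gauss(0, 1)
            pair.append((x, y))
            xs.append(x)
            ys.append(y)
        # Differencing within a pair cancels the shared family term.
        dxs.append(pair[0][0] - pair[1][0])
        dys.append(pair[0][1] - pair[1][1])
    return ols_slope(xs, ys), ols_slope(dxs, dys)

pooled, within = simulate()
print(f"pooled slope: {pooled:.2f}, within-family slope: {within:.2f}")
```

In this toy setup the pooled slope comes out well above the true value of 0.5, while the within-family slope lands close to it. Because siblings differ so little on the predictor, the within-family estimate is noisy, which is why a large sample of sibling sets is needed.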


April 5, 2015

The general brain factor, working memory, parental income and education, and racial admixture


Remains to be done:

  • Admixture analysis (doing)
  • Proofreading and editing
  • Deciding how to control for age and scanner (technical question)


I explore a large (N≈1000), open dataset of brain measurements and find a general factor of brain size (GBSF) that covers all regions except possibly the amygdala (loadings near-zero, 3 out of 4 negative). It is very strongly correlated with total brain size volume and surface area (rs>.9). The factor was (near)identical across genders after adjustments for age were made (factor congruence 1.00).

GBSF had similar correlations to cognitive measures as did other aggregate brain size measures: total cortical area and total brain volume. I replicated the finding that brain measures were associated with parental income and educational attainment.
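For readers unfamiliar with factor congruence, the statistic usually meant is Tucker's congruence coefficient between two vectors of factor loadings. A minimal sketch follows; that this is the exact statistic used here is an assumption, and the loadings are hypothetical numbers chosen for illustration, not values from the dataset.

```python
import math

def congruence(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(loadings_a, loadings_b))
    den = math.sqrt(sum(a * a for a in loadings_a)
                    * sum(b * b for b in loadings_b))
    return num / den

# Hypothetical loadings for the same four regions in two groups.
group_1 = [0.80, 0.75, 0.70, 0.05]
group_2 = [0.78, 0.77, 0.69, -0.02]
print(round(congruence(group_1, group_2), 3))
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so two factors with uniformly high loadings score close to 1 even if the rank order of the loadings differs a little.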



A recent paper by Noble et al (2015) has gotten considerable attention in the media. An interesting fact about the paper is that most of the data was published in the paper, perhaps inadvertently. I was made aware of this fact by an observant commenter, FranklinDMadoff, on the blog of James Thompson (Psychological Comments). In this paper I make use of the same data, revisit their conclusions, and do some analyses of my own.

The abstract of the paper reads:

Socioeconomic disparities are associated with differences in cognitive development. The extent to which this translates to disparities in brain structure is unclear. We investigated relationships between socioeconomic factors and brain morphometry, independently of genetic ancestry, among a cohort of 1,099 typically developing individuals between 3 and 20 years of age. Income was logarithmically associated with brain surface area. Among children from lower income families, small differences in income were associated with relatively large differences in surface area, whereas, among children from higher income families, similar income increments were associated with smaller differences in surface area. These relationships were most prominent in regions supporting language, reading, executive functions and spatial skills; surface area mediated socioeconomic differences in certain neurocognitive abilities. These data imply that income relates most strongly to brain structure among the most disadvantaged children.

The results are not all that interesting, but the dataset is very large for a neuroscience study: the median of the median sample sizes across 49 meta-analyses is 116.5 (Button et al, 2013; based on the data in their Table 1). Furthermore, they provide a very large set of different, non-overlapping brain measurements which are useful for a variety of analyses, and they provide genetic admixture data which can be used for admixture mapping.

Why their results are as expected

The authors give their results (positive relationships between various brain size measures and parental educational and economic variables) environmental interpretations. For instance:

It is possible that, in these regions, associations between parent education and children’s brain surface area may be mediated by the ability of more highly educated parents to earn higher incomes, thereby having the ability to purchase more nutritious foods, provide more cognitively stimulating home learning environments, and afford higher quality child care settings or safer neighborhoods, with more opportunities for physical activity and less exposure to environmental pollutants and toxic stress. It will be important in the future to disambiguate these proximal processes by measuring home, family and other environmental mediators.

However, one could also expect the relationship to be due to general cognitive ability (GCA; aka. general intelligence) and its relationship to favorable educational and economic outcomes, as well as brain measures. Figure 1 illustrates this expected relationship:

Figure 1

Figure 1 – Relationships between variables

The purple line is the one the authors are often arguing for based on their observed positive relationships. As can be seen in the figure, this positive relationship is also expected because of parental education/income’s positive relationship to parental GCA, which is related to parental brain properties which are highly heritable. Based on these well-known relationships, we can estimate some expected correlations. The true score relationship between adult educational attainment and GCA is somewhere around .56 (Strenze, 2007).

The relationship between GCA and whole brain size is around .24-.28, depending on whether one wants to use the unweighted mean, n-weighted mean or median, and which studies one includes of those collected by Pietschnig et al (2014). I used healthy samples (as opposed to clinical) and only FSIQ. This value is uncorrected for measurement error of the IQ test, typically assumed to be around .90. If we choose the conservative value of .24 and then correct with .90, we get .27 as an estimated true score correlation.

The heritability of whole brain size is very high. Bouchard (2014) summarized a few studies: one found cerebral total h^2 = .89, another whole-brain grey matter .82 and whole-brain white matter .87, and a third total brain volume .80. Perhaps there is some publication bias in these numbers, so we can choose .80 as an estimate. We then correct this for measurement error and get .89. None of the previous studies were corrected for restriction of range, which is fairly common because most studies use university students (Henrich et al, 2010) who average perhaps 1 standard deviation above the population mean in GCA. If we multiply these numbers we get an estimate of r=.13 between parental education and total brain volume or a similar measure. As for income, the expected correlation is somewhat lower because the relationship between GCA and income is weaker, perhaps .23 (Strenze, 2007). This gives .05. However, Strenze did not account for the non-linearity of the income x GCA relationship, so it is probably somewhat higher.
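The arithmetic behind these expected correlations can be retraced in a few lines. This is only a sketch of the calculation as described in the text; the variable names are illustrative. Note that both corrections here divide by the assumed reliability of .90 directly, which is what reproduces the .27 and .89 above. The usual Spearman correction for a correlation divides by the square root of the reliability instead, which would give a somewhat smaller value for the GCA x brain size correlation.

```python
# Values taken from the text; names are illustrative only.
RELIABILITY = 0.90          # assumed reliability used for both corrections

r_edu_gca = 0.56            # education x GCA (Strenze, 2007)
r_income_gca = 0.23         # income x GCA (Strenze, 2007)
r_gca_brain_obs = 0.24      # GCA x whole brain size, conservative value
h2_brain_obs = 0.80         # heritability of whole brain size, chosen estimate

# Corrections as done in the text: divide by the reliability.
r_gca_brain = r_gca_brain_obs / RELIABILITY   # about .27
h2_brain = h2_brain_obs / RELIABILITY         # about .89

# Chain the three values by multiplication, as in the text.
expected_edu = r_edu_gca * r_gca_brain * h2_brain
expected_income = r_income_gca * r_gca_brain * h2_brain

print(f"education x brain size: {expected_edu:.2f}")
print(f"income x brain size:    {expected_income:.2f}")
```

With unrounded intermediate values, this gives the .13 and .05 stated above.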

Initial analyses

Analysis was done in R. The code file, figures, and data are available in the supplementary material.

Collecting the data

The authors did not publish a single datafile with comments about the variables. Instead, various Excel files were attached to parts of the figures. There are 6 such files. They all contain the same number of cases and they overlap completely (as can be seen by the subjectID column). The 6 files however do not overlap completely in their columns and some of them have unique columns. These can all be merged into one dataset.
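The merge itself is simple to sketch. The example below is an illustration in Python, not the R code used for the analysis: it assumes each file has been read into a list of row dictionaries, and the helper name, the `age` column and the toy values are made up for the example. Only the `subjectID` key comes from the files described above.

```python
def merge_by_subject(tables, key="subjectID"):
    """Merge lists of row dicts into one row per subject, keeping all columns."""
    merged = {}
    for table in tables:
        for row in table:
            subject = merged.setdefault(row[key], {key: row[key]})
            for column, value in row.items():
                # Keep the first non-missing value seen for a shared column.
                if subject.get(column) is None:
                    subject[column] = value
    return list(merged.values())


# Toy stand-ins for two of the six spreadsheets (values are invented).
file_a = [{"subjectID": "P0001", "age": 9.5}, {"subjectID": "P0002", "age": 12.1}]
file_b = [{"subjectID": "P0001", "vol.WholeBrain": 1.2e6},
          {"subjectID": "P0002", "vol.WholeBrain": 1.3e6}]

dataset = merge_by_subject([file_a, file_b])
```

Because the files overlap completely in their cases, the result has one row per subject with the union of all columns.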

Dealing with missing data

The original authors dealt with this simply by relying on the complete cases only. This method can bias the results when the data is not missing completely at random. Instead, it is generally better to impute missing data (Sterne et al, 2009). Figure 1 shows the matrixplot of the data file.


The red areas mean missing data, except in the case of nominal variables which are for some reason always colored red (an error I think). Examining the structure of missing data showed that it was generally not possible to impute the data, since many cases were missing most of their values. One will have to exclude these cases. Doing so reduces the sample size from 1500 to 1068. The authors report having 1099 complete cases, but I’m not sure where the discrepancy arises.

Dealing with gender

Since males have much larger brain volumes than females, even after adjustment for body size, there is the question of how to deal with gender (no distinction is being made here between sex and gender). The original authors did this by regressing the effect out. However, in my experience, regression does not always accomplish this perfectly, so when possible one should just split the sample by gender and calculate results in each one-gender sample. One cannot do the sample splitting when one is interested in the specific regression effect of gender, or when the resulting samples would be too small.

Dealing with age

This problem is tricky. The original authors used age and age² to deal with age in a regression model. However, since I wanted to visualize the relationships between variables, this option was not useful to me because it would only give me the summary statistics with the effects of age, not the data. Instead, I calculated the residuals for all variables of interest after they were regressed on age, age² and age³. The cubic age was used to further catch non-linear effects of age, as noted by e.g. Jensen (2006: 132-133).
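To make the residualization step concrete, here is a minimal sketch in Python (the analysis itself was done in R, so this is not the code that was used). It fits an ordinary least squares polynomial in age and returns the residuals. The function names and the toy data are assumptions for illustration; age is centered before taking powers purely for numerical stability, which does not change the residuals.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def residualize_on_age(values, ages, degree=3):
    """Return residuals of `values` after regressing on age, age^2, ..., age^degree."""
    mean_age = sum(ages) / len(ages)
    # Design matrix: intercept plus powers of centered age.
    design = [[(a - mean_age) ** p for p in range(degree + 1)] for a in ages]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return [y - f for y, f in zip(values, fitted)]


# Toy data: a measure that is an exact cubic function of age.
ages = [3, 5, 8, 10, 13, 16, 18, 20]
volume = [1.0 + 0.5 * a - 0.02 * a**2 + 0.0003 * a**3 for a in ages]
residuals = residualize_on_age(volume, ages)
```

In the toy data the measure is fully determined by age, so the residuals are zero up to rounding error. With real data, the residuals are what remains of each variable once the cubic age trend is removed, and they can be plotted directly.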

Dealing with scanning site

One peculiar feature of the study not discussed by the authors was the relatively large effect of different scanners on their results; see e.g. their Table 3. To avoid scanning site influencing the results, I also regressed this out (as a nominal variable with 13 levels).

Dealing with size

The dataset does not have size measures thus making it impossible to adjust for body size. This is problematic as it is known that body size correlates with GCA both within and between species. We are interested in differences in brain size holding body size equal. This cannot be done in the present study.

Factor analyzing brain size measurements

Why would one want to factor analyze brain measures?

The short answer is the same as that to the question: why would one want to factor analyze cognitive ability data? The answer: To explore the latent relationships in the data that are not immediately obvious. A factor analysis will reveal whether there is a general factor of some domain, which can be a theoretically very important discovery (Dalliard, 2013; Jensen, 1998:chapter 2). If there is no general factor, this will also be revealed and may be important as well. This is not to say that general factors or the lack thereof are the only interesting thing about the factor structure; multifactor structures are also interesting, whether orthogonal (uncorrelated) or as part of a hierarchical solution (Jensen, 2002).

The long answer is that human psychology is fundamentally a biological fact, a matter of brain physics and chemistry. This is not to say that important relationships can not fruitfully be described better at higher-levels (e.g. cognitive science), but that ultimately the origin of anything mental is biology. This fact should not be controversial except among the religious, for it is merely the denial of dualism, of ghosts, spirits, gods and other immaterial beings. As Jensen (1997) wrote:

Although the g factor is typically the largest component of the common factor variance, it is the most “invisible.” It is the only “factor of the mind” that cannot possibly be described in terms of any particular kind of knowledge or skill, or any other characteristics of psychometric tests. The fact that psychometric g is highly heritable and has many physical and brain correlates means that it is not a property of the tests per se. Rather, g is a property of the brain that is reflected in observed individual differences in the many types of behavior commonly referred to as “cognitive ability” or “intelligence.” Research on the explanation of g, therefore, must necessarily extend beyond psychology and psychometrics. It is essentially a problem for brain neurophysiology. [my emphasis]

If GCA is a property of the brain, or at least if there is an analogous general brain performance factor, it may be possible to find it with the same statistical methods that found the GCA. Thus, to find it, one must factor analyze a large, diverse sample of brain measurements that are known to correlate individually with GCA in the hope that there will be a general factor which will correlate very strongly with GCA. There is no guarantee, as I see it, that this will work, but it is something worth trying.

In their chapter on brain and intelligence, Colom and Thompson (2011) write:

The interplay between genes and behavior takes place in the brain. Therefore, learning the language of the brain would be crucial to understand how genes and behavior interact. Regarding this issue, Kovas and Plomin (2006) proposed the so -called “ generalist genes ” hypothesis, on the basis of multivariate genetic research findings showing significant genetic overlap among cognitive abilities such as the general factor of intelligence ( g ), language, reading, or mathematics. The hypothesis has implication for cognitive neuroscience, because of the concepts of pleiotropy (one gene affecting many traits) and polygenicity (many genes affecting a given trait). These genetic concepts suggest a “ generalist brain ” : the genetic influence over the brain is thought to be general and distributed.

Which brain measurements have so far been found to correlate with GCA (or its IQ proxy)?

Below I have compiled a list of brain measurements that have at some point been found to be correlated with GCA or IQ scores:

  • Brain evoked potentials: habituation time (Jensen, 1998:155)
  • Brain evoked potentials: complexity of waveform (Deary and Caryl, 1997)
  • Brain intracellular pH-level (Jensen, 1998:162)
  • Brain size: total and brain regions (Jung and Haier, 2007)
  • Of the above, grey matter and white matter separately
  • Cortical thickness (Deary et al, 2010)
  • Cortical development (Shaw et al, 2006)
  • Nerve conduction velocity (Deary and Caryl, 1997)
  • Brain wave (EEG) coherence (Jensen, 2002)
  • Event related desynchronization of brain waves (Jensen, 2002)
  • White matter lesions (Turken et al, 2008)
  • Concentrations of N-acetyl aspartate (Jung et al, 2009)
  • Water diffusion parameters (Deary et al, 2010)
  • White matter integrity (Deary et al, 2010)
  • White matter network efficiency (Li et al, 2009)
  • Cortical glucose metabolic rate during mental activity / Neural efficiency (Neubauer et al, 2009)
  • Uric acid level (Jensen, 1998:162)
  • Density of various regions (Frangou et al, 2004)
  • White matter fractional anisotropy (Navas‐Sánchez et al, 2014; Kim et al, 2014)
  • Reliable responding to changing inputs (Euler et al, 2015)

Most of the references above lead to the reviews I relied upon (Deary and Caryl, 1997; Jensen, 1998, 2002; Deary et al, 2010). There are surely more, and probably a large number of the above are false positives. Some I could not find a direct citation for. We cannot know which are false positives until large datasets are compiled with these measures as well as a large number of cognitive tests. A simple WAIS battery won’t suffice; there need to be elementary cognitive tests too, and other tests that vary more in content, type and g-loading. This is necessary if we are to use the method of correlated vectors, as this does not work well without diversity in factor indicators. It is also necessary if we are to examine non-GCA factors.

My hypothesis is that if there is a general brain factor, then it will have a hierarchical structure similar to GCA. Figure 2 shows a hypothetical structure of this.

Figure 2

Notes: Squares are latent variables and circles are observed variables. I am aware this is the opposite of normal practice (e.g. Beaujean, 2014), but text is difficult to fit into circles.

Of these, the speed factor has to do with speed of processing which can be enhanced in various ways (nerve conduction velocity, higher ‘clock’ frequency). Efficiency has to do with efficient use of resources (primarily glucose). Connectivity has to do with better intrabrain connectivity, either by having more connections, less problematic connections or similar. Size has to do with having more processing power by scaling up the size. Some areas may matter more than others for this. Integrity has to do with withstanding assaults, removing garbage (which is known to be the cause of many neurodegenerative diseases) and the like. There are presumably more factors, and some of mine may need to be split.

Previous studies and the present study

Altho factor analysis is common in differential psychology and related fields, it is somewhat rare outside of those. And when it is used, it is often done in ways that are questionable (see e.g. controversy surrounding Hampshire et al (2012): Ashton et al (2014a), Hampshire et al (2014), Ashton et al (2014b), Haier et al (2014a), Ashton et al (2014c), Haier et al (2014b)). On the other hand, factor analytic methods have been used in a surprisingly diverse collection of scientific fields (Jöreskog 1996; Cudeck and MacCallum, 2012).

I am only familiar with one study applying factor analysis to different brain measures and it was a fairly small study at n=132 (Pennington et al, 2000). They analyzed 13 brain regions and reported a two-factor solution. It is worth quoting their methodology section:

Since the morphometric analyses yield a very large number of variables per subject, we needed a data reduction strategy that fit with the overall goal of exploring the etiology of individual differences in the size of major brain structures. There were two steps to this strategy: (1) selecting a reasonably small set of composite variables that were both comprehensive and meaningful; and (2) factor analyzing the composite variables. To arrive at the 13 composite variables discussed earlier, we (1) picked the major subcortical structures identified by the anatomic segmentation algorithms, (2) reduced the set of possible cortical variables by combining some of the pericallosal partitions as described earlier, and (3) tested whether it was justifiable to collapse across hemispheres. In the total sample, there was a high degree of correlation (median R=.93, range=.82-.99) between the right and left sides of any given structure; it thus seemed reasonable to collapse across hemispheres in creating composites. We next factor-analyzed the 13 brain variables in the total sample of 132 subjects, using Principal Components factor analysis with Varimax rotation (Maxwell & Delaney, 1990). The criteria for a significant factor was an eigenvalue>l.0, with at least two variables loading on the factor.

The present study makes it possible to perform a better analysis. The sample is about 8 times larger and has 27 non-overlapping measurements of brain size, broadly speaking. The major downside of the variables in the present study is that the cerebrum is not divided into smaller areas as done in their study. Given the very large sample size, one could use 100 variables or more.

The available brain measures are:

  1. cort_area.ctx.lh.caudalanteriorcingulate
  2. cort_area.ctx.lh.caudalmiddlefrontal
  3. cort_area.ctx.lh.fusiform
  4. cort_area.ctx.lh.inferiortemporal
  5. cort_area.ctx.lh.middletemporal
  6. cort_area.ctx.lh.parsopercularis
  7. cort_area.ctx.lh.parsorbitalis
  8. cort_area.ctx.lh.parstriangularis
  9. cort_area.ctx.lh.rostralanteriorcingulate
  10. cort_area.ctx.lh.rostralmiddlefrontal
  11. cort_area.ctx.lh.superiortemporal
  12. cort_area.ctx.rh.caudalanteriorcingulate
  13. cort_area.ctx.rh.caudalmiddlefrontal
  14. cort_area.ctx.rh.fusiform
  15. cort_area.ctx.rh.parsopercularis
  16. cort_area.ctx.rh.parsorbitalis
  17. cort_area.ctx.rh.parstriangularis
  18. cort_area.ctx.rh.rostralanteriorcingulate
  19. cort_area.ctx.rh.rostralmiddlefrontal
  20. vol.Left.Cerebral.White.Matter
  21. vol.Left.Cerebral.Cortex
  22. vol.Left.Hippocampus
  23. vol.Left.Amygdala
  24. vol.Right.Cerebral.White.Matter
  25. vol.Right.Cerebral.Cortex
  26. vol.Right.Hippocampus
  27. vol.Right.Amygdala

I am not an expert in neuroscience, but as far as I know, the above measurements are independent and thus suitable for factor analysis. The authors reported additional aggregate measures such as total surface area and total volume. They also reported total intracranial volume, which permits the calculation of another two brain measurements: the non-brain volume of the cranium (subtracting total brain volume from total intracranial volume), and the proportion of intracranial volume used for brain.
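The two derived measures are simple to compute. The sketch below is illustrative only: the function name and the example volumes are made up, and it assumes both inputs are in the same units.

```python
def derived_cranial_measures(whole_brain_volume, intracranial_volume):
    """Return (non-brain cranial volume, proportion of the cranium filled by brain)."""
    non_brain_volume = intracranial_volume - whole_brain_volume
    brain_proportion = whole_brain_volume / intracranial_volume
    return non_brain_volume, brain_proportion


# Made-up volumes in mm^3, for illustration only.
non_brain, proportion = derived_cranial_measures(1_200_000, 1_500_000)
```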

The careful reader has perhaps noticed something bizarre about the dataset, namely that there is an unequal number of left hemisphere (“lh”) and right hemisphere (“rh”) regions (11 vs. 8). I have no idea why this is, but it is somewhat problematic in factor analysis since this weights some variables twice as well as weighting the left side a bit more.

The present dataset is inadequate for properly testing the general brain factor hypothesis because it only has measurements from one domain: size. The original authors may have more measurements they did not publish. However, one can examine the existence of the brain size factor, as a prior test of the more general hypothesis.

Age and overall brain size

As an initial check, I plotted the relationship between total brain size measures and age. These are shown in Figures 3 and 4.

Figure 3 Figure 4

Curiously, these show that the size increase only occurs up to about age 8 to 10 or so. I was under the impression that brain size continued to go up until the body in general stopped growing, around 15-20 years. However, this finding does not appear to be inconsistent with other studies (e.g. Giedd, 1999). The relationship is clearly non-linear, so one will need to use the age corrections described above. To see if the correction worked, we plot the residualized total size variables against age. There should be near-zero correlation. Results are in Figures 5 and 6.

Figure 5 Figure 6

Instead we still see a slight correlation for both genders, both apparently due to a single outlier. Very odd. I examined these outliers (IDs: P0009 and P0010) but did not see anything special about them. I removed them and reran the residualization from the original data. This produced new outliers similar to before (with the IDs following them). When I removed those, new ones appeared. I figure it is due to some error with the residualization process. Indeed, a closer look revealed that the largest outliers (positive and negative) were always the first two indexes. I thus removed these before doing more analyses. The second largest outliers had no particular index. I tried removing more age outliers, but it was not possible to completely remove the correlations between age and the other variables (usually remained near r=.03). Figure 6a shows the same as Figure 6 just without the two outliers.

Figure 6a

The genders are somewhat displaced on the age variable, but if one looks at the x-axis, one can see that this is in fact a very, very small difference.

General brain size factor with and without residualization

Results for the factor analysis without residualization are shown in Figure 7. I used the fa() function from the psych package with default settings: 1 factor extracted with the minimum residuals method. Previous studies have shown factor extraction method to be of little importance as long as it isn’t principal components with a smaller number of variables (Kirkegaard, 2014).

Figure 7

We see that the factors are quite similar (factor congruence .95) but that the male factor is quite a bit stronger (var% M/F 26 vs. 16). This suggests that the factor either works differently in the genders, or there is error in the results. If it is error, we should see an improvement after removing some of it. Figure 8 shows the same plot using the residualized data.

Figure 8

The results were more similar now and stronger for both genders (var% M/F = 34 vs. 33).

The amygdala results are intriguing, suggesting that this region does not increase in size along with the rest of the brain. The right amygdala even had negative loadings in both genders.

Using all that’s left

The next thing one might want to do is extract multiple factors. I tried extracting various solutions with nfactors 3-5. These however are bogus models due to the near-1 correlation between the brain sides. This results in spurious factors that load on just 2 variables (left and right versions) with loadings near 1. One could solve this by either averaging those with 2 measurements, or using only those from the left side. It makes little difference because they correlate so highly. It should be noted tho that doing this means one can’t see any lateralization effects such as that suggested for the right amygdala.

I redid all the results using the left side variables only. Figure 9 shows the results.

Figure 9

Now all regions had positive loadings and the var% increased a bit for both genders to 36/36. Factor congruence was 1.00, even for the non-residualized data. It thus seems that the missing measures of the right side or the use of near-doubled measures had a negative impact on the results as well.

One can calculate other measures of factor strength/internal reliability, such as the average intercorrelation, Cronbach’s alpha, Guttman’s G6. These are shown in Table 1.

Table 1 – Internal reliability measures
Sample    Mean r    Alpha (raw)    Alpha (std.)    G6
Male      .33       .48            .88             .90
Female    .34       .45            .89             .90
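The standardized alpha values follow directly from the mean intercorrelation and the number of variables, which gives a quick consistency check. The sketch below uses the standard formula for standardized alpha. The number of variables (15) is an assumption on my part, counting the 11 left-hemisphere areas plus the 4 left-side volumes listed earlier.

```python
def standardized_alpha(mean_r, n_items):
    """Standardized Cronbach's alpha from the mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


N_MEASURES = 15  # assumption: 11 left-hemisphere areas + 4 left-side volumes

alpha_male = standardized_alpha(0.33, N_MEASURES)
alpha_female = standardized_alpha(0.34, N_MEASURES)
```

Under that assumption the formula gives values that round to .88 and .89, matching the standardized column. The raw alphas are much lower, plausibly because raw alpha depends on the variances of the measures, and these differ greatly between surface areas and volumes.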


Multiple factors

We are now ready to explore factor solutions with more factors. Three different methods suggested extracting at most 5 factors in both datasets (using nScree() from the nFactors package). I extracted solutions for 2 to 6 factors for each dataset, the last included by accident. All of these were extracted with the oblique rotation method oblimin, thus possibly returning correlated factors. The prediction from a hierarchical model is clear: factors extracted in this way should be correlated. Figures 10 to 14 show the factor loadings of these solutions.

Figure 10 Figure 11 Figure 12 Figure 13

Figure 14

So it looks like the results are pretty good with 4 factors and not too good with the others. The problem with this method is that the factors extracted may be similar/identical but not in the same order and with the same name. This means that the plots above may plot the wrong factors together, which defeats the entire purpose. So what we need is an automatic method of pairing up the factors correctly if possible. The exhaustive method is trying all the pairings of factors for each number of factors to extract, and then calculating some summary metrics or finding the best overall pairing combination. This would involve quite a lot of comparisons, since e.g. one can pair up 2 sets of, say, 5 factors one-to-one in 5! = 120 ways.

I settled for a quicker solution. For each factor solution pair, I calculated all the cross-analysis congruence coefficients. Then for each factor, I found the factor from the other analysis it had the highest congruence with and saved this information. This method can miss some okay but not great solutions, but I’m not overly concerned about those. In a good fit, the factors found in each analysis should map 1 to 1 to each other such that their highest congruence is with the analog factor from the other analysis.

From this information, I calculated the mean of the best congruence pairs, the minimum, and whether there was a mismatch. A mismatch occurs when two or more factors from one analysis map to (have their highest congruence with) the same factor from the other analysis. I calculated these three metrics for all the analyses performed above. The results are shown in Table 2.
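The matching procedure can be sketched as follows. This is an illustration in Python under stated assumptions, not the function used for the analysis: it uses Tucker's congruence coefficient, matches in one direction only (from the first sample to the second), and the function names and toy loadings are made up.

```python
from math import sqrt


def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    return sum(a * b for a, b in zip(x, y)) / sqrt(
        sum(a * a for a in x) * sum(b * b for b in y)
    )


def match_factors(loadings_a, loadings_b):
    """For each factor in A, find the factor in B with the highest congruence.

    Each argument is a list of factors, each factor a list of loadings.
    Returns (best_matches, mean_best, min_best, mismatch).
    """
    best_matches, best_values = [], []
    for factor_a in loadings_a:
        values = [congruence(factor_a, factor_b) for factor_b in loadings_b]
        best = max(range(len(values)), key=lambda j: values[j])
        best_matches.append(best)
        best_values.append(values[best])
    # A mismatch means two or more A factors picked the same B factor.
    mismatch = len(set(best_matches)) < len(best_matches)
    mean_best = sum(best_values) / len(best_values)
    return best_matches, mean_best, min(best_values), mismatch


# Toy loadings: the same two factors, listed in opposite order in the two samples.
male = [[0.8, 0.7, 0.1, 0.0], [0.1, 0.0, 0.9, 0.6]]
female = [[0.0, 0.1, 0.8, 0.7], [0.7, 0.8, 0.1, 0.1]]
matches, mean_best, min_best, mismatch = match_factors(male, female)
```

In the toy example the first male factor is matched to the second female factor and vice versa, with no mismatch, so `matches` also gives the reordering needed before plotting the factors side by side.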

Table 2 – Cross-analysis comparison metrics
Factors.extracted    Mean best congruence    Min best congruence    factor.mismatch
2                    0.825                   0.73                   FALSE
3                    0.713                   0.37                   TRUE
4                    0.960                   0.93                   FALSE
5                    0.720                   0.35                   TRUE
6                    0.765                   0.58                   FALSE


As can be seen, the two analyses with 4 factors were a very good match. Those with 3 and 5 were terrible, as they produced factor mismatches. The analyses with 2 and 6 were also okay.

The function for going thru all the oblique solutions for two samples also returns the necessary information to match up the factors if they need reordering. If there is a mismatch, this operation is nonsensical, so I won’t re-do all the plots. The plot above with 4 factors just happens to already be correctly ordered. This however need not be the case. The only plot that needs to be redone is that with 6 factors. It is shown in Figure 15.

Figure 15

Compare with figure 14 above. One might wonder whether the 4 or 6 factor solutions are the best. In this case, the answer is the 4 factor solutions because the female 6 factor solution is illegal — one factor loading is above 1 (“a Heywood case”). At present, given the relatively few regional measures, and the limitation to only volume and surface measures, I would not put too much effort into theorizing about the multifactor structure found so far. It is merely a preliminary finding and may change drastically when more measures are added or measures are sampled differently.

A more important finding from all the multifactor solutions was that all produced correlated factors, which indicates a general factor.

Aggregate measures and the general brain size factor

So, the general brain size factor (GBSF) may exist, but is it useful? At first, we may want to correlate the various aggregate variables. Results are in Table 3.

Table 3 – Correlations between aggregate brain measures
Variable                  Total surface area    Left surface area    vol.WholeBrain    vol.IntracranialVolume    GBSF
Total surface area        –                     0.997                0.869             0.746                     0.953
Left surface area         0.997                 –                    0.867             0.751                     0.953
vol.WholeBrain            0.832                 0.832                –                 0.822                     0.923
vol.IntracranialVolume    0.638                 0.642                0.798             –                         0.776
GBSF                      0.950                 0.950                0.905             0.711                     –

Notes: Correlations above diagonal are males, below females.

The total areas of the brain are almost symmetrical: the correlation of the total surface area and left side only is a striking .997. Intracranial volume is a decent proxy (.822) for whole brain volume, but is somewhat worse for total surface area (.746). GBSF has very strong correlations with the surface areas (.95), but not quite as strong as the analogous situation in cognitive data: IQ and extracted general factor (GCA factor) usually correlate .99 with a reasonable sample of subtests: Ree and Earles (1991) reported that an average GCA factor correlated .991 with an unweighted sum score in a sample of >9k, Kirkegaard (2014b) found a .99 correlation between extracted GCA and an unweighted sum in a Dutch university sample of ~500.

Correlations with cognitive measures

The authors have data for 4 cognitive tests, however, data are only public for 2 of them. These are in the authors’ words:

Flanker inhibitory control test (N = 1,074).
The NIH Toolbox Cognition Battery version of the flanker task was adapted from the Attention Network Test (ANT). Participants were presented with a stimulus on the center of a computer screen and were required to indicate the left-right orientation while inhibiting attention to the flankers (surrounding stimuli). On some trials the orientation of the flankers was congruent with the orientation of the central stimulus and on the other trials the flankers were incongruent. The test consisted of a block of 25 fish trials (designed to be more engaging and easier to see to make the task easier for children) and a block of 25 arrow trials, with 16 congruent and 9 incongruent trials in each block, presented in pseudorandom order. Participants who responded correctly on 5 or more of the 9 incongruent trials then proceeded to the arrows block. All children age 9 and above received both the fish and arrows blocks regardless of performance. The inhibitory control score was based on performance on both congruent and incongruent trials. A two-vector method was used that incorporated both accuracy and reaction time (RT) for participants who maintained a high level of accuracy (>80% correct), and accuracy only for those who did not meet this criteria. Each vector score ranged from 0 to 5, for a maximum total score of 10 (M = 7.67, s.d. = 1.86).
List sorting working memory test (N = 1,084).
This working memory measure requires participants to order stimuli by size. Participants were presented with a series of pictures on a computer screen and heard the name of the object from a speaker. The test was divided into the One-List and Two-List conditions. In the One-List condition, participants were told to remember a series of objects (food or animals) and repeat them in order, from smallest to largest. In the Two-List condition, participants were told to remember a series of objects (food and animals, intermixed) and then again report the food in order of size, followed by animals in order of size. Working memory scores consisted of combined total items correct on both One-List and Two-List conditions, with a maximum of 28 points (M = 17.71, s.d. = 5.39).

I could not locate a factor analytic study for the Flanker test, so I don’t know how g-loaded it is. Working memory (WM) is known to have a strong relationship to GCA (Unsworth et al, 2014). The WM variable should probably be expected to be the most g-loaded of the two. The implication given the causal hypothesis of brain size for GCA is that the WM test should show higher correlations to the brain measures. Figures X and X show the histograms for the cognitive measures.


Note that the x-values do not have any direct interpretation, as they are residualized values, not raw values. The Flanker distribution is bimodal. It seems that a significant part of the sample did not understand the test and thus did very poorly. One should probably either remove them or use a non-parametric measure if one wanted to rely on this variable. I decided to remove them, since the sample was large enough that this wasn’t a big problem. The procedure reduced the skew from -1.3/-1.1 to -.2/-.1 for the male and female samples respectively, and reduced the sample sizes from 548/516 to 522/487. One could plausibly combine the two tests into one measure, which would perhaps be a better estimate of GCA than either of them alone. This would be the case if their g-loadings were about equal. If, however, one is much more g-loaded than the other, combining them would degrade the measurement towards a middle level. I combined the two measures by first normalizing them (to put them on the same scale) and then averaging them.
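The removal and combination steps can be sketched in R. This is a minimal illustration on simulated data, not the code used for the analysis: the variable names, the cutoff of 4 for the low-scoring Flanker cluster, and the reading of ‘normalizing’ as z-scoring are all assumptions.

```r
set.seed(1)
n = 500
g = rnorm(n)                                      # simulated common factor
wm = 17.7 + 5.4 * (0.6 * g + 0.8 * rnorm(n))      # simulated working memory scores
flanker = 7.7 + 1.9 * (0.6 * g + 0.8 * rnorm(n))  # simulated Flanker scores
flanker[sample(n, 25)] = runif(25, 0, 3)          # simulated cluster of very low scorers

d = data.frame(wm, flanker)
d = d[d$flanker > 4, ]                            # hypothetical cutoff removing the low mode
# z-score each test, then average the two z-scores per person
d$combined = rowMeans(scale(d[, c("wm", "flanker")]))
```

If one test is much more g-loaded than the other, an unweighted mean like this pulls the composite towards the weaker test, which is the degradation described above.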

Given the very high correlations between the GBSF of these data and the other aggregate measures, it is not expected that the GBSF will correlate much more strongly with cognitive measures than the other aggregate brain measures. Table X shows the correlations.

Table X – Correlations between cognitive measures and aggregate brain size measures
                          Males                      Females
Variable                  WM     Flanker  Combined   WM     Flanker  Combined
Flanker                   0.407                      0.393
WM.Flanker.mean           0.830  0.847               0.824  0.845
[row label missing]       0.302  0.138    0.235      0.236  0.201    0.237
[row label missing]       0.302  0.137    0.235      0.239  0.203    0.238
vol.WholeBrain            0.263  0.103    0.201      0.158  0.120    0.146
vol.IntracranialVolume    0.213  0.101    0.170      0.154  0.101    0.137
GBSF                      0.311  0.147    0.252      0.223  0.181    0.218


As for the GBSF, given that it is a ‘distillate’ (Jensen’s term), one would expect it to have slightly higher correlations with the cognitive measures than the merely unweighted ‘sum’ measures. This was the case for males, but not females. In general, the female correlations were weaker, especially for whole brain volume x WM (.263 vs. .158). Despite the large sample sizes, this difference is not very certain: the 95% confidence interval for the difference is -.01 to .22. A larger sample is necessary to examine this question. The finding is intriguing in that, if real, it would offer an alternative solution to the Ankney-Rushton anomaly, that is, the fact that males have larger brains and brain size is related to IQ scores, yet males do not consistently perform better on IQ tests (Jackson and Rushton, 2006). Note however that the recent large meta-analysis of brain size x IQ studies did not find an effect of gender, so perhaps the above results are a coincidence (Pietschnig et al 2014).
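A confidence interval of this kind can be approximated with Fisher's z transformation. The sketch below assumes the male and female samples are independent and uses the sample sizes 548 and 516 given earlier. `cor_diff_ci` is a hypothetical helper name, and the interval is on the z scale, which is close to the r scale for correlations this small.

```r
# Approximate CI for the difference between two independent correlations
cor_diff_ci = function(r1, n1, r2, n2, conf = .95) {
  z_diff = atanh(r1) - atanh(r2)            # difference on the Fisher z scale
  se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of that difference
  crit = qnorm(1 - (1 - conf) / 2)          # two-sided critical value
  c(lower = z_diff - crit * se, upper = z_diff + crit * se)
}

# whole brain volume x WM: males vs. females
ci = cor_diff_ci(.263, 548, .158, 516)
round(ci, 2)
```

The interval includes zero, which is why the sex difference in correlations cannot be considered established with this sample.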

We also see that the total cortical area variables were stronger correlates of cognitive measures than whole brain volume, but a larger sample is necessary to confirm this pattern.

Lastly, we see a moderately strong correlation between the two cognitive measures (≈.4). The combined measure was a weaker correlate of the criterion variables than WM alone, which is what one would expect if the Flanker test is a weaker measure of GCA than the WM test.

Correlations with parental education and income

It is time to revisit the results reported by the original authors, namely correlations between educational/economic variables and brain measures. I think the correlations between specific brain regions and criterion variables are mostly a fishing expedition yielding chance results (multiple testing), and of no particular interest unless strong predictions can be made before looking at the data. For this reason, I present only correlations with the aggregate brain measures, as seen in Table X.

Table X – Correlations between educational/economic variables and other variables
                          Males                    Females
Variable                  ED     ln_Inc  Income    ED     ln_Inc  Income
WM                        0.131  0.192   0.175     0.170  0.229   0.174
Flanker                   0.163  0.180   0.188     0.118  0.131   0.106
WM.Flanker.mean           0.168  0.215   0.206     0.178  0.215   0.165
[row label missing]       0.104  0.217   0.207     0.128  0.173   0.154
[row label missing]       0.108  0.215   0.208     0.133  0.170   0.152
vol.WholeBrain            0.103  0.190   0.195     0.064  0.112   0.078
vol.IntracranialVolume    0.126  0.157   0.159     0.086  0.104   0.100
GBSF                      0.109  0.206   0.204     0.100  0.157   0.137
ED                               0.559   0.542            0.561   0.513
ln_Inc                    0.559          0.866     0.561          0.855
Income                    0.542  0.866             0.513  0.855


Here the correlations of the combined cognitive measure were higher than those of WM, unlike before, so perhaps the earlier diagnosis was wrong. In general, the brain measures correlated more strongly with income than with education. This is despite the fact that GCA is generally more strongly correlated with educational attainment than with income. That was not the case in this sample, however: the correlations of WM and Flanker were stronger with the economic variables. Perhaps there is more range restriction in the educational variable than in the income one. An alternative, environmental interpretation is that it is affluence that causes the larger brains.

If we recall the theoretical predictions of the strength of the correlations, the income correlations are stronger than expected (actual .19/.09 M/F, predicted about .05), while the educational ones are a bit weaker than expected (actual .10/.06, predicted about .13). However, the sample sizes are not large enough for these results to be certain enough to question the theory.

Racial admixture

To my surprise, the sample had racial admixture data. This is surprising because such data have been available for testing the genetic hypothesis of group differences for many years, apparently without anyone publishing anything on the issue. As I argued elsewhere, this is odd given that a good dataset would be able to decisively settle the ‘race and intelligence’ controversy (Dalliard, 2014; Rowe and Rodgers, 2005; Rushton and Jensen, 2005). This silence is actually very good evidence for the genetic hypothesis, because if the hypothesis were false and these datasets showed it, it would be a great accomplishment for a mainstream scientist to publish a paper decisively showing that it was indeed false. If it were true, however, a mainstream scientist could not publish the result without risking personal attacks, getting fired and possibly being taken to court, as happened to academics who previously researched the topic (Gottfredson, 2005; Intelligence 1998; Nyborg 2011; 2003).

The genomic data however appeared to be an either/or (1 or 0) variable in the released data files. Oddly, some persons had no value for any racial group. It turns out that the data was merely rounded in the spreadsheet file. This explained why some persons had 0 for all groups: These persons did not belong at least 50% to any racial group, and thus they were assigned a 0 in every case.

I can think of two ways to count the number of persons in the main categories. One can count the total ‘summed’ persons. In this way, if person A has 50% ancestry from race R, and person B has 30%, this would sum to .8 persons. One can think of it as the number of pure-breed persons’ worth of ancestry from that group. Another way is to count everybody as 1 who is above some threshold for ancestry. I chose to use 20% and 80% as thresholds, which correspond to persons with substantial ancestry from that racial cluster and persons with mostly ancestry from that cluster. One could choose other values of course, and there is a degree of arbitrariness, but it is not important what the particular values are.
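Both counting methods are one-liners in R. The sketch below uses a small made-up data frame of ancestry proportions (one row per person); the column names and values are assumptions for illustration only, not the released data.

```r
# Hypothetical ancestry proportions; each row sums to 1
anc = data.frame(
  European   = c(0.95, 0.50, 0.10, 0.85, 0.30),
  African    = c(0.05, 0.30, 0.85, 0.00, 0.15),
  Amerindian = c(0.00, 0.20, 0.05, 0.15, 0.55)
)

summed = colSums(anc)          # 'persons' worth' of ancestry per group
over20 = colSums(anc > 0.20)   # persons with substantial ancestry from a group
over80 = colSums(anc > 0.80)   # persons with mostly ancestry from a group
rbind(summed, over20, over80)
```

The summed counts add up to the number of persons when every row sums to 1, whereas the >20% counts can add up to more than the sample size, because one person can pass that threshold for several groups.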

Results are in Table X.

Racial group               European  African   Amerindian  East Asian  Oceanian  Central Asia  Sum
Summed ancestry ‘persons’  686.364   134.2714  48.31457    163.49868   8.59802   26.95408      1068.00075
Persons with >20%          851       187       89          238         8         30            1403
Persons with >80%          647       105       3           121         0         21            897


Note that the number 1068 is the exact number of persons in the complete sample, which means that the summed ancestry for all groups has an error of a mere .00075.

Another way of understanding the data is to plot histograms of each racial group. These are shown below in Figures X to X.

[Figures X to X: histograms of European, African, Amerindian, East Asian, Oceanian and Central Asian ancestry]


Since European ancestry is the largest, the other plots are mostly empty except for the 0% bar. But we do see a fair amount of admixture in the dataset.

Regression, residualization, correlation and power

There are a couple of different methods one could use to examine the admixture data. A simple correlation is justified when dealing with a group that only has 2 sources of ancestry. This is the easiest case to handle. For this to work, the groups must have different genotypic means for the trait in question (GCA and brain size variables in this case) and there must be a substantially admixed population. Even given a large hypothesized genotypic difference, the expected correlation is actually quite small. For African Americans (such as those in the sample), European ancestry is about 15-25% depending on the exact sample. The standard deviation of their European ancestry is not always reported, but one can calculate it if one has some data, which we do.
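To see why the expected correlation is small, note that under a simple linear model, in which the expected trait value changes linearly with ancestry proportion, the correlation equals the slope times SD(ancestry) / SD(trait). The sketch below uses illustrative values only: a hypothesized gap d of 1 trait SD between 0% and 100% European ancestry, and an ancestry SD of .15. Both numbers and the two helper functions are assumptions made for the example, and the sample size formula is the usual Fisher z approximation.

```r
# Expected correlation between ancestry proportion and trait under a linear model.
# d = hypothesized trait gap between 0% and 100% ancestry, in trait units
expected_r = function(d, sd_ancestry, sd_trait = 1) d * sd_ancestry / sd_trait

# Approximate n needed to detect a correlation r (two-sided test)
n_for_power = function(r, alpha = .05, power = .80) {
  ((qnorm(1 - alpha / 2) + qnorm(power)) / atanh(r))^2 + 3
}

r = expected_r(d = 1, sd_ancestry = .15)   # illustrative inputs, not estimates
n = n_for_power(r)
c(expected_r = r, n_needed = ceiling(n))
```

With these inputs the expected correlation is only .15, and a sample of several hundred admixed persons is needed for 80% power.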

The first problem with this dataset is that there are no sociological race categories (“white”, “African American”, “Hispanic”, “Asian” etc.), but only genomic data. This means that to get an African American subsample, we must create one based on actual ancestry. There are two criteria that need to be met for inclusion in that group: 1) the person must be substantially African, 2) the person must be mostly a mix of European and African ancestry. Going with the values from before, this means that the person must be at least 20% African, and at least 80% combined European and African.
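The two inclusion criteria translate directly into a logical filter. The data frame below is made up for illustration (the column names and values are assumptions); only the two thresholds come from the text.

```r
# Hypothetical ancestry proportions for six persons
anc = data.frame(
  id         = 1:6,
  European   = c(0.95, 0.22, 0.10, 0.40, 0.05, 0.70),
  African    = c(0.03, 0.75, 0.88, 0.15, 0.50, 0.25),
  Amerindian = c(0.02, 0.03, 0.02, 0.45, 0.45, 0.05)
)

# 1) at least 20% African, 2) at least 80% combined European and African
is_aa = anc$African >= 0.20 & (anc$European + anc$African) >= 0.80
aa = anc[is_aa, ]

sd(aa$European)   # SD of European ancestry within the constructed subsample
```

Person 5 is half African but is excluded because the rest of that person's ancestry is mostly Amerindian, which is the purpose of the second criterion.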

Dealing with scanner and site

There are a variety of ways to use the data and they may or may not give similar results. First is the question of which variables to control for. In the earlier sections of this paper, I controlled for Age, Age², Age³ and Scanner (12 different). For producing the above ancestry plots and results I did not control for anything. Controlling the ancestry variables for scanner is problematic, as people of different races live in different places. Controlling for this thus removes the racial differences for no reason. One could similarly control for the site where the scanner is located (I did not do this earlier). We can compare site to scanner with a contingency table, as shown in Table X below:

Table X – Contingency table of scanner site and scanner #
Site/scanner 0 1 10 11 12 2 3 4 5 6 7 8 9
Cornel 0 0 0 96 0 0 0 0 0 0 0 0 0
Davis 0 0 0 0 0 0 0 0 0 0 114 0 0
Hawaii 0 0 0 0 0 0 0 0 0 202 0 0 0
KKI 0 0 0 0 0 0 103 0 0 0 0 0 0
MGH 0 0 0 0 0 0 0 0 115 0 0 0 13
UCLA 0 0 27 0 22 0 0 10 0 0 0 0 0
UCSD 109 93 0 0 0 0 0 0 0 0 0 0 0
UMMS 0 0 0 0 0 56 0 0 0 0 0 0 0
Yale 0 0 0 0 0 0 0 0 0 0 0 108 0


As we can see, these are clearly inter-dependent, given the obvious fact that each scanner has a particular location and was not moved around (all columns have only 1 cell with value>0). Some sites however have multiple scanners, while others have only one. E.g. UCSD has two scanners (#0 and #1), while KKI has only one (#3).
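The nesting of scanners within sites can also be checked programmatically rather than by eye. This is a minimal sketch on a toy data frame whose site and scanner pairings follow the table above; the column names `site` and `scanner` are assumptions.

```r
# Toy data: one row per participant
d = data.frame(
  site    = c("UCSD", "UCSD", "UCSD", "KKI", "KKI", "MGH", "MGH", "Yale"),
  scanner = c(0, 1, 0, 3, 3, 5, 9, 8)
)

tab = table(d$site, d$scanner)        # contingency table, site x scanner
sites_per_scanner = colSums(tab > 0)  # at how many sites each scanner appears
scanners_per_site = rowSums(tab > 0)  # how many scanners each site has

all(sites_per_scanner == 1)           # TRUE if every scanner sits at exactly one site
scanners_per_site
```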

Controlling for scanner however makes sense if we are looking at brain size variables, as this removes differences between measurements due to differences in the scanning equipment or (post-)processing. So perhaps one would want to control brain measurements for scanner and age effects, but only control the remaining variables for age effects.
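This split treatment can be sketched with residuals from two different regressions. The data are simulated and the variable names are assumptions. `poly(age, 3)` is used here as a stand-in for entering Age, Age² and Age³ separately, since it spans the same cubic space.

```r
set.seed(2)
n = 300
d = data.frame(
  age     = runif(n, 3, 20),
  scanner = factor(sample(0:3, n, replace = TRUE))
)
# simulated brain measure with age and scanner effects
d$brain = 1000 + 30 * d$age - 0.8 * d$age^2 +
  20 * as.integer(d$scanner) + rnorm(n, sd = 50)
# simulated cognitive score with an age effect only
d$wm = 5 + 1.2 * d$age - 0.02 * d$age^2 + rnorm(n, sd = 3)

# brain measures: control for the age polynomial and scanner
d$brain.res = resid(lm(brain ~ poly(age, 3) + scanner, data = d))
# other variables: control for the age polynomial only
d$wm.res = resid(lm(wm ~ poly(age, 3), data = d))
```

After this step the brain residuals have a mean of zero within every scanner, while the cognitive residuals (and, by the same logic, ancestry variables) keep any differences that happen to line up with scanner location.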

Dealing with gender

As before

To be continued…



  • Ashton, M. C., Lee, K., & Visser, B. A. (2014a). Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012). Personality and Individual Differences, 60, 3-7.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014b). Orthogonal factors of mental ability? A response to Hampshire et al. Personality and Individual Differences, 60, 13-15.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014c). Further response to Hampshire et al. Personality and Individual Differences, 60, 18-19.
  • Beaujean, A. A. (2014). Latent Variable Modeling Using R: A Step-by-Step Guide. Routledge.
  • Bouchard Jr, T. J. (2014). Genes, Evolution and Intelligence. Behavior genetics, 44(6), 549-577.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Colom, R., & Thompson, P. M. (2011). Intelligence by Imaging the Brain. The Wiley-Blackwell handbook of individual differences, 3, 330.
  • Cudeck, R., & MacCallum, R. C. (Eds.). (2012). Factor analysis at 100: Historical developments and future directions. Routledge.
  • Dalliard, M. (2013). Is Psychometric g a Myth?. Human Varieties.
  • Dalliard, M. (2014). The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ. Open Differential Psychology.
  • Deary, I. J., & Caryl, P. G. (1997). Neuroscience and human intelligence differences. Trends in Neurosciences, 20(8), 365-371.
  • Deary, I. J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201-211.
  • Dekaban, A.S. and Sadowsky, D. (1978). Changes in brain weights during the span of human life: relation of brain weights to body heights and body weights, Ann. Neurology, 4:345-356.
  • Euler, M. J., Weisend, M. P., Jung, R. E., Thoma, R. J., & Yeo, R. A. (2015). Reliable Activation to Novel Stimuli Predicts Higher Fluid Intelligence. NeuroImage.
  • Frangou, S., Chitins, X., & Williams, S. C. (2004). Mapping IQ and gray matter density in healthy young people. Neuroimage, 23(3), 800-805.
  • Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., … & Rapoport, J. L. (1999). Brain development during childhood and adolescence: a longitudinal MRI study. Nature neuroscience, 2(10), 861-863.
  • Gottfredson, L. S. (2005). Suppressing intelligence research: Hurting those we intend to help. In R. H. Wright & N. A. Cummings (Eds.), Destructive trends in mental health: The well-intentioned path to harm (pp. 155-186). New York: Taylor and Francis.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014a). A comment on “Fractionating Intelligence” and the peer review process. Intelligence, 46, 323-332.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014b). Yes, but flaws remain. Intelligence, 46, 341-344.
  • Hampshire, A., Highfield, R. R., Parkin, B. L., & Owen, A. M. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237.
  • Hampshire, A., Parkin, B., Highfield, R., & Owen, A. M. (2014). Response to:“Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012)”. Personality and Individual Differences, 60, 8-12.
  • Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world?. Behavioral and brain sciences, 33(2-3), 61-83.
  • Intelligence. (1998). Special issue dedicated to Arthur Jensen. Volume 26, Issue 3.
  • Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general mental ability from 100,000 17-to 18-year-olds on the Scholastic Assessment Test. Intelligence, 34(5), 479-486.
  • Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.
  • Jensen, A. R. (1997). The psychometrics of intelligence. In H. Nyborg (Ed.), The scientific study of human nature: Tribute to Hans J. Eysenck at eighty. New York: Elsevier. Pp. 221—239.
  • Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Praeger.
  • Jensen, A. R. (2002). Psychometric g: Definition and substantiation. The general factor of intelligence: How general is it, 39-53.
  • Jung, R. E. & Haier, R. J. (2007). The Parieto-Frontal Integration Theory (P-FIT) of intelligence: converging neuroimaging evidence. Behav. Brain Sci. 30, 135–154; discussion 154–187.
  • Jung, R. E. et al. (2009). Imaging intelligence with proton magnetic resonance spectroscopy. Intelligence 37, 192–198.
  • Jöreskog, K. G. (1996). Applied factor analysis in the natural sciences. Cambridge University Press.
  • Kim, S. E., Lee, J. H., Chung, H. K., Lim, S. M., & Lee, H. W. (2014). Alterations in white matter microstructures and cognitive dysfunctions in benign childhood epilepsy with centrotemporal spikes. European Journal of Neurology, 21(5), 708-717.
  • Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology.
  • Li, Y. et al. (2009). Brain anatomical network and intelligence. PLoS Comput. Biol. 5, e1000395.
  • Navas‐Sánchez, F. J., Alemán‐Gómez, Y., Sánchez‐Gonzalez, J., Guzmán‐De‐Villoria, J. A., Franco, C., Robles, O., … & Desco, M. (2014). White matter microstructure correlates of mathematical giftedness and intelligence quotient. Human brain mapping, 35(6), 2619-2631.
  • Neubauer, A. C. & Fink, A. (2009). Intelligence and neural efficiency. Neurosci. Biobehav. Rev. 33, 1004–1023.
  • Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., … & Sowell, E. R. (2015). Family income, parental education and brain structure in children and adolescents. Nature Neuroscience.
  • Nyborg, H. (2003). The sociology of psychometric and bio-behavioral sciences: A case study of destructive social reductionism and collective fraud in 20th century academia. Nyborg H.(Ed.). The scientific study of general intelligence. Tribute to Arthur R. Jensen, 441-501.
  • Nyborg, H. (2011). The greatest collective scientific fraud of the 20th century: The demolition of differential psychology and eugenics. Mankind Quarterly, Spring Issue.
  • Pennington, B. F., Filipek, P. A., Lefly, D., Chhabildas, N., Kennedy, D. N., Simon, J. H., … & DeFries, J. C. (2000). A twin MRI study of size variations in the human brain. Journal of Cognitive Neuroscience, 12(1), 223-232.
  • Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2014). Meta-Analysis of Associations Between Human Brain Volume And Intelligence Differences: How Strong Are They and What Do They Mean?. Available at SSRN 2512128.
  • Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3), 271-278.
  • Rowe, D. C., & Rodgers, J. E. (2005). Under the skin: On the impartial treatment of genetic and environmental hypotheses of racial differences. American Psychologist, 60(1), 60.
  • Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, public policy, and law, 11(2), 235.
  • Shaw, P. et al. (2006). Intellectual ability and cortical development in children and adolescents. Nature 440, 676–679.
  • Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., … & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. Bmj, 338, b2393.
  • Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401-426.
  • Turken, A. et al. (2008). Cognitive processing speed and the structure of white matter pathways: convergent evidence from normal variation and lesion studies. Neuroimage 42, 1032–1044
  • Unsworth, N., Fukuda, K., Awh, E., & Vogel, E. K. (2014). Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval. Cognitive psychology, 71, 1-26.

January 19, 2015

Admixture in the Americas: Admixture among US Blacks and Hispanics and academic achievement

Filed under: Genetics / behavioral genetics,Psychology — Tags: , — Emil O. W. Kirkegaard @ 22:06

Some time ago a new paper came out from the 23andme people reporting admixture among US ethnoracial groups (Bryc et al, 2014). Per our still on-going admixture project (current draft here), one could see if admixture predicts academic achievement (or IQ, if such were available). We (that is, John did) put together achievement data (reading and math scores) from the NAEP and the admixture data here.

Descriptive stats

Admixture studies do not work well if there is no or little variation within groups. So let’s first examine them. For blacks:

                      vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
BlackAfricanAncestry     1 31 0.74 0.04   0.74    0.74 0.03 0.64 0.83  0.19 -0.03    -0.38 0.01
BlackEuropeanAncestry    1 31 0.23 0.04   0.24    0.23 0.03 0.15 0.34  0.19  0.09    -0.30 0.01


So we see that there is little American (i.e. Amerindian) admixture in Blacks, because the African and European proportions add up to close to 100% (23+74=97). In fact, the correlation between African and European ancestry in Blacks is -.99. This also means that multiple regression is of little use here because of collinearity.
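The collinearity problem follows from the fact that two proportions summing to nearly 1 are almost perfect mirror images of each other. With two predictors the variance inflation factor is 1 / (1 - r²), so r = -.99 gives a VIF of about 50. The sketch below reproduces the situation with simulated state-level values; the means and spreads are loosely modelled on the table above and are assumptions, not the actual data.

```r
set.seed(3)
n = 31
african  = rnorm(n, .74, .04)      # simulated African ancestry by state
other    = runif(n, .01, .03)      # small remainder (Amerindian etc.)
european = 1 - african - other     # European ancestry is whatever is left

r   = cor(african, european)       # close to -1 by construction
vif = 1 / (1 - r^2)                # variance inflation factor with two predictors
c(r = r, vif = vif)
```

A VIF this large means the standard errors of the two regression coefficients are inflated several times over, so their separate effects cannot be estimated with any precision.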

White admixture data is also not very useful. It is almost exclusively European:

                      vars  n mean sd median trimmed mad  min max range  skew kurtosis se
WhiteEuropeanAncestry    1 51 0.99  0   0.99    0.99   0 0.98   1  0.02 -0.95     0.74  0

What about Hispanics (some sources call them Latinos)?

                       vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
LatinoEuropeanAncestry    1 34 0.73 0.07   0.72    0.73 0.05 0.57 0.90  0.33 0.34     0.22 0.01
LatinoAfricanAncestry     1 34 0.09 0.05   0.08    0.08 0.06 0.01 0.22  0.21 0.51    -0.69 0.01
LatinoAmericanAncestry    1 34 0.10 0.05   0.09    0.10 0.03 0.04 0.21  0.17 0.80    -0.47 0.01

Hispanics are fairly admixed. Overall, they are mostly European, but the range of African and American ancestry is quite high. Furthermore, due to the three-way variation, multiple regression should work. The ancestry intercorrelations are: -.42 (Afro x Amer), -.21 (Afro x Euro) and -.50 (Amer x Euro). There must also be another source, because 73+9+10 is only 92%. Where’s the last 8% admixture from?

Admixture x academic achievement correlations: Blacks

row.names BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
1 Math2013B -0.32 0.09 0.29
2 Math2011B -0.27 0.21 0.25
3 Math2009B -0.30 0.09 0.28
4 Math2007B -0.12 0.27 0.08
5 Math2005B -0.28 0.26 0.23
6 Math2003B -0.30 0.15 0.26
7 Math2000B -0.36 -0.08 0.34
8 Read2013B -0.25 0.14 0.22
9 Read2011B -0.33 0.22 0.30
10 Read2009B -0.40 -0.03 0.41
11 Read2007B -0.26 0.14 0.24
12 Read2005B -0.43 0.33 0.39
13 Read2003B -0.42 0.09 0.38
14 Read2002B -0.30 -0.10 0.27


Summarizing these results:

     vars  n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
Afro    1 14 -0.31 0.08  -0.30   -0.32 0.05 -0.43 -0.12  0.31  0.48     0.10 0.02
Amer    1 14  0.13 0.13   0.14    0.13 0.11 -0.10  0.33  0.43 -0.32    -1.07 0.03
Euro    1 14  0.28 0.08   0.28    0.29 0.06  0.08  0.41  0.33 -0.49     0.11 0.02

So we see the expected directions and order: for Blacks (who are mostly African), American admixture is positively related to achievement and European admixture more so. There is quite a bit of variation over the years. It is possible that this mostly reflects ‘noise’, e.g. changes in educational policies in the states, or just sampling error. It is also possible that the changes are due to admixture changes within states over time.

Admixture x academic achievement correlations: Hispanics

row.names LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
1 Math13H 0.20 -0.13 -0.10
2 Math11H 0.27 0.02 -0.02
3 Math09H 0.29 -0.32 0.04
4 Math07H 0.36 -0.14 -0.01
5 Math05H 0.38 -0.08 0.00
6 Math03H 0.37 -0.23 -0.08
7 Math00H 0.30 -0.09 -0.05
8 Read2013H 0.18 -0.44 0.33
9 Read2011H 0.21 -0.26 0.33
10 Read2009H 0.19 -0.44 0.33
11 Read2007H 0.13 -0.32 0.23
12 Read2005H 0.38 -0.30 0.23
13 Read2003H 0.32 -0.34 0.18
14 Read2002H 0.24 -0.23 0.08

And summarizing:

     vars  n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
Afro    1 14  0.27 0.08   0.28    0.28 0.12  0.13 0.38  0.25 -0.10    -1.49 0.02
Amer    1 14 -0.24 0.14  -0.24   -0.24 0.15 -0.44 0.02  0.46  0.17    -1.13 0.04
Euro    1 14  0.11 0.16   0.06    0.11 0.19 -0.10 0.33  0.43  0.23    -1.68 0.04

We do not see the results expected under the genetic model. Among Hispanics, who are 73% European on average, African admixture has a positive relationship to academic achievement. American admixture is negatively correlated and European positively, but the latter more weakly than African. The only thing that’s in line with the genetic model is that European is positive. On the other hand, the results are not in line with a null model either, because then we would expect the correlations to fluctuate around 0.

Note that the European admixture numbers are only positive for the reading tests. The reading tests are presumably those mostly affected by language bias (many Hispanics speak Spanish as a first language). If anything, the math results are worse for the genetic model.

General achievement factors

We can eliminate some of the noise in the data by extracting a general achievement factor for each group. I do this by first removing the cases with no data at all, and then imputing the rest.

Then we get the correlation like before. This should be fairly close to the means above:

 LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry 
                  0.28                  -0.36                   0.22

The European result is stronger with the general factor from the imputed dataset, but the order is the same.

We can do the same for the Black data to see if the imputation+factor analysis screws up the results:

 BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry 
                -0.35                  0.20                  0.31

These results are similar to before (-.31, .13, .28) with the American result somewhat stronger.


Perhaps if we plot the results, we can figure out what is going on. We can plot either the general achievement factor, or specific results. Let’s do both:

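Reading2013 plots

[Figures: scatterplots of Read2013H against African, American and European ancestry among Hispanics]

Math2013 plots

[Figures: scatterplots of Math2013H against African, American and European ancestry among Hispanics]

General factor plots

[Figures: scatterplots of the Hispanic general achievement factor against African, American and European ancestry]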

These did not help me understand it. Maybe they make more sense to someone who understands US demographics and history better.

Multiple regression

As mentioned above, the Black data should be mostly useless for multiple regression due to high collinearity. But the Hispanic data should be better. I ran models using two of the three ancestry estimates at a time since one cannot use all three (I think).

Generally, the predictors did not reach significance. Using the general achievement factor as the dependent variable, the standardized betas are:

LatinoAfricanAncestry LatinoAmericanAncestry
             0.1526765             -0.2910413
LatinoAfricanAncestry LatinoEuropeanAncestry
             0.3363636              0.2931108
LatinoAmericanAncestry LatinoEuropeanAncestry
           -0.32474678             0.06224425

The first is relative to European, the second to American, and the third to African. The results are not even consistent with each other: in the first, African>European; in the third, European>African. All results do show that Others>American, though.
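The ‘two at a time’ approach can be sketched in base R. The standardized betas above were presumably obtained with lm.beta from QuantPsyc; the hypothetical helper below gets the same kind of estimate by z-scoring all variables before fitting. The data are simulated (ancestry values are assumptions loosely matched to the Hispanic ranges above, and the outcome is pure noise), so only the mechanics carry over, not the numbers.

```r
set.seed(4)
n = 34
afro = runif(n, .01, .22)
amer = runif(n, .04, .21)
rem  = runif(n, .02, .10)          # unexplained remainder
euro = 1 - afro - amer - rem       # European ancestry is what is left
ach  = rnorm(n)                    # simulated achievement factor (noise)
d = data.frame(ach, afro, amer, euro)

# Standardized betas: z-score every variable in the model, then fit
std_betas = function(formula, data) {
  z = as.data.frame(scale(data[, all.vars(formula)]))
  coef(lm(formula, data = z))[-1]  # drop the intercept
}

b1 = std_betas(ach ~ afro + amer, d)   # omitted component: European
b2 = std_betas(ach ~ afro + euro, d)   # omitted component: American
b3 = std_betas(ach ~ amer + euro, d)   # omitted component: African
rbind(b1, b2, b3)
```

Each beta describes the effect of swapping the omitted ancestry (plus the remainder) for the included one, which is why the three models need not agree on the ordering of the groups.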

The remainder

There is something odd about the data: the ancestry estimates don’t sum to 1. I calculated the sum of the ancestry estimates, and then subtracted that from 1. Here are the results:

[Figures: remainder ancestry for Blacks and for Hispanics]

To these we can add simple descriptive stats:

                        vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
BlackRemainderAncestry     1 31 0.02 0.00   0.02    0.02 0.00 0.01 0.03  0.02 1.35     1.18 0.00
LatinoRemainderAncestry    1 34 0.08 0.05   0.07    0.07 0.03 0.02 0.34  0.32 3.13    12.78 0.01


So we see that there is a sizable ‘other’ proportion for Hispanics and a small one for Blacks. Presumably, the large outlier, Hawaii, reflects Asian admixture from Japanese, Chinese, Filipino and Native Hawaiian clusters. At least, these are the largest groups according to Wikipedia. For Blacks, the remainder is presumably Asian admixture as well.

Do these remainders correlate with academic achievement? For Blacks, r = .39 (p = .03), and for Hispanics r = -.24 (p = .18). So the direction is as expected for Blacks and stronger, but for Hispanics, it is in the right direction but weaker.

Partial correlations

What about partialing out the remainders?

LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
            0.21881404            -0.33114612             0.09329413
BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
           -0.2256171             0.1189219             0.2185139


Not much has changed. European correlation has become weaker for Hispanics. For Blacks, results are similar to before.
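Partialing out the remainder amounts to correlating the residuals of both variables after regressing each on the remainder. The sketch below shows this on simulated data; `partial_cor` is a hypothetical helper and all values are made up.

```r
# Partial correlation of x and y controlling for z, via residuals
partial_cor = function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

set.seed(5)
n = 34
remainder = runif(n, .02, .10)                          # simulated remainder ancestry
ancestry  = .3 - 1.5 * remainder + rnorm(n, sd = .05)   # ancestry tied to the remainder
ach       = -2 * remainder + rnorm(n, sd = .2)          # achievement tied to the remainder

cor(ancestry, ach)                                      # zero-order correlation
pc = partial_cor(ancestry, ach, remainder)              # remainder partialed out
pc
```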

Proposed explanations?

The results for Blacks are in line with genetic models. The Hispanic results are not, but they aren’t in line with the null model either. Perhaps it has something to do with generational effects. Perhaps one could find the % of first-generation Hispanics by state and add it to the regression model, or control for it using partial correlations.

Other ideas? Before calculating the results, John wrote:

Language, generation, and genetic assimilation are all confounded, so I thought it best to not look at them.

He may be right.

R code

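data = read.csv("BryceAdmixNAEP.tsv", sep="\t",row.names=1)
library(Hmisc) # for rcorr; loaded before psych so that psych's describe is the one used
library(car) # for vif and scatterplot
library(psych) # for describe and fa
library(VIM) # for imputation (irmi)
library(QuantPsyc) #for lm.beta
library(devtools) #for source_url
#load mega functions (presumably the source of merge.datasets and miss.table used below)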

#descriptive stats

black.model = "Math2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Math2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.fit = lm(black.model, data) #fit whichever model was assigned last; 'black.fit' is a placeholder name

hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.fit = lm(hispanic.model, data) #fit whichever model was assigned last; 'hispanic.fit' is a placeholder name

cors = round(rcorr(as.matrix(data))$r,2) #all correlations, round to 2 decimals

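#blacks
admixture.cors.black = cors[10:23,1:3] #Black admixture x Achv. (name inferred from the white/hispanic objects below)
hist(unlist(admixture.cors.black[,1])) #hist for afri x achv
hist(unlist(admixture.cors.black[,2])) #amer x achv
hist(unlist(admixture.cors.black[,3])) #euro x achv
desc = rbind(Afro=describe(unlist(admixture.cors.black[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.black[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.black[,3]))) #euro x achv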

admixture.cors.white = cors[24:25,4:6] #White admixture x Achv.

admixture.cors.hispanic = cors[26:39,7:9] #Hispanic admixture x Achv.
desc = rbind(Afro=describe(unlist(admixture.cors.hispanic[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.hispanic[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.hispanic[,3]))) #euro x achv

##Examine hispanics by scatterplots
scatterplot(Read2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
#General factor
scatterplot(hispanic.ach.factor ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))

##Imputed and aggregated data
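#Hispanics
hispanic.ach = data[26:39] #subset hispanic ach data (the name "hispanic.ach" is assumed)
hispanic.ach = hispanic.ach[rowSums(is.na(hispanic.ach))<ncol(hispanic.ach),] #remove empty cases
miss.table(hispanic.ach) #examine missing data
hispanic.ach = irmi(hispanic.ach, noise.factor = 0) #impute the rest
#factor analysis
fact.hispanic = fa(hispanic.ach) #get common ach factor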
fact.scores = fact.hispanic$scores; colnames(fact.scores) = "hispanic.ach.factor"
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors[7:9,"hispanic.ach.factor"] #results for general factor

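#Blacks
black.ach = data[10:23] #subset black ach data (the names "black.ach" and "fact.black" are assumed)
black.ach = black.ach[rowSums(is.na(black.ach))<ncol(black.ach),] #remove empty cases
black.ach = irmi(black.ach, noise.factor = 0) #impute the rest
#factor analysis
fact.black = fa(black.ach) #get common ach factor
fact.scores = fact.black$scores; colnames(fact.scores) = "black.ach.factor"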
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors[1:3,"black.ach.factor"] #results for general factor

##Admixture totals
Hispanic.admixture = subset(data, select=c("LatinoAfricanAncestry","LatinoAmericanAncestry","LatinoEuropeanAncestry"))
Hispanic.admixture = Hispanic.admixture[complete.cases(Hispanic.admixture),] #complete cases
Hispanic.admixture.sum = data.frame(apply(Hispanic.admixture, 1, sum))
colnames(Hispanic.admixture.sum)="Hispanic.admixture.sum" #fix name
describe(Hispanic.admixture.sum) #stats

#add data back to dataframe
LatinoRemainderAncestry = 1-Hispanic.admixture.sum #get remainder
colnames(LatinoRemainderAncestry) = "LatinoRemainderAncestry" #rename
data = merge.datasets(LatinoRemainderAncestry,data,2) #merge back

#plot it
LatinoRemainderAncestry = LatinoRemainderAncestry[order(LatinoRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(LatinoRemainderAncestry),cex=.7) #plot, with smaller text

Black.admixture = subset(data, select=c("BlackAfricanAncestry","BlackAmericanAncestry","BlackEuropeanAncestry"))
Black.admixture = Black.admixture[complete.cases(Black.admixture),] #complete cases
Black.admixture.sum = data.frame(apply(Black.admixture, 1, sum))
colnames(Black.admixture.sum)="Black.admixture.sum" #fix name
describe(Black.admixture.sum) #stats

#add data back to dataframe
BlackRemainderAncestry = 1-Black.admixture.sum #get remainder
colnames(BlackRemainderAncestry) = "BlackRemainderAncestry" #rename
data = merge.datasets(BlackRemainderAncestry,data,2) #merge back

#plot it
BlackRemainderAncestry = BlackRemainderAncestry[order(BlackRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(BlackRemainderAncestry),cex=.7) #plot, with smaller text

#simple stats for both

#make subset with remainder data and achievement
remainders = subset(data, select=c("black.ach.factor","BlackRemainderAncestry",
                                   "hispanic.ach.factor","LatinoRemainderAncestry"))
View(rcorr(as.matrix(remainders))$r) #correlations?

#Partial correlations
partial.r(data, c(7:9,40), c(43))[4,] #partial out remainder for Hispanics
partial.r(data, c(1:3,41), c(42))[4,] #partial out remainder for Blacks


Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2014). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics.

November 15, 2014

Admixture in the Americas: Introduction, partial correlations and IQ predictions based on ancestry

For those who have been living under a rock (i.e. not following me on Twitter), John Fuerst has been very good at compiling data from published research. Have a look at Human Varieties with the tag Admixture Mapping. He asked me to help him analyze it and write it up. I gladly obliged; you can read the draft here. John thinks we should write it all into one huge paper instead of splitting it up as is standard practice. The standard practice is perhaps not entirely just for gaming the reputation system, but also because writing huge papers like that can seem overwhelming and may take a long time to get thru review.

So the project summarized so far is this:

  • Genetic models of trait admixture predict that mixed groups will fall in between the two source populations on the trait, in proportion to their admixture.
  • For psychological traits such as general intelligence (g), this has previously primarily been studied unsystematically in African Americans, but this line of research seems to have dried up, perhaps because it became too politically sensitive over there.
  • However, there have been some studies using the same method, just examining illness-related traits (e.g. diabetes). These studies usually include socioeconomic variables as controls. In doing so, they have found robust correlations between admixture at the individual level and socioeconomic outcomes: income, occupation, education and the like.
  • John has found quite a lot of these and compiled the results into a table that can be found here.
  • The results clearly show the expected pattern, namely that more European ancestry is associated with more favorable outcomes, and more African or Amerindian ancestry with less favorable outcomes. A few of them are non-significant, but none contradicts. A meta-analysis of this would find a very small p value indeed.
  • One study actually included cognitive measures as co-variates and found results in the generally expected direction. See material under the headline “Cognitive differences in the Americans” in the draft file.
  • There is no necessity that one has to look at the individual level. One can look at the group level too. For this reason John has compiled data about the ancestry proportions of American countries and Mexican regions.
  • For the countries, he has tested this against self-identified proportions, CIA World Factbook estimates, skin reflection data and stuff like that, see: The results are pretty solid. The estimates are clearly in the right ballpark.
  • Now, genetic models of the world distribution of general intelligence clearly predict that these estimates will be strongly related to the countries’ estimated mean levels of general intelligence. To test this John has carried out a number of multiple regressions with various controls such as parasite prevalence or cold weather along with European ancestry with the dependent variable being skin color and national achievement scores (PISA tests and the like). Results are in the expected directions even with controls.
  • Using the Mexican regional data, John has compared the Amerindian estimates with PISA scores, Raven’s scores, and Human Development Index (a proxy for S factor (see here and here)). Post is here:

This is where we are. Basically, the data is all there, ready to be analyzed. Someone needs to do the other part of the grunt work, namely running all the obvious tests and writing everything up for a big paper. This is where I come in.

The first thing I did was to create an OSF repository for the data and code, since John had been manually keeping track of versions on HV. Not too good. I also converted his SPSS datafile to one that works on all platforms (CSV with semi-colons).

Then I started writing code in R. First I wanted to look at the more obvious relationships, such as that between IQ and ancestry estimates (ratios). Here I discovered that John had used a newer dataset of IQ estimates Meisenberg had sent him. However, it seems to have wrong data (Guatemala) and covers fewer relevant countries (25 vs. 35) than the standard dataset from Lynn and Vanhanen 2012 (+Malloyian fixes) that I have been using. So for this reason I merged John’s already enormous dataset (126 variables) with the latest Megadataset (365 variables), to create the cleverly named supermegadataset to be used for this study.

IQ x Ancestry zero-order correlations

Here are the three scatterplots:




So the reader might wonder, what is wrong with the Amerindian data? Why is the correlation about nil? Simply inspecting the plot reveals the problem. The countries with low Amerindian ancestry have very mixed European vs. African ancestry, which keeps their mean IQ around 80-85, thus creating no correlation.

Partial correlations

So my idea was this, as I wrote it in my email to John:

Hey John, I wrote my bachelor in 4 days (5 pages per day), so now I’m back to working on more interesting things. I use the LV12 data because it seems better and is larger.

One thing that had been annoying me was that correlations between ancestry and IQ do not take into account that there are three variables that vary, not just two. Remember that odd low correlation Amer x IQ r=.14 compared with Euro x IQ = .68 and Afr x IQ = -.66. The reason for this, it seems to me, is that the countries with low Amer% are a mix of high and low Afr countries. That’s why you get a flat scatterplot. See attached.

Unfortunately, one cannot just use MR with these three variables, since the following equation is true of them: 1 = Euro+Afr+Amer. They are structurally dependent. Remember that MR attempts to hold the other variables constant while changing one. This is impossible here.

The solution, it seems to me, is to use partial correlations. In this way, one can partial out one of them and look at the remaining two. There are six possible ways to do this:

Amer x IQ, partial out Afr = -.51
Amer x IQ, partial out Euro = .29
Euro x IQ, partial out Afr = .41
Euro x IQ, partial out Amer = .70
Afr x IQ, partial out Euro = -.37
Afr x IQ, partial out Amer = -.76

Assuming that genotypically, Amer=85, Afr=80, Euro=97 (or so), then these results are completely as expected direction-wise.

In the first case, we remove Afr, so we are comparing Amer vs. Euro. We expect negative since Amer<Euro
In two, we expect positive because Amer>Afr
In three, we expect positive because Euro>Amer
In four, we expect positive because Euro>Afr
In five, we expect negative because Afr<Amer
In six, we expect negative because Afr<Euro

All six predictions were as expected. The sample size is quite small at N=34 and LV12 isn’t perfect, certainly not for these countries. The overall results are quite reasonable in my view.
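To make the structural dependence concrete, here is a minimal R sketch on simulated data. Everything in it is an assumption for illustration: the ancestry shares are random, the weights and noise level are made up, and `partial.cor` is a hypothetical helper that computes a partial correlation from regression residuals (it is not the `partial.r` call used in the script earlier).

```r
# Toy data: three ancestry shares per country that sum to 1 (made-up numbers)
set.seed(1)
n   <- 34
raw <- matrix(rexp(n * 3), ncol = 3)   # random positive draws
anc <- raw / rowSums(raw)              # rescale so each row sums to 1
colnames(anc) <- c("Euro", "Afr", "Amer")
d    <- as.data.frame(anc)
d$IQ <- 95 * d$Euro + 70 * d$Afr + 85 * d$Amer + rnorm(n, sd = 3)  # assumed weights + noise

# With all three shares plus an intercept the predictors are perfectly collinear,
# so lm() cannot estimate one of the coefficients and reports NA for it
fit.all <- lm(IQ ~ Euro + Afr + Amer, data = d)
coef(fit.all)

# Partial correlation of x and y given z:
# correlate the residuals of x ~ z with the residuals of y ~ z
partial.cor <- function(x, y, z) cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

r.amer.afr <- partial.cor(d$Amer, d$IQ, d$Afr)  # Amer x IQ, Afr partialled out
r.euro.afr <- partial.cor(d$Euro, d$IQ, d$Afr)  # Euro x IQ, Afr partialled out
c(r.amer.afr, r.euro.afr)
```

In this toy setup the shares sum to exactly 1, so once Afr is partialled out the Amer and Euro results are exact mirror images of each other. In real data, with rounding or a remainder ancestry category, they need not be.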

Estimates of IQ directly from ancestry

But instead of merely looking at it via correlations or regressions, one can try to predict the IQs directly from the ancestry. Simply create a predicted IQ based on the proportions and these populations’ estimated IQs. I tried a number of variations, but they were all close to this: Euro*95+Amer*85+Afro*70. The reason to use Euro 95 and not, say, 100 is that 100 is the IQ of Northern Europeans, in particular the British (‘Greenwich Mean IQ’). The European genes found in the Americas are mostly from Spain and Portugal, which have estimated IQs of 96.6 and 94.4 (mean = 95.5). This creates a problem since the European ancestry of the US and Canada is not mostly from these somewhat lower IQ Europeans, but the error source is small (one can always just try excluding them).
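The weighted sum above is easy to sketch in R. The function name `predict.iq` and the three example rows are hypothetical (made-up proportions, not taken from the dataset); only the weights 95, 85 and 70 come from the text.

```r
# Weights from the text: Euro 95, Amer 85, Afro 70
predict.iq <- function(euro, amer, afro,
                       w = c(euro = 95, amer = 85, afro = 70)) {
  # the three shares are assumed to sum to 1 for every row
  stopifnot(all(abs(euro + amer + afro - 1) < 1e-6))
  unname(euro * w["euro"] + amer * w["amer"] + afro * w["afro"])
}

# Hypothetical countries (made-up proportions, for illustration only)
toy <- data.frame(euro = c(0.80, 0.50, 0.10),
                  amer = c(0.10, 0.40, 0.10),
                  afro = c(0.10, 0.10, 0.80))
toy$pred.iq <- with(toy, predict.iq(euro, amer, afro))
toy  # pred.iq: 91.5, 88.5, 74.0
```

Passing a different `w` makes it simple to try variations of the weights, e.g. 95.5 instead of 95 for the European component.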

So, do the predictions work? Yes.

Now, there is another kind of error with such estimates, called elevation. It refers to getting the intervals between countries right, but generally either over or underestimating them. This kind of error is undetectable in correlation analysis. But one can calculate it by taking the predicted IQs and subtracting the measured IQs, and then taking the mean of these values. Positive values mean that one is overestimating, negative means underestimation. The value for the above is: 1.9, so we’re overestimating a little bit, but it’s fairly close. A bit of this is due to USA and CAN, but then again, LCA (St. Lucia) and DMA (Dominica) are strong negative outliers, perhaps just wrong estimates by Lynn and Vanhanen (the only study for St. Lucia is this, but I don’t have the norms so I can’t calculate the IQ).
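A short sketch of why elevation is invisible to correlations, using made-up numbers for five hypothetical countries (none of these values come from the dataset):

```r
# Made-up predicted and measured values, for illustration only
predicted <- c(90, 85, 78, 88, 93)
measured  <- c(88, 84, 75, 87, 91)

r         <- cor(predicted, measured)    # blind to a constant offset
elevation <- mean(predicted - measured)  # positive = overestimating on average

# Shifting every prediction by the same amount leaves the correlation
# unchanged but moves the elevation by exactly that amount
shifted           <- predicted + 5
r.shifted         <- cor(shifted, measured)
elevation.shifted <- mean(shifted - measured)

c(r = r, r.shifted = r.shifted)
c(elevation = elevation, elevation.shifted = elevation.shifted)
```

Here the elevation is 1.8 before the shift and 6.8 after it, while the correlation is identical in both cases.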

I told Davide Piffer about these results and he suggested that I use his PCA factor scores instead. Now, these are not themselves meaningful, but they have the intervals directly estimated from the genetics. His numbers are: Africa: -1.71; Native American: -0.9; Spanish: -0.3. Ok, let’s try:
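Swapping in the factor scores only changes the weights in the weighted sum. A minimal sketch, where the ancestry shares are hypothetical (rows A, B and C are made up) and only the two sets of weights are taken from the text:

```r
iq.weights  <- c(Afr = 70,    Amer = 85,   Euro = 95)    # values used in the text
pca.weights <- c(Afr = -1.71, Amer = -0.9, Euro = -0.3)  # factor scores as quoted

# Hypothetical ancestry shares, each row summing to 1 (not real country data)
shares <- rbind(A = c(Afr = 0.10, Amer = 0.10, Euro = 0.80),
                B = c(Afr = 0.10, Amer = 0.40, Euro = 0.50),
                C = c(Afr = 0.80, Amer = 0.10, Euro = 0.10))

pred.iq  <- drop(shares %*% iq.weights)   # predictions on the IQ scale
pred.pca <- drop(shares %*% pca.weights)  # predictions on the factor-score scale

w.cor <- cor(iq.weights, pca.weights)  # similarity of the two sets of weights
cbind(pred.iq, pred.pca)
w.cor
```

The factor-score predictions are not on an IQ scale, so only their ordering and spacing can be compared with the IQ-based ones, not their level.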


Astonishingly, the correlation is almost the same, only .01 apart. However, this fact is less overwhelming than it seems at first, because it arises simply because the correlation between the three racial estimates is .999 (95.5
