Reading up on the huge animal breeding literature provides useful background for thinking about what selection on humans will do in the future (embryo selection and direct editing à la CRISPR).


I made the above infographic some time ago, maybe 1-2 years ago. It is still fairly accurate; the newest genome sequencing data do not look much different.

Steve Hsu has been following some of the animal breeding literature, e.g. Frontiers in cattle genomics.

I dug around a bit and found some reviews. They mention various interesting experiments. Of course, the most interesting one is still the Russian fox domestication experiment (I want one of these!). Recently, there was an interesting one on breeding for brain size in guppies.


There are also the famous rat maze-learning experiments. Maze solving is g-loaded in humans (Jensen, 1980, book). A good review is Tolman and Tryon: Early research on the inheritance of the ability to learn.


The newest part, and the most interesting in relation to humans, is selection using genomic predictors alone. There is a recent, easy-to-read review: Understanding genomic selection in poultry breeding.

selection for eggs

Because the animal breeding field has been going for so long, one finds hundreds if not thousands of these types of graphs, yet they are still exciting. One might wonder: is there anything one cannot select for? It seems that no matter the trait, evolution finds a way. Dawkins seems to agree:

Political opposition to eugenic breeding of humans sometimes spills over into the almost certainly false assertion that it is impossible. Not only is it immoral, you may hear it said, it wouldn’t work. Unfortunately, to say that something is morally wrong, or politically undesirable, is not to say that it wouldn’t work. I have no doubt that, if you set your mind to it and had enough time and enough political power, you could breed a race of superior body-builders, or high-jumpers, or shot-putters; pearl fishers, sumo wrestlers, or sprinters; or (I suspect, although now with less confidence because there are no animal precedents) superior musicians, poets, mathematicians or wine-tasters. The reason I am confident about selective breeding for athletic prowess is that the qualities needed are so similar to those that demonstrably work in the breeding of racehorses and carthorses, of greyhounds and sledge dogs. The reason I am still pretty confident about the practical feasibility (though not the moral or political desirability) of selective breeding for mental or otherwise uniquely human traits is that there are so few examples where an attempt at selective breeding in animals has ever failed, even for traits that might have been thought surprising. Who would have thought, for example, that dogs could be bred for sheep-herding skills, or ‘pointing’, or bull-baiting?

[from The Greatest Show on Earth]

Selection for High and Low Fatness in Swine


Also interesting is that selective breeding makes it possible to estimate realized heritability directly from the observed response to selection, not just from resemblance between relatives.
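A minimal sketch of how realized heritability falls out of a selection experiment, assuming the standard breeder's equation R = h²S (response equals heritability times the selection differential). The function names and the numbers are hypothetical, chosen only to show the arithmetic; they are not taken from the swine study above.

```python
def realized_heritability(pop_mean, selected_parents_mean, offspring_mean):
    """One generation of selection: h^2 = R / S (breeder's equation R = h^2 * S).

    S = selection differential (selected parents' mean minus population mean)
    R = response (offspring mean minus the original population mean)
    """
    s = selected_parents_mean - pop_mean
    if s == 0:
        raise ValueError("zero selection differential: h^2 is undefined")
    return (offspring_mean - pop_mean) / s


def realized_heritability_cumulative(cum_s, cum_r):
    """Several generations: slope of cumulative response on cumulative
    selection differential, fitted through the origin."""
    num = sum(s * r for s, r in zip(cum_s, cum_r))
    den = sum(s * s for s in cum_s)
    return num / den


# Hypothetical numbers, e.g. backfat thickness in mm (not from any study).
print(realized_heritability(30.0, 36.0, 32.4))  # roughly 0.4
print(realized_heritability_cumulative([6, 11, 17], [2.5, 4.3, 6.9]))
```

For multi-generation experiments, the slope of cumulative response on cumulative selection differential is the usual estimate; with divergent high and low lines, as in the swine experiment, S and R are often taken as the differences between the two lines.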


I think we will see some interesting humans in the future. The reason is this: embryo selection is very close and genetic engineering is fairly close. If some countries ban them, others will probably allow them, so there will be reproductive havens and reproductive tourism, just as there are tax havens and even suicide havens. Or one can sail or fly to a seastead, or use any of the black market solutions that will inevitably spring up. I don’t think Western governments will dare to force abortions on pregnant returnees, so there is not much they can do at that point. There is also, of course, the near-impossibility of proving that a fetus is the result of embryo selection rather than normal fertilization. After all, embryo selection is just choosing between actual possibilities (hopefully, philosophy readers will allow me the flagrant abuse of modal terminology). If everybody starts having healthier children by using this technology, there will be no way to prove that a particular couple ‘cheated’. Only in the aggregate can one prove that something is going on; a particular couple may just have been lucky. Direct editing may be detectable genetically, but I doubt such testing will happen.

In the EU, I suspect the legality of this practice will come down to legal interpretation. The Charter of Fundamental Rights of the European Union states:

Article 3
Right to the integrity of the person
1.   Everyone has the right to respect for his or her physical and mental integrity.
2.   In the fields of medicine and biology, the following must be respected in particular:
(a) the free and informed consent of the person concerned, according to the procedures laid down by law;
(b) the prohibition of eugenic practices, in particular those aiming at the selection of persons;
(c) the prohibition on making the human body and its parts as such a source of financial gain;
(d) the prohibition of the reproductive cloning of human beings.

But given that selection of persons is already widely practiced, e.g. against Down’s syndrome, (b) is clearly ignored in practice. (c) is also ignored, e.g. for the sale of sperm and eggs, although this is called donation (with a nice monetary benefit in return). So the best hope is that embryo selection for medical reasons will sneak into practice and become so standard that banning it would seem outlandish. This is well underway. When the public comes to accept it, judges will probably find some legal reason to interpret (b) narrowly, e.g. as referring to forced sterilization. One may be able to find support for this in the preparatory work for the charter, although I haven’t looked into it.

Given that the technology will likely come into wide-scale practice within the next couple of decades, what remains to be researched much more is how people will actually make choices. When prospective parents decide which embryos to implant, they will face trade-offs: with a limited number of embryos, one cannot simultaneously maximize all desirable traits and minimize all undesirable ones. There will probably be clear trends: few will select against intelligence, few will select short boys, few will select nasty diseases, and most will select for health and happiness. People like Helen Henderson are not common:

I can say, without hesitation, that my life has been richer because I have MS. How can anyone who has no experience with disabilities understand that?

[From Future Human Evolution.]

If some parents still deliberately choose embryos with horrible genetic diseases, the government will probably (should?) step in and ban it.
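To illustrate the trade-off problem with a limited number of embryos, here is a toy sketch with invented predicted scores for five embryos on three traits. It assumes parents either keep only embryos that no other embryo beats on every trait (the Pareto front) or rank embryos by a weighted sum of scores (a simple selection index); the weights are hypothetical and stand for the parents' priorities.

```python
# Invented predicted scores (higher = more desirable) for five embryos
# on three traits; purely illustrative.
EMBRYOS = {
    "E1": (0.8, -0.2, 0.1),
    "E2": (0.1, 0.9, -0.3),
    "E3": (0.4, 0.3, 0.2),
    "E4": (-0.5, 0.2, 0.0),
    "E5": (0.3, 0.1, 0.1),
}


def dominates(a, b):
    """True if a is at least as good as b on every trait and better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Embryos not dominated by any other embryo, i.e. the real options."""
    return sorted(
        name
        for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items()
                   if other_name != name)
    )


def best_by_index(candidates, weights):
    """Pick the embryo with the highest weighted sum of scores (a selection index)."""
    return max(candidates,
               key=lambda n: sum(w * s for w, s in zip(weights, candidates[n])))


print(pareto_front(EMBRYOS))              # ['E1', 'E2', 'E3']
print(best_by_index(EMBRYOS, (1, 0, 0)))  # E1
print(best_by_index(EMBRYOS, (0, 1, 0)))  # E2
```

Several embryos survive the dominance check, so the final choice depends on the weights, which is exactly where people's preferences will show up.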

Still, there will be lots of variation. This variation in selective pressure between people should, together with strong assortative mating, result in divergence of human lines, somewhat akin to dog, cat and horse breeds. Assortative mating is apparently so strong that people even choose pets similar to themselves: Self seeks like: many humans choose their dog pets following rules used for assortative mating.


We truly live in interesting times. :)

If you want to read more like this, there was also a recent two-part paper: Eugenics, Ready or Not I and II. (I could not find a link to part 2.)


A recent paper informs us that we have now found a small number of SNPs that explain skin color in European samples.

In the International Visible Trait Genetics (VisiGen) Consortium, we investigated the genetics of human skin color by combining a series of genome-wide association studies (GWAS) in a total of 17,262 Europeans with functional follow-up of discovered loci. Our GWAS provide the first genome-wide significant evidence for chromosome 20q11.22 harboring the ASIP gene being explicitly associated with skin color in Europeans. In addition, genomic loci at 5p13.2 (SLC45A2), 6p25.3 (IRF4), 15q13.1 (HERC2/OCA2), and 16q24.3 (MC1R) were confirmed to be involved in skin coloration in Europeans. In follow-up gene expression and regulation studies of 22 genes in 20q11.22, we highlighted two novel genes EIF2S2 and GSS, serving as competing functional candidates in this region and providing future research lines. A genetically inferred skin color score obtained from the 9 top-associated SNPs from 9 genes in 940 worldwide samples (HGDP-CEPH) showed a clear gradual pattern in Western Eurasians similar to the distribution of physical skin color, suggesting the used 9 SNPs as suitable markers for DNA prediction of skin color in Europeans and neighboring populations, relevant in future forensic and anthropological investigations.


All 9 SNPs listed in Table 1 were used to construct a genetically inferred skin color score in 940 samples from 54 worldwide populations (HGDP-CEPH samples), which showed a spatial distribution with a clear gradual increase in skin darkness from Northern Europe to Southern Europe to Northern Africa, the Middle East and Western Asia (Figure S2); in agreement with the known distribution of skin color across these geographic regions. Outside of these geographic regions, the inferred skin color score appeared rather similar (i.e., failing to discriminate), despite the known phenotypic skin color difference between generally lighter Asians/Native Americans and darker Africans. This demonstrates that although these 9 SNPs can explain skin color variation among Europeans, they cannot explain existing skin color differences between Asians/Native Americans and Africans. Therefore, these differences in skin color variation may partly be due to different DNA variants not identifiable by this European study with restricted genetic origin.

The same general problem may apply to the Piffer results. Perhaps the SNPs found only affect cognitive ability within European samples (or Eurasian ones, since there is one Chinese replication). This would be a case of epistasis: the other gene(s) needed for the identified SNPs to affect cognitive ability have substantial frequencies in European populations but are absent or very rare in non-European populations.

As far as I know, this is a possible but unlikely scenario. It may serve as one of the remaining areas that non-hereditarians can point to as leaving reasonable doubt. The solution is to perform GWAS on African subjects. Luckily, a large number of such subjects live in or near (relatively) affluent countries in the Americas.
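A toy model may make the epistasis scenario concrete. It assumes, hypothetically, that each copy of allele A raises the trait by beta_a, but only in people carrying at least one copy of allele B, with A and B unlinked and in Hardy-Weinberg equilibrium. The average effect of A that a GWAS would pick up in a population is then beta_a times the carrier frequency of B. The function names and frequencies are illustrative only.

```python
def carrier_frequency(allele_freq):
    """Share of people carrying at least one copy of an allele (Hardy-Weinberg)."""
    return 1.0 - (1.0 - allele_freq) ** 2


def average_effect_of_a(beta_a, freq_b):
    """Average per-allele effect of SNP A in a population.

    Toy assumption: each copy of A adds beta_a only in carriers of allele B,
    and A and B are inherited independently.
    """
    return beta_a * carrier_frequency(freq_b)


# Hypothetical B-allele frequencies in two populations.
for label, freq_b in [("population 1", 0.50), ("population 2", 0.02)]:
    print(label, round(average_effect_of_a(1.0, freq_b), 3))
```

With B common, A has a clear average effect; with B rare, the same A allele looks almost inert, which is the pattern a European-only GWAS could not detect.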


Remains to be done:

  • Admixture analysis (doing)
  • Proofreading and editing
  • Deciding how to control for age and scanner (technical question)


I explore a large (N≈1000), open dataset of brain measurements and find a general factor of brain size (GBSF) that covers all regions except possibly the amygdala (loadings near zero, 3 out of 4 negative). It is very strongly correlated with total brain volume and total surface area (rs > .9). The factor was (near-)identical across genders after adjusting for age (factor congruence 1.00).

GBSF had similar correlations to cognitive measures as did other aggregate brain size measures: total cortical area and total brain volume. I replicated the finding that brain measures were associated with parental income and educational attainment.



A recent paper by Noble et al (2015) has received considerable media attention. An interesting fact about the paper is that most of the data were published along with it, perhaps inadvertently. I was made aware of this by an observant commenter, FranklinDMadoff, on the blog of James Thompson (Psychological Comments). In this paper I use the same data, revisit their conclusions and do some analyses of my own.

The abstract of the paper reads:

Socioeconomic disparities are associated with differences in cognitive development. The extent to which this translates to disparities in brain structure is unclear. We investigated relationships between socioeconomic factors and brain morphometry, independently of genetic ancestry, among a cohort of 1,099 typically developing individuals between 3 and 20 years of age. Income was logarithmically associated with brain surface area. Among children from lower income families, small differences in income were associated with relatively large differences in surface area, whereas, among children from higher income families, similar income increments were associated with smaller differences in surface area. These relationships were most prominent in regions supporting language, reading, executive functions and spatial skills; surface area mediated socioeconomic differences in certain neurocognitive abilities. These data imply that income relates most strongly to brain structure among the most disadvantaged children.

The results are not all that interesting, but the dataset is very large for a neuroscience study: in the 49 meta-analyses reviewed by Button et al (2013), the median of the median sample sizes is 116.5 (based on the data in their Table 1). Furthermore, they provide a very large set of different, non-overlapping brain measurements, which are useful for a variety of analyses, as well as genetic admixture data, which can be used for admixture mapping.

Why their results are as expected

The authors give their results (positive relationships between various brain size measures and parental educational and economic variables) environmental interpretations. For instance:

It is possible that, in these regions, associations between parent education and children’s brain surface area may be mediated by the ability of more highly educated parents to earn higher incomes, thereby having the ability to purchase more nutritious foods, provide more cognitively stimulating home learning environments, and afford higher quality child care settings or safer neighborhoods, with more opportunities for physical activity and less exposure to environmental pollutants and toxic stress[3,37]. It will be important in the future to disambiguate these proximal processes by measuring home, family and other environmental mediators[21].

However, one could also expect the relationship to arise from general cognitive ability (GCA; a.k.a. general intelligence), which is related both to favorable educational and economic outcomes and to brain measures. Figure 1 illustrates this expected relationship:

Figure 1

Figure 1 – Relationships between variables

The purple line is the one the authors often argue for based on their observed positive relationships. As can be seen in the figure, this positive relationship is also expected because parental education and income are positively related to parental GCA, which is related to parental brain properties, which in turn are highly heritable and thus partly passed on to the children. Based on these well-known relationships, we can estimate some expected correlations. The true score relationship between adult educational attainment and GCA is somewhere around .56 (Strenze, 2007).

The relationship between GCA and whole brain size is around .24-.28, depending on whether one uses the unweighted mean, the n-weighted mean or the median, and on which of the studies collected by Pietschnig et al (2014) one includes. I used healthy (as opposed to clinical) samples and only FSIQ. This value is uncorrected for measurement error in the IQ test, whose reliability is typically assumed to be around .90. If we choose the conservative value of .24 and divide by .90, we get .27 as an estimated true score correlation.

The heritability of whole brain size is very high. Bouchard (2014) summarized a few studies: one found a total cerebral h² of .89, another found .82 for whole-brain grey matter and .87 for whole-brain white matter, and a third found .80 for total brain volume. There may be some publication bias in these numbers, so we can choose .80 as an estimate. Correcting this for measurement error (.80/.90) gives .89. None of the previous studies were corrected for restriction of range, which is likely present because most studies use university students (Henrich et al, 2010), who average perhaps 1 standard deviation above the population mean in GCA; the uncorrected values are therefore probably conservative. Multiplying these numbers (.56 × .27 × .89) gives an estimate of r = .13 between parental education and total brain volume or a similar measure. For income, the expected correlation is somewhat lower because the relationship between GCA and income is weaker, perhaps .23 (Strenze, 2007). This gives .23 × .27 × .89 ≈ .05. However, Strenze did not account for the non-linearity of the income × GCA relationship, so it is probably somewhat higher.
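The chain of estimates above can be reproduced in a few lines. The variable names are mine and the values are those quoted in the text; multiplying the correlations along the path assumes the chain in Figure 1 is the only route connecting parental education or income to the child's brain size. The sketch follows the text in dividing by .90; if .90 is read as a reliability coefficient, the classical correction for attenuation divides by its square root instead, which gives slightly smaller values.

```python
# Values quoted in the text.
r_edu_gca = 0.56        # parental education x GCA, true score (Strenze, 2007)
r_income_gca = 0.23     # income x GCA (Strenze, 2007)
r_gca_brain_obs = 0.24  # GCA x whole brain size, conservative (Pietschnig et al, 2014)
h2_brain_obs = 0.80     # heritability of brain volume (Bouchard, 2014 summary)
correction = 0.90       # measurement-error correction, applied as in the text

# Corrected values, dividing by .90 as in the text.
r_gca_brain = r_gca_brain_obs / correction  # about .27
h2_brain = h2_brain_obs / correction        # about .89

# Path tracing: multiply the correlations along
# parental education/income -> parental GCA -> parental brain -> child brain.
r_edu_child_brain = r_edu_gca * r_gca_brain * h2_brain
r_income_child_brain = r_income_gca * r_gca_brain * h2_brain

print(round(r_edu_child_brain, 2), round(r_income_child_brain, 2))  # 0.13 0.05
```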

Initial analyses

Analysis was done in R. The code file, figures and data are available in the supplementary material.

Collecting the data

The authors did not publish a single data file with descriptions of the variables; instead, various Excel files were attached to parts of the figures. There are 6 such files. They all contain the same number of cases, and the cases overlap completely (as can be seen from the subjectID column). The columns, however, do not overlap completely, and some files have unique columns. All of them can be merged into one dataset.
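A minimal sketch of the merge in Python (the analysis itself was done in R). It assumes each Excel file has already been read into a list of row dictionaries, for instance after exporting to CSV and reading with csv.DictReader; the function name is hypothetical. It also checks that columns shared between files agree, which is a cheap way to confirm that the files really describe the same cases.

```python
def merge_on_id(tables, key="subjectID"):
    """Merge tables (lists of row dicts) that describe the same cases.

    A column that appears in several tables must have the same value for a
    given case; a conflict raises an error instead of being overwritten.
    """
    merged = {}
    for table in tables:
        for row in table:
            case_id = row[key]
            target = merged.setdefault(case_id, {key: case_id})
            for column, value in row.items():
                if column in target and target[column] != value:
                    raise ValueError(
                        f"conflicting values for {column!r} in case {case_id!r}")
                target[column] = value
    # Return the cases in a stable order (sorted by ID).
    return [merged[case_id] for case_id in sorted(merged)]
```

Usage would be along the lines of merged = merge_on_id([rows_file1, rows_file2, ...]), with one list per file (the names are placeholders).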

Dealing with missing data

The original authors dealt with missing data simply by using only the complete cases. This method can bias the results when the data are not missing completely at random. Instead, it is generally better to impute missing data (Sterne et al, 2009). Figure 1 shows the matrix plot of the data file.


The red areas mean missing data, except for nominal variables, which for some reason are always colored red (an error, I think). Examining the structure of the missing data showed that imputation was generally not possible, since many cases were missing most of their values. These cases have to be excluded, which reduces the sample size from 1500 to 1068. The authors report having 1099 complete cases, but I’m not sure where the discrepancy arises.
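The case filtering can be sketched as follows. The 50% cut-off is a hypothetical choice (the text does not state the exact rule used), missing values are assumed to be stored as None, and the function names are illustrative.

```python
def fraction_missing(row, columns):
    """Share of the given columns that are missing (None) for one case."""
    return sum(row.get(column) is None for column in columns) / len(columns)


def drop_mostly_missing(rows, columns, max_missing=0.5):
    """Drop cases missing more than max_missing of the columns.

    The 0.5 default is a hypothetical choice, not the rule used in the text.
    """
    return [row for row in rows if fraction_missing(row, columns) <= max_missing]


def complete_cases(rows, columns):
    """Cases with no missing values at all in the given columns."""
    return [row for row in rows if all(row.get(c) is not None for c in columns)]
```

Comparing the sizes of the two filtered lists shows how many partially observed cases would remain available for imputation rather than being dropped by a complete-case analysis.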

Dealing with gender

Since males have much larger brain volumes than females, even after adjustment for body size, there is the question of how to deal with gender (no distinction is made here between sex and gender). The original authors regressed the effect out. However, in my experience, regression does not always accomplish this perfectly, so when possible one should split the sample by gender and calculate results within each single-gender sample. Sample splitting is not an option when one is interested in the specific regression effect of gender, or when the resulting samples would be too small.
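Splitting the sample by gender is straightforward; the sketch below adds a guard for the other condition mentioned above, subsamples becoming too small. The column name gender and the min_n default of 100 are hypothetical.

```python
from collections import defaultdict


def split_by(rows, key="gender", min_n=100):
    """Split cases (row dicts) into one subsample per value of `key`.

    Raises ValueError if any subsample has fewer than min_n cases
    (min_n = 100 is an arbitrary illustrative default).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    too_small = {value: len(group) for value, group in groups.items()
                 if len(group) < min_n}
    if too_small:
        raise ValueError(f"subsamples too small for separate analysis: {too_small}")
    return dict(groups)
```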

Dealing with age

This problem is tricky. The original authors included age and age² in a regression model. However, since I wanted to visualize the relationships between variables, this option was not useful to me: it yields only age-adjusted summary statistics, not age-adjusted data. Instead, I calculated the residuals of all variables of interest after regressing them on age, age² and age³. The cubic term was used to catch further non-linear effects of age, as noted by e.g. Jensen (2006: 132-133).
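A minimal sketch of the residualization step, using NumPy's least squares solver in place of R's lm(). The function name is hypothetical; it regresses a variable on an intercept plus age, age² and age³ and returns the residuals.

```python
import numpy as np


def residualize_on_age(y, age, degree=3):
    """Residuals of y after OLS regression on age, age^2, ..., age^degree.

    Age is centred before taking powers to reduce collinearity among the
    polynomial terms; this does not change the fitted values.
    """
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    centred = age - age.mean()
    # Columns: 1, age, age^2, ..., age^degree (the k = 0 power is the intercept).
    X = np.column_stack([centred ** k for k in range(degree + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

Because OLS residuals are orthogonal to every column of the design matrix, their sample correlation with age is zero up to rounding error by construction. A correlation with age that survives residualization therefore suggests that the residuals were computed on, or matched back to, a different set of rows than the ones they are plotted against.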

Dealing with scanning site

One peculiar feature of the study not discussed by the authors is the relatively large effect of the different scanners on their results, see e.g. their Table 3. To keep scanning site from influencing the results, I also regressed it out (as a nominal variable with 13 levels).
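Adding scanning site to the residualization amounts to adding dummy columns to the design matrix. The sketch below (hypothetical function names) treatment-codes the site variable, using the first level in sorted order as the reference, so 13 sites give 12 dummy columns next to the intercept and the age terms.

```python
import numpy as np


def site_dummies(site):
    """Treatment-coded dummies for a nominal variable; the first level
    (in sorted order) is the reference category."""
    site = list(site)
    levels = sorted(set(site))
    columns = [[1.0 if s == level else 0.0 for s in site] for level in levels[1:]]
    if not columns:  # only one site: nothing to adjust for
        return np.empty((len(site), 0))
    return np.column_stack(columns)


def residualize_age_and_site(y, age, site, degree=3):
    """OLS residuals of y after regressing on an age polynomial and site dummies."""
    y = np.asarray(y, dtype=float)
    age = np.asarray(age, dtype=float)
    centred = age - age.mean()
    age_terms = np.column_stack([centred ** k for k in range(degree + 1)])
    X = np.hstack([age_terms, site_dummies(site)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta
```

With the intercept and the dummies in the model, the residuals average zero within every site, which is what removing site differences means here.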

Dealing with size

The dataset has no body size measures, making it impossible to adjust for body size. This is problematic since body size is known to correlate with GCA both within and between species. We are interested in differences in brain size holding body size equal, which cannot be done in the present study.

Factor analyzing brain size measurements

Why would one want to factor analyze brain measures?

The short answer is the same as the answer to the question: why would one want to factor analyze cognitive ability data? To explore latent relationships in the data that are not immediately obvious. A factor analysis will reveal whether there is a general factor in some domain, which can be a theoretically very important discovery (Dalliard, 2013; Jensen, 1998: chapter 2). If there is no general factor, this will also be revealed and may be important as well. This is not to say that general factors, or the lack thereof, are the only interesting thing about the factor structure; multifactor structures are also interesting, whether orthogonal (uncorrelated) or part of a hierarchical solution (Jensen, 2002).

The long answer is that human psychology is fundamentally a biological fact, a matter of brain physics and chemistry. This is not to say that important relationships cannot fruitfully be described at higher levels (e.g. cognitive science), but that ultimately the origin of anything mental is biology. This should not be controversial except among the religious, for it is merely the denial of dualism, of ghosts, spirits, gods and other immaterial beings. As Jensen (1997) wrote:

Although the g factor is typically the largest component of the common factor variance, it is the most “invisible.” It is the only “factor of the mind” that cannot possibly be described in terms of any particular kind of knowledge or skill, or any other characteristics of psychometric tests. The fact that psychometric g is highly heritable and has many physical and brain correlates means that it is not a property of the tests per se. Rather, g is a property of the brain that is reflected in observed individual differences in the many types of behavior commonly referred to as “cognitive ability” or “intelligence.” Research on the explanation of g, therefore, must necessarily extend beyond psychology and psychometrics. It is essentially a problem for brain neurophysiology. [my emphasis]

If GCA is a property of the brain, or if there is at least an analogous general brain performance factor, it may be possible to find it with the same statistical methods that found GCA. Thus, to find it, one must factor analyze a large, diverse sample of brain measurements that are known to correlate individually with GCA, in the hope that there will be a general factor which correlates very strongly with GCA. As I see it, there is no guarantee that this will work, but it is worth trying.

In their chapter on brain and intelligence, Colom and Thompson (2011) write:

The interplay between genes and behavior takes place in the brain. Therefore, learning the language of the brain would be crucial to understand how genes and behavior interact. Regarding this issue, Kovas and Plomin (2006) proposed the so-called “generalist genes” hypothesis, on the basis of multivariate genetic research findings showing significant genetic overlap among cognitive abilities such as the general factor of intelligence (g), language, reading, or mathematics. The hypothesis has implication for cognitive neuroscience, because of the concepts of pleiotropy (one gene affecting many traits) and polygenicity (many genes affecting a given trait). These genetic concepts suggest a “generalist brain”: the genetic influence over the brain is thought to be general and distributed.

Which brain measurements have so far been found to correlate with GCA (or its IQ proxy)?

Below I have compiled a list of brain measurements that have at some point been found to correlate with GCA or IQ scores:

  • Brain evoked potentials: habituation time (Jensen, 1998:155)
  • Brain evoked potentials: complexity of waveform (Deary and Caryl, 1997)
  • Brain intracellular pH-level (Jensen, 1998:162)
  • Brain size: total and brain regions (Jung and Haier, 2007)
  • Of the above, grey matter and white matter separately
  • Cortical thickness (Deary et al, 2010)
  • Cortical development (Shaw et al, 2006)
  • Nerve conduction velocity (Deary and Caryl, 1997)
  • Brain wave (EEG) coherence (Jensen, 2002)
  • Event related desynchronization of brain waves (Jensen, 2002)
  • White matter lesions (Turken et al, 2008)
  • Concentrations of N-acetyl aspartate (Jung et al, 2009)
  • Water diffusion parameters (Deary et al, 2010)
  • White matter integrity (Deary et al, 2010)
  • White matter network efficiency (Li et al, 2009)
  • Cortical glucose metabolic rate during mental activity / Neural efficiency (Neubauer et al, 2009)
  • Uric acid level (Jensen, 1998:162)
  • Density of various regions (Frangou et al, 2004)
  • White matter fractional anisotropy (Navas-Sánchez et al, 2014; Kim et al, 2014)
  • Reliable responding to changing inputs (Euler et al, 2015)

Most of the references above lead to the reviews I relied upon (Deary and Caryl, 1997; Jensen, 1998, 2002; Deary et al, 2010). There are surely more, and probably a large number of the above are false positives. For some I could not find a direct citation. We cannot know which are false positives until large datasets are compiled with these measures as well as a large number of cognitive tests. A simple WAIS battery won’t suffice; there need to be elementary cognitive tests too, and other tests that vary more in content, type and g-loading. This is necessary if we are to use the method of correlated vectors, which does not work well without diversity in the factor indicators. It is also necessary if we are to examine non-GCA factors.
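The method of correlated vectors itself is simple: correlate each test's g-loading with that test's correlation with the external variable. The sketch below uses a Pearson correlation (Spearman's rank correlation is also common); the function names and example values are hypothetical. The zero-variance check is a reminder of the diversity requirement: if all indicators have nearly the same g-loading, the correlation is undefined or very unstable.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Near-zero spread makes the correlation undefined or meaningless.
    if sxx < 1e-12 or syy < 1e-12:
        raise ValueError("a vector has (near-)zero variance")
    return sxy / math.sqrt(sxx * syy)


def correlated_vectors(g_loadings, criterion_correlations):
    """Correlate each indicator's g-loading with its correlation with a criterion."""
    return pearson(g_loadings, criterion_correlations)


# Invented values for five tests: g-loadings and correlations with brain volume.
loadings = [0.45, 0.55, 0.62, 0.70, 0.78]
r_with_brain = [0.12, 0.18, 0.17, 0.25, 0.28]
print(round(correlated_vectors(loadings, r_with_brain), 2))
```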

My hypothesis is that if there is a general brain factor, then it will have a hierarchical structure similar to GCA. Figure 2 shows a hypothetical structure of this.

Figure 2

Note: Squares are latent variables and circles are observed variables. I am aware this is the opposite of normal practice (e.g. Beaujean, 2014), but text is difficult to fit into circles.

Of these, the speed factor has to do with speed of processing, which can be enhanced in various ways (nerve conduction velocity, higher ‘clock’ frequency). Efficiency has to do with efficient use of resources (primarily glucose). Connectivity has to do with better intrabrain connectivity, whether through more connections, fewer problematic connections or similar. Size has to do with having more processing power by scaling up the size; some areas may matter more than others for this. Integrity has to do with withstanding assaults, removing waste (the accumulation of which is implicated in many neurodegenerative diseases) and the like. There are presumably more factors, and some of mine may need to be split.

Previous studies and the present study

Although factor analysis is common in differential psychology and related fields, it is somewhat rare outside of them. And when it is used, it is often done in questionable ways (see e.g. the controversy surrounding Hampshire et al (2012): Ashton et al (2014a), Hampshire et al (2014), Ashton et al (2014b), Haier et al (2014a), Ashton et al (2014c), Haier et al (2014b)). On the other hand, factor analytic methods have been used in a surprisingly diverse collection of scientific fields (Jöreskog 1996; Cudeck and MacCallum, 2012).

I am familiar with only one study applying factor analysis to different brain measures, and it was fairly small (n=132; Pennington et al, 2000). They analyzed 13 brain regions and reported a two-factor solution. It is worth quoting their methodology section:

Since the morphometric analyses yield a very large number of variables per subject, we needed a data reduction strategy that fit with the overall goal of exploring the etiology of individual differences in the size of major brain structures. There were two steps to this strategy: (1) selecting a reasonably small set of composite variables that were both comprehensive and meaningful; and (2) factor analyzing the composite variables. To arrive at the 13 composite variables discussed earlier, we (1) picked the major subcortical structures identified by the anatomic segmentation algorithms, (2) reduced the set of possible cortical variables by combining some of the pericallosal partitions as described earlier, and (3) tested whether it was justifiable to collapse across hemispheres. In the total sample, there was a high degree of correlation (median R=.93, range=.82-.99) between the right and left sides of any given structure; it thus seemed reasonable to collapse across hemispheres in creating composites. We next factor-analyzed the 13 brain variables in the total sample of 132 subjects, using Principal Components factor analysis with Varimax rotation (Maxwell & Delaney, 1990). The criteria for a significant factor was an eigenvalue>1.0, with at least two variables loading on the factor.
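The Kaiser criterion used in the quoted passage (retain components with eigenvalue above 1) can be computed as below. This sketch only counts eigenvalues of the correlation matrix; it does not perform the Varimax rotation or apply the "at least two variables" rule, and the function name is hypothetical.

```python
import numpy as np


def kaiser_count(data):
    """Number of eigenvalues of the correlation matrix greater than 1.

    Rows of `data` are subjects, columns are variables.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)  # symmetric matrix: real eigenvalues
    return int(np.sum(eigenvalues > 1.0))
```

Two pairs of nearly identical variables, for instance, yield two eigenvalues close to 2 and two close to 0, so the criterion retains two components.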

The present study makes a better analysis possible. The sample is about 8 times larger and has 27 non-overlapping measurements of brain size, broadly speaking. The major downside of the variables in the present study is that the cerebral volumes (white matter and cortex) are not divided into smaller areas as in their study. Given the very large sample size, one could have used 100 variables or more.

The available brain measures are:

  1. cort_area.ctx.lh.caudalanteriorcingulate
  2. cort_area.ctx.lh.caudalmiddlefrontal
  3. cort_area.ctx.lh.fusiform
  4. cort_area.ctx.lh.inferiortemporal
  5. cort_area.ctx.lh.middletemporal
  6. cort_area.ctx.lh.parsopercularis
  7. cort_area.ctx.lh.parsorbitalis
  8. cort_area.ctx.lh.parstriangularis
  9. cort_area.ctx.lh.rostralanteriorcingulate
  10. cort_area.ctx.lh.rostralmiddlefrontal
  11. cort_area.ctx.lh.superiortemporal
  12. cort_area.ctx.rh.caudalanteriorcingulate
  13. cort_area.ctx.rh.caudalmiddlefrontal
  14. cort_area.ctx.rh.fusiform
  15. cort_area.ctx.rh.parsopercularis
  16. cort_area.ctx.rh.parsorbitalis
  17. cort_area.ctx.rh.parstriangularis
  18. cort_area.ctx.rh.rostralanteriorcingulate
  19. cort_area.ctx.rh.rostralmiddlefrontal
  20. vol.Left.Cerebral.White.Matter
  21. vol.Left.Cerebral.Cortex
  22. vol.Left.Hippocampus
  23. vol.Left.Amygdala
  24. vol.Right.Cerebral.White.Matter
  25. vol.Right.Cerebral.Cortex
  26. vol.Right.Hippocampus
  27. vol.Right.Amygdala

I am not an expert in neuroscience, but as far as I know, the above measurements are non-overlapping (none is a sum of the others) and thus suitable for factor analysis. The authors also reported aggregate measures such as total surface area and total volume. They also reported total intracranial volume, which permits the calculation of two more brain measures: the non-brain volume of the cranium (total intracranial volume minus total brain volume) and the proportion of intracranial volume taken up by the brain.
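The two derived measures are simple arithmetic. A sketch, assuming both volumes are in the same unit (e.g. cm³); the function name and example values are hypothetical.

```python
def cranial_measures(total_brain_volume, intracranial_volume):
    """Return (non-brain intracranial volume, proportion of the cranium taken up by brain)."""
    if not 0 < total_brain_volume <= intracranial_volume:
        raise ValueError("expected 0 < brain volume <= intracranial volume")
    non_brain = intracranial_volume - total_brain_volume
    proportion = total_brain_volume / intracranial_volume
    return non_brain, proportion


# Hypothetical values in cm^3, not taken from the dataset.
print(cranial_measures(1200.0, 1500.0))
```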

The careful reader has perhaps noticed something bizarre about the dataset, namely that there is an unequal number of left hemisphere (“lh”) and right hemisphere (“rh”) regions (11 vs. 8). I have no idea why this is, but it is somewhat problematic in factor analysis, since it effectively gives double weight to the regions measured on both sides and gives the left side a bit more weight overall.

The present dataset is inadequate for properly testing the general brain factor hypothesis because it only has measurements from one domain: size. The original authors may have more measurements they did not publish. However, one can examine whether a brain size factor exists, as a preliminary test of the more general hypothesis.

Age and overall brain size

As an initial check, I plotted the relationship between total brain size measures and age. These are shown in Figures 3 and 4.

Figure 3 Figure 4

Curiously, these show that brain size increases only up to about age 8-10. I was under the impression that brain size continued to increase until the body in general stopped growing, around 15-20 years. Still, this does not appear to be inconsistent with other studies (e.g. Giedd, 1999). The relationship is clearly non-linear, so one will need the age corrections described above. To see whether the correction worked, we plot the total size variables against age; there should be a near-zero correlation. Results are shown in Figures 5 and 6.

Figure 5 Figure 6

Instead, we still see a slight correlation for both genders, apparently due to a single outlier in each. Very odd. I examined these outliers (IDs: P0009 and P0010) but did not see anything special about them. I removed them and reran the residualization from the original data. This produced new outliers similar to before (with the next IDs in sequence). When I removed those, new ones appeared. I figure it is due to some error in the residualization process. Indeed, a closer look revealed that the largest outliers (positive and negative) were always the first two indices. I thus removed these before doing further analyses. The second largest outliers had no particular index. I tried removing more age outliers, but it was not possible to completely remove the correlations between age and the other variables (they usually remained near r = .03). Figure 6a shows the same as Figure 6, just without the two outliers.

Figure 6a

The genders are somewhat displaced on the age variable, but the x-axis shows that this is in fact a very, very small difference.

General brain size factor with and without residualization

Results for the factor analysis without residualization are shown in Figure 7. I used the fa() function from the psych package with default settings: 1 factor extracted with the minimum residuals method. Previous studies have shown the choice of extraction method to be of little importance, as long as it is not principal components with a small number of variables (Kirkegaard, 2014).

Figure 7

We see that the factors are quite similar (factor congruence .95) but that the male factor is considerably stronger (var% M/F 26 vs. 16). This suggests that either the factor works differently in the two genders, or there is error in the results. If it is error, we should see an improvement after removing some of it. Figure 8 shows the same plot using the residualized data.
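Both statistics used here are simple functions of the loading vectors. A minimal sketch, assuming one loading per variable in the same variable order for both genders: Tucker's congruence coefficient (the statistic reported by psych's factor.congruence()) and the proportion of variance explained by a single factor, i.e. the sum of squared loadings divided by the number of variables. The loadings are made up for illustration.

```python
import numpy as np


def congruence(x, y):
    """Tucker's congruence coefficient: an uncentered cosine between loading vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))


def variance_explained(loadings):
    """Share of the total standardized variance accounted for by a single factor."""
    l = np.asarray(loadings, dtype=float)
    return float(np.sum(l ** 2) / l.size)


# Hypothetical loadings for the same five regions in two samples
male = [0.70, 0.60, 0.55, 0.50, 0.10]
female = [0.55, 0.50, 0.40, 0.45, -0.05]
print(f"congruence = {congruence(male, female):.2f}")
print(f"var% M/F = {variance_explained(male):.0%} / {variance_explained(female):.0%}")
```

Because congruence is uncentered and ignores overall scale, two solutions can share a very similar loading pattern (high congruence) while one factor is much weaker (lower var%), which is the situation described above.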

Figure 8

The results were now more similar and stronger for both genders (var% M/F = 34 vs. 33).

The amygdala results are intriguing, suggesting that this region does not increase in size along with the rest of the brain. The right amygdala even had negative loadings in both genders.

Using all that’s left

The next thing one might want to do is extract multiple factors. I tried extracting solutions with 3 to 5 factors. These, however, are bogus models due to the near-perfect correlation between the left and right measures of the same region. This results in spurious factors that load on just 2 variables (the left and right versions) with loadings near 1. One could solve this either by averaging the regions that have 2 measurements or by using only those from the left side. It makes little difference because they correlate so highly. It should be noted, though, that doing this means one cannot see any lateralization effects, such as the one suggested for the right amygdala.

I redid all the results using the left side variables only. Figure 9 shows the results.

Figure 9

Now all regions had positive loadings and the var% increased a bit for both genders, to 36/36. Factor congruence was 1.00, even for the non-residualized data. It thus seems that the missing right-side measures, or the use of near-duplicate measures, also had a negative impact on the results.

One can calculate other measures of factor strength/internal reliability, such as the average intercorrelation, Cronbach’s alpha and Guttman’s lambda 6 (G6). These are shown in Table 1.
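For reference, the first three columns of Table 1 can be computed from the item correlation and covariance matrices as below. This is a minimal sketch with simulated data, not necessarily how the table was produced; G6 requires squared multiple correlations and is left out.

```python
import numpy as np


def reliability(data):
    """Mean inter-item r, raw alpha (from covariances) and standardized alpha (from r)."""
    X = np.asarray(data, dtype=float)
    k = X.shape[1]
    R = np.corrcoef(X, rowvar=False)
    mean_r = (R.sum() - k) / (k * (k - 1))           # average off-diagonal correlation
    C = np.cov(X, rowvar=False)
    alpha_raw = k / (k - 1) * (1 - np.trace(C) / C.sum())
    alpha_std = k * mean_r / (1 + (k - 1) * mean_r)  # Spearman-Brown form
    return mean_r, alpha_raw, alpha_std


# Simulated one-factor data; the first variable is on a much larger scale
rng = np.random.default_rng(2)
g = rng.normal(size=(1000, 1))
X = 0.6 * g + 0.8 * rng.normal(size=(1000, 8))
X[:, 0] *= 100
print(reliability(X))
```

Raw alpha is scale-dependent: a variable with a much larger variance than the rest dominates the covariance sum and pulls raw alpha down, whereas standardized alpha is unaffected. Mixing variables on different scales (e.g. volumes and areas) may explain the gap between the raw and standardized columns of Table 1.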

Table 1 – Internal reliability measures
Sample Mean r Alpha (raw) Alpha (std.) G6
Male .33 .48 .88 .90
Female .34 .45 .89 .90


Multiple factors

We are now ready to explore factor solutions with more factors. Three different methods suggested extracting at most 5 factors in both datasets (using nScree() from the nFactors package). I extracted solutions with 2 to 6 factors for each dataset, the last one included by accident. All of these used the oblique oblimin rotation, which allows the factors to be correlated. The prediction from a hierarchical model is clear: factors extracted in this way should be correlated. Figures 10 to 14 show the factor loadings of these solutions.

Figure 10 Figure 11 Figure 12 Figure 13

Figure 14

So it looks like the results were pretty good with 4 factors and not so good with the others. The problem with this method is that the factors extracted may be similar or identical but not in the same order or with the same name. This means that the plots above may plot the wrong factors together, which defeats the entire purpose. What we need is an automatic method of pairing up the factors correctly, if possible. The exhaustive method is to try all the pairings of factors for each number of factors extracted, and then calculate some summary metrics or find the best overall pairing combination. This would involve quite a lot of comparisons, since, e.g., two sets of 5 factors can be paired up one-to-one in 5! = 120 ways.
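The exhaustive approach fits in a few lines, which also shows why it scales badly: k factors can be matched one-to-one in k! ways (120 for 5, 720 for 6). A minimal sketch, assuming two loading matrices with variables in rows and factors in columns; comparing absolute congruences, because the sign of a factor is arbitrary, is an assumption of this sketch rather than something stated in the post.

```python
import itertools
import numpy as np


def congruence_matrix(A, B):
    """Tucker congruence between every column of A and every column of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    norms = np.outer(np.sqrt((A ** 2).sum(axis=0)), np.sqrt((B ** 2).sum(axis=0)))
    return (A.T @ B) / norms


def best_pairing_exhaustive(A, B):
    """Check every one-to-one pairing; return the one with the highest mean |congruence|.

    pairing[i] is the column of B matched to column i of A.
    """
    C = np.abs(congruence_matrix(A, B))
    k = C.shape[0]
    best = max(itertools.permutations(range(k)),
               key=lambda p: C[np.arange(k), list(p)].mean())
    return list(best), float(C[np.arange(k), list(best)].mean())


rng = np.random.default_rng(3)
A = rng.uniform(-0.2, 0.9, size=(10, 4))  # hypothetical loadings, 10 variables x 4 factors
B = A[:, [2, 0, 3, 1]]                     # same factors, extracted in a different order
print(best_pairing_exhaustive(A, B))
```

Here the reordered copy is recovered exactly, with a mean congruence of 1. With real solutions the best pairing will have lower values, and the number of permutations to check grows factorially with the number of factors.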

I settled for a quicker solution. For each pair of factor solutions, I calculated all the cross-analysis congruence coefficients. Then, for each factor, I found the factor from the other analysis with which it had the highest congruence and saved this information. This method can miss some okay but not great solutions, but I am not overly concerned about those. In a good fit, the factors found in each analysis should map one-to-one onto each other, such that each factor's highest congruence is with its analog from the other analysis.

From this information, I calculated the mean of the best congruence pairs, the minimum, and whether there was a mismatch. A mismatch occurs when two or more factors from one analysis map to (have their highest congruence with) the same factor from the other analysis. I calculated these three metrics for all the analyses performed above. The results are shown in Table 2.
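The quicker procedure and the three metrics can be sketched as follows. This is a minimal sketch, not the function used for the post: it assumes both solutions have the same number of factors, uses absolute congruences (factor signs are arbitrary), takes the mean and minimum over the first solution's best matches, and flags a mismatch if any factor is the best match of more than one factor, in either direction.

```python
import numpy as np


def compare_solutions(A, B):
    """Greedy best-match comparison of two loading matrices (variables x factors)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    norms = np.outer(np.sqrt((A ** 2).sum(axis=0)), np.sqrt((B ** 2).sum(axis=0)))
    C = np.abs(A.T @ B) / norms    # |Tucker congruence|, A factors x B factors
    best_for_a = C.argmax(axis=1)  # best B factor for each A factor
    best_for_b = C.argmax(axis=0)  # best A factor for each B factor
    best = C[np.arange(C.shape[0]), best_for_a]
    mismatch = (len(set(best_for_a.tolist())) < C.shape[0]
                or len(set(best_for_b.tolist())) < C.shape[1])
    return {"mean": float(best.mean()), "min": float(best.min()),
            "mismatch": bool(mismatch), "pairing": best_for_a.tolist()}


rng = np.random.default_rng(4)
A = rng.uniform(-0.2, 0.9, size=(10, 3))
print(compare_solutions(A, A[:, [1, 2, 0]]))  # reordered copy: no mismatch
print(compare_solutions(A, A[:, [0, 0, 2]]))  # two B factors are identical: mismatch
```

The returned pairing is what would be used to reorder factors before plotting; when a mismatch is flagged, no consistent reordering exists.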

Table 2 – Cross-analysis comparison metrics
Factors.extracted mean.congruence min.congruence factor.mismatch
2 0.825 0.73 FALSE
3 0.713 0.37 TRUE
4 0.960 0.93 FALSE
5 0.720 0.35 TRUE
6 0.765 0.58 FALSE


As can be seen, the two analyses with 4 factors were a very good match. Those with 3 and 5 factors were terrible, as they produced factor mismatches. The analyses with 2 and 6 factors were okay.

The function for going through all the oblique solutions for two samples also returns the information needed to match up the factors if they need reordering. If there is a mismatch, this operation is nonsensical, so I will not redo all the plots. The plot above with 4 factors just happens to be correctly ordered already, but this need not be the case. The only plot that needs to be redone is the one with 6 factors. It is shown in Figure 15.

Figure 15

Compare with Figure 14 above. One might wonder whether the 4 or the 6 factor solutions are the best. In this case, the answer is the 4 factor solutions, because the female 6 factor solution is improper: one factor loading is above 1 (“a Heywood case”). At present, given the relatively few regional measures and the limitation to volume and surface measures only, I would not put too much effort into theorizing about the multifactor structure found so far. It is merely a preliminary finding and may change drastically when more measures are added or measures are sampled differently.

A more important finding from all the multifactor solutions was that all produced correlated factors, which indicates a general factor.

Aggregate measures and the general brain size factor

So, the general brain size factor (GBSF) may exist, but is it useful? First, we may want to correlate the various aggregate variables. Results are in Table 3.

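Table 3 – Correlations between aggregate brain measures
Variable Total area Left area vol.WholeBrain vol.IntracranialVolume GBSF
Total area – 0.997 0.869 0.746 0.953
Left area 0.997 – 0.867 0.751 0.953
vol.WholeBrain 0.832 0.832 – 0.822 0.923
vol.IntracranialVolume 0.638 0.642 0.798 – 0.776
GBSF 0.950 0.950 0.905 0.711 –

Notes: Correlations above the diagonal are for males, below for females. Total area = total cortical surface area; Left area = total cortical surface area of the left hemisphere.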

The total areas of the brain are almost symmetrical: the correlation between total surface area and the left side only is a striking .997. Intracranial volume is a decent proxy (.822) for whole brain volume, but somewhat worse for total surface area (.746). The GBSF has very strong correlations with the surface areas (.95), but not quite as strong as in the analogous situation with cognitive data, where IQ scores and an extracted general factor (GCA factor) usually correlate about .99 given a reasonable sample of subtests. Ree and Earles (1991) reported that an average GCA factor correlated .991 with an unweighted sum score in a sample of >9k, and Kirkegaard (2014b) found a .99 correlation between an extracted GCA factor and an unweighted sum in a Dutch university sample of ~500.

Correlations with cognitive measures

The authors have data for 4 cognitive tests; however, data are only public for 2 of them. In the authors’ words, these are:

Flanker inhibitory control test (N = 1,074).
The NIH Toolbox Cognition Battery version of the flanker task was adapted from the Attention Network Test (ANT). Participants were presented with a stimulus on the center of a computer screen and were required to indicate the left-right orientation while inhibiting attention to the flankers (surrounding stimuli). On some trials the orientation of the flankers was congruent with the orientation of the central stimulus and on the other trials the flankers were incongruent. The test consisted of a block of 25 fish trials (designed to be more engaging and easier to see to make the task easier for children) and a block of 25 arrow trials, with 16 congruent and 9 incongruent trials in each block, presented in pseudorandom order. Participants who responded correctly on 5 or more of the 9 incongruent trials then proceeded to the arrows block. All children age 9 and above received both the fish and arrows blocks regardless of performance. The inhibitory control score was based on performance on both congruent and incongruent trials. A two-vector method was used that incorporated both accuracy and reaction time (RT) for participants who maintained a high level of accuracy (>80% correct), and accuracy only for those who did not meet this criteria. Each vector score ranged from 0 to 5, for a maximum total score of 10 (M = 7.67, s.d. = 1.86).
List sorting working memory test (N = 1,084).
This working memory measure requires participants to order stimuli by size. Participants were presented with a series of pictures on a computer screen and heard the name of the object from a speaker. The test was divided into the One-List and Two-List conditions. In the One-List condition, participants were told to remember a series of objects (food or animals) and repeat them in order, from smallest to largest. In the Two-List condition, participants were told to remember a series of objects (food and animals, intermixed) and then again report the food in order of size, followed by animals in order of size. Working memory scores consisted of combined total items correct on both One-List and Two-List conditions, with a maximum of 28 points (M = 17.71, s.d. = 5.39).

I could not locate a factor analytic study of the Flanker test, so I do not know how g-loaded it is. Working memory (WM) is known to have a strong relationship to GCA (Unsworth et al, 2014), so the WM variable should probably be expected to be the more g-loaded of the two. Given the hypothesis that brain size is a cause of GCA, the implication is that the WM test should show higher correlations with the brain measures. Figures X and X show the histograms for the cognitive measures.


Note that the x-values have no direct interpretation, as they are residualized values, not raw scores. The Flanker distribution is bimodal: it seems that a significant part of the sample did not understand the test and thus did very poorly. One should probably either remove them or use a non-parametric measure if one wanted to rely on this variable. I decided to remove them, since the sample was large enough that this was not a big problem. The procedure reduced the skew from -1.3/-1.1 to -.2/-.1 for the male and female samples, respectively, and reduced the sample sizes from 548/516 to 522/487.

One could plausibly combine the two tests into one measure, which would perhaps be a better estimate of GCA than either of them alone. This would be the case if their g-loadings were about the same. If, however, one is much more g-loaded than the other, combining them would degrade the measurement toward a middle level. I combined the two measures by first normalizing them (to put them on the same scale) and then averaging them.
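The normalize-then-average step and the skew check can be sketched as below. This is a minimal sketch on simulated scores, assuming normalization means z-scoring with the sample standard deviation; skewness here is the plain moment estimate, and other estimators apply small-sample corrections, so values may differ slightly from those reported. The function names are hypothetical.

```python
import numpy as np


def zscore(x):
    """Standardize to mean 0 and (sample) SD 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def skewness(x):
    """Moment estimate of skewness, m3 / m2**1.5, without small-sample correction."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def combine(a, b):
    """Unit-weighted composite: the average of two standardized scores."""
    return (zscore(a) + zscore(b)) / 2


rng = np.random.default_rng(5)
wm = rng.normal(size=500)
flanker = 0.4 * wm + rng.normal(size=500)  # hypothetical scores, r around .37
composite = combine(wm, flanker)
print(f"skew = {skewness(flanker):.2f}, SD of composite = {composite.std(ddof=1):.2f}")
```

The average of two z-scores has SD √((1 + r)/2) rather than 1, about .84 for r = .4, so the composite should be re-standardized if it is to be compared on the same scale as its parts.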

Given the very high correlations between the GBSF and the other aggregate measures in these data, one should not expect the GBSF to correlate much more strongly with the cognitive measures than the other aggregate brain measures do. Table X shows the correlations.

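Table X – Correlations between cognitive measures and aggregate brain size measures
Variable Males: WM Flanker Combined Females: WM Flanker Combined
Flanker 0.407 – – 0.393 – –
WM.Flanker.mean 0.830 0.847 – 0.824 0.845 –
Total area 0.302 0.138 0.235 0.236 0.201 0.237
Left area 0.302 0.137 0.235 0.239 0.203 0.238
vol.WholeBrain 0.263 0.103 0.201 0.158 0.120 0.146
vol.IntracranialVolume 0.213 0.101 0.170 0.154 0.101 0.137
GBSF 0.311 0.147 0.252 0.223 0.181 0.218

Notes: Combined = WM.Flanker.mean, the average of the normalized WM and Flanker scores. Total area = total cortical surface area; Left area = total cortical surface area of the left hemisphere.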


As for the GBSF, given that it is a ‘distillate’ (Jensen’s term), one would expect it to have slightly higher correlations with the cognitive measures than the merely unweighted ‘sum’ measures. This was the case for males, but not for females. In general, the female correlations were weaker, especially for whole brain volume x WM (.263 vs. .158). Despite the large sample sizes, this difference is not very certain: the 95% confidence interval for the difference is -.01 to .22. A larger sample is necessary to examine this question. The finding is intriguing in that, if real, it would pose an alternative solution to the Ankney-Rushton anomaly, that is, the fact that males have larger brains and brain size is related to IQ scores, yet males do not consistently perform better on IQ tests (Jackson and Rushton, 2006). Note, however, that the recent large meta-analysis of brain size x IQ studies did not find an effect of gender, so perhaps the above results are a coincidence (Pietschnig et al 2014).
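The post does not say how the confidence interval was obtained. One standard option for the difference between correlations from two independent samples is Zou's (2007) method, which combines Fisher-z intervals for each correlation; a minimal sketch using only the Python standard library follows. The sample sizes 548 and 516 are an assumption borrowed from the Flanker sample sizes reported above; the WM samples may differ slightly.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Confidence interval for a single correlation via the Fisher z transform."""
    z = math.atanh(r)
    half = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def diff_ci_independent(r1, n1, r2, n2, level=0.95):
    """Zou (2007) interval for r1 - r2 when the correlations come from separate samples."""
    l1, u1 = fisher_ci(r1, n1, level)
    l2, u2 = fisher_ci(r2, n2, level)
    d = r1 - r2
    lower = d - math.sqrt((r1 - l1) ** 2 + (u2 - r2) ** 2)
    upper = d + math.sqrt((u1 - r1) ** 2 + (r2 - l2) ** 2)
    return lower, upper


# Whole brain volume x WM: males r = .263, females r = .158 (n values assumed)
lo, hi = diff_ci_independent(0.263, 548, 0.158, 516)
print(f"95% CI for the difference: {lo:.2f} to {hi:.2f}")
```

With these inputs the interval comes out at roughly -.01 to .22, close to the values reported above.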

We also see that the total cortical area variables were stronger correlates of cognitive measures than whole brain volume, but a larger sample is necessary to confirm this pattern.

Lastly, we see a moderately strong correlation between the two cognitive measures (≈.4). The combined measure was a weaker correlate of the brain measures than WM alone, which is what one would expect if the Flanker test is a weaker measure of GCA than the WM test.

Correlations with parental education and income

It is time to revisit the results reported by the original authors, namely the correlations between educational/economic variables and brain measures. I think the correlations between specific brain regions and criterion variables are mostly a fishing expedition yielding chance results (multiple testing), and of no particular interest unless strong predictions can be made before looking at the data. For this reason, I present only correlations with the aggregate brain measures, as seen in Table X.

Table X – Correlations between educational/economic variables and other variables
Variable Males: ED ln_Inc Income Females: ED ln_Inc Income
WM 0.131 0.192 0.175 0.170 0.229 0.174
Flanker 0.163 0.180 0.188 0.118 0.131 0.106
WM.Flanker.mean 0.168 0.215 0.206 0.178 0.215 0.165
Total area 0.104 0.217 0.207 0.128 0.173 0.154
Left area 0.108 0.215 0.208 0.133 0.170 0.152
vol.WholeBrain 0.103 0.190 0.195 0.064 0.112 0.078
vol.IntracranialVolume 0.126 0.157 0.159 0.086 0.104 0.100
GBSF 0.109 0.206 0.204 0.100 0.157 0.137
ED – 0.559 0.542 – 0.561 0.513
ln_Inc 0.559 – 0.866 0.561 – 0.855
Income 0.542 0.866 – 0.513 0.855 –

Notes: ED = parental education; ln_Inc = natural log of family income; Income = family income. Total area = total cortical surface area; Left area = total cortical surface area of the left hemisphere.


Here, the correlations of the combined cognitive measure were higher than those of WM, unlike before, so perhaps the earlier diagnosis was wrong. In general, the correlations between income and brain measures were stronger than those for education. This is despite the fact that GCA is more strongly correlated with educational attainment than with income. However, this pattern did not hold in this sample: the correlations of WM and Flanker were stronger with the economic variables. Perhaps there is more range restriction in the educational variable than in the income one. An alternative environmental interpretation is that affluence causes larger brains.

If we recall the theoretical predictions for the strength of the correlations, the income correlations are stronger than expected (actual .19/.09 M/F, predicted about .05), while the education correlations are a bit weaker than expected (actual .10/.06, predicted about .13). However, the sample sizes are not large enough for these results to be certain enough to question the theory.

Racial admixture

To my surprise, the sample had racial admixture data. This is surprising because such data have been available for testing the genetic hypothesis of group differences for many years, apparently without anyone publishing anything on the issue. As I have argued elsewhere, this is odd given that a good dataset would be able to decisively settle the ‘race and intelligence’ controversy (Dalliard, 2014; Rowe and Rodgers, 2005; Rushton and Jensen, 2005). The absence of such publications is actually very good evidence for the genetic hypothesis, because if the hypothesis were false and these datasets showed it, it would have been a great accomplishment for a mainstream scientist to publish a paper decisively showing so. However, if it were true, no mainstream scientist could publish it without risking personal assaults, getting fired and possibly being taken to court, as happened to academics who previously researched the topic (Gottfredson, 2005; Intelligence 1998; Nyborg 2011; 2003).

The genomic data, however, appeared to be an either/or (1 or 0) variable in the released data files. Oddly, some persons had no value for any racial group. It turns out that the data were merely rounded in the spreadsheet file. This explained why some persons had 0 for all groups: these persons did not have at least 50% ancestry from any single racial group, so every value was rounded to 0.

I can think of two ways to count the number of persons in the main categories. One can count the total ‘summed’ persons. In this way, if person A has 50% ancestry from race R, and person B has 30%, this sums to .8 persons. One can think of it as the number of unmixed persons’ worth of ancestry from that group. Another way is to count everybody above some threshold of ancestry as 1. I chose 20% and 80% as thresholds, which correspond to persons with substantial ancestry from that racial cluster and persons with mostly ancestry from that cluster, respectively. One could choose other values, of course, and there is a degree of arbitrariness, but the particular values are not important.
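Both counting schemes are one-liners once the unrounded ancestry proportions sit in a persons × groups matrix. A minimal sketch with a made-up three-person, three-group example; the strict > comparisons mirror the >20% and >80% labels in the table, and the function name is hypothetical.

```python
import numpy as np


def ancestry_counts(P, low=0.2, high=0.8):
    """P: persons x groups matrix of ancestry proportions, each row summing to about 1.

    Returns the summed ancestry per group ('persons' worth of ancestry) and the
    number of persons above each threshold.
    """
    P = np.asarray(P, dtype=float)
    summed = P.sum(axis=0)
    above_low = (P > low).sum(axis=0)
    above_high = (P > high).sum(axis=0)
    return summed, above_low, above_high


# Hypothetical persons; columns are three ancestry groups
P = np.array([[0.50, 0.50, 0.00],
              [0.30, 0.70, 0.00],
              [0.90, 0.05, 0.05]])
summed, low, high = ancestry_counts(P)
print(summed, low, high)
```

Since each row sums to 1, the summed counts across groups add up to the number of persons, which is the check made on the Sum column of the first row of the table.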

Results are in Table X.

Racial group European African Amerindian East Asian Oceanian Central Asian Sum
Summed ancestry ‘persons’ 686.364 134.2714 48.31457 163.49868 8.59802 26.95408 1068.00075
Persons with >20% 851 187 89 238 8 30 1403
Persons with >80% 647 105 3 121 0 21 897


Note that 1068 is exactly the number of persons in the complete sample, which means that the summed ancestry across all groups is off by a mere .00075.

Another way of understanding the data is to plot histograms of each racial group. These are shown below in Figures X to X.

Histograms of European, African, Amerindian, East Asian, Oceanian and Central Asian ancestry


Since European ancestry is the largest component, the other plots are mostly empty except for the 0% bar. Still, we do see a fair amount of admixture in the dataset.

Regression, residualization, correlation and power

There are a couple of different methods one could use to examine the admixture data. A simple correlation is justified when dealing with a group that has only 2 sources of ancestry. This is the easiest case to handle. For it to work, the source groups must have different genotypic means for the trait in question (GCA and brain size variables in this case), and there must be a substantially admixed population. Even given a large hypothesized genotypic difference, the expected correlation is actually quite small, because the variation in ancestry within the admixed group is limited. For African Americans (such as those in the sample), European ancestry is about 15-25% depending on the exact sample. The standard deviation of their European ancestry% is not always reported, but one can calculate it given data, which we have.
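Why the expected correlation is small can be made concrete with a simple additive model, which is an assumption of this sketch: trait = d × ancestry + noise, where d is the difference between the two source populations' genotypic means in units of the noise SD. The correlation is then d·SD(ancestry)/SD(trait), and the sample size needed to detect it follows from the usual Fisher-z approximation. The inputs (d = 1, SD of ancestry = .10) are hypothetical.

```python
import math
from statistics import NormalDist


def expected_r(d, sd_ancestry, sd_noise=1.0):
    """Correlation between ancestry proportion and trait under trait = d * x + e."""
    var_trait = (d * sd_ancestry) ** 2 + sd_noise ** 2
    return d * sd_ancestry / math.sqrt(var_trait)


def n_for_power(r, alpha=0.05, power=0.80):
    """Approximate N for a two-sided test of r != 0 (Fisher z approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(((z_a + z_b) / math.atanh(r)) ** 2 + 3)


r = expected_r(d=1.0, sd_ancestry=0.10)  # hypothetical inputs
n = n_for_power(r)
print(f"expected r = {r:.3f}; N for 80% power = {n}")
```

Under these made-up inputs the expected correlation is about .10, and roughly 800 persons would be needed for 80% power at α = .05, which is why a large admixed sample is required.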

The first problem with this dataset is that there are no sociological race categories (“white”, “African American”, “Hispanic”, “Asian” etc.), only genomic data. This means that to get an African American subsample, we must create one based on actual ancestry. Two criteria need to be met for inclusion in that group: 1) the person must be substantially African, and 2) the person must be mostly a mix of European and African ancestry. Going with the values from before, this means that the person must be at least 20% African and at least 80% combined European and African.
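The two inclusion criteria translate directly into a boolean mask. A minimal sketch, assuming unrounded ancestry proportions in two arrays; the function name and the four example persons are hypothetical, and the thresholds are the 20%/80% values from above.

```python
import numpy as np


def admixed_subsample(african, european, min_african=0.2, min_two_way=0.8):
    """True for persons who are substantially African and mostly African + European."""
    african = np.asarray(african, dtype=float)
    european = np.asarray(european, dtype=float)
    return (african >= min_african) & (african + european >= min_two_way)


african = np.array([0.80, 0.10, 0.50, 0.30])
european = np.array([0.15, 0.90, 0.20, 0.55])
mask = admixed_subsample(african, european)
print(mask)                            # which persons are included
print(np.std(european[mask], ddof=1))  # SD of European ancestry within the subsample
```

The second line gives the within-subsample standard deviation of European ancestry mentioned above, computed only over the included persons.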

Dealing with scanner and site

There are a variety of ways to use the data, and they may or may not give similar results. First is the question of which variables to control for. In the earlier sections of this paper, I controlled for age, age², age³ and scanner (12 different). For producing the above ancestry plots and results, I did not control for anything. Controlling the ancestry variables for scanner is problematic, because people of different races live in different places and the scanners are located in different places: controlling for scanner would thus remove genuine racial differences along with the scanner effects. One could similarly control for the site where the scanner is located (I did not do this earlier). We can compare site with scanner using a contingency table, as shown in Table X below:

Table X – Contingency table of scanner site and scanner #
Site/scanner 0 1 2 3 4 5 6 7 8 9 10 11 12
Cornel 0 0 0 0 0 0 0 0 0 0 0 96 0
Davis 0 0 0 0 0 0 0 114 0 0 0 0 0
Hawaii 0 0 0 0 0 0 202 0 0 0 0 0 0
KKI 0 0 0 103 0 0 0 0 0 0 0 0 0
MGH 0 0 0 0 0 115 0 0 0 13 0 0 0
UCLA 0 0 0 0 10 0 0 0 0 0 27 0 22
UCSD 109 93 0 0 0 0 0 0 0 0 0 0 0
UMMS 0 0 56 0 0 0 0 0 0 0 0 0 0
Yale 0 0 0 0 0 0 0 0 108 0 0 0 0


As we can see, these are clearly interdependent, since each scanner has a particular location and was not moved around (every scanner column has only one cell with a value above 0). Some sites, however, have multiple scanners, while others have only one. E.g. UCSD has two scanners (#0 and #1), while KKI has only one (#3).
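The nesting claim (each scanner appears at exactly one site) can be checked mechanically from the raw site and scanner columns. A minimal sketch in plain Python; the (site, scanner) pairs are a small hypothetical subset echoing the table, and the helper names are hypothetical.

```python
from collections import defaultdict


def sites_per_scanner(pairs):
    """Map each scanner ID to the set of sites where it occurs."""
    seen = defaultdict(set)
    for site, scanner in pairs:
        seen[scanner].add(site)
    return dict(seen)


def scanner_nested_in_site(pairs):
    """True if no scanner occurs at more than one site."""
    return all(len(sites) == 1 for sites in sites_per_scanner(pairs).values())


pairs = [("UCSD", 0), ("UCSD", 1), ("KKI", 3), ("MGH", 5), ("MGH", 9), ("UCSD", 0)]
print(scanner_nested_in_site(pairs))                 # True
print(scanner_nested_in_site(pairs + [("KKI", 0)]))  # False: scanner 0 at two sites
```

If the check passes, site is fully determined by scanner, so controlling for scanner already absorbs any site effect.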

Controlling for scanner does, however, make sense when looking at brain size variables, as this removes differences between measurements due to differences in scanning equipment or (post-)processing. So perhaps one would want to control the brain measurements for scanner and age effects, but control the remaining variables only for age effects.
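The split suggested here (brain measures on age and scanner, other variables on age only) amounts to using two design matrices. A minimal sketch, assuming OLS with a cubic in age and one dummy per scanner apart from a reference level; the function names and simulated data are hypothetical.

```python
import numpy as np


def design(age, scanner=None, degree=3):
    """Intercept + centered age powers, optionally plus scanner dummies."""
    a = np.asarray(age, dtype=float)
    a = a - a.mean()
    cols = [a ** k for k in range(degree + 1)]
    if scanner is not None:
        scanner = np.asarray(scanner)
        for level in np.unique(scanner)[1:]:  # first level is the reference category
            cols.append((scanner == level).astype(float))
    return np.column_stack(cols)


def residualize(y, X):
    """Residuals of y after an OLS fit on the columns of X."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


rng = np.random.default_rng(6)
n = 600
age = rng.uniform(3, 20, size=n)
scanner = rng.integers(0, 4, size=n)
offset = np.array([0.0, 30.0, -20.0, 10.0])[scanner]  # hypothetical scanner effects
brain = np.minimum(age, 10) * 50 + offset + rng.normal(0, 40, size=n)
wm = 0.1 * age + rng.normal(size=n)                    # a non-brain variable

brain_resid = residualize(brain, design(age, scanner))  # age + scanner
wm_resid = residualize(wm, design(age))                 # age only
print(np.allclose([brain_resid[scanner == s].mean() for s in range(4)], 0.0))  # True
```

With an intercept plus dummies, the brain residuals average exactly zero within every scanner, so scanner offsets are removed, while the non-brain variable keeps any between-scanner differences.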

Dealing with gender

As before

To be continued…



  • Ashton, M. C., Lee, K., & Visser, B. A. (2014a). Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012). Personality and Individual Differences, 60, 3-7.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014b). Orthogonal factors of mental ability? A response to Hampshire et al. Personality and Individual Differences, 60, 13-15.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014c). Further response to Hampshire et al. Personality and Individual Differences, 60, 18-19.
  • Beaujean, A. A. (2014). Latent Variable Modeling Using R: A Step-by-Step Guide. Routledge.
  • Bouchard Jr, T. J. (2014). Genes, Evolution and Intelligence. Behavior genetics, 44(6), 549-577.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Colom, R., & Thompson, P. M. (2011). Intelligence by Imaging the Brain. The Wiley-Blackwell handbook of individual differences, 3, 330.
  • Cudeck, R., & MacCallum, R. C. (Eds.). (2012). Factor analysis at 100: Historical developments and future directions. Routledge.
  • Dalliard, M. (2013). Is Psychometric g a Myth?. Human Varieties.
  • Dalliard, M. (2014). The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ. Open Differential Psychology.
  • Deary, I. J., & Caryl, P. G. (1997). Neuroscience and human intelligence differences. Trends in Neurosciences, 20(8), 365-371.
  • Deary, I. J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201-211.
  • Dekaban, A. S., & Sadowsky, D. (1978). Changes in brain weights during the span of human life: relation of brain weights to body heights and body weights. Annals of Neurology, 4, 345-356.
  • Euler, M. J., Weisend, M. P., Jung, R. E., Thoma, R. J., & Yeo, R. A. (2015). Reliable Activation to Novel Stimuli Predicts Higher Fluid Intelligence. NeuroImage.
  • Frangou, S., Chitins, X., & Williams, S. C. (2004). Mapping IQ and gray matter density in healthy young people. Neuroimage, 23(3), 800-805.
  • Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., … & Rapoport, J. L. (1999). Brain development during childhood and adolescence: a longitudinal MRI study. Nature neuroscience, 2(10), 861-863.
  • Gottfredson, L. S. (2005). Suppressing intelligence research: Hurting those we intend to help. In R. H. Wright & N. A. Cummings (Eds.), Destructive trends in mental health: The well-intentioned path to harm (pp. 155-186). New York: Taylor and Francis.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014a). A comment on “Fractionating Intelligence” and the peer review process. Intelligence, 46, 323-332.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014b). Yes, but flaws remain. Intelligence, 46, 341-344.
  • Hampshire, A., Highfield, R. R., Parkin, B. L., & Owen, A. M. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237.
  • Hampshire, A., Parkin, B., Highfield, R., & Owen, A. M. (2014). Response to: “Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012)”. Personality and Individual Differences, 60, 8-12.
  • Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world?. Behavioral and brain sciences, 33(2-3), 61-83.
  • Intelligence. (1998). Special issue dedicated to Arthur Jensen. Volume 26, Issue 3.
  • Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general mental ability from 100,000 17-to 18-year-olds on the Scholastic Assessment Test. Intelligence, 34(5), 479-486.
  • Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.
  • Jensen, A. R. (1997). The psychometrics of intelligence. In H. Nyborg (Ed.), The scientific study of human nature: Tribute to Hans J. Eysenck at eighty (pp. 221-239). New York: Elsevier.
  • Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Praeger.
  • Jensen, A. R. (2002). Psychometric g: Definition and substantiation. The general factor of intelligence: How general is it, 39-53.
  • Jung, R. E. & Haier, R. J. (2007). The Parieto-Frontal Integration Theory (P-FIT) of intelligence: converging neuroimaging evidence. Behav. Brain Sci. 30, 135–154; discussion 154–187.
  • Jung, R. E. et al. (2009). Imaging intelligence with proton magnetic resonance spectroscopy. Intelligence 37, 192–198.
  • Jöreskog, K. G. (1996). Applied factor analysis in the natural sciences. Cambridge University Press.
  • Kim, S. E., Lee, J. H., Chung, H. K., Lim, S. M., & Lee, H. W. (2014). Alterations in white matter microstructures and cognitive dysfunctions in benign childhood epilepsy with centrotemporal spikes. European Journal of Neurology, 21(5), 708-717.
  • Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology.
  • Li, Y. et al. (2009). Brain anatomical network and intelligence. PLoS Comput. Biol. 5, e1000395.
  • Navas‐Sánchez, F. J., Alemán‐Gómez, Y., Sánchez‐Gonzalez, J., Guzmán‐De‐Villoria, J. A., Franco, C., Robles, O., … & Desco, M. (2014). White matter microstructure correlates of mathematical giftedness and intelligence quotient. Human brain mapping, 35(6), 2619-2631.
  • Neubauer, A. C. & Fink, A. (2009). Intelligence and neural efficiency. Neurosci. Biobehav. Rev. 33, 1004–1023.
  • Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., … & Sowell, E. R. (2015). Family income, parental education and brain structure in children and adolescents. Nature Neuroscience.
  • Nyborg, H. (2003). The sociology of psychometric and bio-behavioral sciences: A case study of destructive social reductionism and collective fraud in 20th century academia. Nyborg H.(Ed.). The scientific study of general intelligence. Tribute to Arthur R. Jensen, 441-501.
  • Nyborg, H. (2011). The greatest collective scientific fraud of the 20th century: The demolition of differential psychology and eugenics. Mankind Quarterly, Spring Issue.
  • Pennington, B. F., Filipek, P. A., Lefly, D., Chhabildas, N., Kennedy, D. N., Simon, J. H., … & DeFries, J. C. (2000). A twin MRI study of size variations in the human brain. Journal of Cognitive Neuroscience, 12(1), 223-232.
  • Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2014). Meta-Analysis of Associations Between Human Brain Volume And Intelligence Differences: How Strong Are They and What Do They Mean?. Available at SSRN 2512128.
  • Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3), 271-278.
  • Rowe, D. C., & Rodgers, J. E. (2005). Under the skin: On the impartial treatment of genetic and environmental hypotheses of racial differences. American Psychologist, 60(1), 60.
  • Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, public policy, and law, 11(2), 235.
  • Shaw, P. et al. (2006). Intellectual ability and cortical development in children and adolescents. Nature 440, 676–679.
  • Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., … & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. Bmj, 338, b2393.
  • Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401-426.
  • Turken, A. et al. (2008). Cognitive processing speed and the structure of white matter pathways: convergent evidence from normal variation and lesion studies. Neuroimage 42, 1032–1044.
  • Unsworth, N., Fukuda, K., Awh, E., & Vogel, E. K. (2014). Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval. Cognitive psychology, 71, 1-26.



I analyzed the S factor in US states by compiling a dataset of 25 diverse socioeconomic indicators. Results show that Washington DC is a strong outlier, but if it is excluded, the S factor correlates strongly with state IQ (.75).

Ethnoracial demographics of the states are related to the states’ IQ and S in the expected order (White > Hispanic > Black).


Introduction and data sources

In my previous two posts, I analyzed the S factor in 33 Indian states (Kirkegaard, 2015a) and 31 Chinese regions (Kirkegaard, 2015b). In both samples I found strongish S factors and they both correlated positively with cognitive estimates (IQ or G). In this post I used cognitive data from McDaniel (2006). He gives two sets of estimated IQs based on SAT-ACT and on NAEP. Unfortunately, they only correlate .58, so at least one of them is not a very accurate estimate of general intelligence.

His article also reports some correlations between these IQs and socioeconomic variables: Gross State Product per capita, median income and percent poverty. However, data for these variables are not given in the article, so I did not use them. I am not sure where his data came from.

However, with cognitive data like these and a relatively large number of datapoints (50 or 51, depending on whether the District of Columbia is included), it is possible to do a rather good study of the S factor and its correlates. High-quality data for US states are readily available, so results should be strong. Factor analysis requires a case-to-variable ratio of at least 2:1 to deliver reliable results (Zhao, 2009), so with 50 or 51 cases, one can do an S factor analysis with about 25 variables.

Thus, I set out to find about 25 diverse socioeconomic variables. There are two reasons to gather a very diverse sample of variables. First, for the method of correlated vectors to work (Jensen, 1998), there must be variation in the indicators’ loadings on the factor; lack of variation causes restriction-of-range problems. Second, lack of diversity in the indicators of a latent variable leads to psychometric sampling error (Jensen & Weng, 1994; see the review post here for general intelligence measures).

My primary source was The 2012 Statistical Abstract website. I simply searched for “state” and picked various measures. I tried to pick things that weren’t too dependent on geography. For example, kilometers of coastline per capita would be a very bad choice, since it is not socioeconomic and is almost entirely (near 100%) determined by geography. To increase reliability, I generally took all available data for up to the last 10 years and averaged them. Curious readers should see the data file for details.
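The averaging step can be made concrete with a minimal R sketch. It is not the code from the supplementary material. It assumes a hypothetical wide layout with one row per state and one column per year, with names like `murder_2009`. The helper name `average_years` is illustrative and the numbers are toy values.

```r
# Minimal sketch: average one indicator over the available years, per state.
# Assumes a hypothetical wide layout: one row per state, one column per year,
# with names like "murder_2009", "murder_2010", ...
average_years <- function(df, prefix) {
  cols <- grep(paste0("^", prefix, "_[0-9]{4}$"), names(df), value = TRUE)
  # na.rm = TRUE lets a state with a missing year still get an average
  rowMeans(df[, cols, drop = FALSE], na.rm = TRUE)
}

# Toy values, not real data
states <- data.frame(
  murder_2009 = c(5.1, 2.0, NA),
  murder_2010 = c(4.9, 2.2, 7.0),
  row.names = c("Alabama", "Alaska", "Arizona")
)
states$murder_avg <- average_years(states, "murder")
print(states)
```

A state with no data in any year would get `NaN`, which can then be treated as missing data.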

I ended up with the following variables:

  1. Murder rate per 100k, 10 years
  2. Proportion with high school or more education, 4 years
  3. Proportion with bachelor or more education, 4 years
  4. Proportion with advanced degree or more, 4 years
  5. Voter turnout, presidential elections, 3 years
  6. Voter turnout, house of representatives, 6 years
  7. Percent below poverty, 10 years
  8. Personal income per capita, 1 year
  9. Percent unemployed, 11 years
  10. Internet usage, 1 year
  11. Percent smokers, male, 1 year
  12. Percent smokers, female, 1 year
  13. Physicians per capita, 1 year
  14. Nurses per capita, 1 year
  15. Percent with health care insurance, 1 year
  16. Percent in ‘Medicaid Managed Care Enrollment’, 1 year
  17. Proportion of population urban, 1 year
  18. Abortion rate, 5 years
  19. Marriage rate, 6 years
  20. Divorce rate, 6 years
  21. Incarceration rate, 2 years
  22. Gini coefficient, 10 years
  23. Top 1%, proportion of total income, 10 years
  24. Obesity rate, 1 year

Most of these are self-explanatory. For the economic inequality measures, I found 6 different measures (here). Since I wanted diversity, I chose the Gini coefficient and the top 1% income share, because these two correlated the least with each other and are well known.

Aside from the above, I also fetched the racial proportions for each state, to see how they relate to the S factor (and to the various measures above; to get those, run the analysis yourself).

I used R with RStudio for all analyses. Source code and data are available in the supplementary material.

Missing data

In large analyses like this, there are nearly always some missing data. The matrixplot() from the VIM package looks like this:


(It does not seem possible to change the font size, so I have cut off the names at the 8th character.)

We see that there aren’t many missing values. I imputed all the missing values with the VIM package (deterministic imputation using multiple regression).
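Here is a simplified base-R sketch of what deterministic regression imputation does. It is not VIM’s algorithm, since VIM’s `irmi()` is more elaborate. The sketch starts from column means. It then repeatedly replaces each originally missing cell with the prediction from a regression on the other variables, without adding noise. The function name `impute_regression`, the number of iterations and the toy data are assumptions made for illustration.

```r
# Simplified deterministic regression imputation: a stand-in for
# VIM::irmi(), not a reimplementation of it. Assumes numeric columns.
impute_regression <- function(df, iterations = 5) {
  miss <- is.na(df)
  out <- df
  to_fill <- which(colSums(miss) > 0)
  # Initial guess: fill each gap with its column mean
  for (j in to_fill) {
    out[miss[, j], j] <- mean(df[[j]], na.rm = TRUE)
  }
  for (iter in seq_len(iterations)) {
    for (j in to_fill) {
      target <- names(out)[j]
      # Fit on the rows where the target was actually observed...
      fit <- lm(reformulate(setdiff(names(out), target), response = target),
                data = out[!miss[, j], , drop = FALSE])
      # ...then overwrite only the originally missing cells (no noise added)
      out[miss[, j], j] <- predict(fit, newdata = out[miss[, j], , drop = FALSE])
    }
  }
  out
}

set.seed(1)
x <- rnorm(30)
toy <- data.frame(x = x,
                  y = 2 * x + rnorm(30, sd = 0.1),
                  z = -x + rnorm(30, sd = 0.1))
toy$y[c(3, 7)] <- NA
filled <- impute_regression(toy)
print(round(filled[c(3, 7), ], 2))
```

Because no noise is added, running it twice on the same data gives identical results. This is the sense in which the imputation is deterministic.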

Extreme values

A useful feature of the matrixplot() is that it shows each variable’s values on a grey scale, so relative outliers stand out. We can see that some variables have hefty outliers, which may be data errors. Therefore, I examined them.

The outlier in the two university degree variables is DC, presumably because the federal government is based there and there is a huge lobbying center. For the marriage rate, the outlier is Nevada: many people go there to get married. The outlier in physician and nurse rates is also DC, presumably for the same reason (maybe one could make up some story about how politics causes health problems!).
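A simple numeric complement to eyeballing the plot is to list every cell whose z-score exceeds some cutoff. The sketch below is only an illustration. The function `flag_outliers`, the |z| > 3 rule of thumb and the toy data are assumptions, not what the post did.

```r
# Minimal sketch: list cells whose column-wise z-score exceeds a cutoff.
flag_outliers <- function(df, cutoff = 3) {
  z <- scale(df)  # column-wise z-scores; NAs are skipped per column
  hits <- which(abs(z) > cutoff, arr.ind = TRUE)
  data.frame(case = rownames(df)[hits[, "row"]],
             variable = colnames(df)[hits[, "col"]],
             z = round(z[hits], 2),
             stringsAsFactors = FALSE)
}

# Toy data: 20 ordinary "states" plus one extreme case
toy <- data.frame(degree = c(rep(c(10, 11, 12, 13), 5), 60),
                  row.names = c(paste0("S", 1:20), "DC"))
flagged <- flag_outliers(toy)
print(flagged)
```

With real data, each flagged cell would still need to be checked by hand, to decide whether it is a data error or a genuine case like DC.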

After imputation, the matrixplot() looks like this:


It is pretty much the same as before, which means that the imputation did not substantially change the data. Good!

Factor analyzing the data

Then we factor analyze the data (socioeconomic data only). We plot the loadings (sorted) with a dotplot:


We see a wide spread of variable loadings. All but two of them load in the expected direction (socially valued outcomes positively, the opposite negatively), showing the existence of the S factor. The first ‘exception’ is abortion rate, which loads +.60 although abortion is often seen as a negative thing. This is, however, open to discussion: maybe higher abortion rates can be interpreted as less backward religiousness or more freedom for women (both good in my view). The other is marriage rate at -.19 (a weak loading). I’m not sure how to interpret that. In any case, for both of these variables it is debatable which direction is the desirable one.
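The factor-analysis step can be sketched in a few lines. This sketch swaps in base R’s `factanal()` (maximum likelihood) for the minres method used in the post, so loadings on real data would differ slightly. The helper `s_loadings`, its `anchor` argument and the simulated toy indicators are hypothetical. Because a factor’s sign is arbitrary, the helper orients it so that a chosen indicator loads positively. This is the same kind of reversal applied to the loadings later in the post.

```r
# One-factor solution via base R's factanal() (maximum likelihood),
# used here as a stand-in for the minres factor analysis in the post.
s_loadings <- function(df, anchor) {
  fit <- factanal(df, factors = 1)
  l <- fit$loadings[, 1]
  # The sign of a factor is arbitrary: orient it so `anchor` loads positively
  if (l[[anchor]] < 0) l <- -l
  sort(l)
}

set.seed(42)
g <- rnorm(50)  # latent "S" for 50 toy states
toy <- data.frame(
  education = g + rnorm(50, sd = 0.5),
  income    = g + rnorm(50, sd = 0.5),
  internet  = g + rnorm(50, sd = 0.7),
  poverty   = -g + rnorm(50, sd = 0.5),
  murder    = -g + rnorm(50, sd = 0.8)
)
loads <- s_loadings(toy, anchor = "education")
print(round(loads, 2))
```

Calling `dotchart(loads)` then draws a sorted dot plot of the kind shown above.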

Correlations with cognitive measures

And now comes the big question: does state S correlate with our IQ estimates? It does. The correlations are .14 (SAT-ACT) and .43 (NAEP). These are fairly low given our expectations. Perhaps we can work out what is happening if we plot them:


Now we can see what is going on. First, the SAT-ACT estimates are pretty strange for three states: California, Arizona and Nevada. These are three adjacent states, so quite possibly some kind of regional testing practice is throwing off the estimates. If someone knows, let me know. Second, DC is a huge outlier in S, as we might have expected from the short discussion of extreme values above. It is basically a city-state that is about half low-SES African Americans and half upper class related to government.

Dealing with outliers – Spearman’s correlation, aka rank-order correlation

There are various ways to deal with outliers. One simple way is to convert the data into ranks and correlate those as usual. Pearson’s correlation is sensitive to outliers, and its significance test assumes normally distributed data, which is often not the case with higher-level data (states, countries). Using rank-order correlations gets us these:

(Scatterplots: S vs. SAT-ACT IQ and S vs. NAEP IQ, rank-ordered.)

So the correlations improved a lot for the SAT-ACT IQs and a bit for the NAEP ones.
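Rank-order correlation is easy to check in R, because Spearman’s rho is just Pearson’s r computed on the ranks. The simulated data below are made up for illustration, including the single DC-like outlier.

```r
set.seed(7)
iq <- rnorm(50, mean = 100, sd = 3)
s  <- 0.1 * (iq - 100) + rnorm(50, sd = 0.3)
# Add one DC-like case: roughly average IQ but an extreme S value
iq <- c(iq, 99)
s  <- c(s, 5)

pearson  <- cor(iq, s)
spearman <- cor(iq, s, method = "spearman")
by_hand  <- cor(rank(iq), rank(s))  # Pearson's r on the ranks
# An outlier like this usually pulls Pearson's r down much more than rho
print(round(c(pearson = pearson, spearman = spearman), 2))
```

On ranks, the outlier can be at most one position above the next case, which is why its influence shrinks.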

Results without DC

Another idea is simply excluding the strange DC case, and then re-running the factor analysis. This procedure gives us these loadings:


(I have reversed the signs of the loadings, because the factor came out reversed, e.g. with education loading negatively.)

These are very similar to before: excluding DC did not substantially change the results (good). Actually, the factor is a bit stronger without DC throwing off the results (using minres, proportion of variance explained = 36% vs. 30%). This happens because DC is an odd case, scoring very high on some indicators (e.g. education) and very poorly on others (e.g. murder rate).
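For a one-factor solution on standardized variables, the proportion of variance explained is the sum of squared loadings divided by the number of indicators. A case like DC, which pulls some indicators up and others down, weakens the shared variance. That lowers the loadings and therefore this proportion. A minimal sketch with made-up loadings; `prop_var` is a hypothetical helper name.

```r
# Proportion of variance explained by a single factor:
# sum of squared loadings divided by the number of indicators.
prop_var <- function(loadings) sum(loadings^2) / length(loadings)

prop_var(c(0.8, 0.6, -0.5, 0.3))  # (0.64 + 0.36 + 0.25 + 0.09) / 4 = 0.335
```

Signs do not matter here, since the loadings are squared.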

The correlations are:


So, not surprisingly, we see an increase in the effect sizes from before: .14 to .31 and .43 to .69.

Without DC and rank-order

Still, one may wonder what the results would be with rank-order and DC removed. Like this:


So compared to before, the effect size increased for the SAT-ACT IQ and decreased slightly for the NAEP IQ.

Now, one could also weight the states by some metric of population size, and this may change the results further. Still, I think it is safe to say that the cognitive measures correlate with S in the expected direction. With one strange case removed, the better measure performs at about the expected level, with or without rank-order correlations.
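One way to do population weighting is a weighted Pearson correlation. Here is a minimal sketch using base R’s `stats::cov.wt()`. The helper name `weighted_cor` and the data are hypothetical, and using state population as the weight is an assumption.

```r
# Weighted Pearson correlation via stats::cov.wt()
weighted_cor <- function(x, y, w) {
  cov.wt(cbind(x, y), wt = w / sum(w), cor = TRUE)$cor[1, 2]
}

set.seed(2)
x <- rnorm(10)
y <- x + rnorm(10)
weighted_cor(x, y, w = rep(1, 10))  # equal weights reproduce cor(x, y)
```

With real data this would be called as `weighted_cor(S, naep_iq, w = population)`. All three are hypothetical names for per-state vectors in the same order. Large states would then count more than small ones.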

Method of correlated vectors

The MCV (Jensen, 1998) can be used to test whether a specific latent variable underlying some data is responsible for the observed correlation between the factor score (or a factor score approximation such as IQ, an unweighted sum) and some criterion variable. Although it was originally invented for use on cognitive test data and the general intelligence factor, I have previously used it in other areas (e.g. Kirkegaard, 2014). I also used it in the previous study of India (Kirkegaard, 2015a). I did not use it in the study of China, because the loadings of the socioeconomic variables on the S factor varied too little.

Using the dataset without DC, the MCV result for the NAEP dataset is:


So, again we see that MCV can reach high r’s when there is a large number of diverse variables. But note that the value can be considered inflated by the negative loadings of some variables. These widen the range of both vectors, since a negatively loading indicator also tends to correlate negatively with the criterion. It is debatable whether one should reverse them.
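MCV correlates two vectors: each indicator’s loading on the factor, and that indicator’s correlation with the criterion. Here is a minimal sketch with an optional flag that reverses negatively loading indicators. Reversing flips the sign of both the loading and the criterion correlation. The function `mcv`, its `reverse_negative` flag and the simulated data (generated from known loadings) are hypothetical.

```r
# Method of correlated vectors: correlate the factor loadings with the
# indicator x criterion correlations.
mcv <- function(indicators, criterion, loadings, reverse_negative = FALSE) {
  r_crit <- sapply(indicators, function(v)
    cor(v, criterion, use = "pairwise.complete.obs"))
  l <- loadings[names(r_crit)]
  if (reverse_negative) {
    # Reversing an indicator flips both its loading and its criterion r
    neg <- l < 0
    l[neg] <- -l[neg]
    r_crit[neg] <- -r_crit[neg]
  }
  cor(l, r_crit)
}

set.seed(3)
n <- 2000
g <- rnorm(n)
lambda <- c(a = 0.9, b = 0.7, c = 0.5, d = 0.3, e = -0.4, f = -0.6)
ind <- as.data.frame(sapply(lambda, function(l) l * g + sqrt(1 - l^2) * rnorm(n)))
crit <- g + rnorm(n)
print(c(raw = mcv(ind, crit, lambda),
        reversed = mcv(ind, crit, lambda, reverse_negative = TRUE)))
```

With real data, the reversed value can be noticeably lower. The extra spread contributed by the negative loadings disappears, which is the inflation issue discussed above.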

Racial proportions of states and S and IQ

A last question is whether the states’ racial proportions predict their S score and their IQ estimate. There are lots of problems with this. First, the actual genomic proportions within these racial groups vary by state (Bryc, 2015). Second, within ‘pure-breed’ groups, general intelligence varies by state too (this was shown in the testing of draftees in the US in WW1). Third, there is an ‘other’ group that also varies from state to state, presumably different kinds of Asians (Japanese, Chinese, Indians, other Southeast Asians). Fourth, it is unclear how one should combine these proportions into an estimate for correlation analysis, or how to model them. Standard multiple regression (MR) is unsuited for this kind of data, because the proportions are perfectly linearly dependent: they must add up to 1 (100%). With such perfect collinearity, MR cannot estimate separate effects for all the groups. Surely some method exists that can handle this problem, but I’m not familiar with it. Given the four problems above, one should not expect near-perfect results, but one would probably expect most results to go in the right direction and be of non-trivial size.
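The collinearity problem can be demonstrated directly. The toy proportions below are random and sum to 1 per state. When all four are entered into `lm()`, R reports the coefficient of the last one as `NA` (aliased), because it is an exact linear function of the intercept and the other three. Variable names and data are made up for illustration.

```r
set.seed(11)
n <- 50
p <- matrix(rexp(n * 4), ncol = 4)
p <- p / rowSums(p)  # each row (state) now sums to 1
colnames(p) <- c("white", "black", "hispanic", "other")
toy <- data.frame(p, iq = rnorm(n, 100, 3))

fit <- lm(iq ~ white + black + hispanic + other, data = toy)
print(coef(fit))  # "other" is NA: it equals 1 - white - black - hispanic
```

In practice one group has to be dropped as the reference. Each remaining coefficient is then relative to that group, not an absolute effect.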

Perhaps the simplest way of analyzing it is correlation. Correlations are susceptible to random confounds when, e.g., white% correlates differentially with the other racial proportions. However, they should get the basic directions right, and perhaps the order of the effect sizes too.

Racial proportions, NAEP IQ and S

For this analysis I use only the NAEP IQs and without DC, as I believe this is the best subdataset to rely on. I correlate this with the S factor and each racial proportion. The results are:

Racial group  NAEP IQ      S
White            0.69   0.18
Black           -0.50  -0.42
Hispanic        -0.38  -0.08
Other           -0.26   0.20


For NAEP IQ, depending on what one thinks of the ‘other’ category, these have either exactly or roughly the order one expects: W>O>H>B. If one thinks “other” is mostly East Asian (Japanese, Chinese, Korean), with higher cognitive ability than Europeans, one would expect O>W>H>B. For S, the order is indeed O>W>H>B, but the effect sizes are much weaker. In general, given the limitations above, these results are perhaps reasonable, if somewhat on the weak side for S.

Estimating state IQ from racial proportions using racial IQs

One way to use all four variables (white, black, hispanic and other) without having MR assign them weights is to assign weights based on known group IQs, and then calculate a mean estimated IQ for each state.

Depending on which estimates for group IQs one accepts, one might use something like the following:

State IQ est. = White*100+Other*100+Black*85+Hispanic*90

Or, if one thinks that the other group is somewhat higher than whites, one might want to use 105 for the other group (#2). This is not entirely unreasonable, but recall that the NAEP includes reading tests, on which foreigners and Asians perform less well. Or one might want to raise black and hispanic IQs a bit, perhaps to 88 and 93 (#3). Or do both (#4). I did all of these variations, and the results are:

Variable   Race.IQ  Race.IQ2  Race.IQ3  Race.IQ4
Race.IQ       1.00      0.96      1.00      0.93
Race.IQ2      0.96      1.00      0.96      0.99
Race.IQ3      1.00      0.96      1.00      0.94
Race.IQ4      0.93      0.99      0.94      1.00
NAEP IQ       0.67      0.56      0.67      0.51
S             0.41      0.44      0.42      0.45


As far as I can tell, there is no strong reason to prefer any of these over the others. What we learn is that the correlation between the racial IQ estimate and the NAEP IQ estimate lies somewhere between .51 and .67. The correlation between the racial IQ estimate and S lies somewhere between .41 and .45. I think these are reasonable results, given the problems with this analysis described above.
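The weighting formula above is a matrix product of each state’s proportions with a vector of assumed group IQs. Here is a minimal sketch of the four variants. The helper name `race_iq` and the two toy states are hypothetical. The group IQs are the ones listed in the text.

```r
# Weighted-mean state IQ from racial proportions and assumed group IQs.
race_iq <- function(props, group_iq) {
  # props: one row per state, columns White, Black, Hispanic, Other (sum to 1)
  as.vector(as.matrix(props[, names(group_iq)]) %*% group_iq)
}

variants <- list(
  Race.IQ  = c(White = 100, Black = 85, Hispanic = 90, Other = 100),
  Race.IQ2 = c(White = 100, Black = 85, Hispanic = 90, Other = 105),
  Race.IQ3 = c(White = 100, Black = 88, Hispanic = 93, Other = 100),
  Race.IQ4 = c(White = 100, Black = 88, Hispanic = 93, Other = 105)
)

# Two toy states, not real data
toy <- data.frame(White = c(0.8, 0.5), Black = c(0.1, 0.3),
                  Hispanic = c(0.05, 0.15), Other = c(0.05, 0.05))
estimates <- sapply(variants, race_iq, props = toy)  # one column per variant
print(estimates)
```

With real data, `cor(estimates, naep_iq)` would give one row of the table above. Here `naep_iq` is a hypothetical per-state vector in the same order.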

Added March 11: New NAEP data

I came across a series of posts by the science blogger The Audacious Epigone, who has also estimated IQs based on NAEP data. He has done this three times (for 2013, 2009 and 2005 data), so along with McDaniel’s estimates, this gives us 4 non-identical estimates. First, we check their intercorrelations, which should be very high (r>.9) for this kind of data. Second, we extract the general factor and use it as the best estimate of NAEP IQ for the states (I deleted DC again). Third, we see how all 5 variables relate to S from before.


            NAEP.IQ.13  NAEP.IQ.09  NAEP.IQ.05  NAEP M.  NAEP.1
NAEP.IQ.09        0.96
NAEP.IQ.05        0.83        0.89
NAEP M.           0.88        0.93        0.96
NAEP.1            0.95        0.99        0.95     0.97
S                 0.81        0.76        0.64     0.69    0.75


Here NAEP.1 is the general NAEP factor. We see that the intercorrelations between the NAEP estimates are not that high; they average only .86. Their loadings on the common factor are very high, though (.95 to .99). Using the factor should therefore improve results by reducing measurement error. And it does: NAEP IQ x S is now .75, up from .69.

Scatter plot


Supplementary material

Data files and R source code are available on the Open Science Framework repository.


Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2015). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics, 96(1), 37-53.

Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.

Kirkegaard, E. O. W. (2015a). Indian states: G and S factors. The Winnower.

Kirkegaard, E. O. W. (2015b). The S factor in China. The Winnower.

McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601-606.

Zhao, N. (2009). The Minimum Sample Size in Factor Analysis.

Some time ago, a new paper came out from the 23andMe people reporting admixture among US ethnoracial groups (Bryc et al., 2014). Given our still ongoing admixture project (current draft here), one could see if admixture predicts academic achievement (or IQ, if such data were available). We (that is, John) put together achievement data (reading and math scores) from the NAEP and the admixture data here.

Descriptive stats

Admixture studies do not work well if there is little or no variation within groups. So let’s first examine the variation. For Blacks:

                      vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
BlackAfricanAncestry     1 31 0.74 0.04   0.74    0.74 0.03 0.64 0.83  0.19 -0.03    -0.38 0.01
BlackEuropeanAncestry    1 31 0.23 0.04   0.24    0.23 0.03 0.15 0.34  0.19  0.09    -0.30 0.01


So we see that there is little American admixture in Blacks, because the African and European proportions add up to close to 100% (74+23=97). In fact, the correlation between African and European ancestry in Blacks is -.99. This also means that multiple regression is useless because of collinearity.

White admixture data is also not very useful. It is almost exclusively European:

                      vars  n mean sd median trimmed mad  min max range  skew kurtosis se
WhiteEuropeanAncestry    1 51 0.99  0   0.99    0.99   0 0.98   1  0.02 -0.95     0.74  0

What about Hispanics (some sources call them Latinos)?

                       vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
LatinoEuropeanAncestry    1 34 0.73 0.07   0.72    0.73 0.05 0.57 0.90  0.33 0.34     0.22 0.01
LatinoAfricanAncestry     1 34 0.09 0.05   0.08    0.08 0.06 0.01 0.22  0.21 0.51    -0.69 0.01
LatinoAmericanAncestry    1 34 0.10 0.05   0.09    0.10 0.03 0.04 0.21  0.17 0.80    -0.47 0.01

Hispanics are fairly admixed. Overall, they are mostly European, but the range of African and American ancestry is quite wide. Furthermore, due to the three-way variation, multiple regression should work. The ancestry intercorrelations are -.42 (Afro x Amer), -.21 (Afro x Euro) and -.50 (Amer x Euro). There must also be another source, because 73+9+10 is only 92%. Where does the last 8% of admixture come from?

Admixture x academic achievement correlations: Blacks

row.names BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
1 Math2013B -0.32 0.09 0.29
2 Math2011B -0.27 0.21 0.25
3 Math2009B -0.30 0.09 0.28
4 Math2007B -0.12 0.27 0.08
5 Math2005B -0.28 0.26 0.23
6 Math2003B -0.30 0.15 0.26
7 Math2000B -0.36 -0.08 0.34
8 Read2013B -0.25 0.14 0.22
9 Read2011B -0.33 0.22 0.30
10 Read2009B -0.40 -0.03 0.41
11 Read2007B -0.26 0.14 0.24
12 Read2005B -0.43 0.33 0.39
13 Read2003B -0.42 0.09 0.38
14 Read2002B -0.30 -0.10 0.27


Summarizing these results:

     vars  n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
Afro    1 14 -0.31 0.08  -0.30   -0.32 0.05 -0.43 -0.12  0.31  0.48     0.10 0.02
Amer    1 14  0.13 0.13   0.14    0.13 0.11 -0.10  0.33  0.43 -0.32    -1.07 0.03
Euro    1 14  0.28 0.08   0.28    0.29 0.06  0.08  0.41  0.33 -0.49     0.11 0.02

So we see the expected directions and order: for Blacks (who are mostly African), American admixture is positive and European even more positive. There is quite a bit of variation over the years. This may mostly reflect ‘noise’, e.g. changes in educational policies in the states, or just sampling error. It is also possible that the changes are due to admixture changes within states over time.
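As a check, the summary rows can be reproduced from the 14 yearly correlations in the table above with a few lines of base R. The post itself used `describe()` from psych, which reports more statistics.

```r
# The 14 yearly Black admixture x achievement correlations from the table
afro <- c(-0.32, -0.27, -0.30, -0.12, -0.28, -0.30, -0.36,
          -0.25, -0.33, -0.40, -0.26, -0.43, -0.42, -0.30)
amer <- c(0.09, 0.21, 0.09, 0.27, 0.26, 0.15, -0.08,
          0.14, 0.22, -0.03, 0.14, 0.33, 0.09, -0.10)
euro <- c(0.29, 0.25, 0.28, 0.08, 0.23, 0.26, 0.34,
          0.22, 0.30, 0.41, 0.24, 0.39, 0.38, 0.27)

summ <- sapply(list(Afro = afro, Amer = amer, Euro = euro),
               function(r) c(mean = mean(r), sd = sd(r), median = median(r)))
print(round(t(summ), 2))  # one row per ancestry component
```

The means come out as -.31, .13 and .28, matching the summary table.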

Admixture x academic achievement correlations: Hispanics

row.names LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
1 Math13H 0.20 -0.13 -0.10
2 Math11H 0.27 0.02 -0.02
3 Math09H 0.29 -0.32 0.04
4 Math07H 0.36 -0.14 -0.01
5 Math05H 0.38 -0.08 0.00
6 Math03H 0.37 -0.23 -0.08
7 Math00H 0.30 -0.09 -0.05
8 Read2013H 0.18 -0.44 0.33
9 Read2011H 0.21 -0.26 0.33
10 Read2009H 0.19 -0.44 0.33
11 Read2007H 0.13 -0.32 0.23
12 Read2005H 0.38 -0.30 0.23
13 Read2003H 0.32 -0.34 0.18
14 Read2002H 0.24 -0.23 0.08

And summarizing:

     vars  n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
Afro    1 14  0.27 0.08   0.28    0.28 0.12  0.13 0.38  0.25 -0.10    -1.49 0.02
Amer    1 14 -0.24 0.14  -0.24   -0.24 0.15 -0.44 0.02  0.46  0.17    -1.13 0.04
Euro    1 14  0.11 0.16   0.06    0.11 0.19 -0.10 0.33  0.43  0.23    -1.68 0.04

We do not see the results expected under the genetic model. Among Hispanics, who are 73% European, African admixture has a positive relationship to academic achievement. American admixture is negatively correlated and European positively, but more weakly than African. The only result in line with the genetic model is that European is positive. On the other hand, the results are not in line with a null model either, because under that model we would expect the results to fluctuate around 0.

Note that the European admixture correlations are positive essentially only for the reading tests. The reading tests are presumably the ones most affected by language bias (many Hispanics speak Spanish as a first language). If anything, the math results are worse for the genetic model.
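The split by test type can be checked directly from the European column of the Hispanic table. Here is a small base R sketch that averages it separately for math and reading. The labels are the row names from the table.

```r
# European admixture x achievement correlations for Hispanics, from the table
euro_h <- c(Math13H = -0.10, Math11H = -0.02, Math09H = 0.04, Math07H = -0.01,
            Math05H = 0.00, Math03H = -0.08, Math00H = -0.05,
            Read2013H = 0.33, Read2011H = 0.33, Read2009H = 0.33,
            Read2007H = 0.23, Read2005H = 0.23, Read2003H = 0.18,
            Read2002H = 0.08)
test_type <- ifelse(startsWith(names(euro_h), "Math"), "Math", "Reading")
by_type <- round(tapply(euro_h, test_type, mean), 2)
print(by_type)
```

The math mean is close to zero (-.03), while the reading mean is .24.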

General achievement factors

We can eliminate some of the noise in the data by extracting a general achievement factor for each group. I do this by first removing the cases with no data at all, and then imputing the rest.

Then we compute the correlations as before. These should be fairly close to the means above:

 LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry 
                  0.28                  -0.36                   0.22

The European result is stronger with the general factor from the imputed dataset, but the order is the same.

We can do the same for the Black data to see if the imputation+factor analysis screws up the results:

 BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry 
                -0.35                  0.20                  0.31

These results are similar to before (-.31, .13, .28) with the American result somewhat stronger.


Perhaps if we plot the results, we can figure out what is going on. We can plot either the general achievement factor, or specific results. Let’s do both:

Reading2013 plots

(Scatterplots: Hispanic Reading 2013 scores vs. African, American and European ancestry.)

Math2013 plots

(Scatterplots: Hispanic Math 2013 scores vs. African, American and European ancestry.)

General factor plots

(Scatterplots: Hispanic general achievement factor vs. African, American and European ancestry.)

These did not help me understand it. Maybe they make more sense to someone who understands US demographics and history better.

Multiple regression

As mentioned above, the Black data should be mostly useless for multiple regression due to high collinearity, but the Hispanic data should be better. I ran models using two of the three ancestry estimates at a time, since one cannot use all three (I think).

Generally, the independent variables did not reach significance. Using the general achievement factor as the dependent variable, the standardized betas are:

LatinoAfricanAncestry LatinoAmericanAncestry
             0.1526765             -0.2910413
LatinoAfricanAncestry LatinoEuropeanAncestry
             0.3363636              0.2931108
LatinoAmericanAncestry LatinoEuropeanAncestry
           -0.32474678             0.06224425

The first model is relative to European, the second to American, and the third to African. The results are not even consistent with each other: in the first, African>European; in the third, European>African. All results show Others>American, though.
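A standardized beta is the regression coefficient you get after z-scoring the outcome and the predictors. Equivalently, it is the raw coefficient times sd(x)/sd(y). As far as I know, QuantPsyc’s `lm.beta()` computes the same quantity. Here is a minimal base-R sketch. The helper `std_betas` and the simulated toy variables are hypothetical.

```r
# Standardized betas: OLS coefficients after z-scoring all model variables.
std_betas <- function(formula, data) {
  vars <- all.vars(formula)
  z <- as.data.frame(scale(data[complete.cases(data[, vars]), vars]))
  coef(lm(formula, data = z))[-1]  # drop the (now ~0) intercept
}

set.seed(5)
d <- data.frame(afr = runif(30), amer = runif(30))
d$ach <- 0.5 * d$afr - 0.8 * d$amer + rnorm(30, sd = 0.2)
b <- std_betas(ach ~ afr + amer, d)
raw <- coef(lm(ach ~ afr + amer, data = d))
print(round(b, 3))
```

Standardizing puts the predictors on the same scale, so their betas can be compared within a model. It does not solve the reference-category problem across models.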

The remainder

There is something odd about the data: it doesn’t sum to 1. I calculated the sum of the ancestry estimates, and then subtracted it from 1. Here are the results:

(Dot plots: remainder ancestry by state, for Blacks and for Hispanics.)

To these we can add simple descriptive stats:

                        vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
BlackRemainderAncestry     1 31 0.02 0.00   0.02    0.02 0.00 0.01 0.03  0.02 1.35     1.18 0.00
LatinoRemainderAncestry    1 34 0.08 0.05   0.07    0.07 0.03 0.02 0.34  0.32 3.13    12.78 0.01


So we see that there is a sizable other proportion for Hispanics and a small one for Blacks. Presumably, the large outlier, Hawaii, reflects Asian and Pacific Islander admixture from the Japanese, Chinese, Filipino and Native Hawaiian populations. At least, these are the largest groups according to Wikipedia. For Blacks, the remainder is presumably Asian admixture as well.

Do these remainders correlate with academic achievement? For Blacks, r = .39 (p = .03), and for Hispanics r = -.24 (p = .18). So the direction is as expected for Blacks and stronger, but for Hispanics, it is in the right direction but weaker.
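The p-values follow from r and the number of states via the usual t-test for a correlation: t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. This is the test `cor.test()` performs. The sketch below assumes n = 31 for Blacks, the number of states with Black admixture data. The n behind the achievement factor may differ slightly. The helper name `r_to_p` is hypothetical.

```r
# Two-sided p-value for a Pearson correlation r based on n cases.
r_to_p <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

p_black <- r_to_p(0.39, 31)
print(round(p_black, 3))  # about .03
```

With 30-odd states, a correlation needs to be roughly .35 or larger before it reaches p < .05.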

Partial correlations

What about partialing out the remainders?

LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
            0.21881404            -0.33114612             0.09329413
BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
           -0.2256171             0.1189219             0.2185139


Not much has changed. The European correlation has become weaker for Hispanics. For Blacks, the results are similar to before.
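A first-order partial correlation can be computed from the three pairwise correlations:

r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))

It is identical to correlating the residuals of x and y after regressing each on z. Here is a minimal sketch with simulated data. The helper name `partial_cor` is hypothetical. The post used `partial.r()` from psych.

```r
# First-order partial correlation of x and y, controlling for z.
partial_cor <- function(x, y, z) {
  rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

set.seed(4)
z <- rnorm(40)
x <- z + rnorm(40)
y <- 0.5 * z + 0.3 * x + rnorm(40)
# Same value via residuals: remove z from both, then correlate what is left
via_resid <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
print(round(c(formula = partial_cor(x, y, z), residuals = via_resid), 3))
```

If the remainder is nearly uncorrelated with an ancestry component, partialing it out changes that component’s correlation very little. That is consistent with the small changes above.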

Proposed explanations?

The results for Blacks are in line with genetic models. Those for Hispanics are not, but they are not in line with the null model either. Perhaps this has something to do with generational effects. If one could find the % of first-generation Hispanics by state, one could add it to the regression model, or control for it using partial correlations.

Other ideas? Before calculating the results, John wrote:

Language, generation, and genetic assimilation are all confounded, so I thought it best to not look at them.

He may be right.

R code

data = read.csv("BryceAdmixNAEP.tsv", sep="\t", row.names=1)
library(car) # for vif and scatterplot
library(psych) # for describe, fa and partial.r
library(VIM) # for irmi (imputation)
library(QuantPsyc) # for lm.beta
# The original also loaded the author's helper ("mega") functions with
# devtools::source_url() (URL not given); base R equivalents are used instead.
# Some object names were lost from the original listing; names such as
# admixture.cors.black and ach.factor below are reconstructions.

#descriptive stats
describe(data[c("BlackAfricanAncestry", "BlackEuropeanAncestry")])
describe(data["WhiteEuropeanAncestry"])
describe(data[c("LatinoEuropeanAncestry", "LatinoAfricanAncestry", "LatinoAmericanAncestry")])

#all pairwise correlations, rounded to 2 decimals
cors = round(cor(data, use = "pairwise.complete.obs"), 2)

#Blacks
admixture.cors.black = cors[10:23, 1:3] #Black admixture x achievement
hist(admixture.cors.black[, 1]) #afro x achv
hist(admixture.cors.black[, 2]) #amer x achv
hist(admixture.cors.black[, 3]) #euro x achv
desc = rbind(Afro = describe(admixture.cors.black[, 1]), #descriptive stats afro x achv
             Amer = describe(admixture.cors.black[, 2]), #amer x achv
             Euro = describe(admixture.cors.black[, 3])) #euro x achv

#Whites
admixture.cors.white = cors[24:25, 4:6] #White admixture x achievement

#Hispanics
admixture.cors.hispanic = cors[26:39, 7:9] #Hispanic admixture x achievement
desc = rbind(Afro = describe(admixture.cors.hispanic[, 1]), #descriptive stats afro x achv
             Amer = describe(admixture.cors.hispanic[, 2]), #amer x achv
             Euro = describe(admixture.cors.hispanic[, 3])) #euro x achv

##Imputed and aggregated data
#drop cases with no data at all, impute the rest, return scores on the common factor
ach.factor = function(ach) {
  ach = ach[rowSums(!is.na(ach)) > 0, ] #remove empty cases
  print(colSums(is.na(ach))) #examine missing data
  imputed = irmi(ach, noise.factor = 0)[, names(ach)] #deterministic; keep original columns only
  setNames(fa(imputed)$scores[, 1], rownames(ach)) #common achievement factor
}
data$hispanic.ach.factor = NA
hisp.scores = ach.factor(data[26:39]) #Hispanic achievement columns
data[names(hisp.scores), "hispanic.ach.factor"] = hisp.scores
data$black.ach.factor = NA
black.scores = ach.factor(data[10:23]) #Black achievement columns
data[names(black.scores), "black.ach.factor"] = black.scores

cors = round(cor(data, use = "pairwise.complete.obs"), 2) #recompute with the factors
cors[7:9, "hispanic.ach.factor"] #results for general factor, Hispanics
cors[1:3, "black.ach.factor"] #results for general factor, Blacks

##Multiple regressions: two ancestry components at a time (plus one with all three)
#Hispanic math columns are named Math13H etc. in the data (see the correlation table)
fit.models = function(models) {
  for (m in models) {
    fit = lm(as.formula(m), data = data)
    print(summary(fit))
    print(vif(fit)) #collinearity check
    print(lm.beta(fit)) #standardized betas
  }
}
fit.models(c("Math2013B ~ BlackAfricanAncestry + BlackAmericanAncestry",
             "Read2013B ~ BlackAfricanAncestry + BlackAmericanAncestry",
             "Math2013B ~ BlackAfricanAncestry + BlackEuropeanAncestry",
             "Read2013B ~ BlackAfricanAncestry + BlackEuropeanAncestry"))
fit.models(c("Math13H ~ LatinoAfricanAncestry + LatinoAmericanAncestry",
             "Read2013H ~ LatinoAfricanAncestry + LatinoAmericanAncestry",
             "Math13H ~ LatinoAfricanAncestry + LatinoEuropeanAncestry",
             "Read2013H ~ LatinoAfricanAncestry + LatinoEuropeanAncestry",
             "hispanic.ach.factor ~ LatinoAfricanAncestry + LatinoAmericanAncestry",
             "hispanic.ach.factor ~ LatinoAfricanAncestry + LatinoEuropeanAncestry",
             "hispanic.ach.factor ~ LatinoAmericanAncestry + LatinoEuropeanAncestry",
             "hispanic.ach.factor ~ LatinoAfricanAncestry + LatinoAmericanAncestry + LatinoEuropeanAncestry"))

##Examine Hispanics by scatterplots (2013 reading, 2013 math, general factor)
#car >= 3.0 syntax: id = list(n = ...); older car versions used id.n
for (outcome in c("Read2013H", "Math13H", "hispanic.ach.factor")) {
  for (anc in c("LatinoAfricanAncestry", "LatinoEuropeanAncestry", "LatinoAmericanAncestry")) {
    scatterplot(as.formula(paste(outcome, "~", anc)), data = data,
                smoother = FALSE, id = list(n = nrow(data)))
  }
}

##Admixture totals and remainders
Hispanic.admixture = data[c("LatinoAfricanAncestry", "LatinoAmericanAncestry", "LatinoEuropeanAncestry")]
Hispanic.admixture = Hispanic.admixture[complete.cases(Hispanic.admixture), ] #complete cases
describe(rowSums(Hispanic.admixture)) #stats for the sums
data$LatinoRemainderAncestry = NA #add remainder (1 - sum) back to the data frame
data[rownames(Hispanic.admixture), "LatinoRemainderAncestry"] = 1 - rowSums(Hispanic.admixture)
dotchart(sort(1 - rowSums(Hispanic.admixture)), cex = .7) #plot, sorted, with smaller text

Black.admixture = data[c("BlackAfricanAncestry", "BlackAmericanAncestry", "BlackEuropeanAncestry")]
Black.admixture = Black.admixture[complete.cases(Black.admixture), ] #complete cases
describe(rowSums(Black.admixture)) #stats for the sums
data$BlackRemainderAncestry = NA #add remainder (1 - sum) back to the data frame
data[rownames(Black.admixture), "BlackRemainderAncestry"] = 1 - rowSums(Black.admixture)
dotchart(sort(1 - rowSums(Black.admixture)), cex = .7) #plot, sorted, with smaller text

#simple stats for both
describe(data[c("BlackRemainderAncestry", "LatinoRemainderAncestry")])

#remainders x achievement
remainders = data[c("black.ach.factor", "BlackRemainderAncestry",
                    "hispanic.ach.factor", "LatinoRemainderAncestry")]
round(cor(remainders, use = "pairwise.complete.obs"), 2)
cor.test(data$black.ach.factor, data$BlackRemainderAncestry) #r and p, Blacks
cor.test(data$hispanic.ach.factor, data$LatinoRemainderAncestry) #r and p, Hispanics

#Partial correlations: remainder partialed out; row 4 is the achievement factor
partial.r(data, c("LatinoAfricanAncestry", "LatinoAmericanAncestry",
                  "LatinoEuropeanAncestry", "hispanic.ach.factor"),
          "LatinoRemainderAncestry")[4, 1:3]
partial.r(data, c("BlackAfricanAncestry", "BlackAmericanAncestry",
                  "BlackEuropeanAncestry", "black.ach.factor"),
          "BlackRemainderAncestry")[4, 1:3]


Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2014). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics.

Incidentally, the Wiki page was very poor, so I had to rewrite that before writing this.

Generally, this was an interesting read that taught me a lot. This probably has to do with me not really caring much about sports. Some parts can be boring if you don’t care or know much about e.g. baseball. It is pretty US-centric in the topics chosen.

The science in the book comes mostly through interviews with experts and some summarizing of studies. Rarely is sufficient detail given about the studies for one to make an informed decision about whether to trust them. Usually, no sample sizes, p-values, effect sizes etc. are mentioned. To be fair, it was written as a popular science book, so this criticism is somewhat unfair.

Some quotes:

When scientists at Washington University in St. Louis tested him, Pujols, the greatest hitter of an era, was in the sixty-sixth percentile for simple reaction time compared with a random sample of college students.

College students have above-average g, which means faster-than-average reaction times. The researchers tested simple reaction time, which correlates about .2 with g. College students are perhaps at 115 on average. This university is apparently a top university, so perhaps the mean IQ there is 120-125. That would put these students about 0.27 to 0.33 d above the mean on reaction time (.2 × 20/15 to .2 × 25/15). They may be even higher if they were physical education students. Being at the 66th percentile is not bad, then.
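The back-of-the-envelope reasoning about the reaction-time percentile can be written out as a small R function. It converts a percentile among the tested students into an approximate population percentile. This is a rough sketch under loudly stated assumptions, none of which come from the book: normal distributions, a linear relation with r = .2 between g and simple reaction speed, and the same spread among the students as in the population (no range restriction). The helper name `pop_percentile` is hypothetical.

```r
# Rough conversion of a percentile among students into a population percentile.
# Assumptions: normality, linear relation with r = .2 between g and reaction
# speed, and the same SD among students as in the population.
pop_percentile <- function(student_pct, student_iq, r = 0.2) {
  # students' mean speed advantage over the population, in SD units
  student_shift <- r * (student_iq - 100) / 15
  pnorm(qnorm(student_pct) + student_shift)
}

print(round(pop_percentile(0.66, c(120, 125)), 2))  # roughly 0.75 and 0.77
```

Under these assumptions, the 66th percentile among such students would correspond to roughly the 75th to 77th percentile of the general population.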

Jason Gulbin, the physiologist who worked on Australia’s Olympic skeleton experiment, says that the word “genetics” has become so taboo in his talent-identification field that “we actively changed our language here around genetic work that we’re doing from ‘genetics’ to ‘molecular biology and protein synthesis.’ It was, literally, ‘Don’t mention the g-word.’ Any research proposals we put in, we don’t mention the genetics if we can help it. It’s: ‘Oh, well, if you’re doing molecular biology and protein synthesis, well, that’s all right.’” Never mind that it’s the same thing.

Studying race? NAZI NAZI!!! Studying population genetics? No problem, carry on.

This story is fascinating. It is perhaps the best example of how categorical thinking about gender leads to real-life problems.

Several scientists I spoke with about the theory insisted that they would have no interest in investigating it because of the inevitably thorny issue of race involved. One of them told me that he actually has data on ethnic differences with respect to a particular physiological trait, but that he would never publish the data because of the potential controversy. Another told me he would worry about following Cooper and Morrison’s line of inquiry because any suggestion of a physical advantage among a group of people could be equated to a corresponding lack of intellect, as if athleticism and intelligence were on some kind of biological teeter-totter. With that stigma in mind, perhaps the most important writing Cooper did in Black Superman was his methodical evisceration of any supposed inverse link between physical and mental prowess. “The concept that physical superiority could somehow be a symptom of intellectual inferiority only developed when physical superiority became associated with African Americans,” Cooper wrote. “That association did not begin until about 1936.” The idea that athleticism was suddenly inversely proportional to intellect was never a cause of bigotry, but rather a result of it. And Cooper implied that more serious scientific inquiry into difficult issues, not less, is the appropriate path.

How very familiar. Better not hurt those feelings! At least they should publish the data anonymously in some way so others can examine them.

There is a university called Lehigh… Le High… geddit??

In 2010, Heather Huson, a geneticist then studying at the University of Alaska, Fairbanks—and a dogsled racer since age seven—tested dogs from eight different racing kennels. To Huson’s surprise, Alaskan sled dogs have been so thoroughly bred for specific traits that analysis of microsatellites—repeats of small sequences of DNA—proved Alaskan huskies to be an entirely genetically distinct breed, as unique as poodles or labs, rather than just a variation of Alaskan malamutes or Siberian huskies.
Huson and colleagues discovered genetic traces of twenty-one dog breeds, in addition to the unique Alaskan husky signature. The research team also established that the dogs had widely disparate work ethics (measured via the tension in their tug lines) and that sled dogs with better work ethics had more DNA from Anatolian shepherds—a muscular, often blond breed of dog originally prized as a guardian of sheep because it would eagerly do battle with wolves. That Anatolian shepherd genes uniquely contribute to the work ethic of sled dogs was a new finding, but the best mushers already knew that work ethic is specifically bred into dogs.
“Yeah, thirty-eight years ago in the Iditarod there were dogs that weren’t enthused about doing it, and that were forced to do it,” Mackey says. “I want to be out there and have the privilege of going along for the ride because they want to go, because they love what they do, not because I want to go across the state of Alaska for my satisfaction, but because they love doing it. And that’s what’s happened over forty years of breeding. We’ve made and designed dogs suited for desire.”

Admixture studies in dogs, a useful precedent to cite to ease the pain for newcomers.

In one tank are mice missing oxytocin receptors. They are used in the study of pain, but the mice also have deficits in social recognition. Put them with mice they grew up with and they won’t recognize them. In another corner is a tank of raven-haired mice that were bred to be prone to head pain, that is, migraines. They spend a lot of time scratching their foreheads and shuddering, and they are apparently justified in using the old headache excuse to avoid mating. “This experiment has taken years,” says Jeffrey Mogil, head of the lab, of the work that seeks to help develop migraine treatments, “because they breed really, really badly.”

How did they get ethics approval for this???

As Pitsiladis put it, to be a world-beater, “you absolutely must choose your parents correctly.” He was being facetious, of course, because we can’t choose our parents. Nor do humans tend to couple with conscious knowledge of one another’s gene variants. We pair up more in the manner of a roulette ball that bounces off a few pockets before settling into one of many suitable spots. Williams suggests, hypothetically, that if humanity is to produce an athlete with more “correct” sports genes, one approach is to weight the genetic roulette ball with more lineages in which parents and grandparents are outstanding athletes and thus probably harbor a large number of good athleticism genes. Yao Ming—at 7’5″, once the tallest active player in the NBA—was born from China’s tallest couple, a pair of ex–basketball players brought together by the Chinese basketball federation. As Brook Larmer writes in Operation Yao Ming: “Two generations of Yao Ming’s forebears had been singled out by authorities for their hulking physiques, and his mother and father were both drafted into the sports system against their will.” Still, the witting merger of athletes in pursuit of superstar progeny is rare.

Sure we do! Some people do it quite consciously, e.g. by using dating sites that match partners for overall similarity.

I had seen references to this book in a number of places, which made me curious. I am somewhat hesitant to read older books since I know much of what they discuss is dated and has been superseded by newer science. Sometimes, however, science (or the science culture) has gone wrong, so one may actually learn more from reading an older book than a newer one. Since fewer people read older books, one can sometimes find relevant but forgotten facts in them. Lastly, they can provide much-needed historical information about how thinking about an idea or a field developed. All of these remarks are arguably relevant to the race/population genetics controversy.

Still, I did not read the book immediately, although I had a PDF of it. I ended up starting it more or less at random after a short conversation with John Fuerst about it (we are writing together on racial admixture, intelligence and socioeconomic outcomes in the Americas, and have also written a paper on immigrant performance in Denmark).

So, the book really is dated. It spends hundreds of pages on arcane physical anthropology that requires one to master human anatomy. Most readers have not mastered this discipline, so these parts of the book are virtually incomprehensible. However, they do give one a distinct impression of how physical anthropology was done in the old days: lots of observations of crania, other bones, noses, eyes and eyelids, teeth, lips, buttocks, etc., followed by attempts to find clusters in these data manually. No wonder they did not reach high agreement. The data are too scarce to find clusters, and humans are not good enough at cluster analysis at the intuitive level. Still, they did notice some patterns that are surely correct, such as the division between various African populations, Ainu vs. Japanese, that Europeans and Asians are more closely related to each other, and that Afghans etc. belong to the European supercluster. Clearly, these pre-genetic ideas were not all totally wrongheaded. Here’s the table of Races+Subraces from the end of the book. It seems reasonably in line with modern evidence.


Some quotes:

The story of 7 ‘kinds’ of mosquitoes.

[Dobzhansky’s definition = ‘Species in sexual cross-fertilizing organisms can be defined as groups of populations which are reproductively isolated to the extent that the exchange of genes between them is absent or so slow that the genetic differences are not diminished or swamped.’]

Strict application of Dobzhansky’s definition results in certain very similar animals being assigned to different species. The malarial mosquitoes and their relatives provide a remarkable example of this. The facts are not only extremely interesting from the purely scientific point of view, but also of great practical importance in the maintenance of public health in malarious districts. It was discovered in 1920 that one kind of the genus Anopheles, called elutus, could be distinguished from the well-known malarial mosquito, A. maculipennis, by certain minute differences in the adult, and by the fact that its eggs looked different; but for our detailed knowledge of this subject we are mainly indebted to one Falleroni, a retired inspector of public health in Italy, who began in 1924 to breed Anopheles mosquitoes as a hobby. He noticed that several different kinds of eggs could be distinguished, that the same female always laid eggs having the same appearance, and that adult females derived from those eggs produced eggs of the same type. He realized that although the adults all appeared similar, there were in fact several different kinds, which he could recognize by the markings on their eggs. Falleroni named several different kinds after his friends, and the names he gave are the accepted ones today in scientific nomenclature.

It was not until 1931 that the matter came to the attention of L. W. Hackett, who, with A. Missiroli, did more than anyone else to unravel the details of this curious story.(449, 447, 448) The facts are these. There are in Europe six different kinds of Anopheles that cannot be distinguished with certainty from one another in the adult state, however carefully they are examined under the microscope by experts; a seventh kind, elutus, can be distinguished by minor differences if its age is known. The larvae of two of the kinds can be distinguished from one another by minute differences (in the type of palmate hair on the second segment, taken in conjunction with the number of branches of hair no. 2 on the fourth and fifth segments). Other supposed differences between the kinds, apart from those in the eggs, have been shown to be unreal.

In nature the seven kinds are not known to interbreed, and it is therefore necessary, under Dobzhansky’s definition, to regard them all as separate species.

The males of six of the seven species have the habit of ‘swarming’ when ready to copulate. They join in groups of many individuals, humming, high in the air; suddenly the swarm bursts asunder and rejoins. The females recognize the swarms of males of their own species, and are attracted towards them. Each female dashes in, seizes a male, and flies off, copulating.

With the exceptions mentioned, the only visible differences between the species occur at the egg-stage. The eggs of six of the seven species are shown in Fig. 8 (p. 76).

[Figure: eggs of six Anopheles species]

It will be noticed that each egg is roughly sausage-shaped, with an air-filled float at each side, which supports it in the water in which it is laid. The eggs of the different species are seen to differ in the length and position of the floats. The surface of the rest of the egg is covered all over with microscopic finger-shaped papillae, standing up like the pile of a carpet. It is these papillae that are responsible for the distinctive patterns seen on the eggs of the different species. Where the papillae are long and their tips rough, light is reflected to give a whitish appearance; where they are short and smooth, light passes through to reveal the underlying surface of the egg, which is black. The biological significance of these apparently trivial differences is unknown.

From the point of view of the ethnic problem the most interesting fact is this. Although the visible differences between the species are trivial and confined or almost confined to the egg-stage, it is evident that the nervous and sensory systems are different, for each species has its own habits. The males of one species (atroparvus) do not swarm. It has already been mentioned that the females recognize the males of their own species. Some of the species lay their eggs in fresh water, others in brackish. The females of some species suck the blood of cattle, and are harmless to man; those of other species suck the blood of man, and in injecting their saliva transmit malaria to him.

Examples could be quoted of other species that are distinguishable from one another by morphological differences no greater than those that separate the species of Anopheles; but the races of a single species—indeed, the subraces of a single race—are often distinguished from one another, in their typical forms, by obvious differences, affecting many parts of the body. It is not the case that species are necessarily very distinct, and races very similar. [p. 74ff]

Nature is very odd indeed! More on Wikipedia.

Some very strange examples of abnormalities of this sort have been recorded by reputable authorities. Buffon quotes two examples of an ‘amour violent’ between a dog and a sow. In one case the dog was a large spaniel on the property of the Comte de Feuillee, in Burgundy. Many persons witnessed ‘the mutual ardour of these two animals; the dog even made prodigious and oft-repeated efforts to copulate with the sow, but the unsuitability of their reproductive organs prevented their union.’ Another example, still more remarkable, occurred on Buffon’s own property. A miller kept a mare and a bull in the same stable. These two animals developed such a passion for one another that on all occasions when the mare was on heat, over a period of several years, the bull copulated with her three or four times a day, whenever he was free to do so. The act was witnessed by all the inhabitants of the place. [p. 92]

Of smelly Japanese:

There is, naturally enough, a correlation between the development of the axillary organ and the smelliness of the secretion of this gland (and probably this applies also to the a glands of the genito-anal region). Briefly, the Europids and Negrids are smelly, the Mongolids scarcely or not at all, so far as the axillary secretion is concerned. Adachi, who has devoted more study to this subject than anyone else, has summed up his findings in a single, short sentence: ‘The Mongolids are essentially an odourless or very slightly smelly race with dry ear-wax.’(5) Since most of the Japanese are free or almost free from axillary smell, they are very sensitive to its presence, of which they seem to have a horror. About 10% of Japanese have smelly axillae. This is attributed to remote Ainuid ancestry, since the Ainu are invariably smelly, like most other Europids, and a tendency to smelliness is known to be inherited among the Japanese.(5) The existence of the odour is regarded among Japanese as a disease, osmidrosis axillae, which warrants (or used to warrant) exemption from military service. Certain doctors specialize in its treatment, and sufferers are accustomed to enter hospital. [p. 173]

Japan always takes these things to a new level.

Measurements of adult stature, made on several thousand pairs of persons, show a rather close correspondence with these figures, namely, 0.507, 0.322, 0.543, and 0.287 respectively.(172) It will be noticed that the correlations are all somewhat higher than one would expect; that is to say, the members of each pair are, on average, rather more nearly of the same height than the simple theory would suggest. This is attributed in the main to the tendency towards assortative mating, the reality of which had already been recognized by Karl Pearson and Miss Lee in their paper published in 1903. [p. 462]

I didn’t know assortative mating was recognized that far back. The paper may be a good source for tracing how the understanding of assortative mating developed.

The reference is: Pearson, K. & Lee, A., 1903. ‘On the laws of inheritance in man. I. Inheritance of physical characters.’ Biometrika, 2, 357–462.

Definition of intelligence?

What has been said on p. 496 may now be rewritten in the form of a short definition of intelligence, in the straightforward, everyday sense of that word. It is the ability to perceive, comprehend, and reason, combined with the capacity to choose worth-while subjects for study, eagerness to acquire, use, transmit, and (if possible) add to knowledge and understanding, and the faculty for sustained effort towards these ends (cf. p. 438). One might say briefly that a person is intelligent in so far as his cognitive ability and personality tend towards productiveness through mental activity. [p. 495ff]

Baker prefers a broader definition of “intelligence” that includes certain non-cognitive components. He uses “cognitive ability” the way many people nowadays use “general cognitive ability”.

And now, at the end of the book, the evil master-racist privileged white male John Baker surely tells us what to do with the information we have just learned:

Here, on reaching the end of the book, I must repeat some words that I wrote years ago when drafting the Introduction (p. 6), for there is nothing in the whole work that would tend to contradict or weaken them:
Every ethnic taxon of man includes many persons capable of living responsible and useful lives in the communities to which they belong, while even in those taxa that are best known for their contributions to the world’s store of intellectual wealth, there are many so mentally deficient that they would be inadequate members of any society. It follows that no one can claim superiority simply because he or she belongs to a particular ethnic taxon. [p. 534]

So, clearly, according to our anti-racist heroes, Baker tells us to revel in our (sorry, Jayman, if you are reading!) European master ancestry, right?

edited: removed joke because public image -_-

The book is by a sociologist trying to interpret the history of behavior genetics in terms of sociological theories. I didn’t pay much attention to the author’s theorizing, being familiar with that kind of nonsense or useless theory. It generally employs the kind of terminology that sociologists are known for: reductionism here, genetic determinism there, racism, eugenics, Nazi, blahblah. It is somewhat dated despite just being released. This is the nature of legacy publishers, since it takes so long to get through their machinery. It spends a lot of time talking about how the molecular (GWA) studies did not fulfill the dreams of behavior geneticists. This is, however, somewhat moot now, because recent studies have replicated findings of g-genes and used GCTA to estimate heritability values that make extreme environmentalism impossible to hold onto.

It does, however, contain a lot of interesting quotes from unnamed persons, and various other material. It is recommended for those who have an interest in the history of behavior genetics and the race and IQ debate. Despite it being interesting, I cannot give it 4 or 5 stars because of the aforementioned problems.

For those who have been living under a rock (i.e. not following me on Twitter), John Fuerst has been very good at compiling data from published research. Have a look at Human Varieties with the tag Admixture Mapping. He asked me to help him analyze it and write it up. I gladly obliged; you can read the draft here. John thinks we should write it all up as one huge paper instead of splitting it up as is standard practice. The standard practice is perhaps not entirely about gaming the reputation system; it also exists because writing huge papers like that can seem overwhelming, and such papers may take a long time to get through review.

So far, the project can be summarized as follows:

  • Genetic models of trait admixture predict that mixed groups will fall between the two source populations on the trait, in proportion to their admixture.
  • For psychological traits such as general intelligence (g), this has previously been studied mainly, and unsystematically, in African Americans, but this line of research seems to have dried up, perhaps because it became too politically sensitive in the US.
  • However, there have been some studies using the same method to examine illness-related traits (e.g. diabetes). These studies usually include socioeconomic variables as controls, and in doing so they have found robust correlations between admixture at the individual level and socioeconomic outcomes: income, occupation, education and the like.
  • John has found quite a lot of these and compiled the results into a table that can be found here.
  • The results clearly show the expected pattern, namely that more European ancestry is associated with more favorable outcomes, and more African or Amerindian ancestry with less favorable outcomes. A few of the results are non-significant, but none contradicts the pattern. A meta-analysis of these results would yield a very small p value indeed.
  • One study actually included cognitive measures as covariates and found results in the generally expected direction. See the material under the headline “Cognitive differences in the Americans” in the draft file.
  • There is no need to look only at the individual level; one can look at the group level too. For this reason, John has compiled data on the ancestry proportions of American countries and Mexican regions.
  • For the countries, he has checked these estimates against self-identified proportions, CIA World Factbook estimates, skin reflectance data and the like. The results are pretty solid: the estimates are clearly in the right ballpark.
  • Now, genetic models of the world distribution of general intelligence clearly predict that these estimates will be strongly related to the countries’ estimated mean levels of general intelligence. To test this, John has carried out a number of multiple regressions with European ancestry as a predictor alongside various controls, such as parasite prevalence or cold weather, and with skin color and national achievement scores (PISA tests and the like) as the dependent variables. The results are in the expected directions even with controls.
  • Using the Mexican regional data, John has compared the Amerindian estimates with PISA scores, Raven’s scores, and the Human Development Index (a proxy for the S factor; see here and here). The post is here.
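The claim that a meta-analysis would return a tiny p value can be made concrete. Below is a minimal Python sketch (the analyses in this post were done in R) of Stouffer’s Z method, one standard way of combining independent one-sided p-values. It assumes the studies are independent and that each p-value is one-sided in the predicted direction; the p-values in the example are made up for illustration and are not taken from John’s table. A full meta-analysis would pool effect sizes weighted by sample size, so treat this only as the simplest version of the idea.

```python
from statistics import NormalDist


def stouffer_combined_p(one_sided_ps):
    """Combine independent one-sided p-values with Stouffer's Z method.

    Each p-value is turned into a z-score via the inverse normal CDF, the
    z-scores are summed and divided by sqrt(k), and the combined z is turned
    back into a one-sided p-value.
    """
    nd = NormalDist()  # standard normal: mean 0, sd 1
    zs = [nd.inv_cdf(1 - p) for p in one_sided_ps]  # requires 0 < p < 1
    z_combined = sum(zs) / len(zs) ** 0.5
    return 1 - nd.cdf(z_combined)


# Hypothetical one-sided p-values from studies that all point the same way,
# including some that are non-significant on their own.
example_ps = [0.01, 0.03, 0.20, 0.002, 0.15, 0.04]
print(f"combined p = {stouffer_combined_p(example_ps):.2e}")
```

Note how individually non-significant results (0.20, 0.15) still add evidence when they point in the same direction as the others.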

This is where we are. Basically, the data is all there, ready to be analyzed. Someone needs to do the other part of the grunt work, namely running all the obvious tests and writing everything up for a big paper. This is where I come in.

The first thing I did was to create an OSF repository for the data and code, since John had been keeping track of versions manually on Human Varieties (HV), which is not ideal. I also converted his SPSS data file to a format that works on all platforms (CSV with semicolons).

Then I started writing code in R. First I wanted to look at the more obvious relationships, such as that between IQ and the ancestry estimates (proportions). Here I discovered that John had used a newer dataset of IQ estimates that Meisenberg had sent him. However, it seems to contain wrong data (Guatemala) and covers fewer relevant countries (25 vs. 35) than the standard dataset from Lynn and Vanhanen 2012 (plus the Malloyian fixes) that I have been using. For this reason, I merged John’s already enormous dataset (126 variables) with the latest Megadataset (365 variables) to create the cleverly named supermegadataset to be used for this study.
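Reading a semicolon-delimited CSV and merging two country-level datasets on a shared key are both easy to reproduce. The actual work was done in R; as an illustration, here is a minimal Python sketch using only the standard library. The column name `code`, the placeholder country codes and all values are hypothetical, and the choice of an outer join (keep every country found in either file) is an assumption about how such a merge might be done, not a description of the supermegadataset build.

```python
import csv
import io


def read_semicolon_csv(handle, key):
    """Read a semicolon-delimited CSV into a dict of rows (dicts) keyed by `key`."""
    reader = csv.DictReader(handle, delimiter=";")
    return {row[key]: row for row in reader}


def outer_merge(left, right):
    """Full outer join of two keyed datasets.

    Rows whose key appears in only one dataset are kept. Where both datasets
    have a column with the same name, the value from `left` wins.
    """
    merged = {}
    for key in sorted(left.keys() | right.keys()):
        row = dict(right.get(key, {}))
        row.update(left.get(key, {}))
        merged[key] = row
    return merged


# Tiny hypothetical files; codes and values are placeholders, not real data.
small = io.StringIO("code;anc_a;anc_b\nAAA;0.6;0.4\nBBB;0.3;0.7\n")
big = io.StringIO("code;score;other\nAAA;91;x\nCCC;88;y\n")

combined = outer_merge(read_semicolon_csv(small, "code"), read_semicolon_csv(big, "code"))
for code, row in combined.items():
    print(code, row)
```

Values come back as strings, as `csv` does no type conversion; a real analysis would convert the numeric columns before computing anything.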

IQ x Ancestry zero-order correlations

Here are the three scatterplots:




So the reader might wonder: what is wrong with the Amerindian data? Why is the correlation about nil? Simply inspecting the plot reveals the problem. The countries with low Amerindian ancestry vary widely in their European vs. African ancestry, which keeps their mean IQ around 80–85 and thus produces no correlation.

Partial correlations

So my idea was this, as I wrote it in my email to John:

Hey John, I wrote my bachelor’s thesis in 4 days (5 pages per day), so now I’m back to working on more interesting things. I use the LV12 data because it seems better and is larger.

One thing that had been annoying me was that correlations between ancestry and IQ do not take into account that there are three variables that vary, not just two. Remember the oddly low correlation Amer x IQ, r = .14, compared with Euro x IQ, r = .68, and Afr x IQ, r = -.66. The reason for this, it seems to me, is that the countries with low Amer% are a mix of high-Afr and low-Afr countries. That’s why you get a flat scatterplot. See attached.

Unfortunately, one cannot just use MR (multiple regression) with these three variables, since they satisfy the equation 1 = Euro + Afr + Amer. They are structurally dependent. Remember that MR attempts to hold the other variables constant while changing one. Here this is impossible.
The solution, it seems to me, is to use partial correlations. In this way, one can partial out one of them and look at the remaining two. There are six possible ways to do this:
Amer x IQ, partial out Afr = -.51
Amer x IQ, partial out Euro = .29
Euro x IQ, partial out Afr = .41
Euro x IQ, partial out Amer = .70
Afr x IQ, partial out Euro = -.37
Afr x IQ, partial out Amer = -.76
Assuming that, genotypically, Amer = 85, Afr = 80, Euro = 97 (or so), these results are all in the expected direction. In the first case, we remove Afr, so we are comparing Amer vs. Euro. We expect negative since Amer<Euro
In two, we expect positive because Amer>Afr
In three, we expect positive because Euro>Amer
In four, we expect positive because Euro>Afr
In five, we expect negative because Afr<Amer
In six, we expect negative because Afr<Euro
All six predictions were as expected. The sample size is quite small at N=34, and LV12 isn’t perfect, certainly not for these countries. The overall results are quite reasonable in my view.
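Both points in the email, that multiple regression breaks down with all three ancestry shares and that partial correlations avoid the problem, can be checked in a few lines. This is a minimal Python/NumPy sketch rather than the R code used for the analysis; the six rows of shares and the outcome values are invented for illustration. It shows that an intercept plus all three shares gives a rank-deficient design matrix, then computes a first-order partial correlation in two equivalent ways: from the three zero-order correlations, and by correlating residuals after regressing each variable on the control.

```python
import numpy as np


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r = np.corrcoef([x, y, z])
    r_xy, r_xz, r_yz = r[0, 1], r[0, 2], r[1, 2]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def partial_corr_via_residuals(x, y, z):
    """Same quantity: correlate the residuals of x and y after regressing each on z."""
    Z = np.column_stack([np.ones(len(z)), z])
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Hypothetical shares for six units (columns: euro, afr, amer); each row sums to 1.
props = np.array([
    [0.60, 0.10, 0.30],
    [0.20, 0.70, 0.10],
    [0.45, 0.05, 0.50],
    [0.80, 0.15, 0.05],
    [0.30, 0.30, 0.40],
    [0.10, 0.40, 0.50],
])
outcome = np.array([84.0, 78.0, 86.0, 92.0, 83.0, 79.0])  # hypothetical scores
euro, afr, amer = props.T

# Intercept + all three shares: the intercept column equals euro + afr + amer,
# so the design matrix has 4 columns but only rank 3.
X = np.column_stack([np.ones(len(props)), props])
print("design columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))

# Controlling for one share at a time avoids the dependency.
print(partial_corr(amer, outcome, afr), partial_corr_via_residuals(amer, outcome, afr))
```

The two printed partial correlations agree up to floating-point rounding, which is a useful sanity check when computing all six combinations.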
Estimates of IQ directly from ancestry
But instead of merely looking at it via correlations or regressions, one can try to predict the IQs directly from the ancestry. Simply create a predicted IQ based on the proportions and the source populations’ estimated IQs. I tried a number of variations, but they were all close to this: Euro*95+Amer*85+Afro*70. The reason to use 95 for Euro and not, say, 100 is that 100 is the IQ of Northern Europeans, in particular the British (‘Greenwich Mean IQ’). The European genes found in the Americas are mostly from Spain and Portugal, which have estimated IQs of 96.6 and 94.4 (mean = 95.5). This creates a problem, since the populations of the US and Canada are not mostly descended from these somewhat lower-IQ Europeans, but the error source is small (one can always just try excluding them).
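The prediction described above (a score computed as the ancestry-weighted average of the source populations’ values, as in Euro*95+Amer*85+Afro*70) can be written as a small function. This is a Python sketch with hypothetical component names and values, not the R code used in the analysis. It checks that the proportions sum to 1, since otherwise the weighted sum is not an average.

```python
def predict_from_proportions(proportions, component_values, tol=1e-6):
    """Predicted score as the proportion-weighted sum of component values.

    proportions: dict mapping component name -> share (must sum to 1)
    component_values: dict mapping component name -> assumed value
    """
    total = sum(proportions.values())
    if abs(total - 1) > tol:
        raise ValueError(f"proportions sum to {total}, not 1")
    return sum(share * component_values[name] for name, share in proportions.items())


# Hypothetical component values and one hypothetical population's shares.
values = {"A": 100.0, "B": 90.0, "C": 80.0}
shares = {"A": 0.5, "B": 0.3, "C": 0.2}
pred = predict_from_proportions(shares, values)
print(pred)

# Because the shares sum to 1, replacing every value v with a + b*v turns each
# prediction into a + b*(old prediction): a linear rescaling of the same ranking.
a, b = -5.0, 0.1
transformed = {k: a + b * v for k, v in values.items()}
print(predict_from_proportions(shares, transformed), a + b * pred)
```

One consequence of the last two lines: two sets of component values that are exactly linearly related produce predictions that correlate perfectly with each other, whatever the ancestry data look like.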

So, do the predictions work? Yes.

Now, there is another kind of error with such estimates, called elevation. It refers to getting the intervals between countries right while systematically over- or underestimating the overall level. This kind of error is undetectable in correlation analysis. But one can calculate it by subtracting the measured IQs from the predicted IQs and then taking the mean of these differences. Positive values mean that one is overestimating, negative values mean underestimation. The value for the above is 1.9, so we’re overestimating a little bit, but it’s fairly close. Some of this is due to USA and CAN, but then again, LCA (St. Lucia) and DMA (Dominica) are strong negative outliers, perhaps just wrong estimates by Lynn and Vanhanen (the only study for St. Lucia is this, but I don’t have the norms so I can’t calculate the IQ).

I told Davide Piffer about these results and he suggested that I use his PCA factor scores instead. These are not meaningful in themselves, but their intervals are estimated directly from the genetic data. His numbers are: Africa: -1.71; Native American: -0.9; Spanish: -0.3. OK, let’s try:


Astonishingly, the correlation is almost the same, differing by only .01. However, this fact is less overwhelming than it seems at first, because it arises simply from the fact that the correlation between the two sets of three population estimates is .999 (95.5