The book is on Libgen (free download).

Since I have ventured into criminology as part of my ongoing research program into the spatial transferability hypothesis (psychological traits are stable when people move around, including between countries) and the immigrant groups by country of origin studies, I thought it was a good idea to actually read some criminology. So since there was a recent book covering genetically informative studies, this seemed like a decent choice, especially because it was also available on libgen for free! :)

So basically it is a debate book with a number of topics. For each topic, someone (or a group of someones) will argue for or explain the non-genetic theories/hypotheses, while another someone will sum up the genetically informative studies (i.e. behavioral genetics studies into crime) or at least biologically informed (e.g. neurological correlates of crime).

Initially, I read all the sociological chapters too until I decided they were a waste of time to read. Then I just read the biosocial ones. If you are wondering about the origin of that term as opposed to the more commonly used synonym sociobiological, the use of it was mostly a move to avoid the political backlash. One of the biosocial authors explained it like this to me:

In terms of the name biosocial (versus sociobiological), I think the name change happened accidentally. But there was somewhat of a reason, I guess. EO Wilson and sociobiological thought was so hated amongst sociologists and criminologists, none of us would have gotten a job had we labelled ourselves sociobiologists. Though it was no great secret that sociobiology gave birth to our field. In some ways, it was purely a semantic way to fend off attacks. Even so, there are some distinctions between us and old school sociobiology (use of behavior genetic techniques, etc.).

The book suffers from the widespread problem in social science of not giving effect size numbers. This is more of a problem for the sociological chapters, but true also for the biosocial ones. If effect sizes are not reported, one cannot compare the importance of the alleged causes! Note that behavioral genetics results inherently include effect sizes. The simplest ACE fitting will output the effect sizes for additive genetics, shared environment and unshared environment+error.
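
To make the point about built-in effect sizes concrete, here is a minimal sketch using Falconer's formulas, which approximate the ACE components directly from twin correlations. This is a simplification swapped in for illustration, not the model fitting used in the book's studies, and the twin correlations below are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Approximate ACE variance shares from MZ and DZ twin correlations.

    Falconer's formulas are a rough stand-in for a proper model fit.
    """
    a2 = 2 * (r_mz - r_dz)  # A: additive genetics
    c2 = 2 * r_dz - r_mz    # C: shared environment
    e2 = 1 - r_mz           # E: unshared environment + measurement error
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical twin correlations, chosen only for illustration.
estimates = falconer_ace(r_mz=0.60, r_dz=0.35)
for component, share in estimates.items():
    print(f"{component}: {share:.2f}")
```

The three shares sum to 1 by construction, so each one is directly readable as an effect size (a proportion of variance).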

Even if you don’t plan to read much of this, I recommend reading the highly entertaining chapter: The Role of Intelligence and Temperament in Interpreting the SES-Crime Relationship by Anthony Walsh, Charlene Y. Taylor, and Ilhong Yun.

From Reddit.

jufnitz (2 points):

Not a bad place to plug the excellent, accessible, and snarky Statistics Done Wrong, which explains this problem with p-values along with many other, similarly rampant statistical missteps in natural and social science.

Also too, this:

Science is not a magic wand that turns everything it touches to truth. Instead, “science operates as a procedure of uncertainty reduction,” said Nosek, of the Center for Open Science. “The goal is to get less wrong over time.” This concept is fundamental — whatever we know now is only our best approximation of the truth. We can never presume to have everything right.

…is what philosophers of science have been saying for decades now, contrary to the lay tendency to invest in science the social and ideological capital once held by organized religion. Sure there are scientists who encourage this tendency, but this isn’t necessarily distinguishable from rank self-interest fit for the most soulless corporate executive.


Lewontin is not the right person to cite for skepticism. He is best known for his fallacy with regards to racial divisions.

jufnitz (1 point):

Among the general population of working scientists, Lewontin is “best known” for his groundbreaking contributions to evolutionary biology and evolutionary genetics. The circles among which he’s “best known for his fallacy with regards to racial divisions” tend to be those of scientific eugenics and lay racism.


His Wikipedia page mentions his anti-determinism work/activism in the introduction section too. :)

You can also look at his citations:

His #1 citation is his famous paper against the adaptationist program, i.e. part of his anti-determinism work. His #2 fits what you are saying. His #3 is again part of his general behavior genetics/differential psychology denialism. #4 concerns the very question of how to apportion human diversity. #5 is his communist biology manifesto (didn’t we try that once before?). And so on.

You can also look at his publications from the last 10 years: they seem to be exclusively or mostly about his denialism project, i.e. political activism. He states this himself in his own books. See also Defenders of the truth.

So yeah, he is mostly known for his political activist biology. He did some real work a long time ago, for which he is still rightfully known to population geneticists.

Note that perhaps there should be scare quotes around human in the title. Would humans with a 1,000 SD increase in (general) cognitive ability (CA) really be human?

Steve Hsu discusses his rough estimation that we can increase CA in humans around 1,000 SD by basically turning all the current alleles with negative effects into their positive or neutral variants. While the reasoning seems sound enough to me, I can think of some problems.

Trait level x gene interactions

One problem is the possibility of trait level x gene interactions. For instance, suppose that a large number of genes affect pathway X to CA in a roughly linear fashion (i.e. what we find using familial studies and GCTA methods). This could be brain nerve conduction velocity (BNCV) for which there is some evidence that it is related to CA (TE Reed, AR Jensen, 1993). One seemingly mostly forgotten study did find evidence that the correlation between IQ and NCV is genetic (FV Rijsdijk, DI Boomsma, 1997). There is a physical limit on how fast BNCV can be, such that the closer we get to the physical limit, the smaller the increase we get from altering another negative allele to its positive version. This would be roughly equivalent to the situation in physics with the speed of light. A given amount of energy converted to kinetic energy will result in a smaller increase in m/s as we get closer to the speed of light (the physical limit).

In the comments, Hsu invokes the history of artificial selection on e.g. oil content to argue against trait level x gene interactions. See also: Animal breeding, human breeding. Brains are much more complicated than simple oil content, tho.

Brain size and BNCV

I seem to recall that due to the relatively low BNCV, there is a limit on the practical size of the brain. We know that brain size correlates with CA around .25 (large meta-analysis), which perhaps after corrections for errors will be .35 (restriction of range and measurement error; see Understanding Statistics). The reason this problem happens is that the internal brain communication will become slower as the brain size increases (brief discussion here), which presumably in the end results in a lower (possibly negative in the long run!) increase in CA from changing the alleles that result in larger brains. Solving this could mean requiring more modularization, which presumably would affect the factor structure of cognitive abilities resulting in a weaker general factor.

Brain size and reproduction

When selecting for one trait one will simultaneously select for a number of other genetically correlated traits. With cognitive ability, one of them is brain size. However, due to the physical limitation on space in women’s wombs, we cannot just scale up the size of brains indefinitely. The relatively large human head size already results in complications with giving birth in current humans. The birth problem has probably been a relatively strong selective force against higher CA.

We can of course use Caesareans now to avoid between-the-legs birth, so it is not really a problem, but it adds costs to the reproduction process. In the long run, if we scale up brain size a lot, we would need to scale up the size of women’s interior space to accommodate the larger fetus. Note that if we just increase the size of women overall, this would result in a smaller brain to body ratio, which is what really matters. So it won’t be so easy to deal with this problem.

The final (biological reproduction) solution is to stop using women for reproduction: artificial wombs/uterus. This technology is however not being aggressively pursued as far as I know, so it is probably a number of decades away.

Of course, we will want to switch to some other substrate at some point too. :)

Reading up on the huge animal breeding literature gives a useful background to one’s thinking about what selection on humans will do in the future (embryo selection and direct editing à la CRISPR).


I made the above infograph some time ago, maybe 1-2 years. It is still pretty accurate. The newest data for genome sequencing does not look much different.

Steve Hsu has been following some of the animal breeding literature, e.g. Frontiers in cattle genomics.

I dug around a bit and found some reviews. They mentioned various interesting experiments. Of course, the most interesting experiment is still the Russian domesticated fox experiment (I want one of these!). Recently, there was an interesting one about breeding for brain size in guppies.


There are also the famous rat maze ability experiments. Solving mazes is g-loaded in humans (Jensen, 1980, book). A good review is: Tolman and Tryon: Early research on the inheritance of the ability to learn.


The newest and most interesting part in relation to humans is using genomic predictors alone. There is a recent, easy to read review: Understanding genomic selection in poultry breeding.

selection for eggs

Because the animal breeding field has been going for so long, one finds 100s if not 1000s of these types of graphs, yet they are still exciting. One might wonder: is there nothing one cannot select for? It seems no matter the trait, evolution finds a way. Dawkins seems to agree:

Political opposition to eugenic breeding of humans sometimes spills over into the almost certainly false assertion that it is impossible. Not only is it immoral, you may hear it said, it wouldn’t work. Unfortunately, to say that something is morally wrong, or politically undesirable, is not to say that it wouldn’t work. I have no doubt that, if you set your mind to it and had enough time and enough political power, you could breed a race of superior body-builders, or high-jumpers, or shot-putters; pearl fishers, sumo wrestlers, or sprinters; or (I suspect, although now with less confidence because there are no animal precedents) superior musicians, poets, mathematicians or wine-tasters. The reason I am confident about selective breeding for athletic prowess is that the qualities needed are so similar to those that demonstrably work in the breeding of racehorses and carthorses, of greyhounds and sledge dogs. The reason I am still pretty confident about the practical feasibility (though not the moral or political desirability) of selective breeding for mental or otherwise uniquely human traits is that there are so few examples where an attempt at selective breeding in animals has ever failed, even for traits that might have been thought surprising. Who would have thought, for example, that dogs could be bred for sheep-herding skills, or ‘pointing’, or bull-baiting?

[from The Greatest Show on Earth]

Selection for High and Low Fatness in Swine


Also interesting is that selective breeding makes it possible to estimate realized heritability, not just from family relationships.
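
Realized heritability comes straight out of the breeder's equation, R = h² × S: divide the response to selection (R, the shift in the offspring mean) by the selection differential (S, how far the selected parents were from the population mean). A minimal sketch follows; the function name and the trait values are hypothetical, picked only to show the arithmetic.

```python
def realized_heritability(population_mean: float,
                          selected_parent_mean: float,
                          offspring_mean: float) -> float:
    """Realized heritability h^2 = R / S from one generation of selection."""
    s = selected_parent_mean - population_mean  # selection differential S
    r = offspring_mean - population_mean        # response to selection R
    if s == 0:
        raise ValueError("No selection was applied (S = 0).")
    return r / s


# Hypothetical numbers: parents selected 20 units above the mean,
# offspring end up 8 units above the original mean.
h2 = realized_heritability(population_mean=100.0,
                           selected_parent_mean=120.0,
                           offspring_mean=108.0)
print(f"Realized heritability: {h2:.2f}")
```

No family relationships are needed here, only the three means, which is why selection experiments give an independent check on pedigree-based estimates.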


I think we will see some interesting humans in the future. The reason is this: embryo selection is very close and genetic engineering is fairly close. If some countries ban them, others will allow them. Or one can sail or fly to a seastead. Or use any number of black market solutions that will inevitably spring up. Probably, not all jurisdictions will ban it, so there will be reproductive havens+tourism just like there are tax havens and even suicide havens. I don’t think Western governments will dare to force abortions on pregnant returnees, so there is nothing much they can do at that point. There is also of course the near-impossibility of proving that a fetus is a result of embryo selection, not normal fertilization. After all, embryo selection is just choosing between actual possibilities (hopefully, philosophy readers will allow me the flagrant abuse of modal terminology). If everybody starts having healthier children by using this technology, there will be no way to prove that a particular couple ‘cheated’. It is only in the aggregate one can prove that something is going on. A particular couple may just have been lucky. As for direct editing, it may be possible to spot genetically, but I doubt this will happen.

In the EU, I suspect the legality of this practice will come down to legal interpretation. The EU has a CHARTER OF FUNDAMENTAL RIGHTS OF THE EUROPEAN UNION, in which one can read:

Article 3
Right to the integrity of the person
1.   Everyone has the right to respect for his or her physical and mental integrity.
2.   In the fields of medicine and biology, the following must be respected in particular:
(a) the free and informed consent of the person concerned, according to the procedures laid down by law;
(b) the prohibition of eugenic practices, in particular those aiming at the selection of persons;
(c) the prohibition on making the human body and its parts as such a source of financial gain;
(d) the prohibition of the reproductive cloning of human beings.

But given that selection of persons is widely done for e.g. Down’s syndrome, (b) is clearly ignored in practice. (c) is also ignored e.g. for sperm and egg selling, altho they call it donation (with a nice monetary benefit in return). So, the best hope is that embryo selection for medical reasons will sneak into practice and become so standard that it would seem outlandish to ban it. This is well underway. When the public comes to accept it, the judges will probably make up some legal reason to interpret (b) narrowly, e.g. as to refer to forced sterilization. One may be able to find support for this in the background work for this charter, altho I haven’t looked into it.

Given that the technology will likely come into wide-scale practice within the next couple of decades, what remains to be researched more (a lot more) is how people will actually make choices. When prospective parent(s) have to make decisions re. which embryos to implant, there will be a choice. With a limited choice of embryos, one cannot simultaneously maximize all desirable traits and minimize all undesirable traits. There will probably be clear trends in this: few will select against intelligence, few will select short boys, few will select for nasty diseases, most will select for health and happiness. People like Helen Henderson are not common:

I can say, without hesitation, that my life has been richer because I have MS. How can anyone who has no experience with disabilities understand that?

[From Future Human Evolution.]

If they still try to get children with horrible genetic diseases, the government will probably (should?) step in and ban it.

Still, there will be lots of variation. This variation in selective pressure between people should, together with strong assortative mating, result in divergence of human lines. This will be somewhat akin to dog, cat and horse breeds. Assortative mating is apparently so strong that people even choose pets that are similar to themselves: Self seeks like: many humans choose their dog pets following rules used for assortative mating.


We truly live in interesting times. :)

If you want to read more like this, there was also recently the double paper: Eugenics, Ready or Not I, II. (I could not find a link to part 2.)


A recent paper informs us that we have now found a small number of SNPs that explain skin color in European samples.

In the International Visible Trait Genetics (VisiGen) Consortium, we investigated the genetics of human skin color by combining a series of genome-wide association studies (GWAS) in a total of 17,262 Europeans with functional follow-up of discovered loci. Our GWAS provide the first genome-wide significant evidence for chromosome 20q11.22 harboring the ASIP gene being explicitly associated with skin color in Europeans. In addition, genomic loci at 5p13.2 (SLC45A2), 6p25.3 (IRF4), 15q13.1 (HERC2/OCA2), and 16q24.3 (MC1R) were confirmed to be involved in skin coloration in Europeans. In follow-up gene expression and regulation studies of 22 genes in 20q11.22, we highlighted two novel genes EIF2S2 and GSS, serving as competing functional candidates in this region and providing future research lines. A genetically inferred skin color score obtained from the 9 top-associated SNPs from 9 genes in 940 worldwide samples (HGDP-CEPH) showed a clear gradual pattern in Western Eurasians similar to the distribution of physical skin color, suggesting the used 9 SNPs as suitable markers for DNA prediction of skin color in Europeans and neighboring populations, relevant in future forensic and anthropological investigations.


All 9 SNPs listed in Table 1 were used to construct a genetically inferred skin color score in 940 samples from 54 worldwide populations (HGDP-CEPH samples), which showed a spatial distribution with a clear gradual increase in skin darkness from Northern Europe to Southern Europe to Northern Africa, the Middle East and Western Asia (Figure S2); in agreement with the known distribution of skin color across these geographic regions. Outside of these geographic regions, the inferred skin color score appeared rather similar (i.e., failing to discriminate), despite the known phenotypic skin color difference between generally lighter Asians/Native Americans and darker Africans. This demonstrates that although these 9 SNPs can explain skin color variation among Europeans, they cannot explain existing skin color differences between Asians/Native Americans and Africans. Therefore, these differences in skin color variation may partly be due to different DNA variants not identifiable by this European study with restricted genetic origin.

The same general problem may apply to the Piffer results. Perhaps the SNPs found only affect cognitive ability within European samples (or Eurasian, because there is one Chinese replication). This sounds like a case of epistasis, where the other necessary gene(s) for the identified SNPs to have an effect on cognitive ability have substantial frequencies in European populations, but don’t exist or are very rare in non-European populations.

As far as I know, this is a possible but unlikely scenario. It will perhaps serve as one of the remaining areas that non-hereditarians can point to and say that there is still reasonable doubt. The solution is to perform GWAS on African subjects. Luckily, a large number of such subjects live in or near (relatively) affluent countries in the Americas.


Remains to be done:

  • Admixture analysis (doing)
  • Proofreading and editing
  • Deciding how to control for age and scanner (technical question)


I explore a large (N≈1000), open dataset of brain measurements and find a general factor of brain size (GBSF) that covers all regions except possibly the amygdala (loadings near-zero, 3 out of 4 negative). It is very strongly correlated with total brain size volume and surface area (rs>.9). The factor was (near)identical across genders after adjustments for age were made (factor congruence 1.00).

GBSF had similar correlations to cognitive measures as did other aggregate brain size measures: total cortical area and total brain volume. I replicated the finding that brain measures were associated with parental income and educational attainment.
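
For readers unfamiliar with factor congruence, here is a minimal sketch, assuming the figure quoted above refers to Tucker's congruence coefficient (the usual choice, but an assumption on my part here). The loadings are hypothetical and are not the values from the dataset.

```python
from math import sqrt


def congruence(loadings_x, loadings_y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(loadings_x) != len(loadings_y):
        raise ValueError("Loading vectors must have the same length.")
    cross = sum(a * b for a, b in zip(loadings_x, loadings_y))
    norm = sqrt(sum(a * a for a in loadings_x) * sum(b * b for b in loadings_y))
    return cross / norm


# Hypothetical general-factor loadings for the same four regions in each
# one-gender sample; the last region stands in for a near-zero loading.
loadings_male = [0.85, 0.78, 0.72, 0.05]
loadings_female = [0.83, 0.80, 0.70, -0.02]

phi = congruence(loadings_male, loadings_female)
print(f"Factor congruence: {phi:.3f}")
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so two factors only reach 1.00 if their loadings are proportional to each other.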



A recent paper by Noble et al (2015) has gotten considerable attention in the media. An interesting fact about the paper is that most of the data was published in the paper, perhaps inadvertently. I was made aware of this fact by an observant commenter, FranklinDMadoff, on the blog of James Thompson (Psychological Comments). In this paper I make use of the same data, revisit their conclusions as well as run some analyses of my own.

The abstract of the paper reads:

Socioeconomic disparities are associated with differences in cognitive development. The extent to which this translates to disparities in brain structure is unclear. We investigated relationships between socioeconomic factors and brain morphometry, independently of genetic ancestry, among a cohort of 1,099 typically developing individuals between 3 and 20 years of age. Income was logarithmically associated with brain surface area. Among children from lower income families, small differences in income were associated with relatively large differences in surface area, whereas, among children from higher income families, similar income increments were associated with smaller differences in surface area. These relationships were most prominent in regions supporting language, reading, executive functions and spatial skills; surface area mediated socioeconomic differences in certain neurocognitive abilities. These data imply that income relates most strongly to brain structure among the most disadvantaged children.

The results are not all that interesting, but the dataset is very large for a neuroscience study: the median of the median sample sizes across the 49 meta-analyses reviewed by Button et al (2013) is 116.5 (based on the data in their Table 1). Furthermore, they provide a very large set of different, non-overlapping brain measurements which are useful for a variety of analyses, and they provide genetic admixture data which can be used for admixture mapping.

Why their results are as expected

The authors give their results (positive relationships between various brain size measures and parental educational and economic variables) environmental interpretations. For instance:

It is possible that, in these regions, associations between parent education and children’s brain surface area may be mediated by the ability of more highly educated parents to earn higher incomes, thereby having the ability to purchase more nutritious foods, provide more cognitively stimulating home learning environments, and afford higher quality child care settings or safer neighborhoods, with more opportunities for physical activity and less exposure to environmental pollutants and toxic stress [3, 37]. It will be important in the future to disambiguate these proximal processes by measuring home, family and other environmental mediators [21].

However, one could also expect the relationship to be due to general cognitive ability (GCA; aka. general intelligence) and its relationship to favorable educational and economic outcomes, as well as brain measures. Figure 1 illustrates this expected relationship:

Figure 1 – Relationships between variables

The purple line is the one the authors are often arguing for based on their observed positive relationships. As can be seen in the figure, this positive relationship is also expected because of parental education/income’s positive relationship to parental GCA, which is related to parental brain properties which are highly heritable. Based on these well-known relationships, we can estimate some expected correlations. The true score relationship between adult educational attainment and GCA is somewhere around .56 (Strenze, 2007).

The relationship between GCA and whole brain size is around .24-.28, depending on whether one wants to use the unweighted mean, n-weighted mean or median, and which studies one includes of those collected by Pietschnig et al (2014). I used healthy samples (as opposed to clinical) and only FSIQ. This value is uncorrected for measurement error of the IQ test, typically assumed to be around .90. If we choose the conservative value of .24 and then correct with .90, we get .27 as an estimated true score correlation.

The heritability of whole brain size is very high. Bouchard (2014) summarized a few studies: one found cerebral total h^2 = .89, another whole-brain grey matter .82, whole-brain white matter .87, and a third total brain volume .80. Perhaps there is some publication bias in these numbers, so we can choose .80 as an estimate. We then correct this for measurement error and get .89. None of the previous studies were corrected for restriction of range, which is fairly common because most studies use university students (Henrich et al, 2010) who average perhaps 1 standard deviation above the population mean in GCA. If we multiply these numbers (.56 × .27 × .89) we get an estimate of r=.13 between parental education and total brain volume or a similar measure. As for income, the expected correlation is somewhat lower because the relationship between GCA and income is weaker, perhaps .23 (Strenze, 2007). This gives .05. However, Strenze did not account for the non-linearity of the income x GCA relationship, so it is probably somewhat higher.
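
The arithmetic behind the two expected correlations can be sketched as below. Dividing by .90 is an assumption inferred from the rounded figures in the text (.24 becomes .27, .80 becomes .89); the textbook disattenuation formula divides by the square root of the reliability instead, which would give somewhat smaller corrected values. The variable names are mine.

```python
# Inputs taken from the text above.
r_education_gca = 0.56   # education x GCA, true score (Strenze, 2007)
r_income_gca = 0.23      # income x GCA (Strenze, 2007)
r_gca_brain_obs = 0.24   # GCA x whole brain size, conservative observed value
h2_brain_obs = 0.80      # heritability of whole brain size, conservative value
correction = 0.90        # assumed measurement-error correction (see lead-in)

# Correct the two brain-related values for measurement error.
r_gca_brain = r_gca_brain_obs / correction  # about .27
h2_brain = h2_brain_obs / correction        # about .89

# Multiply along the path: parental SES -> parental GCA -> parental brain
# -> offspring brain (via heritability).
expected_r_education = r_education_gca * r_gca_brain * h2_brain
expected_r_income = r_income_gca * r_gca_brain * h2_brain

print(f"Expected r, parental education x brain size: {expected_r_education:.2f}")
print(f"Expected r, parental income x brain size:    {expected_r_income:.2f}")
```

Multiplying path values like this treats each link as the only route between the variables, so it is a back-of-the-envelope expectation, not a fitted model.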

Initial analyses

Analysis was done in R. Code file, figures, and data are available in the supplementary material.

Collecting the data

The authors did not just publish one datafile with comments about the variables, but instead various excel files were attached to parts of the figures. There are 6 such files. They all contain the same number of cases and they overlap completely (as can be seen by the subjectID column). The 6 files however do not overlap completely in their columns and some of them have unique columns. These can all be merged into one dataset.
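
The merge step can be sketched as follows. The actual analysis was done in R, so this Python version is only an illustration of the idea: join rows across files on the subjectID column and raise an error if two files disagree about a shared column. The other column names and all values below are hypothetical.

```python
def merge_by_subject(tables, key="subjectID"):
    """Merge several tables (lists of row dicts) into one row per subject."""
    merged = {}
    for table in tables:
        for row in table:
            subject = row[key]
            target = merged.setdefault(subject, {key: subject})
            for column, value in row.items():
                # Shared columns must agree across files; flag any conflict.
                if column in target and target[column] != value:
                    raise ValueError(
                        f"Conflicting values for {column!r} in subject {subject!r}"
                    )
                target[column] = value
    return list(merged.values())


# Two hypothetical files with overlapping subjects but different columns.
file_a = [
    {"subjectID": "S1", "age": 9.5},
    {"subjectID": "S2", "age": 14.0},
]
file_b = [
    {"subjectID": "S2", "total_area": 180000.0},
    {"subjectID": "S1", "total_area": 171000.0},
]

merged = merge_by_subject([file_a, file_b])
for row in merged:
    print(row)
```

Checking for conflicts in shared columns is a cheap sanity test that the files really do describe the same cases.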

Dealing with missing data

The original authors dealt with this simply by relying on the complete cases only. This method can bias the results when the data is not missing completely at random. Instead, it is generally better to impute missing data (Sterne et al, 2009). Figure 1 shows the matrixplot of the data file.


The red areas mean missing data, except in the case of nominal variables which are for some reason always colored red (an error I think). Examining the structure of missing data showed that it was generally not possible to impute the data, since many cases were missing most of their values. One will have to exclude these cases. Doing so reduces the sample size from 1500 to 1068. The authors report having 1099 complete cases, but I’m not sure where the discrepancy arises.

Dealing with gender

Since males have much larger brain volumes than females, even after adjustment for body size, there is the question of how to deal with gender (no distinction is being made here between sex and gender). The original authors did this by regressing the effect out. However, in my experience, regression does not always accomplish this perfectly, so when possible one should just split the sample by gender and calculate results in each one-gender sample. One cannot do the sample splitting when one is interested in the specific regression effect of gender, or when the resulting samples would be too small.

Dealing with age

This problem is tricky. The original authors used age and age² to deal with age in a regression model. However, since I wanted to visualize the relationships between variables, this option was not useful to me because it would only give me the summary statistics with the effects of age, not the data. Instead, I calculated the residuals for all variables of interest after they were regressed on age, age² and age³. The cubic age was used to further catch non-linear effects of age, as noted by e.g. Jensen (2006: 132-133).
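
The residualization step can be sketched in plain Python as below. The actual analysis was done in R; this version is only an illustration, and the ages and brain volumes are made up. Age is centered and rescaled before taking powers purely for numerical stability, which does not change the residuals because the rescaled polynomial spans the same space as age, age² and age³.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residualize_on_age(values, ages, degree=3):
    """Return residuals of values after regression on age, ..., age^degree."""
    mean_age = sum(ages) / len(ages)
    scale = max(abs(a - mean_age) for a in ages) or 1.0
    # Design matrix: intercept plus powers of the centered, rescaled age.
    design = [[((a - mean_age) / scale) ** p for p in range(degree + 1)]
              for a in ages]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # ordinary least squares
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    return [y - f for y, f in zip(values, fitted)]


# Hypothetical data: a cubic age trend plus individual deviations.
ages = [3, 5, 8, 10, 12, 14, 16, 18, 20]
deviations = [5.0, -3.0, 2.0, -4.0, 6.0, -1.0, 3.0, -5.0, 1.0]
volumes = [500 + 40 * a - 1.5 * a ** 2 + 0.01 * a ** 3 + d
           for a, d in zip(ages, deviations)]

residuals = residualize_on_age(volumes, ages)
print([round(r, 2) for r in residuals])
```

The residuals keep one value per case, so they can be plotted and correlated like the original variables, which is the reason for preferring them over model summary statistics.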

Dealing with scanning site

One peculiar feature of the study not discussed by the authors was the relatively large effect of different scanners on their results, see e.g. their Table 3. To avoid scanning site influencing the results, I also regressed this out (as a nominal variable with 13 levels).

Dealing with size

The dataset does not have size measures thus making it impossible to adjust for body size. This is problematic as it is known that body size correlates with GCA both within and between species. We are interested in differences in brain size holding body size equal. This cannot be done in the present study.

Factor analyzing brain size measurements

Why would one want to factor analyze brain measures?

The short answer is the same as that to the question: why would one want to factor analyze cognitive ability data? The answer: To explore the latent relationships in the data that are not immediately obvious. A factor analysis will reveal whether there is a general factor of some domain, which can be a theoretically very important discovery (Dalliard, 2013; Jensen, 1998:chapter 2). If there is no general factor, this will also be revealed and may be important as well. This is not to say that general factors or the lack thereof are the only interesting thing about the factor structure; multifactor structures are also interesting, whether orthogonal (uncorrelated) or as part of a hierarchical solution (Jensen, 2002).

The long answer is that human psychology is fundamentally a biological fact, a matter of brain physics and chemistry. This is not to say that important relationships can not fruitfully be described better at higher-levels (e.g. cognitive science), but that ultimately the origin of anything mental is biology. This fact should not be controversial except among the religious, for it is merely the denial of dualism, of ghosts, spirits, gods and other immaterial beings. As Jensen (1997) wrote:

Although the g factor is typically the largest component of the common factor variance, it is the most “invisible.” It is the only “factor of the mind” that cannot possibly be described in terms of any particular kind of knowledge or skill, or any other characteristics of psychometric tests. The fact that psychometric g is highly heritable and has many physical and brain correlates means that it is not a property of the tests per se. Rather, g is a property of the brain that is reflected in observed individual differences in the many types of behavior commonly referred to as “cognitive ability” or “intelligence.” Research on the explanation of g, therefore, must necessarily extend beyond psychology and psychometrics. It is essentially a problem for brain neurophysiology. [my emphasis]

If GCA is a property of the brain, or at least if there is an analogous general brain performance factor, it may be possible to find it with the same statistical methods that found the GCA. Thus, to find it, one must factor analyze a large, diverse sample of brain measurements that are known to correlate individually with GCA in the hope that there will be a general factor which will correlate very strongly with GCA. There is no guarantee, as I see it, that this will work, but it is something worth trying.

In their chapter on brain and intelligence, Colom and Thompson (2011) write:

The interplay between genes and behavior takes place in the brain. Therefore, learning the language of the brain would be crucial to understand how genes and behavior interact. Regarding this issue, Kovas and Plomin (2006) proposed the so-called “generalist genes” hypothesis, on the basis of multivariate genetic research findings showing significant genetic overlap among cognitive abilities such as the general factor of intelligence (g), language, reading, or mathematics. The hypothesis has implication for cognitive neuroscience, because of the concepts of pleiotropy (one gene affecting many traits) and polygenicity (many genes affecting a given trait). These genetic concepts suggest a “generalist brain”: the genetic influence over the brain is thought to be general and distributed.

Which brain measurements have so far been found to correlate with GCA (or its IQ proxy)?

Below I have compiled a list of brain measurements that have at some point been found to correlate with GCA or IQ scores:

  • Brain evoked potentials: habituation time (Jensen, 1998:155)
  • Brain evoked potentials: complexity of waveform (Deary and Caryl, 1997)
  • Brain intracellular pH-level (Jensen, 1998:162)
  • Brain size: total and brain regions (Jung and Haier, 2007)
  • Of the above, grey matter and white matter separately
  • Cortical thickness (Deary et al, 2010)
  • Cortical development (Shaw et al, 2006)
  • Nerve conduction velocity (Deary and Caryl, 1997)
  • Brain wave (EEG) coherence (Jensen, 2002)
  • Event related desynchronization of brain waves (Jensen, 2002)
  • White matter lesions (Turken et al, 2008)
  • Concentrations of N-acetyl aspartate (Jung et al, 2009)
  • Water diffusion parameters (Deary et al, 2010)
  • White matter integrity (Deary et al, 2010)
  • White matter network efficiency (Li et al, 2009)
  • Cortical glucose metabolic rate during mental activity / Neural efficiency (Neubauer et al, 2009)
  • Uric acid level (Jensen, 1998:162)
  • Density of various regions (Frangou et al, 2004)
  • White matter fractional anisotropy (Navas‐Sánchez et al, 2014; Kim et al, 2014)
  • Reliable responding to changing inputs (Euler et al, 2015)

Most of the references above lead to the reviews I relied upon (Deary and Caryl, 1997; Jensen, 1998, 2002; Deary et al, 2010). There are surely more, and probably a large number of the above are false positives. Some I could not find a direct citation for. We cannot know which are false positives until large datasets are compiled with these measures as well as a large number of cognitive tests. A simple WAIS battery won’t suffice: there need to be elementary cognitive tests too, and other tests that vary more in content, type and g-loading. This is necessary if we are to use the method of correlated vectors, as this does not work well without diversity in factor indicators. It is also necessary if we are to examine non-GCA factors.

My hypothesis is that if there is a general brain factor, then it will have a hierarchical structure similar to GCA. Figure 2 shows a hypothetical structure of this.

Figure 2

Notes: Squares are latent variables and circles are observed variables. I am aware this is the opposite of normal practice (e.g. Beaujean, 2014), but text is difficult to fit into circles.

Of these, the speed factor has to do with speed of processing which can be enhanced in various ways (nerve conduction velocity, higher ‘clock’ frequency). Efficiency has to do with efficient use of resources (primarily glucose). Connectivity has to do with better intrabrain connectivity, either by having more connections, less problematic connections or similar. Size has to do with having more processing power by scaling up the size. Some areas may matter more than others for this. Integrity has to do with withstanding assaults, removing garbage (which is known to be the cause of many neurodegenerative diseases) and the like. There are presumably more factors, and some of mine may need to be split.

Previous studies and the present study

Altho factor analysis is common in differential psychology and related fields, it is somewhat rare outside of those. And when it is used, it is often done in ways that are questionable (see e.g. controversy surrounding Hampshire et al (2012): Ashton et al (2014a), Hampshire et al (2014), Ashton et al (2014b), Haier et al (2014a), Ashton et al (2014c), Haier et al (2014b)). On the other hand, factor analytic methods have been used in a surprisingly diverse collection of scientific fields (Jöreskog 1996; Cudeck and MacCallum, 2012).

I am only familiar with one study applying factor analysis to different brain measures and it was a fairly small study at n=132 (Pennington et al, 2000). They analyzed 13 brain regions and reported a two-factor solution. It is worth quoting their methodology section:

Since the morphometric analyses yield a very large number of variables per subject, we needed a data reduction strategy that fit with the overall goal of exploring the etiology of individual differences in the size of major brain structures. There were two steps to this strategy: (1) selecting a reasonably small set of composite variables that were both comprehensive and meaningful; and (2) factor analyzing the composite variables. To arrive at the 13 composite variables discussed earlier, we (1) picked the major subcortical structures identified by the anatomic segmentation algorithms, (2) reduced the set of possible cortical variables by combining some of the pericallosal partitions as described earlier, and (3) tested whether it was justifiable to collapse across hemispheres. In the total sample, there was a high degree of correlation (median R=.93, range=.82-.99) between the right and left sides of any given structure; it thus seemed reasonable to collapse across hemispheres in creating composites. We next factor-analyzed the 13 brain variables in the total sample of 132 subjects, using Principal Components factor analysis with Varimax rotation (Maxwell & Delaney, 1990). The criteria for a significant factor was an eigenvalue>1.0, with at least two variables loading on the factor.

The present study makes it possible to perform a better analysis. The sample is about 8 times larger and has 27 non-overlapping measurements of brain size, broadly speaking. The major downside of the variables in the present study is that the cerebrum is not divided into smaller areas as was done in their study. Given the very large sample size, one could use 100 variables or more.

The available brain measures are:

  1. cort_area.ctx.lh.caudalanteriorcingulate
  2. cort_area.ctx.lh.caudalmiddlefrontal
  3. cort_area.ctx.lh.fusiform
  4. cort_area.ctx.lh.inferiortemporal
  5. cort_area.ctx.lh.middletemporal
  6. cort_area.ctx.lh.parsopercularis
  7. cort_area.ctx.lh.parsorbitalis
  8. cort_area.ctx.lh.parstriangularis
  9. cort_area.ctx.lh.rostralanteriorcingulate
  10. cort_area.ctx.lh.rostralmiddlefrontal
  11. cort_area.ctx.lh.superiortemporal
  12. cort_area.ctx.rh.caudalanteriorcingulate
  13. cort_area.ctx.rh.caudalmiddlefrontal
  14. cort_area.ctx.rh.fusiform
  15. cort_area.ctx.rh.parsopercularis
  16. cort_area.ctx.rh.parsorbitalis
  17. cort_area.ctx.rh.parstriangularis
  18. cort_area.ctx.rh.rostralanteriorcingulate
  19. cort_area.ctx.rh.rostralmiddlefrontal
  20. vol.Left.Cerebral.White.Matter
  21. vol.Left.Cerebral.Cortex
  22. vol.Left.Hippocampus
  23. vol.Left.Amygdala
  24. vol.Right.Cerebral.White.Matter
  25. vol.Right.Cerebral.Cortex
  26. vol.Right.Hippocampus
  27. vol.Right.Amygdala

I am not an expert in neuroscience, but as far as I know, the above measurements are independent and thus suitable for factor analysis. The authors also reported additional aggregate measures such as total surface area and total volume. They also reported total intracranial volume, which permits the calculation of another two brain measurements: the non-brain volume of the cranium (subtracting total brain volume from total intracranial volume), and the proportion of intracranial volume used for brain.

The careful reader has perhaps noticed something bizarre about the dataset, namely that there is an unequal number of left hemisphere (“lh”) and right hemisphere (“rh”) regions (11 vs. 8). I have no idea why this is, but it is somewhat problematic in factor analysis since regions measured on both sides are in effect weighted twice, and the left side is weighted a bit more overall.

The present dataset is inadequate for properly testing the general brain factor hypothesis because it only has measurements from one domain: size. The original authors may have more measurements they did not publish. However, one can examine the existence of the brain size factor, as a prior test of the more general hypothesis.

Age and overall brain size

As an initial check, I plotted the relationship between total brain size measures and age. These are shown in Figures 3 and 4.

Figure 3 Figure 4

Curiously, these show that the size increase only occurs up to about age 8 to 10. I was under the impression that brain size continued to go up until the body in general stopped growing, around 15-20 years. This result does not appear to be inconsistent with other studies (e.g. Giedd et al, 1999). The relationship is clearly non-linear, so one will need to use the age corrections described above. To see if the correction worked, we plot the corrected total size variables against age. There should be near-zero correlation. Results are in Figures 5 and 6.

Figure 5 Figure 6

Instead we still see a slight correlation for both genders, both apparently due to a single outlier. Very odd. I examined these outliers (IDs: P0009 and P0010) but did not see anything special about them. I removed them and reran the residualization from the original data. This produced new outliers similar to before (with the IDs following them). When I removed those, new ones appeared. I figure it is due to some error in the residualization process. Indeed, a closer look revealed that the largest outliers (positive and negative) were always the first two indices. I thus removed these before doing more analyses. The second largest outliers had no particular index. I tried removing more age outliers, but it was not possible to completely remove the correlations between age and the other variables (they usually remained near r=.03). Figure 6a shows the same as Figure 6, just without the two outliers.

Figure 6a

The genders are somewhat displaced on the age variable, but if one looks at the x-axis, one can see that this is in fact a very, very small difference.
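
The age correction used in this section (regressing a measure on age, age squared and age cubed and keeping the residuals) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for the analyses: the data are simulated, the function names are hypothetical, and the scanner control is left out.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def residualize_on_age(y, age):
    """Residuals of y after OLS on an intercept, age, age^2 and age^3."""
    mean_age = sum(age) / len(age)
    sd_age = math.sqrt(sum((a - mean_age) ** 2 for a in age) / len(age))
    # Standardizing age keeps the normal equations well conditioned;
    # it does not change the fitted values of a cubic polynomial.
    z = [(a - mean_age) / sd_age for a in age]
    rows = [[1.0, v, v ** 2, v ** 3] for v in z]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear(xtx, xty)
    return [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(rows, y)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Simulated data (an assumption for illustration): growth that levels off.
random.seed(1)
ages = [random.uniform(3, 20) for _ in range(500)]
volumes = [1200 - 400 * math.exp(-a / 3) + random.gauss(0, 50) for a in ages]

resid = residualize_on_age(volumes, ages)
print(round(pearson(volumes, ages), 3), abs(pearson(resid, ages)) < 1e-6)
```

With ordinary least squares the residuals are uncorrelated with the included age terms by construction, so a leftover correlation after a correction of this kind would usually point to a problem elsewhere in the pipeline, such as misaligned rows.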

General brain size factor with and without residualization

Results for the factor analysis without residualization are shown in Figure 7. I used the fa() function from the psych package with default settings: 1 factor extracted with the minimum residuals method. Previous studies have shown factor extraction method to be of little importance as long as it isn’t principal components with a smaller number of variables (Kirkegaard, 2014).

Figure 7

We see that the factors are quite similar (factor congruence .95) but that the male factor is quite a bit stronger (var% M/F 26 vs. 16). This suggests that the factor either works differently in the genders, or there is error in the results. If it is error, we should see an improvement after removing some of it. Figure 8 shows the same plot using the residualized data.

Figure 8

The results were more similar now and stronger for both genders (var% M/F = 34 vs. 33).

The amygdala results are intriguing, suggesting that this region does not increase in size along with the rest of the brain. The right amygdala even had negative loadings in both genders.

Using all that’s left

The next thing one might want to do is extract multiple factors. I tried extracting various solutions with nfactors 3-5. These however are bogus models due to the near-1 correlation between the brain sides. This results in spurious factors that load on just 2 variables (left and right versions) with loadings near 1. One could solve this by either averaging those with 2 measurements, or using only those from the left side. It makes little difference because they correlate so highly. It should be noted tho that doing this means one can’t see any lateralization effects such as that suggested for the right amygdala.

I redid all the results using the left side variables only. Figure 9 shows the results.

Figure 9

Now all regions had positive loadings and the var% increased a bit for both genders to 36/36. Factor congruence was 1.00, even for the non-residualized data. It thus seems that the missing measures of the right side or the use of near-doubled measures had a negative impact on the results as well.

One can calculate other measures of factor strength/internal reliability, such as the average intercorrelation, Cronbach’s alpha and Guttman’s G6. These are shown in Table 1.

Table 1 – Internal reliability measures
Sample   Mean r   Alpha (raw)   Alpha (std.)   G6
Male     .33      .48           .88            .90
Female   .34      .45           .89            .90
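
As a rough consistency check on Table 1, standardized alpha can be computed from the mean inter-item correlation alone, as alpha = k * r / (1 + (k - 1) * r). The sketch below assumes k = 15 items (the 11 left-hemisphere cortical areas plus the 4 left-side volumes in the variable list); that count is inferred from the list and is an assumption, as is the function name.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from item count and mean inter-item r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

K_ITEMS = 15  # assumed: 11 lh cortical areas + 4 left-side volumes

alpha_male = standardized_alpha(K_ITEMS, 0.33)
alpha_female = standardized_alpha(K_ITEMS, 0.34)
print(round(alpha_male, 2), round(alpha_female, 2))  # 0.88 0.89
```

Under that assumption the formula gives the standardized alphas in the table from the rounded mean correlations.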


Multiple factors

We are now ready to explore factor solutions with more factors. Three different methods suggested extracting at most 5 factors in both datasets (using nScree() from the nFactors package). I extracted solutions for 2 to 6 factors for each dataset, the last included by accident. All of these were extracted with the oblique rotation method oblimin, thus possibly returning correlated factors. The prediction from a hierarchical model is clear: factors extracted in this way should be correlated. Figures 10 to 14 show the factor loadings of these solutions.

Figure 10 Figure 11 Figure 12 Figure 13

Figure 14

So it looks like the results were pretty good with 4 factors and not too good with the others. The problem with this method is that the factors extracted may be similar/identical but not in the same order and with the same name. This means that the plots above may plot the wrong factors together, which defeats the entire purpose. So what we need is an automatic method of pairing up the factors correctly if possible. The exhaustive method is trying all the pairings of factors for each number of factors to extract, and then calculating some summary metrics or finding the best overall pairing combination. This would involve quite a lot of comparisons, since e.g. two sets of, say, 5 factors can be matched one-to-one in 5! = 120 ways.

I settled for a quicker solution. For each factor solution pair, I calculated all the cross-analysis congruence coefficients. Then for each factor, I found the factor from the other analysis it had the highest congruence with and saved this information. This method can miss some okay but not great solutions, but I’m not overly concerned about those. In a good fit, the factors found in each analysis should map 1 to 1 to each other such that their highest congruence is with the analogous factor from the other analysis.

From this information, I calculated the mean of the best congruence pairs, the minimum, and whether there was a mismatch. A mismatch occurs when two or more factors from one analysis map to (have their highest congruence with) the same factor from the other analysis. I calculated these three metrics for all the analyses performed above. The results are shown in Table 2.

Table 2 – Cross-analysis comparison metrics
Factors.extracted   Mean best congruence   Min best congruence   factor.mismatch
2                   0.825                  0.73                  FALSE
3                   0.713                  0.37                  TRUE
4                   0.960                  0.93                  FALSE
5                   0.720                  0.35                  TRUE
6                   0.765                  0.58                  FALSE


As can be seen, the two analyses with 4 factors were a very good match. Those with 3 and 5 were terrible, as they produced factor mismatches. The analyses with 2 and 6 were okay.
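
The quick matching procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the loadings are made up, the function names are hypothetical, the congruence measure is taken to be Tucker's congruence coefficient, and the matching is done in one direction only (each factor of the first analysis is matched to its best partner in the second).

```python
from math import sqrt

def congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den

def match_factors(loadings_a, loadings_b):
    """For each factor in A, find the factor in B with the highest congruence.

    Each argument is a list of factors; each factor is a list of loadings
    on the same variables in the same order.
    """
    best = []
    for fa in loadings_a:
        congs = [congruence(fa, fb) for fb in loadings_b]
        j = max(range(len(congs)), key=congs.__getitem__)
        best.append((j, congs[j]))
    targets = [j for j, _ in best]
    return {
        "pairs": targets,  # index in B matched to each factor in A
        "mean": sum(c for _, c in best) / len(best),
        "min": min(c for _, c in best),
        # A mismatch: two or more factors in A share the same best partner.
        "mismatch": len(set(targets)) < len(targets),
    }

# Made-up loadings: the same two factors, extracted in a different order.
male = [[0.8, 0.7, 0.1, 0.0], [0.1, 0.0, 0.9, 0.6]]
female = [[0.0, 0.1, 0.8, 0.7], [0.9, 0.6, 0.1, 0.1]]

result = match_factors(male, female)
print(result["pairs"], result["mismatch"])
```

The `pairs` entry is what one would use to reorder the factors of the second analysis before plotting them side by side.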

The function for going thru all the oblique solutions for two samples also returns the necessary information to match up the factors if they need reordering. If there is a mismatch, this operation is nonsensical, so I won’t re-do all the plots. The plot above with 4 factors just happens to already be correctly ordered. This however need not be the case. The only plot that needs to be redone is that with 6 factors. It is shown in Figure 15.

Figure 15

Compare with figure 14 above. One might wonder whether the 4 or 6 factor solutions are the best. In this case, the answer is the 4 factor solutions because the female 6 factor solution is illegal — one factor loading is above 1 (“a Heywood case”). At present, given the relatively few regional measures, and the limitation to only volume and surface measures, I would not put too much effort into theorizing about the multifactor structure found so far. It is merely a preliminary finding and may change drastically when more measures are added or measures are sampled differently.

A more important finding from all the multifactor solutions was that all produced correlated factors, which indicates a general factor.

Aggregate measures and the general brain size factor

So, the general brain size factor (GBSF) may exist, but is it useful? At first, we may want to correlate the various aggregate variables. Results are in Table 3.

Table 3 – Correlations between aggregate brain measures
Variable                 [total cortical area]   [left cortical area]   vol.WholeBrain   vol.IntracranialVolume   GBSF
[total cortical area]    -                       0.997                  0.869            0.746                    0.953
[left cortical area]     0.997                   -                      0.867            0.751                    0.953
vol.WholeBrain           0.832                   0.832                  -                0.822                    0.923
vol.IntracranialVolume   0.638                   0.642                  0.798            -                        0.776
GBSF                     0.950                   0.950                  0.905            0.711                    -

Notes: Correlations above diagonal are males, below females.

The total areas of the brain are almost symmetrical: the correlation of the total surface area and left side only is a striking .997. Intracranial volume is a decent proxy (.822) for whole brain volume, but is somewhat worse for total surface area (.746). GBSF has very strong correlations with the surface areas (.95), but not quite as strong as the analogous situation in cognitive data: IQ and extracted general factor (GCA factor) usually correlate .99 with a reasonable sample of subtests: Ree and Earles (1991) reported that an average GCA factor correlated .991 with an unweighted sum score in a sample of >9k, Kirkegaard (2014b) found a .99 correlation between extracted GCA and an unweighted sum in a Dutch university sample of ~500.

Correlations with cognitive measures

The authors have data for 4 cognitive tests, but data are only public for 2 of them. These are, in the authors’ words:

Flanker inhibitory control test (N = 1,074).
The NIH Toolbox Cognition Battery version of the flanker task was adapted from the Attention Network Test (ANT). Participants were presented with a stimulus on the center of a computer screen and were required to indicate the left-right orientation while inhibiting attention to the flankers (surrounding stimuli). On some trials the orientation of the flankers was congruent with the orientation of the central stimulus and on the other trials the flankers were incongruent. The test consisted of a block of 25 fish trials (designed to be more engaging and easier to see to make the task easier for children) and a block of 25 arrow trials, with 16 congruent and 9 incongruent trials in each block, presented in pseudorandom order. Participants who responded correctly on 5 or more of the 9 incongruent trials then proceeded to the arrows block. All children age 9 and above received both the fish and arrows blocks regardless of performance. The inhibitory control score was based on performance on both congruent and incongruent trials. A two-vector method was used that incorporated both accuracy and reaction time (RT) for participants who maintained a high level of accuracy (>80% correct), and accuracy only for those who did not meet this criteria. Each vector score ranged from 0 to 5, for a maximum total score of 10 (M = 7.67, s.d. = 1.86).
List sorting working memory test (N = 1,084).
This working memory measure requires participants to order stimuli by size. Participants were presented with a series of pictures on a computer screen and heard the name of the object from a speaker. The test was divided into the One-List and Two-List conditions. In the One-List condition, participants were told to remember a series of objects (food or animals) and repeat them in order, from smallest to largest. In the Two-List condition, participants were told to remember a series of objects (food and animals, intermixed) and then again report the food in order of size, followed by animals in order of size. Working memory scores consisted of combined total items correct on both One-List and Two-List conditions, with a maximum of 28 points (M = 17.71, s.d. = 5.39).

I could not locate a factor analytic study for the Flanker test, so I don’t know how g-loaded it is. Working memory (WM) is known to have a strong relationship to GCA (Unsworth et al, 2014). The WM variable should probably be expected to be the more g-loaded of the two. The implication, given the causal hypothesis of brain size for GCA, is that the WM test should show higher correlations with the brain measures. Figures X and X show the histograms for the cognitive measures.


Note that the x-values do not have any interpretation as they are residualized values, not raw scores. For the Flanker test, we see that it is bimodal. It seems that a significant part of the sample did not understand the test and thus did very poorly. One should probably either remove them or use a non-parametric measure if one wanted to rely on this variable. I decided to remove them since the sample was sufficiently large that this wasn’t a big problem. The procedure reduced the skew from -1.3/-1.1 to -.2/-.1 respectively for the male and female samples. The sample sizes were reduced from 548/516 to 522/487 respectively.

One could plausibly combine the two tests into one measure, which would perhaps be a better estimate of GCA than either of them alone. This would be the case if their g-loadings were about similar. If however one is much more g-loaded than the other, combining them would degrade the measurement towards a middle level. I combined the two measures by first normalizing them (to put them on the same scale) and then averaging them.
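
The combination step (normalize, then average) can be sketched as below. The scores are made up and the function names are hypothetical, and "normalizing" is assumed here to mean converting each test to z-scores.

```python
from statistics import mean, stdev

def zscores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def combine(test_a, test_b):
    """Average two tests after putting them on the same (z-score) scale."""
    return [(a + b) / 2 for a, b in zip(zscores(test_a), zscores(test_b))]

# Made-up scores for five persons.
wm = [12, 17, 20, 25, 15]
flanker = [6.5, 7.0, 8.5, 9.0, 7.5]

combined = combine(wm, flanker)
print([round(v, 2) for v in combined])
```

Without the rescaling step, the test with the larger raw variance (WM, with a maximum of 28 points) would dominate the average.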

Given the very high correlations between the GBSF of these data and the other aggregate measures, it is not expected that the GBSF will correlate much more strongly with cognitive measures than the other aggregate brain measures. Table X shows the correlations.

Table X – Correlations between cognitive measures and aggregate brain size measures
                         Males                        Females
Variable                 WM      Flanker   Combined   WM      Flanker   Combined
Flanker                  0.407                        0.393
WM.Flanker.mean          0.830   0.847                0.824   0.845
[total cortical area]    0.302   0.138     0.235      0.236   0.201     0.237
[left cortical area]     0.302   0.137     0.235      0.239   0.203     0.238
vol.WholeBrain           0.263   0.103     0.201      0.158   0.120     0.146
vol.IntracranialVolume   0.213   0.101     0.170      0.154   0.101     0.137
GBSF                     0.311   0.147     0.252      0.223   0.181     0.218


As for the GBSF, given that it is a ‘distillate’ (Jensen’s term), one would expect it to have slightly higher correlations with the cognitive measures than the merely unweighted ‘sum’ measures. This was the case for males, but not females. In general, the female correlations were weaker, especially whole brain volume x WM (.263 vs. .158). Despite the large sample sizes, this difference is not very certain: the 95% confidence interval for the difference is -.01 to .22. A larger sample is necessary to examine this question. The finding is intriguing in that, if real, it would pose an alternative solution to the Ankney-Rushton anomaly, that is, the fact that males have greater brain size and this is related to IQ scores, but they do not consistently perform better on IQ tests (Jackson and Rushton, 2006). Note however that the recent large meta-analysis of brain size x IQ studies did not find an effect of gender, so perhaps the above results are a coincidence (Pietschnig et al, 2014).
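
One common way to get an interval of this kind is the Fisher z-transformation for two independent correlations. The sketch below is not necessarily the method behind the interval quoted above; the function name is hypothetical, and the sample sizes (548 males, 516 females) are the pre-exclusion totals reported earlier, used here as an assumption for this particular pair of correlations.

```python
from math import atanh, sqrt

def diff_ci_fisher(r1, n1, r2, n2, z_crit=1.96):
    """Approximate 95% CI for the difference of two independent correlations.

    The interval is on the Fisher z scale, which is close to the r scale
    for correlations as small as these.
    """
    diff = atanh(r1) - atanh(r2)
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return diff - z_crit * se, diff + z_crit * se

# Whole brain volume x WM: males vs. females (sample sizes assumed).
lower, upper = diff_ci_fisher(0.263, 548, 0.158, 516)
print(round(lower, 2), round(upper, 2))
```

The interval includes zero under these assumptions, which agrees with the conclusion that the gender difference in this correlation is uncertain.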

We also see that the total cortical area variables were stronger correlates of cognitive measures than whole brain volume, but a larger sample is necessary to confirm this pattern.

Lastly, we see a moderately strong correlation between the two cognitive measures (≈.4). The combined measure was a weaker correlate of the criterion variables than WM alone, which is what is expected if the Flanker test was a relatively weaker test of GCA than the WM one.

Correlations with parental education and income

It is time to revisit the results reported by the original authors, namely correlations between educational/economic variables and brain measures. I think the correlations between specific brain regions and criterion variables are mostly a fishing expedition of chance results (multiple testing) and of no particular interest unless strong predictions can be made before looking at the data. For this reason, I present only correlations with the aggregate brain measures, as seen in Table X.

Table X – Correlations between educational/economic variables and other variables
                         Males                      Females
Variable                 ED      ln_Inc   Income    ED      ln_Inc   Income
WM                       0.131   0.192    0.175     0.170   0.229    0.174
Flanker                  0.163   0.180    0.188     0.118   0.131    0.106
WM.Flanker.mean          0.168   0.215    0.206     0.178   0.215    0.165
[total cortical area]    0.104   0.217    0.207     0.128   0.173    0.154
[left cortical area]     0.108   0.215    0.208     0.133   0.170    0.152
vol.WholeBrain           0.103   0.190    0.195     0.064   0.112    0.078
vol.IntracranialVolume   0.126   0.157    0.159     0.086   0.104    0.100
GBSF                     0.109   0.206    0.204     0.100   0.157    0.137
ED                       -       0.559    0.542     -       0.561    0.513
ln_Inc                   0.559   -        0.866     0.561   -        0.855
Income                   0.542   0.866    -         0.513   0.855    -


Here the correlations of the combined cognitive measure were higher than those of WM, unlike before, so perhaps the diagnosis from before was wrong. In general, the correlations of income and brain measures were stronger than those for education. This is despite the fact that GCA is more strongly correlated with educational attainment than with income. This was however not the case in this sample: correlations of WM and Flanker were stronger with the economic variables. Perhaps there is more range restriction in the educational variable than the income one. An alternative environmental interpretation is that it is the affluence that causes the larger brains.

If we recall the theoretical predictions of the strength of the correlations, the income ones are stronger than expected (actual .19/.09 M/F, predicted about .05), while the educational ones are a bit weaker than expected (actual .10/.06, predicted about .13). However, the sample sizes are not large enough for these results to be certain enough to question the theory.

Racial admixture

To my surprise, the sample had racial admixture data. This is surprising because such data has been available for testing the genetic hypothesis of group differences for many years, apparently without anyone publishing something on the issue. As I argued elsewhere, this is odd given that a good dataset would be able to decisively settle the ‘race and intelligence’ controversy (Dalliard, 2014; Rote and Rodgers, 2005; Rushton and Jensen, 2005). It is actually very good evidence for the genetic hypothesis because if it was false, and these datasets showed it, it would have been a great accomplishment for a mainstream scientist to publish a paper decisively showing that it was indeed false. However, if it was true, then any mainstream scientist could not publish it without risking personal assaults, getting fired and possibly being taken to court, as were academics who previously researched that topic (Gottfredson, 2005; Intelligence 1998; Nyborg 2011; 2003).

The genomic data however appeared to be an either/or (1 or 0) variable in the released data files. Oddly, some persons had no value for any racial group. It turns out that the data was merely rounded in the spreadsheet file. This explained why some persons had 0 for all groups: These persons did not belong at least 50% to any racial group, and thus they were assigned a 0 in every case.

I can think of two ways to count the number of persons in the main categories. One can count the total ‘summed’ persons. In this way, if person A has 50% ancestry from race R, and person B has 30%, this would sum to .8 persons. One can think of it as the number of pure-breed persons’ worth of ancestry from that group. Another way is to count everybody as 1 who is above some threshold for ancestry. I chose to use 20% and 80% for thresholds, which correspond to persons with substantial ancestry from that racial cluster, and persons with mostly ancestry from that cluster. One could choose other values of course, and there is a degree of arbitrariness, but it is not important what the particular values are.
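
Both counting methods can be sketched in a few lines. The ancestry fractions below are made up (the first two reproduce the 50% and 30% example in the text) and the function name is hypothetical.

```python
def ancestry_counts(fractions, substantial=0.20, mostly=0.80):
    """Summarize one ancestry cluster across persons.

    fractions: each person's ancestry proportion (0 to 1) from the cluster.
    """
    return {
        # Method 1: persons' worth of ancestry from this cluster.
        "summed_persons": sum(fractions),
        # Method 2: head counts above each threshold.
        "n_substantial": sum(1 for f in fractions if f > substantial),
        "n_mostly": sum(1 for f in fractions if f > mostly),
    }

# Persons A and B from the text, plus two made-up persons.
cluster_r = [0.50, 0.30, 0.95, 0.05]

counts = ancestry_counts(cluster_r)
print(counts)
```

Note that the threshold counts can add up to more than the sample size across clusters, since one person can exceed 20% in several clusters at once.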

Results are in Table X.

Racial group                European   African    Amerindian   East Asian   Oceanian   Central Asia   Sum
Summed ancestry ‘persons’   686.364    134.2714   48.31457     163.49868    8.59802    26.95408       1068.00075
Persons with >20%           851        187        89           238          8          30             1403
Persons with >80%           647        105        3            121          0          21             897


Note that the number 1068 is the exact number of persons in the complete sample, which means that the summed ancestry for all groups has an error of a mere .00075.

Another way of understanding the data is to plot histograms of each racial group. These are shown below in Figures X to X.

[Figures X to X: ancestry histograms for the European, African, Amerindian, East Asian, Oceanian and Central Asian clusters]


Since European ancestry is the largest, the other plots are mostly empty except for the 0% bar. But we do see a fair amount of admixture in the dataset.

Regression, residualization, correlation and power

There are a couple of different methods one could use to examine the admixture data. A simple correlation is justified when dealing with a group that only has 2 sources of ancestry. This is the easiest case to handle. For this to work, the groups must have a different genotypic mean of the trait in question (GCA and brain size variables in this case) and there must be a substantially admixed population. Even given a large hypothesized genotypic difference, the expected correlation is actually quite small. For African Americans (such as those in the sample), their European ancestry% is about 15-25% depending on the exact sample. The standard deviation of their European ancestry% is not always reported, but one can calculate it if one has some data, which we do.

The first problem with this dataset is that there are no sociological race categories (“white”, “African American”, “Hispanic”, “Asian” etc.), but only genomic data. This means that to get an African American subsample, we must create one based on actual ancestry. There are two criteria that need to be met for inclusion in that group: 1) the person must be substantially African, 2) the person must be mostly a mix of European and African ancestry. Going with the values from before, this means that the person must be at least 20% African, and at least 80% combined European and African.
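
The two inclusion criteria translate directly into a filter. The sketch below uses made-up persons with hypothetical IDs and field names; only the 20% and 80% thresholds come from the text.

```python
def in_african_european_subsample(european, african,
                                  min_african=0.20, min_combined=0.80):
    """True if a person meets both inclusion criteria.

    1) at least `min_african` African ancestry
    2) at least `min_combined` European plus African ancestry combined
    """
    return african >= min_african and (european + african) >= min_combined

# Made-up ancestry proportions (hypothetical IDs and values).
persons = [
    {"id": "A", "european": 0.22, "african": 0.75},  # included
    {"id": "B", "european": 0.90, "african": 0.05},  # too little African
    {"id": "C", "european": 0.30, "african": 0.40},  # too much other ancestry
    {"id": "D", "european": 0.00, "african": 0.98},  # included
]

subsample = [p["id"] for p in persons
             if in_african_european_subsample(p["european"], p["african"])]
print(subsample)  # ['A', 'D']
```

Within such a subsample, European ancestry is close to one minus African ancestry, which is what makes a simple correlation with a single ancestry variable defensible.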

Dealing with scanner and site

There are a variety of ways to use the data and they may or may not give similar results. First is the question of which variables to control for. In the earlier sections of this paper, I controlled for Age, Age², Age³ and Scanner (12 different). For producing the above ancestry plots and results I did not control for anything. Controlling the ancestry variables for scanner is problematic as people from different races live in different places. Controlling for this thus removes the racial differences for no reason. One could similarly control for the site where the scanner is (I did not do this earlier). We can compare this to scanner by a contingency table, as shown in Table X below:

Table X – Contingency table of scanner site and scanner #
Site/scanner     0     1    10    11    12     2     3     4     5     6     7     8     9
Cornel           0     0     0    96     0     0     0     0     0     0     0     0     0
Davis            0     0     0     0     0     0     0     0     0     0   114     0     0
Hawaii           0     0     0     0     0     0     0     0     0   202     0     0     0
KKI              0     0     0     0     0     0   103     0     0     0     0     0     0
MGH              0     0     0     0     0     0     0     0   115     0     0     0    13
UCLA             0     0    27     0    22     0     0    10     0     0     0     0     0
UCSD           109    93     0     0     0     0     0     0     0     0     0     0     0
UMMS             0     0     0     0     0    56     0     0     0     0     0     0     0
Yale             0     0     0     0     0     0     0     0     0     0     0   108     0


As we can see, these are clearly inter-dependent, given the obvious fact that the scanners have a particular location and were not moved around (all columns have only 1 cell with value>0). Some sites however have multiple scanners, some have only one. E.g. UCSD has two scanners (#0 and #1), while KKI has only one (#3).

Controlling for scanner however makes sense if we are looking at brain size variables, as this removes differences between measurements due to differences in the scanning equipment or (post-)processing. So perhaps one would want to control brain measurements for scanner and age effects, but only control the remaining variables for age effects.

Dealing with gender

As before

To be continued…



  • Ashton, M. C., Lee, K., & Visser, B. A. (2014a). Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012). Personality and Individual Differences, 60, 3-7.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014b). Orthogonal factors of mental ability? A response to Hampshire et al. Personality and Individual Differences, 60, 13-15.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014c). Further response to Hampshire et al. Personality and Individual Differences, 60, 18-19.
  • Beaujean, A. A. (2014). Latent Variable Modeling Using R: A Step-by-Step Guide. Routledge.
  • Bouchard Jr, T. J. (2014). Genes, Evolution and Intelligence. Behavior genetics, 44(6), 549-577.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Colom, R., & Thompson, P. M. (2011). Intelligence by Imaging the Brain. The Wiley-Blackwell handbook of individual differences, 3, 330.
  • Cudeck, R., & MacCallum, R. C. (Eds.). (2012). Factor analysis at 100: Historical developments and future directions. Routledge.
  • Dalliard, M. (2013). Is Psychometric g a Myth?. Human Varieties.
  • Dalliard, M. (2014). The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ. Open Differential Psychology.
  • Deary, I. J., & Caryl, P. G. (1997). Neuroscience and human intelligence differences. Trends in Neurosciences, 20(8), 365-371.
  • Deary, I. J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201-211.
  • Dekaban, A.S. and Sadowsky, D. (1978). Changes in brain weights during the span of human life: relation of brain weights to body heights and body weights, Ann. Neurology, 4:345-356.
  • Euler, M. J., Weisend, M. P., Jung, R. E., Thoma, R. J., & Yeo, R. A. (2015). Reliable Activation to Novel Stimuli Predicts Higher Fluid Intelligence. NeuroImage.
  • Frangou, S., Chitins, X., & Williams, S. C. (2004). Mapping IQ and gray matter density in healthy young people. Neuroimage, 23(3), 800-805.
  • Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., … & Rapoport, J. L. (1999). Brain development during childhood and adolescence: a longitudinal MRI study. Nature neuroscience, 2(10), 861-863.
  • Gottfredson, L. S. (2005). Suppressing intelligence research: Hurting those we intend to help. In R. H. Wright & N. A. Cummings (Eds.), Destructive trends in mental health: The well-intentioned path to harm (pp. 155-186). New York: Taylor and Francis.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014a). A comment on “Fractionating Intelligence” and the peer review process. Intelligence, 46, 323-332.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014b). Yes, but flaws remain. Intelligence, 46, 341-344.
  • Hampshire, A., Highfield, R. R., Parkin, B. L., & Owen, A. M. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237.
  • Hampshire, A., Parkin, B., Highfield, R., & Owen, A. M. (2014). Response to:“Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012)”. Personality and Individual Differences, 60, 8-12.
  • Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world?. Behavioral and brain sciences, 33(2-3), 61-83.
  • Intelligence. (1998). Special issue dedicated to Arthur Jensen. Volume 26, Issue 3.
  • Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general mental ability from 100,000 17-to 18-year-olds on the Scholastic Assessment Test. Intelligence, 34(5), 479-486.
  • Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.
  • Jensen, A. R. (1997). The psychometrics of intelligence. In H. Nyborg (Ed.), The scientific study of human nature: Tribute to Hans J. Eysenck at eighty. New York: Elsevier. Pp. 221—239.
  • Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Praeger.
  • Jensen, A. R. (2002). Psychometric g: Definition and substantiation. The general factor of intelligence: How general is it, 39-53.
  • Jung, R. E. & Haier, R. J. (2007). The Parieto-Frontal Integration Theory (P-FIT) of intelligence: converging neuroimaging evidence. Behav. Brain Sci. 30, 135–154; discussion 154–187.
  • Jung, R. E. et al. (2009). Imaging intelligence with proton magnetic resonance spectroscopy. Intelligence 37, 192–198.
  • Jöreskog, K. G. (1996). Applied factor analysis in the natural sciences. Cambridge University Press.
  • Kim, S. E., Lee, J. H., Chung, H. K., Lim, S. M., & Lee, H. W. (2014). Alterations in white matter microstructures and cognitive dysfunctions in benign childhood epilepsy with centrotemporal spikes. European Journal of Neurology, 21(5), 708-717.
  • Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology.
  • Li, Y. et al. (2009). Brain anatomical network and intelligence. PLoS Comput. Biol. 5, e1000395.
  • Navas‐Sánchez, F. J., Alemán‐Gómez, Y., Sánchez‐Gonzalez, J., Guzmán‐De‐Villoria, J. A., Franco, C., Robles, O., … & Desco, M. (2014). White matter microstructure correlates of mathematical giftedness and intelligence quotient. Human brain mapping, 35(6), 2619-2631.
  • Neubauer, A. C. & Fink, A. (2009). Intelligence and neural efficiency. Neurosci. Biobehav. Rev. 33, 1004–1023.
  • Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., … & Sowell, E. R. (2015). Family income, parental education and brain structure in children and adolescents. Nature Neuroscience.
  • Nyborg, H. (2003). The sociology of psychometric and bio-behavioral sciences: A case study of destructive social reductionism and collective fraud in 20th century academia. Nyborg H.(Ed.). The scientific study of general intelligence. Tribute to Arthur R. Jensen, 441-501.
  • Nyborg, H. (2011). The greatest collective scientific fraud of the 20th century: The demolition of differential psychology and eugenics. Mankind Quarterly, Spring Issue.
  • Pennington, B. F., Filipek, P. A., Lefly, D., Chhabildas, N., Kennedy, D. N., Simon, J. H., … & DeFries, J. C. (2000). A twin MRI study of size variations in the human brain. Journal of Cognitive Neuroscience, 12(1), 223-232.
  • Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2014). Meta-Analysis of Associations Between Human Brain Volume And Intelligence Differences: How Strong Are They and What Do They Mean?. Available at SSRN 2512128.
  • Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3), 271-278.
  • Rowe, D. C., & Rodgers, J. E. (2005). Under the skin: On the impartial treatment of genetic and environmental hypotheses of racial differences. American Psychologist, 60(1), 60.
  • Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, public policy, and law, 11(2), 235.
  • Shaw, P. et al. (2006). Intellectual ability and cortical development in children and adolescents. Nature 440, 676–679 (2006).
  • Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., … & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. Bmj, 338, b2393.
  • Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401-426.
  • Turken, A. et al. (2008). Cognitive processing speed and the structure of white matter pathways: convergent evidence from normal variation and lesion studies. Neuroimage 42, 1032–1044
  • Unsworth, N., Fukuda, K., Awh, E., & Vogel, E. K. (2014). Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval. Cognitive psychology, 71, 1-26.



I analyzed the S factor in US states by compiling a dataset of 25 diverse socioeconomic indicators. Results show that Washington DC is a strong outlier, but if it is excluded, the S factor correlates strongly with state IQ at .75.

Ethnoracial demographics of the states are related to the state’s IQ and S in the expected order (White>Hispanic>Black).


Introduction and data sources

In my previous two posts, I analyzed the S factor in 33 Indian states (Kirkegaard, 2015a) and 31 Chinese regions (Kirkegaard, 2015b). In both samples I found strongish S factors and they both correlated positively with cognitive estimates (IQ or G). In this post I used cognitive data from McDaniel (2006). He gives two sets of estimated IQs based on SAT-ACT and on NAEP. Unfortunately, they only correlate .58, so at least one of them is not a very accurate estimate of general intelligence.

His article also reports some correlations between these IQs and socioeconomic variables: Gross State Product per capita, median income and percent poverty. However, data for these variables is not given in the article, so I did not use them. Not quite sure where his data came from.

However, with cognitive data like this and the relatively large number of datapoints (50 or 51 depending on use of District of Columbia), it is possible to do a rather good study of the S factor and its correlates. High quality data for US states are readily available, so results should be strong. Factor analysis requires a case to variable ratio of at least 2:1 to deliver reliable results (Zhao, 2009). So, this means that one can do an S factor analysis with about 25 variables.

Thus, I set out to find about 25 diverse socioeconomic variables. There are two reasons to gather a very diverse sample of variables. First, for the method of correlated vectors to work (Jensen, 1998), there must be variation in the indicators’ loadings on the factor. Lack of variation causes restriction of range problems. Second, lack of diversity in the indicators of a latent variable leads to psychometric sampling error (Jensen & Weng, 1994; review post here for general intelligence measures).

My primary source was The 2012 Statistical Abstract website. I simply searched for “state” and picked various measures. I tried to pick things that weren’t too dependent on geography. E.g. kilometers of coastline per capita would be a very bad choice, since it is not socioeconomic and is almost entirely (near 100%) dependent on geographical factors. To increase reliability, I generally used all data for the last 10 years and averaged them. Curious readers should see the datafile for details.

I ended up with the following variables:

  1. Murder rate per 100k, 10 years
  2. Proportion with high school or more education, 4 years
  3. Proportion with bachelor or more education, 4 years
  4. Proportion with advanced degree or more, 4 years
  5. Voter turnout, presidential elections, 3 years
  6. Voter turnout, house of representatives, 6 years
  7. Percent below poverty, 10 years
  8. Personal income per capita, 1 year
  9. Percent unemployed, 11 years
  10. Internet usage, 1 year
  11. Percent smokers, male, 1 year
  12. Percent smokers, female, 1 year
  13. Physicians per capita, 1 year
  14. Nurses per capita, 1 year
  15. Percent with health care insurance, 1 year
  16. Percent in ‘Medicaid Managed Care Enrollment’, 1 year
  17. Proportion of population urban, 1 year
  18. Abortion rate, 5 years
  19. Marriage rate, 6 years
  20. Divorce rate, 6 years
  21. Incarceration rate, 2 years
  22. Gini coefficient, 10 years
  23. Top 1%, proportion of total income, 10 years
  24. Obesity rate, 1 year

Most of these are self-explanatory. For the economic inequality measures, I found 6 different measures (here). Since I wanted diversity, I chose the Gini coefficient and the top 1% share because these two correlated the least with each other and are well-known.
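Picking the least-correlated pair from a set of candidate measures can be done directly from the correlation matrix. A minimal sketch with six simulated measures (the names `measure1` to `measure6` are placeholders, and the data are not the real inequality measures):

```r
set.seed(2)
n = 50
base = rnorm(n)   # shared "inequality" signal

# Six simulated measures with increasing noise; names are placeholders
ineq = sapply(c(0.2, 0.4, 0.6, 0.8, 1.0, 1.5),
              function(s) base + rnorm(n, sd = s))
colnames(ineq) = paste0("measure", 1:6)

r = cor(ineq)
r[upper.tri(r, diag = TRUE)] = NA   # keep each pair only once

# Locate the smallest correlation in the lower triangle
idx = which(r == min(r, na.rm = TRUE), arr.ind = TRUE)
least_correlated = c(rownames(r)[idx[1, "row"]], colnames(r)[idx[1, "col"]])
print(least_correlated)
```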

Aside from the above, I also fetched the racial proportions for each state, to see how they relate to the S factor (and the various measures above, but to get these, run the analysis yourself).

I used R with RStudio for all analyses. Source code and data are available in the supplementary material.

Missing data

In large analyses like this there are nearly always some missing data. The matrixplot() looks like this:


(It does not seem possible to change the font size, so I have cut off the names at the 8th character.)

We see that there aren’t many missing values. I imputed all the missing values with the VIM package (deterministic imputation using multiple regression).
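To make the idea of deterministic regression imputation concrete, here is a minimal base-R sketch. It is a stand-in for illustration only and does not reproduce the VIM call used for the actual analysis; the data are simulated and the variable names are assumptions.

```r
set.seed(3)
n = 51

# Simulated indicators; names are illustrative only
d = data.frame(income = rnorm(n))
d$education = 0.7 * d$income + rnorm(n, sd = 0.5)
d$poverty   = -0.6 * d$income + rnorm(n, sd = 0.5)
d$poverty[c(4, 17, 30)] = NA   # introduce a few missing values

# Fit the regression on the complete cases ...
miss = is.na(d$poverty)
fit  = lm(poverty ~ income + education, data = d[!miss, ])

# ... and fill each gap with its predicted value (deterministic: no noise added)
d$poverty[miss] = predict(fit, newdata = d[miss, ])

print(sum(is.na(d$poverty)))   # no missing values remain
```

Because no random noise is added, the imputed cases fall exactly on the regression line, which slightly understates the variance of the imputed variable. With only a handful of missing cells this should matter little.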

Extreme values

A useful feature of matrixplot() is that it shows the relative outliers for each variable in grey tones. We can see that some of the variables have some hefty outliers, which may be data errors. Therefore, I examined them.

The outlier in the two university degree variables is DC, surely because the government is based there and there is a large concentration of lobbyists. For the marriage rate, the outlier is Nevada: many people go there to get married. The outlier for the physician and nurse rates is also DC, for the same reason (maybe one could make up some story about how politics causes health problems!).

After imputation, the matrixplot() looks like this:


It is pretty much the same as before, which means that we did not substantially change the data — good!

Factor analyzing the data

Then we factor analyze the data (socioeconomic data only). We plot the loadings (sorted) with a dotplot:


We see a wide spread of variable loadings. All but two of them load in the expected direction (socially valued outcomes load positively, the opposite load negatively), showing the existence of the S factor. The ‘exceptions’ are as follows. Abortion rate loads +.60 but is often seen as a negative thing. This is, however, open to discussion: maybe higher abortion rates can be interpreted as less backward religiousness or more freedom for women (both good in my view). The other is marriage rate at -.19 (a weak loading), which I’m not sure how to interpret. In any case, the desirable direction is debatable for both of these.
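For readers who want to see the mechanics, a one-factor analysis with sorted loadings can be sketched in base R as follows. Note the assumptions: the data are simulated, the indicator names are illustrative, and `factanal()` (maximum likelihood) is used here as a stand-in, whereas the results reported in this post are based on minres.

```r
set.seed(4)
n = 50
s = rnorm(n)   # simulated latent S

# Illustrative indicators and loadings (not the estimated ones)
true_loadings = c(education = 0.8, income = 0.7, internet = 0.6,
                  poverty = -0.5, murder = -0.7, turnout = 0.3)

# Each indicator = loading * S + unique noise, scaled to unit variance
x = sapply(true_loadings, function(l) l * s + rnorm(n, sd = sqrt(1 - l^2)))

# One-factor maximum-likelihood factor analysis
fa1 = factanal(x, factors = 1, scores = "regression")

# The sign of a factor is arbitrary; orient it so that education loads positively
sgn    = sign(fa1$loadings["education", 1])
load   = sgn * fa1$loadings[, 1]
scores = sgn * fa1$scores[, 1]

print(round(sort(load), 2))          # sorted loadings; dotchart(sort(load)) plots them
prop_var = mean(load^2)              # proportion of variance explained by the factor
print(round(prop_var, 2))
```

The sign flip at the end is the same operation as the reversal of loadings mentioned further below for the analysis without DC.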

Correlations with cognitive measures

And now comes the big question: does state S correlate with our IQ estimates? It does; the correlations are .14 (SAT-ACT) and .43 (NAEP). These are fairly low given our expectations. Perhaps we can work out what is happening if we plot them:


Now we can see what is going on. First, the SAT-ACT estimates are pretty strange for three states: California, Arizona and Nevada. I note that these are three adjacent states, so it is quite possibly some kind of regional testing practice that’s throwing off the estimates. If someone knows, let me know. Second, DC is a huge outlier in S, as we might have expected from our short discussion of extreme values above. It’s basically a city-state, roughly half composed of low-S (SES) African Americans and half of an upper class connected to the government.

Dealing with outliers – Spearman’s correlation aka. rank-order correlation

There are various ways to deal with outliers. One simple way is to convert the data into ranks and correlate those as usual. Pearson correlations are sensitive to outliers and to departures from normality, both of which are common in higher-level data (states, countries). Using rank-order correlations gets us these:
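The equivalence between Spearman’s correlation and a Pearson correlation of ranks is easy to verify. A small sketch with simulated data, where one case is made extreme on S (loosely analogous to DC; the numbers are made up):

```r
set.seed(5)
s  = rnorm(50)
iq = 100 + 3 * s + rnorm(50, sd = 3)
s[1] = 8   # one extreme case on S

pearson  = cor(s, iq)
spearman = cor(s, iq, method = "spearman")
by_hand  = cor(rank(s), rank(iq))   # Pearson correlation of the ranks

print(round(c(pearson = pearson, spearman = spearman, by_hand = by_hand), 3))
```

Ranking caps the influence of the extreme case: however far out it lies, it can be no more than one rank away from its nearest neighbour.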

[Plots: S_IQ1_rank, S_IQ2_rank]

So the correlations improved a lot for the SAT-ACT IQs and a bit for the NAEP ones.

Results without DC

Another idea is simply excluding the strange DC case, and then re-running the factor analysis. This procedure gives us these loadings:


(I have reversed them, because they were reversed e.g. education loading negatively.)

These are very similar to before: excluding DC did not substantially change the results (good). Actually, the factor is a bit stronger without DC throwing off the results (using minres, proportion of variance = 36% vs. 30%). The reason is that DC is an odd case, scoring very high on some indicators (e.g. education) and very poorly on others (e.g. murder rate).

The correlations are:


So, not surprisingly, we see an increase in the effect sizes from before: .14 to .31 and .43 to .69.

Without DC and rank-order

Still, one may wonder what the results would be with rank-order and DC removed. Like this:


So compared to before, effect size increased for the SAT-ACT IQ and decreased slightly for the NAEP IQ.

Now, one could also do regression with weights based on some metric of the state population and this may further change results, but I think it’s safe to say that the cognitive measures correlate in the expected direction and with the removal of one strange case, the better measure performs at about the expected level with or without using rank-order correlations.

Method of correlated vectors

The MCV (Jensen, 1998) can be used to test whether a specific latent variable underlying some data is responsible for the observed correlation between the factor score (or a factor score approximation such as IQ, an unweighted sum) and some criterion variable. Altho originally invented for use on cognitive test data and the general intelligence factor, I have previously used it in other areas (e.g. Kirkegaard, 2014). I also used it in the previous study of India (Kirkegaard, 2015a), but not that of China because there was a lack of variation in the loadings of socioeconomic variables on the S factor.

Using the dataset without DC, the MCV result for the NAEP dataset is:


So, again we see that MCV can reach high r’s when there is a large number of diverse variables. But note that the value can be considered inflated because of the negative loadings of some variables. It is debatable whether one should reverse them.
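The computation behind MCV is short: one vector holds each indicator’s loading on the factor, the other holds each indicator’s correlation with the criterion, and the result is the correlation between the two vectors. The sketch below uses simulated data and, as a simplifying assumption, approximates the loadings by correlations with the first principal component rather than by a proper factor analysis.

```r
set.seed(6)
n = 50
s = rnorm(n)                                       # simulated latent factor
loadings_true = seq(-0.8, 0.8, length.out = 12)    # includes negative loadings
x  = sapply(loadings_true, function(l) l * s + rnorm(n, sd = sqrt(1 - l^2)))
iq = 0.7 * s + rnorm(n, sd = sqrt(1 - 0.7^2))      # criterion driven by the factor

# Factor score approximation: first principal component, oriented so that
# the last indicator (true loading +.8) correlates positively with it
pc1 = prcomp(x, scale. = TRUE)$x[, 1]
if (0 > cor(pc1, x[, 12])) pc1 = -pc1

# Vector 1: indicator loadings; vector 2: indicator x criterion correlations
vec_loadings  = as.numeric(cor(x, pc1))
vec_criterion = as.numeric(cor(x, iq))

# MCV result: the correlation between the two vectors
mcv_r = cor(vec_loadings, vec_criterion)

# Variant with the negatively loading indicators reversed
flip = sign(vec_loadings)
mcv_r_reversed = cor(vec_loadings * flip, vec_criterion * flip)

print(round(c(mcv = mcv_r, mcv_reversed = mcv_r_reversed), 2))
```

Comparing the two numbers shows how much of the MCV correlation comes from the mix of positive and negative loadings, which is the inflation issue raised above.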

Racial proportions of states and S and IQ

A last question is whether the states’ racial proportions predict their S score and their IQ estimate. There are lots of problems with this. First, the actual genomic proportions within these racial groups vary by state (Bryc, 2015). Second, within ‘pure-breed’ groups, general intelligence varies by state too (this was shown in the testing of draftees in the US in WW1). Third, there is an ‘other’ group that also varies from state to state, presumably consisting of different kinds of Asians (Japanese, Chinese, Indians, other SE Asians). Fourth, it is unclear how one should combine these proportions into an estimate used for correlation analysis, or how to model them. Standard multiple regression (MR) is unsuited for handling this kind of data with a perfect linear dependency, i.e. the proportions must add up to 1 (100%). MR requires that no ‘independent’ variable is an exact linear combination of the others. Surely some method exists that can handle this problem, but I’m not familiar with it. Given the four problems above, one should not expect near-perfect results, but one would probably expect most results to go in the right direction with clearly non-zero effect sizes.
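The linear dependency problem is easy to demonstrate. In the sketch below, the proportions are simulated so that every row sums to 1 (these are not real state data, and the outcome is generated from made-up weights). When all four proportions are entered together with an intercept, `lm()` cannot estimate one of the coefficients.

```r
set.seed(7)
n = 50

# Simulated proportions that sum to 1 in every row; not real data
raw  = matrix(runif(n * 4), ncol = 4)
prop = as.data.frame(raw / rowSums(raw))
names(prop) = c("white", "black", "hispanic", "other")
prop$iq = 100 * prop$white + 85 * prop$black + 90 * prop$hispanic +
  100 * prop$other + rnorm(n)

# With an intercept, the four proportions are perfectly collinear,
# so one coefficient cannot be estimated and is reported as NA
fit_all = lm(iq ~ white + black + hispanic + other, data = prop)
print(coef(fit_all))

# Dropping one group makes it the reference category
fit_ref = lm(iq ~ black + hispanic + other, data = prop)
print(round(coef(fit_ref), 1))
```

Dropping one group is the usual workaround, but the remaining coefficients then describe differences from the omitted group rather than each group’s own level.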

Perhaps the simplest way of analyzing it is correlation. Correlations are susceptible to random confounds when e.g. white% correlates differentially with the other racial proportions. However, they should get the basic directions correct, if not the effect size order too.

Racial proportions, NAEP IQ and S

For this analysis I use only the NAEP IQs and without DC, as I believe this is the best subdataset to rely on. I correlate this with the S factor and each racial proportion. The results are:

Racial group  NAEP IQ      S
White            0.69   0.18
Black           -0.5   -0.42
Hispanic        -0.38  -0.08
Other           -0.26   0.2


For NAEP IQ, depending on what one thinks of the ‘other’ category, these have either exactly or roughly the order one expects: W>O>H>B. If one thinks “other” is mostly East Asian (Japanese, Chinese, Korean) with higher cognitive ability than Europeans, one would expect O>W>H>B. For S, however, the order is now O>W>H>B and the effect sizes much weaker. In general, given the limitations above, these are perhaps reasonable if somewhat on the weak side for S.

Estimating state IQ from racial proportions using racial IQs

One way to utilize all four variables (white, black, hispanic and other) without having MR assign them weights is to assign them weights based on known group IQs and then calculate a mean estimated IQ for each state.

Depending on which estimates for group IQs one accepts, one might use something like the following:

State IQ est. = White*100+Other*100+Black*85+Hispanic*90

Or if one thinks other is somewhat higher than whites (this is not entirely unreasonable, but recall that the NAEP includes reading tests which foreigners and Asians perform less well on), one might want to use 105 for the other group (#2). Or one might want to raise black and hispanic IQs a bit, perhaps to 88 and 93 (#3). Or do both (#4). I did all of these variations, and the results are:
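The four weighting schemes amount to a matrix product of state proportions and group weights. A minimal sketch, using the weights given above but three hypothetical states (A, B and C) with made-up proportions:

```r
# Illustrative proportions for three hypothetical states (each row sums to 1)
prop = rbind(
  A = c(white = 0.80, black = 0.10, hispanic = 0.05, other = 0.05),
  B = c(white = 0.60, black = 0.30, hispanic = 0.05, other = 0.05),
  C = c(white = 0.50, black = 0.05, hispanic = 0.35, other = 0.10)
)

# The four sets of group IQ weights described in the text
weights = cbind(
  Race.IQ  = c(white = 100, black = 85, hispanic = 90, other = 100),
  Race.IQ2 = c(white = 100, black = 85, hispanic = 90, other = 105),
  Race.IQ3 = c(white = 100, black = 88, hispanic = 93, other = 100),
  Race.IQ4 = c(white = 100, black = 88, hispanic = 93, other = 105)
)

# Weighted mean IQ per state (rows) and weighting scheme (columns)
est = prop %*% weights
print(round(est, 2))
```

With real data, `prop` would have one row per state, and each column of `est` could then be correlated with NAEP IQ and S.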

Variable  Race.IQ  Race.IQ2  Race.IQ3  Race.IQ4
Race.IQ         1      0.96         1      0.93
Race.IQ2     0.96         1      0.96      0.99
Race.IQ3        1      0.96         1      0.94
Race.IQ4     0.93      0.99      0.94         1
NAEP IQ      0.67      0.56      0.67      0.51
S            0.41      0.44      0.42      0.45


As far as I can tell, there is no strong reason to pick any one of these over the others. However, what we learn is that the correlation between the racial IQ estimate and the NAEP IQ estimate is somewhere between .51 and .67, and the correlation between the racial IQ estimate and S is somewhere between .41 and .45. I think these are reasonable results given the problems of this analysis described above.

Added March 11: New NAEP data

I came across a series of posts by science blogger The Audacious Epigone, who has also estimated IQs based on NAEP data. He has done this three times (for 2013, 2009 and 2005 data), so along with McDaniel’s estimates, this gives us 4 non-identical estimates. First, we check their intercorrelations, which should be very high, r>.9, for this kind of data. Second, we extract the general factor and use it as the best estimate of NAEP IQ for the states (I deleted DC again). Third, we see how all 5 variables relate to S from before.


            NAEP.IQ.13  NAEP.IQ.09  NAEP.IQ.05  NAEP M.  NAEP.1
NAEP.IQ.09        0.96
NAEP.IQ.05        0.83        0.89
NAEP M.           0.88        0.93        0.96
NAEP.1            0.95        0.99        0.95     0.97
S                 0.81        0.76        0.64     0.69    0.75


Where NAEP.1 is the general NAEP factor. We see that intercorrelations between NAEP estimates are not that high: they average only .86. Their loadings on the common factor are very high tho, .95 to .99. Still, using the common factor should improve results because it reduces measurement error. And it does: NAEP IQ x S is now .75, up from .69.

Scatter plot


Supplementary material

Data files and R source code available on the Open Science Framework repository.


Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2015). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics, 96(1), 37-53.

Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.

Kirkegaard, E. O. W. (2015a). Indian states: G and S factors. The Winnower.

Kirkegaard, E. O. W. (2015b). The S factor in China. The Winnower.

McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601-606.

Zhao, N. (2009). The Minimum Sample Size in Factor Analysis.

Some time ago a new paper came out from the 23andme people reporting admixture among US ethnoracial groups (Bryc et al, 2014). Per our still on-going admixture project (current draft here), one could see if admixture predicts academic achievement (or IQ, if such were available). We (that is, John did) put together achievement data (reading and math scores) from the NAEP and the admixture data here.

Descriptive stats

Admixture studies do not work well if there is no or little variation within groups. So let’s first examine them. For blacks:

                      vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
BlackAfricanAncestry     1 31 0.74 0.04   0.74    0.74 0.03 0.64 0.83  0.19 -0.03    -0.38 0.01
BlackEuropeanAncestry    1 31 0.23 0.04   0.24    0.23 0.03 0.15 0.34  0.19  0.09    -0.30 0.01


So we see that there is little American admixture in Blacks, because the African and European proportions add up to close to 100% (23+74=97). In fact, the correlation between African and European ancestry in Blacks is -.99. This also means that multiple regression is useless because of collinearity.
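How bad a correlation of -.99 is for regression can be quantified with the variance inflation factor (VIF). With only two predictors, the R² from regressing one on the other equals the squared correlation, so the VIF follows directly from the figure reported above. A small worked sketch:

```r
r = -0.99   # correlation between African and European ancestry (from the text)

# Variance inflation factor for two predictors: 1 / (1 - r^2)
vif = 1 / (1 - r^2)
print(round(vif, 1))

# Standard errors of the regression coefficients are inflated by sqrt(VIF)
se_inflation = sqrt(vif)
print(round(se_inflation, 1))
```

A VIF of about 50 means the standard errors are roughly seven times larger than they would be with uncorrelated predictors, which with n = 31 leaves essentially no precision.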

White admixture data is also not very useful. It is almost exclusively European:

                      vars  n mean sd median trimmed mad  min max range  skew kurtosis se
WhiteEuropeanAncestry    1 51 0.99  0   0.99    0.99   0 0.98   1  0.02 -0.95     0.74  0

What about Hispanics (some sources call them Latinos)?

                       vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
LatinoEuropeanAncestry    1 34 0.73 0.07   0.72    0.73 0.05 0.57 0.90  0.33 0.34     0.22 0.01
LatinoAfricanAncestry     1 34 0.09 0.05   0.08    0.08 0.06 0.01 0.22  0.21 0.51    -0.69 0.01
LatinoAmericanAncestry    1 34 0.10 0.05   0.09    0.10 0.03 0.04 0.21  0.17 0.80    -0.47 0.01

Hispanics are fairly admixed. Overall, they are mostly European, but the range of African and American ancestry is quite high. Furthermore, due to the three-way variation, multiple regression should work. The ancestry intercorrelations are: -.42 (Afro x Amer), -.21 (Afro x Euro) and -.50 (Amer x Euro). There must also be another source, because 73+9+10 is only 92%. Where’s the last 8% admixture from?

Admixture x academic achievement correlations: Blacks

row.names BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
1 Math2013B -0.32 0.09 0.29
2 Math2011B -0.27 0.21 0.25
3 Math2009B -0.30 0.09 0.28
4 Math2007B -0.12 0.27 0.08
5 Math2005B -0.28 0.26 0.23
6 Math2003B -0.30 0.15 0.26
7 Math2000B -0.36 -0.08 0.34
8 Read2013B -0.25 0.14 0.22
9 Read2011B -0.33 0.22 0.30
10 Read2009B -0.40 -0.03 0.41
11 Read2007B -0.26 0.14 0.24
12 Read2005B -0.43 0.33 0.39
13 Read2003B -0.42 0.09 0.38
14 Read2002B -0.30 -0.10 0.27


Summarizing these results:

     vars  n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
Afro    1 14 -0.31 0.08  -0.30   -0.32 0.05 -0.43 -0.12  0.31  0.48     0.10 0.02
Amer    1 14  0.13 0.13   0.14    0.13 0.11 -0.10  0.33  0.43 -0.32    -1.07 0.03
Euro    1 14  0.28 0.08   0.28    0.29 0.06  0.08  0.41  0.33 -0.49     0.11 0.02

So we see the expected directions and order: for Blacks (who are mostly African), American admixture is positive and European is more positive. There is quite a bit of variation over the years. It is possible that this mostly reflects ‘noise’, e.g. changes in educational policies in the states, or just sampling error. It is also possible that the changes are due to admixture changes within states over time.
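The summary rows can be reproduced from the correlation table for Blacks above. The sketch below re-enters the 14 correlations by hand and computes a few of the same statistics in base R (the original summary comes from psych’s describe(), which reports more columns).

```r
# Correlations copied from the table above
# (rows: Math 2013 to 2000, then Read 2013 to 2002)
afro = c(-0.32, -0.27, -0.30, -0.12, -0.28, -0.30, -0.36,
         -0.25, -0.33, -0.40, -0.26, -0.43, -0.42, -0.30)
amer = c( 0.09,  0.21,  0.09,  0.27,  0.26,  0.15, -0.08,
          0.14,  0.22, -0.03,  0.14,  0.33,  0.09, -0.10)
euro = c( 0.29,  0.25,  0.28,  0.08,  0.23,  0.26,  0.34,
          0.22,  0.30,  0.41,  0.24,  0.39,  0.38,  0.27)
r_black = cbind(Afro = afro, Amer = amer, Euro = euro)

# Mean, SD and range of the correlations across the 14 tests
summary_black = rbind(
  mean = colMeans(r_black),
  sd   = apply(r_black, 2, sd),
  min  = apply(r_black, 2, min),
  max  = apply(r_black, 2, max)
)
print(round(summary_black, 2))
```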

Admixture x academic achievement correlations: Hispanics

row.names LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
1 Math13H 0.20 -0.13 -0.10
2 Math11H 0.27 0.02 -0.02
3 Math09H 0.29 -0.32 0.04
4 Math07H 0.36 -0.14 -0.01
5 Math05H 0.38 -0.08 0.00
6 Math03H 0.37 -0.23 -0.08
7 Math00H 0.30 -0.09 -0.05
8 Read2013H 0.18 -0.44 0.33
9 Read2011H 0.21 -0.26 0.33
10 Read2009H 0.19 -0.44 0.33
11 Read2007H 0.13 -0.32 0.23
12 Read2005H 0.38 -0.30 0.23
13 Read2003H 0.32 -0.34 0.18
14 Read2002H 0.24 -0.23 0.08

And summarizing:

     vars  n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
Afro    1 14  0.27 0.08   0.28    0.28 0.12  0.13 0.38  0.25 -0.10    -1.49 0.02
Amer    1 14 -0.24 0.14  -0.24   -0.24 0.15 -0.44 0.02  0.46  0.17    -1.13 0.04
Euro    1 14  0.11 0.16   0.06    0.11 0.19 -0.10 0.33  0.43  0.23    -1.68 0.04

We do not see the results expected under the genetic model. Among Hispanics, who are 73% European on average, African admixture has a positive relationship to academic achievement. American admixture is negatively correlated and European positively, but the European correlation is weaker than the African one. The only thing that’s in line with the genetic model is that European is positive. On the other hand, results are not in line with a null model either, because then we would expect the results to fluctuate around 0.

Note that the European admixture numbers are consistently positive only for the reading tests (for math they are all close to 0). The reading tests are presumably those most affected by language bias (many Hispanics speak Spanish as a first language). If anything, the math results are worse for the genetic model.

General achievement factors

We can eliminate some of the noise in the data by extracting a general achievement factor for each group. I do this by first removing the cases with no data at all, and then imputing the rest.

Then we get the correlation like before. This should be fairly close to the means above:

 LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry 
                  0.28                  -0.36                   0.22

The European result is stronger with the general factor from the imputed dataset, but the order is the same.

We can do the same for the Black data to see if the imputation+factor analysis screws up the results:

 BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry 
                -0.35                  0.20                  0.31

These results are similar to before (-.31, .13, .28) with the American result somewhat stronger.


Perhaps if we plot the results, we can figure out what is going on. We can plot either the general achievement factor, or specific results. Let’s do both:

Reading2013 plots

[Plots: hispanic_afro_read13, hispanic_amer_read13, hispanic_euro_read13]

Math2013 plots

[Plots: hispanic_afro_math13, hispanic_amer_math13, hispanic_euro_math13]

General factor plots

[Plots: hispanic_afro_general, hispanic_amer_general, hispanic_euro_general]

These did not help me understand it. Maybe they make more sense to someone who understands US demographics and history better.

Multiple regression

As mentioned above, the Black data should be mostly useless for multiple regression due to high collinearity, but the Hispanic data should be better. I ran models using two of the three ancestry estimates at a time since one cannot use all three (I think).

Generally, the independents did not reach significance. Using the general achievement factor as the dependent, the standardized betas are:

LatinoAfricanAncestry LatinoAmericanAncestry
             0.1526765             -0.2910413
LatinoAfricanAncestry LatinoEuropeanAncestry
             0.3363636              0.2931108
LatinoAmericanAncestry LatinoEuropeanAncestry
           -0.32474678             0.06224425

The first is relative to European, the second to American, and the third to African. The results are not even consistent with each other: in the first, African>European; in the third, European>African. All results show that Others>American tho.
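For reference, standardized betas for the three two-predictor models can be obtained in base R by z-scoring all variables before fitting. This is a sketch on simulated stand-in data (not the real ancestry or achievement values), and the helper `std_beta()` is a name introduced here for illustration; the analysis itself used lm.beta from QuantPsyc.

```r
set.seed(8)
n = 34

# Simulated stand-in for the Hispanic state-level data; not the real values
d = data.frame(afro = runif(n, 0.01, 0.22), amer = runif(n, 0.04, 0.21))
d$euro = 0.92 - d$afro - d$amer + rnorm(n, sd = 0.03)
d$ach  = 2 * d$euro - 1 * d$amer + rnorm(n, sd = 0.3)

# Standardized betas: z-score all variables, then fit the regression
# (std_beta is a hypothetical helper, not part of any package)
std_beta = function(formula, data) {
  z = as.data.frame(scale(data))
  coef(lm(formula, data = z))[-1]   # drop the (zero) intercept
}

# Each model leaves out one ancestry component
betas = list(
  omit_euro = std_beta(ach ~ afro + amer, d),
  omit_amer = std_beta(ach ~ afro + euro, d),
  omit_afro = std_beta(ach ~ amer + euro, d)
)
print(lapply(betas, round, 2))
```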

The remainder

There is something odd about the data: the ancestry estimates don’t sum to 1. I calculated the sum of the ancestry estimates and subtracted that from 1. Here are the results:

[Plots: black_remainder, hispanic_remainder]

To these we can add simple descriptive stats:

                        vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
BlackRemainderAncestry     1 31 0.02 0.00   0.02    0.02 0.00 0.01 0.03  0.02 1.35     1.18 0.00
LatinoRemainderAncestry    1 34 0.08 0.05   0.07    0.07 0.03 0.02 0.34  0.32 3.13    12.78 0.01


So we see that there is a sizable ‘other’ proportion for Hispanics and a small one for Blacks. Presumably, the large outlier of Hawaii is due to Asian admixture from Japanese, Chinese, Filipino and Native Hawaiian clusters. At least, these are the largest groups according to Wikipedia. For Blacks, the remainder is presumably Asian admixture as well.

Do these remainders correlate with academic achievement? For Blacks, r = .39 (p = .03), and for Hispanics r = -.24 (p = .18). So for Blacks the direction is as expected and the correlation is stronger, while for Hispanics the correlation is negative and weaker.

Partial correlations

What about partialing out the remainders?

LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
            0.21881404            -0.33114612             0.09329413
BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
           -0.2256171             0.1189219             0.2185139


Not much has changed. The European correlation has become weaker for Hispanics. For Blacks, results are similar to before.
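For readers unfamiliar with partialing: the partial correlation between x and y given z is just the correlation between what is left of x and y after regressing each on z. A small sketch with simulated numbers (the variables and data are invented for illustration; the analysis in this post used partial.r from the psych package):

```r
# Simulated illustration only; x, y, z are invented variables.
set.seed(2)
n = 34
z = rnorm(n)           # stand-in for the remainder ancestry
x = 0.5 * z + rnorm(n) # stand-in for an ancestry estimate
y = 0.5 * z + rnorm(n) # stand-in for the achievement factor

# Method 1: correlate the residuals after regressing each on z.
partial.resid = cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

# Method 2: closed-form formula from the three zero-order correlations.
rxy = cor(x, y); rxz = cor(x, z); ryz = cor(y, z)
partial.formula = (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))

print(c(zero.order = rxy, partial = partial.resid))
```

Both methods give the same number; the residual version makes it clearer what "controlling for the remainder" means in practice.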

Proposed explanations?

The African results are in line with genetic models. The Hispanic results are not, but they aren’t in line with the null model either. Perhaps it has something to do with generational effects. Perhaps one could find the % of first generation Hispanics by state and add it to the regression model, or control for it using partial correlations.

Other ideas? Before calculating the results, John wrote:

Language, generation, and genetic assimilation are all confounded, so I thought it best to not look at them.

He may be right.

R code

data = read.csv("BryceAdmixNAEP.tsv", sep="\t",row.names=1)
library(car) # for vif
library(psych) # for describe
library(VIM) # for imputation
library(QuantPsyc) #for lm.beta
library(devtools) #for source_url
library(Hmisc) #for rcorr
#load mega functions

#descriptive stats

black.model = "Math2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Math2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.fit = lm(black.model, data) #fit whichever model string was assigned last (object name black.fit is assumed)

hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.fit = lm(hispanic.model, data) #fit whichever model string was assigned last (object name hispanic.fit is assumed)

cors = round(rcorr(as.matrix(data))$r,2) #all correlations, round to 2 decimals

#blacks
admixture.cors.black = cors[10:23,1:3] #Black admixture x Achv.
hist(unlist(admixture.cors.black[,1])) #hist for afri x achv
hist(unlist(admixture.cors.black[,2])) #amer x achv
hist(unlist(admixture.cors.black[,3])) #euro x achv
desc = rbind(Afro=describe(unlist(admixture.cors.black[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.black[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.black[,3]))) #euro x achv

admixture.cors.white = cors[24:25,4:6] #White admixture x Achv.

admixture.cors.hispanic = cors[26:39,7:9] #Hispanic admixture x Achv.
desc = rbind(Afro=describe(unlist(admixture.cors.hispanic[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.hispanic[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.hispanic[,3]))) #euro x achv

##Examine hispanics by scatterplots
scatterplot(Read2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
#General factor
scatterplot(hispanic.ach.factor ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))

##Imputed and aggregated data
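#Hispanics (object names Hispanics and Hispanics.imputed are assumed)
Hispanics = data[26:39] #subset hispanic ach data
Hispanics = Hispanics[rowSums(is.na(Hispanics))<ncol(Hispanics),] #remove empty cases
miss.table(Hispanics) #examine missing data
Hispanics.imputed = irmi(Hispanics, noise.factor = 0) #impute the rest
#factor analysis
fact.hispanic = fa(Hispanics.imputed) #get common ach factor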
fact.scores = fact.hispanic$scores; colnames(fact.scores) = "hispanic.ach.factor"
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors[7:9,"hispanic.ach.factor"] #results for general factor

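#Blacks (object names Blacks, Blacks.imputed and fact.black are assumed)
Blacks = data[10:23] #subset black ach data
Blacks = Blacks[rowSums(is.na(Blacks))<ncol(Blacks),] #remove empty cases
Blacks.imputed = irmi(Blacks, noise.factor = 0) #impute the rest
#factor analysis
fact.black = fa(Blacks.imputed) #get common ach factor
fact.scores = fact.black$scores; colnames(fact.scores) = "black.ach.factor"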
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors[1:3,"black.ach.factor"] #results for general factor

##Admixture totals
Hispanic.admixture = subset(data, select=c("LatinoAfricanAncestry","LatinoAmericanAncestry","LatinoEuropeanAncestry"))
Hispanic.admixture = Hispanic.admixture[complete.cases(Hispanic.admixture),] #complete cases
Hispanic.admixture.sum = data.frame(apply(Hispanic.admixture, 1, sum))
colnames(Hispanic.admixture.sum)="Hispanic.admixture.sum" #fix name
describe(Hispanic.admixture.sum) #stats

#add data back to dataframe
LatinoRemainderAncestry = 1-Hispanic.admixture.sum #get remainder
colnames(LatinoRemainderAncestry) = "LatinoRemainderAncestry" #rename
data = merge.datasets(LatinoRemainderAncestry,data,2) #merge back

#plot it
LatinoRemainderAncestry = LatinoRemainderAncestry[order(LatinoRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(LatinoRemainderAncestry),cex=.7) #plot, with smaller text

Black.admixture = subset(data, select=c("BlackAfricanAncestry","BlackAmericanAncestry","BlackEuropeanAncestry"))
Black.admixture = Black.admixture[complete.cases(Black.admixture),] #complete cases
Black.admixture.sum = data.frame(apply(Black.admixture, 1, sum))
colnames(Black.admixture.sum)="Black.admixture.sum" #fix name
describe(Black.admixture.sum) #stats

#add data back to dataframe
BlackRemainderAncestry = 1-Black.admixture.sum #get remainder
colnames(BlackRemainderAncestry) = "BlackRemainderAncestry" #rename
data = merge.datasets(BlackRemainderAncestry,data,2) #merge back

#plot it
BlackRemainderAncestry = BlackRemainderAncestry[order(BlackRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(BlackRemainderAncestry),cex=.7) #plot, with smaller text

#simple stats for both

#make subset with remainder data and achievement
remainders = subset(data, select=c("black.ach.factor","BlackRemainderAncestry",
                                   "hispanic.ach.factor","LatinoRemainderAncestry"))
View(rcorr(as.matrix(remainders))$r) #correlations?

#Partial correlations
partial.r(data, c(7:9,40), c(43))[4,] #partial out remainder for Hispanics
partial.r(data, c(1:3,41), c(42))[4,] #partial out remainder for Blacks


Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2014). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics.

Incidentally, the Wiki page was very poor, so I had to rewrite that before writing this.

Generally, this was an interesting read that taught me a lot. This probably has to do with me not really caring much about sports. Some parts can be boring if you don’t care/know much about e.g. baseball. It is pretty US-centric in the topics chosen.

The science in the book comes mostly thru interviews with experts and some summarizing of studies. Rarely is sufficient detail given about the studies for one to make an informed decision about whether to trust them or not. Usually, no sample sizes, p-values, effect sizes etc. are mentioned. To be fair, it was written as a popular science book, so this criticism is somewhat unfair.

Some quotes:

When scientists at Washington University in St. Louis tested him, Pujols, the greatest hitter of an era, was in the sixty-sixth percentile for simple reaction time compared with a random sample of college students.

College students are above average in g, which means above average reaction time. Presumably, they tested simple reaction time, which correlates about .2 with g. College students are perhaps at 115 on average. This university is apparently a top university, so perhaps the mean IQ there is 120-125, meaning that these students are about 0.27-0.33 d above the mean on reaction time (.2 × 20/15 to .2 × 25/15), unless they were students in fysical ed., in which case they may be even higher. Being at the 66th centile is not bad then.
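The back-of-envelope numbers can be worked through as follows. This is a rough sketch under the assumptions stated in the paragraph (r = .2 between g and simple reaction time, student mean IQ of 120 to 125, SD of 15), plus two added ones: normal distributions and equal spread among students and the general population.

```r
# Assumptions taken from the text: r = .2, student mean IQ 120-125, SD 15.
# Added assumptions: normality and equal SDs in both groups.
r = .2
iq.mean = c(low = 120, high = 125)

# Expected student advantage in reaction time, in SD units (d).
d = r * (iq.mean - 100) / 15
print(round(d, 3))

# Pujols at the 66th centile among students, re-expressed as a
# centile in the general population.
z.student = qnorm(.66)
centile.general = pnorm(z.student + d)
print(round(centile.general, 2))
```

Under these assumptions the 66th centile among such students lands somewhere in the mid-to-high 70s in the general population, so above average but nowhere near elite.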

Jason Gulbin, the physiologist who worked on Australia’s Olympic skeleton experiment, says that the word “genetics” has become so taboo in his talent-identification field that “we actively changed our language here around genetic work that we’re doing from ‘genetics’ to ‘molecular biology and protein synthesis.’ It was, literally, ‘Don’t mention the g-word.’ Any research proposals we put in, we don’t mention the genetics if we can help it. It’s: ‘Oh, well, if you’re doing molecular biology and protein synthesis, well, that’s all right.’” Never mind that it’s the same thing.

Studying race? NAZI NAZI!!! Studying population genetics? No problem, carry on.

This story is fascinating. Perhaps the best example of how categorical thinking about gender leads to real life problems.

Several scientists I spoke with about the theory insisted that they would have no interest in investigating it because of the inevitably thorny issue of race involved. One of them told me that he actually has data on ethnic differences with respect to a particular physiological trait, but that he would never publish the data because of the potential controversy. Another told me he would worry about following Cooper and Morrison’s line of inquiry because any suggestion of a physical advantage among a group of people could be equated to a corresponding lack of intellect, as if athleticism and intelligence were on some kind of biological teeter-totter. With that stigma in mind, perhaps the most important writing Cooper did in Black Superman was his methodical evisceration of any supposed inverse link between physical and mental prowess. “The concept that physical superiority could somehow be a symptom of intellectual inferiority only developed when physical superiority became associated with African Americans,” Cooper wrote. “That association did not begin until about 1936.” The idea that athleticism was suddenly inversely proportional to intellect was never a cause of bigotry, but rather a result of it. And Cooper implied that more serious scientific inquiry into difficult issues, not less, is the appropriate path.

How very familiar. Better not hurt those feelings! At least they should publish the data anonymously in some way so others can examine them.

There is a university called Lehigh… Le High… geddit??

In 2010, Heather Huson, a geneticist then studying at the University of Alaska, Fairbanks—and a dogsled racer since age seven—tested dogs from eight different racing kennels. To Huson’s surprise, Alaskan sled dogs have been so thoroughly bred for specific traits that analysis of microsatellites—repeats of small sequences of DNA—proved Alaskan huskies to be an entirely genetically distinct breed, as unique as poodles or labs, rather than just a variation of Alaskan malamutes or Siberian huskies.
Huson and colleagues discovered genetic traces of twenty-one dog breeds, in addition to the unique Alaskan husky signature. The research team also established that the dogs had widely disparate work ethics (measured via the tension in their tug lines) and that sled dogs with better work ethics had more DNA from Anatolian shepherds—a muscular, often blond breed of dog originally prized as a guardian of sheep because it would eagerly do battle with wolves. That Anatolian shepherd genes uniquely contribute to the work ethic of sled dogs was a new finding, but the best mushers already knew that work ethic is specifically bred into dogs.
“Yeah, thirty-eight years ago in the Iditarod there were dogs that weren’t enthused about doing it, and that were forced to do it,” Mackey says. “I want to be out there and have the privilege of going along for the ride because they want to go, because they love what they do, not because I want to go across the state of Alaska for my satisfaction, but because they love doing it. And that’s what’s happened over forty years of breeding. We’ve made and designed dogs suited for desire.”

Admixture studies in dogs, a useful precedent to cite to ease the pain for newcomers.

In one tank are mice missing oxytocin receptors. They are used in the study of pain, but the mice also have deficits in social recognition. Put them with mice they grew up with and they won’t recognize them. In another corner is a tank of raven-haired mice that were bred to be prone to head pain, that is, migraines. They spend a lot of time scratching their foreheads and shuddering, and they are apparently justified in using the old headache excuse to avoid mating. “This experiment has taken years,” says Jeffrey Mogil, head of the lab, of the work that seeks to help develop migraine treatments, “because they breed really, really badly.”

How did they get ethics approval for this???

As Pitsiladis put it, to be a world-beater, “you absolutely must choose your parents correctly.” He was being facetious, of course, because we can’t choose our parents. Nor do humans tend to couple with conscious knowledge of one another’s gene variants. We pair up more in the manner of a roulette ball that bounces off a few pockets before settling into one of many suitable spots. Williams suggests, hypothetically, that if humanity is to produce an athlete with more “correct” sports genes, one approach is to weight the genetic roulette ball with more lineages in which parents and grandparents are outstanding athletes and thus probably harbor a large number of good athleticism genes. Yao Ming—at 7’5″, once the tallest active player in the NBA—was born from China’s tallest couple, a pair of ex–basketball players brought together by the Chinese basketball federation. As Brook Larmer writes in Operation Yao Ming: “Two generations of Yao Ming’s forebears had been singled out by authorities for their hulking physiques, and his mother and father were both drafted into the sports system against their will.” Still, the witting merger of athletes in pursuit of superstar progeny is rare.

Sure we do! Some do it quite consciously, e.g. using dating sites that match for overall likeness.