In reply to:


There’s a new kind of ‘environmentalist’ (rather, anti-hereditarian) defense in town, or at least one that’s not commonly seen. It goes like this:

This is the first of a series of blog posts about race and intelligence. My opinions on this topic are, I think, the least popular arguments I have ever made, and I have made a few unpopular ones over the years. My assertion, basically, is this: discussions of differences in IQ among racial groups are not actually about empirical data. My argument is unpopular, I think, because both sides of the issue are committed to the idea that they are being scientific about it. The hereditarians want to be seen as scientific so they aren’t seen as racists; the anti-hereditarians want to be scientific so they aren’t seen as frightened by socially difficult scientific findings.

So the matter is not just hard, says Turkheimer, but impossible, which makes it not scientific at all. The main targets are researchers like Jensen:

It was Arthur Jensen who first attempted to extract claims of genetically-based differences in the IQs of racial groups from the obviously racist context in which they had previously existed. Jensen’s writing on the subject adopted a measured scientific tone, denounced racism directed at individuals, referred extensively to data, and was extremely well-informed about the science of human intelligence. Jensen’s work on race differences produced pushback of all kinds, but the predominant strand accepted his contention that the question of race differences was a scientific hypothesis deserving of careful empirical consideration. The classic work of this kind, still the only take on the subject that appears genuinely to avoid reaching a conclusion one way or the other, was written by my mentor John Loehlin.

Turkheimer misreads Loehlin et al, who wrote in their conclusion:

Given the empirical findings, and the theoretical arguments we have discussed, what conclusions about racial-ethnic differences in performance on intellectual tests appear justified? It seems to us that they include the following:

1. Observed average differences in the scores of members of different U.S. racial-ethnic groups on intellectual-ability tests probably reflect in part inadequacies and biases in the tests themselves, in part differences in environmental conditions among the groups, and in part genetic differences among the groups. It should be emphasized that these three factors are not necessarily independent, and may interact.

2. A rather wide range of positions concerning the relative weight to be given these three factors can reasonably be taken on the basis of current evidence, and a sensible person’s position might well differ for different abilities, for different groups, and for different tests.

3. Regardless of the position taken on the relative importance of these three factors, it seems clear that the differences among individuals within racial-ethnic (and socioeconomic) groups greatly exceed in magnitude the average differences between such groups.

Let us emphasize that these conclusions are based on the conditions that have existed in the United States in the recent past. None of them precludes the possibility that changes in these conditions could alter these relationships for future populations. It should also be noted that the probable existence of relevant environmental differences, genetic differences, and psychometric biases does not imply that they must always be in the same direction as the observed between-group differences.

On the whole, these are rather limited conclusions. It does not appear to us, however, that the state of the scientific evidence at the present time justifies stronger ones.

In other words, Loehlin et al attribute a non-zero share of the gap to between-group genetic differences, as well as to test bias and environmental conditions. Sound familiar? Let’s try The Bell Curve:

If the reader is now convinced that either the genetic or environmental explanation has won out to the exclusion of the other, we have not done a sufficiently good job of presenting one side or the other. It seems highly likely to us that both genes and the environment have something to do with racial differences. What might the mix be? We are resolutely agnostic on that issue; as far as we can determine, the evidence does not yet justify an estimate.

The conclusions of Loehlin et al and of Herrnstein & Murray are in fact almost identical. Both agree that some mix of genetics and environment explains the gap, but Loehlin et al also attribute some of it to test bias, which H&M do not, at least in that passage. Both groups decline to make definite numerical estimates, or even to give a probability distribution. What about Jensen? Here’s Jensen in 1969:

The fact that a reasonable hypothesis has not been rigorously proved does not mean that it should be summarily dismissed. It only means that we need more appropriate research for putting it to the test. I believe such definitive research is entirely possible but has not yet been done. So all we are left with are various lines of evidence, no one of which is definitive alone, but which, viewed all together, make it a not unreasonable hypothesis that genetic factors are strongly implicated in the average Negro-white intelligence difference. The preponderance of the evidence is, in my opinion, less consistent with a strictly environmental hypothesis than with a genetic hypothesis, which, of course, does not exclude the influence of environment or its interaction with genetic factors.

Jensen does not strictly claim that either factor is greater than zero; rather, he seems to think in terms of a probability distribution over possible values of the between-group heritability. How do modern researchers think about it in probabilistic terms? Unfortunately, Rindermann et al did not poll researchers this way, but simply asked them 1) whether the evidence justifies a point estimate, and 2) if so, what their best guess is. In terms of a probability distribution, such a best guess might be read as the mean, median, or mode, depending on interpretation. The Beta distribution is well suited for describing uncertainty about a single parameter bounded between 0 and 1, such as a heritability.

You can use this calculator to try out parameter values until you find a Beta distribution that approximates your own beliefs. Mine looks something like Beta(2, 0.5).
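To see what a given choice implies, the sketch below computes the mean, median, and mode of a Beta(a, b) distribution using only Python’s standard library; these are the three readings of a “best guess” mentioned above. It is an illustration, not the calculator referred to in the text: the incomplete beta function is evaluated with the continued-fraction method described in Numerical Recipes, the median is found by bisection, and the function names are my own.

```python
"""Mean, median, and mode of a Beta(a, b) belief distribution (standard library only)."""
from math import exp, lgamma, log


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta I_x(a, b), i.e. P(X <= x) for X ~ Beta(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    # Evaluate the continued fraction on the side where it converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def beta_summary(a, b, tol=1e-10):
    """Return (mean, median, mode); mode is None when it is not unique."""
    mean = a / (a + b)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection on the CDF for the median
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < 0.5:
            lo = mid
        else:
            hi = mid
    median = (lo + hi) / 2.0
    if a > 1 and b > 1:
        mode = (a - 1) / (a + b - 2)
    elif a >= 1 and b <= 1 and (a, b) != (1, 1):
        mode = 1.0  # density increases towards (or is unbounded at) x = 1
    elif a <= 1 and b >= 1 and (a, b) != (1, 1):
        mode = 0.0  # density decreases away from x = 0
    else:
        mode = None  # uniform (a = b = 1) or U-shaped (a, b < 1)
    return mean, median, mode


mean, median, mode = beta_summary(2.0, 0.5)
print(f"mean={mean:.3f} median={median:.3f} mode={mode}")
```

For Beta(2, 0.5), the mean is 0.8 and the median about 0.88, while the mode sits at 1 because the density grows without bound as x approaches 1; for skewed beliefs like this, the three readings of a “best guess” can differ noticeably.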

I will attempt to do a poll of top experts on both sides, so we can get an idea of how sure the experts are.

But aside from that, what reasons does Turkheimer give for believing the matter to be unsolvable? First, he notes that humans are stubborn, and continues:

The second reason involves the nature of the studies that are invoked. When one discusses questions about the origins of human differences, most of the scientific designs that might be most informative aren’t available. You can’t breed people, you can’t mess with their DNA, you can’t raise them under controlled conditions. So the discussion is necessarily based on quasi-experimental science that is by definition fundamentally flawed. In research of this kind the grounds for objection are almost always greater than the valid scientific signal, making it endlessly possible to trade insults about the low quality of the evidence produced by the other side. Many if not most social scientific arguments are tiresome in exactly this way. (The hereditarians are always saying that any day now there will be new genomic data that will settle the question once and for all. They are wrong. More on that in subsequent posts.)

Turkheimer is engaging in a textbook case of proving too much. His argument from the lack of strictly controlled experiments implies that essentially all non-experimental science is pseudoscience, or non-science. This extends far beyond social science: we cannot do experiments in many areas of geology, meteorology, astronomy, or biology. It of course also covers much of social science, including virtually all of history and most of sociology, psychology, epidemiology, etc.

Turkheimer has a third reason, but it’s strangely inconsistent with the second. He begins by noting that there are both established theory and models for estimating within-group heritabilities, even though those estimates come from the same kind of quasi-experimental designs he just dismissed: “it is why the nature-nurture debate is over.” But about between-group heritability, he says:

Now have a look at the Rushton and Jensen paper making the case for the partial genetic determination of racial differences, or listen to Murray and Harris, or read any of the replies to our piece in Vox. Where are the percentages? Where is the equivalent of the ACE model? Where are the structural equation models with parameters quantifying the “partly genetic and partly environmental” hypothesis the hereditarians keep repeating? For all the hereditarians’ idle intuitions about differences being part genetic and part environmental, where is the empirical or quantitative theory that describes how this apportioning is supposed to work? There is no such thing as a “group heritability coefficient,” no way to put any meat on the speculative bones about partial genetic determination. In the absence of an actual empirical theory, the discussion is all about your intuitions against my intuitions. “Well, it sure seems to me that if the black-white difference was environmental, it would have been reduced at least a little by now!” “Yeah, but it has been reduced by five IQ points, and what about the children born to white German mothers and African-American soldiers?” “Well, OK, but what about Scarr’s trans-racial adoption study?” “Yeah, that’s not how I interpret the trans-racial adoption study!” And so on ad infinitum.

Turkheimer appears to claim that genetic models make no (falsifiable) predictions and that there are no models with which to think about the issue. This is a strange claim, and it is inconsistent with Turkheimer et al’s own recent paper, which discusses such predictions and argues that they have been falsified.

The claim is wholly untrue: the predictions of genetic models are quite simple, and many prominent researchers have spelled them out for decades. DeFries (1972) gives an overview of the mathematics that can be used to derive the between-group heritability, and Jensen (1998, p. 443) presents it as well, citing DeFries, Loehlin et al and Lush (1968). I have done a series of simulations to illustrate the modeling and methods; see these for further details.
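For readers who want the algebra, here is one way to write the relationship between within-group and between-group heritability when variance is split into between-group (B) and within-group (W) components. The notation is my own, and this is a sketch of the standard derivation rather than a transcription of DeFries or Jensen. Here V_P and V_G denote phenotypic and genetic variance, t is the share of phenotypic variance lying between groups, and r is the share of genetic variance lying between groups.

```latex
\begin{aligned}
t &= \frac{V_{P,B}}{V_{P,B} + V_{P,W}}, \qquad
r = \frac{V_{G,B}}{V_{G,B} + V_{G,W}}, \qquad
h^2_W = \frac{V_{G,W}}{V_{P,W}}, \qquad
h^2_B = \frac{V_{G,B}}{V_{P,B}}, \\[4pt]
h^2_B &= \frac{\tfrac{r}{1-r}\,V_{G,W}}{\tfrac{t}{1-t}\,V_{P,W}}
       = h^2_W \, \frac{r\,(1-t)}{t\,(1-r)} .
\end{aligned}
```

The formula makes explicit which quantity carries the inference: h²_W and t can be estimated directly, whereas r has to be estimated by other means. When r = t, the between-group heritability equals the within-group heritability.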

Besides direct methods for estimating the between-group heritability, one can also use a number of more obvious tests to decide between specific models. The matter is particularly simple if we assume an autosomal, additive, genetics-only model with random mating. This model makes many testable predictions, such as:

  • If a person from group A mates with a person from group B, their offspring are expected to have a trait value halfway between the two group means. The order of the parents should not matter, i.e. offspring of an A father and a B mother have the same expected value as offspring of a B father and an A mother.
  • Assuming no adoptive selection, if children from group A are adopted by parents from group B, the children are expected to attain the trait value of their biological parents, not that of their adoptive parents. The direction of adoption (A into B or B into A) does not matter.
  • If an admixed population exists, trait levels should be a linear function of ancestry, and the difference between the predicted values at the two extremes reflects the genetic gap, i.e. E(T|A=1) − E(T|A=0) = the genetic difference in T, where T is the trait and A is the proportion of group-A ancestry.
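A minimal simulation can make the first and third predictions concrete. It is a toy sketch under the assumptions just listed (additive effects, autosomal loci, random mating, no environmental differences correlated with ancestry) plus some of my own: every parameter value, including the number of loci, the allele frequencies, and the effect size, is hypothetical, loci are unlinked, and the admixture model ignores local-ancestry blocks. The adoption prediction is not simulated because under a genetics-only model it holds by construction.

```python
"""Toy simulation of the additive, autosomal, genetics-only model with random mating.

All parameter values are hypothetical. Loci are unlinked, each allele copy of an
admixed person comes from group A with probability equal to that person's ancestry,
and the environment is independent of ancestry.
"""
import random

random.seed(42)

N_LOCI = 100
BETA = 0.1  # hypothetical additive effect of each trait-increasing allele
# Hypothetical allele frequencies: group A is 0.1 higher than group B at every locus.
FREQ_A = [random.uniform(0.4, 0.7) for _ in range(N_LOCI)]
FREQ_B = [p - 0.1 for p in FREQ_A]


def expected_child_value(father_freqs, mother_freqs):
    """Expected additive value of a child; each parent transmits one allele per locus."""
    return sum(BETA * (pf + pm) for pf, pm in zip(father_freqs, mother_freqs))


def admixed_trait(ancestry, env_sd=1.0):
    """Simulated trait of one person with the given proportion of group-A ancestry."""
    g = 0.0
    for pa, pb in zip(FREQ_A, FREQ_B):
        for _ in range(2):  # two allele copies per locus
            p = pa if random.random() < ancestry else pb
            g += BETA * (random.random() < p)
    return g + random.gauss(0.0, env_sd)


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


mean_a = expected_child_value(FREQ_A, FREQ_A)
mean_b = expected_child_value(FREQ_B, FREQ_B)
gap = mean_a - mean_b  # 2 * N_LOCI * BETA * 0.1 = 2.0 by construction

# Cross-group offspring: same expectation for A-B and B-A, halfway between the groups.
child_ab = expected_child_value(FREQ_A, FREQ_B)
child_ba = expected_child_value(FREQ_B, FREQ_A)

# Admixed population: the slope of trait on ancestry estimates the genetic gap.
ancestries = [random.random() for _ in range(4000)]
traits = [admixed_trait(a) for a in ancestries]
slope = ols_slope(ancestries, traits)

print(f"gap={gap:.3f} A-B child={child_ab:.3f} B-A child={child_ba:.3f} "
      f"midpoint={(mean_a + mean_b) / 2:.3f} slope={slope:.3f}")
```

Relaxing any assumption, for example by letting the environmental term depend on ancestry, changes what the slope estimates; that is why deviations from these predictions are informative.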

Deviations from these predictions that cannot be explained by sampling error or bias falsify the simplest possible model. For instance, if mating deviates from random mating in certain ways, the offspring means may depend on which parent comes from which group. While there are some studies of these predictions, they are few, have small samples, are old, etc., so the literature is not very reliable. Beyond purely statistical error, publication bias, researcher error, fraud, etc. can make reality appear different from what it is. We have every reason to expect massive bias on this topic, as no topic in social science is more emotionally fraught. Indeed, researchers (Turkheimer included) and philosophers regularly assure us of the horrors that will ensue if people start believing that group differences have anything to do with genetics. The most recent paper in this back-and-forth is Jeffery and Shackleford (2017).

Difficulties in figuring out the causes

Whereas Turkheimer makes the easily disproved claim that no models exist for quantifying genetic group differences in polygenic traits, Dalton Conley & Jason Fletcher make the more defensible claim that:

That said, let us ask what is perhaps the most controversial question in the human sciences: Do genetic differences by ancestral population subgroup explain observed differences in achievement between self-identified race groups in the contemporary United States over and above all the environmental differences that we also know matter? In their best-selling 1994 book, The Bell Curve: Intelligence and Class Structure in American Life, Richard Herrnstein and Charles Murray indeed made the argument that blacks are genetically inferior to whites with respect to cognitive ability. Their “evidence,” however, contained no molecular genetic data, and was flawed as a result. But today we have molecular data that might potentially allow us to directly examine the question of race, genes, and IQ. We raise this pernicious question again only to demonstrate the impossibility of answering it scientifically.

Since I quoted the relevant passage from The Bell Curve above, readers can judge for themselves how fairly C&F describe what it actually said. Amusingly, while Turkheimer believes genomic data to be useless for deciding the issue, C&F believe they are relevant, and criticize H&M for not basing their conclusion on a kind of evidence that was not available at the time. It soon will be available in large quantities. Turkheimer has a prediction about what will happen in the coming years:

My concern is that anti-hereditarians play into race scientist’s hands when we agree to engage with them as though there existed a legitimate research paradigm proceeding toward a rational conclusion. At least in the social sciences, legitimate empirical research paradigms rarely come to all or none conclusions, so it becomes natural for people to conclude, with Murray and Harris, that the whole long argument is bound to settle eventually on the idea that group differences are a little environmental, a little genetic. But in fact, that is not where we are headed. I predict that in a relatively short period of time, contemporary race science will seem just as transparently unscientific and empirically untrue as the race science of the early 20th Century now appears from our modern perspective.

(Turkheimer’s prediction is even stranger considering that he thinks genomic data will be irrelevant.)

I think Murray and Harris are exactly right. C&F consider both of the options I used in my simulations: 1) polygenic scores and 2) ancestry analysis. They think neither will work. Their objection to polygenic scores amounts to linkage disequilibrium (LD) decay combined with the use of non-causal tag variants:

As it turns out, however, these scores when developed for one population—say, those of European descent—fail to predict for other populations. Take the height example. The best height score, which has been “trained” on whites, when applied to blacks, predicts that Africans or African Americans are six inches shorter than they are. So they simply don’t work. The very differences in genetics between ancestral groups make comparisons across groups impossible. The million or so markers measured by gene chips are picking up different things in distinct populations. They are merely flags spaced out along the chromosomes and meant to stand in for all the genetic real estate around them. But what that real estate holds is very different—particularly when African descent populations, with their greater degree of variation, are compared to non-African populations. So polygenic scores, while useful for analysis within populations, do not allow us to make apples to oranges comparisons across groups.

This is not a new problem. In fact, I discussed it in 2015 and again in 2017 in relation to Piffer’s worldwide results. C&F do not seem to understand why the scores can fail to work across groups. An excellent preprint by Zanetti and Weale discusses the possible explanations and carries out a simulation study based on real data:

Through genome-wide association studies (GWASs), researchers have identified hundreds of genetic variants associated with particular complex traits. Previous studies have compared the pattern of association signals across different populations in real data, and these have detected differences in the strength and sometimes even the direction of GWAS signals. These differences could be due to a combination of (1) lack of power (insufficient sample sizes); (2) minor allele frequency (MAF) differences (again affecting power); (3) linkage disequilibrium (LD) differences (affecting power to tag the causal variant); and (4) true differences in causal variant effect sizes (defined by relative risks). In the present work, we sought to assess whether the first three of these reasons are sufficient on their own to explain the observed incidence of trans-ethnic differences in replications of GWAS signals, or whether the fourth reason is also required. We simulated case-control data of European, Asian and African ancestry, drawing on observed MAF and LD patterns seen in the 1000-Genomes reference dataset and assuming the true causal relative risks were the same in all three populations. We found that a combination of Euro-centric SNP selection and between population differences in LD, accentuated by the lower SNP density typical of older GWAS panels, was sufficient to explain the rate of trans-ethnic differences previously reported, without the need to assume between population differences in true causal SNP effect size. This suggests a cross population consistency that has implications for our understanding of the interplay between genetics and environment in the aetiology of complex human diseases.
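Reason (3) in that list, LD differences, can be illustrated with a deliberately crude one-locus model. This is my own toy sketch, not Zanetti and Weale’s simulation: the LD model (the tag allele copies the causal allele with probability `ld`, otherwise it is an independent draw) and every number are hypothetical. A tag weight estimated in a population with strong LD is applied to a population with weaker LD, while the true causal effect is identical in both.

```python
"""Toy one-locus illustration of LD decay between a tag SNP and a causal variant.

Hypothetical model: on each haplotype the tag allele copies the causal allele with
probability `ld`, otherwise it is an independent draw with the same frequency, so the
tag-causal correlation equals `ld`. The causal effect is the same in both populations.
"""
import random

random.seed(3)


def simulate(n, freq, ld, beta=1.0, env_sd=2.0):
    """Return causal genotypes, tag genotypes, and traits for n people."""
    causal, tag, trait = [], [], []
    for _ in range(n):
        c_geno = t_geno = 0
        for _ in range(2):  # two haplotypes per person
            c = int(random.random() < freq)
            t = c if random.random() < ld else int(random.random() < freq)
            c_geno += c
            t_geno += t
        causal.append(c_geno)
        tag.append(t_geno)
        trait.append(beta * c_geno + random.gauss(0.0, env_sd))
    return causal, tag, trait


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sum((x - mx) ** 2 for x in xs)


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


BETA = 1.0
c1, t1, y1 = simulate(20_000, freq=0.5, ld=0.9, beta=BETA)  # training population
c2, t2, y2 = simulate(20_000, freq=0.5, ld=0.4, beta=BETA)  # target population

w_tag = ols_slope(t1, y1)     # tag weight from a GWAS-style regression in population 1
w_causal = ols_slope(c1, y1)  # weight if the causal variant itself were genotyped

# Accuracy: correlation of the tag-based score with true genetic value.
acc_tag_1 = pearson([w_tag * t for t in t1], [BETA * c for c in c1])
acc_tag_2 = pearson([w_tag * t for t in t2], [BETA * c for c in c2])

# Calibration in population 2: slope of the trait on each score (1.0 = well calibrated).
cal_tag_2 = ols_slope([w_tag * t for t in t2], y2)
cal_causal_2 = ols_slope([w_causal * c for c in c2], y2)

print(f"tag accuracy: pop1={acc_tag_1:.2f} pop2={acc_tag_2:.2f}")
print(f"calibration in pop2: tag score={cal_tag_2:.2f} causal score={cal_causal_2:.2f}")
```

In this toy setup the tag-based score tracks genetic value only to the degree of the tag-causal correlation in each population (about 0.9 versus 0.4) and is badly miscalibrated in the second population, while a score built on the causal variant transfers essentially intact.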

In any case, we will know for sure once we start finding the actual causal variants for traits. LD decay is only a problem when one relies on proxy (tag) variants instead of the causal variants. LD decay also shrinks as genotyping density increases, because denser panels place tag variants closer to the causal variants, where LD is stronger and decays less across populations. Once affordable whole-genome sequencing arrives (in a few years), density will hit a ceiling, since nearly every variant, the causal ones included, will be measured directly, and LD decay will be a much smaller problem. C&F continue:

The problem is the same one facing Wade (even if he was unaware of it): Whether measured with a single genetic marker or a summative measure like a principal component, genes act as proxies for environments. The only way to truly insure that observed differences by genetics are really genetic effects is to compare full siblings from the same family where we know the differences between brothers and sisters are the result of luck, of the randomness at conception, and not correlated with background differences in poverty, neighborhood, and so on. But here’s the catch-22: While polygenic scores vary quite a bit between siblings, measures of ancestry, almost by definition, do not. Thus, while initially promising, the idea of comparing siblings with differing dosages of continental ancestries won’t work either.

C&F are half-right. An association between ancestry and a trait does not by itself establish genetic causation (believing so is almost a reverse sociologist’s fallacy), but the absence of such an association would falsify most genetic models. As such, it constitutes a test for genetic models, one that they have passed in a large number of genomic studies examining links between ancestry and social outcomes (Kirkegaard et al, 2017), as well as in the only large study that examined ancestry using proper genomic data (Kirkegaard et al, in review). Note that the aggregate-level versions of the genetic models predict the same kinds of relationships for units composed of multiple persons, possibly of mixed ancestry. Many studies of this kind have also confirmed predictions from genetic models (Fuerst and Kirkegaard, 2016; Kirkegaard and Fuerst, in review). Models that make risky (falsifiable) predictions and are repeatedly vindicated gain posterior probability. What risky predictions have environment-only models made? None, really. Turkheimer et al, as well as others, recently argued that there is no relationship between ancestry and IQ, a claim that the analysis of the PING data has clearly disproved.
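The point about risky predictions is ordinary Bayesian updating. In the sketch below (the function name and all numbers are hypothetical), a prediction is “risky” when the observed outcome would be much less likely under the rival model, so the likelihood ratio, and hence the update, is large; a prediction that both models make equally well moves nothing. It assumes only two competing models and observations that are independent given each model.

```python
def posterior_probability(prior, p_obs_if_model, p_obs_if_rival, n_observations=1):
    """P(model | data) after n conditionally independent observations of the predicted outcome.

    Two competing models only; each observation has the same likelihood under each model.
    """
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = (p_obs_if_model / p_obs_if_rival) ** n_observations
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: a risky prediction (0.9 vs 0.5) versus a safe one (0.9 vs 0.9).
for n in (1, 3, 5):
    risky = posterior_probability(0.5, 0.9, 0.5, n)
    safe = posterior_probability(0.5, 0.9, 0.9, n)
    print(f"after {n} confirmations: risky={risky:.3f} safe={safe:.3f}")
```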

C&F are wrong to require genomic full-sibling-control studies, as there are only a few plausible alternative hypotheses. One family of such hypotheses involves reactions to racial appearance, which can work in multiple ways. In the other-reactive model, racial appearance provokes hostile discrimination from others, and this somehow lowers intelligence, yet mysteriously does not seem to affect other, more plausible targets such as self-esteem (Dalliard, 2014). Furthermore, large-scale studies fail to find substantial differences in self-reported discrimination by race (Boutwell et al, 2017). In the self-reactive model, persons react to their own racial appearance by adopting the cultural values associated with how they look, so that their trait levels follow their appearance rather than their actual polygenic scores.

C&F are also wrong about the impossibility of genomic full-sibling-control studies. As with their other claims, others have already looked into the idea and analyzed the necessary sample size. In a comment on their article, Gwern, who conducted such an analysis along with yours truly, writes:

‘requires large sample sizes’ is not at all the same thing as “won’t work”. Yes, it may require something like n=50k (since siblings will differ by a few percent on ancestry), but that’s not as impossible as it looks, now that we have individual datasets like UKBB with n=500k+ and serious plans for studies with n=1m. With whole genomes at <$1000 and SNP panels at <$50 and costs still falling, it will soon become routine for everyone to be genotyped, picking up fraternal twins / sibling pairs / trios. (For example, at around 1m births a year in the US and around 15% of the population, less than a years’ worth would be necessary.)
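The kind of calculation behind such sample-size estimates can be sketched with the usual normal approximation for a regression slope: regress the sibling difference in the trait on the sibling difference in ancestry. This is my own simplified sketch, not the analysis Gwern refers to, and the inputs in the example (the effect per unit of ancestry, the spread of sibling ancestry differences, and the residual standard deviation) are placeholders rather than estimates.

```python
from math import ceil
from statistics import NormalDist


def sibling_pairs_needed(effect, sd_ancestry_diff, sd_residual, alpha=0.05, power=0.80):
    """Approximate number of sibling pairs for a two-sided test of the within-family slope.

    Simplified model: trait_diff = effect * ancestry_diff + noise, with
    SE(slope) ~= sd_residual / (sqrt(n) * sd_ancestry_diff) under the normal approximation.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = ((z_alpha + z_power) * sd_residual / (effect * sd_ancestry_diff)) ** 2
    return ceil(n)


# Placeholder inputs: effect in trait SDs per unit ancestry, sibling ancestry differences
# with an SD of 0.03 ("a few percent"), residual SD of sibling trait differences of 1.
print(sibling_pairs_needed(effect=1.0, sd_ancestry_diff=0.03, sd_residual=1.0))
print(sibling_pairs_needed(effect=0.5, sd_ancestry_diff=0.03, sd_residual=1.0))
```

With these placeholder inputs the formula gives roughly 8,700 pairs. The required n grows with the inverse square of both the effect size and the spread of sibling ancestry differences, so halving the effect quadruples the sample, which is how other plausible inputs can push the requirement into the tens of thousands.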

Note that the genomic full-sibling design was already proposed by hereditarians back in 2013 (Malloy, 2013). But there are other designs that require smaller samples. Half-siblings are quite common, particularly among African Americans due to their relatively unstable families. Half-sibling data still provide a powerful control for shared-environment-type causes while increasing the dispersion in ancestry among siblings, since half-siblings share only one parent. One can also include measures of racial appearance, self-rated racial/cultural identification, etc. Even better would be a genomic adoptive design, which maximizes the ancestry differences between the siblings. However, such data would probably also be the hardest to get.

C&F end their piece by grasping at straws:

There has long been evidence—dating back to the days of W.E.B. Du Bois—that there is a pigmentocracy within U.S. black (and white and Latino) communities. More recent work has shown that this is not a uniquely American phenomenon but extends to Brazil, South Africa, and other nations with a creole, mixed population. We could try to measure skin tone and factor that out. But we cannot ultimately measure all the myriad cues about racial identity that we react to, especially since we may not even be aware of them. It could even be the case that African or European ancestry predicts height and that taller people are treated better in school, get more nutritional resources at home, and so on. Even though we do not generally think of height as a key dividing line for race, it does not mean that it is not silently associated—at the genotypic level—with the alleles that also differ by race.

Science proceeds by plausibility. C&F are merely arguing here that we cannot be 100% certain. That is true, but also irrelevant. Genetic models keep producing correct predictions, and environmentalists cannot keep responding with “yeah, but you haven’t controlled for possible environmental causes that we don’t even know about” if they want to retain scientific credibility. They must put forward plausible causes that can be tested; otherwise they have, as Rushton and Jensen wrote in 2005, a degenerating research program. Ironically, C&F follow up with:

The near impossibility of a definitive, scientific approach to interrogating genes, race, and IQ stands in contrast to the loose claims of pundits or scholars who assert that there is a genetic explanation for the black-white test score gap. That said, the consideration of genetics in racial analysis is not always pernicious. The ability to control for genotype actually places the effects of social processes, like discrimination, in starker relief. Once you eliminate the claim that there are biological or genetic differences between populations by controlling them away, we can show more clearly the importance of environmental (non-genetic) processes such as structural racism.

Which side’s ‘pundits’ are making ‘loose claims’? We know the answer to this question, so why do C&F pretend otherwise? Let them make their predictions publicly, and then let us examine large datasets that can confirm or disconfirm those predictions. The true test of any scientific enterprise, model, and theory is predictive validity.

Intelligence: special, and yet not so special

Intelligence (general cognitive ability, etc.) is of special significance to humans, but genetically speaking, it is not an unusual trait. The same methods used to infer recent selection on height, BMI, etc. can be applied to intelligence, and such methods are already in use. For instance, one recent paper concluded that height and BMI differences between European populations are partly due to genetic differences (Robinson et al 2015):

Across-nation differences in the mean values for complex traits are common, but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10⁻⁸; BMI, P < 5.95 × 10⁻⁴), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).
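The basic quantity in statements such as “24% of the captured additive genetic variance reflects population differences” is a share of variance lying between groups. The sketch below computes that share from group means, within-group variances, and group sizes. It is a plain variance decomposition for illustration only, with no sampling corrections, and the function name and example numbers are hypothetical; it is not the estimator Robinson et al used, which draws on many independent loci and sibling-based effect size estimates.

```python
def between_group_share(means, within_vars, sizes):
    """Fraction of total variance lying between groups, V_between / (V_between + V_within).

    Population-level decomposition with group weights proportional to `sizes`;
    no sampling corrections are applied.
    """
    total = sum(sizes)
    weights = [s / total for s in sizes]
    grand_mean = sum(w * m for w, m in zip(weights, means))
    v_between = sum(w * (m - grand_mean) ** 2 for w, m in zip(weights, means))
    v_within = sum(w * v for w, v in zip(weights, within_vars))
    return v_between / (v_between + v_within)


# Hypothetical example: two equal-sized groups whose means are one within-group SD apart.
print(between_group_share(means=[0.0, 1.0], within_vars=[1.0, 1.0], sizes=[1, 1]))
```

Applied to genetic values, this gives the between-group share of genetic variance; applied to phenotypes, the between-group share of phenotypic variance.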