Clear Language, Clear Mind

April 11, 2019

The northwest-southeast cline in Europe and the brain

Filed under: Genomics,intelligence / IQ / cognitive ability,Neuroscience — Tags: — Emil O. W. Kirkegaard @ 19:55

A friend sent me this amazing study, which seems to have been overlooked by hereditarians.


Human skull and brain morphology are strongly influenced by genetic factors, and skull size and shape vary worldwide. However, the relationship between specific brain morphology and genetically-determined ancestry is largely unknown.


We used two independent data sets to characterize variation in skull and brain morphology among individuals of European ancestry. The first data set is a historical sample of 1,170 male skulls with 37 shape measurements drawn from 27 European populations. The second data set includes 626 North American individuals of European ancestry participating in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) with magnetic resonance imaging, height and weight, neurological diagnosis, and genome-wide single nucleotide polymorphism (SNP) data.


We found that both skull and brain morphological variation exhibit a population-genetic fingerprint among individuals of European ancestry. This fingerprint shows a Northwest to Southeast gradient, is independent of body size, and involves frontotemporal cortical regions.


Our findings are consistent with prior evidence for gene flow in Europe due to historical population movements and indicate that genetic background should be considered in studies seeking to identify genes involved in human cortical development and neuropsychiatric disease.

Main highlights from the paper text:

We then tested the hypothesis that skulls exhibit clinal variation along geographic axes within Europe. A directional Mantel correlogram shows a monotonic decrease in craniometric similarity with distance in two orthogonal directions, NW-SE and NE-SW (online suppl. fig. S3B), and this result motivated us to search for a geographic axis that can explain a significant fraction of the craniometric variation between populations. Redundancy analysis, a constrained version of principal components analysis (PCA), was used to find a projection that maximizes the variation of the 37 cranial measures under the condition that this projection is a linear combination of longitude and latitude. The first principal component is statistically significant (p = 0.005) and explains 12.8% of total craniometric variation, more than a third of the variation (30.5%) explained by the first component of an unconstrained PCA. Population eigenvalues for the first principal component were interpolated by ordinary kriging and plotted to create an isocline map of Europe (fig. 2a). Cross-validation indicated that predicted population eigenvalues were highly correlated with observed values (r = 0.88). Significantly, this map shows a clear gradient along a NW-SE axis that was not specified a priori and emerged from the redundancy analysis as the direction of maximum cranial variation. Therefore, a subset of cranial measures exhibits clinal variation along this geographic axis.

Cranial morphology reflects geography across Europe. a Pairwise distances between 27 European populations. Craniometric distance is significantly correlated (rM = 0.51, p < 1 × 10⁻⁵) with geographic distance. b Non-metric multi-dimensional scaling ordination of craniometric distances aligned to geographic coordinates of populations. Population symbols identify 4 clusters, and lines form a minimum spanning tree. c Distances between predicted locations based on craniometric ordination and population locations. Average ± SD plotted for 100 bootstrap replications (black) and random permutations (gray). * p < 0.001. d Individual female skulls were identified with correct or nearby populations based on cranial morphometry (solid) significantly better than chance (dotted). e Proportion of female skulls that were correctly classified (black) and misclassified with populations at different distances (gray shades). Sample sizes are listed after population name.

Cranial measures show significant variation along a NW-SE axis within Europe. a Isoclines of interpolated eigenvalues for first spatially constrained component of a redundancy analysis and geographic locations of populations. b Cranial measures plotted in order of their contribution to this map. Negative abscissas correspond to a more NW location. Proportion of variance explained (R²) and nominal p values are indicated. NLB = Nasal breadth; M28 = sagittal occipital arc; GOL = glabello-occipital length; NOL = nasio-occipital length; ASB = biasterionic breadth; BBH = basion-bregma height.

Intracranial and brain volumes and cortical surface area progressively increase with the amount of inferred NW European ancestry (fig. 3b), and these measures are approximately 5% larger in the 10% of individuals with the most NW European ancestry compared to the 10% with the most SE European ancestry. This percentage increase matches the percentage increase in cranial length and breadth observed along the same NW-SE geographic axis in the skull data set (fig. 2b) and cannot be attributed to a correlation with body size since we controlled for height and weight. This correlation involves specific – not global – brain morphology because hippocampal, basal ganglia, ventricular, and cerebellar volumes and average cortical thickness are not associated with NW-SE ancestry.

In this study, we leveraged brain imaging and genome-wide genotyping from 626 European Americans, as well as skull measurements obtained on an independent set of 1,170 individuals of European ancestry, to test the hypothesis that skull and brain morphology, like genetic background, mirror geography within Europe. We found that skull and brain morphology vary continuously across Europe as evidenced by the weak clustering between and large variation within populations and are geospatially structured. In particular, we observed a significant NW-SE gradient in morphology that is independent of body size and involves predominantly frontotemporal cortical areas.

The genetic basis for this geographic trend in skull and brain variation is strengthened by the observation of the trend in North American individuals of European ancestry. The environment – e.g. nutrition and prenatal healthcare – can influence skull morphometry due to developmental plasticity and may be correlated with genetic variation across Europe. In contrast, environmental exposures may vary among European Americans, but this variation is less likely to be geographically structured based on individuals’ European ancestry. Furthermore, Ashkenazi Jewish individuals are geographically dispersed in Europe and yet are genetically quite similar and genetically intermediate between SE European and Middle Eastern populations. This provides further support that the observed NW-SE clinal variation in brain morphology is driven by genetic more than environmental differentiation of these populations.


It is plausible that genes responsible for cortical expansion during human evolution retain a role in brain development and contribute to normal variation in brain morphology within and between modern human populations. Identifying these genes could contribute to our understanding of developmental abnormalities associated with neuropsychiatric diseases such as autism and schizophrenia. In this context, admixture mapping may prove to be a powerful strategy for identifying genomic regions responsible for overt brain morphology differences among individuals of European ancestry. Independent of the use of inferred ancestry for identifying genes, our results indicate that studies seeking to identify genes that influence brain morphology should consider genetic background, as it reflects historical mixing and then isolation of populations.


Frontotemporal cortical regions are most affected by NW European ancestry. Lateral view of the left hemisphere with color map that indicates nominal -log10 (p value) of association between estimated NW-SE ancestry and cortical surface area across the reconstructed cortical surface, while controlling for height, weight, BMI, age, sex, and diagnosis.

Structural brain measures follow a predicted trend in a group of individuals with European ancestry. a The first two principal components of genotypes of ADNI subjects (yellow/small points; color refers to online version only) and individuals from European reference populations (gray crosses) rotated 18° to align with a map of Europe. For each reference population (see online suppl. table S3 for labels), the average (SD) of principal components for all individuals in that population are indicated by disc position (diameter). Geographic origin of each population is indicated by disc shade of gray from NW (black) to SE (light gray) Europe. ADNI subjects are spread out primarily along a NW-SE axis and form two distinct clusters corresponding to NW European and Ashkenazi Jewish ancestry (see also online suppl. fig. S5). b Brain structural measures tested for association with estimated NW-SE ancestry, while controlling for height, weight, BMI, age, sex, and diagnosis. Negative abscissas correspond to a larger proportion of NW ancestry.
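The redundancy analysis in the first quoted paragraph (a PCA restricted to the part of the 37 cranial measures that is a linear function of longitude and latitude) can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the authors' code; the population coordinates, the toy NW-SE signal and the noise level are all made-up assumptions. The approach is: regress every measure on the coordinates by least squares, then take the SVD of the fitted values.

```python
import numpy as np


def redundancy_analysis(Y, X):
    """Redundancy analysis: PCA of the part of Y that is a linear function of X.

    Y: (n_sites, n_responses); X: (n_sites, n_constraints), e.g. longitude and latitude.
    Returns site scores, response loadings, and each constrained axis's share of
    the total variance of Y.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)  # multivariate least-squares fit
    fitted = Xc @ B                               # part of Y explained by X
    U, s, Vt = np.linalg.svd(fitted, full_matrices=False)
    k = X.shape[1]                                # at most k constrained axes
    share = s[:k] ** 2 / (Yc ** 2).sum()
    return U[:, :k] * s[:k], Vt[:k], share


rng = np.random.default_rng(0)
n_pop, n_meas = 27, 37                            # sizes as in the skull data set
lon = rng.uniform(-10.0, 30.0, n_pop)             # made-up coordinates
lat = rng.uniform(35.0, 60.0, n_pop)
X = np.column_stack([lon, lat])
# Toy NW-SE axis: a linear combination of standardized longitude and latitude.
nw_se = (lon - lon.mean()) / lon.std() - (lat - lat.mean()) / lat.std()
Y = np.outer(nw_se, rng.normal(size=n_meas)) + rng.normal(scale=2.0, size=(n_pop, n_meas))

scores, loadings, share = redundancy_analysis(Y, X)
Yc = Y - Y.mean(axis=0)
unconstrained = np.linalg.svd(Yc, compute_uv=False)[0] ** 2 / (Yc ** 2).sum()
print(f"first constrained axis: {share[0]:.1%}; first unconstrained PC: {unconstrained:.1%}")
```

As in the paper's 12.8% vs. 30.5% comparison, the first constrained axis can never explain more variance than the first unconstrained component, because it is the best axis within a restricted set of directions.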

March 25, 2019

Direct estimate of between group heritability with estimated polygenic scores

Filed under: Genomics — Tags: , — Emil O. W. Kirkegaard @ 04:15

A while ago, I had an idea for a direct estimate of the between group heritability from polygenic scores. The idea is based on the old DeFries (1972) paper, which had the following equation:

$$h^2_B = h^2_W \frac{r(1-t)}{t(1-r)}$$

where h²B is the between group heritability, h²W is the within group heritability, t is the intraclass phenotypic correlation, and r is the intraclass genotypic correlation. I have shown through detailed simulations that this equation works for polygenic group differences, i.e. that it produces correct results. I undertook this simulation work mainly because Eric Turkheimer was (again) talking ideologically:

For all the hereditarians’ idle intuitions about differences being part genetic and part environmental, where is the empirical or quantitative theory that describes how this apportioning is supposed to work? There is no such thing as a “group heritability coefficient,” no way to put any meat on the speculative bones about partial genetic determination.
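For concreteness, DeFries's formula, h²B = h²W · r(1 − t) / [t(1 − r)], fits in a small Python function. This is a minimal sketch; the input values are made up for illustration and come from no dataset. It also shows why a shrunken r matters: lowering r, for instance because the genotypic correlation was computed from noisy scores, lowers the estimate.

```python
def defries_between_h2(h2_within: float, t: float, r: float) -> float:
    """Between-group heritability from DeFries (1972).

    h2_within: within-group heritability
    t: intraclass phenotypic correlation (share of phenotypic variance between groups)
    r: intraclass genotypic correlation (share of genotypic variance between groups)
    """
    if not (0.0 < t < 1.0 and 0.0 <= r < 1.0):
        raise ValueError("need 0 < t < 1 and 0 <= r < 1")
    return h2_within * r * (1.0 - t) / (t * (1.0 - r))


# Made-up inputs: when r equals t, the between-group value equals the within-group one.
print(defries_between_h2(0.5, t=0.10, r=0.10))   # 0.5, up to floating-point rounding
# Same inputs with r shrunk to a quarter, as a noisy score might produce:
print(defries_between_h2(0.5, t=0.10, r=0.025))  # about 0.115
```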

However, a limitation of the above equation is that it requires the true polygenic scores, not the noisy estimates we have. It is best to be very clear on this point. We can think of polygenic scores and error according to the classical psychometric measurement model: estimated polygenic score = true polygenic score + error (that is, random error). In this equation, the true polygenic score is the genetic potential for the trait in question. Since in practice we only have estimates, and our estimates are quite noisy, one has to adjust for this when estimating the r value for the equation. Not doing the correction deflates the between group heritability (BGH) estimate, because the estimate increases with r (r is in the numerator). This is the same situation encountered in recent studies of dysgenic selection based on current PGSs, where one corrects for the ‘missing heritability’, i.e. the measurement error in the PGS estimate. See:

Most importantly, POLYEDU is just a fraction of the full genetic component of educational attainment, which we denote by POLYFULL. It is the rate of change of POLYFULL that is of ultimate interest. Under an assumption that the part of POLYFULL that is not captured by POLYEDU behaves in a similar fashion in its impact on reproduction, the rate of change is proportional to the square root of the variance explained (SI Text). Thus, if POLYFULL is assumed to account for 30% of the variance of EDU, then its estimated rate of change, by extrapolation, is −0.010 × (30/3.74)^(1/2) = −0.028 SUs per decade. To test the validity of this method of extrapolation we computed a separate polygenic score for educational attainment, denoted by POLY-U.K.B, which was based on the same GWAS results used to construct POLYEDU, except that the contribution from 111,349 UK Biobank samples was removed (Materials and Methods). When we applied POLY-U.K.B to the Icelandic data, it explained 2.52% of the variance of EDU, and the rate of decline estimated based on its effects on reproduction is −0.0085 SU per decade (Materials and Methods). Hence, with the polygenic score strengthening from POLY-U.K.B to POLYEDU, the estimated rate of decline increased by a factor of (0.0104/0.0085) = 1.22, nearly identical to (3.74/2.52)^(1/2) = 1.22, the square root of the variance explained ratio.
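The extrapolation in the quoted passage is a square-root rescaling by the ratio of variance explained. A few lines of Python reproduce the quoted numbers; the helper name is hypothetical, not something from the paper.

```python
import math


def rescale_by_variance_explained(rate: float, r2_from: float, r2_to: float) -> float:
    """Scale an effect estimated with a score explaining r2_from percent of the
    variance to a score explaining r2_to percent, using the square-root rule."""
    return rate * math.sqrt(r2_to / r2_from)


# POLYEDU explains 3.74%; the full genetic component is assumed to explain 30%.
full_rate = rescale_by_variance_explained(-0.010, 3.74, 30.0)
print(round(full_rate, 3))  # -0.028 SUs per decade

# The validation check in the quote: ratio of rates vs. square root of the R2 ratio.
print(round(0.0104 / 0.0085, 2), round(math.sqrt(3.74 / 2.52), 2))  # 1.22 1.22
```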

In our case above, we need to adjust an intraclass correlation for measurement error. I haven’t found anyone who knew how to do this, but since intraclass correlations are merely ANOVAs in disguise, and ANOVAs are just linear regression models in disguise, perhaps one can find a method in the econometrics literature. Solving this issue requires someone with better mathematical statistics abilities than I possess (suggestions very welcome!).
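One candidate answer, under strong and untested assumptions, is the usual correction for attenuation. If estimated PGS = true PGS + error, where the error has mean zero in every group and the same variance everywhere, the between-group variance is unchanged in expectation while the total variance is inflated by 1/reliability, so the observed intraclass correlation equals the true one times the reliability and can be divided back out. The simulation below checks this with made-up numbers (a 0.5 SD true gap, reliability 0.25) and uses the large-sample between-group variance share (eta squared) in place of a formal ICC estimator. In practice the reliability is unknown; one assumption would be to plug in R²(PGS)/h², by analogy with the variance-explained rescaling quoted above, which this sketch does not validate.

```python
import numpy as np


def between_share(values, groups):
    """Share of total variance lying between group means (eta squared).
    For large groups this approximates the one-way intraclass correlation."""
    grand = values.mean()
    total = ((values - grand) ** 2).sum()
    between = sum(
        (groups == g).sum() * (values[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return between / total


def disattenuate(observed_share, reliability):
    """Undo attenuation from random error that is mean-zero within every group."""
    return observed_share / reliability


rng = np.random.default_rng(1)
n = 50_000                                               # per group, hypothetical
groups = np.repeat([0, 1], n)
true_pgs = rng.normal(0.0, 1.0, 2 * n) + 0.5 * groups    # hypothetical 0.5 SD true gap
reliability = 0.25                                       # assumed share of PGS variance that is signal
err_sd = np.sqrt(true_pgs.var() * (1 / reliability - 1))
est_pgs = true_pgs + rng.normal(0.0, err_sd, 2 * n)      # estimated = true + random error

true_r = between_share(true_pgs, groups)        # theoretical value 0.0625 / 1.0625
observed_r = between_share(est_pgs, groups)     # attenuated by roughly the reliability
corrected_r = disattenuate(observed_r, reliability)
print(f"true {true_r:.3f}  observed {observed_r:.3f}  corrected {corrected_r:.3f}")
```

The assumption doing all the work is that the error is unrelated to group membership; if the PGS is less valid in one group, the error is no longer exchangeable across groups and this simple division does not apply.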

Supposing one solves the measurement error problem above, the second issue is that current PGSs are not equally valid across the groups we are interested in. Though I haven’t simulated this situation (yet?), I’m sure this will bias the BGH estimate. Specifically, lower validity of our estimated PGS in one group will bias the intraclass correlation towards zero. I think the approach here is to utilize an estimated PGS that does not suffer from group predictive bias in the classical test theory sense of equal slopes and intercepts. Currently, one could try Lee et al 2018's putatively causal PGS estimate (see Section 5 in their supplementary materials) and see if it produces unbiased predictions within the groups of interest. Testing this requires large samples, which is why we have not yet done it. The TCP dataset, which we now have access to, will probably allow a test of this idea since it has ~3,000 blacks and ~4,500 whites. Assuming one can find an estimated PGS that shows no predictive bias by group, one can then adjust it for (random) measurement error as discussed above, and finally get a first direct estimate of the BGH for the trait and groups in question. We are working on a series of ideas using the TCP dataset, hopefully several of them to be finished in 2019, so stay tuned! :)
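The equal-slopes-and-intercepts check can be sketched as a single regression with a group term and a PGS × group interaction (the Cleary-style moderated regression); with no predictive bias, both extra coefficients should be near zero. The data are simulated, and the group sizes are made up, loosely echoing the TCP counts; this is an illustration, not the analysis planned for that dataset.

```python
import numpy as np


def predictive_bias_test(y, pgs, group):
    """OLS fit of y ~ 1 + pgs + group + pgs:group.

    Returns coefficients and standard errors in that order. Under no predictive
    bias (equal intercepts and slopes), the last two coefficients are near zero.
    """
    X = np.column_stack([np.ones_like(pgs), pgs, group, pgs * group])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


rng = np.random.default_rng(2)
n0, n1 = 4500, 3000                                  # made-up sizes, loosely echoing TCP
group = np.concatenate([np.zeros(n0), np.ones(n1)])
pgs = rng.normal(0.0, 1.0, n0 + n1) - 0.5 * group    # made-up PGS gap between groups

# Unbiased case: one regression line fits both groups.
y_fair = 0.3 * pgs + rng.normal(0.0, 1.0, n0 + n1)
# Biased case: the score is only half as valid in group 1.
y_biased = np.where(group == 1, 0.15, 0.3) * pgs + rng.normal(0.0, 1.0, n0 + n1)

for label, y in [("fair", y_fair), ("biased", y_biased)]:
    beta, se = predictive_bias_test(y, pgs, group)
    print(label, "intercept diff %.3f (SE %.3f), slope diff %.3f (SE %.3f)"
          % (beta[2], se[2], beta[3], se[3]))
```

One caveat: random error in the PGS can by itself produce an intercept difference when the groups differ in true genetic mean, because the within-group regression on a noisy score is flattened toward each group's own mean. Here y is generated directly from the score, so that complication is absent.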

Conversely, if one is willing to grant that the genetic architecture of human polygenic traits is the same across race groups (supported by simulations done using 1000 Genomes data; Zanetti and Weale 2018), one can use this fact to identify the causal variants by assigning a higher probability of causality to variants that show no predictive bias across ancestry groups. This is the general reasoning behind multi-ethnic GWASs, which have been receiving increasing attention recently, e.g. Klarin et al 2018, Roselli et al 2018, Li et al 2018 (and many others). In fact, Africans are the best population to use because they have the weakest LD between variants (they did not go through the out-of-Africa bottlenecks that reduced variation and increased LD), making it easier to spot the true causal variant among the correlated variants.

March 9, 2019

Of cats and dogs and men

Filed under: Genomics — Tags: , , , — Emil O. W. Kirkegaard @ 20:44

The proportion of genetic variation lying between populations/races of a species is a nice single summary statistic for how large between-population phenotypic differences one might expect. In the case of humans, this value (Fst, the fixation index) is about 15%. This finding is due to Lewontin (1972) and is now mindlessly repeated (Lewontin’s fallacy) as some kind of slam dunk argument against race realism. However, it’s somewhat of a double-edged sword, because the choice of this particular metric allows one to look up comparisons with other species that have obviously agreed-upon subspecies/races/populations (Woodley 2010). Of particular interest for humans are dogs and cats, so how large are their genetic differences?

With several hundred genetic diseases and an advantageous genome structure, dogs are ideal for mapping genes that cause disease. Here we report the development of a genotyping array with 27,000 SNPs and show that genome-wide association mapping of mendelian traits in dog breeds can be achieved with only 20 dogs. Specifically, we map two traits with mendelian inheritance: the major white spotting (S) locus and the hair ridge in Rhodesian ridgebacks. For both traits, we map the loci to discrete regions of <1 Mb. Fine-mapping of the S locus in two breeds refines the localization to a region of 100 kb contained within the pigmentation-related gene MITF. Complete sequencing of the white and solid haplotypes identifies candidate regulatory mutations in the melanocyte-specific promoter of MITF. Our results show that genome-wide association mapping within dog breeds, followed by fine-mapping across multiple breeds, will be highly efficient and generally applicable to trait mapping, providing insights into canine and human health.

The authors write both that the average Fst is about twice that of humans and that it is 2-3 times as large, so I don’t know which it is, as they don’t report the actual value. I copied their table and found the mean Fst to be .26, which is not even twice the usual human value of .15. I also recall reading another paper finding a value in the upper .20s, but I can’t find it right now.
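Fst here is the share of genetic variance lying between populations. As a minimal sketch, Nei's heterozygosity-based version for biallelic loci (one of several estimators; Weir and Cockerham's is the usual choice for sample data) and the averaging of a pairwise table look like this. The allele frequencies and the 3-breed matrix are made up, not taken from the paper's table. When averaging a pairwise table, only the off-diagonal cells should be used, since the zero diagonal would drag the mean down.

```python
import numpy as np


def fst_nei(freqs):
    """Nei-style F_ST = (H_T - H_S) / H_T for biallelic loci.

    freqs: array of shape (n_populations, n_loci) with allele frequencies.
    Populations are weighted equally; loci are combined as a ratio of sums.
    """
    p = np.asarray(freqs, dtype=float)
    h_s = (2 * p * (1 - p)).mean(axis=0)   # mean within-population heterozygosity per locus
    p_bar = p.mean(axis=0)
    h_t = 2 * p_bar * (1 - p_bar)          # heterozygosity of the pooled population per locus
    return (h_t - h_s).sum() / h_t.sum()


def mean_pairwise(matrix):
    """Mean of the off-diagonal (upper-triangle) cells of a symmetric pairwise matrix."""
    m = np.asarray(matrix, dtype=float)
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j].mean()


print(round(fst_nei([[0.2], [0.8]]), 4))  # 0.36: two populations, one locus
print(fst_nei([[0.5], [0.5]]))            # 0.0: identical frequencies

made_up = np.array([[0.00, 0.20, 0.30],
                    [0.20, 0.00, 0.28],
                    [0.30, 0.28, 0.00]])
avg = mean_pairwise(made_up)
print(round(avg, 2), round(avg / 0.15, 2))  # 0.26 1.73, i.e. less than twice 0.15
```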

Genetic variation in cat breeds was assessed utilizing a panel of short tandem repeat (STR) loci genotyped in 38 cat breeds and 284 single-nucleotide polymorphisms (SNPs) genotyped in 24 breeds. Population structure in cat breeds generally reflects their recent ancestry and absence of strong breed barriers between some breeds. There is a wide range in the robustness of population definition, from breeds demonstrating high definition to breeds with as little as a third of their genetic variation partitioning into a single population. Utilizing the STRUCTURE algorithm, there was no clear demarcation of the number of population subdivisions; 16 breeds could not be resolved into independent populations, the consequence of outcrossing in established breeds to recently developed breeds with common ancestry. These 16 breeds were divided into 6 populations. Ninety-six percent of cats in a sample set of 1040 were correctly assigned to their classified breed or breed group/population. Average breed STR heterozygosities ranged from moderate (0.53; Havana, Korat) to high (0.85; Norwegian Forest Cat, Manx). Most of the variation in cat breeds was observed within a breed population (83.7%), versus 16.3% of the variation observed between populations. The hierarchical relationships of cat breeds is poorly defined as demonstrated by phylogenetic trees generated from both STR and SNP data, though phylogeographic grouping of breeds derived completely or in part from Southeast Asian ancestors was apparent.

Their results:

An analysis of molecular variance (AMOVA) demonstrated moderate Fst values among cat breeds, from a high of 0.53 to a low of 0.0 (Table 3). The average pairwise Fst value observed in cat breeds is 0.17, with most of the variation in cat breeds observed within a breed population (83.7%), versus 16.3% of the variation observed between populations (Table 3).

Genetic differences -> phenotypic differences?

A nice thing about these papers is that they published the entire Fst matrix, so one can reuse this with any dataset of cat phenotypic values to look for congruence between genetic distance and trait distances. Such a study has also been done for humans with intelligence as the trait of interest.

The study analyzes whether genetic differences (“genetic distances”) help to explain cross-national IQ differences being controlled for environmental factors. Genetic distances are an indicator of evolutionary history and of difference or similarity between populations. Controlled for environmental determinants the relationship between genetic distances and intelligence differences can be interpreted as an effect of genetic factors. Genetic distances were calculated in Y-chromosomal haplogroup frequencies between N = 101 national populations based on k = 27 genetic studies. Correlations and path-analyses with differences in geographical coordinates and the Human Development Index (HDI) as background and control factors revealed a positive impact of genetic distances on cross-national IQ-differences (r = .37, β = .22 to .40). The strongest impact was found for HDI (r = .67, β = .58). Longitudinal differences have no positive effect (r = −.09, β = −.13 to −.26), latitudinal differences have a positive one (r = .37, β = .07 to .21). The positive relationship to latitudinal differences underpins an evolutionary explanation. Chances and limits of this approach (e.g. no intelligence coding genes detected) understanding national differences in cognitive ability and the role of environmental factors are discussed.

This Fst difference-difference approach was also used recently in another large scale dog study:

Variation across dog breeds presents a unique opportunity for investigating the evolution and biological basis of complex behavioral traits. We integrated behavioral data from more than 17,000 dogs from 101 breeds with breed-averaged genotypic data (N = 5,697 dogs) from over 100,000 loci in the dog genome. Across 14 traits, we found that breed differences in behavior are highly heritable, and that clustering of breeds based on behavior accurately recapitulates genetic relationships. We identify 131 single nucleotide polymorphisms associated with breed differences in behavior, which are found in genes that are highly expressed in the brain and enriched for neurobiological functions and developmental processes. Our results provide insight into the heritability and genetic architecture of complex behavioral traits, and suggest that dogs provide a powerful model for these questions.


Heritability estimates, breed-level behavioral data, and clustering based on behavioral and genetic data. A) Heritability (h²) estimates (proportion of variance attributable to genetic factors) for 14 behavioral traits. Genotypic variation accounts for five times more variance in analyses across vs. within breeds (within-breed estimates compiled from Ilska et al., 2017). Points for Hayward et al. and Parker et al. reflect the results of analyses with independent genetic datasets. Error bars reflect the 95% confidence intervals. B) Heatmap of breed-average behavioral scores plotted alongside a cladogram of breed relatedness from Parker et al., 2017. C) Breed dendrograms from clustering based on behavioral (left panel) and genetic (right panel) similarity. Colors correspond to clades from Parker et al., 2017.
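The distance-congruence idea (a genetic distance matrix against a trait distance matrix) is conventionally tested with a Mantel test, the same statistic reported as rM in the skull paper quoted in the first post. A minimal permutation version in NumPy, run on made-up populations placed along a line, could look like this; it is not the code used in any of the studies above.

```python
import numpy as np


def mantel(d1, d2, n_perm=999, seed=0):
    """Permutation Mantel test.

    Returns the Pearson r between the upper triangles of two distance matrices and
    a one-sided p-value from jointly permuting the rows and columns of d2.
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices_from(d1, k=1)
    x = d1[iu]
    r_obs = np.corrcoef(x, d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    n = d1.shape[0]
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        r_perm = np.corrcoef(x, d2[np.ix_(perm, perm)][iu])[0, 1]
        hits += r_perm >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)


rng = np.random.default_rng(42)
position = np.linspace(0.0, 1.0, 15)                      # made-up populations on a line
genetic = np.abs(np.subtract.outer(position, position))   # toy genetic distances
trait = position + rng.normal(0.0, 0.1, position.size)    # made-up trait tracking ancestry
trait_dist = np.abs(np.subtract.outer(trait, trait))
r, p = mantel(genetic, trait_dist)
print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

Rows and columns are permuted together because the pairwise cells are not independent observations; shuffling cells individually would overstate significance.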

Why the relevance? Studies of animals provide a reasonable prior for what to expect about human race differences. The general idea here is that if 1) dog/cat breeds show obvious and accepted heritable subspecies differences, and 2) these species have between-group genetic variation comparable to that of humans, then 3) it’s likely that human phenotypic gaps also reflect genetic ones. Sensing the danger, perhaps, some SJWs have decided to bite the bullet and accept dog blank slatism!

One major issue with this comparison is that dogs, cats etc. have been under intense human-directed selection in recorded and unrecorded history, whereas humans are thought to have evolved perhaps mostly in response to environmental variation. How well the analogy works thus depends on ongoing research into whether humans can be said to be domesticated too, by themselves.

Sign of differential selection

Another idea is to compute correlations between phenotypic distances and genetic distances for many traits. The traits that produce the largest positive correlations are those that have been under the strongest differential selection. This method depends on some assumptions, such as that one is operating with a single dimension of the trait, which is obvious for e.g. height. Genetic drift will also produce larger gaps between more distant populations, but not as large ones, and not consistent ones, since drift is random.
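One reading of this screen: for each trait, correlate its between-population distance matrix with the genetic distance matrix, then sort the traits by that correlation. A minimal sketch with two made-up traits (one clinal, one unrelated to ancestry) follows; in real use each coefficient would also need a permutation p-value, and the single-dimension caveat above still applies.

```python
import numpy as np


def upper_triangle(m):
    """Off-diagonal upper-triangle cells of a square matrix, as a vector."""
    i, j = np.triu_indices_from(m, k=1)
    return m[i, j]


def rank_traits(genetic_dist, trait_values):
    """Correlate each trait's between-population distance matrix with the genetic
    distance matrix and return (trait, r) pairs sorted from largest r down."""
    g = upper_triangle(np.asarray(genetic_dist, dtype=float))
    results = []
    for name, values in trait_values.items():
        values = np.asarray(values, dtype=float)
        d = np.abs(np.subtract.outer(values, values))
        results.append((name, np.corrcoef(g, upper_triangle(d))[0, 1]))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


rng = np.random.default_rng(3)
position = np.linspace(0.0, 1.0, 12)                     # made-up populations on a cline
genetic = np.abs(np.subtract.outer(position, position))  # toy genetic distances
traits = {
    "clinal_trait": 2.0 * position + rng.normal(0.0, 0.1, position.size),
    "unrelated_trait": rng.normal(0.0, 1.0, position.size),
}
for name, r in rank_traits(genetic, traits):
    print(f"{name}: r = {r:.2f}")
```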

A related idea is to get Fst values for GWAS hits across traits, and see which traits show larger genetic gaps than one would expect by chance. Davide Piffer has been using this sort of approach (Piffer 2018), though not phenome-wide.
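A minimal sketch of this kind of test: compare the mean Fst of a trait's GWAS hits with the means of many random SNP sets of the same size. The Fst values and the hit set below are simulated. Piffer's actual procedure may differ, and a real analysis would at least match the random SNPs to the hits on allele frequency, which this toy omits.

```python
import numpy as np


def fst_enrichment(fst, hits, n_draws=2000, seed=0):
    """Empirical test of whether GWAS hits have higher mean F_ST than random SNP sets.

    fst: per-SNP F_ST values; hits: indices of the trait's GWAS hits.
    Returns (mean F_ST of the hits, one-sided empirical p-value).
    """
    fst = np.asarray(fst, dtype=float)
    hits = np.asarray(hits)
    observed = fst[hits].mean()
    rng = np.random.default_rng(seed)
    null_means = np.array([
        fst[rng.choice(fst.size, size=hits.size, replace=False)].mean()
        for _ in range(n_draws)
    ])
    p = (np.sum(null_means >= observed) + 1) / (n_draws + 1)
    return observed, p


rng = np.random.default_rng(7)
fst = rng.beta(1.5, 10.0, size=20_000)          # made-up genome-wide F_ST distribution
hits = np.arange(200)                           # pretend the first 200 SNPs are GWAS hits
fst[hits] = np.clip(fst[hits] + 0.1, 0.0, 1.0)  # give the hits elevated differentiation
observed, p = fst_enrichment(fst, hits)
print(f"mean F_ST of hits = {observed:.3f}, p = {p:.4f}")
```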

February 24, 2019

Paige-Harden, Turkheimer and the psychometric left

Kathryn Paige Harden is a professor of psychology who belongs to the Turkheimerian ‘left psychometrics’ school. I’ve discussed the odd behavior of Eric Turkheimer before, but since then I found a rather amazing essay: The Search for a Psychometric Left (1997; the journal seems to no longer exist). It’s definitely worth reading in its entirety, but here I quote the conclusion:

I do not wish to commit the very sin I am deploring. The radical scientific left is — obviously — entitled to its views, and in this increasingly biogenetic era their implacable opposition is often a very necessary tonic. I expect to continue to stand with them, albeit slightly to their right, against the smug unanimity of the Wall Street Journal scientific establishment, and in more urgent rejection of the deeply disturbing racism that has lately taken up a beachhead at the rightmost extrema of scientific respectability. But I also expect to continue to be allied with those who continue to investigate the complexities of human ability and its transmission between generations. It is time that the psychometric establishment had a left wing (Who can doubt that it has a right?) that is willing to share enough of its assumptions to engage it in meaningful debate.

A psychometric Left would recognize that human ability, individual differences in human ability, measures of human ability, and genetic influences on human ability are all real but profoundly complex, too complex for the imposition of biogenetic or political schemata. It would assert that the most important difference between the races is racism, with its origins in the horrific institution of slavery only a very few generations ago. Opposition to determinism, reductionism and racism, in their extreme or moderate fonts, need not depend on blanket rejection of undeniable if easily misinterpreted facts like heritability, or useful if easily misapplied tools like factor analysis. Indeed it had better not, because if it does the eventual victory of the psychometric right is assured.

I think this needs no comment.

Back to Paige-Harden. She likes to tweet, and yesterday she posted this (archived in case she deletes it):

So, let’s translate this from left speak to plain speak: she wants some journalist to tell the public on her behalf that opening up science (i.e. removing gatekeepers) results in more science being done that isn’t favorable to left-wing politics. She is naturally very concerned about this because, according to followers of the Turkheimerian school, it’s all about not extending a hand to the evil racists of the right psychometrics. What if people were no longer told that All Real Scientists think that “the most important difference between the races is racism”, to give one example?

As for psychometrics (they mean differential psychology, not people who obsess about measurement models) being particularly right-wing, this is of course a very unlikely claim considering the rather crazy left-wing tilt of psychology itself: Lambert 2018 finds a ratio of 17 to 1 of registered Democrats to Republicans among psychology faculty in the US. Indeed, survey evidence disproves this claim, just as it does for evolutionary psychology (Buss and von Hippel 2018): differential psychology, here represented by people who publish in Intelligence, is somewhat left-leaning, though less so than other academic areas. Because one’s own position biases one’s perception, mainstream differential psychology seems particularly right-wing to someone with far left-wing views, while in actuality it is center-left. This perceiver bias effect forms the basis of the usual centrist take: everybody to my right is a Nazi/everybody to my left is a Stalinist.

I think the approach advocated above, putting political goals ahead of scientific ones, is completely in contradiction to the purpose of science. Arthur Jensen said it well:

But the most frequently heard objection to further research into human genetics, particularly research into the genetics of behavioral characteristics, is that the knowledge gained might be misused. I agree. Knowledge also, however, makes possible greater freedom of choice. It is a necessary condition for human freedom in the fullest sense. I therefore completely reject the idea that we should cease to discover, to invent, and to know (in the scientific meaning of that term) merely because what we find could be misunderstood, misused, or put to evil and inhumane ends. This can be done with almost any invention, discovery, or addition to knowledge. Would anyone argue that the first caveman who discovered how to make a fire with flint stones should have been prevented from making fire, or from letting others know of his discovery, on the grounds that it could be misused by arsonists? Of course not. Instead, we make a law against arson and punish those who are caught violating the law. The real ethical issue, I believe, is not concerned with whether we should or should not strive for a greater scientific understanding of our universe and of ourselves. For a scientist, it seems to me, this is axiomatic.

February 19, 2019

A partial test of DUF1220 for population differences in intelligence?

Filed under: Genomics,intelligence / IQ / cognitive ability,Population genetics — Tags: — Emil O. W. Kirkegaard @ 07:51

You might have heard of the DUF1220 hypothesis; it goes something like this:

  • DUF1220 is a copy number variant poorly tagged by arrays, and thus would not be captured well by typical GWASs for education/IQ.
  • Comparative data across species suggest strong selection for higher DUF1220 copy number alongside increased intelligence/brain size.
  • There’s some data showing a relationship between IQ in humans and DUF1220 copy number.
  • Thus, the hypothesis is plausible, and if DUF1220 copy number is causal, hereditarians would expect a good chance that it shows population differences, as the regular SNP-based polygenic scores do (Piffer 2018).

The between-species plot is certainly impressive-looking; the background papers are:

From Keeney et al

The individual human data:

  • Davis, J. M., Searles, V. B., Anderson, N., Keeney, J., Raznahan, A., Horwood, L. J., … & Sikela, J. M. (2015). DUF1220 copy number is linearly associated with increased cognitive function as measured by total IQ and mathematical aptitude scores. Human genetics, 134(1), 67-75.

Sample: 59 individuals (41 males and 18 females) whose ages ranged from 6 to 22.

DUF1220 protein domains exhibit the greatest human lineage-specific copy number expansion of any protein-coding sequence in the genome, and variation in DUF1220 copy number has been linked to both brain size in humans and brain evolution among primates. Given these findings, we examined associations between DUF1220 subtypes CON1 and CON2 and cognitive aptitude. We identified a linear association between CON2 copy number and cognitive function in two independent populations of European descent. In North American males, an increase in CON2 copy number corresponded with an increase in WISC IQ (R² = 0.13, p = 0.02), which may be driven by males aged 6–11 (R² = 0.42, p = 0.003). We utilized ddPCR in a subset as a confirmatory measurement. This group had 26–33 copies of CON2 with a mean of 29, and each copy increase of CON2 was associated with a 3.3-point increase in WISC IQ (R² = 0.22, p = 0.045). In individuals from New Zealand, an increase in CON2 copy number was associated with an increase in math aptitude ability (R² = 0.10, p = 0.018). These were not confounded by brain size. To our knowledge, this is the first study to report a replicated association between copy number of a gene coding sequence and cognitive aptitude. Remarkably, dosage variations involving DUF1220 sequences have now been linked to human brain expansion, autism severity and cognitive aptitude, suggesting that such processes may be genetically and mechanistically inter-related. The findings presented here warrant expanded investigations in larger, well-characterized cohorts.

So, not at all convincing. It might be true, but these results look heavily p-hacked.

What about copy number differences between human populations in the 1000 Genomes data? Well, it turns out someone did a study:


DUF1220 protein domains found primarily in Neuroblastoma BreakPoint Family (NBPF) genes show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy and no method can distinguish between domains within a clade.

Here we describe a novel method for measuring copies of DUF1220 domains and the NBPF genes in which they are found from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups) (2) genome wide DUF1220 clade copies and (3) gene copy number for DUF1220-encoding genes.

To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the role that these sequences play in human variation and disease.

So, they developed a method to count DUF1220 copies from sequence data (which is difficult because it’s an unusual kind of variation), and their tool is publicly available. Then they tested it on the public 1000 Genomes data, but:

Approximately 25 individuals were randomly chosen from each of the CEU, YRI, CHB, JPT, MXL, CLM, PUR, ASW, LWK, CHS, TSI, IBS, FIN, and GBR populations for a total of 324 individuals.

For some reason, they used only ~25 persons from each population, even though more are freely available. 🤔 Someone should redo this with all the data. Their counts are shown only in the supplements:

The figure doesn’t seem to show any consistent pattern. One might appear if the populations were merged into continental groups, but the values are not provided in a table, so we can’t easily do this.
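If the per-population values were available, merging them into continental groups would be a small computation. The sketch below assumes a hypothetical input of (sample size, mean copy number) for each 1000 Genomes population code listed in the paper, and pools these into the project's super-population labels (EUR, AFR, EAS, AMR) with sample-size weighting. The function name and the numbers in the usage line are made up for illustration; they are not the paper's counts.

```python
# 1000 Genomes population codes used in the paper, mapped to the
# project's super-population labels
SUPER_POP = {
    "CEU": "EUR", "TSI": "EUR", "IBS": "EUR", "FIN": "EUR", "GBR": "EUR",
    "YRI": "AFR", "LWK": "AFR", "ASW": "AFR",
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS",
    "MXL": "AMR", "CLM": "AMR", "PUR": "AMR",
}


def pool_by_super_population(pop_stats):
    """Pool per-population means into super-population means.

    pop_stats maps a population code to (sample size, mean copy number);
    each super-population mean is weighted by sample size.
    """
    totals = {}
    for pop, (n, pop_mean) in pop_stats.items():
        sp = SUPER_POP[pop]
        n_sum, weighted_sum = totals.get(sp, (0, 0.0))
        totals[sp] = (n_sum + n, weighted_sum + n * pop_mean)
    return {sp: weighted_sum / n_sum for sp, (n_sum, weighted_sum) in totals.items()}


# Made-up illustrative input, NOT values from the paper
example = {"CEU": (25, 10.0), "GBR": (25, 12.0), "YRI": (20, 15.0)}
print(pool_by_super_population(example))  # {'EUR': 11.0, 'AFR': 15.0}
```

Weighting by sample size matters only if the populations have unequal n; with exactly 25 per population it reduces to a simple average.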


  • Maybe the samples were too small to detect the count differences. [Conspiracy hat] Perhaps this was done on purpose to hide them.
  • Maybe DUF1220 is just a fluke. After all, the human IQ study looks p-hacked, and it’s a candidate gene, so the prior is low. What about the evidence of selection? Well, humans have about 20k genes, so some gene is bound to show a pattern like that.
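To put a rough number on the first possibility: with about 25 people per population, a two-group comparison can only reliably detect fairly large differences. The sketch below computes the smallest standardized difference (Cohen's d) detectable under conventional settings; the two-sided alpha of 0.05 and 80% power are my assumptions, not the paper's, and the normal approximation slightly understates the exact t-test threshold.

```python
from statistics import NormalDist


def min_detectable_d(n_per_group, alpha=0.05, power=0.80):
    """Smallest Cohen's d detectable by a two-sided, two-sample comparison
    with equal group sizes, using the normal approximation to the t-test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return (z_alpha + z_power) * (2 / n_per_group) ** 0.5


print(f"{min_detectable_d(25):.2f}")  # 0.79
```

So two populations of ~25 each would need to differ by roughly 0.8 SD in mean copy number before a difference is likely to be detected.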

Things to do:

  • Calculate DUF1220 counts in the full 1000 Genomes dataset and in other public sequence data such as the Simons panel. Their code for analyzing the 1000 Genomes (1kg) data is public too.

Also, noting my prior conditional prediction:

Added 9th March 2019

The blogger Half-Assed Science has also previously written about this DUF1220 study. They also ran some regressions, which found little of interest, but did not reanalyze the raw data to obtain counts for more individuals. To really examine the issue, one would need a large dataset with diverse people, sequencing data, and IQ/SES. The Simons Genome Diversity Project has a lot of data but no phenotypes, so one would have to assume population means. A better option is to apply for access to the 100,000 Genomes Project, which I guess has some phenotypes; its website is not very clear. Another option is UK10K, which has 10k sequenced genomes, but I can’t see what phenotypes they have either. Perhaps a better approach is to redo the ancient genomes paper (Woodley et al 2017) with DUF1220 counts added. Do the counts increase during human evolution? Do they correlate with absolute latitude? It’s an easy enough research question, just waiting for someone with moderate technical skills and some courage.

February 5, 2019

FAQ for Dunkel et al 2019 Ashkenazim polygenic score for intelligence

Filed under: Genomics,intelligence / IQ / cognitive ability — Tags: , — Emil O. W. Kirkegaard @ 08:29

So predictably, our study on Jewish IQ has elicited some rather harsh (and in some cases moralistic) criticisms from people on Twitter. The paper itself is actually doing fairly well, and already has 4100 reads on ResearchGate. Since many claims have been made about the study, it seemed sensible to provide an FAQ of sorts.

Wasn’t your study underpowered, n=53?

Actually, power calculations indicate that our study was well-powered. This is easy to see from the generally small p values in the study. Underpowered studies tend to produce many borderline p values, i.e., values close to 0.05. Our main p values were:

  • Correlations among key variables in Table 1, all p<.001.
  • The ANOVA tests for Jewish-Gentile gaps in Table 2 have p<.001, .001, .001, .05. Only the weaker similarities test had suspicious results, but we didn’t use this result for much.
  • The mediation analysis in Table 3 does not report p values, but it’s obvious from the range of the 95% CIs that these must be quite small too.
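For a rough sense of what a sample of 53 can detect, here is a sketch of the standard Fisher-z approximation to the power of a two-sided test of a correlation. The correlation values plugged in below are illustrative assumptions, not estimates from the paper.

```python
import math
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate power of a two-sided test of H0: rho = 0 via Fisher's z.

    The tiny probability of rejecting in the wrong direction is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Under the alternative, atanh(r_hat) is approx. normal with SD 1/sqrt(n - 3)
    noncentrality = math.atanh(r) * math.sqrt(n - 3)
    return NormalDist().cdf(noncentrality - z_crit)


for r in (0.2, 0.3, 0.4):
    print(r, round(correlation_power(r, 53), 2))
```

With n = 53, power is roughly 0.85 for a true correlation of 0.4 but only about 0.30 for 0.2, so whether such a sample is well-powered depends on the size of the effects being tested.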

But really, n=53? That doesn’t sound representative

We had to rely on already existing data to do this analysis. The Wisconsin Longitudinal Study (WLS) was the first dataset we were able to find with a minimally useful Jewish sample, along with PGSs for IQ/education, and a decent IQ test.

As for representativeness, we carried out an analysis to see whether modern Wisconsin Jews earn more or less than Jews in other states, which would indicate whether this group was representative. The results indicated that Wisconsin was quite average.

But Jesus, n=53, will this even replicate?

While the paper was in review, we found another dataset, Health and Retirement Study (HRS), which also had the necessary variables. Unfortunately, the IQ test is worse. However, our main findings replicated (see the codebook). This dataset has 153 Jews with the right variables.

But, but, n=53 samples! No one else does that!

Roughly 50% of social science studies have samples smaller than 53, as seen in, e.g., this review, this one, or this one.

But, seriously, no one does that in human genetics!

You mean, aside from these high impact studies by well known authors?

Why didn’t you use population stratification controls?

The polygenic scores were already constructed with standard population stratification controls. See the original paper:

But why didn’t you use more population stratification controls? Look at this random study which did that

Controlling for population stratification should not be done indiscriminately. The standard method is to control for a number of principal components (PCs) of the genetic data, which represent the first k genetic dimensions in the data. If the selection signal is strong, its variance will be captured partly or wholly by these PCs and thus removed by the control. An interesting example of this can be seen in a recent genomic study of the Russian silver fox experiment.

Controlling for population stratification assumes that the variance associated with it is of no interest, e.g., that it represents irrelevant environmental causation. This is the genomics equivalent of the sociologist’s fallacy, an indirect way of assuming the blank slate view.

Only pseudoscientists don’t use population stratification controls!

Like the ones who just published this large study looking at Ashkenazim genetic susceptibility to Crohn’s disease compared to Gentiles?

Given that enriched genetic variants in NOD2 and LRRK2 contribute to differences in CD risk in AJ population, we next asked whether unequivocally established common variant associations contribute to differences in CD genetic risk. We performed polygenic risk score (PRS) analysis using reported effect size estimates from 124 CD alleles including those reported in a previously published study[36] and four variants in IL23R from a recent fine-mapping study[37], and excluding variants in NOD2 and LRRK2. We observed an elevated PRS for AJ compared to non-Jewish controls (0.97 s.d. higher, p<10⁻¹⁶; Fig 3A; number of non-AJ controls = 35,007; number of AJ controls = 454), and as expected when performing the PRS analysis using OR calculated from non-Jewish subset of iCHIP data the signal still remains (p<10⁻¹⁶, S7 Fig). We observed a similar trend for the CD samples (0.54 s.d. higher; p<10⁻¹⁶; Fig 3B; number of non-AJ CD cases = 20,652; number of AJ CD cases = 1,938). We demonstrate this is not a systematic property of common risk alleles in AJ by running the same comparison using instead the comparable set of established schizophrenia associated alleles from the Psychiatric Genomics Consortium[38].
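A polygenic risk score of the kind described in the quote is conventionally a sum of risk-allele dosages weighted by log odds ratios. Below is a minimal sketch under that assumption; the quoted paper's exact pipeline (allele alignment, scaling, QC) is not described here, and reading "s.d. higher" as a mean difference in units of the reference (control) group's SD is also my assumption. The toy genotypes and odds ratios are made up.

```python
import math
from statistics import mean, stdev


def polygenic_risk_score(dosages, odds_ratios):
    """Sum of risk-allele dosages (0, 1 or 2) weighted by log odds ratios.

    Alleles with OR < 1 get negative weights, i.e. they lower the score.
    """
    return sum(d * math.log(o) for d, o in zip(dosages, odds_ratios))


def sd_units_difference(scores, reference_scores):
    """Mean difference expressed in standard deviations of the reference group."""
    return (mean(scores) - mean(reference_scores)) / stdev(reference_scores)


# Hypothetical toy data: three alleles with made-up odds ratios
odds_ratios = [1.2, 1.5, 0.9]
group_a = [[2, 1, 0], [1, 1, 1], [2, 2, 0]]
controls = [[0, 1, 1], [1, 0, 2], [0, 0, 1]]
a_scores = [polygenic_risk_score(g, odds_ratios) for g in group_a]
control_scores = [polygenic_risk_score(g, odds_ratios) for g in controls]
print(round(sd_units_difference(a_scores, control_scores), 2))
```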

But how do we know the polygenic scores are valid in Ashkenazim?

While we are not aware of any direct validation study, it is known from plant, animal, human and simulation research that polygenic score validity declines as a function of Fst between the training population and the target population (Scutari et al 2016). The Fst between Ashkenazim and Central and Northern Europeans is tiny, about 0.06 to 0.08 according to Bray et al 2010, so we don’t expect any serious decrease in validity.
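Since the argument turns on Fst, here is a minimal sketch of one common estimator, Hudson's Fst, computed from two populations' allele frequencies and combined across loci as a ratio of averages. It treats the frequencies as known (no finite-sample correction), and the source does not say which estimator Bray et al 2010 used, so this illustrates the quantity rather than reproducing their figures. The example frequencies are hypothetical.

```python
def hudson_fst(p1, p2):
    """Hudson's Fst for two populations from allele frequencies at many loci.

    Per locus: numerator (p1 - p2)^2 and denominator p1(1 - p2) + p2(1 - p1),
    i.e. 1 - H_within / H_between. Loci are combined as a ratio of sums
    (ratio of averages). Frequencies are treated as known population values;
    loci fixed for the same allele in both populations should be dropped,
    since they contribute 0/0.
    """
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    return num / den


# Hypothetical frequencies at two loci
print(round(hudson_fst([0.6, 0.3], [0.4, 0.3]), 3))  # 0.043
```

Combining loci as a ratio of sums, rather than averaging per-locus ratios, keeps low-diversity loci from dominating the estimate.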

The plot looks quite similar to the silver fox plot above.

But how about subtle biases in the polygenic score from Lee et al 2018?

Some recent studies have argued that even very subtle genetic stratification in training sets can create quite spurious results. We are aware of these. Indeed, I have been tweeting these studies as they come out: Kerminen et al 2018, Berg et al 2018, and Sohail et al 2018. It is currently not known how large an Fst needs to be before subtle biases in training GWASs cause serious issues, or whether one can construct polygenic scores differently to avoid this (e.g., by including only top hits). A good candidate for future progress is using PGSs constructed from putative causal SNPs, though these have not yet found widespread use (see supplements in Lee et al 2018). We look forward to future studies examining these issues in more detail.

Why didn’t you do X analysis with the genomic data?

We did not have access to the genomic data for this study. We relied exclusively on precomputed variables; see their website.

I’ve found some other imperfections with your study / the above didn’t convince me

The purpose of our paper was not to provide irrefutable evidence of a genetic basis for Ashkenazim intelligence, but to take a first stab at it, in the hope that more research will follow so that we can get to the bottom of it. Our critics seem to be operating under a strange view of science in which one would only ever publish something that establishes a conclusion with 100% certainty and to which no one can find a way to object, but that’s not how science works. Scientific progress is a series of steps toward resolving a question. Sometimes there’s a big step, sometimes a small one. In this case, a study was needed to break the ice and move the Overton window, so that others may follow. Our research group is working on multiple new projects related to Ashkenazim intelligence, some of which were inspired by the feedback we have received.

This is not some post hoc defence we made up, it’s right in the paper:

Given the above limitations, we consider the present results to be tentative and in need of replication with better PGS data and larger samples of the Jewish population. Our findings nonetheless yield an initial positive indication of the polygenic selection model and critically indicate that in the case of the Jewish versus non-Jewish Caucasian comparison, the same source of genetic variance that gives rise to of individual differences in GCA also contributes substantially to the group difference.

Where can I learn more?

Immediately following the publication of our study, someone wrote a good summary of the science of Jewish intelligence on Medium. This 2007 summary by Charles Murray is also good. For a book-length treatment, Richard Lynn’s 2011 book is worth checking out.

Bonus: 2019 April 1st, external replication published

Davide Piffer has a new study out:

It has this figure:

Note that Piffer did not have individual level IQ data, he relied on known/estimated group IQ means, and used the PGS means from his data sources. As such, this is not a direct replication of our study, but close. Not mentioned in the table above, but the Ashkenazi sample here is based on data from ~5000 people (150 full genomes, rest exomes).

January 23, 2019

Updated 23andme results (2019-01-22)

Filed under: Population genetics — Tags: — Emil O. W. Kirkegaard @ 04:03

Previous results.

23andme has updated their ancestry estimates, so I’m reposting mine for people who are wondering.

The change from the previous results is that I’m now slightly more European: 99.8% vs. 99.7%. I’m no longer assigned any North African ancestry, but I’m now 0.1% Amerindian (probably a false positive), and slightly less Ashkenazi: 2.8% (down from 2.9%).

They also report regional results now, and they figured out that I come from Jutland, especially central Jutland. This is correct insofar as my mother’s family is entirely from there, with a family tree going back some hundreds of years. My father grew up in Copenhagen, but his origins are obscure because he was adopted. His father seems to have been a vagabond of sorts (or at least moved around a lot) and had a rare French middle name. His mother is, I think, a distant descendant of the Danish-Jewish family Kampmann/Engmann. There is a book somewhere, available from the library, that catalogs this.

My data are from an older version of the 23andme array (version 1, 950k SNPs), while the rest of my nuclear family was genotyped on a newer version (650k SNPs). Thus, the array version might affect the results if imputation/training is not done properly. The results look like this:

Or in tabular format:

| Ancestry | Father | Mother | Expected offspring | Emil | Brother | Emil deviation | Brother deviation | Mean deviation |
|---|---|---|---|---|---|---|---|---|
| Scandinavian | 43.5 | 46.5 | 45.00 | 47.4 | 48.6 | 2.40 | 3.60 | 3.00 |
| French & German | 20.4 | 20.4 | 20.40 | 19.0 | 14.0 | -1.40 | -6.40 | -3.90 |
| British & Irish | 11.8 | 18.5 | 15.15 | 10.5 | 7.5 | -4.65 | -7.65 | -6.15 |
| Ashkenazi | 5.1 | 0.0 | 2.55 | 2.8 | 0.8 | 0.25 | -1.75 | -0.75 |
| Eastern European | 0.8 | 1.4 | 1.10 | 1.4 | 0.4 | 0.30 | -0.70 | -0.20 |
| Broadly Northwestern Euro | 17.6 | 13.0 | 15.30 | 18.4 | 27.8 | 3.10 | 12.50 | 7.80 |
| Broadly Southern Euro | 0.0 | 0.0 | 0.00 | 0.0 | 0.4 | 0.00 | 0.40 | 0.20 |
| Broadly Euro | 0.7 | 0.3 | 0.50 | 0.4 | 0.5 | -0.10 | 0.00 | -0.05 |
| East Asian & Amerindian | 0.0 | 0.0 | 0.00 | 0.1 | 0.0 | 0.10 | 0.00 | 0.05 |
| Unassigned | 0.1 | 0.0 | 0.05 | 0.1 | 0.0 | 0.05 | -0.05 | 0.00 |
| Sum | 100.0 | 100.1 | 100.05 | 100.1 | 100.0 | 0.05 | -0.05 | 0.00 |


I’ve also added the expected child ancestry (the average of the two parents), as well as each child’s deviation from it. Since deviations from the parental average should be small and random in sign, large mean deviations are very unlikely and indicate model bias. Thus, we see that the offspring’s French and German ancestry shrinks along with British and Irish, while Broadly Northwestern Euro increases. Their model therefore has some kind of issue with this ancestry and assigns it differently depending on which generation people are from. This probably indicates some kind of age or generation confounding in their training sample.
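The expected-offspring and deviation columns are simple arithmetic, sketched below. The expected child value is taken as the mean of the two parents' percentages (assuming each parent transmits half their genome), and each child's deviation is measured from that mean. The percentages are the ones reported by 23andme above; the function name is mine.

```python
# Ancestry percentages as reported: (father, mother, Emil, brother)
ANCESTRY = {
    "Scandinavian": (43.5, 46.5, 47.4, 48.6),
    "French & German": (20.4, 20.4, 19.0, 14.0),
    "British & Irish": (11.8, 18.5, 10.5, 7.5),
    "Ashkenazi": (5.1, 0.0, 2.8, 0.8),
    "Eastern European": (0.8, 1.4, 1.4, 0.4),
    "Broadly Northwestern Euro": (17.6, 13.0, 18.4, 27.8),
    "Broadly Southern Euro": (0.0, 0.0, 0.0, 0.4),
    "Broadly Euro": (0.7, 0.3, 0.4, 0.5),
    "East Asian & Amerindian": (0.0, 0.0, 0.1, 0.0),
    "Unassigned": (0.1, 0.0, 0.1, 0.0),
}


def offspring_deviations(father, mother, *children):
    """Return the expected child value (parental mean), each child's
    deviation from it, and the mean deviation across children."""
    expected = (father + mother) / 2
    devs = [child - expected for child in children]
    return expected, devs, sum(devs) / len(devs)


for name, (f, m, emil, brother) in ANCESTRY.items():
    expected, (d_emil, d_brother), mean_dev = offspring_deviations(f, m, emil, brother)
    print(f"{name:<26} {expected:6.2f} {d_emil:6.2f} {d_brother:6.2f} {mean_dev:6.2f}")
```

With only two children the mean deviation is noisy, but a consistent sign across both siblings for the same category is what points to a model issue rather than random inheritance.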

January 6, 2019

What you can’t say: genetic group difference edition

Paul Graham’s 2004 essay What you can’t say had a big influence on me and remains my favorite essay. In it, he argued, essentially, that popular morality behaves like fashion, i.e., that it varies over time for reasons unrelated to evidence. What is at one time considered a grievous moral evil is later considered no big deal, and vice versa. Graham did not mention anything related to race in his essay, but he did mention that physicists are much smarter than humanities people. Apparently, this was disputed hotly enough that he wrote a follow-up. Though academics at times deny it, intelligence is the most important trait. We have literally named our species after it (Homo sapiens, “wise man”).

I’ve already discussed Eric Turkheimer before, but his continued, scientifically nonsensical incoherence on this topic calls for another post.

Behold, I present to you the party-approved list of traits for studying between-group differences:

OK – studying these is science as usual, carry on

NOT OK – you are racist and a pseudoscientist

  • Intelligence

January 3, 2019

Environmentalists like admixture analysis too (until they don’t)

Filed under: Genetics / behavioral genetics,intelligence / IQ / cognitive ability — Tags: , — Emil O. W. Kirkegaard @ 17:10

See the previous post for quotes from the medical genetics and physical anthropology literature on admixture analysis and its causal interpretation.

There are quite a few older admixture studies that examined relationships between racial ancestry and intelligence. Most of these used quite crude methods, such as interviewer judgement. Some used a better method, namely objectively measured skin tone/color/reflectance. A few used something approaching modern technology, namely blood groups. Curiously, one only hears about those with the smallest samples, probably because these found no relationships and are thus perceived to provide evidence (sometimes seen as conclusive) for non-genetic models. Thus, we can find environmentalists defending this kind of analysis, since they like the results.

Templeton provides a particularly good quote, my bolding below.

There is a way of testing if differences in phenotypic means between two populations have a genetic basis. The test was developed by Mendel and requires that the populations be crossed and that the hybrids and their descendants be raised in a “common garden” (i.e., a common environment). Despite the extreme interest in the genetic basis of between population differences in intelligence, only a handful of studies have even attempted to use this standard research design of genetics. These few studies (Green, 1972; Loehlin, Vandenberg, & Osborne, 1973; Scarr, Pakstis, Katz, & Barker, 1977) have several common features. First, they take advantage of the strong tendency of humans to interbreed when brought into physical proximity. For example, in the Americas, geographically differentiated human populations of European and sub-Saharan African origin were brought together and began to hybridize. However, most matings still occurred within populations. Given this assortative mating, the genetic impact of hybridization is extremely sensitive to the cultural environment. In North America, the hybrids were culturally classified as blacks, and hence most subsequent matings involving the hybrids were into the population of African origin. Therefore, a broad range of variation in degree of European and African ancestry can be found among North American individuals who are all culturally classified as being members of the same “race”, in this case blacks (a “common garden” cultural classification). In Latin America, different cultures have different ways of classifying hybrids, but in general a number of alternative categories are available and social class is a more powerful determinant of mating than is physical appearance (e.g., skin color). As a consequence, individuals in Latin America can be culturally classified into a single social entity that genetically represents a broad range of variation in amount of European and African ancestry.
Thus, these studies use a “common garden” design in a cultural sense that nevertheless includes hybrid individuals and their descendants. Second, these studies quantify the degree of European and African ancestry in a population of individuals that is culturally classified as being a single “race.” Because the original geographically disparate populations do show genetic differences due to isolation by distance, the degree of European and African ancestry of a specific individual can be estimated using blood group and molecular genetic markers. Finally, the shared premise of these studies is that if a trait that differentiates European and sub-Saharan Africans has a genetic basis, it should show variation in the hybrid population that correlates with the degree of African ancestry. This is indeed the case for many morphological traits, such as skin color (Scarr et al., 1977). However, there is no significant correlation with the degree of African ancestry for any cognitive test result, either within the cultural environment of being “black” (Loehlin et al., 1973; Scarr et al.,1977) or in the cultural environment of being “white” (Green, 1972). Hence, even though these populations differ in their average test scores, there is no evidence for any genetic differentiation among these populations at genetic loci that influence these IQ test scores.

So if such admixture patterns were found, Templeton would have to agree that this constitutes evidence for “genetic differentiation among these populations at genetic loci that influence these IQ test scores”.

Nisbett spends an entire appendix trying to argue for a 0% genetic contribution to the US black-white gap. After ignoring most of the evidence on the topic (even in the categories he does include), he specifically advocates using admixture studies in his discussion (again my bolding):

Racial Ancestry and IQ
All of the research reported above is most consistent with the proposition that the genetic contribution to the black/white difference is nil, but the evidence is not terribly probative one way or the other because it is indirect. The only direct evidence on the question of genetics concerns the racial ancestry of a given individual. The genes in the U.S. “black” population are about 20 percent European (Parra et al., 1998; Parra, Kittles, and Shriver, 2004). Some blacks have completely African ancestry, many have at least some European ancestry, and some—about 10 percent—have mostly European ancestry. Does it make a difference how African versus European a black person is? A hereditarian model demands that blacks with more European genes have higher IQs. Herrnstein and Murray (1994) and Rushton and Jensen (2005), as it happens, scarcely deal with this direct evidence.

[discussion of various studies]

So what do we have in the way of studies that examine the effects of racial ancestry—by far the most direct way to assess the contribution of genes versus the environment to the black/white IQ gap? We have one flawed adoption study with results consistent with the hypothesis that the gap is substantially genetic in origin, and we have two less-flawed adoption studies, one of which indicates slightly superior African genes and one of which suggests no genetic difference. We have dozens of studies looking at racial ancestry as indicated by skin color and “negroidness” of features that provide scant support for the genetic theory. In addition, three different studies of Europeanness of blood groups, using two different designs, indicate no support for the genetic theory. One study of illegitimate children in Germany demonstrates no superiority for children of white fathers as compared to children of black fathers. One study shows that exceptionally bright “black” children have no more European ancestry than the best-available estimate for the population as a whole. And one study indicates that it is more advantageous for a mixed-race child to be raised by a family having a white mother than by a family having a black mother. All of these racial ancestry studies are subject to alternative interpretations. Most of these alternatives boil down to the possibility that there was self-selection for IQ in black-white unions. If whites who mated with blacks had much lower IQs than whites in general, their European genes would convey little IQ advantage. Similarly, if blacks who mated with whites had much higher IQs than blacks in general, their African genes would not have been a drawback.
Yet the extent to which white genes contributing to mixed-race unions would have to be inferior to white genes in general, or black genes would have to be superior to black genes in general, would have to be very extreme to result in no IQ difference at all between children of purely African heritage and those of partially European origin. Moreover, self-selection by IQ was probably not very great during the slave era, when most black-white unions probably took place. It is unlikely, for example, that the white males who mated with black females had on average a lower IQ than other white males. Indeed, if such unions mostly involved white male slave-owners and black female slaves, which seems likely to be the case (Parra et al., 1998), and if economic status was slightly positively related to IQ (as it is now), these whites probably had IQs slightly above average. The black female partners were not likely chosen on the basis of IQ, as opposed to comeliness. Similarly, it scarcely seems likely that either black or white soldiers in World War II were selecting their German mates on the basis of IQ. Several studies, moreover, are immune to the self-selection hypothesis. In particular, the study involving black and white children raised in an institutional setting, and the study involving black children adopted into either black or white middle-class homes, could not be explained by self-selection for IQ in mating. In short, though one would never know it by reading Herrnstein and Murray’s book (1994) or Rushton and Jensen’s article (2005), the great mass of evidence on racial ancestry—the only direct evidence we have—points toward no contribution at all of genetics to the black/white gap.


December 27, 2018

Admixture analysis and genetic causation: some quotes from the literature

Filed under: Genomics,Metascience — Tags: , , — Emil O. W. Kirkegaard @ 09:42

A common observation about bias in scientific peer review is that reviewers don’t usually say openly that they are applying double standards. Instead, they silently raise their standards. If their bias against some finding is strong, the evidential burden rises toward infinity, ensuring that nothing is rigorous enough to pass review. A clear case in point was our attempt to get an admixture analysis of race and intelligence published. Although admixture analysis is commonplace in medical genetics and in scientific anthropology, the interpretation of such findings somehow becomes totally different when the trait changes.

For instance, one really hostile reviewer recently wrote:

2. Second, I made the point that the fundamental logic of the study is weak. The authors simply state that they are following accepted protocol in genetic epidemiology. For one thing, the authors provide no basis to believe that their study follows accepted protocol in genetic epidemiology – their approach is certainly not widely accepted as a means of demonstrating a causal influence of continental ancestry on cognitive/behavioural traits; for another, saying ‘this is the way things are done’ does not rebut my point that the logic of the study is flawed.


4. but I do think that any such enquiries have to be held to a very high standard of evidence, given the potential social harms of misguided findings. The evidence presented here is not of a high standard at all.

I have discussed the topic of causality and admixture results at length (e.g., in my long PING write-up and elsewhere), and the topic is also addressed in this version of the paper (as expected, the reviewer never commented on that). However, we can easily disprove his claim that admixture findings are not generally taken to indicate causality. We thank this particular reviewer for openly admitting his double standards.

The collection of quotes below is obviously not exhaustive. Indeed, I compiled most of these in about two hours. One can find hundreds of such quotes if willing to spend a day or two. To find such quotes, one can use search queries like this one for African Americans.

Medical genetics

African Americans and health outcomes

In the Atherosclerosis Risk in Communities (ARIC) Study, African Americans are twice as likely as whites to develop incident type 2 diabetes—a disparity which persists even after extensive adjustment for socioeconomic status (SES) and behavioral risk factors [4]. This persistent disparity suggests that genetic factors may contribute to ethnic differences in susceptibility to type 2 diabetes.


Given the observed ethnic/racial disparities in diabetes prevalence, we hypothesized that some diabetes susceptibility alleles are present at higher frequency in African Americans than in European Americans, resulting in association between genetic ancestry and diabetes risk that is independent of its association with other non-genetic risk factors for type 2 diabetes. Thus we sought 1) to establish the association of genetic ancestry with diabetes and related quantitative traits in African Americans, after accounting for the non-genetic risk factors, and 2) to identify diabetes susceptibility loci by conducting a genome-wide admixture mapping scan.


In summary, in community-based populations with more than 7,000 African Americans, we found that genetic ancestry is significant associated with type 2 diabetes above and beyond the effects of markers of SES, and we detected several suggestive loci that may harbor genetic variants modulating diabetes risk. These results suggest that in African Americans, genetic ancestry has a significant effect on the risk of type 2 diabetes that are independent of the contribution of SES, but that no single locus with a major effect explains a large portion of the observed disparity in diabetes risk between African Americans and European Americans. In addition, they suggest that genetic measured African ancestry contributes to the risk of type 2 diabetes via both genetic and non-genetic pathways. The effect of ancestry on any individual locus in the genome is likely to be modest, but in aggregate, differences in ancestry may contribute substantially to the observed ethnic disparity in risk of type 2 diabetes.

This study is particularly noteworthy in that the authors explicitly present self-identified race/ethnicity (SIRE) gaps that remain after extensive (sociologist’s fallacy style) controls as evidence of genetic causation. They then hypothesize an association between genetically measured ancestry and outcome risk, which they find. This is basically the same reasoning Jensen used from 1969 onward.

We have demonstrated that genetic ancestry may serve as a biomarker for identifying smokers who would benefit from targeted counseling regarding smoking cessation [41], [42]. One important implication of our findings is that there may be rare genetic variants relevant to smoking associated lung function decline that are population-specific and which co-vary with genetic ancestry [43]. While we cannot rule out that some of these associations may be in part due to environmental factors which co-vary with ancestry, these results highlight the scientific advantages of studying racially mixed populations. Future analyses should include admixture mapping to identify genomic regions associated with rate of lung function decline.

In summary, a consistent association of African ancestry with asthma risk was observed in a large case-control sample of self-reported African American subjects. Although confounding effects attributable to other relevant risk factors cannot be ruled out, we replicate previous findings and support the notion that ethnic disparities in asthma incidence are affected, in part, by genetic determinants. Frequency differences for risk alleles across populations and/or differential gene-environmental interactions may lead to differential disease susceptibility.

Due to this heterogeneity, genetic admixture analysis offers a unique opportunity for studying the role of genetic factors within a single admixed population, independent of social factors and comorbidities. Ancestry informative markers (AIMs) are genetic loci with large allele frequency differences between populations that can be used to estimate biogeographical ancestry at both the population and the individual level. Ancestry estimates at both the subgroup and individual level can be directly instructive regarding the genetics of phenotypes that differ qualitatively or in frequency between populations (Shriver et al., 2003). Specifically, an association between genetic ancestry and a disease phenotype within an admixed group such as AAs may indicate genetic factors underlying differential expression among racial groups (Peralta et al., 2010).
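To make the idea of estimating individual ancestry from AIMs concrete, here is a minimal Python sketch of the binomial admixture likelihood that programs like ADMIXTURE are built on. An individual with proportion a of ancestry from population A is modeled as carrying each allele with frequency a·p_A + (1 − a)·p_B, where p_A and p_B are that allele's frequencies in the two source populations. Unlike ADMIXTURE, this sketch assumes the source-population frequencies are already known, that there are only two source populations, and that markers are independent. The function name `estimate_ancestry`, the grid search, and the simulated frequencies are illustrative assumptions, not taken from any of the quoted studies.

```python
import numpy as np

def estimate_ancestry(genotypes, freq_a, freq_b, grid_size=1001):
    """Grid-search maximum-likelihood estimate of one individual's proportion of
    ancestry from source population A (hypothetical helper).

    genotypes: allele counts (0, 1 or 2) at each marker
    freq_a, freq_b: allele frequencies at each marker in populations A and B
    """
    grid = np.linspace(0.0, 1.0, grid_size)
    # Individual allele frequency at each marker for every candidate proportion a:
    # f = a * p_A + (1 - a) * p_B, shape (grid_size, n_markers)
    f = np.outer(grid, freq_a) + np.outer(1.0 - grid, freq_b)
    f = np.clip(f, 1e-9, 1.0 - 1e-9)  # avoid log(0)
    # Binomial log-likelihood (up to a constant), summed over independent markers
    loglik = (genotypes * np.log(f) + (2 - genotypes) * np.log(1.0 - f)).sum(axis=1)
    return grid[np.argmax(loglik)]

# Simulated AIMs with large frequency differences (all values hypothetical)
rng = np.random.default_rng(42)
n_markers = 500
freq_a = rng.uniform(0.70, 0.95, n_markers)
freq_b = rng.uniform(0.05, 0.30, n_markers)
true_prop = 0.3
genotypes = rng.binomial(2, true_prop * freq_a + (1 - true_prop) * freq_b)

est = estimate_ancestry(genotypes, freq_a, freq_b)
print(f"estimated proportion from population A: {est:.3f}")
```

With 500 markers that differ strongly in frequency between the two populations, the estimate should land close to the simulated value of 0.3. A marker's information about a scales with the squared frequency difference (p_A − p_B)², which is why AIMs are chosen for large differences.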

A greater proportion of African genetic ancestry is independently associated with higher fasting glucose (FG) levels in a non-diabetic community-based cohort, even accounting for other ancestry proportions, obesity and SES. The results suggest that differences between African-Americans and whites in type 2 diabetes risk may include genetically mediated differences in glucose […]

The mechanisms that underlie differences in sleep characteristics between European Americans (EA) and African Americans (AA) are not fully known. Although social and psychological processes that differ by race are possible mediators, the substantial heritability of sleep characteristics also suggests genetic underpinnings of race differences. We hypothesized that racial differences in sleep phenotypes would show an association with objectively measured individual genetic ancestry in AAs.


Ancestry-phenotype association tests, which quantify associations between measured genetic ancestry and a phenotype in an admixed population, like AAs, can be used to test the extent to which the genetic characteristics underlying race may be responsible for observed population-level differences.39,40 In the context of sleep, ancestry-phenotype association tests assume that multiple genetic variants, each with small effects on sleep, may have different allele frequencies in different continental populations that contributed to the admixed population. Because individuals from the admixed population inherit varying proportions of their genome from different ancestral populations, one expects contribution from any one ancestral population to show a wide range of variation (theoretically, spread between 0–100%). Any association between ancestry and a phenotype in an admixed group, then, indicates that multiple variants across the genome that have been inherited from one particular ancestral population are related to variation in the phenotype.39–45 In this manner, objectively measured genetic ancestry enables us to test the uniquely genetic facet of “race” parsed from the cultural, behavioral, and psychosocial aspects that may be responsible for the observed phenotypic differences.
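In practice, an ancestry-phenotype association test of this kind is usually a regression of the phenotype on each individual's estimated ancestry proportion plus covariates. Below is a minimal Python sketch assuming a linear model, ancestry proportions already estimated (e.g. with STRUCTURE or ADMIXTURE), and classical OLS standard errors. The helper name `ancestry_association`, the covariates, and the simulated data are hypothetical and not from the quoted paper.

```python
import numpy as np

def ancestry_association(phenotype, ancestry, covariates):
    """Hypothetical helper: OLS of a phenotype on ancestry proportion plus covariates.

    Returns the ancestry coefficient (per unit of ancestry, i.e. going from 0 to 1),
    its standard error, and its t statistic.
    """
    n = len(phenotype)
    # Design matrix: intercept, ancestry proportion, then the covariate columns
    X = np.column_stack([np.ones(n), ancestry, covariates])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid @ resid) / dof              # residual variance
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # classical OLS covariance matrix
    se = np.sqrt(cov_beta[1, 1])
    return beta[1], se, beta[1] / se

# Simulated admixed sample (all values hypothetical)
rng = np.random.default_rng(0)
n = 1000
ancestry = rng.uniform(0.0, 1.0, n)             # ancestry proportion, 0 to 1
covariates = np.column_stack([
    rng.integers(0, 2, n),                      # sex
    rng.normal(50, 10, n),                      # age
])
phenotype = (2.0 * ancestry + 1.0 * covariates[:, 0]
             + 0.05 * covariates[:, 1] + rng.normal(0, 1, n))

beta, se, t = ancestry_association(phenotype, ancestry, covariates)
print(f"ancestry effect = {beta:.2f} (SE {se:.2f}, t = {t:.1f})")
```

The ancestry coefficient is estimated conditional on the covariates in the model. Anything that co-varies with ancestry but is left out of the model also ends up in that coefficient, which is why, as several of the quoted authors note, confounding cannot be fully ruled out.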


By utilizing the genetic variability attributable to continental admixture, we show for the first time that visually scored percent SWS and NREM EEG delta are associated with %AF in AAs. Even after adjusting for several demographic, socioeconomic and clinical covariates, %AF explained between 9% and 11% of the variance in SWS in AAs. These results show that AAs have inherited multiple alleles (either few alleles of large effect sizes or several alleles of moderate to low effect sizes) from their African ancestors that may predispose them to lower percent SWS. This association between measured genetic ancestry and SWS clearly establishes a partial genetic basis underlying the observed racial differences in this dimension of sleep.

The authors even write out the Jensen logic in the abstract.

The role of genetic predisposition in this disparity is supported by two admixture mapping studies of AAs which demonstrated that a greater proportion of European ancestry was inversely associated with fibroids in AA women.

Latin Americans and health

Some recent studies, which also used AIMs, demonstrated/suggested that genomic Amerindian ancestry may be protective against hypertension in women from the United States [10], protective against metabolic syndrome in the population of Costa Rica [11] and protective against Alzheimer’s disease in the Brazilian population [12]. Furthermore, recent studies in the Brazilian population showed that Amerindian individuals had less arterial stiffness and hypertension [13–14]. These studies suggest that the lower risk of the diseases studied in individuals with Amerindian ancestry may be due to the existence of protective genetic factors associated with this ancestry.

Significant questions remain unanswered regarding the genetic versus environmental contributions to racial/ethnic differences in sleep and circadian rhythms. We addressed this question by investigating the association between diurnal preference, using the Morningness-Eveningness questionnaire (MEQ), and genetic ancestry within the Baependi Heart Study cohort, a highly admixed Brazilian population based in a rural town. Analysis was performed using measures of ancestry, using the Admixture program, and MEQ from 1,453 individuals. We found an association between the degree of Amerindian (but not European or African) ancestry and morningness, equating to 0.16 units for each additional percent of Amerindian ancestry, after adjustment for age, sex, education, and residential zone. To our knowledge, this is the first published report identifying an association between genetic ancestry and MEQ, and above all, the first one based on ancestral contributions within individuals living in the same community. This previously unknown ancestral dimension of diurnal preference suggests a stratification between racial/ethnic groups in an as yet unknown number of genetic polymorphisms.
The authors essentially cover the entire reasoning in their abstract.


Pygmy height

Note the title “genetic determination”!

Considering a subset of 213 individuals for which DNA was available, we were able to formally compare the individual variation in height with the neutral genetic variation among individuals from the different Pygmy and Non-Pygmy populations.

Controlling for the binary categorization of individuals as Pygmies or Non-Pygmies, as well as for population substructure, we found strongly significant positive correlations between Pygmy individuals’ stature and their levels of admixture with the Non-Pygmy gene-pool estimated using the clustering software STRUCTURE. This result suggests that the major difference in average stature observed between Central African Pygmy and Non-Pygmy populations is likely determined by complex genetic factors.

In this context, Genome Wide Association studies and Admixture Mapping methods will likely reveal the genetic loci involved in the determination of the differences of average height found in existing African Pygmy and Non-Pygmy populations. This will further help us to better understand the determination and evolution of height variation among human populations.

We observed extensive and significant genetic and phenotypic differentiation (Figure 1, Figure 2, Figure S1) and varying levels of admixture among the Pygmy and Bantu populations. Average levels of Bantu ancestry, as determined by STRUCTURE (K = 2), in the three Western Pygmy populations were 27% (Bakola), 35% (Baka), and 49% (Bedzan) with individual values ranging from 16–73%. Average levels of Pygmy ancestry in the three Bantu populations were <1% (Lemande), 2% (Tikar), and 7% (Ngumba), with individual values ranging from 0–39%. We also observed a highly significant correlation between ancestry and height (p = 5.047 × 10⁻¹⁸) after correcting for the effect of sex (full model r² = 0.7411, r² for sex = 0.4247; r² for ancestry = 0.3164). In addition, the effect of ancestry remains significant in a model that also includes Pygmy-Bantu ethnicity as a covariate (p = 3.8 × 10⁻⁵). These results are consistent with Becker et al. [21] and indicate a strong genetic influence on height. Similar findings were also observed using Pygmy samples only (p for ancestry = 0.000216; full model r² = 0.5066; r² for sex = 0.3744; r² for ancestry = 0.1322) and the independent set of genome-wide microsatellite markers described in Tishkoff et al. [9] (data not shown).
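The r² figures in this excerpt appear to be additive: 0.4247 + 0.3164 = 0.7411. That suggests the "r² for ancestry" is the increment in explained variance when ancestry is added to a sex-only model, although the excerpt does not say so explicitly. Here is a minimal Python sketch of that incremental-r² calculation on simulated data. The helper names `r_squared` and `incremental_r2` and the effect sizes are hypothetical.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on the columns of X, with an intercept added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def incremental_r2(y, base, added):
    """Return (r2 of base model, r2 of base + added, increment due to added)."""
    r2_base = r_squared(y, base)
    r2_full = r_squared(y, np.column_stack([base, added]))
    return r2_base, r2_full, r2_full - r2_base

# Simulated data (hypothetical effect sizes): height depends on sex and ancestry
rng = np.random.default_rng(1)
n = 500
sex = rng.integers(0, 2, n).astype(float)   # 0 = female, 1 = male
ancestry = rng.uniform(0.1, 0.8, n)         # admixture proportion
height = 145 + 8 * sex + 20 * ancestry + rng.normal(0, 4, n)

r2_sex, r2_full, r2_ancestry = incremental_r2(height, sex, ancestry)
print(f"r2 sex = {r2_sex:.3f}, full = {r2_full:.3f}, added by ancestry = {r2_ancestry:.3f}")
```

Sex and ancestry are simulated independently here, so the order of entry barely matters. With correlated predictors, the increment credited to a variable depends on which one enters the model first.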

We used the results from ADMIXTURE to estimate individual ancestry proportion (K = 5 for estimating pygmy ancestry, and K = 8 for Asian ancestry) and its correlation with adult height for 43 men and 27 women from the different pygmy groups of the Philippines (Aeta, Agta, and Batak) and for the nonpygmy groups (Tagbanua, Zambales, Casiguran). Because K = 5 separates negritos and Asians, we used individual “negrito” ancestry proportion to correlate with their adult height. This procedure allows us to estimate the effect of genetic contribution on adult height.

As expected, mean stature estimates for the Batwa (66 males, 152.9 cm; 103 females, 145.7 cm) were lower than those for the Bakiga (20 males, 165.4 cm; 41 females, 155.1 cm; Fig. 2B). Batwa stature is significantly positively correlated with the proportion of Bakiga admixture: for males, females, and for all samples combined after regressing out the sex effect (Fig. 2C–E), confirming a genetic basis for the African pygmy phenotype (6, 12).


We can draw four primary conclusions from our analyses. (i) The African pygmy phenotype has a genetic basis, rather than a solely environmental one, based on the positive correlation between stature and Bakiga admixture for Batwa individuals raised in Batwa communities (Fig. 2C–E). These results confirm those obtained from other African rainforest hunter-gatherer populations by Becker et al. (12) and Jarvis et al. (6) and are consistent with individual case observations from Cavalli-Sforza (4).

Although environmental variation is an important factor influencing adult height, such influences are considered insufficient to account fully for observed population differences. Some African populations are considerably taller than others, for example, despite experiencing poorer nutrition and elevated levels of pathogen exposure (Deaton, 2007), suggesting that such differences may have a genetic basis. To date, very few studies have addressed this issue. Notable exceptions are studies investigating the difference in height observed between the Baka pygmies of Cameroon and taller neighbouring non-Pygmy populations (Becker et al., 2011; Jarvis et al., 2012). Both of these studies showed that Pygmy individuals who were genetically more similar to non-Pygmy individuals (i.e. higher levels of genetic admixture) were taller. Most recently, Perry et al. (2014) have shown that the pygmy phenotype likely arose several times independently due to positive natural selection for short stature. Additional evidence for genetic factors underlying population differences in height comes from a Korean population (Cho et al., 2009).[…]

Substantial levels of non-Pygmy genetic admixture have been observed across Central African Pygmy populations [24–27, 29–31], correlating positively with adult standing height [32–34]. The general genetic difference between Pygmies and non-Pygmies together with the correlation of genetic admixture and standing height suggests that adult body size differences among Central African Pygmies and neighboring non-Pygmies are attributable in large part to genetic factors, arguing against a view that diminutive Central African pygmy body size is the consequence solely of phenotypic plasticity in a challenging nutritional and parasitic environment [8].


Our findings accord with prior observations [59, 63] that while Pygmy body size is generally proportionally reduced relative to non-Pygmies, their leg lengths are significantly shorter relative to their trunk length. Importantly, our results provide further support for an appreciable genetic component to the determination of body size differences between Pygmies and non-Pygmies, as implied by the correlations observed between the different measures and inferred levels of non-Pygmy admixture that replicate those reported previously for adult standing height [32–34].

Note that the authors changed their wording a bit in the published version. Maybe they too encountered some funny reviewers!

Amerindian-descent physical appearance

A number of studies look at admixture in Amerindian populations, relating both macro-race/continental ancestry to phenotypes as well as sub-Amerindian clusters.

Furthermore, studies of regional human genome diversity, and its bearing on phenotypic variation, have so far been strongly biased towards European-derived populations17. The study of populations with non-European ancestry is essential if we are to obtain a more complete picture of human diversity. Latin America represents an advantageous setting in which to examine regional genetic variation and its bearing on human phenotypic diversity18, considering that the extensive admixture resulted in a marked genetic and phenotypic heterogeneity2,3,19. Relative to disease phenotypes, the genetics of physical appearance can be viewed as a model setting with distinct advantages for analyzing patterns of genetic and phenotypic variation. Many physical features are relatively simple to evaluate, show substantial geographic diversity and are highly heritable. We have previously shown that variation at a range of physical features correlates with continental ancestry in Latin Americans19 and have identified genetic variants with specific effects for a number of features20,21,22.


We infer the timings of these genetic contributions and relate them to historically-attested migrations, for example providing compelling new evidence of widespread ancestry from undocumented migrants during the colonial era. We further show how differences in Native and European sub-continental ancestry components are associated with variation in physical appearance traits in Latin Americans, highlighting the impact of regional genetic variation on human phenotypic diversity.
