Clear Language, Clear Mind

February 19, 2019

A partial test of DUF1220 for population differences in intelligence?

Filed under: Genomics, intelligence / IQ / cognitive ability, Population genetics — Emil O. W. Kirkegaard @ 07:51

You might have heard of the DUF1220 hypothesis. It goes something like this:

  • DUF1220 is a copy number variant poorly tagged by arrays, and thus would not be captured well by typical GWASs for education/IQ.
  • Comparative species data suggests strong selection for DUF1220 with increased intelligence/brain size.
  • There’s some data showing a relationship between IQ in humans and DUF1220 copy number.
  • Thus, the hypothesis is plausible, and hereditarians would expect that, if the variant is causal, it should show population differences as the regular SNP-based polygenic scores do (Piffer 2018).

The between species plot is surely impressive looking, the background papers are:

From Keeney et al

The individual human data:

  • Davis, J. M., Searles, V. B., Anderson, N., Keeney, J., Raznahan, A., Horwood, L. J., … & Sikela, J. M. (2015). DUF1220 copy number is linearly associated with increased cognitive function as measured by total IQ and mathematical aptitude scores. Human Genetics, 134(1), 67–75.

Sample: 59 individuals (41 males and 18 females) whose ages ranged from 6 to 22.

DUF1220 protein domains exhibit the greatest human lineage-specific copy number expansion of any protein-coding sequence in the genome, and variation in DUF1220 copy number has been linked to both brain size in humans and brain evolution among primates. Given these findings, we examined associations between DUF1220 subtypes CON1 and CON2 and cognitive aptitude. We identified a linear association between CON2 copy number and cognitive function in two independent populations of European descent. In North American males, an increase in CON2 copy number corresponded with an increase in WISC IQ (R2 = 0.13, p = 0.02), which may be driven by males aged 6–11 (R2 = 0.42, p = 0.003). We utilized ddPCR in a subset as a confirmatory measurement. This group had 26–33 copies of CON2 with a mean of 29, and each copy increase of CON2 was associated with a 3.3-point increase in WISC IQ (R2 = 0.22, p = 0.045). In individuals from New Zealand, an increase in CON2 copy number was associated with an increase in math aptitude ability (R2 = 0.10 p = 0.018). These were not confounded by brain size. To our knowledge, this is the first study to report a replicated association between copy number of a gene coding sequence and cognitive aptitude. Remarkably, dosage variations involving DUF1220 sequences have now been linked to human brain expansion, autism severity and cognitive aptitude, suggesting that such processes may be genetically and mechanistically inter-related. The findings presented here warrant expanded investigations in larger, well-characterized cohorts.

So, not at all convincing. Might be true, but these data look supremely p-hacked.

What about human population counts in 1000 genomes? Well, turns out someone did a study:


DUF1220 protein domains found primarily in Neuroblastoma BreakPoint Family (NBPF) genes show the greatest human lineage-specific increase in copy number of any coding region in the genome. There are 302 haploid copies of DUF1220 in hg38 (~160 of which are human-specific) and the majority of these can be divided into 6 different subtypes (referred to as clades). Copy number changes of specific DUF1220 clades have been associated in a dose-dependent manner with brain size variation (both evolutionarily and within the human population), cognitive aptitude, autism severity, and schizophrenia severity. However, no published methods can directly measure copies of DUF1220 with high accuracy and no method can distinguish between domains within a clade.

Here we describe a novel method for measuring copies of DUF1220 domains and the NBPF genes in which they are found from whole genome sequence data. We have characterized the effect that various sequencing and alignment parameters and strategies have on the accuracy and precision of the method and defined the parameters that lead to optimal DUF1220 copy number measurement and resolution. We show that copy number estimates obtained using our read depth approach are highly correlated with those generated by ddPCR for three representative DUF1220 clades. By simulation, we demonstrate that our method provides sufficient resolution to analyze DUF1220 copy number variation at three levels: (1) DUF1220 clade copy number within individual genes and groups of genes (gene-specific clade groups) (2) genome wide DUF1220 clade copies and (3) gene copy number for DUF1220-encoding genes.

To our knowledge, this is the first method to accurately measure copies of all six DUF1220 clades and the first method to provide gene specific resolution of these clades. This allows one to discriminate among the ~300 haploid human DUF1220 copies to an extent not possible with any other method. The result is a greatly enhanced capability to analyze the role that these sequences play in human variation and disease.

So, they developed a method to count DUF1220 (which is difficult because it’s a strange kind of variation) from sequence data (their tool is publicly available). Then they tested this on 1000 genomes public data, but:

Approximately 25 individuals were randomly chosen from each of the CEU, YRI, CHB, JPT, MXL, CLM, PUR, ASW, LWK, CHS, TSI, IBS, FIN, and GBR populations for a total of 324 individuals.

For some reason they only used ~25 persons from each population despite more being freely available. 🤔 Someone should redo this with all the data. Their counts are only shown in the supplements:

These counts don’t seem to show any consistent pattern. Maybe one would have appeared had they merged the continental groups. They don’t provide the values in a table, so we can’t easily do it ourselves.


  • Maybe the samples were too small to see the count differences. [Conspiracy hat] This was on purpose to hide them.
  • Maybe DUF1220 is just a fluke. After all, the human IQ study looks p-hacked, and it’s a candidate gene, so prior = low. That stuff about being selected? Well, humans have 20k genes, so something is bound to show a pattern like that.

Things to do:

  • Calculate the DUF1220 counts in the full 1000 genomes dataset and other public sequence data such as the Simons panel. Their code for analyzing 1kg is public too.

Also, noting my prior conditional prediction:

Added 9th March 2019

Blogger Half-Assed Science has also previously blogged about this DUF1220 study. They carried out some regressions, which found little of interest, but did not reanalyze the raw data to obtain a larger sample. To really examine the issue, one would need a large dataset with diverse people, sequencing data, and IQ/SES. The Simons Diversity Project has a lot of data, but no phenotypes, so one will have to assume population means. A better option is to apply for access to the 100k genomes project, which I guess has some phenotypes. The website is not very clear. Another option is the UK10k, which has 10k sequenced genomes. I can’t see what phenotypes they have either. A better approach perhaps is to redo the ancient genomes paper (Woodley et al 2017) while also adding the DUF1220 counts. Do they increase during human evolution? Does the count correlate with absolute latitude? Easy enough research question, just waiting for someone with moderate technical skills and some courage.

January 23, 2019

Updated 23andme results (2019-01-22)

Filed under: Population genetics — Emil O. W. Kirkegaard @ 04:03

Previous results.

23andme has updated their ancestry estimates, so I’m reposting mine for people who are wondering.

The change from the previous results is that I’m now slightly more European: 99.8% vs. 99.7%. I’m no longer North African, but I’m now 0.1% Amerindian (probably a false positive), and slightly less Ashkenazi: 2.8% (down from 2.9%).

They also report regional results now, and they figured out I come from Jutland, especially central Jutland. This is correct in so far as my mother’s family is entirely from there, with a long family tree going back some hundreds of years. My father grew up in Copenhagen, but his origins are obscure because he was adopted. His father seems to have been a vagabond of sorts (or moved around a lot at least) and had a rare French middle name. His mother is some distant descendant of the Danish-Jewish family Kampmann/Engmann, I think. There is a book one can get from the library somewhere that catalogs this.

My data is from an older version of the 23andme array (version 1, 950k snps), and my nuclear family has a newer version (650k snps). Thus, the array data format might affect the results if imputation/training is not done properly. Results look like this:

Or in tabular format:

Ancestry | Father | Mother | Expected offspring | Emil | Brother | Emil deviation | Brother deviation | Mean deviation
Scandinavian | 43.5 | 46.5 | 45.00 | 47.4 | 48.6 | 2.40 | 1.20 | 1.80
French & German | 20.4 | 20.4 | 20.40 | 19.0 | 14.0 | -1.40 | -5.00 | -3.20
British & Irish | 11.8 | 18.5 | 15.15 | 10.5 | 7.5 | -4.65 | -3.00 | -3.83
Ashkenazi | 5.1 | 0.0 | 2.55 | 2.8 | 0.8 | 0.25 | -2.00 | -0.88
Eastern European | 0.8 | 1.4 | 1.10 | 1.4 | 0.4 | 0.30 | -1.00 | -0.35
Broadly Northwestern Euro | 17.6 | 13.0 | 15.30 | 18.4 | 27.8 | 3.10 | 9.40 | 6.25
Broadly Southern Euro | 0.0 | 0.0 | 0.00 | 0.0 | 0.4 | 0.00 | 0.40 | 0.20
Broadly Euro | 0.7 | 0.3 | 0.50 | 0.4 | 0.5 | -0.10 | 0.10 | 0.00
East Asian & Amerindian | 0.0 | 0.0 | 0.00 | 0.1 | 0.0 | 0.10 | -0.10 | 0.00
Unassigned | 0.1 | 0.0 | 0.05 | 0.1 | 0.0 | 0.05 | -0.10 | -0.03
sum | 100.0 | 100.1 | 100.1 | 100.1 | 100.0 | 0.0 | -0.1 | 0.0


I’ve also added the expected child ancestry (the mean of the two parents), as well as the deviations. Generally speaking, large mean deviations are very unlikely and indicate model bias. We see that the offspring’s French & German ancestry shrinks along with British & Irish, while Broadly Northwestern Euro increases. So their model has some kind of issue with this ancestry and assigns it differently depending on which generation people are from. This probably indicates some kind of age or generation confounding in their training sample.
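The calculation behind the table can be sketched in R as below. This is a minimal sketch, assuming the expected offspring ancestry is simply the midparent value; the data frame and column names are made up for illustration, and only the first six rows are included. Note that the Brother deviation column as printed equals Brother minus Emil rather than Brother minus expected, so the brother and mean deviations computed here differ from the printed ones.

```r
# Ancestry percentages copied from the table (first six rows only).
# The data frame and column names are illustrative, not from the original post.
ancestry <- data.frame(
  ancestry = c("Scandinavian", "French & German", "British & Irish",
               "Ashkenazi", "Eastern European", "Broadly Northwestern Euro"),
  father  = c(43.5, 20.4, 11.8, 5.1, 0.8, 17.6),
  mother  = c(46.5, 20.4, 18.5, 0.0, 1.4, 13.0),
  emil    = c(47.4, 19.0, 10.5, 2.8, 1.4, 18.4),
  brother = c(48.6, 14.0, 7.5, 0.8, 0.4, 27.8)
)

# Assumption: expected offspring ancestry is the midpoint of the two parents.
ancestry$expected <- (ancestry$father + ancestry$mother) / 2

# Deviation of each child from the midparent expectation.
ancestry$emil_dev    <- ancestry$emil - ancestry$expected
ancestry$brother_dev <- ancestry$brother - ancestry$expected

# Mean deviation across the two children.
ancestry$mean_dev <- (ancestry$emil_dev + ancestry$brother_dev) / 2

print(ancestry[, c("ancestry", "expected", "emil_dev", "brother_dev", "mean_dev")])
```

Computed this way, the direction of the pattern is unchanged: French & German and British & Irish still come out below expectation in both children, and Broadly Northwestern Euro above it.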

December 23, 2018

Genetic evidence for historical polygyny — Rushton vindicated?

Filed under: Population genetics — Emil O. W. Kirkegaard @ 06:44

Sometimes references are made to such findings. For instance, in Is There Anything Good About Men? (Roy F. Baumeister, 2007):

Recent research using DNA analysis answered this question about two years ago. Today’s human population is descended from twice as many women as men. I think this difference is the single most underappreciated fact about gender. To get that kind of difference, you had to have something like, throughout the entire history of the human race, maybe 80% of women but only 40% of men reproduced.

No reference is given, but there are a few of these studies:

The latter 2 studies are important because they looked at population differences. Rushton claimed that Africans had more chads (resulting in more polygyny) and East Asians more dads (more monogyny), with Europeans somewhere in between closer to Asians. These mating pattern differences result in predictions about the relative sex ratios of effective population sizes one can measure in the genetics. Arbiza et al note:

Compared with our empirical results, the results of these simulations have a few salient features. The range of estimates of absolute X/A diversity for relatively neutral regions, furthest from genes, tended to exceed or match the expectation from recent models of human demography: African populations showed a marked increase of at least 7.8% (0.86–0.88) from the expectation (0.780–0.798), European populations also had a higher increase, albeit to a smaller degree of ~4% (0.78–0.79), than the expectation (0.699–0.751), and East Asian populations showed estimates overlapping or only modestly exceeding (0.70–0.75) the expectation from demography (0.687–0.734). Although this increase could partially be due to the confounding effect of natural selection on some of the modeling strategies, it points to the possibility that additional factors leading to a relative increase in the female versus male Ne, such as effective polygyny [7] over extended historical time periods, have played a past role in human populations.

Lippold et al:

Some basic summary statistics concerning mtDNA and NRY diversity for the regions are provided in Table 1. The π values we report are for the most part somewhat larger than reported in a previous study of eight Africans and eight Europeans [50], which is not unexpected given the much larger sampling in our study. Notably, we find substantial variation among geographic regions in amounts of mtDNA versus NRY diversity; this is shown further in the comparison of the mean number of pairwise differences (mpd) for mtDNA and the NRY (Figure 2A). The mtDNA mpd for Africa is about twice that for other regions, while the NRY mpd is greatest in the Middle East/North Africa region, and only slightly greater in Africa than in the other regions (with the exception of the Americas, which show substantially lower NRY diversity). Overall, there are striking differences in the ratio of NRY:mtDNA mpd (Table 1), with Africa, Central Asia, and the Americas having significantly less NRY diversity relative to mtDNA diversity, compared to the other regional groups. Moreover, differences in relative levels of NRY:mtDNA diversity are also evident in the individual populations (Additional file 3: Table S3), although the small sample sizes indicate that the individual population results must be viewed cautiously.

The trouble with straightforward interpretations is that population bottlenecks/history, and female out-migration can also skew these ratios, so the observed ratio does not necessarily function as a (pure) measure of polygyny. I imagine one could clarify this issue using animal data or some deeper genomic patterns that allow one to decompose the variation, but I haven’t followed the literature well enough to know whether this has been done.

Update 2019-05-04: Peter Frost reviewed genomic evidence in 2011

Peter Frost informs me by email that he already reviewed some of the genomic evidence back in 2011 in a blogpost.

June 7, 2018

Lewontin’s famous 1972 book chapter

Filed under: Population genetics — Emil O. W. Kirkegaard @ 07:21

Few people actually read this chapter, which is a shame: the science is interesting for historical purposes, and his famous fallacy is really obvious once you read the full text.

The conclusion section reads as follows:

The results are quite remarkable. The mean proportion of the total species diversity that is contained within populations is 85.4%, with a maximum of 99.7% for the Xm gene, and a minimum of 63.6% for Duffy. Less than 15% of all human genetic diversity is accounted for by differences between human groups! Moreover, the difference between populations within a race accounts for an additional 8.3%, so that only 6.3% is accounted for by racial classification.

This allocation of 85% of human genetic diversity to individual variation within populations is sensitive to the sample of populations considered. As we have several times pointed out, our sample is heavily weighted with “primitive” peoples with small populations, so that their Ho values count much too heavily compared with their proportion in the total human population. Scanning Table 3 we see that, more often than not, the Hpop values are lower for South Asian aborigines, Australian aborigines, Oceanians, and Amerinds than for the three large racial groups. Moreover, the total human diversity, Hspecies, is inflated because of the overweighting of these small groups, which tend to have gene frequencies that deviate from the large races. Thus the fraction of diversity within populations is doubly underestimated since the numerator of that fraction is underestimated and the denominator overestimated.

When we consider the remaining diversity, not explained by within-population effects, the allocation to within-race and between-race effects is sensitive to our racial representations. On the one hand the over-representation of aborigines and Oceanians tends to give too much weight to diversity between races. On the other hand, the racial component is underestimated by certain arbitrary lumpings of divergent populations in one race. For example, if the Hindi and Urdu speaking peoples were separated out as a race, and if the Melanesian peoples of the South Asian seas were not lumped with the Oceanians, then the racial component of diversity would be increased. Of course, by assigning each population to separate races we would carry this procedure to the reductio ad absurdum. A post facto assignment, based on gene frequencies, would also increase the racial component, but if this were carried out objectively it would lump certain Africans with Lapps! Clearly, if we are to assess the meaning of racial classifications in genetic terms, we must concern ourselves with the usual racial divisions. All things considered, then, the 6.3% of human diversity assignable to race is about right, or a slight overestimate considering that Hpop is overestimated.

It is clear that our perception of relatively large differences between human races and subgroups, as compared to the variation within these groups, is indeed a biased perception and that, based on randomly chosen genetic differences, human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals.

Human racial classification is of no social value and is positively destructive of social and human relations. Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance.

That’s it. Notice the absence of any premise connecting the conclusion

“Human racial classification is of no social value and is positively destructive of social and human relations”

to the stated premise

“The mean proportion of the total species diversity that is contained within populations is 85.4% […]. Less than 15% of all human genetic diversity is accounted for by differences between human groups! Moreover, the difference between populations within a race accounts for an additional 8.3%, so that only 6.3% is accounted for by racial classification.”

What’s the missing connecting premise supposed to be? What if we found the values were a 50-30-20 split? Even a cursory comparison to other species with subspecies no one denies would reveal these human values are quite typical (Woodley 2010, Fuerst 2015, Stoeckle & Thaler 2018).
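For readers unfamiliar with how a three-way split like 85.4 / 8.3 / 6.3 is produced, here is a rough sketch in R of an apportionment of Shannon diversity for a single two-allele locus. It is a toy illustration in the spirit of the chapter's method, not a reproduction of it: the allele frequencies, the group labels, and the equal weighting of populations and races are all assumptions made up for the example.

```r
# Shannon diversity (in bits) of a two-allele locus with allele frequency p.
shannon <- function(p) {
  terms <- c(p, 1 - p)
  # Treat 0 * log2(0) as 0 so that a fixed allele gives zero diversity.
  terms <- terms[terms > 0]
  -sum(terms * log2(terms))
}

# Hypothetical allele frequencies: two populations in each of two groups.
freqs <- list(race_a = c(0.10, 0.20), race_b = c(0.40, 0.50))

# Mean diversity within populations (each population weighted equally).
h_pop <- mean(sapply(unlist(freqs), shannon))
# Mean diversity of each group, computed from its averaged allele frequency.
h_race <- mean(sapply(freqs, function(p) shannon(mean(p))))
# Diversity of the whole species, from the grand mean allele frequency.
h_species <- shannon(mean(unlist(freqs)))

# Split total diversity into the three components, as fractions of the total.
apportionment <- c(
  within_populations = h_pop / h_species,
  between_populations_within_race = (h_race - h_pop) / h_species,
  between_races = (h_species - h_race) / h_species
)
print(round(100 * apportionment, 1))
```

The three fractions always sum to 1, so the procedure only describes how the diversity at the sampled loci is partitioned; it does not by itself say anything about the social value of a classification.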

March 26, 2018

East Asian intelligence and UV radiation

Filed under: intelligence / IQ / cognitive ability, Population genetics — Emil O. W. Kirkegaard @ 05:34

It has been noted that East Asians are positive outliers for the latitude ~ IQ pattern: they live too far south for their high IQs.

World IQ map for Lynn’s 2012 dataset (map by David Becker).

Some possible reasons for this are:

  1. The peoples evolved further north and migrated south in recent times, and their intelligence level is related to their recent origin, not current location, just as it is for e.g. Europeans in Australia, South Africa and Brazil. This would be particularly applicable to the southern Chinese.
  2. The East Asian IQ has been over-estimated and is not actually that high, perhaps only around 100. This would leave a smaller residual, but a residual nonetheless. According to David Becker’s latest (v1.2) recalculations, the IQs are: China 105.1, South Korea 97.2, Japan 107.6. So there doesn’t seem to be much support for this idea, except for the odd value for Korea, which is probably a fluke.
  3. Latitude does not proxy the relevant climate causes well in this case. A version of this is that the area of China used to be colder in the past.
  4. The high intelligence of East Asians is primarily due to some other factor that doesn’t have to do with climate, e.g. social organization.

Regarding (3), I was reviewing the evolution of human skin pigmentation (this review) and found some UV radiation maps that are in line with it. Notice especially China.

(A) Annual mean UVB (305 nm). Intensity is indicated by gradations from dark to light varying from 1 to 135 Jm−2 in 10 steps with oceans partially grayed-out. (B) Annual CoV for UVB (305 nm). Gradations of dark to light varying from 10 to 300 in 10 steps, with oceans area partially grayed-out.

(A) Annual mean UVA (380 nm). Intensity is indicated by gradations from dark to light varying from 65 to 930 Jm−2 in 10 steps with oceans partially grayed-out. (B) Annual CoV for UVA (380 nm). Gradations of dark to light varying from 1 to 13 in 10 steps, with oceans area partially grayed-out.

February 18, 2017

Cold winter theory in non-human animals

Disclaimer: I did not read Rushton’s seminal book on the topic. I was only recently able to obtain a complete electronic copy (i.e. not the abridged version). I did in fact only skim this literature. I don’t have time to work on this, but I’d like others to do so, hence I’m posting my initial work here for others to build upon.

A colleague writes to me that (edited):

Last week during skiing holidays I saw a popular scientific program on animal intelligence in the evening, and they presented something like the Lynn-Miller-Rushton cold winter theory! Birds (chickadees) living in Alaska have bigger brains and are more intelligent than birds living in Kansas! This is an independent very valuable support for the cold winter theory on intelligence. I found some of the original studies:

Roth, T. C. & Pravosudov, V. V. (2009). Hippocampal volumes and neuron numbers increase along a gradient of environmental harshness: A large-scale comparison. Proceedings of the Royal Society B: Biological Sciences, 276(1656), 401–405.

Roth, T. C., LaDage, L. D., & Pravosudov, V. V. (2010). Learning capabilities enhanced in harsh environments: A common garden approach. Proceedings of the Royal Society B: Biological Sciences, 277(1697), 3187–3193.

Roth, T. C., Gallagher, C. M., LaDage, L. D., & Pravosudov, V. V. (2012). Variation in brain regions associated with fear and learning in contrasting climates. Brain, Behavior and Evolution, 79(3), 181–190.

Pravosudov, V. V., Roth, T. C., LaDage, L. D., & Freas, C. A. (2015). Environmental influences on spatial memory and the hippocampus in food-caching chickadees. Comparative Cognition & Behavior Reviews, 10, 25–43.

A list of the papers of Timothy C. Roth:

The area is called behavioral ecology. Some years ago I quickly skimmed some of this literature with an eye for comparing it with the human racial evolutionary models. Human populations were not the only ones who were faced with the cold climate of the north. While one can find studies that fit with Rushton etc.’s thinking, one can also find the reverse. Northern does not always mean smarter, larger brain, more behaviorally complex or less aggressive in non-human animals. Among turtles, the evidence shows that the northern ones are towards r, southern ones towards K (measured in egg size and count). Here’s a bunch of other studies I was able to quickly find covering a variety of animals:

  • Sol, D., Lefebvre, L., & Rodríguez-Teijeiro, J. D. (2005). Brain size, innovative propensity and migratory behaviour in temperate Palaearctic birds. Proceedings of the Royal Society of London B: Biological Sciences, 272(1571), 1433–1441.
  • Barrickman, N. L., Bastian, M. L., Isler, K., & van Schaik, C. P. (2008). Life history costs and benefits of encephalization: a comparative test using data from long-term studies of primates in the wild. Journal of Human Evolution, 54(5), 568–590.
  • Sol, D., Székely, T., Liker, A., & Lefebvre, L. (2007). Big-brained birds survive better in nature. Proceedings of the Royal Society of London B: Biological Sciences, 274(1611), 763–769.
  • Sol, D., Garcia, N., Iwaniuk, A., Davis, K., Meade, A., Boyle, W. A., & Székely, T. (2010). Evolutionary Divergence in Brain Size between Migratory and Resident Birds. PLOS ONE, 5(3), e9617.
  • Schuck-Paim, C., Alonso, W. J., & Ottoni, E. B. (2008). Cognition in an Ever-Changing World: Climatic Variability Is Associated with Brain Size in Neotropical Parrots. Brain, Behavior and Evolution, 71(3), 200–215.
  • Morrison, C., & Hero, J.-M. (2003). Geographic variation in life-history characteristics of amphibians: a review. Journal of Animal Ecology, 72(2), 270–279.
  • Jiang, A., Zhong, M. J., Xie, M., Lou, S. L., Jin, L., Robert, J., & Liao, W. B. (2015). Seasonality and Age is Positively Related to Brain Size in Andrew’s Toad (Bufo andrewsi). Evolutionary Biology, 42(3), 339–348.
  • Gillooly, J. F., & McCoy, M. W. (2014). Brain size varies with temperature in vertebrates. PeerJ, 2, e301.
  • Moore, I. T., Perfito, N., Wada, H., Sperry, T. S., & Wingfield, J. C. (2002). Latitudinal variation in plasma testosterone levels in birds of the genus Zonotrichia. General and Comparative Endocrinology, 129(1), 13–19.
  • Garamszegi, L. Z., Hirschenhauser, K., Bókony, V., Eens, M., Hurtrez-Boussès, S., Møller, A. P., … & Wingfield, J. C. (2008). Latitudinal distribution, migration, and testosterone levels in birds. The American Naturalist, 172(4), 533–546.
  • Leggett, W. C., & Carscadden, J. E. (1978). Latitudinal variation in reproductive characteristics of American shad (Alosa sapidissima): evidence for population specific life history strategies in fish. Journal of the Fisheries Board of Canada, 35(11), 1469–1478.
  • Iverson, J. B., Balgooyen, C. P., Byrd, K. K., & Lyddan, K. K. (1993). Latitudinal variation in egg and clutch size in turtles. Canadian Journal of Zoology, 71(12), 2448–2461.
  • Chalfoun, A. D., & Martin, T. E. (2007). Latitudinal variation in avian incubation attentiveness and a test of the food limitation hypothesis. Animal Behaviour, 73(4), 579–585.
  • Heibo, E., Magnhagen, C., & Vøllestad, L. A. (2005). Latitudinal variation in life-history traits in Eurasian perch. Ecology, 86(12), 3377–3386.

There are many more. To locate them, use search queries like ‘latitude “brain size” bird‘. Or look up the publication lists of the prominent researchers such as Daniel Sol.

Integrating this will require someone to read a lot of varied material and find some overall patterns in the mess. Basically a job for someone biologisty and generalisty, someone like Woodley.

Some comments

With regards to birds, brain size and ecology, there is a problem. Birds living in the high latitudes must either adopt a migratory behavioral pattern or learn how to survive the winter. Most birds take the first route, but some don’t. However, to fly long distances, it helps to be lean, so there is strong selection against extra weight such as a larger brain. For this reason, bivariate latitude x brain size comparisons might show the opposite of the expected pattern. One must account for the solution to the, well, cold winter problem. Some amphibians have an analogous tactic: hibernation. Many insects have yet another analogous solution: they only live in the summer (single year life spans). As far as I understand, fish do not have issues with the water temperature in the winter, so they don’t face the problem. Except for possibly hibernation (which sometimes does require planning ability e.g. in squirrels), these strategies would not seem to select so strongly for intelligence, and so one would not expect the higher latitude species to be smarter, less aggressive and so on.

In general, therefore, it seems best to focus on animals that tackle the cold winter problem head-on instead of avoiding it somehow (migrate, hibernate, or single-year lifespans). Among birds, the smartest birds are of the Corvidae family — in particular crows, ravens and magpies — and they generally don’t migrate in the winter. Of the non-Corvidae, I think the smartest birds are some of the parrot species. These also often don’t migrate. (See also bird intelligence.)

So, if I were to look for these relationships/integrate the findings, I would select families/orders (or whatever) of animals that:

  1. are widely distributed around the Earth, so that we have natural variation to exploit.
  2. don’t side-step the cold winter problem in some way. If they do, then this must be taken into account.
  3. have a large number of species and sub-species. Small n science is bad science.
  4. haven’t been artificially selected by humans.

Then I would look for reviews and studies of intelligence/cognition, brain size, tool use, latitude, climate harshness, climate variability, (ecological) temperature in these. One can enter some animal family or common term (e.g. ‘bird’) as a search term to avoid the human IQ literature.

It’s a bit tricky to choose suitable families. E.g. hooved animals — cows, horses, pigs — have often been bred by humans, so they don’t work well for testing models (the variation is not natural). They also have too little natural variation, in some cases because we killed most of them already. But deer may work okay.

Carnivores would have worked well, but like the hooved animals, we either enslaved them (wolves -> dogs, cats) or killed them (other wolves, tigers). There are some wild medium-sized cat species (Felidae) left (cougars, lynx), so maybe one could try those. Since they are wild and have claws and teeth, they aren’t so easy to work with.

Would non-human primates work? Yeah, maybe. There are in fact already two great studies (Fernandes et al 2014, Navarrete et al 2016) on primate brain size, cognitive ability, tool use etc. Apparently, neither of them actually looked at the ecological correlations?! I smell gold. The dataset for Navarrete is public!

Humans are too slow to migrate effectively, so they are essentially forced to make do in the winter if they are to live in colder regions. So it’s not surprising that we see fairly robust patterns for humans (Beals et al 1984). The Inuit are of course the main exception, as these seem to have IQs in the low 90s despite living far north. Perhaps this is related to small effective population size which slows down evolution for multiple reasons.

Without any integration, what the non-human literature is good for is showing that the proposed model for humans (cold winter theory) is sometimes found for some animal families/orders, and that biologists have no particular trouble with positing these models for non-human animals. As such, there should be no particular scientific resistance to positing them for humans too.

October 15, 2016

SQL server for population frequencies from 1000 genomes

Filed under: Population genetics, R — Emil O. W. Kirkegaard @ 23:06

Note: 2018 June 26

Server down right now, investigating.

Note: August 16, 2017

server IP changed to

Original post

We need dplyr for this:

library(dplyr)

First, use the anon user to log into the SQL server (user = “anon”, pass = “”, ip = “”, port = 3306):

sql = src_mysql("population_freqs", host = "", user = "anon", port = 3306)

Select the 1000 genomes phase 3 table:

sql_1kg = tbl(sql, "1000genomes_phase3")

#look at the first 10 rows
sql_1kg
## Source:   query [?? x 35]
## Database: mysql 10.0.27-MariaDB-0ubuntu0.16.04.1 [anon@]
##      CHR       SNP    A1    A2    ACB    ASW    BEB    CDX    CEU    CHB
##    <int>     <chr> <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1      1 rs1000033     G     T 0.4062 0.3279 0.2151 0.3172 0.2071 0.3107
## 2      1 rs1000050     C     T 0.6302 0.5246 0.2267 0.4247 0.1364 0.3932
## 3      1 rs1000070     T     C 0.6146 0.5902 0.5116 0.4946 0.2828 0.3204
## 4      1 rs1000073     G     A 0.4219 0.3689 0.2093 0.1989 0.6717 0.1456
## 5      1 rs1000075     T     C 0.3750 0.4180 0.2733 0.3172 0.3586 0.2573
## 6      1 rs1000085     C     G 0.0781 0.1393 0.0349 0.0645 0.2121 0.0437
## 7      1 rs1000127     C     T 0.2500 0.2787 0.4012 0.4677 0.3081 0.5728
## 8      1 rs1000184     C     G 0.0521 0.0574 0.2442 0.8333 0.2879 0.7767
## 9      1 rs1000211     T     C 0.0260 0.0246 0.0000 0.0000 0.0000 0.0000
## 10     1 rs1000212     A     G 0.0365 0.0246 0.0000 0.0000 0.0000 0.0000
## # ... with more rows, and 25 more variables: CHS <dbl>, CLM <dbl>,
## #   ESN <dbl>, FIN <dbl>, GBR <dbl>, GIH <dbl>, GWD <dbl>, IBS <dbl>,
## #   ITU <dbl>, JPT <dbl>, KHV <dbl>, LWK <dbl>, MSL <dbl>, MXL <dbl>,
## #   PEL <dbl>, PJL <dbl>, PUR <dbl>, STU <dbl>, TSI <dbl>, YRI <dbl>,
## #   EAS <dbl>, EUR <dbl>, AFR <dbl>, AMR <dbl>, SAS <dbl>

The entire file is really large, about 3.6 GB in memory. You often only need a few (1-1000) SNPs, so let’s try downloading only a few:

#first 10 of the hits from the latest height GWAS
some_snps = c("rs425277", "rs9434723", "rs10779751", "rs2284746", "rs12137162", 
"rs212524", "rs1014987", "rs2806561", "rs4601530", "rs926438")

#fetch from SQL server
(sql_height_freqs = sql_1kg %>% filter(SNP %in% some_snps) %>% collect())
## # A tibble: 10 x 35
##      CHR        SNP    A1    A2    ACB    ASW    BEB    CDX    CEU    CHB
##    <int>      <chr> <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
## 1      1  rs1014987     G     C 0.1458 0.1475 0.1802 0.5000 0.2525 0.4612
## 2      1 rs10779751     A     G 0.6406 0.5984 0.1279 0.0591 0.2677 0.0631
## 3      1 rs12137162     A     C 0.1198 0.1557 0.2384 0.1935 0.2677 0.2621
## 4      1   rs212524     T     C 0.1823 0.1967 0.3605 0.2473 0.4293 0.1796
## 5      1  rs2284746     G     C 0.1927 0.1475 0.4012 0.1882 0.5657 0.2573
## 6      1  rs2806561     A     G 0.3073 0.3934 0.3488 0.5591 0.5909 0.5388
## 7      1   rs425277     T     C 0.0990 0.0984 0.3023 0.1452 0.2576 0.1990
## 8      1  rs4601530     T     C 0.4271 0.3852 0.4535 0.5538 0.2475 0.4515
## 9      1   rs926438     T     C 0.8177 0.7541 0.2093 0.1774 0.5455 0.3010
## 10     1  rs9434723     A     G 0.2240 0.2541 0.1744 0.0538 0.1414 0.0922
## # ... with 25 more variables: CHS <dbl>, CLM <dbl>, ESN <dbl>, FIN <dbl>,
## #   GBR <dbl>, GIH <dbl>, GWD <dbl>, IBS <dbl>, ITU <dbl>, JPT <dbl>,
## #   KHV <dbl>, LWK <dbl>, MSL <dbl>, MXL <dbl>, PEL <dbl>, PJL <dbl>,
## #   PUR <dbl>, STU <dbl>, TSI <dbl>, YRI <dbl>, EAS <dbl>, EUR <dbl>,
## #   AFR <dbl>, AMR <dbl>, SAS <dbl>

All the atomic populations are there, as well as the 5 super populations (‘macro races’). The numbers for the super populations differ slightly from those shown on Ensembl because they used weighted means and I used unweighted means.
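To see why the two kinds of means differ, here is a small sketch in base R. The allele frequencies and the per-population sample sizes below are illustrative assumptions, not values pulled from the server or from Ensembl.

```r
# Hypothetical allele frequencies for one SNP in the five European populations.
freqs <- c(CEU = 0.21, FIN = 0.30, GBR = 0.22, IBS = 0.18, TSI = 0.19)

# Assumed number of sampled individuals per population (illustrative values).
n <- c(CEU = 99, FIN = 99, GBR = 91, IBS = 107, TSI = 107)

# Unweighted mean: every population counts equally.
eur_unweighted <- mean(freqs)

# Weighted mean: populations count in proportion to their sample size.
eur_weighted <- weighted.mean(freqs, w = n)

print(c(unweighted = eur_unweighted, weighted = eur_weighted))
```

When the sample sizes are similar, as they are here, the two means stay close, which fits the differences being slight.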