Apparently, only a few studies have examined this question and they are not easily available. Because we obtained these, it makes sense to share our results. The datafile is on Google Drive. We will fill in more results as we find them.

So far results reveal nothing surprising: MZ correlations are larger than DZ correlations. The h2 using Falconer’s formula is about 60%. Total sample ≈ 750 pairs.
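For readers who want to see how Falconer’s formula turns twin correlations into variance components, here is a minimal sketch. The two correlations in the example are hypothetical values picked so that h2 comes out near 60%; they are not the pooled estimates from our datafile, and the function name is my own.

```python
def falconer(r_mz, r_dz):
    """Falconer's decomposition from MZ and DZ twin correlations.

    h2: additive genetic share, c2: shared environment, e2: non-shared
    environment plus measurement error. The three always sum to 1.
    """
    h2 = 2 * (r_mz - r_dz)
    c2 = 2 * r_dz - r_mz
    e2 = 1 - r_mz
    return {"h2": h2, "c2": c2, "e2": e2}


# Hypothetical correlations, for illustration only (not taken from the studies).
estimates = falconer(r_mz=0.75, r_dz=0.45)
for name, value in estimates.items():
    print(name, round(value, 2))
```

Note that the formula can return a negative c2 or an h2 above 1 when the DZ correlation is less than half the MZ correlation, so the output should be read as a rough estimate rather than a fitted model.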

Consider the model below:

General model for immigrant group traits and outcomes

Something much like this has been my intuitive working model for thinking about immigrant groups’ traits and socioeconomic outcomes. I will explain the model in this post and refer back to it or use the material in some upcoming paper (nothing planned).

The model shows a home country (the country of origin) and two destination countries. The model is not limited to two destination countries, but I did not draw more to avoid making the figure larger. In some cases it is worth using more, as explained below.

Familial (or intergenerational) traits are those traits that run in families. This term includes both genetic and shared environmental effects. Because most children grow up with their parents (I assume), it does not matter whether the parents’ traits → children’s traits route is genetic or environmental. This means that both psychological traits (mostly genetic) and cultural traits (mostly shared environmental), such as specific religion, are included.

When persons leave (emigrate) their home country, there is some selection: people who decide to leave are not random. Sometimes, it is not easy to leave because the government actively tries to restrict its citizens from leaving. This is shown in the model as the Emigration selection→Emigrant group familial traits link. Emigration selection seems to be mostly positive in the real world: the better off and smarter emigrate more than the poorer and less bright.

When the immigrants then move to other countries, there is Immigration selection because the destination countries usually don’t just allow whoever to move in if they want to. Immigration selection can have both positive and negative effects. Countries that receive refugees but try not to receive others have negative selection, while those that try to only pick the best potential immigrants have positive selection. Often countries have elements of both. Immigration selection and Emigrant group familial traits jointly lead to Immigrant group familial traits in a particular destination country.

Note that immigration selection is unique for each destination country, but can be similar for some countries. This would show up as correlated immigration selection scores. There is also immigration selection that doesn’t happen in the destination country, namely selection that happens due to geographical distance. For this reason I placed the Immigration selection node half in the destination country boxes. With a more complex model, one could split these if desired.

Worse, it is possible that immigration selection in a given country depends on the origin country, i.e. a country-country interaction selection. This wasn’t included in the above model. Examples of this are easy to find. For instance, within the EU (well, it’s complicated), there is relatively free movement of EU citizens, but not so for persons coming in from outside the EU.

Socioeconomic outcomes: Human capital model + luck

The S factor score of the home country (the general factor of socioeconomic outcomes, which one can think of as roughly equal to the Human Development Index, just broader) is modeled as being the outcome of the Population familial traits and Environmental and historical luck. I think it is mostly the former. Perhaps the most obvious example of environmental luck is having valuable natural resources within your borders, today especially oil. But note that even this is somewhat complicated because borders can change by use of ‘bigger army diplomacy’ or by simply purchasing more land, so one could strategically buy or otherwise acquire land that has valuable resources on it, making it not a strict environmental effect.

Other things could be having access to water, sunlight, wind, earthquakes, mountains, large bodies of inland water & rivers, active underground, arable land, living close to peaceful (or not so much) neighbors and so on. These things can promote or retard economic development. Having suitable rivers means that one can get cheap and safe (well, mostly) energy from those. Countries without such resources have to look elsewhere which may cost more. They are not always strictly environmental, but some amount of their variance is more or less randomly distributed to countries. Some are more lucky than others.

There are some who argue that countries that were colonized are better off now because of it, so that would count as historical luck. However, being colonized is not just an environmental effect because it means that foreign powers were able to defeat your forces overwhelmingly for decades. If they were able to, you probably had a poor military which is linked to general technological development. There is some environmental component to whether you have a history of communism, but it seems to still have negative effects on economic growth decades after.

For immigrant groups inside a host country, however, the environmental effects with country-wide effects cannot account for differences. These are thus due to familial effects only (to a good approximation). To be sure, the other people living in the destination/host country, Other group familial traits, probably have some effect on the Immigrant familial traits as well, such as religion and language. These familial traits and the Other group S then jointly cause the Immigrant group S. Open Borders advocates often talk about one aspect of this effect:

Wage differences are a revealing metric of border discrimination. When a worker from a poorer country moves to a richer one, her wages might double, triple, or rise even tenfold. These extreme wage differences reflect restrictions as stifling as the laws that separated white and black South Africans at the height of Apartheid. Geographical differences in wages also signal opportunity—for financially empowering the migrants, of course, but also for increasing total world output. On the other side of discrimination lies untapped potential. Economists have estimated that a world of open borders would double world GDP.

Paths estimated in studies

A path model is always complete, which means that all causal routes are explicitly specified. All the remaining links are non-causal, but nodes can be substantially correlated. For instance, there is no link between the home country Country S and Immigrant group S, but these are strongly correlated in practice. I previously reported correlations between home Country S and Immigrant group S of .54 and .72 for Denmark and Norway.

There is no link between home country Population familial traits and Immigrant group familial traits, but there is only one node in between (Emigrant group familial traits), so it seems reasonable to try to correlate these two nodes. A few studies have looked at these types of correlations. For instance, John Fuerst has looked at GRE/GMAT scores and the like for immigrant groups in the US. This is taken as a proxy for cognitive ability, probably the most important component of the psychological traits part of familial traits. In that paper, Fuerst found correlations of .78 and .81 between these and country cognitive ability using Lynn and Vanhanen’s dataset.

Rindermann and Thompson have reported correlations between cognitive ability (component of Immigrant group familial traits) and native population cognitive ability (component of Other group familial traits).

Most of my studies have looked at the nodes Population familial traits (sub-components Islam belief and cognitive ability) and Immigrant group S (or sub-components like crime if S was not available). Often this results in large correlations: .54 and .59 for Denmark and Norway (depending on how to deal with missing data, use of weighted correlations etc.). Note that in the model the first does cause the second, but there are a few intermediate steps and other variables, especially Emigrant selection (differs by country of origin which reduces the correlation) and Immigrant selection (which has no effect on the correlation).

There is much to be done. If one could obtain estimates of multiple nodes in a causal chain, one could use mediation analysis to see if mediation is plausible. E.g. right now we have Immigrant group S for two countries and cognitive ability for 100s of countries of origin, so if we could obtain immigrant group cognitive ability, one could test the mediating role of the latter. With the current data, one can also check whether country of origin cognitive ability mediates the relationship between immigrant group S and country of origin S, which it should partly, according to the model. I say partly because the mediation is only to the extent that familial cognitive ability is a cause.
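To make the mediation idea concrete, here is a minimal sketch of a single-mediator model (X → M → Y) computed from three correlations between standardized variables. In the terms used above, X could be country of origin cognitive ability, M immigrant group cognitive ability and Y Immigrant group S. The function name and the correlations in the example are hypothetical; they only illustrate the arithmetic and are not estimates from any dataset.

```python
def standardized_mediation(r_xm, r_my, r_xy):
    """Decompose the X-Y correlation into direct and indirect (via M) parts.

    All inputs are Pearson correlations between standardized variables.
    The direct and indirect parts always sum to the total (r_xy).
    """
    denom = 1 - r_xm ** 2
    a = r_xm                                # path X -> M
    b = (r_my - r_xm * r_xy) / denom        # path M -> Y, controlling for X
    direct = (r_xy - r_xm * r_my) / denom   # path X -> Y, controlling for M
    indirect = a * b
    return {"total": r_xy, "direct": direct, "indirect": indirect}


# Hypothetical correlations. In the first case r_xy equals r_xm * r_my,
# which is what full mediation predicts, so the direct path is zero.
full = standardized_mediation(r_xm=0.5, r_my=0.6, r_xy=0.3)
partial = standardized_mediation(r_xm=0.5, r_my=0.6, r_xy=0.5)
print("full mediation:", full)
print("partial mediation:", partial)
```

With real data one would also want confidence intervals for the indirect path (e.g. by bootstrapping), since with only a few dozen origin countries the point estimates will be noisy.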


This was an exchange between researchers that took place in 2006 in the academic journal Intelligence (volume 34).

  1. Templer, D. I., & Arikawa, H. (2006). Temperature, skin color, per capita income, and IQ: An international perspective. Intelligence, 34(2), 121-139.
  2. Jensen, A. R. (2006). Comments on correlations of IQ with skin color and geographic–demographic variables. Intelligence, 34(2), 128-131.
  3. Hunt, E., & Sternberg, R. J. (2006). Sorry, wrong numbers: An analysis of a study of a correlation between skin color and IQ. Intelligence, 34(2), 131-137.
  4. Templer, D. I., & Arikawa, H. (2006). The Jensen and the Hunt and Sternberg comments: From penetrating to absurd. Intelligence, 34(2), 137-139.

More curious readers can read later works on the topic, some of which are listed below (in no particular order). The list includes both proponents and critics:

  • Hunt, E., & Carlson, J. (2007). Considerations relating to the study of group differences in intelligence. Perspectives on Psychological Science, 2(2), 194-213.
  • Templer, D. I. (2008). Correlational and factor analytic support for Rushton’s differential K life history theory. Personality and Individual Differences, 45(6), 440-444.
  • Rushton, J. P., & Templer, D. I. (2009). National differences in intelligence, crime, income, and skin color. Intelligence, 37(4), 341-346.
  • Pesta, B. J., & Poznanski, P. J. (2014). Only in America: Cold Winters Theory, race, IQ and well-being. Intelligence, 46, 271-274.
  • Lynn, R. (2006). Race differences in intelligence: An evolutionary analysis. Washington Summit Publishers.
  • Eppig, C., Fincher, C. L., & Thornhill, R. (2010). Parasite prevalence and the worldwide distribution of cognitive ability. Proceedings of the Royal Society of London B: Biological Sciences, 277(1701), 3801-3808.
  • Kanazawa, S. (2008). Temperature and evolutionary novelty as forces behind the evolution of general intelligence. Intelligence, 36(2), 99-108.
  • Wicherts, J. M., Borsboom, D., & Dolan, C. V. (2010). Evolution, brain size, and the national IQ of peoples around 3000 years BC. Personality and Individual Differences, 48(2), 104-106.
  • Templer, D. I., & Stephens, J. S. (2014). The relationship between IQ and climatic variables in African and Eurasian countries. Intelligence, 46, 169-178.
  • Lynn, R., & Vanhanen, T. (2012). Intelligence: A unifying construct for the social sciences. Ulster Institute for Social Research.

Chisala has his 3rd installment up:

One idea I had while reading it was that tail effects interact with population ethnic/racial heterogeneity. To show this, I did a simulation experiment. Population 1 is a regular population with a mean of 0 and sd of 1. Population 2 is a composite population of three sub-populations: one with a mean of 0 (80%; “normals”), one with a mean of -1 (10%; “dullards”) and one with a mean of 1 (10%; “brights”). Population 3 is a normal population but with a slightly increased sd so that it is equal to the sd of population 2.

Descriptive stats:

> describe(df, skew = F, ranges = T)
     vars     n mean  sd median trimmed  mad   min  max range se
pop1    1 1e+06    0 1.0      0       0 1.00 -4.88 4.65  9.53  0
pop2    2 1e+06    0 1.1      0       0 1.09 -5.43 5.37 10.80  0
pop3    3 1e+06    0 1.1      0       0 1.09 -5.30 5.13 10.44  0

We see that the sd is increased a bit in the composite population (2) as expected. We also see that the range is somewhat increased, even compared to population 3 which has the same sd.

What do the tails look like?

> sapply(df, percent_cutoff, cutoff = 1:4)
      pop1     pop2     pop3
1 0.158830 0.179495 0.180856
2 0.022903 0.034342 0.034074
3 0.001314 0.003326 0.003126
4 0.000036 0.000160 0.000150

We are looking at the proportions of persons with scores above 1-4 (rows) by each population (cols). What do we see? Population 2 and 3 have clear advantages over population 1, but population 2 has a slight advantage over population 3 too.

Simulation 2

In the above, the composite population is made out of 3 populations. But what if it were instead made out of 5?


> describe(df, skew = F)
     vars     n mean   sd median trimmed  mad   min  max range se
pop1    1 1e+06    0 1.00      0       0 1.00 -4.88 4.65  9.53  0
pop2    2 1e+06    0 1.27      0       0 1.21 -5.91 6.03 11.94  0
pop3    3 1e+06    0 1.27      0       0 1.26 -6.12 5.92 12.04  0

The sd is clearly increased. There is not much difference in the range, but the range is very susceptible to sampling error, which we have. What do the tails look like?

> sapply(df, percent_cutoff, cutoff = 1:4)
      pop1     pop2     pop3
1 0.158830 0.205814 0.214353
2 0.022903 0.057077 0.056874
3 0.001314 0.011057 0.008872
4 0.000036 0.001246 0.000804

We see strong effects. At the +3 level, there are roughly 8x as many persons in the composite population as in the normal population (0.011057 vs. 0.001314). Population 3 also has more, but clearly fewer than the composite population.

We can conclude that one must take heterogeneity of populations into account when thinking about the tails.

R code

You can re-do the experiment yourself with this code, or try out some other numbers.

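library(pacman) # provides p_load()
# psych supplies describe(); kirkegaard supplies percent_cutoff().
# Install kirkegaard beforehand if p_load cannot find it.
p_load(reshape, kirkegaard, psych)
n = 1e6 # persons per population

# first simulation --------------------------------------------------------
pop1 = rnorm(n) # homogeneous population, mean 0, sd 1
pop2 = c(rnorm(n*.8), rnorm(n*.1, 1), rnorm(n*.1, -1)) # composite of 3 sub-populations
pop3 = rnorm(n, sd = sd(pop2)) # homogeneous, sd matched to pop2
df = data.frame(pop1, pop2, pop3)
describe(df, skew = F)
sapply(df, percent_cutoff, cutoff = 1:4) # proportion above each cutoff

# second simulation -------------------------------------------------------
pop1 = rnorm(n)
pop2 = c(rnorm(n*.70), rnorm(n*.10, 1), rnorm(n*.10, -1), rnorm(n*.05, 2), rnorm(n*.05, -2)) # composite of 5 sub-populations
pop3 = rnorm(n, sd = sd(pop2)) # homogeneous, sd matched to pop2
df = data.frame(pop1, pop2, pop3)
describe(df, skew = F)
sapply(df, percent_cutoff, cutoff = 1:4)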
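
The simulated proportions can also be cross-checked without sampling error, because the tail of a normal mixture is just the weighted sum of the component tails. The sketch below is written in Python rather than R so that it has no package dependencies. The helper names are my own, and the component weights and means are the ones used in the two simulations above.

```python
import math


def normal_sf(x, mean=0.0, sd=1.0):
    """Upper-tail probability P(X > x) for a normal distribution."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))


def mixture_sd(components):
    """SD of a mixture given (weight, mean, sd) tuples whose weights sum to 1."""
    grand_mean = sum(w * m for w, m, _ in components)
    variance = sum(w * (s ** 2 + (m - grand_mean) ** 2) for w, m, s in components)
    return math.sqrt(variance)


def mixture_sf(x, components):
    """Upper-tail probability of the mixture: weighted sum of component tails."""
    return sum(w * normal_sf(x, m, s) for w, m, s in components)


sim1 = [(0.80, 0, 1), (0.10, 1, 1), (0.10, -1, 1)]
sim2 = [(0.70, 0, 1), (0.10, 1, 1), (0.10, -1, 1), (0.05, 2, 1), (0.05, -2, 1)]

for label, comps in (("simulation 1", sim1), ("simulation 2", sim2)):
    sd = mixture_sd(comps)
    print(label, "composite sd:", round(sd, 3))
    for cutoff in range(1, 5):
        # columns: cutoff, pop1 (standard normal), pop2 (composite), pop3 (sd-matched normal)
        print(cutoff,
              round(normal_sf(cutoff), 6),
              round(mixture_sf(cutoff, comps), 6),
              round(normal_sf(cutoff, sd=sd), 6))
```

The exact values should land close to the simulated ones, and they make it clearer that the composite population's advantage over the sd-matched normal population in the far tail is a real property of the mixture and not sampling noise.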

Van Ijzendoorn, M. H., Juffer, F., & Poelhuis, C. W. K. (2005). Adoption and cognitive development: a meta-analytic comparison of adopted and nonadopted children’s IQ and school performance. Psychological bulletin, 131(2), 301.

It turns out that someone already did a meta-analysis of adoption studies and cognitive ability. It does not solely include cross-country transracial, but it does include some. They report both country of origin and country of adoption, so it is fairly easy to find the studies that one wants to take a closer look at. It is fairly inclusive in what counts as cognitive development, e.g. school results and language tests count, as well as regular IQ tests. They report standardized differences (d), so results are easy to understand.

They do not present aggregated results by country of origin however, so one would have to do that oneself. I haven’t done it (yet?), but the method to do so is this:

  1. Obtain the country IQs for all countries in the study. These are readily available from Lynn & Vanhanen (2012) or in the international megadataset.
  2. Score all the outcomes by using the adoptive country’s IQ. E.g. if the US has a score of 97, and Koreans adopted to that country get a d score of .16 in “School results” as they do in the first study listed, then this corresponds to a school IQ performance of 97 – (.16 × 15) = 97 – 2.4 = 94.6. Note that this assumes that the comparison sample is unselected (not smarter than average). This is likely false because adoptive parents tend to be higher social class and presumably smarter, so they would send their (adoptive) children to above average schools. Also be careful about norm comparisons because they often use older norms and the Flynn effect thus results in higher IQ scores for the adoptees.
  3. Copy relevant study characteristics from the table, e.g. comparison group, sample sizes, age of assessment and type of outcome (school, language, IQ, etc.).
  4. Repeat step (2-3) for all studies.
  5. BONUS: Look for additional studies. Do this by, a) contacting the authors of recent papers and the meta-analysis, b) search for more using Google Scholar/other academic search engine, c) look thru the studies that cite the included studies for more relevant studies.
  6. BONUS: Get someone else to independently repeat steps (2-3) for the same studies. This checks interrater consistency.
  7. Aggregate results (weighted mean of various kinds).
  8. Correlate aggregated results with origin countries’ IQs to check for spatial transferability, a prediction of genetic models.
  9. Do regression analyses to see if study characteristics predict outcomes.
  10. Write it up and submit to Open Differential Psychology (good guy) or Intelligence (career-focused bad guy). Write up to Winnower or Human Varieties if lazy or too busy.
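
Steps 2 and 7 above can be sketched as follows. This is only an illustration under stated assumptions: the function names are my own, the host-country IQ of 97 is the placeholder from the example in step 2 (not a looked-up value for Norway or the Netherlands), and the sign convention follows that example, i.e. a positive d is subtracted from the host country's IQ. The two d values and sample sizes are the “School results” entries for Andresen (1992) and Bunjes & de Vries (1988) in the table below.

```python
def d_to_iq(host_iq, d, sd=15.0):
    """Step 2: convert a standardized difference into an IQ-scale score."""
    return host_iq - d * sd


def weighted_mean(values, weights):
    """Step 7: weighted mean, here with adoptee sample sizes as weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# (origin, host IQ placeholder, d for "School results", N adoptees)
records = [
    ("Korea", 97.0, 0.16, 135),  # Andresen (1992)
    ("Korea", 97.0, 0.24, 118),  # Bunjes & de Vries (1988)
]

scores = [d_to_iq(host_iq, d) for _, host_iq, d, _ in records]
sizes = [n for _, _, _, n in records]
korea_mean = weighted_mean(scores, sizes)
print("per-study scores:", [round(s, 1) for s in scores])
print("N-weighted mean for Korea:", round(korea_mean, 2))
```

In a real analysis the records would be grouped by origin country first, each study would get its actual adoptive-country IQ, and one might prefer inverse-variance weights over raw sample sizes.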

The main results table

More likely, you are too lazy to do the above, but you want a sneak peek at the results. Here’s the main table from the paper.

Study | Country/region of study | Country/region of child’s origin | Age at assessment (years) | Age at adoption (months) | N adoption | N comparison | Preadoption status | Comparison group | Outcome (d)
Andresen (1992) | Norway | Korea | 12-18 | 12-24 | 135 | 135 | Not reported | Classmates | School results 0.16; Language 0.09
Benson et al. (1994) | United States | United States | 12-18 | < 15 | 881 | Norm | Not reported | Norm group | School results -0.36
Berg-Kelly & Eriksson (1997) | Sweden | Korea/India | 12-18 | < 12 | 125 | 9204 | Not reported | General population | School results 0.03 f/-0.04 m; Language -0.02 f/-0.05 m
Bohman (1970) | Sweden | Not reported | 12-18 | < 12 | 160 | 1819 | Not reported | Classmates | School results 0.09 f/0.07 m; Language 0.02 f/-0.02 m; Learning problems 0.00
Brodzinsky et al. (1984) | United States | United States | 4-12 | < 12 | 130 | 130 | Not reported | General population | School competence 0.62 f/0.51 m
Brodzinsky & Steiger (1991) | United States | Not reported | 9-19 |  | 441 | 6753 | Not reported | Population % | School failure 0.76
Bunjes & de Vries (1988) | Netherlands | Korea | 4-12 | 12-24 | 118 | 236 | Not reported | Classmates | School results 0.24; Language 0.22
Castle et al. (2000) | England | England | 4-12 | < 12 | 52 | Norm | Not reported | Standardized scores | School results -0.47; IQ 0.47
Clark & Hanisee (1982) | United States | Vietnam | 0-4 | 12-24 | 25 | Norm | Not reported | Standardized scores | IQ -2.42
Colombo et al. (1992) | Chile | Chile | 4-12 | 0-12 | 16 | ii | Undernutrition | Biological siblings | IQ -1.16
Cook et al. (1997) | Europe | Not reported | 4-8 | 12-24 | 131 | 125 | Not reported | General population | School competence 0.56 f/0.16 m
Dalen (2001) | Norway | Korea | 12-18 | 0-12 | 193 | 193 | Not reported | Classmates | School results 0.47 (Colombia), -0.07 (Korea); Language 0.43 (Colombia), -0.05 (Korea); Learning problems 0.50
Dennis (1973) | United States | Lebanon | 2-18 | > 24 | 85 | 51 | Institute | Institute children | IQ -1.28 (intraracial), -1.36 (transracial)
De Jong (2001) | New Zealand | Romania/Russia | 4-15 | 12-24 | 116 | Norm | Some problems | General population | School competence 0.65
Duyme (1988) | France | France | 12-18 | < 12 | 87 | 14951 | Not reported | General population | School results 0.00
Fan et al. (2002) | United States | United States | 12-18 |  | 514 | 17241 | Not reported | General population | School grades -0.02
Feigelman (1997) | United States | Not reported | 8-21 |  | 101 | 6258 | Not reported | General population | Education level -0.03
Fisch et al. (1976) | United States | United States | 4-12 | < 12 | 94 | 188 | No problems | General population | IQ 0.00; School results 0.50; Language 0.52
Frydman & Lynn | Belgium | Korea | 4-12 | 12-24 | 19 | Norm | Not reported | Standardized scores | IQ -1.68
Gardner et al. (1961) | United States | Not reported | 12-18 | < 12 | 29 | 29 | Not reported | Classmates | School achievement 0.09
Geerars et al. (1995) | Netherlands | Thailand | 12-18 | < 12 | 68 | Norm | Not reported | Population % | School results 0.19
Hoopes et al. (1970) | United States | United States | 12-18 |  | 100 | 100 | 1-2 shifts in placement | General population | IQ 0.12
Hoopes (1982) | United States | United States | 4-12 | < 12 | 260 | 68 | Nothing special | General population | IQ 0.18
Horn et al. (1979) | United States | United States | 3-26 | < 1 | 469 | 164 | No problems | Environment siblings | IQ 0.17/0.34/-0.05
W. J. Kim et al. (1992) | United States | Not reported | 12-18 |  | 43 | 43 | Not reported | General population | School results 0.74
W. J. Kim et al. (1999) | United States | Korea | 4-12 | < 12 | 18 | 9 | Nothing special | Environment siblings | School competence -0.39
Lansford et al. (2001) | United States | Not reported | 12-18 |  | 111 | 200 | Not reported | General population | School grades 0.46
Leahy (1935) | United States | United States | 5-14 | < 6 | 194 | 194 | Not reported | General population | School grades 0.00; IQ -0.06
Levy-Shiff et al. (1997) | Israel | Israel, South America | 7-13 | < 3 | 5050 | Norm | Not reported | Standardized scores | IQ -1.10 f/-2.00 m
Lien et al. (1977) | United States | Korea | 12-18 | > 24 | 240 | Norm | Undernutrition | Standardized scores | IQ 0.00
Lipman et al. (1992) | Canada | Not reported | 4-16 |  | 104 | 3185 | Not reported | General population | School performance -0.05 f/0.16 m
McGuinness & Pallansch (2000) | United States | Soviet Union | 4-12 | > 24 | 105 | 1000 | Long time in orphanages | Norm group | School competence 0.46
Moore (1986) | United States | United States | 7-10 | 12-24 | 23 | Norm | Not reported | Standardized scores | IQ -0.00 f/-1.00 m
Morison & Ellwood (2000) | Canada | Romania | 4-12 | 12-24 | 59 | 35 | Orphanages | General population | IQ 1.45 (combined)
Neiss & Rowe (2000) | United States | 75% United States | 12-18 |  | 392 | 392 | Not reported | General population | IQ 0.08
O’Connor et al. (2000) | England | Romania | 6 | 0-42 | 207 | Norm | Orphanage | Standardized scores | IQ -0.56 (combined)
Palacios & Sanchez (1996) | Spain | Spain | 4-12 | > 24 | 210 | 314 | Not reported | Institute children | School competence -0.18
Pinderhughes (1998) | United States | United States | 8-15 | 24-48 | 66 | 33 | Older children | General population | School competence 0.64 (combined)
Plomin & DeFries (1985) | United States | United States | 1 | 0-5 | 182 | 182 | Not reported | General population | IQ 0.14
Priel et al. (2000) | Israel | 75% Israel | 8-12 | 12-24 | 50 | 80 | Not reported | General population | School competence 0.77 f/1.12 m
Rosenwald (1995) | Australia | 73% Korea, Asia, South America | 4-16 | < 12 | 283 | 2583 | Not reported | General population | School performance -0.18
Scarr & Weinberg (1976) | United States | 88% United States | 4-16 | < 12 | 176 | 145 | Not reported | Environment siblings | IQ 0.75 (combined)
Schiff et al. (1978) | France | France | 4-12 | < 12 | 32 | 20 | Not reported | Biological siblings | School results -0.70
Segal (1997) | United States | United States | 4-12 | < 12 | 6 | 6 | Not reported | Environment siblings | IQ -1.14; IQ 2.67
Sharma et al. (1996) | United States | 81% United States | 12-18 | 12-24 | 4682 | 4682 | Not reported | General population | School results 0.37 (combined)
Sharma et al. (1998) | United States | United States | 12-18 | < 12 | 629 | 72 | Not reported | Environment | School competence -0.45 f/-0.61 m
Silver (1970) | United States | Not reported | 4-12 | < 3 | 10 | 70 | Not reported | General population | Learning problems 1.21
Silver (1989) | United States | Not reported | 4-12 |  | 39 | Perc. | Not reported | General population | Learning problems 1.38
Skodak & Skeels (1949) | United States | Not reported | 12-18 | < 6 | 100 | 100 | Not reported | Standardized scores | IQ -1.12
Smyer et al. (1998) | Sweden | Not reported | Adults | < 12 | 60 | 60 | Not reported | Biological (twin siblings) | Education level -0.82
Stams et al. (2000) | Netherlands | Sri Lanka | 4-12 | < 6 | 159 | Norm | Not reported | Standardized scores | School results 0.33; IQ -0.34 f/-0.73 m; Learning problems -0.05
Teasdale & Owen (1986) | Denmark | Not reported | Adults | < 12 | 302 | 4578 | Not reported | General population | IQ 0.35; Education level 0.32
Tizard & Hodges (1978) | England | Not reported | 8 | > 24 | 25 | 14 | Not reported | Restored children | IQ -0.40 (older), -0.62 (younger)
Tsitsikas et al. (1988) | Greece | Greece | 5-6 | < 12 | 72 | 72 | Not reported | Classmates | IQ 0.64; School performance 0.29; Language 0.30
Verhulst et al. (1990) | Netherlands | Europe | 12-18 | > 24 | 2148 | 933 | Not reported | General population | Perc. special education 0.25 f/0.29 m
Versluis-den Bieman & Verhulst (1995) | Netherlands | Europe | 12-18 | > 24 | 1538 | Norm | Not reported | General population | School competence 0.28 f/0.41 m
Wattier & Frydman (1985) | Belgium | Korea 89% | 4-12 | 12-24 | 28 | Norm | Not reported | Standardized scores | IQ -0.06
Westhues & Cohen (1997) | Canada | Korea 40%, India 40%, South America | 12-18 | 12-24 | 134 | 83 | Not reported | Environment siblings | School performance 0.13
Wickes & Slate (1997) | United States | Korea | > 18 | > 36 | 174 | Norm | Not reported | Norm group | School results 0.09 f/0.07 m; Language 0.07 f/0.03 m
Winick et al. (1975) | United States | Korea | 4-12 | > 24 | 112 | Norm | Malnourished | Standardized scores | School performance 0.00; IQ 0.00
Witmer et al. (1963) | United States | United States | 12-18 | < 12 | 484 | 484 | Nothing special | Classmates | School performance 0.00; IQ 0.00

I found this one a long time ago and tweeted it, but apparently forgot to blog it.

Odenstad, A., Hjern, A., Lindblad, F., Rasmussen, F., Vinnerljung, B., & Dalen, M. (2008). Does age at adoption and geographic origin matter? A national cohort study of cognitive test performance in adult inter-country adoptees. Psychological Medicine, 38(12), 1803-1814.

Background Inter-country adoptees run risks of developmental and health-related problems. Cognitive ability is one important indicator of adoptees’ development, both as an outcome measure itself and as a potential mediator between early adversities and ill-health. The aim of this study was to analyse relations between proxies for adoption-related circumstances and cognitive development.
Method Results from global and verbal scores of cognitive tests at military conscription (mandatory for all Swedish men during these years) were compared between three groups (born 1968–1976): 746 adoptees born in South Korea, 1548 adoptees born in other non-Western countries and 330 986 non-adopted comparisons in the same birth cohort. Information about age at adoption and parental education was collected from Swedish national registers.
Results South Korean adoptees had higher global and verbal test scores compared to adoptees from other non-European donor countries. Adoptees adopted after age 4 years had lower test scores if they were not of Korean ethnicity, while age did not influence test scores in South Koreans or those adopted from other non-European countries before the age of 4 years. Parental education had minor effects on the test performance of the adoptees – statistically significant only for non-Korean adoptees’ verbal test scores – but was prominently influential for non-adoptees.
Conclusions Negative pre-adoption circumstances may have persistent influences on cognitive development. The prognosis from a cognitive perspective may still be good regardless of age at adoption if the quality of care before adoption has been ‘good enough’ and the adoption selection mechanisms do not reflect an overrepresentation of risk factors – both requirements probably fulfilled in South Korea.

I summarize and comment on the findings below:

Which adoptees?

In total, 2294 inter-country adoptees were born outside the Western countries (Europe, North America and Australia) and adopted before age 10 years. Of these, 746 were born in South Korea [Korean adoptee (KA) group]. The remaining 1548 individuals were born in other countries, Non-Korean adoptee (NKA) group. India was the most common country of origin, followed by Thailand, Chile, Ethiopia, Colombia and Sri Lanka. These were the only donor countries for which the number of adoptees included in this study exceeded 100. The non-adopted population (NAP) group consisted of non-adopted individuals born in Sweden (n=330 896).

Unfortunately, no more detailed information is given, so an origin country IQ x adoptee IQ study (spatial transferability) cannot be done.

Main results


We see that Korean adoptees do better than Swedes, even on the verbal test. The superiority stops being statistically significant (p<alpha) when they control for various things. Notice that the disadvantage for non-Koreans becomes larger after control (their scores decrease and the Swedes’ scores increase).

Age at adoption matters, but apparently only for non-Koreans

age at adoption

This is in line with environmental cumulative disadvantage for non-Koreans. Alternatively, it is due to selection bias in that the less bright children (in the origin countries) are adopted later.

Perhaps the Koreans were placed with the better parents and this made them smarter?

Maybe, but the data shows that it isn’t important, even for transracial adoptives.

parental edu and IQ

Notice the clear relationship between child IQ and parental education for the non-adopted population. Then notice the lack of a clear pattern among the adoptives. There may be a slight upward trend (for Koreans), but it is weak (only .22 between lowest and highest education for Koreans, giving a d≈.10) and not found for non-Koreans (middle education-level had highest scores).

Still, one could claim that in Korea, smarter/normal children are given up for adoption, while in the other (non-Korean, non-Western) donor countries, this isn’t the case or even the opposite is the case. This study cannot address this possibility.

This study is much larger than other studies and also has a comparison group. The main problem with it is that it does not report data for more countries of origin. Only the (superior) Koreans are singled out.

It seems that no one has integrated this literature yet. I will take a quick stab at it here. It could be expanded into a proper paper later in case someone wants to and has the time to do that.


Lee Jussim (also blog) has done a tremendous job at reviewing the stereotype literature in recent years. In general he has found that stereotypes are mostly moderately to very accurate. On the other hand, self-fulfilling prophecies are probably real but fairly limited (e.g. work best when teachers don’t know their students well yet), especially in comparison to stereotype accuracy. Of course, these findings are exactly the opposite of what social psychologists, taken as a group, have been telling us for years.

The best short review of the literature is their book chapter The Unbearable Accuracy of Stereotypes. A longer treatment can be found in his 2012 book Social Perception and Social Reality: Why Accuracy Dominates Bias and Self-Fulfilling Prophecy (libgen).

Occupational success and cognitive ability

Society is more or less a semi-stable hierarchy based on mostly inherited personality traits, cognitive ability as well as some family-based advantage. This shows up in the examination of surnames over time in many countries, as documented in Gregory Clark’s book The Son Also Rises: Surnames and the History of Social Mobility (libgen). One example:

sweden stability

Briefly put, surnames are kind of an extended family and they tend to keep their standing over time. They regress towards the mean (not the statistical kind!), but slowly. This is due to outmarrying (marrying people from lower classes) and genetic regression (i.e. predicted via the breeder’s equation and due to the fact that narrow heritability and shared environment do not add up to 1).
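
As a rough illustration of the genetic regression part, the breeder’s equation says that the expected response R equals narrow heritability times the selection differential S (R = h2 × S). The sketch below iterates it over generations for a group that starts above the population mean. Everything numeric here is hypothetical: the starting mean of 115, the h2 of 0.8 and the assumption that the group marries only within itself are chosen for illustration and are not taken from Clark’s data. Outmarrying would speed the regression up, while shared environmental transmission would slow it down.

```python
def expected_offspring_mean(population_mean, parent_mean, h2):
    """Breeder's equation: offspring deviate from the population mean by h2 * S."""
    selection_differential = parent_mean - population_mean  # S
    return population_mean + h2 * selection_differential    # mean + R


def trajectory(population_mean, start_mean, h2, generations):
    """Expected group mean per generation, assuming mating within the group only."""
    means = [start_mean]
    for _ in range(generations):
        means.append(expected_offspring_mean(population_mean, means[-1], h2))
    return means


# Hypothetical elite surname group: starts 1 sd (15 points) above the mean.
for generation, mean in enumerate(trajectory(100.0, 115.0, h2=0.8, generations=5)):
    print(generation, round(mean, 1))
```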

It also shows up when educational attainment is directly examined with behavioral genetic methods. We reviewed the literature recently:

How do we find out whether g is causally related to later socioeconomic status? There are at least five lines of evidence: First, g and socioeconomic status correlate in adulthood. This has consistently been found for so many years that it hardly bears repeating[22, 23]. Second, in longitudinal studies, childhood g is a good correlate of adult socioeconomic status. A recent meta-analysis of longitudinal studies found that g was a better correlate of adult socioeconomic status and income than was parental socioeconomic status[24]. Third, there is a genetic overlap of causes of g and socioeconomic status and income[25, 26, 27, 28]. Fourth, multiple regression analyses show that IQ is a good predictor of future socioeconomic status, income and more, even controlling for parental income and the like[29]. Fifth, comparisons between full-siblings reared together show that those with higher IQ tend to do better in society. This cannot be attributed to shared environmental factors since these are the same for both siblings[30, 31].

I’m not aware of any behavioral genetic study of occupational success itself, but that may exist somewhere. (The scientific literature is basically a very badly standardized, difficult to search database.) But clearly, occupational success is closely related to income, educational attainment, cognitive ability and certain personality traits, all of which show substantial heritability and some of which are known to correlate genetically.

Occupations and cognitive ability

An old line of research shows that there is indeed a stable hierarchy in occupations’ mean and minimum cognitive ability levels. One good review of this is Meritocracy, Cognitive Ability, and the Sources of Occupational Success, a working paper from 2002. I could not find a more recent version. The paper itself is somewhat antagonistic toward the idea (the author hates psychometricians, and in particular dislikes Herrnstein and Murray, as well as Jensen) but it does neatly summarize a lot of findings.

occu IQ 1

occu IQ 2

occu IQ 3

occu IQ 4

occu IQ 5

occu IQ 6

occu IQ 7

The last one is from Gottfredson’s book chapter g, jobs, and life (her site, better version).

Occupations and cognitive ability in preparation

Furthermore, we can go a step back from the above and find SAT scores (almost an IQ test) by college majors (more numbers here). These later result in people working in different occupations, altho the connection is not a simple one-to-one. It is somewhere between many-to-many and one-to-one; we might call it few-to-few. Some occupations only recruit persons with particular degrees (doctors must have degrees in medicine) while others are flexible within limits. Physics majors often don’t work with physics at their level of competence, but instead work as secondary education teachers, in the finance industry, as programmers, as engineers and of course sometimes as physicists of various kinds such as radiation specialists at hospitals and meteorologists. But still, physicists don’t often work as child carers or psychologists, so there is in general a strong connection between college majors and occupations.

There is some stereotype research into college majors. For instance, a recently popularized study showed that beliefs about intellectual requirements of college majors correlated with female% of the field, as in, the fields perceived to be more difficult had fewer women. In fact, the perceived difficulty of the field probably mostly proxies the actual difficulty of the field, as measured by the mean SAT/ACT score of the students. However, no one seems to have actually correlated the SAT scores with the perceived difficulty, which is the correlation that is the most relevant for stereotype accuracy research.

There is a catch, however. If one analyses the SAT subtests vs. gender%, one sees that it is mostly the quantitative part of the SAT that gives rise to the SAT x gender% correlation. One can also see that the gender% correlates with median income by major.

quant-by-college-major-gender verbal-by-college-major-gender

Stereotypes about occupations and their cognitive ability

Finally, we get to the central question. If we ask people to estimate the cognitive ability of persons by occupation and then correlate this with the actual cognitive ability, what do we get? Jensen summarizes some results in his 1980 book Bias in Mental Testing (p. 339). I mark the most important passages.

People’s average ranking of occupations is much the same regardless of the basis on which they were told to rank them. The well-known Barr scale of occupations was constructed by asking 30 “psychological judges” to rate 120 specific occupations, each definitely and concretely described, on a scale going from 0 to 100 according to the level of general intelligence required for ordinary success in the occupation. These judgments were made in 1920. Forty-four years later, in 1964, the National Opinion Research Center (NORC), in a large public opinion poll, asked many people to rate a large number of specific occupations in terms of their subjective opinion of the prestige of each occupation relative to all of the others. The correlation between the 1920 Barr ratings based on the average subjectively estimated intelligence requirements of the various occupations and the 1964 NORC ratings based on the average subjective opined prestige of the occupations is .91. The 1960 U.S. Census of Population: Classified Index of Occupations and Industries assigns each of several hundred occupations a composite index score based on the average income and educational level prevailing in the occupation. This index correlates .81 with the Barr subjective intelligence ratings and .90 with the NORC prestige ratings.

Rankings of the prestige of 25 occupations made by 450 high school and college students in 1946 showed the remarkable correlation of .97 with the rankings of the same occupations made by students in 1925 (Tyler, 1965, p. 342). Then, in 1949, the average ranking of these occupations by 500 teachers college students correlated .98 with the 1946 rankings by a different group of high school and college students. Very similar prestige rankings are also found in Britain and show a high degree of consistency across such groups as adolescents and adults, men and women, old and young, and upper and lower social classes. Obviously people are in considerable agreement in their subjective perceptions of numerous occupations, perceptions based on some kind of amalgam of the prestige image and supposed intellectual requirements of occupations, and these are highly related to such objective indices as the typical educational level and average income of the occupation. The subjective desirability of various occupations is also a part of the picture, as indicated by the relative frequencies of various occupational choices made by high school students. These frequencies show scant correspondence to the actual frequencies in various occupations; high-status occupations are greatly overselected and low-status occupations are seldom selected.

How well do such ratings of occupations correlate with the actual IQs of the persons in the rated occupations? The answer depends on whether we correlate the occupational prestige ratings with the average IQs in the various occupations or with the IQs of individual persons. The correlations between average prestige ratings and average IQs in occupations are very high— .90 to .95—when the averages are based on a large number of raters and a wide range of rated occupations. This means that the average of many people’s subjective perceptions conforms closely to an objective criterion, namely, tested IQ. Occupations with the highest status ratings are the learned professions—physician, scientist, lawyer, accountant, engineer, and other occupations that involve high educational requirements and highly developed skills, usually of an intellectual nature. The lowest-rated occupations are unskilled manual labor that almost any able-bodied person could do with very little or no prior training or experience and that involves minimal responsibility for decisions or supervision.

The correlation between rated occupational status and individual IQs ranges from about .50 to .70 in various studies. The results of such studies are much the same in Britain, the Netherlands, and the Soviet Union as in the United States, where the results are about the same for whites and blacks. The size of the correlation, which varies among different samples, seems to depend mostly on the age of the persons whose IQs are correlated with occupational status. IQ and occupational status are correlated .50 to .60 for young men ages 18 to 26 and about .70 for men over 40. A few years can make a big difference in these correlations. The younger men, of course, have not all yet attained their top career potential, and some of the highest-prestige occupations are not even represented in younger age groups. Judges, professors, business executives, college presidents, and the like are missing occupational categories in the studies based on young men, such as those drafted into the armed forces (e.g., the classic study of Harrell & Harrell, 1945).

I predict that there is a lot of delicious low-hanging, ripe research fruit ready for harvest in this area if one takes a day or ten to dig up some data and read thru older papers, books and reports.

Tattoos and piercing. I haven’t found any evidence that this relates to intelligence or even creativity. On the other hand, what underlying factor of openness would it be an indication of?

I have. A long time ago, I tried to find a study of this. The only meaningful study I found was a small study of Croatian veterans:

Pozgain, I., Barkic, J., Filakovic, P., & Koic, O. (2004). Tattoo and personality traits in Croatian veterans. Yonsei medical journal, 45, 300-305.

The study has N≈100 and found a difference of about 5 IQ points. Not very convincing.

OKCupid data

In a still unpublished project, we scraped public data from OKCupid. We did this over several months, so the dataset has about N=70k. The dataset contains the public questions and users’ public answers to them, as well as profile information. Each question is multiple choice with 2 to 4 options.

Some of the questions can be used to make a rudimentary cognitive test with 3-5 items that has a reasonable sample size. This can then be used to calculate a mean cognitive score by answer category for all questions. Plots of the relevant questions are shown below. For interpretation, the SD of cognitive score is about 1, so the differences can be thought of as d values. There is some selection for cognitive ability (OKCupid is a more serious dating site mainly used by college students and graduates), so probably population-wide results would be a bit stronger in general. Worse, this selection gets stronger as the sample size decreases because smarter people tend to answer more questions. The effect is fairly small tho.
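The scoring approach can be sketched in a few lines of R. This is a toy example on simulated data, not the actual OKCupid analysis: the variable names, the five-item test and the three answer options are all hypothetical.

```r
# Toy sketch with simulated data; all names and numbers are hypothetical.
set.seed(1)
n = 2000
ability = rnorm(n)                            # latent trait, unobserved in practice

# five right/wrong items whose pass probability rises with the latent trait
items = sapply(1:5, function(i) rbinom(n, 1, plogis(ability)))

# sum score, standardized to SD = 1 so group differences read like d values
d = data.frame(cog = as.numeric(scale(rowSums(items))))

# a made-up multiple choice question whose answers depend weakly on the trait
d$answer = cut(ability + rnorm(n, sd = 2), breaks = c(-Inf, -1, 1, Inf),
               labels = c("A", "B", "C"))

# mean cognitive score and sample size by answer category
means = tapply(d$cog, d$answer, mean)
counts = table(d$answer)
print(round(means, 2))
print(counts)
```

Because the score is standardized, a gap between two answer categories can be read directly as an approximate d value.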

Tattoo results





Piercing results






See first: Some methods for measuring and correcting for spatial autocorrelation

Piffer’s method

Piffer’s method to examine the between group heritability of cognitive ability and height uses polygenic scores (either by simple mean or with factor analysis) based on GWAS findings to see if they predict phenotypes for populations. The prior studies (e.g. the recent one we co-authored) have relied on the 1000 Genomes and ALFRED public databases of genetic data. These datasets however do not have that high resolution, N’s = 26 and 50. These do not include fine-grained European populations. However, it is known that there is quite a bit of variation in cognitive ability within Europe. If a genetic model is true, then one should be able to see this when using genetic data. Thus, one should try to obtain frequency data for a large set of SNPs for more populations, and crucially, these populations must be linked with countries so that the large amount of country-level data can be used.
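As a minimal sketch of the simple-mean variant, the R code below builds a population-level polygenic score from allele frequencies and correlates it with a phenotype. Everything here is simulated: the populations, the SNPs, the frequencies and the phenotype are placeholders, so the numbers say nothing about real data.

```r
# Toy sketch: population-level polygenic scores from allele frequencies.
# Rows are hypothetical populations, columns are hypothetical GWAS hits.
set.seed(1)
n_pop = 8
n_snp = 20
latent = rnorm(n_pop)                         # simulated between-population signal

# frequency of the trait-increasing allele, nudged up or down by the signal
freq = plogis(matrix(rnorm(n_pop * n_snp), n_pop, n_snp) + latent)
rownames(freq) = paste0("pop", 1:n_pop)

# polygenic score = simple mean frequency of the trait-increasing alleles
pgs = rowMeans(freq)

# simulated population phenotype and its correlation with the score
phenotype = latent + rnorm(n_pop, sd = 0.5)
r_pgs = cor(pgs, phenotype)
print(round(r_pgs, 2))
```

With only a handful of populations the correlation is very noisy, which is one reason to want frequency data for many more populations.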

Genomic autocorrelation

The above would be interesting and probably one could find some data to use. However, another idea is to just rely on the overall differences between European populations, e.g. as measured by Fst values. Overall genetic differentiation should be a useful proxy for genetic differentiation in the causal variants for cognitive ability especially within Europe. Furthermore, because k nearest spatial neighbor regression is local, it should be possible to use it on a dataset with Fst values for all populations, not just Europeans.
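Here is a rough R sketch of how nearest neighbor regression could run on a distance table instead of coordinates. The distance matrix below is simulated as a stand-in for an Fst table, and the function name knn_predict is hypothetical; it is not the code I already have, just an illustration of the idea.

```r
# Toy sketch of k nearest neighbor regression on a distance matrix.
# The matrix stands in for pairwise Fst values; all values are simulated.
set.seed(1)
n = 30
coords = matrix(rnorm(n * 2), n, 2)           # hypothetical positions in "genetic space"
dist_mat = as.matrix(dist(coords))            # stand-in for an Fst table
y = coords[, 1] + rnorm(n, sd = 0.5)          # outcome that clusters among close units

# predict each unit from the mean outcome of its k nearest neighbors
knn_predict = function(dist_mat, y, k = 3) {
  sapply(seq_along(y), function(i) {
    d = dist_mat[i, ]
    d[i] = Inf                                # a unit is never its own neighbor
    mean(y[order(d)[1:k]])
  })
}

y_hat = knn_predict(dist_mat, y, k = 3)
print(round(cor(y, y_hat), 2))                # simple measure of autocorrelation
```

Since only the ranking of distances within each row matters, the same function works for any symmetric dissimilarity table, which is why Fst values can replace geographic distances.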

Since I have already written the R code to analyze data like this, I just need some Fst tables, so if you know of any such tables, please send me an email.


There is another table in this paper:

The study is also interesting in that they note that the SNPs that distinguish Europeans the most are likely to be genic, that is, the SNPs are located within a gene. This is a sign of selection, not drift. See also the same finding in

It is often said that polygenic traits are based on tons of causal variants each of which has a very small effect size. What is less often discussed is the distribution of these effect sizes, although this has some implications.

The first implication is statistical: we may want to modify our hyperprior if using a Bayesian approach. I’m not sure what the equivalent solution would be using a frequentist approach. I suspect the frequentist approach is based on assuming a normal distribution of the effects we are looking at and then testing them against the null hypothesis, i.e. looking at p values. Theoretically, the detection of SNPs may improve if we use an appropriate model.

The second implication is that to find even most of them, we need very, very large samples. The smaller effects probably can never be found because there are too few humans around to sample! Their signals are too weak in the noise. One could get around this by increasing the human population or simply collecting data over time as some humans die and new ones are born. Both have problems.

But what does the distribution of betas look like?

To find out, I downloaded the supplementary materials from Rietveld et al (2013). I used the EduYears one because college is a dichotomized version of this and dichotomization is bad. The datafile contains the SNP name (rs-number), effect allele, EAF (“frequency of the effect allele from the HapMap2-CEU sample”), beta, standard error and p value for each of the SNPs they examined, N = 2.3 × 10^6.

From these values, we calculate the absolute beta because we are interested in effect size, but not direction. Direction is irrelevant because one could just ‘reverse’ the allele.

One can plot the data in various ways. Perhaps the most obvious is a histogram, shown below.


We see that most SNPs have effect sizes near zero. Another way is to cut the betas into k bins, calculate the midpoint of each bin and the number of betas in them.


The result is fairly close to the histogram above. It is clear that the relationship is not linear. One can’t even see the difference between the counts for about half the bins. We can fix this by using a log scale for the y-axis:


We get the expected fairly straight line. It is however not exactly straight. Should it be? Is it a fluke? How do we quantify straightness/linearity?
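One simple way to quantify straightness is to compare a linear fit of the log counts with a fit that adds a squared term. The R sketch below does this on simulated bin counts with curvature built in on purpose; it is not the real SNP data, and the coefficients are arbitrary.

```r
# Toy sketch: is log(count) linear in the bin midpoint?
# Simulated counts with deliberate curvature; not the real SNP data.
set.seed(1)
midpoint = seq(0.001, 0.02, length.out = 20)
N = rpois(20, lambda = exp(13 - 150 * midpoint - 8000 * midpoint^2))

fit_lin = lm(log10(N) ~ midpoint)                    # straight line on the log scale
fit_quad = lm(log10(N) ~ midpoint + I(midpoint^2))   # allows curvature

r2_lin = summary(fit_lin)$r.squared
r2_quad = summary(fit_quad)$r.squared
print(c(linear = r2_lin, quadratic = r2_quad))

# F test for whether the squared term improves the fit
cmp = anova(fit_lin, fit_quad)
print(cmp)
```

A high R2 for the linear model means the points are close to a straight line, while a low p value for the squared term means the remaining curvature is unlikely to be a fluke.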

Perhaps if we increase our resolution, we would see something more. Let’s try 50 bins:


Now we get a bizarre result. Some of the bins are empty! Usually this means sampling, coding, or data error. I checked and could not find a problem on my end and it is not sampling error for the smaller betas. Perhaps they used some internal rounding system that prevents betas in certain regions. It is pretty weird. Here’s what the table output looks like:

> table(r$cut_50)

(-3.5e-05,0.0007]   (0.0007,0.0014]   (0.0014,0.0021]   (0.0021,0.0028]   (0.0028,0.0035]   (0.0035,0.0042] 
           174315            340381            321445                 0            292916            258502 
  (0.0042,0.0049]   (0.0049,0.0056]   (0.0056,0.0063]    (0.0063,0.007]    (0.007,0.0077]   (0.0077,0.0084] 
                0            217534            177858            139775                 0            107282 
  (0.0084,0.0091]   (0.0091,0.0098]   (0.0098,0.0105]   (0.0105,0.0112]   (0.0112,0.0119]   (0.0119,0.0126] 
            80258                 0             58967             42998                 0             30249 
  (0.0126,0.0133]    (0.0133,0.014]    (0.014,0.0147]   (0.0147,0.0154]   (0.0154,0.0161]   (0.0161,0.0168] 
            21929             14894                 0              9733              6899                 0 
  (0.0168,0.0175]   (0.0175,0.0182]   (0.0182,0.0189]   (0.0189,0.0196]   (0.0196,0.0203]    (0.0203,0.021] 
             4757              3305                 0              2535              1322               912 
   (0.021,0.0217]   (0.0217,0.0224]   (0.0224,0.0231]   (0.0231,0.0238]   (0.0238,0.0245]   (0.0245,0.0252] 
                0               502               319                 0               174               133 
  (0.0252,0.0259]   (0.0259,0.0266]   (0.0266,0.0273]    (0.0273,0.028]    (0.028,0.0287]   (0.0287,0.0294] 
                0                85                47                33                 0                14 
  (0.0294,0.0301]   (0.0301,0.0308]   (0.0308,0.0315]   (0.0315,0.0322]   (0.0322,0.0329]   (0.0329,0.0336] 
                5                 0                 4                 2                 0                 1 
  (0.0336,0.0343]    (0.0343,0.035] 
                1                 1

Thus we see that some of them are inexplicably empty. Why are there no betas with values between .0021 and .0028?
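One candidate explanation can be checked with a minimal R sketch. Assume (a guess, not something verified against the datafile) that the betas are reported rounded to 3 decimals and that the absolute values run from 0 to .035. Then a bin can only be populated if it contains a multiple of .001, and in the table above the empty bins are the ones that contain none. The vector x and the helper empty_bins are made up for this illustration.

```r
# Toy sketch: values rounded to 3 decimals, cut into equal-width bins.
# Assumption: absolute betas only take the values 0, .001, ..., .035.
x = rep((0:35) / 1000, times = 36:1)          # more small values than large ones

# number of empty bins when the range is cut into k equal-width bins
empty_bins = function(x, k) {
  sum(table(cut(x, breaks = k)) == 0)
}

res = sapply(c(10, 20, 30, 40, 50), function(k) empty_bins(x, k))
names(res) = c(10, 20, 30, 40, 50)
print(res)
```

With 30 or fewer bins each bin is wider than .001, so every bin catches at least one possible value. With 40 or 50 bins the bins are narrower than .001, so some of them fall between two possible values and stay empty, which matches the pattern I saw.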

We can try investigating some other number of cuts. I tried 10, 20, 30, 40 and 50. Only 40 and 50 have the problem. 30 is fine:


The pattern at the 50% higher resolution (30/20=1.5) is still somewhat curved, although probably not with a low p value.

Frequency-corrected betas?

An idea I had while writing this post. Correlations and other linear models are affected by base rates as well as betas. Unless they corrected for this (I don’t remember), some of the SNPs with lower betas probably have stronger betas but appear weak because their base rates are too high or too low. One could correct for this restriction of range if desired, which may change conclusions somewhat. What this would do is estimate the betas of the SNPs if they all had the same frequency.

Is there support for this idea? A simple test is to correlate frequency with absolute beta. This value should be negative. It is: r = -.006 [CI95: -.007 to -.005].

R code

# IO and libs -------------------------------------------------------------
library(pacman) #provides p_load
p_load(stringr, kirkegaard, psych, plyr, ggplot2, magrittr) #magrittr for %>%

#load data
r = read.table("SSGAC_EduYears_Rietveld2013_publicrelease.txt", sep = "\t", header = T)

# calculations ------------------------------------------------------------
#absolute values
#since we dont care about direction
r$Abs_Beta =  abs(r$Beta)

#find cut midpoints
#feature is missing
midpoints <- function(x, dp=2){
  #strip the brackets from labels like "(0.1,0.2]", then split on the comma
  lower <- as.numeric(gsub(",.*", "", gsub("\\(|\\[|\\)|\\]", "", x)))
  upper <- as.numeric(gsub(".*," , "", gsub("\\(|\\[|\\)|\\]", "", x)))
  return(round(lower+(upper-lower)/2, dp))
}

#make new dfs
cut_vec = c(10, 20, 30, 40, 50)
d_list = llply(cut_vec, function(x) {
  #add cuts to r
  tmp_var = str_c("cut_", x)
  r[tmp_var] = cut(r$Abs_Beta, breaks = x)
  #make a new df based of the table
  data.frame(N = table(r[[tmp_var]]) %>% as.numeric,
             midpoint = table(r[[tmp_var]]) %>% names %>% midpoints(., dp = 99))
}, .progress = "text")
names(d_list) = str_c("cut_", cut_vec) #add names

# plots --------------------------------------------------------------------
ggplot(r, aes(Abs_Beta)) + geom_histogram() + xlab("Absolute beta coefficient")

#loop plot
for (i in seq_along(d_list)) {
  #fetch data
  tmp_d = d_list[[i]]
  #linear y-axis
  p = ggplot(tmp_d, aes(midpoint, N)) + geom_point() + geom_smooth() + ylab("Number of SNPs") + xlab("Midpoint of range")
  name = str_c(names(d_list)[i], "_beta_N_linear.png")
  ggsave(name, plot = p)
  try({ #we try because log transformation can give an error
    p_log = ggplot(tmp_d, aes(midpoint, N)) + geom_point() + geom_smooth() + ylab("Number of SNPs") + xlab("Midpoint of range") + scale_y_log10() + geom_smooth(method = "lm", se = F, color = "red")
    name = str_c(names(d_list)[i], "_beta_N_log.png")
    ggsave(name, plot = p_log)
  })
}

# investigate -------------------------------------------------------------