Clear Language, Clear Mind

May 22, 2019

Immigrant IQ gains by generation: evidence from PIAAC

Filed under: Immigration, intelligence / IQ / cognitive ability — Emil O. W. Kirkegaard @ 01:50

Thanks to Chet Robie for sending me his study.

There’s been quite a bit of research on immigrant IQs. From within our network, Heiner Rindermann and James Thompson published a long analysis of scholastic achievement data:

Immigration, immigration policies and education of immigrants alter competence levels. This study analysed their effects using PISA, TIMSS and PIRLS data (1995 to 2012, N=93 nations) for natives’ and immigrants’ competences, competence gaps and their population proportions. The mean gap is equivalent to 4.71 IQ points. There are large differences across countries in these gaps ranging from around +12 to −10 IQ points. Migrants’ proportions grow roughly 4% per decade. The largest immigrant-based ‘brain gains’ are observed for Arabian oil-based economies, and the largest ‘brain losses’ for Central Europe. Regarding causes of native–immigrant gaps, language problems do not seem to explain them. However, English-speaking countries show an advantage. Acculturation within one generation and intermarriage usually reduce native–immigrant gaps (≅1 IQ point). National educational quality reduces gaps, especially school enrolment at a young age, the use of tests and school autonomy. A one standard deviation increase in school quality represents a closing of around 1 IQ point in the native–immigrant gap. A new Greenwich IQ estimation based on UK natives’ cognitive ability mean is recommended. An analysis of the first adult OECD study PIAAC revealed that larger proportions of immigrants among adults reduce average competence levels and positive Flynn effects. The effects on economic development and suggestions for immigration and educational policy are discussed.

One trouble with their study is the conflation of immigrant origins into a single group. This makes generational differences difficult to interpret, as they discuss:

Differences between immigrants of second and first generation. PISA offers competence results for immigrants of second generation (born in the country of assessment but whose parents were born in another country; NC=54) and first generation (born in another country and whose parents were born in another country). If students of the second generation show better achievement this would support the acculturation hypothesis. Generally, students of the second generation (see Table S4) show better results (SASM2-1 = 12, 1.84 IQ). The positive effect appears across the five PISA surveys (PISA 2000: SASM2-1 = 18; PISA 2003: SASM2-1 = 13; PISA 2006: SASM2-1 = 5; PISA 2009: SASM2-1 = 14; PISA 2012: SASM2-1 = 8). The gains are tending to become smaller with time, and in 10 years they became 7.33 points smaller. This is a hint that acculturation becomes weaker, e.g. due to creating own milieus leading to social and cultural separation, facilitated by increasing immigrant groups and certain world views such as Islamic religion. A set to zero of negative effects is here not necessary (no comparisons with natives).

However, there are countries with a negative first-to-second-generation migrant competence development: Qatar SASM2-1 = 57, Emirates SASM2-1 = 38, Chile SASM2-1 = 34, Latvia SASM2-1 = 25, Czech SASM2-1 = 24, Azerbaijan SASM2-1 = 22, New Zealand SASM2-1 = 19, Costa Rica SASM2-1 = 16, Trinidad SASM2-1 = 14, Ireland SASM2-1 = 3 and Jordan SASM2-1 = 2. These negative results show that acculturation is not the whole story in second- vs first-generation differences, but probably also that there are differences in origin among immigrant groups. For example, in Germany (only SASM2-1 = 1) first-generation immigrants could be immigrants from Eastern Europe, whereas second-generation immigrants come from Turkey. Only if second- and first-generation immigrants did not differ in such aspects can the difference unambiguously be interpreted as acculturation gains. However, the presented difference is contaminated with other differences and therefore it is not consistent across countries.

So, preferably, we would want a dataset that allows us to look at both student generation (using place of birth to code as 1st or later generation) and the parents’ country of origin. One problem arises when the parents originate from different countries. In Danish data, the mother’s origin is used if available. Chet Robie et al. have a recent study on just this question:

Globalization has led to increased migration and labor mobility over the past several decades and immigrants generally seek jobs in their new countries. Tests of general mental ability (GMA) are common in personnel selection systems throughout the world. Unfortunately, GMA test scores often display differences between majority groups and ethnic subgroups that may represent a barrier to employment for immigrants. The purpose of this study was to examine differences in GMA based on immigrant status in 29 countries (or jurisdictions of countries) throughout the world using an existing database that employs high‐quality measurement and sampling methodologies with large sample sizes. The primary findings were that across countries, non‐immigrants (n = 139,464) scored approximately half of a standard deviation (d = .53) higher than first‐generation immigrants (n = 22,162) but only one‐tenth of a standard deviation (d = .12) higher than second‐generation immigrants (n = 6,428). Considerable variability in effect sizes was found across countries as Nordic European and Germanic European countries evidenced the highest non‐immigrant/first‐generation immigrant mean differences and Anglo countries the smallest. Countries with the lowest income inequality tended to evidence the highest differences in GMA between non‐immigrants and first‐generation immigrants. Implications for GMA testing as a potential barrier to immigrant employment success and the field’s current understanding of group differences in GMA test scores will be discussed.

The PIAAC survey is another PISA-like survey, but for adults, and is also run by the OECD. The samples are quite large in general. The key results are:


So, sometimes the gaps are quite large as one would expect. They average the gap sizes by cultural groups and get:

Their value of d = 0.97 for first generation vs. later generation immigrants is similar to my own results (Kirkegaard 2013, 2019). The troublesome finding for hereditarianism here is that the immigrants show marked gains from first to second generation. To pick some of the most important countries:

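| Country | 1st gen gap (d) | 2nd gen gap (d) | Gain | Closure % | 1st gen n | 2nd gen n |
|---|---|---|---|---|---|---|
| Nordic | | | | | | |
| Denmark | 0.79 | 0.57 | 0.22 | 28% | 1446 | 67 |
| Norway | 0.85 | | | | 593 | |
| Sweden | 1.09 | 0.30 | 0.79 | 72% | 707 | 140 |
| Finland | 1.16 | | | | 180 | |
| Other Northwest Europe | | | | | | |
| Netherlands | 0.95 | 0.36 | 0.59 | 62% | 422 | 96 |
| Germany | 0.81 | 0.27 | 0.54 | 67% | 648 | 335 |
| Belgium | 0.85 | 0.62 | 0.23 | 27% | 354 | 103 |
| France | 0.91 | 0.24 | 0.67 | 74% | 737 | 361 |
| England | 0.57 | 0.29 | 0.28 | 49% | 625 | 222 |
| Average Nordic | 0.97 | 0.44 | 0.51 | 50% | 731.50 | 103.50 |
| Average other | 0.82 | 0.36 | 0.46 | 56% | 557.20 | 223.40 |
| Average all | 0.89 | 0.38 | 0.47 | 54% | 634.67 | 189.14 |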
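The gain and closure columns can be recomputed from the two gap columns. Below is a minimal sketch in Python, assuming that gain is the 1st generation gap minus the 2nd generation gap and that closure % is the gain divided by the 1st generation gap. Those definitions are inferred from the numbers in the table, not taken from Robie et al., and the names `gaps` and `closure` are only illustrative.

```python
# Gaps in standard deviation units (d), copied from the table.
# Only countries with both a 1st and a 2nd generation gap are included.
gaps = {
    "Denmark": (0.79, 0.57),
    "Sweden": (1.09, 0.30),
    "Netherlands": (0.95, 0.36),
    "Germany": (0.81, 0.27),
    "Belgium": (0.85, 0.62),
    "France": (0.91, 0.24),
    "England": (0.57, 0.29),
}


def closure(first_gap, second_gap):
    """Return (gain, share of the 1st generation gap that is closed).

    Assumed definitions: gain = first_gap - second_gap,
    share = gain / first_gap.
    """
    gain = first_gap - second_gap
    return gain, gain / first_gap


rows = {country: closure(*pair) for country, pair in gaps.items()}

for country, (gain, share) in rows.items():
    print(f"{country:12s} gain={gain:.2f} closure={share:.0%}")

# Unweighted mean across countries, i.e. ignoring sample sizes.
mean_closure = sum(share for _, share in rows.values()) / len(rows)
print(f"mean closure: {mean_closure:.0%}")
```

The mean here is unweighted, so Denmark's 67 second generation respondents count as much as France's 361. A sample-size-weighted mean would be a reasonable alternative.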


Thus, about 50% of the 1st generation gap disappears by the 2nd generation. This finding seems at odds with genetic models, but it’s worth spelling out the assumptions needed to reach this prediction, to understand what it tells us about the model. Specifically, if we assume that:

  1. The national origin composition does not differ by generation. This would fail if, e.g., the earlier wave of immigrants came from higher IQ countries.
  2. The sub-national composition for intelligence does not differ by generation. This would fail if earlier immigrants were more elite than later immigrants even though they came from the same country, or if there was selective back migration, e.g. such that less intelligent first generation immigrants tended to move elsewhere/back, leaving more elite immigrants to produce the second generation (this has been found before for the US).
  3. There is measurement invariance between the first and second generations, i.e. all score differences are due to trait differences.
  4. There are no identification issues with ethnic attrition/out-marriage, e.g. 2nd generation immigrants including people who have one native parent.
  5. Differences aren’t due to age or sex confounding.

Under these conditions, I think we can infer that differences between the immigrant generations are due to environmental causes that boost intelligence. Unfortunately, these conditions are rather restrictive, which makes this a weaker test of the hereditarian model. If any of these assumptions is partly false, alone or in combination, some of the difference between the generations would actually be genetic in origin, not environmental as it might appear. Thus, a better way to approach these issues would be to include polygenic scores for the population in a regression, so that one would have a direct control for genetic confounding. Though polygenic scores would only be able to control partially, due to ‘missing heritability’/measurement error, they would enable one to detect the existence of a genetic difference between generations even if one cannot control for it effectively.

Still, the assumptions above are testable and should be tested in future studies. The PIAAC survey has of course conducted some of its own measurement invariance tests (as mentioned in Robie et al), so perhaps this takes care of (3). I am not sure without having looked in detail, but it’s curious that environmentalists would be the ones in this case to have to assume MI to argue their case, the opposite of the usual situation.

May 2, 2019

FAQ for “Cognitive ability and political preferences in Denmark” Kirkegaard, Bjerrekær, Carl (2017)

Filed under: intelligence / IQ / cognitive ability,Political science — Emil O. W. Kirkegaard @ 04:38

Various critics of Noah Carl are being asked to produce something that shows they have read and understand his research, and they appear to be struggling. In an attempt to rectify this, we have Danish researcher Stine Møllegaard, a professor at Copenhagen University, making some criticisms of our study on Twitter. (Note that she subsequently deleted all of these comments.) Unfortunately, she does not seem that familiar with research in this area, since her points do not make a lot of sense. Stine is a moderately qualified critic by virtue of her own field, sociology, but does not appear to have published anything on psychometrics and political attitudes (the topic of our paper).


I had work to do. I am also very sceptical about the “danish data” used in multiple papers in OpenPsych, some of which Noah also co-authored. It requires quite a lot of digging to find the actual description of the data – and I find it curious. Have you read any of the papers?

A strange claim considering that all the data is public and so are the surveys given to people in the case of our study. The link to the data is given both on the OpenPsych website and in the paper PDF:

Unknown pollster

“1) It’s collected via a service I’ve never heard of (and I got quite some experience with quantitative data research in Denmark). This is not critical in itself, but strange..”

Not particularly strange. Has Stine heard of every internet pollster? In fact, when we looked for data collection, we reached out to multiple Danish pollsters. There was a huge price difference between the options because some suggested doing phone or face-to-face interviews (expensive! one estimated 100k DKK). We settled on the relatively unknown Survee because it relied on online collection, which is similar to other highly utilized English language pollsters like MTurk or Prolific.


2) This service pays participants to participate – probably motivating some types of participants rather than others. How are basic demographic measures such as occupation, income, education, marital status related to the likelihood of being on such a site?

Saying that it is representative is a bit of a stretch, imo. This is further confirmed by the rather large number of participants the co-authors themselves note “… did not comply with instructions and filled out the questions seemingly at random.”

It is closely nationally representative on several metrics, as we reported by comparing with data from the Danish statistics agency:

Because our sample was essentially a self-selected sub-set of another sample, it might be biased. As a check on representativeness, we calculated mean values of relevant variables for responders and non-responders from the original sample. As Table 1 indicates, responders were slightly younger, had slightly lower cognitive ability and were slightly less educated than non-responders. There was, therefore, some selection bias in responding to our survey. However, in all cases the differences were quite small (e.g. d = .23 for cognitive ability) and the subset was thus still fairly representative of the general population.

She continues:

3) The authors “openly stated the purpose of the study in the introduction of the survey” which might further have contributed to some selection into who would want to participate in such a survey and their answers.

Would she rather we don’t state the purpose of the study? It’s a rather odd criticism since most studies give a general introduction in the beginning of the survey. She presents no evidence that this would cause important selection bias.

Did responders understand the task?

4) The authors openly admit that their followup survey showed that a group of participants (size unknown) had misunderstood some of the questions, “however they may have been lying.”

There is ample research (see this random example) that representative surveys often have issues with responders not understanding what they are supposed to do. The standard approach in studies is to ignore this, but we decided to investigate it. We went to extra lengths to ensure that participants understood the tasks by excluding data from those who didn’t.

The cognitive test

5) The authors use a measure of cognitive ability they themselves developed and validated – but on a completely different group: namely primary school students. I would be very surprised if primary school students are representative of 30-39 year old Danes in general.

The authors admit that “did not have other criteria variables than age and grade level to validate the test against”. A bit of a stretch to use it as a general measure of cognitive abilities.

Particularly a person (Noah) who has done research on intelligence should be more critical about how to measure cognitive ability.

Stine is not familiar with the test. This was in fact a Danish translation of the ICAR test, which has been well-validated on tens of thousands of people and used by numerous other researchers. We have used it not once, but multiple times in Danish samples (with middle schoolers, high school students, and adults), and there is no evidence it does not work as intended generally speaking. Obviously, such a short test (9 items) will have fairly low reliability compared to multi-hour tests given in person, but that’s the reality of survey data: we can’t feasibly give everybody a full Wechsler assessment. For comparison, there are hundreds of studies using the 10-item vocabulary test found in the ANES and GSS survey datasets. Thus, her criticism is not on target and applies equally well to hundreds of other studies by other researchers.


6) For a group of researchers publishing in their own journal in the spirit of open science, it is really curious that they cannot disclose who is funding their research; “This research was supported by two anonymous research contributions.”

Stine appears oblivious to the political bias of the field and the media. Suppose I got private funding from some person who was interested in immigration but not heavily involved in politics. Now, I could report this, and the social justice activists would then seek out that person and try to ruin their reputation by virtue of their association with me or the study. There is no legal mandate to report private funding, and no ethical mandate to do so. The only case where funding sources really should be reported is research with commercial implications, which ours obviously does not have.


I’m just saying… This is really not what I associate with well-conducted research. I would expect more of my students. Maybe I’m harsh? But when studying controversial questions, this is even more important, imo.

And here we get the truth: Stine has higher requirements for research she or other left-wingers don’t like: that’s what “controversial” means in this context. Selectively applying higher standards is the norm of political bias in science. There are various lines of evidence that show this; one can e.g. read the recent edited book by Crawford and Jussim (2017), or the long target article from 2015 (Duarte et al 2015).

Real Peer Review

OK. Well then he shouldn’t have a problem publishing these papers in actual peer-reviewed journals? As in _not_ reviewed by his friends, but researchers in the field?

OpenPsych is actually peer reviewed — no need for the motte and bailey. The reviewers on this particular paper are listed on the website: Robert L. Williams, Peter Frost, L. J. Zigerell. Zigerell is a professor of political science, Peter Frost is an anthropologist with numerous publications and a long running interest in IQ research, and Williams is retired but is well read on IQ research as evidenced by his publication of a paper in Intelligence previously, which is the highest impact journal of this field and run by Elsevier.

April 16, 2019

“a kind of social paranoia, a belief that mysterious, hostile forces are operating to cause inequalities in educational and occupational performance, despite all apparent efforts to eliminate prejudice and discrimination”

Filed under: Ethics, intelligence / IQ / cognitive ability, Sociology — Emil O. W. Kirkegaard @ 07:12

Quoting from Arthur Jensen’s book Educability and group differences (1973). He was being criticized for postulating genetic causes, which some critics think would cause great social harm if they were to be believed. That is, Jensen was replying to the Turkheimers of yesteryear.

The scientific task is to get at the facts and properly verifiable explanations. Recommendations for dealing with specific problems in educational practice, and in social action in general, are mainly a social problem. But would anyone argue that educational and social policies should ignore the actual nature of the problems with which they must deal? The real danger is ignorance, and not that further research will result eventually in one or another hypothesis becoming generally accepted by the scientific community. In the sphere of social action, any theory, true or false, can be twisted to serve bad intentions. But good intentions are impotent unless based on reality. Posing and testing alternative hypotheses are necessary stepping stones toward a knowledge of reality in the scientific sense. To liken this process to screaming ‘FIRE . . . I think’ in a crowded theatre (an analogy drawn by Scarr-Salapatek, 1971b, p. 1228) is thus quite mistaken, it seems to me. A much more subtle and complete expression of a similar attitude came to me by way of the comments of one of the several anonymous reviewers whose judgments on the draft of this book were solicited by the publishers. It summarizes so well the feelings of a good number of scientists that it deserves to be quoted at length.

The author tends to show marked impatience with those individuals who insist that in the race-IQ controversy genetic arguments for the difference must be conclusively demonstrated before the scientific community accords them standing. He points out that for any number of other questions the scientific community, when confronted by a body of what might be called substantial circumstantial or correlational evidence, would adopt the position that even though an hypothesis stood not conclusively proven it was most probably right. Furthermore, he indicates that this view, because it offers a convenient theoretical framework in which to fit observations and is congruent with the observations of racial differences in just about everything else, would also recommend itself to the scientific community. Emphasizing all these considerations, he suggests that the scientific community has failed to endorse the genetic hypothesis as the most likely explanation for difference in test performance by different racial populations merely because the area of race relations is a highly charged one. In several parts of the text he either directly or indirectly indicts the scientific community for showing such extreme caution in its reluctance to embrace the genetic hypothesis he so ably promulgates.

This is an indictment to which the community of scientists should plead guilty as charged. Unlike more esoteric and abstract questions an endorsement of the admittedly unproven, but in Professor Jensen’s view highly plausible, genetic hypothesis will likely be picked up by those who make public policy and by the public they serve, and viewed as established truths rather than plausible hypotheses. It is not difficult to see such a public leaning toward the genetic hypothesis by the scientific community being used to justify all sorts of racially restrictive policies. It does not really matter that the legislature who passed such restrictive legislation did not really understand that the scientific community was only collectively betting on a hunch rather than handing down truth. The problem is that wherever science has a large and direct interface with the social policy one must always weigh the potential social effects of saying as a scientist that one subscribes to this or that unproven hypothesis. The scientific community has, I believe, rightly felt that subscription to one or the other presently competing hypothesis has implications that extend beyond science into areas of social concern.

It is likely that public policies based on the belief that differences in the environment account for the black-white difference would differ from policies based on the alternative genetic hypothesis. A plausible extension of the genetic hypothesis would suggest that the under-representation of blacks in many areas of society is, as one might expect, because the pool of able individuals is inherently proportionately lower in that population than in the white population and other racial populations. Subscription to the environmental view suggests that improvement of the environment, extension of opportunity and efforts to compensate for obvious educational and economic disadvantages if sufficiently massive and continuous will narrow and eliminate that gap. Whichever of these views is correct, the one adopted by the larger society could have an important effect on the direction and goals of public policies. Many who have examined the history of race relations in the United States and round the world feel that of the two hypotheses, neither of which stands proven, subscription to the genetic one carries considerable potential for mischief. It is for this reason such emphasis has been placed on exposing the difficulties of the work that must be done before the genetic view is raised from the level of hypothesis to the status of scientifically demonstrated fact.

I take little exception to this statement at its face value, and none at all to its spirit. The interesting point is that I have not urged acceptance of an hypothesis on the basis of insufficient evidence, but have tried to show that the evidence we have does not support the environmentalist theory which, until quite recently, had been clearly promulgated as scientifically established. By social scientists, at least, it was generally unquestioned, and most scientists in other fields gave silent assent. I have assembled evidence which, I believe, makes such complacent assent no longer possible and reveals the issue as an open question calling for much further scientific study. My critics cannot now say that this was always known to be the case anyway, for they were saying nothing of the kind prior to the appearance of my 1969 Harvard Educational Review article. It was just my questioning of orthodox environmental doctrine that set off such a furore in the social science world.

But my chief complaint with the attitudes expressed in the above quotation is that they do not indicate the full complexity of the options we face. Even the simplest formulation of the issue requires a 2 x 2 table of possible consequences, as follows:

| Prevailing hypothesis | Reality: Genetic | Reality: Environmental |
|---|---|---|
| Genetic | True G | False G |
| Environmental | False E | True E |


(It is understood that a genetic hypothesis does not exclude environmental variance, while the environmental hypothesis excludes a genetic difference.) The aim of science clearly is to rule out False G and False E, that is to say, it strives to determine which hypothesis accords with reality, so that the result of sufficient research would be either True G or True E. What the practical implications of True G or True E would be is another matter. But apparently, for some persons the crucial alternative is not between conditions True G and E, on the one hand, versus False G and E, on the other, which are the alternatives of interest to science, but between True G and False G (which are usually viewed as indistinguishable and equally bad), on the one hand, versus False E and True E on the other, which are seen as equally good. This amounts to saying that the hypothesis that prevails, whether true or false, is more important than the reality. Agreed, we would prefer the outcome True E to True G; but this wish has often led also to a preference for False E over True G. Since by subscription to the environmental hypothesis the two preferable conditions, False E and True E, prevail, there is no incentive to research that would decide between them. It is gratuitously assumed that False E is also good, or at worst harmless, while False G, to say nothing of True G, would give rise to incalculable ‘mischief’. False E is made to appear a more benign falsehood than False G. This may be debatable. What seems to me to be much less debatable is the choice between True and False, whether E or G, even acknowledging the preference for True E. Is there less ‘mischief’ in False E than in True G? When the question is viewed in this way, it seems to me, it places the burden upon research rather than upon personal preference and prejudice, and that, to my way of thinking, is as it should be.

Is the choice between False G and False E worthy of debate? When all the arguments are lined up so as to favor False E over False G (and sometimes even over True G), the importance of the scientific question seems moot. But is False E really all that much preferable to False G? Dwight Ingle (1967, p. 498) suggested that it may not be:

When all Negroes are told that their problems are caused solely by racial discrimination and that none are inherent within themselves, the ensuing hatred, frustration behavior – largely negative and destructive – and reverse racism become forms of social malignancy. Is the dogma which has fostered it true or false?

False E could generate a kind of social paranoia, a belief that mysterious, hostile forces are operating to cause inequalities in educational and occupational performance, despite all apparent efforts to eliminate prejudice and discrimination – a fertile ground for the generation of frustrations, suspicions and hates. Added to this is the massive expenditure of limited resources on misguided, irrelevant and ineffective remedies based upon theories not in accord with reality, and the resultant shattering of false hopes. The scientific consequences of False E, if it is very strongly preferred to False G or True G, is the discouragement of scientific thinking and research on such problems. A penalty is attached to scientific skepticism and dissent, and there is a denigration and corruption of the very tools and methods that can lead to better studies of the problems, such as we are seeing presently in the ideological condemnation of psychometrics by persons with no demonstrated competence in this field and with no ideas for advancing this important branch of behavioral science.

Would True G really make for the social catastrophe that some persons seem to fear would ensue? Since this has been an unquestionable assumption underlying much of the opposition to investigation in this area, little, if any, serious sociological thought has been given to the possible problems that might be expected to arise when two or more visibly distinguishable populations, with different distributions of those abilities needed for competing in the performances most closely connected with the reward system of a society, are brought together to share in the same territory and culture. What arrangements would be most likely to make such a situation workable to everyone’s satisfaction? It has often been assumed that such a combination of two or more disparate populations could not work; hence the fear of True G and the preference for False E rather than to take the risk of doing research that might result in True E but could also result in True G – a risk that many seem unwilling to take. There is indeed still much room for philosophic, ethical, sociological and political thought and discussion on these issues. It was with respect to the scientific investigation of such difficult human problems that Herbert Spencer remarked, ‘. . . the ultimate infidelity is the fear that the truth will be bad.’

The cited study is:

What the quotation above suggests is a kind of lying for racial justice. This position is very curious because the same kind of people who write this sort of thing also complain when Christians are lying for Jesus. Indeed RationalWiki, always the best example of irrationality, has a page attacking religious people for pious fraud.

The ‘attitude-achievement paradox’

Filed under: Education, intelligence / IQ / cognitive ability — Emil O. W. Kirkegaard @ 06:05
Speaking of philosophy of science, one of the things about new paradigms is that they are supposed to solve anomalies of older paradigms. A while ago, Dalliard (2014) mentioned (in a footnote!) one of these, namely the attitude-achievement paradox. There are quite a number of studies on it:
Screenshot from 2019-04-16 06-51-50.png
The brief version is that black, Hispanic etc. underachievement relative to white levels is thought to be caused by low expectations (‘negative stereotypes’) leading to low aspirations. However, when the supposed mediator is measured, it turns out that blacks actually have higher educational aspirations than whites, and thus the proposed mediator completely fails, being in the wrong direction. Blacks apparently, as a group, suffer from over-confidence, which is another confirmed stereotype.
I don’t think the previous literature has pointed out how the addition of group average differences to the model makes this anomaly go away. It’s another pointless epicycle in the blank slate model.

April 13, 2019

Death by affirmative action: race quotas in medicine

Filed under: intelligence / IQ / cognitive ability, Medicine — Emil O. W. Kirkegaard @ 08:57

See also previous 2017 post.

I have been repeatedly asked about this topic, so here is a post that covers the basics. The argument goes like this:

  • Affirmation action is used in admission to medical schools, i.e. they practice race quotas in favor of less intelligent races, in particular blacks and hispanics, and discriminate against whites and especially (East) Asians.
  • Affirmation action results in less intelligent people getting into schools.
  • Less intelligent people tend to drop out more, so we expect and do see higher drop out rates among blacks.
  • Even with differential drop-out rates by race, affirmative action results in less intelligent people graduating and eventually practicing medicine.
  • Less intelligent people have worse job performance in every job type. This is especially the case for highly complex jobs such as being a doctor which results ultimately in patient suffering including untimely death.
  • Thus, bringing it all together, affirmative action for race results in less intelligent blacks and hispanics being admitted to medical schools, and when they don’t drop out, they end up practicing medicine, and in doing so, they do a worse job than white and Asian people would have done, thereby killing people by incompetence.

Given what we know about race, intelligence, and job performance, the conclusion is essentially inevitable. However, as far as I know, no one has actually published a study on malpractice rates by race, or some other direct measure of patient harm. You can probably imagine why that is the case. However, we do have a case study of a similar thing happening in the police (Gottfredson, 1996).

But let’s take things one step at a time.

Affirmative action … in action

Ample data exists about this. Mark Perry has documented it well:

The MCAT (Medical College Admission Test) is a cognitive test used for medical schools, basically their version of the SAT/ACT. I don’t know if anyone has published a study relating this one directly to an IQ test, but we have such data for the other achievement/entrance tests: SAT, ACT, and KTEA.

The table shows that someone with about the same score in the highlighted column has ~22.5% chance of admittance if Asian, and 81% if black. One can also fit a model, and calculate the specific race benefits in odds ratios, as was done in this report:

Medical school acceptance and matriculation rates

The table below shows acceptance and matriculation (people who actually enroll) rates by race (source):

So we see that blacks and hispanics have lower acceptance/matriculation rates. This is because while they gain from affirmative action, they are also worse applicants, and worse than the favoritism granted by affirmative action.

Competency measures

Tables below show average competencies by race for applicants and matriculates (same source as above):

Unhelpfully, no standard deviations are supplied for these measures, so one cannot just read the Cohen’s d values off the table. These can be found in the table below (source):

These are for the total population of applicants, not whites alone, so the Cohen’s d will be slightly underestimated by using these (say, 10 to 15%). So, then we calculate the d gaps by race to white (I used matriculates, as is most relevant), which are:

Measure             d gap to Whites
                    Black    Hispanic   Asian
MCAT CPBS           -0.63    -0.52       0.30
MCAT CARS           -0.74    -0.74      -0.04
MCAT BBLS           -0.63    -0.44       0.15
MCAT PSBB           -0.52    -0.52       0.19
Total MCAT          -0.73    -0.63       0.16
GPA Science         -0.74    -0.44       0.02
GPA Non-Science     -0.54    -0.36       0.04
GPA Total           -0.71    -0.44       0.03


Thus, we see that black and Hispanic enrolled students are quite far below whites in academic talent, and Asians somewhat above.
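
For readers who want to redo this kind of calculation, here is a minimal sketch of how a d gap is computed from group means and a standard deviation. The numbers are hypothetical placeholders, not values from the tables above, and the function name `cohens_d` is my own. It also illustrates why dividing by the SD of the whole applicant pool, rather than a somewhat smaller within-group SD, makes the gap look a bit smaller.

```python
def cohens_d(mean_group: float, mean_reference: float, sd: float) -> float:
    """Standardized mean difference: (group mean - reference mean) / SD."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (mean_group - mean_reference) / sd


# Hypothetical numbers for illustration only (not taken from the source tables).
mean_group, mean_reference = 505.0, 510.0
sd_all_applicants = 8.0   # SD of the pooled applicant population
sd_reference_only = 7.0   # an assumed, somewhat smaller within-group SD

d_pooled = cohens_d(mean_group, mean_reference, sd_all_applicants)
d_within = cohens_d(mean_group, mean_reference, sd_reference_only)

print(round(d_pooled, 3), round(d_within, 3))
```

With these made-up inputs the pooled-SD gap is smaller in absolute size than the within-group-SD gap, which is the direction of bias described above. The size of the bias depends entirely on the two SDs assumed.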

Differential drop-out rates and competence among graduates

Less capable students drop out more, so this tends to even out group gaps among admitted students. However, the process is not 100% effective, so we still end up with gaps among the graduates. The figure and table below show the differential drop-out (source):

Finding competence data for graduates was more tricky, but I was able to find this figure (source):

Thus, we see that the rank order of scores among different races is preserved among graduates as well. The placed/unplaced refers to whether the graduates were able to find a residency.

Medical competency

Instead of relying on more academically broad MCAT, we can look at various medical school tests and exams (which are of course well predicted by the MCAT). First table is from here and concerns a competency test taken at medical school.

The adjusted score refers to the explained gaps after controlling for prior MCAT score and GPA. Women do worse in these old data, as they generally do on high stakes testing.

From the same Michigan report as above, we get a similar result:

The 75th centile black student in medical school does about as well as the 25th centile white and Asian student.

There’s also a similar report for Maryland. See also UK results in this meta-analysis.

MCAT et al predict actual medical performance

There are a variety of studies on this question, and a few reviews (e.g. here, here). Here’s an example of a primary study:


To establish whether successful certifying examination performances of doctors are associated with their patients’ mortality and length of stay following acute myocardial infarction.


Risk adjusted mortality and survivors’ length of stay were compared for doctors who had satisfactorily completed training in internal medicine or cardiology and attempted the relevant examination. Specifically, the study investigated the joint effects of hospital location, availability of advanced cardiac care, doctors’ specializations, certifying examination performances, year certification was first attempted and patient volume.


Data on all acute myocardial infarctions in Pennsylvania for the calendar year 1993 were collected by the Pennsylvania Health Care Cost Containment Council. These data were combined with physician information from the database of the American Board of Internal Medicine.


Holding all variables constant, successful examination performance (i.e. certification in internal medicine or cardiology) was associated with a 19% reduction in mortality. Decreased mortality was also correlated with treatment in hospitals located outwith either rural or urban settings and with management by a cardiologist. Shorter stays were not related to examination performance but were associated with treatment by high volume cardiologists who had recently finished training and who cared for their patients in hospitals located outwith rural or urban settings.


The results of the study add to the evidence supporting the validity of the certifying examination and lend support to the concept that fund of knowledge is related to quality of practice.

There are well-known race differences in job performance in general

Even if we didn’t have the other data, we would be well within reason to conclude that there are race differences in job performance among doctors, because we have large meta-analyses of race differences in work performance in general:

Race differences in medical performance

Thus, finally, we get to the last part. Unfortunately, I have not been able to find a study that looked at individual doctor’s race and malpractice and other patient harms measures. One can however use the race composition of medical schools as a proxy. There is a study of medical schools showing they have consistently different rates, but it didn’t investigate the link to school level metrics such as mean MCAT/GPA among students.

  • Waters, T. M., Lefevre, F. V., & Budetti, P. P. (2003). Medical school attended as a predictor of medical malpractice claims. BMJ Quality & Safety, 12(5), 330-336.

Objectives: Following earlier research which showed that certain types of physicians are more likely to be sued for malpractice, this study explored (1) whether graduates of certain medical schools have consistently higher rates of lawsuits against them, (2) if the rates of lawsuits against physicians are associated with their school of graduation, and (3) whether the characteristics of the medical school explain any differences found.

Design: Retrospective analysis of malpractice claims data from three states merged with physician data from the AMA Masterfile (n=30 288).

Study subjects: All US medical schools with at least 5% of graduates practising in three study states (n=89).

Main outcome measures: Proportion of graduates from a medical school for a particular decade sued for medical malpractice between 1990 and 1997 and odds ratio for lawsuits against physicians from high and low outlier schools; correlations between the lawsuit rates of successive cohorts of graduates of specific medical schools.

Results: Medical schools that are outliers for malpractice lawsuits against their graduates in one decade are likely to retain their outlier status in the subsequent decade. In addition, outlier status of a physician’s medical school in the decade before his or her graduation is predictive of that physician’s malpractice claims experience (p<0.01). All correlations of cohorts were relatively high and all were statistically significant at p<0.001. Comparison of outlier and non-outlier schools showed that some differences exist in school ownership (p<0.05), years since established (p<0.05), and mean number of residents and fellows (p<0.01).

Conclusions: Consistent differences in malpractice experience exist among medical schools. Further research exploring alternative explanations for these differences needs to be conducted.

Of particular interest here is the mentioned database, AMA Masterfile. Perhaps one can obtain access to this.

Worse, one can find studies that investigate the effect of race on patient outcomes — but only the patient’s race and sometimes the interaction of the patient with the doctor, without reporting the doctor’s race main effect! Here’s an example:

Context Many studies have documented race and gender differences in health care received by patients. However, few studies have related differences in the quality of interpersonal care to patient and physician race and gender.

Objective To describe how the race/ethnicity and gender of patients and physicians are associated with physicians’ participatory decision-making (PDM) styles.

Design, Setting, and Participants Telephone survey conducted between November 1996 and June 1998 of 1816 adults aged 18 to 65 years (mean age, 41 years) who had recently attended 1 of 32 primary care practices associated with a large mixed-model managed care organization in an urban setting. Sixty-six percent of patients surveyed were female, 43% were white, and 45% were African American. The physician sample (n=64) was 63% male, with 56% white, and 25% African American.

Main Outcome Measure Patients’ ratings of their physicians’ PDM style on a 100-point scale.

Results African American patients rated their visits as significantly less participatory than whites in models adjusting for patient age, gender, education, marital status, health status, and length of the patient-physician relationship (mean [SE] PDM score, 58.0 [1.2] vs 60.6 [3.3]; P=.03). Ratings of minority and white physicians did not differ with respect to PDM style (adjusted mean [SE] PDM score for African Americans, 59.2 [1.7] vs whites, 61.7 [3.1]; P=.13). Patients in race-concordant relationships with their physicians rated their visits as significantly more participatory than patients in race-discordant relationships (difference [SE], 2.6 [1.1]; P=.02). Patients of female physicians had more participatory visits (adjusted mean [SE] PDM score for female, 62.4 [1.3] vs male, 59.5 [3.1]; P=.03), but gender concordance between physicians and patients was not significantly related to PDM score (unadjusted mean [SE] PDM score, 76.0 [1.0] for concordant vs 74.5 [0.9] for discordant; P=.12). Patient satisfaction was highly associated with PDM score within all race/ethnicity groups.

Conclusions Our data suggest that African American patients rate their visits with physicians as less participatory than whites. However, patients seeing physicians of their own race rate their physicians’ decision-making styles as more participatory. Improving cross-cultural communication between primary care physicians and patients and providing patients with access to a diverse group of physicians may lead to more patient involvement in care, higher levels of patient satisfaction, and better health outcomes.

Note the use of “race-concordant”, which means patient and doctor had the same race. This is in fact a simple coding of the interaction effect between patient and physician race.
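
To make the coding point concrete, here is a small sketch under a simplifying assumption: both patient and physician group are coded as 0/1 (the study itself had more than two categories). Under that assumption, the concordance indicator is an exact linear combination of an intercept, the two main effects, and their product. The function names are hypothetical.

```python
from itertools import product


def concordant(patient: int, physician: int) -> int:
    """1 if patient and physician have the same (binary) group code, else 0."""
    return int(patient == physician)


def from_interaction(patient: int, physician: int) -> int:
    """Same indicator written as intercept + main effects + product term."""
    return 1 - patient - physician + 2 * patient * physician


rows = [
    (p, d, concordant(p, d), from_interaction(p, d))
    for p, d in product((0, 1), repeat=2)
]
for row in rows:
    print(row)
```

So a model that includes both main effects plus the concordance indicator spans the same space as one with both main effects plus the product term. That is why the physician main effect could have been reported alongside it.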

Bonus: really nice p-values in those findings. No worries, the study only has ~1900 citations.

Oh, and final point. From a public health perspective, affirmative action probably mostly kills blacks and Hispanics. People prefer to befriend and date others who are the same race, and this ethno-centrism also applies to patient’s choice of physicians. As such, the incompetent black and Hispanic physicians are mostly treating and thus harming black and Hispanic patients, who would have been better off with a white or Asian doctor.

April 11, 2019

The northwest-southeast cline in Europe and the brain

Filed under: Genomics,intelligence / IQ / cognitive ability,Neuroscience — Tags: — Emil O. W. Kirkegaard @ 19:55

A friend sent me this amazing study, which seems to have been previously overlooked by hereditarians.


Human skull and brain morphology are strongly influenced by genetic factors, and skull size and shape vary worldwide. However, the relationship between specific brain morphology and genetically-determined ancestry is largely unknown.


We used two independent data sets to characterize variation in skull and brain morphology among individuals of European ancestry. The first data set is a historical sample of 1,170 male skulls with 37 shape measurements drawn from 27 European populations. The second data set includes 626 North American individuals of European ancestry participating in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) with magnetic resonance imaging, height and weight, neurological diagnosis, and genome-wide single nucleotide polymorphism (SNP) data.


We found that both skull and brain morphological variation exhibit a population-genetic fingerprint among individuals of European ancestry. This fingerprint shows a Northwest to Southeast gradient, is independent of body size, and involves frontotemporal cortical regions.


Our findings are consistent with prior evidence for gene flow in Europe due to historical population movements and indicate that genetic background should be considered in studies seeking to identify genes involved in human cortical development and neuropsychiatric disease.

Main highlights from the paper text:

We then tested the hypothesis that skulls exhibit clinal variation along geographic axes within Europe. A directional Mantel correlogram shows a monotonic decrease in craniometric similarity with distance in two orthogonal directions, NW-SE and NE-SW (online suppl. fig. S3B), and this result motivated us to search for a geographic axis that can explain a significant fraction of the craniometric variation between populations. Redundancy analysis, a constrained version of principal components analysis (PCA), was used to find a projection that maximizes the variation of the 37 cranial measures under the condition that this projection is a linear combination of longitude and latitude. The first principal component is statistically significant (p = 0.005) and explains 12.8% of total craniometric variation, more than a third of the variation (30.5%) explained by the first component of an unconstrained PCA. Population eigenvalues for the first principal component were interpolated by ordinary kriging and plotted to create an isocline map of Europe (fig. 2a). Cross-validation indicated that predicted population eigenvalues were highly correlated with observed values (r = 0.88). Significantly, this map shows a clear gradient along a NW-SE axis that was not specified a priori and emerged from the redundancy analysis as the direction of maximum cranial variation. Therefore, a subset of cranial measures exhibits clinal variation along this geographic axis.

Cranial morphology reflects geography across Europe. a Pairwise distances between 27 European populations. Craniometric distance is significantly correlated (rM = 0.51, p < 1 × 10⁻⁵) with geographic distance. b Non-metric multi-dimensional scaling ordination of craniometric distances aligned to geographic coordinates of populations. Population symbols identify 4 clusters, and lines form a minimum spanning tree. c Distances between predicted locations based on craniometric ordination and population locations. Average ± SD plotted for 100 bootstrap replications (black) and random permutations (gray). * p < 0.001. d Individual female skulls were identified with correct or nearby populations based on cranial morphometry (solid) significantly better than chance (dotted). e Proportion of female skulls that were correctly classified (black) and misclassified with populations at different distances (gray shades). Sample sizes are listed after population name.

Cranial measures show significant variation along a NW-SE axis within Europe. a Isoclines of interpolated eigenvalues for first spatially constrained component of a redundancy analysis and geographic locations of populations. b Cranial measures plotted in order of their contribution to this map. Negative abscissas correspond to a more NW location. Proportion of variance explained (R²) and nominal p values are indicated. NLB = Nasal breadth; M28 = sagittal occipital arc; GOL = glabello-occipital length; NOL = nasio-occipital length; ASB = biasterionic breadth; BBH = basion-bregma height.
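
The redundancy analysis described in the quoted passage can be sketched in a few lines: regress the centered measurements on the centered coordinates, then run a PCA on the fitted values. This is a rough illustration on synthetic data, not the paper's code or data. The array names, the built-in gradient `true_axis`, and the noise level are all assumptions of mine, so the printed percentage will not match the paper's 12.8%.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pop, n_measures = 27, 37  # same dimensions as in the quoted study

# Synthetic stand-ins: (longitude, latitude) per population and a
# measurement matrix with a hypothetical gradient plus noise.
coords = rng.normal(size=(n_pop, 2))
true_axis = np.array([1.0, -1.0]) / np.sqrt(2)
loadings = rng.normal(size=n_measures)
Y = np.outer(coords @ true_axis, loadings)
Y = Y + rng.normal(scale=2.0, size=(n_pop, n_measures))

# Center both blocks.
Xc = coords - coords.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Step 1: multivariate least squares of the measures on the coordinates.
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
Y_fit = Xc @ B

# Step 2: PCA of the fitted values (via SVD) gives the constrained axes.
U, s, Vt = np.linalg.svd(Y_fit, full_matrices=False)
constrained_var = s**2 / (n_pop - 1)
total_var = Yc.var(axis=0, ddof=1).sum()
explained_first = constrained_var[0] / total_var

# The first constrained axis as a direction in (longitude, latitude) space.
direction = B @ Vt[0]
direction = direction / np.linalg.norm(direction)

print(f"first constrained axis explains {explained_first:.1%} of total variance")
print("direction (lon, lat):", np.round(direction, 2))
```

Because there are only two constraining variables, at most two constrained axes carry any variance. The sign of `direction` is arbitrary, as with any principal component.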

Intracranial and brain volumes and cortical surface area progressively increase with the amount of inferred NW European ancestry (fig. 3b), and these measures are approximately 5% larger in the 10% of individuals with the most NW European ancestry compared to the 10% with the most SE European ancestry. This percentage increase matches the percentage increase in cranial length and breadth observed along the same NW-SE geographic axis in the skull data set (fig. 2b) and cannot be attributed to a correlation with body size since we controlled for height and weight. This correlation involves specific – not global – brain morphology because hippocampal, basal ganglia, ventricular, and cerebellar volumes and average cortical thickness are not associated with NW-SE ancestry.

In this study, we leveraged brain imaging and genome-wide genotyping from 626 European Americans, as well as skull measurements obtained on an independent set of 1,170 individuals of European ancestry, to test the hypothesis that skull and brain morphology, like genetic background, mirror geography within Europe. We found that skull and brain morphology vary continuously across Europe as evidenced by the weak clustering between and large variation within populations and are geospatially structured. In particular, we observed a significant NW-SE gradient in morphology that is independent of body size and involves predominantly frontotemporal cortical areas.

The genetic basis for this geographic trend in skull and brain variation is strengthened by the observation of the trend in North American individuals of European ancestry. The environment – e.g. nutrition and prenatal healthcare – can influence skull morphometry due to developmental plasticity and may be correlated with genetic variation across Europe. In contrast, environmental exposures may vary among European Americans, but this variation is less likely to be geographically structured based on individuals’ European ancestry. Furthermore, Ashkenazi Jewish individuals are geographically dispersed in Europe and yet are genetically quite similar and genetically intermediate between SE European and Middle Eastern populations. This provides further support that the observed NW-SE clinal variation in brain morphology is driven by genetic more than environmental differentiation of these populations.


It is plausible that genes responsible for cortical expansion during human evolution retain a role in brain development and contribute to normal variation in brain morphology within and between modern human populations. Identifying these genes could contribute to our understanding of developmental abnormalities associated with neuropsychiatric diseases such as autism and schizophrenia. In this context, admixture mapping may prove to be a powerful strategy for identifying genomic regions responsible for overt brain morphology differences among individuals of European ancestry. Independent of the use of inferred ancestry for identifying genes, our results indicate that studies seeking to identify genes that influence brain morphology should consider genetic background, as it reflects historical mixing and then isolation of populations.


Frontotemporal cortical regions are most affected by NW European ancestry. Lateral view of the left hemisphere with color map that indicates nominal -log10 (p value) of association between estimated NW-SE ancestry and cortical surface area across the reconstructed cortical surface, while controlling for height, weight, BMI, age, sex, and diagnosis.

Structural brain measures follow a predicted trend in a group of individuals with European ancestry. a The first two principal components of genotypes of ADNI subjects (yellow/small points; color refers to online version only) and individuals from European reference populations (gray crosses) rotated 18° to align with a map of Europe. For each reference population (see online suppl. table S3 for labels), the average (SD) of principal components for all individuals in that population are indicated by disc position (diameter). Geographic origin of each population is indicated by disc shade of gray from NW (black) to SE (light gray) Europe. ADNI subjects are spread out primarily along a NW-SE axis and form two distinct clusters corresponding to NW European and Ashkenazi Jewish ancestry (see also online suppl. fig. S5). b Brain structural measures tested for association with estimated NW-SE ancestry, while controlling for height, weight, BMI, age, sex, and diagnosis. Negative abscissas correspond to a larger proportion of NW ancestry.

March 27, 2019

Up to date introductions to psychology, stats, and genomics — 2019 March

Filed under: Math/Statistics,Psychology — Tags: , — Emil O. W. Kirkegaard @ 23:54

A fellow emailed me asking for help on what to start reading to learn psychology and stats. The replication crisis means that most older psychology introductions should be viewed with distrust, and the various new streams in stats mean that much of the stuff in stats textbooks, while not wrong, is less than optimal. In the optimal world, there would be a new textbook on various psychology topics that covered all the important ideas that failed replication (stereotype threat, growth mindset etc.), and especially the things that did not fail (stereotype accuracy, IQ testing, behavioral genetics [not candidate or GxE]). Unfortunately, this doesn’t exist yet. However, the below are a reasonable start in my opinion.


I’m open to recommendations that cover more subfields, but I’m not aware of any post-crisis introduction to e.g. social psychology.


Bonus, for genomics I recommend:

March 24, 2019

Before PISA and Binet

Filed under: History,intelligence / IQ / cognitive ability — Tags: , , — Emil O. W. Kirkegaard @ 20:41

Some years ago, I discovered the Swiss dataset used in R. It has data from the late 1800s of Swiss subnational units, 47 French speaking ‘provinces’ i.e. sub-canton level (probably districts). One can read the documentation here. What caught my eye was that they had test data from the army examinations going back to the 1870s, i.e. before Binet invented ‘the first’ IQ test in 1904 or so. If one follows the source to van de Walle 1980, we find this description:

As in a few other European countries, military recruits in Switzerland were given literacy tests during the annual draft. Exams for draftees were first introduced in 1843 in the canton of Solothurn in an effort to interest the working classes in their self-improvement and to fight illiteracy. By 1875 when recruiting became a federal prerogative, exams were being given in 14 cantons but were not standardized. The Confederation made them mandatory and uniform. From then on, the exams provided an objective ranking of the educational level of cantons and districts.

Young men were examined (in their mother tongue) on their ability to read and write and on their skills in composition, arithmetic, history, and geography. For each exam, the recruits received a grade ranging from 1 to 4 (up to 1880) or 5 (thereafter). Grade 1 reflected excellence and grade 4 or 5, complete ignorance. In addition, information was recorded on the extent of the draftee’s education beyond the primary level.

Throughout the last quarter of the nineteenth century and into the early years of the twentieth century, these exams were “the symbol of the ambitions of the Confederation in matters of culture and enlightenment.” The avowed objective of the federal military exams was to test the educational quality of cantonal primary schools. Accordingly, the exams generated great competition among the cantons and provided a strong incentive to improve education. Greater emphasis was placed on secondary education, and a variety of schools were created for those who had quit after the primary level. After 1890, schools for boys aged 15 and 16 were opened in most cantons; classes were given at night or during the winter, for three to six hours a week or at least 40 hours a year. During the semester preceding the draft, boys attended preparatory schools for draftees, which were compulsory in some cantons.

The data on draftees’ examinations have the advantage of referring to a population belonging to a precisely defined age group, namely young men before the start of their reproductive years. They are available between 1875 and 1913, and they provide a sensitive description of the educational level of the Swiss population during the period of sustained decline in marital fertility. Their principal disadvantage is that they relate to the male population only. No comparable information exists for females; however, it is plausible that the data on male recruits reflect the quality of education available by district. Compulsory education up to age 14 applied to both sexes from 1874 on, although in Switzerland as elsewhere in the Western world, attendance was higher among males than females.

We computed three indexes of education: (1) the proportion of recruits who scored an average grade of excellent (grade 1 in more than two subjects in 1888 and 1910, an average grade of 1 to 2.5 in 1875); (2) the proportion who scored poorly (grade 4 or 5 in more than one subject in 1888 and 1910, an average grade of 2.5 to 4 in 1875); and (3) the proportion who attended school beyond the primary level. These indexes were computed on the basis of three years’ results centered on the census years of 1888 and 1910; 1875 is the first year for which federal examination results are available, and we have correlated fertility indexes in 1870 with literacy in the mid-1870s.
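
As a rough illustration of the three indexes in the excerpt, here is a sketch that applies the 1888/1910 rules to individual recruit records. No such individual-level data are available to me, so the `Recruit` record layout, the choice of five subjects, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Recruit:
    grades: list[int]       # one grade per subject, 1 = best, 5 = worst
    beyond_primary: bool    # attended school beyond the primary level


def education_indexes(recruits: list[Recruit]) -> tuple[float, float, float]:
    """Return (share excellent, share poor, share beyond primary).

    Uses the 1888/1910 definitions from the excerpt: excellent means
    grade 1 in more than two subjects, poor means grade 4 or 5 in more
    than one subject.
    """
    if not recruits:
        raise ValueError("need at least one recruit")
    n = len(recruits)
    excellent = sum(1 for r in recruits if sum(g == 1 for g in r.grades) > 2)
    poor = sum(1 for r in recruits if sum(g >= 4 for g in r.grades) > 1)
    beyond = sum(1 for r in recruits if r.beyond_primary)
    return excellent / n, poor / n, beyond / n


# Hypothetical records, five subjects each.
sample = [
    Recruit([1, 1, 1, 2, 2], True),    # three 1s: counts as excellent
    Recruit([2, 3, 2, 3, 2], False),
    Recruit([4, 5, 3, 4, 3], False),   # three grades of 4 or 5: counts as poor
    Recruit([1, 1, 2, 2, 3], True),    # only two 1s: not "more than two"
]
indexes = education_indexes(sample)
print(indexes)
```

The 1875 rule in the excerpt is based on average grades instead, so it would need a separate function.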

So, despite the tests sometimes being referred to as an education measure or a literacy measure, they were quite broad and more like a modern achievement test taken at the end of school, like the Dutch CITO or the Kaufman Test of Educational Achievement. These are known to be very good measures of general intelligence, r = .80 or so.

In trying to find out more about these tests, and preferably some actual individual level data, I found an obscure history book about early attempts at making cross-national educational testing comparable: An Atlantic Crossing? The Work of the International Examination Inquiry, its Researchers, Methods and Influence. The book isn’t on LibGen yet, but we can read some of it via Google Books. In particular, from the introduction:

The work of the International Examinations Inquiry (IEI) is almost forgotten now and yet it was an international and well-funded scientific project, operating over seven years, which attracted key world figures in educational research and undertook significant exchanges of data and experiment. Originally, it comprised the USA, Scotland, England, France, Germany and Switzerland, and then it grew to include Norway, Sweden and Finland. The core research group met three times: in Eastbourne in 1931 and Folkestone in 1936 (both in England) and Dinard, near St Malo (in France), in 1938. The key problem which united the researchers was the expansion of secondary education and the determination of the most effective way of examining pupils for entry into the secondary school. Each of the national case studies tried to produce information which could help, nationally or internationally, to support changes or improvements in examining — from intelligence tests to essay marking (Appendix I). The value of the IEI is that it reveals the difficulties that historians of education have in chronicling the exchanges of scientific ideas and researchers who work across borders. In this project, psychologists and comparativists can be seen in the task of engaging with each other’s work, and providing supportive and necessary conditions for these exchanges, and in turn effecting an international space, at the same time as they are producing a national one.

So it appears that:

  • Binet wasn’t the first to make an IQ test in the obvious sense. This had been done several decades before in army tests. Binet was perhaps the first to make a practical test for testing young children.
  • There is a much longer history of cross-national collaboration on test development than I was aware of.
  • It might be possible to obtain some of these early data in some obscure archive.

Bonus: if we look at the list of names involved, we aren’t that surprised:

There’s even a bunch of Swedes and Norwegians I haven’t heard of.


A libgen copy of the book has materialized. Also, there are 2 academic reviews of the book.

Seems like the interesting progress was cut short by the usual suspects:

In retrospect the timing of the IEI could not have been worse, coinciding with the rise of fascism and the threat of a new world war. The three-man German delegation, committed to the central concept of German academic culture, Bildung – the release of innate potential within individuals–lost their positions when Hitler came to power. A Norwegian representative, later a resistance member, committed suicide after his arrest rather than betray the movement. This human dimension breathes life into the educational issues the book deals with and engages the reader’s interest.

Given the intervention of war, it is not surprising that the IEI research did not bear fruit until after 1945. The tripartite school system, with intelligence testing to select students, and examination reform in England have their origins in the IEI. New research institutions in Scotland, Sweden and England were funded in part by IEI Carnegie money. Norway’s final IEI report, published as late as 1961, confirmed its belief that its examination system was valid and reliable, a simplistic faith later shaken by PISA results.


March 10, 2019

Extreme phenotype colorism

Filed under: intelligence / IQ / cognitive ability — Tags: , — Emil O. W. Kirkegaard @ 22:15

One prominent model for why (some) nonwhite race groups do worse is that they experience hostile treatment based on skin color. For instance, Hunter (2007) writes:

How does colorism operate? Systems of racial discrimination operate on at least two levels: race and color. The first system of discrimination is the level of racial category, (i.e. black, Asian, Latino, etc.). Regardless of physical appearance, African Americans of all skin tones are subject to certain kinds of discrimination, denigration, and second-class citizenship, simply because they are African American. Racism in this form is systemic and has both ideological and material consequences (Bonilla-Silva 2006; Feagin 2000). The second system of discrimination, what I am calling colorism, is at the level of skin tone: darker skin or lighter skin. Although all blacks experience discrimination as blacks, the intensity of that discrimination, the frequency, and the outcomes of that discrimination will differ dramatically by skin tone. Darker-skinned African Americans may earn less money than lighter-skinned African Americans, although both earn less than whites. These two systems of discrimination (race and color) work in concert. The two systems are distinct, but inextricably connected. For example, a light-skinned Mexican American may still experience racism, despite her light skin, and a dark-skinned Mexican American may experience racism and colorism simultaneously. Racism is a larger, systemic, social process and colorism is one manifestation of it.

Although many people believe that colorism is strictly a ‘black or Latino problem’, colorism is actually practiced by whites and people of color alike. Given the opportunity, many people will hire a light-skinned person before a dark-skinned person of the same race (Espino and Franz 2002; Hill 2000; Hughes and Hertel 1990; Mason 2004; Telles and Murguia 1990), or choose to marry a lighter-skinned woman rather than a darker-skinned woman (Hunter 1998; Rondilla and Spickard 2007; Udry et al. 1971). Many people are unaware of their preferences for lighter skin because that dominant aesthetic is so deeply ingrained in our culture. In the USA, for example, we are bombarded with images of white and light skin and Anglo facial features. White beauty is the standard and the ideal (Kilbourne 1999).

So, strong claims, but her review has pretty flimsy evidence. The general problem with this idea is that hereditarians also predict that skin color is related to positive outcomes, but they think it’s because of its relationship to genetic ancestry/admixture: in particular, lighter skin correlates with more European ancestry in admixed peoples such as US Blacks and Hispanics. Furthermore, they think European ancestry is more enriched with cognitive boosting alleles than is African, hence resulting in the correlation with better outcomes thru cognitive meritocracy. Published estimates of the correlation between skin color and ancestry put it at 0.20 to 0.70, depending on which population is measured and how good the measurements were (Parra 2004; Beleza et al 2013).

A path model of the scenario looks something like this:

How can we test who is right? The typical method used by sociologists is just to report cross-sectional results, which are not necessarily informative about the causal structure. They can be, because different models predict specific effect sizes, but generally speaking, no one bothers to formalize colorism in order to derive specific numerical predictions. Another way to examine the question is to look at color variation between siblings, which will only be very weakly related to ancestry differences (which are tiny between siblings). This method has been applied a few times, including to IQ, and the results generally show that colorism is nonexistent or weak (see this post; see also the research discussed in the PING study).

Another method is to exploit a rare but quasi-random type of variation in skin color that’s also not linked to ancestry: albinism. Human albinism is caused by a mutation in one of the few coloration genes. Depending on which one, it may affect skin, eyes, hair, or just one of these. Siblings usually do not share this trait because the mutation is recessive. Assuming 2 carrier parents, each offspring has only a 25% chance of being affected.
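The 25% figure is plain Mendelian arithmetic. Here is a minimal sketch that enumerates the four equally likely allele combinations for two carrier parents. The allele labels ("A" for the functional allele, "a" for the recessive one) and the function name are made up for illustration, and the sketch assumes a single autosomal recessive locus, which is a simplification.

```python
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1, parent2):
    """Return {genotype: probability} for one child of the two parents.

    Genotypes are two-character strings such as "Aa". Each parent passes on
    either of its two alleles with probability 1/2, so each of the four
    allele pairings has probability 1/4.
    """
    probs = {}
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" count as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs


# Hypothetical labels: "A" = functional allele, "a" = recessive albinism allele.
carrier = "Aa"
probs = offspring_genotype_probs(carrier, carrier)

p_affected = probs["aa"]                # only "aa" children show the trait
p_sibling_unaffected = 1 - p_affected   # chance that any given sibling does not

print(probs)
print("P(affected) =", p_affected)
print("P(a given sibling is unaffected) =", p_sibling_unaffected)
```

Under these assumptions, any given sibling of an affected child is unaffected three times out of four, which is why sibling comparisons are usually available.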

Living with albinism certainly has a huge impact on one’s life, and one can expect various forms of maltreatment, at the very least on the dating market since it’s considered fairly ugly. But does this extra life stress result in serious psychological malfunction, or do people just learn to live with it? In particular, does it result in cognitive deficits? There is actually a literature review of studies on this:

The key table:

Now, the studies are small of necessity because albinos are quite rare, and the studies are also quite old. Nevertheless, 5 studies with total n = 106 made comparisons with matched controls or siblings, and there doesn’t seem to be any IQ difference. If being treated poorly due to miscolored skin were a large cause of lower intelligence, we should have seen it here.

The Beckham study is refreshingly forthright:

March 9, 2019

Thou shall not cite Richard Lynn

Filed under: intelligence / IQ / cognitive ability — Tags: , , — Emil O. W. Kirkegaard @ 13:34

Thou should in fact basically copy his work, and then publish it in The Lancet, and then not cite him, or Heiner Rindermann, or David Becker, or Gerhard Meisenberg — in fact, best not to cite any wrong-thinker at all.

This seems to be what happened actually (initial tweet).


Human capital is recognised as the level of education and health in a population and is considered an important determinant of economic growth. The World Bank has called for measurement and annual reporting of human capital to track and motivate investments in health and education and enhance productivity. We aim to provide a new comprehensive measure of human capital across countries globally.

We generated a period measure of expected human capital, defined for each birth cohort as the expected years lived from age 20 to 64 years and adjusted for educational attainment, learning or education quality, and functional health status using rates specific to each time period, age, and sex for 195 countries from 1990 to 2016. We estimated educational attainment using 2522 censuses and household surveys; we based learning estimates on 1894 tests among school-aged children; and we based functional health status on the prevalence of seven health conditions, which were taken from the Global Burden of Diseases, Injuries, and Risk Factors Study 2016 (GBD 2016). Mortality rates specific to location, age, and sex were also taken from GBD 2016.
In 2016, Finland had the highest level of expected human capital of 28·4 health, education, and learning-adjusted expected years lived between age 20 and 64 years (95% uncertainty interval 27·5–29·2); Niger had the lowest expected human capital of less than 1·6 years (0·98–2·6). In 2016, 44 countries had already achieved more than 20 years of expected human capital; 68 countries had expected human capital of less than 10 years. Of 195 countries, the ten most populous countries in 2016 for expected human capital were ranked: China at 44, India at 158, USA at 27, Indonesia at 131, Brazil at 71, Pakistan at 164, Nigeria at 171, Bangladesh at 161, Russia at 49, and Mexico at 104. Assessment of change in expected human capital from 1990 to 2016 shows marked variation from less than 2 years of progress in 18 countries to more than 5 years of progress in 35 countries. Larger improvements in expected human capital appear to be associated with faster economic growth. The top quartile of countries in terms of absolute change in human capital from 1990 to 2016 had a median annualised growth in gross domestic product of 2·60% (IQR 1·85–3·69) compared with 1·45% (0·18–2·19) for countries in the bottom quartile.

Countries vary widely in the rate of human capital formation. Monitoring the production of human capital can facilitate a mechanism to hold governments and donors accountable for investments in health and education.

Methods section sounds very familiar:

We did a systematic analysis of available data for 195 countries from 1990 to 2016 to measure educational attainment, by sex and 5-year age groups (from 5 to 64 years) for the in-school and working-age population, and learning, as measured by performance on standardised tests of mathematics, reading, and science by 5-year age groups (from 5 to 19 years) for school-aged children. We constructed a measure of functional health status using the prevalence, by 5-year age groups, of seven health conditions for which evidence suggests a link to economic productivity using estimates from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2016 [16]. We also used mortality rates specific to location, age, sex, and year from GBD 2016 [17].

Our testing database contains a comprehensive record of learning scores for school-aged children aged 5–19 years. Four major programmes provide extensive data: the Programme for International Student Assessment (PISA), which began in 2000 and now tests students in 73 countries on a 3-year cycle [20]; the Progress in International Reading Literacy Study (PIRLS), which covered 50 countries in the 2016 iteration [21]; the Trends in International Mathematics and Science Study (TIMSS), of which the latest round in 2015 covered 57 countries [22]; and several tests from the International Association for the Evaluation of Educational Achievement [23, 24, 25]. In addition to these programmes, we also used regional testing programmes, including the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) [26], the Latin American Laboratory for Assessment of the Quality of Education [27], and the Programme d’Analyse des Systèmes Educatifs de la Confem [28]; national standardised testing programmes, such as the US National Assessment of Education Progress [29], and the India National Achievement Survey [30]; and representative studies measuring intelligence quotient (IQ) in school-aged children that largely included the Wechsler Intelligence Scale for Children [31], the Raven’s Standard Progressive Matrices [32], and the Peabody Picture Vocabulary test [33]. This database provides the most extensive geographical distribution and compilation of long-term temporal trends to date. Unlike several other studies [34, 35, 36, 37, 38], which used similar data, we kept scores in different school subjects (ie, mathematics, reading, and science) separate. We also maintained data on the year the tests were done to understand trends through time and included demographic information such as grade level (for implied age) and sex.

On the positive side, they did produce a map that looks like this, which we might call the TOTALLY NOT WORLD NATIONAL IQs map.

National IQs from Sea Hero Quest

Bonus! A little while back, another team of unrelated people also produced a replication of national IQs, of course, also without citing any wrong-thinker.

Human spatial ability is modulated by a number of factors, including age [1, 2, 3] and gender [4, 5]. Although a few studies showed that culture influences cognitive strategies [6, 7, 8, 9, 10, 11, 12, 13], the interaction between these factors has never been globally assessed as this requires testing millions of people of all ages across many different countries in the world. Since countries vary in their geographical and cultural properties, we predicted that these variations give rise to an organized spatial distribution of cognition at a planetary-wide scale. To test this hypothesis, we developed a mobile-app-based cognitive task, measuring non-verbal spatial navigation ability in more than 2.5 million people and sampling populations in every nation state. We focused on spatial navigation due to its universal requirement across cultures. Using a clustering approach, we find that navigation ability is clustered into five distinct, yet geographically related, groups of countries. Specifically, the economic wealth of a nation was predictive of the average navigation ability of its inhabitants, and gender inequality was predictive of the size of performance difference between males and females. Thus, cognitive abilities, at least for spatial navigation, are clustered according to economic wealth and gender inequalities globally, which has significant implications for cross-cultural studies and multi-center clinical trials using cognitive testing.

Which produced this map:
