Clear Language, Clear Mind

April 13, 2019

Death by affirmative action: race quotas in medicine

Filed under: intelligence / IQ / cognitive ability, Medicine. Posted by Emil O. W. Kirkegaard @ 08:57

See also previous 2017 post.

I have been repeatedly asked about this topic, so here is a post that covers the basics. The argument goes like this:

  • Affirmative action is used in admission to medical schools, i.e. they practice race quotas in favor of less intelligent races, in particular blacks and Hispanics, and discriminate against whites and especially (East) Asians.
  • Affirmative action results in less intelligent people getting into schools.
  • Less intelligent people tend to drop out more, so we expect and do see higher drop-out rates among blacks.
  • Even with differential drop-out rates by race, affirmative action results in less intelligent people graduating and eventually practicing medicine.
  • Less intelligent people have worse job performance in every job type. This is especially the case for highly complex jobs such as being a doctor which results ultimately in patient suffering including untimely death.
  • Thus, bringing it all together, affirmative action for race results in less intelligent blacks and Hispanics being admitted to medical schools, and when they don’t drop out, they end up practicing medicine, and in doing so, they do a worse job than white and Asian people would have done, thereby killing people by incompetence.

Given what we know about race, intelligence, and job performance, the conclusion is essentially inevitable. However, as far as I know, no one has actually published a study on malpractice rates by race, or some other direct measure of patient harm. You can probably imagine why that is the case. However, we do have a case study of a similar thing happening in the police (Gottfredson, 1996).

But let’s take things one step at a time.

Affirmative action … in action

Ample data exists about this. Mark Perry has documented it well:

The MCAT (Medical College Admission Test) is a cognitive test used for medical schools, basically their version of the SAT/ACT. I don’t know if anyone has published a study relating this one directly to an IQ test, but we have such data for the other achievement/entrance tests: SAT, ACT, and KTEA.

The table shows that someone with about the same score in the highlighted column has ~22.5% chance of admittance if Asian, and 81% if black. One can also fit a model, and calculate the specific race benefits in odds ratios, as was done in this report:

Medical school acceptance and matriculation rates

The table below shows acceptance and matriculation (people who actually enroll) rates by race (source):

So we see that blacks and hispanics have lower acceptance/matriculation rates. This is because while they gain from affirmative action, they are also worse applicants, and worse than the favoritism granted by affirmative action.

Competency measures

Tables below show average competencies by race for applicants and matriculates (same source as above):

Unhelpfully, no standard deviations are supplied for these measures, so one cannot just read the Cohen’s d values off the table. The standard deviations can be found in the table below (source):

These are for the total population of applicants, not whites alone, so the Cohen’s d will be slightly underestimated by using these (say, 10 to 15%). So, then we calculate the d gaps by race to white (I used matriculates, as is most relevant), which are:

d gap to Whites, by measure

Measure            Black   Hispanic   Asian
MCAT CPBS          -0.63   -0.52       0.30
MCAT CARS          -0.74   -0.74      -0.04
MCAT BBLS          -0.63   -0.44       0.15
MCAT PSBB          -0.52   -0.52       0.19
Total MCAT         -0.73   -0.63       0.16
GPA Science        -0.74   -0.44       0.02
GPA Non-Science    -0.54   -0.36       0.04
GPA Total          -0.71   -0.44       0.03


Thus, we see that black and Hispanic enrolled students are quite far below whites in academic talent, and Asians somewhat above.
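The d values in the table are standardized mean differences: a group mean minus the White mean, divided by a standard deviation (here the SD of all applicants, as noted above). A minimal sketch of that arithmetic follows. The function name and the input numbers are hypothetical placeholders chosen for illustration; they are not values from the source tables.

```python
def cohens_d(group_mean: float, reference_mean: float, sd: float) -> float:
    """Standardized mean difference of a group relative to a reference group.

    `sd` is whatever standard deviation is used as the yardstick; the post
    uses the total-applicant SD because within-group SDs were not reported.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (group_mean - reference_mean) / sd


# Hypothetical inputs, for illustration only (not taken from any table).
example_d = cohens_d(group_mean=505.0, reference_mean=511.0, sd=9.0)
print(round(example_d, 2))
```

Because a total-sample SD is somewhat larger than a within-group SD whenever group means differ, dividing by it gives a slightly smaller d, which is the underestimate mentioned above.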

Differential drop-out rates and competence among graduates

Less capable students drop out more, so this tends to even out group gaps among admitted students. However, the process is not 100% effective, so we still end up with gaps among the graduates. The figure and table below show the differential drop-out (source):

Finding competence data for graduates was more tricky, but I was able to find this figure (source):

Thus, we see that the rank order of scores among different races is preserved among graduates as well. The placed/unplaced refers to whether the graduates were able to find a residency.

Medical competency

Instead of relying on the more academically broad MCAT, we can look at various medical school tests and exams (which are of course well predicted by the MCAT). The first table is from here and concerns a competency test taken at medical school.

The adjusted score refers to the explained gaps after controlling for prior MCAT score and GPA. Women do worse in these old data, as they generally do on high stakes testing.

From the same Michigan report as above, we get a similar result:

The 75th centile black student in medical school does about as well as the 25th centile white and Asian students.

There’s also a similar report for Maryland. See also UK results in this meta-analysis.

MCAT et al predict actual medical performance

There are a variety of studies on this question, and a few reviews (e.g. here, here). Here’s an example of a primary study:


To establish whether successful certifying examination performances of doctors are associated with their patients’ mortality and length of stay following acute myocardial infarction.


Risk adjusted mortality and survivors’ length of stay were compared for doctors who had satisfactorily completed training in internal medicine or cardiology and attempted the relevant examination. Specifically, the study investigated the joint effects of hospital location, availability of advanced cardiac care, doctors’ specializations, certifying examination performances, year certification was first attempted and patient volume.


Data on all acute myocardial infarctions in Pennsylvania for the calendar year 1993 were collected by the Pennsylvania Health Care Cost Containment Council. These data were combined with physician information from the database of the American Board of Internal Medicine.


Holding all variables constant, successful examination performance (i.e. certification in internal medicine or cardiology) was associated with a 19% reduction in mortality. Decreased mortality was also correlated with treatment in hospitals located outwith either rural or urban settings and with management by a cardiologist. Shorter stays were not related to examination performance but were associated with treatment by high volume cardiologists who had recently finished training and who cared for their patients in hospitals located outwith rural or urban settings.


The results of the study add to the evidence supporting the validity of the certifying examination and lend support to the concept that fund of knowledge is related to quality of practice.

There are well-known race differences in job performance in general

Even if we didn’t have the other data, we would be well within reason to conclude that there are race differences in job performance among doctors, because we have large meta-analyses of race differences in work performance in general:

Race differences in medical performance

Thus, finally, we get to the last part. Unfortunately, I have not been able to find a study that looked at individual doctors’ race and malpractice or other patient-harm measures. One can however use the race composition of medical schools as a proxy. There is a study of medical schools showing they have consistently different rates, but it didn’t investigate the link to school-level metrics such as mean MCAT/GPA among students.

  • Waters, T. M., Lefevre, F. V., & Budetti, P. P. (2003). Medical school attended as a predictor of medical malpractice claims. BMJ Quality & Safety, 12(5), 330-336.

Objectives: Following earlier research which showed that certain types of physicians are more likely to be sued for malpractice, this study explored (1) whether graduates of certain medical schools have consistently higher rates of lawsuits against them, (2) if the rates of lawsuits against physicians are associated with their school of graduation, and (3) whether the characteristics of the medical school explain any differences found.

Design: Retrospective analysis of malpractice claims data from three states merged with physician data from the AMA Masterfile (n=30 288).

Study subjects: All US medical schools with at least 5% of graduates practising in three study states (n=89).

Main outcome measures: Proportion of graduates from a medical school for a particular decade sued for medical malpractice between 1990 and 1997 and odds ratio for lawsuits against physicians from high and low outlier schools; correlations between the lawsuit rates of successive cohorts of graduates of specific medical schools.

Results: Medical schools that are outliers for malpractice lawsuits against their graduates in one decade are likely to retain their outlier status in the subsequent decade. In addition, outlier status of a physician’s medical school in the decade before his or her graduation is predictive of that physician’s malpractice claims experience (p<0.01). All correlations of cohorts were relatively high and all were statistically significant at p<0.001. Comparison of outlier and non-outlier schools showed that some differences exist in school ownership (p<0.05), years since established (p<0.05), and mean number of residents and fellows (p<0.01).

Conclusions: Consistent differences in malpractice experience exist among medical schools. Further research exploring alternative explanations for these differences needs to be conducted.

Of particular interest here is the mentioned database, AMA Masterfile. Perhaps one can obtain access to this.

Worse, one can find studies that investigate the effect of race on patient outcomes — but only the patient’s race and sometimes the interaction of the patient with the doctor, without reporting the doctor’s race main effect! Here’s an example:

Context Many studies have documented race and gender differences in health care received by patients. However, few studies have related differences in the quality of interpersonal care to patient and physician race and gender.

Objective To describe how the race/ethnicity and gender of patients and physicians are associated with physicians’ participatory decision-making (PDM) styles.

Design, Setting, and Participants Telephone survey conducted between November 1996 and June 1998 of 1816 adults aged 18 to 65 years (mean age, 41 years) who had recently attended 1 of 32 primary care practices associated with a large mixed-model managed care organization in an urban setting. Sixty-six percent of patients surveyed were female, 43% were white, and 45% were African American. The physician sample (n=64) was 63% male, with 56% white, and 25% African American.

Main Outcome Measure Patients’ ratings of their physicians’ PDM style on a 100-point scale.

Results African American patients rated their visits as significantly less participatory than whites in models adjusting for patient age, gender, education, marital status, health status, and length of the patient-physician relationship (mean [SE] PDM score, 58.0 [1.2] vs 60.6 [3.3]; P=.03). Ratings of minority and white physicians did not differ with respect to PDM style (adjusted mean [SE] PDM score for African Americans, 59.2 [1.7] vs whites, 61.7 [3.1]; P=.13). Patients in race-concordant relationships with their physicians rated their visits as significantly more participatory than patients in race-discordant relationships (difference [SE], 2.6 [1.1]; P=.02). Patients of female physicians had more participatory visits (adjusted mean [SE] PDM score for female, 62.4 [1.3] vs male, 59.5 [3.1]; P=.03), but gender concordance between physicians and patients was not significantly related to PDM score (unadjusted mean [SE] PDM score, 76.0 [1.0] for concordant vs 74.5 [0.9] for discordant; P=.12). Patient satisfaction was highly associated with PDM score within all race/ethnicity groups.

Conclusions Our data suggest that African American patients rate their visits with physicians as less participatory than whites. However, patients seeing physicians of their own race rate their physicians’ decision-making styles as more participatory. Improving cross-cultural communication between primary care physicians and patients and providing patients with access to a diverse group of physicians may lead to more patient involvement in care, higher levels of patient satisfaction, and better health outcomes.

Note the use of “race-concordant”, which means patient and doctor had the same race. This is in fact a simple coding of the interaction effect between patient and physician race.

Bonus: really nice p-values in those findings. No worries, the study only has ~1900 citations.

Oh, and final point. From a public health perspective, affirmative action probably mostly kills blacks and Hispanics. People prefer to befriend and date others who are the same race, and this ethno-centrism also applies to patient’s choice of physicians. As such, the incompetent black and Hispanic physicians are mostly treating and thus harming black and Hispanic patients, who would have been better off with a white or Asian doctor.

June 7, 2018

Lewontin’s famous 1972 book chapter

Filed under: Population genetics. Posted by Emil O. W. Kirkegaard @ 07:21

Few people actually read this chapter, which is a shame: the science is interesting for historical purposes, and the fallacy it is famous for is really obvious once you read the full text.

The conclusion section reads as follows:

The results are quite remarkable. The mean proportion of the total species diversity that is contained within populations is 85.4%, with a maximum of 99.7% for the Xm gene, and a minimum of 63.6% for Duffy. Less than 15% of all human genetic diversity is accounted for by differences between human groups! Moreover, the difference between populations within a race accounts for an additional 8.3%, so that only 6.3% is accounted for by racial classification.

This allocation of 85% of human genetic diversity to individual variation within populations is sensitive to the sample of populations considered. As we have several times pointed out, our sample is heavily weighted with “primitive” peoples with small populations, so that their Ho values count much too heavily compared with their proportion in the total human population. Scanning Table 3 we see that, more often than not, the Hpop values are lower for South Asian aborigines, Australian aborigines, Oceanians, and Amerinds than for the three large racial groups. Moreover, the total human diversity, Hspecies, is inflated because of the overweighting of these small groups, which tend to have gene frequencies that deviate from the large races. Thus the fraction of diversity within populations is doubly underestimated since the numerator of that fraction is underestimated and the denominator overestimated.

When we consider the remaining diversity, not explained by within-population effects, the allocation to within-race and between-race effects is sensitive to our racial representations. On the one hand the over-representation of aborigines and Oceanians tends to give too much weight to diversity between races. On the other hand, the racial component is underestimated by certain arbitrary lumpings of divergent populations in one race. For example, if the Hindi and Urdu speaking peoples were separated out as a race, and if the Melanesian peoples of the South Asian seas were not lumped with the Oceanians, then the racial component of diversity would be increased. Of course, by assigning each population to separate races we would carry this procedure to the reductio ad absurdum. A post facto assignment, based on gene frequencies, would also increase the racial component, but if this were carried out objectively it would lump certain Africans with Lapps! Clearly, if we are to assess the meaning of racial classifications in genetic terms, we must concern ourselves with the usual racial divisions. All things considered, then, the 6.3% of human diversity assignable to race is about right, or a slight overestimate considering that Hpop is overestimated.

It is clear that our perception of relatively large differences between human races and subgroups, as compared to the variation within these groups, is indeed a biased perception and that, based on randomly chosen genetic differences, human races and populations are remarkably similar to each other, with the largest part by far of human variation being accounted for by the differences between individuals.

Human racial classification is of no social value and is positively destructive of social and human relations. Since such racial classification is now seen to be of virtually no genetic or taxonomic significance either, no justification can be offered for its continuance.

That’s it. Notice the absence of any premise connecting the conclusion

“Human racial classification is of no social value and is positively destructive of social and human relations”

to the intended premise

“The mean proportion of the total species diversity that is contained within populations is 85.4% […]. Less than 15% of all human genetic diversity is accounted for by differences between human groups! Moreover, the difference between populations within a race accounts for an additional 8.3%, so that only 6.3% is accounted for by racial classification.”

What’s the intended premise supposed to be? What if we found the values were a 50-30-20 split? Even a cursory comparison to other species with subspecies no one denies would reveal these human values are quite typical (Woodley 2010, Fuerst 2015, Stoeckle & Thaler 2018).
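For readers who want to see where a three-way split such as 85.4 / 8.3 / 6.3 comes from, here is a minimal sketch. It assumes the hierarchical decomposition the chapter describes: a Shannon diversity value computed within populations (Hpop), at the level of races (called `h_race` here), and for the whole species (Hspecies). The function names are hypothetical, and the three H values below are placeholders scaled so that the shares match the quoted percentages; they are not Lewontin's raw figures.

```python
import math


def shannon_diversity(freqs):
    """Shannon information measure H = -sum(p * log2(p)) over allele frequencies."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def apportion(h_pop, h_race, h_species):
    """Split total diversity into three shares that sum to 1.

    Returns (within populations,
             between populations within races,
             between races).
    """
    within_pop = h_pop / h_species
    between_pops_within_race = (h_race - h_pop) / h_species
    between_races = (h_species - h_race) / h_species
    return within_pop, between_pops_within_race, between_races


# Placeholder H values (assumption), chosen only to reproduce the quoted split.
shares = apportion(h_pop=0.854, h_race=0.937, h_species=1.0)
print([round(s, 3) for s in shares])
```

A hypothetical 50-30-20 split would come out of the same function with different inputs; the arithmetic itself says nothing about which split matters for classification.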

July 27, 2017

Large scale sex, race etc. discrimination studies: what do they show?

Filed under: Psychology, Sociology. Posted by Emil O. W. Kirkegaard @ 01:56

Given enough motivation, QRPs, biased reviewing and time, one can build an entire literature of studies proving anything. There’s plenty of all of these to prove left-wing ideological beliefs (and libertarian in economics). However, it is much harder to QRP large N datasets to give preferred results. So, what do large scale studies show about sex, race etc. biases in hiring, grading etc.? Here’s an attempt at a collection. Ping me anywhere if you know of any more.

Teaching accreditation exams reveal grading biases favor women in male-dominated disciplines in France

Discrimination against women is seen as one of the possible causes behind their underrepresentation in certain STEM (science, technology, engineering, and mathematics) subjects. We show that this is not the case for the competitive exams used to recruit almost all French secondary and postsecondary teachers and professors. Comparisons of oral non–gender-blind tests with written gender-blind tests for about 100,000 individuals observed in 11 different fields over the period 2006–2013 reveal a bias in favor of women that is strongly increasing with the extent of a field’s male-domination. This bias turns from 3 to 5 percentile ranks for men in literature and foreign languages to about 10 percentile ranks for women in math, physics, or philosophy. These findings have implications for the debate over what interventions are appropriate to increase the representation of women in fields in which they are currently underrepresented.

National hiring experiments reveal 2:1 faculty preference for women on STEM tenure track

National randomized experiments and validation studies were conducted on 873 tenure-track faculty (439 male, 434 female) from biology, engineering, economics, and psychology at 371 universities/colleges from 50 US states and the District of Columbia. In the main experiment, 363 faculty members evaluated narrative summaries describing hypothetical female and male applicants for tenure-track assistant professorships who shared the same lifestyle (e.g., single without children, married with children). Applicants’ profiles were systematically varied to disguise identically rated scholarship; profiles were counterbalanced by gender across faculty to enable between-faculty comparisons of hiring preferences for identically qualified women versus men. Results revealed a 2:1 preference for women by faculty of both genders across both math-intensive and non–math-intensive fields, with the single exception of male economists, who showed no gender preference. Results were replicated using weighted analyses to control for national sample characteristics. In follow-up experiments, 144 faculty evaluated competing applicants with differing lifestyles (e.g., divorced mother vs. married father), and 204 faculty compared same-gender candidates with children, but differing in whether they took 1-y-parental leaves in graduate school. Women preferred divorced mothers to married fathers; men preferred mothers who took leaves to mothers who did not. In two validation studies, 35 engineering faculty provided rankings using full curricula vitae instead of narratives, and 127 faculty rated one applicant rather than choosing from a mixed-gender group; the same preference for women was shown by faculty of both genders. These results suggest it is a propitious time for women launching careers in academic science. Messages to the contrary may discourage women from applying for STEM (science, technology, engineering, mathematics) tenure-track assistant professorships.

Going blind to see more clearly: unconscious bias in Australian Public Service shortlisting processes

In characteristic spin language:

This study assessed whether women and minorities are discriminated against in the early stages of the recruitment process for senior positions in the APS, while also testing the impact of implementing a ‘blind’ or de-identified approach to reviewing candidates. Over 2,100 public servants from 14 agencies participated in the trial. They completed an exercise in which they shortlisted applicants for a hypothetical senior role in their agency. Participants were randomly assigned to receive application materials for candidates in standard form or in de-identified form (with information about candidate gender, race and ethnicity removed). We found that the public servants engaged in positive (not negative) discrimination towards female and minority candidates:

• Participants were 2.9% more likely to shortlist female candidates and 3.2% less likely to shortlist male applicants when they were identifiable, compared with when they were de-identified.

• Minority males were 5.8% more likely to be shortlisted and minority females were 8.6% more likely to be shortlisted when identifiable compared to when applications were de-identified.

• The positive discrimination was strongest for Indigenous female candidates who were 22.2% more likely to be shortlisted when identifiable compared to when the applications were de-identified.

Interestingly, male reviewers displayed markedly more positive discrimination in favour of minority candidates than did female counterparts, and reviewers aged 40+ displayed much stronger affirmative action in favour for both women and minorities than did younger ones. Overall, the results indicate the need for caution when moving towards ‘blind’ recruitment processes in the Australian Public Service, as de-identification may frustrate efforts aimed at promoting diversity.

Funnel plot for “Racial Bias in Mock Juror Decision-Making”

Meta-analysis of juror decision making studies finds that Whites show no own-group bias, but Blacks do.

June 13, 2017

New paper out: Admixture in Argentina (with John Fuerst)

We have a new big paper out:

  • Kirkegaard, E. O. W., & Fuerst, J. (2017). Admixture in Argentina. Mankind Quarterly, 57(4). Retrieved from


Analyses of the relationships between cognitive ability, socioeconomic outcomes, and European ancestry were carried out at multiple levels in Argentina: individual (max. n = 5,920), district (n = 437), municipal (n = 299), and provincial (n = 24). Socioeconomic outcomes correlated in expected ways such that there was a general socioeconomic factor (S factor). The structure of this factor replicated across four levels of analysis, with a mean congruence coefficient of .96. Cognitive ability and S were moderately to strongly correlated at the four levels of analyses: individual r=.55 (.44 before disattenuation), district r=.52, municipal r=.66, and provincial r=.88. European biogeographic ancestry (BGA) for the provinces was estimated from 25 genomics papers. These estimates were validated against European ancestry estimated from self-identified race/ethnicity (SIRE; r=.67) and interviewer-rated skin brightness (r=.33). On the provincial level, European BGA correlated strongly with scholastic achievement-based cognitive ability and composite S-factor scores (r’s .48 and .54, respectively). These relationships were not due to confounding with latitude or mean temperature when analyzed in multivariate analyses. There were no BGA data for the other levels, so we relied on %White, skin brightness, and SIRE-based ancestry estimates instead, all of which were related to cognitive ability and S at all levels of analysis. At the individual level, skin brightness was related to both cognitive ability and S. Regression analyses showed that SIRE had little detectable predictive validity when skin brightness was included in models. Similarly, the correlations between skin brightness, cognitive ability, and S were also found inside SIRE groups. The results were similar when analyzed within provinces. In general, results were congruent with a familial model of individual and regional outcome differences.
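The individual-level figure in the abstract (r=.55, or .44 before disattenuation) reflects the standard correction for attenuation, which divides an observed correlation by the square root of the product of the two measures' reliabilities. The sketch below shows that formula. The reliabilities used in the paper are not given here; the 0.64 in the example is simply the reliability product implied by the two reported numbers, and is an assumption for illustration.

```python
import math


def disattenuate(r_observed: float, reliability_x: float, reliability_y: float = 1.0) -> float:
    """Correct an observed correlation for measurement unreliability.

    r_true = r_observed / sqrt(reliability_x * reliability_y)
    """
    for rel in (reliability_x, reliability_y):
        if not 0 < rel <= 1:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# 0.64 is inferred from (.44 / .55) ** 2, not taken from the paper (assumption).
print(round(disattenuate(0.44, 0.64), 2))
```

Short scales have low reliability, so a correction of this size is plausible for cognitive measures with only a handful of items.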


We carried out our usual thorough analysis of the predictions of genetic models of cognitive ability/social inequality with regards to admixture. We combined a variety of data sources to estimate mean racial admixture by subnational units, and related these to estimates of cognitive ability (CA) and S factor scores. In this case, we were also able to find individual-level skin tone/color data, as well as really crude cognitive ability data (2-5 items) and a decent number of social measures (>10). All in all, everything was more or less as expected: substantial correlations between European ancestry, CA and S, and some relationships to skin tone as well. The most outlying results were those for the smaller subnational units (districts, municipalities) for which our estimates of European ancestry based on SIRE were not strongly related to CA/S. Presumably this was due to a variety of factors including sampling error, SIRE x location interactions for predicting ancestry (as seen in Brazil), and ancestry x location interactions for predicting CA/S.

The paper is thus another replication of the general patterns we have already seen in most other places we have looked. There are still some large American countries left to cover (e.g. Bolivia, Venezuela), but they are hard to get decent data for. We will probably have to rely on the LAPOP survey to estimate many of them.

Some figures of interest

Maps for those who like them.

Main regressions.

June 6, 2017

The race and intelligence question is solvable

In reply to:


There’s a new kind of ‘environmentalist’ (rather, anti-hereditarian) defense in town, or at least, one that’s not commonly seen. It goes like this:

This is the first of a series of blog posts about race and intelligence. My opinions on this topic are, I think, the least popular arguments I have ever made, and I have made a few unpopular ones over the years. My assertion, basically, is this: discussions of differences in IQ among racial groups are not actually about empirical data. My argument is unpopular, I think, because both sides of the issue are committed to the idea that they are being scientific about it. The hereditarians want to be seen as scientific so they aren’t seen as racists; the anti-hereditarians want to be scientific so they aren’t seen as frightened by socially difficult scientific findings.

So, the matter is not just hard, says Turkheimer, it’s impossible, making it not scientific at all. The main targets are researchers like Jensen:

It was Arthur Jensen who first attempted to extract claims of genetically-based differences in the IQs of racial groups from the obviously racist context in which they had previously existed. Jensen’s writing on the subject adopted a measured scientific tone, denounced racism directed at individuals, referred extensively to data, and was extremely well-informed about the science of human intelligence. Jensen’s work on race differences produced pushback of all kinds, but the predominant strand accepted his contention that the question of race differences was a scientific hypothesis deserving of careful empirical consideration. The classic work of this kind, still the only take on the subject that appears genuinely to avoid reaching a conclusion one way or the other, was written by my mentor John Loehlin.

Turkheimer gets Loehlin et al wrong, for they wrote in their conclusion:

Given the empirical findings, and the theoretical arguments we have discussed, what conclusions about racial-ethnic differences in performance on intellectual tests appear justified? It seems to us that they include the following:

1. Observed average differences in the scores of members of different U.S. racial-ethnic groups on intellectual-ability tests probably reflect in part inadequacies and biases in the tests themselves, in part differences in environmental conditions among the groups, and in part genetic differences among the groups. It should be emphasized that these three factors are not necessarily independent, and may interact.

2. A rather wide range of positions concerning the relative weight to be given these three factors can reasonably be taken on the basis of current evidence, and a sensible person’s position might well differ for different abilities, for different groups, and for different tests.

3. Regardless of the position taken on the relative importance of these three factors, it seems clear that the differences among individuals within racial-ethnic (and socioeconomic) groups greatly exceed in magnitude the average differences between such groups.

Let us emphasize that these conclusions are based on the conditions that have existed in the United States in the recent past. None of them precludes the possibility that changes in these conditions could alter these relationships for future populations. It should also be noted that the probable existence of relevant environmental differences, genetic differences, and psychometric biases does not imply that they must always be in the same direction as the observed between-group differences.

On the whole, these are rather limited conclusions. It does not appear to us, however, that the state of the scientific evidence at the present time justifies stronger ones.

In other words, Loehlin et al attribute non-zero effect size to between group genetics, as well as test bias and environmental conditions. Sounds familiar? Let’s try The Bell Curve:

If the reader is now convinced that either the genetic or environmental explanation has won out to the exclusion of the other, we have not done a sufficiently good job of presenting one side or the other. It seems highly likely to us that both genes and the environment have something to do with racial differences. What might the mix be? We are resolutely agnostic on that issue; as far as we can determine, the evidence does not yet justify an estimate.

The conclusions of Loehlin et al and Murray & Herrnstein are in fact almost identical. Both agree that some mix of genetics and environment explains the gap, but Loehlin et al also attribute some of it to test bias, which M&H do not, at least in that passage. Both groups decline to make definite numerical estimates, or even give a probability distribution. What about Jensen? Here’s Jensen in 1969:

The fact that a reasonable hypothesis has not been rigorously proved does not mean that it should be summarily dismissed. It only means that we need more appropriate research for putting it to the test. I believe such definitive research is entirely possible but has not yet been done. So all we are left with are various lines of evidence, no one of which is definitive alone, but which, viewed all together, make it a not unreasonable hypothesis that genetic factors are strongly implicated in the average Negro-white intelligence difference. The preponderance of the evidence is, in my opinion, less consistent with a strictly environmental hypothesis than with a genetic hypothesis, which, of course, does not exclude the influence of environment or its interaction with genetic factors.

Jensen does not strictly say that one or another factor is >0; he seems to think about it in terms of a probability distribution over different values of the between group heritability. What do modern researchers think in terms of probability distributions? Unfortunately, Rindermann et al did not poll researchers this way, but simply asked them 1) whether the evidence justifies a point estimate, and 2) if so, what their best guess is. In terms of a probability distribution, this might be taken to mean the mean, median or mode, depending on interpretation. The Beta distribution is well-suited for describing uncertainty about a single parameter bounded between 0 and 1, such as a proportion.

You can use this calculator to try out values until you find something that approximates your probability distribution. My probability distribution looks something like this [Beta(2, .5)]:
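For readers without the calculator at hand, the basic summary statistics of a Beta distribution follow from its two shape parameters. The sketch below is only an illustration using the standard closed-form mean and variance; the function name `beta_summary` is made up for this example and is not part of any library.

```python
from math import sqrt


def beta_summary(a: float, b: float) -> dict:
    """Return the mean and standard deviation of a Beta(a, b) distribution.

    Uses the standard closed-form expressions:
      mean = a / (a + b)
      var  = a * b / ((a + b)^2 * (a + b + 1))
    """
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return {"mean": mean, "sd": sqrt(var)}


# The shape parameters quoted in the text: Beta(2, .5)
summary = beta_summary(2.0, 0.5)
print(summary)
```

Because the second shape parameter is below 1 while the first is above 1, the density of Beta(2, .5) rises without bound toward the upper end of the interval, so the mean, median and mode of such a distribution can differ noticeably.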

I will attempt to do a poll of top experts on both sides, so we can get an idea of how sure the experts are.

But aside from that, what reasons does Turkheimer give for believing the matter to be unsolvable? First, he notes that humans are stubborn, and continues:

The second reason involves the nature of the studies that are invoked. When one discusses questions about the origins of human differences, most of the scientific designs that might be most informative aren’t available. You can’t breed people, you can’t mess with their DNA, you can’t raise them under controlled conditions. So the discussion is necessarily based on quasi-experimental science that is by definition fundamentally flawed. In research of this kind the grounds for objection are almost always greater than the valid scientific signal, making it endlessly possible to trade insults about the low quality of the evidence produced by the other side. Many if not most social scientific arguments are tiresome in exactly this way. (The hereditarians are always saying that any day now there will be new genomic data that will settle the question once and for all. They are wrong. More on that in subsequent posts.)

Turkheimer is engaging in a textbook case of proving too much. His argument about the lack of strictly controlled experiments means that essentially all science is pseudoscience, or non-science. This extends far beyond social science: we cannot do experiments in many areas of geology, meteorology, astronomy, or biology. It of course also covers quite a lot of social science, including virtually all of history, and most of sociology, psychology, epidemiology etc.

Turkheimer has a third reason, but it’s strangely inconsistent with the second. He begins by noting that there are both established theory and models for calculating within group heritabilities: “it is why the nature-nurture debate is over.” But with between group heritability, he says:

Now have a look at the Rushton and Jensen paper making the case for the partial genetic determination of racial differences, or listen to Murray and Harris, or read any of the replies to our piece in Vox. Where are the percentages? Where is the equivalent of the ACE model? Where are the structural equation models with parameters quantifying the “partly genetic and partly environmental” hypothesis the hereditarians keep repeating? For all the hereditarians’ idle intuitions about differences being part genetic and part environmental, where is the empirical or quantitative theory that describes how this apportioning is supposed to work? There is no such thing as a “group heritability coefficient,” no way to put any meat on the speculative bones about partial genetic determination. In the absence of an actual empirical theory, the discussion is all about your intuitions against my intuitions. “Well, it sure seems to me that if the black-white difference was environmental, it would have been reduced at least a little by now!” “Yeah, but it has been reduced by five IQ points, and what about the children born to white German mothers and African-American soldiers?” “Well, OK, but what about Scarr’s trans-racial adoption study?” “Yeah, that’s not how I interpret the trans-racial adoption study!” And so on ad infinitum.

Turkheimer appears to be claiming that genetic models make no (falsifiable) predictions, and that there are no models with which to think about the issue. This is a strange claim, and it is inconsistent with Turkheimer et al’s own recent paper, which lists such predictions that they believe have been falsified.

Of course, it is wholly untrue: predictions of genetic models are not only quite simple, they have been given by many prominent researchers for decades. DeFries (1972) gives an overview of the mathematics that can be used to derive the between group heritability, and this was also given by Jensen (1998, p. 443), citing DeFries, Loehlin et al and Lush (1968). I have done a series of simulations to illustrate the modeling and methods, see these for further details.

Aside from direct methods to estimate the between group heritability, one can also use a number of more obvious methods to decide between specific models. The matter is particularly simple if we assume a simple autosomal, additive genetics-only model with random mating. From this, one can make many testable predictions such as:

  • If a person from group A mates with a person from group B, their offspring is expected to have the trait value half-way between the groups. The parent order should not matter, i.e. A-B offspring have the same expected value as B-A offspring (male-female).
  • Assuming no adoptive selection, then if children from group A are adopted by parents from group B, the children are expected to attain the trait value of their biological parents, not adoptive parents. The order of the groups does not matter.
  • If a mixed population exists, then the ancestry and trait levels should follow a simple linear model, where the predicted difference for both extremes reflect the genetic gap, i.e. E(T|A=1) – E(T|A=0) = the genetic difference in T.
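The first and third predictions in the list above can be written out as a toy simulation. This is a minimal sketch of what the simple additive model implies, not an analysis of any real data: the gap, the noise level, the sample size and all function names are arbitrary assumptions chosen for illustration.

```python
import random


def expected_offspring_mean(mean_a: float, mean_b: float) -> float:
    """Under a purely additive model, A-B offspring are expected at the midpoint."""
    return (mean_a + mean_b) / 2.0


def simulate_admixed(n: int, genetic_gap: float, noise_sd: float, seed: int = 0):
    """Simulate trait values that depend linearly on ancestry fraction (0 to 1).

    The linear dependence is built in by assumption; the simulation only shows
    what the model predicts, not whether the model is true.
    """
    rng = random.Random(seed)
    ancestry = [rng.random() for _ in range(n)]
    trait = [genetic_gap * a + rng.gauss(0.0, noise_sd) for a in ancestry]
    return ancestry, trait


def ols_slope(x, y) -> float:
    """Ordinary least squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical values: a gap of 1.0 trait units between A=0 and A=1.
ancestry, trait = simulate_admixed(n=20000, genetic_gap=1.0, noise_sd=1.0)
slope = ols_slope(ancestry, trait)
print(f"estimated E(T|A=1) - E(T|A=0): {slope:.3f}")
print(f"expected A-B offspring mean: {expected_offspring_mean(0.0, 1.0):.2f}")
```

The slope recovers the simulated gap only because the simulation assumes there is no confounding. In real data, an environmental factor correlated with ancestry would also produce a slope, so a fitted slope on its own cannot distinguish the two.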

Deviations from these predictions that cannot be explained by sampling error or bias constitute falsification of the simplest possible model. For instance, if mating deviates from random mating in certain ways, the offspring means may depend on the parental group order. While there are some studies of these predictions, they are few, have small samples, are very old etc., so the literature is not very reliable. Aside from purely statistical problems, publication bias, researcher error, fraud etc. can make reality seem different than it really is. We have every reason to expect massive bias for this topic, as no topic in social science is more fraught with feelings. Indeed, researchers (Turkheimer included) and philosophers regularly assure us of the horrors that will ensue should people start believing that group differences have anything to do with genetics. The most recent paper in this back and forth is Jeffery and Shackelford (2017).

Difficulties in figuring out the causes

Whereas Turkheimer makes the easily disprovable claim that no models etc. exist for modeling or quantifying genetic group differences in polygenic traits, Dalton Conley & Jason Fletcher make the more defensible claim that:

That said, let us ask what is perhaps the most controversial question in the human sciences: Do genetic differences by ancestral population subgroup explain observed differences in achievement between self-identified race groups in the contemporary United States over and above all the environmental differences that we also know matter? In their best-selling 1994 book, The Bell Curve: Intelligence and Class Structure in American Life, Richard Herrnstein and Charles Murray indeed made the argument that blacks are genetically inferior to whites with respect to cognitive ability. Their “evidence,” however, contained no molecular genetic data, and was flawed as a result. But today we have molecular data that might potentially allow us to directly examine the question of race, genes, and IQ. We raise this pernicious question again only to demonstrate the impossibility of answering it scientifically.

Given that I quoted the relevant passage from The Bell Curve, readers can judge for themselves the fairness of C&F’s claims about what it actually said. One funny point is that while Turkheimer believes genomic data to be useless for deciding the issue, C&F believe it is relevant, and criticize H&M for not basing their conclusion on a kind of evidence that wasn’t available at the time. Well, it soon will be available in large quantities. Turkheimer has a prediction about what will happen in the coming years:

My concern is that anti-hereditarians play into race scientist’s hands when we agree to engage with them as though there existed a legitimate research paradigm proceeding toward a rational conclusion. At least in the social sciences, legitimate empirical research paradigms rarely come to all or none conclusions, so it becomes natural for people to conclude, with Murray and Harris, that the whole long argument is bound to settle eventually on the idea that group differences are a little environmental, a little genetic. But in fact, that is not where we are headed. I predict that in a relatively short period of time, contemporary race science will seem just as transparently unscientific and empirically untrue as the race science of the early 20th Century now appears from our modern perspective.

(Turkheimer’s prediction is even more strange considering that he thinks genomic data will be irrelevant.)

I think Murray and Harris are exactly right. C&F consider both the options I used in my simulation above: 1) polygenic scores, and 2) ancestry analysis. They think neither will work. Their problem with polygenic scores is linkage disequilibrium (LD) decay combined with non-causal variants:

As it turns out, however, these scores when developed for one population—say, those of European descent—fail to predict for other populations. Take the height example. The best height score, which has been “trained” on whites, when applied to blacks, predicts that Africans or African Americans are six inches shorter than they are. So they simply don’t work. The very differences in genetics between ancestral groups make comparisons across groups impossible. The million or so markers measured by gene chips are picking up different things in distinct populations. They are merely flags spaced out along the chromosomes and meant to stand in for all the genetic real estate around them. But what that real estate holds is very different—particularly when African descent populations, with their greater degree of variation, are compared to non-African populations. So polygenic scores, while useful for analysis within populations, do not allow us to make apples to oranges comparisons across groups.

This is not a new problem. In fact I discussed it in 2015 and recently in 2017 as well in relation to Pifferian worldwide results. C&F do not seem to understand the reason why the scores can fail to work across groups. An excellent (preprint) paper by Zanetti and Weale discusses the various options and carries out a simulation study based on real data:

Through genome-wide association studies (GWASs), researchers have identified hundreds of genetic variants associated with particular complex traits. Previous studies have compared the pattern of association signals across different populations in real data, and these have detected differences in the strength and sometimes even the direction of GWAS signals. These differences could be due to a combination of (1) lack of power (insufficient sample sizes); (2) minor allele frequency (MAF) differences (again affecting power); (3) linkage disequilibrium (LD) differences (affecting power to tag the causal variant); and (4) true differences in causal variant effect sizes (defined by relative risks). In the present work, we sought to assess whether the first three of these reasons are sufficient on their own to explain the observed incidence of trans-ethnic differences in replications of GWAS signals, or whether the fourth reason is also required. We simulated case-control data of European, Asian and African ancestry, drawing on observed MAF and LD patterns seen in the 1000-Genomes reference dataset and assuming the true causal relative risks were the same in all three populations. We found that a combination of Euro-centric SNP selection and between population differences in LD, accentuated by the lower SNP density typical of older GWAS panels, was sufficient to explain the rate of trans-ethnic differences previously reported, without the need to assume between population differences in true causal SNP effect size. This suggests a cross population consistency that has implications for our understanding of the interplay between genetics and environment in the aetiology of complex human diseases.

In any case, we will know for sure when we start finding the actual causal variants for traits. LD decay is only a problem when one relies on proxy (or tag) variants instead of the causal variants. LD decay also becomes a smaller problem the denser the genotyping, because denser genotyping captures stronger LD patterns that decay less across populations. Of course, when we get affordable whole genome sequencing (in a few years), density will hit a ceiling and LD decay will be a much smaller problem. C&F continue:

The problem is the same one facing Wade (even if he was unaware of it): Whether measured with a single genetic marker or a summative measure like a principal component, genes act as proxies for environments. The only way to truly insure that observed differences by genetics are really genetic effects is to compare full siblings from the same family where we know the differences between brothers and sisters are the result of luck, of the randomness at conception, and not correlated with background differences in poverty, neighborhood, and so on. But here’s the catch-22: While polygenic scores vary quite a bit between siblings, measures of ancestry, almost by definition, do not. Thus, while initially promising, the idea of comparing siblings with differing dosages of continental ancestries won’t work either.

C&F are half-right. Ancestry itself does not establish genetic causation (believing so is almost a reverse sociologist’s fallacy), though the absence of such a relationship falsifies most genetic models. As such, it constitutes a test for genetic models, one that they have passed in a large number of genomic studies examining links between ancestry and social outcomes (Kirkegaard et al, 2017), as well as in the only large study that examined ancestry using proper genomic data (Kirkegaard et al, in review). Note that the aggregate-level version of the genetic models predicts the same kinds of relationships for units composed of multiple persons, possibly of mixed ancestry. Many studies of this kind have also confirmed predictions from genetic models (Fuerst and Kirkegaard, 2016; Kirkegaard and Fuerst, in review). Models that make risky (falsifiable) predictions and are repeatedly vindicated increase their posterior probability. What risky predictions have environment-only models made? None, really. Turkheimer et al, as well as others, recently still argued for no relationship between ancestry and IQ, something that has been clearly disproved by the analysis of the PING data.

C&F are wrong to require genomic full sibling-control studies as there are only a few plausible hypotheses. One such hypothesis-family is a reactive one based on racial appearance. These can work in multiple ways. In the other-reactive model, racial appearances give rise to hostile discrimination from others, and somehow this results in lowered intelligence, though mysteriously does not seem to affect many other more plausible traits such as self-esteem (Dalliard, 2014). Furthermore, large-scale studies fail to find substantial differences in self-reported discrimination by race (Boutwell et al, 2017). In the self-reactive one, persons react to their own racial appearance by adopting the cultural values associated with how they look, instead of their true polygenic scores.

C&F are also wrong about the impossibility of doing genomic full sibling-control studies. As with their other claims, others already looked into the idea and conducted some analysis to determine the necessary sample size. In a comment on their article, Gwern, who conducted such an analysis along with yours truly, writes:

‘requires large sample sizes’ is not at all the same thing as “won’t work”. Yes, it may require something like n=50k (since siblings will differ by a few percent on ancestry), but that’s not as impossible as it looks, now that we have individual datasets like UKBB with n=500k+ and serious plans for studies with n=1m. With whole genomes at <$1000 and SNP panels at <$50 and costs still falling, it will soon become routine for everyone to be genotyped, picking up fraternal twins / sibling pairs / trios. (For example, at around 1m births a year in the US and around 15% of the population, less than a years’ worth would be necessary.)

Note that the genomic full sibling design was proposed by hereditarians already back in 2013 (Malloy, 2013). But there are other designs that require smaller samples. Half-siblings are quite common, and particularly so among African Americans due to their relatively unstable families. Using data from such designs still enables a powerful control for shared environment-type causes, while increasing the dispersion in ancestry among siblings. One can also include measures of racial appearance, self-rated racial/cultural identification etc. Even better would be to use a genomic adoptive design, which maximizes the ancestry differences between the siblings. However, such data would probably also be the hardest to get.

C&F end their piece by grasping at straws:

There has long been evidence—dating back to the days of W.E.B. Du Bois—that there is a pigmentocracy within U.S. black (and white and Latino) communities. More recent work has shown that this is not a uniquely American phenomenon but extends to Brazil, South Africa, and other nations with a creole, mixed population. We could try to measure skin tone and factor that out. But we cannot ultimately measure all the myriad cues about racial identity that we react to, especially since we may not even be aware of them. It could even be the case that African or European ancestry predicts height and that taller people are treated better in school, get more nutritional resources at home, and so on. Even though we do not generally think of height as a key dividing line for race, it does not mean that it is not silently associated—at the genotypic level—with the alleles that also differ by race.

Science proceeds by plausibility. C&F are merely arguing here that we cannot be 100% certain. That is true, but also irrelevant. Genetic models keep producing correct predictions, and environmentalists cannot keep responding with “yeah, but you haven’t controlled for possible environmental causes that we don’t even know about” if they want to retain scientific credibility. They must put forward plausible causes that can be tested, otherwise they have, as Jensen and Rushton wrote in 2005, a degenerative research program. Ironically, they follow up with:

The near impossibility of a definitive, scientific approach to interrogating genes, race, and IQ stands in contrast to the loose claims of pundits or scholars who assert that there is a genetic explanation for the black-white test score gap. That said, the consideration of genetics in racial analysis is not always pernicious. The ability to control for genotype actually places the effects of social processes, like discrimination, in starker relief. Once you eliminate the claim that there are biological or genetic differences between populations by controlling them away, we can show more clearly the importance of environmental (non-genetic) processes such as structural racism.

Which side has ‘loose claims’ by ‘pundits’? We know the answer to this question, so why do C&F pretend otherwise? Let them make their predictions, publicly, and then let’s examine large datasets that can confirm or disconfirm said predictions. The true test of any scientific enterprise, model, and theory is predictive validity.

Intelligence, special, and yet not so special

Intelligence (general cognitive ability etc.) is of special significance to humans, but genetically speaking, it is not unusual. The same methods that can be used to infer recent selection for height, BMI etc. can be used for intelligence. Such methods are already being used. For instance, one recent paper concluded that height and BMI differences between European populations were partially heritable (Robinson et al 2015):

Across-nation differences in the mean values for complex traits are common, but the reasons for these differences are unknown. Here we find that many independent loci contribute to population genetic differences in height and body mass index (BMI) in 9,416 individuals across 14 European countries. Using discovery data on over 250,000 individuals and unbiased effect size estimates from 17,500 sibling pairs, we estimate that 24% (95% credible interval (CI) = 9%, 41%) and 8% (95% CI = 4%, 16%) of the captured additive genetic variance for height and BMI, respectively, reflect population genetic differences. Population genetic divergence differed significantly from that in a null model (height, P < 3.94 × 10^−8; BMI, P < 5.95 × 10^−4), and we find an among-population genetic correlation for tall and slender individuals (r = −0.80, 95% CI = −0.95, −0.60), consistent with correlated selection for both phenotypes. Observed differences in height among populations reflected the predicted genetic means (r = 0.51; P < 0.001), but environmental differences across Europe masked genetic differentiation for BMI (P < 0.58).

April 3, 2017

Criticism of me and my research on SlateStarCodex’s subreddit

Most of it seems to be just rewritten stuff from RationalWiki’s collection of inaccuracies.

I repost my reply below for posterity, and also for those too lazy to read Reddit.

I’m Emil Kirkegaard, and it seems in order to make some general remarks as well as rebut some of the worse claims.

However, the straw that broke the camel’s back was the fact that Emil Kirkegaard posted here in the past few days. For context, Emil Kirkegaard is a complete unknown among most in genetics. The few that have heard of him consider him a complete laughingstock. He has no academic qualifications commensurate whatsoever with publishing behavioral genetics research and no association with any institutions of repute to booth.

I am an unknown among people in genetics, for the unsurprising reason that I don’t generally do much standard genetics research. In fact, I have published little research on behavioral genetics too, mainly because the datasets (twins etc.) needed to do this are heavily guarded and thus outside of my reach as an independent (at least, until recently). My published research is mainly in differential psychology and the intersection with sociology. A large number of people in this area evidently do know me given that they frequently interact with me on Twitter as well as at conferences etc.

In general, it seems like a not too well thought out idea to criticize someone based on their lack of credentials in a community full of autodidacts. I’ve never seen anyone criticize Gwern for this or Scott for that matter. Scott is a doctor-psychiatrist, yet he writes interesting stuff on all kinds of topics such as psychology, behavioral genetics, philosophy and politics. I don’t know what Gwern’s formal background is, but it seems unlikely that he holds advanced degrees in every topic he writes about.

Most of his research is published in two non-peer reviewed “journals” that he edits.

OpenPsych journals do have peer review. In fact, they have open peer review meaning that literally anyone can see the review of any paper. For instance, look in the post-publication forum for the reviews of all published papers. I think this is a much better and certainly more transparent review than journals ordinarily practice. My status as editor has little impact on this system, since editors do not have rejection powers in these journals. Nor can they select reviewers at whim. Rather all in-house reviewers can review any paper they desire. The role of the editor is mainly to smooth things by asking reviewers whether they have time to review this or that submission.

This claim about no review seems to originate from RationalWiki, so it seems that OP just read that source and decided to re-write it for SSC subreddit.

Indeed, he is most famous for pulling a bunch of data from OKCupid, without the consent of the company or the people whose data he used, and throwing it online without anonymizing the data, in clear violation of every single ethical standard set by IRBs anywhere, which could reveal the identities of basically all of the people in the dataset.

Did you ever actually look at the data? It’s already anonymous because people use pseudonyms on OKCupid.

In general, this criticism makes no sense given that the information in the dataset is much less than what’s available on the website. If someone actually wanted to identify gays in Iran, they would go to the website and search for gays in Iran and locate all of them with photos etc. They wouldn’t download an incomplete version of the website with no photos to search. If one really wanted to make this argument, one should make it against OKCupid for making it possible to locate gays etc. in Islamic countries in the first place.

I don’t think scraping dating sites for research is unethical. Scraping websites for research purposes is fairly common, and indeed commonly taught in data science classes. A number of other people scraped this website before and published their datasets, where nothing happened.

A large number of academics wrote to me in private in support, offering among other things legal assistance if there came to be a court case.

His prior research basically looks at whether immigrants are disproportionately criminals with lower IQs and whether negative stereotypes about Muslims were true, using techniques nobody of repute in the field uses (which, unsurprisingly, ends up showing that Muslims and immigrants are criminals with low IQs).

The stereotype study used the exact same methods other studies into stereotype accuracy have used. I know this because I got them from Lee Jussim’s book (Jussim is the world’s top expert in this field). I also sent my paper to Jussim for comment and he was thrilled about it. It doesn’t appear that it used bizarre methods to reach non-standard results. In fact, it basically found the same thing virtually every other study into stereotype accuracy has found: stereotypes about demographic groups are quite accurate. It is also the first such study to be pre-registered and use a large-ish nationally representative sample.

As for the immigrant performance studies, the main method used here is the Pearson correlation, perhaps the most widely used method in all of social science. The data are usually (always I think?) from official sources, so they are pretty hard to deny. In fact, I bought the Danish data directly from the Danish statistics agency. The original files are public on OSF, so anyone can verify their veracity.

Do note that not all immigrant groups are more criminal than the host population. Indeed, a consistent finding has been that East Asians are less criminal, often starkly so. This is totally in line with mainstream findings on the crime rates of East Asians in the USA, Canada etc. Muslim immigrants are generally found to do very poorly, something that has often been noted by others, but not systematically studied as far as I know.

Incorporating country of origin data into models is somewhat unusual, but mainstream research does this too. Here’s a 2017 study.

His post cherry picked data to show that race-mixing is bad for offspring. The only pushback he received was about how he probably doesn’t control for how many multi-racial children grow up in single parent households, which was not encouraging.

There was no cherry picking, which several people can verify. The r/hapas subreddit has a list of “hard data” and I simply clicked their links, and discarded the ones that weren’t scientific reports (mostly press releases). This gave me a total of 2 studies I examined that were based on decent samples, both of which I reported in my post. If you search scholar for citing articles, you will find that there are many such papers, so these two don’t seem to be outliers.

The criticism for not controlling for single-parenthood makes no sense at all. I did not do these studies. How would I control for X confounder? I can’t, I can only report what I found. Besides, controlling for such confounders is a sociologist’s fallacy.

You seem to be under the impression that my post was advocating the outbreeding depression hypothesis. In fact, I considered it unlikely given the lack of effects for other mixed populations (e.g. African Americans) and the small amount of genetic variation between human populations compared to e.g. dogs that don’t show obvious outbreeding depression effects. My position is simply that these four hypotheses are worth investigating.

For what it’s worth, if the outbreeding depression hypothesis turns out to be mainly true, then no specific social policy follows. One might regard government meddling into the affairs of who mates with who as unacceptable, even if there are increased chances of some problems. The state does not generally prevent other people with bad genetics from mating either. Furthermore, if outbreeding depression is real, then it’s due to epistasis — gene by gene interaction. This means that if we can figure out which gene combinations cause this effect, one can screen embryos for bad combinations and avoid the problem. This is however only possible if one identifies the genetic causes, making it important to research the question. My personal stance is that the government should not apply coercive eugenics even if this effect turns out to be real.

October 17, 2016

Notes on Steve Hsu’s second interview w/ Daphne Martschenko

Back in January, Steve Hsu did an interview with Daphne Martschenko, who is a PhD candidate at Cambridge in education. She’s basically doing science journalism on behavioral genetics as far as I can tell. Now there’s a new interview up. I have some comments on it because Steve was a little too nice.

Bias from twin studies

A never ending topic of contention in the last 100 years or so has been the ability of standard twin studies (MZ-DZ reared together comparison) to correctly estimate causation parameters, in particular heritability, shared environment (whatever siblings have in common), and everything else (‘unshared environment’).

Assortative mating

Assortative mating – the tendency of mates to be similar in traits – is generally a thing, not just for humans. But for humans, it seems to be strongest for age (.77) and religiousness (.75), substantial for educational attainment, cognitive ability (.48), political preferences (~.35), criminality (~.40), and apparently weak for personality (r’s 0-.20), at least when we rely on self-reported, high-order (5-6 factors) scores. There’s even weak to moderate assortative mating for seemingly irrelevant physical features like ear lobes.

Assortative mating negatively biases heritability estimates because it makes DZ twins more genetically similar on the trait of interest. Why? Because if parents are similar on a trait, and this trait is heritable, the parents are actually slightly related (inbred) with regards to that trait (and in general). To the degree that this is true, the genetic similarity of the DZ twins will be above 50%. However, MZ twins cannot get more similar since they are already at 100% (minus a few somatic mutations). The usual method to estimate heritability from standard twin studies relies on the assumption that there is no assortative mating because this means the DZ’s are 50% genetically alike while the MZ’s are 100%. This means that one can just double the difference between their correlations to estimate heritability: H=2(MZ-DZ). In the presence of assortative mating, one has to more than double this value because the difference in genetic relatedness is less than 50%. E.g. if the DZ’s are actually 60% related for the trait, the difference between their relatedness and the MZ’s relatedness is only 40 percentage points. We need to multiply 40% by 2.5 instead of 2 to get to 100%. If we suppose that for a particular trait, MZs correlate at .85 and DZs at .60, then the standard estimate of heritability would be 2(.85-.60)=50%. But if there is assortative mating as we assumed, we need to do 2.5(.85-.60)=62.5%. The bias here was about -20% in relative terms (50% vs. 62.5%).
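The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the doubling rule and its adjustment: the function name `falconer_h2` and the parameter `dz_relatedness` are made up for this example, and the 60% DZ relatedness is the assumed value from the text, not an empirical estimate.

```python
def falconer_h2(r_mz, r_dz, dz_relatedness=0.5):
    """Estimate heritability from MZ and DZ twin correlations.

    dz_relatedness is the ASSUMED genetic similarity of DZ twins for the
    trait: 0.5 with no assortative mating, higher when parents assort.
    The multiplier scales the MZ-DZ difference up to a full 100% difference
    in relatedness (2 for 50%, 2.5 for 60%).
    """
    if not 0.0 < dz_relatedness < 1.0:
        raise ValueError("dz_relatedness must be strictly between 0 and 1")
    multiplier = 1.0 / (1.0 - dz_relatedness)
    return multiplier * (r_mz - r_dz)


# Example correlations from the text.
standard = falconer_h2(0.85, 0.60)                       # multiplier of 2
adjusted = falconer_h2(0.85, 0.60, dz_relatedness=0.60)  # multiplier of 2.5

# Relative bias of the standard estimate, taking the adjusted one as reference.
relative_bias = (standard - adjusted) / adjusted

print(f"standard estimate: {standard:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
print(f"relative bias:     {relative_bias:.1%}")
```

The multiplier is written as 1/(1 - DZ relatedness) so that the same function covers both the standard case and any assumed degree of assortative mating.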

For more details, see:

Measurement error downwards bias

Measurement error — noise in the measurement of traits — systematically depresses correlations between any two variables. This also applies to twin studies. Measurement error makes the MZ and DZ correlations smaller which means that heritability estimates get lower too, and the everything else category grows. Let’s say we have the above correlations, but we know that our tests only have .90 reliability. What we need to do is correct the observed correlations first: .85/sqrt(.90*.90)=.944, and .60/sqrt(.90*.90)=.667. Then we plug the numbers back in: 2(.944-.667)=55.4%. But then, we also should take into account assortative mating: 2.5(.944-.667)=69.3%.

Thus, taking into account both assortative mating (possibly an unrealistic amount, not sure how to calculate this exactly) and realistic amounts of measurement error yields a heritability estimate that’s about 40% larger (50% vs. 69.3%).
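The correction for measurement error is the standard disattenuation formula, and the steps above can be reproduced as a small sketch. The function name `disattenuate` is illustrative; the reliability of .90 and the multiplier of 2.5 are the assumed values from the example, not estimates from any dataset. Carrying unrounded intermediate values gives roughly 55.6% and 69.4%, which differs from the figures in the text only because those round the corrected correlations to three decimals first.

```python
import math


def disattenuate(r_observed, reliability_x, reliability_y):
    """Correct an observed correlation for unreliability in both measures."""
    return r_observed / math.sqrt(reliability_x * reliability_y)


RELIABILITY = 0.90  # assumed test reliability from the example

# Observed twin correlations from the example, corrected for measurement error.
r_mz = disattenuate(0.85, RELIABILITY, RELIABILITY)
r_dz = disattenuate(0.60, RELIABILITY, RELIABILITY)

# Standard doubling rule, then the assumed assortative-mating multiplier.
h2_standard = 2.0 * (r_mz - r_dz)
h2_assortative = 2.5 * (r_mz - r_dz)

print(f"corrected MZ correlation: {r_mz:.3f}")
print(f"corrected DZ correlation: {r_dz:.3f}")
print(f"h2, measurement error only:       {h2_standard:.3f}")
print(f"h2, plus assortative mating:      {h2_assortative:.3f}")
```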

Special MZ environment possible upwards bias

The standard twin design assumes that the environments of MZs are no more similar than those of DZs (in the equations, both MZ’s and DZ’s C are correlated at 1.00). Common sense has it that this is probably not entirely true. MZs look and act extremely similar and so it is possible that people, including their parents, treat them more similarly. This may cause an extra environmental effect that makes them more similar. This would cause upwards bias in the heritability estimates because the effect of special MZ environment is confounded with the genetic effect. The size of the bias depends on how much more similar the MZ environment is compared to the DZ, and on the effect size of this environment effect. If this sibling environment-type effect is weak to begin with, even a strong extra MZ environment would only weakly bias the estimates. (Again, I think it is too complicated to give a realistic numeric example.) That’s the reason I said it’s a possible bias. The biases from assortative mating and measurement error are known, not merely possible.

How can we find out about this possible bias? There are multiple options. One is:
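A realistic example is hard to give, but the direction of the bias can be shown with a deliberately simplified toy model. The sketch assumes that any special MZ environment simply adds a constant `extra_mz_env` to the MZ correlation on top of the usual additive genetic (h2) and shared environment (c2) components. All names and numbers here are hypothetical and chosen only for illustration.

```python
def twin_correlations(h2, c2, extra_mz_env=0.0):
    """Expected twin correlations under a toy ACE model.

    extra_mz_env is a HYPOTHETICAL extra similarity that only MZ pairs get
    from being treated more alike; it is zero under the equal environments
    assumption.
    """
    r_mz = h2 + c2 + extra_mz_env
    r_dz = 0.5 * h2 + c2
    return r_mz, r_dz


def falconer(r_mz, r_dz):
    """Standard doubling-rule heritability estimate."""
    return 2.0 * (r_mz - r_dz)


TRUE_H2 = 0.50  # hypothetical true heritability
TRUE_C2 = 0.20  # hypothetical shared environment

for extra in (0.0, 0.02, 0.05):
    r_mz, r_dz = twin_correlations(TRUE_H2, TRUE_C2, extra)
    estimate = falconer(r_mz, r_dz)
    print(f"extra MZ environment {extra:.2f}: "
          f"estimated h2 = {estimate:.2f} (true {TRUE_H2:.2f})")
```

Under these assumptions the estimate equals the true heritability plus twice the extra MZ similarity, so a small special-environment effect produces only a small upward bias.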

In 1968, Scarr proposed a test of the EEA which examines the impact of phenotypic similarity in twins of perceived versus true zygosity. We apply this test for the EEA to five common psychiatric disorders (major depression, generalized anxiety disorder, phobia, bulimia, and alcoholism), as assessed by personal interview, in 1030 female-female twin pairs from the Virginia Twin Registry with known zygosity. We use a newly developed model-fitting approach which treats perceived zygosity as a form of specified familial environment. In 158 of the 1030 pairs (15.3%), one or both twins disagreed with the project-assigned zygosity. Model fitting provided no evidence for a significant influence of perceived zygosity on twin resemblance for any of the five disorders.

There’s a replication here, n=882. There are a bunch more studies too.

There’s another kind of study as well: sometimes twins get misclassified, including by their parents and themselves. Meaning, they think they are MZ, but they are really DZ, or the other way around. There’s been a number of such studies and there’s a 2013 review of them. No surprises.

What about womb effects? Twins may or may not share placentas and chorions. Pretty much no effect of these either.

Other family members?

Twins reared together are just convenient, but one can use pretty much any pair of family relations. Here’s a large study using siblings and half-siblings:

Twin studies have been criticized for upwardly biased estimates that might contribute to the missing heritability problem.

We identified, from the general Swedish population born 1960-1990, informative sibships containing a proband, one reared-together full- or half-sibling and a full-, step- or half-sibling with varying degrees of childhood cohabitation with the proband. Estimates of genetic, shared and individual specific environment for drug abuse (DA), alcohol use disorder (AUD) and criminal behavior (CB), assessed from medical, legal or pharmacy registries, were obtained using Mplus.

Aggregate estimates of additive genetic effects for DA, AUD and CB obtained separately in males and females varied from 0.46 to 0.73 and agreed with those obtained from monozygotic and dizygotic twins from the same population. Of 54 heritability estimates from individual classes of informative sibling trios (3 syndromes × 9 classes of trios × 2 sexes), heritability estimates from the siblings were lower, tied and higher than those from obtained from twins in 26, one and 27 comparisons, respectively. By contrast, of 54 shared environmental estimates, 33 were lower than those found in twins, one tied and 20 were higher.

With adequate information, human populations can provide many methods for estimating genetic and shared environmental effects. For the three externalizing syndromes examined, concerns that heritability estimates from twin studies are upwardly biased or were not generalizable to more typical kinds of siblings were not supported. Overestimation of heritability from twin studies is not a likely explanation for the missing heritability problem.

One can also use family members not reared together such as regular adoptions and the rare MZ adoptions. Generally these approaches also find similar results.

To be fair, here’s a large Icelandic study that used a variety of relationships that found lower heritability estimates (about 75% of standard) and higher shared environment estimates. Not sure why this is the case.

Still, most of the evidence fits with no particular bias from which kinds of family relationship are used. Measurement error, however, always biases heritability downwards.

Heritability of cognitive ability, and parental S

Daphne refers to the famous Turkheimer 2003 study. Since this is an interaction finding (interactions have low priors) and suited left-wingers, this finding spread like wildfire despite other equally large or larger studies finding no such effect (some found reverse effects too). Finally, 12 years later, someone did a meta-analysis, and we can now see that: 1) this finding is apparently only seen in US samples, and 2) it was wildly overestimated by the initial study. There’s some rather extreme citation bias there too. I wonder why.


Even if this interaction effect was large, it is mostly pointless because shared environment effects go away with age for this trait.

Heritability of social success

Daphne talks about the heritability of social success, e.g. occupations, education and so on. While one can estimate the influence of cognitive ability indirectly, it’s easier to just estimate heritability directly, which was first done 40 years ago. The finding of heritable social success is more or less a given because: 1) if success is a function of psychological traits, and 2) psychological traits are moderately to highly heritable, then 3) success is pretty likely to be fairly heritable too. Herrnstein’s syllogism.

This field is in general not well researched because it lacks a unifying theory (which the S factor model provides, kinda sorta). However, here’s some findings:

Education: meta-analysis ~ 40% H, C = 36%, E=25% (yes, apparently, they sum to 101). 44% in Taubman 1967.

Income: 42% in NLSY. 25-54% in Finnish twins female/male. 48% in Taubman 1976.

Occupation: 43% among the 1944-1960 cohort.

‘Neighborhood deprivation’: 65% nation-wide Swedish dataset, 71% in Scotland.

Still missing a more general, S factor heritability study (this can be done using NLSY links). I will make this prediction: higher S loading → higher heritability. E.g. single year income has a poor S loading because it’s a bad indicator of S, and hence should have lower heritability. Heritability of S comes from highly heritable traits that include cognitive ability, personality and interests.

Self-identified race/ethnicity, race and cognitive ability

The forbidden topic!

I am fortunate that these are my views because they are politically correct and garner me praise, speaking and writing invitations, and book adoptions at the same time those who disagree with me are demeaned, ostracized, and in some cases threatened with tenure revocation even though their science is as reasonable as mine. Don’t get me wrong, I think their positions are incorrect and I have relished aiming my pen at what I regard to be their leaps of logic and flawed analysis. But they deeply believe that I am wrong. The problem is that I can tell my side far more easily than they are permitted to tell theirs, through invitations to speak at meetings, to contribute chapters and articles, etc.. This offends my sense of fairness and cannot be good for science. I think Saletan would agree with me on this.

See also: Double standards.

As for genomic ancestry and self-identity, there are lots of studies. Here’s some results from the PING dataset:


(Working paper here)

If we relate ancestry to cognitive ability while controlling for SIRE (and all the cultural effects related to SIRE), then we get:


The method is not entirely satisfactory. Better would be if we also had skin brightness from the persons (PING does not seem to have this), so we could control for any effects of skin-based racism, however implausible. Better yet, we could get a large set of racially admixed siblings. These siblings vary slightly in their genomic ancestry in imperceptible ways (e.g. one is 55% African, another is 50%). A genetic model predicts this small variation to be linked to cognitive ability. This design is neat because by virtue of using siblings, it controls for family effects. The results of such a study would be pretty conclusive. It’s not at all impossible, a very similar study has already been done for height. All that is needed is some large (10k? Maybe Gwern should do a power analysis? :) ) sibling dataset with genomic data, cognitive ability and skin brightness.

Or, we exploit the fact that while in the total population, skin brightness and racial ancestry are moderately to strongly correlated (r ≈ .50), they aren’t so between siblings (I can’t find an empirical demonstration of this). However, if skin-based racism is really responsible for cognitive differences, then skin brightness differences between siblings should show correlations to cognitive ability differences, educational attainment, income, and so on. Yet they don’t. And if we control for cognitive ability, skin brightness has positive correlations to education. The opposite of what colorism predicts. In general, colorism is not a good hypothesis. The race gaps are smaller for things like income, whereas these are the easiest things to discriminate for.

GWAS results/polygenic scores, by the way, replicate/work partly in non-European samples too, to a degree that depends on the relatedness according to an animal breeding study. This is because the causal tagging depends on LD patterns. SNPs are generally not causal (or so we think!), they are just close to the causal variants on the genome.

Isn’t it more than a little odd that a topic that so many people consider so important and which is so easy to get to the bottom of, attracts so little research attention? This debate is easy to settle: get access to medical datasets with the required variables.

October 10, 2016

Individual genomic admixture and cognitive ability

So, I posted this:

We used data from the PING study (n≈1200) to examine the relationship between cognitive ability, socioeconomic outcomes and genomic racial ancestry. We found that when genomic ancestry was not included in models, self-reported race/ethnicity (SIRE) was a useful predictor of cognitive ability/S, but when genomic ancestry was included, SIRE lost much or most of its validity. In particular, for African Americans, the SIRE standardized beta changed from about -1.00 to .20. European genomic ancestry was found to be positively related to cognitive ability/S (r’s .26/.33) including when SIRE was controlled, while African genomic ancestry was found to be negatively related to cognitive ability/S (r’s -.36/-.30) also when SIRE was controlled.

Key words: race differences, group differences, intelligence, IQ, cognitive ability, social status, inequality, ancestry, admixture, genomics

It’s not quite definitive because it lacks the ability to distinguish between colorism and genetic effects. Environmentalists are very stubborn, so one needs very strong evidence. So, we need datasets that have:

  • Genomic ancestry, preferably not using just a few AIMs
  • Racial appearance data: skin brightness in particular
  • Cognitive ability
  • Socioeconomic indicators (for S factor)

What datasets are there like this? Let’s find them and start applying so we can settle this question once and for all.

Pelotas (Brazil) Birth Cohort Study

  • n=3700.
  • Mixed race.
  • All four variables of interest.
  • Requires approval.
  • In Portuguese.

The Coronary Artery Risk Development in Young Adults Study (CARDIA)

  • “It began in 1985-6 with a group of 5115 black and white men and women aged 18-30 years”
  • 3 cognitive tests: Rey Auditory-Verbal Learning Test (RAVLT), Digit Symbol Substitution Test (DSST) and Stroop Test
  • Skin brightness. Can’t see on the website, but used in studies that used the dataset.
  • Socioeconomic outcomes. Yes, same as above.
  • Requires approval.

UK Biobank

  • n=500k planned. Currently about 100k.
  • Mostly Britons, but some British Africans and others.
  • Crappy cognitive tests, good genomic ancestry and no racial phenotype data (I think).
  • Requires approval. Must be for medical reasons.

The Multi-Ethnic Study of Atherosclerosis (MESA)

Add Health

May 5, 2016

Who prefers to date people of their own race?

Filed under: Differential psychology/psychometrics — Tags: , , , — Emil O. W. Kirkegaard @ 22:25

Data from the OKCupid project.

In light of a recent paper examining who prefers to date within their own religion, I recalled that there was a question about this in the OKCupid dataset, except that it is for race: “Would you strongly prefer to go out with someone of your own skin color / racial background?” It’s a dichotomous outcome, where yes = 1, so I used a logistic model. I added the usual predictors. Results:

                                            Beta   SE CI.lower CI.upper
age                                         0.03 0.01     0.00     0.06
gender: Man                                 0.00   NA       NA       NA
gender: Other                              -1.85 0.59    -3.01    -0.70
gender: Woman                               0.12 0.03     0.06     0.19
race: White                                 0.00   NA       NA       NA
race: Mixed                                -1.15 0.06    -1.27    -1.03
race: Asian                                -0.66 0.07    -0.80    -0.52
race: Hispanic / Latin                     -1.40 0.09    -1.57    -1.22
race: Black                                -1.49 0.09    -1.67    -1.31
race: Other                                -2.06 0.15    -2.36    -1.76
race: Indian                               -0.60 0.15    -0.89    -0.31
race: Middle Eastern                       -1.30 0.24    -1.78    -0.82
race: Native American                      -0.28 0.26    -0.78     0.22
race: Pacific Islander                     -2.93 0.72    -4.35    -1.52
CA                                         -0.41 0.02    -0.44    -0.38
ideology: Liberal / Left-wing               0.00   NA       NA       NA
ideology: Centrist                          0.74 0.04     0.66     0.82
ideology: Conservative / Right-wing         1.49 0.05     1.39     1.60
ideology: Other                             0.76 0.04     0.69     0.83
religion_seriousness: Not at all important  0.00   NA       NA       NA
religion_seriousness: Not very important    0.42 0.04     0.35     0.49
religion_seriousness: Somewhat important   -0.18 0.03    -0.25    -0.12
religion_seriousness: Extremely important  -0.09 0.03    -0.15    -0.04

        N pseudo-R2  deviance       AIC 
 34519.00      0.12  30143.27  30183.27

The betas are standardized. CA = cognitive ability. Measured from an ad hoc collection of 14 items.
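Since this is a logistic model, the betas are on the log-odds scale, and exponentiating them gives odds ratios that may be easier to read. The sketch below does this for three rows copied from the table. It assumes a simple Wald interval (beta ± 1.96 × SE), which may not be exactly how the table’s intervals were computed, and it works from the rounded values shown, so the results are approximate. The function name `odds_ratio_with_ci` is made up for this example.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression beta and SE to an odds ratio with a
    Wald-type confidence interval (ASSUMED; the table may use another method)."""
    lower = beta - z * se
    upper = beta + z * se
    return math.exp(beta), math.exp(lower), math.exp(upper)


# (beta, SE) pairs copied from the rounded table above.
rows = {
    "gender: Woman": (0.12, 0.03),
    "CA": (-0.41, 0.02),
    "ideology: Conservative / Right-wing": (1.49, 0.05),
}

for name, (beta, se) in rows.items():
    odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta, se)
    print(f"{name}: OR = {odds_ratio:.2f} "
          f"(95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Because the betas are standardized, the odds ratio for CA is per standard deviation of cognitive ability, while the categorical rows compare each group to its reference category (the rows with a beta of 0.00).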

Some conclusions:

  • Women are slightly more ethnocentric.
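  • All non-White races have negative coefficients, i.e. are less ethnocentric, although the confidence interval for Native Americans includes zero.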
  • Smarter people are less ethnocentric.
  • Non-liberals and particularly conservatives/right-wingers are more ethnocentric.
  • Religion has a non-linear relationship: those who rate religion as “not very important” are more ethnocentric than those who rate it “not at all important”, but those who rate it “somewhat” or “extremely” important are slightly less ethnocentric.

See also: heritability of ethnocentrism and related traits (sorry egalitarians, racists were born that way!)

R code:

Previously I found that religiousness predicted ethnocentrism fairly linearly, but I can’t reproduce the result — even using the same code! No idea what caused this result.

March 23, 2016

New papers out! Admixture in the Americas

I forgot to post this blog post at the time of publication as I usually do. However, here it is.

As explored in some previous posts, John Fuerst and I have spent about 1.25 years (!) producing a massive article: published version runs 119 pages; 25k words without the references; 159k characters incl. spaces. We received 6 critical comments by other scholars, to which we also produced a 57-page reply with new analyses. The first article was chosen as a target article in Mankind Quarterly and I recommend reading all the papers. Unfortunately, they are behind a paywall, except for ours:

  • Target paper:
  • Reply paper: