The sibling control design

A friend of mine and his brother just received their 23andme results.

[Image: 23andme results, brother 1]

[Image: 23andme results, brother 2]

In a table they look like this (I have added myself for comparison):

Macrorace (%)               Bro1    Bro2    Emil
European                    52.6    53      99.8
MENA                        42.5    41.3    0.2
South Asian                 2.8     3.4     0
East Asian & Amerindian     1.1     0.7     0
Sub-Saharan African         0.5     0.5     0
Oceanian                    0.5     0       0
Unassigned                  0       1.1     0.1
Sum                         100     100     100.1

Mesorace (%)                        Bro1    Bro2    Emil
European
  Northern                          51.5    51.5    91.3
  Southern                          1       1.2     0
  Ashkenazi                         0.1     0       2.9
  Eastern                           0       0       4
  Common European                   0.1     0.4     1.5
MENA
  Middle Eastern                    42      40.8    0
  North African                     0.3     0.2     0.2
  Common MENA                       0.2     0.3     0
South Asian                         2.8     3.4     0
East Asian & Amerindian
  East Asian                        0.7     0.4     0
  Southeast Asian                   0.2     0       0
  Amerindian                        0       0.1     0
  Common East Asian & Amerindian    0.1     0.1     0
Sub-Saharan African
  East                              0.3     0.3     0
  West                              0.2     0.4     0
  Central & South                   0       0       0
  Common Sub-Saharan African        0.1     0.1     0
Oceanian                            0       0       0
Unassigned                          0.5     1.1     0.1
Sum                                 100.1   100.3   100

Microrace (%)                       Bro1    Bro2    Emil
European
  Northern
    Scandinavian                    21.3    24.2    37.3
    French & German                 10.5    14.9    0.8
    British and Irish               8.9     4.9     11
    Finnish                         0       0       0.3
    Common Northern                 10.7    7.5     42
  Southern
    Italian                         0.9     0.8     0
    Sardinian                       0       0       0
    Iberian                         0       0       0
    Balkan                          0       0       0
    Common Southern                 0.1     0.4     0
  Ashkenazi                         0.1     0       2.9
  Eastern                           0       0       4
  Common European                   0.1     0.4     1.5
MENA
  Middle Eastern                    42      40.8    0
  North African                     0.3     0.2     0.2
  Common MENA                       0.2     0.3     0
South Asian                         2.8     3.4     0
East Asian & Amerindian
  East Asian
    Japanese                        0.2     0       0
    Mongolian                       0.1     0.2     0
    Korean                          0       0       0
    Yakut                           0       0       0
    Chinese                         0       0       0
    Common East Asian               0.5     0.2     0
  Southeast Asian                   0.2     0       0
  Amerindian                        0       0.1     0
  Common East Asian & Amerindian    0.1     0.1     0
Sub-Saharan African
  East                              0.3     0.3     0
  West                              0.2     0.4     0
  Central & South                   0       0       0
  Common Sub-Saharan African        0.1     0.1     0
Oceanian                            0       0       0
Unassigned                          0.5     1.1     0.1
Sum                                 100.1   100.3   100.1

 

Note that I have used data from all three zoom levels. Sometimes people ask the nonsensical question “How many races are there?” The answer depends on how far you want to zoom in. 23andme supports three zoom levels, and I have called the groups identified at each level macro-, meso- and microraces.
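To make the comparison between the two brothers concrete, here is a minimal Python sketch that subtracts Bro2 from Bro1 for each row of the macrorace table. The numbers are copied from that table; the function name sibling_differences is my own choice for illustration and not part of any 23andme tool.

```python
# Macro-level ancestry percentages copied from the table above: (Bro1, Bro2).
macro = {
    "European": (52.6, 53.0),
    "MENA": (42.5, 41.3),
    "South Asian": (2.8, 3.4),
    "East Asian & Amerindian": (1.1, 0.7),
    "Sub-Saharan African": (0.5, 0.5),
    "Oceanian": (0.5, 0.0),
    "Unassigned": (0.0, 1.1),
}


def sibling_differences(table):
    """Return Bro1 minus Bro2 per category, rounded to one decimal place."""
    return {name: round(b1 - b2, 1) for name, (b1, b2) in table.items()}


diffs = sibling_differences(macro)
for name, d in diffs.items():
    # Signed difference in percentage points.
    print(f"{name:<26}{d:+.1f}")
```

The largest gap at this zoom level is in the MENA row, at a little over one percentage point.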

So we see that the siblings are almost, but not exactly, the same. As Jason Malloy has pointed out, this is a very important fact because it allows for a sibling-control study akin to Murray (2002). In this design, researchers find full siblings, measure some predictor variable(s) for each sibling and compare the siblings on the outcome variable(s). The design is important because it removes the common environment (between-family effects) confound that makes regression results difficult to interpret, e.g. those in The Bell Curve (Herrnstein and Murray, 1994). Murray (2002) used each sibling’s IQ to predict socioeconomic outcomes in adulthood (age 30-38): income, marriage and birth out of wedlock. I reproduce the tables below:

[Images: Murray_table2, Murray_table3, Murray_table4]

The results are similar to those from the regression modeling presented in The Bell Curve. In other words, for this question, the effects were not due to the common environment confound.

The same design can be used to test whether racial ancestry predicts outcome variables such as general cognitive ability (g factor, IQ, etc.), income, educational attainment and crime rate. Siblings differ somewhat in their ancestry (as was shown in the tables and figures above), so if the genetic hypothesis for a trait is true, the differences in ancestry between siblings will slightly predict their differences in the trait.
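A minimal sketch of how such a within-family estimate could be computed, in plain Python. It subtracts each family's mean from both the ancestry and the trait values, so anything siblings share drops out, and then fits an ordinary least squares slope on the remaining deviations. The function names and all numbers below are hypothetical and synthetic, constructed so that the within-family slope is exactly 2 while a large family-level effect pushes the pooled slope in the opposite direction. This is one common way to do a sibling comparison, not necessarily the method used by Murray (2002).

```python
from collections import defaultdict


def within_family_slope(family_ids, ancestry, trait):
    """OLS slope of trait on ancestry using only within-family deviations."""
    groups = defaultdict(list)
    for fam, x, y in zip(family_ids, ancestry, trait):
        groups[fam].append((x, y))

    sxy = sxx = 0.0
    for members in groups.values():
        # Family means capture everything the siblings have in common.
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for x, y in members:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    if sxx == 0:
        raise ValueError("no within-family variation in ancestry")
    return sxy / sxx


def pooled_slope(ancestry, trait):
    """Ordinary OLS slope that ignores family membership (for contrast)."""
    n = len(ancestry)
    mean_x = sum(ancestry) / n
    mean_y = sum(trait) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ancestry, trait))
    sxx = sum((x - mean_x) ** 2 for x in ancestry)
    return sxy / sxx


# Synthetic example (made-up numbers): one sibling pair and one triple.
# trait = family effect + 2 * ancestry, with family effects 100 and -30.
family_ids = ["A", "A", "B", "B", "B"]
ancestry = [10.0, 12.0, 50.0, 47.0, 53.0]
trait = [120.0, 124.0, 70.0, 64.0, 76.0]

print("within-family slope:", within_family_slope(family_ids, ancestry, trait))
print("pooled slope:       ", pooled_slope(ancestry, trait))
```

On these made-up numbers the within-family slope recovers the value of 2 that was built in, while the pooled slope comes out negative because it mixes in the between-family differences. With real data the within-family slope would also need a standard error, which this sketch leaves out.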

For this to work in practice, one will need a large sample of sibling sets (pairs, triples, etc.). To keep things simple, the siblings should not be admixed from more than two genetic clusters/races. African Americans in the US, for example, are good for this purpose as they are mostly a mix of European and African ancestry, but there are other similar groups in the world: Colored in South Africa, Greenlanders in Denmark and Greenland (Moltke et al., 2015), admixed Hawaiians, and basically everybody in South America (see admixture project, part I).
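The two-cluster requirement could be screened for with something like the sketch below. It checks whether a person's two largest macro-level components account for nearly all of their assigned ancestry. The function name is_two_way_admixed and the 90% cutoff are assumptions of mine, chosen only for illustration; the two brothers' values come from the macrorace table above, and the three-way example is hypothetical.

```python
def is_two_way_admixed(ancestry, min_top_two=90.0, ignore=("Unassigned",)):
    """True if the two largest components sum to at least min_top_two percent.

    The 90% default is an arbitrary illustrative cutoff, not an established one.
    """
    values = [pct for name, pct in ancestry.items() if name not in ignore]
    top_two = sorted(values, reverse=True)[:2]
    return sum(top_two) >= min_top_two


# Macro-level percentages copied from the table above.
bro1 = {"European": 52.6, "MENA": 42.5, "South Asian": 2.8,
        "East Asian & Amerindian": 1.1, "Sub-Saharan African": 0.5,
        "Oceanian": 0.5, "Unassigned": 0.0}
bro2 = {"European": 53.0, "MENA": 41.3, "South Asian": 3.4,
        "East Asian & Amerindian": 0.7, "Sub-Saharan African": 0.5,
        "Oceanian": 0.0, "Unassigned": 1.1}

# Hypothetical three-way admixed person (made-up numbers).
three_way = {"European": 40.0, "Sub-Saharan African": 35.0,
             "East Asian & Amerindian": 25.0}

for label, person in [("Bro1", bro1), ("Bro2", bro2), ("three-way", three_way)]:
    print(label, is_two_way_admixed(person))
```

Under this cutoff both brothers pass, since European and MENA together make up more than 90% of each profile, while the hypothetical three-way profile does not.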

References