
Introduction and data sources

In my previous two posts, I analyzed the S factor in 33 Indian states and 31 Chinese regions. In both samples I found strongish S factors and they both correlated positively with cognitive estimates (IQ or G). In this post I used cognitive data from McDaniel (2006). He gives two sets of estimated IQs based on SAT-ACT and on NAEP. Unfortunately, they only correlate .58, so at least one of them is not a very accurate estimate of general intelligence.

His article also reports some correlations between these IQs and socioeconomic variables: Gross State Product per capita, median income and percent poverty. However, data for these variables is not given in the article, so I did not use them. Not quite sure where his data came from.

However, with cognitive data like this and the relatively large number of datapoints (50 or 51 depending on use of the District of Columbia), it is possible to do a rather good study of the S factor and its correlates. High quality data for US states are readily available, so results should be strong. Factor analysis requires a case to variable ratio of at least 2:1 to deliver reliable results (Zhao, 2009). So, this means that one can do an S factor analysis with about 25 variables.

Thus, I set out to find about 25 diverse socioeconomic variables. There are two reasons to gather a very diverse sample of variables. First, for method of correlated vectors to work (Jensen, 1998), there must be variation in the indicators’ loading on the factor. Lack of variation causes restriction of range problems. Second, lack of diversity in the indicators of a latent variable leads to psychometric sampling error (Jensen, 1994; review post here for general intelligence measures).

My primary source was The 2012 Statistical Abstract website. I simply searched for “state” and picked various measures. I tried to pick things that weren’t too dependent on geography. E.g. kilometer of coast line per capita would be very bad since it is not socioeconomic and is very dependent (near 100%) on geographical factors. To increase reliability, I generally used all data for the last 10 years and averaged them. Curious readers should see the datafile for details.
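As a rough illustration of the averaging step, here is a minimal sketch in base R. The data frame, its column names and the numbers are made up for the example; the actual variables and years are in the datafile.

```r
# Hypothetical yearly values for one indicator (made-up numbers, not real data)
murder <- data.frame(
  murder_2008 = c(5.1, 2.3, NA),
  murder_2009 = c(4.9, 2.5, 7.0),
  murder_2010 = c(5.3, 2.1, 6.6),
  row.names = c("StateA", "StateB", "StateC")
)

# Average over the available years; na.rm = TRUE skips years with missing data
murder_mean <- rowMeans(murder, na.rm = TRUE)
murder_mean
```

Averaging over whatever years are available keeps states with a missing year in the dataset, at the cost of basing their mean on fewer datapoints.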

I ended up with the following variables:

  1. Murder rate per 100k, 10 years
  2. Proportion with high school or more education, 4 years
  3. Proportion with bachelor or more education, 4 years
  4. Proportion with advanced degree or more, 4 years
  5. Voter turnout, presidential elections, 3 years
  6. Voter turnout, house of representatives, 6 years
  7. Percent below poverty, 10 years
  8. Personal income per capita, 1 year
  9. Percent unemployed, 11 years
  10. Internet usage, 1 year
  11. Percent smokers, male, 1 year
  12. Percent smokers, female, 1 year
  13. Physicians per capita, 1 year
  14. Nurses per capita, 1 year
  15. Percent with health care insurance, 1 year
  16. Percent in ‘Medicaid Managed Care Enrollment’, 1 year
  17. Proportion of population urban, 1 year
  18. Abortion rate, 5 years
  19. Marriage rate, 6 years
  20. Divorce rate, 6 years
  21. Incarceration rate, 2 years
  22. Gini coefficient, 10 years
  23. Top 1%, proportion of total income, 10 years
  24. Obesity rate, 1 year

Most of these are self-explanatory. For the economic inequality measures, I found 6 different measures (here). Since I wanted diversity, I chose the Gini and the top 1% because these correlated the least and are well-known.

Aside from the above, I also fetched the racial proportions for each state, to see how they relate to the S factor (and the various measures above, but to get these, run the analysis yourself).

I used R with RStudio for all analyses. Source code and data are available in the supplementary material.

Missing data

In large analyses like this there are nearly always some missing data. The matrixplot() looks like this:

matrixplot

(It does not seem possible to change the font size, so I have cut off the names at the 8th character.)

We see that there aren’t many missing values. I imputed all the missing values with the VIM package (deterministic imputation using multiple regression).

Extreme values

A useful feature of the matrixplot() is that it shows in greytone the relative outliers for each variable. We can see that some of them have some hefty outliers, which may be data errors. Therefore, I examined them.

The outlier in the two university degree variables is DC, surely because the government is based there and there is a huge lobbyist center. For the marriage rate, the outlier is Nevada. Many people go there and get married. Physician and nurse rates are also DC, same reason (maybe one could make up some story about how politics causes health problems!).

After imputation, the matrixplot() looks like this:

matrixplot_after

It is pretty much the same as before, which means that we did not substantially change the data — good!

Factor analyzing the data

Then we factor analyze the data (socioeconomic data only). We plot the loadings (sorted) with a dotplot:

S_loadings_US

We see a wide spread of variable loadings. All but two of them load in the expected direction (positive for socially valued outcomes, negative for the opposite), showing the existence of the S factor. The ‘exceptions’ are: abortion rate loading +.60, but often seen as a negative thing. It is however open to discussion. Maybe higher abortion rates can be interpreted as less backward religiousness or more freedom for women (both good in my view). The other is marriage rate at -.19 (weak loading). I’m not sure how to interpret that. In any case, for both of these it is debatable which direction is the desirable one.
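For readers who want to see the mechanics, here is a minimal sketch of extracting one general factor and sorting its loadings. It uses simulated data and base R's `factanal` (maximum likelihood) as a stand-in, not the minres extraction on the real state data; the indicator names and the loadings used in the simulation are assumptions for illustration only.

```r
set.seed(1)
n <- 200                                  # simulated cases (more than the 50 states, for stability)
s_true <- rnorm(n)                        # simulated latent factor, not real data
# Assumed loadings: positive for valued outcomes, negative for the opposite
true_load <- c(educ = 0.8, income = 0.7, internet = 0.6,
               poverty = -0.7, murder = -0.5, smoking = -0.6)
indicators <- sapply(true_load, function(l) l * s_true + sqrt(1 - l^2) * rnorm(n))

# One-factor maximum-likelihood factor analysis (stats::factanal)
fit <- factanal(indicators, factors = 1, scores = "regression")
loadings_s <- fit$loadings[, 1]

# The sign of a factor is arbitrary: orient it so that education loads positively
if (loadings_s["educ"] < 0) {
  loadings_s <- -loadings_s
  fit$scores <- -fit$scores
}

# Sorted loadings, analogous to the dotplot above
sort(loadings_s)
# dotchart(sort(loadings_s))  # uncomment to draw the plot
```

The explicit orientation step matters because a factor and its mirror image fit the data equally well, so the software may return either.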

Correlations with cognitive measures

And now comes the big question: does state S correlate with our IQ estimates? It does. The correlations are .14 (SAT-ACT) and .43 (NAEP). These are fairly low given our expectations. Perhaps we can work out what is happening if we plot them:

S_IQ1 S_IQ2

Now we can see what is going on. First, the SAT-ACT estimates are pretty strange for three states: California, Arizona and Nevada. I note that these are three adjacent states, so it is quite possibly some kind of regional testing practice that’s throwing off the estimates. If someone knows, let me know. Second, DC is a huge outlier in S, as we may have expected from our short discussion of extreme values above. It’s basically a city state which is half-composed of low s (SES) African Americans and half upper class related to government.

Dealing with outliers – Spearman’s correlation aka. rank-order correlation

There are various ways to deal with outliers. One simple way is to convert the data into ranked data, and just correlate those like normal. Pearson’s correlations assume that the data are normally distributed, which is often not the case with higher-level data (states, countries). Using rank-order gets us these:

S_IQ1_rank S_IQ2_rank

So the correlations improved a lot for the SAT-ACT IQs and a bit for the NAEP ones.
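To make the rank-order idea concrete, here is a small sketch with made-up numbers (not the state data). It shows that correlating the ranks with the ordinary Pearson formula gives the same value as R's built-in Spearman option.

```r
# Made-up values for five units; the last one is an extreme outlier on x
x <- c(1, 2, 3, 4, 100)
y <- c(2, 1, 4, 3, 5)

r_pearson  <- cor(x, y)                       # sensitive to the outlier
r_rank     <- cor(rank(x), rank(y))           # Pearson on the ranks
r_spearman <- cor(x, y, method = "spearman")  # built-in rank-order correlation

# The last two are the same quantity
all.equal(r_rank, r_spearman)
```

Because the outlier is only one rank above its neighbour after ranking, it no longer dominates the result.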

Results without DC

Another idea is simply excluding the strange DC case, and then re-running the factor analysis. This procedure gives us these loadings:

S_loadings_noDC

(I have reversed them, because they were reversed e.g. education loading negatively.)

These are very similar to before; excluding DC did not substantially change the results (good). Actually, the factor is a bit stronger without DC throwing off the results (using minres, proportion of var. = 36%, vs. 30%). The reason this happens is that DC is an odd case, scoring very high in some indicators (e.g. education) and very poorly in others (e.g. murder rate).

The correlations are:

S_IQ1_noDC S_IQ2_noDC

So, not surprisingly, we see an increase in the effect sizes from before: .14 to .31 and .43 to .69.

Without DC and rank-order

Still, one may wonder what the results would be with rank-order and DC removed. Like this:

S_IQ1_noDC_rank S_IQ2_noDC_rank

So compared to before, effect size increased for the SAT-ACT IQ and decreased slightly for the NAEP IQ.

Now, one could also do regression with weights based on some metric of the state population and this may further change results, but I think it’s safe to say that the cognitive measures correlate in the expected direction and with the removal of one strange case, the better measure performs at about the expected level with or without using rank-order correlations.
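The weighting idea mentioned above was not run in this post. As a hedged sketch of what it might look like, the following uses base R's `cov.wt` to get a weighted correlation. The helper name `weighted_cor`, the numbers and the choice of square root of population as the weight are all assumptions for illustration.

```r
# Weighted Pearson correlation using stats::cov.wt
weighted_cor <- function(x, y, w) {
  w <- w / sum(w)                       # normalise the weights to sum to 1
  cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
}

# Made-up illustration: S scores, IQ estimates and populations (in millions)
s   <- c(-1.2, -0.4, 0.1, 0.6, 1.3)
iq  <- c(96, 99, 100, 101, 104)
pop <- c(3, 38, 6, 20, 1)

weighted_cor(s, iq, sqrt(pop))          # sqrt(population) is one possible metric
```

With equal weights the function reduces to the ordinary correlation, which is a handy sanity check before trying other weighting schemes.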

Method of correlated vectors

The MCV (Jensen, 1998) can be used to test whether a specific latent variable underlying some data is responsible for the observed correlation between the factor score (or factor score approximation such as IQ, an unweighted sum) and some criterion variable. Altho originally invented for use on cognitive test data and the general intelligence factor, I have previously used it in other areas (e.g. Kirkegaard, 2014). I also used it in the previous post on the S factor in India (but not China because there was a lack of variation in the loadings of socioeconomic variables on the S factor).

Using the dataset without DC, the MCV result for the NAEP dataset is:

MCV_NAEP_noDC

So, again we see that MCV can reach high r’s when there is a large number of diverse variables. But note that the value can be considered inflated because of the negative loadings of some variables. It is debatable whether one should reverse them.
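For those unfamiliar with the method, here is a minimal sketch of the computation on simulated data. Everything in it (the loadings, the sample size, the indicator names) is made up, and `factanal` stands in for whatever extraction method one prefers. It also computes the variant with negatively loading indicators reversed, which is the debatable choice mentioned above.

```r
set.seed(2)
n <- 200
s_true <- rnorm(n)                                   # simulated latent S
iq <- 0.6 * s_true + sqrt(1 - 0.6^2) * rnorm(n)      # simulated criterion variable
true_load <- c(0.9, 0.7, 0.5, 0.3, -0.3, -0.6, -0.8) # assumed loadings
indicators <- sapply(true_load, function(l) l * s_true + sqrt(1 - l^2) * rnorm(n))
colnames(indicators) <- paste0("ind", seq_along(true_load))

# Vector 1: each indicator's loading on the general factor
loadings_s <- factanal(indicators, factors = 1)$loadings[, 1]
if (loadings_s["ind1"] < 0) loadings_s <- -loadings_s  # fix the arbitrary sign

# Vector 2: each indicator's correlation with the criterion
r_criterion <- as.vector(cor(indicators, iq))

# MCV result: the correlation between the two vectors
mcv_r <- cor(loadings_s, r_criterion)

# Variant: reverse the negatively loading indicators first
flip <- sign(loadings_s)
mcv_r_reversed <- cor(loadings_s * flip, r_criterion * flip)

c(mcv_r = mcv_r, mcv_r_reversed = mcv_r_reversed)
```

Reversing the negative indicators shrinks the range of the loadings, which is why the unreversed value can look inflated by comparison.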

Racial proportions of states and S and IQ

A last question is whether the states’ racial proportions predict their S score and their IQ estimate. There are lots of problems with this. First, the actual genomic proportions within these racial groups vary by state (Bryc, 2015). Second, within ‘pure-breed’ groups, general intelligence varies by state too (this was shown in the testing of draftees in the US in WW1). Third, there is an ‘other’ group that also varies from state to state, presumably different kinds of Asians (Japanese, Chinese, Indians, other SE Asia). Fourth, it is unclear how one should combine these proportions into an estimate used for correlation analysis or model them. Standard multiple regression is unsuited for handling this kind of data with a perfect linear dependency, i.e. the total proportion must add up to 1 (100%). MR assumes that the ‘independent’ variables are.. independent of each other. Surely some method exists that can handle this problem, but I’m not familiar with it. Given the four problems above, one will not expect near-perfect results, but one would probably expect most going in the right direction with non-near-zero size.

Perhaps the simplest way of analyzing it is correlation. These are susceptible to random confounds when e.g. white% correlates differentially with the other racial proportions. However, they should get the basic directions correct if not the effect size order too.
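The linear dependency problem can be seen directly in a small simulation. This is only a sketch with made-up proportions and a made-up outcome, not the state data: when all four proportions are entered together with an intercept, `lm` cannot estimate one of the coefficients.

```r
set.seed(3)
n <- 50
# Simulated proportions for four groups that sum to 1 in every row (not real data)
raw <- matrix(runif(n * 4), ncol = 4)
props <- as.data.frame(raw / rowSums(raw))
names(props) <- c("white", "black", "hispanic", "other")
props$iq <- 100 - 10 * props$black - 5 * props$hispanic + rnorm(n)

# All four proportions plus an intercept: one coefficient cannot be estimated
fit_all <- lm(iq ~ white + black + hispanic + other, data = props)
coef(fit_all)                      # the last proportion comes out as NA

# Dropping one group makes the model estimable; the remaining coefficients
# are then relative to the omitted group
fit_drop <- lm(iq ~ black + hispanic + other, data = props)
coef(fit_drop)
```

Dropping a group works mechanically, but the coefficients change meaning depending on which group is left out, so it does not remove the interpretation problem.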

Racial proportions, NAEP IQ and S

For this analysis I use only the NAEP IQs and without DC, as I believe this is the best subdataset to rely on. I correlate this with the S factor and each racial proportion. The results are:

Racial group  NAEP IQ      S
White            0.69   0.18
Black           -0.50  -0.42
Hispanic        -0.38  -0.08
Other           -0.26   0.20

For NAEP IQ, depending on what one thinks of the ‘other’ category, these have either exactly or roughly the order one expects: W>O>H>B. If one thinks “other” is mostly East Asian (Japanese, Chinese, Korean) with higher cognitive ability than Europeans, one would expect O>W>H>B. For S, however, the order is now O>W>H>B and the effect sizes much weaker. In general, given the limitations above, these are perhaps reasonable if somewhat on the weak side for S.

Estimating state IQ from racial proportions using racial IQs

One way to utilize all four variables (white, black, hispanic and other) without having MR assign them weights is to assign them weights based on known group IQs and then calculate a mean estimated IQ for each state.

Depending on which estimates for group IQs one accepts, one might use something like the following:

State IQ est. = White*100+Other*100+Black*85+Hispanic*90

Or if one thinks other is somewhat higher than whites (this is not entirely unreasonable, but recall that the NAEP includes reading tests which foreigners and Asians perform less well on), one might want to use 105 for the other group (#2). Or one might want to raise black and hispanic IQs a bit, perhaps to 88 and 93 (#3). Or do both (#4). I did all of these variations, and the results are:

Variable  Race.IQ  Race.IQ2  Race.IQ3  Race.IQ4
Race.IQ      1.00      0.96      1.00      0.93
Race.IQ2     0.96      1.00      0.96      0.99
Race.IQ3     1.00      0.96      1.00      0.94
Race.IQ4     0.93      0.99      0.94      1.00
NAEP IQ      0.67      0.56      0.67      0.51
S            0.41      0.44      0.42      0.45

As far as I can tell, there is no strong reason to pick any of these over the others. However, what we learn is that the correlation between the racial IQ estimate and the NAEP IQ estimate is somewhere between .51 and .67, and the correlation between the racial IQ estimate and S is somewhere between .41 and .45. These are reasonable results given the problems of this analysis described above, I think.
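The weighting formula is just a proportion-weighted mean. A minimal sketch follows, using the group values from the formula above and variant #2; the function name `estimate_state_iq` and the two example states with their proportions are made up for illustration.

```r
# Estimated state IQ as a proportion-weighted mean of assumed group IQs
estimate_state_iq <- function(props, group_iq) {
  stopifnot(identical(colnames(props), names(group_iq)))
  as.vector(props %*% group_iq)
}

# Made-up proportions for two hypothetical states (each row sums to 1)
props <- rbind(StateA = c(white = 0.70, other = 0.05, black = 0.15, hispanic = 0.10),
               StateB = c(white = 0.90, other = 0.02, black = 0.03, hispanic = 0.05))

# Weights from the formula in the text, and the variant with 'other' at 105 (#2)
iq1 <- c(white = 100, other = 100, black = 85, hispanic = 90)
iq2 <- c(white = 100, other = 105, black = 85, hispanic = 90)

est1 <- estimate_state_iq(props, iq1)
est2 <- estimate_state_iq(props, iq2)
rbind(est1, est2)
```

Since the variants only shift a few group values by a few points, the resulting state estimates are bound to correlate very highly with each other, as the table shows.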

Added March 11: New NAEP data

I came across a series of posts by science blogger The Audacious Epigone, who has also estimated IQs based on NAEP data. He has done this three times (for 2013, 2009 and 2005 data), so along with McDaniel’s estimates, this gives us 4 non-identical estimates. First, we check their intercorrelations, which should be very high, r>.9, for this kind of data. Second, we extract the general factor and use it as the best estimate of NAEP IQ for the states (I deleted DC again). Third, we see how all 5 variables relate to S from before.

Results:

            NAEP.IQ.13  NAEP.IQ.09  NAEP.IQ.05  NAEP M.  NAEP.1
NAEP.IQ.09        0.96
NAEP.IQ.05        0.83        0.89
NAEP M.           0.88        0.93        0.96
NAEP.1            0.95        0.99        0.95     0.97
S                 0.81        0.76        0.64     0.69    0.75

Where NAEP.1 is the general NAEP factor. We see that the intercorrelations between the NAEP estimates are not that high; they average only .86. Their loadings on the common factor are very high tho, .95 to .99. Still, this should improve the results because the general factor contains less measurement error. And it does: NAEP IQ x S is now .75, up from .69.
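The logic of combining several noisy estimates can be illustrated with a small simulation. This is a sketch under made-up assumptions (the true correlation, the noise level and the sample size are arbitrary), with `factanal` standing in for the factor extraction.

```r
set.seed(4)
n <- 500
true_iq <- rnorm(n)                               # simulated true score
s <- 0.7 * true_iq + sqrt(1 - 0.7^2) * rnorm(n)   # simulated criterion (S)
# Four noisy estimates of the same true score
estimates <- sapply(1:4, function(i) true_iq + 0.5 * rnorm(n))
colnames(estimates) <- paste0("est", 1:4)

# General factor scores from a one-factor model
fa_fit <- factanal(estimates, factors = 1, scores = "regression")
general <- fa_fit$scores[, 1]

r_single  <- cor(estimates, s)[, 1]   # each estimate separately
r_general <- abs(cor(general, s))     # abs() because the factor sign is arbitrary

c(mean_single = mean(r_single), general = r_general)
```

The general factor correlates more strongly with the criterion than the typical single estimate does, because the independent errors partly cancel out.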

Scatter plot

NAEP_S_new

 

Supplementary material

Data files and R source code available on the Open Science Framework repository.

References

Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2015). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics, 96(1), 37-53.

Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.

Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.

Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.

McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601-606.

Zhao, N. (2009). The Minimum Sample Size in Factor Analysis. Encorewiki.org.

Some time ago a new paper came out from the 23andme people reporting admixture among US ethnoracial groups (Bryc et al, 2014). Per our still on-going admixture project (current draft here), one could see if admixture predicts academic achievement (or IQ, if such were available). We (that is, John did) put together achievement data (reading and math scores) from the NAEP and the admixture data here.

Descriptive stats

Admixture studies do not work well if there is no or little variation within groups. So let’s first examine them. For blacks:

                      vars  n mean   sd median trimmed  mad  min  max range  skew kurtosis   se
BlackAfricanAncestry     1 31 0.74 0.04   0.74    0.74 0.03 0.64 0.83  0.19 -0.03    -0.38 0.01
BlackEuropeanAncestry    1 31 0.23 0.04   0.24    0.23 0.03 0.15 0.34  0.19  0.09    -0.30 0.01

 

So we see that there is little American admixture in Blacks because the African and European add up to close to 100 (23+74=97). In fact, the correlation between African and European ancestry in Blacks is -.99. This also means that multiple correlation is useless because of collinearity.
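To put a number on the collinearity: with only two predictors, the variance inflation factor follows directly from their correlation. A quick check, using the -.99 reported above:

```r
# With two predictors, the variance inflation factor is 1 / (1 - r^2)
r_afr_eur <- -0.99                  # correlation reported in the text
vif <- 1 / (1 - r_afr_eur^2)
vif
```

That is roughly a 50-fold inflation of the sampling variance of the coefficients, far beyond the usual rules of thumb for when collinearity becomes a problem.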

White admixture data is also not very useful. It is almost exclusively European:

                      vars  n mean sd median trimmed mad  min max range  skew kurtosis se
WhiteEuropeanAncestry    1 51 0.99  0   0.99    0.99   0 0.98   1  0.02 -0.95     0.74  0

What about Hispanics (some sources call them Latinos)?

                       vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
LatinoEuropeanAncestry    1 34 0.73 0.07   0.72    0.73 0.05 0.57 0.90  0.33 0.34     0.22 0.01
LatinoAfricanAncestry     1 34 0.09 0.05   0.08    0.08 0.06 0.01 0.22  0.21 0.51    -0.69 0.01
LatinoAmericanAncestry    1 34 0.10 0.05   0.09    0.10 0.03 0.04 0.21  0.17 0.80    -0.47 0.01

Hispanics are fairly admixed. Overall, they are mostly European, but the range of African and American ancestry is quite high. Furthermore, due to the three way variation, multiple regression should work. The ancestry intercorrelations are: -.42 (Afro x Amer) -.21 (Afro x Euro) -.50 (Amer x Euro). There must also be another source because 73+9+10 is only 92%. Where’s the last 8% admixture from?

Admixture x academic achievement correlations: Blacks

   row.names BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
1  Math2013B                -0.32                  0.09                  0.29
2  Math2011B                -0.27                  0.21                  0.25
3  Math2009B                -0.30                  0.09                  0.28
4  Math2007B                -0.12                  0.27                  0.08
5  Math2005B                -0.28                  0.26                  0.23
6  Math2003B                -0.30                  0.15                  0.26
7  Math2000B                -0.36                 -0.08                  0.34
8  Read2013B                -0.25                  0.14                  0.22
9  Read2011B                -0.33                  0.22                  0.30
10 Read2009B                -0.40                 -0.03                  0.41
11 Read2007B                -0.26                  0.14                  0.24
12 Read2005B                -0.43                  0.33                  0.39
13 Read2003B                -0.42                  0.09                  0.38
14 Read2002B                -0.30                 -0.10                  0.27

Summarizing these results:

     vars  n  mean   sd median trimmed  mad   min   max range  skew kurtosis   se
Afro    1 14 -0.31 0.08  -0.30   -0.32 0.05 -0.43 -0.12  0.31  0.48     0.10 0.02
Amer    1 14  0.13 0.13   0.14    0.13 0.11 -0.10  0.33  0.43 -0.32    -1.07 0.03
Euro    1 14  0.28 0.08   0.28    0.29 0.06  0.08  0.41  0.33 -0.49     0.11 0.02

So we see the expected directions and order: for Blacks (who are mostly African), American admixture is positive and European is more positive. There is quite a bit of variation over the years. It is possible that this reflects mostly ‘noise’ as in, e.g. changes in educational policies in the states, or just sampling error. It is also possible that the changes are due to admixture changes within states over time.

Admixture x academic achievement correlations: Hispanics

   row.names LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
1  Math13H                    0.20                  -0.13                  -0.10
2  Math11H                    0.27                   0.02                  -0.02
3  Math09H                    0.29                  -0.32                   0.04
4  Math07H                    0.36                  -0.14                  -0.01
5  Math05H                    0.38                  -0.08                   0.00
6  Math03H                    0.37                  -0.23                  -0.08
7  Math00H                    0.30                  -0.09                  -0.05
8  Read2013H                  0.18                  -0.44                   0.33
9  Read2011H                  0.21                  -0.26                   0.33
10 Read2009H                  0.19                  -0.44                   0.33
11 Read2007H                  0.13                  -0.32                   0.23
12 Read2005H                  0.38                  -0.30                   0.23
13 Read2003H                  0.32                  -0.34                   0.18
14 Read2002H                  0.24                  -0.23                   0.08

And summarizing:

     vars  n  mean   sd median trimmed  mad   min  max range  skew kurtosis   se
Afro    1 14  0.27 0.08   0.28    0.28 0.12  0.13 0.38  0.25 -0.10    -1.49 0.02
Amer    1 14 -0.24 0.14  -0.24   -0.24 0.15 -0.44 0.02  0.46  0.17    -1.13 0.04
Euro    1 14  0.11 0.16   0.06    0.11 0.19 -0.10 0.33  0.43  0.23    -1.68 0.04

We do not see the results expected under a genetic model. Among Hispanics who are 73% European, African admixture has a positive relationship to academic achievement. American admixture is negatively correlated and European positively, but weaker than African. The only thing that’s in line with the genetic model is that European is positive. On the other hand, results are not in line with a null model either, because then we would expect results to fluctuate around 0.

Note that the European admixture numbers are only positive for the reading tests. The reading tests are presumably those mostly affected by language bias (many Hispanics speak Spanish as a first language). If anything, the math results are worse for the genetic model.

General achievement factors

We can eliminate some of the noise in the data by extracting a general achievement factor for each group. I do this by first removing the cases with no data at all, and then imputing the rest.

Then we get the correlation like before. This should be fairly close to the means above:

 LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry 
                  0.28                  -0.36                   0.22

The European result is stronger with the general factor from the imputed dataset, but the order is the same.

We can do the same for the Black data to see if the imputation+factor analysis screws up the results:

 BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry 
                -0.35                  0.20                  0.31

These results are similar to before (-.31, .13, .28) with the American result somewhat stronger.

Plotting

Perhaps if we plot the results, we can figure out what is going on. We can plot either the general achievement factor, or specific results. Let’s do both:

Reading2013 plots

hispanic_afro_read13 hispanic_amer_read13 hispanic_euro_read13

Math2013 plots

hispanic_afro_math13 hispanic_amer_math13 hispanic_euro_math13

General factor plots

hispanic_afro_general hispanic_amer_general hispanic_euro_general

These did not help me understand it. Maybe they make more sense to someone who understands US demographics and history better.

Multiple regression

As mentioned above, the Black data should be mostly useless for multiple regression due to high collinearity. But the Hispanic data should be better. I ran models using two of the three ancestry estimates at a time since one cannot use all three (I think).

Generally, the independents did not reach significance. Using the general achievement factor as the dependent, the standardized betas are:

LatinoAfricanAncestry LatinoAmericanAncestry
             0.1526765             -0.2910413
LatinoAfricanAncestry LatinoEuropeanAncestry
             0.3363636              0.2931108
LatinoAmericanAncestry LatinoEuropeanAncestry
           -0.32474678             0.06224425

The first is relative to European, second to American, and third African. The results are not even consistent with each other. In the first, African>European. In the third, European>African. All results show that Others>American tho.

The remainder

There is something odd about the data: it doesn’t sum to 1. I calculated the sum of the ancestry estimates, and then subtracted that from 1. Here are the results:

black_remainder hispanic_remainder

To these we can add simple descriptive stats:

                        vars  n mean   sd median trimmed  mad  min  max range skew kurtosis   se
BlackRemainderAncestry     1 31 0.02 0.00   0.02    0.02 0.00 0.01 0.03  0.02 1.35     1.18 0.00
LatinoRemainderAncestry    1 34 0.08 0.05   0.07    0.07 0.03 0.02 0.34  0.32 3.13    12.78 0.01

 

So we see that there is a sizable other proportion of Hispanics and a small one for Blacks. Presumably, the large outlier of Hawaii is Asian admixture from Japanese, Chinese, Filipino and Native Hawaiian clusters. At least, these are the largest groups according to Wikipedia. For Blacks, the ancestry is presumably Asian admixture as well.

Do these remainders correlate with academic achievement? For Blacks, r = .39 (p = .03), and for Hispanics r = -.24 (p = .18). So the direction is as expected for Blacks and stronger, but for Hispanics, it is in the right direction but weaker.

Partial correlations

What about partialing out the remainders?

LatinoAfricanAncestry LatinoAmericanAncestry LatinoEuropeanAncestry
            0.21881404            -0.33114612             0.09329413
BlackAfricanAncestry BlackAmericanAncestry BlackEuropeanAncestry
           -0.2256171             0.1189219             0.2185139

 

Not much has changed. European correlation has become weaker for Hispanics. For Blacks, results are similar to before.
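For readers unfamiliar with partialing, here is a minimal sketch of a first-order partial correlation in base R on simulated data. The variable roles in the comments are only illustrative and the helper `partial_cor` is not part of any package. It also shows the equivalent residual-based computation.

```r
# Partial correlation of x and y controlling for z, from the pairwise correlations
partial_cor <- function(x, y, z) {
  rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

set.seed(5)
z <- rnorm(40)                 # simulated control variable (e.g. a remainder)
x <- 0.5 * z + rnorm(40)       # simulated ancestry proportion
y <- 0.5 * z + rnorm(40)       # simulated achievement score

p1 <- partial_cor(x, y, z)
# Equivalent: correlate the residuals after regressing each variable on z
p2 <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
c(formula = p1, residuals = p2)
```

Both routes give the same number, so one can use whichever is more convenient.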

Proposed explanations?

The African results are in line with genetic models. The Hispanic results are not, but they aren’t in line with the null-model either. Perhaps it has something to do with generational effects. Perhaps one could find the % of first generation Hispanics by state and add it to the regression model, or control for it using partial correlations.

Other ideas? Before calculating the results, John wrote:

Language, generation, and genetic assimilation are all confounded, so I thought it best to not look at them.

He may be right.

R code

data = read.csv("BryceAdmixNAEP.tsv", sep="\t",row.names=1)
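library(Hmisc) # for rcorr; loaded before psych so that psych's describe is the one used
library(car) # for vif and scatterplot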
library(psych) # for describe
library(VIM) # for imputation
library(QuantPsyc) #for lm.beta
library(devtools) #for source_url
#load mega functions
source_url("https://osf.io/project/zdcbq/osfstorage/files/mega_functions.R/?action=download")

#descriptive stats
#blacks
rbind(describe(data["BlackAfricanAncestry"]),
describe(data["BlackEuropeanAncestry"]))
#whites
describe(data["WhiteEuropeanAncestry"])
#hispanics
rbind(describe(data["LatinoEuropeanAncestry"]),
      describe(data["LatinoAfricanAncestry"]),
      describe(data["LatinoAmericanAncestry"]))

##Regressions
#Blacks: pick one model string; only the last one assigned gets fitted
black.model = "Math2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackAmericanAncestry"
black.model = "Math2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.model = "Read2013B ~ BlackAfricanAncestry+BlackEuropeanAncestry"
black.fit = lm(black.model, data)
summary(black.fit)

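#Hispanics: pick one model string; only the last one assigned gets fitted
#models using hispanic.ach.factor need the 'Imputed and aggregated data' section below to be run first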
hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "Math2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "Read2013H ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.model = "hispanic.ach.factor ~ LatinoAfricanAncestry+LatinoAmericanAncestry+LatinoEuropeanAncestry"
hispanic.fit = lm(hispanic.model, data)
summary(hispanic.fit)
lm.beta(hispanic.fit)

##Correlations
cors = round(rcorr(as.matrix(data))$r,2) #all correlations, round to 2 decimals

#blacks
admixture.cors.black = cors[10:23,1:3] #Black admixture x Achv.
hist(unlist(admixture.cors.black[,1])) #hist for afri x achv
hist(unlist(admixture.cors.black[,2])) #amer x achv
hist(unlist(admixture.cors.black[,3])) #euro x achv
desc = rbind(Afro=describe(unlist(admixture.cors.black[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.black[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.black[,3]))) #euro x achv

#whites
admixture.cors.white = cors[24:25,4:6] #White admixture x Achv.

#hispanics
admixture.cors.hispanic = cors[26:39,7:9] #Hispanic admixture x Achv.
desc = rbind(Afro=describe(unlist(admixture.cors.hispanic[,1])), #descp. stats afri x achv
             Amer=describe(unlist(admixture.cors.hispanic[,2])), #amer x achv
             Euro=describe(unlist(admixture.cors.hispanic[,3]))) #euro x achv

##Examine hispanics by scatterplots
#Reading
scatterplot(Read2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Read2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
#Math
scatterplot(Math2013H ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(Math2013H ~ LatinoEuropeanAncestry, data,
            smoother=FALSE,id.n=nrow(data))
scatterplot(Math2013H ~ LatinoAmericanAncestry, data,
            smoother=FALSE,id.n=nrow(data))
#General factor
scatterplot(hispanic.ach.factor ~ LatinoAfricanAncestry, data,
            smoother=FALSE, id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoEuropeanAncestry, data,
            smoother=FALSE,id.n=nrow(data))
scatterplot(hispanic.ach.factor ~ LatinoAmericanAncestry, data,
            smoother=FALSE,id.n=nrow(data))

##Imputed and aggregated data
#Hispanics
hispanic.ach.data = data[26:39] #subset hispanic ach data
hispanic.ach.data = hispanic.ach.data[miss.case(hispanic.ach.data)<ncol(hispanic.ach.data),] #remove empty cases
miss.table(hispanic.ach.data) #examine missing data
hispanic.ach.data = irmi(hispanic.ach.data, noise.factor = 0) #impute the rest
#factor analysis
fact.hispanic = fa(hispanic.ach.data) #get common ach factor
fact.scores = fact.hispanic$scores; colnames(fact.scores) = "hispanic.ach.factor"
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors = round(rcorr(as.matrix(data))$r,2) #recompute so the new factor is included
cors[7:9,"hispanic.ach.factor"] #results for general factor

#Blacks
black.ach.data = data[10:23] #subset black ach data
black.ach.data = black.ach.data[miss.case(black.ach.data)<ncol(black.ach.data),] #remove empty cases
black.ach.data = irmi(black.ach.data, noise.factor = 0) #impute the rest
#factor analysis
fact.black = fa(black.ach.data) #get common ach factor
fact.scores = fact.black$scores; colnames(fact.scores) = "black.ach.factor"
data = merge.datasets(data,fact.scores,1) #merge it back into data
cors = round(rcorr(as.matrix(data))$r,2) #recompute so the new factor is included
cors[1:3,"black.ach.factor"] #results for general factor

##Admixture totals
#Hispanic
Hispanic.admixture = subset(data, select=c("LatinoAfricanAncestry","LatinoAmericanAncestry","LatinoEuropeanAncestry"))
Hispanic.admixture = Hispanic.admixture[miss.case(Hispanic.admixture)==0,] #complete cases
Hispanic.admixture.sum = data.frame(apply(Hispanic.admixture, 1, sum))
colnames(Hispanic.admixture.sum)="Hispanic.admixture.sum" #fix name
describe(Hispanic.admixture.sum) #stats

#add data back to dataframe
LatinoRemainderAncestry = 1-Hispanic.admixture.sum #get remainder
colnames(LatinoRemainderAncestry) = "LatinoRemainderAncestry" #rename
data = merge.datasets(LatinoRemainderAncestry,data,2) #merge back

#plot it
LatinoRemainderAncestry = LatinoRemainderAncestry[order(LatinoRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(LatinoRemainderAncestry),cex=.7) #plot, with smaller text

#Black
Black.admixture = subset(data, select=c("BlackAfricanAncestry","BlackAmericanAncestry","BlackEuropeanAncestry"))
Black.admixture = Black.admixture[miss.case(Black.admixture)==0,] #complete cases
Black.admixture.sum = data.frame(apply(Black.admixture, 1, sum))
colnames(Black.admixture.sum)="Black.admixture.sum" #fix name
describe(Black.admixture.sum) #stats

#add data back to dataframe
BlackRemainderAncestry = 1-Black.admixture.sum #get remainder
colnames(BlackRemainderAncestry) = "BlackRemainderAncestry" #rename
data = merge.datasets(BlackRemainderAncestry,data,2) #merge back

#plot it
BlackRemainderAncestry = BlackRemainderAncestry[order(BlackRemainderAncestry,decreasing=FALSE),,drop=FALSE] #reorder
dotchart(as.matrix(BlackRemainderAncestry),cex=.7) #plot, with smaller text

#simple stats for both
rbind(describe(BlackRemainderAncestry),describe(LatinoRemainderAncestry))

#make subset with remainder data and achievement
remainders = subset(data, select=c("black.ach.factor","BlackRemainderAncestry",
                                   "hispanic.ach.factor","LatinoRemainderAncestry"))
View(rcorr(as.matrix(remainders))$r) #correlations?

#Partial correlations
partial.r(data, c(7:9,40), c(43))[4,] #partial out remainder for Hispanics
partial.r(data, c(1:3,41), c(42))[4,] #partial out remainder for Blacks

References

Bryc, K., Durand, E. Y., Macpherson, J. M., Reich, D., & Mountain, J. L. (2014). The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States. The American Journal of Human Genetics.

www.goodreads.com/book/show/16171221-the-sports-gene

gen.lib.rus.ec/book/index.php?md5=51b837f6d7627ac696f3f36b2ca6edbc

Incidentally, the Wiki page was very poor, so I had to rewrite that before writing this.

Generally, this was an interesting read that taught me a lot. This probably has to do with me not really caring much about sports. Some parts can be boring if you don’t care/know much about e.g. baseball. It is pretty US-centric in the topics chosen.

The science in the book comes mostly thru interviews with experts and some summarizing of studies. Rarely is sufficient detail given about the studies for one to make an informed decision about whether to trust it or not. Usually, no sample sizes, p-values, effect sizes etc. are mentioned. It was written as a popular science book to be fair, so this criticism is somewhat unfair.

Some quotes:

When scientists at Washington University in St. Louis tested him, Pujols, the greatest hitter of an era, was in the sixty-sixth percentile for simple reaction time compared with a random sample of college students.

College students are above average in g, which implies better-than-average reaction time. Presumably, they tested simple reaction time. This correlates about .2 with g. College students are perhaps at 115 on average. This university is apparently a top university. So perhaps the mean IQ is 120-125 there, meaning that these students are about 0.3 d above the mean on reaction time (0.2 times a g advantage of 1.33 to 1.67 SD), unless they were students in fysical ed., in which case they may be even higher. Being at the 66th centile is not bad then.
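The arithmetic behind this can be sketched in a few lines of R. All inputs are the rough guesses from the paragraph above (a .2 correlation and a student mean IQ of 120-125), and the last step additionally assumes normal distributions with equal SDs among students and in the general population, which is my assumption, not something the book reports.

```r
# Back-of-envelope check of the reaction time argument (all inputs are guesses).
r_rt_g   <- 0.2           # assumed correlation between simple reaction time and g
iq_range <- c(120, 125)   # assumed mean IQ of students at a top university

# Expected student advantage on reaction time in SD units (d):
# their g advantage in SDs, shrunk by the correlation.
d_students <- r_rt_g * (iq_range - 100) / 15
round(d_students, 2)

# Pujols was at the 66th centile among the students.
z_within <- qnorm(0.66)

# Rough general-population centile, assuming normality and equal SDs.
centile_population <- pnorm(z_within + d_students)
round(centile_population, 2)
```

Under these assumptions the 66th centile among the students corresponds to roughly the 75th to 77th centile in the general population.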

Jason Gulbin, the physiologist who worked on Australia’s Olympic skeleton experiment, says that the word “genetics” has become so taboo in his talent-identification field that “we actively changed our language here around genetic work that we’re doing from ‘genetics’ to ‘molecular biology and protein synthesis.’ It was, literally, ‘Don’t mention the g-word.’ Any research proposals we put in, we don’t mention the genetics if we can help it. It’s: ‘Oh, well, if you’re doing molecular biology and protein synthesis, well, that’s all right.’” Never mind that it’s the same thing.

Studying race? NAZI NAZI!!! Studying population genetics? No problem, carry on.

This story is fascinating. Perhaps the best example of how categorical thinking about gender leads to real life problems.

Several scientists I spoke with about the theory insisted that they would have no interest in investigating it because of the inevitably thorny issue of race involved. One of them told me that he actually has data on ethnic differences with respect to a particular physiological trait, but that he would never publish the data because of the potential controversy. Another told me he would worry about following Cooper and Morrison’s line of inquiry because any suggestion of a physical advantage among a group of people could be equated to a corresponding lack of intellect, as if athleticism and intelligence were on some kind of biological teeter-totter. With that stigma in mind, perhaps the most important writing Cooper did in Black Superman was his methodical evisceration of any supposed inverse link between physical and mental prowess. “The concept that physical superiority could somehow be a symptom of intellectual inferiority only developed when physical superiority became associated with African Americans,” Cooper wrote. “That association did not begin until about 1936.” The idea that athleticism was suddenly inversely proportional to intellect was never a cause of bigotry, but rather a result of it. And Cooper implied that more serious scientific inquiry into difficult issues, not less, is the appropriate path.

How very familiar. Better not hurt those feelings! At least they should publish the data anonymously in some way so others can examine them.

There is a university called Lehigh… Le High… geddit??

In 2010, Heather Huson, a geneticist then studying at the University of Alaska, Fairbanks—and a dogsled racer since age seven—tested dogs from eight different racing kennels. To Huson’s surprise, Alaskan sled dogs have been so thoroughly bred for specific traits that analysis of microsatellites—repeats of small sequences of DNA—proved Alaskan huskies to be an entirely genetically distinct breed, as unique as poodles or labs, rather than just a variation of Alaskan malamutes or Siberian huskies.
Huson and colleagues discovered genetic traces of twenty-one dog breeds, in addition to the unique Alaskan husky signature. The research team also established that the dogs had widely disparate work ethics (measured via the tension in their tug lines) and that sled dogs with better work ethics had more DNA from Anatolian shepherds—a muscular, often blond breed of dog originally prized as a guardian of sheep because it would eagerly do battle with wolves. That Anatolian shepherd genes uniquely contribute to the work ethic of sled dogs was a new finding, but the best mushers already knew that work ethic is specifically bred into dogs.
“Yeah, thirty-eight years ago in the Iditarod there were dogs that weren’t enthused about doing it, and that were forced to do it,” Mackey says. “I want to be out there and have the privilege of going along for the ride because they want to go, because they love what they do, not because I want to go across the state of Alaska for my satisfaction, but because they love doing it. And that’s what’s happened over forty years of breeding. We’ve made and designed dogs suited for desire.”

Admixture studies in dogs, a useful precedent to cite to ease the pain for newcomers.

In one tank are mice missing oxytocin receptors. They are used in the study of pain, but the mice also have deficits in social recognition. Put them with mice they grew up with and they won’t recognize them. In another corner is a tank of raven-haired mice that were bred to be prone to head pain, that is, migraines. They spend a lot of time scratching their foreheads and shuddering, and they are apparently justified in using the old headache excuse to avoid mating. “This experiment has taken years,” says Jeffrey Mogil, head of the lab, of the work that seeks to help develop migraine treatments, “because they breed really, really badly.”

How did they get ethics approval for this???

As Pitsiladis put it, to be a world-beater, “you absolutely must choose your parents correctly.” He was being facetious, of course, because we can’t choose our parents. Nor do humans tend to couple with conscious knowledge of one another’s gene variants. We pair up more in the manner of a roulette ball that bounces off a few pockets before settling into one of many suitable spots. Williams suggests, hypothetically, that if humanity is to produce an athlete with more “correct” sports genes, one approach is to weight the genetic roulette ball with more lineages in which parents and grandparents are outstanding athletes and thus probably harbor a large number of good athleticism genes. Yao Ming—at 7’5″, once the tallest active player in the NBA—was born from China’s tallest couple, a pair of ex–basketball players brought together by the Chinese basketball federation. As Brook Larmer writes in Operation Yao Ming: “Two generations of Yao Ming’s forebears had been singled out by authorities for their hulking physiques, and his mother and father were both drafted into the sports system against their will.” Still, the witting merger of athletes in pursuit of superstar progeny is rare.

Sure we do! Some do it quite consciously, e.g. using dating sites that match for overall likeness.

www.goodreads.com/book/show/875481.Race

gen.lib.rus.ec/book/index.php?md5=5624936a816b96dd3e6a4af6808ee69b

I had seen references to this book in a number of places which got me curious. I am somewhat hesitant to read older books since I know much of what they discuss is dated and has been superseded by newer science. Sometimes, however, science (or the science culture) has gone wrong so one may actually learn more reading an older book than a newer one. Since fewer people read older books, one can sometimes find relevant but forgotten facts in them. Lastly, they can provide much needed historical information about the development of thinking about some idea or of some field. All of these remarks are arguably relevant to the race/population genetics controversy.

Still, I did not read the book immediately altho I had a PDF of it. I ended up starting to read it more or less at random due to a short talk I had with John Fuerst about it (we are writing together on racial admixture, intelligence and socioeconomic outcomes in the Americas and also wrote a paper on immigrant performance in Denmark).

So, the book really is dated. It spends hundreds of pages on arcane fysical anthropology which requires one to master human anatomy. Most readers don’t master this discipline, so these parts of the book are virtually un-understandable. However, they do provide one with a distinct impression of how one did fysical anthropology in old times. Lots of observations of cranium, other bones, noses, eyes+lids, teeth, lips, buttocks, etc., and then trying to find clusters in these data manually. No wonder they did not reach very high agreement. The data are too scarce to find clusters and humans are not sufficiently good at cluster analysis at the intuitive level. Still, they did notice some patterns that are surely correct, such as the division between various African populations, Ainu vs. Japanese, that Europeans and Asians are more closely related, that Afghans etc. belong to the European supercluster etc. Clearly, these pre-genetic ideas were not all totally wrongheaded. Here’s the table of Races+Subraces from the end of the book. They seem reasonably in line with modern evidence.

table

Some quotes:

The story of 7 ‘kinds’ of mosquitoes.

[Dobzhansky’s definition = ‘Species in sexual cross-fertilizing organisms can be defined as groups of populations which are reproductively isolated to the extent that the exchange of genes between them is absent or so slow that the genetic differences are not diminished or swamped.’]

Strict application of Dobzhansky’s definition results in certain very similar animals being assigned to different species. The malarial mosquitoes and their relatives provide a remarkable example of this. The facts are not only extremely interesting from the purely scientific point of view, but also of great practical importance in the maintenance of public health in malarious districts. It was discovered in 1920 that one kind of the genus Anopheles, called elutus, could be distinguished from the well-known malarial mosquito, A. maculipennis, by certain minute differences in the adult, and by the fact that its eggs looked different; but for our detailed knowledge of this subject we are mainly indebted to one Falleroni, a retired inspector of public health in Italy, who began in 1924 to breed Anopheles mosquitoes as a hobby. He noticed that several different kinds of eggs could be distinguished, that the same female always laid eggs having the same appearance, and that adult females derived from those eggs produced eggs of the same type. He realized that although the adults all appeared similar, there were in fact several different kinds, which he could recognize by the markings on their eggs. Falleroni named several different kinds after his friends, and the names he gave are the accepted ones today in scientific nomenclature.

It was not until 1931 that the matter came to the attention of L. W. Hackett, who, with A. Missiroli, did more than anyone else to unravel the details of this curious story.[449, 447, 448] The facts are these. There are in Europe six different kinds of Anopheles that cannot be distinguished with certainty from one another in the adult state, however carefully they are examined under the microscope by experts; a seventh kind, elutus, can be distinguished by minor differences if its age is known. The larvae of two of the kinds can be distinguished from one another by minute differences (in the type of palmate hair on the second segment, taken in conjunction with the number of branches of hair no. 2 on the fourth and fifth segments). Other supposed differences between the kinds, apart from those in the eggs, have been shown to be unreal.

In nature the seven kinds are not known to interbreed, and it is therefore necessary, under Dobzhansky’s definition, to regard them all as separate species.

The males of six of the seven species have the habit of ‘swarming’ when ready to copulate. They join in groups of many individuals, humming, high in the air; suddenly the swarm bursts asunder and rejoins. The females recognize the swarms of males of their own species, and are attracted towards them. Each female dashes in, seizes a male, and flies off, copulating.

With the exceptions mentioned, the only visible differences between the species occur at the egg-stage. The eggs of six of the seven species are shown in Fig. 8 (p. 76).

6 anopheles

It will be noticed that each egg is roughly sausage-shaped, with an air-filled float at each side, which supports it in the water in which it is laid. The eggs of the different species are seen to differ in the length and position of the floats. The surface of the rest of the egg is covered all over with microscopic finger-shaped papillae, standing up like the pile of a carpet. It is these papillae that are responsible for the distinctive patterns seen on the eggs of the different species. Where the papillae are long and their tips rough, light is reflected to give a whitish appearance; where they are short and smooth, light passes through to reveal the underlying surface of the egg, which is black. The biological significance of these apparently trivial differences is unknown.

From the point of view of the ethnic problem the most interesting fact is this. Although the visible differences between the species are trivial and confined or almost confined to the egg-stage, it is evident that the nervous and sensory systems are different, for each species has its own habits. The males of one species (atroparvus) do not swarm. It has already been mentioned that the females recognize the males of their own species. Some of the species lay their eggs in fresh water, others in brackish. The females of some species suck the blood of cattle, and are harmless to man; those of other species suck the blood of man, and in injecting their saliva transmit malaria to him.

Examples could be quoted of other species that are distinguishable from one another by morphological differences no greater than those that separate the species of Anopheles; but the races of a single species—indeed, the subraces of a single race—are often distinguished from one another, in their typical forms, by obvious differences, affecting many parts of the body. It is not the case that species are necessarily very distinct, and races very similar. [p. 74ff]

Nature is very odd indeed! More on Wiki.

Some very strange examples of abnormalities of this sort have been recorded by reputable authorities. Buffon quotes two examples of an ‘amour violent’ between a dog and a sow. In one case the dog was a large spaniel on the property of the Comte de Feuillee, in Burgundy. Many persons witnessed ‘the mutual ardour of these two animals; the dog even made prodigious and oft-repeated efforts to copulate with the sow, but the unsuitability of their reproductive organs prevented their union.’ Another example, still more remarkable, occurred on Buffon’s own property. A miller kept a mare and a bull in the same stable. These two animals developed such a passion for one another that on all occasions when the mare was on heat, over a period of several years, the bull copulated with her three or four times a day, whenever he was free to do so. The act was witnessed by all the inhabitants of the place. [p. 92]

Of smelly Japanese:

There is, naturally enough, a correlation between the development of the axillary organ and the smelliness of the secretion of this gland (and probably this applies also to the a glands of the genito-anal region). Briefly, the Europids and Negrids are smelly, the Mongolids scarcely or not at all, so far as the axillary secretion is concerned. Adachi, who has devoted more study to this subject than anyone else, has summed up his findings in a single, short sentence: ‘The Mongolids are essentially an odourless or very slightly smelly race with dry ear-wax.’[5] Since most of the Japanese are free or almost free from axillary smell, they are very sensitive to its presence, of which they seem to have a horror. About 10% of Japanese have smelly axillae. This is attributed to remote Ainuid ancestry, since the Ainu are invariably smelly, like most other Europids, and a tendency to smelliness is known to be inherited among the Japanese.[151] The existence of the odour is regarded among Japanese as a disease, osmidrosis axillae which warrants (or used to warrant) exemption from military service. Certain doctors specialize in its treatment, and sufferers are accustomed to enter hospital. [p. 173]

Japan always takes these things to a new level.

Measurements of adult stature, made on several thousand pairs of persons, show a rather close correspondence with these figures, namely, 0.507, 0.322, 0.543, and 0.287 respectively.[172] It will be noticed that the correlations are all somewhat higher than one would expect; that is to say, the members of each pair are, on average, rather more nearly of the same height than the simple theory would suggest. This is attributed in the main to the tendency towards assortative mating, the reality of which had already been recognized by Karl Pearson and Miss Lee in their paper published in 1903. [p. 462]

I didn’t know assortative mating was recognized so far back. This may be a good source to understand the historical development of understanding of assortative mating.

The reference is: Pearson, K. & Lee, A., 1903. ‘On the laws of inheritance in man. I. Inheritance of physical characters.’ Biometrika, 2, 357–462.

Definition of intelligence?

What has been said on p. 496 may now be rewritten in the form of a short definition of intelligence, in the straightforward, everyday sense of that word. It is the ability to perceive, comprehend, and reason, combined with the capacity to choose worth-while subjects for study, eagerness to acquire, use, transmit, and (if possible) add to knowledge and understanding, and the faculty for sustained effort towards these ends (cf. p. 438). One might say briefly that a person is intelligent in so far as his cognitive ability and personality tend towards productiveness through mental activity. [p. 495ff]

Baker prefers a broader definition of “intelligence” which includes certain non-cognitive parts. He uses “cognitive ability” the way many people nowadays use “general cognitive ability”.

And now surely at the end of the book, the evil master-racist privileged white male John Baker tells us what to do with the information we just learned in the book:

Here, on reaching the end of the book, I must repeat some words that I wrote years ago when drafting the Introduction (p. 6), for there is nothing in the whole work that would tend to contradict or weaken them:
Every ethnic taxon of man includes many persons capable of living responsible and useful lives in the communities to which they belong, while even in those taxa that are best known for their contributions to the world’s store of intellectual wealth, there are many so mentally deficient that they would be inadequate members of any society. It follows that no one can claim superiority simply because he or she belongs to a particular ethnic taxon. [p. 534]

So, clearly according to our anti-racist heroes, Baker tells us to revel in our (sorry Jayman if you are reading!) European master ancestry, right?

edited: removed joke because public image -_-

www.goodreads.com/book/show/18526647-misbehaving-science

libgen.org/book/index.php?md5=ac86923b7bf1ed0639abf0e1c22810f8

The book is by a sociologist who tries to fit the history of behavior genetics into sociological theories. I didn’t pay much attention to the theorizing, being familiar with that kind of nonsense or useless theory. It generally employs the kind of terminology that sociologists are known for: reductionism here, genetic determinism there, racism, eugenics, Nazi, blahblah. It is somewhat dated despite just being released. This is the nature of legacy publishers, since it takes so long to get thru their machinery. It spends a lot of time talking about how the molecular (GWA) studies did not fulfill the dreams of behavior geneticists. This is however semi-moot now, since recent studies have replicated findings of g-genes and used GCTA to estimate heritability values that make extreme environmentalism impossible to hold onto.

It, however, did contain a lot of interesting quotes from unnamed persons, and various other stuff. It is recommended for those who have an interest in the history of behavior genetics and the race and IQ debate. I cannot give it 4 or 5 stars despite it being interesting due to the aforementioned problems.

For those who have been living under a rock (i.e. not following me on Twitter), John Fuerst has been very good at compiling data from published research. Have a look at Human Varieties with the tag Admixture Mapping. He asked me to help him analyze it and write it up. I gladly obliged, you can read the draft here. John thinks we should write it all into one huge paper instead of splitting it up as is standard practice. The standard practice is perhaps not entirely just for gaming the reputation system, but also because writing huge papers like that can seem overwhelming and may take a long time to get thru review.

So the project summarized so far is this:

  • Genetic models of trait admixture predict that mixed groups will be in-between the two source populations in the trait in proportion to their admixture.
  • For psychological traits such as general intelligence (g), this has previously primarily been studied unsystematically in African Americans, but this line of research seems to have dried up, perhaps because it became too politically sensitive over there.
  • However, there have been some studies using the same method, just examining illness-related traits (e.g. diabetes). These studies usually include socioeconomic variables as controls. In doing so, they have found robust correlations between admixture at the individual level and socioeconomic outcomes: income, occupation, education and the like.
  • John has found quite a lot of these and compiled the results into a table that can be found here.
  • The results clearly show the expected pattern, namely that more European ancestry is associated with more favorable outcomes, and more African or Amerindian ancestry with less favorable outcomes. A few of them are non-significant, but none contradicts. A meta-analysis of this would find a very small p value indeed.
  • One study actually included cognitive measures as co-variates and found results in the generally expected direction. See material under the headline “Cognitive differences in the Americans” in the draft file.
  • There is no necessity that one has to look at the individual level. One can look at the group level too. For this reason John has compiled data about the ancestry proportions of American countries and Mexican regions.
  • For the countries, he has tested this against self-identified proportions, CIA World Factbook estimates, skin reflection data and stuff like that, see: humanvarieties.org/2014/10/19/racial-ancestry-in-the-americas-part-1-genomic-continental-racial-admixture-estimate-and-validation/ The results are pretty solid. The estimates are clearly in the right ballpark.
  • Now, genetic models of the world distribution of general intelligence clearly predict that these estimates will be strongly related to the countries’ estimated mean levels of general intelligence. To test this John has carried out a number of multiple regressions with various controls such as parasite prevalence or cold weather along with European ancestry with the dependent variable being skin color and national achievement scores (PISA tests and the like). Results are in the expected directions even with controls.
  • Using the Mexican regional data, John has compared the Amerindian estimates with PISA scores, Raven’s scores, and Human Development Index (a proxy for S factor (see here and here)). Post is here: humanvarieties.org/2014/10/15/district-level-variation-in-continental-racial-admixture-predicts-outcomes-in-mexico/

This is where we are. Basically, the data is all there, ready to be analyzed. Someone needs to do the other part of the grunt work, namely running all the obvious tests and writing everything up for a big paper. This is where I come in.

The first thing I did was to create an OSF repository for the data and code since John had been manually keeping track of versions on HV. Not too good. I also converted his SPSS datafile to one that works on all platforms (CSV with semi-colons).

Then I started writing code in R. First I wanted to look at the more obvious relationships, such as that between IQ and ancestry estimates (ratios). Here I discovered that John had used a newer dataset of IQ estimates Meisenberg had sent him. However, it seems to have wrong data (Guatemala) and covers fewer relevant countries (25 vs. 35) than the standard dataset from Lynn and Vanhanen 2012 (+Malloyian fixes) that I have been using. So for this reason I merged John’s already enormous dataset (126 variables) with the latest Megadataset (365 variables), to create the cleverly named supermegadataset to be used for this study.

IQ x Ancestry zero-order correlations

Here are the three scatterplots:

Americas_Euro_Ancestry_IQ12data

IQ_amer

IQ_Afro

So the reader might wonder, what is wrong with the Amerindian data? Why is the correlation about nil? Simply inspecting the scatterplot reveals the problem. The countries with low Amerindian ancestry vary widely in their European vs. African proportions, which keeps their mean around 80-85 and thus produces no correlation.

Partial correlations

So my idea was this, as I wrote it in my email to John:

Hey John, I wrote my bachelor in 4 days (5 pages per day), so now I’m back to working on more interesting things. I use the LV12 data because it seems better and is larger.

One thing that had been annoying me was that correlations between ancestry and IQ do not take into account that there are three variables that vary, not just two. Remember that odd low correlation Amer x IQ r=.14 compared with Euro x IQ = .68 and Afr x IQ = -.66. The reason for this, it seems to me, is that the countries with low Amer% are a mix of high and low Afr countries. That’s why you get a flat scatterplot. See attached.

Unfortunately, one cannot just use multiple regression (MR) with these three variables, since the following equation is true of them: 1 = Euro+Afr+Amer. They are structurally dependent. Remember that MR attempts to hold the other variables constant while changing one. This is impossible.

The solution, it seems to me, is to use partial correlations. In this way, one can partial out one of them and look at the remaining two. There are six possible ways to do this:

Amer x IQ, partial out Afr = -.51
Amer x IQ, partial out Euro = .29
Euro x IQ, partial out Afr = .41
Euro x IQ, partial out Amer = .70
Afr x IQ, partial out Euro = -.37
Afr x IQ, partial out Amer = -.76

Assuming that genotypically, Amer=85, Afr=80, Euro=97 (or so), these results are completely as expected direction-wise:

In the first case, we remove Afr, so we are comparing Amer vs. Euro. We expect negative since Amer<Euro.
In two, we expect positive because Amer>Afr.
In three, we expect positive because Euro>Amer.
In four, we expect positive because Euro>Afr.
In five, we expect negative because Afr<Amer.
In six, we expect negative because Afr<Euro.

All six predictions were as expected. The sample size is quite small at N=34 and LV12 isn’t perfect, certainly not for these countries. The overall results are quite reasonable in my view.
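The logic of the email can be illustrated with a small base R sketch. The data are simulated (hypothetical ancestry fractions and an outcome built from the assumed values 97, 80 and 85), not the LV12 country data, and `partial_cor` is a helper written here for illustration using the standard first-order partial correlation formula. The analysis code in the post itself uses `partial.r` from the psych package.

```r
set.seed(1)
n <- 34

# Simulated ancestry fractions that sum to 1 (hypothetical, not the real data).
raw  <- matrix(rexp(n * 3), ncol = 3)
anc  <- raw / rowSums(raw)
euro <- anc[, 1]; afr <- anc[, 2]; amer <- anc[, 3]

# Simulated outcome built from assumed group means plus a little noise.
iq <- 97 * euro + 80 * afr + 85 * amer + rnorm(n, sd = 0.5)

# Multiple regression with all three: the last predictor is a linear combination
# of the intercept and the other two, so lm() cannot estimate it (NA coefficient).
fit <- lm(iq ~ euro + afr + amer)
coef(fit)

# First-order partial correlation of x and y, controlling for z.
partial_cor <- function(x, y, z) {
  rxy <- cor(x, y); rxz <- cor(x, z); ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# With Afr partialled out, Amer x IQ is in effect an Amer vs. Euro contrast.
partial_cor(amer, iq, afr)
# With Euro partialled out, it is an Amer vs. Afr contrast.
partial_cor(amer, iq, euro)
```

In this simulation the first partial correlation comes out negative and the second positive, matching the direction of the predictions listed in the email.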

Estimates of IQ directly from ancestry

But instead of merely looking at it via correlations or regressions, one can try to predict the IQs directly from the ancestry. Simply create a predicted IQ based on the proportions and these populations’ estimated IQs. I tried a number of variations, but they were all close to this: Euro*95+Amer*85+Afro*70. The reason to use Euro 95 and not, say, 100 is that 100 is the IQ of Northern Europeans, in particular the British (‘Greenwich Mean IQ’). The European genes found in the Americans are mostly from Spain and Portugal, which have estimated IQs of 96.6 and 94.4 (mean = 95.5). This creates a problem since the European ancestry of the US and Canada is not mostly from these somewhat lower IQ Europeans, but the error source is small (one can always just try excluding them).
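As a minimal sketch, the weighting scheme can be written as a small R function. The function name `predict_iq` and the three example countries are made up for illustration. Only the weights 95, 85 and 70 come from the text.

```r
# Predicted IQ as an ancestry-weighted mean of assumed source-population IQs.
predict_iq <- function(euro, amer, afro,
                       weights = c(euro = 95, amer = 85, afro = 70)) {
  stopifnot(all(abs(euro + amer + afro - 1) < 1e-6))  # proportions must sum to 1
  unname(euro * weights["euro"] + amer * weights["amer"] + afro * weights["afro"])
}

# Three hypothetical countries (made-up proportions, for illustration only).
predicted <- predict_iq(euro = c(0.80, 0.40, 0.10),
                        amer = c(0.15, 0.50, 0.10),
                        afro = c(0.05, 0.10, 0.80))
predicted
```

Other weights can be tried by passing a different `weights` vector, which is one way to run the "number of variations" mentioned above.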

So, do the predictions work? Yes.

Now, there is another kind of error with such estimates, called elevation. It refers to getting the intervals between countries right, but generally either over or underestimating them. This kind of error is undetectable in correlation analysis. But one can calculate it by taking the predicted IQs and subtracting the measured IQs, and then taking the mean of these values. Positive values mean that one is overestimating, negative means underestimation. The value for the above is: 1.9, so we’re overestimating a little bit, but it’s fairly close. A bit of this is due to USA and CAN, but then again, LCA (St. Lucia) and DMA (Dominica) are strong negative outliers, perhaps just wrong estimates by Lynn and Vanhanen (the only study for St. Lucia is this, but I don’t have the norms so I can’t calculate the IQ).
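A short R sketch of the elevation calculation, using made-up numbers rather than the actual country data, shows why correlation cannot detect this kind of error. The function name `elevation` is mine.

```r
# Elevation error: mean signed difference between predicted and measured values.
elevation <- function(predicted, measured) {
  mean(predicted - measured, na.rm = TRUE)
}

# Made-up numbers: the predictions get the spacing between countries exactly
# right, but every prediction is 5 points too high.
measured  <- c(80, 85, 90, 95)
predicted <- measured + 5

r    <- cor(predicted, measured)        # perfect correlation despite the offset
elev <- elevation(predicted, measured)  # positive value, so overestimation
c(correlation = r, elevation = elev)
```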

I told Davide Piffer about these results and he suggested that I use his PCA factor scores instead. Now, these are not themselves meaningful, but they have the intervals directly estimated from the genetics. His numbers are: Africa: -1.71; Native American: -0.9; Spanish: -0.3. Ok, let’s try:

PCA_predicted_IQs

Astonishingly, the correlation is almost the same, differing by only .01. However, this fact is less overwhelming than it seems at first, because it arises simply because the correlation between the three racial estimates is .999 (95.5

G.M. IQ & Economic growth

I noted down some comments while reading it.

In Table 1, the Dominican birth cohort is reversed.

“0.70 and 0.80 in world-wide country samples. Figure 1 gives an impression of this relationship.”

Figure 1 shows regional IQs, not GDP relationships.

“We still depend on these descriptive methods of quantitative genetics because only a small proportion of individual variation in general intelligence and school achievement can be explained by known genetic polymorphisms (e.g., Piffer, 2013a,b; Rietveld et al, 2013).”

We don’t. Modern BG studies can confirm A^2 estimates directly from the genes.

E.g.:

Davies, G., Tenesa, A., Payton, A., Yang, J., Harris, S. E., Liewald, D., … & Deary, I. J. (2011). Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular psychiatry, 16(10), 996-1005.

Marioni, R. E., Davies, G., Hayward, C., Liewald, D., Kerr, S. M., Campbell, A., … & Deary, I. J. (2014). Molecular genetic contributions to socioeconomic status and intelligence. Intelligence, 44, 26-32.

Results are fairly low tho, in the 20’s, presumably due to non-additive heritability and rarer genes.

“Even in modern societies, the heritability of intelligence tends to be higher for children from higher socioeconomic status (SES) families (Turkheimer et al, 2003; cf. Nagoshi and Johnson, 2005; van der Sluis et al, 2008). Where this is observed, most likely environmental conditions are of similar high quality for most high-SES children but are more variable for low-SES children.”

Or maybe not. There are also big studies that don’t find this interaction effect. en.wikipedia.org/wiki/Heritability_of_IQ#Heritability_and_socioeconomic_status

 

“Schooling has only a marginal effect on growth when intelligence is included, consistent with earlier results by Weede & Kämpf (2002) and Ram (2007).”

In the regression model of all countries, schooling has a larger beta than IQ does (.158 and .125). But these appear to be unstandardized values, so they are not readily comparable.
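To put the two slopes on a common scale, one would convert each unstandardized slope b into a standardized beta using beta = b * SD(x) / SD(y). Here is a minimal Python sketch of that conversion. Only the two slopes (.158 and .125) come from the paper as quoted above; the standard deviations are hypothetical placeholders, since the review does not give them, so the printed betas illustrate the arithmetic and are not results.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized slope b into a standardized beta.

    beta = b * SD(predictor) / SD(outcome)
    """
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y


# Unstandardized slopes from the all-countries regression model.
b_schooling = 0.158
b_iq = 0.125

# HYPOTHETICAL standard deviations, chosen only to illustrate the arithmetic.
# Replace them with the sample SDs of the actual variables.
sd_schooling = 2.5   # assumed SD of the schooling variable
sd_iq = 10.0         # assumed SD of national IQ
sd_growth = 2.0      # assumed SD of the growth outcome

beta_schooling = standardized_beta(b_schooling, sd_schooling, sd_growth)
beta_iq = standardized_beta(b_iq, sd_iq, sd_growth)

print(f"schooling: b = {b_schooling}, beta = {beta_schooling:.4f}")
print(f"IQ:        b = {b_iq}, beta = {beta_iq:.4f}")
```

With these made-up SDs the ordering of the two predictors flips after standardization, which is exactly why the raw slopes cannot be compared directly.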

“Also, earlier studies that took account of earnings and cognitive test scores of migrants in the host country or IQs in wealthy oil countries have concluded that there is a substantial causal effect of IQ on earnings and productivity (Christainsen, 2013; Jones & Schneider, 2010)”

 

National IQs were also found to predict migrant income, as well as most other socioeconomic traits, in Denmark and Norway (and Finland and the Netherlands).

Kirkegaard, E. O. W. (2014). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology.

Kirkegaard, E. O. W., & Fuerst, J. (2014). Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor among 71 immigrant groups in Denmark. Open Differential Psychology.

 

 

Figures 3 A-C are of too low quality.

 

 

“Allocation of capital resources has been an element of classical growth theory (Solow, 1956). Human capital theory emphasizes that individuals with higher intelligence tend to have lower impulsivity and lower time preference (Shamosh & Gray, 2008). This is predicted to lead to higher savings rates and greater resource allocation to investment relative to consumption in countries with higher average intelligence.”

 

Time preference data for 45 countries are given by:

Wang, M., Rieger, M. O., & Hens, T. (2011). How time preferences differ: evidence from 45 countries.

They are included in the megadataset from version 1.7f onward.

Correlations among some variables of interest:

r
             SlowTimePref Income.in.DK Income.in.NO   IQ lgGDP
SlowTimePref         1.00         0.45         0.48 0.57  0.64
Income.in.DK         0.45         1.00         0.89 0.55  0.59
Income.in.NO         0.48         0.89         1.00 0.65  0.66
IQ                   0.57         0.55         0.65 1.00  0.72
lgGDP                0.64         0.59         0.66 0.72  1.00

n
             SlowTimePref Income.in.DK Income.in.NO  IQ lgGDP
SlowTimePref          273           32           12  45    40
Income.in.DK           32          273           20  68    58
Income.in.NO           12           20          273  23    20
IQ                     45           68           23 273   169
lgGDP                  40           58           20 169   273

So time prefs predict income in DK and NO only slightly worse than national IQs or lgGDP.
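Some of these pairwise samples are small (n = 12 for SlowTimePref and Income.in.NO), so it is worth checking how wide the uncertainty is. Here is a minimal Python sketch that computes approximate 95% confidence intervals with the Fisher z-transformation. The r and n values are copied from the two tables above; the function name `fisher_ci` is my own, and the method assumes roughly bivariate normal data, which is not checked here.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson r via the Fisher z-transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # standard error on the z scale
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return lower, upper


# (r, n) pairs copied from the correlation and sample-size tables above.
pairs = {
    "SlowTimePref ~ Income.in.DK": (0.45, 32),
    "SlowTimePref ~ Income.in.NO": (0.48, 12),
    "IQ ~ Income.in.DK": (0.55, 68),
    "IQ ~ Income.in.NO": (0.65, 23),
}

for label, (r, n) in pairs.items():
    lower, upper = fisher_ci(r, n)
    print(f"{label}: r = {r:.2f}, n = {n}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

For the n = 12 case the interval is wide and includes zero, so the comparison with national IQs should be read as tentative.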

 

 

“Another possible mediator of intelligence effects that is difficult to measure at the country level is the willingness and ability to cooperate. A review by Jones (2008) shows that cooperativeness, measured in the Prisoner’s dilemma game, is positively related to intelligence. This correlate of intelligence may explain some of the relationship of intelligence with governance. Other likely mediators of the intelligence effect include less red tape and restrictions on economic activities (“economic freedom”), higher savings and/or investment, and technology adoption in developing countries.”

 

There are data for IQ and trust too. Presumably trust is closely related to willingness to cooperate.

Carl, N. (2014). Does intelligence explain the association between generalized trust and economic development? Intelligence, 47, 83–92. doi:10.1016/j.intell.2014.08.008

 

 

“There is no psychometric evidence for rising intelligence before that time because IQ tests were introduced only during the first decade of the 20th century, but literacy rates were rising steadily after the end of the Middle Age in all European countries for which we have evidence (Mitch, 1992; Stone, 1969), and the number of books printed per capita kept rising (Baten & van Zanden, 2008).”

 

There are also age heaping scores, which are a crude measure of numeracy. AH scores for 1800 to 1970 are in the megadataset. They have been going up for centuries too, just like literacy scores. See:

A’Hearn, B., Baten, J., & Crayen, D. (2009). Quantifying quantitative literacy: Age heaping and the history of human capital. The Journal of Economic History, 69(03), 783–808.
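For readers unfamiliar with age heaping: where numeracy is low, people tend to report ages rounded to multiples of five. Here is a minimal Python sketch of the two standard summary measures, the Whipple index and the ABCC index used in the A’Hearn, Baten and Crayen literature. I am assuming the conventional 23 to 62 age window; whether the megadataset's AH scores use exactly this scaling is not something I have checked. The age lists are toy data, not real census records.

```python
def whipple_index(ages, low=23, high=62):
    """Whipple index: 100 means no heaping, 500 means every age ends in 0 or 5."""
    in_range = [a for a in ages if low <= a <= high]
    if not in_range:
        raise ValueError("no ages inside the reference window")
    heaped = sum(1 for a in in_range if a % 5 == 0)
    # Under no heaping, one fifth of ages end in 0 or 5, hence the factor 5.
    return 100.0 * 5 * heaped / len(in_range)


def abcc_index(whipple):
    """ABCC index: rescales Whipple to 0..100, where 100 means no heaping."""
    if whipple < 100.0:
        return 100.0
    return (1.0 - (whipple - 100.0) / 400.0) * 100.0


# Toy data: one person at every age (no heaping) versus fully rounded ages.
uniform_ages = list(range(23, 63))
fully_heaped_ages = [30, 40, 50] * 10

for label, ages in [("uniform", uniform_ages), ("fully heaped", fully_heaped_ages)]:
    w = whipple_index(ages)
    print(f"{label}: Whipple = {w:.1f}, ABCC = {abcc_index(w):.1f}")
```

A rising ABCC over birth cohorts is what "going up for centuries" means in this context.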

 

 

“Why did this spiral of economic and cognitive growth take off in Europe rather than somewhere else, and why did it not happen earlier, for example in classical Athens or the Roman Empire? One part of the answer is that this process can start only when technologies are already in place to translate rising economic output into rising intelligence. The minimal requirements are a writing system that is simple enough to be learned by everyone without undue effort, and a means to produce and disseminate written materials: paper, and the printing press. The first requirement had been present in Europe and the Middle East (but not China) since antiquity, and the second was in place in Europe from the 15th century. The Arabs had learned both paper-making and printing from the Chinese in the 13th century (Carter, 1955), but showed little interest in books. Their civilization was entering into terminal decline at about that time (Huff, 1993).”

 

Are there no Flynn effects in China? They still have a difficult writing system.

 

“Most important is that Flynn effect gains have been decelerating in recent years. Recent losses (anti-Flynn effects) were noted in Britain, Denmark, Norway and Finland. Results for the Scandinavian countries are based on comprehensive IQ testing of military conscripts aged 18-19. Evidence for losses among British teenagers is derived from the Raven test (Flynn, 2009) and Piagetian tests (Shayer & Ginsburg, 2009). These observations suggest that for cohorts born after about 1980, the Flynn effect is ending or has ended in many and perhaps most of the economically most advanced countries. Messages from the United States are mixed, with some studies reporting continuing gains (Flynn, 2012) and others no change (Beaujean & Osterlind, 2008).”

 

These results are confounded with immigration of low-g migrants, however. Maybe the Flynn effect is still there, just being masked by dysgenics and low-g immigration.

 

 

“The unsustainability of this situation is obvious. Estimating that one third of the present IQ differences between countries can be attributed to genetics, and adding this to the consequences of dysgenic fertility within countries, leaves us with a genetic decline of between 1 and 2 IQ points per generation for the entire world population. This decline is still more than offset by Flynn effects in less developed countries, and the average IQ of the world’s population is still rising. This phase of history will end when today’s developing countries reach the end of the Flynn effect. “Peak IQ” can reasonably be expected in cohorts born around the mid-21st century. The assumptions of the peak IQ prediction are that (1) Flynn effects are limited by genetic endowments, (2) some countries are approaching their genetic limits already, and others will follow, and (3) today’s patterns of differential fertility favoring the less intelligent will persist into the foreseeable future.”

 

It is possible that embryo selection for higher g will kick in and change this.

Shulman, C., & Bostrom, N. (2014). Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer? Global Policy, 5(1), 85–92. doi:10.1111/1758-5899.12123

 

 

“Fertility differentials between countries lead to replacement migration: the movement of people from high-fertility countries to low-fertility countries, with gradual replacement of the native populations in the low-fertility countries (Coleman, 2002). The economic consequences depend on the quality of the migrants and their descendants. Educational, cognitive and economic outcomes of migrants are influenced heavily by prevailing educational, cognitive and economic levels in the country of origin (Carabaña, 2011; Kirkegaard, 2013; Levels & Dronkers, 2008), and by the selectivity of migration. Brain drain from poor to prosperous countries is extensive already, for example among scientists (Franzoni, Scellato & Stephan, 2012; Hunter, Oswald & Charlton, 2009).”

 

There are quite a few more papers on the spatial transferability hypothesis. I have 5 papers on this alone in ODP: openpsych.net/ODP/tag/country-of-origin/

There are also as yet unpublished data for crime in the Netherlands and more crime data for Norway. Papers based on these data are on their way.

 

www.goodreads.com/book/show/1737823.Understanding_Human_History

gen.lib.rus.ec/search.php?req=Understanding+Human+History&open=0&view=simple&column=def

I think Elijah mentioned this book somewhere. I can’t find where.

The basic idea of the book is to write a history book that does take known population differences into account. Normal history books don’t do that. Generally, the chapters are only very broad sketches of some period or pattern. Much of it is plausible but not too well-argued. If one looks in the references for sources given, one can see that a large number of them are to some 1985 edition of Encyclopedia Britannica. Very odd. This is a post-Wikipedia age, folks. Finding primary literature on some topic is really easy. Just search Wikipedia, read its sources. The book is certainly flawed due to the inadequate referencing of claims. Many claims that need references don’t have any either.

On the positive side, there are some interesting ideas in it. The simulations of population IQs in different regions are clearly a necessary beginning of a hard task.

Probably you should only read this book if you are interested in history, population genetics and differential psychology beyond a pop science superficial level.

The author is an interesting fellow. en.wikipedia.org/wiki/Michael_H._Hart

 

Seriously. Read it.

Behavior Genetics (Impact Factor: 2.61). 03/2014; DOI: 10.1007/s10519-014-9646-x


ABSTRACT I argue that the g factor meets the fundamental criteria of a scientific construct more fully than any other conception of intelligence. I briefly discuss the evidence regarding the relationship of brain size to intelligence. A review of a large body of evidence demonstrates that there is a g factor in a wide range of species and that, in the species studied, it relates to brain size and is heritable. These findings suggest that many species have evolved a general-purpose mechanism (a general biological intelligence) for dealing with the environments in which they evolved. In spite of numerous studies with considerable statistical power, we know of very few genes that influence g and the effects are very small. Nevertheless, g appears to be highly polygenic. Given the complexity of the human brain, it is not surprising that one of its primary faculties, intelligence, is best explained by the near infinitesimal model of quantitative genetics.

Genes, Evolution and Intelligence

We have a Neanderthal genome.

It is possible to estimate an individual’s Neanderthal ancestry. 23andMe does this.

It is possible to use the admixture study design to see what the effects of a given kind of ancestry are.

What are we waiting for? They can use the SNP datasets they have already used in GWA studies of psychological traits.

Girlfriend [12th may 2014]: I bet theres an autism/neanderthal link

Any takers?