Admixture in the Americas: Introduction, partial correlations and IQ predictions based on ancestry

For those who have been living under a rock (i.e. not following me on Twitter), John Fuerst has been very good at compiling data from published research. Have a look at Human Varieties with the tag Admixture Mapping. He asked me to help him analyze it and write it up. I gladly obliged; you can read the draft here. John thinks we should write it all into one huge paper instead of splitting it up as is standard practice. The standard practice is perhaps not entirely just for gaming the reputation system, but also because writing huge papers like that can seem overwhelming and may take a long time to get thru review.

So the project summarized so far is this:

  • Genetic models of trait admixture predict that mixed groups will be in-between the two source populations in the trait in proportion to their admixture.
  • For psychological traits such as general intelligence (g), this has previously primarily been studied unsystematically in African Americans, but this line of research seems to have dried up, perhaps because it became too politically sensitive over there.
  • However, there have been some studies using the same method, just examining illness-related traits (e.g. diabetes). These studies usually include socioeconomic variables as controls. In doing so, they have found robust correlations between admixture at the individual level and socioeconomic outcomes: income, occupation, education and the like.
  • John has found quite a lot of these and compiled the results into a table that can be found here.
  • The results clearly show the expected pattern, namely that more European ancestry is associated with more favorable outcomes, and more African or Amerindian ancestry with less favorable outcomes. A few of them are non-significant, but none contradicts. A meta-analysis of this would find a very small p value indeed.
  • One study actually included cognitive measures as co-variates and found results in the generally expected direction. See material under the headline “Cognitive differences in the Americans” in the draft file.
  • There is no necessity that one has to look at the individual level. One can look at the group level too. For this reason John has compiled data about the ancestry proportions of American countries and Mexican regions.
  • For the countries, he has tested this against self-identified proportions, CIA World Factbook estimates, skin reflection data and stuff like that, see: The results are pretty solid. The estimates are clearly in the right ballpark.
  • Now, genetic models of the world distribution of general intelligence clearly predict that these estimates will be strongly related to the countries’ estimated mean levels of general intelligence. To test this John has carried out a number of multiple regressions with various controls such as parasite prevalence or cold weather along with European ancestry with the dependent variable being skin color and national achievement scores (PISA tests and the like). Results are in the expected directions even with controls.
  • Using the Mexican regional data, John has compared the Amerindian estimates with PISA scores, Raven’s scores, and Human Development Index (a proxy for S factor (see here and here)). Post is here:

This is where we are. Basically, the data is all there, ready to be analyzed. Someone needs to do the other part of the grunt work, namely running all the obvious tests and writing everything up for a big paper. This is where I come in.

The first thing I did was to create an OSF repository for the data and code since John had been manually keeping track of versions on HV. Not too good. I also converted his SPSS datafile to one that works on all platforms (CSV with semi-colons).

Then I started writing code in R. First I wanted to look at the more obvious relationships, such as that between IQ and ancestry estimates (ratios). Here I discovered that John had used a newer dataset of IQ estimates Meisenberg had sent him. However, it seems to have wrong data (Guatemala) and covers fewer relevant countries (25 vs. 35) than the standard dataset from Lynn and Vanhanen 2012 (+Malloyian fixes) that I have been using. So for this reason I merged John’s already enormous dataset (126 variables) with the latest Megadataset (365 variables), to create the cleverly named supermegadataset to be used for this study.

IQ x Ancestry zero-order correlations

Here’s the three scatterplots:




So the reader might wonder, what is wrong with the Amerindian data? Why is the correlation about nil? Simply inspecting the plot reveals the problem. The countries with low Amerindian ancestry have very mixed European vs. African ancestry, which keeps the mean around 80-85 and thus creates no correlation.

Partial correlations

So my idea was this, as I wrote it in my email to John:

Hey John, I wrote my bachelor in 4 days (5 pages per day), so now I’m back to working on more interesting things. I use the LV12 data because it seems better and is larger.

One thing that had been annoying me was that correlations between ancestry and IQ do not take into account that there are three variables that vary, not just two. Remember that odd low correlation Amer x IQ r=.14 compared with Euro x IQ = .68 and Afr x IQ = -.66. The reason for this, it seems to me, is that the countries with low Amer% are a mix of high and low Afr countries. That’s why you get a flat scatterplot. See attached.

Unfortunately, one cannot just use MR with these three variables, since the following equation is true of them 1 = Euro+Afr+Amer. They are structurally dependent. Remember that MR attempts to hold the other variables constant while changing one. This is impossible.

The solution, it seems to me, is to use partial correlations. In this way, one can partial out one of them and look at the remaining two. There are six possible ways to do this:

1. Amer x IQ, partial out Afr = -.51
2. Amer x IQ, partial out Euro = .29
3. Euro x IQ, partial out Afr = .41
4. Euro x IQ, partial out Amer = .70
5. Afr x IQ, partial out Euro = -.37
6. Afr x IQ, partial out Amer = -.76

Assuming that genotypically, Amer=85, Afr=80, Euro=97 (or so), then these results are completely as expected direction-wise:

1. In the first case, we remove Afr, so we are comparing Amer vs. Euro. We expect negative since Amer<Euro.
2. In two, we expect positive because Amer>Afr.
3. In three, we expect positive because Euro>Amer.
4. In four, we expect positive because Euro>Afr.
5. In five, we expect negative because Afr<Amer.
6. In six, we expect negative because Afr<Euro.

All six predictions were as expected. The sample size is quite small at N=34 and LV12 isn’t perfect, certainly not for these countries. The overall results are quite reasonable in my view.
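
To make the structural dependence and the partial-correlation workaround concrete, here is a minimal sketch in base R. It uses simulated ancestry proportions and a simulated outcome, not the study's data; the group values (97, 80, 85) are the assumed ones from the email, and `partial_cor` is a helper name made up for this illustration.

```r
# Hypothetical illustration with simulated data; not the study's dataset.
set.seed(1)
n <- 34

# Draw three positive weights per country and normalise so each row sums to 1
raw <- matrix(rexp(n * 3), ncol = 3)
anc <- as.data.frame(raw / rowSums(raw))
names(anc) <- c("Euro", "Afr", "Amer")

# Simulated outcome built from the assumed group values plus a little noise
anc$IQ <- 97 * anc$Euro + 80 * anc$Afr + 85 * anc$Amer + rnorm(n, sd = 1)

# Because Euro + Afr + Amer = 1, the three predictors are collinear with the
# intercept; lm() cannot estimate all three slopes and reports NA for one
fit <- lm(IQ ~ Euro + Afr + Amer, data = anc)
coef(fit)

# Partial correlation of x and y controlling for z, via regression residuals
partial_cor <- function(x, y, z) {
  cor(resid(lm(x ~ z)), resid(lm(y ~ z)))
}

p_amer_afr  <- partial_cor(anc$Amer, anc$IQ, anc$Afr)   # Amer vs. Euro contrast
p_amer_euro <- partial_cor(anc$Amer, anc$IQ, anc$Euro)  # Amer vs. Afr contrast
```

Holding one proportion fixed turns the remaining two into a direct trade-off, which is why each partial correlation can be read as a contrast between two source populations.
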

Estimates of IQ directly from ancestry

But instead of merely looking at it via correlations or regressions, one can try to predict the IQs directly from the ancestry. Simply create a predicted IQ based on the proportions and these populations’ estimated IQs. I tried a number of variations, but they were all close to this: Euro*95+Amer*85+Afro*70. The reason to use Euro 95 and not, say, 100 is that 100 is the IQ of Northern Europeans, in particular the British (‘Greenwich Mean IQ’). The European genes found in the Americas are mostly from Spain and Portugal, which have estimated IQs of 96.6 and 94.4 (mean = 95.5). This creates a problem since the populations of the US and Canada are not mostly descended from these somewhat lower-IQ Europeans, but the error source is small (one can always just try excluding them).
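
As a minimal sketch of that calculation: the weights 95/85/70 are the ones from the formula above, while the three countries and their ancestry proportions are invented purely for illustration.

```r
# Source-population values taken from the formula in the text
pop_iq <- c(Euro = 95, Amer = 85, Afro = 70)

# Hypothetical ancestry proportions (each row sums to 1); not real estimates
ancestry <- rbind(
  CountryA = c(Euro = 0.80, Amer = 0.15, Afro = 0.05),
  CountryB = c(Euro = 0.30, Amer = 0.60, Afro = 0.10),
  CountryC = c(Euro = 0.10, Amer = 0.00, Afro = 0.90)
)

# Predicted IQ is the ancestry-weighted sum of the source values
predicted_iq <- as.vector(ancestry %*% pop_iq)
names(predicted_iq) <- rownames(ancestry)
predicted_iq
```

Swapping in other source values (say Euro = 100) only requires changing `pop_iq`, which makes it easy to try the variations mentioned above.
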

So, do the predictions work? Yes.

Now, there is another kind of error with such estimates, called elevation. It refers to getting the intervals between countries right, but generally either over or underestimating them. This kind of error is undetectable in correlation analysis. But one can calculate it by taking the predicted IQs and subtracting the measured IQs, and then taking the mean of these values. Positive values mean that one is overestimating, negative means underestimation. The value for the above is: 1.9, so we’re overestimating a little bit, but it’s fairly close. A bit of this is due to USA and CAN, but then again, LCA (St. Lucia) and DMA (Dominica) are strong negative outliers, perhaps just wrong estimates by Lynn and Vanhanen (the only study for St. Lucia is this, but I don’t have the norms so I can’t calculate the IQ).
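
A minimal sketch of the elevation calculation, with made-up numbers: the predicted values track the measured ones perfectly (correlation of 1) and yet are uniformly 2 points too high, which only the elevation picks up. The function name `elevation` is mine.

```r
# Mean signed error: positive = overestimation, negative = underestimation
elevation <- function(predicted, measured) {
  mean(predicted - measured, na.rm = TRUE)
}

# Hypothetical values, not taken from the dataset
predicted <- c(90, 85, 80)
measured  <- c(88, 83, 78)

r    <- cor(predicted, measured)        # blind to the constant offset
elev <- elevation(predicted, measured)  # picks up the offset
```
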

I told Davide Piffer about these results and he suggested that I use his PCA factor scores instead. Now, these are not themselves meaningful, but they have the intervals directly estimated from the genetics. His numbers are: Africa: -1.71; Native American: -0.9; Spanish: -0.3. Ok, let’s try:


Astonishingly, the correlation is almost the same: it differs by only .01. However, this fact is less overwhelming than it seems at first, because it arises simply because the correlation between the three racial estimates is .999 (95.5

New paper out: The personal Jensen coefficient does not predict grades beyond its association with g

Found null results for a proposed metric (actually two). In the spirit of publishing failed ideas, I wrote this up.


General intelligence (g) is known to predict grades at all educational levels. A Jensen coefficient is the correlation of subtests’ g-loadings with a vector of interest. I hypothesized that the personal Jensen coefficient from the subjects’ subtest scores might predict grade point average beyond g. I used an open dataset to test this. The results showed that it does not seem to have predictive power beyond g (partial correlation = -.02). I found the same result when using a similar metric suggested by Davide Piffer.
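
For readers unfamiliar with the metric, here is a minimal sketch of how a personal Jensen coefficient could be computed, assuming one already has subtest g-loadings and standardized subtest scores per person. The loadings and the two score profiles are invented for illustration and are not from the open dataset used in the paper.

```r
# Hypothetical g-loadings for five subtests
g_loadings <- c(0.8, 0.7, 0.6, 0.5, 0.4)

# Hypothetical standardized subtest scores for two persons
scores <- rbind(
  p1 = c(1.2, 0.9, 0.5, 0.1, -0.2),   # does best on the most g-loaded subtests
  p2 = c(-0.5, -0.1, 0.2, 0.6, 0.9)   # does best on the least g-loaded subtests
)

# Personal Jensen coefficient: correlation between a person's subtest
# scores and the subtests' g-loadings
personal_jensen <- apply(scores, 1, function(s) cor(s, g_loadings))
personal_jensen
```
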

Meisenberg’s new book chapter on intelligence, economics and other stuff

G.M. IQ & Economic growth

I noted down some comments while reading it.

In Table 1, Dominican birth cohort is reversed.


“0.70 and 0.80 in world-wide country samples. Figure 1 gives an impression of this relationship.”


Figure 1 shows regional IQs, not GDP relationships.

“We still depend on these descriptive methods of quantitative genetics because only a small proportion of individual variation in general intelligence and school achievement can be explained by known genetic polymorphisms (e.g., Piffer, 2013a,b; Rietveld et al, 2013).”


We don’t. Modern BG studies can confirm A^2 estimates directly from the genes.


Davies, G., Tenesa, A., Payton, A., Yang, J., Harris, S. E., Liewald, D., … & Deary, I. J. (2011). Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular psychiatry, 16(10), 996-1005.

Marioni, R. E., Davies, G., Hayward, C., Liewald, D., Kerr, S. M., Campbell, A., … & Deary, I. J. (2014). Molecular genetic contributions to socioeconomic status and intelligence. Intelligence, 44, 26-32.

Results are fairly low tho, in the 20’s, presumably due to non-additive heritability and rarer genes.


“Even in modern societies, the heritability of intelligence tends to be higher for children from higher socioeconomic status (SES) families (Turkheimer et al, 2003; cf. Nagoshi and Johnson, 2005; van der Sluis et al, 2008). Where this is observed, most likely environmental conditions are of similar high quality for most high-SES children but are more variable for low-SES children.”


Or maybe not. There are also big studies that don’t find this interaction effect.


“Schooling has only a marginal effect on growth when intelligence is included, consistent with earlier results by Weede & Kämpf (2002) and Ram (2007).”

In the regression model of all countries, schooling has a larger beta than IQ does (.158 and .125). But these appear to be unstandardized values, so they are not readily comparable.
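
To illustrate why unstandardized coefficients cannot be compared directly, here is a minimal sketch with simulated data. All variable names and numbers are made up and have nothing to do with the chapter's actual model; the point is only that a standardized beta is the raw slope rescaled by sd(x)/sd(y).

```r
# Simulated data; hypothetical units (years of schooling, IQ points)
set.seed(7)
n <- 100
schooling <- rnorm(n, mean = 8, sd = 3)
iq        <- rnorm(n, mean = 90, sd = 10) + schooling
growth    <- 0.15 * schooling + 0.12 * iq + rnorm(n)
d <- data.frame(growth, schooling, iq)

# Unstandardized slopes depend on the units of each predictor
fit_raw <- lm(growth ~ schooling + iq, data = d)
b <- coef(fit_raw)[-1]

# Standardized betas: rescale each slope by sd(x) / sd(y)
beta_formula <- b * c(sd(d$schooling), sd(d$iq)) / sd(d$growth)

# Same result from regressing z-scored variables on each other
z <- as.data.frame(scale(d))
beta_scaled <- coef(lm(growth ~ schooling + iq, data = z))[-1]
```

In this simulated case the raw slopes are of similar size, but the standardized ones differ a lot because IQ has a much larger standard deviation than schooling.
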

“Also, earlier studies that took account of earnings and cognitive test scores of migrants in the host country or IQs in wealthy oil countries have concluded that there is a substantial causal effect of IQ on earnings and productivity (Christainsen, 2013; Jones & Schneider,



National IQs were also found to predict migrant income, as well as most other socioeconomic traits, in Denmark and Norway (and Finland and the Netherlands).

Kirkegaard, E. O. W. (2014). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology.

Kirkegaard, E. O. W., & Fuerst, J. (2014). Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor among 71 immigrant groups in Denmark. Open Differential Psychology.



Figures 3 A-C are of too low quality.



“Allocation of capital resources has been an element of classical growth theory (Solow, 1956). Human capital theory emphasizes that individuals with higher intelligence tend to have lower impulsivity and lower time preference (Shamosh & Gray, 2008). This is predicted to lead to higher savings rates and greater resource allocation to investment relative to consumption in countries with higher average



Time preference data for 45 countries are given by:

Wang, M., Rieger, M. O., & Hens, T. (2011). How time preferences differ: evidence from 45 countries.

They are in the megadataset from version 1.7f

Correlations among some variables of interest:

             SlowTimePref (unnamed 1) (unnamed 2)   IQ lgGDP
SlowTimePref         1.00        0.45        0.48 0.57  0.64
(unnamed 1)          0.45        1.00        0.89 0.55  0.59
(unnamed 2)          0.48        0.89        1.00 0.65  0.66
IQ                   0.57        0.55        0.65 1.00  0.72
lgGDP                0.64        0.59        0.66 0.72  1.00

             SlowTimePref (unnamed 1) (unnamed 2)   IQ lgGDP
SlowTimePref          273          32          12   45    40
(unnamed 1)            32         273          20   68    58
(unnamed 2)            12          20         273   23    20
IQ                     45          68          23  273   169
lgGDP                  40          58          20  169   273

So time prefs predict income in DK and NO only slightly worse than national IQs or lgGDP.



“Another possible mediator of intelligence effects that is difficult to measure at the country level is the willingness and ability to cooperate. A review by Jones (2008) shows that cooperativeness, measured in the Prisoner’s dilemma game, is positively related to intelligence. This correlate of intelligence may explain some of the relationship of intelligence with governance. Other likely mediators of the intelligence effect include less red tape and restrictions on economic activities (“economic freedom”), higher savings and/or investment, and technology adoption in developing countries.”


There are data for IQ and trust too. Presumably trust is closely related to willingness to cooperate.

Carl, N. (2014). Does intelligence explain the association between generalized trust and economic development? Intelligence, 47, 83–92. doi:10.1016/j.intell.2014.08.008



“There is no psychometric evidence for rising intelligence before that time because IQ tests were introduced only during the first decade of the 20th century, but literacy rates were rising steadily after the end of the Middle Age in all European countries for which we have evidence (Mitch, 1992; Stone, 1969), and the number of books printed per capita kept rising (Baten & van Zanden, 2008).”


There’s also age heaping scores which are a crude measure of numeracy. AH scores for 1800 to 1970 are in the megadataset. They have been going up for centuries too just like literacy scores. See:

A’Hearn, B., Baten, J., & Crayen, D. (2009). Quantifying quantitative literacy: Age heaping and the history of human capital. The Journal of Economic History, 69(03), 783–808.



“Why did this spiral of economic and cognitive growth take off in Europe rather than somewhere else, and why did it not happen earlier, for example in classical Athens or the Roman Empire? One part of the answer is that this process can start only when technologies are already in place to translate rising economic output into rising intelligence. The minimal requirements are a writing system that is simple enough to be learned by everyone without undue effort, and a means to produce and disseminate written materials: paper, and the printing press. The first requirement had been present in Europe and the Middle East (but not China) since antiquity, and the second was in place in Europe from the 15th century. The Arabs had learned both paper-making and printing from the Chinese in the 13th century (Carter, 1955), but showed little interest in books. Their civilization was entering into terminal decline at about that time (Huff, 1993).”


Are there no Flynn effects in China? They still have a difficult writing system.


“Most important is that Flynn effect gains have been decelerating in recent years. Recent losses (anti-Flynn effects) were noted in Britain, Denmark, Norway and Finland. Results for the Scandinavian countries are based on comprehensive IQ testing of military conscripts aged 18-19. Evidence for losses among British teenagers is derived from the Raven test (Flynn, 2009) and Piagetian tests (Shayer & Ginsburg, 2009). These observations suggest that for cohorts born after about 1980, the Flynn effect is ending or has ended in many and perhaps most of the economically most advanced countries. Messages from the United States are mixed, with some studies reporting continuing gains (Flynn, 2012) and others no change (Beaujean & Osterlind,



These are confounded with immigration of low-g migrants however. Maybe the Flynn effect is still there, just being masked by dysgenics + low-g immigration.



“The unsustainability of this situation is obvious. Estimating that one third of the present IQ differences between countries can be attributed to genetics, and adding this to the consequences of dysgenic fertility within countries, leaves us with a genetic decline of between 1 and 2 IQ points per generation for the entire world population. This decline is still more than offset by Flynn effects in less developed countries, and the average IQ of the world’s population is still rising. This phase of history will end when today’s developing countries reach the end of the Flynn effect. “Peak IQ” can reasonably be expected in cohorts born around the mid-21st century. The assumptions of the peak IQ prediction are that (1) Flynn effects are limited by genetic endowments, (2) some countries are approaching their genetic limits already, and others will follow, and (3) today’s patterns of differential fertility favoring the less intelligent will persist into the foreseeable future.”


It is possible that embryo selection for higher g will kick in and change this.

Shulman, C., & Bostrom, N. (2014). Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer? Global Policy, 5(1), 85–92. doi:10.1111/1758-5899.12123



“Fertility differentials between countries lead to replacement migration: the movement of people from high-fertility countries to low-fertility countries, with gradual replacement of the native populations in the low-fertility countries (Coleman, 2002). The economic consequences depend on the quality of the migrants and their descendants. Educational, cognitive and economic outcomes of migrants are influenced heavily by prevailing educational, cognitive and economic levels in the country of origin (Carabaña, 2011; Kirkegaard, 2013; Levels & Dronkers, 2008), and by the selectivity of migration. Brain drain from poor to prosperous countries is extensive already, for example among scientists (Franzoni, Scellato & Stephan, 2012; Hunter, Oswald & Charlton, 2009).”


There are quite a few more papers on the spatial transferability hypothesis. I have 5 papers on this alone in ODP:

But there’s also as yet unpublished data for crime in the Netherlands and more crime data for Norway. Papers based off these data are on their way.


Does conscientiousness predict PISA scores at the national level? A cautious meta-analysis

Just a quick write-up before I write up a paper with this for ODP.


Altho general cognitive ability (g) has received the most attention by differential psychologists, personality receives a fair share nowadays. And just as g has been shown to have great predictive power in large meta-analyses in a variety of contexts (e.g. Gottfredson 1997 is still the best summary IMO), so has the personality trait of conscientiousness (C) (e.g. “The Validity of Conscientiousness for Predicting Job Performance: A meta-analytic test of two hypotheses”; “A Meta-Analytic Investigation of Conscientiousness in the Prediction of Job Performance: Examining the Intercorrelations and the Incremental Validity of Narrow Traits”; “The Case for Conscientiousness: Evidence and Implications for a Personality Trait Marker of Health and Longevity”).

The ‘new’ thing in differential psych is to study national g estimates and how they correlate. This is the field ive been working mostly in with the spatial transferability hypothesis. The question then is, does C have predictive ability at the national level too? Well, maybe. There are some national estimates of the big five/OCEAN traits in Schmitt et al 2007. I added them to the Megadataset.


Partial correlations

The PISA x measured IQ (not the ones where scholastic ability has been factored in!) correlations were also of interest since no one apparently had calculated the mean PISA x measured IQ correlation. Well, it is .92. So, does C explain some of the remaining variance? One idea is to calculate the partial correlations of C and PISA with mIQ partialed out. However, this method seems to be wrong since some of the correlations are above 1! Ive never seen partial correlations above 1 before.

Math00Mean 1.4828725419
Read00Mean 1.1065080555
Sci00Mean 1.0012991174
Math03Mean 1.0742429148
Read03Mean 1.1147063889
Sci03Mean 1.2609157051
Sci06Mean 0.9137135525
Read06Mean 0.6593605051
Math06Mean 0.3923821506
Read09Mean 0.8607255528
Math09Mean 0.6409903363
Sci09Mean 0.843892485
Finance12Mean 0.3834897092
Math12Mean 0.3682415819
Read12Mean 0.5272534233
Sci12Mean 0.5563931581
CPS12Mean 0.1497008328
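
For reference, a first-order partial correlation computed from three mutually consistent correlations always lies between -1 and 1. The sketch below shows the textbook formula and one way it can exceed 1: when the three input correlations cannot all hold in a single sample, as may happen when each is based on a different subset of cases. Whether that is what happened with the values above is only a guess; the function name and the input numbers are made up for illustration.

```r
# First-order partial correlation of x and y, controlling for z,
# computed from the three pairwise correlations
partial_from_r <- function(r_xy, r_xz, r_yz) {
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Mutually consistent correlations give a value inside [-1, 1]
ok <- partial_from_r(0.5, 0.5, 0.5)

# Hypothetical, mutually inconsistent inputs (the implied 3x3 correlation
# matrix has a negative determinant) push the result above 1
bad <- partial_from_r(0.5, 0.9, -0.1)

c(ok = ok, bad = bad)
```
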


Multiple regression

So maybe another method is called for. I used multiple regression on all 17 PISA variables. One may be tempted to simply average them, but as Joost de Winter pointed out to me in an email, the PISA for the same year are not independent. So one cannot just count them as independent. One can get around this problem by doing the meta-analysis within test type, i.e. reading, math and science. Results:

Reading:
> IQ.betas.weighted.mean
[1] 0.9631086
> C.betas.weighted.mean
[1] 0.1673834
> sum(samples.sizes)
[1] 166

Math:
> IQ.betas.weighted.mean
[1] 0.9621924
> C.betas.weighted.mean
[1] 0.02653771
> sum(samples.sizes)
[1] 167

Science:
> IQ.betas.weighted.mean
[1] 0.9826468
> C.betas.weighted.mean
[1] 0.1080092
> sum(samples.sizes)
[1] 167
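
The weighted means can be checked against the appendix table. A minimal sketch for the conscientiousness betas in reading, using the five Read rows from the appendix (PISA 2000 to 2012) and weighting by sample size; this is a re-computation from the printed values, not the original script.

```r
# C betas and sample sizes for the reading tests, copied from the appendix
C.betas <- c(0.297191736, 0.27851122, 0.118300942, 0.140295939, 0.094608558)
samples.sizes <- c(22, 27, 37, 39, 41)

# Sample-size-weighted mean of the betas
C.betas.weighted.mean <- weighted.mean(C.betas, w = samples.sizes)
C.betas.weighted.mean
sum(samples.sizes)
```
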

The results from reading have p=.03, so maybe. In 1-2 years, we will have more data from PISA15 to test with. There are plenty of reasons to be cautious: 1) The measured IQs are not perfectly reliably measured. This means that the true correlation between g and PISA scores is higher, leaving less variance to be explained by non-g factors. Maybe nothing? 2) The quality of the personality data is quite poor. Altho one may counter-argue that this is a reason to be more optimistic since the results (well, reading results) are still plausible.

The R sourcecode for the paper is here. The dataset is here.

What about measured IQ and PISA scores?

library(Hmisc) #for rcorr
#the mean PISA x IQ correlation
DF.C.PISA.IQ.rcorr = rcorr(as.matrix(DF.C.PISA.IQ))
IQ.PISA.cors = DF.C.PISA.IQ.rcorr$r[19,] #get IQ row
IQ.PISA.cors = IQ.PISA.cors[2:18] #remove C and IQ-IQ
mean(IQ.PISA.cors) #the mean measured IQ x PISA correlation
#weighted mean
IQ.PISA.cors.n = DF.C.PISA.IQ.rcorr$n[19,] #get IQ row
IQ.PISA.cors.n = IQ.PISA.cors.n[2:18] #remove C and IQ-IQ
IQ.PISA.cors.weighted = IQ.PISA.cors*IQ.PISA.cors.n
IQ.PISA.cors.weighted.mean = sum(IQ.PISA.cors.weighted)/sum(IQ.PISA.cors.n)

The unweighted mean is 0.919, the weighted is 0.924.



Schmitt, D. P., Allik, J., McCrae, R. R., & Benet-Martinez, V. (2007). The Geographic Distribution of Big Five Personality Traits: Patterns and Profiles of Human Self-Description Across 56 Nations. Journal of Cross-Cultural Psychology, 38(2), 173–212. doi:10.1177/0022022106297299

Appendix – full output from MR

PISA test IQ.betas C.betas samples.sizes
Math00Mean 0.9895461 0.096764646 22
Read00Mean 0.977835 0.297191736 22
Sci00Mean 0.9759363 0.099720868 22
Math03Mean 0.9812832 0.016108517 27
Read03Mean 1.0141552 0.27851122 27
Sci03Mean 1.008251 0.104575077 27
Sci06Mean 0.9796918 0.125369373 38
Read06Mean 0.9346129 0.118300942 37
Math06Mean 0.9455623 0.010964361 38
Read09Mean 0.9596431 0.140295939 39
Math09Mean 0.9628133 0.035653129 39
Sci09Mean 0.977768 0.102601624 39
Finance12Mean 0.5286025 -0.144810379 14
Math12Mean 0.9497653 0.001486034 41
Read12Mean 0.9506026 0.094608558 41
Sci12Mean 0.9767656 0.103772057 41
CPS12Mean 0.8830054 -0.025983714 29

International general factor of personality? yes, but…

I merged the dataset from Schmitt et al (2007)’s paper about OCEAN traits in 56 countries with the rest of the megadataset. Then i extracted the first factor of the OCEAN means and SDs. These two are nearly uncorrelated (.07). As for factor strength, they are not too bad:

Call: omega(m = DF.OCEAN.mean)
Alpha:                 0.73 
G.6:                   0.74 
Omega Hierarchical:    0.54 
Omega H asymptotic:    0.64 
Omega Total            0.84 

Schmid Leiman Factor loadings greater than  0.2 
                                        g   F1*   F2*   F3*   h2   u2   p2
ExtraversionMeanSchmittEtAl2007      0.44        0.66       0.64 0.36 0.30
AgreeablenessMeanSchmittEtAl2007     0.58  0.56             0.66 0.34 0.51
ConscientiousnessMeanSchmittEtAl2007 0.62  0.52             0.66 0.34 0.58
NeuroticismMeanSchmittEtAl2007      -0.66  0.28  0.36 -0.36 0.76 0.24 0.56
OpennessMeanSchmittEtAl2007          0.23        0.21  0.51 0.38 0.62 0.14

With eigenvalues of:
   g  F1*  F2*  F3* 
1.40 0.69 0.62 0.40 

general/max  2.04   max/min =   1.7
mean percent general =  0.42    with sd =  0.19 and cv of  0.46 
Explained Common Variance of the general factor =  0.45



Call: omega(m = DF.OCEAN.SD)
Alpha:                 0.79 
G.6:                   0.78 
Omega Hierarchical:    0.72 
Omega H asymptotic:    0.86 
Omega Total            0.84 

Schmid Leiman Factor loadings greater than  0.2 
                                      g   F1*   F2*   F3*   h2   u2   p2
ExtraversionSDSchmittEtAl2007      0.80                   0.64 0.36 0.99
AgreeablenessSDSchmittEtAl2007     0.57        0.47       0.55 0.45 0.59
ConscientiousnessSDSchmittEtAl2007 0.57  0.35             0.48 0.52 0.68
NeuroticismSDSchmittEtAl2007       0.78  0.52             0.87 0.13 0.69
OpennessSDSchmittEtAl2007          0.43        0.24       0.25 0.75 0.74

With eigenvalues of:
   g  F1*  F2*  F3* 
2.08 0.41 0.31 0.00 

general/max  5.09   max/min =   136.11
mean percent general =  0.74    with sd =  0.15 and cv of  0.2 
Explained Common Variance of the general factor =  0.74
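
For readers without the psych package, the basic step of extracting a single general factor and its scores can be sketched in base R with factanal. This is a swapped-in technique on simulated data, not the omega analysis shown above: the loadings used to simulate the five traits are invented, and the sample size of 56 merely mirrors the number of countries in Schmitt et al (2007).

```r
# Simulated trait means driven by one common factor; hypothetical loadings
set.seed(42)
n <- 56
g <- rnorm(n)
true_loadings <- c(E = 0.5, A = 0.6, C = 0.6, N = -0.6, O = 0.3)
traits <- sapply(true_loadings, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))

# One-factor maximum likelihood factor analysis with regression scores
fit <- factanal(traits, factors = 1, scores = "regression")

gfp_loadings <- loadings(fit)[, 1]  # loading of each trait on the factor
gfp_scores   <- fit$scores[, 1]     # one factor score per simulated country
```

The resulting `gfp_scores` vector is the kind of variable that can then be correlated with national IQ or S factor scores.
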


Compare with values in Table 5 in my just published paper. GFP-mean is clearly weaker than g factor at individual level, GFP-SD is about the same.

Var% MR
Var% MR SL Omega h. Omega h. a. ECV R2
NO Complete cases 0.68 0.65 0.87 0.91 0.78 0.98
NO Impute 1 0.66 0.62 0.86 0.9 0.74 0.96
NO Impute 2 0.64 0.6 0.85 0.89 0.75 0.95
NO Impute 3 0.63 0.59 0.82 0.87 0.73 0.99
DK complete cases 0.57 0.51 0.83 0.85 0.68 0.99
DK impute 4 0.55 0.51 0.86 0.88 0.73 0.99
Int. S. Factor 0.43 0.35 0.76 0.77 0.51 0.81
Cognitive data 0.33 0.74 0.79 0.57 0.78
Personality data 0.16 0.37 0.48 0.34 0.41

Then i correlated these with national IQ, S factor and local S factors in Norway and Denmark.

> round(cor(DF.OCEAN.general.scores,use="pairwise.complete.obs"),2)
         GFP.mean GFP.SD  S.NO  S.DK Islam S.Int    IQ
GFP.mean     1.00   0.07  0.09 -0.25  0.17 -0.21 -0.58
GFP.SD       0.07   1.00  0.39  0.26 -0.14  0.36  0.24
S.NO         0.09   0.39  1.00  0.78 -0.72  0.73  0.60
S.DK        -0.25   0.26  0.78  1.00 -0.71  0.54  0.54
Islam        0.17  -0.14 -0.72 -0.71  1.00 -0.33 -0.27
S.Int       -0.21   0.36  0.73  0.54 -0.33  1.00  0.86
IQ          -0.58   0.24  0.60  0.54 -0.27  0.86  1.00

So strangely, the correlation of GFP-mean x national IQ is very negative. It correlates weakly with S factors. Let’s try partialing out national IQ:

> partial.r(DF.OCEAN.general.scores,c(1:6),7)
partial correlations 
         GFP.mean GFP.SD  S.NO  S.DK Islam S.Int
GFP.mean     1.00   0.26  0.68  0.09  0.02  0.72
GFP.SD       0.26   1.00  0.31  0.16 -0.08  0.32
S.NO         0.68   0.31  1.00  0.67 -0.73  0.53
S.DK         0.09   0.16  0.67  1.00 -0.70  0.19
Islam        0.02  -0.08 -0.73 -0.70  1.00 -0.21
S.Int        0.72   0.32  0.53  0.19 -0.21  1.00

Even more strange. GFP-mean strongly correlates with 2 S factors, but not the one in Denmark. The Danish data are very good (25 variables) and so are the international data (42-54 variables). And all the S factors correlate strongly before partialing (.78, .73, .54) but mixed after removing IQ (.67, .53, .19). Again Denmark is odd. For GFP-SD, it is similar, but weaker (before: .39, .26, .36; after: .31, .16, .32).

What to make of this? So i emailed some colleagues:

Dear [NAMES]

Do you know if someone has looked at an international general factor of personality? Because I did it just now using a dataset of OCEAN trait scores (big five) from Schmitt et al 2007. There is indeed an international GFP in the data. It correlates negatively with national IQs (-.58). Strangely, partialing out national IQs, it correlates highly with general socioeconomic factors in Norway (.68) and internationally (.72), but not in Denmark (.09). Strange? Thoughts? I can send you the data+code if you like.


One of them had insider info:


There is a paper about to appear in Intelligence in which an international GFP has been computed and analyzed.


So i publish this here quickly so i establish priority and independence.

What about OCEAN traits themselves?

(sorry, tables apparently not easy to make smaller)
All correlations:
E mean E SD A mean A SD C mean C SD N mean N SD O mean O SD Mean SD S.NO S.DK Islam Int.S IQ
E mean 1 0.14 0.2 0.22 0.25 0.23 -0.49 0.17 0.27 0.09 0.23 0.06 -0.19 -0.02 0.09 -0.02
E sd 0.14 1 -0.08 0.47 -0.07 0.48 0.13 0.66 0.3 0.34 0.81 0.45 0.35 -0.35 0.53 0.39
A mean 0.2 -0.08 1 0.15 0.65 0.21 -0.48 0.21 0.26 -0.13 0.11 0.08 -0.26 0.26 -0.25 -0.53
A SD 0.22 0.47 0.15 1 0.23 0.43 0 0.45 0.22 0.35 0.71 0.18 0.23 -0.18 0.12 -0.04
C mean 0.25 -0.07 0.65 0.23 1 0.1 -0.57 0.07 0.2 -0.03 0.07 0.04 -0.19 0.14 -0.19 -0.6
C SD 0.23 0.48 0.21 0.43 0.1 1 0.11 0.62 0.41 0.25 0.78 0.34 -0.03 0.04 0.19 0.04
N mean -0.49 0.13 -0.48 0 -0.57 0.11 1 0.22 -0.09 0.25 0.19 -0.1 0.13 -0.06 0.12 0.38
N SD 0.17 0.66 0.21 0.45 0.07 0.62 0.22 1 0.41 0.28 0.83 0.23 0.19 0 0.24 0.18
O mean 0.27 0.3 0.26 0.22 0.2 0.41 -0.09 0.41 1 0.07 0.4 -0.01 -0.07 0.04 -0.02 -0.06
O sd 0.09 0.34 -0.13 0.35 -0.03 0.25 0.25 0.28 0.07 1 0.56 0.22 0.14 -0.07 0.25 0.37
Mean SD 0.23 0.81 0.11 0.71 0.07 0.78 0.19 0.83 0.4 0.56 1 0.41 0.25 -0.15 0.36 0.25
S.NO 0.06 0.45 0.08 0.18 0.04 0.34 -0.1 0.23 -0.01 0.22 0.41 1 0.78 -0.72 0.73 0.6
S.DK -0.19 0.35 -0.26 0.23 -0.19 -0.03 0.13 0.19 -0.07 0.14 0.25 0.78 1 -0.71 0.54 0.54
IslamPewResearch2010 -0.02 -0.35 0.26 -0.18 0.14 0.04 -0.06 0 0.04 -0.07 -0.15 -0.72 -0.71 1 -0.33 -0.27
International.S.Factor 0.09 0.53 -0.25 0.12 -0.19 0.19 0.12 0.24 -0.02 0.25 0.36 0.73 0.54 -0.33 1 0.86
LV2012estimatedIQ -0.02 0.39 -0.53 -0.04 -0.6 0.04 0.38 0.18 -0.06 0.37 0.25 0.6 0.54 -0.27 0.86 1
With IQ partialed out:
E mean E sd A mean A SD C mean C SD N mean N SD O mean O SD Mean SD S.NO S.DK Islam Int.S
E mean 1 0.17 0.22 0.22 0.3 0.23 -0.52 0.18 0.27 0.1 0.24 0.09 -0.21 -0.02 0.21
E sd 0.17 1 0.16 0.53 0.22 0.51 -0.02 0.65 0.35 0.23 0.8 0.29 0.18 -0.28 0.42
A mean 0.22 0.16 1 0.15 0.49 0.28 -0.36 0.36 0.27 0.07 0.29 0.58 0.03 0.15 0.48
A SD 0.22 0.53 0.15 1 0.25 0.43 0.02 0.47 0.21 0.4 0.74 0.26 0.3 -0.2 0.3
C mean 0.3 0.22 0.49 0.25 1 0.15 -0.46 0.23 0.21 0.25 0.29 0.63 0.2 -0.02 0.82
C SD 0.23 0.51 0.28 0.43 0.15 1 0.1 0.62 0.41 0.26 0.79 0.39 -0.05 0.06 0.31
N mean -0.52 -0.02 -0.36 0.02 -0.46 0.1 1 0.17 -0.07 0.13 0.11 -0.45 -0.1 0.05 -0.44
N SD 0.18 0.65 0.36 0.47 0.23 0.62 0.17 1 0.43 0.23 0.83 0.16 0.11 0.05 0.18
O mean 0.27 0.35 0.27 0.21 0.21 0.41 -0.07 0.43 1 0.1 0.42 0.03 -0.04 0.03 0.06
O sd 0.1 0.23 0.07 0.4 0.25 0.26 0.13 0.23 0.1 1 0.52 0.01 -0.07 0.03 -0.14
Mean SD 0.24 0.8 0.29 0.74 0.29 0.79 0.11 0.83 0.42 0.52 1 0.33 0.15 -0.09 0.3
S.NO 0.09 0.29 0.58 0.26 0.63 0.39 -0.45 0.16 0.03 0.01 0.33 1 0.67 -0.73 0.53
S.DK -0.21 0.18 0.03 0.3 0.2 -0.05 -0.1 0.11 -0.04 -0.07 0.15 0.67 1 -0.7 0.19
IslamPewResearch2010 -0.02 -0.28 0.15 -0.2 -0.02 0.06 0.05 0.05 0.03 0.03 -0.09 -0.73 -0.7 1 -0.21
International.S.Factor 0.21 0.42 0.48 0.3 0.82 0.31 -0.44 0.18 0.06 -0.14 0.3 0.53 0.19 -0.21 1
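R code (load in the megadataset as DF.mega3 first):
library(Hmisc) #rcorr()
library(psych) #partial.r(), omega(), fa()

#the rest of this cbind() call is cut off
DF.interest = cbind(DF.mega3[2:12],
DF.interest.cor = rcorr(as.matrix(DF.interest))

#remove IQ
DF.interest.cor.without.IQ = partial.r(DF.interest, c(1:15),16)
write.csv(round(DF.interest.cor.without.IQ,2), file="OCEANCors_no_g.csv")

#general factors; the omega.* and fa.* object names are placeholders
DF.OCEAN.full = cbind(DF.mega3[2:12])
omega.OCEAN.full = omega(DF.OCEAN.full)
fa.OCEAN.full = fa(DF.OCEAN.full)

DF.OCEAN.mean = cbind(DF.mega3[c(2,4,6,8,10)])
omega.OCEAN.mean = omega(DF.OCEAN.mean)
fa.OCEAN.mean = fa(DF.OCEAN.mean)

DF.OCEAN.SD = cbind(DF.mega3[c(3,5,7,9,11)])
omega.OCEAN.SD = omega(DF.OCEAN.SD)
fa.OCEAN.SD = fa(DF.OCEAN.SD)

#the rest of this cbind() call is cut off; the fa.* names are placeholders
DF.OCEAN.general.scores = cbind(fa.OCEAN.mean$scores,fa.OCEAN.SD$scores,
colnames(DF.OCEAN.general.scores) = c("GFP.mean","GFP.SD","","","Islam","S.Int","IQ")
round(cor(DF.OCEAN.general.scores,use="pairwise.complete.obs"),2)
DF.OCEAN.general.cor.without.IQ = partial.r(DF.OCEAN.general.scores,c(1:6),7)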

Megadataset is in the OSF repository, version 1.6b.
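
For readers who want to see what partialing out a single variable does without installing psych, here is a minimal base R sketch. It uses simulated stand-in variables (x, y, z are hypothetical, not columns of the megadataset) and shows that the closed-form first-order partial correlation equals the correlation between residuals after regressing each variable on the control.

```r
# Minimal sketch: first-order partial correlation in base R.
# x, y and z are simulated stand-ins, NOT variables from the megadataset.
set.seed(1)
n <- 200
z <- rnorm(n)              # the variable to partial out
x <- 0.6 * z + rnorm(n)
y <- 0.5 * z + rnorm(n)

# Closed-form partial correlation of x and y, controlling for z
partial_cor <- function(x, y, z) {
  rxy <- cor(x, y)
  rxz <- cor(x, z)
  ryz <- cor(y, z)
  (rxy - rxz * ryz) / sqrt((1 - rxz^2) * (1 - ryz^2))
}

# Equivalent route: correlate the residuals left after regressing on z
resid_cor <- cor(resid(lm(x ~ z)), resid(lm(y ~ z)))

r_formula <- partial_cor(x, y, z)
agree <- isTRUE(all.equal(r_formula, resid_cor))
```

The two routes agree up to floating point error, which is a quick sanity check when a partial correlation looks surprising.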

New paper out: Crime, income, educational attainment and employment among immigrant groups in Norway and Finland


I present new predictive analyses for crime, income, educational attainment and employment among immigrant groups in Norway and crime in Finland. Furthermore I show that the Norwegian data contain a strong general socioeconomic factor (S) which is highly predictable from country-level variables (National IQ .59, Islam prevalence -.71, international general socioeconomic factor .72, GDP .55), and correlates highly (.78) with the analogous factor among immigrant groups in Denmark. Analyses of the prediction vectors show very high correlations (generally ±.9) between predictors, which means that the same variables are relatively well or weakly predicted no matter which predictor is used. The method of correlated vectors shows that it is the underlying S factor that drives the associations between predictors and socioeconomic traits, not the remaining variance (all correlations near unity).

All data and source files are at the OSF repository:

The hereditarian hypothesis is almost certainly true

And I’m not even referring to the latest results from any Pifferian methods (e.g. this).

Currently there are many large GWA studies for IQ/educational attainment/other g proxies that have data gathered from US citizens. Some of the subjects are African Americans (AAs). AAs actually have mixed ancestry, with about 23% European ancestry. The simplest genetic model (without shenanigans from assortative mating or outbreeding) predicts a linear relationship between the amount of European ancestry in AAs and IQ/g proxy scores. This is easy to check, so easy that quite a lot of science bloggers could do it in one day if only they had the data available to them. The researchers who did the GWA studies are surely aware of this method.

Now, we have not seen any such study published, either positive or negative. Surely, there are huge benefits to being the first author to publish a study that almost conclusively disproves the genetic model for AAs. Researchers with access to the data have a strong incentive to publish such a study. If the data supported that view, they would have all the necessary means too. Since they haven’t published it, we can hence infer that the data do not support the politically favorable non-genetic conclusion, but instead the genetic model. Probably the authors with access do not want to go down in history as the ones who finally proved ‘racism’. So, instead of settling the issue, they just sit on the data.

In any case, useful data for testing this are gonna leak sooner or later. It can’t take many years. An admixture study for diabetes and African ancestry came very close to proving the genetic model since it included socioeconomic (SES) measures as a control. Its S2 table shows a clear relationship between higher SES and less African ancestry. Of course, a stubborn person will regard this as simply being in line with a discrimination model based on visual cues.

The g factor in autistic persons?


Eyeballing their figure seems to indicate that the g factor is much less strong in these children. A quick search on Scholar didn’t reveal any studies that investigated this idea.

If someone can obtain subtest data from autism samples, that would be useful. The methods I used in my recent paper (section 12) can estimate the strength of the general factor in a sample. If g is weaker in autistic samples, this should be reflected in these measures.

I will write to some authors to see if they will let me have the subtest data.

Comments on Learning Statistics with R

So I found a textbook for learning both elementary statistics, much of which I knew but hadn’t read a textbook about, and R. The book is legally free.

Numbers refer to the page number in the book. The book is in an early version (“0.4”), so many of these are small errors I stumbled upon while going thru virtually all commands in the book in my own R window.



modeOf() and maxFreq() do not work here. This is because afl.finalists is a factor and they demand a vector. One can use as.vector() to make them work.



Worth noting that summary() is the same as quantile() except that it also includes the mean.
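
A quick way to check this on a numeric vector is sketched below. The data are made up for illustration; the comparison drops the fourth element of summary(), which is the mean.

```r
# Hypothetical data, chosen so that all quantiles are short decimals
x <- c(1, 2, 4, 7, 11, 16, 22)

q <- quantile(x)   # 0%, 25%, 50%, 75%, 100%
s <- summary(x)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max.

# summary() minus its 4th element (the mean) should match quantile()
same_quantiles <- isTRUE(all.equal(as.numeric(s)[-4], as.numeric(q)))
mean_matches   <- isTRUE(all.equal(as.numeric(s)[4], mean(x)))
```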



Actually, the output of describe() is not telling us the number of NA. It is only because the author assumes that there are 100 total cases that he can do 100-n and get the number of NAs for each var.
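
If the number of missing values per variable is what one wants, it can be counted directly rather than inferred from n. A minimal base R sketch, using a small made-up data frame (df, a and b are hypothetical names):

```r
# Hypothetical data frame with some missing values
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(NA, NA, 2, 5))

n_missing <- colSums(is.na(df))    # number of NAs per variable
n_valid   <- colSums(!is.na(df))   # number of non-missing cases per variable
```

Here n_valid plays the role of the n column, and n_missing no longer depends on knowing the total number of cases in advance.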



The cakes.Rdata is already transposed.



as.logical also converts numeric 0 and 1 to F and T. However, oddly, it does not understand “0” and “1”.
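
The behavior is easy to reproduce, and converting the strings to numbers first is one possible workaround:

```r
num_result  <- as.logical(c(0, 1))           # FALSE TRUE
chr_result  <- as.logical(c("0", "1"))       # NA NA
word_result <- as.logical(c("T", "false"))   # TRUE FALSE

# Workaround: go through as.numeric() before as.logical()
fixed_result <- as.logical(as.numeric(c("0", "1")))   # FALSE TRUE
```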



Actually P(0) is not equivalent to impossible. See:



Actually 100 simulations with N=20 will generally not result in a histogram like the above. Perhaps it is better to change the command to K=1000. And why not add hist() to it so it can be visually compared to the theoretical one?


hist(rbinom( n = 1000, size = 20, prob = 1/6 ))
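
To put numbers on the comparison, the simulated proportions can be tabulated next to the theoretical binomial probabilities from dbinom(). This is only a sketch; the seed and the object names are my own choices.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
K <- 1000      # number of simulated experiments
sims <- rbinom(n = K, size = 20, prob = 1/6)

# Observed proportion of each outcome 0..20 (unobserved outcomes count as 0)
observed <- as.numeric(table(factor(sims, levels = 0:20))) / K

# Theoretical binomial probabilities for the same outcomes
theoretical <- dbinom(0:20, size = 20, prob = 1/6)

comparison <- round(cbind(outcome = 0:20, observed, theoretical), 3)
```

Printing comparison shows the two columns side by side, and barplot(rbind(observed, theoretical), beside = TRUE) gives a visual version.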


It would be nice if the code for making these simulations was shown.



“This is just bizarre: σ̂² is and unbiased estimate of the population variance”





Typo in Figure 11.6 text. “Notice that when θ actually is equal to .05 (plotted as a black dot)”




“That is, what values of X2 would lead is to reject the null hypothesis.”



It is most annoying that the author doesn’t write the code for reproducing his plots. I spent 15 minutes trying to find a function to create histplots by group.





“It works for t-tests, but it wouldn’t be meaningful for chi-square testsm F -tests or indeed for most of the tests I talk about in this book.”



“we see that it is 95% certain that the true (population-wide) average improvement would lie between 0.95% and 1.86%.”


This wording is dangerous because there are two interpretations of the percent sign. In the relative sense, they are wrong. The author means absolute %’s.
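
A small numeric illustration of the two readings, with made-up numbers (a baseline of 50% and a stated improvement of "1%"):

```r
# Hypothetical numbers, only to show the two readings of "improves by 1%"
baseline <- 50   # a score of 50%
change   <- 1    # the stated improvement, "1%"

absolute_reading <- baseline + change                    # 1 percentage point: 51
relative_reading <- baseline + baseline * change / 100   # 1% of 50 added: 50.5
```

The two readings differ by half a percentage point here, and the gap grows with the size of the stated change.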



The code has +’s in it, which means it cannot just be copied and run. This usually isn’t the case, but it happens a few times in the book.
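
One way around this is to strip the console prompts before running the pasted text. A minimal sketch follows; the copied lines are a made-up example, not code from the book, and the pattern assumes no real code line starts with > or +.

```r
# Lines as they might look when copied from a console listing (hypothetical)
copied <- c("> x <- c(1, 2,",
            "+        3, 4)",
            "> mean(x)")

# Drop a leading ">" or "+" prompt, plus one following space, from each line
cleaned <- sub("^[>+] ?", "", copied)

# Parse and evaluate the cleaned code; the value of the last expression is returned
result <- eval(parse(text = cleaned))
```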



In the description of the test, we are told to tick when the values are larger than. However, in the one sample version, the author ticks when the value is equal to. I guess this means that we tick when it is equal to or larger than.



This command doesn’t work because the dataframe isn’t attached as the author assumes.

> mood.gain <- list( placebo, joyzepam, anxifree)



First the author says he wants to use the unadjusted R^2, but then in the text he uses the adjusted value.



Typo with “Unless” capitalized.



“(3.45 for drug and 0.92 for therapy),”

He must mean .47 for therapy. .92 is the number for residuals.



In the alternative hypothesis, the author uses “u_ij” instead of “u_rc” which is used in the null-hypothesis. I’m guessing the null-hypothesis is right.



As earlier, it is ambiguous when the author talks about increases in percent. It could be relative or absolute. Again in this case it is absolute. The author should use %point or something to avoid confusion.





“I find it amusing to note that the default in R is Type I and the default in SPSS is Type III (with Helmert contrasts). Neither of these appeals to me all that much. Relatedly, I find it depressing that almost nobody in the psychological literature ever bothers to report which Type of tests they ran, much less the order of variables (for Type I) or the contrasts used (for Type III). Often they don’t report what software they used either. The only way I can ever make any sense of what people typically report is to try to guess from auxiliary cues which software they were using, and to assume that they never changed the default settings. Please don’t do this… now that you know about these issues, make sure you indicate what software you used, and if you’re reporting ANOVA results for unbalanced data, then specify what Type of tests you ran, specify order information if you’ve done Type I tests and specify contrasts if you’ve done Type III tests. Or, even better, do hypotheses tests that correspond to things you really care about, and then report those!”


An example of the necessity of open methods along with open data. Science must be reproducible. The best is to simply share the exact source code for the analyses in a paper.