Consider the model below:

General model for immigrant group traits and outcomes

Something much like this has been my intuitive working model for thinking about immigrant groups’ traits and socioeconomic outcomes. I will explain the model in this post and refer back to it or use the material in some upcoming paper (nothing planned).

The model shows the home country/a country of origin and two destination countries. The model is not limited to just two destination countries, but I did not draw more to avoid making the model larger. It can be worth using more in some cases which will be explained below.

Familial traits (or intergenerational) are those traits that run in families. This term includes both genetic and shared environmental effects. Because most children grow up with their parents (I assume), it does not matter whether the parents' traits→children's traits route is genetic or environmental. This means that both psychological traits (mostly genetic) and cultural traits (mostly shared environmental) such as specific religion are included.

When persons leave (emigrate) their home country, there is some selection: people who decide to leave are not random. Sometimes, it is not easy to leave because the government actively tries to restrict its citizens from leaving. This is shown in the model as the Emigration selection→Emigrant group familial traits link. Emigration selection seems to be mostly positive in the real world: the better off and smarter emigrate more than the poorer and less bright.

When the immigrants then move to other countries, there is Immigration selection because the destination countries usually don’t just allow whoever to move in if they want to. Immigration selection can have both positive and negative effects. Countries that receive refugees but try not to receive others have negative selection, while those that try to only pick the best potential immigrants have positive selection. Often countries have elements of both. Immigration selection and Emigrant group familial traits jointly lead to Immigrant group familial traits in a particular destination country.
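The two selection steps can be illustrated with a small simulation. This is only a sketch under assumptions that are not part of the model itself: a single standardized familial trait, logistic selection probabilities, and made-up slopes (`emig_slope`, `immig_slope`) and intercepts chosen purely for illustration.

```r
# Sketch: two-stage selection on one standardized familial trait.
# All numbers are illustrative assumptions, not estimates.
set.seed(1)

n = 100000
population = rnorm(n)  # home country Population familial traits

# Emigration selection: a positive slope means higher-trait persons leave more often
emig_slope = 1
p_emigrate = plogis(-3 + emig_slope * population)
emigrants = population[runif(n) < p_emigrate]

# Immigration selection in a hypothetical destination country
select_immigrants = function(x, immig_slope) {
  p_admit = plogis(immig_slope * x)
  x[runif(length(x)) < p_admit]
}
immigrants_a = select_immigrants(emigrants, immig_slope = 1)   # positive selection
immigrants_b = select_immigrants(emigrants, immig_slope = -1)  # negative selection

group_means = c(
  population = mean(population),
  emigrants = mean(emigrants),
  immigrants_a = mean(immigrants_a),
  immigrants_b = mean(immigrants_b)
)
print(round(group_means, 2))
```

The same emigrant group ends up with different mean trait levels in the two destination countries, which is why the model gives each destination country its own Immigration selection node.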

Note that immigration selection is unique to each destination country, but can be similar for some countries. This would show up as correlated immigration selection scores. There is also immigration selection that doesn’t happen in the destination country, namely selection that happens due to geographical distance. For this reason I placed the Immigration selection node half in the destination country boxes. With a more complex model, one could split these if desired.

Worse, it is possible that immigration selection in a given country depends on the origin country, i.e. a country-country interaction selection. This wasn’t included in the above model. Examples of this are easy to find. For instance, within the EU (well, it’s complicated), there is relatively free movement of EU citizens, but not so for persons coming in from outside the EU.

Socioeconomic outcomes: Human capital model + luck

The S factor score of the home country (the general factor of socioeconomic outcomes, which one can think of as roughly equal to the Human Development Index, just broader) is modeled as being the outcome of the Population familial traits and Environmental and historical luck. I think it is mostly the former. Perhaps the most obvious example of environmental luck is having valuable natural resources within your borders, today especially oil. But note that even this is somewhat complicated because borders can change by use of ‘bigger army diplomacy’ or by simply purchasing more land, so one could strategically buy or otherwise acquire land that has valuable resources on it, making it not a strict environmental effect.

Other things could be having access to water, sunlight, wind, earthquakes, mountains, large bodies of inland water & rivers, active underground, arable land, living close to peaceful (or not so much) neighbors and so on. These things can promote or retard economic development. Having suitable rivers means that one can get cheap and safe (well, mostly) energy from those. Countries without such resources have to look elsewhere which may cost more. They are not always strictly environmental, but some amount of their variance is more or less randomly distributed to countries. Some are more lucky than others.

There are some who argue that countries that were colonized are better off now because of it, so that would count as historical luck. However, being colonized is not just an environmental effect because it means that foreign powers were able to defeat your forces overwhelmingly for decades. If they were able to, you probably had a poor military which is linked to general technological development. There is some environmental component to whether you have a history of communism, but it seems to still have negative effects on economic growth decades after.

For immigrant groups inside a host country, however, the environmental effects with country-wide effects cannot account for differences. These are thus due to familial effects only (by a good approximation). To be sure, the other people living in the destination/host country, Other group familial traits, probably have some effect on the Immigrant familial traits as well, such as religion and language. These familial traits and the Other group S then jointly cause the Immigrant group S. Open Borders advocates often talk about one aspect of this effect:

Wage differences are a revealing metric of border discrimination. When a worker from a poorer country moves to a richer one, her wages might double, triple, or rise even tenfold. These extreme wage differences reflect restrictions as stifling as the laws that separated white and black South Africans at the height of Apartheid. Geographical differences in wages also signal opportunity—for financially empowering the migrants, of course, but also for increasing total world output. On the other side of discrimination lies untapped potential. Economists have estimated that a world of open borders would double world GDP.

Paths estimated in studies

A path model is always complete, which means that all causal routes are explicitly specified. All the remaining links are non-causal, but nodes can be substantially correlated. For instance, there is no link between the home country Country S and Immigrant group S, but these are strongly correlated in practice. I previously reported correlations between home Country S and Immigrant group S of .54 and .72 for Denmark and Norway.

There is no link between home country Population familial traits and Immigrant group familial traits, but there is only one link in between (Emigrant group familial traits), so it seems reasonable to try to correlate these two nodes. A few studies have looked at these types of correlations. For instance, John Fuerst has looked at GRE/GMAT scores and the like for immigrant groups in the US. This is taken as a proxy for cognitive ability, probably the most important component of the psychological traits part of familial traits. In that paper, Fuerst found correlations of .78 and .81 between these and country cognitive ability using Lynn and Vanhanen’s dataset.

Rindermann and Thompson have reported correlations between cognitive ability (component of Immigrant group familial traits) and native population cognitive ability (component of Other group familial traits).

Most of my studies have looked at the nodes Population familial traits (sub-components Islam belief and cognitive ability) and Immigrant group S (or sub-components like crime if S was not available). Often this results in large correlations: .54 and .59 for Denmark and Norway (depending on how to deal with missing data, use of weighted correlations etc.). Note that in the model the first does cause the second, but there are a few intermediate steps and other variables, especially Emigrant selection (differs by country of origin which reduces the correlation) and Immigrant selection (which has no effect on the correlation).
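To make the point about weighted correlations concrete, here is a minimal sketch of how the weighting choice changes the number. The data are simulated and every column name (`origin_iq`, `group_size`, `immigrant_s`) is hypothetical. The weights are normalized before being passed to base R's `cov.wt()`.

```r
# Sketch: unweighted vs. group-size-weighted correlation across origin countries.
# Simulated data; names and effect sizes are assumptions for illustration.
set.seed(2)
n_origins = 60
d_cor = data.frame(
  origin_iq = rnorm(n_origins, mean = 85, sd = 10),
  group_size = round(exp(rnorm(n_origins, mean = 7, sd = 1.5)))
)
# Smaller immigrant groups get noisier S scores (more sampling error)
d_cor$immigrant_s = 0.6 * as.numeric(scale(d_cor$origin_iq)) +
  rnorm(n_origins, sd = 0.5 + 20 / sqrt(d_cor$group_size))

vars = c("origin_iq", "immigrant_s")
r_unweighted = cor(d_cor$origin_iq, d_cor$immigrant_s)

# Weight by group size
w = d_cor$group_size / sum(d_cor$group_size)
r_weighted = cov.wt(d_cor[, vars], wt = w, cor = TRUE)$cor[1, 2]

# Weight by the square root of group size, a milder alternative
w_sqrt = sqrt(d_cor$group_size) / sum(sqrt(d_cor$group_size))
r_weighted_sqrt = cov.wt(d_cor[, vars], wt = w_sqrt, cor = TRUE)$cor[1, 2]

print(round(c(unweighted = r_unweighted, weighted = r_weighted,
              sqrt_weighted = r_weighted_sqrt), 2))
```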

There is much to be done. If one could obtain estimates of multiple nodes in a causal chain, one could use mediation analysis to see if mediation is plausible. E.g. right now we have Immigrant group S for two countries and cognitive ability for 100s of countries of origin, so if we could obtain immigrant group cognitive ability, we could test the mediating role of the last. With the current data, one can also check whether country of origin cognitive ability mediates the relationship between Immigrant group S and country of origin S, which it should partly, according to the model. I say partly because the mediation is only to the extent that familial cognitive ability is a cause.
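A minimal sketch of that check, using the simple regression approach (compare the coefficient with and without the proposed mediator). The data are simulated and the variable names and effect sizes are hypothetical. Note that the regression arithmetic cannot by itself tell mediation apart from a common cause, so the causal reading has to come from the model.

```r
# Sketch: regression-based mediation check on simulated origin-country data.
set.seed(3)
n = 150
origin_iq = rnorm(n)                                 # country of origin cognitive ability
origin_s = 0.8 * origin_iq + rnorm(n, sd = 0.6)      # country of origin S
immigrant_s = 0.6 * origin_iq + rnorm(n, sd = 0.8)   # Immigrant group S in one host country

# Total association between origin S and immigrant group S
fit_total = lm(immigrant_s ~ origin_s)
# The same association after adding cognitive ability
fit_adjusted = lm(immigrant_s ~ origin_s + origin_iq)

b_total = coef(fit_total)[["origin_s"]]
b_direct = coef(fit_adjusted)[["origin_s"]]
share_accounted = 1 - b_direct / b_total

print(round(c(total = b_total, direct = b_direct,
              share_accounted = share_accounted), 2))
```

If cognitive ability accounts for the relationship only partly, `b_direct` shrinks relative to `b_total` without going to zero.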


It seems that no one has integrated this literature yet. I will take a quick stab at it here. It could be expanded into a proper paper later in case someone wants to and has the time to do that.


Lee Jussim (also blog) has done a tremendous job of reviewing the stereotype literature in recent years. In general he has found that stereotypes are mostly moderately to very accurate. On the other hand, self-fulfilling prophecies are probably real but fairly limited (e.g. work best when teachers don’t know their students well yet), especially in comparison to stereotype accuracy. Of course, these findings are exactly the opposite of what social psychologists, taken as a group, have been telling us for years.

The best short review of the literature is their book chapter The Unbearable Accuracy of Stereotypes. A longer treatment can be found in his 2012 book Social Perception and Social Reality: Why Accuracy Dominates Bias and Self-Fulfilling Prophecy (libgen).

Occupational success and cognitive ability

Society is more or less a semi-stable hierarchy based on mostly inherited personality traits, cognitive ability as well as some family-based advantage. This shows up in the examination of surnames over time in many countries, as documented in Gregory Clark’s book The Son Also Rises: Surnames and the History of Social Mobility (libgen). One example:

[Figure: sweden stability]

Briefly put, surnames are kind of an extended family and they tend to keep their standing over time. They regress towards the mean (not the statistical kind!), but slowly. This is due to outmarrying (marrying people from lower classes) and genetic regression (i.e. predicted via breeder’s equation and due to the fact that narrow heritability and shared environment do not add up to 1).

It also shows up when educational attainment is directly examined with behavioral genetic methods. We reviewed the literature recently:

How do we find out whether g is causally related to later socioeconomic status? There are at least five lines of evidence: First, g and socioeconomic status correlate in adulthood. This has consistently been found for so many years that it hardly bears repeating[22, 23]. Second, in longitudinal studies, childhood g is a good correlate of adult socioeconomic status. A recent meta-analysis of longitudinal studies found that g was a better correlate of adult socioeconomic status and income than was parental socioeconomic status[24]. Third, there is a genetic overlap of causes of g and socioeconomic status and income[25, 26, 27, 28]. Fourth, multiple regression analyses show that IQ is a good predictor of future socioeconomic status, income and more, even controlling for parental income and the like[29]. Fifth, comparisons between full-siblings reared together show that those with higher IQ tend to do better in society. This cannot be attributed to shared environmental factors since these are the same for both siblings[30, 31].

I’m not aware of any behavioral genetic study of occupational success itself, but that may exist somewhere. (The scientific literature is basically a very badly standardized, difficult to search database.) But clearly, occupational success is closely related to income, educational attainment, cognitive ability and certain personality traits, all of which show substantial heritability and some of which are known to correlate genetically.

Occupations and cognitive ability

An old line of research shows that there is indeed a stable hierarchy in occupations’ mean and minimum cognitive ability levels. One good review of this is Meritocracy, Cognitive Ability, and the Sources of Occupational Success, a working paper from 2002. I could not find a more recent version. The paper itself is somewhat antagonistic toward the idea (the author hates psychometricians, and in particular dislikes Herrnstein and Murray, as well as Jensen) but it does neatly summarize a lot of findings.

[Figures: occu IQ 1 through occu IQ 7]

The last one is from Gottfredson’s book chapter g, jobs, and life (her site, better version).

Occupations and cognitive ability in preparation

Furthermore, we can go a step back from the above and find SAT scores (almost an IQ test) by college majors (more numbers here). These later result in people working in different occupations, altho the connection is not always a simple one-to-one, but somewhere between many-to-many and one-to-one, we might call it a few to a few. Some occupations only recruit persons with particular degrees — doctors must have degrees in medicine — while others are flexible within limits. Physics majors often don’t work with physics at their level of competence, but instead work as secondary education teachers, in the finance industry, as programmers, as engineers and of course sometimes as physicists of various kinds such as radiation specialists at hospitals and meteorologists. But still, physicists don’t often work as child carers or psychologists, so there is in general a strong connection between college majors and occupations.

There is some stereotype research into college majors. For instance, a recently popularized study showed that beliefs about intellectual requirements of college majors correlated with female% of the field, as in, the fields perceived to be more difficult had fewer women. In fact, the perceived difficulty of the field probably just mostly proxies the actual difficulty of the field, as measured by the mean SAT/ACT score of the students. However, no one seems to have actually correlated the SAT scores with the perceived difficulty, which is the correlation that is the most relevant for stereotype accuracy research.

There is a catch, however. If one analyses the SAT subtests vs. gender%, one sees that it is mostly the quantitative part of the SAT that gives rise to the SAT x gender% correlation. One can also see that the gender% correlates with median income by major.

[Figures: quant-by-college-major-gender, verbal-by-college-major-gender]

Stereotypes about occupations and their cognitive ability

Finally, we get to the central question. If we ask people to estimate the cognitive ability of persons by occupation and then correlate this with the actual cognitive ability, what do we get? Jensen summarizes some results in his 1980 book Bias in Mental Testing (p. 339). I mark the most important passages.

People’s average ranking of occupations is much the same regardless of the basis on which they were told to rank them. The well-known Barr scale of occupations was constructed by asking 30 “psychological judges” to rate 120 specific occupations, each definitely and concretely described, on a scale going from 0 to 100 according to the level of general intelligence required for ordinary success in the occupation. These judgments were made in 1920. Forty-four years later, in 1964, the National Opinion Research Center (NORC), in a large public opinion poll, asked many people to rate a large number of specific occupations in terms of their subjective opinion of the prestige of each occupation relative to all of the others. The correlation between the 1920 Barr ratings based on the average subjectively estimated intelligence requirements of the various occupations and the 1964 NORC ratings based on the average subjective opined prestige of the occupations is .91. The 1960 U.S. Census of Population: Classified Index of Occupations and Industries assigns each of several hundred occupations a composite index score based on the average income and educational level prevailing in the occupation. This index correlates .81 with the Barr subjective intelligence ratings and .90 with the NORC prestige ratings.

Rankings of the prestige of 25 occupations made by 450 high school and college students in 1946 showed the remarkable correlation of .97 with the rankings of the same occupations made by students in 1925 (Tyler, 1965, p. 342). Then, in 1949, the average ranking of these occupations by 500 teachers college students correlated .98 with the 1946 rankings by a different group of high school and college students. Very similar prestige rankings are also found in Britain and show a high degree of consistency across such groups as adolescents and adults, men and women, old and young, and upper and lower social classes. Obviously people are in considerable agreement in their subjective perceptions of numerous occupations, perceptions based on some kind of amalgam of the prestige image and supposed intellectual requirements of occupations, and these are highly related to such objective indices as the typical educational level and average income of the occupation. The subjective desirability of various occupations is also a part of the picture, as indicated by the relative frequencies of various occupational choices made by high school students. These frequencies show scant correspondence to the actual frequencies in various occupations; high-status occupations are greatly overselected and low-status occupations are seldom selected.

How well do such ratings of occupations correlate with the actual IQs of the persons in the rated occupations? The answer depends on whether we correlate the occupational prestige ratings with the average IQs in the various occupations or with the IQs of individual persons. The correlations between average prestige ratings and average IQs in occupations are very high— .90 to .95—when the averages are based on a large number of raters and a wide range of rated occupations. This means that the average of many people’s subjective perceptions conforms closely to an objective criterion, namely, tested IQ. Occupations with the highest status ratings are the learned professions—physician, scientist, lawyer, accountant, engineer, and other occupations that involve high educational requirements and highly developed skills, usually of an intellectual nature. The lowest-rated occupations are unskilled manual labor that almost any able-bodied person could do with very little or no prior training or experience and that involves minimal responsibility for decisions or supervision.

The correlation between rated occupational status and individual IQs ranges from about .50 to .70 in various studies. The results of such studies are much the same in Britain, the Netherlands, and the Soviet Union as in the United States, where the results are about the same for whites and blacks. The size of the correlation, which varies among different samples, seems to depend mostly on the age of the persons whose IQs are correlated with occupational status. IQ and occupational status are correlated .50 to .60 for young men ages 18 to 26 and about .70 for men over 40. A few years can make a big difference in these correlations. The younger men, of course, have not all yet attained their top career potential, and some of the highest-prestige occupations are not even represented in younger age groups. Judges, professors, business executives, college presidents, and the like are missing occupational categories in the studies based on young men, such as those drafted into the armed forces (e.g., the classic study of Harrell & Harrell, 1945).
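The gap between the aggregate-level and individual-level correlations in the passage above follows from averaging out within-occupation variation. A minimal simulated sketch, where the number of occupations, the sample sizes and the spread within occupations are all assumptions chosen for illustration:

```r
# Sketch: the same data give a higher correlation at the occupation-mean level
# than at the individual level. All parameters are illustrative assumptions.
set.seed(4)
n_occ = 40
n_per_occ = 200
occ_status = rnorm(n_occ)                                   # rated status per occupation
occ_mean_iq = 100 + 8 * occ_status + rnorm(n_occ, sd = 3)   # true mean IQ per occupation

persons = data.frame(occupation = rep(seq_len(n_occ), each = n_per_occ))
persons$status = occ_status[persons$occupation]
persons$iq = occ_mean_iq[persons$occupation] + rnorm(nrow(persons), sd = 12)

# Individual level: each person's IQ vs. the status of their occupation
r_individual = cor(persons$status, persons$iq)

# Aggregate level: mean IQ per occupation vs. status
occ_means = aggregate(iq ~ occupation, data = persons, FUN = mean)
r_aggregate = cor(occ_status[occ_means$occupation], occ_means$iq)

print(round(c(individual = r_individual, aggregate = r_aggregate), 2))
```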

I predict that there is a lot of delicious low-hanging, ripe research fruit ready for harvest in this area if one takes a day or ten to dig up some data and read thru older papers, books and reports.

Researcher degrees of freedom refer to the choices researchers make when conducting a study. There are many choices to be made: where to collect data, which variables to include, etc. However, a large subset of the choices concern only the question of how to analyze the data. Since I have now done 100s of analyses rigorous enough to publish, I know exactly what this means. I will give some examples from a work in progress.

1. Which variables to use?

The dataset I began with contains 75 columns. Some of these are names and the like, but many of them are socioeconomic variables in a broad sense. Which should be used? I picked some by judgment call based on prior S studies, but I left out e.g. population density, mean age, pct. of population <16/working/old age. Should these have been included? Maybe.

2. What to do with City of London?

In the study, I examine the S factor among London boroughs. There are 32 boroughs and the City of London. The CoL is fairly small, which can give rise to sampling error and effects related to being a very peculiar administrative division.

Furthermore, many variables in the dataset lack data for CoL. So I was faced with the question of what to do with it. Some options: 1) exclude it, 2) use only the variables for which there is data for CoL, 3) use more variables than have data for CoL and impute the rest. I chose (1), but one might have gone with any of the three.

3. The extra crime data

I found another dataset with crime counts. I calculated per capita versions of these. There are two levels of crime types: broad and detailed. Which should be used? One could also have factor analyzed the data and used the general factor scores. Or calculated a unit-weighted score (standardize all variables, then score cases by the average of their standardized values). I used detailed variables.
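A minimal sketch of the two scoring alternatives mentioned above, on simulated data. The crime categories, loadings and counts are invented for illustration and are not from the London dataset.

```r
# Sketch: per capita rates, then a unit-weighted score and general factor scores.
# Simulated data; categories and loadings are assumptions.
set.seed(5)
n_boroughs = 32
population = round(runif(n_boroughs, 150000, 400000))
latent_crime = rnorm(n_boroughs)

loadings = c(violence = 0.9, burglary = 0.8, robbery = 0.8, drugs = 0.7)
crime_counts = sapply(loadings, function(loading) {
  z = loading * latent_crime + sqrt(1 - loading^2) * rnorm(n_boroughs)
  rpois(n_boroughs, lambda = exp(-5 + 0.3 * z) * population)
})

# Per capita versions (each row divided by that borough's population)
crime_rates = crime_counts / population

# Unit-weighted score: standardize every variable, then average within case
unit_weighted = rowMeans(scale(crime_rates))

# General factor scores from a one-factor model
fa_fit = factanal(crime_rates, factors = 1, scores = "regression")
factor_scores = fa_fit$scores[, 1]

print(round(cor(unit_weighted, factor_scores), 2))
```

When the loadings are similar, as here, the two scores tend to rank cases almost identically, so this particular choice often matters less than the choice of which variables go in.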

4. The extra GCSE data

I found another dataset with GCSE measures. These exist for both genders together and for each gender alone. There are 9 different variables to choose from. Which should be used? Same options as before too: factor scores or unit-weighted average. I selected one for theoretical reasons (similarity to other scholastic variables e.g. PISA) and because Jensen’s method supported this choice.

5. How to deal with missing data

Before factor analyzing the data, one has the question of how to deal with missing data. Aside from CoL, a few other cases had some missing data. Some options: 1) exclude them, 2) impute them with means, 3) impute with best guess (various ways!). Which should be done? I used imputation with the multiple regression method, but one could have used e.g. k nearest means imputation instead.
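Options (2) and (3) can be sketched in a few lines of base R. The data frame and its column names are simulated stand-ins, and the regression imputation below is a plain single-equation version, not necessarily the exact routine used in the study.

```r
# Sketch: mean imputation vs. regression imputation for one variable.
# Simulated data; variable names are hypothetical.
set.seed(6)
n = 33
x1 = rnorm(n)
x2 = 0.7 * x1 + rnorm(n, sd = 0.7)
income = 0.6 * x1 + 0.3 * x2 + rnorm(n, sd = 0.5)
d_miss = data.frame(x1, x2, income)
d_miss$income[c(3, 17, 28)] = NA
is_missing = is.na(d_miss$income)

# Option 2: replace missing values with the variable's mean
d_mean = d_miss
d_mean$income[is_missing] = mean(d_miss$income, na.rm = TRUE)

# Option 3: predict missing values from the other variables
fit = lm(income ~ x1 + x2, data = d_miss)  # lm() drops incomplete rows by default
d_reg = d_miss
d_reg$income[is_missing] = predict(fit, newdata = d_miss[is_missing, ])

print(round(cbind(mean_imputed = d_mean$income[is_missing],
                  reg_imputed = d_reg$income[is_missing]), 2))
```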

6. How to deal with highly correlated variables

Sometimes including variables that correlate very strongly or even perfectly can seriously throw off the factor analysis results because they color the general factor. If multiple factors are extracted, they will form their own factor. What should be done with these? 1) Nothing, 2) exclude based on a threshold value of max allowed intercorrelation. If (2), which value should be used? I used |.9|, but |.8| is about equally plausible.
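Option (2) needs a rule for which member of an offending pair to drop. The sketch below uses one possible rule (drop the variable with the higher mean absolute correlation with everything else). That rule and the function name `drop_redundant` are assumptions for illustration, not the procedure used in the study.

```r
# Sketch: drop variables until no pair correlates above a threshold in absolute value.
drop_redundant = function(d, threshold = 0.9) {
  repeat {
    if (ncol(d) < 2) break
    r = abs(cor(d, use = "pairwise.complete.obs"))
    diag(r) = 0
    if (max(r) <= threshold) break
    # Position (row, col) of the most strongly correlated pair
    worst_pair = which(r == max(r), arr.ind = TRUE)[1, ]
    # Of the two, drop the one more correlated with the other variables overall
    mean_r = colMeans(r)[worst_pair]
    d = d[, -worst_pair[which.max(mean_r)], drop = FALSE]
  }
  d
}

set.seed(7)
n = 33
a = rnorm(n)
d_vars = data.frame(
  a = a,
  a_copy = a + rnorm(n, sd = 0.1),  # near-duplicate of a
  b = 0.5 * a + rnorm(n),
  c = rnorm(n)
)
kept_09 = names(drop_redundant(d_vars, threshold = 0.9))
print(kept_09)
```

Running the same function with `threshold = 0.8` shows directly how sensitive the retained variable set is to that choice.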

7. How to deal with highly mixed cases

Sometimes some cases just don’t fit the factor structure of the data very well. They are structural outliers or mixed. What should be done with them? 1) Nothing, 2) use rank-order data, 3) use robust regression (many methods), 4) change outlier values (e.g. any value >|3| sd gets reduced to 3 sd), 5) exclude them. If (5), which thresholds should we use for the exclusion cutoff? [no answers forthcoming]. I chose to do (1), (2) and (5) and only excluded the most mixed case (Westminster).
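Options (2) and (4) are simple transformations. A minimal sketch on a simulated variable with one extreme case added by hand:

```r
# Sketch: rank-ordering and clamping of extreme standardized values.
set.seed(8)
x = c(rnorm(32), 6)        # 32 ordinary cases plus one extreme case
z = as.numeric(scale(x))   # standardized values

# Option 2: replace values with their ranks
x_rank = rank(x)

# Option 4: any |z| above 3 is pulled back to 3 (sign preserved)
z_clamped = pmax(pmin(z, 3), -3)

print(round(c(max_z = max(z), max_z_clamped = max(z_clamped)), 2))
```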

Researcher choices as parameters

I made many more decisions than the ones mentioned above, but these are the most important ones (I think, so maybe!). Normally, research papers don’t mention these kinds of choices. Sometimes they mention them, but don’t report results by different choices. I suspect a lot of this is due to the hassle of actually doing all the combinations.

However, the hassle is potentially much smaller if one has a general framework for doing it with programming tools. So I propose that, as a general rule, one should treat these kinds of choices as parameters and calculate results for all of them. In the above, this means e.g. results with and without CoL, different variable exclusion thresholds, different choices with regards to mixed cases.

Theoretically, one could think of it as a hyperspace where every dimension is a choice for one of these options. Then one could examine the distribution of results over all parameter values to examine the robustness of the results re. analytic choices.
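A minimal sketch of what such a framework could look like in base R: `expand.grid()` builds every combination of choices and one function runs the analysis for each row. The function `run_analysis()`, the simulated data and the three choices are hypothetical stand-ins for a real pipeline.

```r
# Sketch: analytic choices as parameters, results computed for every combination.
set.seed(9)
n = 33
latent = rnorm(n)
d_sim = as.data.frame(sapply(1:6, function(i) 0.7 * latent + rnorm(n, sd = 0.7)))
d_sim$outcome = 0.5 * latent + rnorm(n)

# Every dimension of the grid is one researcher choice
choices = expand.grid(
  exclude_case1 = c(FALSE, TRUE),      # e.g. with and without a peculiar case
  n_indicators = c(4, 6),              # e.g. which variable set to use
  method = c("pearson", "spearman"),   # e.g. raw vs. rank-order data
  stringsAsFactors = FALSE
)

run_analysis = function(exclude_case1, n_indicators, method) {
  d = if (exclude_case1) d_sim[-1, ] else d_sim
  score = rowMeans(scale(d[, seq_len(n_indicators)]))  # unit-weighted score
  cor(score, d$outcome, method = method)
}

choices$result = mapply(run_analysis, choices$exclude_case1,
                        choices$n_indicators, choices$method)
print(choices)
print(summary(choices$result))
```

The distribution of `choices$result` is then the thing to report: a narrow range means the finding is robust to these analytic choices.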

I have already been doing this for the choice of dealing with mixed cases, but perhaps I should ramp it up and do it more thoroly for other choices too. In this case, the threshold for exclusion of variables and which set of crime variables to use are important choices.

Just a quick analysis. When I read the Dutch crime report that forms the basis of this paper, I noticed one table that had crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r (immigrant% x S) to be negative and since r (S x crime) is negative, one would predict a positive r (immigrant% x crime). Is this the case? Well, mostly. The data are divided into 2 generations and 2 age groups, so there are 4 sub-datasets with lots of missing data and sampling error. If we just use all the cases as if they were independent and get rid of the cases with missing data, we get this result:

Immi%    mean   sd     median  trimmed  mad    min  max    range  skew   kurtosis
X0.5.    1.137  0.182  1.026   1.113    0.039  1    1.588  0.588  1.073  -0.148
X5.15.   1.284  0.292  1.162   1.258    0.24   1    1.938  0.938  0.809  -0.641
X15.50.  1.509  0.65   1.382   1.381    0.465  1    3.812  2.812  2.203  4.758
X.50.    1.769  1.154  1.435   1.526    0.471  1    5.812  4.812  2.36   4.937


In other words, within each group (N=28), the ones living in the areas with more immigrants are more crime-prone. There is however substantial variation. Sometimes the pattern is the reverse for no discernible reason. E.g. 12-17 year olds from Morocco have lower crime rates in the more immigrant heavy areas (7.4, 7.1, 6.5, 6.1).

The samples are too small for one to profitably dig more into it, I think.

R code & data


#p_load() comes from the pacman package
library(pacman)
p_load(plyr, magrittr, readODS, kirkegaard, psych)

#load data from file
d_orig = read.ods("Z:/code/R/dutch_crime_area.ods")[[1]]
d_orig[d_orig=="" | d_orig=="0"] = NA

colnames(d_orig) = d_orig[1, ]
d_orig = d_orig[-1, ]

#remove cases with missing
d = na.omit(d_orig)

#remove names
origins = d$Origin
d$Origin = NULL

#remove unknown + total
d$Unknown = NULL
d$Total = NULL

#to numeric
d = lapply(d, as.numeric) %>% as.data.frame

#convert to standardized rates
#each row is divided by its own minimum, so the lowest rate in a row becomes 1
d_std = adply(d, 1, function(x) {
  x_min = min(x)
  x_ret = x/x_min
  return(x_ret)
})

describe(d_std) %>% write_clipboard

In the review of a paper submitted to ODP some time ago, the issue of a general extremism factor in religion came up. Unfortunately, Dutton deleted the submission thread, so the discussion is forever lost to history (possibly could be recovered from backups of the forum, but not worth the trouble; Yes I looked at the Wayback Machine with no luck).

Specifically, the topic was if and how one could rank order Christian denominations on a more/less extremist scale. Much discussion about them actually assumes this pattern (e.g. saying that [...]), but as far as I know, it has not yet been examined empirically. I see two ways to examine it empirically:

1. A person-level approach
Person-level data where their beliefs re. a number of matters are given as well as their denomination. This allows for the calculation of mean acceptance rates of these beliefs within each denomination. These mean acceptance rates may then be factor analyzed to see if there is a general factor. For this to work, one would need at the very least 3 religious beliefs of central importance re. extremism (e.g. young earth creationism, stance towards homosexuals, atheists, abortion, sex outside marriage, freedom of speech wrt. religious criticism). Furthermore, one will need a number of different denominations, I’d say at least 10.

2. A denomination-level approach
Alternatively, one could examine it if one could find official beliefs from each denomination re. a range of issues. Critically, these must be expressible in numerical forms because that is required for factor analysis to work. Because the beliefs of persons belonging to a denomination often conflict with official beliefs (e.g. in Catholicism), the first approach is probably better.
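The person-level approach can be sketched as follows. Everything here is simulated: the number of denominations, the belief items and the latent `denom_extremism` values are assumptions. With only a dozen aggregate cases, the sketch uses the first principal component as a quick stand-in for a proper general factor.

```r
# Sketch: person-level answers -> mean acceptance per denomination -> general component.
set.seed(10)
n_denoms = 12
n_beliefs = 5
n_persons = 3000
denom_extremism = rnorm(n_denoms)  # hypothetical latent level per denomination
denom = sample(seq_len(n_denoms), n_persons, replace = TRUE)
belief_difficulty = seq(-1, 1, length.out = n_beliefs)

# Each person accepts (1) or rejects (0) each belief
beliefs = sapply(belief_difficulty, function(b) {
  rbinom(n_persons, size = 1, prob = plogis(denom_extremism[denom] - b))
})
colnames(beliefs) = paste0("belief_", seq_len(n_beliefs))

# Mean acceptance rate of each belief within each denomination
acceptance = aggregate(beliefs, by = list(denomination = denom), FUN = mean)

# First principal component of the denomination-level rates
pca = prcomp(acceptance[, -1], scale. = TRUE)
loadings_pc1 = pca$rotation[, 1]
var_explained = pca$sdev[1]^2 / sum(pca$sdev^2)

print(round(loadings_pc1, 2))
print(round(var_explained, 2))
```

If a general extremism factor exists, all beliefs should load in the same direction on the first component and it should account for a large share of the variance.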

Religious beliefs among Muslims
I was recently reminded of the above due to re-seeing Pew Research’s large-scale study of the beliefs of Muslims in their home countries. The dataset is publicly available and is fairly massive: 250 variables and a sample size of 32.6k. The questions cover socioeconomic variables as well as a large number of questions about stuff like Sharia:

[Figure: sharia law]

We see that there is a wide variety of beliefs, both within and between countries. Because persons can be grouped by country, this makes it possible to conduct factor analysis both at the case-level (within each country or pooled) and at the country-level. Islam does not have that many denominations, so a denomination-level analysis does not seem possible.

Predictive validity in country of origin studies
My studies of immigrant performance in Denmark, Norway, Finland and the Netherlands have shown that Islam prevalence in the homeland has some predictive validity for the socioeconomic outcomes of the migrants. If some of this predictive validity is due to religious conflict or religious extremism, then the degree of extremism in the home country should be a moderating (interaction) variable. It doesn’t initially appear to be the case because the major outlier with regards to socioeconomic performance is usually Indonesia, but as we can see above, they seem to be fairly extremist in their beliefs, at least with regards to Sharia.

A very quick look
Examining the factor structure of the religious beliefs requires a number of hours dedicated to re-coding variables. This is because some questions were only asked if the interviewee answered a particular way to an earlier question. I don’t have time to do this right now. I’m hoping posting this will inspire someone else to dig into it (write me an email/direct tweet/etc.).

However, to show that the idea is fruitful I did a little analysis. I used the following variables:

  • Q13. Generally, how would you rate Islamic political parties compared to other political parties? Are they better, worse or about the same as other parties?
  • Q14. Some feel that we should rely on a democratic form of government to solve our country’s problems. Others feel that we should rely on a leader with a strong hand to solve our country’s problems. Which comes closer to your opinion?
  • Q15. In your opinion, how much influence should religious leaders [IN IRN: religious figures] have in political matters? A large influence, some influence, not too much influence or no influence at all?
  • Q16. Which one of these comes closest to your opinion, number 1 or number 2? [morality and religion]
    Number 1 – It is not necessary to believe in God in order to be moral and have good values
    Number 2 – It is necessary to believe in God in order to be moral and have good values
  • Q20. Thinking about evolution [IN IRN: of humans and other living things], which comes closer to your view?
    Humans and other living things have evolved over time
    Humans and other living things have existed in their present form since the beginning of time
  • Q26. Which comes closer to describing your view? [IN IRN: In general,] Western music, movies and television have hurt morality in our country, OR western music, movies and television have NOT hurt morality in our country?
  • Q34. On average, how often do you attend the mosque for salah and Jum’ah Prayer [IN RUS: Friday afternoon prayer]?
  • Q36. How important is religion in your life – very important, somewhat important, not too important, or not at all important?
  • Q37. How comfortable would you be if a son of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q38. How comfortable would you be if a daughter of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q43a. Which, if any, of the following do you believe: in Heaven, where people who have led good lives [IN TUR: life without sin] are eternally rewarded?
  • Q43b. Which, if any, of the following do you believe: in Hell, where people who have led bad lives [IN TUR: life of sin] and die without being sorry are eternally punished?
  • Q43c. Which, if any, of the following do you believe: in angels?
  • Q43d. Which, if any, of the following do you believe: in witchcraft?
  • Q43e. Which, if any, of the following do you believe: in the ‘evil eye’ or that certain people can cast curses or spells that cause bad things to happen to someone?
  • Q43f. Which, if any, of the following do you believe: in predestination or fate (Kismat/Qadar)?
  • Q43g. Which, if any, of the following do you believe: in  jinns?

They were picked because they stood out to me as useful when I was scrolling down the list of variables, not because I examined all variables and handpicked these. There should be many more useful variables (good! because we like to extract general factors from indicators of a wide variety).

I re-coded them all so that higher values correspond to more extreme religious beliefs as judged by me, e.g. unacceptability of children marrying Christians (Q37-38). “Don’t know” and “refusal” were coded as missing. One could recode “don’t know” as the mean of each scale instead, perhaps, but this would require some more work from me.
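To illustrate the re-coding step, here is a minimal sketch in base R. The response codes (1 to 4 for the answer scale, 8 for “don’t know”, 9 for “refused”) are assumptions made up for the example, not the actual codes in the data file, and `recode_item` is a hypothetical helper.

```r
# Hypothetical codes: 1..4 are valid answers, 8 = don't know, 9 = refused.
# The real data file may use different codes.
recode_item = function(x, valid = 1:4, reverse = FALSE) {
  x[!(x %in% valid)] = NA                         # don't know/refused become missing
  if (reverse) x = max(valid) + min(valid) - x    # flip so that higher = more extreme
  x
}

# Q37-style item: 1 = very comfortable ... 4 = not at all comfortable
q37 = c(1, 4, 8, 2, 9, 3)
recode_item(q37)

# Q36-style item: 1 = very important ... 4 = not at all important, so reverse it
q36 = c(1, 2, 9, 4)
recode_item(q36, reverse = TRUE)
```

Items where the original coding already runs in the “more extreme” direction are left as they are; the rest are reversed.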

Then factor analysis was run as usual. Results are shown further below. A general factor seems confirmed. I hereby dub it the general religious factor (GRF). Hopefully no one has used that term or letter combination yet (not true according to Google!). There was a lot of missing data, however, because not all interviewees wanted to answer all questions and not all questions were asked in all countries. We can impute this data (takes 11 mins on my computer, large dataset!). This does not change the general pattern of the results (good, otherwise the imputation would be introducing error), but it does allow us to calculate a mean GRF score for every country (i.e. mean of each case from that country; not taking weights into account).
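The extract-then-average step can be sketched as follows. This uses simulated, complete data and base R’s `factanal` (maximum likelihood) as a stand-in; the country labels, loadings and sample size are made up, and the imputation step is left out.

```r
set.seed(1)
n = 600
country = rep(c("A", "B", "C"), each = n / 3)

# simulated latent factor with made-up country mean differences
g = rnorm(n) + rep(c(-0.5, 0, 0.5), each = n / 3)

# five indicators with made-up loadings on the latent factor
loadings_true = c(0.8, 0.7, 0.6, 0.5, 0.4)
items = sapply(loadings_true, function(l) l * g + rnorm(n, sd = sqrt(1 - l^2)))
colnames(items) = paste0("item", 1:5)

# one-factor model with factor scores for every case
fit = factanal(items, factors = 1, scores = "regression")
print(fit$loadings)

# unweighted mean score per country
grf = fit$scores[, 1]
country_means = tapply(grf, country, mean)
sort(country_means)
```

With real survey data one would want to take the survey weights into account when averaging, which this sketch does not do.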

Mean level by country (reordered):


Aside from Indonesia, which I had heard was less extremist, these results are not that surprising. The Muslim populations in central Asia and Europe are less extremist than those in MENAP.

Country-level GRF
Next up is calculating the country-level data. To do a country-level analysis, we need the mean score for each variable for each country. This is fairly tedious to calculate by hand or low-level code, but Hadley has made it fairly easy with plyr (or dplyr). This reduces the dataset to a 26 x 17 matrix from the original 32.6k x 17. I factor analyzed it as before. For comparison, we plot the loadings together:
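For readers who don’t use plyr, the aggregation can be sketched in base R with `aggregate`. The tiny data frame and the variable names are made up for illustration.

```r
# toy individual-level data: one row per interviewee (made-up values)
d = data.frame(
  country = c("A", "A", "A", "B", "B", "C"),
  q36 = c(4, 3, NA, 2, 2, 1),
  q37 = c(4, 4, 3, NA, 1, 2)
)

# mean of every variable within each country, ignoring missing answers
country_level = aggregate(d[c("q36", "q37")],
                          by = list(country = d$country),
                          FUN = mean, na.rm = TRUE)
country_level
```

The result has one row per country and one column per variable, which is the form needed for the country-level factor analysis.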


Aside from the generally stronger loadings at the country-level (a common finding), loadings are fairly similar. Factor congruence is .98, correlation is .87. At the country-level, only one variable has a negative loading, and only slightly: belief in the evil eye, which as far as I know is not central to Islam (if it is part of it at all).
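For reference, the congruence coefficient (Tucker’s) between two loading vectors $x$ and $y$ is $\sum x_i y_i / \sqrt{\sum x_i^2 \sum y_i^2}$. Unlike the correlation, it does not center the loadings first, which is why it tends to be higher when nearly all loadings have the same sign. A small sketch with made-up loadings (not the values from the analysis above):

```r
# Tucker's congruence coefficient between two loading vectors
congruence = function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# made-up loadings for illustration only
ind_level = c(0.30, 0.45, 0.50, 0.20, 0.60)
country_level = c(0.55, 0.70, 0.80, 0.25, 0.90)

congruence(ind_level, country_level)  # uncentered similarity
cor(ind_level, country_level)         # centered similarity
```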

One can also extract country-level scores from the country-level analysis and compare them with the mean scores from the individual-level analysis.


The findings are essentially the same whether we analyze at the individual-level and then aggregate (x-axis), or aggregate and then analyze (y-axis). This is not a trivial finding, as loadings can change quite a bit between levels depending on the way the data are aggregated.

There is a lot more one could do with this but I will leave it here for now. If someone knows of a suitable open access/science journal to publish this in, let me know. I could use Winnower, but I want some input from people who actually study religion in a comparative religi(on)ology.

Files uploaded to OSF:

We will be submitting the Admixture in the Americas article (first part) to Mankind Quarterly as a target article. Therefore, we are looking for people to comment on it. We are looking for people with substantial knowledge regarding the question of ethnic/race and national differences in psychological traits, primarily cognitive ability, and socioeconomic outcomes. Especially welcome are serious critics, which are very hard to find.

Send me an email if you would like to be a commenter. I don’t mean just oldschool academics. I intend to invite most prominent HBDers to submit a formal commentary paper to MQ.


“But Emil”, you say, “isn’t that a closed access journal, and you said you refuse to publish in those?”

Right you are. However, MQ allows us to post the PDFs elsewhere, e.g. ResearchGate, so this won’t be a problem.

I am considering starting another OpenPsych journal focused on sociology and political science. This is because I need somewhere to publish my S factor papers and I want an open science journal, with open data, code, and review. My guess is that such a journal does not exist right now, so I will have to start one. To do this I need a review team. Since submissions are probably going to be few, this means that it is not a very time consuming job. If we use the policy of generally recruiting 1 ad hoc external reviewer for every submission, this means that only 2 internal reviewers are needed for submissions.

So far I have asked Noah Carl who said that he would be “happy to review something now and again”. To start the journal, we probably need something like >=5 people.

To be a reviewer you should be familiar with research in the area and have substantial expertise in statistics. The latter is most important. Please write me an email if you are interested in this.


The book is on Libgen (free download).

Since I have ventured into criminology as part of my ongoing research program into the spatial transferability hypothesis (psychological traits are stable when people move around, including between countries) and the immigrant groups by country of origin studies, I thought it was a good idea to actually read some criminology. So since there was a recent book covering genetically informative studies, this seemed like a decent choice, especially because it was also available on libgen for free! :)

So basically it is a debate book with a number of topics. For each topic, someone (or a group of someones) will argue for or explain the non-genetic theories/hypotheses, while another someone will sum up the genetically informative studies (i.e. behavioral genetics studies into crime) or at least biologically informed (e.g. neurological correlates of crime).

Initially, I read all the sociological chapters too until I decided they were a waste of time to read. Then I just read the biosocial ones. If you are wondering about the origin of that term as opposed to the more commonly used synonym sociobiological, the use of it was mostly a move to avoid the political backlash. One of the biosocial authors explained it like this to me:

In terms of the name biosocial (versus sociobiological), I think the name change happened accidentally. But there was somewhat of a reason, I guess. EO Wilson and sociobiological thought was so hated amongst sociologists and criminologists, none of us would have gotten a job had we labelled ourselves sociobiologists. Though it was no great secret that sociobiology gave birth to our field. In some ways, it was purely a semantic way to fend off attacks. Even so, there are some distinctions between us and old school sociobiology (use of behavior genetic techniques, etc.).

The book suffers from the widespread problem in social science of not giving effect size numbers. This is more of a problem for the sociological chapters, but true also for the biosocial ones. If no effect sizes are reported, one cannot compare the importance of the alleged causes! Note that behavioral genetics results inherently include effect sizes. The simplest ACE fitting will output the effect sizes for additive genetics, shared environment and unshared environment+error.
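To make the point concrete, here is a back-of-the-envelope version using Falconer’s formulas on twin correlations. This is a simple stand-in for proper model fitting, and the two correlations are made-up numbers, not results from the book.

```r
# Falconer's formulas: rough ACE estimates from MZ and DZ twin correlations
falconer_ace = function(r_mz, r_dz) {
  c(A = 2 * (r_mz - r_dz),  # additive genetics
    C = 2 * r_dz - r_mz,    # shared environment
    E = 1 - r_mz)           # unshared environment + error
}

# made-up twin correlations for illustration
falconer_ace(r_mz = 0.60, r_dz = 0.35)
```

The three numbers sum to 1 and can be compared directly, which is exactly what is missing when a chapter only reports that some cause “matters”.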

Even if you don’t plan to read much of this, I recommend reading the highly entertaining chapter: The Role of Intelligence and Temperament in Interpreting the SES-Crime Relationship by Anthony Walsh, Charlene Y. Taylor, and Ilhong Yun.

What is age heaping?

Number heaping is a common tendency of humans. What this means is that we tend to round numbers to the nearest 5 or 10 (those of us that use the decimal system!). Age heaping is the tendency of innumerate people to round their age to the nearest 5 or 10, presumably because they can’t subtract to infer their current age from their birth year and the current year. Psychometrically speaking, this is a very easy mathematical test, so why is it useful? Surely everybody but small children can do it now? Yes. However, in the past, not all adults even in Western countries could do this. One can locate legal documents and tombstones from these times and analyze the amount of age heaping. The figure below shows an example of age heaping in old Italian data.

[Figure: age heaping in Italy]

Source: “Uniting Souls” and Numeracy Skills. Age Heaping in the First Italian National Censuses, 1861-1881. A’Hearn, Delfino & Nuvolari – Valencia, 13/06/2013.

Since we know that people’s ages really are nearly uniformly distributed over short ranges, that is, the number of people aged 59 and 61 should be about the same as those aged 60, we can calculate indexes for how much heaping there is and use that as a crude numeracy measure. Economic historians have been doing this for some time and so we have some fairly comprehensive datasets for age heaping by now.
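One common pair of indexes in the economic history literature is the Whipple index and its ABCC transformation. The Whipple index takes the people aged 23 to 62, counts those reporting an age ending in 0 or 5, and divides by one fifth of the total (times 100), so 100 means no heaping and 500 means everyone reports a multiple of 5. ABCC rescales this so that 100 means no heaping and 0 means complete heaping. A minimal sketch in base R, with toy age vectors made up for illustration (I am not claiming these are the exact functions behind the datasets used below):

```r
# Whipple index: heaping on ages ending in 0 or 5, computed for ages 23-62
whipple = function(ages) {
  ages = ages[!is.na(ages)]
  in_range = ages[ages >= 23 & ages <= 62]
  heaped = sum(in_range %% 5 == 0)
  100 * heaped / (length(in_range) / 5)
}

# ABCC transformation: 100 = no heaping, 0 = everyone reports a multiple of 5
abcc = function(w) 100 * (1 - (w - 100) / 400)

# toy data: one person of every age (no heaping)
uniform_ages = 23:62
# toy data: everyone rounds their age to a multiple of 5
rounded_ages = rep(seq(25, 60, by = 5), each = 5)

whipple(uniform_ages)
abcc(whipple(uniform_ages))
whipple(rounded_ages)
abcc(whipple(rounded_ages))
```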

Is it a useful correlate?

If you read the source above you will see that age heaping in the 1800s shows the expected north/south Italy patterns, but this is just one case. Does it work in general? The answer is yes. Below I plot some of the age heaping datasets versus Lynn and Vanhanen’s (2012) national IQs:

[Figures: age heaping in 1800, 1820, 1850, 1870 and 1890 versus national IQ]

The problem with the data is this: the older datasets cover fewer countries and the newer datasets show strong ceiling effects (lots of countries very close to 100 on the x-axis). The ceiling effects are because the test is too easy. Still, the data covers a sufficiently large number of countries to be useful for modern comparisons. For instance, we can predict immigrant performance in Scandinavian countries based on their numeracy ability in the 1800s. Below I plot general socioeconomic performance (a general factor of education, income, use of social benefits and crime in Denmark in 2012) and age heaping in 1890:


The actual correlations are shown below:

           AH1800   AH1820   AH1850   AH1870   AH1890  LV12 IQ  S in DK
AH1800          1     0.95     0.94     0.96      0.9     0.85     0.61
AH1820       0.95        1     0.94     0.94     0.76     0.62     0.67
AH1850       0.94     0.94        1     0.99     0.84     0.73     0.59
AH1870       0.96     0.94     0.99        1     0.96     0.64     0.56
AH1890        0.9     0.76     0.84     0.96        1     0.52     0.73
LV12 IQ      0.85     0.62     0.73     0.64     0.52        1     0.54
S in DK      0.61     0.67     0.59     0.56     0.73     0.54        1


And the sample sizes:

           AH1800   AH1820   AH1850   AH1870   AH1890  LV12 IQ  S in DK
AH1800         31       25       22       22       24       29       24
AH1820         25       45       37       22       36       43       27
AH1850         22       37       45       27       37       43       30
AH1870         22       22       27       62       56       61       34
AH1890         24       36       37       56      109      107       50
LV12 IQ        29       43       43       61      107      203       68
S in DK        24       27       30       34       50       68       70


Great, where can I find the datasets?

Fortunately, they are freely available. The easiest solution is probably just to download the worldwide megadataset, which contains a number of the age heaping variables and lots of other variables for you to play around with:

Alternatively, you can find Baten’s age heaping data directly:

R code

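#this is assuming you have loaded the megadataset as DF.supermega and that write_clipboard is available
library(ggplot2); library(stringr); library(weights) #ggplot/ggsave, str_c, wtd.cors
s_var = "" #name of the S in DK variable goes here (left blank); fill it in before running
temp = subset(DF.supermega, select = c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890", "LV2012estimatedIQ", s_var))
write_clipboard(wtd.cors(temp), digits = 2)

#age heaping vs. national IQ, one saved plot per year
for (year in c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890")) {
  p = ggplot(DF.supermega, aes_string(year, "LV2012estimatedIQ")) + geom_point() + geom_smooth(method = lm) + geom_text(aes(label = rownames(temp)))
  name = str_c(year, "_IQ.png")
  ggsave(name, p)
}

#age heaping in 1890 vs. S in DK
ggplot(DF.supermega, aes_string("AH1890", s_var)) + geom_point() + geom_smooth(method = lm) + geom_text(aes(label = rownames(temp)))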

John Fuerst suggested that I write a meta-analysis, review and methodology paper on the S factor. That seems like a decent idea once I get some more studies done: data are known to exist on France (another level), Japan (analysis done, writing pending), Denmark, Sweden and Turkey (reanalysis of Lynn’s data done, but there is much more data).

However, before doing that it seems okay to post my checklist here in case someone else is planning on doing a study.

A methodology paper is perhaps not too bad an idea. Here’s a quick checklist of what I usually do:
  1. Find some country for which there exist administrative divisions that number preferably at least 10 and as many as possible.
  2. Find cognitive data for these divisions. Usually this is only available for fairly large divisions, like states but may sometimes be available for smaller divisions. One can sometimes find real IQ test data, but usually one will have to rely on scholastic ability tests such as PISA. Often one will have to use a regional or national variant of this.
  3. Find socioeconomic outcome data for these divisions. This can usually be found at some kind of official statistics bureau’s website. These websites often have English language editions for non-English-speaking countries. Sometimes they don’t and one has to rely on clever guessing and Google Translate. If the country has a diverse ethnoracial demographic, obtain data for this as well. If possible, try to obtain data for multiple levels of administrative divisions and time periods so one can see changes over levels or time. Sometimes data will be available for a variety of years, so one can do a longitudinal study. Other times one will have to average all the years for each variable.
  4. If there are lots of variables to choose from, then choose a diverse mix of variables. Avoid variables that are overly dependent on local natural environment, such as the presence of a large body of water.
  5. Use the redundancy algorithm to remove the most redundant variables. I usually use a threshold of |.90|, such that if a pair of variables in the dataset correlate >= that level, then remove one of them. One can also average them if they are e.g. gendered versions, such as life expectancy or mean income by gender.
  6. Use the mixedness algorithms to detect if any cases are structural outliers, i.e. that they don’t fit the factor structure of the remaining cases. Create parallel datasets without the problematic cases.
  7. Factor analyze the dataset with outliers with ordinary factor analysis (FA), rank order and robust FA. Use ordinary FA on the dataset without the structural outliers. Plot all the FA loading sets using the loadings plotter function. Make note of variables that change their loadings between analyses, and variables that load in unexpected ways.
  8. Extract the S factors and examine their relationship to the ethnoracial variables and cognitive scores.
  9. If the country has seen substantial immigration over the recent decades, it may be a good idea to regress out the effect of this demographic and examine the loadings.
  10. Write up the results. Use lots of loading plots and scatter plots with names.
  11. After you have written a draft, contact natives to get their opinion. Maybe you missed something important about the country. People who speak the local language are also useful when gathering data, but generally, you will have to do things yourself.
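Two of the steps above (removing redundant variables in step 5 and regressing out a demographic variable in step 9) can be sketched in base R as follows. This is a minimal illustration, not the actual functions used for the published analyses; `remove_redundant` and `residualize` are hypothetical helper names and the data are simulated.

```r
# Step 5 sketch: repeatedly find the most strongly correlated pair and,
# if |r| >= threshold, drop the later variable of that pair.
remove_redundant = function(df, threshold = 0.90) {
  repeat {
    if (ncol(df) < 2) break
    r = abs(cor(df, use = "pairwise.complete.obs"))
    r[upper.tri(r, diag = TRUE)] = 0        # look at each pair only once
    if (max(r) < threshold) break
    idx = which(r == max(r), arr.ind = TRUE)[1, ]
    df = df[, -idx[["row"]], drop = FALSE]  # row index = later variable in the pair
  }
  df
}

# Step 9 sketch: replace every variable by its residuals after
# regressing it on a predictor (e.g. immigrant share of each division).
residualize = function(df, predictor) {
  as.data.frame(lapply(df, function(y) {
    resid(lm(y ~ predictor, na.action = na.exclude))
  }))
}

# simulated divisions (made-up numbers)
set.seed(1)
n = 50
immigrant_share = runif(n)
income = rnorm(n) - immigrant_share
d = data.frame(
  income_men = income + rnorm(n, sd = 0.1),
  income_women = income + rnorm(n, sd = 0.1),
  crime = rnorm(n) + immigrant_share
)

d2 = remove_redundant(d)       # drops one of the two gendered income variables
names(d2)
d3 = residualize(d2, immigrant_share)
round(cor(d3, immigrant_share), 10)  # residuals are uncorrelated with the predictor
```

After residualizing, one would factor analyze the residualized dataset and compare the loadings with those from the original data.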


If I missed something, let me know.