It seems that no one has integrated this literature yet. I will take a quick stab at it here. It could be expanded into a proper paper later if someone wants to and has the time to do so.


Lee Jussim (also blog) has done a tremendous job at reviewing the stereotype literature in recent years. In general he has found that stereotypes are mostly moderately to very accurate. On the other hand, self-fulfilling prophecies are probably real but fairly limited (e.g. they work best when teachers don’t know their students well yet), especially in comparison to stereotype accuracy. Of course, these findings are exactly the opposite of what social psychologists, taken as a group, have been telling us for years.

The best short review of the literature is the book chapter The Unbearable Accuracy of Stereotypes. A longer treatment can be found in his 2012 book Social Perception and Social Reality: Why Accuracy Dominates Bias and Self-Fulfilling Prophecy (libgen).

Occupational success and cognitive ability

Society is more or less a semi-stable hierarchy based on mostly inherited personality traits, cognitive ability as well as some family-based advantage. This shows up in the examination of surnames over time in many countries, as documented in Gregory Clark’s book The Son Also Rises: Surnames and the History of Social Mobility (libgen). One example:

sweden stability

Briefly put, surnames are a kind of extended family and they tend to keep their standing over time. They regress towards the mean (not the statistical kind!), but slowly. This is due to outmarrying (marrying people from lower classes) and genetic regression (i.e. as predicted via the breeder’s equation and due to the fact that narrow heritability and shared environment do not add up to 1).

It also shows up when educational attainment is directly examined with behavioral genetic methods. We reviewed the literature recently:

How do we find out whether g is causally related to later socioeconomic status? There are at least five lines of evidence: First, g and socioeconomic status correlate in adulthood. This has consistently been found for so many years that it hardly bears repeating[22, 23]. Second, in longitudinal studies, childhood g is a good correlate of adult socioeconomic status. A recent meta-analysis of longitudinal studies found that g was a better correlate of adult socioeconomic status and income than was parental socioeconomic status[24]. Third, there is a genetic overlap of causes of g and socioeconomic status and income[25, 26, 27, 28]. Fourth, multiple regression analyses show that IQ is a good predictor of future socioeconomic status, income and more, even controlling for parental income and the like[29]. Fifth, comparisons between full-siblings reared together show that those with higher IQ tend to do better in society. This cannot be attributed to shared environmental factors since these are the same for both siblings[30, 31].

I’m not aware of any behavioral genetic study of occupational success itself, but that may exist somewhere. (The scientific literature is basically a very badly standardized, difficult to search database.) But clearly, occupational success is closely related to income, educational attainment, cognitive ability and certain personality traits, all of which show substantial heritability and some of which are known to correlate genetically.

Occupations and cognitive ability

An old line of research shows that there is indeed a stable hierarchy in occupations’ mean and minimum cognitive ability levels. One good review of this is Meritocracy, Cognitive Ability, and the Sources of Occupational Success, a working paper from 2002. I could not find a more recent version. The paper itself is somewhat antagonistic toward the idea (the author hates psychometricians, and in particular dislikes Herrnstein and Murray, as well as Jensen) but it does neatly summarize a lot of findings.

occu IQ 1

occu IQ 2

occu IQ 3

occu IQ 4

occu IQ 5

occu IQ 6

occu IQ 7

The last one is from Gottfredson’s book chapter g, jobs, and life (her site, better version).

Occupations and cognitive ability in preparation

Furthermore, we can go a step back from the above and find SAT scores (almost an IQ test) by college majors (more numbers here). These later result in people working in different occupations, altho the connection is not always a simple one-to-one, but somewhere between many-to-many and one-to-one, we might call it a few to a few. Some occupations only recruit persons with particular degrees — doctors must have degrees in medicine — while others are flexible within limits. Physics majors often don’t work with physics at their level of competence, but instead work as secondary education teachers, in the finance industry, as programmers, as engineers and of course sometimes as physicists of various kinds such as radiation specialists at hospitals and meteorologists. But still, physicists don’t often work as child carers or psychologists, so there is in general a strong connection between college majors and occupations.

There is some stereotype research into college majors. For instance, a recently popularized study showed that beliefs about intellectual requirements of college majors correlated with female% of the field, as in, the fields perceived to be more difficult had fewer women. In fact, the perceived difficulty of the field probably just mostly proxies the actual difficulty of the field, as measured by the mean SAT/ACT score of the students. However, no one seems to have actually correlated the SAT scores with the perceived difficulty, which is the correlation that is the most relevant for stereotype accuracy research.

There is a catch, however. If one analyses the SAT subtests vs. gender%, one sees that it is mostly the quantitative part of the SAT that gives rise to the SAT x gender% correlation. One can also see that the gender% correlates with median income by major.

quant-by-college-major-gender verbal-by-college-major-gender

Stereotypes about occupations and their cognitive ability

Finally, we get to the central question. If we ask people to estimate the cognitive ability of persons by occupation and then correlate this with the actual cognitive ability, what do we get? Jensen summarizes some results in his 1980 book Bias in Mental Testing (p. 339). I mark the most important passages.

People’s average ranking of occupations is much the same regardless of the basis on which they were told to rank them. The well-known Barr scale of occupations was constructed by asking 30 “psychological judges” to rate 120 specific occupations, each definitely and concretely described, on a scale going from 0 to 100 according to the level of general intelligence required for ordinary success in the occupation. These judgments were made in 1920. Forty-four years later, in 1964, the National Opinion Research Center (NORC), in a large public opinion poll, asked many people to rate a large number of specific occupations in terms of their subjective opinion of the prestige of each occupation relative to all of the others. The correlation between the 1920 Barr ratings based on the average subjectively estimated intelligence requirements of the various occupations and the 1964 NORC ratings based on the average subjective opined prestige of the occupations is .91. The 1960 U.S. Census of Population: Classified Index of Occupations and Industries assigns each of several hundred occupations a composite index score based on the average income and educational level prevailing in the occupation. This index correlates .81 with the Barr subjective intelligence ratings and .90 with the NORC prestige ratings.

Rankings of the prestige of 25 occupations made by 450 high school and college students in 1946 showed the remarkable correlation of .97 with the rankings of the same occupations made by students in 1925 (Tyler, 1965, p. 342). Then, in 1949, the average ranking of these occupations by 500 teachers college students correlated .98 with the 1946 rankings by a different group of high school and college students. Very similar prestige rankings are also found in Britain and show a high degree of consistency across such groups as adolescents and adults, men and women, old and young, and upper and lower social classes. Obviously people are in considerable agreement in their subjective perceptions of numerous occupations, perceptions based on some kind of amalgam of the prestige image and supposed intellectual requirements of occupations, and these are highly related to such objective indices as the typical educational level and average income of the occupation. The subjective desirability of various occupations is also a part of the picture, as indicated by the relative frequencies of various occupational choices made by high school students. These frequencies show scant correspondence to the actual frequencies in various occupations; high-status occupations are greatly overselected and low-status occupations are seldom selected.

How well do such ratings of occupations correlate with the actual IQs of the persons in the rated occupations? The answer depends on whether we correlate the occupational prestige ratings with the average IQs in the various occupations or with the IQs of individual persons. The correlations between average prestige ratings and average IQs in occupations are very high— .90 to .95—when the averages are based on a large number of raters and a wide range of rated occupations. This means that the average of many people’s subjective perceptions conforms closely to an objective criterion, namely, tested IQ. Occupations with the highest status ratings are the learned professions—physician, scientist, lawyer, accountant, engineer, and other occupations that involve high educational requirements and highly developed skills, usually of an intellectual nature. The lowest-rated occupations are unskilled manual labor that almost any able-bodied person could do with very little or no prior training or experience and that involves minimal responsibility for decisions or supervision.

The correlation between rated occupational status and individual IQs ranges from about .50 to .70 in various studies. The results of such studies are much the same in Britain, the Netherlands, and the Soviet Union as in the United States, where the results are about the same for whites and blacks. The size of the correlation, which varies among different samples, seems to depend mostly on the age of the persons whose IQs are correlated with occupational status. IQ and occupational status are correlated .50 to .60 for young men ages 18 to 26 and about .70 for men over 40. A few years can make a big difference in these correlations. The younger men, of course, have not all yet attained their top career potential, and some of the highest-prestige occupations are not even represented in younger age groups. Judges, professors, business executives, college presidents, and the like are missing occupational categories in the studies based on young men, such as those drafted into the armed forces (e.g., the classic study of Harrell & Harrell, 1945).

I predict that there is a lot of delicious low-hanging, ripe research fruit ready for harvest in this area if one takes a day or ten to dig up some data and read thru older papers, books and reports.

Researcher degrees of freedom refer to the choices researchers make when conducting a study. There are many choices to be made: where to collect data, which variables to include, etc. However, a large subset of the choices concern only the question of how to analyze the data. Since I have now done 100s of analyses rigorous enough to publish, I know exactly what this means. I will give some examples from a work in progress.

1. Which variables to use?

The dataset I began with contains 75 columns. Some of these are names and the like, but many of them are socioeconomic variables in a broad sense. Which should be used? I picked some by judgment call, informed by prior S studies, but I left out e.g. population density, mean age, pct. of population <16/working/old age. Should these have been included? Maybe.

2. What to do with City of London?

In the study, I examine the S factor among London boroughs. There are 32 boroughs and the City of London. The CoL is fairly small, which can give rise to sampling error and to effects related to being a very peculiar administrative division.

Furthermore, many variables in the dataset lack data for CoL. So I was faced with the question of what to do with it. Some options: 1) exclude it, 2) use only the variables for which there is data for CoL, 3) use more variables than there is CoL data for and impute the rest. I chose (1), but one might have gone with any of the three.

3. The extra crime data

I found another dataset with crime counts. I calculated per capita versions of these. There are two levels of crime types: broad and detailed. Which should be used? One could also have factor analyzed the data and used the general factor scores. Or calculated a unit-weighted score (standardize all variables, then score each case by its average across the variables). I used the detailed variables.
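The unit-weighted option mentioned above can be sketched in a few lines of base R. This is only an illustration: the data frame `crime` and its numbers are made up and are not the London crime data.

```r
# Hypothetical per capita crime rates for five areas (made-up numbers).
crime = data.frame(
  theft    = c(10, 20, 30, 40, 50),
  violence = c(1, 3, 2, 5, 4),
  burglary = c(5, 4, 8, 9, 12)
)

# Standardize every variable (mean 0, sd 1), then average across the
# variables within each case to get the unit-weighted score.
z = scale(crime)
unit_score = rowMeans(z)

round(unit_score, 2)
```

Unlike factor scores, every variable gets the same weight here, so the result does not depend on the factor loadings.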

4. The extra GCSE data

I found another dataset with GCSE measures. These exist for both genders together and for each gender alone. There are 9 different variables to choose from. Which should be used? Same options as before too: factor scores or unit-weighted average. I selected one for theoretical reasons (similarity to other scholastic variables e.g. PISA) and because Jensen’s method supported this choice.

5. How to deal with missing data

Before factor analyzing the data, one has the question of how to deal with missing data. Aside from CoL, a few other cases had some missing data. Some options: 1) exclude them, 2) impute them with means, 3) impute with best guess (various ways!). Which should be done? I used imputation by multiple regression; one could have used e.g. k nearest neighbors imputation instead.

6. How to deal with highly correlated variables

Sometimes including variables that correlate very strongly or even perfectly can seriously throw off the factor analysis results because they color the general factor. If multiple factors are extracted, they will form their own factor. What should be done with these? 1) Nothing, 2) exclude based on a threshold value of max allowed intercorrelation. If (2), which value should be used? I used |.9|, but |.8| is about equally plausible.
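Option (2) can be automated so that the threshold becomes a single argument. The sketch below is one possible rule, not necessarily the one used in the study: the function name `drop_redundant` and the tie-breaking rule are assumptions, and the data are simulated.

```r
# Drop variables until no pair correlates more strongly than `threshold`.
# Rule (an assumption): from the most strongly correlated pair, drop the
# member with the larger mean absolute correlation with all variables.
drop_redundant = function(data, threshold = .9) {
  repeat {
    if (ncol(data) < 2) break
    r = abs(cor(data))
    diag(r) = 0
    if (max(r) <= threshold) break
    pair = which(r == max(r), arr.ind = TRUE)[1, ]
    worst = pair[which.max(colMeans(r)[pair])]
    data = data[, -worst, drop = FALSE]
  }
  data
}

# Made-up example: b is a near-copy of a, c is unrelated to both.
set.seed(1)
a = rnorm(100)
toy = data.frame(a = a, b = a + rnorm(100, sd = .1), c = rnorm(100))
names(drop_redundant(toy, threshold = .9))
```

Rerunning with `threshold = .8` is then a one-word change, which makes it easy to report results for both values.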

7. How to deal with highly mixed cases

Sometimes some cases just don’t fit the factor structure of the data very well. They are structural outliers or mixed. What should be done with them? 1) Nothing, 2) use rank-order data, 3) use robust regression (many methods), 4) change outlier values (e.g. any value beyond ±3 sd gets reduced to ±3 sd), 5) exclude them. If (5), which thresholds should we use for the exclusion cutoff? [no answers forthcoming]. I chose to do (1), (2) and (5) and only excluded the most mixed case (Westminster).
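For concreteness, option (4) could look something like the following. The helper name `clamp_sd` is hypothetical and the numbers are made up; note that the mean and sd are computed with the outlier still included, which is itself a choice.

```r
# Pull any value further than `limit` standard deviations from the mean
# back to exactly `limit` standard deviations.
clamp_sd = function(x, limit = 3) {
  z = (x - mean(x)) / sd(x)
  z = pmax(pmin(z, limit), -limit)
  z * sd(x) + mean(x)
}

# Made-up example: twenty ordinary values and one extreme one.
x = c(rep(c(9, 10, 11), length.out = 20), 100)
clamp_sd(x)
```

Only the extreme value changes; the twenty ordinary values pass through unchanged.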

Researcher choices as parameters

I made many more decisions than the ones mentioned above, but they are the most important ones (I think, so maybe!). Normally, research papers don’t mention these kinds of choices. Sometimes they mention them, but don’t report results by different choices. I suspect a lot of this is due to the hassle of actually doing all the combinations.

However, the hassle is potentially much smaller if one has a general framework for doing it with programming tools. So I propose that, as a general rule, one should consider these kinds of choices as parameters and calculate results for all of them. In the above, this means e.g. results with and without CoL, different variable exclusion thresholds, different choices with regards to mixed cases.

Theoretically, one could think of it as a hyperspace where every dimension is a choice for one of these options. Then one could examine the distribution of results over all parameter values to examine the robustness of the results re. analytic choices.
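A minimal sketch of the idea, assuming a toy analysis: each analytic choice becomes a column, `expand.grid()` builds every combination, and the same analysis function is run once per row. The data are simulated, and the two choices shown (raw vs. rank-order data, keeping vs. excluding the most extreme case) merely stand in for the ones discussed above; a real version would wrap the full pipeline.

```r
# 33 made-up cases with a moderate true relationship between x and y.
set.seed(1)
toy = data.frame(x = rnorm(33))
toy$y = .5 * toy$x + rnorm(33)

# One column per analytic choice, one row per combination of choices.
choices = expand.grid(
  method = c("pearson", "spearman"),   # raw vs. rank-order data
  drop_most_extreme = c(FALSE, TRUE),  # keep or exclude the most extreme case
  stringsAsFactors = FALSE
)

# The analysis as a function of the choices.
run_analysis = function(method, drop_most_extreme) {
  d = toy
  if (drop_most_extreme) d = d[-which.max(abs(d$x)), ]
  cor(d$x, d$y, method = method)
}

choices$r = mapply(run_analysis, choices$method, choices$drop_most_extreme,
                   USE.NAMES = FALSE)
choices
range(choices$r)
```

The spread of `r` across rows is then a direct summary of how much the result depends on the analytic choices. Adding another choice means adding one more argument to `expand.grid()` and to the analysis function.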

I have already been doing this for the choice of dealing with mixed cases, but perhaps I should ramp it up and do it more thoroly for other choices too. In this case, the threshold for exclusion of variables and which set of crime variables to use are important choices.

Just a quick analysis. When I read the Dutch crime report that forms the basis of this paper, I noticed one table that had crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r (immigrant% x S) to be negative and since r (S x crime) is negative, one would predict a positive r (immigrant% x crime). Is this the case? Well, mostly. The data are divided into 2 generations and 2 age groups, so there are 4 sub-datasets with lots of missing data and sampling error. If we just use all the cases as if they were independent and get rid of the cases with missing data, we get this result:

Immi%    mean   sd     median  trimmed  mad    min  max    range  skew   kurtosis
X0.5.    1.137  0.182  1.026   1.113    0.039  1    1.588  0.588  1.073  -0.148
X5.15.   1.284  0.292  1.162   1.258    0.24   1    1.938  0.938  0.809  -0.641
X15.50.  1.509  0.65   1.382   1.381    0.465  1    3.812  2.812  2.203  4.758
X.50.    1.769  1.154  1.435   1.526    0.471  1    5.812  4.812  2.36   4.937


In other words, within each group (N=28), the ones living in the areas with more immigrants are more crime-prone. There is however substantial variation. Sometimes the pattern is the reverse for no discernible reason. E.g. 12-17 year olds from Morocco have lower crime rates in the more immigrant heavy areas (7.4, 7.1, 6.5, 6.1).

The samples are too small for one to profitably dig more into it, I think.

R code & data


#p_load comes from the pacman package
library(pacman)
p_load(plyr, magrittr, readODS, kirkegaard, psych)

#load data from file
d_orig = read.ods("Z:/code/R/dutch_crime_area.ods")[[1]]
d_orig[d_orig=="" | d_orig=="0"] = NA

colnames(d_orig) = d_orig[1, ]
d_orig = d_orig[-1, ]

#remove cases with missing
d = na.omit(d_orig)

#remove names
origins = d$Origin
d$Origin = NULL

#remove unknown + total
d$Unknown = NULL
d$Total = NULL

#to numeric
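d = lapply(d, as.numeric) %>% as.data.frame

#convert to standardized rates
#divide each row by its smallest value, so the lowest rate in a row becomes 1
d_std = adply(d, 1, function(x) {
  x_min = min(x)
  x_ret = x/x_min
  return(x_ret)
})

describe(d_std) %>% write_clipboard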

In the review of a paper submitted to ODP some time ago, the issue of a general extremism factor in religion came up. Unfortunately, Dutton deleted the submission thread, so the discussion is forever lost to history (possibly could be recovered from backups of the forum, but not worth the trouble; Yes I looked at the Wayback Machine with no luck).

Specifically, the topic was if and how one could rank order Christian denominations on a more/less extremist scale. Much discussion about them actually assumes this pattern (e.g. saying that , but as far as I know, it has not yet been examined empirically. I see two ways to examine it empirically:

1. A person-level approach
Person-level data where their beliefs re. a number of matters are given as well as their denomination. This allows for the calculation of mean acceptance rates of these beliefs within each denomination. These mean acceptance rates may then be factor analyzed to see if there is a general factor. For this to work, one would need at the very least 3 religious beliefs of central importance re. extremism (e.g. young earth creationism, stance towards homosexuals, atheists, abortion, sex outside marriage, freedom of speech wrt. religious criticism). Furthermore, one will need a number of different denominations, I’d say at least 10.

2. A denomination-level approach
Alternatively, one could examine it if one could find official beliefs from each denomination re. a range of issues. Critically, these must be expressible in numerical forms because that is required for factor analysis to work. Because the beliefs of persons belonging to a denomination often conflict with official beliefs (e.g. in Catholicism), the first approach is probably better.

Religious beliefs among Muslims
I was recently reminded of the above due to re-seeing Pew Research’s large-scale study of the beliefs of Muslims in their home countries. The dataset is publicly available and is fairly massive: 250 variables and a sample size of 32.6k. The questions cover socioeconomic variables as well as a large number of questions about stuff like Sharia:

sharia law

We see that there is a wide variety of beliefs, both within and between countries. Because persons can be grouped by country, this makes it possible to conduct factor analysis both at the case-level (within each country or pooled) and at the country-level. Islam does not have that many denominations, so a denomination-level analysis does not seem possible.

Predictive validity in country of origin studies
My studies of immigrant performance in Denmark, Norway, Finland and the Netherlands have shown that Islam prevalence in the homeland has some predictive validity for the socioeconomic outcomes of the migrants. If some of this predictive validity is due to religious conflict or religious extremism, then the degree of extremism in the home country should be a moderating (interaction) variable. It doesn’t initially appear to be the case because the major outlier with regards to socioeconomic performance is usually Indonesia, but as we can see above, they seem to be fairly extremist in their beliefs, at least with regards to Sharia.

A very quick look
Examining the factor structure of the religious beliefs requires a number of hours dedicated to re-coding variables. This is because some questions were only asked if the interviewee answered a particular way to an earlier question. I don’t have time to do this right now. I’m hoping posting this will inspire someone else to dig into it (write me an email/direct tweet/etc.).

However, to show that the idea is fruitful I did a little analysis. I used the following variables:

  • Q13. Generally, how would you rate Islamic political parties compared to other political parties? Are they better, worse or about the same as other parties?
  • Q14. Some feel that we should rely on a democratic form of government to solve our country’s problems. Others feel that we should rely on a leader with a strong hand to solve our country’s problems. Which comes closer to your opinion?
  • Q15. In your opinion, how much influence should religious leaders [IN IRN: religious figures] have in political matters? A large influence, some influence, not too much influence or no influence at all?
  • Q16. Which one of these comes closest to your opinion, number 1 or number 2? [morality and religion]
    Number 1 – It is not necessary to believe in God in order to be moral and have good values
    Number 2 – It is necessary to believe in God in order to be moral and have good values
  • Q20. Thinking about evolution [IN IRN: of humans and other living things], which comes closer to your view?
    Humans and other living things have evolved over time
    Humans and other living things have existed in their present form since the beginning of time
  • Q26. Which comes closer to describing your view? [IN IRN: In general,] Western music, movies and television have hurt morality in our country, OR western music, movies and television have NOT hurt morality in our country?
  • Q34. On average, how often do you attend the mosque for salah and Jum’ah Prayer [IN RUS: Friday afternoon prayer]?
  • Q36. How important is religion in your life – very important, somewhat important, not too important, or not at all important?
  • Q37. How comfortable would you be if a son of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q38. How comfortable would you be if a daughter of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q43a. Which, if any, of the following do you believe: in Heaven, where people who have led good lives [IN TUR: life without sin] are eternally rewarded?
  • Q43b. Which, if any, of the following do you believe: in Hell, where people who have led bad lives [IN TUR: life of sin] and die without being sorry are eternally punished?
  • Q43c. Which, if any, of the following do you believe: in angels?
  • Q43d. Which, if any, of the following do you believe: in witchcraft?
  • Q43e. Which, if any, of the following do you believe: in the ‘evil eye’ or that certain people can cast curses or spells that cause bad things to happen to someone?
  • Q43f. Which, if any, of the following do you believe: in predestination or fate (Kismat/Qadar)?
  • Q43g. Which, if any, of the following do you believe: in  jinns?

They were picked because they stood out to me as useful when I was scrolling down the list of variables, not because I examined all variables and handpicked these. There should be many more useful variables (good! because we like to extract general factors from indicators of a wide variety).

I re-coded them all so that higher values correspond to more extreme religious beliefs as judged by me, e.g. unacceptability of children marrying Christians (Q37-38). “Don’t know” and “refusal” were coded as missing. One could recode “don’t know” as the mean of each scale instead, perhaps, but this would require some more work from me.
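The recoding step could look roughly like this. The numeric codes are assumptions made for illustration (the real ones are in the Pew codebook), and the response vectors are made up.

```r
# Assumed coding for a four-point comfort item such as Q37:
# 1 = very comfortable ... 4 = not at all comfortable,
# 8 = don't know, 9 = refused. (Hypothetical codes.)
q37 = c(1, 4, 2, 8, 3, 9, 4)
q37[q37 %in% c(8, 9)] = NA   # don't know / refused -> missing

# Assumed coding for an importance item such as Q36:
# 1 = very important ... 4 = not at all important.
# Here a higher raw code is the LESS religious answer, so reverse it.
q36 = c(1, 1, 2, 4, 3, 1, 2)
q36_rev = (4 + 1) - q36      # 1 <-> 4, 2 <-> 3

q37
q36_rev
```

After this step a higher value means a more extreme answer on both items, which is what the factor analysis needs for the loadings to be comparable in sign.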

Then factor analysis was run as usual. Results are shown further below. A general factor seems confirmed. I hereby dub it general religious factor (GRF), hopefully no one has used that term or letter combination yet (not true according to Google!). There was a lot of missing data however because not all interviewees wanted to answer all questions and not all questions were asked in all countries. We can impute this data (takes 11 mins on my computer, large dataset!). This does not change the general pattern of the results (good, otherwise the imputation would be introducing error), but it does allow us to calculate a mean GRF score for every country (i.e. mean of each case from that country; not taking weights into account).

Mean level by country (reordered):


Aside from Indonesia, which I had heard was less extremist, these results are not that surprising. The Muslim populations in central Asia and Europe are less extremist than those in MENAP.

Country-level GRF
Next up is calculating the country-level data. To do a country-level analysis, we need the mean score for each variable for each country. This is fairly tedious to calculate by hand or low-level code, but Hadley has made it fairly easy with plyr (or dplyr). This reduces the dataset to a 26 x 17 matrix from the original 32.6k x 17. I factor analyzed it as before. For comparison, we plot the loadings together:


Aside from the generally stronger loadings at the country-level (common finding), loadings are fairly similar. Factor congruence is .98, correlation is .87. At the country-level, only one variable has a negative loading and only slightly (believing in the existence of evil eyes, a belief that is not central to, or perhaps not even included in, Islam as far as I know).
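The congruence coefficient and the correlation can differ, as they do above, because the congruence coefficient (Tucker's) does not subtract the means of the loadings before comparing them. A small base R sketch with made-up loadings, not the ones from this analysis:

```r
# Tucker's congruence coefficient: like a correlation, but computed on the
# raw loadings without centering them first.
congruence = function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Made-up loadings for the same five indicators at two levels of analysis.
individual = c(.40, .55, .30, .60, -.05)
country    = c(.70, .80, .45, .90, -.10)

congruence(individual, country)
cor(individual, country)
```

When most loadings share the same sign, the shared positive level counts as agreement for the congruence coefficient but is removed by the correlation, so it is useful to report both.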

One can also extract country-level scores from the country-level analysis and compare them with the mean scores from the individual-level analysis.


The findings are essentially the same whether we analyze at individual-level and then aggregate (x-axis), or aggregate and then analyze (y-axis). This is not a spurious finding as loadings can change quite a bit between levels depending on the way the data are aggregated.
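The aggregation step used for the country-level analysis (one mean per variable per country) can also be done in base R with `aggregate()`, as an alternative to plyr. The data below are made up and the column names are placeholders.

```r
# Made-up case-level data: one row per respondent.
cases = data.frame(
  country = c("A", "A", "B", "B", "B", "C"),
  q1 = c(1, 3, 2, 2, NA, 4),
  q2 = c(0, 1, 1, 1, 0, 1)
)

# One row per country, one column per variable, each cell a country mean.
# na.rm = TRUE is passed on to mean(), so a missing answer only drops
# that one cell, not the whole respondent.
country_means = aggregate(cases[, c("q1", "q2")],
                          by = list(country = cases$country),
                          FUN = mean, na.rm = TRUE)
country_means
```

The resulting countries-by-variables table is what gets factor analyzed at the country level.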

There is a lot more one could do with this but I will leave it here for now. If someone knows of a suitable open access/science journal to publish this in, let me know. I could use Winnower, but I want some input from people who actually study religion in a comparative religi(on)ology.

Files uploaded to OSF:

We will be submitting the Admixture in the Americas article (first part) to Mankind Quarterly as a target article. Therefore, we are looking for people to comment on it. We are looking for people with substantial knowledge regarding the question of ethnic/race and national differences in psychological traits, primarily cognitive ability, and socioeconomic outcomes. Especially welcome are serious critics, which are very hard to find.

Send me an email if you would like to be a commenter. I don’t mean just oldschool academics. I intend to invite most prominent HBDers to submit a formal commentary paper to MQ.


“But Emil”, you say, “isn’t that a closed access journal, and you said you refuse to publish in those?”

Right you are. However, MQ allows us to post the PDFs elsewhere, e.g. ResearchGate, so this won’t be a problem.

I am considering starting another OpenPsych journal focused on sociology and political science. This is because I need somewhere to publish my S factor papers and I want an open science journal, with open data, code, and review. My guess is that such a journal does not exist right now, so I will have to start one. To do this I need a review team. Since submissions are probably going to be few, this means that it is not a very time consuming job. If we use the policy of generally recruiting 1 ad hoc external reviewer for every submission, this means that only 2 internal reviewers are needed for submissions.

So far I have asked Noah Carl who said that he would be “happy to review something now and again”. To start the journal, we probably need something like >=5 people.

To be a reviewer you should be familiar with research in the area and have substantial expertise in statistics. The latter is most important. Please write me an email if you are interested in this.


The book is on Libgen (free download).

Since I have ventured into criminology as part of my ongoing research program into the spatial transferability hypothesis (psychological traits are stable when people move around, including between countries) and the immigrant groups by country of origin studies, I thought it was a good idea to actually read some criminology. So since there was a recent book covering genetically informative studies, this seemed like a decent choice, especially because it was also available on libgen for free! :)

So basically it is a debate book with a number of topics. For each topic, someone (or a group of someones) will argue for or explain the non-genetic theories/hypotheses, while another someone will sum up the genetically informative studies (i.e. behavioral genetics studies into crime) or at least biologically informed (e.g. neurological correlates of crime).

Initially, I read all the sociological chapters too until I decided they were a waste of time to read. Then I just read the biosocial ones. If you are wondering about the origin of that term as opposed to the more commonly used synonym sociobiological, the use of it was mostly a move to avoid the political backlash. One of the biosocial authors explained it like this to me:

In terms of the name biosocial (versus sociobiological), I think the name change happened accidentally. But there was somewhat of a reason, I guess. EO Wilson and sociobiological thought was so hated amongst sociologists and criminologists, none of us would have gotten a job had we labelled ourselves sociobiologists. Though it was no great secret that sociobiology gave birth to our field. In some ways, it was purely a semantic way to fend off attacks. Even so, there are some distinctions between us and old school sociobiology (use of behavior genetic techniques, etc.).

The book suffers from the widespread problem in social science of not giving effect size numbers. This is more of a problem for the sociological chapters, but true also for the biosocial ones. If effect sizes are not reported, one cannot compare the importance of the alleged causes! Note that behavioral genetics results inherently include effect sizes. The simplest ACE fitting will output the effect sizes for additive genetics, shared environment and unshared environment+error.
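To show what such output looks like, here is a rough shortcut. This is not ACE model fitting (which is normally done with structural equation modeling) but Falconer's formulas, which estimate the same three components directly from identical (MZ) and fraternal (DZ) twin correlations. The twin correlations below are made up.

```r
# Falconer's formulas: crude A, C and E estimates from twin correlations.
falconer = function(r_mz, r_dz) {
  c(A = 2 * (r_mz - r_dz),   # additive genetics
    C = 2 * r_dz - r_mz,     # shared environment
    E = 1 - r_mz)            # unshared environment + error
}

falconer(r_mz = .6, r_dz = .4)  # made-up twin correlations
```

With these inputs the three components come out as .4, .2 and .4, and they sum to 1 by construction, so the relative importance of each source can be read off directly.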

Even if you don’t plan to read much of this, I recommend reading the highly entertaining chapter: The Role of Intelligence and Temperament in Interpreting the SES-Crime Relationship by Anthony Walsh, Charlene Y. Taylor, and Ilhong Yun.

What is age heaping?

Number heaping is a common human tendency: we tend to round numbers to the nearest 5 or 10. Age heaping is the tendency of innumerate people to round their age to the nearest 5 or 10, presumably because they can’t subtract to infer their current age from their birth year and the current year. Psychometrically speaking, this is a very easy mathematical test, so why is it useful? Surely everybody but small children can do it now? Yes. However, in the past, not all adults could do this, even in Western countries. One can locate legal documents and tombstones from these times and analyze the amount of age heaping. The figure below shows an example of age heaping in old Italian data.

[Figure: age heaping in Italy]

Source: “Uniting Souls” and Numeracy Skills. Age Heaping in the First Italian National Censuses, 1861-1881. A’Hearn, Delfino & Nuvolari – Valencia, 13/06/2013.

Since we know that people’s true ages are spread nearly uniformly over neighbouring years, that is, the number of people aged 59 and 61 should be about the same as the number aged 60, we can calculate indexes for how much heaping there is and use that as a crude numeracy measure. Economic historians have been doing this for some time, so we have some fairly comprehensive datasets for age heaping by now.
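One common index of this kind is the Whipple index, which compares the number of reported ages ending in 0 or 5 with the number expected if terminal digits were uniform, conventionally for ages 23 to 62. The ABCC index rescales it so that 100 means no heaping. Below is a minimal sketch in R with simulated ages. The function names and the simulated data are assumptions for illustration, not the code behind the datasets discussed here.

```r
#Whipple index: 100 = no heaping, 500 = all ages end in 0 or 5
whipple = function(ages) {
  ages = ages[ages >= 23 & ages <= 62] #conventional age range
  heaped = sum(ages %% 5 == 0)         #ages ending in 0 or 5
  100 * heaped / (length(ages) / 5)    #observed vs. expected fifth
}

#ABCC index: 100 = no heaping, 0 = all ages end in 0 or 5
abcc = function(ages) {
  w = whipple(ages)
  if (w < 100) return(100)
  100 * (1 - (w - 100) / 400)
}

#simulated population, for illustration only
set.seed(1)
true_ages = sample(23:62, 10000, replace = TRUE)
#let about 30% of people round their age to the nearest multiple of 5
rounders = runif(10000) < .3
reported = ifelse(rounders, 5 * round(true_ages / 5), true_ages)

whipple(true_ages) #close to 100
whipple(reported)  #clearly above 100
abcc(reported)     #clearly below 100
```

The ABCC scale is the more convenient one for plotting against other variables, since higher values mean better numeracy.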

Is it a useful correlate?

If you read the source above, you will see that age heaping in the 1800s shows the expected north/south Italy pattern, but this is just one case. Does it work in general? The answer is yes. Below I plot some of the age heaping datasets versus Lynn and Vanhanen’s (2012) national IQs:

[Figures: age heaping in 1800, 1820, 1850, 1870 and 1890, each plotted against national IQ]

The problem with the data is this: the older datasets cover fewer countries and the newer datasets show strong ceiling effects (lots of countries very close to 100 on the x-axis). The ceiling effects are because the test is too easy. Still, the data covers a sufficiently large number of countries to be useful for modern comparisons. For instance, we can predict immigrant performance in Scandinavian countries based on their numeracy ability in the 1800s. Below I plot general socioeconomic performance (a general factor of education, income, use of social benefits and crime in Denmark in 2012) and age heaping in 1890:


The actual correlations are shown below:

           AH1800   AH1820   AH1850   AH1870   AH1890  LV12 IQ  S in DK
AH1800       1.00     0.95     0.94     0.96     0.90     0.85     0.61
AH1820       0.95     1.00     0.94     0.94     0.76     0.62     0.67
AH1850       0.94     0.94     1.00     0.99     0.84     0.73     0.59
AH1870       0.96     0.94     0.99     1.00     0.96     0.64     0.56
AH1890       0.90     0.76     0.84     0.96     1.00     0.52     0.73
LV12 IQ      0.85     0.62     0.73     0.64     0.52     1.00     0.54
S in DK      0.61     0.67     0.59     0.56     0.73     0.54     1.00


And the sample sizes:

           AH1800   AH1820   AH1850   AH1870   AH1890  LV12 IQ  S in DK
AH1800         31       25       22       22       24       29       24
AH1820         25       45       37       22       36       43       27
AH1850         22       37       45       27       37       43       30
AH1870         22       22       27       62       56       61       34
AH1890         24       36       37       56      109      107       50
LV12 IQ        29       43       43       61      107      203       68
S in DK        24       27       30       34       50       68       70
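Because each variable is available for a different set of countries, both tables are pairwise: each cell uses only the cases with data on both variables. Here is a minimal base R sketch of how such tables can be produced, using a small made-up data frame (the R code further down uses wtd.cors on the real data instead).

```r
#made-up data with different missingness per variable
set.seed(1)
d = data.frame(x = rnorm(50))
d$y = d$x + rnorm(50)
d$z = d$y + rnorm(50)
d$y[1:10] = NA  #y is missing for the first 10 cases
d$z[41:50] = NA #z is missing for the last 10 cases

#pairwise correlations: each pair uses all cases with both values present
r = cor(d, use = "pairwise.complete.obs")

#pairwise sample sizes: count the cases where both variables are observed
obs = ifelse(is.na(d), 0, 1) #1 = observed, 0 = missing
n = crossprod(obs)           #same as t(obs) %*% obs

print(round(r, 2))
print(n)
```

The diagonal of n gives the number of cases with data on each single variable, as in the table above.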


Great, where can I find the datasets?

Fortunately, they are freely available. The easiest solution is probably just to download the worldwide megadataset, which contains a number of the age heaping variables and lots of other variables for you to play around with:

Alternatively, you can download Baten’s age heaping data directly:

R code

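#this is assuming you have loaded the megadataset as DF.supermega
library(pacman)
p_load(kirkegaard, ggplot2, stringr, weights) #write_clipboard is from kirkegaard, wtd.cors from weights
S_DK = "" #name of the S in DK variable; it is not given here, so fill it in
temp = subset(DF.supermega, select = c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890", "LV2012estimatedIQ", S_DK))
write_clipboard(wtd.cors(temp), digits = 2)

#plot each age heaping variable against national IQ and save the plot
for (year in c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890")) {
  p = ggplot(DF.supermega, aes_string(year, "LV2012estimatedIQ")) + geom_point() + geom_smooth(method = lm) + geom_text(aes(label = rownames(temp)))
  name = str_c(year, "_IQ.png")
  ggsave(name, plot = p)
}

#plot age heaping in 1890 against S in Denmark
ggplot(DF.supermega, aes_string("AH1890", S_DK)) + geom_point() + geom_smooth(method = lm) + geom_text(aes(label = rownames(temp)))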

John Fuerst suggested that I write a meta-analysis, review and methodology paper on the S factor. That seems like a decent idea once I get some more studies done (data are known to exist on France (another level), Japan (analysis done, writing pending), Denmark, Sweden and Turkey (reanalysis of Lynn’s data done, but there is much more data)).

However, before doing that it seems okay to post my check list here in case someone else is planning on doing a study.

A methodology paper is perhaps not too bad an idea. Here’s a quick check list of what I usually do:
  1. Find some country for which there exist administrative divisions that number preferably at least 10 and as many as possible.
  2. Find cognitive data for these divisions. Usually this is only available for fairly large divisions, like states but may sometimes be available for smaller divisions. One can sometimes find real IQ test data, but usually one will have to rely on scholastic ability tests such as PISA. Often one will have to use a regional or national variant of this.
  3. Find socioeconomic outcome data for these divisions. This can usually be found at some kind of official statistics bureau’s website. These websites often have English language editions for non-English-speaking countries. Sometimes they don’t and one has to rely on clever guessing and Google Translate. If the country has a diverse ethnoracial demographic, obtain data for this as well. If possible, try to obtain data for multiple levels of administrative divisions and time periods so one can see changes over levels or time. Sometimes data will be available for a variety of years, so one can do a longitudinal study. Other times one will have to average all the years for each variable.
  4. If there are lots of variables to choose from, then choose a diverse mix of variables. Avoid variables that are overly dependent on local natural environment, such as the presence of a large body of water.
  5. Use the redundancy algorithm to remove the most redundant variables. I usually use a threshold of |.90|, such that if a pair of variables in the dataset correlate >= that level, then remove one of them. One can also average them if they are e.g. gendered versions, such as life expectancy or mean income by gender.
  6. Use the mixedness algorithms to detect if any cases are structural outliers, i.e. that they don’t fit the factor structure of the remaining cases. Create parallel datasets without the problematic cases.
  7. Factor analyze the dataset with outliers with ordinary factor analysis (FA), rank order and robust FA. Use ordinary FA on the dataset without the structural outliers. Plot all the FA loading sets using the loadings plotter function. Make note of variables that change their loadings between analyses, and variables that load in unexpected ways.
  8. Extract the S factors and examine their relationship to the ethnoracial variables and cognitive scores.
  9. If the country has seen substantial immigration over the recent decades, it may be a good idea to regress out the effect of this demographic and examine the loadings.
  10. Write up the results. Use lots of loading plots and scatter plots with names.
  11. After you have written a draft, contact natives to get their opinion. Maybe you missed something important about the country. People who speak the local language are also useful when gathering data, but generally, you will have to do things yourself.
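Step 5 can be sketched in base R as follows. This is a simplified stand-in for the redundancy algorithm, not its exact implementation: it repeatedly finds the most strongly correlated pair of variables and drops the later variable of that pair, until no pair reaches the threshold. The function name and the toy data are made up for illustration.

```r
#drop variables until no pair correlates at or above the threshold
remove_redundant_sketch = function(df, threshold = .90) {
  repeat {
    if (ncol(df) < 2) break
    r = abs(cor(df, use = "pairwise.complete.obs"))
    diag(r) = 0 #ignore each variable's correlation with itself
    if (max(r, na.rm = TRUE) < threshold) break
    #row and column index of the most redundant pair
    worst = which(r == max(r, na.rm = TRUE), arr.ind = TRUE)[1, ]
    drop_col = max(worst) #drop the later variable of the pair
    df = df[, -drop_col, drop = FALSE]
  }
  df
}

#toy data: two gendered versions of the same variable plus an unrelated one
set.seed(1)
income = rnorm(100)
toy = data.frame(income_men = income + rnorm(100, sd = .1),
                 income_women = income + rnorm(100, sd = .1),
                 crime = rnorm(100))
kept = remove_redundant_sketch(toy)
print(names(kept))
```

Which member of a redundant pair is dropped is arbitrary in this sketch. For gendered versions of a variable, averaging the two is often the better choice, as noted in step 5.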


If I missed something, let me know.

Due to a lengthy discussion over at Unz concerning the good performance of some African groups in the UK, it seems worthwhile to review the Danish and Norwegian results. Basically, some African groups perform better on some measures than native British. The author argues that this disproves global hereditarianism. I think not.

The over-performance relative to home country IQ of some African countries is not restricted to the UK. In my studies of immigrants in Denmark and Norway, I found the same thing. It is very clear that there are strong selection effects for some countries, but not others, and that this is a large part of the reason why the correlations between home country IQ and performance in the host country are not higher. If the selection effect were constant across countries, it would not affect the correlations. But because it differs between countries, it essentially adds noise to the correlations.
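The argument can be illustrated with a small simulation. All numbers here are made up: home country scores are drawn at random, and immigrant performance is the home country score plus a selection effect plus a little noise. The only point is the contrast between constant and variable selection.

```r
set.seed(1)
n = 70 #number of origin countries, made up
home_iq = rnorm(n, mean = 85, sd = 10)
noise = rnorm(n, sd = 3)

#same selection effect for every origin country
perf_constant = home_iq + 5 + noise
#selection effect differs by origin country
perf_variable = home_iq + rnorm(n, mean = 5, sd = 10) + noise

r_constant = cor(home_iq, perf_constant) #unaffected by adding a constant
r_variable = cor(home_iq, perf_variable) #lower, since selection acts as noise
print(round(c(constant = r_constant, variable = r_variable), 2))
```

Adding the same constant to every group shifts the mean but leaves the correlation untouched, while selection that varies by origin behaves like extra error variance and attenuates it.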

Two plots:


The codes are ISO-3 codes. So e.g. NGA is Nigeria, GHA is Ghana, KEN is Kenya and so on. They perform fairly well compared to their home country IQ, both in Norway and Denmark. But Somalia does not, and the performance of several MENAP immigrant groups is abysmal.

The scores on the Y axis are S factor scores for their performance in these countries. They are general factors extracted from measures of income, educational attainment, use of social benefits, crime and the like. The S scores correlate .77 between the countries. For details, see the papers concerning the data:

I did not use the scores from the papers; I redid the analysis. The code is posted below for those curious. The kirkegaard package is my personal package. It is on github. The megadataset file is on OSF.


library(pacman) #for p_load
p_load(kirkegaard, ggplot2, psych, VIM) #fa is from psych, irmi from VIM

M = read_mega("Megadataset_v2.0e.csv")

DK = M[111:135] #fetch danish data
DK = DK[miss_case(DK) <= 4, ] #keep cases with 4 or fewer missing
DK = irmi(DK, noise = F) #impute the missing
DK.S = fa(DK) #factor analyze
DK_S_scores = data.frame(DK.S = as.vector(DK.S$scores) * -1) #save scores, reversed
rownames(DK_S_scores) = rownames(DK) #add rownames

M = merge_datasets(M, DK_S_scores, 1) #merge to mega

ggplot(M, aes(LV2012estimatedIQ, DK.S)) + 
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)

# Norway ------------------------------------------------------------------

NO_work = cbind(M[""], #for work data

NO_income = cbind(M["Norway.Income.index.2009"], #for income data

#make DF
NO = cbind(M["NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014"],

#get 5 year means
NO[""] = apply(NO_work[1:5],1,mean,na.rm=T) #get means, ignore missing
NO["OutOfWork.2010to2014.women"] = apply(NO_work[6:10],1,mean,na.rm=T) #get means, ignore missing

#get means for income and add to DF
NO["Income.index.2009to2012"] = apply(NO_income,1,mean,na.rm=T) #get means, ignore missing

plot_miss(NO) #view the pattern of missing data

NO = NO[miss_case(NO) <= 3, ] #keep those with 3 datapoints or fewer missing
NO = irmi(NO, noise = F) #impute the missing

NO_S = fa(NO) #factor analyze
NO_S_scores = data.frame(NO_S = as.vector(NO_S$scores) * -1) #save scores, reverse
rownames(NO_S_scores) = rownames(NO) #add rownames

M = merge_datasets(M, NO_S_scores, 1) #merge with mega

ggplot(M, aes(LV2012estimatedIQ, NO_S)) +
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)


cor(M$NO_S, M$DK.S, use = "pair")