Researcher degrees of freedom refer to the choices researchers make when conducting a study: where to collect data, which variables to include, and so on. However, a large subset of these choices concern only how to analyze the data. Having by now done hundreds of analyses rigorous enough to publish, I know exactly what this means. I will give some examples from a work in progress.

1. Which variables to use?

The dataset I began with contains 75 columns. Some of these are names and the like, but many of them are socioeconomic variables in a broad sense. Which should be used? I picked some by judgment call, informed by prior S studies, but I left out e.g. population density, mean age, and the percentages of the population that are <16, working age and old age. Should these have been included? Maybe.

2. What to do with City of London?

In the study, I examine the S factor among London boroughs. There are 32 boroughs plus the City of London. The CoL is fairly small, which can give rise to sampling error, as well as effects related to it being a very peculiar administrative division.

Furthermore, many variables in the dataset lack data for the CoL, so I was faced with the question of what to do with it. Some options: 1) exclude it, 2) use only the variables for which there is data for the CoL, 3) use more variables than have data for the CoL and impute the rest. I chose (1), but one might have gone with any of the three.

3. The extra crime data

I found another dataset with crime counts and calculated per capita versions of these. The crime types come at two levels of granularity: broad and detailed. Which should be used? One could also have factor analyzed the data and used the general factor scores, or calculated a unit-weighted score (standardize all variables, then score cases by the average of their values). I used the detailed variables.
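The unit-weighted option is trivial to compute. A minimal sketch in R, where crime is a hypothetical data frame of the per capita crime rates:

#unit-weighted score: standardize each variable, then average across them
crime_z = scale(crime)
unit_weighted = rowMeans(crime_z, na.rm = TRUE)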

4. The extra GCSE data

I found another dataset with GCSE measures. These exist for both genders together and for each gender alone, giving 9 different variables to choose from. Which should be used? The same options as before apply: factor scores or a unit-weighted average. I selected one variable for theoretical reasons (similarity to other scholastic variables, e.g. PISA) and because Jensen's method supported this choice.

5. How to deal with missing data

Before factor analyzing the data, one faces the question of how to deal with missing data. Aside from the CoL, a few other cases had some missing data. Some options: 1) exclude them, 2) impute with means, 3) impute with best guesses (various ways!). Which should be done? I imputed with a multiple regression method; one could have used e.g. k nearest neighbors imputation instead.
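In R, both routes are available in the VIM package; a sketch assuming the numeric variables sit in a data frame d:

library(VIM)

#model-based (regression-type) imputation; noise = FALSE gives deterministic values
d_imp = irmi(d, noise = FALSE)

#alternative: k nearest neighbors imputation
d_knn = kNN(d, imp_var = FALSE)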

6. How to deal with highly correlated variables

Sometimes including variables that correlate very strongly or even perfectly can seriously throw off the factor analysis results because they color the general factor. If multiple factors are extracted, such variables will form their own factor. What should be done with them? 1) Nothing, 2) exclude one of each pair based on a threshold value of maximum allowed intercorrelation. If (2), which value should be used? I used |.9|, but |.8| is about equally plausible.
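A simple greedy version of option (2), as a sketch (the function and threshold here are illustrative, not my actual implementation):

#repeatedly drop one member of the most strongly correlated pair
#until no pair exceeds the threshold
remove_redundant = function(df, threshold = .90) {
  repeat {
    r = cor(df, use = "pair")
    diag(r) = 0
    if (max(abs(r)) < threshold) return(df)
    worst = which(abs(r) == max(abs(r)), arr.ind = TRUE)[1, 1]
    df = df[-worst]
  }
}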

7. How to deal with highly mixed cases

Sometimes some cases just don't fit the factor structure of the data very well. They are structural outliers or mixed. What should be done with them? 1) Nothing, 2) use rank-order data, 3) use robust regression (many methods), 4) change outlier values (e.g. any value >|3| sd gets reduced to 3 sd), 5) exclude them. If (5), which thresholds should we use for the exclusion cutoff? [no answers forthcoming]. I chose to do (1), (2) and (5), and only excluded the most mixed case (Westminster).
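Option (4), winsorizing, is a one-liner; a sketch assuming the data have already been z-scored into d_z:

#cap any value beyond +/-3 sd at exactly +/-3 sd
d_wins = as.data.frame(lapply(d_z, function(x) pmin(pmax(x, -3), 3)))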

Researcher choices as parameters

I made many more decisions than the ones mentioned above, but these are the most important ones (I think, so maybe!). Normally, research papers don't mention these kinds of choices. Sometimes they mention them, but don't report results for the different choices. I suspect a lot of this is due to the hassle of actually running all the combinations.

However, the hassle is potentially much smaller if one had a general framework for doing it with programming tools. So I propose that, as a general practice, one should treat these kinds of choices as parameters and calculate results for all of them. In the above, this means e.g. results with and without the CoL, different variable exclusion thresholds, and different choices with regard to mixed cases.

Theoretically, one could think of it as a hyperspace where every dimension is a choice for one of these options. One could then examine the distribution of results over all parameter values to examine the robustness of the results with respect to the analytic choices.
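In R, such a parameter hyperspace might be sketched with expand.grid(); run_analysis() is a hypothetical wrapper around the whole pipeline that returns e.g. the S x IQ correlation:

#every row = one combination of analytic choices
grid = expand.grid(
  include_CoL = c(TRUE, FALSE),
  cor_threshold = c(.8, .9, 1), #1 = no exclusion
  crime_vars = c("broad", "detailed"),
  mixed_cases = c("keep", "rank_order", "exclude")
)

grid$result = sapply(seq_len(nrow(grid)), function(i) run_analysis(grid[i, ]))

#distribution of results across the whole hyperspace
hist(grid$result)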

I have already been doing this for the choice of how to deal with mixed cases, but perhaps I should ramp it up and do it more thoroughly for the other choices too. In this case, the threshold for exclusion of variables and the choice of crime variable set are the important ones.

Just a quick analysis. When I read the Dutch crime report that forms the basis of this paper, I noticed one table that had crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r (immigrant% x S) to be negative, and since r (S x crime) is negative, one would predict a positive r (immigrant% x crime). Is this the case? Well, mostly. The data are divided into 2 generations and 2 age groups, so there are 4 sub-datasets with lots of missing data and sampling error. If we just use all the cases as if they were independent and drop the cases with missing data, we get this result:

Immi%    mean   sd     median  trimmed  mad    min  max    range  skew   kurtosis
0-5%     1.137  0.182  1.026   1.113    0.039  1    1.588  0.588  1.073  -0.148
5-15%    1.284  0.292  1.162   1.258    0.240  1    1.938  0.938  0.809  -0.641
15-50%   1.509  0.650  1.382   1.381    0.465  1    3.812  2.812  2.203   4.758
>50%     1.769  1.154  1.435   1.526    0.471  1    5.812  4.812  2.360   4.937


In other words, within each group (N=28), those living in areas with more immigrants are more crime-prone. There is, however, substantial variation. Sometimes the pattern is reversed for no discernible reason. E.g. 12-17 year olds from Morocco have lower crime rates in the more immigrant-heavy areas (7.4, 7.1, 6.5, 6.1).

The samples are too small for one to profitably dig more into it, I think.

R code & data


#packages (kirkegaard is my personal package, on GitHub)
library(pacman)
p_load(plyr, magrittr, readODS, kirkegaard, psych)

#load data from file
d_orig = read.ods("Z:/code/R/dutch_crime_area.ods")[[1]]
d_orig[d_orig == "" | d_orig == "0"] = NA

#first row holds the variable names
colnames(d_orig) = d_orig[1, ]
d_orig = d_orig[-1, ]

#remove cases with missing data
d = na.omit(d_orig)

#remove names
origins = d$Origin
d$Origin = NULL

#remove unknown + total columns
d$Unknown = NULL
d$Total = NULL

#to numeric
d = lapply(d, as.numeric) %>% as.data.frame

#convert each row to rates relative to its own minimum
d_std = adply(d, 1, function(x) {
  x_min = min(x)
  x / x_min
})

#descriptive stats to the clipboard
describe(d_std) %>% write_clipboard

In the review of a paper submitted to ODP some time ago, the issue of a general extremism factor in religion came up. Unfortunately, Dutton deleted the submission thread, so the discussion is forever lost to history (it could possibly be recovered from backups of the forum, but it's not worth the trouble; yes, I checked the Wayback Machine with no luck).

Specifically, the topic was whether and how one could rank order Christian denominations on a more/less extremist scale. Much discussion about them actually assumes such a pattern (e.g. saying that one denomination is more extreme than another), but as far as I know, it has not yet been examined empirically. I see two ways to do so:

1. A person-level approach
Take person-level data where beliefs regarding a number of matters are given, as well as each person's denomination. This allows for the calculation of mean acceptance rates of these beliefs within each denomination. These mean acceptance rates may then be factor analyzed to see if there is a general factor. For this to work, one would need at the very least 3 religious beliefs of central importance to extremism (e.g. young earth creationism, stance towards homosexuals, atheists, abortion, sex outside marriage, freedom of speech wrt. religious criticism). Furthermore, one would need a number of different denominations, I'd say at least 10.
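As a sketch in R, assuming a data frame persons with a denomination column and numerically coded belief items (all names here are hypothetical):

library(psych)

#mean acceptance rate of each belief within each denomination
belief_items = c("creationism", "anti_homosexuality", "anti_atheist")
denom_means = aggregate(persons[belief_items], persons["denomination"], mean, na.rm = TRUE)

#factor analyze the denomination-level means, looking for a general factor
fa(denom_means[belief_items])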

2. A denomination-level approach
Alternatively, one could examine it by finding official beliefs from each denomination regarding a range of issues. Critically, these must be expressible in numerical form, because that is required for factor analysis to work. Because the beliefs of persons belonging to a denomination often conflict with the official beliefs (e.g. in Catholicism), the first approach is probably better.

Religious beliefs among Muslims
I was recently reminded of the above by re-seeing Pew Research's large-scale study of the beliefs of Muslims in their home countries. The dataset is publicly available and fairly massive: 250 variables and a sample size of 32.6k. The questions cover socioeconomic variables as well as a large number of questions about matters like Sharia:

[Figure: beliefs about Sharia law, by country]

We see that there is a wide variety of beliefs, both within and between countries. Because persons can be grouped by country, this makes it possible to conduct factor analysis both at the case-level (within each country or pooled) and at the country-level. Islam does not have that many denominations, so a denomination-level analysis does not seem possible.

Predictive validity in country of origin studies
My studies of immigrant performance in Denmark, Norway, Finland and the Netherlands have shown that Islam prevalence in the homeland has some predictive validity for the socioeconomic outcomes of the migrants. If some of this predictive validity is due to religious conflict or religious extremism, then the degree of extremism in the home country should be a moderating (interaction) variable. It doesn’t initially appear to be the case because the major outlier with regards to socioeconomic performance is usually Indonesia, but as we can see above, they seem to be fairly extremist in their beliefs, at least with regards to Sharia.
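If country-level data on both were available, this is a simple interaction test; a sketch with hypothetical variable names:

#moderation: does home country extremism (GRF) change the effect of Islam prevalence?
summary(lm(S_score ~ islam_pct * grf_home, data = origin_countries))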

A very quick look
Examining the factor structure of the religious beliefs requires a number of hours dedicated to re-coding variables. This is because some questions were only asked if the interviewee answered a particular way on an earlier question. I don't have time to do this right now. I'm hoping posting this will inspire someone else to dig into it (write me an email/direct tweet/etc.).

However, to show that the idea is fruitful I did a little analysis. I used the following variables:

  • Q13. Generally, how would you rate Islamic political parties compared to other political parties? Are they better, worse or about the same as other parties?
  • Q14. Some feel that we should rely on a democratic form of government to solve our country’s problems. Others feel that we should rely on a leader with a strong hand to solve our country’s problems. Which comes closer to your opinion?
  • Q15. In your opinion, how much influence should religious leaders [IN IRN: religious figures] have in political matters? A large influence, some influence, not too much influence or no influence at all?
  • Q16. Which one of these comes closest to your opinion, number 1 or number 2? [morality and religion]
    Number 1 – It is not necessary to believe in God in order to be moral and have good values
    Number 2 – It is necessary to believe in God in order to be moral and have good values
  • Q20. Thinking about evolution [IN IRN: of humans and other living things], which comes closer to your view?
    Humans and other living things have evolved over time
    Humans and other living things have existed in their present form since the beginning of time
  • Q26. Which comes closer to describing your view? [IN IRN: In general,] Western music, movies and television have hurt morality in our country, OR western music, movies and television have NOT hurt morality in our country?
  • Q34. On average, how often do you attend the mosque for salah and Jum’ah Prayer [IN RUS: Friday afternoon prayer]?
  • Q36. How important is religion in your life – very important, somewhat important, not too important, or not at all important?
  • Q37. How comfortable would you be if a son of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q38. How comfortable would you be if a daughter of yours someday married a Christian?  Would you be very comfortable, somewhat comfortable, not too comfortable or not at all comfortable?
  • Q43a. Which, if any, of the following do you believe: in Heaven, where people who have led good lives [IN TUR: life without sin] are eternally rewarded?
  • Q43b. Which, if any, of the following do you believe: in Hell, where people who have led bad lives [IN TUR: life of sin] and die without being sorry are eternally punished?
  • Q43c. Which, if any, of the following do you believe: in angels?
  • Q43d. Which, if any, of the following do you believe: in witchcraft?
  • Q43e. Which, if any, of the following do you believe: in the ‘evil eye’ or that certain people can cast curses or spells that cause bad things to happen to someone?
  • Q43f. Which, if any, of the following do you believe: in predestination or fate (Kismat/Qadar)?
  • Q43g. Which, if any, of the following do you believe: in jinns?

They were picked because they stood out to me as useful when I was scrolling down the list of variables, not because I examined all variables and handpicked these. There should be many more useful variables (good, because we want to extract general factors from a wide variety of indicators).

I re-coded them all so that higher values correspond to more extreme religious beliefs as judged by me, e.g. finding it unacceptable that one’s children marry a Christian (Q37-38). “Don’t know” and “refusal” were coded as missing. One could perhaps recode “don’t know” as the mean of each scale instead, but this would require some more work from me.
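An example recode for one item, as a sketch (the raw codes are assumptions on my part; check the Pew codebook for the actual values):

#Q37: comfort with a son marrying a Christian, reversed so higher = more extreme
x = d$Q37
x[x %in% c(8, 9)] = NA #assumed codes for "don't know" and "refused"
d$Q37_recoded = 5 - x  #1-4 scale flipped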

Then factor analysis was run as usual. Results are shown further below. A general factor seems confirmed. I hereby dub it the general religious factor (GRF); hopefully no one has used that term or letter combination yet (not true according to Google!). There was a lot of missing data, however, because not all interviewees wanted to answer all questions and not all questions were asked in all countries. We can impute this data (takes 11 minutes on my computer, large dataset!). This does not change the general pattern of the results (good, otherwise the imputation would be introducing error), but it does allow us to calculate a mean GRF score for every country (i.e. the mean score of the cases from that country; not taking weights into account).
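In outline, the flow looks like this (object names are mine, not the actual script):

library(psych)

#extract the general religious factor at the individual level
grf_fit = fa(d_items) #d_items = the 17 recoded items, after imputation

#mean GRF score per country, ignoring sampling weights
grf_scores = as.vector(grf_fit$scores)
country_grf = sort(tapply(grf_scores, d$country, mean, na.rm = TRUE))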

Mean level by country (reordered):


Aside from Indonesia, which I had heard was less extremist, these results are not that surprising. The Muslim populations in Central Asia and Europe are less extremist than those in MENAP.

Country-level GRF
Next up is calculating the country-level data. To do a country-level analysis, we need the mean score of each variable for each country. This is fairly tedious to calculate by hand or with low-level code, but Hadley has made it fairly easy with plyr (or dplyr). This reduces the dataset from the original 32.6k x 17 to a 26 x 17 matrix.
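In code, the aggregation and reanalysis might look like this (a sketch; object names are assumed):

library(plyr)
library(psych)

#collapse individual-level data to country means: 32.6k x 17 -> 26 x 17
country_level = ddply(d_imputed, "country", numcolwise(mean))

#factor analyze the country-level means
fa(country_level[-1])

I factor analyzed it as before. For comparison, we plot the loadings together: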


Aside from the generally stronger loadings at the country level (a common finding), the loadings are fairly similar. Factor congruence is .98, the correlation is .87. At the country level, only one variable has a negative loading, and only slightly: believing in the existence of the evil eye, a belief that is not central to Islam (if it is part of it at all), as far as I know.
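Both comparison statistics are one-liners with psych, given the two fit objects (names assumed):

library(psych)

#congruence coefficient between individual- and country-level loadings
factor.congruence(fa_individual$loadings, fa_country$loadings)

#plain Pearson correlation of the two loading vectors
cor(as.vector(fa_individual$loadings), as.vector(fa_country$loadings))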

One can also extract country-level scores from the country-level analysis and compare them with the mean scores from the individual-level analysis.


The findings are essentially the same whether we analyze at the individual level and then aggregate (x-axis), or aggregate and then analyze (y-axis). This is not a trivial finding, as loadings can change quite a bit between levels depending on the way the data are aggregated.

There is a lot more one could do with this, but I will leave it here for now. If someone knows of a suitable open access/open science journal to publish this in, let me know. I could use Winnower, but I want some input from people who actually study comparative religion.

Files uploaded to OSF: osf.io/pwcgz/

We will be submitting the Admixture in the Americas article (first part) to Mankind Quarterly as a target article. Therefore, we are looking for people to comment on it. We are looking for people with substantial knowledge regarding the question of ethnic/racial and national differences in psychological traits, primarily cognitive ability, and socioeconomic outcomes. Especially welcome are serious critics, who are very hard to find.

Send me an email if you would like to be a commenter. I don’t mean just old-school academics; I intend to invite most prominent HBDers to submit a formal commentary paper to MQ.


“But Emil”, you say, “isn’t that a closed access journal, and you said you refuse to publish in those?”

Right you are. However, MQ allows us to post the PDFs elsewhere, e.g. ResearchGate, so this won’t be a problem.

I am considering starting another OpenPsych journal focused on sociology and political science. This is because I need somewhere to publish my S factor papers and I want an open science journal, with open data, code, and review. My guess is that such a journal does not exist right now, so I will have to start one. To do this I need a review team. Since submissions are probably going to be few, it is not a very time-consuming job. If we use the policy of generally recruiting 1 ad hoc external reviewer for every submission, only 2 internal reviewers are needed per submission.

So far I have asked Noah Carl, who said that he would be “happy to review something now and again”. To start the journal, we probably need something like >=5 people.

To be a reviewer you should be familiar with research in the area and have substantial expertise in statistics. The latter is most important. Please write me an email if you are interested in this.


The book is on Libgen (free download).

Since I have ventured into criminology as part of my ongoing research program into the spatial transferability hypothesis (psychological traits are stable when people move around, including between countries) and the immigrant groups by country of origin studies, I thought it was a good idea to actually read some criminology. Since there was a recent book covering genetically informative studies, this seemed like a decent choice, especially because it was also available on Libgen for free! :)

Basically, it is a debate book with a number of topics. For each topic, someone (or a group of someones) argues for or explains the non-genetic theories/hypotheses, while another someone sums up the genetically informative studies (i.e. behavioral genetics studies of crime), or at least the biologically informed ones (e.g. neurological correlates of crime).

Initially, I read all the sociological chapters too, until I decided they were a waste of time to read. Then I just read the biosocial ones. If you are wondering about the origin of that term, as opposed to the more commonly used synonym sociobiological, its use was mostly a move to avoid the political backlash. One of the biosocial authors explained it like this to me:

In terms of the name biosocial (versus sociobiological), I think the name change happened accidentally. But there was somewhat of a reason, I guess. EO Wilson and sociobiological thought was so hated amongst sociologists and criminologists, none of us would have gotten a job had we labelled ourselves sociobiologists. Though it was no great secret that sociobiology gave birth to our field. In some ways, it was purely a semantic way to fend off attacks. Even so, there are some distinctions between us and old school sociobiology (use of behavior genetic techniques, etc.).

The book suffers from the widespread problem in social science of not giving effect size numbers. This is more of a problem for the sociological chapters, but it is true of the biosocial ones too. If no effect sizes are reported, one cannot compare the importance of the alleged causes! Note that behavioral genetics results inherently include effect sizes: even the simplest ACE fitting will output the effect sizes for additive genetics, shared environment, and unshared environment+error.

Even if you don’t plan to read much of this, I recommend reading the highly entertaining chapter: The Role of Intelligence and Temperament in Interpreting the SES-Crime Relationship by Anthony Walsh, Charlene Y. Taylor, and Ilhong Yun.

What is age heaping?

Number heaping is a common human tendency: we tend to round numbers to the nearest 5 or 10. Age heaping is the tendency of innumerate people to round their age to the nearest 5 or 10, presumably because they can’t subtract to infer their current age from their birth year and the current year. Psychometrically speaking, this is a very easy mathematical test, so why is it useful? Surely everybody but small children can do it now? Yes. However, in the past, not all adults even in Western countries could. One can locate legal documents and tombstones from these times and analyze the amount of age heaping. The figure below shows an example of age heaping in old Italian data.

[Figure: age heaping in the Italian censuses, 1861-1881]

Source: “Uniting Souls” and Numeracy Skills. Age Heaping in the First Italian National Censuses, 1861-1881. A’Hearn, Delfino & Nuvolari – Valencia, 13/06/2013.

Since we know that people’s ages really are distributed nearly smoothly, that is, the number of people aged 59 or 61 should be about the same as the number aged 60, we can calculate indexes of how much heaping there is and use those as a crude numeracy measure. Economic historians have been doing this for some time, and so we have some fairly comprehensive age heaping datasets by now.
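A common such index is Whipple’s index; a minimal sketch, assuming a vector of reported ages:

#Whipple's index: reported ages ending in 0 or 5 among 23-62 year olds,
#relative to one fifth of all reported ages in that range, scaled to 100.
#~100 = no heaping, 500 = everyone heaps
whipple = function(ages) {
  in_range = ages[ages >= 23 & ages <= 62]
  heaped = sum(in_range %% 5 == 0)
  100 * heaped / (length(in_range) / 5)
}

whipple(c(25, 30, 30, 35, 40, 41, 50, 55, 60, 60)) #heavy heaping: 450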

Is it a useful correlate?

If you read the source above, you will see that age heaping in the 1800s shows the expected north/south Italy pattern, but this is just one case. Does it work in general? The answer is yes. Below I plot some of the age heaping datasets against Lynn and Vanhanen’s (2012) national IQs:

[Scatter plots: age heaping in 1800, 1820, 1850, 1870 and 1890 vs. national IQs]

The problem with the data is this: the older datasets cover fewer countries, and the newer datasets show strong ceiling effects (lots of countries very close to 100 on the x-axis). The ceiling effects arise because the test is too easy. Still, the data cover a sufficiently large number of countries to be useful for modern comparisons. For instance, we can predict immigrant performance in Scandinavian countries based on their home countries’ numeracy ability in the 1800s. Below I plot general socioeconomic performance (a general factor of education, income, use of social benefits and crime in Denmark in 2012) against age heaping in 1890:


The actual correlations are shown below:

         AH1800  AH1820  AH1850  AH1870  AH1890  LV12 IQ  S in DK
AH1800    1.00    0.95    0.94    0.96    0.90    0.85     0.61
AH1820    0.95    1.00    0.94    0.94    0.76    0.62     0.67
AH1850    0.94    0.94    1.00    0.99    0.84    0.73     0.59
AH1870    0.96    0.94    0.99    1.00    0.96    0.64     0.56
AH1890    0.90    0.76    0.84    0.96    1.00    0.52     0.73
LV12 IQ   0.85    0.62    0.73    0.64    0.52    1.00     0.54
S in DK   0.61    0.67    0.59    0.56    0.73    0.54     1.00


And the sample sizes:

         AH1800  AH1820  AH1850  AH1870  AH1890  LV12 IQ  S in DK
AH1800     31      25      22      22      24      29       24
AH1820     25      45      37      22      36      43       27
AH1850     22      37      45      27      37      43       30
AH1870     22      22      27      62      56      61       34
AH1890     24      36      37      56     109     107       50
LV12 IQ    29      43      43      61     107     203       68
S in DK    24      27      30      34      50      68       70


Great, where can I find the datasets?

Fortunately, they are freely available. The easiest solution is probably just to download the worldwide megadataset, which contains a number of the age heaping variables and lots of other variables for you to play around with: osf.io/zdcbq/files/

Alternatively, you can get Baten’s age heaping data directly: www.clio-infra.eu/datasets/indicators

R code

#this is assuming you have loaded the megadataset as DF.supermega
library(pacman)
p_load(ggplot2, stringr, weights, kirkegaard)

temp = subset(DF.supermega, select = c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890", "LV2012estimatedIQ", "S.factor.in.Denmark.Kirkegaard2014"))
write_clipboard(wtd.cors(temp), digits = 2)

#scatter plot of each age heaping year vs. national IQ, saved as e.g. AH1800_IQ.png
for (year in c("AH1800", "AH1820", "AH1850", "AH1870", "AH1890")) {
  plot_ = ggplot(DF.supermega, aes_string(year, "LV2012estimatedIQ")) +
    geom_point() + geom_smooth(method = lm) +
    geom_text(aes(label = rownames(temp)))
  name = str_c(year, "_IQ.png")
  ggsave(name, plot_)
}

#age heaping in 1890 vs. the S factor in Denmark
ggplot(DF.supermega, aes(AH1890, S.factor.in.Denmark.Kirkegaard2014)) +
  geom_point() + geom_smooth(method = lm) +
  geom_text(aes(label = rownames(temp)))

John Fuerst suggested that I write a meta-analysis, review and methodology paper on the S factor. That seems like a decent idea once I get some more studies done; data are known to exist for France (another administrative level), Japan (analysis done, write-up pending), Denmark, Sweden and Turkey (reanalysis of Lynn’s data done, but there is much more data).

However, before doing that, it seems worthwhile to post my check list here in case someone else is planning on doing a study. Here’s what I usually do:
  1. Find some country for which there exist administrative divisions, preferably at least 10 of them and as many as possible.
  2. Find cognitive data for these divisions. Usually this is only available for fairly large divisions, like states but may sometimes be available for smaller divisions. One can sometimes find real IQ test data, but usually one will have to rely on scholastic ability tests such as PISA. Often one will have to use a regional or national variant of this.
  3. Find socioeconomic outcome data for these divisions. This can usually be found on some kind of official statistics bureau’s website. These websites often have English-language editions for non-English-speaking countries. Sometimes they don’t, and one has to rely on clever guessing and Google Translate. If the country has a diverse ethnoracial demographic, obtain data for this as well. If possible, try to obtain data for multiple levels of administrative divisions and multiple time periods, so one can see changes over levels or time. Sometimes data will be available for a variety of years, so one can do a longitudinal study. Other times, one will have to average all the years for each variable.
  4. If there are lots of variables to choose from, then choose a diverse mix of variables. Avoid variables that are overly dependent on local natural environment, such as the presence of a large body of water.
  5. Use the redundancy algorithm to remove the most redundant variables. I usually use a threshold of |.90|, such that if a pair of variables in the dataset correlate >= that level, then remove one of them. One can also average them if they are e.g. gendered versions, such as life expectancy or mean income by gender.
  6. Use the mixedness algorithms to detect if any cases are structural outliers, i.e. that they don’t fit the factor structure of the remaining cases. Create parallel datasets without the problematic cases.
  7. Factor analyze the dataset with outliers with ordinary factor analysis (FA), rank order and robust FA. Use ordinary FA on the dataset without the structural outliers. Plot all the FA loading sets using the loadings plotter function. Make note of variables that change their loadings between analyses, and variables that load in unexpected ways.
  8. Extract the S factors and examine their relationship to the ethnoracial variables and cognitive scores.
  9. If the country has seen substantial immigration over the recent decades, it may be a good idea to regress out the effect of this demographic and examine the loadings.
  10. Write up the results. Use lots of loading plots and scatter plots with names.
  11. After you have written a draft, contact natives to get their opinion. Maybe you missed something important about the country. People who speak the local language are also useful when gathering data, but generally, you will have to do things yourself.


If I missed something, let me know.

Due to a lengthy discussion over at Unz concerning the good performance of some African groups in the UK, it seems worth it to review the Danish and Norwegian results. Basically, some African groups perform better on some measures than the native British, and the author is basically arguing that this disproves global hereditarianism. I think not.

The over-performance relative to home country IQ of some African countries is not restricted to the UK. In my studies of immigrants in Denmark and Norway, I found the same thing. It is very clear that there are strong selection effects for some countries but not others, and that this is a large part of the reason why the correlations between home country IQ and performance in the host country are not higher. If the selection effect were constant across countries, it would not affect the correlations. But because it differs between countries, it essentially creates noise in the correlations.

Two plots:


The codes are ISO-3 codes, so e.g. NGA is Nigeria, GHA is Ghana, KEN is Kenya, and so on. These countries perform fairly well compared to their home country IQ, both in Norway and Denmark. But Somalia does not, and the performance of several MENAP immigrant groups is abysmal.

The scores on the y-axis are S factor scores for the groups’ performance in these countries. They are general factors extracted from measures of income, educational attainment, use of social benefits, crime and the like. The S scores correlate .77 between the two countries. For details, see the papers concerning the data.

I did not use the scores from the papers; I redid the analysis. The code is posted below for those curious. The kirkegaard package is my personal package; it is on GitHub. The megadataset file is on OSF.


#packages; kirkegaard is on GitHub, irmi() and fa() come from VIM and psych
library(pacman)
p_load(kirkegaard, ggplot2, VIM, psych)

M = read_mega("Megadataset_v2.0e.csv")

DK = M[111:135] #fetch danish data
DK = DK[miss_case(DK) <= 4, ] #keep cases with 4 or fewer missing
DK = irmi(DK, noise = F) #impute the missing
DK.S = fa(DK) #factor analyze
DK_S_scores = data.frame(DK.S = as.vector(DK.S$scores) * -1) #save scores, reversed
rownames(DK_S_scores) = rownames(DK) #add rownames

M = merge_datasets(M, DK_S_scores, 1) #merge to mega

ggplot(M, aes(LV2012estimatedIQ, DK.S)) + 
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)

# Norway ------------------------------------------------------------------

#work data: Q2 unemployment for each year 2010-2014, men then women
#(the original post truncated this call; column names are assumed from the pattern)
NO_work = cbind(M["Norway.OutOfWork.2010Q2.men"],
                M["Norway.OutOfWork.2011Q2.men"],
                M["Norway.OutOfWork.2012Q2.men"],
                M["Norway.OutOfWork.2013Q2.men"],
                M["Norway.OutOfWork.2014Q2.men"],
                M["Norway.OutOfWork.2010Q2.women"],
                M["Norway.OutOfWork.2011Q2.women"],
                M["Norway.OutOfWork.2012Q2.women"],
                M["Norway.OutOfWork.2013Q2.women"],
                M["Norway.OutOfWork.2014Q2.women"])

#income data: income index for 2009-2012 (names assumed as above)
NO_income = cbind(M["Norway.Income.index.2009"],
                  M["Norway.Income.index.2010"],
                  M["Norway.Income.index.2011"],
                  M["Norway.Income.index.2012"])

#make DF; further variables in the original call were truncated in the post
NO = cbind(M["NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014"])

#get 5 year means
NO["OutOfWork.2010to2014.men"] = apply(NO_work[1:5],1,mean,na.rm=T) #get means, ignore missing
NO["OutOfWork.2010to2014.women"] = apply(NO_work[6:10],1,mean,na.rm=T) #get means, ignore missing

#get means for income and add to DF
NO["Income.index.2009to2012"] = apply(NO_income,1,mean,na.rm=T) #get means, ignore missing

plot_miss(NO) #visualize the missing data

NO = NO[miss_case(NO) <= 3, ] #keep those with 3 datapoints or fewer missing
NO = irmi(NO, noise = F) #impute the missing

NO_S = fa(NO) #factor analyze
NO_S_scores = data.frame(NO_S = as.vector(NO_S$scores) * -1) #save scores, reverse
rownames(NO_S_scores) = rownames(NO) #add rownames

M = merge_datasets(M, NO_S_scores, 1) #merge with mega

ggplot(M, aes(LV2012estimatedIQ, NO_S)) +
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)


cor(M$NO_S, M$DK.S, use = "pair")



A reanalysis of (Carl, 2015) revealed that the inclusion of London had a strong effect on the S loadings of the crime and poverty variables. S factor scores from a dataset without London and redundant variables were strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.



Carl (2015) analyzed socioeconomic inequality across 12 regions of the UK. In my reading of his paper, I thought of several analyses that Carl had not done. I therefore asked him for the data and he shared it with me. For a fuller description of the data sources, refer back to his article.

Redundant variables and London

Including (nearly) perfectly correlated variables can skew an extracted factor. For this reason, I created an alternative dataset where variables that correlated above |.90| were removed. The following pairs of strongly correlated variables were found:

  1. median.weekly.earnings and log.weekly.earnings r=0.999
  2. GVA.per.capita and log.GVA.per.capita r=0.997
  3. R.D.workers.per.capita and log.weekly.earnings r=0.955
  4. log.GVA.per.capita and log.weekly.earnings r=0.925
  5. economic.inactivity and children.workless.households r=0.914

In each case, the first of the pair was removed from the dataset. However, this resulted in a dataset with 11 cases and 11 variables, which is impossible to factor analyze. For this reason, I left in the last pair.

Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a, 2015b, 2015d), I also created two further datasets without London: one with the redundant variables, one without. Thus, there were 4 datasets:

  1. A dataset with London and redundant variables.
  2. A dataset with redundant variables but without London.
  3. A dataset with London but without redundant variables.
  4. A dataset without London and redundant variables.

Factor analysis

Each of the four datasets was factor analyzed. Figure 1 shows the loadings.


Figure 1: S factor loadings in four analyses.

Removing London strongly affected the loading of the crime variable, which changed from moderately positive to moderately negative. The poverty variable also saw a large change, from slightly negative to strongly negative. Both changes are in the direction towards a purer S factor (desirable outcomes with positive loadings, undesirable outcomes with negative loadings). Removing the redundant variables did not have much effect.

As a check, I investigated whether these results were stable across 30 different factor analytic methods.1 They were: all loadings and scores correlated near 1.00. For my analysis, I used those extracted with the combination of minimum residuals and regression.
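Such a robustness loop is easy to write with psych; a sketch, where the exact sets of extraction and scoring methods are my assumption of the 6 and 5 mentioned in footnote 1:

library(psych)

#assumed sets of extraction (fm) and scoring methods
fms = c("minres", "wls", "gls", "pa", "ml", "uls")
score_methods = c("regression", "Thurstone", "tenBerge", "Anderson", "Bartlett")

fits = list()
for (fm in fms) {
  for (sm in score_methods) {
    fits[[paste(fm, sm)]] = fa(d, fm = fm, scores = sm) #d = the dataset
  }
}

#correlate the loading vectors across all 30 fits
loading_mat = sapply(fits, function(f) as.vector(f$loadings))
round(cor(loading_mat), 2)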


Due to London’s strong effect on the loadings, one should check that the two methods developed for finding such cases can identify it (Kirkegaard, 2015c). Figure 2 shows the results from these two methods (mean absolute residual and change in factor size):

Figure 2: Mixedness metrics for the complete dataset.

As can be seen, London was identified as a far outlier using both methods.
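Sketches of the two metrics, as I read them from Kirkegaard (2015c); d_z is the standardized dataset:

library(psych)

#metric 1: mean absolute residual per case under a 1-factor model
fit = fa(d_z)
predicted = fit$scores %*% t(fit$loadings) #model-implied values
MAR = rowMeans(abs(as.matrix(d_z) - predicted), na.rm = TRUE)

#metric 2: change in factor size (mean absolute loading) when each case is dropped
factor_size = function(x) mean(abs(fa(x)$loadings))
base_size = factor_size(d_z)
delta = sapply(seq_len(nrow(d_z)), function(i) factor_size(d_z[-i, ]) - base_size)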

S scores and IQ

Carl’s dataset also contains IQ scores for the regions. These correlate .87 with the S factor scores from the dataset without London and redundant variables. Figure 3 shows the scatter plot.

Figure 3: Scatter plot of S and IQ scores for regions of the UK.

However, it is possible that IQ is not related to the latent S factor itself, but only to the remaining variance in the extracted S scores. For this reason, I used Jensen’s method (the method of correlated vectors) (Jensen, 1998). Figure 4 shows the results.
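In code, Jensen’s method amounts to correlating the S loadings with each indicator’s correlation with IQ; a sketch, assuming d holds the indicators, iq the IQ scores, and fit the factor analysis:

#method of correlated vectors
s_loadings = as.vector(fit$loadings)
r_iq = apply(d, 2, function(x) cor(x, iq, use = "pair"))
cor(s_loadings, r_iq) #the Jensen coefficient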

Figure 4: Jensen’s method for the S factor’s relationship to IQ scores.

Jensen’s method thus supported the claim that IQ scores and the latent S factor are related.

Discussion and conclusion

My reanalysis revealed some interesting results regarding the effect of London on the loadings. This was made possible by data sharing, demonstrating the importance of that practice (Wicherts & Bakker, 2012).

Supplementary material

R source code and datasets are available at the OSF.


Carl, N. (2015). IQ and socioeconomic development across Regions of the UK. Journal of Biosocial Science, 1–12. doi.org/10.1017/S002193201500019X

Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.

Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-mexican-states

Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-us-states

Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis

Kirkegaard, E. O. W. (2015d). The S factor in Brazilian states. The Winnower. Retrieved from thewinnower.com/papers/the-s-factor-in-brazilian-states

Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from cran.r-project.org/web/packages/psych/index.html

Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. doi.org/10.1016/j.intell.2012.01.004

1 There are 6 different extraction methods and 5 scoring methods supported by the fa() function from the psych package (Revelle, 2015). Thus, there are 6 × 5 = 30 combinations.