A reanalysis of Carl (2015) revealed that the inclusion of London had a strong effect on the S loadings of the crime and poverty variables. S factor scores from a dataset without London and without redundant variables were strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.
Carl (2015) analyzed socioeconomic inequality across 12 regions of the UK. While reading his paper, I thought of several analyses that he had not done. I therefore asked him for the data, and he shared them with me. For a fuller description of the data sources, see his article.
Redundant variables and London
Including (nearly) perfectly correlated variables can skew an extracted factor. For this reason, I created an alternative dataset in which variables with an absolute correlation above .90 were removed. The following pairs of strongly correlated variables were found:
- median.weekly.earnings and log.weekly.earnings r=0.999
- GVA.per.capita and log.GVA.per.capita r=0.997
- R.D.workers.per.capita and log.weekly.earnings r=0.955
- log.GVA.per.capita and log.weekly.earnings r=0.925
- economic.inactivity and children.workless.households r=0.914
In each case, the first variable of the pair was removed from the dataset. However, this resulted in a dataset with 11 cases and 11 variables, which is impossible to factor analyze. For this reason, I kept both variables of the last pair.
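As an illustration only, the screening step described above can be sketched in plain Python. This is not the R code used for the analysis; the function names (`pearson`, `redundant_pairs`, `drop_first_of_each_pair`) and the toy data are hypothetical, and the sketch does not implement the exception made for the last pair.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def redundant_pairs(data, threshold=0.90):
    """Return (first, second, r) for every variable pair with |r| > threshold."""
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) > threshold:
            pairs.append((a, b, r))
    return pairs


def drop_first_of_each_pair(data, pairs):
    """Remove the first-listed variable of each flagged pair."""
    to_drop = {a for a, _b, _r in pairs}
    return {name: values for name, values in data.items() if name not in to_drop}


# Hypothetical toy data: variables as keys, one value per case.
toy = {
    "earnings": [1.0, 2.0, 3.0, 4.0, 5.0],
    "log_earnings": [1.1, 2.0, 3.1, 3.9, 5.0],
    "crime": [2.0, 1.0, 4.0, 3.0, 5.0],
}

pairs = redundant_pairs(toy)
reduced = drop_first_of_each_pair(toy, pairs)
print(pairs)
print(list(reduced))
```

In the toy data only the two earnings variables exceed the threshold, so only the first of them is dropped.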
Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a, 2015b, 2015d), I also created two further datasets without London: one with the redundant variables, one without. Thus, there were four datasets:
- A dataset with London and with redundant variables.
- A dataset without London but with redundant variables.
- A dataset with London but without redundant variables.
- A dataset without London and without redundant variables.
Each of the four datasets was factor analyzed. Figure 1 shows the loadings.
Figure 1: S factor loadings in four analyses.
Removing London strongly affected the loading of the crime variable, which changed from moderately positive to moderately negative. The loading of the poverty variable also changed substantially, from slightly negative to strongly negative. Both changes are in the direction of a purer S factor (desirable outcomes with positive loadings, undesirable outcomes with negative loadings). Removing the redundant variables did not have much effect.
As a check, I investigated whether these results were stable across 30 different factor analytic methods (see footnote 1). They were: all loadings and scores correlated near 1.00. For my analysis, I used those extracted with the combination of minimum residuals and regression.
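The logic of this robustness check can be sketched as follows. The option names below are taken from the documentation of `fa()` in the psych package, but which six extraction methods were actually used is an assumption on my part, as the text does not list them. The three loadings vectors are made-up toy values, and `min_pairwise_r` is a hypothetical helper, not part of any library.

```python
from itertools import combinations, product
from math import sqrt

# Assumed option names for psych::fa(); the text gives only the counts (6 and 5).
EXTRACTION = ["minres", "wls", "gls", "pa", "ml", "minchi"]
SCORING = ["regression", "Thurstone", "tenBerge", "Anderson", "Bartlett"]

# Every extraction method paired with every scoring method.
combos = list(product(EXTRACTION, SCORING))
print(len(combos))


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def min_pairwise_r(vectors):
    """Smallest correlation among all pairs of result vectors."""
    return min(pearson(a, b) for a, b in combinations(vectors, 2))


# Toy loadings from three hypothetical method combinations.
toy_loadings = [
    [0.90, 0.80, -0.50, 0.30],
    [0.88, 0.82, -0.52, 0.31],
    [0.91, 0.79, -0.48, 0.28],
]
print(min_pairwise_r(toy_loadings))
```

If the smallest pairwise correlation is close to 1, the choice of method combination makes little practical difference.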
Due to London’s strong effect on the loadings, one should check that the two methods developed for finding such cases can identify it (Kirkegaard, 2015c). Figure 2 shows the results from these two methods (mean absolute residual and change in factor size):
As can be seen, London was identified as a far outlier using both methods.
S scores and IQ
Carl’s dataset also contains IQ scores for the regions. These correlate .87 with the S factor scores from the dataset without London and without redundant variables. Figure 3 shows the scatter plot.
However, it is possible that IQ is not really related to the latent S factor, but only to the remaining (non-S) variance in the extracted S scores. For this reason, I used Jensen’s method (method of correlated vectors) (Jensen, 1998). Figure 4 shows the results.
Jensen’s method thus supported the claim that IQ scores and the latent S factor are related (Jensen coefficient = .86).
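For readers unfamiliar with the method of correlated vectors, a minimal sketch follows. It correlates two vectors: each indicator's loading on the factor, and each indicator's correlation with the criterion (here, IQ). The loadings, indicator values and IQ scores below are invented for illustration, and `jensen_coefficient` is a hypothetical helper, not the code used for Figure 4.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def jensen_coefficient(loadings, indicator_data, criterion):
    """Correlate factor loadings with indicator-criterion correlations."""
    names = list(loadings)
    loading_vec = [loadings[name] for name in names]
    criterion_vec = [pearson(indicator_data[name], criterion) for name in names]
    return pearson(loading_vec, criterion_vec)


# Invented toy data: six regions, three indicators.
iq = [96, 98, 99, 100, 101, 103]
indicators = {
    "income": [10, 12, 13, 15, 16, 19],
    "education": [5, 7, 5, 8, 6, 9],
    "unemployment": [9, 8, 8, 6, 6, 4],
}
# Invented loadings, as if taken from a prior factor analysis.
s_loadings = {"income": 0.9, "education": 0.5, "unemployment": -0.8}

coef = jensen_coefficient(s_loadings, indicators, iq)
print(round(coef, 2))
```

A strongly positive coefficient means that the indicators loading more strongly on the factor are also the ones more strongly related to the criterion.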
Discussion and conclusion
My reanalysis revealed some interesting results regarding the effect of London on the loadings. This was made possible by data sharing, which demonstrates the importance of this practice (Wicherts & Bakker, 2012).
R source code and datasets are available at the OSF.
References
Carl, N. (2015). IQ and socioeconomic development across Regions of the UK. Journal of Biosocial Science, 1–12. http://doi.org/10.1017/S002193201500019X
Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-us-states
Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis
Kirkegaard, E. O. W. (2015d). The S factor in Brazilian states. The Winnower. Retrieved from https://thewinnower.com/papers/the-s-factor-in-brazilian-states
Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from http://cran.r-project.org/web/packages/psych/index.html
Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. http://doi.org/10.1016/j.intell.2012.01.004
1. There are 6 extraction methods and 5 scoring methods supported by the fa() function from the psych package (Revelle, 2015). Thus, there are 6 × 5 = 30 combinations.