Abstract
It has been found that workers who hail from higher socioeconomic classes have higher earnings even in the same profession. An environmental cause was offered as an explanation of this. I show that this effect is expected solely for statistical reasons.

Introduction
Friedman and Laurison (2015) offer data about the earnings of persons employed in the higher professions by their class of origin. They find that those who have a higher class origin earn more. I reproduce their figure below.

Friedman-fig-1-1024x669

They posit an environmental explanation of this:

In doing so, we have purposively borrowed the ‘glass ceiling’ concept developed by feminist scholars to explain the hidden barriers faced by women in the workplace. In a working paper recently published by LSE Sociology, we argue that it is also possible to identify a ‘class ceiling’ in Britain which is preventing the upwardly mobile from enjoying equivalent earnings to those from upper middle-class backgrounds.

There is also a longer working paper by the same authors, but I did not read that. A link to it can be found in the previously mentioned source.

A simplified model of the situation
How do persons advance to professions? Well, we know that the occupational hierarchy is basically a (general) cognitive ability hierarchy (Gottfredson, 1997), as well as presumably also one of various relevant non-cognitive traits such as being hard working/conscientiousness altho I did not find a study of this.

A simple way to model this is to think of it as a threshold effect such that no one below the given threshold gets into the profession and everybody above gets into it. This is of course not like reality. Reality does have a threshold which increases up the hierarchy. (Insert the figure from one of Gottfredson’s paper that shows the minimum IQ by occupation, but I can’t seem to locate it. Halp!) The effect of GCA is probably more like a probabilistic function akin to the cumulative distribution function such that below a certain cognitive level, virtually no one from below that level is found.

Simulating this is a bit complicated but we can approximate it reasonably by using a simple cut-off value, such that everybody above gets in, everybody below does not (see Gordon (1997) for a similar case with belief in conspiracy theories).

A simulation
One could perhaps solve this analytically, but it is easier to simulate it, so we do that. I used the following procedure:

  1. We make three groups of origin with 90, 100, and 110 IQ.
  2. We simulate a large number (1e4) of random persons from these groups.
  3. We plot these for getting a feeling of the data.
  4. We find the subgroup of each group with IQ > 115, which we take as the minimum for some high level profession.
  5. We calculate the mean IQ of each group and subgroup.

The plot looks like this:

thresholds

The vertical lines are the cut-off threshold (black), and the three means (in the corresponding color). As can be seen, they are not the same despite the same threshold being used. The values are respectively: 121.74, 122.96, and 125.33. The differences between these are not large for the present simulation, but they may be sufficient to bring about differences that are detectable in a large dataset. The values depend on how far the population mean is from the threshold and the standard deviation of the population (all 15 in the above simulation). The further away the threshold is from the mean, the closer the mean of the subgroup above the threshold will be to the threshold value. For subgroups far away, it will be nearly identical. For instance, the mean IQ of those with >=150 is about 153.94 (based on a sampling with 10e7 cases, mean 100, sd 15).

It should be noted that if one also considers measurement error, this effect will be stronger, since persons from lower IQ groups regress further down.

Supplementary material
R source code is available at the Open Science Framework repository.

References

  • Friedman, S and Laurison, D. (2015). Introducing the ‘class’ ceiling. British Politics and Policy blog.
  • Gordon, R. A. (1997). Everyday life as an intelligence test: Effects of intelligence and intelligence context. Intelligence, 24(1), 203-320.
  • Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79-132.

Abstract
Sizeable S factors were found across 3 different datasets (from years 1991, 2000 and 2010), which explained 56 to 71% of the variance. Correlations of extracted S factors with cognitive ability were strong ranging from .69 to .81 depending on which year, analysis and dataset is chosen. Method of correlated vectors supported the interpretation that the latent S factor was primarily responsible for the association (r’s .71 to .81).

Introduction
Many recent studies have examined within-country regional correlates of (general) cognitive ability (also known as (general) intelligence, general mental ability, g),. This has been done for the British Isles (Lynn, 1979; Kirkegaard, 2015g), France (Lynn, 1980), Italy (Lynn, 2010; Kirkegaard, 2015e), Spain (Lynn, 2012), Portugal (Almeida, Lemos, & Lynn, 2011), India (Kirkegaard, 2015d; Lynn & Yadav, 2015), China (Kirkegaard, 2015f; Lynn & Cheng, 2013), Japan (Kura, 2013), the US (Kirkegaard, 2015b; McDaniel, 2006; Templer & Rushton, 2011), Mexico (Kirkegaard, 2015a) and Turkey (Lynn, Sakar, & Cheng, 2015). This paper examines data for Brazil.

Data
Cognitive data
Data from PISA was used as a substitute for IQ test data. PISA and IQ correlate very strongly (>.9; (Rindermann, 2007)) across nations and presumably also across regions altho this hasn’t been thoroly investigated to my knowledge.

Socioeconomic data
As opposed to some of my prior analyses, there was no dataset to build on top of. For this reason, I tried to find an English-language database for Brazil with a comprehensive selection of variables. Altho I found some resources, they did not allow for easy download and compilation of state-level data, which I needed. Instead, I relied upon the Portugeese-language site, Atlasbrasil.org.br, which has a comprehensive data explorer with a convenient download function for state-level data. I used Google Translate to find my way around the site.

Using the data explorer, I selected a broad range of variables. The goal was to cover most important areas of socioeconomic development and avoid variables of little importance or which are heavily influenced by local climate factors (e.g. amount of rainforest). The following variables were selected:

  1. Gini coefficient
  2. Activity rate age 25-29
  3. Unemployment rate age 25-29
  4. Public sector workers%
  5. Farmers%
  6. Service sector workers%
  7. Girls age 10-17 with child%
  8. Life expectancy
  9. Households without electricity%
  10. Infant mortality rate
  11. Child mortality rate
  12. Survive to 40%
  13. Survive to 60%
  14. Total fertility rate
  15. Dependancy ratio
  16. Aging rate
  17. Illiteracy age 11-14 %
  18. Illiteracy age 25 and above %
  19. Age 6-17 in school %
  20. Attendence higher education %
  21. Income per capita
  22. Mean income lowest quintile
  23. Pct extremely poor
  24. Richest 10 pct income
  25. Bad walls%
  26. Bad water and sanitation%
  27. HDI
  28. HDI income
  29. HDI life expectancy
  30. HDI education
  31. Population
  32. Population rural

Variables were available only for three time points: 1991, 2000 and 2010. I selected all three with intention of checking stability of results over different time periods.

Most data was already in an appropriate per unit measure so it was not necessary to do extensive conversions as with the Mexican data (Kirkegaard, 2015a). I calculated fraction of the population living in rural areas by dividing the rural population by the total population.

Note that the data explorer also has data at a lower level, that of municipals. It could be used in the future to see if the S factor holds for a lower level of aggregate analysis.

S factor loadings
I split the data into three datasets, one for 1991, 2000 and 2010.

I extracted S factors using the fa() function with default parameters from the psych package (Revelle, 2015).

S factor in 1991
Due to missing data, there were only 21 indicators available for this year. The data could not be imputed since it was missing for all cases for these variables. The loadings plot is shown in Figure 1.

S.1991.loadings

Figure 1: Loadings plot for S factor for the data from 1991

All indicators were in the expected direction aside from perhaps “aging rate”, which is somewhat unclear and would perhaps be expected to have a positive loading.

S factor in 2000
Less missing data, 26 variables. Loadings plot shown in Figure 2.

S.2000.loadings

Figure 2: Loadings plot for S factor for the 2000 data

All indicators were in the expected direction.

S factor for 2010
27 variables. Factor analysis gave an error for this dataset which means that I had to remove at least one variable.1 This left me with the question of which variable(s) to exclude. Similar to the previous analysis for Mexican states (Kirkegaard, 2015a), I used an automatic method. After removing one variable, the factor analysis worked and gave no warning. The excluded variable was child mortaility which was correlated near perfectly with another variable (infant mortality, r=.992), so little indicator sampling error should be introduced because of this deletion. The loadings plot is shown in Figure 3.

S.2010.1.loadings

Figure 3: Loadings plot for S factor for the 2010 data, minus one variable

Oddly, survival to 60 and 40 now have negative loadings, altho one would expect them to correlate highly with life expectancy which has a loading near 1. In fact, the correlations between life expectancy and the survival variables was -.06 and -.21, which makes you wonder what these variables are measuring. Excluding them does not substantially change results, but does increase the amount of variance explained to .60.

Out of curiosity, I also tried versions where I deleted 5 and 10 variables, but this did not change much in the plots, so I won’t show them. Interested readers can consult the source code.

Mixed cases
To examine whether there are any cases with strong mixedness — cases that are incongruent with the factor structure in the data — I developed two methods which are presented elsewhere (Kirkegaard, 2015c). Briefly, the first method measures the mixedness of the case by quantifying how predictable indicator scores are from the factor score for each case (mean absolute residual, MAR). The second quantifies how much the size of the general factor changes after exclusion of each individual case (improvement in proportion of variance, IPV). Both methods were found to be useful at finding a strongly mixed case in simulation data.

I applied both methods to the Brazilian datasets. For the second method, I had to create two additional reduced datasets since the factor analysis could not run with the resulting combinations of cases and indicators.

There are two ways one can examine the results: 1) by looking at the top (or bottom) most mixed cases for each method; 2) by looking at the correlations between results from the methods. The first is interesting if Brazilian state-level inequality in S has particular interest, while the second is more relevant for checking that the methods really work — they should give congruent results if mixed cases are present.

Top mixed cases
For each method and each analysis, I extracted the names of the top 5 mixed states. They are shown in Table 1.

 

Position_1

Position_2

Position_3

Position_4

Position_5

m1.1991

Amapá

Acre

Distrito Federal

Roraima

Rondônia

m1.2000

Amapá

Roraima

Acre

Distrito Federal

Rondônia

m1.2010.1

Roraima

Distrito Federal

Amapá

Amazonas

Acre

m1.2010.5

Roraima

Distrito Federal

Amapá

Acre

Amazonas

m1.2010.10

Roraima

Distrito Federal

Amapá

Acre

Amazonas

m2.1991

Amapá

Rondônia

Acre

Roraima

Amazonas

m2.2000.1

Amapá

Rondônia

Roraima

Paraíba

Ceará

m2.2010.2

Amapá

Roraima

Distrito Federal

Pernambuco

Sergipe

m2.2010.5

Amapá

Roraima

Distrito Federal

Piauí

Bahia

m2.2010.10

Distrito Federal

Roraima

Amapá

Ceará

Tocantins

 

Table 1: Top 5 mixed cases by method and dataset

As can be seen, there is quite a bit of agreement across years, datasets, and methods. If one were to do a more thoro investigation of socioeconomic differences across Brazilian states, one should examine these states for unusual patterns. One could do this using the residuals for each indicator by case from the first method (these are available from the FA.residuals() in psych2). A quick look at the 2010.1 data for Amapá shows that the state is roughly in the middle regarding state-level S (score = -.26, rank 15 of 27), Farmers do not constitute a large fraction of the population (only 9.9%, rank 4 only behind the states with large cities: Federal district, Rio de Janeiro, and São Paulo). Given that farmers% has a strong negative loading (-.77) and the state’s S score, one would expect the state to have relatively more farmers than it has, the mean of all states for that dataset is 17.2%.

Much more could be said along these lines, but I rather refrain since I don’t know much about the country and can’t read the language very well. Perhaps a researchers who is a Brailizian native could use the data to make a more detailed analysis.

Correlations between methods and datasets
To test whether the results were stable across years, data reductions, and methods, I correlated all the mixedness metrics. Results are in Table 2.

 

m1.1991

m1.2000

m1.2010.1

m1.2010.5

m1.2010.10

m2.1991

m2.2000.1

m2.2010.2

m2.2010.5

m1.1991

m1.2000

0.88

m1.2010.1

0.81

0.85

m1.2010.5

0.77

0.87

0.98

m1.2010.10

0.70

0.79

0.93

0.96

m2.1991

0.48

0.64

0.45

0.48

0.40

m2.2000.1

0.41

0.58

0.34

0.39

0.27

0.87

m2.2010.2

0.53

0.63

0.66

0.66

0.51

0.58

0.68

m2.2010.5

0.32

0.49

0.60

0.64

0.51

0.49

0.59

0.86

m2.2010.10

0.42

0.44

0.66

0.65

0.59

0.32

0.44

0.75

0.76

 

Table 2: Correlation table for mixedness metrics across datasets and methods.

There is method specific variance since the correlations within method (topleft and bottomright squares) are stronger than those across methods. Still, all correlations are positive, Cronbach’s alpha is .87, Guttman lambda 6 is .98 and the mean correlation is .61.

S and HDI correlations
HDI
Previous S factor studies have found that HDI (Human Development Index) is basically a non-linear proxy for the S factor (Kirkegaard, 2014, 2015a). This is not surprising since the HDI is calculated from longevity, education and income, all three of which are known to have strong S factor loadings. The actual derivation of HDI values is somewhat complex. One might expect them simple to average the three indicators, or extract the general factor, but no. Instead they do complicated things (WHO, 2014).

For longevity (life expectancy at birth), they introduce ceilings at 25 and 85 years. According to data from WHO (WHO, 2012), no country has values above or below these values altho Japan is close (84 years).

For education, it is actually an average of two measures: years of education by 25 year olds and expected years of schooling for children entering school age. These measures also have artificial limits of 0-18 and 0-15 respectively.

For gross national income, they use the log values and also artificial limits of 100-75,000 USD.

Moreover, these are not simply combined by standardizing (i.e. rescaling so the mean is 0 and standard deviation is 1) the values and adding them or taking the mean. Instead, a value is calculated for every indicator using the following formula:

HDI_dimension_formula
Equation 1: HDI index formula

Note that for education, this formula is used twice and the results averaged.

Finally, the three dimensions are combined using a geometric mean:

HDI_combine
Equation 2: HDI index combination formula

The use of a geometric mean as opposed to the normal arithmetic mean, is that if a single indicator is low, the overall level is strongly reduced, whereas with the arithmetic, only the sum of the indicators matter, not the standard deviation of them. If the indicators have the same value, then the geometric and arithmetic mean have the same value.

For instance, if indicators are .7, .7, .7, the arithmetic mean is .7+.7+.7=2.1, 2.1/3=.7 and the geometric .73=0.343, 0.3431/3=.7. However, if indicators are 1, .7, .4, then the arithmetic mean is 1+.7+.4=2.1, 2.1/3=.7, but geometric mean is 1*.7*.4=0.28, 0.281/3=0.654 which is a bit lower than .7.

S and HDI correlations
I used the previously extracted factor scores and the HDI data. I also extracted S factors from the HDI datasets (3 variables)2 to see how these compared with the complex HDI value derivation. Finally, I correlated the S factors from non-HDI data, S factors from HDI data, HDI values and cognitive ability scores. Results are shown in Table 2.

 

HDI.1991

HDI.2000

HDI.2010

HDI.S.1991

HDI.S.2000

HDI.S.2010

S.1991

S.2000

S.2010.1

S.2010.5

S.2010.10

CA2012

HDI.1991

0.95

0.92

0.98

0.95

0.93

0.96

0.93

0.86

0.89

0.90

0.59

HDI.2000

0.97

0.97

0.94

0.99

0.96

0.94

0.98

0.93

0.95

0.96

0.66

HDI.2010

0.94

0.98

0.93

0.98

0.99

0.93

0.97

0.94

0.97

0.98

0.65

HDI.S.1991

0.98

0.96

0.94

0.95

0.94

0.98

0.92

0.84

0.88

0.90

0.54

HDI.S.2000

0.97

1.00

0.98

0.97

0.97

0.95

0.98

0.92

0.95

0.97

0.65

HDI.S.2010

0.95

0.98

0.99

0.95

0.98

0.94

0.96

0.94

0.96

0.97

0.66

S.1991

0.96

0.96

0.94

0.97

0.97

0.96

0.92

0.86

0.90

0.91

0.60

S.2000

0.95

0.98

0.96

0.93

0.99

0.97

0.97

0.96

0.98

0.98

0.69

S.2010.1

0.89

0.94

0.94

0.86

0.94

0.95

0.92

0.97

0.99

0.96

0.76

S.2010.5

0.91

0.95

0.96

0.89

0.96

0.97

0.93

0.98

0.99

0.98

0.72

S.2010.10

0.93

0.96

0.98

0.93

0.97

0.98

0.93

0.97

0.96

0.98

0.71

CA2012

0.67

0.73

0.71

0.60

0.72

0.74

0.69

0.78

0.81

0.79

0.75

 

Table 3: Correlation matrix for S, HDI and cognitive ability scores. Pearson’s below the diagonal, rank-order above.

All results were very strongly correlated no matter which dataset or scoring method was used. Cognitive ability scores were strongly correlated to all S factor measures. The best estimate of the relationship between S factor and cognitive ability is probably the correlation with S.2010.1, since this is the dataset cloest in time to the cognitive dataset and the S factor is extracted from the most variables. This is also the highest value (.81), but that may be a coincidence.

It is worth noting that the rank-order correlations were somewhat weaker. This usually indicates that an outlier case is increasing the Pearson correlation. To investigate this, I plot the S.2010.1 and CA2012 variables, see Figure 4.

CA_S_2010_1
Figure 4: Scatter plot of S factor and cognitive ability

The scatter plot however does not seem to reveal any outliers inflating the correlation.

Method of correlated vectors
To examine whether the S factor was plausibly the cause of the pattern seen with the S factor scores (it is not necessarily), I used the method of correlated vectors with reversing. Results are shown in Figures 5-7.

MCV_1991
Figure 5: MCV for the 1991 dataset

MCV_2000
Figure 6: MCV for the 2000 dataset

MCV_2010_1
Figure 7: MCV for the 2010 dataset

The first result seems to be driven by a few outliers, but the second and third seems decent enough. The numerical results were fairly consistent (.71, .75, .81).

Discussion and conclusion
Generally, the results were in line with earlier studies. Sizeable S factors were found across 3 (or 6 if one counts the mini-HDI ones) different datasets, which explained 56 to 71% of the variance. There seems to be a decrease over time, which is intriguing as it is may eventually lead to the ‘destruction’ of the S factor. It may also be due to differences between the datasets across the years, since they were not entirely comparable. I did not examine the issue in depth.

Correlations of S factors and HDIs with cognitive ability were strong ranging from .60 to .81 depending on which year, analysis, dataset is chosen, and whether one uses the HDI values. Correlations were stronger when they were from the larger datasets, which is perhaps because they were better measures of latent S. MCV supported the interpretation that the latent S factor was primarily responsible for the association (r’s .71 to .81).

Future studies should examine to which degree cognitive ability and S factor differences are explainable by ethnracial factors e.g. racial ancestry as done by e.g. (Kirkegaard, 2015b).

Limitations
There are some problems with this paper:

  • I cannot read Portuguese and this may have resulted in including some incorrect variables.
  • There was a lack of crime variables in the datasets, altho these have central importance for sociology. None were available in the data source I used.

Supplementary material
R source code, data and figures can be found in the Open Science Framework repository.

References

Almeida, L. S., Lemos, G., & Lynn, R. (2011). Regional Differences in Intelligence and per Capita Incomes in Portugal. Mankind Quarterly, 52(2), 213.

Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/

Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-mexican-states

Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-us-states

Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis

Kirkegaard, E. O. W. (2015d). Indian states: G and S factors. The Winnower. Retrieved from thewinnower.com/papers/indian-states-g-and-s-factors

Kirkegaard, E. O. W. (2015e). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower. Retrieved from thewinnower.com/papers/s-and-g-in-italian-regions-re-analysis-of-lynn-s-data-and-new-data

Kirkegaard, E. O. W. (2015f). The S factor in China. The Winnower. Retrieved from thewinnower.com/papers/the-s-factor-in-china

Kirkegaard, E. O. W. (2015g). The S factor in the British Isles: A reanalysis of Lynn (1979). The Winnower. Retrieved from thewinnower.com/papers/the-s-factor-in-the-british-isles-a-reanalysis-of-lynn-1979

Kura, K. (2013). Japanese north–south gradient in IQ predicts differences in stature, skin color, income, and homicide rate. Intelligence, 41(5), 512–516. doi.org/10.1016/j.intell.2013.07.001

Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and Clinical Psychology, 18(1), 1–12. doi.org/10.1111/j.2044-8260.1979.tb00297.x

Lynn, R. (1980). The social ecology of intelligence in France. British Journal of Social and Clinical Psychology, 19(4), 325–331. doi.org/10.1111/j.2044-8260.1980.tb00360.x

Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant mortality, stature, and literacy. Intelligence, 38(1), 93–100. doi.org/10.1016/j.intell.2009.07.004

Lynn, R. (2012). North-South Differences in Spain in IQ, Educational Attainment, per Capita Income, Literacy, Life Expectancy and Employment. Mankind Quarterly, 52(3/4), 265.

Lynn, R., & Cheng, H. (2013). Differences in intelligence across thirty-one regions of China and their economic and demographic correlates. Intelligence, 41(5), 553–559. doi.org/10.1016/j.intell.2013.07.009

Lynn, R., Sakar, C., & Cheng, H. (2015). Regional differences in intelligence, income and other socio-economic variables in Turkey. Intelligence, 50, 144–149. doi.org/10.1016/j.intell.2015.03.006

Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and latitude across the states of India. Intelligence, 49, 179–185. doi.org/10.1016/j.intell.2015.01.009

McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601–606. doi.org/10.1016/j.intell.2006.07.005

Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from cran.r-project.org/web/packages/psych/index.html

Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706. doi.org/10.1002/per.634

Templer, D. I., & Rushton, J. P. (2011). IQ, skin color, crime, HIV/AIDS, and income in 50 U.S. states. Intelligence, 39(6), 437–442. doi.org/10.1016/j.intell.2011.08.001

WHO. (2012). Life expectancy. Data by country. Retrieved from apps.who.int/gho/data/node.main.688?lang=en

WHO. (2014). Human Development Report: Technical notes: Calculating the human development indices—graphical presentation. WHO. Retrieved from hdr.undp.org/sites/default/files/hdr14_technical_notes.pdf

Footnotes

1 Error in min(eigens$values) : invalid ‘type’ (complex) of argument.

2 Factor loadings for HDI factor analysis were very strong, always >.9.

Abstract
Two methods are presented that allow for identification of mixed cases in the extraction of general factors. Simulated data is used to illustrate them.

Introduction
General factors can be extracted from datasets where all or nearly so the variables are correlated. At the case-level, such general factors are decreased in size if there are mixed cases present. A mixed case is an ‘inconsistent’ case according to the factor structure of the data.

A simple way of illustrating what I’m talking about is using the matrixplot() function from the VIM package to R (Templ, Alfons, Kowarik, & Prantner, 2015) with some simulated data.

For simulated dataset 1, start by imaging that we are measuring a general factor and that all our indicator variables have a positive loading on this general factor, but that this loading varies in strength. Furthermore, there is no error of measurement and there is only one factor in the data (no group factors, i.e. no hierarchical or bi-factor structure, (Jensen & Weng, 1994)). I have used datasets with 50 cases and 25 variables to avoid the excessive sampling error of small samples and to keep a realistic number of cases compared to the datasets examined in S factor studies (e.g. Kirkegaard, 2015). The matrix plot is shown in Figure 1.

data1_matplot
Figure 1: Matrix plot of dataset 1

No real data looks like this, but it is important to understand what to look for. Every indicator is on the x-axis and the cases are on the y-axis. The cases are colored by their relative values, where darker means higher values. So in this dataset we see that any case that does well on any particular indicator does just as well on every other indicator. All the indicators have the same factor loading of 1, and the proportion of variance explained is also 1 (100%), so there is little point in showing the loadings plot.

To move towards realism, we need to complicate this simulation in some way. The first way is to introduce some measurement error. The amount of error introduced determines the factor loadings and hence the size of the general factor. In dataset 2, the error amount is .5, and the signal multiplier varies from .05 to .95 all of which are equally likely (uniform distribution). The matrix and the loadings plots are shown in Figures 2 and 3.

data2_matplot
Figure 2: Matrix plot for dataset 2

data2_loadings

Figure 3: Loadings plot for dataset 2

By looking at the matrix plot we can still see a fairly simple structure. Some cases are generally darker (whiter) than others, but there is also a lot of noise which is of course the error we introduced. The loadings show quite a bit of variation. The size of this general factor is .45.

The next complication is to introduce the possibility of negative loadings (these are consistent with a general factor, as long as they load in the right direction, (Kirkegaard, 2014)). We go back to the simplified case of no measurement error for simplicity. Figures 4 and 5 show the matrix and loadings plots.

data3_matplot
Figure 4: Matrix plot for dataset 3

data3_loadings
Figure 5: Loadings plot for dataset 3

The matrix plot looks odd, until we realize that some of the indicators are simply reversed. The loadings plot shows this reversal. One could easily get back to a matrix plot like that in Figure 1 by reversing all indicators with a negative loading (i.e. multiplying by -1). However, the possibility of negative loadings does increase the complexity of the matrix plots.

For the 4th dataset, we make a begin with dataset 2 and create a mixed case. This we do by setting its value on every indicator to be 2, a strong positive value (98 centile given a standard normal distribution). Figure 6 shows the matrix plot. I won’t bother with the loadings plot because it is not strongly affected by a single mixed case.

data4_matplot
Figure 6: Matrix plot for dataset 4

Can you guess which case it is? Perhaps not. It is #50 (top line). One might expect it to be the same hue all the way. This however ignores the fact that the values in the different indicators vary due to sampling error. So a value of 2 is not necessarily at the same centile or equally far from the mean in standard units in every indicator, but it is fairly close which is why the color is very dark across all indicators.

For datasets with general factors, the highest value of a case tends to be on the most strongly loaded indicator (Kirkegaard, 2014b), but this information is not easy to use in an eye-balling of the dataset. Thus, it is not so easy to identify the mixed case.

Now we complicate things further by adding the possibility of negative loadings. This gets us data roughly similar to that found in S factor analysis (there are still no true group factors in the data). Figure 7 shows the matrix plot.

data5_matplot
Figure 7: Matrix plot for dataset 5

Just looking at the dataset, it is fairly difficult to detect the general factor, but in fact the variance explained is .38. The mixed case is easy to spot now (#50) since it is the only case that is consistently dark across indicators, which is odd given that some of them have negative loadings. It ‘shouldn’t’ happen. The situation is however somewhat extreme in the mixedness of the case.

Automatic detection

Eye-balling figures and data is a powerful tool for quick analysis, but it cannot give precise numerical values used for comparison between cases. To get around this I developed two methods for automatic identification of mixed cases.

Method 1
A general factor only exists when multidimensional data can be usefully compressed, informationally speaking, to 1-dimensional data (factor scores on the general factor). I encourage readers to consult the very well-made visualization of principal component analysis (almost the same as factor analysis) at this website. In this framework, mixed cases are those that are not well described or predicted by a single score.

Thus, it seems to me that that we can use this information as a measure of the mixedness of a case. The method is:

  1. Extract the general factor.
  2. Extract the case-level scores.
  3. For each indicator, regress it unto the factor scores. Save the residuals.
  4. Calculate a suitable summary metric, such as the mean absolute residual and rank the cases.

Using this method on dataset 5 in fact does identify case 50 as the most mixed one. Mixedness varies between cases due to sampling error. Figure 8 shows the histogram.

data5_method1_hist
Figure 8: Histogram of absolute mean residuals from dataset 5

The outlier on the right is case #50.

How extreme does a mixed case need to be for this method to find it? We can try reducing its mixedness by assigning it less extreme values. Table 1 shows the effects of doing this.

 

Mixedness values

Mean absolute residual

2

1.91

1.5

1.45

1

0.98

 

Table 1: Mean absolute residual and mixedness

So we see that when it is 2 and 1.5, it is clearly distinguishable from the rest of the cases, but 1 is about the limit of this since the second-highest value is .80. Below this, the other cases are similarly mixed, just due to the randomness introduced by measurement error.

Method 2
Since mixed cases are poorly described by a single score, they don’t fit well with the factor structure in the data. Generally, this should result in the proportion of variance increasing when they are removed. Thus the method is:

  1. Extract the general factor from the complete dataset.
  2. For every case, create a subset of the dataset where this case is removed.
  3. Extract the general factors from each subset.
  4. For each analysis, extract the proportion of variance explained and calculate the difference to that using the full dataset.

Using this method on the dataset also used above correctly identifies the mixed case. The histogram of results is shown in Figure 9.

data5_method2_hist
Figure 9: Histogram of differences in proportion of variance to the full analysis

Like we method 1, we then redo this analysis for other levels of mixedness. Results are shown in Table 2.

 

Mixedness values

Improvement in proportion of variance

2

1.91

1.5

1.05

1

0.50

 

Table 2: Improvement in proportion of variance and mixedness

We see the same as before, in that both 2 and 1.5 are clearly identifiable as being an outlier in mixedness, while 1 is not since the next-highest value is .45.

Large scale simulation with the above methods could be used to establish distributions to generate confidence intervals from.

It should be noted that the improvement in proportion of variance is not independent of number of cases (more cases means that a single case is less import, and non-linearly so), so the value cannot simply be used to compare across cases without correcting for this problem. Correcting it is however beyond the scope of this article.

Comparison of methods

The results from both methods should have some positive relationship. The scatter plot is shown in

IPV_MAR
Figure 10: Scatter plot of method 1 and 2

We see that the true mixedness case is a strong outlier with both methods — which is good because it really is a strong outlier. The correlation is strongly inflated because of this, to r=.70 with, but only .26 without. The relative lack of a positive relationship without the true outlier in mixedness is perhaps due to range restriction in mixedness in the dataset, which is true because the only amount of mixedness besides case 50 is due to measurement error. Whatever the exact interpretation, I suspect it doesn’t matter since the goal is to find the true outliers in mixedness, not to agree on the relative ranks of the cases with relatively little mixedness.1

Implementation
I have implemented both above methods in R. They can be found in my unofficial psych2 collection of useful functions located here.

Supplementary material
Source code and figures are available at the Open Science Framework repository.

References

Jensen, A. R., & Weng, L.-J. (1994). What is a good g? Intelligence, 18(3), 231–258. doi.org/10.1016/0160-2896(94)90029-9

Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/

Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/10/the-personal-jensen-coefficient-does-not-predict-grades-beyond-its-association-with-g/

Kirkegaard, E. O. W. (2015). Examining the S factor in US states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-us-states

Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and Imputation of Missing Values. CRAN. Retrieved from cran.r-project.org/web/packages/VIM/index.html

Footnotes

1 Concerning the agreement about rank-order, it is about .4 both with and without case 50. But this is based on a single simulation and I’ve seen some different values when re-running it. A large scale simulation is necessary.

Abstract

Two datasets of socioeconomic data was obtained from different sources. Both were factor analyzed and revealed a general factor (S factor). These factors were highly correlated with each other (.79 to .95), HDI (.68 to .93) and with cognitive ability (PISA; .70 to .78). The federal district was a strong outlier and excluding it improved results.

Method of correlated vectors was strongly positive for all 4 analyses (r’s .78 to .92 with reversing).

Introduction

In a number of recent articles (Kirkegaard 2015a,b,c,d,e), I have analyzed within-country regional data to examine the general socioeconomic factor, if it exists in the dataset (for the origin of the term, see e.g. Kirkegaard 2014). This work was inspired by Lynn (2010) whose datasets I have also reanalyzed. While doing work on another project (Fuerst and Kirkegaard, 2015*), I needed an S factor for Mexican states, if such exists. Since I was not aware of any prior analysis of this country in this fashion, I decided to do it myself.

The first problem was obtaining data for the analysis. For this, one needs a number of diverse indicators that measure important economic and social matters for each Mexican state. Mexico has 31 states and a federal district, so one can use a decent number of indicators to examine the S factor. Mexico is a Spanish speaking country and English comprehension is fairly poor. According to Wikipedia, only 13% of people speak English there. Compare with 86% for Denmark, 64% for Germany and 35% for Egypt.

S factor analysis 1 – Wikipedian data

Data source and treatment

Unlike for the previous countries, I could not easily find good data available in English. As a substitute, I used data from Wikipedia:

These come from various years, are sometimes not given per person, and often have no useful source given. So they are of unknown veracity, but they are probably fine for a first look. The HDI is best thought of as a proxy for the S factor, so we can use it to examine construct validity.

Some variables had data for multiple time-points and they were averaged.

Some data was given in raw numbers. I calculated per capita versions of them using the population data also given.

Results

The variables above minus HDI and population size were factor analyzed using minimum residuals to extract 1 factor. The loadings plot is shown below.

S_wiki

The literacy variables had a near perfect loading on S (.99). Unemployment unexpectedly loaded positively and so did homicides per capita altho only slightly. This could be because unemployment benefits are only in existence in the higher S states such that going unemployed would mean starvation. The homicide loading is possibly due to the drug war in the country.

Analysis 2 – Data obtained from INEG

Data source and treatment

Since the results based on Wikipedia data was dubious, I searched further for more data. I found it on the Spanish-language statistical database, Instituto Nacional De Estadística Y Geografía, which however had the option of showing poorly done English translations. This is not optimal as there are many translation errors which may result in choosing the wrong variable for further analysis. If any Spanish-speaker reads this, I would be happy if they would go over my chosen variables and confirm that they are correct. I ended up with the following variables:

  1. Cost of crime against individuals and households
  2. Cost of crime on economic units
  3. Annual percentage change of GDP at 2008 prices
  4. Crime prevalence rate per 10,000 economic units
  5. Crime prevalence rate per hundred thousand inhabitants aged 18 years and over, by state
  6. Dark figure of crime on economic units
  7. Dark figure (crimes not reported and crimes reported that were not investigated)
  8. Doctors per 100 000 inhabitants
  9. Economic participation of population aged 12 to 14 years
  10. Economic participation of population aged 65 and over
  11. Economic units.
  12. Economically active population. Age 15 and older
  13. Economically active population. Unemployed persons. Age 15 and older
  14. Electric energy users
  15. Employed population by income level. Up to one minimum wage. Age 15 and older
  16. Employed population by income level. More than 5 minimum wages. Age 15 and older
  17. Employed population by income level. Do not receive income. Age 15 and older
  18. Fertility rate of adolescents aged 15 to 19 years
  19. Female mortality rate for cervical cancer
  20. Global rate of fertility
  21. Gross rate of women participation
  22. Hospital beds per 100 thousand inhabitants
  23. Inmates in state prisons at year end
  24. Life expectancy at birth
  25. Literacy rate of women 15 to 24 years
  26. Literacy rate of men 15 to 24 years
  27. Median age
  28. Nurses per 100 000 inhabitants
  29. Percentage of households victims of crime
  30. Percentage of births at home
  31. Percentage of population employed as professionals and technicians
  32. Prisoners rate (per 10,000 inhabitants age 18 and over)
  33. Rate of maternal mortality (deaths per 100 thousand live births)
  34. Rate of inhabitants aged 18 years and over that consider their neighborhood or locality as unsafe, per hundred thousand inhabitants aged 18 years and over
  35. Rate of inhabitants aged 18 years and over that consider their state as unsafe, per hundred thousand inhabitants aged 18 years and over
  36. Rate sentenced to serve a sentence (per 1,000 population age 18 and over)
  37. State Gross Domestic Product (GDP) at constant prices of 2008
  38. Total population
  39. Total mortality rate from respiratory diseases in children under 5 years
  40. Total mortality rate from acute diarrheal diseases (ADD) in population under 5 years
  41. Unemployment rate of men
  42. Unemployment rate of women
  43. Households
  44. Inhabited housings with available computer
  45. Inhabited housings that have toilet
  46. Inhabited housings that have a refrigerator
  47. Inhabited housings with available water from public net
  48. Inhabited housings that have drainage
  49. Inhabited housings with available electricity
  50. Inhabited housings that have a washing machine
  51. Inhabited housings with television
  52. Percentage of housing with piped water
  53. Percentage of housing with electricity
  54. Proportion of population with access to improved sanitation, urban and rural
  55. Proportion of population with sustainable access to improved sources of water supply, in urban and rural areas

There are were data for multiple years for most of them. I used all data from the last 10 years, approximately. For all data with multiple years, I calculated the mean value.

For data given in raw numbers, I calculated the appropriate per unit measures (per person, per economically active person (?), per household).

A matrix plot for all the S factor relevant data (e.g. not population size) is shown below. It shows missing data in red, as well as the relative difference between datapoints. Thus, cells that are completely white or black are outliers compared to the other data.

matrixplot

One variable (inmates per person) has a few missing datapoints.

Multiple other variables had strong outliers. I examined these to determine if they were real or due to data error.

Inspection revealed that the GDP per person data was clearly incorrect for one state (Campeche) but I could not find the source of error. The data is the same as on the website and did not match the data on Wikipedia. I deleted it to be safe.

The GDP change outlier seems to be real (Campeche) which has negative growth. According to this site, it is due to oil fields closing.

The rest of the outliers were hard to say something about due to the odd nature of the data (“dark crime”?), or were plausible. E.g. Mexico City (aka Federal District, the capital) was an outlier on nurses and doctors per capita, but this is presumably due to many large hospitals being located there.

Some data errors of my own were found and corrected but there is no guarantee there are not more. Compiling a large set of data like this frequently results in human errors.

Factor analysis

Since there were only 32 cases — 31 states + federal district — and 47 variables (excluding the bogus GDP per capita one), this gives problems for factor analysis. There are various recommendations, but almost none of them are met by this dataset (Zhao, 2009). To test limits, I decided to try factor analyzing all of the variables. This produced warnings:

The estimated weights for the factor scores are probably incorrect.  Try a different factor extraction method.
In factor.scores, the correlation matrix is singular, an approximation is used
Warning messages:
1: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
2: In cor.smooth(R) : Matrix was not positive definite, smoothing was done
3: In cor.smooth(r) : Matrix was not positive definite, smoothing was done
4: In cor.smooth(r) : Matrix was not positive definite, smoothing was done

Warnings such these do not always mean that the result is nonsense, but they often do. For that reason, I wanted to extract an S factor with a smaller number of variables. From the 47, I selected the following 21 variables as generally representative and interpretable:

  1. GDP.change,              #Economic
  2. Unemploy.men.rate,
  3. Unemploy.women.rate,
  4. Low.income.peap,
  5. High.income.peap,
  6. Prof.tech.employ.pct,
  7. crime.rate.per.adult,   #crime
  8. Inmates.per.pers,
  9. Unsafe.neighborhood.percept.rate,
  10. Has.water.net.per.hh,    #material goods
  11. Elec.pct,
  12. Has.wash.mach.per.hh,
  13. Doctors.per.pers,      #Health
  14. Nurses.per.pers,
  15. Hospital.beds.per.pers,
  16. Total.fertility,
  17. Home.births.pct,
  18. Maternal.death.rate,
  19. Life.expect,
  20. Women.participation,   #Gender equality
  21. Lit.young.women        #education

Note that peap = per economically active person, hh = household.

The selection was made by my judgment call and others may choose different variables.

Automatic reduction of dataset

As a robustness check and evidence against a possible claim that I picked the variables such as to get an S factor that most suited my prior beliefs, I decided to find an automatic method of selecting a subset of variables for factor analysis. I noticed that in the original dataset, some variables overlapped near perfectly. This would mean that whatever they measure, it would get measured twice or more when extracting a factor. Highly correlated variables can also create nonsense solutions, especially when extracting more than 1 factor.

Another piece of insight comes from the fact that for cognitive data, general factors extracted from a less broad selection of subtests are worse measures of general cognitive ability than those from broader selections (Johnson et al, 2008).

Lastly, subtests from different domains tend to be less correlated than those from the same domain (hence the existence of group factors).

Combining all this, it seems a decent idea that to reduce a dataset by 1 variable, one should calculate all the intercorrelations and find the highest one. Then one should remove one of the variables responsible for it. One can do this repeatedly to remove more than 1 variable from a dataset. Concerning the question of which of the two variables to remove, I can think of three ways: always removing the first, always the second, choosing at random. I implemented all three settings and chose the second as the default. This is because in many datasets the first of a set of highly correlated variables is usually the ‘primary one’, E.g. unemployment, unemployment men, unemployment women. The algorithm also outputs step-by-step information concerning which variables was removed and what their correlation was.

Having written the R code for the algorithm, I ran it on the Mexican dataset. I wanted to obtain a solution using the largest possible number of variables without getting a warning from the factor extraction function. So I first removed 1 variable, and then ran the factor analysis. When I received an error, I removed another, and so on. After having removed 20 variables, I no longer received an error. This left the analysis with 27 variables, or 6 more than my chosen selection. The output from the reduction algorithm was:

> s3 = remove.redundant(s, 20)
[1] "Dropping variable number 1"
[1] "Most correlated vars are Good.water.prop and Piped.water.pct r=0.997"
[1] "Dropping variable number 2"
[1] "Most correlated vars are Piped.water.pct and Has.water.net.per.hh r=0.996"
[1] "Dropping variable number 3"
[1] "Most correlated vars are Fertility.teen and Total.fertility r=0.99"
[1] "Dropping variable number 4"
[1] "Most correlated vars are Good.sani.prop and Has.drainage.per.hh r=0.984"
[1] "Dropping variable number 5"
[1] "Most correlated vars are Victims.crime.households and crime.rate.per.adult r=0.97"
[1] "Dropping variable number 6"
[1] "Most correlated vars are Nurses.per.pers and Doctors.per.pers r=0.962"
[1] "Dropping variable number 7"
[1] "Most correlated vars are Lit.young.men and Lit.young.women r=0.938"
[1] "Dropping variable number 8"
[1] "Most correlated vars are Elec.pct and Has.elec.per.hh r=0.938"
[1] "Dropping variable number 9"
[1] "Most correlated vars are Has.wash.mach.per.hh and Has.refrig.per.household r=0.926"
[1] "Dropping variable number 10"
[1] "Most correlated vars are Prisoner.rate and Inmates.per.pers r=0.901"
[1] "Dropping variable number 11"
[1] "Most correlated vars are Unemploy.women.rate and Unemploy.men.rate r=0.888"
[1] "Dropping variable number 12"
[1] "Most correlated vars are Women.participation and Has.computer.per.household r=0.877"
[1] "Dropping variable number 13"
[1] "Most correlated vars are Hospital.beds.per.pers and Doctors.per.pers r=0.87"
[1] "Dropping variable number 14"
[1] "Most correlated vars are Has.computer.per.household and Prof.tech.employ.pct r=0.868"
[1] "Dropping variable number 15"
[1] "Most correlated vars are Unemploy.men.rate and Unemployed.15plus.peap r=0.866"
[1] "Dropping variable number 16"
[1] "Most correlated vars are Has.tv.per.hh and Has.elec.per.hh r=0.864"
[1] "Dropping variable number 17"
[1] "Most correlated vars are Has.elec.per.hh and Has.drainage.per.hh r=0.851"
[1] "Dropping variable number 18"
[1] "Most correlated vars are Median.age and Prof.tech.employ.pct r=0.846"
[1] "Dropping variable number 19"
[1] "Most correlated vars are Home.births.pct and Low.income.peap r=0.806"
[1] "Dropping variable number 20"
[1] "Most correlated vars are Life.expect and Has.water.net.per.hh r=0.796

In my opinion the output shows that the function works. In most cases, the pair of variables found was either a (near-)double measure e.g. percent of population with electricity and percent of households with electricity, or closely related e.g. literacy in men and women. Sometimes however, the pair did not seem to be closely related, e.g. women’s participation and percent of households with a computer.

Since this dataset selected the variable with missing data, I used the irmi() function from the VIM package to impute the missing data (Templ et al, 2014).

Factor loadings: stability

The factor loading plots are shown below.

S_self_all S_self_automatic S_self_chosen

Each analysis relied upon a unique but overlapping selection of variables. Thus, it is possible to correlate the loadings of the overlapping parts for each analysis. This is a measure of loading stability in different factor analytic environments, as also done by Ree and Earles (1993) for general cognitive ability factor (g factor). The correlations were .98, 1.00, .98 (n’s 21, 27, 12), showing very high stability across datasets. Note that it was not possible to use the loadings from the Wikipedian data factor analysis because the variables were not strictly speaking overlapping.

Factor loadings: interpretation

Examining the factor loadings reveals some things of interest. Generally for all analyses, whatever that is generally considered good loads positively, and whatever considered bad loads negatively.

Unemployment (together, men, women) has positive loadings, whereas it ‘should’ have negative loadings. This is perhaps because the lower S factor states have more dysfunctional or no social security nets such that not working means starvation, and that this keeps people from not working. This is merely a conjecture because I don’t know much about Mexico. Hopefully someone more knowledgeable than me will read this and have a better answer.

Crime variables (crime rate, victimization, inmates/prisoner per capita, sentencing rate) load positively whereas it should load negatively. This pattern has been found before, see Kirkegaard (2015e) for a review of S factor studies and crime variables.

Factor scores

Next I correlated the factor scores from all 4 analysis with each other as well as HDI and cognitive ability as measured by PISA tests (the cognitive data is from Fuerst and Kirkegaard, 2015*; the HDI data from Wikipedia). The correlation matrix is shown below.

“regression” method S.all S.chosen S.automatic S.wiki HDI Cognitive ability
S.all 1.00 -0.08 -0.02 0.08 -0.17 -0.12
S.chosen -0.08 1.00 0.93 0.84 0.93 0.65
S.automatic -0.02 0.93 1.00 0.89 0.88 0.74
S.wiki 0.08 0.84 0.89 1.00 0.76 0.78
HDI -0.17 0.93 0.88 0.76 1.00 0.53
Cognitive ability -0.12 0.65 0.74 0.78 0.53 1.00

 

Strangely, despite the similar factor loadings, the factor scores from the factor extracted from all the variables had about no relation to the others. This probably indicates that the factor scoring method could not handle this type of odd case. The default scoring method for the factor analysis is “regression”, but there are a few others. Bartlett’s method yielded results for S.all that fit with the other factors, while none of the others did. See the psych package documentation for details (Revelle, 2015). I changed the extraction method for all the other analyses to Bartlett’s to remove method specific variance. The new correlation table is shown below:

Bartlett’s method S.all S.chosen S.automatic S.wiki HDI.mean Cognitive ability
S.all 1.00 0.79 0.88 0.88 0.68 0.74
S.chosen 0.79 1.00 0.95 0.87 0.93 0.70
S.automatic 0.88 0.95 1.00 0.88 0.89 0.74
S.wiki 0.88 0.87 0.88 1.00 0.75 0.78
HDI.mean 0.68 0.93 0.89 0.75 1.00 0.53
Cognitive ability 0.74 0.70 0.74 0.78 0.53 1.00

 

Intriguingly, now all the correlations are stronger. Perhaps Bartlett’s method is better for handling this type of extraction involving general factors from datasets with low case to variable ratios. It certainly deserves empirical investigation, including reanalysis of prior datasets. I reran the earlier parts of this paper with the Bartlett method. It did not substantially change results. The correlations between loadings across analysis increased a bit (to .98, 1.00, .99).

One possibility however is that the stronger results is just due to Bartlett’s method creating outliers that happen to lie on the regression line. This did not seem to be the case, see scatterplots below.

Correlation_matrix

S factor scores and cognitive ability

The next question is to what degree the within country differences in Mexico can be explained by cognitive ability. The correlations are in the above table as well, they are in the region .70 to .78 for the various S factors. In other words, fairly high. One could plot all of them vs. cognitive ability, but that would give us 4 plots. Instead, I plot only the S factor from my chosen variables since this has the highest correlation with HDI and thus the best claim for construct validity. It is also the most conservative option because of the 4 S factors, it has the lowest correlation with cognitive ability. The plot is shown below:

CA_S_chosen

We see that the federal district is a strong outlier, just like in the study with US states and Washington DC (Kirkegaard, 2015c). One should then remove it and rerun all the analyses. This includes the S factor extractions because the presence of a strong ‘mixed case’ (to be explained further in a future publication) affects the S factor extracted (see again, Kirkegaard, 2015c).

Analyses without Federal District

I reran all the analyses without the federal district. Generally, this did not change much with regards to loadings. Crime and unemployment still had positive loadings.

The loadings correlations across analyses increased to 1.00, 1.00, 1.00.

S.all S.chosen S.automatic S.wiki HDI mean Cognitive ability
S.all 1.00 0.99 0.98 0.93 0.85 0.78
S.chosen 0.99 1.00 0.98 0.94 0.88 0.80
S.automatic 0.98 0.98 1.00 0.90 0.90 0.75
S.wiki 0.93 0.94 0.90 1.00 0.75 0.77
HDI mean 0.85 0.88 0.90 0.75 1.00 0.56
Cognitive ability 0.78 0.80 0.75 0.77 0.56 1.00

 

The factor score correlations increased meaning that the federal district outlier was a source of discrepancy between the extraction methods. This can be seen in the scatterplots above in that  there is noticeable variation in how far from the rest the federal district lies. After this is resolved, the S factors from the INEG dataset are in near-perfect agreement (.99, .98, .98) while the one from Wikipedia data is less so but still respectable (.93, .94, .90). Correlations with cognitive ability also improved a bit.

Method of correlated vectors

In line with earlier studies, I examine whether the measures that are better measures of the latent S factor are also correlated more highly with the criteria variable, cognitive ability.

MCV_S_all MCV_S_automatic MCV_S_chosen MCV_S_wiki

The MCV results are strong: .90 .78 .85 and .92 for the analysis with all variables, chosen variables, automatically chosen variables and Wikipedian variables respectively. Note that these are for the analyses without the federal district, but they were similar with it too.

Discussion and conclusion

Generally, the present analysis reached similar findings to those before, especially with the one about US states. Cognitive ability was a very strong correlate of the S factors, especially once the federal district outlier was removed before the analysis. Further work is needed to find out why unemployment and crime variables sometimes load positively in S factor analyses with regions or states as the unit of analysis.

MCV analysis supported the idea that cognitive ability is related to the S factor, not just some non-S factor source of variance also present in the dataset.

Supplementary material

Data files, R code, figures are available at the Open Science Framework repository.

References

  • Fuerst, J. and Kirkegaard, E. O. W. (2015*). Admixture in the Americas part 2: Regional and National admixture. (Publication venue undecided.)
  • Johnson, W., Nijenhuis, J. T., & Bouchard Jr, T. J. (2008). Still just 1g: Consistent results from five test batteries. Intelligence, 36(1), 81-95.
  • Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2015a). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower.
  • Kirkegaard, E. O. W. (2015b). Indian states: G and S factors. The Winnower.
  • Kirkegaard, E. O. W. (2015c). Examining the S factor in US states. The Winnower.
  • Kirkegaard, E. O. W. (2015d). The S factor in China. The Winnower.
  • Kirkegaard, E. O. W. (2015e). The S factor in the British Isles: A reanalysis of Lynn (1979). The Winnower.
  • Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant mortality, stature, and literacy. Intelligence, 38(1), 93-100.
  • Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3), 271-278.
  • Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research. CRAN
  • Templ, M., Alfons A., Kowarik A., Prantner, B. (2014). VIM: Visualization and Imputation of Missing Values. CRAN
  • Zhao, N. (2009). The Minimum Sample Size in Factor Analysis. Encorewiki.

* = not yet published, year is expected publication year.

pumpkinperson.com/2015/04/13/is-the-sat-an-iq-test/

Blog commenter Lion of the Judah-sphere has claimed that the SAT does not correlate as well with comprehensive IQ tests as said IQ tests correlate with one another. At first I assumed he was wrong, but my recent analysis suggesting Harvard undergrads have an average Wechsler IQ of 122, really makes me wonder.

While an IQ of 122 (white norms) is 25 points above the U.S. mean of 97, it seems very low for a group of people who averaged 1490 out of 1600 on the SAT. According to my formula, since 1995 a score of 1490 on the SAT equated to an IQ of 141. But my formula was based on modern U.S. norms; because demographic changes have made the U.S. mean IQ 3 points below the white American mean (and made the U.S. standard deviation 3.4 percent larger than the white SD), converting to white norms reduces Harvard’s SAT IQ equivalent to 139.

In general, research correlating the SAT with IQ has been inconsistent, with correlations ranging from 0.4 to 0.8. I think much depends on the sample. Among people who took similar courses in similar high schools, the SAT is probably an excellent measure of IQ. But considering the wide range of schools and courses American teenagers have experienced, the SAT is not, in my judgement, a valid measure of IQ. Nor should it be. Universities should not be selecting students based on biological ability, but rather on acquired academic skills.

The lower values are due to restriction of range, e.g. Frey and Detterman (2004). When corrected, the value goes up to .7-.8 range. Also .54 using ICAR60 (Condon and Revelle, 2014) without correction for reliability or restriction.

As for the post, I can think of few things:

1. The sample recruited is likely not representative of Harvard. Probably mostly social sci/humanities students, who have lower scores.

2. Regression towards the mean means that the Harvard student body won’t be as exceptional on their second measurement as on their first. This is because some of the reason they were so high was just good luck.

3. The SAT is teachable to some extent which won’t transfer well to other tests. This reduces the correlation between between SAT and other GCA tests.

4. Harvard uses affirmative action which lowers the mean SAT of the students a bit. It seems to be about 1500.

SAT has an approx. mean of ~500 per subtest, ceiling of 800 and SD of approx. 100. So a 1500 total score is 750+750=(500+250)+(500+250)=(500+2.5sd)+(500+2.5sd), or about 2.5 SD above the mean. Test-retest reliability/stability for a few months is around .86 (mean of values here research.collegeboard.org/sites/default/files/publications/2012/7/researchreport-1982-7-test-disclosure-retest-performance-sat.pdf, n≈3700).

The interesting question is how much regression towards the mean we can expect? I decided to investigate using simulated data. The idea is basically that we first generate some true scores (per classical test theory), and then make two measurements of them. Then using the first measurement, we make a selection that has a mean 2.5 SD above the mean, then we check how well this group performs on the second testing.

In R, the way we do this, is to simulate some randomly distributed data, and then create new variables that are a sum of true score and error for the measurements. This presents us with a problem.

How much error to add?

We can solve this question either by trying some values, or analytically. Analytically, it is like this:

Cor (test1 x test2) = cor(test1 x true score) * cor(test2 x true score)

The correlation of each testing is due to their common association with the true scores. To simulate a testing, we need the correlation between testing and true score. Since test-rest is .863, we take the square root and get .929. The squared value of a correlation is the amount of variance it explains, so if we square it get back to where we were before. Since the total variance is 1, we can calculate the remaining variance as 1-.863=.137. We can take the square root of this to get the correlation, which is .37. Since the correlations are those we need when adding, we have to weigh the true score by .929 and the error by .370 to get the measurements such that they have a test-retest correlation of .863.

One could also just try some values until one gets something that is about right. In this case, just weighing the true score by 1 and error by .4 produces nearly the same result. Trying out a few values like this is faster than doing the math. In fact, I initially tried out some values to hit the approximate value but then decided to solve it analytically as well.

How much selection to make?

This question is more difficult analytically. I have asked some math and physics people and they could not solve it analytically. The info we have is the mean value of the selected group, which is about 2.5, relative to a standard normal population. Assuming we make use of top-down selection (i.e. everybody above a threshold gets selected, no one below), where must we place our threshold to get a mean of 2.5? It is not immediately obvious. I solved it by trying some values and calculating the mean trait value in the selected group. It turns out that to get a selected group with a mean of 2.5, the threshold must be 2.14.

Since this group is selected for having positive errors and positive true scores, their true scores and second testing scores will be lower. How much lower? 2.32 and 2.15 according to my simulation. I’m not sure why the second measurement scores are lower than the true scores, perhaps someone more learned in statistics can tell me.

So there is still some way down to an IQ of 122 from 132.3 (2.15*15+100). However, we may note that they used a shorter WAIS-R form, which correlates .91 with the full-scale. Factoring this in reduces our estimate to 129.4. Given the selectivity noted above, this is not so unrealistic.

Also, the result can apparently be reached simply by 2.5*.86. I was aware that this might work, but wasn’t sure so I tested it (the purpose of this post). One of the wonders of statistical software like this is that one can do empirical mathematics. :)

R code

##SAT AND IQ FOR HARVARD
library(dplyr)

#size and true score
n.size = 1e6
true.score = rnorm(n.size)

#add error to true scores
test.1 = .929*true.score + .370*rnorm(n.size)
test.2 = .929*true.score + .370*rnorm(n.size)
SAT = data.frame(true.score,
                 test.1,
                 test.2)
#verify it is correct
cor(SAT)

#select a subsample
selected = filter(SAT, test.1>2.14) #selected sample
describe(selected) #desc. stats

References

I ran into trouble trying to remove the effects of one variable on other variables doing the writing my reanalysis of the Noble et al 2015 paper. It was not completely obvious which exact method to use.

The authors in their own paper used OLS aka. standard linear regression. However, since I wanted to plot the results at the case-level, this was not useful to me. Before doing this I did not really understand how partial correlations worked, or what semi-partial correlations were, or how multiple regression works exactly. I still don’t understand the last, but the others I will explain with example results.

Generated data

Since we are cool, we use R.

We begin by loading some stuff I need and generating some data:

#load libs
 library(pacman) #install this first if u dont have it
 p_load(Hmisc, psych, ppcor ,MASS, QuantPsyc, devtools)
 source_url("https://github.com/Deleetdk/psych2/raw/master/psych2.R")
 #Generate data
 df = as.data.frame(mvrnorm(200000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical=TRUE))
 cor(df)[1,2]
[1] 0.5

The correlation from this is exactly .500000, as chosen above.

Then we add a gender specific effect:

df$gender = as.factor(c(rep("M",100000),rep("F",100000)))
 df[1:100000,"V2"] = df[1:100000,"V2"]+1 #add 1 to males
 cor(df[1:2])[1,2] #now theres error due to effect of maleness!

[1] 0.4463193

The correlation has been reduced a bit as expected. It is time to try and undo this and detect the real correlation of .5.

#multiple regression
 model = lm(V1 ~ V2+gender, df) #standard linear model
 coef(model)[-1] #unstd. betas
 lm.beta(model) #std. betas
V2        genderM 
0.4999994 -0.5039354
V2        genderM 
0.5589474 -0.2519683 

The raw beta coefficient is on spot, but the standardized is not at all. This certainly makes interpretation more difficult if the std. beta does not correspond in size with correlations.

#split samples
 df.genders = split(df, df$gender) #split by gender
 cor(df.genders$M[1:2])[1,2] #males
 cor(df.genders$F[1:2])[1,2] #females
[1] 0.4987116
[1] 0.5012883

These are both approximately correct. Note however that these are the values not for the total sample as before, but for each subsample. If the sumsamples differ in their correlation with, it will show up here clearly.

#partial correlation
df$gender = as.numeric(df$gender) #convert to numeric
partial.r(df, c(1:2),3)[1,2]
[1] 0.5000005

This is using an already made function for doing partial correlations. Partial correlations are calculated from the residuals of both variables. This means that they correlate whatever remains of the variables after everything that could be predicted from the controlling variable is removed.

#residualize both
df.r2 = residuals.DF(df, "gender")
cor(df.r2[1:2])[1,2]
[1] 0.5000005

This should be the exact same as the above, but done manually. And it is.

#semi-partials
spcor(df)$estimate #hard to interpet output
spcor.test(df$V1, df$V2, df$gender)[1] #partial out gender from V2
spcor.test(df$V2, df$V1, df$gender)[1] #partial out gender from V1
V1 V2 gender
V1 1.0000000 0.4999994 -0.2253951
V2 0.4472691 1.0000000 0.4479416
gender -0.2253103 0.5005629 1.0000000
estimate
1 0.4999994
estimate
1 0.4472691

Semi-partial correlations aka part correlations are the same as above, except that only one of the two variables is residualized by the controlled variable. The above has two different already made functions for calculating these. The first works on an entire data.frame, and outputs semi-partials for all variables controlling for all other variables. The output is a bit tricky to read however, and there is no explanation in the help. One has to read it by looking at each row, this is the original variable correlated with the residualized variables in each col.
The two calls below are using the other function one where has to specify which two variables to correlate and which variables to control for. The second variable called is the one that gets residualized by the control variables.
We see that the results are as they should be. Controlling V1 for gender has about zero effect. This is because gender has no effect on this variable aside from a very small chance effect (r=-0.00212261). Controlling V2 for gender has the desired effect of returning a value very close to .5, as it should.

#residualize only V2
df.r1.V1= df.r2 #copy above
df.r1.V1$V1 = df$V1 #fetch orig V1
cor(df.r1.V1[1:2])[1,2]
[1] 0.4999994
#residualize only V1
df.r1.V2= df.r2 #copy above
df.r1.V2$V2 = df$V2 #fetch orig V2
cor(df.r1.V2[1:2])[1,2]
[1] 0.4472691

These are the two manual ways of doing the same as above. We get the exact same results, so that is good.

So where does this lead us? Well, apparently using multiple regression to control variables is a bad idea since it results in difficult to interpret results.

UNFINISHED ANALYSIS. POSTED HERE TO ESTABLISH PRIORITY. MORE TO FOLLOW! Updated 2015-12-04

Remains to be done:

  • Admixture analysis (doing)
  • Proofreading and editing
  • Deciding how to control for age and scanner (technical question)

Abstract

I explore a large (N≈1000), open dataset of brain measurements and find a general factor of brain size (GBSF) that covers all regions except possibly the amygdala (loadings near-zero, 3 out of 4 negative). It is very strongly correlated with total brain size volume and surface area (rs>.9). The factor was (near)identical across genders after adjustments for age were made (factor congruence 1.00).

GBSF had similar correlations to cognitive measures as did other aggregate brain size measures: total cortical area and total brain volume. I replicated the finding that brain measures were associated with parental income and educational attainment.

 

Introduction

A recent paper by Noble et al (2015) has gotten considerable attention in the media. An interesting fact about the paper is that most of the data was published in the paper, perhaps inadvertently. I was made aware of this fact by an observant commenter, FranklinDMadoff, on the blog of James Thompson (Psychological Comments). In this paper I make use of the same data, revisit their conclusions as well do some of my own.

The abstract of the paper reads:

Socioeconomic disparities are associated with differences in cognitive development. The extent to which this translates to disparities in brain structure is unclear. We investigated relationships between socioeconomic factors and brain morphometry, independently of genetic ancestry, among a cohort of 1,099 typically developing individuals between 3 and 20 years of age. Income was logarithmically associated with brain surface area. Among children from lower income families, small differences in income were associated with relatively large differences in surface area, whereas, among children from higher income families, similar income increments were associated with smaller differences in surface area. These relationships were most prominent in regions supporting language, reading, executive functions and spatial skills; surface area mediated socioeconomic differences in certain neurocognitive abilities. These data imply that income relates most strongly to brain structure among the most disadvantaged children.

The results are not all that interesting, but the dataset is very large for a neuroscience study, the median sample size of median samples sizes in a meta-analysis of 49 meta-analysis is 116.5 (Button et al, 2013; based on the data in their Table 1). Furthermore, they provide a very large set of different, non-overlapping brain measurements which are useful for a variety of analyses, and they provide genetic admixture data which can be used for admixture mapping.

Why their results are as expected

The authors give their results (positive relationships between various brain size measures and parental educational and economic variables) environmental interpretations. For instance:

It is possible that, in these regions, associations between parent education and children’s brain surface area may be mediated by the ability of more highly educated parents to earn higher incomes, thereby having the ability to purchase more nutritious foods, provide more cognitively stimulating home learning environments, and afford higher quality child care settings or safer neighborhoods, with more opportunities for physical activity and less exposure to environmental pollutants and toxic stress3, 37. It will be important in the future to disambiguate these proximal processes by measuring home, family and other environmental mediators21.

However, one could also expect the relationship to be due to general cognitive ability (GCA; aka. general intelligence) and its relationship to favorable educational and economic outcomes, as well as brain measures. Figure 1 illustrates this expected relationship:

Figure 1

Figure 1 – Relationships between variables

The purple line is the one the authors are often arguing for based on their observed positive relationships. As can be seen in the figure, this positive relationship is also expected because of parental education/income’s positive relationship to parental GCA, which is related to parental brain properties which are highly heritable. Based on these well-known relationships, we can estimate some expected correlations. The true score relationship between adult educational attainment and GCA is somewhere around .56 (Strenze, 2007).

The relationship between GCA and whole brain size is around .24-.28, depending on whether one wants to use the unweighted mean, n-weighted mean or median, and which studies one includes of those collected by Pietschnig et al (2014). I used healthy samples (as opposed to clinical) and only FSIQ. This value is uncorrected for measurement error of the IQ test, typically assumed to be around .90. If we choose the conservative value of .24 and then correct with .90, we get .27 as an estimated true score correlation.

The heritability of whole brain size is very high. Bouchard (2014) summarized a few studies: one found cerebral total h^2 of = .89, another whole-brain grey matter .82, whole-brain white matter .87, and a third total brain volume .80. Perhaps there is some publication bias in these numbers, so we can choose .80 as an estimate. We then correct this for measurement error and get .89. None of the previous studies were corrected for restriction of range which is fairly common because most studies use university students (Henrich et al, 2010) who average perhaps 1 standard deviation above the population mean in GCA. If we multiply these numbers we get an estimate of r=.13 between parental education and total brain volume or a similar measure. As for income, the expected correlation is somewhat lower because the relationship between GCA and income is weaker, perhaps .23 (Strenze, 2007). This gives .05. However, Strenze did not account for the non-linearity of the income x GCA relationship, so it is probably somewhat higher.

Initial analyses

Analysis was done in R. Code file, figures, and data are available in supplementary material

Collecting the data

The authors did not just publish one datafile with comments about the variables, but instead various excel files were attached to parts of the figures. There are 6 such files. They all contain the same number of cases and they overlap completely (as can be seen by the subjectID column). The 6 files however do not overlap completely in their columns and some of them have unique columns. These can all be merged into one dataset.

Dealing with missing data

The original authors dealt with this simply by relying on the complete cases only. This method can bias the results when the data is not missing completely at random. Instead, it is generally better to impute missing data (Sterne et al, 2009). Figure 1 shows the matrixplot of the data file.

missing_data

The red areas mean missing data, except in the case of nominal variables which are for some reason always colored red (an error I think). Examining the structure of missing data showed that it was generally not possible to impute the data, since many cases were missing most of their values. One will have to exclude these cases. Doing so reduces the sample size from 1500 to 1068. The authors report having 1099 complete cases, but I’m not sure where the discrepancy arises.

Dealing with gender

Since males have much larger brain volumes than females, even after adjustment for body size, there is the question of how to deal with gender (no distinction is being made here between sex and gender). The original authors did this by regressing the effect out. However, in my experience, regression does not always accomplish this perfectly, so when possible one should just split the sample by gender and calculate results in each one-gender sample. One cannot do the sampling splitting when one is interested in the specific regression effect of gender, or when the resulting samples would be too small.

Dealing with age

This problem is tricky. The original authors used age and age2 to deal with age in a regression model. However, since I wanted to visualize the relationships between variables, this option was not useful to me because it would only give me the summary statistics with the effects of age, not the data. Instead, I calculated the residuals for all variables of interest after they were regressed on age, age2 and age3. The cubic age was used to further catch non-linear effects of age, as noted by e.g. Jensen (2006: 132-133).

Dealing with scanning site

One peculiar feature of the study not discussed by the authors was the relatively effect of different scanners on their results, see e.g. their Table 3. To avoid scanning site influencing the results, I also regressed this out (as a nominal variable with 13 levels).

Dealing with size

The dataset does not have size measures thus making it impossible to adjust for body size. This is problematic as it is known that body size correlates with GCA both within and between species. We are interested in differences in brain size holding body size equal. This cannot be done in the present study.

Factor analyzing brain size measurements

Why would one want to factor analyze brain measures?

The short answer is the same as that to the question: why would one want to factor analyze cognitive ability data? The answer: To explore the latent relationships in the data not immediately obvious. A factor analysis will reveal whether there is a general factor of some domain, which can be a theoretically very important discovery (Dalliard, 2013; Jensen, 1998:chapter 2). If there is no general factor, this will also be revealed and may be important as well. This is not to say that general factors or the lack thereof are the only interesting thing about the factor structure, multifactor structures are also interesting, whether orthogonal (uncorrelated) or as part of a hierarchical solution (Jensen, 2002).

The long answer is that human psychology is fundamentally a biological fact, a matter of brain physics and chemistry. This is not to say that important relationships can not fruitfully be described better at higher-levels (e.g. cognitive science), but that ultimately the origin of anything mental is biology. This fact should not be controversial except among the religious, for it is merely the denial of dualism, of ghosts, spirits, gods and other immaterial beings. As Jensen (1997) wrote:

Although the g factor is typically the largest component of the common factor variance, it is the most “invisible.” It is the only “factor of the mind” that cannot possibly be described in terms of any particular kind of knowledge or skill, or any other characteristics of psychometric tests. The fact that psychometric g is highly heritable and has many physical and brain correlates means that it is not a property of the tests per se. Rather, g is a property of the brain that is reflected in observed individual differences in the many types of behavior commonly referred to as “cognitive ability” or “intelligence.” Research on the explanation of g, therefore, must necessarily extend beyond psychology and psychometrics. It is essentially a problem for brain neurophysiology. [my emphasis]

If GCA is a property of the brain, or at least that there is an analogous general brain performance factor, it may be possible to find it with the same statistical methods that found the GCA. Thus, to find it, one must factor analyze a large, diverse sample of brain measurements that are known to correlate individually with GCA in the hope that there will be a general factor which will correlate very strongly with GCA. There is no guarantee as I see it that this will work, as I see it, but it is something worth trying.

In their chapter on brain and intelligence, Colom and Thompson (2011) write:

The interplay between genes and behavior takes place in the brain. Therefore, learning the language of the brain would be crucial to understand how genes and behavior interact. Regarding this issue, Kovas and Plomin (2006) proposed the so -called “ generalist genes ” hypothesis, on the basis of multivariate genetic research findings showing significant genetic overlap among cognitive abilities such as the general factor of intelligence ( g ), language, reading, or mathematics. The hypothesis has implication for cognitive neuroscience, because of the concepts of pleiotropy (one gene affecting many traits) and polygenicity (many genes affecting a given trait). These genetic concepts suggest a “ generalist brain ” : the genetic influence over the brain is thought to be general and distributed.

Which brain measurements have so far been found to correlate with GCA (or its IQ proxy)?

Below I have compiled a list of brain measurements that have at some point been found to be correlated with GCA IQ scores:

  • Brain evoked potentials: habituation time (Jensen, 1998:155)
  • Brain evoked potentials: complexity of waveform (Deary and Carol, 1997)
  • Brain intracellular pH-level (Jensen, 1998:162)
  • Brain size: total and brain regions (Jung and Haier, 2007)
  • Of the above, grey matter and white matter separate
  • Cortical thickness (Deary et al, 2010)
  • Cortical development (Shaw, P. et al. 2006)
  • Nerve conduction velocity (Deary and Carol, 1997)
  • Brain wave (EEG) coherence (Jensen, 2002)
  • Event related desynchronization of brain waves (Jensen, 2002)
  • White matter lesions (Turken et al, 2008)
  • Concentrations of N-acetyl aspartate (Jung, et al. 2009)
  • Water diffusion parameters (Deary et al, 2010)
  • White matter integrity (Deary et al, 2010)
  • White matter network efficiency (Li et al. 2009)
  • Cortical glucose metabolic rate during mental activity / Neural efficiency (Neubauer et al, 2009)
  • Uric acid level (Jensen, 1998:162)
  • Density of various regions (Frangou et al 2004)
  • White matter fractional anisotropy (Navas‐Sánchez et al 2014; Kim et al 2014)
  • Reliable responding to changing inputs (Euler et al, 2015)

Most of the references above lead to the reviews I relied upon (Deary and Carol, 1997; Jensen, 1998, 2002; Deary et al, 2010). There are surely more, and probably a large number of the above are false-positives. Some I could not find a direct citation for. We cannot know which are false positives until large datasets are compiled with these measures as well as a large number of cognitive tests. A simple WAIS battery won’t suffice, there needs to be elementary cognitive tests too, and other tests that vary more in content, type and g-loading. This is necessary if we are to use the method of correlated vectors as this does not work well without diversity in factor indicators. It is also necessary if we are to examine non-GCA factors.

My hypothesis is that if there is a general brain factor, then it will have a hierarchical structure similar to GCA. Figure 2 shows a hypothetical structure of this.

Figure 2

Notes: Where squares at latent variables and circles are observed variables. I am aware this is opposite of normal practice (e.g. Beaujean, 2014) but text is difficult to fit into circles.

Of these, the speed factor has to do with speed of processing which can be enhanced in various ways (nerve conduction velocity, higher ‘clock’ frequency). Efficiency has to do with efficient use of resources (primarily glucose). Connectivity has to do with better intrabrain connectivity, either by having more connections, less problematic connections or similar. Size has to do with having more processing power by scaling up the size. Some areas may matter more than others for this. Integrity has to do with withstanding assaults, removing garbage (which is known to be the cause of many neurodegenerative diseases) and the like. There are presumably more factors, and some of mine may need to be split.

Previous studies and the present study

Altho factor analysis is common in differential psychology and related fields, it is somewhat rare outside of those. And when it is used, it is often done in ways that are questionable (see e.g. controversy surrounding Hampshire et al (2012): Ashton et al (2014a), Hampshire et al (2014), Ashton et al (2014b), Haier et al (2014a), Ashton et al (2014c), Haier et al (2014b)). On the other hand, factor analytic methods have been used in a surprisingly diverse collection of scientific fields (Jöreskog 1996; Cudeck and MacCallum, 2012).

I am only familiar with one study applying factor analysis to different brain measures and it was a fairly small study at n=132 (Pennington et al, 2000). They analyzed 13 brain regions and reported a two-factor solution. It is worth quoting their methodology section:

Since the morphometric analyses yield a very large number of variables per subject, we needed a data reduction strategy that fit with the overall goal of exploring the etiology of individual differences in the size of major brain structures. There were two steps to this strategy: (1) selecting a reasonably small set of composite variables that were both comprehensive and meaningful; and (2) factor analyzing the composite variables. To arrive at the 13 composite variables discussed earlier, we (1) picked the major subcortical structures identified by the anatomic segmentation algorithms, (2) reduced the set of possible cortical variables by combining some of the pericallosal partitions as described earlier, and (3) tested whether it was justifiable to collapse across hemispheres. In the total sample, there was a high degree of correlation (median R=.93, range=.82-.99) between the right and left sides of any given structure; it thus seemed reasonable to collapse across hemispheres in creating composites. We next factor-analyzed the 13 brain variables in the total sample of 132 subjects, using Principal Components factor analysis with Varimax rotation (Maxwell & Delaney, 1990). The criteria for a significant factor was an eigenvalue>l.0, with at least two variables loading on the factor.

The present study makes it possible to perform a better analysis. The sample is about 8 times larger and has 27 non-overlapping measurements of brain size, broadly speaking. The major downside of the variables in the present study is that the cerebral is not divided into smaller areas as done in their study. Given the very large sample size, one could use 100 variables or more.

The available brain measures are:

  1. cort_area.ctx.lh.caudalanteriorcingulate
  2. cort_area.ctx.lh.caudalmiddlefrontal
  3. cort_area.ctx.lh.fusiform
  4. cort_area.ctx.lh.inferiortemporal
  5. cort_area.ctx.lh.middletemporal
  6. cort_area.ctx.lh.parsopercularis
  7. cort_area.ctx.lh.parsorbitalis
  8. cort_area.ctx.lh.parstriangularis
  9. cort_area.ctx.lh.rostralanteriorcingulate
  10. cort_area.ctx.lh.rostralmiddlefrontal
  11. cort_area.ctx.lh.superiortemporal
  12. cort_area.ctx.rh.caudalanteriorcingulate
  13. cort_area.ctx.rh.caudalmiddlefrontal
  14. cort_area.ctx.rh.fusiform
  15. cort_area.ctx.rh.parsopercularis
  16. cort_area.ctx.rh.parsorbitalis
  17. cort_area.ctx.rh.parstriangularis
  18. cort_area.ctx.rh.rostralanteriorcingulate
  19. cort_area.ctx.rh.rostralmiddlefrontal
  20. vol.Left.Cerebral.White.Matter
  21. vol.Left.Cerebral.Cortex
  22. vol.Left.Hippocampus
  23. vol.Left.Amygdala
  24. vol.Right.Cerebral.White.Matter
  25. vol.Right.Cerebral.Cortex
  26. vol.Right.Hippocampus
  27. vol.Right.Amygdala

I am not expert in neuroscience, but as far as I know, the above measurements are independent and thus suitable for factor analysis. They reported additional aggregate measures such as total surface area and total volume. They also reported total cranial volume, which permits the calculations of another two brain measurements: the non-brain volume of the cranium (subtracting total brain volume from total intracranial volume), and the proportion of intracranial volume used for brain.

The careful reader has perhaps noticed something bizarre about the dataset, namely that there is an unequal number of left hemisphere (“lh”) and right hemisphere (“rh”) regions (11 vs. 8). I have no idea why this is, but it is somewhat problematic in factor analysis since this weights some variables twice as well as weighting the left side a bit more.

The present dataset is inadequate for properly testing the general brain factor hypothesis because it only has measurements from one domain: size. The original authors may have more measurements they did not publish. However, one can examine the existence of the brain size factor, as a prior test of the more general hypothesis.

Age and overall brain size

As an initial check, I plotted the relationship between total brain size measures and age. These are shown in Figure 3 and 4.

Figure 3 Figure 4

Curiously, these show that the size increase only occurs up to about age 8 and 10, or so. I was under the impression that brain size continued to go up until the body in general stopped growing, around 15-20 years. This study does not appear to be inconsistent with others (e.g. Giedd, 1999). The relationship is clearly non-linear, so one will need to use the age corrections described above. To see if the correction worked, we plot the total size variables and age. There should be near-zero correlation. Results in Figures 5 and 6.

Figure 5 Figure 6

Instead we still see a slight correlation for both genders, both apparently due to a single outlier. Very odd. I examined these outliers (IDs: P0009 and P0010) but did not see anything special about them. I removed them and reran the residualization from the original data. This produced new outliers similar to before (with IDs following them). When I removed them, new ones. I figure it is due to some error with the residualization process. Indeed, a closer look revealed that the largest outliers (positive and negative) were always the first two indexes. I thus removed these before doing more analyses. The second largest outliers had no particular index. I tried removing more age outliers, but it was not possible to completely remove the correlations between age and the other variables (usually remained near r=.03). Figure 6a shows the same as Figure 6 just without the two outliers.

Figure 6a

The genders are somewhat displaced on the age variable, but if one looks at the x-axis, one an see that this is in fact a very, very small difference.

General brain size factor with and without residualization

Results for the factor analysis without residualization are shown in Figure 7. I used the fa() function from the psych package with default settings: 1 factor extracted with the minimum residuals method. Previous studies have shown factor extraction method to be of little importance as long as it isn’t principal components with a smaller number of variables (Kirkegaard, 2014).

Figure 7

We see that the factors are quite similar (factor congruence .95) but that the male factor is quite a bit stronger (var% M/F 26 vs. 16). This suggests that the factor either works differently in the genders, or there is error in the results. If it is error, we should see an improvement after removing some of it. Figure 8 shows the same plot using the residualized data.

Figure 8

The results were more similar now and stronger for both genders (var% M/F = 34 vs. 33).

The amygdala results are intriguing, suggesting that this region does not increase in size along with the rest of the brain. The right amygdala even had negative loadings in both genders.

Using all that’s left

The next thing one might want to do is extract multiple factors. I tried extracting various solutions with nfactors 3-5. These however are bogus models due to the near-1 correlation between the brain sides. This results in spurious factors that load on just 2 variables (left and right versions) with loadings near 1. One could solve this by either averaging those with 2 measurements, or using only those from the left side. It makes little difference because they correlate so highly. It should be noted tho that doing this means one can’t see any lateralization effects such as that suggested for the right amygdala.

I redid all the results using the left side variables only. Figure 9 shows the results.

Figure 9

Now all regions had positive loadings and the var% increased a bit for both genders to 36/36. Factor congruence was 1.00, even for the non-residualized data. It thus seems that the missing measures of the right side or the use of near-doubled measures had a negative impact on the results as well.

One can calculate other measures of factor strength/internal reliability, such as the average intercorrelation, Cronbach’s alpha, Guttman’s G6. These are shown in Table 1.

Table 1- Internal reliability measures
Sample Mean r Alpha (raw) Alpha (std.) G6
Male .33 .48 .88 .90
Female .34 .45 .89 .90

 

Multiple factors

We are now ready to explore factor solutions with more factors. Three different methods suggested extracted at most 5 factors both datasets (using nScree() from nFactors package). I extracted solutions for 2 to 6 factors for each dataset, the last included by accident. All of these were extracted with oblique rotation method of oblimin thus possibly returning correlated factors. The prediction from a hierarchical model is clear: factors extracted in this way should be correlated. Figures 10 to 14 show the factor loadings of these solutions.

Figure 10 Figure 11 Figure 12 Figure 13

Figure 14

So it looks like results very pretty good with 4 factors and not too good with the others. The problem with this method is that the factors extracted may be similar/identical but not in the same order and with the same name. This means that the plots above may plot the wrong factors together which defeats the entire purpose. So what we need is an automatic method of pairing up the factors correctly if possible. The exhaustive method is trying all the pairings of factors for each number of factors to extract, and then calculating some summary metrics or finding the best overall pairing combination. This would involve quite a lot of comparisons, since e.g. one can pair up set 2 sets of, say, 5 factors in 5*4/2 ways (10).

I settled for a quicker solution. For each factor solution pair, I calculated all the cross-analysis congruence factors. Then for each factor, I found the factor from the other analysis it had the highest congruence with and saved this information. This method can miss some okay but not great solutions, but I’m not overly concerned about those. In a good fit, the factors found in each analysis should map 1 to 1 to each other such that their highest congruence is with the analog factor from the other analysis.

From this information, I calculated the mean of the best congruence pairs, the minimum, and whether there was a mismatch. A mismatch occurs when two or more factors from one analysis maps to (has the highest congruence) with the same factor from the other analysis. I calculated three metrics for all the analyses performed above. The results are shown in Table 2.

Table 2 – Cross-analysis comparison metrics
Factors.extracted mean.best.cong min.best.cong factor.mismatch
2 0.825 0.73 FALSE
3 0.713 0.37 TRUE
4 0.960 0.93 FALSE
5 0.720 0.35 TRUE
6 0.765 0.58 FALSE

 

As can be seen, the two analyses with 4 factors were a very good match. Those with 3 and 5 terrible as they produced factor mismatches. The analyses with 2 and 6 were also okay.

The function for going thru all the oblique solutions for two samples also returns the necessary information to match up the factors if they need reordering. If there is a mismatch, this operation is nonsensical, so I won’t re-do all the plots. The plot above with 4 factors just happens to already be correctly ordered. This however need not be the case. The only plot that needs to be redone is that with 6 factors. It is shown in Figure 15.

Figure 15

Compare with figure 14 above. One might wonder whether the 4 or 6 factor solutions are the best. In this case, the answer is the 4 factor solutions because the female 6 factor solution is illegal — one factor loading is above 1 (“a Heywood case”). At present, given the relatively few regional measures, and the limitation to only volume and surface measures, I would not put too much effort into theorizing about the multifactor structure found so far. It is merely a preliminary finding and may change drastically when more measures are added or measures are sampled differently.

A more important finding from all the multifactor solutions was that all produced correlated factors, which indicates a general factor.

Aggregate measures and the general brain size factor

So, the general brain size factor (GBSF) may exist, but is it useful? At first, we may want to correlate the various aggregate variables. Results are in Table 3.

Table 3 – Correlations between aggregate brain measures
cort_area.ctx.total cort_area.ctx.lh.total vol.WholeBrain vol.IntracranialVolume GBSF
cort_area.ctx.total 0.997 0.869 0.746 0.953
cort_area.ctx.lh.total 0.997 0.867 0.751 0.953
vol.WholeBrain 0.832 0.832 0.822 0.923
vol.IntracranialVolume 0.638 0.642 0.798 0.776
GBSF 0.950 0.950 0.905 0.711

Notes: Correlations above diagonal are males, below females.

The total areas of the brain are almost symmetrical: the correlation of the total surface area and left side only is a striking .997. Intracranial volume is a decent proxy (.822) for whole brain volume, but is somewhat worse for total surface area (.746). GBSF has very strong correlations with the surface areas (.95), but not quite as strong as the analogous situation in cognitive data: IQ and extracted general factor (GCA factor) usually correlate .99 with a reasonable sample of subtests: Ree and Earles (1991) reported that an average GCA factor correlated .991 with an unweighted sum score in a sample of >9k, Kirkegaard (2014b) found a .99 correlation between extracted GCA and an unweighted sum in a Dutch university sample of ~500.

Correlations with cognitive measures

The authors have data for 4 cognitive tests, however, data are only public for 2 of them. These are in the authors’ words:

Flanker inhibitory control test (N = 1,074).
The NIH Toolbox Cognition Battery version of the flanker task was adapted from the Attention Network Test (ANT). Participants were presented with a stimulus on the center of a computer screen and were required to indicate the left-right orientation while inhibiting attention to the flankers (surrounding stimuli). On some trials the orientation of the flankers was congruent with the orientation of the central stimulus and on the other trials the flankers were incongruent. The test consisted of a block of 25 fish trials (designed to be more engaging and easier to see to make the task easier for children) and a block of 25 arrow trials, with 16 congruent and 9 incongruent trials in each block, presented in pseudorandom order. Participants who responded correctly on 5 or more of the 9 incongruent trials then proceeded to the arrows block. All children age 9 and above received both the fish and arrows blocks regardless of performance. The inhibitory control score was based on performance on both congruent and incongruent trials. A two-vector method was used that incorporated both accuracy and reaction time (RT) for participants who maintained a high level of accuracy (>80% correct), and accuracy only for those who did not meet this criteria. Each vector score ranged from 0 to 5, for a maximum total score of 10 (M = 7.67, s.d. = 1.86).
List sorting working memory test (N = 1,084).
This working memory measure requires participants to order stimuli by size. Participants were presented with a series of pictures on a computer screen and heard the name of the object from a speaker. The test was divided into the One-List and Two-List conditions. In the One-List condition, participants were told to remember a series of objects (food or animals) and repeat them in order, from smallest to largest. In the Two-List condition, participants were told to remember a series of objects (food and animals, intermixed) and then again report the food in order of size, followed by animals in order of size. Working memory scores consisted of combined total items correct on both One-List and Two-List conditions, with a maximum of 28 points (M = 17.71, s.d. = 5.39).

I could not locate a factor analytic study for the Flanker test, so I don’t know how g-loaded it is. Working memory (WM) is known to have a strong relationship to GCA (Unsworth et al, 2014). The WM variable should probably be expected to be the most g-loaded of the two. The implication given the causal hypothesis of brain size for GCA is that the WM test should show higher correlations to the brain measures. Figures X and X show the histograms for the cognitive measures.

Working_memory_noAge_histogramFlanker_noAge_histogram

Note that the x-values do not have any interpretation as they are the residual raw values, not raw values. For the Flanker test, we see that it is bimodal. It seems that a significant part of the sample did not understand the test and thus did very poorly. One should probably either remove them or use a non-parametric measure if one wanted to rely on this variable. I decided to remove them since the sample was sufficiently large that this wasn’t a big problem. The procedure reduced the skew from -1.3/-1.1 to -.2/-.1 respectively for the male and female samples. The sample sizes were reduced from 548/516 to 522/487 respectively. One could plausibly combine them into one measure which would perhaps be a better estimate of GCA than either of them alone. This would be the case if their g-loading was about similar. If however, one is much more g-loaded than the other, it would degrade the measurement towards a middle level. I combined the two measures by first normalizing them (to put them on the same scale) and then averaging them.

Given the very high correlations between the GBSF of these data and the other aggregate measures, it is not expected that the GBSF will correlate much more strongly with cognitive measures than the other aggregate brain measures. Table X shows the correlations.

Table X – Correlations between cognitive measures and aggregate brain size measures
Variable WM Flanker Combined WM Flanker Combined
Males Females
WM
Flanker 0.407 0.393
WM.Flanker.mean 0.830 0.847 0.824 0.845
cort_area.ctx.total 0.302 0.138 0.235 0.236 0.201 0.237
cort_area.ctx.lh.total 0.302 0.137 0.235 0.239 0.203 0.238
vol.WholeBrain 0.263 0.103 0.201 0.158 0.120 0.146
vol.IntracranialVolume 0.213 0.101 0.170 0.154 0.101 0.137
GBSF 0.311 0.147 0.252 0.223 0.181 0.218

 

As for the GBSF, given that it is a ‘distillate’ (Jensen’s term), one would expect it to have slightly higher correlations with the cognitive measures than the merely unweighted ‘sum’ measures. This was the case for males, but not females. In general, the female correlations were weaker, especially the whole brain volume x WM (.263 vs. .158). Despite the large sample sizes, this difference is not very certain, the 95% confidence intervals are -.01 to .22. A larger sample is necessary to examine this question. The finding is intriguing is that if real, it would pose an alternative solution to the Ankney-Rushton anomaly, that is, the fact that males have greater brain size and this is related to IQ scores, but do not consistently perform better on IQ tests (Jackson and Rushton, 2006). Note however that the recent large meta-analysis of brain size x IQ studies did not find an effect of gender, so perhaps the above results are a coincidence (Pietschnig et al 2014).

We also see that the total cortical area variables were stronger correlates of cognitive measures than whole brain volume, but a larger sample is necessary to confirm this pattern.

Lastly, we see a moderately strong correlation between the two cognitive measures (≈.4). The combined measure was a weaker correlate of the criteria variables, which is what is expected if the Flanker test was a relatively weaker test of GCA than the WM one.

Correlations with parental education and income

It is time to revisit the results reported by the original authors, namely correlations between educational/economic variables and brain measures. I think the correlations between specific brain regions and criteria variables is mostly a fishing expedition of chance results (multiple testing) and of no particular interest unless strong predictions can be made before looking at the data. For this reason, I present only correlations with the aggregate brain measures, as seen in Table X.

Table x – Correlations between educational/economic variables and other variables
Variable ED ln_Inc Income ED ln_Inc Income
Males Females
WM 0.131 0.192 0.175 0.170 0.229 0.174
Flanker 0.163 0.180 0.188 0.118 0.131 0.106
WM.Flanker.mean 0.168 0.215 0.206 0.178 0.215 0.165
cort_area.ctx.total 0.104 0.217 0.207 0.128 0.173 0.154
cort_area.ctx.lh.total 0.108 0.215 0.208 0.133 0.170 0.152
vol.WholeBrain 0.103 0.190 0.195 0.064 0.112 0.078
vol.IntracranialVolume 0.126 0.157 0.159 0.086 0.104 0.100
GBSF 0.109 0.206 0.204 0.100 0.157 0.137
ED 0.559 0.542 0.561 0.513
ln_Inc 0.559 0.866 0.561 0.855
Income 0.542 0.866 0.513 0.855

 

Here the correlations of the combined cognitive measure was higher than WM, unlike before, so perhaps the diagnosis from before was wrong. In general, the correlations of income and brain measures were stronger than that for education. This is despite the fact that GCA is more strongly correlated to educational attainment than income. This was however not the same in this sample: correlations of WM and Flanker were stronger with the economic variables. Perhaps there is more range restriction in the educational variable than the income one. An alternative environmental interpretation is that it is the affluence that causes the larger brains.

If we recall the theoretic predictions of the strength of the correlations, the incomes are stronger than expected (actual .19/.09 M/F, predicted about .05), while the educational ones are a bit weaker than expected (actual .1/.6, predicted about .13). However, the sample sizes are not larger enough for these results to be certain enough to question the theory.

Racial admixture

To me surprise, the sample had racial admixture data. This is surprising because such data has been available to testing the genetic hypothesis of group differences for many years, apparently without anyone publishing something on the issue. As I argued elsewhere, this is odd given that a good dataset would be able to decisively settle the ‘race and intelligence’ controversy (Dalliard, 2014; Rote and Rodgers, 2005; Rushton and Jensen, 2005). It is actually very good evidence for the genetic hypothesis because if it was false, and these datasets showed it, it would have been a great accomplishment for a mainstream scientist to publish a paper decisively showing that it was indeed false. However, if it was true, then any mainstream scientist could not publish it without risking personal assaults, getting fired and possibly pulled in court as were academics who previously researched that topic (Gottfredson, 2005; Intelligence 1998; Nyborg 2011; 2003).

The genomic data however appeared to be an either/or (1 or 0) variable in the released data files. Oddly, some persons had no value for any racial group. It turns out that the data was merely rounded in the spreadsheet file. This explained why some persons had 0 for all groups: These persons did not belong at least 50% to any racial group, and thus they were assigned a 0 in every case.

I can think of two ways to count the number of persons in the main categories. One can count the total ‘summed’ persons. In this way, if person A has 50% ancestry from race R, and person B has 30%, this would sum to .8 persons. One can think of it as the number of pure-breed persons’ worth of ancestry from that that group. Another way is to count everybody as 1 who is above some threshold for ancestry. I chose to use 20% and 80% for thresholds, which correspond with persons with substantial ancestry from that racial cluster, and persons with mostly ancestry from that cluster. One could choose other values of course, and there is a degree of arbitrariness, but it is not important what the particular values are.

Results are in Table X.

Racial group European African Amerindian East Asian Oceanian Central Asia Sum
Summed ancestry ‘persons’ 686.364 134.2714 48.31457 163.49868 8.59802 26.95408 1068.00075
Persons with >20% 851 187 89 238 8 30 1403
Persons with >80% 647 105 3 121 0 21 897

 

Note that the number 1068 is the exact number of persons in the complete sample, which means that the summed ancestry for all groups has an error of a mere .00075.

Another way of understanding the data is to plot histograms of each racial group. These are shown below in Figures X to X.

Race_European_histogramRace_African_histogram Race_Amerindian_histogram Race_East_Asian_histogram   Race_Oceanian_histogramRace_Central_Asian_histogram

 

Since European ancestry is the largest, the other plots are mostly empty except for the 0% bar. But we do see a fair amount of admixture in the dataset.

Regression, residualization, correlation and power

There are a couple of different methods one could use to examine the admixture data. A simple correlation is justified when dealing with a group that only has 2 sources of ancestry. This is the easiest case to handle. For this to work, the groups most have a different genotypic mean of the trait in question (GCA and brain size variables in this case) and there must be a substantially admixtured population. Even given a large hypothesized genotypic difference, the expected correlation is actually quite small. For African Americans (such as those in the sample), their European ancestry% is about 15-25% depending on the exact sample. The standard deviation of their European ancestry% is not always reported, but one can calculate it if one has some data, which we do.

The first problem with this dataset is that there are no sociological race categories (“white”, “African American”, “Hispanic”, “Asian” etc.), but only genomic data. This means that to get an African American subsample, we must create one based on actual actual ancestry. There are two criteria that needs to be met for inclusion in that group: 1) the person must be substantially African, 2) the person must be mostly a mix of European and African ancestry. Going with the values from before, this means that the person must be at least 20% African, and at least 80% combined European and African.

Dealing with scanner and site

There are a variety of ways to use the data and they may or may not give similar results. First is the question of which variables to control for. In the earlier sections of this paper, I controlled for Age, Age2, Age3, Scanner (12 different). For producing the above ancestry plots and results I did not control for anything. Controlling the ancestry variables for scanner is problematic as people from different races live in different places. Controlling for this thus removes the racial differences for no reason. One could similarly control for site where the scanner is (I did not do this earlier). We can compare this to scanner by a contingency table, as shown in Table X below:

Table X – Contingency table of scanner site and scanner #
Site/scanner 0 1 10 11 12 2 3 4 5 6 7 8 9
Cornel 0 0 0 96 0 0 0 0 0 0 0 0 0
Davis 0 0 0 0 0 0 0 0 0 0 114 0 0
Hawaii 0 0 0 0 0 0 0 0 0 202 0 0 0
KKI 0 0 0 0 0 0 103 0 0 0 0 0 0
MGH 0 0 0 0 0 0 0 0 115 0 0 0 13
UCLA 0 0 27 0 22 0 0 10 0 0 0 0 0
UCSD 109 93 0 0 0 0 0 0 0 0 0 0 0
UMMS 0 0 0 0 0 56 0 0 0 0 0 0 0
Yale 0 0 0 0 0 0 0 0 0 0 0 108 0

 

As we can see, these are clearly inter-dependent, given the obvious fact that the scanners have a particular location and was not moved around (all columns have only 1 cell with value>0). Some sites however have multiple scanners, some have only one. E.g. UCSD has two scanners (#0 and #1), while KKI has only one (#3).

Controlling for scanner however makes sense if we are looking at brain size variables, as this removes differences between measurements due to differences in the scanning equipment or (post-)processing. So perhaps one would want to control brain measurements for scanner and age effects, but only control the remaining variables for age affects.

Dealing with gender

As before

To be continued…

 

References

  • Ashton, M. C., Lee, K., & Visser, B. A. (2014a). Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012). Personality and Individual Differences, 60, 3-7.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014b). Orthogonal factors of mental ability? A response to Hampshire et al. Personality and Individual Differences, 60, 13-15.
  • Ashton, M. C., Lee, K., & Visser, B. A. (2014c). Further response to Hampshire et al. Personality and Individual Differences, 60, 18-19.
  • Beaujean, A. A. (2014). Latent Variable Modeling Using R: A Step by Step Guide: A Step-by-Step Guide. Routledge.
  • Bouchard Jr, T. J. (2014). Genes, Evolution and Intelligence. Behavior genetics, 44(6), 549-577.
  • Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
  • Colom, R., & Thompson, P. M. (2011). Intelligence by Imaging the Brain. The Wiley-Blackwell handbook of individual differences, 3, 330.
  • Cudeck, R., & MacCallum, R. C. (Eds.). (2012). Factor analysis at 100: Historical developments and future directions. Routledge.
  • Dalliard, M. (2013). Is Psychometric g a Myth?. Human Varieties.
  • Dalliard, M. (2014). The Elusive X-Factor: A Critique of J. M. Kaplan’s Model of Race and IQ. Open Differential Psychology.
  • Deary, I. J., & Caryl, P. G. (1997). Neuroscience and human intelligence differences. Trends in Neurosciences, 20(8), 365-371.
  • Deary, I. J., Penke, L., & Johnson, W. (2010). The neuroscience of human intelligence differences. Nature Reviews Neuroscience, 11(3), 201-211.
  • Dekaban, A.S. and Sadowsky, D. (1978). Changes in brain weights during the span of human life: relation of brain weights to body heights and body weights, Ann. Neurology, 4:345-356.
  • Euler, M. J., Weisend, M. P., Jung, R. E., Thoma, R. J., & Yeo, R. A. (2015). Reliable Activation to Novel Stimuli Predicts Higher Fluid Intelligence. NeuroImage.
  • Frangou, S., Chitins, X., & Williams, S. C. (2004). Mapping IQ and gray matter density in healthy young people. Neuroimage, 23(3), 800-805.
  • Giedd, J. N., Blumenthal, J., Jeffries, N. O., Castellanos, F. X., Liu, H., Zijdenbos, A., … & Rapoport, J. L. (1999). Brain development during childhood and adolescence: a longitudinal MRI study. Nature neuroscience, 2(10), 861-863.
  • Gottfredson, L. S. (2005). Suppressing intelligence research: Hurting those we intend to help. In R. H. Wright & N. A. Cummings (Eds.), Destructive trends in mental health: The well-intentioned path to harm (pp. 155-186). New York: Taylor and Francis.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014a). A comment on “Fractionating Intelligence” and the peer review process. Intelligence, 46, 323-332.
  • Haier, R. J., Karama, S., Colom, R., Jung, R., & Johnson, W. (2014b). Yes, but flaws remain. Intelligence, 46, 341-344.
  • Hampshire, A., Highfield, R. R., Parkin, B. L., & Owen, A. M. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237.
  • Hampshire, A., Parkin, B., Highfield, R., & Owen, A. M. (2014). Response to:“Higher-order g versus blended variable models of mental ability: Comment on Hampshire, Highfield, Parkin, and Owen (2012)”. Personality and Individual Differences, 60, 8-12.
  • Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world?. Behavioral and brain sciences, 33(2-3), 61-83.
  • Intelligence. (1998). Special issue dedicated to Arthur Jensen. Volume 26, Issue 3.
  • Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general mental ability from 100,000 17-to 18-year-olds on the Scholastic Assessment Test. Intelligence, 34(5), 479-486.
  • Jensen, A. R., & Weng, L. J. (1994). What is a good g?. Intelligence, 18(3), 231-258.
  • Jensen, A. R. (1997). The psychometrics of intelligence. In H. Nyborg (Ed.), The scientific study of human nature: Tribute to Hans J. Eysenck at eighty. New York: Elsevier. Pp. 221—239.
  • Jensen, A. R. (1998). The g Factor: The Science of Mental Ability. Preager.
  • Jensen, A. R. (2002). Psychometric g: Definition and substantiation. The general factor of intelligence: How general is it, 39-53.
  • Jung, R. E. & Haier, R. J. (2007). The Parieto-Frontal Integration Theory (P-FIT) of intelligence: converging neuroimaging evidence. Behav. Brain Sci. 30, 135–154; discussion 154–187.
  • Jung, R. E. et al. (2009). Imaging intelligence with proton magnetic resonance spectroscopy. Intelligence 37, 192–198.
  • Jöreskog, K. G. (1996). Applied factor analysis in the natural sciences. Cambridge University Press.
  • Kim, S. E., Lee, J. H., Chung, H. K., Lim, S. M., & Lee, H. W. (2014). Alterations in white matter microstructures and cognitive dysfunctions in benign childhood epilepsy with centrotemporal spikes. European Journal of Neurology, 21(5), 708-717.
  • Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology.
  • Li, Y. et al. (2009). Brain anatomical network and intelligence. PLoS Comput. Biol. 5, e1000395.
  • Navas‐Sánchez, F. J., Alemán‐Gómez, Y., Sánchez‐Gonzalez, J., Guzmán‐De‐Villoria, J. A., Franco, C., Robles, O., … & Desco, M. (2014). White matter microstructure correlates of mathematical giftedness and intelligence quotient. Human brain mapping, 35(6), 2619-2631.
  • Neubauer, A. C. & Fink, A. (2009). Intelligence and neural efficiency. Neurosci. Biobehav. Rev. 33, 1004–1023.
  • Noble, K. G., Houston, S. M., Brito, N. H., Bartsch, H., Kan, E., Kuperman, J. M., … & Sowell, E. R. (2015). Family income, parental education and brain structure in children and adolescents. Nature Neuroscience.
  • Nyborg, H. (2003). The sociology of psychometric and bio-behavioral sciences: A case study of destructive social reductionism and collective fraud in 20th century academia. Nyborg H.(Ed.). The scientific study of general intelligence. Tribute to Arthur R. Jensen, 441-501.
  • Nyborg, H. (2011). The greatest collective scientific fraud of the 20th century: The demolition of differential psychology and eugenics. Mankind Quarterly, Spring Issue.
  • Pennington, B. F., Filipek, P. A., Lefly, D., Chhabildas, N., Kennedy, D. N., Simon, J. H., … & DeFries, J. C. (2000). A twin MRI study of size variations in the human brain. Journal of Cognitive Neuroscience, 12(1), 223-232.
  • Pietschnig, J., Penke, L., Wicherts, J. M., Zeiler, M., & Voracek, M. (2014). Meta-Analysis of Associations Between Human Brain Volume And Intelligence Differences: How Strong Are They and What Do They Mean?. Available at SSRN 2512128.
  • Ree, M. J., & Earles, J. A. (1991). The stability of g across different methods of estimation. Intelligence, 15(3), 271-278.
  • Rowe, D. C., & Rodgers, J. E. (2005). Under the skin: On the impartial treatment of genetic and environmental hypotheses of racial differences. American Psychologist, 60(1), 60.
  • Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, public policy, and law, 11(2), 235.
  • Shaw, P. et al. (2006). Intellectual ability and cortical development in children and adolescents. Nature 440, 676–679 (2006).
  • Sterne, J. A., White, I. R., Carlin, J. B., Spratt, M., Royston, P., Kenward, M. G., … & Carpenter, J. R. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. Bmj, 338, b2393.
  • Strenze, T. (2007). Intelligence and socioeconomic success: A meta-analytic review of longitudinal research. Intelligence, 35(5), 401-426.
  • Turken, A. et al. (2008). Cognitive processing speed and the structure of white matter pathways: convergent evidence from normal variation and lesion studies. Neuroimage 42, 1032–1044
  • Unsworth, N., Fukuda, K., Awh, E., & Vogel, E. K. (2014). Working memory and fluid intelligence: Capacity, attention control, and secondary memory retrieval. Cognitive psychology, 71, 1-26.

Introduction

This is a post in the on-going series of comments on studies of international/transracial adoption. A global genetic/hereditarian model of cognitive differences and their socioeconomic effects implies that adoptees from different populations/countries/regions should show the usual group differences in the above mentioned traits and outcomes, all else equal. All else is of course not equal since adoptees from different regions can be adopted at different ages, experience different environment leading up to the adoption, possibly experience different environments after adoption thru no cause of their own (discrimination), and so on. It is not a strict test: finding the usual group differences can be explained by non-genetic factors, and finding no differences or unexpected ones could be consistent with a genetic model given strong non-genetic effects such as differences in adoption practices between origin countries/regions/populations. However, were such differences to be found relatively consistently in sufficiently powered studies, it would be an important prediction verified by the genetic model, broadly speaking. The question thus remains: what do the studies show?

Bruce et al (2009) – Disinhibited Social Behavior Among Internationally Adopted Children

Their abstract reads:

Postinstitutionalized children frequently demonstrate persistent socioemotional difficulties. For example, some postinstitutionalized children display an unusual lack of social reserve with unfamiliar adults. This behavior, which has been referred to as indiscriminate friendliness, disinhibited attachment behavior, and disinhibited social behavior, was examined by comparing children internationally adopted from institutional care to children internationally adopted from foster care and children raised by their biological families. Etiological factors and behavioral correlates were also investigated. Both groups of adopted children displayed more disinhibited social behavior than the nonadopted children. Of the etiological factors examined, only the length of time in institutional care was related to disinhibited social behavior. Disinhibited social behavior was not significantly correlated with general cognitive ability, attachment-related behaviors, or basic emotion abilities. However, this behavior was negatively associated with inhibitory control abilities even after controlling for the length of time in institutional care. These results suggest that disinhibited social behavior might reflect underlying deficits in inhibitory control.

While this does not seem immediately relevant, the authors do investigate IQ. The study design is a three-way comparison between adoptees in foster homes, institutional care and non-adoptees. The sample are small: 40 x 40 x 40. These are children who spent most of their lives first in foster homes and were then adopted, children who spent most of their lives in institutional care and then adopted, and biological children of the adoptive families for comparison. The groups were not equal in origin composition:

However, because countries have institutional or foster systems to care for wards of the state, the institutional care and foster care groups differed in terms of country of origin. The institutional care group was primarily from Eastern Europe (45%) and China (43%), whereas the foster care group was primarily from South Korea (80%).

Their cognitive measure is:

General cognitive ability.To provide an estimate of the children’s general cognitive functioning, each child was administered the vocabulary and block design subtests of the Wechsler Intelligence Scale for Children, 3rd edition (Wechsler, 1991). These subtests are considered the best measures of verbal and nonverbal intelligence, respectively, and are highly correlated with the full scale intelligence quotient (Sattler, 1992). Raw scores on the subtests were converted into age-normed scaled scores. The scaled scores were then summed and transformed into full scale intelligence quotient equivalents.

They give the mean FSIQ by the above three groups, not origin groups. These are:

Group Institutional care Foster care Control
FSIQ 102.68 (16.25) 109.37 (12.93) 117.11 (15.88)

Note: numbers in parentheses are SDs.

The high scores is presumably due to the FLynn effect. The IQ test is from 1991, but the study is from 2009, so there has been 18 years for raw score gains compared to the normative sample. An alternative idea is that the families were just above average SES/IQ which boosts the IQ scores of younger children. Nearly 100% of the adoptive families were Caucasian (presumably European) and were part of the Minnesota International Adoption Project Registry. According to Wiki, the non-Hispanic % of this state is 83%, so Caucasians are a bit overrepresented. In general, these elevation effects are not important when comparing groups within a study.

I contacted the first author to ask if she would give me some more data, and she obliged:

Please find the requested information below. For each region of origin, I provided the number of children, mean, and standard deviation for the Block Design Standard Score, Vocabulary Standard Score, and full scale IQ equivalent. I did not provide these statistics for regions with less than 5 children. Please let me know if you have any questions

Origin N FSIQ Vocabulary Block design
Institutional care
Eastern Europe (e.g., Russia, Romania, Ukraine) 18 94.84 (15.070) 8.39 (2.615) 8.39 (2.615)
China 17 111.77 (13.774) 11.76 (2.969) 12.29 (3.274)
Foster care
South Korea 31/32 110.38 (12.457) 12.10 (2.970) 11.53 (2.940)
Guatemala 5 104.64 (16.609) 12.40 (3.050) 9.20 (3.033)

Notes: Sample size for S. Korea was 31, 31, 32. No explanation given. Numbers in parentheses are SDs.

Again we see that East Asians do well, altho not better than the control children (mean=117). Eastern European do less well, but it is hard to say the exact expected mean since it is not stated how many come from which countries. The IQ of Lynn and Vanhanen (2012) gives 91 as the IQ of Romania, 96.6 for Russia and 94.3 for Ukraine. The Guatemalans certainly do better than expected (79 IQ), but N=5, and the demographics of Guatemala is very mixed making it possible that the adoptees actually had predominantly European ancestry.

The email reply

Hello Emil,
Please find the requested information below. For each region of origin, I provided the number of children, mean, and standard deviation for the Block Design Standard Score, Vocabulary Standard Score, and full scale IQ equivalent. I did not provide these statistics for regions with less than 5 children. Please let me know if you have any questions
Thank you for your interest in our publication,
Jackie

Institutional care group:
Eastern Europe (e.g., Russia, Romania, Ukraine)
N Mean Std. Deviation
Block Design Standard Score 18 9.83 3.884
Vocabulary Standard Score 18 8.39 2.615
IQ equivalent 18 94.84 15.070

China
N Mean Std. Deviation
Block Design Standard Score 17 12.29 3.274
Vocabulary Standard Score 17 11.76 2.969
IQ equivalent 17 111.77 13.774

Foster care group:
South Korea
N Mean Std. Deviation
Block Design Standard Score 32 11.53 2.940
Vocabulary Standard Score 31 12.10 2.970
IQ equivalent 31 110.38 12.457

Guatemala
N Mean Std. Deviation
Block Design Standard Score 5 9.20 3.033
Vocabulary Standard Score 5 12.40 3.050
IQ equivalent 5 104.64 16.609

References

Bruce, J., Tarullo, A. R., & Gunnar, M. R. (2009). Disinhibited social behavior among internationally adopted children. Development and psychopathology, 21(01), 157-171.

Abstract
I reanalyze data reported by Richard Lynn in a 1979 paper concerning IQ and socioeconomic variables in 12 regions of the United Kingdom as well as Ireland. I find a substantial S factor across regions (66% of variance with MinRes extraction). I produce a new best estimate of the G scores of regions. The correlation of this with the S scores is .79. The MCV with reversal correlation is .47.

Introduction
The interdisciplinary academic field examining the effect of general intelligence on large scale social phenomena has been called social ecology of intelligence by Richard Lynn (1979, 1980) and sociology of intelligence by Gottfredson (1998). One could also call it cognitive sociology by analogy with cognitive epidemiology (Deary, 2010; Special issue in Intelligence Volume 37, Issue 6, November–December 2009; Gottfredson, 2004). Whatever the name, it is a field that has received renewed attention recently. Richard Lynn and co-authors report data on Italy (Lynn 2010a, 2010b, 2012a, Piffer and Lynn 2014, see also papers by critics), Spain (Lynn 2012b), China (Lynn and Cheng, 2013) and India (Lynn and Yadav, 2015). Two of his older studies cover the British Isles and France (Lynn, 1979, 1980).

A number of my recent papers have reanalyzed data reported by Lynn, as well as additional data I collected. These cover Italy, India, United States, and China (Kirkegaard 2015a, 2015b, 2015c, 2015d). This paper reanalyzes Lynn’s 1979 paper.

Cognitive data and analysis

Lynn’s paper contains 4 datasets for IQ data that covers 11 regions in Great Britain. He further summarizes some studies that report data on Northern Ireland and the Republic of Ireland, so that his cognitive data covers the entire British Isles. Lynn only uses the first 3 datasets to derive a best estimate of the IQs. The last dataset does not report cognitive scores as IQs, but merely percentages of children falling into certain score intervals. Lynn converts these to a mean (method not disclosed). However, he is unable to convert this score to the IQ scale since the inter-personal standard deviation (SD) is not reported in the study. Lynn thus overlooks the fact that one can use the inter-regional SD from the first 3 studies to convert the 4th study to the common scale. Furthermore, using the intervals one could presumably estimate the inter-personal SD, altho I shall not attempt this. The method for converting the mean scores to the IQ score is this:

  1. Standardize the values by subtracting the mean and dividing by the inter-regional SD.
  2. Calculate the inter-regional SD in the other studies, and find the mean of these. Do the same for the inter-regional means.
  3. Multiple the standardized scores by the mean inter-regional SD from the other studies and add the inter-regional mean.

However, I did not use this method. I instead factor analyzed the four 4 IQ datasets as given and extracted 1 factor (extraction method = MinRes). All factor loadings were strongly positive indicating that G could be reliably measured among the regions. The factor score from this analysis was put on the same scale as the first 3 studies by the method above. This is necessary because the IQs for Northern Ireland and the Republic of Ireland are given on that scale. Table 1 shows the correlations between the cognitive variables. The correlations between G and the 4 indicator variables are their factor loadings (italic).

Table 1 – Correlations between cognitive datasets
Vernon.navy Vernon.army Douglas Davis G Lynn.mean
Vernon.navy 1 0.66 0.92 0.62 0.96 0.92
Vernon.army 0.66 1 0.68 0.68 0.75 0.89
Douglas 0.92 0.68 1 0.72 0.99 0.93
Davis 0.62 0.68 0.72 1 0.76 0.74
G 0.96 0.75 0.99 0.76 1 0.96
Lynn.mean 0.92 0.89 0.93 0.74 0.96 1

 

It can be noted that my use of factor analysis over simply averaging the datasets had little effect. The correlation of Lynn’s method (mean of datasets 1-3) and my G factor is .96.

Socioeconomic data and analysis

Lynn furthermore reports 7 socioeconomic variables. I quote his description of these:

“1. Intellectual achievement: (a) first-class honours degrees. All first-class honours graduates of the year 1973 were taken from all the universities in the British Isles (with the exception of graduates of Birkbeck College, a London College for mature and part-time students whose inclusion would bias the results in favour of London). Each graduate was allocated to the region where he lived between the ages of 11 and 18. This information was derived from the location of the graduate’s school. Most of the data were obtained from The Times, which publishes annually lists of students obtaining first-class degrees and the schools they attended. Students who had been to boarding schools were written to requesting information on their home residence. Information from the Republic of Ireland universities was obtained from the college records.

The total number of students obtaining first-class honours degrees was 3477, and information was obtained on place of residence for 3340 of these, representing 96 06 per cent of the total.
There are various ways of calculating the proportions of first-class honours graduates produced by each region. Probably the most satisfactory is to express the numbers of firsts in each region per 1000 of the total age cohorts recorded in the census of 1961. In this year the cohorts were approximately 9 years old. The reason for going back to 1961 for a population base is that the criterion taken for residence is the school attended and the 1961 figures reduce the distorting effects of subsequent migration between the regions. However, the numbers in the regions have not changed appreciably during this period, so that it does not matter greatly which year is taken for picking up the total numbers of young people in the regions aged approximately 21 in 1973. (An alternative method of calculating the regional output of firsts is to express the output as a percentage of those attending university. This method yields similar figures.)

2. Intellectual achievement: (b) Fellowships of the Royal Society. A second measure of intellectual achievement taken for the regions is Fellowships of the Royal Society. These are well-known distinctions for scientific work in the British Isles and are open equally to citizens of both the United Kingdom and the Republic of Ireland. The population consists of all Fellows of the Royal Society elected during the period 1931-71 who were born after the year 1911. The number of individuals in this population is 321 and it proved possible to ascertain the place of birth of 98 per cent of these. The Fellows were allocated to the region in which they were born and the numbers of Fellows born in each region were then calculated per million of the total population of the region recorded in the census of 1911. These are the data shown in Table 2. The year 1911 was taken as the population base because the majority of the sample was born between the years 1911-20, so that the populations in 1911 represent approximately the numbers in the regions around the time most of the Fellows were born. (The populations of the regions relative to one another do not change greatly over the period, so that it does not make much difference to the results which census year is taken for the population base.)

3. Per capita income. Figures for per capita incomes for the regions of the United Kingdom are collected by the United Kingdom Inland Revenue. These have been analysed by McCrone (1965) for the standard regions of the UK for the year 1959/60. These results have been used and a figure for the Republic of Ireland calculated from the United Nations Statistical Yearbook.

4. Unemployment. The data are the percentages of the labour force unemployed in the regions for the year 1961 (Statistical Abstracts of the UK and of Ireland).

5. Infant mortality. The data are the numbers of deaths during the first year of life expressed per 1000 live births for the year 1961 (Registrar Generals’ Reports).

6. Crime. The data are offences known to the police for 1961 and expressed per 1000 population (Statistical Abstracts of the UK and of Ireland).

7. Urbanization. The data are the percentages of the population living in county boroughs, municipal boroughs and urban districts in 1961 (Census).”

Lynn furthermore reports historical achievement scores as well as an estimate of inter-regional migration (actually change in population which can also be due to differential fertility). I did not use these in my analysis but they can be found in the datafile in the supplementary material.

Since there are 13 regions in total and 7 variables, I can analyze all variables at once and still almost conform to the rule of thumb of having a case-to-variable ratio of 2 (Zhao, 2009). Table 2 shows the factor loadings from this factor analysis as well as the correlation with G for each socioeconomic variable.

Table 2 – Correlations between S, S indicators, and G
Variable S G
Fellows.RS 0.92 0.92
First.class 0.55 0.58
Income 0.99 0.72
Unemployment -0.85 -0.79
Infant.mortality -0.68 -0.69
Crime 0.83 0.52
Urbanization 0.88 0.64
S 1 0.79

 

The crime variable had a strong positive loading on the S factor and also a positive correlation with the G factor. This is in contrast to the negative relationship found at the individual-level between the g factor and crime variables at about r=-.2 (Neisser 1996). The difference in mean IQ between criminal and non-criminal samples is usually around 7-15 points depending on which criminal group (sexual, violent and chronic offenders score lower than other offenders; Guay et al, 2005). Beaver and Wright (2011) found that IQ of countries was also negatively related to crime rates, r’s range from -.29 to -.58 depending on type of crime variable (violent crimes highest). At the level of country of origin groups, Fuerst and Kirkegaard (2014a) found that crime variables had strong negative loadings on the S factor (-.85 and -.89) and negative correlations with country of origin IQ. Altho not reported in the paper, Kirkegaard (2014b) found that the loading of 2 crime variables on the S factor in Norway among country of origin groups was -.63 and -.86 (larceny and violent crime; calculated using the supplementary material using the fully imputed dataset). Kirkegaard (2015a) found S loadings of .16 and -.72 of total crime and intentional homicide variables in Italy. Among US states, Kirkegaard (2015c) found S loadings of -.61 and -.71 for murder rate and prison rate. The scatter plot is shown in Figure 1.

Figure 1 – Scatter plot of regional G and S

G_S

 

 

 

 

 

 

 

 

 

 

So, the most similar finding in previous research is that from Italy. There are various possible explanations. Lynn (1979) thinks it is due to large differences in urbanization (which loads positively in multiple studies, .88 in this study). There may be some effect of the type of crime measurement. Future studies could examine this question by employing many different crime variables. My hunch is that it is a combination of differences in urbanization (which increases crime), immigration of crime prone persons into higher S areas, and differences in the justice system between areas.

Method of correlated vectors (MCV)

As done in the previous analysis of S factors, I performed MCV analysis to see whether the G factor was the reason for the association with the G factor score. S factor indicators with negative loadings were reversed to avoid inflating the result (these are marked with “_r” in the plot). The result is shown in Figure 2.

Figure 2 – MCV scatter plot

MCV

 

 

 

 

 

 

As in the previous analyses, the relationship was positive even after reversal.

Per capita income and the FLynn effect

An interesting quote from the paper is:

This interpretation [that the first factor of his factor analysis is intelligence] implies that the mean population IQs should be regarded as the cause of the other variables. When causal relationships between the variables are considered, it is obvious that some of the variables are dependent on others. For instance, people do not become intelligent as a consequence of getting a first-class honours degree. Rather, they get firsts because they are intelligent. The most plausible alternative causal variable, apart from IQ, is per capita income, since the remaining four are clearly dependent variables. The arguments against positing per capita income as the primary cause among this set of variables are twofold. First, among individuals it is doubtful whether there is any good evidence that differences in income in affluent nations are a major cause of differences in intelligence. This was the conclusion reached by Burt (1943) in a discussion of this problem. On the other hand, even Jencks (1972) admits that IQ is a determinant of income. Secondly, the very substantial increases in per capita incomes that have taken place in advanced Western nations since 1945 do not seem to have been accompanied by any significant increases in mean population IQ. In Britain the longest time series is that of Burt (1969) on London schoolchildren from 1913 to 1965 which showed that the mean IQ has remained approximately constant. Similarly in the United States the mean IQ of large national samples tested by two subtests from the WISC has remained virtually the same over a 16 year period from the early 1950s to the mid-1960s (Roberts, 1971). These findings make it doubtful whether the relatively small differences in per capita incomes between the regions of the British Isles can be responsible for the mean IQ differences. It seems more probable that the major causal sequence is from the IQ differences to the income differences although it may be that there is also some less important reciprocal effect of incomes on IQ. This is a problem which could do with further analysis.

Compare with Lynn’s recent overview of the history of the FLynn effect (Lynn, 2013).

References

  • Beaver, K. M.; Wright, J. P. (2011). “The association between county-level IQ and county-level crime rates”. Intelligence 39: 22–26. doi:10.1016/j.intell.2010.12.002
  • Deary, I. J. (2010). Cognitive epidemiology: Its rise, its current issues, and its challenges. Personality and individual differences, 49(4), 337-343.
  • Guay, J. P., Ouimet, M., & Proulx, J. (2005). On intelligence and crime: A comparison of incarcerated sex offenders and serious non-sexual violent criminals. International journal of law and psychiatry, 28(4), 405-417.
  • Gottfredson, L. S. (1998). Jensen, Jensenism, and the sociology of intelligence. Intelligence, 26(3), 291-299.
  • Gottfredson, L. S. (2004). Intelligence: is it the epidemiologists’ elusive” fundamental cause” of social class inequalities in health?. Journal of personality and social psychology, 86(1), 174.
  • Intelligence, Special Issue: Intelligence, health and death: The emerging field of cognitive epidemiology. Volume 37, Issue 6, November–December 2009
  • Kirkegaard, E. O. W., & Fuerst, J. (2014a). Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor among 71 immigrant groups in Denmark. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2014b). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology.
  • Kirkegaard, E. O. W. (2015a). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower.
  • Kirkegaard, E. O. W. (2015b). Indian states: G and S factors. The Winnower.
  • Kirkegaard, E. O. W. (2015c). Examining the S factor in US states. The Winnower.
  • Kirkegaard, E. O. W. (2015d). The S factor in China. The Winnower.
  • Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and Clinical Psychology, 18(1), 1-12.
  • Lynn, R. (1980). The social ecology of intelligence in France. British Journal of Social and Clinical Psychology, 19(4), 325-331.
  • Lynn, R. (2010a). In Italy, north–south differences in IQ predict differences in income, education, infant mortality, stature, and literacy. Intelligence, 38(1), 93-100.
  • Lynn, R. (2010b). IQ differences between the north and south of Italy: A reply to Beraldo and Cornoldi, Belacchi, Giofre, Martini, and Tressoldi. Intelligence, 38(5), 451-455.
  • Lynn, R. (2012a). IQs in Italy are higher in the north: A reply to Felice and Giugliano. Intelligence, 40(3), 255-259.
  • Lynn, R. (2012b). North-south differences in Spain in IQ, educational attainment, per capita income, literacy, life expectancy and employment. Mankind Quarterly, 52(3/4), 265.
  • Lynn, R. (2013). Who discovered the Flynn effect? A review of early studies of the secular increase of intelligence. Intelligence, 41(6), 765-769.
  • Lynn, R., & Cheng, H. (2013). Differences in intelligence across thirty-one regions of China and their economic and demographic correlates. Intelligence, 41(5), 553-559.
  • Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and latitude across the states of India. Intelligence, 49, 179-185.
  • Neisser, Ulric et al. (February 1996). “Intelligence: Knowns and Unknowns”. American Psychologist 52 (2): 77–101, 85.
  • Piffer, D., & Lynn, R. (2014). New evidence for differences in fluid intelligence between north and south Italy and against school resources as an explanation for the north–south IQ differential. Intelligence, 46, 246-249.
  • Zhao, N. (2009). The Minimum Sample Size in Factor Analysis. Encorewiki.