Clear Language, Clear Mind

March 29, 2016

Data sharing and Add Health

I am doing an S factor study of US counties in the usual way. For that reason, I need some kind of county-level cognitive ability estimate. I know that this is possible to create using the Add Health database, but that the data are not sharable. However, it may be possible to do some tricks, so I wrote to their support to get things clarified.

Emil:

Dear Add Health,
I could not find an answer to this question anywhere. Suppose I or someone else has calculated an aggregate value (usually a mean) for each county in the US. Would it be against the Add Health data rules to release this non-personal data to other researchers?

Add Health:

Emil,

Thank you for contacting Add Health about this issue.  The following guidelines listed in the contract must be followed when publishing information about Add Health.

To avoid inadvertent disclosure of persons, families, or households by using the following guidelines in the release of statistics derived from the Data     Files.

  1. In no table should all cases in any row or column be found in a single cell.
  2. In no case should the total for a row or column of a cross-tabulation be fewer than three (3).
  3. In no case should a cell frequency of a cross-tabulation be fewer than three (3) cases.
  4. In no case should a quantity figure be based on fewer than three (3) cases.
  5. Data released should never permit disclosure when used in combination with other known data.

Since the mean for a county is likely to be unique you would not be able to satisfy the above conditions.  Also, the data may allow users to identify the county by comparing it with the source data.  Therefore, you would not be able to release the aggregate information.

I hope this is helpful.

Best,

Joyce Tabor

Add Health Data Manager

Emil:

Hi Joyce,
How about adding a random number to the county means and excluding the counties with few cases? In this way, one cannot derive the scores of the individuals in the counties. Alternatively, how about binning the data by rounding the mean score to the nearest e.g. 5 (on a scale with a mean of 100 and SD of 15). This would make the means non-unique.

Add Health:

Emil,

Please provide more details about what you would like to do to help me better understand your project.  For example, who do you want to make these data available to?  What measures do you want to create at the county level?  Would your data be linkable to Add Health data?  If so, how would they link?

Best,

Joyce

Emil:

Joyce,

Thank you for your reply. Let me clarify matters. I did not do a study using Add Health data. However, some other researchers published a few studies using Add Health data. The studies in question are:

  1. http://www.sciencedirect.com/science/article/pii/S0160289610001340
  2. http://www.sciencedirect.com/science/article/pii/S0160289612001201
  3. http://www.sciencedirect.com/science/article/pii/S019188691300189X

They used the Add Health data to calculate an average IQ score for each county with data and then correlated this with other variables. I am unable to gain access to the Add Health data myself, but I work in the same field and would like the IQ scores for my own analyses. I asked the authors to send me the numbers, but they refused on account on the Add Health data sharing rules. However, it is my thinking that the privacy rules of Add Health are there to protect individuals from being deanonymized and so it should be possible to release aggregate-level county data without risking deanonymization. Hence, I wrote to you to investigate whether it would be possible to release the aggregate-level data so that other researchers can use it (open data).

As far as I understand, you had some concerns about this because this world produce unique values by county. I’m not sure I understand why that is a problem, but my proposals to remedy this problem were to either to 1) add a little random noise to the variable, thus obscuring the true values, or 2) bin the datapoints into nearest e.g. 2 or 5 point score (so e.g. 113.1434 gets turned into 115). Both of these obscure the original data but do not cause serious statistical degradation of the data. The methods can also be combined.

I am not interested in the individual-level data for this analysis, so I’m not sure what you mean by making the data linkable to the Add Health data. To make it perfectly clear, I would like for it to be possible to release the mean IQ scores by county with the county names, so that they can be merged with other datasets (e.g. Measure of America’s) for analytic purposes. If it is not possible to release the real means, then it is my hope that we can release the means with some added noise or binning as described above.

Add Health:

Emil,

The researchers are correct that they cannot provide you with data that they constructed using Add Health.  Only Add Health can release and share data based on Add Health.  Add Health never releases geographic identifiers smaller than Census region, therefore, county level data are not available.  The researchers you cite only have access to pseudo county level codes.  They do not have the data you want and Add Health will not release any data by location.

Best,

Joyce

So, the Add Health data sharing rules are extremely strict making the dataset much less useful. Some other way to estimate county-level IQ must be found.

February 15, 2016

Comments on Noah Carl’s new study: IQ and socio-economic development across local authorities of the UK

Paper is on Scihub.

There are a few researchers engaged in the cognitive sociology adventure. Aside from myself and John Fuerst, Noah Carl has also taken up the task. There is of course also Richard Lynn, the grand old man of these studies (publishing his first in 1979). Mostly the job of doing these analyses involves searching around to find a suitable dataset. Usually, one has to combine multiple resources. Noah has done exactly that.

Being British it is not surprising that Noah’s new study is also on the UK, like his previous study. The abstract reads:

Cross-regional correlations between average IQ and socio-economic development have been reported for many different countries. This paper analyses data on average IQ and a range of socio-economic variables at the local authority level in the UK. Local authorities are administrative bodies in local government; there are over 400 in the UK, and they contain anywhere from tens of thousands to more than a million people. The paper finds that local authority IQ is positively related to indicators of health, socio-economic status and tertiary industrial activity; and is negatively related to indicators of disability, unemployment and single parenthood. A general socioeconomic factor is correlated with local authority IQ at r = .56. This correlation increases to r = .65 when correcting for measurement error in the estimates of IQ.

In general, results are in line with the by now many previous studies: sizable correlations between group-level cognitive ability and group-level S indicators, and the existence of a clear S factor (16 of 16 loadings were in the expected direction; 15 correlated with cognitive ability in the expected direction). At a personal level, I am happy to be cited in the journal because it sends some attention towards the work published in the OpenPsych journals, including the new sociology & polit. sci. journal. Noah cites the London boroughs study, of which he was a reviewer.

Some things I am happy about:

  • He compiled a dataset of 16 S indicators.
  • He reported on the S factor.
  • He did some light multi-level analysis (reporting the variance associated with the regions and countries, which was small).
  • He used weighted analyses (square root of population, as we have also done before, but other weights could be used too).
  • He had access to 6 cognitive measures and used factor analysis to get a general factor.

Some critical comments:

  • The S factor loadings were not reported. Neither were the g loadings.
  • The scatterplot could have been better. For instance, one could have color boded the points by their region (helpful to spot regional effects) and mapped sample size to point size (like we did in the Admixture paper coming out in MQ sometimes soonish). Naming the outlier points could have been useful. I’m sure many readers wonder about the unit with an IQ of 110 and an S score of -1, or the unit with 90 IQ and an S score of 0. And what is that unit in the top right with 112 IQ and S of 2.5?
  • He did not use Jensen’s method to see if IQ is more strongly related to the indicators with stronger loadings, as one would expect. Because he does not report the loadings at all, no one else can do this. One could also use it on the cognitive measures, even if there are only 6 indicators. It is worth doing, like in the study of Indian states. It could be fed into a meta-analysis at some future point.
  • Related to the above, I did not see a mention of where one can find the data or analysis code. Is it public or hidden? I will ask. Not necessary:
    • “Next, average IQ was calculated for each of the 404 local authorities represented in the dataset. It is important to note that information on local authorities was obtained from the UK Data Service via a Special Licence. Therefore making these data available to other researchers is not possible. Information on local authorities are not included in the main Understanding Society dataset due to the fact that some local authorities contain relatively few respondents, which could permit identification of specific individuals.”
    • However, summary level information should be sufficient and the S data should still publishable. Noah is looking into it.
  • I’d like to see more multi-level analyses, e.g. like those done in my analysis of the religious general factor among Muslims. If one analyses the authorities in each region, are the S factor loadings the same as when analyzed at the national level? They need not be and differences could be revealing.
  • No demographic analyses. For some indicators, one would expect age to be a confound. One should at least try regressing it out if one can find mean age. Noah tells me it wasn’t (he did not report that in the paper I think).
    Noah didn’t do any correlations with SIRE (self-identified race/ethnicity). Perhaps he couldn’t find any data. Perhaps he did not want to be labelled a racist, only a classist. Perhaps one could find country of origin data for the units to try a compositional analysis like in this paper.

All in all, this study confirmed what has been found elsewhere. The findings in this field are very reproducible. There is still the odd case of Japan to wonder about, where results only are in line after ad hoc adjustment for population density, but that’s the only strange result I’ve come across. Well, that as Chile (the analysis of which will also be reported in more detail in the upcoming issue of Mankind Quarterly.)

June 27, 2015

The performance of African immigrants in Europe: Some Danish and Norwegian data

Due to lengthy discussion over at Unz concerning the good performance of some African groups in the UK, it seems worth it to review the Danish and Norwegian results. Basically, some African groups perform better on some measures than native British. The author is basically arguing that this disproves global hereditarianism. I think not.

The over-performance relative to home country IQ of some African countries is not restricted to the UK. In my studies of immigrants in Denmark and Norway, I found the same thing. It is very clear that there are strong selection effects for some countries, but not others, and that this is a large part of the reason why the home country IQ x performance in host country are not higher. If the selection effect was constant across countries, it would not affect the correlations. But because it differs between countries, it essentially creates noise in the correlations.

Two plots:

NO_S_IQ DK_S_IQ

The codes are ISO-3 codes. SO e.g. NGA is Nigeria, GHA is Ghana, KEN = Kenya and so on. They perform fairly well compared to their home country IQ, both in Norway and Denmark. But Somalia does not and the performance of several MENAP immigrants is abysmal.

The scores on the Y axis are S factor scores for their performance in these countries. They are general factors extracted from measures of income, educational attainment, use of social benefits, crime and the like. The S scores correlate .77 between the countries. For details, see the papers concerning the data:

  • Kirkegaard, E. O. W. (2014). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/10/crime-income-educational-attainment-and-employment-among-immigrant-groups-in-norway-and-finland/
  • Kirkegaard, E. O. W., & Fuerst, J. (2014). Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor among 70 immigrant groups in Denmark. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/05/educational-attainment-income-use-of-social-benefits-crime-rate-and-the-general-socioeconomic-factor-among-71-immmigrant-groups-in-denmark/

I did not use the scores from the papers, I redid the analysis. The code is posted below for those curious. The kirkegaard package is my personal package. It is on github. The megadataset file is on OSF.


 

library(pacman)
p_load(kirkegaard, ggplot2)

M = read_mega("Megadataset_v2.0e.csv")

DK = M[111:135] #fetch danish data
DK = DK[miss_case(DK) <= 4, ] #keep cases with 4 or fewer missing
DK = irmi(DK, noise = F) #impute the missing
DK.S = fa(DK) #factor analyze
DK_S_scores = data.frame(DK.S = as.vector(DK.S$scores) * -1) #save scores, reversed
rownames(DK_S_scores) = rownames(DK) #add rownames

M = merge_datasets(M, DK_S_scores, 1) #merge to mega

#plot
ggplot(M, aes(LV2012estimatedIQ, DK.S)) + 
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)
ggsave("DK_S_IQ.png")


# Norway ------------------------------------------------------------------

NO_work = cbind(M["Norway.OutOfWork.2010Q2.men"], #for work data
                M["Norway.OutOfWork.2011Q2.men"],
                M["Norway.OutOfWork.2012Q2.men"],
                M["Norway.OutOfWork.2013Q2.men"],
                M["Norway.OutOfWork.2014Q2.men"],
                M["Norway.OutOfWork.2010Q2.women"],
                M["Norway.OutOfWork.2011Q2.women"],
                M["Norway.OutOfWork.2012Q2.women"],
                M["Norway.OutOfWork.2013Q2.women"],
                M["Norway.OutOfWork.2014Q2.women"])

NO_income = cbind(M["Norway.Income.index.2009"], #for income data
                  M["Norway.Income.index.2010"],
                  M["Norway.Income.index.2011"],
                  M["Norway.Income.index.2012"])

#make DF
NO = cbind(M["NorwayViolentCrimeAdjustedOddsRatioSkardhamar2014"],
           M["NorwayLarcenyAdjustedOddsRatioSkardhamar2014"],
           M["Norway.tertiary.edu.att.bigsamples.2013"])


#get 5 year means
NO["OutOfWork.2010to2014.men"] = apply(NO_work[1:5],1,mean,na.rm=T) #get means, ignore missing
NO["OutOfWork.2010to2014.women"] = apply(NO_work[6:10],1,mean,na.rm=T) #get means, ignore missing

#get means for income and add to DF
NO["Income.index.2009to2012"] = apply(NO_income,1,mean,na.rm=T) #get means, ignore missing

plot_miss(NO) #view is data missing?

NO = NO[miss_case(NO) <= 3, ] #keep those with 3 datapoints or fewer missing
NO = irmi(NO, noise = F) #impute the missing

NO_S = fa(NO) #factor analyze
NO_S_scores = data.frame(NO_S = as.vector(NO_S$scores) * -1) #save scores, reverse
rownames(NO_S_scores) = rownames(NO) #add rownames

M = merge_datasets(M, NO_S_scores, 1) #merge with mega

#plot
ggplot(M, aes(LV2012estimatedIQ, NO_S)) +
  geom_point() +
  geom_text(aes(label = rownames(M)), vjust = 1, alpha = .7) +
  geom_smooth(method = "lm", se = F)
ggsave("NO_S_IQ.png")

sum(!is.na(M$NO_S))
sum(!is.na(M$DK.S))

cor(M$NO_S, M$DK.S, use = "pair")

 

June 26, 2015

IQ and socioeconomic development across Regions of the UK: a reanalysis

Abstract

A reanalysis of (Carl, 2015) revealed that the inclusion of London had a strong effect on the S loading of crime and poverty variables. S factor scores from a dataset without London and redundant variables was strongly related to IQ scores, r = .87. The Jensen coefficient for this relationship was .86.

 

Introduction

Carl (2015) analyzed socioeconomic inequality across 12 regions of the UK. In my reading of his paper, I thought of several analyses that Carl had not done. I therefore asked him for the data and he shared it with me. For a fuller description of the data sources, refer back to his article.

Redundant variables and London

Including (nearly) perfectly correlated variables can skew an extracted factor. For this reason, I created an alternative dataset where variables that correlated above |.90| were removed. The following pairs of strongly correlated variables were found:

  1. median.weekly.earnings and log.weekly.earnings r=0.999
  2. GVA.per.capita and log.GVA.per.capita r=0.997
  3. R.D.workers.per.capita and log.weekly.earnings r=0.955
  4. log.GVA.per.capita and log.weekly.earnings r=0.925
  5. economic.inactivity and children.workless.households r=0.914

In each case, the first of the pair was removed from the dataset. However, this resulted in a dataset with 11 cases and 11 variables, which is impossible to factor analyze. For this reason, I left in the last pair.

Furthermore, because capitals are known to sometimes strongly affect results (Kirkegaard, 2015a, 2015b, 2015d), I also created two further datasets without London: one with the redundant variables, one without. Thus, there were 4 datasets:

  1. A dataset with London and redundant variables.
  2. A dataset with redundant variables but without London.
  3. A dataset with London but without redundant variables.
  4. A dataset without London and redundant variables.

Factor analysis

Each of the four datasets was factor analyzed. Figure 1 shows the loadings.

loadings

Figure 1: S factor loadings in four analyses.

Removing London strongly affected the loading of the crime variable, which changed from moderately positive to moderately negative. The poverty variable also saw a large change, from slightly negative to strongly negative. Both changes are in the direction towards a purer S factor (desirable outcomes with positive loadings, undesirable outcomes with negative loadings). Removing the redundant variables did not have much effect.

As a check, I investigated whether these results were stable across 30 different factor analytic methods.1 They were, all loadings and scores correlated near 1.00. For my analysis, I used those extracted with the combination of minimum residuals and regression.

Mixedness

Due to London’s strong effect on the loadings, one should check that the two methods developed for finding such cases can identify it (Kirkegaard, 2015c). Figure 2 shows the results from these two methods (mean absolute residual and change in factor size):

mixedness
Figure 2: Mixedness metrics for the complete dataset.

As can be seen, London was identified as a far outlier using both methods.

S scores and IQ

Carl’s dataset also contains IQ scores for the regions. These correlate .87 with the S factor scores from the dataset without London and redundant variables. Figure 3 shows the scatter plot.

IQ_S
Figure 3: Scatter plot of S and IQ scores for regions of the UK.

However, it is possible that IQ is not really related to the latent S factor, just the other variance of the extracted S scores. For this reason I used Jensen’s method (method of correlated vectors) (Jensen, 1998). Figure 4 shows the results.

Jensen_method
Figure 4: Jensen’s method for the S factor’s relationship to IQ scores.

Jensen’s method thus supported the claim that IQ scores and the latent S factor are related.

Discussion and conclusion

My reanalysis revealed some interesting results regarding the effect of London on the loadings. This was made possible by data sharing demonstrating the importance of this practice (Wicherts & Bakker, 2012).

Supplementary material

R source code and datasets are available at the OSF.

References

Carl, N. (2015). IQ and socioeconomic development across Regions of the UK. Journal of Biosocial Science, 1–12. http://doi.org/10.1017/S002193201500019X

Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.

Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states

Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-us-states

Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis

Kirkegaard, E. O. W. (2015d). The S factor in Brazilian states. The Winnower. Retrieved from https://thewinnower.com/papers/the-s-factor-in-brazilian-states

Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from http://cran.r-project.org/web/packages/psych/index.html

Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. http://doi.org/10.1016/j.intell.2012.01.004

1There are 6 different extraction and 5 scoring methods supported by the fa() function from the psych package (Revelle, 2015). Thus, there are 6*5 combinations.

June 16, 2015

The general socioeconomic factor among Colombian departments

Abstract

A dataset was compiled with 17 diverse socioeconomic variables for 32 departments of Colombia and the capital district. Factor analysis revealed an S factor. Results were robust to data imputation and removal of a redundant variable. 14 of 17 variables loaded in the expected direction. Extracted S factors correlated about .50 with the cognitive ability estimate. The Jensen coefficient for the S factor for this relationship was .60.

Keywords: Colombia, departments, social inequality, S factor, general socioeconomic factor, IQ, intelligence, cognitive ability, PISA, cognitive sociology

Files

https://osf.io/92vqd/files/

May 29, 2015

An S factor among census tracts of Boston

Abstract
A factor analysis was carried out on 6 socioeconomic variables for 506 census tracts of Boston. An S factor was found with positive loadings for median value of owner-occupied homes and average number of rooms in these; negative loadings for crime rate, pupil-teacher ratio, NOx pollution, and the proportion of the population of ‘lower status’. The S factor scores were negatively correlated with the estimated proportion of African Americans in the tracts r = -.36 [CI95 -0.43; -0.28]. This estimate was biased downwards due to data error that could not be corrected for.

Introduction
The general socioeconomic factor (s/S1) is a similar construct to that of general cognitive ability (GCA; g factor, intelligence, etc., (Gottfredson, 2002; Jensen, 1998). For ability data, it has been repeatedly found that performance on any cognitive test is positively related to performance on any other test, no matter which format (pen pencil, read aloud, computerized), and type (verbal, spatial, mathematical, figural, or reaction time-based) has been tried. The S factor is similar. It has been repeatedly found that desirable socioeconomic outcomes tend are positively related to other desirable socioeconomic outcomes, and undesirable outcomes positively related to other undesirable outcomes. When this pattern is found, one can extract a general factor such that the desirable outcomes have positive loadings and then undesirable outcomes have negative loadings. In a sense, this is the latent factor that underlies the frequently used term “socioeconomic status” except that it is broader and not just restricted to income, occupation and educational attainment, but also includes e.g. crime and health.

So far, S factors have been found for country-level (Kirkegaard, 2014b), state/regional-level (e.g. Kirkegaard, 2015), country of origin-level for immigrant groups (Kirkegaard, 2014a) and first name-level data (Kirkegaard & Tranberg, In preparation). The S factors found have not always been strictly general in the sense that sometimes an indicator loads in the ‘wrong direction’, meaning that either an undesirable variable loads positively (typically crime rates), or a desirable outcome loads negatively. These findings should not be seen as outliers to be explained away, but rather to be explained in some coherent fashion. For instance, crime rates may load positively despite crime being undesirable because the justice system may be better in the higher S states, or because of urbanicity tends to create crime and urbanicity usually has a positive loading. To understand why some indicators sometimes load in the wrong direction, it is important to examine data at many levels. This paper extends the S factor to a new level, that of census tracts in the US.

Data source
While taking a video course on statistical learning based on James, Witten, Hastie, & Tibshirani (2013), I noted that a dataset used as an example would be useful for an S factor analysis. The dataset concerns 506 census tracts of Boston and includes the following variables (Harrison & Rubinfeld, 1978):

  • Median value of owner-occupied homes
  • Average number of rooms in owner units.
  • Proportion of owner units built before 1940.
  • Proportion of the population that is ‘lower status’. “Proportion of adults without, some high school education and proportion of male workers classified as laborers)”.
  • Crime rate.
  • Proportion of residential land zoned for lots greater than 25k square feet.
  • Proportion of nonretail business acres.
  • Full value property tax rate.
  • Pupil-teacher ratios for schools.
  • Whether the tract bounds the Charles River.
  • Weighted distance to five employment centers in the Boston region.
  • Index of accessibility to radial highways.
  • Nitrogen oxide concentration. A measure of air pollution.
  • Proportion of African Americans.

See the original paper for a more detailed description of the variables.

This dataset has become very popular as a demonstration dataset in machine learning and statistics which shows the benefits of data sharing (Wicherts & Bakker, 2012). As Gilley & Pace (1996) note “Essentially, a cottage industry has sprung up around using these data to examine alternative statistical techniques.”. However, as they re-checked the data, they found a number of errors. The corrected data can be downloaded here, which is the dataset used for this analysis.

The proportion of African Americans
The variable concerning African Americans have been transformed by the following formula: 1000(x – .63)2. Because one has to take the square root to reverse the effect of taking the square, some information is lost. For example, if we begin with the dataset {2, -2, 2, 2, -2, -2} and take the square of these and get {4, 4, 4, 4, 4, 4}, it is impossible someone to reverse this transformation and get the original because they cannot tell whether 4 results from -2 or 2 being squared.

In case of the actual data, the distribution is shown in Figure 1.

untrans_hist
Figure 1: Transformed data for the proportion of blacks by census tract.

Due to the transformation, the values around 400 actually mean that the proportion of blacks is around 0. The function for back-transforming the values is shown in Figure 2.

backtransform_func
Figure 2: The transformation function.

We can now see the problem of back-transforming the data. If the transformed data contains a value between 0 and about 140, then we cannot tell which original value was with certainty. For instance, a transformed value of 100 might correspond to an original proportion of .31 or .95.

To get a feel for the data, one can use the Racial Dot Map explorer and look at Boston. Figure 3 shows the Boston area color-coded by racial groups.

Boston race
Figure 3: Racial dot map of Boston area.

As can be seen, the races tend to live rather separate with large areas dominated by one group. From looking at it, it seems that Whites and Asians mix more with each other than with the other groups, and that African Americans and Hispanics do the same. One might expect this result based on the groups’ relative differences in S factor and GCA (Fuerst, 2014). Still, this should be examined by numerical analysis, a task which is left for another investigation.

Still, we are left with the problem of how to back-transform the data. The conservative choice is to use only the left side of the function. This is conservative because any proportion above .63 will get back-transformed to a lower value. E.g. .80 will become .46, a serious error. This is the method used for this analysis.

Factor analysis
Of the variables in the dataset, there is the question of which to use for S factor analysis. In general when doing these analyses, I have sought to include variables that measure something socioeconomically important and which is not strongly influenced by the local natural environment. For instance, the dummy variable concerning the River Charles fails on both counts. I chose the following subset:

  • Median value of owner-occupied homes
  • Average number of rooms in owner units.
  • Proportion of the population that is ‘lower status’.
  • Crime rate.
  • Pupil-teacher ratios for schools.
  • Nitrogen oxide concentration. A measure of air pollution.

Which concern important but different things. Figure 4 shows the loadings plot for the factor analysis (reversed).2

S_loadings

Figure 4: Loadings plot for the S factor.

The S factor was confirmed for this data without exceptions, in that all indicator variables loaded in the expected direction. The factor was moderately strong, accounting for 47% of the variance.

Relationship between S factor and proportions of African Americans
Figure 5 shows a scatter plot of the relationship between the back-transformed proportion of African Americans and the S factor.

S_AA_backtrans
Figure 5: Scatter plot of S scores and the back-transformed proportion of African Americans by census tract in Boston.

We see that there is a wide variation in S factor even among tracts with no or very few African Americans. These low S scores may be due to Hispanics or simply reflect the wide variation within Whites (there few Asians back then). The correlation between proportion of African Americans and S is -.36 [CI95 -0.43; -0.28].

We see that many very low S points lie around S [-3 to -1.5]. Some of these points may actually be census tracts with very high proportions of African Americans that were back-transformed incorrectly.

Discussion
The value of r = -.36 should not be interpreted as an estimate of effect size of ancestry on S factor for census tracts in Boston because the proportions of the other sociological races were not used. A multiple regression or similar method with all sociological races as the predictors is necessary to answer this question. Still, the result above is in the expected direction based on known data concerning the mean GCA of African Americans, and the relationship between GCA and socioeconomic outcomes (Gottfredson, 1997).

Limitations
The back-transformation process likely introduced substantial error in the results.

Data are relatively old and may not reflect reality in Boston as it is now.

Supplementary material
Data, high quality figures and R source code is available at the Open Science Framework repository.

References

  • Gilley, O. W., & Pace, R. K. (1996). On the Harrison and Rubinfeld data. Journal of Environmental Economics and Management, 31(3), 403–405.
  • Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79–132. http://doi.org/10.1016/S0160-2896(97)90014-3
  • Gottfredson, L. S. (2002). Where and Why g Matters: Not a Mystery. Human Performance, 15(1-2), 25–46. http://doi.org/10.1080/08959285.2002.9668082
  • Harrison, D., & Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1), 81–102.
  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (Eds.). (2013). An introduction to statistical learning: with applications in R. New York: Springer.
  • Jensen, A. R. (1998). The g factor: the science of mental ability. Westport, Conn.: Praeger.
  • Kirkegaard, E. O. W. (2014a). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/10/crime-income-educational-attainment-and-employment-among-immigrant-groups-in-norway-and-finland/
  • Kirkegaard, E. O. W. (2014b). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
  • Fuerst, J. (2014). Ethnic/Race Differences in Aptitude by Generation in the United States: An Exploratory Meta-analysis. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/07/ethnicrace-differences-in-aptitude-by-generation-in-the-united-states-an-exploratory-meta-analysis/
  • Kirkegaard, E. O. W., & Tranberg, B. (In preparation). What is a good name? The S factor in Denmark at the name-level. Open Differential Psychology. Retrieved from https://osf.io/t2h9c/
  • Kirkegaard, E. O. W. (2015). Examining the S factor in Mexican states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
  • Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706. http://doi.org/10.1002/per.634
  • Wicherts, J. M., & Bakker, M. (2012). Publish (your data) or (let the data) perish! Why not publish your data too? Intelligence, 40(2), 73–76. http://doi.org/10.1016/j.intell.2012.01.004

Footnotes

1 Capital S is used when the data are aggregated, and small s is used when it is individual level data. This follows the nomenclature of (Rindermann, 2007).

2 To say that it is reversed is because the analysis gave positive loadings for undesirable outcomes and negative for desirable outcomes. This is because the analysis includes more indicators of undesirable outcomes and the factor analysis will choose the direction to which most indicators point as the positive one. This can easily be reversed by multiplying with -1.

April 30, 2015

The S factor in Brazilian states

Abstract
Sizeable S factors were found across 3 different datasets (from years 1991, 2000 and 2010), which explained 56 to 71% of the variance. Correlations of extracted S factors with cognitive ability were strong ranging from .69 to .81 depending on which year, analysis and dataset is chosen. Method of correlated vectors supported the interpretation that the latent S factor was primarily responsible for the association (r’s .71 to .81).

Introduction
Many recent studies have examined within-country regional correlates of (general) cognitive ability (also known as (general) intelligence, general mental ability, g),. This has been done for the British Isles (Lynn, 1979; Kirkegaard, 2015g), France (Lynn, 1980), Italy (Lynn, 2010; Kirkegaard, 2015e), Spain (Lynn, 2012), Portugal (Almeida, Lemos, & Lynn, 2011), India (Kirkegaard, 2015d; Lynn & Yadav, 2015), China (Kirkegaard, 2015f; Lynn & Cheng, 2013), Japan (Kura, 2013), the US (Kirkegaard, 2015b; McDaniel, 2006; Templer & Rushton, 2011), Mexico (Kirkegaard, 2015a) and Turkey (Lynn, Sakar, & Cheng, 2015). This paper examines data for Brazil.

Data
Cognitive data
Data from PISA was used as a substitute for IQ test data. PISA and IQ correlate very strongly (>.9; (Rindermann, 2007)) across nations and presumably also across regions altho this hasn’t been thoroly investigated to my knowledge.

Socioeconomic data
As opposed to some of my prior analyses, there was no dataset to build on top of. For this reason, I tried to find an English-language database for Brazil with a comprehensive selection of variables. Altho I found some resources, they did not allow for easy download and compilation of state-level data, which I needed. Instead, I relied upon the Portugeese-language site, Atlasbrasil.org.br, which has a comprehensive data explorer with a convenient download function for state-level data. I used Google Translate to find my way around the site.

Using the data explorer, I selected a broad range of variables. The goal was to cover most important areas of socioeconomic development and avoid variables of little importance or which are heavily influenced by local climate factors (e.g. amount of rainforest). The following variables were selected:

  1. Gini coefficient
  2. Activity rate age 25-29
  3. Unemployment rate age 25-29
  4. Public sector workers%
  5. Farmers%
  6. Service sector workers%
  7. Girls age 10-17 with child%
  8. Life expectancy
  9. Households without electricity%
  10. Infant mortality rate
  11. Child mortality rate
  12. Survive to 40%
  13. Survive to 60%
  14. Total fertility rate
  15. Dependancy ratio
  16. Aging rate
  17. Illiteracy age 11-14 %
  18. Illiteracy age 25 and above %
  19. Age 6-17 in school %
  20. Attendence higher education %
  21. Income per capita
  22. Mean income lowest quintile
  23. Pct extremely poor
  24. Richest 10 pct income
  25. Bad walls%
  26. Bad water and sanitation%
  27. HDI
  28. HDI income
  29. HDI life expectancy
  30. HDI education
  31. Population
  32. Population rural

Variables were available only for three time points: 1991, 2000 and 2010. I selected all three with intention of checking stability of results over different time periods.

Most data was already in an appropriate per unit measure so it was not necessary to do extensive conversions as with the Mexican data (Kirkegaard, 2015a). I calculated fraction of the population living in rural areas by dividing the rural population by the total population.

Note that the data explorer also has data at a lower level, that of municipals. It could be used in the future to see if the S factor holds for a lower level of aggregate analysis.

S factor loadings
I split the data into three datasets, one for 1991, 2000 and 2010.

I extracted S factors using the fa() function with default parameters from the psych package (Revelle, 2015).

S factor in 1991
Due to missing data, there were only 21 indicators available for this year. The data could not be imputed since it was missing for all cases for these variables. The loadings plot is shown in Figure 1.

S.1991.loadings

Figure 1: Loadings plot for S factor for the data from 1991

All indicators were in the expected direction aside from perhaps “aging rate”, which is somewhat unclear and would perhaps be expected to have a positive loading.

S factor in 2000
Less missing data, 26 variables. Loadings plot shown in Figure 2.

S.2000.loadings

Figure 2: Loadings plot for S factor for the 2000 data

All indicators were in the expected direction.

S factor for 2010
27 variables. Factor analysis gave an error for this dataset which means that I had to remove at least one variable.1 This left me with the question of which variable(s) to exclude. Similar to the previous analysis for Mexican states (Kirkegaard, 2015a), I used an automatic method. After removing one variable, the factor analysis worked and gave no warning. The excluded variable was child mortaility which was correlated near perfectly with another variable (infant mortality, r=.992), so little indicator sampling error should be introduced because of this deletion. The loadings plot is shown in Figure 3.

S.2010.1.loadings

Figure 3: Loadings plot for S factor for the 2010 data, minus one variable

Oddly, survival to 60 and 40 now have negative loadings, altho one would expect them to correlate highly with life expectancy which has a loading near 1. In fact, the correlations between life expectancy and the survival variables was -.06 and -.21, which makes you wonder what these variables are measuring. Excluding them does not substantially change results, but does increase the amount of variance explained to .60.

Out of curiosity, I also tried versions where I deleted 5 and 10 variables, but this did not change much in the plots, so I won’t show them. Interested readers can consult the source code.

Mixed cases
To examine whether there are any cases with strong mixedness — cases that are incongruent with the factor structure in the data — I developed two methods which are presented elsewhere (Kirkegaard, 2015c). Briefly, the first method measures the mixedness of the case by quantifying how predictable indicator scores are from the factor score for each case (mean absolute residual, MAR). The second quantifies how much the size of the general factor changes after exclusion of each individual case (improvement in proportion of variance, IPV). Both methods were found to be useful at finding a strongly mixed case in simulation data.

I applied both methods to the Brazilian datasets. For the second method, I had to create two additional reduced datasets since the factor analysis could not run with the resulting combinations of cases and indicators.

There are two ways one can examine the results: 1) by looking at the top (or bottom) most mixed cases for each method; 2) by looking at the correlations between results from the methods. The first is interesting if Brazilian state-level inequality in S has particular interest, while the second is more relevant for checking that the methods really work — they should give congruent results if mixed cases are present.

Top mixed cases
For each method and each analysis, I extracted the names of the top 5 mixed states. They are shown in Table 1.

 

Position_1

Position_2

Position_3

Position_4

Position_5

m1.1991

Amapá

Acre

Distrito Federal

Roraima

Rondônia

m1.2000

Amapá

Roraima

Acre

Distrito Federal

Rondônia

m1.2010.1

Roraima

Distrito Federal

Amapá

Amazonas

Acre

m1.2010.5

Roraima

Distrito Federal

Amapá

Acre

Amazonas

m1.2010.10

Roraima

Distrito Federal

Amapá

Acre

Amazonas

m2.1991

Amapá

Rondônia

Acre

Roraima

Amazonas

m2.2000.1

Amapá

Rondônia

Roraima

Paraíba

Ceará

m2.2010.2

Amapá

Roraima

Distrito Federal

Pernambuco

Sergipe

m2.2010.5

Amapá

Roraima

Distrito Federal

Piauí

Bahia

m2.2010.10

Distrito Federal

Roraima

Amapá

Ceará

Tocantins

 

Table 1: Top 5 mixed cases by method and dataset

As can be seen, there is quite a bit of agreement across years, datasets, and methods. If one were to do a more thoro investigation of socioeconomic differences across Brazilian states, one should examine these states for unusual patterns. One could do this using the residuals for each indicator by case from the first method (these are available from the FA.residuals() in psych2). A quick look at the 2010.1 data for Amapá shows that the state is roughly in the middle regarding state-level S (score = -.26, rank 15 of 27), Farmers do not constitute a large fraction of the population (only 9.9%, rank 4 only behind the states with large cities: Federal district, Rio de Janeiro, and São Paulo). Given that farmers% has a strong negative loading (-.77) and the state’s S score, one would expect the state to have relatively more farmers than it has, the mean of all states for that dataset is 17.2%.

Much more could be said along these lines, but I rather refrain since I don’t know much about the country and can’t read the language very well. Perhaps a researchers who is a Brailizian native could use the data to make a more detailed analysis.

Correlations between methods and datasets
To test whether the results were stable across years, data reductions, and methods, I correlated all the mixedness metrics. Results are in Table 2.

 

m1.1991

m1.2000

m1.2010.1

m1.2010.5

m1.2010.10

m2.1991

m2.2000.1

m2.2010.2

m2.2010.5

m1.1991

m1.2000

0.88

m1.2010.1

0.81

0.85

m1.2010.5

0.77

0.87

0.98

m1.2010.10

0.70

0.79

0.93

0.96

m2.1991

0.48

0.64

0.45

0.48

0.40

m2.2000.1

0.41

0.58

0.34

0.39

0.27

0.87

m2.2010.2

0.53

0.63

0.66

0.66

0.51

0.58

0.68

m2.2010.5

0.32

0.49

0.60

0.64

0.51

0.49

0.59

0.86

m2.2010.10

0.42

0.44

0.66

0.65

0.59

0.32

0.44

0.75

0.76

 

Table 2: Correlation table for mixedness metrics across datasets and methods.

There is method specific variance since the correlations within method (topleft and bottomright squares) are stronger than those across methods. Still, all correlations are positive, Cronbach’s alpha is .87, Guttman lambda 6 is .98 and the mean correlation is .61.

S and HDI correlations
HDI
Previous S factor studies have found that HDI (Human Development Index) is basically a non-linear proxy for the S factor (Kirkegaard, 2014, 2015a). This is not surprising since the HDI is calculated from longevity, education and income, all three of which are known to have strong S factor loadings. The actual derivation of HDI values is somewhat complex. One might expect them simple to average the three indicators, or extract the general factor, but no. Instead they do complicated things (WHO, 2014).

For longevity (life expectancy at birth), they introduce ceilings at 25 and 85 years. According to data from WHO (WHO, 2012), no country has values above or below these values altho Japan is close (84 years).

For education, it is actually an average of two measures: years of education by 25 year olds and expected years of schooling for children entering school age. These measures also have artificial limits of 0-18 and 0-15 respectively.

For gross national income, they use the log values and also artificial limits of 100-75,000 USD.

Moreover, these are not simply combined by standardizing (i.e. rescaling so the mean is 0 and standard deviation is 1) the values and adding them or taking the mean. Instead, a value is calculated for every indicator using the following formula:

HDI_dimension_formula
Equation 1: HDI index formula

Note that for education, this formula is used twice and the results averaged.

Finally, the three dimensions are combined using a geometric mean:

HDI_combine
Equation 2: HDI index combination formula

The use of a geometric mean as opposed to the normal arithmetic mean, is that if a single indicator is low, the overall level is strongly reduced, whereas with the arithmetic, only the sum of the indicators matter, not the standard deviation of them. If the indicators have the same value, then the geometric and arithmetic mean have the same value.

For instance, if indicators are .7, .7, .7, the arithmetic mean is .7+.7+.7=2.1, 2.1/3=.7 and the geometric .73=0.343, 0.3431/3=.7. However, if indicators are 1, .7, .4, then the arithmetic mean is 1+.7+.4=2.1, 2.1/3=.7, but geometric mean is 1*.7*.4=0.28, 0.281/3=0.654 which is a bit lower than .7.

S and HDI correlations
I used the previously extracted factor scores and the HDI data. I also extracted S factors from the HDI datasets (3 variables)2 to see how these compared with the complex HDI value derivation. Finally, I correlated the S factors from non-HDI data, S factors from HDI data, HDI values and cognitive ability scores. Results are shown in Table 2.

 

HDI.1991

HDI.2000

HDI.2010

HDI.S.1991

HDI.S.2000

HDI.S.2010

S.1991

S.2000

S.2010.1

S.2010.5

S.2010.10

CA2012

HDI.1991

0.95

0.92

0.98

0.95

0.93

0.96

0.93

0.86

0.89

0.90

0.59

HDI.2000

0.97

0.97

0.94

0.99

0.96

0.94

0.98

0.93

0.95

0.96

0.66

HDI.2010

0.94

0.98

0.93

0.98

0.99

0.93

0.97

0.94

0.97

0.98

0.65

HDI.S.1991

0.98

0.96

0.94

0.95

0.94

0.98

0.92

0.84

0.88

0.90

0.54

HDI.S.2000

0.97

1.00

0.98

0.97

0.97

0.95

0.98

0.92

0.95

0.97

0.65

HDI.S.2010

0.95

0.98

0.99

0.95

0.98

0.94

0.96

0.94

0.96

0.97

0.66

S.1991

0.96

0.96

0.94

0.97

0.97

0.96

0.92

0.86

0.90

0.91

0.60

S.2000

0.95

0.98

0.96

0.93

0.99

0.97

0.97

0.96

0.98

0.98

0.69

S.2010.1

0.89

0.94

0.94

0.86

0.94

0.95

0.92

0.97

0.99

0.96

0.76

S.2010.5

0.91

0.95

0.96

0.89

0.96

0.97

0.93

0.98

0.99

0.98

0.72

S.2010.10

0.93

0.96

0.98

0.93

0.97

0.98

0.93

0.97

0.96

0.98

0.71

CA2012

0.67

0.73

0.71

0.60

0.72

0.74

0.69

0.78

0.81

0.79

0.75

 

Table 3: Correlation matrix for S, HDI and cognitive ability scores. Pearson’s below the diagonal, rank-order above.

All results were very strongly correlated no matter which dataset or scoring method was used. Cognitive ability scores were strongly correlated to all S factor measures. The best estimate of the relationship between S factor and cognitive ability is probably the correlation with S.2010.1, since this is the dataset cloest in time to the cognitive dataset and the S factor is extracted from the most variables. This is also the highest value (.81), but that may be a coincidence.

It is worth noting that the rank-order correlations were somewhat weaker. This usually indicates that an outlier case is increasing the Pearson correlation. To investigate this, I plot the S.2010.1 and CA2012 variables, see Figure 4.

CA_S_2010_1
Figure 4: Scatter plot of S factor and cognitive ability

The scatter plot however does not seem to reveal any outliers inflating the correlation.

Method of correlated vectors
To examine whether the S factor was plausibly the cause of the pattern seen with the S factor scores (it is not necessarily), I used the method of correlated vectors with reversing. Results are shown in Figures 5-7.

MCV_1991
Figure 5: MCV for the 1991 dataset

MCV_2000
Figure 6: MCV for the 2000 dataset

MCV_2010_1
Figure 7: MCV for the 2010 dataset

The first result seems to be driven by a few outliers, but the second and third seems decent enough. The numerical results were fairly consistent (.71, .75, .81).

Discussion and conclusion
Generally, the results were in line with earlier studies. Sizeable S factors were found across 3 (or 6 if one counts the mini-HDI ones) different datasets, which explained 56 to 71% of the variance. There seems to be a decrease over time, which is intriguing as it is may eventually lead to the ‘destruction’ of the S factor. It may also be due to differences between the datasets across the years, since they were not entirely comparable. I did not examine the issue in depth.

Correlations of S factors and HDIs with cognitive ability were strong ranging from .60 to .81 depending on which year, analysis, dataset is chosen, and whether one uses the HDI values. Correlations were stronger when they were from the larger datasets, which is perhaps because they were better measures of latent S. MCV supported the interpretation that the latent S factor was primarily responsible for the association (r’s .71 to .81).

Future studies should examine to which degree cognitive ability and S factor differences are explainable by ethnracial factors e.g. racial ancestry as done by e.g. (Kirkegaard, 2015b).

Limitations
There are some problems with this paper:

  • I cannot read Portuguese and this may have resulted in including some incorrect variables.
  • There was a lack of crime variables in the datasets, altho these have central importance for sociology. None were available in the data source I used.

Supplementary material
R source code, data and figures can be found in the Open Science Framework repository.

References

  • Almeida, L. S., Lemos, G., & Lynn, R. (2011). Regional Differences in Intelligence and per Capita Incomes in Portugal. Mankind Quarterly, 52(2), 213.
  • Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from http://openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/
  • Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-mexican-states
  • Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from https://thewinnower.com/papers/examining-the-s-factor-in-us-states
  • Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from https://thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis
  • Kirkegaard, E. O. W. (2015d). Indian states: G and S factors. The Winnower. Retrieved from https://thewinnower.com/papers/indian-states-g-and-s-factors
  • Kirkegaard, E. O. W. (2015e). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower. Retrieved from https://thewinnower.com/papers/s-and-g-in-italian-regions-re-analysis-of-lynn-s-data-and-new-data
  • Kirkegaard, E. O. W. (2015f). The S factor in China. The Winnower. Retrieved from https://thewinnower.com/papers/the-s-factor-in-china
  • Kirkegaard, E. O. W. (2015g). The S factor in the British Isles: A reanalysis of Lynn (1979). The Winnower. Retrieved from https://thewinnower.com/papers/the-s-factor-in-the-british-isles-a-reanalysis-of-lynn-1979
  • Kura, K. (2013). Japanese north–south gradient in IQ predicts differences in stature, skin color, income, and homicide rate. Intelligence, 41(5), 512–516. http://doi.org/10.1016/j.intell.2013.07.001
  • Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and Clinical Psychology, 18(1), 1–12. http://doi.org/10.1111/j.2044-8260.1979.tb00297.x
  • Lynn, R. (1980). The social ecology of intelligence in France. British Journal of Social and Clinical Psychology, 19(4), 325–331. http://doi.org/10.1111/j.2044-8260.1980.tb00360.x
  • Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant mortality, stature, and literacy. Intelligence, 38(1), 93–100. http://doi.org/10.1016/j.intell.2009.07.004
  • Lynn, R. (2012). North-South Differences in Spain in IQ, Educational Attainment, per Capita Income, Literacy, Life Expectancy and Employment. Mankind Quarterly, 52(3/4), 265.
  • Lynn, R., & Cheng, H. (2013). Differences in intelligence across thirty-one regions of China and their economic and demographic correlates. Intelligence, 41(5), 553–559. http://doi.org/10.1016/j.intell.2013.07.009
  • Lynn, R., Sakar, C., & Cheng, H. (2015). Regional differences in intelligence, income and other socio-economic variables in Turkey. Intelligence, 50, 144–149. http://doi.org/10.1016/j.intell.2015.03.006
  • Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and latitude across the states of India. Intelligence, 49, 179–185. http://doi.org/10.1016/j.intell.2015.01.009
  • McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601–606. http://doi.org/10.1016/j.intell.2006.07.005
  • Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from http://cran.r-project.org/web/packages/psych/index.html
  • Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706. http://doi.org/10.1002/per.634
  • Templer, D. I., & Rushton, J. P. (2011). IQ, skin color, crime, HIV/AIDS, and income in 50 U.S. states. Intelligence, 39(6), 437–442. http://doi.org/10.1016/j.intell.2011.08.001
  • WHO. (2012). Life expectancy. Data by country. Retrieved from http://apps.who.int/gho/data/node.main.688?lang=en
  • WHO. (2014). Human Development Report: Technical notes: Calculating the human development indices—graphical presentation. WHO. Retrieved from http://hdr.undp.org/sites/default/files/hdr14_technical_notes.pdf

Footnotes

1 Error in min(eigens$values) : invalid ‘type’ (complex) of argument.

2 Factor loadings for HDI factor analysis were very strong, always >.9.

April 19, 2015

Examining the S factor in Mexican states

Paper PDF

Abstract
Two datasets of socioeconomic data was obtained from different sources. Both were factor analyzed and revealed a general factor (S factor). These factors were highly correlated with each other (.79 to .95), HDI (.68 to .93) and with cognitive ability (PISA; .70 to .78). The federal district was a strong outlier and excluding it improved results.

Method of correlated vectors was strongly positive for all 4 analyses (r’s .77 to .92 with reversing).

Key words: intelligence, IQ, general socioeconomic factor, S factor, inequality, Mexico, Mexican states

OSF: https://osf.io/zk3yx/

Main figures

March 28, 2015

The S factor in the British Isles: A reanalysis of Lynn (1979)

Read the study PDF.

Emil O. W. Kirkegaard, Ulster Institute for Social Research. Email: emil@emilkirkegaard.dk

Abstract
I reanalyze data reported by Richard Lynn in a 1979 paper concerning IQ and socioeconomic variables in 12 regions of the United Kingdom as well as Ireland. I find a substantial S factor across regions (66% of variance with MinRes extraction). I produce new best estimates of the G scores of the regions. The correlation of these with the S scores is .79. The MCV correlation with reversal is .47.

Key words: intelligence, IQ, cognitive ability, inequality, S factor, British isles, inequality, ecology of intelligence, sociology of intelligence, cognitive sociology

Main figures

R Notebook: http://rpubs.com/EmilOWK/British_isles_1979_reanalysis

Notes

This paper was updated in March 2017 to make it more readable and improve the quality of the figures. The original code was used and the numbers were reproduced, except for the MCV analysis which for unknown reasons changed from r = .47 to r = .49.

Powered by WordPress