For years I have looked for a good alternative to Skype. Skype has some nice features including:

  1. Group conversations
  2. Voice and video chat
  3. A chat history
  4. Good interface
  5. Ease of use

It also has some nasty features I don't want:

  1. Closed source
  2. Ads
  3. NSA et al spying

So, the quest is to find something that has 1-5 from the first list and nothing from the second list. Over the years, there have been various proposals: Hemlis (defunct), Cryptocat, Jitsi, Pidgin, and so on. The EFF has a list here. However, it does not include Tox.

Tox has all five of the good features and none of the bad ones. It has multiple cross-platform clients. It beats Jitsi and Pidgin on interface and ease of use, especially set-up. Personally, I like the qTox client, but you may have another preference.

Adding people is pretty easy: they simply add your ID and you get a request, like with Skype. There is no need to set up servers, make accounts, etc.

If you want to reach me, my Tox ID is: 1728E9D22CDDDDBE314E002843E7F57A20365D40CF0F6B26803AD68A163E82710C78A5213A9B. Be sure to use some specific message so you don’t look like a bot.

Setting up the server was not as easy as claimed, but not that difficult either. I used the default values on Ubuntu, as in the guide.

Some difficulties:

Had to learn SSH

Just read this.

No email was sent with password

Apparently, this is not done when using SSH, see above guide.

Don’t skip making a second user

If you do, you can’t log into RStudio server later.

Devtools package won’t install

It has various system dependencies on Linux. Run these:

sudo apt-get update
sudo apt-get install libxml2-dev
sudo apt-get install libssl-dev

Then it worked for me. For now.
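
With those system libraries in place, devtools should then install from within R (a minimal check; this assumes a working internet connection and CRAN mirror):

install.packages("devtools")   # its compiled dependencies should now build
library(devtools)              # loads without error if the install succeeded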

You installed stuff but the server won’t actually show the Shiny apps

Some packages need to be installed system-wide, even if you already installed them for your own user. You will need to type these:

sudo su - -c "R -e \"install.packages('shiny', repos='http://cran.rstudio.com/')\""
sudo su - -c "R -e \"install.packages('rmarkdown', repos='http://cran.rstudio.com/')\""

Which made it work for me. No restarts necessary.
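
As far as I understand it, the reason is that Shiny Server runs apps under its own user, so packages installed only in your personal library are invisible to it; the sudo commands above install into the system-wide library instead. A quick way to check from any R session (a minimal sketch):

.libPaths()                                    # library paths R searches
"shiny" %in% rownames(installed.packages())    # TRUE if shiny is found in one of them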

Abstract
It has been found that workers who hail from higher socioeconomic classes have higher earnings even in the same profession. An environmental cause was offered as an explanation of this. I show that this effect is expected solely for statistical reasons.

Introduction
Friedman and Laurison (2015) offer data about the earnings of persons employed in the higher professions by their class of origin. They find that those who have a higher class origin earn more. I reproduce their figure below.

[Figure: average earnings in the higher professions by class of origin, reproduced from Friedman and Laurison (2015)]

They posit an environmental explanation of this:

In doing so, we have purposively borrowed the ‘glass ceiling’ concept developed by feminist scholars to explain the hidden barriers faced by women in the workplace. In a working paper recently published by LSE Sociology, we argue that it is also possible to identify a ‘class ceiling’ in Britain which is preventing the upwardly mobile from enjoying equivalent earnings to those from upper middle-class backgrounds.

There is also a longer working paper by the same authors, but I did not read that. A link to it can be found in the previously mentioned source.

A simplified model of the situation
How do persons advance to professions? Well, we know that the occupational hierarchy is basically a (general) cognitive ability hierarchy (Gottfredson, 1997), and presumably also a hierarchy of various relevant non-cognitive traits such as conscientiousness/being hard-working, altho I did not find a study of this.

A simple way to model this is as a threshold effect: no one below a given threshold gets into the profession and everybody above it does. This is of course not like reality. Reality does have a threshold, and it increases as one moves up the hierarchy. (Insert the figure from one of Gottfredson's papers that shows the minimum IQ by occupation; I can't seem to locate it. Halp!) The effect of GCA is probably more like a probabilistic function akin to a cumulative distribution function, such that below a certain cognitive level virtually no one gets into the profession.

Simulating this is a bit complicated but we can approximate it reasonably by using a simple cut-off value, such that everybody above gets in, everybody below does not (see Gordon (1997) for a similar case with belief in conspiracy theories).

A simulation
One could perhaps solve this analytically, but it is easier to simulate, so we do that. I used the following procedure (a minimal R sketch follows the list):

  1. We make three groups of origin with 90, 100, and 110 IQ.
  2. We simulate a large number (1e4) of random persons from these groups.
  3. We plot these for getting a feeling of the data.
  4. We find the subgroup of each group with IQ > 115, which we take as the minimum for some high level profession.
  5. We calculate the mean IQ of each group and subgroup.
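
A minimal R sketch of this procedure (the seed is arbitrary; all groups get a standard deviation of 15, and the plotting step is omitted):

set.seed(1)
n <- 1e4
groups <- list(g90 = rnorm(n, 90, 15), g100 = rnorm(n, 100, 15), g110 = rnorm(n, 110, 15))
cutoff <- 115
sapply(groups, mean)                              # group means: about 90, 100, 110
sapply(groups, function(x) mean(x[x > cutoff]))   # subgroup means above the cut-off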

The plot looks like this:

[Figure: distributions of the three simulated groups, with the cut-off threshold (black vertical line) and the three subgroup means (colored vertical lines)]

The vertical lines are the cut-off threshold (black) and the three subgroup means (in the corresponding colors). As can be seen, the subgroup means are not the same despite the same threshold being used. The values are, respectively, 121.74, 122.96, and 125.33. The differences between these are not large for the present simulation, but they may be sufficient to bring about differences that are detectable in a large dataset. The values depend on how far the population mean is from the threshold and on the standard deviation of the population (all 15 in the above simulation). The further the threshold is from the population mean, the closer the mean of the subgroup above the threshold will be to the threshold value; for populations whose mean is very far below the threshold, the two are nearly identical. For instance, the mean IQ of those with IQ >=150 is about 153.94 (based on a sampling with 10e7 cases, mean 100, sd 15).
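
The subgroup mean above a threshold can also be computed analytically from the truncated normal distribution, E[X | X > t] = mu + sigma * dnorm(z) / (1 - pnorm(z)) with z = (t - mu)/sigma. A quick sanity check in R for the >=150 example:

mu <- 100; sigma <- 15; t <- 150
z <- (t - mu) / sigma
mu + sigma * dnorm(z) / (1 - pnorm(z))   # about 153.9, matching the sampled value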

It should be noted that if one also considers measurement error, this effect will be stronger, since persons from lower IQ groups regress further down.

Supplementary material
R source code is available at the Open Science Framework repository.

References

  • Friedman, S and Laurison, D. (2015). Introducing the ‘class’ ceiling. British Politics and Policy blog.
  • Gordon, R. A. (1997). Everyday life as an intelligence test: Effects of intelligence and intelligence context. Intelligence, 24(1), 203-320.
  • Gottfredson, L. S. (1997). Why g matters: The complexity of everyday life. Intelligence, 24(1), 79-132.

It’s a really annoying ‘feature’.

I’m searching for a way to disable the very annoying number formatting in Libre Office Calc. Whenever I enter some number or a string containing numbers, LO is trying to format a date out of it.

Pseudo-solutions:

I've found some so-called solutions, but none of them works.

  • Format cells to “text” – works, but only until one pastes or deletes content in the cell. For example, if I paste some content, the formatting is lost again.
  • Start typing with a single quote '. Not an option in daily working routine, just to enter a numeric string.
  • Deselect Tools > AutoCorrect > Options > Apply Numbering – there is no such option in Libre Office (at least not in version 3.5).

Use cases:

  • 2-3 means “two to three whatever”
  • 5.2 is the code following 5.1
  • 6. refers to the sixth item

All those values are translated to some random date in Libre Office by default. I guess they were on drugs when implementing such a bug into the program.

Is there a global setting to turn off that featurebug?

There are two main workarounds: 1) manually adding ‘ in front of numbers, which forces them to be treated as text; 2) setting the cell format to “text” before entering data.

Neither of these is a good solution for everyday work. For instance, if you paste in data from somewhere else, it will not generally have ‘ in front, and pasting also overrides the format you chose. A last trick here is to use “paste special” and then choose the types, which can be a good workaround too.

It’s not a new complaint:

Developers don’t seem to understand the users’ frustration. Instead they write stuff like this:

At first I thought you meant you must handle cell formatting for each cell individually, which of course is false. (You select all the cells and choose “text” as a data type.)
Now if I understand correctly, you want a way to disable automatic date recognition globally, for all spreadsheets? Or at least, you want to know why that is not in the preferences section?
I can at least make a guess at the last question. It is probably for the same reason why there’s not an option to globally turn off automatic recognition of formulas. Because that’s what Calc is for….
In the vast, vast majority of cases, a user will expect that if he types in a date, it will be “understood” by his software as a date. If it doesn’t get “recognized”, he’s going to think “Hmmm, this software isn’t very good.” He’s not going to think, “Hmmm, there must be a global setting somewhere that’s been switched off so that dates don’t get recognized,” and then go hunting for that setting. (If that did happen and he actually found the setting, his inevitable question would be, “Why on earth is that even an option? Who would want to turn off the date recognition for all spreadsheets?” And we’d have to tell him, “Well, it was this guy Swingletree….” ;)

This is a typical example of developers being out of touch with normal users. For normal users (>95% of users), this auto-conversion is more of a bug than a feature, which is why people want to be able to turn it off completely and then manually tell Calc when to interpret something as a date.

DMCA email:

VIA EMAIL:

Demand for Immediate Take-Down: Notice of Infringing Activity

Date:      April 24, 2015

URL:      filepost.com/

Dear Sir or Madam,

We have been made aware that the domain listed above, which appears to be on servers under your control, is offering unlicensed copies of, or is engaged in other unauthorized activities relating to copyrighted works published by, Wiley-VCH Verlag GmbH & Co. KGaA.

  1. Identification of copyrighted work(s):

Copyrighted work(s):

Edwards: Human genetic diversity: Lewontin’s fallacy, BioEssays 25:798–801,  2003 Wiley Periodicals, Inc.

Copyright owner or exclusive licensee:

Wiley-VCH Verlag GmbH & Co. KGaA.

  2. Copyright infringing material or activity found at the following location(s):

emilkirkegaard.dk/en/wp-content/uploads/A.W.F.-Edwards-Human-genetic-diversity-Lewontin%E2%80%99s-fallacy.pdf

The above copyrighted work(s) is being made available for copying, through downloading, at the above location without authorization of the copyright owner(s) or exclusive licensee.

  3. Statement of authority:

The information in this notice is accurate, and I hereby certify under penalty of perjury that I am authorized to act on behalf of, Wiley-VCH Verlag GmbH & Co. KGaA., the owner or exclusive licensee of the copyright(s) in the work(s) identified above. I have a good faith belief that none of the materials or activities listed above have been authorized by, Wiley-VCH Verlag GmbH & Co. KGaA., its agents, or the law.

We hereby give notice of these activities to you and request that you take expeditious action to remove or disable access to the material described above, and thereby prevent the illegal reproduction and distribution of this copyrighted work(s) via your company’s network.

We appreciate your cooperation in this matter. Please advise us regarding what actions you take.

Yours sincerely,

Bettina Loycke
Senior Rights Manager
Rights & Licenses

Wiley-VCH Verlag GmbH & Co. KGaA
Boschstraße 12
69469 Weinheim
Germany

www.wiley-vch.de

T          +(49) 6201 606-280
F          +(49) 6201 606-332
rightsDE@wiley.com

My name, typed above, constitutes an electronic signature under Federal law, and is intended to be binding.

Deutsch:
Wiley-VCH Verlag GmbH & Co. KGaA – A company of John Wiley & Sons, Inc. – Sitz der Gesellschaft: Weinheim – Amtsgericht Mannheim, HRB 432833 – Vorsitzender des Aufsichtsrates:
Stephen Michael Smith. Persönlich haftender Gesellschafter: John Wiley & Sons GmbH – Sitz der Gesellschaft: Weinheim – Amtsgericht Mannheim, HRB 432296 – Geschäftsführer: Sabine Steinbach, Dr. Jon Walmsley.

English:
Wiley-VCH Verlag GmbH & Co. KGaA – A company of John Wiley & Sons, Inc. – Location of the Company: Weinheim – Trade Register: Mannheim, HRB 432833.
Chairman of the Supervisory Board: Stephen Michael Smith. General Partner: John Wiley & Sons GmbH, Location: Weinheim – Trade Register Mannheim, HRB 432296 –
Managing Directors: Sabine Steinbach, Dr. Jon Walmsley.

Since I forgot about this and couldn't find the email later, I got another one, identical to the first.


Abstract
Sizeable S factors were found across 3 different datasets (from years 1991, 2000 and 2010), which explained 56 to 71% of the variance. Correlations of the extracted S factors with cognitive ability were strong, ranging from .69 to .81 depending on the year, analysis and dataset chosen. The method of correlated vectors supported the interpretation that the latent S factor was primarily responsible for the association (r's .71 to .81).

Introduction
Many recent studies have examined within-country regional correlates of (general) cognitive ability (also known as (general) intelligence, general mental ability, or g). This has been done for the British Isles (Lynn, 1979; Kirkegaard, 2015g), France (Lynn, 1980), Italy (Lynn, 2010; Kirkegaard, 2015e), Spain (Lynn, 2012), Portugal (Almeida, Lemos, & Lynn, 2011), India (Kirkegaard, 2015d; Lynn & Yadav, 2015), China (Kirkegaard, 2015f; Lynn & Cheng, 2013), Japan (Kura, 2013), the US (Kirkegaard, 2015b; McDaniel, 2006; Templer & Rushton, 2011), Mexico (Kirkegaard, 2015a) and Turkey (Lynn, Sakar, & Cheng, 2015). This paper examines data for Brazil.

Data
Cognitive data
Data from PISA was used as a substitute for IQ test data. PISA and IQ correlate very strongly (>.9; Rindermann, 2007) across nations, and presumably also across regions, altho this hasn't been thoroly investigated to my knowledge.

Socioeconomic data
As opposed to some of my prior analyses, there was no existing dataset to build on. For this reason, I tried to find an English-language database for Brazil with a comprehensive selection of variables. Altho I found some resources, they did not allow for easy download and compilation of state-level data, which I needed. Instead, I relied on the Portuguese-language site Atlasbrasil.org.br, which has a comprehensive data explorer with a convenient download function for state-level data. I used Google Translate to find my way around the site.

Using the data explorer, I selected a broad range of variables. The goal was to cover most important areas of socioeconomic development and avoid variables of little importance or which are heavily influenced by local climate factors (e.g. amount of rainforest). The following variables were selected:

  1. Gini coefficient
  2. Activity rate age 25-29
  3. Unemployment rate age 25-29
  4. Public sector workers%
  5. Farmers%
  6. Service sector workers%
  7. Girls age 10-17 with child%
  8. Life expectancy
  9. Households without electricity%
  10. Infant mortality rate
  11. Child mortality rate
  12. Survive to 40%
  13. Survive to 60%
  14. Total fertility rate
  15. Dependency ratio
  16. Aging rate
  17. Illiteracy age 11-14 %
  18. Illiteracy age 25 and above %
  19. Age 6-17 in school %
  20. Attendance in higher education %
  21. Income per capita
  22. Mean income lowest quintile
  23. Pct extremely poor
  24. Richest 10 pct income
  25. Bad walls%
  26. Bad water and sanitation%
  27. HDI
  28. HDI income
  29. HDI life expectancy
  30. HDI education
  31. Population
  32. Population rural

Variables were available for only three time points: 1991, 2000 and 2010. I selected all three with the intention of checking the stability of results over time.

Most data were already in an appropriate per-unit measure, so extensive conversions as with the Mexican data (Kirkegaard, 2015a) were not necessary. I calculated the fraction of the population living in rural areas by dividing the rural population by the total population.

Note that the data explorer also has data at a lower level, that of municipalities. It could be used in the future to see if the S factor holds at a lower level of aggregation.

S factor loadings
I split the data into three datasets, one for each of 1991, 2000 and 2010.

I extracted S factors using the fa() function with default parameters from the psych package (Revelle, 2015).
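
For concreteness, a minimal sketch of the extraction step (the data frame name d_1991 is hypothetical; fa() with default parameters fits a single factor by minimum residual estimation):

library(psych)
# d_1991: hypothetical data frame of the 1991 indicators, one row per state
S_1991_fa <- fa(d_1991)      # one factor, minres estimation by default
S_1991_fa$loadings           # indicator loadings on the S factor
S_1991_fa$scores             # state-level S factor scores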

S factor in 1991
Due to missing data, there were only 21 indicators available for this year. The data could not be imputed since it was missing for all cases for these variables. The loadings plot is shown in Figure 1.


Figure 1: Loadings plot for S factor for the data from 1991

All indicators were in the expected direction, aside from “aging rate”, whose expected direction is somewhat unclear; perhaps it would be expected to have a positive loading.

S factor in 2000
There was less missing data for this year, leaving 26 indicators. The loadings plot is shown in Figure 2.


Figure 2: Loadings plot for S factor for the 2000 data

All indicators were in the expected direction.

S factor for 2010
27 variables were available. Factor analysis gave an error for this dataset, which meant that I had to remove at least one variable.1 This left me with the question of which variable(s) to exclude. Similar to the previous analysis of Mexican states (Kirkegaard, 2015a), I used an automatic method. After removing one variable, the factor analysis worked and gave no warning. The excluded variable was child mortality, which correlated near-perfectly with another variable (infant mortality, r = .992), so little indicator sampling error should be introduced by this deletion. The loadings plot is shown in Figure 3.


Figure 3: Loadings plot for S factor for the 2010 data, minus one variable

Oddly, survival to 60 and survival to 40 now have negative loadings, altho one would expect them to correlate highly with life expectancy, which has a loading near 1. In fact, the correlations between life expectancy and the survival variables were -.06 and -.21, which makes one wonder what these variables are measuring. Excluding them does not substantially change the results, but it does increase the amount of variance explained to .60.

Out of curiosity, I also tried versions where I deleted 5 and 10 variables, but this did not change much in the plots, so I won’t show them. Interested readers can consult the source code.

Mixed cases
To examine whether there are any cases with strong mixedness — cases that are incongruent with the factor structure in the data — I developed two methods which are presented elsewhere (Kirkegaard, 2015c). Briefly, the first method measures the mixedness of the case by quantifying how predictable indicator scores are from the factor score for each case (mean absolute residual, MAR). The second quantifies how much the size of the general factor changes after exclusion of each individual case (improvement in proportion of variance, IPV). Both methods were found to be useful at finding a strongly mixed case in simulation data.

I applied both methods to the Brazilian datasets. For the second method, I had to create two additional reduced datasets since the factor analysis could not run with the resulting combinations of cases and indicators.

There are two ways one can examine the results: 1) by looking at the top (or bottom) most mixed cases for each method; 2) by looking at the correlations between results from the methods. The first is interesting if Brazilian state-level inequality in S has particular interest, while the second is more relevant for checking that the methods really work — they should give congruent results if mixed cases are present.

Top mixed cases
For each method and each analysis, I extracted the names of the top 5 mixed states. They are shown in Table 1.

 

| | Position_1 | Position_2 | Position_3 | Position_4 | Position_5 |
|---|---|---|---|---|---|
| m1.1991 | Amapá | Acre | Distrito Federal | Roraima | Rondônia |
| m1.2000 | Amapá | Roraima | Acre | Distrito Federal | Rondônia |
| m1.2010.1 | Roraima | Distrito Federal | Amapá | Amazonas | Acre |
| m1.2010.5 | Roraima | Distrito Federal | Amapá | Acre | Amazonas |
| m1.2010.10 | Roraima | Distrito Federal | Amapá | Acre | Amazonas |
| m2.1991 | Amapá | Rondônia | Acre | Roraima | Amazonas |
| m2.2000.1 | Amapá | Rondônia | Roraima | Paraíba | Ceará |
| m2.2010.2 | Amapá | Roraima | Distrito Federal | Pernambuco | Sergipe |
| m2.2010.5 | Amapá | Roraima | Distrito Federal | Piauí | Bahia |
| m2.2010.10 | Distrito Federal | Roraima | Amapá | Ceará | Tocantins |

 

Table 1: Top 5 mixed cases by method and dataset

As can be seen, there is quite a bit of agreement across years, datasets, and methods. If one were to do a more thoro investigation of socioeconomic differences across Brazilian states, one should examine these states for unusual patterns. One could do this using the residuals for each indicator by case from the first method (these are available from the FA.residuals() function in psych2). A quick look at the 2010.1 data for Amapá shows that the state is roughly in the middle regarding state-level S (score = -.26, rank 15 of 27). Farmers do not constitute a large fraction of the population (only 9.9%, rank 4, behind only the states with large cities: the Federal District, Rio de Janeiro, and São Paulo). Given the strong negative loading of farmers% (-.77) and the state's S score, one would expect the state to have relatively more farmers than it does; the mean across all states in that dataset is 17.2%.

Much more could be said along these lines, but I would rather refrain since I don't know much about the country and can't read the language very well. Perhaps a researcher who is a Brazilian native could use the data to make a more detailed analysis.

Correlations between methods and datasets
To test whether the results were stable across years, data reductions, and methods, I correlated all the mixedness metrics. Results are in Table 2.

 

| | m1.1991 | m1.2000 | m1.2010.1 | m1.2010.5 | m1.2010.10 | m2.1991 | m2.2000.1 | m2.2010.2 | m2.2010.5 |
|---|---|---|---|---|---|---|---|---|---|
| m1.1991 | | | | | | | | | |
| m1.2000 | 0.88 | | | | | | | | |
| m1.2010.1 | 0.81 | 0.85 | | | | | | | |
| m1.2010.5 | 0.77 | 0.87 | 0.98 | | | | | | |
| m1.2010.10 | 0.70 | 0.79 | 0.93 | 0.96 | | | | | |
| m2.1991 | 0.48 | 0.64 | 0.45 | 0.48 | 0.40 | | | | |
| m2.2000.1 | 0.41 | 0.58 | 0.34 | 0.39 | 0.27 | 0.87 | | | |
| m2.2010.2 | 0.53 | 0.63 | 0.66 | 0.66 | 0.51 | 0.58 | 0.68 | | |
| m2.2010.5 | 0.32 | 0.49 | 0.60 | 0.64 | 0.51 | 0.49 | 0.59 | 0.86 | |
| m2.2010.10 | 0.42 | 0.44 | 0.66 | 0.65 | 0.59 | 0.32 | 0.44 | 0.75 | 0.76 |

 

Table 2: Correlation table for mixedness metrics across datasets and methods.

There is method-specific variance, since the correlations within methods (the top-left and bottom-right squares) are stronger than those across methods. Still, all correlations are positive, Cronbach's alpha is .87, Guttman's lambda 6 is .98, and the mean correlation is .61.

S and HDI correlations
HDI
Previous S factor studies have found that the HDI (Human Development Index) is basically a non-linear proxy for the S factor (Kirkegaard, 2014, 2015a). This is not surprising, since the HDI is calculated from longevity, education and income, all three of which are known to have strong S factor loadings. The actual derivation of HDI values is somewhat complex. One might expect them simply to average the three indicators, or to extract the general factor, but no. Instead they do complicated things (WHO, 2014).

For longevity (life expectancy at birth), they introduce a floor at 25 years and a ceiling at 85 years. According to data from the WHO (WHO, 2012), no country has values outside these limits, altho Japan is close (84 years).

For education, it is actually an average of two measures: mean years of education among those aged 25 and older, and expected years of schooling for children entering school age. These measures also have artificial limits, of 0-15 and 0-18 years respectively.

For gross national income, they use the log values and also artificial limits of 100-75,000 USD.

Moreover, these are not simply combined by standardizing (i.e. rescaling so the mean is 0 and standard deviation is 1) the values and adding them or taking the mean. Instead, a value is calculated for every indicator using the following formula:

Dimension index = (actual value - minimum value) / (maximum value - minimum value)

Equation 1: HDI dimension index formula

Note that for education, this formula is used twice and the results averaged.

Finally, the three dimensions are combined using a geometric mean:

HDI = (I_health × I_education × I_income)^(1/3)

Equation 2: HDI index combination formula

The point of using a geometric mean, as opposed to the normal arithmetic mean, is that if a single indicator is low, the overall level is strongly reduced, whereas with the arithmetic mean only the sum of the indicators matters, not their spread. If the indicators all have the same value, the geometric and arithmetic means are identical.

For instance, if the indicators are .7, .7, .7, the arithmetic mean is (.7+.7+.7)/3 = .7 and the geometric mean is (.7 × .7 × .7)^(1/3) = 0.343^(1/3) = .7. However, if the indicators are 1, .7, .4, the arithmetic mean is still (1+.7+.4)/3 = .7, but the geometric mean is (1 × .7 × .4)^(1/3) = 0.28^(1/3) ≈ 0.654, which is a bit lower than .7.
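
A quick check of this arithmetic in R:

x <- c(1, .7, .4)
mean(x)           # arithmetic mean: 0.7
prod(x)^(1/3)     # geometric mean: about 0.654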

S and HDI correlations
I used the previously extracted factor scores and the HDI data. I also extracted S factors from the HDI datasets (3 variables)2 to see how these compared with the complex HDI value derivation. Finally, I correlated the S factors from the non-HDI data, the S factors from the HDI data, the HDI values and the cognitive ability scores. Results are shown in Table 3.

 

| | HDI.1991 | HDI.2000 | HDI.2010 | HDI.S.1991 | HDI.S.2000 | HDI.S.2010 | S.1991 | S.2000 | S.2010.1 | S.2010.5 | S.2010.10 | CA2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| HDI.1991 | | 0.95 | 0.92 | 0.98 | 0.95 | 0.93 | 0.96 | 0.93 | 0.86 | 0.89 | 0.90 | 0.59 |
| HDI.2000 | 0.97 | | 0.97 | 0.94 | 0.99 | 0.96 | 0.94 | 0.98 | 0.93 | 0.95 | 0.96 | 0.66 |
| HDI.2010 | 0.94 | 0.98 | | 0.93 | 0.98 | 0.99 | 0.93 | 0.97 | 0.94 | 0.97 | 0.98 | 0.65 |
| HDI.S.1991 | 0.98 | 0.96 | 0.94 | | 0.95 | 0.94 | 0.98 | 0.92 | 0.84 | 0.88 | 0.90 | 0.54 |
| HDI.S.2000 | 0.97 | 1.00 | 0.98 | 0.97 | | 0.97 | 0.95 | 0.98 | 0.92 | 0.95 | 0.97 | 0.65 |
| HDI.S.2010 | 0.95 | 0.98 | 0.99 | 0.95 | 0.98 | | 0.94 | 0.96 | 0.94 | 0.96 | 0.97 | 0.66 |
| S.1991 | 0.96 | 0.96 | 0.94 | 0.97 | 0.97 | 0.96 | | 0.92 | 0.86 | 0.90 | 0.91 | 0.60 |
| S.2000 | 0.95 | 0.98 | 0.96 | 0.93 | 0.99 | 0.97 | 0.97 | | 0.96 | 0.98 | 0.98 | 0.69 |
| S.2010.1 | 0.89 | 0.94 | 0.94 | 0.86 | 0.94 | 0.95 | 0.92 | 0.97 | | 0.99 | 0.96 | 0.76 |
| S.2010.5 | 0.91 | 0.95 | 0.96 | 0.89 | 0.96 | 0.97 | 0.93 | 0.98 | 0.99 | | 0.98 | 0.72 |
| S.2010.10 | 0.93 | 0.96 | 0.98 | 0.93 | 0.97 | 0.98 | 0.93 | 0.97 | 0.96 | 0.98 | | 0.71 |
| CA2012 | 0.67 | 0.73 | 0.71 | 0.60 | 0.72 | 0.74 | 0.69 | 0.78 | 0.81 | 0.79 | 0.75 | |

 

Table 3: Correlation matrix for S, HDI and cognitive ability scores. Pearson’s below the diagonal, rank-order above.

All results were very strongly correlated no matter which dataset or scoring method was used. Cognitive ability scores were strongly correlated with all S factor measures. The best estimate of the relationship between the S factor and cognitive ability is probably the correlation with S.2010.1, since this is the dataset closest in time to the cognitive dataset and the S factor is extracted from the most variables. This is also the highest value (.81), but that may be a coincidence.

It is worth noting that the rank-order correlations were somewhat weaker. This usually indicates that an outlier case is increasing the Pearson correlation. To investigate this, I plotted the S.2010.1 and CA2012 variables; see Figure 4.

Figure 4: Scatter plot of S factor and cognitive ability

The scatter plot however does not seem to reveal any outliers inflating the correlation.

Method of correlated vectors
To examine whether the S factor was plausibly the cause of the pattern seen with the S factor scores (it is not necessarily so), I used the method of correlated vectors with reversing. Results are shown in Figures 5-7.

Figure 5: MCV for the 1991 dataset

Figure 6: MCV for the 2000 dataset

Figure 7: MCV for the 2010 dataset

The first result seems to be driven by a few outliers, but the second and third seem decent enough. The numerical results were fairly consistent (.71, .75, .81).
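
For readers unfamiliar with the method, here is a minimal sketch of MCV with reversing; the vectors loadings (each indicator's S loading) and r_ca (each indicator's correlation with cognitive ability) are hypothetical placeholders, not objects from my code:

# reverse indicators with negative S loadings so that all load positively
flip <- loadings < 0
loadings[flip] <- -loadings[flip]
r_ca[flip] <- -r_ca[flip]
cor(loadings, r_ca)   # the MCV correlation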

Discussion and conclusion
Generally, the results were in line with earlier studies. Sizeable S factors were found across 3 (or 6 if one counts the mini-HDI ones) different datasets, which explained 56 to 71% of the variance. There seems to be a decrease over time, which is intriguing as it may eventually lead to the 'destruction' of the S factor. It may also be due to differences between the datasets across the years, since they were not entirely comparable. I did not examine the issue in depth.

Correlations of S factors and HDIs with cognitive ability were strong, ranging from .60 to .81 depending on the year, analysis and dataset chosen, and on whether one uses the HDI values. Correlations were stronger for the larger datasets, perhaps because these were better measures of latent S. MCV supported the interpretation that the latent S factor was primarily responsible for the association (r's .71 to .81).

Future studies should examine to what degree cognitive ability and S factor differences are explainable by ethnic/racial factors, e.g. racial ancestry, as done by Kirkegaard (2015b).

Limitations
There are some problems with this paper:

  • I cannot read Portuguese and this may have resulted in including some incorrect variables.
  • There was a lack of crime variables in the datasets, altho these have central importance for sociology. None were available in the data source I used.

Supplementary material
R source code, data and figures can be found in the Open Science Framework repository.

References

Almeida, L. S., Lemos, G., & Lynn, R. (2011). Regional Differences in Intelligence and per Capita Incomes in Portugal. Mankind Quarterly, 52(2), 213.

Kirkegaard, E. O. W. (2014). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/

Kirkegaard, E. O. W. (2015a). Examining the S factor in Mexican states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-mexican-states

Kirkegaard, E. O. W. (2015b). Examining the S factor in US states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-us-states

Kirkegaard, E. O. W. (2015c). Finding mixed cases in exploratory factor analysis. The Winnower. Retrieved from thewinnower.com/papers/finding-mixed-cases-in-exploratory-factor-analysis

Kirkegaard, E. O. W. (2015d). Indian states: G and S factors. The Winnower. Retrieved from thewinnower.com/papers/indian-states-g-and-s-factors

Kirkegaard, E. O. W. (2015e). S and G in Italian regions: Re-analysis of Lynn’s data and new data. The Winnower. Retrieved from thewinnower.com/papers/s-and-g-in-italian-regions-re-analysis-of-lynn-s-data-and-new-data

Kirkegaard, E. O. W. (2015f). The S factor in China. The Winnower. Retrieved from thewinnower.com/papers/the-s-factor-in-china

Kirkegaard, E. O. W. (2015g). The S factor in the British Isles: A reanalysis of Lynn (1979). The Winnower. Retrieved from thewinnower.com/papers/the-s-factor-in-the-british-isles-a-reanalysis-of-lynn-1979

Kura, K. (2013). Japanese north–south gradient in IQ predicts differences in stature, skin color, income, and homicide rate. Intelligence, 41(5), 512–516. doi.org/10.1016/j.intell.2013.07.001

Lynn, R. (1979). The social ecology of intelligence in the British Isles. British Journal of Social and Clinical Psychology, 18(1), 1–12. doi.org/10.1111/j.2044-8260.1979.tb00297.x

Lynn, R. (1980). The social ecology of intelligence in France. British Journal of Social and Clinical Psychology, 19(4), 325–331. doi.org/10.1111/j.2044-8260.1980.tb00360.x

Lynn, R. (2010). In Italy, north–south differences in IQ predict differences in income, education, infant mortality, stature, and literacy. Intelligence, 38(1), 93–100. doi.org/10.1016/j.intell.2009.07.004

Lynn, R. (2012). North-South Differences in Spain in IQ, Educational Attainment, per Capita Income, Literacy, Life Expectancy and Employment. Mankind Quarterly, 52(3/4), 265.

Lynn, R., & Cheng, H. (2013). Differences in intelligence across thirty-one regions of China and their economic and demographic correlates. Intelligence, 41(5), 553–559. doi.org/10.1016/j.intell.2013.07.009

Lynn, R., Sakar, C., & Cheng, H. (2015). Regional differences in intelligence, income and other socio-economic variables in Turkey. Intelligence, 50, 144–149. doi.org/10.1016/j.intell.2015.03.006

Lynn, R., & Yadav, P. (2015). Differences in cognitive ability, per capita income, infant mortality, fertility and latitude across the states of India. Intelligence, 49, 179–185. doi.org/10.1016/j.intell.2015.01.009

McDaniel, M. A. (2006). State preferences for the ACT versus SAT complicates inferences about SAT-derived state IQ estimates: A comment on Kanazawa (2006). Intelligence, 34(6), 601–606. doi.org/10.1016/j.intell.2006.07.005

Revelle, W. (2015). psych: Procedures for Psychological, Psychometric, and Personality Research (Version 1.5.4). Retrieved from cran.r-project.org/web/packages/psych/index.html

Rindermann, H. (2007). The g-factor of international cognitive ability comparisons: the homogeneity of results in PISA, TIMSS, PIRLS and IQ-tests across nations. European Journal of Personality, 21(5), 667–706. doi.org/10.1002/per.634

Templer, D. I., & Rushton, J. P. (2011). IQ, skin color, crime, HIV/AIDS, and income in 50 U.S. states. Intelligence, 39(6), 437–442. doi.org/10.1016/j.intell.2011.08.001

WHO. (2012). Life expectancy. Data by country. Retrieved from apps.who.int/gho/data/node.main.688?lang=en

WHO. (2014). Human Development Report: Technical notes: Calculating the human development indices—graphical presentation. WHO. Retrieved from hdr.undp.org/sites/default/files/hdr14_technical_notes.pdf

Footnotes

1 Error in min(eigens$values) : invalid ‘type’ (complex) of argument.

2 Factor loadings for HDI factor analysis were very strong, always >.9.

Abstract
Two methods are presented that allow for identification of mixed cases in the extraction of general factors. Simulated data is used to illustrate them.

Introduction
General factors can be extracted from datasets where all or nearly all of the variables are correlated. At the case level, such general factors are decreased in size if mixed cases are present. A mixed case is an 'inconsistent' case according to the factor structure of the data.

A simple way of illustrating what I'm talking about is using the matrixplot() function from the VIM package for R (Templ, Alfons, Kowarik, & Prantner, 2015) with some simulated data.

For simulated dataset 1, start by imagining that we are measuring a general factor and that all our indicator variables have a positive loading on this general factor, but that this loading varies in strength. Furthermore, there is no measurement error and there is only one factor in the data (no group factors, i.e. no hierarchical or bi-factor structure; Jensen & Weng, 1994). I have used datasets with 50 cases and 25 variables to avoid the excessive sampling error of small samples and to keep a realistic number of cases compared to the datasets examined in S factor studies (e.g. Kirkegaard, 2015). The matrix plot is shown in Figure 1.

Figure 1: Matrix plot of dataset 1

No real data looks like this, but it is important to understand what to look for. Every indicator is on the x-axis and the cases are on the y-axis. The cases are colored by their relative values, where darker means higher values. So in this dataset we see that any case that does well on any particular indicator does just as well on every other indicator. All the indicators have the same factor loading of 1, and the proportion of variance explained is also 1 (100%), so there is little point in showing the loadings plot.
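
A minimal sketch of how a dataset like this could be generated and plotted (the multipliers and seed are illustrative; matrixplot() is from the VIM package):

library(VIM)
set.seed(1)
n_cases <- 50; n_vars <- 25
g <- rnorm(n_cases)                          # general factor scores
mult <- runif(n_vars, .05, .95)              # positive multipliers of varying strength
d1 <- sapply(mult, function(m) m * g)        # no measurement error: pure signal
matrixplot(as.data.frame(d1))                # cases on the y-axis, indicators on the x-axis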

To move towards realism, we need to complicate this simulation in some way. The first way is to introduce some measurement error. The amount of error introduced determines the factor loadings and hence the size of the general factor. In dataset 2, the error amount is .5, and the signal multiplier varies from .05 to .95 all of which are equally likely (uniform distribution). The matrix and the loadings plots are shown in Figures 2 and 3.

Figure 2: Matrix plot for dataset 2


Figure 3: Loadings plot for dataset 2

By looking at the matrix plot we can still see a fairly simple structure. Some cases are generally darker (or lighter) than others, but there is also a lot of noise, which is of course the error we introduced. The loadings show quite a bit of variation. The size of this general factor is .45.

The next complication is to introduce the possibility of negative loadings (these are consistent with a general factor, as long as they load in the right direction; Kirkegaard, 2014). For simplicity, we go back to the case of no measurement error. Figures 4 and 5 show the matrix and loadings plots.

Figure 4: Matrix plot for dataset 3

Figure 5: Loadings plot for dataset 3

The matrix plot looks odd, until we realize that some of the indicators are simply reversed. The loadings plot shows this reversal. One could easily get back to a matrix plot like that in Figure 1 by reversing all indicators with a negative loading (i.e. multiplying by -1). However, the possibility of negative loadings does increase the complexity of the matrix plots.

For the 4th dataset, we begin with dataset 2 and create a mixed case. We do this by setting its value on every indicator to 2, a strongly positive value (98th centile given a standard normal distribution). Figure 6 shows the matrix plot. I won't bother with the loadings plot because it is not strongly affected by a single mixed case.

Figure 6: Matrix plot for dataset 4

Can you guess which case it is? Perhaps not. It is #50 (top line). One might expect it to be the same hue all the way. This however ignores the fact that the values in the different indicators vary due to sampling error. So a value of 2 is not necessarily at the same centile or equally far from the mean in standard units in every indicator, but it is fairly close which is why the color is very dark across all indicators.

For datasets with general factors, the highest value of a case tends to be on the most strongly loaded indicator (Kirkegaard, 2014b), but this information is not easy to use in an eye-balling of the dataset. Thus, it is not so easy to identify the mixed case.

Now we complicate things further by adding the possibility of negative loadings. This gets us data roughly similar to that found in S factor analysis (there are still no true group factors in the data). Figure 7 shows the matrix plot.

Figure 7: Matrix plot for dataset 5

Just looking at the dataset, it is fairly difficult to detect the general factor, but in fact the variance explained is .38. The mixed case is easy to spot now (#50) since it is the only case that is consistently dark across indicators, which is odd given that some of them have negative loadings. It ‘shouldn’t’ happen. The situation is however somewhat extreme in the mixedness of the case.

Automatic detection

Eye-balling figures and data is a powerful tool for quick analysis, but it cannot give precise numerical values used for comparison between cases. To get around this I developed two methods for automatic identification of mixed cases.

Method 1
A general factor only exists when multidimensional data can be usefully compressed, informationally speaking, to 1-dimensional data (factor scores on the general factor). I encourage readers to consult the very well-made visualization of principal component analysis (almost the same as factor analysis) at this website. In this framework, mixed cases are those that are not well described or predicted by a single score.

Thus, it seems to me that we can use this information as a measure of the mixedness of a case. The method is as follows (a minimal R sketch is given after the list):

  1. Extract the general factor.
  2. Extract the case-level scores.
  3. For each indicator, regress it onto the factor scores. Save the residuals.
  4. Calculate a suitable summary metric, such as the mean absolute residual, and rank the cases.
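
A minimal sketch of this in R, assuming a data frame d of indicators (this shows the idea, not the exact code in my psych2 collection):

library(psych)
f <- fa(d)                                                  # step 1: general factor
scores <- as.vector(f$scores)                               # step 2: case-level scores
resids <- sapply(d, function(x) resid(lm(x ~ scores)))      # step 3: residuals per indicator
MAR <- rowMeans(abs(resids))                                # step 4: mean absolute residual
order(MAR, decreasing = TRUE)                               # cases ranked by mixedness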

Using this method on dataset 5 in fact does identify case 50 as the most mixed one. Mixedness varies between cases due to sampling error. Figure 8 shows the histogram.

Figure 8: Histogram of absolute mean residuals from dataset 5

The outlier on the right is case #50.

How extreme does a mixed case need to be for this method to find it? We can try reducing its mixedness by assigning it less extreme values. Table 1 shows the effects of doing this.

 

| Mixedness values | Mean absolute residual |
|---|---|
| 2 | 1.91 |
| 1.5 | 1.45 |
| 1 | 0.98 |

 

Table 1: Mean absolute residual and mixedness

So we see that when it is 2 and 1.5, it is clearly distinguishable from the rest of the cases, but 1 is about the limit of this since the second-highest value is .80. Below this, the other cases are similarly mixed, just due to the randomness introduced by measurement error.

Method 2
Since mixed cases are poorly described by a single score, they don't fit well with the factor structure in the data. Generally, this should result in the proportion of variance explained increasing when they are removed. Thus the method is as follows (a sketch follows the list):

  1. Extract the general factor from the complete dataset.
  2. For every case, create a subset of the dataset where this case is removed.
  3. Extract the general factors from each subset.
  4. For each analysis, extract the proportion of variance explained and calculate the difference from the value obtained with the full dataset.
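
A corresponding sketch for this method, again assuming a data frame d (for a single factor, the proportion of variance explained equals the mean communality):

library(psych)
full_var <- mean(fa(d)$communality)    # proportion of variance explained, full dataset
IPV <- sapply(seq_len(nrow(d)), function(i) {
  mean(fa(d[-i, ])$communality) - full_var   # improvement after dropping case i
})
order(IPV, decreasing = TRUE)          # cases whose removal improves the factor most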

Using this method on the dataset also used above correctly identifies the mixed case. The histogram of results is shown in Figure 9.

Figure 9: Histogram of differences in proportion of variance to the full analysis

As with method 1, we then redo this analysis for other levels of mixedness. Results are shown in Table 2.

 

| Mixedness values | Improvement in proportion of variance |
|---|---|
| 2 | 1.91 |
| 1.5 | 1.05 |
| 1 | 0.50 |

 

Table 2: Improvement in proportion of variance and mixedness

We see the same pattern as before: both 2 and 1.5 are clearly identifiable as outliers in mixedness, while 1 is not, since the next-highest value is .45.

Large scale simulation with the above methods could be used to establish distributions to generate confidence intervals from.

It should be noted that the improvement in proportion of variance is not independent of the number of cases (more cases means that a single case is less important, and non-linearly so), so the value cannot simply be used to compare across cases without correcting for this problem. Correcting it is however beyond the scope of this article.

Comparison of methods

The results from both methods should have some positive relationship. The scatter plot is shown in Figure 10.

Figure 10: Scatter plot of method 1 and 2

We see that the truly mixed case is a strong outlier with both methods — which is good because it really is a strong outlier. The correlation is strongly inflated because of this, to r = .70 with case 50 included, but only .26 without it. The relative lack of a positive relationship without the true outlier in mixedness is perhaps due to range restriction in mixedness in the dataset, since the only mixedness besides that of case 50 is due to measurement error. Whatever the exact interpretation, I suspect it doesn't matter, since the goal is to find the true outliers in mixedness, not to agree on the relative ranks of the cases with relatively little mixedness.1

Implementation
I have implemented both above methods in R. They can be found in my unofficial psych2 collection of useful functions located here.

Supplementary material
Source code and figures are available at the Open Science Framework repository.

References

Jensen, A. R., & Weng, L.-J. (1994). What is a good g? Intelligence, 18(3), 231–258. doi.org/10.1016/0160-2896(94)90029-9

Kirkegaard, E. O. W. (2014a). The international general socioeconomic factor: Factor analyzing international rankings. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/09/the-international-general-socioeconomic-factor-factor-analyzing-international-rankings/

Kirkegaard, E. O. W. (2014b). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology. Retrieved from openpsych.net/ODP/2014/10/the-personal-jensen-coefficient-does-not-predict-grades-beyond-its-association-with-g/

Kirkegaard, E. O. W. (2015). Examining the S factor in US states. The Winnower. Retrieved from thewinnower.com/papers/examining-the-s-factor-in-us-states

Templ, M., Alfons, A., Kowarik, A., & Prantner, B. (2015, February 19). VIM: Visualization and Imputation of Missing Values. CRAN. Retrieved from cran.r-project.org/web/packages/VIM/index.html

Footnotes

1 Concerning the agreement about rank-order, it is about .4 both with and without case 50. But this is based on a single simulation and I’ve seen some different values when re-running it. A large scale simulation is necessary.

So you want to run some code that may throw an error? This is somewhat less common in R than in e.g. PHP.

It is quite simple in Python:

The try statement works as follows.

  • First, the try clause (the statement(s) between the try and except keywords) is executed.
  • If no exception occurs, the except clause is skipped and execution of the try statement is finished.
  • If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the except keyword, the except clause is executed, and then execution continues after the try statement.
  • If an exception occurs which does not match the exception named in the except clause, it is passed on to outer try statements; if no handler is found, it is an unhandled exception and execution stops with a message as shown above.

Let's try something simple in IPython:

In [2]: try:
   ...:     "string"/7
   ...: except:
   ...:     print("Can't divide a string")
   ...:
Can't divide a string

Simple stuff.

Now, let’s do the same in R:

tryCatch("string"/7,
         error = function(e) {
           print("Can't divide a string")
         }
)

Notice how I had to add an anonymous function? Well apparently this is how it has to be done in R. The parameter to the function, e, is not even used. It would be better if one could simply do this:

tryCatch("string"/7,
         error = {
           print("Can't divide a string")
         }
)

But no, then you get:

[1] "Can't divide a string"
Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) : 
  attempt to apply non-function
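
For reference, a couple of alternatives that do work (a sketch using the same toy example):

# name the handler once and reuse it; the unused argument still has to be declared
handler <- function(e) print("Can't divide a string")
tryCatch("string"/7, error = handler)

# or use try(), which returns an object of class "try-error" instead of stopping
result <- try("string"/7, silent = TRUE)
if (inherits(result, "try-error")) print("Can't divide a string")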

After playing War for the Overworld (unofficial DK3), I felt that I needed to play the real thing. So I downloaded a version of DK2 from here. The game opens and the menus work fine, but once you get into a game, the FPS drops to unplayable levels. I tried various compatibility settings, but they didn't help. However, the advice given here works:

HOW TO FIX THE INGAME LAGGGGS !!! —— Windows 8.1

OK, even if this is not the right post here, i say it right away.

I have bought the game from Origin, installed it and the problems begun right at start. When i started the game the bullfrog logo freezed.

Just tab out in windowns one time and tab back in game and it starts running.

When i was in the menu of the game everything worked just fine. but after i started a game in the campain i had like 2 fps and horrible mouse lags.

So here is what i did and now it runs perfectly without any problems.

Go in the game menu in OPTIONS – GRAPHICS OPTION and change the resulution to 640 x 480. (maybe u have to deactived Hardware accelerations too)

You wont believe but it was that easy and the game runs fine now.

I hope this helps a few players who bought this game.

I'm just putting it here for future reference, in case I forget or someone else has the same problem.