Installing scrapy on Mint 17

One would think that after so many years, things would simply work on Linux, especially on mainstream platforms. Mint 17 is based on the latest Ubuntu LTS version (Trusty Tahr).

For the uninitiated, Linux can seem difficult with all the terminal commands. But install guides actually make things seem a lot easier than they really are. Case in point: the Scrapy site simply lists the command:

pip install scrapy

Very easy, right? Well, assuming you already managed to install pip in the first place.

In any case, when the install fails, finding the exact error is not easy. One needs some intuition about where to look in the huge log files, and then one has to Google the right error message to find the relevant information in the hive-mind (typically Stack Exchange).

libxml2 and libxslt

Buried in the 1000-line log file was an error concerning these:

ERROR: /bin/sh: 1: xslt-config: not found
** make sure the development packages of libxml2 and libxslt are installed **

Seems to be solvable by installing these.

sudo apt-get install libxml2-dev

sudo apt-get install libxslt1-dev

libffi

Package libffi was not found in the pkg-config search path.
Perhaps you should add the directory containing `libffi.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libffi' found

This was repeated many times. The command to fix is: sudo apt-get install libffi-dev

/usr/bin/ld: cannot find -lz

Found solution here.

sudo apt-get install -y zlib1g-dev

This error happened when trying to install lxml with pip.

OpenSSL error

One error was:

cryptography/hazmat/bindings/__pycache__/_Cryptography_cffi_36a40ff0x2bad1bae.c:194:25: fatal error: openssl/aes.h: No such file or directory

Which is discussed here. Solution: sudo apt-get install libssl-dev

And the rest

The above may not be sufficient. I ran lots of other commands which I have now forgotten. Perhaps some of them made a difference too. Perhaps not.

Installing R on Mint 17

As nearly always, when one wants to do something in Linux, it becomes a test of both intelligence and patience. Mostly the latter.

In this case I had installed the newest Mint on my new laptop, Mint 17. Then I set out to install R. Since this language’s name is but a single letter, it is not so easy to search for solutions. The first question is what the Linux package is called: r-base, it turns out. However, to install it, one needs to add a CRAN mirror to the repository list (the sources.list file in /etc/apt/). But their website is not very helpful because they don’t give any examples; they just write “http://<my.favorite.cran.mirror>/bin/linux/ubuntu utopic/”. And if one chooses the CRAN mirror in DK, the URL is “http://mirrors.dotsrc.org/cran/”. However, adding “http://mirrors.dotsrc.org/cran/bin/linux/ubuntu utopic/” does not work. Apparently, one has to know that one must remove the “/cran” part of the URL first. In any case, I ended up using “deb http://cran.at.r-project.org/bin/linux/ubuntu utopic/”, which works except that it gives errors about the lack of a public key (apparently not important).

Then it’s a matter of trying sudo apt-get install r-base, but no no. There are various dependencies missing. In particular, libgomp1 was at some 4.8.x version, while the required version was 4.9.x. So, to Synaptic I go. But according to Synaptic, I already have the latest version!

Ok, so Synaptic itself must be outdated. So I updated the package lists (sudo apt-get update) but that didn’t help either. Then I tried downloading and installing libgomp1 manually, but that just required some more dependencies…

So, the next idea was to update the repositories that Linux uses. Apparently, Mint 17 is based on Ubuntu 14.04 (Trusty Tahr). The question was just where one gets the URLs of these. After spending perhaps an hour trying some of the wrong ones, I finally found this page where one can generate sources.list entries for Ubuntu that are up to date! So I ended up with:

#deb cdrom:[Linux Mint 17 _Qiana_ – Release amd64 20140624]/ trusty contrib main non-free
deb http://cran.at.r-project.org/bin/linux/ubuntu utopic/

###### Ubuntu Main Repos
deb http://dk.archive.ubuntu.com/ubuntu/ utopic main restricted universe
deb-src http://dk.archive.ubuntu.com/ubuntu/ utopic main restricted universe

###### Ubuntu Update Repos
deb http://dk.archive.ubuntu.com/ubuntu/ utopic-security main restricted universe
deb http://dk.archive.ubuntu.com/ubuntu/ utopic-updates main restricted universe
deb-src http://dk.archive.ubuntu.com/ubuntu/ utopic-security main restricted universe
deb-src http://dk.archive.ubuntu.com/ubuntu/ utopic-updates main restricted universe

###### Ubuntu Partner Repo
deb http://archive.canonical.com/ubuntu utopic partner
deb-src http://archive.canonical.com/ubuntu utopic partner

###### Ubuntu Extras Repo
deb http://extras.ubuntu.com/ubuntu utopic main
deb-src http://extras.ubuntu.com/ubuntu utopic main

Which worked. After running update, I could then use Synaptic to install r-base with the dependencies. Hurray!

Admixture in the Americas: Introduction, partial correlations and IQ predictions based on ancestry

For those who have been living under a rock (i.e. not following me on Twitter), John Fuerst has been very good at compiling data from published research. Have a look at Human Varieties with the tag Admixture Mapping. He asked me to help him analyze it and write it up. I gladly obliged; you can read the draft here. John thinks we should write it all into one huge paper instead of splitting it up as is standard practice. The standard practice is perhaps not entirely just for gaming the reputation system, but also because writing huge papers like that can seem overwhelming and may take a long time to get thru review.

So the project summarized so far is this:

  • Genetic models of trait admixture predict that mixed groups will be in-between the two source populations in the trait in proportion to their admixture.
  • For psychological traits such as general intelligence (g), this has previously primarily been studied unsystematically in African Americans, but this line of research seems to have dried up, perhaps because it became too politically sensitive over there.
  • However, there have been some studies using the same method, just examining illness-related traits (e.g. diabetes). These studies usually include socioeconomic variables as controls. In doing so, they have found robust correlations between admixture at the individual level and socioeconomic outcomes: income, occupation, education and the like.
  • John has found quite a lot of these and compiled the results into a table that can be found here.
  • The results clearly show the expected pattern, namely that more European ancestry is associated with more favorable outcomes, and more African or Amerindian ancestry with less favorable outcomes. A few of them are non-significant, but none contradicts it. A meta-analysis of this would find a very small p value indeed.
  • One study actually included cognitive measures as covariates and found results in the generally expected direction. See material under the headline “Cognitive differences in the Americans” in the draft file.
  • There is no necessity that one has to look at the individual level. One can look at the group level too. For this reason John has compiled data about the ancestry proportions of American countries and Mexican regions.
  • For the countries, he has tested this against self-identified proportions, CIA World Factbook estimates, skin reflection data and stuff like that, see: humanvarieties.org/2014/10/19/racial-ancestry-in-the-americas-part-1-genomic-continental-racial-admixture-estimate-and-validation/ The results are pretty solid. The estimates are clearly in the right ballpark.
  • Now, genetic models of the world distribution of general intelligence clearly predict that these estimates will be strongly related to the countries’ estimated mean levels of general intelligence. To test this John has carried out a number of multiple regressions with various controls such as parasite prevalence or cold weather along with European ancestry with the dependent variable being skin color and national achievement scores (PISA tests and the like). Results are in the expected directions even with controls.
  • Using the Mexican regional data, John has compared the Amerindian estimates with PISA scores, Raven’s scores, and Human Development Index (a proxy for S factor (see here and here)). Post is here: humanvarieties.org/2014/10/15/district-level-variation-in-continental-racial-admixture-predicts-outcomes-in-mexico/

This is where we are. Basically, the data is all there, ready to be analyzed. Someone needs to do the other part of the grunt work, namely running all the obvious tests and writing everything up for a big paper. This is where I come in.

The first thing I did was to create an OSF repository for the data and code since John had been manually keeping track of versions on HV. Not too good. I also converted his SPSS datafile to one that works on all platforms (CSV with semi-colons).

Then I started writing code in R. First I wanted to look at the more obvious relationships, such as that between IQ and ancestry estimates (ratios). Here I discovered that John had used a newer dataset of IQ estimates Meisenberg had sent him. However, it seems to have wrong data (Guatemala) and covers fewer relevant countries (25 vs. 35) than the standard dataset from Lynn and Vanhanen 2012 (+Malloyian fixes) that I have been using. So for this reason I merged John’s already enormous dataset (126 variables) with the latest Megadataset (365 variables), to create the cleverly named supermegadataset to be used for this study.

IQ x Ancestry zero-order correlations

Here are the three scatterplots:

Americas_Euro_Ancestry_IQ12data

IQ_amer

IQ_Afro

So the reader might wonder, what is wrong with the Amerindian data? Why is the correlation about nil? Simply inspecting the plot reveals the problem. The countries with low Amerindian ancestry have very mixed European vs. African ancestry, which keeps the mean around 80-85, thus creating no correlation.

Partial correlations

So my idea was this, as I wrote it in my email to John:

Hey John, I wrote my bachelor in 4 days (5 pages per day), so now I’m back to working on more interesting things. I use the LV12 data because it seems better and is larger.

One thing that had been annoying me was that correlations between ancestry and IQ do not take into account that there are three variables that vary, not just two. Remember that odd low correlation Amer x IQ r=.14 compared with Euro x IQ = .68 and Afr x IQ = -.66. The reason for this, it seems to me, is that the countries with low Amer% are a mix of high and low Afr countries. That’s why you get a flat scatterplot. See attached.

Unfortunately, one cannot just use MR with these three variables, since the following equation is true of them 1 = Euro+Afr+Amer. They are structurally dependent. Remember that MR attempts to hold the other variables constant while changing one. This is impossible.
The solution, it seems to me, is to use partial correlations. In this way, one can partial out one of them and look at the remaining two. There are six possible ways to do this:
Amer x IQ, partial out Afr = -.51
Amer x IQ, partial out Euro = .29
Euro x IQ, partial out Afr = .41
Euro x IQ, partial out Amer = .70
Afr x IQ, partial out Euro = -.37
Afr x IQ, partial out Amer = -.76
Assuming that genotypically, Amer=85, Afr=80, Euro=97 (or so), then these results are completely as expected direction-wise.
In the first case, we remove Afr, so we are comparing Amer vs. Euro. We expect negative since Amer<Euro
In two, we expect positive because Amer>Afr
In three, we expect positive because Euro>Amer
In four, we expect positive because Euro>Afr
In five, we expect negative because Afr<Amer
In six, we expect negative because Afr<Euro
All six predictions were as expected. The sample size is quite small at N=34 and LV12 isn’t perfect, certainly not for these countries. The overall results are quite reasonable in my view.
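
The partial correlation idea from the email can be sketched in R as follows. This is only an illustration under stated assumptions: the data are simulated (not the LV12 or ancestry data), and the helper name partial_cor and all variable names are made up for the example. The formula is the standard first-order partial correlation, and the sketch also shows the structural dependence that rules out ordinary multiple regression.

```r
# First-order partial correlation of x and y, controlling for z
# (hypothetical helper, not from the original analysis)
partial_cor = function(x, y, z) {
  r_xy = cor(x, y); r_xz = cor(x, z); r_yz = cor(y, z)
  (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
}

# Simulated ancestry proportions that sum to 1 (NOT real data)
set.seed(1)
n = 34
raw = matrix(runif(n * 3), ncol = 3)
anc = raw / rowSums(raw)
euro = anc[, 1]; afr = anc[, 2]; amer = anc[, 3]

# Simulated outcome built from the proportions plus noise (assumed weights)
iq = 95 * euro + 85 * amer + 70 * afr + rnorm(n, sd = 3)

# The formula gives the same value as correlating the residuals
# left after regressing both variables on the control variable
r1 = partial_cor(amer, iq, afr)
r2 = cor(resid(lm(amer ~ afr)), resid(lm(iq ~ afr)))
print(c(formula = r1, residuals = r2))

# Structural dependence: the third proportion is fully determined by the other two
print(all.equal(euro, 1 - afr - amer))
```

Partialling out one ancestry variable leaves a comparison between the other two, which is why each of the six cases has a clear expected sign.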

Estimates of IQ directly from ancestry

But instead of merely looking at it via correlations or regressions, one can try to predict the IQs directly from the ancestry. Simply create a predicted IQ based on the proportions and these populations’ estimated IQs. I tried a number of variations, but they were all close to this: Euro*95+Amer*85+Afro*70. The reason to use Euro 95 and not, say, 100 is that 100 is the IQ of Northern Europeans, in particular the British (‘Greenwich Mean IQ’). The European genes found in the Americas are mostly from Spain and Portugal, which have estimated IQs of 96.6 and 94.4 (mean = 95.5). This creates a problem since the European ancestry of the US and Canada is not mostly from these somewhat lower IQ Europeans, but the error source is small (one can always just try excluding them).
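
A minimal sketch of that calculation in R, assuming the weights from the formula above. The three countries and their ancestry proportions are invented for illustration and are not taken from the dataset.

```r
# Source-population values taken from the formula in the text
weights = c(euro = 95, amer = 85, afro = 70)

# Hypothetical ancestry proportions for three made-up countries (NOT real data)
anc = data.frame(
  country = c("A", "B", "C"),
  euro = c(0.80, 0.50, 0.10),
  amer = c(0.15, 0.40, 0.05),
  afro = c(0.05, 0.10, 0.85)
)

# Each row must sum to 1 for the weighted mean to make sense
stopifnot(all(abs(rowSums(anc[, names(weights)]) - 1) < 1e-9))

# Predicted value = ancestry-weighted mean of the source values
anc$predicted = as.vector(as.matrix(anc[, names(weights)]) %*% weights)
print(anc)
```

Trying a variation is then just a matter of changing the weights vector and recomputing.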

So, do the predictions work? Yes.

Now, there is another kind of error with such estimates, called elevation. It refers to getting the intervals between countries right, but generally either over or underestimating them. This kind of error is undetectable in correlation analysis. But one can calculate it by taking the predicted IQs and subtracting the measured IQs, and then taking the mean of these values. Positive values mean that one is overestimating, negative means underestimation. The value for the above is: 1.9, so we’re overestimating a little bit, but it’s fairly close. A bit of this is due to USA and CAN, but then again, LCA (St. Lucia) and DMA (Dominica) are strong negative outliers, perhaps just wrong estimates by Lynn and Vanhanen (the only study for St. Lucia is this, but I don’t have the norms so I can’t calculate the IQ).
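
The difference between correlation and elevation can be sketched like this. The numbers are invented for illustration only; the point is that adding a constant to the predictions changes the elevation error but not the correlation.

```r
# Hypothetical predicted and measured values (NOT real data)
predicted = c(92, 88, 73, 85)
measured  = c(90, 87, 70, 84)

# Correlation captures whether the intervals between countries are right
r = cor(predicted, measured)

# Elevation error: mean signed difference, positive = overestimation
elevation = mean(predicted - measured)

# Adding a constant changes elevation but leaves the correlation untouched
shifted = predicted + 10
print(c(r = r, elevation = elevation,
        r_shifted = cor(shifted, measured),
        elevation_shifted = mean(shifted - measured)))
```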

I told Davide Piffer about these results and he suggested that I use his PCA factor scores instead. Now, these are not themselves meaningful, but they have the intervals directly estimated from the genetics. His numbers are: Africa: -1.71; Native American: -0.9; Spanish: -0.3. Ok, let’s try:

PCA_predicted_IQs

Astonishingly, the correlation is almost the same: it differs by .01. However, this fact is less overwhelming than it seems at first, because it arises simply because the correlation between the three racial estimates is .999 (95.5

Review: Writing Systems: An Introduction to Their Linguistic Analysis

www.goodreads.com/book/show/16641082-writing-systems

libgen.org/search.php?req=coulmas+writing&open=0&view=simple&phrase=1&column=def

I read this book as part of background reading for my bachelor (which im writing here) after seeing it referred to in a few other books. As a textbook it seems fine, except for the chapter dealing with psycholinguistics. Nearly all the references in this section are clearly dated, and the author is not up to speed.

Some quotes and comments.

Over time, the gap between spelling and pronunciation is bound to widen
in alphabetic orthographies, as spoken forms change and written forms are retained.
Many of the so-called ‘silent’ letters in French can be explained in this way. Catach
(1978: 65) states that 12.83 per cent of letters are mute letters in French, that is,
letters that have no phonetic interpretation whatever.

Imagine how much money and time has been spent on typing silent letters. Several hundred years of typing 13% more letters than necessary. 13% more paper use. Remember when books were actually expensive.

14 ways of writing u in English

A neat little overview. English is probably unique in this degree of linguistic insanity.

hebrew

Perhaps that’s where the name of the danish letter J comes from (jʌð). I always wondered.

We are like sailors who must rebuild their boat on the open sea without ever
being able to take it apart in a dock and reassemble it from scratch. -Otto Neurath

I have seen this one before, but i couldnt verify it via Wikiquote while writing this (on laptop).

The conflicting views about the role of phonological recoding in fluent
reading are mirrored in a long-standing controversy that pervades reading
teaching methods. On one hand, the phonics and decoding method views reading
as a process that converts written forms of language to speech forms and
then to meaning. A teaching method, consequently, should emphasize phonological
knowledge. As one leading proponent of the phonics/decoding approach puts
it, ‘phonological skills are not merely concomitants or by-products of reading
ability; they are true antecedents that may account for up to 60 per cent of the
variance in children’s reading ability’ (Mann 1991: 130). On the other hand, the
whole-word method sees reading as a form of communication that consists of the
reception of information through the written form, the recovery of meaning being
the essential purpose. ‘Since it is the case that learning to recognize whole words
is necessary to be a fluent reader, therefore, the learning of whole words right
from the start may be easier and more effective’ (Steinberg, Nagata and Aline
2001: 97).

This sounds like another case of social scientists identifying g without realizing it. Phonological awareness surely correlates with g.

onlinelibrary.wiley.com/doi/10.1002/(SICI)1099-0909(199806)4:2%3C73::AID-DYS104%3E3.0.CO;2-%23/abstract

This study reports a factor analysis of 4 WAIS subtests + a phonological awareness (PA) test. PA had a loading on g of .61.

See also: psycnet.apa.org/journals/psp/86/1/174/ and www.sciencedirect.com/science/article/pii/S0160289697900167

Although a general correlation between literacy rate and prosperity can be observed, relatively poor countries with high literacy rates, such as Vietnam and Sri Lanka, and very rich countries with residual illiteracy, such as the United States, do exist.

This pattern is easily explainable if one knows that the national g of Vietnam and Sri Lanka is around world average, while the US is much higher. The high illiteracy rate of the US is becus of their minority populations of hispanics and african americans.

Discrimination against females in grant applications or publication bias?

While looking for peer-review related studies, I came across a meta-analysis of gender bias in grant applications. That sounds good.

Bornmann, L., Mutz, R., & Daniel, H. D. (2007). Gender differences in grant peer review: A meta-analysis. Journal of Informetrics, 1(3), 226-238.

Abstract
Narrative reviews of peer review research have concluded that there is negligible evidence of gender bias in the awarding of grants based on peer review. Here, we report the findings of a meta-analysis of 21 studies providing, to the contrary, evidence of robust gender differences in grant award procedures. Even though the estimates of the gender effect vary substantially from study to study, the model estimation shows that all in all, among grant applicants men have statistically significant greater odds of receiving grants than women by about 7%.

Sounds intriguing? Knowing that publication bias, especially on the part of the authors, would be a problem with this kind of study (strong political motivation to publish studies finding bias against females), I immediately searched for key words related to publication bias… but didn’t find any. Then I skimmed the article. Very odd. Who does a meta-analysis on stuff like this, or most stuff anyway, without checking for publication bias?

TL;DR funnel plot. Briefly, publication bias happens when authors tend to send in papers that found results they liked rather than studies that failed to find results they liked. It can also result from biased reviewing. Since most social scientists believe in the magic of p<.05, this means scholars tend to publish studies meeting that arbitrary demand and not those that didn’t. Furthermore, when scholars do a lot of sciencing, gathering a lot of data, they are also more likely to submit due to having a larger amount of time invested in the project. These two together mean that there is an interaction between effect size and direction and the N. People who spent lots of time doing a huge study will generally be very likely to publish it even tho the results weren’t as expected. But people who run small studies will to a higher degree not bother writing up a small paper with negative results. This means that there will be a negative correlation between sample size and the preferred outcome, in this case bias against females.

Back to the meta-analysis. Luckily, the authors provide the data to calculate any bias. Their Table 1 has sample size (“Number of submitted applications”) and two datapoints one can calculate effect size from (“Proportion of women among all applicants”, “Proportion of women among approved applicants”). Simply subtract the proportion of women among approved applicants from the proportion of women among all applicants to get a female disfavor measure. (Well, actually it may just be that women write worse applications, so it does not imply bias at all.) Then correlate this with the sample size.

So the data is here. The funnel plot is below.

Funnel plot of studies examining gender bias in grant applications 2

There were indeed signs of publication bias. The simple N x effect size correlation did not reach p<.05. However, there is some question as to which measure of sample size one should use. The distribution is clearly not linear, thus violating the assumptions of linear regression/Pearson correlations. This page lists a variety of other options, the best perhaps being standard error, which however is not given by the authors. Here’s a funnel plot for log transformed sample sizes. This one has p=.04, so barely below the magic line.

Funnel plot of studies examining gender bias in grant applications log
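
For readers who want to repeat this kind of check, here is a rough sketch in R. The data frame is invented and only mimics the layout of the three Table 1 columns named above; the column names are my own. It is not the code or the data behind the plots in this post.

```r
# Hypothetical study-level data (NOT the real numbers from the meta-analysis)
d = data.frame(
  n_applications = c(120, 450, 900, 3000, 15000, 60000),
  prop_women_applicants = c(0.30, 0.25, 0.28, 0.22, 0.26, 0.24),
  prop_women_approved   = c(0.22, 0.20, 0.25, 0.21, 0.25, 0.24)
)

# Female disfavor: share among all applicants minus share among approved applicants
d$disfavor = d$prop_women_applicants - d$prop_women_approved

# Correlate the effect measure with raw and log-transformed sample size
r_raw = cor.test(d$n_applications, d$disfavor)
r_log = cor.test(log10(d$n_applications), d$disfavor)
print(c(r_raw = unname(r_raw$estimate), p_raw = r_raw$p.value,
        r_log = unname(r_log$estimate), p_log = r_log$p.value))

# Funnel-style plot: effect against log sample size
plot(log10(d$n_applications), d$disfavor,
     xlab = "log10(number of applications)", ylab = "Female disfavor")
abline(h = 0, lty = 2)
```

A negative correlation here would be the pattern expected under publication bias: small studies showing larger disfavor than large ones.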

Beautiful nonsense

In a comment on research of psychedelics, I found this gem:

.Your brain is a matrix that is dispensing substances known as neurotransmitters which don’t actually transmit anything but rather their purpose is to alter the rate and manner in which energy is affected. The resulting diffusion of energy is a 3-dimensional fractal continuously expanding outward in all directions or your ora which is being observed by a Consciousness at a given distance so it appears as a sphere. your ora could and perhaps is being observed at multiple distances giving rise to the sphere within spheres notion. As the observer perceives your ora it analyzes the resulting fractals (like divining the surface of the sun) and forms ideas within itself and then these ideas are conveyed back to you as thoughts to see how You react. This is either due to its choice or more likely due to the nature of “knowing” Let me deconstruct the word know for you, there is a line (l) intersected (->l) that diverges (->K) in (N) and around (O) to double you (W). The act of analyzing and conveying the information is perhaps the 1/2 to 2 second delay in “our” reality Didn’t Plato note that man by his nature is a member of a group which could be taken a step farther by saying man by his nature needs another to “be”. In this reality we are observed so that perhaps we become aware. So what I perceive as my conscious mind is my perception of the observed sphere (that hazy mirrored reflection) and my thoughts which are actually the interpretation of the observed sphere by another. Here is a way to examine what I mean, become a point in space then a sphere then back to a point again over and over, you can easily “see” a star and manipulate it by changing the perceived distance but when you are the point and become the sphere and back again you can only “feel” a sense of expansion and contraction. If we are only this then why do we perceive so readily from the outside and not vise versa. 
My subconscious is the swirling chaos of the 3-d fractal while my higher consciousness is that part of the interaction that escaped the analysis of the observer and is expanding infinitely fleeing from the observers expanding sphere of analysis. With its own analysis slowing it the only hope for rapture is becoming the leading edge expanding exponentially to complete dissipation. ora becoming light I meant.

My favorite part:

My subconscious is the swirling chaos of the 3-d fractal while my higher consciousness is that part of the interaction that escaped the analysis of the observer and is expanding infinitely fleeing from the observers expanding sphere of analysis.

I kinda want a t-shirt with it now.

For the uninitiated, see: emilkirkegaard.dk/en/?p=3629, emilkirkegaard.dk/en/?p=3490, emilkirkegaard.dk/en/?p=2537

W values from the Shapiro-Wilk test visualized with different datasets

For a mathematical explanation of the test, see e.g. here. However, such an explanation is not very useful for using the test in practice. Just what does a W value of .95 mean? What about .90 or .99? One way to get a feel for it is to simulate datasets, plot them and calculate the W values. Additionally, one can check the sensitivity of the test, i.e. the p value.

All the code is in R.

#random numbers from normal distribution
set.seed(42) #for reproducible numbers
x = rnorm(5000) #generate random numbers from normal dist
hist(x,breaks=50, main="Normal distribution, N=5000") #plot
shapiro.test(x) #SW test
#> W = 0.9997, p-value = 0.744

SW_norm

So, as expected, W was very close to 1, and p was large. In other words, SW did not reject a normal distribution just because N is large. But maybe it was a freak accident. What if we were to repeat this experiment 1000 times?

#repeat sampling + test 1000 times
n_sim = 1000 #number of simulations
Ws = numeric(n_sim); Ps = numeric(n_sim) #preallocated result vectors
for (i in 1:n_sim){
  x = rnorm(5000) #generate random numbers from normal dist
  sw = shapiro.test(x)
  Ws[i] = sw$statistic #store W
  Ps[i] = sw$p.value #store p value
}
hist(Ws,breaks=50) #plot W distribution
hist(Ps,breaks=50) #plot P distribution
sum(Ps<.05) #how many Ps below .05?

The number of Ps below .05 was in fact 43, or 4.3%. I ran the code with 100,000 simulations too, which takes 10 minutes or something. The value was 4389, i.e. 4.4%. So it seems that the method used to estimate the P value is slightly off in that the false positive rate is lower than expected.
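
Whether these counts really differ from the nominal 5% can be checked with an exact binomial test. This is a small sketch using only the two counts reported above; the variable names are my own.

```r
# Counts of p < .05 reported in the text, tested against a nominal rate of 5%
bt_small = binom.test(43, 1000, p = 0.05)      # 1,000 simulations
bt_large = binom.test(4389, 100000, p = 0.05)  # 100,000 simulations

print(bt_small)
print(bt_large)

# Confidence intervals for the observed false positive rate
print(rbind(small = bt_small$conf.int, large = bt_large$conf.int))
```

With 1,000 simulations the count is too noisy to tell 4.3% from 5%, while the 100,000-simulation run is precise enough that the shortfall is not plausibly chance.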

What about the W statistic? Is it sensitive to fairly small deviations from normality?

#random numbers from normal distribution, slight deviation
x = c(rnorm(4900),rnorm(100,2))
hist(x,breaks=50, main="Normal distribution N=4900 + normal distribution N=100, mean=2")
shapiro.test(x)
#> W = 0.9965, p-value = 1.484e-09


Here I started with a very large norm. dist. and added a small norm dist. to it with a different mean. The difference is hardly visible to the eye, but the P value is very small. The reason is that the large sample size makes it possible to detect even very small deviations from normality. W was again very close to 1, indicating that the distribution was close to normal.

What about a decidedly non-normal distribution?

#random numbers between -10 and 10
x = runif(5000, min=-10, max=10)
hist(x,breaks=50,main="evenly distributed numbers [-10;10], N=5000")
shapiro.test(x)
#> W = 0.9541, p-value < 2.2e-16

 

SW_even

SW rightly rejects normality here with great certainty. However, W is near 1 still (.95). This tells us that the W value does not vary very much even when the distribution is decidedly non-normal. For interpretation then, we should probably bark when W drops just under .99 or so.

As a further test of the W values, here are two equal-sized distributions plotted together.

#normal distributions, 2 sd apart (unimodal fat normal distribution)
x = c(rnorm(2500, -1, 1),rnorm(2500, 1, 1))
hist(x,breaks=50,main="Normal distributions, 2 sd apart")
shapiro.test(x)
#> W = 0.9957, p-value = 6.816e-11
sd(x)
#> 1.436026

SW_norm3

It still looks fairly normal, altho too fat. The standard deviation is in fact 1.44, or 44% larger than it is supposed to be. The W value is still fairly close to 1, however, and only a little less than from the distribution that was only slightly nonnormal (Ws = .9957 and .9965). What about clearly bimodal distributions?

#bimodal normal distributions, 4 sd apart
x = c(rnorm(2500, -2, 1),rnorm(2500, 2, 1))
hist(x,breaks=50,main="Normal distributions, 4 sd apart")
shapiro.test(x)
#> W = 0.9464, p-value < 2.2e-16

SW_norm4

This clearly looks nonnormal. SW rejects it rightly and W is about .95 (W=0.9464). This is a bit lower than for the evenly distributed numbers. (W=0.9541)

What about an extreme case of nonnormality?

#bimodal normal distributions, 20 sd apart
x = c(rnorm(2500, -10, 1),rnorm(2500, 10, 1))
hist(x,breaks=50,main="Normal distributions, 20 sd apart")
shapiro.test(x)
#> W = 0.7248, p-value < 2.2e-16

SW_norm5

Finally we make a big reduction in the W value.

What about some more moderate deviations from normality?

#random numbers from normal distribution, moderate deviation
x = c(rnorm(4500),rnorm(500,2))
hist(x,breaks=50, main="Normal distribution N=4500 + normal distribution N=500, mean=2")
shapiro.test(x)
#> W = 0.9934, p-value = 1.646e-14

SW_norm6

This one has a longer tail on the right side, but it still looks fairly normal. W=.9934.

#random numbers from normal distribution, large deviation
x = c(rnorm(4000),rnorm(1000,2))
hist(x,breaks=50, main="Normal distribution N=4000 + normal distribution N=1000, mean=2")
shapiro.test(x)
#> W = 0.991, p-value < 2.2e-16

SW_norm7

This one has a very long right tail. W=.991.

In conclusion

Generally we see that given a large sample, SW is sensitive to departures from normality. If the departure is very small, however, it is not very important.

We also see that it is hard to reduce the W value even if one deliberately tries. One needs to test extremely non-normal distributions in order for it to fall appreciably below .99.
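
To see how W falls as a distribution becomes more clearly bimodal, one can loop over the distance between two equal-sized normal components. This is a sketch in the same spirit as the simulations above; the set of separations is an arbitrary choice, and the exact W values will depend on the seed.

```r
# W as a function of the distance between two equal-sized normal components
set.seed(42)
separations = c(0, 2, 4, 6, 10, 20) #distance between component means, in SD units
W = sapply(separations, function(d) {
  x = c(rnorm(2500, -d / 2, 1), rnorm(2500, d / 2, 1)) #N=5000, the max shapiro.test accepts
  unname(shapiro.test(x)$statistic)
})
print(data.frame(separation = separations, W = round(W, 4)))
plot(separations, W, type = "b", xlab = "Separation (SD)", ylab = "W")
```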

New paper out: The personal Jensen coefficient does not predict grades beyond its association with g

Found null results for a proposed metric (actually two). In the spirit of publishing failed ideas, I wrote this up.

Abstract

General intelligence (g) is known to predict grades at all educational levels. A Jensen coefficient is the correlation of subtests’ g-loadings with a vector of interest. I hypothesized that the personal Jensen coefficient from the subjects’ subtest scores might predict grade point average beyond g. I used an open dataset to test this. The results showed that it does not seem to have predictive power beyond g (partial correlation = -.02). I found the same result when using a similar metric suggested by Davide Piffer.
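
As a rough illustration of the metric described in the abstract, the personal Jensen coefficient for one person is the correlation between that person's subtest scores and the subtests' g-loadings. The sketch below uses invented loadings and scores; the paper's exact procedure (e.g. how scores are standardized) may differ.

```r
# Hypothetical g-loadings for 5 subtests (NOT from the paper's dataset)
g_loadings = c(0.80, 0.70, 0.60, 0.50, 0.40)

# Hypothetical standardized subtest scores, one row per person
scores = rbind(
  p1 = c(1.2, 0.9, 0.5, 0.2, -0.1), #does best on the most g-loaded tests
  p2 = c(-0.3, 0.0, 0.4, 0.8, 1.1), #does best on the least g-loaded tests
  p3 = c(0.5, -0.2, 0.6, -0.1, 0.3)
)

# Personal Jensen coefficient: correlation between a person's subtest
# scores and the subtests' g-loadings
pjc = apply(scores, 1, function(s) cor(s, g_loadings))
print(round(pjc, 2))
```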

openpsych.net/ODP/2014/10/the-personal-jensen-coefficient-does-not-predict-grades-beyond-its-association-with-g/

Meisenberg’s new book chapter on intelligence, economics and other stuff

G.M. IQ & Economic growth

I noted down some comments while reading it.

In Table 1, Dominican birth cohort is reversed.

 

“0.70 and 0.80 in world-wide country samples. Figure 1 gives an impression of this relationship.”

Figure 1 shows regional IQs, not GDP relationships.

“We still depend on these descriptive methods of quantitative genetics because only a small proportion of individual variation in general intelligence and school achievement can be explained by known genetic polymorphisms (e.g., Piffer, 2013a,b; Rietveld et al, 2013).”

We don’t. Modern BG studies can confirm A^2 estimates directly from the genes.

E.g.:

Davies, G., Tenesa, A., Payton, A., Yang, J., Harris, S. E., Liewald, D., … & Deary, I. J. (2011). Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Molecular psychiatry, 16(10), 996-1005.

Marioni, R. E., Davies, G., Hayward, C., Liewald, D., Kerr, S. M., Campbell, A., … & Deary, I. J. (2014). Molecular genetic contributions to socioeconomic status and intelligence. Intelligence, 44, 26-32.

Results are fairly low tho, in the 20’s, presumably due to non-additive heritability and rarer genes.

 

“Even in modern societies, the heritability of intelligence tends to be higher for children from higher socioeconomic status (SES) families (Turkheimer et al, 2003; cf. Nagoshi and Johnson, 2005; van der Sluis et al, 2008). Where this is observed, most likely environmental conditions are of similar high quality for most high-SES children but are more variable for low-SES children.”

Or maybe not. There are also big studies that don’t find this interaction effect. See: en.wikipedia.org/wiki/Heritability_of_IQ#Heritability_and_socioeconomic_status

 

“Schooling has only a marginal effect on growth when intelligence is included, consistent with earlier results by Weede & Kämpf (2002) and Ram (2007).”

In the regression model of all countries, schooling has a larger beta than IQ does (.158 and .125). But these appear to be unstandardized values, so they are not readily comparable.
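
To show why unstandardized values are not readily comparable, here is a small Python sketch of the usual conversion: standardized beta = b × SD(predictor) / SD(outcome). The two coefficients are the ones quoted above. The standard deviations are invented for illustration, since the chapter's descriptive statistics are not reproduced here, so the printed numbers mean nothing beyond showing the mechanics.

```python
def standardized_beta(b, sd_x, sd_y):
    """Convert an unstandardized regression coefficient to a standardized one."""
    if sd_x <= 0 or sd_y <= 0:
        raise ValueError("standard deviations must be positive")
    return b * sd_x / sd_y

# Unstandardized coefficients as quoted in the text.
b_schooling = 0.158
b_iq = 0.125

# HYPOTHETICAL standard deviations, invented for this example only.
sd_schooling = 2.5   # years of schooling
sd_iq = 11.0         # national IQ points
sd_growth = 1.8      # outcome variable

print(round(standardized_beta(b_schooling, sd_schooling, sd_growth), 3))
print(round(standardized_beta(b_iq, sd_iq, sd_growth), 3))
```

With these invented SDs the ordering of the two predictors flips, which is the whole point: a larger unstandardized coefficient does not by itself mean a stronger effect.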

“Also, earlier studies that took account of earnings and cognitive test scores of migrants in the host country or IQs in wealthy oil countries have concluded that there is a substantial causal effect of IQ on earnings and productivity (Christainsen, 2013; Jones & Schneider, 2010)”

 

National IQs were also found to predict migrant income, as well as most other socioeconomic traits, in Denmark and Norway (and Finland and the Netherlands).

Kirkegaard, E. O. W. (2014). Crime, income, educational attainment and employment among immigrant groups in Norway and Finland. Open Differential Psychology.

Kirkegaard, E. O. W., & Fuerst, J. (2014). Educational attainment, income, use of social benefits, crime rate and the general socioeconomic factor among 71 immigrant groups in Denmark. Open Differential Psychology.

 

 

The image quality of Figures 3A-C is too low.

 

 

“Allocation of capital resources has been an element of classical growth theory (Solow, 1956). Human capital theory emphasizes that individuals with higher intelligence tend to have lower impulsivity and lower time preference (Shamosh & Gray, 2008). This is predicted to lead to higher savings rates and greater resource allocation to investment relative to consumption in countries with higher average intelligence.”

 

Time preference data for 45 countries are given by:

Wang, M., Rieger, M. O., & Hens, T. (2011). How time preferences differ: evidence from 45 countries.

They are in the megadataset from version 1.7f onward.

Correlations among some variables of interest:

r
             SlowTimePref Income.in.DK Income.in.NO   IQ lgGDP
SlowTimePref         1.00         0.45         0.48 0.57  0.64
Income.in.DK         0.45         1.00         0.89 0.55  0.59
Income.in.NO         0.48         0.89         1.00 0.65  0.66
IQ                   0.57         0.55         0.65 1.00  0.72
lgGDP                0.64         0.59         0.66 0.72  1.00

n
             SlowTimePref Income.in.DK Income.in.NO  IQ lgGDP
SlowTimePref          273           32           12  45    40
Income.in.DK           32          273           20  68    58
Income.in.NO           12           20          273  23    20
IQ                     45           68           23 273   169
lgGDP                  40           58           20 169   273

So time prefs predict income in DK and NO only slightly worse than national IQs or lgGDP.
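
Since each correlation above rests on a different number of countries, it helps to see how such pairwise tables are built. Below is a minimal Python sketch of pairwise-complete correlation: each pair of variables uses only the cases where both values are present, and returns its own n. The data are made up and the function name is my own. This is not the code that produced the tables above.

```python
from math import sqrt

def pairwise_r(x, y):
    """Pearson r over cases where both values are present. Returns (r, n).

    Missing values are represented by None. r is None when fewer than
    3 complete pairs exist or when either variable has no variance.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return None, n
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None, n
    return sxy / sqrt(sxx * syy), n

# Made-up example: two variables with different missing cases.
time_pref = [1.0, 2.0, None, 4.0, 5.0]
income = [2.0, 4.0, 6.0, None, 10.0]

r, n = pairwise_r(time_pref, income)
print(r, n)
```

A correlation based on n = 12 (as for time preference and income in Norway) is far less precise than one based on n = 169, so the two tables should always be read together.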

 

 

“Another possible mediator of intelligence effects that is difficult to measure at the country level is the willingness and ability to cooperate. A review by Jones (2008) shows that cooperativeness, measured in the Prisoner’s dilemma game, is positively related to intelligence. This correlate of intelligence may explain some of the relationship of intelligence with governance. Other likely mediators of the intelligence effect include less red tape and restrictions on economic activities (“economic freedom”), higher savings and/or investment, and technology adoption in developing countries.”

 

There are data for IQ and trust too. Presumably trust is closely related to willingness to cooperate.

Carl, N. (2014). Does intelligence explain the association between generalized trust and economic development? Intelligence, 47, 83–92. doi:10.1016/j.intell.2014.08.008

 

 

“There is no psychometric evidence for rising intelligence before that time because IQ tests were introduced only during the first decade of the 20th century, but literacy rates were rising steadily after the end of the Middle Age in all European countries for which we have evidence (Mitch, 1992; Stone, 1969), and the number of books printed per capita kept rising (Baten & van Zanden, 2008).”

 

There are also age heaping scores, which are a crude measure of numeracy. AH scores for 1800 to 1970 are in the megadataset. They have been going up for centuries too, just like literacy rates. See:

A’Hearn, B., Baten, J., & Crayen, D. (2009). Quantifying quantitative literacy: Age heaping and the history of human capital. The Journal of Economic History, 69(03), 783–808.
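
For readers unfamiliar with age heaping, here is a short Python sketch of how such a score can be computed. It assumes the standard Whipple index (share of reported ages ending in 0 or 5 within the 23-62 age range, relative to the expected one fifth) and the ABCC transformation as I understand it from the paper above. The age counts are made up, and I have not checked which exact variant is in the megadataset.

```python
def whipple_index(age_counts):
    """Whipple index from a dict {age: number of people reporting that age}.

    100 means no heaping on ages ending in 0 or 5; 500 means everyone
    aged 23-62 reported an age ending in 0 or 5.
    """
    heaped = sum(age_counts.get(a, 0) for a in range(25, 61, 5))  # 25, 30, ..., 60
    total = sum(age_counts.get(a, 0) for a in range(23, 63))      # 23, 24, ..., 62
    if total == 0:
        raise ValueError("no observations in the 23-62 age range")
    return 100.0 * heaped / (total / 5.0)

def abcc(whipple):
    """ABCC index: rough share (0-100) of people reporting their age accurately."""
    if whipple < 100:
        return 100.0
    return (1 - (whipple - 100) / 400) * 100

# Made-up populations for illustration.
no_heaping = {age: 10 for age in range(23, 63)}    # every age equally common
full_heaping = {age: 10 for age in range(25, 61, 5)}  # only ages ending in 0 or 5

print(whipple_index(no_heaping), abcc(whipple_index(no_heaping)))
print(whipple_index(full_heaping), abcc(whipple_index(full_heaping)))
```

Higher ABCC values mean fewer people rounding their age, which is why the score works as a crude numeracy measure.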

 

 

“Why did this spiral of economic and cognitive growth take off in Europe rather than somewhere else, and why did it not happen earlier, for example in classical Athens or the Roman Empire? One part of the answer is that this process can start only when technologies are already in place to translate rising economic output into rising intelligence. The minimal requirements are a writing system that is simple enough to be learned by everyone without undue effort, and a means to produce and disseminate written materials: paper, and the printing press. The first requirement had been present in Europe and the Middle East (but not China) since antiquity, and the second was in place in Europe from the 15th century. The Arabs had learned both paper-making and printing from the Chinese in the 13th century (Carter, 1955), but showed little interest in books. Their civilization was entering into terminal decline at about that time (Huff, 1993).”

 

Are there no Flynn effects in China? They still have a difficult writing system.

 

“Most important is that Flynn effect gains have been decelerating in recent years. Recent losses (anti-Flynn effects) were noted in Britain, Denmark, Norway and Finland. Results for the Scandinavian countries are based on comprehensive IQ testing of military conscripts aged 18-19. Evidence for losses among British teenagers is derived from the Raven test (Flynn, 2009) and Piagetian tests (Shayer & Ginsburg, 2009). These observations suggest that for cohorts born after about 1980, the Flynn effect is ending or has ended in many and perhaps most of the economically most advanced countries. Messages from the United States are mixed, with some studies reporting continuing gains (Flynn, 2012) and others no change (Beaujean & Osterlind, 2008).”

 

These are confounded with immigration of low-g migrants, however. Maybe the Flynn effect is still there, just being masked by dysgenics + low-g immigration.

 

 

“The unsustainability of this situation is obvious. Estimating that one third of the present IQ differences between countries can be attributed to genetics, and adding this to the consequences of dysgenic fertility within countries, leaves us with a genetic decline of between 1 and 2 IQ points per generation for the entire world population. This decline is still more than offset by Flynn effects in less developed countries, and the average IQ of the world’s population is still rising. This phase of history will end when today’s developing countries reach the end of the Flynn effect. “Peak IQ” can reasonably be expected in cohorts born around the mid-21st century. The assumptions of the peak IQ prediction are that (1) Flynn effects are limited by genetic endowments, (2) some countries are approaching their genetic limits already, and others will follow, and (3) today’s patterns of differential fertility favoring the less intelligent will persist into the foreseeable future.”

 

It is possible that embryo selection for higher g will kick in and change this.

Shulman, C., & Bostrom, N. (2014). Embryo Selection for Cognitive Enhancement: Curiosity or Game-changer? Global Policy, 5(1), 85–92. doi:10.1111/1758-5899.12123

 

 

“Fertility differentials between countries lead to replacement migration: the movement of people from high-fertility countries to low-fertility countries, with gradual replacement of the native populations in the low-fertility countries (Coleman, 2002). The economic consequences depend on the quality of the migrants and their descendants. Educational, cognitive and economic outcomes of migrants are influenced heavily by prevailing educational, cognitive and economic levels in the country of origin (Carabaña, 2011; Kirkegaard, 2013; Levels & Dronkers, 2008), and by the selectivity of migration. Brain drain from poor to prosperous countries is extensive already, for example among scientists (Franzoni, Scellato & Stephan, 2012; Hunter, Oswald & Charlton, 2009).”

 

There are quite a few more papers on the spatial transferability hypothesis. I have 5 papers on this alone in ODP: openpsych.net/ODP/tag/country-of-origin/

There are also as yet unpublished data on crime in the Netherlands and more crime data for Norway. Papers based on these data are on their way.