Clear Language, Clear Mind

December 6, 2018

Suppressed Taylor Swift – Shake it off meme remix

Filed under: Humor — Emil O. W. Kirkegaard @ 04:03

Reposting this censored but great video.

  • https://news.avclub.com/gif-artists-shake-up-taylor-swift-s-shake-it-off-1798252599

 

March 18, 2015

International differences in intelligence can be confusing: A commentary on Harrison et al (2015)

Abstract

In this commentary I explain how mean differences between normal distributions give rise to different percentages of the populations being above or below a given threshold, depending on where the threshold is.

Introduction

“Research uncovers flawed IQ scoring system” is the headline on phys.org, which often posts news about research from other fields. It concerns a study by Harrison et al (2015). The researchers have allegedly “uncovered anomalies and issues with the Wechsler Adult Intelligence Scale-Fourth Edition (WAIS-IV), one of the most widely used intelligence tests in the world”. An important discovery, if true. Let’s hear it from the lead researcher:

“Looking at the normal distribution of scores, you’d expect that only about five per cent of the population should get an IQ score of 75 or less,” says Dr. Harrison. “However, while this was true when we scored their tests using the American norms, our findings showed that 21 per cent of college and university students in our sample had an IQ score this low when Canadian norms were used for scoring.”

How can it be? To learn more, we delve into the actual paper titled: Implications for Educational Classification and Psychological Diagnoses Using the Wechsler Adult Intelligence Scale–Fourth Edition With Canadian Versus American Norms.

The paper

First they summarize a few earlier studies on Canada and the US. The Canadians obtained higher raw scores. Of course, this was hypothesized to be due to differences in ethnicity and educational achievement factors. However, this did not quite work out, so Harrison et al decided to investigate it more (they had already done so in 2014). Their method consists of taking the scores from a large mixed sample consisting of healthy people — i.e. with no diagnosis, 11% — and people with various mental disorders (e.g. 53.5% with ADHD), and then scoring this group on both the American and the Canadian norms. What did they find?

Blast! The results were similar to the results from the previous standardization studies! What happened? To find out, Harrison et al do a thorough examination of various subgroups in various ways. No matter which age group they compare, the result won’t go away. They also report the means and Cohen’s d for each subtest and aggregate measure — very helpful. I reproduce their Table 1 below:

Score M (US) SD (US) M (CAN) SD (CAN) p d r
FSIQ 95.5 12.9 88.1 14.4 <.001 0.54 0.99
GAI 98.9 14.7 92.5 16 <.001 0.42 0.99
Index Scores
Verbal Comprehension 97.9 15.1 91.8 16.3 <.001 0.39 0.99
Perceptual Reasoning 99.9 14.1 94.5 15.9 <.001 0.36 0.99
Working Memory 90.5 12.8 83.5 13.8 <.001 0.53 0.99
Processing Speed 95.2 12.9 90.4 14.1 <.001 0.36 0.99
Subtest Scores
Verbal Subtests
Vocabulary 9.9 3.1 8.7 3.3 <.001 0.37 0.99
Similarities 9.7 3 8.5 3.3 <.001 0.38 0.98
Information 9.2 3.1 8.5 3.3 <.001 0.22 0.99
Arithmetic 8.2 2.7 7.4 2.7 <.001 0.3 0.99
Digit Span 8.4 2.5 7.1 2.7 <.001 0.5 0.98
Performance Subtests
Block Design 9.8 3 8.9 3.2 <.001 0.29 0.99
Matrix Reasoning 9.8 2.9 9.1 3.2 <.001 0.23 0.99
Visual Puzzles 10.5 2.9 9.4 3.1 <.001 0.37 0.99
Symbol Search 9.3 2.8 8.5 3 <.001 0.28 0.99
Coding 8.9 2.5 8.2 2.6 <.001 0.27 0.98

 

Sure enough, the scores are lower using the Canadian norms. And very ‘significant’ too. A mystery.

Next, they go on to note how this sometimes changes the classification of individuals into 7 arbitrarily chosen intervals of IQ scores, and how this differs between subtests. They spend a lot of e-ink noting percents about this or that classification. For instance:

“Of interest was the percentage of individuals who would be classified as having a FSIQ below the 10th percentile or who would fall within the IQ range required for diagnosis of ID (e.g., 70 ± 5) when both normative systems were applied to the same raw scores. Using American norms, 13.1% had an IQ of 80 or less, and 4.2% had an IQ of 75 or less. By contrast, when using Canadian norms, 32.3% had an IQ of 80 or less, and 21.2% had an IQ of 75 or less.”

I wonder if some coherent explanation can be found for all these results. In their discussion they ask:

“How is it that selecting Canadian over American norms so markedly lowers the standard scores generated from the identical raw scores? One possible explanation is that more extreme scores occur because the Canadian normative sample is smaller than the American (cf. Kahneman, 2011).”

If the reader was unsure, yes, this is Kahneman’s 2011 book about cognitive biases and dual process theory.

They have more suggestions about the reason:

“One cannot explain this difference simply by saying it is due to the mature students in the sample who completed academic upgrading, as the score differences were most prominent in the youngest cohorts. It is difficult to explain these findings simply as a function of disability status, as all participants were deemed otherwise qualified by these postsecondary institutions (i.e., they had met normal academic requirements for entry into regular postsecondary programs). Furthermore, in Ontario, a diagnosis of LD is given only to students with otherwise normal thinking and reasoning skills, and so students with such a priori diagnosis would have had otherwise average full scale or general abilities scores when tested previously. Performance exaggeration seems an unlikely cause for the findings, as the students’ scores declined only when Canadian norms were applied. Finally, although no one would argue that a subset of disabled students might be functioning below average, it is difficult to believe that almost half of these postsecondary students would fall in this IQ range given that they had graduated from high school with marks high enough to qualify for acceptance into bona fide postsecondary programs. Whatever the cause, our data suggest that one must question both the representativeness of the Canadian normative sample in the younger age ranges and the accuracy of the scores derived when these norms are applied.”

And finally they conclude with a recommendation not to use the Canadian norms for Canadians because this results in lower IQs:

“Overall, our findings suggest a need to examine more carefully the accuracy and applicability of the WAIS-IV Canadian norms when interpreting raw test data obtained from Canadian adults. Using these norms appears to increase the number of young adults identified as intellectually impaired and could decrease the number who qualify for gifted programming or a diagnosis of LD. Until more research is conducted, we strongly recommend that clinicians not use Canadian norms to determine intellectual impairment or disability status. Converting raw scores into Canadian standard scores, as opposed to using American norms, systematically lowers the scores of postsecondary students below the age of 35, as the drop in FSIQ was higher for this group than for older adults. Although we cannot know which derived scores most accurately reflect the intellectual abilities of young Canadian adults, it certainly seems implausible that almost half of postsecondary students have FSIQ scores below the 16th percentile, calling into question the accuracy of all other derived WAIS-IV Canadian scores in the classification of cognitive abilities.”

Are you still wondering what is going on?

Populations with different mean IQs and cut-offs

Harrison et al seem to have inadvertently almost rediscovered the fact that Canadians are smarter than Americans. They don’t quite make it to this point even when faced with obvious and strong evidence (multiple standardization samples). They somehow don’t realize that using the norms from these standardization samples will reproduce the differences found in those samples, and won’t really report anything new.

Their numerous differences in percents reaching this or that cut-off are largely or entirely explained by simple statistics. They have two populations which have an IQ difference of 7.4 points (95.5 – 88.1 from Table 1) or 8.1 points (15 * .54 d from Table 1). Now, if we plot these (I used a difference of 7.5 IQ) and choose some arbitrary cut-offs, like those between arbitrarily chosen intervals, we see something like this:

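[Figure “2pop”: density curves of two normal populations whose means differ by 7.5 IQ, with the chosen cut-offs marked and the ratios between the densities plotted against a second y-axis]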

Except that I cheated and chose all the cut-offs. The brown and the green lines are the ratios between the densities (read off the second y-axis). We see that around 100, they are generally low, but as we get further from the means, they get a lot larger. This simple fact is not generally appreciated. It’s not a new problem: Arthur Jensen spent much of a chapter in his behemoth 1980 book on the topic. He quotes, for instance:

“In the construction trades, new apprentices were 87 percent white and 13 percent black. [Blacks constitute 12 percent of the U.S. population.] For the Federal Civil Service, of those employees above the GS-5 level, 88.5 percent were white, 8.3 percent black, and women account for 30.1 of all civil servants. Finally, a 1969 survey of college teaching positions showed whites with 96.3 percent of all positions. Blacks had 2.2 percent, and women accounted for 19.1 percent. (U.S. Commission on Civil Rights, 1973)”

Sounds familiar? Razib Khan has also written about it. Now, let’s go back to one of the quotes:

“Using American norms, 13.1% had an IQ of 80 or less, and 4.2% had an IQ of 75 or less. By contrast, when using Canadian norms, 32.3% had an IQ of 80 or less, and 21.2% had an IQ of 75 or less. Most notably, only 0.7% (2 individuals) obtained a FSIQ of 70 or less using American norms, whereas 9.7% had IQ scores this low when Canadian norms were used. At the other end of the spectrum, 1.4% of the students had FSIQ scores of 130 or more (gifted) when American norms were used, whereas only 0.3% were this high using Canadian norms.”

We can put these in a table and calculate the ratios:

IQ threshold Percent US Percent CAN US/CAN CAN/US
130 1.4 0.3 4.67 0.21
80 13.1 32.3 0.41 2.47
75 4.2 21.2 0.20 5.05
70 0.7 9.7 0.07 13.86

 

And we can also calculate the expected values based on the two populations (with means of 95.5 and 88) above:

IQ threshold Percent US Percent CAN US/CAN CAN/US
130 1.07 0.26 4.12 0.24
80 15.07 29.69 0.51 1.97
75 8.59 19.31 0.44 2.25
70 4.46 11.51 0.39 2.58

 

This is fairly close, right? The only outlier is the much lower than expected percentage with an IQ of 70 or less using US norms (0.7% observed against 4.46% expected), perhaps a sampling error. But overall, this is a pretty good fit to the data. Perhaps we have our explanation.
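For readers who want to check the expected values in the second table, here is a minimal Python sketch. It assumes both populations are normally distributed with a common SD of 15, which the text does not state explicitly, and it uses the means of 95.5 and 88 given above. The function name is only illustrative.

```python
from statistics import NormalDist

SD = 15.0  # assumption: the usual IQ standard deviation, same in both groups
us = NormalDist(mu=95.5, sigma=SD)   # sample scored on US norms
can = NormalDist(mu=88.0, sigma=SD)  # same sample scored on Canadian norms


def expected_percent(dist, threshold):
    """Percent at or above the threshold for high cut-offs (130),
    percent at or below it for low cut-offs (80, 75, 70)."""
    if threshold >= 100:
        return 100 * (1 - dist.cdf(threshold))
    return 100 * dist.cdf(threshold)


rows = []
for threshold in (130, 80, 75, 70):
    p_us = expected_percent(us, threshold)
    p_can = expected_percent(can, threshold)
    rows.append((threshold, p_us, p_can, p_us / p_can, p_can / p_us))

print("IQ  %US    %CAN   US/CAN CAN/US")
for threshold, p_us, p_can, us_can, can_us in rows:
    print(f"{threshold:>3} {p_us:6.2f} {p_can:6.2f} {us_can:6.2f} {can_us:6.2f}")
```

The percentages should match the table to two decimals. The ratios here are computed from the unrounded percentages, so they can differ slightly from ratios obtained by dividing the rounded table entries.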

What about those (mis)classification values in their Table 2? Well, for similar reasons that I won’t explain in detail, these are simply a function of the difference between the groups in that variable, e.g. Cohen’s d. In fact, if we correlate the d vector and the “% within same classification” we get a correlation of -.95 (-.96 using rank-orders).

MCV analysis

Incidentally, the d values reported in their Table 1 are useful for the method of correlated vectors. In a previous study comparing US and Canadian IQ data, Dutton and Lynn (2014) compared WAIS-IV standardization data. They found a mean difference of .31 d, or 4.65 IQ, which was reduced to 2.1 IQ if the samples were matched on education, ethnicity and sex. An interesting thing was that the difference between the countries was largest on the most g-loaded subtests. When this happens, it is called a Jensen effect (or one says that it has a positive Jensen coefficient, Kirkegaard 2014). The value in their study was .83, which is on the high side (see e.g. te Nijenhuis et al, 2015).

I used the same loadings as used in their study (McFarland, 2013), and found a correlation of .24 (.35 with rank-order), substantially weaker.
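The method of correlated vectors boils down to correlating two vectors across subtests: the group difference (d) and the g-loading. Below is a rough Python sketch of that computation, not the R code from the repository. The subtest d values are copied from Table 1 above; the g-loadings from McFarland (2013) are not reproduced in this post, so the caller has to supply them. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Subtest d values (US vs. Canadian norms) from Table 1 above.
SUBTEST_D = {
    "Vocabulary": 0.37, "Similarities": 0.38, "Information": 0.22,
    "Arithmetic": 0.30, "Digit Span": 0.50, "Block Design": 0.29,
    "Matrix Reasoning": 0.23, "Visual Puzzles": 0.37,
    "Symbol Search": 0.28, "Coding": 0.27,
}


def correlated_vectors(d_by_subtest, loading_by_subtest):
    """Return (Pearson r, rank-order r) over the subtests present in both."""
    names = sorted(set(d_by_subtest) & set(loading_by_subtest))
    if len(names) < 3:
        raise ValueError("need at least three subtests in common")
    d = [d_by_subtest[name] for name in names]
    g = [loading_by_subtest[name] for name in names]
    return pearson(d, g), spearman(d, g)
```

Calling `correlated_vectors(SUBTEST_D, loadings)` with a dict of g-loadings keyed by the same subtest names would give the two coefficients. Note that two subtests share d = 0.37, so the rank-order version depends on how ties are handled; this sketch uses average ranks.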

Supplementary material

The R code and data files can be found in the Open Science Framework repository.

References

  • Harrison, A. G., Holmes, A., Silvestri, R., & Armstrong, I. T. (2015). Implications for Educational Classification and Psychological Diagnoses Using the Wechsler Adult Intelligence Scale–Fourth Edition With Canadian Versus American Norms. Journal of Psychoeducational Assessment. 1-13.
  • Jensen, A. R. (1980). Bias in Mental Testing.
  • Kirkegaard, E. O. (2014). The personal Jensen coefficient does not predict grades beyond its association with g. Open Differential Psychology.
  • McFarland, D. (2013). Model individual subtests of the WAIS IV with multiple latent factors. PLoS ONE, 8(9): e74980. doi:10.1371/journal.pone.0074980
  • te Nijenhuis, J., van den Hoek, M., & Armstrong, E. L. (2015). Spearman’s hypothesis and Amerindians: A meta-analysis. Intelligence, 50, 87-92.

February 11, 2015

Magic pun

Filed under: Humor — Emil O. W. Kirkegaard @ 23:16

best gf

January 22, 2015

Gender distribution of comedians over time

It has been a long time since I did this project. I did not write about it here before, which is a pity since the results are thus not ‘out there’. I put the project page here in 2012 (!). In short, I wrote Python code to crawl Wikipedia lists. I figured out a way to decide whether a person was male or female, using the gendered pronouns that exist in English: the crawler fetches the full text of the article and counts “he”, “his”, “him”, “she” and “her”. It assigns the gender with the most pronouns. This method seems rather reliable in my informal testing.
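The pronoun-counting idea can be sketched in a few lines of Python. This is an illustration of the rule as described, not the original crawler code: fetching the article text is left out, the function name is hypothetical, and returning “other” on a tie is an assumption.

```python
import re
from collections import Counter

MALE_PRONOUNS = {"he", "his", "him"}
FEMALE_PRONOUNS = {"she", "her"}


def guess_gender(article_text):
    """Guess gender from whichever set of pronouns occurs more often."""
    # Split into whole lowercase words so that "the" or "there" do not
    # count as "he" or "her".
    words = re.findall(r"[a-z]+", article_text.lower())
    counts = Counter(words)
    male = sum(counts[w] for w in MALE_PRONOUNS)
    female = sum(counts[w] for w in FEMALE_PRONOUNS)
    if male > female:
        return "male"
    if female > male:
        return "female"
    return "other"  # assumption: ties and pronoun-free texts are unknown


print(guess_gender("He began his career in 1990. Critics praised him."))
```

Counting whole words instead of substrings matters here, since “he” is a substring of many common English words.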

I specifically wrote it to look at comedians because I had read a study of comedians (Greengross et al 2012). They gave personality measures and a vocabulary test (from the Multidimensional Aptitude Battery, r=.62 with WAIS-R) to a sample of 31 comedians and 400 psychology students. The comedians scored 1.34 d above the students. Some care must be taken with this result. The comedians were much older (mean age 38.9 vs. 20.5) and vocabulary raw scores go up with age. The authors do not state that the scores were age-corrected. Psychology students are not very bright and this was a sample from New Mexico with lots of Hispanics. We can safely conclude that comedians are smarter than the student body and the general population of New Mexico, but can’t say exactly by how much. We can hazard a guess at student body (maybe 107 IQ) + age-corrected d (maybe 15 IQ), so we end with an estimate of 122 IQ.

There are various other tables of interest that don’t need much explaining, which I will paste below:

[Tables: comedian_table1, comedian_table2, comedian_table3, comedian_table4]

As of writing this, I found another older study (Janus, 1975). I will just quote:

Method
The data to support the above theses were gathered through psychological case studies, in-depth interviews with many of the leading comedians in the United States today, and psychological tests. In addition to a clinical interview, the instruments used were the Wechsler Adult Intelligence Scale, Machover Human Figure Drawing Test, graphological analysis, earliest memories, and recurring dreams.

Population
Population consisted of 55 professional comedians. In order to be considered in this study, comedians had to be full-time professional stand-up comedians. Most of the subjects earned salaries of six figures or over, from comedy alone. In order to make the sample truly representative, each comedian had to be nationally known and had to have been in the field full time for at least ten years. The average time spent in full-time comedy for the subjects was twenty-five years. The group consisted of fifty-one men and four women. They represented all major religions, many geographic areas, and diverse socioeconomic backgrounds. Comedians were interviewed in New York, California, and points in between. Their socioeconomic backgrounds, family hierarchy, demographic information, religious influences, and analytic material were investigated. Of the population researched, 85 percent came from lower-class homes, 10 percent from lower-middle-class homes, and 5 percent from middle-class and upper-middle-class homes. All subjects participated voluntarily, received no remuneration, and were personally interviewed by the author.

Intelligence
I.Q. scores ranged from 115 to 160+. For a population at large, I.Q. scores in the average range are from 90 to 110. I.Q. scores in the bright-average range of intelligence, that is, from 108 to 115, were scored by only three subjects. The remainder scored above 125, with the mean score being 138. The vocabulary subtest was utilized. Several subjects approached it as a word-association test, but all regarded it as a challenge. Since these are verbal people, they were highly motivated. The problem was not one of getting them to respond, it was one of continuously allaying their anxiety, and reassuring them that they were indeed doing well.

So, a very high mean was found. WAIS was published in 1955, so there are approximately 20 years of Flynn gains in raw scores, presumably uncorrected for. According to a new meta-analysis of Flynn gains (Trahan et al 2014), the mean gain is 2.31 IQ per decade. So we are assuming a gain of about 4.6 IQ here. But then again, the verbal test for the students was published in 1984, so there may be some gain there as well (Flynn effects have supposedly slowed down recently in Western countries). Perhaps a net gain in favor of the old study of 4 IQ. In that case, we get estimates of 134 and 122. With samples of 55 and 31, different subtests, sampling procedures etc., this is surely reasonable. We can take a weighted mean and say the best estimate for professional comedians is about 129.7, or about +2SD. It seems a bit wild: are comedians really on average as smart as physicists?

EDIT: There is another study by Janus (1978). Same test:

[N=14] Intelligence scores ranged from 112 to 144 plus. (The range of average IQ is from 90 to 110.) Four subjects scored in the bright average range–i.e., 108 to 115. The remaining subjects scored above 118 with a mean score of 126. Two subjects scored above 130. The mean score for male comics was 138. The subjects approached the testing with overenthusiasm, in some cases bordering on frenzy. Despite the brightness of the group, all subjects needed constant reassurance and positive feedback.

So 126, minus ~5 IQ for the Flynn effect. The new weighted mean is 128.5 IQ.
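The arithmetic behind the two weighted means can be laid out explicitly. The sketch below weights each study by its sample size and assumes the Flynn adjustments are subtracted from the reported means (4 IQ for Janus 1975, 5 IQ for Janus et al 1978), which is how the figures above work out. Variable names are illustrative.

```python
def weighted_mean(estimates):
    """Sample-size-weighted mean of (n, iq_estimate) pairs."""
    total_n = sum(n for n, _ in estimates)
    return sum(n * iq for n, iq in estimates) / total_n


greengross_2012 = (31, 122.0)     # rough guess from the student comparison
janus_1975 = (55, 138.0 - 4.0)    # reported mean minus net Flynn adjustment
janus_1978 = (14, 126.0 - 5.0)    # reported mean minus Flynn adjustment

two_studies = weighted_mean([greengross_2012, janus_1975])
three_studies = weighted_mean([greengross_2012, janus_1975, janus_1978])

print(round(two_studies, 1))    # 129.7
print(round(three_studies, 1))  # 128.5
```

Weighting by N is the simplest choice; it ignores the differences in tests and sampling between the studies.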

Perhaps we should test it. If you want to test it with me, write me an email/tweet. We will design a questionnaire and give it to your local sample of comedians. One can e.g. try to convince professional comedian organizations (e.g. Danish here, N=35) to forward it to their members.

So what did I find?

I did the scraping twice: first in 2012, and then again when I was reminded of the project in May 2014. Now I have been reminded of it again. The very basic stat is that 1106 comedians were found, of which the gender distribution was this (the “other” is unknown gender, which was 1 person).

What about the change over time? The code fetches their birth year if mentioned on their Wikipedia page. Then I limited the data to US comedians (66% of the sample). This was done because if we are looking for ways to explain it, we need to restrict ourselves to some more homogenous subset. What explains the change in gender distribution in Saudi Arabia at time t1 may not also explain it in Japan.

Next we get a common scientific conflict of interest: that between precision of estimate and detail. Essentially what we need is a moving average since most or all years have too few comedians for a reliable estimate (very zigzaggy lines on the plot). So we must decide how large a moving average window to use. A larger window will give more precision in the estimate, but less detail. I decided to try a few different options (5, 10, 15, 20). To avoid extreme zigzagginess, I only plotted them if there were >=20 persons in the interval. The plots look like this:




So in general we see a decline in the proportion of male comedians. But it is not going straight down. There is a local minimum in 1960 or so, and a local maximum in 1980 or so. How to explain these?
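For anyone wanting to redo the curves, the moving-window estimate could be computed along these lines. This is a sketch under assumptions: the window is taken to be a run of consecutive birth years labelled by its first year, persons of unknown gender are dropped, and the function name and record format are hypothetical rather than taken from the original code.

```python
def windowed_male_share(records, window, min_count=20):
    """Share of males per window of consecutive birth years.

    records: iterable of (birth_year, gender) pairs.
    Returns a list of (start_year, male_share, n), skipping windows
    with fewer than min_count persons.
    """
    by_year = {}  # birth_year -> (number of males, number of persons)
    for year, gender in records:
        if gender not in ("male", "female"):
            continue  # unknown gender is left out
        males, total = by_year.get(year, (0, 0))
        by_year[year] = (males + (gender == "male"), total + 1)
    if not by_year:
        return []

    result = []
    first, last = min(by_year), max(by_year)
    for start in range(first, last - window + 2):
        males = total = 0
        for year in range(start, start + window):
            m, t = by_year.get(year, (0, 0))
            males += m
            total += t
        if total >= min_count:
            result.append((start, males / total, total))
    return result


# Tiny made-up example, only to show the call and the output format.
toy = [(1950, "male")] * 15 + [(1951, "female")] * 5 + [(1952, "male")]
print(windowed_male_share(toy, window=2))
```

Running it with windows of 5, 10, 15 and 20 on the same records gives the four variants described above; the `min_count` filter is what removes the sparsely populated intervals.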

I tried abortion rate (not much data before 1973) and total fertility rate (plenty of data) but was not convinced by the results. One can also inflate or deflate the numbers according to which moving interval one chooses. One can even try all the possible sizes of intervals and the delays to see which gives the best match. I did some of this semi-manually using spreadsheets, but it has a very high chance of overfitting. One would need to do some programming to try all of them in a reasonable time.

I wrote some of this stuff in a paper, but never finished it. It can now be found at its OSF repository.

Datasets

Newer dataset from May 2014.

Older dataset dated to 2012.

Python code. This includes the code used to crawl Wikipedia and quite a lot of other raw data output files.

References

Greengross, G., Martin, R. A., & Miller, G. (2012). Personality traits, intelligence, humor styles, and humor production ability of professional stand-up comedians compared to college students. Psychology of Aesthetics, Creativity, and the Arts, 6(1), 74.

Janus, S. S. (1975). The great comedians: Personality and other factors. The American Journal of Psychoanalysis, 35(2), 169-174.

Janus, S. S., Bess, B. E., & Janus, B. R. (1978). The great comediennes: Personality and other factors. The American Journal of Psychoanalysis, 38(4), 367-372.

Trahan, L. H., Stuebing, K. K., Fletcher, J. M., & Hiscock, M. (2014). The Flynn effect: A meta-analysis.

January 21, 2015

True bliss

Filed under: Humor — Emil O. W. Kirkegaard @ 02:43

victory is mine

August 16, 2014

Pun #4923

Filed under: Humor — Emil O. W. Kirkegaard @ 05:35

If a person is waiting to be treated at a hospital and he complains about waiting too long… is he being impatient?

[14:40:05] Emil – Deleet: is it funny to talk about a sex division of labor?
[14:40:39] Emil – Deleet: meaning #1: Effort expended on a particular task; toil, work.
meaning #2: The act of a mother giving birth.

 

February 21, 2013

Pyrrho on selfishness

Filed under: Ethics,Humor,Multilogues — Emil O. W. Kirkegaard @ 04:04

I was looking for something else.. and found this instead… From here: http://able2know.org/topic/151812-1

LittleMathYou

I think a 9 year old killing himself is a good representation of suicide as a whole. Selfish,short-sighted, and always looking for an escape. Although it seems insensitive for me to say this, I think we just need to realize that those things are apart of deciding to end your life.

Pyrrho

It is funny that people who stay alive because they want to, call people “selfish” who kill themselves because they want to. (It is like those who have children because they want children calling childless couples selfish for not having children because they don’t want to have children.)

And it is absurd to call a solution to all of life’s problems forever a “short-sighted” solution. The 9 year old could not possibly have come up with any other solution to his problems that would have been so complete and long lasting.

 

January 13, 2013

A funny review of Plantinga’s new book

Filed under: Humor,Religion — Emil O. W. Kirkegaard @ 16:57

A friend of mine sent me this. Start from page 21.

sept-oct2012

source

January 7, 2013

Pununun

Filed under: Humor,Multilogues — Emil O. W. Kirkegaard @ 16:49

[16:43:51] Emil – Deleet: how cool is this :D
http://www.khanacademy.org/cs/monty-hall-simulation/1121357698
[16:45:19] Jens Arhøj – Strawb: Cool
[16:46:35] Emil – Deleet: i love statistics :D
[16:47:04] Jens Arhøj – Strawb: What are the odds of that?
[16:47:09] Emil – Deleet: winrar

December 18, 2012

Punstuff

Filed under: Humor,Multilogues — Emil O. W. Kirkegaard @ 14:09

[11:19:25] Jens Arhøj – Strawb: Tfw reading Game of Thrones
>The author doesn’t use commas as often as I would like
>I really pick the wrong things to focus on
[11:19:34] Emil – Deleet: yes
[11:19:36 | Edited 11:19:52] Emil – Deleet: commas, suck
[11:19:55 | Edited 11:20:06] Jens Arhøj – Strawb: Damn, you

 
