The WORDSUM question

I tweeted about this already, but I’m dumping some notes here for future reference.

The Wordsum is a 10-item vocabulary test that's been used for decades as a brief measure of intelligence. It's most prominently used in the General Social Survey (GSS), a recurrent US survey of various social matters that's been running since 1972. It's also used in the American National Election Studies (ANES), a similar but more politically focused recurrent survey that has run since 1977. These surveys have been used a lot to relate intelligence to various outcomes, not just political opinions, but stuff like fertility and group differences. There are something like 280 uses of the term “Wordsum” in the literature, and probably more studies than that, since many don't use this exact term. Wordsum is taken to be a surprisingly good measure of general intelligence, and the citation everybody gives goes like this (Razib Khan 2010):

Every time I use the WORDSUM variable from the GSS people will complain that a score on a 10-question vocabulary test is not a good measure of intelligence. The reality is that “good” is too imprecise a term. The correlation between adult IQ and WORDSUM = 0.71. The source for this number is a 1980 paper, The Enduring Effects of Education on Verbal Skills. I’ve reproduced the relevant table…

Estimated Correlations for Variables in a Model of Enduring Effects of Education for White, Native-Born People 25 to 72 Years Old in the Contemporary [1970s] United States

|               | Age | Sex   | Father's Educ | Father's SEI | Educ   | Adult IQ | WORDSUM |
|---------------|-----|-------|---------------|--------------|--------|----------|---------|
| Child IQ      | 0   | 0     | 0.31          | 0.30         | 0.51   | 0.80     |         |
| Age           |     | 0.026 | -0.304        | -0.130       | -0.304 | -0.42    | -0.005  |
| Sex           |     |       | -0.054        | 0.058        | 0.050  | 0        | -0.121  |
| Father's Educ |     |       |               | 0.488        | 0.469  | 0.30     | 0.302   |
| Father's SEI  |     |       |               |              | 0.347  | 0.31     | 0.285   |
| Educ          |     |       |               |              |        | 0.66     | 0.511   |
| Adult IQ      |     |       |               |              |        |          | 0.71    |

Obviously since the WORDSUM test was not given to those under 18 you can’t calculate the correlation between childhood IQ and WORDSUM score. Additionally, I suspect since 1980 there’s been a bit more cognitive stratification by education. I notice in the GSS sample that there are many older people, especially women, who have high WORDSUM scores but no college education. In the younger age cohorts this pattern is not as evident because if you are intelligent the probability is much higher that you’ll obtain a university education.

A correlation of 0.71 is not mind-blowing, there’s a significant difference between IQ and WORDSUM as they relate to each other linearly. But I think it’s good enough to get a sense that WORDSUM is a serviceable substitute for a more rigorous measure of g in lieu of any alternatives, and not so clumsy a proxy so as to be useless. Though that call is up to you, and readers are free to disagree with the methodology of the model used to obtain this correlation. Additionally, I would point out that WORDSUM is a subset of the vocabulary subsection of the Wechsler Adult Intelligence Scale. WORDSUM is in effect a slice of an IQ test.

The source for this correlation is a rather obscure paper:

Unfortunately, it’s not actually the original source:

Source goes to (h), which is:

  • Thorndike, Robert L. 1967. Vocabulary Test G-T: Directions and Norms. New York: Institute of Psychological Research, mimeograph.

I couldn’t find a copy of this anywhere, so who knows what it says. However, Gavan Tredoux points out:

Following the chain of references (I actually have a copy of Miner 1955) it looks like that WORDSUM/IQ correlation really dates from the 1940s ref. sample. Guess people haven’t dug into it because of the strong relationship between vocab. and IQ. In short, g. But they should.

And indeed, I recalled that I had previously read another paper that reviewed the Wordsum (posted about that here), which noted:

I first examined the data of Stewart and of Harrell and Harrell several years ago, long before the specter of IQ was raised by Herrnstein and Murray, and it occurred to me to wonder whether other data, not subject to the selection and truncation of the scores for enlisted men in the Armed Forces, would show the same pattern of variability of test scores across occupations. I looked first at variation in verbal ability among occupation groups of American adults interviewed in the NORC General Social Survey from 1974 to 1989. In almost every year, the entire GSS sample or a large, randomly selected fraction of it, was administered a 10-item vocabulary test, WORDSUM, which was selected from items originally constructed for a standard IQ test. The ten GSS vocabulary items were chosen from “Form A,” one of two parallel, twenty-item vocabulary tests selected by Thorndike. Each form contained two vocabulary test items from each of the levels of the vocabulary section of the Institute for Educational Research Intelligence Scale: Completion, Arithmetic Problems, Vocabulary, and Directions (Thorndike 1942). Form A was developed by Thorndike in response to the need for a very brief test of intelligence in a social survey (Thorndike and Gallup 1944), and it was also used in an attempt to study the feasibility of an aptitude census (Thorndike and Hagen 1952). Form A was later used by Miner (1957) in his monograph, Intelligence in the United States, which attempted to assess the intellectual ability of the U.S. population using a national household sample survey.

So, there we have it: all these studies apparently rely on a single value from the 1940s, produced by one of the original test developers. Seems crazy, but that appears to be the case. What needs to be done is a new norming of the Wordsum items against a test with known reliability and a known relation to g. This does not mean it has to be the WAIS or something like it; one could administer the ICAR16 online and adjust the values using that test's known properties (see the reference paper on the ICAR); a sketch of the adjustment arithmetic is given below. Please contact me if you are interested in working on this. I can provide funding for a large online survey, but I'm looking for someone to do the questionnaire design and setup.
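To make the adjustment arithmetic concrete, here is a minimal sketch in Python. All the numbers are placeholders I made up for illustration: the real ICAR16 reliability and g loading would come from the ICAR reference paper, and the Wordsum–ICAR16 correlation from the new survey.

```python
import math

# Placeholder inputs, all hypothetical: substitute the observed
# Wordsum x ICAR16 correlation from the new survey, and the ICAR16
# psychometrics reported in its reference paper.
r_wordsum_icar = 0.55    # observed survey correlation (made up)
icar_reliability = 0.81  # ICAR16 reliability (placeholder)
icar_g_loading = 0.80    # ICAR16's correlation with g (placeholder)

# Correct the observed correlation for the ICAR16's measurement error
# (classical disattenuation). Wordsum is deliberately left uncorrected,
# since what we want to norm is the raw 10-item score GSS studies use.
r_corrected = r_wordsum_icar / math.sqrt(icar_reliability)

# If Wordsum relates to the ICAR16 only through g (a single common
# factor assumption), divide out the ICAR16's own g loading.
r_wordsum_g = r_corrected / icar_g_loading

print(f"Estimated Wordsum-g correlation: {r_wordsum_g:.2f}")  # ~0.76 here
```

The point is just that the 1940s figure can be checked with one modern sample and a couple of lines of arithmetic; the hard part is collecting the data.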

Here it should be noted that I did find the Wordsum psychometrics review article:

This paper does not actually discuss the correlation with full-scale IQ/general intelligence; it mostly centers on the use of IRT scoring versus just summing the items. In my own testing, I've found that IRT scoring does produce slightly better estimates, which correlate about 5% more strongly with third variables, but it's a minor thing one can mostly disregard.
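To show what that difference amounts to, here is a small simulation sketch in Python. The 2PL item parameters are made up, not the actual Wordsum estimates; it generates responses to ten binary items, scores people by raw sum and by IRT ability estimate, and compares both with the true latent ability.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
n_people, n_items = 2000, 10

# Made-up 2PL item parameters: discrimination (a) and difficulty (b).
a = rng.uniform(0.8, 2.0, n_items)
b = rng.uniform(-2.0, 1.5, n_items)

theta = rng.standard_normal(n_people)              # true latent ability
p = 1 / (1 + np.exp(-a * (theta[:, None] - b)))    # 2PL success probabilities
responses = (rng.random((n_people, n_items)) < p).astype(int)

def irt_score(resp):
    """MAP ability estimate for one person, treating item parameters as
    known, with a standard normal prior to keep 0/10 scores finite."""
    def neg_log_posterior(t):
        pr = 1 / (1 + np.exp(-a * (t - b)))
        loglik = np.sum(resp * np.log(pr) + (1 - resp) * np.log(1 - pr))
        return -(loglik - 0.5 * t ** 2)
    return minimize_scalar(neg_log_posterior, bounds=(-4, 4), method="bounded").x

sum_scores = responses.sum(axis=1)
irt_scores = np.array([irt_score(r) for r in responses])

print("r(sum score, ability):", round(np.corrcoef(sum_scores, theta)[0, 1], 3))
print("r(IRT score, ability):", round(np.corrcoef(irt_scores, theta)[0, 1], 3))
```

In runs like this the IRT estimate usually edges out the raw sum by a point or two of correlation with true ability, which matches the small gains I mention above.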
