Book review Psychiatry Psychology Science

Genius: The Natural History of Creativity (Hans Eysenck, 1995)

I continue my Eysenck readings with his popular genius book (prior review: The Psychology of Politics (1954)). Having previously read some of Simonton’s work, I find Eysenck a very different beast! The writing follows his usual style: candid, emphatic about uncertainty where it exists, funny, and very wide-ranging. In fact, regarding replication, Eysenck is almost modern, always asking for replications of experiments, and saying that it is a waste of time to do studies with n < 100!

I don’t have time to write a big review, but I have marked a bunch of interesting passages, and I will quote them here. Before doing so, however, the reader should know that there is now a memorial site for Hans Eysenck too, with free copies of his work. It’s not complete yet; his bibliography is massive! I host it, but it was created by a third party.

Let’s begin. Actually, I forgot to note interesting passages in the first half of the book, so these are all from the second part. Eysenck discusses the role of the environment in the origins of genius, and illustrates with an unlikely case:

Our hero was born during the American Civil War, son of Mary, a Negro slave on a Missouri farm owned by Moses Carver and his wife Susan. Mary, who was a widow, had two other children – Melissa, a young girl, and a boy, Jim; George was the baby. In 1862 masked bandits who terrorized the countryside and stole livestock and slaves attacked Carver’s farm, tortured him and tried to make him tell where his slaves were hidden; he refused to tell. After a few weeks they came back, and this time Mary did not have time to hide in a cave, as she had done the first time; the raiders dragged her, Melissa and George away into the bitter cold winter’s night. Moses Carver had them followed, but only George was brought back; the raiders had given him away to some womenfolk saying ‘he ain’t worth nutting’. Carver’s wife Susan nursed him through every conceivable childhood disease that his small frame seemed to be particularly prone to; but his traumatic experiences had brought on a severe stammer which she couldn’t cure. He was called Carver’s George; his true name (if such a concept had any meaning for a slave) is not known. When the war ended the slaves were freed, but George and Jim stayed with the Carvers. Jim was sturdy enough to become a shepherd and to do other farm chores; George was a weakling and helped around the house. His favourite recreation was to steal off to the woods and watch insects, study flowers, and become acquainted with nature. He had no schooling of any kind, but he learned to tend flowers and became an expert gardener. He was quite old when he saw his first picture, in a neighbour’s house; he went home enchanted, made some paint by squeezing out the dark juices of some berries, and started drawing on a rock. He kept on experimenting with drawings, using sharp stones to scratch lines on the smooth pieces of earth. He became known as the ‘plant doctor’ in the neighbourhood, although still only young, and helped everyone with their gardens.

At some distance from the farm there was a one-room cabin that was used as a school house during the week; it doubled as a church on Sundays. When George discovered its existence, he asked Moses Carver for permission to go there, but was told that no Negroes were allowed to go to that school. George overcame his shock at this news after a while; Susan Carver discovered an old spelling-book, and with her help he soon learned to read and write. Then he discovered that at Neosho, eight miles away, there was a school that would admit Negro children. Small, thin and still with his dreadful stammer, he set out for Neosho, determined to earn some money to support himself there. Just 14 years old, he made his home with a coloured midwife and washerwoman. ‘That boy told me he came to Neosho to find out what made hail and snow, and whether a person could change the colour of a flower by changing the seed. I told him he’d never find that out in Neosho. Maybe not even in Kansas City. But all the time I knew he’d find it out – somewhere.’ Thus Maria, the washerwoman; she also told him to call himself George Carver – he just couldn’t go on calling himself Carver’s George! By that name, he entered the tumbledown shack that was the Lincoln School for Coloured Children, with a young Negro teacher as its only staff member. The story of his fight for education against a hostile environment is too long to be told here; largely self-educated he finally obtained his Bachelor of Science degree at the age of 32, specialized in mycology (the study of fungus growths) became an authority in his subject, and finally accepted an invitation from Booker T. Washington, the foremost Negro leader of his day, to help him fund a Negro university. He accepted, and his heroic struggles to create an institute out of literally nothing are part of Negro history. 
He changed the agricultural and the eating habits of the South; he created single-handed a pattern of growing food, harvesting and cooking it which was to lift Negroes (and whites too!) out of the abject state of poverty and hunger to which they had been condemned by their own ignorance. And in addition to all his practical and teaching work, administration and speech-making, he had time to do creative and indeed fundamental research; he was one of the first scientists to work in the field of synthetics, and is credited with creating the science of chemurgy – ‘agricultural chemistry’. The American peanut industry is based on his work; today this is America’s sixth most important agricultural product, with many hundreds of by-products. He became more and more obsessed with the vision that out of agriculture and industrial waste useful material could be created, and this entirely original idea is widely believed to have been Carver’s most important contribution. The number of his discoveries and inventions is legion; in his field, he was as productive as Edison. He could have become a millionaire many times over but he never accepted money for his discoveries. Nor would he accept an increase in his salary, which remained at the 125 dollars a month (£100 per year) which Washington had originally offered him. (He once declined an offer by Edison to work with him at a minimum annual salary of 100000 dollars.) He finally died, over 80, in 1943. His death was mourned all over the United States. The New York Herald Tribune wrote: ‘Dr. Carver was, as everyone knows, a Negro. But he triumphed over every obstacle. Perhaps there is no one in this century whose example has done more to promote a better understanding between the races. Such greatness partakes of the eternal.’ He himself was never bitter, in spite of all the persecutions he and his fellow-Negroes had to endure. ‘No man can drag me down so low as to make me hate him.’ This was the epitaph on his grave.
He could have added fortune to fame, but caring for neither, he found happiness and honour in being helpful to the world.

On Simonton’s model of creativity:

Simonton’s own theory is interesting but lacks any conceivable psychological support. He calls his theory a ‘two-step’ model; he postulates that each individual creator begins with a certain ‘creative potential’ defined by the total number of contributions the creator would be capable of producing in an unrestricted life span. (Rather like a woman’s supply of ova!) There are presumably individual differences in this initial creative potential, which Simonton hardly mentions in the development of his theory. Now each creator is supposed to use up his supply of creative potential by transforming potential into actual contributions. (There is an obvious parallel here with potential energy in physics.) This translation of creative potential into actual creative products implies two steps. The first involves the conversion of creative potential into creative ideation; in the second step these ideas are worked into actual creative contributions in a form that can be appreciated publicly (elaboration). It is further assumed that the rate at which ideas are produced is proportional to the creative potential at a given time, and that the rate of elaboration is ‘proportional to the number of ideas in the works’ (Simonton, 1984b; p. 110). Simonton turns these ideas into a formula which generates a curve which gives a correlation between predicted and observed values in the upper 90s (Simonton, 1983b). The theory is inviting, but essentially untestable – how would we measure the ‘creative potential’ which presumably is entirely innate, and should exist from birth? How could we measure ideation, or elaboration, independently of external events? Of course the curve fits observations beautifully, but then all the constants are chosen to make sure of such a fit! Given the general shape of the curve (inverted U), many formulae could be produced to give such a fit.
Unless we are shown ways of independently measuring the variables involved, no proper test of any underlying psychological theory exists.
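As an aside, the two-step mechanism is easy to simulate, which also shows why the curve fit proves so little: nearly any pair of conversion rates yields the inverted U. A minimal sketch of the model as Eysenck describes it (all parameter values are my own illustrative choices, not Simonton’s):

```python
# Minimal sketch of Simonton's two-step model as described above:
# creative potential -> ideation -> elaborated contributions.
# Parameter values (a, b, initial potential) are illustrative only.

def two_step(potential=100.0, a=0.04, b=0.05, years=80, dt=1.0):
    """Euler integration; returns the contribution rate per year."""
    ideas = 0.0
    rates = []
    for _ in range(int(years / dt)):
        ideation = a * potential          # potential converted into ideas
        elaboration = b * ideas           # ideas worked into products
        potential -= ideation * dt
        ideas += (ideation - elaboration) * dt
        rates.append(elaboration)         # publication rate at this career age
    return rates

rates = two_step()
peak_year = rates.index(max(rates))
print(peak_year)  # output rises, peaks in mid-career, then declines
```

The closed-form solution is a difference of exponentials, which rises, peaks and declines regardless of the particular constants chosen – exactly Eysenck’s point that an inverted-U fit, by itself, tells us little.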

On geniuses’ misbehavior, featuring Newton:

Less often remarked, but possibly even more insidious, is the resistance by scientists to ‘scientific discovery’, as Barber (1961) has named this phenomenon. As he points out, in two systematic analyses of the social process of scientific discovery and invention, analyses which tried to be as inclusive of empirical facts and theoretical problems as possible, there was only one passing reference to such resistance in the one instance and none at all in the second (Gilfillan, 1935; Barber, 1952). This contrasts markedly with the attention paid to the resistance to scientific discovery on the part of economic, technological, religious and ideological elements and groups outside science itself (Frank, 1957; Rossman, 1931; Shyrock, 1936; Stamp, 1937). This neglect is probably based on the erroneous notion embodied in the title of Oppenheimer’s (1955) book The Open Mind; we assume all too readily that objectivity is the characteristic of the scientist, and that he will impartially consider all the available facts and theories. Polanyi (1958, 1966) has emphasized the importance of the personality of the scientist, and no one familiar with the history of science can doubt that individual scientists are as emotional, jealous, quirky, self-centred, excitable, temperamental, ardent, enthusiastic, fervent, impassioned, zealous and hostile to competition as anyone else. The incredibly bellicose, malevolent and rancorous behaviour of scientists engaged in disputes about priority illustrates the truth of this statement. The treatment handed out to Halton Arp (1987), who dared to doubt the cosmological postulate about the meaning and interpretation of the red-shift, is well worth pondering (Flanders, 1993). Objectivity flies out of the window when self-interest enters (Hagstrom, 1974).

The most famous example of a priority dispute is that between Newton and Leibnitz, concerning the invention of the calculus (Manuel, 1968). The two protagonists did not engage in the debate personally, but used proxies, hangers-on who would use their vituperative talents to the utmost in the service of their masters. Newton in particular abused his powers as President of the Royal Society in a completely unethical manner. He nominated his friends and supporters to a theoretically neutral commission of the Royal Society to consider the dispute; he wrote the report himself, carefully keeping his own name out of it, and he personally persecuted Leibnitz beyond the grave, insisting that he had plagiarized Newton’s discovery – which clearly was untrue, as posterity has found. Neither scientist emerges with any credit from the Machiavellian controversy, marred by constant untruths, innuendos of a personal nature, insults, and outrageous abuse which completely obscured the facts of the case. Newton behaved similarly towards Robert Hooke, Locke, Flamsteed and many others; as Manuel (1968) says: ‘Newton was aware of the mighty anger that smouldered within him all his life, eternally seeking objects. … many were the times when (his censor) was overwhelmed and the rage could not be contained’ (p. 343). ‘Even if allowances are made for the general truculence of scientists and learned men, he remains one of the most ferocious practitioners of the art of scientific controversy. Genteel concepts of fair play are conspicuously absent, and he never gave any quarter’ (p. 345). So much for scientific objectivity!

More Newton!

Once a theory has been widely accepted, it is difficult to displace, even though the evidence against it may be overwhelming. Kuhn (1957) points out that even after the publication of De Revolutionibus most astronomers retained their belief in the central position of the earth; even Brahe (Thoren, 1990), whose observations were accurate enough to enable Kepler (Caspar, 1959) to determine that the Mars orbit around the sun was elliptical, not circular, could not bring himself to accept the heliocentric view. Thomas Young proposed a wave theory of light on the basis of good experimental evidence, but because of the prestige of Newton, who of course favoured a corpuscular view, no-one accepted Young’s theory (Gillespie, 1960). Indeed, Young was afraid to publish the theory under his own name, in case his medical practice might suffer from his opposition to the god-like Newton! Similarly, William Harvey’s theory of the circulation of the blood was poorly received, in spite of his prestigious position as the King’s physician, and harmed his career (Keele, 1965). Pasteur too was hounded because his discovery of the biological character of the fermentation process was found unacceptable. Liebig and many others defended the chemical theory of these processes long after the evidence in favour of Pasteur was conclusive (Dubos, 1950). Equally his micro-organism theory of disease caused endless strife and criticism. Lister’s theory of antisepsis (Fisher, 1977) was also long argued over, and considered absurd; so were the contributions of Koch (Brock, 1988) and Ehrlich (Marquardt, 1949). Priestley (Gibbs, 1965) retained his views of phlogiston as the active principle in burning, and together with many others opposed the modern theories of Lavoisier, with considerable violence.
Alexander Maconochie’s very successful elaboration and application of what would now be called ‘Skinnerian principles’ to the reclamation of convicted criminals in Australia led to his dismissal (Barry, 1958).

But today is different! Or maybe not:

The story is characteristic in many ways, but it would be quite wrong to imagine that this is the sort of thing that happened in ancient, far-off days, and that nowadays scientists behave in a different manner. Nothing has changed, and I have elsewhere described the fates of modern Lochinvars who fought against orthodoxy and were made to suffer mercilessly (Eysenck, 1990a). The battle against orthodoxy is endless, and there is no chivalry; if power corrupts (as it surely does!), the absolute power of the orthodoxy in science corrupts absolutely (well, almost!). It is odd that books on genius seldom if ever mention this terrible battle that originality so often has when confronting orthodoxy. This fact certainly accounts for some of the personality traits so often found in genius, or even the unusually creative non-genius. The mute, inglorious Milton is a contradiction in terms, an oxymoron; your typical genius is a fighter, and the term ‘genius’ is by definition accorded the creative spirit who ultimately (often long after his death) wins through. An unrecognized genius is meaningless; success socially defined is a necessary ingredient. Recognition may of course be long delayed; the contribution of Green (Connell, 1993) is a good example.

On fraud in science, after discussing Newton’s fudging of data, and summarizing Kepler’s:

It is certainly startling to find an absence of essential computational details because ‘taediosum esset’ to give them. But worse is to follow. Donahue makes it clear that Kepler presented theoretical deduction as computations based upon observation. He appears to have argued that induction does not suffice to generate true theories, and to have substituted for actual observations figures deduced from the theory. This is historically interesting in throwing much light on the origins of scientific theories, but is certainly not a procedure recommended to experimental psychologists by their teachers!

Many people have difficulties in understanding how a scientist can fraudulently ‘fudge’ his data in this fashion. The line of descent seems fairly clear. Scientists have extremely high motivation to succeed in discovering the truth; their finest and most original discoveries are rejected by the vulgar mediocrities filling the ranks of orthodoxy. They are convinced that they have found the right answer; Newton believed it had been vouchsafed him by God, who explicitly wanted him to preach the gospel of divine truth. The figures don’t quite fit, so why not fudge them a little bit to confound the infidels and unbelievers? Usually the genius is right, of course, and we may in retrospect excuse his childish games, but clearly this cannot be regarded as a licence for non-geniuses to foist their absurd beliefs on us. Freud is a good example of someone who improved his clinical findings with little regard for facts (Eysenck, 1990b), as many historians have demonstrated. Quod licet Jovi non licet bovi – what is permitted to Jupiter is not allowed the cow!

One further point. Scientists, as we shall see, tend to be introverted, and introverts show a particular pattern of level of aspiration (Eysenck, 1947) – it tends to be high and rigid. That means a strong reluctance to give up, to relinquish a theory, to acknowledge defeat. That, of course, is precisely the pattern shown by so many geniuses, fighting against external foes and internal problems. If they are right, they are persistent; if wrong, obstinate. As usual the final result sanctifies the whole operation (fudging included); it is the winners who write the history books!

The historical examples would seem to establish the importance of motivational and volitional factors, leading to persistence in opposition against a hostile world, and sometimes even to fraud when all else fails. Those whom the establishment refuses to recognize appropriately fight back as best they can; they should not be judged entirely by the standards of the mediocre!

This example, and the notes about double standards for genius, are all the more interesting in light of the recent problems with Eysenck’s own studies, published with yet another maverick!

And now, to sunspots and genius:

Ertel used recorded sun-spot activity going back to 1600 or so, and before that inferred it from the radiocarbon isotope C14, whose production, as recorded in trees, gives an accurate picture of sun-spot activity. Plotted in relation to sun-spot activity were historical events, either wars, revolutions, etc. or specific achievements in painting, drama, poetry, science and philosophy. Note that Ertel’s investigations resemble a ‘double blind’ paradigm, in that the people who determined the solar cycle, and those who judged the merits of the artists and scientists in question, were ignorant of the purpose to which Ertel would put their data, and did not know anything about the theories involved. Hence the procedure is completely objective, and owes nothing to Ertel’s views, which in any case were highly critical of Chizhevsky’s ideas at the beginning.

The irregularities of the solar cycle present difficulties to the investigator, but also, as we shall see, great advantages. One way around this problem was suggested by Ertel; it consists of looking at each cycle separately; maximum solar activity (sol. max.) is denoted 0, and the years preceding or succeeding 0 are marked −1, −2, −3 etc., or +1, +2, +3 etc. Fig. 4.2 shows the occurrence of 17 conflicts between socialist states from 1945 to 1982, taken from a table published by the historian Bebeler, i.e. chosen in ignorance of the theory. In the figure the solid circles denote the actual distribution of events, the empty circles the expected distribution on a chance basis. Agreement with theory is obvious, 13 of the 17 events occurring between −1 and +1 solar maximum (Ertel, 1992a,b).
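The ‘13 of 17 within a year of solar maximum’ figure is easy to sanity-check. Under a naive null model (my assumption, not necessarily Ertel’s exact procedure) in which events fall uniformly over an 11-year cycle, a 3-year window covers p = 3/11 of it, and the binomial tail probability is tiny:

```python
from math import comb

# Binomial tail check: 13 of 17 events within one year of a solar maximum.
# Null model (an illustrative assumption): events uniform over an 11-year
# cycle, so the 3-year window (-1, 0, +1) has probability p = 3/11.
n, k, p = 17, 13, 3 / 11
tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))
print(f"P(X >= {k}) = {tail:.1e}")  # far below conventional thresholds
```

Under that null the clustering is wildly improbable, which is presumably why Ertel found it striking; the real question is whether the event lists and cycle datings are as theory-blind as claimed.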

Actually Ertel bases himself on a much broader historical perspective, having amassed 1756 revolutionary events from all around the world, collected from 22 historical compendia covering the times from 1700 to the present. There appears good evidence in favour of Chizhevsky’s original hypothesis. However, in this book we are more concerned with Ertel’s extension to cultural events, i.e. the view that art and science prosper most when solar activity is at a minimum. Following his procedure regarding revolutionary events, Ertel built up a data bank concerned with scientific discoveries. Fig. 4.3 shows the outcome; the solid lines show the relation between four scientific disciplines and solar activity, while the black dots represent the means of the four scientific disciplines. However, as Ertel argues, the solar cycle may be shorter or longer than 11 years, and this possibility can be corrected by suitable statistical manipulation; the results of such manipulation, which essentially records strength of solar activity regardless of total duration of cycle, are shown on the right. It will be clear that with or without correction for duration of the solar cycle, there is a very marked U-shaped correlation with this activity, with an average minimum of scientific productivity at points −1, 0 and +1, as demanded by the theory.

Intriguing! Someone must have tested this since. It should be easy to curate a large dataset from Murray’s Human Accomplishment or Wikipedia-based datasets, and see if it holds up. More generally, it is somewhat in line with the quantitative takes on history from the cliodynamics people.

Intuition vs. thinking, system 1 vs. 2, and many other names:

It was of course Jung (1926) who made intuition one of the four functions of his typology (in addition to thinking, feeling, and sensation). This directed attention from the process of intuition to intuition as a personality variable – we can talk about the intuitive type as opposed to the thinking type, the irrational as opposed to the rational. (Feeling, too, is rational, while sensation is irrational, i.e. does not involve a judgment.) Beyond this, Jung drifts off into the clouds peopled with archetypes and constituted of the ‘collective unconscious’, intuitions of which are held to be far more important than intuitions of the personal unconscious. Jung’s theory is strictly untestable, but has been quite important historically in drawing attention to the ‘intuitive person’, or intuition as a personality trait.

Jung, like most philosophers, writers and psychologists, uses the contrast between ‘intuition’ and ‘logic’ as an absolute, a dichotomy of either – or. Yet when we consider the definitions and uses of the terms, we find that we are properly dealing with a continuum, with ‘intuition’ and ‘logic’ at opposite extremes, rather like the illustration in Fig. 5.2. In any problem-solving some varying degree of intuition is involved, and that may be large or small in amount. Similarly, as even Jung recognized, people are more or less intuitive; personalities are ranged along a continuum. It is often easier to talk as if we were dealing with dichotomies (tall vs. short; bright vs. dumb; intuitive vs. logical), but it is important to remember that this is not strictly correct; we always deal with continua.

The main problem with the usual treatment of ‘intuition’ is the impossibility of proof; whatever is said or postulated is itself merely intuitive, and hence in need of translation into testable hypotheses. Philosophical or even common-sense notions of intuition, sometimes based on experience as in the case of Poincaré, may seem acceptable, but they suffer the fate of all introspection – they may present us with a problem, but do not offer a solution.

The intuitive genius of Ramanujan:

For Hardy, as Kanigel says, Ramanujan’s pages of theorems were like an alien forest whose trees were familiar enough to call trees, yet so strange they seemed to come from another planet. Indeed, it was the strangeness of Ramanujan’s theorems, not their brilliance, that struck Hardy first. Surely this was yet another crank, he thought, and put the letter aside. However, what he had read gnawed at his imagination all day, and finally he decided to take the letter to Littlewood, a mathematical prodigy and friend of his. The whole story is brilliantly (and touchingly) told by Kanigel; fraud or genius, they asked themselves, and decided that genius was the only possible answer. All honour to Hardy and Littlewood for recognizing genius, even under the colourful disguise of this exotic Indian plant; other Cambridge mathematicians, like Baker and Hobson, had failed to respond to similar letters. Indeed, as Kanigel says, ‘it is not just that he discerned genius in Ramanujan that stands to his credit today; it is that he battered down his own wall of skepticism to do so’ (p. 171).

The rest of his short life (he died at 33) Ramanujan was to spend in Cambridge, working together with Hardy who tried to educate him in more rigorous ways and spent much time in attempting to prove (or disprove!) his theorems, and generally see to it that his genius was tethered to the advancement of modern mathematics. Ramanujan’s tragic early death left a truly enormous amount of mathematical knowledge in the form of unproven theorems of the highest value, which were to provide many outstanding mathematicians with enough material for a life’s work to prove, integrate with what was already known, and generally give it form and shape acceptable to orthodoxy. Ramanujan’s standing may be illustrated by an informal scale of natural mathematical ability constructed by Hardy, on which he gave himself a 25 and Littlewood a 30. To David Hilbert, the most eminent mathematician of his day, he gave an 80. To Ramanujan he gave 100! Yet, as Hardy said:

the limitations of his knowledge were as startling as its profundity. Here was a man who could work out modular equations and theorems of complex multiplication, to orders unheard of, whose mastery of continued fractions was, on the formal side at any rate, beyond that of any mathematician in the world, who had found for himself the functional equation of the Zeta-function, and the dominant terms of many of the most famous problems in the analytical theory of numbers; and he had never heard of a doubly periodic function or of Cauchy’s theorem, and had indeed but the vaguest idea of what a function of a complex variable was. His ideas as to what constituted a mathematical proof were of the most shadowy description. All his results, new or old, right or wrong, had been arrived at by a process of mingled arguments, intuition, and induction, of which he was entirely unable to give any coherent account (p. 714).

Ramanujan’s life throws some light on the old question of the ‘village Hampden’ and ‘mute inglorious Milton’; does genius always win through, or may the potential genius languish unrecognized and undiscovered? In one sense the argument entails a tautology: if genius is defined in terms of social recognition, an unrecognized genius is of course a contradictio in adjecto. But if we mean, can a man who is a potential genius be prevented from demonstrating his abilities?, then the answer must surely be in the affirmative. Ramanujan was saved from such a fate by a million-to-one accident. All his endeavours to have his genius recognized in India had come to nothing; his attempts to interest Baker and Hobson in Cambridge came to nothing; his efforts to appeal to Hardy almost came to nothing. He was saved by a most unlikely accident. Had Hardy not reconsidered his first decision, and consulted Littlewood, it is unlikely that we would ever have heard of Ramanujan! How many mute inglorious Miltons (and Newtons, Einsteins and Mendels) there may be we can never know, but we may perhaps try and arrange things in such a way that their recognition is less likely to be obstructed by bureaucracy, academic bumbledom and professional envy. In my experience, the most creative of my students and colleagues have had the most difficulty in finding recognition, acceptance, and research opportunities; they do not fit in, their very desire to devote their lives to research is regarded with suspicion, and their achievements inspire envy and hatred.

Eysenck talks about his psychoticism construct, which is almost the same as the modern general psychopathology factor, both abbreviated to P:

The study was designed to test Kretschmer’s (1946, 1948) theory of a schizothymia-cyclothymia continuum, as well as my own theory of a normality-psychosis continuum. Kretschmer was one of the earliest proponents of a continuum theory linking psychotic and normal behaviour. There is, he argued, a continuum from schizophrenia through schizoid behaviour to normal dystonic (introverted) behaviour; on the other side of the continuum we have syntonic (extraverted) behaviour, cycloid and finally manic-depressive disorder. He is eloquent in discussing how psychotic abnormality shades over into odd and eccentric behaviour and finally into quite normal typology. Yet, as I have pointed out (Eysenck, 1970a,b), the scheme is clearly incomplete. We cannot have a single dimension with ‘psychosis’ at both ends; we require at least a two dimensional scheme, with psychosis-normal as one axis, and schizophrenia-affective disorder as the other.

In order to test this hypothesis, I designed a method of ‘criterion analysis’ (Eysenck, 1950, 1952a,b), which explicitly tests the validity of continuum vs. categorical theories. Put briefly, we take two groups (e.g. normal vs. psychotic), and apply to both objective tests which significantly discriminate between the groups. We then intercorrelate the tests within each group, and factor analyse the resulting matrices. If and only if the continuum hypothesis is correct will it be found that the factor loadings in both matrices will be similar or identical, and that these loadings will be proportional to the degree to which the various tests discriminate between the two criterion groups.

An experiment has been reported, using this method. Using 100 normal controls, 50 schizophrenics and 50 manic-depressives, 20 objective tests which had been found previously to correlate with psychosis were applied to all the subjects (Eysenck, 1952b). The results clearly bore out the continuum hypothesis. The two sets of factor loadings correlated .87, and both were proportional to the differentiating power of the tests (r = .90 and .95, respectively). These figures would seem to establish the continuum hypothesis quite firmly; the results of the experiment are not compatible with a categorical type of theory.
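The logic of criterion analysis is easy to illustrate with simulated data. A toy version (the data-generating model, group sizes and loadings below are my own illustrative choices, not Eysenck’s 1952 data): if every test taps one latent continuum P and the groups differ only in their mean on it, then within-group factor loadings should agree across groups and track each test’s discriminating power.

```python
import numpy as np

# Toy simulation of 'criterion analysis' under the continuum hypothesis:
# all tests load on one latent dimension P; groups differ only in mean P.
rng = np.random.default_rng(0)
loadings = np.linspace(0.3, 0.9, 20)             # 20 tests, varying saturation

def sample_group(mean_p, n):
    p = rng.normal(mean_p, 1.0, n)                # latent position on continuum
    noise = rng.normal(0.0, 1.0, (n, len(loadings)))
    return p[:, None] * loadings + noise * np.sqrt(1 - loadings**2)

normals = sample_group(0.0, 500)
psychotics = sample_group(2.0, 500)

def first_factor(x):
    """Loadings on the first factor of the within-group correlation matrix."""
    r = np.corrcoef(x, rowvar=False)
    vals, vecs = np.linalg.eigh(r)                # eigh sorts ascending
    v = vecs[:, -1] * np.sqrt(vals[-1])
    return v if v.sum() > 0 else -v               # fix arbitrary sign

l1, l2 = first_factor(normals), first_factor(psychotics)
discrim = psychotics.mean(0) - normals.mean(0)    # group separation per test

# Continuum prediction: both correlations should be high.
print(np.corrcoef(l1, l2)[0, 1], np.corrcoef(l1, discrim)[0, 1])
```

A categorical model (distinct disease entity rather than a shared continuum) would not force this proportionality, which is what gives the method its bite.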

Eysenck summarizes his model:

Possessing this trait, however, does not guarantee creative achievement. Trait creativity may be a necessary component of such achievement, but many other conditions must be fulfilled, many other traits added (e.g. ego-strength), many abilities and behaviours added (e.g. IQ, persistence), and many sociocultural variables present, before high creative achievement becomes probable. Genius is characterized by a very rare combination of gifts, and these gifts function synergistically, i.e. they multiply rather than add their effects. Hence the mostly normally distributed conditions for supreme achievement interact in such a manner as to produce a J-shaped distribution, with huge numbers of non- or poor achievers, a small number of high achievers, and the isolated genius at the top.
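The multiplicative claim has a simple statistical consequence worth making explicit: multiply a handful of normally distributed components and you get a heavily right-skewed, J-shaped distribution, whereas adding them keeps things roughly normal. A quick sketch (the component means, SDs and zero-clipping are illustrative assumptions of mine):

```python
import random

# Compare additive vs multiplicative combination of five normally
# distributed 'gifts' (mean 1, sd 0.5, clipped at zero).
random.seed(1)

def trait():
    return max(0.0, random.gauss(1.0, 0.5))

N = 50_000
additive, multiplicative = [], []
for _ in range(N):
    ts = [trait() for _ in range(5)]
    additive.append(sum(ts))
    prod = 1.0
    for t in ts:
        prod *= t
    multiplicative.append(prod)

def top_share(xs, frac=0.01):
    """Share of the total held by the top `frac` of individuals."""
    xs = sorted(xs, reverse=True)
    k = max(1, int(len(xs) * frac))
    return sum(xs[:k]) / sum(xs)

# The top 1% dominate far more under multiplication than under addition.
print(top_share(additive), top_share(multiplicative))
```

Under addition the top 1% hold only slightly more than 1% of the total; under multiplication they hold many times that, with a large mass of near-zero achievers below, which is Eysenck’s J-shape.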

This, in very rough outline, is the theory here put forward. As discussed, there is some evidence in favour of the theory, and very little against it. Can we safely say that the theory possesses some scientific credentials, and may be said to be to some extent a valid account of reality? There are obvious weaknesses. Genius is extremely rare, and no genius has so far been directly studied with such a theory in mind. My own tests have been done to study deductions from the theory, and these have usually been confirmatory. Is that enough, and how far does it get us?

Terms like ‘theory’, of course, are often abused. Thus Koestler (1964) attempts to explain creativity in terms of his theory of ‘bisociation’, according to which the creative act ‘always operates on more than one plane’ (p. 36). This is not a theory, but a description; it cannot be tested, but acts as a definition. Within those limits, it is acceptable as a non-contingent proposition (Smedslund, 1984), i.e. necessarily true and not subject to empirical proof. A creative idea must, by definition, bring together two or more previously unrelated concepts. As an example, consider US Patent 5,163,447, the ‘force-sensitive, sound-playing condom’, i.e. an assembly of a piezo-electric sound transducer, microchip, power supply and miniature circuitry in the rim of a condom, so that when pressure is applied, it emits ‘a predetermined melody or a voice message’. Here is bisociation in its purest form, bringing together mankind’s two most pressing needs, safe sex and eternal entertainment. But there is no proper theory here; nothing is said that could be disproved by experiment. Theory implies a lot more than simple description.

And he continues outlining his own theory of scientific progress:

The philosophy of science has thrown up several criteria for judging the success of a theory in science. All are agreed that it must be testable, but there are two alternative ways of judging the outcome of such tests. Tradition (including the Vienna school) insists on the importance of ‘confirmation’: the theory is in good shape as long as results of testing deductions are positive (Suppe, 1974). Popper (1959, 1979), on the other hand, uses falsification as his criterion, pointing out that theories can never be proved to be correct, because we cannot ever test all the deductions that can possibly be made. More recent writers like Lakatos (1970, 1978; Lakatos and Musgrave, 1970) have directed their attention rather at a whole research programme, which can be either advancing or degenerating. An advancing research programme records a number of successful predictions which suggest further theoretical advances; a degenerating research programme seeks to excuse its failures by appealing to previously unconsidered boundary conditions. On those terms we are surely dealing with an advancing programme shift; building on research already done, many new avenues are opening up for supporting or disproving the theories making up our model.

It has always seemed to me that the Viennese School, and Popper, too, were wrong in disregarding the evolutionary aspect of scientific theories. Methods appropriate for dealing with theories having a long history of development might not be optimal in dealing with theories in newly developing fields, lacking the firm sub-structure of the older kind. Newton, as already mentioned, succeeded in physics, where much sound knowledge existed in the background, as well as good theories; he failed in chemistry/alchemy where they did not. Perhaps it may be useful to put forward my faltering steps in this very complex area situated between science and philosophy (Eysenck, 1960, 1985b).

It is agreed that theories can never be proved right, and equally that they are dependent on a variety of facts, hunches and assumptions outside the theory itself; these are essential for making the theory testable. Cohen and Nagel (1936) put the matter very clearly, and take as their example Foucault’s famous experiment in which he showed that light travels faster in air than in water. This was considered a crucial experiment to decide between two hypotheses: H1, the hypothesis that light consists of very small particles travelling with enormous speeds, and H2, the hypothesis that light is a form of wave motion. H1 implies the proposition P1 that the velocity of light in water is greater than in air, while H2 implies the proposition P2 that the velocity of light in water is less than in air. According to the doctrine of crucial experiments, the corpuscular hypothesis of light should have been banished to limbo once and for all. However, as is well known, contemporary physics has revived the corpuscular theory in order to explain certain optical effects which cannot be explained by the wave theory. What went wrong?

As Cohen and Nagel point out, in order to deduce the proposition P1 from H1 and in order that we may be able to perform the experiment of Foucault, many other assumptions, K, must be made about the nature of light and the instruments we employ in measuring its velocity. Consequently, it is not the hypothesis H1 alone which is being put to the test by the experiment – it is H1 and K. The logic of the crucial experiment may therefore be put in this fashion. If H1 and K, then P1; if now experiment shows P1 to be false, then either H1 is false or K (in part or completely) is false (or of course both may be false!). If we have good grounds for believing that K is not false, H1 is refuted by the experiment. Nevertheless the experiment really tests both H1 and K. If in the interest of the coherence of our knowledge it is found necessary to revise the assumptions contained in K, the crucial experiment must be reinterpreted, and it need not then decide against H1.
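The Duhem-style logic Cohen and Nagel describe can be checked mechanically. A brute-force truth-table sketch (illustrative only): enumerate all assignments to H, K and P, and confirm that whenever the conditional holds and P fails, the disjunction ¬H ∨ ¬K must hold.

```python
from itertools import product

def implies(a, b):
    return (not a) or b

# Whenever (H and K) -> P holds and P is false,
# it must follow that H is false or K is false.
entailed = all(
    (not H) or (not K)
    for H, K, P in product([False, True], repeat=3)
    if implies(H and K, P) and not P
)
print(entailed)  # True: the refutation never isolates H alone
```

The point of the passage survives the formalism: the experiment refutes the conjunction, and only background confidence in K lets us pin the blame on H.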

What I am suggesting is that when we are using H + K to deduce P, the ratio of H to K will vary according to the state of development of a given science. At an early stage, K will be relatively little known, and negative outcomes of testing H + K will quite possibly be due to faulty assumptions concerning K. Such theories I have called ‘weak’, as opposed to ‘strong’ theories where much is known about K, so that negative outcomes of testing H + K are much more likely to be due to errors in H (Eysenck, 1960, 1985b).

We may now indicate the relevance of this discussion to our distinction between weak and strong theories. Strong theories are elaborated on the basis of a large, well founded and experimentally based set of assumptions, K, so that the results of new experiments are interpreted almost exclusively in terms of the light they throw on H1, H2, …, Hn . Weak theories lack such a basis, and negative results of new experiments may be interpreted with almost equal ease as disproving H or disproving K. The relative importance of K can of course vary continuously, giving rise to a continuum; the use of the terms ‘strong’ and ‘weak’ is merely intended to refer to the extremes of this continuum, not to suggest the existence of two quite separate types of theories. In psychology, K is infinitely less strong than it is in physics, and consequently theories in psychology inevitably lie towards the weaker pole.

Weak theories in science, then, generate research the main function of which is to investigate certain problems which, but for the theory in question, would not have arisen in that particular form; their main purpose is not to generate predictions the chief use of which is the direct verification or confirmation of the theory. This is not to say that such theories are not weakened if the majority of predictions made are infirmed; obviously there comes a point when investigators turn to more promising theories after consistent failure with a given hypothesis, however interesting it may be. My intention is merely to draw attention to the fact – which will surely be obvious to most scientifically trained people – that both proof and failure of deductions from a scientific hypothesis are more complex than may appear at first sight, and that the simple-minded application of precepts derived from strong theories to a field like psychology may be extremely misleading. Ultimately, as Conant has emphasized, scientific theories of any kind are not discarded because of failures of predictions, but only because a better theory has been advanced.

The reader with a philosophy background will naturally think of this in terms of The Web of Belief, which takes us back to my earlier days of blogging philosophy!


Open data and behavioral genetics: room for improvement!

Open data is a fundamental part of getting science to work well. Primary reasons for this:

  • Redundancy in data archiving: most data are lost because no backups exist!
  • Easy access for 3rd parties, whether for new analyses or for error-checking previous work. Scientists are human and often refuse data access to hostile outsiders, preventing these from checking their work.
Unfortunately, only a few behavioral genetic datasets exist, owing to the fact that data are not generally collected for multiple family members at a time. Some of the public or partially public ones are:
  • NLSYs (National Longitudinal Surveys of Youth)
  • NCPP (National Collaborative Perinatal Project). The data files are really annoying to work with (fixed-width format), but some people have released 3rd-party versions that are easier to use
  • TEDS (Twins Early Development Study) is closed
    • But part of it used to be partially public at this, though it has since been removed; I have put a copy here
    • Update: it has now been moved here and is still available
  • PT (Project Talent), though as far as I know it has not been released
  • More? Contact me
intelligence / IQ / cognitive ability Stereotypes

Stereotype threat: current evidence in January 2020

Stereotype threat is:

a situational predicament in which people are or feel themselves to be at risk of conforming to stereotypes about their social group.[1][2] Stereotype threat is purportedly a contributing factor to long-standing racial and gender gaps in academic performance. It may occur whenever an individual’s performance might confirm a negative stereotype because stereotype threat is thought to arise from a particular situation, rather than from an individual’s personality traits or characteristics. Since most people have at least one social identity which is negatively stereotyped, most people are vulnerable to stereotype threat if they encounter a situation in which the stereotype is relevant. Situational factors that increase stereotype threat can include the difficulty of the task, the belief that the task measures their abilities, and the relevance of the stereotype to the task. Individuals show higher degrees of stereotype threat on tasks they wish to perform well on and when they identify strongly with the stereotyped group. These effects are also increased when they expect discrimination due to their identification with a negatively stereotyped group.[3] Repeated experiences of stereotype threat can lead to a vicious circle of diminished confidence, poor performance, and loss of interest in the relevant area of achievement.[4]

At least, that’s the theory. What is the evidence? It’s the usual thing. A bunch of small studies with various p-hacking issues, and then some larger ones with null results. I summarize the large sample size studies and meta-analyses. There is also a published review by skeptic academics:

The stereotype threat literature primarily comprises lab studies, many of which involve features that would not be present in high-stakes testing settings. We meta-analyze the effect of stereotype threat on cognitive ability tests, focusing on both laboratory and operational studies with features likely to be present in high stakes settings. First, we examine the features of cognitive ability test metric, stereotype threat cue activation strength, and type of nonthreat control group, and conduct a focal analysis removing conditions that would not be present in high stakes settings. We also take into account a previously unrecognized methodological error in how data are analyzed in studies that control for scores on a prior cognitive ability test, which resulted in a biased estimate of stereotype threat. The focal sample, restricting the database to samples utilizing operational testing-relevant conditions, displayed a threat effect of d = −.14 (k = 45, N = 3,532, SDδ = .31). Second, we present a comprehensive meta-analysis of stereotype threat. Third, we examine a small subset of studies in operational test settings and studies utilizing motivational incentives, which yielded d-values ranging from .00 to −.14. Fourth, the meta-analytic database is subjected to tests of publication bias, finding nontrivial evidence for publication bias. Overall, results indicate that the size of the stereotype threat effect that can be experienced on tests of cognitive ability in operational scenarios such as college admissions tests and employment testing may range from negligible to small.

Sex: females and math

For sex differences, they picked women and math as the claim to defend. The reason for this choice is that women’s relatively worse math performance is a major factor in their lower STEM representation, which feminists desperately want to change. Jelte Wicherts’ former PhD student, Paulette Flore, basically destroyed this idea with her dissertation. Some of it has been published as articles:

Although the effect of stereotype threat concerning women and mathematics has been subject to various systematic reviews, none of them have been performed on the sub-population of children and adolescents. In this meta-analysis we estimated the effects of stereotype threat on performance of girls on math, science and spatial skills (MSSS) tests. Moreover, we studied publication bias and four moderators: test difficulty, presence of boys, gender equality within countries, and the type of control group that was used in the studies. We selected study samples when the study included girls, samples had a mean age below 18 years, the design was (quasi-)experimental, the stereotype threat manipulation was administered between-subjects, and the dependent variable was a MSSS test related to a gender stereotype favoring boys. To analyze the 47 effect sizes, we used random effects and mixed effects models. The estimated mean effect size equaled -0.22 and significantly differed from 0. None of the moderator variables was significant; however, there were several signs for the presence of publication bias. We conclude that publication bias might seriously distort the literature on the effects of stereotype threat among schoolgirls. We propose a large replication study to provide a less biased effect size estimate.

And then she did this replication study:

The effects of gender stereotype threat on mathematical test performance in the classroom have been extensively studied in several cultural contexts. Theory predicts that stereotype threat lowers girls’ performance on mathematics tests, while leaving boys’ math performance unaffected. We conducted a large-scale stereotype threat experiment in Dutch high schools (N = 2064) to study the generalizability of the effect. In this registered report, we set out to replicate the overall effect among female high school students and to study four core theoretical moderators, namely domain identification, gender identification, math anxiety, and test difficulty. Among the girls, we found neither an overall effect of stereotype threat on math performance, nor any moderated stereotype threat effects. Most variance in math performance was explained by gender, domain identification, and math identification. We discuss several theoretical and statistical explanations for these findings. Our results are limited to the studied population (i.e. Dutch high school students, age 13–14) and the studied domain (mathematics).

Various groups and GRE-like scores

A little-known report (12 years old, 13 citations!) presents some strong evidence, based on 2 previous papers:

The figures speak for themselves:

These are all based on large samples in high-stakes tests, i.e. real-life, high-importance contexts.

Academic stereotypes and tracking — in China

Educational tracks create differential expectations of student ability, raising concerns that the negative stereotypes associated with lower tracks might threaten student performance. The authors test this concern by drawing on a field experiment enrolling 11,624 Chinese vocational high school students, half of whom were randomly primed about their tracks before taking technical skill and math exams. As in almost all countries, Chinese students are sorted between vocational and academic tracks, and vocational students are stereotyped as having poor academic abilities. Priming had no effect on technical skills and, contrary to hypotheses, modestly improved math performance. In exploring multiple interpretations, the authors highlight how vocational tracking may crystallize stereotypes but simultaneously diminishes stereotype threat by removing academic performance as a central measure of merit. Taken together, the study implies that reminding students about their vocational or academic identities is unlikely to further contribute to achievement gaps by educational track.


Normie authors:

In many regions around the world students with certain immigrant backgrounds underachieve in educational settings. This paper provides a review and meta-analysis on one potential source of the immigrant achievement gap: stereotype threat, a situational predicament that may prevent students to perform up to their full abilities. A meta-analysis of 19 experiments suggests an overall mean effect size of 0.63 (random effects model) in support of stereotype threat theory. The results are complemented by moderator analyses with regard to circulation (published or unpublished research), cultural context (US versus Europe), age of immigrants, type of stereotype threat manipulation, dependent measures, and means for identification of immigrant status; evidence on the role of ethnic identity strength is reviewed. Theoretical and practical implications of the findings are discussed.

Their funnel plot says it all:
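The funnel-plot logic can be illustrated with a toy simulation (my own sketch, not the authors' analysis): with a true effect of zero and a "significance" filter on what gets published, an Egger-style regression of the standardized effect on precision yields a clearly nonzero intercept, i.e. funnel asymmetry.

```python
import numpy as np

rng = np.random.default_rng(1)
studies = []
while len(studies) < 19:                 # same number of experiments as the meta-analysis
    n = int(rng.integers(20, 200))       # per-study sample size
    se = np.sqrt(4 / n)                  # rough SE of a standardized mean difference
    d = rng.normal(0.0, se)              # true effect is zero
    if d / se > 1.64:                    # publication filter: only "significant" results survive
        studies.append((d, se))

d, se = np.array(studies).T
z, precision = d / se, 1 / se
slope, intercept = np.polyfit(precision, z, 1)
print(intercept)                          # far from 0: funnel asymmetry (Egger-style test)
```

Without the filter, the intercept would hover around zero; the selection step alone manufactures the 0.63-style "effect".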

Black-White gap in USA

An in-depth analysis of the famous first paper is given by Ulrich Schimmack.

Wicherts has an old meta-analysis that he is hiding for whatever reason.

But we will soon get another big replication, similar to Flore’s above:

According to stereotype threat theory, the possibility of confirming a negative group stereotype can evoke feelings of threat, leading people to underperform in the domains in which they are stereotyped as lacking ability. This theory has immense theoretical and practical implications, but many studies supporting it include small samples and varying operational definitions of “stereotype threat”. We address the first challenge by leveraging a network of psychology labs to recruit a large Black student sample (anticipated N = 2700) from multiple US sites (anticipated N = 27). We address the second challenge by identifying three threat-increasing and three threat-decreasing procedures that could plausibly affect performance and use an adaptive Bayesian design to determine which “stereotype threat” operationalization yields the strongest evidence for underperformance. This project has the potential to advance our knowledge of a scientifically and socially important topic: whether and under what conditions stereotype threat affects current US Black students.

Which of course I am looking forward to!

A reasonable prior is that anything from social psychology is most likely bullshit. More so the more left-wing friendly it is. Stereotype threat gets a double bad prior here. The evidence for it is laughably bad, so a reasonable person’s posterior will be close to 0.

intelligence / IQ / cognitive ability

Ken Richardson claims FAQ

A much more in-depth reply to one of Richardson’s papers was written by Zeke Jeffrey here.

Since philosophy blogger RaceRealist is on a mission to promote Ken Richardson, it seems we need to discuss his various claims a bit more. Hence, this post is an FAQ about his claims.

Is there agreement on the definition of “intelligence” and how to measure it?

“As we approach the centenary of the first practical intelligence test, there is still little scientific agreement about how human intelligence should be described, whether IQ tests actually measure it, and if they don’t, what they actually do measure.” (Richardson 2002)

Richardson cherry picks some quotes to make his point. However, there is plenty of agreement about the description of intelligence and how to measure it. See results in surveys going back to the 1980s: Snyderman & Rothman 1987, 1988, Reeve & Charles 2008 (in Scott Alexander’s words “97% of expert psychologists and 85% of applied psychologists agree that IQ tests measure cognitive ability “reasonably well””). A typical mainstream statement of definition is that offered by Gottfredson 1994:

A very general mental capability that, among other things, involves the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings—”catching on,” “making sense” of things, or “figuring out” what to do.

One can also consult every mainstream textbook on the topic.

Do IQ tests measure social class or intelligence?

IQ tests are merely clever numerical surrogates for social class. The numerous correlations evoked in support of g arise from this. (Richardson 1999)

It suggests that all of the population variance in IQ scores can be described in terms of a nexus of sociocognitive-affective factors that differentially prepares individuals for the cognitive, affective and performance demands of the test—in effect that the test is a measure of social class background, and not one of the ability for complex cognition as such. (Richardson 2002)

There are easy and obvious ways to test this idea. First, if IQ tests measure social class, they should be very strongly related to social class/status (SES), either one’s own or one’s parents’, as in r = .90 or so (measuring the same thing minus some measurement error). In fact, they correlate about .35 with parental SES (Hanscombe et al 2012) and about .60 with one’s own adult SES (Strenze 2007 presents a meta-analysis but lacks a composite SES measure; using one, we reported .55 in, e.g., our Argentina study, Kirkegaard & Fuerst 2017).
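The gap between the observed correlations and the "same construct" benchmark is easiest to see as shared variance (r squared); the labels below are mine:

```python
# Shared variance under each correlation: .90 would mean ~81% overlap,
# while the observed SES correlations imply far less.
for label, r in [("parental SES", 0.35), ("own adult SES", 0.60), ("same-construct benchmark", 0.90)]:
    print(f"{label}: r = {r:.2f}, shared variance = {r**2:.0%}")
```

So even the strongest SES correlation leaves roughly two-thirds of IQ variance unaccounted for by social class.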

Second, one can look at siblings, who are of course known to vary widely in IQ, differing on average by 12 IQ points, whereas random people differ by 17. Many large studies (e.g. Frisell et al., 2012; Hegelund et al., 2019; Murray, 2002; Aghion et al 2018) have shown that these sibling differences in IQ predict outcomes well, sometimes as well as differences between families, though usually with some loss of validity (maybe 15% on average).
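The 12- vs. 17-point figures follow directly from normal theory: for two scores with SD 15 correlating r, the difference has SD 15·√(2(1−r)), and the expected absolute value of a normal variable is σ·√(2/π). A quick check, assuming a sibling IQ correlation of about .5:

```python
import math

SD = 15  # IQ standard deviation

def mean_abs_diff(r):
    """Expected absolute IQ difference between two people whose scores correlate r."""
    sigma_diff = SD * math.sqrt(2 * (1 - r))     # SD of the difference score
    return sigma_diff * math.sqrt(2 / math.pi)   # E|X| for a centered normal variable

print(round(mean_abs_diff(0.5)))   # siblings (r about .5): 12 points
print(round(mean_abs_diff(0.0)))   # unrelated people: 17 points
```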

Third, as an easier alternative to the second point above, one can also just adjust for parental SES in a regression and see whether IQ still predicts stuff. The Bell Curve (Herrnstein & Murray, 1994) of course famously did this and found that IQ is usually the better predictor of the two. These results were also recently replicated for SAT and university education outcomes by Higdem et al 2016:

In Table 1, compare lines 1 vs. 2 to see the influence of parental SES. The model R2 hardly changes (.268 to .273) and the beta change for SAT is minor (.194 to .173). Table 2 shows values where they calculated the partial correlations, i.e. removed the statistical dependency on parental SES for both variables and then showed the correlations among the remaining variables by subgroup. They are all substantial. Note that range restriction causes the correlations to differ by group.
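Partial correlation, as used in that Table 2, simply removes the linear dependence on the control variable from both sides. A sketch with made-up illustrative correlations (not Higdem et al's actual values):

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y with z (e.g. parental SES) partialled out."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical values: x = SAT, y = college outcome, z = parental SES
print(partial_corr(0.45, 0.35, 0.30))  # ~0.39: still substantial after the control
```

If IQ-type scores were mere SES proxies, partialling out SES would drive such values toward zero rather than leaving them substantial.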


New paper out: Intelligence and Religiosity among Dating Site Users (with Jordan Lasker)

We sought to assess whether previous findings regarding the relationship between cognitive ability and religiosity could be replicated in a large dataset of online daters (maximum n = 67k). We found that self-declared religious people had lower IQs than nonreligious people (atheists and agnostics). Furthermore, within most religious groups, a negative relationship between the strength of religious conviction and IQ was observed. This relationship was absent or reversed in nonreligious groups. A factor of religiousness based on five questions correlated at −0.38 with IQ after adjusting for reliability (−0.30 before). The relationship between IQ and religiousness was not strongly confounded by plausible demographic covariates (β = −0.24 in final model versus −0.30 without covariates).

Keywords: intelligence; religion; religious belief; atheism; agnosticism; Christianity; Catholicism; Hinduism; Judaism; Islam; OKCupid; cognitive ability

So, yet another OKCupid dataset paper!
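The reliability adjustment in the abstract (−0.30 observed, −0.38 corrected) is the classical correction for attenuation: r_true = r_obs / √(rel_x · rel_y). A sketch with illustrative reliabilities (my assumptions, not the paper's reported values):

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Classical correction for attenuation due to measurement unreliability."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Illustrative reliabilities chosen so the numbers line up with the abstract
print(disattenuate(-0.30, 0.78, 0.80))   # about -0.38
```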

People often ask me what the IQ items in the OKCupid dataset are. Here they are:

  1. Which is bigger, the earth or the sun?
  2. STALE is to STEAL as 89475 is to what?
  3. What is next in this series? 1, 4, 10, 19, 31, __
  4. If you turn a left-handed glove inside out, it fits on your left or right hand?
  5. In the line “Wherefore art thou Romeo?”, what does “wherefore” mean?
  6. How many fortnights are in a year?
  7. Half of all policemen are thieves and half of all policemen are murderers. Does it follow logically that all policemen are criminals?
  8. Which is longer, a mile or a kilometer?
  9. When birds stand on power lines and don’t get hurt, it’s most likely because of what?
  10. Etymology is?
  11. If some men are doctors and some doctors are tall, does it follow that some men are tall?
  12. A little grade 10 science: what is the Ideal Gas Law?
  13. If you flipped three pennies, what would be the odds that they all came out the same?
  14. Which is the day before the day after yesterday?

Main figure

Figure 1. Mean cognitive ability by religious orientation and certainty. Error bars are 95% confidence intervals. Shaded regions are the 95% confidence intervals for the individual-level regression results. Groups without at least five cases are not shown.


Video talk

Book review

2019 in books

I document my reading habits on the Goodreads platform, so it’s easy for anyone to follow along. I don’t review all the books I read, just when I feel like it. I do, however, rate them all. 2019 is over, so I can look back at the year and see what I read. Below, I have organized them by broad type of book. They are all nonfiction because I don’t read fiction.

Psychology books

A neat introduction to the current problems of psychology: the replication crisis, low power, p-hacking and so on.

Read this because of the SlateStarCodex review. It’s definitely an interesting take on psychopathology (mental illness) that left me with many ideas to pursue.

This is a book length introduction to David Becker’s recalculation of Richard Lynn’s national IQ database. It’s still incomplete. I published a review of this.

Eysenck’s book on race and intelligence. Quite entertaining, as most of his books are.

Since I have been studying political bias in media and academia, this was on my to-read list. I was not previously familiar with Pearson’s work, only with the fact that he’s very hated. In fact, Turkheimer had already tried to shame me about him, which naturally had the opposite result of just making me more curious.

Only partially a psychology book. Since I work a lot with immigration and the politics thereof, this was an obvious book to read. It’s pretty good!

PhD thesis by some Danish guy. Not too interesting, just skim the published papers instead.

Another good Eysenck book which taught me a lot despite being very old, 1954! Reviewed here.

Eysenck’s classic takedown of Freudian psychology. Definitely recommended.


A fairly basic introduction to genomics. Mainly read this because the author is Danish and so could be relevant to network with.


Amusing autobiography by an eminent biologist and oddball.


Overview of left-wing bias in the study of history. For anyone familiar with the evidence of left-wing bias in psychology, this won’t be surprising. The disturbing part is that many historical facts might not really be so.

Similar to the above, but focused on the Comintern.

Similar to the above, but specifically about McCarthy’s legacy. Before reading this book, I held the typical opinion that McCarthy was overly zealous, but history has since mostly vindicated him. So I no longer use McCarthyism as a negative term. The communist infiltration was mostly real.

Mainly read this because of the articles in Reliable Media™ about people proposing new poor history where Shakespeare is a woman, promoted by Amazing Amy, of course.

Methods books

I was doing some time series analysis and was wondering about the theoretical background. This book sucks though.

Brief and useful introduction to working with spatial data in R. I’ve been increasingly working with spatial data for the purpose of map making, and dealing with omnipresent spatial autocorrelation in datasets of political units.

Short introduction to deep learning with R. Not very good.

A kind of introduction to Pearl’s way of thinking about causality. Worth reading but flawed. I reviewed it at length here.

A neat and brief introduction to statistical thinking.


As the title says, a kind of introduction to applied decision theory by trying to aggressively quantify everything.

Author is an annoying neurotic feminist, but this is a neat book about the issues of science in physics for those of us who don’t have the background to read the primary literature.

Fairly typical Silicon Valley take on science history and how to move stuff along.

Charlton used to run the journal Medical Hypotheses, which published a lot of non-medicine, and which used editorial review instead of slow peer review. Thus, his book about scientific progress could have been interesting, but it wasn’t.

Good defense of free speech/thought. Reviewed in detail here.

Fairly boring. Essentially some commentary on various recent SJW insanity.

Essentially an academic book that defends the idea that right wing media are uniquely bad using network analysis. Does not really properly consider the problem of political bias in media except in a few amusing places where it is casually dismissed. I still recommend it as something to read from the other side, especially because it presents a lot of data well.

Political science

Women, taxes and voting

Many libertarians defend women’s voting rights (suffrage) by reference to general principles about equality of legal rights. The historical reason for this move, I think, is that libertarianism and classical liberalism arose as a political reaction to the extra legal rights of the nobility and royalty. Thus, their goal was to equalize such rights, and this also implies equalizing them for women as well as for non-European groups (most famously, slaves of African descent).

History aside, the libertarian stance is odd politically because women really don’t like libertarianism, a few Twitter liberty hotties aside. For instance, the sex composition of US libertarians as measured by this 2013 survey was 68% male. I think this underestimates the skew because it didn’t look at voting, and voting results tend to be more extreme than various kinds of self-report. There are some more numbers here and here, but again they are not based on actual voting patterns as far as I can tell. Pew Research produced this figure based on 2014 data (US again). For a broader comparison, see also Karlin’s Coffee Salon demographics.

We can also have a look ourselves, since Pew releases their data 2 years after collection. I am actually surprised that more people don’t use these public datasets, which are high-quality, representative surveys covering many different topics. There are even a few surveys with IQ-like items, mainly related to scientific knowledge. Back in 2017, I downloaded one of these surveys to plot the demographics of preference for smaller government. A simple race × sex breakdown looks like this:

The female preference for larger government shows up pretty much no matter how you slice the data:

More numerically, we can also fit a regression model with all of these proposed explanatory factors at the same time:

To produce this plot, I did:

  • Fit 5 logistic regression models (using the rms package for R) with varying predictors. The first has just sex; we then incrementally add more covariates to see if they can make the effect of sex go away.
  • Extract the probability of favoring bigger government from the 5 models, and combine these into one dataset.
  • Plot the probabilities with appropriate error bars (95% confidence intervals) and labels.
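The original analysis used R’s rms package; as a rough sketch of what the probability-extraction steps involve, here is the inverse-logit transform that turns a model’s linear predictor (and its confidence bounds) into a probability with an error bar. The coefficient and standard error below are hypothetical:

```python
import math

def inv_logit(x: float) -> float:
    """Convert a logit (log-odds) value into a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def prob_with_ci(logit: float, se: float, z: float = 1.96):
    """Point probability plus a 95% CI, transformed from the logit scale."""
    return inv_logit(logit), inv_logit(logit - z * se), inv_logit(logit + z * se)

# Hypothetical linear predictor for 'favors bigger government'
p, lo, hi = prob_with_ci(0.25, 0.10)
print(f"p = {p:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Transforming the CI endpoints from the logit scale (rather than adding a symmetric margin to the probability) keeps the interval inside [0, 1].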

So, all in all, we see that the sex difference does not get smaller with controls; it actually gets slightly larger (because the male % decreases slightly). Numerically, the logit changes from 0.452 to 0.567 between the first and final models. Because of this, one cannot explain women’s preference for bigger government by their present social conditions; it must have deeper roots, presumably evolutionary. Leaving speculations about the origin aside, we can also predict that giving women the vote should increase the size of the government. Does it? We can rely upon policy changes in history that extended the vote to women. There are actually a few such studies. The most obvious country to examine is the United States:
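To put those logit coefficients on a more interpretable scale, one can exponentiate them into odds ratios:

```python
import math

# Sex coefficients (logits) from the first and final models, as given above
logit_first, logit_final = 0.452, 0.567

odds_first = math.exp(logit_first)   # odds ratio in the sex-only model
odds_final = math.exp(logit_final)   # odds ratio with all controls added

print(f"Odds ratio, sex only:      {odds_first:.2f}")  # ~1.57
print(f"Odds ratio, full controls: {odds_final:.2f}")  # ~1.76
```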

This paper examines the growth of government during this century as a result of giving women the right to vote. Using cross‐sectional time‐series data for 1870–1940, we examine state government expenditures and revenue as well as voting by U.S. House and Senate state delegations and the passage of a wide range of different state laws. Suffrage coincided with immediate increases in state government expenditures and revenue and more liberal voting patterns for federal representatives, and these effects continued growing over time as more women took advantage of the franchise. Contrary to many recent suggestions, the gender gap is not something that has arisen since the 1970s, and it helps explain why American government started growing when it did.

This is the most famous, I think, mainly because the author is naughty for other reasons.

Switzerland is another good choice:

  • Abrams, B. A., & Settle, R. F. (1999). Women’s suffrage and the growth of the welfare state. Public Choice, 100(3-4), 289-300.

In this paper we test the hypothesis that extensions of the voting franchise to include lower income people lead to growth in government, especially growth in redistribution expenditures. The empirical analysis takes advantage of the natural experiment provided by Switzerland’s extension of the franchise to women in 1971. Women’s suffrage represents an institutional change with potentially significant implications for the positioning of the decisive voter. For various reasons, the decisive voter is more likely to favor increases in governmental social welfare spending following the enfranchisement of women. Evidence indicates that this extension of voting rights increased Swiss social welfare spending by 28% and increased the overall size of the Swiss government.

The authors explain their results:

The results for the key variable, suffrage, are striking. Based upon our estimates, giving women the vote in Switzerland raised the level of social welfare spending by 28%, after accounting for other influences on that spending variable. In Switzerland, social welfare spending is about half the government budget, so this result implies an increase in overall government spending of about 14%. Qualitatively, this result is consistent with the Husted and Kenny (1997: 76) finding of “. . . strong support for the prediction that welfare spending rises as the decisive voter moves down the income distribution.”

In an attempt to shed some light on whether the effect of suffrage on welfare spending was immediate or occurred with some lag, we estimated several alternative models (not reported here). In these models we varied the date at which the suffrage variable switched from 0 to 1. Rather than indicating the year in which suffrage occurred, this alternative formulation indicates alternative years when suffrage may have begun to influence spending. The values for the suffrage coefficient and t-statistic are maximized when suffrage equals one beginning in 1973: the coefficient equals 0.295 versus 0.25 in Table 2, while the t-value is 6.30 versus 4.31. This simple test suggests the reasonable conclusion that the political mechanism in Switzerland did not respond immediately to this sweeping change in the nature of the electorate, but with a lag of about two years.

In contrast to the positive impact on social welfare spending, enfranchisement of Swiss women appears to have reduced the rate of government consumption spending in Switzerland. The estimates for model 2 suggest that, as a result of women’s suffrage, the level of government consumption spending is about 5.8% less than otherwise. This finding differs from the Husted and Kenny (1997: 80) study: they found that “. . . nonwelfare government expenditures were unaffected by various measures of political influence of the poor. . .” Examination of Swiss government spending suggests that at least part of the observed reduction in government consumption spending may be attributed to cuts in military outlays. In the period 1963–1971, military spending averaged 2.46% of GDP. In the period 1972–1983, following the enfranchisement of women, military spending averaged only 1.99% of GDP (Source: U.S. Arms Control and Disarmament Agency).

Government consumption spending in Switzerland is about 38% of total government outlays. Thus, the 5.8% reduction translates into a 2.2% reduction in total outlays. That impact, combined with the estimated positive effect on social welfare spending, suggests that the overall effect of women’s suffrage on total government spending in Switzerland is around 12%.
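The arithmetic in the quoted passages is easy to verify:

```python
# Welfare spending rose 28%, and welfare is about half the budget
welfare_effect = 0.28 * 0.50          # -> +14% of total outlays

# Consumption spending fell 5.8%, and consumption is about 38% of outlays
consumption_effect = 0.058 * 0.38     # -> -2.2% of total outlays

net_effect = welfare_effect - consumption_effect
print(f"Net effect on total outlays: {net_effect:+.1%}")  # roughly +12%
```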

So, women moved spending from military (national defense) into enjoyment/health, and also increased the overall spending.

Thus, overall, the conclusion seems to be that anyone who is against large government should be against women voting, at least prima facie. One could defend women’s suffrage on other grounds, such as keeping a fair and simple rights setup in society. If one wants to avoid a directly sexist solution, one could attempt to target problem voters in other ways. For instance, one idea is to limit voting rights to people who are employed in the private sector, i.e. who are not dependent on the welfare state for their own salaries. Another idea is to remove voting rights from people who are living off non-earned welfare payouts (unemployment/disability benefits and the like, but not including earned pensions). The latter solution is in line with the typical rational-voter framework of economics, but the results above for the effect of sex do not generally align with self-interested voting, since the effect of sex was remarkably stable no matter which controls were employed.

Immigration intelligence / IQ / cognitive ability

Studies using national IQs for predicting immigration outcomes

Below I list all the studies I am aware of that use national IQs in studies of immigrants, usually as an estimate of their IQ levels in the host country. Most of these studies are authored by me and coauthors, but not the first two from 2010, which were seemingly independently conceived by two sets of academics. I count 14 papers in this literature so far, and 1 unpublished meta-analysis of these.

The search was done this way:

  1. Looked over the citing studies of Jones & Schneider 2010 and Vinogradov & Kolvereid 2010 on Google Scholar, looking for anything that seemed relevant.

  2. Copied in all my own studies, using my website front page as the list.

  3. Looked in Lynn’s Race difference book (2nd ed) for any mentions of immigration/immigrants. Since Lynn is the originator of national IQs, he presumably would be on the lookout for any study using them.

So it is possible I missed studies that do not cite the two 2010 studies, but do cite some of Lynn’s work. Furthermore, it’s possible there are some studies that use other datasets to do essentially the same thing, e.g. Altinok’s scores, Rindermann’s scores, or other datasets.

  • Jones, G., & Schneider, W. J. (2010). IQ in the production function: Evidence from immigrant earnings. Economic Inquiry, 48(3), 743-755.

We show that a country’s average IQ score is a useful predictor of the wages that immigrants from that country earn in the United States, whether or not one adjusts for immigrant education. Just as in numerous microeconomic studies, 1 IQ point predicts 1% higher wages, suggesting that IQ tests capture an important difference in cross‐country worker productivity. In a cross‐country development accounting exercise, about one‐sixth of the global inequality in log income can be explained by the effect of large, persistent differences in national average IQ on the private marginal product of labor. This suggests that cognitive skills matter more for groups than for individuals. (JEL J24, J61, O47)

The level of self-employment varies significantly among immigrants from different countries of origin. The objective of this research is to examine the relationship between home-country national intelligence and self-employment rates among first generation immigrants in Norway. Empirical secondary data on self-employment among immigrants from 117 countries residing in Norway in 2008 was used. The relevant hypothesis was tested using hierarchical regression analysis. The immigrants’ national intelligence was found to be significantly positively associated with self-employment. However, the importance of national IQ for self-employment among immigrants decreases with the duration of residence in Norway. The study concludes with practical implications and suggestions for future research.

Many recent studies have corroborated Lynn and Vanhanen’s worldwide compilation of national IQs; however, no one has attempted to estimate the mean IQ of an immigration population based on its countries of origin. This paper reports such a study based on the Danish immigrant population and IQ data from the military draft. Based on Lynn and Vanhanen’s estimates, the Danish immigrant population was estimated to have an average 89.9 IQ in 2013Q2, and the IQ from the draft was 86.3 in 2003Q3 (against a ‘Danish’ IQ of 100). However, after taking account of two error sources, the discrepancy between the measured IQ and the estimated IQ was reduced to a mere 0.4 IQ. The study thus strongly validates Lynn and Vanhanen’s national IQs.

We discuss the global hereditarian hypothesis of race differences in g and test it on data from the NLSF. We find that migrants’ country of origin IQ predicts GPA and SAT/ACT.

Criminality rates and fertility vary wildly among Danish immigrant populations by their country of origin. Correlational and regression analyses show that these are very predictable (R’s about .85 and .5) at the group level with national IQ, Islam belief, GDP and height as predictors.

A previous study found that criminality among immigrant groups in Denmark was highly predictable by their countries of origin’s prevalence of Muslims, IQ, GDP and height. This study replicates the study for Norway with similar results.

We obtained data from Denmark for the largest 70 immigrant groups by country of origin. We show that three important socioeconomic variables are highly predictable from the Islam rate, IQ, GDP and height of the countries of origin. We further show that there is a general immigrant socioeconomic factor and that country of origin national IQs, Islamic rates, and GDP strongly predict immigrant general socioeconomic scores.

I present new predictive analyses for crime, income, educational attainment and employment among immigrant groups in Norway and crime in Finland. Furthermore I show that the Norwegian data contains a strong general socioeconomic factor (S) which is highly predictable from country-level variables (National IQ .59, Islam prevalence -.71, international general socioeconomic factor .72, GDP .55), and correlates highly (.78) with the analogous factor among immigrant groups in Denmark. Analyses of the prediction vectors show very high correlations (generally ±.9) between predictors which means that the same variables are relatively well or weakly predicted no matter which predictor is used. Using the method of correlated vectors shows that it is the underlying S factor that drives the associations between predictors and socioeconomic traits, not the remaining variance (all correlations near unity).

We argue that if immigrants have a different mean general intelligence (g) than their host country and if immigrants generally retain their mean level of g, then immigration will increase the standard deviation of g. We further argue that inequality in g is an important cause of social inequality, so increasing it will increase social inequality. We build a demographic model to analyze change in the mean and standard deviation of g over time and apply it to data from Denmark. The simplest model, which assumes no immigrant gains in g, shows that g has fallen due to immigration from 97.1 to 96.4, and that for the same reason standard deviation has increased from 15.04 to 15.40, in the time span 1980 to 2014.
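The kind of demographic model described can be illustrated with the standard two-group mixture formulas for the pooled mean and standard deviation. The population shares and group parameters below are hypothetical, not the paper’s actual inputs, but they show the qualitative pattern (mean falls, SD rises):

```python
import math

def mixture_stats(groups):
    """groups: list of (share, mean, sd) tuples; shares must sum to 1.
    Returns the mean and SD of the pooled mixture distribution."""
    mean = sum(p * m for p, m, _ in groups)
    # total variance = within-group variance + between-group variance
    var = sum(p * (s ** 2 + (m - mean) ** 2) for p, m, s in groups)
    return mean, math.sqrt(var)

# Hypothetical: 90% natives (mean 100, SD 15), 10% immigrants (mean 86, SD 15)
mean, sd = mixture_stats([(0.90, 100.0, 15.0), (0.10, 86.0, 15.0)])
print(f"mixture mean = {mean:.2f}, mixture SD = {sd:.2f}")
```

Even when both groups have the same within-group SD, the gap between group means adds between-group variance, so the mixture SD exceeds 15.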

Immigrants can be classified into groups based on their country of origin. Group-level data concerning immigrant crime by country of origin was obtained from a 2005 Dutch-language report and were from 2002. There are data for 57 countries of origin. The crime rates were correlated with country of origin predictor variables: national IQ, prevalence of Islam and general socioeconomic factor (S). For males aged 12-17 and 18-24, the mean correlation with IQ, Islam, and S was, respectively, -.51, .37, and -.42. When subsamples split into 1st and 2nd generations were used, the mean correlation was -.74, .34, and -.40. A general crime factor among young persons was extracted. The correlations with the predictors for this variable were -.80, .34, and -.43. The results were similar when weighing the observations by the population of each immigrant group in the Netherlands. The results were also similar when using crime rates controlled for differences in household income. Some groups increased their crime rates from the 1st to 2nd generation, while for others the reverse happened.

Two datasets with grade point average by country of origin or parents’ country of origin are presented (N=13 and 19). Correlation analyses show that GPA is highly predictable from country-level variables: National IQ (.40 to .64), age heaping 1900 (.32 to .53), Islam prevalence (-.72 to -.75), average years of schooling (.41 to .74) and general socioeconomic factor (S) in both Denmark (.72 to .87) and internationally (.38 to .68). Examination of the gap sizes in GPA between natives and immigrants shows that these are roughly the size one would expect based on the estimated general cognitive ability differences between the groups.

Number of suspects per capita were estimated for immigrants in Germany grouped by citizenship (n=83). These were correlated with national IQs (r=-.53) and Islam prevalence in the home countries (r=.49). Multivariate analyses revealed that the mean age and sex distribution of the groups in Germany were confounds.

The German data lacked age and sex information for the crime data and so it was not possible to adjust for age and sex using subgroup analyses. For this reason, an alternative adjustment method was developed. This method was tested on the detailed Danish data which does have the necessary information to carry out subgroup analyses. The new method was found to give highly congruent results with the subgrouping method.

The German crime data were then adjusted for age and sex using the developed method and the resulting values were analyzed with respect to the predictors. They were moderately to strongly correlated with national IQs (.46) and Islam prevalence in the home country (.35). Combining national IQ, Islam% and distance to Germany resulted in a model with a cross-validated r2 of 20%, equivalent to a correlation of .45. If two strong outliers were removed, this rose to 25%, equivalent to a correlation of .50.
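The r²-to-correlation conversions in the passage above are just square roots:

```python
import math

def r_from_r2(r2: float) -> float:
    """Convert a coefficient of determination (r^2) back to a correlation."""
    return math.sqrt(r2)

print(round(r_from_r2(0.20), 2))  # cross-validated r2 of 20% -> r ~ 0.45
print(round(r_from_r2(0.25), 2))  # outliers removed, r2 of 25% -> r = 0.50
```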

Employment rates for 11 country of origin groups living in the three Scandinavian countries are presented. Analysis of variance showed that differences in employment rates are highly predictable (adjusted multiple R = .93). This predictability was mostly due to origin countries (eta = .89), not sex (eta = .25) and host country (eta = .20). Furthermore, national IQs of the origin countries predicted employment rates well across all host countries (r’s = 0.74 [95%CI: 0.30, 0.92], 0.75 [0.30, 0.92], 0.66 [0.14, 0.89] for Denmark, Norway and Sweden, respectively), and so did Muslim % of the origin countries (r’s =-0.80 [-0.94,-0.43],-0.78 [-0.94,-0.37],-0.58 [-0.87,-0.01]).

The relationships between national IQs, Muslim% in origin countries and estimates of net fiscal contributions to public finances in Denmark (n=32) and Finland (n=11) were examined. The analyses showed that the fiscal estimates were near-perfectly correlated between countries (r = .89 [.56 to .98], n=9), and were well-predicted by national IQs (r’s .89 [.49 to .96] and .69 [.45 to .84]), and Muslim% (r’s -.75 [-.93 to -.27] and -.73 [-.86 to -.51]). Furthermore, general socioeconomic factor scores for Denmark were near-perfectly correlated with the fiscal estimates (r = .86 [.74 to .93]), especially when one outlier (Syria) was excluded (.90 [.80 to .95]). Finally, the monetary returns to higher country of origin IQs were estimated to be 917/470 Euros/person-year for a 1 IQ point increase, and -188/-86 for a 1% increase in Muslim%.

The European Union has seen an increased number of asylum seekers and economic migrants over the past few years. There will be requests to assess some of these individuals to see if they have an intellectual disability (ID). If this is to be done using the current internationally recognized definitions of ID, we will need to be confident that the IQ tests we have available are able to accurately measure the IQs of people from developing countries. The literature showing substantial differences in the mean measured IQs of different countries is considered. It is found that, although there are numerous problems with these studies, the overall conclusion that there are substantial differences in mean measured IQ is sound. However, what is not clear is whether there are large differences in true intellectual ability between different countries, how predictive IQ scores are of an individual from a developing country’s ability to cope, and whether or not an individual’s IQ would increase if they go from a developing country to a developed one. Because of these uncertainties, it is suggested that a diagnosis of ID should not be dependent on an IQ cut-off point when assessing people from developing countries.

This is borderline with regards to inclusion. It does not look at prediction differential immigrant group performance using national IQs, but it does discuss the national IQs at length with regards to immigration.

Not a paper yet, but I have done a meta-analysis on most of these results, which was presented at LCI 2017. It is available on Youtube. For technical output, see

Genetics / behavioral genetics intelligence / IQ / cognitive ability

New paper out: Racial and ethnic group differences in the heritability of intelligence: A systematic review and meta-analysis (Pesta et al 2020)

So, our big Scarr-Rowe meta-analysis dropped recently. I was traveling at the time, so there was a bit of a delay to this posting. I also recorded a long video covering the reasons why this kind of study is important.

Via meta-analysis, we examined whether the heritability of intelligence varies across racial or ethnic groups. Specifically, we tested a hypothesis predicting an interaction whereby those racial and ethnic groups living in relatively disadvantaged environments display lower heritability and higher environmentality. The reasoning behind this prediction is that people (or groups of people) raised in poor environments may not be able to realize their full genetic potentials. Our sample (k = 16) comprised 84,897 Whites, 37,160 Blacks, and 17,678 Hispanics residing in the United States. We found that White, Black, and Hispanic heritabilities were consistently moderate to high, and that these heritabilities did not differ across groups. At least in the United States, Race/Ethnicity × Heritability interactions likely do not exist.

Main table:

There is also the table of matched samples (i.e., subsamples from the same studies where we calculate differences in ACE parameters), but this table is unwieldy and shows basically the same thing.

Genetics / behavioral genetics intelligence / IQ / cognitive ability

Figlio et al 2017 race Scarr-Rowe results

Accurate understanding of environmental moderation of genetic influences is vital to advancing the science of cognitive development as well as for designing interventions. One widely reported idea is increasing genetic influence on cognition for children raised in higher socioeconomic status (SES) families, including recent proposals that the pattern is a particularly US phenomenon. We used matched birth and school records from Florida siblings and twins born in 1994–2002 to provide the largest, most population-diverse consideration of this hypothesis to date. We found no evidence of SES moderation of genetic influence on test scores, suggesting that articulating gene-environment interactions for cognition is more complex and elusive than previously supposed.

Main result:

But they also tested Scarr-Rowe by race. Not reported in the paper, but we requested it. Described this way in our recent meta-analysis (which I will cover later):

Figlio, Freese, Karbownik, and Roth (2017) investigated the Scarr-Rowe hypothesis for SES using administrative data from the Florida Public School System. The sample comprised 24,640 twins born between 1994 and 2002. The authors did not have information on zygosity, so they estimated heritabilities using a method similar to that employed by Scarr-Salapatek (1971). Multilevel mixed-effects linear regressions were used in the analysis, and results were computed separately for grades 3 to 5, and grades 6 to 8, for both the FCAT Math and Reading tests. The authors reported the absence of Scarr-Rowe interactions for SES in all comparisons. Figlio (2019, personal communication) provided “ABCE” estimates decomposed by race/ethnicity. The B in the model here is the variance explained by SES, and B is also a sub-component of C. We condensed these results into ACE estimates (i.e., by adding B and C) and standardized the results. We also added average math and reading ACE estimates per grade (third to fifth and sixth to eighth). These values are shown in Tables S3-S4.
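The condensation described (folding the SES component B into C, then standardizing) can be sketched as follows. The variance components here are made-up numbers for illustration, not Figlio’s actual estimates:

```python
def abce_to_ace(a: float, b: float, c: float, e: float):
    """Fold the SES component (B) into C and standardize so A + C + E = 1."""
    total = a + b + c + e
    return a / total, (b + c) / total, e / total

# Hypothetical unstandardized variance components
A, C, E = abce_to_ace(a=0.50, b=0.10, c=0.20, e=0.20)
print(f"A = {A:.2f}, C = {C:.2f}, E = {E:.2f}")
```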

For Grades 3 to 5, the Black and White heritabilities (averaged for math and reading) were nearly identical: h2White = .57, S.E. = .05; h2Black = .56, S.E. = .09. However, the Hispanic heritability was substantially higher (h2Hispanic = .72, S.E. = .10). For Grades 6 to 8, the Black heritabilities were somewhat reduced compared to the White (h2White = .59, S.E. = .07; h2Black = .48, S.E. = .11), while the Hispanic heritability was again substantially elevated (h2Hispanic = .73, S.E. = .12). Means and standard deviations for this sample were not reported. Nonetheless, Figlio et al. (2017) reported math and reading FCAT averages for the 1994 to 2002 cohort from which the twins were sampled. Scores were not broken down by age. Using these data, the d-values were .96 (White-Black), .40 (White-Hispanic), and .57 (Hispanic-Black).