Clear Language, Clear Mind

September 26, 2018

XYY supermales and violence

Filed under: Criminology, intelligence / IQ / cognitive ability - Emil O. W. Kirkegaard @ 08:08

Nature has a review of Robert Plomin’s new book by some random lefty historian data-free reasoning type called Nathaniel Comfort (he follows the usual pattern of having a background in biology, as the usual suspects). Among the usual claims without any supporting references we find this one:

Crude hereditarianism often re-emerges after major advances in biological knowledge: Darwinism begat eugenics; Mendelism begat worse eugenics. The flowering of medical genetics in the 1950s led to the notorious, now-debunked idea that men with an extra Y chromosome (XYY genotype) were prone to violence. Hereditarian books such as Charles Murray and Richard Herrnstein’s The Bell Curve (1994) and Nicholas Wade’s 2014 A Troublesome Inheritance (see N. Comfort Nature 513, 306–307; 2014) exploited their respective scientific and cultural moments, leveraging the cultural authority of science to advance a discredited, undemocratic agenda. Although Blueprint is cut from different ideological cloth, the consequences could be just as grave.

Our prior for this claim being wrong is strong because we know that large structural abnormalities including whole chromosome aneuploidies are usually associated with a lot of issues, including behavioral ones — many of which would be criminal if we held such people accountable. This is probably in part due to the link to lower intelligence. In any case, we can easily check the claim in the academic literature since people compile datasets about persons with large genetic abnormalities.

Stochholm et al 2012

Objective To investigate the criminal pattern in men between 15 and 70 years of age diagnosed with 47,XXY (Klinefelter’s syndrome (KS)) or 47,XYY compared to the general population.

Design Register-based cohort study comparing the incidence of convictions among men with KS and with 47,XYY with age- and calendar-matched samples of the general population. Crime was classified into eight types (sexual abuse, homicide, burglary, violence, traffic, drug-related, arson and ‘others’).

Setting Denmark 1978–2006.

Participants All men diagnosed with KS (N=934) or 47,XYY (N=161) at risk and their age- and calendar-time-matched controls (N=88 979 and 15 356, respectively).

Results The incidence of convictions was increased in men with KS (omitting traffic offenses) compared to controls with a HR of 1.40 (95% CI 1.23 to 1.59, p<0.001), with significant increases in sexual abuse, burglary, arson and ‘others’, but with a decreased risk of traffic and drug-related offenses. The incidence of convictions was significantly increased among men with 47,XYY compared to controls with a HR of 1.42 (95% CI 1.14 to 1.77, p<0.005) in all crime types, except drug-related crimes and traffic. Adjusting for socioeconomic variables (education, fatherhood, retirement and cohabitation) reduced the total HR for both KS and 47,XYY to levels similar to controls, while some specific crime types (sexual abuse, arson, etc) remained increased.

Conclusion The overall risk of conviction (excluding traffic offenses) was moderately increased in men with 47,XYY or KS; however, it was similar to controls when adjusting for socioeconomic parameters. Convictions for sexual abuse, burglary, arson and ‘others’ were significantly increased. The increased risk of convictions may be partly or fully explained by the poor socioeconomic conditions related to the chromosome aberrations.

This is more or less a perfect study for this purpose, the Everest regression of the authors notwithstanding.

Leggett et al 2010

Aim To review systematically the neurodevelopmental characteristics of individuals with sex chromosome trisomies (SCTs).

Method A bibliographic search identified English‐language articles on SCTs. The focus was on studies unbiased by clinical referral, with power of at least 0.69 to detect an effect size of 1.0.

Results We identified 35 articles on five neonatally identified samples that had adequate power for our review. An additional 11 studies were included where cases had been identified for reasons other than neurodevelopmental concerns. Individuals with an additional X chromosome had mean IQs that were within broadly normal limits but lower than the respective comparison groups, with verbal IQ most affected. Cognitive outcomes were poorest for females with XXX. Males with XYY had normal‐range IQs, but all three SCT groups (XXX, XXY, and XYY) had marked difficulties in speech and language, motor skills, and educational achievement. Nevertheless, most adults with SCTs lived independently. Less evidence was available for brain structure and for attention, social, and psychiatric outcomes. Within each group there was much variation.

Interpretation Individuals with SCTs are at risk of cognitive and behavioural difficulties. However, the evidence base is slender, and further research is needed to ascertain the nature, severity, and causes of these difficulties in unselected samples.

So, we know from the study above that crime rates are in fact higher. Do we also see the usual other types of problem behavior and related cognitive issues that would explain this? Yep, IQ scores are 10-15 points below comparison groups. Note that the authors’ discussion of the IQ gaps shows a lack of understanding of the role of the Flynn effect, since they keep remarking how the IQ means of affected groups are around 100 while matched comparisons are around 110-115 (failure to realize the need for renorming). This error was also seen in the Minnesota Transracial Adoption Study and was pointed out by Loehlin.

Money et al 1969

By now the reader might be wondering: maybe the historian’s view is based on something that was published before 2010? What do the older reviews say? Well, they say about the same thing.

It is possible that boys with the XYY syndrome are ill-equipped to cope with the ordinary stresses of learning and the academic environment; 19 of the 35 patients are recorded as having had problems in school, and only 4 as having been free of problems. In five cases, the problems were specified as behavioral, in four as underachievement, and in ten as both. Behavioral problems included dislike of school, deficient attention span, restlessness, truancy and disruption of the classroom routine. In at least one case, behavior was so bizarre as to resemble both brain-damage symptoms and psychosis. This latter case is instructive because the boy’s behavior improved remarkably by the middle teenage years, under the influence of a planned, benign environment. Simultaneously, there was an improvement of the abnormal, spike-wave EEG (there had been no clinical seizures), the spike being no longer in evidence. Also the IQ rose from a 6 2 year old low of 89, to 100 in teenage.

School difficulties did not especially correlate with IQ. They occurred over an IQ range from 63 to 125. The IQ was given in 18 cases, the median being 91, and the mean 89. The exact nature of the relationship of the extra Y chromosome to IQ will remain uncertain until incidence studies have been completed. So far, there does not appear to be an excess of severe mental retardation, as in the XXY syndrome (though note that some individuals with XXY do have superior IQ’s above 120).

Unstable work histories closely relate to prison histories, so far as the 30 men over the age of 16 in the present series are concerned, for some of them were in and out of jail two or more times. Some were in special institutions for men considered to be poor risks as chronic offenders. One had been in both a jail and a mental hospital.

The present sample includes 24 men with a prison record; 2 others were in hospitals for the criminally insane and 1 was in a regular psychiatric hospital because of sex offenses. These total 27 in detention. There were three children, one of whom at age eight had been in trouble with the law. The other two and one of the teenagers had exhibited grossly deviant behavior. In only one case (an adult) was the necessary information lacking. In the whole sample, therefore, there were three, one teenager and two men whose behavior was relatively normal and law abiding.

The sample is, of course, deliberately biased in favor of law breakers, for investigators were, by design, screening tall men in jails, in the belief that there would be more XYY men among them than elsewhere. Thus, the exact relationship between XYY and imprisonment also cannot be ascertained until proper incidence studies have been completed.

The offenses that kept men in detention varied from robbery to murder (3 cases). In 7 cases crimes against both property and person were specified (in another 7, attacks on property and person were noted, though not in connection with detention). In 11 cases, the subjects were imprisoned for an offense against property only, and in 7 others, for an offense against persons only, 5 of them sex offenses. It should be noted that in the sum total of sex offenses, both homosexual and heterosexual assaults and/or approaches were represented.

The history seems to be that prison workers discovered suspiciously high numbers of XXY and XYY men among high-profile criminals. They not unreasonably then inferred that this had something to do with their predicament (essentially a case-control study). When data later became available for larger samples, the relationship between these disorders and problem behavior was confirmed. Somewhere along the way, some accounts presumably exaggerated the relationships, which caused historians sensitive to social justice to produce counter-narratives, which they apparently still believe in to this day despite the actual data.

February 11, 2017

Swedish immigrant crime data from the 1980s

Filed under: Criminology, Immigration - Emil O. W. Kirkegaard @ 17:52

I was skimming a Wikipedia article related to immigrant crime and came across an obscure Swedish language report from the 1990s:

  • Ahlberg, J. (1996). Invandrares och invandrares barns brottslighet: En statistisk analys [Immigrants’ and immigrants’ children’s crime: a statistical analysis]. Brottsförebyggande rådet (BRÅ).

(file available on OSF)

This report is a goldmine. Briefly:

  • Data from 1985-1989.
  • Country of origin crime rates for 1st gen. Raw and adjusted for age and sex. n=38.
  • Country of origin crime rates for 2nd gen. Raw and adjusted for age and sex.
  • Stereotypes about immigrant crime levels. n=10 ethnicities.
  • Data for different crime types: violent, property, sex, etc.
  • Various other interesting things.

In this case, the old data are particularly useful because they allow us to examine whether recent wars etc. are responsible for poor performance of some groups. E.g. Iraqis usually do not perform well, but people will say it’s because their country got invaded by the US. Twice. And so they suffer from transgenerational epigenetic stress or whatever. And also had prolonged civil war. Similar things apply to Afghanistan and Yugoslavia.

The old data also allow for longitudinal analyses, which are very important to immigration policy. I.e. if we can expect immigrants to reach performance similar to that of natives after 20 years, then taking in poorly performing immigrants is only a temporary burden, not a permanent one.

Of the 38 cases, 34 are countries and 4 are unspecified remainder categories (e.g. “other European countries”). Of the country cases, some are combined, presumably due to small numbers. E.g. Argentina and Uruguay are combined. They are mostly combined sensibly by combining neighboring countries with similar cultures and genetics. E.g. there is a North African case with Algeria, Libya, Morocco and Tunisia. Perhaps the most problematic is the combination of Bangladesh and Pakistan. These are on different sides of India, which is included by itself. Both are mostly Muslim, but Bangladesh is mostly grouped with the other South Asian countries (Burma, Bhutan, Nepal), not with MENAP. In general, a sensible approach to these combined groups is to split them and use both countries as datapoints. This inflates the sample size a bit. One could weight them accordingly to mitigate this problem (i.e. weight the datapoints for Bangladesh and Pakistan by 0.5 each vs. the usual 1). After expanding the combined countries, we get a more respectable n=48. This still includes a few former countries that are problematic: USSR, Yugoslavia, Czechoslovakia.
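The split-and-weight idea can be sketched in a few lines of Python. This is only an illustration: the function name `expand_combined`, the `Row` type and the crime rates are made up for the example and are not taken from the BRÅ report.

```python
from collections import namedtuple

# Hypothetical row type: one datapoint per country after splitting.
Row = namedtuple("Row", ["country", "crime_rate", "weight"])

def expand_combined(groups):
    """Split combined origin groups into one row per country.

    Each country inherits its group's rate and gets weight 1/k, where k is
    the number of countries in the group, so every original case still
    contributes a total weight of 1.
    """
    rows = []
    for countries, rate in groups:
        share = 1.0 / len(countries)
        for country in countries:
            rows.append(Row(country, rate, share))
    return rows

# Made-up rates for illustration only, NOT values from the report.
groups = [
    (("Bangladesh", "Pakistan"), 2.0),
    (("Argentina", "Uruguay"), 1.5),
    (("India",), 1.2),
]

rows = expand_combined(groups)
total_weight = sum(r.weight for r in rows)
weighted_mean = sum(r.crime_rate * r.weight for r in rows) / total_weight

for r in rows:
    print(r.country, r.crime_rate, r.weight)
```

With weights like these, the number of rows grows (here from 3 to 5) while the effective sample size stays at the original number of cases, which is the point of the weighting.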

The stereotypes come from a large survey (n=1,362) and simply concern whether one thinks each of the 10 groups has more, the same or less crime (5 options + don’t know). Similar to the data in this recent paper about the UK. Unfortunately, they don’t all match up with the immigrant groups, e.g. one group is gypsies. Gypsies have no country, so it is hard to get reliable data on them about any trait. However, we can match up 8 or 9 of the groups, depending on how liberal we are. Depending on which, we get accuracy scores of r = .36 or .52. Not too bad. The groups are not random: all were above average, so there is variance reduction, which reduces the observed correlation. But this is the best we can do.

[There is a meta-analysis of Roma IQ which found a mean of 74! Seems to be not published yet, but there’s an abstract from the talk.]

So, what are the basic findings? A scatterplot says a thousand words.



(all values adjusted for age and sex)

These subgroup rates should be taken very lightly because the samples must be very small indeed. Italians are not known to be particularly crime prone.

More to come! We’re buying a large dataset for Sweden with immigrant performance data on 4 metrics: crime, education, income and social benefits. Then we will essentially replicate the Denmark and Norway analyses.

June 17, 2016

Predictions for an individual-level general crime factor study using Swedish register data

There is a lot of research on the link between crime and cognitive ability. (For criminal outcomes, see the problems in my previous post.) E.g.

Is the Association between General Cognitive Ability and Violent Crime Caused by Family-Level Confounders?

We linked longitudinal Swedish total population registers to study the association of general cognitive ability (intelligence) at age 18 (the Conscript Register, 1980–1993) with the incidence proportion of violent criminal convictions (the Crime Register, 1973–2009), among all men born in Sweden 1961–1975 (N = 700,514). Using probit regression, we controlled for measured childhood socioeconomic variables, and further employed sibling comparisons (family pedigree data from the Multi-Generation Register) to adjust for shared familial characteristics.

Cognitive ability in early adulthood was inversely associated to having been convicted of a violent crime (β = −0.19, 95% CI: −0.19; −0.18), the association remained when adjusting for childhood socioeconomic factors (β = −0.18, 95% CI: −0.18; −0.17). The association was somewhat lower within half-brothers raised apart (β = −0.16, 95% CI: −0.18; −0.14), within half-brothers raised together (β = −0.13, 95% CI: −0.15; −0.11), and lower still in full-brother pairs (β = −0.10, 95% CI: −0.11; −0.09). The attenuation among half-brothers raised together and full brothers was too strong to be attributed solely to attenuation from measurement error.


I note that the reduction in the link between siblings is in line with cognitive ability being confounded with other traits that influence crime such as self-control. This idea was also suggested by Peter Frost.

Intelligence and criminal behavior in a total birth cohort: An examination of functional form, dimensions of intelligence, and the nature of offending

The current study contributes to this literature by examining the functional form of the IQ-offending association in a total birth cohort of Finnish males born in 1987. Criminal offending was measured with nine different indicators from official records and intelligence was measured using three subscales (verbal, mathematical, and spatial reasoning) as well as a composite measure. The results show consistent evidence of mostly linear patterns, with some indication of curvilinear associations at the very lowest and the very highest ranges of intellectual ability.


I note that this study did have crime by type and could have analyzed for a general factor, but did not. I also don’t see any numeric values for the strength of the relationships in the article, but one could derive estimates from the reported statistics, I think. E.g. their Table 1 gives the SD of number of crimes committed as 2.67. Assuming the pattern is entirely linear (close approximation), we can take the value of the highest and lowest groups and see how far they are apart in SD on both variables and divide these to get the individual-level correlation. In this case, mean crime by lowest 9-tile is 1.19 and for highest is 0.33 (found in Table A4 at the end). So for crime the d for lowest vs. highest is (1.19-0.33)/2.67=0.32. Their d for cognitive ability is about (9-1)/2.18=3.67. Then we divide and get 0.32/3.67=0.09, the estimated correlation.
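The back-of-the-envelope calculation above can be written out as a small Python sketch. The input numbers are the ones quoted in the text; the function name `correlation_from_extreme_groups` is made up for illustration, and the approach assumes a linear relationship, as stated above.

```python
def correlation_from_extreme_groups(mean_low, mean_high, sd_outcome,
                                    score_low, score_high, sd_predictor):
    """Rough correlation estimate from the two extreme predictor groups.

    Assumes linearity: the gap between the extreme groups, in SD units of
    the outcome, is divided by the same gap in SD units of the predictor.
    Returns the magnitude; the direction here is negative (more ability,
    less crime).
    """
    d_outcome = (mean_low - mean_high) / sd_outcome
    d_predictor = (score_high - score_low) / sd_predictor
    return d_outcome, d_predictor, d_outcome / d_predictor

# Values quoted in the text: mean crimes in the lowest and highest 9-tile,
# SD of number of crimes, and the SD used for the ability scale.
d_crime, d_ability, r_est = correlation_from_extreme_groups(
    mean_low=1.19, mean_high=0.33, sd_outcome=2.67,
    score_low=1, score_high=9, sd_predictor=2.18,
)
print(round(d_crime, 2), round(d_ability, 2), round(r_est, 2))  # 0.32 3.67 0.09
```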

Since this is too small (expected value around 0.20), there is probably some bias in these estimates. For one thing, the number of crimes does not follow a normal distribution, so using means and SDs is misleading. One should use medians and MADs (robust alternatives). SDs are very sensitive to outliers because they are based on squared values (a common feature of all methods based on squared values such as OLS regression [which minimizes the mean squared error]). Because we divide by the crime SD, this makes the estimated correlation too small. As a quick test of this idea, I simulated some heavy-tailed data in R (rlnorm, i.e. log-normal data as a stand-in for a power law) and calculated the SD and MAD: 2.20 and 0.88 (using the default values for simulating data). We see that the MAD is much smaller. We can use this for a quick and dirty re-estimate of the above using the ratio of the MAD to the SD, which is 0.88/2.20=0.40. So: 0.32/(3.67*0.40)=0.22, which is in the right ballpark, but still based on means, not medians. Based on some more playing around, it seems that using means increases the d value by about 70% compared to medians, which would reduce the above estimate of 0.22. However, I cannot calculate by how much because they do not give the medians by group. I don’t know. Maybe trying to use normal distribution statistics to estimate a correlation for obviously non-normal data is not a good idea, even if one does use the robust versions. :p
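For readers who want to repeat the SD versus MAD check, here is a rough Python equivalent of the R simulation. It assumes the R defaults of rlnorm (meanlog 0, sdlog 1) and the default scaling constant of R's mad() (1.4826); the seed and sample size are arbitrary choices, so the exact figures will differ slightly from the 2.20 and 0.88 quoted above.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

# Log-normal data with the same defaults as R's rlnorm (meanlog=0, sdlog=1).
x = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)

sd = x.std(ddof=1)

# Median absolute deviation, scaled by 1.4826 so that it estimates the SD
# for normal data (this mirrors the default constant in R's mad()).
mad = 1.4826 * np.median(np.abs(x - np.median(x)))

ratio = mad / sd

# Quick and dirty re-estimate from the text: shrink the crime SD by the ratio.
r_adjusted = 0.32 / (3.67 * ratio)

print(f"SD={sd:.2f}  MAD={mad:.2f}  ratio={ratio:.2f}  r={r_adjusted:.2f}")
```

In a large sample the SD should come out a little above 2 and the MAD a little below 0.9, so the ratio lands near 0.4 and the adjusted estimate near 0.2, in line with the numbers in the text.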

See also: Dull minds and criminal acts

The study I want done and my predictions

What I want to do is something like this:

  1. Get a very large dataset that includes criminal records/self-reports on different types of crimes and relevant predictors such as cognitive ability, gender, age and mental illness. In writing this post, I was specifically thinking of the dataset that Amir Sariaslan (RG, Twitter) usually works with. Self-report is better because it has fewer 0s. Most crimes are not caught and punished, which means that most cells in the dataset get filled up with 0s. These bias the correlations downwards.
  2. Factor analyze the criminal outcomes at the individual-level using latent correlations. Then score the cases using IRT.
  3. Model the relationship between the predictors and the crime scores.
  4. Use Jensen’s method to assess whether the predictor-outcome relationships can plausibly be attributed to a general factor or other variance.
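Step 4 can be sketched as follows. Jensen's method (the method of correlated vectors) boils down to correlating two vectors across indicators: the factor loadings and the indicators' correlations with a predictor. The loadings and correlations below are invented purely for illustration, and `jensen_coefficient` is a hypothetical helper name, not a function from any package.

```python
import numpy as np

def jensen_coefficient(loadings, predictor_correlations):
    """Correlation between factor loadings and predictor-indicator correlations.

    A positive value means indicators that load more strongly on the general
    factor also relate more strongly to the predictor.
    """
    loadings = np.asarray(loadings, dtype=float)
    predictor_correlations = np.asarray(predictor_correlations, dtype=float)
    if loadings.shape != predictor_correlations.shape:
        raise ValueError("both vectors need one value per indicator")
    return float(np.corrcoef(loadings, predictor_correlations)[0, 1])

# Invented example: five crime types, their loadings on a general crime
# factor, and the magnitude of their correlation with cognitive ability.
gcf_loadings = [0.75, 0.65, 0.60, 0.45, 0.30]
ability_correlations = [0.22, 0.18, 0.17, 0.10, 0.06]

coef = jensen_coefficient(gcf_loadings, ability_correlations)
print(round(coef, 2))
```

With only a handful of crime types the coefficient is very imprecise, so in a real study one would want as many indicators as the data allow.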

I believe in making predictions before seeing the data, preferably numeric predictions. I wish to make the following public predictions in case someone does such a study.

  1. There will be a general crime factor (GCF?) at the individual-level just as there was at the aggregate level in my prior S factor study of London boroughs (~districts).
  2. The Jensen coefficients with cognitive ability, gender and heritability will be positive, i.e. the crimes that load stronger on GCF will have stronger relationships to the predictors. I don’t know about the age predictor. For mental illness I think it will depend on the type of mental illness. E.g. ASPD will show the effect, but perhaps not major (unipolar) depression.
  3. Violent crimes will have stronger loadings than non-violent crime.
  4. The distribution of GCF scores will follow a power law distribution.
  5. There will be clear population differences in the parameters of the power law fit by origin/ethnic groups in such a way as to influence the central tendency. Groups of interest include native Swedes (and/or other Scandinavians/Nordics), Muslim immigrants, EU-immigrants; country of origin groups too if one has enough data to look at these.
  6. GCF will be positively associated with the extremism of religious beliefs among Muslims (cultural conflict theory). For more on the general religiousness factor, see this prior post.

June 16, 2016

Measurement error and behavioral genetics in criminology

I am watching Brian Boutwell’s (Twitter, RG) talk at a recent conference and this got me thinking.

What are we measuring?

As far as I know, there are typically two outcome variables used in criminological studies:

  1. Official records of convictions.
  2. Self-reported criminal or anti-social behavior.

But exactly what trait are we trying to measure? It seems to me that we (or at least I!) are really interested in measuring something like the tendency to break laws in ways that are harmful to other people. Harmful is here used in a broad sense. Stealing something may not always cause someone harm, but it does deprive them (usually) unfairly of their property. Stealing is not always wrong, but it is usually wrong. Let’s call the construct we want to measure harmful criminal behavior.

Measurement error: two types

Before going on, it is necessary to distinguish between the two types of measurement error in classical test theory:

  1. Random measurement error.
  2. Systematic measurement error.

Random measurement error is by definition error in measurement that is not correlated with anything else at all (sampling error aside). Conceptually we can think of it as adding random noise to our measurements. A simple, every-day example of this would be a study where we examine the relationship between height and GPA for ground/elementary school students. Suppose we obtain access to a school and we measure the height of all the students using a measurement tape. Then we obtain their GPAs from the school administration. Random measurement error here would be if we used dice to pick random numbers and added/subtracted these to each student’s height.
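The dice example can be simulated to see what random error does to a correlation. This is a sketch with made-up numbers (the mean and SD of height, the size of the true height-GPA relationship and the two-dice noise are all assumptions for illustration); the comparison at the end uses the standard classical test theory result that the observed correlation equals the true correlation times the square root of the reliability.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n = 100_000

# Hypothetical true heights in cm and a GPA-like score weakly related to them.
true_height = rng.normal(140.0, 7.0, n)
gpa = 0.2 * (true_height - 140.0) / 7.0 + rng.normal(0.0, 1.0, n)

# Random measurement error: roll two dice and subtract 7, giving whole
# numbers from -5 to +5 with mean 0, unrelated to anything else.
dice_error = rng.integers(1, 7, size=(n, 2)).sum(axis=1) - 7
measured_height = true_height + dice_error

r_true = np.corrcoef(true_height, gpa)[0, 1]
r_noisy = np.corrcoef(measured_height, gpa)[0, 1]

# Reliability: share of variance in the measurements that is true variance.
reliability = true_height.var() / measured_height.var()

print(f"r with true height:     {r_true:.3f}")
print(f"r with measured height: {r_noisy:.3f}")
print(f"expected from theory:   {r_true * np.sqrt(reliability):.3f}")
```

The noisy measurement gives a weaker correlation even though the underlying relationship is unchanged, which is why random error biases estimates towards zero.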

Systematic measurement error (also called bias) is different. Suppose we are measuring the ability of persons to sneak past a guard post because we want to recruit a team of James Bond-type super spies. We conduct the experiment by having people try to sneak past a guard post. Because we have a lot of people to test, our experiment is carried out all day, beginning in the early morning and ending in the evening. Each individual has to try three times to sneak past the guard post and we measure their ability as the number of times they sneaked past (so 0-3 are possible scores). We assign their trials in order of their birthdays: people born early in the year take their trials in the early morning. Because it is easier to see when the sun is higher in the sky, the individuals who happen to be born later or very early in the year have an advantage: it is more difficult for the guards to spot them when it is darker. Someone who successfully sneaked past the guards three times in the evening is not necessarily at the same skill level as someone who sneaked past three times around noon. There is a systematic error in the measurement of sneaking ability related to the time of testing, and it is furthermore related to the persons’ birthdays.

Problems with official records

Using official records as a measure of harmful criminal behavior has a big problem: they often include convictions for things that aren’t wrong (e.g. drug use or sex work). Ideally, we don’t care about convictions for things like smoking cannabis because in a sense, this isn’t a real crime: it’s just the government that is evil. In the same way, homosexual sex or even oral sex is not a crime anymore, and was not a real crime back when it was illegal (overview of US ‘sodomy’ laws). There is a moral dimension as to what one is trying to measure if one does not just want to go with the construct of ‘any criminal behavior that the present day state in this country happens to have criminalized’.

Furthermore, official records are based on court decisions (and pleas). Court decisions are in turn the result of the police taking up a case. If the police are biased — rightly or wrongly — in their decision about which cases to pursue, this will give rise to systematic measurement error.

Since the police do not have infinite resources, they will not pursue every case they know of. They probably won’t even pursue every case they know of that they think they can win in court. There is thus an inherent randomness in which cases they will pursue, i.e. random measurement error.

Worse, which cases the police pursue may depend on irrelevant things like whether the police leadership has set a goal for the number of cases of a given type that must be pursued every year. This practice seems to be fairly common, and yet it results in serious distortions in the use of police resources. In Denmark, the police often have these goals about biking violations (say, biking on the sidewalk). The result is that in December (if the goal is set on a year-to-year basis), if they are not close to meeting their goal, the police leadership will divert resources away from more important crimes, say, break-ins, to hand out fines to people breaking biking laws. They may also lower the bar as to what counts as a violation.

Even worse, they may focus on targeting violations that are not wrong, simply because they are easy to pursue. One police officer gave the following story (anonymously in order to prevent reprisals from the leaders!) in response to a parliament discussion of the topic:

“When we are told that we must write 120 bikers [hand out fines to] the next 14 days, then we don’t place ourselves in the pedestrian area while there are pedestrians, and when the bikers may cause problems. No, we take them in the morning when they bike thru the empty pedestrian area on their way to work, because then we get more quickly to the 120 number. In other words, we do it for the numbers’ sake and not for the sake of traffic safety.”

This kind of police behavior induces both random and systematic measurement error in the official records. For instance, people who happen to bike to work and whose work is on the opposite side of a pedestrian area are more likely to receive such fines.

Measurement error, self-rating and the heritability of personality traits

While personality is probably not really that simple to summarize, most research on personality uses some variant of the big five/OCEAN model (use this test). Using such measures, it has generally been found that the heritability of OCEAN traits is around 40%. Lots of room for environmental effects, surely. Unfortunately, most of the non-heritable variance is in the everything else-category.

But, these results are based on self-rated personality and not even corrected for random measurement error which is usually easy enough to do. So, suppose we correct for random measurement error, then perhaps we get to 50% heritability. This is because (almost?) any kind of measurement error biases heritability downwards.
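The correction mentioned above can be sketched numerically. Under the classical assumption that random error simply adds variance that a twin model counts as non-shared environment, the corrected heritability is the observed heritability divided by the reliability of the measure. The reliability of 0.80 below is an assumption chosen for illustration (it happens to turn 40% into the 50% mentioned in the text), and `correct_heritability` is a hypothetical helper name.

```python
def correct_heritability(h2_observed, reliability):
    """Correct a heritability estimate for random measurement error.

    Assumes random error only inflates the total variance (classical test
    theory), so the true-score heritability is h2_observed / reliability,
    capped at 1.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return min(h2_observed / reliability, 1.0)

# Observed heritability of about 40% and an ASSUMED reliability of 0.80.
h2_corrected = correct_heritability(0.40, 0.80)
print(h2_corrected)  # 0.5
```

This handles only random error; systematic rating biases need a different design, such as combining self- and other-ratings.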

What about self-rating bias? Surely there are some personality traits that cause people to systematically rate themselves differently from how other people rate them, i.e. systematic measurement error. Even for height, a very simple trait, using self-reported height deflated heritability by about 4 percentage points compared with clinical measurement (from 91 to 87%), and clinical measurement is not free of random measurement error either. Furthermore, human height varies somewhat within a given day, a kind of systematic measurement error.

So, are other-ratings of personality better? There is a large meta-analysis showing that other-ratings are better. They have stronger correlations with actual criteria outcomes than self-ratings:

[Figures: other-ratings by strangers; other-ratings and academic outcomes; other-ratings and work performance]

This suggests considerable systematic measurement error in the self-ratings. The counter-hypothesis: others’ ratings of one’s personality, while not actually more accurate than self-ratings, causally influence the chosen outcomes, such that it appears that other-ratings are better. E.g. teachers/supervisors give higher grades/performance ratings to those they incorrectly judge to be more open minded due to some kind of halo effect. I don’t know of any research on this question.

Still, what do we find if we analyze the heritability of personality using other-ratings and especially the combination of self- and other-ratings? We get this:


A mean heritability of 81% for the OCEAN traits. Like the height study, there was evidence of heritable influence on systematic self-rating error (53% in this study, the height study found 36% but had limited precision).

Conclusion: measurement error and criminology

Back to criminology. We have seen that:
  1. Official records have serious problems with measuring the right construct (harmful criminal behavior), probably suffer from lots of random measurement error and probably some systematic measurement error.
  2. Self-ratings suffer from systematic measurement error.
  3. Measurement error biases estimates of heritability downwards.
We combine them and derive the conclusion: heritabilities of harmful criminal behavior are probably seriously underestimated.
Questions for future research:
  • Locate or do behavioral genetic studies of crime based on multiple methods and other-ratings. What do they show?
  • Find evidence to determine whether the higher validity of other ratings is due to their higher precision or due to causal halo effects.
