## The Unz compositional data fallacy

Readers may recall that about 12 years ago, Ron Unz (of Unz.com) posted an article titled The Myth of Hispanic Crime. In this article he tried to show that Hispanics do not actually have an elevated crime rate. His method was:

These individual city comparisons may be quantitatively extended to urban crime rates in general by calculating the weighted-average correlation coefficient between the Hispanic percentage of a city and its various crime rates and performing the same calculation for the white-plus-Asian percentage as well. (Asians are a very small population in most cities, so it is convenient to combine them with whites; since all studies show Asians tend to have much lower crime rates than whites, this will tend to reduce the apparent white crime rate.) Whereas all the previous urban crime figures quoted were from 2008, the latest year available, we can obtain the separate correlations for the last several years in order to consider trends over time.

As Charts 8-13 indicate, the Hispanic and white-plus-Asian crime correlation rates are usually quite close and in many cases have converged to almost identical values, at least since 2005. Moreover, we must remember that all these ethnic percentage rates refer to the total population rather than the percentage of young males in the high-crime years for each group, and as mentioned earlier, the age distributions for Hispanics and whites are very different. In fact, if we repeat these same correlation calculations for the population of males aged 18-29, the Hispanic and white rates substantially diverge, with young Hispanics usually being associated with significantly lower urban crime rates.

This gets him to plots like this:

Later, in 2013, he posted another piece titled Race and Crime in America, also with results like these:

We first examine the correlations among the population proportions:

OK, they are mostly negative. This is of course because when there’s a larger proportion of one group, there will have to be a smaller proportion of another group. Due to the non-random distribution of groups across the units of analysis, you can still end up with areas with high proportions of both groups A and B, and so groups A and B can have a positive correlation despite the fixed-sum nature of the data. We see this in an extreme case for Alaskan Natives and Other Amerindians (these are different tribes of American Natives, and they generally live in the same areas). For the larger groups, though, we see the negative correlations: Hispanic% is correlated with White% at -.70, with Black% at -.37, and with multiethnic at -.33.
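To make the fixed-sum point concrete, here is a minimal simulation sketch. It is not the real data: the number of areas, the four unnamed groups, and the Dirichlet concentration parameters are all assumptions I made up for illustration. The shares in each area sum to 1, and that constraint alone is enough to make every pair of groups negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 2000 areas and 4 groups. The Dirichlet parameters
# are invented for illustration, not estimated from any real data.
n_areas = 2000
alpha = np.array([5.0, 2.0, 2.0, 1.0])

# Each row is one area; its group shares sum to exactly 1 (compositional data).
shares = rng.dirichlet(alpha, size=n_areas)

# Correlation matrix between the group shares (columns are variables).
corr = np.corrcoef(shares, rowvar=False)

# Pull out the off-diagonal entries, i.e. the between-group correlations.
off_diag = corr[~np.eye(len(alpha), dtype=bool)]

print(np.round(corr, 2))
print("all between-group correlations negative:", bool((off_diag < 0).all()))
```

A Dirichlet draw has no geographic clustering built in, so it cannot reproduce the positive correlation seen for groups that live in the same areas. It only shows the baseline negative pull that the sum-to-one constraint creates.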

Finally, let’s look at the crime rate correlations with these racial proportions:

The second column replicates the results Unz reported: Hispanic% is weakly negatively related to crime rates across types. So are Hispanics less criminal than average? Well, maybe, but we also see something more obvious in the data: Black% is the only group proportion with large positive correlations. It seems that Blacks’ role in crime is so large that it deflates the correlations for the other group proportions, even though those groups may also have above-average crime rates. There is of course an easy way to tackle this problem: multiple regression. Here’s what the models look like:

The outcome variable is the murder/homicide rate, which I used because the total violent crime rate was missing for a few cities. The first model already shows a problem: a … PUMA with 100% Hispanics is predicted to have a crime rate of -1.30 (the intercept plus the slope). Linear regression does not understand that the outcome cannot be negative. Anyway, in the second model, we add the most important variable, Black%. Now the model is a lot better: R² went from 9% to 55%. Hispanic% still has a negative coefficient, but it is close to 0 and far from significant (p = .599). If we add the remaining variables, this changes essentially nothing, and the model doesn’t even improve, probably because we have too few cases for this kind of modeling (n=100). We cannot add White% here because we are using it as the baseline group for comparison: the proportions sum to 100%, so one of them has to be left out. Anyway, we see that there is now no evidence of Hispanic% being negatively associated with crime rate. That conclusion was a mistake caused by the confounding factor of Black%. The strong dominance of Black% is also seen in Unz’s figures, so it is rather strange he didn’t think to try a regression model. Maybe because it cannot be done in Excel.
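The confounding mechanism is easy to reproduce in a toy simulation. The sketch below does not use the actual city data, and every number in it is an assumption of mine: three groups with Dirichlet-distributed shares, and a made-up outcome where both Black% and Hispanic% have positive true effects relative to the White baseline, the former much larger. The `ols` helper is just a thin wrapper I wrote around `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cities; shares of three groups sum to 1 in each city.
n = 5000
shares = rng.dirichlet([6.0, 2.0, 2.0], size=n)
white, black, hispanic = shares.T

# Assumed data-generating process (invented coefficients): both minority
# shares raise the rate relative to the White baseline, Black% far more.
rate = 2.0 + 30.0 * black + 3.0 * hispanic + rng.normal(0.0, 1.0, size=n)

def ols(y, *predictors):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Zero-order view: correlation and simple regression on Hispanic% alone.
r_hispanic = np.corrcoef(hispanic, rate)[0, 1]
b_simple = ols(rate, hispanic)

# Multiple regression with Black% included, White% as the omitted baseline.
b_multi = ols(rate, black, hispanic)

# Including all three shares plus an intercept is perfectly collinear:
# the design matrix has 4 columns but only rank 3.
full = np.column_stack([np.ones(n), white, black, hispanic])
rank_full = np.linalg.matrix_rank(full)

print("corr(Hispanic%, rate):", round(r_hispanic, 3))
print("Hispanic% slope, simple model:", round(b_simple[1], 2))
print("Hispanic% slope, with Black% added:", round(b_multi[2], 2))
print("rank of design with all shares:", rank_full, "of", full.shape[1], "columns")
```

In this setup the zero-order correlation and the simple-regression slope for Hispanic% come out negative even though the true effect is positive, purely because more Hispanic% means less Black% on average. Adding Black% to the model recovers the positive sign. The rank check shows why one group has to be dropped as the baseline.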

Of course, there is no particular reason to examine the question at the aggregate level. If we want to know whether Hispanics are more criminal than Whites, we can just look up statistics based on persons. While the FBI etc. did not previously distinguish between Hispanics and Whites properly, they do so nowadays. The go-to report for race and crime from a realist perspective is The Color of Crime report by American Renaissance. There’s a 2016 version here. We can look at e.g. the incarceration rate for violent crime:

These data are not age adjusted, and Unz points out that age is a confounder since Hispanics are younger and younger people are more criminal. True, so we can look at some model results, say, based on longitudinal studies like the NLSYs. NLSY1979 is too old; there weren’t a lot of Hispanics back then. The follow-up NLSY97 has about 1,900. The subjects in the survey were all born around the same time, so there are negligible age and sex differences between groups in these data. They asked the subjects how many times they had been arrested. The distributions look like this:

The White-Asian-other group is in purple, Hispanics in green. Hispanics are getting arrested somewhat more often, even going by self-report. Since we already know that self-reports of criminal behavior aren’t too accurate, these are probably underestimates of the true differences.

Ron Unz is needlessly bombastic. The elevated Hispanic crime rate is well documented. His key mistake was using correlations with compositional data instead of the more obvious and standard multiple regression. Various government and independent data sources show clearly that Hispanics are elevated in crime. There is no myth here. There is, however, a true mystery with Hispanics, namely that they live longer than Whites. This is puzzling because their generally worse social status (income, education, occupation etc.), intelligence, and obesity rates create an expectation of a shorter lifespan. Wikipedia has a decent page on this one, for now.