Just a quick analysis. While reading the Dutch crime report that forms the basis of this paper, I noticed a table giving crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r(immigrant%, S) to be negative, where S is the general socioeconomic factor, and since r(S, crime) is also negative, one would predict a positive r(immigrant%, crime). Is this the case? Well, mostly. The data are divided by 2 generations and 2 age groups, so there are 4 sub-datasets, with lots of missing data and sampling error. If we treat all the cases as if they were independent, drop those with missing data, and standardize each group's rates by dividing them by that group's lowest rate (so 1 marks the area type where the group's crime rate is lowest), we get this result:
| Immigrants in area | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| 0-5% | 1.137 | 0.182 | 1.026 | 1.113 | 0.039 | 1 | 1.588 | 0.588 | 1.073 | -0.148 |
| 5-15% | 1.284 | 0.292 | 1.162 | 1.258 | 0.24 | 1 | 1.938 | 0.938 | 0.809 | -0.641 |
| 15-50% | 1.509 | 0.65 | 1.382 | 1.381 | 0.465 | 1 | 3.812 | 2.812 | 2.203 | 4.758 |
| >50% | 1.769 | 1.154 | 1.435 | 1.526 | 0.471 | 1 | 5.812 | 4.812 | 2.36 | 4.937 |
In other words, across the groups (N = 28), those living in areas with more immigrants are more crime-prone: the mean standardized rate rises from 1.14 in the 0-5% areas to 1.77 in the >50% areas. There is, however, substantial variation. Sometimes the pattern is reversed for no discernible reason. E.g. 12-17-year-olds from Morocco have lower crime rates in the more immigrant-heavy areas (7.4, 7.1, 6.5, 6.1, from the lowest to the highest immigrant share).
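The reversal for the Moroccan 12-17-year-olds can be checked with the same standardization used for the table. A minimal sketch in R, assuming the four quoted rates run from the lowest to the highest immigrant-share area type; using a Spearman correlation with the area rank as a direction measure is my addition for illustration, not part of the original analysis:

```r
# Crime rates for 12-17 year olds from Morocco, as quoted in the text.
# Assumption: ordered from the lowest (0-5%) to the highest (>50%) immigrant share.
morocco_12_17 = c("0-5%" = 7.4, "5-15%" = 7.1, "15-50%" = 6.5, ">50%" = 6.1)

# Standardize as in the main analysis: divide by the group's lowest rate,
# so the area type with the lowest rate gets the value 1.
std = morocco_12_17 / min(morocco_12_17)
print(round(std, 3))

# Direction of the trend: Spearman correlation between the area rank
# (1 = fewest immigrants) and the rate. Positive = the expected pattern,
# negative = a reversal.
trend = cor(seq_along(morocco_12_17), morocco_12_17, method = "spearman")
print(trend)
```

For this row the correlation is -1, since the rate falls at every step up in immigrant share.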
The samples are too small to profitably dig further into this, I think.
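The prediction in the opening paragraph, that two negative correlations imply a positive one, holds exactly only under a simplifying assumption: that immigrant share (I) relates to crime (C) entirely through S, i.e. the partial correlation of I and C controlling for S is zero. A sketch of the algebra under that assumption:

$$r_{IC \cdot S} = \frac{r_{IC} - r_{IS}\, r_{SC}}{\sqrt{(1 - r_{IS}^2)(1 - r_{SC}^2)}} = 0 \quad\Longrightarrow\quad r_{IC} = r_{IS}\, r_{SC}$$

With purely illustrative (not estimated) values $r_{IS} = -0.5$ and $r_{SC} = -0.4$, this gives $r_{IC} = (-0.5)(-0.4) = 0.20$. If immigrant share also relates to crime through routes other than S, the product is only an approximation of $r_{IC}$.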
R code & data
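```r
library(pacman)
p_load(magrittr, readODS, kirkegaard, psych) #write_clipboard() is from kirkegaard

#load data from file: first sheet, header row read as an ordinary row (handled below)
d_orig = read_ods("Z:/code/R/dutch_crime_area.ods", col_names = FALSE) %>% as.data.frame
#blank cells and zeros count as missing (skip existing NAs: NA subscripts fail in data frame assignment)
d_orig[!is.na(d_orig) & (d_orig == "" | d_orig == "0")] = NA

#headers
colnames(d_orig) = as.character(unlist(d_orig[1, ]))
d_orig = d_orig[-1, ]

#remove cases with missing
d = na.omit(d_orig)

#remove names
origins = d$Origin
d$Origin = NULL

#remove unknown + total
d$Unknown = NULL
d$Total = NULL

#to numeric
d = lapply(d, as.numeric) %>% as.data.frame

#convert to standardized rates: divide each row (group) by its lowest rate, so min = 1
d_std = d / apply(d, 1, min)

describe(d_std) %>% write_clipboard
```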
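The script above needs the original ODS file. To see the standardize-then-summarize steps on their own, here is a self-contained sketch on a small made-up data frame (all numbers are hypothetical, not from the report), with base R summaries standing in for `psych::describe()`:

```r
# Hypothetical rates, NOT the report's data.
# Rows = groups, columns = immigrant-share area types (lowest to highest).
rates = data.frame(
  "0-5%"   = c(2.0, 4.0, 5.0),
  "5-15%"  = c(2.5, 4.4, 4.0),
  "15-50%" = c(3.0, 5.0, 4.5),
  ">50%"   = c(4.0, 6.0, 5.0),
  check.names = FALSE
)

# Divide each row by its own minimum. A vector of length nrow is recycled
# down every column, so row i is divided by the i-th row minimum.
rates_std = rates / apply(rates, 1, min)

# Per-column summaries, mirroring a few columns of psych::describe().
summary_tab = data.frame(
  mean   = sapply(rates_std, mean),
  median = sapply(rates_std, median),
  min    = sapply(rates_std, min),
  max    = sapply(rates_std, max)
)
print(round(summary_tab, 3))
```

As in the real table, every column has a minimum of 1, because each area type is the lowest-rate one for at least one group; the third made-up row shows how a non-monotonic group still feeds into a rising column mean.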