## Crime by immigrant group by proportion of immigrants in the neighborhood in the Netherlands

Just a quick analysis. When I read the Dutch crime report that forms the basis of this paper, I noticed one table that had crime rates by the proportion of immigrants in the neighborhood. Generally, one would expect r (immigrant% x S) to be negative, and since r (S x crime) is negative, one would predict a positive r (immigrant% x crime). Is this the case? Well, mostly. The data are divided into 2 generations and 2 age groups, so there are 4 sub-datasets with lots of missing data and sampling error. If we just treat all the cases as if they were independent, drop the cases with missing data, and divide each case's crime rates by its lowest rate (so that 1 marks the neighborhood type with the lowest rate for that case), we get this result:

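| Immi% | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|
| X0.5. | 1.137 | 0.182 | 1.026 | 1.113 | 0.039 | 1 | 1.588 | 0.588 | 1.073 | -0.148 |
| X5.15. | 1.284 | 0.292 | 1.162 | 1.258 | 0.24 | 1 | 1.938 | 0.938 | 0.809 | -0.641 |
| X15.50. | 1.509 | 0.65 | 1.382 | 1.381 | 0.465 | 1 | 3.812 | 2.812 | 2.203 | 4.758 |
| X.50. | 1.769 | 1.154 | 1.435 | 1.526 | 0.471 | 1 | 5.812 | 4.812 | 2.36 | 4.937 |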

In other words, within each group (N=28), the ones living in the areas with more immigrants are on average more crime-prone: the mean standardized rate rises from 1.137 in the least immigrant-heavy areas to 1.769 in the most immigrant-heavy ones. There is, however, substantial variation. Sometimes the pattern is reversed for no discernible reason. E.g. 12-17 year olds from Morocco have lower crime rates in the more immigrant-heavy areas (7.4, 7.1, 6.5, 6.1).
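To make the standardization concrete, here is a minimal sketch that applies it to the Moroccan 12-17 year olds quoted above. It assumes the four rates are listed from the least to the most immigrant-heavy neighborhood type; the vector names are just the column labels from the summary table.

```r
# Rates quoted in the text for 12-17 year olds from Morocco.
# Assumption: ordered from least to most immigrant-heavy areas.
morocco = c(X0.5. = 7.4, X5.15. = 7.1, X15.50. = 6.5, X.50. = 6.1)

# Same standardization as in the analysis: divide by the lowest rate
morocco_std = morocco / min(morocco)
round(morocco_std, 3)

# TRUE if the rate falls at every step towards more immigrant-heavy areas
all(diff(morocco) < 0)
```

For this group the lowest rate, and hence the value 1, falls in the most immigrant-heavy category, which is the opposite of the average pattern.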

The samples are too small for one to profitably dig further into it, I think.
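The sign prediction from the start of this section can be illustrated with a toy simulation. This is only a sketch with made-up numbers, not the report's data: it assumes immigrant% and crime are each negatively related to S and otherwise unrelated, in which case their correlation comes out positive. With weak correlations or other pathways, the sign is not guaranteed.

```r
# Toy simulation; every number here is invented for illustration.
set.seed(1)
n = 10000
S = rnorm(n)                         # hypothetical neighborhood S score
immigrant_pct = -0.5 * S + rnorm(n)  # assumed negative relation to S
crime = -0.5 * S + rnorm(n)          # assumed negative relation to S

r_imm_S = cor(immigrant_pct, S)
r_S_crime = cor(S, crime)
r_imm_crime = cor(immigrant_pct, crime)

# Two negative correlations with S, one positive correlation between the rest
round(c(r_imm_S = r_imm_S, r_S_crime = r_S_crime, r_imm_crime = r_imm_crime), 2)
```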

R code & data

dutch_crime_area

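```r
library(pacman)
# Assumption: these packages supply adply(), describe() and %>% used below.
p_load(plyr, psych, magrittr)
# write_clipboard() is assumed to come from the author's kirkegaard package.
# d_orig is assumed to be already loaded from the dutch_crime_area data,
# read without a header so that the first row holds the column names.

#recode empty cells and zeros as missing
d_orig[d_orig=="" | d_orig=="0"] = NA

#use the first row as column names, then drop it
colnames(d_orig) = d_orig[1, ]
d_orig = d_orig[-1, ]

#remove cases with missing
d = na.omit(d_orig)

#remove names
origins = d$Origin
d$Origin = NULL

#remove unknown + total
d$Unknown = NULL
d$Total = NULL

#to numeric (via character, so factor columns are not turned into level codes)
d = lapply(d, function(x) as.numeric(as.character(x))) %>% as.data.frame

#convert to standardized rates: divide each row by its lowest rate
d_std = adply(d, 1, function(x) {
  x / min(x)
})

#summary statistics per neighborhood category, copied to the clipboard
describe(d_std) %>% write_clipboard
```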