The same guy proposed another idea. Wikipedia has data here. However, since i had previously seen people fudge data in Wikipedia articles (e.g. this one), it seemed unwise to rely on Wikipedia alone. So i did the best thing: fetched the data from both Wiki and the primary source (WHO), and then compared them for accuracy. They were 100% identical for the “total rates”. I did not compare the other variables. But at least this dataset was not fudged. :)
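A check like this can be sketched in a few lines of R. This is only an illustration: the two small data frames, the country codes and the column names below are made up and stand in for the actual Wiki and WHO tables.

```r
# Hypothetical toy data standing in for the Wikipedia and WHO tables
wiki = data.frame(country = c("A", "B", "C"), total = c(10.1, 5.3, 7.8))
who  = data.frame(country = c("C", "A", "B"), total = c(7.8, 10.1, 5.3))

# Match rows by country so that row order does not matter
both = merge(wiki, who, by = "country", suffixes = c(".wiki", ".who"))

# Share of countries where the two sources report the same value
agreement = mean(both$total.wiki == both$total.who)

# Rows where the sources disagree, if any
mismatches = both[both$total.wiki != both$total.who, ]
```

An `agreement` of 1 together with an empty `mismatches` table means the two sources match on that variable. If the values had been rounded differently in the two sources, comparing within a small tolerance would be safer than exact equality.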
So, then i loaded the data in R and plotted alcohol consumption per capita (age >=15) vs. cancer rates per capita.
source("merger.R") #load custom functions
DF.mega = read.mega("Megadataset_v1.7b.csv") #load megadataset
#load alcohol
alcohol = read.mega("alcohol_consumption.csv") #loads the data
short.names = as.abbrev(rownames(alcohol)) #gets the abbreviated names so it can be merged with megadataset
rownames(alcohol) = short.names #inserts the abbreviated names
DF.mega2 = merge.datasets(alcohol,DF.mega) #merge datasets
scatterplot(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO, DF.mega2, #plot it
            smoother=FALSE, #no moving average
            labels = rownames(DF.mega2),id.n=nrow(DF.mega2)) #include datapoint names, taken from the merged data that is plotted
There is no relationship there. However, it may work in multiple regression:
lm1 = lm(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO+X2012LifeExpectancyatBirth,
DF.mega2)
summary(lm1)

Call:
lm(formula = CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO + X2012LifeExpectancyatBirth, data = DF.mega2)

Residuals:
    Min      1Q  Median      3Q     Max
-48.677 -26.569   0.717  28.486  61.631

Coefficients:
                               Estimate Std. Error t value Pr(>|t|)
(Intercept)                    -91.6149    79.2375  -1.156    0.254
AlcoholConsumptionPerCapitaWHO   1.7712     1.6978   1.043    0.303
X2012LifeExpectancyatBirth       4.2518     0.9571   4.442 6.13e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 31.56 on 43 degrees of freedom
  (227 observations deleted due to missingness)
Multiple R-squared:  0.3158,  Adjusted R-squared:  0.284
F-statistic: 9.923 on 2 and 43 DF,  p-value: 0.0002861
Alcohol consumption seemingly has no predictive power (p = 0.303), even with life expectancy controlled for! But it does cause cancer, right? According to my skim of Wiki, yes, but it accounts for only 3.5% of cancer cases, so the effect is too small to be seen here.
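To get a rough feel for what "too small to be seen" means with this sample, here is a back-of-the-envelope sketch. The regression above used 46 countries (43 residual degrees of freedom plus 3 estimated coefficients). The significance level and target power are conventional values assumed here, and the Fisher z formula is an approximation for a simple correlation, not an exact calculation for the regression.

```r
n = 46          # 43 residual df + 3 estimated coefficients in the model above
alpha = 0.05    # two-sided significance level (assumed)
power = 0.80    # conventional target power (assumed)

# Fisher z approximation: smallest |r| that this n would usually detect
z.needed = (qnorm(1 - alpha / 2) + qnorm(power)) / sqrt(n - 3)
r.min = tanh(z.needed)
round(r.min, 2)
```

Under these assumptions the smallest detectable correlation comes out at roughly 0.4, so a factor behind only a few percent of cases could easily go unnoticed in a sample of this size.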
The data is in megadataset 1.7c.