{"id":4423,"date":"2014-10-14T18:50:48","date_gmt":"2014-10-14T17:50:48","guid":{"rendered":"http:\/\/emilkirkegaard.dk\/en\/?p=4423"},"modified":"2014-10-14T18:52:42","modified_gmt":"2014-10-14T17:52:42","slug":"cancer-rates-part-2-does-alcohol-consumption-have-incremental-predictive-power","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2014\/10\/cancer-rates-part-2-does-alcohol-consumption-have-incremental-predictive-power\/","title":{"rendered":"Cancer rates: Part 2, does alcohol consumption have incremental predictive power?"},"content":{"rendered":"<p>The same guy proposed another idea. <a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_countries_by_alcohol_consumption\">Wikipedia has data here<\/a>. However, since I had previously seen people fudge data in Wikipedia articles (e.g. <a href=\"http:\/\/theunsilencedscience.blogspot.dk\/2014\/08\/the-alondra-oubre-academic-fraud-exposed.html\">this one<\/a>), it seemed unwise to rely on Wikipedia alone. So I did the safe thing: fetched the data from both Wikipedia and the primary source (WHO), and then compared them for accuracy. They were 100% identical for the &#8220;total rates&#8221;. I did not compare the other variables. But at least this dataset was not fudged. :)<\/p>\n<p>I then loaded the data into R and plotted alcohol consumption per capita (age &gt;=15) vs. 
cancer rates per capita.<\/p>\n<blockquote><p>library(car) #scatterplot() is from the car package<br \/>\nsource(\"merger.R\") #load custom functions<\/p>\n<p>DF.mega = read.mega(\"Megadataset_v1.7b.csv\") #load megadataset<\/p>\n<p>#load alcohol<br \/>\nalcohol = read.mega(\"alcohol_consumption.csv\") #loads the data<br \/>\nshort.names = as.abbrev(rownames(alcohol)) #gets the abbreviated names so it can be merged with megadataset<br \/>\nrownames(alcohol) = short.names #inserts the abbreviated names<\/p>\n<p>DF.mega2 = merge.datasets(alcohol, DF.mega) #merge datasets<\/p>\n<p>scatterplot(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO, DF.mega2, #plot it<br \/>\nsmoother=FALSE, #no moving average<br \/>\nlabels = rownames(DF.mega2), id.n = nrow(DF.mega2)) #label every datapoint, using the rows of the plotted data frame<\/p><\/blockquote>\n<p><a href=\"http:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/alcohol_lifeexpectancy.png\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-4424\" src=\"http:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/alcohol_lifeexpectancy.png\" alt=\"alcohol_lifeexpectancy\" width=\"660\" height=\"407\" \/><\/a><\/p>\n<p>There is no visible relationship. However, alcohol consumption might still add predictive power in a multiple regression that also includes life expectancy:<\/p>\n<blockquote><p>lm1 = lm(CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO+X2012LifeExpectancyatBirth,<br \/>\nDF.mega2)<br \/>\nsummary(lm1)<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEWYW5YBFEB\" tabindex=\"0\">Call:\r\nlm(formula = CancerRatePer100000 ~ AlcoholConsumptionPerCapitaWHO + \r\n    X2012LifeExpectancyatBirth, data = DF.mega2)\r\n\r\nResiduals:\r\n    Min      1Q  Median      3Q     Max \r\n-48.677 -26.569   0.717  28.486  61.631 \r\n\r\nCoefficients:\r\n                               Estimate Std. 
Error t value Pr(&gt;|t|)    \r\n(Intercept)                    -91.6149    79.2375  -1.156    0.254    \r\nAlcoholConsumptionPerCapitaWHO   1.7712     1.6978   1.043    0.303    \r\nX2012LifeExpectancyatBirth       4.2518     0.9571   4.442 6.13e-05 ***\r\n---\r\nSignif. codes:  0 \u2018***\u2019 0.001 \u2018**\u2019 0.01 \u2018*\u2019 0.05 \u2018.\u2019 0.1 \u2018 \u2019 1\r\n\r\nResidual standard error: 31.56 on 43 degrees of freedom\r\n  (227 observations deleted due to missingness)\r\nMultiple R-squared:  0.3158,\tAdjusted R-squared:  0.284 \r\nF-statistic: 9.923 on 2 and 43 DF,  p-value: 0.0002861<\/pre>\n<\/blockquote>\n<p>Alcohol consumption seemingly has no incremental predictive power (p = 0.303)! But it does cause cancer, right? According to my skim of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Alcohol_and_cancer\">Wiki<\/a>, yes, but it accounts for only 3.5% of cancer cases, so the effect is probably too small to be seen here.<\/p>\n<p>The data is in <a href=\"https:\/\/osf.io\/zdcbq\/files\/\">megadataset 1.7c<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Same guy proposed another idea. Wikipedia has data here. However, since i had previously seen that people fudge data on Wikipedia articles (e.g. this one), then maybe it was not a good idea to just rely on Wikipedia. 
So i did the best thing: fetched both the data from Wiki and the data from the [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1727],"tags":[72,2030],"class_list":["post-4423","post","type-post","status-publish","format-standard","hentry","category-medicine","tag-alcohol","tag-cancer-rates","entry"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4423","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=4423"}],"version-history":[{"count":3,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4423\/revisions"}],"predecessor-version":[{"id":4427,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/4423\/revisions\/4427"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=4423"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=4423"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=4423"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}