An online acquaintance asked me to find data on cancer rates and life expectancy and look for a relationship between them. It stands to reason that in a country where people die of lots of other things (accidents, warfare, parasites/contagious disease, hunger/thirst), they don't live long enough to get cancer.

I already had life expectancy data from the UN from when I imported the Human Development Index data. So where to find cancer rate data? I found 50 datapoints here. There are other sources such as this, but they have age-standardized the data, which makes it useless for our purpose here: standardizing removes exactly the age-structure effect we are looking for. Furthermore, they are given by region, whereas we want country-level data.
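To see why age-standardized rates are no use here, a small sketch may help. The numbers below are made up for illustration (they are not real data), and the two "countries" are hypothetical: they have identical age-specific cancer rates and differ only in how old their populations are.

```r
# Illustrative numbers only (not real data): two hypothetical countries
# with identical age-specific cancer rates but different age structures.
age_rate <- c(young = 50, middle = 300, old = 1500)  # cases per 100,000 within each age group

# Share of the population in each age group
pop_short_lived <- c(young = 0.60, middle = 0.35, old = 0.05)
pop_long_lived  <- c(young = 0.35, middle = 0.40, old = 0.25)

# Crude rate: weight the age-specific rates by the country's own age structure
crude_short <- sum(age_rate * pop_short_lived)
crude_long  <- sum(age_rate * pop_long_lived)

# Age-standardized rate: weight both countries by the same reference structure
std_pop   <- c(young = 0.50, middle = 0.35, old = 0.15)  # assumed reference population
std_short <- sum(age_rate * std_pop)
std_long  <- sum(age_rate * std_pop)

print(c(crude_short = crude_short, crude_long = crude_long))
print(c(std_short = std_short, std_long = std_long))
```

In this toy setup the crude rates differ a lot (210 vs. 512.5 per 100,000) purely because one population is older, while the standardized rates are identical. The crude difference is the thing we want to correlate with life expectancy, so we need unstandardized rates.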

So, I used the 50 datapoints from The Guardian. In R, I typed:

source("merger.R") #this loads my custom functions for working with the megadataset

DF.mega = read.mega("Megadataset_v1.7b.csv") #load data

library(car) #library needed for scatterplot function
scatterplot(CancerRatePer100000 ~ X2012LifeExpectancyatBirth, DF.mega,
            smoother=FALSE, #no moving average
            labels = rownames(DF.mega), id.n=nrow(DF.mega)) #add labels for all points

cor(DF.mega["CancerRatePer100000"], DF.mega["X2012LifeExpectancyatBirth"], use="pairwise") #get correlation


The correlation is .55. The labels are ISO-3 or custom (full names can be found in the “Names” variable in the megadataset).
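With only about 50 countries, it can be useful to put a confidence interval around a correlation like this. The following is a minimal sketch, not what I ran above: cor_with_ci is a hypothetical helper name, and the toy data frame uses made-up values standing in for DF.mega, reusing the same two column names.

```r
# Hypothetical helper: Pearson correlation with a 95% confidence interval,
# computed on the rows where both variables are present.
cor_with_ci <- function(df, x, y) {
  ok <- complete.cases(df[, c(x, y)])           # keep pairwise-complete rows only
  test <- cor.test(df[[x]][ok], df[[y]][ok])    # base R (stats) Pearson test
  c(r = unname(test$estimate),
    lower = test$conf.int[1],
    upper = test$conf.int[2],
    n = sum(ok))
}

# Toy data standing in for DF.mega (made-up values, not real data)
toy <- data.frame(
  CancerRatePer100000 = c(120, 180, NA, 250, 300, 310, 90),
  X2012LifeExpectancyatBirth = c(55, 62, 70, 75, 80, 82, NA)
)

result <- cor_with_ci(toy, "CancerRatePer100000", "X2012LifeExpectancyatBirth")
print(result)
```

Assuming DF.mega behaves like an ordinary data frame, the same call with DF.mega in place of toy should give the interval for the real data.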