{"id":5099,"date":"2015-04-09T23:16:56","date_gmt":"2015-04-09T22:16:56","guid":{"rendered":"http:\/\/emilkirkegaard.dk\/en\/?p=5099"},"modified":"2015-04-09T23:16:56","modified_gmt":"2015-04-09T22:16:56","slug":"how-exactly-does-one-properly-remove-the-effect-of-one-variable-residuals-partial-correlation-multiple-regression-semi-partial","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2015\/04\/how-exactly-does-one-properly-remove-the-effect-of-one-variable-residuals-partial-correlation-multiple-regression-semi-partial\/","title":{"rendered":"How exactly does one properly remove the effect of one variable? Residuals, partial correlation, multiple regression, semi-partial?"},"content":{"rendered":"<p>I ran into trouble trying to remove the effects of one variable on other variables doing the writing <a href=\"http:\/\/emilkirkegaard.dk\/en\/?p=5034\">my reanalysis of the Noble et al 2015 paper<\/a>. It was not completely obvious which exact method to use.<\/p>\n<p>The authors in their own paper used OLS aka. standard linear regression. However, since I wanted to plot the results at the case-level, this was not useful to me. Before doing this I did not really understand how partial correlations worked, or what semi-partial correlations were, or how multiple regression works exactly. 
I still don&#8217;t fully understand the last, but I will explain the others with example results.<\/p>\n<p><strong>Generated data<\/strong><\/p>\n<p>Since we are cool, <a href=\"http:\/\/en.wikipedia.org\/wiki\/R_%28programming_language%29\">we use R<\/a>.<\/p>\n<p>We begin by loading the packages we need and generating some data:<\/p>\n<pre>#load libs\r\n library(pacman) #install this first if you don't have it\r\n p_load(Hmisc, psych, ppcor, MASS, QuantPsyc, devtools)\r\n source_url(\"https:\/\/github.com\/Deleetdk\/psych2\/raw\/master\/psych2.R\")\r\n #Generate data\r\n df = as.data.frame(mvrnorm(200000, mu = c(0,0), Sigma = matrix(c(1,0.50,0.50,1), ncol = 2), empirical=TRUE))\r\n cor(df)[1,2]<\/pre>\n<pre>[1] 0.5<\/pre>\n<p>The correlation is exactly .500000, as chosen above, because empirical=TRUE forces the sample to match the specified covariance matrix.<\/p>\n<p>Then we add a gender-specific effect:<\/p>\n<pre>df$gender = as.factor(c(rep(\"M\",100000),rep(\"F\",100000)))\r\n df[1:100000,\"V2\"] = df[1:100000,\"V2\"]+1 #add 1 to males\r\n cor(df[1:2])[1,2] #now there's error due to the effect of maleness!<\/pre>\n<p>[1] 0.4463193<\/p>\n<p>The correlation has been reduced a bit, as expected. It is time to try to undo this and recover the real correlation of .5.<\/p>\n<pre>#multiple regression\r\n model = lm(V1 ~ V2+gender, df) #standard linear model\r\n coef(model)[-1] #unstd. betas\r\n lm.beta(model) #std. betas<\/pre>\n<pre>V2        genderM \r\n0.4999994 -0.5039354<\/pre>\n<pre>V2        genderM \r\n0.5589474 -0.2519683 \r\n<\/pre>\n<p>The <em>raw<\/em> beta coefficient is spot on, but the standardized one is not at all. This is because the standardized beta is the raw beta scaled by sd(V2)\/sd(V1), and the gender effect inflates the variance of V2 from 1 to about 1.25, so the standardized beta comes out to roughly 0.5 * sqrt(1.25) = 0.559. This certainly makes interpretation more difficult, since the std. beta no longer corresponds in size to a correlation.<\/p>\n<pre>#split samples\r\n df.genders = split(df, df$gender) #split by gender\r\n cor(df.genders$M[1:2])[1,2] #males\r\n cor(df.genders$F[1:2])[1,2] #females<\/pre>\n<pre>[1] 0.4987116\r\n[1] 0.5012883<\/pre>\n<p>These are both approximately correct. 
Note, however, that these are not the values for the total sample as before, but for each subsample. If the subsamples differ in their correlations, that will show up here clearly.<\/p>\n<pre>#partial correlation\r\ndf$gender = as.numeric(df$gender) #convert to numeric\r\npartial.r(df, c(1:2), 3)[1,2]<\/pre>\n<pre>[1] 0.5000005<\/pre>\n<p>This uses a ready-made function for computing partial correlations. Partial correlations are calculated from the residuals of both variables. This means that they correlate whatever remains of the variables after everything that could be predicted from the controlling variable is removed.<\/p>\n<pre>#residualize both\r\ndf.r2 = residuals.DF(df, \"gender\")\r\ncor(df.r2[1:2])[1,2]<\/pre>\n<pre>[1] 0.5000005<\/pre>\n<p>This should be exactly the same as the above, but done manually. And it is.<\/p>\n<pre>#semi-partials\r\nspcor(df)$estimate #hard to interpret output\r\nspcor.test(df$V1, df$V2, df$gender)[1] #partial out gender from V2\r\nspcor.test(df$V2, df$V1, df$gender)[1] #partial out gender from V1<\/pre>\n<pre>V1 V2 gender\r\nV1 1.0000000 0.4999994 -0.2253951\r\nV2 0.4472691 1.0000000 0.4479416\r\ngender -0.2253103 0.5005629 1.0000000<\/pre>\n<pre>estimate\r\n1 0.4999994<\/pre>\n<pre>estimate\r\n1 0.4472691<\/pre>\n<p>Semi-partial correlations, a.k.a. part correlations, are the same as above, except that only one of the two variables is residualized by the controlled variable. The above uses two different ready-made functions for calculating these. The first works on an entire data.frame and outputs semi-partials for all variables controlling for all other variables. The output is a bit tricky to read, however, and there is no explanation in the help. One has to read it row by row: each row is the original variable correlated with the residualized variables in each column.<br \/>\nThe two calls below use the other function, where one has to specify which two variables to correlate and which variables to control for. 
The second variable passed is the one that gets residualized by the control variables.<br \/>\nWe see that the results are as they should be. Controlling V1 for gender has about zero effect. This is because gender has no effect on this variable aside from a very small chance effect (r=-0.00212261). Controlling V2 for gender has the desired effect of returning a value very close to .5, as it should.<\/p>\n<pre>#residualize only V2\r\ndf.r1.V1 = df.r2 #copy above\r\ndf.r1.V1$V1 = df$V1 #fetch orig V1\r\ncor(df.r1.V1[1:2])[1,2]<\/pre>\n<pre>[1] 0.4999994<\/pre>\n<pre>#residualize only V1\r\ndf.r1.V2 = df.r2 #copy above\r\ndf.r1.V2$V2 = df$V2 #fetch orig V2\r\ncor(df.r1.V2[1:2])[1,2]<\/pre>\n<pre>[1] 0.4472691<\/pre>\n<p>These are the two manual ways of doing the same as above. We get exactly the same results, so that is good.<\/p>\n<p>So where does this lead us? Well, apparently using multiple regression to control for variables is a bad idea, since the standardized betas it produces are difficult to interpret.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I ran into trouble trying to remove the effects of one variable on other variables while writing my reanalysis of the Noble et al 2015 paper. It was not completely obvious which exact method to use. The authors in their own paper used OLS, i.e. standard linear regression. 
However, since I wanted to plot [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1766],"tags":[2147,2108,2145,2146],"class_list":["post-5099","post","type-post","status-publish","format-standard","hentry","category-math-science","tag-control","tag-multiple-regression","tag-partial-correlation","tag-semipartial-correlation","entry"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5099","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=5099"}],"version-history":[{"count":1,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5099\/revisions"}],"predecessor-version":[{"id":5100,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5099\/revisions\/5100"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=5099"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=5099"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=5099"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}