{"id":5670,"date":"2015-11-25T01:47:33","date_gmt":"2015-11-25T00:47:33","guid":{"rendered":"http:\/\/emilkirkegaard.dk\/en\/?p=5670"},"modified":"2015-11-25T01:47:33","modified_gmt":"2015-11-25T00:47:33","slug":"kirkegaard-df_func","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2015\/11\/kirkegaard-df_func\/","title":{"rendered":"kirkegaard: df_func()"},"content":{"rendered":"<p>Often I want to get the mean value for a case across a number of columns, usually years. This however gets repetitive because the base mean() function cannot handle data like that. Other times, one wants to standardize the data first, e.g. when the scales are not the same across variables. Lastly, often one wants to use just a few columns, usually marked by a special name. Before, these tasks were time-consuming. Now they are easy.<\/p>\n<p>Consider the iris dataset:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">head(iris)\r\n<\/span>  Sepal.Length Sepal.Width Petal.Length Petal.Width Species\r\n1          5.1         3.5          1.4         0.2  setosa\r\n2          4.9         3.0          1.4         0.2  setosa\r\n3          4.7         3.2          1.3         0.2  setosa\r\n4          4.6         3.1          1.5         0.2  setosa\r\n5          5.0         3.6          1.4         0.2  setosa\r\n6          5.4         3.9          1.7         0.4  setosa<\/pre>\n<p>It has 4 numeric variables and 1 factor variable. Let&#8217;s say we want the mean of each case across the first four variables. 
The simplest idea is:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">mean(iris[1:4])\r\n<\/span>[1] NA\r\n<span class=\"GEM3DMTCPFB  ace_constant\">Warning message:\r\n<\/span><span class=\"GEM3DMTCPFB  ace_constant\">In mean.default(iris[1:4]) :<\/span>\r\n <span class=\"GEM3DMTCPFB  ace_constant\"> argument is not numeric or logical: returning NA<\/span><\/pre>\n<p>Alas, it doesn&#8217;t work, because mean() does not accept a data frame. However, df_func() can compute it:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris[1:4]) %&gt;% head\r\n<\/span>[1] 2.550 2.375 2.350 2.350 2.550 2.850<\/pre>\n<p>If we want to standardize the variables first:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris[1:4], standardize = T) %&gt;% head\r\n<\/span>[1] -0.6322189 -0.9793858 -0.9392153 -0.9984393 -0.6050527 -0.2041362<\/pre>\n<p>Maybe we want the median instead:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris[1:4], func = median) %&gt;% head\r\n<\/span>[1] 2.45 2.20 2.25 2.30 2.50 2.80<\/pre>\n<p>What if we want to match columns by a pattern? 
The string &#8220;Petal&#8221; matches two variables:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris, pattern = \"Petal\") %&gt;% head\r\n<\/span>[1] 0.80 0.80 0.75 0.85 0.80 1.05<\/pre>\n<p class=\"GEM3DMTCFGB\" tabindex=\"0\">If we try to use a non-numeric variable, we get an informative error:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris)\r\n<\/span><span class=\"GEM3DMTCPFB  ace_constant\">Error in df_func(iris) : Some variables were not numeric!<\/span><\/pre>\n<p class=\"GEM3DMTCFGB\" tabindex=\"0\">Likewise, if we use pattern matching but the pattern doesn&#8217;t match anything:<\/p>\n<pre id=\"rstudio_console_output\" class=\"GEM3DMTCFGB\" tabindex=\"0\"><span class=\"GEM3DMTCLGB ace_keyword\">&gt; <\/span><span class=\"GEM3DMTCLFB ace_keyword\">df_func(iris, pattern = \"sadaiasd\")\r\n<\/span><span class=\"GEM3DMTCPFB  ace_constant\">Error in df_func(iris, pattern = \"sadaiasd\") : \r\n  No columns matched the pattern!<\/span><\/pre>\n<p class=\"GEM3DMTCFGB\" tabindex=\"0\">Finally, the function ignores missing data by default, but one can change this if needed.<\/p>\n<p class=\"GEM3DMTCFGB\" tabindex=\"0\">&#8212;<\/p>\n<p class=\"GEM3DMTCFGB\" tabindex=\"0\">Get the package from <a href=\"https:\/\/github.com\/Deleetdk\/kirkegaard\">GitHub<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Often I want to get the mean value for a case across a number of columns, usually years. This however gets repetitive because the base mean() function cannot handle data like that. Other times, one wants to standardize the data first, e.g. when the scales are not the same across variables. 
Lastly, often one wants [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2089],"tags":[2259,1979],"class_list":["post-5670","post","type-post","status-publish","format-standard","hentry","category-programming","tag-kirkegaard-package","tag-r","entry"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5670","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=5670"}],"version-history":[{"count":1,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5670\/revisions"}],"predecessor-version":[{"id":5671,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/5670\/revisions\/5671"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=5670"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=5670"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=5670"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}