Often I want to get the mean value for each case across a number of columns, usually years. This gets repetitive, however, because the base mean() function does not accept a data frame. Other times, one wants to standardize the data first, e.g. when the scales are not the same across variables. Lastly, one often wants to use just a few columns, usually marked by a special name. Before, these tasks were time-consuming. Now they are easy.
Consider the iris dataset:
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
It has four numeric variables and one factor variable. Let’s say we want the mean of the first four variables for each case. The simplest idea is:
> mean(iris[1:4])
[1] NA
Warning message:
In mean.default(iris[1:4]) :
  argument is not numeric or logical: returning NA
Alas, it doesn’t work. However, df_func can do it:
> df_func(iris[1:4]) %>% head
[1] 2.550 2.375 2.350 2.350 2.550 2.850
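For comparison, the plain per-case mean can also be computed with base R's rowMeans(). This is only a minimal sketch of the same calculation, not necessarily how df_func does it internally:

```r
# Per-case (row-wise) mean across the four numeric columns of iris.
case_means <- rowMeans(iris[1:4])

# unname() drops any row labels so only the values are printed.
head(unname(case_means))
```

The first value is (5.1 + 3.5 + 1.4 + 0.2) / 4 = 2.55, matching the df_func output.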
If we want to standardize the variables first:
> df_func(iris[1:4], standardize = TRUE) %>% head
[1] -0.6322189 -0.9793858 -0.9392153 -0.9984393 -0.6050527 -0.2041362
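Standardizing first means turning each column into z-scores before averaging across the row. A rough base R sketch, assuming the standardization is the usual "subtract the column mean, divide by the column standard deviation" that scale() performs (the post does not say which method df_func uses):

```r
# scale() centers each column on its mean and divides by its standard deviation.
z <- scale(iris[1:4])

# Average the resulting z-scores within each case (row).
z_means <- rowMeans(z)
head(unname(z_means))
```

On iris, this gives about -0.632 for the first case, which agrees with the standardized output from df_func.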
Maybe we want the median instead:
> df_func(iris[1:4], func = median) %>% head
[1] 2.45 2.20 2.25 2.30 2.50 2.80
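Swapping in another summary function amounts to applying that function to each row. A small base R sketch of the idea using apply(), shown for illustration rather than as the package's own code:

```r
# apply() with MARGIN = 1 calls the function once per row.
case_medians <- apply(iris[1:4], 1, median)
head(unname(case_medians))
```

For the first case the sorted values are 0.2, 1.4, 3.5 and 5.1, so the median is (1.4 + 3.5) / 2 = 2.45.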
What if we want to match columns by a pattern? The string “Petal” matches two variables:
> df_func(iris, pattern = "Petal") %>% head
[1] 0.80 0.80 0.75 0.85 0.80 1.05
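Pattern matching on column names can be sketched in base R with grepl(). This assumes the pattern is treated as a regular expression matched against names(iris), which the post does not state explicitly:

```r
# grepl() flags the column names that match the regular expression.
matched <- grepl("Petal", names(iris))
names(iris)[matched]

# Row-wise mean over the matched columns only.
petal_means <- rowMeans(iris[matched])
head(unname(petal_means))
```

Note that grepl() is case-sensitive by default, so a lowercase "petal" would match nothing here unless ignore.case = TRUE is set.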
If we try to use a non-numeric variable, we get an informative error:
> df_func(iris)
Error in df_func(iris) : Some variables were not numeric!
Likewise, if we use pattern matching but the pattern doesn’t match anything:
> df_func(iris, pattern = "sadaiasd")
Error in df_func(iris, pattern = "sadaiasd") : No columns matched the pattern!
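Checks like these two are straightforward to write with stop(). The helper below, check_columns, is a hypothetical name invented for this sketch and is not part of the package; it only illustrates how such input validation might look:

```r
# Hypothetical helper, not part of the package: validates the input
# before any row-wise summary is computed.
check_columns <- function(df, pattern = NULL) {
  if (!is.null(pattern)) {
    keep <- grepl(pattern, names(df))
    if (!any(keep)) stop("No columns matched the pattern!")
    df <- df[keep]
  }
  if (!all(vapply(df, is.numeric, logical(1)))) {
    stop("Some variables were not numeric!")
  }
  df
}

# tryCatch() captures the error message instead of halting the script.
msg_factor  <- tryCatch(check_columns(iris),
                        error = function(e) conditionMessage(e))
msg_pattern <- tryCatch(check_columns(iris, pattern = "sadaiasd"),
                        error = function(e) conditionMessage(e))
msg_factor
msg_pattern
```

Checking the pattern before the numeric test means a pattern that selects only numeric columns passes even when the full data frame contains a factor.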
Finally, the function ignores missing data by default, but one can change this if needed.
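The post does not show which df_func argument controls this, so the sketch below uses base R's rowMeans() and its na.rm argument to illustrate what ignoring missing data means in practice:

```r
# Copy of the numeric columns with one value set to missing.
x <- iris[1:4]
x[1, 1] <- NA

# na.rm = TRUE averages the remaining values in the row.
with_na_removed <- rowMeans(x, na.rm = TRUE)[1]

# na.rm = FALSE (the base R default) propagates the NA.
with_na_kept <- rowMeans(x, na.rm = FALSE)[1]

unname(with_na_removed)
unname(with_na_kept)
```

With the first value missing, the first case is averaged over its three remaining values: (3.5 + 1.4 + 0.2) / 3 = 1.7.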
—
Get the package from GitHub.