kirkegaard: df_func()

Often I want the mean value for each case across a number of columns, usually years. This gets repetitive because the base mean() function does not work on a data frame. Other times, one wants to standardize the data first, e.g. when the variables are not on the same scale. Lastly, one often wants to use just a few columns, usually marked by a shared name. Before, these tasks were time-consuming. Now they are easy.

Consider the iris dataset:

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

It has 4 numeric variables and 1 factor variable. Let’s say we want the mean of the first four variables for each case. The simplest idea is:

> mean(iris[1:4])
[1] NA
Warning message:
In mean.default(iris[1:4]) :
  argument is not numeric or logical: returning NA

Alas, it doesn’t work, because mean() does not accept a data frame. However, df_func() does:

> df_func(iris[1:4]) %>% head
[1] 2.550 2.375 2.350 2.350 2.550 2.850
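
For comparison, the same row-wise means can be obtained in base R. This is only a sketch of the underlying computation, not the package's own code:

```r
# Row-wise mean over the four numeric columns, using only base R.
row_means <- rowMeans(iris[1:4])

# rowMeans() keeps the row names, so drop them for a cleaner printout.
head(unname(row_means))
```

The first six values should agree with the df_func() output above.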

If we want to standardize the variables first:

> df_func(iris[1:4], standardize = T) %>% head
[1] -0.6322189 -0.9793858 -0.9392153 -0.9984393 -0.6050527 -0.2041362
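
A base R sketch of the same idea, assuming that standardize = T means converting each column to z-scores (mean 0, standard deviation 1) before averaging. That reading is an assumption on my part, not something taken from the package source:

```r
# Convert each column to z-scores; scale() centers and divides by the SD.
z <- scale(iris[1:4])

# Average the standardized values within each row.
z_means <- rowMeans(z)
head(unname(z_means))
```

Because every standardized column has mean 0, the row means computed this way also average to 0 across the whole dataset.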

Maybe we want the median instead:

> df_func(iris[1:4], func = median) %>% head
[1] 2.45 2.20 2.25 2.30 2.50 2.80
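
There is no rowMedians() in base R, so a rough equivalent uses apply() over rows. Again, this is a sketch of the computation and not necessarily how df_func() does it internally:

```r
# Apply median() to each row (MARGIN = 1) of the numeric columns.
row_medians <- apply(iris[1:4], 1, median)
head(unname(row_medians))
```

Any function that takes a numeric vector and returns a single value (e.g. max or sd) could be swapped in for median here.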

What if we want to match columns by a pattern? The string “Petal” matches two variables:

> df_func(iris, pattern = "Petal") %>% head
[1] 0.80 0.80 0.75 0.85 0.80 1.05
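
Selecting columns by name pattern can be sketched in base R with grep(). Whether df_func() matches case-sensitively is not stated in this post; the snippet only shows how grep() itself behaves:

```r
# Names of the columns that contain "Petal".
petal_cols <- grep("Petal", names(iris), value = TRUE)
petal_cols

# Row-wise mean over just those columns.
head(unname(rowMeans(iris[petal_cols])))

# grep() is case-sensitive by default, so lowercase "petal" matches nothing
# unless ignore.case = TRUE is set.
grep("petal", names(iris), value = TRUE)
grep("petal", names(iris), value = TRUE, ignore.case = TRUE)
```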

If we try to use a non-numeric variable, we get an informative error:

> df_func(iris)
Error in df_func(iris) : Some variables were not numeric!

Likewise, if we use pattern matching but the pattern doesn’t match anything:

> df_func(iris, pattern = "sadaiasd")
Error in df_func(iris, pattern = "sadaiasd") : 
  No columns matched the pattern!
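
To show how checks like these might be put together, here is a minimal stand-in written for this post. The name row_summary() is hypothetical and the body is not the package's actual source; only the argument names pattern and func and the two error messages are taken from the examples above:

```r
# Hypothetical stand-in for df_func(); NOT the package's actual source.
row_summary <- function(data, pattern = NULL, func = mean) {
  # Optionally keep only the columns whose names match the pattern.
  if (!is.null(pattern)) {
    keep <- grep(pattern, names(data))
    if (length(keep) == 0) stop("No columns matched the pattern!")
    data <- data[keep]
  }
  # Refuse to summarize factors, characters, and other non-numeric columns.
  if (!all(vapply(data, is.numeric, logical(1)))) {
    stop("Some variables were not numeric!")
  }
  # Apply the summary function to each row.
  unname(apply(as.matrix(data), 1, func))
}

head(row_summary(iris, pattern = "Petal"))
```

Failing early with stop() keeps a factor column or a mistyped pattern from silently producing a vector of NAs.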

Finally, the function ignores missing data by default, but one can change this if needed.
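
The argument df_func() uses to control this is not shown in this post, so the sketch below illustrates the two behaviors with base R's rowMeans() and its na.rm argument instead:

```r
# Copy the numeric columns and blank out one value to simulate missing data.
d <- iris[1:4]
d[1, "Sepal.Length"] <- NA

# Ignoring the missing value: row 1 is the mean of its three remaining values.
mean_ignoring_na <- unname(rowMeans(d, na.rm = TRUE)[1])
mean_ignoring_na

# Not ignoring it: the result for row 1 is NA.
mean_with_na <- unname(rowMeans(d, na.rm = FALSE)[1])
mean_with_na
```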

—

Get the package from GitHub.
