Quick recap of the main object types in R from Advanced R:
dimensions | homogeneous | heterogeneous |
1-d | atomic vector | list |
2-d | matrix | data.frame |
n-d | array | ??? |
So, objects can store either data of a single type or data of any type, and they can have 1, 2 or any number of dimensions. Note the missing cell: there is no standard heterogeneous n-dimensional type. However, all the object types in the right column are lists. A data.frame is a list that looks and behaves like a matrix in some ways. So, if we can make a list behave like a matrix, can we make a list behave like an n-dimensional array? Yes: list-arrays.
Why would we want an array that can take any type of data? Well, as argued before, one can use this data structure to automate analysis across methods and subsets. In other words, it makes it easy to test the effects of researcher degrees of freedom. What if we didn’t make that correction to the data to make it more normal? What if we split the data into Whites and Blacks, or analyze them together? What about splitting the data by gender: both genders, males, females? Examining all these manually requires doing the analysis 2 * 3 * 3 = 18 times. This would probably make for very long code that would break easily and be hard to update. In other words, it would violate the DRY (don't repeat yourself) principle.
In some cases, to do the analysis across many settings, one needs to save data.frames for each setting. This is fairly cumbersome using just lists, but can be done. If we only need to enter different parameters, just storing all of them in a data.frame would be sufficient. In my case, however, I needed to repeat an analysis across a large number of (sub-)samples that were derived using two parameters.
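To give a rough idea of what the "just lists" approach looks like, here is a minimal sketch. It assumes the built-in mtcars data and two made-up parameters (cyl and am) as stand-ins for my actual data and parameters; none of these names come from the original analysis.

```r
# Assumption: mtcars and the two parameters (cyl, am) are stand-ins for illustration.
cyl_values <- c(4, 6, 8)
am_values <- c(0, 1)

# Nested list: the outer level indexes cyl, the inner level indexes am.
subsets <- lapply(cyl_values, function(cyl_value) {
  inner <- lapply(am_values, function(am_value) {
    mtcars[mtcars$cyl == cyl_value & mtcars$am == am_value, ]
  })
  names(inner) <- paste0("am", am_values)
  inner
})
names(subsets) <- paste0("cyl", cyl_values)

# Run the same analysis (here just a mean) on every sub-sample.
mean_mpg <- sapply(subsets, function(inner) {
  sapply(inner, function(d) mean(d$mpg))
})
```

Every extra parameter adds another level of nesting and another nested loop, which is where this starts to feel cumbersome.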
So, how to set up the initial list-array? I wrote a simple function to do this:
t = make_list_array(1:2, letters[1:2], LETTERS[1:2]); t
# , , A
#
#   a  b
# 1 NA NA
# 2 NA NA
#
# , , B
#
#   a  b
# 1 NA NA
# 2 NA NA
One can give the function either vectors or whole numbers. If one gives it vectors, it uses their lengths as the dimension sizes and their elements as the dimension names, as in the example above. If one gives it whole numbers, it uses each number as the length of that dimension and just uses the natural numbers as the names.
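The function itself is not shown here, so the following is only a sketch of how something like it could be written in base R. The name make_list_array_sketch is hypothetical, and treating any single number as a dimension length is my assumption about how the two kinds of input are told apart.

```r
# Hypothetical re-implementation for illustration; not the original function.
make_list_array_sketch <- function(...) {
  args <- list(...)
  # Assumption: a single number means "this many levels, named 1..n";
  # anything else is treated as a vector whose elements become the names.
  dim_names <- lapply(args, function(x) {
    if (is.numeric(x) && length(x) == 1) {
      as.character(seq_len(x))
    } else {
      as.character(x)
    }
  })
  dims <- unname(lengths(dim_names))
  # A list with a dim attribute is a list-array: each cell can hold any object.
  array(rep(list(NA), prod(dims)), dim = dims, dimnames = dim_names)
}

la <- make_list_array_sketch(1:2, letters[1:2], LETTERS[1:2])

# Cells are read and written with [[ ]], by position or by dimension name.
la[["1", "a", "A"]] <- data.frame(x = 1:3)
```

Since each cell is a list element, a cell can hold a data.frame, a fitted model, or anything else, while the indexing still looks like that of an ordinary array.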
I’m still not sure if using list-arrays is superior to just using one long 1-d list and storing the parameter values in a data.frame made using expand.grid. For instance:
do.call("expand.grid", args = dimnames(t))
#   Var1 Var2 Var3
# 1    1    a    A
# 2    2    a    A
# 3    1    b    A
# 4    2    b    A
# 5    1    a    B
# 6    2    a    B
# 7    1    b    B
# 8    2    b    B
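For comparison, the flat alternative might look roughly like the sketch below. The column names (row, lower, upper) and the stored values are invented for illustration; the point is only that element i of the list corresponds to row i of the grid.

```r
# Assumption: parameter names and values are invented for illustration.
grid <- expand.grid(
  row = 1:2, lower = letters[1:2], upper = LETTERS[1:2],
  stringsAsFactors = FALSE
)

# One list element per row of the grid; position i in the list matches row i.
results <- vector("list", nrow(grid))
for (i in seq_len(nrow(grid))) {
  results[[i]] <- paste(grid$row[i], grid$lower[i], grid$upper[i])
}

# Look up a result by parameter values instead of by array index.
hit <- which(grid$row == 2 & grid$lower == "a" & grid$upper == "B")
results[[hit]]

# expand.grid varies the first factor fastest, which matches R's column-major
# array order, so adding a dim attribute turns the flat list into a list-array.
as_array <- results
dim(as_array) <- c(2, 2, 2)
```

So the two representations hold the same information in the same order. The difference is mostly whether one prefers to look things up by filtering a data.frame or by indexing an array.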
Time will tell.