tapply is a very handy function for partitioning and summarizing data. For example, if I want to calculate the average miles-per-gallon of 4-cylinder, 6-cylinder, and 8-cylinder cars from the mtcars data set, I can do it very easily in one line of code:
> head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> tapply(mtcars$mpg, mtcars$cyl, mean)
       4        6        8
26.66364 19.74286 15.10000
But if the summarizing function returns more than a single value (unlike mean, which returns a scalar), the output -- a list -- can be a little hard to manage.
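To see what that list looks like, here is a minimal sketch using the mtcars data from above. Applying quantile to mpg by cyl is an assumption made for illustration; it is not the data from Dan's original problem.

```r
# Assumption: mpg grouped by cyl stands in for the original data set.
# quantile returns five values per group, so tapply cannot simplify to a vector.
a <- tapply(mtcars$mpg, mtcars$cyl, quantile)

# The result is a list-mode array with one element per cylinder group
is.list(a)
length(a)

# Each element is a named numeric vector: 0%, 25%, 50%, 75%, 100%
a[[1]]
```

Each group's summary has to be pulled out with [[ ]], which is what makes the result awkward to work with directly.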
Dan had this experience when using the quantile function with tapply to calculate a five-number summary for subsets of data. He could assemble the results (a) into a single table with rbind (strictly a matrix, which as.data.frame turns into the data frame he wanted), but that required a lot of typing:
b <- rbind(a[[1]],a[[2]],a[[3]],a[[4]])
It should be possible to automate that and save all the typing, right? And so it is. Dan used the tricky, but immensely powerful function do.call to do the typing for him:
b <- do.call(rbind,a)
Much simpler, and the only practical solution when there are many elements to combine. do.call passes the elements of its second argument (a list) as the arguments of the function you name in its first argument. Here, that amounts to a single call to rbind with every element of the tapply output supplied as a separate argument. Whenever you want to call a function but its arguments aren't known in advance because they're stored in a list, do.call is your friend.
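As a rough sketch of the equivalence described above, again assuming mpg grouped by cyl from mtcars as stand-in data (so there are three groups rather than four), the two approaches can be compared like this. The names b_manual, b_df and args are hypothetical, chosen only for this example.

```r
# Assumption: mtcars stands in for the original data
a <- tapply(mtcars$mpg, mtcars$cyl, quantile)

# Manual version: one argument typed out per group
b_manual <- rbind(a[[1]], a[[2]], a[[3]])

# do.call version: every element of the list becomes an argument to rbind
b <- do.call(rbind, a)

# Both hold the same numbers (dimnames are stripped before comparing)
all.equal(unname(b), unname(b_manual))

# rbind on numeric vectors yields a matrix; convert if a data frame is wanted
b_df <- as.data.frame(b)

# Named list elements are matched to named arguments of the function
args <- list(1:10, na.rm = TRUE)
do.call(mean, args)  # same call as mean(1:10, na.rm = TRUE)
```

The do.call line stays the same no matter how many groups tapply produces, which is the point of using it.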
dube.mine.nu: R: tapply Output Formatting
Hello,
Thank you for this.
Another solution is to use the reshape package:
library(reshape)
cast(mtcars, cyl ~ ., quantile, value = "mpg")
Posted by: david | May 20, 2009 at 12:22
Or the plyr package.
By the way, I've had some bad experiences with do.call(rbind, ...) when there is a huge number of lists.
Posted by: Doug | May 21, 2009 at 08:05
The URL has died, along with the server I had the post on. It lives again here:
http://dandube.com/blog/?p=145
if you care to see it.
Posted by: dan | July 09, 2009 at 12:08