
December 17, 2009



To me the video looks deceptive. You have to read the documentation to learn how to get data frames out of plyr, yet you're supposed to just wish sapply returned them? Nowhere in any of the apply documentation does it say you can get data frames out, and if you asked someone how to get one, no one who knows anything would tell you to look at the apply functions (they might suggest aggregate, or a loop with rbind, depending on the task).

So, he starts from a faulty premise, works himself into a hole using sapply to do something its documentation says it cannot, and then goes on to show a function that can do what some vague zeitgeist in his head says sapply should do.

Furthermore, his terminology is a mess. He's not working with a list in the first place (although he repeatedly says so) but with a vector and a function that generates data frames.
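A minimal sketch of the sapply point, under stated assumptions: `years` and `getData` here are hypothetical stand-ins (the video's actual definitions are not reproduced in this thread), with `getData` returning a small data frame for each year.

```r
# Hypothetical stand-ins for the objects discussed above (assumptions,
# not the code from the video).
years <- c(2007, 2008, 2009)          # a plain numeric vector, not a list
getData <- function(y) {
  # Returns a two-row data frame for the given year.
  data.frame(year = y, value = c(1, 2))
}

# Each individual call yields a data frame ...
class(getData(2007))

# ... but sapply tries to simplify the results into a vector or array,
# so the combined result is not a data frame.
res <- sapply(years, getData)
is.data.frame(res)   # FALSE
```

Inspecting `res` with `str(res)` shows what the simplification step produced instead of the stacked data frame a newcomer might expect.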

The whole premise is misleading for the example case. Pick a real example!!

For this particular task a loop is just...

myData <- NULL
for (y in years){ myData <- rbind( myData, getData(y)) }

# I don't remember the exact commands from the video, but is this all ldply is a wrapper for?
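To make the loop above runnable end to end, here is a sketch with hypothetical definitions of `years` and `getData` (both are assumptions for illustration). The plyr call is guarded so the block still runs without the package; as far as I understand, `ldply` applies the function to each element and then combines the pieces into one data frame, so it is similar in spirit to the loop rather than a literal wrapper around it.

```r
# Hypothetical stand-ins (assumptions, not the code from the video).
years <- c(2007, 2008, 2009)
getData <- function(y) data.frame(year = y, value = c(1, 2))

# The loop from the comment above: grow the data frame one year at a time.
myData <- NULL
for (y in years) {
  myData <- rbind(myData, getData(y))
}
nrow(myData)   # 6 (three years, two rows each)

# The plyr version, run only if plyr is installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  viaPlyr <- plyr::ldply(years, getData)
  print(dim(viaPlyr))
}
```

Growing an object inside a loop copies it on every iteration, which is fine for a handful of years but gets slow for long inputs.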

It's a very misleading video and I think the behaviour of the ld function looks mysterious.

(I'm in no way condemning plyr and the many who love it, just the video.)

Hey JC, you are spot on about me using the term list instead of vector. That's a really good point. I'm going to edit the blog post to point out that error. Good catch.

As for your comment about working my way into a hole, that is correct as well. It appears, however, that you may be failing to appreciate how a beginner with R approaches programming problems. What I was illustrating in the video is how a beginner (and I group myself in that category) approaches a problem and can quickly end up frustrated because things seem non-intuitive. We can point at my intuition and say that I have unreasonable expectations, which may be true. But all new users bring conceptual misunderstandings to the keyboard with them.

The challenge with the apply() family of functions in R is that their syntax differs from one function to the next. They also often require wrapping the work in helper functions to accomplish a pattern such as split-apply-combine (which my simple example did not illustrate). What the plyr package adds is not new functionality; it adds a unified abstraction that simplifies the analytical process.
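Since my example did not show split-apply-combine, here is a rough base R sketch of the pattern. The `sales` data frame and its columns are made up for illustration, and the plyr line is guarded so the block runs without the package.

```r
# Made-up example data (an assumption for illustration).
sales <- data.frame(
  year  = c(2007, 2007, 2008, 2008, 2009),
  value = c(10, 20, 30, 40, 50)
)

# Split: one data frame per year.
pieces <- split(sales, sales$year)

# Apply: a helper function summarises each piece.
summaries <- lapply(pieces, function(d) {
  data.frame(year = d$year[1], total = sum(d$value))
})

# Combine: stack the per-year summaries back into one data frame.
result <- do.call(rbind, summaries)
result$total   # 30 70 50

# The same idea as a single plyr call, if plyr is installed.
if (requireNamespace("plyr", quietly = TRUE)) {
  print(plyr::ddply(sales, "year", plyr::summarise, total = sum(value)))
}
```

The base version needs three separate functions and an anonymous helper, which is the kind of bookkeeping the plyr call folds into one step.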

Thanks for showing how to accomplish the same thing with a loop. That’s a very good illustration of how one can do things in R in many different ways. It’s always useful and educational to see equivalent methods!

JC's code using the for-loop could be expressed more succinctly using Reduce:

myData <- Reduce(rbind, lapply(years, getData))
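For comparison, a sketch that runs the Reduce one-liner next to the common `do.call(rbind, ...)` idiom. `years` and `getData` are again hypothetical stand-ins, not the definitions from the video.

```r
# Hypothetical stand-ins (assumptions, not the code from the video).
years <- c(2007, 2008, 2009)
getData <- function(y) data.frame(year = y, value = c(1, 2))

# Reduce folds rbind over the list pairwise: rbind(rbind(a, b), c).
viaReduce <- Reduce(rbind, lapply(years, getData))

# do.call passes every piece to a single rbind call: rbind(a, b, c).
viaDoCall <- do.call(rbind, lapply(years, getData))

dim(viaReduce)   # 6 2
all.equal(viaReduce$year, viaDoCall$year)
```

Both avoid the explicit loop variable; `do.call` binds all pieces in one call, while Reduce still binds them one pair at a time.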

