John Myles White (who did the Canabalt scores analysis from last month) was trying to decide which R functions to spend time learning, and asked the obvious follow-up question: which functions in R are used the most? With no readily available answer, John answered the question himself by counting the number of times each function is called in all the packages available on CRAN. He then ranked the functions in two ways: first, by the total number of times each function is called across the source code of all the packages, and second, by the number of packages that use each function at least once. The top five functions are:
- if
- c
- function
- length
- list (by the number of packages) or paste (by number of uses)
The order varies by the method used, and technically "if" and "function" are reserved words (language keywords) rather than ordinary functions. But John has helpfully provided the data in CSV format for both the package method and the total-uses method, so you can do your own analysis. It is also worth noting that the occurrence counts are another example of a power-law distribution in action.
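To make the difference between the two ranking methods concrete, here is a minimal sketch in Python. It is not John's code, and the package names and call lists are invented for illustration; it only assumes that you already have, for each package, a list of the function names called in its source.

```python
from collections import Counter

# Hypothetical toy data: package name -> function names called in its source.
# These packages and call lists are invented for illustration only.
calls_by_package = {
    "pkgA": ["if", "c", "c", "paste", "paste", "paste", "length"],
    "pkgB": ["if", "c", "list", "function"],
    "pkgC": ["if", "list", "length", "function", "function"],
}


def rank_by_total_uses(calls):
    """Count every call of each function across all packages."""
    totals = Counter()
    for names in calls.values():
        totals.update(names)
    return totals


def rank_by_package_count(calls):
    """Count how many packages call each function at least once."""
    packages = Counter()
    for names in calls.values():
        # A set collapses repeated calls, so each package counts once.
        packages.update(set(names))
    return packages


total_uses = rank_by_total_uses(calls_by_package)
package_counts = rank_by_package_count(calls_by_package)

# Print the three highest-ranked functions under each method.
print(total_uses.most_common(3))
print(package_counts.most_common(3))
```

In this toy data, paste is called three times but only in one package, so it does well under the total-uses method and poorly under the package method. That is the kind of disagreement behind the fifth-place split between list and paste.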
John Myles White: R Function Usage Frequencies (Take 2)



Thanks for linking to these posts, David. I think the second link in your first paragraph is supposed to be http://www.johnmyleswhite.com/notebook/2009/12/08/r-function-usage-frequencies-take-2/ rather than http://www.johnmyleswhite.com/notebook/2009/12/07/r-function-usage-frequencies/.
Posted by: John Myles White | December 09, 2009 at 11:10
I don't think "number of times called" is quite the right metric for "most important" - for the simple reason that some functions are important because you don't have to keep calling them.
For example, functions like those in the "apply" family (apply, tapply, sapply and so on) are important precisely because they often do all the "work" in a single call.
There are a number of other functions that fall into this category.
Certainly functions like c and if are important, even fundamental, and you can't get far without them, but I think this measure of importance penalizes functions that rarely need to be called precisely because each call does so much.
In fact, I probably use ifelse (operating on an entire matrix) more than I use if in regular work, but if I were writing a lot of "production quality" functions, then if would need to be used more (for handling all the special cases that crop up that I'm usually able to ignore if I'm just doing something for myself).
Posted by: GB | December 09, 2009 at 12:47
Looks like a textbook example of Zipf's Law, with one 'blip' ('function'?).
Posted by: Jason Barrett | December 10, 2009 at 16:24