During a discussion with some other members of the R Consortium, the question came up: who maintains the most packages on CRAN? DataCamp maintains a list of most active maintainers by downloads, but in this case we were interested in the total number of packages by maintainer. Fortunately, this is pretty easy to figure out thanks to the CRAN repository tools now included in R, and a little dplyr (see the code below) gives the answer quickly[*].
And the answer? The most prolific maintainer is Scott Chamberlain from rOpenSci, who is currently the maintainer of 77 packages. Here's a list of the top 20:
    Maint                  n
 1  Scott Chamberlain     77
 2  Dirk Eddelbuettel     53
 3  Gabor Csardi          50
 4  Hadley Wickham        41
 5  Jeroen Ooms           40
 6  ORPHANED              37
 7  Thomas J. Leeper      29
 8  Bob Rudis             28
 9  Henrik Bengtsson      28
10  Kurt Hornik           28
11  Oliver Keyes          28
12  Martin Maechler       27
13  Richard Cotton        27
14  Robin K. S. Hankin    25
15  Simon Urbanek         24
16  Kirill Muller         23
17  Torsten Hothorn       23
18  Achim Zeileis         22
19  Paul Gilbert          22
20  Yihui Xie             21
[Update Mar 23: updated the R code and the results to treat Gabor Csardi and Gábor Csárdi as the same person, and corrected a trailing space issue that failed to count 2 of Hadley Wickham's packages.]
(That list of orphaned packages with no current maintainer includes XML, d3heatmap, and flexclust, to name just 3 of the 37.)
Here's the R code used to calculate the top 20:
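For readers who want to try a tally like this themselves, here is a minimal sketch. It is not the post's original listing: it uses base R rather than dplyr so that it runs without extra packages, and the helper name `top_maintainers` and the toy data are made up for illustration. It assumes a data frame with `Package` and `Maintainer` columns, such as the one returned by `tools::CRAN_package_db()`, and only strips the email address from each entry.

```r
# Hypothetical helper (not from the original post): count packages per
# maintainer and return the n most frequent names.
# `db` is assumed to have "Package" and "Maintainer" columns.
top_maintainers <- function(db, n = 20) {
  # Keep one row per package in case a package is listed more than once
  db <- db[!duplicated(db$Package), ]
  # Drop the "<email>" part, then surrounding whitespace, so that
  # "Name <a@b>" and "Name  <a@b>" are counted as the same maintainer
  maint <- trimws(sub("<[^>]*>.*$", "", db$Maintainer))
  # Tabulate and sort from most to fewest packages
  counts <- sort(table(maint), decreasing = TRUE)
  out <- data.frame(Maint = names(counts),
                    n = as.integer(counts),
                    stringsAsFactors = FALSE)
  head(out, n)
}

# Toy data with made-up maintainers, so the sketch runs offline
toy <- data.frame(
  Package = c("a", "b", "c", "d"),
  Maintainer = c("Ann Example <ann@example.org>",
                 "Ann Example  <ann@example.org>",
                 "Bo Sample <bo@example.org>",
                 "ORPHANED"),
  stringsAsFactors = FALSE
)
top <- top_maintainers(toy, n = 2)
print(top)

# Against the live CRAN database (needs network access):
# top_maintainers(tools::CRAN_package_db())
```

On real data the names need more cleaning than this before the counts can be trusted, as the footnote explains.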
[*] Well, it would have been quick, until I noticed that some maintainers had two forms of their name in the database, one with surrounding quotes and one without. It seemed like it was going to be trivial to fix with a regular expression, but it took me longer than I hoped to come up with the final regexp on line 6 above, which is now barely distinguishable from line noise. As usual, there's an xkcd for this situation:
So this post reminds me of something I'd really like to get feedback on. Quite a bit of the serious statistical analysis in R relies on the packages in CRAN. What happens when those packages are no longer maintained? Is there anyone out there who has put together a contingency plan for such a situation in their business?
Posted by: Jonathan Taylor | March 22, 2018 at 13:15
The biggest problem with orphaned packages is that they may be removed from CRAN if they start throwing errors in new versions of R. That's how packages become orphaned, actually: when maintainers stop responding to CRAN emails. It's not *always* a death sentence though; several of those orphaned packages have been in that state for years.
What you should plan on is supporting older versions of R in your systems, rather than committing to start maintaining old packages in new versions of R. That's best practice in a production system, anyway. Yesterday's post about Rocker provides some good tools to create containers with fixed versions of R and compatible packages, so you don't need to worry if they become orphaned later. Provided you trust the results today, of course.
Posted by: David Smith | March 22, 2018 at 14:58
A simpler regular expression is [...].
Posted by: Kent Johnson | March 22, 2018 at 17:47
@Kent I think that was my first iteration, but the data also has email addresses delimited by < > that I needed to strip. Also some of the names had suffixes like ", companyname" or "(title)" that I needed to strip as well. It all seemed so simple at first...
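The clean-up steps described in this thread (surrounding quotes, email addresses in < >, and suffixes such as ", companyname" or "(title)") can be sketched as a series of small substitutions rather than one dense regexp. This is only an illustration under those stated assumptions, not the expression used in the post. The helper name `clean_maintainer` and the sample names are hypothetical, and it does not try to fold accented and unaccented spellings of a name together.

```r
# Hypothetical helper (not the post's regexp): reduce a CRAN-style
# "Maintainer" entry to a bare name, one clean-up step at a time.
clean_maintainer <- function(x) {
  x <- sub("[[:space:]]*<[^>]*>.*$", "", x)  # drop "<email>" and anything after
  x <- trimws(x)
  x <- sub("^[\"']+", "", x)                 # drop leading quote characters
  x <- sub("[\"']+$", "", x)                 # drop trailing quote characters
  x <- sub("[[:space:]]*\\(.*$", "", x)      # drop a "(title)" suffix
  x <- sub(",.*$", "", x)                    # drop a ", companyname" suffix
  trimws(x)
}

# Made-up entries covering each case mentioned above
examples <- c(
  "Jane Doe <jane@example.org>",
  "\"Jane Doe\" <jane@example.org>",
  "Jane Doe, Example Corp <jane@example.org>",
  "Jane Doe (Director) <jane@example.org>",
  "Jane Doe  <jane@example.org>",
  "ORPHANED"
)
cleaned <- clean_maintainer(examples)
print(unique(cleaned))
```

Splitting the work into steps makes each rule easy to check against a problem entry, at the cost of a few more lines than a single expression. Only quotes at the very start or end of a name are removed, so an apostrophe inside a name is left alone.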
Posted by: David Smith | March 23, 2018 at 07:07