Bob Muenchen has recently updated his report on the popularity of statistical software. The updated analysis shows that the R community remains as strong as ever: the number of contributed R packages continues to grow exponentially, R maintains its dominance in online discussion, and it has 20 times as much content as other statistics packages on programming Q&A sites like CrossValidated and StackOverflow.
One particularly interesting metric is the number of times software packages are cited in scholarly articles, which measures how heavily those packages are used in academia. Google Scholar makes it possible to quantify this, and the chart below shows that SAS and SPSS have been in steep decline since around 2005:
Among the software packages showing growth in academia (i.e. all those except SAS and SPSS), R has the largest "market share" of citations, and its share continues to grow rapidly:
If you're interested in the details of how market share was calculated, or how to create a "market share" plot like this using the ggplot2 package, the librestats blog has everything you need at the link below.
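As a rough illustration of what a citation "market share" means, here is a minimal sketch that divides each package's citation count by the total for the packages included. The counts below are hypothetical numbers chosen for illustration, not the Google Scholar data, and the `market_share` helper is our own, not the librestats method. The example also shows why leaving SAS and SPSS out of the denominator inflates R's apparent share.

```python
# Hypothetical citation counts for a single year (illustrative only,
# NOT the actual Google Scholar figures from the report).
citations = {"SPSS": 400, "SAS": 300, "R": 120, "Stata": 60, "Minitab": 20}


def market_share(counts, exclude=()):
    """Return each package's fraction of total citations,
    ignoring any package named in `exclude`."""
    kept = {name: n for name, n in counts.items() if name not in exclude}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("total citation count must be positive")
    return {name: n / total for name, n in kept.items()}


# Share among all packages vs. share among the "growing" packages only.
shares_all = market_share(citations)
shares_growing = market_share(citations, exclude=("SAS", "SPSS"))

for label, shares in (("all packages", shares_all),
                      ("excluding SAS and SPSS", shares_growing)):
    print(label)
    for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {name}: {share:.1%}")
```

With these made-up counts, R holds about 13% of all citations but 60% once SAS and SPSS are dropped, which is the effect Bob Muenchen points out in the comments below.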
librestats: Statistical Software Popularity on Google Scholar
The decline in SPSS and SAS is very sharp, and it's curious that nothing is taking up the slack.
The APA publication manual now says that if commonly used software such as SAS or SPSS is employed, there is no need to reference it. I wonder if that accounts for some of the decline.
Posted by: Jeremy Miles | April 13, 2012 at 11:55
The rapid growth of R and equally rapid decline of SAS and SPSS in the Google Scholar data is quite pronounced. Students tend to keep using what they learned in college throughout their careers, implying an eventual effect that will be much broader than the academic data now demonstrates.
However, R's current market share is greatly overemphasized in that plot, which lacks SAS and SPSS. It mirrors Figure 7b in http://r4stats.com/popularity rather than the bigger picture shown in Figure 7a.
Posted by: Bob Muenchen | April 13, 2012 at 15:29
I'm surprised that R's share is so small. Working in a medical biology and informatics department, I just don't see anything else used. Plus, the R courses at our university are massively over-subscribed.
I suspect a lot more people use it but don't bother to cite it for simple graphics or mundane data manipulation, in much the same way most people don't cite Excel in papers. I always cite specific bioinformatics packages but don't reference ggplot2 all over my papers. I'm also not sure I always cite R rather than the packages. On the other hand, I think people cite expensive software because they usually have to justify buying it for a particular feature that they think free software didn't cover (which is increasingly less the case).
Posted by: Stephen | April 14, 2012 at 04:15
Stephen raises a good point. A good search is critical to getting quality data. The details of that search are described at http://librestats.com/2012/04/12/statistical-software-popularity-on-google-scholar. For R, I included the main web site address that citations are supposed to include, and each extra string I added, such as "Bioconductor" or "ggplot2 package", yielded a few percent more hits. Altogether they may have added 10% more hits. The same held for SAS when adding strings like "proc mixed".
I don't know Bioconductor well, but if you know of packages or functions that are popular please write me at muenchen.bob at gmail dot com. The names have to be unique to avoid spurious hits, but I suspect that's probably the norm in Bioconductor.
Posted by: Bob Muenchen | April 14, 2012 at 05:18
I can't help wondering about the accuracy of data involving a search for a single letter. 'R'? What were they thinking? Software developers have to quit with the cutesy titles.
Posted by: Helen | April 20, 2012 at 15:41