
April 13, 2012



The decline in SPSS and SAS is very sharp, and it's curious that nothing is taking up the slack.

The APA Publication Manual now says that when commonly used software such as SAS or SPSS is employed, there is no need to reference it. I wonder if that accounts for some of the decline.

The rapid growth of R and equally rapid decline of SAS and SPSS in the Google Scholar data is quite pronounced. Students tend to keep using what they learned in college throughout their careers, implying an eventual effect that will be much broader than the academic data now demonstrates.

However, R's current market share is greatly overemphasized in that plot, which lacks SAS and SPSS. It mirrors Figure 7b in http://r4stats.com/popularity rather than the bigger picture shown in Figure 7a.

I'm surprised that R's share is so small. Working in a medical biology and informatics department, I just don't see anything else used. Plus these courses in our university are massively over-subscribed.

I suspect a lot more people use it but don't choose to cite it for simple graphics or mundane data manipulation, in much the same way most people don't cite Excel in papers. I always cite specific bioinformatics packages but don't reference ggplot2 all over my papers. I'm also not sure I always cite R rather than the packages. On the other hand, I think people cite expensive software because they usually have to justify buying it for a particular feature that they think the free software didn't cover (which is increasingly less the case).

Stephen raises a good point. A good search is critical to getting quality data. The details of that search are described at http://librestats.com/2012/04/12/statistical-software-popularity-on-google-scholar. For R I included the main web site that citations are supposed to include, and each string I added, such as "Bioconductor" or "ggplot2 package", brought a few percent more hits. Altogether they may have added 10% more hits. The same held for SAS, adding strings like "proc mixed".

I don't know Bioconductor well, but if you know of packages or functions that are popular please write me at muenchen.bob at gmail dot com. The names have to be unique to avoid spurious hits, but I suspect that's probably the norm in Bioconductor.
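The approach described in this comment, combining several distinctive strings into one search, might be sketched roughly as below. This is only an illustration and not the query actually used for the plot: the `build_query` helper is hypothetical, the term lists are assumptions apart from the strings quoted above, and the three-character minimum is an arbitrary stand-in for "unique enough to avoid spurious hits".

```python
def build_query(terms, min_length=3):
    """Join search strings into one OR query, quoting each as an exact phrase.

    Hypothetical helper for illustration; min_length is an arbitrary
    guard against very short, ambiguous terms such as a single letter.
    """
    quoted = []
    for term in terms:
        term = term.strip()
        if len(term) < min_length:
            raise ValueError(f"term too short to be distinctive: {term!r}")
        quoted.append(f'"{term}"')
    return " OR ".join(quoted)


# Assumed term lists: only "Bioconductor", "ggplot2 package" and
# "proc mixed" come from the comment; the others are placeholders.
r_terms = ["r-project.org", "Bioconductor", "ggplot2 package"]
sas_terms = ["SAS Institute", "proc mixed"]

r_query = build_query(r_terms)
sas_query = build_query(sas_terms)
print(r_query)
print(sas_query)
```

Each added phrase can only widen the set of matches, which fits the observation that every extra string brought a few percent more hits. A bare "R" is rejected by the length guard, which is why the sketch searches for the project's web site instead.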

I can't help wondering about the accuracy of data involving a search for a single letter. 'R'? What were they thinking? Software developers have to quit with the cutesy titles.

