There's an interesting discussion thread underway on LinkedIn about the relative merits of R versus SAS in the commercial sector. Oleg Okun kicks off the discussion with this question:
Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you think, despite being free and having a lot of packages, R is still not a favorite in Data Mining/Predictive Analytics in the corporate world?
What follows is an in-depth discussion (more than 130 comments so far) comparing the two statistical software systems. Steve Miller condenses the discussion in a great post at the Information Management blog. Themes covered include: the benefits and purported risks of using open-source software vs commercial software; dealing with large data sets (one R user notes: "I've used a very fast (~16Tb RAM) computer to run simulations on hundreds of billions of observations"); availability of skills for new hires ("Many of our customers have the problem of needing to spend the time and money to train new hires in SAS because their new hires have only used R"); availability of support for R (Revolution Analytics provides support for R); and many other topics. One sub-thread focused on quality in open-source software, for which Steve had an excellent riposte:
There's little argument that the vast international R community provides access to the latest statistical models and procedures before they're available in proprietary SAS. But SAS proponents counter that R users assume more risk with software quality than do those of SAS. In fact, an oft-quoted comment from a SAS executive on the "benefits" of R goes something like “I think it addresses a niche market for high-end data analysts that want free, readily available code. We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.” My take after 8 years of heavy R usage is that I've never worked with a more stable, bug-free piece of software.
Check out the full thread and contribute to the discussion on LinkedIn.
Information Management: LinkedIn Advanced Business Analytics – SAS Vs. R
R is stable? Has he ever looked at the NEWS file for R?
I use R daily. Almost every single .0 release of R has changes to the core that break my packages/scripts. The upcoming 2.14.0 release is doing that by introducing required NAMESPACEs, forcing me to re-write some code that was originally written to work with both S3 and S4 generics.
I have sent 100+ bug reports to R-core and package developers over the past 10 years. Way more than with SAS. My most recent bug report to R-core was flatly rejected, probably because it was a very rare bug and would have taken a fair bit of work to fix. Fair enough for a volunteer project, but not the kind of attitude that a commercial product would have.
For mission-critical use, the only way to keep R reliable and stable is to set it up with the necessary packages and then never update/upgrade.
I love R, but if my life depended on R or SAS, I would choose SAS.
Posted by: Kevin Wright | September 02, 2011 at 14:03
SAS has had quite a few bugs over the years as well, to the point where I have had instructions "not to debug SAS." Don't forget the infamous and embarrassing "where/by" bug (where using a where clause with a by statement in SAS 9.1.3 caused one observation per by group to be deleted).
R and SAS are both huge pieces of software. They both have many bugs, and SAS certainly doesn't win in this area in my experience.
Posted by: John Johnson | September 02, 2011 at 19:15
I agree that the proprietary vendors face a challenge, but I think it will take longer than 10 years for them to end up like Sun. That's because the migration from Solaris to Linux was a relatively easy one compared to the SAS to R conversion. Companies have thousands of SAS programs to convert to R.
Regarding big data, I don't think virtual memory has anything to do with SAS's advantage. For example, when doing linear regression, simply storing sums of squares and crossproducts is sufficient to solve the problem regardless of the number of observations involved. SAS could analyze billions of records back when mainframe memory was tiny compared to today's desktops. Of course some algorithms require all data be in memory at once, but for those SAS and R face the same challenge.
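Bob's point about sufficient statistics is worth making concrete: for ordinary least squares, accumulating X'X and X'y over chunks of data gives the same coefficients as fitting on the full data set, with memory usage independent of the number of observations. Here is a minimal Python/NumPy sketch of that idea (my illustration, not code from the discussion):

```python
import numpy as np

def streaming_ols(chunks):
    """Fit OLS by accumulating X'X and X'y one chunk at a time,
    so the full data set never has to be in memory at once."""
    xtx, xty = None, None
    for X, y in chunks:
        X = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
        if xtx is None:
            xtx = np.zeros((X.shape[1], X.shape[1]))
            xty = np.zeros(X.shape[1])
        xtx += X.T @ X   # accumulate crossproducts
        xty += X.T @ y
    # Solve the normal equations: beta = (X'X)^{-1} X'y
    return np.linalg.solve(xtx, xty)

# Two chunks drawn from the exact relationship y = 2 + 3x,
# processed one at a time as a mainframe-era program might.
rng = np.random.default_rng(0)
chunks = []
for _ in range(2):
    x = rng.normal(size=(500, 1))
    y = 2 + 3 * x[:, 0]
    chunks.append((x, y))

beta = streaming_ols(chunks)
print(beta)  # intercept and slope recover 2 and 3
```

The same trick underlies SAS's ability to regress on billions of records with tiny memory; as Bob notes, it does not extend to algorithms that genuinely need all observations at once.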
For anyone interested in seeing how many data analysis tasks are done in SAS, R, SPSS & Stata, see http://r4stats.com.
Cheers,
Bob Muenchen
Posted by: Bob Muenchen | September 07, 2011 at 09:54
Hm. This was supposed to be a comment on Steve Miller's site, but somehow it ended up here!
Posted by: Bob Muenchen | September 07, 2011 at 09:57
In my experience the R cognoscenti do not like to involve themselves with mundane matters like "quality control". Recently, Zhang et al. 2011 published some simulation results indicating serious problems with the lme4 package. I verified some of the results and posted to the R list. There was absolutely no response whatsoever.
For comparison I used AD Model Builder which is free software. It got results close to those reported by Zhang et al. for SAS NLMIXED.
I certainly would not use R for any serious mixed model analysis.
The link is: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2011q4/006953.html
Posted by: dave fournier | December 02, 2011 at 15:09
The new version of SAS is much better and has fewer bugs. We use it in our company. I choose SAS.
Posted by: Mirek Burnejko | January 28, 2012 at 02:54