A quote from SAS's Anne Milley (director of technology product marketing) in the recent NYT article about R has stirred up a bit of controversy on open-source blogs and mailing lists. The quote in question is this:
“We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet.”
Many commenters pointed out that "freeware" isn't an appropriate description of R, which is in fact open source software. The defining feature of R isn't that it is available without cost (indeed, some vendors sell supported versions of R for a subscription fee); it is that R is open software whose source code is freely available for anyone to inspect, verify, and modify. (The GNU Project provides an essay making the distinction between free and open source software better than I can.) In response to the controversy, Milley posted a follow-up on the SAScom magazine blog:
My remark reflects a key difference between R and SAS, that of support, reliability, and validation. Customers value SAS for many things, including our extensive testing, documentation, 24/7 support, and training. In contrast, the quality of proliferating R packages is varied and uneven, especially in complex analytical modules. Mistakes in these packages can lead to misleading results, even for experienced users.
REvolution Computing provides support and validation services for R so that it can be used in environments where reliability and quality must be assured -- I'll have more to say about that in a subsequent post. But for now I'd like to focus on the narrow aspect of the aforementioned "proliferating R packages".
This really does seem like a specious argument to me. Just as R has many packages contributed by users and available for download, there are
thousands of sites where one can download SAS code and procedures for different kinds of analyses. In either case, whether one trusts the results is a function of one's trust in the source of the code, and the collective evaluations of those who have inspected the code and the results. The number of packages available for R has no bearing on the quality of R, in the same way that contributed code for SAS has no bearing on the quality of SAS itself.
To be fair, R's nomenclature does confuse the issue a little. In R, there are a number of standard packages (the "base" and "recommended" packages) which, like R itself, are part of the official R distribution and are under the oversight of the R core group. (These are of a different class than the contributed packages, which are downloaded separately from CRAN.) These standard packages have undergone the same degree of rigorous scrutiny as the core of R itself, and the statistical community has repeatedly verified the results of these packages against the statistical literature.
But is this "community standard" enough for R to be used in validated environments, such as the analysis of clinical trials, where SAS is often used? Since R is already used in such environments, the answer is self-evident, but that's a topic I'll pick up in more detail in another post.