
March 26, 2009




If one goes to the CRAN page for contributed packages, one finds that it lists over 1715 packages and states that "All packages are tested regularly on machines running Debian GNU/Linux. Packages are also checked under MacOS X and Windows, but only at the day the package appears on CRAN." It makes no mention of any of the points you make, nor does it tell the user that R does not support these packages (as stated in http://www.r-project.org/doc/R-FDA.pdf). Would it not be best to have this information on the same page from which these packages can be downloaded?

Some are student projects, long since abandoned. Just as when using a SAS macro downloaded from a website, or installing a third-party Excel add-in, you'll need to rely on the reputation of the author (or the recommendation of trusted peers) when deciding whether to use such third-party code.

This is true up to a certain point, but unlike with other statistical software, an R package needs to meet formal criteria and pass the R package checker (R CMD check) before being admitted to CRAN. This is not at all the same as getting a macro from a web page, as the R package checker verifies the presence of documentation, the consistency of documentation and code, syntactic correctness, platform independence, etc.

This does not certify that the software does what one expects it to do, but it at least ensures that the package meets minimum quality standards.

The ability of R packages to include automated software tests inside the package (and have the R package checker run these tests) is another major strength of R packages, but that is a bit off topic, as this is merely a tool that can be used with different levels of sophistication, or not at all.

As with many products, one has to be aware of the potential limitations and familiarize oneself with the system. From time to time I have found problems in R, but I have also found problems in other systems. Besides the well-known problems in Excel, I remember facing problems with SAS GLM around 2002/3, where factors would be significant using one version but not using the latest version. The problem was subsequently fixed.

The core of R is becoming more polished with each release, but one always needs to be aware of the distinction between core and contributed packages. Good and informative article.

A reviewing system for contributed packages could alleviate the problem. This could be as simple as a commenting system coupled to CRAN, where package users can post their feedback on the packages.

This won't be a sure-fire way to guarantee package quality and correctness, but one could at least get an immediate idea about possible issues with a package.

The link to "analysis of clinical trial data" is definitely worth following up for those who work in a regulatory environment. The linked material should be read in full: it conveys the enthusiasm while honestly presenting some of the cons. For example, FDA guidance currently emphasizes end-user validation of software, and the summary slide of the Novartis link emphasizes the considerable resources required. More background information can be found on the MedStats mailing list, in particular the posting by Marc Schwartz on 26/03/2009 16:22 (which bears my name because I forwarded it to the list). Note that the above concern about resources applies to SAS, SPSS, etc., not just R. I guess the successful supplier will be the one that best facilitates this end-user validation.

Nice article! The top of each help file also shows what package a function is in. There is a helpful package reviewing site at http://crantastic.org/.




Got comments or suggestions for the blog editor?
Email David Smith.
Follow David on Twitter: @revodavid