
May 12, 2016



Here are the things I tell people to look out for as potential indicators for the "quality" of a package (in no particular order):

- written by a known expert in the field
- package has been around for some time
- package has been updated
- listed under a task view
- has a vignette or other supporting documentation
- there is a package website
- paper/book about package has been published
- help files are comprehensive and free of errors
- has been cited in papers

None of these is a necessary or a sufficient condition for showing that a package is of high quality (which is difficult to define in the first place), but they can be helpful indicators.

I am curious: does the 25% figure include packages that are on Bioconductor, or just those on CRAN? A requirement for inclusion on Bioconductor is that a package must have at least one vignette, which I think is a very good policy.

You write, "For an R package, the first obvious place for an author to provide quality documentation is the vignette." I would argue this is not the first obvious place to look for quality documentation. READMEs are the first place, as they are what gets displayed first on a GitHub page. The very historical nature of READMEs tells the user to "read this first". You make what I think is a faulty inference from the initial assumption that vignette == quality (I agree with this): your logic also implies !vignette == low quality (which, judging by the remainder of your post, you also assume). This is likely not a tenable assumption given the prevalence of READMEs and of GitHub as the popular dev environment. If this were a regression model, you would have one variable and an R^2 showing that vignettes are a predictor, but the model falls short of explaining package quality (i.e., significance but a great deal of unexplained variance). It's pretty tedious to maintain separate READMEs and vignettes, so many choose the README as the natural, obvious place to provide quality information. Adding README length (nchar) as a variable to your model may be more work but would likely give a better model of package quality. Granted, you do give a paragraph disclaimer that vignettes aren't the only measure, but your title implies !vignette == bad. Perhaps your model is measuring how much developers value vignettes as the primary form of communication, not package quality as your title suggests. I suspect your model is a measure of developer groups (e.g., academics, business, etc.) and their choices of mode of communication, not quality.

You have raised an important topic. As Bill Venables wrote on R-Help in 2007, "Most packages are very good, but I regret to say some are pretty inefficient and others downright dangerous."

Having good examples in the help with real datasets is another plus point. If authors have never tried their own code in practice, they may well have missed something.

Tyler Rinker suggests GitHub README files are more important than vignettes. I disagree. GitHub is a developer site, not a user site. Users see the documentation that comes with the package and is displayed within R.

Maintaining a web page to support a package is a good thing, and the README file would be the place to hold the content if the package were hosted on GitHub, but requiring GitHub as the host is too limiting, and depending on an external web site for documentation is too fragile. Packages should contain their documentation.

I think Joe Rickert underestimates the importance of the help page documentation. Antony Unwin said that having good examples with real datasets is a plus. Those probably belong in vignettes, but I'd add that having short illustrative examples on the help pages is really important.

Duncan, thanks for your insights. I think I may have overstated the importance of a README. I didn't mean to suggest it is more important, only that it is an alternative place to display package usage and can also demonstrate quality. The README is also typically part of the package build, and CRAN displays it nicely if it's an .md file, though it is not as readily accessible within R as a vignette or help page.

I think there's room for an easy way to maintain a README that also serves as a vignette, without maintaining two documents. This is especially true for smaller packages with only a few functions, where the use could easily be described in a single vignette. It'd be nice to be able to maintain just one document as both README and vignette, but the two are most likely just a tad different. A Google search brings up this related Twitter discussion on the topic: https://twitter.com/JennyBryan/status/724391823385354240

Vignettes are less important than manuals, IMO. Vignettes are just a way to wrap what is in the manuals. A vignette is important for showing the bigger picture of how to combine multiple functions from a package, but it is just an example workflow. The real documentation is the manuals.

