When it comes to getting things right in data science, most of the focus goes to the data and the statistical methodology used. But when a misplaced parenthesis can throw off your results entirely, ensuring correctness in your programming is just as important. A new book published by CRC Press, Testing R Code by Richard (Richie) Cotton, provides all the guidance you’ll need to write robust, correct code in the R language.
This is not a book exclusively for package developers: data science, after all, is a combination of programming and statistics, and this book provides lots of useful advice on helping to ensure the programming side of things is correct. The book suggests you begin by using the assertive package to provide alerts when your code runs off the rails. From there, you can move on to creating formal unit tests as you code, using the testthat package. There's also a wealth of useful advice for organizing and writing code that's more maintainable in the long run.
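To make that progression concrete, here is a minimal sketch of an assertion-guarded function and a couple of testthat unit tests for it. The function `geometric_mean` is a hypothetical example, not taken from the book, and the runtime checks use base R's `stop()` as a stand-in for the richer checks the assertive package offers.

```r
library(testthat)

# Hypothetical helper (not from the book): geometric mean of positive numbers.
geometric_mean <- function(x) {
  # Runtime assertions: fail fast with a readable message on bad input.
  # Base R is used here as an assumption-free stand-in for assertive's checks.
  if (!is.numeric(x)) {
    stop("x must be numeric, but has class '", class(x)[1], "'.")
  }
  if (any(x <= 0, na.rm = TRUE)) {
    stop("x must contain only positive values.")
  }
  exp(mean(log(x)))
}

# Formal unit tests: one for the expected result, one for the failure modes.
test_that("geometric_mean returns the expected value", {
  expect_equal(geometric_mean(c(1, 4)), 2)
})

test_that("geometric_mean rejects bad input", {
  expect_error(geometric_mean("a"), "must be numeric")
  expect_error(geometric_mean(c(1, -1)), "positive")
})
```

The assertions catch bad input while the code is running; the unit tests check, every time you change the function, that it still behaves as intended.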
If you are a package developer, there's lots here for you too. There are detailed instructions on incorporating unit tests into your package, and on how tests interact with version control and continuous integration systems. The book also touches on advanced topics relevant to package authors, like debugging C or C++ code embedded via the Rcpp package. And there's more good advice, like how to write better error messages so your users can recover more easily when an error does occur.
Testing is a topic that doesn't get as much attention as it deserves in data science disciplines. One reason may be that it's a fairly dry topic, but Cotton does a good job of making the material engaging, with practical examples and regular exercises (with answers in the appendix). Frequent and often amusing footnotes help make this an entertaining read given the material, and hopefully will motivate you to make testing a standard, early part of your R programming process.
Testing R Code is available from your favorite bookseller now, in hardback and electronic formats.