By Joseph Rickert
Anyone seeking to learn R faces two major challenges: (1) learning to swim in the sea of information (R packages, books, websites, blog posts, message boards, etc.) that threatens to drown a newbie, and (2) coming to grips with the structure, syntax and features of the language itself. Having some idea of what one wants to do with R is clearly an important first step that will set the path of learning. R, an open source computer language, is the premier software system for statistical computing. Not only can any statistical idea be expressed in R, it is likely that someone in the open source community has already written a function to accomplish, or at least facilitate, any statistical analysis a working statistician or data scientist might be contemplating.
R functions are organized into libraries or packages that usually relate to some particular statistical task. Assuming something like an average of 20 functions per package, the 3400 available contributed packages[1] offer some 68,000 routines to read in data, manipulate it, analyze it and visualize the results. No one could possibly become familiar with all of these. But, because R is an interpreted (instant feedback) language that encourages experimentation, some serious, sophisticated statistical analyses can be accomplished by stringing together the appropriate functions into a script. If one's interest in R is only to perform some particular analysis, then a beginner’s best bet might be to select one of the 100 or so books or blogs on doing statistics with R that provides relevant sample code, and cut and paste to get a workable script. There is no shame in this. That is why all the open source authors went to the trouble of packaging up their work.
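To give a flavor of what stringing functions together into a script can look like, here is a minimal sketch that uses only base R and the built-in mtcars data set. The data set, the variable names and the choice of a simple linear regression are assumptions made for illustration; they are not taken from any particular book or package.

```r
# Illustrative sketch only: mtcars stands in for data that would
# normally be read from a file with a function such as read.csv().
cars <- mtcars

# Manipulate: mtcars records weight in units of 1000 lbs; convert to kg.
cars$wt_kg <- cars$wt * 1000 * 0.4536

# Analyze: regress fuel economy on weight.
fit <- lm(mpg ~ wt_kg, data = cars)
print(summary(fit))

# Visualize: scatter plot with the fitted line overlaid.
plot(cars$wt_kg, cars$mpg,
     xlab = "Weight (kg)", ylab = "Miles per gallon")
abline(fit)
```

Each step is a single function call, and each line can be run and inspected at the console before moving on to the next, which is the instant feedback that makes this style of work practical.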
However, if a person really wants to be able to speak the R language and become a competent R programmer then, at the present time, one can find no better guide than Norman Matloff’s The Art of R Programming. Professor Matloff is a statistician and a computer scientist with a considerable amount of teaching experience. His book is no mere programming reference guide. It is a carefully crafted sequence of lessons that start at the beginning and work up to some fairly advanced topics, including a lucid account of object-oriented programming in R, a presentation of the rudiments of TCP/IP operations and a discussion of R programming for the internet, examples of parallel programming with R, and a discussion spanning several chapters of how to write production-level R code that includes methods and advice on debugging R code, writing efficient R code, and interfacing R with other languages. Other distinguishing features of the book are brief examples showcasing a large number of functions (including rare gems such as D() for symbolic differentiation) that indicate the power and scope of R, and over thirty “Extended Examples”, each of which is a credible study in writing careful, professional code. The most captivating aspect of the book, however, is Matloff’s thoughtful manner of exposition. R’s rich, compact syntax can be challenging the first time around. Matloff knows where the difficulties are. His presentations of R’s various features and functions begin from a point of view that anticipates the obstacles that are likely to confound someone going down the R path for the first time, and guide the novice around them. I expect that The Art of R Programming will appeal to a diverse audience of aspiring R programmers.
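For readers who have not met D(), the following short sketch shows the kind of thing it does. The function being differentiated and the variable names are chosen here for illustration and are not an example from the book.

```r
# Illustrative sketch: differentiate x^3 symbolically with respect to x.
f <- quote(x^3)
dfdx <- D(f, "x")
print(dfdx)

# The result is an unevaluated R expression, so it can be evaluated
# at any point. The derivative of x^3 at x = 2 is 3 * 2^2 = 12.
slope <- eval(dfdx, list(x = 2))
print(slope)
```

Because D() returns an ordinary R language object, the derivative can be passed to other functions, differentiated again, or wrapped in a function of its own.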
[1] As of 11/27/11