By David Smith
I was on a panel back in 2009 where Bo Cowgill said, "The best thing about R is that it was written by statisticians. The worst thing about R is that it was written by statisticians." R is undeniably quirky, especially to computer scientists, and yet it has attracted a huge following for a domain-specific language, with more than two million users worldwide.
So why has R become so successful, despite being outside the mainstream of programming languages? John Cook adeptly tackles that question in a 2013 lecture, "The R Language: The Good The Bad And The Ugly" (embedded below). His insight is that to understand a domain-specific language, you have to understand the domain, and statistical data analysis is a very different domain from systems programming.
I think R sometimes gets a bit of an unfair rap for its quirks. In fact, these design decisions, made in the interest of making R extensible rather than fast, have enabled some truly important innovations in statistical computing:
- The fact that R has lazy evaluation allowed for the development of the formula syntax, so useful for statistical modeling of all kinds.
- The fact that R supports missing values as a core data value allowed R to handle real-world, messy data sources without resorting to dangerous hacks (like using zeroes to represent missing data).
- R's package system, a simple method of encapsulating user-contributed functions for R, enabled the CRAN system to flourish. The pass-by-value system and naming notation for function arguments also made it easy for R programmers to write functions that others could readily use.
- R's graphics system was designed to be extensible, which allowed the ggplot2 system to be built on top of the "grid" framework (and to influence the look of statistical graphics everywhere).
- R is dynamically typed and allows functions to "reach outside" of scope, and everything is an object — including expressions in the R language itself. These language-level programming features allowed for the development of the reactive programming framework underlying Shiny.
- The fact that every action in R is a function call, operators included, allowed for the development of new syntax models, like the %>% pipe operator in magrittr.
- R gives programmers the ability to control the REPL, which allowed for the development of IDEs like ESS and RStudio.
- "for" loops can be slow in R, which ... well, I can't really think of an upside for that one, except that it encouraged the development of high-performance extension frameworks like Rcpp.
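A few of the items in the list above can be seen directly at the R prompt. What follows is only a minimal sketch in base R, not something taken from John Cook's lecture. The names `%then%` and `show_expr` are made up for illustration, and `%then%` merely imitates the idea behind magrittr's pipe; it is not how magrittr implements `%>%`.

```r
# Missing values are a first-class value, not a sentinel such as 0.
x <- c(2, NA, 4)
mean(x)                 # NA: the gap propagates instead of being hidden
mean(x, na.rm = TRUE)   # 3: the gap is dropped explicitly

# Operators are ordinary functions, so they can be called by name...
`+`(1, 2)               # 3

# ...and new infix operators can be defined by the user.
# `%then%` is a hypothetical stand-in for a pipe operator.
`%then%` <- function(lhs, f) f(lhs)
c(1, 4, 9) %then% sqrt %then% sum   # 6

# Arguments are evaluated lazily, so a function can capture the
# unevaluated expression it was handed.
show_expr <- function(e) deparse(substitute(e))
show_expr(a + b * 2)    # "a + b * 2", although a and b are never defined

# A formula is itself an unevaluated language object that
# modeling functions can inspect.
f <- y ~ x1 + x2
all.vars(f)             # "y" "x1" "x2"
```

None of this needs an add-on package, which is rather the point: the hooks that formulas and pipe operators rely on are part of the language itself.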
Some languages have some of these features, but I don't know of any language that has all of these features — probably with good reason. But there's no doubt that without these qualities, R would not have been able to advance the state of the art in statistical computing in so many ways, and attract such a loyal following in the process.