One of the nice things about R is that a lot of it is written in the R language. That means, as an R user, if you want to see how R calculates a certain statistic, or you want to modify an existing function for your own use, you can just look at the R code by typing the name of the function. Sometimes, though, you'll see just a couple of lines of R code even for quite complicated functions -- in these cases, the work is mostly being done in a lower-level programming language like C or Fortran via a call to .Internal or .Call. (Of course, since R is open source, you can take a look at the C or Fortran code as well, if you like.)
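To make that concrete, here is a small sketch you can paste into an R session. It prints one function written entirely in R (sd) and one thin wrapper (paste), then checks each body for a hand-off to compiled code. The helper name calls_compiled is made up for this illustration and is not part of R; the exact source printed will vary with your R version.

```r
# Typing a function's name without parentheses prints its source.
# sd() is implemented in R itself, so the whole calculation is visible.
sd

# paste() is a thin R wrapper: its body hands the work to compiled code.
paste

# Hypothetical helper (not part of R): does a function's R body contain
# a call to .Internal or .Call?
calls_compiled <- function(f) {
  # deparse() turns the function body into lines of text
  src <- paste(deparse(body(f)), collapse = "\n")
  grepl(".Internal(", src, fixed = TRUE) ||
    grepl(".Call(", src, fixed = TRUE)
}

calls_compiled(paste)  # TRUE
calls_compiled(sd)     # FALSE
```

A plain text search like this only looks at the top-level function you give it: sd reports FALSE even though the var function it calls eventually reaches compiled code.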
So the natural question follows: how much of R is written in the R language? The librestats blog did an analysis of the latest R source code distribution, and found that 22% of R's lines of code are in R.
For comparison, about 50% of R is written in C, and just under 30% in Fortran -- for the current 2.13.1 release, at least. Of course, these ratios have changed over time with each release of R, as an analysis of the R codebase from Ohloh shows.
Note that R's source distribution includes the language/graphics engine and only the base packages. All distributions of R also include the recommended packages, and most users install third-party packages from CRAN as well. For packages in general, the ratios are a bit different: when considering the 12.4 million (!) lines of code that make up the 3249 packages on CRAN, just shy of 50% of that code is in R. (Thanks again to librestats for the analysis.)
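If you'd like to try a rough tally of your own, percentages like these boil down to counting lines per language in a source tree. The sketch below is one simple way to do that in R and is not necessarily how librestats did it: the function name count_lines_by_ext is invented for this example, the mapping from file extension to language is a deliberate simplification (it ignores headers, C++ and other Fortran extensions), and it counts every line, including blanks and comments.

```r
# Hypothetical helper (not from librestats): percentage of lines per
# language under `root`, using a simplified extension-to-language mapping.
count_lines_by_ext <- function(root,
                               exts = c(R = "R", C = "c", Fortran = "f")) {
  counts <- vapply(exts, function(ext) {
    # Find all files ending in ".<ext>" anywhere below root
    files <- list.files(root, pattern = sprintf("\\.%s$", ext),
                        recursive = TRUE, full.names = TRUE)
    # Count every line in every matching file
    n <- vapply(files, function(f) length(readLines(f, warn = FALSE)),
                integer(1))
    as.numeric(sum(n))
  }, numeric(1))
  # Convert raw line counts to percentages of the total
  round(100 * counts / sum(counts), 1)
}

# Small self-contained demo on a throwaway directory
demo_root <- file.path(tempdir(), "loc_demo")
dir.create(demo_root, showWarnings = FALSE)
writeLines(c("x <- 1", "y <- 2"), file.path(demo_root, "a.R"))
writeLines(rep("int z;", 6), file.path(demo_root, "b.c"))
writeLines(rep("      END", 2), file.path(demo_root, "c.f"))
count_lines_by_ext(demo_root)
```

To run it on real code, point root at an unpacked R source tarball or a package's source directory.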
So in conclusion: while R itself is mostly written in C (with hefty chunks in R and Fortran), R packages are mostly written in R (with hefty chunks written in C/C++).
librestats: How Much of R is Written in R Part 2: Contributed Packages