I wasn't at the Use of R in Official Statistics (uRos2018) conference in the Netherlands last month, but I'm thankful to Jeroen Ooms for sharing the slides from his keynote presentation. In addition to being a postdoc staffer at ROpenSci, Jeroen maintains the official repository for the daily R builds on Windows — you might recognize his name from the verification certificate that pops up when installing R on Windows. His uRos2018 talk provides a fascinating glimpse into the complex systems, dependencies, and processes that come together to make installing R as easy as as double-click.
On thing I found if interest is that while R is a complex program in its own right — made up of a mix of C (32%), Fortran (24%) and R (37%) code — it also relies on a number of external libraries to perform some of its underlying tasks. For example:
- When you multiply two matrices or perform other linear algebra operations, R uses the BLAS and LAPACK libraries for the calculation. (Microsoft R open swaps in the Intel Math Kernel Library here, which provides the same calculations using different algorithms.)
- When you download a file with
download.file
orinstall.packages
, R uses the libcurl library to manage the network transfer. - When you use character data with foreign encodings (for example text with accents, symbols, and emojis), R uses the libicu library so those strings can be handled and compared in a uniform way.
- When you use regular expressions with
grep
andregexpr
, R uses the PCRE library to parse the syntax and perform the search operation. - For graphics, R uses the cairo library to render points, lines, and text as a bitmap image for display and/or export.
Using these libraries means that the R maintainers don't need to reimplement the wheel, but can instead use open-source libraries that are used in a plethora of systems to accomplish these universal tasks. On Linux-based systems, these libraries are generally pre-installed for any application to use. But on Windows (and Mac), R needs to build these libraries for its own use, which makes building R a complex process. Not only do you have to have the right set of compilers and scripts to build R itself, but also all its library dependencies as well. R packages may, in turn, depend on other third-party libraries and in that case the same issues apply to building them, too.
You can find all of the scripts used to create the official releases and daily builds (r-patched and r-devel) for R for Windows on Github. Brian Ripley and Duncan Murdoch (and now, Jeroen Ooms) have long provided RTools: a collection of Windows programs (shell commands, other programs) which, given the right compiler, will allow R's build scripts to run. One other interesting bit of news from Jeroen's keynote is that an experimental version of RTools is on the way, which promises to make it easier to build, distribute and install external libraries on Windows.
If you're at all interested in the inner workings of what it takes to make R run on Windows, I recommend checking out Jeroen's talk at the link below.
Jeroen Ooms: The R infrastructure
Comments
You can follow this conversation by subscribing to the comment feed for this post.