I was honoured to be invited earlier this month to the Directions of Statistical Computing meeting in Brixen, Italy. DSC is one of two meetings run by the R Project and, unlike the useR! conference, it is a much smaller and more intimate meeting (DSC 2014 had about 30 participants). If you haven't come across the DSC meeting before (quite possible, given that it was last held in 2009), R Core Group member Martyn Plummer has a nice overview of DSC.
A focus of the first day of the conference was the performance of the R computation engine. The organizers invited representatives from all of the "alternative" R engine implementations, and I believe it marked the first time that developers involved with pqR, Renjin, FastR, Riposte and TERR were gathered in the same place. (The CXXR project was unfortunately not represented.) Jan Vitek [slides] presented a fascinating comparison of the various projects, based on his interviews with the developers.
It was interesting to see the commonalities in many of the approaches. Three projects, Renjin [slides], FastR [slides] and Riposte [slides], use just-in-time compilation and an optimized bytecode engine. All have achieved impressive performance gains, but have struggled with compatibility (especially with being able to run the 6000+ CRAN packages). But it's clear that their work is having an influence on R itself: Thomas Kalibera [slides] (who previously worked on the FastR project) is working with Luke Tierney and Jan Vitek to improve the performance of R's bytecode interpreter.
Other approaches are also being pursued to improve the performance of the R engine. Luke Tierney [slides] described new improvements in R 3.1 to streamline the reference counting system, and noted that several of the performance improvements implemented by Radford Neal [slides] in pqR have already been incorporated into the R engine. And Helena Kotthaus [slides] has done some very exciting work to profile the performance of the R engine, which has already led to performance improvements when virtual memory is being used.
Overall, it was exciting to see collaboration and research into R as a language, and especially the attention from the computer science community to the implementation of R. As Robert Gentleman (co-creator of R and conference lead) noted, R now has a new community beyond statisticians and data scientists: computer scientists. It's exciting to see how R is incorporating learning and innovation from this new community.
For more on DSC 2014, see the reports from Martyn Plummer on Day 1 and Day 2 of the conference. The full program, with links to download the slide presentations, is at the link below.
DSC 2014: Schedule (and slide downloads)
-- But it's clear that their work is having an influence on R itself
Legend has it (and I'm not spreading unfounded rumours, so far as I know :) ) that R-Core has been resistant to "anything" not done their way. It does seem highly inefficient for all of these projects to have to build a nearly complete R in order for R-Core (or whoever is gate-keeping) to accept evident improvements to R, but not the alternate implementations.
One wonders how long being poached will be acceptable to outsiders. Does anyone really want to be assimilated by the Borg? Again, :)
In a contemporaneous (in the R-Bloggers sense) post, Derek Jones mused about ISO-R, not seriously. OTOH, were there a language spec and test kit, alternate versions could be built with far less hassle.
There is precedent. Before Sun, then Oracle more overtly, killed off both the spec and the test kit, Java was thriving in an open source sense. Not so much now, I'd wager.
Posted by: Robert Young | July 28, 2014 at 06:06