by Joseph Rickert
The 8th XLDB (Extremely Large Databases) Conference opened at Stanford on Tuesday with an outstanding program. This conference has been providing leadership in the "Big Data" world since its first workshop, held in 2007. For example, the summary report for that year notes: "Both communities (industry and science) are moving towards parallel ... architectures on large clusters of commodity hardware, with the map/reduce paradigm as the leading processing model." But it also observes that "The map/reduce paradigm ... will likely not be the final answer": prescience and a sober assessment, with none of the hype that was to follow.
The extraordinary feature of the first day of this year's conference was the prominence of R. Several talks were either directly about R or discussed R in connection with a significant subtopic. John Chambers spoke on "R in the World: Interfaces between Languages". Karim Chine presented ElasticR. Hannes Mühleisen elaborated on some innovative ideas in his talk "R as a Query Language", describing a system based on Renjin (R on the JVM) for using R to write efficient queries. Jeff Lefevre discussed HP's DistributedR in his talk on "Extending Vertica with External Analytics". Rene Brun described the ROOT-R package and Rcpp in his talk "ROOT: a Data Storage and Analysis Framework", about the framework used at CERN. Nachum Shacham mentioned both R and the R/H2O/Hadoop interface in his opening talk, "On the Practice of Predictive Modeling with Big Data".
Even Stephen Wolfram obliquely referred to R! He began his special keynote talk, and his very impressive impromptu demo of the Wolfram Language, with a statement that went something like this: "Unlike other languages that have a very small core and add features through packages, we decided to build as much as possible into the language." The exact quote will have to wait until the video is available, but it very much seemed to me that, at least with respect to design, he was positioning the Wolfram Language (the combination of Mathematica and Wolfram Alpha) as a kind of anti-R!
The slides from all of the talks will be available on the conference program page in a couple of days, and the conference videos will follow in June. In the meantime, through the kindness of Hannes Mühleisen and the conference organizers, we have Hannes' slides and those of Rene Brun and John Chambers available for download.
The following slide from Hannes' presentation indicates how R might be made more efficient by adopting certain SQL sensibilities, and it seems to share the spirit of data.table.
Rene's presentation contains several informative slides. Be sure to check out slide 11, which shows when C/C++ overtook Fortran; slide 29, which gives an overview of the core ROOT Math/Stat libraries; and slide 40, which shows how R, Rcpp and RInside fit in.
John Chambers' presentation begins with a reminder that the original S language was conceived as an interface to the Fortran libraries, the outstanding computational resource of the day. It then stresses that R's interfaces to other languages and resources, such as databases, are among its greatest strengths.
He then elaborates on his three principles for understanding R and describes the motivations, architecture and design of the new group of "XR" packages he is working on. When complete, these will provide a uniform interface to languages as diverse as Python and Julia, along with proxies to objects, functions and classes that will benefit both end-user programmers and developers.
If XLDB 8 turns out to be as prescient as its predecessors at pointing to the direction in which big databases will go, then the future will bring some pretty exciting developments to R.