The RHadoop project, the open-source project supported by Revolution Analytics to integrate R and Hadoop, continues to evolve. Now available is version 2 of the rmr package, which lets R programmers write map-reduce tasks in the R language and run them within a Hadoop cluster. This update is the "simplest and fastest rmr yet", according to lead developer Antonio Piccolboni. While previous releases added performance-improving vectorization capabilities to the interface, this release simplifies the API while still improving performance (for example, by using native serialization where appropriate). This release also adds some convenience functions, such as one for taking random samples from Big Data stored in Hadoop. You can find further details of the changes here, and download RHadoop here.
RHadoop Project: Changelog