At the Bay Area R User Group meeting this week, Antonio Piccolboni gave an overview of the design goals and implementation of the RHadoop Project packages that connect Hadoop and R: rhdfs, rhbase and rmr:
(The image above was captured from Antonio's slides.) The most revealing part of the talk for me was the comparison between implementing the K-means clustering algorithm the "standard" way (using Python, Pig and Java, as shown on slides 8-10) and implementing it in R alone (with the rmr package, shown on slides 14-15): the R version takes much less code and needs only a single language. Antonio expands on this example at the RHadoop wiki, which is a great place to start if you're looking to implement big-data statistical models with the rmr package.
RHadoop wiki: Comparison of high level languages for mapreduce: k means
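Part of why K-means fits so neatly into MapReduce is that each iteration splits into a map step (assign every point to its nearest center) and a reduce step (average the points assigned to each center). The sketch below simulates one such iteration in plain Python, including the shuffle that Hadoop would normally perform, to show that structure. It is not rmr's API and not the code from Antonio's slides or the wiki; the function names (`kmeans_map`, `kmeans_reduce`, `kmeans_iteration`, `kmeans`) are hypothetical, and it assumes points are tuples of numbers with a fixed set of starting centers.

```python
from collections import defaultdict


def closest_center(point, centers):
    # Index of the nearest center by squared Euclidean distance
    # (ties go to the lowest index).
    return min(
        range(len(centers)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])),
    )


def kmeans_map(point, centers):
    # Map: emit (cluster id, (partial sum, count)). For a single point the
    # partial sum is the point itself, so a combiner could pre-aggregate
    # these pairs before the reduce step.
    return closest_center(point, centers), (point, 1)


def kmeans_reduce(cluster_id, values):
    # Reduce: add up the partial sums and counts, then divide to get the mean.
    dim = len(values[0][0])
    total = [0.0] * dim
    n = 0
    for partial_sum, count in values:
        for d in range(dim):
            total[d] += partial_sum[d]
        n += count
    return cluster_id, tuple(t / n for t in total)


def kmeans_iteration(points, centers):
    # Shuffle: group mapper output by key, as the MapReduce framework would.
    groups = defaultdict(list)
    for point in points:
        key, value = kmeans_map(point, centers)
        groups[key].append(value)
    # Clusters that received no points keep their previous center.
    new_centers = list(centers)
    for key, values in groups.items():
        _, center = kmeans_reduce(key, values)
        new_centers[key] = center
    return new_centers


def kmeans(points, centers, iterations=10):
    # Each iteration corresponds to one full MapReduce job.
    for _ in range(iterations):
        centers = kmeans_iteration(points, centers)
    return centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)], iterations=5))
```

On Hadoop, the driver loop in `kmeans` would launch a separate job per iteration, with the current centers shipped to every mapper. That per-iteration orchestration is the glue code that the multi-language version has to spread across several tools.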