If you're looking to apply massively parallel resources to an R problem, the most time-consuming part might not be the computations themselves, but setting up the cluster in the first place. You can use Amazon Web Services to set up the cluster in the cloud, but even that takes some time, especially if you haven't done it before.
Jeffrey Breen created his first AWS cluster this weekend, and in just 15 minutes he had demonstrated how to use 5 nodes to generate and analyze a billion simulations in R. It was a toy example, sure (estimating pi), but it's a great illustration of how quickly you can set up a parallel computing environment for R. Jeffrey used JD Long's segue package, which works with the Hadoop Streaming service on AWS. The segue package is still experimental, but this is a great demonstration of applying cloud-based hardware to parallel problems in R.
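For the curious, here is a minimal sketch of the kind of workflow Jeffrey describes, not his exact code: the estimate_pi helper and the batch sizes are assumptions for illustration, and it presumes your AWS credentials have already been registered with segue.

  # A minimal sketch of the segue workflow (assumes AWS credentials
  # are already set up for the package)
  library(segue)

  # hypothetical helper: one Monte Carlo batch estimating pi from n random points
  estimate_pi <- function(seed, n = 1e6) {
    set.seed(seed)
    x <- runif(n)
    y <- runif(n)
    4 * mean(x^2 + y^2 <= 1)   # fraction of points inside the unit quarter-circle, times 4
  }

  # spin up a small Elastic MapReduce cluster
  myCluster <- createCluster(numInstances = 5)

  # farm the batches out with segue's parallel lapply, then shut the cluster down
  estimates <- emrlapply(myCluster, as.list(1:1000), estimate_pi)
  stopCluster(myCluster)

  mean(unlist(estimates))   # combined estimate of pi

With 1,000 batches of a million draws each, that's a billion simulations in total; the appeal is that the cluster setup, job submission, and teardown are all driven from the R prompt.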
Jeffrey Breen: Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R
Thanks for the shout out, David!
My main goal with the Segue package was to illustrate how to abstract "the cloud" away with a combination of (hopefully) smart defaults and simple language abstractions.
OK, that's a lie. My first goal was to get my own simulations to run faster. My second goal was the abstraction mumbo jumbo mentioned above.
-JD
Posted by: JD Long | January 10, 2011 at 16:03
Karmasphere is giving developers and analysts the power to mine and explore Big Data on Hadoop.
Analysts:
Write and Prototype SQL
Profile and Diagnose Queries
Generate Datasets and Visualize
Developers:
Develop, Test, and Debug MapReduce
Profile and optimize
Deploy to Any Hadoop
karmasphere.com
Posted by: John Murphy | January 13, 2011 at 14:07
I'm interested in this webinar; can you please send me the URL for tomorrow?
Thank you.
Posted by: Goniak | January 23, 2013 at 09:07