
November 22, 2011

Listed below are links to weblogs that reference Why we need to deal with big data in R:

Comments


Isn't it fun to live in this age when we can exchange ideas so easily? Take care,

Luis

Great reply, David. Thank you.

Sue's demo is great, and I like it very much. Just one thing to mention: her example relies on Windows HPC Server, and not everyone has access to that kind of hardware. Would it be possible for her to give examples on Amazon Web Services? For example, she could set up five EC2 instances and make one of them the head node, which distributes the tasks to the four worker nodes, gathers the results, and sends them back to the requesting laptop. Big data analysis and parallel computing on the Amazon cloud sound more accessible to everyone.
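The head-node/worker arrangement described above is a scatter/gather pattern. Here is a minimal sketch using base R's `parallel` package. It runs four local worker processes as stand-ins for the four worker nodes. The EC2 setup, the toy data, and the choice of a mean as the statistic are assumptions for illustration, not part of Sue's demo.

```r
library(parallel)

# Start four local worker processes. On real cloud machines the worker
# hostnames would be passed instead of a count; that setup is assumed here.
cl <- makeCluster(4)

# Head node: split the data into one chunk per worker (scatter).
x <- 1:1000000
chunks <- split(x, cut(seq_along(x), 4, labels = FALSE))

# Each worker returns a partial result for its chunk: the sum and the count.
# as.numeric() avoids integer overflow when summing large chunks.
partial <- parLapply(cl, chunks, function(chunk) {
  c(sum = sum(as.numeric(chunk)), n = length(chunk))
})

stopCluster(cl)

# Head node: combine the partial results (gather) into the overall mean.
totals <- Reduce(`+`, partial)
overall_mean <- totals[["sum"]] / totals[["n"]]
print(overall_mean)
```

Returning sums and counts rather than per-chunk means keeps the combined result exact even when chunks have different sizes.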

In Susan's earlier slides from useR! 2011, there is a lot of dirty work to define each variable properly and build the dataset "birthAll", such as SEX = list(type="factor", start=35, width=1, levels=c("1", "2"), newLevels = c("Male", "Female"), description = "Sex of Infant"). I spent several hours tonight matching columns to their names. Would it be possible for Susan to share "birthAll"? It would certainly be fun to have this dataset and actually play with it in various distributed environments.
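The SEX entry quoted above appears to describe one field of a fixed-width text file. It gives a 1-based start column, a width, and a mapping from raw codes to readable labels. The base R sketch below applies one such entry to raw text lines. It is not how RevoScaleR itself imports the file. The helper `read_fixed_field` and the sample records are hypothetical.

```r
# Column specification copied from the SEX entry above.
col_spec <- list(
  SEX = list(type = "factor", start = 35, width = 1,
             levels = c("1", "2"), newLevels = c("Male", "Female"),
             description = "Sex of Infant")
)

# Hypothetical helper: cut one fixed-width field out of each line and,
# for factor fields, relabel the raw codes (unknown codes become NA).
read_fixed_field <- function(lines, spec) {
  raw <- substr(lines, spec$start, spec$start + spec$width - 1)
  if (identical(spec$type, "factor")) {
    factor(raw, levels = spec$levels, labels = spec$newLevels)
  } else {
    raw
  }
}

# Two made-up records: 34 filler characters, then the sex code in column 35.
records <- c(paste0(strrep("x", 34), "1"), paste0(strrep("x", 34), "2"))
sex <- read_fixed_field(records, col_spec$SEX)
print(sex)
```

With the whole layout in one list like `col_spec`, every column could be decoded the same way. That would make it easier to check the column positions against the codebook.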

The comments to this entry are closed.



Got comments or suggestions for the blog editor?
Email David Smith.
Follow David on Twitter: @revodavid
