
June 28, 2012



This is pretty cool. So, what's the reason for this big divergence? E.g., what makes xdf so much better? Besides xdf and parallelism, is there anything else? Does logistic regression scale in the same way? I'd love to see that in a future presentation.

I'm Sue Ranney and I ran the GLM timings on my laptop for the plot in this blog post. There are three main reasons for the big divergence: efficient use of memory (data is not copied unless absolutely necessary), efficient handling of categorical variables (saves memory and time), and efficient use of threads for parallelization. Logistic regression scales the same way. By the way, these timings were all done using in-memory data frames. The RevoScaleR analysis functions continue to scale up for huge data sets if data is stored in the efficient .xdf file format.
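
To make the point about categorical variables concrete, here is a minimal sketch in Python (chosen only for illustration; it is not RevoScaleR code and does not claim to show how RevoScaleR works internally). It assumes a single categorical predictor with no intercept, and all function and variable names are hypothetical. In that setting the cross-product matrix X'X is diagonal (the level counts) and X'y holds the per-level sums, so the fit can be computed from integer codes without ever building the dense dummy-variable matrix.

```python
def encode_factor(values):
    """Map category labels to integer codes, similar in spirit to an R factor."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return levels, [index[v] for v in values]


def fit_from_codes(codes, y, n_levels):
    """Least-squares fit of y on one factor (no intercept) using codes only.

    Stores two length-k accumulators instead of an n x k dummy matrix.
    """
    counts = [0] * n_levels   # diagonal of X'X
    sums = [0.0] * n_levels   # X'y
    for code, value in zip(codes, y):
        counts[code] += 1
        sums[code] += value
    return [s / c for s, c in zip(sums, counts)]


def fit_from_dummies(codes, y, n_levels):
    """Same fit, but materialising the dense n x k dummy matrix first."""
    dummies = [[1.0 if code == j else 0.0 for j in range(n_levels)]
               for code in codes]
    xtx_diag = [sum(row[j] for row in dummies) for j in range(n_levels)]
    xty = [sum(row[j] * value for row, value in zip(dummies, y))
           for j in range(n_levels)]
    return [b / a for a, b in zip(xtx_diag, xty)]


# Hypothetical toy data: one categorical predictor and a numeric response.
region = ["east", "west", "east", "north", "west", "east"]
response = [1.0, 4.0, 3.0, 10.0, 6.0, 2.0]

levels, codes = encode_factor(region)
coef_codes = fit_from_codes(codes, response, len(levels))
coef_dense = fit_from_dummies(codes, response, len(levels))
```

Both functions return the same coefficients (the per-level means), but the code-based version needs memory proportional to the number of levels rather than to rows times levels, which is the kind of saving the reply above describes.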

