Comments on Big Data Generalized Linear Models with Revolution R Enterprise
http://blog.revolutionanalytics.com/

Sue Ranney commented on 'Big Data Generalized Linear Models with Revolution R Enterprise' (2012-07-10T17:25:46Z)
<p>I'm Sue Ranney and I ran the GLM timings on my laptop for the plot in this blog post. There are three main reasons for the big divergence: efficient use of memory (data is not copied unless absolutely necessary), efficient handling of categorical variables (which saves memory and time), and efficient use of threads for parallelization. Logistic regression scales the same way. By the way, these timings were all done using in-memory data frames. The RevoScaleR analysis functions continue to scale up for huge data sets if the data is stored in the efficient .xdf file format.</p>

nick commented on 'Big Data Generalized Linear Models with Revolution R Enterprise' (2012-06-29T16:48:29Z)
<p>This is pretty cool. So, what's the reason for this big divergence? For example, what makes xdf so much better? Besides xdf and parallelism, is there anything else? Does logistic regression scale in the same way? I'd love to see that in a future presentation.
</p>