June 01, 2009


Do you think standard options can read in 15K variables on a Windows machine with 2 GB of RAM? I haven't had success with this, so I'm curious. Thanks!

> their winning entry was for the slow challenge using the small data set

No, it was for the large data set, which they reduced to about 200 significant variables for each of the three problems. (I have their list of variables, courtesy of the ever-helpful Hugh Miller. I want to compare it with the variables selected by several (semi-)automated variable selection packages in R, such as "caret".)

> Do you think standard options can read in 15K variables on a Windows machine with 2 GB of RAM?

It can’t. They had to read the variables in smaller batches, find the significant variables in each batch, then combine and find the significant variables on the combined set.
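
For anyone who wants to see the batch idea concretely, here is a minimal sketch. It is not the winners' code (they worked in R); it is written in Python with only the standard library to keep it short. It assumes a numeric CSV file with a header row, and it uses absolute Pearson correlation with the target as a stand-in for whatever significance test they actually applied. The function names and parameters are hypothetical, chosen for illustration.

```python
import csv
import math


def read_columns(path, names):
    """Read the file row by row, keeping only the named columns (as floats)."""
    columns = {name: [] for name in names}
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            for name in names:
                columns[name].append(float(row[name]))
    return columns


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def top_variables(path, candidates, target, keep):
    """Score each candidate against the target; return the best `keep` names."""
    data = read_columns(path, candidates + [target])
    scores = {name: abs_correlation(data[name], data[target]) for name in candidates}
    return sorted(candidates, key=lambda name: scores[name], reverse=True)[:keep]


def screen_in_batches(path, target, batch_size, keep_per_batch, keep_final):
    """Screen predictors one batch at a time, then re-screen the pooled survivors."""
    with open(path, newline="") as handle:
        header = next(csv.reader(handle))
    predictors = [name for name in header if name != target]

    survivors = []
    for start in range(0, len(predictors), batch_size):
        # Only this batch's columns (plus the target) are held in memory.
        batch = predictors[start:start + batch_size]
        survivors += top_variables(path, batch, target, keep_per_batch)

    # Final pass over the much smaller combined set.
    return top_variables(path, survivors, target, keep_final)
```

A call such as `screen_in_batches("train.csv", "y", 500, 20, 200)` (file and column names made up) would scan 500 columns per pass and return up to 200 names. The price is one pass over the file per batch. With a univariate score the final pass is just a re-ranking; a real analysis would more likely fit a multivariate model on the combined set.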

Thanks for the correction, Allan - I've updated the article to reflect that they won for the large data set. I was confused because according to their results they got a higher score for their "small" entry.

Also, while R might not be able to fit a model with 15,000 variables in only 2 GB of RAM (you'd need 64-bit R for that), they did use R for the entire analysis. The work was just divided into two steps: preprocessing the data, and then fitting the model on the smaller, processed data. That's standard practice in any case.
