
March 26, 2013



Is this stepwise regression available for the RevoScaleR class of datasets (Big Data)? Will it be available for parallel computation?

An example or demonstration, when one is available, would be nice. Stepwise regression has computational advantages, but some theorists discourage it on grounds of reliability.

How about other ensemble methods and data mining for Big Data?

Thanks for the comment! Stepwise regression will be a feature of ScaleR, our scalable package for Big Data.

Good idea about a demonstration. I agree with you that stepwise regression is not universally accepted (among theorists or working analysts). Given the demand for it from our existing customers, we think it best to support it and let users decide.

We have ensemble methods and other advanced techniques (such as random forests) in our roadmap for future releases. I'll write about them in this blog as we refine our release schedule.

Yeah, in my experience, stepwise tends to overfit massively. It's OK if you use a hold-out set, but even then I find that lasso or ridge gives much more stable and useful solutions. You tend not to get a p-value, but that's part of the charm for me, at least.
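
For anyone who wants to see the hold-out point concretely, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption made for illustration: the synthetic data (15 predictors, only the first two of which actually matter), the helper names `solve`, `fit`, `mse` and `make_data`, and the ridge penalty of 5.0. It runs a greedy forward stepwise search scored on training error, tracks the hold-out error at each step, and then fits a ridge model on all predictors for comparison. Lasso is left out because it has no closed-form solution; ridge stands in for the penalized approach. This is not how ScaleR or any R package implements stepwise regression.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y, cols, lam=0.0):
    """Least squares on the chosen columns (no intercept, since the data
    are generated with zero mean); lam > 0 adds a ridge penalty."""
    A = [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0.0)
          for b in cols] for a in cols]
    rhs = [sum(row[a] * yi for row, yi in zip(X, y)) for a in cols]
    return solve(A, rhs)

def mse(X, y, cols, beta):
    """Mean squared prediction error using only the chosen columns."""
    err = 0.0
    for row, yi in zip(X, y):
        pred = sum(b * row[c] for b, c in zip(beta, cols))
        err += (yi - pred) ** 2
    return err / len(y)

def make_data(n, p, rng):
    """Synthetic data: only columns 0 and 1 carry signal, the rest are noise."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y

rng = random.Random(0)
P = 15
X_train, y_train = make_data(40, P, rng)
X_hold, y_hold = make_data(40, P, rng)

# Forward stepwise: at each step add the predictor that most reduces
# the training error, and record the hold-out error alongside it.
selected, train_path, hold_path = [], [], []
remaining = list(range(P))
while remaining:
    best = min(
        remaining,
        key=lambda j: mse(X_train, y_train, selected + [j],
                          fit(X_train, y_train, selected + [j])),
    )
    selected.append(best)
    remaining.remove(best)
    beta = fit(X_train, y_train, selected)
    train_path.append(mse(X_train, y_train, selected, beta))
    hold_path.append(mse(X_hold, y_hold, selected, beta))

# Ridge on all predictors, compared with unpenalized least squares.
all_cols = list(range(P))
beta_ols = fit(X_train, y_train, all_cols)
beta_ridge = fit(X_train, y_train, all_cols, lam=5.0)
ridge_hold = mse(X_hold, y_hold, all_cols, beta_ridge)

for k, (tr, ho) in enumerate(zip(train_path, hold_path), start=1):
    print(f"stepwise, {k:2d} predictors: train MSE {tr:.3f}, hold-out MSE {ho:.3f}")
print(f"ridge (lam=5.0), all predictors: hold-out MSE {ridge_hold:.3f}")
```

The training error can only go down as predictors are added, so it is no guide to when to stop; the hold-out column is the one to watch. The ridge fit keeps every predictor but shrinks the coefficients, which is where the stability comes from.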

