by Thomas Dinsmore
Revolution R Enterprise Release 6.2 goes live next week, so naturally our development team is thinking ahead to Release 7, which we plan to release later this year.
Some of the enhancements planned for Release 7 are hush-hush, and we can't talk about them yet. But we have already announced one of the most important: support for predictive analytics inside Hadoop.
Let's be clear about what we mean by running "inside" Hadoop. Lots of analytics vendors currently offer the ability to connect with Hadoop: they let you extract your data from Hadoop and move it over to the server where your analytics software is deployed.
Revolution R Enterprise can do this too, but we make it easier for the user by providing a virtual interface — our open source rmr project — that lets you work in R. SAS can connect with Hadoop too, through its SAS/ACCESS engine for Hadoop, but SAS forces you to work in MapReduce, Pig or HiveQL.
But what if you want to run predictive analytics inside Hadoop, without first moving your data to a server? There are few options available today. You can consider using Apache Mahout, the open source project for predictive analytics, but analysts who have tried to work with Mahout see it as something of a mixed bag. The most widely deployed capabilities within Mahout are its unsupervised learning methods, such as clustering, association and collaborative filtering, while its predictive analytics algorithms see little use or deployment.
ScaleR inside Hadoop offers the potential to radically reduce the cycle time needed to build and deploy predictive models:
- Since you don't need to extract and move data, you can eliminate an entire step from the process
- You can leverage distributed tools to transform your data
- You can work with all of your data at once, not just the data you have time to extract and move
- You can speed model deployment by leveraging the native ScaleR prediction capability
Over the next several months I'll post about other features we're adding to Release 7 of Revolution R Enterprise. In the meantime, we're always interested in hearing from you about features you would like us to add; please feel free to make suggestions in the comments section below.
Any plans for Debian/Ubuntu packages?
Posted by: Ryan | April 16, 2013 at 12:56
are ScalaR, ScalaR+Hadoop open source like R/Hadoop?
Posted by: nick | April 16, 2013 at 13:50
We've had other requests for Ubuntu, and we've taken them under consideration. We've not previously seen much interest in Debian, but will keep an eye on it.
Scala is open source; I'm unable to locate any information on "ScalaR".
Posted by: Thomas Dinsmore | April 16, 2013 at 15:01
Hello
I read that Revolution 6.2 goes live on April 22, but I can't find the link.
Posted by: amparo | April 23, 2013 at 17:20