Sameer Chopra, vice president of Advanced Analytics at Orbitz Worldwide, wrote recently in *Analytics* magazine about the changing landscape of processes, software and systems for statistical modelers. In a section on "Big Data and Open Source Analytics", Chopra lays out the reasons why the R language "has become the data-mining tool of choice for machine learners":

- R has very good integration with Hadoop, an area where established commercial statistical tools have frankly been playing catch-up over the past year. (Note: At the time of this writing, some established statistical solution providers were announcing an access interface to Hadoop.)
- Many startups and smaller firms do not have deep pockets and are embracing open source tools such as the R programming language and NoSQL database systems such as MongoDB.
- R is a leading language for developing new statistical methods, and it is a platform for statistical innovation and collaboration across both the corporate world and academia. In my opinion, for the first time in years, the stronghold of established commercial players seems to be potentially threatened; open source tools are better suited for Big Data and will slowly but surely continue to take share away from commercialized statistical packages. In fact, traditional statistical vendors have recognized that R is a force to be reckoned with. In response, many of these vendors have developed hooks into R so users can interface with the R language.
- Based on the resumes I’ve been reading, the next generation of data miners is flocking to R as their go-to tool. Professors in general are comfortable with R; they tend to use R and Excel as part of their curriculum.
- In short, open-source analytics tools and platforms have arrived.

Chopra says that the usage of R in the commercial sector is growing "as firms such as Revolution Analytics focus on the enterprise capabilities for R" (for example, Revolution R Enterprise's Hadoop support and enterprise deployment).

Chopra also has some interesting perspectives on statistical modeling vs. machine learning, which you can find in the full article linked below.

Analytics magazine: The times they are a changin’ for advanced analytics

Mango?

Posted by: tb | May 17, 2012 at 14:01

Good point - the MangoDB typo was in the source article. I've corrected it here -- thanks.

Posted by: David Smith | May 17, 2012 at 14:30