The data prediction competitions hosted by Kaggle require data scientists to bring their A game: the competition is intense, and competitors know in real time from the daily leaderboards how their predictions compare in accuracy to those of their rivals. So it's no surprise that open-source R, the most powerful statistics language, is a common tool of choice amongst competitors.
In a presentation to the Bay Area R User Group last week, Kaggle CEO Anthony Goldbloom showed this chart of the software preferences of Kaggle competitors:
As you can see, a third of all Kaggle competitors report using R. Moreover, Kaggle reports that fully 50% of competition winners used R to build the most accurate predictive models and beat out their rivals. Today, Revolution Analytics has released a new white paper that interviews some Kaggle competitors who used R to win their competitions and explores what makes R uniquely suited to building the most predictive models.
Revolution Analytics has also announced a partnership with Kaggle to make the big-data capabilities of Revolution R Enterprise available for use in Kaggle competitions, free of charge. Now Kaggle competitors can download Revolution R Enterprise and extend their use of R with the R Productivity Environment for coding and debugging predictive models, and apply out-of-memory statistical models to the large data sets appearing in many Kaggle-hosted competitions like the $3M Heritage Health prize or the forthcoming NASA and Wikipedia competitions.
This quote from Jeff Erhardt (Revolution Analytics COO) sums up why we're excited to make Revolution R Enterprise available for Kaggle competitions. “We’ve entered an era of information where data science can be applied to solve nearly any real world problem,” says Jeff. “Technological and scientific advances brought us the R language, and by innovating on top of R, Revolution Analytics is providing data scientists with an opportunity to access broader sets of data faster to tackle today’s toughest data problems. We’re pleased to work with Kaggle to offer Revolution R Enterprise to its ambitious participants.”
Revolution White Papers: R Competition Brings Out the Best in Data Analytics
I would like to say that this is one of the most horrible graphs I have seen lately.
I wonder what the next-biggest slice after R refers to. Others? I cannot tell from this chart.
Next, I wonder how you can even get this in R. But actually, it looks more like Excel. And this on an R blog,...
I mean, everybody interested in data visualization is arguing against pie charts (e.g., http://www.stat.columbia.edu/~cook/movabletype/archives/2011/01/chartjunk_but_i.html ). I really wonder how anybody could have missed that.
Sorry if this is too aggressive, but I couldn't resist.
Posted by: Henrik | April 19, 2011 at 13:36
Indeed, I agree with Henrik. My first reaction on seeing this posting was "Please tell me that the pie chart wasn't produced using R".
Posted by: Douglas Bates | April 20, 2011 at 11:00
I'm not sure what Anthony used for that graph; I cropped it from a PowerPoint presentation. I'll ask him.
Posted by: David Smith | April 20, 2011 at 11:54
OK, OK, the pie chart sucks. I am not sure which has been beaten up more lately: pie charts or word clouds.
However, I think linking tools to Kaggle for use in contests is a great idea. I have always thought it would help if they included preloaded baseline predictive analytics models to get people started. If that starter package were freely available and scalable, all the better.
Every time I have talked to anyone who has competed in these great contests, they always run baseline predictive models first. If this were already done for them and everyone else, more modeling time could be spent on implementing unique approaches or the ensemble methods that put the winners' models over the top.
For those that want more pie chart fodder:
http://www.stevefenton.co.uk/Content/Pie-Charts-Are-Bad/
http://blogs.oracle.com/experience/2010/03/countdown_of_top_10_reasons_to_never_ever_use_a_pie_chart.html
Posted by: NPHard | April 20, 2011 at 13:29