Ventana Research analyst David Menninger was on the judging panel for the Applications of R in Business contest. In a post on the Ventana Research blog, he offers his perspectives on the contest, noting that
R, as a statistical package, includes many algorithms for predictive analytics, including regression, clustering, classification, text mining and other techniques. The contest submissions supported a variety of business cases, including, among others, predicting order amounts to optimize manufacturing processes, predicting marketing campaign effectiveness to optimize marketing spending, predicting liquid steel temperatures to optimize steel plant processes and performing sentiment analysis of Twitter data.
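To make the first of those techniques concrete, here is a minimal sketch of regression-based prediction in R, using the built-in mtcars dataset rather than any of the contest entries (the variables and values are purely illustrative):

```r
# Fit a linear regression predicting fuel efficiency (mpg)
# from weight (wt) and horsepower (hp), using R's built-in mtcars data.
model <- lm(mpg ~ wt + hp, data = mtcars)

# Inspect the fitted coefficients
coef(model)

# Predict mpg for a hypothetical new car (illustrative values only)
new_car <- data.frame(wt = 3.0, hp = 150)
predict(model, newdata = new_car)
```

The same `fit, then predict` pattern carries over to the clustering, classification, and text-mining tools Menninger mentions, each via its own package and model function.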
(Incidentally, David also has a great riff on the terminology of "predictive analytics" and "big data" out today.) He also notes that these applications are compelling precisely because of the close alignment between the contest entrants and the business problems their entries set out to solve:
The entries also demonstrated a best practice: close alignment between the analyst and the underlying business objectives. Predictive analytics is not magic. It requires an understanding of business processes and an understanding of statistical techniques. The judging criteria reflected this requirement as well. One of the three categories we were asked to score was applicability of the submission to business. I think it’s clear how the analyses in the winning entries could provide significant business value.
As David notes, however, the counterpoint is that the analyst must combine *both* statistical expertise and business understanding. "How many people in your organization could perform those types of analyses?", he rightly asks. A combination of statistical tools and domain expertise (plus the technical skills to implement the solution) is the hallmark of a good data scientist, which is exactly why many organizations are looking to build effective data science teams.
By the way, while the concept of "data scientist" is relatively new, this idea of combining statistical skills with domain expertise is not. Bill Cleveland (yes, that Bill Cleveland) made similar suggestions in a prescient paper back in 2001: "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics". (International Statistical Review, 69)
David Menninger: Revolution Analytics Hosts Contest on Business Predicting the Future
The "data scientist" meme, it's seemed to me from its inception, is a re-invention of operations research. Why? I suppose for the same reason coder-types re-invent lots of terms: so they can lay claim, by co-opting, to expertise in some field for which they really have none.
This practice goes back, at least, to Dr. Deming. While well intentioned, he set in motion the idea that, like Mrs. Peel, a talented amateur was all that was required in a highly technical field. And, as far as that goes, real mathematicians more often than not view statisticians (even math stats) as marginally talented amateurs.
Posted by: Robert Young | February 02, 2012 at 05:47
I certainly agree that the idea of statisticians having domain expertise is not new! If you look back to the work of William Gosset a century ago, you will see that domain expertise - in farming and brewing - was the driver of his work, work that underlies most everything statisticians and data miners do today.
Posted by: Meta Brown | February 02, 2012 at 09:30