In the 2012 edition of the SAP Sybase Capital Markets Guide, Revolution Analytics' Senior Advisor for Products and Strategy (and former CEO) Norman Nie writes about the "Five Benefits of Big Analytics". (You can also read his article at Enterprise Innovation.) Norman makes the argument that while sampling and aggregation are often useful ways of handling very large data sets for statistical analysis, there are nonetheless several situations where using all of the data in the analysis is beneficial and/or important. They include situations where you need to:
- Make Predictions with Data Mining
- Deploy More Powerful Predictive Models
- Find and Understand Rare Events
- Extract and Analyze “Low-Incidence Populations”
- Move Beyond “Statistical Significance”
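To make the rare-events point concrete, here is a minimal sketch (not from Nie's article) of how a small random sample can lose nearly every rare event that the full data set contains. The function name `count_rare`, the synthetic data, and the rates used are all illustrative assumptions.

```python
import random

def count_rare(n_rows, rare_rate, sample_frac, seed=0):
    """Count rare events in a synthetic data set and in a random sample of it.

    All parameters are hypothetical; the data is simulated, not real.
    """
    rng = random.Random(seed)
    # Each row is flagged True with probability rare_rate.
    flags = [rng.random() < rare_rate for _ in range(n_rows)]
    # Draw a simple random sample without replacement.
    sample = rng.sample(flags, int(n_rows * sample_frac))
    return sum(flags), sum(sample)

# 200,000 rows, a 1-in-10,000 event, and a 1% sample.
full, sampled = count_rare(200_000, 0.0001, 0.01)
print(f"rare events in full data: {full}, in 1% sample: {sampled}")
```

With these assumed rates, the full data holds about 20 rare events on average while a 1% sample holds about 0.2, so most samples contain none to analyze.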
Not coincidentally, many of the big-data analysis features of Revolution R Enterprise have been designed to support these classes of data analysis. You can read the full article by downloading the 2012 Sybase Capital Markets Guide from the link below (it's on pages 16-19). An expanded version of the article is also available as a white paper from Revolution Analytics.
SAP Sybase: Capital Markets Guide 2012
In a nutshell: find the Black Swans. But how important, commercially, are such outliers? The question is seldom addressed directly. Unless there is some business (or, possibly, social) mandate under which the outliers can be serviced at unusually high profit, hunting for them is just a waste of time and money. I'd love to witness a confrontation between a Big Data Advocate and Philipp Janert (or someone of his ilk).
We do see this black swan hunt in the pharma business; companies spend ever more money seeking approval of drugs that treat small groups of patients. The FDA allows the companies to charge any price for such Orphan Drugs for long periods of time. Whether this is good public policy is not addressed.
Posted by: Robert Young | April 19, 2012 at 09:04