In the sponsored article Data Science: Buyer Beware at Forbes, SAP's Ray Rivera takes a dim view of Data Science. Rivera calls Data Science a "management fad" in the mold of Business Process Reengineering, and casts data scientists as self-ordained "gurus" whose mission is to stand between the "ignorant masses" that need access to data and a company's valuable data stores. He likens data scientists to the icemen of the olden days, keen to provide a handcrafted service instead of the newfangled automated solution:
I don’t want no iceman
I’m gonna get me a Frigidaire …
I don’t want nobody
Who’s always hangin’ around.
If you've been following my writings about data science on this blog or in my webinar on the Rise of Data Science, you'll know I find this viewpoint to be total bunk. (So does Melinda Thielbar, who offers an excellent critique of Rivera's post from the perspective of a practicing data scientist.) First, Data Science definitely isn't a management process, and it's certainly not a fad: statistical analysis, one of the three components of Data Science, has been used in companies for more than 100 years, and the advent of Big Data and all of its applications has only solidified its importance in recent years. Secondly, acting as a gatekeeper to data is the antithesis of Data Science: a data scientist's main focus should be on liberating data by creating data apps that provide on-demand access to data analysis, while implementing the unique expertise that data scientists provide.
There's much more I could say about this, but my thoughts are captured in detail in this podcast at the IBM Big Data Hub. In my conversation with David Pittman we also cover whether Data Science is "sexy" (note: there's no such thing as a calendar on the theme of "Guys and Gals of Data Science"), and how the R language is an ideal platform for creating data apps. You can listen to the podcast at the link below.
IBM Big Data Hub: Rebuffing "Buyer Beware" Attitude on Data Science
The problem with Data Science (or stats-lite) is the all-too-frequent lack of both stats fundamentals and analysis context. One au courant example is the argument over the viability of Social Security and Medicare, specifically the claims about the "increase" in life expectancy and its impact on both programs.
Too often, the proponents of The Sky is Falling, especially the Lunatic Right, which wants to remove the hoi polloi, trot out the increase in life expectancy (at birth, but they usually don't tell you that) as the death knell. What they don't tell you is that life expectancy at 65 has barely nudged up since the start of SS in 1935/6. More folks don't die young. As a result, more folks pay into SS for a longer time.
SS *is not* an investment program; never was and never should be. It is run on current account. What "trust funds" exist are "invested" in government instruments; moving moolah from one pocket to another. The Baby Boomers will move through, and SS will return to surplus. Problem solved.
In all, Data Science has the potential to do more damage than Excel. Think about it that way. Excel allowed folks to go off half-cocked with a .22. Data Science puts a .44 Magnum (Dirty Harry's gun) in their hands. Is that a good idea?
Posted by: Robert Young | January 22, 2013 at 06:56
Excellent points, particularly "...a data scientist's main focus should be on liberating data..."
I find that this is a struggle to convey within my company, especially to my sales team, who hears "self serve data applications" as "end to our revenue stream." I wonder if data scientists at other companies are having similar issues.
Posted by: Melinda Thielbar | January 22, 2013 at 07:04
Robert, not sure I get the connection with Social Security -- there have been plenty of good analyses by statisticians and data scientists making exactly your points. But to your .44 Magnum analogy: that's exactly the difference between giving general purpose data analysis tools to untrained users, and data scientists creating data apps. A data app, being a targeted application designed to solve a specific problem on demand, by design lacks the capability of shooting oneself in the foot.
Posted by: David Smith | January 22, 2013 at 09:41
-- Secondly, acting as a gatekeeper to data is the antithesis of Data Science: a data scientist's main focus should be on liberating data by creating data apps that provide on-demand access to data analysis, while implementing the unique expertise that data scientists provide.
This statement, as I read it, is self-contradictory. Either Data Science acts as Gatekeeper, through the process of providing bespoke data apps, or it is another Swiss Army knife for untrained (in stats, OR, etc.) business analysts, et al, just as Excel was in its time. R for all and all for R. Where once the insult was Excel spreadsheets stuffed with mind bending macros, now we have cobbled R functions? Is that where we should go? All that matters is a small p-value or large r-square? Hmm.
-- (Rivera) Yet, data science belongs to a family tree of business practices that for over a century have been governed by technocrats who view organizations as machines, desiring to automate everything and eliminate people wherever possible.
I have to agree with him on this point; why didn't he invoke Taylor? The Great Recession occurred because Wall Street quants (Data Scientists by another mother) believed that time series analysis was sufficient in assuming that today was mostly like yesterday, and tomorrow will look mostly like today. The issue, as usual, was the quants assumed a regular mechanist process pertained. They didn't bother to look at the underlying money flows, or lack thereof (hamburger flippers could never actually pay for a 4,000 sq.ft. McMansion). They never (save a handful, humble self included) bothered to ask how house prices could surge while median income fell. There may not be many black swans in human endeavor, but human processes are driven by hordes of gray ones (in The Great Recession case, mortgage corruption); isolated data streams don't capture that. But, he's wrong in asserting that giving full rein of uber-Excel to math illiterates is the answer. Neurosurgeons should be the only ones carving up brains.
On the whole, Rivera's assertions read, to me, with a healthy dose of schizophrenia: on the one hand, Data Science isn't science and can be done by anyone; on the other hand, misuse of data leads to damage.
If Data Science has a godfather it's O'Reilly (both the corporation and the man), much as with Web 2.0. To the extent that Data Science is an attempt to codify vapor (and it is), then Rivera has a point. Computer science is one of his examples, and I was around when it was created as a dumber alternative to EE, to satisfy those who couldn't grok EE, but still wanted "to do computers". Rivera is right on that point. His leap of faith that Data Science was sired by BPR is historically false, so far as I can see.
-- (Rivera, my emendation) [SAP] ended up producing the opposite, requiring enormous amounts of IT investment, bureaucratic overhead, and technical specialization in order to achieve even simple results.
Just ask anyone.
-- (Rivera) Rather than seeking out gurus to mollify big data anxieties, analytics users should demand that their vendors produce tools that can be used primarily by subject matter experts, in collaboration with analytics specialists, providing transparency and an appropriate level of functionality to both, and facilitating collaboration among business users.
Which sounds like what you assert in your comment? Gatekeepers making bespoke data apps for untrained users. He doesn't make up his mind.
I could go on for a while longer parsing his piece. But I'll stop here.
As to the Social Security example, it's become the poster child for bad quant analysis, and an example arrived today on R-bloggers; so that's why I chose it. One could just as easily have used The Great Recession as the archetype for untrained quants putting a .44 slug in their feet. In both cases, however, the slug doesn't hurt them anywhere near as much as the collateral damage in the greater economy and society. To the extent we wish to avoid such damage, best let neurosurgeons do the brain carving, not ditch diggers.
Posted by: Robert Young | January 22, 2013 at 11:53