
January 21, 2013


Listed below are links to weblogs that reference A strained Data Science analogy:

Comments


The problem with Data Science (or stats-lite) is the all-too-frequent lack of both stats fundamentals and analysis context. One au courant example is the argument over the viability of Social Security and Medicare, where the "increase" in life expectancy and its impact on both programs get argued about.

Too often the proponents of The Sky is Falling, especially the Lunatic Right, which wants to remove the hoi polloi, trot out the increase in life expectancy (at birth, but they usually don't tell you that) as the death knell. What they don't tell you is that life expectancy at 65 has barely nudged up since the start of SS in 1935/6. Fewer folks die young. As a result, more folks pay into SS for a longer time.

SS *is not* an investment program; never was and never should be. It is run on current account. What "trust funds" exist are "invested" in government instruments; moving moolah from one pocket to another. The Baby Boomers will move through, and SS will return to surplus. Problem solved.

In all, Data Science has the potential to do more damage than Excel. Think about it that way. Excel allowed folks to go off half-cocked with a .22. Data Science puts a .44 Magnum (Dirty Harry's gun) in their hands. Is that a good idea?

Excellent points, particularly "...a data scientist's main focus should be on liberating data..."

I find that this is a struggle to convey within my company, especially to my sales team, which hears "self serve data applications" as "end to our revenue stream." I wonder if data scientists at other companies are having similar issues.

Robert, not sure I get the connection with Social Security -- there have been plenty of good analyses by statisticians and data scientists making exactly your points. But to your .44 Magnum analogy: that's exactly the difference between giving general purpose data analysis tools to untrained users, and data scientists creating data apps. A data app, being a targeted application designed to solve a specific problem on demand, by design lacks the capability of shooting oneself in the foot.

-- Secondly, acting as a gatekeeper to data is the antithesis of Data Science: a data scientist's main focus should be on liberating data by creating data apps that provide on-demand access to data analysis, while implementing the unique expertise that data scientists provide.

This statement, as I read it, is self-contradictory. Either Data Science acts as Gatekeeper, through the process of providing bespoke data apps, or it is another Swiss Army knife for untrained (in stats, OR, etc.) business analysts et al., just as Excel was in its time. R for all and all for R. Where once the insult was Excel spreadsheets stuffed with mind-bending macros, now we have cobbled-together R functions? Is that where we should go? All that matters is a small p-value or large r-square? Hmm.

-- (Rivera) Yet, data science belongs to a family tree of business practices that for over a century have been governed by technocrats who view organizations as machines, desiring to automate everything and eliminate people wherever possible.

I have to agree with him on this point; why didn't he invoke Taylor? The Great Recession occurred because Wall Street quants (Data Scientists by another mother) believed that time series analysis was sufficient in assuming that today was mostly like yesterday, and tomorrow will look mostly like today. The issue, as usual, was that the quants assumed a regular mechanistic process pertained. They didn't bother to look at the underlying money flows, or lack thereof (hamburger flippers could never actually pay for a 4,000 sq. ft. McMansion). They never (save a handful, humble self included) bothered to ask how house prices could surge while median income fell. There may not be many black swans in human endeavor, but human processes are driven by hordes of gray ones (in The Great Recession case, mortgage corruption); isolated data streams don't capture that. But he's wrong in asserting that giving full rein of uber-Excel to math illiterates is the answer. Neurosurgeons should be the only ones carving up brains.

On the whole, Rivera's assertions read, to me, with a healthy dose of schizophrenia: on the one hand, Data Science isn't science and can be done by anyone, on the other hand, misuse of data leads to damage.

If Data Science has a godfather, it's O'Reilly (both the corporation and the man), much as with Web 2.0. To the extent that Data Science is an attempt to codify vapor (and it is), then Rivera has a point. Computer science is one of his examples, and I was around when it was created as a dumber alternative to EE, to satisfy those who couldn't grok EE, but still wanted "to do computers". Rivera is right on that point. His leap of faith that Data Science was sired by BPR is historically false, so far as I can see.

-- (Rivera, my emendation) [SAP] ended up producing the opposite, requiring enormous amounts of IT investment, bureaucratic overhead, and technical specialization in order to achieve even simple results.

Just ask anyone.

-- (Rivera) Rather than seeking out gurus to mollify big data anxieties, analytics users should demand that their vendors produce tools that can be used primarily by subject matter experts, in collaboration with analytics specialists, providing transparency and an appropriate level of functionality to both, and facilitating collaboration among business users.

Which sounds like what you assert in your comment? Gatekeepers making bespoke data apps for untrained users. He doesn't make up his mind.

I could go on for a while longer parsing his piece. But I'll stop here.

As to the Social Security example, it's become the poster child for bad quant analysis, and an example arrived today on R-bloggers; so that's why I chose it. One could just as easily have used The Great Recession as the archetype for untrained quants putting a .44 slug in their feet. In both cases, however, the slug doesn't hurt them anywhere near as much as the collateral damage in the greater economy and society. To the extent we wish to avoid such damage, best let neurosurgeons do the brain carving, not ditch diggers.

The comments to this entry are closed.



Got comments or suggestions for the blog editor?
Email David Smith.