As someone who trained as a statistician, I've always struggled with that title. I love the rigor and insight that Statistics brings to data analysis, but let's face it: Statistics — the name — has always had a bit of a branding problem. Telling someone I was a statistician was more likely to conjure up images of me counting runs at a baseball (or cricket) game than pursuing serious science. And the image of what Statistics ideally is about — collaborative, interactive, applied, fun — was too often subsumed by the stereotype image — isolated, actuarial, ivory tower, report driven. (And hey, even actuaries can be fun sometimes.)
That's why I'm a fan of the term "data scientist" — it embodies everything that Statistics always should be, without the baggage and tradition of the term "statistician". So I enjoyed participating in yesterday's Kalido webinar "Data Scientist: Your Must-Have Business Investment Now" where I could make the following contrast between the images of Statisticians and Data Scientists:
(A quick aside on the "Data Size" row above: while the unstructured or unaggregated source data that data scientists work with can be in the terabytes range or even larger, by the time it's cleaned and prepared for statistical modeling, a file in the gigabytes range is more typical, even at "Big Data" companies like Facebook. This is a topic I cover in more detail in my recent Strata talk on real-time predictive analytics.)
So bottom line: while I am a statistician, and I love Statistics dearly, I do prefer to call myself a Data Scientist today, because it better represents what Statistics really is to me (if that makes sense). And that's certainly not to diminish the achievements of those who do call themselves Statisticians. In particular, I want to recognize George Box: a true hero of mine, coiner of the idiom "all models are wrong, but some are useful", and one of the nicest people I ever met, who sadly passed away in March.
On the other hand, I have no qualms about making a competitive comparison between Data Science and Business Intelligence:
You can get the details of how I differentiate Statistics and Data Science and BI, and hear other perspectives on Data Science from fellow data scientists Carla Gentry and Gregory Piatetsky in the slides and replay of the webinar provided by Kalido at the link below.
Kalido: Data Scientist: Your Must-Have Business Investment NOW
Oh heck...it's a good thing the Red Wings and Blackhawks are in the first intermission. Speaking as an econometrician who has constructed more than 15,000 forecasting and predictive models for economics and applied marketing... "horse puckey." Get it? "Predictive" = forward looking "Forecasting" = forward looking. You can SAY that data science is forward looking, but a lot of it is merely pattern-based analysis applied in the hope that what has happened in the past will happen again. The only reason that statistics has a branding problem is that non-quant MBA Marketing majors are still scratching their heads over probability and statistics. A statistician with kilobytes of data is in academia, not the real world. No matter your preferred moniker, real world data is just as large and just as dirty. What is Linux doing in a tools list, btw? Should I point out that one of the current uses for mainframes right now is to run large numbers of simultaneous linux instances for web sites? Basically, if you think Data Scientist is sexy, then more power to ya - I hereby dub thee data scientist, but you should still be proud of statistician.
Posted by: Lynne | May 15, 2013 at 18:16
I turned off the wings game in the third, no need to stick around and get more depressed.
As a developer who does a lot of BI work, I find this comparison of Business Intelligence vs Data Science a bit one sided. I have used Cognos and Tableau for analysis and reporting. Also, a good BI developer should present the data in an easily understandable format which doesn't mean it HAS to be tables, unless of course that is what the client requested. While a lot of BI does tend to be reflective, looking back on events is a great way to estimate what the future will look like.
I do, however, agree that "Data Scientist" is a sexier name than "Business Intelligence", but I see one as the natural evolution of the other.
Posted by: Doug E | May 15, 2013 at 21:49
My wife doesn't like the term Data Science because while they (we?) may deal with data, they don't do anything close to science (under any reasonable definition of science).
Posted by: Ian | May 15, 2013 at 22:42
Don't try to search for any objective justification for something that's pure marketing. You are all quant people, don't get trapped by non quant people rationality.
Posted by: Pau | May 16, 2013 at 00:26
From decades of sad experience, Data Science is as regressive (!) wrt both Data and Science as NoSql is to the Relational Model, and even SQL databases. In both cases, a cabal of coders want to appropriate the aura of smarts without having to actually be smart. Kind of like Web 2.0: a meme created by purveyors, not thinkers or even practitioners.
Posted by: Robert Young | May 16, 2013 at 12:48
A scientist presents facts to support a hypothesis. I see neither in this blog post (specifically in the comparison of data scientist and BI).
Business Intelligence is called that, in part, because it helps the entire business. A data scientist is focused on his/her dataset, whether in GB or KB.
Posted by: S.S. | May 16, 2013 at 20:18
Here is how I see it. I was trained as an econometrician. While that gave me a great stats foundation, I also had trouble seeing myself as a statistician.
Statisticians were not then, nor typically are now, trained to work with business. They don't speak the same language (and vice versa). Most statistics grad programs that I have read about spend more time on real analysis than on how to deal with complex, data-oriented business problems and communication.
This is the true difference I see in DS vs Statistician. A DS probably cannot do real analysis, but can put a business problem into context and work to solve it with data. A Statistician is the opposite.
Of course the above is a generalization- I certainly know Statisticians who have conquered the business world. And some data scientists who do not have any understanding of the underlying model.
So both areas need to converge to be both rigorous and useful.
Posted by: Myles Gartland | May 17, 2013 at 10:22
Agree with the 'more power to ya' comment, also agree with the comments re marketing - much of this is fashionable labelling.
What I don't agree with is the negative view of BI - yes, the Balanced Scorecard emerged c1997 (and that's only part of true BI) but it's still the most powerful tool available to any forward-thinking operation. I believe the main reason it (and related BI) is seen as old-hat by many conservative operations is because they are always looking for a quick fix, and they think the latest fad is their silver bullet, even though we all know it is the same thing re-badged.
A comprehensive use of statistical analysis, data mining (or analysis or science - whatever you wish), proven BI tools (eg. BSC), business analysis, process mapping and analysis, etc, etc is the only way to create genuine performance improvements and assist a company who can see beyond the start-up costs.
Let's take the 'vs' out of the conversation, converge the lot and thank Professor Deming for lighting our way.
Posted by: Russell Triggs (Thornton, NSW, Australia) | May 17, 2013 at 13:42
Thanks everyone for the great feedback and conversation. I'm definitely proud to be a statistician -- my point was more that the world at large seems to respond more favourably to the "data science" title as representing what we'd call "real" applied statistics. One thing I've learned from conversations since this post is that "data science" seems to be having a branding problem of its own. All of the data scientists I've worked with have excellent statistics skills, but (ironically?) that could well be due to a nonrepresentative sample. Anyway, thanks again for the responses.
Posted by: David Smith | May 17, 2013 at 13:54
How does the term "data artist" compare with the other terms?
Posted by: Humayun Khan | May 18, 2013 at 11:09
Well there is always a fascination towards newly coined terms and I guess the same is the case with data scientist. However, I feel a data scientist is a complete combination of a statistician and BI guy. He or she is the master chef of data!
Posted by: Shaona Mukherjee | June 25, 2013 at 03:43
what a crap article to prove a preconceived notion about the glamour of the name data science
Posted by: arup | July 10, 2013 at 15:51
How do I get to become a data scientist? Which career path takes me to that step?
Posted by: divya | July 19, 2013 at 09:36