
May 11, 2011



Full disclosure: I'm a statistician as well.

I'm definitely in the anti-"Data Science"-as-a-label camp because I think it minimizes other extremely useful areas of statistics. Statistics is, at least according to Moore and McCabe, "the science of collecting, organizing, and interpreting numerical facts, which we call data." Whether you are collecting data using Python or R from some website's API or giving a survey to cancer survivors, you are collecting and possibly organizing the data. If you are making inferences with this data, you are doing statistics in either scenario.

IMO, all of statistics is inherently data science. I realize that the data science movement is trying to identify a part of statistics specific to Web 2.0-type problems. However, I think that by identifying "science" with the Web 2.0 crowd, we are indirectly de-emphasizing that statisticians working outside of Web 2.0 are doing statistical science as well, e.g. in biostatistics, renewable energy, etc.

Having said that, I did recently author the R package for the Data Science Toolkit. :) The tools that Pete made available are pretty sweet and I felt that the R community could take advantage. A brief write-up can be found here:


and the code is given at github.com/rtelmore.


Thanks for the comments, Ryan! (I made your link clickable so it's easier for others to follow -- thanks for sharing.)

I'd counter slightly to say that "all of *applied* statistics is inherently data science" - I think one of the benefits of the term "data science" is that it implies the application of statistics to real-world problems (and not just Web 2.0 problems -- I think Data Science is broader than that).

Although the name is nice and useful, I really think it should have been called something like "data literate scientist". I am in the Operations Research department and we have two types of people: those who feel comfortable with data and those who don't. Both groups do operations research, but one of them doesn't speak the data language. The group that knows that language calls itself "data scientists", but we have almost nothing in common with the data scientist who works at the FDA or the data scientist who works at Facebook. We are passionate about a different set of problems.

Anyways, to me data literacy is just one new skill that anybody should master, just like language or communication skills. It is not another field or profession; it is just one skill.

Thank you. Very useful.
Yes, the terms "Data Science" and "Data Scientist" have only been in common usage for a little over a year. But the terms were coined and in use many years before that. Some references follow:
[1] http://datascience.fudan.edu.cn/.
[2] Dataology and Data Science: Up to Now [OL]. [16 June 2011] http://www.paper.edu.cn/index.php/default/en_releasepaper/content/4432156.
[3] Data Explosion, Data Nature and Dataology. In Proceedings of International Conference on Brain Informatics (BI’09). 2009.
[4] Dataology and Data Science. (in Chinese with English abstract). Fudan University Press. 2009. ISBN 978-7-309-06956-3 /T.350.

The comments to this entry are closed.
