Last month I joined Gregory Piatetsky (KDnuggets editor) for a webinar presentation, Data Science: Not Just for Big Data, hosted by Kalido. In my portion of the presentation (you can see my slides below, and Gregory's slides are here), I wanted to react to the Big Data focus which is so much a part of the Data Science movement today, and to focus on the issues that arise with all data sets, which statisticians have learned about from working with smaller data sets over the last 200 years. These include observational bias (an often-overlooked issue with Big Data), and confounding and overfitting (which can mess up any model, if care isn't taken). I also wanted to move the discussion away from predictions (means) and towards risk (variance).
I still firmly believe that Big Data is important — there's so much we can do today that was never possible without the variety and volume of data sources we have now — but the data science community has much to learn from the realm of smaller data. Several examples come from the excellent ComputerWorld article, 12 predictive analytics screw-ups. You can watch the webinar replay below.
[Updated Nov 19 to add a link to Gregory's slides.]
Kalido Webinars: Data Science: Not Just For Big Data
While I agree with you completely regarding the potential pitfalls, these are just the normal ones one is warned about when doing statistics. I have been involved in statistics in one form or another, in many fields, for a few decades now. I have also just re-started an MS in Applied Statistics (after a couple of decades). What you talk about with small data is just the stuff that I get in just about every textbook. Good stuff, but standard statistics.
What this makes me think is that you are finding Data Scientists who are not statisticians trying to do statistics. That would be like someone who is a PhD Statistician trying to make suggestions about how to physically handle large datasets on a particular hardware and software platform. The results are often hilarious.
Posted by: Louis Giokas | November 15, 2013 at 15:40