Last month I joined Gregory Piatetsky (KDnuggets editor) for a webinar presentation, Data Science: Not Just for Big Data, hosted by Kalido. In my portion of the presentation (you can see my slides below, and Gregory's slides are here), I wanted to react to the Big Data focus that is so much a part of the Data Science movement today, and to focus instead on the issues that arise with all data sets, which statisticians have learned about from working with smaller data sets over the last 200 years. These include observational bias (an often-overlooked issue with Big Data), confounding and overfitting (which can mess up any model if care isn't taken), and the need to move the discussion beyond predictions (means) and towards risk (variance).
I still firmly believe that Big Data is important (there's so much we can do today that was never possible without the variety and volume of data sources we have now), but the data science community has much to learn from the realm of smaller data. Several of the examples in my talk come from the excellent ComputerWorld article, 12 predictive analytics screw-ups. You can watch the webinar replay below.
[Updated Nov 19 to add a link to Gregory's slides.]
Kalido Webinars: Data Science: Not Just For Big Data