by Joseph Rickert
I had the good fortune to attend the GE-hosted event Data Forecast: What’s Next in Data Science, held at the chic “Box SF” a couple of weeks ago. (This is as close as I get to “elegantly cool”.) The highlight of the evening was a panel discussion very ably led by Jeff Kelly of Wikibon. The panelists were Kaggle’s Anthony Goldbloom, Annika Jimenez of Pivotal (formerly EMC Greenplum), Andreas Weigend of Stanford’s Social Data Lab, Hilary Mason of bitly and Anil Varma of GE. The discussion is online and worth listening to in its entirety.
This was a better-than-average event from the point of view of serious questions and panelists who were willing to give thoughtful, sometimes provocative answers. To Jeff’s first question, “What is the role of data science, not just in business but in society in general ... is it only about ad placement?”, Anthony replies that data science is about “bringing rigor to decision making”. That is certainly a positive move away from ad placement, but it does not go as far as Hilary takes it. In the preamble to her response she observes: “For the first time we are able to study human behavior at the scale of human behavior”, but then notes that “The accomplishments of data science have out-paced data products.” Annika refers to the “person level impact of data science” and Andreas points out that “data science allows us to observe individual things that individuals do in their natural environment”. This tension between the promise of human understanding implicit in the first part of Hilary’s response and the “dark side” of corporations and governments learning much too much about individuals pervades the entire discussion. For example, later in the discussion (29:30) Andreas points out that data scientists are now able to address “questions that didn’t exist before” and suggests that we may be near the “end of insurance as we know it”. It is conceivable that data scientists working for insurance companies could make predictions at a fine enough level of granularity to make clear the risk of insuring individuals. He asks: “Do we want this as a society?” And yet, after this ominous warning, Anil expresses optimism and points to positive examples of companies sharing data for the “greater good” (42:56). The thread of this discussion could not have been planned in the five minutes or so the participants huddled by the hors d'oeuvres before the panel. Magic can happen when good people get together.
The discussion around Jeff’s second question, “Is the role of the data scientist to find the right questions?”, surprised me. Both Annika and Anthony push back, insisting that collaboration is key and that the business people should know what questions to ask and how to prioritize them. Hilary contributes that an essential skill of a data scientist is to be able to “understand a business problem and translate it into an analysis”, adding in her own unequivocal style that creativity is an essential quality for a data scientist and that the notion that “the data tells us what to do is bullshit”. Later in the discussion (28:46) Annika warns that data scientists and the companies they work for will find themselves in trouble unless the problems they are working on are attached to a “vision for data science”.
Finally, a major theme pervading the discussion (and probably the reason for the evening) is that everyone there was looking to hire data scientists. There was a clear note of desperation here, and how could there not be, given the cumulative description of skills the panelists attribute to data scientists? One would think that every data-driven organization is trying to hire the equivalent of the Avengers. Sure, there are a few superheroes out there: “I still believe in heroes” (Nick Fury). But really, do Black Widow or Captain America go looking for work? Data scientists need to be identified, cultivated and mentored, wooed maybe, but rarely merely hired.
I am really surprised that in this discussion the word "Statistics" was never mentioned. What do you think of the relationship between data science and statistics?
Posted by: Honglang Wang | April 19, 2013 at 07:54
Anil Varma talked about uncertainty, and being able to determine confidence intervals for the results. Statistics plays a big role there.
Posted by: Markus Spatz | April 19, 2013 at 08:10