Jim Stogdill has a thought-provoking post at O'Reilly Radar today on the ethics of data mining and personalization. He frames his argument through a personal anecdote: his local independent bookseller, Al, gives Jim valued recommendations for books he might like to read, based on Al's personal knowledge of Jim from their friendly conversations and Jim's purchases at the store. That's awesome -- I wish I could spend more time in bookstores and build up a relationship like that. But Jim says that when companies do the same thing, by offering recommendations based on your transactions and other personal information, that's somehow unethical, and that the corporate zeal for collecting such data is sociopathic.
Here's a thought experiment. If Al were to be replaced by a friendly robot (I was going to say "AI" -- artificial intelligence -- instead of "robot" there, but let's dodge that typographical landmine), would the situation be any less ethical? If Al the robot assiduously observed not just the books Jim bought from the store, but also those he looked at and put down, or even overheard the conversations he had with friends about his opinions of books in the store, and used that information as the basis of his recommendations, would that still be OK? At a fundamental level, Al the bookseller is doing exactly that: he just has a much more complex data mining model than any algorithm today, based on a richer set of qualitative data (and somewhat less quantitative data) about Jim.
For me, data mining and the recommendations that come from the process are incredibly useful. I simply don't have enough time to research every product -- heck, not even every product category -- that I might be interested in, so getting automated tips about stuff I might want to buy is valuable to me. (And I'm not alone: about a third of Amazon's revenue comes from recommendations, so plenty of other people find them valuable too.) The same goes for finding out about movies I might want to see, people I might want to meet, and places I might want to go. I value recommendations about such things from friends, certainly, but only because my friends have better in-built data mining algorithms, and more unique data about me to make predictions with, than the machines at Netflix and Foursquare. With better data mining algorithms and more data, perhaps automated recommendations could one day be just as valuable to me as those from people I know. That's why I'm diligent about providing ratings on Amazon and Netflix: it's beneficial to me.
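To make that concrete, here's a minimal sketch of the kind of machinery behind those recommendations: a toy user-based collaborative filter in Python. The book titles and ratings are invented, and real systems like Amazon's and Netflix's are vastly more sophisticated than this, but the basic principle -- score the things I haven't rated by what people similar to me have rated -- is the same, and it shows why every rating I contribute makes the predictions a little better.

```python
# Toy user-based collaborative filter. All users, titles, and ratings
# here are hypothetical; this only illustrates the general idea.
from math import sqrt

# user -> {item: rating on a 1-5 scale}
ratings = {
    "me":    {"Dune": 5, "Neuromancer": 4, "Foundation": 5},
    "alice": {"Dune": 4, "Neuromancer": 5, "Snow Crash": 4},
    "bob":   {"Foundation": 5, "Snow Crash": 5, "Hyperion": 4},
}

def cosine(u, v):
    """Cosine similarity between two users, over items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(u[i] ** 2 for i in common)) *
                  sqrt(sum(v[i] ** 2 for i in common)))

def recommend(target, ratings, n=3):
    """Score items the target hasn't seen, weighting each other user's
    ratings by how similar that user is to the target."""
    seen = ratings[target]
    scores, weights = {}, {}
    for user, their in ratings.items():
        if user == target:
            continue
        sim = cosine(seen, their)
        for item, r in their.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((s / weights[i], i) for i, s in scores.items()
                     if weights[i] > 0), reverse=True)
    return [item for _, item in ranked[:n]]

print(recommend("me", ratings))  # -> ['Snow Crash', 'Hyperion']
```

The more ratings in the table, the better the similarity estimates get -- which is exactly the bargain I'm describing: my ratings feed the model, and the model pays me back with better tips.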
The bottom line is that, personally, I have few worries about ethics around the use of data by machines. The ethical issues arise at the boundaries of machines and humans: if Amazon starts automatically issuing alerts to FBI agents because it doesn't like the type of books I'm buying, or if Facebook starts sending gossipy notes to my husband about who I've been chatting with lately, then yes, you sociopathic faceless corporations, we're going to have issues. But until that happens, I'm going to keep on rating things, checking in, and signing up for data-based services. And one day, perhaps, welcome our robot bookseller overlords.
O'Reilly Radar: An ethical bargain
Hello straw man. When Al, my bookstore owner, gives me a suggestion, I know he is giving me a suggestion and I know the basis on which he is making that recommendation (books I've bought in the past, conversations he and I have shared, etc.). The issue with Big Data is that too many people are unaware of when and how data is being collected. If Al gave me a suggestion based on an entry in my diary, I would have an issue with Al.
I would also have a problem if my grocer, Tim, gave me a suggestion based on a purchase I made at Al's bookstore, even if Al is Tim's brother.
If everyone had a way of knowing when something they clicked, entered, uploaded, or downloaded becomes data, then I would say there is less of an issue. Until the data collection process becomes more transparent, I do believe there are issues.
Posted by: Tyler | June 02, 2011 at 13:21
@Tyler Straw man battle! I too would have a problem if Al told Tim what books I'd been buying, because I don't want Tim giving me the stink-eye next time I buy a banana. But I don't have an issue with grocers.com using public data about me to make recommendations: I don't care what the grocers.com data mining algorithm "thinks" about me, because it's not a person. It's a machine. That was my point.
The rub comes in what "public data" means, and there are definitely ethical issues there (which Jim focuses on in his original post). My take is that a site is welcome to use any data I've shared privately with that site -- my clicks, my actions, my ratings, etc. Any site is also welcome to use any data I've chosen to share with the Web at large, via public profiles and the like. The controversy arises when sites share with third parties data they know only through my private interactions with them; that should happen only with my explicit consent. There should be an "ethics of data sharing" for corporations to follow in this case -- does such a thing exist?
Posted by: David Smith | June 02, 2011 at 14:08
I don't believe there is. A recent Technology Review post on Big Data (http://www.technologyreview.com/business/37548/?ref=rss) makes one suspicious of corporations collecting data from a broad range of sources using highly sophisticated algorithms. Until there is an established set of ethics (or laws), I don't think data mining is a "fair" practice for directing advertising at customers (and a recommendation is really just an advertisement targeted directly at that particular customer).
I realize this is where the debate gets off topic... were focus groups ever "fair" in the advertising world? I would say yes, as long as those participating in the group know the reason behind it. I believe the same goes for a site using the data I provide: it should disclose how and when data is being collected. This is especially important for sites targeted at children and teens.
Posted by: Tyler | June 02, 2011 at 14:49