June 02, 2011

Listed below are links to weblogs that reference In defense of data mining ethics:

Comments

Hello straw man. When Al, my bookstore owner, gives me a suggestion, I know he is giving me a suggestion and I know what he is basing that recommendation on (books I've bought in the past, conversations he and I have shared, etc.). The issue with Big Data is that too many people are unaware of when and how data is being collected. If Al gave me a suggestion based on an entry in my diary, I would have an issue with Al.

I would also have a problem if my grocer, Tim, gave me a suggestion based on a purchase I made at Al's bookstore, even if Al is Tim's brother.

If everyone had a way of knowing when something clicked, entered, uploaded or downloaded becomes data, then I would say there is less of an issue. Until the data collection process becomes more transparent, I do believe there are issues.

@Tyler Straw man battle! I too would have a problem if Al told Tim what books I'd been buying, because I don't want Tim giving me the stink-eye next time I buy a banana. But I don't have an issue with grocers.com using public data about me to make recommendations: I don't care what the grocers.com data mining algorithm "thinks" about me, because it's not a person. It's a machine. That was my point.

The rub comes in what "public data" means, and there are definitely ethical issues there (that Jim focuses on in his original post). My take is that a site is welcome to use any data I've shared privately with that site - my clicks, my actions, ratings, etc. Any site is also welcome to use any data I've chosen to share with the Web at large, via public profiles etc. The controversy arises when sites share data they only know through my private interactions with them, which should only be shareable with explicit consent. There should be an "ethics of data sharing" for corporations to follow in this case -- does such a thing exist?

I don't believe there is. A recent Technology Review post on Big Data (http://www.technologyreview.com/business/37548/?ref=rss) makes one suspicious of corporations collecting data from a broad range of sources using highly sophisticated algorithms. Until there is an established set of ethics (or laws), I don't think data mining is a "fair" practice for directing advertising at customers (and a recommendation is really just an advertisement aimed directly at that particular customer).

I realize this is where the debate gets off topic... were focus groups ever "fair" in the advertising world? I would say yes as long as those participating in the group know the reason behind it. I believe the same goes for a site using the data I provided. It should disclose how and when data is being collected. This is especially important for sites targeted at children and teens.

The comments to this entry are closed.

