Many of us have worked with datasets that have been anonymized by deleting personally identifiable information. Taking the concept further, a number of institutions -- including Netflix, AOL, and countless government agencies -- have released anonymized datasets to the public for analysis. But how anonymous are these datasets, really?
Not very, according to this Ars Technica article based on a recent paper. When 87% of Americans can be uniquely identified by nothing more than their ZIP code, birthdate, and sex, it's easy to see how such "anonymous" data can be cross-referenced with other sources to find personal information about individuals. The Governor of Massachusetts found this out the hard way when he was sent a copy of his personal medical records, extracted from publicly released data. Unfortunately, we're apparently left with the conundrum that "data can either be useful or perfectly anonymous but never both".
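To get a feel for why a few innocuous-looking fields can single people out, here is a minimal Python sketch that measures what share of records in a dataset is unique on a chosen set of columns. The function name `unique_fraction`, the field names, and the four toy records are all made up for illustration; this is not the method used in the paper, just one simple way to run the check on your own data.

```python
from collections import Counter

def unique_fraction(records, keys=("zip", "birthdate", "sex")):
    """Return the share of records whose combination of `keys` appears only once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(combos)

# Hypothetical "anonymized" records: names removed, other fields left in place.
records = [
    {"zip": "02138", "birthdate": "1950-01-01", "sex": "M", "diagnosis": "A"},
    {"zip": "02138", "birthdate": "1962-03-14", "sex": "F", "diagnosis": "B"},
    {"zip": "02138", "birthdate": "1962-03-14", "sex": "F", "diagnosis": "C"},
    {"zip": "02139", "birthdate": "1980-11-02", "sex": "F", "diagnosis": "D"},
]

print(unique_fraction(records))                 # 0.5
print(unique_fraction(records, keys=("zip",)))  # 0.25
```

Any record that is unique on these fields can, in principle, be matched against another source that lists the same fields alongside a name, such as a voter roll. Note how adding fields to `keys` can only keep or raise the unique share, which is the tension between usefulness and anonymity in miniature.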
Ars Technica: "Anonymized" data really isn't—and here's why not