Yesterday's New York Times includes a great article on the failure of some genetic tests for cancer detection, and the flaws in the research that led to them. The article features quotes from Keith Baggerly of MD Anderson Cancer Center, and includes a photo of him and colleague Kevin Coombes in front of a page of R code:
The article highlights the importance of reproducible research: unless others have access to the data and code behind the statistical conclusions in a paper, many such errors are likely to go undiscovered. I saw Keith give an amazing talk about reproducible research and data forensics at the BioConductor conference a couple of years ago (I'm pretty sure that's one of the slides from the talk in the image above):
The talk concerned a published article whose results seemed odd. Baggerly tried in vain to get hold of the source data to reproduce the analysis, but the article's authors didn't cooperate. So in an amazing feat of data forensics, he managed to recreate the data by matching public sources to measurements read off the printed graphs, and figured out that there were gross data errors in the article: labels transposed, data duplicated, that kind of thing. The conclusions were completely bunk, but the journal refused to print a correction, even though it meant actual patients were being enrolled in trials of inappropriate drugs.
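Baggerly's actual forensics were done in R on the original microarray data, but as a minimal sketch (in Python, with entirely hypothetical sample data) of the kind of consistency check that catches errors like the ones described above, here is a check for exactly duplicated rows in a sample sheet:

```python
# Minimal sketch: flag rows of a (hypothetical) sample sheet that are exact
# copies of an earlier row -- one of the simplest forensic checks for the
# "data duplicated" class of error described above.

def find_duplicate_rows(rows):
    """Return indices of rows that exactly duplicate an earlier row."""
    seen = {}
    dupes = []
    for i, row in enumerate(rows):
        key = tuple(row)
        if key in seen:
            dupes.append(i)
        else:
            seen[key] = i
    return dupes

# Hypothetical data: (sample_id, drug-response label, measurement)
samples = [
    ("s1", "sensitive", 0.91),
    ("s2", "resistant", 0.12),
    ("s3", "sensitive", 0.88),
    ("s2", "resistant", 0.12),  # duplicate of the second row
]

print(find_duplicate_rows(samples))  # -> [3]
```

Transposed labels are harder to detect mechanically, which is why Baggerly had to cross-reference the figures against independent public data sources.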
I'm glad to see this important issue is getting some wider media attention -- check out the Times article at the link below for the story.
New York Times: How Bright Promise in Cancer Testing Fell Apart
Not sure that "reproducible research and data forensics" links to the right place?
Posted by: Kevin Wright | July 10, 2011 at 06:28
That's the right link Kevin -- it's my summary of the Bioconductor conference in Seattle in 2009, where Keith Baggerly spoke on this topic.
Posted by: David Smith | July 11, 2011 at 09:00