The New York Times last weekend looked at the controversy around the recent changes to the mammogram guidelines from a mathematical perspective. Compared to the analysis based on Bayes' Theorem from the Harvard Social Science Statistics blog (which caused some controversy itself: that post was deleted and later replaced after some errors apparently crept into the calculations), this article argues from a simple scenario with made-up (but plausible) numbers:

Assume there is a screening test for a certain cancer that is 95 percent accurate; that is, if someone has the cancer, the test will be positive 95 percent of the time. Let’s also assume that if someone doesn’t have the cancer, the test will be positive just 1 percent of the time. Assume further that 0.5 percent — one out of 200 people — actually have this type of cancer. Now imagine that you’ve taken the test and that your doctor somberly intones that you’ve tested positive. Does this mean you’re likely to have the cancer? Surprisingly, the answer is no.

To see why, let’s suppose 100,000 screenings for this cancer are conducted. Of these, how many are positive? On average, 500 of these 100,000 people (0.5 percent of 100,000) will have cancer, and so, since 95 percent of these 500 people will test positive, we will have, on average, 475 positive tests (.95 x 500). Of the 99,500 people without cancer, 1 percent will test positive for a total of 995 false-positive tests (.01 x 99,500 = 995). Thus of the total of 1,470 positive tests (995 + 475 = 1,470), most of them (995) will be false positives, and so the probability of having this cancer given that you tested positive for it is only 475/1,470, or about 32 percent! This is to be contrasted with the probability that you will test positive given that you have the cancer, which by assumption is 95 percent.
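The arithmetic above is easy to verify in a few lines of Python. This sketch uses the article's made-up figures (95 percent sensitivity, 1 percent false-positive rate, 0.5 percent prevalence) and computes the answer both ways, by counting the imagined 100,000 screenings and directly via Bayes' Theorem:

```python
# The article's assumed numbers
sensitivity = 0.95      # P(positive | cancer)
false_pos_rate = 0.01   # P(positive | no cancer)
prevalence = 0.005      # P(cancer) = 1 in 200

# Counting approach: imagine 100,000 screenings
n = 100_000
with_cancer = n * prevalence                          # 500 people
true_positives = with_cancer * sensitivity            # 475 tests
false_positives = (n - with_cancer) * false_pos_rate  # 995 tests
ppv_counts = true_positives / (true_positives + false_positives)

# Bayes' Theorem gives the same answer without the imaginary population:
# P(cancer | positive) = P(pos | cancer) P(cancer) / P(pos)
ppv_bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + false_pos_rate * (1 - prevalence)
)

print(round(ppv_counts, 3), round(ppv_bayes, 3))  # both about 0.323
```

Either way, the probability of cancer given a positive test comes out to 475/1,470, or about 32 percent.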

It's a nice example of how our intuition about probabilities can often be out of step with reality.

New York Times: Mammogram Math

Uh, that IS Bayes' Theorem.

Posted by: John R. Vokey | December 14, 2009 at 18:52

You're right of course, but presenting it as conditional probability (or even probability within subsets) makes it more approachable than quoting Bayes' Theorem.

Posted by: David Smith | December 14, 2009 at 20:02

I like it.

Posted by: MB | December 14, 2009 at 20:37

The article is basically explaining the commonly misunderstood difference between sensitivity and positive predictive value. Obviously, people fail to understand the impact of what you are conditioning on. I've emailed a link to my biostat class, since we spent some time on this.

Posted by: Wade Davis | December 15, 2009 at 10:21

This will be the case whenever the proportion of positives in the population (the 0.5%) is small. Exactly the same argument applies when you are "testing" for terror suspects through data mining/random searches/etc -- your system becomes ineffective as it is flooded by false positives.

Posted by: Alexis | December 17, 2009 at 02:28