In ScienceNews this month, there's a controversial article exposing the fact that results claimed to be "statistically significant" in scientific articles aren't always what they're cracked up to be. The article -- titled "Odds Are, It's Wrong" -- is interesting, but I take a bit of an issue with the sub-headline, "Science fails to face the shortcomings of Statistics". As it happens, the examples in the article are mostly cases of scientists behaving badly and abusing statistical techniques and results:
- Authors abusing P-values to conflate statistical significance with practical significance. For example, a drug may uncritically be described as "significantly" reducing the risk of some outcome, but the actual scale of the statistically significant difference is so small that it has no real clinical implication.
- Not accounting for multiple-comparisons bias. By definition, a test at the 5% significance level (often loosely described as "significant at the 95% level") has a 5% chance of flagging a difference that arose by random chance alone. Do enough tests, and you'll find some that do indicate significant differences -- but there will be some fluke events in that batch. There are so many studies, experiments and tests being done today (oftentimes, all in the same paper) that the "false discovery rate" may be higher than we think -- especially given that most nonsignificant results go unreported.
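To put a rough number on the multiple-comparisons point, here is a minimal sketch. It assumes the tests are independent and that every null hypothesis is actually true, which real studies rarely satisfy exactly; the function name is made up for illustration.

```python
def prob_at_least_one_false_positive(num_tests, alpha=0.05):
    """Chance that at least one of `num_tests` independent tests comes out
    'significant' at level `alpha` when no real effect exists anywhere."""
    # Each test avoids a false positive with probability (1 - alpha);
    # under independence, all of them do so with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        print(m, round(prob_at_least_one_false_positive(m), 3))
```

Under these assumptions, a paper reporting 20 uncorrected tests is more likely than not to contain at least one fluke.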
Statisticians, in general, are aware of these problems and have offered solutions: there's a vast field of literature on multiple comparisons tests, reporting bias, and alternatives (such as Bayesian methods) to P-value tests. But more often than not, these "arcane" issues (which are actually part of any statistical training) go ignored in scientific journals. You don't need to be a cynic to understand the motives of the authors for doing so -- hey, a publication is a publication, right? -- but the cooperation of the peer reviewers and editorial boards is disturbing.
ScienceNews: Odds Are, It's Wrong



I agree with you, but in my opinion the big problem lies in the lack of communication between statisticians and biologists. And the communication is obviously faulty on both sides.
I had to smile a little bit when I read your sentence saying: "But more often than not, these "arcane" issues (which are actually part of any statistical training) go ignored in scientific journals."
Well, try asking 100 biologists what a Bayesian method is and see how many can give you even a vague definition. Not many, I can tell you. And most people are not interested enough (or are too scared) to care and read something about it.
Posted by: nico | March 30, 2010 at 11:04
"Authors abusing P-values to conflate statistical significance with practical significance. For example, a drug may uncritically be described as "significantly" reducing the risk of some outcome, but the actual scale of the statistically significant difference is so small that it has no real clinical implication."
This reads like the issue is simply a misunderstanding of jargon. "Significance" means statistical significance to the academic audience.
The problem is more fundamental: there is a tendency to focus on hypothesis testing rather than parameter estimation.
Regarding multiple comparisons, many corrections could be performed by a reader using the uncorrected p-values, so long as the tests reported are all the tests that were conducted. If you want to be conservative, Bonferroni correction is simple: just count up the number of p-values and multiply each of them by that number (capping the result at 1).
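A reader could apply that recipe by hand, or with a few lines of code. The sketch below is only an illustration: the p-values are hypothetical, not taken from any paper, and the function name is made up.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical uncorrected p-values as they might be reported in one paper.
reported = [0.001, 0.012, 0.03, 0.04, 0.2]
adjusted = bonferroni_adjust(reported)

# A result survives the correction only if its adjusted p-value is still below 0.05.
significant = [p_adj < 0.05 for p_adj in adjusted]

if __name__ == "__main__":
    for p, p_adj, keep in zip(reported, adjusted, significant):
        print(p, round(p_adj, 3), keep)
```

In this made-up example, four of the five uncorrected p-values fall below 0.05, but only the smallest one survives the correction. As noted above, this only works if the reported tests are all the tests that were conducted.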
Posted by: Dan | March 30, 2010 at 23:33
The co-operation of peer reviewers and editorial boards may be disturbing, but it is understandable. Most of the reviewers and editors will have studied statistics many years ago and forgotten more than they remember. The result - stick with what was done before, even if it could actually be wrong. The status quo is self-reinforcing.
I like this account from "The Cult of Statistical Significance" p.112:
We asked William Kruskal a couple of years before his death, "Why did significance testing get so badly mixed up, even in the hands of professional statisticians? ..." "Well," replied Kruskal, smiling sadly, "I guess it's a cheap way to get marketable results."
Posted by: Grant Paton-Simpson | March 31, 2010 at 00:57
Hi David,
Good post on an important subject.
Just one small point: Notice that your sentence:
"... the "false discovery rate" may be higher than we think "
Is (probably) correct,
But my guess is that what is more probable (and also the type of error people are making) is that they think that the "family-wise error" (FWE) in the article is about 5% - when in fact it is not. That is, people might think that each time they see a P-value < .05, it can be interpreted in the way they would interpret a single P-value.
While the FDR of the article might be kept at q<.05, people could (too easily) misinterpret it as if the article's FWE was less than .05.
I am not sure I was clear to whoever is not familiar with the subject, but I hope at least that I was able to raise some questions for people to go and find the answers to :)
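For anyone who wants to see the FDR/FWE distinction in numbers, here is a small sketch. It uses the Benjamini-Hochberg step-up procedure as one common way of controlling the FDR, next to a Bonferroni cutoff for the FWE; the p-values are hypothetical and the function names are made up.

```python
def benjamini_hochberg_rejections(p_values, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    the largest rank whose p-value is at most q * rank / m."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            largest_rank = rank
    rejected = [False] * m
    for idx in order[:largest_rank]:
        rejected[idx] = True
    return rejected


def bonferroni_rejections(p_values, alpha=0.05):
    """Bonferroni: reject only p-values at or below alpha / m (controls the FWE)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values from eight tests in a single article.
p_values = [0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9]
bh = benjamini_hochberg_rejections(p_values)
bonf = bonferroni_rejections(p_values)

if __name__ == "__main__":
    print("FDR (BH) rejections:", sum(bh))
    print("FWE (Bonferroni) rejections:", sum(bonf))
```

On these made-up numbers the FDR procedure declares three results significant and the FWE procedure only one. Keeping the FDR at q<.05 limits the expected share of flukes among the declared discoveries; it does not keep the chance of any fluke at all below .05.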
Posted by: Tal Galili | March 31, 2010 at 06:33
learned a lot
Posted by: Boydayoppoppy | September 06, 2011 at 01:04