by Joseph Rickert

I had barely begun reading *Statistics Done Wrong: The Woefully Complete Guide* by Alex Reinhart (No Starch Press, 2015) when I started to wonder about the origin of the aphorism "Don't shoot the messenger." It occurred to me that this might be a reference to a primitive emotion that wells up unbidden when you hear bad news delivered in such a way that you know things are not going to get better any time soon.

It was on page 4 that I read: "Even properly done statistics can't be trusted." Ouch! Now, to be fair, the point the author is trying to make here is that it is often not possible, based solely on the evidence contained in a scientific paper, to determine whether an author sifted through the data until something interesting turned up. But, coming as it does after mentioning J. P. A. Ioannidis's conclusion that most published research findings are probably false, that the average scores of medical school faculty on tests of basic statistical knowledge don’t get much better than 75%, and that both pharmaceutical companies and the scientific journals themselves bias research by failing to publish studies with negative results, Reinhart’s sentence really stings. Moreover, Reinhart is so zealous in his efforts to expose the numerous ways a practicing scientist can go wrong in attempting to "employ statistics" that it is reasonable (despite the optimism he expresses in the final chapters) for a reader in the book’s target demographic of practicing scientists with little formal training in statistics to conclude that the subject is just insanely difficult.
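The worry about sifting through data has a simple arithmetic core, which a small calculation can make concrete (a sketch of mine, not an example from the book): run twenty independent tests on pure noise and the chance that at least one clears the conventional 5% significance bar is roughly 64%.

```python
import random

random.seed(0)

ALPHA = 0.05
M = 20  # hypothetical number of comparisons tried on one data set

# Family-wise error rate: the chance that at least one of M independent
# true-null tests comes out "significant" at level ALPHA.
fwer = 1 - (1 - ALPHA) ** M
print(f"analytic chance of a spurious hit: {fwer:.2f}")

# Monte Carlo check: under a true null hypothesis, p-values are
# uniformly distributed on [0, 1], so a "hit" is just p < ALPHA.
trials = 100_000
hits = sum(
    any(random.random() < ALPHA for _ in range(M))
    for _ in range(trials)
)
print(f"simulated chance of a spurious hit: {hits / trials:.2f}")
```

Nothing in the data needs to be real for a "discovery" to appear; persistence alone is enough, and the published paper rarely says how many comparisons were tried.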

Is the practice of statistics just too difficult? Before permitting myself a brief comment on this I’ll start with an easier and more immediate question: Is this book worth reading? To this question, the answer is an unqualified yes.

Anyone starting out on a journey would like to know ahead of time where the road is dangerous, where the hard climbs are, and most of all: where be the dragons? *Statistics Done Wrong* is as good a map to the traps lurking in statistical analysis adventures as you are ever likely to find. In less than 150 pages it covers the pitfalls of p-values, the perils of being underpowered, the disappointments of false discoveries, the folly of mistaking correlation for causation, the evils of torturing data and the need for exploratory analysis to avoid Simpson’s paradox.
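The Simpson's paradox entry on that list rewards a concrete illustration. The sketch below (mine, with hypothetical recovery counts, not numbers from the book) shows one treatment winning within every severity subgroup yet losing once the subgroups are pooled:

```python
# Hypothetical (recovered, treated) counts for two treatments,
# split by case severity -- invented for illustration.
groups = {
    "mild":   {"A": (81, 87),   "B": (234, 270)},
    "severe": {"A": (192, 263), "B": (55, 80)},
}

def rate(recovered, treated):
    return recovered / treated

# Within every subgroup, treatment A has the higher recovery rate...
subgroup = {sev: {arm: rate(*counts) for arm, counts in arms.items()}
            for sev, arms in groups.items()}
print(subgroup)

# ...but pooling the subgroups reverses the ranking, because A was
# given mostly to the severe cases and B mostly to the mild ones.
totals = {arm: [sum(g[arm][i] for g in groups.values()) for i in (0, 1)]
          for arm in ("A", "B")}
pooled = {arm: rate(*counts) for arm, counts in totals.items()}
print(pooled)
```

The aggregate table, taken alone, points to exactly the wrong treatment; only exploratory work on the subgroups reveals the confounding.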

About three quarters of the way into the book (Chapter 8), Reinhart moves beyond basic hypothesis testing to consider some of the problems associated with fitting linear models. There follows a succinct but lucid presentation of some essential topics including overfitting, unnecessary dichotomization, variable selection via stepwise regression, the subtle ways in which one can be led into mistaking correlation for causation, the need for clarity in dealing with missing data and the difficulties of recognizing and accounting for bias.
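The stepwise-selection hazard, in particular, is easy to demonstrate (again, a sketch of mine rather than anything from the book): hand a greedy selection procedure enough pure-noise predictors and the "best" one will look impressively correlated with the outcome in-sample.

```python
import math
import random

random.seed(1)

def corr(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

n_obs, n_candidates = 30, 50
y = [random.gauss(0, 1) for _ in range(n_obs)]       # pure-noise outcome
X = [[random.gauss(0, 1) for _ in range(n_obs)]      # pure-noise predictors
     for _ in range(n_candidates)]

# "Select" the predictor with the strongest in-sample correlation,
# as the first step of a greedy stepwise procedure would.
best = max(X, key=lambda col: abs(corr(col, y)))
print(f"best |r| among {n_candidates} noise predictors: "
      f"{abs(corr(best, y)):.2f}")
```

Every variable here is noise, yet the winning correlation is far from zero; without out-of-sample validation, the selection step manufactures its own evidence.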

That is a lot of ground to cover, but Reinhart manages it with some style and with an eye for relevant contemporary issues. For example, in his discussion on statistical significance Reinhart says:

> And because any medication or intervention usually has some real effect, you can always get a statistically significant result by collecting so much data that you detect extremely tiny but relatively unimportant differences (p9).

He then follows up with a very amusing quote from Bruce Thompson's 1992 paper that wryly explains that significance tests on large data sets are often little more than confirmations of the fact that a lot of data was collected. Here we have a “Big Data” problem, deftly dealt with in 1992, but in a journal that no data scientist is ever likely to have read.
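The point is easy to verify numerically. In this sketch of mine, a two-sided one-sample z-test of a trivially small true effect fails to reach significance with a modest sample but becomes overwhelmingly "significant" once enough data is collected:

```python
import math
import random

random.seed(2)

def p_value(sample, mu0=0.0):
    """Two-sided one-sample z-test p-value, assuming known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n - mu0) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

TRUE_EFFECT = 0.02  # a tiny, practically unimportant shift

results = {}
for n in (100, 10_000, 1_000_000):
    sample = [random.gauss(TRUE_EFFECT, 1) for _ in range(n)]
    results[n] = p_value(sample)
    print(f"n={n:>9,}  p={results[n]:.4g}")
```

The effect never changes; only the sample size does. A p-value this small certifies mostly that the data collection was thorough.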

The bibliography contained in the notes to each chapter of *Statistics Done Wrong* is a major strength of the book. Nearly every transgression recorded and every lamentable tale of the sorry state of statistical practice is backed up with a reference to the literature. This impressive exercise in scholarly research adds weight and depth to the book’s contents and increases its usefulness as a guide.

Also, to my surprise and great delight, Reinhart manages a short discussion that elucidates the differences between R. A. Fisher’s conception of p-values and the treatment given by Neyman and Pearson in their formal theory of hypothesis testing. The confounding of these two very different approaches in what Gigerenzer et al. call the “Null Ritual” is perhaps the root cause of most of the misuse and abuse of significance testing in the scientific literature. Yet you can examine dozens of the most popular textbooks on elementary statistics and find no mention of it.

In the closing chapters of *Statistics Done Wrong* Reinhart effects a change of tone and discusses some of the structural difficulties with the practice of statistics in the medical and health sciences that have contributed to the present epidemic of false, misleading or just plain useless published results. Topics include the lack of incentives for researchers to publish inconclusive and negative results, the reluctance of many researchers to share data and the willingness of some to attempt to game the system by deliberately publishing “doctored” results. Reinhart handles these topics nicely and uses them to motivate contemporary work on reproducibility and the need to cultivate a culture of reproducible, open research. He ends with recommendations for the new researcher that allow him to finish the book on a surprisingly upbeat note. The bearer of bad news concludes by offering hope.

I highly recommend that *Statistics Done Wrong* be read as the author intended: as supplementary material. In the preface, Reinhart writes:

> But this is not a textbook, so I will not teach you how to use these techniques in any technical detail. I only hope to make you aware of the most common problems so you are able to pick the statistical technique best suited to your question.

*Statistics Done Wrong* is the kind of study guide that I think could benefit almost anyone slogging through a statistical analysis for the first time. It seems to me that the author achieves his stated goal with admirable economy and just a few shortcomings. The book, which entirely avoids the use of mathematical symbolism, would have benefited from precise definitions of the key concepts presented (p-values, confidence intervals etc.) and from a little R code to back up these definitions. These are, however, relatively minor failings.

Now, back to the big question: is the practice of statistics just too difficult? Yes, I think that the catalogue of errors and numerous opportunities for going wrong documented by Reinhart indicates that the practice of statistics is more difficult than it needs to be. My take on why this is so is expressed (perhaps inadvertently) by Reinhart in the statement of his goal for the book quoted above. As long as statistics is conceived and taught as the process of selecting the right technique to answer isolated questions, rather than as an integrated system for thinking with data, we are all going to have a difficult time of it.
