In 2009, I read and enjoyed Stephen Ziliak's and Deirdre McCloskey's important book The Cult of Statistical Significance, and in 2010 my review of it appeared in the Notices of the American Mathematical Society. In the hope of engaging Ziliak and McCloskey in some further discussion, I write the present blog post in English; readers looking for a gentle introduction in Swedish to statistical hypothesis testing and statistical significance may instead consult an earlier blog post of mine.
In Ziliak's and McCloskey's recent contribution to Econ Journal Watch, we find the following passage:
- In several dozen journal reviews and in comments we have received—from, for example, four Nobel laureates, the statistician Dennis Lindley (2012), the mathematician Olle Häggström (2010), the sociologist Steve Fuller (2008), and the historian Theodore Porter (2008)—no one [...] has tried to defend null hypothesis significance testing.
- The Cult of Statistical Significance is written in an entertaining and polemical style. Sometimes the authors push their position a bit far, such as when they ask themselves: "If null-hypothesis significance testing is as idiotic as we and its other critics have so long believed, how on earth has it survived?" (p. 240). Granted, the single-minded focus on statistical significance that they label sizeless science is bad practice. Still, to throw out the use of significance tests would be a mistake, considering how often it is a crucial tool for concluding with confidence that what we see really is a pattern, as opposed to just noise. For a data set to provide reasonable evidence of an important deviation from the null hypothesis, we typically need both statistical and subject-matter significance.
Let me take this opportunity to expand a bit, by means of a simple example, on my claim that in order to establish "reasonable evidence of an important deviation from the null hypothesis, we typically need both statistical and subject-matter significance". Assume that the producer of the soft drink Percy-Cola has carried out a study in which subjects have been blindly exposed to one mug of Percy-Cola and one mug of Crazy-Cola (in randomized order), and asked to indicate which of them tastes better. Assume furthermore that 75% of subjects prefer the mug containing Percy-Cola, while only 25% prefer the one with Crazy-Cola. How impressed should we be by this?
This depends on how large the study is. Compare the two cases:
- (a) out of a total of 4 subjects, 3 preferred Percy-Cola,
- (b) out of a total of 1000 subjects, 750 preferred Percy-Cola.
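The difference between the two cases can be made concrete with a one-sided exact binomial test against the null hypothesis that each subject is equally likely to prefer either drink. The following sketch (my own choice of test, not something dictated by the example) computes the probability, under that null hypothesis, of seeing a preference for Percy-Cola at least as strong as the one observed:

```python
from math import comb

def binomial_tail_p(n: int, k: int) -> float:
    """One-sided p-value: the probability of at least k out of n subjects
    preferring Percy-Cola under the null hypothesis that each subject is
    equally likely (probability 1/2) to prefer either cola."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# Case (a): 3 out of 4 subjects prefer Percy-Cola.
print(binomial_tail_p(4, 3))       # 0.3125 -- easily explained as chance

# Case (b): 750 out of 1000 subjects prefer Percy-Cola.
print(binomial_tail_p(1000, 750))  # astronomically small
```

Although the observed proportion (75%) is the same in both cases, case (a) gives a p-value of 0.3125, entirely compatible with pure noise, while in case (b) the p-value is smaller than 10^-50, so the preference can hardly be a statistical fluctuation. This is precisely the distinction that statistical significance is designed to capture.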
Statistical significance is a useful way of quantifying how convinced we should be that an observed effect is real and not just a statistical fluctuation. Ziliak and McCloskey argue at length in their book that statistical significance has often been misused in many fields, and in this they are right. But they are wrong when they suggest that the concept is worthless and should be discarded.
Edit, March 4, 2015: Somewhat belatedly, and thanks to the kind remark by Mark Dehaven in the comments section below, I have realized that my sentence "Statistical significance is a useful way of quantifying how convinced we should be that an observed effect is real and not just a statistical fluctuation" in the last paragraph does not accurately reflect my view - neither my view now, nor the one I had two years ago. It is hard for me to understand now how I could have written such a thing, but my best guess is that it must have been written in haste. Statistical significance and p-values do not quantify "how convinced we should be", because there may be so much else, beyond the data set presently at hand, that ought to influence how convinced or not we should be. Instead of the unfortunate sentence, I'd prefer to say that "Statistical significance and p-values provide, as a first approximation, an indication of how strongly the data set in itself constitutes evidence against the null hypothesis (provided the various implicit and explicit model assumptions correctly represent reality)".