In recent years I have become increasingly convinced that incompetent use of statistical methods constitutes a major obstacle to the production of good research across a broad range of sciences - perhaps even the majority of the empirical sciences. This is why the subject of mathematical statistics needs to advance its position in the university world, and why, on two occasions during the past year - at KVVS and at the department of political science at Göteborgs universitet - I have given talks with the same title as this blog post. I have now also written a paper on these matters, entitled Why the empirical sciences need statistics so desperately and intended for publication in the English-language scientific literature. As an appetizer, I serve the paper's opening section below.
* * *
What is science? Despite what some adherents of Popperian falsificationism [25] may claim, it seems unlikely that we can find a single short definition of science that captures all important aspects. See, e.g., Haack [13] for a sensible discussion on some of its many facets. The complexity notwithstanding, I hope most of us can agree on the somewhat vague statement that science consists of systematic attempts by us humans to extract reliable information about the world around us.
Since science is carried out by humans, it is in practice dependent on our cognitive capacities. Evolution has equipped us with impressive abilities to observe and draw conclusions about the world around us, necessary for finding food and sexual partners and to avoid predators. On the other hand, since Darwinian evolution by natural selection is not a perfect optimization algorithm, it should not come as a huge surprise that we have some striking cognitive biases. Some of these form serious obstacles to the scientific endeavor. In particular the following two spring to mind.
- (a) The human pattern recognition machinery is often too trigger-happy, i.e., we tend to see patterns in what is actually just noise. There is a famous experiment where, because of this phenomenon, human subjects typically perform worse than pigeons and mice. The subject faces two lamps and is asked repeatedly to predict which lamp will light up next. Unbeknownst to the subject, the lamps are turned on randomly, with a 0.8 probability for the lamp on the left to light up next, versus a 0.2 probability for the one on the right, in an i.i.d. sequence. Human subjects notice the asymmetry, but try to mimic the intricate pattern of lights, predicting the lamp on the left about 80% of the time, and ending up making the right guess 0.8·0.8 + 0.2·0.2 = 68% of the time. Simpler animals quickly settle for guessing the most frequent lamp every time, getting it right 80% of the time. See, e.g., Hinson and Staddon [16] and Wolford et al. [28].
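The arithmetic behind the two-lamp experiment is easy to check by simulation. The following is a minimal Monte Carlo sketch (not from the paper) comparing the human "probability matching" strategy against the animals' strategy of always guessing the more frequent lamp; the parameter names are my own.

```python
import random

random.seed(0)

P_LEFT = 0.8        # probability that the left lamp lights up
N_TRIALS = 100_000

# The i.i.d. sequence of lamps (True = left lamp).
lamps = [random.random() < P_LEFT for _ in range(N_TRIALS)]

# Strategy 1: probability matching -- guess "left" 80% of the time,
# independently of the actual sequence (the typical human behaviour).
matching_hits = sum(lamp == (random.random() < P_LEFT) for lamp in lamps)

# Strategy 2: always guess the more frequent lamp (the pigeon strategy).
maximizing_hits = sum(lamps)

print(f"probability matching: {matching_hits / N_TRIALS:.3f}")   # close to 0.68
print(f"always guess left:    {maximizing_hits / N_TRIALS:.3f}")  # close to 0.80
```

With 100,000 trials the simulated hit rates land close to the theoretical values 0.8·0.8 + 0.2·0.2 = 0.68 and 0.8 respectively.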
(b) In many situations, we tend to be overly confident about our conclusions. The following experiment is described by Alpert and Raiffa [2]; see also Yudkowsky [29]. Subjects are asked to estimate some quantity whose exact value they typically do not know, such as the surface area of Lake Michigan or the number of registered cars in Sweden. They are asked not for a single number, but for an upper bound and a lower bound together encompassing an interval with the property that the subject attaches a subjective probability of 98% to the event that the true value lies in the interval. If subjects are well-calibrated in terms of the confidence they attach to their estimates, then one would expect them to hit the true value about 98% of the time. In experiments, they do so less than 60% of the time, indicating that severe overconfidence in estimating unknown quantities is a widespread phenomenon.
Because of these cognitive biases and several others, we need, in order to perform good science, to set up various safeguards against our spontaneous tendency towards faulty and overconfident conclusions. Randomized, double-blind, placebo-controlled clinical trials are a typical example of a formalized protocol for precisely this purpose. The theory of statistical inference offers plenty of others, including a variety of important techniques for telling pattern from noise and for quantifying the amount of confidence in a given conclusion that a given data set warrants – that is, for circumventing biases (a) and (b) above.
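As a toy illustration (my own, not from the paper) of what "telling pattern from noise" can mean formally: a one-sided binomial test asks how probable an observed imbalance would be under a pure-noise null hypothesis. The helper function and the fair-coin scenario below are assumptions for the sake of the example.

```python
from math import comb

def binomial_p_value(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: P(X >= k) when X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Suppose 60 of 100 coin flips come up heads.  Is that a pattern,
# or just the noise we should expect from a fair coin?
p_value = binomial_p_value(60, 100)
print(f"P(X >= 60 | fair coin) = {p_value:.4f}")  # about 0.028
```

A p-value of roughly 0.028 says the imbalance would arise by chance under the null hypothesis only about 3% of the time – a quantified, rather than intuited, degree of confidence, which is exactly the kind of safeguard against bias (a) that the paragraph above describes.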
Statistical techniques are indispensable for doing high-quality and trustworthy science. Fortunately, the use of such techniques is widespread, to the point of permeating the empirical sciences. Unfortunately, they are often used in erroneous ways and in situations where they simply do not apply, leading to unwarranted conclusions.
In Section 2, I will try to argue for the seriousness of the situation by pointing out some indications – some of them quite shocking – of how widespread this misuse is. In Sections 3 and 4 I offer a couple of concrete examples of erroneous application and interpretation of statistical arguments. In an unabashed attempt to catch the readers’ attention, I take them from two of the most hotly debated (in public discourse) research areas: climate science and gender studies. Then, in Section 5, I will exemplify how the lack of statistical expertise in many empirical sciences has made room for a population of self-proclaimed and mostly self-taught statistical “experts” giving erroneous advice to their colleagues. Finally, in Section 6, I will offer a few thoughts on how it might be possible to improve the situation in the future.
Continue reading here!