A recently published comment in the journal Nature calls for action against the misleading use of "statistical significance". The paper is supported by more than eight hundred academics from a wide range of disciplines.
Pleased to meet you, ‘P values’!
Statistical significance is prevalent in many fields and has a deep impact on our daily lives, choices, and decisions. The three scientists behind the paper argue that statistical analyses too often conclude that there is "no difference" between two studied groups. In statistics, the assumption of no difference is known as the "null hypothesis".
The authors claim that declaring "no difference" solely because a result fails a significance test is dangerously misleading. Their argument is that two studies can observe nearly identical effects, yet one may turn out statistically significant while the other does not. This dichotomization happens because the method relies too strictly on a single factor: a P-value threshold, conventionally 0.05.
"Let’s be clear about what must stop: we should never conclude there is ‘no difference’ or ‘no association’ just because a P value is larger than a threshold such as 0.05 (…) Neither should we conclude that two studies conflict because one had a statistically significant result and the other did not. These errors waste research efforts and misinform policy decisions."
How does it work?
"For example, consider a series of analyses of unintended effects of anti-inflammatory drugs. Because their results were statistically non-significant, one set of researchers concluded that exposure to the drugs was 'not associated' with new-onset atrial fibrillation (…) and that the results stood in contrast to those from an earlier study with a statistically significant outcome."
The actual data did not support these conclusions, they argue: "It is ludicrous to conclude that the statistically non-significant results showed 'no association', when the interval estimate included serious risk increases; it is equally absurd to claim these results were in contrast with the earlier results showing an identical observed effect. Yet these common practices show how reliance on thresholds of statistical significance can mislead us."
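The scenario the authors describe is easy to reproduce numerically. In the sketch below, two hypothetical studies observe the exact same effect (a 20% risk increase), but the smaller study estimates it less precisely. One P value falls below 0.05 and the other does not, even though the observed effect is identical. All numbers here are invented for illustration and do not come from the drug studies cited in the paper.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided P value for a normally distributed estimate,
    testing the null hypothesis that the true effect is zero."""
    z = abs(estimate) / se
    # Normal survival function via the error function (stdlib only).
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

# Both hypothetical studies observe the same log risk ratio (~20% increase),
# but the larger study has a smaller standard error.
log_rr = math.log(1.2)
p_large_study = two_sided_p(log_rr, se=0.08)
p_small_study = two_sided_p(log_rr, se=0.12)

print(f"large study: p = {p_large_study:.3f}")  # below 0.05: "significant"
print(f"small study: p = {p_small_study:.3f}")  # above 0.05: "non-significant"
```

Labeling the second result "no association" while calling the first "significant" is exactly the dichotomization the authors object to: the two studies agree perfectly on the effect itself.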
The consequences
Professors Amrhein, Greenland, and McShane also state that the whole issue is more human than it is statistical: it is our own cognitive processes that work in this categorical way. This, they write, "led scientists and journal editors to privilege such results, thereby distorting the literature. Statistically significant estimates are biased upwards in magnitude and potentially to a large degree, whereas statistically non-significant estimates are biased downwards in magnitude."
Is there a way out?
"We (…) call for the entire concept of statistical significance to be abandoned. (…) One reason to avoid such ‘dichotomania’ is that all statistics, including P values and confidence intervals, naturally vary from study to study, and often do so to a surprising degree."
"We must learn to embrace uncertainty," they continue. "One practical way to do so is to rename confidence intervals as ‘compatibility intervals’ and interpret them in a way that avoids overconfidence."
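A minimal sketch of that reading, with made-up numbers: a 95% interval for a risk ratio is computed as usual (estimate ± 1.96 standard errors on the log scale), but interpreted as the range of effect sizes reasonably compatible with the data, rather than as a significant/non-significant verdict.

```python
import math

def compatibility_interval(log_rr, se, z=1.96):
    """95% interval for a risk ratio, read as the range of values
    reasonably compatible with the data (not a yes/no verdict)."""
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)

# Hypothetical "non-significant" result: observed risk ratio 1.2, wide interval.
lo, hi = compatibility_interval(math.log(1.2), se=0.12)
print(f"effects from {lo:.2f} to {hi:.2f} are compatible with the data")
```

The interval here spans 1.0 (no effect) but also sizeable risk increases, so summarizing it as "no association" would discard exactly the uncertainty the authors say we must learn to embrace.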
They are not alone
This article is an important addition to a series of similar warnings issued by scientists in recent years, all advocating against this misleading methodology. In 2016, the American Statistical Association released a statement in The American Statistician warning against the misuse of statistical significance and P values.
That issue also included many commentaries on the subject. This month, a special issue of the same journal attempts to push these reforms further. It presents more than 40 papers on "Statistical inference in the 21st century: a world beyond P < 0.05". The editors introduce the collection with the caution "don’t say ‘statistically significant’". Another article with dozens of signatories also calls on authors and journal editors to disavow those terms.