Stephen T. Ziliak, co-author with Deirdre N. McCloskey of The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives, appeared on BBC Radio 4’s More or Less with Tim Harford to discuss the problems with significance testing. Harford’s program focuses on the way numbers and statistics are used and misused in the public realm, and for the April 15 episode he spoke with Ziliak about a recent Supreme Court decision that hinged on whether statistical significance was a meaningful standard by which to assess a company’s liability for a potentially dangerous product.
Harford began with an example: a bowl of Smarties candy from which all colors other than orange and blue had been removed, and from which the host would take a sample to determine which color was present in greater quantity. A random sample turned up 2 blue and 8 orange, which led Harford to ask, “Does that prove that there are more orange than blue Smarties in the jar?” As Harford munched away on his sample, co-host Hannah Barnes explained that statistical significance testing would assume an equal number of orange and blue Smarties and then calculate the likelihood of drawing 8 or more orange in a sample of ten.
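That tail probability is straightforward to compute. A minimal sketch in Python, assuming the null hypothesis Barnes describes (orange and blue equally common, so p = 0.5) and a sample of ten candies:

```python
from math import comb

def binom_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of drawing at least
    k orange Smarties in n picks if orange and blue are equally common."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# 8 or more orange out of 10 under the equal-colors null:
p_value = binom_tail(10, 8)  # 56/1024, just above the conventional 5% line
```

By the usual 5% convention, Harford's sample of 8 orange out of 10 narrowly fails to count as "statistically significant" evidence that orange predominates.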
“One reason I do not like the test of statistical significance,” Ziliak told BBC Radio 4, “is that it doesn’t tell you what you want. What you want to know is the probability of your hypothesis being true, given the available evidence. That would be very useful, for example, when you’re taking a pain relief pill. You’d like to know the probability of the benefits and costs of taking that medicine when that effect is actually there. The test of statistical significance does not give us that information. It reverses the problem entirely and says the following: what is the probability of seeing the data given that there’s no difference between the two medicines we’re examining?”
“And in fact there may well be a difference,” Harford interjected.
“There might well be a difference. And that points to a second major problem with the test of significance: it does not adequately deal with reasonable human beliefs about the odds of events, or with the consequences of following or not following the available and knowable odds.”
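Ziliak's distinction, wanting the probability of the hypothesis given the data rather than the probability of the data given no difference, is exactly what Bayes' rule supplies. A hypothetical sketch using the Smarties sample, where the two candidate hypotheses and the 50/50 prior are illustrative assumptions, not figures from the broadcast:

```python
from math import comb

def likelihood(n: int, k: int, p: float) -> float:
    """P(data | hypothesis): chance of k orange in n draws if the
    true orange fraction in the bowl is p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical setup: H0 says the colors are equally common (p = 0.5);
# H1 says the bowl is mostly orange (p = 0.8); give each a 50% prior.
prior_h0, prior_h1 = 0.5, 0.5
l0 = likelihood(10, 8, 0.5)
l1 = likelihood(10, 8, 0.8)

# Bayes' rule: P(H0 | data) = P(data | H0) * P(H0) / P(data)
post_h0 = prior_h0 * l0 / (prior_h0 * l0 + prior_h1 * l1)
# post_h0 is roughly 0.13: the probability of "no difference" given the
# data, which is not the tail probability the significance test reports.
```

The point of the sketch is that the two numbers answer different questions, and only the posterior answers the one Ziliak says a pill-taker actually cares about.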
Discussion then turned to Matrixx v. Siracusano, a case before the Supreme Court to decide whether the makers of the drug Zicam should be held responsible for not disclosing potential side effects because the frequency of occurrence was not statistically significant. McCloskey and Ziliak submitted an amicus brief stating that the practical importance of the side effect—in this case, a permanent loss of smell in some patients—should be the true measure, rather than whether that effect is statistically likely to occur. Justice Sonia Sotomayor thanked the University of Michigan Press authors for their "wonderful" work, and McCloskey and Ziliak's brief led the court to determine that medical, drug, and other industries reporting to the Securities and Exchange Commission must be held to a more relevant standard than statistical significance.
“A third major problem is that it’s possible to get your significance test, your p-value, if you will, below that 5% threshold simply by adding observations to your study,” Ziliak noted, meaning that an observer can manufacture or avoid statistical significance, depending on the desired result, simply by adjusting the sample size.
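Ziliak's third complaint is easy to demonstrate numerically. A sketch, assuming a standard one-sided z-test for a proportion (a textbook approximation, not anything specific to the broadcast): hold a practically trivial effect fixed at one percentage point and watch the p-value shrink as observations pile up.

```python
from math import sqrt, erfc

def one_sided_p(phat: float, p0: float, n: int) -> float:
    """One-sided z-test p-value for an observed proportion phat
    against a null proportion p0, with n observations."""
    z = (phat - p0) / sqrt(p0 * (1 - p0) / n)
    return 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal

# The same tiny effect throughout: 51% observed against a null of 50%.
for n in (100, 1_000, 10_000, 100_000):
    print(n, one_sided_p(0.51, 0.5, n))
# The magnitude never changes, yet the p-value falls below the 5%
# threshold once enough observations are added.
```

This is the gap between magnitude and the likelihood of magnitude: the effect stays one percentage point, and whether it is "significant" is decided entirely by the sample size.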
“What we want to know is, what’s the economic or clinical or medical importance, the practical importance, of the quantitative difference between two or however many items you’re comparing at once? There’s a big difference between magnitude and likelihood of magnitude, and strangely enough 90% of publishing scientists do not make that distinction.”