Bruce Bower argues that social science researchers, wanting results that seemed as impressive as those of the hard sciences, settled on the p-value (p < .05) as their test of statistical significance in the mid-20th century. He doesn’t think it was a good idea.
Psychologists in particular wanted a statistical skeleton key to unlock true experimental insights. It was an unrealistic burden to place on statistics, but the longing for a mathematical seal of approval burned hot. So psychology textbook writers and publishers created one, and called it statistical significance.
By calculating just one number from their experimental results, called a P value, researchers could now deem those results “statistically significant.” That was all it took to claim — even if mistakenly — that an interesting and powerful effect had been demonstrated. The idea took off, and soon legions of researchers were reporting statistically significant results.
To make matters worse, psychology journals began to publish papers only if they reported statistically significant findings, prompting a surprisingly large number of investigators to massage their data — either by gaming the system or cheating — to get below the P value of 0.05 that granted that status. Inevitably, bogus findings and chance associations began to proliferate.
Bruce Bower, “How the strange idea of ‘statistical significance’ was born” at ScienceNews (August 12, 2021)
We know. It hasn’t helped the profession’s reputation. Some want to just dump “the null ritual”:
It’s well past time to dump the null ritual, says psychologist and applied statistician Richard Morey of Cardiff University, Wales. Researchers need to focus on developing theories of mind and behavior that lead to testable predictions. In that brave new scientific world, investigators will choose which of many statistical tools best suits their needs. “Statistics offer ways to figure out how to doubt what you’re seeing,” Morey says.
Bruce Bower, “How the strange idea of ‘statistical significance’ was born” at ScienceNews (August 12, 2021)
Bower provides an interesting account of an attempt to use p-values to assess whether people lost their religious beliefs while contemplating Rodin’s “Thinker” statue. We can all think of more useful enterprises for the social sciences than that.
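
For readers who want to see why Bower says chance associations “began to proliferate,” here is a rough sketch in Python. It is not taken from Bower’s article. Every name in it (`null_p_value`, `false_positive_rate`, the group size of 20, the ten outcomes per study) is a hypothetical choice made for illustration. It simulates studies in which there is no real effect at all, then counts how often p still lands below 0.05, first with one outcome per study and then when a researcher measures ten outcomes and reports whichever one “worked.”

```python
import random
from statistics import NormalDist, fmean

def null_p_value(rng, n=20):
    """Two-sided z-test p-value for two groups drawn from the same N(0, 1).

    Both groups come from the same distribution, so any "effect" is pure chance.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: the variance of each group is known to be 1.
    z = (fmean(a) - fmean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(rng, studies=2000, outcomes=1, alpha=0.05):
    """Share of no-effect studies where at least one outcome has p < alpha."""
    hits = sum(
        any(null_p_value(rng) < alpha for _ in range(outcomes))
        for _ in range(studies)
    )
    return hits / studies

rng = random.Random(0)  # fixed seed so the run is repeatable
single = false_positive_rate(rng, outcomes=1)
best_of_ten = false_positive_rate(rng, studies=1000, outcomes=10)
print(f"one outcome per study: {single:.3f}")
print(f"best of ten outcomes:  {best_of_ten:.3f}")
```

Under these assumptions, about 1 in 20 no-effect studies should come out “significant” with a single outcome, and about 4 in 10 should when the best of ten independent outcomes is reported (1 − 0.95^10 ≈ 0.40). Add a journal that prints only the significant ones, and the published record fills up with findings that were never there.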