First, only one-third of published psychology research is reliable. How do we respond?
We sort of knew that, but here it is from The Conversation:
What does it mean if the majority of what’s published in journals can’t be reproduced?
Publishing together as the Open Science Collaboration and coordinated by social psychologist Brian Nosek from the Center for Open Science, research teams from around the world each ran a replication of a study published in three top psychology journals – Psychological Science; Journal of Personality and Social Psychology; and Journal of Experimental Psychology: Learning, Memory, and Cognition. To ensure the replication was as exact as possible, research teams obtained study materials from the original authors, and worked closely with these authors whenever they could.
Almost all of the original published studies (97%) had statistically significant results. This is as you’d expect – while many experiments fail to uncover meaningful results, scientists tend only to publish the ones that do.
What we found is that when these 100 studies were run by other researchers, however, only 36% reached statistical significance. This number is alarmingly low. Put another way, only around one-third of the rerun studies came out with the same results that were found the first time around. That rate is especially low when you consider that, once published, findings tend to be held as gospel.
The bad news doesn’t end there. Even when the new study found evidence for the existence of the original finding, the magnitude of the effect was much smaller — half the size of the original, on average.
The authors appropriately caution that replication efforts can yield wrong results too:
Some of these failures could be due to luck, or poor execution, or an incomplete understanding of the circumstances needed to show the effect (scientists call these “moderators” or “boundary conditions”). More.
From the Guardian:
Even when scientists could replicate original findings, the sizes of the effects they found were on average half as big as reported first time around. That could be due to scientists leaving out data that undermined their hypotheses, and by journals accepting only the strongest claims for publication.
and the New York Times:
The past several years have been bruising ones for the credibility of the social sciences. A star social psychologist was caught fabricating data, leading to more than 50 retracted papers. A top journal published a study supporting the existence of ESP that was widely criticized. The journal Science pulled a political science paper on the effect of gay canvassers on voters’ behavior because of concerns about faked data.
…
Dr. John Ioannidis, a director of Stanford University’s Meta-Research Innovation Center, who once estimated that about half of published results across medicine were inflated or wrong, noted the proportion in psychology was even larger than he had thought. He said the problem could be even worse in other fields, including cell biology, economics, neuroscience, clinical medicine, and animal research.
The report appears at a time when the number of retractions of published papers is rising sharply in a wide variety of disciplines. Scientists have pointed to a hypercompetitive culture across science that favors novel, sexy results and provides little incentive for researchers to replicate the findings of others, or for journals to publish studies that fail to find a splashy result. More.
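For readers who want to see how "journals accepting only the strongest claims" could, by itself, produce both a low replication rate and shrunken effects, here is a toy simulation. It is only a sketch under loudly stated assumptions, not a model of the actual Open Science Collaboration data: every study is assumed to chase the same small true effect (0.2 standard deviations), with 50 participants and a simple z-test. The function name and all the numbers are made up for illustration.

```python
import random
import statistics

def simulate_publication_filter(true_effect=0.2, n=50, studies=20000, seed=1):
    """Toy model: publish only significant results, then rerun each published study once.

    All parameters are illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)
    se = 1 / n ** 0.5      # standard error of a sample mean when the sd is 1
    cutoff = 1.96 * se     # smallest observed effect that reaches p < .05 (two-sided z)
    published, replications = [], []
    for _ in range(studies):
        # Observed effect in the original study (sample mean drawn directly).
        original = rng.gauss(true_effect, se)
        # The "journal" accepts only significant results in the predicted direction.
        if original > cutoff:
            published.append(original)
            # Independent rerun with the same design and the same true effect.
            replications.append(rng.gauss(true_effect, se))
    return {
        "published_effect": statistics.mean(published),
        "replication_effect": statistics.mean(replications),
        "replication_rate": sum(r > cutoff for r in replications) / len(replications),
    }

result = simulate_publication_filter()
print(result)
```

Under these assumptions, the average published effect comes out well above the true effect, the reruns average close to the true effect, and only a minority of reruns reach significance. No fraud is needed in this toy world, just a filter on what gets printed. Whether that is the whole story in real psychology is a separate question the simulation cannot answer.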
But first, don’t we need to decide whether truth matters? Many today think it doesn’t exist in any meaningful way.
But then there is the science journalist at Quartz, suffering from chronic abuse syndrome, who responds with thoughts like:
This is how science can finally start to fix itself
And yet, as science writer Ed Yong explains in The Atlantic, “failed replications don’t discredit the original studies, any more than successful ones enshrine them as truth.”
No, but the general pattern discredits the discipline as a whole. A cook who finds one rotten egg tends to—for good reasons—throw out the whole carton.
Scientists need to balance their work on research that pushes the boundaries of science with less eye-catching studies that simply strengthen convictions on what we already know. That is why Jason Mitchell of Harvard University says, “we can’t interpret whether 36% [success at replication] is good, bad, or right on the money.”
If it’s “right on the money,” let’s quit funding the discipline. We may as well fund water witching.
You might assume that every scientific study is replicable, but it’s not that simple. Universal truths in psychology are much harder to establish than in mathematics. As you go up the complexity pyramid—mathematics to physics to chemistry to biology to psychology—the number of subjective choices a scientist must make increases quickly. Thus, a minor tweak can produce a significantly different outcome.
So maybe psychology can’t be a discipline in science, any more than street intervention with substance abusers can.
Please, readers, that is not a detraction! Plenty of useful activities cannot be sciences simply because they are too difficult to quantify in any meaningful way.
But get this:
The upshot from Nosek’s grand experiment is not that psychology studies are unreliable, but that we are starting to learn how to make scientific research more rigorous.
Now that is what we mean by science journalists’ chronic abuse syndrome.
Having grasped that the subject is never going to reform and quit beating up writers (and their lay readers) with sexy but false data, the science journalist acts like a chronically battered wife:
“The upshot from Nosek’s grand experiment is not that psychology studies are unreliable” = My Fang really is a great guy, people just don’t understand him, and it’s all THEIR fault. Oh, and “we are starting to learn how to make scientific research more rigorous.” = Fang’s latest community service order will turn him into the sweet and lovable person I just know in my heart that he is!
Friends who work in crisis counselling tell me that the disorder is only minimally treatable, because, in an intelligent and sane adult, it is a form of wilful blindness.
And just think, those lucky multiverse cosmologists and luckier Darwinians, they have it so good! They need never worry about anything as tiresome as replication!
See also: Is it better not to know the truth? (Unfortunately, this sort of thing, advanced in highbrow mags, usually “evolves” toward a state where big government and other power institutions lie to us “for our own good.”)