Early Darwinian Ronald Fisher’s p-value measure is coming under serious scrutiny

Ronald Fisher/Bletchley

Further to “Everyone seems to know now that there’s a problem in science research today” and “At a British Journal of Medicine blog, a former editor says, medical research is still a scandal,” Ronald Fisher’s p-value measure, a staple of research, is coming under serious scrutiny.

Many will remember Ronald Fisher (1890–1962) as the early-twentieth-century Darwinian who reconciled Darwinism with Mendelian genetics, hailed by Richard Dawkins as the greatest biologist since Darwin. His original idea of p-values (a measure of how likely a result at least as extreme as the one observed would be if chance alone were at work) was reasonable enough, but over time the dead hand got hold of it:

Perhaps the worst fallacy is the kind of self-deception for which psychologist Uri Simonsohn of the University of Pennsylvania and his colleagues have popularized the term P-hacking; it is also known as data-dredging, snooping, fishing, significance-chasing and double-dipping. “P-hacking,” says Simonsohn, “is trying multiple things until you get the desired result” — even unconsciously. It may be the first statistical term to rate a definition in the online Urban Dictionary, where the usage examples are telling: “That finding seems to have been obtained through p-hacking, the authors dropped one of the conditions so that the overall p-value would be less than .05”, and “She is a p-hacker, she always monitors data while it is being collected.”

Such practices have the effect of turning discoveries from exploratory studies — which should be treated with scepticism — into what look like sound confirmations but vanish on replication. Simonsohn’s simulations have shown that changes in a few data-analysis decisions can increase the false-positive rate in a single study to 60%. P-hacking is especially likely, he says, in today’s environment of studies that chase small effects hidden in noisy data. It is tough to pin down how widespread the problem is, but Simonsohn has the sense that it is serious. In an analysis, he found evidence that many published psychology papers report P values that cluster suspiciously around 0.05, just as would be expected if researchers fished for significant P values until they found one.
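To see how “trying multiple things until you get the desired result” inflates false positives, here is a minimal sketch in Python. It is not Simonsohn’s simulation: it assumes a simplified setup in which there is no real effect, each study measures several independent outcomes, and the researcher reports only the smallest p-value. The function names, the sample sizes and the normal (z) approximation to the two-sample test are all illustrative assumptions.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z approximation


def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (z approximation)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    z = (mean_a - mean_b) / ((var_a / n_a + var_b / n_b) ** 0.5)
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=30, n_studies=1000,
                        alpha=0.05, seed=0):
    """Share of null studies declared 'significant' when the researcher
    tests n_outcomes independent outcomes and keeps the best p-value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p(a, b))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print("one pre-planned outcome:", false_positive_rate(1))
    print("best of ten outcomes:   ", false_positive_rate(10))
```

With a single pre-planned outcome the rate should sit near the nominal 5%. With ten independent outcomes, the chance that at least one clears the bar is about 1 − 0.95^10, or roughly 40%, even though nothing real is there to find.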

It all ended in scandals. In some cases, you might just as well have interviewed the researchers as to their private opinions about certain types of people as bother to read their studies. They might as well have been tweeting for popular media about cereal ads. Oops. No, wait, even media can get too big a dose of that kind of thing.

Some want to introduce Bayesianism (plausibility as a measure) or a combination of Bayesianism and p-values. But here’s the big hurdle:

Any reform would need to sweep through an entrenched culture. It would have to change how statistics is taught, how data analysis is done and how results are reported and interpreted. But at least researchers are admitting that they have a problem, says Goodman. “The wake-up call is that so many of our published findings are not true.” “We just don’t yet have all the fixes.” More.

Hey, for now, look on the bright side. It still matters to a lot of people that so many of the published findings are not true! So there is hope.

See also: If peer review is still working, why all the retractions?

Hat tip: Stephanie West Allen at Brains on Purpose

9 Replies to “Early Darwinian Ronald Fisher’s p-value measure is coming under serious scrutiny”

  1. 1
    Ian Thompson says:

    There are similar p-hacking problems in the very structure of using MRI images of the brain to search for ‘localization of cognitions’.
    The problem is that the exploratory and the testing experiments are not independent enough: exactly as discussed above!
    See Edward Vul, here and at his own voodoo correlations site.

  2. 2
    Barry Arrington says:

    And on the other end of the spectrum Nick Matzke is not willing to exclude “chance” as an explanation for 500 heads in a row. Ideological blinders cause people to do stupid and sometimes dishonest things.

  3. 3
    selvaRajan says:

    P-values have long been known to be unreliable. If we calculate the p-values of a data set many times, it can be seen that p-values vary widely from non-significant to highly significant. In statistics, conclusions have to be drawn by examining the values obtained from a tool set, not just one tool’s result.

  4. 4
    cantor says:

    If we calculate the p-values of a data set many times, it can be seen that p-values vary widely from non-significant to highly significant.

    That sentence as written makes no sense. If you’re using the same dataset and the same computation, you’ll always get the same value. So what did you mean?
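The distinction in this exchange can be sketched in Python. Running the same test on the same data always returns the same p-value. What does vary is the p-value across repeated samples from the same underlying process, which may be what the earlier comment meant. The effect size (0.5 standard deviations), the group size (30), the seed and the function names below are illustrative assumptions, and the test is a normal (z) approximation.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z approximation


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (z approximation)."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n_b - 1)
    z = (mean_a - mean_b) / ((var_a / n_a + var_b / n_b) ** 0.5)
    return 2 * (1 - _STD_NORMAL.cdf(abs(z)))


def draw_study(rng, effect=0.5, n=30):
    """One hypothetical study: a treated group with a modest true effect
    and a control group with none."""
    treated = [rng.gauss(effect, 1) for _ in range(n)]
    control = [rng.gauss(0, 1) for _ in range(n)]
    return treated, control


rng = random.Random(1)

# Same data, same computation: the p-value is identical every time.
treated, control = draw_study(rng)
same_data_twice = (z_test_p(treated, control), z_test_p(treated, control))

# Fresh data from the same process: the p-value moves from study to study.
replication_p_values = [z_test_p(*draw_study(rng)) for _ in range(200)]
share_significant = sum(p < 0.05 for p in replication_p_values) / 200

print("same data twice:", same_data_twice)
print("smallest and largest p across replications:",
      min(replication_p_values), max(replication_p_values))
print("share of replications with p < 0.05:", share_significant)
```

Under these assumptions the test has roughly even odds of reaching p < 0.05, so identical experiments land on both sides of the threshold purely through sampling variation.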

  5. 5
    Mark Frank says:

    I couldn’t agree more with the article but I am kind of surprised it got into Nature. It is hardly news. The problems of Fisherian and N-P hypothesis testing have been acknowledged for decades.

    Ironically William Dembski is a great supporter of Fisherian hypothesis testing, considers his approach to design detection to be an extension of it and has attached Bayesian approaches.

  6. 6
    Mark Frank says:

    that should read “attacked Bayesian approaches”

  7. 7
    bornagain77 says:

    Well lo and behold, Theobald’s attempt to make common descent ‘mathematically scientific’ with ‘statistical significance’ gets flushed down the toilet.

    Scientific method: Statistical errors – P values, the ‘gold standard’ of statistical validity, are not as reliable as many scientists assume. – Regina Nuzzo – 12 February 2014
    Excerpt: “P values are not doing their job, because they can’t,” says Stephen Ziliak, an economist at Roosevelt University in Chicago, Illinois, and a frequent critic of the way statistics are used.,,,
    “Change your statistical philosophy and all of a sudden different things become important,” says Steven Goodman, a physician and statistician at Stanford. “Then ‘laws’ handed down from God are no longer handed down from God. They’re actually handed down to us by ourselves, through the methodology we adopt.”,,
    One researcher suggested rechristening the methodology “statistical hypothesis inference testing”, presumably for the acronym it would yield.,,
    The irony is that when UK statistician Ronald Fisher introduced the P value in the 1920s, he did not mean it to be a definitive test. He intended it simply as an informal way to judge whether evidence was significant in the old-fashioned sense: worthy of a second look. The idea was to run an experiment, then see if the results were consistent with what random chance might produce.,,,
    Neyman called some of Fisher’s work mathematically “worse than useless”
    “The P value was never meant to be used the way it’s used today,” says Goodman.
    The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.,,,
    “It is almost impossible to drag authors away from their p-values, and the more zeroes after the decimal point, the harder people cling to them”,,

    Theobald’s claim for ‘statistical significance’ is here:

    29+ Evidences for Macroevolution – Douglas Theobald, Ph.D.
    Part 1: – The Unique Universal Phylogenetic Tree
    Excerpt: Seventy-five independent studies from different researchers, on different organisms and genes, with high values of CI (P < 0.01) is an incredible confirmation with an astronomical degree of combined statistical significance (P << 10^-300, Bailey and Gribskov 1998; Fisher 1990).
    per Talk Origins

    Recent refutations of Theobald's claim:

    Molecular Data Wreak Havoc on the Tree of Life – Casey Luskin – February 7, 2014
    Excerpt: Douglas Theobald claims in his "29+ Evidences for Macroevolution" that "well-determined phylogenetic trees inferred from the independent evidence of morphology and molecular sequences match with an extremely high degree of statistical significance."
    In reality, however, the technical literature tells a different story. Studies of molecular homologies often fail to confirm evolutionary trees depicting the history of the animal phyla derived from studies of comparative anatomy. Instead, during the 1990s, early into the revolution in molecular genetics, many studies began to show that phylogenetic trees derived from anatomy and those derived from molecules often contradicted each other.
    Stephen Meyer – Darwin's Doubt – (pp. 122-123)
    ,,,Moreover, when complex parts that are shared by different animals aren't distributed in a treelike pattern, that wreaks havoc on the assumption of homology that's used to build phylogenetic trees. In other words, this kind of extreme convergent evolution refutes the standard assumption that shared biological similarity (especially complex biological similarity like a brain and nervous system) implies inheritance from a common ancestor.
    If brains and nervous systems evolved multiple times, this undermines the main assumptions used in constructing phylogenetic trees, calling into question the very basis for inferring common ancestry.,,,

    Does Natural Selection Leave "Detectable Statistical Evidence in the Genome"? More Problems with Matzke's Critique of Darwin's Doubt – Casey Luskin August 7, 2013
    Excerpt: A critical review of these statistical methods has shown that their theoretical foundation is not well established and they often give false-positive and false-negative results.

    Previous refutations of Theobald's claim:

    Douglas Theobald's Test Of Common Ancestry Ignores Common Design (Koonin) – November 2010

    But Isn't There a Consilience of Data That Corroborates Common Descent? – Casey Luskin – December 2010
    Excerpt: Dr. Theobald might have had a point, were it not for the fact that:
    (1) Phylogeny and biogeography don't always agree.
    (2) Phylogeny and paleontology don't always agree.
    (3) Transitional fossils are often missing (or the "predicted" transitional fossils fall apart on closer inspection).
    (4) Hierarchical classifications often fail.
    (5) "Homologous" structures often have different developmental pathways or different structures often have "homologous" developmental pathways.
    (6) Morphological and molecular phylogenies are often incongruent.

    Oh well, is this really any surprise?

    “nobody to date has yet found a demarcation criterion according to which Darwin can be described as scientific”
    – Imre Lakatos (November 9, 1922 – February 2, 1974) a philosopher of mathematics and science, quote as stated in 1973 LSE Scientific Method Lecture

    Oxford University Seeks Mathemagician — May 5th, 2011 by Douglas Axe
    Excerpt: Grand theories in physics are usually expressed in mathematics. Newton’s mechanics and Einstein’s theory of special relativity are essentially equations. Words are needed only to interpret the terms. Darwin’s theory of evolution by natural selection has obstinately remained in words since 1859. …

    “On the other hand, I disagree that Darwin’s theory is as ‘solid as any explanation in science.’ Disagree? I regard the claim as preposterous. Quantum electrodynamics is accurate to thirteen or so decimal places; so, too, general relativity. A leaf trembling in the wrong way would suffice to shatter either theory. What can Darwinian theory offer in comparison?”
    (Berlinski, D., “A Scientific Scandal?: David Berlinski & Critics,” Commentary, July 8, 2003)

    Macroevolution, microevolution and chemistry: the devil is in the details – Dr. V. J. Torley – February 27, 2013
    Excerpt: After all, mathematics, scientific laws and observed processes are supposed to form the basis of all scientific explanation. If none of these provides support for Darwinian macroevolution, then why on earth should we accept it? Indeed, why does macroevolution belong in the province of science at all, if its scientific basis cannot be demonstrated?

    Whereas nobody can seem to come up with a rigid demarcation criterion for Darwinism so as to potentially falsify it and thus make it 'scientific', Intelligent Design (ID) does not suffer from such a lack of mathematical rigor or falsification threshold:

    Evolutionary Informatics Lab – Main Publications

    ,, the empirical falsification criterion of ID is much easier to understand than the math is, and is as such:

    "Orr maintains that the theory of intelligent design is not falsifiable. He’s wrong. To falsify design theory a scientist need only experimentally demonstrate that a bacterial flagellum, or any other comparably complex system, could arise by natural selection. If that happened I would conclude that neither flagella nor any system of similar or lesser complexity had to have been designed. In short, biochemical design would be neatly disproved."
    – Dr Behe in 1997

    Michael Behe on Falsifying Intelligent Design – video

  8. 8
    bornagain77 says:

    Mark Frank, so statistics turns out to be “seeing a *weasel in the cloud”? Oh well, since both ID and Darwinism use statistics to different degrees, I guess we had better do science the ‘old fashioned’ way. i.e. By empirical evidence! So does Darwinian evolution have any empirical support that it can generate molecular machines?

    “There are no detailed Darwinian accounts for the evolution of any fundamental biochemical or cellular system only a variety of wishful speculations. It is remarkable that Darwinism is accepted as a satisfactory explanation of such a vast subject.”
    James Shapiro – Molecular Biologist

    Michael Behe – No Scientific Literature For Evolution of Any Irreducibly Complex Molecular Machines

    “Grand Darwinian claims rest on undisciplined imagination”
    Dr. Michael Behe

    HMMM, wishful speculations and imagination instead of empirical support for Darwinian claims. Not good, not good at all! Oh well, so much for Darwinian claims as to being scientific. How about ID, does ID have any empirical evidence that Intelligence can generate molecular machines?:

    (Man-Made) DNA nanorobot – video

    Making Structures with DNA “Building Blocks” – Wyss institute – video

    Also of note, Dr. James Tour, who, in my honest opinion, currently builds the most sophisticated man-made molecular machines in the world,,,

    Science & Faith — Dr. James Tour – video (At the two minute mark of the following video, you can see a nano-car that was built by Dr. James Tour’s team)

    ,,will buy lunch for anyone who can explain to him exactly how Darwinian evolution works:

    “I build molecules for a living, I can’t begin to tell you how difficult that job is. I stand in awe of God because of what he has done through his creation. Only a rookie who knows nothing about science would say science takes away from faith. If you really study science, it will bring you closer to God.”
    James Tour – one of the leading nano-tech engineers in the world – Strobel, Lee (2000), The Case For Faith, p. 111

    Top Ten Most Cited Chemist in the World Knows That Evolution Doesn’t Work – James Tour, Phd. – video

    Of note: *’seeing a weasel in the cloud’ refers to a part of Shakespeare’s Hamlet play:

    The “Methinks it is like a weasel” phrase doesn’t make any sense at all unless the entire play of Hamlet is taken into consideration so as to give the “Weasel” phrase a proper context. Moreover the context in which the phrase finds its meaning is derived from several different levels of the play. i.e. The ENTIRE play, and even the Elizabethan culture, provides meaning for the individual “Weasel” phrase.

    A Meaningful World: How the Arts and Sciences Reveal the Genius of Nature – Book Review
    Excerpt: They focus instead on what “Methinks it is like a weasel” really means. In isolation, in fact, it means almost nothing. Who said it? Why? What does the “it” refer to? What does it reveal about the characters? How does it advance the plot? In the context of the entire play, and of Elizabethan culture, this brief line takes on significance of surprising depth. The whole is required to give meaning to the part.

    In fact it is interesting to note what the overall context is for “Methinks it is like a weasel” that is used in the Hamlet play. The context in which the phrase is used is to illustrate the spineless nature of one of the characters of the play. To illustrate how easily the spineless character can be led to say anything that Hamlet wants him to say:

    Ham. Do you see yonder cloud that ’s almost in shape of a camel?
    Pol. By the mass, and ’t is like a camel, indeed.
    Ham. Methinks it is like a weasel.
    Pol. It is backed like a weasel.
    Ham. Or like a whale?
    Pol. Very like a whale.

    After realizing what the context of ‘Methinks it is like a weasel’ actually was, I remember thinking to myself that it was perhaps the worst possible phrase that Dawkins could have chosen to try to illustrate his point, since the phrase, when taken into context, actually illustrates that the person saying it (Hamlet) was purposely manipulating the other character into saying that the cloud looked like a weasel. Which, I am sure, is hardly the idea, i.e. deception and manipulation, that Dawkins was trying to convey for Darwinism with his ‘Weasel’ example.

    Supplemental notes:

    LIFE’S CONSERVATION LAW – William Dembski – Robert Marks – Pg. 13
    Excerpt: Simulations such as Dawkins’s WEASEL, Adami’s AVIDA, Ray’s Tierra, and Schneider’s ev appear to support Darwinian evolution, but only for lack of clear accounting practices that track the information smuggled into them.,,, Information does not magically materialize. It can be created by intelligence or it can be shunted around by natural forces. But natural forces, and Darwinian processes in particular, do not create information. Active information enables us to see why this is the case.

    Before They’ve Even Seen Stephen Meyer’s New Book, Darwinists Waste No Time in Criticizing Darwin’s Doubt – William A. Dembski – April 4, 2013
    Excerpt: In the newer approach to conservation of information, the focus is not on drawing design inferences but on understanding search in general and how information facilitates successful search. The focus is therefore not so much on individual probabilities as on probability distributions and how they change as searches incorporate information. My universal probability bound of 1 in 10^150 (a perennial sticking point for Shallit and Felsenstein) therefore becomes irrelevant in the new form of conservation of information whereas in the earlier it was essential because there a certain probability threshold had to be attained before conservation of information could be said to apply. The new form is more powerful and conceptually elegant. Rather than lead to a design inference, it shows that accounting for the information required for successful search leads to a regress that only intensifies as one backtracks. It therefore suggests an ultimate source of information, which it can reasonably be argued is a designer. I explain all this in a nontechnical way in an article I posted at ENV a few months back titled “Conservation of Information Made Simple” (go here). ,,,

    ,,, Here are the two seminal papers on conservation of information that I’ve written with Robert Marks:
    “The Search for a Search: Measuring the Information Cost of Higher-Level Search,” Journal of Advanced Computational Intelligence and Intelligent Informatics 14(5) (2010): 475-486
    “Conservation of Information in Search: Measuring the Cost of Success,” IEEE Transactions on Systems, Man and Cybernetics A, Systems & Humans, 5(5) (September 2009): 1051-1061
    For other papers that Marks, his students, and I have done to extend the results in these papers, visit the publications page at

  9. 9
    Joe says:

    If unguided, ie blind watchmaker, evolution could produce some models we could test we wouldn’t need probabilities. And seeing they cannot produce any models for unguided evolution they shouldn’t get all pithy wrt probabilities.

    Hi Lizzie 😛
