
At Phys.org: Study reveals flaws in popular genetic method


The most common analytical method within population genetics is deeply flawed, according to a new study from Lund University in Sweden. This may have led to incorrect results and misconceptions about ethnicity and genetic relationships. The method has been used in hundreds of thousands of studies, affecting results within medical genetics and even commercial ancestry tests. The study is published in Scientific Reports.

The rate at which scientific data can be collected is rising exponentially, leading to massive and highly complex datasets, dubbed the “Big Data revolution.” To make these data more manageable, researchers use statistical methods that aim to compact and simplify the data while still retaining most of the key information. Perhaps the most widely used method is called PCA (principal component analysis). By analogy, think of PCA as an oven with flour, sugar and eggs as the data input. The oven may always do the same thing, but the outcome, a cake, critically depends on the ingredients’ ratios and how they are combined.
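The "compacting" step can be made concrete with a small sketch (a generic PCA via singular value decomposition in Python/numpy; the dataset, its dimensions, and the noise level are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: 200 samples of 10 correlated features generated from 2 latent factors
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
data = latent @ mixing + 0.1 * rng.normal(size=(200, 10))

# PCA via singular value decomposition of the centered data
centered = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)

# Fraction of total variance each principal component retains
explained = S**2 / np.sum(S**2)

# Compact 10 features down to 2 principal-component scores per sample
scores = centered @ Vt[:2].T
print(f"first two components retain {explained[:2].sum():.1%} of the variance")
print(scores.shape)  # (200, 2)
```

Because the ten observed features here really are mixtures of two underlying factors, the first two components retain nearly all of the variance; the "cake" depends entirely on what data go into the oven.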

“It is expected that this method will give correct results because it is so frequently used. But it is neither a guarantee of reliability nor produces statistically robust conclusions,” says Dr. Eran Elhaik, Associate Professor in molecular cell biology at Lund University.

According to Elhaik, the method helped create old perceptions of race and ethnicity. It plays a role in manufacturing historical tales of who people are and where they come from, not only within the scientific community but also at commercial ancestry companies. A famous example is when a prominent American politician took an ancestry test before the 2020 presidential campaign to support their ancestral claims. Another is the misconception of Ashkenazic Jews as a race or an isolated group, a notion driven by PCA results.

“This study demonstrates that those results were unreliable,” says Eran Elhaik.

PCA is used across many scientific fields, but Elhaik’s study focuses on its use in population genetics, where the explosion in dataset sizes, driven by the falling cost of DNA sequencing, is particularly acute.

The field of paleogenomics, which seeks to learn about ancient peoples and individuals such as Copper Age Europeans, relies heavily on PCA. PCA is used to create a genetic map that positions an unknown sample alongside known reference samples. Thus far, unknown samples have been assumed to be related to whichever reference population they overlap or lie closest to on the map.

However, Elhaik discovered that an unknown sample can be made to lie close to virtually any reference population simply by changing the numbers and types of reference samples, generating practically endless historical versions, all mathematically “correct,” though at most one can be biologically correct.

In the study, Elhaik examined the twelve most common population-genetic applications of PCA. He used both simulated and real genetic data to show just how flexible PCA results can be. According to Elhaik, this flexibility means that conclusions based on PCA cannot be trusted, since any change to the reference or test samples will produce different results.
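This panel sensitivity can be reproduced in miniature. The sketch below is not Elhaik’s actual pipeline; the populations, coordinates, and sample sizes are invented. It builds PCA axes from a reference panel, projects the same fixed test sample onto the first principal component, and shows that the "nearest" reference population changes when only the panel’s composition changes:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_pop(center, n, sd=0.3, dim=5):
    """Simulate n genotype-like samples scattered around a population center."""
    c = np.zeros(dim)
    c[:len(center)] = center
    return c + sd * rng.normal(size=(n, dim))

def nearest_on_pc1(reference_pops, test):
    """Fit PCA on the pooled references, project everything onto PC1,
    and report which reference population the test sample lands closest to."""
    refs = np.vstack(list(reference_pops.values()))
    mean = refs.mean(axis=0)
    _, _, Vt = np.linalg.svd(refs - mean, full_matrices=False)
    pc1 = Vt[0]
    test_coord = (test - mean) @ pc1
    centroid = {name: ((pop - mean) @ pc1).mean()
                for name, pop in reference_pops.items()}
    return min(centroid, key=lambda name: abs(centroid[name] - test_coord))

test = np.array([4.0, 3.0, 0, 0, 0])  # one fixed "unknown" sample

# Panel 1: many A and B samples, few C -> PC1 captures the A-B axis
panel1 = {"A": make_pop([5, 0], 100), "B": make_pop([-5, 0], 100), "C": make_pop([0, 5], 20)}
# Panel 2: many C and D samples, few A -> PC1 captures the C-D axis
panel2 = {"C": make_pop([0, 5], 100), "D": make_pop([0, -5], 100), "A": make_pop([5, 0], 20)}

print(nearest_on_pc1(panel1, test))  # the same sample looks closest to "A"...
print(nearest_on_pc1(panel2, test))  # ...or to "C", depending on the panel
```

Nothing here is specific to genetics; the point is simply that a sample’s apparent affinity on a PCA map is a property of the reference panel as much as of the sample itself.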

Between 32,000 and 216,000 scientific articles in genetics alone have employed PCA for exploring and visualizing similarities and differences between individuals and populations and based their conclusions on these results.

“I believe these results must be re-evaluated,” says Elhaik.

“Techniques that offer such flexibility encourage bad science and are particularly dangerous in a world where there is intense pressure to publish. If a researcher runs PCA several times, the temptation will always be to select the output that makes the best story,” adds Prof. William Amos, from the University of Cambridge, who was not involved in the study.

Phys.org

How much does the “intense pressure to publish” research skew scientific articles away from objectivity towards attempts to show confirmation of acceptable, popular results?

8 Replies to “At Phys.org: Study reveals flaws in popular genetic method”

  1. bornagain77 says:

    Of semi-related note. The main mathematical model used by Darwinists in population genetics, i.e. Fisher’s theorem, is now known to have little, if any, correspondence to biological reality.

    Fisher’s fundamental theorem of natural selection is (considered by Darwinists as) one of the basic laws of population genetics.,,,

    Fisher’s theorems,,, 2005
    Excerpt: Fisher’s fundamental theorem of natural selection is one of the basic laws of population genetics. In 1930, Fisher showed that for single-locus genetic systems with pure selection and constant selection coefficients, the rate of variation of the average population fitness equals the genetic variance of the fitness (1). Because the variance is nonnegative, it follows that for systems with pure selection and constant rate coefficients, the average fitness always increases in time, a result that is compatible with general ideas of biological evolution. Fisher claimed that this law should hold the same position among biological sciences as the second law of thermodynamics in physical sciences (1).
    https://www.pnas.org/doi/10.1073/pnas.0504073102
    1. Fisher, R. A. (1930) The Genetical Theory of Natural Selection (Clarendon, Oxford); reprinted (1999) (Oxford Univ. Press, Oxford) pp. 22–47.

    In his theorem Fisher “assumed that new mutations arose with a nearly normal distribution – with an equal proportion of good and bad mutations (so mutations would have a net fitness effect of zero). (Yet) We now know that the vast majority of mutations in the functional genome are harmful, and that beneficial mutations are vanishingly rare.”,,, And when realistic rates of detrimental to beneficial mutations are taken into consideration, then it falsifies Fisher’s assumption within his mathematical model. i.e. It falsifies his assumption that fitness must always increase:

    Mathematician and Geneticist Team Up to Correct Fisher’s Theorem – Dec. 22, 2017
    Excerpt: A recent paper in the Journal of Mathematical Biology (https://link.springer.com/article/10.1007/s00285-017-1190-x) has uncovered major problems with the historically pivotal Fundamental Theorem of Natural Selection. That theorem was proven by Ronald Fisher – one of the great scientists of the last century. Fisher’s theorem was published in 1930, and was the foundational work that gave rise to neo-Darwinian theory and the field of population genetics.
    Fisher described his theorem as “fundamental,” because he believed he had discovered a mathematical proof for Darwinian evolution. He described his theorem as equivalent to a universal natural law – on the same level as the second law of thermodynamics. Fisher’s self-proclaimed new law of nature was that populations will always increase in fitness – without limit, as long as there is any genetic variation in the population. Therefore evolution is like gravity – a simple mathematical certainty. Over the years, a vast number of students of biology have been taught this mantra – Fisher’s Theorem proves that evolution is a mathematical certainty.
    The authors of the new paper describe the fundamental problems with Fisher’s theorem. They then use Fisher’s first principles, and reformulate and correct the theorem. They have named the corrected theorem The Fundamental Theorem of Natural Selection with Mutations. The correction of the theorem is not a trivial change – it literally flips the theorem on its head. The resulting conclusions are clearly in direct opposition to what Fisher had originally intended to prove.,,,
    The authors of the new paper realized that one of Fisher’s pivotal assumptions was clearly false, and in fact was falsified many decades ago. In his informal corollary, Fisher essentially assumed that new mutations arose with a nearly normal distribution – with an equal proportion of good and bad mutations (so mutations would have a net fitness effect of zero). We now know that the vast majority of mutations in the functional genome are harmful, and that beneficial mutations are vanishingly rare. The simple fact that Fisher’s premise was wrong, falsifies Fisher’s corollary. Without Fisher’s corollary – Fisher’s Theorem proves only that selection improves a population’s fitness until selection exhausts the initial genetic variation, at which point selective progress ceases. Apart from his corollary, Fisher’s Theorem only shows that within an initial population with variant genetic alleles, there is limited selective progress followed by terminal stasis.
    Since we now know that the vast majority of mutations are deleterious, therefore we can no longer assume that the mutations and natural selection will lead to increasing fitness. For example, if all mutations were deleterious, it should be obvious that fitness would always decline, and the rate of decline would be proportional to the severity and rate of the deleterious mutations.
    To correct Fisher’s Theorem, the authors of the new paper needed to reformulate Fisher’s mathematical model. The problems with Fisher’s theorem were that: 1) it was initially formulated in a way that did not allow for any type of dynamical analysis; 2) it did not account for new mutations; and 3) it consequently did not consider the net fitness effect of new mutations. The newly formulated version of Fisher’s theorem has now been mathematically proven. It is shown to yield identical results to the original formulation, when using the original formulation’s assumptions (no mutations). The new theorem incorporates two competing factors: a) the effect of natural selection, which consistently drives fitness upward; and b) the effect of new mutations, which consistently drives fitness downward. It is shown that the actual efficiency of natural selection and the actual rate and distribution of new mutations determines whether a population’s fitness will increase or decrease over time. Further analysis indicates that realistic rates and distributions of mutations make sustained fitness gain extremely problematic, while fitness decline becomes more probable. The authors observe that the more realistic the parameters, the more likely fitness decline becomes. The new paper seems to have turned Fisher’s Theorem upside down, and with it, the entire neo-Darwinian paradigm.
    Supplemental Information – Fisher’s informal corollary (really just a thought experiment), was convoluted. The essence of Fisher’s corollary was that the effect of both good and bad mutations should be more or less equal – so their net effect should be more-or less neutral. However, the actual evidence available to Fisher at that time already indicated that mutations were overwhelmingly deleterious. Fisher acknowledged that most observed mutations were clearly deleterious – but he imagined that this special class of highly deleterious mutations would easily be selected away, and so could be ignored. He reasoned that this might leave behind a different class of invisible mutations that all had a very low-impact on fitness – which would have a nearly equal chance of being either good or bad. This line of reasoning was entirely speculative and is contrary to what we now know. Ironically, such “nearly-neutral” mutations are now known to also be nearly-invisible to natural selection – precluding their role in any possible fitness increase. Moreover, mutations are overwhelmingly deleterious – even the low impact mutations. This means that the net effect of such “nearly-neutral” mutations, which are all invisible to selection, must be negative, and must contribute significantly to genetic decline. Furthermore, it is now known that the mutations that contribute most to genetic decline are the deleterious mutations that are intermediate in effect – not easily selected away, yet impactful enough to cause serious decline.
    https://crev.info/2017/12/geneticist-corrects-fishers-theorem/

    Defending the validity and significance of the new theorem “Fundamental Theorem of Natural Selection With Mutations,” Part I: Fisher’s Impact – Bill Basener and John Sanford – February 15, 2018
    Excerpt: While Fisher’s Theorem is mathematically correct, his Corollary is false. The simple logical fallacy is that Fisher stated that mutations could effectively be treated as not impacting fitness, while it is now known that the vast majority of mutations are deleterious, providing a downward pressure on fitness. Our model and our correction of Fisher’s theorem (The Fundamental Theorem of Natural Selection with Mutations), take into account the tension between the upward force of selection with the downward force of mutations.,,,
    Our paper shows that Fisher’s corollary is clearly false, and that he misunderstood the implications of his own theorem. He incorrectly believed that his theorem was a mathematical proof that showed that natural selection plus mutation will necessarily and always increase fitness. He also believed his theorem was on a par with a natural law (such as entropic dissipation and the second law of thermodynamics). Because Fisher did not understand the actual fitness distribution of new mutations, his belief in the application of his “fundamental theorem of natural selection” was fundamentally and profoundly wrong – having little correspondence to biological reality. Therefore, we have reformulated Fisher’s model and have corrected his errors, thereby have established a new theorem that better describes biological reality, and allows for the specification of those key variables that will determine whether fitness will increase or decrease.
    http://theskepticalzone.com/wp.....rs-impact/

    The fundamental theorem of natural selection with mutations – June 2018
    Excerpt: Because the premise underlying Fisher’s corollary is now recognized to be entirely wrong, Fisher’s corollary is falsified. Consequently, Fisher’s belief that he had developed a mathematical proof that fitness must always increase is also falsified.
    We build a differential equations model from Fisher’s first principles with mutations added, and prove a revised theorem showing the rate of change in mean fitness is equal to genetic variance plus a mutational effects term. We refer to our revised theorem as the fundamental theorem of natural selection with mutations. Our expanded theorem, and our associated analyses (analytic computation, numerical simulation, and visualization), provide a clearer understanding of the mutation–selection process, and allow application of biologically realistic parameters such as mutational effects. The expanded theorem has biological implications significantly different from what Fisher had envisioned.
    https://link.springer.com/article/10.1007/s00285-017-1190-x

    The Fundamental Theorem of Natural Selection with Mutations – Conference Presentation
    https://www.youtube.com/watch?v=ZA4LpDWZ2KA
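    The tension the corrected theorem describes can be sketched with a toy individual-based simulation (all rates and effect-size distributions below are illustrative assumptions, not parameters from the paper): selection pulls mean fitness up in proportion to the fitness variance, while a deleterious-biased stream of new mutations pushes it down.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1000            # population size (illustrative)
GENERATIONS = 200
MUT_RATE = 1.0      # expected new mutations per individual per generation

def mutation_effects(n):
    """Draw n mutation effects: rare, tiny beneficials; common, larger deleterious ones."""
    beneficial = rng.random(n) < 0.1
    return np.where(beneficial,
                    rng.exponential(0.001, n),     # small fitness gains
                    -rng.exponential(0.003, n))    # more frequent losses

fitness = np.ones(N)
history = [fitness.mean()]
for _ in range(GENERATIONS):
    # Selection: parents drawn proportional to fitness (the upward force,
    # whose per-generation strength tracks the genetic variance in fitness)
    w = np.clip(fitness, 1e-9, None)
    fitness = fitness[rng.choice(N, size=N, p=w / w.sum())]
    # Mutation: Poisson-distributed new mutations per offspring (the downward force)
    n_mut = rng.poisson(MUT_RATE, N)
    idx = np.repeat(np.arange(N), n_mut)
    np.add.at(fitness, idx, mutation_effects(len(idx)))
    history.append(fitness.mean())

print(f"mean fitness over {GENERATIONS} generations: "
      f"{history[0]:.3f} -> {history[-1]:.3f}")
```

    With a deleterious-biased effect distribution like this one, the mutational term outweighs the variance term and mean fitness declines; setting MUT_RATE to zero recovers the original theorem’s behavior of limited gain followed by stasis.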

    Of related note to Darwinists not having a mathematical model for their theory that corresponds to biological reality,

    Top Ten Questions and Objections to ‘Introduction to Evolutionary Informatics’ – Robert J. Marks II – June 12, 2017
    Excerpt: “There exists no model successfully describing undirected Darwinian evolution. Hard sciences are built on foundations of mathematics or definitive simulations. Examples include electromagnetics, Newtonian mechanics, geophysics, relativity, thermodynamics, quantum mechanics, optics, and many areas in biology. Those hoping to establish Darwinian evolution as a hard science with a model have either failed or inadvertently cheated. These models contain guidance mechanisms to land the airplane squarely on the target runway despite stochastic wind gusts. Not only can the guiding assistance be specifically identified in each proposed evolution model, its contribution to the success can be measured, in bits, as active information.,,,”,,, “there exists no model successfully describing undirected Darwinian evolution. According to our current understanding, there never will be.,,,”
    https://evolutionnews.org/2017/06/top-ten-questions-and-objections-to-introduction-to-evolutionary-informatics/
    Robert Jackson Marks II is an American electrical engineer. His contributions include the Zhao-Atlas-Marks (ZAM) time-frequency distribution in the field of signal processing,[1] the Cheung–Marks theorem[2] in Shannon sampling theory and the Papoulis-Marks-Cheung (PMC) approach in multidimensional sampling.[3] He was instrumental in the defining of the field of computational intelligence and co-edited the first book using computational intelligence in the title.[4][5]
    – per Wikipedia

  2. Lieutenant Commander Data says:

    Bornagain77
    In his theorem Fisher “assumed that new mutations arose with a nearly normal distribution – with an equal proportion of good and bad mutations (so mutations would have a net fitness effect of zero). (Yet) We now know that the vast majority of mutations in the functional genome are harmful, and that beneficial mutations are vanishingly rare.”,,, And when realistic rates of detrimental to beneficial mutations are taken into consideration, then it falsifies Fisher’s assumption within his mathematical model. i.e. It falsifies his assumption that fitness must always increase:

    Nobody has proven that mutations are random. The majority of mutations are just part of a calibration apparatus.
    Again, the word “mutation” (like “evolution”) is a Darwinian term that implies a transformation or metamorphosis and abducts the mind into thinking in the Darwinian paradigm.

    They say the majority of mutations are “neutral”? “Mutation” and “neutral” are two opposite words. This is nonsense. If they are neutral, why call them mutations, when they mutate nothing? Expressions like “wet dryness” and “neutral mutations” are nonsense.

    Bad mutations cannot be called mutations because they are just errors. Only in the Darwinian world can such nonsense as “bad errors” and “good errors” exist.

  3. bornagain77 says:

    As to Dr. Marks’ statement that “there exists no (mathematical) model successfully describing undirected Darwinian evolution. According to our current understanding, there never will be.,,,”

    Dr. Marks’ statement that “there exists no (mathematical) model successfully describing undirected Darwinian evolution. According to our current understanding, there never will be” finds fairly strong mathematical support via Gödel’s incompleteness theorem for mathematics.

    Specifically, Darwin’s theory is based upon reductive materialism. Yet Gödel’s incompleteness theorem for mathematics has now been extended into quantum physics itself, and in that extension it is now proven that “even a perfect and complete description of the microscopic properties of a material is not enough to predict its macroscopic behaviour,” results that “challenge the reductionists’ point of view, as the insurmountable difficulty lies precisely in the derivation of macroscopic properties from a microscopic description.”

    Quantum physics problem proved unsolvable: Gödel and Turing enter quantum physics – December 9, 2015
    Excerpt: A mathematical problem underlying fundamental questions in particle and quantum physics is provably unsolvable,,,
    It is the first major problem in physics for which such a fundamental limitation could be proven. The findings are important because they show that even a perfect and complete description of the microscopic properties of a material is not enough to predict its macroscopic behaviour.,,,
    “We knew about the possibility of problems that are undecidable in principle since the works of Turing and Gödel in the 1930s,” added Co-author Professor Michael Wolf from Technical University of Munich. “So far, however, this only concerned the very abstract corners of theoretical computer science and mathematical logic. No one had seriously contemplated this as a possibility right in the heart of theoretical physics before. But our results change this picture. From a more philosophical perspective, they also challenge the reductionists’ point of view, as the insurmountable difficulty lies precisely in the derivation of macroscopic properties from a microscopic description.”
    http://phys.org/news/2015-12-q.....godel.html

    Spectral gap (physics)
    Excerpt: In quantum mechanics, the spectral gap of a system is the energy difference between its ground state and its first excited state.[1][2] The mass gap is the spectral gap between the vacuum and the lightest particle. A Hamiltonian with a spectral gap is called a gapped Hamiltonian, and those that do not are called gapless.
    In solid-state physics, the most important spectral gap is for the many-body system of electrons in a solid material, in which case it is often known as an energy gap.
    In quantum many-body systems, ground states of gapped Hamiltonians have exponential decay of correlations.[3][4][5]
    In 2015 it was shown that the problem of determining the existence of a spectral gap is undecidable in two or more dimensions.[6][7] The authors used an aperiodic tiling of quantum Turing machines and showed that this hypothetical material becomes gapped if and only if the machine halts.[8] The one-dimensional case was also proved undecidable in 2020 by constructing a chain of interacting qudits divided into blocks that gain energy if they represent a full computation by a Turing machine, and showing that this system becomes gapped if and only if the machine does not halt.[9]
    https://en.wikipedia.org/wiki/Spectral_gap_(physics)

    Undecidability of the Spectral Gap – June 16, 2020
    Toby Cubitt, David Perez-Garcia, and Michael M. Wolf
    https://arxiv.org/pdf/1502.04573.pdf

  4. EDTA says:

    Ah, another fancy statistical technique seduces researchers! Makes one wonder just how many invalid or difficult-to-use-properly statistical techniques are out there, and how many scientific conclusions rely on them.

    (If you want more examples, see the references at the end of this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5410776/ You may remember this controversy from a few years back…)

  5. jerry says:

    Makes one wonder just how many invalid or difficult-to-use-properly statistical techniques are out there,

    You might be interested in

    Everything You Believe Is Wrong

    William M. Briggs
    STATISTICIAN TO THE STARS!

    https://www.alibris.com/Everything-You-Believe-Is-Wrong-William-M-Briggs/book/51005501?matches=4

  6. EDTA says:

    I think I’ve read some similar books, though not by that author; they seem to be popular these days. It is unfortunate in a way that we human beings can hold so many beliefs that are not correct, yet suffer little for them, at least not right after forming them. If the feedback cycle were shorter, we wouldn’t get away with believing so much garbage.

  7.

    This paper is troubling. PCA isn’t just a biology technique; it’s used in math, computer modeling, physics, economics, etc. If it were truly an unreliable technique, we would have known it by now. What the authors seem to be saying is that one can manipulate statistics by cherry-picking the data set. This has been known for centuries and is not new. Recently, psychology journals have implemented protocols to prevent cherry-picking, hoping that this stops the avalanche of retracted papers. All that this paper is really saying, then, is that biologists and genomicists must also implement protocols to keep their data sets pristine. Intrinsically there is nothing wrong with PCA if the data is not tampered with. The fear is that publishing pressures will cause pop-gen authors to manipulate their data and cover their tracks with a PCA treatment.

  8. martin_r says:

    Researchers find flaws in how scientists build trees of life

    “Our finding casts serious doubts over literally thousands of studies that use phylogenetic trees of extant data to reconstruct the diversification history of taxa, especially for those taxa where fossils are rare, or that found correlations between environmental factors such as changing global temperatures and species extinction rates,” Louca said, using a term for populations of one or more organisms that form a single unit.
    […]
    The results, Louca said, do not invalidate the theory of evolution itself. They do, however, put constraints on what type of information can be extracted from genetic data to reconstruct evolution’s path.

    https://around.uoregon.edu/content/researchers-find-flaws-how-scientists-build-trees-life
