From a recent article in Molecular Phylogenetics and Evolution:
In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.
A delusion? Wow. Well, here’s the abstract from ScienceDirect; you decide:
Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.’s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach 12 bp or less for Song et al.’s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. 
In addition, Song et al.’s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews) + Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla + Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.’s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ≤15% of the conflicts among Song et al.’s (2012) 447 gene trees. Unfortunately, Song et al.’s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation.
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles’ heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations. – Mark S. Springer, John Gatesy, Department of Biology, University of California, Riverside, CA 92521, USA, Molecular Phylogenetics and Evolution Volume 94, Part A, January 2016, Pages 1–33
It’s a review of “Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model.” Here’s the abstract:
The reconstruction of the Tree of Life has relied almost entirely on concatenation methods, which do not accommodate gene tree heterogeneity, a property that simulations and theory have identified as a likely cause of incongruent phylogenies. However, this incongruence has not yet been demonstrated in empirical studies. Several key relationships among eutherian mammals remain controversial and conflicting among previous studies, including the root of eutherian tree and the relationships within Euarchontoglires and Laurasiatheria. Both Bayesian and maximum-likelihood analysis of genome-wide data of 447 nuclear genes from 37 species show that concatenation methods indeed yield strong incongruence in the phylogeny of eutherian mammals, as revealed by subsampling analyses of loci and taxa, which produced strongly conflicting topologies. In contrast, the coalescent methods, which accommodate gene tree heterogeneity, yield a phylogeny that is robust to variable gene and taxon sampling and is congruent with geographic data. The data also demonstrate that incomplete lineage sorting, a major source of gene tree heterogeneity, is relevant to deep-level phylogenies, such as those among eutherian mammals. Our results firmly place the eutherian root between Atlantogenata and Boreoeutheria and support ungulate polyphyly and a sister-group relationship between Scandentia and Primates. This study demonstrates that the incongruence introduced by concatenation methods is a major cause of long-standing uncertainty in the phylogeny of eutherian mammals, and the same may apply to other clades. Our analyses suggest that such incongruence can be resolved using phylogenomic data and coalescent methods that deal explicitly with gene tree heterogeneity. – (public access) Song, S., L. Liu, S. V. Edwards, and S. Wu. 2012.
“Resolving Conflict in Eutherian Mammal Phylogeny Using Phylogenomics and the Multispecies Coalescent Model.” Proceedings of the National Academy of Sciences 109 (37) (August 28): 14942–14947. doi:10.1073/pnas.1211733109.
See also: Tree of Life: Art exhibition or science pursuit?