Computer programmer Glenn Williamson now claims that ICR geneticist Jeff Tomkins made an elementary error when using the nucmer program to show that human and chimp DNA are only 88% similar. Williamson also asserts that 60 de novo protein-coding genes said to be unique to human beings have very similar counterparts in apes, contrary to claims made last year by Dr. Cornelius Hunter, an adjunct professor of biophysics at Biola University.
What Dr. Tomkins allegedly got wrong
As readers of my recent post, Human and chimp DNA: They really are about 98% similar, will recall, Glenn Williamson demolished Dr. Tomkins’s original claim, made back in 2013, that human and chimp DNA are only about 70% similar. Williamson’s detailed takedown of Dr. Tomkins’s 70% similarity figure can be accessed here. In short: the version of the BLAST computer algorithm used by Tomkins contained a bug which invalidated his results. Dr. Tomkins responded by performing a new study which came up with a similarity figure of 88% – still far below the 98% similarity figure commonly claimed in textbooks for human and chimp DNA. Tomkins arrived at that figure by using a version of the BLAST algorithm which did not contain the bug, and in my last post, I discussed the BLAST-related errors that Glenn Williamson identified in Dr. Tomkins’s new paper.
But to give credit where credit is due, Dr. Tomkins didn’t rely on just one computer program to come up with his 88% figure; he relied on three. In addition to BLAST, Dr. Tomkins made use of two other programs: nucmer and LASTZ. Creation scientist Jay Wile summarized the results from these programs in a recent post discussing Dr. Tomkins’s work:
The nucmer program’s results agreed with the unbugged BLAST results: on average the human and chimpanzee genomes are 88% similar. The LASTZ program produced a lower average similarity (73%), which indicates that perhaps LASTZ has a bug or is not optimized for such comparisons, since its results are very close to the results Dr. Tomkins got with the bugged version of BLAST.
In today’s post, I’ll discuss the flaws identified by Glenn Williamson in Dr. Tomkins’s comparisons that were made using the nucmer program.
Basic methodological errors?
As we saw in yesterday’s post on Uncommon Descent, Glenn Williamson claims that Dr. Tomkins’s new study makes some fundamental errors in the way it performs the BLASTN analysis. Now, however, Williamson has gone further, and identified some very basic errors in the way Dr. Tomkins obtained his results from the nucmer program. What Williamson has shown is that even when human chromosome 20 is compared with itself, the calculation method used by Dr. Tomkins when running the “nucmer” program would imply (absurdly) that it is less than 90% similar to itself!
I have been in email correspondence with Glenn Williamson over the past 24 hours, and he kindly allowed me to publish his responses, as well as some emails he sent to Dr. Tomkins. Here’s an excerpt from his first email to me.
Hi Vincent,
I’ve only just seen your post on UD, and I thought I’d fill you in on where we are at with one of the other comparisons (“nucmer”) Jeff did in his recent paper. Basically what he is doing in this comparison is taking every single alignment for each query sequence (as opposed to taking just the best alignment) and taking the average of all those. Obviously all the repeat motifs will find many matches across each chromosome, but only one of those will be (putatively) homologous. If you can follow the email thread from the bottom, hopefully you can understand the issue.
I’m currently running a nucmer job with human chromosome 20 being compared to itself, just to show the absurdity of his calculation method. I should have the results by tomorrow.
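For readers who want to see why averaging over every alignment drags the figure down, here is a minimal Python sketch with made-up numbers. The records, field names and helper functions are hypothetical illustrations, not Williamson’s or Tomkins’s code or data. One query segment is unique and has a single hit; the other contains a repeat, so besides its true homologous hit it also picks up five weaker hits to other copies of the repeat. The sketch compares a length-weighted average over all hits with one that keeps only the best hit per query segment.

```python
from collections import defaultdict
from typing import List, NamedTuple


class Alignment(NamedTuple):
    query_id: str    # which query segment produced this hit
    length: int      # aligned length in bp
    identity: float  # percent identity of this hit


def weighted_identity(alns: List[Alignment]) -> float:
    """Length-weighted mean percent identity over the given alignments."""
    total = sum(a.length for a in alns)
    return sum(a.length * a.identity for a in alns) / total


def best_per_query(alns: List[Alignment]) -> List[Alignment]:
    """Keep one hit per query segment: the longest, ties broken by identity."""
    groups = defaultdict(list)
    for a in alns:
        groups[a.query_id].append(a)
    return [max(g, key=lambda a: (a.length, a.identity)) for g in groups.values()]


# Hypothetical toy data: a unique segment with one 99% hit, and a repeat-rich
# segment whose homologous 99% hit comes with five 75% hits to other repeat copies.
toy = [Alignment("unique", 6000, 99.0), Alignment("repeat", 1000, 99.0)]
toy += [Alignment("repeat", 1000, 75.0) for _ in range(5)]

print(weighted_identity(toy))                  # 89.0 (every hit counted)
print(weighted_identity(best_per_query(toy)))  # 99.0 (best hit per segment)
```

With these toy numbers the all-hits average falls to 89% even though every homologous hit is 99% identical. In real nucmer work the filtering would normally be done with MUMmer’s own tools (for example `delta-filter`) rather than by hand; the selection rule used here (longest hit, ties broken by identity) is only an assumption for illustration.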
I subsequently emailed him, and asked if he could tell me about the results:
I would greatly appreciate it if you would let me know about your results after you finish running your nucmer job. I was also wondering if you would allow me to quote excerpts from your correspondence in a forthcoming post on UD.
Glenn Williamson replied:
Hey,
Yup, no problems quoting any of the emails…
The first nucmer job I ran took 37 hours (human 20 to chimp 20), and this current “control” job (human 20 to human 20) has taken 37 hours as of right now, so it should finish soon. It will take a couple of hours to put all the results together, so should have something by tonight.
It wasn’t long before I heard from Glenn Williamson again:
It’s done!
And human chromosome 20 is only 88.86% identical to human chromosome 20! 🙂
Unix commands, if you care:
awk 'NR>5 { print $7"\t"$8"\t"$10 }' control.coords > control.tab
awk '{ sum += ($1 + $2) / 2; prod += ($1 + $2) / 2 * $3 } END { print sum; print prod; print prod / sum }' control.tab
Output:
1.71549e+09
1.52439e+11
88.8601
So basically the alignments covered 1.715Gb for a chromosome that is only 64Mb long (27x coverage). There were 4.8 million individual alignments …
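The two awk one-liners can be mirrored in Python as a sanity check. This is a sketch under stated assumptions: it assumes the default `show-coords` table layout, with five header lines followed by rows whose whitespace-separated fields 7, 8 and 10 are the two alignment lengths and the percent identity (the columns the awk commands pick out). The sample rows and file paths are invented for illustration, and the exact header may differ between MUMmer versions.

```python
from typing import Iterable, Tuple

# Invented sample approximating show-coords' default layout (5 header lines).
SAMPLE_COORDS = """\
/data/chr20.fa /data/chr20.fa
NUCMER

    [S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  | [TAGS]
=====================================================================================
       1     1000  |        1     1000  |     1000     1000  |   100.00  | 20  20
    2001     3000  |     7001     8000  |     1000     1000  |    80.00  | 20  20
    9001     9400  |     4001     4600  |      400      600  |    90.00  | 20  20
"""


def summarize_coords(lines: Iterable[str]) -> Tuple[int, float, float]:
    """Return (number of alignments, total aligned length, weighted % identity).

    Mirrors the awk pipeline: skip 5 header lines (NR>5), take fields 7, 8 and
    10 (1-based), and weight each alignment by the mean of its two lengths.
    """
    n = 0
    total_len = 0.0
    weighted_sum = 0.0
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if line_no <= 5 or len(fields) < 10:
            continue  # header line, or blank/short line
        len1, len2, idy = float(fields[6]), float(fields[7]), float(fields[9])
        mean_len = (len1 + len2) / 2
        total_len += mean_len
        weighted_sum += mean_len * idy
        n += 1
    return n, total_len, weighted_sum / total_len


n, total_len, identity = summarize_coords(SAMPLE_COORDS.splitlines())
print(n, total_len, identity)  # 3 2500.0 90.0
```

Dividing the total aligned length by the chromosome length gives the coverage Williamson mentions: 1.715 Gb of alignments over a 64 Mb chromosome is roughly 27-fold.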
So there we have it. If Dr. Tomkins is right, then not only is chimpanzee DNA only 88% similar to our own, but human DNA is only 89% similar to itself!
Do human beings really have 60 de novo protein-coding genes with no counterparts in apes?
But there was more – much more. In my original email to Glenn Williamson, I had expressed curiosity about a comment he made on a January 2014 post titled Chinese Researchers Demolish Evolutionary Pseudo-Science, on Dr. Cornelius Hunter’s website, Darwin’s God. In that comment, Williamson expressed skepticism about Dr. Hunter’s claim that no fewer than 60 protein-coding orphan genes had been identified in human DNA with no counterpart in chimpanzees. To support his claim, Dr. Hunter cited a 2011 PLOS study by Dong-Dong Wu, David M. Irwin and Ya-Ping Zhang, titled De Novo Origin of Human Protein-Coding Genes. Here is the authors’ summary of their paper (emphases mine – VJT):
The origin of genes can involve mechanisms such as gene duplication, exon shuffling, retroposition, mobile elements, lateral gene transfer, gene fusion/fission, and de novo origination. However, de novo origin, which means genes originate from a non-coding DNA region, is considered to be a very rare occurrence. Here we identify 60 new protein-coding genes that originated de novo on the human lineage since divergence from the chimpanzee, supported by both transcriptional and proteomic evidence. It is inconsistent with the traditional view that the de novo origin of new genes is rare. RNA–seq data indicate that these de novo originated genes have their highest expression in the cerebral cortex and testes, suggesting these genes may contribute to phenotypic traits that are unique to humans, such as development of cognitive ability. Therefore, the importance of de novo origination needs greater appreciation.
Commenting on the paper, Dr. Hunter remarked (bold emphases mine – VJT):
A 2011 paper out of China and Canada, for example, found 60 protein-coding genes in humans that are not in the chimp. And that was an extremely conservative estimate. They actually found evidence for far more such genes, but used conservative filters to arrive at 60 unique genes. Not surprisingly, the research also found evidence of function, for these genes, that may be unique to humans.
If the proteins encoded by these genes are anything like most proteins, then this finding would be another major problem for evolutionary theory. Aside from rebuking the evolutionist’s view that the human-chimp genome differences must be minor, 6 million years simply would not be enough time to evolve these genes.
In fact, 6 billion years would not be enough time. The evolution of a single new protein, even by evolutionists’ incredibly optimistic assumptions, is astronomically unlikely, even given the entire age of the universe to work on the problem.
Note the claim that Dr. Hunter is making here: “60 protein-coding genes in humans that are not in the chimp.” But as we’ll see, these genes do have virtually identical counterparts in chimps, even if those counterparts are noncoding.
So, how many ORFan genes do humans really have?
In his comment, Glenn Williamson responded to Dr. Hunter’s claim that humans have 60 protein-coding genes that are “not in the chimp” by pointing out that the first of these 60-odd genes actually has a counterpart in chimpanzee DNA which is 98% identical to the human gene (emphasis mine – VJT):
“So how many ORFan genes are actually in humans???”
Depends what you call an ORFan gene. I looked at the paper that Cornelius cites as having 60 de novo protein coding genes.
Now, granted that I only looked at the very first one (“ZNF843”), it was quite easy to find the corresponding DNA on the chimpanzee chromosome, with approximately 98.5% identity.
So whether it is protein-coding in humans and non-coding in everything else doesn’t really concern me. What concerns me is whether it has an evolutionary history: clearly this one does.
Like I said, I’ve only done one. I’d happily take bets on the majority of these de novo genes having an evolutionary history (chimpanzee > 95% and/or gorilla > 90%).
Any takers?
I had only come across this exchange in the last couple of days, while surfing the Net, and my curiosity was piqued. So I wrote back to Williamson:
By the way, I was intrigued with your work on orphan genes, and I thought I’d have a look at the 60 genes mentioned by Cornelius Hunter in a post he wrote last year. However, I don’t have any experience in this area. Can you tell me how to go about running these comparisons?
Orphan genes – did Dr. Hunter get his facts wrong?
Glenn Williamson’s reply was very helpful – and it pulled no punches. He accused Dr. Hunter of getting his facts wrong about ORFan genes (emphasis mine – VJT):
As for Orphan genes, I assume you mean this comment? http://darwins-god.blogspot.com.au/2014/01/chinese-researchers-demolish.html?showComment=1421299517820#c1105680265537141676
There are a couple of points to be made here. First is that Cornelius fundamentally misunderstands what an orphan gene is and what an ORF(an) is – they are not equivalent terms. A true orphan gene should be called a “taxonomically restricted gene” (TRG), and no trace of its evolutionary history can be found outside a particular taxonomic group. These would be examples of de novo evolution. With respect to humans and chimpanzees, I don’t know of any TRGs that exist in either genome (with respect to the other), and if there were, I would then check the other great apes to see if it was likely that this gene was deleted in one of the genomes (rather than evolved out of nothing in 6mn years!).
Good point. Williamson continued:
An ORFan gene usually refers to a putative protein coding gene. “Putative” because these are generally the result of a computer program trying to find long open reading frames, and if it finds something over a certain length (300bp? 400bp?) then, since a long open reading frame is quite unlikely, the program thinks that this open reading frame is evolutionarily conserved, and it might be conserved because it codes for an important protein. Have a read of Eric Lander’s paper – http://www.ncbi.nlm.nih.gov/pubmed/18040051 – where he says we should be removing these ORFs from the gene database unless and until we can actually find their corresponding proteins.
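Williamson’s description of how putative ORFs get flagged can be illustrated with a toy ORF scanner. This is a minimal Python sketch, not the algorithm used by any particular gene-annotation pipeline: it scans only the forward strand, starts an ORF at ATG, ends it at the first in-frame TAA, TAG or TGA, ignores ORFs that run off the end of the sequence, and uses a 300 bp cut-off that simply echoes Williamson’s own guess (“300bp? 400bp?”).

```python
from typing import Iterator, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def long_orfs(seq: str, min_len: int = 300) -> Iterator[Tuple[int, int, int]]:
    """Yield (frame, start, end) for forward-strand ORFs of at least min_len bp.

    An ORF runs from an ATG to the first in-frame stop codon; `start` is a
    0-based index and `end` is exclusive, so the stop codon is included.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    yield frame, start, i + 3
                start = None  # look for the next ATG in this frame


# Hypothetical test sequence: ATG + 100 alanine codons + TAA, padded on both sides.
demo = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
print(list(long_orfs(demo)))               # [(2, 2, 308)]
print(list(long_orfs(demo, min_len=400)))  # []
```

The same 306 bp ORF passes a 300 bp threshold and fails a 400 bp one, which is why the choice of cut-off matters so much for how many “ORFans” a program reports.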
Glad we got that point cleared up. So, what about those 60 protein-coding genes in humans which Dr. Hunter claimed are not found in the chimp? Here’s what Williamson wrote back to me:
So, these 60-odd genes that Cornelius brings up, he is claiming that they must have evolved de novo:
“In fact, 6 billion years would not be enough time. The evolution of a single new protein, even by evolutionists’ incredibly optimistic assumptions, is astronomically unlikely, even given the entire age of the universe to work on the problem.”
And that’s why I checked the first one on the list, just to demonstrate that it was in the chimpanzee genome (at 98.5% identity). So if this gene codes for a protein in humans, maybe we just haven’t found the protein in chimps. Maybe it codes for a protein in humans, and there was a single mutation that caused it not to be translated in chimps. Maybe it doesn’t actually code for a protein in humans at all? (Although I think we can check that). In any case, it’s not an example of de novo evolution – it’s not an orphan gene in the sense of being taxonomically restricted.
As to how to do the work yourself .. let me send this one off first and I’ll start another email 🙂
For my part, I am somewhat skeptical of Williamson’s speculation that these genes were switched off in the lineage leading to chimpanzees, especially in view of the discovery of three undoubted cases of de novo genes in human beings where the ancestral sequence in apes was noncoding. But given the striking 98% similarity between these genes and their noncoding counterparts in apes, I would also urge caution about Dr. Hunter’s claim that even billions of years would not have been long enough for these protein-coding genes to have evolved. If they were evolving from scratch, yes; but if they were evolving from 98% identical counterparts, I wouldn’t be so sure about that.
I learn how to do a BLAST comparison
In his next email, Glenn Williamson kindly explained how to do a BLAST comparison, and how he had obtained his results for ZNF843, the first of the 60 de novo protein-coding genes cited by Dr. Hunter in his 2014 post. In his response to Dr. Hunter, Williamson had reported that “it was quite easy to find the corresponding DNA on the chimpanzee chromosome, with approximately 98.5% identity.” Here’s what he wrote to me:
Alright, I’ll run you through a simple BLAST search on the Ensembl website. Although, if you want to do some serious BLASTing, then you probably should install the software on your own machine, and download the genomes onto your hard drive.
Anyway, go to:
http://www.ensembl.org/index.html
and stick the name of the gene: ZNF843 into the search box. That should get you to here:
http://asia.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000176723;r=16:31432593-31443160
On the left hand side, there should be an “Export Data” tab. Click it. Deselect all the checkboxes (we just want the raw DNA) and hit “Next”. Hit the “Text” button, and then just Copy the whole output, starting with the “>blah blah blah”. Now, at the top left of the page is the “BLAST/BLAT” tool. Click it.
Paste the copied DNA into the box, make sure you search against the chimpanzee genome (i.e. uncheck the human genome) and then run the search – using the default parameters should be fine for now.
The results can be found here:
http://www.ensembl.org/Homo_sapiens/Tools/Blast/Ticket?tl=mQCTv8YnFRQKB0Kx
Unfortunately the results are given in chunks, and if you want to get an exact number, stick them in Excel and work it out. But if you just want to look at it on the website, click on the “Genomic Location” header to sort them in that order, scroll down to chromosome 16, and you’ll see that it covers the vast majority of the 10.5kb of query DNA, and the matches are around 98.5%-99.5%. Rough guess for the overall identity (including some small indels) is about 98.5%.
If you need help just email me back and I’ll see what I can do. I gotta run now tho 🙂
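For those who would rather not paste the BLAST chunks into Excel, the combination step Williamson describes can be sketched in Python. The HSP tuples below are hypothetical placeholders, not the actual Ensembl results for ZNF843; each one holds the query start and end (1-based, inclusive), the number of identical positions and the alignment length. Overall identity is taken as total identities divided by total alignment length, and query coverage as the union of the query intervals, so overlapping chunks are not counted twice for coverage (they would still be double-counted in the identity sums).

```python
from typing import Iterable, List, Tuple

# (qstart, qend, nident, align_length); coordinates 1-based and inclusive.
Hsp = Tuple[int, int, int, int]


def merge_intervals(intervals: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """Merge overlapping or adjacent inclusive intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def summarize_hsps(hsps: List[Hsp], query_length: int) -> Tuple[float, float]:
    """Return (overall % identity, % of the query covered) for a set of HSPs."""
    nident = sum(h[2] for h in hsps)
    aligned = sum(h[3] for h in hsps)
    spans = [(min(h[0], h[1]), max(h[0], h[1])) for h in hsps]
    covered = sum(end - start + 1 for start, end in merge_intervals(spans))
    return 100.0 * nident / aligned, 100.0 * covered / query_length


# Hypothetical chunks for a 10,000 bp query (not the real ZNF843 results).
chunks = [(1, 4000, 3960, 4000), (3901, 10000, 6000, 6100)]
identity, coverage = summarize_hsps(chunks, 10000)
print(round(identity, 2), round(coverage, 2))  # 98.61 100.0
```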
And here’s what Williamson got when he ran the BLAST comparison on his computer:
I ran it on my local machine:
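#!/bin/sh
# Query: the human ZNF843 region exported from Ensembl as FASTA
QRY="ZNF843.fa"
# Subject: chimpanzee chromosome 16 from Ensembl (CHIMP2.1.4 assembly)
SBJ="${HOME}/Data/Ensembl/chimp/Pan_troglodytes.CHIMP2.1.4.dna.chromosome.16.fa"
# Keep one HSP per subject; print the requested fields as comma-separated values
blastn -query "${QRY}" -subject "${SBJ}" -max_hsps 1 -outfmt '10 qseqid qstart qend sstart send nident pident qlen length'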
Output:
16,1,10568,31611859,31601307,10375,97.62,10568,10628
So, only 97.62% identity for that one … 0.57% of the alignment is indels. Boooooooooooooo.
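The single line of output can be unpacked in Python as well. The sketch below assumes exactly the nine comma-separated fields requested with `-outfmt '10 ...'` in the script above, in that order. It recomputes the identity from `nident` and `length`, and counts gap columns separately for each sequence: alignment length minus the query span gives gaps in the query, and alignment length minus the subject span gives gaps in the subject (the subject coordinates run backwards because the hit is on the minus strand).

```python
from typing import Dict, Tuple

# Field order requested with -outfmt '10 qseqid qstart qend sstart send nident pident qlen length'
FIELDS = ["qseqid", "qstart", "qend", "sstart", "send", "nident", "pident", "qlen", "length"]


def parse_hit(csv_line: str) -> Dict[str, object]:
    """Turn one comma-separated BLAST output line into a typed dictionary."""
    hit: Dict[str, object] = dict(zip(FIELDS, csv_line.strip().split(",")))
    for key in FIELDS[1:]:
        hit[key] = float(hit[key]) if key == "pident" else int(hit[key])
    return hit


def gap_columns(hit: Dict[str, object]) -> Tuple[int, int]:
    """Gap columns in the query and in the subject (minus-strand safe via abs)."""
    q_span = abs(hit["qend"] - hit["qstart"]) + 1
    s_span = abs(hit["send"] - hit["sstart"]) + 1
    return hit["length"] - q_span, hit["length"] - s_span


hit = parse_hit("16,1,10568,31611859,31601307,10375,97.62,10568,10628")
identity = 100.0 * hit["nident"] / hit["length"]
q_gaps, s_gaps = gap_columns(hit)
print(round(identity, 2), q_gaps, s_gaps)  # 97.62 60 75
```

Applied to Williamson’s output, this gives 10375 / 10628 ≈ 97.62% identity, 60 gap columns in the query and 75 in the subject. The quoted 0.57% appears to correspond to the 60 query-side gap columns divided by the 10,568 bp query; counting both sides, gaps make up about 1.3% of the 10,628 alignment columns.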
So, for the first of the alleged 60 “de novo” protein-coding genes cited by Dr. Hunter (“ZNF843”), Glenn Williamson managed to locate corresponding DNA on the chimpanzee chromosome that was approximately 98% identical (97.62% in his local BLAST run). Are these genes without an evolutionary history? I hardly think so!
More good news – the results for all the other genes are already in!
In his most recent email, Glenn Williamson shared further good tidings: comparisons for the other 59 genes have already been done!
Just looking into that 2011 paper a little further – they’ve already done all the work for us!
http://journals.plos.org/plosgenetics/article/asset?unique&id=info:doi/10.1371/journal.pgen.1002379.s009
http://journals.plos.org/plosgenetics/article/asset?unique&id=info:doi/10.1371/journal.pgen.1002379.s011
These are the 60 “de novo” genes, and their alignments with chimpanzee and orang-utan 🙂
I’ve had a look at the output, and even to my untutored eye, it’s obvious that any claims that these “de novo” genes are not found in the DNA of chimps and other apes are flat-out wrong. They have virtually identical counterparts on the chimpanzee and orang-utan genomes, even if these are non-protein coding.
Some cautionary remarks about the 2011 paper cited by Dr. Hunter
The 2011 paper by Wu et al. which was cited by Dr. Hunter was critiqued in another article in PLOS Genetics (7(11): e1002381. doi:10.1371/journal.pgen.1002381, published 10 November 2011), titled De Novo Origins of Human Genes, by Daniele Guerzoni and Aoife McLysaght. The authors felt that the estimate of 60 de novo human-specific genes in the paper by Wu et al. was based on rather lax criteria. What’s more, they seemed confident that the genes could have evolved:
In this issue of PLoS Genetics, Wu et al. [15] report 60 putative de novo human-specific genes. This is a lot higher than a previous, admittedly conservative, estimate of 18 such genes [13], [16]. The genes identified share broad characteristics with other reported de novo genes [13]: they are short, and all but one consist of a single exon. In other words, the genes are simple, and their evolution de novo seems plausible. The potential evolution of complex features such as intron splicing and protein domains within de novo genes remains somewhat puzzling. However, features such as proto-splice sites may pre-date novel genes [9], [17], and the appearance of protein domains by convergent evolution may be more likely than previously thought [2].
The operational definition of a de novo gene used by Wu et al. [15] means that there may be an ORF (and thus potentially a protein-coding gene) in the chimpanzee genome that is up to 80% of the length of the human gene (for about a third of the genes the chimpanzee ORF is at least 50% of the length of the human gene). This is a more lenient criterion than employed by other studies, and this may partly explain the comparatively high number of de novo genes identified. Some of these cases may be human-specific extensions of pre-existing genes, rather than entirely de novo genes — an interesting, but distinct, phenomenon.
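Guerzoni and McLysaght’s point about the length criterion can be made concrete with a small Python sketch. It is only an illustrative reading, not Wu et al.’s actual pipeline: the 80% cut-off comes from the description quoted above, treating the boundary as inclusive is an assumption, and the extra 50% flag merely echoes the remark that for about a third of the genes the chimpanzee ORF is at least half the length of the human gene. The gene lengths in the usage lines are hypothetical.

```python
def classify(human_orf_len: int, chimp_orf_len: int,
             cutoff: float = 0.8, half: float = 0.5) -> str:
    """Illustrative reading of the length criterion discussed above.

    A human gene still counts as 'de novo' if the best chimpanzee ORF at the
    corresponding locus is no longer than `cutoff` times its length (an
    inclusive boundary is assumed). Cases at or above `half` are flagged,
    since these could be extensions of pre-existing ORFs.
    """
    ratio = chimp_orf_len / human_orf_len
    if ratio > cutoff:
        return "excluded"
    if ratio >= half:
        return "de novo (possible extension)"
    return "de novo"


# Hypothetical lengths in bp for three imaginary genes.
for human, chimp in [(300, 270), (300, 180), (300, 60)]:
    print(human, chimp, classify(human, chimp))
```

The middle case is the one the critics worry about: it passes the filter, yet most of its reading frame already exists in the chimpanzee.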
In a 2009 paper titled Recent de novo origin of human protein-coding genes (Genome Research 2009, 19: 1752-1759), David Knowles and Aoife McLysaght presented evidence for the de novo origin of at least three human protein-coding genes since the divergence with chimp, and estimated that there may be 18 such genes in the human genome, altogether. Here’s what they said about the three genes they identified:
Each of these genes has no protein-coding homologs in any other genome, but is supported by evidence from expression and, importantly, proteomics data. The absence of these genes in chimp and macaque cannot be explained by sequencing gaps or annotation error. High-quality sequence data indicate that these loci are noncoding DNA in other primates. Furthermore, chimp, gorilla, gibbon, and macaque share the same disabling sequence difference, supporting the inference that the ancestral sequence was noncoding over the alternative possibility of parallel gene inactivation in multiple primate lineages.
Note the wording: “Each of these genes has no protein-coding homologs in any other genome.” Nevertheless, the genes have non-coding counterparts in the DNA of apes: “High-quality sequence data indicate that these loci are noncoding DNA in other primates.”
Whether these genes could have evolved naturally from their ape counterparts is a question I’ll leave for the experts to sort out. One thing I do know, however: they are not “new” in the sense that layfolk would construe that term – that is, functioning genes which have no counterparts in the DNA of apes. Clearly, they do have very similar counterparts in apes, even if those counterparts are non-coding.
Conclusion
Well, I think that’s about enough new revelations for one day, so I shall stop there. It seems to me that any claims that humans have a large number of “de novo” genes with no counterparts in the DNA of chimpanzees and other apes should be treated with extreme caution. In fact, after that takedown, I wouldn’t bet on our having any de novo protein-coding genes that lack counterparts in apes.
We already have very good arguments demonstrating the impossibility of proteins having evolved via an undirected process, thanks to the work of Dr. Douglas Axe – see, for instance, his excellent article, The Case Against a Darwinian Origin of Protein Folds. It seems to me that arguments based on de novo genes alleged to exist in human beings with no counterparts in apes have much weaker evidential support, and that Intelligent Design proponents would be better off not using them.
But perhaps those who are feeling adventurous might like to take up Glenn Williamson on his 2014 wager:
I’d happily take bets on the majority of these de novo genes having an evolutionary history (chimpanzee > 95% and/or gorilla > 90%).
Any takers?
Well? Is anyone feeling lucky?
POSTSCRIPT: Readers may be interested to know that Dr. Ann Gauger has written a very balanced post titled Orphan Genes—A Guide for the Perplexed. In her post, Dr. Gauger defines orphan genes as “those open reading frames that lack identifiable sequence similarity to other protein-coding genes.” Note the word “protein-coding.” She raises the possibility that “they are uniquely designed for species- and clade-specific functions,” but draws no firm conclusions.