Uncommon Descent Serving The Intelligent Design Community

A statistical comparison of two human genomes

In a previous post I provided a statistical test to compare chimpanzee and human genomes. As you can read there, the post generated a very interesting discussion among the readers, and it seemed to me that the general feeling at the end was that my statistical method for performing genome-wide comparisons might have some merit, after all.

One reader suggested applying an identical test in order to compare two human genomes. That sounded like a very good idea to me, so I downloaded another human genome dataset from NCBI and performed a test.

For the benefit of readers, I’ll briefly recapitulate the simple comparison algorithm used in my previous test. 10,000 different sequences, each composed of 30 consecutive DNA bases (possible values: A, T, G and C), were randomly selected from chromosome N of genome A. A search for a matching pattern was then performed on the corresponding chromosome N of genome B. A pattern match was deemed to occur only when all 30 bases coincided perfectly. In other words, the head-to-head comparison between these DNA sub-strings was exact, not relaxed as it is in many other tests in evolutionary comparative genomics. The total number of pattern matches found for that particular chromosome was then recorded. All chromosomes were tested in a similar fashion. Readers can view the latest results in the table and chart below, and compare them with the earlier results for the chimpanzee vs. human comparison.
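
For readers who want to experiment, the procedure described above can be sketched roughly as follows. This is only an illustration, not the program actually used for the test: the function name `count_exact_matches`, its parameters, and the choice to hold each chromosome in memory as a plain string are all assumptions, and the sketch searches only the forward strand as given. It draws distinct start positions, which does not strictly guarantee that the sampled sequences themselves are all different.

```python
import random


def count_exact_matches(chrom_a, chrom_b, n_samples=10_000, length=30, seed=0):
    """Count sampled windows of chrom_a that occur verbatim somewhere in chrom_b.

    chrom_a, chrom_b: chromosome sequences as strings (hypothetical input format).
    n_samples: number of windows to draw (10,000 in the post).
    length: window size in bases (30 in the post).
    """
    n_positions = len(chrom_a) - length + 1
    if n_positions < n_samples:
        raise ValueError("chrom_a is too short for the requested number of samples")
    rng = random.Random(seed)
    # Distinct start positions, chosen uniformly at random.
    starts = rng.sample(range(n_positions), n_samples)
    matches = 0
    for start in starts:
        pattern = chrom_a[start:start + length]
        # Exact match only: all `length` bases must coincide.
        if pattern in chrom_b:
            matches += 1
    return matches


if __name__ == "__main__":
    toy = "ACGT" * 50
    # Identical toy inputs, so every sampled window is found: prints 20.
    print(count_exact_matches(toy, toy, n_samples=20))
```

On real chromosomes of tens or hundreds of millions of bases, a plain substring search per pattern is slow; an index over the target sequence would be the usual remedy, but that is beyond this sketch.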

As expected, the number of pattern matches was always considerably greater when comparing two humans than when comparing a chimpanzee with a human. As the chart shows, the number of matches in human vs. human comparisons was quite stable, ranging from 9507 to 9705 (chromosome Y is the sole exception, with 8989). The same did not hold for chimpanzee vs. human comparisons, where the values were much more scattered.

Finally, the average number of pattern matches per chromosome, shown at the bottom of the table, was very different in the two cases: 9616 for human vs. human comparisons, but only 6173 for chimp vs. human comparisons. The average number of patterns without a match for human vs. human comparisons was (10000 – 9616) = 384, or in percentage terms, 384/10000 = 3.84%. The average number of patterns without a match in human vs. chimp comparisons was (10000 – 6173) = 3827, or in percentage terms, 3827/10000 = 38.27%, which is almost ten times greater.
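
The arithmetic in the paragraph above can be checked with a few lines of Python. The helper name `unmatched_stats` is made up for this illustration; the input figures (9616, 6173 and 10,000) are the ones quoted in the post.

```python
def unmatched_stats(avg_matches, n_patterns=10_000):
    """Return (patterns without a match, that count as a fraction of n_patterns)."""
    unmatched = n_patterns - avg_matches
    return unmatched, unmatched / n_patterns


human_unmatched, human_rate = unmatched_stats(9616)  # 384 patterns, 3.84%
chimp_unmatched, chimp_rate = unmatched_stats(6173)  # 3827 patterns, 38.27%
ratio = chimp_rate / human_rate                      # 3827 / 384

print(f"{human_rate:.2%} vs {chimp_rate:.2%}, ratio {ratio:.2f}")
# prints: 3.84% vs 38.27%, ratio 9.97
```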

So the bottom-line question is: if, as many evolutionists say, chimpanzee and human genomes are 99% identical, how “identical” are two human genomes?

Comments
Why, ellazimm, here you go: Darwin's tree of life is 'annihilated'.
Here is another article, written by an evolutionist mind you, that states the true pattern found for life, from comparative genetic evidence, is not the tree pattern Darwin had envisioned:
A New Model for Evolution: A Rhizome - May 2010
Excerpt: Thus we cannot currently identify a single common ancestor for the gene repertoire of any organism.,,, Overall, it is now thought that there are no two genes that have a similar history along the phylogenic tree.,,, Therefore the representation of the evolutionary pathway as a tree leading to a single common ancestor on the basis of the analysis of one or more genes provides an incorrect representation of the stability and hierarchy of evolution. Finally, genome analyses have revealed that a very high proportion of genes are likely to be newly created,,, and that some genes are only found in one organism (named ORFans). These genes do not belong to any phylogenic tree and represent new genetic creations.
http://darwins-god.blogspot.com/2010/05/new-model-for-evolution-rhizome.html
"Why Darwin was wrong about the tree of life," New Scientist (January 21, 2009)
Excerpt: “Phylogenetic incongruities [conflicts] can be seen everywhere in the universal tree, from its root to the major branchings within and among the various taxa to the makeup of the primary groupings themselves.”,,, “We’ve just annihilated the (Darwin's) tree of life.”
http://www.evolutionnews.org/2009/05/a_primer_on_the_tree_of_life_p_1.html#more
More Questions for Evolutionists - August 2010
Excerpt: First of all, we have 65% of the gene number of humans in little old sponges, an organism that appears as far back as 635 million years ago, about as old as you can get [except for bacteria]. This kind of demolishes Darwin’s argument about what he called the pre-Silurian (pre-Cambrian). 635 mya predates both the Cambrian AND the Ediacaran, which comes before the Cambrian (i.e., the pre-Cambrian). IOW, out of nowhere, 18,000 animal genes. Darwinian gradualism is dealt a death blow here (unless you’re a ‘true believer’!). Here’s a quote: “It means there was an elaborate machinery in place that already had some function. What I want to know now is what were all these genes doing prior to the advent of sponge.” (Charles Marshall, director of the University of California Museum of Paleontology in Berkeley.) I want to know, too!
https://uncommondescent.com/intelligent-design/more-questions-for-evolutionists/
Since evolutionists continually misrepresent the true state of the evidence for molecular sequences, here are several more comments and articles, by leading experts, on the incongruence of molecular sequences to Darwin's theory:
https://docs.google.com/document/pub?id=1S5wXsukzkauD5YQLkQYuIMGL25I4fJrOUzJhONvBXe4
=======
Primate Phylogenetics Challenge Darwin's Tree of Life - Casey Luskin - audio podcast
http://intelligentdesign.podomatic.com/player/web/2011-05-09T16_32_00-07_00
A False Trichotomy
Excerpt: The common chimp (Pan troglodytes) and human Y chromosomes are “horrendously different from each other”, says David Page,,, “It looks like there’s been a dramatic renovation or reinvention of the Y chromosome in the chimpanzee and human lineages.”
https://uncommondescent.com/intelligent-design/a-false-trichotomy/
The Unbearable Lightness of Chimp-Human Genome Similarity
Excerpt: One can seriously call into question the statement that human and chimp genomes are 99% identical. For one thing, it has been noted in the literature that the exact degree of identity between the two genomes is as yet unknown (Cohen, J., 2007. Relative differences: The myth of 1% Science 316: 1836.). ,,, In short, the figure of identity that one wants to use is dependent on various methodological factors.
http://www.evolutionnews.org/2009/05/guy_walks_into_a_bar_and_think.html#more
bornagain77
May 23, 2011, 02:20 PM PDT
It's interesting looking for good graphical representations of common descent. And humbling to realise how much time and effort has been put into the research behind this: http://upload.wikimedia.org/wikipedia/commons/0/0e/MyosinUnrootedTree.jpg
ellazimm
May 23, 2011, 02:12 PM PDT
BA77: I'd like to see the whole 'tree' of life subjected to the best and most recent genetic analysis, kangaroos included! I LIKE DATA!! I'll try and track down a good graphic showing how humans and marsupials fit into common descent based on the predominant view.
ellazimm
May 23, 2011, 02:05 PM PDT
ellazimm, and I would like to see just where kangaroos fall on that tree. Evolutionists were recently completely surprised by this genetic study of kangaroos:
Kangaroo genes close to humans
Excerpt: Australia's kangaroos are genetically similar to humans,,, "There are a few differences, we have a few more of this, a few less of that, but they are the same genes and a lot of them are in the same order," ,,, "We thought they'd be completely scrambled, but they're not. There is great chunks of the human genome which is sitting right there in the kangaroo genome,"
http://www.reuters.com/article/science%20News/idUSTRE4AH1P020081118
I'm just left wondering exactly where evolutionists should place the kangaroos on their cartoon drawings that show man evolving from apes.
bornagain77
May 23, 2011, 01:57 PM PDT
As to paulmc trying to assert that gene and protein sequences between chimps and humans are virtually identical, I would like to point out that evolutionists have severely distorted this particular line of evidence to fit their preconceived bias. There are over 1000 completely unique 'ORFan' genes, not found in any other known species, intertwined in the 22,000 genes of the human genome:
Human Gene Count Tumbles Again - 2008
Excerpt: Applying this technique to nearly 22,000 genes in the Ensembl gene catalog, the analysis revealed 1,177 “orphan” DNA sequences.
http://www.sciencedaily.com/releases/2008/01/080113161406.htm
The authors of the preceding study tried to 'remove the unique genes' from the human gene catalog simply because they could not find matches in any supposed precursor species, and not for any lack of functionality on the genes' part. The following site has a brief discussion of the biased methodology of the preceding study:
https://uncommondescent.com/intelligent-design/proteins-fold-as-darwin-crumbles/#comment-358505
Moreover the 'anomaly' of unique ORFan genes is found in every new genome sequenced thus far:
Widespread ORFan Genes Challenge Common Descent - Paul Nelson - video with references
http://www.vimeo.com/17135166
As well, completely contrary to evolutionary thought, these 'new' ORFan genes are found to be just as essential as 'old' genes for maintaining life:
Age doesn't matter: New genes are as essential as ancient ones - December 2010
Excerpt: "A new gene is as essential as any other gene; the importance of a gene is independent of its age," said Manyuan Long, PhD, Professor of Ecology & Evolution and senior author of the paper. "New genes are no longer just vinegar, they are now equally likely to be butter and bread. We were shocked."
http://www.sciencedaily.com/releases/2010/12/101216142523.htm
I would like to reiterate that evolutionists cannot account for the origination of even one unique gene or protein, much less the over one thousand completely unique ORFan genes found distinctly embedded within the 20,000 genes of the human genome:
Protein Folding and Evolution - December 2010
Excerpt: A typical gene has something like a thousand nucleotides. Given that there are four different types of nucleotides, this means there are 4^1000 different sequences that could make up the gene. This is equal to a 1 followed by about 600 zeros, a big number. That’s more than the number of nanoseconds since the Big Bang, by about 10^574 (a 1 followed by 574 zeros). Finding the right gene sequence to get a particular job done in the cell would make finding a needle in a haystack seem easy. The problem is so difficult that we haven’t yet figured out the answer, but it would be a 1 in 10^100++ long shot. Do not try this at home.
http://darwins-god.blogspot.com/2010/12/protein-folding-and-evolution.html
Could Chance Arrange the Code for (Just) One Gene?
"our minds cannot grasp such an extremely small probability as that involved in the accidental arranging of even one gene (10^-236)."
http://www.creationsafaris.com/epoi_c10.htm
"Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds" 2004 - Doug Axe
",,,this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences."
http://www.mendeley.com/research/estimating-the-prevalence-of-protein-sequences-adopting-functional-enzyme-folds/
So where did these completely unique genes come from and why are neo-Darwinists so disposed to ignore them??? I.e., why are we continually sold a bill of goods as to the actual evidence by neo-Darwinists???
,,, and this tidbit that just came out last week is also of interest:
New level of genetic diversity in human RNA sequences uncovered
Excerpt: They call these sites RNA-DNA differences, or RDDs. They found at least one RDD site in about 40 percent of genes, and many of these RDDs cause the cell to produce different protein sequences than would be expected based on the DNA. In the cells they studied, the sequences of thousands of proteins may be different from their corresponding DNA, the scientists say.
http://www.physorg.com/news/2011-05-genetic-diversity-human-rna-sequences.html
My hunch is that these RNA sites are drastically different between chimps and man as well???
bornagain77
May 23, 2011, 01:54 PM PDT
You mentioned in your two posts the sources of your two human genomic data sets. Do you have any information which would indicate how closely those two sets might reflect a similar genetic background? I.e., were they both white western Europeans? I think this is really interesting but I'd like to see more comparisons run with many different human genome datasets. One comparison does not a hypothesis make. Twenty comparisons is starting to establish a case. One hundred comparisons just might be establishing a paradigm. Also, it would be interesting, if your metric proves valuable, to compare chimps and gorillas and orangutans and humans to see if the similarity analysis matches the consensus tree of descent. Sounds like a good area of ID research!!
ellazimm
May 23, 2011, 01:39 PM PDT
So the bottom-line question is: if, as many evolutionists say, chimpanzee and human genomes are 99% identical, how “identical” are two human genomes?
Well, what you've said above is not a quote from the researchers involved. From NIH on the topic in 2005: "The consortium found that the chimp and human genomes are very similar and encode very similar proteins. The DNA sequence that can be directly compared between the two genomes is almost 99 percent identical. When DNA insertions and deletions are taken into account, humans and chimps still share 96 percent of their sequence. At the protein level, 29 percent of genes code for the same amino sequences in chimps and humans. In fact, the typical human protein has accumulated just one unique change since chimps and humans diverged from a common ancestor about 6 million years ago." Emphasis added. Source.
paulmc
May 23, 2011, 01:09 PM PDT
Last post had a bad typo: I think an accepted percent difference between humans is 0.1-0.2%, so the 30bp metric works really consistently. If you divide the 30bp metric of 3.84% by 24 you get .16%. Similarly, 38.27 divided by 24 is 1.59%. As mismatches increase, the 30bp metric might tend to underestimate, as a 2 nt difference in a 30 nt string still produces a mismatch. At this level, there will probably be zero matches anyway. But for highly similar targets, it seems like a pretty fast and effective test.
DrREC
May 23, 2011, 01:04 PM PDT
I think an accepted percent difference between humans is 0.1-0.2%, so the 30bp metric works really consistently. If you divide the 30bp metric by 3.84% by 24 you get .16% and 38.27 by 24, it is 1.59%. As mismatches increase, the 30bp metric might tend to underestimate, as a 2 nt difference in a 30 nt string still produces a mismatch. But for highly similar targets, it seems like a pretty fast and effective test.
DrREC
May 23, 2011, 01:01 PM PDT
Thanks, this is a very interesting post. A couple of questions, as I'm kind of new to this area: Could you tell us a bit more about how the matches are done? (It may be in your previous post, but the link doesn't work.) Also, when you say that the differential was 384/10000 for humans, is that base pair hits, identified gene coding sequences, etc.? Does it take into account moves, introns, etc.? I thought the early chimp comparison was done by breaking up the DNA and seeing how much recombined with chimp DNA, yielding a very high (but very suspect) number. Based on your numbers above, could one argue that a chimp is only 61% similar, or do your numbers point to something else?
Eric Anderson
May 23, 2011, 12:29 PM PDT