A simple statistical test for the alleged “99% genetic identity” between humans and chimps

September 27, 2010

Genomics, Informatics, Intelligent Design

Typical figures published in the scientific literature for the percentage similarity between the genomes of human beings (Homo sapiens) and chimpanzees (Pan troglodytes) range from 95% to 99%. However, in press releases intended for popular consumption, evolutionary biologists frequently claim that human and chimpanzee genomes are 99% identical. Skeptics of neo-Darwinian evolution have repeatedly punctured this “99% myth,” but unfortunately it seems to have gained widespread credence, because evolutionists continually propagate it! For instance, one often encounters statements like these in the literature:

“Because the chimpanzee lies at such a short evolutionary distance with respect to human, nearly all of the bases are identical by descent and sequences can be readily aligned” (The Chimpanzee Sequencing and Analysis Consortium, “Initial sequence of the chimpanzee genome and comparison with the human genome,” Nature, Vol. 437, 1 September 2005, doi:10.1038/nature04072).

“The consortium [National Human Genome Research Institute] found that the chimp and human genomes are very similar and encode very similar proteins. The DNA sequence that can be directly compared between the two genomes is almost 99 percent identical.” (here)

“The genetic codes of chimps and humans are 99 percent identical.” (here)

Supporters of the neo-Darwinian theory of evolution have a strong ideological motivation for minimizing the differences between humans and chimps, as they claim that these two species evolved from a common ancestor as a result of random mutations filtered by natural selection. Now, I don’t personally believe that humans and chimps share a common ancestry, for a host of reasons that would take too long to explain in this post. Nor do I attach much significance to the magnitude of the genetic differences between these two species per se, because in my opinion the fundamental differences between these creatures lie elsewhere. However, since the genomic data are now freely available on the Internet, I decided to do some sleuthing of my own and check the wildly exaggerated claims that are often made about the percentage similarity between human and chimp genomes. Here is what I discovered.

Interactive functional comparison methods
Usually, molecular biologists compare genomes on a functional basis. For example, they may search for similar genes in the genomes of human beings and chimpanzees, and try to identify the bases (nucleotides) where they differ or match. Many different technologies have been developed to investigate genomes. One of these is the BLAST (Basic Local Alignment Search Tool) software (see the NCBI Web site for more details). BLAST is an extremely powerful computer-aided tool: it locates regions of local similarity between a query sequence and the sequences stored in an entire database. Alignment methods (such as those implemented by BLAST and other tools) allow geneticists to search interactively for common local patterns at different positions. However, this interactive approach has its limits, as it can compare only portions of genomes at a time. Additionally, some critics have pointed out that these tools are susceptible to slip-ups (see here). Given the amount of data involved (on the order of gigabytes), a global comparison of two genomes is a very demanding job, which cannot be completed interactively in a short time by human beings, even with the aid of tools such as BLAST. At present, only fully automated computer programs are capable of performing such a task on entire genomes. However, developing an automated program that performs a complete functional comparison between the human and chimpanzee genomes is practically impossible, for the simple reason that the functional architecture of these genomes is not yet fully understood.
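To make “local similarity” concrete: BLAST is a fast heuristic, and the exact dynamic-programming method for local alignment that it approximates is the Smith-Waterman algorithm. The Python sketch below is neither BLAST nor part of the 30BPM test; it computes a Smith-Waterman local alignment score with a linear gap penalty. The scoring values (+2 match, -1 mismatch, -2 gap) are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (Smith-Waterman, linear gap penalty).

    Only two rows of the dynamic-programming matrix are kept, so memory is O(len(b)).
    """
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = previous[j] + gap      # a[i-1] aligned against a gap
            gap_in_a = current[j - 1] + gap   # b[j-1] aligned against a gap
            # Clamping at zero is what makes the alignment local: a poor prefix is dropped.
            current[j] = max(0, diagonal, gap_in_b, gap_in_a)
            best = max(best, current[j])
        previous = current
    return best


if __name__ == "__main__":
    # The shared core "ACGT" scores 4 matches x 2 = 8; the unrelated flanks cost nothing.
    print(smith_waterman_score("TTTTACGTTTTT", "GGGACGTGGG"))  # 8
```

Because every cell is clamped at zero, unrelated flanking sequence neither helps nor hurts the score; only the best-matching local region counts, which is exactly why such tools compare portions of genomes rather than whole chromosomes end to end.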

Automatic statistical comparison methods
From a purely informatics and statistical point of view, DNA sequences are simply strings of symbols or characters. Thus it is also possible to develop tests that compare genomes as unstructured sequences of characters, without taking into consideration genes, pseudogenes, coding and non-coding regions, vertical and horizontal gene transfer, open reading frames (ORFs), or any other functional concepts. The characters most commonly present in DNA sequences are A, C, G and T. Other, less frequent characters are used mainly to indicate ambiguity about the identity of certain bases in the sequence. The comparison I performed was completely different from those usually performed by geneticists, because it was purely statistical in nature. In a sense, it could be described as an application of the well-known Monte Carlo method, which is frequently used when the data or processes involved are huge and one wants to reduce computer running time. In short, it involves examining a random sample instead of the whole space under investigation: only a small portion of the data population is actually examined, but that portion is statistically large enough to reveal the characteristics of the whole.
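The sampling idea can be illustrated with a minimal Python sketch (not the author's Perl script). It estimates a simple proportion, here the G+C fraction, chosen purely for illustration, from 10,000 randomly chosen positions of a synthetic sequence, and compares the estimate with the exact value computed over every position. The synthetic sequence, seed and sample size are assumptions made for the demonstration.

```python
import random


def gc_fraction_exact(seq: str) -> float:
    """Fraction of G and C over the whole sequence (the full 'population')."""
    return sum(1 for base in seq if base in "GC") / len(seq)


def gc_fraction_sampled(seq: str, n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the same fraction from n_samples positions
    drawn uniformly at random (with replacement)."""
    hits = sum(1 for _ in range(n_samples) if seq[rng.randrange(len(seq))] in "GC")
    return hits / n_samples


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    # Synthetic 1-million-base sequence with A=T=0.3, G=C=0.2
    # (the approximate base frequencies quoted later in this post).
    seq = "".join(rng.choices("ATGC", weights=[0.3, 0.3, 0.2, 0.2], k=1_000_000))
    print(f"exact:   {gc_fraction_exact(seq):.4f}")
    print(f"sampled: {gc_fraction_sampled(seq, 10_000, rng):.4f}")
```

The sampled estimate reads only 1% of the positions, yet it typically lands within about one percentage point of the exact value, which is the property the Monte Carlo approach relies on.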

Metrics, distances and similarity measures
One theoretical approach to the problem is to treat the set of all character strings as a metric space, and then define a distance function for every pair of strings. Mathematicians have developed many distance functions for studying the degree of similarity between strings (for a list of them, see here). Each metric or pseudo-metric, through its distance function, defines its own notion of similarity, which in general differs from the notion defined by another distance function. In a pairwise identity test, we can easily calculate a simple metric distance called the “Hamming distance.” In this test, order is important: after the initial characters of A and B have been aligned, the n-th character of string A is compared with the n-th character of string B, and every time the two characters don’t match, the Hamming distance increases by 1. If order doesn’t matter, we can instead compare substrings of the parent strings A and B, and since matching substrings may lie at different positions in the two strings, many different tests are possible. We call these pattern-matching or similarity tests. While there is only one way of testing identity between strings of characters (the pairwise comparison above), there are many ways of measuring similarity. In other words, there are many measures of similarity, depending on the pattern-matching rules we choose. In practice, calculating a given distance function between two genomes can be a demanding job in terms of running time, even for powerful computers.
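The contrast between a position-dependent identity test and a position-independent similarity test can be shown in a few lines of Python. This is an illustrative sketch with made-up example strings: `hamming_distance` compares characters position by position, while `shared_substring` (a hypothetical helper name) only asks whether some length-k piece of A occurs anywhere in B.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def shared_substring(a: str, b: str, k: int) -> bool:
    """True if some length-k substring of a occurs anywhere in b (position-independent)."""
    return any(a[i:i + k] in b for i in range(len(a) - k + 1))


if __name__ == "__main__":
    a = "ACGTACGTTT"
    b = "TTACGTACGT"  # the same block "ACGTACGT", shifted by two positions
    print(hamming_distance(a, b))     # 9: almost every position differs
    print(shared_substring(a, b, 8))  # True: "ACGTACGT" occurs in both
```

The same pair of strings is therefore nearly maximally distant under the identity test, yet clearly similar under a pattern-matching test, which is why a similarity figure is meaningful only together with its matching rule.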

Specifications for a statistical similarity test
Any final result of a complete statistical similarity test (especially if it is a single number) is meaningful only if: 1) the distance function is mathematically defined; 2) the rules for pattern matching and the formulas for calculating the result are explained in detail; 3) it is clearly stated which parts of the input strings are being examined; 4) if computer programs were used to perform the comparison, their source code and algorithms are provided. The explanations below aim to meet the first three conditions. To satisfy the fourth, the source file of the Perl script used for the test is freely downloadable here.

How the genome data was obtained
Genome data for Homo sapiens and Pan troglodytes were freely downloaded from the public bioinformatics archives at UCSC Genome Bioinformatics. The downloaded DNA sequences were in FASTA format. Before running the test, I discarded all symbols in the sequences except A, C, G and T. Most of the discarded symbols were “N,” which marks the rare positions whose base is undetermined (probably because of limitations of the sequencing technology). Other symbols occurred very rarely. As it turned out, deleting these few “N” symbols didn’t affect the overall result very much. Since the chimpanzee genome contains two chromosomes (referred to as chr_02a and chr_02b) that correspond to human chromosome 2, I concatenated them in order to compare them with human chromosome 2 (chr_02).
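The preprocessing step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's script: `filter_fasta_lines` and `read_fasta_acgt` are hypothetical helper names, and the sketch upper-cases every line before filtering because UCSC FASTA files write repeat-masked bases in lowercase; whether the original script did the same is not stated in the text.

```python
import re
from typing import Iterable

# Anything that is not an upper-case A, C, G or T: N, IUPAC ambiguity
# codes, and the trailing newline of each line.
NON_ACGT = re.compile(r"[^ACGT]")


def filter_fasta_lines(lines: Iterable[str]) -> str:
    """Join the sequence lines of a FASTA record, keeping only A, C, G and T.

    Lines are upper-cased first (an assumption of this sketch): lowercase
    repeat-masked bases would otherwise be silently dropped.
    """
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # header line, e.g. ">chr2a"
        parts.append(NON_ACGT.sub("", line.upper()))
    return "".join(parts)


def read_fasta_acgt(path: str) -> str:
    """Read a FASTA file from disk and return its filtered sequence."""
    with open(path) as handle:
        return filter_fasta_lines(handle)


if __name__ == "__main__":
    demo = [">chr2a\n", "ACGTNNacgt\n", "RYAC\n"]
    print(filter_fasta_lines(demo))  # ACGTACGTAC
```

For chromosome 2, the two chimpanzee sequences would then be joined before the comparison, e.g. `read_fasta_acgt("chr2a.fa") + read_fasta_acgt("chr2b.fa")`; these file names are hypothetical placeholders.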

30 Base Pattern Matching (30BPM) similarity test
The 30BPM similarity test is a very simple one: it searches two homologous chromosomes for shared 30-base-long patterns. It is a true pattern-matching test, because it searches for identical patterns in the chromosomes of humans and chimpanzees. The beauty of this test is that it allows patterns to match regardless of their position in the chromosome. This matters because, in homologous chromosomes, identical patterns may be found at quite different positions along the two chromosomes. In fact, this test allows a total scrambling of patterns between homologous chromosomes. Of course, it is generally very difficult to know the functional implications of such scrambling. In particular, the positions of genes might shift, but when non-coding sequence is scrambled, it is doubtful that functionality is preserved. However, from a purely quantitative point of view, in this particular test I don’t need to worry about qualitative issues such as functionality; only statistical issues count.

The algorithm implemented
For each pair of homologous chromosomes A and B, a PRNG (pseudo-random number generator) generates 10,000 uniformly distributed pseudo-random numbers, which specify the offsets (starting points) of 10,000 30-base patterns contained in source chromosome A. The 30BPM test searches target chromosome B for each of these 10,000 DNA substrings of chromosome A. Let F be the number of patterns found (at least once) in chromosome B. The 30BPM similarity is simply defined as F/100, read as a percentage (minimum 0%, maximum 100%). The difference 10,000 - F (minimum 0, maximum 10,000) is the 30BPM distance. Thus the greater the similarity, the smaller the distance. Strictly speaking, the 30BPM distance is not a true metric: the identity axiom (“the distance is zero if and only if A and B are equal”) is relaxed (the distance can be zero even if A and B are different), and the symmetry axiom (“the distance between A and B equals the distance between B and A”) does not always hold. (Relaxing only the identity axiom would give a pseudo-metric; since symmetry can also fail, the 30BPM distance is best described as a dissimilarity measure.) It is easy to see that the 30BPM distance is zero (30BPM similarity = 100%) when the two strings are identical. In an additional test on two random 100-million-base DNA strings, the 30BPM distance was 10,000 (i.e., none of the patterns from A was found in B). Hence I shall refer to the value 10,000 as the “random 30BPM distance.” In other words, the 30BPM similarity between two artificially generated random 100-million-base DNA strings is zero. Of course, when generating these artificial DNA strings I had to take into account the fact that, on average, the probabilities of A, T, G and C occurring in natural DNA are not exactly 0.25 each, but approximately A = 0.3, T = 0.3, G = 0.2 and C = 0.2. In that case, the following formula gives the probability of a single-base match at a given position between two such independent random sequences:

(30*30 + 30*30 + 20*20 + 20*20)/(100*100) = (900+900+400+400)/10000 = 26%

In a supplementary test, in which I performed a pure pairwise comparison between the human and chimp genomes, I obtained a global figure of 25.90%, which agrees closely with the theoretically predicted result above.
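The 30BPM procedure and the random-string baseline can both be sketched in Python. This is a hedged illustration, not a translation of the author's Perl script: `bpm_similarity` and `expected_chance_matches` are hypothetical names, the sketch searches only the sequence as given (no reverse complement; the text does not say whether the original script did otherwise), and the chance-match estimate assumes independent bases with the frequencies quoted above, so a single-base match has probability 0.26 and a full 30-base match at one position has probability 0.26^30.

```python
import random


def bpm_similarity(source: str, target: str, k: int = 30,
                   n_patterns: int = 10_000, seed: int = 0) -> float:
    """Percentage of randomly sampled k-base patterns of `source` found anywhere in `target`.

    Offsets are drawn uniformly with a seeded PRNG; a pattern counts once if it
    occurs at least once in `target`, wherever it occurs. With n_patterns=10,000
    this equals F/100, and n_patterns - F is the corresponding distance.
    """
    if len(source) < k:
        raise ValueError("source is shorter than the pattern length")
    rng = random.Random(seed)
    found = 0
    for _ in range(n_patterns):
        offset = rng.randint(0, len(source) - k)  # inclusive bounds: last valid start
        if source[offset:offset + k] in target:   # plain substring search
            found += 1
    return 100.0 * found / n_patterns


def expected_chance_matches(freqs: dict, k: int, target_length: int) -> float:
    """Expected number of positions where a random k-mer matches an independent
    random target by chance, assuming i.i.d. bases with the given frequencies."""
    p_single = sum(p * p for p in freqs.values())  # 0.26 for A=T=0.3, G=C=0.2
    return target_length * p_single ** k


if __name__ == "__main__":
    freqs = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}
    rng = random.Random(1)
    a = "".join(rng.choices(list(freqs), weights=list(freqs.values()), k=100_000))
    b = "".join(rng.choices(list(freqs), weights=list(freqs.values()), k=100_000))
    print(bpm_similarity(a, a, n_patterns=1_000))  # 100.0: identical strings
    print(bpm_similarity(a, b, n_patterns=1_000))  # independent random strings
    print(expected_chance_matches(freqs, 30, 100_000_000))  # about 2.8e-10
```

The last line shows why the random 30BPM distance comes out at 10,000: even across a 100-million-base target, a 30-base pattern is expected to match by chance far less than once. The substring search with `in` rescans the target for every pattern; for real chromosomes an index or a single pass over the target would be faster, but the counting logic would be the same.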

Results obtained
The following table and graph report the results of the 30BPM similarity test on the whole set of human/chimp chromosomes.

The results obtained are statistically valid: the same test was previously run on a sample of 1,000 random 30-base patterns, and the percentages obtained were almost identical to those of the final test with 10,000 random 30-base patterns. When the human and chimp genomes are compared, the X chromosome shows the highest degree of 30BPM similarity (72.37%), while the Y chromosome shows the lowest (30.29%). On average, the overall 30BPM similarity, when all chromosomes are taken into consideration, is approximately 62%. Here we have the classic case of the glass that some people see as half-full and others as half-empty. Compared with two random strings, which are 0% similar, 62% is a very large value, so nobody would deny that the human and chimp genomes are quite similar! On the other hand, 62% is a very low value compared with the similarity percentages of more than 95% published by evolutionary bioinformatics researchers. Now, I realize that choosing 30-base-long patterns, as I did in my test, may seem somewhat arbitrary, and indeed it is arbitrary to some degree. However, if the two genomes were really 95% similar or more, as is commonly claimed, a 30BPM statistical test should also produce results of about 95%, and it does not.

An analogy from politics: an exit poll
To help readers grasp the significance and potential implications of my test, here is a simple analogy. Consider an election in which 100 million electors are eligible to vote. An exit poll, based on a sample of 10,000 voters, calculates that party X has received 62% of the popular vote. However, at the end of the election, party X declares that it has received more than 95% of the vote! The 30BPM statistical test described above is analogous to the exit poll, while the claims made by evolutionary biologists are analogous to party X’s “95%” claim. The sample of 10,000 patterns is taken from a global population of about 100 million bases (the approximate number of bases on a typical human/chimp chromosome), so the ratio of population to sample is 100,000,000/10,000 = 10,000. The 30BPM exit poll metaphorically says that only 62% voted for Darwin’s party, whereas modern Darwinists claim that over 95% did. Something doesn’t quite add up.
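The exit-poll analogy, and the earlier 1,000-versus-10,000-pattern check, can be quantified with the usual margin-of-error arithmetic. The sketch below assumes each sampled pattern is an independent yes/no draw (a binomial model); overlapping or repeated patterns in real chromosomes make this an approximation. The 1.96 multiplier is the standard normal approximation for a 95% interval, and the finite-population correction shows that sampling 10,000 out of 100 million barely changes the result.

```python
import math


def binomial_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n independent draws."""
    return math.sqrt(p * (1.0 - p) / n)


def finite_population_correction(n: int, population: int) -> float:
    """Factor that shrinks the standard error when sampling without replacement."""
    return math.sqrt((population - n) / (population - 1))


if __name__ == "__main__":
    p = 0.62  # overall 30BPM similarity reported above
    for n in (1_000, 10_000):
        se = binomial_standard_error(p, n)
        print(f"n={n}: standard error = {100 * se:.2f} points, "
              f"95% interval = +/-{196 * se:.2f} points")
    # n=1000: 1.53 points, +/-3.01; n=10000: 0.49 points, +/-0.95
    print(f"{finite_population_correction(10_000, 100_000_000):.5f}")  # 0.99995
```

Under these assumptions, a 62% estimate from 10,000 patterns carries a sampling uncertainty of about one percentage point, so sampling error alone cannot account for a gap between 62% and 95%; what the interval cannot settle is which similarity measure is appropriate.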

I believe that the classic evolutionary comparisons between human and chimp genomes exaggerate the similarities, for at least two reasons: (1) they don’t consider whole chromosomes, but only portions of them (e.g. particular genes); (2) the rules of pattern matching are relaxed in some way (e.g. sometimes two bases are said to match, even when they don’t really match). Now, there is nothing intrinsically wrong with comparisons where (1) and (2) hold. However, any research that is truly worthy of being called “scientific” should openly acknowledge built-in limitations, such as (1) and (2) above. Sadly, this is very rarely done. It is perfectly acceptable to publish partial results that are obtained by relaxing the rules, but one should not publicize them as global and mathematically sound, when in fact, they are nothing of the sort.
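Point (2) can be illustrated with a small hypothetical sketch: a pattern is accepted if it matches some same-length window of the text with at most d substitutions. The rule, the helper name `occurs_with_mismatches` and the example strings are illustrative assumptions, not the matching rules of any particular published comparison; the point is only that the same data yield a higher match count as the rule is relaxed.

```python
def occurs_with_mismatches(pattern: str, text: str, max_mismatches: int) -> bool:
    """True if `pattern` matches some same-length window of `text`
    with at most `max_mismatches` substitutions (no insertions or deletions)."""
    k = len(pattern)
    for start in range(len(text) - k + 1):
        mismatches = 0
        for x, y in zip(pattern, text[start:start + k]):
            if x != y:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window fails; try the next one
        else:
            return True  # the window was scanned without exceeding the limit
    return False


if __name__ == "__main__":
    text = "ACGTTGCAACGT"
    patterns = ["ACGTTG", "ACGATG", "TTTTTT"]
    for d in (0, 1, 2):
        hits = sum(occurs_with_mismatches(p, text, d) for p in patterns)
        print(f"up to {d} mismatches: {hits} of {len(patterns)} patterns match")
    # prints 1, 2 and 2 matches of 3 for d = 0, 1 and 2
```

Any reported percentage therefore depends on the value of d (and on similar choices about gaps), which is why such rules need to be stated alongside the result.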

Conclusion
We have seen that in a genome comparison, the only thing that matters is the degree of similarity. However, once we put the concept of similarity between two text strings on the table, we open a can of worms. Many different measures of the similarity between two strings are possible, and different methods of comparing two genomes can produce wildly different estimates of the similarity between them. The assumptions that drive the methods used also drive the results obtained, as well as their interpretation. A simple layman’s statistical test, such as the 30BPM test, shows that the “95% claim” described above is a highly controversial one. It is worth noting that as more information comparing the two genomes is published, the differences between them will appear more profound than they were originally thought to be. The big question that still remains is: what should one conclude from the similarities and differences between the genomes of humans and chimpanzees? Commonly reported evolutionary statistics that should provide an informative answer to this question may actually obscure the true answer.

Comments

Looks like it did not work. Hope this works.
andrewjg, September 29, 2010, 11:28 AM PDT

AMW@50 Because someone opened an italic tag and never closed it. I'll close it for you. It should be normal now.
andrewjg, September 29, 2010, 11:14 AM PDT

Does anyone know why all the comments after #37 are showing up in italics?
AMW, September 29, 2010, 09:23 AM PDT

If you're looking to repeat the test on different human genomes, look here: http://www.ncbi.nlm.nih.gov/sites/entrez?db=genomeprj&cmd=Retrieve&dopt=Overview&list_uids=9558
CJL2718, September 29, 2010, 08:52 AM PDT

Some characters were lost in my last post: when I compare the value of 30-BPM with 40-BPM and 50-BPM, there should be an equivalence symbol between 30-BPM and 40-BPM and between 40-BPM and 50-BPM.
CharlesJ, September 29, 2010, 06:52 AM PDT

I’m still not sure we understand each other very well, and rereading my older posts I think that is at least partly because my comments were not clear enough. I will try one last time with an example. If you were to run the same analysis on the same dataset (human and chimp genomes), changing only the size of the pattern analyzed (i.e., 40-BPM and 50-BPM), you would get very different results. For a 40-BPM analysis it would be around 53%, and for a 50-BPM analysis around 45% (I mixed those up in one of my last posts; these are the right values). Yet it is still the same type of analysis on the same dataset. Why would we get such different results? Should we not expect that (30-BPM similarity ≡ 40-BPM similarity ≡ 50-BPM similarity)? Actually, all those values are indeed equivalent if you take into account the length of the pattern (and the average mismatch expected in the patterns rejected in your analysis). I can give you the calculation if you want, but for the sake of simplicity I’ll only tell you that you have to divide the % of similarity by 24 in a 30-BPM analysis, by 30 in a 40-BPM analysis and by 35 in a 50-BPM analysis. This will give you: ((30-BPM = 1.58) (40-BPM = 1.58) (50-BPM = 1.58)). 1.58 is a constant in every analysis, and I’ll let you guess what it is. The conclusion is: in order to compare your results with those obtained in the literature, and also to compare results of your own algorithm using different pattern sizes, you can’t directly use the BPM % of similarity.
CharlesJ, September 29, 2010, 06:48 AM PDT

gpuccio: How did anyone achieve this result? Through a simple mutation?
gpuccio, September 29, 2010, 05:37 AM PDT

one more test
bornagain77, September 29, 2010, 05:37 AM PDT

check
bornagain77, September 29, 2010, 05:34 AM PDT

test
bornagain77, September 29, 2010, 04:02 AM PDT

I appreciate the collaborative spirit shown in this thread by all. AMW #35: Your idea of simulating evolution is a good one. Unfortunately, it is not at all easy to implement, because it requires knowledge of evolutionary theory and genomics far beyond mine (and may involve issues that are currently controversial among evolutionary biologists themselves). markf #41: Your idea of testing against two unrelated human genomes is also interesting, and perhaps easier to implement than AMW’s, provided we can find such genomes somewhere. I promise nothing, but if I do other studies along the lines of your suggestions, I will post the results here if they are noteworthy. The good news: it seems that statistical methods of comparison, such as the 30BPM test, can have a place beside the functional approaches and give comparable results. The former are simple and automatic; the latter are complex and interactive (and require extensive knowledge of the field). What is certain is that there is really a lot of work to do in bioinformatics.
niwrad, September 29, 2010, 03:49 AM PDT

#35 Good idea. Or (if the data is available) it might be easier to do what I suggest in #2 and run the test against two unrelated human genomes, or indeed any two individuals of the same species. This would give a benchmark.
markf, September 28, 2010, 11:06 PM PDT

I'm just not seeing how using niwrad's algorithm on two mock genomes is a biased test. I'm also not sure why you keep insisting that I am biased in my approach to the data. Do you think I'm unwilling or incapable of changing my mind on the subject in response to evidence?
AMW, September 28, 2010, 08:05 PM PDT

AMW, excuse me for not more directly engaging in the debate of how to get a more biased genetic similarity reading for evolution or not. I thought I made my case clear by showing that evolutionists have completely biased the previous test in the first place by throwing out dissimilar "Junk DNA" sequences. Dissimilar Junk DNA sequences that are now known not to be 'Junk' at all, but are in fact known to contain high level regulatory information. Information that is of a 'deeper', more crucial, level than the genes themselves are since they regulate the genes,,, I'm sorry that you don't find it interesting that since evolution is shown to be impossible, even to be impossible as proposed by the very foundational mechanisms, and equations, proposed and used by leading evolutionists themselves, as I clearly illustrated in the following post you refused to read,,,, https://uncommondescent.com/intelligent-design/a-simple-statistical-test-for-the-alleged-99-genetic-identity-between-humans-and-chimps/#comment-364779 ,,,then the point of genetic similarity is moot! Moot, absurd, and completely irrelevant since it is shown that the evolution of Humans from some hypothetical ape-like ancestor is completely impossible, even if the most generous assumptions are granted to the Darwinian methods, for ascertaining genetic similarity. Myself, though you may not find that fact interesting, I find it very interesting since it in fact goes to the very heart of the matter being discussed! ,,, That you would refuse to even acknowledge the crushing foundational evidence that is brought to bear against your position, brought to bear by the work of evolutionists themselves, reveals that you don't really seem to care to be objective in this matter as to carefully weigh the evidence so to find the truth of whether you evolved from some ape-like creature or not.
I would think such an important matter would make you a little more careful as to how you looked at the evidence.,,, Further notes: 4 Nails in The Coffin of Darwin Population Genetics Vs. Whale Evolution - Part 1 - Richard Sternberg PhD in Evolutionary Biology http://www.metacafe.com/watch/5263733 Neo-Darwinism Vs. Whale Evolution - Part 2 - Richard Sternberg PhD in Evolutionary Biology http://www.metacafe.com/watch/5263746
bornagain77, September 28, 2010, 06:50 PM PDT

Well, that'll teach me to try my hand at html tags!
AMW, September 28, 2010, 04:50 PM PDT

bornagain77, I don't understand how the test I proposed is biased. If A' and B' are more similar than the human and chimp genomes (using niwrad's algorithm), then that counts as evidence against evolution. If they are about as similar as the human and chimp genomes, then that counts as evidence in its favor. Not proof, certainly, but definitely evidence. I'm offering a refinement to niwrad's current research agenda. That's not uncommon in scholarly disciplines. As for your foundational evidence, I quit reading your links when it became clear that you'll only respond to criticisms of your links with yet more links, ad nauseam. Niwrad clearly has some intent to engage in reasoned discourse, so I'm more than happy to respond to him. Give me something roughly equivalent to the comments he has*, and I'll be more likely to respond to you as well. *To wit: cordial, on point, cohesive and non-redundant.
AMW, September 28, 2010, 04:49 PM PDT

AMW, and exactly why should any similarity evidence be considered more trustworthy than the more foundational evidence I presented here? https://uncommondescent.com/intelligent-design/a-simple-statistical-test-for-the-alleged-99-genetic-identity-between-humans-and-chimps/#comment-364779 Does only evidence for evolution count in your book? ,,,Even though the evidence against evolution is of a more solid basis scientifically? Does it not bother you to be so biased in your weighing of the evidence?
bornagain77, September 28, 2010, 03:37 PM PDT

niwrad, ToE says chimps are probably very close cousins to humans. If that's true it means that at one time we had the same genome. (More correctly, our genomes were in the same pool, but thinking about a single genome is simpler.) So the ToE model says there was this basal genome that some ancient species of ape had. The breeding population with that genome split in (at least) two. The genome of one population acquired one set of mutations, and eventually became modern chimps. The genome of the other acquired an independent set of mutations, and eventually became modern humans. (Obviously, I'm leaving out a lot of subsequent splits along the way.) Since the split didn't happen that long ago (in geologic time) the genomes of chimps and humans should be *very* similar, because they haven't had that much time to acquire independent mutations. You purport to show that the human and chimp genomes aren't all that similar, so ToE is wrong (or has a big hole in it). Charlesj, markf and I have argued that actually your algorithm is biased toward showing low levels of similarity, even between genomes that are very similar. So why don't you do the following? Create a genome; just a long series of A's, T's, G's, and C's. It doesn't have to be meaningful, just a string that's more or less as long as a human or chimp genome. Next, make an exact copy of that genome, so you've got copy A and copy B. Now, put copy A through a series of "mutations." Randomly change some of the letters, insert new ones, delete others, take chunks of letters from one place in the string and put them somewhere else, reverse some of their ordering, etc. All the mutations should be of the type we find in nature, and in their observed proportions. And there should be about as many mutations as ToE suggests there would have been between the time the human/chimp common ancestor split and now. Call this mutated version of A copy A'. 
Next, do the same process on copy B, but make sure you're mutating it in an independent process. Call the resulting copy B'. Finally, use your algorithm described above to compare A' and B' for similarity. Since you're doing a Monte Carlo, you probably want to repeat the process of independent mutation and comparison 1,000+ times. Then you can report back on the results. Here's the rub. If your comparisons come back that A' and B' are, on average, 95% or so similar, you've got some evidence that ToE is wrong, because you've done a simulation of speciation and the genomes are more similar than we find in the real world. But if they come back such that A' and B' are, say, ~65% similar, that's evidence in support of ToE, because your simulation of speciation produces similarities that are comparable to those found in the real world. In short, I like that you're getting your hands dirty with the data. I just want a more rigorous treatment of it before I accept your conclusions.
AMW, September 28, 2010, 03:07 PM PDT

correction: the fact is that 1% of roughly 3 billion is a 30 million DNA base pair difference and yet you can’t even account for the fixation of a single coordinated mutation within the human lineage!! ,,,
bornagain77, September 28, 2010, 02:38 PM PDT

CharlesJ #31 - markf #30: I agree with your remarks and evaluations. However, as I said in my article, the 30BPM test, which I declared to be a similarity test, counts patterns as matching independently of their positions in the target genome. Under these conditions, whatever the quantitative values obtained in the tests, speaking of genomic "identity" is improper, since identity would imply that the matching patterns also occupy the same positions in the source and target genomes. In general, this is not the case in human and chimp genomic comparisons. As a consequence, I think my criticism of the "99% identity" claim as publicized remains valid. I have noted that CharlesJ aptly uses the term "homology" to describe the situation, and I agree with him about this terminology. Homology and similarity are far more appropriate terms than identity in genomics. Not only numbers but also words matter.
niwrad, September 28, 2010, 02:16 PM PDT

CharlesJ, I'm of the completely opposite camp that believes we should throw out all the biased genetic similarity studies that have focused solely on finding similarities in the genomes while throwing out all discrepancies. Primarily I do this because looking solely for similarities presupposes that humans evolved from apes in the first place and will thus end up proving its presupposition in its final analysis. For example Charles look how much of the genome was 'thrown out' here: Chimpanzee? - Richard Buggs PhD. Excerpt: When we do this alignment, we discover that only 2400 million of the human genome’s 3164.7 million ’letters’ align with the chimpanzee genome - that is, 76% of the human genome. Some scientists have argued that the 24% of the human genome that does not line up with the chimpanzee genome is useless ”junk DNA”. http://www.idnet.com.au/files/pdf/Chimpanzee.pdf Charles perhaps you say that they were justified to throw out 24% of the genome?!? If that is the case please let me ask you just how much of the kangaroo genome should we be allowed to throw out to find similarity with humans???: Kangaroo genes close to humans Excerpt: Australia’s kangaroos are genetically similar to humans,,, “There are a few differences, we have a few more of this, a few less of that, but they are the same genes and a lot of them are in the same order,” ,,,”We thought they’d be completely scrambled, but they’re not. There is great chunks of the human genome which is sitting right there in the kangaroo genome,” http://www.reuters.com/article/idUSTRE4AH1P020081118 If you say we shouldn't be allowed to why not? You see Charles you can't presuppose your conclusion into the way in which you gather evidence or it will give you a false positive in your final analysis! But perhaps you say that the 24% is Junk and 'deserved' to be thrown out???? 
If you think so that is a another false assumption for there has now been found regulatory codes in the "Junk DNA" that is of a higher level of information than the genetic code is: This following study, that discovered a 'Second Regulatory Code" on top of the protein coding DNA code: Nature Reports Discovery of “Second Genetic Code” But Misses Intelligent Design Implications - May 2010 Excerpt: Rebutting those who claim that much of our genome is useless, the article reports that "95% of the human genome is alternatively spliced, and that changes in this process accompany many diseases." ,,,, the complexity of this "splicing code" is mind-boggling:,,, A summary of this article also titled “Breaking the Second Genetic Code” in the print edition of Nature summarized this research thusly: “At face value, it all sounds simple: DNA makes RNA, which then makes protein. But the reality is much more complex.,,, So what we’re finding in biology are: # “beautiful” genetic codes that use a biochemical language; # Deeper layers of codes within codes showing an “expanding realm of complexity”; # Information processing systems that are far more complex than previously thought (and we already knew they were complex), including “the appearance of features deeper into introns than previously appreciated” http://www.evolutionnews.org/2010/05/nature_reports_discovery_of_se.html This following paper highlights the regulatory role that the 'second code' has over the primary protein coding DNA code: Researchers Crack ‘Splicing Code,’ Solve a Mystery Underlying Biological Complexity Excerpt: “For example, three neurexin genes can generate over 3,000 genetic messages that help control the wiring of the brain,” says Frey. “Previously, researchers couldn’t predict how the genetic messages would be rearranged, or spliced, within a living cell,” Frey said. 
“The splicing code that we discovered has been successfully used to predict how thousands of genetic messages are rearranged differently in many different tissues.” http://www.sciencedaily.com/releases/2010/05/100505133252.htm Thus, Charles, we have high-level function arising from 'Junk DNA' regions that in all probability have not been stringently accounted for in previous 'similarity' studies by evolutionists, simply because they did not match, much like how you are trying to arrive at an artificially high percentage of similarity at the present time!!! As well, Charles, even if we assume that the rate of mutations to DNA was not overwhelmingly detrimental, the time it would take to fix a 'coordinated' beneficial mutation in the human lineage is 216 million years, and this number (216 m.y.) is taken directly from a paper written by a Darwinist using the equations of the 'modern synthesis'!!! Waiting Longer for Two Mutations - Michael J. Behe Excerpt: Citing malaria literature sources (White 2004) I had noted that the de novo appearance of chloroquine resistance in Plasmodium falciparum was an event of probability of 1 in 10^20. I then wrote that ‘‘for humans to achieve a mutation like this by chance, we would have to wait 100 million times 10 million years’’ (Behe 2007) (because that is the extrapolated time that it would take to produce 10^20 humans). Durrett and Schmidt (2008, p. 1507) retort that my number ‘‘is 5 million times larger than the calculation we have just given’’ using their model (which nonetheless "using their model" gives a prohibitively long waiting time of 216 million years). Their criticism compares apples to oranges. My figure of 10^20 is an empirical statistic from the literature; it is not, as their calculation is, a theoretical estimate from a population genetics model. http://www.discovery.org/a/9461 Thus, Charles, can you see the problem?
Even if we presupposed evolution to be true and lined up the genomes as best we could, and threw out all the mismatches, and arrived at the 99% number that evolutionists so desperately want us to arrive at, the fact is that 1% of roughly 3 billion is a 30 million DNA base pair difference, and yet you can't even account for the fixation of a single coordinated mutation within the human lineage!! This is more than a slight problem for evolutionists. As well, Charles, DNA does not even encode for body plan morphogenesis in the first place, so the point of DNA similarity or dissimilarity, from a strict scientific perspective, is moot, since mutations to DNA are not even the right tool for the job of constructing a new animal. In fact, mutations to the DNA can in all honesty be considered the bottom rung of the ladder as far as the information hierarchy of the cell is concerned: Stephen Meyer on Craig Venter, Complexity Of The Cell & Layered Information http://www.metacafe.com/watch/4798685 Splicing Together the Case for Design, Part 2 (of 2) - Fazale Rana - June 2010 Excerpt: Remarkably, the genetic code appears to be highly optimized, further indicating design. Equally astounding is the fact that other codes, such as the histone binding code, transcription factor binding code, the splicing code, and the RNA secondary structure code, overlap the genetic code. Each of these codes plays a special role in gene expression, but they also must work together in a coherent integrated fashion. The existence of multiple overlapping codes also implies the work of a Creator. It would take superior reasoning power to structure the system in such a way that it can simultaneously harbor codes working in conjunction instead of interfering with each other. As I have written elsewhere, the genetic code is in fact optimized to harbor overlapping codes, further evincing the work of a Mind.
http://www.reasons.org/splicing-together-case-design-part-2-2 As well, Charles, I don't know what propaganda you have been fed on the fossil record of humans and apes, but the fossil record is certainly not the neat little progression of apes evolving into man that we see popularly depicted in those cartoons: Evolution of the Genus Homo – Annual Review of Earth and Planetary Sciences – Tattersall, Schwartz, May 2009 Excerpt: “Definition of the genus Homo is almost as fraught as the definition of Homo sapiens. We look at the evidence for “early Homo,” finding little morphological basis for extending our genus to any of the 2.5–1.6-myr-old fossil forms assigned to “early Homo” or Homo habilis/rudolfensis.” http://www.annualreviews.org/doi/abs/10.1146/annurev.earth.031208.100202 This might interest you, Charles: Shoddy Engineering or Intelligent Design? Case of the Mouse's Eye - April 2009 Excerpt: -- The (entire) nuclear genome is thus transformed into an optical device that is designed to assist in the capturing of photons. This chromatin-based convex (focusing) lens is so well constructed that it still works when lattices of rod cells are made to be disordered. Normal cell nuclei actually scatter light. -- So the next time someone tells you that it “strains credulity” to think that more than a few pieces of “junk DNA” could be functional in the cell - remind them of the rod cell nuclei of the humble mouse. http://www.evolutionnews.org/2009/04/shoddy_engineering_or_intellig020011.html bornagain77_{September 28, 2010
02:05 PM PDT}

I find it easier to understand when I work directly with the percentage of mismatches. In your case it would be 38% (have you taken into account the relative size of each chromosome compared to the whole genome when computing the average?). For most of those 38% mismatches, the actual number of bases that do not match will be 1 (I can explain why in greater detail if you want). So, by simply translating the percentage of patterns that do not have a perfect match directly into the percentage of mismatches between the 2 genomes, you would be overestimating the percentage of mismatches approximately 30-fold. In other words, your calculations do not make the distinction between a complete mismatch (0 bases out of 30 matching; a deletion) and a partial mismatch (1, 2, 3 or more mismatched bases; SNPs). The 98.5% figure comes from SNPs (based on the 2005 chimp genome paper), and like I said previously, most of the mismatches will involve only 1 base in each pattern that scores a mismatch. At some point, this has to be acknowledged in your calculations. The easiest way would be to divide the percentage of patterns that do not score a perfect match by 30, since that would be true in most cases. That will give you a rough estimate of the percentage of mismatches between the human and chimp genomes (in this case, 38% / 30 = ~1.3%).CharlesJ_{September 28, 2010
12:23 PM PDT}
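The divide-by-30 shortcut above can be compared with an exact inversion. This is a minimal sketch, not either commenter's method: it assumes (our assumption, not stated in the thread) that every base differs independently with the same probability p, so a 30-base pattern matches perfectly with probability (1 - p)^30. The function names are hypothetical.

```python
def naive_divergence(mismatch_fraction: float, k: int = 30) -> float:
    """Divide-by-k shortcut: assume each failed k-base pattern hides exactly one differing base."""
    return mismatch_fraction / k


def exact_divergence(mismatch_fraction: float, k: int = 30) -> float:
    """Invert P(perfect match) = (1 - p)**k, assuming independent per-base differences."""
    match_fraction = 1.0 - mismatch_fraction
    return 1.0 - match_fraction ** (1.0 / k)


if __name__ == "__main__":
    f = 0.38  # fraction of 30-base patterns without a perfect match, as quoted in the thread
    print(f"divide-by-30 estimate: {naive_divergence(f):.2%}")
    print(f"exact inversion:       {exact_divergence(f):.2%}")
```

With the 38% figure, the shortcut gives about 1.27% and the exact inversion about 1.58%. The shortcut comes out lower because, under this model, some failed patterns contain two or more differing bases, which the one-base-per-failure assumption ignores.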

#29 Forgive me for butting in, but why did you not complete this approach for the 2% case? Imagine there was a mismatch every 50 bases. Now for a 30-base sample we get 20 matches followed by 30 mismatches, giving 20/30 = 0.67, less than 1.63. So your figure suggests somewhere between 1% and 2%, which is exactly what the literature and Charles are suggesting.markf_{September 28, 2010
11:39 AM PDT}
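The ruler model used in these comments can be checked by brute force. A minimal sketch, assuming (as the comments do) that mismatches are spaced exactly `spacing` bases apart; it slides a 30-base window across one period and counts clean windows versus windows that cover a mismatch. `ruler_counts` is a hypothetical helper name.

```python
def ruler_counts(spacing: int, k: int = 30) -> tuple[int, int]:
    """Count clean and mismatch-containing k-base windows in the 'ruler' model.

    Mismatches sit at every position pos with pos % spacing == spacing - 1,
    i.e. exactly one every `spacing` bases. The layout repeats with period
    `spacing`, so sliding the window start across one period covers every case.
    """
    clean = hit = 0
    for start in range(spacing):
        window = range(start, start + k)
        if any(pos % spacing == spacing - 1 for pos in window):
            hit += 1
        else:
            clean += 1
    return clean, hit


if __name__ == "__main__":
    for spacing in (100, 50):  # one mismatch per 100 bases (1%) or per 50 bases (2%)
        clean, hit = ruler_counts(spacing)
        print(f"1 mismatch every {spacing} bases: {clean}/{hit} = {clean / hit:.2f}")
```

This reproduces the 70/30 and 20/30 splits discussed in the thread; the observed ratio of 1.63 falls between the two.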

CharlesJ #26,27 Thank you for your involvement in the probability calculations. Let’s look at the problem from another point of view. My test gives nearly 62% 30BMP-similarity. This means that, on average, in 10000 searched patterns we have 6200 matches and 3800 mismatches. The ratio of matches to mismatches is 6200/3800 = 1.63. In your hypothesis of two genomes that differ by only 1%, on average there is a mismatch every 100 bases. To simplify the scenario, let’s imagine that these mismatches are uniformly distributed along the coupled genomes A and B, like the tick marks on a ruler. Now let’s consider a random 30-base pattern in A. In every range of 100 bases we have 70 successive starting positions where the pattern contains no mismatch, followed by 30 starting positions where it contains one. So the ratio of matches to mismatches is roughly 70/30 = 2.3. Since 1.63 is less than 2.3, I wouldn’t say that the 30BMP test agrees well with a 1% difference, but rather with a larger difference.niwrad_{September 28, 2010
09:35 AM PDT}
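The same ruler model can also be run backwards: instead of checking whether 1% fits, ask which spacing reproduces the observed 6200/3800 ratio. A minimal sketch under the comment's uniform-spacing assumption; `ruler_spacing` is a hypothetical name. With spacing d of at least 30 bases, a fraction (d - 30)/d of windows are clean, so the ratio is (d - 30)/30 and d = 30(1 + ratio).

```python
def ruler_spacing(matches: int, mismatches: int, k: int = 30) -> float:
    """Mismatch spacing d that reproduces an observed match/mismatch ratio.

    Under the uniform 'ruler' model with d >= k, a fraction (d - k) / d of the
    k-base windows are clean and k / d contain a mismatch, so
    ratio = (d - k) / k and therefore d = k * (1 + ratio).
    Since the ratio is non-negative, the result always satisfies d >= k.
    """
    ratio = matches / mismatches
    return k * (1.0 + ratio)


if __name__ == "__main__":
    d = ruler_spacing(6200, 3800)  # counts quoted in the comment above
    print(f"one mismatch every {d:.1f} bases -> {1 / d:.2%} divergence")
```

With the quoted counts this gives roughly one mismatch every 79 bases, about 1.27% divergence. That equals 38% divided by 30, since in this model no 30-base window can hold more than one mismatch.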

Todd Wood has a short response on his blog.Harfen_{September 28, 2010
06:12 AM PDT}

Actually, even using your 3/10 probability, we should expect ~70% perfect matches with a 1% difference between the two genomes. This is not very far from your results. Thus, based on your study, there is approximately 99% homology between the human and chimp genomes.CharlesJ_{September 28, 2010
05:55 AM PDT}

“If we have a 1% difference between the genomes in average we find a mismatch in a 100 base sequence, then 3 mismatches in a series of 10 30 base sequences. Hence the probability of finding a mismatch in a 30 base sequence is 3/10.” With that kind of logic, the probability of finding a mismatch in a 100-base sequence would be 10/10? It’s a bit like saying that since the probability of getting a given number on a die is 1/6, the probability of getting that number in 6 throws is 6/6. A result on a die does not get more likely every time you throw it; it’s always 1/6. So the probability of getting a specific number at least once in 6 throws is approximately 67%. But I did make a mistake in my previous post; it should have been: the probability of finding "at least" 1 mismatch is approximately 1 in 4. If the differences between the human and chimp genomes were 66% like you claim, that would mean on average 1 different base every 3 bases; a 30-base pattern with a perfect match would be very rare. You probably would not need to do such complex calculations to demonstrate that the 99% claim is erroneous either; a simple look at 2 aligned sequences would do the job. Another way of explaining what I mean is with an example. Let’s say that during your calculations, you have a 30-base pattern that gives a mismatch (which means that it is not a 100% perfect match). If you took that pattern and did a BLAST search with it, you would probably find that there is only one mismatched base in the 30-base pattern, yet the whole pattern is counted as a mismatch. For a 1-base difference, you count all 30 bases as a mismatch, causing an overestimation of the mismatches. Like I said, if we consider there is a 1% difference between the 2 genomes, then the probability of finding at least 1 mismatch in a 30-base pattern is roughly 1 in 4 (so a perfect match about 75% of the time).
The probability of finding at least 1 mismatch in a 30-base pattern, assuming a 2% difference between the 2 genomes, is around 45%. Your results are pretty much in between (66% average), which is consistent with ~98.5% homology between our genome and the chimp genome. If you don’t believe me, you should try a simple test. Take a small chromosome (to save calculation time) and do a 40-BPM and a 50-BPM analysis. The probability of finding at least one mismatch in a 40-base pattern is greater than for a 30-base pattern, so you should find a lower level of similarity. If you try it on chromosome 22, my prediction is that the 40-BPM similarity will be around 55% and the 50-BPM similarity around 47%.CharlesJ_{September 28, 2010
05:37 AM PDT}
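The "at least once" probabilities and the 40- and 50-base predictions in the comment above can be computed directly. A minimal sketch, assuming (our assumption) independent, uniform per-base differences, so that P(perfect k-base match) = (1 - p)^k and a similarity measured at one pattern length scales to another as a power. Both function names are hypothetical.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one 'hit' in n independent trials with per-trial probability p."""
    return 1.0 - (1.0 - p) ** n


def predicted_similarity(observed: float, k_obs: int, k_new: int) -> float:
    """Scale a k-base perfect-match rate to another pattern length.

    Assumes independent per-base differences, so P(match) = (1 - p)**k and
    P_new = P_obs ** (k_new / k_obs).
    """
    return observed ** (k_new / k_obs)


if __name__ == "__main__":
    print(f"die, at least one six in 6 throws: {p_at_least_one(1 / 6, 6):.1%}")
    for p in (0.01, 0.02):
        print(f"{p:.0%} divergence, 30-base pattern with >= 1 mismatch: {p_at_least_one(p, 30):.1%}")
    for k in (40, 50):
        print(f"predicted {k}-base similarity from 62% at 30 bases: {predicted_similarity(0.62, 30, k):.1%}")
```

This gives about 66.5% for the die, 26.0% and 45.5% for 1% and 2% divergence, and roughly 53% and 45% for 40- and 50-base patterns, close to the 55% and 47% predicted above.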

Robert Byers #23
Creationism should welcome ape likeness as simply showing a creator with a simple program for life. 99% is fine and makes more sense.
The problem is that evolutionism uses the 99% myth to counter creationism and ID. The smaller the DNA differences between humans and chimps, the more believable unguided macroevolution becomes, or so evolutionists think. My test doesn’t disprove creationism and ID; rather the reverse. Humans and chimps are beings with extremely different potentialities right from the beginning. At all levels, their similarities show that their designs share some common templates, and their differences show that they didn’t arise by random evolution from each other.niwrad_{September 28, 2010
03:11 AM PDT}

Thanks to all commenters for the objections/suggestions, which will be useful to me if I continue my tests. AMW #3
Let’s say you’ve got a string of 30 T’s on chromosome A, and you’re looking for a matching string on chromosome B. Suppose there is no string of 30 T’s on Chromosome A, but there is a string of 29 T’s followed by a G. That is, a single point mutation could account for the difference between strings. Does your metric count that as a match, or as no match?
No match: my test only accepts perfect pattern matching between two 30-base sequences. If we begin to relax the rules, then we may arrive at the 99% identity, which is exactly what is controversial. CharlesJ #19
If we say that we have a 1% difference between the human and chimp genomes, isn’t the probability of finding a mismatch in a 30 base sequence roughly 1 in 4?
If we have a 1% difference between the genomes in average we find a mismatch in a 100 base sequence, then 3 mismatches in a series of 10 30 base sequences. Hence the probability of finding a mismatch in a 30 base sequence is 3/10.niwrad_{September 28, 2010
02:43 AM PDT}
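The 3/10 figure above is the expected number of differing bases per 30-base window, whereas the "1 in 4" it was answering is the probability that a window has at least one; the two differ because some windows contain two or more differences. A Monte Carlo sketch, assuming (our assumption) independent per-base differences at a uniform 1% rate; `simulate_windows` and its defaults are hypothetical.

```python
import random


def simulate_windows(p: float = 0.01, k: int = 30, trials: int = 50_000, seed: int = 1):
    """Monte Carlo comparison of 'expected differences' vs. 'P(at least one)'.

    Each trial draws k bases, each differing independently with probability p
    (an assumption; real differences are not independent). Returns the mean
    number of differing bases per window and the fraction of windows with at
    least one difference.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    total_diffs = 0
    windows_hit = 0
    for _ in range(trials):
        diffs = sum(rng.random() < p for _ in range(k))
        total_diffs += diffs
        if diffs:
            windows_hit += 1
    return total_diffs / trials, windows_hit / trials


if __name__ == "__main__":
    mean_diffs, frac_hit = simulate_windows()
    print(f"mean differing bases per 30-base window: {mean_diffs:.3f}")  # expected value is 0.30
    print(f"windows with at least one difference:    {frac_hit:.3f}")   # exact value is 1 - 0.99**30
```

With these defaults the mean lands near 0.30 while the fraction of windows with any difference lands near 0.26. Under perfectly even ruler spacing every window holds at most one difference, so the two numbers would coincide at 0.30.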

I know the author of the thread said it didn't matter to him how close we are to chimps by DNA, yet it does seem to matter to ID people and evolutionists. To this biblical creationist, we are so much like apes that the small differences in our bodies do not in any way suggest a different origin. Therefore, since the Bible says Adam and Eve were not born but instantly created, it can only be that there is a general common blueprint for life and we were simply given the best bodies one could pick in the existing blueprint. Everything has eyes, ears, legs, lungs, etc. Therefore the sameness must be from a simple program in nature. Therefore we should have the same DNA if we have the same parts. I read recently that bats and whales had the same DNA score for radar. Right on. DNA is not a trail of heritage but merely a parts department, and the connections are a part. Like form equals like DNA, yet no actual biological relation. I see this in marsupials and placentals. They surely are the same creatures yet have different DNA. Therefore the marsupial change brought a score change that hides the actual biological relationship. Creationism should welcome ape likeness as simply showing a creator with a simple program for life. 99% is fine and makes more sense.Robert Byers_{September 28, 2010
12:40 AM PDT}

