A simple statistical test for the alleged “99% genetic identity” between humans and chimps

September 27, 2010

Genomics, Informatics, Intelligent Design


Typical figures published in the scientific literature for the percentage similarity between the genomes of human beings (Homo sapiens) and chimpanzees (Pan troglodytes) range from 95% to 99%. However, in press releases intended for popular consumption, evolutionary biologists frequently claim that human and chimpanzee genomes are 99% identical. Skeptics of neo-Darwinian evolution have repeatedly punctured this “99% myth,” but unfortunately it seems to have gained widespread credence, because it is continually propagated by evolutionists! For instance, one often encounters statements like these in the literature:

“Because the chimpanzee lies at such a short evolutionary distance with respect to human, nearly all of the bases are identical by descent and sequences can be readily aligned” (The Chimpanzee Sequencing and Analysis Consortium, “Initial sequence of the chimpanzee genome and comparison with the human genome,” Nature, Vol. 437, 1 September 2005, doi:10.1038/nature04072).

“The consortium [National Human Genome Research Institute] found that the chimp and human genomes are very similar and encode very similar proteins. The DNA sequence that can be directly compared between the two genomes is almost 99 percent identical.” (here)

“The genetic codes of chimps and humans are 99 percent identical.” (here)

Supporters of the neo-Darwinian theory of evolution have a strong ideological motivation for minimizing the differences between humans and chimps, as they claim that these two species evolved from a common ancestor, as a result of random mutations filtered by natural selection. Now, I don’t personally believe that humans and chimps share a common ancestry, for a host of reasons that would take me too long to explain in this post. Nor do I attach much significance to the magnitude of the genetic differences between these two species, per se, because in my opinion, the fundamental differences between these creatures lie elsewhere. However, since the genomic data is now available for free on the Internet, I decided to perform some sleuthing of my own, and check out the wildly exaggerated claims that are often made regarding the percentage similarities between human and chimp genomes. Here is what I discovered.

Interactive functional comparison methods
Usually, molecular biologists compare genomes on a functional basis. For example, they may search for similar genes in the genomes of human beings and chimpanzees, and try to identify the bases (nucleotides) where they differ or match. Many different technologies have been developed to investigate genomes. One of these is the BLAST (Basic Local Alignment Search Tool) software (see the NCBI Web site for more details). BLAST is an extremely powerful computer-aided tool: it can locate regions of local similarity among sequences by searching an entire database of genomes. Alignment methods (such as those implemented by BLAST and other tools) allow geneticists to search interactively for common local patterns at different positions. However, this interactive work has its limits, as it can compare only portions of different genomes. Additionally, some critics have pointed out that these tools are susceptible to slip-ups (see here). Given the amount of data involved (on the order of gigabytes), a global comparison of two genomes is a very demanding job, which human beings cannot complete interactively in a short time, even with the aid of tools such as BLAST. At present, only fully automated computer programs can perform such a task on entire genomes. However, developing an automated program capable of a complete functional comparison between the human and chimpanzee genomes is practically impossible, for the simple reason that the functional architecture of these genomes is not yet fully understood.
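For readers unfamiliar with what “local similarity” means in practice, here is a minimal sketch of the classic Smith-Waterman local-alignment score, the exact dynamic-programming method that BLAST approximates with faster heuristics. This is not BLAST itself, and the scoring values (match +2, mismatch -1, gap -2) are illustrative assumptions, not BLAST defaults.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b, using a linear gap penalty."""
    # Keep only one row of the dynamic-programming matrix at a time.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned against a gap in b
            left = curr[j - 1] + gap    # b[j-1] aligned against a gap in a
            # Flooring at zero lets an alignment start anywhere: that is what makes it "local".
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTTTACGTAAAA", "GGACGTGG"))  # the shared "ACGT" scores 4 * 2 = 8
```

The quadratic cost of this exact method is why whole-genome work relies on heuristic or indexed tools rather than running it on entire chromosomes.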

Automatic statistical comparison methods
From a purely informatics and statistical point of view, DNA sequences are simply strings of symbols or characters. It is therefore also possible to develop tests that compare genomes as unstructured sequences of characters, without taking into consideration genes, pseudogenes, coding and non-coding regions, vertical and horizontal gene transfer, open reading frames (ORFs), or any other functional concepts. The characters most commonly present in DNA sequences are A, C, G and T. Other, less important characters are used mainly to indicate ambiguity about the identity of certain bases in the sequences. The comparison I performed was completely different from those usually performed by geneticists, because it was purely statistical in nature. In a sense, it could be described as an application of the well-known Monte Carlo method, which is frequently used when the data or processes involved are huge and one wants to reduce computer running time. In short, it involves examining a random sample instead of the whole space under investigation. Only a small portion of the data population is actually examined; nevertheless, this portion is statistically large enough to reveal the characteristics of the whole.
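The sampling idea can be illustrated in its simplest form: estimating what fraction of a large population has some property by inspecting only a random sample. The population, property and sample size below are purely illustrative assumptions; the point is that the cost depends on the number of samples, not on the size of the population.

```python
import random


def monte_carlo_fraction(population, predicate, n_samples: int, seed: int = 0) -> float:
    """Estimate the fraction of `population` satisfying `predicate` from a random sample (with replacement)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = sum(1 for _ in range(n_samples) if predicate(rng.choice(population)))
    return hits / n_samples


def is_multiple_of_4(x: int) -> bool:
    return x % 4 == 0  # exactly 25% of the population below has this property


population = range(10_000_000)  # stand-in for a large data set
estimate = monte_carlo_fraction(population, is_multiple_of_4, n_samples=10_000)
print(f"estimated fraction: {estimate:.4f} (exact value: 0.2500)")
```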

Metrics, distances and similarity measures
One theoretical approach to the problem would be to treat the set of all strings of characters as a metric space, and then define a distance function for all pairs of strings. Mathematicians have developed many distance functions for studying the degree of similarity between strings (for a list of them, see here). Each metric or pseudo-metric, with its own distance function, defines its own notion of similarity, which generally differs from the similarity defined by another. In a pairwise identity test, we can easily calculate a simple metric distance called the “Hamming distance.” In this test the order matters, because the n-th character of string A is compared with the n-th character of string B, after the initial characters of A and B have been aligned (the Hamming distance is defined only for strings of equal length). For each position where the two characters don’t match, the Hamming distance increases by 1. If the order doesn’t matter, we can instead compare substrings of the parent strings A and B, and since matching substrings may lie at different positions in the two strings, many different tests are possible. We call these pattern-matching or similarity tests. While there is only one way of testing identity between strings of characters (the pairwise comparison above), there are many ways of measuring similarity. In other words, there are many measures of similarity, depending on the pattern-matching rules we choose. In practice, calculating a given distance function between two genomes can be a demanding job in terms of running time, even for powerful computers.
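A minimal sketch of the Hamming distance described above, assuming the two strings have already been aligned at their first characters and have the same length:

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires strings of equal length")
    # Compare the n-th character of a with the n-th character of b.
    return sum(1 for x, y in zip(a, b) if x != y)


def hamming_identity(a: str, b: str) -> float:
    """Fraction of aligned positions holding the same character."""
    return 1 - hamming_distance(a, b) / len(a)


print(hamming_distance("ACGTACGT", "ACGTTCGA"))  # 2 (positions 5 and 8 differ)
print(hamming_identity("ACGTACGT", "ACGTTCGA"))  # 0.75
```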

Specifications for a statistical similarity test
Any final result of a complete statistical similarity test (especially if it is a single number) is meaningful only if: 1) the distance function is mathematically defined; 2) the rules for pattern matching and the formulas for calculating the result are explained in detail; 3) it is clearly stated which parts of the input strings are being examined; 4) if computer programs were used to perform the comparison, the source code and algorithms are provided. The explanations below are intended to meet the first three conditions. To satisfy the fourth, the source file of the Perl script used for the test is freely downloadable here.

How the genome data was obtained
Genome data for Homo sapiens and Pan troglodytes were freely downloaded from the public bioinformatics archives at UCSC Genome Bioinformatics. The downloaded DNA sequences were in FASTA format. Before running the test, I discarded all symbols in the sequences except A, C, G and T. Most of the discarded symbols were “N,” which marks positions whose base is undefined (probably due to the limits of the sequencing technology). Other symbols occurred very rarely. As it turned out, deleting these “N” symbols didn’t affect the overall result very much. Since the chimp genome contains two chromosomes (referred to as chr_02a and chr_02b) corresponding to human chromosome #2, I concatenated them in order to compare them with human chromosome #2 (chr_02).
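The preprocessing just described (drop everything except A, C, G and T, then join chimp chr_02a and chr_02b end to end) might look like the sketch below. It is an illustration, not the author's Perl script. One assumption is flagged explicitly: UCSC genome downloads commonly use lowercase letters to mark repeat-masked bases, and the post does not say how letter case was handled, so this sketch upper-cases before filtering. The file names are hypothetical.

```python
from typing import Iterable

KEEP = set("ACGT")


def parse_fasta_acgt(lines: Iterable[str]) -> str:
    """Concatenate the sequence lines of a FASTA record, keeping only A, C, G and T."""
    parts = []
    for line in lines:
        if line.startswith(">"):
            continue  # header line, e.g. ">chr2a"
        # Assumption: lowercase (soft-masked) bases are kept by upper-casing them.
        parts.append("".join(ch for ch in line.strip().upper() if ch in KEEP))
    return "".join(parts)


def load_chromosome(paths: Iterable[str]) -> str:
    """Read one or more FASTA files and join their cleaned sequences end to end."""
    pieces = []
    for path in paths:
        with open(path) as handle:
            pieces.append(parse_fasta_acgt(handle))
    return "".join(pieces)


# Hypothetical file names; the actual downloads may be named differently.
# human_chr2 = load_chromosome(["human_chr2.fa"])
# chimp_chr2 = load_chromosome(["chimp_chr2a.fa", "chimp_chr2b.fa"])
```

Joining the two chimp pieces creates 29 artificial 30-base windows that straddle the junction, a negligible number relative to a chromosome of this size.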

30 Base Pattern Matching (30BPM) similarity test
The 30BPM similarity test is a very simple one: it searches for shared 30-base-long patterns in two homologous chromosomes. It is a true pattern-matching test, because it searches for identical patterns in the chromosomes of humans and chimpanzees. The beauty of this test is that it allows patterns to match independently of their position in the chromosome. This matters because identical patterns may be found at quite different positions along two homologous chromosomes. In fact, this test tolerates a total scrambling of patterns between homologous chromosomes. Of course, it is generally very difficult to know what the functional implications of such scrambling are. In particular, the positions of genes might shift, but when non-coding sequence is scrambled, it is doubtful that functionality is preserved. However, from a purely quantitative point of view, in this particular test I don’t need to worry about qualitative issues such as functionality; only statistical issues count.

The algorithm implemented
For each pair of homologous chromosomes A and B, a PRNG (pseudo-random number generator) generates 10,000 uniformly distributed pseudo-random numbers, which specify the offsets, or starting points, of 10,000 30-base patterns contained in the source chromosome A. The 30BPM test then searches for each of these 10,000 DNA substrings of chromosome A in the target chromosome B. Let F be the number of patterns found (at least once) in chromosome B. The 30BPM similarity is simply defined as F/100, i.e. the percentage of patterns found (minimum 0%, maximum 100%). The difference 10,000 − F (minimum 0, maximum 10,000) is the 30BPM distance. Thus the greater the similarity, the smaller the distance. Strictly speaking, this 30BPM space is only a pseudo-metric: the identity axiom of a true metric space (“the distance is zero if and only if A and B are equal”) is relaxed, since the distance can be zero even when A and B differ, and the symmetry axiom (“the distance between A and B equals the distance between B and A”) does not hold in some cases. It is easy to see that the 30BPM distance is zero (30BPM similarity = 100%) if the two strings are identical. In an additional test, which I performed on two random 100-million-base DNA strings, the 30BPM distance was 10,000 (i.e. none of the patterns from A was found in B). Hence I shall refer to the value 10,000 as the “random 30BPM distance.” In other words, the 30BPM similarity between two artificially generated random 100-million-base DNA strings is zero. Of course, when generating these artificial DNA strings I had to take into account the fact that, on average, the probabilities of A, T, G and C occurring in natural DNA are not exactly 0.25 each, but approximately A = 0.3, T = 0.3, G = 0.2 and C = 0.2. In that case, the following formula gives the probability of a single-base match between two such random DNA sequences:

(30*30 + 30*30 + 20*20 + 20*20)/(100*100) = (900+900+400+400)/10000 = 26%

In a supplementary test, in which I performed a pure pairwise comparison between the human and chimp genomes, I obtained a global figure of 25.90%, which closely matches the theoretically predicted result above.
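The 30BPM procedure and its random baseline can be sketched in a few lines. This is a hypothetical reimplementation from the description above, not the author's Perl script (whose source is linked in the post). It indexes every 30-base substring of B in a Python set, which is fine for the small demo strings here but would need far more memory than a typical workstation has for a 100-million-base chromosome; a real run would need a more compact index, such as 2-bit-packed k-mers or a suffix array.

```python
import random

BASE_FREQS = {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2}  # approximate frequencies quoted in the text


def random_dna(n: int, rng: random.Random) -> str:
    """Random DNA string whose bases follow BASE_FREQS."""
    bases = list(BASE_FREQS)
    return "".join(rng.choices(bases, weights=[BASE_FREQS[b] for b in bases], k=n))


def bpm_similarity(a: str, b: str, n_patterns: int = 10_000, k: int = 30, seed: int = 0) -> float:
    """Percentage of randomly placed k-base patterns of a found anywhere in b."""
    if len(a) < k:
        raise ValueError("source string is shorter than the pattern length")
    rng = random.Random(seed)
    # Index every k-base substring of b once, so a lookup ignores position entirely.
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    found = 0  # F in the text; the 30BPM distance is n_patterns - found
    for _ in range(n_patterns):
        start = rng.randrange(len(a) - k + 1)  # uniformly distributed offset in a
        if a[start:start + k] in b_kmers:
            found += 1
    return 100.0 * found / n_patterns  # equals F / 100 when n_patterns == 10,000


def per_base_chance_match() -> float:
    """P(two independent random bases are identical): the sum of squared base frequencies."""
    return sum(p * p for p in BASE_FREQS.values())


rng = random.Random(1)
a = random_dna(50_000, rng)
b = random_dna(50_000, rng)
print("30BPM similarity, A vs A:", bpm_similarity(a, a))
print("30BPM similarity, random A vs random B:", bpm_similarity(a, b))
print("per-base chance match:", per_base_chance_match())
print("observed per-base identity of A and B:", sum(x == y for x, y in zip(a, b)) / len(a))
print("chance that one 30-base window equals another:", per_base_chance_match() ** 30)
```

The last line shows why two random strings score zero: 0.26 raised to the 30th power is on the order of 10^-18, so even 100 million candidate positions leave a negligible chance of an accidental 30-base match. Passing the arguments in the opposite order generally gives a different percentage, which is the asymmetry noted above.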

Results obtained
The following table and graph show the results of the 30BPM similarity test on the whole set of human/chimp chromosomes.

The results obtained are statistically valid. The same test had previously been run on a sample of 1,000 random 30-base patterns, and the percentages obtained were almost identical to those of the final test with 10,000 random 30-base patterns. When human and chimp genomes are compared, the X chromosome shows the highest 30BPM similarity (72.37%), while the Y chromosome shows the lowest (30.29%). Averaged over all chromosomes, the overall 30BPM similarity is approximately 62%. Here we have the classic case of the glass that some people see as half-full and others as half-empty. Compared with two random strings, which are 0% similar, 62% is a very large value, so nobody would deny that human and chimp genomes are quite similar! On the other hand, 62% is a very low value compared with the similarity percentages of more than 95% published by evolutionary bioinformatics researchers. Now, I realize that it may seem somewhat arbitrary to choose 30-base-long patterns, as I did in my test, and indeed it is arbitrary to some degree. However, if the two genomes were really 95% similar or more, as is commonly claimed, a 30BPM statistical test should also produce results of around 95%, and it does not.
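How a per-base identity figure translates into an expected 30BPM score depends on the kinds of differences involved. Under a deliberately simple model, which is an assumption and not part of the post, the only differences are independent single-base substitutions, with each base matching with probability q. A 30-base pattern then survives unchanged with probability q^30. Insertions, deletions and rearrangements would lower this further, while patterns that recur elsewhere in the target could raise it.

```python
def expected_intact_fraction(per_base_identity: float, k: int = 30) -> float:
    """P(a k-base pattern contains no substitution), assuming independent per-base matches."""
    return per_base_identity ** k


for q in (0.99, 0.98, 0.95):
    print(f"per-base identity {q:.0%}: about {expected_intact_fraction(q):.0%} "
          f"of 30-base patterns expected to match exactly")
```

Under this model, 99% per-base identity corresponds to roughly 74% intact 30-base patterns, and 95% to roughly 21%. This is the arithmetic behind CharlesJ’s question in the comments below: with a 1% difference, a given 30-base pattern contains at least one mismatch about 1 time in 4.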

An analogy from politics: an exit poll
To help readers grasp the significance and potential implications of my test, here is a simple analogy. Consider an election in which 100 million electors are eligible to vote. An exit poll based on a sample of 10,000 voters calculates that party X has received 62% of the popular vote. However, at the end of the election party X declares that it has received more than 95% of the vote! The 30BPM statistical test described above is analogous to the exit poll, while the claims made by evolutionary biologists are analogous to party X’s “95%” claim. The sample of 10,000 patterns is taken from a population of about 100 million bases (the approximate number of bases on a typical human/chimp chromosome), so the ratio of population to sample is 100,000,000/10,000 = 10,000. The 30BPM exit poll metaphorically says that only 62% voted for Darwin’s party, whereas modern Darwinists claim that over 95% did. Something doesn’t quite add up.
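The exit-poll analogy can be made quantitative with the textbook margin of error for a sample proportion. This sketch uses the normal approximation with z = 1.96 (about 95% confidence), assumes the sampled offsets are independent, and ignores the finite-population correction, which is negligible when 10,000 items are drawn from about 100 million. It measures only the sampling error of the 30BPM estimate itself, not whether 30BPM and the published identity figures measure the same quantity.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a sample proportion."""
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


for n in (1_000, 10_000):
    moe = margin_of_error(0.62, n)
    print(f"n = {n:>6}: 62% +/- {100 * moe:.2f} percentage points")
```

For an observed 62%, this gives roughly ±3.0 percentage points with 1,000 patterns and roughly ±0.95 with 10,000, consistent with the observation that the two sample sizes produced almost identical percentages.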

I believe that the classic evolutionary comparisons between human and chimp genomes exaggerate the similarities, for at least two reasons: (1) they don’t consider whole chromosomes, but only portions of them (e.g. particular genes); (2) the rules of pattern matching are relaxed in some way (e.g. sometimes two bases are said to match, even when they don’t really match). Now, there is nothing intrinsically wrong with comparisons where (1) and (2) hold. However, any research that is truly worthy of being called “scientific” should openly acknowledge built-in limitations, such as (1) and (2) above. Sadly, this is very rarely done. It is perfectly acceptable to publish partial results that are obtained by relaxing the rules, but one should not publicize them as global and mathematically sound, when in fact, they are nothing of the sort.

Conclusion
We have seen that in a genome comparison, the only thing that matters is the degree of similarity. However, once we put the concept of similarity between two text strings on the table we open a can of worms. Many different measures of the similarity between two strings are possible, and different methods of comparing two genomes can result in wildly different estimates of the similarity between them. The assumptions that drive the methods used also drive the results obtained, as well as their interpretation. A simple layman’s statistical test, such as the 30BPM, shows that the “95% claim” described above is a highly controversial one. It is worth noting that as more information comparing the two genomes is published, the differences between them will appear more profound than they were originally thought to be. The big question that still remains is: what should one conclude from the similarities and differences between the genomes of humans and chimpanzees? Commonly reported evolutionary statistics that should provide an informative answer to this question may actually obscure the true answer.
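To illustrate how strongly the choice of measure drives the result, the sketch below scores the same pairs of artificial sequences with two of the measures discussed above: position-by-position (Hamming) identity and a 30-base pattern test. The sequences and edits are invented purely for illustration, and the pattern test here checks every window of the first string rather than a random sample.

```python
import random


def hamming_identity(a: str, b: str) -> float:
    """Fraction of aligned positions holding the same base (equal-length strings only)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def kmer_similarity(a: str, b: str, k: int = 30) -> float:
    """Fraction of all k-base windows of a that occur anywhere in b (position ignored)."""
    b_kmers = {b[i:i + k] for i in range(len(b) - k + 1)}
    windows = [a[i:i + k] for i in range(len(a) - k + 1)]
    return sum(w in b_kmers for w in windows) / len(windows)


rng = random.Random(7)
a = "".join(rng.choice("ACGT") for _ in range(1_000))

# Edit 1: one base inserted at the front (last base dropped to keep the length equal).
shifted = "T" + a[:-1]
# Edit 2: a substitution at every 20th position (exactly 50 changes in 1,000 bases).
swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
mutated = "".join(swap[ch] if i % 20 == 0 else ch for i, ch in enumerate(a))

for label, b in (("one insertion", shifted), ("5% substitutions", mutated)):
    print(f"{label}: Hamming identity {hamming_identity(a, b):.1%}, "
          f"30-base pattern similarity {kmer_similarity(a, b):.1%}")
```

With a single inserted base, position-by-position identity falls to roughly chance level while almost every 30-base pattern is still found. With a substitution every 20 bases, identity is exactly 95%, yet essentially no 30-base pattern survives, because every 30-base window contains at least one changed base.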

Comments

DCX, please stop cutting heads off the Hydra. The internet is running out of bandwidth.

AMW, September 27, 2010, 07:49 PM PDT

The discontinuity between humans and all other forms of life is so profound and so obvious that it clearly cannot be explained by differences in DNA. The appearance of humans represents an evolutionary sea change that makes all other evolutionary discontinuities seem trivial in comparison. Something very strange and extraordinarily marvelous took place when the first humans appeared on the scene.

GilDodgen, September 27, 2010, 07:21 PM PDT

DCX, you may find the following video a bit clearer in explaining exactly why mutations to the DNA do not control Body Plan morphogenesis, since they are the 'bottom rung of the ladder' as far as the 'layered information' of the cell is concerned: Stephen Meyer on Craig Venter, Complexity Of The Cell & Layered Information http://www.metacafe.com/watch/4798685

bornagain77, September 27, 2010, 06:05 PM PDT

I was wondering if you could clarify a point for me. When you are testing a 30-base pattern, do you only accept a pattern with a perfect match on the two chromosomes tested? If we say that we have a 1% difference between the human and chimp genomes, isn’t the probability of finding a mismatch in a 30-base sequence roughly 1 in 4?

CharlesJ, September 27, 2010, 05:00 PM PDT

Oh, another tidbit, DCX: mutations to DNA don't even control Body Plan morphogenesis, so whatever the sequence similarity or dissimilarity of the DNA, it doesn't matter, for the point is moot in the first place:

Cortical Inheritance: The Crushing Critique Against Genetic Reductionism - Arthur Jones - video http://www.metacafe.com/watch/4187488 entire video: http://edinburghcreationgroup.org/fishfossils.xml

The Origin of Biological Information and the Higher Taxonomic Categories - Stephen Meyer: "Neo-Darwinism seeks to explain the origin of new information, form, and structure as a result of selection acting on randomly arising variation at a very low level within the biological hierarchy, mainly, within the genetic text. Yet the major morphological innovations depend on a specificity of arrangement at a much higher level of the organizational hierarchy, a level that DNA alone does not determine. Yet if DNA is not wholly responsible for body plan morphogenesis, then DNA sequences can mutate indefinitely, without regard to realistic probabilistic limits, and still not produce a new body plan. Thus, the mechanism of natural selection acting on random mutations in DNA cannot in principle generate novel body plans, including those that first arose in the Cambrian explosion." http://eyedesignbook.com/ch6/eyech6-append-d.html

Stephen Meyer - Functional Proteins And Information For Body Plans - video http://www.metacafe.com/watch/4050681

'Hopeful monsters,' transposons, and the Metazoan radiation: Excerpt: Viable mutations with major morphological or physiological effects are exceedingly rare and usually infertile; the chance of two identical rare mutant individuals arising in sufficient propinquity to produce offspring seems too small to consider as a significant evolutionary event. These problems of viable "hopeful monsters" render these explanations untenable. Paleobiologists Douglas Erwin and James Valentine

“Yet by the late 1980s it was becoming obvious to most genetic researchers, including myself, since my own main research interest in the ’80s and ’90s was human genetics, that the heroic effort to find the information specifying life’s order in the genes had failed. There was no longer the slightest justification for believing that there exists anything in the genome remotely resembling a program capable of specifying in detail all the complex order of the phenotype (Body Plan).” Michael John Denton, page 172 of Uncommon Dissent

etc., etc., etc.

bornagain77, September 27, 2010, 04:08 PM PDT

DCX, I think a major prediction of ID is that the majority of Junk DNA will be found to have function, and indeed ENCODE has borne this out, as have subsequent studies on 'non-coding' regions. Thus, as I stated earlier, this will only greatly widen the already insurmountable gap that Darwinism has yet to honestly address. The problem for evolution is a lot worse than you seem to realize. Another problem for you is that you must assume a substantial proportion of beneficial mutations to account for the 'dramatic' changes in genes, as previously noted in chromosome 22 and the Y chromosome, and you have slim to none, or rather none whatsoever, 'beneficial mutations' to the human genome to point to as evidence for Darwinism (nor do you have any anywhere else to point to). The evidence for the detrimental nature of mutations in humans is overwhelming, for scientists have already cited over 100,000 mutational disorders.

Inside the Human Genome: A Case for Non-Intelligent Design - Pg. 57 By John C. Avise Excerpt: "Another compilation of gene lesions responsible for inherited diseases is the web-based Human Gene Mutation Database (HGMD). Recent versions of HGMD describe more than 75,000 different disease causing mutations identified to date in Homo-sapiens."

I went to the mutation database website cited by John Avise and found: HGMD®: Now celebrating our 100,000 mutation milestone! http://www.biobase-international.com/pages/index.php?id=hgmddatabase

This following study confirmed the detrimental mutation rate for humans, of 100 to 300 per generation, estimated by John Sanford in his book 'Genetic Entropy' in 2005: Human mutation rate revealed: August 2009 Every time human DNA is passed from one generation to the next it accumulates 100–200 new mutations, according to a DNA-sequencing analysis of the Y chromosome. (Of note: this number is derived after "compensatory mutations") http://www.nature.com/news/2009/090827/full/news.2009.864.html

This 'slightly detrimental' mutation rate of 100 to 200 per generation is far greater than even what evolutionists agree is an acceptable mutation rate for an organism: Beyond A 'Speed Limit' On Mutations, Species Risk Extinction Excerpt: Shakhnovich's group found that for most organisms, including viruses and bacteria, an organism's rate of genome mutation must stay below 6 mutations per genome per generation to prevent the accumulation of too many potentially lethal changes in genetic material. http://www.sciencedaily.com/releases/2007/10/071001172753.htm

Contamination of the genome by very slightly deleterious mutations: why have we not died 100 times over? Kondrashov A.S. http://www.ingentaconnect.com/content/ap/jt/1995/00000175/00000004/art00167

Another huge problem that you don't seem to be aware of is the fact that genomes are severely poly-constrained to mutations because they are now shown to be poly-functional: Scientists Map All Mammalian Gene Interactions – August 2010 Excerpt: Mammals, including humans, have roughly 20,000 different genes.,,, They found a network of more than 7 million interactions encompassing essentially every one of the genes in the mammalian genome. http://www.sciencedaily.com/releases/2010/08/100809142044.htm

Poly-Functional Complexity equals Poly-Constrained Complexity http://docs.google.com/Doc?docid=0AYmaSrBPNEmGZGM4ejY3d3pfMjdoZmd2emZncQ

DNA - Evolution Vs. Polyfuctionality - video http://www.metacafe.com/watch/4614519

I don't know, DCX. You seem to be pretty certain that humans evolved from apes, but I can find no compelling evidence for your certainty. In fact, I find plenty of evidence that strongly argues against it.

bornagain77, September 27, 2010, 03:59 PM PDT

FYI: "Now researchers have learned that only about two percent of human and chimp DNA encodes genetic blueprints for proteins. They also know that most of the rest — once referred to dismissively as “junk DNA” — contains sequences that affect whether, where and when proteins are made - and in what combinations, a key factor in development. Pollard raises a question that scientists have been debating for decades: “Do you make a human by making different proteins or do you make one by taking the same building blocks and putting them together in a different way?” She says most scientists now believe the greatest potential for change arises from rearranging the building blocks. Some of the DNA formerly regarded as junk plays an important role in these rearrangements... Pollard and her collaborators are most interested in rapidly evolving bits of DNA that may play a role in determining human attributes such as language, the complexity of the brain’s cerebral cortex, hairless skin, fine motor coordination of the thumb and fingers, and the ability to easily digest certain foods we commonly eat. The top-ranking piece of human DNA to emerge from Pollard’s first comprehensive round of number-crunching differed from chimp DNA in 18 of 118 base pairs. In contrast, between chimp and chicken —a vertebrate that has evolved on a separate path from our evolutionary ancestors for about 300 million years - there were only two differences along the same DNA stretch. Pollard and colleagues named the DNA segment HAR1, for “human accelerated region.” The name refers to this DNA’s relatively fast evolution in our human ancestors. Pollard’s colleagues subsequently showed that HAR1 encodes RNA. But it’s not like the biology-textbook messenger RNA that is translated into protein. Instead the HAR1-encoded RNA has a more direct influence.
There is more to learn about HAR1 RNA, but already a Belgian colleague of Pollard’s has shown that it is made in specific nerve cells within the brain’s developing cerebral cortex. The second highest-ranking DNA in Pollard’s screen, dubbed HAR2, is a switch regulating the activation of specific genes. Scientists have discovered that it plays a role in limb development. Differences between human and chimp may help explain why human can more precisely control finger and thumb movements." http://www.physorg.com/news196962452.html

whoisyourcreator, September 27, 2010, 03:56 PM PDT

Ah, another quick reply. Good show. I would like to comment that your first two links were discussions of only one chromosome, which one link said was 1% of the genome. Are those differences significant, or relatively minor mutations? Are they predicted by having a designer, or is it one chromosome that has undergone quite a few mutations? Does ID specify a percentage of DNA that we would expect to differ between the human and chimp genomes? 20%? 50%? 90%? What would the dot plot look like? How would it compare to human computer code?

And as for your other articles, I am not surprised to hear that identifying genes is a tricky business. From the textbook (a quick glance, though), genes do not have a defined starting/ending sequence. They can overlap. They can be expressed or not expressed erroneously. It's a very complicated issue, especially since bioinformatics is a relatively new field. I believe it is a topic that I will cover in class, so I will have to come back another time and discuss it more fully then.

But returning to the original post... On the comparison that niwrad made, do you agree that pairwise comparisons of DNA may not be "cutting-edge"? Would you join me in suggesting, to whoever wants to pursue this study, a research program of testing ID predictions against SynMap dot plots?

DCX, September 27, 2010, 03:36 PM PDT

DCX, well I guess ID would predict something like this: Chimp chromosome creates puzzles - 2004 Excerpt: However, the researchers were in for a surprise. Because chimps and humans appear broadly similar, some have assumed that most of the differences would occur in the large regions of DNA that do not appear to have any obvious function. But that was not the case. The researchers report in 'Nature' that many of the differences were within genes, the regions of DNA that code for proteins. 83% of the 231 genes compared had differences that affected the amino acid sequence of the protein they encoded. And 20% showed "significant structural changes". In addition, there were nearly 68,000 regions that were either extra or missing between the two sequences, accounting for around 5% of the chromosome.,,, "we have seen a much higher percentage of change than people speculated." The researchers also carried out some experiments to look at when and how strongly the genes are switched on. 20% of the genes showed significant differences in their pattern of activity. http://www.nature.com/news/1998/040524/full/news040524-8.html Chimps are not like humans - May 2004 Excerpt: the International Chimpanzee Chromosome 22 Consortium reports that 83% of chimpanzee chromosome 22 proteins are different from their human counterparts,,, The results reported this week showed that "83% of the genes have changed between the human and the chimpanzee—only 17% are identical—so that means that the impression that comes from the 1.2% [sequence] difference is [misleading]. In the case of protein structures, it has a big effect," Sakaki said. 
http://cmbi.bjmu.edu.cn/news/0405/119.htm or maybe ID would predict something like this DCX,,, This following article, which has a direct bearing on the 98.8% genetic similarity myth, shows that over 1000 'ORFan' genes, that are completely unique to humans and not found in any other species, and that very well may directly code for proteins, were stripped from the 20,500 gene count of humans simply because the evolutionary scientists could not find corresponding genes in primates. In other words evolution, of humans from primates, was assumed to be true in the first place and then the genetic evidence was directly molded to fit in accord with their unproven assumption. It would be hard to find a more biased and unfair example of practicing science! Human Gene Count Tumbles Again - 2008 Excerpt: Scientists on the hunt for typical genes — that is, the ones that encode proteins — have traditionally set their sights on so-called open reading frames, which are long stretches of 300 or more nucleotides, or “letters” of DNA, bookended by genetic start and stop signals.,,,, The researchers considered genes to be valid if and only if similar sequences could be found in other mammals – namely, mouse and dog. Applying this technique to nearly 22,000 genes in the Ensembl gene catalog, the analysis revealed 1,177 “orphan” DNA sequences. These orphans looked like proteins because of their open reading frames, but were not found in either the mouse or dog genomes. Although this was strong evidence that the sequences were not true protein-coding genes, it was not quite convincing enough to justify their removal from the human gene catalogs. Two other scenarios could, in fact, explain their absence from other mammalian genomes. For instance, the genes could be unique among primates, new inventions that appeared after the divergence of mouse and dog ancestors from primate ancestors. 
Alternatively, the genes could have been more ancient creations — present in a common mammalian ancestor — that were lost in mouse and dog lineages yet retained in humans. If either of these possibilities were true, then the orphan genes should appear in other primate genomes, in addition to our own. To explore this, the researchers compared the orphan sequences to the DNA of two primate cousins, chimpanzees and macaques. After careful genomic comparisons, the orphan genes were found to be true to their name — they were absent from both primate genomes. http://www.sciencedaily.com/releases/2008/01/080113161406.htm The sheer, and blatant, shoddiness of the science of the preceding study should give everyone who reads it severe pause whenever, in the future, someone tells them that genetic studies have proven evolution to be true. This following site has a brief discussion on the biased methodology of the preceding study: https://uncommondescent.com/intelligent-design/proteins-fold-as-darwin-crumbles/#comment-358505 If the authors of the preceding study were to have actually tried to see if the over 1000 unique ORFan genes of humans may actually encode for proteins, instead of just written them off because they were not found in in other supposedly related species, they would have found that there is ample reason to believe that they may very well encode for biologically important proteins: A survey of orphan enzyme activities Abstract: We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these (orfan) enzyme activities play biologically important roles. http://www.biomedcentral.com/1471-2105/8/244 Dr. Howard Ochman - Dept. 
of Biochemistry at the University of Arizona Excerpt of Proposal: Although it has been hypothesized that ORFans might represent non-coding regions rather than actual genes, we have recently established that the vast majority that ORFans present in the E. coli genome are under selective constraints and encode functional proteins. https://uncommondescent.com/intelligent-design/proteins-fold-as-darwin-crumbles/#comment-358868 In fact it turns out that the authors of the 'kick the ORFans out in the street' paper actually did know that there was unbiased evidence strongly indicating the ORFan genes encoded proteins but chose to ignore it in favor of their preconceived evolutionary bias: https://uncommondescent.com/intelligent-design/proteins-fold-as-darwin-crumbles/#comment-358547 I would like to reiterate that evolutionists cannot even account for the origination of just one unique gene or protein, much less over one thousand completely unique ORFan genes: Could Chance Arrange the Code for (Just) One Gene? "our minds cannot grasp such an extremely small probability as that involved in the accidental arranging of even one gene (10^-236)." http://www.creationsafaris.com/epoi_c10.htm "Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds” 2004: - Doug Axe ,,,this implies the overall prevalence of sequences performing a specific function by any domain-sized fold may be as low as 1 in 10^77, adding to the body of evidence that functional folds require highly extraordinary sequences." http://www.mendeley.com/research/estimating-the-prevalence-of-protein-sequences-adopting-functional-enzyme-folds/
bornagain77_{September 27, 2010, 03:05 PM PDT}
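The open-reading-frame criterion quoted in the comment above (a stretch of 300 or more nucleotides bookended by start and stop signals) can be sketched in a few lines. This is a minimal illustration, not the method used in the study being discussed. It assumes the standard genetic code (ATG start; TAA, TAG, TGA stops), scans only the three forward reading frames, and measures length from the start codon through the stop codon. `find_orfs` is a hypothetical helper name.

```python
from typing import List, Tuple

START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_len: int = 300) -> List[Tuple[int, int]]:
    """Return sorted (start, end) pairs for forward-strand ORFs of at least min_len nt.

    An ORF runs from an ATG through the first in-frame stop codon, inclusive,
    so end - start is the ORF length in nucleotides. If several ATGs precede
    the same stop, only the first (longest ORF) is kept. The reverse
    complement strand is not scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == START:
                    start = i
            elif codon in STOPS:
                end = i + 3  # include the stop codon
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 100 + "TGA"  # 306 nt: start, 100 codons, stop
    print(find_orfs(toy))  # [(0, 306)]
```

The 300-nucleotide cutoff is the default parameter. Lowering `min_len` makes short test sequences easy to check by hand.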

Ah bornagain77, what a quick response. Thank you. This problem is not based entirely on evolution. It is in fact a problem based entirely on string comparison. DNA, like any piece of text, can undergo this treatment. For example, you could run two students' essays through a dot plot and find out if significant passages have been stolen. A "plagiarism dot plot" Google query will bring up many pages on the subject. I believe the point is not to prove that evolution is true. It is merely to state that evolution would predict "closely-related" genomes to be more similar than "non-closely-related" genomes. From these simple comparisons, it looks pretty likely that while evolution throws the DNA out of alignment and changes it, evolution still "preserves" most of the text. What predictions does the design hypothesis have? Can design be seen in the mutation/insertion/deletion/chromosome adjustment/reversals?
DCX_{September 27, 2010, 02:53 PM PDT}
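The dot plot DCX describes can be sketched as follows. One sequence goes along each axis, and a dot goes in cell (i, j) whenever the k-letter word starting at position i in the first sequence equals the word starting at position j in the second. Shared stretches show up as diagonal runs, and an insertion shifts the diagonal sideways. This is a minimal sketch assuming exact word matches with a word size of k = 3. `dot_plot` and `render` are hypothetical helper names. Dedicated dot plot tools may add windowing and scoring thresholds that this sketch leaves out.

```python
from collections import defaultdict
from typing import Set, Tuple


def dot_plot(a: str, b: str, k: int = 3) -> Set[Tuple[int, int]]:
    """Return every (i, j) where the k-mer of a at i equals the k-mer of b at j."""
    positions = defaultdict(list)  # k-mer -> start positions in b
    for j in range(len(b) - k + 1):
        positions[b[j:j + k]].append(j)
    dots = set()
    for i in range(len(a) - k + 1):
        # .get does not trigger the default factory, so unseen k-mers yield []
        for j in positions.get(a[i:i + k], []):
            dots.add((i, j))
    return dots


def render(a: str, b: str, dots: Set[Tuple[int, int]]) -> str:
    """Text rendering: one row per position of a, one column per position of b."""
    rows = []
    for i in range(len(a)):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(len(b))))
    return "\n".join(rows)


if __name__ == "__main__":
    a, b = "GATTACA", "GATTTACA"  # b has one extra T
    print(render(a, b, dot_plot(a, b)))
```

With these two strings the dots are (0, 0), (1, 1), (2, 3), (3, 4) and (4, 5). The first two lie on the main diagonal, and the rest continue one column to the right. That sideways step is how an insertion appears in a dot plot.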

To make it even easier: Look at this tutorial video: human-chimp dot plot. It describes how a dot plot would look for humans and chimps. You can then look at the SynMap program yourself at: SynMap
DCX_{September 27, 2010, 02:39 PM PDT}

DCX, but your computer model for establishing similarity seems to be based, I'm pretty sure, on the assumption that evolution has occurred, so as to find similarities and ignore discrepancies, so of course it will give a different reading.,, Thus your model would seem to commit the same fallacy as the other models niwrad is critiquing. Namely, your computer program ends up proving evolution is true because evolution is first assumed to be true prior to the search,,, Just a bit biased, wouldn't you say?
bornagain77_{September 27, 2010, 02:26 PM PDT}

AMW, the Buggs link I listed works fine for me,, here it is again,,, http://www.idnet.com.au/files/pdf/Chimpanzee.pdf As far as the 'hydra' you're fighting, I just want you to know that I have definite reasons for doubting the certainty with which you state your 'conclusion' that man evolved from apes.
bornagain77_{September 27, 2010, 02:17 PM PDT}

niwrad: While your code seems to be valid, I suggest that your matching scheme is naive, and will report more mismatches than a more complicated algorithm would. Hamming distance does not work very well with deletions and insertions, which are common in DNA. I suggest that a Needleman-Wunsch global alignment may be a better solution to a string-matching problem. Or, for the visual thinkers among us, I suggest using a dot plot such as DNA dot plotter to compare sequences. Sources: Course I am currently taking + the textbook
DCX_{September 27, 2010, 02:17 PM PDT}
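To illustrate DCX's point about insertions and deletions, the sketch below compares a plain Hamming count with a Needleman-Wunsch global alignment score. The scoring scheme (+1 per match, -1 per mismatch, -1 per gap position) is an arbitrary assumption chosen for the example. `needleman_wunsch` returns only the optimal score and omits the traceback that would recover the alignment itself. Both function names are illustrative.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where equal-length strings differ; no gaps are allowed."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> int:
    """Optimal global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                     # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,    # gap in b
                              score[i][j - 1] + gap)    # gap in a
    return score[n][m]


if __name__ == "__main__":
    a, b = "ACGTACGTAC", "CGTACGTACG"  # b is a shifted by one base
    print(hamming(a, b), needleman_wunsch(a, b))
```

Shifting a sequence by one base makes every position disagree under Hamming: `ACGTACGTAC` against `CGTACGTACG` gives 10 mismatches. The alignment instead finds the 9 shared bases at the cost of two gaps, for a score of 7.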

Also, do you just have excerpts and links on file somewhere? Because I did a Google search on part of your intro to the Dr. Richard Buggs link, and the first six or seven links were all to different comments where you'd posted it.
AMW_{September 27, 2010, 02:00 PM PDT}

Bornagain77, For what it's worth, the probability that I'll read your entire comment is inversely related to the number of links that you post in it. I just haven't the time to go through them all. What's more, when I responded to one link, you gave me seven more to look up. It's worse than fighting a hydra! Follow-up comments would just lead us further afield. In short, when I see a Gish Gallop, I disengage. If you want to discuss any one concept or article in depth, I'll be much more likely to respond.
AMW_{September 27, 2010, 01:53 PM PDT}

Well AMW, I have my reservations as to your certainty, and I think niwrad's test may well lend itself to such a test with a few modifications. Certainly he won't have to relax constraints as much as Darwinists have in order to arrive at their biased 95% to 99% conclusion! Further notes: the chimp genome is about 12% larger than the human genome. A recent, more accurate, human/chimp genome comparison study, by Richard Buggs in 2008, found that when he rigorously compared the recently completed sequences in the genomes of chimpanzees to the genomes of humans side by side, the similarity between chimps and man fell to slightly below 70%! Why is this study ignored since the ENCODE study has now implicated 100% high level functionality across the entire human genome? Finding compelling evidence that implicates 100% high level functionality across the entire genome clearly shows the similarity is not to be limited to the very biased 'only 1.5% of the genome' studies of evolutionists. Chimpanzee? 10-10-2008 - Dr Richard Buggs - research geneticist at the University of Florida ...Therefore the total similarity of the genomes could be below 70%. http://www.idnet.com.au/files/pdf/Chimpanzee.pdf Moreover, when scientists did an actual nucleotide-by-nucleotide sequence comparison, to find the 'real world' difference between the genomes of chimps and humans, they found the difference was even more profound than what Dr. Richard Buggs, or the statistical test, had estimated: Do Human and Chimpanzee DNA Indicate an Evolutionary Relationship? Excerpt: the authors found that only 48.6% of the whole human genome matched chimpanzee nucleotide sequences. [Only 4.8% of the human Y chromosome could be matched to chimpanzee sequences.] http://www.apologeticspress.org/articles/2070 As well niwrad, I thought you might like to know that your 'stunning' Y chromosome dissimilarity added weight to these studies: Recent Genetic Research Shows Chimps More Distant From Humans,,, - Jan.
2010 Excerpt: “many of the stark changes between the chimp and human Y chromosomes are due to gene loss in the chimp and gene gain in the human” since “the chimp Y chromosome has only two-thirds as many distinct genes or gene families as the human Y chromosome and only 47% as many protein-coding elements as humans.”,,,, “Even more striking than the gene loss is the rearrangement of large portions of the chromosome. More than 30% of the chimp Y chromosome lacks an alignable counterpart on the human Y chromosome, and vice versa,,," http://www.evolutionnews.org/2010/04/recent_genetic_research_shows.html Chimp and human Y chromosomes evolving faster than expected - Jan. 2010 Excerpt: "The results overturned the expectation that the chimp and human Y chromosomes would be highly similar. Instead, they differ remarkably in their structure and gene content.,,, The chimp Y, for example, has lost one third to one half of the human Y chromosome genes. http://www.physorg.com/news182605704.html further notes: When we consider the remote past, before the origin of the actual species Homo sapiens, we are faced with a fragmentary and disconnected fossil record. Despite the excited and optimistic claims that have been made by some paleontologists, no fossil hominid species can be established as our direct ancestor. Richard Lewontin - Harvard Zoologist http://www.discovery.org/a/9961 Evolution of the Genus Homo - Annual Review of Earth and Planetary Sciences - Tattersall, Schwartz, May 2009 Excerpt: "Definition of the genus Homo is almost as fraught as the definition of Homo sapiens. We look at the evidence for “early Homo,” finding little morphological basis for extending our genus to any of the 2.5–1.6-myr-old fossil forms assigned to “early Homo” or Homo habilis/rudolfensis." 
http://arjournals.annualreviews.org/doi/abs/10.1146/annurev.earth.031208.100202 Man is indeed as unique, as different from all other animals, as had been traditionally claimed by theologians and philosophers. Evolutionist Ernst Mayr http://www.y-origins.com/index.php?p=home_more4 “Something extraordinary, if totally fortuitous, happened with the birth of our species….Homo sapiens is as distinctive an entity as exists on the face of the Earth, and should be dignified as such instead of being adulterated with every reasonably large-brained hominid fossil that happened to come along.” Anthropologist Ian Tattersall (curator at the American Museum of Natural History)
bornagain77_{September 27, 2010, 12:54 PM PDT}

Bornagain77, My guess is less than 10%. For one thing, kangaroos have 12 chromosomes, and Niwrad's algorithm compares one chromosome to another. For another, according to the article you cite, we're supposed to have shared a common ancestor 150 million years ago with 'Roos, whereas it's only supposed to be 5-7 million years since our common ancestor with chimps. Also, I noticed that the article just says there are large chunks of DNA that we share with 'Roos. It doesn't say what the percent similarity is, and in particular it doesn't say anything about the similarities in non-coding DNA. I'd bet good money we're miles apart on that front.
AMW_{September 27, 2010, 12:00 PM PDT}

niwrad, I would be interested to know what numbers you would get if you ran this comparison through: Kangaroo genes close to humans Excerpt: Australia's kangaroos are genetically similar to humans,,, "There are a few differences, we have a few more of this, a few less of that, but they are the same genes and a lot of them are in the same order," ,,,"We thought they'd be completely scrambled, but they're not. There is great chunks of the human genome which is sitting right there in the kangaroo genome," http://www.reuters.com/article/science%20News/idUSTRE4AH1P020081118 If you want the sequences for the kangaroo genome I believe one of the people at the bottom of this page can help you: Australian First: Kangaroo Genome Mapped http://archive.uninews.unimelb.edu.au/view-55743.html
bornagain77_{September 27, 2010, 11:01 AM PDT}

Interesting. I've got two questions. First, a clarification about the 30 BPM metric. Let's say you've got a string of 30 T's on chromosome A, and you're looking for a matching string on chromosome B. Suppose there is no string of 30 T's on chromosome B, but there is a string of 29 T's followed by a G. That is, a single point mutation could account for the difference between strings. Does your metric count that as a match, or as no match? Because clearly, if it's been a couple million years since two species diverged, there would be an accumulation of mutations in each that might break up a pattern to some extent (particularly in non-coding DNA). In other words, your pseudo-metric might be biased toward Type 1 errors. Second, you ran the comparison between two randomly generated "chromosomes" to confirm that unrelated strings generate a pseudo-metric value of 0. Good start. But did you do that for the other end of the pseudo-metric? In particular, did you generate two identical strings, then apply random mutations (base changes, insertions, deletions, relocations, etc.) to each of them, and THEN run the 30 BPM test? I'd be interested to see the results of that test case.
AMW_{September 27, 2010, 10:19 AM PDT}
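AMW's second question can be explored with a small simulation. The sketch assumes the 30 BPM metric works roughly as the thread describes it: sample random 30-base windows from one sequence and count the fraction found verbatim anywhere in the other. Several pieces are assumptions for illustration, not niwrad's actual program:

- that reading of the metric;
- the mutation model (independent per-base substitutions, single-base deletions and single-base insertions in equal proportions);
- the names `bpm_fraction`, `mutate` and `simulate`.

```python
import random
from typing import Optional

BASES = "ACGT"


def random_seq(n: int, rng: random.Random) -> str:
    """Uniform random DNA string of length n."""
    return "".join(rng.choice(BASES) for _ in range(n))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Apply independent per-base mutations with total probability `rate` per base.

    Assumed model: the rate is split equally between substitutions,
    single-base deletions and single-base insertions.
    """
    out = []
    for base in seq:
        r = rng.random()
        if r < rate / 3:  # substitution to one of the other three bases
            out.append(rng.choice([b for b in BASES if b != base]))
        elif r < 2 * rate / 3:  # deletion: drop this base
            continue
        elif r < rate:  # insertion: keep this base, then add a random one
            out.append(base)
            out.append(rng.choice(BASES))
        else:  # no mutation at this position
            out.append(base)
    return "".join(out)


def bpm_fraction(a: str, b: str, k: int = 30, samples: int = 1000,
                 rng: Optional[random.Random] = None) -> float:
    """Fraction of randomly sampled k-mers of `a` found verbatim anywhere in `b`.

    Exact matching only (assumed): a 29-of-30 near match counts as a miss.
    """
    rng = rng or random.Random(0)
    if len(a) < k:
        raise ValueError("sequence a is shorter than k")
    kmers_b = {b[j:j + k] for j in range(len(b) - k + 1)}
    hits = 0
    for _ in range(samples):
        i = rng.randrange(len(a) - k + 1)
        if a[i:i + k] in kmers_b:
            hits += 1
    return hits / samples


def simulate(length: int = 20000, rate: float = 0.01, seed: int = 1) -> float:
    """Mutate two copies of one random ancestor independently, then score them."""
    rng = random.Random(seed)
    ancestor = random_seq(length, rng)
    a = mutate(ancestor, rate, rng)
    b = mutate(ancestor, rate, rng)
    return bpm_fraction(a, b, rng=rng)


if __name__ == "__main__":
    for rate in (0.0, 0.001, 0.01, 0.05):
        print(f"mutation rate {rate:.3f}: 30-mer match fraction {simulate(rate=rate):.3f}")
```

Because matching is exact, a single changed base anywhere in a 30-base window turns that window into a miss. The score therefore falls quickly as the mutation rate rises. With each copy mutated at a per-base rate r, roughly (1 - r)^60 of windows survive intact in both copies. This bears on AMW's first question too: under this reading, a 29-out-of-30 match counts as no match at all.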

niwrad I wonder what results you would get if you did the test on two unrelated humans?
markf_{September 27, 2010, 09:22 AM PDT}

I am so glad you did this analysis, niwrad. This 99% myth must be the one myth, on top of all other evolutionary myths, that is used to promulgate evolution to school children as an undeniable fact. Now perhaps you can do an analysis on the 'cartoon drawings' showing man evolving from ape?!? 8)
bornagain77_{September 27, 2010, 08:52 AM PDT}
