Uncommon Descent Serving The Intelligent Design Community

Invasion of the IBM Engineers

http://domino.research.ibm.com/comm/pr.nsf/pages/news.20060425_dna.html

IBM today announced its researchers have discovered numerous DNA patterns shared by areas of the human genome that were thought to have little or no influence on its function and those areas that do.

As reported today in the Proceedings of the National Academy of Sciences (PNAS), regions of the human genome that were assumed to largely contain evolutionary leftovers (called “junk DNA”) may actually hold significant clues that can add to scientists’ understanding of cellular processes. IBM researchers have discovered that these regions contain numerous, short DNA “motifs,” or repeating sequence fragments, which also are present in the parts of the genome that give rise to proteins.

If verified experimentally, the discovery suggests a potential connection between these coding and non-coding parts of the human genome that could have a profound impact on genomic research and provide important insights on the workings of cells.

“Our goal is to apply advanced computational techniques to analyze the workings of processes and systems, in this case the function of the human genome,” said Ajay Royyuru, head of the Computational Biology Center at IBM Research. “Using these tools, we’ve been able to shed new light on parts of the DNA that were traditionally thought of as not having a specific purpose. We believe the innovative application of technology can provide further understanding in the life sciences at large.”

The IBM team used a mathematical tool called pattern-discovery, often applied to mine useful information from very large repositories of data in both business and scientific applications, to sift through the approximately six billion letters in the non-coding regions of the human genome and look for repeating sequence fragments, or motifs.

Among the millions of discovered motifs, the team identified approximately 128,000 that also occur in the coding region of the genome and are significantly over-represented in genes involved in specific biological processes such as cell communication, regulation of transcription, transport and others. In fact, copies of one or more of these motifs can be found in over 90 percent of all known human gene sequences, as well as some genes of other animals where they associate with similar biological processes.
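To make the idea concrete, here is a minimal sketch of what "look for repeating fragments, then check whether they also show up in coding sequence" could look like. It is not the IBM team's pattern-discovery method: it uses fixed-length substrings (the paper describes variable-length patterns), it applies no statistical significance test, and the function names `repeated_kmers` and `shared_motifs` and the toy sequences are all invented for illustration.

```python
from collections import Counter


def repeated_kmers(sequence, k, min_count):
    """Return every length-k substring that occurs at least min_count times."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer for kmer, n in counts.items() if n >= min_count}


def shared_motifs(noncoding, coding, k=4, min_count=2):
    """Motifs repeated in the non-coding text that also appear in the coding text."""
    repeated = repeated_kmers(noncoding, k, min_count)
    return sorted(m for m in repeated if m in coding)


# Toy strings, made up for this example (not real genomic data).
noncoding = "ACGTTGCAACGTGGAACGT"
coding = "TTACGTAA"

print(shared_motifs(noncoding, coding))  # prints ['ACGT']
```

In the toy input, `ACGT` and `AACG` both repeat in the non-coding string, but only `ACGT` also occurs in the coding string, so it is the only motif reported.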

The report on this work “Short blocks from the non-coding parts of the human genome have instances within nearly all known genes and relate to biological processes” by Isidore Rigoutsos, Tien Huynh, Kevin Miranda, Aristotelis Tsirigos, Alice McHardy and Daniel Platt of IBM’s T. J. Watson Research Center, Yorktown Heights, NY appeared on April 24th in the early edition of the journal PNAS.

http://www.pnas.org/cgi/content/abstract/0601688103v1

Using an unsupervised pattern-discovery method, we processed the human intergenic and intronic regions and catalogued all variable-length patterns with identically conserved copies and multiplicities above what is expected by chance. Among the millions of discovered patterns, we found a subset of 127,998 patterns, termed pyknons, which have additional nonoverlapping instances in the untranslated and protein-coding regions of 30,675 transcripts from 20,059 human genes. The pyknons arrange combinatorially in the untranslated and coding regions of numerous human genes where they form mosaics. Consecutive instances of pyknons in these regions show a strong bias in their relative placement, favoring distances of ≈22 nucleotides. We also found pyknons to be enriched in a statistically significant manner in genes involved in specific processes, e.g., cell communication, transcription, regulation of transcription, signaling, transport, etc. For ≈1/3 of the pyknons, the intergenic/intronic instances of their reverse complement lie within 380,084 nonoverlapping regions, typically 60-80 nucleotides long, which are predicted to form double-stranded, energetically stable, hairpin-shaped RNA secondary structures; additionally, the pyknons subsume ≈40% of the known microRNA sequences, thus suggesting a possible link with posttranscriptional gene silencing and RNA interference. Cross-genome comparisons reveal that many of the pyknons have instances in the 3′ UTRs of genes from other vertebrates and invertebrates where they are overrepresented in similar biological processes, as in the human genome. These unexpected findings suggest potential unique functional connections between the coding and noncoding parts of the human genome.
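The "relative placement" observation in the abstract can be pictured as a histogram of gaps between consecutive motif instances. The sketch below is only an illustration under stated assumptions: it measures distance from one instance's start to the next instance's start (the abstract does not say how distance is defined), and the helper names `instance_starts` and `spacing_histogram` and the toy sequence are hypothetical.

```python
import re
from collections import Counter


def instance_starts(sequence, motifs):
    """Start positions of every motif instance, sorted along the sequence."""
    starts = []
    for motif in motifs:
        # re.finditer yields non-overlapping matches of this motif.
        starts.extend(m.start() for m in re.finditer(re.escape(motif), sequence))
    return sorted(starts)


def spacing_histogram(sequence, motifs):
    """Count the start-to-start gaps between consecutive motif instances."""
    starts = instance_starts(sequence, motifs)
    return Counter(later - earlier for earlier, later in zip(starts, starts[1:]))


# Toy transcript, made up for this example: three instances placed 22 apart.
toy = "ACGT" + "N" * 18 + "GGCC" + "N" * 18 + "ACGT"

print(spacing_histogram(toy, ["ACGT", "GGCC"]))  # prints Counter({22: 2})
```

On real data one would look for a peak in this histogram rather than a single value; a peak near 22 would correspond to the bias the abstract reports.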

Comments
So ID theory leads to knowledge and understanding. What would Judge Jones say?
tribune7
April 30, 2006, 06:08 AM PDT

“There is a widespread belief among Darwinians that such apparently unnecessary DNA would have been eliminated long ago by natural selection if it did not have some, as of yet undiscovered, function.”
- Ernst Mayr, What Evolution Is, p. 108, Box 5.5.

DNA replication is energetically expensive. Hardcore RM/NS evo biologists have predicted for years that there had to be some selective pressure which would conserve these vast stretches of non-coding regions. Hypotheses included acting as a buffer against damage during replication and telomeric considerations. How does this support or weaken ID?

RM+NS explains junk DNA with or without function. It explains everything! Thus it explains nothing. -ds
unthink
April 30, 2006, 04:28 AM PDT

A question for Great Ape:

How are complex instinctive behaviors which are demonstrably inherited coded into the DNA molecule?

Indeed, how many people have even asked the question: What is the biological basis of instinct? Here's a partial answer: http://www.google.com/search?hl=en&lr=&q=%22biological+basis+of+instinct%22
DaveScot
April 30, 2006, 12:23 AM PDT

This paper definitely caught my eye as well, and I've given it a critical reading. The sequences are there, to be sure, but ultimately this is going to be a "poster child" case for how **not having an adequate grasp of molecular evolution** or an appropriate background in biology leads to some rather wrong-headed interpretations of your data. I don't blame the fellows; they couldn't have known. But it's a shame the reviewers didn't catch the (potentially) serious flaw here. It was probably reviewed largely by nonbio computational "invaders" as well.

Here's the story as I see it: these IBM guys find all these stretches of 15-16mer sequences that are over-represented, right? And they demonstrate statistically that they couldn't have appeared in these quantities by chance alone. And about this they're *absolutely right.* Seen them myself some time ago and also did the math/simulation. As have others. You'll notice in the IBM-PNAS paper that one thing the reviewers obviously took them to task over was "confirming" that these things they found weren't simply jumping genes/transposons, etc.--in other words, repetitive sequences resulting from transposition/retrotransposition. In response, the authors very clearly show that the majority of these sequences do not overlap with RepeatMasked, identifiable repeats in the respective genomes analyzed. Again, here they are **absolutely correct**.

The problem, however, is that the identification method they used for transposon confirmation only works for elements younger than a certain age (DNA information in nonselected regions like retrotransposons erodes pretty quickly via neutral replacement of bases). The nucleotide-level methods of identifying the sequences begin to fail for human elements around the time of the mammalian radiation, when the percent identity remaining at the nucleotide level approaches that expected by chance alone.

Yet genomic expansion via so-called "junk" DNA has been going on much longer than the mammalian radiation. The remnants of this replicative process, which produces repeated regions of DNA, should be short stretches of approximately the size the IBM fellows found. So my guess--and I could well be wrong, but *this* definitely should be the hypothesis to disprove--is that they've found remnant signatures of ancient repeats that have long since degraded....

The authors also make a lot of hoopla about how their seqs are distributed selectively around certain types of genes and how this indicates some sorta function--though they know not what. If they had a proper background to be doing this sort of work, they'd know that certain transposons will target the regions around specific types of genes b/c of the chromatin structure and timing of the gene's activity. So yes, this paper in PNAS is a lovely example of what can happen when engineers/computer scientists invade biology.
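The point about percent identity approaching what chance alone would give can be illustrated with a standard textbook model. The sketch below assumes the Jukes-Cantor substitution model (equal base frequencies, equal rates for every substitution), which the comment above does not name; under that assumption, two copies separated by d expected substitutions per site share a fraction 1/4 + (3/4)·exp(−4d/3) of their sites. The function name `expected_identity` is invented for this example.

```python
import math


def expected_identity(d):
    """Expected fraction of identical sites between two copies of a sequence
    under the Jukes-Cantor model (an assumption made for this sketch), where
    d is the expected number of substitutions per site separating them."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


# Identity decays from 1.0 toward the 0.25 expected for unrelated sequences.
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"d={d:.1f}  expected identity={expected_identity(d):.3f}")
```

Once the expected identity sits close to 0.25, a nucleotide-level similarity search has little signal left to distinguish a decayed repeat from unrelated sequence, which is the limitation the comment describes.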

Your spiel is a lovely example of what happens when biologists try to analyze information-bearing structures. Neutral replacement isn't necessarily neutral. While synonymous replacement may be functionally neutral in protein-coding genes, it isn't functionally neutral in pattern-matching processes. This is, in my opinion, a prime reason why the neutral molecular clock hypothesis didn't work out very well. There's no such thing as neutral substitutions. Synonymous substitutions have as-yet-undiscovered effects. They're simply more subtle than a gross substitution of one amino acid for another in a protein product. Mark my words. And why didn't you use your real name when you registered here? What are you hiding? -ds
great_ape
April 29, 2006, 10:03 PM PDT

Junk DNA - "Ha! Exactly what evolution expected to find!" Not junk DNA - "Ha! Exactly what evolution expected to find!"
chunkdz
April 29, 2006, 09:03 PM PDT

Creationists and ID types predicted that would be the case, not that they will receive any credit for empirical verifications of their predictions.

A few questions:

Exactly what was the prediction made? Was it something other than "some use will be found for the parts of the DNA for which a use has not yet been found"?

What group of ID researchers actually carried out the empirical verification for which they should receive credit?

What other predictions of ID are verifiable?

What group of ID researchers are actually trying to verify any of those predictions?

Data belongs to everyone. You seem to be struggling under the common fallacy that ID theorists can't use data unless they do the gathering of it. That's utter nonsense and shows a basic misconception of how science works. Get a clue. -ds
Kipli
April 29, 2006, 08:28 PM PDT

...regions of the human genome that were assumed to largely contain evolutionary leftovers (called “junk DNA”) may actually hold significant clues that can add to scientists’ understanding of cellular processes.

Creationists and ID types predicted that would be the case, not that they will receive any credit for empirical verifications of their predictions. Instead, the hypothetical goo that can apparently be used as an explanation for virtually any set of observations is assumed and defined as the scientific answer from the start and then labeled natural. In the Darwinian mind their own imaginations/explanations about history carry the same epistemic weight as a clear statement in text or the language of mathematics that is then subject to current empirical verification.

Instead they still seem to be using Darwin's old philosophy: "If I can imagine a way for this to happen, then that's evidence of my theory and that it did happen that way. If I cannot imagine a way, then my theory absolutely breaks down. Would you look at that, I can imagine something about everything! You want me to stop imagining things! Well, that's a science stopper." It is little wonder that they are so easily overwhelmed by their own imaginations and numerous "explanations." If ID types set that sort of epistemic standard for themselves then they'd probably be quickly overwhelmed based on evidence that was all in their imagination too.
mynym
April 29, 2006, 05:57 PM PDT