Is functional information in DNA always conserved? (Part two)

Giuseppe Puccio

May 20, 2014

Intelligent Design


So, in the first part of this discussion, I tried to show, with real data from the scientific literature, how much of the human genome is conserved, and how that conservation is evaluated and expressed. Then I argued that we already have credible evidence for function in a relevant part of the human genome (let’s say about 20%), that most of that functional part is non coding, and that a great part of it is not conserved. While some may disagree on the exact figures, I think that it is really difficult to reject the whole argument.

But, as I have anticipated, there are two more important aspects of the issue that I want to discuss in detail. I will do it now.

3) Conserved function which does not imply conserved sequence.

The reason why sequence is conserved when function is present is that function imposes specific constraints on the sequence itself.

For example, in a protein sequence with a well defined biochemical function, some variation will be possible without affecting the protein function, while other kinds of variation will affect it more or less.

We have many examples of important loss of function caused by the change of even a single amino acid: mendelian diseases in humans are a well known, unpleasant example of that.

We also have many examples of important variation in the sequence of functional proteins which does not affect the function: the so called neutral variations in proteins. For example, there are many variants of human hemoglobin, more than 1000, most of them caused by a single amino acid substitution. While many of them cause some disease, or at least some functional modification of the protein, at least a few of them are completely silent clinically, both in the heterozygote and in the homozygote state.

Now, there is an important consequence of that. Neutral variation happens also in functional sequences, although it happens less in those sequences. How much neutral variation can be tolerated by a functional sequence depends on the sequence. For proteins, it is well known that some of them can vary a lot while retaining the same structure and function, while others are much more functionally constrained. Therefore, different functional proteins are conserved to different degrees over the same span of time.

What about non coding genes? While we understand much (but not all) of the sequence-structure-function relationship for proteins, here we are almost wholly ignorant. Non coding genes, when they are functional, act in very different ways, most of them not well understood. Many of them are transcribed, and we don’t understand much of the structure of the transcribed RNAs, least of all their sequence-structure-function relationship. IOWs, we have no idea of how functionally constrained the sequence of a functional non coding DNA element is.

While searching for pertinent literature about this issue, I have found this very recent, interesting paper:

Evolutionary conservation of long non-coding RNAs; sequence, structure, function.

The abstract (all emphasis is mine):

BACKGROUND:

Recent advances in genomewide studies have revealed the abundance of long non-coding RNAs (lncRNAs) in mammalian transcriptomes. The ENCODE Consortium has elucidated the prevalence of human lncRNA genes, which are as numerous as protein-coding genes. Surprisingly, many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. The absence of functional studies and the frequent lack of sequence conservation therefore make functional interpretation of these newly discovered transcripts challenging. Many investigators have suggested the presence and importance of secondary structural elements within lncRNAs, but mammalian lncRNA secondary structure remains poorly understood. It is intriguing to speculate that in this group of genes, RNA secondary structures might be preserved throughout evolution and that this might explain the lack of sequence conservation among many lncRNAs.

SCOPE OF REVIEW:

Here, we review the extent of interspecies conservation among different lncRNAs, with a focus on a subset of lncRNAs that have been functionally investigated. The function of lncRNAs is widespread and we investigate whether different forms of functionalities may be conserved.

MAJOR CONCLUSIONS:

Lack of conservation does not imbue a lack of function. We highlight several examples of lncRNAs where RNA structure appears to be the main functional unit and evolutionary constraint. We survey existing genomewide studies of mammalian lncRNA conservation and summarize their limitations. We further review specific human lncRNAs which lack evolutionary conservation beyond primates but have proven to be both functional and therapeutically relevant.

GENERAL SIGNIFICANCE:

Pioneering studies highlight a role in lncRNAs for secondary structures, and possibly the presence of functional “modules”, which are interspersed with longer and less conserved stretches of nucleotide sequences. Taken together, high-throughput analysis of conservation and functional composition of the still-mysterious lncRNA genes is only now becoming feasible.

So, what are we talking about here? The point is simple. Function in non coding DNA can be linked to specific structures in RNA transcripts, and those structures, and therefore their function, can be conserved across species even in the absence of sequence conservation. Why? Because the sequence/structure/function relationship in this kind of molecule is completely different from what we observe in proteins, and we still understand very little of those issues.
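
To make the idea concrete, here is a minimal sketch of how one RNA secondary structure can be shared by two sequences that have nothing in common, position by position. Everything in it is an assumption for illustration: the hairpin, the two toy sequences and the function names are invented, and the check only asks whether the bracketed positions could base-pair. It does not predict how a real transcript folds.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly allowed in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def paired_positions(structure):
    """Return (i, j) index pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return pairs


def is_compatible(sequence, structure):
    """True if every bracketed pair in the structure can base-pair in the sequence."""
    return all((sequence[i], sequence[j]) in PAIRS
               for i, j in paired_positions(structure))


def identity(seq_a, seq_b):
    """Fraction of aligned positions carrying the same nucleotide."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


hairpin = "((((....))))"   # hypothetical 4-bp stem closing a 4-nt loop
seq_a = "GCGCAAAAGCGC"     # toy sequence, not a real lncRNA
seq_b = "AUAUGGGGAUAU"     # toy sequence with every position substituted

print(is_compatible(seq_a, hairpin), is_compatible(seq_b, hairpin))
print(identity(seq_a, seq_b))
```

Both toy sequences are compatible with the same stem, because each substitution on one side of the stem is matched by a compensating substitution on the other side, yet a sequence alignment would score them as unrelated.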

As the authors say:

In contrast to microRNAs, almost all of which are post-transcriptional repressors, the diverse functions of lncRNAs include both positive and negative regulations of protein-coding genes, and range from lncRNA:RNA and lncRNA:protein to lncRNA:chromatin interactions [8–11]. Due to this functional diversity, it seems reasonable to presume that different evolutionary constraints might be operative for different RNAs, such as mRNAs, microRNAs, and lncRNAs.

Which is exactly my point.

The authors examine a few cases where the sequence/structure/function relationship of some lncRNAs has been studied in more detail. They conclude:

Tens of thousands of human lncRNAs have been identified during the first genomic decade. Functional studies for most of these lncRNAs are however still lacking with only a handful having been characterized in detail [8,10,11,87]. From these few studies it is apparent that some lncRNAs are important cellular effectors ranging from splice complex formation [34] to chromatin and chromosomal complex formation [43,46] to epigenetic regulators of key cellular genes.

—

It is becoming increasingly apparent that lncRNAs do not show the same pattern of evolutionary conservation as protein-coding genes. Many lncRNAs have been shown to be evolutionary conserved [5]; but they do not appear to exhibit the same evolutionary constraints as mRNAs of protein-coding genes.

—

While certain regions of the lncRNAs appear to maintain the regulatory function, such as bulges and loops, the exact sequence in other regions of lncRNAs appear less important and possibly act as spacers in order to link functional units or modules. Depending on the function, e.g., whether the RNA sequence is a linker or a functional module, different patterns of conservation might be expected.

It is important to remember that lncRNA genes are only a part of non coding DNA. If someone wonders how big a part, I would suggest the following paper:

The Vast, Conserved Mammalian lincRNome

which estimates human lncRNA genes at about 53,649 genes, more than twice the number of protein coding genes, corresponding to about 2.7% of the whole genome (Figure 2). It’s an important part, but only a part. And it is a part which, while probably functional in many cases, still is poorly conserved at sequence level.

Other parts of the non coding genome will have different types of function, structure, and therefore sequence conservation. For example, the following paper:

Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

argues that most conserved non coding regions (about 3.5% of the genome, conserved across vertebrate phylogeny, strongly suggesting its functional importance, which clusters into >700 000 unannotated conserved islands, 90% of which are <200 bp) “serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers”, rather than encoding non-coding RNAs. IOWs, these short sequences in the non coding genome which make up another 3.5% of the total would be functional not because of their RNA transcript, but directly as binding sites (enhancers and other distal regulatory elements). Now, these sequences are conserved. That proves the general point: different functions, different relationship between sequence and function, different conservation of functional elements. In general, it seems that function which expresses itself through non coding RNA transcripts is less conserved at sequence level.
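
As a rough sanity check on those figures, the arithmetic can be sketched as follows. The genome size is my own assumption (an approximate haploid human genome, not a number from either paper), and I am treating the whole 3.5% as if it were spread over the 700 000 islands, so the result is only an order-of-magnitude estimate.

```python
GENOME_BP = 3.1e9            # assumption: approximate haploid human genome size
conserved_fraction = 0.035   # conserved non coding fraction quoted above
island_count = 700_000       # quoted as ">700 000", so this is a lower bound
lncrna_fraction = 0.027      # lncRNA gene fraction quoted from the lincRNome paper

# Total conserved non coding sequence, in base pairs.
conserved_bp = GENOME_BP * conserved_fraction

# Average island length if that sequence were divided evenly among the islands.
mean_island_bp = conserved_bp / island_count

# Sequence covered by lncRNA genes, for comparison.
lncrna_bp = GENOME_BP * lncrna_fraction

print(f"conserved non coding: about {conserved_bp / 1e6:.0f} Mb")
print(f"mean island length:   about {mean_island_bp:.0f} bp")
print(f"lncRNA genes:         about {lncrna_bp / 1e6:.0f} Mb")
```

Under these assumptions the average island comes out at roughly 155 bp, which sits comfortably with the statement that 90% of the islands are shorter than 200 bp: these are short elements of the size expected for binding sites, not whole genes.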

And now, the last point, maybe the most important of all.

4) Function which requires non conservation of sequence.

When we analyze conservation of sequences across species as an indicator of function, we are forgetting a fundamental point: in the course of natural history, species change, and function changes with them.

IOWs, the reason why species are different is that they have different molecular functions.

So, there is some implicit contradiction in equating conservation with function. A conserved sequence is very likely to be functional, but it is not true that a function needs a conserved sequence, if it is a new function, or a function which has changed.

Now, we know that protein coding genes have not changed a lot in the last parts of natural history. It is usually recognized that the greatest change, especially in more recent taxa, is probably regulatory. And the functions which have been identified in various parts of non coding DNA are exactly that: regulatory.

So, to sum up:

– Species evolve and change

– The main tool for that change is, realistically, a change in regulatory functions

– If a function changes, the sequences on which the function is based must change too

– Therefore, those important regulatory functions which change for functional reasons will not be conserved across species

This point is different from the previous point discussed here.

In point 3, the reasoning was that the same function can be conserved even if the sequence changes, provided that the structure is conserved.

In point 4, we are saying that in many cases the sequence must change for the function to change with it.

Now, although this reasoning is quite logical and convincing, I will try to back it up with empirical observations. To that purpose, I will use two different models: HARs and the results of the recent FANTOM5 paper about the promoterome.

4a) Human Accelerated Regions (HARs).

What are HARs? Let’s take it from Wikipedia:

Human accelerated regions (HARs), first described in August 2006, are a set of 49 segments of the human genome that are conserved throughout vertebrate evolution but are strikingly different in humans.

IOWs, they are sequences which were conserved in primates, and which change in humans.

Are they functional? That’s what is believed for some of them. Again, Wikipedia:

Several of the HARs encompass genes known to produce proteins important in neurodevelopment. HAR1 is an 106-base pair stretch found on the long arm of chromosome 20 overlapping with part of the RNA genes HAR1F and HAR1R. HAR1F is active in the developing human brain. The HAR1 sequence is found (and conserved) in chickens and chimpanzees but is not present in fish or frogs that have been studied. There are 18 base pair mutations different between humans and chimpanzees, far more than expected by its history of conservation.^[1]

HAR2 includes HACNS1 a gene enhancer “that may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs”. Evidence to date shows that of the 110,000 gene enhancer sequences identified in the human genome, HACNS1 has undergone the most change during the evolution of humans following the split with the ancestors of chimpanzees.^[4] The substitutions in HAR2 may have resulted in loss of binding sites for a repressor, possibly due to biased gene conversion
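
To get a feeling for what "far more than expected" means for HAR1, here is a small back-of-the-envelope sketch using the two numbers quoted above (106 bp and 18 differences). The background divergence is my own assumption, a commonly cited rough genome-wide figure for human versus chimpanzee; it is not taken from the Wikipedia entry.

```python
har1_length_bp = 106           # as quoted above
har1_substitutions = 18        # human vs chimpanzee differences, as quoted above
background_divergence = 0.012  # assumption: rough genome-wide human-chimp divergence

# Fraction of HAR1 positions that differ between the two species.
observed_divergence = har1_substitutions / har1_length_bp

# Differences expected in a stretch of this length evolving at the background rate.
expected_substitutions = har1_length_bp * background_divergence

# How many times more differences are observed than expected.
fold_excess = har1_substitutions / expected_substitutions

print(f"observed divergence: {observed_divergence:.1%}")
print(f"expected differences at background rate: {expected_substitutions:.1f}")
print(f"fold excess: {fold_excess:.0f}x")
```

With these inputs, about 17% of the positions differ, where the background rate would predict only one or two differences. And the comparison is, if anything, conservative: a sequence which had been conserved throughout vertebrate evolution would be expected to change less than the genome average, not more.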

Now, for brevity, I will not go into details, but… “active in the developing human brain” and “may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs” are provocative enough thoughts, and I believe that I don’t need to comment on them.

The important point is: what makes us humans different from chimps? Logic says: something which is different. Not something which is conserved.

4b) The results from FANTOM5 about the promoterome.

FANTOM5 has very recently published a series of papers with very important results. One of the most important is probably the following article in Nature:

A promoter-level mammalian expression atlas

Unfortunately, the article is paywalled. I have access to it, so I will try to sum up the points which are needed for my reasoning.

So, what did they do? In brief, they used a very powerful technology, cap analysis of gene expression (CAGE), to study various aspects of the transcriptome in different human cells from different tissues and states. This is probably the most important analysis of the human transcriptome ever realized.

This particular paper focuses on a “promoter atlas”, IOWs an atlas of the expression of promoters (transcription start sites, TSSs, which control the transcription of target genes) in different tissues.

So, according to the level of expression of those promoters in different tissues and cells, they classify genes (both protein coding and non protein coding) in:

– ubiquitous-uniform (‘housekeeping’, 6%): those genes which are expressed at similar levels in most cell types

– ubiquitous non-uniform (14%): expressed in most cell types, but at different levels

– non-ubiquitous (cell-type restricted, 80%)

Each of those types includes both C (protein coding genes) and N (non protein coding genes).
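
The logic of such a three-way classification can be sketched in a few lines. Please note that this is only an illustration: the function name, the expression units and all three thresholds are hypothetical placeholders chosen by me, not the criteria used in the FANTOM5 paper.

```python
from statistics import median


def classify_promoter(expression, detect=1.0, breadth=0.5, max_fold=10.0):
    """Classify a promoter from a mapping of cell type to expression level.

    All three thresholds are hypothetical placeholders:
    detect   - minimum level to count a promoter as expressed in a cell type
    breadth  - minimum fraction of cell types in which it must be expressed
    max_fold - maximum ratio of the highest level to the median level
    """
    levels = list(expression.values())
    detected = [x for x in levels if x >= detect]
    if len(detected) / len(levels) < breadth:
        return "non-ubiquitous"
    if max(levels) <= max_fold * median(levels):
        return "ubiquitous-uniform"
    return "ubiquitous non-uniform"


# Toy expression profiles in arbitrary units, invented for illustration.
housekeeping = {"fibroblast": 20, "neuron": 25, "hepatocyte": 18, "T-cell": 22}
variable = {"fibroblast": 2, "neuron": 300, "hepatocyte": 3, "T-cell": 4}
restricted = {"fibroblast": 0, "neuron": 150, "hepatocyte": 0, "T-cell": 0}

for name, profile in [("housekeeping", housekeeping),
                      ("variable", variable),
                      ("restricted", restricted)]:
    print(name, "->", classify_promoter(profile))
```

The first question is breadth (in how many cell types is the promoter active at all?), and only for broadly expressed promoters does the second question, uniformity of the level, come into play.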

Now, that’s very interesting. We now know that most genes (80%), both coding and non coding, are expressed only in some cell types.

But the most interesting thing, for our discussion about conservation, is that they studied the promoter expression both in human cells and in other mammals.

Now, we must look at Figure 3 in the paper. For those who cannot access the article, there is a low resolution version of this figure here (just click on Figure 3 in the “at a glance” box; OK, OK, it’s better than nothing!).

The figure is divided into two parts, a and b. In each part, the x axis shows the evolutionary divergence from humans (from 0 to 0.8; the grey vertical lines correspond to macaque, dog and mouse). The y axis shows “Human TSS with aligning orthologous sequence (%)”, IOWs the % conservation of each group of genes in the graph at various points of evolutionary divergence. Each line represents a different group of genes. So, the lines which remain more “horizontal” represent groups of genes which are more conserved, while those which “go down” from left to right are those less conserved. I hope it’s clear.

On the left (part a) genes are grouped as above: ubiquitous-uniform, etc, each category divided into C or N (coding or non coding).

What are the conserved groups? In order: Non-ubiquitous C (green line); Ubiquitous uniform C (orange line); Ubiquitous non-uniform C (purple line).

IOWs, coding genes are more conserved, and non ubiquitous are most conserved.

That is not news.

Conversely, non coding genes are less conserved, in this order: Non-ubiquitous N (lighter green); Ubiquitous non-uniform N (lighter purple); Ubiquitous uniform N (lighter orange). This last line is definitely less conserved than the random reference (the dotted line).

This part is “Conservation by expression breadth and annotation”.

Well, what is on the right (part b)? It is “Conservation by cell-type biased expression”.

IOWs, the graph is the same, but genes are grouped in different lines according to the cell type where they are preferentially expressed.

The most conserved groups? Those with preferential expression in: Fibroblast of periodontium, Fibroblast of gingiva, Preadipocyte, Chondrocyte, Mesenchymal cell.

The least conserved? Those with preferential expression in: Astrocyte, Hepatocyte, Neuron, Sensory epithelial cell, Macrophage, T-cell, Blood vessel endothelial cell. In decreasing conservation order.

Does that mean something? I leave it to you to decide. For me, I definitely see a pattern. With all due respect for fibroblasts and adipocytes, neurons and T cells smell more of specialized cells which must change in higher taxa (excuse me, Piotr, mice will accuse me of not being politically correct).

So, my humble suggestion is: the things that change more are not necessarily those less functional. In many cases, they could be exactly the opposite: the bearers of new, more complex functions.

And non coding genes are very good candidates for that role.

Comments

rhampton7: I think we have already discussed that in the past. The 500 bit threshold refers to the number of bits necessary to implement one function. Individual regulatory functions do not accumulate by chance to give a complex metafunction, no more than individual AA mutations accumulate to build a complex functional protein. In my identification of intelligent design, which IMO is in no way different from the general identification of intelligent design, it is the specified (functional) complexity that counts, not the accumulation of simpler, independent traits. A specific sequence of AAs is as rare as a specific accumulation of regulatory elements which contribute to a general, specific function which in no way can be expressed in a simpler way. Dog breeds may differ for an accumulation of simpler regulatory differences. I have no problem with that. In that case, there is no reason to infer design. So, if a dog breed differs from another one only because of different sizes of individual body parts, or the color of this or that part, and so on, maybe those differences are generated by random variation (or human selection of random variation, in the case of dogs). But if a dog breed has a completely different enzyme chain, which is not present in the genome of another breed, the origin of that enzyme chain is design. Obviously, that part of genome could already exist and simply have been transferred by HGT or something like that, but the origin of the complex functional molecules is design. In the same way, even if you have a complex cascade of relatively simpler elements (such as short peptides) which interact specifically to give a complex sequence of regulatory functions, and the whole system is irreducibly complex, then you can infer design for the whole system, even if some of the individual components are not complex enough for the inference.
gpuccio, May 29, 2014, 11:35 AM PDT

Dionisio: I just found that wonderful source (on another thread), just to see that you had already pointed to it here. Thank you. The web resource is particularly interesting, allowing us to query for individual protein expression.
gpuccio, May 29, 2014, 11:23 AM PDT

in this kind of regulatory networks, the complexity is mainly of the whole system
gpuccio, This is where your identification of intelligent design, I believe, may result in false-positives among different dog breeds if the 500 bit (250bp) threshold may be reached by the accumulation of many individual "regulatory function[s] of a short fragment[s] of mRNA" that comprise a regulatory network. It seems logical that most of the changes in the genotypes between breeds should be in regulatory regions, and that these changes could easily surpass 250bp especially for older breeds.
rhampton7, May 28, 2014, 05:59 PM PDT

Dionisio, excellent find! Human Proteome Project Finds 193 Previously Unknown Proteins - 05/28/2014 Excerpt: Striving for the protein equivalent of the Human Genome Project, an international team of researchers has created an initial catalog of the human “proteome,” or all of the proteins in the human body. In total, using 30 different human tissues, the team identified proteins encoded by 17,294 genes, which is about 84 percent of all of the genes in the human genome predicted to encode proteins. In a summary of the effort, to be published today in the journal Nature, the team also reports the identification of 193 novel proteins that came from regions of the genome not predicted to code for proteins, suggesting that the human genome is more complex than previously thought. ,,, “You can think of the human body as a huge library where each protein is a book,” said Akhilesh Pandey, ,,, “The difficulty is that we don’t have a comprehensive catalog that gives us the titles of the available books and where to find them. We think we now have a good first draft of that comprehensive catalog.”,,, The team’s most unexpected finding was that 193 of the proteins they identified could be traced back to these supposedly noncoding regions of DNA. “This was the most exciting part of this study, finding further complexities in the genome,” said Pandey. “The fact that 193 of the proteins came from DNA sequences predicted to be noncoding means that we don’t fully understand how cells read DNA, because clearly those sequences do code for proteins.” Pandey believes that the human proteome is so extensive and complex that researchers’ catalog of it will never be fully complete, but this work provides a solid foundation that others can reliably build upon. 
http://www.biosciencetechnology.com/news/2014/05/human-proteome-project-finds-193-previously-unknown-proteins?et_cid=3964356&et_rid=653535995&type=cta
Human Proteome Mapped - By Anna Azvolinsky | May 28, 2014 Excerpt: Both studies identified evidence to suggest there is translation from DNA regions that were not thought to be translated—including more than 400 translated long, intergenic non-coding RNAs (lincRNAs)—found by the Küster team—and 193 new proteins—uncovered by the Pandey team.
http://www.the-scientist.com/?articles.view/articleNo/40083/title/Human-Proteome-Mapped/
bornagain77, May 28, 2014, 05:46 PM PDT

In a summary of the effort, to be published today in the journal Nature, the team also reports the identification of 193 novel proteins that came from regions of the genome not predicted to code for proteins, suggesting that the human genome is more complex than previously thought. The cataloging project, led by researchers at The Johns Hopkins University and the Institute of Bioinformatics in Bangalore, India, should prove an important resource for biological research and medical diagnostics, according to the team’s leaders. http://www.biosciencetechnology.com/news/2014/05/human-proteome-project-finds-193-previously-unknown-proteins?et_cid=3964356&et_rid=653535995&type=cta
Dionisio, May 28, 2014, 05:45 PM PDT

gpuccio, Did you see this report? http://www.biosciencetechnology.com/news/2014/05/human-proteome-project-finds-193-previously-unknown-proteins?et_cid=3964356&et_rid=653535995&type=cta
Dionisio, May 28, 2014, 05:08 PM PDT

This is a beautiful example of important function which can be sequence conserved across species (as in tubulin) or definitely not conserved (as in centrosomin). Which was, I believe, one of the main points of discussion in this thread.
Excellent conclusion! Many thanks! Glad to see that the amazing complexity of the spindle dynamical control serves as a good illustration for the interesting topic of function and sequence conservation that was discussed in this thread.
Dionisio, May 26, 2014, 05:44 AM PDT

Dionisio: Really interesting! The complexity of the spindle dynamical control is mind blowing. But, obviously, it is based on tons of "simpler" complexity. Just as an example, tubulin molecules appear in eukaryotes, and they are highly conserved:
Alpha Tubulin:
Plasmodium falciparum: 453 AAs
Human: 450 AAs
Identities: 379/450 (84%)
Positives: 419/450 (93%)
On the contrary, centrosomin, the really important protein in this paper, is, in drosophila, a 1320 AAs long protein (!), which is scarcely conserved in other species. This is a beautiful example of important function which can be sequence conserved across species (as in tubulin) or definitely not conserved (as in centrosomin). Which was, I believe, one of the main points of discussion in this thread.
gpuccio, May 26, 2014, 01:25 AM PDT

#117 This was implicit in the previous link, but here it is explicitly: http://www.cell.com/developmental-cell/abstract/S1534-5807(14)00163-4
Dionisio, May 25, 2014, 08:44 PM PDT

gpuccio, Info related to the spindle apparatus mechanisms operating on the intrinsic asymmetric mitosis: http://www.cell.com/developmental-cell/abstract/S1534-5807(14)00129-4
Dionisio, May 25, 2014, 08:39 PM PDT

Are you comparing our revered interlocutors to swans?
No, swans are nice looking birds. I like the elegant way they swim, but like more how they fly over the lake. Also, they don't argue stubbornly ;-) The swan example was about illustrating the nature of functional information. The air pressure waves associated with the music sound are converted to electrical impulses that activate specialized parts of the brain that make us hear the music. However, different individuals may react differently to the music. Some will recognize a choreography associated with the music and will act accordingly, depending on the scenario context. Others won't know what to do with it. Some will consider it an unpleasant noise. Is the functionality of any information partially revealed by the effects it causes on the system that contains it? Changing subject, check this out: http://mail.cell-press.com/go.asp?/bECE001/mKPUFB6F/uDM4512F/xMU8OB6F
Dionisio, May 24, 2014, 07:35 PM PDT

rhampton7: Without noticing the date, I responded in depth to a question of yours on another thread. Wondering why you had not replied, I looked again at the thread, and see the date was 2011. My error. Anyhow, if you are interested in getting an answer to your old question (mine is the only one), you can find it at: https://uncommondescent.com/christian-darwinism/observant-jew-prager-takes-on-christian-darwinist-giberson/ If you want to reply there, you are welcome, and I'll read what you have to say.
Timaeus, May 24, 2014, 11:25 AM PDT

Dionisio: Are you comparing our revered interlocutors to swans? Could have been worse, after all. :)
gpuccio, May 24, 2014, 09:52 AM PDT

Has anyone ever been to a classical music concert or ballet presentation in a theater? Assuming one doesn't arrive late, one may hear the orchestra tuning their instruments. Does it sound good? No, it's a terrible cacophony! However, when the same musicians play the same notes on the same instruments, but following their individual sheets under the direction of the conductor, then beautiful music fills the theater. What's the difference? Timing, location, arrangement, synchronization, inspiration, composition, orchestration, are words that come to our minds. Why? Now, add to all that a ballet choreography and see the dancers appear on stage, move on the stage, leave the stage, while the orchestra plays a musical composition, specially written for the given scenario and choreography. Now things got more complicated. However, the same principles apply. All the components of such a complex event have their individual function(s) to perform. Some components react to external signals in order to start and perform their part, according to pre-established protocols, based on the original idea of the main composer and the main choreographer. Each dancer knows what to do in reaction to the surrounding environment. However, try playing Tchaikovsky's 'Swan Lake' on loudspeakers placed near a lake where swans calmly swim and watch how they react to the loud sound of the famous musical composition they inspired. Compare to any video of that ballet, and see if there's any similarity. Now, someone might conclude that the swans are deaf, like the frog without legs. Maybe they are, who knows? Hard to tell ;-)
Dionisio, May 24, 2014, 07:58 AM PDT

Dionisio: Thank you for your comments. I appreciate them very much. And I obviously agree. The fun part is still ahead. :)
gpuccio, May 24, 2014, 06:19 AM PDT

#106
...in this kind of regulatory networks, the complexity is mainly of the whole system.
Yes! Exactly! Well stated. The functionality of the whole is not the sum of the functionality of the individual components. Timing and location are important factors that make a big difference. The individual components by themselves may not mean much if taken out of the context they are found in. This applies to any informational system. The particular physical or chemical properties of every individual component are used in the right way, at the right time, in the right location, in order to cause a desired effect, that combined with the actions of the other components, produce the designed system-wide result. Reductionist approaches don't work well when analyzing complex systems. Simplistic reasoning can lead to 'frog without legs' stories. That's pseudo-science voodoo. How long did it take to figure out that regulatory networks require controlling mechanisms that could be located anywhere in the system? Did we look everywhere from the beginning? Did we keep some areas of the system out of the investigation at the beginning? Did we discover things 'by accident' because we did not expect them to be where they are or to look as they do or to function as they do? Did we let preconceived ideas determine what parts of the systems should be investigated and how to interpret the results? We better watch out. We better not cry. Santa Claus may not be coming to town, but an overwhelming amount of data is coming out of research, and science must handle it well. New discoveries will shed more light on the complex biological systems, answering outstanding questions, but raising new ones. The party has just started. The fun part is still ahead ;-) Let's enjoy it!
Dionisio, May 24, 2014, 05:42 AM PDT

#108
I am certainly not thinking of “source code written by a human programmer”.
Humans have not created any information processing system comparable -in functionally creative power- just to the first few weeks of the human development process. Not even in our wildest imagination. Not even in theory. Not even close. Computer scientists and engineers are humbly fascinated and attracted by the biological systems described by researchers these days. That's why we find electrical engineers and computer scientists working along with biologists on complex research projects at different universities. Perhaps we should approach this and other similar discussions with the same humility and fascination? Let's think out of the box. But let's think well. Let's not rush into premature conclusions.Dionisio_{May 24, 2014
03:19 AM PDT}

“There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.”
Dionisio_{May 24, 2014
02:36 AM PDT}

Piotr: You say:
No explicit procedures are necessary. “Procedures” emerge from the interactions or regulatory elements. Thinking of DNA in terms of source code written by a human programmer is not very helpful.
This is an important point, and one where your way of thinking is definitely different from mine. First of all, I don't quite understand what you mean by "explicit". I am certainly not thinking of "source code written by a human programmer". My only point is that the procedures must be there, and they must be working. And to be working, they must be implemented in some form; IOWs, there must be functional information and functional complexity linked to them.
I will try to be clearer. If, as a human programmer, I write some source code in C, and then I compile it, I have two different objects: a) The source code, which is typically in "human friendly" form (well, that is not always true for C!). b) The compiled code, which is the working form. Now, you can call a) explicit and b) implicit. I don't know if that is what you mean. I only know that b) must be there for the software to work. And b) requires digital information of some kind.
Let's take an example. From the one genome, we get in humans about 500 basic transcriptomes. That is the foundation of our multicellular form. How does that happen? Some algorithm must be at work; otherwise, how could such different patterns arise from the same set of genes? You say that the procedures "emerge from the interactions or regulatory elements". Which interactions? The regulatory elements are coded in the genome, and the genome is the same in all cells. So, what happens? How do different, complex patterns of regulation emerge from the same genome in different cells, at different times, and in definite functional order? It is certainly possible, but not without a lot of information: information which generates the differences, guides them, makes algorithmic "decisions" (not in a conscious sense) and implements definite "strategies" (again, not in a conscious sense). If we accept, at least as a first hypothesis, that such information is in the genome, we must ask ourselves where it is, and in what form.
It's not enough to state that it is not "explicit", or that it magically "emerges from the interactions or regulatory elements" (what in the world does that mean?) to evade that important problem. Information, in the form of digital sequences, requires bits, physical bits, to be implemented. What are those bits? How do they perform the task? Those are legitimate questions. Your answers are not answers. Just to conclude, in a despicable attack of narcissism, I quote myself: "There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood."
gpuccio_{May 24, 2014
01:05 AM PDT}
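
A small sketch of the source/compiled distinction drawn in the comment above. It uses Python rather than C purely for brevity; the function `double` and the `"<example>"` filename are made up for illustration, and nothing here is meant as a model of the genome. It only shows that the working form of a procedure is itself a concrete sequence of bytes.

```python
# Illustrative only: one procedure in "human friendly" form and in working form.
SOURCE = "def double(x):\n    return 2 * x\n"    # a) the source text

code_obj = compile(SOURCE, "<example>", "exec")   # b) the compiled, executable form
namespace = {}
exec(code_obj, namespace)       # running the compiled module defines double()
double = namespace["double"]

# The raw bytes the interpreter executes for double(); their exact content
# and length depend on the Python version.
bytecode = double.__code__.co_code

print(double(21))                          # 42
print(len(bytecode), "bytes of bytecode")
```

The source string can be thrown away after compilation and `double` still runs, which is the sense in which b) "must be there" while a) need not be.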

Mung: Thank you for your interesting comments (and for the blessing!). I must confess that I do not have a clear idea about the possible role of NS in the fixation of designed functional elements. It could have one, but your comments about the "population genetics" problems are certainly interesting. Personally, I don't think that most designed features are lost by drift. There must be some mechanism to rescue them, IMO. Design is too precious (and difficult) to waste beyond some acceptable level. Part of it, however, could really be lost. And part of it, as I have said, can certainly fail even after its fixation and rather long existence (see Ediacara).
gpuccio_{May 24, 2014
12:40 AM PDT}

Piotr at #99: The issue about procedures is certainly the most important, so I will leave it for the next post. Here, I will briefly answer two minor points. You say:
Note that, since the fragment in question is derived from an mRNA, it doesn’t really increase the amount of “non-junk”. How do you extrapolate from one such case to a whole realm of hitherto unknown molecules? All the known types of functional RNAs, long as well as short, still add up to a few percent of the human genome (a pretty generous estimate).
OK, but I never argued that it increased the "non junk" part. Dionisio linked the article, and I commented on it. I believe, however, that it can be considered indirectly related to the discussion. It is, at least, an interesting clue to how function can be implemented in the general scenario.
As a side note: the RNA fragment in question is an 18-mer (and most miRNAs are about 22nt long). Since they are so short and there are so many of them, and they are functional, there must be a reasonable probability that a randomly generated short RNA sequence can be co-opted for a regulatory function. They don’t have to be designed by a bodiless intelligence.
I am well aware that the shorter the molecule, the lower the functional complexity of the sequence. However, in this kind of regulatory network, the complexity is mainly that of the whole system. IOWs, we are closer to a situation of irreducible complexity of the system, where the role of the short molecule is mainly that of a signal (or maybe of interference). Even if the molecule is rather simple, its role is functional only if it is integrated in a general regulation pattern. More about the procedures in the next post.
gpuccio_{May 24, 2014
12:34 AM PDT}
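
For readers who want the numbers behind "the shorter the molecule, the lower the functional complexity", here is a minimal sketch of the arithmetic. It only counts how many distinct sequences of a given length exist and the upper bound on information per sequence (2 bits per nucleotide). It says nothing about what fraction of those sequences is functional, which is the quantity actually in dispute. The function names are made up for this example.

```python
import math

def sequence_space(length, alphabet_size=4):
    """Number of distinct sequences of the given length (4 nucleotides by default)."""
    return alphabet_size ** length

def max_bits(length, alphabet_size=4):
    """Upper bound on information content: log2(alphabet_size) bits per position."""
    return length * math.log2(alphabet_size)

for n in (18, 22):   # the 18-mer under discussion and a typical miRNA length
    print(f"{n}-mer: {sequence_space(n):,} possible sequences, "
          f"at most {max_bits(n):.0f} bits")
# 18-mer: 68,719,476,736 possible sequences, at most 36 bits
# 22-mer: 17,592,186,044,416 possible sequences, at most 44 bits
```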

wd400, it sounds like you don't understand computer science at all. Have you ever worked on a computer at the level of its integrated circuitry, i.e. the component level?
Joe_{May 23, 2014
08:41 PM PDT}

Joe, I understand comp. sci. pretty well, certainly enough to know it's a terrible model for understanding biology. What I don't understand is what you are talking about.
wd400_{May 23, 2014
08:26 PM PDT}

gpuccio:
I am a design believer, as I believe you are. The point of interest for me is how the functional result originates, and I think I have the general answer in the design paradigm.
Today I considered the possibility that I am a design believer because it is easier to argue on the internet than it is to get out and get my hands "dirty" in the real world. Hoping to allow the Spirit to water that little seed. Not that I'll stop being a design believer! :) I believe in God the Creator of Heaven and Earth. To me creation implies design. Period. How could it not? gpuccio:
Frankly, I am less interested in how the functional design becomes fixed, if we admit that it starts in a limited part of the population.
Understood. But that's still an interesting thing to think about, especially when considering scenarios offered by evolutionists. And I was responding to Piotr. To fix a particular allele in the population by natural selection requires that there be reproductive excess, as Haldane showed, as Walter ReMine helped clarify, and as Nei admits in his latest book, Mutation-Driven Evolution. Piotr was claiming that he rejects the "drift" explanation as being "too slow" and reasons that the fixation therefore "must have been due to natural selection." That's just plain faulty reasoning. Given all this recent talk of neutral theory and drift and natural selection and their roles in evolution, I thought it pertinent. gpuccio:
I suppose that NS can have a role in that. I am sure that drift would lose most of the designed traits in favor of useless variation. I don’t think that is a good way to “sell” designed things on the market!
But that's all the evolutionists have, lol! And that's their "answer" to intelligent design. In the models of population genetics, most new "designs" get lost, regardless of whether they are beneficial or not. So given their models, there must be far more beneficial mutations being offered up by "the engines of variation" than they are willing to let on. Else evolution doesn't stand a snowball's chance in Venice in August. And of course, any scenario rife with beneficial mutations just begs for design. Enough for now. God bless!
Mung_{May 23, 2014
07:56 PM PDT}
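
The remark above that most new variants are lost even when they are beneficial matches the standard textbook result. Below is a minimal sketch, assuming Kimura's diffusion approximation for a single new mutation in a diploid population whose effective size equals its census size N. The function name and the parameter values are illustrative and are not taken from the thread.

```python
import math

def fixation_probability(s, n):
    """Probability that one new mutant copy (initial frequency 1/(2N)) is
    eventually fixed, under Kimura's diffusion approximation with Ne = N.

    s: selection coefficient of the mutant (0 means neutral)
    n: diploid population size N
    """
    if s == 0:
        return 1.0 / (2 * n)            # neutral case: pure drift
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * n * s))

N = 10_000
for s in (0.0, 0.001, 0.01):
    p = fixation_probability(s, N)
    print(f"s = {s}: P(fixation) = {p:.5f}, P(loss) = {1 - p:.5f}")
```

For small positive s in a large population the result is close to Haldane's approximation of 2s, so a mutation with a 1% advantage is still lost about 98% of the time in this model.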

Piotr:
If the genome is a RAM, it means that biological processes can write to it and change its content.
I didn't say the genome was RAM.
Do you realise what you are proposing?
Absolutely. I don't think you do, though.
Joe_{May 23, 2014
07:46 PM PDT}

wd400:
Well, I still don’t understand it, Joe; sounds a bit like magic to me.
So computer science is magic to you. Nice to know. Thanks.
Joe_{May 23, 2014
07:44 PM PDT}

The Bladderwort and the Pufferfish
Would it serve Pufferfish and Bladderwort chips?
Mung_{May 23, 2014
06:39 PM PDT}

Gpuccio:
So please, clarify what is your position about the procedures: do you simply deny that they exist? And if they exist, where and how do you hypothesize they are written?
No explicit procedures are necessary. "Procedures" emerge from the interactions or regulatory elements. Thinking of DNA in terms of source code written by a human programmer is not very helpful.
So I ask you: do you agree that RNA genes, if functional, express their function in ways that are not yet very clear?
We understand how some of them work, and we know precious little about others. No disagreement here.
That the relationship between nucleotide sequence and structure and function is something which still needs to be studied? That it is something completely different from the more known, but not yet completely understood, field of protein structure and function? Am I imagining those things?
So far, so good. Functional RNAs are important, interesting, and insufficiently understood.
So I ask you: do you agree that the regulatory function of a short fragment of mRNA is something new, that up to now has escaped our detection?
I'm not an expert, but this particular little fellow does seem to be something new, given its untypical origin and its target (though functional miRNAs in general are hardly big news).
Do you agree that, if that is not an isolated case, there could be a lot operating at the level of transcription regulation by short molecules, of which we still see only a tiny part? Is it imagination to hypothesize that, in the light of many recent discoveries, including the one linked by Dionisio?
Note that, since the fragment in question is derived from an mRNA, it doesn't really increase the amount of "non-junk". How do you extrapolate from one such case to a whole realm of hitherto unknown molecules? All the known types of functional RNAs, long as well as short, still add up to a few percent of the human genome (a pretty generous estimate). As a side note: the RNA fragment in question is an 18-mer (and most miRNAs are about 22 nt long). Since they are so short and there are so many of them, and they are functional, there must be a reasonable probability that a randomly generated short RNA sequence can be co-opted for a regulatory function. They don't have to be designed by a bodiless intelligence.
Piotr_{May 23, 2014
04:49 PM PDT}
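
One way to make the phrase "procedures emerge from the interactions" concrete is a toy Boolean network. The sketch below is hypothetical throughout: the genes A to D and their update rules are invented for illustration and do not describe any real regulatory circuit. It shows only that one fixed set of rules can settle into different stable expression patterns depending on the starting state. It does not address where the rules come from or how much information they require, which is the point under dispute in this thread.

```python
# Toy model: A and B repress each other; C and D are downstream targets.
# The same RULES apply in every "cell"; only the initial state differs.
RULES = {
    "A": lambda s: not s["B"],   # A is on unless B is on
    "B": lambda s: not s["A"],   # B is on unless A is on
    "C": lambda s: s["A"],       # C follows A
    "D": lambda s: s["B"],       # D follows B
}

def step(state):
    """Apply every rule once, using the previous state for all genes."""
    return {gene: rule(state) for gene, rule in RULES.items()}

def settle(state, max_steps=20):
    """Iterate until the state stops changing; return None if it never does."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return None

cell_1 = settle({"A": True, "B": False, "C": False, "D": False})
cell_2 = settle({"A": False, "B": True, "C": False, "D": False})
print(cell_1)
print(cell_2)
```

With synchronous updates, a start with A and B both on (or both off) oscillates instead of settling, so `settle` returns `None` for it. That is a property of this particular toy update scheme.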

Piotr: Call it as you like, but I would really encourage you to answer the following points:
a) How does the genome control the development of more than 500 different, specific transcriptomes, each of them expressing about 400 specific transcription factors, plus the structure of organs and tissues, plus the connectome in the brain, plus the immunologic networks, and so on? Do you realize that when I ask where the procedures are, and how they are written, I am not imagining anything, but just exercising the natural human right of trying to understand things? So please, clarify what is your position about the procedures: do you simply deny that they exist? And if they exist, where and how do you hypothesize they are written?
b) When I say: "I don’t think that RNA’s function is really sequence independent, but that we don’t know how much and how it is sequence dependent." I am just trying to clarify my position about the discussion which is taking place between wd400 and Joe. While it is certainly possible that some parts of the genome have sequence independent roles, I think that most functional DNA, including the non coding part, has sequence dependent function. But I have specified that we don't know exactly how that relationship works for non coding DNA, especially for the parts which code for regulatory RNA. That is not my personal fault. It is just that we do not understand enough of the biochemical way in which regulatory RNA acts, nor even of its biochemical structure. That means that there is room to discover new things, not that I am imagining anything. As you can see from my answers to wd400, I am taking his objection about the problem of conservation and of possible mutations very seriously. For variation across species, I really believe that functional constraints to vary can have a very important role. As for wd400's argument about the possibility of detecting a signal of conservation even in current populations, I am afraid I cannot give a technical judgement, at least not at present. However, I am not at all sure that such a signal can be detected in a general way, and I have suggested that it would be more useful to study specific different components of non coding DNA, first of all across species, and if possible also in populations, while always continuing to try to understand directly how the possible functions operate. A general, multifaceted approach to this important issue could give more realistic answers than the couple of papers, certainly interesting, quoted by wd400. So I ask you: do you agree that RNA genes, if functional, express their function in ways that are not yet very clear? That the relationship between nucleotide sequence and structure and function is something which still needs to be studied? That it is something completely different from the more known, but not yet completely understood, field of protein structure and function? Am I imagining those things?
c) When I said: "That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see." I was referring to a very interesting paper, proposed by Dionisio, where important regulatory functions had been demonstrated for a short nucleotide sequence derived from a protein coding mRNA, but completely independent from the protein coding function. My comment is not imagination. It has recently become well known that various kinds of short sequences, both RNA and peptides, have important regulatory functions. That network, which certainly exists, is probably still largely invisible to our technologies, for obvious reasons: many of the methods we have developed focus on longer sequences. I thought I had said that clearly in my post: "There is an important point which should be considered. We are probably really underestimating the importance of short molecules in functional regulation. Both peptides and RNAs. Indeed, while basic biochemical function usually requires long molecules, the regulation of what those long molecules do can easily be obtained by short molecules. That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see." So I ask you: do you agree that the regulatory function of a short fragment of mRNA is something new, that up to now has escaped our detection? Do you agree that, if that is not an isolated case, there could be a lot operating at the level of transcription regulation by short molecules, of which we still see only a tiny part? Is it imagination to hypothesize that, in the light of many recent discoveries, including the one linked by Dionisio?
You say:
The picture that emerges is one of a genome full of functional “dark matter” — an intricate regulatory network, essentially invisible and incomprehensible to us, but presumably involved in some important activities like cortical development.
Yes. Exactly. Is that so strange to you? Those important activities are there, for all to see. Do we really need imagination to ask ourselves how they take place? There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood. You say:
Let’s imagine that such a network (spanning most of the genome) is real, and let’s call it the imaginome ®.
Call it as you like. I don't know if it spans most of the genome. Indeed, I am not really sure that it is in the genome, or entirely in it. I am very open minded about that. But I am sure that such a network exists, that it is real.
What about mutational load? The imaginome would be an easy target for mutations, and one would expect many of them to have deleterious effects.
I can agree with that. My point is that we must understand more of the network, where it is and how it works, before trying to use the mutation argument and gross measures to deny that it exists. Only a correct assessment of the structure and function of the regulatory elements will allow us to measure whether they change, how much they change, and what the results of the changes are. Even for proteins, we often don't understand how sequences which are very different may have similar structure and function. And we know much more about proteins than we do about regulatory RNAs.
I shall have more questions later on.
I am waiting for them. I love questions.
gpuccio_{May 23, 2014
03:18 PM PDT}

Joe: If the genome is a RAM, it means that biological processes can write to it and change its content. Do you realise what you are proposing?
Piotr_{May 23, 2014
01:18 PM PDT}

