Uncommon Descent Serving The Intelligent Design Community

Is functional information in DNA always conserved? (Part one)

Conservation of sequence in the course of natural history has always been considered a sign of function. But does function always coincide with sequence conservation? And are there other important aspects which must be considered? This topic has been discussed recently with some passion here, so I will dedicate a series of two posts to it, in the hope that we can base our discussions on reliable data. I apologize in advance if some of the following discussion is necessarily rather technical.

In general, in evolutionary analysis, conservation is considered a sign of function. Protein coding genes which are more strictly conserved in the course of time are usually considered as having greater functional constraint than those genes which change more. The same is supposed to be true for non coding sequences, although the topic is much more controversial.

So, we start here considering how much of the human genome is conserved, and how that conservation relates to function. These will be the first two points in the discussion.

 

1) How much of the human genome is made of conserved sequences?

Luckily, this is a point which is well understood. After all, conservation can be evaluated objectively by aligning the genomes of different species, and that has already been done with enough precision.

However, it is important to remember that the result can be somewhat different according to how we define conservation, and according to the method we use to measure it. That is perfectly normal.

A very complete paper about sequence conservation in genomes is the following:

Adam Siepel, Gill Bejerano, Jakob S. Pedersen, et al.

“Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes”

In that paper, they evaluate conservation in vertebrate genomes. To make it short, they find that about 4.3% of the human genome is conserved (relative to other vertebrates), while allowing that:

These numbers are somewhat sensitive to the methods used for parameter estimation. Various different methods produced coverage estimates of 2.8% – 8.1% for the vertebrates, 36.9% – 53.1% for the insects, 18.4% – 36.6% for the worms, and 46.5% – 67.6% for the yeasts (see Supplemental material). Note that the vertebrate coverage is similar to recent estimates of 5% – 8% for the share of the human genome that is under purifying selection (Chiaromonte et al. 2003; Roskin et al. 2003; Cooper et al. 2004), despite the use of quite different methods and datasets.

So, we can say that with most methods the percentage of the human genome which is conserved is about 3 – 8%.

If we look carefully at Figure 3 in the same paper, and in particular at the data about vertebrates, we find other interesting information:

a) Protein coding regions (exons, in red) are highly conserved, about 68%, but they are only 18% of the conserved regions.

b) Introns are less conserved, almost 5%, and they are 28.5% of the conserved regions.

c) Unannotated regions (the rest of non coding DNA) are even less conserved, about 2.5%, and they are 41.2% of the conserved regions.

Other gene associated sequences (5’ UTR, 3’ UTR, etc.) represent smaller fractions.

So, there is no doubt that non coding DNA is less conserved than coding DNA (less than 5% versus 68%), but there is also no doubt that most of the conserved DNA is non coding (about 70%).
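For those who want to check the arithmetic, here is a small Python sketch. The percentages are the ones quoted above from Figure 3; the variable names are mine. The last step, which derives the implied size of the coding fraction, is only a rough consistency check: it assumes that the 4.3% figure and the Figure 3 values refer to the same set of conserved elements, which the paper may not guarantee.

```python
# Values quoted in the text from Siepel et al., Figure 3 (vertebrate data), in percent.
share_of_conserved = {
    "exons": 18.0,        # protein coding regions
    "introns": 28.5,
    "unannotated": 41.2,  # the rest of non coding DNA
}
conserved_within_exons = 68.0  # percent of exon bases that are conserved
conserved_genome = 4.3         # percent of the whole genome that is conserved

# Non coding share of the conserved DNA, counting only introns and
# unannotated regions (UTRs and other small classes are left out).
noncoding_share = share_of_conserved["introns"] + share_of_conserved["unannotated"]

# Rough consistency check (an assumption, see the note above):
# conserved exon bases as a percent of the genome, then the exon
# fraction of the genome that this would imply.
conserved_exon_genome = conserved_genome * share_of_conserved["exons"] / 100
implied_exon_genome = conserved_exon_genome / (conserved_within_exons / 100)

print(f"Non coding share of conserved DNA: {noncoding_share:.1f}%")
print(f"Implied coding fraction of the genome: {implied_exon_genome:.2f}%")
```

The first value is the "about 70%" quoted above. The second comes out a little above 1% of the genome, which is in the range usually given for protein coding DNA, so the numbers are at least consistent with each other.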

Another important point is that we are discussing here general conservation. The same paper also analyzes highly conserved elements (HCEs). They cover only 0.14% of the human genome, a much smaller fraction. About 42% of these were in gene coding or gene associated regions, and about 58% in non coding regions.

Finally, there is an even more restricted category, ultra conserved elements (UCEs), with 100% identity, which is described in this paper:

G. Bejerano et al.: “Ultraconserved elements in the human genome”.

Science. 2004 May 28;304(5675):1321-5. Epub 2004 May 6.

There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.

This is an even smaller fraction of the genome.

 

2) Is there functional DNA which is not conserved in the human genome?

Certainly, and a lot of it!

Everybody knows that the ENCODE project has found that most of the human genome is transcribed. That does not necessarily mean that it is functional, as many have pointed out.

A very recent paper from the people at ENCODE (Kellis et al., 2014) discusses the problem of function. It is:

“Defining functional DNA elements in the human genome”

The authors of that paper use three different approaches to infer function in the human genome:

a) Evolutionary approach. That means conservation. They start with what we have already discussed at point 1, but they refer to mammalian conservation, which can be expected to be somewhat higher than vertebrate conservation. They comment:

The lower bound estimate that 5% of the human genome has been under evolutionary constraint was based on the excess conservation observed in mammalian alignments (2, 3, 87) relative to a neutral reference (typically ancestral repeats, small introns, or fourfold degenerate codon positions). However, estimates that incorporate alternate references, shape-based constraint (88), evolutionary turnover (89), or lineage-specific constraint (90) each suggests roughly two to three times more constraint than previously (12–15%), and their union might be even larger as they each correct different aspects of alignment-based excess constraint. Moreover, the mutation rate estimates of the human genome are still uncertain and surprisingly low (91) and not inconsistent with a larger fraction of the genome under relatively weaker constraint (92). Although still weakly powered, human population studies suggest that an additional 4–11% of the genome may be under lineage-specific constraint after specifically excluding protein coding regions (90, 92, 93), and these numbers may also increase as our ability to detect human constraint increases with additional human genomes. Thus, revised models, lineage-specific constraint, and additional datasets may further increase evolution-based estimates.

Now, let’s look at Fig. 1 in the paper, a Venn diagram which sums up the results of a detailed analysis of available data. I have checked the exact numbers on which the figure is based in the Supporting Information file. The purple circle is the protein coding fraction of the genome, about 1.25%. The evolutionarily conserved fraction of the human genome is the red circle, and it is 7.38% of the whole genome. The greater part of it (6.33%) is non coding DNA. That is in good accord with what was reported under point 1.

b) Genetic approach. By that, the authors mean evidence that genetic alterations of the sequence modify the phenotype. This is the “gold standard” of function. It means that function is certainly there.

The subset of the genome for which there is genetic confirmation of function is the green area. I have not found the exact numbers for it in the paper, but I would say that it is about 15%. It can be seen that it somewhat overlaps the conserved circle, but at least 50% of it is not conserved and is not protein coding. As this is the gold standard, we have here a significant portion of non coding DNA which is not conserved while being certainly functional.

c) Biochemical approach. This is the traditional ENCODE approach, the one which indicates possible function in 80% of the genome. It is based on many kinds of biochemical evidence, which are explained in the paper. The blue areas indeed include about 80% of the whole genome.

However, the authors divided the blue area into three subsets, according to the level of activity detected. The dark blue area is the area with a high level of activity. So, let’s consider only that subset, leaving the other two as controversial, at present.

For the dark blue area (15.56%), evidence at transcription level and at other biochemical levels is very high. So, the inference of function can be considered very reliable. As can be seen, the dark blue area overlaps the green area and the red circle, but still about two thirds of it lie outside both.

If we consider the union of these three different subsets (red circle, green area, dark blue area) we have the total portion of the genome for which there is convincing evidence of function, at the present state. It is about 24%, and most of it is non coding.

Moreover, the percentage of functional genome can only increase in time. While the red circle (conserved elements) and the purple circle (protein coding genes) are more or less final, the green area (gold standard) can only expand, and it can potentially confirm the function of parts of the blue areas (including those with lower activity).

So, to sum up:

– At the present state of knowledge, function is extremely likely for 24% of the human genome.

– For about 15% (green area) it is certain.

– Most of that functional genome (about 95% of it) is non coding.

– Most of that functional genome (about 70% of it) is not conserved.
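The last two percentages follow from the numbers already given, as this minimal Python sketch shows. It assumes (my assumption, not something stated in the paper) that all protein coding DNA falls inside the 24% union; the conserved fraction is inside it by construction. The 24% value is itself approximate, because the green area is only an estimate, and the variable names are mine.

```python
# Percentages of the whole genome, as given in the text.
functional = 24.0           # union of red circle, green area, dark blue area (approximate)
protein_coding = 1.25       # purple circle
conserved = 7.38            # red circle
conserved_noncoding = 6.33  # non coding part of the red circle

# Assumption: all protein coding DNA lies inside the functional union.
noncoding_share = (functional - protein_coding) / functional * 100

# The conserved fraction is part of the union by definition.
nonconserved_share = (functional - conserved) / functional * 100

# Side result: the part of the conserved fraction that is protein coding,
# and how much of the protein coding DNA that represents.
conserved_coding = conserved - conserved_noncoding
coding_conserved_share = conserved_coding / protein_coding * 100

print(f"Non coding share of functional genome: {noncoding_share:.1f}%")
print(f"Non conserved share of functional genome: {nonconserved_share:.1f}%")
print(f"Share of protein coding DNA that is conserved: {coding_conserved_share:.0f}%")
```

The first two values round to the "about 95%" and "about 70%" given in the summary. If the true union is larger or smaller than 24%, both shares change accordingly.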

But that is not all. There are two other important points which must be addressed, and which are even more intriguing. They are:

3) Conserved function which does not imply conserved sequence.

4) Function which requires non conservation of sequence.

I will deal with them in the second part of this discussion.

Comments
Upright BiPed @ 14, [OT] I'm glad you are interested in the subject of my current studies. I plan to share the results of my search at some point. Also I have enjoyed reading your interesting comments in other threads, though sometimes the discussed subjects are difficult to me.
Dionisio
May 18, 2014, 07:40 PM PDT
I didn't mean to imply that you'd called me anything, gpuccio. Rather, even if you happily make the leap from "subject to function" to "is functional" you end up with much less of the genome being functional than most of your fellow travelers seem to think is the case.
wd400
May 18, 2014, 06:28 PM PDT
(Dionisio, OT, I am among those here who are enjoying your search.)
Upright BiPed
May 18, 2014, 06:20 PM PDT
gpuccio,
Let’s try to just discuss, and let the ideas speak for themselves.
Well stated. Caro Dottore, So glad to see your new post, which as usual, has started to generate interesting follow-up discussions within the same thread. Also as usual, most of these discussions seem way above and beyond my limited capacity to understand them, but I try to read them so I can learn something new, that somehow might be related to the subject I'm currently interested in. I continue to struggle with trying to compile a detailed step by step description of the mechanisms behind the cell fate determination, differentiation and migration within the first few weeks of human embryonic development. This slow studying has led me to the intrinsic* asymmetric cell division, which has taken me to the spindle apparatus mechanisms. At this point I'm looking at the mechanisms associated with the centrosome and everything that goes with it during the mitotic phases. Among the several sources of information I'm reviewing, I thought that perhaps the following quote slightly (indirectly) relates to the complex functionality issues you've been writing about lately? If not, then just disregard it. Please note that the quoted paper seems to be pretty fresh out of the oven, which might confirm something you told me in another thread - that certain things in this area of science are not so well understood yet, which makes this whole studying experience even more exciting to me!
The mitotic spindle is defined by its organized, bipolar mass of microtubules, which drive chromosome alignment and segregation. Although different cells have been shown to use different molecular pathways to generate the microtubules required for spindle formation, how these pathways are coordinated within a single cell is poorly understood.
From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898610/ (*) I have left the extrinsic ACD for later reviewing.
Dionisio
May 18, 2014, 05:54 PM PDT
Thanks GP. Looking forward to part two.
Upright BiPed
May 18, 2014, 05:02 PM PDT
wd400: I don't call you anything, I just think you are wrong. And you still have to make this jump from the idea that a sequence has not yet been proven functional beyond any doubt to the idea that it is not functional. Again, biological evidence for new and different functions of different parts of non coding DNA is accumulating quickly, and most of it is very recent. In your shoes, I would just be cautious in "stating it’s likely most of the human genome is junk". And you should see the things I’ve been called in other places for just existing! :) Let's try to just discuss, and let the ideas speak for themselves.
gpuccio
May 18, 2014, 04:48 PM PDT
Well, to get to 20% you still have to make this jump from the idea that a sequence is subject to biological function to the idea it is functional. And 20% is a far cry from most, and you should see the things I've been called here for stating it's likely most of the human genome is junk.
wd400
May 18, 2014, 03:52 PM PDT
Piotr: If you knew how tentative are many "gold standards" in medicine! :)
gpuccio
May 18, 2014, 03:25 PM PDT
gpuccio:
Well, I took the “gold standard” concept from the paper itself.
I realise that much. Still, it's too proud a name for a somewhat tentative criterion.
“A coding-independent function of gene and pseudogene mRNAs regulates tumour biology”
I have no problem with this kind of "function".
Piotr
May 18, 2014, 03:12 PM PDT
wd400, @2 I deliberately mentioned a specific colour (and the "function" of the allele involved, supposing for the sake of argument that it isn't involved in much else). No disagreement about the locus.
Piotr
May 18, 2014, 03:08 PM PDT
wd400:
I also think functional elements will always be conserved – it’s just a question of the gap across which the elements have been retained: there will be ape and human-specific functional elements, which may even double the proportion of functional DNA from the mammalian-wide conservation. But it still won’t add up to the “death of junk DNA” that many commentators here seem to think has happened.
I have said clearly that conservation can be measured differently. Mammalian conservation is certainly higher than vertebrate conservation, and primate conservation is certainly even higher. The Siepel paper data are about vertebrate conservation. The Kellis paper data are about mammalian conservation, and they are slightly higher. However, the two series of data are essentially in accord. Even if you do not consider the estimated green area in the reasoning, we still have about 20% of the human genome which is very probably functional (red circle + dark blue area), and 19% is non coding. I think the general concept does not change. The problem is not the "death" of junk DNA. You can keep some pieces of it, if that is of some consolation. :) The point is that we have much more non coding DNA than coding DNA in the functional subset, and it is destined to grow. Maybe it will not be 80% in the end, but it will be definitely more than 20% of the whole genome. Maybe much more. We will see. Moreover, wait for the other two "arguments". :)
gpuccio
May 18, 2014, 03:05 PM PDT
Piotr: Well, I took the "gold standard" concept from the paper itself: "Genetic approaches, which rely on sequence alterations to establish the biological relevance of a DNA segment, are often considered a gold standard for defining function." I suppose that if we really look at biologically relevant effects, then your objection can be met. For example, this paper: "A coding-independent function of gene and pseudogene mRNAs regulates tumour biology" http://www.nature.com/nature/journal/v465/n7301/full/nature09144.html is about a role of pseudogenes as interacting with tumor suppressor genes. That is certainly a relevant biological function.
gpuccio
May 18, 2014, 02:56 PM PDT
wd400: The figure legend says: "The green shaded domain conceptually represents DNA that produces a phenotype upon alteration, although we lack well-developed summary estimates for the amount of genetic evidence and its relationship with the other types. This summary of our understanding in early 2014 will likely evolve substantially with more data and more refined experimental and analytical methods." So, I suppose it is an estimate ("summary of our understanding"), but I would not say it is "made up". I think they based it on some general review of present literature, even if not on "well-developed summary estimates". There are many papers in the literature which may have been considered in that estimate, but there is probably no general quantitative summary of them. So I agree that we must consider that quantification as more tentative, and not as a precise measure. But it certainly has its value.
gpuccio
May 18, 2014, 02:48 PM PDT
I also think functional elements will always be conserved - it's just a question of the gap across which the elements have been retained: there will be ape and human-specific functional elements, which may even double the proportion of functional DNA from the mammalian-wide conservation. But it still won't add up to the "death of junk DNA" that many commentators here seem to think has happened.
wd400
May 18, 2014, 02:41 PM PDT
It's pretty clear from Fig 1 that the size and shape of the green region is made up - its name even has a question mark. Piotr, I'm not sure I agree. In the examples you mention the variants themselves have no obvious biological consequence, but if the sequences they were from did play a role in some biological function we wouldn't be able to measure the variation (and indeed the eye-colour locus will certainly be subject to purifying selection, even if eye-colour alleles are not under selection). There could be examples of phenotypes from usually non-functional sections of the genome (i.e. a derived allele for a non-functional locus could mess up some other functional system) but I doubt there would be many such cases.
wd400
May 18, 2014, 02:29 PM PDT
I'll be watching this with genuine interest, especially if you develop the last two points. Just one remark at this stage: the fact that the phenotype is somehow affected does not unquestionably imply a function, so I wouldn't call it the "gold standard". Phenotypes often vary in ways that are genomically conditioned but don't affect their fitness. What is the "function" of brown eyes, for example, or the ability to roll your tongue (no just-so stories, please)? This, of course, accounts for the fact that many of the sequences in question will not be conserved.
Piotr
May 18, 2014, 02:15 PM PDT