Uncommon Descent Serving The Intelligent Design Community

Is functional information in DNA always conserved? (Part two)

So, in the first part of this discussion, I tried to show, with real data from the scientific literature, how much of the human genome is conserved, and how that conservation is evaluated and expressed. Then I argued that we already have credible evidence for function in a relevant part of the human genome (let’s say about 20%), that most of that functional part is non coding, and that a large part of it is not conserved. While some may disagree on the exact figures, I think that it is really difficult to reject the whole argument.

But, as I anticipated, there are two more important aspects of the issue that I want to discuss in detail. I will do it now.

3) Conserved function which does not imply conserved sequence.

Sequence is conserved when function is present because function imposes specific constraints on the sequence itself.

For example, in a protein sequence with a well defined biochemical function, some variation will be possible without affecting the protein function, while other kinds of variation will affect it to a greater or lesser degree.

We have many examples of serious loss of function caused by the change of even one amino acid: Mendelian diseases in humans are a well known, unpleasant example of that.

We also have many examples of substantial variation in the sequence of functional proteins which does not affect the function: the so called neutral variations in proteins. For example, there are more than 1000 variants of human hemoglobin, most of them caused by a single amino acid substitution. While many of them cause some disease, or at least some functional modification of the protein, at least a few of them are completely silent clinically, in both the heterozygous and the homozygous state.

Now, there is an important consequence of that. Neutral variation also happens in functional sequences, although it happens less in those sequences. How much neutral variation can be tolerated by a functional sequence depends on the sequence. For proteins, it is well known that some of them can vary a lot while retaining the same structure and function, while others are much more functionally constrained. Therefore, different functional proteins are conserved to different degrees over the same span of time.

What about non coding genes? While we understand much (but not all) of the sequence-structure-function relationship for proteins, here we are almost wholly ignorant. Non coding genes, when they are functional, act in very different ways, most of them not well understood. Many of them are transcribed, and we don’t understand much of the structure of the transcribed RNAs, least of all their sequence-structure-function relationship. IOWs, we have no idea how functionally constrained the sequence of a functional non coding DNA element is.

While searching for pertinent literature about this issue, I have found this very recent, interesting paper:

Evolutionary conservation of long non-coding RNAs; sequence, structure, function.

The abstract (all emphasis is mine):

BACKGROUND:

Recent advances in genomewide studies have revealed the abundance of long non-coding RNAs (lncRNAs) in mammalian transcriptomes. The ENCODE Consortium has elucidated the prevalence of human lncRNA genes, which are as numerous as protein-coding genes. Surprisingly, many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. The absence of functional studies and the frequent lack of sequence conservation therefore make functional interpretation of these newly discovered transcripts challenging. Many investigators have suggested the presence and importance of secondary structural elements within lncRNAs, but mammalian lncRNA secondary structure remains poorly understood. It is intriguing to speculate that in this group of genes, RNA secondary structures might be preserved throughout evolution and that this might explain the lack of sequence conservation among many lncRNAs.

SCOPE OF REVIEW:

Here, we review the extent of interspecies conservation among different lncRNAs, with a focus on a subset of lncRNAs that have been functionally investigated. The function of lncRNAs is widespread and we investigate whether different forms of functionalities may be conserved.

MAJOR CONCLUSIONS:

Lack of conservation does not imbue a lack of function. We highlight several examples of lncRNAs where RNA structure appears to be the main functional unit and evolutionary constraint. We survey existing genomewide studies of mammalian lncRNA conservation and summarize their limitations. We further review specific human lncRNAs which lack evolutionary conservation beyond primates but have proven to be both functional and therapeutically relevant.

GENERAL SIGNIFICANCE:

Pioneering studies highlight a role in lncRNAs for secondary structures, and possibly the presence of functional “modules”, which are interspersed with longer and less conserved stretches of nucleotide sequences. Taken together, high-throughput analysis of conservation and functional composition of the still-mysterious lncRNA genes is only now becoming feasible.

 

So, what are we talking about here? The point is simple. Function in non coding DNA can be linked to specific structures in RNA transcripts, and those structures, and therefore their function, can be conserved across species even in the absence of sequence conservation. Why? Because the sequence/structure/function relationship in this kind of molecule is completely different from what we observe in proteins, and we still understand very little of those issues.
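To make the idea concrete, here is a toy sketch in Python. It is not taken from the paper: the two hairpin sequences, the stem length and the function names are all invented for illustration, and real RNA structure prediction relies on thermodynamic folding tools rather than a simple pairing check. It only shows that two sequences can differ at most positions and still form the same stem-loop, because each change on one strand is matched by a compensatory change on the other.

```python
# Toy illustration: same hairpin structure, different sequence.
# All sequences and names here are invented for the example.

# Watson-Crick pairs plus the G-U wobble pair allowed in RNA stems.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_is_paired(seq: str, stem_len: int) -> bool:
    """True if the first stem_len bases can pair with the last stem_len
    bases (read in reverse), i.e. the sequence can form a simple hairpin."""
    return all((seq[i], seq[-1 - i]) in PAIRS for i in range(stem_len))


def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same base."""
    assert len(a) == len(b)
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


# Hypothetical hairpins: 6-base stem, 4-base loop, 6-base stem.
species_a = "GGCAUC" + "GAAA" + "GAUGCC"
species_b = "CAGUCG" + "GAAA" + "CGACUG"

for name, seq in (("species_a", species_a), ("species_b", species_b)):
    print(name, seq, "stem pairs:", stem_is_paired(seq, 6))
print("sequence identity: %.1f%%" % percent_identity(species_a, species_b))
```

In this made-up pair only the four loop bases are shared, yet both stems remain fully paired. A conservation score based on sequence alone would rate the element as poorly conserved, while the structure is the same.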

As the authors say:

In contrast to microRNAs, almost all of which are post-transcriptional repressors, the diverse functions of lncRNAs include both positive and negative regulations of protein-coding genes, and range from lncRNA:RNA and lncRNA:protein to lncRNA:chromatin interactions [8–11]. Due to this functional diversity, it seems reasonable to presume that different evolutionary constraints might be operative for different RNAs, such as mRNAs, microRNAs, and lncRNAs.

Which is exactly my point.

The authors examine a few cases where the sequence/structure/function relationship of some lncRNAs has been studied in more detail. They conclude:

Tens of thousands of human lncRNAs have been identified during the first genomic decade. Functional studies for most of these lncRNAs are however still lacking with only a handful having been characterized in detail [8,10,11,87]. From these few studies it is apparent that some lncRNAs are important cellular effectors ranging from splice complex formation [34] to chromatin and chromosomal complex formation [43,46] to epigenetic regulators of key cellular genes.

It is becoming increasingly apparent that lncRNAs do not show the same pattern of evolutionary conservation as protein-coding genes. Many lncRNAs have been shown to be evolutionary conserved [5]; but they do not appear to exhibit the same evolutionary constraints as mRNAs of protein-coding genes.

While certain regions of the lncRNAs appear to maintain the regulatory function, such as bulges and loops, the exact sequence in other regions of lncRNAs appear less important and possibly act as spacers in order to link functional units or modules. Depending on the function, e.g., whether the RNA sequence is a linker or a functional module, different patterns of conservation might be expected.

It is important to remember that lncRNA genes are only a part of non coding DNA. If someone wonders how big a part, I would suggest the following paper:

The Vast, Conserved Mammalian lincRNome

which estimates the number of human lncRNA genes at 53,649, more than twice the number of protein coding genes, corresponding to about 2.7% of the whole genome (Figure 2). It’s an important part, but only a part. And it is a part which, while probably functional in many cases, is still poorly conserved at the sequence level.

Other parts of the non coding genome will have different types of function, structure, and therefore sequence conservation. For example, the following paper:

Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

argues that most conserved non coding regions (about 3.5% of the genome, conserved across vertebrate phylogeny, strongly suggesting its functional importance, which clusters into >700 000 unannotated conserved islands, 90% of which are <200 bp) “serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers”, rather than encoding non-coding RNAs. IOWs, these short sequences in the non coding genome which make up another 3.5% of the total would be functional not because of their RNA transcript, but directly as binding sites (enhancers and other distal regulatory elements). Now, these sequences are conserved. That proves the general point: different functions, different relationship between sequence and function, different conservation of functional elements. In general, it seems that function which expresses itself through non coding RNA transcripts is less conserved at sequence level.

And now, the last point, maybe the most important of all.

4) Function which requires non conservation of sequence.

When we analyze conservation of sequences across species as an indicator of function, we are forgetting a fundamental point: in the course of natural history, species change, and function changes with them.

IOWs, the reason why species are different is that they have different molecular functions.

So, there is some implicit contradiction in equating conservation with function. A conserved sequence is very likely to be functional, but it is not true that a function needs a conserved sequence, if it is a new function, or a function which has changed.

Now, we know that protein coding genes have not changed a lot in the last parts of natural history. It is usually recognized that the greatest change, especially in more recent taxa, is probably regulatory. And the functions which have been identified in various parts of non coding DNA are exactly that: regulatory.

So, to sum up:

– Species evolve and change

– The main tool for that change is, realistically, a change in regulatory functions

– If a function changes, the sequences on which the function is based must change too

– Therefore, those important regulatory functions which change for functional reasons will not be conserved across species

This point is different from the previous point discussed here.

In point 3, the reasoning was that the same function can be conserved even if the sequence changes, provided that the structure is conserved.

In point 4, we are saying that in many cases the sequence must change for the function to change with it.

Now, although this reasoning is quite logical and convincing, I will try to back it up with empirical observations. To that purpose, I will use two different models: HARs and the results of the recent FANTOM5 paper about the promoterome.

4a) Human Accelerated Regions (HARs).

What are HARs? Let’s take it from Wikipedia:

Human accelerated regions (HARs), first described in August 2006,  are a set of 49 segments of the human genome that are conserved throughout vertebrate evolution but are strikingly different in humans.

IOWs, they are sequences which were conserved in primates, and which change in humans.

Are they functional? That’s what is believed for some of them. Again, Wikipedia:

Several of the HARs encompass genes known to produce proteins important in neurodevelopment. HAR1 is an 106-base pair stretch found on the long arm of chromosome 20 overlapping with part of the RNA genes HAR1F and HAR1R. HAR1F is active in the developing human brain. The HAR1 sequence is found (and conserved) in chickens and chimpanzees but is not present in fish or frogs that have been studied. There are 18 base pair mutations different between humans and chimpanzees, far more than expected by its history of conservation.[1]

HAR2 includes HACNS1 a gene enhancer “that may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs”. Evidence to date shows that of the 110,000 gene enhancer sequences identified in the human genome, HACNS1 has undergone the most change during the evolution of humans following the split with the ancestors of chimpanzees.[4] The substitutions in HAR2 may have resulted in loss of binding sites for a repressor, possibly due to biased gene conversion

Now, for brevity, I will not go into details, but… “active in the developing human brain” and “may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs” are provocative enough thoughts, and I believe that I don’t need to comment on them.

The important point is: what makes us humans different from chimps? Logic says: something which is different. Not something which is conserved.

4b) The results from FANTOM5 about the promoterome.

FANTOM5 has very recently published a series of papers with very important results. One of the most important is probably the following article in Nature:

A promoter-level mammalian expression atlas

Unfortunately, the article is paywalled. I have access to it, so I will try to sum up the points which are needed for my reasoning.

So, what did they do? In brief, they used a very powerful technology, cap analysis of gene expression (CAGE), to study various aspects of the transcriptome in different human cells from different tissues and states. This is probably the most important analysis of the human transcriptome ever carried out.

This particular paper focuses on a “promoter atlas”, IOWs an atlas of the expression of promoters (transcription start sites, TSSs, which control the transcription of target genes) in different tissues.

So, according to the level of expression of those promoters in different tissues and cells, they classify genes (both protein coding and non protein coding) into:

– ubiquitous-uniform (‘housekeeping’, 6%): those genes which are expressed at similar levels in most cell types

– ubiquitous non-uniform (14%): expressed in most cell types, but at different levels

– non-ubiquitous (cell-type restricted, 80%)

Each of those types includes both  C (protein coding genes) and N (non protein coding genes).
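The three classes can be pictured as a simple rule applied to a promoter's expression across cell types. The Python below is only an illustrative sketch: the detection threshold, the 50% cutoff for "most cell types", the 10-fold cutoff for "similar levels" and the function name are assumptions chosen for the example, not the thresholds used in the paper.

```python
from statistics import median
from typing import Mapping

# Hypothetical cutoffs, chosen only for illustration (not the paper's values).
DETECTION_TPM = 1.0        # a promoter counts as "expressed" above this level
UBIQUITOUS_FRACTION = 0.5  # "most cell types" = more than half of them
UNIFORM_FOLD = 10.0        # "similar levels" = maximum within 10x of the median


def classify_promoter(tpm_by_cell_type: Mapping[str, float]) -> str:
    """Assign one promoter to an expression-breadth class."""
    levels = list(tpm_by_cell_type.values())
    expressed = [x for x in levels if x > DETECTION_TPM]
    # Expressed in at most half of the cell types: cell-type restricted.
    if len(expressed) / len(levels) <= UBIQUITOUS_FRACTION:
        return "non-ubiquitous"
    # Expressed broadly and at comparable levels: housekeeping-like.
    if max(expressed) <= UNIFORM_FOLD * median(expressed):
        return "ubiquitous-uniform"
    # Expressed broadly, but with strong differences between cell types.
    return "ubiquitous non-uniform"


# Invented expression profiles (arbitrary units) for three toy promoters.
housekeeping = {"neuron": 20, "hepatocyte": 25, "fibroblast": 18, "T-cell": 22}
variable = {"neuron": 500, "hepatocyte": 5, "fibroblast": 8, "T-cell": 6}
restricted = {"neuron": 300, "hepatocyte": 0, "fibroblast": 0.2, "T-cell": 0}

for label, profile in (("housekeeping", housekeeping),
                       ("variable", variable),
                       ("restricted", restricted)):
    print(label, "->", classify_promoter(profile))
```

With these made-up profiles the three promoters fall into the three classes in the order listed above. Changing the cutoffs would of course change the proportions assigned to each class.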

Now, that’s very interesting. Now we know that most genes (80%), both coding and non coding, are expressed only in some cell types.

But the most interesting thing, for our discussion about conservation, is that they studied the promoter expression both in human cells and in other mammals.

Now, we must look at Figure 3 in the paper. For those who cannot access the article, there is a low resolution version of this figure here (just click on Figure 3 in the “at a glance” box; OK, OK, it’s better than nothing!).

The figure is divided into two parts, a and b. In each part, the x axis shows the evolutionary divergence from humans (from 0 to 0.8; the grey vertical lines correspond to macaque, dog and mouse). The y axis shows “Human TSS with aligning orthologous sequence (%)”, IOWs the % conservation of each group of genes in the graph at various points of evolutionary divergence. Each line represents a different group of genes. So, the lines which remain more “horizontal” represent groups of genes which are more conserved, while those which “go down” from left to right are those less conserved. I hope it’s clear.

On the left (part a) genes are grouped as above: ubiquitous-uniform, etc, each category divided into C or N (coding or non coding).

What are the conserved groups? In order: Non-ubiquitous C (green line); Ubiquitous uniform C (orange line); Ubiquitous non-uniform C (purple line).

IOWs, coding genes are more conserved, and non ubiquitous are most conserved.

That is not news.

Conversely, non coding genes are less conserved, in this order: Non-ubiquitous N (lighter green); Ubiquitous non-uniform N (lighter purple); Ubiquitous uniform N (lighter orange). This last line is definitely less conserved than the random reference (the dotted line).

This part is “Conservation by expression breadth and annotation”.

Well, what is on the right (part b)? It is “Conservation by cell-type biased expression”.

IOWs, the graph is the same, but genes are grouped in different lines according to the cell type where they are preferentially expressed.

The most conserved groups? Those with preferential expression in:  Fibroblast of periodontium, Fibroblast of gingiva, Preadipocyte, Chondrocyte, Mesenchymal cell.

The least conserved? Those with preferential expression in:  Astrocyte, Hepatocyte, Neuron, Sensory epithelial cell, Macrophage, T-cell, Blood vessel endothelial cell. In decreasing conservation order.

Does that mean something?  I leave it to you to decide. For me, I definitely see a pattern. With all due respect for fibroblasts and adipocytes, neurons and T cells smell more of specialized cells which must change in higher taxa (excuse me, Piotr, mice will accuse me of not being politically correct).

So, my humble suggestion is: the things that change more are not necessarily those less functional. In many cases, they could be exactly the opposite: the bearers of new, more complex functions.

And non coding genes are very good candidates for that role.

Comments
@34 Oops, sorry, let me try again: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0032877
Piotr
May 22, 2014, 12:35 AM PDT
Gpuccio:
IOWs, the current theory that most variation is neutral and that most of human genome is non functional is simply wrong, and obstinately ignores the problem of where the procedures are written, those procedures which make a mouse a mouse, a dog a dog, and a human a human.
Where indeed? Please tell us where those procedures are written. Can we perhaps see some of them? Sorry, Gpuccio, but you are sinking into gibberish. So someone's been patiently manipulating billions of genomes over billions of years by psychokinetic means in order to achieve feats like making the 1,500 species of Drosophila different from each other, or to make the bibymalagasy a bibymalagasy (presumably for a reason, but only to let it go extinct when he's bored with the bibymalagasy project).
Piotr
May 22, 2014, 12:32 AM PDT
Piotr: I am afraid that "the link above" does not work. That's why I asked.
gpuccio
May 22, 2014, 12:09 AM PDT
Piotr: OK, but what is the importance of that? If HARs appeared before the divergence of various hominins, that is fine for me. My point is that functional innovation requires, in principle, sequence innovation. And that those variations will be restricted to those branches (classes, species, even races) where they are needed to express the functional difference. Even the divergence between bacteria and archaea, the oldest example of fundamental difference, poses the same problem: was it a functional divergence? And if, as I believe, the answer is yes, how much of the differences between bacteria and archaea, as we see them now, can be linked to that original functional differentiation? That is a reasoning based on design and function. It does not assume that the differences are mainly due to random neutral variation, and that function is some strange aside that emerges by sheer luck. On the contrary, it considers function for what it is, an extremely abundant and amazing property of the biological world, and asks for an explanation of it, and for the necessary implications at molecular level. The point is: paradigms do matter in scientific reasoning, and it is very important to choose the right paradigm, because wrong paradigms only lead to deformation of facts and of their interpretation.
gpuccio
May 22, 2014, 12:07 AM PDT
Mung: I am a design believer, as I believe you are. The point of interest for me is how the functional result originates, and I think I have the general answer in the design paradigm. Frankly, I am less interested in how the functional design becomes fixed, if we admit that it starts in a limited part of the population. I suppose that NS can have a role in that. I am sure that drift would lose most of the designed traits in favor of useless variation. I don't think that is a good way to "sell" designed things on the market! We must understand that the role of NS in the neo darwinian paradigm is not only of fixing the final result, but mainly of contributing to its generation by selecting the intermediate "small steps". That role is completely imaginary and unsupported. But I suppose that, once a protein, or a species, is there, and it is functional, NS can act on that. That has nothing to do with the supposed role of NS in generating that protein or that species. After all, we know that negative purifying selection acts on functional proteins because we see the amazing conservation of many of them. It is certainly more difficult to establish the role of NS in the "expansion" of function (that is, the role of positive selection). But I don't see any reason to deny it a priori. What is your position about those points? Just to understand.
gpuccio
May 21, 2014, 11:55 PM PDT
What are the “closely related species that shared most of them” and which “have gone extinct”?
Neanderthals and Denisovans. I mention these two because we have their DNA, and it has been checked for the presence of HAR alleles (see the link above). There's none available from other hominins.
Piotr
May 21, 2014, 11:48 PM PDT
rhampton7: I agree with your concept. That is certainly a field which can be directly investigated. Similarly, the genetic differences between members of the same species and their phenotypic effects are also an interesting aspect, and potentially of great importance. Let's see if the new powerful technologies of transcriptome analysis can find new perspectives about that. Frankly, I have no problem with "uneasy answers". I love answers, whatever they are. The only type of answers I don't like is "wrong answers". :)
gpuccio
May 21, 2014, 11:44 PM PDT
wd400: "How do you get from a few hundred thousand base pairs of “human accelerated regions” to “the current theory that most variation is neutral and that most of human genome is non functional is simply wrong”." Please, follow my reasoning. I start with the very logical consideration that function must change in different species, because different species are different. That simple fact is really underestimated. And it implies that regulatory functions which change, which are the real basis of the important differences between species, require that molecular sequences change, and that they change for a reason. And that such a functional change, if not recognized as such, will usually be considered as simple random variation due to neutral mutations. Then I offer two different empirical observations to support this principle: HARs, and the difference in transcriptome conservation observed in FANTOM5, which can suggest a functional pattern. I am not quantifying how much sequence difference in the genome is due to functional change from those two examples. I am only trying to show that there are empirical observations that support my reasoning. I am well aware that HARs are few and short, and that quantitatively they do not mean much. But they are a good qualitative model. They diverge quickly, after having been conserved in the previous phase of evolution. And, exactly because of that, they pose a problem: are they diverging because of random neutral variation, or because they have a new function? And we really don't know yet the answer for all of them, we just have hints for a few. So, my point is: we don't know why sequences change. The default explanation that they change mostly for random neutral mutations is just that, a default explanation which simply ignores the possibility that they change for functional reasons, and mostly for those regulatory functional reasons which we don't yet understand, but of which we see the results in phenotype.
The second observation about transcriptomes can potentially interest greater parts of the genome. However, I am not counting here how much genome changes for functional reasons. I don't think we have the data to understand that, at present. I am only showing that part of the genome must change for functional reasons, that we don't know yet what part it is and how big it is, that there are empirical observations at molecular level (beyond the obvious observations at phenotypic level) that support that fact, and that the present theories completely ignore, or absolutely minimize, that concept.
gpuccio
May 21, 2014, 11:39 PM PDT
Piotr:
Piotr evidently believes that the quick fixation of new functional variants in the species in question must have been due to natural selection, since drift is simply too slow.
Your reasoning is flawed.
Drift is slow [questionable premise]
Natural Selection is fast [questionable premise]
Drift is too slow therefore Natural Selection is fast [non sequitur]
Drift is too slow therefore it must have been Natural Selection [non sequitur]
If you want fast fixation, reduce the population size and let drift do its thing!
Mung
May 21, 2014, 07:32 PM PDT
However, when I wrote about “species” being different at molecular level, I was not thinking only of strictly related species, but in more general terms, including differences among genus, family, order, class, and even phyla.
I get that, but that's not where you're likely to find the dividing line between nature and intelligent design. After all, both sides in the debate have assumptions about the abilities (or lack thereof) of natural processes generating new information. Without empirical data, this will never be resolved. That's why I see my proposal as being important. Humans have witnessed the differentiation of dog breeds - that is, the role of natural (material) forces in generating variability. Thus we need to measure just how different those genomes are (how many regions changed, how many base pairs per change, etc). Even among many ID advocates it's widely assumed that a basal wolf form could have naturally evolved (without intelligent intervention) into all the known Canid species (wolf, fox, jackal, coyote). This too is an assumption that needs to be tested, hence the need for measurement. Presumably the differences will be greater but still within nature's grasp. I suspect that some uneasy answers are going to come forward: perhaps the jump between species like the Arctic and the Fennec fox requires intelligent intervention, or perhaps nature alone is capable of generating all the forms of the sub-order Caniformia (bears, skunks, badgers, raccoons, seals, walrus, wolves, etc.)
rhampton7
May 21, 2014, 07:29 PM PDT
I'm sorry, how do you get from a few hundred thousand base pairs of "human accelerated regions" to "the current theory that most variation is neutral and that most of human genome is non functional is simply wrong"? An especially puzzling claim, since the HARs are detected because they diverge faster than the neutral expectation.
wd400
May 21, 2014, 05:39 PM PDT
Piotr: I wonder what exactly you mean here: "it’s because some regulatory regions have undergone recent bursts of accelerated evolution and, in the case of HARs, closely related species that shared most of them have gone extinct, leaving those innovations restricted to one surviving species." What are the "closely related species that shared most of them" and which "have gone extinct"? My point in section 4) is simply this: function which is specific to some species, or taxon, or class, you name it, requires difference. Whenever it emerges. The things that generate functional differences between species will be based on different molecular solutions. HARs are essentially sequences which were conserved in primates. They were probably functional in primates (otherwise, why were they conserved?). But in humans they change. And probably (at least some of them) they change to serve some other function. My point is that much of the non coding DNA which varies more than coding DNA across species can well vary for functional reasons, and not simply because it is modified by neutral mutations because of its supposed lack of function. My point is that, while proteins change little, procedures change a lot across species. And procedures are mainly written in non coding DNA. Which, therefore, changes more. That's why T cells and neurons change more than fibroblasts across mammals. IOWs, the current theory that most variation is neutral and that most of human genome is non functional is simply wrong, and obstinately ignores the problem of where the procedures are written, those procedures which make a mouse a mouse, a dog a dog, and a human a human. Which are not only small accidental differences in some gene, which by sheer luck produce a brain and a connectome different from anything that previously existed, but, on the contrary, whole reorganizations of the complexity, new plans of software and function, which are individualized and optimized in each single new design, and require therefore individual and specific information in each case. Different in each case. These are my points.
gpuccio
May 21, 2014, 04:19 PM PDT
Gpuccio: I wonder what exactly your point is in Section 4). The title "Function which requires non conservation of sequence" is misleading. Non-conservation is not required here, and it doesn't define a separate class of functions. If this kind of stuff isn't conserved "across species", it's because some regulatory regions have undergone recent bursts of accelerated evolution and, in the case of HARs, closely related species that shared most of them have gone extinct, leaving those innovations restricted to one surviving species. It isn't long-term conservation that reveals functionality, anyway, but any evidence of selection. For example, we presume that the HAR alleles fixed in humans are "functional" because their fixation pattern clearly shows they have been selected for. There's no shortage of very old and highly conserved sequences whose regulatory function is known or can be inferred. If the recent ones are not (yet) conserved, it's a consequence of their young age, not the importance of their functions.
Piotr
May 21, 2014, 02:53 PM PDT
Dionisio: Intrinsic asymmetric mitosis is a fascinating subject. Let me know if you find interesting reviews on it. Bioinformatics is the future. I am immersed in reading other papers from FANTOM5. They are truly a treasure of new information.
gpuccio
May 21, 2014, 02:23 PM PDT
gpuccio,
Just as a reflection, if as said we have sets of about 400 specific Transcription Factors which characterize cell types (indeed, a median of 430 TFs per cell type),...
very interesting
I wanted to compute the “space of combinations” of 400 TFs out of 1700. I tried in R with the “choose” function, but the result is out of range.
oh, no!
So, I downsized the problem to the combinations of 200 TFs out of 1700.
That's a good approach.
The result is 7.900757e+265.
Wow! That's an indecent number! ;-)
But there are still those who believe that cell differentiation in metazoa can happen without a lot of procedures written somewhere!
Well, maybe, but how? Can they describe it or at least give us a hint? As you know, I'm trying to understand the mechanisms behind the centrosome and the spindle apparatus operating during the intrinsic asymmetric mitosis in order to understand the cell fate determination, differentiation, migration, etc. during the first few weeks of human embryonic development, from an information processing perspective, and believe me, I'm struggling and sweating, just to gather and make sense of the information that is out there. It's like a never-ending story. The more I dig, the deeper I have to keep digging. Questions get answered while new questions arise. But it's fascinating. Who needs science fiction movies or computer-based action games when we have this mind-boggling scientific information to look at, though most times having no idea what it means? Mio caro amico, we have not seen anything yet... the party has just started... the fun part is still ahead... the amount of data coming out of research is overwhelming... how can science process and analyze all that information? Enormous computer and software resources have to be assigned to those important tasks. Many hard working scientists are dedicated to leading edge research, but perhaps more are needed? How to motivate students to pursue bioinformatics and biology research careers?Dionisio
May 21, 2014, 12:35 PM PDT
GP, "Interesting, isn't it?" Mio amico, 'Interesting' is an understatement in this case. I think 'Wow!' would be more appropriate ;-) Wow! Thank you for sharing this report (+ your comments) with the rest of us here. Mio Caro Dottore, we ain't seen nothing yet... the party is just beginning... let the FANTOM5, RIKEN, and other serious research centers overwhelm us with delightful reports. If only we could understand and digest so much info faster, but that's fine. One thing at a time. Yes, let's enjoy it!
Dionisio
May 21, 2014, 10:48 AM PDT
Dionisio: Just as a reflection, if as said we have sets of about 400 specific Transcription Factors which characterize cell types (indeed, a median of 430 TFs per cell type), I wanted to compute the "space of combinations" of 400 TFs out of 1700. I tried in R with the "choose" function, but the result is out of range. So, I downsized the problem to the combinations of 200 TFs out of 1700. The result is 7.900757e+265. But there are still those who believe that cell differentiation in metazoa can happen without a lot of procedures written somewhere!
gpuccio
May 21, 2014, 10:42 AM PDT
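A note for readers who want to reproduce the count in the comment above: R's "choose" returns a double, and doubles top out near 1.8e308, which is presumably why the 400-out-of-1700 case came back as out of range. The following is only a minimal sketch, in Python rather than R, because Python integers are arbitrary-precision and can hold the exact value. The helper names `combination_space` and `order_of_magnitude` are made up for illustration; the figures 1700, 400 and 200 are the ones used in the comment.

```python
import math


def combination_space(n_factors: int, subset_size: int) -> int:
    """Exact number of ways to choose subset_size items out of n_factors."""
    return math.comb(n_factors, subset_size)


def order_of_magnitude(value: int) -> int:
    """Power of ten of a positive integer (its digit count minus one)."""
    return len(str(value)) - 1


# The downsized case from the comment: 200 TFs out of 1700.
small = combination_space(1700, 200)
exp_small = order_of_magnitude(small)
# Dividing two Python ints yields a correctly rounded float, even for huge ints.
mantissa_small = small / 10 ** exp_small

# The original case: 400 TFs out of 1700 (too large for a double, fine as an int).
large = combination_space(1700, 400)
exp_large = order_of_magnitude(large)

print(f"C(1700, 200) ~ {mantissa_small:.6f}e+{exp_small}")
print(f"C(1700, 400) is about 10^{exp_large}")
```

If one prefers to stay in R, working on the log scale with `lchoose` (and dividing by `log(10)`) should avoid the overflow as well, since only the logarithm of the count is ever stored.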
Dionisio: Thank you for your contributions. As usual, you say many important things. I would like to take advantage of the relative tranquility of this discussion to give some more data from the FANTOM5 promoterome paper. First of all, it is important to know that the analysis was performed on a really impressive number of different cell types:
Single molecule CAGE profiles were generated across a collection of 573 human primary cell samples (~3 donors for most cell types) and 128 mouse primary cell samples, covering most mammalian cell steady states. This data set is complemented with profiles of 250 different cancer cell lines (all available through public repositories and representing 154 distinct cancer subtypes), 152 human post-mortem tissues and 271 mouse developmental tissue samples
This is really amazing. They give us for the first time an idea of the complexity of the transcriptome in different cell types in humans. For example, they studied the expression of different transcription factors in different cells.
Among 1,762 human and 1,516 mouse transcription factors compiled from the literature 21–23, promoter level expression profiles for 1,665 human transcription factors (94%) and 1,382 mouse transcription factors (91%) were obtained (Supplementary Tables 7, 8 and 9 and Supplementary Note 6). The distribution of expression levels and cell-type or tissue specificity of transcription factors (Extended Data Fig. 3f–j) and the number of robust promoter peaks per transcription factor gene was similar to coding genes in general (4.8 compared to 4.6). In any given primary cell type, a median of 430 (306 to 722) transcription factors were expressed at 10 TPM or above (~3 copies per cell based on 300,000 mRNAs per cell 18) (Extended Data Fig. 3g).
So, we now know that specific cell types have, on average, high expression of about 400 specific transcription factors. And we still have no idea how these highly specific and complex cell profiles are achieved in each different cell type. Finally, "Figure 4 | Coexpression clustering of human promoters in FANTOM5" gives an idea of how different clusters of genes are expressed in different cell types. If you have the paper, just look at it: it is something. If you are satisfied with a low resolution image, just go here: http://www.nature.com/nature/journal/v507/n7493/full/nature13182.html and click on Figure 4. "Collapsed coexpression network derived from 4,882 coexpression groups (one node is one group of promoters; 4,664 groups are shown here) derived from expression profiles of 124,090 promoters across all primary cell types, tissues and cell lines" Enjoy! By the way, I would like to quote one of the important conclusions in the paper:
The most commonly-enriched terms at a P value threshold of 10^-20 were classical monocyte (CL:0000860; 26,634 peaks, 14%), bone marrow (UBERON:0002371; 22,387 peaks, 12%) and neural tube (UBERON:0001049; 20,484 peaks, 11%) (Supplementary Table 13). This is consistent with the coexpression clustering in Fig. 4 (green and purple spheres correspond to leukocyte and central nervous system enriched expression profiles) and indicates that a large fraction of the mammalian genome is dedicated to immune and nervous system specific functions.
Interesting, isn't it?
gpuccio
May 21, 2014, 10:25 AM PDT
gpuccio
Let’s try to concentrate on the ideas, rather than on our personal lucidity, I would really appreciate that. :)
Agreed. Well stated. Thank you. Perhaps I have made that mistake too, and have written something that relates more to 'personal lucidity' than to the discussed issues. My apologies for that. This is a very interesting discussion, which requires clear thinking and solid arguments based on valid evidence. One of my children, who is a veterinarian, doesn't understand how anyone can spend a minute reading what is discussed here. Why would anyone like to deal with these complex issues? Well, I don't know. ;-)
Dionisio
May 21, 2014, 09:36 AM PDT
Joe,
What I strongly disagree with is sheer dumb luck producing a novel functional protein from non-functional sequences.
I could even concede that unlikely hypothetical scenario that you mentioned, but what my poor mind can't figure out is where the (timing + location) ingredients of the recipes come from. I mean the 'being in the right place at the right time' kind of situation. The choreography and the orchestration. I definitely need someone to provide an explanation. The explanation must be robust, so it holds water in any weather conditions ;-). It must withstand any questions asked by any 7-year-old child, which sometimes are more difficult to answer than the most educated questions. Let's see if the following example can serve as an illustration. My former boss was a brilliant engineer, who came up with the main ideas for the software product I worked on as a simple programmer. Several software developers working together without the direction of the leading engineer could have created some cool programs, but nothing comparable to the software product that was used by many engineers for years, because it was designed specially to make engineering design work easier and more efficient. My boss communicated his ideas to the analysts, who then wrote the programming specs for the programmers, who then wrote the programs in a code that was translated by a development environment compiler to a lower level assembly language which the operating system could translate to the lower level machine code that utilized the electronic signals and circuit topology of the microprocessors, based on the physical properties of the electronic components of the computer. At the end of that complex software development project, which included design, development, implementation, testing and delivery, they had a product that was used by many engineers in different engineering organizations in different countries. That successful software product was a close reflection of the original idea the main engineer had in his brilliant mind long before the first technical meeting took place to discuss the idea.
Perhaps that example could serve as an analogy to the subject that occupies us here in this blog, because we deal with complex specified purpose-oriented prescriptive functional information. However, in the biology case we discuss here, the level of complexity and sophistication is incomparably higher than in the software development example. But the basic concept of top-down design seems similar. Without the original idea of the main engineer combined with his skillful communication of his ideas to us, that system could not have been created, even if we had all the other components of the process, as described before. Yes, the programmers could have written some cool apps, that perhaps did a little of this and a little of that, but not that successful comprehensive system that was sought after by many design engineers, because it was conceived by someone who knew what engineers needed in order to do their work more easily, quickly and efficiently. Without the main idea guy directing the show, it would not have made any difference hiring more programmers, better programmers, working more years, etc. Still that successful product or anything similar would not have been created. It would have been like a group of guys trying to swim from Portugal to Bermuda just using their natural abilities. Let's say a cheating swimmer starts the race a week before or one kilometer ahead of the other swimmers. Would that have made a difference for the ultimate goal of reaching Bermuda? Let's assume one dishonest guy cheats and wears a pair of fins, against the official rules. Well, perhaps he would be able to swim much faster than the rest, but eventually all the competitors would be exhausted, sunburned, and taken out of the water to a recovery clinic. No advantageous natural feature could make them reach the goal. Yes, let them have a few functional proteins pop up somehow here and there. Don't even bother to ask how they got them. Maybe someone stole them from the cookie jar.
You may give them a few more treats too. Let them start swimming a couple of kilometers ahead, a few weeks before. That's fine. Just wish them a safe swim and please ask them to send us a nice colorful postcard from Bermuda when they get there, after they dry themselves well, change clothes and enjoy a well deserved meal on that beautiful island ;-)
Dionisio
May 21, 2014, 09:18 AM PDT
Piotr and Joe: Let's try to concentrate on the ideas, rather than on our personal lucidity, I would really appreciate that. :)
gpuccio
May 21, 2014, 06:58 AM PDT
I'm confused because Piotr doesn't have a clue? How does that work, exactly?
Joe
May 21, 2014, 06:43 AM PDT
Piotr: When you write to supplement what I have said, please make it clear, or my natural insecurity can get the best of me! :) OK, if you mean fixation of an already functional sequence, which could have been generated by design, then I agree with you: the final fixation of an existing functional sequence is possible work for NS.
gpuccio
May 21, 2014, 06:34 AM PDT
Gpuccio: I wrote more to supplement what you'd said than to disagree with you. However:
My point is that different functions require different sequences.
Except when one and the same sequence is multifunctional, or assumes a novel function in a changed context.
Piotr evidently believes that functional variation can be explained by NS. We are waiting inputs from him about that. Maybe they will come.
Piotr evidently believes that the quick fixation of new functional variants in the species in question must have been due to natural selection, since drift is simply too slow. Most of our present-day neutral polymorphisms are shared with Neanderthals and Denisovans, because the time since the common ancestor has been too short for random drift to fix them. This has nothing to do with the origin of those new functional variants, so please don't confuse Joe further; he's already too confused.
Piotr
May 21, 2014, 06:26 AM PDT
Joe: I obviously agree with you. I was only trying to separate the two aspects in the present discussion.
gpuccio
May 21, 2014, 06:11 AM PDT
gpuccio- Explaining functional variation is very different from explaining the origin of the function. I have no qualms about blind and undirected processes slightly altering existing functions. Sheer dumb luck does exist and, given many opportunities, can possibly produce something that works slightly differently from the original. What I strongly disagree with is sheer dumb luck producing a novel functional protein from non-functional sequences.
Joe
May 21, 2014, 06:06 AM PDT
Joe: Piotr evidently believes that functional variation can be explained by NS. We are waiting inputs from him about that. Maybe they will come. We evidently believe that functional variation is explained by design. I don't envy Piotr's position. However, we can probably agree that functional variation exists. If he agrees on that, it would already be something. Let's see. I suppose he has rather been evading the problem, at present.
gpuccio
May 21, 2014, 05:56 AM PDT
Piotr: I can agree with what you say. And so? My point is not to say that we are better than chimps (although we may have some advantages, probably) or than Neandertals (who knows?). My point is that different functions require different sequences. Now, unless you want to deny the differences between, say, mouse and human, it is obvious that we have to allow that some differences in sequence are functional: they will provide mouse functions in mice, and human functions in humans. Let's not judge. But the differences remain. So, we cannot ignore the problem of DNA which changes to express new functions. It must be there. And we don't know how much of the variation we observe is due to functional change. But we will discover it, as time goes by. That point is also fundamental in my refutation of the famous Moran argument. It is wrong to equate variation with neutral variation as long as we don't know how much of that variation is linked to functional differences. If functional variation is subtracted from total variation, the remaining variation, which is reasonably neutral, is less. Less neutral variation means more purifying selection, which means that a higher part of the whole genome is functional, even according to the general standard. Therefore, Moran's argument is wrong because it simply ignores the functional variation, whose percentage is still to be determined.
gpuccio
May 21, 2014, 05:52 AM PDT
Piotr:
which suggests that most HARs became fixed rather quickly by natural selection
Why natural selection, Piotr? What evidence supports that claim? Do you ever get tired of your bald assertions?
Joe
May 21, 2014, 05:46 AM PDT
The important point is: what makes us humans different from chimps? Logic says: something which is different. Not something which is conserved.
Note, however, that it works both ways if you flip the roles. Chimpanzees, symmetrically, have chimp-specific accelerated regions (CARs), and mice have mouse-specific accelerated regions (MARs), and I suppose aardvarks have aardvark-specific accelerated regions (AARs). What's more, such lineage-specific accelerated regions often have to do with the regulation of cortical development, not just in humans but also in other species:
Strikingly, while some of these sequences were accelerated in the human lineage only, many others were accelerated in chimpanzee and/or mouse lineages, indicating that genes important for cortical development may be particularly prone to changes in transcriptional regulation across mammals. [Lambert et al. 2011, emphasis mine]
Interestingly, comparison between human and Neandertal/Denisovan genomes shows that there are fewer differences between our HARs than in the rest of the genome, which suggests that most HARs became fixed rather quickly by natural selection before our divergence from the common ancestor. One could say that they are "conserved" in the whole cluster of species most closely related to Homo sapiens.
Piotr
May 21, 2014, 05:24 AM PDT