
Is functional information in DNA always conserved? (Part two)


So, in the first part of this discussion, I tried to show, with real data from the scientific literature, how much of the human genome is conserved, and how that conservation is evaluated and expressed. I then argued that we already have good, credible evidence for function in a relevant part of the human genome (let’s say about 20%), that most of that functional part is non coding, and that a great part of it is not conserved. While some may disagree on the exact figures, I think it is really difficult to reject the whole argument.

But, as I anticipated, there are two more important aspects of the issue that I want to discuss in detail. I will do it now.

3) Conserved function which does not imply conserved sequence.

Sequence is conserved when function is present because function imposes specific constraints on the sequence itself.

For example, in a protein sequence with a well defined biochemical function, some variation will be possible without affecting the protein function,  while other kinds of variation will affect it more or less.

We have many examples of important loss of function caused by the change of even a single amino acid: Mendelian diseases in humans are a well known, unpleasant example of that.

We have many examples of important variation in the sequence of functional proteins which does not affect the function: the so called neutral variations in proteins. For example, there are many variants of human hemoglobin, more than 1000, most of them caused by a single amino acid substitution. While many of them cause some disease, or at least some functional modification of the protein, at least a few of them are completely silent clinically, both in the heterozygous and in the homozygous state.

Now, there is an important consequence of that. Neutral variation also happens in functional sequences, although it happens less in those sequences. How much neutral variation can be tolerated by a functional sequence depends on the sequence. For proteins, it is well known that some of them can vary a lot while retaining the same structure and function, while others are much more functionally constrained. Therefore, even functional proteins can be more or less conserved over the same span of time.

What about non coding genes? While we understand much (but not all) of the sequence-structure-function relationship for proteins, here we are almost wholly ignorant. Non coding genes, when they are functional, act in very different ways, most of them not well understood. Many of them are transcribed, and we don’t understand much of the structure of the transcribed RNAs, least of all their sequence-structure-function relationship. IOWs, we have no idea how functionally constrained the sequence of a functional non coding DNA element is.

While searching for pertinent literature about this issue, I have found this very recent, interesting paper:

Evolutionary conservation of long non-coding RNAs; sequence, structure, function.

The abstract (all emphasis is mine):

BACKGROUND:

Recent advances in genomewide studies have revealed the abundance of long non-coding RNAs (lncRNAs) in mammalian transcriptomes. The ENCODE Consortium has elucidated the prevalence of human lncRNA genes, which are as numerous as protein-coding genes. Surprisingly, many lncRNAs do not show the same pattern of high interspecies conservation as protein-coding genes. The absence of functional studies and the frequent lack of sequence conservation therefore make functional interpretation of these newly discovered transcripts challenging. Many investigators have suggested the presence and importance of secondary structural elements within lncRNAs, but mammalian lncRNA secondary structure remains poorly understood. It is intriguing to speculate that in this group of genes, RNA secondary structures might be preserved throughout evolution and that this might explain the lack of sequence conservation among many lncRNAs.

SCOPE OF REVIEW:

Here, we review the extent of interspecies conservation among different lncRNAs, with a focus on a subset of lncRNAs that have been functionally investigated. The function of lncRNAs is widespread and we investigate whether different forms of functionalities may be conserved.

MAJOR CONCLUSIONS:

Lack of conservation does not imbue a lack of function. We highlight several examples of lncRNAs where RNA structure appears to be the main functional unit and evolutionary constraint. We survey existing genomewide studies of mammalian lncRNA conservation and summarize their limitations. We further review specific human lncRNAs which lack evolutionary conservation beyond primates but have proven to be both functional and therapeutically relevant.

GENERAL SIGNIFICANCE:

Pioneering studies highlight a role in lncRNAs for secondary structures, and possibly the presence of functional “modules”, which are interspersed with longer and less conserved stretches of nucleotide sequences. Taken together, high-throughput analysis of conservation and functional composition of the still-mysterious lncRNA genes is only now becoming feasible.

 

So, what are we talking about here? The point is simple. Function in non coding DNA can be linked to specific structures in RNA transcripts, and those structures, and therefore their function, can be conserved across species even in the absence of sequence conservation. Why? Because the sequence/structure/function relationship in this kind of molecule is completely different from what we observe in proteins, and we still understand very little of those issues.

As the authors say:

In contrast to microRNAs, almost all of which are post-transcriptional repressors, the diverse functions of lncRNAs include both positive and negative regulations of protein-coding genes, and range from lncRNA:RNA and lncRNA:protein to lncRNA:chromatin interactions [8–11]. Due to this functional diversity, it seems reasonable to presume that different evolutionary constraints might be operative for different RNAs, such as mRNAs, microRNAs, and lncRNAs.

Which is exactly my point.

The authors examine a few cases where the sequence/structure/function relationship of some lncRNAs has been studied in more detail. They conclude:

Tens of thousands of human lncRNAs have been identified during the first genomic decade. Functional studies for most of these lncRNAs are however still lacking with only a handful having been characterized in detail [8,10,11,87]. From these few studies it is apparent that some lncRNAs are important cellular effectors ranging from splice complex formation [34] to chromatin and chromosomal complex formation [43,46] to epigenetic regulators of key cellular genes.

It is becoming increasingly apparent that lncRNAs do not show the same pattern of evolutionary conservation as protein-coding genes. Many lncRNAs have been shown to be evolutionary conserved [5]; but they do not appear to exhibit the same evolutionary constraints as mRNAs of protein-coding genes.

While certain regions of the lncRNAs appear to maintain the regulatory function, such as bulges and loops, the exact sequence in other regions of lncRNAs appear less important and possibly act as spacers in order to link functional units or modules. Depending on the function, e.g., whether the RNA sequence is a linker or a functional module, different patterns of conservation might be expected.

It is important to remember that lncRNA genes are only a part of non coding DNA. If someone wonders how big a part, I would suggest the following paper:

The Vast, Conserved Mammalian lincRNome

which estimates human lncRNA genes at about 53,649 genes, more than twice the number of protein coding genes, corresponding to about 2.7% of the whole genome (Figure 2). It’s an important part, but only a part. And it is a part which, while probably functional in many cases, still is poorly conserved at sequence level.

Other parts of the non coding genome will have different types of function, structure, and therefore sequence conservation. For example, the following paper:

Integrated genome analysis suggests that most conserved non-coding sequences are regulatory factor binding sites

argues that most conserved non coding regions (about 3.5% of the genome, conserved across vertebrate phylogeny, strongly suggesting its functional importance, which clusters into >700 000 unannotated conserved islands, 90% of which are <200 bp) “serve as promoter-distal regulatory factor binding sites (RFBSs) like enhancers”, rather than encoding non-coding RNAs. IOWs, these short sequences in the non coding genome which make up another 3.5% of the total would be functional not because of their RNA transcript, but directly as binding sites (enhancers and other distal regulatory elements). Now, these sequences are conserved. That proves the general point: different functions, different relationship between sequence and function, different conservation of functional elements. In general, it seems that function which expresses itself through non coding RNA transcripts is less conserved at sequence level.

And now, the last point, maybe the most important of all.

4) Function which requires non conservation of sequence.

When we analyze conservation of sequences across species as an indicator of function, we are forgetting a fundamental point: in the course of natural history, species change, and function changes with them.

IOWs, the reason why species are different is that they have different molecular functions.

So, there is some implicit contradiction in equating conservation with function. A conserved sequence is very likely to be functional, but it is not true that a function needs a conserved sequence, if it is a new function, or a function which has changed.

Now, we know that protein coding genes have not changed a lot in the last parts of natural history. It is usually recognized that the greatest change, especially in more recent taxa, is probably regulatory. And the functions which have been identified in various parts of non coding DNA are exactly that: regulatory.

So, to sum up:

– Species evolve and change

– The main tool for that change is, realistically, a change in regulatory functions

– If a function changes, the sequences on which the function is based must change too

– Therefore, those important regulatory functions which change for functional reasons will not be conserved across species

This point is different from the previous point discussed here.

In point 3, the reasoning was that the same function can be conserved even if the sequence changes, provided that the structure is conserved.

In point 4, we are saying that in many cases the sequence must change for the function to change with it.

Now, although this reasoning is quite logical and convincing, I will try to back it up with empirical observations. To that purpose, I will use two different models: HARs and the results of the recent FANTOM5 paper about the promoterome.

4a) Human Accelerated Regions (HARs).

What are HARs? Let’s take it from Wikipedia:

Human accelerated regions (HARs), first described in August 2006,  are a set of 49 segments of the human genome that are conserved throughout vertebrate evolution but are strikingly different in humans.

IOWs, they are sequences which were conserved in other vertebrates, primates included, and which changed in humans.

Are they functional? That’s what is believed for some of them. Again, Wikipedia:

Several of the HARs encompass genes known to produce proteins important in neurodevelopment. HAR1 is an 106-base pair stretch found on the long arm of chromosome 20 overlapping with part of the RNA genes HAR1F and HAR1R. HAR1F is active in the developing human brain. The HAR1 sequence is found (and conserved) in chickens and chimpanzees but is not present in fish or frogs that have been studied. There are 18 base pair mutations different between humans and chimpanzees, far more than expected by its history of conservation.[1]

HAR2 includes HACNS1 a gene enhancer “that may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs”. Evidence to date shows that of the 110,000 gene enhancer sequences identified in the human genome, HACNS1 has undergone the most change during the evolution of humans following the split with the ancestors of chimpanzees.[4] The substitutions in HAR2 may have resulted in loss of binding sites for a repressor, possibly due to biased gene conversion

Now, for brevity, I will not go into details, but… “active in the developing human brain” and “may have contributed to the evolution of the uniquely opposable human thumb, and possibly also modifications in the ankle or foot that allow humans to walk on two legs” are provocative enough thoughts, and I believe that I don’t need to comment on them.

The important point is: what makes us humans different from chimps? Logic says: something which is different. Not something which is conserved.

4b) The results from FANTOM5 about the promoterome.

FANTOM5 has very recently published a series of papers with very important results. One of the most important is probably the following article in Nature:

A promoter-level mammalian expression atlas

Unfortunately, the article is paywalled. I have access to it, so I will try to sum up the points which are needed for my reasoning.

So, what did they do? In brief, they used a very powerful technology, cap analysis of gene expression (CAGE), to study various aspects of the transcriptome in different human cells from different tissues and states. This is probably the most important analysis of the human transcriptome ever carried out.

This particular paper focuses on a “promoter atlas”, IOWs an atlas of the expression of promoters (transcription start sites, TSSs, which control the transcription of target genes) in different tissues.

So, according to the level of expression of those promoters in different tissues and cells, they classify genes (both protein coding and non protein coding) in:

– ubiquitous-uniform (‘housekeeping’, 6%): those genes which are expressed at similar levels in most cell types

– ubiquitous non-uniform (14%): expressed in most cell types, but at different levels

– non-ubiquitous (cell-type restricted, 80%)

Each of those types includes both  C (protein coding genes) and N (non protein coding genes).
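To make the three expression-breadth classes more concrete, here is a minimal toy sketch of how a promoter could be assigned to one of them from its expression levels across cell types. The cut-offs (`DETECTION_TPM`, `UNIFORM_FOLD`), the function name and the example profiles are all hypothetical, chosen only for illustration; they are not the criteria or data used in the FANTOM5 paper.

```python
from statistics import median

# Hypothetical cut-offs, for illustration only (NOT the FANTOM5 criteria).
DETECTION_TPM = 1.0   # minimum median expression to call a promoter "ubiquitous"
UNIFORM_FOLD = 10.0   # maximum max/median ratio to call it "uniform"


def classify_promoter(tpm_by_cell_type):
    """Assign one promoter to an expression-breadth class.

    tpm_by_cell_type maps a cell type name to an expression level (TPM).
    """
    values = list(tpm_by_cell_type.values())
    med = median(values)
    if med < DETECTION_TPM:
        # Silent in at least half of the cell types: cell-type restricted.
        return "non-ubiquitous"
    if max(values) <= UNIFORM_FOLD * med:
        # Expressed broadly and at similar levels: "housekeeping".
        return "ubiquitous-uniform"
    # Expressed broadly, but at very different levels.
    return "ubiquitous non-uniform"


# Made-up expression profiles for three imaginary promoters.
housekeeping = {"fibroblast": 50.0, "neuron": 42.0, "hepatocyte": 61.0, "T-cell": 55.0}
variable = {"fibroblast": 5.0, "neuron": 300.0, "hepatocyte": 4.0, "T-cell": 6.0}
restricted = {"fibroblast": 0.0, "neuron": 120.0, "hepatocyte": 0.0, "T-cell": 0.1}

for name, profile in [("housekeeping", housekeeping),
                      ("variable", variable),
                      ("restricted", restricted)]:
    print(name, "->", classify_promoter(profile))
```

With real data, the percentages quoted above (6%, 14%, 80%) would simply be the fraction of promoters falling into each class under whatever criteria the authors actually used.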

Now, that’s very interesting. We now know that most genes (80%), both coding and non coding, are expressed only in some cell types.

But the most interesting thing, for our discussion about conservation, is that they studied the promoter expression both in human cells and in other mammals.

Now, we must look at Figure 3 in the paper. For those who cannot access the article, there is a low resolution version of this figure here  (just click on Figure 3 in the “at a glance” box;  OK, OK, it’s better than nothing!).

The figure is divided into two parts, a and b. In each part, the x axis shows the evolutionary divergence from humans (from 0 to 0.8; the grey vertical lines correspond to macaque, dog and mouse). The y axis shows “Human TSS with aligning orthologous sequence (%)”, IOWs the % conservation of each group of genes in the graph at various points of evolutionary divergence. Each line represents a different group of genes. So, the lines which remain more “horizontal” represent groups of genes which are more conserved, while those which “go down” from left to right are those less conserved. I hope it’s clear.

On the left (part a) genes are grouped as above: ubiquitous-uniform, etc., each category divided into C or N (coding or non coding).

What are the conserved groups? In order:  Non-ubiquitous C (green line); Ubiquitous uniform C (orange line); Ubiquitous non-uniform C (purple line).

IOWs, coding genes are more conserved, and non ubiquitous are most conserved.

That is not news.

Conversely, non coding genes are less conserved, in this order: Non-ubiquitous N (lighter green); Ubiquitous non-uniform N (lighter purple); Ubiquitous uniform N (lighter orange). This last line is definitely less conserved than the random reference (the dotted line).

This part is “Conservation by expression breadth and annotation”.

Well, what is on the right (part b)? It is “Conservation by cell-type biased expression”.

IOWs, the graph is the same, but genes are grouped in different lines according to the cell type where they are preferentially expressed.

The most conserved groups? Those with preferential expression in:  Fibroblast of periodontium, Fibroblast of gingiva, Preadipocyte, Chondrocyte, Mesenchymal cell.

The least conserved? Those with preferential expression in:  Astrocyte, Hepatocyte, Neuron, Sensory epithelial cell, Macrophage, T-cell, Blood vessel endothelial cell. In decreasing conservation order.

Does that mean something?  I leave it to you to decide. For me, I definitely see a pattern. With all due respect for fibroblasts and adipocytes, neurons and T cells smell more of specialized cells which must change in higher taxa (excuse me, Piotr, mice will accuse me of not being politically correct).

So, my humble suggestion is: the things that change more are not necessarily those less functional. In many cases, they could be exactly the opposite: the bearers of new, more complex functions.

And non coding genes are very good candidates for that role.

126 Replies to “Is functional information in DNA always conserved? (Part two)”

  1. 1
    Dionisio says:

    GP:
    Glad to see you have posted the second part of this excellent research you have done. Now let us digest what you wrote, so we can discuss it. In my case, as usual, I will mostly learn from the OP and the follow-up comments, because I’m new to most of this. Thank you! Mile grazie!

  2. 2
    gpuccio says:

    Dionisio:

    Thank you for being the first to comment!

  3. 3
    Joe says:

    Hey, you Italian guys- My maternal grandfather hails from Pre’ de Ledro. I have a painting of it on my hall wall- my cousin painted it from the side of a mountain.

  4. 4
    gpuccio says:

    Joe:

    Well, that’s not exactly where I am, but it seems a very beautiful place!

  5. 5
    rhampton7 says:

    IOWs, the reason why species are different is that they have different molecular functions.

    Unlike humans, there are over a dozen recognized existing species of wolves, foxes, jackals, and coyotes, and even more subspecies (dog breeds). So your definition could test the darwin-based biological classifications of Canidae, leading to the construction of new, intelligent design based classifications. To do the same with human beings would be impossible – all we have is one existing species.

  6. 6
    gpuccio says:

    rhampton7:

    Well, it will certainly be done as our understanding of the molecular differences among species, even related ones, grows. Bioinformatics and transcriptome analysis have great potential, which was inconceivable just a few years ago.

    However, when I wrote about “species” being different at molecular level, I was not thinking only of strictly related species, but in more general terms, including differences among genus, family, order, class, and even phyla.
    The comparisons in the FANTOM5 paper, for example, were among different orders/families in the class of mammals (the reference animals being macaque, dog and mouse).

  7. 7
    Piotr says:

    The important point is: what makes us humans different from chimps? Logic says: something which is different. Not something which is conserved.

    Note, however, that it works both ways if you flip the roles. Chimpanzees, symmetrically, have chimp-specific accelerated regions (CARs), and mice have mouse-specific accelerated regions (MARs), and I suppose aardvarks have aardvark-specific accelerated regions (AARs). What’s more, such lineage-specific accelerated regions often have to do with the regulation of cortical development, not just in humans but also in other species:

    Strikingly, while some of these sequences were accelerated in the human lineage only, many others were accelerated in chimpanzee and/or mouse lineages, indicating that genes important for cortical development may be particularly prone to changes in transcriptional regulation across mammals. [Lambert et al. 2011, emphasis mine]

    Interestingly, comparison between human and neandertal/denisovan genomes shows that there are fewer differences between our HARs than in the rest of the genome, which suggests that most HARS became fixed rather quickly by natural selection before our divergence from the common ancestor. One could say that they are “conserved” in the whole cluster of species most closely related to Homo sapiens.

  8. 8
    Joe says:

    Piotr:

    which suggests that most HARS became fixed rather quickly by natural selection

    Why natural selection, Piotr? What evidence supports that claim?

    Do you ever get tired of your bald assertions?

  9. 9
    gpuccio says:

    Piotr:

    I can agree with what you say. And so?

    My point is not to say that we are better than chimps (although we may have some advantages, probably) or than Neandertals (who knows?).

    My point is that different functions require different sequences.

    Now, unless you want to deny the differences between, say mouse and human, it is obvious that we have to allow differences in sequences as functional: they will provide mouse functions in mice, and human functions in humans. Let’s not judge. But the differences remain.

    So, we cannot ignore the problem of DNA which changes to express new functions. It must be there. And we don’t know how much of the variation we observe is due to functional change. But we will discover, as time goes by.

    That point is also fundamental in my refutation of the famous Moran argument. It is wrong to equate variation with neutral variation as long as we don’t know how much of that variation is linked to functional differences.

    If functional variation is subtracted from total variation, the remaining variation which is reasonably neutral is less.

    Less neutral variation means more purifying selection, which means that a higher part of the whole genome is functional, even according to the general standard.

    Therefore, Moran’s argument is wrong because it simply ignores the functional variation, whose percentage is still to be determined.

  10. 10
    gpuccio says:

    Joe:

    Piotr evidently believes that functional variation can be explained by NS. We are waiting for input from him about that. Maybe it will come.

    We evidently believe that functional variation is explained by design.

    I don’t envy Piotr’s position.

    However, we can probably agree that functional variation exists. If he agrees on that, it would already be something. Let’s see. I suppose he has rather been evading the problem, at present.

  11. 11
    Joe says:

    gpuccio- Functional variation is very different than explaining the origin of the function. I have no qualms about blind and undirected processes slightly altering existing functions. Sheer dumb luck does exist and given many opportunities can possibly produce something that works slightly different than the original.

    What I strongly disagree with is sheer dumb luck producing a novel functional protein from non-functional sequences.

  12. 12
    gpuccio says:

    Joe:

    I obviously agree with you. I was only trying to separate the two aspects in the present discussion.

  13. 13
    Piotr says:

    Gpuccio:

    I wrote more to supplement what you’d said than to disagree with you. However:

    My point is that different functions require different sequences.

    Except when one and the same sequence is multifunctional, or assumes a novel function in a changed context.

    Piotr evidently believes that functional variation can be explained by NS. We are waiting for input from him about that. Maybe it will come.

    Piotr evidently believes that the quick fixation of new functional variants in the species in question must have been due to natural selection, since drift is simply too slow. Most of our present-day neutral polymorphisms are shared with neanderthals and denisovans, because the time since the common ancestor has been too short for random drift to fix them. This has nothing to do with the origin of those new functional variants, so please don’t confuse Joe further; he’s already too confused.
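The "drift is too slow" reasoning can be sketched as a rough back-of-the-envelope calculation. It relies on the standard population genetics result that a new neutral allele which does reach fixation in a diploid population takes, on average, about 4Ne generations to do so. The effective population size, generation time and split time below are assumed round numbers, used only for illustration and not taken from the comment above.

```python
def neutral_fixation_time_years(effective_pop_size, generation_years):
    """Approximate mean time to fixation of a new neutral allele by drift.

    Uses the classical ~4 * Ne generations approximation for a diploid
    population, conditional on the allele eventually becoming fixed.
    """
    return 4 * effective_pop_size * generation_years


# Assumed round numbers, for illustration only.
ASSUMED_NE = 10_000              # effective population size
ASSUMED_GENERATION_YEARS = 25    # years per generation
ASSUMED_SPLIT_YEARS = 500_000    # time since the common ancestor

drift_time = neutral_fixation_time_years(ASSUMED_NE, ASSUMED_GENERATION_YEARS)
print("Mean neutral fixation time (years):", drift_time)
print("Longer than the assumed split time:", drift_time > ASSUMED_SPLIT_YEARS)
```

Under these assumptions the expected fixation time by drift alone is longer than the time available, which is the sense in which quick fixation points to selection rather than drift. Different assumed values would change the numbers, but not the form of the argument.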

  14. 14
    gpuccio says:

    Piotr:

    When you write to supplement what I have said, please make it clear, or my natural insecurity can get the best of me! 🙂

    OK, if you mean fixation of an already functional sequence, which could have been generated by design, then I agree with you: the final fixation of an existing functional sequence is possible work for NS.

  15. 15
    Joe says:

    I’m confused because Piotr doesn’t have a clue? How does that work, exactly?

  16. 16
    gpuccio says:

    Piotr and Joe:

    Let’s try to concentrate on the ideas, rather than on our personal lucidity, I would really appreciate that. 🙂

  17. 17
    Dionisio says:

    Joe,

    What I strongly disagree with is sheer dumb luck producing a novel functional protein from non-functional sequences.

    I could even concede having that unlikely hypothetical scenario that you mentioned, but what my poor mind can’t figure out is having the (timing + location) ingredients of the recipes. I mean the ‘being in the right place at the right time’ kind of situation. The choreography and the orchestration. I definitely need someone to provide an explanation. The explanation must be robust, so it holds water in any weather conditions ;-). It must withstand any questions asked by any 7-year old child, which sometimes are more difficult to answer than the most educated questions.

    Let’s see if the following example can serve as an illustration.

    My former boss was a brilliant engineer, who came up with the main ideas for the software product I worked on as a simple programmer. Several software developers working together without the direction of the leading engineer, could have created some cool programs, but nothing comparable to the software product that was used by many engineers for years, because it was designed specially to make engineering design work easier and more efficient.

    My boss communicated his ideas to the analysts, who then wrote the programming specs for the programmers, who then wrote the programs in a code that was translated by a development environment compiler to a lower level assembly language which the operating system could translate to the lower level machine code that utilized the electronic signals and circuit topology of the microprocessors, based on the physical properties of the electronic components of the computer.

    At the end of that complex software development project, which included design, development, implementation, testing, delivery, they had a product that was used by many engineers in different engineering organizations in different countries. That successful software product was a close reflection of the original idea the main engineer had in his brilliant mind long before the first technical meeting took place to discuss the idea.

    Perhaps that example could serve as an analogy to the subject that occupies us here in this blog, because we deal with complex specified purpose-oriented prescriptive functional information. However, in the biology case we discuss here, the level of complexity and sophistication is incomparably higher than in the software development example. But the basic concept of top-down design seems similar.

    Without the original idea of the main engineer combined with his skillful communication of his ideas to us, that system could not have been created, even if we had all the other components of the process, as described before.
    Yes, the programmers could have written some cool apps, that perhaps did a little of this and a little of that, but not that successful comprehensive system that was sought after by many design engineers, because it was conceived by someone who knew what engineers needed in order to do their work more easily, quickly and efficiently.

    Without the main idea guy directing the show, it would not have made any difference hiring more programmers, better programmers, working more years, etc. Still that successful product or anything similar would not have been created.

    It would have been like a group of guys trying to swim from Portugal to Bermuda just using their natural abilities. Let’s say a cheating swimmer starts the race a week before or one kilometer ahead of the other swimmers. Would that have made a difference for the ultimate goal of reaching Bermuda? Let’s assume one dishonest guy cheats and wears a pair of fins, against the official rules. Well, perhaps he would be able to swim much faster than the rest, but eventually all the competitors would be exhausted, sunburned, and taken out of the water to a recovery clinic. No advantageous natural feature could make them reach the goal.

    Yes, let them have a few functional proteins pop up somehow here and there. Don’t even bother to ask how they got them. Maybe someone stole them from the cookies jar. You may give them a few more treats too. Let them start swimming a couple of kilometers ahead, a few weeks before. That’s fine.

    Just wish them a safe swim and please ask them to send us a nice colorful postcard from Bermuda when they get there, after they dry themselves well, change clothes and enjoy a well deserved meal in that beautiful island 😉

  18. 18
    Dionisio says:

    gpuccio

    Let’s try to concentrate on the ideas, rather than on our personal lucidity, I would really appreciate that. 🙂

    Agree. Well stated. Thank you.

    Perhaps I have made that mistake too, and have written something that relates more to ‘personal lucidity’ than to the discussed issues. My apologies for that.

    This is a very interesting discussion, which requires clear thinking and solid arguments based on valid evidence.
    One of my children, who is a veterinarian doctor, doesn’t understand how anyone can spend a minute reading what is discussed here. Why would anyone like to deal with these complex issues? Well, I don’t know. 😉

  19. 19
    gpuccio says:

    Dionisio:

    Thank you for your contributions. As usual, you say many important things.

    I would like to take advantage of the relative tranquility of this discussion to give some more data from the FANTOM5 promoterome paper.

    First of all, it is important to know that the analysis was performed on a really impressive number of different cell types:

    Single molecule CAGE profiles were generated across a collection of 573 human primary cell samples (~3 donors for most cell types) and 128 mouse primary cell samples, covering most mammalian cell steady states. This data set is complemented with profiles of 250 different cancer cell lines (all available through public repositories and representing 154 distinct cancer subtypes), 152 human post-mortem tissues and 271 mouse developmental tissue samples

    This is really amazing. They give us for the first time an idea of the complexity of the transcriptome in different cell types in humans.

    For example, they studied the expression of different transcription factors in different cells.

    Among 1,762 human and 1,516 mouse transcription factors compiled from the literature (refs. 21–23), promoter level expression profiles for 1,665 human transcription factors (94%) and 1,382 mouse transcription factors (91%) were obtained (Supplementary Tables 7, 8 and 9 and Supplementary Note 6). The distribution of expression levels and cell-type or tissue specificity of transcription factors (Extended Data Fig. 3f–j) and the number of robust promoter peaks per transcription factor gene was similar to coding genes in general (4.8 compared to 4.6). In any given primary cell type, a median of 430 (306 to 722) transcription factors were expressed at 10 TPM or above (~3 copies per cell based on 300,000 mRNAs per cell (ref. 18)) (Extended Data Fig. 3g).
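
    The "~3 copies per cell" figure in the quoted passage can be checked with a line of arithmetic. This is only a rough sketch: it assumes that TPM means tags per million in the CAGE data and takes the 300,000 mRNAs per cell from the quote itself. The function name `copies_per_cell` is just an illustrative choice.

```python
def copies_per_cell(tpm: float, mrnas_per_cell: float = 300_000) -> float:
    """Convert tags per million (TPM) into an approximate transcript count per cell."""
    # TPM is a share of one million tags, so scale it by the assumed mRNA content of a cell.
    return tpm * mrnas_per_cell / 1_000_000

# The 10 TPM threshold used in the quoted passage.
print(copies_per_cell(10))
```

    So the threshold of 10 TPM corresponds to only a handful of transcripts per cell under that assumption.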

    So, we now know that each specific cell type has, on average, high expression of about 400 specific transcription factors. And we still have no idea of how these highly specific and complex cell profiles are achieved in each different cell type.

    Finally, “Figure 4 | Coexpression clustering of human promoters in FANTOM5” gives an idea of how different clusters of genes are expressed in different cell types. If you have the paper, just look at it: it is something. If you are satisfied with a low resolution image, just go here:

    http://www.nature.com/nature/j.....13182.html

    and click on Figure 4.

    “Collapsed coexpression network derived from 4,882 coexpression groups (one node is one group of promoters; 4,664 groups are shown here) derived from expression profiles of 124,090 promoters across all primary cell types, tissues and cell lines”

    Enjoy!

    By the way, I would like to quote one of the important conclusions in the paper:

    The most commonly-enriched terms at a P value threshold of 10^-20 were classical monocyte (CL:0000860; 26,634 peaks, 14%), bone marrow (UBERON:0002371; 22,387 peaks, 12%) and neural tube (UBERON:0001049; 20,484 peaks, 11%) (Supplementary Table 13). This is consistent with the coexpression clustering in Fig. 4 (green and purple spheres correspond to leukocyte and central nervous system enriched expression profiles) and indicates that a large fraction of the mammalian genome is dedicated to immune and nervous system specific functions.

    Interesting, isn’t it?

  20. 20
    gpuccio says:

    Dionisio:

    Just as a reflection, if as said we have sets of about 400 specific Transcription Factors which characterize cell types (indeed, a median of 430 TFs per cell type), I wanted to compute the “space of combinations” of 400 TFs out of 1700. I tried in R with the “choose” function, but the result is out of range.

    So, I downsized the problem to the combinations of 200 TFs out of 1700. The result is 7.900757e+265.
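
    For anyone who wants to reproduce the numbers, here is a minimal Python sketch of the same calculation. It is not the R session described above: it assumes Python 3.8 or later (for `math.comb`), and the helper name `log10_combinations` is only illustrative. Python integers have arbitrary precision, so the 400 out of 1700 case does not go out of range.

```python
import math

def log10_combinations(n: int, k: int) -> float:
    """Return log10 of the number of ways to choose k items out of n."""
    # math.comb gives an exact integer; math.log10 accepts arbitrarily large ints.
    return math.log10(math.comb(n, k))

# Figures from the comment: about 1,700 TFs, subsets of 200 or 400.
for k in (200, 400):
    exact = math.comb(1700, k)
    print(f"C(1700, {k}): {len(str(exact))} digits, log10 = {log10_combinations(1700, k):.4f}")
```

    A double-precision float tops out near 1.8e308, which is why a floating-point result for the 400 TF case overflows while the 200 TF case still fits.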

    But there are still those who believe that cell differentiation in metazoa can happen without a lot of procedures written somewhere!

  21. 21
    Dionisio says:

    GP,

    “Interesting, isn’t it?”

    Mio amico, ‘Interesting’ is an understatement in this case.

    I think ‘Wow!’ would be more appropriate 😉

    Wow! Thank you for sharing this report (+ your comments) with the rest of us here.

    Mio Caro Dottore, we ain’t seen nothing yet… the party is just beginning… let the FANTOM5, RIKEN, and other serious research centers overwhelm us with delightful reports.
    If only we could understand and digest so much info faster, but that’s fine. One thing at a time.

    Yes, let’s enjoy it!

  22. 22
    Dionisio says:

    gpuccio,

    Just as a reflection, if as said we have sets of about 400 specific Transcription Factors which characterize cell types (indeed, a median of 430 TFs per cell type),…

    very interesting

    I wanted to compute the “space of combinations” of 400 TFs out of 1700. I tried in R with the “choose” function, but the result is out of range.

    oh, no!

    So, I downsized the problem to the combinations of 200 TFs out of 1700.

    That’s a good approach.

    The result is 7.900757e+265.

    Wow! That’s an indecent number! 😉

    But there are still those who believe that cell differentiation in metazoa can happen without a lot of procedures written somewhere!

    Well, maybe, but how? Can they describe it or at least give us a hint?

    As you know, I’m trying to understand the mechanisms behind the centrosome and the spindle apparatus operating during the intrinsic asymmetric mitosis in order to understand the cell fate determination, differentiation, migration, etc. during the first few weeks of human embryonic development, from an information processing perspective, and believe me, I’m struggling and sweating, just to gather and make sense of the information that is out there. It’s like a never-ending story. The more I dig, the deeper I have to keep digging. Questions get answered while new questions arise. But it’s fascinating. Who needs science fiction movies or computer-based action games when we have this mind-boggling scientific information to look at, though most times having no idea what it means?

    Mio caro amico, we have not seen anything yet… the party has just started… the fun part is still ahead…
    the amount of data coming out of research is overwhelming… how can science process and analyze all that information? Enormous computer and software resources have to be assigned to those important tasks.
    Many hard working scientists are dedicated to leading edge research, but perhaps more are needed?
    How to motivate students to pursue bioinformatics and biology research careers?

  23. 23
    gpuccio says:

    Dionisio:

    Intrinsic asymmetric mitosis is a fascinating subject. Let me know if you find interesting reviews on it.

    Bioinformatics is the future. I am immersed in reading other papers from FANTOM5. They are truly a treasure of new information.

  24. 24
    Piotr says:

    Gpuccio:

    I wonder what exactly your point is in Section 4). The title “Function which requires non conservation of sequence” is misleading. Non-conservation is not required here, and it doesn’t define a separate class of functions. If this kind of stuff isn’t conserved “across species”, it’s because some regulatory regions have undergone recent bursts of accelerated evolution and, in the case of HARs, closely related species that shared most of them have gone extinct, leaving those innovations restricted to one surviving species.

    It isn’t long-term conservation that reveals functionality, anyway, but any evidence of selection. For example, we presume that the HAR alleles fixed in humans are “functional” because their fixation pattern clearly shows they have been selected for. There’s no shortage of very old and highly conserved sequences whose regulatory function is known or can be inferred. If the recent ones are not (yet) conserved, it’s a consequence of their young age, not the importance of their functions.

  25. 25
    gpuccio says:

    Piotr:

    I wonder what exactly you mean here:

    “it’s because some regulatory regions have undergone recent bursts of accelerated evolution and, in the case of HARs, closely related species that shared most of them have gone extinct, leaving those innovations restricted to one surviving species.”

    What are the “closely related species that shared most of them” and which “have gone extinct”?

    My point in section 4) is simply this: function which is specific to some species, or taxon, or class, you name it, requires difference, whenever it emerges. The things that generate functional differences between species will be based on different molecular solutions.

    HARs are essentially sequences which were conserved in primates. They were probably functional in primates (otherwise, why were they conserved?). But in humans they change. And probably (at least some of them) they change to serve some other function.

    My point is that much of the non coding DNA which varies more than coding DNA across species can well vary for functional reasons, and not simply because it is modified by neutral mutations because of its supposed lack of function.

    My point is that, while proteins change little, procedures change a lot across species. And procedures are mainly written in non coding DNA. Which, therefore, changes more.

    That’s why T cells and neurons change more than fibroblasts across mammals.

    IOWs, the current theory that most variation is neutral and that most of human genome is non functional is simply wrong, and obstinately ignores the problem of where the procedures are written, those procedures which make a mouse a mouse, a dog a dog, and a human a human. Which are not only small accidental differences in some gene, which by sheer luck produce a brain and a connectome different from anything that previously existed. But, on the contrary, whole reorganizations of the complexity, new plans of software and function, which are individualized and optimized in each single new design, and require therefore individual and specific information in each case. Different in each case.

    These are my points.

  26. 26
    wd400 says:

    I’m sorry,

    How do you get from a few hundred thousand base pairs of “human accelerated regions” to “the current theory that most variation is neutral and that most of human genome is non functional is simply wrong”. An especially puzzling claim since the HARs are detected because they diverge faster than the neutral expectation.

  27. 27
    rhampton7 says:

    However, when I wrote about “species” being different at molecular level, I was not thinking only of strictly related species, but in more general terms, including differences among genus, family, order, class, and even phyla.

    I get that, but that’s not where you’re likely to find the dividing line between nature and intelligent design. After all, both sides in the debate have assumptions about the abilities (or lack thereof) of natural processes generating new information. Without empirical data, this will never be resolved.

    That’s why I see my proposal as being important. Humans have witnessed the differentiation of dog breeds – that is, the role of natural (material) forces in generating variability. Thus we need to measure just how different those genomes are (how many regions changed, how many base pairs per change, etc).

    Even among many ID advocates it’s widely assumed that a basal wolf form could have naturally evolved (without intelligent intervention) into all the known Canid species (wolf, fox, jackal, coyote). This too is an assumption that needs to be tested, hence the need for measurement. Presumably the differences will be greater but still within nature’s grasp.

    I suspect that some uneasy answers are going to come forward: perhaps the jump between species like the Arctic and the Fennec fox requires intelligent intervention, or perhaps nature alone is capable of generating all the forms of the sub-order Caniformia (bears, skunks, badgers, raccoons, seals, walrus, wolves, etc.)

  28. 28
    Mung says:

    Piotr:

    Piotr evidently believes that the quick fixation of new functional variants in the species in question must have been due to natural selection, since drift is simply too slow.

    Your reasoning is flawed.

    Drift is slow [questionable premise]

    Natural Selection is fast [questionable premise]

    Drift is too slow therefore Natural Selection is fast [non sequitur]

    Drift is too slow therefore it must have been Natural Selection [non sequitur]

    If you want fast fixation, reduce the population size and let drift do its thing!
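
    To put rough numbers on that last remark: under the standard neutral model for a diploid population, a new neutral mutation fixes with probability 1/(2N) and, when it does fix, takes about 4N generations on average. The sketch below just evaluates those two textbook formulas. It treats the census size N as the effective population size, which is an assumption made here for simplicity, and the function names are illustrative.

```python
def neutral_fixation_probability(population_size: int) -> float:
    """Chance that a single new neutral mutation eventually fixes (diploid population)."""
    return 1 / (2 * population_size)

def mean_generations_to_fixation(population_size: int) -> int:
    """Approximate mean time to fixation, in generations, given that fixation occurs."""
    return 4 * population_size

# Compare a small and a large population (both sizes are arbitrary examples).
for n in (100, 10_000):
    print(n, neutral_fixation_probability(n), mean_generations_to_fixation(n))
```

    Shrinking the population shortens the expected fixation time in proportion, which is the sense in which drift can be fast.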

  29. 29
    gpuccio says:

    wd400:

    “How do you get from a few hundred thousand base pairs of “human accelerated regions” to “the current theory that most variation is neutral and that most of human genome is non functional is simply wrong”.”

    Please, follow my reasoning. I start with the very logical consideration that function must change in different species, because different species are different. That simple fact is really underestimated. And it implies that regulatory functions which change, which are the real basis of the important differences between species, require that molecular sequences change, and that they change for a reason. And that such a functional change, if not recognized as such, will usually be considered as simple random variation due to neutral mutations.

    Then I offer two different empirical observations to support this principle. HARs and the difference in transcriptome conservation observed in FANTOM5, which can suggest a functional pattern.

    I am not quantifying how much sequence difference in the genome is due to functional change from those two examples. I am only trying to show that there are empirical observations that support my reasoning.

    I am well aware that HARs are few and short, and that quantitatively they do not mean much. But they are a good qualitative model. They diverge quickly, after having been conserved in the previous phase of evolution. And, exactly from that, they pose a problem: are they diverging because of random neutral variation, or because they have a new function? And we really don’t know yet the answer for all of them, we just have hints for a few.

    So, my point is: we don’t know why sequences change. The default explanation that they change mostly for random neutral mutations is just that, a default explanation which simply ignores the possibility that they change for functional reasons, and mostly for those regulatory functional reasons which we don’t yet understand, but of which we see the results in phenotype.

    The second observation about transcriptomes can potentially interest greater parts of the genome. However, I am not counting here how much genome changes for functional reasons. I don’t think we have the data to understand that, at present.

    I am only showing that part of the genome must change for functional reasons, that we don’t know yet what part it is and how big it is, that there are empirical observations at the molecular level (beyond the obvious observations at phenotypic level) that support that fact, and that the present theories completely ignore, or absolutely minimize, that concept.

  30. 30
    gpuccio says:

    rhampton7:

    I agree with your concept. That is certainly a field which can be directly investigated. Similarly, the genetic differences between members of the same species and their phenotypic effects are also an interesting aspect, and potentially of great importance.

    Let’s see if the new powerful technologies of transcriptome analysis can find new perspectives about that.

    Frankly, I have no problem with “uneasy answers”. I love answers, whatever they are. The only type of answers I don’t like is “wrong answers”. 🙂

  31. 31
    Piotr says:

    What are the “closely related species that shared most of them” and which “have gone extinct”?

    Neanderthals and Denisovans. I mention these two because we have their DNA, and it has been checked for the presence of HAR alleles (see the link above). There’s none available from other hominins.

  32. 32
    gpuccio says:

    Mung:

    I am a design believer, as I believe you are. The point of interest for me is how the functional result originates, and I think I have the general answer in the design paradigm.

    Frankly, I am less interested in how the functional design becomes fixed, if we admit that it starts in a limited part of the population.

    I suppose that NS can have a role in that. I am sure that drift would lose most of the designed traits in favor of useless variation. I don’t think that is a good way to “sell” designed things on the market!

    We must understand that the role of NS in the neo darwinian paradigm is not only of fixing the final result, but mainly of contributing to its generation by selecting the intermediate “small steps”. That role is completely imaginary and unsupported. But I suppose that, once a protein, or a species, is there, and it is functional, NS can act on that. That has nothing to do with the supposed role of NS in generating that protein or that species.

    After all, we know that negative purifying selection acts on functional proteins because we see the amazing conservation of many of them. It is certainly more difficult to establish the role of NS in the “expansion” of function (that is, the role of positive selection). But I don’t see any reason to deny it a priori.

    What is your position about those points? Just to understand.

  33. 33
    gpuccio says:

    Piotr:

    OK, but what is the importance of that? If HARs appeared before the divergence of various hominins, that is fine for me. My point is that functional innovation requires, in principle, sequence innovation. And that those variations will be restricted to those branches (classes, species, even races) where they are needed to express the functional difference.

    Even the divergence between bacteria and archaea, the oldest example of fundamental difference, poses the same problem: was it a functional divergence? And if, as I believe, the answer is yes, how much of the differences between bacteria and archaea, as we see them now, can be linked to that original functional differentiation?

    That is a reasoning based on design and function. It does not assume that the differences are mainly due to random neutral variation, and that function is some strange aside that emerges by sheer luck. On the contrary, it considers function for what it is, an extremely abundant and amazing property of the biological world, and asks for an explanation of it, and for the necessary implications at molecular level.

    The point is: paradigms do matter in scientific reasoning, and it is very important to choose the right paradigm, because wrong paradigms only lead to deformation of facts and of their interpretation.

  34. 34
    gpuccio says:

    Piotr:

    I am afraid that “the link above” does not work. That’s why I asked.

  35. 35
    Piotr says:

    Gpuccio:

    IOWs, the current theory that most variation is neutral and that most of human genome is non functional is simply wrong, and obstinately ignores the problem of where the procedures are written, those procedures which make a mouse a mouse, a dog a dog, and a human a human.

    Where indeed? Please tell us where those procedures are written. Can we perhaps see some of them? Sorry, Gpuccio, but you are sinking into gibberish.

    So someone’s been patiently manipulating billions of genomes over billions of years by psychokinetic means in order to achieve feats like making the 1,500 species of Drosophila different from each other, or to make the bibymalagasy a bibymalagasy (presumably for a reason, but only to let it go extinct when he’s bored with the bibymalagasy project).

  36. 36
  37. 37
    gpuccio says:

    Piotr:

    First of all, thanks for the link. It is interesting.

    You say:

    Where indeed? Please tell us where those procedures are written. Can we perhaps see some of them? Sorry, Gpuccio, but you are sinking into gibberish.

    What gibberish?

    I wrote:

    “ignores the problem of where the procedures are written, those procedures which make a mouse a mouse, a dog a dog, and a human a human.”

    Emphasis added.

    A problem is a problem. I have been saying for years that we don’t understand where and how the procedures are written. That’s the problem.

    Denying that the procedures exist is not a good solution. Or are you suggesting that there is no difference between “mice and men”?

    Non coding DNA is the best candidate for the procedures. We are beginning to understand something of that, but I would heartily agree that we are still in the dark about the most important aspects of the issue.

    What is gibberish about that? Or does any serious question about what we don’t understand, motivated by serious observations in reality, qualify as gibberish?

    So someone’s been patiently manipulating billions of genomes over billions of years by psychokinetic means in order to achieve feats like making the 1,500 species of Drosophila different from each other, or to make the bibymalagasy a bibymalagasy (presumably for a reason, but only to let it go extinct when he’s bored with the bibymalagasy project).

    Yes.

    But I don’t agree about the final part. There is no special reason to believe that the extinction of species is designed. It’s the generation of functional information which requires design, not its destruction. That can well be “natural”.

    For example, I believe that the Ediacara explosion, and its extinction, are examples of very good design and of its failure. Design projects may fail even if the designer is not at all bored by them. Usually the designer, if not easily discouraged, tries again (see Cambrian explosion).

    And I can agree about the “psychokinetic means” only if you admit that the same “psychokinetic means” can be working in human brains.

    By the way, thank you for the mention of the bibymalagasy. I did not know of it, but the name is really something. Always good to discuss with a linguist!

  38. 38
    gpuccio says:

    Piotr and wd400:

    I will try to explain again my point very simply, with reference to the FANTOM5 data I have discussed.

    a) We have about 500 different transcriptomes in human cells, according to the FANTOM5 categorization.

    b) We can maybe agree that some sequences in the genome explain those different transcriptomes (what I call “the procedures”).

    c) We can maybe agree that there are different organizations of the brain in mouse, dog and human.

    d) We can maybe agree that those differences are probably related to different transcriptomes in cells, and specifically in the neurons, and therefore to the DNA sequences that control transcriptomes.

    e) We can see from Figure 3 in the FANTOM5 paper that human neurons show, as measured in the paper, about 45% conservation between humans and dogs, and 40% conservation between humans and mice, while, for example, fibroblasts are at 60% in dogs and 55% in mice. That’s a difference of about 15 percentage points in transcriptome conservation between the two cell types.

    f) We can maybe agree that the fact that the neuron transcriptome has changed more, in humans, than the fibroblast transcriptome, could reasonably be related to the fact that brain architecture is very different in the three species, and especially in humans vs the other two.

    g) We can maybe agree, therefore, that there must be sequence differences between the human genome and, say, the mouse genome which are linked to that difference in neuron transcriptome, and therefore reasonably to the differences in brain structure.

    h) If we have some appreciation for human brain abilities, we can maybe agree in calling those differences “functional”.

    i) We can maybe agree that, unless and until we understand what those differences are, and how many they are, it is very likely that those differences will appear just as neutral variation, and be considered as such under the current methodology.
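
    The gap between the two cell types can be worked out directly from the approximate percentages quoted in point e). The sketch below takes those rounded values at face value; they are read off a figure, so the result is only as good as that reading, and the dictionary layout is just an illustrative choice.

```python
# Approximate conservation values (percent) as quoted in point e) above.
conservation = {
    "neurons": {"dog": 45, "mouse": 40},
    "fibroblasts": {"dog": 60, "mouse": 55},
}

def conservation_gap(species: str) -> int:
    """Percentage-point gap between fibroblast and neuron conservation for one species."""
    return conservation["fibroblasts"][species] - conservation["neurons"][species]

for species in ("dog", "mouse"):
    gap = conservation_gap(species)
    relative = gap / conservation["fibroblasts"][species]
    print(f"{species}: {gap} percentage points ({relative:.0%} relative to fibroblasts)")
```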

    OK, maybe I have assumed too much agreement. I have always been an optimist.

    However, I hope that the above points may be useful to clarify better my thoughts.

  39. 39
    Piotr says:

    f) We can maybe agree that the fact that the neuron transcriptome has changed more, in humans, than the fibroblast transcriptome, could reasonably be related to the fact that brain architecture is very different in the three species, and especially in humans vs the other two.

    To say that (the part in bold), you’d have to compare all those species pairwise. It seems, from what I’ve read, that the “neuron transcriptome” is less constrained by long-term conservation than some other parts of the transcriptome (say, that involved in the production of fibroblasts): not “especially” in humans (though here the results have been particularly spectacular) but more generally in mammals (at least). In other words, neuronal organisation enjoys more “experimental freedom” than some other tissues.

  40. 40
    gpuccio says:

    Piotr:

    OK, I will accept that as a partial agreement! Especially the “spectacular” part.

    But the same trend, and even more evident, is true for macrophages and T cells.

    Again, nervous system and immune system. Is it by chance that they are the most complex and adaptive systems we know of?

  41. 41
    Piotr says:

    Again, nervous system and immune system. Is it by chance that they are the most complex and adaptive systems we know of?

    I don’t know about complex — it depends on how you define the metric. As for adaptive — OK, they are adaptive in the technical sense: they can “learn” and “remember” (I hope this is what you mean).

  42. 42
    Joe says:

    Piotr, Is there a way we can test the claim that natural selection (or any other materialistic processes) produced either the nervous or the immune systems? If there is what is it?

    If there isn’t then it isn’t science.

  43. 43
    gpuccio says:

    Piotr:

    Yes, I mean that they receive information from the environment and process it. I suppose it’s more or less what you mean with “learn” and “remember” (which, however, are consciousness related terms, which I would never use for algorithmic non conscious activities; but that’s another story).

  44. 44
    bornagain77 says:

    semi related: ‘Aliens of sea’ provide new insight into evolution – May 22, 2014
    Excerpt: in an in-depth look at the genes of 10 comb jelly species, researchers report that these mysterious creatures evolved a unique nervous system in a completely different way than the rest of the animal kingdom.
    In other words, the nervous system evolved more than once, a finding published Wednesday by the journal Nature that challenges long-standing theories about animal development.
    http://phys.org/news/2014-05-a.....n.html#jCp
    But apparently that challenge to ‘long standing theories’ is never allowed to challenge the theory of evolution itself.

  45. 45
    gpuccio says:

    BA:

    Thank you. I hope someday you will tell us how you succeed in finding all these interesting sources. 🙂

  46. 46
    Piotr says:

    BA77:

    Nice stuff, thanks!

    Phylogenetic hypotheses change all the time. What else would you expect? The idea that Ctenophora may be a sister group to all the remaining animals (including sponges) has been around for some time. If their nervous system developed independently, the idea gains further support, since we do not have to puzzle over the absence of a nervous system in sponges.

    Still, I don’t think the controversy is over. Stem ctenophores must have existed already in the Proterozoic, but the group is small today (in comparison with sponges, cnidarians and bilaterians) and it seems that all the living ctenophores had their last common ancestor quite recently — possibly in the Paleocene, after the K/T mass extinction (which may have killed off all other comb jellies). What happened in their lineage during the long time — some 600 million years at the very least — between their putative separation from the rest of animals and the common ancestor of the crown group, is hard to reconstruct. What looks like genomic gain in other animals may actually be massive genomic loss during the “dark ages” of the ctenophore lineage.

  47. 47
    gpuccio says:

    BA and Piotr:

    Very interesting article. It’s amazing how genomes and transcriptomes can be analyzed in detail with the recent technologies. I expect that in 1 or 2 years we will have a lot of new knowledge to reflect upon. That’s really good.

    I would like to share this piece of information from the paper (supplementary material):

    Pleurobrachia-Specific Genes* (NR/SP) 10,897/11,957 (56%/61.2%)

    Gene Homologs to Metazoa* (NR/SP) 8,626/7,566 (44%/38.8%)

    *based on BLAST of the gene models to NR and SwissProt, with an evalue cutoff of 1e-04 (NR: 8,626 shared, 10,897 unique; SwissProt: 7,566 shared, 11,957 unique)

    IOWs, Pleurobrachia have more or less the same number of genes as we have (about 20,000), but about 60% of them are unique to Pleurobrachia.

    NR and SwissProt refer to the two different protein databases which were used to blast the genes, NCBI-NR and UniProtKB/SwissProt.

    And the result does not change choosing a much higher evalue cutoff (1): the percent of unique genes remains essentially the same, 55.5% / 60.8% in the two databases.
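
    The percentages in the table above follow directly from the quoted gene counts, as this small sketch shows. It uses only the shared and unique counts given in the footnote; the dictionary layout and the function name `unique_fraction` are illustrative, not taken from the paper.

```python
# Gene model counts as quoted from the supplementary material:
# "shared" = homolog found by BLAST, "unique" = no hit in that database.
blast_hits = {
    "NR": {"shared": 8_626, "unique": 10_897},
    "SwissProt": {"shared": 7_566, "unique": 11_957},
}

def unique_fraction(counts: dict) -> float:
    """Fraction of gene models with no BLAST hit in the given database."""
    total = counts["shared"] + counts["unique"]
    return counts["unique"] / total

for database, counts in blast_hits.items():
    total = counts["shared"] + counts["unique"]
    print(f"{database}: {total} gene models, {unique_fraction(counts):.1%} unique")
```

    Both databases add up to the same total number of gene models, which is where the "about 20,000 genes" comes from.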

  48. 48
    wd400 says:

    I will try to explain again my point very simply, with reference to the FANTOM5 data I have discussed…

    OK, but this seems to establish that (a) gene regulation is important (b) evolutionary change requires, you know, change.

    This has been known for some time, so I’m not sure what you are trying to establish here.

  49. 49
    Joe says:

    Piotr, Nice stories however science requires evidence and all you have is to throw time at any issue.

    wd400- (a) gene regulation is evidence for Intelligent Design (b) Intelligent Design is not anti-evolution.

  50. 50
    gpuccio says:

    wd400:

    Well, maybe something more:

    (a) gene regulation is important, and we don’t know well on which parts of the genome it is based (remember, the genome is one, and human transcriptomes are at least 500).

    (b) gene regulation could very reasonably be based on sequences which are part of non coding DNA (indeed, there is already much evidence of that)

    (c) while part of the regulatory sequences can well be conserved, many others will have to change, because change in phenotypes is mainly based on change in regulations. Indeed, evolutionary change requires, I know, change.

    (d) therefore, if (b) is correct, then part of non coding DNA (we don’t know how much) will change across species for functional reasons, and not for neutral evolution.

    (e) Let’s say that we have non coding DNA which is made of the following components:

    1) Non functional, subject to neutral variation and changing with it

    2) Functional and conserved across species

    3) Functional, and quickly changing across species for functional reasons.

    We have a total variation which is essentially the sum of 1) and 3). At the same time, we have a non functional genome which is subject to neutral variation which is essentially 1). IOWs, the total (non coding) genome, minus 2) and 3).

    Now, if you ascribe the total variation to neutral variation, you can easily think that it is justified by the whole non coding genome being non functional.

    But if you subtract the variation in 3) from the total variation, you get the true neutral variation, which is smaller (maybe much smaller, you know, with all the procedures to get those transcriptomes, and the rest, having to be in 3) ).

    But that would be compensated by the fact that the true non functional genome is smaller too (maybe much smaller).
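
    The bookkeeping in (e) can be made concrete with a toy calculation. Every number below is hypothetical and chosen only for illustration: the class sizes and the per-site rates are not estimates from any paper, and the sketch assumes that class 3) diverges about as fast as neutral sequence. It only shows how the two readings of the same total variation compare.

```python
# All numbers here are hypothetical, chosen only to illustrate the argument.
NEUTRAL_RATE = 0.01  # assumed substitutions per site in unconstrained sequence

# fraction = share of non coding DNA in each class; rate = assumed divergence per site
CLASSES = {
    "1) non functional": {"fraction": 0.60, "rate": NEUTRAL_RATE},
    "2) functional, conserved": {"fraction": 0.10, "rate": 0.0},
    # Assumption of this sketch: class 3) diverges about as fast as neutral sequence.
    "3) functional, changing": {"fraction": 0.30, "rate": NEUTRAL_RATE},
}

def divergence(names):
    """Total divergence contributed by the named classes (fraction times rate)."""
    return sum(CLASSES[n]["fraction"] * CLASSES[n]["rate"] for n in names)

def size(names):
    """Combined share of non coding DNA covered by the named classes."""
    return sum(CLASSES[n]["fraction"] for n in names)

varying = ["1) non functional", "3) functional, changing"]
neutral_only = ["1) non functional"]

# Reading A: all varying sequence is treated as non functional.
rate_a = divergence(varying) / size(varying)
# Reading B: only class 1) is non functional; class 3) is set aside as functional change.
rate_b = divergence(neutral_only) / size(neutral_only)

print(f"Reading A: non functional share {size(varying):.0%}, rate per site {rate_a:.3f}")
print(f"Reading B: non functional share {size(neutral_only):.0%}, rate per site {rate_b:.3f}")
```

    With these made-up inputs both readings give the same per-site rate, so the rate alone does not separate them; only the implied non functional share differs.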

    That’s how Moran can make a wrong argument for neutral variation being exactly that which we would expect from a non functional non coding genome.

    That’s how things which “have been known for some time” are completely ignored when a general scenario is considered, if the only purpose is to support the existing theories.

    As I have written to Piotr, in post 33:

    “The point is: paradigms do matter in scientific reasoning, and it is very important to choose the right paradigm, because wrong paradigms only lead to deformation of facts and of their interpretation.”

  51. 51
    wd400 says:

    Now, if you ascribe the total variation to neutral variation, you can easily think that it is justified by the whole non coding genome being non functional.

    No one thinks this. And if huge amounts of the human genome are functional they would still be under purifying selection now, that signal should be detectable for population datasets. But even biased estimates of lineage-specific constraint add only a few extra percent.

    That’s how things which “have been known for some time” are completely ignored when a general scenario is considered, if the only purpose is to support the existing theories.

    I have no idea what this means.

  52. 52
    Jehu says:

    gpuccio

    Interestingly, comparison between human and neandertal/denisovan genomes shows that there are fewer differences between our HARs than in the rest of the genome, which suggests that most HARS became fixed rather quickly by natural selection before our divergence from the common ancestor. One could say that they are “conserved” in the whole cluster of species most closely related to Homo sapiens.

    Or one could say they are all the same species, which parsimony supports.

  53. 53
    gpuccio says:

    wd400:

    “And if huge amounts of the human genome are functional they would still be under purifying selection now, that signal should be detectable for population datasets. But even biased estimates of lineage-specific constraint add only a few extra percent.”

    Could you please give greater details about that?

  55. 55
    wd400 says:

    When you look for within-species constraint on sequences you find an extra ~4% of the genome may be constrained:

    dx.doi.org/10.1126/science.1225057

  56. 56
    Piotr says:

    Gpuccio:

    That’s how Moran can make a wrong argument for neutral variation being exactly that which we would expect from a non functional non coding genome.

    Could you give us some sort of reference to the place where Larry Moran makes “the famous Moran argument”?

  57. 57
    Joe says:

    wd400:

    And if huge amounts of the human genome are functional they would still be under purifying selection now,

    How could you tell if the function isn’t sequence specific?

  58. 58
    gpuccio says:

    Piotr:

    I referred to this post of his, which was discussed here for a long time, together with other aspects, on VJ’s posts some time ago.

    http://sandwalk.blogspot.it/20.....chive.html

  59. 59
    gpuccio says:

    wd400:

    I have looked at the paper you linked. Unfortunately, I could only access the abstract. However, I read the free access paper which criticizes it (I suppose that’s why you call it “a biased estimate”).

    Well, I cannot really judge the substance of the discussion, first of all because I could not read the whole article, and second because it is a type of analysis with which I am not familiar. Maybe I will find the time to consider it better.

    However:

    a) The paper you refer to, in the abstract, says:

    “Although only 5% of the human genome is conserved across mammals, a substantially larger portion is biochemically active, raising the question of whether the additional elements evolve neutrally or confer a lineage-specific fitness advantage. To address this question, we integrate human variation information from the 1000 Genomes Project and activity data from the ENCODE Project. A broad range of transcribed and regulatory nonconserved elements show decreased human diversity, suggesting lineage-specific purifying selection. Conversely, conserved elements lacking activity show increased human diversity, suggesting that some recently became nonfunctional. Regulatory elements under human constraint in nonconserved regions were found near color vision and nerve-growth genes, consistent with purifying selection for recently evolved functions. Our results suggest continued turnover in regulatory regions, with at least an additional 4% of the human genome subject to lineage-specific constraint.”

    Emphasis mine. I am not sponsoring their conclusions, I am only trying to see what they say.

    The criticism by Phil Green and Brent Ewing, if I understand well, denies the validity of their findings.

    As far as I can understand, the main point remains the following. You say:

    “And if huge amounts of the human genome are functional they would still be under purifying selection now, that signal should be detectable for population datasets.”

    Emphasis mine. OK, maybe, or maybe not. I don’t know, and those two papers don’t seem to give great certainties. And, obviously, my objections remain that much depends on what type of function we are analyzing, and on how much it can change, even among population datasets.

    However, I appreciate your contribution, and please feel free to give me any further reference which can help clarify this aspect.

  60. 60
    Piotr says:

    Gpuccio:

    All that Larry argues in that blog post is that the difference between the genomes of chimps and humans is consistent with neutral expectations. That’s because molecular evolution is predominantly neutral (if you just count mutations).

    He doesn’t say that natural selection and adaptation haven’t been taking place as well, or that adaptive evolution didn’t play a decisive role in making humans different from chimps. He is more explicit about it in this post:

    http://sandwalk.blogspot.com/2.....ncent.html

  61. 61
    wd400 says:

    And, obviously, my objections remain that much depends on what type of function we are analyzing, and on how much it can change, even among population datasets.

    Well, if a functional sequence can accept almost any sequence then biological function must be pretty easy to come by? It’s going to be very hard to prove that ‘the genome is mostly functional’ and ‘biological function is highly specified’ are both true…

    Joe,

    You are right that sequences that function only as spacers would not show up in the data that paper analysed. But even spacer loci incur a mutational load with regard to indels and rearrangements, so the genome is unlikely to be dominated by functional elements even when function can include taking up space.

  62. 62
    Dionisio says:

    [OT]

    Is this abstract interesting?

    http://www.cell.com/molecular-.....14)00394-3

  63. 63
    bornagain77 says:

    Semi OT quote:

    “In light of Doug Axe’s number, and other similar results, (1 in 10^77), it is overwhelmingly more likely than not that the mutation, random selection, mechanism will fail to produce even one gene or protein given the whole multi-billion year history of life on earth. There are not enough opportunities in the whole history of life on earth to search but a tiny fraction of the space of 10^77 possible combinations that correspond to every functional combination. Why? Well just one little number will help you put this in perspective. There have been only 10^40 organisms living in the entire history of life on earth. So if every organism, when it replicated, produced a new sequence of DNA to search that (1 in 10^77) space of possibilities, you would have only searched 10^40th of them. 10^40 over 10^77 is 1 in 10^37. Which is 10 trillion, trillion, trillion. In other words, If every organism in the history of life would have been searching for one of those (functional) gene sequences we need, you would have searched 1 in 10 trillion, trillion, trillionth of the haystack. Which makes it overwhelmingly more likely than not that the (Darwinian) mechanism will fail. And if it is overwhelmingly more likely than not that the (Darwinian) mechanism will fail should we believe that is the way that life arose?”
    Stephen Meyer – 46:19 minute mark – Darwin’s Doubt – video
    https://www.youtube.com/watch?v=Vg8bqXGrRa0&feature=player_detailpage#t=2778
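
As a side check on the arithmetic in that quote (not on its premises), the figures can be reproduced with exact integers. This is only a sketch: the variable names are illustrative, and the two inputs (10^40 organisms, a space of 10^77 combinations) are simply taken from the quote as stated.

```python
from fractions import Fraction

# Figures taken directly from the quoted passage (not independently verified).
ORGANISMS_IN_HISTORY = 10**40   # number of trials available, per the quote
SEQUENCE_SPACE = 10**77         # size of the space being searched, per the quote

# Fraction of the space that could be sampled with one trial per organism.
fraction_searched = Fraction(ORGANISMS_IN_HISTORY, SEQUENCE_SPACE)

# "10 trillion, trillion, trillion" written out as an integer.
TRILLION = 10**12
ten_trillion_trillion_trillion = 10 * TRILLION**3

# Both comparisons check the quote's own arithmetic: 10^40 / 10^77 = 1 / 10^37.
print(fraction_searched == Fraction(1, 10**37))
print(ten_trillion_trillion_trillion == 10**37)
```

The check only confirms that the two numbers in the quote are combined consistently; whether those two numbers are the right inputs is a separate question.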

  64. 64
  65. 65
    Joe says:

    wd400- There are also sequences that function as data storage.

  66. 66
    wd400 says:

    If that were the case there would need to be some mechanism by which the “storage” sequences were shielded from mutation, which would show up in these sorts of studies.

  67. 67
    Dionisio says:

    Dionisio @ 62

    Principles and Properties of Eukaryotic mRNPs

    Sarah F. Mitchell, Roy Parker

    DOI: http://dx.doi.org/10.1016/j.molcel.2014.04.033

    The proper processing, export, localization, translation, and degradation of mRNAs are necessary for regulation of gene expression. These processes are controlled by mRNA-specific regulatory proteins, noncoding RNAs, and core machineries common to most mRNAs. These factors bind the mRNA in large complexes known as messenger ribonucleoprotein particles (mRNPs). Herein, we review the components of mRNPs, how they assemble and rearrange, and how mRNP composition differentially affects mRNA biogenesis, function, and degradation. We also describe how properties of the mRNP “interactome” lead to emergent principles affecting the control of gene expression.

    These processes are controlled by mRNA-specific regulatory proteins, noncoding RNAs, and core machineries common to most mRNAs.

    Where do these mRNA-specific regulatory proteins, noncoding RNAs, and core machineries common to most mRNAs come from? How?

  68. 68
    Joe says:

    wd400:

    If that were the case there would need to be some mechanism by which the “storage” sequences were shielded from mutation, which would show up in these sorts of studies.

    No, storage space is NOT sequence specific. It is just a physical place along the DNA sequence, that isn’t used for anything else.

  69. 69
    Mung says:

    gpuccio:

    So, my point is: we don’t know why sequences change. The default explanation that they change mostly for random neutral mutations is just that, a default explanation which simply ignores the possibility that they change for functional reasons,

    Masatoshi Nei:

    In other words, we could use the neutral theory as a null hypothesis for studying molecular evolution.

  70. 70
    Mung says:

    gpuccio @ 32, were you responding to my post @ 28?

    That post was written to respond to something Piotr had written in a post @ 13.

    So if you were thinking it was a response to something you had written it wasn’t. 🙂

  71. 71
    tjguy says:

    Partially off topic:

    Creation.com has a new series of articles on the information issue here:
    http://creation.mobi/a/9559

    Here is the opening paragraph of the last article in the series. Maybe there is some helpful information there for IDers.

    Any thoughts?

    In parts 1 and 2 of this series the work of various information theoreticians was outlined, and reasons were identified for needing to ask the same questions in a different manner. In Part 3 we saw that information often refers to many valid ideas but that the statements reflect we are not thinking of a single entity, but a system of discrete parts which produce an intended outcome by using different kinds of resources.

    We introduced in Part 3 the model for a new approach, i.e. that we are dealing with Coded Information Systems (CIS). Here in Part 4 the fundamental theories for CIS Theory are presented and we show that novel conclusions are reached.

  72. 72
    gpuccio says:

    Piotr at #60:

    I am aware of that. I never said that Moran denied the importance of adaptive evolution. That was probably VJ’s impression in the beginning, and VJ is probably less cynical than I am. I know that non design thinkers will always go back to NS to try to make sense, even if they strangely evade the subject when asked for detailed models or empirical supports to that point. 🙂

    My point was, instead, that Moran uses the assumption that most mutations are neutral to prove/suggest that most of the genome is non functional, which is circular reasoning. As I have tried to show, if an important part of the mutations is not neutral, but functional in a way we still do not understand, then that part of the variation must be subtracted from the observed mutations to get the true observed neutral fixation rate. That rate would then be definitely lower than what we would expect from the total mutation rate, which would point to an important part of the genome being functional and therefore under purifying selection.

    My point is: the argument is circular anyway. We cannot understand if the genome is functional or not by looking at the apparent rate of fixation, unless we know if that fixation is due to function or not. IOWs, unless we already know how much of the genome is functional.
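
The subtraction described in this comment can be sketched numerically. This is a rough illustration only. It assumes the textbook neutral-theory relation that the neutral substitution rate equals the neutral mutation rate, it borrows the figures of about 130 mutations and 121 substitutions per generation quoted from Moran's post elsewhere in this thread, and `functional_share` is a purely hypothetical parameter (nobody in the thread gives a value for it).

```python
def implied_neutral_fraction(observed_subs, mutation_rate, functional_share):
    """Fraction of new mutations that would have to be neutral.

    observed_subs    -- observed substitutions per generation (all kinds)
    mutation_rate    -- total new mutations per generation
    functional_share -- hypothetical share of the observed substitutions
                        that were fixed for functional (non-neutral) reasons
    """
    if not 0.0 <= functional_share <= 1.0:
        raise ValueError("functional_share must be between 0 and 1")
    # Remove the hypothetical functional substitutions from the observed total.
    neutral_subs = observed_subs * (1.0 - functional_share)
    # Under neutral theory, neutral substitution rate = neutral mutation rate,
    # so this ratio estimates the neutral fraction of all new mutations.
    return neutral_subs / mutation_rate


# Figures quoted from Moran's post in this thread; the shares are made up.
for share in (0.0, 0.25, 0.5):
    fraction = implied_neutral_fraction(121, 130, share)
    print(f"functional share {share:.2f} -> implied neutral fraction {fraction:.2f}")
```

With a functional share of zero the implied neutral fraction is 121/130; any positive share lowers it in proportion. What the share actually is remains the open question, which is why the argument cannot be settled by this arithmetic alone.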

  73. 73
    gpuccio says:

    wd400 at #61:

    I understand you are answering Joe’s point about non sequence depending function (such as spacer function).

    I would like to mention, however, that my point 3) was not about that, but rather about functions which allow different forms of sequence variation versus what we observe in protein coding genes, because the sequence/structure/function relationship is different.

    For example, the sequence/structure relationship in regulatory RNAs is certainly different than in coding DNA: the first one is related to the direct sequence of nucleotides and to the way it determines the final structure of the molecule according to chemical laws. The second one is related to the symbolic meaning of codons, and therefore to the chemical properties of another type of molecule.

    That certainly makes a difference. Just as an example, in genes which code for RNAs we cannot use the concept of synonymous mutations. In general, the problem is that we understand less of how regulatory RNAs work. And therefore, of how variation of sequence affects the functional space.

  74. 74
    gpuccio says:

    Dionisio at #64:

    Really interesting (and the article here is free!).

    There is an important point which should be considered. We are probably really underestimating the importance of short molecules in functional regulation. Both peptides and RNAs.

    Indeed, while basic biochemical function usually requires long molecules, the regulation of what those long molecules do can easily be obtained by short molecules.

    That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see.

  75. 75
    Piotr says:

    My point was, instead, that Moran uses the assumption that most mutations are neutral to prove/suggest that most of the genome is non functional…

    Again, I have to ask you: where exactly does Larry Moran prove/suggest such a thing? I somehow find it hard to imagine him guilty of such a gross non sequitur. N(early n)eutral theory explains why the accumulation of junk DNA is possible; it doesn’t claim that most of the genome must be junk. It predicts that junk content may be highly variable, from almost none to truckloads of it.

    Of course if “the genome” is specifically the human genome (or indeed any typical eukaryote genome, leaving aside special cases like the bladderwort and the pufferfish), there are well known reasons to argue that a lot of it consists of junk DNA. But it isn’t a conclusion drawn directly from neutral theory. I am sure Larry Moran must have mentioned some other evidence.

    By the way, if there is a pub anywhere in the world where genome researchers meet for a pint or two after work, “The Bladderwort and the Pufferfish” would be a terrific name for it.

  76. 76
    gpuccio says:

    wd400 at #66:

    Just to discuss, suppose that some sequence stores information through a different code that we don’t understand.

    In the genetic code (protein coding genes) we know that some mutations do not change the meaning of the coded information. Those are called synonymous, are considered by definition neutral (even if now we know that it is not always true), and they are not considered a sign that the sequence is not functional. Instead, we consider the rate of non synonymous mutations as an indicator of how many neutral mutations that sequence can tolerate, and therefore of the purifying selection acting on it.

    But if the code were different, we should reason differently. Other types of mutations would be “synonymous”, and others would be “non synonymous”.

    I simply mean that we cannot by default apply the concepts derived from protein coding genes to non protein coding genes, or to any sequence of which we don’t understand the sequence/structure/function relationship.

  77. 77
    gpuccio says:

    Mung at #70:

    OK, I was not really “responding” to you, just using something you had said as a starting point to clarify some concepts. 🙂

  78. 78
    gpuccio says:

    Piotr at #75:

    It gets even worse for the IDiots. Evolutionary theory predicts that the rate of change should correspond to the mutation rate since most of the differences are due to neutral substitutions in junk DNA.

    If evolutionary theory (population genetics) is correct, and if David Klinghoffer and chimps/bonobos actually evolved from a common ancestor, then we should observe a correspondence between the percent similarity of Klinghoffer and chimps and the predicted number of changes due to evolution.

    Let’s see if it works.

    The average generation time of chimps and humans is 27.5 years. Thus, there have been 185,200 generations since they last shared a common ancestor if the time of divergence is accurate. (It’s based on the fossil record.) This corresponds to a substitution rate (fixation) of 121 mutations per generation and that’s very close to the mutation rate as predicted by evolutionary theory.

    Now, I suppose that this could be just an amazing coincidence. Maybe it’s a fluke that the intelligent designer introduced just the right number of changes to make it look like evolution was responsible. Or maybe the IDiots have a good explanation that they haven’t revealed?

    Emphasis mine.

  79. 79
    Piotr says:

    You misrepresent Larry’s position. He doesn’t say or imply that “neutral theory is correct” = “most of the genome must be junk”. He does assume that most of human DNA is junk and therefore the separate evolution of humans and chimps has been neutral for the most part, but this assumption is based on various other evidence which he discusses elsewhere (including e.g. mutational load).

  80. 80
    Dionisio says:

    gpuccio @ 74

    There is an important point which should be considered. We are probably really underestimating the importance of short molecules in functional regulation. Both peptides and RNAs.

    Indeed, while basic biochemical function usually requires long molecules, the regulation of what those long molecules do can easily be obtained by short molecules.

    That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see.

    I could not have said it better, hence I have nothing to add to what you wrote, except to remind ourselves that we ain’t seen nothing yet… more discoveries are coming, the best part is still ahead 🙂

    Let’s enjoy it!

  81. 81
    Joe says:

    Piotr, Seeing that Larry Moran cannot produce a testable hypothesis for unguided evolution producing a bacterial flagellum, no one really cares what he says as most likely it is meaningless to science.

  82. 82
    wd400 says:

    Just to discuss, suppose that some sequence stores information through a different code that we don’t understand…

    If this ‘code’ was sequence specific then we’d detect the signatures of purifying selection (decreased diversity, skewed allele frequencies….) if it was in operation. If functional sequences can remain functional while accepting any-old mutation then the sequence-space must be full of biological function.

  83. 83
    Joe says:

    wd400:

    If functional sequences can remain functional while accepting any-old mutation then the sequence-space must be full of biological function.

    Only some functional sequences can remain functional while accepting any old mutation. If all you need is a 10 nucleotide-long space for storage, then all you need is ten nucleotides that are not already doing something.

    I can store things in a cardboard box. I can also store things in a milk crate, wooden crate, plastic storage bin, etc.

  84. 84
    wd400 says:

    In what sense are those nts storing something? A 10nt gap sounds like an empty box to me.

  85. 85
    gpuccio says:

    wd400:

    I don’t think that RNA’s function is really sequence independent, but that we don’t know how much and how it is sequence dependent.

    It can be true that we should see some sequence conservation for those functions which do not change for functional reasons, but I am not sure at all that the context you quoted (population datasets) is sensitive enough to detect that signal.

    I think that we need more data to understand really what is functional, what is conserved, what is functional in different ways, and what expresses quickly changing functions.

    Moreover, the analysis should be as specific as possible for different types of non coding DNA, because they certainly behave very differently.

  86. 86
    Joe says:

    wd400- You just don’t get it. In a design scenario there is software running the show. That software needs a place to reside in/ on the DNA. The RNAs formed from the DNA are akin to the data, ie ones and zeros, on computer busses. Some data is in/ on the functioning/ coding parts. Some of it is in the other regions.

  87. 87
    gpuccio says:

    Piotr:

    Maybe I am misrepresenting Moran. If that is the case, I apologize to him and to you.

    But frankly, I don’t think that is true.

    I can only, in complete honesty, say again how I read his argument.

    a) He assumes a specific hypothesis which generates a prediction:

    “Evolutionary theory predicts that the rate of change should correspond to the mutation rate since most of the differences are due to neutral substitutions in junk DNA.”

    So, the hypothesis is that “most of the differences are due to neutral substitutions in junk DNA”, IOWs, that most DNA is junk, and therefore most of the mutations happen in non functional elements, and are therefore neutral.

    The prediction is that “the rate of change should correspond to the mutation rate”. IOWs that if he measures, independently, the mutation rate, and the observed number of fixed mutations, he should find a good correspondence. That’s, again, because he believes that his assumption, that most of the mutations happen in non functional elements, and are therefore neutral, is true. IOWs, Moran is making a prediction, then goes to verify it, to confirm his assumption.

    b) He evaluates the mutation rate:

    “We know that the mutation rate is about 130 mutations per generation based on our knowledge of biochemistry. This rate has been confirmed by direct sequencing of parents and children ”

    c) He goes on with the verification:

    “Let’s see if it works.”

    d) He evaluates the observed number of fixed mutations as follows:

    “The human and chimp genomes are 98.6% identical or 1.4% different. That difference amounts to 44.8 million base pairs distributed throughout the entire genome. If this difference is due to evolution then it means that 22.4 million mutations have become fixed in each lineage (humans and chimp) since they diverged about five million years ago.

    The average generation time of chimps and humans is 27.5 years. Thus, there have been 185,200 generations since they last shared a common ancestor if the time of divergence is accurate. (It’s based on the fossil record.) This corresponds to a substitution rate (fixation) of 121 mutations per generation and that’s very close to the mutation rate as predicted by evolutionary theory.”

    e) He concludes that the two numbers are extremely similar:

    “Now, I suppose that this could be just an amazing coincidence. Maybe it’s a fluke that the intelligent designer introduced just the right number of changes to make it look like evolution was responsible. Or maybe the IDiots have a good explanation that they haven’t revealed?”

    So, maybe he has other arguments to conclude what he concludes, but the reasoning here is clear enough: my prediction has been confirmed, and only IDiots can believe that this is a coincidence. Therefore, the verification of my prediction confirms my assumptions.

    What were the assumptions?

    “most of the differences are due to neutral substitutions in junk DNA”.

    That’s my understanding of what he says.
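
For readers who want to retrace the quoted calculation, here is a minimal sketch of the arithmetic. It only reproduces the numbers as quoted from Moran's post. The genome size of 3.2 billion base pairs is not stated in the quote; it is an assumption implied by 44.8 million being 1.4% of the total. The variable names are illustrative.

```python
# Inputs as quoted in this comment, except GENOME_SIZE_BP (see note).
GENOME_SIZE_BP = 3_200_000_000   # assumption: implied by 44.8 million / 1.4%
DIVERGENCE_PER_MILLE = 14        # 1.4% difference between human and chimp
GENERATIONS = 185_200            # generations since the common ancestor
GENERATION_TIME_YEARS = 27.5     # average generation time
MUTATION_RATE = 130              # new mutations per generation

# Total differing base pairs between the two genomes.
total_differences = GENOME_SIZE_BP * DIVERGENCE_PER_MILLE // 1000

# Half of the differences are attributed to each lineage.
per_lineage = total_differences // 2

# Substitutions per generation in one lineage.
subs_per_generation = per_lineage / GENERATIONS

# Divergence time implied by the quoted generation count.
implied_divergence_years = GENERATIONS * GENERATION_TIME_YEARS

print(f"total differences:        {total_differences:,}")
print(f"per lineage:              {per_lineage:,}")
print(f"substitutions/generation: {subs_per_generation:.1f}")
print(f"mutation rate/generation: {MUTATION_RATE}")
print(f"implied divergence time:  {implied_divergence_years:,.0f} years")
```

The quoted generation count corresponds to a divergence time slightly above five million years, which matches the "about five million years" in the quote. The sketch says nothing about which interpretation of the match between the two rates is correct.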

  88. 88
    wd400 says:

    Joe,

    I still don’t get it, how can something that is “running the show” also just be a random string of nucleotides? Again, if any old string of nucleotides can help to “run the show” then biological function is pretty easy to come by?

  89. 89
    Piotr says:

    Gpuccio:

    As Joe Felsenstein kept patiently explaining on Sandwalk (and chastising everybody, including Larry, for their “fixation on fixation”), the number of fixed mutations doesn’t matter at all, because the differences between human and chimp reference sequence may well be between alleles that are not yet fixed in either species (and of course many of those that are fixed now were at polymorphic loci in the Homo/Pan MRCA). But Larry’s purpose was not to prove that hominin evolution has been neutral. He wanted to show that even random drift alone (the evolutionary “noise”) may realistically produce a difference of the observed order in the time available. He made his point.

    I find your argument strange. We don’t see it, we don’t understand it, we don’t know to what extent it may be conserved, we have no direct evidence of its existence: therefore it must be real, intricate, and important. Looks more like wishful thinking than logic to me.

  90. 90
    gpuccio says:

    Piotr:

    There is misrepresentation and misrepresentation.

    My arguments may be strange, but I have tried to detail them as much as possible, and to take in serious consideration the (few) comments and objections.

    Your strange “summary” does not even appear to be a comment, and frankly I can’t see any resemblance to what I have tried to argue here in it.

    However, I appreciate your attention.

  91. 91
    Joe says:

    wd400:

    I still don’t get it, how can something that is “running the show” also just be a random string of nucleotides?

    Umm nucleotides are hardware. You can’t see the immaterial information, ie software. You can only see its effects.

    Again, if any old string of nucleotides can help to “run the show” then biological function is pretty easy to come by?

    The nucleotides = available/ blank disc space or RAM- It ain’t the information, it’s just a place for the information to reside.

  92. 92
    wd400 says:

    Joe,

    . You can’t see the immaterial information, ie software

    How is this immaterial information stored if not in sequences?

    The nucleotides = available/ blank disc space or RAM- It ain’t the information, it’s just a place for the information to reside.

    So their function is to one day serve a function? You still have mutational load from rearrangements and indels to deal with, I’m afraid.

  93. 93
    Joe says:

    wd400- In a design scenario there would be at least two different types or classes of information. The first is the Crick class- functional sequence specificity in both nucleotides and proteins. That is the material representation of the immaterial information.

    The second class would be what directs all of that- as in it just doesn’t happen due to physics and chemistry. That class is immaterial.

  94. 94
    Joe says:

    wd400- the information is stored on the physical space available. The function is to serve as a physical space to store immaterial information.

  95. 95
    wd400 says:

    Well, I still don’t understand it, Joe. Sounds a bit like magic to me.

  96. 96
    Piotr says:

    Gpuccio:

    Your strange “summary” does not even appear to be a comment, and frankly I can’t see any resemblance to what I have tried to argue here in it.

    OK, I refer to statements like these:

    IOWs, the current theory that most variation is neutral and that most of human genome is non functional is simply wrong and obstinately ignores the problem of where the procedures are written…

    I don’t think that RNA’s function is really sequence independent, but that we don’t know how much and how it is sequence dependent.

    That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see.

    The picture that emerges is one of a genome full of functional “dark matter” — an intricate regulatory network, essentially invisible and incomprehensible to us, but presumably involved in some important activities like cortical development. Let’s imagine that such a network (spanning most of the genome) is real, and let’s call it the imaginome ®.

    What about mutational load? The imaginome would be an easy target for mutations, and one would expect many of them to have deleterious effects.

    I shall have more questions later on.

  97. 97
    Piotr says:

    Joe:

    If the genome is a RAM, it means that biological processes can write to it and change its content. Do you realise what you are proposing?

  98. 98
    gpuccio says:

    Piotr:

    Call it as you like, but I would really encourage you to answer the following points:

    a) How does the genome control the development of more than 500 different, specific transcriptomes, each of them expressing about 400 specific transcription factors, plus the structure of organs and tissues, plus the connectome in the brain, plus the immunologic networks, and so on?

    Do you realize that when I ask where are the procedures, and how are they written, I am not imagining anything, but just exercising the natural human right of trying to understand things?

    So please, clarify what is your position about the procedures: do you simply deny that they exist? And if they exist, where and how do you hypothesize they are written?

    b) When I say:

    “I don’t think that RNA’s function is really sequence independent, but that we don’t know how much and how it is sequence dependent.”

    I am just trying to clarify my position about the discussion which is taking place between wd400 and Joe. While it is certainly possible that some parts of the genome have sequence independent roles, I think that most functional DNA, including the non coding part, has sequence dependent function. But I have specified that we don’t know exactly how that relationship works for non coding DNA, especially for the parts which codify regulatory RNA. That is not my personal fault. It is just that we do not understand enough of the biochemical way in which regulatory RNA acts, not even of its biochemical structure. That means that there is space to discover new things, not that I am imagining anything.

    As you can see from my answers to wd400, I am taking his objection about the problem of conservation and of possible mutations very seriously. For variation across species, I really believe that functional constraints to vary can have a very important role. For wd400’s argument about the possibility to detect a signal of conservation even in current populations, I am afraid I cannot give a technical judgement, at least not at present. However, I am not at all sure that such a signal can be detected in a general way, and I have suggested that it would be more useful to study specific different components of non coding DNA, first of all across species, and if possible also in populations, always continuing to try to understand directly how the possible functions operate. A general, multifaceted approach to this important issue could give some more realistic answers than the couple of papers, certainly interesting, quoted by wd400.

    So I ask you: do you agree that RNA genes, if functional, express their function in ways that are not yet very clear? That the relationship between nucleotide sequence and structure and function is something which still needs to be studied? That it is something completely different from the better known, but not yet completely understood, field of protein structure and function? Am I imagining those things?

    c) When I said:

    “That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see.”

    I was referring to a paper, very interesting, proposed by Dionisio, where important regulatory functions had been demonstrated for a short nucleotide sequence derived from a protein coding mRNA, but completely independent from the protein coding function.

    My comment is not imagination. It has recently become well known that various kinds of short sequences, both RNA and peptides, have important regulatory functions. That network, that certainly exists, is probably still vastly invisible to our technologies, for obvious reasons: many of the methods we have developed focus indeed on longer sequences.

    I thought I had said that clearly in my post:

    “There is an important point which should be considered. We are probably really underestimating the importance of short molecules in functional regulation. Both peptides and RNAs.

    Indeed, while basic biochemical function usually requires long molecules, the regulation of what those long molecules do can easily be obtained by short molecules.

    That means that there can be an intricate regulation network that we not only don’t understand, but essentially don’t see.”

    So I ask you: do you agree that the regulatory function of a short fragment of mRNA is something new, that up to now has escaped our detection? Do you agree that, if that is not an isolated case, there could be a lot operating at the level of transcription regulation by short molecules, of which we still see only a tiny part? Is it imagination to hypothesize that, in the light of many recent discoveries, including the one linked by Dionisio?

    You say:

    The picture that emerges is one of a genome full of functional “dark matter” — an intricate regulatory network, essentially invisible and incomprehensible to us, but presumably involved in some important activities like cortical development.

    Yes. Exactly. Is that so strange for you? Those important activities are there, for all to see. Do we really need imagination to ask ourselves how they take place?

    There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.

    You say:

    Let’s imagine that such a network (spanning most of the genome) is real, and let’s call it the imaginome ®.

    Call it as you like. I don’t know if it spans most of the genome. Indeed, I am not really sure that it is in the genome, or entirely in it. I am very open minded about that. But I am sure that such a network exists, that it is real.

    What about mutational load? The imaginome would be an easy target for mutations, and one would expect many of them to have deleterious effects.

    I can agree with that. My point is that we must understand more of the network and where it is and how it works, before trying to use the mutation argument and gross measures to deny that it exists. My point is that only a correct assessment of structure and function of the regulatory elements will allow us to measure if they change, how much they change, and what the results of the changes are.

    Even for proteins, we often don’t understand how sequences which are very different may have similar structure and function. And we know much more about proteins than we do about regulatory RNAs.

    I shall have more questions later on.

    I am waiting for them. I love questions.

  99.
    Piotr says:

    Gpuccio:

    So please, clarify what is your position about the procedures: do you simply deny that they exist? And if they exist, where and how do you hypothesize they are written?

    No explicit procedures are necessary. “Procedures” emerge from the interactions of regulatory elements. Thinking of DNA in terms of source code written by a human programmer is not very helpful.

    So I ask you: do you agree that RNA genes, if functional, express their function in ways that are not yet very clear?

    We understand how some of them work, and we know precious little about others. No disagreement here.

    That the relationship between nucleotide sequence and structure and function is something which still needs to be studied? That it is something completely different from the more known, but not yet completely understood, field of protein structure and function? Am I imagining those things?

    So far, so good. Functional RNAs are important, interesting, and insufficiently understood.

    So I ask you: do you agree that the regulatory function of a short fragment of mRNA is something new, that up to now has escaped our detection?

    I’m not an expert, but this particular little fellow does seem to be something new, given its untypical origin and its target (though functional miRNAs in general are hardly big news).

    Do you agree that, if that is not an isolated case, there could be a lot operating at the level of transcription regulation by short molecules, of which we still see only a tiny part? Is it imagination to hypothesize that, in the light of many recent discoveries, including the one linked by Dionisio?

    Note that, since the fragment in question is derived from an mRNA, it doesn’t really increase the amount of “non-junk”. How do you extrapolate from one such case to a whole realm of hitherto unknown molecules? All the known types of functional RNAs, long as well as short, still add up to a few percent of the human genome (a pretty generous estimate).

    As a side note: the RNA fragment in question is an 18-mer (and most miRNAs are about 22nt long). Since they are so short and there are so many of them, and they are functional, there must be a reasonable probability that a randomly generated short RNA sequence can be co-opted for a regulatory function. They don’t have to be designed by a bodiless intelligence.

  100.
    Mung says:

    The Bladderwort and the Pufferfish

    Would it serve Pufferfish and Bladderwort chips?

  101.
    Joe says:

    wd400:

    Well, I still don’t understand it, Joe; sounds a bit like magic to me.

    So computer science is magic to you. Nice to know. Thanks.

  102.
    Joe says:

    Piotr:

    If the genome is a RAM, it means that biological processes can write to it and change its content.

    I didn’t say the genome was RAM.

    Do you realise what you are proposing?

    Absolutely. I don’t think you do though.

  103.
    Mung says:

    gpuccio:

    I am a design believer, as I believe you are. The point of interest for me is how the functional result originates, and I think I have the general answer in the design paradigm.

    Today I considered the possibility that I am a design believer because it is easier to argue on the internet than it is to get out and get my hands “dirty” in the real world.

    Hoping to allow the Spirit to water that little seed. Not that I’ll stop being a design believer! 🙂

    I believe in God the Creator of Heaven and Earth. To me creation implies design. Period. How could it not?

    gpuccio:

    Frankly, I am less interested in how the functional design becomes fixed, if we admit that it starts in a limited part of the population.

    Understood. But that’s still an interesting thing to think about, especially when considering scenarios offered by evolutionists, and I was responding to Piotr.

    To fix a particular allele in the population by natural selection requires that there be reproductive excess, as Haldane showed, as Walter ReMine helped clarify, and as Nei admits in his latest book Mutation-Driven Evolution.

    Piotr was claiming that he rejects the “drift” explanation as being “too slow” and reasons the fixation therefore “must have been due to natural selection.” That’s just plain faulty reasoning.

    Given all this recent talk of neutral theory and drift and natural selection and their roles in evolution I thought it pertinent.

    gpuccio:

    I suppose that NS can have a role in that. I am sure that drift would lose most of the designed traits in favor of useless variation. I don’t think that is a good way to “sell” designed things on the market!

    But that’s all the evolutionists have, lol! And that’s their “answer” to intelligent design.

    In the models of population genetics, most new “designs” get lost, regardless of whether they are beneficial or not.

    So given their models, there must be far more beneficial mutations being offered up by “the engines of variation” than they are willing to let on. Else evolution doesn’t stand a snowball’s chance in Venice in August.

    And of course, any scenario rife with beneficial mutations just begs for design.

    Enough for now.

    God bless!

  104.
    wd400 says:

    Joe,

    I understand comp. sci. pretty well, certainly enough to know it’s a terrible model for understanding biology. What I don’t understand is what you are talking about.

  105.
    Joe says:

    wd400- it sounds like you don’t understand computer science at all.

    Have you ever worked on a computer at the level of its integrated circuitry- ie the component level?

  106.
    gpuccio says:

    Piotr at #99:

    The issue about procedures is certainly the most important, so I will leave it for next post.

    Here, I will briefly answer two minor points:

    You say:

    Note that, since the fragment in question is derived from an mRNA, it doesn’t really increase the amount of “non-junk”. How do you extrapolate from one such case to a whole realm of hitherto unknown molecules? All the known types of functional RNAs, long as well as short, still add up to a few percent of the human genome (a pretty generous estimate).

    OK, but I never argued that it increased the “non junk” part. Dionisio linked the article, and I commented on it. I believe, however, that it can be considered indirectly related to the discussion. It is, at least, an interesting clue to how function can be implemented in the general scenario.

    As a side note: the RNA fragment in question is an 18-mer (and most miRNAs are about 22nt long). Since they are so short and there are so many of them, and they are functional, there must be a reasonable probability that a randomly generated short RNA sequence can be co-opted for a regulatory function. They don’t have to be designed by a bodiless intelligence.

    I am well aware that the shorter the molecule, the lower the functional complexity of the sequence. However, in this kind of regulatory network, the complexity is mainly of the whole system. IOWs, we are closer to a situation of irreducible complexity of the system, where the role of the short molecule is mainly that of a signal (or maybe of interference). Even if the molecule is rather simple, its role is functional only if integrated into a general regulation pattern.
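
    To put rough numbers on the point about short molecules, here is a minimal sketch. It assumes that each position can independently be any of the four bases, so a sequence of length n carries at most 2n bits. The function names are hypothetical, and the figure is only an upper bound on sequence complexity, not a measure of function.

```python
import math

def sequence_space(length, alphabet_size=4):
    """Number of distinct sequences of the given length (4 bases for RNA)."""
    return alphabet_size ** length

def max_bits(length, alphabet_size=4):
    """Upper bound, in bits, on the information in one fully specified sequence."""
    return length * math.log2(alphabet_size)

# 18 nt is the fragment discussed above; 22 nt is the typical miRNA length quoted.
for n in (18, 22):
    print(n, sequence_space(n), max_bits(n))
```

    Under these assumptions an 18-mer is one of about 6.9 × 10^10 possible sequences (36 bits at most), and a 22-mer one of about 1.8 × 10^13 (44 bits at most).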

    More in next post about the procedures.

  107.
    gpuccio says:

    Mung:

    Thank you for your interesting comments (and for the blessing!).

    I must confess that I do not have a clear idea about the possible role of NS in the fixation of designed functional elements. It could have one, and your comments about the “population genetics” problems are certainly interesting. Personally, I don’t think that most designed features are lost by drift. There must be some mechanism to rescue them, IMO. Design is too precious (and difficult) to waste it beyond some acceptable level.

    Part of it, however, could really be lost. And part of it, as I have said, can certainly fail even after its fixation and rather long existence (see Ediacara).

  108.
    gpuccio says:

    Piotr:

    You say:

    No explicit procedures are necessary. “Procedures” emerge from the interactions of regulatory elements. Thinking of DNA in terms of source code written by a human programmer is not very helpful.

    This is an important point, and one where your way of thinking is definitely different from mine.

    First of all, I don’t understand well what you mean by “explicit”. I am certainly not thinking of “source code written by a human programmer”. My only point is that the procedures must be there, and they must be working. And to be working, they must be implemented in some form, IOWs there must be functional information and functional complexity linked to them.

    I will try to be more clear. If, as a human programmer, I write some source code in C, and then I compile it, I have two different objects:

    a) The source code, which is typically in “human friendly” form (well, that is not always true for C!).

    b) The compiled code, which is the working form.

    Now, you can call a) explicit and b) implicit. I don’t know if that is what you mean. I only know that b) must be there, for the software to work. And b) requires digital information of some kind.
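
    The distinction between the two objects can be sketched in a few lines. This is only an analogy, and it assumes Python's built-in compile() rather than a C compiler: the source text is the human-readable form, and the code object produced from it is the form the interpreter actually executes.

```python
# a) Human-friendly form: plain text that a person can read and edit.
source = "result = 6 * 7"

# b) Working form: a code object holding the bytecode the interpreter runs.
compiled = compile(source, "<example>", "exec")

namespace = {}
exec(compiled, namespace)        # run the compiled form
print(type(source).__name__)     # str
print(type(compiled).__name__)   # code
print(len(compiled.co_code), "bytes of bytecode")
print(namespace["result"])
```

    Both objects describe the same computation, but only the second is what gets executed, and it has to be physically stored somewhere as a sequence of bytes.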

    Let’s take an example. From the one genome, we get in humans about 500 basic transcriptomes. That is the foundation of our multicellular form.

    How does that happen? Some algorithm must work, otherwise how could such different patterns arise from the same set of genes?

    You say that the procedures “emerge from the interactions of regulatory elements”. Which interactions? The regulatory elements are coded in the genome, and the genome is the same in all cells. So, what happens? How do different, complex patterns of regulation emerge from the same genome in different cells, at different times, and in definite functional order?

    It is certainly possible, but not without a lot of information. Information which generates the differences, guides them, makes algorithmic “decisions” (not in a conscious sense) and implements definite “strategies” (again, not in a conscious sense).

    If we accept, at least as a first hypothesis, that such information is in the genome, we must ask ourselves where it is, and in what form. It’s not enough to state that it is not “explicit”, or that it magically “emerges from the interactions of regulatory elements” (what in the world does that mean?) to evade that important problem.

    Information, in the form of digital sequences, requires bits, physical bits, to be implemented. What are those bits? How do they perform the task? Those are legitimate questions. Your answers are not answers.

    Just to conclude, in a despicable attack of narcissism, I quote myself:

    “There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.”

  109.
    Dionisio says:

    “There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.”

    “There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.”

    “There is nothing wrong in not understanding things. That is the source of inquiry. There is nothing worse than pretending that things not understood are understood.”

  110.
    Dionisio says:

    #108

    I am certainly not thinking of “source code written by a human programmer”.

    Humans have not created any information processing system comparable -in functionally creative power- just to the first few weeks of the human development process. Not even in our wildest imagination. Not even in theory. Not even close.

    Computer scientists and engineers are humbly fascinated and attracted by the biological systems described by researchers these days. That’s why we find electrical engineers and computer scientists working along with biologists on complex research projects at different universities.
    Perhaps we should approach this and other similar discussions with the same humility and fascination?

    Let’s think out of the box. But let’s think well. Let’s not rush into premature conclusions.

  111.
    Dionisio says:

    #106

    …in this kind of regulatory network, the complexity is mainly of the whole system.

    Yes! Exactly! Well stated. The functionality of the whole is not the sum of the functionality of the individual components.
    Timing and location are important factors that make a big difference.
    The individual components by themselves may not mean much if taken out of the context they are found in. This applies to any informational system.
    The particular physical or chemical properties of every individual component are used in the right way, at the right time, in the right location, in order to cause a desired effect, that combined with the actions of the other components, produce the designed system-wide result.
    Reductionist approaches don’t work well when analyzing complex systems.
    Simplistic reasoning can lead to ‘frog without legs’ stories. That’s pseudo-science voodoo.
    How long did it take to figure out that regulatory networks require controlling mechanisms that could be located anywhere in the system? Did we look everywhere from the beginning? Did we keep some areas of the system out of the investigation at the beginning? Did we discover things ‘by accident’ because we did not expect them to be where they are or to look as they do or to function as they do? Did we let preconceived ideas determine what parts of the systems should be investigated and how to interpret the results?
    We’d better watch out. We’d better not cry. Santa Claus may not be coming to town, but an overwhelming amount of data is coming out of research, and science must handle it well. New discoveries will shed more light on the complex biological systems, answering outstanding questions, but raising new ones.
    The party has just started. The fun part is still ahead 😉
    Let’s enjoy it!

  112.
    gpuccio says:

    Dionisio:

    Thank you for your comments. I appreciate them very much.

    And I obviously agree. The fun part is still ahead. 🙂

  113.
    Dionisio says:

    Has anyone ever been to a classical music concert or ballet performance in a theater? Assuming one doesn’t arrive late, one may hear the orchestra tuning their instruments. Does it sound good? No, it’s a terrible cacophony! However, when the same musicians play the same notes on the same instruments, but following their individual sheets under the direction of the conductor, then beautiful music fills the theater. What’s the difference?
    Timing, location, arrangement, synchronization, inspiration, composition, orchestration, are words that come to our minds. Why?
    Now, add to all that a ballet choreography and see the dancers appear on stage, move on the stage, leave the stage, while the orchestra plays a musical composition, specially written for the given scenario and choreography.
    Now things got more complicated. However, the same principles apply. All the components of such a complex event have their individual function(s) to perform. Some components react to external signals in order to start and perform their part, according to pre-established protocols, based on the original idea of the main composer and the main choreographer. Each dancer knows what to do in reaction to the surrounding environment.
    However, try playing Tchaikovsky’s ‘Swan Lake’ on loudspeakers placed near a lake where swans calmly swim and watch how they react to the loud sound of the famous musical composition they inspired. Compare to any video of that ballet, and see if there’s any similarity. Now, someone might conclude that the swans are deaf, like the frog without legs. Maybe they are, who knows? Hard to tell 😉

  114.
    gpuccio says:

    Dionisio:

    Are you comparing our revered interlocutors to swans?

    Could have been worse, after all. 🙂

  115.
    Timaeus says:

    rhampton7:

    Without noticing the date, I responded in depth to a question of yours on another thread. Wondering why you had not replied, I looked again at the thread, and see the date was 2011. My error. Anyhow, if you are interested in getting an answer to your old question (mine is the only one), you can find it at:

    http://www.uncommondescent.com.....-giberson/

    If you want to reply there, you are welcome, and I’ll read what you have to say.

  116.
    Dionisio says:

    Are you comparing our revered interlocutors to swans?

    No, swans are nice looking birds. I like the elegant way they swim, but like more how they fly over the lake. Also, they don’t argue stubbornly 😉

    The swan example was about illustrating the nature of functional information. The air pressure waves associated with the music sound are converted to electrical impulses that activate specialized parts of the brain that make us hear the music. However, different individuals may react differently to the music. Some will recognize a choreography associated with the music and will act accordingly, depending on the scenario context. Others won’t know what to do with it. Some will consider it an unpleasant noise. Is the functionality of any information partially revealed by the effects it causes on the system that contains it?

    Changing subject, check this out:

    http://mail.cell-press.com/go......F/xMU8OB6F

  117.
    Dionisio says:

    gpuccio,

    Info related to the spindle apparatus mechanisms operating on the intrinsic asymmetric mitosis:

    http://www.cell.com/developmen.....14)00129-4

  118.
    Dionisio says:

    #117

    This was implicit in the previous link, but here it is explicitly:

    http://www.cell.com/developmen.....14)00163-4

  119.
    gpuccio says:

    Dionisio:

    Really interesting! The complexity of the spindle dynamical control is mind blowing.

    But, obviously, it is based on tons of “simpler” complexity.

    Just as an example, tubulin molecules appear in eukaryotes, and they are highly conserved:

    Alpha Tubulin:

    Plasmodium falciparum: 453 AAs
    Human: 450 AAs

    Identities: 379/450(84%) Positives: 419/450(93%)

    On the contrary, centrosomin, the really important protein in this paper, is, in drosophila, a 1320 AAs long protein (!), which is scarcely conserved in other species.

    This is a beautiful example of important function which can be sequence conserved across species (as in tubulin) or definitely not conserved (as in centrosomin).

    Which was, I believe, one of the main points of discussion in this thread.
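
    The percentages in the tubulin comparison follow directly from the counts. Here is a minimal sketch, assuming the figures are the usual BLAST-style ratios of matching positions to alignment length (the function name is hypothetical):

```python
def percent(matches, aligned_length):
    """Share of aligned positions that match, as a percentage."""
    return 100.0 * matches / aligned_length

aligned = 450                      # alignment length reported above
identity = percent(379, aligned)   # identical residues
positives = percent(419, aligned)  # identical or similar residues

print(round(identity), round(positives))
```

    Rounded to whole numbers, this reproduces the 84% and 93% quoted for alpha tubulin.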

  120.
    Dionisio says:

    This is a beautiful example of important function which can be sequence conserved across species (as in tubulin) or definitely not conserved (as in centrosomin).

    Which was, I believe, one of the main points of discussion in this thread.

    Excellent conclusion! Many thanks!

    Glad to see that the amazing complexity of the spindle dynamical control serves as a good illustration for the interesting topic of function and sequence conservation that was discussed in this thread.

  121.
    Dionisio says:

    gpuccio,

    Did you see this report?

    http://www.biosciencetechnolog.....8;type=cta

  122.
    Dionisio says:

    In a summary of the effort, to be published today in the journal Nature, the team also reports the identification of 193 novel proteins that came from regions of the genome not predicted to code for proteins, suggesting that the human genome is more complex than previously thought. The cataloging project, led by researchers at The Johns Hopkins University and the Institute of Bioinformatics in Bangalore, India, should prove an important resource for biological research and medical diagnostics, according to the team’s leaders.

    http://www.biosciencetechnolog.....8;type=cta

  123.
    bornagain77 says:

    Dionisio, excellent find!

    Human Proteome Project Finds 193 Previously Unknown Proteins – 05/28/2014
    Excerpt: Striving for the protein equivalent of the Human Genome Project, an international team of researchers has created an initial catalog of the human “proteome,” or all of the proteins in the human body. In total, using 30 different human tissues, the team identified proteins encoded by 17,294 genes, which is about 84 percent of all of the genes in the human genome predicted to encode proteins.
    In a summary of the effort, to be published today in the journal Nature, the team also reports the identification of 193 novel proteins that came from regions of the genome not predicted to code for proteins, suggesting that the human genome is more complex than previously thought. ,,,
    “You can think of the human body as a huge library where each protein is a book,” said Akhilesh Pandey, ,,, “The difficulty is that we don’t have a comprehensive catalog that gives us the titles of the available books and where to find them. We think we now have a good first draft of that comprehensive catalog.”,,,
    The team’s most unexpected finding was that 193 of the proteins they identified could be traced back to these supposedly noncoding regions of DNA.
    “This was the most exciting part of this study, finding further complexities in the genome,” said Pandey. “The fact that 193 of the proteins came from DNA sequences predicted to be noncoding means that we don’t fully understand how cells read DNA, because clearly those sequences do code for proteins.”
    Pandey believes that the human proteome is so extensive and complex that researchers’ catalog of it will never be fully complete, but this work provides a solid foundation that others can reliably build upon.
    http://www.biosciencetechnolog.....8;type=cta

    Human Proteome Mapped – By Anna Azvolinsky | May 28, 2014
    Excerpt: Both studies identified evidence to suggest there is translation from DNA regions that were not thought to be translated—including more than 400 translated long, intergenic non-coding RNAs (lincRNAs)—found by the Küster team—and 193 new proteins—uncovered by the Pandey team.
    http://www.the-scientist.com/?.....me-Mapped/

  124.
    rhampton7 says:

    in this kind of regulatory network, the complexity is mainly of the whole system

    gpuccio,
    This is where your identification of intelligent design, I believe, may result in false-positives among different dog breeds if the 500 bit (250bp) threshold may be reached by the accumulation of many individual “regulatory function[s] of a short fragment[s] of mRNA” that comprise a regulatory network.

    It seems logical that most of the changes in the genotypes between breeds should be in regulatory regions, and that these changes could easily surpass 250bp, especially for older breeds.

  125.
    gpuccio says:

    Dionisio:

    I just found that wonderful source (on another thread), just to see that you had already pointed to it here. Thank you.

    The web resource is particularly interesting, allowing us to query for individual protein expression.

  126.
    gpuccio says:

    rhampton7:

    I think we have already discussed that in the past. The 500 bit threshold refers to the number of bits necessary to implement one function.
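
    The “500 bit (250bp)” equivalence quoted above can be checked with a short calculation. This is a minimal sketch that assumes every position is fully constrained and all symbols are equally likely, which real functional sequences are not. The amino acid line is added only for comparison and is not a figure from this thread; the names are hypothetical.

```python
import math

BITS_PER_NUCLEOTIDE = math.log2(4)   # 2.0 bits, four equiprobable bases
BITS_PER_AMINO_ACID = math.log2(20)  # about 4.32 bits, twenty equiprobable residues

def min_length_for(bits, bits_per_symbol):
    """Smallest number of fully specified positions that reaches the bit count."""
    return math.ceil(bits / bits_per_symbol)

threshold = 500
print(min_length_for(threshold, BITS_PER_NUCLEOTIDE))  # nucleotides
print(min_length_for(threshold, BITS_PER_AMINO_ACID))  # amino acids
```

    Under these assumptions the threshold corresponds to 250 nucleotides, or about 116 amino acids, all of which would have to contribute to one function.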

    Individual regulatory functions do not accumulate by chance to give a complex metafunction, any more than individual AA mutations accumulate to build a complex functional protein.

    In my identification of intelligent design, which IMO is in no way different from the general identification of intelligent design, it is the specified (functional) complexity that counts, not the accumulation of simpler, independent traits.

    A specific sequence of AAs is as rare as a specific accumulation of regulatory elements which contribute to a general, specific function which in no way can be expressed in a simpler way.

    Dog breeds may differ by an accumulation of simpler regulatory differences. I have no problem with that. In that case, there is no reason to infer design.

    So, if a dog breed differs from another one only because of different sizes of individual body parts, or the color of this or that part, and so on, maybe those differences are generated by random variation (or human selection of random variation, in the case of dogs).

    But if a dog breed has a completely different enzyme chain, which is not present in the genome of another breed, the origin of that enzyme chain is design. Obviously, that part of genome could already exist and simply have been transferred by HGT or something like that, but the origin of the complex functional molecules is design.

    In the same way, even if you have a complex cascade of relatively simpler elements (such as short peptides) which interact specifically to give a complex sequence of regulatory functions, and the whole system is irreducibly complex, then you can infer design for the whole system, even if some of the individual components are not complex enough for the inference.
