Uncommon Descent Serving The Intelligent Design Community

The amazing level of engineering in the transition to the vertebrate proteome: a global analysis


As a follow-up to my previous post:

I am presenting here some results obtained by a general application, expanded to the whole human proteome, of the procedure already introduced in that post.

Main assumptions.

The aim of the procedure is to measure a well defined equivalent of functional information in proteins: the information that is conserved throughout long evolutionary times, in a well specified evolutionary line.

The simple assumption is that such information, which is not modified by neutral variation in a time span of hundreds of millions of years, is certainly highly functionally constrained, and is therefore a very good empirical approximation of the value of functional information in a protein.

In particular, I will use the proteins in the human proteome as “probes” to measure the information that is conserved from different evolutionary timepoints.

The assumption here is very simple. Let’s say that the line that includes humans (let’s call it A) splits from some different line (let’s call it B) at some evolutionary timepoint T. Then, the homology that we observe in a protein when we compare organisms derived from B and humans (derived from A) must have survived neutral variation throughout the timespan from T to now. If the timespan is long enough, we can very safely assume that the measured homology is a measure of some specific functional information conserved from the time of the split to now.

Procedure.

I downloaded a list of the basic human proteome (in FASTA form). In particular, I downloaded it from UniProt, selecting all human reviewed sequences, for a total of 20171 sequences. That is a good approximation of the basic human proteome as known at present.

I used NCBI’s BLAST tool, run locally, to blast the whole human proteome against known protein sequences from specific groups of organisms, using the nr (non redundant) NCBI database of protein sequences, and selecting, for each human protein, the alignment with the highest homology bitscore from that group of organisms.
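The selection step described above can be sketched as follows. This is only a minimal illustration, assuming the BLAST results were saved in the standard tabular format (-outfmt 6) with its default twelve columns; the function name best_hits and the sample identifiers and scores are hypothetical and are not taken from the original analysis.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query_id: (subject_id, bitscore)}, keeping the top bitscore per query."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip empty lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(COLUMNS, row))
        score = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (rec["sseqid"], score)
    return best


# Hypothetical example: two human proteins (P1, P2) against made-up subject sequences.
example = (
    "P1\tshark_a\t55.0\t300\t100\t5\t1\t300\t1\t298\t1e-80\t250.0\n"
    "P1\tshark_b\t70.0\t310\t80\t2\t1\t310\t1\t310\t1e-120\t376.0\n"
    "P2\tshark_c\t30.0\t90\t60\t3\t10\t99\t5\t94\t2e-05\t21.6\n"
)
hits = best_hits(example)
```

Human proteins with no significant alignment simply do not appear in the resulting dictionary.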

Homology values:

I have used two different measures of homology for each protein alignment:

  1. The total bitscore from the BLAST alignment (from now on: “bits”)
  2. The ratio of the total bitscore to the length in aminoacids of the human protein, which I have called “bits per aminoacid” (from now on, “baa”). This is a measure of the mean “density” of functional information in that protein, which corrects for the protein length.
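As a small illustration of the second measure, here is a sketch of the baa computation. The function name and the example protein (400 aminoacids, 376 bits) are hypothetical values chosen only to show the arithmetic.

```python
def bits_per_aminoacid(bitscore, protein_length):
    """Total bitscore divided by the length (in aminoacids) of the human protein."""
    if protein_length <= 0:
        raise ValueError("protein length must be positive")
    return bitscore / protein_length


# Hypothetical human protein: 400 aminoacids, best-hit bitscore of 376 bits.
baa = bits_per_aminoacid(376.0, 400)
```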

The values of homology in bits have a very wide range of variation in each specific comparison with a group of organisms. For example, in the comparison between human proteins and the proteins in cartilaginous fish, the range of bit homology per protein is 21.6 – 34368, with a mean of 541.4 and a median of 376 bits.

The values of homology in baa, instead, are necessarily confined between 0 and about 2.2. Indeed, 2.2 is (approximately) the highest homology bitscore (per aminoacid) that we get when we blast a protein against itself (total identity). I use the BLAST bitscore because it is a widely used and accepted way to measure homology and to derive probabilities from it (the E values).

So, for example, in the same human – cartilaginous fish comparison, the range of the baa values is 0.012 – 2.126, with a mean of 0.95 and a median of 0.97 baa.

For each comparison, a small number of proteins (usually about 1-2%) did not result in any significant alignment, and were not included in the specific analysis for that comparison.

Organism categories and split times:

The analysis includes the following groups of organisms:

  • Cnidaria
  • Cephalopoda (as a representative sample of Mollusca, and more generally of Protostomia: Cephalopoda, and Mollusca in general, are among the Protostomia with the highest homology to Deuterostomia, and can therefore be a good sample to evaluate conservation from the Protostomia – Deuterostomia split).
  • Deuterostomia (excluding vertebrates): this includes echinoderms, hemichordates and chordates (excluding vertebrates).
  • Cartilaginous fish
  • Bony fish
  • Amphibians
  • Crocodylia, including crocodiles and alligators (as a representative sample of reptiles, excluding birds. Here again, Crocodylia usually have the highest homology with human proteins among reptiles, together maybe with turtles).
  • Marsupials (an infraclass of mammals representing Metatheria, a clade which split early enough from the human lineage)
  • Afrotheria, including elephants and other groups (representing a group of mammals relatively distant from the human lineage, in the Eutheria clade)

There are reasons for these choices, but I will not discuss them in detail for the moment. The main purpose is always to detect the functional information (in the form of homology) that was present at specific split times, and has therefore been conserved in both lines after the split. In a couple of cases (Protostomia, Reptiles), I have used a smaller group (Cephalopoda, Crocodylia) which could reasonably represent the wider group, because using very big groups of sequences (like all Protostomia, for example) was too time-consuming for my resources.

So what are the split times we are considering? This is a very difficult question, because split times are not well known, and very often you can get very different values for them from different sources. Moreover, I am not at all an expert on these issues.

So, the best I can do is to give here some reasonable proposal, from what I have found, but I am completely open to any suggestions to improve my judgements. In each split, humans derive from the second line:

  • Cnidaria – Bilateria. Let’s say at least 555 My ago.
  • Protostomia – Deuterostomia. Let’s say about 530 My ago.
  • Pre-vertebrate Deuterostomia (including chordates like Cephalochordata and Tunicates) – Vertebrates (Cartilaginous fish). Let’s say 440 My ago.
  • Cartilaginous fish – Bony fish. Let’s say about 410 My ago.
  • Bony fish – Tetrapods (Amphibians). Let’s say 370 My ago, more or less.
  • Amphibians – Amniota (Sauropsida, Crocodylia): about 340 My ago
  • Sauropsida (Crocodylia) – Synapsida (Metatheria, Marsupialia): about 310 My ago
  • Metatheria – Eutheria (Afrotheria): about 150 My ago
  • Atlantogenata (Afrotheria) – Boreoeutheria: probably about 100 My ago.

The simple rule is: for each split, the second member of each split is the line to humans, and the human conserved information present in the first member of each couple must have been conserved in both lines at least from the time of the split to present day.

So, for example, the human-conserved information in Cnidaria has been conserved for at least 555 My, the human-conserved information in Crocodylia has been conserved for at least 310 My, and so on.
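The rule can be written down as a simple lookup. The sketch below only restates the approximate split times proposed in the list above; the names SPLITS and minimum_conservation_my are hypothetical, and the group labels are shortened for illustration.

```python
# (first line, line leading to humans, approximate split time in My ago),
# copied from the list of splits proposed in the text.
SPLITS = [
    ("Cnidaria", "Bilateria", 555),
    ("Protostomia", "Deuterostomia", 530),
    ("Pre-vertebrate deuterostomia", "Vertebrates", 440),
    ("Cartilaginous fish", "Bony fish", 410),
    ("Bony fish", "Tetrapods", 370),
    ("Amphibians", "Amniota", 340),
    ("Sauropsida", "Synapsida", 310),
    ("Metatheria", "Eutheria", 150),
    ("Atlantogenata", "Boreoeutheria", 100),
]


def minimum_conservation_my(group):
    """Minimum time (My) for which human-conserved information found in `group` was conserved."""
    for first, _second, my_ago in SPLITS:
        if first == group:
            return my_ago
    raise KeyError(group)


cnidaria_my = minimum_conservation_my("Cnidaria")
sauropsida_my = minimum_conservation_my("Sauropsida")  # represented here by Crocodylia
```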

The problem of redundancy (repeated information).

However, there is an important problem that requires attention. Not all the information in the human proteome is unique, in the sense of “present only once”. Many sequences, especially domains, are repeated many times, in a more or less similar way, in many different proteins. Let’s call this “the problem of redundancy”.

So, all the results that we obtain about homologies of the human proteome to some other organism or group of organisms should be corrected for that factor, if we want to draw conclusions about the real amount of new functional information in a transition. Of course, repeated information will inflate the apparent amount of new functional information.

Therefore, I computed a “coefficient of correction for redundancy” for each protein in the human proteome. For the moment, for the sake of simplicity, I will not go into the details of that computation, but I am ready to discuss it in depth if anyone is interested.

The interesting result is that the mean coefficient of correction is, according to my computations, 0.497. IOWs, we can say that about half of the potential information present in the human proteome can be considered unique, while about half can be considered as repeated information. This correction takes into account, for each protein in the human proteome, the number of proteins in the human proteome that have significant homologies to that protein and their mean homology.

So, when I give the results “corrected for redundancy”, what I mean is that the homology values for each protein have been corrected by multiplying them by the coefficient of that specific protein. Of course, in general, the results will be approximately halved.
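The correction itself is a per-protein multiplication. Here is a minimal sketch of that step only: it does not reproduce how the coefficients were computed (that computation is not detailed in this post), and the protein names, homology values and coefficients below are hypothetical.

```python
def corrected_bits(homology_bits, coefficients):
    """Multiply each protein's homology (in bits) by that protein's own coefficient."""
    return {pid: bits * coefficients[pid] for pid, bits in homology_bits.items()}


# Hypothetical values: protA is unique (coefficient 1.0),
# protB shares most of its information with other human proteins (0.25).
homology = {"protA": 400.0, "protB": 1000.0}
coefficient = {"protA": 1.0, "protB": 0.25}

corrected = corrected_bits(homology, coefficient)
total_corrected = sum(corrected.values())
```

Because the coefficient differs from protein to protein, the corrected total is not simply the uncorrected total multiplied by the mean coefficient.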

Results

Table 1 shows the means of the values of total homology (bitscore) with human proteins in bits and in bits per aminoacid for the various groups of organisms.

 

| Group of organisms | Homology bitscore (mean) | Total homology bitscore | Bits per aminoacid (mean) |
|---|---|---|---|
| Cnidaria | 276.9 | 5465491 | 0.543 |
| Cephalopoda | 275.6 | 5324040 | 0.530 |
| Deuterostomia (non vertebrates) | 357.6 | 7041769 | 0.671 |
| Cartilaginous fish | 541.4 | 10773387 | 0.949 |
| Bony fish | 601.5 | 11853443 | 1.064 |
| Amphibians | 630.4 | 12479403 | 1.107 |
| Crocodylia | 706.2 | 13910052 | 1.217 |
| Marsupialia | 777.5 | 15515530 | 1.354 |
| Afrotheria | 936.2 | 18751656 | 1.629 |
| Maximum possible value (for identity) | | 24905793 | 2.2 |

 

Figure 1 shows a plot of the mean bits-per-aminoacid score in the various groups of organisms, according to the mentioned approximate times of split.

Figure 2 shows a plot of the density distribution of human-conserved functional information in the various groups of organisms.

 

 

 

The jump to vertebrates.

Now, let’s see how big the informational jumps are for each split, always in relation to human-conserved information.

The following table sums up the size of each jump:

 

 

 

 

| Split | Homology bitscore jump (mean) | Total homology bitscore jump | Bits per aminoacid jump (mean) |
|---|---|---|---|
| Homology bits in Cnidaria | | 5465491 | 0.54 |
| Cnidaria – Bilateria (Cephalopoda) | -6.3 | -121252 | -0.02 |
| Protostomia (Cephalopoda) – Deuterostomia | 87.9 | 1685550 | 0.15 |
| Deuterostomia (non vert.) – Vertebrates (Cartilaginous fish) | 189.6 | 3708977 | 0.29 |
| Cartilaginous fish – Bony fish | 54.9 | 1073964 | 0.11 |
| Bony fish – Tetrapoda (Amphibians) | 31.9 | 624344 | 0.05 |
| Amphibians – Amniota (Crocodylia) | 73.3 | 1430963 | 0.11 |
| Sauropsida (Crocodylia) – Synapsida (Marsupialia) | 80.8 | 1585361 | 0.15 |
| Metatheria (Marsupialia) – Eutheria (Afrotheria) | 162.2 | 3226932 | 0.28 |
| Total bits of homology in Afrotheria | | 18751656 | 1.63 |
| Total bits of maximum information in humans | | 24905793 | 2.20 |

 

The same jumps are shown graphically in Figure 3:

 

As everyone can see, each of these splits, except the first one (Cnidaria-Bilateria), is characterized by a very relevant informational jump in terms of human-conserved information. The jump is in general of the order of 0.5 – 1.5 million bits.

However, two splits are characterized by a much bigger jump: the prevertebrate-vertebrate split reaches 3.7 million bits, while the Metatheria-Eutheria split is very near, with 3.2 million bits.
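The two outliers can be picked out directly from the total-bitscore column of the jump table. The sketch below just copies those totals; the 3 million bit threshold and the variable names are assumptions made for illustration, not part of the original analysis.

```python
# Total homology bitscore jump per split, copied from the jump table.
JUMPS = {
    "Cnidaria - Bilateria": -121252,
    "Protostomia - Deuterostomia": 1685550,
    "Deuterostomia (non vert.) - Vertebrates": 3708977,
    "Cartilaginous fish - Bony fish": 1073964,
    "Bony fish - Tetrapoda": 624344,
    "Amphibians - Amniota": 1430963,
    "Sauropsida - Synapsida": 1585361,
    "Metatheria - Eutheria": 3226932,
}

# Illustrative threshold separating the two largest jumps from the rest.
THRESHOLD = 3_000_000

big_jumps = [name for name, bits in JUMPS.items() if bits > THRESHOLD]
ordinary = [bits for bits in JUMPS.values() if 0 < bits <= THRESHOLD]
smallest_ordinary, largest_ordinary = min(ordinary), max(ordinary)
```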

For the moment I will discuss only the prevertebrate-vertebrate jump.

This is where a great part of the functional information present in humans seems to have been generated: 3.7 million bits, and about 0.29 bits per aminoacid of new functional information.

Let’s see that jump also in terms of information density, looking again at Figure 2, but only with the first 4 groups of organisms:

 

Where is the jump here?

 

We can see that the density distribution is almost identical for Cnidaria and Cephalopoda. Deuterostomia (non vertebrates) have a definite gain in human-conserved information: as we know, it is about 1.68 million bits, and it corresponds to the grey area (and, obviously, to the lower peak of low-homology proteins).

But the real big jump is in vertebrates (cartilaginous fish). The pink area and the lower peak in the low homology zone correspond to the amazing acquisition of about 3.7 million bits of human-conserved functional information.

That means that a significant percentage of proteins in cartilaginous fish had a high homology, higher than 1 bit per aminoacid, with the corresponding human protein. Indeed, that is true for 9574 proteins out of 19898, 48.12% of the proteome. For comparison, these high homology proteins are “only” 4459 out of 19689, 22.65% of the proteome, in pre-vertebrates.

So, in the transition from pre-vertebrates to vertebrates, the following amazing events took place:

  • About 3.7 million bits of human-conserved functional information were generated
  • A mean increase of about 190 bits per protein of that information took place
  • The number of high human homology proteins more than doubled
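The percentages quoted above, and the "more than doubled" statement, can be checked with a few lines of arithmetic. The counts are those given in the text; the variable names are only illustrative.

```python
# Proteins with more than 1 bit per aminoacid of homology to the human protein.
vert_high, vert_total = 9574, 19898   # cartilaginous fish
prev_high, prev_total = 4459, 19689   # pre-vertebrate deuterostomes

vert_pct = 100 * vert_high / vert_total
prev_pct = 100 * prev_high / prev_total

# Ratio between the two counts of high-homology proteins.
ratio = vert_high / prev_high

print(f"{vert_pct:.2f}% vs {prev_pct:.2f}%, ratio {ratio:.2f}")
```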

Correcting for redundancy

However, we must still correct for redundancy if we want to know how much really new functional information was generated in the transition to vertebrates. As I have explained, we should expect that about half of the total information can be considered unique information.

Making the correction for each single protein, the final result is that the total number of new unique functional bits that appear for the first time in the transition to vertebrates, and are then conserved up to humans, is:

1,764,427  bits

IOWs, more than 1.7 million bits of unique new human-conserved functional information are generated in the proteome with the transition to vertebrates.
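As a rough cross-check, one can compare this per-protein result with what the mean coefficient alone would suggest. The sketch below is only that comparison, using the figures given in the text; the two numbers are not expected to coincide, because the correction is applied protein by protein and not to the total.

```python
total_jump_bits = 3_708_977       # uncorrected jump at the transition to vertebrates
mean_coefficient = 0.497          # mean coefficient of correction for redundancy
reported_unique_bits = 1_764_427  # per-protein corrected result reported in the text

# Naive estimate: apply the mean coefficient to the total jump.
naive_estimate = total_jump_bits * mean_coefficient

# Relative difference between the naive estimate and the reported value.
relative_gap = (naive_estimate - reported_unique_bits) / reported_unique_bits

print(f"naive estimate: {naive_estimate:,.0f} bits, gap: {relative_gap:.1%}")
```

The naive estimate lands in the same range as the reported value, consistent with the expectation that roughly half of the information is unique.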

But what does 1.7 million bits really mean?

I would like to remind readers that we are dealing with exponential values here. A functional complexity of 1.7 million bits means a probability (in a random search) of:

1:2^1.7 million

A quite amazing number indeed!

Just remember that Dembski’s Universal Probability Bound is 500 bits, a complexity of 2^500. Our number (2^1764427) is so much bigger that the UPB seems almost a joke, in comparison.
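To get a feeling for the sizes involved, both exponents can be converted from powers of 2 to powers of 10 by multiplying the number of bits by log10(2). The sketch below does only this conversion; the function name is illustrative.

```python
import math


def decimal_exponent(bits):
    """Exponent x such that 2**bits == 10**x."""
    return bits * math.log10(2)


upb_exponent = decimal_exponent(500)               # Universal Probability Bound
vertebrate_exponent = decimal_exponent(1_764_427)  # corrected jump to vertebrates

print(f"2^500     is about 10^{upb_exponent:.1f}")
print(f"2^1764427 is about 10^{vertebrate_exponent:.0f}")
```

Working with the exponents avoids building the huge integers themselves.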

Moreover, this huge modification in the proteome seems to be strongly constrained and definitely necessary for the new vertebrate bodily system, so much so that it is conserved for hundreds of millions of years after its appearance.

Well, that is enough for the moment. The analysis tools I have presented here can be used for many other interesting purposes, for example to compare the evolutionary history of proteins or groups of proteins. But that will probably be the object of further posts.

Comments
Apparently when the discussion gets really scientific the politely dissenting interlocutors take time off? Only 4 politely dissenting comments out of 120?
@16 answered by GP @18
@23 answered by GP @25
@28 answered by GP @29
@30 answered by GP @32 (also answered by Origenes @34)
Is that all? Did they run out of valid arguments? Did they ever have any? :) Since this discussion was so interesting, it proves once more that the politely dissenting interlocutors may not add value to the threads?
Dionisio
March 26, 2017, 08:32 PM PDT
Upright BiPed @5:
I expect you may get caught up in cross-platform comments [...]
Dionisio @100
Did that happen yet?
gpuccio @101:
Apparently not.
Would it matter?
Dionisio
March 26, 2017, 08:11 PM PDT
gpuccio, Yes, I see it now. I missed it because I didn't scroll back to the earlier posts. My mistake. Yes, that was nice of timothya to write objections and ask for clarifications in such a polite manner. Maybe other antagonists will notice it and do similarly?
Dionisio
March 26, 2017, 02:08 PM PDT
Dionisio: He made a generic objection at #16, which I answered at #18. Then, let's say that he "asked for clarifications" about some little misunderstanding of the cladistics at #23, which I answered at #25, and again at #28, which I answered at #29. After that he kept his peace. Not much, I agree, but there was nothing else (from our kind antagonists). So, thank you, timothya!
gpuccio
March 26, 2017, 12:39 PM PDT
gpuccio @115: where did you see that the politely dissenting interlocutor wrote about your article? I don't see it. What's the post #? BTW, the number of anonymous visits is over 7 times greater than the number of posted comments.
Dionisio
March 26, 2017, 12:21 PM PDT
Origenes
we find that not only it is not true (Hayashi), but, moreover, the kind of function Swamidass is talking about is utterly irrelevant. What Swamidass needs to show is that fitting, complete and well integrated functions are ‘very close to one another and abundant’.
This is not consistent with what we see in cells, especially in the cell nucleus. Proteins with unique binding capability are a requirement: without them, the transcriptional pathways that regulate cell function would not be possible. If a JNK pathway protein starts to bind with a KRAS pathway protein, the cell malfunctions. The design of proteins is based on a sequence of amino acids defined by the 4 nucleotides of DNA. This is a sequence (the largest type of mathematical space) and it is possible to generate the almost infinite diversity which is required for functioning living cells. Finding function through a random search, not so much :-)
bill cole
March 26, 2017, 11:31 AM PDT
Friends: I suppose that I should really thank timothya. Apparently, he is the only one that has tried to criticize my post. As they say: "No critics, no party!" Well, thank you, timothya. Seriously, and from my heart.
gpuccio
March 26, 2017, 10:04 AM PDT
Eric Anderson @109:
There is this incredibly pervasive and naive concept in biology, in particular in the materialist strain of thought, that once you get some proteins floating around in the cell things just automatically come together by force of chemistry and physics to form wonderful, complex, functional systems. We’ve even had a few interlocutors on these pages argue vehemently for such a claim.
Yes, those are the same folks that would invest all their money to buy oceanfront apartments in the middle of Kansas or Siberia without questioning it. :)
Dionisio
March 26, 2017, 09:51 AM PDT
Clarification for the comments @90 & 92: Note that every Delta(x) must include all the developmental changes to the associated GRNs, signaling pathways and cascades, morphogen gradient formation and interpretation mechanisms, epigenome, epitranscriptome, etc. required to transform Dev(ca) to Dev(x), where x={d1,d2} according to every particular case. Once Dev(ca) and Dev(x) are precisely described, then it shouldn't be a major issue to determine the corresponding Delta(x) for every descendant.
Dionisio
March 26, 2017, 09:39 AM PDT
Origenes: "we find that not only it is not true (Hayashi), but, moreover, the kind of function Swamidass is talking about is utterly irrelevant. What Swamidass needs to show is that fitting, complete and well integrated functions are ‘very close to one another and abundant’." OK, let's wait for him to show that. But I am not holding my breath! :)
gpuccio
March 26, 2017, 09:31 AM PDT
Eric Anderson: "Yes, even something as “simple” as where your tooth grows and what size it is, or where your nose forms and how it is shaped." Absolutely: The origin of form still remains one of the great mysteries out there. Morphogens are certainly part of the answer, but just a small part. And the epigenetic approach is simply confirming what we already knew: that we really don't understand how form, from whole body plans to the minuscule details, originates. I would like, in that regard, to quote the title of a book by one of the pioneers of ID thought, the great Italian biologist Giuseppe Sermonti: "Dimenticare Darwin. Perché la mosca non è un cavallo." (Forgetting Darwin. Why a fly is not a horse.)
gpuccio
March 26, 2017, 09:29 AM PDT
GPuccio: ... it [natural selection] cannot act on “any possible function”, but only on an extremely restricted subset of functions, and even that only when the function is completely there, and well integrated in the context.
When we look at Swamidass' statement:
".... the pervasive observation of multifunctional proteins suggests that functions are actually very close to one another and abundant."
we find that not only it is not true (Hayashi), but, moreover, the kind of function Swamidass is talking about is utterly irrelevant. What Swamidass needs to show is that fitting, complete and well integrated functions are 'very close to one another and abundant'.
Origenes
March 26, 2017, 09:23 AM PDT
gpuccio @103:
Just think: if all that complexity of process is implied to just get different sizes in two similar teeth . . .
This is precisely one of the examples I was thinking about when I said @69:
Finally, there are thousands upon thousands of biological functions and processes that we know exist and for which clear genetic instructions have not yet been identified — vastly more in fact, than what has thus far been identified. It follows as a matter of logic that either (a) a significant amount of allegedly-non-functional DNA will turn out to have function, or (b) the known functional DNA will turn out to have many more layers of multiple-functionality than currently thought. Or a combination of the two.
I'm glad to see someone has looked into the teeth at least.

I like to give a couple of examples when people question how much there is yet to learn, two examples that they can quickly grasp from a personal, individual level: your teeth and your nose. There is this incredibly pervasive and naive concept in biology, in particular in the materialist strain of thought, that once you get some proteins floating around in the cell things just automatically come together by force of chemistry and physics to form wonderful, complex, functional systems. We've even had a few interlocutors on these pages argue vehemently for such a claim. It is simply false. Blatantly and obviously so. But we have to stop and think through the details in order to realize it.

Here is the reality, an ongoing prediction from the design perspective: We will find that scarcely anything important happens in the cell by dint of pure chemistry and physics. Anything useful or important that we want to have occur must be carefully orchestrated, monitored, moderated, controlled. Yes, even something as "simple" as where your tooth grows and what size it is, or where your nose forms and how it is shaped.
Eric Anderson
March 26, 2017, 09:16 AM PDT
Origenes: The point that most functions, which could in principle be useful in some context, will be useless or detrimental in a specific complex system, like a specific cell of a specific organism, is a very important one. It is a very strong counter-argument to the common pseudo-argument often made by darwinists, that NS is really special because it can select "any possible function". The "any possible function" argument is, indeed, completely wrong. NS has extremely strong restrictions, and the main ones are:
1) It can select only for those functions that are useful in that specific context: cell, organism, epigenetic scenario, and so on.
2) The new function, moreover, must not only be potentially useful, but also well integrated in the context, so that it is really, practically useful. As you have so well emphasized.
3) Finally, the new function must not only be useful in a general sense: it must be able to confer a detectable reproductive advantage, so that NS can act on it and fix it by negative and positive selection. This is very important, because many seem to forget it: any function, even if useful, is invisible to NS if it does not confer a detectable reproductive advantage, and in many cases even a detectable reproductive advantage will not be enough to fix the new trait.
And it is useful to remember that NS can act only when the new function is completely there, ready to work to confer the reproductive advantage. All the steps that precede that state are completely invisible to NS. Until such a function appears, all the rest is a random walk, and nothing else. With all these restrictions, we can safely state that Natural Selection is really powerless when functional complexity is implied: it cannot act on "any possible function", but only on an extremely restricted subset of functions, and even that only when the function is completely there, and well integrated in the context.
While I am very sure that the protein functional state is not dense in connected functions, I am of course even more sure, beyond any doubt, that it is not dense in naturally selectable, connected functions.
gpuccio
March 26, 2017, 08:09 AM PDT
GPuccio: Sometimes darwinists seem to forget that not any function is good in a cell environment. Many “functions” are simply useless, or detrimental.
Excellent crucial point. Suppose a DVD copy machine which produces a few random copy-errors every time it makes a copy. Suppose this machine makes 10 copies of a newly purchased functional Windows 7 DVD. Now remove this original DVD and repeat the imperfect copy process starting with functional second generation “mutated” copies (dysfunctional copies are removed from the process after testing). Next remove all second generation copies and repeat the copy process starting with functional third generation mutated copies. And so forth.

Question: who of us would expect this imperfect copy process to be anything other than the degeneration of Windows 7 eventually leading to mutated copies which are, without exception, dysfunctional? And finally, if by sheer dumb luck random copy-errors produce some new functional code, who of us expects this new functionality to fit in? If some new activity stems from code produced by random errors, who would be optimistic? IOWs who would expect this new functionality to be integrated in Windows 7? Who of us expects versions of Windows 7 with improved functionality?
Origenes
March 26, 2017, 07:04 AM PDT
gpuccio @104:
This is very interesting. 83 genes that are expressed in a strongly different way would seem more than enough to explain the difference between upper and lower molar in the mouse. And yet, they are not the answer. Part of the answer are a lot of minor differences in expression in a lot of other genes, differences that only a global statistical approach like Principal Component Analysis can reveal.
I like the comprehensive way you have summarized it.
Epigenetic regulation is like that: a general, pervasive, extremely balanced program for each different situation.
That's a very interesting way to describe it in few words.
A miracle? From a programmer’s point of view, I would definitely say yes!
Fully agree! Thanks.
Dionisio
March 26, 2017, 06:21 AM PDT
gpuccio @103:
[...] if all that complexity of process is implied to just get different sizes in two similar teeth, what can we expect for more complex differentiation processes at phenotypic level?
That's a very logical observation and question.
The methodology of this paper is very interesting and original: trying to link phenotypic differences to the whole epigenetic scenario is indeed a very correct way to understand what really happens.
I see your valid point. Agree.
[...] a real understanding of what drives and controls these complex processes is still lacking…
Yes, perhaps that's why we see the 'junk of the gaps' and the Neo-Darwinian 'just so' fairytale propaganda still going unchallenged in many academic circles. However, with the accelerated improvement of cellular & molecular visualization technology and the refining of the computer-based modeling techniques used by dedicated interdisciplinary research teams, we should see a growing avalanche of discoveries coming out of the numerous wet and dry labs, thus shedding light on the big picture we all are so eager to understand. Obviously, as some outstanding questions get answered, new ones will be posed. But with every discovery the intelligent design paradigm will get strengthened further while Neo-Darwinism will continue its fall into the trash bin where it belongs, remaining solely as a shameful historic reminder of what can happen when science is not done honestly, humbly, with open-mindedness, thinking out of wrongly preconceived boxes, avoiding gross extrapolations and misinterpretations of the available evidence. We're told to test everything and hold only what is good. In this thread you showed how to do serious, honest, thorough research. I'm sure many readers here, including the anonymous onlookers/lurkers (7 times more than the number of posted comments), really appreciate it.
Dionisio
March 26, 2017, 06:09 AM PDT
Dionisio:
We first wondered whether this pattern could be caused by specific or strongly biased genes that would mark a clear lower versus upper molar identity, as identified for early jaws [50]. A single gene, Nkx2-3, was specifically expressed in the lower molar and we found no gene specific for the upper molar, although the top upper gene, Pou3f3, was about ten times more expressed in the upper molar. In fact, when taking all samples from each kind of tooth in the large-scale dataset as replicates (so in total, eight replicates for upper and eight for lower molar), we found 1347 genes (out of 14,808) differentially expressed (“upper/lower DE genes,” adjusted P value < 0.1), out of which only 83 show more than a twofold excess difference (see Additional file 2: Table S1). This included genes known for their role in molar or jaw specification (like Dlx5 and 6, Pou3f3, and two associated non-coding RNAs (ncRNA), 2900092D14Rik, 2610017I09Rik [50], Pitx1 [43]). We concluded that there were relatively few genes that were consistently biased with a fold change over 2 throughout stages. On a developmental point of view, however, these consistently biased genes were possibly sufficient to provide and sustain different orientations for upper and lower molar development. Are those consistently biased genes sufficient to explain the genomic signature that separates the upper and the lower tooth? To answer this question, we looked how far this genomic signature would resist the removal of differentially expressed genes. Removing the 83 above-mentioned genes had a marginal effect on the second axis of PCA (11.4% of variation explained instead of 12.9% with the 83 genes).
In fact, the second axis of the PCA still separated the upper and lower samples and represented a significant amount of the total variation, even when all differentially expressed genes were removed: after removing 1347 DE genes that were found when the eight stages are taken as replicates, the axis that splits upper and lower tooth represented 9.3% of the total variation. Upper/lower DE genes can also be estimated taking time into account (DESeq2, adjusted P value < 0.1), which is less stringent and resulted in 3155 DE genes: after removing these genes, the axis that splits the upper and the lower tooth represented 7.8% of the total variation. We concluded that the upper/lower transcriptomic signature was not only carried by sets of genes that are moderately to strongly biased throughout the developmental period, but also by more subtle gene expression differences in a very large number of genes.
This is very interesting. The 83 genes that are expressed in a strongly different way would seem more than enough to explain the difference between upper and lower molar in the mouse. And yet, they are not the whole answer. Part of the answer lies in many minor differences in expression across a large number of other genes, differences that only a global statistical approach like Principal Component Analysis can reveal. Epigenetic regulation is like that: a general, pervasive, extremely balanced program for each different situation. A miracle? From a programmer's point of view, I would definitely say yes! :)
gpuccio
March 26, 2017, 03:04 AM PDT
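For readers who want to see the logic of that test in miniature, here is a rough sketch on synthetic data. It is not the paper's pipeline: the gene counts, effect sizes and noise level are invented for illustration, the helper `separating_axis` is hypothetical, and a plain ranking by mean upper/lower difference stands in for the DESeq2 analysis the authors used. It only shows how a PCA axis can keep separating two groups after the strongly biased genes are removed, as long as many genes carry a small bias.

```python
import numpy as np

def separating_axis(expr, labels):
    """Return (index, variance fraction, |correlation|) of the PC that best tracks labels."""
    centered = expr - expr.mean(axis=0)            # center each gene across samples
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                                 # sample coordinates on each PC
    var_frac = s**2 / np.sum(s**2)                 # share of total variation per PC
    # The last PC is numerically empty after centering, so it is skipped.
    corr = np.array([abs(np.corrcoef(scores[:, k], labels)[0, 1])
                     for k in range(len(s) - 1)])
    best = int(np.argmax(corr))
    return best, float(var_frac[best]), float(corr[best])

# Synthetic data (all numbers are assumptions, not values from the paper).
rng = np.random.default_rng(0)
n_genes, n_stages = 1000, 8
stage = np.tile(np.linspace(-1, 1, n_stages), 2)   # 8 stages, repeated for each tooth
tooth = np.repeat([0.5, -0.5], n_stages)           # upper = +0.5, lower = -0.5

time_loading = rng.normal(0, 1, n_genes)           # developmental-time effect per gene
tooth_effect = np.zeros(n_genes)
tooth_effect[:20] = 3.0 * rng.choice([-1, 1], 20)       # few strongly biased genes
tooth_effect[20:520] = 0.5 * rng.choice([-1, 1], 500)   # many subtly biased genes

expr = (np.outer(stage, time_loading)
        + np.outer(tooth, tooth_effect)
        + rng.normal(0, 0.3, (2 * n_stages, n_genes)))

# Crude stand-in for a DE test: rank genes by absolute upper/lower mean difference.
diff = expr[tooth > 0].mean(axis=0) - expr[tooth < 0].mean(axis=0)
order = np.argsort(-np.abs(diff))
kept = np.sort(order[20:])                         # drop the 20 most biased genes

full = separating_axis(expr, tooth)
reduced = separating_axis(expr[:, kept], tooth)
print("all genes:        axis %d, %.1f%% of variation" % (full[0] + 1, 100 * full[1]))
print("top 20 removed:   axis %d, %.1f%% of variation" % (reduced[0] + 1, 100 * reduced[1]))
```

In this toy setting the separating axis loses some of its share of the variation when the strongly biased genes are dropped, but it still splits the two groups, because the remaining signal is spread thinly over hundreds of genes.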
Dionisio: Wow! You are really good at finding interesting articles. Just think: if all that complexity of process is needed just to get different sizes in two similar teeth, what can we expect for more complex differentiation processes at the phenotypic level? The methodology of this paper is very interesting and original: trying to link phenotypic differences to the whole epigenetic scenario is indeed a very sound way to understand what really happens. Of course, a real understanding of what drives and controls these complex processes is still lacking... :)
gpuccio
March 26, 2017, 12:46 AM PDT
Check this out: "Transcriptomic signatures shaped by cell proportions shed light on comparative developmental biology", Genome Biology 18(1). DOI: 10.1186/s13059-017-1157-7
Dionisio
March 25, 2017, 08:56 PM PDT
Dionisio: "Did that happen yet?" Apparently not.
gpuccio
March 25, 2017, 02:43 PM PDT
Upright BiPed @5:
I expect you may get caught up in cross-platform comments with TSZ, as you did in your last paper.
Did that happen yet?
Dionisio
March 25, 2017, 12:56 PM PDT
DATCG @67: https://phys.org/news/2017-03-circular-rnas-non-coding-encode-proteins.html
Yes, that's an interesting paper. Thanks.
Dionisio
March 25, 2017, 12:49 PM PDT
bill cole: What about Neo-Darwinism of the gaps? :)
Dionisio
March 25, 2017, 12:23 PM PDT
gpuccio, maybe the evo-devo folks should give us some credit for trying to help them resolve their equations? :)
Dionisio
March 25, 2017, 12:21 PM PDT
bill cole: Junk of the gaps it is, definitely! :)
gpuccio
March 25, 2017, 11:56 AM PDT
Gpuccio
Indeed, there is a strong crusade against function, the most imbecile intellectual war ever fought, whose only purpose is to constrain observed, exuberant, amazing biological function in the narrow limits of present dogma and little imagination (how strange, in an academic class that has such huge imagination in elaborating darwinian fairy tales!).
Yes, it's a junk of the gaps discussion. :-)
bill cole
March 25, 2017, 11:51 AM PDT
Dionisio: "Does this make sense?" Yes. "My limited mind and poor understanding of biology, physics and chemistry don't allow me to approach such a difficult task." You are in good company. Count me in! :)
gpuccio
March 25, 2017, 11:39 AM PDT
gpuccio @88:
Have you ever wondered at how DNA methylation, chromosome 3D architecture, histone post-translational modifications, RNA splicing, microRNAs, mRNA methylation, and who knows what else, all seem to contribute to parallel cross regulations of final transcription, so much so that it is really difficult even to begin to disentangle that multiple, redundant, wonderfully complex network of meanings?
I have started to wonder at that, but my thoughts have drifted away into fantasyland and I've felt kind of hypnotized and levitated or beamed up to the seventh heaven. That task is definitely above my pay grade. My limited mind and poor understanding of biology, physics and chemistry don't allow me to approach such a difficult task. I'll let the scientists figure that out. :)
Dionisio
March 25, 2017, 11:31 AM PDT
gpuccio,
Regarding your question about the equations, I think you are right, but how do you think we can set the equations, with so many components we still don't understand?
That's an excellent question. I think you're right about our current level of understanding of the subject. At this point, as far as I can see, your work on the proteome information jumps is perhaps the better-known part of the Delta(x) components of the equations, but it still does not tell us how those information jumps appeared. The Delta(x) components are not only about the observed differences in the mechanisms or the parts, but also about the procedure leading to those differences. In the case of the proteome information jumps, that would require defining the procedure(s) needed to get those proteome differences inserted. As you mentioned earlier in this discussion, there are other things that must be accounted for but are not well characterized or are poorly understood. But perhaps that's the ultimate validation (falsification) test for the evo-devo conundrum. In order to determine Delta(d1) and Delta(d2) we must know Dev(ca), Dev(d1) and Dev(d2) very precisely. Is there any workaround? Once those three main components are known, we could work on determining the possible spatiotemporal physicochemical changes that could be included in Delta(d1) and Delta(d2). Does this make sense? Is there a better way to do this? Thanks.
Dionisio
March 25, 2017, 11:11 AM PDT