Uncommon Descent Serving The Intelligent Design Community

The amazing level of engineering in the transition to the vertebrate proteome: a global analysis

Categories: Intelligent Design

As a follow-up to my previous post:

I am presenting here some results obtained by a general application, expanded to the whole human proteome, of the procedure already introduced in that post.

Main assumptions.

The aim of the procedure is to measure a well-defined equivalent of functional information in proteins: the information that is conserved throughout long evolutionary times, in a well-specified evolutionary line.

The simple assumption is that such information, which has not been modified by neutral variation over a time span of hundreds of millions of years, is certainly highly functionally constrained, and is therefore a very good empirical approximation of the value of functional information in a protein.

In particular, I will use the proteins in the human proteome as “probes” to measure the information that is conserved from different evolutionary timepoints.

The assumption here is very simple. Let’s say that the line that includes humans (call it A) splits from a different line (call it B) at some evolutionary timepoint T. Then the homology that we observe in a protein when we compare organisms derived from B with humans (derived from A) must have survived neutral variation throughout the timespan from T to now. If that timespan is long enough, we can very safely assume that the measured homology is a measure of some specific functional information conserved from the time of the split to now.

Procedure.

I downloaded a list of the basic human proteome (in FASTA format) from UniProt, selecting all human reviewed sequences, for a total of 20171 sequences. That is a good approximation of the basic human proteome as known at present.

I used NCBI’s BLAST tool locally to blast the whole human proteome against known protein sequences from specific groups of organisms, using the nr (non-redundant) NCBI protein database, and selecting, for each human protein, the alignment with the highest homology bitscore from that group of organisms.
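The best-hit selection step can be sketched as follows. This assumes BLAST+ tabular output (`-outfmt 6`), whose twelfth column is the bitscore; the function name and file name are illustrative, not the author's actual code.

```python
def best_hits(blast_tab_path):
    # For each human query protein, keep only the alignment with the
    # highest bitscore from the given group of organisms.
    # Assumes BLAST+ tabular output (-outfmt 6): the query id is the
    # first column, the bitscore the twelfth (last) column.
    best = {}  # query id -> highest bitscore seen so far
    with open(blast_tab_path) as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            query, bitscore = fields[0], float(fields[11])
            if bitscore > best.get(query, 0.0):
                best[query] = bitscore
    return best
```

The same pass could also record the aligned subject id or E-value; only the bitscore is needed for the analysis described here.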

Homology values:

I have used two different measures of homology for each protein alignment:

  1. The total bitscore from the BLAST alignment (from now on: “bits”)
  2. The ratio of the total bitscore to the length in aminoacids of the human protein, which I have called “bits per aminoacid” (from now on, “baa”). This is a measure of the mean “density” of functional information in the protein, which corrects for protein length.
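As a minimal sketch (the function name is mine, not the author's), the two measures for a single protein:

```python
def homology_measures(bitscore, protein_length):
    # "bits": the total BLAST bitscore of the best alignment against the
    # chosen group of organisms.
    # "baa": bits per aminoacid, i.e. the bitscore divided by the length
    # of the human protein; a length-corrected density of conserved
    # information, bounded above by roughly 2.2 (the self-hit density).
    bits = bitscore
    baa = bitscore / protein_length
    return bits, baa

# Example: a 400-aminoacid protein whose best alignment scores 440 bits
bits, baa = homology_measures(440.0, 400)  # baa = 1.1
```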

The values of homology in bits have a very wide range of variation in each specific comparison with a group of organisms. For example, in the comparison between human proteins and the proteins in cartilaginous fish, the range of bit homology per protein is 21.6 – 34368, with a mean of 541.4 and a median of 376 bits.

The values of homology in baa, instead, are necessarily confined between 0 and about 2.2; 2.2 is (approximately) the highest homology bitscore per aminoacid that we get when we blast a protein against itself (total identity). I use the BLAST bitscore because it is a widely used and accepted way to measure homology and to derive probabilities from it (the E values).

So, for example, in the same human – cartilaginous fish comparison, the range of the baa values is 0.012 – 2.126, with a mean of 0.95 and a median of 0.97.

For each comparison, a small number of proteins (usually about 1-2%) did not result in any significant alignment, and were not included in the specific analysis for that comparison.

Organism categories and split times:

The analysis includes the following groups of organisms:

  • Cnidaria
  • Cephalopoda (as a representative sample of Mollusca and, more generally, of Protostomia: among Protostomia, Cephalopoda, and Mollusca in general, show the highest homology to Deuterostomia, and are therefore a good sample to evaluate conservation from the Protostomia – Deuterostomia split).
  • Deuterostomia (excluding vertebrates): this includes echinoderms, hemichordates and chordates (excluding vertebrates).
  • Cartilaginous fish
  • Bony fish
  • Amphibians
  • Crocodylia, including crocodiles and alligators (as a representative sample of reptiles, excluding birds; here again, Crocodylia usually have the highest homology with human proteins among reptiles, together perhaps with turtles).
  • Marsupials (an infraclass of mammals representing Metatheria, a clade which split early enough from the human lineage)
  • Afrotheria, including elephants and other groups (representing a group of mammals relatively distant from the human lineage, in the Eutheria clade)

There are reasons for these choices, but I will not discuss them in detail for the moment. The main purpose is always to detect the functional information (in the form of homology) that was present at specific split times, and has therefore been conserved in both lines after the split. In a couple of cases (Protostomia, Reptiles), I have used a smaller group (Cephalopoda, Crocodylia) that can reasonably represent the wider group, because using very big groups of sequences (like all Protostomia, for example) was too time-consuming for my resources.

So what are the split times we are considering? This is a very difficult question, because split times are not well known, and very often you can get very different values for them from different sources. Moreover, I am not at all an expert on these issues.

So, the best I can do is to give some reasonable proposals here, based on what I have found, but I am completely open to any suggestions to improve my judgements. In each split, humans derive from the second line:

  • Cnidaria – Bilateria: let’s say at least 555 My ago.
  • Protostomia – Deuterostomia: let’s say about 530 My ago.
  • Pre-vertebrate Deuterostomia (including chordates like Cephalochordata and Tunicates) – Vertebrates (Cartilaginous fish): let’s say 440 My ago.
  • Cartilaginous fish – Bony fish: let’s say about 410 My ago.
  • Bony fish – Tetrapods (Amphibians): let’s say 370 My ago, more or less.
  • Amphibians – Amniota (Sauropsida, Crocodylia): about 340 My ago.
  • Sauropsida (Crocodylia) – Synapsida (Metatheria, Marsupialia): about 310 My ago.
  • Metatheria – Eutheria (Afrotheria): about 150 My ago.
  • Atlantogenata (Afrotheria) – Boreoeutheria: probably about 100 My ago.

The simple rule is: for each split, the second member is the line to humans, and the human-conserved information present in the first member of each pair must have been conserved in both lines at least from the time of the split to the present day.

So, for example, the human-conserved information in Cnidaria has been conserved for at least 555 My, the human-conserved information in Crocodylia for at least 310 My, and so on.
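The rule can be summarized in a small lookup table mapping each probe group to the minimum time (in millions of years, My) its human-conserved homology must have been conserved. This is a sketch; the group labels are mine.

```python
# Minimum conservation time, in My, implied by the split times above:
# for each probe group, the human-conserved homology measured in that
# group must have survived neutral variation for at least this long.
MIN_CONSERVATION_MY = {
    "Cnidaria": 555,                           # Cnidaria - Bilateria split
    "Cephalopoda": 530,                        # Protostomia - Deuterostomia
    "Deuterostomia (non vertebrates)": 440,    # pre-vertebrates - Vertebrates
    "Cartilaginous fish": 410,                 # Cartilaginous - Bony fish
    "Bony fish": 370,                          # Bony fish - Tetrapods
    "Amphibians": 340,                         # Amphibians - Amniota
    "Crocodylia": 310,                         # Sauropsida - Synapsida
    "Marsupialia": 150,                        # Metatheria - Eutheria
    "Afrotheria": 100,                         # Atlantogenata - Boreoeutheria
}
```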

The problem of redundancy (repeated information).

However, there is an important problem that requires attention. Not all the information in the human proteome is unique, in the sense of “present only once”. Many sequences, especially domains, are repeated many times, in more or less similar form, in many different proteins. Let’s call this “the problem of redundancy”.

So, all the results that we obtain about homologies of the human proteome to some other organism or group of organisms should be corrected for that factor, if we want to draw conclusions about the real amount of new functional information in a transition. Of course, repeated information will inflate the apparent amount of new functional information.

Therefore, I computed a “coefficient of correction for redundancy” for each protein in the human proteome. For the moment, for the sake of simplicity, I will not go into the details of that computation, but I am ready to discuss it in depth if anyone is interested.

The interesting result is that the mean coefficient of correction is, according to my computations, 0.497. IOWs, we can say that about half of the potential information present in the human proteome can be considered unique, while about half can be considered as repeated information. This correction takes into account, for each protein in the human proteome, the number of proteins in the human proteome that have significant homologies to that protein and their mean homology.

So, when I give results “corrected for redundancy”, what I mean is that the homology values for each protein have been corrected by multiplying them by the coefficient of that specific protein. Of course, in general, the results will be approximately halved.
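The correction step can be sketched as follows. The post does not give the computation of each coefficient, so this only shows how per-protein coefficients would be applied; names are illustrative.

```python
def correct_for_redundancy(bits_by_protein, coeff_by_protein):
    # Multiply each protein's homology (in bits) by its redundancy
    # coefficient: 1.0 means fully unique information, and per the post
    # the mean coefficient over the human proteome is about 0.497.
    # Proteins with no listed coefficient are treated as unique here.
    return {pid: bits * coeff_by_protein.get(pid, 1.0)
            for pid, bits in bits_by_protein.items()}
```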

Results

Table 1 shows the means of the values of total homology (bitscore) with human proteins, in bits and in bits per aminoacid, for the various groups of organisms.

Group of organisms                  Homology bitscore (mean)   Total homology bitscore   Bits per aminoacid (mean)
Cnidaria                            276.9                      5465491                   0.543
Cephalopoda                         275.6                      5324040                   0.530
Deuterostomia (non vertebrates)     357.6                      7041769                   0.671
Cartilaginous fish                  541.4                      10773387                  0.949
Bony fish                           601.5                      11853443                  1.064
Amphibians                          630.4                      12479403                  1.107
Crocodylia                          706.2                      13910052                  1.217
Marsupialia                         777.5                      15515530                  1.354
Afrotheria                          936.2                      18751656                  1.629
Maximum possible value (identity)   -                          24905793                  2.2

Figure 1 shows a plot of the mean bits-per-aminoacid score in the various groups of organisms, according to the mentioned approximate times of split.

Figure 2 shows a plot of the density distribution of human-conserved functional information in the various groups of organisms.

The jump to vertebrates.

Now, let’s see how big the informational jumps are for each split, always in terms of human-conserved information.

The following table sums up the size of each jump:

Split                                                   Homology bitscore jump (mean)   Total homology bitscore jump   Bits per aminoacid (mean)
Homology bits in Cnidaria                               -                               5465491                        0.54
Cnidaria – Bilateria (Cephalopoda)                      -6.3                            -121252                        -0.02
Protostomia (Cephalopoda) – Deuterostomia               87.9                            1685550                        0.15
Deuterostomia (non vert.) – Vertebrates (Cart. fish)    189.6                           3708977                        0.29
Cartilaginous fish – Bony fish                          54.9                            1073964                        0.11
Bony fish – Tetrapoda (Amphibians)                      31.9                            624344                         0.05
Amphibians – Amniota (Crocodylia)                       73.3                            1430963                        0.11
Sauropsida (Crocodylia) – Synapsida (Marsupialia)       80.8                            1585361                        0.15
Metatheria (Marsupialia) – Eutheria (Afrotheria)        162.2                           3226932                        0.28
Total bits of homology in Afrotheria                    -                               18751656                       1.63
Total bits of maximum information in humans             -                               24905793                       2.20
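As a rough cross-check, the jump values are close to the successive differences of the Table 1 total bitscores. A sketch; the post's values differ slightly, presumably because they were computed protein by protein rather than from the group totals.

```python
# Total homology bitscores from Table 1, in split-time order:
# Cnidaria, Cephalopoda, Deuterostomia (non vert.), Cartilaginous fish,
# Bony fish, Amphibians, Crocodylia, Marsupialia, Afrotheria.
totals = [5465491, 5324040, 7041769, 10773387, 11853443,
          12479403, 13910052, 15515530, 18751656]

# The informational "jump" at each split, approximated as the difference
# between consecutive group totals.
jumps = [b - a for a, b in zip(totals, totals[1:])]
# jumps[2] approximates the pre-vertebrate -> vertebrate jump
```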

 

The same jumps are shown graphically in Figure 3:

 

As everyone can see, each of these splits, except the first one (Cnidaria – Bilateria), is characterized by a very relevant informational jump in terms of human-conserved information. The jump is generally of the order of 0.6 – 1.7 million bits.

However, two splits are characterized by a much bigger jump: the prevertebrate – vertebrate split reaches 3.7 million bits, while the Metatheria – Eutheria split is very near, with 3.2 million bits.

For the moment I will discuss only the prevertebrate-vertebrate jump.

This is where a great part of the functional information present in humans seems to have been generated: 3.7 million bits, and about 0.29 bits per aminoacid of new functional information.

Let’s see that jump also in terms of information density, looking again at Figure 2, but only with the first 4 groups of organisms:

 

Where is the jump here?

 

We can see that the density distribution is almost identical for Cnidaria and Cephalopoda. Deuterostomia (non vertebrates) show a definite gain in human-conserved information: as we know, it is about 1.68 million bits, and it corresponds to the grey area (and, obviously, to the lower peak of low-homology proteins).

But the real big jump is in vertebrates (cartilaginous fish). The pink area and the lower peak in the low homology zone correspond to the amazing acquisition of about 3.7 million bits of human-conserved functional information.

That means that a significant percentage of proteins in cartilaginous fish have a high homology, higher than 1 bit per aminoacid, with the corresponding human protein. Indeed, that is true for 9574 proteins out of 19898 (48.12% of the proteome). For comparison, these high-homology proteins are “only” 4459 out of 19689 (22.65% of the proteome) in pre-vertebrates.
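The percentage of high-homology proteins can be computed directly from the per-protein baa values; a sketch (the function name and threshold argument are mine):

```python
def high_homology_fraction(baa_values, threshold=1.0):
    # Fraction of proteins whose bits-per-aminoacid homology with the
    # corresponding human protein exceeds the threshold (1.0 baa in the
    # comparison discussed in the post).
    high = sum(1 for v in baa_values if v > threshold)
    return high / len(baa_values)
```

Applied to the cartilaginous-fish comparison this yields the 48.12% figure above; applied to pre-vertebrates, 22.65%.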

So, in the transition from pre-vertebrates to vertebrates, the following amazing events took place:

  • About 3.7 million bits of human-conserved functional information were generated
  • A mean increase of about 190 bits per protein of that information took place
  • The number of high human-homology proteins more than doubled

Correcting for redundancy

However, we must still correct for redundancy if we want to know how much really new functional information was generated in the transition to vertebrates. As I have explained, we should expect that about half of the total information can be considered unique information.

Making the correction for each single protein, the final result is that the total number of new unique functional bits that appear for the first time in the transition to vertebrates, and are then conserved up to humans, is:

1,764,427 bits

IOWs, more than 1.7 million bits of unique new human-conserved functional information are generated in the proteome with the transition to vertebrates.

But what does 1.7 million bits really mean?

I would like to remind you that we are dealing with exponential values here. A functional complexity of 1.7 million bits means a probability (in a random search) of:

1:2^1.7 million

A quite amazing number indeed!

Just remember that Dembski’s Universal Probability Bound is 500 bits, a complexity of 2^500. Our number (2^1,764,427) is so much bigger that the UPB seems almost a joke in comparison.
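Numbers of this size are best handled in log space: 2^1,764,427 would overflow any floating-point type, but the exponents (bit counts) compare directly. A sketch, using the values from the post:

```python
import math

# Bit counts from the post: the new human-conserved functional
# information in the vertebrate transition, and Dembski's UPB.
functional_bits = 1_764_427
upb_bits = 500

# The implied random-search probability is 2**-functional_bits; its
# ratio to the UPB probability is 2**(functional_bits - upb_bits),
# so the comparison reduces to a subtraction of exponents.
excess_bits = functional_bits - upb_bits

# Each bit is log10(2) ~ 0.30103 orders of magnitude, so the excess
# expressed in decimal orders of magnitude is:
decimal_orders = excess_bits * math.log10(2)
```

So the measured complexity exceeds the UPB by over half a million decimal orders of magnitude, without ever materializing the huge integer.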

Moreover, this huge modification of the proteome seems to be strongly constrained and definitely necessary for the new vertebrate body plan, so much so that it has been conserved for hundreds of millions of years after its appearance.

Well, that is enough for the moment. The analysis tools I have presented here can be used for many other interesting purposes, for example to compare the evolutionary history of proteins or groups of proteins. But that will probably be the object of further posts.

Comments
Dionisio: "BTW, I have not read much about this besides your articles in this site. I have difficulties memorizing so many names and don’t understand the classification criteria. BTW, are they based on physiological, phenotypic or genetic parameters?" I think they are based on the sum total of what is available. It is not a simple field, and experts are constantly debating about those things. Which is good. I suppose that the fossil record remains the foundation, but molecular data are a great contribution. Regarding your question about the equations, I think you are right, but how do you think we can set the equations, with so many components we still don't understand?gpuccio
March 25, 2017, 10:32 AM PST
gpuccio, Yes, your detailed explanation did help to clarify the methodology you've used for your article. Thanks. It's funny you wrote "please note that I am not an expert...". In this topic you definitely seem like an expert to me. You've written on this same subject before and I thought I had understood it, but now I realized I had not grasped it quite well yet. It takes me longer than normal. BTW, I have not read much about this besides your articles in this site. I have difficulties memorizing so many names and don't understand the classification criteria. BTW, are they based on physiological, phenotypic or genetic parameters? I've noticed an ongoing discussion in this site between common and uncommon descent, but I stay out of that debate, because I don't understand it well, and nobody has enough spare time to explain it to me well enough to ensure that I understand it. Perhaps some folks are laughing while seeing that someone needed additional clarification of your clear article. This may help them understand my situation: Uri Alon's 2014 Systems Biology course video is about 15 hours long, but it took me several months to watch it. Most people would have taken that course much faster. I couldn't finish MIT Jeff Gore's course, which is almost 10 hours longer. It was taking me too long and I had to move on with my project. That's life. :) In any case, I'm glad you started this discussion thread. I would like to have an example of the evo-devo problem posted @35, using real data -maybe from the classification you provided? Do you think that's feasible now? This could be a comprehensive illustration of a transition at one of the splits you've documented here. For example, ca = bilateria d1 = protostomia d2 = deuterostomia Then knowing Dev(ca), Dev(d1) and Dev(d2), we could determine Delta(d1) and Delta(d2) as the required spatiotemporal physicochemical changes to resolve the equations.
Actually, we could resolve just one of the equations, whichever is easier, based on the available information. We could choose a more recent split if we prefer to. Perhaps this is off topic here because it covers other things besides the proteome. Any objections, suggestions, ideas, corrections, comments? Thanks.Dionisio
March 25, 2017, 10:23 AM PST
Dionisio: "I like your honesty adjusting the results for redundancy." Eric Anderson: "Yes, I noticed the same thing when reading and appreciated the careful, honest approach gpuccio took." I am really glad that you appreciated that. It was not an easy task. In the beginning, when I became conscious of the problem, I had no idea of how to solve it. I was tempted to just acknowledge it, and hope that its relevance was not too great. Then I made some cautious attempt at trying to quantify it, and at some point I realized that it was really possible to correct it in a satisfying quantitative way. I am rather satisfied with the result, even if for simplicity I have not given the details of how it works. I do think that having found that about half of the human proteome can be traced to repetition is in itself a fine result.gpuccio
March 25, 2017, 10:12 AM PST
DATCG: I am completely with you about redundancy as a fundamental feature of complex engineered systems. A diploid genome is a basic example of redundancy. The multiple, interconnected signaling pathways from cell membrane to nucleus are reasonably another example. The combinatorial action of Transcription Factors could well qualify, too. But the most astounding levels of complex, stratified, interconnected, regulated redundancy are probably to be observed in epigenetics. Have you ever wondered how DNA methylation, chromosome 3D architecture, histone post-translational modifications, RNA splicing, microRNAs, mRNA methylation, and who knows what else, all seem to contribute to parallel cross regulations of final transcription, so much so that it is really difficult even to begin to disentangle that multiple, redundant, wonderfully complex network of meanings?gpuccio
March 25, 2017, 10:06 AM PST
Dionisio @78: Yes, that is why I try to use a qualifier, like "pervasive amounts." Both Darwinian theory and design can accommodate some amount of junk. The difference is in the expectations. Well, that and the real-world realities of engineering. Anyone who thinks the genome is littered with junk, that some 90%+ of DNA is junk, or any similar percentage, is incredibly naive and doesn't have any idea what they are talking about from an engineering standpoint. The remarkable thing is that even after all these years we still hear the occasional Darwinist claim that DNA is almost all junk -- clinging desperately to yet another failed Darwinian claim . . .Eric Anderson
March 25, 2017, 09:26 AM PST
Dionisio @80: +1 Yes, I noticed the same thing when reading and appreciated the careful, honest approach gpuccio took.Eric Anderson
March 25, 2017, 09:20 AM PST
Eric A., On Redundancy, one paper cites advantages...
"Organisms may exploit mutual repression among such redundant regulators, for example, to overcome stochastic fluctuations in protein expression. In such cases, expression of one redundant copy may be induced when expression of the repressing partner is temporarily reduced, thus negating the disruption."
From 2009, they mention evolution, but this can be a result of Prescribed planning or Front Loading as you mention... Genetic Redundancy: New Tricks for Old Genes DATCG
March 25, 2017, 08:46 AM PST
Gpuccio, again thank you. Yes, enjoying this discussion and look forward when I have more time to fully review this and previous post of yours.DATCG
March 25, 2017, 08:39 AM PST
#69 - Eric A.,
"The problem is that we know almost nothing about what is actually happening in the cell — from an engineering standpoint. We are still engaged in very crude reverse-engineering attempts:"
Thanks for reviewing and highlighting those areas, especially Redundancy to lead off. Then there were the knockout experiments; never forget the jumps to conclusions by Darwinists, citing them as evidence, that turned out to be wrong. I agree, we are still at the "crude" stage of reverse-engineering. And just to reach these levels of reverse-engineering requires the best in technological innovation by researchers to view, let alone untangle and comprehend, the precise order, collaboration and coordination of the organized systems. I briefly debated a Darwinist once on Redundancy, several years ago, who thought Redundancy was a prime example of the Darwinist prediction of JUNK DNA showing an unguided process. Purely from a Design point of view, I responded stating that Redundancy can actually be a sign of highly efficient planning features that intelligent agents - ourselves - use daily in programming and networked systems for efficiency, speed, and access around the globe, including overcoming any disruptions to the systems to prevent downtime to live applications. Therefore, if biological functions are designed for survival, I would expect to find built-in Redundancy for many logical reasons, not the least of which is prolonging survival during extremes or circumstances that might suppress certain areas, so that the need arises for differentiation and duplicate genes. I agree we're just scratching the surface of what ENCODE unearthed, and it's wide open for discovery. The link I posted on circRNA above is an example of how new discoveries are taking place in areas previously assumed not to encode proteins. Excerpt from the article...
This discovery reveals an unexplored layer of gene activity in a type of molecule not previously thought to produce proteins. It also reveals the existence of a new universe of proteins not yet characterized.
"... new universe of proteins not yet characterized." That's quite a description about their discovery.DATCG
March 25, 2017, 08:37 AM PST
Dionisio: Yes, I would say that you get it right. Each split is between two different evolutionary branches: the first one becomes separated from the second, and the second one is the lineage from which humans are derived. For clarity, the human lineage, according to current evolutionary models, could be grossly described as follows (please note that I am not an expert, and that the following concepts are neither complete nor necessarily precise: it's just my best understanding): Domain: Eukaryota Kingdom: Animalia (Metazoa) Subkingdom: Eumetazoa (unranked): Bilateria Superphylum: Deuterostomia Phylum: Chordata Subphylum: Vertebrata Infraphylum: Gnathostomata Subgroup: Osteichthyes (Bony fish) Superclass: Tetrapoda Clade: Amniota Clade: Synapsida Class: Mammalia Subclass: Theria Clade: Eutheria Magnorder: Boreoeutheria Order: Primates So, just to review briefly the splits I have considered: Cnidaria - Bilateria This is based mainly on body symmetry. Cnidaria are Radiata. For Bilateria I have considered Cephalopoda, a class of Mollusca, because they are Bilateria but Protostomia, and among Protostomia they are extremely near to Deuterostomia (at the protein homology level). As I have explained, using all Protostomia sequences would have been too time-consuming, so I chose Cephalopoda as a representative sample. Of course, humans derive from Bilateria, and not from Radiata. Therefore, the human homology present in Cnidaria can be traced to before the split to Bilateria, IOWs to the common ancestor of Radiata and Bilateria. Before the split. Let's go to the second split: Protostomia - Deuterostomia Here I use again Cephalopoda as representative of Protostomia, and the whole group of Deuterostomia (excluding vertebrates). In the Bilateria group, humans derive from Deuterostomia, not from Protostomia. Therefore, the human-conserved homology in Cephalopoda can be traced to the common ancestor of Protostomia and Deuterostomia. Before the split.
Third split: Pre-vertebrate Deuterostomia - Vertebrates (Cartilaginous fish) This is the important split for the discussion about the vertebrate proteome, so let's look at it more closely. The first Deuterostomia, if we exclude vertebrates, include Echinodermata, Hemichordata and the first Chordata (Cephalochordata and Urochordata). Vertebrates appear first as Agnatha (jawless fish, today's lampreys), then as Gnathostomata, which split very quickly into Cartilaginous fish and Bony fish. So, the human-conserved information in non-vertebrate Deuterostomia can be traced to the common ancestor of vertebrates and other deuterostomes. That represents the highest level of human-conserved information in animals that are not vertebrates. Of course, humans derive from the Vertebrate branch. Instead, the human-conserved information in Cartilaginous fish can be traced to the common ancestor of Cartilaginous fish and Bony fish. Humans (and, in general, Tetrapoda) derive from Bony fish, so the human-conserved information in Cartilaginous fish can be traced to some time after the Non-vertebrates - Vertebrates split and before the Cartilaginous fish - Bony fish split. You may observe that I have not considered jawless fish (lampreys) in my analysis. That is mainly due to methodological problems: lampreys are an extremely small group of organisms, and they are scarcely represented in the NCBI database of proteins (just a few hundred sequences). Therefore, it is practically impossible to perform a reliable analysis on those data. Well, I hope that this helps clarify the methodology I have used.gpuccio
March 25, 2017, 07:11 AM PST
gpuccio, Given my poor reading comprehension, you may feel free to laugh at my naïve comments and dumb questions related to the following part of your article:
In each split, humans derive from the second line: Cnidaria – Bilateria. Let’s say at least 555 My ago. Protostomia – deuterostomia. Let’s say about 530 My ago. Pre-vertebrate deuterostomia (including chordates like cephalochordata and Tunicates) – Vertebrates (Cartilaginous fish). Let’s say 440 My ago. Cartilaginous fish – Bony fish. Let’s say about 410 My ago. Bony fish – Tetrapods (Amphibians). Let’s say 370 My ago, more or less. Amphibians – Amniota (Sauropsida, Crocodylia): about 340 My ago Sauropsida (Crocodylia) – Synapsida (Metatheria, Marsupialia): about 310 My ago Metatheria – Eutheria (Afrotheria): about 150 My ago Atlantogenata (Afrotheria) – Boreoeutheria: probably about 100 My ago. The simple rule is: for each split, the second member of each split is the line to humans, and the human conserved information present in the first member of each couple must have been conserved in both lines at least from the time of the split to present day. So, for example, the human-conserved information in Cnidaria has been conserved for at least 555 My, the human-conserved information in Crocodylia has been conserved for at least 310 My, and so on.
In this sentence: "In each split, humans derive from the second line:" does the word 'line' refer to 'branch' coming out of the split? Is the first line "Cnidaria – Bilateria" the first split? What did that split from? According to your explanations, is the Bilateria branch the one leading to humans? If I got that right, then is the next line (Protostomia – deuterostomia) the split from the Bilateria branch? The following line then splits from "deuterostomia" and so on? Thanks.Dionisio
March 25, 2017, 05:25 AM PST
gpuccio, I like your honesty adjusting the results for redundancy. That's a lesson on honest approach to interpretation of scientific research results. Thanks.Dionisio
March 25, 2017, 04:59 AM PST
Dionisio, Eric Anderson, UB, DATCG, bill cole, phineas: Excellent discussion, my friends. Thanks to all of you for your precious contributions. Let's try to go deeper and deeper, this is the right way to discuss ID theory, IMHO. :) :)gpuccio
March 25, 2017, 04:22 AM PST
Eric Anderson @69: Regarding the so called "junk" DNA, I wouldn't be surprised if we have some cellular or molecular mess resulting from the messy human history. IOW, things that used to work but got messed up and stopped working right or completely. Well designed things that get abused and/or misused losing their functionality partially or completely. The amazing fact that despite such a messy history the robust biological systems have survived speaks volumes about the kind of design we're looking at.Dionisio
March 25, 2017, 04:06 AM PST
Eric Anderson @69:
The problem is that we know almost nothing about what is actually happening in the cell — from an engineering standpoint. We are still engaged in very crude reverse-engineering attempts: knock out a gene here, mutate a sequence there, and see what happens. Until we actually know what is doing what — precisely what the entire coding system is and what it means and how it plays out in the larger context — it will be impossible to draw a definitive conclusion that something is junk.
Excellent description of the situation. Thanks.Dionisio
March 25, 2017, 03:46 AM PST
gpuccio @68:
Indeed, there is a strong crusade against function, the most imbecile intellectual war ever fought, whose only purpose is to constrain observed, exuberant, amazing biological function in the narrow limits of present dogma and little imagination (how strange, in an academic class that has such huge imagination in elaborating darwinian fairy tales!)
That's an interesting comment on the sad reality of academic circles today. Perhaps the appearance of the Third Way and the latest book from professor D. Noble shows that not all scientists remain dogmatic, and some have started to be more open-minded and think outside outdated boxes. However, even the new dissenting voices still seem confused about how to approach the strong evidence presented by the latest research.Dionisio
March 25, 2017, 03:35 AM PST
gpuccio @68:
Think also of the fundamental regulatory activity of small molecules, like miRNAs and small peptides. Fascinating stuff indeed. And a global meta-setting that is incredibly complex and incredibly flexible (the global epigenetic levels of programming), and can rapidly change in an extremely coordinated way from cell type to cell type, and even in the same cell type in different contexts!
Wow! Very interesting. Thanks.Dionisio
March 25, 2017, 03:13 AM PST
gpuccio @68:
And I would like to mention here two very big systems that work as huge networks based on well programmed and flexible objects: the network of Transcription Factors, and the amazing system based on Ubiquitin, E1 and E2 ubiquitin enzymes and especially E3 ubiquitin protein ligases, and the proteasome.
Wow! Very interesting. Thanks.

Dionisio
March 25, 2017 at 03:07 AM PST
It is a physical certainty that if one of Szostak's (or Joyce's or Lincoln's, etc.) intellectual descendants ever creates a homogeneous self-replicator, they will most certainly have to add a semiotic component to the system in order to get it to function. One wonders if even that will finally be enough to infer design in biology among materialists?

Upright BiPed
March 25, 2017 at 02:43 AM PST
Eric,
Until we actually know what is doing what — precisely what the entire coding system is and what it means and how it plays out in the larger context — it will be impossible to draw a definitive conclusion that something is junk.
I think this is one of the major benefits of the Venter program -- to minimize the operating system until we can better isolate the collective function of the minimal genome. Materialists so often come here to scorn, and ask where is the research that supports ID. It all does.

Upright BiPed
March 25, 2017 at 01:37 AM PST
gpuccio @68:
The working of biological objects is definitely similar to very good Object Oriented Programming.
[emphasis added] Excellent point, though in this particular context "very good" is an understatement. :) I worked for over two decades on software development for engineering design. The level of cybernetic solutions we're seeing in biological systems was simply unimaginable to me or my colleagues, even in our wildest daydreaming after many sleepless nights. :)

Just imagine, if you will, that we could design a building in such a way that we just place our design product (shaped as a small sphere) within an enclosed frame, push a button to start the build process, and voilà! The only caveat is that the frame has connectors to supply generic materials to the construction when required. That's it. Bingo! Even that is not comparable. The building isn't conscious. But we can leave that aside for now. :) Unbelievable!

When one of my children left at home a textbook on human development after graduating from medical school, I dared to open it, and all my learned concepts of complex systems and information technology were blown to pieces. I haven't recovered since. Actually, it got worse from that point on. What started as curiosity soon turned into a growing fascination, which eventually became an irresistible addiction to searching for newer information to understand how all that stuff works. Needless to say, I had to leave my previous job and start a new career in a completely unknown field, from scratch. It's been a long and winding road that has led to many doors of new information, answering some outstanding questions while raising new ones. Inexplicably, the fascination keeps growing as I dig deeper into the marvelous functioning of biological systems, from my very limited perspective and much more limited capacity to understand the difficult issues that appear before me in the literature I review. It has definitely been very humbling.
BTW, Professor Uri Alon described a comparable experience after getting his PhD in physics and looking at a biology book. He told that interesting story in the first class of his 2014 course on Systems Biology at the Weizmann Institute of Science. The 15-lesson video course is available online to anyone interested in the subject. For the same reasons, interdisciplinary work teams are becoming standard in biology-related research these days. We ain't seen nothin' yet. The best is still ahead.

Dionisio
March 25, 2017 at 01:28 AM PST
Eric Anderson at #69: Very good thoughts! :)

gpuccio
March 25, 2017 at 01:19 AM PST
DATCG @67:
Question – are there any areas so far in actual research that are absolute JUNK? No function, no purpose, not utilized by other systems – just trash? It seems every single day, old Junk is being discovered as No Junk.
The problem is that we know almost nothing about what is actually happening in the cell -- from an engineering standpoint. We are still engaged in very crude reverse-engineering attempts: knock out a gene here, mutate a sequence there, and see what happens. Until we actually know what is doing what -- precisely what the entire coding system is and what it means and how it plays out in the larger context -- it will be impossible to draw a definitive conclusion that something is junk.

But we can see some important developments and draw some initial conclusions. Let's consider so-called "junk DNA" for a moment. Even if we can't seem to find a function for a particular sequence, that doesn't mean it doesn't have one:

1. There is the obvious possibility of redundancy.

2. There may be pathways that are important for some stages of development and not others, making it more difficult to identify whether a sequence lacks function, particularly with experiments that don't track the entire lifecycle of the organism.

3. There may be functions that are there for a particular purpose or particular environmental condition and which won't be manifest immediately.

4. There may be functions that contribute to the overall fitness of the organism/population over generations, but which would not be evident over a few generations in short-term knockout studies.

5. There may be functions that are not critical to the organism, but which still contribute to overall health, performance, quality of life, etc. These nuances can easily be missed in knockout experiments.

6. There may be front-loading -- sequences that are not necessary for the organism currently, but which were in the past or could become so later on.

7. Finally, there are thousands upon thousands of biological functions and processes that we know exist and for which clear genetic instructions have not yet been identified -- vastly more, in fact, than what has thus far been identified.
It follows as a matter of logic that either (a) a significant amount of allegedly non-functional DNA will turn out to have function, or (b) the known functional DNA will turn out to have many more layers of multiple functionality than currently thought. Or a combination of the two.

Finally, let us remember one additional overarching fact: every time we find an additional function, we are adding to the amount of known functional DNA. In contrast, we will essentially never discover that a known functional sequence is no longer functional. Thus, as more research is done, we can only add to the known functional DNA. In other words, whatever percentage of functional DNA is currently known, that percentage can only increase, not decrease. The arrow of discovery and the trend of research are very clear on this point, both logically and practically, as witnessed by the almost daily new discoveries of functionality. Anyone who still clings to the outdated and simplistic Darwinian talking point of pervasive junk DNA is not only incredibly naive about how complex, functional systems work in the real world, but is also standing firmly on the wrong side of the trajectory of the evidence.

Eric Anderson
March 25, 2017 at 12:11 AM PST
DATCG: Thank you for the kind words! :) You raise important and fascinating points.

"When Dionisio brought up Object Class modifications, it struck a chord."

Absolutely! The working of biological objects is definitely similar to very good Object Oriented Programming. Protein domains are probably the main core objects, with a lot of ability for conservation of core function and tweaking of interface. The same reasoning can be done for the whole protein as a meta-object. And so on. And I would like to mention here two very big systems that work as huge networks based on well programmed and flexible objects: the network of Transcription Factors, and the amazing system based on Ubiquitin, the E1 and E2 ubiquitin enzymes and especially the E3 ubiquitin protein ligases, and the proteasome.

"I've always looked at biological functions through Programmers perspective as functions in a Common Design Depository of component systems"

Absolutely! Think also of the fundamental regulatory activity of small molecules, like miRNAs and small peptides. Fascinating stuff indeed. And a global meta-setting that is incredibly complex and incredibly flexible (the global epigenetic levels of programming), and can rapidly change in an extremely coordinated way from cell type to cell type, and even in the same cell type in different contexts!

"Just how large an area exist still yet undiscovered today? ENCODE I think said at least 80% of what was previously written off as JUNK might be functional?"

Yes, and they have been strongly criticized for that! But time will tell. :) Indeed, there is a strong crusade against function, the most imbecile intellectual war ever fought, whose only purpose is to constrain observed, exuberant, amazing biological function within the narrow limits of present dogma and little imagination (how strange, in an academic class that has such huge imagination in elaborating darwinian fairy tales! :) ).

"Question -- are there any areas so far in actual research that are absolute JUNK? No function, no purpose, not utilized by other systems -- just trash?"

Maybe classical neo-darwinism? OK, just joking... :)

gpuccio
March 24, 2017 at 09:50 PM PST
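As an aside for readers with a programming background, the Object Oriented Programming analogy in gpuccio's comment (conservation of core function, tweaking of interface) can be sketched in a few lines of Python. This is a purely illustrative toy, not a model of real kinase biology; the class names, the "EGF"/"ERK" labels, and the behavior are all invented for the sake of the analogy.

```python
# Purely illustrative analogy: names and behavior are invented,
# not a model of any real protein system.

class KinaseDomain:
    """The conserved 'core object': the basic biochemical function."""
    def phosphorylate(self, substrate):
        # Core function, conserved across lineages
        return f"phospho-{substrate}"

class ReceptorKinase(KinaseDomain):
    """The 'interface' is tweaked: same core, new regulatory context."""
    def __init__(self, ligand):
        self.ligand = ligand
        self.active = False

    def bind(self, molecule):
        # Interface layer: activity now depends on a specific ligand
        self.active = (molecule == self.ligand)

    def phosphorylate(self, substrate):
        if not self.active:
            return substrate  # inactive: the substrate is untouched
        return super().phosphorylate(substrate)  # reuse the conserved core

rk = ReceptorKinase(ligand="EGF")
rk.bind("EGF")
print(rk.phosphorylate("ERK"))  # → phospho-ERK
```

The conserved core (`KinaseDomain.phosphorylate`) is inherited unchanged, while the subclass only rewires when that core is invoked -- roughly the "conserved domain, tweaked interface" pattern the comment describes.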
Then there's the Brain. Yet more discoveries of previously ignored areas. How many areas exist like this? Of "types of molecules not previously thought to produce proteins"? Just how large an area still exists undiscovered today? ENCODE, I think, said at least 80% of what was previously written off as JUNK might be functional? Question - are there any areas so far in actual research that are absolute JUNK? No function, no purpose, not utilized by other systems - just trash? It seems every single day, old Junk is being discovered as No Junk.

Research shows that circular RNAs, until now considered non-coding, can encode for proteins
This discovery reveals an unexplored layer of gene activity in a type of molecule not previously thought to produce proteins. It also reveals the existence of a new universe of proteins not yet characterized. To determine whether circRNAs are translated, the researchers used Drosophila (fruit flies) and developed or adapted various techniques from molecular biology, computational biochemistry and neurobiology. They also found that translated circRNAs are associated with specific places in the cells, in particular synapses, the junctions where electrical impulses pass from one nerve cell to another nerve or muscle cell. Indeed, the proteins produced from these circRNAs are present in synapses and are translated in response to specific signals, e.g. when the flies did not have access to food for 12 hours. This suggests that communication between neurons might involve unknown and uncharacterized mechanisms. Moreover, starvation and other pathways that induce the translation of circRNAs are also involved in aging, suggesting a strong link between circRNA translation and aging and a possible role for these molecules in neurodegenerative diseases. As circRNAs are extremely stable, they potentially could be stored for a long time in compartments more distant to the cell's body like axons of neuron cells. There, the RNA molecules could serve as a reservoir for proteins being produced at a given time.
Reservoir or Depository, take your pick. I came across this after I discussed a "Depository".

DATCG
March 24, 2017 at 04:29 PM PST
Nice post, Gpuccio. Congrats and thanks for all your hard work. I always enjoy your posts, and enjoyed reading this one, the past post, and the discussion. On your and Dionisio's discussion at #50/#54:
Dionisio: “It’s like modifying object classes in order to boost their capabilities when doing object-oriented software development.” Gpuccio: "That’s exactly what I think. For many proteins, some basic biochemical function can remain similar, and is usually implemented by more conserved domains."
Was thinking there must be core, conserved domains, but with Prescribed tolerance levels of variation across systems. When Dionisio brought up Object Class modifications, it struck a chord. I've always looked at biological functions through a Programmer's perspective: as functions in a Common Design Depository of component systems -- Blueprints, regulators, switches, and data references -- all utilized by different Rules-based Regulatory Routines and/or Sub-Routines. Seeing how Enhancers work, and how coordination and collaboration take place with Master Regulators, we live in fascinating times. Looking forward to your next post.

DATCG
March 24, 2017 at 04:05 PM PST
Interesting discussion indeed.

Dionisio
March 24, 2017 at 03:52 PM PST
Eric Anderson: "But even if by chance a purely natural process stumbles upon a new 'function', it still has to be integrated into the cellular machinery, instructions have to be appropriately included in the repertoire and faithfully passed on to the next generation, and on and on. This is an engineering problem."

These are fundamental points. Thank you for raising them.

One very interesting thing that, IMO, emerges from the data presented here is that they seem to show a coordinated change. That's why we have such huge "jumps" in information content. This is a very important point. If we look at Figures 4 and 5 (the last two figures in the OP), we can see many interesting things, but one thing is particularly interesting, and maybe not so obvious: the almost identical form of the density curve for cnidaria and cephalopoda. The two curves are really so similar that a casual observer really should wonder why. Now, let's consider that:

a) Cnidaria are a phylum of Radiata, while Cephalopoda are a quite evolved class of the phylum Mollusca, in the superphylum Lophotrochozoa, which is part of Protostomia in the clade Bilateria. While we cannot be sure how chronologically distant these two groups of organisms are (the problems inherent in the Cambrian explosion make any final assessment very difficult), there is no doubt that Cnidaria and Cephalopoda are extremely distant from practically all major points of view, starting from body plan symmetry.

b) And yet, if we evaluate those two groups for their content of human-related sequences, they behave in an almost identical way. That is really surprising and interesting.

c) Now, I want to be very clear about one aspect: when we analyze the content in human-conserved information, we are in no way looking at the total functional information in an organism, but only at the subset of functional information that is conserved also in the human lineage after the relevant split.

d) To be more clear, I am in no way saying that Cnidaria and Protostomia have the same amount of functional information, or that they do not differ in specific functional information. This is an important point. For example, Cephalopoda may well have a lot of specific functional information that makes them different from Cnidaria. They probably have millions of bits of functional information that are Cephalopod-specific. But here we are not looking at that. We are simply looking at the functional information that is conserved up to humans. And that functional information is extremely similar in Cnidaria and Cephalopoda. It is not only very similar as total content in bits (a little more than 5 million bits, not corrected for redundancy); it also has almost the same density distribution when evaluated as baa values for each single protein.

e) So, to the extent that these two very different groups have human-conserved information, we can say that such human-conserved information is almost the same in the two groups.

f) Now, even if 5 million bits are certainly a relevant bulk of information, they are only a minor part of the total information that we can observe in humans. They very likely correspond mainly to old protein systems, very conserved even before the animal stage: fundamental, housekeeping functional systems that are transmitted and conserved for hundreds of millions of years, often for billions of years. Much of that information can certainly be traced, still with good conservation, up to LUCA and OOL.

g) But then, first at the level of deuterostomia, and then, much more, at the level of the first vertebrates, we have what could be called a sudden "explosion" of human-conserved functional information: 1.7 million bits in the first deuterostomia, 3.7 million bits in the first vertebrates (not corrected for redundancy). A very big explosion, equivalent to the sum total of the human-conserved functional information that had been accumulated from OOL to the whole group of protostomia. And it happens in two well separated steps.

h) What does that mean? IMO, there is only one reasonable interpretation: with the appearance of deuterostomia first, and then with the appearance of craniata-vertebrata (cartilaginous fish), new body plans are introduced. And they are introduced rather suddenly from an evolutionary standpoint. And they are introduced in a coordinated way.

i) So, I think these data really support the points that you have so brilliantly made: not only did the appearance of the first vertebrates require 3.7 million bits of new specific information (1.7 million bits of which unique), but there are all the reasons in the world to believe that such a bulk of new information is highly coordinated. The whole bulk of new information contributes to making vertebrata different from previous deuterostomia, and to implementing a really new biological plan that will then be carried on more or less gradually in the hundreds of millions of years that follow. For example, one of the most striking features of the new lineage will be cephalisation, the concentration of brain functions in the head. Another will be the development of adaptive immunity. This is design in its highest form.

I hope I will be able to give further insights about these points when I present other data about the evolutionary pattern of different kinds of proteins. And, just to finish, let's remember that this is only the proteome, and that behind it there is much, much more.

gpuccio
March 24, 2017 at 03:11 PM PST
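For readers curious about the arithmetic behind the comparison above, here is a minimal sketch of how a per-group total (in bits) and a per-protein baa value (bits per aligned amino acid) could be computed from a table of best BLAST bitscores. The protein names and numbers below are invented placeholders, not the OP's data; the actual analysis used local BLAST against the NCBI nr database.

```python
# Hypothetical toy data: best_hits[group][protein] = (best bitscore, alignment length).
# Placeholder values only -- not results from the OP's analysis.
best_hits = {
    "cnidaria":    {"P1": (250.0, 300), "P2": (80.0, 150), "P3": (0.0, 0)},
    "cephalopoda": {"P1": (255.0, 310), "P2": (78.0, 140), "P3": (0.0, 0)},
}

def total_bits(group):
    """Total human-conserved information for a group: sum of best bitscores."""
    return sum(score for score, _ in best_hits[group].values())

def baa(group, protein):
    """Bits per aligned amino acid for one protein (0 if no alignment found)."""
    score, length = best_hits[group][protein]
    return score / length if length else 0.0

for group in best_hits:
    print(group, total_bits(group), round(baa(group, "P1"), 3))
```

Comparing the distribution of per-protein baa values between two groups (e.g. as density curves) is what makes the near-identity of the cnidaria and cephalopoda curves in Figures 4 and 5 so striking.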
gpuccio's thorough research work is better than many papers I've seen lately -- and I've seen quite a few of them these days (just look at the thread "A third way of evolution"). Nevertheless, I think I understand his decision to provide serious scientific arguments in an ongoing public debate that sometimes lacks scientific rigor.

Dionisio
March 24, 2017 at 02:13 PM PST
gpuccio @61: Excellent points and critique. -----
Sometimes darwinists seem to forget that not any function is good in a cell environment. Many “functions” are simply useless, or detrimental.
This is such a crucial and fundamental point for any complex, functional system. It would be nearly impossible to overemphasize this point. We cannot simply throw functions into a complex, functional system and get advantageous outcomes. The Darwinian paradigm has had such a hard time coming up with decent examples of additional function. The problem, as has often been pointed out, is due to the lack of resources and the massive search space. It is a daunting task. But even if by chance a purely natural process stumbles upon a new "function", it still has to be integrated into the cellular machinery, instructions have to be appropriately included in the repertoire and faithfully passed on to the next generation, and on and on. This is an engineering problem.

And if we step back for a moment from the weeds, like Szostak's paper, to look at the big picture, we can see that this is a massive conceptual problem for the evolutionary paradigm at large. One simply cannot go around randomly introducing new "functions" into a complex, functional system like a cell and expect anything good to come of it. The evolutionary construct is unworkable, not just at the detailed level of a protein here or a protein there. It is fundamentally unworkable across the whole spectrum of changes that allegedly would have had to occur across biology and across time. The entire edifice is built upon an incredibly naive and simplistic view of how complex, functional systems operate.

Eric Anderson
March 24, 2017 at 12:37 PM PST
