What are the limits of Random Variation? A simple evaluation of the probabilistic resources of our biological world

_{gpuccio
October 31, 2017

Intelligent Design

10}_{Categories
Intelligent Design}

Share: Facebook; Twitter/X; LinkedIn; Flipboard; Print; Email

Coming from a long and detailed discussion about the limits of Natural Selection, here:

What are the limits of Natural Selection? An interesting open discussion with Gordon Davisson

I realized that some attention could be given to the other great protagonist of the neo-darwinian algorithm: Random Variation (RV).

For the sake of clarity, as usual, I will try to give explicit definitions in advance.

Let’s call RV event any random event that, in the course of Natural History, acts on an existing organism at the genetic level, so that the genome of that individual organism changes in its descendants.

That’s more or less the same as the neo-darwinian concept of descent with modifications.

A few important clarifications:

a) I use the term variation instead of mutation because I want to include in the definition all possible kinds of variation, not only single point mutations.

b) Random here means essentially that the mechanisms that cause the variation are in no way related to function, whatever it is: IOWs, the function that may arise or not arise as a result of the variation is in no way related to the mechanism that effects the change, but only to the specific configuration which arises randomly from that mechanism.

In all the present discussion we will not consider how NS can change the RV scenario: I have discussed that in great detail in the quoted previous thread, and those who are interested in that aspect can refer to it. In brief, I will remind here that NS does not act on the sequences themselves (IOWs the functional information), but, if and when and in the measure that it can act, it acts by modifyng the probabilistic resources.

So, an important concept is that:

All new functional information that may arise by the neo-darwinian mechanism is the result of RV.

Examining the Summers paper about chloroquine resistance:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4035986/

I have argued in the old thread that the whole process of generation of the resistance in natural strains can be divided into two steps:

a) The appearance of an initial new state which confers the initial resistance. In our example, that corresponds to the appearance of one of two possible resistant states, both of which require two neutral mutations. IOWs, this initial step is the result of mere RV, and NS has no role in that. Of course, the initial resistant state, once reached, can be selected. We have also seen that the initial state of two mutations is probably the critical step in the whole process, in terms of time required.

b) From that point on, a few individual steps of one single mutation, each of them conferring greater resistance, can optimize the function rather easily.

Now, point a) is exactly what we are discussing in this new thread.

So, what are the realistic powers of mere RV in the biological world, in terms of functional information? What can it really achieve?

Another way to ask the same question is: how functionally complex can the initial state that for the first time implements a new function be, arising from mere RV?

And now, let’s define the probabilistic resources.

Let’s call probabilistic resources, in a system where random events take place, the total number of different states that can be reached by RV events in a certain window of time.

In a system where two dies are tossed each minute, and the numbers deriving from each toss are the states we are interested in, the probabilistic resources of the system in one day amount to 1440 states.

The greater the probabilstic resources, the easier it is to find some specific state, which has some specific probability to be found in one random attempt.

So, what are the states generated by RV? They are, very simply, all different genomes that arise in any individual of any species by RV events, or if you prefer by descent with modification.

Please note that we are referring here to heritable variation only, we are not interested to somatic genetic variation, which is not transmitted to descendants.

So, what are the probabilistic resources in our biological world? How can they be estimated?

I will use here a top-down method. So, I will not rely on empirical data like those from Summers or Behe or others, but only on what is known about the biological world and natural history.

The biological probabilstic resources derive from reproduction: each reproduction event is a new state reached, if its genetic information is different from the previous state. So, the total numbet of states reached in a system in a certain window of time is simply the total number of reproduction events where the genetic information changes. IOWs, where some RV event takes place.

Those resources depend essentially on three main components:

The population size
The number of reproductions of each individual (the reproduction rate) in a certain time
The time window

So, I have tried to compute the total probabilistic resources (total number of different states) for some different biological populations, in different time windows, appropriate for the specific population (IOWs, for each population, from the approximate time of its appearance up to now). As usual, I have expressed the final results in bits (log2 of the total number).

Here are the results:

Population	Size	Reproduction rate (per day)	Mutation rate	Time window	Time (in days)	Number of states	Bits	+ 5 sigma	Specific AAs
Bacteria	5.00E+30	24	0.003	4 billion years	1.46E+12	5.26E+41	138.6	160.3	37
Fungi	1.00E+27	24	0.003	2 billion years	7.3E+11	5.26E+37	125.3	147.0	34
Insects	1.00E+19	0.2	0.06	500 million years	1.825E+11	2.19E+28	94.1	115.8	27
Fish	4E+12	0.1	5	400 million years	1.46E+11	2.92E+23	78.0	99.7	23
Hominidae	5.00E+09	0.000136986	100	15 million years	5.48E+09	3.75E+17	58.4	80.1	19

The mutation rate is expressed as mutations per genome per reproduction.

This is only a tentative estimate, and of course a gross one. I have tried to get the best reasonable values from the sources I could find, but of course many values could be somewhat different, and sometimes it was really difficult to find any good reference, and I just had to make an educated guess. Of course, I will be happy to acknowledge any suggestion or correction based on good sources.

But, even if we consider all those uncertainties, I would say that these numbers do tell us something very interesting.

First of all, the highest probabilistic resources are found in bacteria, as expected: this is due mainly to the huge population size and high reproduction rate. The number for fungi are almost comparable, although significantly lower.

So, the first important conclusion is that, in these two basic classes of organisms, the probabilistic resources, with this hugely optimistic estimate, are still under 140 bits.

The penultimate column just adds 21.7 bits (the margin for 5 sigma safety for inferences about fundamental issues in physics). What does that mean?

It means, for example, that any sequence with 160 bits of functional information is, by far, beyond any reasonable probability of being the result of RV in the system of all bacteria in 4 billion years of natural history, even with the most optimistic assumptions.

The last column gives the number of specific AAs that corrispond to the bit value in the penultimate column (based on a maximum information value of 4.32 bits per AA).

For bacteria, that corresponds to 37 specific AAs.

IOWs, a sequence of 37 specific AAs is already well beyond the probabilistic resources of the whole population of bacteria in the whole world reproducing for 4 billion years!

For fungi, 147 bits and 34 AAs are the upper limit.

Of course, values become lower for the other classes. Insects still perform reasonably well, with 116 bits and 27 AAs. Fish and Hominidae have even lower values.

We can notice that Hominidae gain something in the mutation rate, which as known is higher, and that I have considered here at 100 new mutations per genome per reproduction (a reasonable estimate for homo sapiens). Moreover, I have considered here a very generous population of 5 billion individuals, again taking a recent value for homo sapiens. These are not realistic choices, but again generous ones, just to make my darwinist friends happy.

Another consideration: I have given here total populations (or at least generous estimates for them), and not effective population sizes. Again, the idea is to give the highest chances to the neo-darwinian algorithm.

So, these are very simple numbers, and they should give an idea of what I would call the upper threshold of what mere RV can do, estimated by a top down reasoning, and with extremely generous assumptions.

Another important conclusion is the following:

All the components of the probabilistic resources have a linear relationship with the total number of states.

That is true for population size, for reproduction rate, mutation rate and time.

For example, everyone can see that the different time windows, ranging from 4 billion years to 15 million years, which seems a very big difference, correspond to only 3 orders of magnitude in the total number of states. Indeed, the highest variations are probably in population size.

However, the complexity of a sequence, in terms of necessary AA sites, has an exponential relationship with the functional information in bits: a range from 19 to 37 AAs (only 18 AAs) corresponds to a range of 24 orders of magnitude in the distribution of probabilistic resources.

Can I remind here briefly, without any further comments, that in my OP here:

The amazing level of engineering in the transition to the vertebrate proteome: a global analysis

I have analyzed the informational jump in human conserved information at the apperance of vertebrates? One important result is that 10% of all human proteins (about 2000) have an information jump from pre-vertebrates to vertenrates of at least (about) 500 bits (corresponding to about 116 AAs)!

Now, some important final considerations:

I am making no special inferences here, and I am drawing no special conclusions. I don’t think it is really necessary. The numbers speak for themselves.
I will be happy of any suggestion, correction, or comment. Especially if based on facts or reasonable arguments. The discussion is open.
Again, this is about mere RV. This is about the neutral case. NS has nothing to do with these numbers.
For those interested in a discussion about the possible role of NS, I can suggest the thread linked at the beginning of this OP.
I will be happy to answer any question about NS too, of course, but I would be even more happy if someone tried to answer my two questions challenge, given at post #103 of the other thread, and that nobody has answered yet. I paste it here for the convenience of all:

Will anyone on the other side answer the following two simple questions?

1) Is there any conceptual reason why we should believe that complex protein functions can be deconstructed into simpler, naturally selectable steps? That such a ladder exists, in general, or even in specific cases?

2) Is there any evidence from facts that supports the hypothesis that complex protein functions can be deconstructed into simpler, naturally selectable steps? That such a ladder exists, in general, or even in specific cases?

Comments

Nonlin.org: I have answered Anaxagoras at #5. I do not accept Darwinist false assumptions. I accept those assumptions that are supported by facts and have scientific credibility. Most of Darwinist assumptions do not have those requisites, and that's why I reject them. But I completely accept the idea of sequence conservation by negative natural selection, for example, because it is completely supported by facts. Are you suggesting that we should reject any good idea only because Darwinists share it? Moreover, I can agree that DNA is "overrated". But that is no reason to deny what it really does. Are you suggesting that protein coding genes do not bear the information for their proteins? Are you suggesting that the sequence of a proteins is not important for its function? Are you suggesting that the conservation of a sequence is not an indicator of functional complexity? Or are you just suggesting that RV does not occur at all? Could you specify what are the "Darwinist false assumptions" that I would accept? Just to understand.gpuccio_{March 20, 2018
March
03
Mar
20
20
2018
01:17 AM
1
01
17
AM
PDT}

Gpuccio: Anaxagoras @1 is right (didn't read all). The analysis might be fine, but accepts Darwinist false assumptions. I would add it's too geeky and these kind of messages never win. Contrast this with the "selfish gene" soundbite. Yes it's wrong, but effective. Btw, DNA is overrated: http://nonlin.org/dna-not-essence-of-life/Nonlin.org_{March 2, 2018
March
03
Mar
2
02
2018
10:00 AM
10
10
00
AM
PDT}

ET, You may have scared the polite dissenter away. :)Dionisio_{November 22, 2017
November
11
Nov
22
22
2017
10:11 AM
10
10
11
AM
PDT}

Corey:
We’ve both made mistakes, but whereas mine are in glossing over and maybe embellishing things a little too much, your mistakes are egregious misunderstandings of the undelying science and biology.
Except the fact that YOU don't have any science nor biology to support the claim that blind and mindless processes can produce proteins 50AA's long or longer. So it looks like you lose because of that failure.ET_{November 21, 2017
November
11
Nov
21
21
2017
02:46 AM
2
02
46
AM
PDT}

Critical residues in an SH3 domain from Sem-5 suggest a mechanism for proline-rich peptide recognition. Lim WA1, Richards FM. That's a 1994 paper. Here's a 2016 paper referring to the same domain: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0145872Dionisio_{November 21, 2017
November
11
Nov
21
21
2017
02:38 AM
2
02
38
AM
PDT}

gpuccio, It seems like your politely dissenting interlocutor should have spent more time reading your comments carefully before coming back? Thus all the barking up the wrong trees could have been avoided? Well, at least the loud barking attracted more readers back to this thread. :)Dionisio_{November 21, 2017
November
11
Nov
21
21
2017
01:11 AM
1
01
11
AM
PDT}

Corey Delvine: you say:
Scientists: “Here we show that a small beta-sheet protein, the SH3 domain” Gpuccio: “It’s a domain, not a protein” Seems like a bunch of scientists don’t have an issue with using the word protein here, but Gpucc doesn’t like it. Hmm who do we listen to?
Maybe the serious authors of the study they quote, who very correctly say:
Critical residues in an SH3 domain from Sem-5 suggest a mechanism for proline-rich peptide recognition. Lim WA1, Richards FM. Abstract Src homology 3 (SH3) domains bind specific proline-rich peptide motifs. To identify interactions involved in peptide recognition, we have mutated residues on the putative binding surface of an SH3 domain from the Caenorhabditis elegans protein Sem-5. Among the most critical positions are three adjacent aromatic residues, which appear to participate in highly stereospecific packing interactions with the ligand. The co-planar arrangement of two of these residues closely matches the periodicity of a poly-proline II (PPII) helix. Thus, a model for recognition has the peptide adopting a PPII helix, with the pyrrolidine rings on one helical face interlocking with the aromatic SH3 residues.
(Emphasis mine) This is the correct way to say things. It's not my fault if the authors of your beloved paper are sloppy and imprecise in their wording (and not only in that). You say:
Gpuccio: “Unfortunately, this step of “random splicing” is not detailed in the Methods section” Wrong, it’s called PCR splicing
It is not detailed. It's simply mentioned. There is not even a reference. Again, sloppy. What's wrong in my statement? You say:
Gpuccio: “Therefore, their two “functional” results are probably the only functional sequences in that search space.” Pucci, you are completely and utterly wrong. How can you say those are the only two functional sequences, when they’ve put numerous other functional sequences right in you face? Oh right, because it’s you. Scientists: “The biophysical properites of a number of functional variants were assesed…studies indicated that each of the variants was folded and stable” And the thermodynamics/peptide binding results show that these variants are functional. Scientists: “Simplification was successful at 38 of the 40 positions varied” Scientists: “The protein scaffold that supports the binding site is 95% IKEAG”
"when they’ve put numerous other functional sequences right in you face?" But those are of course the partially simplified sequences! The statement you quote: “The biophysical properites of a number of functional variants were assesed…studies indicated that each of the variants was folded and stable” is about the partially simplified sequences! The fully simplified sequences are two, and only two. And all the conclusions of the study are about the two sully simplified sequences, not about the partially simplified ones. Again, you mystify, probably intentionally. I have already discussed the last two points in my previous post. You say:
Gpuccio: “But, you will say, in two cases they did it.” The fact that they end up with just these two variants is a product of experimental limitations, not biological limitations. You try all the cloning, PCRing, digestion/ligations, transforming, bead binding/eluting, colony screening and let me know what you end up with.
Ah, then you understand that they "end up with just two variants". So, the mystification was intentional, and not due to ignorance. But my point is very explicit. They used a library of about 10^7 sequences for each third of the molecule. Now, we agree I hope that the successful simplifications were in about 50% of the molecule. That mean 28 residues. So, each third which was simplified was about 9 residues. The search space for 9 residues in a 5 letters alphabet is: 5^9 = about 10^6 So, they explored all the search space. And they found only two final sequences. Which was my point.
Gpuccio: “So, what is it? 5 or 10-12?” That depends what you are trying to do. Demonstrate amino acid tolerances in simple proteins, or reproduce folding of a large number of today’s known protein families? The two are very different.
No. What everybody is trying to do is showing how many AAs could be present in the original alphabet at OOL. That is very clear in the first statement of your paper, in the abstract: "Early protein synthesis is thought to have involved a reduced amino acid alphabet. What is the minimum number of amino acids that would have been needed to encode complex protein folds similar to those found in nature today?" So, they are all debating the same thing.
Gpuccio: “With the available evidence, I would say neither!” Of course, because when you have two papers sitting in front of you that explicitly demonstrate function with 5 or 10-12 amino acid alphabets Gpuccio will simply deny their existence.
After having argued in great detail why I think that way, and why the paper you presented as evidence is evidence of nothing relevant to the discussion. You say:
Gpuccio: “wt SH3: 7.5 micromoles” M is molar not moles. =)
Well, at least you put a smiley :) OK, I should have written micromoles/liter. So, one big mistake on my part! :) Let's close with part of your final statement: "We’ve both made mistakes" Yes, we are mistake buddies! :)gpuccio_{November 21, 2017
November
11
Nov
21
21
2017
12:31 AM
12
12
31
AM
PDT}

Corey Delvine: Hey, I thought we had lost you, instead you have done a lot of work! That's good, i really appreciate it. :) And you even admit that you have made some mistakes, even if of course implying me in much greater ones! "We’ve both made mistakes, but whereas mine are in glossing over and maybe embellishing things a little too much, your mistakes are egregious misunderstandings of the underlying science and biology." That's progress, I would say. So, let's see you arguments in detail, and what my huge mistakes are, even if it means to spend further time and attention with a paper that certainly does not deserve it! :) I said: "“residues not directly implied in the function of the domain. IOWs, the most important residues from the point of view of function. 12 AAs (21% of the molecule)" You comment:
So, let me get this straight, you are admitting that 79% is not essential to function and tolerate substitutions to some degree (many of them to apparently a large degree)? Doesn’t this fly in the face of your “functional sequence space is a tiny bit of the search space” belief?
Excuse me, where did I say such a thing? I said that those 12 AAs are "the most important residues from the point of view of function." I never said that the others "are not is not essential to function and tolerate substitutions to some degree". You are inventing things and putting word in my mouth. OK, that is not the first time.
Ah yes, those scientists, always trying to deceive us!
Not "those scientists". Some scientists. And not "always". Sometimes. This is one of those times. I said: "But for the moment, let’s acknowledge this simple truth: the simplification failed in 5 residues where it was attempted. But in the Figure I can find only 4 of them. Residues 2,16,17,42 The fifth residue in black, the S at 50, is indeed one of the 12 AAs mentioned before, So, there must be some error in the paper. OK, not important." You comment:
Wrong, there are 40 IKEAG and 5 non-IKEAG in FP2. The S->A was caused by the PCR splicing which can cause mutation, check table 1.
Excuse me, in Fig. 2b it says: "The colour scheme is as in Fig. 1". In Fig 1 it says: "black, residues which did not tolerate simplification". So, that means that in Fig 2, too, black residues are those which "did not tolerate simplification". But in Fig. 2, in FP2, I can count only 5 black residues, number 2, 16, 17, 42 and 50. Can you confirm, or are my old eyes deceiving me? So, according to the legend, those 5 residues in FP2, and only them, must be residues which "did not tolerate simplification". But the S in residue 50, in Fig 1, is shown in blue, "residues where simplification was not attempted". So, there is an error. Not important, as I said. But an error just the same. What's your problem? You say: "The S->A was caused by the PCR splicing which can cause mutation, check table 1" OK, and what has that to do with what I am saying? The error in in Fig 2, in the color code attributed to the 50th residue. I said: "“So, it would seem that 40 (or 41) of the residues have been “simplified”, isn’t it”" You comment:
But that is not what it says. Nobody but you is to blame for your own misunderstandings or miscomprehension. The study was about building a protein from a “simplified amino acid alphabet,” which is largely what they did.
There is no miscomprehension here. One thing is to say that the FP2 protein contains 40 IKEAG, a thing which is true and that I have never denied. Another thing is saying: “In the more simplified variant, FP2” (IOWs their best result) “40 of the 45 residues at which simplification was attempted are I,K,E,A or G.” which is at best ambiguous and decieving, as I have said. Another thing is saying: “For example, in the first third of the protein (Fig. 1a), 16 of the 19 residues not involved in binding were converted to I,K,E,A, or G in the most simplified sequences.” which is simply false. About this last point, you say:
Gpuccio: “If you look at Fig. 1a, you will see that 3 of those 16 residues 4,21 and 22, have never been converted at all” Wrong. There are 19 residues they attempted to change in 1a. Of those 19, I repeat:19, 3 were not changed (4,21,22) to get the 16/19 (not 13/16 as you said).
Let's see: "There are 19 residues they attempted to change in 1a." Correct. "Of those 19, I repeat:19, 3 were not changed (4,21,22)" Yes. "to get the 16/19 (not 13/16 as you said)." No. You are confused. 19 residues: those they attempted to change. 3 residues: those which did not change, because they are the same as in the WT ((4,21,22) + the residues where simplification was not tolerated (those in black), whose number is different in each sequence. In the most simplified sequences they are 3, in the others they are more. The most simplified sequences are number 5, 8, 9, 12, 14, 15, 18. In each of them there are 3 black residues + at least 3 residues which have not changed. Take number 5, for example: 3 black residues: 2, 16, 17 3 residues which have not changed in any sequence: 4, 21, 22 + residues which have not changed in this particular sequence: 9, 19, 20 In each of the most simplified sequences, there are at least 6 residues that have not been converted to IKEAG (the 3 black ones + the 3 which have never changed). But the authors state: “For example, in the first third of the protein (Fig. 1a), 16 of the 19 residues not involved in binding were converted to I,K,E,A, or G in the most simplified sequences.” But that is not true. 19 - 6 = 13, not 16 And yet, you support their statement. Why? you say:
“He should have noticed that in 12 important AAs no simplification was even attempted, because that is clearly stated by the authors. So, he’s responsible for that error.” Wrong, the paper mentions that they were able to change half of those 12 amino acids individually to alanine and not effect on function. You conveniently left that part out.
Wrong. The authors simply say: "For example, of the 12 positions held fixed in this study, half were shown to tolerate alanine substitutions in the Sem5 SH3 domain; judging from the effects on expression levels, only one of these mutants appeared to have significantly decreased stability 9." They are only quoting another paper. It's not something they did at all. And yet you say: "the paper mentions that they were able to change half of those 12 amino acids individually to alanine and not effect on function" Wrong. The paper they quote is the following: "Critical residues in an SH3 domain from Sem-5 suggest a mechanism for proline-rich peptide recognition." by Lim WA1, Richards FM. Another paper, other authors, other aims of the study, different SH3 domain (SH3 domain from the Caenorhabditis elegans protein Sem-5). So, it's not true that "they were able to change half of those 12 amino acids individually to alanine and not effect on function", as you say. In their study, the 12 AAs were simply "held fixed", as clearly stated by the authors. So, I did not left out any part of their study, and what you say is simply wrong. More in next post.gpuccio_{November 20, 2017
November
11
Nov
20
20
2017
11:59 PM
11
11
59
PM
PDT}

Gpuccio: "residues not directly implied in the function of the domain. IOWs, the most important residues from the point of view of function." 12 AAs (21% of the molecule) So, let me get this straight, you are admitting that 79% is not essential to function and tolerate substitutions to some degree (many of them to apparently a large degree)? Doesn't this fly in the face of your "functional sequence space is a tiny bit of the search space" belief? "subtle deception." Ah yes, those scientists, always trying to deceive us! "The fifth residue in black, the S at 50, is indeed one of the 12 AAs mentioned before, So, there must be some error in the paper. OK, not important." Wrong, there are 40 IKEAG and 5 non-IKEAG in FP2. The S->A was caused by the PCR splicing which can cause mutation, check table 1. "So, it would seem that 40 (or 41) of the residues have been “simplified”, isn’t it" But that is not what it says. Nobody but you is to blame for your own misunderstandings or miscomprehension. The study was about building a protein from a "simplified amino acid alphabet," which is largely what they did. "He should have noticed that in 12 important AAs no simplification was even attempted, because that is clearly stated by the authors. So, he’s responsible for that error." Wrong, the paper mentions that they were able to change half of those 12 amino acids individually to alanine and not effect on function. You conveniently left that part out. Scientists: "Here we show that a small beta-sheet protein, the SH3 domain" Gpuccio: "It's a domain, not a protein" Seems like a bunch of scientists don't have an issue with using the word protein here, but Gpucc doesn't like it. Hmm who do we listen to? Gpuccio: "If you look at Fig. 1a, you will see that 3 of those 16 residues 4,21 and 22, have never been converted at all" Wrong. There are 19 residues they attempted to change in 1a. Of those 19, I repeat:19, 3 were not changed (4,21,22) to get the 16/19 (not 13/16 as you said). Gpuccio: "Unfortunately, this step of “random splicing” is not detailed in the Methods section" Wrong, it's called PCR splicing Gpuccio: "Therefore, their two “functional” results are probably the only functional sequences in that search space." Pucci, you are completely and utterly wrong. How can you say those are the only two functional sequences, when they've put numerous other functional sequences right in you face? Oh right, because it's you. Scientists: "The biophysical properites of a number of functional variants were assesed...studies indicated that each of the variants was folded and stable" And the thermodynamics/peptide binding results show that these variants are functional. Scientists: "Simplification was successful at 38 of the 40 positions varied" Scientists: "The protein scaffold that supports the binding site is 95% IKEAG" Gpuccio: "But, you will say, in two cases they did it." The fact that they end up with just these two variants is a product of experimental limitations, not biological limitations. You try all the cloning, PCRing, digestion/ligations, transforming, bead binding/eluting, colony screening and let me know what you end up with. Gpuccio: "So, what is it? 5 or 10-12?" That depends what you are trying to do. Demonstrate amino acid tolerances in simple proteins, or reproduce folding of a large number of today's known protein families? The two are very different. Gpuccio: "With the available evidence, I would say neither!" Of course, because when you have two papers sitting in front of you that explicitly demonstrate function with 5 or 10-12 amino acid alphabets Gpuccio will simply deny their existence. Gpuccio: "wt SH3: 7.5 micromoles" M is molar not moles. =) We've both made mistakes, but whereas mine are in glossing over and maybe embellishing things a little too much, your mistakes are egregious misunderstandings of the undelying science and biology.Corey Delvine_{November 20, 2017
November
11
Nov
20
20
2017
07:52 PM
7
07
52
PM
PDT}

forexhr: "So, instead of you spending your time and mental energy in writing extensive posts, let evolutionists prove that evolutionary processes can leave the current structural landscape (of an organ for e.g.) and climb another one." Frankly, I have no real confidence in evolutionists proving anything that could be in favor of ID. Maybe I am a pessimist, or a cynic! :) So, I will probably go on spending my time and mental energy. I know very well the Keefe and Szostak paper. It's probably the paper I have commented more frequently upon. It is essentially a paper about protein engineering, and the conclusions that evolutionists think can be drawn from it are not the conclusions that should be drawn by any thinking person. :) I have discussed it even in this thread, but for a more detailed discussion about it look at my previous OP here: https://uncommondescent.com/intelligent-design/what-are-the-limits-of-natural-selection-an-interesting-open-discussion-with-gordon-davisson/ comments #62, 229, 237, 238, 263, 277, 284, 301, 303, 320. As you can see, I have spent a lot of my time and mental energy dealing with that paper. "But, given the high level of mutational neutrality, where the tons of mutations in the gene might not alter the structure it codes for," OK, but that is true only for mutations that happen in functional coding sequences. But, if mutations happen in non functional, non coding sequences (such as duplicated and inactivated genes, or just non functional non coding DNA sequences), then any mutation is neutral, and can at the same time go in any direction, towards any possible state. That's why the evaluation of probabilistic resources fro that type of random walk is so important.gpuccio_{November 19, 2017
November
11
Nov
19
19
2017
08:17 AM
8
08
17
AM
PDT}

gpuccio: Why don't we completely reverse the story of functional information and let the evolutionists eat their own numbers. Here is what I mean. Evolution theory is based on the fundamental premise that genes which code for new structures that provide new biological functions, arise through duplication and modification of pre-existing genes. But, given the high level of mutational neutrality, where the tons of mutations in the gene might not alter the structure it codes for, even if all the mutations in the history of life(10^43) are spent this might be insufficient to alter the underlying structure which provides some biological function. For example, lets look at this paper: Functional Proteins from a random sequence library(1), which comes up with an estimate of 10^91 different structures having ATP binding function. Such an enormous structural landscape clearly shows that even with all evolutionary mutations spent, the evolutionary process is stuck at current structural landscape and it cannot proceed towards new structures, let alone specific or adaptive structures which are beneficial in the environment where the population currently exists. This also explains the observation of evolutionary stasis. If we add to that a mutation rate of about 10^-8 mutations/bp/generation, where a 100,000 mutations must be spent just to produce one mutation in a specific 1000 bp DNA region(where some new gene 'evolves'), it is more than obvious that the evolution theory is a complete hoax. So, instead of you spending your time and mental energy in writing extensive posts, let evolutionists prove that evolutionary processes can leave the current structural landscape (of an organ for e.g.) and climb another one. (1) https://www.researchgate.net/publication/12045894_Functional_proteins_from_a_random-sequence_libraryforexhr_{November 19, 2017
November
11
Nov
19
19
2017
06:02 AM
6
06
02
AM
PDT}

ENOUGH- Back on the boat, both of you! :) There are more trolls flailing about.ET_{November 16, 2017
November
11
Nov
16
16
2017
10:31 AM
10
10
31
AM
PDT}

To all: So, to sum up: There are different variants of the SH3 domain, often rather distant at sequence level, which probably share the same basic folding and structure. However, while a low level homology seems enough to implement that basci similarity, individual variants of the domain have extreme conservation of their sequence through very long evolutionary times, which definitely points to their functional nature (whatever Corey can say). IOWs, we have two level of functional information in the domain: a basic level, shared by all variants, which is implied in the basic folding, and an additional level which serves to "fine tune" the function in each protein scenario. So, while the function of the domain can be described, in general, as "binding some ligand", usually, but not always, a leucine rich peptide with some specific AA sequence, the specific nature of the ligand, of the interaction with it, and of the interaction with the host protein can vary a lot from protein to protein, and therefore from domain variant to domain variant. That "fine tuning" of the generic function seems to require more functional information than the basic folding itself. And the amazing evolutionary conservation of that specific sequence information in the same domain variant contrasts strikingly with the low level homology between different variants, even in the same organism. OK, that said, let's go back to the Riddle/Baker paper, and to Corey's initial statement about it, which was the beginning of everything. Let's read it again: “experiments have swapped amino acids in proteins, heck they’ve even stripped all 20 amino acids away and rebuilt proteins using only 4 amino acids and the protein was still functional” Now, we already know that there is here a basic confusion between the protein and the protein domain. In the light of what we have said about the function, that confusion becomes extremely important. Indeed, we must distinguish here between two different functions: a) The function of the domain, which we have already discussed, and which is similar in all domains at a gross level (binding some ligand), but finely individualized in each domain variant. b) The function of the host protein, which is completely different from case to case, but which of course depends in some measure on the function of the domain. So, a first important question is: what does the paper say about the function of the protein? And the answer is: absolutely nothing. The authors do not even mention it. They are only interested in the function of the domain, as we will see. So, Corey's statement that "the protein was still functional" (after the simplification), is completely wrong if we consider it as regarding the protein. The authors have not verified in any way if the protein hosting the domain could remain functional if a simplified domain is substituted to the wildtype domain. We have no information about that. And yet, that is the really important point, as we will see. However, let's assume that Corey can admit his basic error (he has half done that) and so reformulate the statement as: "and the domain was still functional". Can we accept that? No. Not even that is supported by the paper. However, there is some work in the paper about the domain function. Let's see. They have tested the function of the domain in a very specific way: as binding affinity to a specific peptide, in particular the RALPPLPRY sequence that was used for the artificial selection. So, the two final sequences were able to bind that proline rich sequence. Like the wildtype. They have even evaluated the folding of the different sequences, and they state that FP2 "refolds at almost exactly the same rate as the WT, while FP1 refolds even faster". They even state that this result proves that "the sequence of the src SH3 domain has not been highly optimized by natural selection for rapid folding". A rather heavy statement, IMO. Are they sure that rapid folding is what is needed for the function? However, let's take that for good, for the moment: their two sequences fold as well as, or even better than, the WT (at least in terms of rapidity!). However, there are differences, as we can see in Tab. 2, where the first three columns of data are relative to folding. The fourth column, instead, is about the affinity for the specific ligand used in the experiment. And there are differences here too. Let's not consider S1, S2 and S3, which are the partially simplified variants. Let's go directly to FP1 and FP2, the final variants. To help understand the fourth column, I will remind that it contains the values of the equilibrium dissociation constant (KD) of each sequence in relation to the above mentioned ligand. I paste here a brief explanation of what that means, taken form the following web site: https://www.malvern.com/en/products/measurement-type/binding-affinity
Binding affinity is the strength of the binding interaction between a single biomolecule (e.g. protein or DNA) to its ligand/binding partner (e.g. drug or inhibitor). Binding affinity is typically measured and reported by the equilibrium dissociation constant (KD), which is used to evaluate and rank order strengths of bimolecular interactions. The smaller the KD value, the greater the binding affinity of the ligand for its target.
Emphasis mine. So, what are the Kd values in the Table? wt SH3: 7.5 micromoles FP1: 150 micromoles FP2: 38 micromoles Remember: The smaller the KD value, the greater the binding affinity of the ligand for its target. So, we can see that both final sequences have a significantly lower affinity for the substrate than the WT. Now, of course, we don't know what the best affinity is for the domain function. Even a higher affinity could be deleterious. But we know that the WT is functional. While we have no evidence that the simplified versions are functional. In the context of the whole protein. So, can a significant difference in affinity to a specific substrate, in particular a lower affinity for it, be deleterious to that final function? Of course it can. So, have we any evidence that the simplified versions can really be functional in a real protein, in a real biological context? Of course no. We only know that the two simplified sequences fold almost as rapidly or more rapidly than the WT (which could already be a problem), and that both sequences have lower affinity for one specific substrate (which could almost certainly be a problem). But there is more. Do we know that the chosen substrate represents well what the domain and the protein do in real life? No, we have no evidence for that. Some other ligand, even slightly different, could have some basic importance in the function of the domain and of the protein. And the differences in affinity for those other ligands could be greater than the differences for the chosen ligand, which after all is the ligand for which the final sequences were selected, and therefore the one with which they should reasonably perform better! So, what remains of Corey's initial statement that: “experiments have swapped amino acids in proteins, heck they’ve even stripped all 20 amino acids away and rebuilt proteins using only 4 amino acids and the protein was still functional” Practically nothing! The experiment was about a short domain, and not a protein. The simplification was done with 5 AAs, and not 4. The simplification was effected in less that 50% of the sequence, and not in all of it. We know nothing about the functionality of the protein, with the simplified sequences. We know very little about the functionality of the domain with the simplified sequences: that it folds rapidly, and that it retains affinity for the specific ligand with which the artificial selection was made, but at a definitely lower level. We know nothing about other possible ligands, or about the interaction of the domain with its host protein. So, let's say, graciously, that Corey's statement was, at best, preposterous. But we have learnt a lot of interesting things in understanding why. :)gpuccio_{November 16, 2017
November
11
Nov
16
16
2017
10:10 AM
10
10
10
AM
PDT}

The clarification about the text alignment issue helps. Thanks. I'm still digesting the whole explanation, very technical as usual, but so far it seems clear.Dionisio_{November 16, 2017
November
11
Nov
16
16
2017
06:14 AM
6
06
14
AM
PDT}

To all: I am sorry for the bad formatting of the alignments. The middle line seems not to stay where it should! OK, I hope the meaning is clear just the same. :)gpuccio_{November 16, 2017
November
11
Nov
16
16
2017
05:41 AM
5
05
41
AM
PDT}

To all: Now I would like to address thw problem of function in relation to the SH3 domain. To do that, I will introduce another interesting paper: "SH3-like Fold Proteins are Structurally Conserved and Functionally Divergent" http://www.eurekaselect.com/79432/article The abstract says:
The folding space for all the protein sequences is limited. Therefore it was observed that many proteins, whose sequences are not related, have similar fold characteristics. The fold databases like SCOP and CATH have classified various protein folds. However, in-depth analysis of the functional features of these folds was not done. We analyzed about twenty unique SH3-like folded proteins in their structural environment and functional characteristics. From our analysis it is apparent that the SH3-like folds could carry out various functions by modulation of loops and the functional region is restricted to one side of a particular sheet helped by two or three loops. The functions vary from oligonucleotide-binding to peptide-binding and other ligand binding. Although certain degree of sequence similarity was observed among the SH3-fold proteins, the similarity was restricted to the ?-strand regions of the proteins.
The paper is about the different proteins which include an SH3 domain (called simply, in the article, SH3-fold proteins). I quote: "Here, we review the structural and functional features of many different SH3-fold proteins and our analysis show that the functional region of the fold, to a large extent is conserved to bind to a variety of ligands." Emphasis mine. So, the first important point is: the SH3 domain can bind to a variety of ligands. And: "All the proteins considered in this study were superposed to align them structurally by taking the chicken spectrin SH3 as reference molecule. The proteins considered in this study are listed in Table. I." Now, let's see the SH3 domain in chicken pectrin. Here it is: TGKELVLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVEVNDRQGFVPAAYVKKLDP It is located more or less at the center of the spectrin chain, a 2477 AAs protein (the alpha chain). What is the function of spectrin? Here is what Uniprot says: "Morphologically, spectrin-like proteins appear to be related to spectrin, showing a flexible rod-like structure. They can bind actin but seem to differ in their calmodulin-binding activity. In nonerythroid tissues, spectrins, in association with some other proteins, may play an important role in membrane organization." Now, let's see again the SH3 sequence in the paper we have discussed, the sequence from Proto-oncogene tyrosine-protein kinase Src, P12931, 536 AAs long: TFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGRTGYIPSNYVAPSD In this protein, the domain is placed near the N terminal part of the protein. What is the function of Proto-oncogene tyrosine-protein kinase Src? Here is what Uniprot says: "Non-receptor protein tyrosine kinase which is activated following engagement of many different classes of cellular receptors including immune response receptors, integrins and other adhesion receptors, receptor protein tyrosine kinases, G protein-coupled receptors as well as cytokine receptors. Participates in signaling pathways that control a diverse spectrum of biological activities including gene transcription, immune response, cell adhesion, cell cycle progression, apoptosis, migration, and transformation. Due to functional redundancy between members of the SRC kinase family, identification of the specific role of each SRC kinase is very difficult. " And much more. So, two very different proteins, it seems. But what does the same domain (SH3) do in such different proteins? We can certainly imagine that it binds some ligand. But first of all, are these two SH3 domains the same thing? We know that, as far as we know, all SH3 domains share a similar folding and structure. In this sense, they are certainly similar. But I have blasted the two short sequences. Here is the result: Score: 47.0 bits Expect: 1e-14 Identities: 20/56(36%) Positives: 34/56(60%) Gaps: 2/56(3%) VLALYDYQEKSPREVTMKKGDILTLLNSTNKDWWKVE--VNDRQGFVPAAYVKKLD +ALYDY+ ++ +++ KKG+ L ++N+T DWW R G++P+ YV D FVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGRTGYIPSNYVAPSD Well, that the two sequences are realted there is no doubt: an Expect of 1e-14 is more than enough for that. But they are also so different! Only 20 identities (36% of the aligned part)! OK, I can already heart Corey shouting triumphantly: "I told you! These two sequences have low homology and still they implement the same function!" Maybe. We have examples of that, certainly. But here, are we really sure that they "implement the same function"? We have two important points: a) The SH3 domains binds to a variety of ligands. b) The two proteins where the domain is included have very different sequences and functions. So, what if those two domains, while sharing a gross folding and structure, have some different function, because they: a) Bind different ligands and: b) Interact differently with the whole protein and its specific function? That would certainly explain the difference in sequence. The functionally required difference in sequence. Is it so? Let's see. We learn form the Kishan paper that SH3 can bind both peptides and DNA, although peptide binding is certainly the most common scenario. The short peptide motifs to which the domain binds can vary much. Here is another paper of 2012: "SH3 domain ligand binding: What’s the consensus and where’s the specificity?" http://www.sciencedirect.com/science/article/pii/S0014579312003316
Abstract An increasing number of SH3 domain–ligand interactions continue to be described that involve the conserved peptide-binding surface of SH3, but structurally deviate substantially from canonical docking of consensus motif-containing SH3 ligands. Indeed, it appears that that the relative frequency and importance of these types of interactions may have been underestimated. Instead of atypical, we propose referring to such peptides as type I or II (depending on the binding orientation) non-consensus ligands. Here we discuss the structural basis of non-consensus SH3 ligand binding and the dominant role of the SH3 domain specificity zone in selective target recognition, and review some of the best-characterized examples of such interactions.
So, there can be no doubt about the variety of ligands and interactions of which the SH3 domain is capable. But let's see another important aspect. Let's consider again the two SH3 sequences we have already seen. The question is, as usual: are those sequences evolutionary conserved? To answer that, let's ask for the help of our ussal friends, the sharks: :) 1) Sequence from the spectrin alpha chain. Blast: human - cartilaginous fish Well, the human sequence is identical to the chicken sequence: 60/60 identities. So, we can just look at the homology with sharks: The best hit is wuìith Callorhinchus milii: 60/60 identities! The sequence is identical, after 400+ million years of evolutionary history! 2) Now, the sequence from human Proto-oncogene tyrosine-protein kinase, the one in the Riddle / Baker paper: Again, the best hit is with Callorhincus milii: Bitscore: 110 bits Expect value: 3e-30 Identities: 51/57(89%) Positives: 54/57(94%) Gaps: 0/57(0%) TFVALYDYESRTETDLSFKKGERLQIVNNTEGDWWLAHSLSTGRTGYIPSNYVAPSD TFVALYDYESRT +DLSFKKGERLQIVNNTEGDWWLA SL+TG +GYIPSNYVAPSD TFVALYDYESRTASDLSFKKGERLQIVNNTEGDWWLARSLNTGSSGYIPSNYVAPSD Again, an amazing conservation. But then the question is: How is it that these two versions of SH3 domain, which have so low homology one with the other, and still share the basci folding and structure, are then so conserved in evolutionary history, in the same protein? The obvious answer is: because the additional information is necessary to fine-tune the specific function of the domain in each protein. No more time. More in next post.gpuccio_{November 16, 2017
November
11
Nov
16
16
2017
05:39 AM
5
05
39
AM
PDT}

@257-258 follow-up More related papers referenced @42-45 here: https://uncommondescent.com/evolution/rethinking-biology-what-role-does-physical-structure-play-in-the-development-of-cells/#comment-643574Dionisio_{November 16, 2017
November
11
Nov
16
16
2017
04:59 AM
4
04
59
AM
PDT}

@253: The 2000 paper “Simplified amino acid alphabets for protein fold recognition and implications for folding” is directly cited in at least 158 papers referenced in the following link: https://www.researchgate.net/publication/12540406_Simplified_amino_acid_alphabets_for_protein_fold_recognition_and_implications_for_folding The paper referenced @257 is one of the 158 papers mentioned in the above link.Dionisio_{November 16, 2017
November
11
Nov
16
16
2017
03:39 AM
3
03
39
AM
PDT}

@253: “Simplified amino acid alphabets for protein fold recognition and implications for folding” Cited in: "Basic units of protein structure, folding, and function" Igor N. Berezovsky, Enrico Guarnera, Zejun Zheng Related papers here: http://www.sciencedirect.com/science/article/pii/S0079610716300864 Full text here: https://www.researchgate.net/profile/Igor_Berezovsky/publication/308912133_Basic_units_of_protein_structure_folding_and_function/links/597efbdfaca272d56817fa17/Basic-units-of-protein-structure-folding-and-function.pdfDionisio_{November 16, 2017
November
11
Nov
16
16
2017
02:31 AM
2
02
31
AM
PDT}

gpuccio @254: "Any room left in the boat?" Sorry, but you were not mentioned by your [not so] politely dissenting interlocutor @236. Only ET, Origenes, Mung and I were given such an honor. :) However, since you have provided excellent bait for this fishing adventure, perhaps that qualifies you automatically for a distinguished place in the boat? Also, given that this kind of fish could secrete toxic material, it might help to have a medical doctor on board. Just in case somebody gets in contact with the fish. :)Dionisio_{November 16, 2017
November
11
Nov
16
16
2017
02:17 AM
2
02
17
AM
PDT}

@253: Very interesting comparative analysis of potential protein functionality based on the known 20 AAs set (including both essential and nonessential AAs) vs. a hypothetical 5 AAs set. Very insightful indeed. With at least two interesting papers referenced. The heat is still up in this thread. GP promises more of this interesting analysis ahead. Thanks!Dionisio_{November 16, 2017
November
11
Nov
16
16
2017
02:02 AM
2
02
02
AM
PDT}

Dionisio: "Origenes, ET and Mung, According to the comment @236, I’ve been given the undeserved honor of sharing a boat with you. Can you tell me any details about that boat? Do you plan a trip soon?" Any room left in the boat? :)gpuccio_{November 16, 2017
November
11
Nov
16
16
2017
01:41 AM
1
01
41
AM
PDT}

To all: Now, a few more thoughts about the paper Corey kindly. I have to cover two more big issues: a) How they did it, and what it means. b) The function. So, let's start with the first. Here, Corey can be of some help. In his comment #208, he quickly summarizes the procedure: "If you had actually read the paper instead of just the abstract, you’d see that even for this short amino acid segment, they had to break it up into 3 sections to generate variant libraries." Well, of course I have read the paper, and very carefully. So, I can say that Corey's summary is perfectly correct: they had to break it into 3 sections, and do the search and artificial selection for each of them. IOWs, they simplified each third of the sequence using random libraries, restricted to the 5 chosen AAs, whose individual complexity "averaged 5x10^7" (see Methods). Then, for each third, they artifically selected the sequences that showed function accodring to the method of artificail selection they chose (which I will discuss better when I will deal with the "function" problem: IOWs, they selected the "functional" sequences "by biopanning with proline-rich peptide covered paramagnetic beads". In particular, the beads exhibited the short proline rich peptide RALPPLPRY, which is a ligand for our SH3 domain (see Methods). So, for each of the three subsections, simplified sequences that retained the ability to bind that ligand were selected. Those sequences are shown in Fig. 1 a,b,c. Of course, these are very partial simplifications, because for each simplified third, the remaining two thirds of the molecule have not changed at all. Here again the authors make their subtle deception rather explicit. For example, they state, about the first third: "For example, in the first third of the protein (Fig. 1a), 16 of the 19 residues not involved in binding were converted to I,K,E,A, or G in the most simplified sequences." (emphasis mine) But, as already said, that's absolutely not true. If you look at Fig. 1a, you will see that 3 of those 16 residues 4,21 and 22, have never been converted at all, because in all shown sequences they are absolutely the same as in the wild type. And other residues have conserved the same value as in the wildtype in almost all the variants (for example, 19 and 20). But that's not enough, of course. The authors had to join these partial results to get their final "simplified" sequences. They did that by "randomly splicing together the simplified segments of a number of the partially simplified variants", and then artificially selecting the results, as previously described. Unfortunately, this step of "random splicing" is not detailed in the Methods section. However, we can safely assume that this further step of random search (the random splicing) and artificial selection implement a further exploration of the potential search space. this is an important point. In the end, they found two sequences that could be artificially selected after the whole "simplification" procedure. They are shown in Fig. 2b, and I have already discussed how, in the most "simplified" sequence (FP2), only 28 AAs (less than 50%) have been "simplified" (see my post #238). However, let's take this very limited result for good, for the moment, and try to understand who it was attained. And, of course, make some quantitative assessments. It should be very clear that the procedure used here is, as usual, RV + artificail selection. IOWs, like in Szostak's paper (random phage libraries for variation, beads exhibiting the ligand for selection). But here the RV was limited to 5 AAs, because that is the aim of the study. So, for 28 AAs (those where simplification was achieved) the search space here is: 5^28 = 3.7E+19 = 65 bits How much of that search space was traversed in the RV procedures? It's easy. Each library was about 5x10^7. They used 3 independent libraries, selecting each time the results, before the last recombination step. Therefore they explored the equivalent of a space of about 1.25E+23 + the component linked to the final recombination step. that we cannot evaluate because the size of the random procedure is not detailed. Let's say, conservatively, that they have explored a search space of about 10^25. But, as we said before, the real search space is only of the order of 10^19. Therefore, we can safely assume that they have explored the search space rather completely, by their procedure of RV + artifical selection steps. Therefore, their two "functional" results are probably the only functional sequences in that search space. Well, maybe there are a few others, but certainly not many more. What does that mean? First of all, it means that the two functional sequence that were found for those 28 AAs sequence in the search space of 5 AAs have, as far as we can know, a functional complexity (for the function as defined by the authors) of: 2:3.7E+19 = 5.41E-20 = 64 bits That's very big functional information for such a short sequence, about 2.29 baa. And it is computed top-down! Why is it so? Let's understand. The idea of simplification should, of course, be of making things simpler. So, a simplified alphabet would be really effective if it simplified the functional information. That should work this way: I have an alphabet of 20 AAs, but they are definitely too many. I believe that I can achieve similar functional results using only five of them, because AAs are often of the same kind, and they can be easily substituted without consequences for function. So, let's say that I find a substitution scheme that wroks well enough to preserve function: for example, I will write I every time that an AA in the wildtype is one of maybe 4 explicitly defined AAS, K for another group of maybe 4, and so on. IOWs I should be able to use a translation code from the 20 AAs code to the 5 AAs code. In that case, there would be no need to make all the work that is described in the paper. Once I find the right translation code, I can simply synthesize de novo the correct translated sequence, using only my 5 aas, deriving it directly from the wildtype sequence. But that's not what happened here. Not at all. Because there is no translation code here. There is only a random search helped by many steps of artificial selection, aimed at selection those 5 AAs sequences that retain function, and whose result is two mere sequences. Why? The answer is simple: because it is almost impossible to retain function if we use 5 AAs only. But, you will say, in two cases they did it. True. Because they found the only sequences that can retain the defined function, at some level (more on that in the future discussion about function), because they had some favorable effect from epistasis. In brief, here is a very simple explanation of the concept: "A protein's biological functions emerge from its chemical and physical properties, which in turn are determined by the interactions between its amino acid residues in three?dimensional space. It is therefore not surprising that the functional effect of changing an amino acid often depends on the specific sequence of the protein into which the mutation is introduced. This dependency on genetic context has long been called epistasis by geneticists.1 Epistasis is invoked when the combined effect of two or more mutations deviates from that predicted by adding their individual effects." From: "Epistasis in protein evolution" https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4918427/ So, this is what we observe: there is no translation code from a 20 AAs alphabet to a 5 AAs alphabet, because it is not possible to generate the same functional information that is present in 20 AAs using only 5 AAs. IOWs, reducing the number of AAs quickly degrades functional information. However, in extremely rare cases, substitutions that in general could not be tolerated can still be compatible with some function, because the general epistatic interaction of many different deleterious substitutions can still salvage some function. That is the case in our two "functional" sequences. There would be more to say on this fascinating issue, but I have not the time now, and I still have to deal with the "function" problem! However, we can at least ask ourselves a simple question: does further literature support the idea that we can reduce the number of AAs to 5? No. Take, for example, this paper of 2000: "Simplified amino acid alphabets for protein fold recognition and implications for folding"
Abstract Protein design experiments have shown that the use of specific subsets of amino acids can produce foldable proteins. This prompts the question of whether there is a minimal amino acid alphabet which could be used to fold all proteins. In this work we make an analogy between sequence patterns which produce foldable sequences and those which make it possible to detect structural homologs by aligning sequences, and use it to suggest the possible size of such a reduced alphabet. We estimate that reduced alphabets containing 10–12 letters can be used to design foldable sequences for a large number of protein families. This estimate is based on the observation that there is little loss of the information necessary to pick out structural homologs in a clustered protein sequence database when a suitable reduction of the amino acid alphabet from 20 to 10 letters is made, but that this information is rapidly degraded when further reductions in the alphabet are made.
And this paper is about "picking out structural homologs in a clustered protein sequence database", not about building true functional proteins! And it seems that reducing the alphabet to less than 10 rapidly degrades even that possibility! So, what is it? 5 or 10-12? With the available evidence, I would say neither! As far as we know, proteins are made of 20 AAs, and the genetic code works through 20 aa-tRNA synthetases, and all the rest. Those who believe that a reduction in the number of AAs is feasible should really try to make whole proteins with that number, and verify their function in a true biological context. More on that when I will have the time to discuss the "function" aspect of our paper. In the meantime, Corey will of course go on with his "gibberish"! :)gpuccio_{November 16, 2017
November
11
Nov
16
16
2017
01:34 AM
1
01
34
AM
PDT}

Joe Sixpack: Thank you! :) Any comment from you will be certainly appreciated.gpuccio_{November 15, 2017
November
11
Nov
15
15
2017
11:36 PM
11
11
36
PM
PDT}

ET @250: LOL!Dionisio_{November 15, 2017
November
11
Nov
15
15
2017
06:02 PM
6
06
02
PM
PDT}

You’re now in the same troll-boat as ET, Dio, and Mungy
Which makes Corey the troll we are fishing for. The bait is good and I believe we have him hooked. Nicely done guys.ET_{November 15, 2017
November
11
Nov
15
15
2017
05:27 PM
5
05
27
PM
PDT}

Gpuccio, I just stumbled on to this site and skimmed through the OP and thread. I haven’t given it enough thought to comment yet but I just wanted to commend you on a lively discussion. Too many of these devolve into insults puked on insults.Joe Sixpack_{November 15, 2017
November
11
Nov
15
15
2017
05:07 PM
5
05
07
PM
PDT}

Origenes, ET and Mung, According to the comment @236, I've been given the undeserved honor of sharing a boat with you. Can you tell me any details about that boat? Do you plan a trip soon?Dionisio_{November 15, 2017
November
11
Nov
15
15
2017
04:38 PM
4
04
38
PM
PDT}

Corey @240
Corey: You cannot draw the conclusion that the function also “jumped” into existence solely based on the appearance of a certain amino acid sequence.
Everyone agrees, but no one has suggested otherwise. Functionality of a sequence is simply inferred from the fact that it is conserved for long evolutionary times.
Corey: This is where gpuccio assumes that only this conserved sequence represents the only functional sequence of said protein.
Why would he? That is not at all required for his argument. Quote please.
Corey: This is where Gpuccio is implying that function = conserved sequence homology. And this is completely incorrect.
Frankly, I don’t even know what it is supposed to mean. How can biological function be ‘conserved sequence homology?’ This is all in your warped mind. It doesn’t make sense; gibberish.Origenes_{November 15, 2017
November
11
Nov
15
15
2017
02:26 PM
2
02
26
PM
PDT}

To all: While Corey goes on accumulating errors beyond any possible conception, I must say that I am happy that the discussion is still alive. We have still so many things to say: the failure of neo-darwinism is such a fascinating argument. :)gpuccio_{November 15, 2017
November
11
Nov
15
15
2017
02:24 PM
2
02
24
PM
PDT}

1 2 3 … 10 Next

You must be logged in to post a comment.

Leave a Reply