Uncommon Descent: Serving The Intelligent Design Community

Exon Shuffling, and the Origins of Protein Folds

A frequently made claim in the scientific literature is that protein domains can be readily recombined to form novel folds. In Darwin’s Doubt, Stephen Meyer addresses this subject in detail (see Chapter 11). Over the course of this article, I want to briefly expand on what was said there.

Defining Our Terms

Before going on, it may be useful to define certain key terms and concepts. I will be referring frequently to “exons” and “introns.” Exons are the sections of a gene that are retained in the mature messenger RNA and (apart from untranslated regions) code for protein, whereas introns are intervening sections that are spliced out of the RNA transcript and do not code for protein.
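To make the exon/intron distinction concrete, here is a minimal Python sketch of what splicing does to a transcript, assuming the exon coordinates are already known. The function name `splice` and the toy sequence are hypothetical illustrations; in the cell, splicing is carried out by the spliceosome, which recognizes splice-site signals (typically GT at the start of an intron and AG at its end) rather than being handed coordinates.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Return the spliced transcript: exon segments joined in genomic order.

    Exon coordinates are 0-based, half-open (start, end) intervals; everything
    between consecutive exons is treated as intron and discarded.
    """
    return "".join(gene[start:end] for start, end in sorted(exons))


# Hypothetical toy gene: upper case = exons, lower case = intron (GT...AG ends)
gene = "ATGGCC" + "gtaagtttag" + "GATTAA"
exons = [(0, 6), (16, 22)]
print(splice(gene, exons))  # ATGGCCGATTAA
```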

Proteins have multiple structural levels. Primary structure refers to the linear sequence of amino acids comprising the protein chain. When segments within this chain fold into local structures such as helices and loops, this is referred to as secondary structure. Common units of secondary structure include α-helices and β-strands. Tertiary structure refers to the three-dimensional fold of a single chain, in which secondary structural elements pack together into domains; for many proteins it is the biologically active form. Since a protein’s native tertiary structure optimizes the forces of attraction between amino acids, it is the most stable form of the protein. When multiple folded polypeptide chains (subunits) associate into a complex, their arrangement is referred to as quaternary structure.

A further concept is domain shuffling. This is the hypothesis that fundamentally new protein folds can be created by recombining already-existing domains. This is thought to be accomplished by moving exons from one part of the genome to another (exon shuffling). There are various ways in which exon shuffling might be achieved, and it is to this subject that I now turn.

The Mechanisms of Exon Shuffling

There are several ways in which exon shuffling may occur. Exon shuffling can be transposon-mediated, or it can occur as a result of crossover during meiosis and recombination between non-homologous or (less frequently) short homologous DNA sequences. Alternative splicing is also thought to play a role in facilitating exon shuffling.

When domain shuffling occurs as a result of crossover during sexual recombination, it is hypothesized that it takes place in three stages (called the “modularization hypothesis”). First, introns are gained at positions that correspond to domain boundaries, forming a “protomodule.” Introns are typically longer than exons, and thus the majority of crossover events take place in the noncoding regions. Second, within the inserted introns, the newly formed protomodule undergoes tandem duplication. Third, intronic recombination facilitates the movement of the protomodule to a different, non-homologous, gene.

Another hypothesized mechanism for domain shuffling involves transposable elements such as LINE-1 retroelements and Helitron transposons, as well as LTR retroelements. LINE-1 elements are transcribed into an mRNA that encodes two proteins, ORF1 and ORF2, both of which are essential for transposition. LINE-1 transcription frequently runs on into 3′ flanking DNA, so that the flanking sequence is transported to a new locus elsewhere in the genome, a process known as 3′ transduction (Ejima and Yang, 2003; Moran et al., 1999; Eickbush, 1999). This can happen when the weak polyadenylation signal of the LINE-1 element is bypassed during transcription, causing downstream exons to be included on the RNA transcript. Since LINE-1s are “copy-and-paste” elements (i.e. they transpose via an RNA intermediate), the donor sequence remains unaltered.
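The 3′ transduction scenario described above can be caricatured in a few lines of Python. This is a toy sketch under loudly simplifying assumptions: the genome is a plain string, the positions of the weak and downstream polyadenylation signals are given as hypothetical indices, and real features of LINE-1 retrotransposition (reverse transcription at the target site, the added poly(A) tail, target-site duplications, frequent 5′ truncation) are all ignored. It shows only the two points made in the text: bypassing the weak signal carries flanking sequence along, and the donor locus is left unaltered.

```python
def l1_rna(locus: str, l1_start: int, weak_pa: int, downstream_pa: int,
           readthrough: bool) -> str:
    """Sequence carried by a LINE-1 transcript (written in DNA letters).

    Normally the RNA ends at the element's own weak polyadenylation signal
    (index weak_pa). If that signal is bypassed (readthrough=True), the RNA
    runs on to a downstream signal and picks up 3' flanking sequence.
    """
    end = downstream_pa if readthrough else weak_pa
    return locus[l1_start:end]


def insert_copy(genome: str, rna: str, target: int) -> str:
    """Copy-and-paste: insert a DNA copy of the RNA at target; the donor copy is untouched."""
    return genome[:target] + rna + genome[target:]


# Hypothetical toy genome: n = filler, L = LINE-1 body, E = downstream exon
genome = "nnnn" + "LLLLLL" + "EEEE" + "nnnnnn"
plain = insert_copy(genome, l1_rna(genome, 4, 10, 14, readthrough=False), 18)
transduced = insert_copy(genome, l1_rna(genome, 4, 10, 14, readthrough=True), 18)
print(plain)       # nnnnLLLLLLEEEEnnnnLLLLLLnn
print(transduced)  # nnnnLLLLLLEEEEnnnnLLLLLLEEEEnn
```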

Long terminal repeat (LTR) retrotransposons have also been shown to facilitate exon shuffling, notably in rice (e.g. Zhang et al., 2013; Wang et al., 2006). LTR retrotransposons possess a gag and a pol gene. The pol gene is translated into a polyprotein composed of an aspartic protease (which cleaves the polyprotein) and various other enzymes, including reverse transcriptase (which reverse transcribes RNA into DNA), integrase (used for integrating the element into the host genome), and RNase H (which degrades the RNA strand of the RNA-DNA hybrid, leaving single-stranded DNA). Like LINE-1 elements, LTR retrotransposons transpose in a “copy-and-paste” fashion via an RNA intermediate. There are a number of subfamilies of LTR retrotransposons, including endogenous retroviruses, Bel/Pao, Ty1/copia, and Ty3/gypsy.

Alternative splicing by exon skipping is also believed to play a role in exon shuffling (Keren et al., 2010). Alternative splicing allows the exons of a pre-mRNA transcript to be spliced into a number of different isoforms, producing multiple proteins from the same gene. In exon skipping, the 5′ donor site of one intron is joined to the 3′ acceptor site of another intron further downstream, so that the exons lying in between are “skipped.” This process may result in introns flanking exons. If this genomic structure is reinserted somewhere else in the genome, the result is exon shuffling.

There are of course other mechanisms that are hypothesized to play a role in exon shuffling, but these will suffice for our present purposes. Next, we will look at the evidence for and against domain shuffling as an explanation for the origin of new protein folds.
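The combinatorics of exon skipping described above can be sketched as follows. This hypothetical helper enumerates every isoform obtainable by skipping any subset of internal exons, assuming (for simplicity) that the first and last exons are always retained and that every combination is actually produced; real splicing is regulated, so this gives an upper bound rather than a prediction.

```python
from itertools import combinations


def skipping_isoforms(exons: list[str]) -> list[list[str]]:
    """All isoforms obtained by skipping any subset of internal exons.

    Assumes at least two exons and that the first and last exons are always kept.
    """
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal) + 1):
        for kept in combinations(internal, k):  # combinations preserve exon order
            isoforms.append([first, *kept, last])
    return isoforms


for isoform in skipping_isoforms(["E1", "E2", "E3", "E4"]):
    print("-".join(isoform))
```

With n internal exons the enumeration yields 2^n isoforms. Whether a skipped isoform keeps the downstream reading frame depends on the length of the skipped exon, which is where intron phase becomes relevant.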

Introns Early vs. Introns Late

Soon after the discovery of introns in vertebrate genes, it was hypothesized that they could have contributed to the evolution of proteins. In a 1978 article in Nature, Walter Gilbert first proposed that exons could be independently assorted by recombination within introns (Gilbert, 1978). Gilbert also hypothesized that introns are in fact relics of the original RNA world (Gilbert, 1986). According to the “introns early” (or “exons early”) hypothesis, all protein-coding genes were created from exon modules, coding for secondary structural elements (such as α-helices, β-sheets, signal peptides, or transmembrane helices) or folding domains, by a process of intron-mediated recombination (Gilbert and Glynias, 1993; Dorit et al., 1990).

The alternative “introns late” scenario proposed that introns appeared only much later, in the genes of eukaryotes (Hickey and Benkel, 1986; Sharp, 1985; Cavalier-Smith, 1985; Orgel and Crick, 1980). Under such a scenario, exon shuffling could play no role in accounting for the origins of the most ancient proteins.

The “introns early” hypothesis was the dominant view in the 1980s. Its most frequently cited support was the then widespread belief in a general correspondence between exon-intron structure and protein secondary structure.

From the mid-1980s, however, this view became increasingly untenable as new information came to light (e.g. see Palmer and Logsdon, 1991; and Patthy, 1996, 1994, 1991, 1987) that raised doubts about a general correlation between protein structure and intron-exon structure. Such a correspondence is not borne out in many ancient protein-coding genes. Moreover, the apparently clearest examples of exon shuffling all took place fairly late in the evolution of eukaryotes, becoming significant only at the time of the emergence of the first multicellular animals (Patthy, 1996, 1994).

In addition, analysis of intron splicing junctions suggested a similar pattern of late-arising exon shuffling. The location where introns are inserted and interrupt the protein’s reading frame determines whether exons can be recombined, duplicated or deleted by intronic recombination without altering the downstream reading frame of the modified protein (Patthy, 1987). Introns can be grouped according to three “phases”: Phase 0 introns insert between two consecutive codons; phase 1 introns insert between the first and second nucleotide of a codon; and phase 2 introns insert between the second and third nucleotide.
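Intron phase follows directly from where the intron falls in the coding sequence: it is the number of coding nucleotides upstream of the intron, modulo 3. The sketch below assumes we are given only the lengths of the coding exons (untranslated regions excluded), in order; the function names and exon lengths are hypothetical.

```python
def intron_phase(coding_nt_upstream: int) -> int:
    """0 = between codons, 1 = after a codon's 1st base, 2 = after its 2nd base."""
    return coding_nt_upstream % 3


def intron_phases(exon_lengths: list[int]) -> list[int]:
    """Phases of the introns separating consecutive coding exons."""
    phases, total = [], 0
    for length in exon_lengths[:-1]:
        total += length  # coding nucleotides upstream of this intron
        phases.append(intron_phase(total))
    return phases


print(intron_phases([120, 91, 87, 150]))  # [0, 1, 1]
```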

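An exon flanked by introns of the same phase (a “symmetrical” exon, e.g. 0-0, 1-1 or 2-2) has a length that is a multiple of three, so only such exons can be duplicated, deleted, or inserted into an intron of the same phase without shifting the downstream reading frame. Thus, if exon shuffling played a major role in protein evolution, we should expect a characteristic intron phase distribution, with an excess of symmetrical exons. But the hypothetical modules of ancient proteins do not conform to such expectations (Patthy, 1991, 1987).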
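The reasoning about symmetrical exons can be checked with a short sketch. An exon entered at phase a and left at phase b satisfies b = (a + length) mod 3, so a = b exactly when the exon's length is a multiple of three. The hypothetical helper below labels each internal coding exon by its flanking intron phases and reports whether deleting it would preserve the downstream reading frame; the exon lengths are invented for illustration.

```python
def internal_exon_classes(exon_lengths: list[int]) -> list[tuple[str, int]]:
    """For each internal coding exon, return (label, length), where the label
    'a-b' gives the phases of the introns on its 5' and 3' sides."""
    phases, total = [], 0
    for length in exon_lengths[:-1]:
        total += length
        phases.append(total % 3)  # phase of the intron after this exon
    return [(f"{phases[i - 1]}-{phases[i]}", exon_lengths[i])
            for i in range(1, len(phases))]


for label, length in internal_exon_classes([120, 91, 87, 150]):
    symmetric = label[0] == label[-1]
    frame = "kept" if length % 3 == 0 else "shifted"
    print(f"{label} exon, {length} nt: symmetric={symmetric}, "
          f"downstream frame {frame} if deleted")
```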

It is clear, then, that exon shuffling (at the very least) is unlikely to explain the origins of the most ancient proteins that have emerged in the history of life. But is this mechanism adequate to explain the origins of later proteins such as those that arise in the evolution of eukaryotes? I now turn to the evidence for and against the role of exon shuffling in protein origins.

The Case for Exon Shuffling

What, then, are the best arguments for exon shuffling? If the thesis is correct, one prediction would be that exon boundaries should correlate strongly with the boundaries of protein domains. In other words, one exon should code for a single protein domain. One argument, therefore, points to the fact that there is a statistically significant correlation between exon boundaries and protein domain boundaries (e.g., see Liu et al., 2005 and Liu and Grigoriev, 2004).
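One simple way to quantify a correspondence between exon boundaries and domain boundaries is to count how many domain boundaries lie within a few residues of an exon boundary, and to compare that count with randomly placed boundaries. The sketch below does this with a permutation test. It is a generic illustration, not the method used by Liu and colleagues; the window size, protein length, and boundary positions are all hypothetical.

```python
import random


def boundary_matches(exon_bounds, domain_bounds, window=5):
    """Number of domain boundaries lying within `window` residues of some exon boundary."""
    return sum(any(abs(d - e) <= window for e in exon_bounds) for d in domain_bounds)


def permutation_p(exon_bounds, domain_bounds, length, window=5, trials=10_000, seed=1):
    """Share of random exon-boundary placements (same number of boundaries, uniform
    over the protein) matching at least as many domain boundaries as observed."""
    rng = random.Random(seed)
    observed = boundary_matches(exon_bounds, domain_bounds, window)
    hits = 0
    for _ in range(trials):
        shuffled = rng.sample(range(1, length), len(exon_bounds))
        hits += boundary_matches(shuffled, domain_bounds, window) >= observed
    return hits / trials


exon_bounds = [40, 97, 160, 233]  # hypothetical intron positions, in residues
domain_bounds = [95, 158, 300]    # hypothetical domain boundaries
print(boundary_matches(exon_bounds, domain_bounds))  # 2
print(permutation_p(exon_bounds, domain_bounds, length=350))
```

A small permutation p-value indicates only that the observed matching is unlikely under random placement; as the text goes on to note, it does not by itself show how the pattern arose.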

However, there are many, many examples where this correspondence does not hold. In many cases, single exons code for multiple domains. For instance, protocadherin genes typically involve large exons coding for multiple domains (Wu and Maniatis, 2000). In other cases, multiple exons are required to specify a single domain (e.g. see Ramasarma et al., 2012; or Buljan et al., 2010).

A further argument for the role of exon shuffling in protein evolution is the intron phase distributions found in the exons coding for protein domains in humans. In 2002, Henrik Kaessmann and colleagues reported that “introns at the boundaries of domains show high excess of symmetrical phase combinations (i.e., 0-0, 1-1, and 2-2), whereas nonboundary introns show no excess symmetry” (Kaessmann et al., 2002). Their conclusion was thus that “exon shuffling has primarily involved rearrangement of structural and functional domains as a whole.” They also performed a similar analysis on the nematode worm Caenorhabditis elegans, finding that “Although the C. elegans data generally concur with the human patterns, we identified fewer intron-bounded domains in this organism, consistent with the lower complexity of C. elegans genes.”
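The “excess of symmetrical phase combinations” is an observed-versus-expected comparison. A minimal sketch, assuming the expected count for each class is the product of the marginal frequencies of 5′ and 3′ intron phases (i.e. that the two flanking phases are independent), is shown below. This is not necessarily the exact statistic Kaessmann and colleagues used, and the phase counts are hypothetical; for their comparison one would run it separately on domain-boundary and non-boundary exons.

```python
from collections import Counter


def symmetry_excess(phase_pairs):
    """Observed/expected ratio for each symmetrical class (0-0, 1-1, 2-2).

    phase_pairs holds one (5' intron phase, 3' intron phase) tuple per exon.
    Expected counts assume the two flanking phases are independent:
    expected(p-p) = n * P(5' phase = p) * P(3' phase = p).
    """
    n = len(phase_pairs)
    observed = Counter(phase_pairs)
    left = Counter(a for a, _ in phase_pairs)
    right = Counter(b for _, b in phase_pairs)
    ratios = {}
    for p in (0, 1, 2):
        expected = left[p] * right[p] / n
        ratios[f"{p}-{p}"] = observed[(p, p)] / expected if expected else float("nan")
    return ratios


# Hypothetical phase combinations for a set of exons
pairs = [(1, 1)] * 6 + [(0, 0)] * 4 + [(0, 1)] * 2 + [(1, 0)] * 2 + [(2, 2)] + [(0, 2)]
print(symmetry_excess(pairs))
```

Here the 2-2 ratio of 8.0 rests on a single exon, a reminder that such ratios need adequate counts and a significance test before they mean much.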

Another line of evidence relates to genes that appear to be chimeras of parent genes. These often bear signs indicative of their mode of origin. One famous example is the jingwei gene in Drosophila, which may have arisen when “the sequence of the processed Adh [alcohol dehydrogenase] messenger RNA became part of a new functional gene by capturing several upstream exons and introns of an unrelated gene” (Long and Langley, 1993).

We must take care, however, not to confuse the observed pattern of intron phase distribution, or exon/domain mapping, with proof that exon shuffling is actually the process by which this pattern arose.

Perhaps common ancestry is the cause, but this must be demonstrated and not assumed. It is the biologist’s duty to determine whether unintelligent chance-based mechanisms actually can produce novel genes in this manner. It is to this question that I now turn.

The Problems with Domain Shuffling as an Explanation for Protein Folds

While the hypothesis of exon shuffling does, taken at face value, have some attractive elements, it suffers from a number of problems. For one thing, the model at its core presupposes the prior existence of protein domains. A protein’s lower-level secondary structures (α-helices and β-strands) exist stably only in the context of the tertiary structures in which they are found. In other words, the domain level is the lowest level at which self-contained stable structural modules exist. This leaves the origin of these domains in the first place unaccounted for. But stable and functional protein domains are demonstrably rare within amino-acid sequence space (e.g. Axe, 2010; Axe, 2004; Taylor et al., 2001; Keefe and Szostak, 2001; Reidhaar-Olson and Sauer, 1990; Salisbury, 1969).

A fairly recent study examined many different combinations of E. coli secondary structural elements (α-helices, β-strands and loops), assembling them “semirandomly into sequences comprised of as many as 800 amino acid residues” (Graziano et al., 2008). The researchers screened 10⁸ variants for features that might suggest folded structure. They failed, however, to find any folded protein structures. Reporting on this study, Axe (2010) writes:

“After a definitive demonstration that the most promising candidates were not properly folded, the authors concluded that ‘the selected clones should therefore not be viewed as “native-like” proteins but rather “molten-globule-like”’, by which they mean that secondary structure is present only transiently flickering in and out of existence along a compact but mobile chain. This contrasts with native-like structure, where secondary structure is locked-in to form a well defined and stable tertiary fold. Their finding accords well with what we should expect in view of the above considerations. Indeed, it would be very puzzling if secondary structure were modular.”

“For those elements to work as robust modules,” explains Axe, “their structure would have to be effectively context-independent, allowing them to be combined in any number of ways to form new folds.” In the case of protein secondary structure, however, this requirement is not met.

The model also seems to require that the diversity and disparity of functions carried out by proteins in the cell can in principle originate by mixing and matching prior existing domains. But this presupposes the ability of blind evolutionary processes to account for a specific “toolbox” of domains that can be recombined in various ways to yield new functions. This seems unlikely, especially in light of the estimation that “1000 to 7000 exons were needed to construct all proteins” (Dorit et al., 1990). In other words, a primordial toolkit of thousands of diverse protein domains needs to be constructed before the exon shuffling hypothesis even becomes a possibility. And even then there are severe problems.

A further issue relates to interface compatibility. The domain shuffling hypothesis in many cases requires the formation of new binding interfaces. Since the amino acids that make up polypeptide chains are distinguished from one another by the specificity of their side-chains, however, the binding interfaces that allow units of secondary structure (i.e. α-helices and β-strands) to come together to form elements of tertiary structure are dependent upon the specific sequence of amino acids. That is to say, they are non-generic in the sense that they depend strictly upon the particulars of the components. Domains that must bind and interact with one another can’t simply be pieced together like Jenga blocks.

In his 2010 paper in the journal BIO-Complexity Douglas Axe reports on an experiment conducted using β-lactamase enzymes which illustrates this difficulty (Axe, 2010). Take a look at the following figure, excerpted from the paper:

[Figure: ribbon structures and backbone alignment of the TEM-1 and PER-1 β-lactamases, from Axe (2010)]

The top half of the figure (labeled “A”) shows the ribbon structures of the TEM-1 β-lactamase (left) and the PER-1 β-lactamase (right). The bottom half (labeled “B”) shows the backbone alignments for the two corresponding domains in the two proteins. Note the high level of structural similarity between the two enzymes. Axe attempted to recombine sections of the two genes to produce a chimeric protein from the domains colored green and red. Since the two parent enzymes exhibit extremely high levels of structural and functional similarity, one might expect this to work. No detectable function was found in the chimeric construct, however, presumably as a consequence of the substantial dissimilarity between the respective amino-acid sequences and the interface incompatibility between the two domains.

This isn’t by any means the only study demonstrating the difficulty of shuffling domains to form new functional proteins. Another study by Axe (2000) described “a set of hybrid sequences” from “the 50%-identical TEM-1 and Proteus mirabilis β-lactamases,” which were created such that the “hybrids match[ed] the TEM-1 sequence except for a region at the C-terminal end, where they [were] random composites of the two parents.” The results? “All of these hybrids are biologically inactive.”

In fact, in the few cases where protein chimeras do possess detectable function, this is precisely because the researchers used an algorithm (developed by Meyer et al., 2006) to carefully select the sections of a protein structure that have the fewest side-chain interactions with the rest of the fold, and chose parent proteins with relatively high sequence identity (Voigt et al., 2002). This only serves to underscore the problem. Even under these highly favorable circumstances, the success rate in the Voigt study was low, with only one in five chimeras possessing discernible function.
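The structure-guided selection described above is generally known as SCHEMA. Its core idea can be sketched as a disruption score: for each pair of residues in contact in the folded structure, check whether the chimera's amino-acid pair at those two positions occurs at the same positions in some parent; contacts that match no parent are counted as broken. The Python below is a minimal sketch of that scoring idea, not the published implementation; the sequences, contact list, and function names are hypothetical, and real applications derive contacts from a solved structure of aligned parents.

```python
def schema_disruption(chimera, parents, contacts):
    """Count contacts (i, j) whose residue pair in the chimera occurs at (i, j)
    in none of the parents. Sequences are assumed aligned and equal in length."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((parent[i], parent[j]) != pair for parent in parents):
            broken += 1
    return broken


def make_chimera(parent_a, parent_b, crossover):
    """Single-crossover chimera: parent_a up to `crossover`, parent_b after it."""
    return parent_a[:crossover] + parent_b[crossover:]


# Hypothetical aligned parents and contact list (0-based residue indices)
parent_a = "MKLVAE"
parent_b = "MRLIAD"
contacts = [(0, 3), (1, 5), (2, 4)]

for x in range(1, len(parent_a)):
    chimera = make_chimera(parent_a, parent_b, x)
    print(x, chimera, schema_disruption(chimera, [parent_a, parent_b], contacts))
```

Choosing crossover points that minimize this count favors chimeras whose interacting residue pairs already occur together in one of the parents. In this toy, the contact between residues 1 and 5 spans every crossover from 2 to 5, so the only zero-disruption choice (x = 1) simply reproduces parent_b.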

Conclusion

To conclude, although there is some indirect inferential evidence for the role of exon shuffling in protein evolution, a consideration of how such a process might work in reality reveals that the hypothesis itself is fraught with severe difficulties.

This article was originally published at Evolution News & Views (part 1; part 2)

Comments
Indels exist. How did you determine they are blind watchmaker processes? And CONTEXT matters, duh. We were discussing polypeptides of 80 amino acids evolving to polypeptides of 300+ amino acids. Joe
[within-species protein variations] are their own proteins, perhaps related by a common design.
You really ought to try this in a court of law. Close relatives are more 'commonly designed' than distant ones! Heh heh heh.
And one more time- what is your experimental evidence for proteins growing by adding amino acids?
There is a very well-known class of mutation known as an 'indel'. Are you honestly saying that there is no such thing? It's not just a question of doing a lit-search for a single example of length change - you'd just deny it anyway; see your ridiculous claims about within-species variation. This is embedded in genetics. It's fundamental stuff. I gave you 4 known mechanisms. People didn't just make them up. Do you really think they NEVER happen? Well, that's genetics sorted!
Well seeing that your position can’t account for the tyrosine kinase, that would be an issue for you.
Ha ha! Stock Move #13. Demand evidence for X, get it, then huff "Well your position can't account for the precursor of X!". We'll take it as a gracious concession that proteins can, indeed, grow. Hangonasec
Well seeing that your position can't account for the tyrosine kinase, that would be an issue for you. Joe
How about a 1,130 amino acid tyrosine kinase 'growing' to be a 1,530 amino acid fusion protein, that still retains tyrosine kinase activity? Would that satisfy you? DNA_Jock
We have already pointed out that, any time a stop codon mutates, the protein “grows” by ~20 amino acids.
And it still functions? Evidence please. What you need is evidence that an 80 amino acid protein can grow into a 300 amino acid protein. That is the context of what I was discussing. Joe
Joe:
Hi DNA Jock- How does your link affect what I am saying? In what way is that an example of a protein growing?
We have already pointed out that, any time a stop codon mutates, the protein "grows" by ~20 amino acids. Your response was the truly bizarre claim that:
additional amino acids will either alter or bury the active site. That is just the way it is.
This statement is demonstrably false. The two-hybrid system that I linked to demonstrates that one can add many different (random) peptides to many different proteins without disturbing their function. Similarly, it is common practice to make fusion proteins, adding substance P, avidin, a Ni-binding domain, a Maltose binding domain, a myc-tag, FLAG-tag, hemagglutinin-tag, to a protein of interest in order to facilitate its purification. If you had read and understood Keefe & Szostak, you would be aware of this. Perhaps one of the better-informed ID proponents will help us set you straight on this matter. I doubt it, however; there is a curious reticence... DNA_Jock
What happens to activity when a cassette skipped in one spliceoform is included in another? Does the active site get buried or inactivated?
Most likely a new active site emerges, by design. And your position can't explain exon shuffling anyway... Joe
So, what’s the JoeWorld explanation for within-species protein length differences? If they aren’t homologues, related by descent … what are they?
They are their own proteins, perhaps related by a common design. And one more time- what is your experimental evidence for proteins growing by adding amino acids? Joe
Hi DNA Jock- How does your link affect what I am saying? In what way is that an example of a protein growing? Joe
Oh Joe,
additional amino acids will either alter or bury the active site. That is just the way it is.
So how do you explain the fact that two-hybrid screening works? http://en.wikipedia.org/wiki/Two-hybrid_screening Hilarious. DNA_Jock
If all you have is one extra amino acid then you don’t have anything to discuss.
Do you understand what the qualifier "as a minimum" means? Or even 'growth', for that matter?
As for homologs- you cannot tell if it is or if it just looks like a homolog.
Yes - as I predicted: "Of course, I know what the next dodge is going to be – “you cannot prove they are homologues” [...] Care to bet there is no within-species variation in protein length, then?" So, what's the JoeWorld explanation for within-species protein length differences? If they aren't homologues, related by descent ... what are they?
Me: Particularly amusing in a thread about exon shuffling, where different-length mature products are all functional, the extra amino acids in the longer version not ‘getting in the way’ at all. Joe: LoL! They are all distinct proteins in their own right.
Whoosh! The point goes soaring over Joe's head. What happens to activity when a cassette skipped in one spliceoform is included in another? Does the active site get buried or inactivated?
Again additional amino acids will either alter or bury the active site. That is just the way it is.
So no protein can ever change its length because length change is universally fatal? Is that what you are saying? Can any non-ID mechanism cause indels? Hangonasec
If all you have is one extra amino acid then you don't have anything to discuss. As for homologs- you cannot tell if it is or if it just looks like a homolog.
Particularly amusing in a thread about exon shuffling, where different-length mature products are all functional, the extra amino acids in the longer version not ‘getting in the way’ at all.
LoL! They are all distinct proteins in their own right. Again additional amino acids will either alter or bury the active site. That is just the way it is. Joe
Joe
OK hangonasec doesn’t have any evidence for growing proteins. That is what we thought…
In order to show that proteins can grow, I need as a minimum only give 1 instance of a mutation resulting in 1 extra amino acid. Do you really think I can't? Are you basically saying that indels as a class contain nothing but 'dels'? Or that they don't even exist, in either direction? Hangonasec
DNA_Jock: I am happy that you laugh. That is good for your health. All the best to you, sincerely. gpuccio
gpuccio, So you have no rebuttal at all to my comment 87. Good to know. (Note that any agnosticism about whether the improvements are in affinity or in yield in no way affects the validity of the two conclusions outlined therein.) And similarly, if you were being consistent, you would dismiss Axe & Gauger's work as being a 'methodological cheat', given your statement @78
Obviously, it [reproductive advantage] is the only property that anyone is allowed to test if one wants to derive conclusions about NS.
Okay, I guess. You should stop citing Axe's work then. LOL DNA_Jock
DNA_Jock: "given that they do not report the Kd of the ancestral sequence (and it may be impossible to measure), the phrase “improve ATP-binding relative to the ancestral sequence” is agnostic as to whether the “improvement” is in Kd or in yield." So, I am right in being very "agnostic" about those sequences in the original library. Those sequences about which we really know very little, but about which so bold conclusions are made, both in the paper and in your discussion. I think I will stick to my agnosticism, and to my position that a paper which is agnostic about the object of its conclusions is a bad paper. I am happy that you are entertained by my silence about, I suppose, this statement of yours: "According to your bizarre view of indirect measurement, I can breezily dismiss Axe & Gauger’s work because they wash their cells four times in ice-cold phosphate-buffered saline, which is far removed from measuring any “reproductive advantage”, and therefore (according to gpuccio-logic) one can derive no conclusions whatsoever from their work about what NS can or cannot achieve." but the simple reason for my silence is that I find that statement lacking any detectable meaning, like many of the things that you said recently. When I can find some sense in what you say, I still answer (maybe I will change my mind). Otherwise, I don't. gpuccio
So, you think all homologues of a given protein are the same length? Hee hee. Of course, I know what the next dodge is going to be - "you cannot prove they are homologues". Because I wasn't there. Care to bet there is no within-species variation in protein length, then? Particularly amusing in a thread about exon shuffling, where different-length mature products are all functional, the extra amino acids in the longer version not 'getting in the way' at all. Hangonasec
OK hangonasec doesn't have any evidence for growing proteins. That is what we thought... Joe
By what? The additional amino acids, duh.
So you think 'additional amino acids' are always fatal to protein function? In what journal have you published this startling revelation? Hangonasec
By what? The additional amino acids, duh.
Also known as ‘mechanisms’.
Non-sequitur. Look either you have evidence for proteins growing or you don't. And obviously you don't. Joe
But then you don’t have a STOP codon and your active substrate gets buried.
By what? The additional C-terminus tail? Universally, in all proteins, ever?
Proteins have functional sections that bind and/or catalyze reactions. If you block those sections you lose functionality. If you add an amino acid which somehow gets in the way then you lose. I asked for evidence and you bring stories.
Also known as 'mechanisms'. This is the kind of garbage that people routinely bring as 'evidence' against mutation in general. "Mutations are frequently catastrophic". Yeah, I know, so what? Unless they are always catastrophic, you have failed to demonstrate that proteins are unable to grow by mechanisms such as those I outlined, which appears to be your position. Care to place a bet that there are no homologues, in any species pairs anywhere, in which one exemplar has a STOP where the other has an amino acid and subsequent 'tail'? Hangonasec
gpuccio, I don't see why you would think that the passage from Keefe and Szostak that you quoted @88 disagrees in any way with what I have written. Please note the phrase "contributing to the formation of a folded structure". Perhaps you were misled by the final sentence: given that they do not report the Kd of the ancestral sequence (and it may be impossible to measure), the phrase "improve ATP-binding relative to the ancestral sequence" is agnostic as to whether the "improvement" is in Kd or in yield. Note that their selection procedure measures and optimizes primarily for yield, rather than affinity. Separately, I am entertained by your silence on the question of whether (according to gpuccio-logic) Axe and Gauger are guilty of a 'methodological cheat'. DNA_Jock
Hangonasec:
1) Mutate a STOP codon into one with a corresponding charged tRNA, and bingo, your protein has grown, as if by magic, adding (as many residues as sit between this position and the next STOP) + 1 amino acids.
But then you don't have a STOP codon and your active substrate gets buried. Proteins have functional sections that bind and/or catalyze reactions. If you block those sections you lose functionality. If you add an amino acid which somehow gets in the way then you lose. I asked for evidence and you bring stories. Joe
DNA_Jock:
Comparing the round 18 sequences with the ancestral sequence showed that four amino-acid substitutions had become predominant in the selected population (present more than 39 times in 56 sequences, Fig. 3b), and that 16 other substitutions had also been selectively enriched (present more than 4 times in 56 sequences, Fig. 3b). In addition, each clone contained a variable number of other substitutions. The selectively enriched substitutions are distributed over the 62 amino-terminal amino acids of the original 80-amino-acid random region, suggesting that amino acids throughout this region are contributing to the formation of a folded structure, at least in the complex with ATP. The substitutions in each of the assayed clones improve ATP-binding relative to the ancestral sequence.
gpuccio
gpuccio writes:
That ATP-binding is a known biological function is something we agree upon, why do you insist on giving me references about that? My point is that the weak ATP binding in the original sequences in the random library, as far as we know, would be of no use in a real biological context, and that the paper gives no evidence at all to believe or hypothesize differently. While it gives a lot of unnecessary and irrelevant information about an engineered protein which however, as far as we know (and as far as it has been tested) would be of no use in a real biological context.
You appear to be agreeing with us that ATP binding is a ‘known biological function’, but asserting that this function (as embodied in the final, 'engineered' protein) would be of no use in a real biological context. Just. Plain. Weird.
Either they are deriving no conclusions about what NS can do, or they are. If they are, why are they discussing at length an engineered protein, and not the original proteins in the random library? Why didn’t they focus on the original proteins (which are apparently the object of their paper), and on their non engineered properties, and on how NS could act on them?
Your objection to Keefe and Szostak has now migrated from the rather sad “that’s not the experiment I would do” to the genuinely peculiar “that’s not the way I would write the paper”. Seriously, you conclude that they are NOT deriving conclusions about what NS can do, because they spend so much time discussing the results of NS, rather than the starting point. Really? This makes no sense. The paper demonstrates two things: 1) Proteins with a minimal function exist (at a low frequency) in random peptide libraries; 2) These proteins can, via mutation and selection, evolve into proteins with good function. There is a subtlety here: the round 8 proteins, that we have all been referring to as “weak ATP binders”, have not had their Kd measured (in this paper at least). It would be more accurate to refer to the round 8 proteins as “inefficient ATP binders”, since they bind to ATP with a yield of only 5 - 15%. This ‘conformational heterogeneity’ makes studying the proteins very difficult – nobody in their right mind tries to do enzyme kinetics using a sample that is 85 - 95% inactive. So Keefe focused on the biochemical properties of the later proteins because those were the ones that he could measure. It is an interesting thought that the initial binders might bind with quite high affinity -- the paper gives no evidence at all to believe or hypothesize differently – and the subsequent mutation and selection achieved an increase in the yield of protein that had the right conformation. After all, a somewhat disordered peptide can ‘explore’ a huge conformational space, thanks to conformational heterogeneity and induced fit. Subsequent M&S can stabilize the conformation that functions best. DNA_Jock
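The distinction drawn above between yield and affinity can be illustrated with a one-site binding model. In this sketch (an assumption, not the assay Keefe and Szostak used), only a fraction of the protein adopts a binding-competent conformation, and that fraction binds ATP with a simple hyperbolic dependence on concentration. The two preparations are hypothetical: at a single ATP concentration they retain the same fraction of protein, but a titration across concentrations separates them.

```python
def fraction_bound(active_fraction: float, kd_uM: float, atp_uM: float) -> float:
    """Fraction of all protein molecules bound to ATP, assuming only
    `active_fraction` can bind and that those follow a one-site hyperbola."""
    return active_fraction * atp_uM / (kd_uM + atp_uM)


# Two hypothetical preparations
tight_but_few = {"active_fraction": 0.10, "kd_uM": 1.0}   # high affinity, low yield
weak_but_all = {"active_fraction": 1.00, "kd_uM": 910.0}  # low affinity, full yield

for atp in (100.0, 1000.0):
    a = fraction_bound(atp_uM=atp, **tight_but_few)
    b = fraction_bound(atp_uM=atp, **weak_but_all)
    print(f"[ATP] = {atp:g} uM: {a:.3f} vs {b:.3f}")
```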
Joe @75:
Proteins can grow? Evidence please. Proteins are not stalagmites, nor are they living organisms.
1) Mutate a STOP codon into one with a corresponding charged tRNA, and bingo, your protein has grown, as if by magic, adding (as many residues as sit between this position and the next STOP) + 1 amino acids. 2) Insert extra DNA bases between Start and STOP and (provided a truncating STOP is not thereby generated), once again, it has grown. 3) Alter the position of an exon/intron boundary such that it includes more exon and less intron. 4) Less readily, mutate the Shine-Dalgarno sequence or the initial AUG-methionine, lengthening 'the other way', provided there are other initiation signals upstream. Hangonasec
DNA_Jock: Just because I have some spare time today: Either they are deriving no conclusions about what NS can do, or they are. If they are, why are they discussing at length an engineered protein, and not the original proteins in the random library? Why didn't they focus on the original proteins (which are apparently the object of their paper), and on their non-engineered properties, and on how NS could act on them? gpuccio
Zachriel: That ATP-binding is a known biological function is something we agree upon, so why do you insist on giving me references about that? My point is that the weak ATP binding in the original sequences in the random library, as far as we know, would be of no use in a real biological context, and that the paper gives no evidence at all to believe or hypothesize differently. Meanwhile, it gives a lot of unnecessary and irrelevant information about an engineered protein which, however, as far as we know (and as far as it has been tested), would be of no use in a real biological context. But, strangely, both you and DNA_Jock seem to evade that aspect. So, if you prefer, just go on giving me references about things we agree upon. gpuccio
gpuccio @78
Pitiful. Are you denying that reproductive advantage is the only property which is selected by NS?
Heavens, no; I agree that it is the only property that is directly selected by NS.
Obviously, it is the only property that anyone is allowed to test if one wants to derive conclusions about NS.
Obviously not. One can test other properties and make an inference (reasonable or otherwise) that these properties would confer a selectable advantage. You are welcome to argue against the reasonableness of the inference (if you actually had an argument), but to deny that the inference can ever be made is truly "pitiful". According to your bizarre view of indirect measurement, I can breezily dismiss Axe & Gauger's work because they wash their cells four times in ice-cold phosphate-buffered saline, which is far removed from measuring any "reproductive advantage", and therefore (according to gpuccio-logic) one can derive no conclusions whatsoever from their work about what NS can or cannot achieve. I (unlike you, apparently) am comfortable with Axe & Gauger testing for the Bio phenotype without their having to prove that it confers a reproductive advantage. Rather, I dismiss their work because they are asking the wrong question. DNA_Jock
gpuccio: I said: “which as far as we know would be of no use in a real biological context” Then we pointed out that ATP-binding is a known biological function. See, for instance, Matte & Delbaere, ATP-binding Motifs, in Encyclopedia of Life Sciences, Wiley-Blackwell 2002. Zachriel
Zachriel: Some further reflections. Just to show how vague your concept of "known biological function" is: Sodium bicarbonate certainly has a known biological function. Buffer systems based on it are essential for our survival. I agree with you that proteins are a special example, because of their folding properties and the biochemical activities of their residues, which allow gradual "molding" to specific functions. That's why protein engineering is possible. What you need is some starting property which can be intentionally selected and then bottom-up engineered. That's what the weak ATP binding is in the paper. NS can do the same only if the starting property and the individual intermediates confer a reproductive advantage to real biological replicators in a real, or appropriately simulated, natural environment. gpuccio
There is no such thing as a natural protein fold. All protein folding is supernatural. Mung
Zachriel: I said: "We can certainly say that in a random library there are some proteins which exhibit some biological function, in the specific case weak ATP binding". And you say: "ATP-binding is a known biological function". So, here we agree. I said: "which as far as we know would be of no use in a real biological context". And you do not comment. Have you any evidence that the original proteins have been shown to be of some use in a real biological context? Or do you agree that there is no evidence of that? I said: "and could never be the object of a process of NS." which is obviously the consequence of the previous statement (the one you have not commented on). You say: "If there’s a selection gradient, then natural selection could work on that gradient to increase functionality." But that is obviously not true if we don't start from a property which is naturally selectable. A selection gradient is not enough. You need a starting naturally selectable property and a gradient of naturally selectable states. Here you have neither. There is no evidence that the original sequences are naturally selectable. There is some evidence (in a following paper) that even the final engineered protein is not naturally selectable. You say: "This is just a proof of concept. It shows that random sequences can fold into the complex conformations necessary for protein function, and that these can then be optimized through selection." And I agree. But again, your statement cleverly conflates natural selection and intelligent selection (engineering). To sum up (Intelligent selection for dummies): a) Natural selection is a special form of selection where what is selected is always a reproductive advantage in replicators in a natural environment (or in a correct model of a natural environment), and nothing else. That reproductive advantage must be demonstrated in the starting state and in each intermediate state which is supposed to be selected.
b) Intelligent selection is any form of selection where an environment explicitly measures a property and reacts to the measurement by promoting or repressing the result of variation according to the measurement. In intelligent selection, both the measurement system and the active intervention on the result of variation are realized by special configurations of the system which implement the measurement and the intervention, and connect the two things. In intelligent selection any property can be selected at any desired level, provided that the system is configured (usually, maybe always, by design) so that it can attain the result. gpuccio
DNA_Jock @76: Pitiful. Are you denying that reproductive advantage is the only property which is selected by NS? Obviously, it is the only property that anyone is allowed to test if one wants to derive conclusions about NS. The experiment I quoted repeatedly (rugged landscape) was about phage infectivity, and I have repeatedly stated that it is a good experiment testing NS. Obviously, infectivity for phages and growth for bacteria are forms of the same thing: reproduction. And reproductive advantage is the property selected by NS. Back-pedalling? Pitiful. If you go on with this tone, I will probably not answer any more. gpuccio
Joe, Much as I hesitate to engage you,
Proteins can grow? Evidence please. Proteins are not stalagmites, nor are they living organisms.
Can you think of a way that a single point mutation could cause a protein to "grow" longer by Poisson(21) amino acids? Hint: the new amino acids get added to the C-terminus... LOL DNA_Jock
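A rough check on where a figure of about 21 residues comes from, as a sketch under a loudly simplifying assumption: treat every codon downstream of the mutated stop codon as uniformly random (real downstream sequence is not random). Each codon is then one of the 3 standard stop codons with probability 3/64, so the residues added (the former stop codon plus every sense codon read before the next stop) number 1/p = 64/3, about 21.3, on average. Strictly, under this assumption the count is geometric rather than Poisson. The helper names below are hypothetical, chosen only for illustration.

```python
import random

# Standard-code stop codons; every other codon is treated as a sense codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}
BASES = "ACGT"


def expected_extension(p_stop: float = 3 / 64) -> float:
    """Mean residues added after a stop->sense mutation: the former stop codon
    (1 residue) plus the sense codons read before the next in-frame stop."""
    return (1 - p_stop) / p_stop + 1  # algebraically equal to 1 / p_stop


def simulate_extension(rng: random.Random) -> int:
    """One trial, assuming (hypothetically) uniformly random downstream codons."""
    added = 1  # the mutated stop codon now encodes an amino acid
    while True:
        codon = "".join(rng.choice(BASES) for _ in range(3))
        if codon in STOP_CODONS:
            return added  # translation terminates at this codon
        added += 1


rng = random.Random(42)
trials = [simulate_extension(rng) for _ in range(20_000)]
simulated_mean = sum(trials) / len(trials)
print(f"analytic mean:  {expected_extension():.2f}")  # 64/3, about 21.33
print(f"simulated mean: {simulated_mean:.2f}")
```

The closed form is the actual answer; the simulation only confirms the arithmetic under the same uniform-codon assumption.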
gpuccio writes:
DNA_Jock: Wrong. There is no problem in designing an experiment which models natural selection in a somewhat reliable way. One has to be aware of possible differences between the experimental model and the true scenario, but if the experimental model is a good model, at least for the important aspects, it is certainly useful. What is not useful is a bad model.
Fantastic! I am happy that you have finally come around to my way of thinking. Just remember that "That's not the experiment I would have done?" is NOT a valid objection.
I am not selecting growth because I find it “desirable”, but because it is the only property which corresponds to the differential growth which is the mechanism in NS.
So this is the reason why, according to you, 'growth' is the one, unique property that I am allowed to design experiments to optimize. Cool. Would bacteriophage infectivity count too? [Cue back-pedalling] DNA_Jock
Proteins can grow? Evidence please. Proteins are not stalagmites, nor are they living organisms. Joe
Yes, and babies are very short with respect to the median length of human beings ... Proteins can grow. Catalytic activity is achievable even with dipeptides, particularly when they are attached to a larger molecule (which does not have to be the remainder of a protein). Hangonasec
It shows that random sequences can fold into the complex conformations necessary for protein function, and that these can then be optimized through selection.
Short random sequences: 80 amino acids is very short with respect to the median length of polypeptides known to exist in living organisms. Joe
gpuccio: in the specific case weak ATP binding, which as far as we know would be of no use in a real biological context. ATP-binding is a known biological function. See Matte & Delbaere, ATP-binding Motifs, in Encyclopedia of Life Sciences, Wiley-Blackwell 2002. gpuccio: and could never be the object of a process of NS. If there's a selection gradient, then natural selection could work on that gradient to increase functionality. This is just a proof of concept. It shows that random sequences can fold into the complex conformations necessary for protein function, and that these can then be optimized through selection. Returning to the original claim: Jonathan M: But stable and functional protein domains are demonstrably rare within amino-acid sequence space. Functional proteins are found at a frequency of about 10^-11 among random sequences. Zachriel
Zachriel: We can certainly say that in a random library there are some proteins which exhibit some biological function, in the specific case weak ATP binding, which as far as we know would be of no use in a real biological context and could never be the object of a process of NS. And we can say that simple methods of intelligent protein engineering can transform that property, through a gradient of affinity, into a strong ATP binding. Which, as far as we know (and there is also a paper about that) would be of no use in a real biological context and could never be the object of a process of NS. Are you satisfied with that summary? gpuccio
DNA_Jock: Wrong. There is no problem in designing an experiment which models natural selection in a somewhat reliable way. One has to be aware of possible differences between the experimental model and the true scenario, but if the experimental model is a good model, at least for the important aspects, it is certainly useful. What is not useful is a bad model. You say: "My understanding from these statements is that any experiment I could set up to prove the in vivo “selectability” of random peptides would be an example of Intelligent Selection, since I would be selecting some function as desirable (e.g. ‘growth’), setting the context for its appearance and measuring it." No. If I select growth of a bacterial system as the measured outcome, I am building an acceptable model of NS. Why? Because reproductive advantage is exactly the only property that is supposed to be selected in NS. So, if I make an experiment where the outcome is differential growth, I am on a good path to model NS. I am not selecting growth because I find it "desirable", but because it is the only property which corresponds to the differential growth which is the mechanism in NS. Not so if I choose a weak ATP binding, and then transform it into a strong ATP binding by methods of active engineering, which have nothing to do with a selection based on differential growth (NS). It is so simple. But if one does not want to accept a true concept, one will never accept it. gpuccio
Gpuccio, I am not lying. You have defined “Intelligent Selection" as follows:
IS is any situation in which the system actively measures some property of the mutated object and reacts to that measure in a specific way. [Emphasis added]
And noted
IS requires a conscious intelligent agent who recognizes some function as desirable, sets the context to develop it, can measure it at any desired level, and can intervene in the system to expand any result which shows any degree of the desired function. IOWs, both the definition of the function, the way to measure it, and the interventions to facilitate its emergence are carefully engineered. It’s design all the way.
My understanding from these statements is that any experiment I could set up to prove the in vivo “selectability” of random peptides would be an example of Intelligent Selection, since I would be selecting some function as desirable (e.g. ‘growth’), setting the context for its appearance and measuring it. And since:
Using examples of Intelligent Selection to derive conclusions about Natural Selection is a methodological error or a cheat, because they are two different things, whatever their possible origin. It’s as simple as that.
Szostak’s experiment is a poor model of the natural process because it has the formal properties of IS, not of NS: selection of a property by measurement, and controlled variation + amplification in cycles of re-selection based on new measurements.
So how on earth could I design an experiment that did not involve “Intelligent Selection”. It can’t be done. Using YOUR definitions. For instance, if I were to replace an essential peptide sequence with a random peptide, and then iteratively mutate and select for growth in vivo, that would be “Intelligent Selection”, and therefore a “methodological error or cheat” according to your definitions, right? DNA_Jock
gpuccio: there is no doubt at all that some proteins in the initial random library exhibit weak binding to ATP. And there is no doubt that we can call this “a function” gpuccio: Instead, he has done a different thing. Rather, he has done an additional thing. Not only did the experiment show that there are functional proteins in random sequences, but it showed that there is a selectable pathway to increased function. Zachriel
DNA_Jock and Zachriel: I am rather tired of your tricks. The problem is not with the word "function". The problem is with the methodology and the conclusions. As you should know, I have no special concept of what is function and what is not. If you read my OP here: https://uncommondesc.wpengine.com/intelligent-design/functional-information-defined/ you will easily see that in my procedure to detect dFSCI any function is valid and can be used to measure a specific functional information. Function is any way an object can be used to do something and have some result. So, I have no objections to considering a weak binding to ATP a function. The simple fact that you both insist on arguing in that sense shows that either you don't understand what I clearly say, or you are playing tricks. Now, I will try to be even more clear, if possible. As I have stated explicitly many times, there is no doubt at all that some proteins in the initial random library exhibit weak binding to ATP. And there is no doubt that we can call this "a function", defining it, for example: "Any molecule which binds to ATP". Fine. No problems. Now, let's say that the purpose of Szostak was to demonstrate that some sequences in the initial random library had this function. It's rather easy. The simple fact that he easily succeeded in selecting and enriching them by ATP columns is proof of the function. He could very well have stopped there. Or he could simply have gone on describing these sequences with weak ATP binding as they were, and trying to show why and how they bound to ATP. The result? We would know that in a random library sequences which bind weakly to ATP are present with a frequency such and such. No problems. Instead, he has done a different thing. He has engineered a protein with strong ATP binding from the initial sequences. And then the paper analyzes the properties of this engineered protein. OK, that's fine. What does this prove?
Simply, that we can intelligently engineer a protein with strong ATP binding from a sequence with weak ATP binding. No problems with that. So, the conclusions should have been: We have shown that sequences with weak ATP binding are present in a random library with such and such frequency. We have not really said anything else about those sequences. Then, for some strange personal reasons, we have shown that protein engineering works. This is not exactly the tone of the conclusions. That's why I stick to my idea: bad paper, bad methodology, ideological ambiguity. And completely irrelevant to the ID neo darwinism debate. And here is a new pearl from DNA_Jock:
Well, according to your definition of natural (as opposed to ‘intelligent’) selection, such evidence is impossible to come by, by definition. Do you have any evidence that they can’t be naturally selected in a biological system?
This is, apparently, an explicit lie (OK, I have said it). It is not true, at all, that "according to my definition" such evidence is impossible to come by. In my post #27, in this same thread, I write to Hangonasec: "Knockout rescue experiments are certainly more appropriate as models of NS. That is exactly the difference with the Szostak paper." And I have pointed out to you, for example, the rugged landscape paper as a good example of an experiment testing natural selection. Again, you are (intentionally?) misrepresenting my words. It is perfectly possible to show that a protein is naturally selectable. But you have to do it. Szostak has not done it. Therefore we have no evidence that any of the original proteins in the random library is naturally selectable. Now, you must stop with this ridiculous habit of saying that I must show that they were not naturally selectable. In science, something has a property only if you show that it has that property. It is not the duty of the general public to demonstrate that an object does not have a property. It is the duty of those who think that the object has a property to demonstrate that it has it, or that there are reasons to believe that. So, my point is simple. For the neo darwinist scenario, the only relevant function is to be naturally selectable. The Szostak paper demonstrates that some sequences in a random library have weak binding to ATP. It also demonstrates that we can engineer a strong binding to ATP from them by the usual procedures of protein engineering. In no way does it demonstrate that the same result can be obtained in a lab scenario which tests for natural selection. It could have been done, but it has not been attempted. That's why the paper is irrelevant to the ID neo darwinism debate, unless we consider it as evidence that design can achieve results. This is my position. You will certainly go on with your tricks but, unless you offer true arguments about these points, I have nothing else to say. gpuccio
Zachriel at #60: I had not seen your #45. It is not a real acknowledgement, but I appreciate it just the same. A simple: OK, I was wrong when I said: "The original random protein exhibited enzymatic activity." would have been more elegant, IMO. gpuccio
Box: Moreover Keefe and Szostak, 2001 is even mentioned in the OP Yes, it's the same claim in both cases. What is your point? Zachriel
follow up on #63: Moreover Keefe and Szostak, 2001 is even mentioned in the OP ... because it is the same article to which I refer in #63 ... as indicated by Jonathan M. Box
Zachriel Jonathan M: Keefe and Szostak, 2001
Jonathan M mentioned the paper in a 2013 article. Excerpt:
The Problems with Domain Shuffling as an Explanation for Protein Folds While the hypothesis of exon shuffling does, taken at face value, have some attractive elements, it suffers from a number of problems. For one thing, the model at its core presupposes the prior existence of protein domains. A protein's lower-level secondary structures (alpha-helices and beta-strands) exist stably only in the context of the tertiary structures in which they are found. In other words, the domain level is the lowest level at which self-contained stable structural modules exist. This leaves the origins of these domains in the first place unaccounted for. But stable and functional protein domains are demonstrably rare within amino-acid sequence space (e.g. Axe, 2010; Axe, 2004; Taylor et al., 2001; Keefe and Szostak, 2001; Reidhaar-Olson and Sauer, 1990; Salisbury, 1969).
Box
DNA_Jock: No Zachriel, you don’t understand: the initial peptides showed weak ATP binding, which is not a “function”, whereas the final peptides showed strong ATP binding, which IS a “function”, according to gpuccio Got it! http://www.youtube.com/watch?v=iQrLPtr_ikE Zachriel
No Zachriel, you don't understand: the initial peptides showed weak ATP binding, which is not a "function", whereas the final peptides showed strong ATP binding, which IS a "function", according to gpuccio, as in "I indicated that the final functional protein which is described and analyzed in the paper was engineered. " And we all know that turning something non-functional into something functional requires engineering. See? gp writes:
Have you any evidence that the proteins in the original random library had any catalytic activity?
Yes, some of them catalyzed the reaction glucose -> 3-deoxyhexosulose.
This is the point I made. Please, answer.
That was NOT the point you made. You did not use the word “catalytic”. Please stop making stuff up.
And have you any evidence that the proteins in the original random library can be naturally selected in a biological system?
Well, according to your definition of natural (as opposed to ‘intelligent’) selection, such evidence is impossible to come by, by definition. Do you have any evidence that they can’t be naturally selected in a biological system? Are you ready to explain why binding is necessary for catalysis yet? DNA_Jock
gpuccio: ... acknowledge ... See #45. gpuccio: What do you mean? They were selected because they bound to ATP Yes. That alone shows that they were functional from the get-go, which was the claim. However, we also know that they increased their specificity through rounds of selection showing a gradient of function. This is not what is expected of a simple chemical affinity. We even have a phylogeny of those that were most successful, and they trace back to four progenitor molecules. Zachriel
Zachriel: "If it were merely a simple chemical affinity, as you suggested above, there would be no selection gradient." What do you mean? They were selected because they bound to ATP: "For rounds 1-9, we used a butyl-agarose pre-column (Sigma) and incubated the flowthrough with the ATP-affinity column. Rounds 14-16 included two ATP-agarose selection steps, and rounds 17 and 18 included three ATP-agarose selection steps. For reiterated selection steps the eluted material was purified away from ATP on a denaturing Ni-NTA column and reverse transcribed again before the subsequent selection step." "A chemical affinity" are your words, not mine. I said: "They certainly exhibited some binding to ATP", and it simply means that the two molecules, the protein and ATP, bind together, and you can isolate the protein by ATP columns and elution. Which is how they selected them. I really don't understand what you are trying to say (if you are really trying to say something). And I acknowledge, with some sadness, that you were not decent enough. gpuccio
This is precisely how mathematicians use the term. Dice throwing is a random process.
But even the dice doesn't spin randomly; there are forces that determine the result! If we knew every one of them we could predict the result 100%; it has happened with a coin. JimFit
DNA_Jock: Just answer this simple question: Have you any evidence that the proteins in the original random library had any catalytic activity? This is the point I made. Please, answer. And have you any evidence that the proteins in the original random library can be naturally selected in a biological system? This is the other point I made. Please, answer. gpuccio
gpuccio: I indicated that the final functional protein which is described and analyzed in the paper was engineered. It was the result of rounds of amplification and selection for functional activity, yes. gpuccio: The original proteins in the library were not the final engineered protein. They certainly exhibited some binding to ATP, but we know nothing more about their “function”. The function was selectable. If it were merely a simple chemical affinity, as you suggested above, there would be no selection gradient. Zachriel
So, in this context, “binding” is the same thing as having “enzymatic activity”.
In what context? Where is your logic?
In the context of the ability of evolving peptides to achieve particular functions, then avid binding is utterly equivalent to efficient catalysis and specific binding is utterly equivalent to specific catalysis. Which you would realize if you bothered to fill in the blank in my little riddle above, instead of merely asserting "This is wrong." You are aware that binding is necessary for catalysis. Good. Could you explain why binding is necessary for catalysis? [In your defense, I remember a lecturer who introduced the Haldane relationship as if it were a semi-magical property of enzymes; it isn't - rather it is a necessary consequence of the chemical equivalence of binding and catalysis...] This is basic biochemistry. DNA_Jock
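The equivalence between binding and catalysis appealed to here is usually stated through the textbook transition-state thermodynamic cycle (associated with Pauling and later Wolfenden). As a sketch under the simplifying assumptions of transition-state theory: let k_uncat and k_cat be the rate constants for the uncatalyzed and enzyme-bound reactions, K‡ the corresponding quasi-equilibrium constants for reaching the transition state (so that k = (k_B T/h) K‡ in both cases), and K_S and K_TS the dissociation constants of the enzyme from the substrate and from the transition state. The two routes from E + S to the bound transition state, via ES or via free S‡, must have the same overall equilibrium constant:

$$
\frac{K^{\ddagger}_{cat}}{K_S} = \frac{K^{\ddagger}_{uncat}}{K_{TS}}
\quad\Longrightarrow\quad
\frac{k_{cat}}{k_{uncat}} = \frac{K^{\ddagger}_{cat}}{K^{\ddagger}_{uncat}} = \frac{K_S}{K_{TS}}
$$

Read this way, the rate enhancement equals the factor by which the enzyme binds the transition state more tightly than the ground-state substrate.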
Zachriel: I indicated that the final functional protein which is described and analyzed in the paper was engineered. The original proteins in the library were not the final engineered protein. They certainly exhibited some binding to ATP, but we know nothing more about their "function". Certainly, there is absolutely nothing in the paper that shows that they are naturally selectable molecules. Or that they have any enzymatic function. Which was my initial statement, and has not changed at all. Because it is perfectly true. Now, you could at least be decent enough to admit that you were wrong about the enzymatic activity of those molecules. If you want. gpuccio
The paper shows that the functional proteins existed in the original set of random sequences.
And that is of no help to unguided evolution. Joe
Box: Zachriel the incorrigible Gpuccio indicated that the functional proteins were engineered, when, in fact, functional proteins were in the original population of random sequences. Zachriel
gpuccio: the paper explores if we can engineer specific biological functions from random peptides. We can. The paper shows that the functional proteins existed in the original set of random sequences. Zachriel
Yes, unguided evolution is irrelevant and unscientific. Joe
Irrelevant, as usual Joe. Hangonasec
DNA_Jock:
In the published studies, the proteins did not have catalytic activity.
So, you agree that Zachriel was wrong. And I was right.
The term “enzymatic” is ambiguous and, in this context, irrelevant.
The term "enzymatic" is not ambiguous at all. And it was Zachriel who used it, not I. So, just tell him that it is irrelevant.
The paper explores whether random peptides can evolve specific biological functions. They can.
No. As it is, the paper explores if we can engineer specific biological functions from random peptides. We can. Note that making "random peptides" the subject of "evolve" does not change things. It is only a verbal trick.
No, it is a much stronger statement – binding is necessary AND SUFFICIENT for ‘enzymatic’ activity (although the reaction catalyzed might be difficult to assay…).
No. Binding is not sufficient for enzymatic activity. Binding in itself is not catalysis.
how does an enzyme catalyze a reaction? Answer: if and only if it binds with high affinity and specificity to the _________ state.
This is wrong. The correct form is that catalytic activity implies binding, but binding does not imply catalytic activity.
So, in this context, “binding” is the same thing as having “enzymatic activity”.
In what context? Where is your logic? However, I stick to my choice: I will not say it. gpuccio
Random, wrt evolution, means accidental, happenstance, errors and mistakes. There isn't anything useful to be gained from such a concept. It cannot be tested, which means it is outside of science. Joe
JimFit @40
Again I think it's a wrong definition to describe something; better to use the word incalculable, not random.
No - better to use the word 'stochastic'. Which people often do, precisely because there are 4 or 5 non-synonymous meanings of the word 'random'. But it's easy enough to just say 'random', apart from the fact that even after clarification you end up down these rabbit-holes with people who argue that you don't mean what you actually SAY you mean when you use the word!
Randomness doesn’t exist when there is something even if that something is a variable.
What if that something is a random variable? This is precisely how mathematicians use the term. Dice throwing is a random process. Throwing a weighted dice is also a random process. Randomness in the mathematical sense of 'stochastic' does exist 'when there is something'. Nobody - not even atheists, hee hee! - is suggesting that mutation or fixation are uncaused or result purely from unknown mechanism. As in your original statement:
When in science something is random it doesn’t mean what the atheists think, something random is something that we cannot determine its cause yet.
Hangonasec
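A short illustration of the weighted-dice point, in the mathematical sense of a random variable: each roll below is unpredictable, yet the long-run frequencies follow the chosen face weights, so a biased die is still a random (stochastic) process. The weights are arbitrary illustrative values, not anything taken from the discussion, and the helper name is hypothetical.

```python
import random
from collections import Counter

FACES = [1, 2, 3, 4, 5, 6]


def roll_counts(weights: list[float], n: int, seed: int = 0) -> Counter:
    """Roll a die with the given face weights n times and tally the outcomes.
    The fixed seed only makes the run reproducible; every roll is still a
    draw from the same probability distribution."""
    rng = random.Random(seed)
    return Counter(rng.choices(FACES, weights=weights, k=n))


n = 60_000
fair = roll_counts([1, 1, 1, 1, 1, 1], n)
weighted = roll_counts([1, 1, 1, 1, 1, 5], n)  # face 6 has probability 5/10

for face in FACES:
    print(face, f"fair {fair[face] / n:.3f}", f"weighted {weighted[face] / n:.3f}")
```

Both dice are random variables; they differ only in their distributions, not in whether their outcomes have causes.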
gpuccio: Zachriel has more than once explicitly stated that the initial proteins in Szostak’s library showed enzymatic activity, and that binding is the same thing as enzymatic activity. You objected to the term "function", so we meant to indicate that binding is a basic activity of enzymes. We apologize for the confusion. Nomenclature has nothing to do with the underlying finding, of course. The study showed that functional proteins occur in random sequences at a frequency of about 10^-11. Zachriel
DNA_Jock: Your interventions become strangely unfair and out of context as soon as the Szostak paper is mentioned. Strange.
Well, I evidently know more about the Szostak technology than you do. If you feel that’s unfair, the solution is entirely within your own hands.
Zachriel has more than once explicitly stated that the initial proteins in Szostak’s library showed enzymatic activity, and that binding is the same thing as enzymatic activity. That is completely wrong, as you should know, and Dr JDD and I have simply tried to explain that simple fact to him. There is no special problem, everyone can be wrong, but it is better to acknowledge it when one is wrong.
In the published studies, the proteins did not have catalytic activity.
So, do you want to be wrong with Zachriel? Are you defending the statement that binding is the same thing as having an enzymatic activity? And that the paper says that the initial proteins had an enzymatic activity? Just to know.
The term “enzymatic” is ambiguous and, in this context, irrelevant. The paper explores whether random peptides can evolve specific biological functions. They can.
If, instead, you are trying simply to say the very trivial thing that binding is necessary for an enzymatic activity, which is so obvious that nobody in his right mind would try to deny it, and pretend that this is a refutation of what Dr JDD and I have said, then you are not simply a fool and an unfair discussant: you are…
No, it is a much stronger statement – binding is necessary AND SUFFICIENT for ‘enzymatic’ activity (although the reaction catalyzed might be difficult to assay…). Did you bother to read what I wrote, viz:
Riddle me this: how does an enzyme catalyze a reaction? Answer: if and only if it binds with high affinity and specificity to the _________ state.
??? So, in this context, “binding” is the same thing as having “enzymatic activity”. Just fill in the blank above. For the fourth time:
They also had a technical problem in optimizing catalysis, but that limitation would not apply in actual living systems.
I will not say it.
Why hold back? You have already accused me of quote-mining. Impugn my motives all you want, but please, please learn some biochemistry. DNA_Jock
DNA_Jock: Your interventions become strangely unfair and out of context as soon as the Szostak paper is mentioned. Strange. Zachriel has more than once explicitly stated that the initial proteins in Szostak's library showed enzymatic activity, and that binding is the same thing as enzymatic activity. That is completely wrong, as you should know, and Dr JDD and I have simply tried to explain that simple fact to him. There is no special problem; everyone can be wrong, but it is better to acknowledge it when one is wrong. So, do you want to be wrong with Zachriel? Are you defending the statement that binding is the same thing as having an enzymatic activity? And that the paper says that the initial proteins had an enzymatic activity? Just to know. If, instead, you are trying simply to say the very trivial thing that binding is necessary for an enzymatic activity, which is so obvious that nobody in his own mind would try to deny it, and pretend that this is a refutation of what Dr JDD and I have said, then you are not simply a fool and an unfair discussant: you are... I will not say it. gpuccio
Good points above, but realize that catalytic activities have been repeatedly isolated from very short (7 amino acid) peptides: http://www.rsc.org/chemistryworld/2014/03/short-amyloid-peptides-self-assemble-catalyst-enzyme REC
Dr JDD wrote:
Biotin has an affinity for streptavidin (very very high).
Well, it would be a lot more accurate to say that streptavidin has a very high affinity for biotin. And that streptavidin has an important biological function, i.e. binding biotin.
An antibody has an affinity for an epitope / antigen. Often very high. Often inhibitory (blocking) in the case of a therapeutic.
Exactly, another example of a useful, biological function. Thus BMS-962476 binds to Proprotein convertase subtilisin kexin-9 and inhibits it, thereby lowering LDL levels. It’s a rather interesting alternative to anti-PCSK9 antibodies, such as evolocumab, bococizumab, and alirocumab, of which I am sure you have heard. How was this BMS-962476 developed? Using the Szostak technology… So, after reading your opening two paragraphs, I assumed you were supporting Zachriel’s position.
I cannot believe we have to debate if an enzyme and having an affinity for something are the same thing. How ridiculous.
It is rather ridiculous. Riddle me this: how does an enzyme catalyze a reaction? Answer: if and only if it binds with high affinity and specificity to the _________ state. Gpuccio wrote
However, I would say that Zachriel has at least one minor justification: he is not a biologist or medical doctor, and he has probably been confounded by the ambiguity of the paper itself.
Well, Zachriel seems to have a much better understanding of the Szostak protein evolution technology than the medical doctors here. Sad, since I have explained this to gpuccio twice already, writing
Phylos Inc demonstrated that using libraries of sizes of ~ 10^13 (e.g. USP 6,261,804), you could evolve peptides that bound to pretty much ANYTHING. Unfortunately, I can’t get much more specific, but here’s a “statement against interest”: the libraries produced better binders if the random peptide was anchored by an invariant ‘scaffold’. They used fibronectin, but I suspect that a bit of beta sheet at each end of the random peptide would have done the trick. They also had a technical problem in optimizing catalysis, but that limitation would not apply in actual living systems.
DNA_Jock
Anyways, the definition generally preferred is the one that corresponds to ‘stochastic’, which is in line with usage in mathematics, statistics, physics and engineering. Mutation, fixation etc are random merely in the sense of having a probability distribution. It is not a statement about causality.
Again, I think it's the wrong definition to describe something; better to use the word "incalculable", not "random". Randomness doesn't exist when there is something, even if that something is a variable. JimFit
The main point of the Szostak paper is that they found function relatively easily and with a frequency that resembles today's sequence libraries. It seems like a simple and straightforward paper. Don't forget that a molecule binding to a protein, especially one with as much potential energy as an NTP, will induce a conformational change in that protein. Simply binding ATP still serves various functions in cells today. Curly Howard
Dr JDD: Are these all enzymes then? No, but binding is a protein function often found in enzymes. gpuccio: They did not simply “amplify” the proteins, they intentionally “mutagenized” them, using mutagenic PCR, and selected them again, in rounds. That's correct, and the process of selection resulted in increasing affinity. Zachriel
Dr JDD: You are perfectly right, obviously. However, I would say that Zachriel has at least one minor justification: he is not a biologist or medical doctor, and he has probably been confounded by the ambiguity of the paper itself. That ambiguity, instead, has no justification at all, coming from people who certainly know what they are doing and what they are writing. gpuccio
Zachriel: I am sorry, but you are flat wrong. From the paper:
Single representatives of each of these protein families (round 8) were chosen for further study. Only 5–15% of the mRNA-displayed protein prepared from each of these clones binds to immobilized ATP and then elutes with free ATP under selection conditions, consistent with the 6.2% binding and elution with ATP for the library as a whole. One possible explanation for this low level of ATP-binding is conformational heterogeneity, possibly reflecting inefficient folding of these primordial protein sequences. In an effort to increase the proportion of these proteins that fold into an ATP-binding conformation, we mutagenized the library and carried out further rounds of in vitro selection and amplification.
Emphasis added. It is very clear. They did not simply "amplify" the proteins, they intentionally "mutagenized" them, using mutagenic PCR, and selected them again, in rounds. Your position is indefensible, I am sorry for you. gpuccio
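The procedure in the quoted passage (mutagenize the library, reselect for ATP binding, amplify, repeat) can be sketched as a toy loop. This is a hypothetical illustration, not the paper's mRNA-display protocol: the single numeric "affinity" score, the Gaussian mutation step, and the parameters `keep` and `copies` are all assumptions made for the sketch. Because the selection criterion is fixed by whoever writes the loop, it shows the kind of iterative selection both commenters are describing without settling whether that resembles natural selection.

```python
import random


def mutate(affinity, rng, sigma=0.1):
    # Hypothetical stand-in for one mutagenic PCR step: perturb the score at random
    return affinity + rng.gauss(0.0, sigma)


def directed_evolution(pool, rounds, keep, copies, seed=0):
    """Toy loop of mutagenesis, selection and amplification.

    pool   -- list of hypothetical affinity scores (arbitrary units)
    keep   -- how many variants survive each selection step
    copies -- how many mutant copies each survivor contributes
    Returns the final pool and the best score after each round.
    """
    rng = random.Random(seed)
    history = [max(pool)]
    for _ in range(rounds):
        # Mutagenize: every current variant yields several perturbed copies
        mutants = [mutate(a, rng) for a in pool for _ in range(copies)]
        # Select and amplify: the top binders (parents included) seed the next round
        pool = sorted(pool + mutants, reverse=True)[:keep]
        history.append(pool[0])
    return pool, history


# Usage: a weakly binding starting library and eight rounds (arbitrary choices)
start_rng = random.Random(1)
library = [start_rng.uniform(0.0, 0.1) for _ in range(50)]
final_pool, best = directed_evolution(library, rounds=8, keep=20, copies=5)
print([round(b, 3) for b in best])
```

Because parents compete alongside their mutants, the best score can never fall from one round to the next; that is a property of this particular sketch, not a claim about the experiment.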
Biotin has an affinity for streptavidin (very very high). An antibody has an affinity for an epitope / antigen. Often very high. Often inhibitory (blocking) in the case of a therapeutic. SiRNA has affinity for RNA. Are these all enzymes then? I cannot believe we have to debate if an enzyme and having an affinity for something are the same thing. How ridiculous. Dr JDD
JimFit @33 - atheists ...? Anyways, the definition generally preferred is the one that corresponds to 'stochastic', which is in line with usage in mathematics, statistics, physics and engineering. Mutation, fixation etc are random merely in the sense of having a probability distribution. It is not a statement about causality. Hangonasec
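Hangonasec's point, that "random" in the stochastic sense means "described by a probability distribution" rather than "uncaused", can be illustrated with a minimal sketch. It assumes a hypothetical sequence of 1,000 sites, each mutating independently with probability 0.01; those numbers and the function names are illustrative only. A seeded pseudo-random generator is completely deterministic, yet its output is random in exactly the statistical sense described: individual outcomes vary, while their distribution is predictable.

```python
import random


def count_mutations(n_sites, p, rng):
    # Each site mutates independently with probability p (one Bernoulli trial per site)
    return sum(1 for _ in range(n_sites) if rng.random() < p)


def simulate(n_sites=1000, p=0.01, trials=2000, seed=42):
    # A seeded pseudo-random generator is fully deterministic:
    # the same seed always reproduces the same sequence of counts
    rng = random.Random(seed)
    return [count_mutations(n_sites, p, rng) for _ in range(trials)]


counts = simulate()
mean = sum(counts) / len(counts)
# The counts are causally fixed by the seed, yet they follow a binomial
# distribution whose expected value is n_sites * p = 10
print(f"mean mutations per simulated sequence: {mean:.2f}")
```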
I hear a lot of atheists use the wrong definition of randomness in science. True randomness can exist only in a state of Nothingness: something random is something that isn't determined by anything and doesn't determine anything, and that can happen only in a state of absolute Nothingness, since Nothing lacks any law or cause that could determine something. When something in science is random, it doesn't mean what the atheists think; something random is something whose cause we cannot yet determine. JimFit
gpuccio: If the statement is about the proteins in the original library, the only “function” is a weak binding of ATP. Yes, that is the function they chose to look for. gpuccio: The whole paper used methodologically incorrect (in the context) procedures of artificial engineering to transform that basic “function”, which could be better defined as a simple chemical reaction, into something that could look a little bit more like a true “function” (strong binding of ATP, and some gross folding). It wasn't a simple chemical reaction (whatever you think that means), but a rough structure necessary to bind to ATP, with directed evolution increasing its specificity. gpuccio: If the original sequences were already “functional proteins”, as stated in the final paragraph, why not simply analyze them and describe their function? Because they are rare and difficult to isolate, they are repeatedly amplified. After several rounds of amplification, they determined they were binding ATP, and that they were descendants of just a few of the original molecules. Zachriel
Zachriel: Please, don't insist. Binding a ligand is obviously necessary to catalyze a chemical reaction in which that ligand takes part. But binding in itself is not an enzymatic activity. As I said before, some basic enzymatic activity was found later in a late derivative of the protein in the paper, but certainly not in the original molecules in the random library, where even the binding was really weak. I asked you where in the paper it was stated that "The original random protein exhibited enzymatic activity", as you had said. You quoted the following paragraph: “The frequency of occurrence of functional proteins in random sequence libraries appears to be similar to that observed for equivalent RNA libraries… In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms.”, which makes no mention at all of "enzymatic activity". The reason I think that the paragraph (and the paper) is wrong is that it exploits the essential ambiguity of the word "functional". If the statement is about the proteins in the original library, the only "function" is a weak binding of ATP. The whole paper used methodologically incorrect (in the context) procedures of artificial engineering to transform that basic "function", which could be better defined as a simple chemical reaction, into something that could look a little bit more like a true "function" (strong binding of ATP, and some gross folding). Why? If the original sequences were already "functional proteins", as stated in the final paragraph, why not simply analyze them and describe their function? Why derive a new protein from them by protein engineering, and center all the discussion on that final result? Pretending maybe that the same process could happen by natural selection? Bad methodology, bad paper, ambiguous discussion. gpuccio
gpuccio: Could you please explain what is the chemical reaction whose rate is increased by the participation of the proteins found in the original library? Binding is a basic process found in enzymes. You might also look at Seelig & Szostak, Selection and evolution of enzymes from a partially randomized non-catalytic scaffold, Nature 2007, which resulted in the isolation of novel ligases. gpuccio: And the “suggestion” you quote is exactly the wrong conclusion that is not supported in any way by the facts in the paper. Your question was where in the paper the statement was found. Why do you think Szostak is incorrect in his conclusions? Zachriel
Zachriel: From Wikipedia: "Enzymes are macromolecular biological catalysts." "Catalysis is the increase in the rate of a chemical reaction due to the participation of an additional substance called a catalyst." Could you please explain what is the chemical reaction whose rate is increased by the participation of the proteins found in the original library? A simple piece of advice: if you are not sure of the meaning of a word, just avoid using it. And the "suggestion" you quote is exactly the wrong conclusion that is not supported in any way by the facts in the paper. That's why I say that it is a bad paper. gpuccio
gpuccio: Could you please point to where in the paper that is stated? Essentially the entire paper. "The frequency of occurrence of functional proteins in random sequence libraries appears to be similar to that observed for equivalent RNA libraries... In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms." gpuccio: It means that they bound ATP weakly, not that they had enzymatic activity. Binding is a basic enzymatic activity, the purpose being to discriminate between all the various molecules in the cell. Zachriel
Hangonasec: Knockout rescue experiments are certainly more appropriate as models of NS. That is exactly the difference with the Szostak paper. However, those experiments have limitations too. You have anticipated a couple of them, and I could add a few others. As I am rather busy at the moment, and I owe a few answers to Piotr and Zachriel, I will not engage in that discussion immediately. But if you are interested, we can certainly deepen the discussion. The paper you point to is probably not the only one of that kind, and I am available to analyze that kind of approach in as much detail as I can. gpuccio
Zachriel: "The original random protein exhibited enzymatic activity" I don't think so. Could you please point to where in the paper that is stated? "What do you think “weak affinity” means?" It means that they bound ATP weakly, not that they had enzymatic activity. gpuccio
gpuccio: The original proteins in the library were random, but the final protein, which is the only one analyzed in the paper for folding etc., was engineered. The original random protein exhibited enzymatic activity. gpuccio: The proteins in the original library were not “active enzymes” at all. They were sequences with some weak affinity for ATP. What do you think "weak affinity" means? Zachriel
Synthetic peptides rescue function in E coli. Now, the purpose of the experiment was not to demonstrate evolution, but to investigate synthetic peptide manufacture. The basic repeat was 'designed' (more accurately, reverse-engineered from a 14-bit residue sequence pattern known to produce stable helical folds). But from a tiny library of 1.6 million variants (out of 10^53 possible variants), they found 4 different peptides which rescued 4/27 different auxotrophic knockout mutants. So these do have biological activity, and it would not be a huge surprise if they performed better than the knockouts in a selective competition. Given that only 27 knockouts were tried, out of 4000+ genes multiplied by however million species there are on earth that could have been tried, this is a remarkable hit rate. Of course they didn't start from scratch, so shelve that objection, but they certainly didn't design the peptides with any of the 4 functions in mind. It's recent work, so the peptides need to be characterised and the mechanism of replacement elucidated. But as it stands, it certainly points towards a richness of biological function in protein space that is at odds with that asserted by defenders of ID. Hangonasec
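The arithmetic behind this comment can be checked using only the figures it quotes: about 1.6 million variants, roughly 10^53 possible sequences, and 4 of 27 knockouts rescued. The 10^53 figure is taken from the comment as stated, not independently verified.

```python
# Figures quoted in the comment above
library_size = 1.6e6      # variants in the library
sequence_space = 1e53     # possible variants, as stated in the comment
rescued, knockouts_tried = 4, 27

fraction_sampled = library_size / sequence_space   # about 1.6e-47
rescue_rate = rescued / knockouts_tried             # about 0.148, i.e. roughly 15%

print(f"fraction of sequence space sampled: {fraction_sampled:.1e}")
print(f"knockouts rescued: {rescue_rate:.1%}")
```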
Zachriel at #20: I will look at that too. And I have nothing against experiments and Petri dishes. There are good experiments, and experiments based on wrong methodology. The Szostak experiment is in the second category. IMO. So, please, stop repeating the senseless objection that I am against all forms of design of an experiment. That is not true. I am simply against experiments badly designed, whose conclusions are not justified by the methodology used. gpuccio
Zachriel: Not exactly. The original proteins in the library were random, but the final protein, which is the only one analyzed in the paper for folding etc., was engineered. You say: "It tests whether random sequence proteins can fold into a protein with a basic enzymatic function. They can and do." No. If it were that way, they would have analyzed the properties of the proteins in the original library, and not those of the final engineered derivative. You say: "active enzymes are not that uncommon in random sequences" Wrong. The proteins in the original library were not "active enzymes" at all. They were sequences with some weak affinity for ATP. An enzyme is a molecule which accelerates a chemical reaction. Your statement is simply wrong. The final protein had a much stronger affinity for ATP and some folding, although still it was not an enzyme. If I remember well, further derivatives showed some basic enzymatic activity, under special conditions. And again, ATP-binding is not an enzymatic function, if no special reaction follows the binding. gpuccio
Wilson & Tucker, Fgf and Bmp signals repress the expression of Bapx1 in the mandibular mesenchyme and control the position of the developing jaw joint, Developmental Biology 2004. Zachriel
See also Singer et al., A wide variety of DNA sequences can functionally replace a yeast TATA element for transcriptional activation, Genes & Development 1990. It's an oldie, but shows the basic process. But the Petri dishes are not natural! Zachriel
gpuccio: Very simply, do you believe that anything that is “selectable” is “naturally selectable”? Depends what it means to select. If it means selecting for an intermediate structure rather than function, then it would not mimic natural selection. Natural selection works without regard to any knowledge of the structure, but selection only according to function. However, the Szostak experiment selected for ATP-binding, a common biological enzymatic function. Zachriel
gpuccio: if you want to state that a protein is naturally selectable, you have to show (not hypothesize) that it confers some reproductive advantage in a biological system. Oh gee whiz. gpuccio: The protein analyzed in Szostak’s paper was selected and engineered through laboratory tools which were very much intelligently designed to recognize a specific biochemical property even at low levels of expression, ... That is incorrect. The sequences were random, not engineered for a specific biochemical property. gpuccio: and then amplify that property by cycles of random variation and intelligent selection. That is correct. They were artificially selected for the specified function. gpuccio: That is not natural selection. No. It's called an *experiment*. It tests whether random sequence proteins can fold into a protein with a basic enzymatic function. They can and do. Our use of the term "selectable" referred to the experimental ability to distinguish the active enzyme from other sequences. Whether you consider it "natural selection" is immaterial to the fact that active enzymes are not that uncommon in random sequences. Zachriel
JonathanM, if you have time, could you briefly comment on this following study and how it might impact the thesis of 'random' exon shuffling? From my nose-bleed section, it seems pretty devastating to me:
Duality in the human genome - November 28, 2014 Excerpt: The results show that most genes can occur in many different forms within a population: On average, about 250 different forms of each gene exist. The researchers found around four million different gene forms just in the 400 or so genomes they analysed. This figure is certain to increase as more human genomes are examined. More than 85 percent of all genes have no predominant form which occurs in more than half of all individuals. This enormous diversity means that over half of all genes in an individual, around 9,000 of 17,500, occur uniquely in that one person - and are therefore individual in the truest sense of the word. The gene, as we imagined it, exists only in exceptional cases. "We need to fundamentally rethink the view of genes that every schoolchild has learned since Gregor Mendel's time." ... According to the researchers, mutations of genes are not randomly distributed between the parental chromosomes. They found that 60 percent of mutations affect the same chromosome set and 40 percent both sets. Scientists refer to these as cis and trans mutations, respectively. Evidently, an organism must have more cis mutations, where the second gene form remains intact. "It's amazing how precisely the 60:40 ratio is maintained. It occurs in the genome of every individual – almost like a magic formula," says Hoehe. http://medicalxpress.com/news/2014-11-duality-human-genome.html
bornagain77
Zachriel: By the way, just a simple question. Why did you (who are so careful with words) state: "They found selectable proteins" and not: "They found naturally selectable proteins"? Very simply, do you believe that anything that is "selectable" is "naturally selectable"? IOWs, do you believe that natural systems exist that can select for any property that an engineered system can recognize? Just to give an old example, correct English words? Are there natural systems which can do that? gpuccio
Zachriel: The distinction between Intelligent Selection and Natural Selection, which I have discussed in detail many times, including recently with DNA_Jock. In brief, if you want to state that a protein is naturally selectable, you have to show (not hypothesize) that it confers some reproductive advantage in a biological system. The protein analyzed in Szostak's paper was selected and engineered through laboratory tools which were very much intelligently designed to recognize a specific biochemical property even at low levels of expression, and then amplify that property by cycles of random variation and intelligent selection. That is not natural selection. The protein is selected by the active measurement of a biochemical affinity, and engineered laboratory cycles amplify that initial affinity. It is protein engineering, not natural selection. To show that naturally selectable proteins were present in his initial library, Szostak had to introduce those proteins in real biological systems (like bacterial cultures) and let those systems select new functions from the initial library. That's not what he did. gpuccio
gpuccio: And, beyond all the rest, one thing is certain: they were not naturally selectable proteins. What distinction are you attempting to draw? Zachriel
Zachriel: And, beyond all the rest, one thing is certain: they were not naturally selectable proteins. gpuccio
Zachriel: Old story. Debated many times here. Shall we start again? (OK, it was Jonathan who quoted it, I know... :) ) gpuccio
Jonathan M: Keefe and Szostak, 2001 They found selectable proteins in about 1 in 10^11 random sequences. That isn't exactly common, but is well within the posited limitations. Zachriel
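To see what a frequency of about 1 in 10^11 implies in practice, it can be turned into an expected number of hits for a given library size. The sketch below assumes each library member is independently functional with that probability; the helper names are hypothetical, and the library sizes are illustrative, with 10^13 borrowed from the ~10^13 Phylos libraries DNA_Jock mentions rather than from the Keefe and Szostak paper itself.

```python
import math


def expected_hits(library_size, frequency):
    # Mean number of functional sequences when each member is
    # independently functional with the given probability
    return library_size * frequency


def prob_at_least_one(library_size, frequency):
    # 1 - (1 - f)^N, computed stably with log1p/expm1 because f is tiny
    return -math.expm1(library_size * math.log1p(-frequency))


f = 1e-11  # frequency quoted for Keefe and Szostak (2001)
for n in (1e10, 1e11, 1e13):  # illustrative library sizes
    print(f"N = {n:.0e}: expected hits {expected_hits(n, f):g}, "
          f"P(at least one) = {prob_at_least_one(n, f):.3f}")
```

The independence assumption is what makes the expected count a simple product; with it, a library about ten times larger than 1/f is already very likely to contain at least one hit.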
Joe: I agree. In the measure that it is confirmed, it is an example of modular design and Object Oriented Programming. :) gpuccio
Exon shuffling- more evidence for Intelligent Design... Joe
JonathanM: Thank you for your wonderful OP. This is a very important topic, and it deserves a lot of attention. :) gpuccio
Hangonasec: I suppose Jonathan was using "folded protein structures" in the sense highlighted by Axe. However, Jonathan can answer himself, certainly. gpuccio
gpuccio, it wasn't Axe's statement, but Jonathan's I was taking issue with. 'Failed to find any folded protein structures' is at odds with numerous statements in the paper, from its title ("Selecting Folded Proteins from a Library of Secondary Structural Elements") onwards. Hangonasec
Hi JonathanM, long time no see, welcome back. :) This topic is important, but was lacking in substantive analysis. I am very glad you have now addressed the topic more fully than you did before. It plugs a huge hole in my references. bornagain77
Hangonasec, Thanks for finding that -- the blog editor actually had the 8 as a superscript but it apparently didn't appear as such on the blog post. I have now fixed it. Jonathan Jonathan M
Hangonasec: To be precise, they found: "Of the 1149 clones screened 4 sequences (0.3%; clones 5.1, 5.6, 5.26 and 5.31) were identified as proteins with significant amounts of secondary structure." Emphasis added. So, Axe's comment remains valid: “After a definitive demonstration that the most promising candidates were not properly folded, the authors concluded that “the selected clones should therefore not be viewed as ‘native-like’ proteins but rather ‘molten-globule-like’”, by which they mean that secondary structure is present only transiently flickering in and out of existence along a compact but mobile chain. This contrasts with native-like structure, where secondary structure is locked-in to form a well defined and stable tertiary fold. Their finding accords well with what we should expect in view of the above considerations. Indeed, it would be very puzzling if secondary structure were modular.” gpuccio
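The 0.3% figure in the quoted passage follows directly from the two counts it gives. A minimal check, using only those numbers:

```python
screened = 1149   # clones screened, per the quoted passage
hits = 4          # clones 5.1, 5.6, 5.26 and 5.31

hit_rate = hits / screened
print(f"{hit_rate:.2%}")  # 0.35%, which the paper reports as 0.3%
```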
I'm lukewarm about the idea that exons have any particular relevance to early protein evolution. After all, exon boundaries only manifest themselves at transcription/edit/translate time, and are not visible to mechanisms of replication and recombination. But what they do show is that, contrary to the thrust of this article, folded soluble peptides can be produced in multiple ways from the same basic elements. Hangonasec
A fairly recent study examined many different combinations of E. coli secondary structural elements (α-helices, β-strands and loops), assembling them “semirandomly into sequences comprised of as many as 800 amino acid residues” (Graziano et al., 2008). The researchers screened 108 variants for features that might suggest folded structure. They failed, however, to find any folded protein structures.
That's not what the paper says! (A typo, they screened 10^8 not 108. But they got a 0.3% hit rate). Hangonasec
