Uncommon Descent Serving The Intelligent Design Community

Axe on specific barriers to macro-level Darwinian Evolution due to protein formation (and linked islands of specific function)


A week ago, VJT put up a useful set of excerpts from Axe’s 2010 paper on proteins and the barriers they pose to Darwinian (blind watchmaker) evolution. During onward discussions, it proved useful to focus on some excerpts where Axe spoke to some numerical considerations and the linked idea of islands of specific function deeply isolated in amino acid (AA) sequence and protein fold domain space, though he did not use those exact terms.

I think it worthwhile to headline the clips for reference (instead of leaving them deep in a discussion thread):

_________________

ABSTRACT: >> Four decades ago, several scientists suggested that the impossibility of any evolutionary process sampling anything but a minuscule fraction of the possible protein sequences posed a problem for the evolution of new proteins. This potential problem—the sampling problem—was largely ignored, in part because those who raised it had to rely on guesswork to fill some key gaps in their understanding of proteins. The huge advances since that time call for a careful reassessment of the issue they raised. Focusing specifically on the origin of new protein folds, I argue here that the sampling problem remains. The difficulty stems from the fact that new protein functions, when analyzed at the level of new beneficial phenotypes, typically require multiple new protein folds, which in turn require long stretches of new protein sequence. Two conceivable ways for this not to pose an insurmountable barrier to Darwinian searches exist. One is that protein function might generally be largely indifferent to protein sequence. The other is that relatively simple manipulations of existing genes, such as shuffling of genetic modules, might be able to produce the necessary new folds. I argue that these ideas now stand at odds both with known principles of protein structure and with direct experimental evidence . . . >>

Pp 5 – 6: >> . . . we need to quantify a boundary value for m, meaning a value which, if exceeded, would solve the whole sampling problem. To get this we begin by estimating the maximum number of opportunities for spontaneous mutations to produce any new species-wide trait, meaning a trait that is fixed within the population through natural selection (i.e., selective sweep). Bacterial species are most conducive to this because of their large effective population sizes.³ So let us assume, generously, that an ancient bacterial species sustained an effective population size of 10^10 individuals [26] while passing through 10^4 generations per year. After five billion years, such a species would produce a total of 5 × 10^23 (= 5 × 10^9 × 10^4 × 10^10) cells that happen (by chance) to avoid the small-scale extinction events that kill most cells irrespective of fitness. These 5 × 10^23 ‘lucky survivors’ are the cells available for spontaneous mutations to accomplish whatever will be accomplished in the species. This number, then, sets the maximum probabilistic resources that can be expended on a single adaptive step. Or, to put this another way, any adaptive step that is unlikely to appear spontaneously in that number of cells is unlikely to have evolved in the entire history of the species.

In real bacterial populations, spontaneous mutations occur in only a small fraction of the lucky survivors (roughly one in 300 [27]). As a generous upper limit, we will assume that all lucky survivors happen to receive mutations in portions of the genome that are not constrained by existing functions⁴, making them free to evolve new ones. At most, then, the number of different viable genotypes that could appear within the lucky survivors is equal to their number, which is 5 × 10^23. And again, since many of the genotype differences would not cause distinctly new proteins to be produced, this serves as an upper bound on the number of new protein sequences that a bacterial species may have sampled in search of an adaptive new protein structure.

Let us suppose for a moment, then, that protein sequences that produce new functions by means of new folds are common enough for success to be likely within that number of sampled sequences. Taking a new 300-residue structure as a basis for calculation (I show this to be modest below), we are effectively supposing that the multiplicity factor m introduced in the previous section can be as large as 20^300 / 5×10^23 ~ 10^366. In other words, we are supposing that particular functions requiring a 300-residue structure are realizable through something like 10^366 distinct amino acid sequences. If that were so, what degree of sequence degeneracy would be implied? More specifically, if 1 in 5×10^23 full-length sequences are supposed capable of performing the function in question, then what proportion of the twenty amino acids would have to be suitable on average at any given position? The answer is calculated as the 300th root of (5×10^23)^-1, which amounts to about 83%, or 17 of the 20 amino acids. That is, by the current assumption proteins would have to provide the function in question by merely avoiding three or so unacceptable amino acids at each position along their lengths.

No study of real protein functions suggests anything like this degree of indifference to sequence. In evaluating this, keep in mind that the indifference referred to here would have to characterize the whole protein rather than a small fraction of it. Natural proteins commonly tolerate some sequence change without complete loss of function, with some sites showing more substitutional freedom than others. But this does not imply that most mutations are harmless. Rather, it merely implies that complete inactivation with a single amino acid substitution is atypical when the starting point is a highly functional wild-type sequence (e.g., 5% of single substitutions were completely inactivating in one study [28]). This is readily explained by the capacity of well-formed structures to sustain moderate damage without complete loss of function (a phenomenon that has been termed the buffering effect [25]). Conditional tolerance of that kind does not extend to whole proteins, though, for the simple reason that there are strict limits to the amount of damage that can be sustained.

A study of the cumulative effects of conservative amino acid substitutions, where the replaced amino acids are chemically similar to their replacements, has demonstrated this [23]. Two unrelated bacterial enzymes, a ribonuclease and a beta-lactamase, were both found to suffer complete loss of function in vivo at or near the point of 10% substitution, despite the conservative nature of the changes. Since most substitutions would be more disruptive than these conservative ones, it is clear that these protein functions place much more stringent demands on amino acid sequences than the above supposition requires.

Two experimental studies provide reliable data for estimating the proportion of protein sequences that perform specified functions [–> note the terms]. One study focused on the AroQ-type chorismate mutase, which is formed by the symmetrical association of two identical 93-residue chains [24]. These relatively small chains form a very simple folded structure (Figure 5A). The other study examined a 153-residue section of a 263-residue beta-lactamase [25]. That section forms a compact structural component known as a domain within the folded structure of the whole beta-lactamase (Figure 5B). Compared to the chorismate mutase, this beta-lactamase domain has both larger size and a more complex fold structure.

In both studies, large sets of extensively mutated genes were produced and tested. By placing suitable restrictions on the allowed mutations and counting the proportion of working genes that result, it was possible to estimate the expected prevalence of working sequences for the hypothetical case where those restrictions are lifted. In that way, prevalence values far too low to be measured directly were estimated with reasonable confidence.

The results allow the average fraction of sampled amino acid substitutions that are functionally acceptable at a single amino acid position to be calculated. By raising this fraction to the power l, it is possible to estimate the overall fraction of working sequences expected when l positions are simultaneously substituted (see reference 25 for details). Applying this approach to the data from the chorismate mutase and the beta-lactamase experiments gives a range of values (bracketed by the two cases) for the prevalence of protein sequences that perform a specified function. The reported range [25] is one in 10^77 (based on data from the more complex beta-lactamase fold; l = 153) to one in 10^53 (based on the data from the simpler chorismate mutase fold, adjusted to the same length: l = 153). As remarkable as these figures are, particularly when interpreted as probabilities, they were not without precedent when reported [21, 22]. Rather, they strengthened an existing case for thinking that even very simple protein folds can place very severe constraints on sequence. [–> Islands of function issue.]

Rescaling the figures to reflect a more typical chain length of 300 residues gives a prevalence range of one in 10^151 to one in 10^104. On the one hand, this range confirms the very highly many-to-one mapping of sequences to functions. The corresponding range of m values is 10^239 (= 20^300 / 10^151) to 10^286 (= 20^300 / 10^104), meaning that vast numbers of viable sequence possibilities exist for each protein function. But on the other hand it appears that these functional sequences are nowhere near as common as they would have to be in order for the sampling problem to be dismissed. The shortfall is itself a staggering figure—some 80 to 127 orders of magnitude (comparing the above prevalence range to the cutoff value of 1 in 5×10^23). So it appears that even when m is taken into account, protein sequences that perform particular functions are far too rare to be found by random sampling. >>
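For readers who want to check the figures in the excerpt above, here is a short Python sketch reproducing the arithmetic. The inputs (population size, generations per year, prevalence exponents) are Axe's stated assumptions as quoted, not new data:

```python
import math

# Maximum probabilistic resources ("lucky survivors"):
# 5e9 years x 1e4 generations/year x 1e10 effective population size
lucky_survivors = 5e9 * 1e4 * 1e10
assert lucky_survivors == 5e23

# If 1 in 5e23 full-length 300-residue sequences were functional, the average
# per-position tolerance would be the 300th root of (5e23)^-1:
per_site = (5e23) ** (-1 / 300)
print(f"per-position tolerance: {per_site:.1%}")  # ~83%, i.e. ~17 of 20 amino acids

# Shortfall between the measured prevalence at 300 residues
# (1 in 10^104 to 1 in 10^151) and the 1-in-5e23 cutoff, in orders of magnitude:
shortfall_low = 104 - math.log10(5e23)   # ~80
shortfall_high = 151 - math.log10(5e23)  # ~127
print(f"shortfall: {shortfall_low:.0f} to {shortfall_high:.0f} orders of magnitude")
```

This confirms the quoted values: a per-position tolerance of about 83%, and a shortfall of roughly 80 to 127 orders of magnitude.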

Pp 9 – 11: >> . . . If aligned but non-matching residues are part-for-part equivalents, then we should be able to substitute freely among these equivalent pairs without impairment. Yet when protein sequences were even partially scrambled in this way, such that the hybrids were about 90% identical to one of the parents, none of them had detectable function. Considering the sensitivity of the functional test, this implies the hybrids had less than 0.1% of normal activity [23]. So part-for-part equivalence is not borne out at the level of amino acid side chains.

In view of the dominant role of side chains in forming the binding interfaces for higher levels of structure, it is hard to see how those levels can fare any better. Recognizing the non-generic [–> that is, specific and context sensitive] nature of side chain interactions, Voigt and co-workers developed an algorithm that identifies portions of a protein structure that are most nearly self-contained in the sense of having the fewest side-chain contacts with the rest of the fold [49]. Using that algorithm, Meyer and co-workers constructed and tested 553 chimeric proteins that borrow carefully chosen blocks of sequence (putative modules) from any of three natural beta-lactamases [50]. They found numerous functional chimeras within this set, which clearly supports their assumption that modules have to have few side chain contacts with exterior structure if they are to be transportable.

At the same time, though, their results underscore the limitations of structural modularity. Most plainly, the kind of modularity they demonstrated is not the robust kind that would be needed to explain new protein folds. The relatively high sequence similarity (34–42% identity [50]) and very high structural similarity of the parent proteins (Figure 8) favors successful shuffling of modules by conserving much of the overall structural context. Such conservative transfer of modules does not establish the robust transportability that would be needed to make new folds. Rather, in view of the favorable circumstances, it is striking how low the success rate was. After careful identification of splice sites that optimize modularity, four out of five tested chimeras were found to be completely non-functional, with only one in nine being comparable in activity to the parent enzymes [50]. In other words, module-like transportability is unreliable even under extraordinarily favorable circumstances [–> these are not, generally speaking, standard bricks that will freely fit together in any plug-in compatible pattern to assemble a new structure] . . . .

Graziano and co-workers have tested robust modularity directly by using amino acid sequences from natural alpha helices, beta strands, and loops (which connect helices and/or strands) to construct a large library of gene segments that provide these basic structural elements in their natural genetic contexts [52]. For those elements to work as robust modules, their structures would have to be effectively context-independent, allowing them to be combined in any number of ways to form new folds. A vast number of combinations was made by random ligation of the gene segments, but a search through 10^8 variants for properties that may be indicative of folded structure ultimately failed to identify any folded proteins. After a definitive demonstration that the most promising candidates were not properly folded, the authors concluded that “the selected clones should therefore not be viewed as ‘native-like’ proteins but rather ‘molten-globule-like’” [52], by which they mean that secondary structure is present only transiently, flickering in and out of existence along a compact but mobile chain. This contrasts with native-like structure, where secondary structure is locked in to form a well-defined and stable tertiary fold . . . .

With no discernible shortcut to new protein folds, we conclude that the sampling problem really is a problem for evolutionary accounts of their origins. The final thing to consider is how pervasive this problem is . . . Continuing to use protein domains as the basis of analysis, we find that domains tend to be about half the size of complete protein chains (compare Figure 10 to Figure 1), implying that two domains per protein chain is roughly typical. This of course means that the space of sequence possibilities for an average domain, while vast, is nowhere near as vast as the space for an average chain. But as discussed above, the relevant sequence space for evolutionary searches is determined by the combined length of all the new domains needed to produce a new beneficial phenotype. [–> Recall, courtesy Wiki, phenotype: “the composite of an organism’s observable characteristics or traits, such as its morphology, development, biochemical or physiological properties, phenology, behavior, and products of behavior (such as a bird’s nest). A phenotype results from the expression of an organism’s genes as well as the influence of environmental factors and the interactions between the two.”]

As a rough way of gauging how many new domains are typically required for new adaptive phenotypes, the SUPERFAMILY database [54] can be used to estimate the number of different protein domains employed in individual bacterial species, and the EcoCyc database [10] can be used to estimate the number of metabolic processes served by these domains. Based on analysis of the genomes of 447 bacterial species,¹¹ the projected number of different domain structures per species averages 991.¹² Comparing this to the number of pathways by which metabolic processes are carried out, which is around 263 for E. coli,¹³ provides a rough figure of three or four new domain folds being needed, on average, for every new metabolic pathway.¹⁴ In order to accomplish this successfully, an evolutionary search would need to be capable of locating sequences that amount to anything from one in 10^159 to one in 10^308 possibilities,¹⁵ something the neo-Darwinian model falls short of by a very wide margin. >>
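The closing one-in-10^159 to one-in-10^308 range follows from multiplying the per-domain prevalence figures together, assuming (as the excerpt implicitly does) that each required domain must be found independently. A quick sketch of that multiplication:

```python
# Per-domain prevalence exponents from the mutagenesis studies (l = 153):
# 1 in 10^53 (simpler chorismate mutase fold) to 1 in 10^77 (beta-lactamase domain).
simpler_fold, complex_fold = 53, 77

# Roughly three to four new domains per new metabolic pathway
# (991 projected domains / 263 pathways ~ 3.8):
domains_low, domains_high = 3, 4
assert 3 < 991 / 263 < 4

# Joint prevalence = product of per-domain prevalences, so exponents add:
best_case = simpler_fold * domains_low    # 3 simpler domains -> 1 in 10^159
worst_case = complex_fold * domains_high  # 4 complex domains -> 1 in 10^308
print(f"1 in 10^{best_case} to 1 in 10^{worst_case}")  # 1 in 10^159 to 1 in 10^308
```

The exponents add because independent probabilities multiply, which reproduces the quoted range exactly.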
____________________

Those who argue for incrementalism or exaptation and fortuitous coupling or Lego brick-like modularity or the like need to address these and similar issues. END

PS: For the objectors eager to queue up: remember, the Darwinism-support essay challenge on actual evidence for the tree of life, from the root up to the branches and twigs, is still open after more than two years, with the following revealing Smithsonian Institution diagram showing the first reason why, right at the root of the tree of life:

[Smithsonian Institution "tree of life" diagram, full size]

No root, no shoots, folks. (The root must include a viable explanation of gated encapsulation, protein-based metabolism and cell functions, code-based protein assembly, and the von Neumann self-replication facility keyed to reproducing the cell.)

Comments
It's always interesting to go back through previous threads to see what Darwinists argued then and what they say now. keiths @ 10:
There is nothing anti-Darwinian about Wagner’s thesis.
Mung @ 45:
That’s about as non-darwinian as you can get.
keiths @ 193:
No, Mung’s objective was to argue that Wagner’s book was non-Darwinian, when it clearly isn’t.
keiths @ 215:
There is nothing anti-Darwinian about Wagner’s thesis.
This provides a good glimpse into the psychology of visitors to UD such as keiths. I never said Wagner's book was anti-Darwinian, and keiths even acknowledges this (more than the one time I showed). According to keiths, the book is clearly not non-Darwinian because it is not anti-Darwinian, and therefore I must be wrong to say anything about it is non-Darwinian. keiths has not shown that the book is not non-Darwinian, and in fact he now wants to argue that the book is anti-ID. Therefore it must be Darwinian. Right?

Mung — November 14, 2014 at 05:45 PM (PDT)
KF, Perhaps keiths can answer "The Challenge." We're still waiting for Darwin's Champion to appear.

Mung — November 14, 2014 at 05:03 PM (PDT)
pp: feel free to explain and expand. Is this a 5,000-dimension config space, which is rather small for such -- x1, x2 . . . x5000? (Phase spaces routinely go over 10^20 or more dimensions.) KF

kairosfocus — November 14, 2014 at 05:00 PM (PDT)
Keith, Arrival of the Fittest is a cool book so far for me. Just finished chapter 3, where Wagner proposes a 5000-dimension Universal Library. And I thought 10-dimension string theory was wild :)

ppolish — November 14, 2014 at 04:55 PM (PDT)
KS: I interpret your claims as an offer to answer based on observational evidence that warrants, here and now, the causal adequacy of blind watchmaker mechanisms from the root of the Darwinist tree of life up. Kindly cf the PS I added to the OP, and the diagram from the Smithsonian. KF

PS: If you care to look here, you will see why it is a matter of fairly easily observed fact (commonplace all around us) that function dependent on specific configuration of interacting components imposes stringent limits on functional clusters of configs, relative to vastly many more that are clumped but non-functional, and even more that are scattered. If you cannot tell why that is when confronted by an Abu 6500 C3 reel and a bag of parts for same, to be shaken up till they somehow assemble in functional form, then -- with all due respect -- you have a problem with patent facts, to go with the longstanding one of denying self-evident first truths of reasoning. If you think that this does not relate to molecular processes in the cell, then ponder the NC machine that uses coded strings to control protein assembly. Selective hyperskepticism and/or denialism about the reality of islands of functional configs in the space of possible clumped/scattered configs is not a healthy sign.

kairosfocus — November 14, 2014 at 04:42 PM (PDT)
keith s, I know you are overjoyed whenever you find anything that might cast doubt on the design inference, but, I hate to burst your nihilistic bubble, unsubstantiated criticism of Axe's work is a dime a dozen. Almost every claim that unguided evolution can produce functional proteins is based on 'assuming the conclusion', i.e. evolution is assumed to be true throughout the process of investigation and is never allowed to be questioned:
Proteins Did Not Evolve Even According to the Evolutionist's Own Calculations but so What, Evolution is a Fact - Cornelius Hunter - July 2011
Excerpt: For instance, in one case evolutionists concluded that the number of evolutionary experiments required to evolve their protein (actually it was to evolve only part of a protein and only part of its function) is 10^70 (a one with 70 zeros following it). Yet elsewhere evolutionists computed that the maximum number of evolutionary experiments possible is only 10^43. Even here, giving the evolutionists every advantage, evolution falls short by 27 orders of magnitude.
http://darwins-god.blogspot.com/2011/07/response-to-comments-proteins-did-not.html

Now Evolution Must Have Evolved Different Functions Simultaneously in the Same Protein - Cornelius Hunter - Dec. 1, 2012
Excerpt: In one study evolutionists estimated the number of attempts that evolution could possibly have to construct a new protein. Their upper limit was 10^43. The lower limit was 10^21. These estimates are optimistic for several reasons, but in any case they fall short of the various estimates of how many attempts would be required to find a small protein. One study concluded that 10^63 attempts would be required for a relatively short protein. And a similar result (10^65 attempts required) was obtained by comparing protein sequences. Another study found that 10^64 to 10^77 attempts are required. And another study concluded that 10^70 attempts would be required. In that case the protein was only a part of a larger protein which otherwise was intact, thus making the search easier. These estimates are roughly in the same ballpark, and compared to the first study giving the number of attempts possible, you have a deficit ranging from 20 to 56 orders of magnitude. Of course it gets much worse for longer proteins.
http://darwins-god.blogspot.com/2012/12/now-evolution-must-have-evolved.html?showComment=1354423575480#c6691708341503051454
The following article exposes how Darwinists severely twist the data once it reaches the popular press:
The Hierarchy of Evolutionary Apologetics: Protein Evolution Case Study - Cornelius Hunter - January 2011 http://darwins-god.blogspot.com/2011/01/hierarchy-of-evolutionary-apologetics.html
i.e. Nobody, not even Darwinists themselves, ever demonstrates that unguided Darwinian processes can produce functional proteins, but they assume throughout the process of investigation that Darwinism has done so and argue vigorously from that perspective, especially once the research reaches the level of the popular press. But the question being asked all along is exactly that: 'can unguided processes generate functional proteins?' All the empirical evidence we have says that unguided Darwinian processes are grossly inadequate for the generation of novel proteins. If you disagree with that, then please post the exact peer-reviewed paper that refutes Dr. Behe's 'First Rule' paper, which examined four decades of laboratory evolution experiments and found not even a single novel protein, and 'that even the great majority of helpful mutations degrade the genome to a greater or lesser extent':
“The First Rule of Adaptive Evolution”: Break or blunt any functional coded element whose loss would yield a net fitness gain - Michael Behe - December 2010
Excerpt: In its most recent issue The Quarterly Review of Biology has published a review by myself of laboratory evolution experiments of microbes going back four decades . . . The gist of the paper is that so far the overwhelming number of adaptive (that is, helpful) mutations seen in laboratory evolution experiments are either loss or modification of function. Of course we had already known that the great majority of mutations that have a visible effect on an organism are deleterious. Now, surprisingly, it seems that even the great majority of helpful mutations degrade the genome to a greater or lesser extent . . . I dub it “The First Rule of Adaptive Evolution”: Break or blunt any functional coded element whose loss would yield a net fitness gain.
http://behe.uncommondescent.com/2010/12/the-first-rule-of-adaptive-evolution/
Michael Behe talks about the preceding paper on this podcast:
Michael Behe: Challenging Darwin, One Peer-Reviewed Paper at a Time - December 2010 http://intelligentdesign.podomatic.com/player/web/2010-12-23T11_53_46-08_00
Moreover, Axe has defended his work on numerous occasions. The following is one of my favorite defences by him:
Show Me: A Challenge for Martin Poenie - Douglas Axe - August 16, 2013
Excerpt: Poenie wants to be free to appeal to evolutionary processes for explaining past events without shouldering any responsibility for demonstrating that these processes actually work in the present. That clearly isn't valid. Unless we want to rewrite the rules of science, we have to assume that what doesn't work (now) didn't work (then). It isn't valid to think that evolution did create new enzymes if it hasn't been demonstrated that it can create new enzymes. And if Poenie really thinks this has been done, then I'd like to present him with an opportunity to prove it. He says, "Recombination can do all the things that Axe thinks are impossible." Can it really? Please show me, Martin! I'll send you a strain of E. coli that lacks the bioF gene, and you show me how recombination, or any other natural process operating in that strain, can create a new gene that does the job of bioF within a few billion years.
http://www.evolutionnews.org/2013/08/a_challenge_for075611.html
So basically, keith s, since we have all seen the over-the-top bluff and bluster of Darwinists before, if you truly want to falsify ID then show us the empirical evidence from Lenski's E. coli, or some other similar experiment, where a novel protein, or better yet a molecular machine, was generated by unguided Darwinian processes.

bornagain77 — November 14, 2014 at 03:37 PM (PDT)
keiths:
I will be quoting liberally from Andreas Wagner’s new book Arrival of the Fittest. I highly recommend this book to anyone involved in the ID debate, whether pro or con.
You've already stated that there's nothing new in it. Now that you're actually reading it, have you changed your mind?

Mung — November 14, 2014 at 03:30 PM (PDT)
Cross-posting this from vjtorley's new thread: Vincent, of all the points you raise in your OP, Axe's argument is going to be the most fun for me to criticize, but also the most technically involved. I will be quoting liberally from Andreas Wagner's new book Arrival of the Fittest. I highly recommend this book to anyone involved in the ID debate, whether pro or con. You will be hearing about it again and again, so you need to understand its contents. Denyse did an OP on the book, thinking it was anti-Darwinian. Boy oh boy, was she ever wrong. This book is full of bad news for ID. It's well-written and fascinating. I think that ID supporters will enjoy it, if they can get past the sinking feeling they'll experience when they realize the dire implications for ID. The 'islands of function' argument for ID was already unsustainable, but this book nails the coffin lid shut. Just thought I'd give readers advance notice in case they want to order the book or download it onto their e-readers. PS Thanks again, Denyse, for bringing the book to my attention. :-)

keith s — November 14, 2014 at 02:29 PM (PDT)
Hallelujah! A KF thread with open comments!

keith s — November 14, 2014 at 02:23 PM (PDT)
