Uncommon Descent | Serving The Intelligent Design Community

Is the CSI concept well-founded mathematically, and can it be applied to the real world, giving real and useful numbers?


Those who have been following the recently heated exchanges on the theory of intelligent design, and on the key design inference on tested, empirically reliable signs through the ID explanatory filter, will know that a key development in recent months was the meteoric rise of the mysterious internet persona MathGrrl (who is evidently NOT the Calculus Prof who has long used the same handle).

MG, as the handle is abbreviated, is well known for “her” confident-manner assertion — now commonly stated as if it were established fact in the Darwin Zealot fever swamps that are backing the current cyberbullying tactics that have tried to hold my family hostage — that:

without a rigorous mathematical definition and examples of how to calculate [CSI], the metric is literally meaningless. Without such a definition and examples, it isn’t possible even in principle to associate the term with a real world referent.

As the strike-through emphasises, every one of these claims has long been exploded.

You doubt me?

Well, let us cut down the clip from the CSI Newsflash thread of April 18, 2011, which was further discussed in a footnote thread of May 10 (H’mm, the anniversary of the German attack on France in 1940), and which was again clipped yesterday at fair length.

( BREAK IN TRANSMISSION: BTW, antidotes to the intoxicating Darwin Zealot fever swamp “MG dunit” talking points were collected here — Graham, why did you ask the question but never stop by to discuss the answer? And the “rigour” question was answered step by step at length here. In a nutshell, as the real MathGrrl will doubtless be able to tell you, the Calculus itself was historically founded on sound mathematical intuitions about limits and infinitesimals, warranting astonishing insights and empirical successes for 200 years. And when mathematics was finally advanced enough to provide an axiomatic basis — at the cost of the sanity of a mathematician or two [doff caps for a minute in memory of Cantor] — it became plain that such a basis was so difficult that it could not have been developed in C17. Had there been an undue insistence on absolute rigour, as opposed to reasonable warrant, the great breakthroughs of physics and other fields that crucially depended on the power of the Calculus would not have happened. For real-world work, what we need is reasonable warrant and empirical validation of models and metrics, so that we know them to be sufficiently reliable to be used. The design inference is backed up by the infinite monkeys analysis tracing to statistical thermodynamics, and is strongly empirically validated on billions of test cases, the whole Internet and the collection of libraries across the world being just a sample of the point that the only credibly known source of functionally specific complex organisation and associated information [FSCO/I] is design. )

After all, a bit of  careful citation always helps:

_________________

>>1 –> 10^120 ~ 2^398

2 –> Following Hartley, information on an item of probability p may be measured on a log metric:

I = – log(p) . . .  eqn n2
3 –> So, we can re-present the Chi-metric:
[where, from Dembski, Specification 2005,  χ = – log2[10^120 ·ϕS(T)·P(T|H)]  . . . eqn n1]
Chi = – log2(2^398 * D2 * p)  . . .  eqn n3
Chi = Ip – (398 + K2) . . .  eqn n4
4 –> That is, the Dembski CSI Chi-metric is a measure of Information for samples from a target zone T on the presumption of a chance-dominated process, beyond a threshold of at least 398 bits, covering 10^120 possibilities.
5 –> Where also, K2 is a further increment to the threshold that naturally peaks at about 100 further bits . . . .
6 –> So, the idea of the Dembski metric in the end — debates about peculiarities in derivation notwithstanding — is that if the Hartley-Shannon-derived information measure for items from a hot or target zone in a field of possibilities is beyond 398 – 500 or so bits, the zone is so deeply isolated that a chance-dominated process is maximally unlikely to find it; but of course intelligent agents routinely produce information beyond such a threshold.

7 –> In addition, the only observed cause of information beyond such a threshold is the action of the now proverbial intelligent semiotic agents.
8 –> Even at 398 bits that makes sense as the total number of Planck-time quantum states for the atoms of the solar system [most of which are in the Sun] since its formation does not exceed ~ 10^102, as Abel showed in his 2009 Universal Plausibility Metric paper. The search resources in our solar system just are not there.
9 –> So, we now clearly have a simple but fairly sound context to understand the Dembski result, conceptually and mathematically [cf. more details here]; tracing back to Orgel and onward to Shannon and Hartley . . . .
As in (using Chi_500 for VJT’s CSI_lite [UPDATE, July 3: and S for a dummy variable that is 1/0 accordingly as the information in I is empirically or otherwise shown to be specific, i.e. from a narrow target zone T, strongly UNREPRESENTATIVE of the bulk of the distribution of possible configurations, W]):
Chi_500 = Ip*S – 500,  bits beyond the [solar system resources] threshold  . . . eqn n5
Chi_1000 = Ip*S – 1000, bits beyond the observable cosmos, 125 byte/ 143 ASCII character threshold . . . eqn n6
Chi_1024 = Ip*S – 1024, bits beyond a 2^10, 128 byte/147 ASCII character version of the threshold in n6, with a config space of 1.80*10^308 possibilities, not 1.07*10^301 . . . eqn n6a
[UPDATE, July 3: So, if we have a string of 1,000 fair coins and toss them at random, we will by overwhelming probability expect to get a near 50-50 distribution typical of the bulk of the 2^1,000 possibilities W. On the Chi_500 metric, I would be high, 1,000 bits, but S would be 0, so the value for Chi_500 would be – 500, i.e. well within the possibilities of chance. However, if we came to the same string later and saw that the coins somehow now had the bit pattern of the ASCII codes for the first 143 or so characters of this post, we would have excellent reason to infer that an intelligent designer, using choice contingency, had intelligently reconfigured the coins. That is because, using the same I = 1,000 capacity value, S is now 1, and so Chi_500 = 500 bits beyond the solar system threshold. If the 10^57 or so atoms of our solar system, for its lifespan, were to be converted into coins and tables etc., and tossed at an impossibly fast rate, it would still be impossible to sample enough of the possibilities space W to have confidence that something from so unrepresentative a zone T could reasonably be explained on chance. So, as long as an intelligent agent capable of choice is possible, choice — i.e. design — would be the rational, best explanation on the sign observed: functionally specific, complex information.]
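As a minimal numeric sketch of the coin example just described (Python used purely for illustration; the function name is mine, and S is simply the 1/0 dummy variable of eqn n5):

```python
def chi_500(i_bits, s):
    """Eqn n5: Chi_500 = Ip*S - 500, bits beyond the solar-system threshold.
    i_bits -- information-carrying capacity I of the observed configuration
    s      -- dummy variable: 1 if the configuration is from a narrow,
              independently describable target zone T, else 0."""
    return i_bits * s - 500

i = 1000  # 1,000 fair coins = 1,000 bits of capacity

print(chi_500(i, 0))  # -500: a typical random toss, well within the reach of chance
print(chi_500(i, 1))  # +500: the same capacity, but now functionally specific
```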
10 –> Similarly, the work of Durston and colleagues, published in 2007, fits this same general framework . . . .
We use the formula log (20) – H(Xf) to calculate the functional information at a site specified by the variable Xf such that Xf corresponds to the aligned amino acids of each sequence with the same molecular function f. The measured FSC for the whole protein is then calculated as the summation of that for all aligned sites. The number of Fits quantifies the degree of algorithmic challenge, in terms of probability [info and probability are closely related], in achieving needed metabolic function. For example, if we find that the Ribosomal S12 protein family has a Fit value of 379, we can use the equations presented thus far to predict that there are about 10^49 different 121-residue sequences that could fall into the Ribsomal S12 family of proteins, resulting in an evolutionary search target of approximately 10^-106 percent of 121-residue sequence space. In general, the higher the Fit value, the more functional information is required to encode the particular function in order to find it in sequence space . . . .
11 –> So, Durston et al are targeting the same goal, but have chosen a different path from the start-point of the Shannon-Hartley log probability metric for information. That is, they use Shannon’s H, the average information per symbol, and address shifts in it from a ground to a functional state on investigation of protein family amino acid sequences. They also do not identify an explicit threshold for degree of complexity. [Added, Apr 18, from comment 11 below:] However, their information values can be integrated with the reduced Chi metric:
Using Durston’s Fits from his Table 1, in the Dembski style metric of bits beyond the threshold, and simply setting the threshold at 500 bits:
RecA: 242 AA, 832 fits, Chi: 332 bits beyond
SecY: 342 AA, 688 fits, Chi: 188 bits beyond
Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond  . . . results n7
The two metrics are clearly consistent . . .  (Think about the cumulative fits metric for the proteins for a cell . . . )
In short one may use the Durston metric as a good measure of the target zone’s actual encoded information content, which Table 1 also conveniently reduces to bits per symbol so we can see how the redundancy affects the information used across the domains of life to achieve a given protein’s function; not just the raw capacity in storage unit bits [= no.  of  AA’s * 4.32 bits/AA on 20 possibilities, as the chain is not particularly constrained.]>>

_________________
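As a quick cross-check of results n7 above, a minimal Python sketch (the Fit values are those quoted from Durston's Table 1; the 500-bit threshold is that of eqn n5):

```python
# Fit values quoted above from Durston et al., Table 1
fits = {"RecA": 832, "SecY": 688, "Corona S2": 1285}

THRESHOLD = 500  # bits, the solar-system threshold of eqn n5

for protein, fit_value in fits.items():
    chi = fit_value - THRESHOLD  # Chi_500 with S = 1, since the function is observed
    print(f"{protein}: {fit_value} fits -> {chi} bits beyond the threshold")

# RecA: 832 fits -> 332 bits beyond the threshold
# SecY: 688 fits -> 188 bits beyond the threshold
# Corona S2: 1285 fits -> 785 bits beyond the threshold
```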

So, there we have it folks:

I: Dembski’s CSI metric is closely related to standard and widely used work in Information theory, starting with I = – log p

II: It is reducible, on taking the appropriate logs, to a measure of information beyond a threshold value

III: The threshold is reasonably set by referring to the accessible search resources of a relevant system, i.e. our solar system or the observed cosmos as a whole.

IV: Where, once an observed configuration — event E, per NFL — that bears or implies information is from a separately and “simply” describable narrow zone T that is strongly unrepresentative — that’s key — of the space of possible configurations, W, then

V: since the search applied covers only a very small fraction of W, it is unreasonable to expect chance to account for E in T, rather than for the far more typical possibilities in W which carry, in aggregate, overwhelming statistical weight.

(For instance the 10^57 or so atoms of our solar system will go through about 10^102 Planck-time Quantum states in the time since its founding on the usual timeline. 10^150 possibilities [500 bits worth of possibilities] is 48 orders of magnitude beyond that reach, where it takes 10^30 P-time states to execute the fastest chemical reactions.  1,000 bits worth of possibilities is 150 orders of magnitude beyond the 10^150 P-time Q-states of the about 10^80 atoms of our observed cosmos. When you are looking for needles in haystacks, you don’t expect to find them on relatively tiny and superficial searches.)
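A minimal sketch of the needle-in-haystack arithmetic just given, working in orders of magnitude and using only the figures quoted above (Python for illustration):

```python
# Orders of magnitude (powers of ten), as quoted above
pt_states_solar  = 102  # ~10^102 Planck-time quantum states, solar system's ~10^57 atoms to date
pt_states_cosmos = 150  # ~10^150 Planck-time quantum states, ~10^80 atoms of the observed cosmos

space_500_bits  = 150   # 2^500  ~ 10^150 possible configurations
space_1000_bits = 301   # 2^1000 ~ 10^301 possible configurations

print(space_500_bits - pt_states_solar)    # 48: orders of magnitude beyond the solar system's reach
print(space_1000_bits - pt_states_cosmos)  # 151 (~150): beyond the reach of the observed cosmos
```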

VI: Where also, in empirical investigations we observe that an aspect of an object, system, process or phenomenon that is controlled by mechanical necessity will show itself in low contingency. A dropped heavy object reliably falls at acceleration g. We can set up differential equations and model how events will play out from a given starting condition, i.e. we identify an empirically reliable natural law.

VII: By contrast, highly contingent outcomes — those that vary significantly on similar initial conditions — reliably trace to chance factors and/or choice; e.g. we may drop a fair die and it will tumble to a value essentially by chance. (This is in part an ostensive definition, by key example and family resemblance.) Or, I may choose to compose a text string, writing it this way or that. Or, as the 1,000-coin string example above shows, coins may be strung by chance or by choice.

VIII: Choice and chance can be reliably empirically distinguished, as we routinely do in day to day life, decision-making, the court room, and fields of science like forensics.  FSCO/I is one of the key signs for that and the Dembski-style CSI metric helps us quantify that, as was shown.

IX:  Shown, based on a reasonable reduction from standard approaches, and shown by application to real world cases, including biologically relevant ones.

We can safely bet, though, that you would not have known that this was done months ago — over and over again — in response to MG’s challenge, if you were going by the intoxicating fulminations billowing up from the fever swamps of the Darwin zealots.

Let that be a guide to evaluating their credibility — and, since this was repeatedly drawn to their attention and just as repeatedly brushed aside in the haste to go on beating the even more intoxicating talking point drums,  sadly, this also raises serious questions on the motives and attitudes of the chief ones responsible for those drumbeat talking points and for the fever swamps that give off the poisonous, burning strawman rhetorical fumes that make the talking points seem stronger than they are.  (If that is offensive to you, try to understand: this is coming from a man whose argument as summarised above has repeatedly been replied to by drumbeat dismissals without serious consideration, led on to the most outrageous abuses by the more extreme Darwin zealots (who were too often tolerated by host sites advocating alleged “uncensored commenting,” until it was too late), culminating now in a patent threat to his family by obviously unhinged bigots.)

And, now you also know the most likely why of TWT’s attempt to hold my family hostage by making the mafioso-style threat: we know you, we know where you are and we know those you care about. END

Comments
Indium:
Ok, what about the search space? You seem to assume that the search space is based on the total number of AA acids.
How do you calculate the size of a search space? One poster here claimed that because there were only four possible bases in DNA the size of the search space was 4. Do you agree?
Mung
July 21, 2011 at 11:04 AM PDT
Doveton @130 At first I thought you were making a very funny joke. "Randomly generated sprinkler system?" I think people get their ideas about evolution from X-Men and Heroes. It's this magic force that randomly drops fully-developed, useful gifts on unsuspecting individuals.
ScottAndrews
July 21, 2011 at 10:19 AM PDT
The search space volume is vastly exaggerated. Evolution does not have to explore the full sequence space when there are intermediate bridges.
IOW, it doesn't have to search the ocean for islands. It just follows bridges. Leaving aside that the bridges are purely hypothetical, this does nothing to mitigate the problem. If points B, C, D, E, etc. are only accessible by bridges originating from point A, then A is still a primary target. (Not to mention that evolution must create its bridges, not find them.) You could explain this away by reasoning that point A is also arbitrary. After all, maybe biology could have taken some different form. But that's just supporting speculation with more speculation. Let's explain the natural occurrence of current biology before we start imagining new ones.
ScottAndrews
July 21, 2011 at 10:13 AM PDT
Chris,
But if the need to install a sprinkler system meant the difference between life and death, you wouldn’t have much of an opportunity for a random search before the house burnt down!
You would if all the houses in a given neighborhood had reproductive capability and could pass on improvements to the next generation of houses that didn't burn down all the way due to the randomly generated sprinkler system in the parent stock of houses.
Doveton
July 21, 2011 at 09:44 AM PDT
Gpuccio It is almost funny how you dance around the main points: 1. The target space designation is post hoc and arbitrary. 2. The search space volume is vastly exaggerated. Evolution does not have to explore the full sequence space when there are intermediate bridges. So the only question that remains is this: can we construct such bridges or not? Your fancy numbers, however, have no meaning at all; they can tell us nothing regarding the question whether or how some sequence evolved. You will always have to look at potential precursors to do an exact calculation. This is probably why you are concentrating on RP S2: it is so old that it is quite unlikely that we will be able to build a model of its evolutionary history. In any case there *are* models for the origin of the superfamilies, for example this paper. On the other hand I am happy to accept that we don't know exactly how these extremely old parts of the genome evolved. I accept your result that they did not form randomly, so, ahem, well done!
Indium
July 21, 2011 at 09:26 AM PDT
But if the need to install a sprinkler system meant the difference between life and death, you wouldn't have much of an opportunity for a random search before the house burnt down!
Chris Doyle
July 21, 2011 at 08:19 AM PDT
There are a number of beneficial enhancements I could make to my home. (Within the context, those would include giving it mobility, sentience, and the ability to self-repair.) Can I mitigate the absurdity of finding a new bathroom via a random search by reasoning that it's not the only possible improvement? After all, that random search could find a pool table or a second floor instead.
ScottAndrews
July 21, 2011 at 08:14 AM PDT
Joseph: "All this anti-ID CSI bashing and there STILL isn’t any evidence that genetic accidents can accumulate in such a way as to give rise to new, useful and functional multi-part systems." A great truth indeed! Perhaps if darwinists gave less time to bashing ID, and more time to imagination, we could have a greater number of just so stories about why the darwinian theory works :) .gpuccio
July 21, 2011 at 06:37 AM PDT
Indium: I am horrified at your epistemology, but anyway, what can I expect from a convinced darwinist?

Durston is making an estimate of the target space, not an exact measure. A margin of error is possible, in both directions. It is anyway a reasonable estimate, the best I have found up to now. Darwinists don't even try, even though that estimate is crucial for their theory. They seem to prefer ignorance, and arrogance. Or they just cheat. The best attempt at an estimate from the darwinian part is the famous Szostak paper, which is methodologically completely wrong.

About the abundance of "solutions": the fact remains that each solution must be found in a specific context (organism, environment). As I have tried to show, in a highly organized and complex organism, like a bacterium, whatever the environment, new solutions of some importance are usually few and complex. That's because a really new function must not only work, but also be compatible with the organism, and integrated with its already existing machinery. For instance, in this discussion I am deliberately avoiding considering higher levels of complexity and improbability, for instance that the new protein must be correctly transcribed and translated, at the right time and in the right quantity; in other words, it must be integrated into the complex regulation system of the cell. I am discussing only the sheer improbability of the basic sequence.

But going back to the discussion: let's imagine that, in a certain cell and in a certain environment, there may be, say, 10^3 (about 2^10) new complex proteins that, as they are, could confer by themselves a reproductive advantage (IMO, I am very generous here). Let's assume that each of them has a probability of being generated by chance of 1 in 2^462 (462 bits). Well, how much would the probability be of hitting at least one of the 1000 solutions? You must sum the target spaces of the solutions, while the search space remains the same. So, the general probability would still be about 1 in 2^452, that is, 452 bits. Not a great improvement, as you can see. So, unless you believe in forests of functional solutions everywhere (well, you are a darwinist, it should not be difficult for you), your position is still rather questionable.

You say: "Before a lottery every single person has a very small probability to win, after the lottery one person has won it with a quite high probability."

Well, this is really simple ignorance or just, as you say, "a cheap trick". In a lottery, the probability of someone winning is 1 (100%). The victory of a ticket is a necessary event. In the scenario I described, the probability of at least one of the 1000 functional solutions being found is 1 in 2^452. We are near the UPB of 500 bits. I would say that it is not the same thing. Maybe even a darwinist should understand that. And yet, the "lottery" argument comes back once in a while, together with other ridiculous "pseudo-darwinian" arguments (I am trying here not to offend the few serious darwinists, who would never use such propaganda tools).

You say: "You seem to assume that the search space is based on the total number of AA acids."

It is.

"But this is quite obviously false, since, as you admit, in reality you have to look at the number of "new" AAs."

No, if you have read my posts carefully, the meaning should be clear. But we can make it more clear.
If we are evaluating the dFSCI of a whole molecule, such as Ribosomal S2 in the Durston table, we refer to all the AAs, and calculate the reduction of uncertainty for each AA position on the total number of aligned sequences, and then sum the results. The global functional complexity of the molecule is the improbability of the whole molecule emerging from a random walk. If, instead, you want to compute the functional complexity of a transition, you take into account only the new AAs. That point is very clear in the Durston paper. Have you read it?

So, please let us know what the precursor of the family of Ribosomal S2 is, and we can do a transition calculation. The only reason why we calculate the whole functional complexity for protein families is that no precursor is known. Indeed, I would say that no precursor exists. Please, remember that the basic protein superfamilies are totally unrelated at the sequence level. Therefore, using a protein from one superfamily as a precursor for another superfamily is no different from starting from a random sequence. It is, however, a random walk.

"Every selectable intermediate step makes your argument weaker and weaker. If there is a viable evolutionary series from a starting to the end point, your argument breaks down (as you admit)."

Yes. You are right. Like all scientific arguments, my argument can be falsified. That is the reason why it is a scientific argument (let me be popperian, once in a while :) ). So, please falsify it.

"This means that in the end you are just making a quite elaborate god of the gaps argument."

No. I am making a quite elaborate scientific and falsifiable argument. Please, review your epistemology, and stop the propaganda.

"If nobody can show you with a pathetic level of detail how something evolved you just assume design."

This is not correct. If you have followed the argument, other things are necessary. And why "pathetic"? A credible level of detail would do. Never seen any in darwinist reasonings.

"As kf says, it's just Paley's watch all over again."

Kf forever! And Paley's watch is a very good argument. Always has been. Maybe not as "quite elaborate" as mine :) . But very good just the same.
gpuccio
July 21, 2011 at 06:33 AM PDT
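A minimal sketch of the arithmetic in gpuccio's comment above: summing roughly 2^10 target spaces, each assumed to be hit by chance with probability about 1 in 2^462, still leaves an aggregate probability of roughly 1 in 2^452 (Python for illustration; all figures are the ones assumed in the comment):

```python
from math import log2

n_solutions = 2 ** 10        # ~1,000 selectable new proteins assumed available
p_one_target = 2.0 ** -462   # assumed chance of hitting any one target space

# Target spaces add while the search space stays the same, so the probabilities sum.
p_any_target = n_solutions * p_one_target

print(-log2(p_any_target))   # 452.0 -> about 1 in 2^452, barely better than 1 in 2^462
```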
PS: Durston et al: _________ >> The measure of Functional Sequence Complexity, denoted as ζ, is defined as the change in functional uncertainty from the ground state H(Xg(ti)) to the functional state H(Xf(ti)), or ζ = ΔH(Xg(ti), Xf(tj)). (6) The resulting unit of measure is defined on the joint data and functionality variable, which we call Fits (or Functional bits). The unit Fit thus defined is related to the intuitive concept of functional information, including genetic instruction and, thus, provides an important distinction between functional information and Shannon information [6,32]. Eqn. (6) describes a measure to calculate the functional information of the whole molecule, that is, with respect to the functionality of the protein considered. The functionality of the protein can be known and is consistent with the whole protein family, given as inputs from the database. However, the functionality of a sub-sequence or particular sites of a molecule can be substantially different [12]. The functionality of a sub-molecule, though clearly extremely important, has to be identified and discovered. This problem of estimating the functionality as well as where it is expressed at the sub-molecular level is currently an active area of research in our group. To avoid the complication of considering functionality at the sub-molecular level, we crudely assume that each site in a molecule, when calculated to have a high measure of FSC, correlates with the functionality of the whole molecule. The measure of FSC of the whole molecule is then the total sum of the measured FSC for each site in the aligned sequences. Considering that there are usually only 20 different amino acids possible per site for proteins, Eqn. (6) can be used to calculate a maximum Fit value/protein amino acid site of 4.32 Fits/site. We use the formula log (20) - H(Xf) to calculate the functional information at a site specified by the variable Xf such that Xf corresponds to the aligned amino acids of each sequence with the same molecular function f. The measured FSC for the whole protein is then calculated as the summation of that for all aligned sites. The number of Fits quantifies the degree of algorithmic challenge, in terms of probability, in achieving needed metabolic function. For example, if we find that the Ribosomal S12 protein family has a Fit value of 379, we can use the equations presented thus far to predict that there are about 10^49 different 121-residue sequences that could fall into the Ribosomal S12 family of proteins, resulting in an evolutionary search target of approximately 10^-106 percent of 121-residue sequence space. In general, the higher the Fit value, the more functional information is required to encode the particular function in order to find it in sequence space. A high Fit value for individual sites within a protein indicates sites that require a high degree of functional information. High Fit values may also point to the key structural or binding sites within the overall 3-D structure. Since the functional uncertainty, as defined by Eqn. (1) is proportional to the -log of the probability, we can see that the cost of a linear increase in FSC is an exponential decrease in probability. For the current approach, both equi-probability of monomer availability/reactivity and independence of selection at each site within the strand can be assumed as a starting point, using the null state as our ground state.
For the functional state, however, an a posteriori probability estimate based on the given aligned sequence ensemble must be made. Although there are a variety of methods to estimate P(Xf(t)), the method we use here, as an approximation, is as follows. First, a set of aligned sequences with the same presumed function, is produced by methods such as CLUSTAL, downloaded from Pfam. Since real sequence data is used, the effect of the genetic code on amino acid frequency is already incorporated into the outcome. Let the total number of sequences with the specified function in the set be denoted by M. The data set can be represented by the N-tuple X = (X1, ... XN) where N denotes the aligned sequence length as mentioned earlier. The total number of occurrences, denoted by d, of a specific amino acid "aa" in a given site is computed. An estimate for the probability that the given amino acid will occur in that site Xi, denoted by P(Xi = "aa") is then made by dividing the number of occurrences d by M, or, P(Xi = "aa") = d/M. (7) For example, if in a set of 2,134 aligned sequences, we observe that proline occurs 351 times at the third site, then P ("proline") = 351/2,134. Note that P ("proline") is a conditional probability for that site variable on condition of the presumed function f. This is calculated for each amino acid for all sites. The functional uncertainty of the amino acids in a given site is then computed using Eqn. (1) using the estimated probabilities for each amino acid observed. The Fit value for that site is then obtained by subtracting the functional uncertainty of that site from the null state, in this case using Eqn. (4), log20. The individual Fit values for each site can be tabulated and analyzed. The summed total of the fitness values for each site can be used as an estimate for the overall FSC value for the entire protein and compared with other proteins. >> __________ All of this is reasonable, and is related to how information content of real world codes or examples of languages generating text is estimated.kairosfocus
July 21, 2011 at 06:11 AM PDT
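A minimal sketch of the per-site Fit calculation quoted in the comment above (Eqns 1, 4, 6 and 7), using a toy alignment rather than a real Pfam download; Python is used for illustration and the function names are mine:

```python
from math import log2
from collections import Counter

def site_fits(column):
    """Fit value for one aligned site: ground-state uncertainty log2(20)
    minus the functional-state uncertainty H(Xf), estimated from the
    observed amino-acid frequencies P(Xi = 'aa') = d/M (Eqn 7)."""
    m = len(column)
    h_functional = -sum((d / m) * log2(d / m) for d in Counter(column).values())
    return log2(20) - h_functional

def protein_fits(alignment):
    """Estimated FSC of the whole protein: sum of per-site Fit values."""
    return sum(site_fits(column) for column in zip(*alignment))

# Toy alignment of four sequences over three sites (real work uses hundreds
# of aligned Pfam sequences for the protein family in question).
toy_alignment = ["MKV", "MKL", "MRV", "MKV"]
print(round(protein_fits(toy_alignment), 2))  # ~11.34 Fits for this toy case
```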
Indium: Pardon, but I must be direct. Why is it that you keep on dodging easily accessible corrective information? For instance, kindly cf 111 above, where you will see as a PS:
Indium, sampling theory will tell us that a random or reasonably random sample will be representative of a space. Have you seen how the relative frequency patterns of symbols in codes are identified? By precisely the sort of sampling approach Durston et al took; even though someone out there actually composed and published a whole book in which the letter E never occurs. You are being selectively hyperskeptical.
The question you raise as though it is a crippling objection to what Durston et al did, is in fact an objection to reasonable extension of a standard practice in evaluating the information content of coded messages: see symbol frequencies and use that to estimate the probabilities needed in Shannon's H-metric of avg info per symbol. In the case of Durston et al, if say 5 of the 20 AAs can work -- functional state as opposed to ground state -- in a given position, across the world of life, then we see that the relevant ratio for information measure is 1 in 4 not 1 in 20 [so, less info per symbol at that point: 2 functional bits not 4.32 for that position . . . and in my own look see on Cytochrome C, that was the sort of range up to, where some positions were pretty well fixed to just one AA], and we can go across the aligned segments of the relevant protein families like that. Why not look at point 9 here [scroll down], for a bit of a discussion? Bradley:
9] Recently, Bradley has done further work on this, using Cytochrome C, which is a 110-monomer protein. He reports, for this case (noting along the way that Shannon information is of course really a metric of information-carrying capacity and using Brillouin information as a measure of complex specified information, i.e IB = ICSI below), that: Cytochrome c (protein) -- chain of 110 amino acids of 20 types. If each amino acid has pi = .05, then average information “i” per amino acid is given by log2 (20) = 4.32. The total Shannon information is given by I = N * i = 110 * 4.32 = 475, with total number of unique sequences “W0” that are possible is W0 = 2^I = 2^475 = 10^143. Amino acids in cytochrome c are not equiprobable (pi ≠ 0.05) as assumed above. If one takes the actual probabilities of occurrence of the amino acids in cytochrome c [i.e. by observing relative frequencies across the various forms of this protein across the domain of life], one may calculate the average information per residue (or link in our 110 link polymer chain) to be 4.139 using i = – Σ pi log2 pi [TKI NB: which is related of course to the Boltzmann expression for S]. Total Shannon information is given by I = N * i = 4.139 x 110 = 455. The total number of unique sequences “W0” that are possible for the set of amino acids in cytochrome c is given by W0 = 2^455 = 1.85 x 10^137 . . . . Some amino acid residues (sites along chain) allow several different amino acids to be used interchangeably in cytochrome-c without loss of function, reducing i from 4.19 to 2.82 and I (i x 110) from 475 to 310 (Yockey): M = 2^310 = 2.1 x 10^93 = W1; Wo / W1 = 1.85 x 10^137 / 2.1 x 10^93 = 8.8 x 10^44. Recalculating for a 39 amino acid racemic prebiotic soup [as Glycine is achiral] he then deduces (appar., following Yockey): W1 is calculated to be 4.26 x 10^62; Wo/W1 = 1.85 x 10^137 / 4.26 x 10^62 = 4.35 x 10^74; ICSI = log2 (4.35 x 10^74) = 248 bits. He then compares results from two experimental studies: Two recent experimental studies on other proteins have found the same incredibly low probabilities for accidental formation of a functional protein that Yockey found 1 in 10^75 (Strait and Dewey, 1996) and 1 in 10^65 (Bowie, Reidhaar-Olson, Lim and Sauer, 1990).
--> Of course, to make a functioning life form we need dozens of proteins and other similar information-rich molecules all in close proximity and forming an integrated system, in turn requiring a protective enclosing membrane.
--> The probabilities of this happening by the relevant chance conditions and natural regularities alone, in aggregate are effectively negligibly different from zero in the gamut of the observed cosmos.
--> But of course, we know that agents, sometimes using chance and natural regularities as part of what they do, routinely produce FSCI-rich systems. [Indeed, that is just what the Nanobots and Micro-jets thought experiment shows by a conceivable though not yet technically feasible example.]
GEM of TKI
kairosfocus
July 21, 2011 at 06:02 AM PDT
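A minimal sketch re-using the Brillouin-style numbers quoted from Bradley in the comment above (Python for illustration; W0 and W1 are the values as given there):

```python
from math import log2

W0 = 1.85e137  # total sequence possibilities for cytochrome c, as quoted (2^455)
W1 = 4.26e62   # functional possibilities, 39-acid racemic prebiotic-soup case, as quoted

ratio = W0 / W1
print(f"{ratio:.2e}")      # ~4.34e+74
print(round(log2(ratio)))  # ~248 bits, i.e. I_CSI = log2(W0 / W1)
```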
All this anti-ID CSI bashing and there STILL isn't any evidence that genetic accidents can accumulate in such a way as to give rise to new, useful and functional multi-part systems.
Joseph
July 21, 2011 at 05:10 AM PDT
Indium,
Evolution searches in the vicinity of viable and reproducing organisms and therefore only “tests” an incredibly tiny amount of the total possible DNA sequences, for example. And as long as there is a working path between a starting and an end point, evolution might have the resources to find it.
You're describing a speculative hypothesis - that between two points there is an evolutionary path - as if it were an observed phenomenon like cloud formation or digestion. It's a bit clearer if we refer to these "searches" and paths as what they are - speculative hypotheses. Otherwise we risk giving them too much weight and credibility.
ScottAndrews
July 21, 2011 at 05:08 AM PDT
Gpuccio: I find this
Durston need not know anything like that. What Durston says is very simple: given the existing variants of that protein with that function in the whole proteome, the functional complexity computed by calculating the reduction of uncertainty for each AA position derived from the existing sequences is such and such.
a bit hard to parse, but anyway: you explicitly said that to determine dFSCI one has to determine the target space (the number of sequences performing a specific function). Durston cannot exhaust the total target space for the function he is looking at, since he doesn't know which other, maybe even much shorter, sequences might have a similar function or the same function in a different environment. But we are repeating ourselves here. And of course the restraints on evolution are huge. But to look at just one function and then declare the evolution of *exactly* this solution incredibly unlikely is kind of a cheap trick. Before a lottery every single person has a very small probability of winning; after the lottery one person has won it with quite high probability. Ok, what about the search space? You seem to assume that the search space is based on the total number of AA acids. But this is quite obviously false, since, as you admit, in reality you have to look at the number of "new" AAs. Every selectable intermediate step makes your argument weaker and weaker. If there is a viable evolutionary series from a starting to the end point, your argument breaks down (as you admit). This means that in the end you are just making a quite elaborate god-of-the-gaps argument. If nobody can show you with a pathetic level of detail how something evolved, you just assume design. As kf says, it's just Paley's watch all over again.
Indium
July 21, 2011 at 04:51 AM PDT
GP: Excellent. G
kairosfocus
July 21, 2011 at 02:05 AM PDT
Indium: A few simple comments for you:

1) You ask: "How does Durston know he has exhausted all possible ways to generate the same function in the phenotype?"

Durston need not know anything like that. What Durston says is very simple: given the existing variants of that protein with that function in the whole proteome, the functional complexity computed by calculating the reduction of uncertainty for each AA position derived from the existing sequences is such and such. I have commented, in my post #103 (to you): "The Durston method, applying the Shannon reduction of uncertainty to single amino acid positions in large and old protein families, is a reasonable method to approximate the target space, provided we assume that in those cases the functional space has been almost completely traversed by neutral evolution, which is a very reasonable assumption, supported by all existing data." The fact is, proteins diverge in a protein family because of neutral evolution. So, they "explore" their target space in the course of natural history. That is the big bang theory of protein evolution, and it is supported by observed facts. It is true that many protein families, even old ones, are still diverging, but the general idea is that in old families most of the target space has been explored. This is a reasonable, empirically based assumption.

2) You ask: "How does Durston know which other functions could have evolved instead of the one he is looking at? Which other "targets" there might have been?"

Durston need not know anything like that. He is just measuring functional complexity in protein families. My definition of dFSCI, too, has no relationship with this "problem". I have explicitly stated that dFSCI must be calculated for one explicitly defined function. So, what do you mean with your question? Well, I will try to state it more consistently for you. It is a form of an old objection, that I usually call the "any possible function" objection. I have answered that objection many times, and I will do that again for you now. The objection, roughly stated, goes as follows: "But evolution need not find a specific functional target. It can reach any possible functional target, any possible function. So, its chances of succeeding are huge."

Well, that is not true at all. Darwinian evolution has two severe restraints:

a) It must reach targets visible to NS

b) It must reach targets visible to NS in an already existing, and very complex, and very integrated system (the replicator).

Obviously, b) is valid only for darwinian evolution "after" OOL, let's say certainly after LUCA. If you prefer to discuss OOL, we can do that, but I am not sure it would be better for you. :)

So, what does that mean? It means that, unless and until a target which is naturally selectable in a specific replicator emerges, NS cannot "come into action", and therefore the only "tool" acting according to darwinian theory is Random Variation. So, let's try to define how you should formulate a calculation of dFSCI for the emergence of some new functionally complex property in some bacterial species. What you simply should do is:

a) Define a series of steps, one leading to the other, that are naturally selectable.

b) Each step must have the following properties:

1) It has a definite reproductive advantage versus the previous step (is naturally selectable). IOWs, adding that variation to an existing bacterial population (the previous step), the new trait will rapidly expand to all or most of the population.
2) The transition from each to the following one must not be too complex (certainly, it must not be of the dFSCI order of magnitude), so that it is in the range of random variation.

3) The final transition, from the initial state to the final state, whatever the number of intermediate steps, must be complex enough to exhibit dFSCI. IOWs, the new function emerging as the final result of all the transitions must have a functional complexity, let's say, of at least 150 bits. For simplicity, let's say that it must include at least 35 new AAs, each of them necessary for the new function.

Well, show me such a path, explicitly proposed and if possible verified in the lab, for any of the existing complex protein families. That is what is meant by a serious scientific theory. If that does not exist, not even for one case, then the darwinian theory is what it really is: a just-so story.

c) You say: "I don't know how big the target space is and neither do you, I guess. But when you follow Gpuccio's algorithm to determine dFSCI you have to calculate it. And my point is that if you assume there was just this "WEASEL" that had to be reached you're underestimating the target space by a vast amount. Since this concentration on just one target is one of the weaknesses of Dawkins's WEASEL I don't understand why ID people would make this error."

First of all, nobody is supposing that there is just one "WEASEL". The Durston method approximates the target space for one function, and that target space is usually very big. Take the values for Ribosomal S2, for instance, a protein of 197 AAs, analyzed in a family of 605 sequences. The search space is 851 bits. The functional complexity is "only" 462 bits. That means that the method computes the target space for that protein at 389 bits, that is, 2^389 functional sequences. That is hardly a small functional space, I would say. The method works, and it works very well.

Moreover, the weakness of the WEASEL example is another one, and very simple: the algorithm already knows the solution. It is a very trivial example of intelligent selection.
gpuccio
July 21, 2011 at 01:49 AM PDT
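A minimal sketch of the Ribosomal S2 arithmetic in gpuccio's comment above (Python for illustration; 197 AAs and 462 Fits are the Durston figures quoted there):

```python
from math import log2

length_aa = 197  # aligned length of Ribosomal S2, as quoted from Durston's table
fits = 462       # measured functional complexity, in Fits

search_space_bits = length_aa * log2(20)      # raw sequence capacity, ~851 bits
target_space_bits = search_space_bits - fits  # ~389 bits, i.e. ~2^389 functional sequences

print(round(search_space_bits))  # 851
print(round(target_space_bits))  # 389
```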
Morning Lizzie, We certainly do have different mental images here! I think that’s because I’m talking about self-replication and you’re talking about something else (I’m trying to think of a technical term for disassembling and reassembling lego blocks… but can’t)! Crucially, the ability of the first self-replicating molecule to self-replicate (a small part of which may involve splitting down the middle) comes from its molecular shape and content so, that will include its sequence. The process of self-replication does not and cannot rely upon random free-floating monomers being in the right place at the right time: self-replication needs to be far more self-contained than that or else the parent would almost certainly lose the daughter (and itself!) in the very process of reproduction. Besides, if self-replication was merely the “ability to split down the middle, and for both halves to then attract to each now-unmated unit” then surely the most likely outcome would be for the two halves of the peptide chain to merely come back together again. So, the self-replicating molecule would (and we’re dramatically oversimplifying here) need to start off with something like: AC CA CA DB DB BD CA CA AC AC DB BD And then, after a self-contained and straightforward process of self-replication, we end up with two: AC AC CA CA CA CA DB DB DB DB BD BD CA CA CA CA AC AC AC AC DB DB BD BD Given that such a self-replicating molecule must have existed if life just made itself, then I can see no scope for copying error that will not impair or destroy the ability to self-replicate. There is only perfect and eternal cloning. And, this heredity is a much more important feature of life than copying errors. So, tell me Lizzie, how can we realistically move beyond this first self-replicating molecule?Chris Doyle
July 21, 2011 at 01:27 AM PDT
ScottAndrews:
It’s misleading to imply that the big picture changes just because there are multiple potential targets.
I don't know how big the target space is and neither do you, I guess. But when you follow Gpuccio's algorithm to determine dFSCI you have to calculate it. And my point is that if you assume there was just this "WEASEL" that had to be reached, you're underestimating the target space by a vast amount. Since this concentration on just one target is one of the weaknesses of Dawkins's WEASEL, I don't understand why ID people would make this error. At the same time, you guys often vastly overestimate the search space. Evolution searches in the vicinity of viable and reproducing organisms and therefore only "tests" an incredibly tiny amount of the total possible DNA sequences, for example. And as long as there is a working path between a starting and an end point, evolution might have the resources to find it. Of course it might still be the case that you can in some way prove that certain developments biologists think have happened are extremely unlikely. But you cannot prove this by vastly overestimating the search space and at the same time vastly underestimating the potential target space to arrive at a magic ultra-small probability.
Indium
July 20, 2011 at 11:58 PM PDT
The concept of a target is highly misleading anyway. Whatever biological structure you look at, it was never a target that had to be reached.
That makes it sound a bit like you can't spit without hitting a potential biological structure. The truth is that every known biological structure and anything we can imagine when combined amount to a really tiny target, like a pinhead on the moon. It's misleading to imply that the big picture changes just because there are multiple potential targets.
ScottAndrews
July 20, 2011 at 07:19 PM PDT
Elizabeth Liddle:
Each of these then “mates” with the appropriate A, B, C and D monomers in the environment, resulting in two chains that are identical to the parent chain.
How long will the chain need to be? Why can't it just be one monomer long? Why do you think a chain of monomers like that would contain any information? How much information would the chain contain? How do you propose to measure the amount of information?
Mung
July 20, 2011 at 04:59 PM PDT
3. The concept of a target is highly misleading anyway. Whatever biological structure you look at, it was never a target that had to be reached.
Neither was the watch that Paley stumbled upon while out walking on the heath.
But that's simply not true. Human-designed objects are the result of striving for a target. Biological structures are not. It's a key difference. It's the most important difference between things that are designed and things that have the appearance of design. The chain of descent means that there is no individual that can be said to be an intermediate between species, because every individual is a transitional. There is no abstract form involved, just instances of living things that have descended continuously from other living things.
Petrushka
July 20, 2011 at 03:01 PM PDT
3. The concept of a target is highly misleading anyway. Whatever biological structure you look at, it was never a target that had to be reached.
Neither was the watch that Paley stumbled upon while out walking on the heath.
Mung
July 20, 2011 at 01:44 PM PDT
Even if a mutant could still self-replicate, if there is no real competition for resources (because the original strain flourishes along with the mutant strain), then why should the original strain die out?
Why didn't the most efficient self-replicator gobble up all the resources required for it to self-replicate?
Mung
July 20, 2011 at 01:21 PM PDT
Mung: Yup, searches can do better than average. The problem comes in when the search is next to nil relative to the space and is looking for a needle in the haystack, as I showed for Indium above. The average search under those conditions has next to no chance of catching something that is deeply isolated. The NFL does not FORBID, but the circumstances create a practical impossibility, quite similar to how the statistical form of the 2nd law of thermodynamics does not forbid classical 2nd law violations, but the rarity in the space of possibilities locks them out for all practical purposes once you are above a reasonable threshold of system scale/complexity. And, there is no evidence -- Zachriel's little games notwithstanding, Indium -- that we have the sort of abundance that would make functional states TYPICAL of the config space. And, remember, that starts at 500 - 1,000 bits worth of configs. GEM of TKI PS: Indium, sampling theory will tell us that a random or reasonably random sample will be representative of a space. Have you seen how the relative frequency patterns of symbols in codes are identified? By precisely the sort of sampling approach Durston et al took; even though someone out there actually composed and published a whole book in which the letter E never occurs. You are being selectively hyperskeptical.
kairosfocus
July 20, 2011 at 01:16 PM PDT
You don't need to respond Lizzie. Carry on with Chris and gpuccio.
Mung
July 20, 2011 at 01:08 PM PDT
Elizabeth Liddle:
It’s also the title of a pair of theorems by Wolpert and MacReady which don’t apply to evolutionary algorithms, and which Dembski tries to apply to evolutionary algorithms here...
Another source:
What is true, though, is that the NFL theorems, while perfectly applicable to all kinds of algorithms including the Darwinian evolutionary algorithms (with a possible exception for co-evolution), contrary to Dembski's assertions, do not in any way prohibit Darwinian evolution. The NFL theorems do not at all prevent evolutionary algorithms from outperforming a random sampling (or blind search) because these theorems are about performance averaged over all possible fitness functions. They say nothing about performance of different algorithms on specific fitness landscapes. In real-life situations, it is the performance on a specific landscape that counts and this is where evolutionary algorithms routinely outperform random searches and do so very efficiently, both when the processes are targeted (as in Dawkins's algorithm –see [8]) and when they are non-targeted (as Darwinian evolution is). here
Mung
July 20, 2011 at 12:56 PM PDT
It may be true that we cannot both be right; however it is not necessarily true that one of us is lying. Do try to remember that there is a difference between a mistake and a lie, Mung, it is quite important. But you'll have to wait for my response as I've got something I need to do urgently which might take a few days. See you later.
Elizabeth Liddle
July 20, 2011 at 12:54 PM PDT
Elizabeth Liddle:
It’s also the title of a pair of theorems by Wolpert and MacReady which don’t apply to evolutionary algorithms, and which Dembski tries to apply to evolutionary algorithms here...
LOL! You are so amazing at times. Simply amazing. You just spout off without having a clue about what you're talking about. And if you will just say anything, and even claim to believe it to be true, though it is false, what am I supposed to call that? One of the first objections to Dembski was that while NFL theorems are applicable to EAs, evolution is not a search, therefore Dembski is wrong. [A non sequitur, at that.] The question is: is he wrong about EAs?
H. Allen Orr:
The NFL theorems compare the efficiency of evolutionary algorithms; roughly speaking, they ask how often different search algorithms reach a target within some number of steps.
And then:
The problem with all this is so simple that I hate to bring it up. But here goes: Darwinism isn't trying to reach a prespecified target...Evolution isn't searching for anything and Darwinism is not therefore a search algorithm.
One of you has to be wrong. One of you is not telling the truth. Orr then goes on to say:
The proper conclusion is that evolutionary algorithms are flawed analogies for Darwinism.
You Darwinists really ought to get your stories straight. For not only are you wrong about Dembski and NFL, you've argued repeatedly here at UD that evolutionary algorithms are a great analogy for Darwinian evolution. You and H. Allen Orr cannot both be right. http://bostonreview.net/BR27.3/orr.html See more: http://www.iscid.org/boards/ubb-get_topic-f-6-t-000240.html
Mung
July 20, 2011 at 12:40 PM PDT
How does Durston know he has exhausted all possible ways to generate the same function in the phenotype? How does Durston know which other functions could have evolved instead of the one he is looking at? Which other "targets" there might have been?
Indium
July 20, 2011 at 11:49 AM PDT
By the way, I would say that Indium has really done me a favor, repeating essentially the objections I had anticipated in my post 96. It seems that I know my darwinists well! :)
gpuccio
July 20, 2011 at 11:38 AM PDT