Uncommon Descent Serving The Intelligent Design Community

How to calculate Chi_500, a log-reduced, simplified form of the Dembski Chi-metric for CSI


In response to continued use of the talking point that CSI is not calculable, I have updated the CSI Newsflash post of April 14, 2011. The update explicitly incorporates a dummy variable for specificity and adds a 1,000-coin demonstration calculation, alongside the existing use of the Durston et al. FSC calculations, which were fed into three cases of a biologically relevant Chi_500 value.

I show the clip below:


>>What are we to make of [Dembski’s discussion of CSI in NFL pp 144 and 148, cited in the Newsflash thread of April 14], in light of Orgel’s conceptual definition from 1973 and the recent challenges to CSI raised by MG and others?

That is:

. . . In brief, living organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures that are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity. [The Origins of Life (John Wiley, 1973), p. 189.]

And, what about the more complex definition in the 2005 Specification paper by Dembski?


define ϕS as . . . the number of patterns for which [agent] S’s semiotic description of them is at least as simple as S’s semiotic description of [a pattern or target zone] T. [26] . . . . where M is the number of semiotic agents [S’s] that within a context of inquiry might also be witnessing events and N is the number of opportunities for such events to happen . . . . [where also] computer scientist Seth Lloyd has shown that 10^120 constitutes the maximal number of bit operations that the known, observable universe could have performed throughout its entire multi-billion year history.[31] . . . [Then] for any context of inquiry in which S might be endeavoring to determine whether an event that conforms to a pattern T happened by chance, M·N will be bounded above by 10^120. We thus define the specified complexity [χ] of T given [chance hypothesis] H [in bits] . . . as [the negative base-2 log of the conditional probability P(T|H) multiplied by the number of similar cases ϕS(T) and also by the maximum number of binary search-events in our observed universe 10^120]

χ = – log2[10^120 ·ϕS(T)·P(T|H)]  . . . eqn n1

How about this (we are now embarking on an exercise in “open notebook” science):

1 –> 10^120 ~ 2^398

2 –> Following Hartley, we can define Information on a probability metric:

I = – log(p) . . .  eqn n2

3 –> So, we can re-present the Chi-metric:

Chi = – log2(2^398 * D2 * p), in bits, where D2 = ϕS(T) and p = P(T|H)  . . .  eqn n3

Chi = Ip – (398 + K2), where Ip = – log2(p) and K2 = log2(D2) . . .  eqn n4

4 –> That is, the Dembski CSI Chi-metric is a measure of Information for samples from a target zone T on the presumption of a chance-dominated process, beyond a threshold of at least 398 bits, covering 10^120 possibilities.
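The reduction from eqn n1 to eqn n4 can be checked numerically. The following is only a minimal sketch: the function names `chi_metric` and `chi_direct` are hypothetical, and it takes D2 to stand for ϕS(T) and K2 for log2(D2). Note that log2(10^120) is closer to 398.63 than to 398, so the code keeps the unrounded value.

```python
import math

# Bits corresponding to the 10^120 bound in eqn n1.
# log2(10^120) is about 398.63; the text rounds this down to 398.
THRESHOLD_BITS = 120 * math.log2(10)


def chi_metric(ip_bits, phi_s=1.0):
    """Log-reduced form (eqn n4): Chi = Ip - (threshold + K2), K2 = log2(phi_s)."""
    k2 = math.log2(phi_s)
    return ip_bits - (THRESHOLD_BITS + k2)


def chi_direct(p, phi_s=1.0):
    """Eqn n1 evaluated directly: -log2(10^120 * phi_s * p)."""
    return -math.log2(1e120 * phi_s * p)


if __name__ == "__main__":
    p = 2.0 ** -500                      # illustrative probability, Ip = 500 bits
    print(chi_metric(500), chi_direct(p))
```

Both routes give the same figure for the same inputs, which is the whole content of step 3: the logarithm turns the product into a sum, and the 10^120 factor becomes a fixed threshold in bits.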

5 –> Where also, K2 is a further increment to the threshold that naturally peaks at about 100 further bits. In short VJT’s CSI-lite is an extension and simplification of the Chi-metric. He explains in the just linked (and building on the further linked) :

The CSI-lite calculation I’m proposing here doesn’t require any semiotic descriptions, and it’s based on purely physical and quantifiable parameters which are found in natural systems. That should please ID critics. These physical parameters should have known probability distributions. A probability distribution is associated with each and every quantifiable physical parameter that can be used to describe each and every kind of natural system – be it a mica crystal, a piece of granite containing that crystal, a bucket of water, a bacterial flagellum, a flower, or a solar system . . . .

Two conditions need to be met before some feature of a system can be unambiguously ascribed to an intelligent agent: first, the physical parameter being measured has to have a value corresponding to a probability of 10^(-150) or less, and second, the system itself should also be capable of being described very briefly (low Kolmogorov complexity), in a way that either explicitly mentions or implicitly entails the surprisingly improbable value (or range of values) of the physical parameter being measured . . . .

my definition of CSI-lite removes Phi_s(T) from the actual formula and replaces it with a constant figure of 10^30. The requirement for low descriptive complexity still remains, but as an extra condition that must be satisfied before a system can be described as a specification. So Professor Dembski’s formula now becomes:

CSI-lite = -log2[10^120 · 10^30 · P(T|H)] = -log2[10^150 · P(T|H)] . . . eqn n1a

. . . .the overall effect of including Phi_s(T) in Professor Dembski’s formulas for a pattern T’s specificity, sigma, and its complex specified information, Chi, is to reduce both of them by a certain number of bits. For the bacterial flagellum, Phi_s(T) is 10^20, which is approximately 2^66, so sigma and Chi are both reduced by 66 bits. My formula makes that 100 bits (as 10^30 is approximately 2^100), so my CSI-lite computation represents a very conservative figure indeed.

Readers should note that although I have removed Dembski’s specification factor Phi_s(T) from my formula for CSI-lite, I have retained it as an additional requirement: in order for a system to be described as a specification, it is not enough for CSI-lite to exceed 1; the system itself must also be capable of being described briefly (low Kolmogorov complexity) in some common language, in a way that either explicitly mentions pattern T, or entails the occurrence of pattern T. (The “common language” requirement is intended to exclude the use of artificial predicates like grue.) . . . .

[As MF has pointed out] the probability p of pattern T occurring at a particular time and place as a result of some unintelligent (so-called “chance”) process should not be multiplied by the total number of trials n during the entire history of the universe. Instead one should use the formula (1–(1-p)^n), where in this case p is P(T|H) and n=10^120. Of course, my CSI-lite formula uses Dembski’s original conservative figure of 10^150, so my corrected formula for CSI-lite now reads as follows:

CSI-lite=-log2(1-(1-P(T|H))^(10^150)) . . . eqn n1b

If P(T|H) is very low, then this formula will be very closely approximated [HT: Giem] by the formula:

CSI-lite = -log2[10^150 · P(T|H)]  . . . eqn n1c
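How close n1c is to n1b can be seen with a short numerical sketch. The function names here are hypothetical, and the test probabilities are purely illustrative. The exact form is evaluated with `math.log1p` and `math.expm1`, because a naive `1 - (1 - p)**n` loses all precision when p is tiny.

```python
import math

LOG2_10 = math.log2(10)


def csi_lite_exact(p, log10_n=150):
    """Eqn n1b: -log2(1 - (1 - p)^n), with n = 10^log10_n."""
    n = 10.0 ** log10_n
    # 1 - (1 - p)^n equals -expm1(n * log1p(-p)); this stays accurate for tiny p.
    prob_at_least_once = -math.expm1(n * math.log1p(-p))
    return -math.log2(prob_at_least_once)


def csi_lite_approx(p, log10_n=150):
    """Eqn n1c: -log2(n * p), evaluated as a sum of logarithms."""
    return -(log10_n * LOG2_10 + math.log2(p))


if __name__ == "__main__":
    for p in (1e-200, 1e-160, 1e-150):   # illustrative values only
        print(p, csi_lite_exact(p), csi_lite_approx(p))
```

For p far below 10^(-150) the two forms agree to many decimal places. They separate only as n·p approaches 1, where the exact form stays slightly positive while the approximation reaches zero.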

6 –> So, the idea of the Dembski metric in the end (debates about peculiarities in derivation notwithstanding) is that if the Hartley-Shannon-derived information measure for items from a hot or target zone in a field of possibilities is beyond 398 – 500 or so bits, it is so deeply isolated that a chance-dominated process is maximally unlikely to find it, but of course intelligent agents routinely produce information beyond such a threshold.

7 –> In addition, the only observed cause of information beyond such a threshold is the now proverbial intelligent semiotic agents.

8 –> Even at 398 bits that makes sense as the total number of Planck-time quantum states for the atoms of the solar system [most of which are in the Sun] since its formation does not exceed ~ 10^102, as Abel showed in his 2009 Universal Plausibility Metric paper. The search resources in our solar system just are not there.

9 –> So, we now clearly have a simple but fairly sound context to understand the Dembski result, conceptually and mathematically [cf. more details here]; tracing back to Orgel and onward to Shannon and Hartley. Let’s augment here [Apr 17], on a comment in the MG progress thread:

Shannon measured info-carrying capacity, towards one of his goals: metrics of the carrying capacity of comms channels — as in who was he working for, again?

CSI extended this to meaningfulness/function of info.

And in so doing, observed that this — due to the required specificity — naturally constricts the zone of the space of possibilities actually used, to island[s] of function.

That specificity-complexity criterion links:

I: an explosion of the scope of the config space to accommodate the complexity (as every added bit DOUBLES the set of possible configurations),  to

II: a restriction of the zone, T, of the space used to accommodate the specificity (often to function/be meaningfully structured).

In turn that suggests that we have zones of function that are ever harder for chance based random walks [CBRW’s] to pick up. But intelligence does so much more easily.

Thence, we see that if you have a metric for the information involved that surpasses a threshold beyond which a CBRW is no longer a plausible explanation, then we can confidently infer to design as best explanation.

Voila, we need an info-beyond-the-threshold metric. And, once we have a reasonable estimate of the direct or implied specific and/or functionally specific (especially code based) information in an entity of interest, we have an estimate of or credible substitute for the value of – log2(p(T|H)). In particular, if the value of information comes from direct inspection of storage capacity and code symbol patterns of use leading to an estimate of relative frequency, we may evaluate average [functionally or otherwise] specific information per symbol used. This is a version of Shannon’s weighted average information per symbol H-metric, H = – Σ pi * log(pi), which is also known as informational entropy [there is an arguable link to thermodynamic entropy, cf here] or uncertainty.

As in (using Chi_500 for VJT’s CSI_lite [UPDATE, July 3: and S for a dummy variable that is 1/0 accordingly as the information in I is empirically or otherwise shown to be specific, i.e. from a narrow target zone T, strongly UNREPRESENTATIVE of the bulk of the distribution of possible configurations, W]):

Chi_500 = Ip*S – 500,  bits beyond the [solar system resources] threshold  . . . eqn n5

Chi_1000 = Ip*S – 1000, bits beyond the observable cosmos, 125 byte/ 143 ASCII character threshold . . . eqn n6

Chi_1024 = Ip*S – 1024, bits beyond a 2^10, 128 byte/147 ASCII character version of the threshold in n6, with a config space of 1.80*10^308 possibilities, not 1.07*10^301 . . . eqn n6a
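The byte, character and configuration-space figures attached to these thresholds are plain arithmetic, and can be reproduced with a small sketch. The helper names are hypothetical, and 7-bit ASCII is assumed for the character counts. Logarithms are used for the configuration space because 2^1024 is just too large for a standard double-precision float.

```python
import math


def config_space(bits):
    """Return (mantissa, exponent) of 2^bits written in base 10."""
    log10_size = bits * math.log10(2)
    exponent = math.floor(log10_size)
    mantissa = 10 ** (log10_size - exponent)
    return round(mantissa, 2), exponent


def ascii_chars(bits, bits_per_char=7):
    """Smallest number of 7-bit ASCII characters holding at least `bits` bits."""
    return math.ceil(bits / bits_per_char)


if __name__ == "__main__":
    for bits in (500, 1000, 1024):
        print(bits, bits // 8, "bytes,", ascii_chars(bits), "chars,", config_space(bits))
```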

[UPDATE, July 3: So, if we have a string of 1,000 fair coins, and toss at random, we will by overwhelming probability expect to get a near 50-50 distribution typical of the bulk of the 2^1,000 possibilities W. On the Chi_500 metric, I would be high, 1,000 bits, but S would be 0, so the value for Chi_500 would be – 500, i.e. well within the possibilities of chance.  However, if we came to the same string later and saw that the coins somehow now had the bit pattern of the ASCII codes for the first 143 or so characters of this post, we would have excellent reason to infer that an intelligent designer, using choice contingency, had intelligently reconfigured the coins. That is because, using the same I = 1,000 capacity value, S is now 1, and so Chi_500 = 500 bits beyond the solar system threshold. If the 10^57 or so atoms of our solar system, for its lifespan, were to be converted into coins and tables etc, and tossed at an impossibly fast rate, it would be impossible to sample enough of the possibilities space W to have confidence that something from so unrepresentative a zone T could reasonably be explained on chance. So, as long as an intelligent agent capable of choice is possible, choice, i.e. design, would be the rational, best explanation on the sign observed, functionally specific, complex information.]
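The coin example reduces to one line of arithmetic once I and S are fixed. Here is a minimal sketch, assuming the value of S has already been settled on empirical or other grounds. The function name is hypothetical, and the code does not itself decide whether a pattern is specific.

```python
def chi_threshold(info_bits, specific, threshold=500):
    """Eqns n5/n6: Chi = Ip*S - threshold, with S the 1/0 specificity dummy variable."""
    s = 1 if specific else 0
    return info_bits * s - threshold


if __name__ == "__main__":
    # 1,000 coins tossed at random: high capacity, but S = 0.
    print(chi_threshold(1000, specific=False))
    # The same 1,000 coins showing the ASCII text of a post: S = 1.
    print(chi_threshold(1000, specific=True))
    # Same case against the 1,000-bit threshold of eqn n6.
    print(chi_threshold(1000, specific=True, threshold=1000))
```

The first two calls give the – 500 and + 500 figures worked out in the paragraph above.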

10 –> Similarly, the work of Durston and colleagues, published in 2007, fits this same general framework. Excerpting:

Consider that there are usually only 20 different amino acids possible per site for proteins, Eqn. (6) can be used to calculate a maximum Fit value/protein amino acid site of 4.32 Fits/site [NB: Log2 (20) = 4.32]. We use the formula log (20) – H(Xf) to calculate the functional information at a site specified by the variable Xf such that Xf corresponds to the aligned amino acids of each sequence with the same molecular function f. The measured FSC for the whole protein is then calculated as the summation of that for all aligned sites. The number of Fits quantifies the degree of algorithmic challenge, in terms of probability [info and probability are closely related], in achieving needed metabolic function. For example, if we find that the Ribosomal S12 protein family has a Fit value of 379, we can use the equations presented thus far to predict that there are about 10^49 different 121-residue sequences that could fall into the Ribosomal S12 family of proteins, resulting in an evolutionary search target of approximately 10^-106 percent of 121-residue sequence space. In general, the higher the Fit value, the more functional information is required to encode the particular function in order to find it in sequence space. A high Fit value for individual sites within a protein indicates sites that require a high degree of functional information. High Fit values may also point to the key structural or binding sites within the overall 3-D structure.
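The per-site formula in the excerpt, log2(20) – H(Xf) summed over aligned sites, can be sketched as follows. This is an illustration only, not Durston et al.'s actual software. The function names and the four-sequence toy alignment are hypothetical, the ground state is assumed to be a uniform distribution over 20 amino acids, and H is estimated from raw relative frequencies with no correction for small sample size.

```python
import math
from collections import Counter


def shannon_h(symbols):
    """Shannon H = -sum(p_i * log2(p_i)), from the relative frequencies of the symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def fits_per_site(column, alphabet_size=20):
    """Functional information at one aligned site: log2(20) - H(Xf)."""
    return math.log2(alphabet_size) - shannon_h(column)


def fsc(alignment):
    """Sum the per-site values over all aligned sites (columns) of the alignment."""
    return sum(fits_per_site(column) for column in zip(*alignment))


if __name__ == "__main__":
    toy_alignment = ["MKV", "MKL", "MRI", "MKA"]   # hypothetical aligned sequences
    for column in zip(*toy_alignment):
        print("".join(column), fits_per_site(column))
    print("total Fits:", fsc(toy_alignment))
```

A fully conserved site scores the maximum of about 4.32 Fits, while a site that varies freely across the family scores much less, which is why the summed value reflects how tightly function constrains the sequence.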

11 –> So, Durston et al are targeting the same goal, but have chosen a different path from the start-point of the Shannon-Hartley log probability metric for information. That is, they use Shannon’s H, the average information per symbol, and address shifts in it from a ground to a functional state on investigation of protein family amino acid sequences. They also do not identify an explicit threshold for degree of complexity. [Added, Apr 18, from comment 11 below:] However, their information values can be integrated with the reduced Chi metric:

Using Durston’s Fits from his Table 1, in the Dembski style metric of bits beyond the threshold, and simply setting the threshold at 500 bits:

RecA: 242 AA, 832 fits, Chi: 332 bits beyond

SecY: 342 AA, 688 fits, Chi: 188 bits beyond

Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond  . . . results n7

The two metrics are clearly consistent, and Corona S2 would also pass the χ metric’s far more stringent threshold right off as a single protein. (Think about the cumulative fits metric for the proteins for a cell . . . )

In short one may use the Durston metric as a good measure of the target zone’s actual encoded information content, which Table 1 also conveniently reduces to bits per symbol so we can see how the redundancy affects the information used across the domains of life to achieve a given protein’s function; not just the raw capacity in storage unit bits [= no.  of  AA’s * 4.32 bits/AA on 20 possibilities, as the chain is not particularly constrained.]>>
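The three results labelled n7 are simple subtractions, and they can be reproduced together with the raw storage capacity that the last paragraph contrasts them with. This is a minimal sketch: the dictionary and function names are hypothetical, and the (AA, Fits) pairs are exactly those quoted above.

```python
import math

# (number of amino acids, Fits) as quoted in results n7.
DURSTON_FITS = {
    "RecA": (242, 832),
    "SecY": (342, 688),
    "Corona S2": (445, 1285),
}


def chi_beyond(fits, threshold=500):
    """Bits beyond the threshold, taking the Fits value as the information measure."""
    return fits - threshold


def raw_capacity_bits(n_residues, alphabet_size=20):
    """Raw storage capacity: number of AAs times log2(20), about 4.32 bits per AA."""
    return n_residues * math.log2(alphabet_size)


if __name__ == "__main__":
    for name, (n_aa, fits) in DURSTON_FITS.items():
        print(name, chi_beyond(fits), "bits beyond 500;",
              round(raw_capacity_bits(n_aa)), "bits raw capacity;",
              round(fits / n_aa, 2), "Fits per AA")
```

Calling `chi_beyond(1285, threshold=1000)` shows that Corona S2 also clears the 1,000-bit threshold, by 285 bits.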


So, CSI is demonstrably calculable for biologically relevant cases, on reasonable metrics and empirical investigations. This, despite many talking points to the contrary. END

Mung: Sometimes the best target is not a peak under current circumstances, if that locks out adaptability. A blind, optimising process may well embrittle the resulting population, setting it up for collapse if and when the environment that is doing the culling on differential reproductive success shifts sharply. Right now, here the Montserrat oriole -- national bird -- is endangered, but the Pearly-eyed Thrasher, a real rat of a bird (and the 4-footed kind too!) are thriving in the aftermath of catastrophic habitat loss. This is the old generalist vs specialist debate all over again. And, here is the cruncher: the evo algorithm is said to be a blind hill climber. One that clips off the inferior relative to CURRENT environment -- doubtless with some sort of relaxation time for the change to wend its way through the population. So, are we looking at an embrittling algorithm in the long term? As in, the better it works, the worse it really is as it is optimising on PRESENT CONDITIONS BLINDLY. BTW, a similar issue is in industries, as firms that do well in a certain environment may be setting up for a fall when big change comes. And cultures may have self-reinforcing dominant elites that are heading for a fall, that may take their countries down with them. GEM of TKI kairosfocus
That seems consistent with the way I'm using it. The target value is the amount of information obtained on a per query basis.
btw, kf, I agree, brittleness vs robustness is key when comparing search algorithms.
Or efficiency. Why use a search algorithm that can't be expected to find the target in a reasonable amount of time? Mung
optimisation: maximising or minimising a target value, subject to an objective function and constraints. kairosfocus
http://marksmannet.com/RobertMarks/REPRINTS/2010-EfficientPerQueryInformationExtraction.pdf Mung
hi kf, How are you using optimality? Let's make sure we're talking about the same concept, lol. Say you have 4 boxes and one of the boxes contains a coin. The boxes are labeled 1 through 4. Box 1 and 2 are painted red, box 3 and 4 are painted white. You know which box contains the coin. You are going to act as an oracle and can answer yes or no to a query. The optimal strategy for me to use to find the coin does not start by using a strategy which asks if the coin is in box #1. The optimal search begins with partitioning the search space according to some binary scheme, excluding fully half the possible solutions. Is the coin in a red box? Is the coin in a white box? Is the coin in an even numbered box? Is the coin in an odd numbered box? I think it's the optimal strategy because it obtains the maximum information per query given the capabilities of the oracle. That's what I mean by optimal, and it's relative not absolute. Mung
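The four-box example can be put in code as a rough sketch. The function names are hypothetical, and the oracle is modelled simply as a membership test that answers yes or no. Halving the candidates with each query is compared against asking about one box at a time.

```python
def find_by_halving(boxes, coin_box):
    """Ask 'is the coin in this half?' until one box is left. Returns (box, queries)."""
    candidates = list(boxes)
    queries = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        queries += 1
        if coin_box in half:                 # oracle answers "yes"
            candidates = half
        else:                                # oracle answers "no"
            candidates = candidates[len(half):]
    return candidates[0], queries


def find_one_by_one(boxes, coin_box):
    """Ask 'is the coin in box k?' in order; the last box is deduced without asking."""
    remaining = list(boxes)
    queries = 0
    for box in remaining[:-1]:
        queries += 1
        if box == coin_box:
            return box, queries
    return remaining[-1], queries


if __name__ == "__main__":
    boxes = [1, 2, 3, 4]
    for coin in boxes:
        print(coin, find_by_halving(boxes, coin), find_one_by_one(boxes, coin))
```

With four equally likely boxes, halving always takes 2 queries, one bit per query. Asking box by box takes 1, 2, 3 or 3 queries depending on where the coin is, which averages 2.25.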
"What is the probability of a blind unassisted search stumbling upon the zone of interest?" Zero. For a search which is indeed blind cannot "know" to stop searching should it ever land in "the zone of interest." Thus, even should it ever land there, it will immediately go "search" somewhere else. Ilion
Mung: Think drifting landscapes. Flexibility in response to gradual, cyclical or catastrophic change. Not all situations are static. Optimality may be brittle against such. G kairosfocus
sigh The optimal strategy would be the one that obtains the maximal information per query. Is that in dispute? Given a sorted list, is a binary search more efficient than a linear search? Why? What does robustness have to do with it? Mung
PPS: But you gotta find your function first. kairosfocus
PS: Just chopped off that lot, would not open as a word doc, reverted to a read of the bit pattern. Junk ain't so junky after all . . . kairosfocus
First, identify your function, and its sensitivity to perturbation of the informational element. Then, look at the information quantity to get function. BTW, did you do the set up a blank Word doc and look at it with notepad etc test? [You are going to see a lot of apparently nonfunctional and repetitive elements, but try changing at random and see what is going to happen.] cf: ÐÏࡱá >  þÿ   !  #  þÿÿÿ ÿÿÿÿÿÿÿÿÿÿÿÿÿÿ kairosfocus
Way back when, it used to be a question of post optimality sensitivity analysis. Too often, BAAD news. kairosfocus
er, Mung:
You are asking him to measure Information where none exists?
You think there is no information in non-coding DNA sequences? That it is all, um, junk? If so, I don't agree :) Elizabeth Liddle
btw, kf, I agree, brittleness vs robustness is key when comparing search algorithms. Elizabeth Liddle
Sorry, that wasn't very clear, I was still thinking in terms of the other thread. I meant: leaving aside the threshold for now, how would you plot an observed pattern on our compressibility vs complexity plot? In other words, how do you compute each of these values for a given pattern? Elizabeth Liddle
You are asking him to measure Information where none exists? Mung
kf: can you explain how you would apply this to a candidate pattern? For instance, a non-coding DNA sequence of bases? (Easy enough to do for a coding stretch, I would imagine). Elizabeth Liddle
Mung: Actually, I think optimal strategies tend to be brittle against changes that are almost inevitable in real world contexts. Robust, good strategies seem better on average. GEM of TKI kairosfocus
I'm beginning to think of this in terms of searches and information. What is the probability of a blind unassisted search stumbling upon the zone of interest? How much information would be required by an assisted search to find the target? Intelligent causes have consistently demonstrated the ability to provide the necessary information. I think it might be interesting also to consider the ability of intelligent causes to devise optimal strategies. For example, if you have a coin hidden under one of 8 boxes all of which are the same size and shape, what strategy would you devise to find the coin in the least number of steps? But if the coin is hidden under one of 8 boxes, four boxes are square, four are rectangular, you could obtain maximal information by posing the question: is the coin under a rectangular box? This requires knowledge about the search space. Mung
