
Is the CSI concept well-founded mathematically, and can it be applied to the real world, giving real and useful numbers?


Those who have been following the recently heated-up exchanges on the theory of intelligent design, and on the key design inference from tested, empirically reliable signs via the ID explanatory filter, will know that a key move in recent months was the meteoric rise of the mysterious internet persona MathGrrl (who is evidently NOT the calculus professor who has long used the same handle).

MG, as the handle is abbreviated, is well known for “her” confidently stated assertion — now commonly repeated as if it were established fact in the Darwin Zealot fever swamps that are backing the current cyberbullying tactics that have tried to hold my family hostage — that:

without a rigorous mathematical definition and examples of how to calculate [CSI], the metric is literally meaningless. Without such a definition and examples, it isn’t possible even in principle to associate the term with a real world referent.

As the strike-through emphasises, every one of these claims has long been exploded.

You doubt me?

Well, let us cut down the clip from the CSI Newsflash thread of April 18, 2011, which was further discussed in a footnote thread of May 10 (h’mm, the anniversary of the German attack on France in 1940), and which was again clipped at fair length yesterday.

( BREAK IN TRANSMISSION: BTW, antidotes to the intoxicating Darwin Zealot fever swamp “MG dunit” talking points were collected here — Graham, why did you ask the question but never stop by to discuss the answer? And the “rigour” question was answered step by step, at length, here.  In a nutshell, as the real MathGrrl will doubtless be able to tell you, the Calculus itself was historically founded on sound intuitive mathematical insights on limits and infinitesimals, leading to astonishing insights and empirically warranted success for 200 years. And when Math was finally advanced enough to provide an axiomatic basis — at the cost of the sanity of a mathematician or two [doff caps for a minute in memory of Cantor] — it became plain that such a basis was so difficult that it could not have been developed in the 17th century. Had there been an undue insistence on absolute rigour as opposed to reasonable warrant, the great breakthroughs of physics and other fields that crucially depended on the power of Calculus would not have happened.  For real-world work, what we need is reasonable warrant and empirical validation of models and metrics, so that we know them to be sufficiently reliable to be used.  The design inference is backed up by the infinite monkeys analysis tracing to statistical thermodynamics, and is strongly empirically validated on billions of test cases, the whole Internet and the collection of libraries across the world being just a sample of the point that the only credibly known source for functionally specific complex information and associated organisation [FSCO/I] is design.  )

After all, a bit of  careful citation always helps:

_________________

>>1 –> 10^120 ~ 2^398
2 –> I = – log(p) . . .  eqn n2
3 –> So, we can re-present the Chi-metric:
[where, from Dembski, Specification 2005,  χ = – log2[10^120 ·ϕS(T)·P(T|H)]  . . . eqn n1]
Chi = – log2(2^398 * D2 * p)  . . .  eqn n3
Chi = Ip – (398 + K2) . . .  eqn n4
[where D2 stands in for ϕS(T), p for P(T|H), Ip = – log2(p), and K2 = log2(D2)]
4 –> That is, the Dembski CSI Chi-metric is a measure of Information for samples from a target zone T on the presumption of a chance-dominated process, beyond a threshold of at least 398 bits, covering 10^120 possibilities.
5 –> Where also, K2 is a further increment to the threshold that naturally peaks at about 100 further bits . . . .
6 –> So, the idea of the Dembski metric in the end — debates about peculiarities in derivation notwithstanding — is that if the Hartley-Shannon-derived information measure for items from a hot or target zone in a field of possibilities is beyond 398 – 500 or so bits, the zone is so deeply isolated that a chance-dominated process is maximally unlikely to find it; but of course intelligent agents routinely produce information beyond such a threshold.

7 –> In addition, the only observed cause of information beyond such a threshold is the now-proverbial intelligent semiotic agent.
8 –> Even at 398 bits that makes sense as the total number of Planck-time quantum states for the atoms of the solar system [most of which are in the Sun] since its formation does not exceed ~ 10^102, as Abel showed in his 2009 Universal Plausibility Metric paper. The search resources in our solar system just are not there.
9 –> So, we now clearly have a simple but fairly sound context to understand the Dembski result, conceptually and mathematically [cf. more details here]; tracing back to Orgel and onward to Shannon and Hartley . . . .
As in (using Chi_500 for VJT’s CSI_lite [UPDATE, July 3: and S for a dummy variable that is 1/0 accordingly as the information in I is empirically or otherwise shown to be specific, i.e. from a narrow target zone T, strongly UNREPRESENTATIVE of the bulk of the distribution of possible configurations, W]):
Chi_500 = Ip*S – 500,  bits beyond the [solar system resources] threshold  . . . eqn n5
Chi_1000 = Ip*S – 1000, bits beyond the observable cosmos, 125 byte/ 143 ASCII character threshold . . . eqn n6
Chi_1024 = Ip*S – 1024, bits beyond a 2^10, 128 byte/147 ASCII character version of the threshold in n6, with a config space of 1.80*10^308 possibilities, not 1.07*10^301 . . . eqn n6a
[UPDATE, July 3: So, if we have a string of 1,000 fair coins and toss them at random, we will with overwhelming probability expect to get a near 50-50 distribution typical of the bulk of the 2^1,000 possibilities W. On the Chi_500 metric, I would be high, 1,000 bits, but S would be 0, so the value for Chi_500 would be – 500, i.e. well within the possibilities of chance.  However, if we came to the same string later and saw that the coins somehow now held the bit pattern of the ASCII codes for the first 143 or so characters of this post, we would have excellent reason to infer that an intelligent designer, using choice contingency, had intelligently reconfigured the coins. That is because, using the same I = 1,000 bit capacity value, S is now 1, and so Chi_500 = 500 bits beyond the solar system threshold. If the 10^57 or so atoms of our solar system were, for its lifespan, converted into coins and tables etc. and tossed at an impossibly fast rate, it would still be impossible to sample enough of the possibility space W to have confidence that something from so unrepresentative a zone T could reasonably be explained on chance. So, as long as an intelligent agent capable of choice is possible, choice — i.e. design — would be the rational, best explanation of the sign observed: functionally specific, complex information.]
10 –> Similarly, the work of Durston and colleagues, published in 2007, fits this same general framework . . . .
We use the formula log (20) – H(Xf) to calculate the functional information at a site specified by the variable Xf such that Xf corresponds to the aligned amino acids of each sequence with the same molecular function f. The measured FSC for the whole protein is then calculated as the summation of that for all aligned sites. The number of Fits quantifies the degree of algorithmic challenge, in terms of probability [info and probability are closely related], in achieving needed metabolic function. For example, if we find that the Ribosomal S12 protein family has a Fit value of 379, we can use the equations presented thus far to predict that there are about 10^49 different 121-residue sequences that could fall into the Ribosomal S12 family of proteins, resulting in an evolutionary search target of approximately 10^-106 percent of 121-residue sequence space. In general, the higher the Fit value, the more functional information is required to encode the particular function in order to find it in sequence space . . . .
11 –> So, Durston et al are targeting the same goal, but have chosen a different path from the start-point of the Shannon-Hartley log-probability metric for information. That is, they use Shannon’s H, the average information per symbol, and address shifts in it from a ground to a functional state on investigation of protein family amino acid sequences. They also do not identify an explicit threshold for degree of complexity. [Added, Apr 18, from comment 11 below:] However, their information values can be integrated with the reduced Chi metric:
Using Durston’s Fits from his Table 1, in the Dembski style metric of bits beyond the threshold, and simply setting the threshold at 500 bits:
RecA: 242 AA, 832 fits, Chi: 332 bits beyond
SecY: 342 AA, 688 fits, Chi: 188 bits beyond
Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond  . . . results n7
The two metrics are clearly consistent . . .  (Think about the cumulative fits metric for the proteins for a cell . . . )
In short one may use the Durston metric as a good measure of the target zone’s actual encoded information content, which Table 1 also conveniently reduces to bits per symbol so we can see how the redundancy affects the information used across the domains of life to achieve a given protein’s function; not just the raw capacity in storage unit bits [= no.  of  AA’s * 4.32 bits/AA on 20 possibilities, as the chain is not particularly constrained.]>>

_________________
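For readers who want to check the arithmetic for themselves, here is a minimal Python sketch of the reduced metric (the chi_500 helper and its names are mine, for illustration only; the fits values are those quoted from Durston's Table 1 in the clip above):

```python
# Minimal sketch of the reduced Chi_500 metric from the clip above:
#   Chi_500 = Ip*S - 500   (bits beyond the solar-system threshold)
# Ip is the information/capacity in bits; S is 1 if the configuration is
# observed to come from a narrow, independently specifiable zone T, else 0.

def chi_500(ip_bits: float, s: int, threshold: int = 500) -> float:
    """Bits beyond the chosen threshold (negative = within chance's reach)."""
    return ip_bits * s - threshold

# 1,000 fair coins tossed at random: high capacity (1,000 bits) but S = 0,
# since a typical near-50/50 string is representative of the bulk of W.
print(chi_500(1000, s=0))        # -500 -> comfortably within chance's reach

# The same coins later found spelling out ~143 ASCII characters of English:
# S flips to 1, so the metric goes 500 bits beyond the threshold.
print(chi_500(1000, s=1))        # 500  -> design inferred on this metric

# Durston-style fits (functional bits) plugged in as Ip, per results n7:
for name, fits in [("RecA", 832), ("SecY", 688), ("Corona S2", 1285)]:
    print(name, chi_500(fits, s=1))   # 332, 188 and 785 bits beyond
```

The helper is nothing more than a subtraction, which is the point: once Ip has been estimated (here, taken straight from Durston's fits) and S has been judged 1 or 0, the metric is trivially computable.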

So, there we have it folks:

I: Dembski’s CSI metric is closely related to standard and widely used work in Information theory, starting with I = – log p

II: It is reducible, on taking the appropriate logs, to a measure of information beyond a threshold value

III: The threshold is reasonably set by referring to the accessible search resources of a relevant system, i.e. our solar system or the observed cosmos as a whole.

IV: Where, once an observed configuration — event E, per NFL — that bears or implies information is from a separately and “simply” describable narrow zone T that is strongly unrepresentative — that’s key — of the space of possible configurations, W, then

V: since any search applied covers only a very small fraction of W, it is unreasonable to expect chance to account for E in T, rather than for the far more typical possibilities in W, which have, in aggregate, overwhelming statistical weight.

(For instance, the 10^57 or so atoms of our solar system will go through about 10^102 Planck-time quantum states in the time since its founding on the usual timeline. 10^150 possibilities [500 bits’ worth of possibilities] is 48 orders of magnitude beyond that reach, and it takes some 10^30 P-time states just to execute the fastest chemical reactions.  1,000 bits’ worth of possibilities is about 150 orders of magnitude beyond the 10^150 P-time Q-states of the roughly 10^80 atoms of our observed cosmos. When you are looking for needles in haystacks, you don’t expect to find them on relatively tiny and superficial searches.)

VI: Where also, in empirical investigations we observe that an aspect of an object, system, process or phenomenon that is controlled by mechanical necessity will show itself in low contingency. A dropped heavy object reliably falls with acceleration g. We can set up differential equations and model how events will play out from a given starting condition, i.e. we identify an empirically reliable natural law.

VII: By contrast, highly contingent outcomes, those that vary significantly on similar initial conditions, reliably trace to chance factors and/or choice. E.g. we may drop a fair die and it will tumble to a value essentially by chance. (This is in part an ostensive definition, by key example and family resemblance.)  Or, I may choose to compose a text string, writing it this way or that. Or, as the 1,000-coin string example above shows, coins may be strung by chance or by choice.

VIII: Choice and chance can be reliably, empirically distinguished, as we routinely do in day-to-day life, decision-making, the courtroom, and fields of science like forensics.  FSCO/I is one of the key signs for making that distinction, and the Dembski-style CSI metric helps us quantify it, as was shown.

IX:  Shown, that is, based on a reasonable reduction from standard approaches, and shown by application to real-world cases, including biologically relevant ones.
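The orders of magnitude in points IV and V, and in the parenthesis above, can likewise be checked in a few lines; this is a sketch only, and the 10^102 and 10^150 state counts are simply the figures cited in the post (from Abel's 2009 paper and the usual cosmological estimates), not derived here:

```python
import math

# log10 of the number of configurations for 500- and 1,000-bit spaces
log_configs_500  = math.log10(2 ** 500)    # ~150.5  (about 3.27*10^150)
log_configs_1000 = math.log10(2 ** 1000)   # ~301.0  (about 1.07*10^301)

# log10 of available Planck-time quantum states, as quoted in the post
log_states_solar  = 102    # solar system since formation (Abel 2009, as cited)
log_states_cosmos = 150    # observed cosmos, ~10^80 atoms (as cited)

print(log_configs_500 - log_states_solar)    # ~48 orders of magnitude short
print(log_configs_1000 - log_states_cosmos)  # ~151; the post rounds this to 150

# Largest fraction of the 500-bit space the solar system could even sample:
print(10 ** (log_states_solar - log_configs_500))   # ~3*10^-49
```

Nothing in the sketch is new; it simply re-derives the 48- and roughly 150-order-of-magnitude gaps claimed in the post from its own quoted figures.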

We can safely bet, though, that you would not have known that this was done months ago — over and over again — in response to MG’s challenge, if you were going by the intoxicant fulminations billowing up from the fever swamps of the Darwin zealots.

Let that be a guide to evaluating their credibility — and, since this was repeatedly drawn to their attention and just as repeatedly brushed aside in the haste to go on beating the even more intoxicating talking-point drums, sadly, this also raises serious questions about the motives and attitudes of the chief ones responsible for those drumbeat talking points and for the fever swamps that give off the poisonous, burning-strawman rhetorical fumes that make the talking points seem stronger than they are. (If that is offensive to you, try to understand: this is coming from a man whose argument, as summarised above, has repeatedly been met with drumbeat dismissals without serious consideration, leading on to the most outrageous abuses by the more extreme Darwin zealots (who were too often tolerated by host sites advocating alleged “uncensored commenting,” until it was too late), and culminating now in a patent threat to his family by obviously unhinged bigots.)

And now you also know the most likely reason for TWT’s attempt to hold my family hostage by making the mafioso-style threat: we know you, we know where you are and we know those you care about. END

Comments
Fine. But that’s what enables the monkeys to type Shakespeare.
Have you had your morning caffeine yet? The monkeys don't type Shakespeare. They type whatever they type, mostly nonsense. But how is it that the monkey god knows Shakespeare? Isn't that horribly non-Darwinian?Mung
July 19, 2011, 01:05 AM PDT
Mung:
No Free Lunch is a book written by Dembski which you haven’t even read.
It's also the title of a pair of theorems by Wolpert and Macready which don't apply to evolutionary algorithms, and which Dembski tries to apply to evolutionary algorithms here: http://www.designinference.com/documents/2005.03.Searching_Large_Spaces.pdf defending his application by appeal to what he calls a "no free lunch regress". His defense is faulty.Elizabeth Liddle
July 19, 2011, 12:44 AM PDT
Fine. But that's what enables the monkeys to type Shakespeare. It's The Whole Point. As you rightly point out.Elizabeth Liddle
July 19, 2011, 12:31 AM PDT
Elizabeth Liddle:
Darwinian theory is not a monkeys-at-keyboard theory.
Yes, it is. It just adds a monkey god to determine which sequences typed by the monkeys ought to be saved and which should be discarded.Mung
July 18, 2011, 11:53 PM PDT
This is what is wrong with Dembski’s application of No Free Lunch in fact, but we needn’t worry about that here.
No Free Lunch is a book written by Dembski which you haven't even read.Mung
July 18, 2011, 10:28 PM PDT
Elizabeth: I would like anyway to go on with the remaining parts of the procedure. Maybe later, or tomorrow.gpuccio
July 18, 2011, 11:03 AM PDT
Elizabeth: I must say that you follow my reasoning very well. Your comments are relevant, and good. I don't know if I have the time to go on for much today. I will try to make some clarifications which are simple enough: You say: "Ah, OK. You actually want to use compressibility to exclude “Necessity” – simple algorithms. That’s sort of the opposite of Dembski" Yes, I know. And that is exactly one of the main differences. That is, mainly, because I am interested in a direct application of the concept to a real biological context, rather than in a general logical definition of CSI. So, if you agree, we can stick to my concept, for now. I agree with you that the computation of the target space is the most difficult point. And I agree that it is a still unsolved point. We have already discussed that in part. For me, the Durston method remains at present the best approximation available for the target space. I do believe, for various reasons, that it is a good approximation, but I agree that it rests on some assumptions (very reasonable ones, anyway). One of them is that existing protein families have, more or less, in the course of evolution, traversed the target space, if not completely, then in great part. That is consistent with the other paper about the protein big bang model. Remember that we need not a precise value, but a reasonable order of magnitude. Research on the size of the protein target space must go on, and is going on, on both sides. The fact that a measure is difficult does not make it impossible. But, in principle, the target space is measurable, and the concept of dFSCI works perfectly well. The "fixed length" issue is not really important, IMO. It is just a way to fix the computation to a tractable model. Shorter sequences with the same function can exist in some cases, but in general we have no reason to believe that they change the ratio much, especially considering that for longer sequences the search is even less favourable. Anyway, the dFSCI is always computed only for a specific function. Your objection of "all possible functions" that evolution can access, instead, is a more general one, and IMO scarcely relevant. I have already answered it once with you, and I can do that again, if you want.gpuccio
July 18, 2011, 11:02 AM PDT
PS: Here's my FSCI test. Generate true and flat random binary digits, feeding them into an ASCII text reader. Compare against, say, the Gutenberg library for code strings. Find out how long a valid string you are going to get. So far, the results are up to 20-24 characters, picking from a space of 10^50 or so. We are looking at spaces of 10^150 and more.kairosfocus
July 18, 2011, 10:59 AM PDT
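A minimal sketch of the random-ASCII test kairosfocus outlines above; everything here is a stand-in for illustration (a toy reference corpus instead of the Gutenberg library, and a short random stream), so the matches it finds will be far shorter than the 20-24 characters he reports for the full-scale test:

```python
import random

# Toy stand-in reference text; the test as described would compare against
# the Gutenberg library (or a similarly large English corpus).
CORPUS = "to be or not to be that is the question whether tis nobler in the mind"

def longest_match(n_chars: int, rng: random.Random) -> int:
    """Length of the longest substring of a flat-random printable-ASCII
    string (lower-cased for comparison) that also occurs in CORPUS."""
    text = "".join(chr(rng.randrange(32, 127)) for _ in range(n_chars)).lower()
    best = 0
    for i in range(len(text)):
        # only try to extend if a match longer than the current best is possible
        j = i + best + 1
        while j <= len(text) and text[i:j] in CORPUS:
            best = j - i
            j += 1
    return best

rng = random.Random(1)
print(max(longest_match(50_000, rng) for _ in range(5)))  # typically a handful of characters
```

With a real corpus and a much longer random stream the matches grow somewhat, but the point being made in the comment is about how slowly they grow relative to the size of the configuration space.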
Dr Liddle: In short, you don't trust the search of life on the ground to adequately sample the space successfully and empirically! (Remember my darts-dropped-on-charts example? The typical pattern of a population will usually show itself soon enough!) And, the table is hardly in isolation from the paper and its context in the wider discussion. Have you worked through it? What are your conclusions? On what grounds? Mine are that they have a reasonable method to estimate the information per AA in the protein families, reduced from the 4.32 bits per symbol that a flat random distribution would give, based on actual patterns of usage; similar to methods that a Yockey or the like or even a Shannon might use; or, your intelligent code cracker. There may be outliers out there, but if you were to go into the business of synthesising, this is what I would lay in to know how much of what to stock up on. Just as printers did in the old days of lead type -- Morse, I gather, sent an assistant over to a printer's to make up a table of frequencies of use of typical letters. BTW, that is how comms theory generally works in assigning bit values per symbol in typical message patterns on traffic analysis. Objections as to what is logically possible don't make much difference to what is seen as the typical pattern. And that pattern is one of islands of function, falling into fold domains. GEM of TKIkairosfocus
July 18, 2011, 10:47 AM PDT
Dr Liddle: Remember, the life function problem starts with OOL. The relevant functional polymers would have had to be structured on the thermodynamics of chaining reactions, which ARE a random-walk-driven challenge, i.e. monkeys at keyboards are a good illustration. Then, to create embryologically feasible new body plans, we have to go to changes that are complex -- 10 - 100+ million bits, with no intelligent guidance to structure proteins to form cell types and tissues, or regulatory circuits etc. to specify structures to make up complex organisms. The recent Marks-Dembski work has shown that a blind search will on average do at most as well as chance-based random walks rewarded by trial and error. So there is method to the madness. GEM of TKI PS: In short, to get to the concepts we have to use ostensive definitions, and the operationalisation, modelling, application and quantification follow from the conceptualisation. Just as it took from the 1660s to the 1960s to find a way to mathematicise infinitesimals coherently, by creating an extension to the reals, the hyper-reals. I suspect my learning of calculus would have been eased by that, but it would have been cutting-edge at the time.kairosfocus
July 18, 2011, 10:35 AM PDT
Yes, kf, and I looked at it. But that was simply a table of extant family members. It doesn't tell us a) how many possible other family members there might be (combos that would result in an equally viable protein), nor does it tell us how many other genes, even, might achieve the same phenotypic function. So it must be an underestimate of the target space. So the cited table does not tell us that the proteins are "deeply isolated islands". It tells us nothing about how isolated they are. It just tells us that in general there are a lot of ways of producing a functional protein, and that some of them are observed doing so.Elizabeth Liddle
July 18, 2011, 10:32 AM PDT
Dr Liddle: Over some months you have been repeatedly directed to Durston et al and their table of 35 protein families, joined to the metric they developed. They have empirically identified, per study, the range of values for which aligned proteins will still function. So, life itself has worked out the reasonable range. That is why I used their table. And, as proteins are expressed from DNA etc., this is an answer on the reasonable range that works for DNA. More broadly, we know that proteins are restricted by folding and functional requirements, and fall in fold domains, which are deeply isolated in sequence space. Some positions are fairly non-critical, but others are just the opposite. All in all, we end up with deeply isolated islands of function, i.e. an arbitrary sequence of AAs will not typically be a functional protein. GEM of TKIkairosfocus
July 18, 2011, 10:25 AM PDT
Oh, and of course operational definitions are derivative. They have to be, or they would be useless. We start with a conceptual hypothesis, then we operationalise it so that we have a way of testing whether our conceptual hypothesis is supported. It may be derivative but that does not mean we don't need it - it's essential!Elizabeth Liddle
July 18, 2011, 10:25 AM PDT
But kf: no-one is proposing monkeys-at-keyboards. This is a really really fundamental point. Darwinian theory is not a monkeys-at-keyboard theory. It assumes that near-neutral and slightly beneficial combinations accumulate, while deleterious ones, relatively as well as absolutely, are purged. This hugely affects the search space, and thus your calculations. And that's before we tackle the target space itself, which is, I suggest, far larger than you are calculating.Elizabeth Liddle
July 18, 2011, 10:23 AM PDT
Dr Liddle: Operational definitions are at best derivative. If one makes the error that the logical positivists did, one will fall into self-referential incoherence. Ostensive definitions are the first ones we apply to rough out concepts, by key example and material family resemblance. For instance this is the only definition we can make of life, the general subject matter. Does that mean that absent an operational definition, life is meaningless or unobservable and untestable? Plainly not, or biology as a science collapses. In any case, I have given you an actual quantification of dFSCI, above; one that implies a process of observation and measurement that applies well-known techniques in engineering relevant to information systems and communication systems. Processes that have been in use, literally, for decades, and are as familiar as the measure of information in files in your PC. The fake Internet personality MG was trying to make a pretended mountain out of a molehill, to advance a rhetorical agenda. He -- most likely -- was answered repeatedly, but refused any and all answers. Please do not go down that road. Or if you do insist on another loop around a wearing path, kindly first look at the OP above. You will find there all the answer a reasonable person needs on what CSI is, and FSCI as the key part of it, and how it can be reduced to a mathematically based measure, then applied to even biological systems. There are a great many now barking out the mantra that CSI/FSCI is ill defined and meaningless, but they do so in the teeth of plain and easily accessible evidence. Just as is the case with those who are still trotting out the calumny that Design theory is merely repackaged creationism. GEM of TKIkairosfocus
July 18, 2011, 10:18 AM PDT
OK, gpuccio: So, complexity - the nub of the problem:
Well, we are dealing with digital strings, so we will define complexity, in general, as the probability of that sequence versus the total number of possible sequences of the same length. For simplicity, we can express that in bits, like in Shannon’s theory, by taking the negative base 2 log.
OK.
So, each specific sequence in binary language has a generic complexity of 1 : 2^length.
Right. So, for a DNA strand, it will be 1:4^length, right?
But we are interested in the functional complexity, that is the probability of any string of that length which still expresses the function, as defined.
So we could be talking about a gene, with a known function?
So we have to compute, or at least approximate, the “target space”, the functional space: the total number of sequences of that length which still express the function, as defined.
Well, I have a question here: how do we find out how many sequences of that length will still express the target function? And why that length? We know that genes can vary in length and still perform their function - either express the same protein or an equally functional one. We often know how many alleles (variants of a gene) are actually found in a population, but how do we know what the total possible number of variants are?
The ratio of the target space to the search space, expressed in bits, is defined as the functional complexity of that string for that function, with the definition and assessment method we have taken into consideration.
Well, I still don't know how you are defining the target space.
In reality, we have to exclude any known algorithm that can generate the functional string in a simpler way. That means, IOWs, that the string must be scarcely compressible, at least for what we know. This point is important, and we can discuss it in more detail.
Well, as you have defined your target space in terms of the number of sequences [of the same length] that will serve that function (I put "of the same length" in square brackets because it seems to me that is an invalid constraint), then why do we also need to worry about "compressibility"? We have our specification right there. On the other hand, compressibility may be relevant for identifying our target space (because right now I don't know how to get a number on your target space).
In general, protein sequences are considered as scarcely compressible, and for the moment that will do. If any explicit necessity mechanism can be shown to generate the string, even in part, the evaluation of complexity must be done again, limiting the calculation to the functional complexity that cannot be explained by the known necessity mechanism.
Ah, OK. You actually want to use compressibility to exclude "Necessity" - simple algorithms. That's sort of the opposite of Dembski :) But yes, if an algorithm can produce the sequence, then I agree that we can exclude Design (if that is what you are saying). But I then think that the evolutionary algorithm can probably produce your pattern :) However, I'm bothered by your target space issue. Ironically we are in danger of an inverse WEASEL here: of defining complexity in terms of too small a target. Firstly, it is not clear to me how to compute the target space for a single protein, because we'd still have to model a probability distribution for introns. Then we don't know how many protein variants would be equally functional, val/met substitutions for instance. Nor do we know whether a quite different protein might do the equivalent job in a different way, or whether an entire set of proteins, expressed at different times, might be equally effective in "solving" the problem that the protein in question forms a part-solution to. And this gets to the heart of the evolutionary story - that Darwinian evolution makes no prediction about what "solutions" a population will find to survive in a particular environment - all it predicts is that adaptation will tend to occur, not what form that adaptation will take. So the "target space" is absolutely critical. And so indeed is the "search space", which is not the kind of search you get by rolling four-sided dice N times, where N equals the length of a candidate sequence, and repeating the process until you hit a useful protein. This is what is wrong with Dembski's application of No Free Lunch in fact, but we needn't worry about that here. What concerns us more is how many possible sequences offer a slight phenotypic advantage of whatever kind and how many have a near-neutral effect, and of those, ditto, and of those, ditto, and so on. So my concern is that you have a) underestimated the target space and b) overestimated the search space. Shall we try your claim out with some real data and see if I'm right?Elizabeth Liddle
July 18, 2011, 10:17 AM PDT
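The ratio gpuccio defines in the quoted exchange above reduces to a one-line calculation once the two space sizes are in hand. Here is a minimal Python sketch; the target-space count below is a made-up placeholder chosen only to show the arithmetic, not an estimate for any real protein or gene:

```python
import math

def functional_complexity_bits(target_count: float, search_count: float) -> float:
    """dFSCI-style functional complexity as defined in the exchange above:
    -log2( target space / search space ), expressed in bits."""
    return -(math.log2(target_count) - math.log2(search_count))

# Toy illustration only: a 150-nucleotide stretch of DNA.
search_space = 4.0 ** 150   # all sequences of that length over 4 bases (2^300)
target_space = 1e30         # ASSUMED count of sequences preserving the defined
                            # function -- a placeholder; estimating this number
                            # (e.g. by a Durston-style analysis) is the hard,
                            # still-debated step in the thread above.

print(functional_complexity_bits(target_space, search_space))  # ~200.3 bits
```

The arithmetic is the easy part; as both participants note, everything turns on how the target space is estimated.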
KF: thank you for your wonderful and more technical contribution. I am not a mathematician, so I am trying to stay as simple as possible. I have not yet discussed the threshold for a biological context, but as you know I usually put it at a much lower level of complexity. That will be the subject of my next post.gpuccio
July 18, 2011, 10:15 AM PDT
Elizabeth:
Well, I mean that the patterns deemed to have CSI (e.g. the patterns in living things) can be created by Chance and Necessity.
Yet that has never been observed. And there isn't any evidence that genetic accidents can accumulate in such a way as to give rise to new, useful and functional multi-part systems.Joseph
July 18, 2011, 10:11 AM PDT
GP: Here's how I build on WD. Conceive of an observed event E, for specificity a string of digital elements (as every other data structure can be broken into connected organised strings, and systems that are not digital can be captured as digital reps). Now, define a set -- a separately definable collection -- T, where the E's of relevance to our interest come from T. Further, T is itself a narrow and UN-representative set in a space of possibilities W. That is, if you were to take small samples of W, you would be unlikely to end up in T, a special zone. Now, let the elements of T store at least 500 bits of info, info that has to meet certain specifications to be in T. (E.g. T for this thread would be posts longer than 73 ASCII characters in English and contextually responsive to the theme of the thread.) Now, the zone of possibilities for at least 500 bits is vast. Indeed, at the 500-bit limit, we are looking at 3 * 10^150 possibilities. That is, on the usual estimates, 10^48 times the number of Planck-time quantum states of the atoms in our solar system since its birth. No random walk sample from W on the gamut of the 10^57 or so atoms in our solar system would be reasonably likely to land in T by chance. T, the island of function, is deeply isolated in and unrepresentative of the possibilities W. The typical at-random 500-bit string would be 73 characters of garbled rubbish. And yet, each participant, in a matter of a few minutes at most, well within the resources of the solar system [our bodies are about 10^27 atoms], pounded out 10-word strings. Monkeys at keyboards could not reasonably do the same in anything like a reasonable scope of resources. A whole internet is there to back up the point. We have a working definition and rationale for the metric in the OP: Chi_500 = I*S - 500, bits beyond the solar system threshold. I is from I = - log p, or extensions or applications thereof. S is a dummy variable that goes to 1 if Es can be spotted in a defined T in a much wider W. We can go up to 1,000 bits easily, if you need an observed-cosmos threshold. Remember, 10^30 PTQSs for the fastest chemical reactions. GEM of TKIkairosfocus
July 18, 2011, 10:06 AM PDT
Elizabeth: I agree with your definition of an operational definition.gpuccio
July 18, 2011, 09:53 AM PDT
kf: I don't know what an "ostensive" definition is, I'm afraid, but I think that operational definitions are vital if a claim is to be put to the test. Indeed, they are vital by definition! Wiki has a good definition at present:
An operational definition defines something (e.g. a variable, term, or object) in terms of the specific process or set of validation tests used to determine its presence and quantity. That is, one defines something in terms of the operations that count as measuring it.
If the claim is that a certain kind of pattern (a pattern with certain properties) can only be created by Design, then it is important to have a set of validation tests in order to "determine its presence" if the claim is to be tested. If I produce a simulation, where the only inputs are Chance and a set of Rules of Necessity, then if I end up with a candidate pattern, we need to be able to determine the presence or otherwise of the claimed Design signature. This is what an operational definition is.Elizabeth Liddle
July 18, 2011, 09:48 AM PDT
PS: By consensus and experience, we are intelligent, conscious, purposeful and designing. The relevant set is non-empty, and by material family resemblance other cases would be recognised; just as we moved from Earth's moon to those first four moons of Jupiter and onwards.kairosfocus
July 18, 2011, 09:41 AM PDT
Elizabeth at 49: OK, more or less I would say it is correct. Go on.gpuccio
July 18, 2011, 09:37 AM PDT
Elizabeth: So, complexity. Well, we are dealing with digital strings, so we will define complexity, in general, as the probability of that sequence versus the total number of possible sequences of the same length. For simplicity, we can express that in bits, like in Shannon's theory, by taking the negative base 2 log. So, each specific sequence in binary language has a generic complexity of 1 : 2^length. But we are interested in the functional complexity, that is the probability of any string of that length which still expresses the function, as defined. So we have to compute, or at least approximate, the "target space", the functional space: the total number of sequences of that length which still express the function, as defined. The ratio of the target space to the search space, expressed in bits, is defined as the functional complexity of that string for that function, with the definition and assessment method we have taken into consideration. In reality, we have to exclude any known algorithm that can generate the functional string in a simpler way. That means, IOWs, that the string must be scarcely compressible, at least for what we know. This point is important, and we can discuss it in more detail. In general, protein sequences are considered as scarcely compressible, and for the moment that will do. If any explicit necessity mechanism can be shown to generate the string, even in part, the evaluation of complexity must be done again, limiting the calculation to the functional complexity that cannot be explained by the known necessity mechanism.gpuccio
July 18, 2011, 09:35 AM PDT
Dr Liddle: On definitions of CSI cf. the OP and onward links to the April 18 and May 10 threads. Do not overlook the provided examples for families of proteins. BTW, ostensive definitions are the most relevant, not so much operational or precising denotative ones. GEM of TKIkairosfocus
July 18, 2011, 09:30 AM PDT
Hang on, gpuccio, we really do need to go step by step. Let me try to summarise: Step 1: A conscious being is a being that many human beings agree is conscious. Step 2: Design is the purposeful imposition of form on an object by a conscious being. Step 3: digital Functionally Specified Complex Information (dFSCI) is: 1. Digital: can be read by an observer as a series of digital values, e.g. DNA. 2. Functionally specified: e.g. has a function that can be described by a conscious observer (e.g. us). 3. Complexity: tba. Can you let me know if any of that is incorrect?Elizabeth Liddle
July 18, 2011, 09:21 AM PDT
Elizabeth: Ah, the executive power. Well, it is not important. If we observe a designer in the process of designing, that means that he has the executive power. That power can be different, according to the method of implementation of design. But at present this is not important. For defining a function, the only executive power is that the observer must recognize a function, define it and give a method to assess it. Maybe I must be more clear. I am not saying that, if a certain observer cannot see any function in an object, that means that no function exists for the object. Another observer can recognize a function that the first observer missed. All that has no consequences for the discussion. All I am saying is that, if a conscious observer can explicitly define a function for the object, and give an objective way to assess it, we will take that function into consideration as a specification for that object. Nothing more.gpuccio
July 18, 2011, 09:03 AM PDT
Elizabeth: Your objections may have value or not, but they are not relevant to my procedure. I need the concept of conscious intelligent being only for two steps of the procedure: 1) To define design. So, do you agree that humans are, in general, conscious intelligent agents? That's the important part, because we will consider exactly human artifacts as examples of design. It is not important, for now, to know if a computer is conscious or not. If it is conscious, we could also consider its outputs as designed, but that is not really important, because the computer, being designed by humans, implies anyway the purposeful intervention of a designer for its existence, and so that does not change anything. So, if you agree that humans are conscious intelligent agents, we can go on with the discussion, and leave for the moment unsolved whether computers, or aliens, are. 2) The second point is to define functional specification. Here, too, all we need is that a human observer may define a function for the object explicitly, and give a way to assess it, so that other conscious intelligent beings, like other humans, may agree. Again, it is of no relevance whether computers or aliens are conscious. If they are, they can certainly follow the reasoning with us :)gpuccio
July 18, 2011, 08:57 AM PDT
Elizabeth: d = digital. We will consider only objects in which some sequence can be read by an observer as a digital sequence of values. It does not matter if those aspects of the object (points on paper, amino acids in a protein, nucleotides in a DNA molecule) are really symbols of something or not; the only thing that matters is that we, as observers, can give some numeric value to the sequence we observe, and describe it as a digital sequence. We can certainly do that for nucleotides in DNA protein-coding genes, for instance. They are sequences of 4 different nucleotides, and the sequence can easily be written as a sequence of A, T, C and G on paper, conserving essentially the same digital form of the original molecule. FS = functionally specified. To assess our specification, all we need is a simple empirical procedure. If a conscious intelligent agent can define a function for the object, we say that the object is specified relatively to that function, provided that: 1) The function can be explicitly defined, so that anyone can understand it and verify it. 2) Some explicit method is given to measure the function and assess its presence or absence, preferably by a quantitative threshold. Please note that it is perfectly possible that more than one function is defined for the same object. A good example is a PC. We can define it as a useful paperweight, and indeed it can certainly be used as such (if we do not have excessive expectations). But we can certainly define it as capable of computing, and give methods to verify that function. There is no problem there. Any function which can be explicitly defined and objectively assessed can be considered. Each function, however, has to be considered separately. Let's leave complexity to the next post.gpuccio
July 18, 2011, 08:49 AM PDT
I like your definition in Step 2, I think. It's good to include purpose explicitly. However, it is absolutely dependent on Step One, and, moreover, requires an additional criterion - that the being is not only conscious but able to execute intentions (so it would rule out a conscious being with no executive power, e.g. a conscious but paralysed being). So we have to find an empirical method for determining a) consciousness and b) executive power. We don't have either yet :)Elizabeth Liddle
July 18, 2011, 08:41 AM PDT