Uncommon Descent Serving The Intelligent Design Community

ID Foundations, 11: Borel’s Infinite Monkeys analysis and the significance of the log reduced Chi metric, Chi_500 = I*S – 500


Emile Borel, 1932

Emile Borel (1871 – 1956) was a distinguished French mathematician who, as the son of a Protestant minister, came from France’s Protestant minority. He was a founder of measure theory in mathematics and a significant contributor to modern probability theory, so Knobloch observed of his approach that:

>>Borel published more than fifty papers between 1905 and 1950 on the calculus of probability. They were mainly motivated or influenced by Poincaré, Bertrand, Reichenbach, and Keynes. However, he took for the most part an opposed view because of his realistic attitude toward mathematics. He stressed the important and practical value of probability theory. He emphasized the applications to the different sociological, biological, physical, and mathematical sciences. He preferred to elucidate these applications instead of looking for an axiomatization of probability theory. Its essential peculiarities were for him unpredictability, indeterminism, and discontinuity. Nevertheless, he was interested in a clarification of the probability concept. [Emile Borel as a probabilist, in The probabilist revolution Vol 1 (Cambridge Mass., 1987), 215-233. Cited, Mac Tutor History of Mathematics Archive, Borel Biography.]>>

Among other things, he is credited as the worker who introduced a serious mathematical analysis of the so-called Infinite Monkeys theorem (more on this in a moment).

So, it is unsurprising that Abel, in his recent universal plausibility metric paper, observed  that:

Emile Borel’s limit of cosmic probabilistic resources [c. 1913?] was only 10^50 [[23] (pg. 28-30)]. Borel based this probability bound in part on the product of the number of observable stars (10^9) times the number of possible human observations that could be made on those stars (10^20).

This, of course, is now a bit expanded, given the breakthroughs in astronomy occasioned by the Mt Wilson 100-inch telescope under Hubble in the 1920s. However, it does underscore how centrally important the issue of available resources is, to render a given — logically and physically strictly possible but utterly improbable — potential chance-based event reasonably observable.

We may therefore now introduce Wikipedia as a hostile witness, testifying against known ideological interest, in its article on the Infinite Monkeys theorem:

The theorem, in one of the forms in which probabilists now know it, with its “dactylographic” [i.e., typewriting] monkeys (French: singes dactylographes; the French word singe covers both the monkeys and the apes), appeared in Émile Borel’s 1913 article “Mécanique Statistique et Irréversibilité” (Statistical mechanics and irreversibility),[3] and in his book “Le Hasard” in 1914. His “monkeys” are not actual monkeys; rather, they are a metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.

The physicist Arthur Eddington drew on Borel’s image further in The Nature of the Physical World (1928), writing:

If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.[4]

These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys’ success is effectively impossible, and it may safely be said that such a process will never happen.

Let us emphasise that last part, as it is so easy to overlook in the heat of the ongoing debates over origins and the significance of the idea that we can infer to design on noticing certain empirical signs:

These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys’ success is effectively impossible, and it may safely be said that such a process will never happen.

Why is that?

Because of the nature of sampling from a large space of possible configurations. That is, we face a needle-in-the-haystack challenge.

For, there are only so many resources available in a realistic situation, and only so many observations can therefore be actualised in the time available. As a result, if one is confined to a blind probabilistic, random search process, s/he will soon enough run into the issue that:

a: IF there is a narrow and atypical set of possible outcomes T, that

b: may be described by some definite specification Z (that does not boil down to listing the set T or the like), and

c: which comprises a set of possibilities E1, E2, . . . En, drawn from

d: a much larger set of possible outcomes, W, THEN:

e: IF, further, we do see some Ei from T, THEN also

f: Ei is not plausibly a chance occurrence.

The reason for this is not hard to spot: when a sufficiently small, chance-based, blind sample is taken from a set of possibilities, W (a configuration space), the likeliest outcome is that what is typical of the bulk of the possibilities will be chosen, not what is atypical. And this is the foundation-stone of the statistical form of the second law of thermodynamics.
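For concreteness, here is a minimal, illustrative Python sketch of that sampling point (the space size and target fraction are arbitrary stand-ins, not figures from the argument): a blind, chance-based sample from a large configuration space W essentially never lands in a narrow, atypical zone T.

```python
import random

# Illustrative stand-in numbers only: a configuration space W of 2^40 equally
# likely outcomes, with a narrow zone T occupying one billionth of it.
SPACE_SIZE = 2**40
TARGET_FRACTION = 1e-9
TARGET_SIZE = int(SPACE_SIZE * TARGET_FRACTION)   # ~1,100 configurations

def blind_hits(n_samples, seed=0):
    """Count how often a blind random sample lands in the zone T
    (modelled here, without loss of generality, as the lowest-numbered configs)."""
    rng = random.Random(seed)
    return sum(rng.randrange(SPACE_SIZE) < TARGET_SIZE for _ in range(n_samples))

print(blind_hits(1_000_000))   # expected ~0.001 hits in a million tries, i.e. 0
```

Scaled up to the 10^150-plus spaces discussed below, the available observational resources shrink the sample fraction far below anything a simulation can even represent.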

Hence, Borel’s remark as summarised by Wikipedia:

Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.

In recent months, here at UD, we have described this in terms of searching for a needle in a vast haystack [corrective u/d follows]:

let us work back from how it takes ~ 10^30 Planck time states for the fastest chemical reactions, and use this as a yardstick, i.e. in 10^17 s, our solar system’s 10^57 atoms would undergo ~ 10^87 “chemical time” states, about as fast as anything involving atoms could happen. That is 1 in 10^63 of 10^150. So, let’s do an illustrative haystack calculation:

 Let us take a straw as weighing about a gram and having comparable density to water, so that a haystack weighing 10^63 g [= 10^57 tonnes] would take up as many cubic metres. The stack, assuming a cubical shape, would be 10^19 m across. Now, 1 light year = 9.46 * 10^15 m, or about 1/1,000 of that distance across. If we were to superpose such a notional haystack, 1,000 light years on a side, on the zone of space centred on the sun, and leave in all stars, planets, comets, rocks, etc., and take a random sample equal in size to one straw, by absolutely overwhelming odds, we would get straw, not star or planet etc. That is, such a sample would be overwhelmingly likely to reflect the bulk of the distribution, not special, isolated zones in it.
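As a quick back-of-envelope check of the figures in that clip, a minimal Python sketch under the stated assumptions (one gram per straw, water-like density, a cubical stack):

```python
# Back-of-envelope check of the haystack figures, assumptions as stated above.
grams        = 10**63                 # one straw ~ 1 g, so 10^63 straws
tonnes       = grams / 1e6            # 10^57 tonnes
volume_m3    = grams / 1e6            # at ~1 g/cm^3: 10^63 cm^3 = 10^57 m^3
side_m       = volume_m3 ** (1 / 3)   # cube side ~ 10^19 m
light_year_m = 9.46e15

print(f"{tonnes:.1e} tonnes, side {side_m:.1e} m "
      f"= {side_m / light_year_m:,.0f} light years")
# -> 1.0e+57 tonnes, side 1.0e+19 m = roughly 1,000 light years on a side
```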

With this in mind, we may now look at the Dembski Chi metric, and reduce it to a simpler, more practically applicable form:

m: In 2005, Dembski provided a fairly complex formula that we can quote and simplify:

χ = – log2[10^120 · ϕS(T) · P(T|H)], where χ is “chi” and ϕ is “phi”

n:  To simplify and build a more “practical” mathematical model, we note that information theory researchers Shannon and Hartley showed us how to measure information by changing probability into a log measure that allows pieces of information to add up naturally: Ip = – log p, in bits if the base is 2. (That is where the now familiar unit, the bit, comes from.)

o: So, since 10^120 ~ 2^398, we may do some algebra, using log(p*q*r) = log(p) + log(q) + log(r) and log(1/p) = – log(p), and writing D2 for ϕS(T) and p for P(T|H):

Chi = – log2(2^398 * D2 * p), in bits

Chi = Ip – (398 + K2), where log2(D2) = K2 and Ip = – log2(p)

p: But since 398 + K2 tends to at most 500 bits on the gamut of our solar system [our practical universe, for chemical interactions! (if you want, 1,000 bits would be a limit for the observable cosmos)], and

q: as we can define a dummy variable for specificity, S, where S = 1 or 0 according as the observed configuration, E, is on objective analysis specific to a narrow and independently describable zone of interest, T:

Chi_500 = Ip*S – 500, in bits beyond a “complex enough” threshold

(If S = 0, Chi = – 500; and if Ip is less than 500 bits, Chi will be negative even if S = 1. E.g.: a string of 501 coins tossed at random will have S = 0, but if the coins are arranged to spell out a message in English using the ASCII code [notice the independent specification of a narrow zone of possible configurations, T], Chi will — unsurprisingly — be positive.)
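The reduced metric is simple enough to capture in a few lines; here is a minimal, illustrative Python sketch of the Chi_500 expression as just derived (the helper name and the coin figures are illustrative choices):

```python
def chi_500(ip_bits, s, threshold=500):
    """Chi_500 = Ip*S - 500: bits beyond the solar-system threshold.
    ip_bits : information measure Ip for the observed configuration E.
    s       : dummy specificity variable, 1 if E sits in an independently
              describable, narrow zone T, else 0.
    Note: 500 bits corresponds to 2^500 ~ 3.27*10^150 possible configurations."""
    return ip_bits * s - threshold

# 501 coins tossed at random: complex (Ip = 501 bits) but unspecific (S = 0)
print(chi_500(501, 0))   # -500, so no design inference
# the same 501 coins arranged as ASCII English text: specific (S = 1)
print(chi_500(501, 1))   # 1, just past the threshold
```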

r: So, we have some reason to suggest that if something, E, is based on specific information describable in a way that does not just quote E and requires at least 500 specific bits to store the specific information, then the most reasonable explanation for the cause of E is that it was intelligently designed. (For instance, no-one would dream of asserting seriously that the English text of this post is a matter of chance occurrence giving rise to a lucky configuration, a point that was well-understood by that Bible-thumping redneck fundy — NOT! — Cicero in 50 BC.)

s: The metric may be directly applied to biological cases:

t: Using Durston’s Fits values — functionally specific bits — from his Table 1 to quantify I, and accepting functionality on specific sequences as showing specificity (so S = 1), we may apply the simplified Chi_500 metric of bits beyond the threshold:

RecA: 242 AA, 832 fits, Chi: 332 bits beyond

SecY: 342 AA, 688 fits, Chi: 188 bits beyond

Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond
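A short Python sketch applying the same reduction to the quoted Table 1 values (illustrative only; the dictionary simply restates the Fits figures above, with S = 1 taken as warranted by observed sequence-specific function):

```python
# Fits values quoted above from Durston's Table 1 (functionally specific bits).
durston_fits = {"RecA": 832, "SecY": 688, "Corona S2": 1285}

for name, fits in durston_fits.items():
    chi = fits * 1 - 500          # Chi_500 = Ip*S - 500, with S = 1
    print(f"{name}: {chi} bits beyond the threshold")
# RecA: 332, SecY: 188, Corona S2: 785 (matching the figures listed above)
```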

u: And this raises the controversial possibility that biological examples such as DNA — which in a living cell stores far more than 500 bits — may be designed to carry out particular functions in the cell and the wider organism.

v: Therefore, we have at least one possible general empirical sign of intelligent design, namely: functionally specific, complex organisation and associated information [FSCO/I].

But, but, but . . . isn’t “natural selection” precisely NOT a chance-based process? So doesn’t the ability to reproduce in environments, adapt to new niches and then dominate the population make nonsense of such a calculation?

NO.

Why is that?

Because of the actual claimed source of variation (which is often masked by the emphasis on “selection”) and the scope of innovations required to originate functionally effective body plans, as opposed to varying same — starting with the very first one, i.e. Origin of Life, OOL.

But that’s Hoyle’s fallacy!

Advice: when going up against a Nobel-equivalent prize-holder whose field requires expertise in mathematics and thermodynamics, one would be well advised to examine carefully the underpinnings of what is being said, not just the rhetorical flourish about tornadoes in junkyards in Seattle assembling 747 Jumbo Jets.

More specifically, the key concept of Darwinian evolution [we need not detain ourselves too much on debates over mutations as the way variations manifest themselves] is that:

CHANCE VARIATION (CV) + NATURAL “SELECTION” (NS) –> DESCENT WITH (UNLIMITED) MODIFICATION (DWM), i.e. “EVOLUTION.”

CV + NS –> DWM, aka Evolution

If we look at NS, this boils down to differential reproductive success in environments leading to elimination of the relatively unfit.

That is, NS is a culling-out process, a subtracter of information, not the claimed source of information.

That leaves only CV, i.e. blind chance, manifested in various ways. (And of course, in anticipation of some of the usual side-tracks, we must note that the Darwinian view, as modified through the genetic mutations concept and population genetics to describe how population fractions shift, is the dominant view in the field.)

There are of course some empirical cases in point, but in all these cases what is observed is fairly minor variation within a given body plan, not the relevant issue: the spontaneous emergence of such a complex, functionally specific and tightly integrated body plan, which must be viable from the zygote on up.

To cover that gap, we have a well-known metaphorical image — an analogy, the Darwinian Tree of Life. This boils down to implying that there is a vast contiguous continent of functionally possible variations of life forms, so that we may see a smooth incremental development across that vast fitness landscape, once we had an original life form capable of self-replication.

What is the evidence for that?

Actually, nil.

The fossil record, the only direct empirical evidence of the remote past, is notoriously that of sudden appearances of novel forms, stasis (with some variability within the form obviously), and disappearance and/or continuation into the modern world.

If by contrast the tree of life framework were the observed reality, we would see a fossil record DOMINATED by transitional forms, not the few strained examples that are so often triumphalistically presented in textbooks and museums.

Similarly, it is notorious that fairly minor variations in the embryological development process are easily fatal. No surprise: if we have a highly complex, deeply interwoven interactive system, chance disturbances are overwhelmingly going to be disruptive.

Likewise, complex, functionally specific hardware is not designed and developed by small, chance based functional increments to an existing simple form.

Hoyle’s challenge of overwhelming improbability does not begin with the assembly of a Jumbo jet by chance, it begins with the assembly of say an indicating instrument on its cockpit instrument panel.

The D’Arsonval galvanometer movement commonly used in indicating instruments: an adaptation of a motor that runs against a spiral spring (to give proportionality of deflection to input current across the magnetic field) and has an attached needle moving across a scale. Such an instrument, historically, was often adapted for measuring all sorts of quantities on a panel.

(Indeed, it would be utterly unlikely for a large box of mixed nuts and bolts, by chance shaking, to bring together a matching nut and bolt and screw them together tightly: the first step to assembling the instrument by chance.)

Further to this, it would be bad enough to try to get together the text strings for a Hello World program (let’s leave off the implementing machinery and software that make it work) by chance. To then incrementally create an operating system from it, each small step along the way being functional, would be a bizarre, operationally impossible super-task.

So, the real challenge is that those who have put forth the tree of life, continent-of-function type approach have got to show empirically that their step-by-step path up the slopes of Mt Improbable is observable, at least in reasonable model cases. And they need to show that, in effect, chance variations on a Hello World will lead, within reasonable plausibility, to a stepwise development that transforms the Hello World into something fundamentally different.

In short, we have excellent reason to infer that — absent empirical demonstration otherwise — complex, specifically functional, integrated organisation arises in clusters that are atypical of the general run of the vastly larger set of physically possible configurations of components. And the strongest pointer that this is plainly so for life forms as well is the detailed, complex, step-by-step, information-controlled nature of the processes in the cell that use information stored in DNA to make proteins. Let’s call Wiki as a hostile witness again, courtesy two key diagrams:

I: Overview:

The step-by-step process of protein synthesis, controlled by the digital (= discrete state) information stored in DNA

II: Focusing on the Ribosome in action for protein synthesis:

The Ribosome, assembling a protein step by step based on the instructions in the mRNA “control tape” (the AA chain is then folded and put to work)

Clay animation video [added Dec 4]:

More detailed animation [added Dec 4]:

This sort of elaborate, tightly controlled, instruction based step by step process is itself a strong sign that this sort of outcome is unlikely by chance variations.

(And attempts to deny the obvious, that we are looking at digital information at work in algorithmic, step-by-step processes, are themselves a sign that there is a controlling a priori at work that must lock out the very evidence before our eyes to succeed. The above is not intended to persuade such objectors; they are plainly not open to evidence, so we can only note how their position reduces to patent absurdity in the face of evidence and move on.)

But, isn’t the insertion of a dummy variable S into the Chi_500 metric little more than question-begging?

Again, NO.

Let us consider a simple form of the per-aspect explanatory filter approach:

The per aspect design inference explanatory filter

You will observe two key decision nodes, where the first default is that the aspect of the object, phenomenon or process being studied is rooted in a natural, lawlike regularity that under similar conditions will produce similar outcomes, i.e. there is a reliable law of nature at work, leading to low contingency of outcomes. A dropped heavy object near earth’s surface will reliably fall with initial acceleration g, 9.8 m/s^2. That lawlike behaviour with low contingency can be empirically investigated and would eliminate design as a reasonable explanation.

Second, we see some situations where there is a high degree of contingency of possible outcomes under similar initial circumstances. This is the more interesting case, and in our experience has two candidate mechanisms: chance, or choice. The default for S under these circumstances is 0. That is, the presumption is that chance is an adequate explanation, unless there is a good — empirical and/or analytical — reason to think otherwise. In short, on investigation of the dynamics of volcanoes and our experience with them, rooted in direct observations, the complexity of a Mt Pinatubo is explained partly by natural laws and partly by chance variations; there is no need to infer to choice to explain its structure.

But, if the observed configurations of highly contingent elements were from a narrow and atypical zone T not credibly reachable based on the search resources available, then we would be objectively warranted to infer to choice. For instance, a chance based text string of length equal to this post, would  overwhelmingly be gibberish, so we are entitled to note the functional specificity at work in the post, and assign S = 1 here.
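The point about post-length text is easy to illustrate; a minimal Python sketch (the character set and the 72-character length are illustrative choices):

```python
import random, string

# At 7 bits per ASCII character, even a short stretch of text is "complex":
print(7 * 72)   # 504 bits of capacity in 72 characters (already past 500 bits)

# And this is what blind, chance-based typing of 72 characters looks like:
rng = random.Random(2012)
alphabet = string.ascii_lowercase + " "
print("".join(rng.choice(alphabet) for _ in range(72)))
# -> gibberish with near certainty; contextually responsive English essentially
#    never turns up in any feasible number of tries, hence S = 1 for this post.
```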

So, the dummy variable S is not a matter of question-begging, never mind the usual dismissive talking points.

I is of course an information measure based on standard approaches, through the sort of probabilistic calculations Hartley and Shannon used, or by a direct observation of the state-structure of a system [e.g. on/off switches naturally encode one bit each].

And, where an entity is not a direct information-storing object, we may reduce it to a mesh of nodes and arcs, then investigate how much variation can be allowed while still retaining adequate function. That is, a key and lock can be reduced to a bit measure of implied information, and a sculpture like that at Mt Rushmore can similarly be analysed, given the specificity of portraiture.
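A toy version of the key-and-lock reduction, as a hedged Python sketch (the combination-lock numbers are purely illustrative):

```python
from math import log2

def implied_bits(functional_configs, total_configs):
    """Implied information: -log2(fraction of configurations that still function)."""
    return -log2(functional_configs / total_configs)

# A 4-digit combination lock: 10^4 possible settings, one of which opens it.
print(round(implied_bits(1, 10**4), 1))    # 13.3 bits
# If 20 slightly-off settings also happened to open it, the figure drops:
print(round(implied_bits(20, 10**4), 1))   # 9.0 bits: more tolerance, fewer bits
```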

The 500 is a threshold, related to the limits of the search resources of our solar system, and if we want more, we can easily move up to the 1,000 bit threshold for our observed cosmos.

On needle-in-a-haystack grounds, or monkeys-strumming-at-keyboards grounds, if we are dealing with functionally specific, complex information beyond these thresholds, the best explanation for seeing such is design.

And that is abundantly verified by the contents of, say, the Library of Congress (26 million works), or the Internet, or the product across time of the computer programming industry.

But what about Genetic Algorithms etc.? Don’t they prove that such FSCI can come about by cumulative progress based on trial and error rewarded by success?

Not really.

As a rule, such algorithms carry out generalised hill-climbing within islands of function characterised by intelligently designed fitness functions with well-behaved trends, and controlled variation within equally intelligently designed search algorithms. They start within a target zone T, by design, and proceed to adapt incrementally based on built-in, designed algorithms.
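A minimal Python sketch of the kind of search being described (a Weasel-style hill climber, of the sort discussed in the comments below; the target phrase, mutation scheme and fitness function are all supplied up front by the programmer, which is exactly the point at issue):

```python
import random

TARGET   = "METHINKS IT IS LIKE A WEASEL"          # designed-in target
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    """Designed, well-behaved fitness function: a distance-to-target signal."""
    return sum(a == b for a, b in zip(s, TARGET))

rng = random.Random(1)
current = "".join(rng.choice(ALPHABET) for _ in TARGET)
while fitness(current) < len(TARGET):
    i = rng.randrange(len(TARGET))                 # controlled, single-letter variation
    candidate = current[:i] + rng.choice(ALPHABET) + current[i + 1:]
    if fitness(candidate) >= fitness(current):     # reward moves toward the target
        current = candidate
print(current)                                     # converges in a few thousand steps
```

The climb succeeds because the fitness function supplies a smooth, designed gradient toward a known target; it says nothing about arriving at an island of function in the first place.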

If such a GA were to emerge from a Hello World by incremental chance variations that worked as programs in their own right every step of the way, that would be a different story, but for excellent reason we can safely include GAs in the set of cases where FSCI comes about by choice, not chance.

So, we can see what the Chi_500 expression means, and how it is a reasonable and empirically supported tool for measuring complex specified information, especially where the specification is functionally based.

And, we can see the basis for what it is doing, and why one is justified to use it, despite many commonly encountered objections. END

________

F/N, Jan 22: In response to a renewed controversy tangential to another blog thread, I have redirected discussion here. As a point of reference for background information, I append a clip from the thread:

. . . [If you wish to find] basic background on info theory and similar background from serious sources, then go to the linked thread . . . And BTW, Shannon’s original 1948 paper is still a good early stop-off on this. I just did a web search and see it is surprisingly hard to get a good simple free online 101 on info theory for the non mathematically sophisticated; to my astonishment the section A of my always linked note clipped from above is by comparison a fairly useful first intro. I like this intro at the next level here, this is similar, this is nice and short while introducing notation, this is a short book in effect, this is a longer one, and I suggest the Marks lecture on evo informatics here as a useful contextualisation. Qualitative outline here. I note as well Perry Marshall’s related exchange here, to save going over long since adequately answered talking points, such as asserting that DNA in the context of genes is not coded information expressed in a string of 4-state per position G/C/A/T monomers. The one good thing is, I found the Jaynes 1957 paper online, now added to my vault, no cloud without a silver lining.

If you are genuinely puzzled on practical heuristics, I suggest a look at the geoglyphs example already linked. This genetic discussion may help on the basic ideas, but of course the issues Durston et al raised in 2007 are not delved on.

(I must note that an industry-full of complex praxis is going to be hard to reduce to an in-a-nutshell summary. However, we are quite familiar with information at work, and with how we routinely measure it, as in, say, the familiar: “this Word file is 235 k bytes.” That such a file is exceedingly functionally specific can be seen by the experiment of opening one up in an inspection package that will access the raw text symbols for the file. A lot of it will look like repetitive nonsense, but if you clip off such, sometimes just one header character, the file will be corrupted and will not open as a Word file. When we have a great many parts that must be right and in the right pattern for something to work in a given context like this, we are dealing with functionally specific, complex organisation and associated information, FSCO/I for short.)

The point of the main post above is that once we have this, and are past 500 bits or 1000 bits, it is not credible that such can arise by blind chance and mechanical necessity. But of course, intelligence routinely produces such, like comments in this thread. Objectors can answer all of this quite simply, by producing a case where such chance and necessity — without intelligent action by the back door — produces such FSCO/I. If they could do this, the heart would be cut out of design theory. But, year after year, thread after thread, here and elsewhere, this simple challenge is not being met. Borel, as discussed above, points out the basic reason why.

Comments
eigenstate: Your post # 14. I will be brief (I hope). The wrong thing in the lottery is the statement (shared by you) that "someone has to win the lottery". That is true only if all the lottery tickets have been printed and bought, and one of their numbers is extracted. That has nothing to do with the situation in biological information. I quote again from my initial comment, because it seems you have not read, or understood, it:
"The example of the lottery is simply stupid (please, don’t take offense). The reason is the following: in a lottery, a certain number of tickets is printed, let’s say 10000, and one of them is extracted. The “probability” of a ticket winning the lottery is 1: it is a necessity relationship. But a protein of, say, 120 AAs, has a search space of 20^120 sequences. To think that all the “tickets” have been printed would be the same as saying that those 20^120 sequences have been really generated, and one of them is selected (wins the lottery). But, as that number is by far greater than the number of atoms in the universe (and of many other things), that means that we have a scenario where 10000 tickets are printed, each with a rnadom numbet between 1 and 20^120, and one random number between 1 and 20^120 is extracted. How probable is then that someone “wins the lottery”?” It seems clear, isn’t it? The lottery example is wrong and stupid."
But, if you want to go on defending it, please be my guest! :) Let's go to more important things. Again, you equivocate on fundamental facts. We observe designed objects, and look for specific formal properties in them. We choose dFSCI and verify that it is not exhibited by known non-designed objects. All your following comments in the post show that you misunderstand that completely. Please, read carefully what I am going to write: a) We know that the objects we observe in the beginning are designed, because we have direct evidence of that. So we choose them to look for some specific formal property. b) We define dFSCI because we believe, because of what we observe, that it can be that property. c) As I have shown before, the computation of dFSCI, and the judgement about its categorical presence (above a certain threshold of complexity) or absence (because no function is observed, or because the functional complexity is too low), is completely independent from knowing that the object is designed. I really can't understand why you insist on such a wrong idea. To compute dFSCI in a digital string, all we need is: c1) Recognize and define a function: I have clearly shown you that that does not imply design, and is done independently of any pre-judgement that the string is designed. Non-designed strings can be functional (but are never functionally complex). So, you are wrong if you think that, because we define a function, we are already assuming design. That is simply not true. c2) A computation of the search space: very easy, as I have shown. c3) A computation of the target space: difficult, but possible. c4) A simple division, and taking -log2 of the result. That gives us the functional complexity. To decide if that functional complexity is high enough to assign the object to the categorical, binary class of objects exhibiting complex functional information, for which a design inference can be made, we need a threshold of functional complexity, that must be appropriate for the probabilistic resources of the system we are studying. This is a property we can observe in the object, although, as I have already said, it is more a property of the function than of the object. Let's say that the object exhibits a function that is complex. So, all your other comments in the post are not pertinent and wrong. ___________ I hope my b/quote helped. Also, I suggest the issue is that the object exhibits a function that, up to a "narrow" zone T in a much wider field of possible configs W, is specific and complex. KFgpuccio
January 23, 2012, 03:04 PM PDT
I apologize, I failed to close the quote. Peter's words are only the first two paragraphs in the quote, up to "a very simple one that does alot?" The rest is mine.gpuccio
January 23, 2012, 02:44 PM PDT
Elizabeth, eigenstate, and others: As we are in the middle of a hot discussion, I would like to mention here, and answer, an objection made by Peter Griffin about dFSCI on the other thread. I do it here because I believe that it can clarify better what dFSCI really measures (Peter, why don't you join us here? :) ). Peter writes, commenting on my points about the relationship between length and dFSCI:
I do believe you. But that’s the problem. I can write a book with minimal meaning and as long as it’s sufficiently long it’ll apparently be more complex (have a higher value for dFSCI) than a much shorter text with much more inherent meaning. Harry Potter might be a long book but it can’t have more dFSCI than the special theory of relativity, or can it? Or to put it another way, is a very very large protein that does almost nothing more complex (higher dFSCI value) than a very simple one that does a lot?
Now, I want to be very clear about that: dFSCI in no way is measuring the conceptual depth or importance of the function. In that sense, Harry Potter can well be more complex than the special theory of relativity (or at least, of some not too long text about it). And yet, it is not measuring only length. So, what is it measuring? It is easy: the complexity in bits necessary to implement the defined function. Whatever that function is. It is my opinion that, in dense text, the relationship between length and dFSCI will be rather linear (with some random variability). The same is not true for proteins. Let's look at Durston's data, and in particular at the last column in Table one, the one titled "FSC Density Fits/aa". It gives the mean density of functional information in the protein families he studied. As you can see, the values go from a minimum of 1.4 for ankyrin to a maximum of 4 for Flu PB2. That is quite a vast range, if we consider that the minimum value for a single aminoacid site is 0 (that is, that site can randomly accommodate any of the 20 aminoacids, and is in no sense constrained by the functional state), and the maximum value is 4.32 (that is, the same aminoacid is always found at that site in the functional set). Now, that does not mean necessarily that a protein with a higher mean value of dFSCI is "more important", or has "more function" than any other. But it does mean that the structure function relationship, in that protein, is more strict, and does not tolerate well random variation. I hope that is clear, and answers Peter's question. ______ Does that help, KFgpuccio
January 23, 2012, 02:42 PM PDT
Thanks for looking, GP. I'll have a look at 23.1.2. I've been following along as time allows!material.infantacy
January 23, 2012, 02:34 PM PDT
Correction: the limit of -log2(k/20^150) as k approaches 20^150 is zero. The limit of (k/20^150) as k approaches 20^150 is 1, implying a probability of 1 for finding the random function.material.infantacy
January 23, 2012, 02:30 PM PDT
KF: Thanks for your contribution. As you know, I am a great fan of "objective subjectivity" :) .gpuccio
January 23, 2012, 02:25 PM PDT
material.infantacy: I think what you write is very correct. I have added some other thoughts in my post 23.1.2gpuccio
January 23, 2012, 02:23 PM PDT
Elizabeth: It's strange: I have said these things so many times, even responding to you... However: The search space is easily approximated by taking the length of the sequence and calculating the combinatorial value of the total number of possible sequences of that length. Obviously, shorter or longer sequences may be functional too, but fixing the length to the length of the observed sequence is the best way to simplify the problem. The target space is the big problem. In principle, we could measure the function, and assess its presence or absence, for each possible sequence. That is not a good idea in practice, with non trivial search spaces, but it means that the target space exists, and has a definite size. So, we need other, indirect methods to approximate it. In another thread, recently, I have given a mathematical demonstration that, for texts, the ratio of the target space to the search space, that is the functional complexity, is bound to increase as the length of the meaningful text increases. So I have demonstrated that any meaningful (and dense) text of more than 10000 characters will certainly have a dFSCI higher than 1000 bits, and will allow a design inference. That is a way to face the problem, and my demonstration for a 10000 characters threshold can certainly be lowered. The functional space of proteins is certainly different from the functional space of text. Moreover, while the length of text has virtually no limit, the length of individual proteins, and protein domains, has definite boundaries. Therefore, it is important to have a more precise measure of dFSCI, and indeed it can be done. One way is to go on with research about the sequence structure relationship in specific proteins, an approach followed also by Axe, and try to understand how much function is "robust" to random chance. A lot of important information can be obtained that way, but much is still to be done in that sense. The Durston method, instead, is a simple and powerful method to approximate the target space of specific protein families, and it is based on the comparison of a great number of different sequences in the proteome that implement the same function in different species, and a brilliant application to that of the principle of Shannon's uncertainty. The numbers found by Durston are certainly an approximation, but they are the best simple way we have of measuring the function space of proteins. They do measure it, in reality, although the precision of the measure can certainly be investigated further. An interesting result of Durston's method is that it confirms, for the space of protein function, what I have shown for the space of meaningful texts: that the functional complexity increases as the length of the functional sequence increases. While that is rather intuitive, it's fine to have an empirical confirmation. I have shown that in my discussion with Peter Griffin (by the way, Peter, I am still waiting to know what I should do with your six short phrases :) ). I quote here what I wrote there: "Just some more support for my concept that dFSCI increases with length increase, this time empirical and regarding the protein space. I have taken the data from Durston's table giving values for 35 protein families, and I have performed a linear regression of dFSCI against sequence length.
Here are the results for the regression: Multiple R-squared: 0.8105, Adjusted R-squared: 0.8047 F-statistic: 141.1 on 1 and 33 DF, p-value: 1.836e-13 So, empirically and for the protein space, values of dFSCI are strongly related to sequence length."gpuccio
January 23, 2012, 02:22 PM PDT
Thanks :)Elizabeth Liddle
January 23, 2012, 02:14 PM PDT
hey KF, on this point: "moving from S = 0, the default value, to S = 1." setting S=1 seems to be where the confusion is. specific identifiers, such as: discrete representation of some thing, protocol, effect/output etc should be all identified within the system before s=1? so the information contained in the grand canyon would be set provisionally to s=0 because protocol and output/function etc cannot be identified? Is this how the value of S is determined?junkdnaforlife
January 23, 2012, 01:53 PM PDT
Hi Elizabeth, I'll risk a guess that the target space is the specific functional sequence, expressed as string of amino acids (or their corresponding DNA code string) of length n, and a character set of length 20. The search space is the set of all finite sequences of length n, which figures as 20^n sequences for an n-length string. The target sequence complexity could be expressed in bits as -log2(1/20^n), which is the same for any single sequence in the search space. If multiple sequences could code for the same function, bit complexity could be reduced to -log2(k/20^n) for 1 < k < n, but I'm just fiddling with the numbers. k needs to be really large, or n needs to be relatively small, for k to make any significant impact. However the value of k has implication to other types of functions, such as the random selection of a single sequence from a generous search space (this was mentioned earlier) to be used as a key for encryption. For a space of 10^150, or any computationally non-traversable quantity, practically any sequence chosen at random will serve the purpose, so will have the same function. Therefore k is very close to 10^150 (or equal to it). Such being the case, the limit of the function -log2(k/20^150) as k approaches 20^150 is 1. A random string, although expressing a function as a cryptographic key, in my reasoning, has a very low functional complexity, because the probability of finding the functional sequence by random search is certain, if "functional sequence" is defined as any random sequence of base^length. gpuccio, feel free to correct anything mistaken or unreasonable above.material.infantacy
January 23, 2012, 01:46 PM PDT
I think this intuitive understanding on "a lot" is fine as the basis of conjectures. It just isn't good for anything beyond conjecture. It says nothing about the history of an object. Most specifically, it says nothing about the history of coding sequences. It might suggest research, but it is not a substitute for research. It is particularly pernicious when it leads to the conclusion that a mysterious and unseen entity has done something that requires the very power of assembly that you are denying to known and observable processes. I find it almost comical that the putative designer has the power to assemble sequences that have more combinational possibilities than the number of particles in the universe. How is it that the designer has access to the list of functional sequences?Petrushka
January 23, 2012, 01:24 PM PDT
Onlookers: Just a footnote for now: subjective and objective are not opposites. To see this, consider what the sort of weak form knowledge we have in science and everyday life is: warranted, credibly true belief. Truth, or correspondence to what actually is here, is objective, and belief is subjective, where credibility is a weighting we exert. Warrant is the rationale that holds all together and gives us good reason to be confident that the belief is true. Now, let us apply that to a metric (system and standard of measurement, which implies scaled comparison with a conventional standard for a quantity; where scales may be ratio, interval, ordinal, and nominal) and the value delivered by it -- and some of this feels a lot like we have been over this ground before almost a year ago with the sock-puppet MG, and it was not listened to that time either. That is why I am simply speaking for record, I frankly do not expect the likes of ES to be interested in more than endlessly and hypnotically drumming out favourite talking points. I do have a faint hope that I could be shown wrong. Scales and standards are of course conventional, and yet may also be objective and warranted. They are deeply involved with subjects who set up conventions, standards, models, units, etc etc. And yet they can be quite objective, speaking to something that is credibly sufficiently accurate to reality that we can rely on it for serious work. Let's focus the metric model that is under attack: Chi_500 = Ip*S - 500, bits beyond the solar system threshold. The bit is a standard unit of information-carrying capacity, i.e the amount in a yes/no decision, a true/false, an on/off etc. String seven together and we can encode the standard symbols of English text. And of course this is what we use to measure capacity of memory etc. 500 bits is enough to have 3.27*10^150 possibilities, which defines a space of possibilities, W. If we have one more bit than that, we have twice the number of possibilities, and two more would give four times, and so on. Objective fact, easily shown mathematically. As an order of magnitude there are about 10^57 atoms in our solar system and there are about 10^45 Planck times per second. Run the solar system for the typical estimate of its age, and its atoms will have had 10^102 Planck time states [where some 98% of the mass of the system is locked up in the sun], where about 10^30 are required for the fastest chemical reactions. That is we are looking at about 1 in 10^48 of the possibilities for just 500 bits. If the atoms are made into the equivalent of monkeys typing at keyboards at random, they are just not going to be able to sample a very large fraction of the possibilities. So, a search across W, will be looking for a needle in a haystack, which is the point of the random text generation exercise shown earlier. Under such circumstances, you are going to sample straw with near certainty, equivalent to taking a one straw sized sample from a haystack 3 1/2 light days across. Even if a solar system lurked within, you would be maximally likely to pick straw. Next step, we take a system in some state E and measure the information used in it, Ip. If it is beyond 500 bits [as we routinely measure or as we may use some more sophisticated metrics deriving from Shannon's H, average info per symbol], it would certainly be complex, but are we looking at straw or needle here? That is where S comes in. 
There are many cases where E as we have observed -- objective again -- is not particularly constrained, e.g we toss 500 fair coins or the like. But sometimes, E is indeed special, coming from a highly constrained -- specific zone, T. This can be observed, e.g the first 72 ascii characters in this post are constrained by the requisites of a contextually responsive message in English. If they were not so constrained but were blindly picked by our proverbial monkeys typing, by overwhelming likelihood, we would be looking at gibberish:f39iegusvfjebg . . . If E were code for a functional program, that too would be very constraining, e.g. we have that rocket that veered off course for want of a good comma, and had to be self-destructed. Similarly, if E were something that implied information, like the specific way bits and pieces are put together o make a fishing reel, that too is very constraining and separately describable based on a function. Similarly the amino acid sequence to do a job as say an enzyme is typically quite constrained. All of this can be measured by comparing degree of function with the variation of E, and we can in effect map out a narrow zone or island of function T in W. None of this is unfamiliar to anyone who has had to say design and wire up a complicated circuit and get it to work, or who has had to do a machine language program and get the computer to work. There is a threshold of no function, to beginnings of function, and there is a range of changes in which function can vary, sometimes better sometimes not so better. This is a bit vague because we are speaking generically here; once we deal with particular cases, metrics for function, and ways to compare better or worse are going to come out of the technical practice. GP, a physician, is fond of enzyme examples, we can always compare how fast a candidate enzyme makes a reaction go, relative to how it goes undisturbed, and there is a whole statistics of analysis of variance across treatments and blocks that can be deployed. We know this, the objectors running around above -- once ES has confirmed that he has some knowledge of engineering praxis -- know this. So, the objections above are specious, and are known to be specious, or should be known to be specious. For in a great many cases of relevance, we can and routinely do identify functions that depend on complex and specific arrangements of components that are therefore information rich. So, we see that each term on the RHS of the equation is objectively justified. We even can justify on objective grounds, moving from S = 0, the default value, to S = 1. We then can evaluate Chi_500, and if it is positive, then that becomes significant, For that means the observed E's are from a zone T that would be implausible to be hit upon by blind chance plus mechanical necessity. Nor, can we try the idea of well, we can look at difference in degree of function, Gibberish has no relevant function, and 0 - 0 = 0. that is there is no proper hill-climbing signal. [Dawkins; weasel worked by smuggling in a known target and evaluating digital distance to target, rewarding closer in cases. More modern GAs and the like start within an island of function and in effect reward superior function, the issue is 0 - 0 = 0, getting to islands of function in the first instance, So we see the real question begging that has been going on all along. As has been repeatedly pointed out but ignored.) 
The analytical conclusion is that, with high confidence, if we see cases E from zones T in such large domains W, we have good reason to infer this is because they were not blindly arrived at, i.e. they reflect intelligent design. This is abundantly empirically confirmed, with billions of cases in point. Such as posts in this thread. The problem is that the same metric points to design in the living cell and in major body plans, which cuts clean across a dominant school of thought that is locked into the institutionally dominant evolutionary materialist worldview. In short, it is not politically correct. Which, ironically, is a subjective problem. GEM of TKIkairosfocus
January 23, 2012, 01:17 PM PDT
Facial recognition is just around the corner. A couple months ago I bought a painting at a garage sale. I thought it was a print of some good artist, but when I looked closely it was on stretched canvas and had paint texture. I looked up the artist and found he is a fairly successful landscape artist represented by galleries in major cities. I still think it might be a sophisticated reproduction. Prints of Thomas Kincaid are being sold for hundreds of dollars. So I started looking on the net for prints of my subject. I couldn't find any. So I took a picture of my painting and plugged it into Google image search. Within seconds, Google returned an image, a snapshot taken at a wedding reception at a hotel. In the background was my painting, my exact frame and all. The image of that painting was out of focus, because it wasn't what was being photographed, but it was unmistakably my painting. I have trouble understanding how the search is done, but I am convinced that recognition software is right here and now, not in the distant future. I've since done other searches on faces, and Google can find images of people. Even from small, fuzzy images.Petrushka
January 23, 2012, 01:10 PM PDT
eig: "but drives motor controllers from input sensors in a [Bullet Physics] 3D environment which is running on the CPUs." very nice "I have the code faith" I never doubted lol!junkdnaforlife
January 23, 2012, 12:27 PM PDT
And how do you quantify the target space and the search space?Elizabeth Liddle
January 23, 2012, 12:05 PM PDT
Elizabeth, Science is done via observations- so yes we would see a protein doing something and then inquire about that protein. If we observe a protein doing nothing we would also want to know why it is there (I would assume). So we see all of this stuff going on- functions being carried out-> function is part of the OBSERVATION. We observe function and meaning. We can measure how well it functions and how many different configurations can perform that function. We can measure what is the minimum to perform that function/ convey that meaning. And we can measure the information based on all of that.Joe
January 23, 2012, 11:55 AM PDT
Petrushka, The designer uses a GA to do both.Joe
January 23, 2012, 11:52 AM PDT
Elizabeth: Yes, giving function as binary is correct. But I don't compute the number of bits in the sequence. I compute the number of functional bits for the function, that is the ratio of the target space to the search space: that expresses at the same time the probability of getting the function (not the individual sequence) by a random search, and the constraints that the functional state imposes to the sequence. Are we OK on that?gpuccio
January 23, 2012, 11:12 AM PDT
You can measure some kinds of function, such as catalytic efficiency, but other functions are elusive. How do you measure the function of height, weight, length of tail feathers? these are known to have both utility (perhaps for attracting a mate) and tradeoffs. But the various versions of FSCI tend to revolve around sequence length, and there is no known way of determining why a certain sequence produces utility, and a neighboring sequence does not. Among other things, this means that there is no theory associating the length of a sequence with utility. It is not clear what the minimum lengths of a useful sequence might be or whether longer sequences are more useful than shorter ones. There is some reason for skepticism about length of codes, since it is not clear that the genomes of onions and amoebas are more functional than shorter genomes.Petrushka
January 23, 2012, 09:36 AM PDT
OK, that's fine. As you hadn't given a quantitative definition, I thought it might not be. So would I be right in saying then, that to calculate the dFCSI of a gene you first decide: is it functional? If it is, it scores Function=1, if it isn't it scores Function=0. Then you compute the number of bits in the sequence in some way, and multiply the answer by the value of Function. And that's dFCSI. Is that more or less it?Elizabeth Liddle
January 23, 2012, 09:23 AM PDT
So how do you measure the minimum code length or the minimum protein? That's part of my question about how a designer would work. Can you demonstrate with an example how you would determine, without using evolution, how to build a functional sequence?Petrushka
January 23, 2012, 09:22 AM PDT
eigenstate (and Elizabeth): I am really amazed. Either I have become stupid, or I cannot follow your reasonings :) Well, before going on with answers to the previous posts, let's try to clarify the main point here. dFSCI is quantitative. Let's take an example from Durston, again. Let's take betalactamase. The definition of the function is easy. I quote from Wikipedia: "Beta-lactamases are enzymes (EC 3.5.2.6) produced by some bacteria and are responsible for their resistance to beta-lactam antibiotics like penicillins, cephamycins, and carbapenems (ertapenem) (Cephalosporins are relatively resistant to beta-lactamase). These antibiotics have a common element in their molecular structure: a four-atom ring known as a beta-lactam. The lactamase enzyme breaks that ring open, deactivating the molecule's antibacterial properties." The length of the sequence, in Durston's table, is 239 AAs. That means that the ground state (the random state) for that length is 1033 bits. The functional complexity is given as 336 functional bits. That means that, according to Durston's method, which has compared here 1785 different sequences with the same function, the functional space is 697 bits. Therefore, the functional complexity is -log2 of 2^697 / 2^1033, that is 336 functional bits. That is quantitative, I would say. What is your problem? Which "inputs" are you discussing? Please, explain. The threshold problem is another point. A threshold is necessary to establish if we will infer design or not. The threshold must take into account the probabilistic resources of the real system one is studying. For a biological system, I have proposed 150 bits as a threshold, according to a computation of the maximal probabilistic resources of our planet in 5 billion years and considering a maximal prokaryote population. Again, the threshold is a methodological choice, and can be discussed. So: that is quantitative. If there is something that is not clear, or if you don't agree, please explain clearly why, and we will avoid discussing uselessly and wasting our reciprocal time.gpuccio
January 23, 2012, 09:14 AM PDT
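As a quick check of the betalactamase arithmetic quoted above, a minimal Python sketch using only the figures gpuccio cites (working in logs avoids the astronomically large intermediate numbers):

```python
from math import log2

length_aa       = 239                     # betalactamase length per Durston's table
ground_state    = length_aa * log2(20)    # ~1033 bits: 20 possible amino acids per site
functional_bits = 697                     # log2 of the size of the functional space

dFSCI = ground_state - functional_bits    # = -log2(2^697 / 2^1033)
print(round(ground_state), round(dFSCI))  # 1033 336
```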
I'm absolutely not saying that function is an illusion. I am saying that it needs to be defined pretty carefully if we are to measure it (or decide whether or not a thing is functional) objectively. I think this is perfectly possible.Elizabeth Liddle
January 23, 2012, 09:01 AM PDT
@eigenstate:
How will you know if/when you’ve measured the minimum?
It depends- when a roller coaster has a minimum height requirement they usually put up a sign that has the minimum at a point X inches from the ground. So a tape measure would come in handy when setting that up. If there is a minimum weight requirement you would use a scale. True those are mighty complex devices so you should leave all that to experts.Joe
January 23, 2012, 08:44 AM PDT
@Joe
We can measure what is the minimum to perform that function/ convey that meaning.
How will you know if/when you've measured the minimum?eigenstate
January 23, 2012, 08:38 AM PDT
eigenstate:
Well, that’s tough cookies for FCSI and dFDSCI as metrics then, huh?
Not at all. Ya see science is based on observations. We observe function and meaning. We can measure how well it functions and how many different configurations can perform that function. We can measure what is the minimum to perform that function/ convey that meaning. And we can measure the information based on all of that.Joe
January 23, 2012, 08:31 AM PDT
Do I understand this objection correctly? Very loosely stated, by measuring dfsci we call it "functional," which attributes functionality to it without adequately explaining on what basis we call it "functional." Did I get that right? If so, on the one hand it seems like a valid, logical argument. The determination that something is functional is subjective. How does one objectively measure what only exists subjectively? (If I've misunderstood then everything after this is rather pointless. Maybe it is anyway.) The trouble is that this reasoning runs counter to rational thought rather than complementing it. It holds up only if we choose to believe that the only difference between a supposedly functional enzyme and a random assortment of molecules is in our imagination. Or, expanding on it, that the difference between molecules organized to form a living frog and those that form a rock is purely subjective. We can choose to see function or choose not to. I've seen similar reasoning before. It's not inherently illogical, but the result is that we explain what is remarkable by reassessing it and deciding that it's unremarkable. We look for the origin of a function, and then exclude one method because it requires us to subjectively identify the function as such. Reasonably, is not the fact that we expend resources attempting to explain the cause of a given function sufficient cause to label it "function?" It seems logical, but it leads to absurdity. I can understand of one speaks of the illusion of design. But the illusion of function? Isn't that just scraping the bottom of the barrel for objections?ScottAndrews2
January 23, 2012, 08:23 AM PDT
@Joe, Well, that's tough cookies for FCSI and dFDSCI as metrics then, huh? I understand you can say "We don't need no steenkin metrics", and that's your prerogative as a position to take, but your observation above discredits those who DO use "function" as a vector in the metrics they offer as a scientific/mathematical means of detecting design.eigenstate
January 23, 2012, 08:22 AM PDT
@Elizabeth, It's best if he speaks for himself, but I can say from just our exchange that he DOES implicate an upper probability bound here as a threshold, which makes his metric numeric in nature -- you can't have non-numeric probabilities, as 'numeric' is implied in the term 'probability'. I think he's approaching this from a "heap problem" perspective. If someone asked me how many hairs a man would have to have on his head to NOT be "bald", I'd be hard pressed to come up with a precise number. Perhaps I could offer one, but I'd be challenged to defend why X, and not X-1, or X+1, etc. Even so, I would have no problem saying a man with no hair on his head was "bald", and a man with a thick shock of hair was "not bald". How do I make such distinctions if I can't state X? That's the heap problem, as you are likely aware, and it's not a practical problem in that case, "bald" being quasi-quantitative, or qualitative about numbers rather than a discrete numerical metric. I think gpuccio is suggesting that "functional complexity" is something like that, where he can't give you a precise measurement of the quantity, but can only judge there to be "a lot", or "not so much", intuitively or qualitatively. When there's "a lot", the idea is that 'a lot' cannot be achieved without intelligent design. That's my current state of reverse engineering his ideas on this. I'm still getting worthwhile explanations from his point of view as the posts progress, so I'm learning more, post by post.eigenstate
January 23, 2012, 08:19 AM PDT
