
ID Foundations, 11: Borel’s Infinite Monkeys analysis and the significance of the log reduced Chi metric, Chi_500 = I*S – 500



Emile Borel, 1932

Emile Borel (1871 – 1956) was a distinguished French mathematician who, as the son of a minister, came from France’s Protestant minority; he was a founder of measure theory in mathematics and a significant contributor to modern probability theory. Thus Knobloch observed of his approach that:

>>Borel published more than fifty papers between 1905 and 1950 on the calculus of probability. They were mainly motivated or influenced by Poincaré, Bertrand, Reichenbach, and Keynes. However, he took for the most part an opposed view because of his realistic attitude toward mathematics. He stressed the important and practical value of probability theory. He emphasized the applications to the different sociological, biological, physical, and mathematical sciences. He preferred to elucidate these applications instead of looking for an axiomatization of probability theory. Its essential peculiarities were for him unpredictability, indeterminism, and discontinuity. Nevertheless, he was interested in a clarification of the probability concept. [Emile Borel as a probabilist, in The probabilist revolution Vol 1 (Cambridge Mass., 1987), 215-233. Cited, Mac Tutor History of Mathematics Archive, Borel Biography.]>>

Among other things, he is credited as the worker who introduced a serious mathematical analysis of the so-called Infinite Monkeys theorem (on which, more in a moment).

So, it is unsurprising that Abel, in his recent universal plausibility metric paper, observed  that:

Emile Borel’s limit of cosmic probabilistic resources [c. 1913?] was only 10^50 [[23] (pg. 28-30)]. Borel based this probability bound in part on the product of the number of observable stars (10^9) times the number of possible human observations that could be made on those stars (10^20).

This bound has, of course, since expanded, thanks to the breakthroughs in astronomy occasioned by the Mt Wilson 100-inch telescope under Hubble in the 1920s. However, it does underscore how centrally important the issue of available resources is in rendering a given — logically and physically strictly possible but utterly improbable — potential chance-based event reasonably observable.

We may therefore now introduce Wikipedia as a hostile witness, testifying against known ideological interest, in its article on the Infinite Monkeys theorem:

In one of the forms in which probabilists now know this theorem, with its “dactylographic” [i.e., typewriting] monkeys (French: singes dactylographes; the French word singe covers both the monkeys and the apes), appeared in Émile Borel‘s 1913 article “Mécanique Statistique et Irréversibilité” (Statistical mechanics and irreversibility),[3] and in his book “Le Hasard” in 1914. His “monkeys” are not actual monkeys; rather, they are a metaphor for an imaginary way to produce a large, random sequence of letters. Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.

The physicist Arthur Eddington drew on Borel’s image further in The Nature of the Physical World (1928), writing:

If I let my fingers wander idly over the keys of a typewriter it might happen that my screed made an intelligible sentence. If an army of monkeys were strumming on typewriters they might write all the books in the British Museum. The chance of their doing so is decidedly more favourable than the chance of the molecules returning to one half of the vessel.[4]

These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys’ success is effectively impossible, and it may safely be said that such a process will never happen.

Let us emphasise that last part, as it is so easy to overlook in the heat of the ongoing debates over origins and the significance of the idea that we can infer to design on noticing certain empirical signs:

These images invite the reader to consider the incredible improbability of a large but finite number of monkeys working for a large but finite amount of time producing a significant work, and compare this with the even greater improbability of certain physical events. Any physical process that is even less likely than such monkeys’ success is effectively impossible, and it may safely be said that such a process will never happen.

Why is that?

Because of the nature of sampling from a large space of possible configurations. That is, we face a needle-in-the-haystack challenge.

For, there are only so many resources available in a realistic situation, and only so many observations can therefore be actualised in the time available. As a result, if one is confined to a blind probabilistic, random search process, s/he will soon enough run into the issue that:

a: IF there is a narrow and atypical set of possible outcomes, T, that

b: may be described by some definite specification Z (that does not boil down to listing the set T or the like), and

c: which comprises a set of possibilities E1, E2, . . . En, from

d: a much larger set of possible outcomes, W, THEN:

e: IF, further, we do see some Ei from T, THEN also

f: Ei is not plausibly a chance occurrence.

The reason for this is not hard to spot: when a sufficiently small, chance based, blind sample is taken from a set of possibilities, W (a configuration space), the likeliest outcome is that what is typical of the bulk of the possibilities will be chosen, not what is atypical. And, this is the foundation-stone of the statistical form of the second law of thermodynamics.

Hence, Borel’s remark as summarised by Wikipedia:

Borel said that if a million monkeys typed ten hours a day, it was extremely unlikely that their output would exactly equal all the books of the richest libraries of the world; and yet, in comparison, it was even more unlikely that the laws of statistical mechanics would ever be violated, even briefly.
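To put rough numbers on Borel’s image, here is a back-of-envelope sketch in Python; the monkey count, typing rate, alphabet size and target length are our own illustrative assumptions, not Borel’s figures:

```python
import math

# Illustrative assumptions, not Borel's own figures:
monkeys = 10**6                 # a million monkeys
keystrokes_per_sec = 1          # one keystroke per second each
seconds = 10 * 3600 * 365       # ten hours a day, for a year
alphabet = 30                   # crude typewriter: 26 letters plus a few keys
target_len = 100                # one specific 100-character passage

keystrokes = monkeys * keystrokes_per_sec * seconds       # ~1.3 * 10^13 tries
log10_p_hit = -target_len * math.log10(alphabet)          # log10 of P(one try hits) ~ -147.7
log10_expected = math.log10(keystrokes) + log10_p_hit     # ~ -134.6
print(f"expected hits ~ 10^{log10_expected:.0f}")         # effectively zero
```

Even with a million tireless typists and a century instead of a year, the expected number of hits stays vanishingly far below one.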

In recent months, here at UD, we have described this in terms of searching for a needle in a vast haystack [corrective u/d follows]:

let us work back from how it takes ~ 10^30 Planck time states for the fastest chemical reactions, and use this as a yardstick, i.e. in 10^17 s, our solar system’s 10^57 atoms would undergo ~ 10^87 “chemical time” states, about as fast as anything involving atoms could happen. That is 1 in 10^63 of 10^150. So, let’s do an illustrative haystack calculation:

Let us take a straw as weighing about a gram and having comparable density to water, so that a haystack weighing 10^63 g [= 10^57 tonnes] would take up as many cubic metres. The stack, assuming a cubical shape, would be 10^19 m across. Now, 1 light year = 9.46 * 10^15 m, or about 1/1,000 of that distance across. If we were to superpose such a notional haystack, 1,000 light years on a side, on the zone of space centred on the sun, and leave in all stars, planets, comets, rocks, etc, and take a random sample equal in size to one straw, by absolutely overwhelming odds, we would get straw, not star or planet etc. That is, such a sample would be overwhelmingly likely to reflect the bulk of the distribution, not special, isolated zones in it.
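The straw-and-haystack arithmetic just given is easy to check; a minimal sketch, assuming (as above) 1 g straws at roughly the density of water:

```python
# Quick check of the haystack arithmetic above (1 g straws, water density):
straws = 10**63                   # one straw per gram, 10^63 g total
volume_m3 = straws / 10**6        # 10^6 g of water per cubic metre -> 10^57 m^3
side_m = volume_m3 ** (1 / 3)     # cube side: 10^19 m
light_year_m = 9.46e15
print(f"side ~ {side_m:.2e} m ~ {side_m / light_year_m:,.0f} light years")
```

This prints a side of about 10^19 m, i.e. roughly 1,000 light years, matching the figures in the text.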

With this in mind, we may now look at the Dembski Chi metric, and reduce it to a simpler, more practically applicable form:

m: In 2005, Dembski provided a fairly complex formula that we can quote and simplify:

χ = – log2[10^120 · ϕS(T) · P(T|H)], where χ is “chi” and ϕ is “phi”

n:  To simplify and build a more “practical” mathematical model, we note that information theory researchers Shannon and Hartley showed us how to measure information by changing probability into a log measure that allows pieces of information to add up naturally: Ip = – log p, in bits if the base is 2. (That is where the now familiar unit, the bit, comes from.)
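A quick numerical illustration of the Hartley-Shannon log measure (the coin and ASCII examples here are ours):

```python
import math

def info_bits(p: float) -> float:
    """Ip = -log2(p): information, in bits, of an outcome of probability p."""
    return -math.log2(p)

print(info_bits(1 / 2))           # a fair coin toss: 1.0 bit
print(info_bits(1 / 128))         # one 7-bit ASCII character, all equally likely: 7.0 bits
print(info_bits(1 / 2 * 1 / 2))   # probabilities multiply, so bits add: 2.0 bits
```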

o: So, since 10^120 ~ 2^398, we may do some algebra as log(p*q*r) = log(p) + log(q ) + log(r) and log(1/p) = – log (p):

Chi = – log2(2^398 * D2 * p), in bits, where D2 = ϕS(T) and p = P(T|H)

Chi = Ip – (398 + K2), where log2(D2) = K2

p: But since 398 + K2 tends to at most 500 bits on the gamut of our solar system [our practical universe, for chemical interactions! (if you want, 1,000 bits would be a limit for the observable cosmos)] and

q: as we can define a dummy variable for specificity, S, where S = 1 or 0 according as the observed configuration, E, is on objective analysis specific to a narrow and independently describable zone of interest, T:

Chi_500 =  Ip*S – 500, in bits beyond a “complex enough” threshold

(If S = 0, Chi = – 500, and, if Ip is less than 500 bits, Chi will be negative even if S is positive. E.g.: A string of 501 coins tossed at random will have S = 0, but if the coins are arranged to spell out a message in English using the ASCII code [notice the independent specification of a narrow zone of possible configurations, T], Chi will — unsurprisingly — be positive.)
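In code, the reduced metric is a one-liner; a minimal sketch (the function and variable names are ours):

```python
def chi_500(ip_bits: float, s: int) -> float:
    """Chi_500 = Ip*S - 500: bits beyond the 500-bit solar-system threshold.

    ip_bits: information measure Ip of the observed configuration, in bits.
    s: dummy specificity variable, 1 if the configuration falls in an
       independently describable zone T, else 0.
    """
    return ip_bits * s - 500

print(chi_500(501, 0))   # 501 randomly tossed coins: -500, no design inference
print(chi_500(501, 1))   # 501 coins spelling out ASCII text: +1 bit beyond threshold
```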

r: So, we have some reason to suggest that if something, E, is based on specific information describable in a way that does not just quote E and requires at least 500 specific bits to store the specific information, then the most reasonable explanation for the cause of E is that it was intelligently designed. (For instance, no-one would dream of asserting seriously that the English text of this post is a matter of chance occurrence giving rise to a lucky configuration, a point that was well-understood by that Bible-thumping redneck fundy — NOT! — Cicero in 50 BC.)

s: The metric may be directly applied to biological cases:

t: Using Durston’s fits values — functionally specific bits — from his Table 1 to quantify I, and accepting functionality on specific sequences as showing specificity (giving S = 1), we may apply the simplified Chi_500 metric of bits beyond the threshold:

RecA: 242 AA, 832 fits, Chi: 332 bits beyond

SecY: 342 AA, 688 fits, Chi: 188 bits beyond

Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond
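Feeding Durston’s fits values through the chi_500 sketch above reproduces the figures just listed:

```python
# Durston's Table 1 fits values, with S = 1 (function is sequence-specific)
for name, fits in [("RecA", 832), ("SecY", 688), ("Corona S2", 1285)]:
    print(f"{name}: {chi_500(fits, 1):+.0f} bits beyond the threshold")
# RecA: +332, SecY: +188, Corona S2: +785
```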

u: And this points to the controversial conclusion that biological examples such as DNA — which in a living cell is much more complex than 500 bits — may be designed to carry out particular functions in the cell and the wider organism.

v: Therefore, we have at least one possible general empirical sign of intelligent design, namely: functionally specific, complex organisation and associated information [FSCO/I].

But, but, but . . . isn’t “natural selection” precisely NOT a chance based process? So doesn’t the ability to reproduce in environments, adapt to new niches and then dominate the population make nonsense of such a calculation?

NO.

Why is that?

Because of the actual claimed source of variation (which is often masked by the emphasis on “selection”) and the scope of innovations required to originate functionally effective body plans, as opposed to varying same — starting with the very first one, i.e. Origin of Life, OOL.

But that’s Hoyle’s fallacy!

Advice: when going up against a Nobel-equivalent prize-holder whose field requires expertise in mathematics and thermodynamics, one would be well advised to examine carefully the underpinnings of what is being said, not just the rhetorical flourish about tornadoes in junkyards in Seattle assembling 747 Jumbo Jets.

More specifically, the key concept of Darwinian evolution [we need not detain ourselves too much on debates over mutations as the way variations manifest themselves] is that:

CHANCE VARIATION (CV) + NATURAL “SELECTION” (NS) –> DESCENT WITH (UNLIMITED) MODIFICATION (DWM), i.e. “EVOLUTION.”

CV + NS –> DWM, aka Evolution

If we look at NS, this boils down to differential reproductive success in environments leading to elimination of the relatively unfit.

That is, NS is a culling-out process, a subtract-er of information, not the claimed source of information.

That leaves only CV, i.e. blind chance, manifested in various ways. (And of course, in anticipation of some of the usual side-tracks, we must note that the Darwinian view, as modified through the genetic mutations concept and population genetics to describe how population fractions shift, is the dominant view in the field.)

There are of course some empirical cases in point, but in all these cases, what is observed is fairly minor variations within a given body plan, not the relevant issue: the spontaneous emergence of such a complex, functionally specific and tightly integrated body plan, which must be viable from the zygote on up.

To cover that gap, we have a well-known metaphorical image — an analogy, the Darwinian Tree of Life. This boils down to implying that there is a vast contiguous continent of functionally possible variations of life forms, so that we may see a smooth incremental development across that vast fitness landscape, once we had an original life form capable of self-replication.

What is the evidence for that?

Actually, nil.

The fossil record, the only direct empirical evidence of the remote past, is notoriously that of sudden appearances of novel forms, stasis (with some variability within the form obviously), and disappearance and/or continuation into the modern world.

If by contrast the tree of life framework were the observed reality, we would see a fossil record DOMINATED by transitional forms, not the few strained examples that are so often triumphalistically presented in textbooks and museums.

Similarly, it is notorious that fairly minor variations in the embryological development process are easily fatal. No surprise: if we have a highly complex, deeply interwoven interactive system, chance disturbances are overwhelmingly going to be disruptive.

Likewise, complex, functionally specific hardware is not designed and developed by small, chance based functional increments to an existing simple form.

Hoyle’s challenge of overwhelming improbability does not begin with the assembly of a Jumbo jet by chance, it begins with the assembly of say an indicating instrument on its cockpit instrument panel.

The D’Arsonval galvanometer movement commonly used in indicating instruments: an adaptation of a motor that runs against a spiral spring (to give proportionality of deflection to input current across the magnetic field) and has an attached needle moving across a scale. Such an instrument, historically, was often adapted for measuring all sorts of quantities on a panel.

(Indeed, it would be utterly unlikely for a large box of mixed nuts and bolts, by chance shaking, to bring together a matching nut and bolt and screw them together tightly; the first step to assembling the instrument by chance.)

Further to this, it would be bad enough to try to get together the text strings for a Hello World program (let’s leave off the implementing machinery and software that make it work) by chance. To then incrementally create an operating system from it, each small step along the way being functional, would be a bizarrely, operationally impossible super-task.

So, the real challenge is that those who have put forth the tree of life, continent-of-function type approach have to show empirically that their step by step path up the slopes of Mt Improbable is observable, at least in reasonable model cases. And, they need to show that in effect chance variations on a Hello World will lead, within reasonable plausibility, to such a stepwise development that transforms the Hello World into something fundamentally different.

In short, we have excellent reason to infer that — absent empirical demonstration otherwise — complex, specifically functional, integrated organisation arises in clusters that are atypical of the general run of the vastly larger set of physically possible configurations of components. And, the strongest pointer that this is plainly so for life forms as well, is the detailed, complex, step by step information controlled nature of the processes in the cell that use information stored in DNA to make proteins. Let’s call Wiki as a hostile witness again, courtesy two key diagrams:

I: Overview:

The step-by-step process of protein synthesis, controlled by the digital (= discrete state) information stored in DNA

II: Focusing on the Ribosome in action for protein synthesis:

The Ribosome, assembling a protein step by step based on the instructions in the mRNA “control tape” (the AA chain is then folded and put to work)

Clay animation video [added Dec 4]:

More detailed animation [added Dec 4]:

This sort of elaborate, tightly controlled, instruction based step by step process is itself a strong sign that this sort of outcome is unlikely by chance variations.

(And, attempts to deny the obvious — that we are looking at digital information at work in algorithmic, step by step processes — are themselves a sign that there is a controlling a priori at work that must lock out the very evidence before our eyes to succeed. The above is not intended to persuade such; they are plainly not open to evidence, so we can only note how their position reduces to patent absurdity in the face of evidence and move on.)

But, isn’t the insertion of a dummy variable S into the Chi_500 metric little more than question-begging?

Again, NO.

Let us consider a simple form of the per-aspect explanatory filter approach:

The per aspect design inference explanatory filter

You will observe two key decision nodes, where the first default is that the aspect of the object, phenomenon or process being studied is rooted in a natural, lawlike regularity that under similar conditions will produce similar outcomes, i.e. there is a reliable law of nature at work, leading to low contingency of outcomes. A dropped heavy object near earth’s surface will reliably fall at initial acceleration g = 9.8 m/s^2. That lawlike behaviour with low contingency can be empirically investigated and would eliminate design as a reasonable explanation.

Second, we see some situations where there is a high degree of contingency of possible outcomes under initial circumstances. This is the more interesting case, and in our experience it has two candidate mechanisms: chance, or choice. The default for S under these circumstances is 0. That is, the presumption is that chance is an adequate explanation, unless there is a good — empirical and/or analytical — reason to think otherwise. In short, on investigation of the dynamics of volcanoes and our experience with them, rooted in direct observations, the complexity of a Mt Pinatubo is explained partly on natural laws and partly on chance variations; there is no need to infer to choice to explain its structure.

But, if the observed configurations of highly contingent elements were from a narrow and atypical zone T not credibly reachable based on the search resources available, then we would be objectively warranted to infer to choice. For instance, a chance-based text string of length equal to this post would overwhelmingly be gibberish, so we are entitled to note the functional specificity at work in the post, and assign S = 1 here.

So, the dummy variable S is not a matter of question-begging, never mind the usual dismissive talking points.

I is of course an information measure based on standard approaches, through the sort of probabilistic calculations Hartley and Shannon used, or by a direct observation of the state-structure of a system [e.g. on/off switches naturally encode one bit each].

And, where an entity is not a direct information-storing object, we may reduce it to a mesh of nodes and arcs, then investigate how much variation can be allowed while still retaining adequate function. That is, a key and lock can be reduced to a bit measure of implied information, and a sculpture like that at Mt Rushmore can similarly be analysed, given the specificity of portraiture.
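As a toy illustration of such a reduction, consider an idealised pin-tumbler lock; the pin count, cut depths, and tolerance below are made-up numbers, chosen only to show the bookkeeping:

```python
import math

pins = 6          # hypothetical lock: six pins
positions = 10    # ten distinguishable cut depths per pin
functional = 1    # only one depth per pin still opens the lock

configs = positions ** pins            # 10^6 possible key profiles
working = functional ** pins           # 1 functional profile
implied_bits = math.log2(configs / working)
print(f"implied information ~ {implied_bits:.1f} bits")   # ~19.9 bits
```

The tighter the tolerance per node, the smaller the functional zone and the larger the implied bit count.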

The 500 is a threshold, related to the limits of the search resources of our solar system, and if we want more, we can easily move up to the 1,000 bit threshold for our observed cosmos.

On needle in a haystack grounds, or monkeys strumming at the keyboards grounds, if we are dealing with functionally specific, complex information beyond these thresholds, the best explanation for seeing such is design.

And, that is abundantly verified by the contents of say the Library of Congress (26 million works) or the Internet, or the product across time of the Computer programming industry.

But, what about Genetic Algorithms etc, don’t they prove that such FSCI can come about by cumulative progress based on trial and error rewarded by success?

Not really.

As a rule, such are about generalised hill-climbing within islands of function characterised by intelligently designed fitness functions with well-behaved trends and controlled variation within equally intelligently designed search algorithms. They start within a target Zone T, by design, and proceed to adapt incrementally based on built in designed algorithms.
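A toy, Weasel-style GA makes the point concrete: the target, the fitness function, the variation operator, and even the starting population are all supplied by the programmer, so the run begins and remains inside a designed island of function. A minimal sketch, with all parameters arbitrary:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"   # the designed target zone T
CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s: str) -> int:
    # Designed, well-behaved fitness function: count letters matching the target
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s: str, rate: float = 0.05) -> str:
    # Designed, controlled variation operator
    return "".join(random.choice(CHARS) if random.random() < rate else c for c in s)

parent = "".join(random.choice(CHARS) for _ in TARGET)   # even the start is supplied
while fitness(parent) < len(TARGET):
    children = [mutate(parent) for _ in range(100)]
    parent = max(children + [parent], key=fitness)       # hill-climb to the built-in target
print(parent)
```

Every ingredient that makes the climb work — target, fitness gradient, mutation rate, selection rule — is intelligently chosen in advance.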

If such a GA were to emerge from a Hello World by incremental chance variations that worked as programs in their own right every step of the way, that would be a different story, but for excellent reason we can safely include GAs in the set of cases where FSCI comes about by choice, not chance.

So, we can see what the Chi_500 expression means, and how it is a reasonable and empirically supported tool for measuring complex specified information, especially where the specification is functionally based.

And, we can see the basis for what it is doing, and why one is justified to use it, despite many commonly encountered objections. END

________

F/N, Jan 22: In response to a renewed controversy tangential to another blog thread, I have redirected discussion here. As a point of reference for background information, I append a clip from the thread:

. . . [If you wish to find] basic background on info theory and similar background from serious sources, then go to the linked thread . . . And BTW, Shannon’s original 1948 paper is still a good early stop-off on this. I just did a web search and see it is surprisingly hard to get a good simple free online 101 on info theory for the non mathematically sophisticated; to my astonishment the section A of my always linked note clipped from above is by comparison a fairly useful first intro. I like this intro at the next level here, this is similar, this is nice and short while introducing notation, this is a short book in effect, this is a longer one, and I suggest the Marks lecture on evo informatics here as a useful contextualisation. Qualitative outline here. I note as well Perry Marshall’s related exchange here, to save going over long since adequately answered talking points, such as asserting that DNA in the context of genes is not coded information expressed in a string of 4-state per position G/C/A/T monomers. The one good thing is, I found the Jaynes 1957 paper online, now added to my vault, no cloud without a silver lining.

If you are genuinely puzzled on practical heuristics, I suggest a look at the geoglyphs example already linked. This genetic discussion may help on the basic ideas, but of course the issues Durston et al raised in 2007 are not delved into.

(I must note that an industry-full of complex praxis is going to be hard to reduce to an in-a-nutshell summary. However, we are quite familiar with information at work, and with how we routinely measure it, as in the familiar: “this Word file is 235 k bytes.” That such a file is exceedingly functionally specific can be seen by the experiment of opening one up in an inspection package that will access raw text symbols for the file. A lot of it will look like repetitive nonsense, but if you clip off such, sometimes just one header character, the file will be corrupted and will not open as a Word file. When we have a great many parts that must be right and in the right pattern for something to work in a given context like this, we are dealing with functionally specific, complex organisation and associated information, FSCO/I for short.)

The point of the main post above is that once we have this, and are past 500 bits or 1000 bits, it is not credible that such can arise by blind chance and mechanical necessity. But of course, intelligence routinely produces such, like comments in this thread. Objectors can answer all of this quite simply, by producing a case where such chance and necessity — without intelligent action by the back door — produces such FSCO/I. If they could do this, the heart would be cut out of design theory. But, year after year, thread after thread, here and elsewhere, this simple challenge is not being met. Borel, as discussed above, points out the basic reason why.

Comments
So, what is it [dFSCI] measuring? It is easy: the complexity in bits necessary to implement the defined function. Whatever that function is.
But how do you define a "function" in such a way that doesn't render your argument circular? A thing or a system may have many functions. Function is in the eye of the beholder. It's completely subjective. By assuming in your premise that the function you observe for an object or system is the "defined function" you are assuming the thing you are allegedly trying to test for.

lastyearon
January 24, 2012, 09:54 AM PDT
For all interested: I have just posted the last in my long series about modelling RV + NS. The thread is always the same: https://uncommondescent.com/intelligent-design/evolutionist-youre-misrepresenting-natural-selection/comment-page-2/#comment-413684 (posts 34 and following) Any comment is welcome.

gpuccio
January 24, 2012, 08:37 AM PDT
Petrushka: On another thread, I have given the link to the Wikipedia page about probabilistic distribution, 18000 characters long, as an example of text that certainly has more than 1000 bits of functional complexity, according to my demonstration, and therefore allows a safe design inference. How do you believe that text was written? By evolution?

gpuccio
January 24, 2012, 08:33 AM PDT
Petrushka,
I would suggest your analogy is not worth much. The ID argument is based on the claim that coding sequences are not connected, and are therefore not evolvable.
The context is the generation of functional information. So it's not an analogy. It's a straightforward example. The ID argument is not based on the claim that coding sequences are not connected. It is based on the totality of available evidence, which does not include the connectedness of coding sequences. That would be influenced by adding actual evidence, not by supplementing the lack of it with the presupposition that coding sequences are connected and evolvable.
What’s missing from ID is a demonstration that impossibly long sequences have a syntax that would allow generation by a finite designer, without using evolution.
If you're referring specifically to proteins, then your argument is inexplicable. "It's too complicated to have evolved" is routinely chastised as a simplistic, ignorant argument, even though no demonstration of any kind to the contrary is provided. (It doesn't get enough credit. Based on the evidence it's a rational argument, and discrediting it is what sends us down this rabbit hole.) But it's okay to argue that something is too complicated to have been designed? (Please don't say that it can be designed but only by evolution. A process that does not allow you to target what you wish to design cannot be called design.) And in this case the argument can only be refuted by a specific example? We'll just ignore that the capabilities of intelligent agents to formulate complex designs have increased throughout human history and are accelerating rather than slowing? Apparently this is only a simplistic argument from ignorance when applied against darwinian evolution, but suddenly becomes enlightened logic when applied in favor of it. If you place the two arguments side by side - too complicated for evolution vs. too complicated for design - it's easy to tell that the latter is a willful argument from ignorance and the expectation of ignorance. Behind every such double standard is an arbitrary preference masquerading as science and reason. It's like a judge who sentences people of one race to prison and gives the rest probation for the same offense. In each case he can argue that he's following some legal precedent, but it quickly becomes apparent that he just likes some people more or less than others. That is exactly the sort of double standard you are attempting to pass off.

ScottAndrews2
January 24, 2012, 08:23 AM PDT
I would suggest your analogy is not worth much. The ID argument is based on the claim that coding sequences are not connected, and are therefore not evolvable. There's quite a bit of evidence to the contrary, but in absolute terms it's still being investigated. What's missing from ID is a demonstration that impossibly long sequences have a syntax that would allow generation by a finite designer, without using evolution.

Petrushka
January 24, 2012, 07:09 AM PDT
Petrushka,
Design without some form of evolution would require omniscience.
I conceived of this post and typed it without any iterative process of trial and error or variation and selection. In fact, this is my very first draft. Am I omniscient? According to you I must be. I think what you mean to say is that it would require intelligence.

ScottAndrews2
January 24, 2012, 06:40 AM PDT
23.1.2.2.10 This is not fallacious. This is a purely scientific deduction in its own right. What you need is a demonstrable counter-example to falsify it. In the absence of observations suggesting otherwise, we say that empirically the best explanation of such and such things is design. What's wrong with that? You can say it is an argument of "the gaps" if you wish, but this argument is scientific routine. Note that explanation quality is judged in the sense of Occam and Bayes: we prefer parsimonious explanations while acknowledging that every new observation we make varies the weights of our initial hypotheses. Care should be taken in order to come up with a complete set of hypotheses. In other words, it is scientifically illegitimate to exclude design as a possible cause, not the other way around. So far, no observations at all are available that would suggest the plausibility of spontaneous emergence of cybernetic control.

Eugene S
January 24, 2012, 04:52 AM PDT
material_infantacy: I find your comments very reasonable and correct. I don't understand why champignon says they make no sense. Maybe he means they don't answer his concerns about necessity mechanisms (which is not the same as "not making sense"). Anyway, I have answered that point in my post 23.1.2.2.11. Thanks again for your contributions.

gpuccio
January 24, 2012, 04:05 AM PDT
Petrushka: The "simple linear equation" you speak of is the regression. The regression explains much of the variance (about 80% of it). That means that about 80% of the variance in dFSCI depends on sequence length. But there is a residual 20% variance that depends on some other thing: the most reasonable hypothesis is that it depends on the specific structure function relationship in that protein family. So, Durston's data are very useful, not only to support the ID theory, but also as a tool to investigate the structure function relationship for different protein functions, and in general the protein functional space.

gpuccio
January 24, 2012, 03:58 AM PDT
champignon (post 23.1.2.2.10): Let's complete and refine the statements: "1. When we look at known designed and undesigned objects, only the designed objects have high dFSCI. 2. Many biological objects have high dFSCI. 3. No known necessity mechanism can explain their origin, even if coupled to random variation. 4. Therefore, we infer design as the best explanation for them." Yes, that's the argument in a nutshell. It is an empirical inference by analogy. Now, would you please explain why "it's obviously a fallacious argument". I am curious to understand how I have missed such an obvious conclusion for years.

gpuccio
January 24, 2012, 03:54 AM PDT
champignon: If you go back to Dembski's explanatory filter, you will remember that two conditions are necessary to infer design: a) The observed object must exhibit CSI b) A necessity explanation must reasonably be ruled out My reasoning is the same. dFSCI must be exhibited to infer design. And any necessity explanation, if known, must be taken into account. I have said clearly, in response to you (my post 23.1.2.1.1): "dFSCI is used in my analysis only to evaluate the possibility (or empirical impossibility) that a certain functional result may have emerged in a random way. The necessity part of the neo darwinian algorithm, NS, is always present in my discussions, but it is evaluated separately. You can look, if you want, to my posts here: https://uncommondescent.com/intelligent-design/evolutionist-youre-misrepresenting-natural-selection/comment-page-2/#comment-413684 (posts 34 and following) I will add a last post about the relationship between positive NS and the probabilistic modelling, as soon as you guys leave me the time!" So, why do you state, in your post 23.1.2.2.1: "What? I thought dFSCI was supposed to be a reliable, no-false-positives indicator of design!" Yes, it is, provided we apply it only to random transitions, and analyze separately the known necessity mechanisms. So, what is the point? I cannot doubt your intelligence. Are you just distracted when you read my posts? I really invite you to go to the linked thread, if you have the patience to read the detailed material I have posted there. I hope to add soon the final post, which deals explicitly with positive NS and its modelling.

gpuccio
January 24, 2012, 03:49 AM PDT
Kairos, I think JDNA is asking a more fundamental question. I was hoping that you (GP, MI, etc) might entertain it.

Upright BiPed
January 24, 2012, 12:45 AM PDT
In fact, looking over some of gpuccio's prior comments, it appears that his argument boils down to this: 1. When we look at known designed and undesigned objects, only the designed objects have high dFSCI. 2. Some biological objects have high dFSCI. 3. Therefore, they are designed. It's obviously a fallacious argument, but that really appears to be what he is saying.

champignon
January 24, 2012, 12:25 AM PDT
If high dFSCI merely means "unlikely to have come about by pure chance", then it tells us nothing about the probability that the sequence in question evolved. Thus there is the danger of a false positive -- a sequence that has high dFSCI but is not designed. For his argument to be successful, gpuccio needs to show that sequences with high dFSCI cannot evolve. He hasn't done so yet, and dFSCI as it is defined could never do so, because it doesn't take the nature of evolution into account. It is formulated based on the assumption of blind search, which is not at all how evolution works.

champignon
January 24, 2012, 12:17 AM PDT
Champignon, I thought it was your comment that didn't make sense. gpuccio says,
dFSCI is used in my analysis only to evaluate the possibility (or empirical impossibility) that a certain functional result may have emerged in a random way.
That is to say, he can evaluate the possibility that a specific functional sequence could have come about randomly. Your reply:
What? I thought dFSCI was supposed to be a reliable, no-false-positives indicator of design!
Which does not follow, because the "possibility of random emergence" precludes a false positive. It can only be a false negative or a true negative -- these comprise a set that is the proper complement of a positive result. P(F') = 1-P(F). There are no false positives in F, and there can be none in F' by definition. If I misunderstood your remark, feel free to clarify. I'll confess to not understanding where you're getting the notion of false positives.

material.infantacy
January 23, 2012, 11:54 PM PDT
I'm waiting for an example of how a designer would get to an island of function, assuming there's no incremental path. Assuming the designer isn't God. I think it's pretty easy to see why Behe thinks the designer is God. Design without some form of evolution would require omniscience.

Petrushka
January 23, 2012, 11:19 PM PDT
I was just observing that the data plot supplied by gpuccio could be duplicated by a simple linear equation.

Petrushka
January 23, 2012, 11:09 PM PDT
GP: Of course, the result is comparable to: Chi_xxx = Ip*S - xxx, where xxx is a complexity threshold. KF

kairosfocus
January 23, 2012, 11:08 PM PDT
material infantacy, Your comment doesn't make sense. Have you been following the discussion closely?

champignon
January 23, 2012, 11:05 PM PDT
F/N: Please re-read the original post, e.g. it addresses whether S is question-begging etc, on the way the explanatory filter works. KF

kairosfocus
January 23, 2012, 10:57 PM PDT
P: Nope, the issue is not just length of sequence in digital units (thus exponentially growing scope of the space of possibilities W) but also being confined to a narrow, specific zone that is specifically constrained for the relevant function to be present. And, on "how evo works," the problem with evo as CV + NS --> DWM is that it starts at the point where we already have a function, so that we have differential reproductive success of populations. That is, it starts WITHIN an island of function. The design theory challenge is to get TO the shores of such islands of function, de novo. As has been pointed out over and over and over and over again. Adaptation and optimisation within an island of function is not the problem; the problem is to get to where you can begin that, without intelligent direction to put together a first working model of a functional, composite, complex entity. To make this concrete, try explaining, for the initial cell, how we end up with a von Neumann self replicator, one that uses stored code and is joined to a metabolic automaton, all using C-chemistry informational macromolecules. Then, explain how we get something like the avian lung. All, on observationally based evidence that shows what happened in the real world. There are of course billions of cases in point on how FSCO/I, and especially dFSCI, have originated by intelligent action. GEM of TKI

kairosfocus
January 23, 2012, 10:40 PM PDT
JDFL: Actually, sadly, after much back and forth, I am pretty well sure the "confusion" is intentional, on the part of those who have stirred it. At the very least by willful refusal to do duties of care before commenting adversely. If you compare the explanatory filter, per aspect version, you will easily enough see what is being got at. The DEFAULT ASSUMPTION is that something can be accounted for on chance and/or necessity, i.e. is a natural outcome, not an artificial one, and the Grand Canyon is easily explained on this. When this objection talking point was first circulated, the example was Mt Pinatubo, and the answer was the very same: S = 0, default, and there is no warrant to move it to 1. A specific warrant -- i.e. case by case -- has to be provided for switching S to 1. Just as, in court, the default in anglophone jurisprudence on a criminal law case is not guilty, and a specific warrant beyond reasonable doubt has to be provided to shift that to guilty. That warrant cannot be generically given in advance, but is to be determined on the circumstances to a level that a pool of typical, reasonable and unbiased individuals of ordinary common sense will conclude that this is so beyond reasonable doubt. Just so, there is a jury mechanism in scientific work, one that works pretty well when there is not an ideological bias involved: peer review by members of the circle of the specifically knowledgeable. So, it is not as though there is no framework in which the credibility of assigning S = 1 can be had. And, again, the promoters of objecting talking points KNOW this, or SHOULD know this. That is why I have now lost patience with them, and have concluded that something is rotten in the state of Denmark. In particular, we have the direct test that has been put on the table for the past 5 - 6 years here at UD: drop a controlled noise bomb into the informational item, i.e. a moderate and adjustable amount of random variation. If such modest amounts cause function to vanish, we have defined how wide the zone of functional states is. Take an ASCII text string. Underneath the bonnet, this is a string of 1's and 0's. So inject a bit of random walk on it, at a given ratio, say 1%. That is, on a random basis 1 in every 100 bits is flipped. Restore to ASCII, and see what happens [we for the moment, for simplicity, ignore the effects of the parity check bits]. Repeat. The typical English word is about 6 - 7 letters, and random changes that affect words can be seen for effect. We probably can still make out a text string with errors in 1 letter in 7, or about 1 bit in 49. Going up beyond that progressively produces gibberish. Especially if we have a random walk where we have repeated exposure. The reason we can do that is that we are actually very sophisticated information processors, and can exploit all sorts of redundancies and background knowledge. Now, let us have similar text that is source code for an application. That will be a LOT more sensitive to such random changes, as a rule, i.e. computers will do what you tell them to do, not what you intended to tell them to do. Switch to the object code for a program, which is a string of 1's and 0's. That will as a rule be very vulnerable to noise bombing. Now, go to the nodes-arcs type structural model. Noise bomb it -- this is a random walk. There will be some tolerance, depending on where you hit, but usually not much, especially at wiring-together level.
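A minimal sketch of that noise-bombing test in Python (the flip rates and sample text here are arbitrary illustrations):

```python
import random

def noise_bomb(text: str, flip_rate: float) -> str:
    """Flip each bit of the 7-bit ASCII encoding with probability flip_rate."""
    out = []
    for ch in text:
        code = ord(ch)
        for i in range(7):                    # 7-bit ASCII; parity bits ignored, as above
            if random.random() < flip_rate:
                code ^= 1 << i
        out.append(chr(code & 0x7F))
    return "".join(out)

MSG = "FUNCTIONALLY SPECIFIC COMPLEX ORGANISATION AND ASSOCIATED INFORMATION"
for rate in (0.01, 0.05, 0.20):               # 1 in 100 bits, then heavier bombing
    print(rate, repr(noise_bomb(MSG, rate)))  # readability degrades as the rate rises
```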
I once asked guest contributor EP about the effect of doing that to the control setup for a robotics workcell. He was aghast at the likely outcome, for good reason; especially given the power levels and degree of careful co-ordination at work. Similarly, I think you know that for many vehicles, a random change to the timing belt will trigger serious engine damage, in some cases writing the engine off. And so forth. Pretty soon, we have in hand a pool of credible cases of such complex functional specificity, and we have in hand enough to see that certain types of phenomena we observe are credibly FSCI. The text of posts on the Internet or in books etc are the first example we have cited over and over. The code for programs and similar prescriptive "wiring diagram" constrained functional information is a second. This implies the third case, functionally coordinated composite objects that have to be put together in fairly specific ways to work, e.g. a computer or cell phone motherboard. The fault tolerance in such things is very low. An electric motor is a fourth, as is something like a car engine or other irreducibly complex functional object. (Irreducible complexity of core function is a commonplace in our world, just think about how specific and necessary car parts are for something like an engine.) Most relevantly, the von Neumann kinematic self replicator is like that, and such is in fact the heart of how a living cell self-replicates. What the objectors will stoutly resist is acknowledging that DNA code and folding, functional proteins and protein machines are like that too. That is why they hate the term islands of function, even though these are abundantly obvious. But, that is exactly what Durston et al documented: across the domain of life, it is common for proteins to be quite constrained, so that the sequence for certain proteins will not vary all that much, hence the high functional bit information content they reported in the peer reviewed literature. Similarly, Behe's observation that the observed form of evolution typically pivots on slightly varying existing functional forms, and often on breaking something that, on breaking, will confer some advantage in a particularly stressful environment, is another example. As a Caribbean person, malaria and sickle cell anaemia come to mind as a sad classic in point; I have lost at least one treasured friend to that, while we were in college together. Similarly, observe the way that there has been no good answer from that side to the origin of the bird lung with its one-way flow. The origin of flight of birds and bats is a similar case. And there are others. Consistently, we have just so stories, without detailed, specific observationally anchored warrant for the claim that such systems originated by chance variation and natural selection in real world environments. In short, my conclusion is, the objection is fallacious. But, if fallacies were not persuasive to some and confusing to others, they would not survive. That is why I favour something more direct, like the case of random text generation, which is directly amenable to empirical observation, and we can document the result on infinite monkeys tests through a source known to be speaking against ideological interest on the point.
Spaces of 10^50 possibilities have been successfully searched [24-character length functional strings], but those of 10^150 possibilities [72-character length] are a very different story, at 128 times the number of possibilities per additional letter. So, next time the objectors come by, just stand your ground and ask them: can you kindly show us a case where something that is functionally specific (as can be shown directly or indirectly) can be shown by observation to have come from blind chance and mechanical necessity without intelligent intervention? (Clue: if there were such credible cases, they would be all over the internet, and the design theory movement would have long since collapsed. Clue 2: when you keep seeing claimed cases that are fallacious -- the latest was the claimed origin of a functional clock that turned out to be typical of GA's, i.e. it started well within an island of function, and moved to implicit targets, all under direction of a designer, who in this case showed off his IDE with code (not realising what he was telling us) -- that too is telling us something.) I trust this helps. GEM of TKI

kairosfocus
January 23, 2012, 10:18 PM PDT
dgw: Dawkins, if I remember correctly, searched for random letters and when the correct letter was found in the correct position, it was retained. I don't think that is correct. As I understand it, the Weasel algorithm generates iterations of populations of individual sentences, each consisting of random letter variants along the string representing "methinks it is like a weasel". Every individual of each population is then evaluated against the target string. Those letters that are closer to the target letter (in the ASCII sequence) at each position score higher, and the individual with the highest score goes on to become the replicator for the next generation. However, an exact letter match does not guarantee success. An individual with several close letter matches can "out-compete" an individual with a single perfect match. I did go through the exercise of coding this and you can actually see that the best-fit sentence will sometimes drift away from the target sentence throughout the sequence of populations.

NormO
January 23, 2012, 06:38 PM PDT
@gpuccio#12,
The functional complexity is given as 336 functional bits. That means that, according to Durston's method, which has compared here 1785 different sequences with the same function, the functional space is 697 bits. Therefore, the functional complexity is -log2 of 2^697 / 2^1033, that is 336 functional bits. That is quantitative, I would say. What is your problem? Of which "inputs" are you discussing? Please, explain.
The 336 bits is perfectly quantitative, and uncontroversially so. It's just irrelevant to function. I think from reading down a little bit, and your invoking Durston, I understand the language mangling that may be going on here. The reason (OK, one reason of several) 336 bits is perfectly quantitative, but perfectly irrelevant to anything you'd use dFSCI for, is that it's just a probability sampling, that's all. It's T target configurations out of a phase space P. That's all well and good as basic division goes, but it's got naught to do with describing, measuring, or capturing function. This is the whole reason I presented a random string (remember the 32 char password example I gave) as maximally "functionally complex" on a byte for byte basis. You granted the "functional" designation, and my 32 char randomly generated string is, by definition, maximally complex. Bingo, maximum dFSCI just by generating a random string, calling it "functional", and understanding information theory. But without explicitly rejecting this example (the function was... 'overly wide'), you resisted that example because the functional aspect didn't contribute to "functional complexity". Here, now, it's plain that that is just an arbitrary dismissal on your part. The "functional", the "F" in your dFSCI, means NOTHING, perfectly nothing mathematically in your formula, it is now clear. dFSCI is nothing more than a base-2 logarithm of a probability (T/P). There's nothing even tangentially related to function in dFSCI. To test this, let's accept that your "ring-breaking enzyme" (RBE) factory AA is 336 bits. So: dFSCI-RBE == 336. Now, let's switch the sequence out with a new AA sequence of the same length, which we randomly select from somewhere, anywhere, we care not where, and understand that it is provisioned with NO KNOWN FUNCTION (NKF). Now we do the same math, and voilà! we get precisely the same result: dFSCI-NKF == 336. The ONLY DIFFERENCE is that I switched out the data bits you are operating on. But as you have it, THE ACTUAL BITS DON'T MATTER. You don't even reference any of the actual bits in the sequence when you calculate dFSCI. That means that dFSCI is mathematically COMPLETELY independent of any function that is related to the content you are evaluating. You don't care what the amino acid sequence actually is for RBE -- it doesn't matter. You could reverse every other codon, completely changing its production characteristics (function), and your metric would neither know nor care, mathematically. Elizabeth triggered the light bulb moment with this comment from her, above -- she's smarter than I am in reverse engineering dFSCI:
So would I be right in saying then, that to calculate the dFCSI of a gene you first decide: is it functional? If it is, it scores Function=1, if it isn’t it scores Function=0.
That's frankly outrageous -- dFSCI hardly even rises to the level of 'prank' if this is the essence of dFSCI. I feel like asking for all the time back I wasted in trying to figure your posts out with the expectation that there was an earnest attempt to at least FAIL at or FAKE some engagement with the functional data, AS functional data, rather than simply slapping the arbitrary label "functional" on some string and doing simple division of pulled-out-of-the-ether phase space probabilities. You got me there, gpuccio, I admit. This isn't even a toy, and you had me thinking there was something, something at least wrong or confused there, that took some analysis of data and algorithm as 'functional'. I should have known from the way you responded to the 32 char random password. I'm a chump. Fool me once, I guess. You are right, your calculation is perfectly quantitative, but it's also perfectly vacuous in terms of 'function'. I was looking, over and over, for how you might be working in the functional part, but I just missed what you said, admitting it openly as it turned out -- you don't even consider function mathematically. It's just a probability score you assign to things you (and I) deem 'functional', but on grounds TOTALLY UNRELATED to the actual data set you are purportedly analyzing. To sum up, if I say that the AA I offered above ("NKF") is now something I declare to be functional, dFSCI-NKF now instantly really IS 336 bits, and that AA sequence is as dFSCI-rich as dFSCI-RBE. Everything turns on just my waving my hand and saying "this is functional data". The data itself doesn't even enter into the analysis! Which does not mean that I disagree that this AA sequence or that may be functional. I'm sure there are plenty of places we agree. But it's an absolutely debilitating flaw -- a joke of a construct -- to say this is the basis of your metric. If we disagree on what is functional, then what? Moreover, ANY string can be declared functional, at any time, and functions can be marshaled up for any string on demand (that was the point of my random 32 char password example, which was right on the money, after all, I see now). Given that, dFSCI cannot go anywhere at all, not possibly. It's less than useless as a tool for knowledge building and investigation.

eigenstate
January 23, 2012, 06:02 PM PDT
dFSCI is used in my analysis only to evaluate the possibility (or empirical impossibility) that a certain functional result may have emerged in a random way.
What? I thought dFSCI was supposed to be a reliable, no-false-positives indicator of design!
The set F is comprised of all functional sequences of sufficient length (no false positives here). The set F' is the complement of F in the sequence space S (false negatives may be here with the true negatives). F ∪ F' = S, and F ∩ F' = {}. F' comprises all sequences that may have come about in a random way. The false negatives are in this set, hence elements in the set "may have come about in a random way." IOW, P(F') = 1 - P(F). If a sequence "may have emerged in a random way" then it is "not definitively designed."

material.infantacy
January 23, 2012, 05:21 PM PDT
Wouldn't it be simpler to say dFSCI = sequence length times 2.5?

Petrushka
January 23, 2012, 05:06 PM PDT
dFSCI is used in my analysis only to evaluate the possibility (or empirical impossibility) that a certain functional result may have emerged in a random way.
What? I thought dFSCI was supposed to be a reliable, no-false-positives indicator of design!

champignon
January 23, 2012, 04:57 PM PDT
champignon: 1) My idea is indeed that the method probably overestimates the target space. But you are right, we cannot be absolutely sure of the precision. But we do not need precision here, just a reasonable approximation of the order of magnitude. I am not saying that the Durston method is our final procedure: it is, at present, the only simple procedure we have; it is credible, reasonable, and it certainly measures what it says it measures. As always happens in empirical science, its precision has to be verified by independent methods. As I have suggested, that can and will be done by means of deeper understanding of the sequence function relationship in proteins and of the topology of protein functional space. You see, the general attitude of darwinists regarding the problem of functional complexity is to deny it exists, and to denigrate all the serious attempts at solving it scientifically. But that is not a scientific attitude at all. The problem exists, is very important, and indeed it is crucial to the neo darwinian theory. And Durston has greatly contributed to the solution. 2) Good points, but out of context. You have obviously not followed my general reasoning. dFSCI is used in my analysis only to evaluate the possibility (or empirical impossibility) that a certain functional result may have emerged in a random way. The necessity part of the neo darwinian algorithm, NS, is always present in my discussions, but it is evaluated separately. You can look, if you want, to my posts here: https://uncommondescent.com/intelligent-design/evolutionist-youre-misrepresenting-natural-selection/comment-page-2/#comment-413684 (posts 34 and following) I will add a last post about the relationship between positive NS and the probabilistic modelling, as soon as you guys leave me the time! :) So, in no way am I assuming that "evolution works by blind search". I understand that the neo darwinian algorithm works by RV + NS, and I deal with both aspects. But dFSCI is useful only to evaluate the RV part. And it is used by me only for that. Finally, I am not assuming that evolution works for a specific function defined in advance. My reasoning is very different: a) We know that specific functional information emerged at definite times in natural history (in particular, new basic protein domains). b) We can compute dFSCI for each new domain, expressing the probability that that specific function may have emerged in a random way. c) That is not the end of the story. Darwinists will object (and they do!) that other useful functions could have emerged. That is correct. But I have a couple of arguments about that: c1) First of all, the only functions that are interesting for our discussion are new, naturally selectable protein domains. That's what we are trying to explain: the emergence of new protein domains. c2) In a specific biological environment, the existing complexity creates huge constraints on what protein domains are "naturally selectable", that is, can by themselves give a reproductive advantage, and therefore be fixed and expand. IC adds to those constraints. The need of an appropriate regulatory integration adds further difficulties. c3) Therefore, it can be argued (and I definitely argue) that in each specific scenario, only a few new protein domains and biochemical functions would be naturally selectable. c4) However, whatever their number, the size of their respective target spaces must only be summed to the size of all other selectable possible new domains.
Therefore, unless the number of possible, selectable new domains is really huge (and there is absolutely no evidence of that), the "help" deriving from considering all possible selectable new domains, instead of one, will be really small. Please consider that in the proteome we have "only" 2000 protein superfamilies. Now, let's say that in a specific moment of natural history, in a specific species, a new protein domain appears. Let's say its functional complexity is 500 bits, because the search space is 700 bits and the target space is 200 bits. Now, let's hypothesize, being really too generous, that in that scenario 1000 other new basic domains, all of them naturally selectable in that context, and with similar functional complexity, could have emerged. Then the whole probability of the target space will be 2^200 * 1000, that is about 210 bits. The functional complexity, evaluated for the whole functional space of all 1000 possible domains, will still be about 490 bits. So, as you can see, only the existence of huge numbers of possible new functional domains would be of help. But that assumption is against all we know, and in particular: a) The rarity of folding and functional sequences in the search space b) The fact that only 2000 protein superfamilies have been found in billions of years of evolution c) The fact that the emergence of new functional domains has become ever more rare with the advancement of evolution, and in more recent times. These are all good empirical indications that basic protein domains are isolated islands in the search space, and that their number is not very big.

gpuccio
January 23, 2012, 04:08 PM PDT
By the way, following Peter's advice, I have uploaded my scatterplot of Durston's data about sequence length and functional complexity at imageshack. Here is the link: http://img17.imageshack.us/img17/5649/durston.jpg

gpuccio
January 23, 2012, 03:28 PM PDT
The Durston method, instead, is a simple and powerful method to approximate the target space of specific protein families, and it is based on the comparison of a great number of different sequences in the proteome that implement the same function in different species, and a brilliant application to that of the principle of Shannon's uncertainty.
gpuccio, A couple of comments: 1. If you use Durston's method as a basis for your dFSCI calculation and you want to guarantee no false positives, you have to show among other things that Durston's method doesn't underestimate the size of the target space. How can you demonstrate this? 2. By defining dFSCI in terms of the ratio of the sizes of the target space and the search space, you are in effect assuming that evolution 1) works by blind search 2) for a specific function defined in advance. Neither is true.

champignon
January 23, 2012, 03:23 PM PDT
