
On FSCO/I vs. Needles and Haystacks (as well as elephants in rooms)


Sometimes, the very dismissiveness of hyperskeptical objections is their undoing, as in this case from TSZ:

Pesky EleP(T|H)ant

Over at Uncommon Descent KirosFocus repeats the same old bignum arguments as always. He seems to enjoy the ‘needle in a haystack’ metaphor, but I’d like to counter by asking how does he know he’s not searching for a needle in a needle stack? . . .

What had happened, is that on June 24th, I had posted a discussion here at UD on what Functionally Specific Complex Organisation and associated Information (FSCO/I) is about, including this summary infographic:

[Infographic: the FSCO/I definition]

Instead of addressing what this actually does, RTH of TSZ sought to strawmannise and rhetorically dismiss it by an allusion to the 2005 Dembski expression for Complex Specified Information, CSI:

χ = – log2[10^120 · ϕS(T) · P(T|H)].

–> χ is “chi” and ϕ is “phi” (CSI exists if χ > ~1)

. . . failing to understand — as did the sock-puppet Mathgrrrl [not to be confused with the Calculus prof who uses that improperly appropriated handle] — that by simply moving forward to the extraction of the information and threshold terms involved, this expression reduces as follows:

To simplify and build a more “practical” mathematical model, we note that information theory researchers Shannon and Hartley showed us how to measure information by changing probability into a log measure that allows pieces of information to add up naturally:

Ip = – log p (in bits if the log base is 2); that is where the now familiar unit, the bit, comes from. We may observe this — as just one of many examples of a standard result — from Principles of Communication Systems, 2nd edn, Taub and Schilling (McGraw-Hill, 1986), p. 512, Sect. 13.2:

Let us consider a communication system in which the allowable messages are m1, m2, . . ., with probabilities of occurrence p1, p2, . . . . Of course p1 + p2 + . . . = 1. Let the transmitter select message mk of probability pk; let us further assume that the receiver has correctly identified the message [–> My nb: i.e. the a posteriori probability in my online discussion here is 1]. Then we shall say, by way of definition of the term information, that the system has communicated an amount of information Ik given by

I_k = log2(1/p_k), by definition   (13.2-1)
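
To make the log measure concrete, here is a minimal sketch (the coin and ASCII examples are illustrative choices, not drawn from Taub and Schilling):

```python
import math

def info_bits(p):
    """Shannon/Hartley information of an outcome of probability p, in bits."""
    return -math.log2(p)

# A fair coin toss (p = 1/2) carries 1 bit.
print(info_bits(1/2))               # 1.0

# One character drawn uniformly from a 128-symbol ASCII alphabet carries 7 bits.
print(info_bits(1/128))             # 7.0

# Additivity: independent events multiply probabilities, so their bits add.
print(info_bits((1/2) * (1/128)))   # 8.0 = 1 + 7
```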

xxi: So, since 10^120 ~ 2^398, we may “boil down” the Dembski metric using some algebra — i.e. substituting and simplifying the three terms in order — using log(p·q·r) = log(p) + log(q) + log(r) and log(1/p) = – log(p):

Chi = – log2(2^398 · D2 · p), in bits, where D2 = ϕS(T)
Chi = Ip – (398 + K2), where Ip = – log2(p) and K2 = log2(D2)
That is, chi is a metric of bits from a zone of interest, beyond a threshold of “sufficient complexity to not plausibly be the result of chance,”  (398 + K2).  So,
(a) since (398 + K2) tends to at most 500 bits on the gamut of our solar system [our practical universe, for chemical interactions! ( . . . if you want, 1,000 bits would be a limit for the observable cosmos)] and
(b) as we can define and introduce a dummy variable for specificity, S, where
(c) S = 1 or 0 according as the observed configuration, E, is on objective analysis specific to a narrow and independently describable zone of interest, T:

Chi =  Ip*S – 500, in bits beyond a “complex enough” threshold

  • NB: If S = 0, this locks us at Chi = – 500; and, if Ip is less than 500 bits, Chi will be negative even if S is positive.
  • E.g.: a string of 501 coins tossed at random will have S = 0, but if the coins are arranged to spell out a message in English using the ASCII code [notice the independent specification of a narrow zone of possible configurations, T], Chi will — unsurprisingly — be positive. (This case is worked through in the sketch following this list.)

[Figure: the design inference explanatory filter]

  • S goes to 1 when we have objective grounds — to be explained case by case — to assign that value.
  • That is, we need to justify why we think the observed cases E come from a narrow zone of interest, T, that is independently describable, not just a list of members E1, E2, E3 . . . ; in short, we must have a reasonable criterion that allows us to build or recognise cases Ei from T, without resorting to an arbitrary list.
  • A string at random is a list with one member, but if we pick it as a password, it is now a zone with one member.  (Where also, a lottery, is a sort of inverse password game where we pay for the privilege; and where the complexity has to be carefully managed to make it winnable. )
  • An obvious example of such a zone T, is code symbol strings of a given length that work in a programme or communicate meaningful statements in a language based on its grammar, vocabulary etc. This paragraph is a case in point, which can be contrasted with typical random strings ( . . . 68gsdesnmyw . . . ) or repetitive ones ( . . . ftftftft . . . ); where we can also see by this case how such a case can enfold random and repetitive sub-strings.
  • Arguably — and of course this is hotly disputed — DNA, protein and regulatory codes are another. Design theorists argue that the only observed adequate cause for such is a process of intelligently directed configuration, i.e. of design, so we are justified in taking such a case as a reliable sign of such a cause having been at work. (Thus, the sign then counts as evidence pointing to a perhaps otherwise unknown designer having been at work.)
  • So also, to overthrow the design inference, a valid counter example would be needed, a case where blind mechanical necessity and/or blind chance produces such functionally specific, complex information. (Points xiv – xvi above outline why that will be hard indeed to come up with. There are literally billions of cases where FSCI is observed to come from design.)
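
As promised above, a minimal sketch of the Chi_500 metric with the dummy variable S, applied to the 501-coin illustration (the numbers follow the worked example in the text; treating the ASCII reading of the string as the independent specification is the assumption being illustrated):

```python
def chi_500(ip_bits, s):
    """Chi = Ip*S - 500: bits beyond the 500-bit solar-system threshold.

    ip_bits -- information measure Ip of the observed configuration, in bits
    s       -- 1 if the configuration falls in an independently specified,
               narrow zone of interest T; otherwise 0
    """
    return ip_bits * s - 500

# 501 coins tossed at random: Ip = 501 bits, but no independent
# specification, so S = 0 and Chi is locked at -500.
print(chi_500(501, 0))   # -500

# The same 501 bits arranged to spell an English message in ASCII:
# S = 1, so Chi = 501 - 500 = 1 bit beyond the threshold.
print(chi_500(501, 1))   # 1

# A specified but short configuration (say 200 bits): S = 1, yet Chi stays negative.
print(chi_500(200, 1))   # -300
```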

xxii: So, we have some reason to suggest that if something, E, is based on specific information describable in a way that does not just quote E, and requires at least 500 specific bits to store that information, then the most reasonable explanation for the cause of E is that it was designed. The metric may be directly applied to biological cases:

Using Durston’s Fits values — functionally specific bits — from his Table 1 to quantify Ip, and accepting functionality on specific sequences as showing specificity (S = 1), we may apply the simplified Chi_500 metric of bits beyond the threshold:
RecA: 242 AA, 832 fits, Chi: 332 bits beyond
SecY: 342 AA, 688 fits, Chi: 188 bits beyond
Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond
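
A short sketch reproducing the three values above; since functionality on the specific sequence gives S = 1, Chi_500 reduces to the Durston Fits value minus the 500-bit threshold:

```python
# (AA length, Fits from Durston's Table 1) for three protein families
durston_fits = {
    "RecA":      (242, 832),
    "SecY":      (342, 688),
    "Corona S2": (445, 1285),
}

THRESHOLD = 500  # bits: the solar-system threshold

for name, (aa_length, fits) in durston_fits.items():
    chi = fits - THRESHOLD            # S = 1, so Chi = Ip - 500
    print(f"{name}: {aa_length} AA, {fits} fits, Chi: {chi} bits beyond")

# RecA: 242 AA, 832 fits, Chi: 332 bits beyond
# SecY: 342 AA, 688 fits, Chi: 188 bits beyond
# Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond
```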

Where, of course, there are many well-known ways to obtain the information content of an entity, which automatically addresses the “how do you evaluate P(T|H)” issue. (As has been repeatedly pointed out, but just as insistently ignored in the rhetorical intent to seize upon a dismissive talking point.)

There is no elephant in the room.

Apart from . . . the usual one design objectors generally refuse to address, selective hyperskepticism.

But also, RTH imagines there is a whole field of needles, refusing to accept that many relevant complex entities are critically dependent on having the right parts, correctly arranged, coupled and organised in order to function.

That is, there are indeed empirically and analytically well-founded narrow zones of functional configs in the space of possible configs. By far and away, most of the ways in which the parts of a watch may be arranged — even leaving off the ever so many more ways they can be scattered across a planet or solar system — will not work.

The reality of narrow and recognisable zones T in large spaces W beyond the blind sampling capacity — that’s yet another concern — of a solar system of 10^57 atoms or an observed cosmos of 10^80 or so atoms and 10^17 s or so duration, is patent. (And if RTH wishes to dismiss this, let him show us observed cases of life spontaneously organising itself out of reasonable components, say soup cans. Or, of watches created by shaking parts in drums, or of recognisable English text strings of at least 72 characters being created through random text generation . . . which last is a simple case that is WLOG, as the infographic points out. As, 3D functional arrangements can be reduced to code strings, per AutoCAD etc.)

Finally, when the material issue is sampling, we do not need to generate grand probability calculations.

The proverbial needle in the haystack

For, once we are reasonably confident that we are looking at deeply isolated zones in a field of possibilities, it is simple to show that — unless a “search” is so “biased” as to be decidedly neither random nor blind — only a blind sample of a scope sufficient to make it reasonably likely to catch zones T in the field W would be a plausible blind chance plus mechanical necessity causal account.

But, 500 – 1,000 bits (a rather conservative threshold relative to what we see in just the genomes of life forms) of FSCO/I is (as the infographic shows) far more than enough to demolish that hope. For 500 bits, one can see that giving every one of the 10^57 atoms of our solar system a tray of 500 H/T coins, tossed and inspected every 10^-14 s — a fast ionic reaction rate — would sample the equivalent of one straw to a cubical haystack 1,000 LY across, about as thick as our galaxy’s central bulge. If such a haystack were superposed on our galactic neighbourhood and we were to take a blind, reasonably random, one-straw-sized sample, it would with maximum likelihood pick up nothing but straw.

As in, empirically impossible, or if you insist, all but impossible.
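
For those who want the orders of magnitude behind that picture, here is a rough sketch of the arithmetic (the straw volume and the one-inspection-per-atom-per-10^-14-s assumption are illustrative; only the exponents matter):

```python
# Blind-sampling resources of the solar system, using the figures in the text
atoms       = 1e57      # atoms in our solar system
duration_s  = 1e17      # seconds of cosmic history
rate_hz     = 1e14      # one tray inspection per 10^-14 s (fast ionic reaction rate)
samples     = atoms * duration_s * rate_hz         # ~1e88 observations

config_space = 2.0 ** 500                          # ~3.3e150 possible 500-bit configs
print(f"fraction of config space sampled: {samples / config_space:.1e}")  # ~3.1e-63

# Compare with one straw (assumed ~1 cm^3) in a cubical haystack 1,000 light-years on a side
metres_per_ly = 9.46e15
haystack_m3   = (1000 * metres_per_ly) ** 3        # ~8.5e56 m^3
straw_m3      = 1e-6                               # illustrative straw volume
print(f"one straw to that haystack:       {straw_m3 / haystack_m3:.1e}")  # ~1.2e-63
```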

 

It seems that objectors to design inferences on FSCO/I have been reduced to clutching at straws. END

Comments
Gordon Davisson: Here I will discuss the definition and meaning of dFSCI as a tool for design detection without any reference to its application in the biological field. OK? So, for this point, we can completely ignore biological information and the design inference for it. Let's just say that we agree, for the moment, to consider all objects in the biological world as "objects whose origin is not known, and will be discussed later". I think that we can agree that the origin of biological objects, or at least of the specific information in them, is at best controversial. If it were not controversial, we would not be here to discuss. So, the concept of dFSCI is, as I have argued, a restricted subset of the concept of CSI. In my definition, an observer is completely free to define explicitly any possible function for any possible object (in particular, digital sequence). There are no limitations (this is an important point) and the function must be objectively defined so that its presence or absence can unambiguosly be assessed for any possible object (sequence). Once the function is defined, a value of dFSI can be (at least in principle) computed for that function, an appropriate threshold of complexity can be established for the system and time span where we assume the object originated, and by that threshold we can asses if the object exhibits dFSCI for that system, after having checked that there are no known algorithmic procedures in the system that can help to overcome the probabilistic barriers. If we conclude that the object exhibits dFSCI, we can infer design. Why? Because it can be shown that this method, if applied correctly to any object out of the biological world, will work with 100% specificity to detect objects designed by humans. It has, obviously, low sensitivity. IOWs, it has no false positives, and many false negatives. That is the consequence of choosing a threshold which definitely favors the specificity - sensitivity tradeoff to obtain absolute specificity. So, how is the threshold established? It's simple. We compute the probabilistic resources of the system in the time span (the number of new states that the system can reach in the time span, given the "engines of variation" acting in it). As the dFSI value can be interpreted as the probability of getting to that specific functional state in one attempt, assuming an uniform distribution of the possible states, we must choose a threshold which is much higher (in -log2 terms) than the probabilistic resources of the system, so that the probabilistic explanation can be safely rejected. Obviously, we don't choose a 0.05 alpha level as in ordinary research! That would be folly. We must have a threshold that is many orders of magnitude higher than the computed probabilistic resources. That's how I have proposed by a (very gross) computation a threshold of 150 bits for biological information on our planet, I an not yet speaking of biological information here, I want just to show the procedure to compute an appropriate threshold. I don't remember the numbers now, so I will just give an idea. I grossly considered the total number of bacteria on earth and 5 billion years of existence for our planet, and an average bacterial mutation rate, and computed the total number of mutations in that system for that time span. I considered that as a reasonable higher threshold for the biological probabilistic resource of our planet. Then I added many orders of magnitude to that, and arrived to my proposed number of 150 bits. 
My computation is very gross, and may well be wrong and need refinement, but I suppose that the final reasonable value will still be much lower than the 500 bits of the UPB. Now, two simple facts: a) The procedure to compute dFSCI, as you can see, is completely specific to one explicit function. It does not take into account other possible functions. b) Given that, if we apply the procedure to any digital sequence, we can correctly identify designed sequences, with 100% specificity and low sensitivity. This is the empirical validation of the model. It can be easily applied to language or software or any digital machine. I am not aware of any false positive that has ever occurred. To avoid long computations, it will be enough to use the 500 bit threshold (UPB), which is certainly appropriate for any system in our universe. Two important recommendations: 1) The sequence must clearly be of the "non ordered" kind. IOWs, it must be grossly pseudo-random. That is usually satisfied in language and software, provided that the sequence is long enough, because in language and software the link between sequence and function is generated by cognitive and functional rules, and cannot be compressed into a simple algorithm. In these conditions, it is easy to exclude a necessity origin of the sequence. 2) It is important to apply the model only to original, new dFSCI: IOWs, a sequence that was not present in the system at time 0, not even as homologues, and whose function is new. I maintain that, in this way, we can recognize designed objects (outside the biological world) with 100% specificity and low sensitivity. In all positive cases, the form we observe in the object has been represented, willed and outputted by a conscious designer (a human).
gpuccio
August 29, 2014 at 02:32 AM (PDT)
gpuccio
I live to neo darwinism the arguable privilege of being “a theory which has become a fact”.
“leave”? BTW, I plan to read and learn from your coming posts too. Thanks.
Dionisio
August 29, 2014 at 02:31 AM (PDT)
Gordon Davisson: Here I am. I will try to address your points one at a time, if possible in different posts, so that it is more easy for me and, I hope, for you to go on with the discussion. Obviously, my references to dFSCI in this thread have been generic and brief. I have discussed many of these points in detail elsewhere. But I must sat that you make some points with great clarity and balance, and so it will be a pleasure to deepen the discussion with you. The fact is, the general concept of dFSCI is simple and intuitive enough, but to answer the inevitable counter arguments a lot of detail is needed. In this first post, I would like to make a few simple premises which will help us much in the following discussion. The following points should not be controversial, but please feel free to intervene at any moment while I try to develop my reasoning. So, the premises: 1) ID theory is, IMO, the best explanation available for complex functional biological information. The reasons for that conclusion will be the object of the following discussion. My simple premise here is that ID theory, like all other scientific theories, is only a theory. I believe that no scientific theory is final, and that all of them should be tested again as new facts and new understanding are gathered. I live to neo darwinism the arguable privilege of being "a theory which has become a fact". In my world, theories and facts are separate categories. So, all my reasonings are aimed at showing that ID is the best explanation, not the only or the final explanation. What I mean is that, form me, "The jury will always be out". Let's say that, while the jury is out (which will probably be forever), each of us has the privilege and duty to choose what he considers the best explanation. That's the way I conceive science: a personal choice based on what is available. 2) There are obviously, as you correctly state, " big messy unknowns", but they are in the whole issue of biological information and of how it emerged. They are not in my formula or in my reasoning, they are in all formulas and in all reasonings about the issue, because the issue is much more complex than we can imagine, even with all that we know, and that grows daily. I would like to remind here that ID is trying, at least, to offer a quantitative approach to the problem of probabilities in biological evolution. That is not only the duty of ID: it is the duty of anyone who is interested in the problem, most of all the duty of those who believe that the neo darwinian model is a good solution. After all, it's the neo darwinian model which is vastly based on RV (a probabilistic system). So, it is the cogent duty of neo darwinists to show that their random explanation is appropriate. They have repeatedly avoided any serious analysis of that aspect, so ID is trying to do that for them. OK, there are big messy unknowns, but it must be tried just the same, and refined as the unknown become less messy and, I hope, smaller. So, given these very general premises, let's go to the concept of dSFCI, which I will discuss in next post.gpuccio
August 29, 2014 at 01:50 AM (PDT)
GD, on the No Free Lunch issue you conflate two things and reflect a key gap in understanding. Behe spoke to the challenge of EVEN micro-evolution WITHIN an Island of function. The evidence indicates that with realistic, generous generation sizes, times and mut rates, it is hard, hard, hard to get a double mutation based change going, in a creature which is a going concern. This raises serious how much more the case questions on macro-evo, on the imagined broad continent of beings accessible through incremental changes model implicit in so much evolutionary theorising in light of the tree of life model and its variants. So, even were the observation that FSCO/I naturally comes in isolated islands due to requisites of function based on correct co-ordination and coupling of many parts grossly in error, there is a major challenge. But in fact, the islands of function pattern is well grounded empirically and analytically. And the point is, that blind search in a context of FSCO/I is truly blind. So, your random incremental walk, once it is in the sea of non-function, has to face blind search resource challenges to reach an island of function in a context where per solar system and 500 bits of FSCO/I, the atomic resources and time can sample something like one straw to a cubical haystack 1,000 LY across (about as thick as our galaxy). The cosmos scale case of 1,000 bits swamps our observed cosmos' search capacity to a ratio that is higher than the SQUARE of that one straw to a haystack as thick as our galaxy. So, there is a severe challenge to FIND the shores of an island of function through blind search. And, there is a challenge to implement irreducibly complex multiple component co-ordinated steps within such an island. Again, complementary, not contradictory. To give an idea, dot the illustrated picture in the OP with a large number of T's, though they still must be reasonably isolated -- not the BULK. Now, take a sample of 1 in 10^150 or worse (MUCH worse for realistic genomes), that is blind. Overwhelmingly you will get the bulk, straw not needles. Assume for argument you start at one needle and jump off "into the blue" blindly. Sampling less than 1 in 10^150 through an incremental random walk with no reinforcement until you hit another island, come up with a plausible, empirically and analytically warranted case as to how you can reach another island without a very directional information-rich driving push or strongly pulling and equally highly informational oracle. In short when we face an overwhelming search challenge, solved, active information is its best warranted explanation. KFkairosfocus
August 29, 2014 at 12:39 AM (PDT)
UB: It amazes me: the endless refusal to follow evident facts and cogent reasoning, occasioned by the zero-concessions-to-ID-and-ID-thinkers rhetorical strategy we see. It is sad, really, as it points to a spreading degradation of clarity and reasonableness in thought that has chilling implications for our civilisation. I just thought a word of encouragement would help. KF PS: The widespread assumption of, and cynical manipulation of, a "generally dumb public" are truly saddening, as is the resistance to reason and genuine enlightenment (there is also false light out there that is in reality the darkness of a Plato's Cave shadow-show).
kairosfocus
August 29, 2014 at 12:15 AM (PDT)
Gordon Davisson: Thank you for commenting. I do not have the time now, but I will address your very good points as soon as possible. :)
gpuccio
August 28, 2014
August
08
Aug
28
28
2014
09:54 PM
9
09
54
PM
PDT
Hello Gordon, I didn’t mean to imply anything more or less than what follows from your previous statement. Here is the direct quote:
I haven’t seen a definition [of information] which can be shown to be present in DNA and also cannot be produced without intelligence.
When I read your statement at the time, I was struck by the obvious fact that no one has ever seen any information (like that in DNA) come into existence without intelligent intervention, and so the basis of the claim was (to me) a bit of a mystery. I suppose the wildcard in your statement is the definition of information itself, however, I would not think your statement suddenly becomes warranted by the lack of a suitable definition, given that (regardless of the definition) it remains a fact that no one has any experience whatsoever of such information rising without intelligence. In any case, if I read you correctly then I really appreciate your candor that you know of no information (like that in DNA) coming into existence without intelligence, but instead, you make an assumption it can do so based on your chosen line of reasoning. I disagree with your reasoning because all of it simply takes for granted the material conditions required for information to come into existence in the first place. On a semiotic view, those conditions include: representation (i.e. an arrangement of matter to evoke a functional effect within a system, where the arrangement of the medium and the effect it evokes are physicochemically arbitrary); specification (i.e. a physical protocol to establish the otherwise non-existent relationship between the arrangement of the medium and its post-translation effect); and discontinuity (i.e. the observation that, in order to function, the organization of the system must preserve the discontinuity between the arrangement of the medium and its effect). These are the necessary interdependent conditions found in any instance of translated information, and they provide for a fairly steep entry to function, particularly in an inanimate pre-biotic environment. Two sets of matter must arise where one encodes the information and the other establishes what the results of that encoding will be, but because the organization of the system must also preserve the discontinuity, a set of relationships are thereby created that otherwise wouldn’t exist - and all of this must occur prior to the organization of the cell. Not only would this system necessarily arise in an inanimate environment, but the details of its construction must be simultaneously encoded in the very information that it makes possible. One additional interesting thing about DNA is of the particular type of semiotic system found there. There are two distinct categories of semiotic systems. One category uses physical representations that are reducible to their material make-up (such as a pheromone for instance), while the other uses physical representations that have a dimensional orientation and are not reducible to their material make-up (i.e. they are independent of the minimum total potential energy principle). The first type is found throughout the living kingdom. The second type is found nowhere else but in the translation of language and mathematics. Such systems require not only the same transfer protocols as any other semiotic system, but also require an additional set of systematic protocols to establish the dimensional operation of the system itself. This leads to an intractable observation; the incredibly unique material conditions required for dimensional semiosis, which would ostensibly not exist on Earth until the rise of human intelligence, were entirely evident at the very origin of life. They are the physical means by which the living cell becomes organized. 
Given these observations, none of which are really even controversial, it seems a little dismissive of you to claim that no evidence exists of intelligent intervention in the information that organizes life.
Upright BiPed
August 28, 2014 at 09:20 PM (PDT)
Upright Biped (back at #93):
Hi Gordon, You once made a statement on this forum that you didn’t know of a definition of information which can be shown to be present in DNA and also cannot be produced without intelligence. If this is the case, did you mean you already knew of a origin of information like that in DNA that had come about without intelligence, or are you merely assuming such information can come about without intelligence?
Not quite either of them; I'm saying I haven't seen a convincing case that the type of information in DNA requires an intelligent source, and I think that there's a reasonable case that in theory it can come from an unintelligent source, but I don't claim to be able to demonstrate this in practice. (Of course, I think that the information in the living world actually did come about without intelligent assistance, but I don't claim to be able to demonstrate this. This part is just an assumption on my part, based on Occam's razor and what seems to me to be a lack of evidence for intelligent assistance.) Let me give a quick sketch of my theoretical argument that this sort of information can come from unintelligent sources. Basically, it comes from the success of genetic algorithms: they show that random variation + selection for function can produce functional information. But what about the No Free Lunch theorem, you ask? The NFLT shows that genetic algorithms only produce functional information (in nontrivial amounts) if the fitness function has the right characteristics; a random fitness function will stymie the evolutionary algorithm, and it'll do no better than chance. Dembski, Marks, Ewert et al have extended this in various ways. I'm not as familiar with this work as I probably should be, but in general I haven't been particularly impressed by their approach: they seem to be using very abstract models that don't correspond very well with real evolution, and mix the real issue together with a bunch of irrelevant confusion. So I'll ignore their models and just go with a simpler intuititive version of what I see as the real issue. Actually, I'll claim there are three related issues: does the fitness function have the right shape to let evolution work better than chance (I say: yes!), does it have a right enough shape to work well enough to explain all the complex functional information in living organisms (I say: maybe!), and can that "right shape" be explained without requiring an intelligent agent to design it (I say: yes!). Let me take these slightly out of order. First: does the fitness function have the right shape to let evolution work better than chance (or, in Dembski et al's terms, is there active information in the fitness function)? I think the answer on this one has to be a pretty clear yes. Take a look, for example, at the recent dustup between Mike Behe and and Larry Moran about how difficult it was for the malaria parasite to evolve chloroquine resistance. Setting aside the disagreement about the precise math, note that none of the estimates of difficulty are anywhere near searching the entire space of possible protein sequences to find that new function; but under the random-fitness-function model assumed by NFLT, that's pretty much what would have been necessary. Under the NFLT random model, even minor changes to the function of a protein would have to basically start searching from scratch, because it assumes no correlation in function between similar gene/protein sequences. Second: can that "right shape" be explained without requiring an intelligent agent to design it? Again, I think the answer here is a pretty clear yes. 
Dembski argues that there's no reason (other than ID) to expect the fitness function to have the right shape, but I think there's a very good reason: similar causes tend to have similar effects, and as a result we can reasonably expect similar gene/protein sequences to have similar functions (as in the chloroquine resistance example). That means finding one functional sequence will increase the chance of finding others nearby, and hence of evolution doing better than random. Third: does it have a right enough shape to work well enough to explain all the complex functional information in living organisms? If you look at the debate between Behe and Moran, they actually agree about the basics -- that evolution needs selectable intermediates to guide it to distant functional sequences -- but their disagreement is about the details of how many selectable intermediates are needed, how distant is distant, etc... And they're disagreeing about this in a case where the possible intermediates have been mapped out in great detail! Trying to do similar reasoning about larger-scale and less-well-mapped-out portions of the fitness function is, at least as far as I can see, basically hopeless at this point. And that's what leads me to say that I can't show this in practice.
Gordon Davisson
August 28, 2014 at 05:28 PM (PDT)
KF, Wow! In Antigua? Just tell us when; I'm ready anytime! ;-) I'm telling my wife to book our flights. We will leave our orange tabby cat in our children's home. The only problem is that the Caribbean environment might be a little distracting, hence not very conducive to a serious meeting ;-) But I have to wait until the left side of my face is no longer swollen. It's getting better now, but not completely recovered yet. You see, that's another sound proof that 'n-D e' is true - if my body were designed it would have healed instantly and completely painlessly, wouldn't it? ;-)
Dionisio
August 28, 2014 at 05:18 PM (PDT)
Rich: Nope, the lineage is Thaxton et al, thence Orgel and Wicken [and, way back, Cicero . . . ], but even if it were Sir Fred, you would be well advised to think again before dismissing him as if his very name were a mark of fallacy; he was a Nobel-equivalent prize holder speaking on a matter of his expertise. When you have a substantial answer to the matter posed in the infographic, let us hear it. Meanwhile, the successive side tracks and strawman caricatures are duly noted. KF
kairosfocus
August 28, 2014 at 04:29 PM (PDT)
D: A UD meeting in Antigua would be a feasible proposition. "De beaches are nice, much nicer dan ice . . . " KF
kairosfocus
August 28, 2014 at 04:24 PM (PDT)
Rich, FYI: WmAD has explicitly noted that in the biological world specification is always linked to function, and indeed function is also a key component of irreducible complexity. Add the work of Durston et al on functional sequence complexity (as opposed to orderly or random sequence complexity), and that of Meyer as he engages the issues of functional specificity and complexity in his published work. The dismissal that "nobody uses" these concepts fails the basic fact test. More to the point, FSCO/I in the form dFSCI is abundantly common as what is reported by our PCs as file sizes, which we would never dream of assigning to lucky noise etc. And functional specificity is as close and as real as needing the right part, put in correctly, to fix your car. KF
kairosfocus
August 28, 2014 at 04:22 PM (PDT)
Once again, I failed at proofreading.
Let’s say (again, all numbers made up for the sake of illustration) that this [selection] increases the “hit rate” for functional sequences by a factor of 10^100.
... that should be 2^100, not 10^100.
Gordon Davisson
August 28, 2014 at 04:03 PM (PDT)
gpuccio:
Thank you for you #102. Although addressed to Dionisio, it is a also a very good comment of my #101. First of all, I am really grateful to you because you seem to understand my personal approach much better than many others have. That is, for me, an absolute gratification.
I'm glad you liked it. But I'm also a little disturbed, because I don't feel I have a really good grasp of your argument -- general outline, sure, but the details matter, and I don't have them properly sorted out in my head (your #101 actually confused me significantly; I'll try to ask a coherent question about this later in this message). And if I don't really understand it that well... does anyone else? I do want to comment, at least briefly, on some of the points you raise:
So, I would be grateful if you could explain better why you state: “measures like Durston’s Fits that’re easy to measure but hard to base a probability calculation on”.
Let me give a simple (and completely made-up) example to illustrate the issue. Suppose we had a 750-base-pair gene (I know, pretty small, but as I said this is a simple example), meaning its total information content was 1500 bits. Let's say that 250 of those bases are fully required for the gene's function, and the other 500 could be anything without changing its function. That means we should get around 500 Fits for this gene (assuming the sequence space is fully explored etc). But that doesn't mean that the probability of something like this evolving is 1 in 2^500, because there are other factors you need to take into account. First, there are many different functions that a gene might have, and we're only looking at the probability of that particular one evolving. Let's say, for the sake of illustration, that there are 2^50 possible functions that a gene might perform. Also, there are likely to be many different ways that a gene might perform any of these functions. I'm not talking about minor sequence variation (Durston's method accounts for those), but e.g. the difference between the bacterial and archaeal flagella -- very different structures, but essentially the same function. Again for the sake of illustration, let's say there are 2^250 possible structures corresponding to each of those functions (and to oversimplify even further, assume each has the same degree of functional restriction, i.e. the same Fits). That means that while each possible gene's "functional island" corresponds to only 1/2^500 of the sequence space, a total of 1/2^200 of the sequence space corresponds to some function. But not all sequences are equally likely, because selection biases the search toward the sequence space near other successful solutions, and functional sequences seem to cluster together (e.g. gene families). Let's say (again, all numbers made up for the sake of illustration) that this increases the "hit rate" for functional sequences by a factor of 10^100. That means that while functional sequences make up only 1/2^200 of the sequence space, evolution stumbles into one every 1/2^100 or so "tries". There appear to be about 5e30 ~= 2^102 bacteria on Earth (finally, a non-made-up number!)... after some fiddling about the number of mutations (/new potential genes) per bacteria per generation, that means we'd expect roughly one new functional gene per generation. That's the real probability calculation I was talking about. Or rather, it would be the calculation I was talking about if it had real numbers, rather than just made-up-out-of-thin-air ones. And I have no idea what the real numbers are. I don't think there's any way to get a good handle on them until we know far more about the large-scale shape of the fitness function (i.e. the mapping between sequence and function) than we do now. But if you want to do a probability argument, you pretty much need them. Dembski's CSI formula provides a good framework for handling these factors. phi_S(T) provides an upper bound on the number of functions that can be given at-least-as-simple descriptions (i.e. the 2^50 factor in my example), and the other two need to be taken into account when calculating P(T|H). But if you use a formula that doesn't include them...
Also, I am not sure that I understand what you mean in this final comment about my position: “And that means he needs something far far over to the Durston end of that spectrum.” Except for these points, I wholly agree with all that you say. Thank you again for your very deep and clear views.
Here's where your #101 confused me, because it seems to conflict with what I thought I knew about your approach. My understanding from previous discussion (which I admit I haven't always followed in full detail) was that your argument that dFSCI can't be produced without intelligence is that it hasn't been observed to. In order to make a solid case like this, you really have to be able to go on and say "...and if it could be produced naturally, we would have observed it." And that means we must have tested many cases, and (in each of those cases) be able to tell if dFSCI has appeared. Dembski's CSI doesn't allow this. According to him, "Does nature exhibit actual specified complexity? The jury is still out." (from "Explaining Specified Complexity", quoted via Winston Ewert since the original isn't loading for me at the moment). All of the functional complexity in the biological world, and Dembski isn't sure if it actually qualifies as CSI (in his sense) because of unknowns like the ones I've been harping on. If he can't tell for sure if a human exhibits CSI, what hope have you of telling if this new mutation you just observed does? So, for your purposes (at least, as I thought I understood them), you need something that doesn't have any big messy unknowns in the formula. Something much more like Durston's metric. But here's where I got confused. In #101, you said:
[...]f) A cutoff appropriate for the system and its probabilistic resources, sufficient to make the random emergence of a functional state for that function absolutely unlikely, given the probabilistic resources of the system in the time span. At that point, if the functional complexity of the object (as dFSCI for the specified function) exceeds the appropriate cutoff, we categorize dFSCI as exhibited by the object, and we can infer design… If we have reasonably excluded that any known algorithm can intervene to lower the probabilistic barriers that we have just analyzed. At that point, design is the best explanation available.
...but that sounds more like the sort of argument from theoretical probability that Dembski's working toward, and it means you do need to take those messy unknowns into account in your calculations. It also seems to conflict with the empirical approach I thought you were taking. Suppose we observed dFSCI appearing (in, say, something like the Lenski bacteria experiment): would that be evidence that it can be produced naturally, or evidence that some intelligence intervened to add it? Without some way of distinguishing the two, I don't see how the claim that dFSCI only comes from intelligence can be empirically tested. Therefore, I am now confused. Is the source of my confusion clear enough that you can see what I'm worried about, and clarify it for me?Gordon Davisson
August 28, 2014 at 03:28 PM (PDT)
Hope the real world is treating everyone, KF especially, better. I think both CSI and FSCO/I are flawed. This is evidenced by the fact that no-one is using them for design detection. At least CSI has a mechanism that allows for the creation mechanism (I think Dembski was thinking of evolution) as the null hypothesis. FSCO/I seems to argue only against spontaneous assembly; it is a reformulation of Hoyle's arguments with the addition of the UPB as a ceiling. It is unusable and not relevant to evolution, so we should return to CSI as a more honest approach - at its core, FSCO/I is a straw man.
rich
August 28, 2014 at 12:57 PM (PDT)
GP and KF: Thank you. P.S. Maybe KF would host a UD meeting in the Caribbean, so we can chat by the beach? ;-)
Dionisio
August 28, 2014 at 11:18 AM (PDT)
D: OUCH! Trust you get better soon, but I guess this thread helps ease "de pains." KF
kairosfocus
August 28, 2014 at 09:18 AM (PDT)
Dionisio: I hope you will recover quickly! :)
gpuccio
August 28, 2014 at 05:25 AM (PDT)
Joe: Nobody is saying it is simple. But in general, there is a very strong relationship between sequence and shape. I don't think that anybody can deny that. Monogenic diseases are caused by single amino acid mutations which negatively affect the protein's shape and function.
gpuccio
August 28, 2014 at 05:14 AM (PDT)
gpuccio: Thank you very much for the detailed explanation of the questions I posted. Again! For me, there are concepts and terms I have to chew well in order to digest them correctly. Sometimes I may ask what seem like redundant, rhetorical or even 'duh!' questions, but I want to ensure I'm understanding well what you write in the comments you address to GD and other interlocutors here. Hopefully many visiting onlookers are learning as much as I do from reading your and KF's commentaries in this thread. I have not read the new ENCODE articles in the current edition of Nature yet, but now that you've asked, I'm going to read them next! Mille grazie, caro Dottore! P.S. I'm recovering from dental surgery to extract a cracked molar (the last one on the lower left side).
Dionisio
August 28, 2014 at 05:01 AM (PDT)
gpuccio, your countryman, Giuseppe Sermonti, has a chapter titled "What teaches proteins their shapes" in "Why Is a Fly Not a Horse?" It seems it isn't as simple as saying that the AA sequence dictates the shape.
Joe
August 28, 2014 at 04:40 AM (PDT)
KF: Thank you for responding to my questions. I appreciate it very much.
Dionisio
August 28, 2014 at 04:32 AM (PDT)
Dionisio: a) Yes, the protein functionality depends critically on the 3D shape, but obviously also on the biochemical properties of individual AA residues. The 3D shape has at least two important levels: the general folding of the molecule, and the active site. We could say that the general folding of the molecule determines the active site and its form. The biochemical activity of the active site, obviously, depends both on its 3D form and on its biochemical nature. Moreover, in many cases when the active site interacts with its biochemical target, the general 3D configuration of the whole molecule changes, and that can activate or repress other active sites in the molecule. b) Yes, the 3D shape is strictly related to the AA sequence. The AA sequence (the primary structure of the protein) determines both the secondary structure (alpha elices, beta sheets) and the tertiary structure (the general 3D folding). However, the 3D shape of a protein must not be considered as a static condition: it is very dynamic. Folding is a very complex process, and it is very difficult to compute it from the primary structure (the sequence). It depends on many biochemical variables. It happens very efficiently and rapidly in functional proteins, but in many cases it requires the help of chaperones. c) Yes, in general I would say yes. Maybe there are some exceptions, but in general that is the case. But even small modifications can change the folding very much (or, sometimes, not at all). Moreover, post-translational modifications, which do not change the AA sequence, but act differently, for example by adding other molecules to the protein, can change the 3D structure and the function very much. d) Chaperones are proteins which help other proteins in the folding process, mainly by preventing wrong folding (bot also in other ways). Not all proteins require chaperones to fold correctly, but many do. e) dFSCI can be applied to many contexts and many systems. I generally use it to compute the functional information of protein sequences, mainly basic protein domains or protein superfamilies. The reason for that is that they are the best scenario, at present, for the design inference, because much is known in detail about protein sequence and function. In this case, the sequence of the protein (or of its coding gene, which is more or less the same thing) is the functional object for which we compute dFSCI. IOWs, what we are asking is: this new functional sequence which was not present at time T0, originated in this system between time T0 and time T1. Given for granted all that was already present in the system at time T0, could this specific new functional sequence arise by random variation? If that sequence, for that system, exhibits dFSCI (and no functional intermediates are known, so that we have no explicit NS model that can help its emergence) then we can infer design for it as the best explanation. As you can see, the sequence is the real vehicle of the function. We are dealing with digital information here, and therefore the sequence is all, like the sequence of bits in a piece of software is all. In this scenario, the machinery which contributes to the new function is supposed to be already in the system at time 0, and therefore does not contribute to the computation of dFSCI for this specific transition (the emergence of the new protein). 
If we want to apply the concept of dFSCI to more complex systems (let's say a regulatory network) we need to know the sequence modifications in physical objects (if we are dealing with digital information), or more in general the modifications in physical objects (if we are dealing with analog information) that are necessary to implement the new regulatory function which was not present at T0 and is present at T1, and again compute the ratio between the total number of states (of all the objects implied) which are compatible with the new function and the total number of possible states, and use an appropriate threshold which takes into consideration the probabilistic resources for the whole system, IOWs, the number of possible different states reached by the system through RV in the allotted time span, and evaluate if the functional transition in the system exhibits dFSCI. That is perfectly possible, but you can see that it is more complex. That's why I stick to single proteins to test the concept of dFSCI. However, it is rather obvious that more complex systems, and especially regulatory networks, are certainly very good examples of dFSCI. Maybe that, as our understanding grows, it will be easier to compute dFSCI for them too. By the way, have you seen the new ENCODE articles in the current issue of Nature? :)gpuccio
August 28, 2014 at 04:12 AM (PDT)
D:

>> Is the protein functionality associated with its 3D shape? >>
a --> It has to fold to a key-lock fitting shape as a first step to functioning. It may need other AA chains, and there may be additions that give chemically active clefts etc.

>> Is the 3D shape related to the AA sequence? >>
b --> Yes, a balance of forces, "seeking" a reasonably stable 3-D config.

>> Does a given AA sequence produce the same 3D shape? >>
c --> One might presume: always. But no; often there is a chaperoned fold to the required shape, and other patterns are possible.
d --> Prions (as in mad cow disease and maybe Alzheimer's) are mis-folded, and are MORE stable than the biofunctional shape.

>> What is the role of the chaperones? Are they related to the folded 3D shape? >>
e --> They enable proper folds for biofunction, IIRC for 10% of proteins.

>> How is all the above quantified or accounted for in dFSCI? >>
f --> To a limited degree, in the requisite of FUNCTION, which is linked to specific clusters of sequences that allow fold and relevant key-lock fit and function.
g --> Recall, the issue is a string, information-bearing structure within narrow functionally specific zones T in wider spaces of possible string states.
h --> The ATP synthetase either works or it does not. If not, there is no factory for the energy battery molecules for the many endothermic rxns involved in life. Almost instant death -- as IIRC cyanide poisoning shows. KF
kairosfocus
August 28, 2014 at 03:57 AM (PDT)
gpuccio: Thank you for writing another detailed explanation of the discussed subject. Is protein functionality associated with its 3D shape? Is the 3D shape related to the AA sequence? Does a given AA sequence produce the same 3D shape? What is the role of the chaperones? Are they related to the folded 3D shape? How is all of the above quantified or accounted for in dFSCI?
Dionisio
August 28, 2014 at 02:16 AM (PDT)
F/N: It is probably worth noting that Durston et al are working on the premise that mechanisms for genomes (and, by extension, proteins) that were likely enough to be observable on a solar-system or observed-cosmos scope would be reflected in the range of protein variation observed in a given family of proteins in actual life forms. This implicitly captures the probabilities that are practically relevant. Also, the info-probability relationship can be worked both ways, up to the "fat zero" of practically feasible possibilities. KF
kairosfocus
August 28, 2014 at 02:08 AM (PDT)
GD: Specified complexity or complex specified info pre-date the ID scientific school of thought, especially in light of Orgel, Wicken and Hoyle across the 70's into the 80's. The concept of a definable characterisable narrow zone in a field of possible configs traces to statistical thermodynamics and is relevant to the statistical grounding of the second law. That one may see that a blind sample or random walk of limited scope will be challenged to plausibly capture such zones follows from simple probability of sampling. I have used the idea of dropping darts from a height to a chart of a bell curve carried out several SDs. It is easy to get the bulk, but to catch the truly far skirt with any reasonable number of tosses is harder. This is in fact a key insight of traditional hyp testing. In that context, WmAD has sought to construct a metric model, esp c 2005. One may wish to debate the merits and demerits of such, maybe with some profit. But if one conflates the discussion of that with the wider issue of searching large spaces of configs of a system blindly, hoping to catch narrow zones in it on available search resources, one runs the risk of a strawman fallacy. On the characterisation of zones T, a short description that specifies the zone (and like ones of similar character) is to be seen as contrasted with effectively being forced to list its members. This is a big part of the paint the target around where you already hit issue. While WmAD is interested in general characterisations that then may be subject to all sorts of fruitless debates, in fact from the days of Orgel and Wicken, function dependent on specific, complex arrangement, has been recognised as the material case of interest for focussed discussion. Where, function can easily be recognised. So, the material issue is in fact FSCO/I, especially its digitally coded form, dFSCI. Which is right there in the world of life in vital contexts: D/RNA, protein and enzyme synthesis, proteins themselves. Involving codes, prescriptive instructions, algorithms, initiation, continuation and halting, etc. Carried out in organised, co-ordinated execution machines based on the same molecular nanotech. And, involving a kinematic von Neumann Self Replication process. Bring this to bear in light of the search space challenges involved, and the further fact that protein fold domains are deeply isolated in AA sequence space. Then address the empirical observation on the only empirically known source of FSCO/I. Do so, in light of the needle in haystack blind search challenge and the credible range for an initial genome: 100 - 1,000 kbits of information. KFkairosfocus
August 28, 2014 at 02:01 AM (PDT)
Gordon Davisson: Thank you for you #102. Although addressed to Dionisio, it is a also a very good comment of my #101. First of all, I am really grateful to you because you seem to understand my personal approach much better than many others have. That is, for me, an absolute gratification. :) As I agree on most of what you say. I will only try to address a few points about which I think it is worthwhile to go deeper, or where I am not sure I understand completely your view. First of all, it is perfectly true that: "there’s also a bit of terminology confusion here, because gpuccio tends to use “CSI” to refer to the basic idea that both Dembski’s and his are variants of. Unfortunately, that leaves him without a good specific term for Dembski’s version. I, on the other hand, tend to reserve “CSI” for Dembski’s metric alone… which leaves me without a good blanket term that covers all of the variants." I would only say that the concept of CSI (maybe not the exact acronym) has been indeed used before Dembski, even if Dembski has the great merit of having clarified many aspects of the idea and made it popular. And that Demvski has widely used the general concept of CSI before trying to "fix" the concept of specification with the well known metrics in the well known paper. So, I believe that identifying the concept of CSI, even only as an acronym, with the specific metrics in that paper (which, I believe, many of us, including apparently you and me, consider at best perplexing, at worst wrong) is not practical, and probably penalizing for the whole ID debate (it's not a cse that out neo darwinian adversaries love to go back to that paper!). CSI is a fundamental concept of ID theory: the basic notion that we can measure the minimal complexity linked to some acceptable specification, any simple rule that generates a binary partition in the set of possible events without having to enumerate each single event (that's as near as I can go to Dembski's concept of "verbosity"). In this very wide sense, CSI is the most general set of specified information, because we leave the definition of specification open, without any loss of generality. In this sense, functional information is certainly a subset of CSI (a specific subset of the possible rules of specification). Now, some comments about the Durston approach. I use it often as an example of how it is possible to comput "easily" dFSCI for family proteins. However, in principle it is always "possible" (not easy or empirically possible) to compute dFSCI in all cases by just testing all possible outcomes for the function. That cannot be really done because of the big numbers we are dealing with in the complex cases, but it shows that dFSCI is a very concrete complex, and that it measures a property which really exists (something that my interlocutors have tried many times to deny). Now, if we want to be precise, and cal "dFSI the continuous measure, adn "dFSCI" the binary categorization, it is very easy to compute dFSCI in simple systems, where the number of outcomes can be directly tested for a function. In those cases, a direct measure of dFSI can be easily obtained, even if it will never be complex enough to allow a design inference. But that is proof that dFSI is real, and so is dFSCI. dFSCI is only more difficult (impossible) to be measured directly. But, luckily, it can be measured, or at least approximated, indirectly, like many other quantities ion empirical science. Now, Durston's method is a very good shortcut, but not the only one. 
Many informations about dFSCI in proteins can be derived from what we know of protein functional space (if we look at things without the neo darwinian bias). And the more our understanding of those things gorws, the more it will be obvious that most basic protein domains do exhibit dFSCI. But Durston's approach is perfectly valid. OK, it implies some assumption, mostly that the functional space of the protein families has been traversed more or less completely during evolution by neutral variation, so that the variants we observe today are a reliable sample of all possible functional variants. That assumption can be more or less exact. But the fact remains that, given that assumption, Durston's fits are very simply a measure of dFSCI in protein families. To understand why (given that obviously he does not use the term dFSCI in his paper, and he does not use my definitions) one has to understand well what he is doing, but it is easy to show that what he is measuring (approximating) is exactly dFSCI. In a sense, dFSCI always tends to increase with the length of the functional protein, but with another variable acting, which could be described as the functional relevance of each AA site. That's exactly what Durston measures. I have given many times an extreme example, where even Durston's method is not strictly necessary, because the computation os a minimal dFSCI value is rather immediate and obvious. I am referring to the octameric structure of ATP synthase, made of the two alpha and beta subunits. I have argued, for example here: https://uncommondescent.com/intelligent-design/four-fallacies-evolutionists-make-when-arguing-about-biological-function-part-1/ that a very simple alignment of those two protein sequences using one in archaea, one in bacteria and the human one, shows 378 perfectly conserved aminoacid positions (and that is just a part of the whole functional molecule). Now, I think that we can probably agree that 378 identities from LUCA to humans are very difficult to explain, unless we are dealing with AAs which are essential for the function. So, we can easily assume that. more or less, at least 378 specific AA positions are necessary, exactly as they are, for the function. That represents a minimum value of dFSI of about 1600 bits of functional information, well beyond Dembski's UPB. That's just to show how easy it is to approximate dFSCI in some cases. Moreover, I have argued many times that the UPB is excessive as a threshold, if our aim is to infer design for biological objects. I have proposed a more realistic bound of 150 bits (about 35 specific AAs), which makes the emergence of a specific outcome unlikely enough to be easily rejected, considering the probabilistic resources of our planet (OK, it is a very gross estimation, but it is based on very generous assumptions for those probabilistic resources!). So, I would be grateful if you could explain better why you state: "measures like Durston’s Fits that’re easy to measure but hard to base a probability calculation on". Also, I am not sure that I understand what you mean in this final comment about my position: "And that means he needs something far far over to the Durston end of that spectrum." Except for these points, I wholly agree with all that you say. Thank you again for your very deep and clear views.gpuccio
August 28, 2014 at 12:55 AM (PDT)
Gordon Davisson @ 102: I'll try to write more on this tomorrow.
Dionisio
August 27, 2014 at 08:47 PM (PDT)
Gordon Davisson @ 102: What I meant by you being off target is that your quick reaction to my comment #43 was to apply formulae that don't apply to my example, because an important part of the information associated with the given string of digits is the set of instructions on how to process the string in order to translate it, and the purpose of such encryption. Those formulae don't seem to help describe the associated instructions or the purpose. gpuccio commented on this in his last few posts in this thread. Note that in the given example one needs to know the source of the characters and the method used to select the characters. The source could be a book, a newspaper, a magazine. The method is (page #, character position within the given page).
Dionisio
August 27, 2014 at 08:46 PM (PDT)