
On FSCO/I vs. Needles and Haystacks (as well as elephants in rooms)


Sometimes, the very dismissiveness of hyperskeptical objections is their undoing, as in this case from TSZ:

Pesky EleP(T|H)ant

Over at Uncommon Descent KirosFocus repeats the same old bignum arguments as always. He seems to enjoy the ‘needle in a haystack’ metaphor, but I’d like to counter by asking how does he know he’s not searching for a needle in a needle stack? . . .

What had happened, is that on June 24th, I had posted a discussion here at UD on what Functionally Specific Complex Organisation and associated Information (FSCO/I) is about, including this summary infographic:

[Infographic: csi_defn]

Instead of addressing what this actually does, RTH of TSZ sought to strawmannise and rhetorically dismiss it by an allusion to the 2005 Dembski expression for Complex Specified Information, CSI:

χ = – log2[10^120 ·ϕS(T)·P(T|H)].

–> χ is “chi” and ϕ is “phi” (where, CSI exists if Chi > ~ 1)

. . . failing to understand (as did the sock-puppet Mathgrrl, not to be confused with the Calculus prof who uses that improperly appropriated handle) that by simply moving forward to the extraction of the information and threshold terms involved, this expression reduces as follows:

To simplify and build a more “practical” mathematical model, we note that information theory researchers Shannon and Hartley showed us how to measure information by changing probability into a log measure that allows pieces of information to add up naturally:

Ip = – log p, in bits if the base is 2. That is where the now familiar unit, the bit, comes from. Where we may observe from say — as just one of many examples of a standard result — Principles of Comm Systems, 2nd edn, Taub and Schilling (McGraw Hill, 1986), p. 512, Sect. 13.2:

Let us consider a communication system in which the allowable messages are m1, m2, . . ., with probabilities of occurrence p1, p2, . . . . Of course p1 + p2 + . . . = 1. Let the transmitter select message mk of probability pk; let us further assume that the receiver has correctly identified the message [–> My nb: i.e. the a posteriori probability in my online discussion here is 1]. Then we shall say, by way of definition of the term information, that the system has communicated an amount of information Ik given by

I_k = (def) log_2  1/p_k   (13.2-1)

xxi: So, since 10^120 ~ 2^398, we may “boil down” the Dembski metric using some algebra, i.e. substituting and simplifying the three terms in order, as log(p*q*r) = log(p) + log(q) + log(r) and log(1/p) = – log(p):

Chi = – log2(2^398 * D2 * p), in bits,  and where also D2 = ϕS(T)
Chi = Ip – (398 + K2), where now: Ip = – log2(p) and log2 (D2 ) = K2
That is, chi is a metric of bits from a zone of interest, beyond a threshold of “sufficient complexity to not plausibly be the result of chance,”  (398 + K2).  So,
(a) since (398 + K2) tends to at most 500 bits on the gamut of our solar system [our practical universe, for chemical interactions! ( . . . if you want, 1,000 bits would be a limit for the observable cosmos)] and
(b) as we can define and introduce a dummy variable for specificity, S, where
(c) S = 1 or 0 according as the observed configuration, E, is on objective analysis specific to a narrow and independently describable zone of interest, T:

Chi =  Ip*S – 500, in bits beyond a “complex enough” threshold

  • NB: If S = 0, this locks us at Chi = – 500; and, if Ip is less than 500 bits, Chi will be negative even if S is positive.
  • E.g.: a string of 501 coins tossed at random will have S = 0, but if the coins are arranged to spell out a message in English using the ASCII code [notice independent specification of a narrow zone of possible configurations, T], Chi will (unsurprisingly) be positive.

[Figure: explan_filter]

  • S goes to 1 when we have objective grounds — to be explained case by case — to assign that value.
  • That is, we need to justify why we think the observed cases E come from a narrow zone of interest, T, that is independently describable, not just a list of members E1, E2, E3 . . . ; in short, we must have a reasonable criterion that allows us to build or recognise cases Ei from T, without resorting to an arbitrary list.
  • A string at random is a list with one member, but if we pick it as a password, it is now a zone with one member. (Where also, a lottery is a sort of inverse password game where we pay for the privilege, and where the complexity has to be carefully managed to make it winnable.)
  • An obvious example of such a zone T, is code symbol strings of a given length that work in a programme or communicate meaningful statements in a language based on its grammar, vocabulary etc. This paragraph is a case in point, which can be contrasted with typical random strings ( . . . 68gsdesnmyw . . . ) or repetitive ones ( . . . ftftftft . . . ); where we can also see by this case how such a case can enfold random and repetitive sub-strings.
  • Arguably — and of course this is hotly disputed — DNA protein and regulatory codes are another. Design theorists argue that the only observed adequate cause for such is a process of intelligently directed configuration, i.e. of  design, so we are justified in taking such a case as a reliable sign of such a cause having been at work. (Thus, the sign then counts as evidence pointing to a perhaps otherwise unknown designer having been at work.)
  • So also, to overthrow the design inference, a valid counter example would be needed, a case where blind mechanical necessity and/or blind chance produces such functionally specific, complex information. (Points xiv – xvi above outline why that will be hard indeed to come up with. There are literally billions of cases where FSCI is observed to come from design.)
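The log reduction in point xxi can be written out step by step. This is only a restatement of the algebra above, using the same symbols (Ip, K2, S) and the same approximation 10^120 ~ 2^398; it adds no new assumptions.

```latex
\begin{align*}
\chi &= -\log_2\!\left[10^{120}\cdot\varphi_S(T)\cdot P(T\mid H)\right] \\
     &= -\log_2 P(T\mid H) \;-\; \log_2 10^{120} \;-\; \log_2 \varphi_S(T) \\
     &\approx I_p - (398 + K_2),
       \qquad I_p = -\log_2 P(T\mid H),\quad K_2 = \log_2 \varphi_S(T) \\
\chi_{500} &= I_p \cdot S - 500,
       \qquad \text{taking } 398 + K_2 \le 500 \text{ and } S \in \{0, 1\}
\end{align*}
```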

xxii: So, we have some reason to suggest that if something, E, is based on specific information describable in a way that does not just quote E and requires at least 500 specific bits to store the specific information, then the most reasonable explanation for the cause of E is that it was designed. The metric may be directly applied to biological cases:

Using Durston’s Fits values (functionally specific bits) from his Table 1 to quantify I, and accepting functionality on specific sequences as showing specificity, giving S = 1, we may apply the simplified Chi_500 metric of bits beyond the threshold:
RecA: 242 AA, 832 fits, Chi: 332 bits beyond
SecY: 342 AA, 688 fits, Chi: 188 bits beyond
Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond
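The three figures above are simple subtractions, and a short sketch makes that explicit. The function name chi_500 and the dictionary durston_fits are illustrative names chosen here, not part of any published code; the Fits values are the ones quoted above, and S = 1 is assumed for them exactly as the text does.

```python
def chi_500(info_bits, specific, threshold=500):
    """Simplified metric Chi = Ip*S - threshold, in bits beyond the threshold."""
    s = 1 if specific else 0
    return info_bits * s - threshold

# Fits values as quoted in the post from Durston's Table 1 (S = 1 assumed).
durston_fits = {"RecA": 832, "SecY": 688, "Corona S2": 1285}
chi_values = {name: chi_500(fits, specific=True)
              for name, fits in durston_fits.items()}

# The 501-coin example: random tosses (S = 0) versus an ASCII message (S = 1).
coins_random = chi_500(501, specific=False)
coins_message = chi_500(501, specific=True)

for name, chi in chi_values.items():
    print(f"{name}: {chi} bits beyond the threshold")
print("501 random coins:", coins_random)
print("501 coins spelling a message:", coins_message)
```

Note that the sketch only reproduces the arithmetic; deciding whether S should be 1 or 0 for a given case is the judgement call discussed in the bullet points, and no code can settle it.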

Where, of course, there are many well known ways to obtain the information content of an entity, which automatically addresses the “how do you evaluate p(T|H)” issue. (As has been repeatedly pointed out, just insistently ignored in the rhetorical intent to seize upon a dismissive talking point.)

There is no elephant in the room.

Apart from . . . the usual one design objectors generally refuse to address, selective hyperskepticism.

But also, RTH imagines there is a whole field of needles, refusing to accept that many relevant complex entities are critically dependent on having the right parts, correctly arranged, coupled and organised in order to function.

That is, there are indeed empirically and analytically well founded narrow zones of functional configs in the space of possible configs. By far and away most of the ways in which the parts of a watch may be arranged (even leaving off the ever so many more ways they can be scattered across a planet or solar system) will not work.

The reality of narrow and recognisable zones T in large spaces W beyond the blind sampling capacity — that’s yet another concern — of a solar system of 10^57 atoms or an observed cosmos of 10^80 or so atoms and 10^17 s or so duration, is patent. (And if RTH wishes to dismiss this, let him show us observed cases of life spontaneously organising itself out of reasonable components, say soup cans. Or, of watches created by shaking parts in drums, or of recognisable English text strings of at least 72 characters being created through random text generation . . . which last is a simple case that is WLOG, as the infographic points out. As, 3D functional arrangements can be reduced to code strings, per AutoCAD etc.)

Finally, when the material issue is sampling, we do not need to generate grand probability calculations.

The proverbial needle in the haystack

For, once we are reasonably confident that we are looking at deeply isolated zones in a field of possibilities, it is simple to show that unless a “search” is so “biased” as to be decidedly not random and decidedly not blind, only a blind sample on a scope sufficient to make it reasonably likely to catch zones T in the field W would be a plausible blind chance + mechanical necessity causal account.

But, 500 – 1,000 bits (a rather conservative threshold relative to what we see in just the genomes of life forms) of FSCO/I is (as the infographic shows) far more than enough to demolish that hope. For 500 bits, one can see that to give every one of the 10^57 atoms of our solar system a tray of 500 H/T coins tossed and inspected every 10^-14 s — a fast ionic reaction rate — would sample as one straw to a cubical haystack 1,000 LY across, about as thick as our galaxy’s central bulge. If such a haystack were superposed on our galactic neighbourhood and we were to take a blind, reasonably random one-straw sized sample it would with maximum likelihood be straw.
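The sampling comparison for the 500-bit case can be sketched numerically. The atom count and toss rate are the figures given in this paragraph; the 10^17 s duration is borrowed from the earlier remark about the observed cosmos and is an assumption here. The sketch only compares the number of samples with the number of configurations; it does not attempt to check the haystack analogy itself.

```python
import math

ATOMS = 10**57        # atoms in the solar system (figure from the text)
RATE_PER_S = 10**14   # one 500-coin observation per atom every 10^-14 s
DURATION_S = 10**17   # assumed time available, in seconds
BITS = 500

samples = ATOMS * RATE_PER_S * DURATION_S   # total observations made
configs = 2**BITS                           # possible 500-bit configurations

# Work in log10 so the huge numbers stay readable.
log10_samples = math.log10(samples)
log10_configs = BITS * math.log10(2)
log10_fraction = log10_samples - log10_configs

print(f"samples  ~ 10^{log10_samples:.0f}")
print(f"configs  ~ 10^{log10_configs:.1f}")
print(f"fraction ~ 10^{log10_fraction:.1f}")
```

Changing BITS to 1000 and the atom count to 10^80 gives the corresponding comparison for the observed cosmos.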

As in, empirically impossible, or if you insist, all but impossible.

 

It seems that objectors to design inferences on FSCO/I have been reduced to clutching at straws. END

253 Replies to “On FSCO/I vs. Needles and Haystacks (as well as elephants in rooms)”

  1. 1
    kairosfocus says:

    A FTR/FYI for RTH and other denizens at TSZ.

  2. 2
    kairosfocus says:

    PS: RTH & AF et al: if you don’t like chirping cricket metaphors, then the OP above shows that elephant in room ones will do . . . but not in the way you hoped. And no, tagging the pivotal issue as a “bignum” argument then using a strawman tactic dismissal will not do. And in case P May is around, a log reduction as above that shows the way the 2005 metric turns into an info beyond a threshold of complexity metric is NOT a probability argument . . . a blunder his Mathgrrl sock puppet made that gave away the game he played at UD. (Please, answer to the merits of the matter, and as I don’t generally hang about at TSZ, if you choose to respond there then let us know. The other objector sites are so bad that reasonable people will only go there under protest.)

  3. 3
    jerry says:

    KF,

    I know of only one serious line from the naturalistic side that disputes the big number argument. That is the work of Juergen Brosius and his colleagues. He is a prolific publisher of research claiming new proteins arise all the time through various mutation processes. He was a colleague of Stephen Gould and is from Munich.

    As far as he is concerned, macro evolution is a done deal. Allen MacNeill pointed to his research as a basis for his claims that the engines of variation were adequate to explain all macro evolutionary change. I believe Larry Moran invoked him too.

    My guess is that he too will fail to overcome the big number problem but he acts as if he has.

  4. 4
    kairosfocus says:

    Jerry, Amino Acid sequence space is actually one of the strongest points showing islands of function, with protein fold domains deeply isolated and something like half of the domains (is it?) being very sparse, without nearby antecedents, so there is not a stepping stones model. And, we have not got to the even more thorny issue of regulation and the like. KF

  5. 5
    jerry says:

    I suggest that Behe, Axe etc read Brosius’ research. He acts like it is no big deal and he is one of the major players in evolutionary biology.

    If you have access to a university library that has a good electronic journal list a lot of his papers are available as PDFs. You can also get a lot of them off his personal site. He runs a big research program at a university in Bavaria.

    I am on my iPad at the moment and will have to look at my computer for his website.

  6. 6
    Mung says:

    hi kf,

    I’d like to suggest a slight modification to what you write regarding Shannon and Hartley showing us how to measure information.

    “we note that information theory researchers Shannon and Hartley showed us how to measure information [given a probability distribution] by changing probability into a log measure that allows pieces [the units] of information to add up naturally:”

    For purposes of measuring information Shannon’s metric has only limited applicability, so it’s an important distinction to make, imo. Otoh, it can be applied to any probability distribution, which accounts for its usefulness.

    As an aside, it turns out that this is just what we have in statistical thermodynamics, a special subset of probability distributions, and it turns out that the entropy is just Shannon’s measure of information applied to this further subset of probability distributions, providing a meaning of entropy.

    (Information (Shannon Measure of Information (Entropy)))

    regards

  7. 7
    gpuccio says:

    jerry:

    I have looked on Pubmed, and there are a lot of papers with that name. Almost all the most recent deal with transposons and their role in evolution, and with RNA genes.

    Could you please point to some specific paper that, in your opinion, deals with the probabilistic analysis of the emergence of new genes?

  8. 8
  9. 9
    jerry says:

    KF and others. Here is Brosius’ web page and some comments from it.

    http://zmbe.uni-muenster.de/in.....ain_de.htm

    Interests:

    1) Our primary focus is on non-protein coding RNAs (npcRNAs) with emphasis on those preferentially expressed in the brain, ranging from discovery to function. This includes gene deletions (the epigenetically regulated MBII-85 RNA cluster) leading to mouse models of Prader-Willi-Syndrome (a neurodevelopmental disease), as well as certain forms of epilepsy that are mediated by dysregulation of protein biosynthesis near synapses and hyperexcitability due to absence of BC1 RNA. Wherever possible, we use transgenic mouse models (“in vivo veritas”) in diverse studies, such as the regulation of neuronal expression of npcRNAs, sub-cellular transport of RNAs into neuronal processes and functional compensation of gene-deleted animal models.

    2) From such simple beginnings of a protocell consisting of a small number of RNA molecules, we are still able to learn lessons on how modern genomes and genes evolve(d). The conversion of RNA to DNA via retroposition remains a major factor in providing raw material for future (nondirected) evolvability. Much of this process generates superfluous DNA. Yet, occasionally new modules of existing genes can be, by chance exapted (recruited) from such previously non-functional sequences.

    More recently, culminating in the “sales”-effort of the ENCODE consortium, the pendulum has almost swung towards the opposite extreme. This falsely implies that almost 80% of the human genome and, in analogy, that almost every chunk of transcribed RNA is functional (see cartoon). For an excellent assessment of this recent excess see Graur et al. (2013) and for the noisy transcriptome see this paper.

    We also use retroposed elements to infer phylogenetic relationships in vertebrates, chiefly in mammals and birds.

    Also, the academic content that we present in our teaching relies heavily on evolutionary thought. Apart from a better understanding of how cells and organisms work, it provides us with valuable tools to understand the evolution of genomes and genes. For medical students, it addresses the question “why we get sick” in a fundamental way. Finally, many bioethical questions posed by our growing capabilities in medical technology, such as gene therapy, assisted reproduction etc. are rendered more tangible “in the light of evolution”.

    Innovative analogies and radical thinking should free students from the restrictions and chains of much of their previous scholastic education. Likewise, evolutionary thought is potentially decisive in complicated patent disputes in the areas of life and biomedical sciences including biotechnology and RNA biology.

    His CV and list of publications are:

    http://zmbe.uni-muenster.de/in.....AllPub.pdf

    The article in Vrba’s book on macro evolution is here:

    http://www.bioone.org/doi/abs/.....2.0.CO%3B2

    Here is a comment I made about this a few months ago:

    http://www.uncommondescent.com.....ent-498336

  10. 10
    kairosfocus says:

    Mung:

    I hear you, I am using the basic “standard” metric, which allows measures for info to add naturally, using properties of logs. Recall, we are going to move from info-carrying capacity to functional-complexity metrics by imposing conditions.

    When it comes to the Shannon metric, that is actually avg info per symbol, hence the weighted sum metric, – Sum of pi log pi.
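A small sketch may help separate the two quantities in play here: the information of a single outcome, I = – log2(p), and the average information per symbol, which is the weighted sum with the conventional leading minus sign. The function names and the example probabilities are illustrative assumptions, not taken from the thread.

```python
import math

def info_bits(p):
    """Information of one outcome with probability p: I = -log2(p)."""
    return -math.log2(p)

def average_info(probs):
    """Average information per symbol: H = -sum(p_i * log2(p_i))."""
    return sum(p * info_bits(p) for p in probs if p > 0)

fair_coin = average_info([0.5, 0.5])   # equiprobable symbols
biased = average_info([0.9, 0.1])      # a skewed source carries less on average
two_tosses = info_bits(0.5 * 0.5)      # independent outcomes: the bits add up

print("fair coin, bits per symbol:", fair_coin)
print("biased source, bits per symbol:", round(biased, 3))
print("two fair tosses, bits:", two_tosses)
```

The last line is the "adds naturally" property: the information of two independent tosses equals the sum of the information of each.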

    And yes, entropy, in being a metric of degree of microscopic freedom [or, lack of specificity] consistent with a given macro state, is a measure of missing info that would specify the microstate.

    But then, that becomes a subject of controversy as that is not the usual approach. Just as, Garrison’s interesting look at macroeconomics from an Austrian perspective based on Hayek’s investment triangle is illuminating but controversial. I find it is a tool that gives one perspective and highlights one cluster of concerns that happen to be relevant.

    I don’t claim it captures the whole story.

    But it does catch a useful aspect.

    KF

  11. 11
    kairosfocus says:

    Jerry, more stuff for the get around to read pile . . . silly season here, three weeks to go. KF

  12. 12
    Dionisio says:

    check this out – don’t miss the insightful comments by Silver Asiatic..

    http://www.uncommondescent.com.....ent-511598

  13. 13
    gpuccio says:

    jerry:

    I will try to understand better the points you refer to. But, frankly, until now I cannot see any novelty in the few things I have read.

    I absolutely agree about the importance and roles of transposons and non coding sequences. I have argued many times that transposons are an important “engine (tool) of design”.

    But, if transposons, or any other thing, are interpreted as random “engines of variation”, I really can’t see how they can help solve the problem of complex functional information. They can’t.

    And I must confess that I have some difficulty coping with definitions like the following:

    “Here we designate as a “nuon” any stretch of nucleic acid sequence that may be identifiable by any criterion.”

    However, if you have some passages that start to suggest how nuons or any other concept can help solve the problem of big numbers in complex functional information, please be kind and point directly to them.

  14. 14
    roding says:

    I sense FSCO/I is an important concept and Kairosfocus’ work seems comprehensive, but I have to confess I don’t understand much of it. I don’t have any math ability beyond high school and am not familiar at all with information theory.

    KF – have you considered trying to write a simpler description of FSCO/I for the layperson? I don’t know if using analogies would help, but there must be a way for us mere mortals to understand this without needing an advanced degree! Thanks.

  15. 15
    Mung says:

    hi roding,

    kf maintains a website as well. check out the linked.

    http://iose-gen.blogspot.com/2.....mmary.html

    That said, I sympathize. I think first we need a basic course in probabilities and then one in statistics with a follow-on in sampling theory. 🙂

    Information theory is rather tangential, imo, as it is really a misnomer for communication theory so I would only go down that path far enough to understand what Shannon’s theory was about and what it was not about.

    cheers

  16. 16
    Dionisio says:

    roding @ 14

    you brought up a very good point.

    I think Kairosfocus has done a tremendous work trying to explain the referred subject, but it’s not an easy thing to do.

    What KF and GP explain in their OPs and comments in rather general terms, could be applied to the cell fate determination mechanisms in the first few weeks of human development. Basically, think that we all started from a cell known as zygote which is the product of a very complex process called “conception” (I’ll skip that event, in order to simplify the discussion, but perhaps will have to refer back to it later).
    Words like choreography and orchestration appear often in biology literature these days. Perhaps those two concepts could serve as analogy for describing biological processes to the rest of us? In either case, we would still deal with reductionist approaches to describe complex systems to commoners like me. But perhaps the scientists could benefit from trying to explain what they know in simpler terms? Just thinking out loud.

  17. 17
    kairosfocus says:

    R:

    I hear you, unfortunately when one deals with the sort of hyperskepticism and tendency to pounce we face, there is a need for something of adequate technical level . . . even, when that is at introductory College level.

    Consider the text of this post or your own.

    To be text in English, we have letters, spaces, punctuation marks, all coded for in on/off, high/low digital signals. Something like 100 1001. The common code is ASCII, as you may remember from IT class.

    Text like that is different from random gibberish:

    yudkngfsitirsgdoiha64tvxtiu . . .

    or a fixed repeating sequence:

    DSDSDSDSDSDSDSDSDSDSDSDSDSDSDS . . .

    Computer code for a program is similar, telling the PC to carry out step by step actions that in effect are recipes. These can be seen as being like beads on a string:

    -*-*-*-*-* . . .

    They are called strings.
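The on/off coding mentioned above can be made concrete with a few lines. This is a minimal sketch assuming plain 7-bit ASCII; the function name to_ascii_bits is an illustrative choice.

```python
def to_ascii_bits(text):
    """Encode text as 7-bit ASCII, one group of bits per character."""
    # encode("ascii") raises UnicodeEncodeError for non-ASCII characters.
    return [format(byte, "07b") for byte in text.encode("ascii")]

letter_i = to_ascii_bits("I")[0]   # the capital letter I as seven bits
word_sit = to_ascii_bits("sit")    # a short word as a string of bit groups
bits_in_72_chars = 72 * 7          # bits in a 72-character ASCII string

print("I   ->", letter_i)
print("sit ->", " ".join(word_sit))
print("72 characters ->", bits_in_72_chars, "bits")
```

The capital letter I comes out as 1001001, which matches the "100 1001" pattern quoted earlier, and 72 characters at 7 bits each come to 504 bits, just past the 500-bit threshold used in the original post.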

    Now, short words or phrases could be produced at random, e.g. “sit” and “do” come up.

    But when we go to strings of 72 or more ASCII characters, all the atoms in our solar system, flipping 500 coins a hundred million million times every second, could only sample the equivalent of a straw to a cubical haystack as thick as our galaxy. That is a distance so long that light would take 1,000 years to traverse it. To cross the stack, a beam of light would have to have been travelling since about the time that William the Conqueror invaded England.

    Light takes less than 2 s to travel from the Moon to Earth.

    In short the atomic resources of our whole solar system, running at a generously fast rate, would not be able to sample more than one straw to a haystack as big as that. Even if the stars in our neighbourhood (the ones we see in our sky) were in such a stack, predictably a blind sample from such a haystack would get hay and nothing else.

    That is the needle in haystack “big number” challenge that is at stake.

    There is only one observed source for such text — design. By an intelligence.

    Now, this is not just a toy example. If you look at an AutoCAD drawing file, it is much the same as the text strings. Similarly, DNA has coded strings that instruct molecular machines in the cell to make protein molecules step by step, as is shown in the original post. These machines, Ribosomes, are what we call numerically controlled machines and the mRNA made from the DNA is a control tape.

    If we saw such machines, in a factory, we would not even pause to suggest they came about by blind chance or mechanical necessity similar to tossing dice or having a heavy object fall when you drop it.

    But, when it comes to the cell — which also is able to replicate itself — we are dealing with something that long predates factory automation or even our own existence.

    How did it come to be?

    Can this be reasonably taken as the result of blind chance and mechanical necessity in the warm little pond envisioned by Darwin, or some similar environment?

    Has this or even the origin of FSCO/I strings been observed happening without design?

    Given the sort of search challenge above, is it reasonable to conclude that this did or must have happened by chance and necessity?

    To answer, consider why it is that the only observed source of FSCO/I is design.

    KF

    PS: This introduction from some years ago and this part 2, may help.

  18. 18
    jerry says:

    Gpuccio,

    I posted several things he has written on my comment from three months ago. See link above. Here is one that sums up what he claims,

    The evolution of new genes can ensue through either gene duplication and the neofunctionalization of one of the copies or the formation of a de novo gene from hitherto nonfunctional, neutrally evolving intergenic or intronic genomic sequences. Only very rarely are entire genes created de novo. Mostly, nonfunctional sequences are coopted as novel parts of existing genes, such as in the process of exonization whereby introns become exons through changes in splicing. Here, we report a case in which a novel nonprotein coding RNA evolved by intron-sequence recruitment into its structure. cDNAs derived from rat brain small RNAs, revealed a novel small nucleolar RNA (snoRNA) originating from one of the Snord115 copies in the rat Prader–Willi syndrome locus. We suggest that a single-point substitution in the Snord115 region led to the expression of a longer snoRNA variant, designated as L-Snord115. Cell culture and footprinting experiments confirmed that a single nucleotide substitution at Snord115 position 67 destabilized the kink-turn motif within the canonical snoRNA, while distal intronic sequences provided an alternate D-box region. The exapted sequence displays putative base pairing to 28S rRNA and mRNA targets.

    Essentially what he is saying is that the zillions of genomic shufflings that go on occasionally produce a genomic sequence that when translated provides a useable protein, and the organism then exapts it for some use. The key idea is exaptation.

    I disagree, but he helps run one of the biggest medical research labs on the planet researching this process. My guess is that they find a lot of examples but not enough to account for more than a very small percentage of functional proteins.

  19. 19
    kairosfocus says:

    Jerry, search space challenge in a context where FSCO/I exhibits islands of function in a much larger sea of non-function. An explanation that may account for hill climbing does not account for blind island finding in a vast sea with very limited resources relative to the scale of the sea. KF

  20. 20
    Mung says:

    hey kf, fyi, those links in @ 17 return 404 Not Found.

  21. 21
    Jerad says:

    Could you please explain how you calculate P(T|H)? Thanks!!

  22. 22
    gpuccio says:

    jerry:

    It’s very simple. I have no doubts that they find what they find. I have all the possible doubts that what they find can be explained without design.

    There is no analysis of numbers or probabilities there. You can throw in words like “zillions”, but it would be better to count what can happen and what really happens.

    I have no doubts that new genes appear. I have no doubts that non coding regions are used to make new genes. I have no doubts that transposons have an important role. So, I am perfectly comfortable with all those findings.

    I am perfectly sure that a quantitative analysis of what happens can prove that new dFSCI is constantly generated in those processes: IOWs, they are designed processes.

  23. 23
    kairosfocus says:

    Mung, Thanks. Dead, I guess. I will have to come up with a way to put up or link Dan Peterson’s classic pair of articles. Later, got silly season to attend to after spending time on hyperskeptical dismissals of inductive logic. KF

  24. 24
    kairosfocus says:

    F/N: Links:

    First link, courtesy wayback machine.

    Second one.

    KF

  25. 25
    kairosfocus says:

    R, was I helpful? Where do you need more? KF

  26. 26
    kairosfocus says:

    F/N: Just struck me as a parallel, the frequency of visible light is about 4 – 8 * 10^14 Hz, cycles per second. We are talking here about flipping and reading 500 coins every four cycles of red light, or in the ball park of as fast as light vibrates. KF

    PS: Yes, there are good reasons why light frequencies and fast reaction rates would be of fairly similar order of magnitude having to do with involved energies on order of a few to several eV, and responses of orbiting electrons in atoms. (And thinking in terms of eV is convenient never mind it is a dated unit.)
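The "four cycles of red light" figure can be checked in a couple of lines. The frequencies and the 10^-14 s interval are the ones quoted in this comment and in the original post; the variable names are illustrative.

```python
RED_LIGHT_HZ = 4e14       # low-frequency end of the visible range quoted above
VIOLET_LIGHT_HZ = 8e14    # high-frequency end of the visible range quoted above
FLIP_INTERVAL_S = 1e-14   # one flip-and-read of the 500 coins every 10^-14 s

# Number of light cycles that elapse during one flip-and-read interval.
cycles_red = RED_LIGHT_HZ * FLIP_INTERVAL_S
cycles_violet = VIOLET_LIGHT_HZ * FLIP_INTERVAL_S

print(f"red light cycles per flip:    {cycles_red:.1f}")
print(f"violet light cycles per flip: {cycles_violet:.1f}")
```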

  27. 27
    jerry says:

    Gpuccio,

    I am only providing what some evolutionary biologists are saying about how evolution is a naturalistic process and not one that requires periodic infusion of information to succeed. It contradicts Meyer’s thesis.

    I am saying that Behe, Axe, Durston and Meyer have to address it. So far no one has. I am skeptical that there is enough reshuffling of the genome to produce what has happened but there has to be a focused analysis on it.

    Eventually, Brosius’s thesis will be confirmed or falsified by the analysis of genomes. Both the Darwinian approach and Brosius’s form of punctuated equilibrium are testable.

  28. 28
    Eric Anderson says:

    kf:

    Thanks for the useful post, as always.

    One minor quibble just in terms of terminology: You mention the ability to “measure information”. I’m not sure that anyone has ever done that, or indeed, that it is possible.

    Information carrying capacity of a medium? Yes, that can be measured.

    Can information — the substantive, functional, meaningful aspects of what really makes up information — be objectively measured in a precise mathematical way? I’ve never seen it and personally I doubt it.

  29. 29
    gpuccio says:

    Eric:

    Very simply, I would say that we can measure not only the information carrying capacity of a medium, but also, and especially, the minimum information carrying capacity that is necessary to implement a function, and the ratio between that and the total information carrying capacity of the medium we are observing, and that implements the function.

    That’s just another way of describing dFSCI, which is a definite measure.

  30. 30
    roding says:

    KF, yes your description #17 definitely helps, thanks for taking the time to do this.

    What I don’t fully understand yet though is, how do you determine what is being “searched” for? For example, how do you calculate the odds of a cell being formed and what assumptions would you make. Would you assume basic building blocks are in place first (e.g., amino acids), or is the calculation done from scratch. Not sure if that makes any sense, but hopefully you’ll get my drift.

  31. 31
    rich says:

    It seems ID’s information measuring claims have retracted. It used to be look at a thing, measure specified information, probabilistically infer design. Has ID given up on measuring the SI of a thing? It appears to me KF has.

  32. 32
    kairosfocus says:

    R: in the first instance, we have chemicals of relatively low complexity in a pond, the ocean, a comet core or whatever; perhaps as one sees from a Miller-Urey type exercise. The hope is that — without direction or purpose (this is blind, a random walk in the space of possible configs) — they form self-replicating then metabolising and self-replicating entities. Somewhere along the line, codes enter and encapsulation with gating, giving us cell based life. Long on speculative hopes, short on empirical evidence. Beyond, it is hoped they go on to form diverse body plans. At each stage there are numerous blind random walk searches as outlined to find a first functional configuration. Search, of course is metaphorical, even as natural “selection” is. At each stage the only empirically warranted solution to the FSCO/I generation challenge, is design. Which is strictly verboten! KF

  33. 33
    Eric Anderson says:

    gpuccio @28:

    Very simply, I would say that we can measure not only the information carrying capacity of a medium, but also, and especially, the minimum information carrying capacity that is necessary to implement a function, and the rate between that and the total information carrying capacity of the medium we are observing, and that implements the function.

    Once the system is set up, the minimum amount of information to carry out a function — any function — is a single bit. We can use a single switch to initiate any function or cascade of functions.

    Now setting up the system in the first place is a different issue, to be sure, and a great deal of information is required to set up a complex functional system. Quantifying that information, however, is another matter.

    It is relatively easy in the case where our “function” is simply the transmission of a particular message. In that case, if we know the relevant parameters (potential characters/alphabet, the specific code, what is required to encode the message, etc.) then we can calculate the minimum carrying capacity required to carry that message. And, if we treat the transmission of the message as our “function” in this case, we can then collapse the analysis and equate that quantity to the amount of information required to perform our function (meaning, to convey the message).

    I don’t dispute that in these kinds of very narrow situations — particularly in which we define the very conveyance itself as our desired “function” — that we can come up with a measurement that equals or at least approximates the amount of “information” required to convey the message.

    It is much less clear, however, that we can measure the amount of information actually contained in that message.

    Let’s set aside for a moment the larger challenge of measuring the amount of information in something instantiated in three-dimensional space in the real world. Even in the case of the written or spoken word — say, a poem — can we really mathematically and precisely measure the amount of information in the poem? Sure, we can measure the amount of carrying capacity required to transmit the message of the poem in a particular language across a particular medium. But that is a purely statistical measure. It ignores all the other aspects of the information contained in that poem: things like intent, purpose, meaning, pragmatics, and so on — the things that give the real heart and soul and substance to the information in question. Indeed, the very things that we typically think of when we talk about “information” informing us in some way.

    To say that those things don’t matter would be to commit the exact same fallacy that so many anti-ID people commit when they argue that the so-called “Shannon information” is all that needs to be dealt with. Generate that and you’re done, they say. We would be right back to the classic examples of a meaningful sentence versus a meaningless string of characters.

    So, yes, we can measure carrying capacity all day. We can generate bits and bytes and look at Shannon calculations until we are blue in the face. But through such means we will never arrive at a measurement of the real substantive information contained therein. That information — the part that really matters — is better understood in terms of intent and purpose and goals and meaning, than in terms of bits and bytes.

  34. 34
    Eric Anderson says:

    roding @29:

    What I don’t fully understand yet though is, how do you determine what is being “searched” for? For example, how do you calculate the odds of a cell being formed and what assumptions would you make. Would you assume basic building blocks are in place first (e.g., amino acids), or is the calculation done from scratch. Not sure if that makes any sense, but hopefully you’ll get my drift.

    This is a good question. kf will provide a solid answer, but perhaps I can offer a couple of thoughts if I may.

    There are myriad ways to run a sample calculation on the odds of something like a first functional cell coming about by purely naturalistic means. Typically, those calculations make all kinds of concessions to the naturalistic scenario to allow the maximum likelihood of succeeding. For example, such calculations often include all the particles in the known universe, reacting at an incredibly fast rate, ignoring interfering reactions, ignoring breakdown of nascent structures, and on and on. What the calculations inevitably show is that the resources of the entire known universe could not possibly give us any realistic chance of even basic cellular structures arising on their own, much less an entire functional cell.

    I need to emphasize that the calculations often include just raw odds of the basic building blocks coming together in the right configuration. This is by far the most favorable approach to a naturalistic scenario. Other calculations try to capture things like chirality, etc. Those calculations end up being even more stringent.

    A while back on UD, I wrote:

    I’m willing to grant you all the amino acids you want. I’ll even give them all to you in a non-racemic mixture. You want them all left-handed? No problem. I’ll also grant you the exact relative mixture of the specific amino acids you want (what percentage do you want of glycine, alanine, arginine, etc.?). I’ll further give you just the right concentration to encourage optimum reaction. I’m also willing to give you the most benign and hospitable environment you can possibly imagine for your fledgling structures to form (take your pick of the popular ideas: volcanic vents, hydrothermal pools, mud globules, tide pools, deep sea hydrothermal vents, cometary clouds in space . . . whichever environment you want). I’ll even throw in whatever type of energy source you want in true Goldilocks fashion: just the right amount to facilitate the chemical reactions; not too much to destroy the nascent formations. I’ll further spot you that all these critical conditions occur in the same location spatially. And at the same time temporally. Shoot, as a massive bonus I’ll even step in to prevent contaminating cross reactions. I’ll also miraculously make your fledgling chemical structures immune from their natural rate of breakdown so you can keep them around as long as you want.

    Every single one of the foregoing items represents a huge challenge and a significant open question to the formation of life, but I’m willing to grant them all.

    Now, with all these concessions, what do you think the next step is?

    Go ahead, what is your theory about how life forms?

    The above deals just with the simple, pure odds of the constituent components coming together in the right order. But as mentioned, it ignores the other numerous, and so far as we know, insurmountable problems with a naturalistic origins scenario.

    —–

    What is the materialistic response to this most basic and obvious problem with a naturalistic origin of life scenario? Largely the response is threefold:

    1. There must be some other natural law we haven’t discovered yet that will overcome the problem. We just need to keep looking. [This is not only unlikely; it is a logical impossibility. But this is a topic for another time.]

    2. The calculations proposed were too stringent. It is much easier for life to form, so not a problem. [This is exactly the opposite of reality. As mentioned, the calculations tend to give every possible advantage to a naturalistic scenario, and it still comes up as woefully inadequate.]

    3. Given that we don’t know exactly, with complete precision what the various parameters and odds are, we cannot do any calculation at all and cannot reach any preliminary assessment or conclusion. [This is a silly, even juvenile, objection, but it is a favorite of, for example, Elizabeth Liddle on these pages. It is essentially a confession of ignorance about how life could have come about through naturalistic means, coupled with a refusal to grapple with the obvious implications by falling back on a hyper-technical demand for precision that is rarely, if ever applied to any other problem by anyone in any other area of science.]

  35. 35
    kairosfocus says:

    Rich:

    Instead of making ill-founded assertions on retractions (where, by whom, when) kindly respond to the specifics of the OP, including that the common files we handle every day are in fact measured, quantified instances of functionally specific complex organisation and associated information.

    If you doubt me on functional specificity, do a little experiment.

    Call up Word, and make an EMPTY file, then save under a convenient name. Go in with a file opening utility [notice the apparently meaningless, repetitive character strings?], and clip a character or two at random. Save again. Try to click-open the file. With rather high probability, it will report as corrupt.

    Functional specificity.

    That is, in the expression above, S = 1, it has been shifted from the default 0 based on objective evidence.

    Empirically, objectively confirmed.

    Then, too, look at the file size, maybe 1 kByte.

    Complexity measured in bits.

    Compare:

    Chi_500 = I*S – 500

    Chi_500 (Word File X) = (1 * 8 * 1024) – 500 = 7,692 functionally specific bits beyond the solar system threshold.
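
    The arithmetic in the expression can be checked with a minimal sketch. The function name `chi_500` and its arguments are illustrative only, and I is simply taken to be the raw file size in bits, as in the Word file example.

```python
def chi_500(capacity_bits: float, specific: bool) -> float:
    """Chi_500 = I*S - 500, where S = 1 if functionally specific, else 0."""
    s = 1 if specific else 0
    return capacity_bits * s - 500


# A 1 kByte file at 8 bits per byte (assumed size, as in the example).
word_file_bits = 8 * 1024

print(chi_500(word_file_bits, specific=True))   # bits beyond the 500-bit threshold
print(chi_500(word_file_bits, specific=False))  # S = 0 locks the result at -500
```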

    Inference on the FSCO/I as sign principle: designed.

    Observation: designed.

    As expected.

    It is time to drop the ill-advised gotcha snip and snipe rhetorical games, Rich.

    KF

  36. 36
    kairosfocus says:

    PS: Further to this, examine genomes, noting the algorithmic content that regulates and effects protein assembly using Ribosomes with mRNA as control tape. Also, take time to observe the commonplace functionality-destroying effects of ever so many mutations. Then, multiply by the requisites of folding and function [note on Sickle Cell trait and anaemia as a case in point], with the phenomenon of thousands of fold domains with a good fraction of these being of one or a few proteins. Where, in amino acid sequence space, we see deep isolation of such domains so there is no easy stepping stones path across the space in light of available planetary or solar system atomic resources and time. In short, deeply isolated islands of specific, complex function based on coded DNA information executed on nanotech numerically controlled machines. Now, give us the observationally grounded a priori evolutionary materialist account that does not beg big questions. NC machines FYI are routinely produced by design, as are algorithms and associated digitally coded functionally specific complex information. In our observation, ONLY by design. And, on the search challenge analysis above, for excellent reason.

  37. 37
    kairosfocus says:

    R: Have you done a course in school where they discussed origin of life, etc? Does Miller-Urey ring a bell? If not, look up and read in say Wikipedia — never mind the agendas in that notorious site on topics like this, that’s what you will see in school. Similarly, have you done basic probability, such as why the odds of tossing a 7 total with two dice is 6/36? Those will affect what would be needed to reasonably further answer. KF

  38. 38
    kairosfocus says:

    Rich: I should add that even in Shannon’s original paper, they used more than one way to get at information carrying capacity values. A direct count of state possibilities is well known and common; indeed, that is what lies behind statements such as “my memory has 4 gigabytes” or “the file size is 768 kB.” Context allows us to assess functional specificity, and we can convert that to a threshold metric as shown. So, the above is quite legitimate. It is ALSO in order to show that blind samples from a population need to be sufficiently large to become likely to capture rare phenomena, and that one need not do elaborate probability calculations to do that. That is what the needle in haystack analysis is about. And, enough has been noted above and onwards for a reasonable person to see why FSCO/I will naturally result in that sort of rarity. RTH’s talking point quip on needle-stacks is silly rhetoric, not a serious objection. If you doubt me, shake up a shoebox full of fishing reel parts for a while (make sure it’s a cheapo; I have too much respect for the integrity of decent engineering and good workmanship to advise otherwise . . . ) and see if something functional results. Predictably not, for precisely the needle in haystack reasons given. KF

  39. 39
    Gordon Davisson says:

    Hi, KF. I disagree about whether there’s an elephant in the room — actually, in this particular case, I think there are two of them. I think comparing Dembski’s CSI metric with Durston et al’s Fits metric is an excellent way to illustrate the problems, so I want to concentrate on the differences between them. But first, I have to point out a couple of technical problems with your derivation of the S metric:

    Chi = – log2(2^398 * D2 * p), in bits, and where also D2 = ?S(T)

    Chi = I_p – (398 + K_2), where now: log2 (D2 ) = K_2

    That is, chi is a metric of bits from a zone of interest, beyond a threshold of “sufficient complexity to not plausibly be the result of chance,” (398 + K2). So,

    (a) since (398 + K_2) tends to at most 500 bits on the gamut of our solar system [[our practical universe, for chemical interactions! ( . . . if you want , 1,000 bits would be a limit for the observable cosmos)] […]

    I think you’ve lost track of the meaning of ?S(T) (= D2 = 2^(K_2) ) here. In Dembski’s formulation, ?S(T) is a measure of the descriptive complexity of the specification, that is, it’s a measure of how verbose you have to be to describe the specification. It has nothing to do with the solar system vs. universe (that’s actually what the 398 bits relate to, and they’re already taking the entire universe into account), it has to do with the difference between “bidirectional rotary motor-driven propeller” and “bidirectional rotary motor-driven propeller consisting of […detailed description here…]”. Assuming 398 + K_2 =< 500 bits corresponds to assuming K_2 =< 102 bits, which is not quite 13 bytes, or (using Dembski's assumption of a library of 10^5 "basic concepts") a little over 6 concepts. Using Dembski's "basic concept" approach, "bidirectional rotary motor-driven propeller" (4 concepts) would fit easily in this limit, but anything much longer wouldn't.
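
    The limit on the specification length can be reproduced in a few lines. This is only a sketch of the arithmetic; the variable names are made up for illustration, and the 10^5-concept library is Dembski's assumption as cited, not something derived here.

```python
import math

THRESHOLD_BITS = 500        # the solar-system threshold used in the derivation
UNIVERSE_BITS = 398         # the 398-bit term; log2(10**120) is about 398.6
CONCEPT_LIBRARY = 10**5     # Dembski's assumed library of "basic concepts"

# Room left for K_2 if 398 + K_2 must not exceed 500 bits.
k2_max_bits = THRESHOLD_BITS - UNIVERSE_BITS

# Each concept picked from the library costs log2(10**5) bits to specify.
bits_per_concept = math.log2(CONCEPT_LIBRARY)
max_concepts = k2_max_bits / bits_per_concept

print(f"K_2 limit: {k2_max_bits} bits = {k2_max_bits / 8} bytes")
print(f"about {max_concepts:.2f} basic concepts")
```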

    You may be tempted to write this off as a quibble, but I disagree. I think one of the problems people run into when applying CSI is being sloppy about exactly what specification they're using in the analysis. Don't be sloppy; be clear: specify your specifications!

    (b) as we can define and introduce a dummy variable for specificity, S, where

    (c) S = 1 or 0 according as the observed configuration, E, is on objective analysis specific to a narrow and independently describable zone of interest, T:

    Chi = Ip*S – 500, in bits beyond a “complex enough” threshold

    NB: If S = 0, this locks us at Chi = – 500; and, if Ip is less than 500 bits, Chi will be negative even if S is positive.

    E.g.: a string of 501 coins tossed at random will have S = 0, but if the coins are arranged to spell out a message in English using the ASCII code [[notice independent specification of a narrow zone of possible configurations, T], Chi will — unsurprisingly — be positive.

    In this example, you’re under the specification complexity limit (3 concepts: “message”, “ASCII”, “English”), but you’re ignoring another important factor: the p in question isn’t the probability of that particular event, it’s the probability of any event meeting the specification. In this case, those 500 coin flips correspond to 62.5 ASCII characters, and since English is generally estimated to have an entropy of around 1 bit per letter, there are probably about 2^62 meaningful English messages of that length. So p ~= 2^62 / 2^501 = 2^-439, which is only about 439 bits, and you haven’t met the 500-bit threshold.
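
    A rough sketch of that estimate, working in log2 so that no huge numbers are needed. The 1 bit per letter figure is the rough entropy estimate quoted above, and the variable names are illustrative only.

```python
import math

n_coins = 501                  # coins, one bit each
bits_per_ascii_char = 8
english_entropy_bits = 1.0     # rough estimate: about 1 bit per letter

# Whole-number log2 of the count of meaningful English messages that fit.
n_chars = n_coins / bits_per_ascii_char
log2_messages = math.floor(n_chars * english_entropy_bits)

# p = (messages meeting the spec) / (all coin configurations), in log2 form.
log2_p = log2_messages - n_coins
info_bits = -log2_p

print(f"{n_chars} characters, about 2^{log2_messages} English messages")
print(f"p is about 2^{log2_p}, i.e. {info_bits} bits")
print(f"shortfall against a 500-bit threshold: {500 - info_bits} bits")
```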

    (There’s also the problem that in Dembski’s formulation, p must be computed under a well-defined chance hypothesis; in your calculation, you’re just assuming complete randomness. I’ll return to this distinction later.)

    Ok, that’s it for the minor technical quibbles; now let me get to my main topic of interest: contrasting CSI and Fits, and showing that the Fits don’t qualify as CSI. There are a number of minor differences between CSI and Fits, but IMO there are only two that really matter: how they approach the Texas sharpshooter fallacy, and how they handle probability calculations. The latter is the one that the EleP(T|H)ant refers to, but let me start with the Texas problem.

    Elephant #1: the Texas sharpshooter fallacy

    For those unfamiliar with it, the Texas sharpshooter fallacy refers to drawing a circle around the tightest cluster of bullet holes, and claiming that was the target you were aiming for. In the original version of Dembski’s CSI, he avoided this problem by insisting that the target (“specification”) be defined independently of the data:

    Specifications are the independently given patterns that are not simply read off information. By contrast, the “bad” patterns will be called fabrications. Fabrications are the post hoc patterns that are simply read off already existing information.

    (From Dembski’s 1997 paper “Intelligent Design as a Theory of Information”.)

    Durston’s approach, on the other hand, consists of … simply reading off the existing information (the range of sequences for a given gene/domain/whatever), and treating that as a specification. Unless I’ve seriously misunderstood something here, Durston’s specifications correspond precisely with what Dembski calls fabrications.

    Now, in later versions of CSI, the independence criterion was replaced by an adjustment for “specificational resources” (the ?S(T) I was discussing earlier). Essentially, the idea is that if you can state the specification tersely, it’s relatively independent, and you get a small adjustment; if you’re just reading it off the data, it’ll be quite long, and you’ll get a huge adjustment (and no CSI). In the case of Durston’s measure, the description of the specification will basically be a list of which amino acids can occur at which positions, which’ll be longer than the sequence in question, and hence give no CSI (well, technically it’ll give negative CSI).

    To back up a little: what’s really needed here is the probability of something specified occurring; that can be loosely broken into three factors:

    – The number of specifications.

    – The number of significantly different ways each of those specifications can be met.

    – The number of minor variants on each of those different ways there are.

    Dembski’s ?S(T) addresses the first of these factors, and requires that the p calculation take the last two factors into account. Durston’s approach gives an estimate of the last of these factors (and winds up rolling the second factor into its incredibly verbose specifications). But in order to reconcile the two, you need a way to isolate and calculate the second factor; and I’ve never seen an approach that even estimates it.

    This is a problem I haven’t really seen addressed in any treatment of the improbability of functionality evolving. It’s simply fallacious to get excited about the improbability of this evolving, when if something else had evolved we’d be looking at the improbability of that evolving instead. Even in Dembski’s work, when he sketches out how to compute the CSI of the bacterial flagellum (see “Specification: The Pattern That Signifies Intelligence”), he treats that basic structure as the only possible way of satisfying the specification “bidirectional rotary motor-driven propeller” — but I don’t think any of us has any idea how many different structures could satisfy that specification, and it’s the sum of their probabilities that matters.

    Elephant #2: probability and probability distributions

    Here, the difference is more obvious and well-known (and is what the “EleP(T|H)ant” pun refers to). Durston’s Fits only correspond to probabilities if all sequences are equally probable. Dembski’s CSI requires that you compute the probability based on each relevant chance hypothesis (and you’ll generally get a different CSI value for each hypothesis).

    Essentially, with Durston’s approach, you compute Fits and then ask if selection can account for it. With Dembski’s, you figure out what selection can do, then compute CSI based on that. By treating Fits as CSI, you’re implicitly assuming that selection doesn’t matter; but you can’t assume that, you need to show that!

    Or to put it another way: Durston’s Fits correspond to log probabilities under the chance hypothesis that the AA sequence is entirely random. But calculating a positive CSI under that hypothesis only refutes that hypothesis; and since the theory of evolution holds that selection (and other factors) make some sequences more likely than others, you haven’t actually refuted evolution, you’ve refuted a strawman.

    When you say:

    … there are many well known ways to obtain the information content of an entity, which automatically addresses the “how do you evaluate p(T|H)” issue.

    I’ve never seen one. If you have one, please give it, but note that any analysis that claims to do this had better include at least these three elements:

    1) An explicit statement of what specification is being used.

    2) An explicit statement of what chance hypothesis is being used.

    3) A calculation of p(T|H) which actually takes into account all ways that the specification can be met, and calculates their probability under the stated chance hypothesis.

    If you haven’t got all three of those, you don’t have a valid calculation of CSI.

  40. 40
    Gordon Davisson says:

    …and apparently I’ve failed at unicode. The “?S(T)”s throughout my previous comment should be phi_S(T).

  41. 41
    rich says:

    Gordon Davisson, I’m impressed with your command of math. would you be willing to try and calculate the SI of an object, as the regulars here don’t seem interested / are unable to?

    I don’t think KFs macro-scale example has any bearing on life, personally.

  42. 42
    rich says:

    >>>

    1) An explicit statement of what specification is being used.

    2) An explicit statement of what chance hypothesis is being used.

    3) A calculation of p(T|H) which actually takes into account all ways that the specification can be met, and calculates their probability under the stated chance hypothesis.

    This seems key. KF, Is it beyond you?

  43. 43
    Dionisio says:

    Y’all have blown my mind with that level of math that is above my pay grade.
    But I would like to understand what you’re discussing here.
    Please, can you use simpler examples to explain the terminology you are using?
    For example, what could be the information associated with the following text?
    039009005009019007024001025010014002030004
    Thank y’all in advance for your help with this.

  44. 44
    Dionisio says:

    #43 follow-up

    What about if the message is larger?

    039009005009019007024001025010014002030004057026005007005005016004005034041006009012037015

    Does the associated information change? Does it mean anything?

    Thank you.

  45. 45
    Gordon Davisson says:

    Gordon Davisson, I’m impressed with your command of math. would you be willing to try and calculate the SI of an object, as the regulars here don’t seem interested / are unable to?

    Thanks! I’d be willing to do some example calculations, but I fear they’d have to be on “toy” examples, rather than the real biological objects we’re actually interested in. The problem with applying this to real biological systems is that the unknowns (basically, my two elephants) are too large.

    For any real biological system, you could use something like Durston’s approach (or his calculations directly, if it was something he’d studied) to get the range of minor variants on that functional system, but I know of no way to estimate the number of completely different ways to solve the same problem. Doing the calculation with just the known solutions would give a lower bound on the probability (i.e. you could say “the probability is at least P, but might be higher”), but what’s really needed is an upper bound on the probability (in order to get a lower bound on CSI).

    …And then there’s the problem of accounting for selection. In order to do that properly, you’d need to know far FAR FAR more about the overall shape of the relevant fitness functions than we actually do, and even if we knew them completely the probability calculation would almost certainly be computationally intractable. Look at how difficult it is to calculate how a given protein will fold: if it takes massive computing power to figure out what will happen to a single protein (acting under fairly well understood physics), how difficult do you think it’d be to simulate an entire population of organisms? The only way to get a result here is going to be to use a hopelessly oversimplified model (which unfortunately means that the results will be inapplicable to reality).

    Now, doing a toy example might still be interesting enough to be worthwhile, just to illustrate the mechanics of how the analysis works. Even something simple, like a poker hand consisting of the 10 through ace of spades, would be enough to illustrate some things that (I think) most people don’t realize about CSI. For example, there are a number of specifications it satisfies (straight, flush, spades flush, straight flush, royal flush, spades royal flush, etc), and a number of chance hypotheses that might be relevant (dealt from a well-shuffled 52-card deck, a deck with one joker, a deck with two jokers, pinochle deck, best hand from a 7-card stud deal, many possible poorly-shuffled decks, etc). The tricky thing here is that you’ll get a different CSI value for each possible combination of specification and chance hypothesis…
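
    As a rough sketch of the first step only, here is how the probability term varies across a few of those specifications and two of those chance hypotheses. It computes just -log2 P(T|H); the Phi_S(T) and 1e120 factors are left out, and treating the joker as an ordinary, non-wild card is an assumption made purely for illustration.

```python
from math import comb, log2

# Number of 5-card hands meeting each specification. The counts are the same
# for both decks because the joker is assumed NOT to be wild.
SPEC_COUNTS = {
    "spades royal flush": 1,
    "royal flush": 4,
    "straight flush (incl. royal)": 40,
    "flush (incl. straight flushes)": 4 * comb(13, 5),
}

# Chance hypotheses: five cards dealt from a well-shuffled deck of this size.
DECK_SIZES = {"52 cards": 52, "52 cards + 1 joker": 53}


def surprisal_bits(hand_count: int, deck_size: int) -> float:
    """Return -log2 P(T|H) for a 5-card deal from a well-shuffled deck."""
    return -log2(hand_count / comb(deck_size, 5))


for deck_name, deck_size in DECK_SIZES.items():
    for spec, count in SPEC_COUNTS.items():
        bits = surprisal_bits(count, deck_size)
        print(f"{deck_name:20s} {spec:32s} {bits:6.2f} bits")
```

    Every (specification, chance hypothesis) pair gives a different number, which is the point being made: the same hand has no single CSI value.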

    Maybe if I get a chance tomorrow, I’ll run some numbers to show what I mean.

  46. 46
    Gordon Davisson says:

    Dionisio:

    Please, can you use simpler examples to explain the terminology you are using?
    For example, what could be the information associated with the following text?
    039009005009019007024001025010014002030004
    Thank y’all in advance for your help with this.

    Let me do a really quick analysis here. First, the most obvious nontrivial specifications I see are based on the fact that there are a lot of “0”s (23 out of 42 total digits). Let me pick just two specs based on that: “23 0’s” and “over half 0’s”. From each of these, we need to compute Phi_S(T) (I’ll use Dembski’s basic-concept-counting system, though I’m not sure it really fits):

    “23 0’s” -> 2 concepts -> Phi_S(“23 0’s”) = (1e5)^2 = 1e10 (that is, “1” followed by 10 “0”s).

    “over half 0’s” -> 3 concepts -> Phi_S(“over half 0’s”) = (1e5)^3 = 1e15

    Second, we need to decide what the relevant chance hypotheses are. To keep this simple, I’ll just use the hypothesis that these are 42 uniform-random digits. Based on that, we can calculate the probability of the two specs under that hypothesis (“P(T|H)” refers to the probability of the specification T under the hypothesis H).

    P( “23 0’s” | “42 uniform-random digits” ) = 6.0e-13

    P( “over half 0’s” | “42 uniform-random digits” ) = 6.9e-12

    Now, to calculate the CSI for each of these specifications, we compute Chi = -log2(1e120 * Phi_S(T) * P(T|H)):

    For “23 0’s”, Chi = -log2(1e120 * 1e10 * 6.0e-13) = -391 bits. Since this is negative, this doesn’t entitle us to rule out this chance hypothesis. (Note: the 1e120 factor here is based on the computational capacity of the entire known universe; in many cases a lower “context-dependent” value could be used, which would raise the CSI value. But you’d have to make it pretty small to make this CSI value positive.)

    For “over half 0’s”, Chi = -log2(1e120 * 1e15 * 6.9e-12) = -411 bits. For this specification, P and Phi were both higher, so the resulting CSI was even lower.

    (Note that it’s possible some other specification might give a higher CSI value than either of the above — in general, the highest across all qualifying specifications is what matters — but in this case even if the specification completely determined the digits, P(T|H) would be at least 1e-42, which will never overcome the factor of 1e120, so the CSI will always come out negative. Basically, 42 digits are not enough to put you past the universal probability bound.)
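
    For anyone who wants to check the numbers, here is a minimal sketch of the same calculation. The helper names are made up for illustration, and the Phi_S values are the concept-counting estimates from above (1e5 per basic concept), not something the code derives.

```python
from math import comb, log2


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def chi(phi_s: float, p_t_given_h: float, resources: float = 1e120) -> float:
    """Chi = -log2(resources * Phi_S(T) * P(T|H))."""
    return -log2(resources * phi_s * p_t_given_h)


digits = "039009005009019007024001025010014002030004"
n = len(digits)
zeros = digits.count("0")

# Chance hypothesis: n uniform-random digits, so each digit is "0" with p = 0.1.
p_exact = binom_pmf(zeros, n, 0.1)
p_over_half = sum(binom_pmf(k, n, 0.1) for k in range(n // 2 + 1, n + 1))

print(f"{zeros} zeros out of {n} digits")
print(f"P(23 0's)       = {p_exact:.1e}, Chi = {chi(1e10, p_exact):.0f} bits")
print(f"P(over half 0's) = {p_over_half:.1e}, Chi = {chi(1e15, p_over_half):.0f} bits")
```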

    Now, on to your longer number: I won’t run it in detail, but analogous specifications apply. Using an analogous probability distribution, the probabilities are much lower, and the relevant specifications are the same length, so overall the CSI values are higher (i.e. not so far off in the negatives). But 90 digits is still not enough to put you over the 1e120 probability bound, so the CSI is still going to be negative under any possible specification.
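
    The best case can be sketched directly: assume the most generous values possible, namely Phi_S(T) = 1 and a specification that pins down every digit, so that P(T|H) = 10^-n under the uniform-random-digits hypothesis. The function name is illustrative only.

```python
from math import log2


def best_case_chi(n_digits: int, resources_log10: int = 120) -> float:
    """Upper limit on Chi for n uniform-random digits.

    Assumes Phi_S(T) = 1 and P(T|H) = 10**-n_digits, so
    Chi = -log2(10**resources_log10 * 10**-n_digits).
    """
    return (n_digits - resources_log10) * log2(10)


for n in (42, 90, 121):
    print(f"{n:3d} digits: best-case Chi = {best_case_chi(n):8.2f} bits")
```

    Under these assumptions the value only turns positive once the string is longer than 120 digits.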

  47. 47
    Gordon Davisson says:

    p.s. I’m not sure if that really helped at all….

  48. 48
    Dionisio says:

    Gordon Davisson

    Thank you for the detailed explanation. I could repeat what rich wrote before:

    I’m impressed with your command of math.

    However, since my IQ score is lower than my age, I may need some time to digest what you wrote.
    Also, there are some terms I don’t quite understand.
    For example, you wrote:

    Now, to calculate the CSI for each of these specifications, we compute Chi =…

    What exactly is that acronym CSI? I’m sure you are not referring to the below link, are you?

    http://en.wikipedia.org/wiki/CSI:_Miami

    Again, thank you for your time to explain these difficult concepts.

  49. 49
    Dionisio says:

    Gordon Davisson

    The text in my post #44 contains 90 characters:

    039009005009019007024001025010014002030004057026005007005005016004005034041006009012037015

    Would it make a difference if we knew that -for example- this is a message made of 15 consecutive sets of 6 characters each?

    1. 039009
    2. 005009
    3. 019007
    4. 024001
    5. 025010
    6. 014002
    7. 030004
    8. 057026
    9. 005007
    10. 005005
    11. 016004
    12. 005034
    13. 041006
    14. 009012
    15. 037015

    Again, thanks.

  50. 50
    Dionisio says:

    Gordon Davisson

    Does the term CSI somehow relate to the acronym FSCO/I that appears in the title of the OP that started this discussion thread?

    Thanks.

  51. 51
    Joe says:

    Both Durston and Dembski (Meyer too) refer to biological information- the type Crick wrote about.

    Also natural selection doesn’t seem to have the capability to construct biological information without pre-existing biological information. So that would be an issue for the unguided evolution camp.

  52. 52
    Dionisio says:

    Gordon Davisson

    Do the acronyms CSI and FSCO/I somehow relate to the acronym dFSCI referred to in post #29 in this same discussion thread? Do they mean the same or have different meanings? Or should we ask the guys who coined these names?

    Thanks.

  53. 53
    Dionisio says:

    Gordon Davisson

    The text in my post #44 contains 90 characters:

    039009005009019007024001025010014002030004057026005007005005016004005034041006009012037015

    Would it make a difference if we knew that -for example- this is a message made of 15 consecutive sets of 6 characters each, where every set is split into 2 subsets of 3 characters each, as seen below?

    1. 039 009
    2. 005 009
    3. 019 007
    4. 024 001
    5. 025 010
    6. 014 002
    7. 030 004
    8. 057 026
    9. 005 007
    10. 005 005
    11. 016 004
    12. 005 034
    13. 041 006
    14. 009 012
    15. 037 015

    Thanks.

  54. 54
    kairosfocus says:

    GD:

    I don’t really want to get into a side-debate on Dembski’s CSI metric, but phi_S(T) is a measure of number of opportunities to observe something from T, which Dembski gives 10^120 as upper limit on following Seth Lloyd’s calc.

    Let me give an excerpt from Dembski:

    define ϕS as . . . the number of patterns for which [agent] S’s semiotic description of them is at least as simple as S’s semiotic description of [a pattern or target zone] T. [26] . . . . where M is the number of semiotic agents [S’s] that within a context of inquiry might also be witnessing events and N is the number of opportunities for such events to happen . . . . [where also] computer scientist Seth Lloyd has shown that 10^120 constitutes the maximal number of bit operations that the known, observable universe could have performed throughout its entire multi-billion year history.[31] . . . [Then] for any context of inquiry in which S might be endeavoring to determine whether an event that conforms to a pattern T happened by chance, M·N will be bounded above by 10^120. We thus define the specified complexity [χ] of T given [chance hypothesis] H [in bits] . . . as [the negative base-2 log of the conditional probability P(T|H) multiplied by the number of similar cases ϕS(T) and also by the maximum number of binary search-events in our observed universe 10^120]

    χ = – log2[10^120 ·ϕS(T)·P(T|H)].
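    A minimal sketch of how the expression above can be evaluated numerically. It works with base-2 logs of the inputs, since probabilities this small underflow ordinary floats. The function name and the example inputs are hypothetical, chosen only for illustration; they are not values from Dembski or from this thread.

```python
import math

# log2 of the 10^120 bound on bit operations quoted above
LOG2_RESOURCES = 120 * math.log2(10)

def chi_bits(log2_p: float, log2_phi: float) -> float:
    """Return chi = -log2(10^120 * phi_S(T) * P(T|H)).

    log2_p   : base-2 log of the conditional probability P(T|H)
    log2_phi : base-2 log of phi_S(T)
    """
    return -(LOG2_RESOURCES + log2_phi + log2_p)

# Hypothetical inputs: P(T|H) = 2^-1000 and phi_S(T) = 10^20
example_chi = chi_bits(log2_p=-1000.0, log2_phi=20 * math.log2(10))
print(f"chi = {example_chi:.2f} bits")
```

    On these assumed inputs chi comes out positive, which is the case the expression is meant to flag.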

    If you compare, you will see that what I have done is to show that giving each of the 10^57 atoms of our solar system 500 H/T coins and allowing a flip-inspect cycle every 10^-14 s, for 10^17 s, runs us up to a comparison of observations with possibilities for 500 bits. (Notice, in this toy model each atom of H, He, C, O etc. is an observer and observes every 10^-14 s.) The result is comparable to a straw-sized sample from a cubical haystack 1,000 LY across.
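    The arithmetic of this toy model can be checked with exact integers. The sketch below uses only the figures stated in the comment (10^57 atoms, one observation per 10^-14 s, 10^17 s); the variable names are illustrative, and the haystack comparison itself is not computed here.

```python
from fractions import Fraction

ATOMS = 10**57             # atoms in the solar system, as stated above
OBS_PER_SECOND = 10**14    # one flip-inspect cycle every 10^-14 s
SECONDS = 10**17           # duration of the toy search

observations = ATOMS * OBS_PER_SECOND * SECONDS   # total 500-coin inspections
configurations = 2**500                           # possibilities for 500 bits

# Exact fraction of the configuration space that could be inspected,
# assuming no configuration is ever inspected twice
fraction_sampled = Fraction(observations, configurations)
print(f"observations     = 10^{len(str(observations)) - 1}")
print(f"fraction sampled = {float(fraction_sampled):.2e}")
```

    The sampled fraction is on the order of 10^-63, even under the generous assumption that no observation is repeated.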

    For the observed cosmos, moving up to 1,000 coins gives an even more overwhelming needle in haystack search challenge.

    That is, I am giving an upper limit to observational possibilities based on atomic matter, which preserves the fundamental meaning of the term. (Indeed, Lloyd’s calc is about converting mass and energy into computer processing, ending up at 10^120 bit ops on 10^90 bits.)

    Next, what I have done with the probability metric is I have used the principle that the occurrence of an uncertain event is informative [a posteriori], and converts to information by being logged. Per well known work.

    Thus, it is appropriate to take log p as an info metric. Indeed, that is implicit in the Dembski expression.

    I put it to you, that a search of the field of possibilities for 500/1,000 bits under the terms used overwhelms any search by any reasonably random method of the space of possibilities for those many bits. Intelligent searches inject active information on the search space that materially changes the situation.

    Further to that, that space W contains ANY bit pattern as coded under ANY convention.

    I also put to you, that the measured info content of a system reflects the underlying probabilities under any reasonably random process of search.

    Especially, where bit content can be estimated. (As happens to be true for Genomes or proteins.)

    Where, what the Durston et al model does is to infer that the relevant dominant search patterns occurred across the history of life. So, from comparing null state with flat random distribution across 4 or 20 possibilities in each node of the string of monomers, to the actual situation where some values are more and some less probable, an avg info per “symbol” metric can be computed per the H-metric of avg info per symbol, based on – SUM [pi log pi].
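    A short sketch of the H-metric just mentioned, H = – SUM [pi log2 pi], comparing the flat null state over 4 or 20 symbols with a non-uniform case. The skewed distribution is a made-up example, not data from Durston et al.

```python
import math

def shannon_h(probs):
    """Average information per symbol, in bits: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

null_dna = shannon_h([1 / 4] * 4)        # flat distribution over 4 bases
null_protein = shannon_h([1 / 20] * 20)  # flat distribution over 20 amino acids
skewed = shannon_h([0.7, 0.1, 0.1, 0.1]) # hypothetical non-uniform site

print(f"null state, 4 symbols : {null_dna:.2f} bits/symbol")
print(f"null state, 20 symbols: {null_protein:.2f} bits/symbol")
print(f"skewed 4-symbol site  : {skewed:.2f} bits/symbol")
```

    The flat cases give the maximum (2 bits and about 4.32 bits per symbol); any non-uniform distribution gives less.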

    It is reasonable to take a direct info metric as a reflection of what is empirically plausible across the span of life. ( It does not take a great many samples to capture the bulk of a population distribution, but rarities are much harder to find on blind search. So, the Durston et al metric will reasonably capture what is dominant or material.)

    Further to this, a search of a space is in effect the collection of a sampled subset. It therefore picks from the power set. For a set of W members, that’s 2^W subsets. So search for search comes from an overwhelmingly higher order space of possibilities. This leads to the situation where a blind random search in the original space is as good an index of the capacity of blind search as any blindly chosen search.
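    The power-set count can be confirmed by brute force on small sets: a set with n members has 2^n subsets. This is a minimal illustration with hypothetical names, feasible only for tiny n.

```python
from itertools import combinations

def power_set(items):
    """Return every subset of items, from the empty set to the full set."""
    items = list(items)
    return [subset
            for size in range(len(items) + 1)
            for subset in combinations(items, size)]

# Count subsets for sets of 0 to 5 members
subset_counts = {n: len(power_set(range(n))) for n in range(6)}
print(subset_counts)
```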

    In the overall context, we are back to what a common sense reading of the OP tells us.

    Once we have a narrow, deeply isolated zone T in a field of possibilities W [or even a cluster of such zones], and a situation where we may only search a very small, blind sample, we are utterly unlikely to capture events from T or the T’s.

    As for the suggestion of after the fact target-painting etc, the situation is, we are dealing with functionally specific, complex organisation and associated information, FSCO/I. As in, things where many parts must be correctly oriented, arranged and aligned then coupled together to achieve function. Where function is an observable and is highly constraining on acceptable possible organisations or arrangements. This naturally gives us specificity and observability independent of arrangement. Does the Abu Cardinal reel work in this clustering of its parts or does it not. T is recognisable, independent and specific. Where the dummy variable I used in my heuristic model, is dependent on observable function based on specific configuration.

    This can be converted into strings WLOG, as we can see from AutoCAD.

    So, the analysis and threshold metric on strings remain reasonable.

    Indeed, let’s look at it again, on its own terms as “inspired by” not necessarily determined by what Dembski did:

    Chi_500 = I*S – 500, bits beyond the solar system search capacity.

    1 –> As the infographic in the OP shows, it is unreasonable for us to expect our solar system to be able to capture rarities in the space of possibilities W = 2^500 possibilities for 500 bits.

    2 –> If we find a specific info metric I*S that exceeds this, Chi_500 goes positive. That is, a flag is tripped beyond the threshold to indicate, not plausibly discoverable by blind search that has no access to active info.

    3 –> Notice, this is NOT about smooth fitness functions supportive of incremental hill climbing within islands of function T, it is about FINDING SUCH ISLANDS IN W, where search resources on solar system scope are comparatively minimal per one straw to a cubical haystack 1,000 LY across.

    4 –> That is, in the zone of possible configs for atoms in our solar system at fast chem rxn rates for a reasonable duration, we have an overwhelming search challenge once the complexity is beyond 500 bits and the relevant zones are rare.

    5 –> Which, is secured by the FSCO/I criterion.

    6 –> Which, is independently observable: i wuk, or i nuh wuk, mon!

    7 –> Notice, long before molecular biology came on the scene we were able to see functionality of life forms. Metabolism, reproduction, movement, etc.

    8 –> Just, in recent decades we have been astonished to see the complexity that went into that. Which gives us a clue on search space challenge.
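    The threshold expression Chi_500 = I*S – 500 can be sketched as a small function. Here S is treated as the 0/1 dummy variable described above (1 when functional specificity is observed, 0 otherwise); the function name and the 700-bit example are hypothetical.

```python
def chi_500(info_bits: float, s: int, threshold: int = 500) -> float:
    """Return I*S - threshold, in bits beyond the search-capacity threshold.

    info_bits : information measure I, in bits
    s         : dummy variable, 1 if functionally specific, else 0
    """
    if s not in (0, 1):
        raise ValueError("s must be 0 or 1")
    return info_bits * s - threshold

# Hypothetical 700-bit configuration, with and without observed function
functional = chi_500(700, 1)      # positive: flag is tripped
non_functional = chi_500(700, 0)  # negative: flag is not tripped
print(functional, non_functional)
```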

    KF

  55. 55
    Joe says:

    Biological specification always refers to function. An organism is a functional system comprising many functional subsystems. In virtue of their function, these systems embody patterns that are objectively given and can be identified independently of the systems that embody them. Hence these systems are specified in the same sense required by the complexity-specification criterion (see sections 1.3 and 2.5). The specification of organisms can be cashed out in any number of ways. Arno Wouters cashes it out globally in terms of the viability of whole organisms. Michael Behe cashes it out in terms of minimal function of biochemical systems.- Wm. Dembski page 148 of NFL

    In the paper “The origin of biological information and the higher taxonomic categories”, Stephen C. Meyer wrote:

    Dembski (2002) has used the term “complex specified information” (CSI) as a synonym for “specified complexity” to help distinguish functional biological information from mere Shannon information–that is, specified complexity from mere complexity. This review will use this term as well.

    Sir Francis Crick talked about biological information in his “Central Dogma”. For example:

    Information means here the precise determination of sequence, either of bases in the nucleic acid or on amino acid residues in the protein.

    from Kirk K. Durston, David K. Y. Chiu, David L. Abel, Jack T. Trevors, Measuring the functional sequence complexity of proteins, Theoretical Biology and Medical Modelling, Vol. 4:47 (2007):

    [N]either RSC [Random Sequence Complexity] nor OSC [Ordered Sequence Complexity], or any combination of the two, is sufficient to describe the functional complexity observed in living organisms, for neither includes the additional dimension of functionality, which is essential for life. FSC [Functional Sequence Complexity] includes the dimension of functionality. Szostak argued that neither Shannon’s original measure of uncertainty nor the measure of algorithmic complexity are sufficient. Shannon’s classical information theory does not consider the meaning, or function, of a message. Algorithmic complexity fails to account for the observation that “different molecular structures may be functionally equivalent.” For this reason, Szostak suggested that a new measure of information—functional information—is required.

    It is all the same thing.

  56. 56
    Dionisio says:

    Gordon Davisson

    [#53 follow-up]

    The text in my post #44 contains 90 characters:

    039009005009019007024001025010014002030004057026005007005005016004005034041006009012037015

    Would it make a difference if we knew that -for example- this is a message made of 15 consecutive sets of 6 characters each, where every set is split into 2 subsets of 3 characters each, as seen below?

    1. 039 009
    2. 005 009 *
    3. 019 007
    4. 024 001 *
    5. 025 010
    6. 014 002 *
    7. 030 004
    8. 057 026 *
    9. 005 007
    10. 005 005 *
    11. 016 004
    12. 005 034 *
    13. 041 006
    14. 009 012 *
    15. 037 015

    What would change in the analysis if we knew that out of the 15 6-char sets, the ones on the even positions in the list (marked with * in the list above) have to swap their 3-char subsets, in such a way that the 3-char subsets that appear on the left move to the right, resulting in this new list?

    1. 039 009
    2. 009 005
    3. 019 007
    4. 001 024
    5. 025 010
    6. 002 014
    7. 030 004
    8. 026 057
    9. 005 007
    10. 005 005
    11. 016 004
    12. 034 005
    13. 041 006
    14. 012 009
    15. 037 015

    Thanks.

  57. 57
    kairosfocus says:

    Rich, if you would but scan above, you will have seen relevant calcs on Chi_500 which are not rocket science, and you will know that I am snatching moments from having to address the local silly season, which just ratcheted up to the next level. I will simply give you an index from genome size, bearing in mind that it is FSCO/I. Reasonable estimates put 1st life in the zone 100 – 1,000 kbases, most likely upper end as lower end is parasitic on existing life per observations. At 2 bits per 4-state base [refinements on distributions won’t make a material difference as ab initio the necessity to bear info imposes freedom to chain ACGT or U in any order], we see W = 2^200,000 ~ 9.98*10^60,205 possibilities. That swamps any plausible scope of search in a Darwin’s pond or the like environment given the number of atomic, ionic rxn speed events in our solar system or the observed cosmos. And that is an utterly generous overestimate of resources that would capture any reasonably plausible blind search across life configs and potential habitable zones. As to dismissing my microjets in a cu m vat comparative or the fishing reel etc, those are vastly simpler than the so called simple cell, which recall is a von Neumann Self Replicating, code and algorithm using, gated, encapsulated metabolic entity. KF
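    The figure quoted for W can be checked with logarithms, since 2^200,000 is far too large for a float. The sketch assumes the lower-end genome size of 100 kbases at 2 bits per base, as in the comment; names are illustrative.

```python
import math

BASES = 100_000      # lower end of the 100 - 1,000 kbase range
BITS_PER_BASE = 2    # four states per base

total_bits = BASES * BITS_PER_BASE        # 200,000 bits
log10_w = total_bits * math.log10(2)      # log10 of W = 2^200,000

exponent = math.floor(log10_w)
mantissa = 10 ** (log10_w - exponent)
print(f"W = 2^{total_bits:,} ~ {mantissa:.2f} * 10^{exponent:,}")
```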

  58. 58
    kairosfocus says:

    Joe, thanks. KF

    PS: I clip and comment on Durston et al in my always linked note:

    +++++++++++

    11 –> Durston, Chiu, Abel and Trevors provide a third metric, the Functional H-metric in functional bits or fits, a functional bit extension of Shannon’s H-metric of average information per symbol, here. The way the Durston et al metric works by extending Shannon’s H-metric of the average info per symbol to study null, ground and functional states of a protein’s AA linear sequence — illustrating and providing a metric for the difference between order, randomness and functional sequences discussed by Abel and Trevors — can be seen from an excerpt of the just linked paper. Pardon length and highlights, for clarity in an instructional context:

    >>Abel and Trevors have delineated three qualitative aspects of linear digital sequence complexity [2,3], Random Sequence Complexity (RSC), Ordered Sequence Complexity (OSC) and Functional Sequence Complexity (FSC). RSC corresponds to stochastic ensembles with minimal physicochemical bias and little or no tendency toward functional free-energy binding. OSC is usually patterned either by the natural regularities described by physical laws or by statistically weighted means. For example, a physico-chemical self-ordering tendency creates redundant patterns such as highly-patterned polysaccharides and the polyadenosines adsorbed onto montmorillonite [4]. Repeating motifs, with or without biofunction, result in observed OSC in nucleic acid sequences. The redundancy in OSC can, in principle, be compressed by an algorithm shorter than the sequence itself. As Abel and Trevors have pointed out, neither RSC nor OSC, or any combination of the two, is sufficient to describe the functional complexity observed in living organisms, for neither includes the additional dimension of functionality, which is essential for life [5]. FSC includes the dimension of functionality [2,3]. Szostak [6] argued that neither Shannon’s original measure of uncertainty [7] nor the measure of algorithmic complexity [8] are sufficient. Shannon’s classical information theory does not consider the meaning, or function, of a message. Algorithmic complexity fails to account for the observation that ‘different molecular structures may be functionally equivalent’. For this reason, Szostak suggested that a new measure of information–functional information–is required [6] . . . .

    Shannon uncertainty, however, can be extended to measure the joint variable (X, F), where X represents the variability of data, and F functionality. This explicitly incorporates empirical knowledge of metabolic function into the measure that is usually important for evaluating sequence complexity. This measure of both the observed data and a conceptual variable of function jointly can be called Functional Uncertainty (Hf) [17], and is defined by the equation:

    H(Xf(t)) = -∑P(Xf(t)) logP(Xf(t)) . . . (1)
    where Xf denotes the conditional variable of the given sequence data (X) on the described biological function f which is an outcome of the variable (F). For example, a set of 2,442 aligned sequences of proteins belonging to the ubiquitin protein family (used in the experiment later) can be assumed to satisfy the same specified function f, where f might represent the known 3-D structure of the ubiquitin protein family, or some other function common to ubiquitin. The entire set of aligned sequences that satisfies that function, therefore, constitutes the outcomes of Xf. Here, functionality relates to the whole protein family which can be inputted from a database . . . .

    In our approach, we leave the specific defined meaning of functionality as an input to the application, in reference to the whole sequence family. It may represent a particular domain, or the whole protein structure, or any specified function with respect to the cell. Mathematically, it is defined precisely as an outcome of a discrete-valued variable, denoted as F={f}. The set of outcomes can be thought of as specified biological states. They are presumed non-overlapping, but can be extended to be fuzzy elements . . . Biological function is mostly, though not entirely determined by the organism’s genetic instructions [24-26]. The function could theoretically arise stochastically through mutational changes coupled with selection pressure, or through human experimenter involvement [13-15] . . . .

    The ground state g (an outcome of F) of a system is the state of presumed highest uncertainty (not necessarily equally probable) permitted by the constraints of the physical system, when no specified biological function is required or present. Certain physical systems may constrain the number of options in the ground state so that not all possible sequences are equally probable [27]. An example of a highly constrained ground state resulting in a highly ordered sequence occurs when the phosphorimidazolide of adenosine is added daily to a decameric primer bound to montmorillonite clay, producing a perfectly ordered, 50-mer sequence of polyadenosine [3]. In this case, the ground state permits only one single possible sequence . . . .

    The null state, a possible outcome of F denoted as ø, is defined here as a special case of the ground state of highest uncertainty when the physical system imposes no constraints at all, resulting in the equi-probability of all possible sequences or options. Such sequencing has been called “dynamically inert, dynamically decoupled, or dynamically incoherent” [28,29]. For example, the ground state of a 300 amino acid protein family can be represented by a completely random 300 amino acid sequence where functional constraints have been loosened such that any of the 20 amino acids will suffice at any of the 300 sites. From Eqn. (1) the functional uncertainty of the null state is represented as

    H(Xø(ti))= – ∑P(Xø(ti)) log P(Xø(ti)) . . . (3)

    where (Xø(ti)) is the conditional variable for all possible equiprobable sequences. Consider the number of all possible sequences is denoted by W. Letting the length of each sequence be denoted by N and the number of possible options at each site in the sequence be denoted by m, W = m^N. For example, for a protein of length N = 257 and assuming that the number of possible options at each site is m = 20, W = 20^257. Since, for the null state, we are requiring that there are no constraints and all possible sequences are equally probable, P(Xø(ti)) = 1/W and

    H(Xø(ti))= – ∑(1/W) log (1/W) = log W . . . (4)

    The change in functional uncertainty from the null state is, therefore,
    ΔH(Xø(ti), Xf(tj)) = log (W) – H(Xf(ti)). (5)

    . . . . The measure of Functional Sequence Complexity, denoted as ζ, is defined as the change in functional uncertainty from the ground state H(Xg(ti)) to the functional state H(Xf(ti)), or

    ζ = ΔH (Xg(ti), Xf(tj)) . . . (6)

    The resulting unit of measure is defined on the joint data and functionality variable, which we call Fits (or Functional bits). The unit Fit thus defined is related to the intuitive concept of functional information, including genetic instruction and, thus, provides an important distinction between functional information and Shannon information [6,32].
    Eqn. (6) describes a measure to calculate the functional information of the whole molecule, that is, with respect to the functionality of the protein considered. The functionality of the protein can be known and is consistent with the whole protein family, given as inputs from the database. However, the functionality of a sub-sequence or particular sites of a molecule can be substantially different [12]. The functionality of a sub-molecule, though clearly extremely important, has to be identified and discovered . . . .

    To avoid the complication of considering functionality at the sub-molecular level, we crudely assume that each site in a molecule, when calculated to have a high measure of FSC, correlates with the functionality of the whole molecule. The measure of FSC of the whole molecule, is then the total sum of the measured FSC for each site in the aligned sequences. Consider that there are usually only 20 different amino acids possible per site for proteins, Eqn. (6) can be used to calculate a maximum Fit value/protein amino acid site of 4.32 Fits/site [NB: Log2 (20) = 4.32]. We use the formula log (20) – H(Xf) to calculate the functional information at a site specified by the variable Xf such that Xf corresponds to the aligned amino acids of each sequence with the same molecular function f. The measured FSC for the whole protein is then calculated as the summation of that for all aligned sites. The number of Fits quantifies the degree of algorithmic challenge, in terms of probability, in achieving needed metabolic function. For example, if we find that the Ribosomal S12 protein family has a Fit value of 379, we can use the equations presented thus far to predict that there are about 10^49 different 121-residue sequences that could fall into the Ribosomal S12 family of proteins, resulting in an evolutionary search target of approximately 10^-106 percent of 121-residue sequence space. In general, the higher the Fit value, the more functional information is required to encode the particular function in order to find it in sequence space. A high Fit value for individual sites within a protein indicates sites that require a high degree of functional information. High Fit values may also point to the key structural or binding sites within the overall 3-D structure.>>
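    A minimal sketch of the per-site calculation described in the excerpt: for each aligned site, log2(20) – H(Xf), summed over all sites. It assumes the null state (20 equiprobable amino acids) as the reference, and the four-sequence alignment is a toy example, not data from the paper. Function names are hypothetical.

```python
import math
from collections import Counter

def site_entropy(column):
    """Shannon uncertainty H(Xf) of one aligned site, in bits."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def fits(alignment, alphabet_size=20):
    """Sum over aligned sites of log2(alphabet_size) - H(Xf)."""
    max_per_site = math.log2(alphabet_size)   # about 4.32 for 20 amino acids
    return sum(max_per_site - site_entropy(column)
               for column in zip(*alignment))

# Toy alignment: site 1 fully conserved, sites 2 and 3 variable
toy_alignment = ["ACD", "ACE", "ACD", "AGE"]
toy_fits = fits(toy_alignment)
print(f"{toy_fits:.2f} Fits over {len(toy_alignment[0])} sites")
```

    A fully conserved site contributes the maximum of about 4.32 Fits; the more variation a site tolerates, the less it contributes.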

    11 –> Thus, we here see an elaboration, in the peer reviewed literature, of the concepts of Functionally Specific, Complex Information [FSCI] (and related, broader specified complexity) that were first introduced by Orgel and Wicken in the 1970’s. This metric gives us a way to compare the fraction of residue space that is used by identified islands of function, and so validates the islands of function in a wider configuration space concept. So, we can profitably go on to address the issue of how plausible it is for a stochastic search mechanism to find such islands of function on essentially random walks and trial and error without foresight of location or functional possibilities. We already know that intelligent agents routinely create entities on islands of function based on foresight, purpose, imagination, skill, knowledge and design.

    ++++++++++

  59. 59
    Joe says:

    kairosfocus- I get the feeling that your opponents don’t read your posts all the way through- their responses support that claim. So I just try to simplify it for them. 😉

  60. 60
    Dionisio says:

    Gordon Davisson

    [#53 follow-up]

    The text in my post #44 contains 90 characters:

    039009005009019007024001025010014002030004057026005007005005016004005034041006009012037015

    Would it make a difference if we knew that -for example- this is a message made of 15 consecutive sets of 6 characters each, where every set is split into 2 subsets of 3 characters each, as seen below?

    1. 039 009
    2. 005 009 *
    3. 019 007
    4. 024 001 *
    5. 025 010
    6. 014 002 *
    7. 030 004
    8. 057 026 *
    9. 005 007
    10. 005 005 *
    11. 016 004
    12. 005 034 *
    13. 041 006
    14. 009 012 *
    15. 037 015

    What would change in the analysis if we knew that out of the 15 6-char sets, the ones on the even positions in the list (marked with * in the list above) have to swap their 3-char subsets, in such a way that the 3-char subsets that appear on the left move to the right, resulting in this new list?

    1. 039 009
    2. 009 005
    3. 019 007
    4. 001 024
    5. 025 010
    6. 002 014
    7. 030 004
    8. 026 057
    9. 005 007
    10. 005 005
    11. 016 004
    12. 034 005
    13. 041 006
    14. 012 009
    15. 037 015

    Can we consider all the above instructions to process the original message procedural information?
    Is that procedural information accounted for in the previous analysis?

    Now let’s assume that the first 3-digit number in each pair represents a post # in this thread and the second 3-digit number in each pair represents a character position within the given post, relative to the beginning of the post text.

    Does the information associated with the original message start to show up?
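    The processing steps described above (split into 6-character sets, split each set into two 3-character subsets, swap the subsets at even list positions, then read each pair as post number and character position) can be sketched as follows. The function and variable names are hypothetical; the message string is the one given in the comment.

```python
MESSAGE = ("039009005009019007024001025010014002030004057026005007"
           "005005016004005034041006009012037015")

def parse_pairs(message, group=6, half=3):
    """Split message into groups, halve each, swap halves at even positions."""
    if len(message) % group != 0:
        raise ValueError("message length must be a multiple of the group size")
    chunks = [message[i:i + group] for i in range(0, len(message), group)]
    pairs = [(chunk[:half], chunk[half:]) for chunk in chunks]
    # Positions are 1-based, as in the numbered list above
    return [(right, left) if position % 2 == 0 else (left, right)
            for position, (left, right) in enumerate(pairs, start=1)]

pairs = parse_pairs(MESSAGE)
# Interpret each pair as (post number, character position within the post)
locations = [(int(post), int(char_pos)) for post, char_pos in pairs]
for position, (post, char_pos) in enumerate(locations, start=1):
    print(f"{position}. post #{post}, character #{char_pos}")
```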

  61. 61
    Dionisio says:

    #60 PS: note the space character is counted in too. There could be errors in the message.

    1. 039 009 = I

    Post #39, character #9 (bold)

    Hi, KF. I disagree about whether

    2. 009 005 = n
    Post #9, character #5 (bold)

    KF and others. Here is Brosius’web page

    3. 019 007

    4. 001 024

    5. 025 010

    6. 002 014

    7. 030 004

    8. 026 057

    9. 005 007

    10. 005 005

    11. 016 004

    12. 034 005

    13. 041 006

    14. 012 009

    15. 037 015

  62. 62
    kairosfocus says:

    Joe, I hear you, that is part of why I laid out the infographic, which they are also studiously avoiding. KF

  63. 63
  64. 64
    Dionisio says:

    #61 follow-up

    3. 019 007 = space
    Post #19, 7th character is space

    Jerry, search space challenge in

    4. 001 024 = t
    Post #1, 24th character is ‘t’ (bold)

    A FTR/FYI for RTH and other denizens at TSZ.

    5. 025 010 = h
    Post #25, 10th character is ‘h’ (bold)

    R, was I helpful? Where do you need more? KF

    6. 002 014 = e
    Post #2, 14th character is ‘e’ (bold)

    PS: RTH & AF et al If you don’t like chirping cricket metaphors, then

    7. 030 004

    8. 026 057

    9. 005 007

    10. 005 005

    11. 016 004

    12. 034 005

    13. 041 006

    14. 012 009

    15. 037 015
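    The lookup step for the six pairs worked out so far can be sketched like this. The dictionary holds only the post openings quoted in the comments above (some truncated), and character positions are counted from 1 with spaces included; the names are hypothetical, and pairs 7 to 15 are left out because their posts are not quoted here.

```python
# Openings of the posts as quoted in the comments above
POST_OPENINGS = {
    39: "Hi, KF. I disagree about whether",
    9: "KF and others. Here is Brosius",
    19: "Jerry, search space challenge in",
    1: "A FTR/FYI for RTH and other denizens at TSZ.",
    25: "R, was I helpful? Where do you need more? KF",
    2: "PS: RTH & AF et al",
}

# First six (post number, character position) pairs from the list
LOCATIONS = [(39, 9), (9, 5), (19, 7), (1, 24), (25, 10), (2, 14)]

def decode(locations, posts):
    """Pick the character at each 1-based position of the named post."""
    return "".join(posts[post][position - 1] for post, position in locations)

decoded = decode(LOCATIONS, POST_OPENINGS)
print(repr(decoded))  # 'In the'
```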

  65. 65
    Dionisio says:

    #64 follow-up
    Does anyone want to continue this and do the rest?

    7. 030 004

    8. 026 057

    9. 005 007

    10. 005 005

    11. 016 004

    12. 034 005

    13. 041 006

    14. 012 009

    15. 037 015

  66. 66
    gpuccio says:

    Dionisio (#52):

    Just to clarify:

    CSI (complex specified information) is the general concept. It is the one used by Dembski and by most other discussants. It refers to the information identified by some form of specification (any form). IOWs, in a set of possible configurations, we identify a subset which is well specified by some rule. The ratio between the numerosity of the specified subset and the numerosity of the total set of configurations is the probability (assuming uniform probability distribution) to get a configuration in the specified subset by a random search in one attempt. -log2 of that probability is the specified complexity. CSI, like all other similar concepts, can be expressed as a binary value (present/absent) by some appropriate cutoff/threshold.
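    The definition just given translates directly into a few lines: the probability is the ratio of the specified subset to the whole set (uniform distribution assumed), and the specified complexity is -log2 of that ratio. The function names, the toy counts and the threshold value are hypothetical.

```python
import math

def specified_complexity_bits(target_count: int, total_count: int) -> float:
    """-log2(target_count / total_count), computed as a difference of logs."""
    if not 0 < target_count <= total_count:
        raise ValueError("need 0 < target_count <= total_count")
    return math.log2(total_count) - math.log2(target_count)

def exceeds_threshold(bits: float, threshold_bits: float) -> bool:
    """Binary (present/absent) reading against a chosen cutoff."""
    return bits > threshold_bits

# Toy case: 16 specified configurations out of 2^20 possible ones
toy_bits = specified_complexity_bits(16, 2**20)
print(toy_bits, exceeds_threshold(toy_bits, threshold_bits=10))
```

    Taking a difference of logs rather than dividing first keeps the calculation usable when the counts are very large integers.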

    FSCO/I (functionally specified complex organization/information) is the form used by KF. He can answer better about that, but as I understand it is a subset of CSI, in which the specification is functional.

    dFSCI is the form I generally use. It means digital functionally specified complex information. It is definitely a subset of CSI. The general concepts are the same. I have simply added two further properties:

    a) The specification must be functional. That is very useful in my discussions, because I have tried to give explicit and rigorous definitions of function and of functional information, which are important to make of dFSCI a reproducible tool, which can be used to make real measurements.

    b) The concept is restricted to digital sequences. This restriction bears no loss of generality, because, as KF has pointed out many times, any kind of functional information can be expressed digitally. Moreover, it is perfectly appropriate to use dFSCI in most biological contexts, because the functional information we find in those systems is mainly digital (that is certainly true for protein sequence information, which is the main scenario I use in my reasoning). Reasoning only in terms of digital sequences makes the discussion much simpler, and allows us to compute dFSCI in many cases in a rather direct way.

    So, sorry for the proliferation of acronyms, but as you can see there is some reason for that. The definition of specification as functional, and the restriction to digital sequences have helped me very much in my discussions here, where the interlocutors are not always completely “open-minded” to these very simple and intuitive concepts. 🙂

    I have defended the concept of dFSCI in many concrete contexts, but to be able to do that efficiently I do need my explicit definitions and lines of reasoning.

    However, the general concepts are identical to those in all classic ID models, like for example Dembski’s explanatory filter. One advantage of dFSCI is that it can be easily measured (indirectly) for protein families by the Durston method.

  67. 67
    Dionisio says:

    #65 follow-up
    should the complex specified purpose-oriented functional information include the orchestration and choreography for every scenario? 😉
    Can that be quantified? how?

  68. 68
    Dionisio says:

    Mio caro amico Dottore GP,

    Mile grazie per la eccellente esplicazione!

    (not sure that was written correctly)

    Thank you for the excellent clarification. Maybe your post #66 will help many of us here to understand this whole thing better.

  69. 69
    Dionisio says:

    gpuccio,

    Do post #60 and the follow-up posts (up to #67) somehow relate to dFSCI and procedures?

    Shouldn’t the instructions to process the given 90-char message be considered part of the whole functional information?

  70. 70
    kairosfocus says:

    D: GP’s summary is quite correct. I use functionally specific complex organisation and associated information, FSCO/I, to emphasise that organisation constrained to function is reducible to dFSCI in the form of coded strings. Just look at AutoCAD. And, when we look at bio-systems, from cells on up, we see highly specific functional organisation, that in many cases actually uses coded strings as part of the function. In the case of presumably coded strings that provide structured information in a data structure which is algorithmically processed, the strings are dFSCI, and the algorithms if software encoded will be dFSCI. The functional organisation of the machinery is FSCO/I, reducible to dFSCI on applying AutoCAD or the like. KF

  71. 71
    Mark Frank says:

    #66 Gpuccio

    It is definitely a subset of CSI.

    Just to remind you that, the way you used to define dFSCI, it also had the property that it cannot be recreated through some simple rule. This meant it was definitely not a subset of CSI as most recently defined by Dembski, which quite explicitly says the opposite – CSI requires that it can be created through some simple rule. Maybe you have changed your definition of dFSCI?

    Don’t worry – I am not going to fight the whole dFSCI battle again.

  72. 72
    Joe says:

    Mark Frank:

    Just to remind you that, the way you used to define dFSCI, it also had the property that it cannot be recreated through some simple rule.

    Actually it isn’t part of the definition, it is just a fact of life.

    This meant it was definitely not a subset of CSI as most recently defined by Dembski, which quite explicitly says the opposite – CSI requires that it can be created through some simple rule.

    That is incorrect. Time and again Dembski has stated that algorithms cannot produce CSI and algorithms have a simple rule.

  73. 73
    Dionisio says:

    KF and GP,

    How can we quantify the information in the case described in post #60 and the follow-up posts?
    Can it be done?

  74. 74
    Dionisio says:

    KF and GP,

    As you may have noticed, Gordon Davisson did some allegedly CSI quantification on the message string described in post #60, but that was before additional information was added to the message. Can it be done for the whole thing? How?

  75. 75
    kairosfocus says:

    D: Quantification of info requires knowledge of a set of possible states or symbols, and the degree of surprise on finding the system in a particular state. Degree of surprise can in many cases be quantified as a probability, then converted into an additive information metric by extracting logs under certain circumstances. If you are for instance reporting ASCII codes in decimal format (and I see no signs of hex codes or binary digits), you can then work back from the 128 possibilities per character. Particular codes such as English text impose non-uniform distributions [E is ~ 1/8 of text, as my fading E character testifies . . . ], then we can work out an average info per symbol by summing – pi log2 pi across the set of possible symbol states 1, 2, . . . n. Cf my always linked note here, if you want a primer in a nutshell. A fairly simple and often adequate approach is to look at the storage elements and note how many states they take up, then how many elements there are, such as ASCII, which has seven info bearing bits per character, and each character therefore stores 7 bits of info at this basic level. 143 ASCII characters is about 1,000 bits of info. There are 2^1,000 = 1.07 * 10^301 possibilities for 1,000 bits. The whole of the observed universe cannot sample as much as 1 in 10^150 of that, even under ridiculously generous conditions. KF
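
As a rough check of the arithmetic in this comment, here is a minimal Python sketch. The function names `capacity_bits` and `average_info_per_symbol` and the `skewed` distribution are hypothetical, introduced only for illustration; the 7 bits per ASCII character, the 143 characters and the 2^1,000 figure come from the comment itself.

```python
import math

def capacity_bits(n_chars, states_per_char=128):
    """Raw storage capacity in bits, assuming all states are equally likely."""
    return n_chars * math.log2(states_per_char)

def average_info_per_symbol(probabilities):
    """Shannon average information H = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 143 ASCII characters at 7 bits each: 1001 bits, i.e. about 1,000 bits.
bits = capacity_bits(143)

# log10 of 2**1000, about 301.03, matching 1.07 * 10^301 possibilities.
configs_log10 = 1000 * math.log10(2)

# A uniform 128-symbol alphabet gives exactly 7 bits per symbol;
# a non-uniform one gives less. The skewed distribution is made up.
uniform = [1 / 128] * 128
skewed = [0.5] + [0.5 / 127] * 127

print(bits, round(configs_log10, 2))
print(average_info_per_symbol(uniform), average_info_per_symbol(skewed))
```

The second line of output illustrates the point about English text: once some symbols are much more frequent than others, the average information per symbol drops below the raw 7-bit capacity.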

  76. 76
    gpuccio says:

    Dionisio:

    Very good questions!

    In brief:

    1) dFSCI can be measured only if we define the search space and the function. The target space must be computed from the defined function. If we want a binary categorization, we must also define the system where the observed object originated and the time span (that will allow the choice of an appropriate threshold). Please, remember that the final purpose of dFSCI as a tool is of rejecting the random explanation of what we observe, in order to infer design (there is also another passage about necessary explanations, but I will not deal with that here).

    2) When we evaluate dFSCI in the object where a specific sequence is written, the essential questions are:

    a) What is the search space? This can usually be approximated as the total number of possible sequences of that length.

    b) How many of those sequences can convey the message so that the function, as previously defined, is guaranteed?

    Now, if the conversion of the sequence to the final functional form needs further computations or elaborations (like DNA transcription and translation in the case of proteins, or decoding procedures like in your case), those procedures are probably further examples of dFSCI, but they are dFSCI in the system which uses the original sequence, so they would not be computed as dFSCI of the original object/sequence.

    IOWs, when we evaluate dFSCI for an object in a system, we give the system for granted (for that specific evaluation), and focus our attention on the object itself.

    I hope that is clear.
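
The procedure in points 1) and 2) can be sketched in Python as a ratio of target space to search space, expressed in bits. This is only one reading of the comment, under stated assumptions: the names `functional_bits` and `exceeds_threshold` are hypothetical, the 128-state alphabet and the target-space size of 10^6 sequences are invented for illustration, and the threshold must be chosen for the system and time span as the comment says.

```python
import math

def functional_bits(target_space, search_space):
    """-log2(target/search): bits needed to land in the target by blind choice."""
    if not 0 < target_space <= search_space:
        raise ValueError("need 0 < target_space <= search_space")
    return math.log2(search_space) - math.log2(target_space)

def exceeds_threshold(target_space, search_space, threshold_bits):
    """Binary categorization against a threshold chosen for the system."""
    return functional_bits(target_space, search_space) > threshold_bits

# Hypothetical example: a 90-character string over a 128-state alphabet,
# where we ASSUME 10**6 different sequences would convey the same message.
search_space = 128 ** 90   # all sequences of that length (2**630)
target_space = 10 ** 6     # assumed size of the functional subset

print(round(functional_bits(target_space, search_space), 1))  # about 610.1
print(exceeds_threshold(target_space, search_space, 500))
```

Note that the sketch measures only the object (the sequence); the decoding procedure is treated as part of the system, as described in the comment.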

  77. 77
    Dionisio says:

    KF
    Thank you for the explanation.
    Did you understand the example given in post #60 and associated follow-up posts?
    How the final message is constructed based on a predetermined text, where the required characters are taken from? In the original example, the character repository was a book, not a blog discussion thread. Every character is identified by a pair of numbers that represent book page and character position within the page.

  78. 78
    kairosfocus says:

    D: Coding by pages and characters raises issues on number of pages, number of characters per page and number of states per character-space. From this an alphabet of possibilities may be composed, and we can get info capacity. This technique is known from cypher theory. KF
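
For readers who did not follow the page-and-position scheme, here is a minimal Python sketch of such a book cipher and of the capacity estimate mentioned above. Everything in it is an assumption for illustration: the two-page "book", the function names `encode`, `decode` and `bits_per_pair`, and the choice to use the first occurrence of each character. It is not the exact scheme from post #60.

```python
import math

def encode(message, pages):
    """Encode each character as a 1-based (page, position) pair, first match wins."""
    pairs = []
    for ch in message:
        for page_no, text in enumerate(pages, start=1):
            pos = text.find(ch)
            if pos != -1:
                pairs.append((page_no, pos + 1))
                break
        else:
            raise ValueError(f"character {ch!r} not found in the shared text")
    return pairs

def decode(pairs, pages):
    """Look each (page, position) pair up in the shared text."""
    return "".join(pages[page_no - 1][pos - 1] for page_no, pos in pairs)

def bits_per_pair(n_pages, chars_per_page):
    """Raw capacity of one (page, position) pair, all pairs equally likely."""
    return math.log2(n_pages * chars_per_page)

# Hypothetical shared text held by both sender and receiver.
pages = ["the quick brown fox", "jumps over the lazy dog"]
pairs = encode("design", pages)

print(pairs)
print(decode(pairs, pages))
print(bits_per_pair(256, 4096))  # a 256-page book, 4096 characters per page
```

The number string on its own looks arbitrary; it only decodes to a message for someone holding the same text, which is the point of the example in the thread.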

  79. 79
    Dionisio says:

    gpuccio
    I think I understand this much better now, after reading your latest explanation. Most probably other readers of this thread will benefit from what you just wrote here too.
    Thank you.
    BTW, I noticed you understood well the example in post #60 and associated follow-up posts. 🙂
    The idea is that even a series of numbers that seem random may not be so, as in the given example. It is designed, even though it doesn’t look like being designed.
    Kind of the opposite to the dilemma the second and third way folks have with things that appear designed 😉
    Perhaps the SETI guys should rethink their searching algorithms too? 😉

  80. 80
    Dionisio says:

    KF
    Good point. Thank you again.
    This was one of the many methods used to pass info through the iron curtain. I know from what you have written before, that you don’t need to be told what that means.

  81. 81
    Gordon Davisson says:

    KF:

    I don’t really want to get into a side-debate on Dembski’s CSI metric,

    I agree that it’s a side-debate, unfortunately I think it’s a necessary one. If we can’t even figure out what the terms in the formula refer to, how can we possibly apply it successfully?

    but phi_S(T) is a measure of number of opportunities to observe something from T, which Dembski gives 10^120 as upper limit on following Seth Lloyd’s calc.

    No, the number of opportunities for the event to happen is N (from your Dembski quote: “…N is the number of opportunities for such events to happen…”), and this gets rolled into the 10^120 factor (based on Seth Lloyd’s calc). phi_S(T) is something entirely different:

    phi_S(T) = the number of patterns for which S’s semiotic description of them is at least as simple as S’s semiotic description of T.

    …which, as I said before, has nothing to do with the number of events being examined, and everything to do with the complexity/verbosity of the specification (aka pattern) the event matches. This might be even clearer if you look at Dembski’s example of the poker hands:

    To see that the specificity so defined corresponds to our intuitions about specificity in general, think of the game of poker and consider the following three descriptions of poker hands: “single pair,” “full house,” and “royal flush.” If we think of these poker hands as patterns denoted respectively by T1, T2, and T3, then, given that they each have the same description length (i.e., two words for each), it makes sense to think of these patterns as associated with roughly equal specificational resources. Thus, phi_S(T1), phi_S(T2), and phi_S(T3) will each be roughly equal. […discussion of the different probabilities of these patterns being matched, and hence the differences in P(T|H)’s…]

    In this example, specificational resources were roughly equal and so they could be factored out. But consider the following description of a poker hand: “four aces and the king of diamonds.” For the underlying pattern here, which we’ll denote by T4, this description is about as simple as it can be made. Since there is precisely one poker hand that conforms to this description, its probability will be one-fourth the probability of getting a royal flush, i.e., P(T4|H) = P(T3|H)/4. And yet, phi_S(T4) will be a lot bigger than phi_S(T3). Note that phi_S is not counting the number of words in the description of a pattern but the number of patterns of comparable or lesser complexity in terms of word count, which will make phi_S(T4) many orders of magnitude bigger than phi_S(T3). Accordingly, even though P(T4|H) and P(T3|H) are pretty close, phi_S(T4)·P(T4|H) will be many orders of magnitude bigger than phi_S(T3)·P(T3|H), implying that the specificity associated with T4 will be substantially smaller than the specificity associated with T3. This may seem counterintuitive because, in absolute terms, “four aces and the king of diamonds” more precisely circumscribes a poker hand than “royal flush” (with the former, there is precisely one hand, whereas with the latter, there are four). Indeed, we can define the absolute specificity of T as –log2P(T|H). But specificity, as I’m defining it here, includes not just absolute specificity but also the cost of describing the pattern in question. Once this cost is included, the specificity of “royal flush” exceeds than the specificity of “four aces and the king of diamonds.”

    The difference in phi_S for the various patterns has nothing at all to do with how many opportunities there are to observe the pattern (that doesn’t even come up in the section I quoted) and everything to do with how complex/verbose the specifications are (“royal flush” vs. “four aces and the king of diamonds”).
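
The poker comparison can be worked through numerically. The probabilities below follow from counting five-card hands; the phi_S values are NOT from the quoted passage. They are placeholders that assume roughly 10^5 available concepts per word of description, chosen only to show how description length drives phi_S. The function name `specificity` is likewise introduced here for illustration.

```python
import math

TOTAL_HANDS = math.comb(52, 5)      # 2,598,960 possible five-card hands

p_royal_flush = 4 / TOTAL_HANDS     # T3: one royal flush per suit
p_four_aces_kd = 1 / TOTAL_HANDS    # T4: exactly one matching hand

def specificity(phi_s, p):
    """sigma = -log2(phi_S(T) * P(T|H)), as in the quoted definition."""
    return -math.log2(phi_s * p)

# ASSUMED specificational resources: (10**5) ** number_of_words.
PHI_TWO_WORDS = 1e10     # "royal flush"
PHI_SEVEN_WORDS = 1e35   # "four aces and the king of diamonds"

# Absolute specificity, -log2 P(T|H): T4 beats T3 by exactly 2 bits.
print(-math.log2(p_royal_flush), -math.log2(p_four_aces_kd))

# Once the cost of description is included, the ordering reverses.
print(specificity(PHI_TWO_WORDS, p_royal_flush))
print(specificity(PHI_SEVEN_WORDS, p_four_aces_kd))
```

Nothing in the calculation refers to the number of opportunities to observe a hand; that quantity enters only through the separate 10^120 factor.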

    What you’re doing, substituting a number of possible events for phi_S, winds up double-counting the number of events (once as part of the 10^120, and again as phi_S), and ignoring the descriptive complexity of the specification. This makes no sense.

    Now, on to the elephants: your discussion below simply continues to avoid dealing with them; but ignoring them won’t make them go away. Your discussion of how to calculate probabilities continues to duck the question of relevant chance hypotheses:

    I put it to you, that a search of the field of possibilities for 500/1,000 bits under the terms used overwhelms any search by any reasonably random method of the space of possibilities for those many bits. Intelligent searches inject active information on the search space that materially changes the situation.

    Further to that, that space W contains ANY bit pattern as coded under ANY convention.

    I also put to you, that the measured info content of a system reflects the underlying probabilities under any reasonably random process of search.

    You’re assuming that evolution = unguided search = “reasonably random process of search” ~= uniform probabilities. But evolutionary theory disagrees with this, which means that you’re calculating probabilities under a chance hypothesis other than standard evolutionary theory. And that means that your calculation cannot possibly refute standard evolutionary theory.

    Now, you may argue (and have argued) that standard evolutionary theory is untenable for other reasons (e.g. that it requires active information in the fitness function, and that would require an intelligent designer). But that’s a different argument (and one that IMO has different problems); you can’t simply assume that that other argument is correct, roll that assumption into your CSI calculation, and claim that CSI refutes evolution. As I said before, you have to show that selection is irrelevant, not just assume it.

    Not to get too deep into the question of how effective selection is, but let me give you a quick summary of my view: I think it’s clear that selection helps considerably in finding functional biological systems. I think it’s also clear that selection doesn’t make the task trivial (as in Dawkins’ weasel example). So, the real question (IMO) isn’t whether selection is effective, it’s whether it is effective enough to produce the massive functional complexity we see in the biological world. Dembski’s CSI metric has potential to be useful for defining just what constitutes “effective enough”, but in order to do that it must be applied to a model that takes selection into account, not one that ignores it.

    As for the Texas sharpshooter fallacy:

    As for the suggestion of after the fact target-painting etc, the situation is, we are dealing with functionally specific, complex organisation and associated information, FSCO/I. As in, things where many parts must be correctly oriented, arranged and aligned then coupled together to achieve function. Where function is an observable and is highly constraining on acceptable possible organisations or arrangements. This naturally gives us specificity and observability independent of arrangement. Does the Abu Cardinal reel work in this clustering of its parts or does it not. T is recognisable, independent and specific. Where the dummy variable I used in my heuristic model, is dependent on observable function based on specific configuration.

    …I don’t see how this really addresses the problem. I agree that functional arrangements are rare, but in order to make your argument you need to be able to quantify exactly how rare they are. And the only ways I’ve seen that done fall into the Texas sharpshooter fallacy.

  82. 82
    gpuccio says:

    Mark:

    I have changed nothing.

    You may notice that in my post #76 I added:

    (there is also another passage about necessary explanations, but I will not deal with that here)

    In the same post I clarify that I refer to Dembski’s explanatory filter, not to other things.

    Finally, when in my post #66 I say:

    “we identify a subset which is well specified by some rule.”

    I am referring to the simple concept that a specification, whatever its nature, generates a binary partition of the original set. That has nothing to do with the possible high compressibility (regularity, computational necessity explanation) of the information we observe, which must be excluded to infer design with safety.

    Sometimes it is difficult to be brief and exhaustive at the same time.

  83. 83
    Mark Frank says:

    Gpuccio

    My sole point is that there are things you would count as examples of dFSCI which according to Dembski’s formula would not count as CSI and vice versa. Do you deny that? (You accepted it once – I thought I was only reminding you – your response was on the lines that you didn’t necessarily agree with everything Dembski has written which is fair enough)

  84. 84
    Joe says:

    Mark Frank:

    My sole point is that there are things you would count as examples of dFSCI which according to Dembski’s formula would not count as CSI and vice versa.

    I doubt that. Any examples? (I doubt that, too)

  85. 85
    Joe says:

    GD:

    You’re assuming that evolution = unguided search = “reasonably random process of search” ~= uniform probabilities.

    It isn’t a search and that is the entire problem. kairosfocus is giving unguided evolution a huge boost by calling and treating it as a search.

    But evolutionary theory disagrees with this,…

    Reference please- reference for an evolutionary theory, that is.

  86. 86
    gpuccio says:

    Mark:

    I think that I don’t understand everything that Dembski has written. Some of the things he has written, like the paper you usually refer to about specification, in the measure that I understand them, in that measure I don’t fully agree.

    As I see things, specification in general is any definition (or rule, call it as you like) that generates a binary partition in the search space, so that we can identify explicitly a target space whose probability to be found we can compute. In that sense, CSI refers to any type of specification where the target space is unlikely enough (small enough if compared to the search space).

    In this sense, functional specification is certainly a subset of CSI. A subset where the target space is defined by explicitly defining a function which has to be implemented by the object.

    If Dembski defines compressible subsets as target space, it’s fine with me in general. The only problem is that, if we want to use that kind of partition to infer design, we must be very sure that the system we are examining has no way to compute that result by some available algorithm.

    A simple example should explain it. Let’s go to proteins again. ATP synthase is, beyond any doubt, an example of dFSCI.

    But, let’s say that I have a 500 AA long protein for which I can define a biochemical function, and let’s say that such a protein is a sequence of 500 alanines. That would not be a good example of dFSCI, because any biological system which can synthesize proteins could easily generate that protein in a context where only alanine is available. IOWs, the protein, although long and potentially highly complex (the probability of getting it by chance in a truly random system, where all 20 AAs have the same probability of being used is 1:20^500!), still could easily arise by chance in certain particular contexts. The simple reason is that 500 alanines is a highly compressible sequence, and an algorithm which simply implements “add alanine 500 times” can easily generate that sequence.

    Another way to say that is that, in design inference, we are interested in the Kolmogorov complexity of the object, rather than in its apparent complexity.
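
The contrast between apparent and Kolmogorov complexity can be illustrated with a crude Python sketch. Kolmogorov complexity itself is not computable, so the sketch uses zlib-compressed length as a stand-in, which gives only an upper bound. The random sequence, the fixed seed and the name `compressed_size` are assumptions for illustration; no real protein is modelled here.

```python
import math
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes, 20 standard amino acids

def compressed_size(sequence):
    """Bytes after zlib compression: a rough upper bound on description length."""
    return len(zlib.compress(sequence.encode("ascii"), 9))

poly_ala = "A" * 500                  # "add alanine 500 times"
rng = random.Random(0)                # fixed seed so the run is repeatable
random_seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(500))

# Both sequences have the same apparent complexity under a uniform model:
raw_bits = 500 * math.log2(20)        # about 2161 bits, i.e. 1 in 20^500

print(round(raw_bits))
print(compressed_size(poly_ala), compressed_size(random_seq))
```

The poly-alanine string shrinks to a few bytes while the random one barely compresses. Note that the sketch says nothing about whether a given sequence is functional; it only separates regular sequences from irregular ones.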

    The same considerations are true for a sequence of 500 heads, which has been debated many times here. It is a sure sign of designed intervention only if we can be absolutely sure that no simple algorithm can generate it in the system we are examining (for example, we must certainly exclude that the coin is so unfair that the probability of having head is not 0.5, but 0.999999).

    So, it is true that a sequence of 500 heads is specified and cannot be explained by random events and points to design, but only if we are certain that the probability of each event in the system is really fair (with a reasonable approximation).
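
The dependence on the chance hypothesis is easy to put in numbers. A minimal Python sketch, using the two head probabilities mentioned above (0.5 and 0.999999); the function name `log10_prob_all_heads` is introduced here for illustration:

```python
import math

def log10_prob_all_heads(n, p_head):
    """log10 of the probability of n heads in a row with independent tosses."""
    return n * math.log10(p_head)

fair = log10_prob_all_heads(500, 0.5)          # about -150.5
biased = log10_prob_all_heads(500, 0.999999)   # about -0.0002

print(fair)           # 500 heads from a fair coin: roughly 1 in 10^150
print(10 ** biased)   # from the heavily biased coin: about 0.9995
```

The same observed outcome is practically impossible under one hypothesis and nearly certain under the other, which is why the system producing the sequence has to be pinned down before any inference is drawn.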

    The case of functional information, like in language or software or proteins, is the best for design inference, because that is exactly the type of information which is never highly compressible, and where only a series of cognitive considerations can, more or less easily, find the right sequence which implements the function.

    No algorithm can “easily” compute the sequence of a long functional protein which folds and has active sites and so on. We cannot yet do that efficiently, even with all our understanding and computing power. Certainly, the sequence of a functional protein can be computed, but the computing machinery and procedure are infinitely more complex than the protein itself.

    dFSCI is a sure marker of design. I prefer functional specification to any other kind of specification. Absolutely.

    Not to say that functional specification is exactly the kind of specification that we find in biology!

  87. 87
    Dionisio says:

    Gordon Davisson @ 47

    p.s. I’m not sure if that really helped at all….

    Well, not really, as you probably saw in my posts #60 and the follow-up posts.

    Your explanation was too simplistic and did not even scratch the surface of the problem I presented.
    The closest you can get to analyze that problem is through
    gpuccio’s description of functional complex specified information. And even that is not enough when you get to some biological systems, which make my example look like a newborn baby’s toy.
    Maybe someday you’ll understand what gpuccio and KF write in their insightful posts on this subject.
    Anyway, thank you for trying. Take care.

  88. 88
    kairosfocus says:

    GD: I have already cited, where it can be seen that Zones T in aggregate are taken up, and a covering number is indicated. While side track debates are possible, I have not the time for such (there are at least two crises from the past 48 hrs on my plate just now, including one with potential life or death health care implications and my first meeting for the morrow is literally at daybreak . . . ), especially as what I did with Dembski’s 2005 is to erect a model through drawing out the info beyond a threshold structure, that is more relevant to the functionally specific complexity of the world of life, is tractable empirically and is easily testable. Which BTW is close to a point GP just made. Where, any fairly complex and nuanced statement by a design thinker, obviously is going to be twisted into pretzels and obfuscated behind a cloud of objections and zero concessions tactics that would never have been in play if so much ideological baggage on the part of materialists and fellow travellers was not in play. I have put something on the table to be addressed on its merits not pretzel tactic obfuscations and side tracks. KF

  89. 89
    rich says:

    Gordon Davisson,
    I am immensely impressed with your exposition of CSI (shortcomings and all). I think it frames both CSI and FSCO/I well.
    I must commend you on an actual CSI calculation (albeit not for anything organic, sadly) a feat that is unique to you at Uncommon Descent. Surely this is grounds to grant Gordon Davisson moderator / author privileges? His posts have done more for ID as an empirical exercise than all those before IMHO.
    I agree with Gordon’s points that the following are required:
    1) An explicit statement of what specification is being used.
    2) An explicit statement of what chance hypothesis is being used.
    3) A calculation of p(T|H) which actually takes into account all ways that the specification can be met, and calculates their probability under the stated chance hypothesis.
    With 1 (Specification) do we need to specify in terms of the configuration of the entity or the function it performs? Do we need a “winning hand” or a specific sequence of specific cards?
    With (3), FSCO/I fails twice. First it is a reformulation of just ‘tornado in a junkyard’ / The odds of non-stepwise assembly. Second it doesn’t address specification at all, only that the universe can’t spontaneously create 500 bits of information, de novo. I suspect it can’t but this is not how recursive processes like evolution are supposed to happen.
    Gordon, please explore your ideas further! Kudos.

  90. 90
    Gordon Davisson says:

    KF: I certainly understand if you need to drop out of the conversation — real life always takes precedence over discussions like this (and I’ve dropped more than my share of conversations, for far less important reasons). Good luck, and take care!

    On a less friendly note, though, I have to say that I don’t appreciate your implication that I’m arguing in bad faith. I may be mistaken, but I honestly think you’re applying CSI incorrectly. I am not attempting to obfuscate the issues, I’m trying to clarify them. Especially, I do not use “zero concessions tactics”. Show me I’m wrong, and I will concede. Ask gpuccio: a while back I accused him of using a circular argument (I forget the details, but I think it was about dFSCI), he pointed out that I was misunderstanding part of his approach, and I retracted my objection. Done. Ask me about weak points in the case for evolution, and I’ll concede them (although I’ll certainly disagree with you about where & how serious they are).

    Put it this way: I realize I could be wrong (in fact, I’m sure I’m wrong about something(s), I just don’t know what). If I am wrong, I’d rather realize it and correct myself than remain comfortably ignorant. But if you want to make me realize I’m wrong you’ll have to do it by taking my view seriously, and showing carefully & clearly why it’s wrong; not just by dismissing my view out of hand.

    But that is for another time, when you have much less on your plate. For the moment, I’ll leave you with my best wishes for you & yours.

  91. 91
    Gordon Davisson says:

    Rich: thanks again, but I don’t really think I’m helping advance the state of ID very much. I think what’s really needed is for a lot more people in ID to understand CSI, and understand it well. As I said, CSI is very very hard to calculate correctly for real complex systems, so it’s of limited direct use. But the really useful thing about CSI is that in developing it, Dembski has essentially built a toolkit for detecting, avoiding, & correcting the fallacies that normally plague arguments from improbability. Even where it’s not directly applicable, parts of the CSI approach can be used to make valid improbability arguments, or to recognize when an argument cannot be made valid.

    But this really only works if the relevant understanding is widespread. People in a research community critique each others’ work, and learn from each other. If most people don’t know how to spot e.g. the Texas sharpshooter fallacy, everybody “learns” that it’s ok, and will keep making that type of fallacious argument.

    And CSI is hard to understand (properly, anyway). I don’t see it catching on, unfortunately.

  92. 92
    Gordon Davisson says:

    Actually, let me extend that last comment: statistics are hard to understand, and hard to do right. CSI isn’t an exception; at least in this respect it’s completely normal.

  93. 93
    Upright BiPed says:

    Hi Gordon,

    You once made a statement on this forum that you didn’t know of a definition of information which can be shown to be present in DNA and also cannot be produced without intelligence.

    If this is the case, did you mean you already knew of an origin of information like that in DNA that had come about without intelligence, or are you merely assuming such information can come about without intelligence?

  94. 94
    Dionisio says:

    Gordon Davisson @ 90

    Put it this way: I realize I could be wrong (in fact, I’m sure I’m wrong about something(s), I just don’t know what). If I am wrong, I’d rather realize it and correct myself than remain comfortably ignorant. But if you want to make me realize I’m wrong you’ll have to do it by taking my view seriously, and showing carefully & clearly why it’s wrong

    Post #60 and its associated follow-up posts -in response to your post #46- showed clearly that your analysis of the given message information problem* was far off the target and left many important things out of consideration.
    However, did you comment on this particular case after your posts 46 and 47?
    If not, then why not?
    (*) originally presented in posts #43 and 44
    Note that gpuccio made some interesting observations in post #76, where he also commented on the case of the problem presented in the above mentioned posts.
    KF also commented on this in his posts 75 and 78.
    Although y’all seem to understand this CSI stuff and its subsets much better than I do, the simple example presented in posts 43 and 44 shows that the jury is still out on this subject and no one knows when they will come back to declare the verdict.
    This simple example also shows that even things that don’t look designed could be designed. Actually, the given example is an adaptation of a method that was used to pass information through the ‘iron curtain’ many years ago. In lieu of the blog discussion thread they used two identical copies of the same book at both ends of the communication path. The numbers were not placed consecutively in one string, but dispersed through an innocent message, disguised as phone numbers, medicine numbers, addresses, ages, etc. More complex than the one in posts 43 and 44. But the encryption idea was similar.
    Do CSI and its subsets cover those cases too?
    How?
    Again, refer to gpuccio’s post #76.

  95. 95
    Mark Frank says:

    #86 Gpuccio

    I understand your reasoning about compressibility and I am glad you accept that you don’t fully agree with Dembski to the extent you understand him.

    It may help to understand that his concern is to define “information” as an objective intrinsic property of a string as opposed to being relative to some external specification. He needs this to go on to argue for a law of conservation of information – you can’t have a universal law of the conservation of information if it is relative to some specification. So he proposes compressibility as a specification which is an objective property of the string (he then equates this to simplicity of description which is not the same thing and is relative to the vocabulary available to the agent).

  96. 96
    Joe says:

    Compressibility is a sign of a specification of sorts. That is it. It really all depends on the scenario. CSI cannot be compressed by any algorithm- just try doing that to a computer program.

    Methinks Mark Frank doesn’t understand Dembski as the “Specification” paper is in addition to his other writings and does not replace them.

  97. 97
    Joe says:

    To Gordon and rich- No one from the materialistic position can provide any chance hypothesis for the origin of life or the subsequent evolution. That is YOUR problem, not ID’s

  98. 98
    kairosfocus says:

    In a pause in the meeting. Again, I point out that the FSCO/I concept stands on its own and is subject to, and has passed, massive empirical tests [as in a whole Internet full, libraries and a tech age full], which show it to be a reliable index of design as cause. Where the needle in haystack analysis shows why, if you can only make a very small sample of a large space with rare zones T, then you have no right to expect that tiny sample, if blind, to catch a T. Where further, it can be shown that the Dembski measure of FSCO/I is an information metric and an information beyond a threshold metric (that’s what, 3rd or maybe 4th form algebra on logs?). All I did was to provide a reasonable value for that threshold, by noting that the number of observations and that of possible cases will be such that a fixed threshold covers it adequately for practical purposes, which turns out to be 500 bits for our solar system. And for the observed cosmos, 1,000 bits. As well, I note that the 10^57 atoms of our sol system, given 10^17 s and a rxn rate for ionic rxns, with 500 coins each, would only be able to sample as 1 straw to a 1,000 LY cubical haystack, grossly too small to capture rare things, as can be seen by pulling a one straw sized sample from the 1,000 LY haystack superposed on our galactic neighbourhood. The analysis is actually obvious, as the reference to needle in haystack points out. It is time it were seriously faced. KF

  99. 99
    Dionisio says:

    Gordon Davisson
    Here’s a very important clarification from gpuccio #76
    (but also read gpuccio #66 and KF #75 for additional documentation on this subject)

    1) dFSCI can be measured only if we define the search space and the function. The target space must be computed from the defined function. If we want a binary categorization, we must also define the system where the observed object originated and the time span (that will allow the choice of an appropriate threshold). Please, remember that the final purpose of dFSCI as a tool is of rejecting the random explanation of what we observe, in order to infer design (there is also another passage about necessary explanations, but I will not deal with that here).

    2) When we evaluate dFSCI in the object where a specific sequence is written, the essential questions are:

    a) What is the search space? This can usually be approximated as the total number of possible sequences of that length.

    b) How many of those sequences can convey the message so that the function, as previously defined, is guaranteed?

    Now, if the conversion of the sequence to the final functional form needs further computations or elaborations (like DNA transcription and translation in the case of proteins, or decoding procedures like in your* case), those procedures are probably further examples of dFSCI, but they are dFSCI in the system which uses the original sequence, so they would not be computed as dFSCI of the original object/sequence.

    IOWs, when we evaluate dFSCI for an object in a system, we give the system for granted (for that specific evaluation), and focus our attention on the object itself.

    This important paragraph should help you to clarify a few important concepts you seem to be confused about, as indicated in my posts #87 and #94:

    Now, if the conversion of the sequence to the final functional form needs further computations or elaborations (like DNA transcription and translation in the case of proteins, or decoding procedures like in your* case), those procedures are probably further examples of dFSCI, but they are dFSCI in the system which uses the original sequence, so they would not be computed as dFSCI of the original object/sequence.

    Particular attention should be paid to this part:

    …those procedures are probably further examples of dFSCI, but they are dFSCI in the system which uses the original sequence, so they would not be computed as dFSCI of the original object/sequence.

    (*) ‘your case’ here refers to the example described in my posts 43, 44, 60 and associated follow-up posts.

  100. 100
    kairosfocus says:

    F/N: I pause, now that word from Guadeloupe is, a child run over Sun night and having a head injury seems okay, relatively speaking . . .

    One of the reasons for CSI is its generality. But that comes at a price of complexity, especially as linked to how to make “specificity” tractable.

    At the same time, one of the reasons for using FUNCTIONAL SPECIFICITY is that this is quite simple to understand and observe: i’ wuk/ i’ nuh wuk, is a relevant subset of specification, and is fairly easy to quantify and address.

    The tendency to run away from FSCO/I and its manifestation in digital coded form as dFSCI, to tangle up in CSI issues irrelevant to the world of life, seems to me a case of refusing to move from the simple to the more complex. Where, holding the simple case in hand, would allow a clarification of the more complex one by reference to the clarifying power of a concrete case in point.

    It should be quite obvious that common computer files are dFSCI.

    Such dFSCI is to be found in D/RNA, and in proteins by direct extension, as was noted by the discoverers of the double helix structure from March 1953 when the analogy to printed text was made.

    FSCO/I is also a focal issue WLOG, as the relevant cases are based on functionality, and as complex organisation can be reduced to code per AutoCAD etc.

    Where also, there is a simple matter of a threshold of complexity: given 10^57 atoms in the solar system or about 10^80 in the observed cosmos, 10^14 ops per second and 10^17 s, 500 – 1,000 bits of complexity gives a space so large that the feasible searches are grossly inadequate to be a plausible explanation of finding islands of specific function.
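    As a rough sketch of the arithmetic behind that threshold, the snippet below multiplies out the quoted figures and compares them with the size of a 500-bit and a 1,000-bit configuration space. The function names are illustrative only, and the calculation assumes every atom acts as an independent blind sampler for the whole time span, which is the comment's premise rather than an established model.

```python
import math

# Order-of-magnitude figures quoted in the comment above.
ATOMS_SOLAR_SYSTEM = 10**57
ATOMS_OBSERVED_COSMOS = 10**80
OPS_PER_SECOND = 10**14
SECONDS_AVAILABLE = 10**17


def max_search_events(atoms: int) -> int:
    """Upper bound on sampled states if every atom samples at full rate for the whole span."""
    return atoms * OPS_PER_SECOND * SECONDS_AVAILABLE


def fraction_searched_log10(atoms: int, bits: int) -> float:
    """log10 of (search events / number of configurations of a `bits`-bit string)."""
    return math.log10(max_search_events(atoms)) - bits * math.log10(2)


solar_log10 = fraction_searched_log10(ATOMS_SOLAR_SYSTEM, 500)
cosmos_log10 = fraction_searched_log10(ATOMS_OBSERVED_COSMOS, 1000)
print(f"solar system vs 500 bits: about 10^{solar_log10:.1f} of the space")
print(f"observed cosmos vs 1,000 bits: about 10^{cosmos_log10:.1f} of the space")
```

    The comparison is done in logarithms so that the very small fractions do not underflow.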

    [Phone conversation on yet another issue, pardon . . . ]

    Extending to the evident cosmological fine tuning and the tightness of the operating point of our cosmos that supports C-Chemistry aqueous medium terrestrial planet cell based life, not even a multiverse gets out of the domain of explaining the observation of finding ourselves in a place where a lone fly on a wall is swatted by a bullet rather than finding a place where the wall is carpeted with flies so there is no tight local fine tuning issue.

    The FSCO/I issue is not going to go away, and it is so clear that obfuscations begin to take on a different colour than one would like. Especially, if they are insistent.

    We need to return matters to due proportion and material focal points.

    KF

  101. 101
    gpuccio says:

    Gordon Davisson at #81:

    I find some of your comments very interesting, and I would like to clarify better what I believe about some of those points, hoping that it may contribute to some reciprocal understanding.

    I agree with your criticism of some contents in Dembski’s specification paper. In particular, I have always been perplexed by the appeal to simplicity of description or verbosity, for which I cannot find a good reason. That’s why I prefer to refer to previous concepts by Dembski, especially the explanatory filter.

    My idea of CSI/dFSCI is, and has always been, that it is a fundamental and efficient tool for design inference, because it allows us to quantify the improbability of an observed result in a random system. That’s the role of dFSCI: establishing a definite threshold to what random variation can do in a defined context.

    So, dFSCI, at least when used to infer design in biology, must always be referred to a specific, explicit context. I usually define the context as:

    a) An explicitly defined and measurable function for the observed object

    b) An explicitly defined physical system (where the object is supposed to have originated)

    c) An explicitly defined time span (in which the object is supposed to have originated)

    d) A computation of the dFSCI linked to the defined function (expressed as -log2 of the target space / search space ratio). That ratio is the probability of finding a functional state in one random attempt.

    e) A computation of the probabilistic resources of the system in the time span (essentially the number of attempts, or states, that the system can explore).

    f) A cutoff appropriate for the system and its probabilistic resources, sufficient to make the random emergence of a functional state for that function absolutely unlikely, given the probabilistic resources of the system in the time span.

    At that point, if the functional complexity of the object (as dFSCI for the specified function) exceeds the appropriate cutoff, we categorize dFSCI as exhibited by the object, and we can infer design…

    If we have reasonably excluded that any known algorithm can intervene to lower the probabilistic barriers that we have just analyzed.

    At that point, design is the best explanation available.
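    Steps d) to f) above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are invented here, the cutoff is modelled as "resources in bits plus a safety margin" (one possible reading of step f), and the example numbers are hypothetical. The final proviso about excluding known algorithms is a judgment that the code does not capture.

```python
import math


def dfsi_bits(target_states: int, search_states: int) -> float:
    """Step d): -log2 of the target space / search space ratio."""
    # Take logs of the integers separately so huge spaces do not overflow a float.
    return math.log2(search_states) - math.log2(target_states)


def resource_bits(attempts: int) -> float:
    """Step e): probabilistic resources expressed in bits (log2 of the number of attempts)."""
    return math.log2(attempts)


def exhibits_dfsci(target_states: int, search_states: int,
                   attempts: int, margin_bits: float) -> bool:
    """Step f): binary categorization against an assumed cutoff of resources plus a margin."""
    cutoff = resource_bits(attempts) + margin_bits
    return dfsi_bits(target_states, search_states) > cutoff


# Hypothetical inputs, chosen only to exercise the functions.
search = 20**150   # all amino acid sequences of length 150
target = 10**50    # assumed number of functional sequences
attempts = 10**40  # assumed number of states the system can explore

print(round(dfsi_bits(target, search), 1))
print(exhibits_dfsci(target, search, attempts, margin_bits=20))
```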

    A couple of details:

    1) The above reasoning applies only to transitions which happen randomly. If and when a transition can be explicitly deconstructed into two or more smaller transitions (for example, by showing a naturally selectable, functional intermediate sequence state), then the dFSCI computation can still be applied, but it must be applied to each of the two (or more) smaller transitions. The algorithmic role of NS for the functional intermediate must be analyzed according to other criteria (is it credible? is it empirically verified?), but is not essentially a probabilistic issue.

    2) In biological scenarios, like the emergence of a new protein domain, the random search is usually supposed to be a random walk. But the probabilistic evaluation is rather similar. Obviously, a more or less uniform probability distribution is assumed for the different states that the system can reach. That is perfectly reasonable for a random walk in the space of nucleotide sequences. Indeed, the transition we should analyze is usually a transition from a sequence-unrelated state to the functional state (IOWs, the emergence of a new functional sequence). I have argued many times that in a random walk in a sequence space, the probability of reaching some state from one unrelated to it at sequence level must be considered somewhat lower than 1/N (because related states have higher probability of being reached).

    3) There are many reasons why NS cannot help in generating complex function in biology. For brevity, I will not discuss them here now. But you are right, those reasons are not related to probability and CSI. Probability and CSI tell us that RV cannot provide the functional states that NS should select, if they are too unlikely. The criticism of NS shows that complex functions cannot be deconstructed, as a rule, into simpler functional selectable states.

    4) NS is indeed an example of “added information in the fitness function”. In the sense that the very limited powers of NS derive only from the existence of replicators which compete for environmental resources. In a sense, the complex functional information which allows biological replication provides NS with the very limited power to “select” what favors reproduction. That’s all. But that quantity of “added information” can never explain the emergence of truly new complex functional information.

    You ask “whether NS is effective enough to produce the massive functional complexity we see in the biological world”. For many reasons, both empirical and logical, the answer is absolutely not. But again, I will not discuss that here and now. Maybe in the next posts! 🙂

    Well, I would like to stop here for the moment. I want to say, however, that even if for me the important use of CSI/dFSCI is merely along the empirical lines that I have tried to sketch, still I believe that Dembski is right in believing that there is some more universal formulation about CSI that can be applied or demonstrated in more abstract terms. IOWs, I intuitively believe (but could not demonstrate as a universal statement) that, given a very strict threshold of dFSCI (let’s say Dembski’s 500 bits) and original dFSCI (the emergence of a function and a sequence that did not exist in the system considered), no system in the world can explain that kind of result without appealing directly or indirectly to conscious design. That’s probably the meaning of Dembski’s law of conservation of information. I don’t know if he has demonstrated that universal law (probably not in the paper about specification), or if he will succeed in doing that. But I do believe that such a universal principle does exist.

    However, there is no need of that universal demonstration to safely infer design for biological objects. A much simpler empirical approach through functional information and correct methodology can do that perfectly.

  102. 102
    Gordon Davisson says:

    Dionisio:

    Post #60 and its associated follow-up posts -in response to your post #46- showed clearly that your analysis of the given message information problem* was far off the target and left many important things out of consideration.
    However, did you comment on this particular case after your posts 46 and 47?
    If not, then why not?

    Hmm, I don’t think #60 et al showed that I was off target; I think it’s more accurate to say that #60 (and the following) showed that Dembski’s CSI isn’t suitable for your example. Here’s the thing: CSI aims to be a design detection criterion in the sense that the presence of CSI implies that design is present; it explicitly does not aim to be a criterion in the sense that design implies that CSI is present. In your example, it’s clear that the digit string is designed, but it doesn’t have CSI… and this is entirely normal.

    You mentioned that “This simple example also shows that even things that don’t look designed could be designed.” You’re exactly right here, and CSI doesn’t claim to solve this problem.

    In your example, there are two big reasons your digit string doesn’t have CSI: first, Dembski’s measure looks for patterns that can be very simply described. In the excerpt from Dembski I quoted in #81, for example, he compares the specificity of the poker hand specifications “royal flush” vs. “four aces and the king of diamonds” — “four aces and the king of diamonds” is more specific, but because it’s also more verbose it winds up having lower CSI. In your case, a full description of the pattern would fall into the same trap of being very specific, but winding up with low CSI due to its verbosity (aka descriptive complexity).

    The other reason is simply the message size. CSI only becomes positive when the probability (adjusted for descriptive complexity) drops below 1 in 10^120 (a form of universal probability bound), and your messages simply weren’t long enough to be that improbable. Longer examples of the same pattern would have made the cut, and interestingly if they were long enough the CSI with the verbose description would eventually wind up giving higher CSI values than the simple number-of-zeros specifications I used. The reason for this is that the verbosity “cost” only gets paid once, but the increase in specificity would “pay back” proportional to the message size.
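    The message-size point can be illustrated with the 2005 expression χ = –log2[10^120 · ϕS(T) · P(T|H)], evaluated in log space. Everything numeric below is hypothetical: the two specifications, their ϕS(T) values and the fraction of digits each one pins down are invented to show the shape of the trade-off, not taken from the example under discussion, and a uniform chance hypothesis over decimal digits is assumed.

```python
import math

LOG2_10 = math.log2(10)


def chi_bits(log10_phi: float, log10_p: float) -> float:
    """chi = -log2(10^120 * phi_S(T) * P(T|H)), computed from base-10 logs."""
    return -(120 + log10_phi + log10_p) * LOG2_10


def chi_for_digit_string(n_digits: int, log10_phi: float, pinned_fraction: float) -> float:
    """Model P(T|H) as 10^-(pinned_fraction * n_digits) for a string of decimal digits."""
    return chi_bits(log10_phi, -pinned_fraction * n_digits)


# Hypothetical specifications (numbers invented for illustration).
VERBOSE = dict(log10_phi=20, pinned_fraction=1.0)  # long description, pins every digit
SIMPLE = dict(log10_phi=5, pinned_fraction=0.5)    # short description, pins half the digits

for n in (20, 140, 300):
    v = chi_for_digit_string(n, **VERBOSE)
    s = chi_for_digit_string(n, **SIMPLE)
    print(f"n={n}: verbose chi={v:.1f} bits, simple chi={s:.1f} bits")
```

    With these made-up numbers the verbose description scores lower for short strings and higher for long ones, since its fixed cost is paid once while its extra specificity grows with length.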

    Re your references to gpuccio’s comments: I’ll try to get a chance to reply to him directly (he’s raised some interesting questions), but be aware he’s talking about something different than I’ve been. I’ve been specifically addressing Dembski’s definition of CSI; gpuccio is talking about his dFSCI metric, and while they’re both centered around the same general idea, they’re very different in how they handle the details. For instance, they handle descriptive complexity in completely different ways: Dembski’s CSI penalizes it, while dFSCI requires it. There are actually good reasons for both approaches (!), but for the moment I’ll just point out that it’s an example of how different the two are.

    (BTW, there’s also a bit of terminology confusion here, because gpuccio tends to use “CSI” to refer to the basic idea that both Dembski’s and his are variants of. Unfortunately, that leaves him without a good specific term for Dembski’s version. I, on the other hand, tend to reserve “CSI” for Dembski’s metric alone… which leaves me without a good blanket term that covers all of the variants.)

    IMO, there’s actually a much deeper and more important difference between Dembski’s CSI and dFSCI: Dembski’s metric is designed to work as the basis of a theoretical argument that it can’t appear by natural means, while dFSCI is designed to work as the basis of an empirical argument. That is, with Dembski’s approach the idea is to say that we can rule out naturally-produced CSI by pure logic, while gpuccio’s approach is to say we can rule it out because we’ve never seen it happen. This difference actually drives a major difference in emphasis between the two.

    As I see it, these CSI-like measures fall on something like a spectrum, from measures like Durston’s Fits that’re easy to measure but hard to base a probability calculation on, to measures like Dembski’s that’re hard to measure but easy to base a probability calculation on. Basically, computing the relevant probabilities is difficult (see, e.g. my two “elephants”): Dembski’s approach forces you to deal with the difficulties up front, while Durston’s simply ignores them.

    But for gpuccio’s empirical approach, the theoretical rigor of Dembski’s calculation is basically irrelevant. For his approach, the important thing is that you be able to search for counterexamples (examples of dFSCI produced by natural processes), and that means it must be easy to measure on lots and lots of potential examples. And that means he needs something far far over to the Durston end of that spectrum. Dembski’s metric, because it’s so difficult to calculate, would be totally unsuitable for use in gpuccio’s argument.

  103. 103
    Dionisio says:

    Gordon Davisson @ 102

    Thank you for responding.

    gpuccio might want to comment on what you wrote about dFSCI.

    I want to comment on this too, but will try to write it tomorrow.

  104. 104
    Dionisio says:

    Gordon Davisson @ 102

    What I meant by you being off the target is that your quick reaction to my comment #43 was to apply formulae that don’t apply to my example, because an important part of the information associated with the given string of digits, is the set of instructions on how to process the string in order to translate it and the purpose of such encryption. Those formulae don’t seem to help describe the associated instructions or the purpose.
    gpuccio commented on this in his last few posts in this thread.
    Note that in the given example one needs to know the source of the characters and the method to select the characters.
    The source could be a book, a newspaper, a magazine. The method is (page # [char position within the given page])

  105. 105
    Dionisio says:

    Gordon Davisson @ 102
    I’ll try to write more on this tomorrow.

  106. 106
    gpuccio says:

    Gordon Davisson:

    Thank you for your #102. Although addressed to Dionisio, it is also a very good comment on my #101.

    First of all, I am really grateful to you because you seem to understand my personal approach much better than many others have. That is, for me, an absolute gratification. 🙂

    As I agree with most of what you say, I will only try to address a few points about which I think it is worthwhile to go deeper, or where I am not sure I completely understand your view.

    First of all, it is perfectly true that:

    “there’s also a bit of terminology confusion here, because gpuccio tends to use “CSI” to refer to the basic idea that both Dembski’s and his are variants of. Unfortunately, that leaves him without a good specific term for Dembski’s version. I, on the other hand, tend to reserve “CSI” for Dembski’s metric alone… which leaves me without a good blanket term that covers all of the variants.”

    I would only say that the concept of CSI (maybe not the exact acronym) has been indeed used before Dembski, even if Dembski has the great merit of having clarified many aspects of the idea and made it popular. And that Dembski has widely used the general concept of CSI before trying to “fix” the concept of specification with the well known metrics in the well known paper. So, I believe that identifying the concept of CSI, even only as an acronym, with the specific metrics in that paper (which, I believe, many of us, including apparently you and me, consider at best perplexing, at worst wrong) is not practical, and probably penalizing for the whole ID debate (it’s no accident that our neo darwinian adversaries love to go back to that paper!).

    CSI is a fundamental concept of ID theory: the basic notion that we can measure the minimal complexity linked to some acceptable specification, any simple rule that generates a binary partition in the set of possible events without having to enumerate each single event (that’s as near as I can go to Dembski’s concept of “verbosity”).

    In this very wide sense, CSI is the most general set of specified information, because we leave the definition of specification open, without any loss of generality. In this sense, functional information is certainly a subset of CSI (a specific subset of the possible rules of specification).

    Now, some comments about the Durston approach. I use it often as an example of how it is possible to compute dFSCI “easily” for protein families. However, in principle it is always “possible” (not easy or empirically possible) to compute dFSCI in all cases by just testing all possible outcomes for the function. That cannot be really done because of the big numbers we are dealing with in the complex cases, but it shows that dFSCI is a very concrete concept, and that it measures a property which really exists (something that my interlocutors have tried many times to deny).

    Now, if we want to be precise, and call “dFSI” the continuous measure, and “dFSCI” the binary categorization, it is very easy to compute dFSCI in simple systems, where the number of outcomes can be directly tested for a function. In those cases, a direct measure of dFSI can be easily obtained, even if it will never be complex enough to allow a design inference. But that is proof that dFSI is real, and so is dFSCI. dFSCI is only more difficult (impossible) to be measured directly.
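    For a small system, the direct measurement described here amounts to enumerating every sequence and counting the ones that pass a function test. The sketch below does that for a toy case; the helper name and the toy "function" (a six-letter DNA string counts as functional if it starts with ATG) are assumptions made purely for illustration.

```python
import itertools
import math
from typing import Callable, Sequence, Tuple


def dfsi_by_enumeration(alphabet: Sequence[str], length: int,
                        is_functional: Callable[[Tuple[str, ...]], bool]) -> float:
    """Test every sequence of the given length and return -log2(functional / total)."""
    total = 0
    functional = 0
    for seq in itertools.product(alphabet, repeat=length):
        total += 1
        if is_functional(seq):
            functional += 1
    if functional == 0:
        raise ValueError("no functional sequence found; dFSI is undefined for this test")
    return -math.log2(functional / total)


# Toy function test (hypothetical): the sequence must begin with A, T, G.
def starts_with_atg(seq: Tuple[str, ...]) -> bool:
    return seq[:3] == ("A", "T", "G")


toy_dfsi = dfsi_by_enumeration("ACGT", 6, starts_with_atg)
print(toy_dfsi)
```

    The loop visits 4^6 sequences here; the same approach becomes infeasible as the length grows, which is the point made in the comment.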

    But, luckily, it can be measured, or at least approximated, indirectly, like many other quantities in empirical science.

    Now, Durston’s method is a very good shortcut, but not the only one. Much information about dFSCI in proteins can be derived from what we know of protein functional space (if we look at things without the neo darwinian bias). And the more our understanding of those things grows, the more it will be obvious that most basic protein domains do exhibit dFSCI.

    But Durston’s approach is perfectly valid. OK, it implies some assumption, mostly that the functional space of the protein families has been traversed more or less completely during evolution by neutral variation, so that the variants we observe today are a reliable sample of all possible functional variants. That assumption can be more or less exact. But the fact remains that, given that assumption, Durston’s fits are very simply a measure of dFSCI in protein families. To understand why (given that obviously he does not use the term dFSCI in his paper, and he does not use my definitions) one has to understand well what he is doing, but it is easy to show that what he is measuring (approximating) is exactly dFSCI.

    In a sense, dFSCI always tends to increase with the length of the functional protein, but with another variable acting, which could be described as the functional relevance of each AA site. That’s exactly what Durston measures.
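    A simplified sketch of that idea: for each aligned site, take the drop in Shannon entropy from a ground state (assumed here to be uniform over 20 amino acids) to the observed distribution, and sum over sites. This is only a reading of the per-site calculation under that assumption; the published method involves further steps, and the three-residue alignment below is invented.

```python
import math
from collections import Counter
from typing import Iterable, List


def site_entropy(column: Iterable[str]) -> float:
    """Shannon entropy (bits) of the residues observed at one aligned site."""
    residues = list(column)
    counts = Counter(residues)
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fits(alignment: List[str], alphabet_size: int = 20) -> float:
    """Sum over sites of (ground-state entropy - observed entropy), in bits."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    ground = math.log2(alphabet_size)  # assumed uniform ground state
    return sum(ground - site_entropy(col) for col in zip(*alignment))


# Hypothetical toy alignment: site 1 fully conserved, sites 2 and 3 variable.
toy_alignment = ["MKA", "MKG", "MRA", "MKA"]
print(round(fits(toy_alignment), 2))
```

    A fully conserved site contributes the maximum log2(20) bits, and a site that varies freely contributes close to zero, which matches the "functional relevance of each AA site" described above.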

    I have given many times an extreme example, where even Durston’s method is not strictly necessary, because the computation of a minimal dFSCI value is rather immediate and obvious. I am referring to the hexameric structure of ATP synthase, made of the alpha and beta subunits.

    I have argued, for example here:

    http://www.uncommondescent.com.....on-part-1/

    that a very simple alignment of those two protein sequences using one in archaea, one in bacteria and the human one, shows 378 perfectly conserved aminoacid positions (and that is just a part of the whole functional molecule).

    Now, I think that we can probably agree that 378 identities from LUCA to humans are very difficult to explain, unless we are dealing with AAs which are essential for the function. So, we can easily assume that, more or less, at least 378 specific AA positions are necessary, exactly as they are, for the function.

    That represents a minimum value of dFSI of about 1600 bits of functional information, well beyond Dembski’s UPB.

    That’s just to show how easy it is to approximate dFSCI in some cases.

    Moreover, I have argued many times that the UPB is excessive as a threshold, if our aim is to infer design for biological objects. I have proposed a more realistic bound of 150 bits (about 35 specific AAs), which makes the emergence of a specific outcome unlikely enough to be easily rejected, considering the probabilistic resources of our planet (OK, it is a very gross estimation, but it is based on very generous assumptions for those probabilistic resources!).
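    The two figures above follow from counting log2(20) bits for each fully specified amino acid position. A minimal check, with names chosen here for illustration and the 500-bit value taken as the UPB threshold mentioned earlier in the thread:

```python
import math

BITS_PER_FIXED_AA = math.log2(20)  # one of 20 amino acids fully specified
UPB_BITS = 500                     # Dembski's bound as cited in this thread


def bits_for_fixed_positions(n_positions: int) -> float:
    """Functional information if n positions must each hold one specific amino acid."""
    return n_positions * BITS_PER_FIXED_AA


atp_synthase_bits = bits_for_fixed_positions(378)
proposed_bound_bits = bits_for_fixed_positions(35)
print(f"378 conserved positions: {atp_synthase_bits:.0f} bits")
print(f"35 specific positions: {proposed_bound_bits:.0f} bits")
print(f"exceeds the 500-bit bound: {atp_synthase_bits > UPB_BITS}")
```

    This treats every conserved position as strictly required, which is the assumption stated in the comment.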

    So, I would be grateful if you could explain better why you state:

    “measures like Durston’s Fits that’re easy to measure but hard to base a probability calculation on”.

    Also, I am not sure that I understand what you mean in this final comment about my position:

    “And that means he needs something far far over to the Durston end of that spectrum.”

    Except for these points, I wholly agree with all that you say. Thank you again for your very deep and clear views.

  107. 107
    kairosfocus says:

    GD:

    Specified complexity or complex specified info pre-date the ID scientific school of thought, especially in light of Orgel, Wicken and Hoyle across the 70’s into the 80’s.

    The concept of a definable characterisable narrow zone in a field of possible configs traces to statistical thermodynamics and is relevant to the statistical grounding of the second law. That one may see that a blind sample or random walk of limited scope will be challenged to plausibly capture such zones follows from simple probability of sampling.

    I have used the idea of dropping darts from a height to a chart of a bell curve carried out several SDs. It is easy to get the bulk, but to catch the truly far skirt with any reasonable number of tosses is harder. This is in fact a key insight of traditional hyp testing.
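    The dart illustration can be put in numbers with the normal tail probability. The sketch assumes each toss is an independent draw from a standard normal distribution, which simplifies the chart picture; the function names and the one-million-toss figure are illustrative.

```python
import math


def two_sided_tail_probability(k_sd: float) -> float:
    """P(|Z| > k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k_sd / math.sqrt(2))


def expected_hits(k_sd: float, tosses: int) -> float:
    """Expected number of tosses landing beyond k standard deviations."""
    return tosses * two_sided_tail_probability(k_sd)


for k in (1, 3, 5):
    p = two_sided_tail_probability(k)
    print(f"beyond {k} SD: p = {p:.3g}, expected hits in 10^6 tosses = {expected_hits(k, 10**6):.3g}")
```

    With a million tosses the bulk is hit constantly, while the region beyond 5 SD is expected to be hit less than once.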

    In that context, WmAD has sought to construct a metric model, esp c 2005.

    One may wish to debate the merits and demerits of such, maybe with some profit.

    But if one conflates the discussion of that with the wider issue of searching large spaces of configs of a system blindly, hoping to catch narrow zones in it on available search resources, one runs the risk of a strawman fallacy.

    On the characterisation of zones T, a short description that specifies the zone (and like ones of similar character) is to be seen as contrasted with effectively being forced to list its members. This is a big part of the paint the target around where you already hit issue.

    While WmAD is interested in general characterisations that then may be subject to all sorts of fruitless debates, in fact from the days of Orgel and Wicken, function dependent on specific, complex arrangement, has been recognised as the material case of interest for focussed discussion.

    Where, function can easily be recognised.

    So, the material issue is in fact FSCO/I, especially its digitally coded form, dFSCI. Which is right there in the world of life in vital contexts: D/RNA, protein and enzyme synthesis, proteins themselves. Involving codes, prescriptive instructions, algorithms, initiation, continuation and halting, etc. Carried out in organised, co-ordinated execution machines based on the same molecular nanotech.

    And, involving a kinematic von Neumann Self Replication process.

    Bring this to bear in light of the search space challenges involved, and the further fact that protein fold domains are deeply isolated in AA sequence space.

    Then address the empirical observation on the only empirically known source of FSCO/I. Do so, in light of the needle in haystack blind search challenge and the credible range for an initial genome: 100 – 1,000 kbits of information.

    KF

  108. 108
    kairosfocus says:

    F/N: It is probably worth noting that Durston et al are working on the premise that mechanisms for genomes and proteins by extension that were likely enough to be observable on solar system or observed cosmos scope, would be reflected in the range of protein variation in a given family of proteins as observed in actual life forms. This implicitly captures probabilities that are practically relevant. Also, the info-probability relationship can be worked both ways, up to the “fat zero” of practically feasible possibilities. KF

  109. 109
    Dionisio says:

    gpuccio
    Thank you for writing another detailed explanation of the discussed subject.

    Is the protein functionality associated with its 3D shape?
    Is the 3D shape related to the AA sequence?
    Does a given AA sequence produce the same 3D shape?
    What is the role of the chaperones? Are they related to the folded 3D shape?
    How is all the above quantified or accounted for in dFSCI ?

  110. 110
    kairosfocus says:

    D:

    >> Is the protein functionality associated with its 3D shape?>>

    a –> It has to fold to a key-lock fitting shape as a first step to functioning. It may need other AA chains, and there may be additions that give chemically active clefts etc.

    >>Is the 3D shape related to the AA sequence?>>

    b –> Yes, a balance of forces, “seeking” a reasonably stable 3-D config

    >>Does a given AA sequence produce the same 3D shape?>>

    c –> I presume, always. No, often there is a chaperoned fold to the required shape and other patterns are possible.

    d –> Prions (as in mad cow disease and maybe Alzheimer’s), are mis-folded, and are MORE stable than the biofunctional shape.

    >>What is the role of the chaperones? Are they related to the folded 3D shape?>>

    e –> They enable proper folds for biofunction, IIRC 10% of proteins.

    >>How is all the above quantified or accounted for in dFSCI ? >>

    f –> To a limited degree, in the requisite of FUNCTION, which is linked to specific clusters of sequences that allow fold and relevant key-lock fit and function.

    g –> Recall, the issue is, a string, information-bearing structure within narrow functionally specific zones T in wider spaces of possible string states.

    h –> The ATP synthetase either works or it does not. If not, there is no factory for the energy battery molecules for the many endothermic rxns involved in life. Almost instant death — as IIRC cyanide poisoning shows.

    KF

  111. 111
    gpuccio says:

    Dionisio:

    a) Yes, the protein functionality depends critically on the 3D shape, but obviously also on the biochemical properties of individual AA residues. The 3D shape has at least two important levels: the general folding of the molecule, and the active site. We could say that the general folding of the molecule determines the active site and its form. The biochemical activity of the active site, obviously, depends both on its 3D form and on its biochemical nature. Moreover, in many cases when the active site interacts with its biochemical target, the general 3D configuration of the whole molecule changes, and that can activate or repress other active sites in the molecule.

    b) Yes, the 3D shape is strictly related to the AA sequence. The AA sequence (the primary structure of the protein) determines both the secondary structure (alpha helices, beta sheets) and the tertiary structure (the general 3D folding). However, the 3D shape of a protein must not be considered as a static condition: it is very dynamic.

    Folding is a very complex process, and it is very difficult to compute it from the primary structure (the sequence). It depends on many biochemical variables. It happens very efficiently and rapidly in functional proteins, but in many cases it requires the help of chaperones.

    c) Yes, in general I would say yes. Maybe there are some exceptions, but in general that is the case. But even small modifications can change the folding very much (or, sometimes, not at all). Moreover, post-translational modifications, which do not change the AA sequence, but act differently, for example by adding other molecules to the protein, can change the 3D structure and the function very much.

    d) Chaperones are proteins which help other proteins in the folding process, mainly by preventing wrong folding (but also in other ways). Not all proteins require chaperones to fold correctly, but many do.

    e) dFSCI can be applied to many contexts and many systems. I generally use it to compute the functional information of protein sequences, mainly basic protein domains or protein superfamilies. The reason for that is that they are the best scenario, at present, for the design inference, because much is known in detail about protein sequence and function. In this case, the sequence of the protein (or of its coding gene, which is more or less the same thing) is the functional object for which we compute dFSCI. IOWs, what we are asking is: this new functional sequence which was not present at time T0, originated in this system between time T0 and time T1. Given for granted all that was already present in the system at time T0, could this specific new functional sequence arise by random variation? If that sequence, for that system, exhibits dFSCI (and no functional intermediates are known, so that we have no explicit NS model that can help its emergence) then we can infer design for it as the best explanation.

    As you can see, the sequence is the real vehicle of the function. We are dealing with digital information here, and therefore the sequence is all, like the sequence of bits in a piece of software is all. In this scenario, the machinery which contributes to the new function is supposed to be already in the system at time T0, and therefore does not contribute to the computation of dFSCI for this specific transition (the emergence of the new protein).

    If we want to apply the concept of dFSCI to more complex systems (let’s say a regulatory network), we need to know the sequence modifications in physical objects (if we are dealing with digital information), or more in general the modifications in physical objects (if we are dealing with analog information), that are necessary to implement the new regulatory function which was not present at T0 and is present at T1. Then, again, we compute the ratio between the total number of states (of all the objects involved) which are compatible with the new function and the total number of possible states. Finally, we use an appropriate threshold which takes into consideration the probabilistic resources of the whole system (IOWs, the number of possible different states reached by the system through RV in the allotted time span) and evaluate whether the functional transition in the system exhibits dFSCI.

    That is perfectly possible, but you can see that it is more complex. That’s why I stick to single proteins to test the concept of dFSCI. However, it is rather obvious that more complex systems, and especially regulatory networks, are certainly very good examples of dFSCI. Maybe, as our understanding grows, it will become easier to compute dFSCI for them too.

    By the way, have you seen the new ENCODE articles in the current issue of Nature? 🙂

  112. 112
    Dionisio says:

    KF
    Thank you for responding to my questions. I appreciate it very much.

  113. 113
    Joe says:

    gpuccio- your countryman, Giuseppe Sermonti, has a chapter titled “What teaches proteins their shapes” in “Why is a Fly Not a Horse?” It seems it isn’t as simple as saying that AA sequence dictates the shape.

  114. 114
    Dionisio says:

    gpuccio

    Thank you very much for the detailed explanation of the questions I posted. Again!

    To me there are concepts and terms I have to chew well in order to digest them correctly. Sometimes I may ask what seem like redundant, rhetorical or even ‘duh!’ questions, but I want to ensure I’m understanding well what you write in other comments you wrote to GD and other interlocutors here.

    Hopefully many visiting onlookers are learning as much as I do from reading your and KF’s commentaries in this thread.

    I have not read the new ENCODE articles in the current edition of Nature yet, but now that you’ve asked, I’m going to read it next!

    Mille grazie, caro Dottore!

    P.S. I’m recovering from a dental surgery for extracting a cracked molar (last on the left lower side).

  115. 115
    gpuccio says:

    Joe:

    Nobody is saying it is simple. But in general, there is a very strong relationship between sequence and shape. I don’t think that anybody can deny that. Many monogenic diseases are caused by single amino acid mutations which negatively affect the protein’s shape and function.

  116. 116
    gpuccio says:

    Dionisio:

    I hope you will recover quickly! 🙂

  117. 117
    kairosfocus says:

    D: OUCH! Trust you get better soon, but I guess this thread helps ease “de pains.” KF

  118. 118
    Dionisio says:

    GP and KF
    Thank you.

    P.S. maybe KF would host a UD meeting in the Caribbean, so we can chat by the beach? 😉

  119. 119
    rich says:

    Hope the real world is treating everyone, KF especially, better.

    I think both CSI and FSCO/I are flawed. This is evidenced by the fact that no-one is using them for design detection.

    At least CSI has a mechanism that allows for the creation mechanism (I think Dembski was thinking evolution) as the null hypothesis.

    FSCO/I seems to argue only against spontaneous assembly; it is a reformulation of Hoyle’s arguments with the addition of the UPB as a ceiling. It is unusable and not relevant to evolution, so we should return to CSI as a more honest approach. FSCO/I at its core has a straw man.

  120. 120
    Gordon Davisson says:

    gpuccio:

    Thank you for your #102. Although addressed to Dionisio, it is also a very good comment on my #101.

    First of all, I am really grateful to you because you seem to understand my personal approach much better than many others have. That is, for me, an absolute gratification.

    I’m glad you liked it. But I’m also a little disturbed, because I don’t feel I have a really good grasp of your argument: general outline, sure, but the details matter, and I don’t have them properly sorted out in my head (your #101 actually confused me significantly; I’ll try to ask a coherent question about this later in this message). And if I don’t really understand it that well… does anyone else?

    I do want to comment, at least briefly, on some of the points you raise:

    So, I would be grateful if you could explain better why you state:

    “measures like Durston’s Fits that’re easy to measure but hard to base a probability calculation on”.

    Let me give a simple (and completely made-up) example to illustrate the issue. Suppose we had a 750-base-pair gene (I know, pretty small, but as I said this is a simple example), meaning its total information content was 1500 bits. Let’s say that 250 of those bases are fully required for the gene’s function, and the other 500 could be anything without changing its function. That means we should get around 500 Fits for this gene (assuming the sequence space is fully explored etc). But that doesn’t mean that the probability of something like this evolving is 1 in 2^500, because there are other factors you need to take into account.

    First, there are many different functions that a gene might have, and we’re only looking at the probability of that particular one evolving. Let’s say, for the sake of illustration, that there are 2^50 possible functions that a gene might perform.

    Also, there are likely to be many different ways that a gene might perform any of these functions. I’m not talking about minor sequence variation (Durston’s method accounts for those), but e.g. the difference between the bacterial and archaeal flagella — very different structures, but essentially the same function. Again for the sake of illustration, let’s say there are 2^250 possible structures corresponding to each of those functions (and to oversimplify even further, assume each has the same degree of functional restriction, i.e. the same Fits). That means that while each possible gene’s “functional island” corresponds to only 1/2^500 of the sequence space, a total of 1/2^200 of the sequence space corresponds to some function.

    But not all sequences are equally likely, because selection biases the search toward the sequence space near other successful solutions, and functional sequences seem to cluster together (e.g. gene families). Let’s say (again, all numbers made up for the sake of illustration) that this increases the “hit rate” for functional sequences by a factor of 10^100. That means that while functional sequences make up only 1/2^200 of the sequence space, evolution stumbles into one every 1/2^100 or so “tries”.

    There appear to be about 5e30 ~= 2^102 bacteria on Earth (finally, a non-made-up number!)… after some fiddling about the number of mutations (/new potential genes) per bacterium per generation, that means we’d expect roughly one new functional gene per generation. That’s the real probability calculation I was talking about.

    Or rather, it would be the calculation I was talking about if it had real numbers, rather than just made-up-out-of-thin-air ones. And I have no idea what the real numbers are. I don’t think there’s any way to get a good handle on them until we know far more about the large-scale shape of the fitness function (i.e. the mapping between sequence and function) than we do now. But if you want to do a probability argument, you pretty much need them.
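
    For readers who want to follow the arithmetic, the made-up example above can be written out in log2 terms as a short Python sketch. Every figure except the 5e30 bacteria count is the illustrative, invented value from the comment. The selection boost is taken as 2^100 rather than 10^100, following the author’s own correction in the next comment, and the variable `tries_per_bacterium` is an extra assumption standing in for the “fiddling” about mutations per generation.

```python
import math

def expected_new_genes_log2(tries_per_bacterium: float = 1.0) -> dict:
    """Reproduce the illustrative calculation; all inputs but the bacteria count are made up."""
    gene_fits = 500                # 250 fully constrained bases x 2 bits each
    log2_functions = 50            # made-up number of possible functions
    log2_structures = 250          # made-up structures per function
    log2_selection_boost = 100     # made-up, using the corrected 2^100 figure
    bacteria = 5e30                # estimated bacteria on Earth

    log2_island = -gene_fits                                    # one functional island
    log2_any = log2_island + log2_functions + log2_structures   # any function at all
    log2_hit_rate = log2_any + log2_selection_boost             # with selection bias
    log2_tries = math.log2(bacteria) + math.log2(tries_per_bacterium)
    return {
        "log2_any_function": log2_any,
        "log2_hit_rate": log2_hit_rate,
        "log2_bacteria": math.log2(bacteria),
        "log2_expected_hits": log2_tries + log2_hit_rate,
    }

result = expected_new_genes_log2()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

    With one try per bacterium the expected count comes out at a few new functional genes per generation, i.e. the same order of magnitude as the “roughly one” quoted above.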

    Dembski’s CSI formula provides a good framework for handling these factors. phi_S(T) provides an upper bound on the number of functions that can be given at-least-as-simple descriptions (i.e. the 2^50 factor in my example), and the other two need to be taken into account when calculating P(T|H). But if you use a formula that doesn’t include them…

    Also, I am not sure that I understand what you mean in this final comment about my position:

    “And that means he needs something far far over to the Durston end of that spectrum.”

    Except for these points, I wholly agree with all that you say. Thank you again for your very deep and clear views.

    Here’s where your #101 confused me, because it seems to conflict with what I thought I knew about your approach. My understanding from previous discussion (which I admit I haven’t always followed in full detail) was that your argument that dFSCI can’t be produced without intelligence is that it hasn’t been observed to. In order to make a solid case like this, you really have to be able to go on and say “…and if it could be produced naturally, we would have observed it.” And that means we must have tested many cases, and (in each of those cases) be able to tell if dFSCI has appeared.

    Dembski’s CSI doesn’t allow this. According to him, “Does nature exhibit actual specified complexity? The jury is still out.” (from “Explaining Specified Complexity”, quoted via Winston Ewert since the original isn’t loading for me at the moment). All of the functional complexity in the biological world, and Dembski isn’t sure if it actually qualifies as CSI (in his sense) because of unknowns like the ones I’ve been harping on. If he can’t tell for sure if a human exhibits CSI, what hope have you of telling if this new mutation you just observed does?

    So, for your purposes (at least, as I thought I understood them), you need something that doesn’t have any big messy unknowns in the formula. Something much more like Durston’s metric.

    But here’s where I got confused. In #101, you said:

    […]f) A cutoff appropriate for the system and its probabilistic resources, sufficient to make the random emergence of a functional state for that function absolutely unlikely, given the probabilistic resources of the system in the time span.

    At that point, if the functional complexity of the object (as dFSCI for the specified function) exceeds the appropriate cutoff, we categorize dFSCI as exhibited by the object, and we can infer design…

    If we have reasonably excluded that any known algorithm can intervene to lower the probabilistic barriers that we have just analyzed.

    At that point, design is the best explanation available.

    …but that sounds more like the sort of argument from theoretical probability that Dembski’s working toward, and it means you do need to take those messy unknowns into account in your calculations.

    It also seems to conflict with the empirical approach I thought you were taking. Suppose we observed dFSCI appearing (in, say, something like the Lenski bacteria experiment): would that be evidence that it can be produced naturally, or evidence that some intelligence intervened to add it? Without some way of distinguishing the two, I don’t see how the claim that dFSCI only comes from intelligence can be empirically tested.

    Therefore, I am now confused. Is the source of my confusion clear enough that you can see what I’m worried about, and clarify it for me?

  121. 121
    Gordon Davisson says:

    Once again, I failed at proofreading.

    Let’s say (again, all numbers made up for the sake of illustration) that this [selection] increases the “hit rate” for functional sequences by a factor of 10^100.

    … that should be 2^100, not 10^100.

  122. 122
    kairosfocus says:

    Rich, FYI WmAD has explicitly noted that in the biological world, specification is always linked to function, and indeed, function is also a key component of irreducible complexity. Add the work of Durston et al on functional sequence complexity (as opposed to orderly and random), and that of Meyer as he engages the issues of functional specificity and complexity in his published work. Dismissal that “nobody uses” fails the basic fact test. More to the point, FSCO/I in the form dFSCI is abundantly common as what is reported by our PCs as file sizes, which we would never dream of assigning to lucky noise etc. And, functional specificity is as close and as real as you had better have the right part put in correctly, to fix your car. KF

  123. 123
    kairosfocus says:

    D: A UD meeting in Antigua would be a feasible proposition. “De beaches are nice, much nicer dan ice . . . ” KF

  124. 124
    kairosfocus says:

    Rich: Nope, the lineage is Thaxton et al thence Orgel-Wicken [and way back, Cicero . . . ], but even if it were Sir Fred, you would be well advised to think again before dismissing as if his very name is a mark of fallacy; a Nobel-Equivalent prize holder speaking on a matter of his expertise. When you have a substantial answer to the matter posed in the infographic, let us hear it. Meanwhile the successive side tracks and strawman caricatures are duly noted. KF

  125. 125
    Dionisio says:

    KF,
    Wow! In Antigua?
    Just tell us when, I’m ready anytime! 😉
    I’m telling my wife to book our flights.
    Will leave our orange tabby cat in our children’s home.
    Only problem the Caribbean environment might be a little distracting, hence not very conducive to having a serious meeting 😉
    But I have to wait until the left side of my face is not swollen. It’s getting better now, but not completely recovered yet. You see, that’s another sound proof that ‘n-D e’ is true – if my body were designed it would have healed in an instant and completely painless, wouldn’t it? 😉

  126. 126
    Gordon Davisson says:

    Upright Biped (back at #93):

    Hi Gordon,

    You once made a statement on this forum that you didn’t know of a definition of information which can be shown to be present in DNA and also cannot be produced without intelligence.

    If this is the case, did you mean you already knew of an origin of information like that in DNA that had come about without intelligence, or are you merely assuming such information can come about without intelligence?

    Not quite either of them; I’m saying I haven’t seen a convincing case that the type of information in DNA requires an intelligent source, and I think that there’s a reasonable case that in theory it can come from an unintelligent source, but I don’t claim to be able to demonstrate this in practice.

    (Of course, I think that the information in the living world actually did come about without intelligent assistance, but I don’t claim to be able to demonstrate this. This part is just an assumption on my part, based on Occam’s razor and what seems to me to be a lack of evidence for intelligent assistance.)

    Let me give a quick sketch of my theoretical argument that this sort of information can come from unintelligent sources. Basically, it comes from the success of genetic algorithms: they show that random variation + selection for function can produce functional information.

    But what about the No Free Lunch theorem, you ask? The NFLT shows that genetic algorithms only produce functional information (in nontrivial amounts) if the fitness function has the right characteristics; a random fitness function will stymie the evolutionary algorithm, and it’ll do no better than chance.
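
    The contrast between a well-behaved and a random fitness function can be seen in a toy Python sketch. This is only an illustration under stated assumptions: the “smooth” landscape is the standard OneMax toy problem (fitness = number of 1 bits), the “random” landscape assigns an independent random value to every sequence, and the search is a simple single-bit-flip hill climber rather than a full genetic algorithm. None of it models real biology.

```python
import random

N_BITS = 20
STEPS = 2000

def smooth_fitness(bits):
    """Correlated landscape: similar sequences have similar fitness (OneMax)."""
    return sum(bits)

def make_random_fitness(rng):
    """Uncorrelated landscape: each sequence gets an independent random fitness."""
    table = {}
    def fitness(bits):
        key = tuple(bits)
        if key not in table:
            table[key] = rng.random()
        return table[key]
    return fitness

def hill_climb(fitness, rng, n_bits=N_BITS, steps=STEPS):
    """Flip one random bit per step; keep the change if fitness does not drop."""
    current = [rng.randint(0, 1) for _ in range(n_bits)]
    best = fitness(current)
    for _ in range(steps):
        candidate = current[:]
        candidate[rng.randrange(n_bits)] ^= 1
        score = fitness(candidate)
        if score >= best:
            current, best = candidate, score
    return current, best

_, smooth_best = hill_climb(smooth_fitness, random.Random(0))
_, random_best = hill_climb(make_random_fitness(random.Random(1)), random.Random(2))
print("smooth landscape best:", smooth_best, "of", N_BITS)
print("random landscape best:", round(random_best, 3), "(stuck at a local peak)")
```

    On the smooth landscape the climber reaches the global optimum after examining a tiny fraction of the 2^20 sequences; on the random landscape neighbouring sequences carry no information about each other, so it stalls at whatever local peak it happens to find.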

    Dembski, Marks, Ewert et al have extended this in various ways. I’m not as familiar with this work as I probably should be, but in general I haven’t been particularly impressed by their approach: they seem to be using very abstract models that don’t correspond very well with real evolution, and mix the real issue together with a bunch of irrelevant confusion. So I’ll ignore their models and just go with a simpler intuitive version of what I see as the real issue.

    Actually, I’ll claim there are three related issues: does the fitness function have the right shape to let evolution work better than chance (I say: yes!), does it have a right enough shape to work well enough to explain all the complex functional information in living organisms (I say: maybe!), and can that “right shape” be explained without requiring an intelligent agent to design it (I say: yes!). Let me take these slightly out of order.

    First: does the fitness function have the right shape to let evolution work better than chance (or, in Dembski et al’s terms, is there active information in the fitness function)? I think the answer on this one has to be a pretty clear yes. Take a look, for example, at the recent dustup between Mike Behe and Larry Moran about how difficult it was for the malaria parasite to evolve chloroquine resistance. Setting aside the disagreement about the precise math, note that none of the estimates of difficulty are anywhere near searching the entire space of possible protein sequences to find that new function; but under the random-fitness-function model assumed by NFLT, that’s pretty much what would have been necessary. Under the NFLT random model, even minor changes to the function of a protein would have to basically start searching from scratch, because it assumes no correlation in function between similar gene/protein sequences.

    Second: can that “right shape” be explained without requiring an intelligent agent to design it? Again, I think the answer here is a pretty clear yes. Dembski argues that there’s no reason (other than ID) to expect the fitness function to have the right shape, but I think there’s a very good reason: similar causes tend to have similar effects, and as a result we can reasonably expect similar gene/protein sequences to have similar functions (as in the chloroquine resistance example), and that means that finding one functional sequence will increase the chance of finding others nearby, and hence of evolution doing better than random.

    Third: does it have a right enough shape to work well enough to explain all the complex functional information in living organisms? If you look at the debate between Behe and Moran, they actually agree about the basics (that evolution needs selectable intermediates to guide it to distant functional sequences), but their disagreement is about the details of how many selectable intermediates are needed, how distant is distant, etc… And they’re disagreeing about this in a case where the possible intermediates have been mapped out in great detail! Trying to do similar reasoning about larger-scale and less-well-mapped-out portions of the fitness function is, at least as far as I can see, basically hopeless at this point. And that’s what leads me to say that I can’t show this in practice.

  127. 127
    Upright BiPed says:

    Hello Gordon,

    I didn’t mean to imply anything more or less than what follows from your previous statement. Here is the direct quote:

    I haven’t seen a definition [of information] which can be shown to be present in DNA and also cannot be produced without intelligence.

    When I read your statement at the time, I was struck by the obvious fact that no one has ever seen any information (like that in DNA) come into existence without intelligent intervention, and so the basis of the claim was (to me) a bit of a mystery. I suppose the wildcard in your statement is the definition of information itself; however, I would not think your statement suddenly becomes warranted by the lack of a suitable definition, given that (regardless of the definition) it remains a fact that no one has any experience whatsoever of such information arising without intelligence.

    In any case, if I read you correctly then I really appreciate your candor that you know of no information (like that in DNA) coming into existence without intelligence, but instead, you make an assumption it can do so based on your chosen line of reasoning. I disagree with your reasoning because all of it simply takes for granted the material conditions required for information to come into existence in the first place.

    On a semiotic view, those conditions include: representation (i.e. an arrangement of matter to evoke a functional effect within a system, where the arrangement of the medium and the effect it evokes are physicochemically arbitrary); specification (i.e. a physical protocol to establish the otherwise non-existent relationship between the arrangement of the medium and its post-translation effect); and discontinuity (i.e. the observation that, in order to function, the organization of the system must preserve the discontinuity between the arrangement of the medium and its effect).

    These are the necessary interdependent conditions found in any instance of translated information, and they provide for a fairly steep entry to function, particularly in an inanimate pre-biotic environment. Two sets of matter must arise where one encodes the information and the other establishes what the results of that encoding will be, but because the organization of the system must also preserve the discontinuity, a set of relationships are thereby created that otherwise wouldn’t exist – and all of this must occur prior to the organization of the cell. Not only would this system necessarily arise in an inanimate environment, but the details of its construction must be simultaneously encoded in the very information that it makes possible.

    One additional interesting thing about DNA is the particular type of semiotic system found there. There are two distinct categories of semiotic systems. One category uses physical representations that are reducible to their material make-up (such as a pheromone for instance), while the other uses physical representations that have a dimensional orientation and are not reducible to their material make-up (i.e. they are independent of the minimum total potential energy principle). The first type is found throughout the living kingdom. The second type is found nowhere else but in the translation of language and mathematics. Such systems require not only the same transfer protocols as any other semiotic system, but also require an additional set of systematic protocols to establish the dimensional operation of the system itself.
    This leads to an intractable observation; the incredibly unique material conditions required for dimensional semiosis, which would ostensibly not exist on Earth until the rise of human intelligence, were entirely evident at the very origin of life. They are the physical means by which the living cell becomes organized.

    Given these observations, none of which are really even controversial, it seems a little dismissive of you to claim that no evidence exists of intelligent intervention in the information that organizes life.

  128. 128
    gpuccio says:

    Gordon Davisson:

    Thank you for commenting. I have not the time now, but I will address your very good points as soon as possible. 🙂

  129. 129
    kairosfocus says:

    UB: I am amazed by the endless refusals to follow evident facts and cogent reasoning, occasioned by the zero-concessions, “I object to ID and to ID thinkers” rhetorical strategy we see. It is sad, really, as it points to a spreading degradation of clarity and reasonableness in thought that has chilling implications for our civilisation. I just thought a word of encouragement would help. KF

    PS: The widespread assumption and cynical manipulation of the “generally dumb public” are truly saddening. So is the resistance to reason and genuine enlightenment (there is also false light out there that is in reality the darkness of a Plato’s Cave shadow-show).

  130. 130
    kairosfocus says:

    GD,

    on the No Free Lunch issue you conflate two things and reflect a key gap in understanding.

    Behe spoke to the challenge of EVEN micro-evolution WITHIN an Island of function. The evidence indicates that with realistic, generous generation sizes, times and mutation rates, it is hard, hard, hard to get a double mutation based change going, in a creature which is a going concern.

    This raises serious “how much more so” questions about macro-evolution, given the imagined model of a broad continent of beings accessible through incremental changes, which is implicit in so much evolutionary theorising in light of the tree of life model and its variants.

    So, even were the observation that FSCO/I naturally comes in isolated islands due to requisites of function based on correct co-ordination and coupling of many parts grossly in error, there is a major challenge.

    But in fact, the islands of function pattern is well grounded empirically and analytically.

    And the point is, that blind search in a context of FSCO/I is truly blind.

    So, your random incremental walk, once it is in the sea of non-function, has to face blind search resource challenges to reach an island of function in a context where per solar system and 500 bits of FSCO/I, the atomic resources and time can sample something like one straw to a cubical haystack 1,000 LY across (about as thick as our galaxy). The cosmos scale case of 1,000 bits swamps our observed cosmos’ search capacity to a ratio that is higher than the SQUARE of that one straw to a haystack as thick as our galaxy.

    So, there is a severe challenge to FIND the shores of an island of function through blind search. And, there is a challenge to implement irreducibly complex multiple component co-ordinated steps within such an island.

    Again, complementary, not contradictory.

    To give an idea, dot the illustrated picture in the OP with a large number of T’s, though they still must be reasonably isolated — not the BULK. Now, take a sample of 1 in 10^150 or worse (MUCH worse for realistic genomes), that is blind. Overwhelmingly you will get the bulk, straw not needles.

    Assume for argument you start at one needle and jump off “into the blue” blindly. Sampling less than 1 in 10^150 through an incremental random walk with no reinforcement until you hit another island, come up with a plausible, empirically and analytically warranted case as to how you can reach another island without a very directional information-rich driving push or strongly pulling and equally highly informational oracle.

    In short, when we face an overwhelming search challenge that has been solved, active information is the best-warranted explanation.

    KF

  131. 131
    gpuccio says:

    Gordon Davisson:

    Here I am. I will try to address your points one at a time, if possible in different posts, so that it is easier for me and, I hope, for you to go on with the discussion.

    Obviously, my references to dFSCI in this thread have been generic and brief. I have discussed many of these points in detail elsewhere. But I must say that you make some points with great clarity and balance, and so it will be a pleasure to deepen the discussion with you. The fact is, the general concept of dFSCI is simple and intuitive enough, but to answer the inevitable counter arguments a lot of detail is needed.

    In this first post, I would like to make a few simple premises which will help us much in the following discussion. The following points should not be controversial, but please feel free to intervene at any moment while I try to develop my reasoning.

    So, the premises:

    1) ID theory is, IMO, the best explanation available for complex functional biological information. The reasons for that conclusion will be the object of the following discussion. My simple premise here is that ID theory, like all other scientific theories, is only a theory. I believe that no scientific theory is final, and that all of them should be tested again as new facts and new understanding are gathered. I live to neo darwinism the arguable privilege of being “a theory which has become a fact”. In my world, theories and facts are separate categories. So, all my reasonings are aimed at showing that ID is the best explanation, not the only or the final explanation. What I mean is that, form me, “The jury will always be out”. Let’s say that, while the jury is out (which will probably be forever), each of us has the privilege and duty to choose what he considers the best explanation. That’s the way I conceive science: a personal choice based on what is available.

    2) There are obviously, as you correctly state, “big messy unknowns”, but they are in the whole issue of biological information and of how it emerged. They are not peculiar to my formula or to my reasoning; they are in all formulas and in all reasonings about the issue, because the issue is much more complex than we can imagine, even with all that we know, and that grows daily. I would like to remind you here that ID is trying, at least, to offer a quantitative approach to the problem of probabilities in biological evolution. That is not only the duty of ID: it is the duty of anyone who is interested in the problem, most of all of those who believe that the neo darwinian model is a good solution. After all, it’s the neo darwinian model which is largely based on RV (a probabilistic system). So, it is the cogent duty of neo darwinists to show that their random explanation is appropriate. They have repeatedly avoided any serious analysis of that aspect, so ID is trying to do that for them. OK, there are big messy unknowns, but it must be tried just the same, and refined as the unknowns become less messy and, I hope, smaller.

    So, given these very general premises, let’s go to the concept of dFSCI, which I will discuss in the next post.

  132. 132
    Dionisio says:

    gpuccio

    I live to neo darwinism the arguable privilege of being “a theory which has become a fact”.

    leave?

    BTW, I plan to read and learn from your coming posts too. Thanks.

  133. 133
    gpuccio says:

    Gordon Davisson:

    Here I will discuss the definition and meaning of dFSCI as a tool for design detection without any reference to its application in the biological field. OK? So, for this point, we can completely ignore biological information and the design inference for it. Let’s just say that we agree, for the moment, to consider all objects in the biological world as “objects whose origin is not known, and will be discussed later”. I think that we can agree that the origin of biological objects, or at least of the specific information in them, is at best controversial. If it were not controversial, we would not be here to discuss.

    So, the concept of dFSCI is, as I have argued, a restricted subset of the concept of CSI.

    In my definition, an observer is completely free to define explicitly any possible function for any possible object (in particular, digital sequence). There are no limitations (this is an important point), but the function must be objectively defined so that its presence or absence can unambiguously be assessed for any possible object (sequence). Once the function is defined, a value of dFSI can be (at least in principle) computed for that function, an appropriate threshold of complexity can be established for the system and time span where we assume the object originated, and by that threshold we can assess if the object exhibits dFSCI for that system, after having checked that there are no known algorithmic procedures in the system that can help to overcome the probabilistic barriers. If we conclude that the object exhibits dFSCI, we can infer design.

    Why? Because it can be shown that this method, if applied correctly to any object outside the biological world, will work with 100% specificity to detect objects designed by humans. It has, obviously, low sensitivity. IOWs, it has no false positives, and many false negatives.

    That is the consequence of choosing a threshold which pushes the specificity–sensitivity tradeoff all the way toward absolute specificity.
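
    To make the two terms concrete, here is a small Python sketch of how specificity and sensitivity would be computed for a threshold-based detector. The sample list is entirely made up for illustration (bit values and “designed” labels are assumptions), so the output shows only what the definitions mean; it is not evidence for or against the empirical claim.

```python
def confusion_counts(samples, threshold_bits):
    """Count outcomes when 'functional bits > threshold' is read as 'designed'."""
    tp = fp = tn = fn = 0
    for bits, designed in samples:
        flagged = bits > threshold_bits
        if flagged and designed:
            tp += 1
        elif flagged and not designed:
            fp += 1
        elif not flagged and designed:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def specificity(tn, fp):
    """Fraction of non-designed objects correctly left unflagged."""
    return tn / (tn + fp)

def sensitivity(tp, fn):
    """Fraction of designed objects correctly flagged."""
    return tp / (tp + fn)

# Hypothetical (functional bits, actually designed?) pairs.
samples = [(20, False), (80, False), (120, True), (300, True), (700, True), (1200, True)]
tp, fp, tn, fn = confusion_counts(samples, threshold_bits=500)
print("specificity:", specificity(tn, fp), "sensitivity:", sensitivity(tp, fn))
```

    In this toy data the high threshold produces no false positives but misses the two short designed items, which is the pattern of full specificity and reduced sensitivity described above.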

    So, how is the threshold established? It’s simple. We compute the probabilistic resources of the system in the time span (the number of new states that the system can reach in the time span, given the “engines of variation” acting in it). As the dFSI value can be interpreted as (the -log2 of) the probability of getting to that specific functional state in one attempt, assuming a uniform distribution of the possible states, we must choose a threshold which is much higher (in -log2 terms) than the probabilistic resources of the system, so that the probabilistic explanation can be safely rejected. Obviously, we don’t choose a 0.05 alpha level as in ordinary research! That would be folly. We must have a threshold that is many orders of magnitude higher than the computed probabilistic resources.

    That’s how I have proposed, by a (very gross) computation, a threshold of 150 bits for biological information on our planet. I am not yet speaking of biological information here; I just want to show the procedure for computing an appropriate threshold. I don’t remember the numbers now, so I will just give an idea. I grossly considered the total number of bacteria on earth, 5 billion years of existence for our planet, and an average bacterial mutation rate, and computed the total number of mutations in that system for that time span. I considered that a reasonable upper bound for the biological probabilistic resources of our planet. Then I added many orders of magnitude to that, and arrived at my proposed number of 150 bits. My computation is very gross and can well be wrong and need refinement, but I suppose that the final reasonable value will still be much lower than the 500 bits of the UPB.
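
    The shape of that threshold estimate can be sketched in a few lines of Python. Since the original numbers are not given here, the inputs below are placeholders: the 5e30 bacteria figure is borrowed from comment 120, the 5 billion years comes from the paragraph above, and the generations per year and mutations per cell per generation are hypothetical values chosen only to show the procedure.

```python
import math

def resources_bits(n_cells, years, generations_per_year, mutations_per_cell_per_generation):
    """log2 of the total number of mutation events available to the system."""
    return (math.log2(n_cells) + math.log2(years)
            + math.log2(generations_per_year)
            + math.log2(mutations_per_cell_per_generation))

def threshold_bits(resources, margin_orders_of_magnitude):
    """Add a safety margin, expressed in powers of ten, on top of the resources."""
    return resources + margin_orders_of_magnitude * math.log2(10)

bits = resources_bits(
    n_cells=5e30,                            # from comment 120
    years=5e9,                               # age of the planet used above
    generations_per_year=1e3,                # placeholder assumption
    mutations_per_cell_per_generation=1e-3,  # placeholder assumption
)
print(f"resources: {bits:.1f} bits, threshold with 5 extra orders: {threshold_bits(bits, 5):.1f} bits")
```

    Changing the placeholder rates by a few orders of magnitude moves the result by only about 3.3 bits per order, which is why a gross computation can still give a usable threshold.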

    Now, two simple facts:

    a) The procedure to compute dFSCI, as you can see, is completely specific for one explicit function. It does not take into account other possible functions.

    b) Given that, if we apply the procedure to any digital sequence, we can easily identify correctly designed sequences, with 100% specificity and low sensitivity. This is the empirical validation of the model. It can be easily applied to language or software or any digital machine. I am not aware of any false positive that has ever occurred. To avoid long computations, it will be enough to use the 500 bit threshold (UPB), which is certainly appropriate for any system in our universe.

    Two important recommendations:

    1) The sequence must clearly be of the “non ordered” kind. IOWs, it must be grossly pseudo-random. That is usually satisfied in language and software, provided that the sequence is long enough, because in language and software the link between sequence and function is generated by cognitive and functional rules, and cannot be compressed into a simple algorithm. In these conditions, it is easy to exclude a necessity origin of the sequence.

    2) It is important to apply the model only to original, new dFSCI: IOWs, a sequence that was not present in the system at time 0, not even as homologues, and whose function is new.

    I maintain that, in this way, we can recognize designed objects (out of the biological world) with 100% specificity and low sensitivity. In all positive cases, the form we observe in the object has been represented, willed and outputted by a conscious designer (a human).

  134. 134
    gpuccio says:

    Dionisio:

    Yes, that was “leave”, definitely. Thank you for the correction. 🙂

  135. 135
    Dionisio says:

    gpuccio

    What I mean is that, form me, “The jury will always be out”.

    Agree, although that’s a challenging statement. :O

    P.S. I encounter the same issue with the type-ahead feature, which keeps rewriting some words in my comments, and many times I don’t notice the change until after I have posted the text. Pretty annoying sometimes.

  136. 136
    Dionisio says:

    gpuccio

    What I mean is that, for me, “The jury will always be out”.

    Unending revelation of the ultimate reality 🙂

  137. 137
    Dionisio says:

    gpuccio

    I would like to remind here that ID is trying, at least, to offer a quantitative approach to the problem of probabilities in biological evolution. That is not only the duty of ID: it is the duty of anyone who is interested in the problem, most of all the duty of those who believe that the neo darwinian model is a good solution.

    Does this apply to ‘the third way’ folks too? 😉

  138. 138
    kairosfocus says:

    GP: I suggest referring to the string data structure:

    . . . -*-*-*-*- . . .

    This allows us to see how digitally coded info is normally laid out, whether bits, hex code, octal code, decimal digits, sexagesimal ones [why 60 mins in the hour etc . . . ask the Babylonians!], alphabetic glyphs, or ASCII or even UNICODE, etc. Where D/RNA and Amino Acid chains are also capable of bearing such info.

    I add, as 3-D functionally organised entities may be represented by using coded strings (cf. AutoCAD etc), this is WLOG.

    KF

  139. 139
    Dionisio says:

    gpuccio

    There are obviously, […], ” big messy unknowns”, but they are in the whole issue of biological information and of how it emerged. They are not in my formula or in my reasoning, they are in all formulas and in all reasonings about the issue, because the issue is much more complex than we can imagine, even with all that we know, and that grows daily.

    Agree. This is a very good quotable declaration. 🙂

  140. 140
    gpuccio says:

    Gordon Davisson:

    OK, now let’s go to biology at last.

    Biological objects are the issue: were they designed (as has been believed for centuries, and many still believe) or do they only appear to be designed, but can be reasonably explained by some RV + NS algorithm? That many biological objects exhibit a strong appearance of design is rather obvious: even Dawkins admits that.

    Now, as we have developed a formal property which works so well for human designed things (at least in the sense of specificity), it is perfectly natural to apply it to biological objects. As our tool, dFSCI, is easily applied to digital sequences, the natural objects to apply it to are certainly genes, and in particular protein coding genes. We can work with the gene or (as I usually do) with the corresponding protein sequence. There is not a great difference.

    I will take for granted, for the moment, that dFSCI can be approximated for a specific protein family by the Durston method. You seem to accept that, so I will go on for the moment to address your specific objections.

    But, here again, I need a few premises about the scenario I will consider:

    a) My aim is not (and never has been) to demonstrate that all existing proteins exhibit dFSCI, and therefore must be considered designed. First of all, some short peptides can be so simple that a design inference cannot be made for them. Second, I have often stated that variation in the active site in a protein family, even with great differences in the final function, can be in principle compatible with the neo darwinian model. That could be the case, for example, for nylonase. In these cases, the transition is rather simple (a few amino acids), and it would not be categorized as dFSCI.

    There are a lot of transitions, in protein families, which are somewhere in the middle. They could be considered as borderline cases, in which it is difficult at present to make a judgement.

    My aim is, definitely, to show that there are some proteins (indeed, a lot of them) for which a design inference is at present, by far, the best explanation.

    b) For the above reasons, I will focus on a very definite scenario: the emergence in natural history of a new protein superfamily, with an original sequence and a new biochemical function. As there are about 2000 superfamilies, all of them quite unrelated at sequence level, all of them with different functions, there is a great abundance of cases to be considered.

    c) Another important note is that I will define as “function” what I call “the local function”, that is the specific biochemical activity that characterizes the proteins. So, I will deal with proteins which have a very clear functional biochemical activity, such as enzymes, or ATP synthase, rather than with proteins with a regulatory function, or which work in a more complex setting, like a protein cascade. IOWs, I will not deal, for now, with cases of irreducible complexity: just single proteins, with a clear local function, which can be well defined and characterized.

    Enzymes are a very good example: an enzyme is a wonderful machine, which realizes a “biochemical miracle”: a reaction which would never occur (or would occur too slowly to be useful) in the absence of the enzyme itself. An enzymatic activity can be well defined and measured in the lab in appropriate contexts, and a threshold of activity can easily be established. All that is of great help for our discussion.

    d) Finally, I will consider as possible non design explanation only the neo darwinian model: RV + NS. Why? Because I am not aware of any other non design explanation on the market. All other “explanations” (beginning with neutral theory) don’t explain anything, at least not anything pertinent to our problem (how functional information emerges against probabilistic barriers). So, as I see it, the game, at present, is between two explanations only: design and neo darwinism.

    However, I am fully available to consider any other non design explanation: I just don’t know any such thing.

  141. 141
    gpuccio says:

    KF, Dionisio:

    Thank you for your attention and contributions.

    As you can see, for the moment I am fully focused on answering Gordon Davisson’s arguments, but be sure that I really appreciate your interventions! 🙂

  142. 142
    Dionisio says:

    gpuccio,
    I don’t expect my comments to be responded to, unless they pose serious questions, which is not the case this time.
    In some cases I repost parts of your comments, that I like so much, that I want to ensure others read them too.
    I’d rather see you using your limited spare time
    (a) working on your upcoming OP,
    (b) commenting on the difficult subjects being discussed here and
    (c) responding to the posts written by the interlocutors. Others -including myself and many anonymous visiting onlookers- can learn from your very insightful posts.
    Mile grazie caro amico Dottore!

  143. 143
    Joe says:

    rich:

    I think both CSI and FSCO/I are flawed. This is evidenced by the fact that no-one is using them for design detection.

    CORRECTION- No evolutionary biologists and no materialists are using them. But that is because they say design is accounted for by nature.

    However I will note that not one evolutionary biologist nor materialist has any evidence for their claims nor do they have a methodology to test their claims.

    FSCO/I seems to only argue against spontaneous assembly,…

    Spontaneous just means without a designer. It doesn’t mean “instantaneous”.

  144. 144
    Joe says:

    For gpuccio- FYI only

    With genetic engineering scientists have taken the protein coding sequence from one organism and transferred that to another. Only in rare cases did the newly implanted sequence produce a functioning protein, e.g. insulin. Most times the polypeptide was transcribed and translated but it failed to fold, meaning it was totally useless. Sermonti discusses this in his book.

    Then there are prions which change the shape of a protein just by contact. If the sequence of AAs produced the shape that should not happen, yet it does. Sermonti discusses that also.

    As for monogenic disease, well sure, especially if the change is in the active site of the protein. It should only affect folding if the protein is small enough to fold without a chaperone.

    But anyway, just food for thought and not an attempt to argue…

  145. 145
    Mung says:

    Gordon Davisson:

    This part is just an assumption on my part, based on Occam’s razor and what seems to me to be a lack of evidence for intelligent assistance.)

    I don’t think that’s a legitimate appeal to Occam’s razor.

    Let me give a quick sketch of my theoretical argument that this sort of information can come from unintelligent sources. Basically, it comes from the success of genetic algorithms: they show that random variation + selection for function can produce functional information.

    So your best evidence for unintelligent design comes from programs that are intelligently designed? That seems just a tad odd.

  146. 146
    gpuccio says:

    Gordon Davisson:

    Now, let’s go to your arguments, in detail.

    You say:

    Let me give a simple (and completely made-up) example to illustrate the issue. Suppose we had a 750-base-pair gene (I know, pretty small, but as I said this is a simple example), meaning its total information content was 1500 bits. Let’s say that 250 of those bases are fully required for the gene’s function, and the other 500 could be anything without changing its function. That means we should get around 500 Fits for this gene (assuming the sequence space is fully explored etc). But that doesn’t mean that the probability of something like this evolving is 1 in 2^500, because there are other factors you need to take into account.

    OK. I will follow your example, but with the further specification that the scenario is the one I have suggested in my premises: a new protein, of a new superfamily, with new domain (or domains), IOWs a new functional sequence, unrelated to previously existing proteins, which emerges in a definite time span in the course of natural history. And which has a specific biochemical activity, possibly a new enzymatic activity, well defined and measurable in the lab.

    You go on:

    First, there are many different functions that a gene might have, and we’re only looking at the probability of that particular one evolving. Let’s say, for the sake of illustration, that there are 2^50 possible functions that a gene might perform.

    Also, there are likely to be many different ways that a gene might perform any of these functions. I’m not talking about minor sequence variation (Durston’s method accounts for those), but e.g. the difference between the bacterial and archaeal flagella — very different structures, but essentially the same function. Again for the sake of illustration, let’s say there are 2^250 possible structures corresponding to each of those functions (and to oversimplify even further, assume each has the same degree of functional restriction, i.e. the same Fits). That means that while each possible gene’s “functional island” corresponds to only 1/2^500 of the sequence space, a total of 1/2^200 of the sequence space corresponds to some function.
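
    The arithmetic of this made-up example can be checked directly. The sketch below uses only the numbers quoted above, plus the standard assumption of 2 bits per base (four nucleotides); exact fractions are used so that nothing is lost to floating point.

```python
from fractions import Fraction

BITS_PER_BASE = 2            # log2(4) for four nucleotides

gene_length = 750            # base pairs, from the quoted example
required_bases = 250         # bases fully required for function

total_bits = gene_length * BITS_PER_BASE   # total information content
fits = required_bases * BITS_PER_BASE      # functional bits (Fits)

possible_functions = 2**50         # made-up figure from the quote
structures_per_function = 2**250   # made-up figure from the quote

# Each functional island covers 1/2^fits of the sequence space.
island_fraction = Fraction(1, 2**fits)

# Fraction of the sequence space that corresponds to *some* function.
functional_fraction = (possible_functions * structures_per_function
                       * island_fraction)

print(total_bits, fits)
print(functional_fraction == Fraction(1, 2**200))
```

    The check confirms that 2^50 functions times 2^250 structures, each occupying 1/2^500 of the space, gives the 1/2^200 stated in the quote.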

    OK. This is a very common objection, but I must say that you put it very well, very concretely. I will try to explain, as I have already done on past occasions, why that is not really a problem, even if the argument certainly has some relevance.

    I usually call this objection the “any possible function” argument. In brief, it says that it is wrong to compute the probability of a specific function (which is what dFSCI does, because dFSCI is specific for a defined function), when a lot of other functional genes could arise. IOWs, the true subset of which we should compute the probability is the subset of all functional genes, which is much more difficult to define. You add the further argument that the same gene can have many functions. That would complicate the computation even more, because, as I have said many times, dFSCI is computed for a specific function, explicitly defined, and not for all the possible functions of the observed object (the gene).

    I don’t agree that these objections, however reasonable, are relevant. For many reasons, that I will try to explain here.

    a) First of all, we must remember that the concept of dFSCI, before we apply it to biology, comes out as a tool to detect human design. Well, as I have tried to explain, dFSCI is defined for a specific function, not for all possible functions, and not for the object. IOWs, it is the complexity linked to the explicitly defined function. And yet, it can detect human design with 100% specificity. So, when we apply it to biological context, we can reasonably expect a similar behaviour and specificity.

    This is the empirical observation. But why does that happen? Why doesn’t dFSCI fail miserably in detecting human design? Why doesn’t it give a lot of false positives, if the existence of so many possible functions in general, and of so many possible functions for the same object, should be considered a potential hindrance to its specificity?

    The explanation is simple, and it is similar to the reason why the second law of thermodynamics works. The simple fact is, if the ratio between specified states and non specified states is really low, no specified state will ever be observed. Indeed, no ordered state is ever observed in the molecules of a gas even if there are potentially a lot of ordered states. The subset of ordered states is however trivial if compared to the subset of non ordered states.

    That’s exactly the reason why dFSCI, if we use an appropriate threshold of complexity, can detect human design with 100% specificity. Functionally specified states are simply too rare, if the total search space is big enough.

    I will give an example with language. If we take one of Shakespeare’s sonnets, we are absolutely confident that it was designed, even if after all it is not a very long composition, and even if we don’t make the necessary computations of its dFSCI.

    And yet, we could reason that there are a lot of sequences of characters of the same length which have meaning in English, and would be specified just the same. And we could reason that there are certainly a lot of other sequences of characters of the same length which have meaning in other known languages. And certainly a lot of sequences of characters of the same length which have meaning in possible languages that we don’t know. And that the same sequence, in principle, could have different meanings in other unknown languages, on other planets, and so on.

    Does any of those reasonings lower our empirical certainty that the sonnet was designed? Not at all. Why? Because it is simply too unlikely that such a specific sequence of characters, with such a specific, and beautiful meaning in English, could arise in a random system, even if given a lot of probabilistic resources.

    And how big is the search space here? My favourite one, n. 76, is 582 characters long, including spaces. Considering an alphabet of about 30 characters, the search space, if I am not wrong, should be of 2800 bits. And this is the search space, not the dFSCI. If we define the function as “any sequence which has good meaning in English”, the dFSCI is certainly much lower.
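
    The size of that search space follows from the length and the alphabet size alone. A minimal sketch, taking the 582 characters and the 30-symbol alphabet from the text as given (the helper name `search_space_bits` is just for illustration):

```python
import math


def search_space_bits(length, alphabet_size):
    """Bits needed to index every string of `length` symbols
    drawn from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


sonnet_bits = search_space_bits(582, 30)
print(f"sonnet search space: {sonnet_bits:.0f} bits")
```

    This comes out a little above the rounded figure of 2800 bits. As the text stresses, it is the size of the whole search space, not the dFSCI, which also depends on how many sequences satisfy the defined function.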

    As I have argued, the minimal dFSCI of ATP synthase alpha+beta subunit is about 1600 bits. Its search space is about 4500 bits, much higher than the Shakespeare sonnet’s search space.

    So, why should we doubt that ATP synthase alpha+beta subunit was designed?

    For lack of time, I will discuss the other reasons against this argument, and the other arguments, in the following posts.

    By the way, here is Shakespeare’s sonnet n. 76, for the enjoyment of all!

    Why is my verse so barren of new pride,
    So far from variation or quick change?
    Why with the time do I not glance aside
    To new-found methods, and to compounds strange?
    Why write I still all one, ever the same,
    And keep invention in a noted weed,
    That every word doth almost tell my name,
    Showing their birth, and where they did proceed?
    O! know sweet love I always write of you,
    And you and love are still my argument;
    So all my best is dressing old words new,
    Spending again what is already spent:
    For as the sun is daily new and old,
    So is my love still telling what is told.

  147. 147
    kairosfocus says:

    GP: Very well put.

    Mung: snappy as usual.

    Joe: On the money, too.

    GD: GA’s are in effect hill climbers within islands of function with well behaved fitness functions — by very careful design. The FSCO/I challenge is to find islands of function that are deeply isolated in seas of non-function, with relatively extremely limited resources. That’s why the 500 – 1,000 bit threshold is important, and it is why the contrast between blind, chance and necessity search and intelligent injection of active information is also important.

    KF

  148. 148
    kairosfocus says:

    D: How’s progress? KF

  149. 149
    gpuccio says:

    Gordon Davisson:

    Some further thoughts on the argument of “any possible function”, continuing from my previous post.

    b) Another big problem is that the “any possible function” argument is not really true. Even if we want to reason in that sense (which, as explained in my point a, is not really warranted), we should at most consider “any possible function which is really useful in the specific context in which it arises”. And the important point is, the more a context is complex, the more difficult it is to integrate a new function in it, unless it is very complex. In a sense, for example, it is very unlikely that a single protein, even if it has a basic biochemical function, may be really useful in a biological context unless it is integrated in what already exists. That integration usually requires a lot of additional information: transcriptional, post transcriptional and post translational regulation, transport and localization in the correct cellular context and, usually, coordination with other proteins or structures. IOWs, in most cases we would have an additional problem of irreducible complexity, which should be added to the basic complexity of the molecule.

    Moreover, in a being which is already efficient (think of prokaryotes, practically the most efficient reproducers in the whole history of our planet), it is not likely at all that a single new biochemical function can really help the cell. That brings us to the following point:

    c) Even the subset of useful new functions in the context is probably too big. Indeed, as we will discuss better later, if the neo darwinian model were true, the only functions which are truly useful would be those which can confer a detectable reproductive advantage. IOWs, those which are “visible” to NS.

    Even if we do not consider, for the moment, the hypothetical role of naturally selectable intermediates (we will do that later), still a new single functional protein which is useful, but does not confer a detectable reproductive advantage would very likely be lost, because it could not be expanded by positive selection (be fixed in the population) nor be conserved by negative selection.

    So, even if we reason about “any possible function”, that should become “any possible function which can be so useful in the specific cellular context in which it arises, that it can confer a detectable, naturally selectable reproductive advantage”.

    That is certainly a much smaller subset than “any possible function”. Are you sure that 2^50 is still a reasonable guess? After all we have got only about 2000 basic protein superfamilies in the course of natural history. Do you think that we have only “scratched the surface” of the space of possible useful protein configurations in our biological context? And how do you explain that about half of those superfamilies were already present in LUCA, and that the rate of appearance of new superfamilies has definitely slowed down with time?

    d) Finally, your observation about the “many different ways that a gene might perform any of these functions”. You give the example of different types of flagella. But flagella are complex structures made of many different parts, and again a very strong problem of irreducible complexity applies. Moreover, as I have said, I have never tried to compute dFSCI for such complex structures (OK, I have given the example of the alpha-beta part of ATP synthase, but that is really a single structure that is part of a single multi-chain protein). That’s the reason why I compute dFSCI preferably for single proteins, with a clear biochemical function. If an enzyme is conserved, we can assume that the specific sequence is necessary for the enzymatic reaction, and not for other things. And, in general, that biochemical reaction will be performed only by that structure in the proteome (with some exceptions). The synthesis of ATP from a proton gradient is accomplished by ATP synthase. That is very different from saying, for example, that flight can be accomplished by many different types of wings.

    More in next post.

  150. 150
    Dionisio says:

    KF
    Doing much better, though still a little swollen left side of face, but almost unnoticeable.
    Thank you for asking!
    BTW, was looking at some maps of the islands Antigua and Montserrat. Pretty spectacular scenery in that area.

  151. 151
    kairosfocus says:

    GP: It is indeed true that function is inter alia specific to context. The wrong part will not work on a given car even if it will work fine on another. And if the right part is not properly oriented and placed then coupled, trouble. KF

  152. 152
    gpuccio says:

    Gordon Davisson:

    Your next argument is the following:

    But not all sequences are equally likely, because selection biases the search toward the sequence space near other successful solutions, and functional sequences seem to cluster together (e.g. gene families). Let’s say (again, all numbers made up for the sake of illustration) that this increases the “hit rate” for functional sequences by a factor of 10^100. That means that while functional sequences make up only 1/2^200 of the sequence space, evolution stumbles into one every 1/2^100 or so “tries”.

    No. This argument is simply wrong if you consider my premises. My aim is not to say that all proteins are designed. My aim is to make a design inference for some (indeed, many) proteins.

    I have already said that I consider differentiation of individual proteins inside a superfamily/family as a “borderline” issue. It has no priority.

    The priority is, definitely, to explain how new sequences emerge. That’s why I consider superfamilies. Proteins from different superfamilies are completely unrelated at sequence level.

    Therefore, your argument is indeed in favor of my reasoning. As I have said many times, assuming a uniform distribution is reasonable, but is indeed optimistic in favor of the neo darwinian model. There is no doubt that related or partially related states have higher probability of being reached in a random walk. Therefore, their probability is higher than 1/N. That also means, obviously, that the probability of reaching an unrelated state is certainly lower than 1/N, which is the probability of each state in a uniform distribution. For considerations similar to some that I have already made (the number of related states is certainly much smaller than the number of unrelated states), I don’t believe that the difference is significant. However, 1/N is an upper bound for the probability of reaching an unrelated state, which is what the dFSCI of a protein family or superfamily is measuring.
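
    A toy model may make the 1/N point concrete. In the sketch below every number is hypothetical: a space of N states in which a small set of "related" states is each `boost` times as likely to be reached as any single unrelated state. It is only meant to show the direction of the inequality, not to model a real random walk.

```python
from fractions import Fraction


def unrelated_state_probability(n_states, n_related, boost):
    """Probability of one particular unrelated state, when each related
    state carries weight `boost` and each unrelated state weight 1."""
    n_unrelated = n_states - n_related
    total_weight = n_related * boost + n_unrelated
    return Fraction(1, total_weight)


# Hypothetical toy values.
N = 2**20
RELATED = 2**4
BOOST = 1000

p_unrelated = unrelated_state_probability(N, RELATED, BOOST)
p_related = BOOST * p_unrelated
uniform = Fraction(1, N)

# Related states sit above 1/N, unrelated states below it.
print(p_unrelated < uniform < p_related)
print(float(p_unrelated / uniform))
```

    Because the related states are few, the unrelated probability stays close to 1/N in this toy case, while always remaining below it.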

    Then you say:

    Or rather, it would be the calculation I was talking about if it had real numbers, rather than just made-up-out-of-thin-air ones. And I have no idea what the real numbers are. I don’t think there’s any way to get a good handle on them until we know far more about the large-scale shape of the fitness function (i.e. the mapping between sequence and function) than we do now. But if you want to do a probability argument, you pretty much need them.

    As I have tried to show, that is not the case.

    dFSCI is a tool which works perfectly even if it is defined for a specific function.

    The number of really useful functions, that can be naturally selected in a specific cellular context, is certainly small enough that it can be overlooked. Indeed, as we are speaking of logarithmic values, even if we considered the only empirical number that we have: 2000 protein superfamilies that have a definite role in all biological life as we know it today, that is only 11 bits. How can you think that it matters, when we are computing dFSCI in the order of 150 to thousands of bits?

    Moreover, even if we consider the probability of finding one of the 2000 superfamilies in one attempt, the mean functional complexity in the 35 families studied by Durston is 543 bits. How do you think that 11 bits more or less would count?
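
    The "11 bits" figure is simply the base-2 logarithm of 2000. A quick check, using only the numbers given in the text:

```python
import math

SUPERFAMILIES = 2000         # from the text
MEAN_FITS = 543              # mean functional complexity, from the text
PROPOSED_THRESHOLD_BITS = 150

# Allowing any one of the 2000 targets instead of a single one
# lowers the improbability by log2(2000) bits.
target_bits = math.log2(SUPERFAMILIES)
adjusted_fits = MEAN_FITS - target_bits

print(f"log2(2000) = {target_bits:.2f} bits")
print(f"adjusted mean complexity = {adjusted_fits:.1f} bits")
print(adjusted_fits > PROPOSED_THRESHOLD_BITS)
```

    Subtracting those bits from the 543-bit mean leaves a value that is still far above the 150-bit threshold.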

    And there is another important point which is often overlooked. 543 bits (mean complexity) means that we have a probability of 1:2^543 of finding one superfamily in one attempt, which is already well beyond my cutoff of 150 bits, and also beyond Dembski’s UPB of 500 bits. But the problem is, biological beings have not found one protein superfamily once. They have found 2000 independent protein superfamilies, each with a mean probability of being found of 1:2^543. Do you want to use the binomial distribution to compute the probability of having 2000 successes of that kind?
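
    As a rough illustration of the scale involved, here is the simplest possible version of that calculation. It assumes (and this is an assumption of the sketch, not something stated in the text) that the 2000 events are independent, that each has exactly the mean probability 2^-543, and that each gets a single attempt. Under those assumptions the joint probability is just the product, which is easiest to handle in log space.

```python
import math

MEAN_FITS = 543          # from the text
SUPERFAMILIES = 2000     # from the text

# log2 of the probability of one success in one attempt.
log2_single = -MEAN_FITS

# Independent events multiply, so their logs add.
log2_joint = SUPERFAMILIES * log2_single

# Same quantity expressed as a power of ten.
log10_joint = log2_joint * math.log10(2)

print(f"joint probability = 2^{log2_joint}")
print(f"roughly 10^{log10_joint:.0f}")
```

    A full binomial treatment would also need the number of attempts available, which is exactly the probabilistic-resources estimate discussed earlier in the thread.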

    Now, some of the simplest families could have been found, perhaps. The lowest value of complexity in Durston’s table is 46 bits (about 10 AAs). It is below my threshold of 150 bits, so I would not infer design for that family (Ankyrin). However, 10 AAs are certainly above the empirical thresholds suggested by Behe and Axe, from different considerations.

    But what about Paramyx RNA Polymerase (1886 bits), or Flu PB2 (2416 bits), or Usher (1296 bits)?

    If your reasoning about “aggregating” all useful functional proteins worked, we should at most find a few examples of the simplest ones, which are much more likely to be found, and not hundreds of complex ones, which is what we observe.

    More in next post.

  153. 153
    gpuccio says:

    KF:

    Yes, I believe that part of what is not conserved in proteins can correspond to “information which changes because it must change for functional reasons”, and not only to “information which changes because it is not essential to function”.

    I have tried to make that distinction in one OP.

    For example, the same protein can have some sequence which differs among species, because it is regulatory, and it interacts differently in different species. I have observed something suggestive of that in very complex transcription factors. They are very long proteins, but only a small part of the sequence corresponds to known conserved domains. The rest is less understood, and less conserved. And you observe that trend exactly in the most complex and important regulators!

  154. 154
    kairosfocus says:

    D: Good to hear of progress. And yes, the scenery is quite wonderful. Apart from those caught up in the deceitfulness of riches and the agendas of folly-tricks the people are even better, never mind the usual foibles. Best of all, no ice . . . except from the ‘fridge! KF

  155. 155
    kairosfocus says:

    GP: Yes, adaptation to specific context is important. KF

    PS: I am happy to see you being able to take the lead in the main technical discussion in-thread. The local silly season has me struggling to try to keep or get eyes on the ball: it’s the POLICY, stupid! Backed up by, if you do not have a sustainable, integrated framework of priority initiatives, a valid world class implementer-oriented project cycle management process and adequate implementing and expediting capacity to a timeline, you are dead, Fred. Now I understand what my dad was going through when he stood contra mundum for policy soundness decades ago. He was right but was brushed aside, that’s why Jamaica is in the mess it is in. Oh, that we would learn the lesson of soundness in decision, instead of in regret . . .

  156. 156
    Box says:

    Gpuccio #149: And the important point is, the more a context is complex, the more difficult it is to integrate a new function in it, unless it is very complex. In a sense, for example, it is very unlikely that a single protein, even if it has a basic biochemical function, may be really useful in a biological context unless it is integrated in what already exists. That integration usually requires a lot of additional information: transcriptional, post transcriptional and post translational regulation, transport and localization in the correct cellular context and, usually, coordination with other proteins or structures. IOWs, in most cases we would have an additional problem of irreducible complexity, which should be added to the basic complexity of the molecule.

    The problem you raised here is, IMHO, devastating for any attempt at a naturalistic explanation of life. The most ‘blessed’ DNA mutation is dangerous without proper regulation (epigenetics). You go even further when you write: “(…) it is very unlikely that a single protein, even if it has a basic biochemical function, may be really useful in a biological context unless it is integrated in what already exists.” In fact, I believe you are holding back here. An improperly regulated protein, irrespective of its potential usefulness, is almost certainly detrimental to the organism.
    When we see the inner workings of an organism we stand in awe of the intricate display of balance and harmony.

  157. 157
    gpuccio says:

    Box:

    You are perfectly right: I was holding back. My usual shy attitude! 🙂

    When Szostak used all his (remarkable) knowledge and understanding to build a new protein in a context of bottom up intelligent selection, pretending that it emerged randomly, the best he could do was to generate an artificial protein with a strong binding for ATP. When someone tried to put that protein in a real cellular context, the effect was devastating.

    So yes, I was holding back.

  158. 158
    kairosfocus says:

    Denton:

    >> To grasp the reality of life as it has been revealed by molecular biology, we must magnify a cell a thousand million times until it is twenty kilometers in diameter [[so each atom in it would be “the size of a tennis ball”] and resembles a giant airship large enough to cover a great city like London or New York. What we would then see would be an object of unparalleled complexity and adaptive design. On the surface of the cell we would see millions of openings, like the port holes of a vast space ship, opening and closing to allow a continual stream of materials to flow in and out. If we were to enter one of these openings we would find ourselves in a world of supreme technology and bewildering complexity. We would see endless highly organized corridors and conduits branching in every direction away from the perimeter of the cell, some leading to the central memory bank in the nucleus and others to assembly plants and processing units. The nucleus itself would be a vast spherical chamber more than a kilometer in diameter, resembling a geodesic dome inside of which we would see, all neatly stacked together in ordered arrays, the miles of coiled chains of the DNA molecules. A huge range of products and raw materials would shuttle along all the manifold conduits in a highly ordered fashion to and from all the various assembly plants in the outer regions of the cell.

    We would wonder at the level of control implicit in the movement of so many objects down so many seemingly endless conduits, all in perfect unison. We would see all around us, in every direction we looked, all sorts of robot-like machines . . . . We would see that nearly every feature of our own advanced machines had its analogue in the cell: artificial languages and their decoding systems, memory banks for information storage and retrieval, elegant control systems regulating the automated assembly of components, error fail-safe and proof-reading devices used for quality control, assembly processes involving the principle of prefabrication and modular construction . . . . However, it would be a factory which would have one capacity not equaled in any of our own most advanced machines, for it would be capable of replicating its entire structure within a matter of a few hours . . . .

    Unlike our own pseudo-automated assembly plants, where external controls are being continually applied, the cell’s manufacturing capability is entirely self-regulated . . . .

    [[Denton, Michael, Evolution: A Theory in Crisis, Adler, 1986, pp. 327 – 331. This work is a classic that is still well worth reading. Also see Meyer’s Signature in the Cell, 2009.] >>

  159. 159
    Dionisio says:

    For those interested in reading gpuccio’s interesting explanation of the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152.

  160. 160
    Dionisio says:

    For those interested in reading gpuccio’s interesting explanation of the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152.

    That could be a separate OP, very related to this thread, but started from scratch.

  161. 161
    Dionisio says:

    KF
    I like the ‘no ice’ part of the story 😉

  162. 162
    Box says:

    Gpuccio, thank you for your confirmation 🙂
    I hold that this ‘balancing act’ of organisms is highly underrated and utterly mysterious. How is it that organisms maintain dynamic equilibrium? The life of even one cell is a constant flux from one equilibrium into the next – an ever shifting context. Somehow ‘equilibrium information’ relative to ‘whatever the context is’ is available, but cannot be explained ‘bottom-up’.

  163. 163
    Dionisio says:

    KF
    Is it easy to commute between Antigua and Montserrat on ferryboats? What about Barbuda?
    If one goes there, is it better to stay in Antigua and use it as a base from where to tour the other islands too?
    Would the locals understand my heavy Hispanic accent?
    Is the best season to visit at the beginning of the year?
    Probably the most expensive too?
    Thank you.

    P.S. sorry, just realized this post is very OT for the current thread, but I needed a mental break after trying to understand the FSCO/I and dFSCI concepts well. Imagining myself in one of those islands is quite a refreshing exercise 😉
    Maybe that’s why there are so many nice smiling people there? Probably a more laid-back ‘don’t worry, be happy’ less agitated, slower paced, friendlier, less stressful lifestyle? 🙂
    Except, as you said, for those few who have never understood your fellow Britons’ old song “can’t buy me love”? 🙁

  164. 164
    Dionisio says:

    Box
    You and I don’t understand the power of the magic ‘n-D e’ formula RV+NS
    Apparently KF and GP don’t understand it either
    😉

  165. 165
    gpuccio says:

    Dionisio:

    A common cognitive problem obviously unites us. 🙂

  166. 166
    gpuccio says:

    Box:

    Indeed, I am trying to deepen my understanding of what is known about developmental procedures, with the aim of concluding a post about that, and I must say that the scenario is really overwhelming.

    You have emphasized the right point: the flux of information between epigenome and genome (yes, in that order) seems to be the key, but it is a key that still vastly eludes our understanding.

    It is absolutely true that living beings are systems which constantly exist in a “far from equilibrium” state which defies any simple explanation (and I am speaking of design explanations, non design explanations are just out of the game from the very beginning).

    I hope we can soon discuss more in detail this fascinating issue. 🙂

  167. 167
    kairosfocus says:

    D: Antigua is a good base to move around between the islands, and yes, there are ferry boats that make the run. Barbuda is a very special case, but actually both Montserrat and Antigua are effectively bilingual (there are complications . . . ), with significant Hispanic settlements from the Dominican Republic. You could be understood, or at least could make your way around. Barbuda, BTW, was a slave breeding plantation that was willed in perpetuity to the former slaves and is a sort of commune — complete with wild deer and actually deer hunting. Pink sand, too! Not to mention the Lagoon, a major eco site and home base for the Magnificent Frigate Bird . . . never mind it is a bit of a pirate bird, it is an awesome creature and a beautiful soaring flier. KF

  168. 168
    Box says:

    Gpuccio and Dionisio, I guess your cognitive problems with regard to the magic of RV+NS are much more ‘severe’ than mine. I’m trying my utmost to descend to your level 🙂

    With regard to development procedures and dynamic equilibrium a quote by Stephen Talbott:

    The problem of form in the organism — how does a single cell (zygote) reliably develop to maturity “according to its own kind” — has vexed biologists for centuries. But the same mystery plays out in the mature organism, which must continually work to maintain its normal form, as well as restore it when injured. It is difficult to bring oneself fully face to face with the enormity of this accomplishment. Scientists can damage tissues in endlessly creative ways that the organism has never confronted in its evolutionary history. Yet, so far as its resources allow, it mobilizes those resources, sets them in motion, and does what it has never done before, all in the interest of restoring a dynamic form and a functioning that the individual molecules and cells certainly cannot be said to “understand” or “have in view”.

    We can frame the problem of identity and context with this question: Where do we find the context and activity that, in whatever sense we choose to use the phrase, does “have in view” this restorative aim? Not an easy question. Yet the achievement is repeatedly carried through; an ever-adaptive intelligence comes into play somehow, and all those molecules and cells are quite capable of participating in and being caught up in the play.

  169. 169
    Dionisio says:

    gpuccio @ 166

    I hope we can soon discuss more in detail this fascinating issue.

    I hope so too! 🙂

  170. 170
    Dionisio says:

    gpuccio @ 165

    A common cognitive problem obviously unites us.

    Agree. 🙂

  171. 171
    Dionisio says:

    Box @ 168

    The problem of form in the organism — how does a single cell (zygote) reliably develop to maturity “according to its own kind” — has vexed biologists for centuries.

    …and it has blown my poor mind for the last few years

  172. 172
    Dionisio says:

    KF
    thank you for the interesting story about the islands.

  173. 173
    gpuccio says:

    Gordon Davisson:

    Some more thoughts:

    Here’s where your #101 confused me, because it seems to conflict with what I thought I knew about your approach. My understanding from previous discussion (which I admit I haven’t always followed in full detail) was that your argument that dFSCI can’t be produced without intelligence is that it hasn’t been observed to. In order to make a solid case like this, you really have to be able to go on and say “…and if it could be produced naturally, we would have observed it.” And that means we must have tested many cases, and (in each of those cases) be able to tell if dFSCI has appeared.

    Yes, my point is that it has never been observed to be produces without conscious intelligence (empirical observation). And that is absolutely true. If we exclude the biological world, no example exists of dFSCI which was not designed, while we have tons of dFSCI which is designed by humans.

    A special case are computers and, in general, algorithms designed by humans. While they can certainly output objects with apparent dFSCI, I believe that it can be easily demonstrated that the dFSCI in the output is related to the dFSCI which had already been inputted into the system. I believe that Dembski’s and Marks’ work about active information is very good about that. I will not deal further with this aspect because after all it is not my field.

    Just a simple example will be enough to show my position. A computing algorithm can certainly output strings of increasing functional complexity. A simple case would be a software which outputs the decimal digits of pi. If it outputs 10 digits, the search space for that string is 10^10. If it outputs 100 digits, the search space is 10^100. As the target space is always the same (1), the functional complexity apparently increases by 90 orders of magnitude.

    But:

    a) The specification is always the same: a string which corresponds to the decimal digits of pi. This is a very important point. Algorithms cannot generate new specifications, because they have no idea of what a function is. They have no notion of purpose (indeed, they have no notion of anything). They can “recognize” and use only what has been in some way “defined” to them in their input, or can be deduced from the initial input according to rules which are already in the initial input too, or can be deduced from it.

    b) According to the dFSCI concept, if the digits of pi can be computed (and they can) and if the computing algorithm is simpler than the output, then we should consider the complexity of the computing algorithm, rather than the complexity of the output. IOWs, if we find in a computing environment a string which corresponds to the first 10^1000 digits of pi, it is probably more likely that the computing algorithm arose by chance in the system, rather than the string itself, and that the string was simply computed (necessity) by the random algorithm. Why? Because the complexity of the random computing algorithm is probably lower than 10^1000 bits!

    Another way to say that is that, when the observed string is computable, we should consider the Kolmogorov complexity instead of the apparent complexity.

    These considerations can be summed up by saying that computing algorithms cannot generate new original dFSCI (a new functional specification, which requires high complexity to be implemented).
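
    [Editorial illustration] The pi example can be sketched numerically. The snippet below is only a rough illustration: Kolmogorov complexity is not computable in general, so the size of a known generating program serves as an upper bound, and the program size used here (ASSUMED_PROGRAM_BITS) is a hypothetical figure, not a measured one.

```python
import math

def search_space_bits(num_digits: int, alphabet_size: int = 10) -> float:
    """Bits needed to single out one string among alphabet_size**num_digits."""
    return num_digits * math.log2(alphabet_size)

def effective_complexity_bits(output_bits: float, program_bits: float) -> float:
    """Take the shorter description: the raw output or a program that computes it."""
    return min(output_bits, program_bits)

# Hypothetical size of a short pi-computing program (assumed: about 1 kB of code).
ASSUMED_PROGRAM_BITS = 8 * 1024

for n in (10, 100, 10_000):
    raw = search_space_bits(n)
    eff = effective_complexity_bits(raw, ASSUMED_PROGRAM_BITS)
    print(f"{n:>6} digits: raw {raw:10.1f} bits, effective {eff:10.1f} bits")
```

    Under that assumption the raw figure keeps growing with the number of digits, while the effective figure stops growing once the output is longer than the program that generates it.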

  174. 174
    gpuccio says:

    Gordon Davisson:

    You say:

    All of the functional complexity in the biological world, and Dembski isn’t sure if it actually qualifies as CSI (in his sense) because of unknowns like the ones I’ve been harping on. If he can’t tell for sure if a human exhibits CSI, what hope have you of telling if this new mutation you just observed does?

    I can’t speak for Dembski. Maybe I am mentally obtuse, but I really believe that I can say that ATP synthase, and many other proteins, do exhibit dFSCI, and therefore CSI (in its basic meaning) and therefore allow a design inference as the best available explanation for their origin. And I have tried to explain why.

    You say:

    …but that sounds more like the sort of argument from theoretical probability that Dembski’s working toward, and it means you do need to take those messy unknowns into account in your calculations.

    Well, I have tried to explain what I think of those “messy unknowns”, and why, and why I don’t think they are so “messy”, after all.

    Next post (the last one, I hope) is about Lenski.

  175. 175
    gpuccio says:

    Gordon Davisson:

    Finally, you say:

    It also seems to conflict with the empirical approach I thought you were taking. Suppose we observed dFSCI appearing (in, say, something like the Lenski bacteria experiment): would that be evidence that it can be produced naturally, or evidence that some intelligence intervened to add it? Without some way of distinguishing the two, I don’t see how the claim that dFSCI only comes from intelligence can be empirically tested.

    First of all, I hope we agree that no dFSCI at all has emerged from the Lenski experiment. At most, loss of function has come out. And the small regulatory change about citrate.

    Now, I will be very clear: if a new protein exhibiting clear dFSCI emerged in the Lenski experiment (and we could be sure that it did not exist before in the system), that would be a serious blow for ID theory and especially for my version of it. A very serious blow. Maybe not the end of it, but a great support to the neo darwinian model (I would say, the first empirical support).

    Why? Because the Lenski experiment is a controlled experiment in a controlled system, and there is absolutely no reason to assume that the biological designer would “intervene” in it, maybe to prove Lenski right and demonstrate that he does not exist.

    We must be serious in our scientific approach. The purpose of experiments is not to prove our personal ideas, but to find what is true. The Lenski experiment has been designed (well, I must say) to support the neo darwinian model. If it succeeds, it will support it. If it fails… no need to comment further. Up to now, it has failed.

    But let’s suppose that it succeeds. Let’s suppose that something like ATP synthase emerges in the system, something that did not exist before.

    A serious blow for ID theory, no doubt. But also a fact that requires explanation.

    Now, luckily, the Lenski experiment is well designed, and it would allow us not only to observe the final event, but also to try to understand it. Indeed, that’s why they freeze intermediate states, so that they can study them retrospectively, to understand what happens and why it happens.

    So, if our new complex functional protein emerges, we can try to understand the “path” that brings to it. Now, two things are possible:

    a) The path is compatible with the neo darwinian model. IOWs, we find a lot of functional intermediates, each of them naturally selectable in the context of the experiment, and each transition is in the range of the probabilistic resources of the system. OK, so the neo darwinian explanation is supported, at least in this case. A very serious blow to ID theory.

    b) There is no neo darwinian path. The protein just emerges, more or less gradually, without any selectable intermediate, and against the probability laws. This would be a very interesting situation. Still, I would not argue for an intervention of the designer: that would be ad hoc, and is against my principles. But I would say that we would have a fact that cannot reasonably be explained with our present understanding of the laws of physics, biochemistry and biology. If confirmed, that type of observation would probably require a thorough rethinking of all that we know. The design hypothesis, in this regard, is frankly more “conservative”.

    But, luckily (at least for me), such an event not only has not been confirmed, but has never been observed. So, at present, the Lenski experiment (like all the other evidence from biological research) fully supports ID theory.

    Well, I believe that’s all. Your final remark is:

    Therefore, I am now confused. Is the source of my confusion clear enough that you can see what I’m worried about, and clarify it for me?

    I have sincerely tried. With some detail. I hope you have been able to read my posts (just as a summary: #131, 133, 140, 146, 149, 152, 173, 174 and this one).

  176. 176
    Dionisio says:

    gpuccio @ 173

    Yes, my point is that it has never been observed to be produces without conscious intelligence (empirical observation).

    Did you mean “to be produced” ?

    I have problems with the auto-correcting feature when writing posts too.

  177. 177
    Dionisio says:

    For those interested in reading gpuccio’s interesting explanation of the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152, 173-175.

  178. 178
    gpuccio says:

    Dionisio:

    Yes, “produced”. It’s just typos, unfortunately I have not always the time to re-read everything with care. And the error signal does not work when the alternative word is a correct english word.

    Maybe I just subconsciously hope that random errors will make my posts better! 🙂

  179. 179
    Upright BiPed says:

    Hmmm…I thought we might have heard back from Gordon Davisson by now.

  180. 180
    gpuccio says:

    UB:

    Maybe he is busy, and had not the time to read the comments. However, I am grateful that he offered me the occasion to elaborate on some not so obvious aspects.

    I would certainly appreciate very much his comments, I hope he can get in touch sooner or later, in the unending flow of information that is this blog! 🙂

  181. 181
    Mung says:

    I trust Gordon is taking a break to find out if evolutionary algorithms really are intelligently designed.

    😀

    I suppose though that even that is begging the question, since the claim being made is with respect to functional information beyond some minimum threshold, isn’t it?

  182. 182
    Dionisio says:

    For those interested in reading gpuccio’s insightful comments on the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152, 173-175.

  183. 183
    Dionisio says:

    #179 Upright BiPed

    Hmmm…I thought we might have heard back from Gordon Davisson by now.

    I agree with gpuccio on his post #180:

    Maybe he[GD] is busy, and had not the time to read the comments.

    Here’s a summary list of posts related to GD:
    GD’s most recent post in this thread:

    126 Gordon Davisson August 28, 2014 at 6:28 pm

    Posts addressed to GD after his latest post:

    127 Upright BiPed August 28, 2014 at 10:20 pm

    128 gpuccio August 28, 2014 at 10:54 pm

    130 kairosfocus August 29, 2014 at 1:39 am

    131 gpuccio August 29, 2014 at 2:50 am

    133 gpuccio August 29, 2014 at 3:32 am

    140 gpuccio August 29, 2014 at 6:10 am

    145 Mung August 29, 2014 at 1:50 pm

    146 gpuccio August 30, 2014 at 4:30 am

    149 gpuccio August 30, 2014 at 7:22 am

    152 gpuccio August 30, 2014 at 8:01 am

    173 gpuccio August 31, 2014 at 5:44 am

    174 gpuccio August 31, 2014 at 7:19 am

    175 gpuccio August 31, 2014 at 7:47 am

  184. 184
    Upright BiPed says:

    Well…this is either the third or fourth time I’ve tried to re-engage Gordon on his statement, so you other cats have to wait.

    🙂

  185. 185
    Mung says:

    lol

    roger that. I will put evolutionary algorithms in a queue.

  186. 186
    gpuccio says:

    UB, Mung:

    I will patiently wait for my turn. No fight between fellow IDists! 🙂

  187. 187
    Gordon Davisson says:

    Hi, all! Sorry I’m late in replying (as usual), and I’ll probably be even slower in the future (going on a trip, and I may have intermittent time & Internet access). I won’t have time to reply to everything that’s been said, so I want to concentrate on what I see as the core of my disagreement with gpuccio. If I have time later (hah!), I’ll try to reply to Upright Biped and kairosfocus as well.

    GP: You’ve covered a lot of ground in your replies; thanks for the extensive explanation! I agree with some of it, disagree with some, and semi-agree with a lot; all of which could easily spawn extensive discussion (the question of whether ID and RV+NS are really the only relevant hypotheses is … a very complicated question), so even though many are worth discussing, I think I’d better concentrate on what I see as the main issue here: whether your reasoning is sufficient to show that RV+NS are insufficient to produce new functional proteins (de novo, not variants).

    I think I understand how you’re using theoretical probability calculations and experimental results (“we haven’t seen it happen”) much better now. I was initially under the impression your argument was like Dembski’s: entirely theoretical. After our previous conversation, I switched to the understanding that it was entirely based on the experimental results. Now I understand you’re using both, and I think I’ve got the hang of how you’re interrelating them.

    And perhaps not surprisingly, I don’t think you can fully support your conclusion in either way. Let me start with the theoretical probability calculation.

    Theoretical probability calculation (aka the probability of what?):

    First, since you’re mainly interested in the origin of completely new genes (e.g. the first member of a superfamily), I agree that selection doesn’t help (at least as far as I can see — there’s a lot I don’t know about evolution!). But I don’t agree that the other factors I mentioned — the number of different functions and the number of “islands” per function — are irrelevant. The format I used to describe them is derived from Dembski’s formulation, not from your approach, so let me restate my (again, completely made-up) example in a more relevant format:

    In my example, to keep the math simple, we’re looking at genes that are 750 base pairs long, and of those 250 are fully required for the gene’s function, so the genes have 500 Fits. That means the probability of any particular one of these genes arising at random is 1 in 2^500 (about 1 in 10^150).

    But (again, in my example) there are 2^300 such functional islands here. In my previous discussion I broke that into 2^50 functions, each with 2^250 islands — but this distinction is only relevant for Dembski’s formulation; for your approach (again, at least as I understand it) it’s really the total number of islands that matters. You could also break them up into superfamilies, or whatever, but it’s really the total number that matters.

    That means that about 1/2^200 (about 1 in 10^60) of the sequence space corresponds to a function. That’s not high enough that evolution is likely to find any such sequence at random (I estimate there’ve been something like 10^45 bacteria alive since the origin of life), but it’s far far higher than the 1 in 10^150 number you’d get just from looking at a particular island (and remember it’s completely made up — more on that later).

    My first, and most critical, claim is that it’s the probability of hitting any island, not the probability of hitting a specific island, that matters. Actually, a better way to describe this is the density of functional sequences in the overall sequence space.
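
    [Editorial illustration] The arithmetic in this made-up example can be checked with a few lines. This is only a sketch: every constant is taken from the example above (none is a measured value), and the final step treats each bacterium as a single independent random draw, which is a strong simplification.

```python
import math

BITS_PER_BASE = 2          # 4 possible nucleotides -> log2(4) = 2 bits per base
REQUIRED_BASES = 250       # fully constrained positions (made-up example)
LOG2_ISLANDS = 300         # 2^300 functional islands (made-up example)
LOG10_BACTERIA_EVER = 45   # the comment's rough estimate of bacteria that ever lived

def log2_to_log10(log2_value: float) -> float:
    """Convert an exponent of 2 into the equivalent exponent of 10."""
    return log2_value * math.log10(2)

fits = REQUIRED_BASES * BITS_PER_BASE                 # functional bits per gene
log2_density = LOG2_ISLANDS - fits                    # log2 of (islands / sequence space)
log10_single_island = -log2_to_log10(fits)            # chance of one specific island
log10_density = log2_to_log10(log2_density)           # chance of any island
log10_expected_hits = LOG10_BACTERIA_EVER + log10_density

print(f"Fits per gene:                {fits}")
print(f"One specific island: 10^{log10_single_island:.1f}")
print(f"Any island (density): 10^{log10_density:.1f}")
print(f"Expected hits in 10^45 draws: 10^{log10_expected_hits:.1f}")
```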

    Now, you made a couple of points about this probability:

    Another big problem is that the “any possible function” argument is not really true. Even if we want to reason in that sense (which, as explained in my point a, is not really warranted), we should at most consider “any possible function which is really useful in the specific context in which it arises”.

    Agreed. Many (/most) functional sequences that arise will be irrelevant to the organism they happen to arise in, and thus not be selected for. Also, even most of the ones that are beneficial to the organism will still be lost to genetic drift before selection really has a chance to kick in. So there are several more factors to include for a full calculation.

    The number of really useful functions, that can be naturally selected in a specific cellular context, is certainly small enough that it can be overlooked. Indeed, as we are speaking of logarithmic values, even if we considered the only empirical number that we have: 2000 protein superfamilies that have a definite role in all biological life as we know it today, that is only 11 bits. How can you think that it matters, when we are computing dFSCI in the order of 150 to thousands of bits?

    Here I disagree. As I said, for your approach it’s not the number of functions that matters, it’s the number of islands; and for that, I’m pretty sure that 2000 is a vast, vast underestimate. First, because that’s the number we know of; we have no reason to think it’s anything but a minuscule subset of what exists. Second, because (at least as I understand it), the Fits value would count the size of each different gene in a superfamily (rather than the size of the superfamily group as a whole), so you’d also have to multiply by the number of possible (again, not just known) functional islands in that superfamily’s range.

    More technically: the number of known functional islands is a lower bound on the total number of existing (in sequence space) functional islands; but for your argument you need an upper bound, so that you can get an upper bound on the overall density of functional sequences.
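
    [Editorial illustration] To see how much the conclusion depends on the assumed number of islands, here is a small sketch. The 500-Fit island size comes from the made-up example earlier in this comment; the two larger island counts are hypothetical placeholders chosen only to show the trend.

```python
import math

FITS_PER_ISLAND = 500  # from the made-up example earlier in the comment

def log2_density(fits_per_island: float, log2_num_islands: float) -> float:
    """log2 of the fraction of sequence space occupied by functional islands."""
    return log2_num_islands - fits_per_island

# log2 of the assumed island count for each scenario.
assumed_island_counts = {
    "2000 known superfamilies": math.log2(2000),   # about 11 bits
    "hypothetical 2^100 islands": 100.0,
    "hypothetical 2^300 islands": 300.0,
}

for label, log2_islands in assumed_island_counts.items():
    d = log2_density(FITS_PER_ISLAND, log2_islands)
    print(f"{label:<28} log2(islands)={log2_islands:6.1f}  log2(density)={d:7.1f}")
```

    Using only the known count gives the smallest density in the table, which is why a lower bound on the island count cannot by itself supply the upper bound on density the argument needs.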

    So what’s the actual density of functional sequences?

    Short answer: I don’t know.

    Some quick googling turned up a couple of sources on the subject: Douglas Axe’s paper “Estimating the prevalence of protein sequences adopting functional enzyme folds“, which (if I’m reading it right) puts the overall density of functional sequences at 1 in 10^64 (and those corresponding to a particular function at 1 in 10^77). This is actually fairly close to my made-up figure, though it’s purely coincidence! It’s also significantly beyond what evolution can realistically achieve (at least as far as I can see), but far higher than you’d think from looking at the Fits values for individual genes.

    If Axe’s figure is right (and I’m not missing something), evolution has a big problem.

    But Axe’s figure may also be far from the real value. There’s a response by Arthur Hunt over at the Panda’s Thumb (“Axe (2004) and the evolution of enzyme function“), which says:

    Studies such as these involve what Axe calls a “reverse” approach – one starts with known, functional sequences, introduces semi-random mutants, and estimates the size of the functional sequence space from the numbers of “surviving” mutants. Studies involving the “forward” approach can and have been done as well. Briefly, this approach involves the synthesis of collections of random sequences and isolation of functional polymers (e.g., polypeptides or RNAs) from these collections. Historically, these studies have involved rather small oligomers (7-12 or so), owing to technical reasons (this is the size range that can be safely accommodated by the “tools” used). However, a relatively recent development, the so-called “mRNA display” technique, allows one to screen random sequences that are much larger (approaching 100 amino acids in length). What is interesting is that the forward approach typically yields a “success rate” in the 10^-10 to 10^-15 range – one usually need screen between 10^10 -> 10^15 random sequences to identify a functional polymer. This is true even for mRNA display. These numbers are a direct measurement of the proportion of functional sequences in a population of random polymers, and are estimates of the same parameter – density of sequences of minimal function in sequence space – that Axe is after.

    If the 10^-10 to 10^-15 range is right, then we’ve got clear sailing for evolution. Now, I’m not going to claim to understand either Axe or Hunt’s reasoning in enough detail to make any claim about which (if either) is right, or what that tells us about the actual number. What I will claim is that getting a solid upper bound on that density is necessary for your reasoning to work. And there’s no way to get that from a specific gene’s Fits value (even if it is really large, like Paramyx RNA Polymerase).
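
    [Editorial illustration] The gap between the two estimates can be made concrete with a back-of-the-envelope sketch. The density figures are the ones quoted in this comment; the 10^45 trial count is the comment's own rough estimate; and counting each organism as one independent random draw is an assumption made purely for illustration.

```python
LOG10_TRIALS = 45  # rough count of bacteria that have ever lived (the comment's estimate)

# log10 of the density of functional sequences under each quoted estimate.
LOG10_DENSITY = {
    "Axe, any function (as read in the comment)": -64,
    "Axe, one particular function": -77,
    "forward approach, pessimistic end": -15,
    "forward approach, optimistic end": -10,
}

def log10_expected_hits(log10_trials: float, log10_density: float) -> float:
    """log10 of the expected number of functional sequences found in the trials."""
    return log10_trials + log10_density

for label, log10_d in LOG10_DENSITY.items():
    hits = log10_expected_hits(LOG10_TRIALS, log10_d)
    print(f"{label:<45} expected hits ~ 10^{hits}")
```

    A negative exponent means the event is not expected even once; a large positive one means it would be routine. The two families of estimates land on opposite sides of that line.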

    The empirical argument: overview

    Now, I want to turn to the empirical side of your argument: that we’ve never seen dFSCI produced without intelligence, which is evidence that it cannot be produced without intelligence. There are a number of complications here, but I want to concentrate on what I see as the big problem with this: we can only infer the nonexistence of something from not detecting it if we should expect to detect it.

    We’ve never actually seen an electron; but given how small they are, we don’t expect to be able to see them, so that’s not evidence they don’t exist. We’ve only seen atoms fairly recently, but that didn’t count as evidence against their existence for the same reason. On the other hand, if we’d gotten to the point where we should be able to see them, and hadn’t… then that would have been evidence against them.

    We’ve never seen a supervolcano erupt; that should be really easy to detect, but they’re rare enough that we don’t expect to have seen one anyway.

    In the case of completely novel genes, I’d expect them to be both rare and hard to detect, so making an empirical case against them is going to be very difficult.

    But let me take care of a minor issue. I’m going to be referring fairly extensively to the Lenski experiment, so I want to clarify my stance on its results. You said:

    First of all, I hope we agree that no dFSCI at all has emerged from the Lenski experiment. At most, loss of function has come out. And the small regulatory change about citrate.

    I have to disagree about the loss of function statement. At least as I understand it, a new functional connection was made between duplicates of two pre-existing functional elements (a gene and a regulatory sequence), creating a new function. This is certainly not a new functional gene (your main concern), but I’d argue it’s a gain of function and (at least as I understand it) a gain of dFSI (the continuous measure) if not dFSCI.

    The empirical argument: rarity vs. sample size

    The basic idea here is that in order to empirically show that something has a probability less than p, we need more than 1/p trials (with some complications if events aren’t independent, probability isn’t constant, etc — I’ll ignore these). AIUI the Lenski experiment has now been running for about 60,000 cell generations, with the population having recently reached 60,000. If I simplify the situation and assume the population was constant, that’s 3.6*10^9 cells in the study. That means we only expect to see any event that has a probability over 1 in 3.6*10^9.

    If something doesn’t happen over the course of a Lenski-scale experiment, the most we can safely say is that its probability is below 1 in 10^8 or so, and events with probabilities below 1 in 10^11 are unlikely to occur in an experiment like this.

    But the wild population of bacteria is estimated at around 10^30. That means events with probabilities between 1 in 10^11 and 1 in 10^30 will be unlikely to show up in Lenski’s experiment, but occur constantly in the wild. And events with probabilities down to 1 in 10^45 or so are likely to have occurred over the course of life on Earth.

    Evolution’s sample size is far larger than Lenski’s, or indeed any plausible controlled experiment, and hence it can “see” events of much lower probability than we can.
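
    A minimal sketch of the sample-size arithmetic above, assuming independent trials with a constant per-trial probability (the same simplifications the text makes). The function names are illustrative, and the 10^30 wild-population estimate is treated here as a trial count, which is an assumption of the sketch rather than a measured figure.

```python
import math

def total_trials(generations: int, population: int) -> int:
    # Assumes a constant population, the same simplification used in the text.
    return generations * population

def prob_at_least_once(p: float, n: float) -> float:
    # P(at least one occurrence) = 1 - (1 - p)^n for n independent trials.
    # log1p/expm1 keep precision when p is tiny.
    return -math.expm1(n * math.log1p(-p))

LENSKI_TRIALS = total_trials(60_000, 60_000)  # 3.6e9 cells
WILD_TRIALS = 1e30  # wild-population estimate, treated as a trial count

for p in (1e-8, 1e-11, 1e-20, 1e-30):
    lab = prob_at_least_once(p, LENSKI_TRIALS)
    wild = prob_at_least_once(p, WILD_TRIALS)
    print(f"p = {p:g}: lab {lab:.3g}, wild {wild:.3g}")
```

    Working in log space avoids the rounding that makes a naive 1 - (1 - p)**n collapse to zero for very small p.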

    The empirical argument: detection efficiency

    As you pointed out, just because a new functional sequence arises doesn’t mean it’ll be selected and spread through the population. Metaphorically speaking, evolution doesn’t “see” every new sequence.

    But we won’t “see” everything that happens either. In Lenski’s experiment, he doesn’t scan each new bacterium for new genes/functions/etc, he waits for something visible to happen (overall fitness goes up, ability to digest citrate appears, etc), then goes back and looks at what happened. That means that if a new functional sequence arises but doesn’t spread through the population (i.e. evolution doesn’t “see” it), then Lenski won’t see it either.

    That means that his detection efficiency is necessarily lower than evolution’s.

    It also means that for every new beneficial mutation that he detected, there are probably lots of others that occurred, but didn’t spread and hence weren’t detected. And for every new function (cit*) that was detected, there were probably several that weren’t relevant to the bacteria, or died out due to drift, or something like that.

    Now, you could imagine doing a more detailed experiment than Lenski’s: one where you actually examined each new bacterium for new genes, functions, etc… but doing that at anything like Lenski’s scale would require a huge investment of time, money, effort, etc. I’d argue there’s a tradeoff here: the more detailed observations you make, the smaller scale you can afford to work in. One of the reasons I’m using Lenski as my example here is that I think he’s in about the sweet spot of this tradeoff, but if you know of a better experiment please point me to it.

    The empirical argument: detecting a process

    There’s actually another reason I’m picking out Lenski’s experiment here: his method lets him track the history of significant events (like the cit* phenotype), not just the event itself.

    Suppose we had an experiment without this feature, and we saw some part of the process of the emergence of a completely new gene. Would we be able to detect this as a significant event? Let’s take a look at a sequence of possible events leading to a new gene:

    1) A series of mutations take a particular genetic sequence on a random tour of sequence space… that happens to wind up “next to” a functional sequence. If this happened in an experiment, I’m pretty sure nobody would realize it’d come within a mutation of a functional sequence, so nothing significant would be detected.

    2) Another mutation changes it into the functional sequence. This is really the event we’re looking for, but if we just saw this single step (an almost-functional sequence mutating into a functional one), the probability of that mutation is going to be pretty high (in a reasonable sample size). Also, the gene probably has a pretty low level of function, so its Fits value (if we could calculate it at this point) would be fairly small. Would this be considered new dFSCI?

    3) Selection spreads the new functional mutation through the population… If we just saw this part alone, we wouldn’t necessarily realize it was even a new gene, so wouldn’t categorize it as new dFSCI.

    4) Additional mutations optimize the gene’s function, and also spread through the population. Again, just microevolution, not new dFSCI. But it does increase the Fits value, maybe pushing it beyond the relevant probability threshold…

    …If I’m understanding this right, that means there aren’t very many experiments that’re going to be useful for looking for new dFSCI. That is, you have a very limited set of experiments to draw data from (and aggregate sample sizes from, etc).

    Which brings me to a final question: what experiments do you consider relevant for the “no dFSCI” argument? What do you know about their sample sizes, expected detection efficiency, etc?

  188. 188
    kairosfocus says:

    F/N:

    The old switcheroo game, sadly, continues.

    In the teeth of a very specific discussion of the search challenge of a configuration space of relevant scale, and a very definite focus on functionality dependent on configuration leading to isolated islands of function and relatively tiny possible scope of search, the “answer” is to pretend that search does not confront a needle in haystack challenge by pretending that probability calculations rather than search challenge is pivotal.

    FSCO/I is real, as real as the text in posts in this thread. As real as the coded algorithmic info in D/RNA chains. As real as the fold-function constraints on proteins in AA sequence space leading to deeply isolated islands of function manifested in the real-world distribution of proteins (which will reflect dominant processes of exploration). As real as the key-lock fit, tight coupling and integration to form 3-d functional structures in both the world of life and that of technology (which, per AutoCAD etc., can be converted into coded strings).

    And the blind sample challenge is real, as can be seen from the utter absence of a credible, empirically grounded account of origin of cell based life after decades of increasing embarrassment. (That’s why a typical talking point today — never mind Miller-Urey etc appearing as icons of the lab coat clad mythology in science textbooks — is to try to run away from the root of the tree of life, declaring it somehow off limits and not part of “Evolution.”)

    The blind search challenge is real, as real as needing to account incrementally for the rise of novel proteins and regulatory circuits as well as 3-d key-lock functional organisation to form the dozens of main body plans of life, with ever so many deeply isolated protein fold domains in the AA sequence space.

    The blind search challenge is real, as real as the need to account for reasonable mutation rates, population genetics, time to fix mutations once found, and to discover incremental ever-improving forms across the branches of the tree of life.

    Where, it remains the case that there is exactly one empirically warranted causal force behind FSCO/I . . . design — with trillions of cases in point, just on the Internet — so, the pretence that this is some mysterious, hard to find, hard to test thing is strawman tactic nonsense.

    Apart from the sort of tests and empirical results we have seen with micro-organisms, the most favourable case for evolution [and no, the ducking, dodging, obfuscation and pretence that supportive results undermine the findings speak inadvertent volumes to the strength of the point (remember, up to 10^12 malaria parasites in a victim and a chronic condition that recurs again and again providing a pool of experiments with 10’s or 100’s of millions of walking “labs” across Africa and Asia . . . 10^20 reproductive events being easily on the table)], we can easily see the nature of the challenge in light of findings on random text generation tests that we can for convenience summarise by clipping Wiki testifying against interest in its infinite monkeys theorem article:

    The theorem concerns a thought experiment which cannot be fully carried out in practice, since it is predicted to require prohibitive amounts of time and resources. Nonetheless, it has inspired efforts in finite random text generation.

    One computer program run by Dan Oliver of Scottsdale, Arizona, according to an article in The New Yorker, came up with a result on August 4, 2004: After the group had worked for 42,162,500,000 billion billion monkey-years, one of the “monkeys” typed, “VALENTINE. Cease toIdor:eFLP0FRjWK78aXzVOwm)-‘;8.t” The first 19 letters of this sequence can be found in “The Two Gentlemen of Verona”. Other teams have reproduced 18 characters from “Timon of Athens”, 17 from “Troilus and Cressida”, and 16 from “Richard II”.[24]

    A website entitled The Monkey Shakespeare Simulator, launched on July 1, 2003, contained a Java applet that simulates a large population of monkeys typing randomly, with the stated intention of seeing how long it takes the virtual monkeys to produce a complete Shakespearean play from beginning to end. For example, it produced this partial line from Henry IV, Part 2, reporting that it took “2,737,850 million billion billion billion monkey-years” to reach 24 matching characters:

    RUMOUR. Open your ears; 9r”5j5&?OWTY Z0d…

    In short, successes are well short of the 500 – 1,000 bit threshold and much further short of the jump from 100 – 1,000 kbits for a first genome (itself a major OOL challenge), to the 10 – 100+ million bits credibly required for new body plans. We still have only 10^57 atoms in our solar system, mostly H in the sun. We still have only ~ 10^80 atoms in the observed cosmos. We still have 10^-14 s as a reasonable fast reaction time and perhaps 10^17 s of time available. The realistic config space challenges utterly overwhelm the threshold case of a blind one straw sized sample from a cubical haystack as thick as our galaxy.

    And no, one does not require a wild goose chase after every possible mechanism that hyperskeptical objectors can dream up when one easily knows that the scope of search of the toy example of giving every atom in the solar system 500 coins and flipping, observing and testing a state each 10^14 times per second — comparable to the oscillation frequency of visible light — utterly overwhelms what is possible in a Darwin’s pond or in a reproducing population of cell based life forms.

    The coin flipping atoms search, whether scatter-shot random sample or incremental random walk from blind initial configs, still faces doing only a sample of possibilities that is as a straw to a cubical haystack 1,000 LY across, for our solar system. And a similar calc for 1,000 bits ends up doing a one straw sample to a haystack that would swallow up our 90 bn LY across visible cosmos and not notice it.
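
    The scale comparison above can be sketched numerically. This is only the ratio of states inspected to states available, using the round figures quoted in this comment (10^57 or 10^80 atoms, 10^14 observations per second, 10^17 s); pairing the 1,000-bit case with the 10^80-atom figure is an assumption of the sketch, and the function name is illustrative. The ratio by itself says nothing about how function is distributed within the space.

```python
import math

def log10_sampled_fraction(bits: int, atoms: int, rate_per_s: int, seconds: int) -> float:
    # log10 of (states inspected) / (size of the configuration space).
    samples = atoms * rate_per_s * seconds  # exact Python integers
    space = 2 ** bits                       # number of distinct bit strings
    return math.log10(samples) - math.log10(space)

# Solar-system toy case: 500 coins per atom.
solar = log10_sampled_fraction(500, 10**57, 10**14, 10**17)
# Observed-cosmos case: 1,000 bits.
cosmos = log10_sampled_fraction(1000, 10**80, 10**14, 10**17)

print(f"500 bits, solar system: fraction sampled ~ 10^{solar:.1f}")
print(f"1,000 bits, observed cosmos: fraction sampled ~ 10^{cosmos:.1f}")
```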

    That is, we are confidently able to apply the premise of statistics that a small, blind sample of a large population tends strongly to reflect its bulk, not isolated unrepresentative clusters. Which, BTW is exactly the line of reasoning that grounds the statistical form of the 2nd law of thermodynamics.

    In short, the empirical result is backed up by the blind search needle in haystack analysis summarised in the infographic in the OP and elsewhere.

    The twists, side-slips, side tracks and more in the end only serve to underscore the force of the point.

    The ONLY empirically warranted, analytically plausible explanation for FSCO/I is design. For reasons that are not that hard to figure out.

    And so, we have excellent reason to hold that until SHOWN otherwise, FSCO/I is a strong, empirically grounded sign of design as cause.

    KF

  189. 189
    gpuccio says:

    Gordon Davisson:

    Thank you for your detailed answer.

    I don’t want (and I suppose you don’t want either) to start an unending discussion repeating the same points. Therefore, I will try to comment only on the points in your post which allow some new clarification or argument. For the rest, I stick to what I have already said.

    In your introduction, you say:

    “I think I’d better concentrate on what I see as the main issue here: whether your reasoning is sufficient to show that RV+NS are insufficient to produce new functional proteins (de novo, not variants).”

    And that’s perfectly fine.

    “I think I understand how you’re using theoretical probability calculations and experimental results (“we haven’t seen it happen”) much better now.”

    I am happy about that.

    “I was initially under the impression your argument was like Dembski’s: entirely theoretical. After our previous conversation, I switched to the understanding that it was entirely based on the experimental results. Now I understand you’re using both, and I think I’ve got the hang of how you’re interrelating them.”

    Good. I agree that I am using both, like everybody usually does in empirical science. We build theories to explain facts.

    “And perhaps not surprisingly, don’t think you can fully support your conclusion in either way. Let me start with the theoretical probability calculation”

    OK. So, to the first point.

  190. 190
    gpuccio says:

    Gordon Davisson:

    “Theoretical probability calculation (aka the probability of what?):”

    You say:

    “First, since you’re mainly interested in the origin of completely new genes (e.g. the first member of a superfamily), I agree that selection doesn’t help (at least as far as I can see — there’s a lot I don’t know about evolution!).”

    I think you see very well. This sets the field for the following discussion, so let’s understand well what it means.

    There are, IMO, only two ways to try to counter ID arguments about the origin of protein superfamilies, especially in the form I generally use.

    The first is to rely on NS to overcome the probabilistic barriers. That is easily shown as non scientific, because there is no reason, either logical or empirical, to believe that complex functions in general, and functional proteins in particular, can be deconstructed as a general rule (for proteins, I would say not even occasionally) into simpler steps which are both functional and naturally selectable. The complete absence of any empirical evidence of functional intermediates between protein superfamilies (IOWs, their absolutely separate condition at sequence level) is the final falsification of that approach, as I have argued many times. Of course, neo darwinists try all possible excuses to explain that, but I am very much sure that they fail miserably. Luckily, you did not follow that approach, so we need not spend further time on it.

    The second approach is to deny that protein superfamilies really exhibit dFSCI. There is only one way to do that: assume that the search space of protein sequences is really full of functional islands, big islands and archipelagos, maybe whole continents. I will call this approach “the myth of frequent function”, the idea that some swimming can easily bring us in view of land. This is the way you take in your post, so I will try to show why it is a myth.

    First of all, you make an interesting distinction between function and functional islands, saying that each function has many islands. There are two possible ways to see that:

    a) Each basic superfamily is made of many clustered islands, sequence related. For example, a superfamily can include different families, and each family can include different proteins. Sometimes the families are very related at sequence level, other times they are not. This is true. The superfamily is a level of grouping that I usually use because superfamilies are certainly unrelated at sequence level. But the same discourse can be done for folds (which are about 1000), or for families (which are about 4000). The reasoning is rather similar. Nor does considering 4000 islands of families instead of 2000 of superfamilies change much. Individual proteins are many more, but most of them are present in a great number of homologues. So, even if we consider an island for each protein we are still with a number of islands which is not really huge. The fact is, the 2000 superfamilies represent clusters which, being unrelated at sequence level, are certainly very distant in the ocean of the search space.

    b) Another aspect is that the same function can be implemented from different islands, even rather unrelated at sequence level. We know that there are structures, even individual proteins, which are very similar as structure and function, but share only very limited homology. That is more or less the concept of the “rugged landscape”, and we will go back to it in our discussion. I have avoided this point in my first round of posts for simplicity, but it is indeed a very strong point in favor of ID theory.

    Now, I say that the concept of “frequent function” is a myth. Why? Because all that is known denies it. And neo darwinists are forced to defend it with false reasonings.

    You make your first point here:

    Here I disagree. As I said, for your approach it’s not the number of functions that matter, it’s the number of islands; and for that, I’m pretty sure that 2000 is a vast, vast underestimate. First, because that’s the number we know of; we have no reason to think it’s anything but a minuscule subset of what exists. Second, because (at least as I understand it), the Fits value would count the size of each different gene in a superfamily (rather than the size of the superfamily group as a whole), so you’d also have to multiply by the number of possible (again, not just known) functional islands in that superfamily’s range.

    More technically: the number of known functional islands is a lower bound on the total number of existing (in sequence space) functional islands; but for your argument you need an upper bound, so that you can get an upper bound on the overall density of functional sequences.

    I want to make a first counter point which is very important. You betray here a methodological error which is almost the rule in our debate (on the side of neo darwinists), but is an error just the same.

    I tried to point at that error in one of my posts to you, when I said:

    “I would like to remind here that ID is trying, at least, to offer a quantitative approach to the problem of probabilities in biological evolution. That is not only the duty of ID: it is the duty of anyone who is interested in the problem, most of all the duty of those who believe that the neo darwinian model is a good solution. After all, it’s the neo darwinian model which is vastly based on RV (a probabilistic system). So, it is the cogent duty of neo darwinists to show that their random explanation is appropriate. They have repeatedly avoided any serious analysis of that aspect, so ID is trying to do that for them. ”

    IOWs, it’s not so much that for my argument I need an upper bound, so that I can get an upper bound on the overall density of functional sequences. The real problem is that darwinists absolutely need some empirical support to their probabilistic model, and they have none.

    However, let’s try to do the work for them.

    You say: “I’m pretty sure that 2000 is a vast, vast underestimate. ”

    No, it is not.

    “First, because that’s the number we know of; we have no reason to think it’s anything but a minuscule subset of what exists. ”

    No reason? Let’s see.

    1000 superfamilies (more or less) are already found in LUCA. Many of them are very complex, for example the domains in alpha and beta subunits of ATP synthase. So, those were “found” in the first few hundreds of million years of our planet’s life.

    The other 1000 (more or less) emerged in the rest of natural history, with slowing rate.

    Why? Why, if that is just “a minuscule subset of what exists”? If there are so many thousands of different foldings and superfamilies potentially useful in the existing biological context, ready to be found, how is it that the powerful mechanisms of RV and NS have been losing so much of their efficiency in the last, say, 4 billion years, after the frenetic “luck” of the beginning?

    How is it that we have not a lot of other molecular machines which generate ATP from proton gradients? Why haven’t we simpler machines which do that? Was it really necessary to have all those chains, each of them so conserved?

    The simple truth is, there is absolutely no reason to think that “it’s a minuscule subset of what exists”. That is only wishful thinking of neo darwinists, supported by nothing.

    But let’s go on. You say that the important point is the total number of islands. First of all, the important point is rather the total surface of islands which correspond to a useful, naturally selectable function. And that is much, much less.

    But it is not true just the same. The probability computed in that way would just be the probability of finding one of those functions. Now, let’s say that the total “selectable” surface is, say, 4 square Kms. And let’s say, for the sake of reasoning, that it is made of two sub archipelagos:

    ankyrin (46 Fits)

    and

    Paramyx RNA Pol (1886 Fits)

    That means that, of those 4 square Kms of islands, only 1:10^575 (more or less) of the surface is Paramyx RNA Pol, while the rest is ankyrin.
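
    A rough check of that ratio, assuming each Fit is treated as one bit, so that an island of k Fits covers a 2^-k share of sequence space. The function name is illustrative. Under this reading the exponent comes out near 554, on the same general scale as the “more or less” figure quoted above.

```python
import math

def log10_area_ratio(fits_small: float, fits_large: float) -> float:
    # Ratio of the rarer island's area to the more common island's area.
    # Assumes one Fit = one bit, so area share is 2^-Fits.
    return -(fits_large - fits_small) * math.log10(2)

ratio = log10_area_ratio(46, 1886)  # ankyrin vs. Paramyx RNA Pol
print(f"Paramyx RNA Pol share of the combined surface ~ 10^{ratio:.0f}")
```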

    Now, if you find ankyrin, your reasoning could be fine. But are you really saying that, if you find Paramyx RNA Pol, that is easily explained by the facts that ankyrin is much more likely, and is functional too?

    What a way of using probability is that? What a scientific methodology is that?

    If you find Paramyx RNA Pol, that’s what you have to explain.

    So, as I hope you can see, dFSCI is very useful and appropriate, because it allows us to distinguish between relatively more “likely” results and results which cannot be explained at all by RV.

    So again, let’s say for the sake of argumentation that we can explain ankyrin (which I don’t believe). In which way does that help to explain Paramyx RNA Pol, or ATP synthase?

    So, I will keep my scientific dFSCI and the scientific data from Durston, and I will leave the unscientific wishful thinking and imagination to neo darwinists.

    You are free to make your choice.

    More in next post (tomorrow, I suppose).

  191. 191
    Dionisio says:

    For those interested in reading gpuccio’s insightful comments on the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152, 173-175, 189, 190.

  192. 192
    gpuccio says:

    Gordon Davisson:

    Let’s go on:

    “So what’s the actual density of functional sequences?”

    You say:

    “Short answer: I don’t know.”

    Which, since Socrates, has always been a good statement. 🙂

    But you go on:

    Some quick googling turned up a couple of sources on the subject: Douglas Axe’s paper “Estimating the prevalence of protein sequences adopting functional enzyme folds“, which (if I’m reading it right) puts the overall density of functional sequences at 1 in 10^64 (and those corresponding to a particular function at 1 in 10^77). This is actually fairly close to my made-up figure, though it’s purely coincidence! It’s also significantly beyond what evolution can realistically achieve (at least as far as I can see), but far higher than you’d think from looking at the Fits values for individual genes.

    If Axe’s figure is right (and I’m not missing something), evolution has a big problem.

    But Axe’s also may be far from the real value.

    And then you quote Arthur Hunt’s famous statement which opposes to Axe’s “reverse” estimate of a folding/function sequence out of 10^70, the idea that “What is interesting is that the forward approach typically yields a “success rate” in the 10^-10 to 10^-15 range – one usually need screen between 10^10 -> 10^15 random sequences to identify a functional polymer. This is true even for mRNA display. These numbers are a direct measurement of the proportion of functional sequences in a population of random polymers, and are estimates of the same parameter – density of sequences of minimal function in sequence space – that Axe is after.”

    Well, I have great respect for Arthur Hunt, but what he says here is simply wrong.

    It is not true that according to data there is an “uncertainty” in the quantification of folding/functional sequences in random libraries. The simple truth is that Axe’s data (and those of some others who used similar reverse methodology) are true, while the forward data are wrong. Not because the data themselves are wrong, but because they are not what we are told they are.

    The most classical paper about this forward approach is the famous Szostak paper:

    Functional proteins from a random-sequence library

    http://www.nature.com/nature/j.....0715a0.pdf

    I have criticized that paper in detail here some time ago, so I will not repeat myself. The general idea is that the final protein, the one they studied and which has some folding and a strong binding to ATP, is not in the original random library of 6 * 10^12 random sequences of 80 AAs, but is derived through rounds of random mutation and intelligent selection for ATP binding from the original library, where only a few sequences with very weak ATP binding exist.

    Indeed, the title is smart enough: “Functional proteins from a random-sequence library” (emphasis added), and not “Functional proteins in a random-sequence library”.

    The final conclusion is ambiguous enough to serve the darwinian propaganda (which, as expected, has repeatedly exploited the paper for its purposes):

    “In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms. However, this frequency is still low enough to emphasize the magnitude of the problem faced by those attempting de novo protein design.”

    Emphasis mine. The statement in emphasis is definitely wrong: the authors “discovered” the (non functional) protein in their library by selecting weak affinity for ATP (which is not a function at all) and deriving from that a protein with strong affinity (which is a useless function, in no way selectable) by RV + Intelligent selection (for ATP binding).

    That’s why the bottom up studies like Szostak’s tell us nothing about the real frequency of truly functional, and especially naturally selectable proteins in a random library. That’s why they are no alternative to Axe’s data, and that’s why Hunt’s “argument” is simply wrong.

    Probably unaware of all that, you go on in good faith:

    If the 10^-10 to 10^-15 range is right, then we’ve got clear sailing for evolution. Now, I’m not going to claim to understand either Axe or Hunt’s reasoning in enough detail to make any claim about which (if either) is right, or what that tells us about the actual number. What I will claim is that getting a solid upper bound on that density is necessary for your reasoning to work. And there’s no way to get that from a specific gene’s Fits value (even if it is really large, like Paramyx RNA Polymerase).

    I think I have fully answered those points, both in this post and in the previous one. So, I will not repeat myself.

    Finally, as a useful resource, I give here a reference to a paper about protein engineering and mRNA display:

    De novo enzymes: from computational design to mRNA display

    http://www.cbs.umn.edu/sites/d.....0Cover.pdf

    Please, look in particular at Box 2 and Figure 1, of which I quote the legend here:

    Figure 1. General scheme for enzyme selection by mRNA display. A synthetic DNA library is transcribed into mRNA and modified with puromycin. During the subsequent in vitro translation, this modification creates a covalent link between each protein and its encoding mRNA. The library of mRNA-displayed proteins is reverse transcribed with a substrate-modified primer, thereby attaching the substrate to the cDNA/RNA/protein complex. Proteins that catalyze the reaction of the substrate modify their encoding cDNA with the product. Selected cDNA sequences are amplified by PCR and used as input for the next round of selection.

    Emphasis mine.

  193. 193
    gpuccio says:

    Gordon Davisson:

    Before going to the next argument, I would like to point out that, in your argument about how big the functional space is (let’s call it the “inflating the functional space” argument), you have not addressed an important comment that I have made:

    I paste it again here:

    “And there is another important point which is often overlooked. 543 bits (mean complexity) means that we have 1:2^543 probabilities to find one superfamily in one attempt, which is already well beyond my cutoff of 150 bits, and also beyond Dembski’s UPB of 520 bits. But the problem is, biological beings have not found one protein superfamily once. They have found 2000 independent protein superfamilies, each with a mean probability of being found of 1:2^543. Do you want to use the binomial distribution to compute the probability of having 2000 successes of that kind?”

    Now, let’s make a mental experiment, and let’s suppose that the functional islands are so frequent and big that we can find a functional sequence out of, say, 10^10 random sequences. You will agree that this is an exaggeration, even more optimistic than the false estimates I have discussed preciously.

    Well, that would be the probability of finding (in one attempt) one functional island.

    If you agree (as you seem to agree) that we have at least 200 unrelated and independent functional islands in the proteome (the superfamilies), that means that the “successful” search has happened 2000 times independently. So, even in this highly imaginary scenario, the probability of getting 2000 successes would be approximately 1 in 10^20000. Quite a number, certainly untreatable by any realistic probabilistic system with realistic probabilistic resources. A number with which no sane scientist wants to deal.
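
    The multiplication behind that figure can be sketched in log space. This takes the comment’s framing at face value (one attempt per island, every success independent), which is an assumption of the sketch, and the function names are illustrative. It reproduces the exponent of 20000 for the 1-in-10^10 scenario and, for comparison, applies the same step to the 543-bit mean complexity quoted earlier.

```python
import math

def log10_joint(p_single: float, successes: int) -> float:
    # log10 of p_single ** successes, for independent single-attempt successes.
    return successes * math.log10(p_single)

def log10_joint_from_bits(bits: float, successes: int) -> float:
    # Same calculation when the single-event probability is given as 2^-bits.
    return -successes * bits * math.log10(2)

optimistic = log10_joint(1e-10, 2000)           # 1 in 10^10 per island
mean_fits = log10_joint_from_bits(543, 2000)    # 1 in 2^543 per island

print(f"2000 successes at 1 in 10^10: 10^{optimistic:.0f}")
print(f"2000 successes at 1 in 2^543: 10^{mean_fits:.0f}")
```

    Working with exponents is necessary here because the probabilities themselves underflow ordinary floating point.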

    I would appreciate a comment on that, in your future answers.

  194. 194
    gpuccio says:

    Gordon Davisson:

    Errata corrige: “we have at least 2000 unrelated and independent functional islands” (not 200!)

  195. 195
    gpuccio says:

    Gordon Davisson:

    Another correction: “even more optimistic than the false estimates I have discussed previously“, not “preciously”! I suppose my narcissism pulled a subconscious trick on me. 🙂

  196. 196
    gpuccio says:

    Gordon Davisson:

    Now, the next argument:

    “The empirical argument: overview”

    Now, I want to turn to the empirical side of your argument: that we’ve never seen dFSCI produced without intelligence, which is evidence that it cannot be produced without intelligence. There are a number of complications here, but I want to concentrate on what I see as the big problem with this: we can only infer the nonexistence of something from not detecting it if we should expect to detect it.

    We’ve never actually seen an electron; but given how small they are, we don’t expect to be able to see them, so that’s not evidence they don’t exist. We’ve only seen atoms fairly recently, but that didn’t count as evidence against their existence for the same reason. On the other hand, if we’d gotten to the point where we should be able to see them, and hadn’t… then that would have been evidence against them.

    We’ve never seen a supervolcano erupt; that should be really easy to detect, but they’re rare enough that we don’t expect to have seen one anyway.

    I am not sure that I understand your argument here. My argument is that dFSCI, outside the biological world, is never observed without design, and is observed in tons when it comes from a design process. It is not, and never has been, that dFSCI is never observed.

    So, if we never observe a supervolcano erupt in normal circumstances, and then we see that a lot of supervolcanos erupt each time, say, there is an earthquake of at least some magnitude, it would be very reasonable to assume that there is a sound relationship between big earthquakes and supervolcano eruptions.

    So, my argument is not that we never see dFSCI, but rather that we see it a lot, and always in designed things.

    Then you say:

    In the case of completely novel genes, I’d expect them to be both rare and hard to detect, so making an empirical case against them is going to be very difficult.

    But again, the point here is not that completely novel genes are not observed. There are a lot of them. As we have said, all the 2000 superfamilies were completely novel genes when they appeared. And they are not hard to detect.

    Remember, the point is not to see a novel gene appear now. It is to explain how novel genes appeared at definite times in natural history, and how they can express dFSCI against all probabilistic rules and all known algorithms.

    Finally, you say:

    But let me take care of a minor issue. I’m going to be referring fairly extensively to the Lenski experiment, so I want to clarify my stance on its results. You said:

    “First of all, I hope we agree that no dFSCI at all has emerged from the Lenski experiment. At most, loss of function has come out. And the small regulatory change about citrate.”

    I have to disagree about the loss of function statement. At least as I understand it, a new functional connection was made between duplicates of two pre-existing functional elements (a gene and a regulatory sequence), creating a new function. This is certainly not a new functional gene (your main concern), but I’d argue it’s a gain of function and (at least as I understand it) a gain of dFSI (the continuous measure) if not dFSCI.

    Perhaps I have not been clear. I said:

    “At most, loss of function has come out. And the small regulatory change about citrate.”

    I did not mean that the regulatory change about citrate is loss of function (it could be, but I am not interested in analyzing that aspect).

    What I meant was that, apart from the regulatory change, the rest is mainly loss of function.

    I was referring to this: (from Wikipedia)

    In the early years of the experiment, several common evolutionary developments were shared by the populations. The mean fitness of each population, as measured against the ancestor strain, increased, rapidly at first, but leveled off after close to 20,000 generations (at which point they grew about 70% faster than the ancestor strain). All populations evolved larger cell volumes and lower maximum population densities, and all became specialized for living on glucose (with declines in fitness relative to the ancestor strain when grown in dissimilar nutrients). Of the 12 populations, four developed defects in their ability to repair DNA, greatly increasing the rate of additional mutations in those strains. Although the bacteria in each population are thought to have generated hundreds of millions of mutations over the first 20,000 generations, Lenski has estimated that within this time frame, only 10 to 20 beneficial mutations achieved fixation in each population, with fewer than 100 total point mutations (including neutral mutations) reaching fixation in each population

    Emphasis mine.

  197. 197
    Dionisio says:

    #193 gpuccio

    If you agree (as you seem to agree) that we have at least 2000 unrelated and independent functional islands in the proteome (the superfamilies), that means that the “successful” search has happened 2000 times independently. So, even in this highly imaginary scenario, the probability of getting 2000 successes would be approximately 1 in 10^20000. Quite a number, certainly untreatable by any realistic probabilistic system with realistic probabilistic resources. A number with which no sane scientist wants to deal.

    🙂

  198. 198
    kairosfocus says:

    GP: Where also the search/config space issue sits on the table as well as the extensions that we are looking at codes, algorithms and associated execution machinery in a von Neumann self replicating automaton. KF

  199. 199
    Dionisio says:

    KF @ 198

    Maybe that relates to the OP on procedures that GP is working on, which could be another technical checkmate (++) in all these discussions?

    Next, GP could sing:

    lasciatemi cantare con la chitarra in mano
    lasciatemi cantare una canzone piano piano

    while waiting for the interlocutors to recover from the shock

    🙂

  200. 200
    Dionisio says:

    For those interested in reading gpuccio’s insightful comments on the dFSCI concept, here are the post #s within this thread: 133, 140, 146, 149, 152, 173-175, 189, 190, 192-196.

  201. 201
    Mung says:

    gpuccio:

    Errata corrige: “we have at least 2000 unrelated and independent functional islands” (not 200!)

    Given the theory that all these families must be related, how likely is that!

  202. 202
    Popperian says:

    Gpuccio, our ability to make valid probability calculations is limited because there is more than one kind of uncertainty.

    When we know all of the possible outcomes and a process is random, we can calculate its probability. However, it’s unclear how you can calculate if biological darwinism is probable unless you know what all the possible outcomes are. As I pointed out in a previous thread...

    … probabilities can only be assigned based on an explanation that tells us where the probability comes from. Those numbers cannot be the probability of that explanation itself. They are only applicable in an intra-theory context, in which we assume the theory is true for the purpose of criticism.

    For example, you seem to assume that, if we went back in time and evolution was “played back” again, we would end up with the same result because that was the intended result. But that’s not evident in observations alone. Nor is it implied in evolutionary theory. That’s one of the theories you first implicitly bring to those observations.

    Some problems could be solved by some other combination of proteins. So, working the probability backwards is only valid in the context of categorizing where proteins fall in an intra-theory context, rather than being valid in the context of that theory being true.

    Furthermore, what constitutes repetition is not a sensory experience. Rather, it comes from theories about how the world works, in reality.

    For example, we do not expect all organisms to be preserved in the fossil record. As such, what does not constitute observing a transitional is, in part, based on the theory of paleontology, which includes the conditions in which we think fossilization occurs, etc.

    So, I’d ask, how is it that you know all the possible outcomes by which to calculate if evolution is probable?

  203. 203
    Upright BiPed says:

    Still no response from Gordon Davisson…

    😐

  204. 204
    kairosfocus says:

    P: Probability calculation is not the issue. Blind sampling in a context driven by chance and mechanical necessity is . . . if you will, imagine a sort of real world, massively parallel Ulam Monte Carlo process in ponds, puddles and patches of oceans, comets, gas giant moons then multiply across our observed cosmos. The search capacity of that would be dwarfed by the 10^80 atoms, each with 1,000 coins flipped, read and functionally tested every 10^-14s (~ 1 cycle of red light). For 10^17 s. Such a search could not sample — the search window, cf the illustration in the OP — as much as 1 in 10^150 of the space of configs for 1,000 H/T coins. This is a very sparse search. The very same Monte Carlo thinking tells us the predictable, all but certain result: a needle in haystack search. So long as FSCO/I relevant to life forms is rare, such a process will with all but absolute certainty only capture the bulk, straw. That is, islands of function are not credibly findable on such a process as is imagined for abiogenesis. And similar searches confined to planetary or solar system scope are much more overwhelmed by the complexity challenge posed by body plans. Recall, the genome alone of a first cell is credibly 100 – 1,000 kbits, and that for new body plans, 10 – 100+ mn, based on reasonable estimates and observed cases. For each additional bit, the search space doubles . . . 500 bits is 3.27*10^150 possibilities, 1,000 is 1.07*10^301, 100,000 is 9.99 *10^30,102 possibilities and it gets worse and worse. The dismissive talking points fail. KF
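
    The config-space figures quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the numbers as stated in the comment (10^80 atoms, one observation per 10^-14 s, 10^17 s); the function name is illustrative and not from any cited source.

```python
import math

def config_space_size(bits):
    """Return (mantissa, exponent) such that 2**bits ~= mantissa * 10**exponent."""
    log10_size = bits * math.log10(2)
    exponent = math.floor(log10_size)
    mantissa = 10 ** (log10_size - exponent)
    return mantissa, exponent

for bits in (500, 1000, 100_000):
    mantissa, exponent = config_space_size(bits)
    print(f"{bits} bits -> {mantissa:.2f} * 10^{exponent} possibilities")

# Cosmos-scale sampling budget as quoted in the comment (assumed figures):
# 10^80 atoms, 10^14 observations per second, for 10^17 seconds.
log10_samples = 80 + 14 + 17
# Fraction of the 1,000-bit space such a search could cover, as a power of ten.
log10_fraction = log10_samples - 1000 * math.log10(2)
print(f"samples ~ 10^{log10_samples}, fraction of space ~ 10^{log10_fraction:.1f}")
```

    Working in log10 keeps the numbers within floating-point range; the sampled fraction comes out well below the 1 in 10^150 bound stated in the comment.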

  205. 205
    DNA_Jock says:

    Soooo, kf, no calculation of p(T|H), despite the requests in posts 21, 39, and 42.

    Therefore, no calculation of CSI or FSCO/I.

    [crickets]

  206. 206
    kairosfocus says:

    D-J,

    actually, I have done that for controlled cases (at request/challenge of MF years ago, it’s in my always linked note through my handle and may be in the UD WACs).

    But the point is, why stop half way through the calc just to entertain those whose whole point is to obfuscate what is going on?

    Take the next step which reveals this is an info metric with a threshold.

    Info metrics, we have handled for seventy years.

    And the info beyond a threshold metric makes physical sense, that’s why between VJT, Paul Giem and me the Chi_500 metric popped out three and a half years back.

    If you wonder, think about a Monte Carlo sim set, it explores a population and captures the range of outcomes reasonably likely to be encountered in a dynamic system [explores across time/runs] . . . or if you have background reflect on an ensemble of duplicate systems as Gibbs did for stat thermod, looking at what is feasibly observable. So, we have a reasonable practical model that is easily validated.

    Give all 10^57 atoms of the solar system trays of 500 coins, flip the trays and observe every 10^-14 s, fast ionic rxn rate. Do for 10^17 s.

    You will be able to sample a fraction of the config space for 500 coins, 3.27*10^150 possibilities, that stands about as one straw to a cubical haystack 1,000 light years across, about as thick as our galaxy’s central bulge.

    Sampling theory tells us pretty quickly, that if such were superposed on our neighbourhood, with practical certainty we would blindly pick straw. We just are not able to sample a big enough fraction to make picking rarities up a practical proposition. And, precisely because FSCO/I comes because many correct parts have to be correctly arranged and coupled to achieve function, that tightly constrains configs. Equivalently, configs can be seen as coded sequences of Y/N q’s, set in string data structures. What AutoCAD does, in effect. So, discussion on strings is WLOG.

    To get a handle, 500 bits is about 72 ASCII characters as are used for this text. Not a lot.

    So, with just 500 bits, we already have a supertask for the solar system’s atomic resources. 501 bits doubles the space’s scope and 1,000 bits would make for a haystack that would swallow up the observable cosmos.
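
    For readers who want to check the solar-system numbers, here is a rough Python sketch. It assumes only the figures given in this comment (10^57 atoms, one observation per 10^-14 s, 10^17 s, 7 bits per ASCII character); the constant names are illustrative.

```python
import math

# Figures as quoted in the comment, expressed as powers of ten.
ATOMS_LOG10 = 57       # atoms in the solar system
RATE_LOG10 = 14        # observations per second (one every 10^-14 s)
DURATION_LOG10 = 17    # seconds of sampling

# Total observations: the exponents simply add.
log10_samples = ATOMS_LOG10 + RATE_LOG10 + DURATION_LOG10

# Size of the 500-bit configuration space, also as a power of ten.
log10_space = 500 * math.log10(2)

# Fraction of the space that could be sampled.
log10_fraction = log10_samples - log10_space
print(f"samples ~ 10^{log10_samples}")
print(f"space   ~ 10^{log10_space:.2f}")
print(f"sampled fraction ~ 10^{log10_fraction:.2f}")

# 500 bits expressed as 7-bit ASCII characters, rounded up.
ascii_chars = math.ceil(500 / 7)
print(f"500 bits ~ {ascii_chars} ASCII characters")

# Each extra bit doubles the space.
assert 2**501 == 2 * 2**500
```

    The straw-to-haystack comparison is a picture of that sampled fraction; the sketch does not attempt to verify the haystack dimensions themselves.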

    Sure, you can tag it “big numbers” and sneer.

    Rhetoric, not analysis.

    In Darwin’s pond or the other usual OOL scenarios, there are just simple chemicals to begin, you need to get to a gated, encapsulated, protein using metabolic entity with string storage of DNA codes [and proteins come in thousands of super families deeply isolated in AA sequence space], with ribosomes, ATP synthetase, and a von Neumann kinematic self replicator. And you have to get to that starting from physics, thermodynamic forces and chemistry, without intelligence.

    The degree of FSCO/I dwarfs the 500 – 1,000 bit toy scale thresholds we mentioned. Just genomes look like 100,000 – 1 mn bases.

    Where, if you want to argue incrementalism, you have to get to code based, taped storage, controlled constructor self replication, on empirically well grounded evidence.

    What we can reliably say is that there is per trillions of test cases and the needle in haystack analysis outlined, one known and plausible source: intelligently directed configuration, aka design.

    To move on to body plans and the other half of the protein domains, you are looking at 10 – 100+mn base prs per major body plan. The config spaces here are in calculator smoking territory.

    I know, you will have been assured with tree of life diags and stories that give the impression that there is a vast continent of functional forms easily traversed in increments that work all the way.

    False impression, starting from the degree of isolation of protein clusters in AA seq space.

    What has really driven the dominant school is ideological imposition, that starts on the concept that blind chance and mechanical necessity incrementally did it so anything that fits the narrative gets iconised.

    But, when you turn to the info issues that assurance evaporates.

    I hope that helps you at least understand some of where we are coming from.

    No this is not a little rhetorical gotcha game, some big unanswered issues lurk. And FSCO/I is at the heart of them.

    KF

  207. 207
    DNA_Jock says:

    kf, you said “So, with just 500 bits, we already have a supertask for the solar system’s atomic resources. 501 bits doubles the space’s scope and 1,000 bits would make for a haystack that would swallow up the observable cosmos.”
    Likewise when you calculated “Your comment no 248 to me is 1071 characters, at 7 bits each, well past the 500 bit threshold.”
    You appear to be assuming that every state in the config space is equiprobable; is this your assumption?

  208. 208
    the bystander says:

    DNA jock @ 207
    I am curious to know what other probability distribution you would use if there is no way of finding the probability of every possible state.

  209. 209
    MrMosis says:

    After a 14 billion year shake of its dice, the universe seems to have managed to create this here comment. Even in the config space of ascii characters in a comment box, that is quite impressive. But that is an artificial context. Truly the current configuration facilitating this comment was equiprobable with all other possible arrangements of constituent physical/material entities. Seems like usually things wind up in a state of equilibrium, rather than like this. Just hasn’t been enough time for this. We need more universes. Infinite universes will do the trick.

  210. 210
    Thorton says:

    KF’s nonsense really needs to be retitled

    “Functional Information Associated with Specific Complex Organization”

    FIASCO for short.

  211. 211
    kairosfocus says:

    Thorton, all you have done here is silly schoolyard taunting to play at erecting and knocking over a strawman. Strike one. KF

  212. 212
    DavidD says:

    KF – “Thorton, all you have done here is silly schoolyard taunting to play at erecting and knocking over a strawman. Strike one. KF”

    Well to be honest, that’s what happens when you open the door and let a wild animal inside of a human dwelling who is not potty trained. He’ll lift his leg and defecate anywhere he sees fit. Then turn right around growl, snarl and attack when you try and manage the situation. It’s not like you folks were unaware of his/her animalistic behavior. If anything can be said of this individual, he is at least consistent in his disrespect of fellow man. It’s not like he’s ever pretended to hide anything or promise to behave.

    Allowing this amnesty thing to all those banned Sock-Puppets was like a rural land owner calling the Septic Tank Company and asking them to please come back and dump that load they pumped from your Tank onto your front lawn because you want to prove to all your neighbors and the whole world how ecologically responsible and sustainable you are. The resulting infections and diseases that followed that fateful decision proved what a fatal flaw that decision really was, even though the motive may have been a genuine one.

  213. 213
    kairosfocus says:

    D-J:

    I am not making any particular probability assignment beyond what is implicit in normal constructions of information.

    We could have a side debate on the Bernoulli-Laplace principle of indifference as the first point of debates about probabilities and go on to the entropy approach used by Shannon to get the SUM pi log pi value that gives avg info per symbol, and how this is connected to entropy of a macrostate as the average lack of info (think Y/N q chain length to specify microstate given lab observable parameters specifying macro state), etc, but that would be a grand detour liable to even more pretzel twisting.

    Simply follow up on how Monte Carlo sims run as a set allow characterisation of the stochastic behaviour of a system as an exercise in sampling. Then understand that there are logical possibilities that are too remote relative to dominant clusters that lead to the needle in haystack search challenge with limited resources. The comparative I give, 10^57 sol system atoms as observers doing a sample of a 500-bit trial every 10^-14 s [a fast chem rxn rate] is a Monte Carlo on steroids, and in effect gives a generous upper limit to what sampling in our solar system can do.

    No realistic sol system process can explore more of a config space than this. In short I am giving a limit based on number of possible chem rxn time events in our sol system on a reasonable lifespan. (Cf Trevors and Abel on the universal plausibility bound for more, here. Well worth reading — rated, “highly accessed.”)

    For the observable cosmos, we are looking at 1,000 bits exhausting sampling possibilities even more spectacularly.

    Now, go to much slower organic chem rxns in some Darwin’s pond, comet, gas giant moon, etc. Pre-biotic situation. Note that hydrolysis and condensation rxns compete, there is generally no significant energetic reason to favour L- vs R-handed forms in ponds etc, there are cross-rxns, many of the relevant life reactions are energetically uphill (note ATP as molecular energy battery), we need to meet a fairly demanding cluster of organisation, process control etc. With the proteins coming in thousands of clusters in AA sequence space that are structurally unrelated, perhaps half needing to be there for the first working cell. Mix in chicken-egg loops for many processes, then impose a von Neumann kinematic self-replicator facility that uses codes, algorithmic step by step processes and more.

    No intelligently directed configuration permitted, per OOL model assumptions.

    Here are Orgel and Shapiro in their final exchange some years ago on the main schools of thought on OOL:

    [[Shapiro:] RNA’s building blocks, nucleotides contain a sugar, a phosphate and one of four nitrogen-containing bases as sub-subunits. Thus, each RNA nucleotide contains 9 or 10 carbon atoms, numerous nitrogen and oxygen atoms and the phosphate group, all connected in a precise three-dimensional pattern . . . . [[S]ome writers have presumed that all of life’s building blocks could be formed with ease in Miller-type experiments and were present in meteorites and other extraterrestrial bodies. This is not the case.

    A careful examination of the results of the analysis of several meteorites led the scientists who conducted the work to a different conclusion: inanimate nature has a bias toward the formation of molecules made of fewer rather than greater numbers of carbon atoms, and thus shows no partiality in favor of creating the building blocks of our kind of life . . . .

    To rescue the RNA-first concept from this otherwise lethal defect, its advocates have created a discipline called prebiotic synthesis. They have attempted to show that RNA and its components can be prepared in their laboratories in a sequence of carefully controlled reactions, normally carried out in water at temperatures observed on Earth . . . .

    Unfortunately, neither chemists nor laboratories were present on the early Earth to produce RNA . . .

    [[Orgel:] If complex cycles analogous to metabolic cycles could have operated on the primitive Earth, before the appearance of enzymes or other informational polymers, many of the obstacles to the construction of a plausible scenario for the origin of life would disappear . . . .

    It must be recognized that assessment of the feasibility of any particular proposed prebiotic cycle must depend on arguments about chemical plausibility, rather than on a decision about logical possibility . . . few would believe that any assembly of minerals on the primitive Earth is likely to have promoted these syntheses in significant yield . . . . Why should one believe that an ensemble of minerals that are capable of catalyzing each of the many steps of [[for instance] the reverse citric acid cycle was present anywhere on the primitive Earth [[8], or that the cycle mysteriously organized itself topographically on a metal sulfide surface [[6]? . . . Theories of the origin of life based on metabolic cycles cannot be justified by the inadequacy of competing theories: they must stand on their own . . . .

    The prebiotic syntheses that have been investigated experimentally almost always lead to the formation of complex mixtures. Proposed polymer replication schemes are unlikely to succeed except with reasonably pure input monomers. No solution of the origin-of-life problem will be possible until the gap between the two kinds of chemistry is closed. Simplification of product mixtures through the self-organization of organic reaction sequences, whether cyclic or not, would help enormously, as would the discovery of very simple replicating polymers. However, solutions offered by supporters of geneticist or metabolist scenarios that are dependent on “if pigs could fly” hypothetical chemistry are unlikely to help.

    Mutual ruin, in short; what they probably did not tell you in HS or College Bio textbooks and lectures.

    This is the root of the iconic tree of life.

    It is chock full of FSCO/I, as Orgel and Wicken noted across the 1970’s:

    ORGEL: . . . In brief, living organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures that are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity. [[The Origins of Life (John Wiley, 1973), p. 189.]

    WICKEN: ‘Organized’ systems are to be carefully distinguished from ‘ordered’ systems. Neither kind of system is ‘random,’ but whereas ordered systems are generated according to simple algorithms [[i.e. “simple” force laws acting on objects starting from arbitrary and common- place initial conditions] and therefore lack complexity, organized systems must be assembled element by element according to an [[originally . . . ] external ‘wiring diagram’ with a high information content . . . Organization, then, is functional complexity and carries information. It is non-random by design or by selection, rather than by the a priori necessity of crystallographic ‘order.’ [[“The Generation of Complexity in Evolution: A Thermodynamic and Information-Theoretical Discussion,” Journal of Theoretical Biology, 77 (April 1979): p. 353, of pp. 349-65. (Emphases and notes added. Nb: “originally” is added to highlight that for self-replicating systems, the blue print can be built-in.)]

    The roots of the terms complex specified information, specified complexity and functionally specific complex organisation and associated information lie in this OOL context. Early design thinkers such as Thaxton et al built on this and Dembski et al and Durston et al, used statistical and information theory techniques to develop metric models.

    The simple model in the OP above allows us to understand how they work, and provides a useful threshold metric of functionally specified complexity that is not plausibly the result of blind chance and mechanical necessity on the gamut of sol system or observed cosmos. Where, for chemical interactions, the sol system is to first reasonable approximation our effective universe.

    It is patently maximally implausible to get to first cell based life meeting known requisites by blind forces. Speculative intermediates depend on speculation or major investigator injections of intelligently directed contrivances. And, there is a paucity of the scope of results that would be required to give adequate empirical warrant to such an extraordinary claim. Including, recall, codes and algorithms plus matched executing machinery assembling themselves out of molecular agitation and thermodynamic forces in that pond or the like.

    The fact is, on trillions of test cases — start with the Internet and a technological civilisation around us — FSCO/I is a commonplace phenomenon and one where there is a reliably known, uniformly observed cause for it. One, consistent with the needle in haystack challenge outlined. Namely, the process of intelligently directed contingency, aka design.

    This more than warrants accepting FSCO/I as a reliable, inductively strong signature of design when we see it, even for cases where we do not have direct access to the observation of the causal process, e.g. the remote past of origins or an arson investigation etc.

    This is not even controversial generally, it is only because of the ideologically imposed demand and attempted redefinition of science and its methods in recent years . . . turning science and sci edu, too often into politics and propaganda . . . that there is a significantly polarised controversy and the sort of dismissive sneering schoolyard taunt contempt Th just exemplified.

    We speak here of inductive inference to the best current explanation, open to further observation and reasoned argument. If you disagree bring forth empirically backed evidence, as opposed to ideologised just so stories dressed up in lab coats.

    Turning to origin of major body plans, the degree of isolation in islands of function of protein domains in AA sequence space should give pause. The pop genetics to find and fix successive developmental muts should add to that. The number of co-ordinated muts to make a whale out of a land animal should ring warning bells. And the scope of genomes to effect body plans, 10 – 100+ mn bases, caps off for now.

    In short, we have an origins narrative that is clearly ideologically driven and lacks warranting observational evidence. It is deeply challenged to account for a common phenomenon, FSCO/I. One that happens to be pervasive in cell based life.

    One, whose observed origin, on trillions of cases, is no mystery: design.

    Yes, one may debate on merits and information theory based metrics. But in the end, the FSCO/I phenomenon is an empirical fact, not a dubious speculation. In examples in the living cell, we face digitally — discrete state — coded genetic information with many algorithmic features, and associated execution machinery that at that object code level, have to be very specifically co-ordinated. With kilobits to megabits and more of information to address, right in the heart of the living cell, in string data structures. With Nobel Prizes awarded for discovery and elucidation.

    So, what do you know about coded strings such as these in this thread, or in the chips in the computer behind what is on your screen, or in D/RNA?

    First, they are informational, and that coded information is functionally specific and measurably complex. This doc file has 369 kbytes, etc.

    Second, that info metrics take in selection and surprise from a field of possibilities, implying a log prob model, often on base 2 hence bits. Though, often the direct length of chain of y/n q’s to specify state approach [7 y/n q’s per ASCII character for familiar instance] gives essentially the same result. Hence again bits, binary or two state digits or elements. Redundancy in codes — per familiar cases, u almost always follows q in English text and adds very little info, e is about 1/8 of English text, etc — may reduce capacity somewhat but in the end does not materially affect the point.

    Third, that codes imply rules or conventions that communicate on assignment of symbolised states to meanings, which may be embedded in the architecture of a machine that effects machine code. The ribosome protein assembly system is a case of this.

    Fourth, that this is on the table at origin of cell based life. At its core.

    Design is at the table from OOL forward as a live option, whatever the ideologues may wish to censor.

    And, it is therefore at the table thereafter.

    And, would make sense out of a lot of things.

    Opening up many investigations and the prospect of reverse engineering and forward engineering. (Though, given the obvious dangers of molecular nanotech, that needs to be carefully regulated across this Century.)

    Nope, a science sparker not a science stopper.

    So, why should debates over Bernoulli-Laplace indifference and the like be seen as gotcha, lock up talking points . . .? (Apart from, ideologues looking for handy distractors to cloud and polarise an issue.)

    Sure, debate why we think any card has 1 in 52 chances, or any side of a die 1 in 6, or a coin 1 in 2, or the idea used by early stat mech thinkers that microstates to first approximation are equiprobable so the focus shifts to statistical weight of clusters.
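
    Under the indifference assignments just listed, the log measure from the OP (I = -log2 p) gives the information carried by each outcome. A minimal Python sketch, assuming equiprobable outcomes as in the card, die and coin cases; the function name is illustrative.

```python
import math

def info_bits(p):
    """Information in bits, I = -log2(p), for an outcome of probability p."""
    return -math.log2(p)

# Equiprobable cases named in the comment.
cases = {"coin side": 1 / 2, "die face": 1 / 6, "card": 1 / 52}
for name, p in cases.items():
    print(f"{name}: p = {p:.4f}, I = {info_bits(p):.2f} bits")

# Independent outcomes multiply in probability, so their information adds:
# two fair coin flips carry 1 + 1 = 2 bits.
two_coins = info_bits(0.5 * 0.5)
print(f"two coins: {two_coins:.2f} bits")
```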

    But recall, there is a more complex analysis that takes in distributions, and that Monte Carlo methods allow characterisation of patterns. Where, Durston et al are in effect arguing that across its span, life explored the range of reasonably practical empirically warranted possibilities, reflected in the patterns we see in protein families.

    And more.

    KF

    PS: BTW, in decision theory, we don’t usually assign a blind chance distribution to a decision node as intelligent deciders have purposes.

  214. 214
    kairosfocus says:

    DD, I hear your point — and I suspect, so does Management, but obviously factors we are not privy to were at work. KF

  215. 215
  216. 216
    DNA_Jock says:

    So your answer is “All prior probabilities are equal, by the principle of indifference”.
    Excellent. That’s specific, although rather difficult to justify. It does have the weight of tradition behind it, and I think I would make the same assumption myself.
    Next up: Once you have selected a value at one position, are the other states still all equiprobable? Or, to put it another way, are all the dimensions in your config space completely independent from all other dimensions?

  217. 217
    DNA_Jock says:

    As the bystander wrote:

    I am curious to know what other probability distribution you would use if there is no way of finding the probability of every possible state

    I agree – which makes calculating p(T|H) problematic.
    I think it is even worse: you cannot find the probability of any state, which makes calculating p(T|H) impossible. But I could be wrong…

  218. 218
    Joe says:

    DNA_jock- the problem with using probabilities is that evos cannot provide any. Just allowing probabilities is giving evos more than they deserve.

  219. 219
    Joe says:

    The anti-ID mob can try to make fun of CSI and all of its variants but they sure as hell cannot provide any methodology that shows unguided processes can account for Crick’s defined biological information that is found in all living organisms.

    IOW the anti-ID mob is full of cowards.

  220. 220
    kairosfocus says:

    D-J: I made no such claim, please do not pigeon-hole and strawmannise me. I simply pointed out the significance and utility of the principle (which is undeniable regardless of debates made, note dice, card and coin cases, other assignments modify this . . . and in info contexts reduce info capacity [but in our case not materially]), and that there is a more complex SUM of pi log pi approach used with Shannon’s H metric and other cases. I take as much more basic the approach of string length of y/n q’s to specify state in a high contingency context. Beyond, I pointed out that the Durston et al approach, building on the H metric, uses the actual history of life as recorded in protein family variability, is in effect a real world Monte Carlo search which can characterise realistic distribution. Is it not 3.8 – 4.2 Bn y on the usual timeline for sampling, and c 600 MY for extant complex forms of esp animals. So, there is a practical, real world Monte Carlo sim run pattern that allows H, avg info per symbol or relevant message bloc, to be deduced. We have reasonable access to info metrics, by simple 20 states per AA, or by H-metric assessed off real life variability in protein families. So, the silly schoolyard taunts on elephants in rooms collapse. Protein chains are info bearing and are functionally specific, in aggregate are well beyond the sort of thresholds that make blind chance and necessity utterly implausible, and with the associated phenomena of codes, algorithms and execution machinery, strongly point to design of the cell based life form. KF

  221. 221
    Thorton says:

    Still no real-world CSI calculations from any of the ID crowd. Oh well.

  222. 222
    DNA_Jock says:

    kf noted

    in info contexts reduce info capacity…

    Agreed; therefore any ‘bit-counting’ method will be inaccurate

    …[but in our case not materially]

    Aye, there’s the rub!
    What is the basis for this claim?
    How have you quantified the magnitude of the error?
    Please be precise.

  223. 223
    kairosfocus says:

    D-J:

    Info carrying capacity maxes out at a flat distribution, and anything other will be somewhat different and less.
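The claim that capacity peaks at a flat distribution can be checked with Shannon's H = -SUM pi log2 pi. The following is a minimal sketch; the skewed frequencies are hypothetical values chosen for illustration, not measured base or amino-acid frequencies.

```python
import math

def shannon_entropy(probs):
    """Average information per symbol, H = -sum(p * log2(p)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four-symbol alphabet (e.g. A/G/C/T) with equal frequencies.
flat = [0.25, 0.25, 0.25, 0.25]
# Hypothetical unequal frequencies, for illustration only.
skewed = [0.4, 0.3, 0.2, 0.1]

h_flat = shannon_entropy(flat)
h_skewed = shannon_entropy(skewed)

print(f"flat:   {h_flat:.3f} bits/symbol")
print(f"skewed: {h_skewed:.3f} bits/symbol")
```

For this toy case the flat distribution gives 2 bits per symbol and the skewed one about 1.85, so unequal prevalence lowers the per-symbol figure without reducing it to zero.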

    But prevalence of A/G/C/T or U and the 20 amino acids in proteins, in general and for specific families, has long been studied, as can be found in Yockey’s studies.

    The Durston metrics of H reflect this, which comes out in SUM pi log pi metrics.

    What happens is that AFTER going to H-values on functional vs ground vs null states — family, general distribution, flat distribution, using those for I and addressing the 500 bit threshold, of Durston’s 15 families, I gave three results, chosen to illustrate the point:

    RecA: 242 AA, 832 fits, Chi: 332 bits beyond
    SecY: 342 AA, 688 fits, Chi: 188 bits beyond
    Corona S2: 445 AA, 1285 fits, Chi: 785 bits beyond

    Immaterial, even for individual proteins.
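The three "bits beyond" figures above follow from subtracting the 500-bit threshold from each fits value. A short sketch of that arithmetic, taking the fits values exactly as quoted in the comment (the function and variable names are hypothetical):

```python
THRESHOLD_BITS = 500  # threshold used in the comment above

# Fits values as quoted in the comment for three protein families.
fits_by_family = {"RecA": 832, "SecY": 688, "Corona S2": 1285}

def bits_beyond_threshold(fits, threshold=THRESHOLD_BITS):
    """Chi as used in the comment: fits minus the threshold, in bits."""
    return fits - threshold

chi = {name: bits_beyond_threshold(f) for name, f in fits_by_family.items()}

for name, value in chi.items():
    print(f"{name}: {value} bits beyond the threshold")
```

This only reproduces the subtraction; it says nothing about how the fits values themselves were derived.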

    And in aggregate, the genomes are of order 100,000 – 1 mn bases at the low end. Toss 98% as junk and treat the remaining 2,000 bases per 100 k bases as one bit per base, not two, and we are still in “designed” territory. That’s before addressing the associated organisation of machinery, which must be a going concern in a living cell for DNA to work. That raises issues of FSCO/I beyond protein codes in the DNA.

    A more realistic number for a low end independent cell based life form is 300 kbases.

    In short, no material difference, even giving concessions that should not be given just so.
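The genome-level concession can be worked through numerically. This sketch assumes the reading that 98% of bases are discarded and each remaining base is counted at one bit; applying the same concessions to the 300 kbase figure is an illustrative extension, not something the comment states.

```python
THRESHOLD_BITS = 500

def conservative_bits(total_bases, percent_kept=2, bits_per_base=1):
    """Bits counted after discarding most bases and halving the per-base rate."""
    kept_bases = total_bases * percent_kept // 100
    return kept_bases * bits_per_base

low_end = conservative_bits(100_000)    # 2,000 bases kept, 1 bit each
realistic = conservative_bits(300_000)  # same concessions, 300 kbases

print(low_end, low_end > THRESHOLD_BITS)
print(realistic, realistic > THRESHOLD_BITS)
```

Under these assumptions the counts are 2,000 and 6,000 bits. Whether a per-base count of this kind is the right measure is the point DNA_Jock disputes in the replies.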

    Remember, at OOL, the chemistry and thermodynamics of chaining do not program which base follows which, and that if it did, info capacity would be zero.

    The biochemical predestination thesis Kenyon et al put forth in the 60’s, on proteins did not survive examination of chaining patterns. (Indeed that is part of how Kenyon, in writing the foreword to TMLO, c 1984, repudiated his thesis.)

    And more . . .

    KF

  224. 224
    kairosfocus says:

    Thorton, you are now speaking with disregard to truth, hoping that what you have said or suggested will be taken as true. Sad. You need to take time to actually address what Durston et al have done which is a direct fact to the contrary of your declaration. And, of course there is much more that you have willfully despised and pretended away. Please, do better. KF

    PS: Onlookers, remember that just to start, ordinary computer files are FSCO/I; this is a familiar thing. And through node-arc 3-d patterns that are functionally specific, organisation converts to FSCO/I metrics, as AutoCAD etc routinely do. mRNA coding for proteins is a string data structure at essentially 2 bits per base, 6 per protein AA. Again, if you have been unduly disturbed cf:

    http://www.uncommondescent.com.....formation/
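The "2 bits per base, 6 per protein AA" figures are carrying-capacity numbers for a four-letter alphabet read three bases at a time. A small sketch of where they come from, with the flat 20-state amino-acid figure added for comparison:

```python
import math

# Capacity of one base drawn from a 4-letter alphabet (A/G/C/U).
bits_per_base = math.log2(4)

# One codon is three bases, so its raw capacity is three times that.
bits_per_codon = 3 * bits_per_base

# For comparison: one of 20 amino acids, assuming a flat distribution.
bits_per_aa_flat = math.log2(20)

print(bits_per_base, bits_per_codon, round(bits_per_aa_flat, 2))
```

The 6-bit figure is the capacity of the 64 possible codons; since those map onto 20 amino acids, the flat per-residue figure is about 4.32 bits.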

  225. 225
    gpuccio says:

    DNA_Jock:

    If you want, I can give you the scenario for protein coding genes.

    If we assume that a new protein superfamily appears at some time in natural history (which is exactly what we observe), and that the new gene derives from some existing sequence (another protein coding gene, or a pseudogene, or some non coding DNA sequence), then the scenario is:

    A to B through a random walk.

    Where A and B are two states which are unrelated at sequence level. Why? Because protein superfamilies are by definition unrelated at sequence level. And there are 2000 of them

    So, B is sequence unrelated to A, and therefore we can only assume that it has the same probability as any other unrelated state.

    What is that probability? As we assume that all unrelated states have approximately the same probability, if we have n possible states, it is obvious that the minority of states which are sequence related to A has a higher probability of being found than any unrelated state, especially in a random walk where most variation is of a single aminoacid.

    Therefore, we can safely assume that 1/n is a higher threshold for the probability of each unrelated state.

    Is that clear?
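The 1/n bound above can be expressed in bits. This is a sketch of the uniform model as gpuccio states it, with n taken as 20^L for a chain of L amino acids; the length of 150 is a hypothetical example value, and the uniformity assumption itself is what the following replies contest.

```python
import math

def uniform_bound_bits(length, alphabet_size=20):
    """-log2(1/n) where n = alphabet_size ** length, computed via logs."""
    return length * math.log2(alphabet_size)

# Hypothetical chain length, chosen only for illustration.
bits_150 = uniform_bound_bits(150)

print(f"1/n for a 150-residue chain corresponds to {bits_150:.1f} bits")
```

Working in logs avoids handling 20^150 directly; the result is roughly 648 bits for this example length.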

  226. 226
    Joe says:

    timmy: “Thar ain’t no ID CSI calculations cuz I don’t know what I am talking about!”

    And that is after being shown how to calculate CSI and a peer-reviewed paper that follows the methodology.

  227. 227
    Thorton says:

    Joke

    And that is after being shown how to calculate CSI

    Yeah Joke, you told us. Like when you calculated the CSI of an aardvark by counting the letters in its dictionary definition.

    ALL SCIENCE SO FAR!

  228. 228
    Joe says:

    LoL!@ thorton the moron. I NEVER calculated the CSI of an aardvark by counting the letters in its dictionary definition. You are either a liar or a loser.

    I used the definition of an aardvark as an example of specified information and then used Shannon methodology to determine if it was also complex. It was a simple example but obviously too complicated for a simpleton like yourself.

    Being proud of your ignorance and dishonesty can’t be a good thing.

  229. 229
    DNA_Jock says:

    gpuccio,

    Thank you for the thoughtful reply.

    When you say

    A to B through a random walk

    I have two quibbles:
    1) The ancestral sequence is not directly available to us; we have X to A and X to B. Some researchers have, with some success, inferred what X must have looked like. To assume that the path of interest is from A to B is to fall victim to “Axe’s mistake”.
    2) It is NOT a ‘random’ walk. It is two or more stochastic, somewhat constrained walks. Selection may play a role.

    Where A and B are two states which are unrelated at sequence level. Why? Because protein superfamilies are by definition unrelated at sequence level. And there are 2000 of them

    Your definition of superfamilies differs from mine. My understanding is that a superfamily is the highest taxonomic level at which ancestry can be inferred. It’s not that different families are completely unrelated, it’s that the relationship is unclear.

    So, B is sequence unrelated to A, and therefore we can only assume that it has the same probability as any other unrelated state.

    This makes no sense. You appear to be claiming that novel superfamilies arise from TOTALLY random sequences. Where on earth would an organism find such totally random sequences? Cells are, however, chock-a-block full of sequences that form stable alpha-helices, others that form stable beta-sheets, some that form stable helix-turn-helix motifs. These motifs are too short and permissive to allow the accurate inference of common ancestry millions of years later*, but they completely destroy the ‘bit-counting’ probability calculations. As does any selection.
    So using 1/n for p(T|H) is hopelessly inaccurate.

    * Do you know how recent is the most recent superfamily? Thanks,
    DNA_J

  230. 230
    DNA_Jock says:

    kf,

    But prevalence of A/G/C/T or U and the 20 amino acids in proteins, in general and for specific families, has long been studied, as can be found in Yockey’s studies.

    Accounting for the different prevalences of the different monomers is easily done, I agree. But that is not where your problem lies.

    The Durston metrics of H reflect this, which comes out in SUM pi log pi metrics.

    H is a measure of functional uncertainty, not a probability. You have to feed pi into this equation…
    I am asking you how you calculate a single pi value. You have already admitted that you assume independence and that this assumption is incorrect- viz: ” info contexts reduce info capacity”, but you have asserted that your error is not “material”.
    When I ask you what basis you have for this “not material” claim, your response is “because the numbers we get are soooo big”.
    So you have yet to calculate p(T|H) with any known level of precision.

  231. 231
    gpuccio says:

    DNA_Jock:

    Good questions. Here are my answers.

    1) “The ancestral sequence is not directly available to us; we have X to A and X to B. Some researchers have, with some success, inferred what X must have looked like. To assume that the path of interest is from A to B is to fall victim to “Axe’s mistake”.”

    OK, but we are dealing here with superfamilies which have no detectable sequence similarities, and no similar folding, and no similar function.

    You must remember that the walk happens at sequence level, through random mutations in the nucleotide sequence, which have nothing to do with the function. I will discuss the role of selection later.

    So, whatever the possible ancestor X, the fact remains that what we observe are unrelated sequences (2000 of them) with unrelated folding and unrelated function. Believing that they derived from some common ancestor, which was itself functional, is like believing in a myth: you can do that, but there is no empirical support.

    When an ancestor is reconstructed (tentatively) that is done because of similarities in the descendants. You cannot do that for unrelated sequences.

    2) “It is NOT a ‘random’ walk. It is two or more stochastic, somewhat constrained walks. Selection may play a role.”

    Again, I will discuss selection later. Regarding the two walks, no, it is a random walk from the ancestor A, whatever it was, to B. The simple fact is, the ancestor had to be different at sequence level and folding level and function level, if the superfamily as such only appears at some moment in natural history, and there is no trace at all of any ancestor, at sequence, folding or function level, in the existing proteome. Again, you can believe what you like, and myths are a legitimate hobby, but in science we have to look at facts.

    You say:

    “Your definition of superfamilies differs from mine. My understanding is that a superfamily is the highest taxonomic level at which ancestry can be inferred. It’s not that different families are completely unrelated, it’s that the relationship is unclear.”

    The currently defined superfamilies in SCOP are completely sequence unrelated, and they have different folding and different functions. What do you want more?

    You say:

    “This makes no sense. You appear to be claiming that novel superfamilies arise from TOTALLY random sequences. Where on earth would an organism find such totally random sequences?”

    You are forgetting non coding DNA. We have now many examples of new genes emerging from non coding sequences, usually through transposon activity. Can you explain why a non coding sequence should not be unrelated to a protein coding sequence which emerges in time from it? Especially if that happens through unguided transposon activity?

    Let’s go back to the alpha and beta subunits of ATP synthase. They are unique sequences, emerging very early (in LUCA) and retaining most of the sequence for 4 billion years.

    What is your hypothesis? From what “ancestor” sequence did they emerge?

    The fact remains, if a specific functional sequence emerges, say 2 billion years ago (for example, all the many new superfamilies and proteins in eukaryotes) and there is no trace of those sequences, foldings and functions before, the only reasonable hypothesis is that those sequences emerged from unrelated sequences at some time.

    You say:

    “Cells are, however, chock-a-block full of sequences that form stable alpha-helices, others that form stable beta-sheets, some that form stable helix-turn-helix motifs. These motifs are too short and permissive to allow the accurate inference of common ancestry millions of years later*, but they completely destroy the ‘bit-counting’ probability calculations. ”

    No. Absolutely not. Secondary structures are common to all proteins, but in themselves they build no specific function. The function is defined at many levels, but the two most important ones are:

    a) The general folding and more specific tertiary structure of the molecule. That is highly sequence dependent, even if different sequences can fold in similar ways, but almost always they retain at least some homology. So, in the same superfamily or family you can have 90% homology or 30% homology. In rare cases you can have almost no homology, but that is rather an exception.

    b) The active site. That determines more often the specific affinity to different possible substrates, and is often what changes inside a family to bear new, more or less different, functions.

    But you cannot certainly consider secondary structures, like alpha helix and beta sheets, as independent functional units.

    Instead, domains which are really functional, even if simple, are usually retained throughout evolution, and show clear homologies even in very distant proteins.

    You say:

    “As does any selection.”

    OK, but as I have always said the role of selection must be considered separately. It has nothing to do with the computation of probabilistic resources and barriers.

    Now, the topic is very big, and I have dealt with it in detail many times. I cannot cover it in all its aspects in this post.

    Briefly, the main points are.

    1) Negative selection is responsible for the sequence conservation of what is already functional. That is a very powerful mechanism. It is the reason why ATP synthase chains are almost the same after 4 billion years: almost all mutations are functionally detrimental, so the sequence is conserved.

    2) Neutral mutations and genetic drift are responsible for the modifications in the permissive, not strictly constrained, parts of the sequence. That’s why similar proteins, with the same function, and structure, can sometimes diverge very much in different species. That is the “big bang theory of protein evolution”: new proteins appear functional, and then traverse their functional space through neutral variation.

    3) Positive selection should act by expanding any naturally selectable positive variation and therefore “fixing” it in the general population. This is the only part that can lower the probabilistic barriers. Unfortunately, it is also the part for which we have no evidence. Apart from the few known, trivial microevolutionary scenarios.

    The fact is, positive selection is extremely rare and trivial. It selects for small variations (one or two AAs) which usually lower the functionality of some existing structure, but confer an advantage under specific extreme selective scenarios (see antibiotic resistance, for example).

    Now, for positive selection to be really useful in generating new unrelated functional sequences, two things should be true:

    a) Complex functional sequences should be, as a rule, deconstructable into many intermediates, each of them functional and naturally selectable, each of them similar at sequence level to its ancestor except for a couple of AAs. IOWs, the transition from A to B should always be deconstructable into a composite transition:

    A to A1 to A2 to A3… to B-3 to B-2 to B-1 to B

    where each transition is in the range of the probabilistic resources of the system (from one to a few AAs) and each intermediate step is fully naturally selectable and naturally selected at some time (expanded to the general population).

    That is simply not true, both logically and empirically. We have no examples of such a deconstruction, and there is no logical reason why that should be possible, not only in specific cases, but in the general case. So, we have both logical and empirical reasons to reject that assumption.

    b) Anyway, if a) were true, we should observe strong and obvious traces of those intermediates in the proteome. We don’t. Or we should be able to find those paths in our labs. We don’t.

    So, again, myths and nothing else.

    A final consideration: negative selection, which is a strong and well documented mechanism, can only act as a force against the transition as long as the ancestor is a functional sequence. That has been shown very clearly in the ragged landscape paper. Each relative peak of functionality in the sequence space tends to block the process: those peaks are, indeed, holes.

    That’s why most biologists rely more on scenarios where the ancestor is not functional: duplicated, inactivated genes, or simply non coding sequences.

    But that means relying essentially on neutral variation. And neutral variation is completely irrelevant in lowering the probabilistic barriers: it simply cannot do that.

    Finally, as far as I know new superfamilies are documented up to the primates. I am not sure in humans, probably not. But, as you know, there are many possible new genes in humans which have not yet been characterized (a few thousands).

  232. 232
    kairosfocus says:

    D-J et al: I have to be busy elsewhere, but I note simply that H is a metric of avg info per element, which can be derived from the on the ground statistics, which is what Durston et al did. Where, the history of life can be taken as exploring the field of realistic possibilities in a real life Monte Carlo run, on the usual timeline across c 4 BY. Similarly, it is a pretty standard result that the chain of y/n possibilities defining a state is a valid info metric. So, it’s not that we are stuck for probability values, but that info can be arrived at in its own terms and in fact then worked backwards to probabilities if you want. For example, Morse simply went to printers to learn general English text statistical frequencies and used that to set up his code for efficiency. No prizes for guessing why E is a single dot. So, the pretence that there are no valid info metrics on the table that indicate FSCO/I collapsed long since; not to mention the pretence that FSCO/I does not describe a real phenomenon, something as commonplace as text in posts in this thread and the complex functional organisation of the PCs etc people use to look at the thread. Also, the AA space and islands in it, GP is clearly addressing. KF
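The Morse example rests on the relation I = -log2 p: a symbol that turns up more often carries less information per occurrence and can be given a shorter code. A minimal sketch using a toy sample string (hypothetical, and far too short to represent English letter statistics):

```python
import math
from collections import Counter

def letter_information(text):
    """Per-letter information -log2(frequency), estimated from the text itself."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: -math.log2(n / total) for ch, n in counts.items()}

# Toy sample, for illustration only.
sample = "see the tree"
info = letter_information(sample)

for ch, bits in sorted(info.items(), key=lambda item: item[1]):
    print(f"{ch}: {bits:.2f} bits")
```

In this sample "e" is half of all letters and so scores 1 bit, while letters seen once score about 3.32 bits. This captures single-symbol frequencies only and does not address dependence between positions.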

  233. 233
    DNA_Jock says:

    kf,
    As I already said, in the comment you are replying to,

    “Accounting for the different prevalences of the different monomers is easily done, I agree. “

    Which is what Morse did.

    “But that is not where your problem lies.”

    You have already admitted that you assume independence and that this assumption is incorrect.
    How big is the error? How do you know?

  234. 234
    DNA_Jock says:

    Gpuccio,
    I agree with you that the origin of novel protein domains (specifically “folds”) is perhaps the greatest challenge to the MES.
    OTOH I disagree with almost everything that you say about selection; in my reply below, however, I am going to attempt to restrict myself to those disagreements that bear directly on the point at issue.

    DNA_Jock:
    Good questions. Here are my answers.
    1) “The ancestral sequence is not directly available to us; we have X to A and X to B. Some researchers have, with some success, inferred what X must have looked like. To assume that the path of interest is from A to B is to fall victim to “Axe’s mistake”.”
    OK, but we are dealing here with superfamilies which have no detectable sequence similarities, and no similar folding, and no similar function.

    No. The 2020 superfamilies in SCOP are grouped into 1202 “folds”, based on similarities in folding.
    Furthermore, Apolipoprotein A-I and Apolipoprotein A-II belong to different folds, and therefore different superfamilies, yet they have similarity in “function”.

    You must remember that the walk happens at sequence level, through random mutations in the nucleotide sequence, which have nothing to do with the function. I will discuss the role of selection later.
    So, whatever the possible ancestor X, the fact remains that what we observe are unrelated sequences (2000 of them) with unrelated folding [incorrect] and unrelated function [incorrect]. Believing that they derived from some common ancestor, which was itself functional, is like believing in a myth: you can do that, but there is no empirical support.
    When an ancestor is reconstructed (tentatively) that is done because of similarities in the descendants. You cannot do that for unrelated sequences.
    2) “It is NOT a ‘random’ walk. It is two or more stochastic, somewhat constrained walks. Selection may play a role.”
    Again, I will discuss selection later. Regarding the two walks, no, it is a random walk from the ancestor A, whatever it was, to B. The simple fact is, the ancestor had to be different at sequence level and folding level and function level, if the superfamily as such only appears at some moment in natural history, and there is no trace at all of any ancestor, at sequence, folding [incorrect] or function [incorrect] level, in the existing proteome. Again, you can believe what you like, and myths are a legitimate hobby, but in science we have to look at facts.
    You say:
    “Your definition of superfamilies differs from mine. My understanding is that a superfamily is the highest taxonomic level at which ancestry can be inferred. It’s not that different families are completely unrelated, it’s that the relationship is unclear.”
    The currently defined superfamilies in SCOP are completely sequence unrelated, and they have different folding [incorrect] and different functions [incorrect]. What do you want more?
    You say:
    “This makes no sense. You appear to be claiming that novel superfamilies arise from TOTALLY random sequences. Where on earth would an organism find such totally random sequences?”
    You are forgetting non coding DNA. We have now many examples of new genes emerging from non coding sequences, usually through transposon activity. Can you explain why a non coding sequence should not be unrelated to a protein coding sequence which emerges in time from it? Especially if that happens through unguided transposon activity?

    I have not forgotten non-coding DNA. It still isn’t “random” in the mathematical sense. Consider the prevalence of di-, tri- and tetra-nucleotide repeats, or CpG bias. But thank you for bringing up transposons.

    Let’s go back to the alpha and beta subunits of ATP synthase. They are unique sequences, emerging very early (in LUCA) and retaining most of the sequence for 4 billion years.

    Not sure what “unique” means here: ncbi reports 42 distinct versions of alpha, and 51 distinct versions of beta. “unique” means they differ from each other, I guess.

    What is your hypothesis? From what “ancestor” sequence did they emerge?

    “A primordial ATP binding protein”. I, too, am unsatisfied with this answer.

    The fact remains, if a specific functional sequence emerges, say 2 billion years ago (for example, all the many new superfamilies and proteins in eukaryotes) and there is no trace of those sequences, foldings and functions before, the only reasonable hypothesis is that those sequences emerged from unrelated sequences at some time.

    As I understand this sentence, you are saying that if we cannot determine the source DNA for an event that took place 2 billion years ago, the only reasonable hypothesis is that the source DNA is completely unrelated to the DNA that is ancestral to any other extant DNA sequences. Weird.

    You say:
    “Cells are, however, chock-a-block full of sequences that form stable alpha-helices, others that form stable beta-sheets, some that form stable helix-turn-helix motifs. These motifs are too short and permissive to allow the accurate inference of common ancestry millions of years later*, but they completely destroy the ‘bit-counting’ probability calculations. ”
    No. Absolutely not. Secondary structures are common to all proteins, but in themselves they build no specific function. The function is defined at many levels, but the two most important ones are:
    a) The general folding and more specific tertiary structure of the molecule. That is highly sequence dependent, even if different sequences can fold in similar ways, but almost always they retain at least some homology [but the homology may be undetectable, see ‘folds’, above]. So, in the same superfamily or family you can have 90% homology or 30% homology. In rare cases you can have almost no homology, but that is rather an exception. [as you get further apart, it becomes a fairly frequent exception]
    b) The active site. That determines more often the specific affinity to different possible substrates, and is often what changes inside a family to bear new, more or less different, functions.

    c) Allosteric sites are important too.

    But you cannot certainly consider secondary structures, like alpha helix and beta sheets, as independent functional units.
    Instead, domains which are really functional, even if simple, are usually retained throughout evolution, and show clear homologies even in very distant proteins.

    Why can I not consider them? “Forms a stable alpha helix” is a useful function, from the protein’s perspective. Anyway, I also included the helix-turn-helix motif. Sequence-specific DNA binding seems “really functional”. There are a number of 2ndary structure motifs that make useful building blocks for new proteins, but they are too short and/or too permissive of substitutions for the relationship to be unambiguously detected billions of years later, and you have to be wary of false positives caused by convergent evolution. Hence the problem with inferring ancestral relationships above the level of superfamily.

    You say:
    “As does any selection.”
    OK, but as I have always said the role of selection must be considered separately. It has nothing to do with the computation of probabilistic resources and barriers.

    No. If you are trying to calculate the probability of hitting a target, given the “chance” hypothesis (which includes RM + NS), then you ARE going to have to consider the effect of selection. And recombination…

    [snipped section on selection, most of which I disagree with, but we can save that debate for another time…]
    That’s why most biologists rely more on scenarios where the ancestor is not functional: duplicated, inactivated genes, or simply non coding sequences.

    Or fragments of the aforementioned categories, brought into novel combinations by illegitimate recombination, potentially catalyzed by transposons.
    But in all cases, and especially if the source is duplicated or inactivated genes, then the source DNA is not “random”. It may be random with respect to function, but my point to kairosfocus was this:
    Any bit-counting method of calculating p(T|H), or the information content of a string, rests on the assumption that the values at each position of the string are INDEPENDENT.
    Whatever “random” (meaning unselected) bits of DNA a cell might cobble together to make a new gene, the resulting string is NOT “random” in the independent sampling sense of the word. That was my point.
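The independence point can be made concrete: when positions in a string are correlated, adding up per-position entropies overstates the joint entropy. The sketch below uses a deliberately extreme toy case (hypothetical data in which the second symbol always copies the first); it shows the direction of the error, not its size for real sequences.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Toy two-position strings in which position 2 is fully determined by position 1.
pairs = [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")]

h_first = entropy(Counter(p[0] for p in pairs))
h_second = entropy(Counter(p[1] for p in pairs))
h_joint = entropy(Counter(pairs))

print(f"sum of per-position entropies: {h_first + h_second:.1f} bits")
print(f"joint entropy:                 {h_joint:.1f} bits")
```

Here bit-counting by position gives 4 bits while the pair actually carries 2; the gap is the mutual information between the positions. How large that gap is for real genes is the open question being pressed in the comment.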

  235. 235
    rich says:

    *grabs popcorn*

  236. 236
    DNA_Jock says:

    kairosfocus said

    …the history of life can be taken as exploring the field of realistic possibilities in a real life Monte Carlo run, on the usual timeline across c 4 BY.

    I agree. Of course n=1, and the posterior probability for the outcome we observe is = 1. What can we infer about the prior probabilities? Not a lot, given the [cough] small sample size.

    …the history of life can be taken as exploring the field of realistic possibilities in a real life Monte Carlo run, on the usual timeline across c 4 BY.

    Not if you believe in Intelligent Design, it can’t.

    …the history of life can be taken as exploring the field of realistic possibilities in a real life Monte Carlo run, on the usual timeline across c 4 BY.

    That’s quite the GSW to the foot.

  237. 237
    kairosfocus says:

    D-J:

    Have you really read what I did say; as in, particularly the significance of SUM (pi log pi) and the bridge from Shannon’s info work to entropy on an informational view?

    Next, the point of the real world Monte Carlo is not that blind chance and mechanical necessity led to life — that’s a big begging of the question on a priori evo mat [your probabilities as triumphalistically announced collapse], but that we see the scope of search relative to space of possibilities, which turns out to be such that the only reasonable expectation is to capture the bulk, non-functional gibberish. Precisely because of the tight specificity of configs that confer function, which locks out most of the space of possibilities.

    As for the claimed contradiction, the chaining chemistry of DNA and AAs do not impose any particular sequencing in strings, that would be self defeating. Indeed, if there were such an imposition it would render DNA incapable of carrying information, and it would undermine the ability of the proteins to have the varied functional forms they do.

    In that context, a priori it is quite legitimate to use carrying capacity as an info measure as is common in PC work.

    However, real codes do not show flat random distributions, mostly because of functional rules or requisites or conventions.

    DNA chains are coding for proteins, which in turn must fold and function, which imposes subtle, specific limits. Thus protein families as studied by Durston et al, have the difference between null (flat random), ground and functional states, which can then lead to onward more refined metrics such as he used.

    But in no wise does this delegitimise the string of choices approach. Are you going to say that file sizes as reported for DOC files etc are not reporting information metrics?

    If so, then I think you will be highly idiosyncratic.

    A fairer result would be to accept that there are diverse info metrics taken up for different purposes, a point highlighted by Shannon in 1948.

    And of course, whether we use DNA and AA chains or go to more refined metrics — try Yockey’s work of several decades ago — the material point remains the same. We are well beyond any threshold reasonably attainable by blind chance and mechanical necessity with 10^57 monte carlo sim runs of 10^14 tries per sec for 10^17 s, or if you want 10^80 runs for the observed cosmos.
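The search-resources figure quoted above can be multiplied out and compared with a 500-bit configuration space. This sketch reads "10^57 runs of 10^14 tries per sec for 10^17 s" as 10^57 parallel searchers, which is an interpretive assumption; the variable names are hypothetical.

```python
import math

runs = 10**57            # parallel searchers, as read from the comment
tries_per_second = 10**14
seconds = 10**17

total_tries = runs * tries_per_second * seconds
config_space = 2**500    # number of states describable with 500 bits

bits_searched = math.log2(total_tries)
fraction_sampled = total_tries / config_space

print(f"total tries: 10^{len(str(total_tries)) - 1}")
print(f"equivalent to about {bits_searched:.1f} bits")
print(f"fraction of a 500-bit space sampled: {fraction_sampled:.2e}")
```

The product is 10^88 tries, about 292 bits' worth, which is a very small fraction of 2^500. The comparison is only meaningful under a blind uniform sampling model, which is the assumption under dispute in this thread.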

    I have to run, having been summoned for an urgent meeting on very short notice.

    I cannot take up more time from what is now on my plate, tonight.

    G’night,

    KF

  238. 238
    gpuccio says:

    DNA_Jock:

    I read only now your post. Thank you for the comments.

    Some brief answers:

    I don’t understand your argument about my choice of superfamilies. I have always said clearly that in SCOP there are at least 3 major groupings which can be used in a discussion: about 1000 basic foldings, about 2000 superfamilies, and about 4000 families. In the papers which in the literature deal with the problem of functional complexity or the appearance of new structures in natural history, any of them has been used. Some reason in terms of folds, others in terms of superfamilies, others in terms of families.

    I believe that superfamilies are the grouping which offers the best tradeoff in terms of detecting truly isolated groups. The family grouping probably includes some possibly similar sequences which are not the best example to discuss the problems of new functional complexity. The folding grouping is IMO too “high”, and exaggerates the aggregation of completely different structures.

    However, I am proposing a general reasoning here: the point is not if we are debating a grouping of 1000, or 2000, or 4000. The point is that there are a lot of functional structures which appear isolated in all reasonable dimensions.

    You say:

    “I have not forgotten non-coding DNA. It still isn’t “random” in the mathematical sense. Consider the prevalence of di-, tri- and tetra-nucleotide repeats, or CpG bias. But thank you from bringing up transposons.”

    I love transposons. They are the future of ID.

    I think you miss the point. What is random is not the sequence itself. A non functional sequence and a functional protein can have a similar quasi random appearance, or share some regularities which have biochemical reasons and are not related to the function. That’s not the point.

    The point is that the only way to go from one sequence to another one which are sequence unrelated, by means of random variation alone (bear with me about that for a moment, I will answer your other point), that is by variations in the sequence which are independent from the final sequence which will be reached (the functional sequence) can only be modeled as a random walk, where distant unrelated results have a similar probability of being reached, whose higher threshold is 1/n.

    So, it’s the walk which is random, and it’s the probability of a functional state of being reached through that random walk that we are computing.

    You say:

    ““A primordial ATP binding protein”. I, too, am unsatisfied with this answer.”

    So am I. And remember, ATP synthase is just one example of many. Of many many.

    You say:

    “As I understand this sentence, you are saying that if we cannot determine the source DNA for an event that took place 2 billion years ago, the only reasonable hypothesis is that the source DNA is completely unrelated to the DNA that is ancestral to any other extant DNA sequences. Weird.”

    Not weird at all. The point is, let’s say that we have A and B today, and that they are completely sequence unrelated, and structure unrelated, and function unrelated. Are you saying that it is reasonable to explain them both with a derivation from some unknown sequence, let’s call it X, which had sequence similarities to both, or structure similarities to both, or function similarities to both? Is that your proposal? Weird, really.

    Do you really think that it is a credible scientific explanation for two completely different things to postulate an unknown source for both? Without any evidence at all that it exists?

    You say:

    ““Forms a stable alpha helix” is a useful function, from the protein’s perspective. ”

    But that is exactly the point. We cannot consider “the protein’s perspective” in the RV+NS algorithm. The only perspective there is the reproductive fitness of the replicator.

    So, unless you can show how the formation of a generic stable alpha helix can confer a reproductive advantage to some real biological being, the fact remains that a generic alpha helix cannot be naturally selected.

    “Anyway, I also included the helix-turn-helix motif.”

    That is different. That is a 3d motif, which is functional. It is a simple one, about 20 AAs, so I would not choose it as an example of dFSCI. But what is your point? Of course there are simple functional motifs, and simple short proteins too which can be functional. That’s why I don’t use them as examples of dFSCI and I don’t infer design for them.

    Moreover, that motif is a DNA binding motif (one of many) which is essential to functions of proteins interacting with DNA. It is difficult to think of an independent function for the motif itself, which can confer a reproductive advantage.

    I don’t understand what is your point. Are you saying that you can explain the emergence of long and complex proteins or domains or superfamilies (whatever you prefer) as the result of reasonable recombinations of an existing pool of short motifs, each of which was expanded because in itself it conferred a reproductive advantage? Are you imagining some pre-LUCA whose main activity was to synthesize short alpha helices or beta sheets, selected in name of I don’t know what perspective, or short DNA binding motifs which bind DNA without having any other effect, while waiting that ATP synthase and all the rest came out from their random recombination?

    I believe that you are forgetting here the huge limits of the necessity part of the algorithm: NS. It can never help to explain the origin of complex functional structures, because complex functional structures cannot be deconstructed into simple naturally selectable steps. It’s as simple as that.

    You say:

    “No. If you are trying to calculate the probability of hitting a target, given the “chance” hypothesis (which includes RM + NS), then you ARE going to have to consider the effect of selection. And recombination…”

    No. What I do is to use probability for the parts of the proposed explanation which are attributed to random variation, and include NS in the model if and when it is really explained how it worked in that case. IOWs, I don’t accept NS as a magic fairy, never truly observed, which prevents me from computing real probability barriers to a mechanism which is supposed to rely on probabilities. I believe that that is the only credible scientific approach.

    Let’s say that you show me a true, existing step between A and B which is naturally selectable and is a sequence intermediate between A and B. Let’s call it A1. OK, then I divide the transition into two steps: A to A1 and A1 to B. That helps your case, but I can still compute the probabilities of each of the two transitions, and of the whole process, including the expansion of A1 by some perfect NS process. I have done those computations, some time ago, here. They show that selectable intermediates do reduce the probabilistic barriers, but they don’t eliminate them. In general, you need a lot of selectable intermediates, each of them completely expanded to the whole population, to bring a complex protein into the range of small probabilistic transitions, which can be accomplished by the probabilistic resource of a real biological system.

    And that should be the general case. What a pity that we have not even one of those paths available. But that’s what neo darwinism is: a scientific explanation which relies on uncomputed probabilities and never observed necessity paths. That’s not my idea of science.

    “But in all cases, and especially if the source is duplicated or inactivated genes, then the source DNA is not “random”. It may be random with respect to function, but my point to kairosfocus was this:
    Any bit-counting method of calculating p(T|H), or the information content of a string, rests on the assumption that the values at each position of the string are INDEPENDENT.
    Whatever “random” (meaning unselected) bits of DNA a cell might cobble together to make a new gene, the resulting string is NOT “random” in the independent sampling sense of the word. That was my point.”

    Well, I am not really interested in what your point to KF was or is. I have tried to clarify what is the role of “random” in the ID reasoning: a random walk to some unrelated functional sequence. I have tried to be explicit and clear. The only thing I am interested in is eventual points about that, if you have them.

  239.
    DNA_Jock says:

    DNA_Jock:
    I read only now your post. Thank you for the comments.
    Some brief answers:
    I don’t understand your argument about my choice of superfamilies. I have always said clearly that in SCOP there are at least 3 major groupings which can be used in a discussion: about 1000 basic foldings, about 2000 superfamilies, and about 4000 families. In the papers in the literature which deal with the problem of functional complexity or the appearance of new structures in natural history, any of them has been used. Some reason in terms of folds, others in terms of superfamilies, others in terms of families.
    I believe that superfamilies are the grouping which offers the best tradeoff in terms of detecting truly isolated groups. The family grouping probably includes some possibly similar sequences which are not the best example to discuss the problems of new functional complexity. The folding grouping is IMO too “high”, and exaggerates the aggregation of completely different structures.
    However, I am proposing a general reasoning here: the point is not if we are debating a grouping of 1000, or 2000, or 4000. The point is that there are a lot of functional structures which appear isolated in all reasonable dimensions.

    And on this point I agree with you. However, your choice of superfamilies made a number of your statements regarding ‘relatedness’ factually incorrect. Let’s aim for precision in thought and language.

    You say:
    “I have not forgotten non-coding DNA. It still isn’t “random” in the mathematical sense. Consider the prevalence of di-, tri- and tetra-nucleotide repeats, or CpG bias. But thank you from bringing up transposons.”
    I love transposons. They are the future of ID.
    I think you miss the point. What is random is not the sequence itself. A non functional sequence and a functional protein can have a similar quasi random appearance, or share some regularities which have biochemical reasons and are not related to the function. That’s not the point.
    [emphasis added]

    Actually, that is the point. The sequence itself is NOT “random”, in the mathematical sense.

    The point is that the only way to go from one sequence to another one which are sequence unrelated, by means of random variation alone (bear with me about that for a moment, I will answer your other point), that is by variations in the sequence which are independent from the final sequence which will be reached (the functional sequence) can only be modeled as a random walk, where distant unrelated results have a similar probability of being reached, whose higher threshold is 1/n.
    So, it’s the walk which is random, and it’s the probability of a functional state of being reached through that random walk that we are computing.

    I think that your error here may be caused by an equivocation of the word “random”. I am confident that this has been explained to you before, so I do not hold out much hope of succeeding where others have failed. But I’m an optimist, so…
    The variations that are introduced during any iteration are “random” with respect to function. However, the output from any iteration (and therefore the input to all subsequent iterations) is strongly dependent on function.
    It’s a subtle point, but technically, your statement “by variations in the sequence, which are independent from the final sequence” is also incorrect: “independent” has a specific meaning, and the variations introduced (random as they are wrt function) are in fact correlated with the final sequence that is reached. This confusion may be caused by your Texas Sharpshooter Fallacy, as seen in the phrase “which will be reached” – there is not a unitary final sequence, rather there is a set of possible sequences, one of which we observe.
    One can model drift as a random walk, agreed, but any application of selection, however slight, wrecks the RW model. Furthermore, the probability for a random walk occupying position x only reaches 1/n after the string has been completely randomized, which will take a very long time. By way of illustration, the probability that a random walk will occupy positions closer to the starting point remains higher than 1/n even after every ‘monomer’ has been mutated, on average, five times in every member of the population. (After an average of 5 mutations per monomer, there is a 0.6% chance that any individual monomer is still unmutated; so for a 100 amino acid domain, that’s a half chance of still retaining an original amino acid in the sequence; it ain’t scrambled yet, but there has not been any discernable similarity for a while…)
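
    A minimal sketch of the arithmetic in that parenthesis, assuming that mutations hit each monomer independently and that the number of hits per monomer is Poisson distributed (the comment does not name a model, so both are assumptions made here for illustration):

```python
import math

MEAN_HITS = 5   # average number of mutations per monomer, as in the comment
LENGTH = 100    # length of the amino acid domain, as in the comment

# Poisson probability that a given monomer has received zero mutations.
p_untouched = math.exp(-MEAN_HITS)

# Probability that at least one of the LENGTH monomers is still untouched,
# treating the positions as independent.
p_any_untouched = 1 - (1 - p_untouched) ** LENGTH

if __name__ == "__main__":
    print(f"P(one monomer unmutated) = {p_untouched:.4f}")
    print(f"P(at least one original residue left in {LENGTH}) = {p_any_untouched:.3f}")
```

    Under these assumptions the per-monomer figure comes out a little under 0.7% and the per-domain figure a little under one half, in line with the rough numbers quoted.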

    You say:
    ““A primordial ATP binding protein”. I, too, am unsatisfied with this answer.”
    So am I. And remember, ATP synthase is just one example of many. Of many many.

    Agreed.

    You say:
    “As I understand this sentence, you are saying that if we cannot determine the source DNA for an event that took place 2 billion years ago, the only reasonable hypothesis is that the source DNA is completely unrelated to the DNA that is ancestral to any other extant DNA sequences. Weird.”
    Not weird at all. The point is, let’s say that we have A and B today, and that they are completely sequence unrelated, and structure unrelated, and function unrelated. Are you saying that it is reasonable to explain them both with a derivation from some unknown sequence, let’s call it X, which had sequence similarities to both, or structure similarities to both, or function similarities to both? Is that your proposal? Weird, really.

    Here’s where a second equivocation causes problems. You appear to be using the word “related” to mean both “having discernable similarity” and “sharing a common ancestor”.
    The sentence I was replying to:

    The fact remains, if a specific function sequence emerges, say 2 billion years ago (for example, all the many new superfamilies and proteins in eukaryotes) and there is no trace of those sequences, foldings and functions before, the only reasonable hypothesis is that those sequences emerged from unrelated sequences at some time.

    I had great difficulty trying to understand this sentence. Firstly, we are actually considering two or more extant sequences, that share no apparent similarity. Absent a time machine, we cannot directly access the ancestral sequences. You appear to be saying that, because they share no apparent similarity today, the only reasonable hypothesis is that they emerged from “unrelated sequences” 2 billion years ago, meaning sequences that had no discernable similarity 2 billion years ago. I think that a reasonable hypothesis might be that the intervening 2 billion years has buried any detectable similarity signal.

    Do you really think that it is a credible scientific explanation for two completely different things to postulate an unknown source for both? Without any evidence at all that it exists?

    Well, that’s a whole other conversation. Motes and beams, y’know.
    😉

    You say:
    ““Forms a stable alpha helix” is a useful function, from the protein’s perspective. ”
    But that is exactly the point. We cannot consider “the protein’s perspective” in the RV+NS algorithm. The only perspective there is the reproductive fitness of the replicator.
    So, unless you can show how the formation of a generic stable alpha helix can confer a reproductive advantage to some real biological being, the fact remains that a generic alpha helix cannot be naturally selected.

    Protein X’s function is to be a convenient way of storing amino acids for future use, without raising osmolality prohibitively. The stable alpha helix reduces the risk of aggregation.

    “Anyway, I also included the helix-turn-helix motif.”
    That is different. That is a 3d motif, which is functional. It is a simple one, about 20 AAs, so I would not choose it as an example of dFSCI. But what is your point? Of course there are simple functional motifs, and simple short proteins too which can be functional. That’s why I don’t use them as examples of dFSCI and I don’t infer design for them.
    [emphasis added]

    You do realize that you can string simple short proteins together to produce longer, more complicated proteins? I’m glad that you don’t infer design from them, I guess, but this concession alone pretty much torpedoes your argument.

    Moreover, that motif is a DNA binding motif (one of many) which is essential to functions of proteins interacting with DNA. It is difficult to think of an independent function for the motif itself, which can confer a reproductive advantage.

    Well, it could itself bind to DNA, and thus sterically exclude RNA polymerase from binding. Kd for the monomer is much higher, but it still exists. And any dimerization will produce a protein dimer that recognizes a palindrome with far lower Kd, purely as a consequence of C2 symmetry. Call me biased, but I think modulation of transcription is pretty useful.

    I don’t understand what is your point. Are you saying that you can explain the emergence of long and complex proteins or domains or superfamilies (whatever you prefer) as the result of reasonable recombinations of an existing pool of short motifs, each of which was expanded because in itself it conferred a reproductive advantage? Are you imagining some pre-LUCA whose main activity was to synthesize short alpha helices or beta sheets, selected in name of I don’t know what perspective, or short DNA binding motifs which bind DNA without having any other effect, while waiting that ATP synthase and all the rest came out from their random recombination?

    Yup, pretty much, but without the unnecessary value judgments. I am imagining some pre-LUCA’s who synthesized short alpha helices, beta sheets, helix-turn-helix motifs, etc., etc. and concatenations thereof, selected in name of some minimal selective advantage, such as DNA binding motifs which inhibit transcription, until they were out-competed by their fractionally less hopeless cousins.
    But, unfortunately, all the selection and extinction that has occurred since that time has badly fogged our view of this era. I doubt that we will ever know the historical truth about how early proteins did emerge, but – thanks to in vitro protein evolution studies – we do know a fair amount about what is feasible.

    I believe that you are forgetting here the huge limits of the necessity part of the algorithm: NS. It can never help to explain the origin of complex functional structures, because complex functional structures cannot be deconstructed into simple naturally selectable steps. It’s as simple as that.

    Please provide support for this assertion. Please be very precise.

    You say:
    “No. If you are trying to calculate the probability of hitting a target, given the “chance” hypothesis (which includes RM + NS), then you ARE going to have to consider the effect of selection. And recombination…”
    No. What I do is to use probability for the parts of the proposed explanation which are attributed to random variation, and include NS in the model if and when it is really explained how it worked in that case. IOWs, I don’t accept NS as a magic fairy, never truly observed, which prevents me from computing real probability barriers to a mechanism which is supposed to rely on probabilities. I believe that that is the only credible scientific approach.

    Examples exist that model how error-prone replication combined iteratively with selection can achieve results that ID-proponents claim cannot be achieved within the lifetime of the universe. Some rudimentary ones have been discussed on UD, but the discussions here shed more heat than light, sadly.

    Let’s say that you show me a true, existing step between A and B which is naturally selectable and is a sequence intermediate between A and B. Let’s call it A1. OK, then I divide the transition into two steps: A to A1 and A1 to B. That helps your case, but I can still compute the probabilities of each of the two transitions, and of the whole process, including the expansion of A1 by some perfect NS process. I have done those computations, some time ago, here. They show that selectable intermediates do reduce the probabilistic barriers, but they don’t eliminate them. In general, you need a lot of selectable intermediates, each of them completely expanded to the whole population, to bring a complex protein into the range of small probabilistic transitions, which can be accomplished by the probabilistic resource of a real biological system.
    And that should be the general case. What a pity that we have not even one of those paths available. But that’s what neo darwinism is: a scientific explanation which relies on uncomputed probabilities and never observed necessity paths. That’s not my idea of science.
    [emphasis added]

    Some of your colleagues have yet to get the memo.
    Contrast the behavior of those who assert “such pathways cannot exist” with those who say “let’s investigate what would be required?”

    “But in all cases, and especially if the source is duplicated or inactivated genes, then the source DNA is not “random”. It may be random with respect to function, but my point to kairosfocus was this:
    Any bit-counting method of calculating p(T|H), or the information content of a string, rests on the assumption that the values at each position of the string are INDEPENDENT.
    Whatever “random” (meaning unselected) bits of DNA a cell might cobble together to make a new gene, the resulting string is NOT “random” in the independent sampling sense of the word. That was my point.”
    Well, I am not really interested in what your point to KF was or is. I have tried to clarify what is the role of “random” in the ID reasoning: a random walk to some unrelated functional sequence. I have tried to be explicit and clear. The only thing I am interested in is eventual points about that, if you have them.

    I would humbly suggest that if you want to clarify what is the role of “random” in the ID reasoning, you make clear, on every occasion that the word is used, which of the various meanings you are ascribing to it:
    “Unguided” = your “random walk”
    “Unselected” = your “random” source DNA.
    “Mathematically random” (i.e. unbiased, independent sampling) = kairosfocus’s erroneous assumption about strings.
    And get your colleagues to do the same.

  240.
    gpuccio says:

    DNA_Jock:

    Again, thank you for the comments. I like this discussion, because your arguments are good and clear and pertinent.

    So, let’s go on.

    First of all, I give you my explicit definition of “random”. I use that term in one sense only.

    Random is any system which, while evolving according to necessity laws (we are not discussing quantum systems here) can best be described by some appropriate probability distribution. IOWs, a random system is a necessity system which we cannot describe with precision by its necessity laws, usually because too many variables are involved.

    Let’s go on.

    And on this point I agree with you. However, your choice of superfamilies made a number of your statements regarding ‘relatedness’ factually incorrect. Let’s aim for precision in thought and language.

    Well, I am happy we have clarified. I like precision, but please admit that this is a general blog where I was trying to make a general discussion available to all. Sometimes you have to make some tradeoff between precision and simplicity of language.

    Actually, that is the point. The sequence itself is NOT “random”, in the mathematical sense.

    That’s why I refer “random” to the random walk if not modified by selection. I only use the term random in the “RV” part of the algorithm. It means that the variation is random, because we cannot describe it by a strict necessity law, but we can use a probabilistic approach. Some people (many, unfortunately) erroneously think that random means some system showing a uniform probability distribution, but that is not your case, I think. So, please always refer to my initial definition of a random system. To be more clear, I will explicitly define RV:

    In a biological system, RV is any variation of which we cannot describe the results with precision, but which can be well modeled by some probability distribution.

    The neo darwinian model is a probabilistic model which includes a necessity step, NS. The model is also sequential, because it hypothesizes that NS acts on the results of RV, modifying the probabilistic scenario by expanding naturally selectable outcomes. But RV and NS always act in sequence, and that’s why we can separate them in our reasoning and modeling.

    The variations that are introduced during any iteration are “random” with respect to function. However, the output from any iteration (and therefore the input to all subsequent iterations) is strongly dependent on function.

    I don’t agree. See my definition of “random system”. If you have a different definition, please give it explicitly. I find your use of “random” rather confusing here.

    First of all, what do you mean by “iteration”?

    My way to describe RV is:

    “The variation events that happen at any moment are random because their outcome can best be described by a probabilistic approach.”

    For example, SNPs are random because we cannot anticipate which outcome will take place. Even if the probability of each event is not the same, the system is random just the same. In general, we can assume a uniform distribution of the individual transitions for practical purposes. And of course other variation events, like deletion and so on, do not have the same probability as a general SNP. But the system of all the possible variations remains random (unless the variations are designed 🙂 ).

    Function has nothing to do with this definition.

    It’s a subtle point, but technically, your statement “by variations in the sequence, which are independent from the final sequence” is also incorrect: “independent” has a specific meaning, and the variations introduced (random as they are wrt function) are in fact correlated with the final sequence that is reached. This confusion may be caused by your Texas Sharpshooter Fallacy, as seen in the phrase “which will be reached” – there is not a unitary final sequence, rather there is a set of possible sequences, one of which we observe.

    No, you are confused here. I will try to be more clear. What I mean is that the outcome of random variation can best be explained by a probabilistic approach, and the probabilistic description of that outcome has nothing to do with any functional consideration. Can you agree with that simple statement?

    One can model drift as a random walk, agreed, but any application of selection, however slight, wrecks the RW model.

    I have never said anything different. But RV and selection act sequentially. So I can apply the random walk model to any step which does not include selection. And I will accept NS only if it is supported by facts, not as a magic fairy accepted by default because of a pre commitment based on personal (or collective) faith.

    Furthermore, the probability for a random walk occupying position x only reaches 1/n after the string has been completely randomized, which will take a very long time. By way of illustration, the probability that a random walk will occupy positions closer to the starting point remains higher than 1/n even after every ‘monomer’ has been mutated, on average, five times in every member of the population. (After an average of 5 mutations per monomer, there is a 0.6% chance that any individual monomer is still unmutated; so for a 100 amino acid domain, that’s a half chance of still retaining an original amino acid in the sequence; it ain’t scrambled yet, but there has not been any discernable similarity for a while…)

    That’s why I say that 1/n is the highest probability we can assume for an unrelated state. Obviously, related states are more likely. What are you assuming here, that all proteins are related although we have no way of detecting that? Faith again? The fact remains that a lot of protein sequences are unrelated, as far as we can objectively judge. Therefore, we must assume 1/n as the high threshold of probability in a random walk from an unrelated precursor to the. Please, let me understand how a single precursor, related to all existing proteins, could have generated the many unrelated existing proteins by individual random walks, each with probability higher than 1/n. Is that what you are really suggesting? Seriously?

    Agreed.

    Re-agreed! 🙂

    Here’s where a second equivocation causes problems. You appear to be using the word “related” to mean both “having discernable similarity” and “sharing a common ancestor”.

    No. I only mean “having discernable similarity”. I am an empirical guy. “Having discernable similarity” is an observable, a fact. “Sharing a common ancestor” is a hypothesis, a theory. If we do not have the fact, we cannot support the hypothesis (unless we can observe or infer the process in other ways). IOWs, if we have no discernable similarity, there is no reason to believe that two sequences share a common ancestor (unless you have other independent evidence, based on facts). Observables. My categories are very simple, and must remain simple.

    I had great difficulty trying to understand this sentence. Firstly, we are actually considering two or more extant sequences, that share no apparent similarity. Absent a time machine, we cannot directly access the ancestral sequences. You appear to be saying that, because they share no apparent similarity today, the only reasonable hypothesis is that they emerged from “unrelated sequences” 2 billion years ago, meaning sequences that had no discernable similarity 2 billion years ago. I think that a reasonable hypothesis might be that the intervening 2 billion years has buried any detectable similarity signal.

    No. If you have read the previous point, maybe you can understand better what I mean. “Reasonable” here means “scientifically reasonable”, IOWs justified by facts. So, the meaning is very simple: we observe two unrelated sequences, and we have no facts that support the existence of a common ancestor. Therefore, the reasonable scientific hypothesis is that they share no common ancestor. You can certainly hypothesize that they shared a common ancestor and that “the intervening 2 billion years has buried any detectable similarity signal”, but that is not science, unless you have independent empirical support. It is not scientifically reasonable to hypothesize at the same time that something is true but that we cannot have any evidence of it.

    You do realize that you can string simple short proteins together to produce longer, more complicated proteins? I’m glad that you don’t infer design from them, I guess, but this concession alone pretty much torpedoes your argument.

    Some proteins use simpler modules. That does not torpedo anything.

    First of all, many of those basic modules are complex enough.

    Second, proteins have many long parts which are certainly functional and are not explained as the sum of simple modules. ATP synthase subunits are again a good example. But just think of transcription factors, and all the part of their sequence (usually the longest part) which is not a DBD.

    Third, individual short modules, very often, while functional in a greater context (the long proteins), would not be naturally selectable by themselves (or, again, we have no evidence for that).

    Well, it could itself bind to DNA, and thus sterically exclude RNA polymerase from binding. Kd for the monomer is much higher, but it still exists. And any dimerization will produce a protein dimer that recognizes a palindrome with far lower Kd, purely as a consequence of C2 symmetry. Call me biased, but I think modulation of transcription is pretty useful.

    Useful is not the same as naturally selectable. And again, where is the evidence? The facts?

    Yup, pretty much, but without the unnecessary value judgments. I am imagining some pre-LUCA’s who synthesized short alpha helices, beta sheets, helix-turn-helix motifs, etc., etc. and concatenations thereof, selected in name of some minimal selective advantage, such as DNA binding motifs which inhibit transcription, until they were out-competed by their fractionally less hopeless cousins.
    But, unfortunately, all the selection and extinction that has occurred since that time has badly fogged our view of this era. I doubt that we will ever know the historical truth about how early proteins did emerge, but – thanks to in vitro protein evolution studies – we do know a fair amount about what is feasible.

    Again, refer to my discussion about what science is. Science is not about what is possible, but about what is supported by facts.

    Please provide support for this assertion. Please be very precise.

    It’s simple. The assertion is:

    “Complex functional structures cannot be deconstructed into simple naturally selectable steps. It’s as simple as that”.

    We have tons of examples of complex functional structures. In machines, in software, in language, in proteins and other biological machines. In all examples of complex functional information, the global function is the result of an intelligent aggregation of bits of information which, in themselves, cannot give any incremental function. No complex function can be deconstructed into a simple sequence of transformations, each of them simple enough to be generated by random variation, each of them conferring higher functionality in a linear sequence. That is as true of software as it is true of proteins. There is no rule of logic which says that I can build complex functions by aggregating simple steps which are likely enough to be generated randomly.

    Please, show any deconstruction of that kind for one single complex protein. Then show that there are reasons to believe that such a (non existent) case can be the general case. Then we can discuss NS as a scientific theory supported by facts.

    Examples exist that model how error-prone replication combined iteratively with selection can achieve results that ID-proponents claim cannot be achieved within the lifetime of the universe. Some rudimentary ones have been discussed on UD, but the discussions here shed more heat than light, sadly.

    To what are you referring? And I notice the “rudimentary” in your statement. Telling, isn’t it?

    Some of your colleagues have yet to get the memo.
    Contrast the behavior of those who assert “such pathways cannot exist” with those who say “let’s investigate what would be required?”

    I have always been very clear about my position. The second one.

    I would humbly suggest that if you want to clarify what is the role of “random” in the ID reasoning, you make clear, on every occasion that the word is used, which of the various meanings you are ascribing to it:
    “Unguided” = your “random walk”
    “Unselected” = your “random” source DNA.
    “Mathematically random” (i.e. unbiased, independent sampling) = kairosfocus’s erroneous assumption about strings.

    I have given my definitions. In a discussion with me, please stick to them (or criticize them).

    And get your colleagues to do the same.

    I answer for myself. Like many other interlocutors, you have a strange idea of scientific debate. I have no colleagues. Neither should you. We are intelligent (I hope 🙂 ) people, sharing some ideas and not others. And we discuss. This is not a political party, or a fight between football fans.

  241. 241
    Alan Fox says:

    This is not a political party, or a fight between football fans.

    Well, it shouldn’t be.

  242. 242
    gpuccio says:

    Alan:

    It is not for me, and never has been. ID is about ideas, scientific ideas. The rest is not important.

  243. 243
    DNA_Jock says:

    Gpuccio,
    I like this discussion too; you make rational arguments.

    First of all, I give you my explicit definition of “random”. I use that term in one sense only.
    Random is any system which, while evolving according to necessity laws (we are not discussing quantum systems here) can best be described by some appropriate probability distribution. IOWs, a random system is a necessity system which we cannot describe with precision by its necessity laws, usually because too many variables are involved.
    Let’s go on.

    I would call your “random process” a “stochastic process”. I would call your “random sequence” an “undetermined sequence”. Of course, pretty much any sequence would be “random” under your usage, which does reduce the usefulness of the term somewhat. And it is my belief (albeit a rather personal, idiosyncratic one) that ALL processes are stochastic, just to a greater or lesser degree. For many, many processes the stochastic element is virtually undetectable, and can safely be ignored. Thus we can safely ignore the stochastic effects of QM for almost all biological systems.

    [snip]
    That’s why I refer “random” to the random walk if not modified by selection. I only use the term random in the “RV” part of the algorithm. It means that the variation is random, because we cannot describe it by a strict necessity law, but we can use a probabilistic approach. Some people (many, unfortunately) erroneously think that random means some system showing a uniform probability distribution, but that is not your case, I think. So, please always refer to my initial definition of a random system. To be more clear, I will explicitly define RV:
    In a biological system, RV is any variation of which we cannot describe the results with precision, but which can be well modeled by some probability distribution.
    The neo darwinian model is a probabilistic model which includes a necessity step, NS. The model is also sequential, because it hypothesizes that NS acts on the results of RV, modifying the probabilistic scenario by expanding naturally selectable outcomes. But RV and NS always act in sequence, and that’s why we can separate them in our reasoning and modeling.
    DNAJ wrote “The variations that are introduced during any iteration are “random” with respect to function. However, the output from any iteration (and therefore the input to all subsequent iterations) is strongly dependent on function.”
    I don’t agree. See my definition of “random system”. If you have a different definition, please give it explicitly. I find your use of “random” rather confusing here.
    First of all, what do you mean by “iteration”?

    “Iteration” refers to a single act of replication. Maybe easier to think of as a single generation, although this would underestimate the number of iterations. A population “P0” suffers mutation(s) that are random-with-respect-to-function, creating “P0m”. P0m then replicates, creating “P1” whose distribution of mutations IS dependent on function.

    My way to describe RV is:
    “The variation events that happen at any moment are random because their outcome can best be described by a probabilistic approach.”
    For example, SNPs are random because we cannot anticipate which outcome will take place. Even if the probabilities of the individual events are not the same, the system is random just the same. In general, we can assume a uniform distribution of the individual transitions for practical purposes. And of course other variation events, like deletions and so on, do not have the same probability as a general SNP. But the system of all the possible variations remains random (unless the variations are designed).
    Function has nothing to do with this definition.

    I would use the word stochastic; I agree that modeling the individual transitions as uniform p is okay for practical purposes, although you might want to distinguish transitions from transversions. Modeling indels and recombination is tougher, of course.

    DNAJ wrote “It’s a subtle point, but technically, your statement “by variations in the sequence, which are independent from the final sequence” is also incorrect: “independent” has a specific meaning, and the variations introduced (random as they are wrt function) are in fact correlated with the final sequence that is reached. This confusion may be caused by your Texas Sharpshooter Fallacy, as seen in the phrase “which will be reached” – there is not a unitary final sequence, rather there is a set of possible sequences, one of which we observe.”
    No, you are confused here.

    No, my statement is accurate and relevant. Your statement that “by variations in the sequence, which are independent from the final sequence” is incorrect. The variations in the sequence are stochastic, but they are NOT independent of the final sequence. They are correlated with the final sequence. As I said, it is a subtle point. I will try to explain: imagine a stochastic process A that leads to stochastic outcomes B. We can say “B is affected by A”, “A is not influenced by B”, but we CANNOT say “A is independent of B”, because the two things are in fact correlated. The direction of the causation arrow does not matter, as far as the correlation is concerned.
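
    A toy illustration of this point, offered only as a sketch: take a walk of n independent ±1 steps (process A) and its endpoint (outcome B). The function name and the ±1 walk are assumptions made for illustration, not a model of sequence evolution. Enumerating every possible path shows that the first step and the endpoint are correlated, with coefficient 1/sqrt(n), even though the endpoint has no influence on the step.

```python
from itertools import product
from math import sqrt


def corr_first_step_vs_endpoint(n_steps: int) -> float:
    """Exact Pearson correlation between the first step of a +/-1 walk
    and the walk's endpoint, computed over all 2**n_steps equally
    likely paths (hypothetical toy model)."""
    paths = list(product((-1, 1), repeat=n_steps))
    firsts = [p[0] for p in paths]      # process A: the first step
    ends = [sum(p) for p in paths]      # outcome B: the endpoint
    m = len(paths)
    mean_f = sum(firsts) / m
    mean_e = sum(ends) / m
    cov = sum((f - mean_f) * (e - mean_e) for f, e in zip(firsts, ends)) / m
    var_f = sum((f - mean_f) ** 2 for f in firsts) / m
    var_e = sum((e - mean_e) ** 2 for e in ends) / m
    return cov / sqrt(var_f * var_e)


if __name__ == "__main__":
    for n in (1, 4, 9):
        print(n, corr_first_step_vs_endpoint(n))
```

    The correlation shrinks as the walk gets longer but never reaches zero, which is the sense in which the steps are not independent of the outcome.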

    I will try to be more clear. What I mean is that the outcome of random variation can best be explained by a probabilistic approach, and the probabilistic description of that outcome has nothing to do with any functional consideration. Can you agree with that simple statement?

    For a single iteration I agree. For two or more iterations, I disagree.

    DNAJ wrote: “One can model drift as a random walk, agreed, but any application of selection, however slight, wrecks the RW model.”
    I have never said anything different. But RV and selection act sequentially. So I can apply the random walk model to any step which does not include selection. And I will accept NS only if it is supported by facts, not as a magic fairy accepted by default because of a pre commitment based on personal (or collective) faith.

    This is a root cause of our disagreement; I think we disagree about inference to best explanation and burdens of proof. I will note that your reference to faith in magic fairies strikes me as the sort of taunt a “football fan” would make, and it disappoints me.

    DNAJ wrote “Furthermore, the probability for a random walk occupying position x only reaches 1/n after the string has been completely randomized, which will take a very long time. By way of illustration, the probability that a random walk will occupy positions closer to the starting point remains higher than 1/n even after every ‘monomer’ has been mutated, on average, five times in every member of the population. (After an average of 5 mutations per monomer, there is a 0.6% chance that any individual monomer is still unmutated; so for a 100 amino acid domain, that’s a half chance of still retaining an original amino acid in the sequence; it ain’t scrambled yet, but there has not been any discernable similarity for a while…)”
    That’s why I say that 1/n is the highest probability we can assume for an unrelated state.
    Obviously, related states are more likely. What are you assuming here, that all proteins are related although we have no way of detecting that? Faith again? The fact remains that a lot of protein sequences are unrelated, as far as we can objectively judge. Therefore, we must assume 1/n as the high threshold of probability in a random walk from an unrelated precursor to the. Please, let me understand how a single precursor, related to all existing proteins, could have generated the many unrelated existing proteins by individual random walks, each with probability higher than 1/n. Is that what you are really suggesting? Seriously?
    DNAJ wrote

    Here’s where a second equivocation causes problems. You appear to be using the word “related” to mean both “having discernable similarity” and “sharing a common ancestor”.

    No. I only mean “having discernable similarity”. I am an empirical guy. “Having discernable similarity” is an observable, a fact. “Sharing a common ancestor” is a hypothesis, a theory. If we do not have the fact, we cannot support the hypothesis (unless we can observe or infer the process in other ways). IOWs, if we have no discernable similarity, there is no reason to believe that two sequences share a common ancestor (unless you have other independent evidence, based on facts: observables). My categories are very simple, and must remain simple.

    DNAJ wrote “I had great difficulty trying to understand this sentence. Firstly, we are actually considering two or more extant sequences, that share no apparent similarity. Absent a time machine, we cannot directly access the ancestral sequences. You appear to be saying that, because they share no apparent similarity today, the only reasonable hypothesis is that they emerged from “unrelated sequences” 2 billion years ago, meaning sequences that had no discernable similarity 2 billion years ago. I think that a reasonable hypothesis might be that the intervening 2 billion years has buried any detectable similarity signal.”
    No. If you have read the previous point, maybe you can understand better what I mean. “Reasonable” here means “scientifically reasonable”, IOWs justified by facts. So, the meaning is very simple: we observe two unrelated sequences, and we have no facts that support the existence of a common ancestor. Therefore, the reasonable scientific hypothesis is that they share no common ancestor. You can certainly hypothesize that they shared a common ancestor and that “the intervening 2 billion years has buried any detectable similarity signal”, but that is not science, unless you have independent empirical support. It is not scientifically reasonable to hypothesize at the same time that something is true but that we cannot have any evidence of it.

    Aha! I was wrong about the nature of your equivocation. My apologies. The problem is rather with the implications you make from the word “unrelated”. Given your strict use of the word “related” to mean “having discernable similarity”, then “unrelated” refers to all sequences that lack a discernable similarity, however near or far they may be from the test sequence. Under your usage, many “unrelated” sequences have probabilities that are far HIGHER than 1/n.
    I had originally assumed that when you said 1/n was the “upper bound” for “unrelated sequences”, that was a typographical error, and you actually meant “lower bound”. My bad.
    Think about it this way: there are three categories of sequence :
    1) “related”, meaning having discernable similarity
    2) “totally unrelated”, meaning having a probability of less than 1/n
    3) “invisibly related”, meaning having some proximity in the phase space, but this proximity is too low to be a reliable indicator of any (ancestral) relationship
    My paragraph that you quoted above (“at an average of 5 mutations per monomer, there is still a 50% chance that a 100 amino acid domain retains an original amino acid”) was my quick-and-dirty way of trying to point out the importance of this third category, just using the Poisson distribution.
    I now realize that the existence of this third category is probably a bone of contention, so I did a little modeling. Last year, as part of my job, I wrote a Gibbs sampling routine to allow me to optimize a set of ten parameters; today it was a simple matter to turn OFF the ‘oracle’, so that the code performs a truly random walk in the ten-dimensional space. I then asked how many generations does it take for a random walk to wander outside of the “discernable similarity” space (I used p=0.05 as my criterion here), and how many generations does it take for the same walk to reach a point where its position is fully uncorrelated with its starting position, i.e. the point when the walk first crosses the dividing line separating the half of the phase space that is closer to the starting point and first explores the “more distant half”.
    The early returns show that the time-to-no-discernable-similarity is one quarter of the time-to-no-correlation. Thus for every sequence with discernable similarity, there will be three “invisibly related” sequences. I think that this ratio will only get larger as dimensionality increases.
    Take-home: there are lots of “unrelated” sequences that have probabilities above 1/n.
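
    The quick-and-dirty Poisson estimate quoted above can be checked in a few lines. This is only a sketch under stated assumptions: hits per site are Poisson with mean 5, sites are independent, and a site that has been hit is counted as changed (back-mutation to the original residue is ignored). The function names are hypothetical.

```python
import math


def prob_site_unmutated(mean_hits: float) -> float:
    """Poisson probability that a single site has received zero hits."""
    return math.exp(-mean_hits)


def prob_any_site_unmutated(mean_hits: float, length: int) -> float:
    """Probability that at least one of `length` independent sites
    has received zero hits (assumed independence between sites)."""
    p_zero = prob_site_unmutated(mean_hits)
    return 1.0 - (1.0 - p_zero) ** length


p_site = prob_site_unmutated(5.0)              # about 0.0067, i.e. roughly 0.6-0.7%
p_domain = prob_any_site_unmutated(5.0, 100)   # about 0.49, i.e. roughly a half chance

if __name__ == "__main__":
    print(f"single site still original: {p_site:.4f}")
    print(f"100-residue domain keeps at least one original site: {p_domain:.2f}")
```

    Under these assumptions the figures in the quoted paragraph (about 0.6% per monomer, about a half chance per 100-residue domain) come out as stated.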

    DNAJ wrote “You do realize that you can string simple short proteins together to produce longer, more complicated proteins? I’m glad that you don’t infer design from them, I guess, but this concession alone pretty much torpedoes your argument.”
    Some proteins use simpler modules. That does not torpedo anything.
    First of all, many of those basic modules are complex enough.
    Second, proteins have many long parts which are certainly functional and are not explained as the sum of simple modules. ATP synthase subunits are again a good example. But just think of transcription factors, and all the part of their sequence (usually the longest part) which is not a DBD.

    In my youth, I spent a lot of time thinking about transcription factors. You are assuming that they need to be that long in order to carry out their function. Furthermore, you are assuming that their progenitor had to be that long and complicated to perform the progenitor’s function. If you want to talk about facts, they are against you here.

    Third, individual short modules, very often, while functional in a greater context (the long proteins), would not be naturally selectable by themselves (or, again, we have no evidence for that).

    DNAJ wrote “Well, it could itself bind to DNA, and thus sterically exclude RNA polymerase from binding. Kd for the monomer is much higher, but it still exists. And any dimerization will produce a protein dimer that recognizes a palindrome with far lower Kd, purely as a consequence of C2 symmetry. Call me biased, but I think modulation of transcription is pretty useful.”
    Useful is not the same as naturally selectable. And again, where is the evidence? Facts?

    This makes no sense to me whatsoever.

    DNAJ wrote “Yup, pretty much, but without the unnecessary value judgments. I am imagining some pre-LUCA’s who synthesized short alpha helices, beta sheets, helix-turn-helix motifs, etc., etc. and concatenations thereof, selected in name of some minimal selective advantage, such as DNA binding motifs which inhibit transcription, until they were out-competed by their fractionally less hopeless cousins.
    But, unfortunately, all the selection and extinction that has occurred since that time has badly fogged our view of this era. I doubt that we will ever know the historical truth about how early proteins did emerge, but – thanks to in vitro protein evolution studies – we do know a fair amount about what is feasible.”
    Again, refer to my discussion about what science is. Science is not about what is possible, but about what is supported by facts.

    You have a strange view of Science. Working hypotheses may be short on supporting facts today. The key thing is that they make testable predictions.

    DNAJ wrote “Please provide support for this assertion. Please be very precise.”
    It’s simple. The assertion is:
    “Complex functional structures cannot be deconstructed into simple naturally selectable steps. It’s as simple as that”.
    We have tons of examples of complex functional structures. In machines, in software, in language, in proteins and other biological machines. In all examples of complex functional information, the global function is the result of an intelligent aggregation of bits of information which, in themselves, cannot give any incremental function. No complex function can be deconstructed into a simple sequence of transformations, each of them simple enough to be generated by random variation, each of them conferring higher functionality in a linear sequence. That is as true of software as it is true of proteins. There is no rule of logic which says that I can build complex functions by aggregating simple steps which are likely enough to be generated randomly.

    As I thought, this is an argument by analogy that has been demolished previously, including in Kitzmiller.

    Please, show any deconstruction of that kind for one single complex protein. Then show that there are reasons to believe that such a (non existent) case can be the general case. Then we can discuss NS as a scientific theory supported by facts.
    Examples exist that model how error-prone replication combined iteratively with selection can achieve results that ID-proponents claim cannot be achieved within the lifetime of the universe. Some rudimentary ones have been discussed on UD, but the discussions here shed more heat than light, sadly.
    To what are you referring? And I notice the “rudimentary” in your statement. Telling, isn’t it?

    Well, the “rudimentary” one was Weasel. I agree that it is “telling” that people at UD cannot even understand a simple toy example. I found the discussion
    http://theskepticalzone.com/wp/?p=576
    more interesting…

    [snip]
    I have given my definitions. In a discussion with me, please stick to them (or criticize them).

    As you noted above, a lot of people on UD read the word “random” and think that it means “uniform probability distribution”. I would recommend that you use the word “stochastic”, which does not suffer from this risk of misinterpretation. Your definition of “random” leads to a problem when used to describe an outcome, however: it is completely without meaning. All outcomes, all sequences, are the results of stochastic processes.

    DNAJ wrote “And get your colleagues to do the same.”
    I answer for myself. Like many other interlocutors, you have a strange idea of scientific debate. I have no colleagues. Neither should you. We are intelligent (I hope 🙂 ) people, sharing some ideas and not others. And we discuss. This is not a political party, or a fight between football fans.

    I too answer for myself, and I recognize that that is all anyone can do. However, a key aspect of scientific debate is the willingness to attack the arguments of people with whom you concur with, if anything, greater aggression than you attack the arguments of those you disagree with.
    My perception may be biased, but at UD I observe a reluctance to ‘rock the boat’ by addressing differences between ID advocates. Different posters will make comments that are mutually contradictory, yet there is a strange, unscientific, reluctance to address these discrepancies. In that sense, I am afraid that UD does look like a political party, trying to keep both believers in common descent and YECs under the same big tent.
    I would hold you in even higher esteem if you were more willing to correct other regulars here when they say things that you know make no sense, such as kf’s probability calculations. But I understand the reluctance.

  244. 244
    gpuccio says:

    DNA_Jock:

    1) I can call the process random or stochastic, but my definition remains the same. For all practical purposes, I think we can maintain the difference between systems where the evolution of the system can be described by necessity laws, and systems which are best described by probability distributions. Of course, many systems are mixed, and yet in modeling them we distinguish the two approaches.

    I have not really used “random” as applied to the sequences in my reasoning. For me, the only difference between sequences is whether they can implement some defined function or not. Please, refer to this OP of mine for my definition of functional information:

    http://www.uncommondescent.com.....n-defined/

    Another matter is the presence of some order in a sequence. That is not really relevant to my discussion. However, to sum up, I think we have at least 3 types of strings:

    a) Completely undetermined (what you could call “random”): no special order, and no special complex function which can be defined for the string (simple functions can always be defined). That would describe almost all the strings of a certain length.

    b) Functionally specified: a complex function can be defined for the string, which requires a high number of specific bits of information to be implemented. That is the case for language, software and protein genes, in most cases. These kinds of sequences can be almost identical, in form, to the first type, but they do implement a complex function. Only the functional specification really distinguishes them from the first type.

    c) Ordered strings. They can be specified because of their order, but they can often be explained by some necessity mechanism in the system. Most biological strings are not of this type.

    2) You say:

    “P0m then replicates, creating “P1” whose distribution of mutations IS dependent on function.”

    I don’t understand what you mean. Are you assuming a selection effect?

    3) You say:

    “No, my statement is accurate and relevant. Your statement that “by variations in the sequence, which are independent from the final sequence” is incorrect. The variations in the sequence are stochastic, but they are NOT independent of the final sequence. They are correlated with the final sequence. As I said, it is a subtle point. I will try to explain: imagine a stochastic process A that leads to stochastic outcomes B. We can say “B is affected by A”, “A is not influenced by B”, but we CANNOT say “A is independent of B”, because the two things are in fact correlated. The direction of the causation arrow does not matter, as far as the correlation is concerned.”

    I am not sure I understand what you mean, but I will try to explain better what I mean.

    RV can be modeled, but we cannot include a specific function in that modeling. IOWs, the probability distribution of the random events which can take place is not biased towards any specific function. Do you agree on that? And if you agree, then what is your point? Remember, I am not including NS at this level of the reasoning. That statement is about RV alone.

    4) You say:

    “This is a root cause of our disagreement; I think we disagree about inference to best explanation and burdens of proof. I will note that your reference to faith in magic fairies strikes me as the sort of taunt a “football fan” would make, and it disappoints me.”

    Sorry to disappoint you, but maybe I am a “football fan” of my philosophy of science and epistemology. One thing is respecting your views and being open to discussion about them. Another thing is agreeing with your epistemology. I do believe that the current scientific thought is ideologically biased. I cannot pretend that I think differently.

    5) You say:

    “The early returns show that the time-to-no-discernable-similarity is one quarter of the time-to-no-correlation. Thus for every sequence with discernable similarity, there will be three “invisibly related” sequences. I think that this ratio will only get larger as dimensionality increases.
    Take-home: there are lots of “unrelated” sequences that have probabilities above 1/n.”

    I am not sure that I follow all the aspects of your argument, but for the moment I will accept your conclusions. But let me understand better.

    RV in the DNA sequence is not “multidimensional”, as far as I can understand. So, if we reason only at the sequence level, and if we look at sequences as they are, they are either related (at some threshold we can decide) or unrelated. That is the only empirical assessment we can make.

    Your reasoning only tries to show that it is possible that some sequences are nearer in the phase space than they appear. But if we cannot identify them, how does that help your reasoning?

    The important point is that the variation at sequence level has nothing to do with function, unless you factor selection. The problem is not if a sequence could derive from another (in principle, any sequence can derive from another). The problem is much simpler. It is: if n sequences can derive from A through a random walk, and if the number of sequences which implement function X is only 1:10^500 of n (just to make an example with about 1600 bits of functional information, the value I assume for those famous two sequences of ATP synthase), and if function X emerges at point t1 and was not there before, how can you explain that one of the sequences of the tiny functional space emerges? The “function which is not yet there” cannot in any way “favor” some variations instead of others.
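
    For readers who want to check the conversion between a target-to-search-space ratio and bits used in this example, here is a minimal sketch. It uses the same I = -log2(p) measure quoted in the original post. The function name is hypothetical, and the calculation works in log space because 10^-500 is far below the smallest representable float.

```python
import math


def bits_from_log10_ratio(log10_ratio: float) -> float:
    """Information in bits, -log2(p), for p = 10**(-log10_ratio).

    Computed as log10_ratio * log2(10) to avoid floating-point underflow.
    """
    return log10_ratio * math.log2(10)


bits = bits_from_log10_ratio(500)   # a ratio of 1:10^500

if __name__ == "__main__":
    print(f"1:10^500 corresponds to about {bits:.0f} bits")
```

    A ratio of 1:10^500 comes out at roughly 1661 bits, consistent with the "about 1600 bits" used in the example.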

    You see, I am not interested in showing that protein B is not derived from sequence A. I believe that it derives from A. But I believe that the derivation is designed, guided by the conscious representation of the function X. The simple point is that no random walk will ever generate ATP synthase, generating the 1600 bits which are necessary for it to work. Even if you could demonstrate some “invisible derivation” from an undefined precursor that nobody knows (and I really cannot understand how you can hope to support such a theory empirically), still you have done nothing to explain the specific functional configuration of 1600 bits that accomplishes the function, and which did not exist before, and which has been conserved by negative selection for billions of years.

    6) You say:

    “In my youth, I spent a lot of time thinking about transcription factors. You are assuming that they need to be that long in order to carry out their function. Furthermore, you are assuming that their progenitor had to be that long and complicated to perform the progenitor’s function. If you want to talk about facts, they are against you here.”

    Why? You know as well as I do that the function of a TF is not linked only to the conserved DBDs. TFs act combinatorially at the enhanceosome, interacting one with many others after having been bound to DNA. There can be no doubt that other parts of the molecule are extremely relevant to the regulatory activity, to its combinatorial complexity, and therefore to the final result. And you certainly know that the epigenetic details of that regulatory combinatorial activity may be very different in different species and contexts, even for the same TF. These are facts. I don’t understand your statement.

    7) You say:

    “You have a strange view of Science. Working hypotheses may be short on supporting facts today. The key thing is that they make testable predictions.”

    You have a strange view of science. Hypotheses are born from observed facts. That’s what they try to explain. I have no reason to understate the importance of predictions. But theories are born from existing facts, and when possible confirmed by predictions about new facts which may be observed after. However, any fact, after having been observed, becomes an observed fact. Therefore, in the end, theories are supported only by observed facts.

    I know very well that neo darwinism is “short on supporting facts”. Very short, I would say. That’s exactly its problem. And if it were good at making “testable predictions” (possibly not self-referential), then it would be no more “short”. How can a prediction be “tested”, if not by observed facts?

    8) You say:

    “As I thought, this is an argument by analogy that has been demolished previously, including in Kitzmiller.”

    I don’t agree. It’s an argument by analogy, like many of the best arguments in science and in human cognition. But I don’t think it has ever been “demolished”.

    Maybe if you give the details of the demolition argument, we can discuss it.

    9) You say:

    “I agree that it is “telling” that people at UD cannot even understand a simple toy example.”

    I don’t know why you link that post by Elizabeth. I have discussed it in detail some time ago. I can do that again, if you like. It’s a good example of bad reasoning.

    10) You say:

    “As you noted above, a lot of people on UD read the word “random” and think that it means “uniform probability distribution”. I would recommend that you use the word “stochastic”, which does not suffer from this risk of misinterpretation.”

    The misinterpretation derives only from an incorrect understanding of probability theory. There is no reason to change terms for that.

    11) You say:

    “Your definition of “random” leads to a problem when used to describe an outcome, however: it is completely without meaning. All outcomes, all sequences, are the results of stochastic processes.”

    What do you mean? Are you saying that the result of a computer computation which gives a specific mathematical outcome to a specific mathematical problem (even a simple one, like 2+2) is not different from the results of coin tossing? Is 4, as the result of the computation of 2+2, as “stochastic” as the sequence 0010111010 (let’s say derived from fair coin tossing)? Is that your point? Please, clarify.

    12) You say:

    “I too answer for myself, and I recognize that that is all anyone can do. However, a key aspect of scientific debate is the willingness to attack the arguments of people with whom you concur with, if anything, greater aggression than you attack the arguments of those with whom you disagree.
    My perception may be biased, but at UD I observe a reluctance to ‘rock the boat’ by addressing differences between ID advocates. Different posters will make comments that are mutually contradictory, yet there is a strange, unscientific, reluctance to address these discrepancies. In that sense, I am afraid that UD does look like a political party, trying to keep both believers in common descent and YECs under the same big tent.
    I would hold you in even higher esteem if you were more willing to correct other regulars here when they say things that you know make no sense , such as kf’s probability calculations. But I understand the reluctance.”

    I will be very clear. I am here to defend a scientific theory (ID) which is for me very important and very true. I enter debate with people, like you, who think differently because that is my purpose here.

    When I think it is necessary, I clarify important differences in what I think in respect to what others in the ID field think. I have many times defended Common Descent against many friends here, for the simple reason that I consider it a very good scientific explanation of facts.

    In the same way, I have many times expressed differences in respect to some ideas of Dembski (a thinker that I deeply respect and love), especially his “latest” definition of specification.

    I respect Behe very much, and practically all that he says, but still I disagree with the final part of TEOE, where he apparently makes a “TE” argument.

    And so on.

    But, certainly, I don’t consider it my duty to attack my fellow IDists whenever I don’t agree with some statement they make. That is not my role, and not the reason why I come here.

    In particular, I have always been clear that I consider any reference to religion and specific religious beliefs extremely out of context in a scientific discussion. That’s why I strictly avoid those aspects in my reasonings. However, it is perfectly fine to discuss those things in more general posts (however, I usually avoid doing that too).

    I am not a YEC, and I don’t approve of Creation Science, but I respect both positions as faith motivated, because I have deep respect for the faith choices of everyone, including atheists and darwinists 🙂 . However, I don’t consider those positions which are explicitly motivated by faith as scientifically acceptable positions.

    At least one of the best supporters of ID here is a YEC. I respect him deeply, and he is perfectly correct in all his ID arguments, without mixing them with other types of arguments.

    That’s all I can say.

  245. 245
    DNA_Jock says:

    Gpuccio,
    I have enjoyed our chat. As I stated at the outset, I think that the origins of large protein folds is one of the most interesting challenges facing the MES.
    Unfortunately, I have yet to read an ID-proponent who avoids the sin of over-stating and over-simplifying their case. This detracts from the force of their arguments.

    Random Minor Points
    1)
    While admitting that it is “not really relevant to my discussion”, you bring up “Functional Specification”. I have yet to see a “Functional Specification” that avoids the Texas Sharp-Shooter problem; Hazen came closer than any IDist.
    2)

    I know very well that neo darwinism is “short on supporting facts”. Very short, I would say. That’s exactly its problem. And if it were good at making “testable predictions” (possibly not self-referential), then it would be no more “short”. How can a prediction be “tested”, if not by observed facts?

    I think we agree that common descent is EXTREMELY well supported. The question is whether intervention is required; here, parsimony applies. MES makes lots of testable predictions; it is subject to disconfirmation daily.

    3)
    Irreducible complexity, per Behe, assumes that the final step in construction of a system is the addition of an element. ‘Nuff said.
    4)

    I don’t know why you link that post by Elizabeth. I have discussed it in detail some time ago. I can do that again, if you like. It’s a good example of bad reasoning.

    Let’s continue that discussion there, then.
    Did you use a different “handle” at TSZ?
    5)

    DNAJ wrote:
    All outcomes, all sequences, are the results of stochastic processes.
    GP:
    Are you saying that the result of a computer computation which gives a specific mathematical outcome to a specific mathematical problem (even a simple one, like 2+2) is not different from the results of coin tossing? 4 as the result of the computation of 2+2 is “stochastic” as the sequence 0010111010 (let’s say derived from a fair coin tossing)? Is that your point? Please, clarify.

    I was talking in the context of DNA or protein sequences. As I said earlier in the quoted post, “For many, many processes the stochastic element is virtually undetectable, and can safely be ignored. Thus we can safely ignore the stochastic effects of QM for almost all biological systems.”
    If you add 2+2 on a computer, the result will be 4 almost all of the time. If you use a parity check, then the risk of error goes down even further. To steal from Orwell: “All processes are stochastic, but some are more stochastic than others.”
    6)

    At least one of the best supporters of ID here is a YEC. I respect him deeply, and he is perfectly correct in all his ID arguments, without mixing them with other types of arguments.

    I am curious, to whom do you refer?

    The meat of the matter
    Where we disagree, AFAICT, is your insistence that RV and NS must be considered separately AND that no NS can act until there is a selective advantage that is a “fact”, meaning it has been demonstrated to be operative (and, you seem to imply, historically accurate?) by evidence that you personally find clear and convincing. I, OTOH, am willing to posit small selective advantages for simpler, poorly optimized polymers, and try to investigate what these rudimentary functionalities might look like.
    And the experimental data on protein evolution supports me here: in particular, Phylos Inc demonstrated that using libraries of sizes of ~ 10^13 (e.g. USP 6,261,804), you could evolve peptides that bound to pretty much ANYTHING. Unfortunately, I can’t get much more specific, but here’s a “statement against interest”: the libraries produced better binders if the random peptide was anchored by an invariant ‘scaffold’. They used fibronectin, but I suspect that a bit of beta sheet at each end of the random peptide would have done the trick. They also had a technical problem in optimizing catalysis, but that limitation would not apply in actual living systems.
    Dimensionality
    I explained to you, with supporting data, why “Take-home : there are lots of “unrelated” sequences that have probabilities above 1/n.”
    You replied, in part:

    RV in the DNA sequence is not “multidimensional”, as far as I can understand .

    Your understanding is lacking. How many different ways are there for a random walk to go from AAAAA to GTTCC? See how there are five dimensions? Of course, no sane person looks at distantly related-or-unrelated proteins and compares the cDNA sequences to determine whether they are related. Everyone, including yourself, looks at the protein sequence. There are 20 possible attribute levels at each position, and the multi-dimensional nature of the space becomes more obvious.
    Thus when you discussed the amazingly unlikely, 20^-83, nature of PDZ in light of McLaughlin, you did not appear to notice that about 90% of PDZ’s 1,577 nearest neighbors retain partial function. So there are 1,419 different final steps that could reach PDZ. To a first approximation, we might estimate that 80% of the sequences that are two steps away would have partial function: 80% of 2,484,564 is 1,987,651 sequences that are only two steps away. Three steps away…
    The point being, in a multi-dimensional space, you have a lot of neighbors.
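The neighbor counts in this comment can be checked with a short sketch. It assumes, as the thread does, an 83-residue PDZ domain and a 20-letter amino-acid alphabet, and it counts only substitutions (no insertions or deletions). The variable names are illustrative, and the 90% partial-function fraction is simply the figure quoted above, not something the code derives.

```python
from math import comb

LENGTH = 83        # PDZ length used in the thread (assumption taken from "20^-83")
ALPHABET = 20      # amino acids per position

subs_per_site = ALPHABET - 1   # 19 alternative residues at each position

# One-step neighbors: change exactly one position.
one_step = LENGTH * subs_per_site

# Two-step paths: choose a first position, then a different second position.
# Each distinct double mutant is reached by two such ordered paths.
two_step_paths = LENGTH * subs_per_site * (LENGTH - 1) * subs_per_site

# Distinct sequences differing at exactly two positions.
two_step_sequences = comb(LENGTH, 2) * subs_per_site ** 2

# Applying the quoted 90% partial-function fraction to the one-step neighbors.
partially_functional_one_step = round(0.9 * one_step)

print(one_step, two_step_paths, two_step_sequences, partially_functional_one_step)
```

Under these assumptions the ordered two-step count comes out near 2.46 million, which matches the corrected figure given in comment 246; the number of distinct double mutants is half of that, because each is reachable in two orders.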

    Modulators of Gene Expression

    To recap our discussion of transcription factors:

    GP:
    Third, individual short modules, very often, while functional in a greater context (the long proteins), would not be naturally selectable by themselves (or, again, we have no evidence for that).
    DNAJ :
    “Well, it could itself bind to DNA, and thus sterically exclude RNA polymerase from binding. Kd for the monomer is much higher, but it still exists. And any dimerization will produce a protein dimer that recognizes a palindrome with far lower Kd, purely as a consequence of C2 symmetry. Call me biased, but I think modulation of transcription is pretty useful.”
    GP:
    Useful is not the same as naturally selectable. And again, where is the evidence? Facts?

    And in parallel

    GP:
    Second…But just think of transcription factors, and all the part of their sequence (usually the longest part) which is not a DBD.
    DNAJ
    “In my youth, I spent a lot of time thinking about transcription factors. You are assuming that they need to be that long in order to carry out their function. Furthermore, you are assuming that their progenitor had to be that long and complicated to perform the progenitor’s function. If you want to talk about facts, they are against you here.”
    GP:
    Why? You know as well as I do that the function of a TF is not linked only to the conserved DBDs. TFs act combinatorially at the enhanceosome, interacting one with many others after having been bound to DNA. There can be no doubt that other parts of the molecule are extremely relevant to the regulatory activity, to its combinatorial complexity, and therefore to the final result. And you certainly know that the epigenetic details of that regulatory combinatorial activity may be very different in different species and contexts, even for the same TF. These are facts. I don’t understand your statement.

    Oh dear. Some TFs interact combinatorially; some genes are subject to epigenetic modulation. I am very familiar with examples of both (although I did get the mechanism of epigenetic regulation almost completely wrong, oops). But we are talking about what is the minimal functionality that could be selectable. A helix-turn-helix motif is enough. Add dimerization to square its effectiveness. Add a short amphipathic helix or a short negatively charged random peptide and you’ve turned a repressor into an activator. Your fallacy is to look at an optimized system, and say “it must have been this good to be selectable”. Some modulators of gene expression are very simple. IMO, the control of transcription in eukaryotes shows all the hallmarks of being a rather inefficient series of kludges cobbled together. If you want to see a system that is REALLY optimized, check out bacteriophage. THEY are the pinnacles of evolution (or of creation, if you prefer). Why should that be the case?

  246. 246
    DNA_Jock says:

    Correction : PDZ has only 2.456 million 2-step neighbors

  247. 247
    Joe says:

    What evidence demonstrates that blind and undirected chemical processes can produce DNA on a world that didn’t have DNA? (hint- there isn’t any)

    What evidence demonstrates that blind and undirected chemical processes can produce transcription and translation on a world that didn’t have it? (hint- there isn’t any)

  248. 248
    Joe says:

    What evidence demonstrates that blind and undirected chemical processes can produce transcription factors on a world that didn’t have any? (hint- there isn’t any)

  249. 249
    kairosfocus says:

    F/N: I have responded on record to recent comments on islands of function and the like, as well as abusive tactics, here. KF

  250. 250
    DNA_Jock says:

    Kf,
    Great to know that you have not forgotten this thread.
    You have already admitted that you assume independence and that this assumption is incorrect (“in info contexts reduce info capacity”), but you have asserted that this error is “not material”.
    How big is the error?
    How do you know?
    Please be as precise and concise as you can.

  251. 251
    Rich says:

    KF, because I think closing comments is an act of censorship, cowardly, and more reflective of “wanting to keep a clean copy” than of wanting a substantive exchange, I clicked through to your post but did not read it. I recommend others do the same until KF stops trying to bully the message and reopens the debate.

    ‘Bydand’ indeed.

  252. 252
    gpuccio says:

    DNA_Jock:

    The discussion continues to be very good. I would very much like to leave to you the last word, but at least a few simple comments are due.

    As for the “Texas Sharp-Shooter problem”, I think that I have discussed that aspect in my posts 146 and 149 here. If you like, you could refer to them.

    The discussion about Elizabeth’s post, if I remember well, was “parallel”: I posted here and my interlocutors posted at TSZ. I have nothing against posting at TSZ (I have done that, or at least posted in similar places, more than once some time ago). However, I decided some time ago to limit my activity to UD: it is already too exacting this way.

    However, my criticism of Lizzie’s argument is very simple: it is an example of intelligent selection applied to random variation. It is of the same type as the Weasel and Szostak’s ATP binding protein.

    You see, I am already well convinced that RV + IS can generate dFSCI. It is the bottom up strategy to engineer things. So, I have no problem with Lizzie’s example, except for its title:

    “Creating CSI with NS”

    That is simply wrong. “Creating CSI with design by IS” would be perfectly fine.

    Your field seems to willfully ignore the difference between NS and IS. It is a huge difference.

    IS requires a conscious intelligent agent who recognizes some function as desirable, sets the context to develop it, can measure it at any desired level, and can intervene in the system to expand any result which shows any degree of the desired function. IOWs, both the definition of the function, the way to measure it, and the interventions to facilitate its emergence are carefully engineered. It’s design all the way.

    On the contrary, NS assumes that some new complex function arises in a system which is not aware of its meaning and possibilities, only because some intermediary steps represent a step to it, and through the selection of the intermediary steps because of one property alone: higher reproductive success.

    So, I ask a simple question: what reproductive success is present in Lizzie’s example? None at all. It’s the designer who selects what he wants to obtain. The property selected has no capability at all to be selected “on its own merits”.

    Therefore, Lizzie’s example has nothing to do with NS.

    I am certain of Lizzie’s good faith. I have great esteem for her. I am equally certain that she is confused about these themes.

    I don’t remember ever having discussed PDZ. However, I understand your discussion about neighbours, and if that’s what you mean by multidimensional I have no problem with that.

    My point remains that the only way to judge if a sequence is a neighbour of another one, in the absence of any other observable, is sequence similarity. I can accept structure similarity as a marker of “neighbourhoodness” even in the absence of detectable sequence similarity. But in the absence of both similarities, I maintain that two sequences should be considered as unrelated. Which does not mean that one cannot be derived from the other. But it means that it is distant from the other, and therefore the probability of reaching it in a random walk is in the range of the lower probability states.

    I am well aware that the target space is not a single sequence, but a set of sequences. That’s what we call “the functional set”, whose probability is approximately measured by the target space/search space ratio. That’s why I usually refer to Durston’s results as a measure of the functional complexity. In the case of ATP synthase, my 1600 bits derive from a consideration only of the AA positions which have been conserved throughout natural history. If I had used Durston’s method, I would have got a higher value of complexity. Again, I am using a lower threshold of functional complexity for that molecule.
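The target space/search space ratio mentioned here can be turned into bits with the log measure Ip = -log2 p quoted in the original post. The sketch below is only an illustration under that assumption: the function names are hypothetical, and the worked number (a single target sequence in a 20^83 space) is the extreme case raised earlier in the thread, not the 1600-bit ATP synthase estimate.

```python
import math


def bits_from_ratio(ratio: float) -> float:
    """Functional complexity in bits, assumed here to be -log2(target/search)."""
    return -math.log2(ratio)


def bits_from_logs(log2_target: float, log2_search: float) -> float:
    """Same quantity computed from log2 sizes, which avoids float underflow
    when the ratio itself is too small to represent."""
    return log2_search - log2_target


# Small case where the ratio is representable: 1 target in 1024 gives 10 bits.
small_case = bits_from_ratio(1 / 1024)

# Extreme case from the thread: one target sequence out of 20^83.
log2_search = 83 * math.log2(20)
single_sequence_bits = bits_from_logs(0.0, log2_search)

# A larger functional set lowers the figure: 2^100 functional sequences
# in the same space (a made-up size, purely for illustration).
larger_set_bits = bits_from_logs(100.0, log2_search)

print(small_case, single_sequence_bits, larger_set_bits)
```

Working with log2 sizes rather than the raw ratio is a design choice for numerical safety; how large the functional set really is remains the empirical question the commenters are disputing.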

    About optimization, do you agree that ATP synthase seems to have been highly optimized already in LUCA?

    About TFs, I respect your idea that “the control of transcription in eukaryotes shows all the hallmarks of being a rather inefficient series of kludges cobbled together”, but I strongly disagree. I think it shows all the hallmarks of being an extremely efficient combinatorial system of which we still understand almost nothing.

    Different scientific epistemologies often entail different interpretations. We will see, in time, who is right.

  253. 253
    DNA_Jock says:

    gpuccio,

    Given our shared interest in the evolution of ‘novel’ protein folds, I thought you might find this recent paper interesting:
    Large-scale determination of previously unsolved protein structures using evolutionary information
    This work demonstrates that there are many proteins that lack any homology that is detectable at the simple protein-sequence-alignment level, but in fact have similar 3D structures. Also, once you start thinking about protein evolution in the context of co-evolution of distant residues that make contacts, the multi-dimensional search space is not so sparse as one might think and it is, in fact, rather well interconnected.
    Enjoy.
