
On The Calculation Of CSI

My thanks to Jonathan M. for passing on my suggestion for a CSI thread, and a very special thanks to Denyse O’Leary for inviting me to offer a guest post.

[This post has been advanced to enable a continued discussion on a vital issue. Other newer stories are posted below. – O’Leary ]

In the abstract of Specification: The Pattern That Signifies Intelligence, William Dembski asks “Can objects, even if nothing is known about how they arose, exhibit features that reliably signal the action of an intelligent cause?” Many ID proponents answer this question emphatically in the affirmative, claiming that Complex Specified Information is a metric that clearly indicates intelligent agency.

As someone with a strong interest in computational biology, evolutionary algorithms, and genetic programming, this strikes me as the most readily testable claim made by ID proponents. For some time I’ve been trying to learn enough about CSI to be able to measure it objectively and to determine whether or not known evolutionary mechanisms are capable of generating it. Unfortunately, what I’ve found is quite a bit of confusion about the details of CSI, even among its strongest advocates.

My first detailed discussion was with UD regular gpuccio, in a series of four threads hosted by Mark Frank. While we didn’t come to any resolution, we did cover a number of details that might be of interest to others following the topic.

CSI came up again in a recent thread here on UD. I asked the participants there to assist me in better understanding CSI by providing a rigorous mathematical definition and showing how to calculate it for four scenarios:

  1. A simple gene duplication, without subsequent modification, that increases production of a particular protein from less than X to greater than X. The specification of this scenario is “Produces at least X amount of protein Y.” (A toy sketch of the bit accounting involved in such a duplication appears just after this list.)
  2. Tom Schneider’s ev uses only simplified forms of known, observed evolutionary mechanisms to evolve genomes that meet the specification of “A nucleotide that binds to exactly N sites within the genome.” The length of the genome required to meet this specification can be quite long, depending on the value of N. (ev is particularly interesting because it is based directly on Schneider’s PhD work with real biological organisms.)
  3. Tom Ray’s Tierra routinely results in digital organisms with a number of specifications. One I find interesting is “Acts as a parasite on other digital organisms in the simulation.” The shortest parasite is at least 22 bytes long, but it takes thousands of generations to evolve.
  4. The various Steiner Problem solutions from a programming challenge a few years ago have genomes that can easily be hundreds of bits. The specification for these genomes is “Computes a close approximation to the shortest connected path between a set of points.”
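To make the bit accounting in these scenarios concrete, here is a toy sketch (my own illustration, not an official CSI calculator) of the naive information-capacity bookkeeping behind the oft-cited 500-bit threshold, assuming the common convention of log2(4) = 2 bits per nucleotide:

```python
# Toy illustration: naive information *capacity* of a genome before and
# after a tandem duplication, at 2 bits per A/C/G/T position. This is
# storage capacity, not CSI; whether CSI increases is what scenario 1 asks.
import random

def capacity_bits(seq: str) -> int:
    return 2 * len(seq)  # log2(4) bits of capacity per nucleotide

random.seed(1)
gene = "".join(random.choice("ACGT") for _ in range(300))
genome_before = gene
genome_after = gene + gene  # simple duplication, no subsequent modification

print(capacity_bits(genome_before))  # 600 bits
print(capacity_bits(genome_after))   # 1200 bits
```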

vjtorley very kindly and forthrightly addressed the first scenario in detail. His conclusion is:

I therefore conclude that CSI is not a useful way to compare the complexity of a genome containing a duplicated gene to the original genome, because the extra bases are added in a single copying event, which is governed by a process (duplication) which takes place in an orderly fashion, when it occurs.

In that same thread, at least one other ID proponent agrees that known evolutionary mechanisms can generate CSI. At least two others disagree.

I hope we can resolve the issues in this thread. My goal is still to understand CSI in sufficient detail to be able to objectively measure it in both biological systems and digital models of those systems. To that end, I hope some ID proponents will be willing to answer some questions and provide some information:

  1. Do you agree with vjtorley’s calculation of CSI?
  2. Do you agree with his conclusion that CSI can be generated by known evolutionary mechanisms (gene duplication, in this case)?
  3. If you disagree with either, please show an equally detailed calculation so that I can understand how you compute CSI in that scenario.
  4. If your definition of CSI is different from that used by vjtorley, please provide a mathematically rigorous definition of your version of CSI.
  5. In addition to the gene duplication example, please show how to calculate CSI using your definition for the other three scenarios I’ve described.

Discussion of the general topic of CSI is, of course, interesting, but calculations at least as detailed as those provided by vjtorley are essential to eliminating ambiguity. Please show your work supporting any claims.

Thank you in advance for helping me understand CSI. Let’s do some math!

Comments
tgpeeler,
mg @ 390 “You’re confusing the map with the territory. Just because it is possible to model some aspects of biological systems as “language” doesn’t mean that you can logically conclude an intelligence behind that language. That’s the fallacy of equivocation.” Hardly. Are you now saying that there is no such thing as REAL biological information? Is that what I’m hearing?
I said nothing about "biological information". I simply made the point that modeling some aspects of biological systems as a "language" doesn't mean you can then equivocate on the concepts underlying the term in order to define your intelligent agent into existence.
MathGrrl
March 29, 2011, 03:08 PM PDT
#401 tgpeeler
It’s a short answer. Maybe that’s why you missed it last time. The base pairs symbolize LIFE.
They all symbolise the same thing! Then every DNA string symbolises the same thing and is equivalent to every other one. This is indeed a radical theory.
markf
March 29, 2011, 02:40 PM PDT
mg @ 392 "There are a number of information theorists who would beg to differ." I'm sure there are. They'd be wrong, too.

I will say again, boldly, if I may, that it is IMPOSSIBLE to account for the phenomenon of information in terms of the laws of physics. Why do I say this? Because symbols, rules (language), rationality (logic), free will (freely assembling symbols according to the aforementioned language specific rules and laws of rational thought), and intentionality (for a reason, to communicate a message) are ALL necessary for human information/communication. Materialism or naturalism or physicalism, whatever ilk your particular version is, all fail to account for human language and information because the only "tool" they have to explain anything and everything are the laws of physics (embarrassingly, they happen to be immaterial but I'll leave that alone for now). Therefore, as metaphysical projects, all of these "isms" utterly fail.

The "assumption" that the natural or material or physical is all there is is clearly and obviously false. Whatever incarnation of the naturalistic story of life is currently being discussed is just false. They are all false. It is impossible for any naturalistic account of life to be true. Anybody who can string a couple of thoughts together should be able to track this with no problem. I know this includes you.

So if I'm wrong, tell me how I'm wrong. Then I'll change my mind. But until you bring an actual argument to rebut the argument I'm making I'm afraid I will remain unmoved in my opinion that trying to explain information of any kind without symbols and rules is sheer lunacy.

If you would STOP and THINK about this for a moment before dashing off a dismissal of one kind or another, you would see that I am correct about this. Analyze your own posts. Do they not obey (generally) the laws of reason? Yes. You make use of the law of identity. Do you not freely use English symbols, arranged according to arbitrary convention, to purposefully communicate a message? Yes, you do.

I get it that this is a bold claim. Perhaps even grandiose. But that doesn't make it any less true. The materialist project is defeated. That game is over. You can restart it by communicating something without using a language. Good luck with that.
tgpeeler
March 29, 2011, 02:00 PM PDT
mg @ 390 "You’re confusing the map with the territory. Just because it is possible to model some aspects of biological systems as “language” doesn’t mean that you can logically conclude an intelligence behind that language. That’s the fallacy of equivocation." Hardly. Are you now saying that there is no such thing as REAL biological information? Is that what I'm hearing?
tgpeeler
March 29, 2011, 01:25 PM PDT
mf @ 381 "So I then asked what do the symbols in DNA (presumably the base pairs) symbolise? Actually I have asked this three times now." It's a short answer. Maybe that's why you missed it last time. The base pairs symbolize LIFE.
tgpeeler
March 29, 2011, 01:11 PM PDT
As for the EF, answer the question: do you think scientists flip a coin or throw darts? Or do you think they have a methodology? JR:
Perhaps they use the same methodology as is used in the calculation of CSI?
That doesn't make any sense. Is nonsense the best you have to offer? JR:
I.E. they don’t.
They don't what? They don't have a methodology? They don't need to eliminate necessity and chance before reaching a design inference? Can you do anything more than babble incoherently?
Joseph
March 29, 2011, 12:08 PM PDT
Muramasa:
Oh, and Joseph, can you tell me who said: “I’ve pretty much dispensed with the EF. It suggests that chance, necessity, and design are mutually exclusive. They are not. Straight CSI is clearer as a criterion for design detection.”
I know who said it. The SAME person who said:
In an off-hand comment in a thread on this blog I remarked that I was dispensing with the Explanatory Filter in favor of just going with straight-up specified complexity. On further reflection, I think the Explanatory Filter ranks among the most brilliant inventions of all time (right up there with sliced bread). I’m herewith reinstating it — it will appear, without reservation or hesitation, in all my future work on design detection. (see here)
Yeah baby...
Joseph
March 29, 2011, 11:59 AM PDT
MathGrrl:
Gene duplication involves far more than the 250 base pairs required to meet your 500 bit limit.
Except gene duplications don't have any place at the OoL table. And CSI relates to origins. Not only that, there isn't any mathematically rigorous definition of gene duplication that demonstrates it is a blind watchmaker process. As for a definition of CSI: Information is taken care of by Shannon, mathematical rigor and all. Specified Information is Shannon information with meaning/function. Complex Specified Information is Specified Information of 500 bits or more. As for GAs creating CSI: GAs create what they are designed to create. All the SI is there for them as a resource. Show me a GA that arose without a designing agency and you will have something. But then again MathGrrl refuses to read "No Free Lunch" so this is all a waste of bandwidth. Although I am sure even if she read it she would still have these hollow criticisms.
Joseph
March 29, 2011, 11:50 AM PDT
Why CSI is difficult to calculate, even though it is real: An elegant program is the shortest program that produces a given output. In other words, a program P is elegant if no program shorter than P produces the same output as P. NOTE: "Program" here means code plus input data. Every elegant program produces a single, unique (but possibly infinite) output. Some obvious but important consequences of this definition are 1) every output from an elegant program is computable and 2) there are an infinite number of elegant programs, one for each possible computable output. Theorem (Chaitin): It is not possible in general to determine whether or not a given program is elegant.
Albert Voie
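To see informally why Chaitin's theorem holds, consider the obvious brute-force elegance test, sketched below (a naive illustration assuming "program" means a Python expression; this is not code from the comment above). The search must actually run every shorter candidate, and some candidates never halt, so no general decision procedure exists:

```python
# Naive sketch: test whether any "program" shorter than `bound` characters
# produces target_output. The halting problem is the obstacle: eval() may
# never return on some candidates (e.g. huge exponent towers), and any
# timeout risks wrongly rejecting a slow-but-halting shorter program.
from itertools import product
import string

def shorter_program_exists(target_output: str, bound: int) -> bool:
    for n in range(1, bound):
        for chars in product(string.printable, repeat=n):
            candidate = "".join(chars)
            try:
                if str(eval(candidate)) == target_output:  # may never halt!
                    return True
            except Exception:
                continue  # most candidates are not valid expressions
    return False
```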
March 29, 2011, 11:25 AM PDT
Tomato Addict,
MathGrrl writes at #358:
My gut instinct is that, if CSI can be rigorously defined and objectively calculated, GAs will prove to be capable of generating it. I’d very much like to test that instinct (and ID claims at the same time).
This would seem to be a reasonable goal for all interested parties. If this is testable, it should be tested!
Thank you! I'm glad that others find my request reasonable.
MathGrrl
March 29, 2011, 06:41 AM PDT
PaV, Your football field analogy does not reflect this situation in the slightest.
When, by definition, a bit sequence has to be at least 500 bits long to rank as CSI, and someone wants you to rigorously define CSI for a bit-string (sequence) that is 260 bits long, what would you make of it?
Gene duplication involves far more than the 250 base pairs required to meet your 500 bit limit. I have already provided a link to Schneider using ev in excess of that limit. It is also easy to come up with a Steiner problem that requires more than a 500 bit genome to solve. Please provide a mathematically rigorous definition of CSI and apply it to those scenarios so that others can perform similar calculations.
MathGrrl
March 29, 2011, 06:41 AM PDT
CJYman, Nice to see you in this thread!
I do understand that MathGrrl would like to have CSI calculated for the scenarios that she has provided, and in principle that’s great, but as PaV has noted above if she can’t figure out the examples already given her, then there is no reason to suspect that if we take the time to go through her provided scenarios she would understand the calculations any better.
I hardly think that's fair, based on our interaction in the last thread. There are a number of questions around your calculation of CSI for titin that remain unanswered. It also appears that your definition of CSI isn't the same as that discussed by Dembski.
It would be better if she understood first what we’ve already tried to explain and calculate for her and then she can work on her own examples herself and ask us what we think about her calculations.
If someone would be kind enough to define CSI with mathematical rigor and show me, in detail, how to calculate it for the four scenarios I described, I will be more than happy to implement the metric in software and share the results I get from other scenarios I have in mind. Are you or are you not willing and able to do this?
MathGrrl
March 29, 2011, 06:40 AM PDT
PaV,
My claim is this: Dembski’s book, NFL, contains a rigorous, mathematical description/definition of CSI.
The amount of confusion regarding the definition of CSI, even among ID proponents on this very thread, demonstrates that Dembski's discussion is insufficiently rigorous to allow the objective, unambiguous calculation of CSI.
MathGrrl doesn’t want the definition contained there, she wants a worked-out example of CSI. Very different things.
If you have a mathematically rigorous definition of a metric that is claimed to be calculable, providing example calculations should not be a problem. The rest of your long post contains neither a rigorous definition of CSI nor example calculations for the scenarios I described. After repeatedly asking these straightforward questions, I believe I am justified in provisionally concluding that you are unable to answer them.
MathGrrl
March 29, 2011, 06:39 AM PDT
tgpeeler,
I am saying, per my previous post, and interminable posts prior to this on other threads, that it is impossible, IN PRINCIPLE, i.e. it is logically impossible, to explain information in terms of algorithms....
There are a number of information theorists who would beg to differ.
MathGrrl
March 29, 2011, 06:38 AM PDT
kairosfocus, Thank you for your copious responses. I appreciate the time you have taken to participate in this thread. My focus here, though, is to get quantitative answers to the questions I posed in my original post. I'm sure that we will have the opportunity to discuss your thoughts in other threads and I will, of course, be happy to re-engage with you on this one if you choose to provide the definition and calculations I've requested. PS: Durston's metric is not the same as Dembski's. It might be interesting to discuss, but it isn't applicable to my questions.
MathGrrl
March 29, 2011, 06:38 AM PDT
tgpeeler,
The reason that we “are very concerned” that mg has not addressed the presence of symbols is that the presence of symbols destroys the materialist project. In other words, it’s not even possible for you to be right.
You're confusing the map with the territory. Just because it is possible to model some aspects of biological systems as "language" doesn't mean that you can logically conclude an intelligence behind that language. That's the fallacy of equivocation.
MathGrrl
March 29, 2011, 06:37 AM PDT
F/N 3: Excerpting the Durston et al paper: ____________ >> It is known that the variability of data can be measured using Shannon uncertainty [16]. However, Shannon's original formulation when applied to biological sequences does not express variations related to biological functionality such as metabolic utility. Shannon uncertainty, however, can be extended to measure the joint variable (X, F), where X represents the variability of data, and F functionality. This explicitly incorporates empirical knowledge of metabolic function into the measure that is usually important for evaluating sequence complexity. This measure of both the observed data and a conceptual variable of function jointly can be called Functional Uncertainty (Hf) [17], and is defined by the equation: H(Xf(t)) = -∑ P(Xf(t)) log P(Xf(t))   (1) where Xf denotes the conditional variable of the given sequence data (X) on the described biological function f which is an outcome of the variable (F). For example, a set of 2,442 aligned sequences of proteins belonging to the ubiquitin protein family (used in the experiment later) can be assumed to satisfy the same specified function f, where f might represent the known 3-D structure of the ubiquitin protein family, or some other function common to ubiquitin. The entire set of aligned sequences that satisfies that function, therefore, constitutes the outcomes of Xf. Here, functionality relates to the whole protein family which can be inputted from a database. The advantage of using H(Xf(t)) is that changes in the functionality characteristics can be incorporated and analyzed. Furthermore, the data can be a single monomer, or a biosequence, or an entire set of aligned sequences all having the same common function . . . . The change in functional uncertainty (denoted as ΔHf) between two states can be defined as ΔH(Xg(ti), Xf(tj)) = H(Xg(tj)) - H(Xf(ti))   (2) where Xf(ti) and Xg(tj) can be applied to the same sequence at two different times or to two different sequences at the same time. ΔHf can then quantify the change in functional uncertainty between two biopolymeric states with regard to biological functionality. Unrelated biomolecules with the same function or the same sequence evolving a new or additional function through genetic drift can be compared and analyzed. A measure of ΔHf can increase, decrease, or remain unchanged . . . . The ground state g (an outcome of F) of a system is the state of presumed highest uncertainty (not necessarily equally probable) permitted by the constraints of the physical system, when no specified biological function is required or present. Certain physical systems may constrain the number of options in the ground state so that not all possible sequences are equally probable [27] . . . . The null state, a possible outcome of F denoted as ø, is defined here as a special case of the ground state of highest uncertainty when the physical system imposes no constraints at all, resulting in the equi-probability of all possible sequences or options. Such sequencing has been called "dynamically inert, dynamically decoupled, or dynamically incoherent" [28,29]. For example, the ground state of a 300 amino acid protein family can be represented by a completely random 300 amino acid sequence . . . . 
The change in functional uncertainty from the null state is, therefore, ΔH(Xø(ti), Xf(tj)) = log(W) - H(Xf(ti)).   (5) Physical constraints increase order and change the ground state away from the null state, restricting freedom of selection and reducing functional sequencing possibilities, as mentioned earlier. The genetic code, for example, makes the synthesis and use of certain amino acids more probable than others, which could influence the ground state for proteins. However, for proteins, the data indicates that, although amino acids may naturally form a nonrandom sequence when polymerized in a dilute solution of amino acids [30], actual dipeptide frequencies and single nucleotide frequencies in proteins are closer to random than ordered [31]. For this reason, the ground state for biosequences can be approximated by the null state. The value for the measured FSC of protein motifs can be calculated by relating the joint (X, F) pattern to a stochastic ensemble, the null state in the case of biopolymers that includes any random string from the sequence space . . . . The measure of Functional Sequence Complexity, denoted as ζ, is defined as the change in functional uncertainty from the ground state H(Xg(ti)) to the functional state H(Xf(ti)), or ζ = ΔH(Xg(ti), Xf(tj)).   (6) The resulting unit of measure is defined on the joint data and functionality variable, which we call Fits (or Functional bits). The unit Fit thus defined is related to the intuitive concept of functional information, including genetic instruction and, thus, provides an important distinction between functional information and Shannon information [6,32]. Eqn. (6) describes a measure to calculate the functional information of the whole molecule, that is, with respect to the functionality of the protein considered. The functionality of the protein can be known and is consistent with the whole protein family, given as inputs from the database. >> ______________ Durston et al go on in much more detail, and give a table of values of FSC for 35 protein families [and of course FSC is functionally specified complexity of sequence]. Given here: http://www.tbiomed.com/content/4/1/47/table/T1 All, in the peer reviewed literature:
Kirk K. Durston, David K.Y. Chiu, David L. Abel and Jack T. Trevors, "Measuring the functional sequence complexity of proteins," Theoretical Biology and Medical Modelling 2007, 4:47. doi:10.1186/1742-4682-4-47
But of course, the results that passed peer review and were published in a significant journal, years ago, and as repeatedly linked or mentioned in previous threads and as linked from the WACs, are meaningless, and are not sufficiently mathematically defined to be of significance. Etc, etc. NOT!
kairosfocus
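For readers who want to see the mechanics, here is a minimal sketch of the column-wise fits calculation described in the excerpt, approximating the ground state by the null state as the paper suggests (a toy reconstruction of Eqn. (6), not Durston et al's own code; the alignment below is invented for illustration):

```python
# Toy Durston-style FSC ("fits") sketch: sum over aligned columns of
# log2(W) - H(Xf), where W is the alphabet size (null state) and H(Xf)
# is the observed Shannon uncertainty of that column.
from collections import Counter
from math import log2

def fits(alignment: list[str], alphabet_size: int = 20) -> float:
    total = 0.0
    for column in zip(*alignment):             # one column per residue site
        counts = Counter(column)
        n = len(column)
        h_f = -sum((c / n) * log2(c / n) for c in counts.values())
        total += log2(alphabet_size) - h_f     # null-state H minus functional H
    return total

toy_family = ["MKVL", "MKIL", "MRVL"]  # three aligned 4-residue sequences
print(round(fits(toy_family), 2))      # approx 15.45 fits for this toy family
```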
March 29, 2011, 04:34 AM PDT
F/N 2: For reference, from the UD weak argument correctives top right this and every UD page for some years now: _________________ >> 25] Intelligent Design proponents deny, without having a reason, that randomness can produce an effect, and then go make something up to fill the void ID proponents do not deny that “randomness can produce an effect.” For instance, consider the law-like regularity that unsupported heavy objects tend to fall. It is reliable; i.e. we have a mechanical necessity at work — gravity. Now, let our falling heavy object be a die. When it falls, it tumbles and comes to rest with any one of six faces uppermost: i.e. high contingency. But, as the gaming houses of Las Vegas know, that contingency can be (a) effectively undirected (random chance), or (b) it can also be intelligently directed (design). Also, such highly contingent objects can be used to store information, which can be used to carry out functions in a given situation. For example we could make up a code and use trays of dice to implement a six-state digital information storing, transmission and processing system. Similarly, the ASCII text for this web page is based on electronic binary digits clustered in 128-state alphanumeric characters. In principle, random chance could produce any such message, but the islands of functional messages will as a rule be very isolated in the sea of non-functional, arbitrary strings of digits, making it very hard to find functional strings by chance. ID thinkers have therefore identified means to test for objects, events or situations that are credibly beyond the reach of chance on the gamut of our observed cosmos. (For simple example, as a rule of thumb, once an entity requires more than about 500 – 1,000 bits of information storage capacity to carry out its core functions, the random walk search resources of the whole observed universe acting for its lifetime will probably not be adequate to get to the functional strings: trying to find a needle in a haystack by chance, on steroids.) Now, DNA for instance, is based on four-state strings of bases [A/C/G/T], and a reasonable estimate for the minimum required for the origin of life is 300,000 – 500,000 bases, or 600 kilo bits to a million bits. The configuration space that even just the lower end requires has about 9.94 * 10^180,617 possible states. So, even though it is in principle possible for such a molecule to happen by chance, the odds are not practically different from zero. But, intelligent designers routinely create information storage and processing systems that use millions or billions of bits of such storage capacity. Thus, intelligence can routinely do that which is in principle logically possible for random chance, but which would easily empirically exhaust the probabilistic resources of the observed universe. That is why design thinkers hold that complex, specified information (CSI), per massive observation, is an empirically observable, reliable sign of design. 26] Dembski’s idea of “complex specified information” is nonsense First of all, the concept of complex specified information (CSI) was not originated by Dembski. For, as origin of life researchers tried to understand the molecular structures of life in the 1970s, Orgel summed up their findings thusly: Living organisms are distinguished by their specified complexity. Crystals fail to qualify as living because they lack complexity; mixtures of random polymers fail to qualify because they lack specificity. [L.E. Orgel, 1973. The Origins of Life. 
New York: John Wiley, p. 189. Emphases added.] In short, the concept of complex specified information helped these investigators understand the difference between (a) the highly informational, highly contingent functional macromolecules of life and (b) crystals formed through forces of mechanical necessity, or (c) random polymer strings. In so doing, they identified a very familiar concept — at least to those of us with hardware or software engineering design and development or troubleshooting experience and knowledge. Namely, complex, specified information, shown in the mutually adapted organization, interfacing and integration of components in systems that depend on properly interacting parts to fulfill objectively observable functions. For that matter, this is exactly the same concept that we see in textual information as expressed in words, sentences and paragraphs in a real-world language. Furthermore, on massive experience, such CSI reliably points to intelligent design when we see it in cases where we independently know the origin story. What Dembski did with the CSI concept in the following two decades was to: (i) recognize CSI’s significance as a reliable, empirically observable sign of intelligence, (ii) point out the general applicability of the concept, and (iii) provide a probability and information theory based explicitly formal model for quantifying CSI. 27] The Information in Complex Specified Information (CSI) Cannot Be Quantified That’s simply not true. Different approaches have been suggested for that, and different definitions of what can be measured are possible. As a first step, it is possible to measure the number of bits used to store any functionally specific information, and we could term such bits “functionally specific bits.” Next, the complexity of a functionally specified unit of information (like a functional protein) could be measured directly or indirectly based on the reasonable probability of finding such a sequence through a random walk based search or its functional equivalent. This approach is based on the observation that functionality of information is rather specific to a given context, so if the islands of function are sufficiently sparse in the wider search space of all possible sequences, beyond a certain scope of search, it becomes implausible that such a search on a planet wide scale or even on a scale comparable to our observed cosmos, will find it. But, we know that, routinely, intelligent actors create such functionally specific complex information; e.g. this paragraph. (And, we may contrast (i) a “typical” random alphanumeric character string showing random sequence complexity: kbnvusgwpsvbcvfel;’.. jiw[w;xb xqg[l;am . . . and/or (ii) a structured string showing orderly sequence complexity: atatatatatatatatatatatatatat . . . [The contrast also shows that a designed, complex specified object may also incorporate random and simply ordered components or aspects.]) Another empirical approach to measuring functional information in proteins has been suggested by Durston, Chiu, Abel and Trevors in their paper “Measuring the functional sequence complexity of proteins”, and is based on an application of Shannon’s H (that is “average” or “expected” information communicated per symbol: H(Xf(t)) = -∑ P(Xf(t)) log P(Xf(t)) ) to known protein sequences in different species. A more general approach to the definition and quantification of CSI can be found in a 2005 paper by Dembski: “Specification: The Pattern That Signifies Intelligence”. 
For instance, on pp. 17 – 24, he argues: define φS as . . . the number of patterns for which [agent] S’s semiotic description of them is at least as simple as S’s semiotic description of [a pattern or target zone] T. [26] . . . . where M is the number of semiotic agents [S's] that within a context of inquiry might also be witnessing events and N is the number of opportunities for such events to happen . . . . [where also] computer scientist Seth Lloyd has shown that 10^120 constitutes the maximal number of bit operations that the known, observable universe could have performed throughout its entire multi-billion year history.[31] . . . [Then] for any context of inquiry in which S might be endeavoring to determine whether an event that conforms to a pattern T happened by chance, M·N will be bounded above by 10^120. We thus define the specified complexity [χ] of T given [chance hypothesis] H [in bits] . . . as [the negative base-2 logarithm of the conditional probability P(T|H) multiplied by the number of similar cases φS(T) and also by the maximum number of binary search-events in our observed universe 10^120] χ = –log2[10^120 · φS(T) · P(T|H)]. To illustrate consider a hand of 13 cards with all spades, which is unique. 52 cards may have 635 × 10^9 possible combinations, giving odds of 1 in 635 billion as P(T|H). Also, there are four similar all-of-one-suit hands, so φS(T) = 4. Calculation yields χ = –361, i.e. far below the threshold, so even this special hand does not qualify as CSI. >> __________________ In short, the above is largely a revisiting of old ground, to no different outcome than previously.
kairosfocus
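The card-hand arithmetic quoted in that corrective is easy to check; a short sketch of the χ formula as excerpted above (my own check of the quoted numbers, nothing more):

```python
# Check of the WAC's worked example: chi = -log2(10^120 * phiS(T) * P(T|H))
# for a dealt 13-card all-spades hand.
from math import comb, log2

hands = comb(52, 13)        # 635,013,559,600 possible 13-card hands
p_t_given_h = 1 / hands     # probability of one particular hand
phi_s = 4                   # four "all of one suit" hands
chi = -log2(10**120 * phi_s * p_t_given_h)
print(round(chi))           # -361: far below the bound, so not CSI here
```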
March 29, 2011, 04:13 AM PDT
F/N: The onward thread, sadly predictably, has been largely unproductive. I will comment on a few points of note, as it is plainly winding down to an impasse, as I predicted/expected from my very first for the record remark: 1 --> Meaningfulness of specified complexity and associated information, especially functionally specific complex organisation and associated information is a longstanding reality acknowledged by Orgel and Wicken from the 1970's in the technical literature on OOL and related areas. 2 --> At no point above has MG found herself able to acknowledge this. 3 --> All else in the above impasse follows from this, as the objective reality of functionally specific complexity is antecedent to mathematical models, analyses and metrics. 4 --> The what comes before the how much, as length comes before defining the metre as how far light goes in about 3 ns, or previously, the distance taken up by a certain number of wavelengths of a certain spectral line from a certain isotope of Cs; and before that, the distance between two marks on a certain bar of Pt alloy kept in France, and before that, a certain fraction of the distance from pole to equator through Paris. 5 --> You will note that I have consistently put up a simple, brute-force X-metric for FSCI (the most relevant subset of CSI: the specification is based on observed function), X = C*S*B, based on the Shannon-Hartley quantification of information, as is commonly used in the form of functional bits. 6 --> I have only added the Semiotic Agent/Observer who judges function and complexity . . . comparable to how an observer in the end has to make a judgement for us to measure a length, e.g. alignment with marks on a metre rule. 7 --> I have given cases and calculations, and I have applied it to the case of gene duplication as proposed by MG [exposing the implied complex regulatory processes being glided over question-beggingly]. 8 --> All, only to be dismissed without analysis, as giving only a flood of words without calculations and backdrop analysis [note how EVERY post I have ever made at UD links through my handle to a briefing note that ties my thoughts to the relevant information theory and thermodynamics and related factors]. 9 --> So, MG is either refusing to notice the calculations and underlying analysis, or she is willfully distorting what I have done. 10 --> When I and others have then pointed to other, more complex metrics, analyses and calculations [which build on the rudimentary principles involved in the admittedly crude X-metric], they too have been brushed aside, and goal posts have been repeatedly moved. Remember, Durston et al have provided FSC results for 35 protein families, based on extending Shannon's H-metric of average information per symbol. Remember, Dembski has provided books and papers --- and key excerpts have been given -- to quantify CSI, and has provided his own calculation for a specific case in that context. CJY and VJT have given calculations for specific cases. 11 --> All, have been brushed aside. 12 --> Now, someone above wishes to make the claim that the explanatory filter that is used to identify cases of law, chance and design affecting aspects of an object, process, system, or phenomenon, is a dubious novelty. 13 --> I beg to remind that person that every time we make the distinction between a meaningful message and meaningless noise or natural regularity, in a communication context, we are making a design inference on the filter, used intuitively. 
[BTW, this immediately extends to the hypothesis testing context where we are looking for intentional action vs chance patterns.] 14 --> In fact, in science, the work of identifying what is law and what is scatter is an explanatory filter; and this extends to inference to design in contexts where results of an intervention are being identified, such as in control/treatment studies. 15 --> In short, the idea of an inference filter is nothing new. What is controversial in the origins science context is simply this: that people are inferring on FSCI to design, in contexts that cut across dominant schools of thought and the sort of Sagan-Lewontin a priori materialism that was so plainly documented in 1997:
To Sagan, as to all but a few other scientists, it is self-evident that the practices of science provide the surest method of putting us in contact with physical reality, and that, in contrast, the demon-haunted world rests on a set of beliefs and behaviors that fail every reasonable test . . . . It is not that the methods and institutions of science somehow compel us to accept a material explanation of the phenomenal world, but, on the contrary, that we are forced by our a priori adherence to material causes to create an apparatus of investigation and a set of concepts that produce material explanations, no matter how counter-intuitive, no matter how mystifying to the uninitiated. Moreover, that materialism is absolute, for we cannot allow a Divine Foot in the door. [From: “Billions and Billions of Demons,” NYRB, January 9, 1997.]
16 --> Have you identified a case where FSCI [use the crude X metric to quantify, e.g. 143 ASCII characters of text would suffice . . . ], on empirical evidence, originates by blind chance and mechanical necessity without intelligent direction? 17 --> ANS: Plainly, not; or that would have been long since triumphantly announced.
(The sort of cases being suggested above, of gene duplication begs the question of the underlying regulatory mechanism and its origin, and programs like ev are intelligently designed and similarly beg the question of how they arrived on an island of function with a conveniently working hill climbing algorithm.)
18 --> The production of FSCI by intelligence, in many ways, is a matter of routine direct observation with literally billions of cases in point. That routine observation is backed up by the infinite monkeys type analysis that also grounds the related second law of thermodynamics. 19 --> So, on inference to best explanation across aspects of a phenomenon, object etc, we are entitled to infer from FSCI as reliable sign to its observationally known source, intelligence. 20 --> And, in that context, we are entitled to use an explanatory framework -- we may call it a filter -- to differentiate signs that point us to mechanical necessity, to chance contingency and to choice contingency [aka design] as best explanation. ____________ So, we are plainly at impasse, and it still remains that FSCI and CSI more broadly are meaningful, are quantifiable in principle and in fact in enough cases to be relevant, as well as fitting into a broader view of the methods of science and similar serious explanatory investigations. G'day, GEM of TKI
kairosfocus
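For concreteness, the X-metric described in point 5 above reduces to something like the following sketch (my reading of X = C*S*B; the functionality judgment, the bit count, and the threshold are all supplied by the observer, and the 1,000-bit threshold is one of the two values mentioned in the correctives):

```python
# Sketch of the "brute force" X-metric, X = C*S*B: S is an observer's 1/0
# judgment of functional specificity, C is a 1/0 judgment that capacity
# exceeds a threshold, B is capacity in functional bits.
def x_metric(bits: int, functional: bool, threshold: int = 1000) -> int:
    s = 1 if functional else 0
    c = 1 if bits >= threshold else 0
    return c * s * bits

# 143 ASCII characters at 7 bits each: the example given above.
print(x_metric(bits=143 * 7, functional=True))  # 1001
print(x_metric(bits=400, functional=True))      # 0: below the threshold
```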
March 29, 2011, 04:04 AM PDT
PaV
I suppose for me to be pried away from what I do to focus long and hard on that particular problem would take, quite honestly, hundreds of thousands of dollars to begin to pique my interest.
As this would be a significant development for ID I'm sure a body like the Templeton Foundation or the DI would be interested in providing the relevant funding. And compared to the amount of money that "Darwinism" gets on a daily basis a couple of hundred K is peanuts. So, given that you claim it's possible but for the lack of funds, and given that you are an ardent ID supporter (one of the few to step up to MathGrrl's challenge) will you be putting together a proposal for funding? If not, why not?
JemimaRacktouey
March 29, 2011, 02:26 AM PDT
Joseph
As for the EF answer the question- do you think scientists flip a coin or throw darts? Or do you think they have a methodology?
Perhaps they use the same methodology as is used in the calculation of CSI? I.E. they don't.
JemimaRacktouey
March 29, 2011, 01:55 AM PDT
Oh, and Joseph, can you tell me who said: "I’ve pretty much dispensed with the EF. It suggests that chance, necessity, and design are mutually exclusive. They are not. Straight CSI is clearer as a criterion for design detection."
Muramasa
March 29, 2011, 01:23 AM PDT
Joseph, I am unfamiliar with "hind-n-seek". What exactly is that?
Muramasa
March 29, 2011, 01:19 AM PDT
Jon @ 379. I count myself among the supportive onlookers. Up until now I’ve been silent – this is my first post on this site. But my interpretation of this debate is not similar to yours. I’m certainly not disillusioned. I think at least a couple of good examples of rigorous calculations have been given by vjtorley and CJYman. I think PaV and others have given satisfactory rebuttals for each of the four scenarios. And I think that several premise challenging questions have been asked of Mathgrrl which haven’t been addressed. For every lurker out there who ‘desperately’ wants ID supporters to meet Mathgrrl’s demands, there’s a lurker like me: who thinks that Mathgrrl needs to step up and respond in depth to the many attempts to help her; and who thinks that the consistency of message on the ‘show-me-rigorous-math-for-my-four-scenarios’ line, while rhetorically effective at first, falls flat when it's seemingly the only thing she is willing to type. But I would agree with you in a strange way – I do hope that the ID supporters who have responded to Mathgrrl’s request keep up the posting. Not because they are under some idealistic obligation to not ‘walk away from a fight’, but because the content is instructive (except for some of Joseph’s crabbiness). But I don’t blame them if they don’t.
cmow
March 28, 2011, 10:56 PM PDT
#380 UB
I commented to you that I make a distinction between an object, the information that can be created from an object, and an object arranged to contain information.
So you did - and I have no problem with that. It arose because I asked what the amino acids in a protein symbolise. You said they didn't. The symbols were in the DNA. So I then asked what do the symbols in DNA (presumably the base pairs) symbolise? Actually I have asked this three times now.
markf
March 28, 2011, 10:52 PM PDT
From PaV in 377: "When, by definition, a bit sequence has to be at least 500 bits long to rank as CSI, and someone wants you to rigorously define CSI for a bit-string (sequence) that is 260 bits long, what would you make of it?" Then take the human genome, duplicate any gene in any way you see fit, and then measure the CSI before and after. Surely the human genome is larger than 500 bits and fits your criteria. If you want I can give you an amino acid sequence larger than 500 bits and you can measure the CSI of that sequence. Or even better yet, I can give a diagram of the tertiary structure of the protein and you can measure the CSI of that protein. What would be most helpful?
Taq
March 28, 2011, 10:13 PM PDT
From #331, it turns out that simply not engaging is the chosen option, and that is fine. A sensible choice, given the alternative. Mark, I commented to you that I make a distinction between an object, the information that can be created from an object, and an object arranged to contain information. I think observation supports that view, but if you have a case to make, then by all means make it. I will be traveling, but should be able to keep up.
Upright BiPed
March 28, 2011, 09:15 PM PDT
Naturally when I told them what I had found that ended the discussion. But, of course, I had wasted a lot of my time. Maybe now you understand my reticence.
Well, I would suggest you are playing to the wrong audience. Who cares what the Darwinists say. Your real audience should be the supportive onlookers who wish to see ID succeed, but remain largely silent in this forum. Does it not bother you that those onlookers, among whom I count myself, are becoming disillusioned with ID because its most vocal supporters are basically walking away from an opportunity to demonstrate their primary tool? Do you believe ID is right or don't you? If you do, then you have no excuse to shrug your shoulders and walk away from this fight.
jon specter
March 28, 2011, 07:37 PM PDT
Somewhere back in the 200's, Colin commented about psychological measures of a somewhat "fuzzy" nature. I wanted to add that such factors resulting from statistical multivariate analysis may be fuzzy in the interpretation, but the method of calculation is explicit: there is no question of how to do the calculation or the definition of the methodology, which is a rather different situation than with the calculation of CSI. And MathGrrl writes at #358:
My gut instinct is that, if CSI can be rigorously defined and objectively calculated, GAs will prove to be capable of generating it. I’d very much like to test that instinct (and ID claims at the same time).
This would seem to be a reasonable goal for all interested parties. If this is testable, it should be tested!
Tomato Addict
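As an illustration of that contrast, the "explicit method of calculation" for such statistical factors is a few reproducible lines in any statistics package; a hedged sketch using scikit-learn on synthetic data (the names and numbers here are mine, purely illustrative):

```python
# "Fuzzy" psychometric factors still come from an explicit, reproducible
# calculation, unlike the contested CSI computations in this thread.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 6))   # 100 subjects x 6 test items (synthetic)
fa = FactorAnalysis(n_components=2).fit(scores)
print(fa.components_.shape)          # (2, 6) loadings, computed the same way every time
```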
March 28, 2011, 06:27 PM PDT
