Uncommon Descent Serving The Intelligent Design Community

On The Calculation Of CSI

My thanks to Jonathan M. for passing my suggestion for a CSI thread on and a very special thanks to Denyse O’Leary for inviting me to offer a guest post.

[This post has been advanced to enable a continued discussion on a vital issue. Other newer stories are posted below. – O’Leary ]

In the abstract of Specification: The Pattern That Signifies Intelligence, William Dembski asks “Can objects, even if nothing is known about how they arose, exhibit features that reliably signal the action of an intelligent cause?” Many ID proponents answer this question emphatically in the affirmative, claiming that Complex Specified Information is a metric that clearly indicates intelligent agency.

As someone with a strong interest in computational biology, evolutionary algorithms, and genetic programming, this strikes me as the most readily testable claim made by ID proponents. For some time I’ve been trying to learn enough about CSI to be able to measure it objectively and to determine whether or not known evolutionary mechanisms are capable of generating it. Unfortunately, what I’ve found is quite a bit of confusion about the details of CSI, even among its strongest advocates.

My first detailed discussion was with UD regular gpuccio, in a series of four threads hosted by Mark Frank. While we didn’t come to any resolution, we did cover a number of details that might be of interest to others following the topic.

CSI came up again in a recent thread here on UD. I asked the participants there to assist me in better understanding CSI by providing a rigorous mathematical definition and showing how to calculate it for four scenarios:

  1. A simple gene duplication, without subsequent modification, that increases production of a particular protein from less than X to greater than X. The specification of this scenario is “Produces at least X amount of protein Y.”
  2. Tom Schneider’s ev evolves genomes using only simplified forms of known, observed evolutionary mechanisms, that meet the specification of “A nucleotide that binds to exactly N sites within the genome.” The length of the genome required to meet this specification can be quite long, depending on the value of N. (ev is particularly interesting because it is based directly on Schneider’s PhD work with real biological organisms.)
  3. Tom Ray’s Tierra routinely results in digital organisms with a number of specifications. One I find interesting is “Acts as a parasite on other digital organisms in the simulation.” The length of the shortest parasite is at least 22 bytes, but takes thousands of generations to evolve.
  4. The various Steiner Problem solutions from a programming challenge a few years ago have genomes that can easily be hundreds of bits. The specification for these genomes is “Computes a close approximation to the shortest connected path between a set of points.”

vjtorley very kindly and forthrightly addressed the first scenario in detail. His conclusion is:

I therefore conclude that CSI is not a useful way to compare the complexity of a genome containing a duplicated gene to the original genome, because the extra bases are added in a single copying event, which is governed by a process (duplication) which takes place in an orderly fashion, when it occurs.

In that same thread, at least one other ID proponent agrees that known evolutionary mechanisms can generate CSI. At least two others disagree.

I hope we can resolve the issues in this thread. My goal is still to understand CSI in sufficient detail to be able to objectively measure it in both biological systems and digital models of those systems. To that end, I hope some ID proponents will be willing to answer some questions and provide some information:

  1. Do you agree with vjtorley’s calculation of CSI?
  2. Do you agree with his conclusion that CSI can be generated by known evolutionary mechanisms (gene duplication, in this case)?
  3. If you disagree with either, please show an equally detailed calculation so that I can understand how you compute CSI in that scenario.
  4. If your definition of CSI is different from that used by vjtorley, please provide a mathematically rigorous definition of your version of CSI.
  5. In addition to the gene duplication example, please show how to calculate CSI using your definition for the other three scenarios I’ve described.

Discussion of the general topic of CSI is, of course, interesting, but calculations at least as detailed as those provided by vjtorley are essential to eliminating ambiguity. Please show your work supporting any claims.

Thank you in advance for helping me understand CSI. Let’s do some math!

Comments
Collin, Could you please show how your approach aligns with Dembski's definition of CSI and show how to calculate it for the four scenarios I described?
MathGrrl, March 23, 2011, 11:38 AM PDT
PaV,
Give us your own sense of what you think specification is, and one of your own examples of what you think a specification is.
Again, I have provided a specification for each of four scenarios. If you find them somehow unusable, please provide your rigorous definition of CSI, show how to create appropriate specifications, and perform some example calculations. I clearly stated in the original post that, based on my reading of the available material, I do not understand how to calculate CSI. Instead of asking me questions, why don't you provide some answers?
MathGrrl, March 23, 2011, 11:35 AM PDT
Joseph,
How was it determined that a gene duplication that leads to an additional protein is a blind watchmaker process?
The biochemistry that can result in a gene duplication is reasonably well understood. No intelligent agent is required for it to occur. If you are trying to make some form of fine-tuning, front-loading, or cosmological ID argument, that might be interesting but isn't pertinent to determining how to calculate CSI.
MathGrrl, March 23, 2011, 11:32 AM PDT
Continued... This inference can be made because if a similar function arose in an unrelated species, then it would probably arise via different coding instructions. After all, there must be a million ways to say "Build protein X" in the genome. But if it is said the same way in unrelated species, then a common designer is using what has worked for Him in the past to implement the new protein creation function.
Collin, March 23, 2011, 11:27 AM PDT
vjtorley, Nice to hear from you again!
The “x2” refers to the semiotic description.
If I understand you correctly, "semiotic description" is equivalent to Kolmogorov-Chaitin complexity. I agreed with you in the previous thread when you suggested that this would make sense, but it's not what Dembski uses in his description of CSI. It may be interesting to discuss alternative metrics in a subsequent thread, but here I'm very focused on trying to understand CSI as described by Dembski.
Here’s a reply I got from physicist Rob Sheldon:
[I]f “x2” were independent of the remainder of the genome, it would be CSI. The point is that it isn’t independent, it is a duplication.
All mutations are dependent on the existing genome (this is one reason why the search metaphor isn't a particularly good one -- evolutionary mechanisms only "search" near known viable solutions). They are "just" duplications, "just" point mutations, "just" insertions, "just" deletions, etc. What you seem to be suggesting here is that CSI is only applicable to de novo creation events. Am I misunderstanding?
received a message recently from another ID proponent, E.H.:
I believe Rob is basically saying the CSI is a product of the duplication algorithm. . . . This is because the probability of G2 occurring given the duplication algorithm and G1 is exactly 1.
What "duplication algorithm"? A quick PubMed search yields one estimate of 0.00115 / million years / lineage for gene duplication events in vertebrates. That's a lot lower probability than 1. As near as I can tell, you followed Dembski's approach quite straightforwardly in your original calculation. Correct me if I'm overstating the case, but it appears that you and your correspondents are recognizing a need to adjust the definition of CSI to better account for known evolutionary mechanisms.
MathGrrl, March 23, 2011, 11:27 AM PDT
Maybe you can identify CSI via triangulation of the following 3 factors: 1. Good code-to-functionality fit. 2. Identical code in another species that causes the identical function. 3. Those species are unrelated, so that the code and function must have arisen separately (convergence). If those 3 factors are present, then we can point to a common designer and can be assured that the code is CSI.
Collin, March 23, 2011, 11:24 AM PDT
#3 PaV
Here’s a question I’d ask you to answer: What do you understand Bill Dembski’s notion of “specification” to mean, and, could you provide an example of a “specification”?
I am sure MathGrrl will respond to this very well. But it relates directly to my point above. The ID community itself seems to be divided about what "specified" means. As MathGrrl is asking the ID community to clarify what they mean by CSI, surely it is for you guys to say what you mean by "specification"?
markf, March 23, 2011, 11:15 AM PDT
MathGrrl: The use of the word "specification" is at issue in at least one of your "scenarios". How is it that you want to have a discussion about "specification", yet you cannot provide a simple description and example of a specification? Leave these putative scenarios to one side. Please answer the question: Give us your own sense of what you think specification is, and one of your own examples of what you think a specification is.
PaV, March 23, 2011, 11:06 AM PDT
A few references I posted on my blog:
Biological specification always refers to function. An organism is a functional system comprising many functional subsystems. In virtue of their function, these systems embody patterns that are objectively given and can be identified independently of the systems that embody them. Hence these systems are specified in the same sense required by the complexity-specification criterion (see sections 1.3 and 2.5). The specification of organisms can be cashed out in any number of ways. Arno Wouters cashes it out globally in terms of the viability of whole organisms. Michael Behe cashes it out in terms of minimal function of biochemical systems.- Wm. Dembski page 148 of NFL
In the preceding and following paragraphs William Dembski makes it clear that biological specification is CSI- complex specified information. In the paper "The origin of biological information and the higher taxonomic categories", Stephen C. Meyer wrote:
Dembski (2002) has used the term “complex specified information” (CSI) as a synonym for “specified complexity” to help distinguish functional biological information from mere Shannon information--that is, specified complexity from mere complexity. This review will use this term as well.
from Kirk K. Durston, David K. Y. Chiu, David L. Abel, Jack T. Trevors, “Measuring the functional sequence complexity of proteins,” Theoretical Biology and Medical Modelling, Vol. 4:47 (2007):
[N]either RSC [Random Sequence Complexity] nor OSC [Ordered Sequence Complexity], or any combination of the two, is sufficient to describe the functional complexity observed in living organisms, for neither includes the additional dimension of functionality, which is essential for life. FSC [Functional Sequence Complexity] includes the dimension of functionality. Szostak argued that neither Shannon’s original measure of uncertainty nor the measure of algorithmic complexity are sufficient. Shannon's classical information theory does not consider the meaning, or function, of a message. Algorithmic complexity fails to account for the observation that “different molecular structures may be functionally equivalent.” For this reason, Szostak suggested that a new measure of information—functional information—is required.
Here is a formal way of measuring functional information: Robert M. Hazen, Patrick L. Griffin, James M. Carothers, and Jack W. Szostak, "Functional information and the emergence of biocomplexity," Proceedings of the National Academy of Sciences, USA, Vol. 104:8574–8581 (May 15, 2007). See also: Jack W. Szostak, “Molecular messages,” Nature, Vol. 423:689 (June 12, 2003).
Joseph, March 23, 2011, 10:52 AM PDT
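For readers who want to see what the Hazen et al. measure looks like in practice, here is a minimal Python sketch. As I understand that paper, functional information is I(Ex) = -log2[F(Ex)], where F(Ex) is the fraction of all possible sequences that achieve at least a degree of function Ex. The function name, its arguments, and the toy numbers below are my own assumptions for illustration; nothing here comes from the comment above or from Dembski's definition of CSI.

```python
import math


def functional_information(num_functional, num_total):
    """Return -log2 of the fraction of sequences meeting a functional threshold.

    num_functional: count of sequences with function >= the chosen threshold
                    (hypothetical input; in practice this must be estimated).
    num_total:      count of all possible sequences of the given length.
    """
    if not 0 < num_functional <= num_total:
        raise ValueError("need 0 < num_functional <= num_total")
    # Subtracting logs avoids float underflow when the fraction is tiny.
    return math.log2(num_total) - math.log2(num_functional)


# Toy example (assumed numbers): a 10-base sequence space over A, C, G, T.
total_sequences = 4 ** 10

# If exactly one sequence performs the function, I = 20 bits.
bits_if_unique = functional_information(1, total_sequences)

# If every sequence performs the function, I = 0 bits.
bits_if_universal = functional_information(total_sequences, total_sequences)

print(bits_if_unique, bits_if_universal)
```

The hard part in any real application is estimating how many sequences meet the threshold; the arithmetic itself is a single logarithm.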
Perhaps the linguistic method of stylometry can be applied to DNA. After all, it has been applied to music and art. http://en.wikipedia.org/wiki/Stylometry#Methods For those who don't know what stylometry is, I'll explain. When there is a historical record of disputed authorship, linguists use this method to discover the author. Linguists have discovered that when you write, you leave a "word print" (like a finger print) unconsciously in your work. Even if you write in different styles, use different "voices" and even write in different languages, your word print can be identified statistically. http://en.wikipedia.org/wiki/Writeprint I wonder if similar methods could be employed to find "word prints" in DNA, and if so, maybe that could lead to an inference of CSI. The big obstacle is that for stylometry you have to have a reference that you know was written by a certain author. So if you wanted to find out who wrote an anonymous Federalist Paper, you would have to compare it to the known writings of Alexander Hamilton or James Madison. F. Mosteller and D. Wallace (1964). Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley.
Collin, March 23, 2011, 10:39 AM PDT
How was it determined that a gene duplication that leads to an additional protein is a blind watchmaker process?
Joseph, March 23, 2011, 10:38 AM PDT
Hi MathGrrl: I've received a couple of personal communications on the subject of my CSI calculations since we last corresponded, which may be of assistance. Some people with whom I corresponded asked me to clarify my calculations, so I wrote the following explanation:
The "x2" refers to the semiotic description. Let me put it another way, borrowing an example from the old joke about what dogs understand when their owners are talking: "Blah Blah Blah Blah Ginger Blah Blah" - except that in this case the "Blah" is not repetitive. In the original genome, it's a long random string, then the gene that gets duplicated, and then more random stuff. And the gene that gets duplicated is itself a random string. To make things easier to visualize, I imagined that the duplicated gene was right at the end. I wrote the random stuff as "!@#$%^" even though of course it's all A's G's, T's and C's. I wrote the gene itself as (AGTCGAGTTC), even though a real gene has about 100,000 bases (and of course it's random too). Thus after the gene duplication, the simplest semiotic description is not !@#$%^(AGTCGAGTTC)(AGTCGAGTTC), but !@#$%^(AGTCGAGTTC)x2, which is much more economical. My point is that because the semiotic description is scarcely any longer than the original, Phi_s(T) in Professor Dembski's formula should be about the same for both. On the other hand, P(T|H) is strikingly different, because the duplicated genome is 100,000 bases longer than the original, so because there are four possible bases at each site, the probability of the duplicated genome is lower by a factor of 4^100,000. Put the two together, and you get an odd result where Chi is very high for the duplicated genome but not for the original. I hope that helps explain where I'm coming from. Is there an error in my CSI calculation?
Here's a reply I got from physicist Rob Sheldon:
[I]f "x2" were independent of the remainder of the genome, it would be CSI. The point is that it isn't independent, it is a duplication. In that case, all that is new is the "please duplicate" bit. Suppose we have used PKZIP to calculate the information in the genome. What happens when we duplicate a gene and run it through PKZIP? The amount of information added is very small. On the other hand, the resources to duplicate genes, and the entropy is not small. So if signal-to-noise is related to CSI, I would say duplicating a gene adds an eensy-weensy bit of information, while greatly increasing the entropic noise, so SNR goes down, CSI/codon goes down, and the designer doesn't need to be invoked.
I couldn't quite understand everything in Rob's reply, but fortunately I received a message recently from another ID proponent, E.H.:
I believe Rob is basically saying the CSI is a product of the duplication algorithm. This is a trick that Dr. Shallit used throughout his article "debunking" CSI. He claimed to show a number of times that he could algorithmically produce CSI, even though the CSI was already contained in his algorithms. When calculating CSI we have to first determine the probability of a gene G occurring given the chance hypothesis. In the case of the duplicated gene G2 this probability is still 1/4^(3,000,000,000), the same as the single gene G1. This is because the probability of G2 occurring given the duplication algorithm and G1 is exactly 1. So, the duplication adds nothing to the probability of occurring by chance. That is why Rob says the problem is G2 is not independent of G1. For G2 to be independent of G1, P(G2|G1) must equal P(G2|~G1), or at least be fairly close. If anything, duplication actually decreases any CSI that may have been in the gene, since the description still grows in length.
If I read E.H. correctly, he maintains that my original calculation of CSI in the duplication case was incorrect, and that in fact, using Dembski's formula in Specification: The Pattern That Signifies Intelligence, gene duplication does not, after all, increase CSI. Any thoughts?
vjtorley, March 23, 2011, 10:34 AM PDT
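The two quantities at issue in the comment above can be put side by side in a small Python sketch. As far as I can tell, the formula in Specification: The Pattern That Signifies Intelligence reads Chi = -log2[10^120 * Phi_s(T) * P(T|H)], so Chi rises when P(T|H) falls and drops when the description gets longer. The sketch does not compute Chi or Phi_s(T). It only compares (a) how many bits -log2 P(T|H) grows under a uniform chance hypothesis with four equiprobable bases, and (b) how much a compressed description grows, using zlib's DEFLATE as a stand-in for the PKZIP thought experiment. The sequence lengths (a 2,000-base flank, a 500-base gene), the fixed random seed, and the use of compressed size as a proxy for description length are all my assumptions.

```python
import math
import random
import zlib

BASES = "ACGT"


def random_sequence(length, rng):
    """Draw a random string of bases (a stand-in for real sequence data)."""
    return "".join(rng.choice(BASES) for _ in range(length))


def chance_bits(sequence):
    """-log2 P(sequence | H) when H makes every base equiprobable: 2 bits per base."""
    return len(sequence) * math.log2(len(BASES))


def compressed_bits(sequence):
    """Size in bits of the zlib-compressed sequence (rough description-length proxy)."""
    return 8 * len(zlib.compress(sequence.encode("ascii"), 9))


rng = random.Random(0)               # fixed seed so the run is repeatable
flank = random_sequence(2000, rng)   # the "!@#$%^" part of the toy genome
gene = random_sequence(500, rng)     # the gene that will be duplicated

original = flank + gene
duplicated = original + gene                    # gene copied, unmodified
novel = original + random_sequence(500, rng)    # 500 unrelated new bases

# (a) Under the uniform chance hypothesis both extensions cost the same.
chance_growth = chance_bits(duplicated) - chance_bits(original)

# (b) The compressed description grows far less for a copy than for new sequence.
dup_growth = compressed_bits(duplicated) - compressed_bits(original)
novel_growth = compressed_bits(novel) - compressed_bits(original)

print("extra chance bits for 500 added bases:", chance_growth)
print("extra compressed bits, duplicated gene:", dup_growth)
print("extra compressed bits, novel sequence: ", novel_growth)
```

Scaled up to the 100,000-base gene in the comment, the chance term grows by 200,000 bits, which is the factor of 4^100,000 mentioned there. Whether that term should instead be conditioned on the existing genome, as E.H. argues, is exactly the point in dispute, and the sketch takes no position on it. One practical caveat: DEFLATE only looks back 32 KB, so a copy placed farther than that from the original would not be recognized by this particular compressor.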
All credit to Jonathan M and Denyse for allowing an anti-ID guest post. I look forward to seeing some instructions on how to do the calculations. I want to point out that vjtorley's comment is not the only one that illustrates disagreement about the concept of CSI within the ID community. For example, Gpuccio has in the past asserted that part of the definition of dFSCI is that the item in question should be scarcely compressible (see for example his comment here.) This is the complete opposite of Dembski's definition of CSI in Specification: The Pattern That Signifies Intelligence, where he defines CSI in terms of outcomes that are highly compressible (see pp 9-12). Gpuccio's response was admirably honest and straightforward - he doesn't agree with everything that William Dembski writes. But remember this is the most recent mathematical definition of CSI (as far as I know) by the most well-known theoretician in the field, which he explicitly says supersedes all previous definitions. So at the very least this must give pause to those who say the concept of CSI is so simple they cannot understand why we sceptics do not understand it and accuse us of "hyperscepticism".
markf, March 23, 2011, 10:32 AM PDT
What if the original gene was CSI? Perhaps it is a command: "Build protein X." If the plan was duplicated then presumably you would get two protein Xs. But CSI is a quality not a quantity. So I don't think that by duplicating a gene you are creating CSI, you are just copying it. If you copy a novel, have you created CSI? No, you've just copied it. Perhaps that is the same thing in DNA. If CSI is a quality not a quantity, then I do not know if it can be measured mathematically. I hope I'm wrong though because it would be easier to start publishing some very powerful papers if CSI were measurable mathematically.
Collin, March 23, 2011, 10:26 AM PDT
Collin,
I would ask for a clarification concerning the gene duplication. Does the gene duplication result in new function?
Gene duplication can increase the production of certain proteins, as described in my scenario. This increased production can have significant impact on subsequent chemical reactions.
MathGrrl, March 23, 2011, 10:24 AM PDT
PaV,
What do you understand Bill Dembski’s notion of “specification” to mean, and, could you provide an example of a “specification”?
I provided the specifications for each of my scenarios. If you don't think those are "good" specifications for some reason, please explain why.
MathGrrl, March 23, 2011, 10:22 AM PDT
I hesitate to comment because I'm not a mathematician. But I would ask for a clarification concerning the gene duplication. Does the gene duplication result in new function? I do know that in communication, repeating an idea adds nothing. For example, if I say, "I like the color red. I like the color red" the second sentence adds no new information. I suppose if I were speaking via radio and the signal was bad, that the duplication could help result in the message being correctly transmitted.
Collin, March 23, 2011, 10:18 AM PDT
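The radio remark above describes what communication engineers call a repetition code: repeated symbols carry no new message content, but they let a receiver recover from noise. Here is a minimal Python sketch of that idea. It is an illustration of channel coding only, not a claim about CSI or about how cells use duplicated genes, and the function names and the three-copy scheme are my own assumptions. Note that two copies can only reveal that an error occurred; it takes three before a majority vote can correct one.

```python
from collections import Counter


def encode(bits, copies=3):
    """Repeat each bit `copies` times; the message content is unchanged."""
    return [bit for bit in bits for _ in range(copies)]


def decode(received, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    decoded = []
    for start in range(0, len(received), copies):
        group = received[start:start + copies]
        decoded.append(Counter(group).most_common(1)[0][0])
    return decoded


message = [1, 0, 1, 1]
sent = encode(message)   # 12 bits on the wire for a 4-bit message

# Simulate a bad signal: flip one transmitted bit.
noisy = list(sent)
noisy[4] = 1 - noisy[4]

recovered = decode(noisy)
print(recovered == message)
```

The encoded stream is three times longer than the message yet conveys the same four bits, which matches the intuition that repetition adds redundancy and not information.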
MathGrrl: You ask: 2. Do you agree with [vjtorley's] conclusion that CSI can be generated by known evolutionary mechanisms (gene duplication, in this case)? As I read the quote of his you've provided, this does NOT appear to be his conclusion. You've misunderstood him. Here's a question I'd ask you to answer: What do you understand Bill Dembski's notion of "specification" to mean, and, could you provide an example of a "specification"? If you can't answer this question relatively well, then there is no way to really have a discussion with you, I'm afraid.
PaV, March 23, 2011, 10:17 AM PDT
Origins- CSI is about origins. If living organisms can arise from non-living matter via chance and necessity then CSI is moot, all evolutionary processes are blind watchmaker processes and ID is dead. If you are going to start with that which needs an explanation in the first place- i.e., living organisms- then you have already cheated.
Joseph, March 23, 2011, 10:11 AM PDT
For some time I’ve been trying to learn enough about CSI to be able to measure it objectively and to determine whether or not known evolutionary mechanisms are capable of generating it.
Blind watchmaker mechanisms, not "evolutionary" mechanisms. Ya see that is part of the problem- equivocation. For all you know most "evolutionary" mechanisms are design mechanisms. So first we have to deal with the equivocation.
Joseph, March 23, 2011, 10:06 AM PDT