Uncommon Descent Serving The Intelligent Design Community

On The Calculation Of CSI


My thanks to Jonathan M. for passing my suggestion for a CSI thread on and a very special thanks to Denyse O’Leary for inviting me to offer a guest post.

[This post has been advanced to enable a continued discussion on a vital issue. Other newer stories are posted below. – O’Leary ]

In the abstract of Specification: The Pattern That Signifies Intelligence, William Dembski asks “Can objects, even if nothing is known about how they arose, exhibit features that reliably signal the action of an intelligent cause?” Many ID proponents answer this question emphatically in the affirmative, claiming that Complex Specified Information is a metric that clearly indicates intelligent agency.

As someone with a strong interest in computational biology, evolutionary algorithms, and genetic programming, this strikes me as the most readily testable claim made by ID proponents. For some time I’ve been trying to learn enough about CSI to be able to measure it objectively and to determine whether or not known evolutionary mechanisms are capable of generating it. Unfortunately, what I’ve found is quite a bit of confusion about the details of CSI, even among its strongest advocates.

My first detailed discussion was with UD regular gpuccio, in a series of four threads hosted by Mark Frank. While we didn’t come to any resolution, we did cover a number of details that might be of interest to others following the topic.

CSI came up again in a recent thread here on UD. I asked the participants there to assist me in better understanding CSI by providing a rigorous mathematical definition and showing how to calculate it for four scenarios:

  1. A simple gene duplication, without subsequent modification, that increases production of a particular protein from less than X to greater than X. The specification of this scenario is “Produces at least X amount of protein Y.”
  2. Tom Schneider’s ev evolves genomes using only simplified forms of known, observed evolutionary mechanisms, that meet the specification of “A nucleotide that binds to exactly N sites within the genome.” The length of the genome required to meet this specification can be quite long, depending on the value of N. (ev is particularly interesting because it is based directly on Schneider’s PhD work with real biological organisms.)
  3. Tom Ray’s Tierra routinely results in digital organisms with a number of specifications. One I find interesting is “Acts as a parasite on other digital organisms in the simulation.” The length of the shortest parasite is at least 22 bytes, but takes thousands of generations to evolve.
  4. The various Steiner Problem solutions from a programming challenge a few years ago have genomes that can easily be hundreds of bits. The specification for these genomes is “Computes a close approximation to the shortest connected path between a set of points.”

vjtorley very kindly and forthrightly addressed the first scenario in detail. His conclusion is:

I therefore conclude that CSI is not a useful way to compare the complexity of a genome containing a duplicated gene to the original genome, because the extra bases are added in a single copying event, which is governed by a process (duplication) which takes place in an orderly fashion, when it occurs.

In that same thread, at least one other ID proponent agrees that known evolutionary mechanisms can generate CSI. At least two others disagree.

I hope we can resolve the issues in this thread. My goal is still to understand CSI in sufficient detail to be able to objectively measure it in both biological systems and digital models of those systems. To that end, I hope some ID proponents will be willing to answer some questions and provide some information:

  1. Do you agree with vjtorley’s calculation of CSI?
  2. Do you agree with his conclusion that CSI can be generated by known evolutionary mechanisms (gene duplication, in this case)?
  3. If you disagree with either, please show an equally detailed calculation so that I can understand how you compute CSI in that scenario.
  4. If your definition of CSI is different from that used by vjtorley, please provide a mathematically rigorous definition of your version of CSI.
  5. In addition to the gene duplication example, please show how to calculate CSI using your definition for the other three scenarios I’ve described.

Discussion of the general topic of CSI is, of course, interesting, but calculations at least as detailed as those provided by vjtorley are essential to eliminating ambiguity. Please show your work supporting any claims.

Thank you in advance for helping me understand CSI. Let’s do some math!

Comments
niwrad,
“The difference is, as noted in the original post and in my post 6, a duplicate gene can lead to an increase in production of a particular protein, with significant impact on the subsequent biochemistry. Such a change in protein production can even enable or disable other genes. The analogy to email or books is fatally flawed.”
My general reasoning was: given a simple text string X with CSI(X), the concatenation of another X, giving XX, has CSI(XX) = CSI(X).
You're just restating your original claim that duplication does not increase CSI. I explained how duplication in biological systems can result in significant biochemical changes. If you maintain that this does not increase CSI, please show detailed calculations for the scenario I described.
MathGrrl
March 24, 2011, 02:01 PM PDT
StephenB,
markf: “Mathgrrl points out Dembski writes in his paper (which he says supersedes all other definitions of CSI) (Dembski) “By contrast, to employ specified complexity to infer design is to take the view that objects, even if nothing is known about how they arose, can exhibit features that reliably signal the action of an intelligent cause” -markf: “i.e. we should be able to look at an object and assess its CSI without knowing anything about its origins.” Dembski is speaking about the how of the origin not the fact of the origin. In other words, we can assess its CSI without knowing anything about the process or the mechanism that produced it. He is not saying that CSI could be about something other than origins.
That does not follow at all from what Dembski has written, from the calculations in his books and papers, or from how CSI is used by other ID proponents. Are you asserting that CSI cannot be calculated for biological systems or components? If not, please show how to calculate it for my four scenarios.
MathGrrl
March 24, 2011, 02:01 PM PDT
Collin,
So here is your calculation: when the command (build X amount of protein Y) results in X amount of protein Y, then you have a perfect fit between code and function. 100%. But if you get a gene duplication that says “Build 2X amount of Protein Y” and you only get 1.9X amount of protein Y, then you have less than 100% fit between code and function and therefore you have a decrease in CSI. But if you get 2X amount of Protein Y, you do not have an INCREASE in CSI, you have a CHANGE in CSI.
First, your approach highlights one of my points of confusion around CSI, namely what constitutes a valid specification. There seems to be a great deal of disagreement on this. The specification I provide in my example is "Produces at least X amount of protein Y." Instead of changing that, please explain why it isn't a reasonable specification. Second, Dembski's CSI has units of bits. A change must be either an increase or a decrease in the number of bits. Are you agreeing with vjtorley and others that CSI can be increased by gene duplication?
MathGrrl
March 24, 2011, 02:00 PM PDT
Upright BiPed,
I had made the decision to stop asking you to acknowledge the point. It had become obvious that you have no intention of doing so. I see now that this judgment has been confirmed yet again.
I am disappointed in your decision and will happily continue the discussion with you, should you ever decide to do so. The arguments of someone who thinks it logical to ask others to define his terms for him should prove . . . interesting.
MathGrrl
March 24, 2011, 02:00 PM PDT
SCheesman,
First, what is the “object” in your example? Is it the act of duplication? Is it the existence of the protein that is being duplicated? Is it the specific action of the protein? Is it the degree of efficacy of that production? Is it the composition or complexity of the item being produced?
In my gene duplication scenario, the object is the post-duplication genome. Can you provide a detailed calculation for the CSI of this object?
MathGrrl
March 24, 2011, 01:59 PM PDT
Joseph,
Your four scenarios are bogus as not one deals with ORIGINS and CSI is all about ORIGINS.
Mark Frank has already addressed this objection quite concisely. The only point I would like to add is that CSI is claimed by ID proponents, including Dembski, as an unambiguous indicator of intelligent agency for biological artifacts such as the bacterial flagellum. If CSI were really only about origins, such claims would be ridiculous on their face. The fact is that ID proponents do claim to be able to measure CSI in biological systems without reference to their origins. As Dembski himself states in the paper referenced in my original post, "Always in the background throughout this discussion is the fundamental question of Intelligent Design (ID): Can objects, even if nothing is known about how they arose, exhibit features that reliably signal the action of an intelligent cause?" Note the phrase "even if nothing is known about how they arose."
MathGrrl
March 24, 2011, 01:59 PM PDT
QuiteID:
Joseph, if you’re saying that bad mutations & duplications are chance but good mutations & duplications are design, what’s your evidence for that?
That's not what I am saying. The bad mutations that aren't point mutations are the result of a design gone bad.
Joseph
March 24, 2011, 01:45 PM PDT
And I hypothesize that when a genome is doubled and has an effect, the negative results of that duplication are almost always expressed unless another system in the cell takes an active role in correcting the problem (error checking, redundancies, etc).
Collin
March 24, 2011, 01:17 PM PDT
Collin, of course. Personally, I don't think "good" mutations or duplications are common enough to support evolution. But for some here to suggest "if it's a good mutation, then it's designed or frontloaded" -- well that's just dumb.
QuiteID
March 24, 2011, 01:13 PM PDT
Joseph, if you're saying that bad mutations & duplications are chance but good mutations & duplications are design, what's your evidence for that?
QuiteID
March 24, 2011, 01:06 PM PDT
No one is denying that doubling a genome will have an effect. The question is, does the effect cause a new and beneficial result? The effect, if not absolutely perfect, is almost always a deterioration of a multi-faceted function. It usually degrades and does not lead to a beneficial result, much less the construction of a very complex organelle or something like that.
Collin
March 24, 2011, 01:06 PM PDT
niwrad #120, Suppose that duplication of a gene doubles the amount of a protein produced by a cell. This can have a huge impact on phenotype. MathGrrl seems to have in mind a measure of CSI on the phenotype, and not the genotype. Take my behavior as the phenotype. If you send me an email message, I may put off dealing with it. If you repeat the transmission, I will generally respond promptly. What I'm describing here is a nonlinear response to a redundant transmission.
Noesis
March 24, 2011, 01:00 PM PDT
Mathgrrl, I would love it if you'd address my comment #95. Concerning your situation #2 (Schneider's ev), do the genomes do something other than bind to sites? I mean, how is it different from dropping a mixture of square and round pegs into a sieve and having the round pegs go into the round holes and the square pegs go into the square holes after enough shaking around?
Collin
March 24, 2011, 12:58 PM PDT
Spiny Norman #64
"Depends what you mean by “sends an email twice”. I’m an IT guy. Duplicate emails are not usually identical. I can examine headers to see whether they were sent by the same mail client, from the same server, with the same unique MessageID, passed through the same set of mail servers, etc etc."
I agree that in general there are no two perfectly identical things or events in the entire universe (by Leibniz’s principle of the identity of indiscernibles). However, my argument about duplication of messages not increasing CSI is fully independent of the medium. Hence your notes as an IT admin about email protocols and systems are irrelevant. In fact you can just as well imagine the two identical messages written on paper or whatever, and nothing changes about the fact that CSI doesn’t increase.
niwrad
March 24, 2011, 12:23 PM PDT
markf, OTOH if any sequence of amino acids can result in a protein then specified information really isn't. If Craig Venter inserted randomly generated synthesized DNA into a bacterium that had its DNA removed and it worked, specified information would evaporate.
Joseph
March 24, 2011, 11:50 AM PDT
QuiteID:
I don’t know why you say that. For example, specific gene duplications have been associated with various medical problems. Unless we say those duplications in those patients are designed (to hurt them?) those seem to be undirected.
Or caused by a program corrupted by the blind watchmaker.
Joseph
March 24, 2011, 11:45 AM PDT
markf:
I agree that CSI is supposed to be a method for making deductions about origins. However, the whole point of Dembski’s paper (and indeed his other work) is that he suggests that CSI is a property of an object that you can assess without knowing anything about its origins.
Right, if we didn't know anything about its origins and CSI is present, we infer a designing agency was required.
markf:
All that Mathgrrl is asking is how do you calculate CSI in some specific cases.
Right, and at least some IDists are saying those cases are bogus.
markf:
So Joseph’s objection that the situations she puts forward are not about origins is irrelevant.
The point of CSI is that its existence is a sign of a designing agency. CSI is defined as X number of bits of specified information. In "No Free Lunch" X = 500, which the math shows equals a probability greater than the upper probability bound. With respect to biology, specified information equates to biological function. To see if CSI is present we need to determine if there is > 500 bits of specified information. One "easy" example of doing so is taking a gene that cannot tolerate any variation. For example, say it codes for a protein that has 200 amino acids, all of which have to be in that specific order. 6 bits per amino acid (2^6 = 64) x 200 amino acids = 1200 bits of specified information. And that means CSI is present. That said, if there can be some variation you have to figure that in, which brings us back to the paper I linked to in comment 12.
Joseph
March 24, 2011, 11:42 AM PDT
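The arithmetic in the comment above can be reproduced in a few lines. This is only a sketch of that back-of-the-envelope count, not a CSI calculation as Dembski defines it. The function name `specified_bits` and the 20-amino-acid comparison are assumptions added for illustration. Note that 6 bits is the amount needed to pick one of 64 codons, whereas picking one of the 20 standard amino acids takes about 4.32 bits.

```python
import math

CODONS = 4 ** 3                       # 64 possible nucleotide triplets
BITS_PER_CODON = math.log2(CODONS)    # 6.0 bits, the per-position figure used in the comment
BITS_PER_AMINO_ACID = math.log2(20)   # about 4.32 bits (assumption: 20 standard amino acids)
THRESHOLD_BITS = 500                  # threshold the comment attributes to "No Free Lunch"


def specified_bits(n_positions, bits_per_position):
    """Bits for a sequence in which every position is fully constrained."""
    return n_positions * bits_per_position


# The comment's figure: 200 positions at 6 bits each.
bits_codon = specified_bits(200, BITS_PER_CODON)
print(bits_codon, bits_codon > THRESHOLD_BITS)   # 1200.0 True

# Same sequence counted per amino acid instead of per codon.
bits_aa = specified_bits(200, BITS_PER_AMINO_ACID)
print(round(bits_aa, 1), bits_aa > THRESHOLD_BITS)

# How many bits correspond to odds of 1 in 10^150 (about 498.3).
bits_for_1e150 = 150 * math.log2(10)
print(round(bits_for_1e150, 1))
```

Either way of counting puts a fully constrained 200-residue sequence above 500 bits. The count assumes a uniform distribution over sequences and no tolerated variation, which is exactly the caveat the comment raises at the end.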
MathGrrl, Where is the rigorous definition of the sample space Ω [Omega]? If the sample space is ill-defined, then so is CSI.
Noesis
March 24, 2011, 11:33 AM PDT
KairosFocus: For something to be "functional", I believe it has to be translatable. If I write random letters, it's an entirely useless activity. One can think of art: obviously the work of an intelligent agent (although sometimes this isn't obvious!) and one where, especially in its most abstract forms, the question is asked what does it mean? Then one tries to "interpret" (interpreters "interpret" from one language to another; i.e., they "translate") what the various objects and parts of the artwork mean. The example that M Holcumbrink used about the 3D configurations of t-RNA seems to fall within both the "functional" and "translation" categories. However, in a more general way one could say that the nucleotides that form the t-RNA molecule have been "translated", via chemical bonds, to a functional form: that is, if the nucleotides making up the t-RNA molecule were linear, the translation mechanism of the cell would not operate properly. So, the "specification" of the sequence, leading to its ultimate configuration, is both a "translation" from a linear form to a cruciform pattern that provides needed function. Got to go. See you all tomorrow.
PaV
March 24, 2011, 11:14 AM PDT
MathGrrl[78]:
Please provide a mathematically rigorous method for creating a specification and show how those that I included with the four scenarios in my original post do not meet your criteria. You seem to be simply dismissing them out of hand.
You keep making, in the words of SCheesman, a "category" mistake. I've already written, per Dembski, that a "specification" is fundamentally a "pattern". If you could "mathematically ... [create]" a "specification", then it would no longer be specification. Or, rather, you CAN'T "create" a "specification". "Specifications" are DISCOVERED: I see something. It suggests a pattern to me. I uncover the pattern (which means that it is translatable, or functional). I determine its complexity. If it exceeds the UPB, then it's CSI. Only the mind can detect a "specification". If there were some mathematical determination of it, then it wouldn't be CSI; it would be the result of whatever the mathematical equations determined it to be. You, and almost every other critic of ID, save Sober, make this fundamental categorical mistake. Why? Is it willful ignorance? What blinds you to the rather obvious?
PV: Here’s a link to a paper from 2003 that uses a variant of Shannon and Kolmogorov complexity to calculate the increase of complexity over increasing computer time used. While I have an interest in Kolmogorov-Chaitin complexity, the topic of this thread is CSI as defined by Dembski. Please provide example calculations of CSI for the four scenarios I described in my original post.
Please.... The basic model that Dembski uses for determining "complexity" is the inverse of Shannon information. How is it that you can't see that if something is the most "specified" thing in the world, attested to by every scientist in existence, would NOT constitute CSI if the complexity didn't exceed 500 bits, or whatever the equivalent of 10^150 is? What was the number of bits of "complexity" they found in Tierra? 30-58. End of story. I think it is the height of hubris to come to this blog and DEMAND that someone demonstrate to you in a mathematically rigorous fashion that CSI was NOT generated by Tierra and its ilk. That is the stuff of Ph.D. work. You could publish a book if you did that. You'd have to learn assembly language in order to evaluate it. This is a ridiculous demand. No, the burden is on you. If you think that Dembski is wrong, then you point out just how Tierra, ev and the like, produce CSI. Instead, you come here and say: define CSI for me. Well, read the books. It's really simple enough. Then you can come back and prove Dembski wrong. Shallit has tried to prove Dembski wrong, and it turned out he was wrong. Why? Because he doesn't understand what a "specification" is, first of all, and, second, because he's convinced that CSI is trivial. Well, he was, and is, wrong. The classic example of CSI is the random binary string that turns out to be the first hundred digits in binary code. Either you get that, or you don't. Beyond that, you're on your own. Quit pestering us. Try to learn the stuff.
PaV
March 24, 2011, 11:04 AM PDT
Oops. I cannot recall any piece of technical writing by Dembski that does not refer to -log p as information.
Noesis
March 24, 2011, 10:49 AM PDT
Welcome Mathgrrl, Evolutionary algorithms can generate CSI if they act as surrogates of an intelligent agency. I demonstrated as much here at UD with my Genetic Algorithm: "Dave Thomas Says Cordova's Algorithm is Remarkable". The question is whether evolutionary algorithms can spontaneously generate CSI without authorship of intelligent agency. That is the subject of No Free Lunch discussions.
scordova
March 24, 2011, 10:46 AM PDT
Upright BiPed says that
information – any information – only exists by means of a semiotic convention and rules (unless you disagree, and can show an example otherwise).
The Shannon self-information of an event that occurs with probability p is -log p. Shannon chose the term "self-information" precisely to indicate that the quantity is unrelated to communication. And without communication, there is no semiosis. One of the two terms (quantities added together) in Dembski's latest definition of CSI is the self-information of the event of "hitting the target." And Dembski has always emphasized that the specification of the target is detachable. The other term in the expression for CSI is a measure of the complexity of the description of the target by some semiotic agent. I cannot recall any piece of technical writing by Dembski that does refer -log p as information. So it appears that Upright BiPed disagrees with Dembski as to what constitutes information.
Noesis
March 24, 2011, 10:44 AM PDT
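The two quantities described in the comment above can be sketched numerically. The self-information -log p is standard. The second function uses the formula commonly quoted from Dembski's 2005 Specification paper, χ = -log2[10^120 · φ_S(T) · P(T|H)], which splits into the self-information of the target plus a (negative) term for the descriptive complexity and the 10^120 factor. The function names and the sample values of p and φ are hypothetical, chosen only for illustration, and nothing here settles how p or φ should be obtained for a real biological system.

```python
import math


def self_information(p):
    """Shannon self-information, in bits, of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def chi(p_target, phi):
    """Sketch of chi = -log2(10^120 * phi * p_target), computed in log space.

    p_target: probability of hitting the target under the chance hypothesis.
    phi:      count of descriptions at least as simple as the target's
              (phi_S(T) in the paper's notation).
    """
    descriptive_term = -(120 * math.log2(10) + math.log2(phi))
    return self_information(p_target) + descriptive_term


print(self_information(0.5))   # 1.0 bit for a fair coin flip

# Hypothetical inputs: a target with probability 2^-500 and phi = 1.
example_chi = chi(2.0 ** -500, 1)
print(round(example_chi, 2))
```

Working in log space avoids floating-point underflow when the probabilities involved are extremely small.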
O'Leary #50
"Niwrad at 37, I must disagree with the view that duplication adds no new information. It often does. Let me tell you a wheeze from the mid-twentieth century that perfectly illustrates that fact:A young lady went into the telegraph office and asked to send a telegram. She gave the operator a piece of flowered stationery with one word on it: Yes.The operator explained: Miss, it’ll cost you $2.00. You can have ten words for $2.00. She replied, “Certainly not! Nine more yesses will make it sound like I am too anxious.”"
Here duplication could mean anxiety; or could mean the operator is drunk; or the telegraph line has problems, etc. How can the receiver know the true reason (among many) of the duplication? He cannot. This means that the duplicated message *per se* doesn’t convey complex specified information, rather a simple unspecified bit of uncertainty, which is not at all CSI. Therefore also in this case duplication doesn’t increase CSI.
niwrad
March 24, 2011, 10:43 AM PDT
Joseph #99, Stephenb #100 I agree that CSI is supposed to be a method for making deductions about origins. However, the whole point of Dembski's paper (and indeed his other work) is that he suggests that CSI is a property of an object that you can assess without knowing anything about its origins. All that Mathgrrl is asking is how do you calculate CSI in some specific cases. According to Dembski it should be possible to do this without knowing anything about the origins of the object. So Joseph's objection that the situations she puts forward are not about origins is irrelevant.
markf
March 24, 2011, 10:39 AM PDT
M Holcumbrink: Personally, I find that causing the physical medium that carries the code to serve as both code carrier and components for machinery is SCARY genius. It causes me to join in with the Psalmist and say “we are FEARFULLY and wonderfully made”! Amen!
PaV
March 24, 2011, 10:33 AM PDT
MathGrrl #48
"The difference is, as noted in the original post and in my post 6, a duplicate gene can lead to an increase in production of a particular protein, with significant impact on the subsequent biochemistry. Such a change in protein production can even enable or disable other genes. The analogy to email or books is fatally flawed."
My general reasoning was: given a simple text string X with CSI(X), the concatenation of another X, giving XX, has CSI(XX) = CSI(X). By contrast, you provide a more complex biological scenario where a gene X is duplicated (giving XX), with boundary conditions such that XX causes an increase in production of a particular protein or even enables or disables other genes. Such conditions are mechanisms or logic of the overall system that make your scenario different from, and richer than, mine. Here you have an original gene X with CSI(X) plus a system Y implying the above mechanisms with CSI(Y). So here CSI(XX) seems to be greater than CSI(X) only because there is also CSI(Y) at play.
niwrad
March 24, 2011, 10:14 AM PDT
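The intuition that concatenating a string with itself adds almost nothing can be illustrated with compression, used here as a rough stand-in for algorithmic (Kolmogorov-Chaitin) information. This is a swapped-in technique, not Dembski's CSI and not the method of anyone in the thread, and it says nothing about the functional consequences of gene duplication that the biological scenario turns on. The helper name `compressed_size` and the 2000-byte test string are assumptions for illustration.

```python
import random
import zlib


def compressed_size(data):
    """Length in bytes of the zlib-compressed form of data (a crude complexity proxy)."""
    return len(zlib.compress(data, 9))


# A fixed-seed pseudo-random string X that zlib cannot shrink much on its own.
rng = random.Random(0)
x = bytes(rng.getrandbits(8) for _ in range(2000))

size_x = compressed_size(x)
size_xx = compressed_size(x + x)   # X concatenated with itself

print(len(x), size_x)
print(len(x + x), size_xx)
```

The raw length doubles, but the compressed size of XX stays far below twice the compressed size of X, because the second copy is encoded as back-references to the first. Whether that observation carries over to CSI depends on the specification and chance hypothesis being used, which is the point under dispute.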
Joseph, "The only way to say a gene duplication is a blind watchmaker process is to demonstrate that blind watchmaker-type processes can account for the origin of living organisms from non-living matter and energy." I don't know why you say that. For example, specific gene duplications have been associated with various medical problems. Unless we say those duplications in those patients are designed (to hurt them?) those seem to be undirected.
QuiteID
March 24, 2011, 10:13 AM PDT
QuiteID, Yes, but the chances of that are low. CSI is a probabilistic argument and I think that what you describe is very very improbable. So if we see tons of CSI then we can safely say that there has been design.
Collin
March 24, 2011, 10:03 AM PDT
markf @ 102 "I assume the symbols in a string of DNA are the bases. What do they symbolise?" Life.
tgpeeler
March 24, 2011, 09:49 AM PDT