Uncommon Descent Serving The Intelligent Design Community

On The Calculation Of CSI

My thanks to Jonathan M. for passing on my suggestion for a CSI thread, and a very special thanks to Denyse O’Leary for inviting me to offer a guest post.

[This post has been advanced to enable a continued discussion on a vital issue. Other newer stories are posted below. – O’Leary ]

In the abstract of Specification: The Pattern That Signifies Intelligence, William Dembski asks “Can objects, even if nothing is known about how they arose, exhibit features that reliably signal the action of an intelligent cause?” Many ID proponents answer this question emphatically in the affirmative, claiming that Complex Specified Information (CSI) is a metric that clearly indicates intelligent agency.

As someone with a strong interest in computational biology, evolutionary algorithms, and genetic programming, this strikes me as the most readily testable claim made by ID proponents. For some time I’ve been trying to learn enough about CSI to be able to measure it objectively and to determine whether or not known evolutionary mechanisms are capable of generating it. Unfortunately, what I’ve found is quite a bit of confusion about the details of CSI, even among its strongest advocates.

My first detailed discussion was with UD regular gpuccio, in a series of four threads hosted by Mark Frank. While we didn’t come to any resolution, we did cover a number of details that might be of interest to others following the topic.

CSI came up again in a recent thread here on UD. I asked the participants there to assist me in better understanding CSI by providing a rigorous mathematical definition and showing how to calculate it for four scenarios:

  1. A simple gene duplication, without subsequent modification, that increases production of a particular protein from less than X to greater than X. The specification of this scenario is “Produces at least X amount of protein Y.”
  2. Tom Schneider’s ev uses only simplified forms of known, observed evolutionary mechanisms to evolve genomes that meet the specification “A nucleotide that binds to exactly N sites within the genome.” The genome required to meet this specification can be quite long, depending on the value of N. (ev is particularly interesting because it is based directly on Schneider’s PhD work with real biological organisms.)
  3. Tom Ray’s Tierra routinely results in digital organisms with a number of specifications. One I find interesting is “Acts as a parasite on other digital organisms in the simulation.” The shortest parasite is at least 22 bytes long, but it takes thousands of generations to evolve.
  4. The various Steiner Problem solutions from a programming challenge a few years ago have genomes that can easily be hundreds of bits. The specification for these genomes is “Computes a close approximation to the shortest connected path between a set of points.”

vjtorley very kindly and forthrightly addressed the first scenario in detail. His conclusion is:

I therefore conclude that CSI is not a useful way to compare the complexity of a genome containing a duplicated gene to the original genome, because the extra bases are added in a single copying event, which is governed by a process (duplication) which takes place in an orderly fashion, when it occurs.

In that same thread, at least one other ID proponent agrees that known evolutionary mechanisms can generate CSI. At least two others disagree.

I hope we can resolve the issues in this thread. My goal is still to understand CSI in sufficient detail to be able to objectively measure it in both biological systems and digital models of those systems. To that end, I hope some ID proponents will be willing to answer some questions and provide some information:

  1. Do you agree with vjtorley’s calculation of CSI?
  2. Do you agree with his conclusion that CSI can be generated by known evolutionary mechanisms (gene duplication, in this case)?
  3. If you disagree with either, please show an equally detailed calculation so that I can understand how you compute CSI in that scenario.
  4. If your definition of CSI is different from that used by vjtorley, please provide a mathematically rigorous definition of your version of CSI.
  5. In addition to the gene duplication example, please show how to calculate CSI using your definition for the other three scenarios I’ve described.

Discussion of the general topic of CSI is, of course, interesting, but calculations at least as detailed as those provided by vjtorley are essential to eliminating ambiguity. Please show your work supporting any claims.
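For concreteness, here is a minimal sketch in Python of the kind of calculation I am asking for, based on my reading of the chi formula in the Specification paper, chi = -log2[10^120 * phi_S(T) * P(T|H)]. The numbers in the example are purely hypothetical placeholders; for any of the scenarios above, defensible values for phi_S(T) and P(T|H) would have to be supplied and justified.

import math

def chi(phi_s, p_t_h, replicational_resources=1e120):
    # Specified complexity as I read it from the Specification paper:
    #   chi = -log2(R * phi_S(T) * P(T|H))
    # where R is the 10^120 bound on events in the observable universe,
    # phi_S(T) counts patterns at least as simple to describe as T, and
    # p_t_h is the probability of T arising under the chance hypothesis H.
    # chi > 1 is the paper's threshold for inferring design.
    return -math.log2(replicational_resources * phi_s * p_t_h)

# Hypothetical example: a pattern describable among ~10^5 equally simple
# patterns, with probability 10^-150 under the chance hypothesis.
print(chi(phi_s=1e5, p_t_h=1e-150))  # roughly 83 bits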

Thank you in advance for helping me understand CSI. Let’s do some math!

Comments
PaV,
I think this is an insincere statement. You can’t possibly state that you’re familiar with No Free Lunch and then turn around and say: ” . . . I don’t understand how to calculate CSI.”
That is exactly what I am, quite sincerely, saying. Since you seem to have a good grasp on the topic, would you please define CSI with some mathematical rigor and demonstrate how to calculate it for the four scenarios I detailed in the original post?MathGrrl
March 23, 2011 at 03:39 PM PDT
Upright BiPed,
In each of the threads you recently participated in here at UD, you kept making the claim that evolutionary algorithms can create information, in this case complex specified information.
To be precise, I noted that known evolutionary mechanisms can create CSI based on the definition used by vjtorley in his calculation. No one else has defined CSI with any degree of mathematical rigor, let alone provided any example calculations.
So the question remains: Does the output of any evolutionary algorithm being modeled establish the semiosis required for information to exist, or does it take it for granted as an already existing quality.
Darned if I know, it really depends on the exact definition of "information", "complex specified information" in this case. Would you please define CSI with some mathematical rigor and demonstrate how to calculate it for the four scenarios I detailed in the original post?MathGrrl
March 23, 2011 at 03:39 PM PDT
Joseph,
Origins- CSI is about origins.
That is not reflected in Dembski's paper referenced in the original post of this thread and it does not help to demonstrate how to calculate CSI.MathGrrl
March 23, 2011 at 03:38 PM PDT
Indium says "Anyway, it is quite easy to see that, no matter what the exact definition is, known evolutionary processes like gene duplication+divergence can increase CSI, at least theoretically. Therefore, from a purely mathematical point of view, things are quite clear." No matter what the exact definition is? If you don't know what the definition is then how is it "quite clear?"Collin
March 23, 2011 at 02:58 PM PDT
F/N: What Dembski said on CSI at ARN in 1998, is; as excerpted: ______________ >> I shall (1) show how information can be reliably detected and measured [he develops in outline the usual negative log probability metric that traces to Hartley et al and which is easily accessed elsewhere, e.g in the always linked section a], and (2) formulate a conservation law that governs the origin and flow of information. My broad conclusion is that information is not reducible to natural causes, and that the origin of information is best sought in intelligent causes. Intelligent design thereby becomes a theory for detecting and measuring information, explaining its origin, and tracing its flow . . . . In Steps Towards Life Manfred Eigen (1992, p. 12) identifies what he regards as the central problem facing origins-of-life research: “Our task is to find an algorithm, a natural law that leads to the origin of information.” Eigen is only half right. To determine how life began, it is indeed necessary to understand the origin of information. Even so, neither algorithms nor natural laws are capable of producing information . . . . What then is information? The fundamental intuition underlying information is not, as is sometimes thought, the transmission of signals across a communication channel, but rather, the actualization of one possibility to the exclusion of others. As Fred Dretske (1981, p. 4) puts it, “Information theory identifies the amount of information associated with, or generated by, the occurrence of an event (or the realization of a state of affairs) with the reduction in uncertainty, the elimination of possibilities, represented by that event or state of affairs.” . . . . For specified information not just any pattern will do. We therefore distinguish between the “good” patterns and the “bad” patterns. The “good” patterns will henceforth be called specifications. Specifications are the independently given patterns that are not simply read off information . . . . The distinction between specified and unspecified information may now be defined as follows: the actualization of a possibility (i.e., information) is specified if independently of the possibility’s actualization, the possibility is identifiable by means of a pattern. If not, then the information is unspecified. Note that this definition implies an asymmetry between specified and unspecified information: specified information cannot become unspecified information, though unspecified information may become specified information . . . . there are functional patterns to which life corresponds, and which are given independently of the actual living systems. An organism is a functional system comprising many functional subsystems. The functionality of organisms can be cashed out in any number of ways. Arno Wouters (1995) cashes it out globally in terms of viability of whole organisms. Michael Behe (1996) cashes it out in terms of the irreducible complexity and minimal function of biochemical systems. Even the staunch Darwinist Richard Dawkins will admit that life is specified functionally, cashing out the functionality of organisms in terms of reproduction of genes. Thus Dawkins (1987, p. 9) will write: “Complicated things have some quality, specifiable in advance, that is highly unlikely to have been acquired by random chance alone. In the case of living things, the quality that is specified in advance is . . . the ability to propagate genes in reproduction.” . . . . 
To see why CSI is a reliable indicator of design, we need to examine the nature of intelligent causation. The principal characteristic of intelligent causation is directed contingency, or what we call choice. Whenever an intelligent cause acts, it chooses from a range of competing possibilities. This is true not just of humans, but of animals as well as extra-terrestrial intelligences . . . . A bottle of ink spills accidentally onto a sheet of paper; someone takes a fountain pen and writes a message on a sheet of paper. In both instances ink is applied to paper. In both instances one among an almost infinite set of possibilities is realized. In both instances a contingency is actualized and others are ruled out. Yet in one instance we infer design, in the other chance. What is the relevant difference? Not only do we need to observe that a contingency was actualized, but we ourselves need also to be able to specify that contingency. The contingency must conform to an independently given pattern, and we must be able independently to formulate that pattern. A random ink blot is unspecifiable; a message written with ink on paper is specifiable . . . . CSI is a reliable indicator of design because its recognition coincides with how we recognize intelligent causation generally. In general, to recognize intelligent causation we must establish that one from a range of competing possibilities was actualized, determine which possibilities were excluded, and then specify the possibility that was actualized. What’s more, the competing possibilities that were excluded must be live possibilities, sufficiently numerous so that specifying the possibility that was actualized cannot be attributed to chance. In terms of probability, this means that the possibility that was specified is highly improbable. In terms of complexity, this means that the possibility that was specified is highly complex . . . . To see that natural causes cannot account for CSI is straightforward. Natural causes comprise chance and necessity (cf. Jacques Monod’s book by that title). Because information presupposes contingency, necessity is by definition incapable of producing information, much less complex specified information. For there to be information there must be a multiplicity of live possibilities, one of which is actualized, and the rest of which are excluded. This is contingency. But if some outcome B is necessary given antecedent conditions A, then the probability of B given A is one, and the information in B given A is zero. If B is necessary given A, Formula (*) reduces to I(A&B) = I(A), which is to say that B contributes no new information to A. It follows that necessity is incapable of generating new information. Observe that what Eigen calls “algorithms” and “natural laws” fall under necessity . . . Contingency can assume only one of two forms. Either the contingency is a blind, purposeless contingency-which is chance; or it is a guided, purposeful contingency-which is intelligent causation. Since we already know that intelligent causation is capable of generating CSI (cf. section 4), let us next consider whether chance might also be capable of generating CSI. First notice that pure chance, entirely unsupplemented and left to its own devices, is incapable of generating CSI. Chance can generate complex unspecified information, and chance can generate non-complex specified information. What chance cannot generate is information that is jointly complex and specified. Biologists by and large do not dispute this claim. 
Most agree that pure chance-what Hume called the Epicurean hypothesis-does not adequately explain CSI. Jacques Monod (1972) is one of the few exceptions, arguing that the origin of life, though vastly improbable, can nonetheless be attributed to chance because of a selection effect. Just as the winner of a lottery is shocked at winning, so we are shocked to have evolved. But the lottery was bound to have a winner, and so too something was bound to have evolved. Something vastly improbable was bound to happen, and so, the fact that it happened to us (i.e., that we were selected-hence the name selection effect) does not preclude chance. This is Monod’s argument and it is fallacious. It fails utterly to come to grips with specification . . . . The problem here is not simply one of faulty statistical reasoning. Pure chance is also scientifically unsatisfying as an explanation of CSI. To explain CSI in terms of pure chance is no more instructive than pleading ignorance or proclaiming CSI a mystery. It is one thing to explain the occurrence of heads on a single coin toss by appealing to chance. It is quite another, as Küppers (1990, p. 59) points out, to follow Monod and take the view that “the specific sequence of the nucleotides in the DNA molecule of the first organism came about by a purely random process in the early history of the earth.” CSI cries out for explanation, and pure chance won’t do. As Richard Dawkins (1987, p. 139) correctly notes, “We can accept a certain amount of luck in our [scientific] explanations, but not too much.” If chance and necessity left to themselves cannot generate CSI, is it possible that chance and necessity working together might generate CSI? The answer is No. Whenever chance and necessity work together, the respective contributions of chance and necessity can be arranged sequentially. But by arranging the respective contributions of chance and necessity sequentially, it becomes clear that at no point in the sequence is CSI generated. Consider the case of trial-and-error (trial corresponds to necessity and error to chance). Once considered a crude method of problem solving, trial-and-error has so risen in the estimation of scientists that it is now regarded as the ultimate source of wisdom and creativity in nature. The probabilistic algorithms of computer science (e.g., genetic algorithms-see Forrest, 1993) all depend on trial-and-error. So too, the Darwinian mechanism of mutation and natural selection is a trial-and-error combination in which mutation supplies the error and selection the trial. An error is committed after which a trial is made. But at no point is CSI generated. Natural causes are therefore incapable of generating CSI. This broad conclusion I call the Law of Conservation of Information, or LCI for short. LCI has profound implications for science. Among its corollaries are the following: (1) The CSI in a closed system of natural causes remains constant or decreases. (2) CSI cannot be generated spontaneously, originate endogenously, or organize itself (as these terms are used in origins-of-life research). (3) The CSI in a closed system of natural causes either has been in the system eternally or was at some point added exogenously (implying that the system though now closed was not always closed). (4) In particular, any closed system of natural causes that is also of finite duration received whatever CSI it contains before it became a closed system.>> ______________ That was in 1998, 13 years ago.kairosfocus
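As a minimal numeric illustration of the negative log probability measure the excerpt invokes (this is just the standard surprisal convention, nothing specific to Dembski's fuller apparatus):

import math

def information_bits(p):
    # I(A) = -log2 P(A): bits of information generated by actualizing
    # a possibility of probability p
    return -math.log2(p)

print(information_bits(0.5))  # 1.0 bit: a fair coin toss
print(information_bits(1.0))  # -0.0, i.e. zero bits: a necessary outcome (P = 1)
                              # contributes no new information, which is the
                              # excerpt's point about law-like necessity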
March 23, 2011 at 02:58 PM PDT
FOR THE RECORD: I note that MG has posted a contribution as above. I note, for the record as follows (following up from several posts here and at MF's blog that have gone over much the same ground): 1 --> In the post's main body there are 4469 ASCII characters, and at 7 bits per character [128 possibilities] that gives a space of ~1.32 *10^9,417 possible configurations. 2 --> Of these, rather few will be in grammatically correct, contextually relevant English, i.e. functional. 3 --> But the functionality can be recognised by our now proverbial "semiotic agent" [who under the label "observer" is firmly embedded in say Quantum Physics]; the post fits the external specification/pattern of being in English language sentences, defining a cluster of states distinguishable from those that are not. 4 --> Likewise, we already have a specification in binary digits [bits], a standard measure of information carrying capacity. 5 --> In essence -- for clarity -- one bit is one yes/no unit. Two bits allows 2 states in the second digit for each of the two in the first, and so on. So, 2 bits permits 4 states, 3, eight, and n bits 2^n. 6 --> Further to this, we are of course long since beyond 143 ASCII characters or 1,000 bits; which corresponds to 1.07*10^301 configs. 7 --> It so turns out that the ~ 10^80 atoms of the observed cosmos, changing state every Planck time [about 10^20 times faster than the fastest nuclear interactions, on the strong force], for the thermodynamic lifespan of eh cosmos [about 50 mn times the time usually estimated to have lapsed since the origin of the observed cosmos] would undergo 10^150 states. 8 --> Consequently, the resources of the observed cosmos would be inadequate to investigate more than 1 in 10^151 of the possibilities for 1,000 bits. 9 --> That is, no search on the gamut of the cosmos would be adequate to search enough of such a space to be materially different from no search. [Similar challenges are at the basis of the second law of thermodynamics. App 1 my always linked discusses this.] 10 --> We are now ready to calculate the basic level, functionally specific bits measure of the post above, C*S*B = X, where complexity C = 1 as 4469 7-bit symbols > 1,000 bits, (functional) specificity = 1 as this is English text, and number of bits is B = 4469 * 7 = 31,283. 11 --> We have 31, 283 functionally specific bits, similar to how we measure file size in general. The FSCI inference is that once we are past the 1,000 bit threshold, and have functionally specific bits -- think about what random bit changes would do to the above message in short order -- the explanatory inference is that the post is designed. 12 --> This is of course, independently known and reflects the reliability of the inference on FSCI to design. 13 --> Similarly, for a 300 AA typical functional protein, we are looking at mRNA of 3 * 300 = 900 4-state characters, or 1,800 functionally specific bits. And, it is known that while some stretches of a protein are fairly tolerant, significant random AA substitution will destroy function in rapid order, i.e, proteins are functionally specific. 14 --> The onward inference is that on the empirical reliability of FSCI as a sign of design and the linked "Infinite Monkeys" analyses, the protein too is best explained as an artifact of design. 
15 --> Now, we have chosen a familiar example and a simple heuristic, to bring across the point of what functionally specific complexity is [by live example], and why it is that intelligence rather than blind random walks filtered on trial and error is the best explanation for it. 16 --> There are more complex models and metrics and more elaborate calculations [as have been raised before also, all of this is uncommonly like going in pointless circles], but they boil down to identifying the same Orgel complex specificity, in light of the same Wicken wiring diagram discussed here [and remember a string structure is a wiring diagram with nodes in a line], leading to the same issue of isolated islands of function in large config spaces where intelligent search is the most credible means to arrive at functionally specific configs, on search challenges to a random walk filtered by trial and error. 17 --> And MF's dismissive remarks notwithstanding, this is where the hyperskepticism issue comes in:
(i) If the above is "meaningless" -- a term used by MG in previous discussions -- then Orgel and Wicken were also meaningless. (MG has consistently not responded to this.) (ii) By posting a contribution in English, ASCII text, MG has in fact provided a personal example of how FSCI is best explained by design. (For very good reason, no infinite monkeys process on the gamut of the observed cosmos would be credible as an explanation of her post. Why then are we invited to imagine that such lucky noise and trial and error processes are credible as the source of the vastly more complex and specific functionality of the first living cell, and onward novel body plans, where we are dealing with 100,000 - 1 mn bits and well past 10 mn bits for these cases?) (iii) So, if MG wishes to deny the above, she is forced to actually instantiate the reason why FSCI is a good sign of design. That is very important as a baseline: self-referentiality. (iv) In addition, she has produced a file suing digital technology, and applying the metric of bits that are functional, i.e. we are dealing with a real world metric, one that is now a commonplace of a dominant technology. So, to try to dismiss the significance of functional bits is also self-referential. (v) Finally, if one refuses to acknowledge the above, it is predictable that more complex cases will be even more intractable. (Not, because they are "meaningless," but because MG is in a self referentially incoherent loop.)
18 --> Going further, one of the major issues over recent days has been the proposal that ev and co show how evolutionary mechanisms per blind chance filtered by selection on improved function, show how CSI can arise by chance. 19 --> This begs the material question highlighted by the above simple analysis: getting to islands of function in large config spaces. 20 --> Hill climbing within such an island on built in information and algorithms tracing to design, look to me suspiciously like a case of walking in circles in the snow and since one is seeing more and more tracks, one thinks one is getting closer to civilisation and rescue. 21 --> In reality, one is just going in circles and mis-attributing causes. ++++++++ So, have fun. Good day, GEM of TKI PS: I will follow up with an excerpt from Dembski where he took an initial step in his analysis that leads to the paper Specification. Maybe that will help in clarifying what is going on.kairosfocus
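A minimal sketch (Python) of the simple C*S*B heuristic in points 10 and 11 above, offered only as an illustration, not as a canonical metric:

def fsc_bits(text, functionally_specific, threshold_bits=1000):
    # C*S*B heuristic: C = 1 if the carrying capacity exceeds the
    # 1,000-bit threshold, S = 1 if the string is judged functionally
    # specific (e.g. contextually meaningful English), and B is the
    # capacity at 7 bits per ASCII character.
    b = 7 * len(text)
    c = 1 if b > threshold_bits else 0
    s = 1 if functionally_specific else 0
    return c * s * b

# The example above: the 4469 ASCII characters of the original post
print(fsc_bits("x" * 4469, functionally_specific=True))  # 31283 functionally specific bits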
March 23, 2011 at 02:52 PM PDT
Didn't Dr. Dembski kind of denounce the CSI concept? Or was that the explanatory filter? Anyway, it is quite easy to see that, no matter what the exact definition is, known evolutionary processes like gene duplication+divergence can increase CSI, at least theoretically. Therefore, from a purely mathematical point of view, things are quite clear. But the ID camp will not give up the ambiguity so easily, Mathgrrl. It is extremely valuable, in fact it's the only thing that keeps the whole concept alive. Therefore, I am quite surprised you have been allowed to put a spotlight on this topic... Credits to O'Leary and Jonathan M: This has been a bold decision! Special thanks to vjtorley for his work, too!Indium
March 23, 2011 at 02:39 PM PDT
Mathgrrl: In the very long recent UD thread on CSI, I proposed an alternative definition of CSI: Chi=-log2[(10^120).(SC/KC).PC], where SC is the Shannon complexity (here defined as the length of the string after being compressed in the most efficient manner possible), KC is the Kolmogorov complexity (here defined as the length of the minimum description needed to characterize a pattern) and PC is the probabilistic complexity, defined as the probability of the pattern arising by natural non-intelligent processes. In your response, you wrote several comments:
While I understand your motivation for using Kolmogorov Chaitin complexity rather than the simple string length, the problem with doing so is that KC complexity is uncomputable.
Quite so. That's the point. Intelligence is non-computational. That's one big difference between minds and computers. But although CSI is not computable, it is certainly measurable mathematically. To use an old example: suppose we received a signal from space, containing the first 100 digits of pi. Here, the length of the description "1st 100 digits of pi" (or the Kolmogorov complexity, as I have defined it) is significantly less than the length of the string, which cannot be compressed because the digits in pi follow no pattern - hence the Shannon complexity as I have defined it above is 100. Concerning the probabilistic complexity in the denominator of my formula, I originally wrote:
I envisage PC as a summation, where we consider all natural non-intelligent processes that might be capable of generating the pattern, calculate the probability of each process actually doing so over the lifespan of the observable universe and within the confines of the observable universe, and then sum the probabilities for all processes. Thus PC would be Sigma[P(T|H_i)], where H_i is the hypothesis that the pattern in question, T, arose through some naturalistic non-intelligent process (call it P_i). In reality, a few processes would likely dwarf all the others in importance, so PC could be simplified by ignoring the processes that had a very remote chance of generating T, relatively speaking.
Your comment on probabilistic complexity was as follows:
This is another term that is impossible to calculate, although in this case it is a practical rather than a theoretical limitation. We simply don't know the probabilities that make up PC... Computing PC based on known processes and assumed probabilities will certainly lead to many false positives. This version of CSI is therefore more a measure of our ignorance than of intelligent agency, just as Dembski's is.
In reply: the fact that we don't know what the probabilities are doesn't mean that we can't put an upper bound on them, by computing the probabilities for a wildly optimistic scenario. That was what Dr. Stephen Meyer wrote about in his book, Signature in the Cell, where he states on page 213:
In 1983 distinguished British cosmologist Sir Fred Hoyle calculated the odds of producing the proteins necessary to service a single one-celled organism by chance at 1 in 10^40,000... [Postdoctoral researcher Douglas] Axe's experimental findings suggest that Hoyle's guesses were pretty good. If we assume that a minimally complex cell needs at least 250 proteins of, on average, 150 amino acids and that the probability of producing just one such protein is 1 in 10^164 as calculated above, then the probability of producing all of the necessary proteins needed to service a minimally complex cell is 1 in 10^164 multiplied by itself 250 times, or 1 in 10^41,000. That kind of number allows a great amount of quibbling about the accuracy of various estimates without altering the conclusion. The probability of producing the proteins necessary to build a minimally complex cell - or the genetic information necessary to produce these proteins - by chance is unimaginably small.
Of course, Meyer's calculation here applies to chance processes, and various origin-of-life researchers have suggested that there is a kind of biochemical predestination in Nature which makes the emergence of life highly likely, given enough time and a planet orbiting its star in the habitable zone. But the key problem with this view (as Meyer argues in chapter 10 of his book) is that if bonding affinities in DNA determined their sequencing, it would be unable to carry the vast amounts of information that it does. DNA would be characterized by order (and hence redundancy) rather than information. I conclude that it is possible to compute plausible upper bounds on probabilistic complexity, in the light of what we know. You also wrote:
If you're proposing a new metric, you need to clearly and rigorously define it, which you've made a good start at, and show how it actually measures what you claim it measures with some worked examples... One problem you'll immediately encounter is identifying artifacts that are not designed, so that you can show that your metric doesn’t give false positives.
OK. Let's deal with that last point. Here's an example. The Precambrian Smiley Face Suppose I dig up a Precambrian rock with what appears to be a smiley face on it: a circle, two dots that look like eyes, and a curved line segment that looks like a mouth. Two possible Kolmogorov descriptions of this face would be: (1) "smiley face" and (2) "a circle, containing two dots above a curved line". If the proportions were sufficiently accurate that anyone seeing it would call it a smiley face (e.g. if the eyes were evenly spaced and about one-third of the way down from the top), then I'd go with the former description; but if the two eyes were on the same side of the circle or something like that, I'd go with the latter description. To calculate the Shannon complexity, I'd need to break it up into its three components: circle, two dots and curve. To make it mathematically manageable in terms of the level of precision, I'd pixellate the representation of the smiley face, as no perfect circles exist in Nature. Let's suppose that at the 128x128 level, however, the smiley face was still a perfect circle. Then each row could be represented as alternating white and black spaces (or 0s and 1s), where the outline of the circle corresponded to black or 1. So in a typical row, you'd have x 0's (white space), a 1 (black), y 0's (more white spaces), a 1 (black, on the other side of the circle) and x 0's again (by symmetry). x would always be less than 64, so it'd need 6 bits to specify. y would always be less than 128, so it'd need 7 bits. So in a typical pixellated row, the number of bits you'd need to specify the circle would be: (1+6)+(1+1)+(1+7)+(1+1)+(1+6)=26 bits. The first 1+ in each case specifies the color; the number after the + specifies the number of bits with that color. But since the right hand side of the circle is the same as the left, 13 bits should be enough to specify the pixellated row, in terms of Shannon information. The next row would be much like the previous one, except that the black spaces would be a little closer together or further apart. To specify that row, you'd only need a two bits: one telling you how many spaces left or right to move the black pixel, relative to the previous row (it would never be more than one space, as the shape we're dealing with is a circle, and the pixellation is pretty fine), and the other to tell you whether to move the black pixel left or right. So that's two bits. Also, the top half of a circle is the same as the bottom, so we'd only need to specify 64 rows. By my calculations, I get 13+(63x2)=139 bits to specify a 128x128 pixellated circle, in terms of its Shannon complexity, where 13 stands for the top row and 63 represents the number of rows following it. Since we only have a quarter of a circle here, we'd need two more bits to specify: copy again to the right and copy again in the bottom half. So that's 141 bits altogether. All right. What about the eyes? Row number, column number for the first eye should suffice. So: 1 bit to specify black, 6 bits to specify row number, and 6 to specify column number. That's 13, and if you add 1 bit to say: copy on the right hand side, you get 14. The mouth is a bit tricky. Let's say it's about 4 rows deep. However, the black pixels in each row would be one or two line segments (not two dots, as in the circle case), so the specification required to describe it in the two-line case should be: (1+6)+(1+6)+(1+6)+(1+6)+(1+6)=35 bits. Half of that gives you 18 bits. 
You also need 6 bits to specify the row number for the first row where the mouth appears. Changing each successive row of the mouth requires more bits than for the circle case, as we have to change the start and end points of the line segment, by moving it (let's say) anywhere up to 16 (=2^4) columns to the left or right in the next row, so that's [1 (for L or R) + 4] times two (for start and end points of the line segment), or 10 bits. Total for the mouth: 18+6+(10x3)=54 bits. Total Shannon complexity for a pixellated 128x128 smiley face: 141+14+54= 209 bits. "Smiley face" has 11 characters, making it much shorter than the Shannon string needed to specify it. To properly describe the one we found, we need to specify it as follows: "128x128 smiley face". That's only 19 characters. Even if you insist on representing each letter as 6 bits (2^6=64, compared to 26 letters in the alphabet plus 10 digits), you still get 114 bits, which is much less than 209. Probabilistic complexity: let's assume that all the world's rocks are black and white, with no colors and no shades of gray. Let's assume they can all be represented in pixellated terms. The odds that a given 128x128 slab of rock will have an identical arrangement of black and white pixels to the smiley face we found are 1 in 2^16384. But of course a smiley face could look slightly different. Since I specified a 128x128 smiley face, I'm just going to deal with the 128x128 smiley face in my calculations, and not a smaller one. How much could its shape vary while leaving it recognizable as a smiley face? Each dot for the eye could probably move about 15 pixels up, down, left or right. Of course, the other eye dot would have to move the same way, to maintain perfect facial symmetry. So that gives us 30x30=900 possibilities. The top row of the mouth could probably move the black line segments 15 pixels left or right. The rows below would more or less have to move in sync. The row number for the top row could perhaps be varied by 15 pixels, up or down. So that's 30x30 again, but let's be generous and allow the mouth to vary in depth, from 1 (a flat smile) to 10 (we don't want a V-shaped smile). 30x30x10 is 9,000. The circle can't vary, if it's a 128x128 circle. So the number of possible 128x128 smiley faces comes out at 900x9,000= 8,100,000, which is much, much less than 2^16,384. It's about 2^23. So the probabilistic complexity of a 128x128 smiley face is about1 in 2^16,361. 10^120 (the upper bound on the number of events in the observable universe, and hence a very generous upper bound for the number of slabs of rock) is about 2^399. SC/KC is 209/114 or about 2^1. So on my definition Chi is: -log2[(2^399).2^1/2^16,361] or: -log2[2^(-15,961)] or 15,961 >> 1. I hope this satisfies you as a detailed calculation, using a concrete example. Not being a biologist I can't comment on ev, Tierra or the Steiner problem. But I hope you will recognize that it represents a useful metric for SETI fans who encounter alien artifacts when exploring another planet. I don't think my measure of CSI will yield any false positives.vjtorley
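For anyone who wants to check the arithmetic, a minimal sketch (Python) of the Chi metric defined above, using the smiley-face numbers (the bit counts are the estimates given above, not measured values):

import math

def chi_vjt(sc_bits, kc_bits, log2_pc, log2_resources=399):
    # The proposal above: Chi = -log2[(10^120) * (SC/KC) * PC],
    # computed in log2 space to avoid overflow; 10^120 is about 2^399.
    return -(log2_resources + math.log2(sc_bits / kc_bits) + log2_pc)

# Smiley-face example: SC = 209 bits, KC = 114 bits, PC about 2^-16361
print(round(chi_vjt(209, 114, -16361)))  # about 15961, far greater than 1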
March 23, 2011 at 02:32 PM PDT
I want to thank MathGrrl for this post. The subsequent discussion is most interesting. And I'll thank UD for permitting this post.Neil Rickert
March 23, 2011 at 02:29 PM PDT
Kudos to Jonathan M and Denyse for allowing a non-ID proponent to open a thread. Question for ID proponents: at what point is the "specified" bit of CSI determined? Before the design arises or after?Grunty
March 23, 2011 at 02:28 PM PDT
About gene duplication allegedly increasing CSI I would reply with a simple question: "when one sends an email twice, do you think the receiver gets additional information respect the single mail?". It is obvious there is no additional information. About Dembski’s CSI, it is true that if we concatenate two identical mails we have a text string with double quantity of characters (then double complexity) respect the single one. But the specification of the two identical sub-strings is perfectly equal (said otherwise, there is no new functionality). Therefore the specification doesn’t increase at all. If the added specification is zero then the added CSI is zero too (also if the complexity increases). In fact we have *new* CSI only when complexity _AND_ specification are in the same time both greater than zero (i.e. in a sense they are both *new*) and here the specification is not. Therefore gene duplication doesn’t help to create new genetic information, exactly as, in software industry, to simply duplicate subroutines doesn’t help to produce new software.niwrad
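One rough numeric analogue of the point above, using compressed length as a stand-in for descriptive complexity (an illustration only, not the definition of specification used above): duplicating a string doubles its raw length but adds almost nothing to its shortest description.

import random, zlib

random.seed(0)
gene = bytes(random.choice(b"ACGT") for _ in range(1200))  # hypothetical sequence

single = len(zlib.compress(gene))
doubled = len(zlib.compress(gene + gene))  # the "gene duplication"
print(single, doubled)  # the doubled string compresses to only a few bytes more
                        # than the single copy, since the second copy is just a
                        # back-reference to the first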
March 23, 2011 at 02:09 PM PDT
Mod comment: PaV at 32 and 34: MathGrrl is an invited guest. We don't accuse people of making an "insincere statement" here. Cool off, okay? jon specter, too bad about your sciatica. The witless platypus thinks he is fine, and who dare argue? It's funny, yes, but why not wait for one of my more frivolous Coffee!! posts to add to the fun.O'Leary
March 23, 2011 at 01:58 PM PDT
I once suggested that maybe God delegated creation to a committee of angels.
That would certainly explain the platypus.... and my sciatica.jon specter
March 23, 2011 at 01:43 PM PDT
By the way, I registered because of the integrity of allowing opponents to post. It means a lot to me.maproctor
March 23, 2011 at 01:36 PM PDT
"Discussion of the general topic of CSI is, of course, interesting, but calculations at least as detailed as those provided by vjtorley are essential to eliminating ambiguity." -- MathGrrl I believe they can be found below the abstract in the paper: `Specification: The Pattern That Signifies Intelligence.` You'll find it's the first link in your post. It also provides worked examples of the same caliber as vtjorley's previous response to you. Given that what you ask is already knowingly available then you are simply being unserious in demanding that the gallery provide a dissertation to you. You have either not read the paper you linked and have your answers easily available or your are playing cute by not stating your objections to the work with which you are assumedly already familiar. That said, vtjorley is correct that CSI is unuseful. CSI, as laid out in the paper you linked, is based on the algorithmic complexity of a linear bitstring as compared to a random set of coin flips. DNA is not a pure linear bit string of a structured language in this sense -- Dembski's sonnet -- nor do its products express and operate in a single linear fashion so it's invalid at first blush. Further, given that the Darwinian process does not posit a one-time random coin flip it's not useful in distinguishing from the claims of ID from Darwin. To the point that anyone wishes to treat DNA as a proper bitstring under these notions then the rejection of the CSI hypothesis would reject both ID and Darwin in one shot. Or, to the other side, conform to both of them. The problem is not, and should not be, one of outcome but of the process that led to it. In that regard ID and Darwin are each the null hypothesis of the other. To the degree that you are asking for testable claims of ID and rigorous math? I give you the last century of research into evolutionary biology. It is impossible for this discussion to be settled so long as everyone keeps pretending that an organism is a simple bitstring rather than a control theory issue of graph topology and feedback.John Quincy Public
March 23, 2011 at 01:29 PM PDT
vjtorley, That's a really interesting thought... isn't it conceivable that random processes would have a 'signature' too? So one could determine what is the work of a 'designer 1' which is intelligent and a 'designer 2' which is a process. Perhaps the devil is in the details? Repetition can give you new information if the information is relative. So for instance two exit signs have more information than one if the first points to the second. How to get to a destination contains a variable amount of information.maproctor
March 23, 2011 at 01:22 PM PDT
MathGrrl: I just noticed that you wrote this at the beginning of your post (which I really can't believe is being allowed. You should be run off of this board!): "In the abstract of Specification: The Pattern That Signifies Intelligence, . . . Please tell me you've read more than just the abstract!! If you haven't, then please, just go away. If you have read more, then, please, tell me where in that paper are you having confusion or difficulties. And, what problems are you having applying what is written in that paper to the putative "scenarios" you've listed? If you can't do this much, then my assessment of you in our earlier post is 'spot on'.PaV
March 23, 2011 at 01:16 PM PDT
Upright, Would you explain what semiotic convention means?Collin
March 23, 2011 at 01:11 PM PDT
MathGrrl: I clearly stated in the original post that, based on my reading of the available material, I do not understand how to calculate CSI. Instead of asking me questions, why don’t you provide some answers? I think this is an insincere statement. You can't possibly state that you're familiar with No Free Lunch and then turn around and say: " . . . I don't understand how to calculate CSI." This is outlandish. The EASIEST part of CSI is the calculation of complexity. And certainly, as Dembski presents it in his paper on "Specification", it is a more complicated, world-encompassing approach; but the simplified version is a simple negative log calculation of improbability. Some 8th graders could do the calculation. Why can't you---or, won't you---give a definition of, and an example of, a "specification", as best you understand it? If you have no basic understanding, then why should I attempt any kind of dialogue with you? Why should I waste my time? MarkF: I think I'm entitled to some kind of an answer. A "specification" can be a very involved mental construction, especially when it comes to the putative "scenarios" that MathGrrl invokes. Why should I waste my time putting something together like that when she hasn't made the effort to come to grips with Dembski's definition of "specification"?PaV
March 23, 2011 at 01:08 PM PDT
Mathgrrl, In each of the threads you recently participated in here at UD, you kept making the claim that evolutionary algorithms can create information, in this case complex specified information. Yet, as I have pointed out to you, information – any information – only exists by means of a semiotic convention and rules (unless you disagree, and can show an example otherwise). So the question remains: Does the output of any evolutionary algorithm being modeled establish the semiosis required for information to exist, or does it take it for granted as an already existing quality. In other words, if the evolutionary algorithm – by any means available to it – should add perhaps a ‘UCU’ within an existing sequence, does that addition create new information outside (independent) of the semiotic convention already existing? If we lift the convention, does UCU specify anything at all? If UCU does not specify anything without reliance upon a condition which was not introduced as a matter of the genetic algorithm, then your statement that genetic algorithms can create information is either a) false, or b) over-reaching, or c) incomplete.Upright BiPed
March 23, 2011 at 01:01 PM PDT
Joseph, by "well understood" I think MG means -- in part -- that gene duplication has been observed to happen without apparent intervention. To posit intelligent intervention in those instances would be to say that an intelligence came in and duplicated a gene while leaving no trace of its presence. Since we know genes mutate without intervention, and since gene duplication is similar to any other mutation at base, your suggestion would mean that an intelligence could be operating at every point along the way. That seems like an impossibly high bar.QuiteID
March 23, 2011 at 12:52 PM PDT
Or at least you have an 81% chance of the code containing CSI and therefore an 81% chance of the code being designed.Collin
March 23, 2011 at 12:44 PM PDT
Mathgrrl, Then in step 2 you identify a similar function and see if the code is the same. You can compare the code-function of species 1 with the code-function of species 2 and get a similarity factor. Maybe they are 90% the same. Then you multiply it by a factor of negative relatedness. A human and a chimp would have a low factor like .1. So multiply the 90% by the .1 and you could get a 9% chance that the code is specified. But if the code-function is found in unrelated species, like a human and a sponge, then you multiply it by something like .9. 90% by .9 equals an 81% chance of the code being "specified." If the code is also complex (a given) then you have CSI. Again, I'm just brainstorming, so don't ridicule me even if my reasoning is ridiculous. :)Collin
March 23, 2011 at 12:32 PM PDT
MathGrrl, though I can't 'do some math' with you (unless you want to stay with very basic math), perhaps this empirical evidence will be of interest to you; Flowering Plant Study 'Catches Evolution in the Act' Excerpt: The new species formed when two species introduced from Europe mated to produce a hybrid offspring. The species mated before in Europe, but the hybrids were never successful. However, in America something new happened -- the number of chromosomes in the hybrid spontaneously doubled, and at once it became larger than its parents and quickly spread. http://www.sciencedaily.com/releases/2011/03/110317131034.htm Now MathGrrl this looks like just the type of evidence you need to make your case does it not? But it turns out that this evidence, as compelling as it may be on the surface, does not 'make the case' for evolution. Can you tell me why this does not? Here is a hint. Evolution by Gene Duplication Falsified - December 2010 Excerpt: The various postduplication mechanisms entailing random mutations and recombinations considered were observed to tweak, tinker, copy, cut, divide, and shuffle existing genetic information around, but fell short of generating genuinely distinct and entirely novel functionality. Contrary to Darwin’s view of the plasticity of biological features, successive modification and selection in genes does indeed appear to have real and inherent limits: it can serve to alter the sequence, size, and function of a gene to an extent, but this almost always amounts to a variation on the same theme—as with RNASE1B in colobine monkeys. The conservation of all-important motifs within gene families, such as the homeobox or the MADS-box motif, attests to the fact that gene duplication results in the copying and preservation of biological information, and not its transformation as something original. http://www.creationsafaris.com/crev201101.htm#20110103abornagain77
March 23, 2011 at 12:32 PM PDT
vjtorley, You think that there was more than one designer? That is radical. I like it. I once suggested that maybe God delegated creation to a committee of angels. Anyway, off topic... On topic: I'm not sure how stylometry would work because all of our references would be called related via common descent and therefore not independent.Collin
March 23, 2011 at 12:18 PM PDT
bornagain77:
MathGrrl posting a thread??? Is this Uncommon Descent???
Yes, a big thank you to all involved- good job and perhaps the start of something interesting.Joseph
March 23, 2011 at 12:11 PM PDT
MathGrrl:
The biochemistry that can result in a gene duplication is reasonably well understood. No intelligent agent is required for it to occur.
The circuitry of my PC is very well understood and required designing agencies for its manufacture. Just because something is "understood" doesn't mean the blind watchmaker did it. Dr Spetner wrote about this back in 1997 in "Not By Chance". But this gets to the root of the problem- MathGrrl just accepts that a protein-producing gene duplication is a blind watchmaker process- just because it is reasonably well understood- yet moans about CSI. So I will say it again: Origins- CSI is about origins. If living organisms can arise from non-living matter via chance and necessity then CSI is moot, all evolutionary processes are blind watchmaker processes and ID is dead. If you are going to start with that which needs an explanation in the first place- ie living organisms- then you have already cheated. Would you like to talk about that? Or are you going to continue to ignore it?Joseph
March 23, 2011 at 12:07 PM PDT
Collin (#11) I was very interested in your proposal that the linguistic method of stylometry can be applied to DNA. The reason is that I suspect you'd find more than one fingerprint. You also wrote:
The big obstacle is that for stylometry you have to have a reference that you know was written by a certain author.
I'd suggest starting with the most highly conserved sections of our DNA, which are found in all or nearly all organisms. Let's attribute these to Designer 1. I suspect that if you examined the DNA of higher animals, which regulates their pain responses as well as their social behavior, you'd find another fingerprint (let's call it Designer 2), indicating that Designer 1's work may have been tampered with. Death, predation, disease and some degree of suffering are part and parcel of the natural order. But that does not imply that the various instances we see of aberrant behavior in the animal kingdom (e.g. cannibalism of infants by female chimpanzees), or excruciatingly painful deaths, are part of the original plan of Providence. Something is rotten in the state of Nature, as we know it.vjtorley
March 23, 2011 at 11:59 AM PDT
MathGrrl posting a thread??? Is this Uncommon Descent??? Twilight Zone Opening THEME MUSIC 1962 Rod Serling http://www.youtube.com/watch?v=-b5aW08ivHUbornagain77
March 23, 2011 at 11:54 AM PDT
Mathgrrl, I guess I'm being tangential, sorry. I think I am still struggling with the definition of CSI. So I am trying to nail it down first by brainstorming different approaches. Perhaps we can start with #1 which would help us nail down just how "specified" a code section is. If a code section has no extraneous parts, then it is highly specified. For example, if it says, "Build protein X and be happy about it" then it is not highly specified because the "and be happy about it" is extraneous. So you could compare the function with the code and see how tightly they fit. If they fit tightly then you can quantify it mathematically, I think. Does that make any sense? I'm not sure it does.Collin
March 23, 2011 at 11:50 AM PDT