## 16 Replies to “The Conservation of Information: Measuring the Cost of Successful Search”

1.

The mechanistic option presupposed by the Darwinian Narrative gives us Natural Selection as some sort of Ben Gunn character, digging up the treasure and relocating it for purposes of self-preservation.

All the while he sings his merry little song:

http://www.bengunnsociety.com/bgs01.html

2.
Mats says:

Guys, check out what Jon Witt posted on Evo-News:

http://www.pssiinternational.com/ / http://www.doctorsdoubtingdarwin.org (Same destination)

The list is still embryonic, but it is likely to grow. Spread the word.

3.
hypermoderate says:

Dr. Dembski,

I’m still reading through your paper, but since you plan to revise it, I offer the following simpler, more intuitive derivation of your first result (top of page 2):

Given that each random sample has a probability p of hitting the target space A:

1. The probability of a sample missing A is 1 - p (by exclusion).
2. The probability of n independent samples missing A is (1 - p)^n (by independence).
3. The probability that at least one of the n samples hits A is therefore 1 - (1 - p)^n (by exclusion).
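The three steps can be checked numerically. The following is a minimal Python sketch, assuming each sample independently lands in A with probability p; the function names and the values of p and n are illustrative, not taken from the paper. It compares the closed form with a Monte Carlo estimate.

```python
import random

def prob_at_least_one_hit(p: float, n: int) -> float:
    """Closed form from the three steps: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n

def simulate(p: float, n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate: run many n-sample searches and count the successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One search = n independent samples, each landing in A with probability p.
        if any(rng.random() < p for _ in range(n)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    p, n = 0.01, 100  # illustrative values
    print(prob_at_least_one_hit(p, n), simulate(p, n))
```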

Regards,
Hypermoderate

4.
jaredl says:

Dr. D, I suggest that your paper brings into stark focus what I call the “No Prior Specification/No Possible Background Knowledge” problem facing classical theology. I’ve written to you before with a rough draft of this question. The gist of the problem is summed up in your paper: **“If the information is created as an act of intelligence, where did the information that this intelligence employs come from?”** I suggest that within classical theology, which posits an intelligence that is the ground of all being and from which all information flows, this paradox is unresolvable. In our experience, all intelligences use preexisting information to generate “new” information.

This problem is explicitly addressed and resolved in at least one other theistic system.

5.
jaredl says:

Of course, you could (and probably do) mean that bolded statement in an entirely different way than I have taken it, but the problem raised is nonetheless real.

6.

Just curious, jaredl, which theistic system is it that resolves the question?

7.
bdelloid says:

Hi,

I took a quick read of that.

I have one question.

Suppose I have a string of length N in which each element is a binary value, 0 or 1, and let N = 10^9. Our string can then take on 2^(10^9) different values. Now suppose my target is 1 for every element in the first half of the string and 0 for every element in the second half. As far as I understand the article, a search strategy that reaches this particular string would require more information than the string itself.

But suppose that by binary value I mean either above or below a certain line in a can (a binary state), and that each element of my string is a grain of sand. Suppose each grain is numbered from 1 to 10^9. The first half of the grains are small; the second half are large, like pebbles. Each grain corresponds to a value in my string, and the grains start out randomly distributed in the can. The question is: what search strategy will place the first half of the grains on one side of the line (achieving state 1, say) and the second half on the other side (achieving state 0)? One possibility would be to shuffle the grains randomly, check whether the arrangement matches the string, and, if it does not, reject it and shuffle again.

However, another search strategy would be to shake the can. If I shake the can enough, the grains will sort themselves, with the small grains on one side of the line (state 1) and the large grains on the other (state 0). Does this “shake the can” strategy use more information than that specified by the string (the information being indicated by the fact that there are 2^(10^9) possible states for the string)?

Basically, I am saying that one can easily imagine a search strategy that reaches a target without requiring more information, at least not external information. In this case, gravity acts as a free “distance calculator” that sorts all the grains of sand. Natural selection does the same thing.
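The shaking scenario can be sketched as a toy simulation. Everything here is an assumption made for illustration: grains are numbered from 0, the first half are labelled small, "below the line" counts as state 1, and one shake lets a small grain slip under a large grain directly beneath it at a randomly chosen adjacent pair. The sketch only shows that this rule reaches the target split without ever consulting the target string; whether the size difference and gravity amount to information supplied to the search is the point under dispute, and the model does not settle it.

```python
import math
import random

def is_small(grain: int, n: int) -> bool:
    # Hypothetical labelling: grains 0 .. n//2 - 1 are small, the rest are large.
    return grain < n // 2

def shake_the_can(n: int, seed: int = 0):
    """Shake a can of n numbered grains (n even) until all small grains lie below the line.

    can[0] is the bottom of the can, and the line sits between positions n//2 - 1
    and n//2. Each shake picks a random adjacent pair; if a small grain sits directly
    on top of a large one, it slips underneath. Returns (bit string, number of shakes).
    """
    rng = random.Random(seed)
    can = list(range(n))
    rng.shuffle(can)  # the grains start out randomly distributed
    shakes = 0
    while not all(is_small(g, n) for g in can[: n // 2]):
        i = rng.randrange(n - 1)
        if is_small(can[i + 1], n) and not is_small(can[i], n):
            can[i], can[i + 1] = can[i + 1], can[i]
        shakes += 1
    position = {grain: pos for pos, grain in enumerate(can)}
    # Bit k is 1 if grain k ended up below the line (state 1), else 0.
    bits = [1 if position[k] < n // 2 else 0 for k in range(n)]
    return bits, shakes

if __name__ == "__main__":
    n = 20
    bits, shakes = shake_the_can(n)
    target = [1] * (n // 2) + [0] * (n // 2)
    print(bits == target, shakes)
    # A blind shuffle hits the target split with probability 1 / C(n, n/2),
    # so it needs C(n, n/2) shuffles on average.
    print(math.comb(n, n // 2))
```

The adjacent-swap rule is just one simple way to model shaking; any rule that only ever moves small grains beneath large ones would settle into the same split.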

8.
Bob OH says:

I’m afraid I don’t get this:

This higher-level $\bar{S}$ random search will be consistent with the original random search $S$, which serves as the baseline for searching the original space, provided that, on average, the $\bar{S}_i$'s yield a search whose probability of success on the original space is no greater than $p$.

Why is this so?

Bob
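One possible reading of the quoted condition, offered as a hedged sketch rather than as the paper's own argument: suppose the higher-level search picks lower-level search i with probability `w_i`, and that search then succeeds with probability `q_i` (all names and numbers here are hypothetical). The overall chance of success on the original space is then the weighted average of the `q_i`, so requiring that average to be no greater than p amounts to requiring that picking a search at random does no better than the baseline search S.

```python
import random

def mixture_success_probability(weights, success_probs):
    """Exact success probability when a higher-level search picks lower-level
    search i with probability weights[i], and that search then succeeds with
    probability success_probs[i] (hypothetical inputs)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * q for w, q in zip(weights, success_probs))

def simulate_mixture(weights, success_probs, trials=50_000, seed=1):
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        i = rng.choices(range(len(weights)), weights=weights)[0]  # pick a search
        if rng.random() < success_probs[i]:  # run the chosen search once
            wins += 1
    return wins / trials

if __name__ == "__main__":
    p = 0.01  # baseline success probability of S (illustrative)
    weights = [0.01, 0.495, 0.495]
    success_probs = [0.5, 0.0, 0.001]  # one excellent search, two poor ones
    exact = mixture_success_probability(weights, success_probs)
    print(exact, simulate_mixture(weights, success_probs), exact <= p)
```

Even though one lower-level search here is fifty times better than the baseline, it is picked so rarely that the higher-level search as a whole does no better than S.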

9.

Every context requires a Context. Keep up the great work, Bill.

10.
hypermoderate says:

Dr. Dembski,

On page 3 you remark that it seems counterintuitive for the information associated with a search to decrease as the number of samples increases. I suggest that a proper understanding of surprisal renders this quite intuitive.

The surprisal definition of information content, as you note in your paper, holds that information increases as surprise increases. The most surprising (and thus most informative) events are those which are the most improbable. For example, if I call you tomorrow and inform you that the Eiffel Tower is standing, I’ve given you very little information. You expected it to be standing, and I’ve simply confirmed something that was overwhelmingly likely anyway. If I tell you the Eiffel Tower has fallen, however, I’ve given you lots of information. This is a highly surprising event, carrying lots of information.

Now let’s apply this notion of surprisal to random searches. If the probability p of a successful sample is small, you would be very surprised to hear that a random search succeeded on its very first sample. You’d be less surprised to hear that it succeeded within 10,000 samples, and still less surprised to hear that it succeeded within a billion samples. To take the two extremes, you’d be infinitely surprised if the search succeeded with zero samples (an event of probability 0), and you wouldn’t be surprised at all if it succeeded given an infinite number of samples (an event of probability 1). An infinite random search succeeds with probability 1 whenever p > 0, as it is, for example, when the search space is finite, the target is nonempty, and sampling is uniform.

So the surprise (and therefore the information) associated with successful searches decreases as the searches get longer, because the longer they are, the less surprising it is when they succeed.
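The decline can be tabulated. A minimal sketch, assuming a search of n samples succeeds when at least one independent sample hits a target of per-sample probability p; the value of p and the function names are illustrative.

```python
import math

def success_prob(p: float, n: int) -> float:
    # Probability that a random search of n samples hits the target at least once.
    return 1.0 - (1.0 - p) ** n

def surprisal_bits(prob: float) -> float:
    # Self-information log2(1 / prob); an impossible event is infinitely surprising.
    return math.inf if prob == 0.0 else math.log2(1.0 / prob)

if __name__ == "__main__":
    p = 1e-4
    for n in (0, 1, 10_000, 1_000_000_000):
        print(n, surprisal_bits(success_prob(p, n)))
```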

The error is in thinking that the information associated with a successful search is somehow an “information cost” of that search. Random searches, whether they succeed or fail, have an information cost of zero. They’re random.

Looking at it from the other side, a random search of length n that succeeds has a different information content from one that fails (except in the special case where (1 - p)^n = 0.5, for example p = 0.5 with a single sample). Yet both searches are completely random. It hardly makes sense to say one has a greater “information cost” than the other.
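The special case can be checked by computing the surprisal of both outcomes. This sketch assumes 0 < p < 1 so that neither outcome is impossible; the function name is hypothetical.

```python
import math

def outcome_information(p: float, n: int):
    """Surprisal in bits of 'the n-sample random search succeeded' and 'it failed'."""
    fail = (1.0 - p) ** n
    succeed = 1.0 - fail
    return math.log2(1.0 / succeed), math.log2(1.0 / fail)

if __name__ == "__main__":
    print(outcome_information(0.5, 1))    # equal: each outcome has probability 1/2
    print(outcome_information(0.01, 10))  # success is rarer, so it carries more bits
```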

Regards,
Hypermoderate

11.
Kipli says:

1) The definition of “random search” does not fit the way the phrase is used in the paper. Defined as “individual elements from the search space are independently sampled with respect to a given probability distribution,” the probability distribution could very well be a point-mass distribution, which assigns all its probability to one point. This is not how the phrase ‘random search’ is used in the rest of the paper; instead, the probability distribution is probably intended to be uniform. The probability distribution for the random search should be specified more precisely.

2) The notion of “added information” doesn’t seem well thought-out. Though it is claimed in footnote 4 that it is the “non-averaged form of…relative entropy”, it is, in fact, not. Relative entropy involves comparing the probabilities of specific events of the same probability space under two (different) probability measures. In the definition given, the events A and B need not have anything to do with each other. In fact, we could let A be the event “a fair coin comes up heads when flipped” and B the event “a fair die comes up 6 when rolled”. Then the information that B adds to A is log[(1/6)/(1/2)] = -log(3). Somehow getting a 6 on the die takes away information from the coin flip, even though the events have nothing to do with each other?

I would think that for this notion to be reasonable, when two events are independent then the information added to one by the other should be 0.
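The coin-and-die example can be reproduced in a few lines. The sketch assumes, as the example implies, that the information B adds to A is log(P(B)/P(A)), taken here in base 2; the function name is hypothetical. Because only the two marginal probabilities enter, the result is nonzero even though the events are independent.

```python
import math

def added_information(p_a: float, p_b: float) -> float:
    # Assumed form, inferred from the coin-and-die example: log2(P(B) / P(A)).
    return math.log2(p_b / p_a)

coin_heads = 1 / 2  # A: a fair coin comes up heads
die_six = 1 / 6     # B: a fair die comes up 6

print(added_information(coin_heads, die_six))  # about -1.585, i.e. -log2(3)
```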

3) While initially defined for two events, “added information” is immediately applied to two searches. But the information added by one search to another has not been defined. It appears that what is meant by the information added by search T to search S is the information added by the event “T is successful” to the event “S is successful”. In the end, then, the information added by search T to search S is simply the difference in the log-probabilities that they are successful, even though there need not be any connection between the two.

4) The idea that a search may contain information (as implied by the phrasing in various places) is odd. Is it simply the self-information of the event “the search is successful”? In that case, any search that is guaranteed to succeed will actually have information of 0. Also, the probability that a search succeeds is tied to the underlying space; if the information in a search is intended to be independent of the space then how can we explain a search that does better than random with one space (so adds information), but worse than random with another space (so takes away information).

5) After making all of the simplifying assumptions, the statement (and proof) of the theorem is trivial: Let S and S-bar be random variables on [0,1] with probability measures \mu and \mu-bar. Suppose \mu({1}) = p, \mu({0}) = 1-p, and the expected value of S-bar is p-bar = -log(p) = -log(\mu({1})).

6) Stylistically, the structure of the paper is bothersome. I’ve never found long discussions followed by “…and so we have proved the following theorem” especially illuminating. Exactly what is the proof? Where does it start, end, and what are the steps in between? Moreover, the ‘proof’ given of the theorem is needlessly complicated. Unless I’m missing something, I don’t see the need for introducing weak convergence, the strong law of large numbers, or a justification for the claim that if probability measure \mu on [0,1] has expected value at most p then \mu({1}) is at most p.

12.
hypermoderate says:

I see that Kipli has already made my next point, which is that any notion of ‘added information’ which does not take dependence and independence into account is problematic.

The paper defines added information in such a way that the information added by any event of probability p to another event of probability p is exactly zero. This makes sense when the two events are identical, but not when they merely happen to have the same probability. In other words, event A should add no information to event A. However, an independent event B of probability p *should* add information to event A, regardless of the fact that they happen to have the same probability.

Suppose event A is flipping a coin and getting heads, while event B is generating a random number on [0,1] and getting a value > 0.5. If I tell you that A has occurred, I am communicating exactly one bit of information to you. If I tell you again that A has occurred, I am not adding to that information; I have still communicated only one bit of information. But if I tell you that B has occurred, I am communicating a new bit of information to you. The total information associated with knowing that both A and B have occurred is two bits, and B has added one bit of information to the one bit conveyed by A.

So I agree with Kipli that the notion of “added information” cannot be coherent without also taking dependence/independence into account. Where we disagree is that I believe that to be consistent with the rest of information theory, the information should be considered to be additive when the events are independent.
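The coin-and-number example works out as follows. The sketch computes the self-information, in bits, of A alone and of A and B together, using independence to multiply the probabilities. The last line applies the ratio form log2(P(B)/P(A)) implied by Kipli's coin-and-die example (an inference, not a quotation from the paper), which gives zero for two events of equal probability.

```python
import math

def self_information(prob: float) -> float:
    # Surprisal in bits: log2(1 / prob).
    return math.log2(1.0 / prob)

p_a = 0.5  # A: a fair coin comes up heads
p_b = 0.5  # B: a uniform draw on [0, 1] exceeds 0.5

# A and B are independent, so P(A and B) = P(A) * P(B) and the bits add.
joint_bits = self_information(p_a * p_b)
print(self_information(p_a), joint_bits)  # 1.0 2.0

# Ratio form of "added information" (assumed): zero for equal probabilities.
print(math.log2(p_b / p_a))  # 0.0
```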

13.
Kipli says:

Something happened to my point #5. Here is the correct version:

5) After making all of the simplifying assumptions, the statement (and proof) of the theorem is trivial: Let S and S-bar be random variables on [0,1] with probability measures \mu and \mu-bar. Suppose \mu({1}) = p, \mu({0}) = 1-p, and the expected value of S-bar is p-bar = -log(p) = -log(\mu({1})).

14.
Kipli says:

Okay, what I sent in included ‘bad’ html characters that screwed up the message. Here’s another try. Sorry for the clogs.

5) After making all of the simplifying assumptions, the statement (and proof) of the theorem is trivial: Let S and S-bar be random variables on [0,1] with probability measures \mu and \mu-bar. Suppose \mu({1}) = p, \mu({0}) = 1-p, and the expected value of S-bar is p-bar which is at most p. Then the self-information of the event {1} with \mu is at most the self-information of {1} with \mu-bar.

Proof: Since S-bar takes values in [0,1], it is at least the indicator of the event {S-bar = 1}, so \mu-bar({1}) is at most the expected value p-bar, which is at most p. Hence \mu-bar({1}) is at most p, and -log(\mu-bar({1})) is at least -log(p) = -log(\mu({1})).

(end 5)
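A quick numeric check of the proof's key step, using a hypothetical discrete distribution for S-bar: because S-bar takes values in [0, 1], its mean is at least the mass it puts on the point 1, so a mean of at most p forces mu-bar({1}) to be at most p. The function name and the example distribution are illustrative only.

```python
import math

def check_key_step(values, probs, p):
    """values/probs: a hypothetical discrete distribution for S-bar on [0, 1].

    Checks that mu-bar({1}) <= E[S-bar], and that E[S-bar] <= p then gives
    -log(mu-bar({1})) >= -log(p). Returns (E[S-bar], mu-bar({1})).
    """
    if any(v < 0.0 or v > 1.0 for v in values) or not math.isclose(sum(probs), 1.0):
        raise ValueError("need a probability distribution on [0, 1]")
    mean = sum(v * q for v, q in zip(values, probs))
    mass_at_one = sum(q for v, q in zip(values, probs) if v == 1.0)
    # S-bar >= indicator{S-bar = 1} pointwise, so its mean bounds the mass at 1.
    assert mass_at_one <= mean + 1e-12
    # If mu-bar({1}) = 0 its self-information is infinite and the bound is trivial.
    if mean <= p and mass_at_one > 0.0:
        assert -math.log(mass_at_one) >= -math.log(p)
    return mean, mass_at_one

if __name__ == "__main__":
    print(check_key_step([0.0, 0.5, 1.0], [0.8, 0.1, 0.1], p=0.2))
```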

I can see hypermoderate’s point about information being additive when the events are independent. In fact, that is the case if we take the self-information of the intersection of two independent events: P(A and B) = P(A) P(B), so the self-information of the intersection is the sum of the self-information of the two events.

But what I thought of (rightly or wrongly) was the notion of mutual information:

Intuitively, mutual information measures the information about X that is shared by Y. If X and Y are independent, then X contains no information about Y and vice versa, so their mutual information is zero. If X and Y are identical then all information conveyed by X is shared with Y: knowing X reveals nothing new about Y and vice versa, therefore the mutual information is the same as the information conveyed by X (or Y) alone, namely the entropy of X.

That may not be the correct analogue for “added information”, but it is what I have in mind when I hear the phrase.
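The two limiting cases in the quoted passage can be computed directly. A minimal sketch for discrete variables, with the joint distribution passed as a dictionary (a representation chosen here for illustration):

```python
import math
from itertools import product

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), pr in joint.items():
        px[x] = px.get(x, 0.0) + pr
        py[y] = py.get(y, 0.0) + pr
    return sum(pr * math.log2(pr / (px[x] * py[y]))
               for (x, y), pr in joint.items() if pr > 0)

# Two independent fair coins: every pair has probability 1/4.
independent = {(x, y): 0.25 for x, y in product((0, 1), repeat=2)}
# Y is a copy of X: only (0, 0) and (1, 1) can occur.
identical = {(0, 0): 0.5, (1, 1): 0.5}

print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # 1.0, the entropy of one fair coin
```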

15.
Michaels7 says:

I have a simple question. Complexity is on the increase with regard to gene regulation. Noncoding regions now appear not only to have a direct impact on coding proteins but also to be directly involved as “bioactive molecules”.

How does this affect information conservation? More variables are introduced. Plus, we know that information is not fully identical but comes in subsets of sets, and these subsets may also contain small unique pieces of information. Not to mention that the interaction of multiple variables brings about new dependencies heretofore unrecognized and thus unconsidered.

Does this weaken or strengthen Dembski’s ID arguments with regard to search space? From a programming background, this looks like modularity and intelligent use of libraries. What I do not fully understand is how it impacts future analysis. It would seem to constrict or limit the search space with modular components if the design is top-down and structured (ID), but to increase the search space and complexity if it is bottom-up (evolutionary). Am I on the right track?

And if in fact a “library” is found within non-coded regions, well….

http://genetics.plosjournals.o.....en.0020063
There is a recognition of new levels of interaction…
“The novel finding of 23,000 noncoding TUs and their prominent biological role in the regulation of gene expression has dramatically changed the traditional view of proteins as the only bioactive molecules, and emphasizes the need to modernize the central dogma.”

This sentence might be quoted back to Ward re: who’s who in Dogma and what is changing in the field of genetics.

This paper seems to add weight to ID, not take away from it. Curious also about above questions regarding search space and Dembski’s paper.

I have to wonder whether, if Darwin had the first clue how complex were “blobs of protoplasm”, he would have tried to foist natural selection on the world? We can forgive Darwin for his ignorance. We can’t forgive his modern day followers. -ds

16.
great_ape says:

some thoughts on NS, and a question about search algorithms:

ds, I suspect Darwin would have foisted natural selection upon the world however deep his appreciation for blobplexity. In retrospect, natural selection (at its core) is inevitable given excess offspring, heritable variation, and differential survival based on that variation (a fitness landscape). You can argue (and certainly we do…) about how much blobplexity that simplistic process can generate, but is anyone questioning the operation of natural selection itself? It seems a straightforward and logically inevitable algorithm at work in nature. Darwin was the first to describe it cogently and recognize its potential importance. At least give the fellow credit for synthesizing a large amount of empirical data, recognizing the analogy with selective breeding in agriculture, etc., and jotting down a concept that should have been articulated ages earlier, but wasn’t. (I really think Aristotle dropped the ball on this one.)

Recall that, at the time Darwin was writing, the world was just coming to grips with the notion that minuscule forces operating over long stretches of time could yield vast geological structures. Is it any wonder Darwin would attempt to extend that notion to the generation of observed biodiversity? In my opinion it’s no surprise that modern scientists and other intellectuals are extremely reluctant to abandon such an elegant and inherently naturalistic idea. Is blobplexity the informational equivalent of the geological Grand Canyon? (roughly speaking) I don’t know. Does anyone?

I will try to read Dr. Dembski’s work once more, but so far I can’t get past the notion that the search problem is incongruent with the evolutionary situation. I apologize, as I’m sure this has been asked before, but I’m rather new: how is the evolution of biocomplexity analogous to a search algorithm, random or otherwise? Evolution, as I understand it, is completely nonteleological. It has no goal to achieve, no target to search for. As such, it would seem the question of how long it takes, or how much information is necessary, to find X by searching the landscape is beside the point. Isn’t it fundamentally more important to address the nature of the searched landscape itself? While the probability of a no-better-than-random search finding blobplexity-alpha-minor may be infinitesimally small, if the searched landscape itself is riddled with superficial variants of blobplexity-alpha-minor, then the specification of that particular space, and its associated probability, does not seem to be of great importance.
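The three ingredients listed above can be written as a short loop. This is a toy sketch, not a model of any real population: genomes are bit strings, variation is random bit-flipping, and survival is truncation selection on a stand-in fitness function (here simply the number of 1 bits, an arbitrary assumption). The loop never compares genomes with a target string, only with each other; whether the fitness function itself carries the information is precisely the question debated in this thread.

```python
import random

def evolve(fitness, genome_len=20, pop_size=10, offspring_per_parent=3,
           mutation_rate=0.05, generations=50, seed=0):
    """Toy (mu + lambda) mutation-selection loop; returns best fitness per generation.

    Ingredients: excess offspring, heritable variation (copying with random
    bit-flips), and differential survival ranked by `fitness`.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        # Excess offspring: each parent leaves several mutated copies.
        children = [[bit ^ (rng.random() < mutation_rate) for bit in parent]
                    for parent in pop for _ in range(offspring_per_parent)]
        # Differential survival: only the pop_size fittest of parents + children remain.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        history.append(fitness(pop[0]))
    return history

if __name__ == "__main__":
    history = evolve(fitness=sum)  # stand-in landscape: count of 1 bits
    print(history[0], history[-1])
```

Keeping parents in the survivor pool means the best fitness can never drop from one generation to the next; a scheme that replaced parents entirely would not have that property.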

I was talking about foisting natural selection as an explanation for all biological diversity not just the variations in scale and cosmetics within the same biological species. It’s a reasonable explanation until you find out that the source of potential beneficial mutations is highly constrained by sub-cellular molecular realities that Darwin was not aware of. -ds