Uncommon Descent Serving The Intelligent Design Community

ScienceBlogs disses Dembski-Marks paper on Conservation of Information


ScienceBlogs has just posted what can only be called a rant (go here) against the paper by Robert Marks and me that was the subject of a post here at UD (for the paper, “Life’s Conservation Law,” go here; for the UD post, go here).

According to ScienceBlogs, the paper fails (or as they put it, “it’s stupid”) because

(1) As a search, evolution is a multidimensional search. Most of our intuitions about search landscapes are based on two or three dimensions. But evolution as a landscape has hundreds or thousands of dimensions; our intuitions don’t work.

(2) Evolution is a dynamic landscape – that is, a landscape that changes in response to the progress of the search. Pretty much every argument that Dembski makes can be thrown out on the basis of this one fact: all of his arguments are based on static landscapes. Once the landscape can change, every single one of his arguments becomes invalid – none of them work in dynamic landscapes.

(3) As a search, evolution doesn’t have to work on all possible landscapes. It doesn’t even need to work on most landscapes. It works on landscapes that have a particular kind of structure. It doesn’t matter whether evolution will work in every possible landscape — just like it doesn’t matter that fraction notation doesn’t work for every possible real number. What matters is whether it works in the particular kind of landscape in which our theory says it works. And on that question, the answer is quite clear: yes, it works.

Regarding (1), the work by Robert Marks and me typically focuses on compact metric spaces, which can include infinite-dimensional spaces; for the purposes of this paper, which simplifies some of our previous work, we went with finite spaces. But even these can approximate any dimensionality we like for empirical investigations.

Regarding (2), we explicitly point out that our approach is general enough to model time-dependent fitness functions (see section 8 — hey, why bother reading a paper if you know it’s wrong and can simply intuit the mistakes the authors must make). What ScienceBlogs appears not to appreciate or understand is that time-dependent fitness functions can be modeled by time-independent fitness functions (“static landscapes”) provided that one represents the search space with sufficiently many dimensions (by going to a Cartesian product — we point this out explicitly in our paper).

Regarding (3), our point is that precisely because evolution works with constrained landscapes, those constraints require prior information. Yes, the environment is pumping in information; so where did that information come from? ScienceBlogs resents the very question. But what’s the alternative? Simply to say, “Oh, it’s just there.” The Law of Conservation of Information, despite ScienceBlogs’ caricatures, provides cogent grounds for thinking that the information had to come from somewhere, i.e., from an information source.
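The Cartesian-product point in (2) is easy to make concrete. A minimal sketch (my own illustration; the space, time steps, and fitness values are hypothetical): a time-dependent fitness f(x, t) on a finite space X is reproduced exactly by one static fitness table F on the product space X × T.

```python
# Illustrative sketch (not from the paper): a dynamic fitness function
# f(x, t) on a finite space X recast as a static fitness F on X x T.

X = [0, 1, 2]   # hypothetical finite search space
T = [0, 1]      # two time steps

def f(x, t):
    """Hypothetical dynamic fitness: the peak moves as time advances."""
    return 1.0 if x == t else 0.0

# Static landscape on the product space: one fixed table, no time dependence.
F = {(x, t): f(x, t) for x in X for t in T}

# The static table reproduces the dynamic landscape exactly.
assert all(F[(x, t)] == f(x, t) for x in X for t in T)
```

The design choice is simply to fold the time coordinate into the search space, which is why a "dynamic landscape" objection does not automatically defeat an argument framed in terms of static landscapes.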

Wm. Dembski writes: An environment with Karl Marx, paper, and pen in it will output Das Kapital. Not necessarily. Yet this sort of thinking demonstrates one of my pet peeves with the ID movement's claims in this area - they must know the outcome (in this case, that Marx wrote Das Kapital) prior to being able to give their equations/claims/analogies/filters a chance at success. Sort of like how biblical creation scientists KNOW that the biblical version of history is 100% true, then seek facts and evidence to support their conclusion. derwood
Mark, you're correct -- both assumptions are needed. Marks and Dembski's one-sentence statement of the LCI doesn't explicitly state either of them, but elsewhere they state that the comparison is between the lower-order active information and the higher-order endogenous information, which entails the higher-order UPD assumption. Atom's condition, on the other hand, isn't stated anywhere, although all of their examples meet it. R0b
R0b It may well be that we agree but I am reluctant to overstate Atom's position. Atom's condition is that the average value of column one is 1/3. However, this does not necessarily mean that the probability of finding 1 in the lower order set from this set of searches is 1/3. For that to be true you need an additional assumption that each row in the set of searches is equally probable. This is the UPD assumption. It is the combination of this (unreasonable) assumption and Atom's condition that leads to LCI. Maybe that was what you were saying - I just wanted to be clear. Mark Frank
Mark, yeah, the thread is mostly dead. Where's Miracle Max when you need him? Yes, your function-theoretic higher-order space does not meet the conditions of the measure-theoretic CoI. A measure-theoretic higher-order space would look like this (the set should be infinite, but I'm setting the granularity to 1/3 to make it finite):

-1- -2- -3-
 1   0   0
 0   1   0
 0   0   1
2/3 1/3  0
2/3  0  1/3
1/3 2/3  0
 0  2/3 1/3
1/3  0  2/3
 0  1/3 2/3
1/3 1/3 1/3

But this set has something in common with your function-theoretic set, namely that the average of the distributions is:

1/3 1/3 1/3

In each of Marks and Dembski's three CoI theorems, the assumptions of the theorem entail a uniform average distribution on the lower-order space. This means that Atom's condition is met, and the LCI conclusion follows. R0b
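R0b's finite higher-order space can be enumerated mechanically. A short sketch (my own illustration, not from the thread): generate every probability distribution over three points whose entries are multiples of 1/3, and check that there are ten of them and that their coordinate-wise average is the uniform distribution.

```python
from fractions import Fraction
from itertools import product

g = Fraction(1, 3)  # granularity, as in R0b's comment

# All distributions over 3 points whose entries are multiples of 1/3.
dists = [d for d in product([i * g for i in range(4)], repeat=3)
         if sum(d) == 1]

assert len(dists) == 10  # matches R0b's ten rows

# Average the distributions coordinate-wise: by symmetry it is uniform.
avg = [sum(d[i] for d in dists) / len(dists) for i in range(3)]
assert avg == [Fraction(1, 3)] * 3
```

Exact rational arithmetic via `Fraction` avoids any floating-point fuzz in the uniformity check.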
R0b I think this thread is pretty much dead, but for completeness. "With Atom’s condition, the LCI is easily proven." I don't think that is true unless you also assume whichever UPD fits your needs (see #72 above). Mark Frank
Mark Frank:
where did Atom’s condition: sum(O2)/|O2| = p come from?
Atom's position is that this condition is implied in Marks and Dembski's work. Indeed, in each of their examples, they define the higher-order space with a symmetry that evenly distributes probabilities over the lower-order space, which satisfies Atom's condition. This symmetry is how Marks and Dembski neutralize any deviation from uniformity. Here's the game:

1) They posit a completely unbiased search space.
2) You counter with a fitness function (or search space translation, or probability distribution, etc.) that biases some points over others.
3) They counter with a uniform space of fitness functions (or of search space translations, or of probability distributions, etc.) that again renders the original search space unbiased.
4) etc.

Without Atom's condition, the LCI is easily falsified. With Atom's condition, the LCI is easily proven. Interestingly, the paper says that the LCI is neither falsified nor provable. R0b
Mark Frank:
D&M’s measure-theoretic version assumes a UPD itself. It assumes that all pdfs across the search space are equally probable. So it can’t be used to prove that a UPD is justified.
Yes, as they regress probabilities up the hierarchy, they keep moving their assumption of uniformity to a higher level. Ultimately they justify that assumption by the principle of insufficient reason. As you point out, Haggstrom and others have explained why this justification doesn't work. As you also point out, Marks and Dembski's "information cost" is arbitrary, as it depends on how we define the higher-order space. Without Atom's condition, the information cost can range from 0 to infinity. With Atom's condition, the lower bound is at least log(q/p), but the upper bound is still infinity. R0b
Mark Frank:
The LCI is that: -log(probability(Q))>=I+
Marks and Dembski's stated formulation of the LCI is vague on the condition of that probability; that is, probability(Q) given what? But their examples make it clear that they're talking about a null higher-order search. Notice that each of their three CoI theorems ends with the statement, "Equivalently, the (higher-order) endogenous information ... is bounded by the (lower-order) active information..." R0b
Folks: Pardon a quick note: H = -SUM(pi log pi) does not at all assume a uniform probability distribution. (We do use info theory with, say, English text, which has a significant degree of redundancy, i.e. non-uniformity of probability. Also cf. Bradley's working out of ICSI for 110-aa Cytochrome-C here, which treats of the non-uniformity per Yockey et al.) [I would be most interested to find out that the laws of physics and chemistry had in effect written into them the DNA code, processing algorithms and associated molecular nanomachinery; onward the integration of proteins to form the complex, interwoven systems of life in the cell! If that is the effective objection to inference to design on seeing FSCI in DNA and its cognates, that looks a lot like jumping from the frying pan into the fire.] Also, that much-derided uniform probability distribution is saying that this is the maximum-uncertainty case, where the symbols i are least constrained. (It is a generally accepted principle of probability that, absent reason to constrain otherwise, we default to equiprobable individual outcomes. Bernoulli and Laplace among others, if I recall. A classic and effective approach to statistical mechanics is based on just that.) We can then make shifts to account for non-uniformity; and H, the average information per symbol, is an application of that. GEM of TKI PS: Atom et al -- good stuff. kairosfocus
I see that Wordpress has turned my greek omegas to ?. I hope it still makes sense. Mark Frank
Atom More on uniform probability distributions (UPDs). D&M’s measure-theoretic version assumes a UPD itself. It assumes that all pdfs across the search space are equally probable. So it can’t be used to prove that a UPD is justified. See Häggström 2007 (pp 6-7) for some of the problems with UPDs. One of them is that UPDs are not closed under non-linear transformations. In most real situations there is more than one UPD to choose from. Häggström uses the example of the size of a square. Do we say all lengths of the side are equally likely, or all areas are equally likely? We can’t have both. Something similar applies to choosing algorithms. For example, M&D give three “definitions” of an algorithm. All three assume UPDs. However, in at least some cases, the UPD assumptions of the definitions are incompatible. I can illustrate with a simple example. Suppose the space we are searching (Ω) is the digits 1, 2 and 3, and the target (T) is the digit 1, so p = 1/3. Using the function-theoretic approach, let the other space (Ω′) be the two letters a and b. Then here is the set of all possible functions from Ω′ to Ω and the associated value of q:

a b  q
1 1  1
1 2  0.5
1 3  0.5
2 1  0.5
2 2  0
2 3  0
3 1  0.5
3 2  0
3 3  0

We could assume that each of these is equally likely. But each function is also associated with a probability distribution function on Ω. Thus:

a b   -1- -2- -3-
1 1:  1.0 0.0 0.0
1 2:  0.5 0.5 0.0
1 3:  0.5 0.0 0.5
2 1:  0.5 0.5 0.0
2 2:  0.0 1.0 0.0
2 3:  0.0 0.5 0.5
3 1:  0.5 0.0 0.5
3 2:  0.0 0.5 0.5
3 3:  0.0 0.0 1.0

And you will see that there are only six unique pdfs (e.g. 1 2 and 2 1 give the same pdf). But in the measure-theoretic version M&D assume that all pdfs are equally probable. In which case the functions 1 2 and 2 1 should count as one algorithm. Which UPD is it? Mark Frank
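Mark Frank's function-theoretic example can be checked with a short script (my own sketch, assuming the uniform draw over the two-element domain implied in the comment): enumerate the nine functions, compute q for each, and confirm that only six distinct induced pdfs arise.

```python
from fractions import Fraction
from itertools import product

Omega = [1, 2, 3]   # lower-order space
target = 1          # target T, so p = 1/3

# All functions from {a, b} into Omega, written as the pair (f(a), f(b)).
functions = list(product(Omega, repeat=2))
assert len(functions) == 9

# q for each function: chance of hitting the target when the domain
# point is drawn uniformly from {a, b}.
q = {f: Fraction(sum(1 for y in f if y == target), 2) for f in functions}
assert q[(1, 1)] == 1 and q[(1, 2)] == Fraction(1, 2) and q[(2, 2)] == 0

# The pdf on Omega induced by each function.
def pdf(f):
    return tuple(Fraction(sum(1 for y in f if y == x), 2) for x in Omega)

# Only six distinct pdfs: e.g. (1,2) and (2,1) induce the same one.
assert pdf((1, 2)) == pdf((2, 1))
assert len({pdf(f) for f in functions}) == 6
```

This makes the incompatibility concrete: a uniform distribution over the nine functions is not the same as a uniform distribution over the six pdfs.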
Atom Loads of comments I could make - but to quickly address your last post. Taking logs makes little difference - except to confuse things slightly. The LCI is that: -log(probability(Q))>=I+ But probability(Q) only equals |Q|/|O2| if you assume that all functions in O2 are equally likely i.e. a uniform probability distribution. Mark Frank
Mark Frank, On further reflection I think you may not even need the uniformity assumption to get from step 5 (p/q >= |Q|/|O2|) to the LCI. I don't believe we have made use of the uniformity assumption in the initial steps (steps 1-5), or in our definitions. (Though I could be wrong...) All that we've said is that p is some probability and that q is a greater probability than p, so it has been improved over p. Furthermore, sum(O2)/|O2| = p, so the O2 set has on average the same performance as the original search p and can serve as an objective baseline. Those were the important definitions, and I don't think we'd have to change anything if p differed from a uniform search probability, since we left p as a variable. The above will work for any value of p. From there, we do the following to get the LCI:

6. Rearrange, by multiplication and division: |O2|/|Q| >= q/p
7. Take the log (base 2) of both sides: log(|O2|/|Q|) >= log(q/p)
8. log(q/p) is the active information (I+), by definition: log(|O2|/|Q|) >= I+
9. Break up the log, using the quotient rule: log(|O2|) - log(|Q|) >= I+
10. Rearrange the logs and factor out -1: -[log(|Q|) - log(|O2|)] >= I+
11. Combine, using the quotient rule: -log(|Q|/|O2|) >= I+

...which is the LCI. Atom
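Atom's steps 5 through 11 are pure log algebra and can be sanity-checked numerically; the values of p, q, |Q| and |O2| below are hypothetical, chosen only to satisfy step 5.

```python
from math import log2, isclose

# Hypothetical values satisfying step 5: p/q >= |Q|/|O2|.
p, q = 1/3, 2/3     # baseline and improved success probabilities
Q, O2 = 2, 9        # |Q| "good" functions out of |O2| total

assert p / q >= Q / O2              # step 5 (the premise)

I_plus = log2(q / p)                # active information I+, step 8

# Steps 6-11 collapse to: -log(|Q|/|O2|) >= I+
assert -log2(Q / O2) >= I_plus

# Steps 9-11 are just the log quotient rule rearranged.
assert isclose(-log2(Q / O2), log2(O2) - log2(Q))
```

Since logs are monotone, the inequality in step 5 and the one in step 11 are equivalent for any positive values, which is the point of Atom's remark that p was left as a variable.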
R0b, I've just gone through your proof step-by-step and you are in fact correct: it is a simpler method of proving the CoI. You didn't make any mistakes in your derivation (that I saw) and the final step is equivalent to Dembski's function-theoretic derivation. I wish I could take credit for being a genius, but you're the genius who built the proof. So let's just say we're both pretty smart guys. :) (Feel free to share any credit for that discovery, as your proof was elegant.) Mark Frank, The condition sum(O2)/|O2| = p is based on the definition of O2, which is the next largest set containing Q as a proper subset that has an average performance (on the lower-level search) equal to null, blind search. This is what I said was the logical definition of our higher-order search space and, as R0b and I have shown, is a sufficient condition for the LCI to hold. As for your second objection, you can assume that O2 has a non-uniform probability distribution on its elements that makes "good" functions more likely than bad, the same way that O2 induces a higher probability on "good" elements in the original search space, O. Since the probability distribution on O2 is only one of many possible, you now have to explain what the cost of choosing that probability distribution over the others was. So you have a search-for-a-search-for-a-search. Dembski has proven a measure-theoretic version for probability distributions and demonstrated that the LCI still holds. So your regress doesn't solve the problem, it only exacerbates it. Atom
Re #68 I apologise for being too lazy to trace back all the posts - where did Atom's condition: sum(O2)/|O2| = p come from? Also, even if: p/q >= |Q|/|O2| this is not the LCI unless you assume all members of O2 are equally probable. D&M do assume this when they write of their "epistemic rights" to assume a uniform probability distribution. But there are massive problems with this assumption and it is key to the whole paper. Mark Frank
Atom, you were right and I was wrong. You're a genius, man. (Not that it takes a genius to be right when I'm wrong.) Not only does the LCI follow from your condition, but you've also pointed the way to much easier proofs for the three CoI theorems in the paper. Here's a way that the LCI can be derived from your condition.

Definitions:
p, q: same as in the paper
O2: higher-order space
Q: set of "good" functions in O2
sum(X): sum of all probabilities in set X
|X|: cardinality of set X

Derivation:
1. Since the probabilities in Q are at least q: sum(Q) >= q*|Q|
2. Since sum(O2) >= sum(Q): sum(O2) >= q*|Q|
3. Divide both sides by |O2|: sum(O2)/|O2| >= q*|Q|/|O2|
4. Your condition is sum(O2)/|O2| = p. So: p >= q*|Q|/|O2|
5. So: p/q >= |Q|/|O2|

And that's the LCI. And since your condition obviously holds in the scenarios posited by the three CoI theorems, the above constitutes a simple proof for those theorems also. Unless I'm wrong again. Did I mess up somewhere? R0b
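R0b's derivation can be verified mechanically on a toy example. The higher-order space below is made up, chosen only so that Atom's condition sum(O2)/|O2| = p holds; each entry is the success probability of one lower-order search.

```python
from fractions import Fraction as F

p = F(1, 3)  # baseline (null search) success probability

# Hypothetical higher-order space: six searches whose success
# probabilities average exactly to p (Atom's condition).
O2 = [F(1), F(2, 3), F(1, 3), F(0), F(0), F(0)]
assert sum(O2) / len(O2) == p       # Atom's condition

q = F(2, 3)                         # threshold for a "good" search
Q = [x for x in O2 if x >= q]       # the good searches

# Steps 1-5 of R0b's derivation:
assert sum(Q) >= q * len(Q)             # step 1
assert sum(O2) >= sum(Q)                # step 2 (probabilities >= 0)
assert p >= q * F(len(Q), len(O2))      # steps 3-4
assert p / q >= F(len(Q), len(O2))      # step 5: the LCI
```

Note the one hidden premise the code makes visible: step 2 needs all entries of O2 to be non-negative, which holds automatically because they are probabilities.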
Joseph, This is just going around in a circle and getting quite tedious. Dembski is arguing that Darwinian evolution is teleological and a targeted search. Agree? If not, why not? Hoki
Hoki, See comment 12 Joseph
R0b, continued, 4) I posted a reply here. Although a fitness function method would not work well when we're using a different search strategy, a similar way of setting a baseline could be used in other cases as well. But since I can't enumerate all cases (there being an infinite number), I can explain the applicability on a case-by-case basis until it is clear to you that the problem you posed, while insightful and demonstrating a place where the paper could have been more explicit, does not represent an insurmountable obstacle. Atom
As I said there isn’t anything to search for. So Darwinian selection in a scenario without something to search for would be nature, operating freely. Darwinian selection with a target is not nature, operating freely.
From page 8 of the Dembski/Marks paper:
In other words, viability and functionality, by facilitating survival and reproduction, set the targets of evolutionary biology. Evolution, despite Dawkins’s denials, is therefore a targeted search after all.
Do you think that you and Dembski agree with each other? Hoki
Good morning R0b. 1) I already discussed this trivial set in a previous post and said we're looking for the next largest set from the "reduced" set. Since Dembski/Marks' paper begins with a set-up where someone shows an improved performance over blind search (such as by using a fitness function, f1), we begin with that set and add to the higher-level space until we reach a null performance baseline. Then we measure the fraction of "good" functions (with efficiency at least as good as the first proposed function, f1) relative to this total set. According to the paper, the informational cost of this reduction will be at least the active information. 2) You are completely correct, though I believe this is implied in the paper due to the way they set up the problem. It is a straightforward extension of their work and I agree they should probably state it explicitly. 3) If you can do what you propose - begin with a higher-level search space with performance averaged to blind search on the lower-level search, then reduce that set to a good fitness function that increases performance such that the active information gained is greater than the informational cost incurred by your reduction - I will concede. If my ideas were not what Dembski and Marks had in mind with their paper they may clarify and argue against your point, but I won't. So you will have proved your point to me, at the very least. Atom
No we can’t. You measure the fraction of “efficient” functions from the total number of elements in the next largest set inducing an average performance equal to blind search.
Atom, four answers to this: 1) I'm not sure what you mean by "next largest". Next to what? A lot of sets of functions can have an average performance the same as the null search, and some of the sets can be very small. Consider the set that consists of a single function in which every point has a fitness of 0. For some algorithms, this will result in a performance equal to the null search. 2) I don't see anything in the paper that states or implies the bolded part above. If your idea remedies this problem, then Marks and Dembski need to add it as a condition to the LCI. 3) Having said that, I don't think it remedies the problem. Consider a case in which q=2*p. To falsify the LCI, we need to show that more than 1/2 of the higher-order space consists of searches that succeed with a probability of at least q. We can define our higher-order space so that, say, two-thirds of it consists of these "good" searches, and the other third consists of searches that are bad enough to offset them, so the average of the whole set is the same as the null search. 4) Your condition doesn't seem to be generally applicable. See my comment here. R0b
Mr StephenA, Actually, Mr Chu-Carroll was being loose in his description of evolution as a search strategy. NFL would say that there are some spaces evolution does well in, some spaces where it equals a random walk, and some spaces where it does worse than a random walk, so that on average it equals the random walk in performance across all spaces. By accepting NFL, Dr Dembski and the rest of us have to accept that evolution works, full stop. Really, the only remaining issue is whether the universe we inhabit is a search space tuned to make evolution easy, or not. One approach to this question is to look at universes (fitness functions) where evolution fails to perform as well as a random walk. Dr David Goldberg at the University of Illinois studies deception in genetic algorithms. Imagine a fitness surface like a bowl, with one point sticking up from the lowest point to reach just a little bit above the rim. That is a deceptive fitness function. All the information points away from the optimum. By tuning the parameters of the fitness function, it is possible to force an evolutionary algorithm to perform worse than a random search. Is our universe deceptive? Or are its laws monotonic and regular over the scale of life in size, temperature and pressure? To the extent that the laws are regular, we should expect that we live in an evolution friendly universe. To the extent that the laws are deceptive, if we still saw evolution work, that would be evidence of some interference or assistance. Nakashima
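Nakashima's "bowl with a spike" deception can be sketched in a few lines (an illustrative toy of my own, not one of Dr. Goldberg's actual test functions): the local gradient points everywhere away from the isolated optimum, so a greedy climber is reliably led astray.

```python
# Toy deceptive landscape on the integers 0..100: fitness slopes
# gently toward x = 0, but the global optimum is an isolated spike
# at x = 100 that no local gradient points toward.
def fitness(x):
    return 101 if x == 100 else 100 - x

def hill_climb(x, steps=200):
    """Greedy +/-1 hill climber on the toy landscape."""
    for _ in range(steps):
        x = max((y for y in (x - 1, x, x + 1) if 0 <= y <= 100),
                key=fitness)
    return x

# Every gradient deceives the climber away from the spike...
assert hill_climb(50) == 0
# ...so it ends at a point strictly worse than the true optimum.
assert fitness(100) > fitness(hill_climb(50))
```

On such a landscape a blind random sample of the 101 points would locate the spike with probability 1/101 per draw, whereas the climber finds it with probability zero from any start other than 99 or 100, which is Nakashima's point about deception forcing evolutionary search below random-search performance.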
David Kellogg:
the original intelligent agent created all the information there is. We just shuffle it around.
We tap into it and use it. Geneticist Giuseppe Sermonti, in his book “Why is a Fly Not a Horse?” tells us in chapter VIII (“I Can Only Tell You What You Already Know”):
An experiment was conducted on birds: blackcaps, in this case. These are diurnal Silviidae that become nocturnal at migration time. When the moment for departure comes, they become agitated and must take off and fly in a south-south-westerly direction. In the experiment, individuals were raised in isolation from the time of hatching. In September or October the sky was revealed to them for the first time. Up there in splendid array were the stars of Cassiopeia, of Lyra (with Vega) and Cygnus (with Deneb). The blackcaps became agitated and, without hesitation, set off flying south-south-west. If the stars became hidden, the blackcaps calmed down and lost their impatience to fly off in the direction characteristic of their species. The experiment was repeated in the spring, with the new season’s stars, and the blackcaps left in the opposite direction: north-north-east! Were they then acquainted with the heavens when no one had taught them?
The experiment was repeated in a planetarium, under an artificial sky, with the same results! Joseph
So, nature, operating freely does not include Darwinian selection?
As I said there isn't anything to search for. So Darwinian selection in a scenario without something to search for would be nature, operating freely. Darwinian selection with a target is not nature, operating freely. Joseph
StephenA @54
This doesn’t work for a very simple reason: evolution doesn’t have to work in all possible landscapes. Dembski always sidesteps that issue.

So the search just happens to be one able to take advantage of the structure of the landscape? What an interesting coincidence.
Not really. We see the search mechanisms that work in this environment working. We don't see search mechanisms working that don't work in this environment. Hardly surprising. For my own edification, is Dr. Dembski now admitting that the mechanisms identified by modern evolutionary theory do, in fact, result in the evolution we observe? I get the impression that he has stepped back (or up) a level and is now taking the position that the environment itself is intelligently designed. Is my understanding accurate? If so, is ID now a type of fine tuning argument? JJ JayM
"After that, I was a bit too discouraged to keep reading." You should have kept up, because Chu-Carroll points out that the paper implies that human intelligences don't create information either: the original intelligent agent created all the information there is. We just shuffle it around. David Kellogg
P.S.--I would ask you how to make Karl Marx into a repeatable experiment, but it brings to mind The Boys from Brazil. T M English
Bill Dembski (30), Sorry to respond slowly -- it's the end of the semester.
I’m not sure why you equate environments with Nature writ large.
Every environment has an environment, except for the universal environment. What is a closed environment but a thermodynamically closed system -- a modeling fiction that is sometimes useful? When you say that information has entered a material system from a non-material source, a methodological naturalist must contend that your accounting is an artifact of your framing of the informed material system. We could play with all of our matryoshka dolls, but I suggest we go straight to the biggest. I'm really not picking on you here. I no longer believe that anyone can speak of the objective probability of the universe (perhaps multiverse) being what it is. Juergen Schmidhuber was a bit obnoxious in presenting his inductive bias as the Great Programmer religion -- the idea that the universe unfolds as a dovetailing program runs. But I think he correctly indicates that our explanations of nature begin with unprovable assumptions about the nature of nature.
An environment with Karl Marx, paper, and pen in it will output Das Kapital. Environments, it seems, can be quite cozy and the information they contain and the sources from which they obtain it may be studied and assigned probabilities.
Yes, Marx must have found the reading room of the British Museum cozy. He spent much of his time there, over a period of 12 years, surveying the economics literature. There are historians of ideas who explain Marx as a product of his times, much as they do Darwin. Shall we next set up the 1866 holdings of the British Museum's library as the target of a search? T M English
I thought I'd go back to the start of the dialogue, so I went to Mark C. Chu-Carroll's first article about Dembski.
Dembski has been trying to apply the NFL theorems to evolution: his basic argument is that evolution (as a search) can't possibly produce anything without being guided by a supernatural designer - because if there wasn't some sort of cheating going on in the evolutionary search, according to NFL, evolution shouldn't work any better than random walk - meaning that it's as likely for humans to evolve as it is for them to spring fully formed out of the ether.
This would have to be the most accurate description of ID by an anti-ID proponent that I have seen. (Which is kinda sad when you think about it.) The only thing that is wrong with it is that the word 'supernatural' should read 'intelligent'.
This doesn't work for a very simple reason: evolution doesn't have to work in all possible landscapes. Dembski always sidesteps that issue.
So the search just happens to be one able to take advantage of the structure of the landscape? What an interesting coincidence.
Let me pull out a metaphor to demonstrate the problem. You can view the generation of a notation for a real number as a search process. Suppose you're given π. You first see that it's close to 3. So the first guess is 3. Then you search further, and get closer - 3.14. That's not quite right. So you look some more, and get 3.141593. You'll get closer and closer to a notation that precisely represents π. Of course, for π, you'll never get to an optimum value in decimal notation; but your search will get progressively closer and closer. Unfortunately, most real numbers are indescribable. There is no notation that accurately represents them. The numbers that we can represent in any notation are a minuscule subset of the set of all real numbers. In fact, you can prove this using NFL.
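Chu-Carroll's notation-as-search metaphor is easy to make concrete; a small sketch of my own: successive decimal approximations of π get strictly closer at every step but never land on it.

```python
from math import pi

# Successive decimal "guesses" for pi: each step searches one digit
# deeper, exactly as in the metaphor (3, 3.1, 3.14, ...).
guesses = [round(pi, k) for k in range(7)]

# The error shrinks at every step of the search...
errors = [abs(g - pi) for g in guesses]
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))

# ...but no finite decimal guess ever equals pi itself.
assert all(g != pi for g in guesses)
```

The search converges without terminating, which is the sense in which the notation "works" for π even though no finite representation exists.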
Ok, I think I'm with you so far...
If you took Dembski's argument, and applied it to numbers, you'd be arguing that because most numbers can't be represented by any notation, that means that you can't write rational numbers without supernatural intervention.
Erg. I was going to let your earlier slipup pass, but you've gone and based your counterargument on it. If you replace this with an accurate description you get: "If you took Dembski's argument, and applied it to numbers, you'd be arguing that because most numbers can't be represented by any notation, that means that you can't write rational numbers without intelligent intervention." After that, I was a bit too discouraged to keep reading. StephenA
