A Tutorial on Specified Complexity

I’ve found that a lot of people who are interested in Intelligent Design are nonetheless unaware of the mathematics behind it. Therefore, I decided to do some videos teaching the basic ideas.

I would love to hear any feedback you have on the videos, or anything that you would like to see covered.

Originally, I was going to redo this video into a series of shorter videos, but time has prevented me. So, I’ll start with this one and we’ll see where we go from here.

I would also like to hear any criticisms of the math presented here. That is, not on the applications of it elsewhere (we can debate it in another thread), or on other people’s usage of the mathematics (we can also debate that elsewhere), but just on the basic ideas presented here.

16 Replies to “A Tutorial on Specified Complexity”

1. 1
anthropic says:

Excellent video! Thank you so much, really helped me understand specified complexity and the warrant to design on a mathematical level.

2. 2
rigby says:

Ditto. Excellent video! Totally worth watching all the way through to the end.

3. 3
daveS says:

Very nice video, johnnyb, I think it lays out the concepts quite clearly.

4. 4
daveS says:

johnnyb,

One very basic question, if you don’t mind, concerning the part around 4:00 where you discuss converting probabilities to bits.

Let’s say we are working with the experiment where you roll two fair dice and add the two numbers.

E is the event of getting a sum of 4, while F is the event of getting a sum of 7.

Then if you tell me E has occurred, you have given me -log_2(1/12) = 3.58 bits of information. If you tell me F has occurred, you have given me -log_2(1/6) = 2.58 bits of information.

Since E has lower probability than F, if you tell me E has happened, then you have given me more “specific” information than you would have by telling me F happened. By that I mean that in the case of E occurring, I can narrow down the exact outcome to a smaller (in probability) region of the sample space than with F. Is that right?
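As a quick check of the arithmetic in this example, here is a minimal Python sketch that enumerates the 36 equally likely rolls of two fair dice and converts the probability of each sum into bits. The function names are illustrative only and do not come from the video.

```python
from fractions import Fraction
from itertools import product
from math import log2

def event_probability(target_sum):
    """Exact probability that two fair six-sided dice add up to target_sum."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
    hits = sum(1 for a, b in outcomes if a + b == target_sum)
    return Fraction(hits, len(outcomes))

def surprisal_bits(p):
    """Information, in bits, from learning that an event of probability p occurred."""
    return -log2(p)

p_E = event_probability(4)  # 3 of the 36 rolls sum to 4
p_F = event_probability(7)  # 6 of the 36 rolls sum to 7
print(p_E, round(surprisal_bits(p_E), 2))  # 1/12 3.58
print(p_F, round(surprisal_bits(p_F), 2))  # 1/6 2.58
```

The rarer event (a sum of 4) leaves fewer candidate rolls, which is why it carries one more bit than the sum of 7.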

5. 5
johnnyb says:

daveS – that is correct.

6. 6
kairosfocus says:

JB,

Good attempt. Useful simplification.

There are a few quibbles, e.g. I will suggest that order of magnitude usually refers to a factor of ten so take off 4 bits not 1, early on. 1000 vs 996 not 999.

I found it interesting to see how you worked around the concept of probability vs information, and of course heard in the background all along the issue of a configurational space W vs the relevant zone T including case of interest E. (And yes, that is where statistical mechanics enters the picture.)

This video brings out the distinction between the more general concept specification and the narrower one, functional specificity that by addressing highly contingent components that must be organised in one of a relatively few ways from all clumped or scattered possibilities, will naturally be in narrow zones in the config space W. (Think of having the parts of an Abu 6500 C3 reel in a bait bucket. How many ways will there be no function as a reel vs how many ways will there be functionality as the famous reel? [See where the concept islands of function in a sea of non-functional possibilities comes from? BTW, this term traces to Dembski.])

Magic step 1, give a description of the reel in functional form, in some bit-based practical description language, e.g. AutoCAD. This reduces to a structured pattern of y/n questions, with their answers. (We can contrast, randomising that description by hitting it with white noise.)

Magic step 2, give the 10^57 atoms of our sol system, or the 10^80 of the observable cosmos, one bait bucket of 6500 parts each, then shake up for 10^17 s, where we can see this as 10^12 – 10^14 shakes per second as a maximum practical rate (rates of fast chem rxns, not Planck time events). This is the creation of an ensemble, a common analytical move in statistical mechanics.

In what fraction of such buckets will we expect to find a functional reel or the like?

Once the description length for such a reel is greater than 500 – 1,000 bits, the answer to all but utter certainty, will be NIL.
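To put rough numbers on that, the sketch below expresses the size of the ensemble in bits, using the figures quoted above (10^57 or 10^80 atoms, 10^17 s, and the upper rate of 10^14 trials per second). Treating every atom as running one independent trial stream at that rate is a simplifying assumption of this sketch, and the function name is illustrative.

```python
from math import log2

def search_resource_bits(atoms, seconds, trials_per_second):
    """log2 of the total number of trials, one trial stream per atom (assumed)."""
    return log2(atoms) + log2(seconds) + log2(trials_per_second)

sol_bits = search_resource_bits(10**57, 10**17, 10**14)     # about 292 bits
cosmos_bits = search_resource_bits(10**80, 10**17, 10**14)  # about 369 bits

# Shortfall, in bits, between the trials available and the size of the space.
print(round(500 - sol_bits, 1), round(1000 - cosmos_bits, 1))
```

On these assumptions the ensemble samples only about 1 part in 2^208 of a 500-bit space, and far less of a 1,000-bit space.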

That is because, the utterly overwhelming bulk of possible configs of 500 – 1,000 bits, from 000 . . . 0 to 111 . . . 1 inclusive, will be a near 50:50 H:T distribution, in no particular order or organisation. (That is, to describe the string [in hopes of compressing it into an algorithm that could get our result step by step], we basically have to list it, as you pointed out above.)
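The "near 50:50" claim can be checked exactly for a given string length. The sketch below counts the 1,000-bit strings whose number of heads lies within 500 ± 50; the ±50 window is an assumption chosen for illustration, not a figure from the discussion.

```python
from fractions import Fraction
from math import comb

def fraction_near_half(n_bits, tolerance):
    """Exact fraction of n-bit strings whose count of 1s is within n/2 +/- tolerance."""
    half = n_bits // 2
    count = sum(comb(n_bits, k) for k in range(half - tolerance, half + tolerance + 1))
    return Fraction(count, 2**n_bits)

share = fraction_near_half(1000, 50)
print(float(share))  # well above 0.99
```

Almost every possible string falls inside that window, so a blind sample will almost always land there too.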

Of course, magic step 3: we don’t need to use the bait buckets full of reels, we can instead have a numerically controlled machine that reads the description language and would build the reel from parts as instructed, then we can use instead trays of 500 – 1,000 coins each, and flip at the given rate, for the given time.

Magic step 4: equivalent to coins, we can have paramagnetic bodies with 500 – 1,000 atoms each, so we can indeed flip our coins at that rate more or less.

Then we see that we are saying in effect our machine tries the assembly 10^12 – 10^14 times per second, once it reads an updated string! (This brings out just how ridiculously generous the plausibility threshold is.)

So, once we see that probabilities on fair coin toss searches and information are more or less dual to one another — and there are [more complicated] adjustments for non-fair coin searches (cf. your 99% H coin case) — we can use the information content, description approach freely.

Further to this, we see that complex specified information [CSI] is a general approach, with functionally specific organisation and information [FSCO/I] a subset for particular cases of high interest.

Of these, the most interesting cases are those tied to digitally coded, functionally specific information, dFSCI. For, we find TEXT in DNA in the cell, forming machine code and associated with a molecular nanotech information processing system that makes proteins and especially enzymes. As, Crick recognised in his March 19, 1953 letter to his son.

Yet further, we can speak in terms of search challenge beyond a threshold in the context of a highly contingent functionally configured whole (like the 6500 fishing reel) on the gamut of the solar system or the observed cosmos. To do so, we exploit the point that FSCO/I naturally comes in narrow islands of function in large config spaces.

Basically, we can go like

Chi_500 = I_p * S – 500,

in functionally specific bits beyond the sol system threshold, or 1000 for the observed cosmos threshold.

Where, S is a dummy variable assumed 0 by default and going to 1 when there is warrant to infer functionally specific configuration; with I_p being the bit length in some relevant description language. (This is actually equivalent to the effect of the explanatory filter in per aspect form as I have pointed out over the years.)
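Written out as code, the expression is a one-liner. The sketch below is only an illustration of the formula as stated, with hypothetical parameter names; the judgment of whether S should be 1 is an input, not something the code decides.

```python
def chi(info_bits, functionally_specific, threshold_bits=500):
    """Chi = I_p * S - threshold, with S = 1 only when specificity is warranted."""
    s = 1 if functionally_specific else 0
    return info_bits * s - threshold_bits

print(chi(1200, True))         # 700: bits beyond the sol system threshold
print(chi(1200, False))        # -500: S = 0, so nothing counts toward the threshold
print(chi(1200, True, 1000))   # 200: bits beyond the observed cosmos threshold
```

A positive value means the item lies beyond the chosen threshold; zero or negative means it does not.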

In effect if something has in it functionally specific configurational information beyond a threshold, we may confidently infer design to all but utter certainty. Certainly, to moral certainty.

That is, we are here pointing to Dembski’s observation, that in biological contexts, specification is “cashed out” as function.

But the above is directly, readily empirically testable as a heuristic: find something of observed causal origin with 500+ bits, and note that it is definitively observed to have come about by blind chance and/or mechanical necessity, and the design inference frame of thought on CSI etc is finished.

Of course, the observational base, starting with the Internet, is in the trillions, and it reliably indicates that the inference is strong.

In short, the analysis gives a reason why we see the pattern that FSCO/I and especially dFSCI, are strong indicators of design as cause.

Add, Newton’s rule that we should only use observed adequate causes — vera causa principle — in causal explanations of traces from objects and events which we did not directly observe.

Instantly, we see that we are well warranted to infer that the DNA in the living cell, and its general organisation are designed. That is, produced by intelligently directed configuration.

Similarly, for major body plans (which require 10 – 100+ million bases of fresh DNA) all the way across the tree of life to our own.

As GP has been fond of pointing out, the presence of many very small protein fold domains deeply isolated in AA sequence space even as marking distinctions between closely related species, is itself already beyond the relevant threshold.

So, when folks want to hold up “evolution” as dismissive of the design inference, they need to be aware that they are implicitly begging big questions. Not least, by smuggling in a criterion of decision that breaches Newton’s vera causa rule and imposes instead a premise that directly implies that design is ruled out on matters of origins science, so called methodological naturalism.

I think it is fair comment to observe that vera causa is far more credible as a rule of inductive inference than is the imposed principle that locks its implications out on this case.

It is time for some serious rethinking.

KF

PS: That Cloud Flare test is getting aggressive again

7. 7
kairosfocus says:

DS & JB: Yes, the 7 is more surprising than the 4 and is more informational; where, no surprise, no information. Perhaps this from my always linked note will help those digging in deeper. KF

8. 8
kairosfocus says:

PS: Pardon, I have 7 and 4 reversed. The 4 is the more surprising cluster of outcomes than the 7. In stat mech terms the 7 has the higher statistical weight. And yes, all of these areas of thought converge.

9. 9
daveS says:

Thanks johnnyb and KF.

10. 10

Then if you tell me E has occurred, you have given me -log_2(1/12) = 3.58 bits of information. If you tell me F has occurred, you have given me -log_2(1/6) = 2.58 bits of information.

Just a minor point about “being given” different amounts of information depending on whether you are given an E or an F. Perhaps I am woefully misreading your question (completely likely), but it seems to me that you are being given 1 of 11 possible answers to a question, corresponding to the 11 possible sums of two dice. The additional information regarding probabilities is information you already had.

11. 11
daveS says:

UB,

… it seems to me that you are being given 1 of 11 possible answers to a question, corresponding to the 11 possible sums of two dice. The additional information regarding probabilities is information you already had.

Yes, I would have to agree with that. But then we can evaluate the information content of such a “message”, I take it.

So far, my understanding is that messages of the form “the sum of the dice is n” (or, more generally, some event E has occurred) convey to me some amount of information which we can quantify as in the OP.

For another example, if someone says “the sum of the dice was between 2 and 12 inclusive”, ultimately they have conveyed 0 bits of information to me (so they have in essence told me nothing).

12. 12
steveh says:

I found your video quite useful for the most part – very carefully explained in great detail until the last seconds where you brought up the relationship to Dembski’s work – and then things got very sketchy indeed.

At around 42:50 you explained that the Specified Complexity Formula is

C(T|H) – K(T|H) – 500

and then stated

The formula given by Dembski is the same, just using different notation:
-log_2 [2^500 phi_s(T).P(T|H)]

where phi_S(T) = 2^(K(T|H))

If I have understood what you said earlier in the video correctly, I think you are pulling a fast one here. K(T|H) is (paraphrased)

the length of the shortest program that will produce target T under hypothesis H.

and phi_S(T) is (from dembski) [https://billdembski.com/documents/2005.06.Specification.pdf]

the number of patterns for which S’s semiotic description of them is at least as simple as S’s semiotic description of T

This appears to me to be much more than a change in notation. You and Dembski appear to be discussing completely different concepts.

i.e., your phi_S(T) is 2^(length in bits of smallest program that will reproduce the pattern) and Dembski’s is a count of a number of patterns.

Maybe the two concepts are related in some mind-boggling way but I don’t think it’s valid to gloss it over as a notation change because the numbers don’t match in the simple 1000-Heads example. For you the K(T|H) is 64 bits [@13:45 in the video] and phi_S(T) would be the largest number that can be written with 64 bits (approx EIGHTEEN QUINTILLION, about 1.8 × 10^19); for Dembski phi_S(T) for 1000 coins is TWO (his example used 100 coins but the same logic also gives two for 1000). [page 17]
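To see how much the choice of phi_S(T) matters, the sketch below plugs both values into the formula quoted above for the 1000-heads case, working in the log domain so that nothing overflows. It assumes fair coins, so P(T|H) = 2^-1000; the function name is illustrative.

```python
from math import log2

def specified_complexity(p_log2, phi_log2, resources_log2=500):
    """-log2(2^resources * phi_S(T) * P(T|H)), computed from the three log2 terms."""
    return -(resources_log2 + phi_log2 + p_log2)

p_log2 = -1000  # 1000 fair coins all heads: P(T|H) = 2^-1000 (assumed fair)

program_length_version = specified_complexity(p_log2, phi_log2=64)      # phi = 2^64
pattern_count_version = specified_complexity(p_log2, phi_log2=log2(2))  # phi = 2

print(program_length_version, pattern_count_version)  # 436 499.0
```

The two readings differ by 63 bits in this example, yet both results stay well above zero.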

And then you make matters worse by casually stating that you can compute CSI for biological organisms, which as I understand it has never been done (regardless of anything KF says). Dembski described what factors the calculations should consider:

P(T|H) as the probability for the chance formation for the bacterial flagellum. T, here, is conceived not as a pattern but as the evolutionary event/pathway that brings about that pattern (i.e., the bacterial flagellar structure). Moreover, H, here, is the relevant chance hypothesis that takes into account Darwinian and other material mechanisms.

In the only attempts I have seen at calculating CSI, P(T|H) is essentially calculated as the probability of forming a protein / DNA strand entirely by chance by adding bases at random and then getting the required result at the first attempt (1 in 4^num-bases). That’s not modelling the Darwinian mechanism.

The only attempts at calculating phi_S(T) I have seen have been by choosing four concepts at random from a dictionary of 100000 concepts (different from both formulations discussed above), or by using a number of alternative patterns that could produce the same result (then why not just factor that into P(T|H)?), or, IIUC, by using a constant of ONE (indicating functional) or ZERO (non-functional).

13. 13
johnnyb says:

steveh –

Thanks for watching, and for watching all the way through! I’m going to start in the middle, because I think you misunderstood an important point. You said,

And then you make matters worse by casually stating that you can compute CSI for biological organisms, which as I understand it has never been done

I did not state this; in fact, I purposefully stayed away from this question precisely because I wanted to focus on the mathematics rather than the applications. If I accidentally slipped and put something like this in, please let me know the timecode so I can cut that part out.

Now, as to the difference between phi_s(t) and K(T), there are a few things to consider:

Note that earlier in the video, I noted that you can use any defensible method (i.e., conditions of attachment, etc.) for generating the size of the specification space.
Also note that these are all upper-bounding methods.
The shortest program that I can think of to generate a sequence of 1,000 characters in Ruby is 64 bits, so, therefore, saying the specification is 64 bits is probably a massively overstated upper bound.
I don’t remember Dembski’s own method for generating specificational complexity outside of Kolmogorov complexity, but both he and most others agree that KC is much more defensible for generating the properties of detachability, etc. Most of his current work uses KC precisely for this reason.

I don’t know if Dembski’s original computation of phi_S(t) as being 1 bit is correct or not, but, as I said, it isn’t unreasonable because (a) both values are upper bounds, and (b) the smallest program I could think of to generate the right number of characters is 64 bits. So, it actually seems like it is on the right scale.

Technically, KC involves a +C attached to it, but, in general, I think it works well, and in fact grossly overestimates phi_s(t) because of the redundancies in the language, in the alphabet, and the dead space in the syntax of the language.
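For a concrete feel of where a figure like 64 bits can come from, here is a minimal Python sketch. The 8-character program is hypothetical and not necessarily the one used in the video, and counting 8 bits per source character is an assumption about the encoding.

```python
from math import log2

# Hypothetical 8-character program, not necessarily the one used in the video.
program = '"H"*1000'

# Assumption: each source character is stored as one 8-bit byte.
length_bits = len(program.encode("ascii")) * 8

# Evaluating the expression reproduces the 1,000-character target string.
output = eval(program)

# If only the 95 printable ASCII characters are allowed, each character
# carries at most log2(95) bits, so the 8-bit count overstates the length.
tighter_bits = len(program) * log2(95)

print(length_bits, len(output), round(tighter_bits, 1))
```

The gap between the two bit counts is one small example of the alphabet redundancy mentioned above; the true Kolmogorov complexity can only be bounded from above this way, never computed directly.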

So that deals with your questions regarding what was actually in the video. My next post will cover the rest, but I want to make sure we separate out what was in the video from what other topics we are talking about.

14. 14
johnnyb says:

First of all, I would agree with the statement that specified complexity as it is outlined here is hard to apply to biology, especially to do all the things people want it to do. I think that, for instance, finding the specified complexity of the flagellum is problematic for a number of reasons, including the ones you state. Now, technically, I think it is at least in theory possible to factor natural selection into P(T|H), but I don’t think anyone has done it explicitly (we will see some ways around this later).

However, if one is calculating the specified complexity of the first life, then one can indeed ignore natural selection for P(T|H). In fact, this actually has been done in the peer-reviewed literature, though by another name and with some other differences (all of which actually give less probability to the origin of life than would Dembski’s). It was allowed in at the time because the criticism was aimed specifically at the prebiotic soup theory, despite the fact that it applies to any given non-design abiogenesis theory.

Estimating P(T|H) for the origin of life is actually much simpler, as it can be done on a stochastic model for the smallest possible organism. While it is true that we don’t know for sure what the smallest possible organism is, I think that we can make reasonable estimates from known data, based on the minimal genome size of modern organisms, the requirements of self-replication, etc. If you have a defensible minimum size (I would probably be willing to go with any size you had in mind, provided you could reasonably justify it), then you can pretty much find an upper bound to the probability just using information theory (physics would actually cause the actual probability to be *much* lower, because the size of the molecule would increase the probability that it broke up in absence of an existing organism).

That leaves finding phi_s(t). Some think that self-replication is not detachable from biology (I think it is, but whatever). Nonetheless, it is not hard to describe self-replication using other terms, such as “a sequence of amino acids such that it is able to produce a new copy of the same set of amino acids for multiple generations with at least 90% accuracy and seek building blocks for the same”. Using the dictionary method, that’s 709 bits. So, if you think that a food-seeking self-replicator can arise in less than 1,209 bits, then it is possible that such a self-replicator might have arisen sometime in the history of the universe. Personally, I think that 500 bits is a bit much (I prefer Yockey’s calculation), but I won’t make a big deal about it here.
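A rough sketch of how a dictionary-style estimate can be computed is given below. It assumes the method charges log2(dictionary size) bits per word; the 100,000-entry dictionary and the whitespace tokenization are assumptions of this sketch, so it is not expected to reproduce the 709-bit figure, whose dictionary is not stated here.

```python
from math import log2

def dictionary_bits(description, dictionary_size):
    """Bits needed if every word is one independent choice from the dictionary."""
    n_words = len(description.split())  # assumed tokenization: split on whitespace
    return n_words * log2(dictionary_size)

description = (
    "a sequence of amino acids such that it is able to produce a new copy "
    "of the same set of amino acids for multiple generations with at least "
    "90% accuracy and seek building blocks for the same"
)

estimate = dictionary_bits(description, 100_000)  # assumed dictionary size
print(len(description.split()), round(estimate, 1))

# The threshold arithmetic quoted above: specification bits plus 500.
print(709 + 500)  # 1209
```

The estimate scales linearly with the word count and only logarithmically with the dictionary size, so the choice of wording matters more than the choice of dictionary.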

15. 15
johnnyb says:

What’s really interesting, though, is using Active Information and the No Free Lunch theorem. NFL gives an expected average value for search success rates. Therefore, having search success rates significantly above NFL is itself an indicator that something is skewed positively. Even if it turns out that natural selection itself is somehow able to do the things it is said to do (not likely, for reasons shown here), then that would indicate that there was design within either the universe or natural selection to facilitate this operation.
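As a minimal sketch of the bookkeeping involved: active information is usually written as the log ratio of the assisted search's success probability q to the baseline (blind search) probability p. The probabilities below are made-up illustrative values, not measurements of anything.

```python
from math import log2

def active_information(p_baseline, q_assisted):
    """Bits by which an assisted search outperforms the baseline: log2(q / p)."""
    return log2(q_assisted / p_baseline)

p = 2**-20  # illustrative baseline success probability (assumed)
q = 2**-5   # illustrative success probability of the assisted search (assumed)

print(active_information(p, q))  # 15.0
```

A value of zero means the search does no better than the baseline, and a negative value means it does worse.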

In fact, NFL and Active Information allow us to do really interesting things, like measure the amount of information that a cell applies to its own evolution. I’ll probably be presenting my paper on that in the AM-Nat Biology meeting in February.

Anyway, I don’t think I explained that last one well, but it’s midnight and I still have to write finals for my students. Hopefully another day. Thanks for taking the time to watch the video!

16. 16
kairosfocus says:

DS, actually, it is a lump of background information ( –> there is a world of possibilities otherwise, starting with loaded dice as a near alternative world, and going on from there, perhaps endlessly [a la “animal, vegetable or mineral”]!) that is important to understand that you have an alphabet of possible outcomes, with varying likelihoods, tracing to a particular dynamic process, a pair of fair dice and linked a priori or frequentist probabilities. This in effect defines a frame — a model world — and a description language. A set of possibilities and probabilities like this is informational. Then, in that context, 7 is less informative than 4 as there are more ways that the former can be had than the latter. Things get interesting if you then get a long string of dice and define each as a six-state code element [~2.585 bits/character . . . i.e. y/n choices], clustering the digits in clusters, to code more interesting results, e.g. 3-letter codons with 6 states would give us 216 possibilities, a whole lot more than 64. (BTW, I think someone out there is doing that, extending the 4-base genome world to 6-bases.) Now, we can code in dice, and the baseline probabilities of individual dice and triplets of dice allow us to identify information carrying capacity. Likely, we will throw some away by having don’t care [for now] states. So, we see now a world in which we can have codes based on dice, exploiting the high contingency of the 6-state elements in an organised way, as opposed to the more common random dice-toss approach. Talking strings of dice is less loaded in the e-YES vs EYE-s sense [ –> try the experiment in the joke that is likely still making the rounds, you will be amazed at how people struggle RW with this one . . . ], too, so maybe it will help throw some lateral illumination on the more polarised matters. KF
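The capacity figures in this comment are easy to verify. The sketch below treats each fair die as a six-state code element and compares three-element codons over six states with the familiar four-base case; the variable names are illustrative.

```python
from math import log2

bits_per_die = log2(6)   # carrying capacity of one six-state element
six_state_codons = 6**3  # three-element codons over six states
four_base_codons = 4**3  # three-element codons over four bases

print(round(bits_per_die, 3))              # 2.585
print(six_state_codons, four_base_codons)  # 216 64
print(round(log2(six_state_codons), 3), log2(four_base_codons))  # 7.755 6.0
```

These are maximum capacities; any "don't care" states in a real code would leave some of that capacity unused.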

PS: JB, the only thing worse than sitting exams is setting and marking them!