Confusing Probability: The “Every-Sequence-Is-Equally-Improbable” Argument
|June 4, 2017||Posted by Eric Anderson under Complex Specified Information, Design inference, Information, Intelligent Design, Mathematics, Probability|
Note to Readers:
The past few days on this thread there has been tremendous activity and much discussion about the concept of probability. I had intended to post this OP months ago, but yesterday found it in my drafts folder, mostly, though not quite fully, complete. In the interest of highlighting a couple of the issues hinted at in the recent thread, I decided to dust off this post and publish it right away. It is not intended to be a response to everything in the other thread. Because I have dusted it off rather hastily (hopefully not too hastily), please let me know if you find any errors, in the math or otherwise, and I will be happy to correct them.
Confusing Probability: The “Every-Sequence-Is-Equally-Improbable” Argument
In order to help explain the concept of probability, mathematicians often talk about the flip of a “fair coin.” Intelligent design proponents, including William Dembski, have also used the coin flip example as a simplified way to help explain the concept of specified complexity.
For example, there are 2^500 equally likely outcomes of 500 flips of a fair coin, so the odds of any one particular sequence are approximately 1 in 3.3*10^150. Based on this simple example, I have heard some intelligent design proponents, perhaps a little too simplistically, ask: “What would we infer if we saw 500 heads flipped in a row?”
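The arithmetic is easy to check with exact integer arithmetic. A minimal sketch:

```python
# Number of equally likely sequences of 500 fair-coin flips.
outcomes = 2 ** 500

# 2**500 has 151 digits, i.e. it is on the order of 10**150,
# and its leading digits give "about 1 in 3.3 x 10^150".
digits = len(str(outcomes))
leading = str(outcomes)[:3]
print(digits, leading)
```

Because Python integers are arbitrary precision, this is the exact count, not a floating-point estimate.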
At this point in the conversation the opponent of intelligent design often counters with various distractions, but perhaps the favorite argument – certainly the one that at least at first blush appears to address the question with some level of rationality – is that every sequence is just as improbable as another. And therefore, comes the always implied (and occasionally stated) conclusion, there is nothing special about 500 heads in a row. Nothing to see here; move along, folks. This same argument at times rears its head when discussing other sequences, such as nucleotides in DNA or amino acid sequences in proteins.
For simplicity’s sake, I will discuss two examples to highlight the issue: the coin toss example and the example of generating a string of English characters.
At first blush, the “every-sequence-is-just-as-improbable-as-the-next” (“ESII” hereafter) argument appears to make some sense. After all, if we have a random character generator that generates a random lowercase letter from the 26 characters in the English alphabet, where each character is generated without reference to any prior characters, then in that sense, yes, any particular equal-length sequence is just as improbable as any other.
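That narrow claim can be verified exactly. A small sketch (the second string and the helper name are my own illustrations), showing that under independent uniform draws any two equal-length strings have identical probability:

```python
from fractions import Fraction

def prob_of_string(s, alphabet_size=26):
    # Independent, uniform draws: each character contributes a factor of 1/26.
    return Fraction(1, alphabet_size) ** len(s)

p_all_as = prob_of_string("a" * 50)
p_other  = prob_of_string("thequickbrownfoxjumpsoverthelazydogandthensomemore")

# Both are exactly 1 / 26**50 -- equal, as the ESII argument says.
print(p_all_as == p_other)
```

So far, so good: given a truly random generator, the probabilities really are equal. The flaws discussed below lie elsewhere.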
As a result, one might be tempted to conclude that there is nothing special about any particular string – all are equally likely. Thus, if we see a string of 500 heads in a row, or HTHTHT . . . repeating, or the first dozen prime numbers in binary, or the beginning of Hamlet, then, according to the ESII argument, there is nothing unusual about it. After all, any particular sequence is just as improbable as the next.
This is nonsense.
Everyone, including the person making the ESII argument, knows it is nonsense.
Bridge Random Generator for Sale
Imagine you are in the market for a new random character generator. I invite you to my computer lab and announce that I have developed a wonderful new random character generator that with perfect randomness selects one of 26 lowercase letters in the English alphabet and displays it, then moves on to the next position, with each character selection independent of the prior. If I then ran my generator and it spit out 500 a’s in a row, everyone in the world would immediately and clearly and unequivocally recognize that something funny was going on.
But if the ESII argument is valid, no such recognition is possible. After all, every sequence is just as improbable as the next, the argument goes.
Yet, contrary to that claim, we would know, with great certainty, that something was amiss. Any rational person would immediately realize that either (i) there was a mistake in the random character generator, perhaps a bad line of code, or (ii) I had produced the 500 a’s in a row purposely. In either case, you would certainly refuse to turn over your hard-earned cash and purchase my random character generator.
Why does the ESII argument so fully and abruptly contradict our intuition? Could our intuition about the random character generator be wrong? Is it likely that the 500 a’s in a row was indeed produced through a legitimate random draw? Where is the disconnect?
Sometimes intelligent design proponents, when faced with the ESII argument, are at a loss as to how to respond. They know – everyone knows – that there is something not quite right about the ESII argument, but they can’t quite put a finger on it. The ESII argument seems correct on its face, so why does it so strongly contradict our real-world experience about what we know to be the case?
My purpose today is to put a solid finger on the problems with the ESII argument.
In the paragraphs that follow, I will demonstrate that it is indeed our experience that is on solid ground, and that the ESII argument suffers from two significant, and fatal, logical flaws: (1) assuming the conclusion and (2) a category mistake.
Assuming the Conclusion
On this thread R0bb stated:
Randomly generate a string of 50 English characters. The following string is an improbable outcome (as is every other string of 50 English characters): aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
R0bb goes on to note that the probability of a particular string occurring is dependent on the process that produced it. I agree on that point.
Yet there is a serious problem with the “everything-is-just-as-improbable” line of argumentation when we are talking about ascertaining the origin of something.
When R0bb claims his string of a’s is just as improbable as any other string of equal length, that is only true by assuming the string was generated by a random generator, which, if we examine his example, is exactly what he did.
However, the way in which an artifact was generated when we are examining it to determine its origin is precisely the question at issue. Saying that every string of equal length is just as improbable as any other, in the context of design detection, is to assume as a premise the very conclusion we are trying to reach.
We cannot say, when we see a string of characters (or any other artifact) that exhibits a specification or particular pattern, “Well, every other outcome is just as improbable, so nothing special to see here.” The improbability, as R0bb pointed out, depends on the process that produced the string. And the process that produced it is precisely the question at issue.
When we come across a string like: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa or some physical equivalent, like a crystal structure or a repeating pulse from a pulsar, we most definitely do not conclude it was produced by some random process that just happened to produce all a’s this time around, because, hey, every sequence is just as improbable as the next.
Flow of Analysis for Design Detection
Let’s dig in just a bit deeper and examine the proper flow of analysis in the context of design detection – in other words in the context of determining the origin of, or the process that produced, a particular sequence.
The proper flow of analysis is not:
- Assume that two sequences, specified sequence A and unspecified sequence B, arose from a random generator.
- Calculate the odds of sequence A arising.
- Calculate the odds of sequence B arising.
- Compare the odds and observe that the odds are equal.
- Conclude that every sequence is “just as likely” and, therefore, there is nothing special about a sequence that constitutes a specification.
Rather, the proper flow of analysis is:
- Observe the existence of specified sequence A.
- Calculate the odds of sequence A arising, assuming a random generator.
- Observe that a different cause, namely an intelligent agent, has the ability to produce sequence A with a probability of 1.
- Compare the odds and observe that there is a massive difference between the odds of the two causal explanations.
- Conclude, based on our uniform and repeated experience and by way of inference to the best explanation, that the more likely source of the sequence was an intelligent agent.
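The second flow of analysis can be sketched as a simple likelihood comparison. This is only an illustration of the post's argument, with the agent hypothesis assigned probability 1 as stipulated in the list above; the function names are my own:

```python
from fractions import Fraction

def p_given_random(s, alphabet_size=26):
    # Probability of this exact string under independent uniform draws.
    return Fraction(1, alphabet_size) ** len(s)

def p_given_agent(s):
    # Stipulated above: an intelligent agent can produce the
    # specified sequence with probability 1.
    return Fraction(1)

sequence = "a" * 50
ratio = p_given_agent(sequence) / p_given_random(sequence)
# The agent hypothesis is favored by a factor of 26**50 (about 5.6 x 10**70).
print(ratio == 26 ** 50)
```

The point of the sketch is that the two hypotheses are compared against each other; neither is assumed at the outset.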
The problem with the first approach – the approach leading to the conclusion that every sequence is just as improbable as the next – is that it assumes the sequence under scrutiny was produced by a random generator. Yet the origin of the sequence is precisely the issue in question.
This is the first problem with the ESII claim. It commits a logical error in thinking that the flow of analysis is to assume a random generator and then compare sequences, when the question of whether a random generator produced the specified sequence in the first place is precisely the issue in question. As a result, the ESII argument against design detection fails on logical grounds because it assumes as a premise the very conclusion it is attempting to reach.
The Category Mistake
Now let us examine a more nuanced, but equally important and substantive, problem with the ESII argument. Consider the following two strings, the first a simple repetition and the second a representative patternless draw (the specific letters are illustrative):

ababababababababababababababab

kqzmwnrxtfvpbhgdyclsjoeuaiktnq

When we consider these two strings in the context of design detection, we immediately notice a pattern in the first string, in this case the short-period repeating pattern ‘ab’. That pattern is a specification. In contrast, the second string exhibits no clear pattern and would not be flagged as a specification.
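Whether a string carries a short-period repeating pattern of this kind is mechanically checkable. A minimal sketch (the helper name is my own):

```python
def shortest_period(s):
    """Length of the shortest block whose repetition reproduces s."""
    for k in range(1, len(s) + 1):
        # Period k means every character equals the one k positions earlier.
        if all(s[i] == s[i % k] for i in range(len(s))):
            return k
    return len(s)

print(shortest_period("ab" * 15))  # 2: the string is 'ab' repeated
print(shortest_period("a" * 30))   # 1: a single repeated letter
```

A string whose shortest period is far smaller than its length is exactly the kind of pattern the post treats as a specification; a patternless string's shortest period is the string's own length.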
At this point the ESII argument rears its head and asserts that both sequences are just as improbable. We have already dispensed with that argument by showing that it assumes as its premise the very conclusion it is trying to reach. Yet there is a second fundamental problem with the ESII argument.
Specifically, when we are looking at a new artifact to see whether it was designed, we need not be checking to see if it conforms to an exact, previously-designated, down-to-the-letter specification. Although it is possible that in some particular instance we might want to home in on a very specific pre-defined sequence for some purpose (such as when checking a password), in most cases we are interested in a general assessment as to whether the artifact exhibits a specification.
If I design a new product, if I write a new book, if I paint a new painting – in any of these cases, someone could come along afterwards and recognize clear indicia of design. And that is true even if they did not have in mind a precise, fully-detailed description of the specification up front. It is true even if they are making what we might call a “post specification.”
Indeed, if the outside observer did have such a fully-detailed specification up front, then it would have been they, not I, who designed the product, wrote the book, or painted the painting.
Yet the absence of a pre specification does not in the slightest impair their ability to correctly and accurately infer design. As with the product or the book or the painting, every time we recognize design after the fact, which we do regularly every day, we are drawing an inference based on a post specification.
The reason for this is that when we are looking at an artifact to determine whether it is designed, we are usually analyzing its general properties of specification and complexity rather than the very specific sequence in question. Stated another way, it is the fact of a particular type of pattern that gives away the design, not necessarily the specific pattern itself.
Back to our example. If instead of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, our random character generator produced ababababababababababababababab, we would still be confident that something was fishy with the random character generator. The same would be true with acacacacacacacacacacacacacacac and so on. We could also alter the pattern to make it somewhat longer, perhaps abcdabcdabcdabcdabcdabcdabcdabcd or even abcdefghabcdefghabcdefghabcdefgh and so on.
Indeed, there are many periodic repetitive patterns that would raise our suspicions just as much and would cause us to conclude that the sequence was not in fact produced by a legitimate random draw.
How many repetitive sequences would raise our suspicions – how many would we flag as a “specification”? Certainly dozens or even hundreds. Likely many thousands. A million? Yes, perhaps.
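That guess is easy to sanity-check. If we count every 100-letter string built by repeating a block of at most 8 letters (a generous stand-in for "repetitive specifications"; the period cutoff of 8 is my own assumption), we get roughly 2 x 10^11, which shows that even the 100-billion figure used later in this post is the right order of magnitude:

```python
# A string of period k is fully determined by its first k letters,
# so there are at most 26**k such strings for each period k.
# Summing over periods 1..8 gives a generous upper bound.
repetitive = sum(26 ** k for k in range(1, 9))
print(repetitive)  # 217,180,147,158 -- about 2.2 x 10**11
```

This over-counts slightly (a period-1 string also has period 2, 4, and so on), but for an upper bound that is the safe direction to err.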
“But, Anderson,” you complain, “doesn’t your admission that there are many such repetitive sequences mean that we have to increase the likelihood of a random process stumbling upon a sequence that might be considered a ‘specification’?” Absolutely. But let’s keep the numbers in context.
From Repetition to Communication
I’ll return to this in a moment, but first another example to drive the point home. In addition to many repetitive sequences, don’t we also have to consider non-repetitive, but meaningful sequences? Absolutely.
David Berlinski, in his classic essay, The Deniable Darwin, notes the situation thusly:
Linguists in the 1950’s, most notably Noam Chomsky and George Miller, asked dramatically how many grammatical English sentences could be constructed with 100 letters. Approximately 10 to the 25th power, they answered. This is a very large number. But a sentence is one thing; a sequence, another. A sentence obeys the laws of English grammar; a sequence is lawless and comprises any concatenation of those 100 letters. If there are roughly 10^25 sentences at hand, the number of sequences 100 letters in length is, by way of contrast, 26 to the 100th power. This is an inconceivably greater number. The space of possibilities has blown up, the explosive process being one of combinatorial inflation.
Berlinski’s point is well taken, but let’s push him even further. What about other languages? Might we see a coherent sentence show up in Spanish or French? If we optimistically include 1,000 languages in the mix, not just English, we start to move the needle slightly. But, remarkably, still not very much. Even generously ignoring the very real problem of additional characters in other languages, we end up with an estimate of something on the order of 10^28 language sequences – 10^28 patterns that we might reasonably consider as specifications.
In addition to Chomsky’s and Miller’s impressive estimate of coherent language sentences, let’s now go back to where we started and add in the repetitive patterns we mentioned above. A million? A billion? Let’s be generous and add 100 billion repetitive patterns that we think might be flagged as a specification. It hardly budges the calculation. It is a rounding error. We still have approximately 10^28 potential specifications.
10^28 is a most impressive number, to be sure.
But, as Berlinski notes, the odds against any specific 100-character string are 26^100 to 1, that is, about 3.14 x 10^141 to 1. Just to make the number simpler for discussion, let’s again be generous and round it down to 1 x 10^141. If we subtract out the broad range of potential specifications from this number, we are still left with an astronomically large number of sequences that would not be flagged as a specification. How large? Stated comparatively, given a 100-letter randomly-generated sequence, the odds of us getting a specification, not a particular pre-determined specification but any specification at all, are only about 1 in 10^113.
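The 1-in-10^113 figure can be reproduced directly with exact integers (the 10^28 specification count is the generous estimate built up above):

```python
total_sequences = 26 ** 100   # all 100-letter strings, about 3.14 x 10**141
specifications = 10 ** 28     # generous estimate of specified sequences

# Odds against hitting any specification: total / specifications.
odds_against = total_sequences // specifications
print(len(str(odds_against)))  # 114 digits, i.e. roughly 3 x 10**113 to 1
```

Note that the exact ratio is about 3 x 10^113, which the post rounds down to 10^113; the rounding only makes the argument more conservative.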
What Are the Odds?
What this means in practice is that even if we take an expansive view of what can constitute a “specification,” the odds of a random process ever stumbling upon any one of these 10^28 specifications are still only approximately 1 in 10^113. These are outrageously long odds, and they give us excellent confidence, based on what we know from our real-world experience, that if we see any of these specifications, not just a particular one but any one of them out of the entire group, it likely did not come from a random draw. And it doesn’t even make much difference if our estimate of the number of specifications is off by a couple of orders of magnitude. The difference between the number of specifications and non-specifications is so great that the change would still be a drop in the proverbial bucket.
Now 1 in 10^113 is a long shot to be sure; it is difficult if not impossible to grasp such a number. But intelligent design proponents are willing to go further and propose that a higher level of confidence should be required. Dembski, for example, proposed 1 in 10^150 as a universal probability bound. In the above example, Berlinski and I talked of a 100-character string. But if we increase it to 130 characters then we start bumping up against the universal probability bound. More characters would of course compound the odds.
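That crossover point can be computed exactly. Keeping the same generous 10^28 specification count (my assumption, carried over from the 100-character example), the smallest string length at which the odds against hitting any specification pass Dembski's 1-in-10^150 bound turns out to be in the mid-120s, consistent with the rough figure of 130 above:

```python
specifications = 10 ** 28  # generous estimate, as above
bound = 10 ** 150          # Dembski's universal probability bound

# Find the smallest n with 26**n / specifications >= 10**150,
# i.e. 26**n >= 10**178, using exact integer comparison.
n = 1
while 26 ** n < specifications * bound:
    n += 1
print(n)  # 126
```

Every character beyond that length multiplies the odds by another factor of 26, which is what the post means by saying more characters "compound the odds."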
Furthermore, when we have, as we do with living systems, multiple such sequences that are required for a molecular machine or a biological process or a biological system – arranged as they are in their own additional specified configuration that would compound the odds – then such calculations quickly go off the charts.
We can quibble about the exact calculations. We can add more languages and can dream up other repetitive patterns that might, perhaps, be flagged as specifications. We can tweak the length of the sequence and argue about minutiae. Yet the fundamental lesson remains: the class of nonsense sequences vastly outnumbers the class of meaningful and/or repetitive sequences.
To sum up, when we see sequences like the following (a simple repetition, the beginning of Hamlet’s soliloquy, and a patternless jumble, respectively):

ababababababababababababababab

tobeornottobethatisthequestion

kqzmwnrxtfvpbhgdyclsjoeuaiktnq

we need to understand that rather than comparing one improbable sequence with another equally improbable sequence, what we are really comparing is a recognizable pattern, in the form of either (1) a repetitive sequence or (2) a meaningful sequence, versus (3) what appears to be a nonsense, random draw.
Properly formulated, the probability of (1) or (2) versus (3) is definitely not equal.
Not even close.
Not even in the ballpark.
Thus, the failure to carefully identify what we are dealing with for purposes of design detection gives the ESII proponent the false impression that when choosing between a recognizable pattern and a random draw we are dealing with equivalent odds. We are not.
While examples of coin tosses and character strings may be oversimplifications in comparison to biological systems, such examples do give us an idea of the basic probabilistic hurdles faced by any random-based process.
The ESII argument, popular though it may be among some intelligent design opponents, is fatally flawed. First, because it assumes as a premise (random generation) the very conclusion it seeks to reach. Second, because it commits a category mistake: it compares individual sequences when the design inference compares categories, mistakenly putting a random sequence on the same probabilistic footing as a patterned/specified sequence rather than looking at the relative sizes of the two categories of sequences.
Opponents of intelligent design may be able to muster rational arguments that question the strength of the design inference, but the “every-sequence-is-equally-improbable” argument is not one of them.