
An attempt at computing dFSCI for English language


In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in english (OK, ancient english).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 characters sequences which make good sense in english is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate english sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I have not a simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the target space /search space ratio is above 500 bits. As I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even though I am not a linguist and not even a mathematician, I could try to define the target space more quantitatively in this case, or at least to find a reasonable upper bound for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

The search space for a random search in which every character is equally probable, assuming an alphabet of 30 characters (letters, space, elementary punctuation), is 30^600, which is about 2^2944. IOWs, 2944 bits.

OK.
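As a quick check of that arithmetic, here is a minimal Python sketch (the 30-character alphabet and the 600-character length are simply the assumptions stated above):

    import math

    ALPHABET_SIZE = 30  # letters, space, elementary punctuation (assumption above)
    TEXT_LENGTH = 600   # approximate length of a sonnet, in characters

    # Search space in bits: log2(30^600) = 600 * log2(30)
    search_space_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)
    print(round(search_space_bits))  # ~2944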

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200,000. The result, if I am right, is about 2^1453. IOWs, 1453 bits.

Now, each of these combinations can be ordered in many ways: with 120 words, each combination has up to 120! different permutations, which is about 2^660. IOWs, 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.

It’s a big number.
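For anyone who wants to check the figures, here is a minimal Python sketch of the same computation under the assumptions above (200,000 words, 120 words per text); it uses math.comb for the C(n + k - 1, k) count of combinations with repetition, and a factorial for the orderings of each combination:

    import math

    ENGLISH_WORDS = 200_000  # assumption (a) above
    WORDS_PER_TEXT = 120     # 600 characters / 5 characters per word

    # Combinations with repetition: C(n + k - 1, k)
    combinations = math.comb(ENGLISH_WORDS + WORDS_PER_TEXT - 1, WORDS_PER_TEXT)
    permutations = math.factorial(WORDS_PER_TEXT)

    print(round(math.log2(combinations)))                 # ~1453 bits
    print(round(math.log2(permutations)))                 # ~660 bits
    print(round(math.log2(combinations * permutations)))  # ~2113 bits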

Now, the important concept: that number certainly includes all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe we can say that 2^2113 is an upper bound for our target space of sequences of 600 characters which have good meaning in English. And, certainly, a very generous upper bound.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here.)

So, even if we take as a measure of our target space a number which is certainly a greatly overestimated upper bound on the real value, our dFSI is still over 800 bits.
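In code, that last step is just the difference of the two exponents (a tiny sketch reusing the rounded figures computed above):

    # dFSI in bits = -log2(target space / search space)
    search_space_bits = 2944  # from 30^600
    target_space_bits = 2113  # generous upper bound from the word-sequence count
    print(search_space_bits - target_space_bits)  # 831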

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 characters sequences which make good sense in english is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate english sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that if I make the same computation for a 300-character string, the dFSI value is about 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
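To make that length dependence explicit, the whole procedure can be wrapped in one parameterized function. This is only a sketch under the same assumptions (30-character alphabet, 200,000 words, 5 characters per word); it reproduces the 831 bits above for 600 characters, and for 300 characters it gives roughly 415 bits, i.e. the ~416 figure up to rounding:

    import math

    def dfsi_upper_bound_bits(text_length, alphabet=30, lexicon=200_000, avg_word_len=5):
        """Search-space bits minus a generous upper bound on target-space bits."""
        words = text_length // avg_word_len
        search_bits = text_length * math.log2(alphabet)
        target_bits = (math.log2(math.comb(lexicon + words - 1, words))
                       + math.log2(math.factorial(words)))
        return search_bits - target_bits

    print(round(dfsi_upper_bound_bits(600)))  # ~831
    print(round(dfsi_upper_bound_bits(300)))  # ~415, close to the 416 quoted above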

Comments
KeithS said, "Yes, but I'm saying something more as well: the fact that we can always write such a program places a firm upper bound on the Kolmogorov complexity of any finite string." I say, you can "always write such a program" to produce a non-computable string? What? Can you not see the blatant contradiction? Or are you claiming that every finite string is computable? If that is your claim then we are at a metaphysical impasse. The only way forward that I can see is to do science!! Prove that you can produce an algorithm that will fool an observer infallibly without cheating when it comes to IC configurations, and my claim that they are non-computable will be falsified. If you can't do that, my hypothesis stands. peace
fifthmonarchyman
November 28, 2014 at 03:50 AM PDT
gpuccio:
Sorry to intrude...
No need to apologize.
...but are you saying that we can output a string by a simple program if we already know it? :)
Yes, but I'm saying something more as well: the fact that we can always write such a program places a firm upper bound on the Kolmogorov complexity of any finite string. So FMM's claim is clearly wrong:
Kolmogorov_complexity of an object, such as a piece of text, is a measure of the computability resources needed to specify the object. since the computability resources needed to specify an IC object is infinite the Kolmogorov complexity of said object is infinite by definition.
The Kolmogorov complexity of a finite string is finite. Irreducible complexity has nothing to do with it.keith s
November 28, 2014 at 12:12 AM PDT
keith s and fifthmonarchyman: Just another thought. From my OP, and considering the English language as made of 500,000 words, we can see that there are only about 2^2271 sequences of 600 characters made of correct English words. This is a finite number, although a very big one: about 10^684. So, an algorithm which includes a list of those 500,000 words can, in a time which I will not try to compute, but which is finite anyway, output the whole list of all possible sequences of 600 characters made of English words. As said, the sequences which have good meaning in English will be among them. As said, all possible sonnets of that length, including my favorite from Shakespeare's, "Why is my verse so barren of new pride", would be there. So, has the algorithm computed Shakespeare's sonnet? The answer is: no. The algorithm has computed the whole list of sequences made of English words from a list of all English words. That is a perfectly possible computational task, a rather simple one too, even if a very long one indeed. What the algorithm can output is a very long list. But in no way can it output a list of all the sequences which have meaning, because it has no idea of what meaning is. Let's take an example. The first verse is: "Why is my verse so barren of new pride". Now, a possible similar sequence would be: "Why is one table more bestead than its fear", which has no detectable meaning, although it is syntactically correct. In particular, in the original verse, the use of the adjective "barren" for the verse is especially beautiful and evokes many complex and meaningful connotations, exactly because it is not an adjective that we would normally use in that context. We, as conscious observers, can easily understand that. Our conscious representation, evoked by the words, is immediately rich and deep. So, how could an algorithm understand that the original verse is meaningful and beautiful, while the second sequence is simply meaningless? How complex would an algorithm have to be to recognize all possible contexts of that kind?
gpuccio
November 28, 2014 at 12:04 AM PDT
keith s: Sorry to intrude, but are you saying that we can output a string by a simple program if we already know it? :) Yes, that's what you are saying. Maybe my post #863 about not using the specific bits of a sequence to specify it could have some relationship to this discussion? I am not an expert in Kolmogorov complexity, but is it possible that the real utility of it is to know if we can compute a string which we don't know in advance by an algorithm simpler than the string itself, and not if we can output a string which is already in the algorithm? The second fact (your example) seems really trivial, and it does not seem to have anything to do with computing the string. Now, even if the term "Kolmogorov complexity" is perhaps describing both cases, I think that we are dealing with two different concepts here. The interesting point is: how big must an algorithm be to compute a Shakespeare sonnet (or something equivalent) without previously knowing it? Maybe fifthmonarchyman's point is that such an algorithm would have infinite complexity. I would simply say that such an algorithm cannot exist, and that a conscious agent who understands meaning and has complex conscious representations is necessary to do that. Again, I apologize for the intrusion.gpuccio
November 27, 2014 at 11:25 PM PDT
fifthmonarchyman:
since a Shakespearean sonnet is by definition a sonnet composed by Shakespeare the “overhead required” must specify all that is Shakespeare. Clearly that is a lot of information Since Shakespeare is a non computable function the value of C in this case is infinite this is not hard
True, it's not hard, but you're having a lot of trouble with it. C is not infinite. It isn't even large. The Kolmogorov complexity of a given string depends on the size of the program needed to produce that same given string on a specified machine. It has nothing to do with programming a Shakespeare emulator. Any specified finite string can be produced by a program that looks something like the following.
string = "<insert specified string here>"; output(string);
The program is longer than the string, obviously, but not much longer. C is a small number.keith s
November 27, 2014 at 11:03 PM PDT
DNA_Jock: My specifications, three for the Shakespeare sonnet:
1) A sequence of 600 characters made of English words
2) A sequence of 600 characters which has good meaning in English
3) A sequence of 600 characters which has good meaning in any known language
and one for a generic enzyme:
4) Any protein which can accelerate reaction A at least x times (or more).
Please, tell me where in any of those definitions I am mentioning a specific sequence with specific bits, or am using, or showing that I am aware of, the specific bits (characters or AAs) of an observed sequence. Nowhere. Instead, let's look at your attempt: “After decryption with algorithm X, the string becomes a passage in “good English” that describes [insert arbitrarily narrow specification of the passage’s content here]” What is "algorithm X"? Did it exist before your observation of the sequence? Was it built using the specific bits of the sequence? If the answer is that it did not exist before the observation of the sequence and that it can work only on that particular random sequence, because it has been engineered for that sequence and for its specific bits after having observed them, then you are really committing a perfect TSS fallacy. Not I. If, on the other hand, you give a generic algorithm which can transform any random sequence into an English phrase, then your specification has no functional complexity, because any random sequence is in the target. Please, show your "algorithm X", and we will see. Have you met my challenge? Absolutely not! And you say: "If I am not allowed, then the analogy fails: the protein-specifier IS using the observed functionality to come up with the specification." You said it! The protein-specifier is using THE OBSERVED FUNCTIONALITY to come up with the specification. Not the AAs in the sequence. I have never said that you cannot use any functionality observed in my random string. You can. Absolutely. And again, mine is not an "analogy". It is an example of how the dFSCI procedure works and correctly detects design in language sequences. Do you admit that it works, or are you trying to meet my challenge to show how the application of the procedure is an example of the TSS fallacy?
gpuccio
November 27, 2014 at 10:40 PM PDT
me_Thinks asks I don’t know why you bring in Shakespeare here, I say, Shakespeare is important because we are using Shakespearean sonnets as a typical test case to illustrate what Irreducible complexity is and how the game works. If you need me to I can explain again how all this applies equally well to any IC configuration even those with no obvious designer like circles. let me know peacefifthmonarchyman
November 27, 2014 at 08:07 PM PDT
Zachriel:
Gary S. Gaulin: The contradiction is that new phyla and species suddenly emerged instead of the usual “undergone modification” that came afterward.
Not sure what you mean. Perhaps you could provide a specific example.
Rather than the discovery of (what later became known as) the Cambrian Explosion having been predicted by Charles Darwin, and being received as a relief that a major prediction of the theory was finally shown to be true, the discovery came as a surprise to scientists who were certainly not expecting an exponential increase like this: https://sites.google.com/site/intelligenceprograms/Home/JoeMeertTimeline.jpg Not having predicted a sudden proliferation of multicellular intelligence beforehand is one of the very serious weaknesses of Darwinian theory.
Gary S. Gaulin
November 27, 2014 at 07:12 PM PDT
Since Shakespeare is a non computable function the value of C in this case is infinite this is not hard
I don't know why you bring in Shakespeare here, unless you have to know who the designer was and what his capabilities were before you can decipher whatever string is given to you.
Me_Think
November 27, 2014 at 06:27 PM PDT
Zac says, "In this case, it's the length of the string plus whatever overhead the language requires to define a literal." I say, since a Shakespearean sonnet is by definition a sonnet composed by Shakespeare, the "overhead required" must specify all that is Shakespeare. Clearly that is a lot of information. Since Shakespeare is a non-computable function, the value of C in this case is infinite. This is not hard. peace
fifthmonarchyman
November 27, 2014 at 05:53 PM PDT
* There are probably even shorter descriptions, as the English language is generally compressible.Zachriel
November 27, 2014 at 05:37 PM PDT
fifthmonarchyman: Yes but is there any reason it can’t be more than one bit? There are an infinite number of possible algorithms of all possible lengths, however, Kolmogorov Complexity is defined as the shortest description. In this case, it's the length of the string plus whatever overhead the language requires to define a literal.Zachriel
November 27, 2014 at 05:09 PM PDT
Zac says, "As low as a single bit, the bit determining whether what follows is to be read as a literal." I say, yes, but is there any reason it can't be more than one bit? Is there an upper limit to its size? How would you determine what the upper limit was? Peace
fifthmonarchyman
November 27, 2014 at 05:02 PM PDT
Gary S. Gaulin: The contradiction is that new phyla and species suddenly emerged instead of the usual “undergone modification” that came afterward. Not sure what you mean. Perhaps you could provide a specific example.Zachriel
November 27, 2014 at 04:22 PM PDT
fifthmonarchman: Can you speed the process up a little bit by giving me the value of C? As low as a single bit, the bit determining whether what follows is to be read as a literal.Zachriel
November 27, 2014 at 04:22 PM PDT
Zachriel:
Gary S. Gaulin: I’m recalling contradictions such as Charles Darwin’s predicting the opposite of “punctuated eqilibrium” would be discovered in the fossil evidence. Actually, Darwin predicted just the opposite. the periods during which species have undergone modification, though long as measured in years, have probably been short in comparison with the periods during which they retain the same form. — Darwin, “Origin of Species”
I have to agree that the Cambrian Explosion can be said to be one of the "periods during which species have undergone modification". The contradiction is that new phyla and species suddenly emerged instead of the usual "undergone modification" that came afterward. There was no mention of that, due to Darwinian theory being unable to predict these events (like the theory that I defend easily did).Gary S. Gaulin
November 27, 2014 at 01:25 PM PDT
At first glance it appears that the value of C is not computable. So the most you can say is that an algorithm cannot determine whether the total Kolmogorov Complexity in an IC string is infinite. With that I completely agree; in fact, that is my point. I'll study some more. peace
fifthmonarchyman
November 27, 2014 at 01:08 PM PDT
Thanks Zac, I'll study this. Can you speed the process up a little bit by giving me the value of C? peacefifthmonarchyman
November 27, 2014 at 12:49 PM PDT
fifthmonarchyman: According to the standard formal definition it is infinite if the string is IC. K(X|l(x)) ≤ l(x) + C. http://www.cs.princeton.edu/courses/archive/fall11/cos597D/L10.pdf
Zachriel
November 27, 2014 at 12:04 PM PDT
#846 addendum FTR: Post #796 contains the post #s associated with the failed discussion.Dionisio
November 27, 2014 at 11:35 AM PDT
zac says, "How many more terms do you intend to redefine?" I say, I have not redefined any terms. I have used a rough-and-ready everyday dictionary definition of "computable" to save time and facilitate discussion on an informal internet blog. If I were to present my ideas in a formal paper I would be sure to specify at the outset that I'm not using the less restrictive mathematical definition of that term, just as I have done repeatedly during this very thread. As far as Kolmogorov Complexity goes, I'm using the standard formal definition. You say, "The Kolmogorov Complexity of a finite string is not infinite." I say, according to the standard formal definition it is infinite if the string is IC. This is just not a debatable point. I'm sorry that you have no room in your worldview for these very simple concepts, but that is your problem, not mine. In the meantime, let's do science: if you think the Kolmogorov Complexity of an IC object is finite, prove it; write an algorithm that will fool the observer. peace
fifthmonarchyman
November 27, 2014 at 10:50 AM PDT
fifthmonarchyman: since the computability resources needed to specify an IC object is infinite the Kolmogorov complexity of said object is infinite by definition. The Kolmogorov Complexity of a finite string is not infinite. How many more terms do you intend to redefine?Zachriel
November 27, 2014 at 10:20 AM PDT
Gary S. Gaulin @819 I don't have time to squander it on senseless discussions with folks who get upset when someone asks them simple questions. Why did you turn to name calling and personal attacks, as you did in this thread and in the 'third way' thread? Can't you just stick to the discussed subject? Is it because you don't like discussions outside your comfort zone? One who has strong arguments can be magnanimous to others. But if we lack strong arguments, we should humbly admit it. Or ask for additional clarification if the questions are not understood well. The only positive thing out of this could be that the discerning onlookers/lurkers can read what was written and arrive at their own conclusions. I wish the best to you. :)Dionisio
November 27, 2014 at 10:19 AM PDT
zac says, "In any case, that supports the fact that the Kolmogorov Complexity of a finite string can't be infinite as claimed." I say, you are missing the point. In an irreducibly complex string an algorithm will never be able to produce those final few bits. From here: http://en.wikipedia.org/wiki/Kolmogorov_complexity "Kolmogorov complexity of an object, such as a piece of text, is a measure of the computability resources needed to specify the object." Since the computability resources needed to specify an IC object is infinite, the Kolmogorov complexity of said object is infinite by definition. That is what we mean by irreducibly complex. me_thinks said, "Don't bestow incredible powers of detecting double meaning to Kolmogorov Complexity." I say, I am simply using the standard definition. Kolmogorov Complexity does not detect anything; it measures something: "the computability resources needed to specify an object". Peace
fifthmonarchyman
November 27, 2014 at 09:46 AM PDT
The claim wasn't wrt a string.
Joe
November 27, 2014 at 08:58 AM PDT
Wikipedia: It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. You do need to call the identity function or equivalent. (In a reductive language, that could be the string itself.) In any case, that supports the fact that the Kolmogorov Complexity of a finite string can't be infinite as claimed.Zachriel
November 27, 2014 at 08:53 AM PDT
Zachriel, caught again:
Kolmogorov Complexity can’t be any greater than the length of the string.
Wikipedia on Kolmogorov Complexity: It can be shown that the Kolmogorov complexity of any string cannot be more than a few bytes larger than the length of the string itself. Joe
November 27, 2014 at 08:34 AM PDT
fifthmonarchyman @ 839,
Zac: Kolmogorov Complexity can’t be any greater than the length of the string. Shakespeare’s sonnets are finite in length (14 iambic pentameter lines) 5th : You are assuming no hidden or double meaning in the string.
Don't bestow incredible powers of detecting double meaning to Kolmogorov Complexity.
Me_Think
November 27, 2014 at 08:24 AM PDT
fifthmonarchyman: You are assuming no hidden or double meaning in the string. Kolmogorov Complexity can’t be any greater than the length of the string. If you mean something else, then you have to define it explicitly. fifthmonarchyman: Than there is in the sum of the information of each of the characters in the string taken individually. There's that word "information" again. If you are making a qualitative claim, then sure, there is a lot to interpreting one of Shakespeare's sonnets. But if you are making a quantitative claim, then you have to be much more precise in your definitions. As for infinity, anything can have infinite (or at least vast) meaning when attached to the real world. If one says "warm", it may evoke any manner of feelings, experiences, or memories. There is a one-to-one mapping between the lotus blossom and the universe (assuming each are a continuum). So contemplate the lotus blossom.Zachriel
November 27, 2014 at 08:21 AM PDT
Zac says, "Kolmogorov Complexity can't be any greater than the length of the string. Shakespeare's sonnets are finite in length (14 iambic pentameter lines)." I say, you are assuming no hidden or double meaning in the string. There is more information in the phrase "Me thinks it's a Weasel" than there is in the sum of the information of each of the characters in the string taken individually. That is what we mean by Irreducible Complexity. peace
fifthmonarchyman
November 27, 2014 at 08:07 AM PDT