Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, ancient English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I do not have a simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I can be quite sure that the functional complexity (the target space/search space ratio, in bits) is above 500 bits, as I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even if I am not a linguist and not even a mathematician, I could try to better define the target space quantitatively in this case, or at least find a reasonable higher threshold for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

For a random search where every character has the same probability, and assuming an alphabet of 30 characters (letters, space, elementary punctuation), the search space is 30^600, that is about 2^2944. IOWs, 2944 bits.
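
The figure can be checked in a few lines. This is a minimal sketch, assuming the uniform 30-symbol alphabet stated in the text; the variable names are illustrative only.

```python
import math

ALPHABET_SIZE = 30   # letters, space, elementary punctuation (assumption from the text)
TEXT_LENGTH = 600    # characters in the text

# log2(30^600) = 600 * log2(30): the size of the search space in bits
search_space_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)
print(f"search space: about {search_space_bits:.0f} bits")
```

Working with logarithms avoids ever building the number 30^600 itself.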

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200000. The result, if I am right, is: 2^1453. IOWs 1453 bits.
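
For readers who want to reproduce this step: combinations with repetition of k items from a pool of n are counted by the binomial coefficient C(n + k - 1, k). A minimal sketch, assuming the 200,000-word pool and the 120-word text from above; the constant names are mine.

```python
import math

VOCABULARY = 200_000  # assumed number of English words
WORDS = 120           # 600 characters / 5 characters per word

# Combinations with repetition: C(n + k - 1, k)
word_combinations = math.comb(VOCABULARY + WORDS - 1, WORDS)

# math.log2 accepts Python integers too large to fit in a float
combination_bits = math.log2(word_combinations)
print(f"word combinations: about 2^{combination_bits:.0f}")
```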

Now, obviously each of these combinations can be ordered in at most n! ways, therefore each of them has up to 120! different permutations, that is about 2^660. IOWs 660 bits.
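
The size of 120! in bits can be verified directly. The sketch below computes it from the exact integer factorial and cross-checks it with the log-gamma function, using ln(n!) = lgamma(n + 1); the variable names are illustrative.

```python
import math

WORDS = 120

# log2(120!) from the exact integer factorial
permutation_bits = math.log2(math.factorial(WORDS))

# Cross-check through the log-gamma function: ln(n!) = lgamma(n + 1)
lgamma_bits = math.lgamma(WORDS + 1) / math.log(2)

print(f"120! is about 2^{permutation_bits:.0f}")
```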

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.
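
As a cross-check, the number of ordered sequences of 120 words drawn with repetition from a pool of 200,000 is exactly 200000^120. The sketch below compares that direct count with the product used above. The product comes out marginally larger, because multiplying by 120! counts some orderings more than once when a combination contains repeated words; for a higher threshold that overcount is harmless. Names are illustrative.

```python
import math

VOCABULARY = 200_000
WORDS = 120

# Route taken in the text: combinations with repetition, times 120! orderings
product_bits = (math.log2(math.comb(VOCABULARY + WORDS - 1, WORDS))
                + math.log2(math.factorial(WORDS)))

# Direct count of ordered word sequences with repetition: 200000^120
direct_bits = WORDS * math.log2(VOCABULARY)

print(f"product: {product_bits:.2f} bits, direct count: {direct_bits:.2f} bits")
```

Both routes round to 2113 bits, so the approximation does not affect the result.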

It’s a big number.

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is a higher threshold for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous higher threshold.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here)
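
Putting the steps together, the whole computation fits in one small function. This is a sketch under the assumptions already stated (30-character alphabet, 200,000 words, 5 characters per word); the function name and parameters are hypothetical, chosen only for illustration.

```python
import math

def functional_information_bits(text_length, alphabet_size=30,
                                vocabulary=200_000, avg_word_length=5):
    """-log2(target space / search space), using the bounds from the text."""
    words = text_length // avg_word_length
    # Search space: every string of the given length over the alphabet
    search_bits = text_length * math.log2(alphabet_size)
    # Target space bound: word combinations with repetition times words! orderings
    target_bits = (math.log2(math.comb(vocabulary + words - 1, words))
                   + math.log2(math.factorial(words)))
    return search_bits - target_bits

sonnet_bits = functional_information_bits(600)
print(f"functional information: about {sonnet_bits:.0f} bits")
```

Subtracting the exponents is the same as taking -log2 of the ratio, and it avoids dividing two astronomically large numbers.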

So, if we consider as a measure of our functional space a number which is certainly an extremely overestimated higher threshold for the real value, still our dFSI is over 800 bits.

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that if I make the same computation for a 300-character string, the dFSI value is 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
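
The dependence on length can be made explicit. If the target space is approximated by the direct count 200000^(L/5) of ordered word sequences, the bound grows linearly with the length L, at log2(30) - log2(200000)/5 bits per character. This is a sketch under that approximation, not the exact computation above, so the 300-character value comes out at roughly 415 to 416 bits depending on where the rounding is done; the names are illustrative.

```python
import math

ALPHABET_SIZE = 30
VOCABULARY = 200_000
AVG_WORD_LENGTH = 5

# Bits of functional information gained per character under these assumptions
bits_per_char = math.log2(ALPHABET_SIZE) - math.log2(VOCABULARY) / AVG_WORD_LENGTH

def approx_dfsi_bits(text_length):
    """Linear approximation of the dFSI bound for a text of the given length."""
    return text_length * bits_per_char

for length in (300, 600):
    print(f"{length} characters: about {approx_dfsi_bits(length):.1f} bits")
```

Under these assumptions each additional character adds a little under 1.4 bits, which is why doubling the length doubles the value.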

Comments
fifthmonarchyman: As I demonstrated earlier there is infinite Kolmogorov Complexity in any Irreducibly Complex configuration
Kolmogorov Complexity can't be any greater than the length of the string. Shakespeare's sonnets are finite in length (14 iambic pentameter lines).
Zachriel
November 27, 2014, 07:11 AM PDT
Zac says,
You’re saying there is infinite information in a sonnet?
I say,
As I demonstrated earlier, there is infinite Kolmogorov Complexity in any Irreducibly Complex configuration. That implication follows necessarily from the fact that such things are not computable*
*Fine print: not able to be produced by a finite Turing Machine in a finite amount of time
Peace
fifthmonarchyman
November 27, 2014, 07:05 AM PDT
Unguided evolution cannot account for any flagellum. So that would still be a problem.
Joe
November 27, 2014, 06:19 AM PDT
fifthmonarchyman: That depends on the time inclination and resources of the observer.
You're saying there is infinite information in a sonnet?
Gary S. Gaulin: I’m recalling contradictions such as Charles Darwin’s predicting the opposite of “punctuated equilibrium” would be discovered in the fossil evidence.
Actually, Darwin predicted just the opposite: "the periods during which species have undergone modification, though long as measured in years, have probably been short in comparison with the periods during which they retain the same form." (Darwin, "Origin of Species")
Mung: Q1: What are the possible values returned by that function? Q2: Who or what chooses those values?
We answered this above. The function, RandomLetter, returns a random letter (or space). It’s part of Dawkins’ original Weasel algorithm, which was what you requested. Of course, you can change this parameter if you want. Weasel is just a simple instance of a larger class of evolutionary algorithms.
Zachriel
November 27, 2014, 06:18 AM PDT
fifthmonarchyman: Perhaps I need to remind everyone of the definition of IC that I’m using. In a system composed of connected “mechanisms” (nodes containing information and causally influencing other nodes), the information among them is said to be integrated if and to the extent that there is a greater amount of information in the repertoire of a whole system regarding its previous state than there is in the sum of all the mechanisms considered individually.
Gee whiz. When you redefine well-established terms, it just leads to confusion. You should either find the correct term, or coin a new one. What you are describing is called synergy or emergence. Irreducible means you can't remove any of its parts and still have the same thing. http://www.youtube.com/watch?v=Q_UsmvtyxEI
fifthmonarchyman: The greater information in the repertoire of a whole system is the fact that these and only these sonnets are Shakespearean.
Huh? They're Shakespearean by definition. You haven't actually made a distinction.
fifthmonarchyman: The greater information in the repertoire of a whole system is the fact that these and only these pieces constitute a working BF.
So? And? Turns out that there is more than one type of bacterial flagellum, and the parts vary also.
Zachriel
November 27, 2014, 06:09 AM PDT
Mung: Is RandomLetter a function? What are the possible values returned by that function? Who or what chooses those values?
Zachriel: Yes, it returns a random letter (or space). It’s part of the Dawkins’ original algorithm. Of course, you can change this if you want. It’s called an instance of a larger class.
RandomLetter is a function. It returns a random letter or a space.
Q1: What are the possible values returned by that function?
Q2: Who or what chooses those values?
Mung
November 26, 2014, 07:47 PM PDT
Me_Think, let's keep our eye on the prize here, shall we?
Remember that the key is where the meat/magic is. If an algorithm is supplied with the key, producing a string that will fool the observer is easy. With no key, the program will never fool the observer. That is my hypothesis.
The question of whether or not the observer can intuit the key given feedback, while interesting, is irrelevant to my argument, because algorithms can't intuit anything by definition.
peace
fifthmonarchyman
November 26, 2014, 07:45 PM PDT
Me_Think asks,
If your ‘game’ requires the designer to send feedback on whether something is right or wrong with every guess, how can you claim that you can intuit specification/key?
I say,
The designer does not tell me the key; he just tells me whether my idea of what the key is is correct. Imagine a small child asking his parent if a triangle is a circle, and his parent pointing to another shape and saying "no, this is a circle". At some point the child will intuit what it takes to be a circle. That is all the feedback the observer gets from the designer. It assumes that the key ("Ideal" circles) exists.
Now imagine a robot producing a random shape and a quality control agent throwing out everything that is not a circle. That is the feedback the programmer gets. It's exactly the same feedback the observer gets, except it does not assume the key. That is the only difference.
You say,
If those could be intuited, we wouldn’t need networked computers and cryptographers to break coded messages.
I say,
You are confusing intuition with mind reading. These are not remotely the same thing.
peace
fifthmonarchyman
November 26, 2014, 06:37 PM PDT
fifthmonarchyman @ 798
Now the problem is that my game right now is in the form of an Excel sheet, so I will need to send that to you in order for you to plug your strings in.
You already have the strings in this thread. Just copy and use in your Excel sheet. Get feedback from it. If your 'game' requires the designer to send feedback on whether something is right or wrong with every guess, how can you claim that you can intuit specification/key? If those could be intuited, we wouldn't need networked computers and cryptographers to break coded messages.
Me_Think
November 26, 2014, 06:16 PM PDT
Or in other words, evolutionary biology would gloss over another failed prediction, which is in this case more the fault of those who used the theory to predict something that the theory was not actually able to predict. The same is true of forming conclusions related to whether intelligence guided our genetic level development or it was unguided. Darwinian theory is simply not for explaining how intelligence (at any level) works. Future evolutionary biologists can easily enough say that I was right all along, and it's not the fault of the theory that some went overboard with it.
So no matter how well you show that an idea some have is false, Darwinian (evolutionary) theory would go on. Evolutionary Creationism also exists to even help cover the theory leading to a "God did it" answer, to cover all that only the ID model is for demonstrating.
Gary S. Gaulin
November 26, 2014, 05:39 PM PDT
fifthmonarchyman:
By Darwinian evolution I mean the idea that all of biology can be explained by RM/NS plus whatever. That Idea would be falsified
I'm recalling contradictions such as Charles Darwin's predicting the opposite of "punctuated equilibrium" would be discovered in the fossil evidence. It's easy enough to keep the theory going just by adding a new phrase to its vocabulary, which in time makes it seem like that's what the theory all along predicted.
Gary S. Gaulin
November 26, 2014, 04:39 PM PDT
GSG said,
Darwinian theory is too much of a generalization for it to be thrown out by just one more thing that it could not predict
I say,
By Darwinian evolution I mean the idea that all of biology can be explained by RM/NS plus whatever. That idea would be falsified.
peace
fifthmonarchyman
November 26, 2014, 03:58 PM PDT
Zac says,
How long is this representation? How many points of comparison are there?
I say,
That depends on the time inclination and resources of the observer. There is no upper limit.
peace
fifthmonarchyman
November 26, 2014, 03:54 PM PDT
fifthmonarchyman:
If IC really exists then Darwinism is mathematically false.
I do not agree. Darwinian theory is too much of a generalization for it to be thrown out by just one more thing that it could not predict, having to be added into it (at least another buzz-word) for the logical construct to remain scientifically coherent enough to keep the Darwinian empire going.
Gary S. Gaulin
November 26, 2014, 03:48 PM PDT
Perhaps I need to remind everyone of the definition of IC that I'm using.
quote:
In a system composed of connected "mechanisms" (nodes containing information and causally influencing other nodes), the information among them is said to be integrated if and to the extent that there is a greater amount of information in the repertoire of a whole system regarding its previous state than there is in the sum of all the mechanisms considered individually. In this way, integrated information does not increase by simply adding more mechanisms to a system if the mechanisms are independent of each other.
end quote
First imagine a set of all and only Shakespearean sonnets. The greater information in the repertoire of a whole system is the fact that these and only these sonnets are Shakespearean.
Next imagine a set containing all and only the individual components of a circle (the circumference, the diameter, etc.). The greater information in the repertoire of a whole system is the fact that these and only these parts constitute a circle.
Next imagine a set containing all and only the individual components of a bacterial flagellum. The greater information in the repertoire of a whole system is the fact that these and only these pieces constitute a working BF.
I hope you see the obvious equivalence?
peace
fifthmonarchyman
November 26, 2014, 03:38 PM PDT
Zac says,
There are an infinity of algorithms that can create any finite string. However, they may be more complex than the string itself.
I say,
That doesn't help you. Let's say that algorithm 1 "Shakespeare" is less complex than the string and algorithm 2 "Marlowe" is more complex than the string. You still have two algorithms that can produce the same string, directly contrary to the stated definition of IC.
You say,
There are very simple evolutionary pathways to irreducible structures. The simplest way is to knock out a scaffolding.
I say,
I know that is the talking point, but as you have just conclusively demonstrated, if by evolutionary we mean algorithmic then this is impossible!!!! On the other hand, if you already had the "key" for an IC structure and wanted to produce an object that approximated it, knocking out the scaffolding would be a good approach.
peace
fifthmonarchyman
November 26, 2014, 03:14 PM PDT
fifthmonarchyman: There is the proof
fifthmonarchyman: mathematically proven that Algorithms like RM/NS can not produce Irreducibly Complex configurations!!
Um, no. There are an infinity of algorithms that can create any finite string. However, they may be more complex than the string itself.
fifthmonarchyman: Now we are back to fooling the observer. You are claiming that you can fake IC with out actually producing it.
No, we're claiming that we can make signatures that are very similar to one another. Think about what is meant by a signature. It's a representation of sorts of the patterns in the original string. How long is this representation? How many points of comparison are there?
fifthmonarchyman: If IC really exists then Darwinism is mathematically false.
There are very simple evolutionary pathways to irreducible structures. The simplest way is to knock out a scaffolding.
Zachriel
November 26, 2014, 02:25 PM PDT
Zac says,
No, there are an infinite number of algorithms that can produce any given finite string.
I say,
There is the proof, folks!!!!! How flipping cool is that.
Zac says,
The algorithms could be made close enough that the signatures would be indistinguishable, even if the outputs were superficially different.
I say,
Now we are back to fooling the observer. You are claiming that you can fake IC without actually producing it. That is what the Game is all about. However, we have now established that if IC really exists then Darwinism is mathematically false.
back to studying
peace
fifthmonarchyman
November 26, 2014, 02:12 PM PDT
Zac, check this out: http://research.microsoft.com/pubs/70544/tr-2008-20.pdf
I'm just now starting to research this. FUN FUN. I don't know how it will turn out, stay tuned.
fifthmonarchyman
November 26, 2014, 02:06 PM PDT
Dionisio:
BTW, I’m a student, not a scientist. My scientific credibility is none, zero, nada, null. That’s why I ask simple questions in order to learn.
You are not asking learning questions like "Can an electronic sensor bit be connected to any memory address bit of an electronic RAM or do they have to be in some order?" You're just asking snotty questions that expect me to dedicate the next four or more years to tutoring you for free, so that you can teach me a punishing lesson about some imaginary error in my ways.
Dionisio:
But apparently some folks don’t like my questions. Are my simple questions really that inconvenient?
Yes, it is very inconvenient for me to have to pander to your bratty demands. But since that's what you asked for, I'll first ask the appropriate teacherly question normally used for getting to better know each other: What do you, Dionisio, want to be when you grow up?
Gary S. Gaulin
November 26, 2014, 02:03 PM PDT
fifthmonarchyman: When you do so you have created a new algorithm!!!!
That's correct. While the outputs may appear very different, they would have similar signatures.
fifthmonarchyman: Now you have a set containing all the sonnets produced by algorithm 1 and all the sonnets produced by algorithm 2
That's correct.
fifthmonarchyman: Marlowe can’t produce Shakespearean sonnets by definition. Only Shakespeare can produce Shakespearean sonnets!!!
The algorithms could be made close enough that the signatures would be indistinguishable, even if the outputs were superficially different.
fifthmonarchyman: Now the question I have is there an output that can only be produced by one algorithm?
No, there are an infinite number of algorithms that can produce any given finite string.
Zachriel
November 26, 2014, 01:56 PM PDT
Wow Zac,
This discussion has been worth the trouble for just your latest question. I think you have stumbled on a better way of expressing the point I've been trying to make. Thank you so much.
Zac says,
We could modify the algorithm ever so slightly, so that its output would closely resemble that of the other.
I say,
When you do so you have created a new algorithm!!!! Now you have a set containing all the sonnets produced by algorithm 1 and all the sonnets produced by algorithm 2. That directly violates my stated definition of IC!!!!!
Let's call algorithm 1 Shakespeare and algorithm 2 Marlowe. Marlowe can't produce Shakespearean sonnets by definition. Only Shakespeare can produce Shakespearean sonnets!!!
Now the question I have is: is there an output that can only be produced by one algorithm? If there is not, then Zac has just mathematically proven that algorithms like RM/NS can not produce Irreducibly Complex configurations!!
These are strange and exciting times indeed.
peace
fifthmonarchyman
November 26, 2014, 01:48 PM PDT
gpuccio @797
In my post #691 I have proposed a challenge, whose aim is to show that dFSCI is not an example of the TSS fallacy. IOWs, that invoking the TSS fallacy for the dFSCI procedure is a fallacy. The challenge was aimed at DNA_Jock, but it seems that he has not taken it seriously. OK.
As I explained to you at 720 and at 745, this restriction:
You are allowed to use the text, but not the specific bits of it, in your specification.
is incoherent, since the text comprises the specific characters. Either your challenge is trivial (and I have met it), or your analogy fails. My point has always been that your attempts to define a target for a protein exhibit the TSS fallacy. Your analogizing to text strings is hopelessly flawed.
DNA_Jock
November 26, 2014, 01:35 PM PDT
gpuccio: how can the environment give any hint about how to build a protein of thousands of amino acids that is able to use a proton gradient to synthesize a much more “feasible” energy tool like ATP?
The usual. Each step in the process provides an advantage to the organism. ATP synthase appears to be an association of two subunits, each of which is similar to other protein domains in the cell.
Zachriel
November 26, 2014, 01:29 PM PDT
Zachriel:
It is no accident that you try to move the discussion to metabolism and more general functional issues. You see, the real problem that you try to evade is: how can the environment give any hint about how to build a protein of thousands of amino acids that is able to use a proton gradient to synthesize a much more "feasible" energy tool like ATP?
gpuccio
November 26, 2014, 01:15 PM PDT
fifthmonarchyman: If such a set could possibly exist my hypotheses would be that such that no other algorithm could reproduce it sufficiently enough to fool an observer infallibly.
We could modify the algorithm ever so slightly, so that its output would closely resemble that of the other.
fifthmonarchyman: Distinctive patterns are precisely what distinguishes Shakespearean sonnets from other data sets.
So? What does it show?
Zachriel
November 26, 2014, 01:04 PM PDT
Hey gpuccio,
I think you should consider starting a new thread. By now I think you understand that your challenge is the mirror image of mine. As I said before, it's just these sorts of strange equivalences that tell me we are on to something here.
I think that this interesting topic needs to be seen by as many people as possible on both sides, and this thread is so long that I'm afraid some are missing it.
peace
fifthmonarchyman
November 26, 2014, 12:57 PM PDT
Zac says,
If we took the output of a computer algorithm that writes sonnets, then it would be irreducibly complex too because “it contains all the sonnets and nothing else”.
I say,
I suppose so. I had not thought of that. If such a set could possibly exist, my hypothesis would be that no other algorithm could reproduce it sufficiently well to fool an observer infallibly. Of course, the no cheating clause still applies.
That is something fun to think about. Is there any output that can only be produced by one algorithm? Is there a way to know this?
You say,
So, as we said, it’s just a measure of the personal distinctive patterns. That’s standard forensics in the art world. Not sure what that proves.
I say,
ID is simply forensics on a grander scale ;-) Distinctive patterns are precisely what distinguishes Shakespearean sonnets from other data sets. In other words, they are the Key/specification/platonic form. They tell you what is unique about an IC artifact. For example, distinctive patterns tell you that you are looking at a circle rather than an oval.
peace
fifthmonarchyman
November 26, 2014, 12:44 PM PDT
fifthmonarchyman: The set of Shakespearean sonnets is Irreducibly Complex by definition it contains all Shakespearean sonnets and nothing else
That's an odd definition of irreducibly complex. If we took the output of a computer algorithm that writes sonnets, then it would be irreducibly complex too because "it contains all the sonnets and nothing else".
fifthmonarchyman: exactly, now you are getting it.
So, as we said, it's just a measure of the personal distinctive patterns. That's standard forensics in the art world. Not sure what that proves.
Zachriel
November 26, 2014, 12:01 PM PDT
Zac says,
That doesn’t imply anything about irreducible complexity, just personality.
I say,
I don't establish that the set of Shakespearean sonnets is Irreducibly Complex by learning form. The set of Shakespearean sonnets is Irreducibly Complex by definition: it contains all Shakespearean sonnets and nothing else.
You say,
A sonnet by Marlowe will have a different pattern. A sonnet by someone modern will probably exhibit even more differences.
I say,
Exactly, now you are getting it.
peace
fifthmonarchyman
November 26, 2014, 10:23 AM PDT