Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

Categories: Intelligent Design

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan's comments about that are out of order. I don't infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, Early Modern English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski's UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I have no simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the target space/search space ratio is above 500 bits. As I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That's why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even though I am not a linguist and not even a mathematician, I could try to quantify the target space more precisely in this case, or at least find a reasonable upper bound for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

The search space for a random search, where every character has the same probability and the alphabet has 30 characters (letters, space, elementary punctuation), is easily computed as 30^600, that is about 2^2944. IOWs, 2944 bits.
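The arithmetic can be checked with a few lines of Python (a sketch under the post's assumptions: a 30-character alphabet and a 600-character text; the variable names are mine):

```python
import math

ALPHABET_SIZE = 30   # letters, space, elementary punctuation (post's assumption)
TEXT_LENGTH = 600    # approximate length of a sonnet, in characters

# Size of the search space in bits: log2(30^600) = 600 * log2(30)
search_space_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)
print(round(search_space_bits))  # 2944
```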

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200,000, that is C(200,000 + 120 - 1, 120). The result, if I am right, is about 2^1453. IOWs, 1453 bits.

Now, each of these combinations can be ordered in many different ways: treating the 120 words as distinct, each combination has up to 120! permutations, that is about 2^660. IOWs, 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.

It’s a big number.
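For readers who want to check the counts, here is a minimal Python sketch of the two factors (the 200,000-word vocabulary and the 120 word slots are the post's assumptions; Python's exact big-integer arithmetic handles the binomial coefficient directly):

```python
import math

VOCABULARY = 200_000   # assumed number of English words
WORD_SLOTS = 120       # 600 characters / 5 characters per word

# Combinations with repetition of 120 words from the vocabulary:
# C(200,000 + 120 - 1, 120)
combo_bits = math.log2(math.comb(VOCABULARY + WORD_SLOTS - 1, WORD_SLOTS))

# Orderings of the 120 chosen words: 120!
perm_bits = math.log2(math.factorial(WORD_SLOTS))

print(round(combo_bits))              # 1453
print(round(perm_bits))               # 660
print(round(combo_bits + perm_bits))  # 2113
```

This reproduces the post's three exponents to the nearest bit.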

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is an upper bound for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous upper bound.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here.)

So, if we consider as a measure of our target space a number which is certainly an extremely overestimated upper bound for the real value, still our dFSI is over 800 bits.
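Putting the pieces together, the whole estimate fits in one small function (a sketch, not the author's code; `functional_info_bits` and its defaults are illustrative names for the post's assumptions):

```python
import math

def functional_info_bits(n_chars, alphabet=30, vocabulary=200_000, chars_per_word=5):
    """Upper-bound-based dFSI estimate, following the post's reasoning:
    search-space bits minus (overestimated) target-space bits."""
    slots = n_chars // chars_per_word
    search_bits = n_chars * math.log2(alphabet)
    target_bits = (math.log2(math.comb(vocabulary + slots - 1, slots))
                   + math.log2(math.factorial(slots)))
    return search_bits - target_bits

print(round(functional_info_bits(600)))  # 831
```

Because the target space is overestimated, the true functional information would be higher than this figure.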

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski's UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result: if I make the same computation for a 300-character string, the dFSI value is about 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
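The same sketch, run for a 300-character string (60 word slots), gives roughly 415.5 bits; the post's figure of 416 follows if the intermediate exponents are rounded before subtracting. Again, all the constants are the post's assumptions:

```python
import math

slots = 300 // 5  # 60 word slots (5 characters per word)
search_bits = 300 * math.log2(30)  # about 1472 bits
target_bits = (math.log2(math.comb(200_000 + slots - 1, slots))
               + math.log2(math.factorial(slots)))  # about 1057 bits
dfsi_300 = search_bits - target_bits
print(dfsi_300)  # roughly 415.5
```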

Comments
gpuccio at #124:
My argument about those two sequences is about their conservation in a complex molecule. You can scarcely deny that those specific sequences are necessary, with that high level of conservation, to the working of ATP synthase in its common form, and especially the form which utilizes H+ gradients. The Apicomplexa paper you link describes a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution. In no way it is in contradiction with the functional specification of the sequences I examined in the traditional ATP synthase complex.
Another beautiful example of the Texas Sharp-Shooter. You were quite satisfied with your specification of the "ATP synthase", a nice tight cluster of bullet holes in the wall. Then REC points out a separate cluster of bullet holes, the Alveolata ATP synthase. Immediately you re-define your "ATP synthase" as "ATP synthase in its common form" or "the traditional ATP synthase", and get out some fresh paint for the recently observed bullet holes, which represent "a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution."
DNA_Jock
November 11, 2014, 04:32 AM PDT
Me_Think said, "What do you mean by moving down the Y-Axes?" I say, check out comment 81. Peace
fifthmonarchyman
November 11, 2014, 04:29 AM PDT
hey Zac You said, "If different people give entirely different answers, then it's not objective." I say, In one sense I agree with you. By objective I mean that my standard is exactly the same for different objects. Your standard might be lower or higher on the Y axis than mine, but you should be consistent with yourself when it comes to the X axis. Hope that makes sense. Peace
fifthmonarchyman
November 11, 2014, 04:27 AM PDT
fifthmonarchyman @138,
I think a good next step would be to actually use your calculation on the Voynich manuscript. I agree that there is enough CSI there to infer design it would be cool however to objectively compare the actual amount in the object with that in the sonnet.
I think that is too ambitious and is not within the realms of dFSCI /CSI , because you don't have a standard dictionary database of the Voynich script, have no idea of alphabet probability and no way of checking the result. You need to be an Egyptian hieroglyphs expert to even start deciphering a single word.
To do that we would need to move further down the Y axes
What do you mean by moving down the Y-Axes?
Me_Think
November 11, 2014, 04:25 AM PDT
Guys: As a shameful form of self-promotion, I will try to draw again the attention to the OP and the computation in it. To do that, I will repost here what I said in post #51:
So, if the computation here is correct, a few interesting things ensue:
1) It is possible to compute the target space, and therefore dFSCI, for specific search spaces by some reasonable, indirect method. Of course, each space should be analyzed with appropriate methods.
2) Nobody seems to object that he knows some simple algorithm which can write a passage of 600 characters which has good meaning in English. Where are all those objections about how difficult it is to exclude necessity, and about how that generates circularity, and about how that is bound to generate many false positives? The balance at present:
a) Algorithms proposed to explain Shakespeare's sonnet (or any other passage of the same length in good English): none.
b) False positives proposed: none.
c) True positives found: a lot. For example, all the posts in the last thread that were longer than 600 characters (there were a few).
3) We have a clear example that functional complexity, at least in the language space, is bound to increase hugely with the increase in length of the string. This is IMO an important result, very intuitive, but now we have a mathematical verification. Moreover, while the above reasoning is about language, I believe that it is possible in principle to demonstrate it also for other functional spaces, like software and proteins.
I would like to spend a few more words on point 1. The essence of point 1 is that the computation of a target space can be done by indirect methods, but that we must eagerly look for the best method to do that in each case. To those who criticize the approach of Durston and my personal approach to the computation of the target space for functional proteins, I just say: OK, propose your approach. Maybe it will be better. But there is no reason to deny that an interesting problem exists, that we must look for the best solutions, and that the problem has important implications for the problem of the origin of biological information. CSI denialism has no real place in science.
gpuccio
November 11, 2014, 04:13 AM PDT
fifthmonarchyman: For someone familiar with this debate "me thinks it's a weasel" is loaded with meaning. For an average Joe it might take a whole sonnet to pass the threshold. On the other hand if I were looking at a string of text in Chinese it might take a string the length of a whole play to pass the test because I would be looking for mere arbitrary structure and grammar as opposed to English words. In other words, it's subjective. fifthmonarchyman: But even in that case I would be able to give the sequence a real objective value and compare it to strings that were the result of a combination of algorithmic and random processes. If different people give entirely different answers, then it's not objective. fifthmonarchyman: produce an algorithm capable of producing a 600-character English text independently without smuggling information through the back door. Evolutionary algorithms require an interface to an environment of some sort. Turns out that Shakespeare also incorporated information from his cultural environment. For instance, the William Shakespeare algorithm included an extensive dictionary, grammar rules, stock phrases, scansion, personality types, history, and so on. mullerpr: Natural processes flowing from the uniformity of classical mechanics … Is that really going to be presented as an analogue for Natural Selection? You didn't ask for an analogue for natural selection, but examples of natural sieves.
Zachriel
November 11, 2014, 04:07 AM PDT
I linked the wrong paper. Here is the one I meant: http://arxiv.org/pdf/1002.4592.pdf
fifthmonarchyman
November 11, 2014, 04:07 AM PDT
fifthmonarchyman: "To do that we would need to move further down the Y axes and look at the arbitrary structure and grammar instead of 'good English'. That might be a little more difficult but I believe it's still doable." Yes, I believe it's doable. It is not my personal priority, however. And thank you for the kind words.
gpuccio
November 11, 2014, 04:05 AM PDT
Dionisio said, "That sounds interesting." I say, Thank you. I think it's way cool too. Right now I come at my calculation in a different way than gpuccio: by graphically comparing an actual data string with a scrambled set of the same data. Then I try to quantify the differences between the two strings. You can find the paper that was my inspiration here: https://www.cs.duke.edu/~conitzer/turingtradeAAMAS09demo.pdf Peace
fifthmonarchyman
November 11, 2014, 04:02 AM PDT
#137 follow-up Discussions between people with irreconcilable worldview positions turn into senseless arguments that lead nowhere. However, apparently they provide some entertainment, like the gladiators and lions provided to the public in the Roman coliseum many years ago. That's why they have clowns in the circus. Perhaps that increases attendance, traffic and ad revenues. There's also a strong argument for allowing this for the sake of the onlookers/lurkers visiting this blog and also to sharpen the ID arguments. I don't quite agree with some of these arguments, but respect the opinions of others. :)
Dionisio
November 11, 2014, 04:01 AM PDT
Hats off to you gpuccio, this is a great thread. I think a good next step would be to actually use your calculation on the Voynich manuscript. I agree that there is enough CSI there to infer design; it would be cool however to objectively compare the actual amount in the object with that in the sonnet. To do that we would need to move further down the Y axes and look at the arbitrary structure and grammar instead of "good English". That might be a little more difficult but I believe it's still doable. Peace
fifthmonarchyman
November 11, 2014, 03:48 AM PDT
#112 Reality
D: “Why do you want to see a calculation?"
Because you IDists claim that you can calculate CSI-dFSCI-FSCO/I.
D: “Is that important to you? Why?”
To see if you can, and laugh at you when you can’t.
D: “If an example is given, would you ask for another?”
Yes.
D: “If ten examples are provided, would you demand eleven?”
Provide ten and then we’ll see.
Thank you for answering my questions. Now every reader of this blog can see that you have revealed, very clearly, your own motives for being here. Very probably your comrades and fellow travelers would have answered exactly as you did. Which is exactly what I (and probably others) suspected.
Dionisio
November 11, 2014, 03:47 AM PDT
Graham2: I suppose that there is enough CSI in the Voynich manuscript to easily infer design for it. Even the illustrations would be enough. Decrypting the meaning, if there is a meaning, is another matter altogether. Obviously, we cannot infer design from the meaning if we are not sure that there is a meaning. If our inference depended only on the possible meaning (which is not the case for that object), we would not infer design unless and until a meaning is found. In the worst case, that would simply be a false negative. As said many times.
gpuccio
November 11, 2014, 03:18 AM PDT
Has KairosFocus been banned from this thread?
sparc
November 11, 2014, 03:17 AM PDT
#133 mullerpr Thank you.
Dionisio
November 11, 2014, 02:56 AM PDT
Dionisio, the link was just to the Amazon page for Yockey's book, Information Theory, Evolution, and The Origin of Life: http://www.amazon.com/gp/aw/d/0521169585?pc_redir=1414569767&robot_redir=1
mullerpr
November 11, 2014, 02:12 AM PDT
Perhaps CSI could be applied to the Voynich manuscript to determine if it's designed or not. You would be doing the whole world a favour.
Graham2
November 11, 2014, 02:09 AM PDT
Me_Think: Thank you, you are making my argument. You cannot distinguish between designed things and non-designed things unless the object exhibits functional complexity. Why? Because natural mechanisms, through randomness or necessity, can generate configurations that are functional, but only with low functional complexity. That's why the computation of dFSCI is necessary to reliably infer design. Could you please explain that to keith?
gpuccio
November 11, 2014, 01:15 AM PDT
keith s: You are really trying your worst. The meaning is really obvious, and you are not stupid. What should I think? The meaning is: Procedure 2 is useless as a separate procedure, because it is the same as Procedure 1. The real useless thing here is your "argument".
gpuccio
November 11, 2014, 01:11 AM PDT
The Chinese letter B is written as 'tt'. If dFSCI is calculated for this letter, wouldn't it be less than 500 bits? So is it designed or not? A splatter left on a wall by a stone falling into a water puddle by gravity, and a splatter on a wall by a stone dropped by a person into a puddle, would (I guess) have pretty much the same dFSCI. How will you distinguish between the two? A man-made crop circle and a similar natural crop circle would present the same problem.
Me_Think
November 11, 2014, 01:05 AM PDT
Ok Keith S is confused, but we can't be certain, even he said so...
Andre
November 11, 2014, 12:00 AM PDT
gpuccio, In his #110, PaV says that procedure 2 is useless:
Gpuccio’s DFCSI isn’t useless, your Procedure 2 is useless.
You agree wholeheartedly:
PaV at #110: Absolutely correct! Thank you.
You then tell me that procedure 1 and procedure 2 are the same:
As explained, your procedure 2 is the same procedure, and implies the calculation.
You and PaV agree that procedure 2 is useless. You tell me that Procedure 1 is the same as Procedure 2. Therefore, Procedure 1 is useless, according to you. Oops.
keith s
November 10, 2014, 11:42 PM PDT
keith s: As explained, your procedure 2 is the same procedure, and implies the calculation. Why do you speak of 600 characters? (a definite complexity threshold) Why do you speak of "meaningful in English"? (a definite functional specification) You are simply giving my procedure in its final form, without the logical explanations. My compliments!
gpuccio
November 10, 2014, 11:20 PM PDT
PaV,
If you omit step #3 of Procedure 1 in Procedure 2, then step#3 in Procedure 2 is completely meaningless.
Exactly! I think you're close to understanding this! Steps 3 and 4 are useless in procedure 1, and step 3 is useless in procedure 2. All of the useful work is done by steps 1 and 2:
1. Look at a comment longer than 600 characters. 2. If you recognize it as meaningful English, conclude that it must be designed.
The calculation adds nothing. Now, could you please point this out to gpuccio before he embarrasses himself further? He won't accept it from me, but he might from you.
Gpuccio’s DFCSI isn’t useless, your Procedure 2 is useless.
Procedure 1 gives exactly the same answers as Procedure 2. You say Procedure 2 is useless. Therefore, Procedure 1 is also useless. Excellent job, PaV. You're a real asset to the ID team!
keith s
November 10, 2014, 11:19 PM PDT
REC at #91: My argument about those two sequences is about their conservation in a complex molecule. You can scarcely deny that those specific sequences are necessary, with that high level of conservation, to the working of ATP synthase in its common form, and especially the form which utilizes H+ gradients. The Apicomplexa paper you link describes a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution. In no way it is in contradiction with the functional specification of the sequences I examined in the traditional ATP synthase complex. I paste here the abstract of that interesting paper, for all to read: "Highly Divergent Mitochondrial ATP Synthase Complexes in Tetrahymena thermophila Abstract The F-type ATP synthase complex is a rotary nano-motor driven by proton motive force to synthesize ATP. Its F1 sector catalyzes ATP synthesis, whereas the Fo sector conducts the protons and provides a stator for the rotary action of the complex. Components of both F1 and Fo sectors are highly conserved across prokaryotes and eukaryotes. Therefore, it was a surprise that genes encoding the a and b subunits as well as other components of the Fo sector were undetectable in the sequenced genomes of a variety of apicomplexan parasites. While the parasitic existence of these organisms could explain the apparent incomplete nature of ATP synthase in Apicomplexa, genes for these essential components were absent even in Tetrahymena thermophila, a free-living ciliate belonging to a sister clade of Apicomplexa, which demonstrates robust oxidative phosphorylation. This observation raises the possibility that the entire clade of Alveolata may have invented novel means to operate ATP synthase complexes. To assess this remarkable possibility, we have carried out an investigation of the ATP synthase from T. thermophila. 
Blue native polyacrylamide gel electrophoresis (BN-PAGE) revealed the ATP synthase to be present as a large complex. Structural study based on single particle electron microscopy analysis suggested the complex to be a dimer with several unique structures including an unusually large domain on the intermembrane side of the ATP synthase and novel domains flanking the c subunit rings. The two monomers were in a parallel configuration rather than the angled configuration previously observed in other organisms. Proteomic analyses of well-resolved ATP synthase complexes from 2-D BN/BN-PAGE identified orthologs of seven canonical ATP synthase subunits, and at least 13 novel proteins that constitute subunits apparently limited to the ciliate lineage. A mitochondrially encoded protein, Ymf66, with predicted eight transmembrane domains could be a substitute for the subunit a of the Fo sector. The absence of genes encoding orthologs of the novel subunits even in apicomplexans suggests that the Tetrahymena ATP synthase, despite core similarities, is a unique enzyme exhibiting dramatic differences compared to the conventional complexes found in metazoan, fungal, and plant mitochondria, as well as in prokaryotes. These findings have significant implications for the origins and evolution of a central player in bioenergetics. Author Summary Synthesis of ATP, the currency of the cellular energy economy, is carried out by a rotary nano-motor, the ATP synthase complex, which uses proton flow to drive the rotation of protein subunits so as to produce ATP. There are two main components in mitochondrial F-type ATP synthase complexes, each made up of a number of different proteins: F1 has the catalytic sites for ATP synthesis, and Fo forms channels for proton movement and provides a bearing and stator to contain the rotary action of the motor. 
The two parts of the complex have to interact with each other, and critical protein subunits of the enzyme are conserved from bacteria to higher eukaryotes. We were surprised that a group of unicellular organisms called alveolates (including ciliates, apicomplexa, and dinoflagellates) seemed to lack two critical proteins of the Fo component. We have isolated intact ATP synthase complexes from the ciliate Tetrahymena thermophila and examined their structure by electron microscopy and their protein composition by mass spectrometry. We found that the ATP synthase complex of this organism is quite different, both in its overall structure and in many of the associated protein subunits, from the ATP synthase in other organisms. At least 13 novel proteins are present within this complex that have no orthologs in any organism outside of the ciliates. Our results suggest significant divergence of a critical bioenergetic player within the alveolate group."
gpuccio
November 10, 2014, 11:17 PM PDT
gpuccio, You get exactly the same answer whether or not you do the calculation, in 100% of the cases. Why waste time on a calculation that adds no value whatsoever? I repeat:
gpuccio, We can use your very own test procedure to show that dFSCI is useless.
Procedure 1:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Perform a pointless and irrelevant dFSCI calculation.
4. Conclude that the comment was designed.
Procedure 2:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Conclude that the comment was designed.
The two procedures give exactly the same results, yet the second one doesn't even include the dFSCI step. All the work was done by the other steps. The dFSCI step was a waste of time, mere window dressing. Even your own test procedure shows that dFSCI is useless, gpuccio.
keith s
November 10, 2014, 11:10 PM PDT
PaV: Thank you for your contributions. It's beautiful to have you here! :)
gpuccio
November 10, 2014, 11:04 PM PDT
Reality at #112: Is what you see in this OP a calculation?
gpuccio
November 10, 2014, 11:00 PM PDT
PaV at #110: Absolutely correct! Thank you.
gpuccio
November 10, 2014, 10:58 PM PDT
Adapa: "The purpose of a dFSCI calculation is merely for gpuccio to convince himself he was specially created by his loving God." Really? What an argument. I am overwhelmed.
gpuccio
November 10, 2014, 10:56 PM PDT