Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

Categories: Intelligent Design

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan's comments about that are out of order. I don't infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, Early Modern English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski's UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I have no simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the target space/search space ratio is above 500 bits. As I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That's why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even though I am not a linguist and not even a mathematician, I could try to quantify the target space more precisely in this case, or at least find a reasonable upper bound for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

The search space for a random search, where every character has the same probability and the alphabet has 30 characters (letters, space, elementary punctuation), is easily computed as 30^600, that is about 2^2944. IOWs, 2944 bits.
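The arithmetic can be checked with a few lines of Python (a sketch under the post's assumptions: a 30-character alphabet and a 600-character text; the variable names are mine):

```python
import math

ALPHABET_SIZE = 30   # letters, space, elementary punctuation (post's assumption)
TEXT_LENGTH = 600    # approximate length of a sonnet, in characters

# Size of the search space in bits: log2(30^600) = 600 * log2(30)
search_space_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)
print(round(search_space_bits))  # 2944
```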

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200,000, that is C(200,000 + 120 - 1, 120). The result, if I am right, is about 2^1453. IOWs, 1453 bits.

Now, each of these combinations can be ordered in many different ways: treating the 120 words as distinct, each combination has up to 120! permutations, that is about 2^660. IOWs, 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.

It’s a big number.
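For readers who want to check the counts, here is a minimal Python sketch of the two factors (the 200,000-word vocabulary and the 120 word slots are the post's assumptions; Python's exact big-integer arithmetic handles the binomial coefficient directly):

```python
import math

VOCABULARY = 200_000   # assumed number of English words
WORD_SLOTS = 120       # 600 characters / 5 characters per word

# Combinations with repetition of 120 words from the vocabulary:
# C(200,000 + 120 - 1, 120)
combo_bits = math.log2(math.comb(VOCABULARY + WORD_SLOTS - 1, WORD_SLOTS))

# Orderings of the 120 chosen words: 120!
perm_bits = math.log2(math.factorial(WORD_SLOTS))

print(round(combo_bits))              # 1453
print(round(perm_bits))               # 660
print(round(combo_bits + perm_bits))  # 2113
```

This reproduces the post's three exponents to the nearest bit.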

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is an upper bound for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous upper bound.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here.)

So, if we consider as a measure of our target space a number which is certainly an extremely overestimated upper bound for the real value, still our dFSI is over 800 bits.
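Putting the pieces together, the whole estimate fits in one small function (a sketch, not the author's code; `functional_info_bits` and its defaults are illustrative names for the post's assumptions):

```python
import math

def functional_info_bits(n_chars, alphabet=30, vocabulary=200_000, chars_per_word=5):
    """Upper-bound-based dFSI estimate, following the post's reasoning:
    search-space bits minus (overestimated) target-space bits."""
    slots = n_chars // chars_per_word
    search_bits = n_chars * math.log2(alphabet)
    target_bits = (math.log2(math.comb(vocabulary + slots - 1, slots))
                   + math.log2(math.factorial(slots)))
    return search_bits - target_bits

print(round(functional_info_bits(600)))  # 831
```

Because the target space is overestimated, the true functional information would be higher than this figure.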

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski's UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result: if I make the same computation for a 300-character string, the dFSI value is about 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
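The same sketch, run for a 300-character string (60 word slots), gives roughly 415.5 bits; the post's figure of 416 follows if the intermediate exponents are rounded before subtracting. Again, all the constants are the post's assumptions:

```python
import math

slots = 300 // 5  # 60 word slots (5 characters per word)
search_bits = 300 * math.log2(30)  # about 1472 bits
target_bits = (math.log2(math.comb(200_000 + slots - 1, slots))
               + math.log2(math.factorial(slots)))  # about 1057 bits
dfsi_300 = search_bits - target_bits
print(dfsi_300)  # roughly 415.5
```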

Comments
gpuccio at #124:
My argument about those two sequences is about their conservation in a complex molecule. You can scarcely deny that those specific sequences are necessary, with that high level of conservation, to the working of ATP synthase in its common form, and especially the form which utilizes H+ gradients. The Apicomplexa paper you link describes a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution. In no way it is in contradiction with the functional specification of the sequences I examined in the traditional ATP synthase complex.
Another beautiful example of the Texas Sharp-Shooter. You were quite satisfied with your specification of the "ATP synthase", a nice tight cluster of bullet holes in the wall. Then REC points out a separate cluster of bullet holes, the Alveolata ATP synthase. Immediately you re-define your "ATP synthase" as "ATP synthase in its common form" or "the traditional ATP synthase", and get out some fresh paint for the recently observed bullet holes, which represent "a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution."
DNA_Jock
November 11, 2014, 04:32 AM PDT
Me_Think said, "What do you mean by moving down the Y-Axes?" I say, check out comment 81. Peace
fifthmonarchyman
November 11, 2014, 04:29 AM PDT
hey Zac You said, "If different people give entirely different answers, then it's not objective." I say, In one sense I agree with you. By objective I mean that my standard is exactly the same for different objects. Your standard might be lower or higher on the Y axis than mine, but you should be consistent with yourself when it comes to the X axis. Hope that makes sense. Peace
fifthmonarchyman
November 11, 2014, 04:27 AM PDT
fifthmonarchyman @138,
I think a good next step would be to actually use your calculation on the Voynich manuscript. I agree that there is enough CSI there to infer design it would be cool however to objectively compare the actual amount in the object with that in the sonnet.
I think that is too ambitious and is not within the realms of dFSCI /CSI , because you don't have a standard dictionary database of the Voynich script, have no idea of alphabet probability and no way of checking the result. You need to be an Egyptian hieroglyphs expert to even start deciphering a single word.
To do that we would need to move further down the Y axes
What do you mean by moving down the Y-Axes?
Me_Think
November 11, 2014, 04:25 AM PDT
Guys: As a shameful form of self-promotion, I will try to draw again the attention to the OP and the computation in it. To do that, I will repost here what I said in post #51:
So, if the computation here is correct, a few interesting things ensue:
1) It is possible to compute the target space, and therefore dFSCI, for specific search spaces by some reasonable, indirect method. Of course, each space should be analyzed with appropriate methods.
2) Nobody seems to object that he knows some simple algorithm which can write a passage of 600 characters which has good meaning in English. Where are all those objections about how difficult it is to exclude necessity, and about how that generates circularity, and about how that is bound to generate many false positives? The balance at present:
a) Algorithms proposed to explain Shakespeare's sonnet (or any other passage of the same length in good English): none.
b) False positives proposed: none.
c) True positives found: a lot. For example, all the posts in the last thread that were longer than 600 characters (there were a few).
3) We have a clear example that functional complexity, at least in the language space, is bound to increase hugely with the increase in length of the string. This is IMO an important result, very intuitive, but now we have a mathematical verification. Moreover, while the above reasoning is about language, I believe that it is possible in principle to demonstrate it also for other functional spaces, like software and proteins.
I would like to spend a few more words on point 1. The essence of point 1 is that the computation of a target space can be done by indirect methods, but that we must eagerly look for the best method to do that in each case. To those who criticize the approach of Durston and my personal approach to the computation of the target space for functional proteins, I just say: OK, propose your approach. Maybe it will be better. But there is no reason to deny that an interesting problem exists, that we must look for the best solutions, and that the problem has important implications for the problem of the origin of biological information. CSI denialism has no real place in science.
gpuccio
November 11, 2014, 04:13 AM PDT
fifthmonarchyman: For someone familiar with this debate "me thinks it's a weasel" is loaded with meaning. For an average Joe it might take a whole sonnet to pass the threshold. On the other hand if I were looking at a string of text in Chinese it might take a string the length of a whole play to pass the test because I would be looking for mere arbitrary structure and grammar as opposed to English words. In other words, it's subjective. fifthmonarchyman: But even in that case I would be able to give the sequence a real objective value and compare it to strings that were the result of a combination of algorithmic and random processes. If different people give entirely different answers, then it's not objective. fifthmonarchyman: produce an algorithm capable of producing a 600-character English text independently without smuggling information through the back door. Evolutionary algorithms require an interface to an environment of some sort. Turns out that Shakespeare also incorporated information from his cultural environment. For instance, the William Shakespeare algorithm included an extensive dictionary, grammar rules, stock phrases, scansion, personality types, history, and so on. mullerpr: Natural processes flowing from the uniformity of classical mechanics … Is that really going to be presented as an analogue for Natural Selection? You didn't ask for an analogue for natural selection, but examples of natural sieves.
Zachriel
November 11, 2014, 04:07 AM PDT
I linked the wrong paper. Here is the one I meant: http://arxiv.org/pdf/1002.4592.pdf
fifthmonarchyman
November 11, 2014, 04:07 AM PDT
fifthmonarchyman: "To do that we would need to move further down the Y axes and look at the arbitrary structure and grammar instead of 'good English'. That might be a little more difficult but I believe it's still doable." Yes, I believe it's doable. It is not my personal priority, however. And thank you for the kind words.
gpuccio
November 11, 2014, 04:05 AM PDT
Dionisio said, "That sounds interesting." I say, Thank you. I think it's way cool too. Right now I come at my calculation in a different way than gpuccio: by graphically comparing an actual data string with a scrambled set of the same data. Then I try to quantify the differences between the two strings. You can find the paper that was my inspiration here: https://www.cs.duke.edu/~conitzer/turingtradeAAMAS09demo.pdf Peace
fifthmonarchyman
November 11, 2014, 04:02 AM PDT
#137 follow-up Discussions between people with irreconcilable worldview positions turn into senseless arguments that lead nowhere. However, apparently they provide some entertainment, like the gladiators and lions provided to the public in the Roman coliseum many years ago. That's why they have clowns in the circus. Perhaps that increases attendance, traffic and ad revenues. There's also a strong argument for allowing this for the sake of the onlookers/lurkers visiting this blog and also to sharpen the ID arguments. I don't quite agree with some of these arguments, but respect the opinions of others. :)
Dionisio
November 11, 2014, 04:01 AM PDT
Hats off to you gpuccio, this is a great thread. I think a good next step would be to actually use your calculation on the Voynich manuscript. I agree that there is enough CSI there to infer design; it would be cool however to objectively compare the actual amount in the object with that in the sonnet. To do that we would need to move further down the Y axes and look at the arbitrary structure and grammar instead of "good English". That might be a little more difficult but I believe it's still doable. Peace
fifthmonarchyman
November 11, 2014, 03:48 AM PDT
#112 Reality
D: “Why do you want to see a calculation?"
Because you IDists claim that you can calculate CSI-dFSCI-FSCO/I.
D: “Is that important to you? Why?”
To see if you can, and laugh at you when you can’t.
D: “If an example is given, would you ask for another?”
Yes.
D: “If ten examples are provided, would you demand eleven?”
Provide ten and then we’ll see.
Thank you for answering my questions. Now every reader of this blog can see that you have revealed, very clearly, your own motives for being here. Very probably your comrades and fellow travelers would have answered exactly as you did. Which is exactly what I (and probably others) suspected.
Dionisio
November 11, 2014, 03:47 AM PDT
Graham2: I suppose that there is enough CSI in the Voynich manuscript to easily infer design for it. Even the illustrations would be enough. Decrypting the meaning, if there is a meaning, is another matter altogether. Obviously, we cannot infer design from the meaning if we are not sure that there is a meaning. If our inference depended only on the possible meaning (which is not the case for that object), we would not infer design unless and until a meaning is found. In the worst case, that would simply be a false negative. As said many times.
gpuccio
November 11, 2014, 03:18 AM PDT
Has KairosFocus been banned from this thread?
sparc
November 11, 2014, 03:17 AM PDT
#133 mullerpr Thank you.
Dionisio
November 11, 2014, 02:56 AM PDT
Dionisio, the link was just to the Amazon page for Yockey's book, Information Theory, Evolution, and The Origin of Life: http://www.amazon.com/gp/aw/d/0521169585?pc_redir=1414569767&robot_redir=1
mullerpr
November 11, 2014, 02:12 AM PDT
Perhaps CSI could be applied to the Voynich manuscript to determine if it's designed or not. You would be doing the whole world a favour.
Graham2
November 11, 2014, 02:09 AM PDT
Me_Think: Thank you, you are making my argument. You cannot distinguish between designed things and non-designed things unless the object exhibits functional complexity. Why? Because natural mechanisms, through randomness or necessity, can generate configurations that are functional, but only with low functional complexity. That's why the computation of dFSCI is necessary to reliably infer design. Could you please explain that to keith?
gpuccio
November 11, 2014, 01:15 AM PDT
keith s: You are really trying your worst. The meaning is really obvious, and you are not stupid. What should I think? The meaning is: Procedure 2 is useless as a separate procedure, because it is the same as Procedure 1. The real useless thing here is your "argument".
gpuccio
November 11, 2014, 01:11 AM PDT
The Chinese letter B is written as 'tt'. If dFSCI is calculated for this letter, wouldn't it be less than 500 bits? So is it designed or not? A splatter left on a wall by a stone falling into a water puddle by gravity, and a splatter on a wall by a stone dropped by a person into a puddle, would (I guess) have pretty much the same dFSCI. How will you distinguish between the two? A man-made crop circle and a similar natural crop circle would present the same problem.
Me_Think
November 11, 2014, 01:05 AM PDT
Ok Keith S is confused, but we can't be certain, even he said so...
Andre
November 11, 2014, 12:00 AM PDT
gpuccio, In his #110, PaV says that procedure 2 is useless:
Gpuccio’s DFCSI isn’t useless, your Procedure 2 is useless.
You agree wholeheartedly:
PaV at #110: Absolutely correct! Thank you.
You then tell me that procedure 1 and procedure 2 are the same:
As explained, your procedure 2 is the same procedure, and implies the calculation.
You and PaV agree that procedure 2 is useless. You tell me that Procedure 1 is the same as Procedure 2. Therefore, Procedure 1 is useless, according to you. Oops.
keith s
November 10, 2014, 11:42 PM PDT
keith s: As explained, your procedure 2 is the same procedure, and implies the calculation. Why do you speak of 600 characters? (a definite complexity threshold) Why do you speak of "meaningful in English"? (a definite functional specification) You are simply giving my procedure in its final form, without the logical explanations. My compliments!
gpuccio
November 10, 2014, 11:20 PM PDT
PaV,
If you omit step #3 of Procedure 1 in Procedure 2, then step#3 in Procedure 2 is completely meaningless.
Exactly! I think you're close to understanding this! Steps 3 and 4 are useless in procedure 1, and step 3 is useless in procedure 2. All of the useful work is done by steps 1 and 2:
1. Look at a comment longer than 600 characters. 2. If you recognize it as meaningful English, conclude that it must be designed.
The calculation adds nothing. Now, could you please point this out to gpuccio before he embarrasses himself further? He won't accept it from me, but he might from you.
Gpuccio’s DFCSI isn’t useless, your Procedure 2 is useless.
Procedure 1 gives exactly the same answers as Procedure 2. You say Procedure 2 is useless. Therefore, Procedure 1 is also useless. Excellent job, PaV. You're a real asset to the ID team!
keith s
November 10, 2014, 11:19 PM PDT
REC at #91: My argument about those two sequences is about their conservation in a complex molecule. You can scarcely deny that those specific sequences are necessary, with that high level of conservation, to the working of ATP synthase in its common form, and especially the form which utilizes H+ gradients. The Apicomplexa paper you link describes a very different complex molecule, made of many different protein sequences, and is a complex example of a different engineering solution. In no way it is in contradiction with the functional specification of the sequences I examined in the traditional ATP synthase complex. I paste here the abstract of that interesting paper, for all to read: "Highly Divergent Mitochondrial ATP Synthase Complexes in Tetrahymena thermophila Abstract The F-type ATP synthase complex is a rotary nano-motor driven by proton motive force to synthesize ATP. Its F1 sector catalyzes ATP synthesis, whereas the Fo sector conducts the protons and provides a stator for the rotary action of the complex. Components of both F1 and Fo sectors are highly conserved across prokaryotes and eukaryotes. Therefore, it was a surprise that genes encoding the a and b subunits as well as other components of the Fo sector were undetectable in the sequenced genomes of a variety of apicomplexan parasites. While the parasitic existence of these organisms could explain the apparent incomplete nature of ATP synthase in Apicomplexa, genes for these essential components were absent even in Tetrahymena thermophila, a free-living ciliate belonging to a sister clade of Apicomplexa, which demonstrates robust oxidative phosphorylation. This observation raises the possibility that the entire clade of Alveolata may have invented novel means to operate ATP synthase complexes. To assess this remarkable possibility, we have carried out an investigation of the ATP synthase from T. thermophila. 
Blue native polyacrylamide gel electrophoresis (BN-PAGE) revealed the ATP synthase to be present as a large complex. Structural study based on single particle electron microscopy analysis suggested the complex to be a dimer with several unique structures including an unusually large domain on the intermembrane side of the ATP synthase and novel domains flanking the c subunit rings. The two monomers were in a parallel configuration rather than the angled configuration previously observed in other organisms. Proteomic analyses of well-resolved ATP synthase complexes from 2-D BN/BN-PAGE identified orthologs of seven canonical ATP synthase subunits, and at least 13 novel proteins that constitute subunits apparently limited to the ciliate lineage. A mitochondrially encoded protein, Ymf66, with predicted eight transmembrane domains could be a substitute for the subunit a of the Fo sector. The absence of genes encoding orthologs of the novel subunits even in apicomplexans suggests that the Tetrahymena ATP synthase, despite core similarities, is a unique enzyme exhibiting dramatic differences compared to the conventional complexes found in metazoan, fungal, and plant mitochondria, as well as in prokaryotes. These findings have significant implications for the origins and evolution of a central player in bioenergetics. Author Summary Synthesis of ATP, the currency of the cellular energy economy, is carried out by a rotary nano-motor, the ATP synthase complex, which uses proton flow to drive the rotation of protein subunits so as to produce ATP. There are two main components in mitochondrial F-type ATP synthase complexes, each made up of a number of different proteins: F1 has the catalytic sites for ATP synthesis, and Fo forms channels for proton movement and provides a bearing and stator to contain the rotary action of the motor. 
The two parts of the complex have to interact with each other, and critical protein subunits of the enzyme are conserved from bacteria to higher eukaryotes. We were surprised that a group of unicellular organisms called alveolates (including ciliates, apicomplexa, and dinoflagellates) seemed to lack two critical proteins of the Fo component. We have isolated intact ATP synthase complexes from the ciliate Tetrahymena thermophila and examined their structure by electron microscopy and their protein composition by mass spectrometry. We found that the ATP synthase complex of this organism is quite different, both in its overall structure and in many of the associated protein subunits, from the ATP synthase in other organisms. At least 13 novel proteins are present within this complex that have no orthologs in any organism outside of the ciliates. Our results suggest significant divergence of a critical bioenergetic player within the alveolate group."
gpuccio
November 10, 2014, 11:17 PM PDT
gpuccio, You get exactly the same answer whether or not you do the calculation, in 100% of the cases. Why waste time on a calculation that adds no value whatsoever? I repeat:
gpuccio, We can use your very own test procedure to show that dFSCI is useless.
Procedure 1:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Perform a pointless and irrelevant dFSCI calculation.
4. Conclude that the comment was designed.
Procedure 2:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Conclude that the comment was designed.
The two procedures give exactly the same results, yet the second one doesn't even include the dFSCI step. All the work was done by the other steps. The dFSCI step was a waste of time, mere window dressing. Even your own test procedure shows that dFSCI is useless, gpuccio.
keith s
November 10, 2014, 11:10 PM PDT
PaV: Thank you for your contributions. It's beautiful to have you here! :)
gpuccio
November 10, 2014, 11:04 PM PDT
Reality at #112: Is what you see in this OP a calculation?
gpuccio
November 10, 2014, 11:00 PM PDT
PaV at #110: Absolutely correct! Thank you.
gpuccio
November 10, 2014, 10:58 PM PDT
Adapa: "The purpose of a dFSCI calculation is merely for gpuccio to convince himself he was specially created by his loving God." Really? What an argument. I am overwhelmed.
gpuccio
November 10, 2014, 10:56 PM PDT