Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, ancient English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I do not have a simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the functional complexity (-log2 of the target space/search space ratio) is above 500 bits. As I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even if I am not a linguist and not even a mathematician, I could try to define the target space in this case better quantitatively, or at least find a reasonable higher threshold (an upper bound) for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

The search space for a random search where every character has the same probability, assuming an alphabet of 30 characters (letters, space, elementary punctuation), is 30^600, that is about 2^2944. IOWs 2944 bits.
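The search-space figure is easy to check numerically. Here is a minimal Python sketch, assuming the 30-character alphabet and the 600-character length stated above; the function and constant names are illustrative and not part of the original post.

```python
import math

ALPHABET_SIZE = 30  # assumption from the post: letters, space, elementary punctuation
TEXT_LENGTH = 600   # characters in the text


def search_space_bits(alphabet_size: int, length: int) -> float:
    """Return log2 of the number of strings of `length` over `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)


search_bits = search_space_bits(ALPHABET_SIZE, TEXT_LENGTH)
print(f"search space: about {search_bits:.0f} bits")
```

The sketch only restates 30^600 in bits; it treats every character as equally probable, which is the uniform model the post assumes.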

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200000. The result, if I am right, is: 2^1453. IOWs 1453 bits.

Now, each combination of n words can be ordered in up to n! ways, therefore each of them has up to 120! different permutations, that is about 2^660. IOWs 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.
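The three figures above (1453, 660 and 2113 bits) can be reproduced with exact integer arithmetic. The following Python sketch assumes the 200,000-word vocabulary and the 120-word text from the post; the variable names are illustrative. It also computes the direct count 200000^120 of ordered word sequences for comparison, because multiplying the multiset count by 120! slightly overcounts whenever a word is repeated, so the product is best read as a generous estimate.

```python
import math

VOCABULARY = 200_000  # assumed number of English words (from the post)
WORDS = 120           # 600 characters / 5 characters per word

# Combinations with repetition: multisets of 120 words from the vocabulary.
combinations = math.comb(VOCABULARY + WORDS - 1, WORDS)
# Orderings of 120 positions (exact only when all 120 words are distinct).
permutations = math.factorial(WORDS)

combination_bits = math.log2(combinations)
permutation_bits = math.log2(permutations)
total_bits = combination_bits + permutation_bits

# Direct count of ordered sequences of 120 words, repetition allowed.
direct_bits = WORDS * math.log2(VOCABULARY)

print(f"combinations: {combination_bits:.1f} bits")
print(f"permutations: {permutation_bits:.1f} bits")
print(f"product:      {total_bits:.1f} bits")
print(f"direct count: {direct_bits:.1f} bits")
```

Both routes land at about 2113 bits, so the approximation used in the post does not change the result at this level of rounding.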

It’s a big number.

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is a higher threshold for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous higher threshold.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here)
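The ratio can also be computed on the exact integers rather than on rounded exponents. This is a small Python sketch under the same assumptions as the post (30-character alphabet, 200,000 words, 120 words per text); the helper name `functional_information_bits` is hypothetical.

```python
import math


def functional_information_bits(target_size: int, search_size: int) -> float:
    """Return -log2(target_size / search_size) for exact integer set sizes."""
    if not 0 < target_size <= search_size:
        raise ValueError("target must be a non-empty subset of the search space")
    # log2 of a ratio, written as a difference so huge integers never become floats.
    return math.log2(search_size) - math.log2(target_size)


search_size = 30 ** 600              # all 600-character strings
target_upper_bound = 200_000 ** 120  # all sequences of 120 dictionary words
dfsi_bits = functional_information_bits(target_upper_bound, search_size)
print(f"dFSI lower bound: about {dfsi_bits:.0f} bits")
```

Because the target space here is an upper bound, the returned value is a lower bound on the functional information, which is how the post uses it.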

So, if we consider as a measure of our functional space a number which is certainly an extremely overestimated higher threshold for the real value, still our dFSI is over 800 bits.

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that if I make the same computation for a 300 character string, the dFSI value is 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
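The dependence on length can be made explicit. In the sketch below (Python, same assumptions as the post, with an illustrative function name), the bound comes out linear in the string length, at roughly 1.38 bits per character. For 300 characters it gives a value between 415 and 416 bits, which agrees with the figure above up to rounding.

```python
import math


def dfsi_lower_bound_bits(length: int,
                          alphabet_size: int = 30,
                          vocabulary: int = 200_000,
                          avg_word_length: int = 5) -> float:
    """Lower bound on dFSI for a text of `length` characters (model from the post)."""
    words = length / avg_word_length
    search_bits = length * math.log2(alphabet_size)
    target_bits = words * math.log2(vocabulary)  # upper bound on the target space
    return search_bits - target_bits


for length in (150, 300, 600, 1200):
    print(length, round(dfsi_lower_bound_bits(length), 1))
```

Under this model the 500-bit threshold is crossed somewhere between 300 and 600 characters.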

Comments
I guess so :)
Collin
November 10, 2014, 05:51 PM PDT
Collin:
Puccini, thanks. It’s clearer.
Puccini? Is that your auto-correct talking?
keith s
November 10, 2014, 05:49 PM PDT
Puccini, thanks. It's clearer.
Collin
November 10, 2014, 05:44 PM PDT
gpuccio, you failed to provide a way to measure "meaning", whatever that means.
Mung
November 10, 2014, 05:24 PM PDT
Reality:
I don’t think that Collin’s questions are misguided. They are good, relevant questions that deserve a good, relevant response.
Why do you think they are good questions? Why do you think they are relevant?
Mung
November 10, 2014, 05:21 PM PDT
centrestream:
A little less condescension and a little more civil discourse would be appropriate.
Civil discourse requires honesty.
Mung
November 10, 2014, 05:17 PM PDT
gpuccio, I want to think about your response and will get back to you later.
Reality
November 10, 2014, 04:59 PM PDT
gpuccio, you said:
if you define the function differently, for example any sequence of that length which has good meaning in English, things change.
I say, exactly!!!!!!
Think of this measure as having 2 axes. The X axis is the length of the sequence and the Y axis is the "meaning threshold" I'm evaluating. The lower on the Y axis you are, the longer the string needs to be for me to infer design.
For someone familiar with this debate, "me thinks it's a weasel" is loaded with meaning. For an average Joe it might take a whole sonnet to pass the threshold.
On the other hand, if I were looking at a string of text in Chinese, it might take a string the length of a whole play to pass the test, because I would be looking for mere arbitrary structure and grammar as opposed to English words.
But even in that case I would be able to give the sequence a real objective value and compare it to strings that were the result of a combination of algorithmic and random processes.
peace
fifthmonarchyman
November 10, 2014, 04:19 PM PDT
Zachriel:
I agree with that. I remember your software.
gpuccio
November 10, 2014, 03:55 PM PDT
gpuccio: Algorithmic oracles can only recycle frozen meaning.
If you can't objectively judge the meaning of a phrase or sonnet, then you're fairly well stuck. However, we can certainly evolve sequences of words as words are somewhat frozen by convention. Grammar, as well. This gives us strings of words, which would seemingly have more meaning than random letters.
Zachriel
November 10, 2014, 03:53 PM PDT
Zachriel:
You have always been elegant in your skirmishes. That's why I like you! :)
gpuccio
November 10, 2014, 03:52 PM PDT
gpuccio: The 200,000 word dictionary, for example, is rather complex as an oracle.
About 10^7 bits.
gpuccio: I agree with you: with the appropriate oracle, you can do anything.
There has to be some reasonable continuity in relative reward or it won't work.
gpuccio: Algorithmic oracles can only recycle frozen meaning.
If you define it so it can't exist, then sure.
Zachriel
November 10, 2014, 03:48 PM PDT
Roy:
Don't be fastidious. I took the number from the Internet.
OK, I have redone the computation for 500,000 words. Is that enough for you? The dFSCI is now 673 bits. You can check yourself.
And do you realize how much I am underestimating the dFSCI when I take the total number of sequences made by English words as target space, instead of taking the total number of sequences which have good meaning in English?
So, don't be fastidious.
gpuccio
November 10, 2014, 03:48 PM PDT
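The sensitivity of the result to the dictionary size can be checked with a short script. This is a Python sketch under the model of the original post (600 characters, 30-character alphabet, 5 characters per word); the function name and the extra 1,000,000-word case are illustrative assumptions. For 500,000 words it gives a value between 672 and 673 bits, in line with the figure quoted in the comment up to rounding.

```python
import math


def dfsi_bits_for_vocabulary(vocabulary: int,
                             length: int = 600,
                             alphabet_size: int = 30,
                             avg_word_length: int = 5) -> float:
    """Lower bound on dFSI when the dictionary holds `vocabulary` words."""
    words = length // avg_word_length
    search_bits = length * math.log2(alphabet_size)
    target_bits = words * math.log2(vocabulary)  # all sequences of dictionary words
    return search_bits - target_bits


for vocabulary in (200_000, 500_000, 1_000_000):
    print(f"{vocabulary:>9} words: {dfsi_bits_for_vocabulary(vocabulary):.1f} bits")
```

Each doubling of the dictionary removes 120 bits from the bound, so even a dictionary of a million entries leaves the value above 500 bits in this model.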
Zachriel:
Nice to hear from you. I agree with you: with the appropriate oracle, you can do anything. And so?
The 200,000 word dictionary, for example, is rather complex as an oracle.
What an algorithmic oracle can never do is to generate new complex meaning because it understands that meaning. Algorithmic oracles can only recycle frozen meaning.
Conscious oracles, instead, understand meaning. That is another matter altogether.
Please, look at this interesting paper:
Using Turing Oracles in Cognitive Models of Problem-Solving
http://www.blythinstitute.org/images/data/attachments/0000/0041/bartlett1.pdf
gpuccio
November 10, 2014, 03:29 PM PDT
Reality: "According to Joe, CSI=dFSCI=FSC=FSCO/I. Do you agree with him?"
My CSI=dFSCI=FSC=FSCO/I has detected a sock. You slick devil you.
centrestream
November 10, 2014, 03:28 PM PDT
a) There are about 200,000 words in English
Nowhere near enough. OED has more than that and Webster's 2nd has more than twice as many. And that's without including a similar number of place and personal names, all of which are valid in English text.
Roy
November 10, 2014, 03:27 PM PDT
Reality:
Please, read post #37 here for the procedure.
Please, read post #661 here:
https://uncommondescent.com/intelligent-design/evolution-driven-by-laws-not-random-mutations/
for the acronyms.
gpuccio
November 10, 2014, 03:23 PM PDT
Another reason why it is best to start with something like an English phrase rather than biology is that logically it should be much easier to produce a false positive for a short sequence of letters than for a protein sequence.
Again, I would love to see the algorithm that can create a false positive here. Since we are dealing with text, I see no reason such an algorithm could not be put together on a laptop with no special software. Come on critics, give it a go.
peace
fifthmonarchyman
November 10, 2014, 03:20 PM PDT
mullerpr: Do you know many sieve like things in nature?
Non-living nature is full of natural sieves. If not, then the Earth would be homogeneous, which it is not. Gold and salt are found concentrated in some places, water in others. Indeed, there are natural water pumps to replenish the headwaters of rivers so that the running water can continue to shape and sort the rocks. The movement of sun and moon and wind and surf make the sand on the beach.
gpuccio: Nobody seems to object that he knows some simple algorithm which can write a passage of 600 characters which has good meaning in English.
If you had an oracle which could return relative meaning, and if you consider "the king" to have more meaning than "king", the former being specific, then an evolutionary algorithm should be able to create long sequences of meaning. Perhaps you could have the snippets read to Elizabethan audiences, and rate them by applause.
gpuccio: An attempt at computing dFSCI for English language... I have no idea.
The density of meaningful sonnets is an interesting question, but let's grant that Shakespeare's sonnets were the result of an intelligent mind.
gpuccio: 2^2113 / 2^2944 = 2^-831
That's fine, but it can be shown that, given a suitable oracle, words can evolve from letters, and sentences from words. Indeed, if you feed the little genomes based on their iambic character, they'll evolve into iambs. But just start with words and the 200,000 word dictionary you so kindly provided for our oracle. As the algorithm can generate sequences of words, that means your calculation becomes 2^2113 / 2^2113 = 1. We've already crossed a distance of 2^-831.
Zachriel
November 10, 2014, 03:18 PM PDT
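The claim that words can evolve from letters given a suitable oracle can be illustrated with a toy hill climber. The Python sketch below is only an illustration of the general idea and is not Zachriel's software: the four-word dictionary, the scoring rule (best positional match against any dictionary word) and every name in it are assumptions made for the example.

```python
import random
import string

WORDS = ["king", "love", "time", "rose"]  # toy dictionary (assumption)


def oracle(candidate: str) -> int:
    """Score a candidate by its best count of positions matching a dictionary word."""
    return max(sum(a == b for a, b in zip(candidate, word)) for word in WORDS)


def evolve(length: int = 4, seed: int = 0, max_steps: int = 100_000) -> str:
    """Mutate one random letter per step; keep mutants the oracle does not score lower."""
    rng = random.Random(seed)
    current = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    for _ in range(max_steps):
        if oracle(current) == length:
            break  # the string is now a dictionary word
        position = rng.randrange(length)
        mutant = current[:position] + rng.choice(string.ascii_lowercase) + current[position + 1:]
        if oracle(mutant) >= oracle(current):
            current = mutant
    return current


result = evolve()
print(result)
```

The search succeeds only because the oracle already contains the dictionary and rewards partial matches, which is the point both sides of the exchange are arguing about.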
fifthmonarchyman:
I appreciate your insightful comment!
About "me thinks it’s a weasel", it depends.
If you define the target as that specific phrase, its length is 23 characters, the search space is about 113 bits, and as there is only one sequence which satisfies the definition, the functional space is 1 and the functional complexity is -log2 of 1:2^113, again 113 bits. Not too much, not too little. For many systems and time spans, it would be enough to infer design. After all, 10^34 is a big number.
But if you define the function differently, for example any sequence of that length which has good meaning in English, things change. Applying the method I have used, which is probably less precise for short sequences, the functional complexity is about 25 bits. IOWs there are about 3 chances in 100,000,000 of getting a positive result. Quite in the range of many random systems.
gpuccio
November 10, 2014, 03:16 PM PDT
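Both figures in this comment can be reproduced with a few lines of Python. The sketch assumes the 30-character alphabet and 200,000-word dictionary of the original post. The word count of 5 is an additional assumption (23 characters / 5 characters per word = 4.6, rounded up), chosen because it reproduces the value of about 25 bits; the comment does not state it.

```python
import math

PHRASE = "me thinks it's a weasel"
ALPHABET_SIZE = 30    # assumption from the post
VOCABULARY = 200_000  # assumption from the post
ASSUMED_WORDS = 5     # assumption: 23 / 5 = 4.6, rounded up

search_bits = len(PHRASE) * math.log2(ALPHABET_SIZE)

# Target defined as this exact phrase: a single sequence in the target space.
specific_bits = search_bits - math.log2(1)

# Target defined as any meaningful text of this length, bounded from above
# by all sequences of ASSUMED_WORDS dictionary words.
generic_bits = search_bits - ASSUMED_WORDS * math.log2(VOCABULARY)
probability = 2 ** -generic_bits

print(f"specific phrase: about {specific_bits:.0f} bits")
print(f"any meaningful text: about {generic_bits:.0f} bits, p = {probability:.1e}")
```

With 4 words instead of 5 the second value would rise to roughly 42 bits, so short strings are quite sensitive to this rounding.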
gpuccio, you said:
"Design detection by dFSCI is a procedure with 100% specificity and low sensitivity. It has no false positives, and many false negatives."
If that's true it's only because dFSCI is a useless term that is used by IDists to make it look as though scientific methods are being used to detect design, even though the alleged design detection pertains only to things that are already known to be designed.
On another note, you said: "Design detection by dFSCI is a procedure...". Design detection by doing what with dFSCI?
According to Joe, CSI=dFSCI=FSC=FSCO/I. Do you agree with him?
Reality
November 10, 2014, 03:09 PM PDT
Reality:
I have just done it for ATP synthase. Look at post #27 here.
You may perhaps understand that the specificity of the procedure must be tested with objects whose origin we can assess independently, and then be applied to objects whose origin is controversial. So, this discussion about language is important.
You may perhaps understand that an elephant, a cancer cell, and a galaxy cluster are not digital sequences. So, I prefer to apply the procedure to proteins. Do you agree that it is a relevant application?
gpuccio
November 10, 2014, 02:55 PM PDT
Mung, what "ID theory" are you referring to?
I don't think that Collin's questions are misguided. They are good, relevant questions that deserve a good, relevant response.
Reality
November 10, 2014, 02:53 PM PDT
Mung: "Collin @ 56. I take it you understand NOTHING about ID theory. Nothing. Would you at least cop to that before I put out the effort to answer your oh so misguided questions?"
A little less condescension and a little more civil discourse would be appropriate. Now that you have been corrected, could you please explain why Collin's question is misguided? If it is the answer that I expect you to give, it will be based on a misguided understanding of evolution. Please, enlighten us.
centrestream
November 10, 2014, 02:51 PM PDT
Collin:
Please, take the time to review my procedure reposted here by me at #37.
Design detection by dFSCI is a procedure with 100% specificity and low sensitivity. It has no false positives, and many false negatives.
The main reason for false negatives is that the observer cannot see the function and define it. So, in your example, if the text is in a language I don't know and I don't understand its meaning, I cannot define a function as "having good meaning in this language". So, I will not infer design, and that will be a false negative.
False positives, to the best of my knowledge, don't exist. Unless someone here proposes one. So, if we infer design, we can be rather certain of our inference.
Regarding information, I give no special meaning to the word: only what I have explicitly defined. Please, see my OP about that:
https://uncommondescent.com/intelligent-design/functional-information-defined/
The relevant part:
So, the general definitions:
c) Specification. Given a well defined set of objects (the search space), we call “specification”, in relation to that set, any explicit objective rule that can divide the set in two non overlapping subsets: the “specified” subset (target space) and the “non specified” subset. IOWs, a specification is any well defined rule which generates a binary partition in a well defined set of objects.
d) Functional Specification. It is a special form of specification (in the sense defined above), where the rule that specifies is of the following type: “The specified subset in this well defined set of objects includes all the objects in the set which can implement the following, well defined function…”. IOWs, a functional specification is any well defined rule which generates a binary partition in a well defined set of objects using a function defined as in a) and verifying if the functionality, defined as in b), is present in each object of the set. It should be clear that functional specification is a definite subset of specification. Other properties, different from function, can in principle be used to specify. But for our purposes we will stick to functional specification, as defined here.
e) The ratio Target space/Search space expresses the probability of getting an object from the search space by one random search attempt, in a system where each object has the same probability of being found by a random search (that is, a system with a uniform probability of finding those objects).
f) The Functionally Specified Information (FSI) in bits is simply -log2 of that number. Please, note that I imply no specific meaning of the word “information” here. We could call it any other way. What I mean is exactly what I have defined, and nothing more.
IOWs, FSI is only -log2 of the probability of finding the target space. It is a measure of the functional bits, the number of bits which are absolutely necessary to implement the function. More intuitively, it's the quantity of information necessary to implement the defined function.
If you read the whole OP linked above, you may understand my definitions better.
You say:
"The information content in this sentence is not found separately in each word but in their associations. So I could write “red happens glory fishing diamond wrangler” and although each word has meaning, the phrase itself has none. Can that be calculated or determined somehow? Can an objective number be placed on it? 12 units of meaning?"
No. If you follow with attention my reasoning in the OP of this thread, you will see that my functional definition is: any sequence of 600 characters which has good meaning in English. Therefore, for a sequence to be specified (to be part of the target space), the whole sequence must have good meaning in English.
But, you may say that what I have computed as target space is the total number of combinations and permutations of English words in 600 characters. That's true. But I have done that only because in this way I have a higher threshold for the functional space (and therefore a lower threshold for the functional complexity). Why? Because the set of all sequences which have good meaning in English is certainly a small subset of the set of all sequences made by English words, and is included in it. That's why I say:
Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words. And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset. So, I believe that we can say that 2^2113 is a higher threshold for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous higher threshold.
Clear?
gpuccio
November 10, 2014, 02:49 PM PDT
What gpuccio is trying to do here is demonstrate an objective Turing test. What the critic's algorithm needs to do is fool us into believing that it is intelligent. Very interesting!!!!!!
How about a simpler example to help those of us who struggle with the big numbers? If the "me thinks it's a weasel" program was shown to have not smuggled in its information, how many bits would it have produced?
peace
fifthmonarchyman
November 10, 2014, 02:48 PM PDT
Centrestream,
Sand on the beach... Is that really going to be presented as an analogue for Natural Selection?
I like physical necessity when it comes to things like orbital paths for planets, chemical bonds in minerals, mechanical action etc. But the patterns created by necessity are by definition bad at carrying new information. If there is no degree of freedom there is no information carrying capacity.
The clear distinction between life and the uniformity of physical processes convinced Hubert Yockey that the definition of life is informational in contrast to non-informational physical processes... The first biological information, he concluded, is an axiomatic concept not explained by natural processes.
Information Theory, Evolution, and The Origin of Life
http://www.amazon.com/gp/aw/d/0521169585?pc_redir=1414569767&robot_redir=1
mullerpr
November 10, 2014, 02:45 PM PDT
gpuccio, Joe commented in this thread, and since Joe claims to know all about CSI I thought that everyone here could learn all about it by looking at his brilliant explanations on his blog. Besides, linking to other sites or bringing up what has been said by others in another thread is a common action by IDists here so I don't see why there should be any problem with my doing the same.
Regarding your "computation", does your "computation" have anything to do with measuring, calculating, or computing CSI in anything other than English text which is already known to be designed? For example, can and will you please measure, calculate, or compute CSI in an elephant, a cancer cell, and a galaxy cluster? Thanks in advance.
Reality
November 10, 2014, 02:44 PM PDT
Collin @ 56. I take it you understand NOTHING about ID theory. Nothing. Would you at least cop to that before I put out the effort to answer your oh so misguided questions?
Mung
November 10, 2014, 02:43 PM PDT
Even Richard Dawkins believes in calculating design. His minimally designed phrase was "METHINKS IT IS LIKE A WEASEL." Not quite a sonnet.
Mung
November 10, 2014, 02:37 PM PDT