Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

Categories: Intelligent Design

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in english (OK, ancient english).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 characters sequences which make good sense in english is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate english sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I have not a simple way to measure the target space for English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the target space /search space ratio is above 500 bits. As I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even if I am not a linguist and not even a mathematician, I could try to define the target space better in quantitative terms in this case, or at least to find a reasonable higher threshold (an upper bound) for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

For a random search where every character has the same probability, and assuming an alphabet of 30 characters (letters, space, elementary punctuation), the search space is 30^600, that is about 2^2944. IOWs, 2944 bits.
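A minimal Python sketch of this arithmetic, assuming only the 30-character alphabet and the 600-character length stated above. The constant and function names are illustrative and do not come from the original computation.

```python
import math

ALPHABET_SIZE = 30   # letters, space, elementary punctuation (the post's assumption)
TEXT_LENGTH = 600    # characters in the text

def search_space_bits(alphabet_size: int, length: int) -> float:
    """Return log2 of the number of equally probable strings of this length."""
    return length * math.log2(alphabet_size)

bits = search_space_bits(ALPHABET_SIZE, TEXT_LENGTH)
print(f"search space: about 2^{bits:.0f}")
```

Working in log2 avoids handling the very large number 30^600 directly: the exponent is simply the length times the bits per character.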

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200000. The result, if I am right, is: 2^1453. IOWs 1453 bits.

Now, each of these combinations can be ordered in at most n! ways (exactly n! when all the words are different, fewer when some words are repeated), so each of them has at most 120! different permutations, that is about 2^660. IOWs 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.
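The two steps above can be checked with exact integer arithmetic. The sketch below assumes the figures used in the text (a pool of 200,000 words, 120 words per text); the variable names are illustrative. For comparison it also computes the direct count of ordered 120-word sequences, 200000^120, which is a shorter route that the text does not take.

```python
import math

VOCABULARY = 200_000   # assumed number of English words (from the post)
WORDS = 120            # 600 characters / 5 characters per word

# Combinations with repetition of WORDS items from VOCABULARY: C(N + k - 1, k).
combinations = math.comb(VOCABULARY + WORDS - 1, WORDS)
# Upper bound on the number of orderings of each combination: k!.
permutations = math.factorial(WORDS)

combination_bits = math.log2(combinations)
permutation_bits = math.log2(permutations)
total_bits = combination_bits + permutation_bits

# Direct count of ordered word sequences, N**k, for comparison.
direct_bits = WORDS * math.log2(VOCABULARY)

print(f"combinations with repetition: about 2^{combination_bits:.0f}")
print(f"permutations of 120 words:    about 2^{permutation_bits:.0f}")
print(f"product:                      about 2^{total_bits:.0f}")
print(f"direct count 200000^120:      about 2^{direct_bits:.0f}")
```

Both routes land on about 2113 bits. The product of combinations and permutations equals N(N+1)...(N+k-1), which is very slightly larger than N^k, so it still works as a generous upper bound.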

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.

It’s a big number.

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is a higher threshold for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous higher threshold.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here.)
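The ratio can be sketched as a subtraction in log2 space. The block below recomputes both spaces from the assumptions stated in the text and compares the result with the 500-bit threshold mentioned at the start of the post; the names are illustrative, and unrounded intermediate values are used, so decimals may differ slightly from the rounded figures above.

```python
import math

UPB_BITS = 500  # threshold cited in the post

# Search space: 600 characters from a 30-character alphabet.
search_bits = 600 * math.log2(30)

# Target-space upper bound: combinations with repetition times permutations,
# for 120 words drawn from a pool of 200,000.
target_bits = math.log2(math.comb(200_000 + 120 - 1, 120) * math.factorial(120))

# -log2(target / search) is the difference of the two exponents.
functional_bits = search_bits - target_bits

print(f"functional information: about {functional_bits:.0f} bits")
print(f"exceeds the {UPB_BITS}-bit threshold: {functional_bits > UPB_BITS}")
```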

So, if we consider as a measure of our functional space a number which is certainly an extremely overestimated higher threshold for the real value, still our dFSI is over 800 bits.

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 characters sequences which make good sense in english is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate english sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that if I make the same computation for a 300 character string, the dFSI value is 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
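A short sketch of how the figure scales with length, under the same assumptions (30-character alphabet, 200,000 words, 5 characters per word). As a simplification that is not in the text, the target-space bound here is the direct count of ordered word sequences, vocabulary^(length/5); the function name and parameters are illustrative.

```python
import math

def dfsi_bits(length, alphabet_size=30, vocabulary=200_000, avg_word_len=5):
    """Lower bound on functional information (in bits) for a text of this length."""
    search_bits = length * math.log2(alphabet_size)
    n_words = length / avg_word_len
    target_bits = n_words * math.log2(vocabulary)  # upper bound on the target space
    return search_bits - target_bits

for length in (150, 300, 600, 1200):
    print(f"{length:5d} characters -> {dfsi_bits(length):7.1f} bits")
```

With these assumptions the value grows linearly, at about 1.38 bits per character. The unrounded result for 300 characters comes out near 415.5 bits, close to the 416 quoted above; the small difference is presumably due to rounding of intermediate values.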

Comments
keith s #101: "The calculation is completely unnecessary." Why? Guys, please clarify how you can reliably infer design for the sonnet without any calculation.
gpuccio
November 10, 2014, 10:54 PM PDT
Me_Think at #100: "The answer is :no. I can see sonnet is designed without the need to calculate" How?
gpuccio
November 10, 2014, 10:52 PM PDT
keith s: Again, the aim of this thread is not to re-discuss the whole issue of dFSCI and design detection, but only to propose a computation of dFSCI in language.
I have discussed in great detail the "any possible function" argument here: https://uncommondescent.com/intelligent-design/evolution-driven-by-laws-not-random-mutations/ Post #400. As I have already said, I don't like repetition.
I have discussed the role of eliminating necessity in that same thread, for example, at posts #599 and #604. As I have already said, I don't like repetition.
I have been discussing the lack of explanatory power of the RV + NS myth for years, in very great detail. You can find some thoughts on the difference between Natural Selection and Intelligent selection at post #524 of the above referenced thread, and a lot of other detailed stuff in posts of mine practically everywhere at UD. As I have already said, I don't like repetition.
But just to humor you a little, a very very brief summary: Negative NS is a powerful mechanism, and it works essentially against the RV + NS algorithm. Positive NS is almost non existent, limited to a few irrelevant microevolutionary scenarios, and can never help generate new complex functions, because complex functions cannot be deconstructed into naturally selectable simpler steps, neither in the general case (which is requested for the algorithm to work) nor in any single real example (which would at least be a start). Moreover, if positive NS had had some role in generating the biological functional information, we should see tons of traces of naturally selectable functional intermediates in the proteome. We don't. Finally, genetic drift is completely irrelevant to the probabilistic computation, and in no way helps to lower the probabilistic barriers.
gpuccio
November 10, 2014, 10:35 PM PDT
keith s:
Evolution does not seek out specific targets. It isn’t “trying” to find the flagellum, or binocular vision, or opposable thumbs. If it stumbles on something good, whatever that happens to be, it keeps it. If it stumbles on something bad, whatever that happens to be, it tosses it.
Yes, NS "stumbles." What is given to it comes about "randomly"; but all NS does, and can do, is either 'eliminate' or 'not eliminate.' When the "search space" is enormous, then an 'enormous' number of 'eliminations' must take place. It is simply impossible for 'nature' to provide this enormity of possibilities. Hence, NS is rendered, except in minor ways, "useless." The "minor ways" where NS is "useful," we call "microevolution." But this is a digression, since gpuccio is simply trying to demonstrate that dFSCI calculations can eliminate "false positives."
PaV
November 10, 2014, 10:29 PM PDT
keith s:
The dFSCI number reflects the probability that a given sequence was produced purely randomly, without selection. No evolutionary biologist thinks the flagellum (or any other complex structure) arose through a purely random process; everyone thinks selection was involved. By neglecting selection, your dFSCI number is answering a question that no one is asking. It’s useless.
Why are you substituting a question regarding "irreducible complexity" for one that involves the random generation of DNA strings? gpuccio's argument is not about IC. Yes, NS does act on what comes about randomly, and thus, there is a non-random component to the process. Nevertheless, the only thing NS does is to "eliminate" that which cannot either 'live' or 'compete.' NS doesn't 'form' the DNA string, it either accepts or eliminates. The "sonnet" that gpuccio is using for his example represents a "protein" that is found in nature, encoded in extant DNA. There are only so many known proteins and protein families. If each DNA string that is generated---you're not positing that DNA is generated non-randomly are you?---is generated 'randomly,' then the proteins and protein families we know of are the "survivors" of NS---as in "survival of the fittest." This means that the "sonnet" represents one of a number of acceptable forms of English words pieced together in a string of 'letters' that runs 600 letters long. It is like a protein family. The entire collection of such "combinations" represents an entirety of all such "protein families" found in nature, and, thus, presumably culled by NS from the "search space" strings of length 600 letters. Your invocation of NS does nothing to change his calculations, nor his logic.
PaV
November 10, 2014, 10:19 PM PDT
fifthmonarchyman at #81: Absolutely correct! :)
gpuccio
November 10, 2014, 10:08 PM PDT
Dionisio asked: "Why do you want to see a calculation?"
Because you IDists claim that you can calculate CSI-dFSCI-FSCO/I.
"Is that important to you? Why?"
To see if you can, and laugh at you when you can't.
"If an example is given, would you ask for another?"
Yes.
"If ten examples are provided, would you demand eleven?"
Provide ten and then we'll see.
"Is it possible, like someone suggested today, that you were hired by this blog to write what you write, in order to provoke certain folks to keep heated arguments, hence increase the number of posts in the discussion threads and increase the traffic in the blog?"
It's possible but extremely unlikely. So unlikely that it's safe to say that what you implied is incredibly childish and trollish. How old are you, 6, 7?
Reality
November 10, 2014, 10:03 PM PDT
Keith S
Exactly. The calculation is completely unnecessary.
But I must protest here, Keith! Did you recognise a design and accept it? Design all around us then, but not in biological systems? Why would that be? I'll tell you why: you have to deny design in biology because if you accept it you have to accept that you have been created by a designer. I believe that you find that idea repugnant, and I even know why, and so do you!
Andre
November 10, 2014, 9:45 PM PDT
keith s:
gpuccio, We can use your very own test procedure to show that dFSCI is useless.
Procedure 1:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Perform a pointless and irrelevant dFSCI calculation.
4. Conclude that the comment was designed.
Procedure 2:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Conclude that the comment was designed.
The two procedures give exactly the same results, yet the second one doesn’t even include the dFSCI step. All the work was done by the other steps. The dFSCI step was a waste of time, mere window dressing. Even your own test procedure shows that dFSCI is useless, gpuccio.
Aren't you missing something? If you omit step #3 of Procedure 1 in Procedure 2, then step #3 in Procedure 2 is completely meaningless. The whole point of gpuccio's "procedure" is to compare the recognition of "design" that is naturally made with the use of a particular language, and the values that are generated using dFSCI. Shouldn't that be clear to you? Gpuccio's dFSCI isn't useless, your Procedure 2 is useless.
PaV
November 10, 2014, 9:42 PM PDT
gpuccio Another OT: https://uncommondescent.com/evolution/a-third-way-of-evolution/#comment-527403
Dionisio
November 10, 2014, 9:28 PM PDT
#106 Adapa This is for you too:
https://uncommondescent.com/intelligent-design/an-attempt-at-computing-dfsci-for-english-language/#comment-527392
Dionisio
November 10, 2014, 9:08 PM PDT
#96 fifthmonarchyman
For example I’m working on a method to evaluate the strength of forecasting models at my place of employment.
That sounds interesting.
Dionisio
November 10, 2014, 9:02 PM PDT
keith s:
Exactly. The calculation is completely unnecessary.
The purpose of a dFSCI calculation is not to convince anyone in the scientific community of its design detection worth. The purpose of a dFSCI calculation is merely for gpuccio to convince himself he was specially created by his loving God.
Adapa
November 10, 2014, 9:01 PM PDT
#103 mullerpr Interesting commentary. Thank you. BTW, I could not open the link you provided.
Dionisio
November 10, 2014, 8:59 PM PDT
#100 Me_Think Does this link answer your question?: https://uncommondescent.com/intelligent-design/an-attempt-at-computing-dfsci-for-english-language/#comment-527381
Now, can you answer the questions in this link?: https://uncommondescent.com/intelligent-design/an-attempt-at-computing-dfsci-for-english-language/#comment-527389
Thank you.
Dionisio
November 10, 2014, 8:54 PM PDT
Zachriel,
Natural processes flowing from the uniformity of classical mechanics … Is that really going to be presented as an analogue for Natural Selection? I don't think you saw the critical questions I asked. I like physical necessity when it comes to things like orbital paths for planets, chemical bonds in minerals, mechanical action etc. But the patterns created by necessity are by definition bad at carrying new information. If there is no degree of freedom there is no information carrying capacity. The clear distinction between life and the uniformity of physical processes convinced Hubert Yockey that the definition of life is informational in contrast to non-informational physical processes… The first biological information, he concluded, is an axiomatic concept not explained by natural processes.
Information Theory, Evolution, and The Origin of Life http://www.amazon.com/gp/aw/d/.....ot_redir=1
P.S. I would like you to discuss any sieve design with a mineral processing engineer, and he/she will tell you how symmetry in the behaviour of nature does not discriminate the way his/her processing plant does.
mullerpr
November 10, 2014, 8:52 PM PDT
keith s Why do you want to see a calculation? Is that important to you? Why? If an example is given, would you ask for another? If ten examples are provided, would you demand eleven? Is it possible, like someone suggested today, that you were hired by this blog to write what you write, in order to provoke certain folks to keep heated arguments, hence increase the number of posts in the discussion threads and increase the traffic in the blog? :)
Dionisio
November 10, 2014, 8:47 PM PDT
Me_Think:
The question is: do I need to calculate dFSCI to see if sonnet is designed? The answer is :no. I can see sonnet is designed without the need to calculate, so I am not sure what is being achieved here.
Exactly. The calculation is completely unnecessary.
keith s
November 10, 2014, 8:45 PM PDT
The question is: do I need to calculate dFSCI to see if sonnet is designed? The answer is :no. I can see sonnet is designed without the need to calculate, so I am not sure what is being achieved here.
Me_Think
November 10, 2014, 8:39 PM PDT
KF:
We don’t actually need to quantify to recognise, but we can quantify and the result is the quantification helps us see how hard it is for the atomic and temporal resources of the observed cosmos to arise beyond sparse search of very large config spaces implied by the possible arrangements of parts vs the tight configurational constraints implied by needs of interactive, specific functional organisation. KF
GP:
https://uncommondescent.com/intelligent-design/an-attempt-at-computing-dfsci-for-english-language/#comment-527189
Dionisio
November 10, 2014, 8:28 PM PDT
Sorry -- that should be a period, not a question mark, at the end of the quote.
keith s
November 10, 2014, 8:26 PM PDT
FMM:
Well I guess I will rest my case then.
What case? Did you make an argument?
I can think of all kinds of useful purposes [for the calculation of dFSCI]?
Please share some of them! Gpuccio hasn't been able to come up with any, and I'm sure he'd be grateful.
keith s
November 10, 2014, 8:24 PM PDT
Well I guess I will rest my case then. I can think of all kinds of useful purposes. For example I'm working on a method to evaluate the strength of forecasting models at my place of employment. The more "CSI" found in the actual data the weaker the model will be.
Peace
fifthmonarchyman
November 10, 2014, 8:10 PM PDT
FMM, This thread is about dFSCI. I'm not interested in your proposed digression. I would like to see an example in which dFSCI actually serves a useful purpose. Can you think of one? Gpuccio seems unable to.
keith s
November 10, 2014, 8:03 PM PDT
Keiths, again, we understand you think this is all a waste of time. How about this? Produce an algorithm capable of producing a 600-character English text independently, without smuggling information in through the back door. Call it whatever you want. Feel free to disregard the calculation. Do you think such an algorithm is even possible? What would convince you that it is not?
peace
fifthmonarchyman
November 10, 2014, 8:00 PM PDT
FMM:
Now why not humor us and point us to an operation that combines random and algorithmic processes that is capable of giving a false positive in gpuccio’s Turing test.
Because what you are calling "gpuccio's Turing test" isn't a test of dFSCI at all. Here's what the GTT boils down to:
1. Present a 600-character text to gpuccio.
2. If gpuccio recognizes it as meaningful English, then conclude that the text was designed.
The dFSCI calculation isn't required. It accomplishes nothing. I keep asking gpuccio for an example in which dFSCI actually does something useful, but he can't come up with one.
keith s
November 10, 2014, 7:50 PM PDT
Keiths said:
In other words, you assume design if gpuccio is not aware of an explicit algorithm capable of producing the sequence.
I say: I'm not sure if gpuccio is willing to go this far, but I would say that there are certain sequences that algorithms are mathematically incapable of producing. That means any possible algorithm. Would you disagree with this claim?
peace
fifthmonarchyman
November 10, 2014, 7:36 PM PDT
"I have just done it for ATP synthase"
I don't think you have. Ignoring other significant issues (modelling evolution from a precursor as a random goal oriented search), you've simply no idea what fraction of sequence space gives a functional ATP synthase. You guess by aligning three sequences.
1) Nice cheat on the Archaeal sequence--using the one with maximum identity to the others. It is thought to be acquired through horizontal gene transfer. True archaeal ATP synthases have far less identity, so knock it off already with these silly 50% or whatever identity across all life #s.
2) Evolution hits on solutions, and gets stuck in local optima. Rubisco is possibly the worst enzyme ever, but there it is, roughly the same in all plants. Designers (human) have already worked around it. That a sequence is conserved in evolution in NO way indicates it is the only solution in sequence space. Just a contingent solution that has persisted.
3) Despite this, some ATP synthase lineages have diversified. Plug an ATP synthase from Apicomplexa into your alignment. What is that? It doesn't align at all, save some topologies and one key arginine? So how many bits is that??? Hmm...
http://www.ncbi.nlm.nih.gov/pubmed/9425287?dopt=Abstract
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881411/
http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000418
http://www.nature.com/nature/journal/v513/n7519/full/nature13776.html
REC
November 10, 2014, 7:33 PM PDT
We get it, Keiths: you think this is all a waste of time. Understood; we read you loud and clear. Now why not humor us and point us to an operation that combines random and algorithmic processes that is capable of giving a false positive in gpuccio's Turing test. It does not have to be an evolutionary algorithm. It does not even have to include a random component; any algorithm will do. I promise once you do this small thing we will get to the nitty gritty of explaining why we think this is so important.
peace
fifthmonarchyman
November 10, 2014, 7:22 PM PDT
drc466:
1) Despite your admiration, natural selection serves as a subtractive force in a search – it reduces the number of spaces searched.
Yes, and that's a good thing. Maximizing the amount of space searched is not the "goal". Searches have a cost.
It doesn’t directly affect either the target space, or the search space, numerically – it simply reduces the number of tries...
Evolution does not seek out specific targets. It isn't "trying" to find the flagellum, or binocular vision, or opposable thumbs. If it stumbles on something good, whatever that happens to be, it keeps it. If it stumbles on something bad, whatever that happens to be, it tosses it. (The above neglects drift, of course. With drift, beneficial mutations can sometimes be lost and deleterious mutations can sometimes be fixed.) When you define a target space in terms of a specific function, as gpuccio does, you are making a huge mistake, because evolution is not seeking that specific target. It is seeking anything that improves fitness. Gpuccio compounds the error by taking the ratio of the target space to the entire search space. That makes the dFSCI number useless for anything other than a purely random search. Evolution is not a purely random search. It includes selection. Why waste time calculating a number that neglects selection?
2) “If you recognize it as meaningful English, conclude that it must [struck: be designed] have function.” When you fix this glaring error in your “logic”, it is obvious you have completely misstated the issue. The process is detect function/specificity, calculate complexity, determine design – not detect design, calculate complexity, determine design.
No, the result of the calculation simply tells us that the sequence in question could not have come about by a purely random search. We knew that already, so the calculation is pointless. All of the work gets done by the other, boolean component of dFSCI -- not the numerical value. And the boolean component of dFSCI boils down to what I described earlier:
In other words, you assume design if gpuccio is not aware of an explicit algorithm capable of producing the sequence. This is the worst kind of Designer of the Gaps reasoning. It boils down to this: “If gpuccio isn’t aware of a non-design explanation, it must be designed!”
The calculation and the result -- the number of bits of dFSCI -- are pure window dressing. They are designed to look mathy and sciencey, but they have no actual value and can be completely dispensed with.
keith s
November 10, 2014, 6:37 PM PDT