Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language


In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, ancient English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I do not have a simple way to measure the target space for the English language, so I have taken a shortcut by choosing a long enough sequence that I am well sure the target space / search space ratio corresponds to more than 500 bits, as I have clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even if I am not a linguist and not even a mathematician, I could try to define the target space more quantitatively in this case, or at least to find a reasonable upper bound for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

The search space for a random search where every character has the same probability, assuming an alphabet of 30 characters (letters, space, elementary punctuation), is easily computed as 30^600, that is about 2^2944. IOWs, 2944 bits.
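This step is easy to check with a few lines of Python (a minimal sketch, using only the 30-symbol alphabet and the 600-character length assumed above; the variable names are mine):

```python
import math

# Search space for 600 characters drawn from a 30-symbol alphabet
# (letters, space, elementary punctuation), as assumed above.
alphabet_size = 30
length = 600
search_space_bits = length * math.log2(alphabet_size)
print(round(search_space_bits))  # 2944
```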

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English

b) The average length of an English word is 5 characters.

I also make the simplifying assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200,000. The result, if I am right, is about 2^1453. IOWs, 1453 bits.
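The combination count can be checked exactly (a minimal sketch; `math.comb` gives the binomial coefficient C(n+k-1, k), which counts combinations with repetition):

```python
import math

# Combinations with repetition: C(n + k - 1, k) multisets of
# k = 120 words drawn from a pool of n = 200,000.
n, k = 200_000, 120
combinations = math.comb(n + k - 1, k)
print(round(math.log2(combinations)))  # 1453
```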

Now, obviously each of these combinations of 120 words can be arranged in 120! different orders (permutations), that is about 2^660. IOWs, 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.
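As a cross-check (a sketch under the same assumptions), the combination count times the permutation count agrees, up to rounding, with the direct count of ordered 120-word sequences, 200000^120:

```python
import math

n, k = 200_000, 120
combo_bits = math.log2(math.comb(n + k - 1, k))  # ~1453
perm_bits = math.log2(math.factorial(k))         # ~660
direct_bits = k * math.log2(n)                   # ~2113

# Multiplying combinations by 120! slightly over-counts sequences
# that contain repeated words, so combo_bits + perm_bits is a hair
# above direct_bits; both round to 2113 bits.
print(round(combo_bits + perm_bits), round(direct_bits))  # 2113 2113
```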

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200,000 English words. Or at least, a good approximation of that number (in fact a slight overestimate, since the permutations of combinations containing repeated words are counted more than once; the exact count of ordered sequences is simply 200000^120, which is again about 2^2113).

It’s a big number.

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is an upper bound for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous upper bound.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: take -log2 of the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2, 831 bits of functional information. (Thank you to drc466 for the kind correction here)

So, if we take as a measure of our target space a number which is certainly an extremely generous overestimate of the real value, still our dFSI is over 800 bits.
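The whole computation fits in a few lines (a sketch; the target-space figure is the generous upper bound derived above, not a measured value):

```python
import math

# dFSI = -log2(target / search) = search_bits - target_bits,
# using 2^2113 as the generous upper bound on the target space.
search_bits = 600 * math.log2(30)        # ~2944
target_bits = 120 * math.log2(200_000)   # ~2113 (upper bound)
dfsi_bits = search_bits - target_bits
print(round(dfsi_bits))  # 831
```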

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600 character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that if I make the same computation for a 300 character string, the dFSI value is about 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
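The length dependence can be sketched the same way (under the same assumptions: 30-symbol alphabet, 5-character average word, 200,000-word pool; note that depending on intermediate rounding the 300-character value comes out as 415 or 416 bits):

```python
import math

# dFSI as a function of string length, on the assumptions above.
def dfsi_bits(chars):
    words = chars // 5
    search_bits = chars * math.log2(30)
    target_bits = words * math.log2(200_000)  # upper bound on target space
    return search_bits - target_bits

for n_chars in (150, 300, 600):
    print(n_chars, round(dfsi_bits(n_chars)))
```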

Comments
Any comments on the computation itself?
gpuccio
November 10, 2014 at 11:22 AM PDT
mullerpr: NS can do almost nothing. Don't believe the neo-Darwinian propaganda. They have nothing. At the biochemical level, where an enzyme is needed, or a wonderful biological machine like ATP synthase, NS is powerless. I have challenged anyone here to offer even the start of an explanation for just two subunits of ATP synthase, alpha and beta. Look here, point 3: https://uncommondescent.com/intelligent-design/four-fallacies-evolutionists-make-when-arguing-about-biological-function-part-1/ Together, the two chains are 553 + 529 = 1082 AAs long. That is a search space of 4676 bits, greater than the Shakespeare sonnet. Together, they present 378 perfectly conserved amino acid positions from LUCA to humans, which point to a target space of at least 1633 bits, probably greater than the Shakespeare sonnet (we cannot say for certain, because we have only lower thresholds of complexity, 831 bits for the sonnet, 1633 for the molecule, but the molecule seems to win!). Interesting, isn't it?
gpuccio
November 10, 2014 at 11:21 AM PDT
Mullerpr, Keith S is the village dirt worshipper; he believes that dirt not only made itself but magically became alive all by itself... matter, in Keith's opinion, can create CSI and can build anything using unguided processes; highly complicated engineering marvels just poof into existence, and nothing can do it a trillion times better than a designer.... You've really missed nothing.....
Andre
November 10, 2014 at 11:14 AM PDT
keith s:
The organisms with the beneficial mutations are the ones that do best at surviving and reproducing.
That is too vague to be of any use.
Joe
November 10, 2014 at 11:13 AM PDT
keith s:
And as I pointed out above, Dembski’s CSI requires knowing the value of P(T|H), which he cannot calculate.
That is incorrect, as CSI does not appear in that paper. Also, it is up to you and yours to provide "H", and you have failed to do so. Stop blaming us for your failures.
The dFSCI number reflects the probability that a given sequence was produced purely randomly, without selection.
What is there to select if nothing works until it is all together? Why can't keith s show that the addition of natural selection would change the calculation? AGAIN, CSI and dFSCI exist REGARDLESS of how they arose. The point of using them as intelligent design indicators is that every time we have observed them and knew the cause, it has ALWAYS been via intelligent design. We have NEVER observed nature producing CSI nor dFSCI. There isn't anything in ID that prevents nature from producing CSI and there isn't anything in the equation that neglects natural selection. keith s is all fluff.
Joe
November 10, 2014 at 11:12 AM PDT
mullerpr:
How would natural selection translate into a search algorithm with a non-random search capability of selecting benefit?
That's as silly as asking "How does an unintelligent sieve know how to sort particles non-randomly by size?" The organisms with the beneficial mutations are the ones that do best at surviving and reproducing.
keith s
November 10, 2014 at 11:09 AM PDT
jerry: Now you found out! If you are interested, keith is not very expensive... :)
gpuccio
November 10, 2014 at 11:06 AM PDT
keith s: "Yes. The computation is useless, for reasons that I explain in the comments I just reposted." Good. My only interest here is that the computation is correct. :)
gpuccio
November 10, 2014 at 11:04 AM PDT
It may seem that I have paid you to increase the number of the comments in my OP.
I often wonder whether the hostility and inanity of most of the anti-ID people is due to their being double agents who produce incoherent, irrelevant comments to make the pro-ID people look good. Or perhaps they are mindless and egged on by someone who is a double agent.
jerry
November 10, 2014 at 11:04 AM PDT
mullerpr: "I think the issue is the input info required just to be able to search for English words cannot be discounted… It should at least be a dictionary full of words to be added as input to the search algorithm." You are perfectly right. That's what Dembski and Marks call "added information". The best is always Dawkins, with his magic algorithm which can find a phrase that it already knows! And if they had the whole English dictionary in the algorithm, still they could only easily find the subset of good words, but the task of finding the subset of the subset, passages with good meaning, would remain insurmountable. And if they had vast catalogues of well formed sentences, they could only find those sentences which they have, or ones similar to them. Still, a 600 character passage of original meaning would be out of range. That's why no algorithm can generate original language: algorithms have no idea of what meaning is, they can only passively recycle the meanings that have been "frozen" in them. That's why dFSCI is a sure marker of design. Unfortunately for keith! :) (keith, I am still waiting for a false positive. You can use this thread, so my comments will increase even more...)
gpuccio
November 10, 2014 at 11:03 AM PDT
gpuccio:
Old stuff.
Devastating stuff. Why should my criticisms change when your dFSCI concept hasn't?
Have you anything to say about this post?
Yes. It repeats the errors that I point out in the comments I just reposted.
Have you anything to say about the computation?
Yes. The computation is useless, for reasons that I explain in the comments I just reposted.
keith s
November 10, 2014 at 11:02 AM PDT
How would natural selection translate into a search algorithm with a non-random search capability of selecting benefit? I suppose survival or "more" successful replication also has a "say" in this so-called "almost stochastic" system of evolving flagella. That looks like a very information rich search scenario to me. The information from the combined "environment & survival" system fascinates me most... Just how much and what kind of information must be available in that system? (I suspect Keith S doesn't see it as problematic, but at least Jerry Fodor sees it) http://www.amazon.com/What-Darwin-Wrong-Jerry-Fodor/dp/0374288798 Did I miss something?
mullerpr
November 10, 2014 at 10:50 AM PDT
keith s: It may seem that I have paid you to increase the number of the comments in my OP. :) Good job!
gpuccio
November 10, 2014 at 10:46 AM PDT
Reposting this one, also: gpuccio, to Learned Hand:
I will explain what is “simple, beautiful and consistent” about CSI. It is the concept that there is an objective complexity which can be linked to a specification, and that high values of that complexity are a mark of a design origin.
gpuccio, That is true for Dembski’s CSI, but not your dFSCI. And as I pointed out above, Dembski’s CSI requires knowing the value of P(T|H), which he cannot calculate. And even if he could calculate it, his argument would be circular. Your “solution” makes the numerical value calculable, at the expense of rendering it irrelevant. That’s a pretty steep price to pay. There are indeed different approaches to a formal definition of CSI and of how to compute it, Different and incommensurable. a) I define a specification as any explicit rule which generates a binary partition in a search space, so that we can identify a target space from the rest of objects in the search space. Which is already a problem, because evolution does not seek out predefined targets. It takes what it stumbles upon, regardless of the “specification”, as long as fitness isn’t compromised. b) I define a special subset of SI: FSI. IOWs, of all possible types of specification I choose those where the partition is generated by the definition of a function. c) I define a subset of FSI: those objects exhibiting digital information. d) I define dFSI the -log2 of the ratio of the target space / the search space. This is why the numerical value of dFSCI is irrelevant. Evolution isn’t searching for that specific target, and even if it were, it doesn’t work by random mutation without selection. By omitting selection, you’ve made the dFSCI value useless. e) I categorize the value of dFSI according to an appropriate threshold (for the system and object I am evaluating, see later). If the dFSI is higher than the threshold, I say that the object exhibits dFSCI (see later for the evaluation of necessity algorithms) To infer design for an object, the procedure is as follows: a) I observe an object, which has its origin in a system and in a certain time span. b) I observe that the configuration of the object can be read as a digital sequence. 
c) If I can imagine that the object with its sequence can be used to implement a function, I define that function explicitly, and give a method to objectively evaluate its presence or absence in any sequence of the same type. d) I can define any function I like for the object, including different functions for the same object. Maybe I can’t find any function for the object. e) Once I have defined a function which is implemented by the object, I define the search space (usually all the possible sequences of the same length). f) I compute, or approximate, as much as possible, the target space, and therefore the target space/search space ratio, and take -log2 of that. This is the dFSI of the sequence for that function. h) I consider if the sequence has any detectable form of regularity, and if any known explicit algorithm available in the system can explain the sequence. The important point here is: there is no need to exclude that some algorithm can logically exist that will be one day found, and so on. All that has no relevance. My procedure is an empiric procedure. If an algorithmic explanation is available, that’s fine. If no one is available, I go on with my procedure. Which immediately makes the judgment subjective and dependent on your state of knowledge at the time. So much for objectivity. i) I consider the system, the time span, and therefore the probabilistic resources of the system (the total number of states that the system can reach by RV in the time span). So I define a threshold of complexity that makes the emergence by RV in the system and in the time span of a sequence of the target space an extremely unlikely event. For the whole universe, Dembski’s UPB of 500 bits is a fine threshold. For biological proteins on our planet, I have proposed 150 bits (after a gross calculation). Again, this is useless because nobody thinks that complicated structures or sequences come into being by pure random variation. It’s a numerical straw man. 
l) If the functional complexity of the sequence I observe is higher than the threshold (IOWs, if the sequence exhibits dFSCI), and if I am aware of no explicit algorithm available in the system which can explain the sequence, then I infer a design origin for the object. IOWs, I infer that the specific configuration which implements that function originated from a conscious representation and a conscious intentional output of information from a designer to the object. In other words, you assume design if gpuccio is not aware of an explicit algorithm capable of producing the sequence. This is the worst kind of Designer of the Gaps reasoning. It boils down to this: “If gpuccio isn’t aware of a non-design explanation, it must be designed!”
keith s
November 10, 2014 at 10:44 AM PDT
keith s: Old stuff. Have you anything to say about this post? Have you anything to say about the computation?
gpuccio
November 10, 2014 at 10:42 AM PDT
Reposting another comment comparing the flaws of CSI, FSCO/I, and dFSCI: Learned Hand, to gpuccio:
Dembski made P(T|H), in one form or another, part of the CSI calculation for what seem like very good reasons. And I think you defended his concept as simple, rigorous, and consistent. But nevertheless you, KF, and Dembski all seem to be taking different approaches and calculating different things.
That’s right. Dembski’s problems are that 1) he can’t calculate P(T|H), because H encompasses “Darwinian and other material mechanisms”; and 2) his argument would be circular even if he could calculate it. KF’s problem is that although he claims to be using Dembski’s P(T|H), he actually isn’t, because he isn’t taking Darwinian and other material mechanisms into account. It’s painfully obvious in this thread, in which Elizabeth Liddle and I press KF on this problem and he squirms to avoid it. Gpuccio avoids KF’s problem by explicitly leaving Darwinian mechanisms out of the numerical calculation. However, that makes his numerical dFSCI value useless, as I explained above. And gpuccio’s dFSCI has a boolean component that does depend on the probability that a sequence or structure can be explained by “Darwinian and other material mechanisms”, so his argument is circular, like Dembski’s. All three concepts are fatally flawed and cannot be used to detect design.
keith s
November 10, 2014 at 10:37 AM PDT
keith s, you are not a very critical thinker, are you? What about your objection supports your assertions? Can you highlight it, maybe? My search for an argument failed, but you seem to be convinced there is an argument. So, go for it... What would it be?
mullerpr
November 10, 2014 at 10:35 AM PDT
Another comment from that thread worth reposting here: gpuccio, We’ve been over this many times, but the problem with your dFSCI calculations is that the number they produce is useless. The dFSCI number reflects the probability that a given sequence was produced purely randomly, without selection. No evolutionary biologist thinks the flagellum (or any other complex structure) arose through a purely random process; everyone thinks selection was involved. By neglecting selection, your dFSCI number is answering a question that no one is asking. It’s useless. There is a second aspect of dFSCI that is a boolean (true/false) variable, but it depends on knowing beforehand whether or not the structure in question could have evolved. You can’t use dFSCI to show that something couldn’t have evolved, because you already need to know that it couldn’t have evolved before you attribute dFSCI to it. It’s hopelessly circular. What a mess. The numerical part of dFSCI is useless because it neglects selection, and the boolean part is also useless because the argument that employs it is circular. dFSCI is a fiasco.
keith s
November 10, 2014 at 10:32 AM PDT
I think the issue is the input info required just to be able to search for English words cannot be discounted... It should at least be a dictionary full of words to be added as input to the search algorithm. Did I miss something?
mullerpr
November 10, 2014 at 10:30 AM PDT
gpuccio, You're repeating your earlier mistakes. I already showed, in the other thread, that the dFSCI calculation is a complete waste of time:
gpuccio, We can use your very own test procedure to show that dFSCI is useless.
Procedure 1:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Perform a pointless and irrelevant dFSCI calculation.
4. Conclude that the comment was designed.
Procedure 2:
1. Look at a comment longer than 600 characters.
2. If you recognize it as meaningful English, conclude that it must be designed.
3. Conclude that the comment was designed.
The two procedures give exactly the same results, yet the second one doesn’t even include the dFSCI step. All the work was done by the other steps. The dFSCI step was a waste of time, mere window dressing. Even your own test procedure shows that dFSCI is useless, gpuccio.
keith s
November 10, 2014
November
11
Nov
10
10
2014
10:28 AM
10
10
28
AM
PDT
Tim: What do you mean? I am trying to compute the target space, that is the set of sequences that have good meaning in English. IOWs, sequences which are made of English words. If a sequence is made of other groupings of characters which are not English words, it will not have a good meaning in English and it will not be part of the target space. Did I miss something?
gpuccio
November 10, 2014 at 10:22 AM PDT
On the consciousness/brain interface I also agree, and find it very interesting that some serious science projects aim at finding the physical structure that can instantiate a mind from the fundamental property of consciousness in nature... Allen Institute’s Christof Koch on Computer Consciousness | MIT Technology Review http://www.technologyreview.com/news/531146/what-it-will-take-for-computers-to-be-conscious/ This I also see as just another agreement that mind is not matter, and mind is the only known design-capable entity.
mullerpr
November 10, 2014 at 10:21 AM PDT
This is just for fun, but uh, er, hm, how did you skip from those 30 character options right up to words?
For a 600 character text, we can therefore assume an average number of words of 120
Shouldn't that be "120 groups of characters?" What I mean is, it seems like there are many, many more strings of characters that are not good ol' words. Was this just one aspect of your kind conservatism in the math, or did I miss something?
Tim
November 10, 2014 at 10:18 AM PDT
mullerpr: The quantum level is certainly fundamental to understand conscious processes. But I think that it works as an interface between consciousness and the brain, a la Eccles. That's how conscious experiences and material events can exchange information without violating any physical law. That's how we design and, very likely, how the biological designer designed biological things.
gpuccio
November 10, 2014 at 10:10 AM PDT
mullerpr: "I am aware that Penrose sees his method as only non-reductionist and non-algorithmic, but still materialist… However I think Penrose, Nagel and Dembski (and others) are independently closing in on a post-materialist and/or post-classical mechanics explanation of reality. This looks like the stuff of a scientific revolution!" It does! And it is. :) And ID theory has a very important role in that scenario.
gpuccio
November 10, 2014 at 09:21 AM PDT
Thank you gpuccio, this is exactly the way I interpret the work of Penrose in this regard. I also see that more and more people consider consciousness from a non-materialistic, non-reductionist perspective. I actually read Thomas Nagel's "Mind and Cosmos" before "Being as Communion", and it was refreshing to see Dembski incorporating the proposed teleology from Nagel. I see this as a far more rational metaphysics than naturalism or materialism. I am aware that Penrose sees his method as only non-reductionist and non-algorithmic, but still materialist... However I think Penrose, Nagel and Dembski (and others) are independently closing in on a post-materialist and/or post-classical mechanics explanation of reality. This looks like the stuff of a scientific revolution!
mullerpr
November 10, 2014 at 09:19 AM PDT
mullerpr: I am a big fan of Penrose's argument, even if I don't necessarily agree with his proposed explanatory model for consciousness. You may also be interested in this paper: http://www.blythinstitute.org/images/data/attachments/0000/0041/bartlett1.pdf which explores similar concepts. In my opinion, consciousness is a primary reality, which has its laws and powers. Its fundamental ability to always be able to go to a "metalevel" in respect to its contents and representations, due to the transcendental nature of the "I", is the true explanation for Turing's theorem and its consequences, including Penrose's argument. The same is true for design: it is a product of consciousness, and that's the reason why it can easily generate dFSCI, while nothing else in the universe can. The workings of consciousness use the basic experiences of meaning (cognition), feeling (purpose) and free will. Design is the result of those experiences. dFSCI is the magic result of them.
gpuccio
November 10, 2014 at 09:06 AM PDT
Has anyone discussed/considered Roger Penrose's criticism of algorithmic consciousness as presented in his works like "The Emperor's New Mind"? There he uses the incompleteness theorems to show that mental activity, like the ability of Shakespeare for example, exceeds what can be accounted for in terms of algorithmic search. He proposes a non-algorithmic quantum effect. I am not qualified to give more than an interested lay person's perspective. I am reading William Dembski's "Being as Communion" and also don't see this proposal from Penrose and Hameroff being discussed.
mullerpr
November 10, 2014 at 08:58 AM PDT
