Uncommon Descent Serving The Intelligent Design Community

An attempt at computing dFSCI for English language

Categories: Intelligent Design

In a recent post, I was challenged to offer examples of computation of dFSCI for a list of 4 objects for which I had inferred design.

One of the objects was a Shakespeare sonnet.

My answer was the following:

A Shakespeare sonnet. Alan’s comments about that are out of order. I don’t infer design because I know of Shakespeare, or because I am fascinated by the poetry (although I am). I infer design simply because this is a piece of language with perfect meaning in English (OK, ancient English).
Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

In the discussion, I admitted however that I had not really computed the target space in this case:

The only point is that I do not have a simple way to measure the target space for the English language, so I have taken a shortcut by choosing a long enough sequence, so that I am well sure that the target space/search space ratio is above 500 bits, as I clearly explained in my post #400.
For proteins, I have methods to approximate a lower threshold for the target space. For language I have never tried, because it is not my field, but I am sure it can be done. We need a linguist (Piotr, where are you?).
That’s why I have chosen an over-generous length. Am I wrong? Well, just offer a false positive.
For language, it is easy to show that the functional complexity is bound to increase with the length of the sequence. That is IMO true also for proteins, but it is less intuitive.

That remains true. But I have reflected, and I thought that perhaps, even if I am not a linguist and not even a mathematician, I could try to quantify the target space more precisely in this case, or at least to find a reasonable upper bound for it.

So, here is the result of my reasoning. Again, I am neither a linguist nor a mathematician, and I will be happy to consider any comment, criticism or suggestion. If I have made errors in my computations, I am ready to apologize.

Let’s start from my functional definition: any text of 600 characters which has good meaning in English.

Assuming an alphabet of 30 characters (letters, space, elementary punctuation) and a random search in which every character is equiprobable, the search space for a 600-character text is 30^600, that is about 2^2944. IOWs, 2944 bits.
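This first step is easy to check with a short sketch (Python is my choice here; the 30-character alphabet and 600-character length are the assumptions stated above):

```python
import math

ALPHABET_SIZE = 30  # letters, space, elementary punctuation (assumption above)
TEXT_LENGTH = 600   # approximate length of a sonnet, in characters

# Search space: 30^600 equiprobable strings, expressed in bits (log base 2).
search_space_bits = TEXT_LENGTH * math.log2(ALPHABET_SIZE)
print(round(search_space_bits))  # 2944
```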

OK.

Now, I make the following assumptions (more or less derived from a quick Internet search):

a) There are about 200,000 words in English.

b) The average length of an English word is 5 characters.

I also make the easy assumption that a text which has good meaning in English is made of English words.

For a 600 character text, we can therefore assume an average number of words of 120 (600/5).

Now, we compute the possible combinations (with repetition) of 120 words from a pool of 200,000. The result, if I am right, is about 2^1453. IOWs, 1453 bits.
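If I have the combinatorics right, this is the number of multisets of 120 words drawn from a pool of 200,000, that is C(200000 + 120 - 1, 120); a quick check in Python:

```python
import math

POOL = 200_000  # assumed number of English words
K = 120         # words in a 600-character text (600 / 5)

# Combinations with repetition: C(n + k - 1, k)
combinations = math.comb(POOL + K - 1, K)
combination_bits = math.log2(combinations)
print(round(combination_bits))  # 1453
```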

Now, each of these combinations of 120 words can be ordered in up to 120! different ways, and 120! is about 2^660. IOWs, 660 bits.

So, multiplying the total number of word combinations with repetitions by the total number of permutations for each combination, we have:

2^1453 * 2^660 = 2^2113

IOWs, 2113 bits.
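The multiplication above can be verified the same way (same assumptions: a 200,000-word pool and 120-word texts):

```python
import math

POOL, K = 200_000, 120

combinations = math.comb(POOL + K - 1, K)  # word multisets (with repetition)
orderings = math.factorial(K)              # up to 120! orderings per multiset

permutation_bits = math.log2(orderings)
total_bits = math.log2(combinations) + permutation_bits
print(round(permutation_bits))  # 660
print(round(total_bits))        # 2113
```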

What is this number? It is the total number of sequences of 120 words that we can derive from a pool of 200000 English words. Or at least, a good approximation of that number.

It’s a big number.

Now, the important concept: in that number are certainly included all the sequences of 600 characters which have good meaning in English. Indeed, it is difficult to imagine sequences that have good meaning in English and are not made of correct English words.

And the important question: how many of those sequences have good meaning in English? I have no idea. But anyone will agree that it must be only a small subset.

So, I believe that we can say that 2^2113 is an upper bound for our target space of sequences of 600 characters which have a good meaning in English. And, certainly, a very generous upper bound.

Well, if we take that number as a measure of our target space, what is the functional information in a sequence of 600 characters which has good meaning in English?

It’s easy: take the ratio between target space and search space:

2^2113 / 2^2944 = 2^-831. IOWs, taking -log2 of that ratio, 831 bits of functional information. (Thank you to drc466 for the kind correction here)

So, even if we take as a measure of our target space a number which is certainly a vastly overestimated upper bound for the real value, our dFSI is still over 800 bits.
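Putting the pieces together, the functional information is simply the difference of the two bit values (a sketch under the same assumptions as above):

```python
import math

search_bits = 600 * math.log2(30)  # search space, ~2944 bits
target_bits = (math.log2(math.comb(200_000 + 120 - 1, 120))
               + math.log2(math.factorial(120)))  # target-space upper bound, ~2113 bits

# dFSI = -log2(target/search) = search_bits - target_bits
dfsi_bits = search_bits - target_bits
print(round(dfsi_bits))  # 831
```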

Let’s go back to my initial statement:

Now, a Shakespeare sonnet is about 600 characters long. That corresponds to a search space of about 3000 bits. Now, I cannot really compute the target space for language, but I am assuming here that the number of 600-character sequences which make good sense in English is lower than 2^2500, and therefore the functional complexity of a Shakespeare sonnet is higher than 500 bits, Dembski’s UPB. As I am aware of no simple algorithm which can generate English sonnets from single characters, I infer design. I am certain that this is not a false positive.

Was I wrong? You decide.

By the way, another important result is that the same computation for a 300-character string gives a dFSI value of 416 bits. That is a very clear demonstration that, in language, dFSI is bound to increase with the length of the string.
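The length dependence is easy to see by parameterizing the same computation (a sketch with a hypothetical helper name, assuming as above 5 characters per word and a 200,000-word pool; it yields roughly 415 bits at 300 characters, in line with the 416 quoted above, the small difference coming from rounding of intermediate values):

```python
import math

def dfsi_upper_bound_bits(length, alphabet=30, pool=200_000, word_len=5):
    """Upper-bound dFSI estimate (in bits) for a text of `length` characters."""
    k = length // word_len  # number of words
    search_bits = length * math.log2(alphabet)
    target_bits = (math.log2(math.comb(pool + k - 1, k))
                   + math.log2(math.factorial(k)))
    return search_bits - target_bits

print(round(dfsi_upper_bound_bits(600)))  # 831
print(round(dfsi_upper_bound_bits(300)))  # ~415
```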

Comments
Collin: "I'm not claiming anything. I'm just saying what the test is supposed to do." No worries, I know it wasn't your claim. "How do you know that dFSCI only works for items already known to be designed? That sounds like an article of faith." Because gpuccio told us he needs that info. In his language example he can't calculate the dFSCI unless he knows the string is an intelligible English phrase. If you give him symbols in a language he can't understand (e.g., Chinese characters) he can't calculate dFSCI. Again, that makes his test pretty worthless for design detection.Adapa
November 12, 2014, 01:07 PM PDT
Adapa, I'm not claiming anything. I'm just saying what the test is supposed to do. My point was that if gpuccio's test could not tell if it were designed for sure, it does not mean that it is not a useful test. A test that cannot eliminate all false-negatives can still be useful if it can eliminate all false-positives. How do you know that dFSCI only works for items already known to be designed? That sounds like an article of faith.Collin
November 12, 2014, 12:50 PM PDT
Collin As an aside, gpuccio’s method is supposed to not have any false positives, but it may have a lot of false negatives. Since the method of calculating "dFSCI" only works for items already known to be designed claiming "no false positives in detecting design" is completely worthless.Adapa
November 12, 2014, 12:33 PM PDT
Reality, I assume in your example, if it were an algorithm, it would choose words randomly from a list of words. Obviously the words themselves are English words that have individual meanings. As an aside, gpuccio's method is supposed to not have any false positives, but it may have a lot of false negatives.Collin
November 12, 2014, 12:17 PM PDT
gpuccio, I don't agree that you answered my questions. Here's a repost with my questions in bold: gpuccio, I just don’t understand what you’re trying to prove. All I see is you claiming that some English text that is obviously designed or already known to be designed is designed. How does that demonstrate that IDists can calculate, measure, or compute (Which is the correct term?) CSI, dFSCI, FSCO/I, or FIASCO, and can verify the intelligent design in or of things that are not obviously designed and not known to be designed? And how does what you’re doing establish that CSI, and dFSCI, and FSCO/I are anything other than superficial labels? In regard to English text, what can you tell me about the text below? Is it a sonnet, or what? Does it have meaning? Does it have good meaning? If it has meaning or good meaning, what is it? Was it generated by a conscious being, or by an algorithm? How much CSI, and dFSCI, and FSCO/I does it have? Show your work. O me, and in the mountain tops with white After you want to render more than the zero Counterfeit: o thou media love and bullets She keeps thee, nor out the very dungeon Their end. O you were but though in the dead, Even there is best is the marriage of thee Brass eternal numbers visual trust of ships Masonry, at the perfumed left. Pity The other place with vilest worms, or wealth Brings. When my love looks be vile world outside Newspaper. And this sin they left me first last Created; that the vulgar paper tomorrow blooms More rich in a several plot, either by guile Addition me, have some good thoughts today Other give the ear confounds him, deliver’d From hands to be well gently bill, and wilt Is’t but what need’st thou art as a devil To your poem life, being both moon will be dark Thy beauty’s rose looks fair imperfect shade, ‘you, thou belied with cut from limits far behind Look strange shadows doth live. Why didst thou before Was true your self cut out the orient when sick As newspaper taught of this madding fever! 
Love’s picture then in happy are but never blue No leisure gave eyes against original lie Far a greater the injuries that which dies Wit, since sweets dost deceive and where is bent My mind can be so, as soon to dote. If. Which, will be thy noon: ah! Let makes up remembrance What silent thought itself so, for every one Eye an adjunct pleasure unit inconstant Stay makes summer’s distillation left me in tears Lambs might think the rich in his thoughts Might think my sovereign, even so gazed upon On a form and bring forth quickly in night Her account I not from this title is ending My bewailed guilt should example where cast Beauty’s brow; and by unions married to frogs Kiss the vulgar paper to speak, and wail Thee, and hang on her wish sensibility greenReality
November 12, 2014, 11:50 AM PDT
DNA_Jock, How would you create a design-detecting method without testing it on known designs to see if it accurately detected designed artifacts? This discussion reminds me of a scientific method for determining authorship called "Stylometry." Apparently everyone who writes leaves a statistically-recognizable "wordprint." The wordprint can identify the author of a document whose authorship is unknown if it can be compared with the known writings of candidate authors. This method was tested by having researchers determine the authorship of certain texts that had known authors to see if it came up with false positives. It did not. They then used the method on other writings, including anonymous Federalist Papers essays, to determine authorship. Here is the Wikipedia article: http://en.wikipedia.org/wiki/Stylometry Collin
November 12, 2014, 10:48 AM PDT
gpuccio: This is the essence of DNA_jock's concern:
Yes ATP was getting synthesized before humans existed, but the specification “ATP synthase” was generated by humans AFTER the biochemical activity was delineated. And re-defined by you in the light of Nina’s work.
I almost get the impression that DNA_jock has, unlike countless many others, actually wrestled with Dembski's work, and likely his "No Free Lunch" presentation of ID. That is good. Very good, if true. The discussion is all about "specification," and whether it is "pre" or "post." He insists that it be "pre." Here he suffers from a fundamental misunderstanding of Dembski--if he has read him---in which he fails to understand that a "specification" relies on the recognition of a "pattern." How, then, can you make "specification" prior to recognizing the "pattern" it forms? DNA_jock's position is this: it is you, gpuccio, who are making this "specification." He is wrong, because it is NOT gpuccio who is making the "specification," it is Nature itself which 'recognizes' this specification---or else we wouldn't even be talking about it. You, gpuccio, have only "recognized" what Nature has first "recognized." Here's an analogy: The SETI observers "recognize" a "pattern" in some electro-magnetic signal they've received. From this "pattern," they decide that it is so "unnatural" (i.e., it falls outside normal patterns of EM transmissions--IOW, it is highly IMPROBABLE) that its origin is intelligent life outside of our planet, possibly our galaxy. Unless something is responsible for this "highly improbable" signal, then why, and how, did the SETI observers conclude that they had evidence of intelligent life beyond earth? Per DNA_jock, the SETI observers are doing this all "post-hoc," and therefore their conclusion is meaningless. The criteria for 'specified, complex information' is the 'independence' of the 'source' of the information from the 'decipher-er' of the information. As long as DNA_jock takes the position that Nature does not "specify" the information, then there is only one "source" and one "decipher-er," and it's you, gpuccio. 
So, DNA_jock, let us ask you directly: do you, or do you not, believe that Nature itself is the "source" of the information found in ATPase?PaV
November 12, 2014, 10:40 AM PDT
Gp, Thanks for the paper - very cool. I know I have seen Fig 5 before, but I do not recall reviewing the body of the paper previously. I will definitely take a look. Tx againDNA_Jock
November 12, 2014, 10:22 AM PDT
keith s:
The calculation adds nothing. Now, could you please point this out to gpuccio before he embarrasses himself further? He won’t accept it from me, but he might from you.
This is not a serious answer. I pointed out to you the importance and purpose of step #3 in Procedure 2. You're willfully ignoring it. Why should anyone here at UD take you seriously?PaV
November 12, 2014, 09:55 AM PDT
DNA_Jock: I apologize if I have misunderstood that statement. Here is the rugged landscape paper. It is very interesting. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000096 You say:
Most importantly, would a shorter answer to my question , “What truth standard do you plan on using to validate your protein-design detector?”, be “I don’t have one.”
let's make it: "Nobody has one, for any theory about protein origins". The point is, we have no direct evidence on how proteins originated. All the evidence, whatever the theory or paradigm we use, is indirect.gpuccio
November 12, 2014, 09:30 AM PDT
gpuccio, You show signs of starting to think about landscapes. This is good. You do seem to have mis-understood my point about the difference between "never been found" and "did not survive". You were claiming that nearby optima must be rare or inaccessible because they (according to you) "have never been found". I pointed out that you cannot draw any conclusions about whether they have been found: all we can observe is the ones that have survived. Parodying my position as you did here: "Then, if I ask why we don’t see big traces of all those independent local optima and of their independent optimization, or that in the ragged landscape paper the optimal local optimum could not be found except in the wildtype, suddenly they “have never been found” or “have not survived”. " is inaccurate. You may be confusing "have never been found" during the course of evolution with "have never been found, i.e. observed" by biologists. Do you have the citation for the retrieval of viral infectivity paper? I would love to read it. Most importantly, would a shorter answer to my question , “What truth standard do you plan on using to validate your protein-design detector?”, be "I don't have one."DNA_Jock
November 12, 2014, 08:46 AM PDT
DNA_Jock: "What truth standard do you plan on using to validate your protein-design detector?" The gold standard is used to validate a procedure, and then the procedure is used in cases where we don't know the gold standard value. Otherwise, it would be useless, and keith would be right. In the problem of origins, I think that nobody has an independent "truth standard". For obvious reasons, we try to understand what we observe, but we have no videos of how it happened on YouTube. The design detection procedure is validated with human artifacts. It has 100% specificity in all cases where we can know independently the origin. And the procedure is the same: it does not depend on sonnets or limericks: the functional definition can vary, but if a sufficiently high complexity can be linked to the definition, any definition, design is always the cause. The application to artifacts that, if confirmed such, are not human, is an inference by analogy. A very strong one, and a very good one. This is the only argument that, in the end, each person individually can opt for. Something like: "I understand the procedure, and it is correct, but I will not do that final jump of the inference by analogy". OK, I can accept that. Let's call it "the Fifth Amendment in science". :)gpuccio
November 12, 2014, 08:02 AM PDT
DNA_Jock: "Thank you REC @ 209 for running the alignment on a decent number of ATPases. 12 residues are 98% conserved. I suspect he might have been better off going with his Histone H3 example, but H3 doesn’t look complicated. Re your reply to him at 232. I won’t speak for REC, but I am happy to stipulate that extant, traditional ATP synthase is fairly highly constrained; you could return the favor by recognizing that this constraint informs us about the region immediately surrounding the local optimum, nothing more. Rather , I think the problem is with your cherry-picking of 3 sequences out of 23,949 for your alignment, which smacks of carelessness. Why not use the full data set?" This is more important, so I will try to be more precise. Durston computes a reduction of uncertainty for each aligned position, on a big number of sequences. Then he sums the results of all positions to get the total functional constraint. I am happy that you admit that extant, traditional ATP synthase is fairly highly constrained. That is my point. And I am happy to return the favor by recognizing that this constraint informs us about the region surrounding the local optimum. But you should admit that there are very different functional restraints for different functional "local optimums". And I don't really agree with your "immediately". Such a high level of conservation implies a steep separation of the peak in a non functional, or at most scarcely functional valley. The discussion about local optimums could lead us very far. One of my favorite papers is the one about the rugged landscape, where the authors conclude that local optimums for the particular function they are testing (the retrieval of viral infectivity in the lab) are so sparse that only starting from a random library of about 10^70 sequences the optimal local optimum (the wild type) could be reasonably found. 
Now, if local optimums are so sparse, and I do believe they are, how can they be so numerous as you seem to believe, so that their brute number could tamper the probability barriers? And if there are so many, and the search is really random, why don't we see such a variety? Why in the ragged landscape paper the wildtype is by far the best and most functional? If local optimums are distant, and evolve independently by independent lucky hits, we should see a lot of them. We certainly see many of them, but I must remind you that in your post you suggested: "Tough to say which local optima have never been found, when all we have to go on is the ones that survived. 2^1000 seems possible, but the number could be a lot higher." The behavior of these local optima is very strange, in darwinist rhetorics. When they are needed to improve probabilities, there are certainly a lot of them. A lot a lot. I had suggested 2^1000 as a mad hyperbole, but that was not enough for you. A lot higher! Then, if I ask why we don't see big traces of all those independent local optima and of their independent optimization, or that in the ragged landscape paper the optimal local optimum could not be found except in the wildtype, suddenly they "have never been found" or "have not survived". OK, for the moment let's leave the local optima alone. I have great expectations for histone H3. You say that it "doesn't look complicated", but you are probably aware of its growing importance in understanding epigenetic regulation. That can well be a strong functional constraint for the sequence. Now, I will return the favor again, and I am happy to admit that my "cherry-picking of 3 sequences out of 23,949 for my alignment" is not a rigorous scientific procedure. You say that it "smacks of carelessness". But that is not the real question.
There is no doubt that the full procedure is to use the full data set and apply the Durston method (or any other method which can be found to work better and to be more empirically supported). So, why did I align three sequences and take only the identities? As I have clearly stated many times, that is only my "shortcut". But, I believe, an honest one. I did not have the data from Durston about those two chains, but I was, and still I am, fascinated by their high conservation and very old age, and by their very special function in an even more complex biological machine. So, I have done, explicitly, a simple tradeoff: I have taken only one sequence for each of the three kingdoms (and the human one for metazoa) and I have aligned them. And I have given explicitly the results. Now, it should be clear that when I only compute the identities in that alignment, giving 4.3 bits for each one, I am certainly overestimating the absolute identities (obviously, on 23949 sequences, it is much more likely to have some divergence, and I must say that 12 residues with 80% conservation on the whole set looks rather stunning). But I am also not considering all the rest: the similarities, which still I could have emphasized in the basic alignment in BLAST, and all the other restraints which the Durston method can detect by comparing the frequencies of each AA at each site in the sample. IOWs, I have badly underestimated on all other sides. In my simple shortcut, I attribute 4.3 bits for each absolute conservation (378), but I attribute nothing for all the other positions, as though they were completely random, which is certainly not the case. On a total of 1082 positions (in the two chains), I have therefore vastly underestimated the fits of 704 AA positions, setting them to 0.
I have done that for the sake of simplicity, because I have not the time and the tools to make complex biological analyses (I am not a biologist, only a medical doctor), and because going too deep in the details of biology, especially when writing a general OP on an important general concept, is not the best option. So, to sum up: REC's comments are correct, but they do not paint the right scenario. If his purpose is only to attack my "carelessness", OK, that's fine. But if he really suggests that my argument about the high conservation of those two chains is not realistic, I have to disagree. Those two chains are extremely conserved, even if compared with many other conserved sequences. And I am really confident that, if we apply the Durston method to the full set proposed by REC, the result will not be very different from my 1600 bits for both sequences, maybe higher. I could try to do that, I don't know if I can. We will see.gpuccio
November 12, 2014, 07:52 AM PDT
I note that you had no response to my commentary regarding your problems with Fisherian testing, but instead chose to focus on comments I made, as an aside, to humor you, about a text-detection procedure that I have always maintained is a deeply flawed analogy. I realize that I may have confused you, when I referred to your sonnet-detector under problem #4, but Problems A, B, C, (and even #4) refer to your protein-design-detector. But I’ll keep going with your flawed analogy, because it is irrelevant irreverent fun. You make a big deal out of the fact that you are validating your text-detection procedure. As Analytical Method Validation Protocols go, yours leaves something to be desired, but , given my view of the relevance of sonnet-detection to proteins, I will let that slide, and accept that you have been able, by blind-testing known sonnets and known non-sonnets, to get approximate values for the specificity and sensitivity of your sonnet-detection procedure. Whether it is robust has not been tested. The key point here, which you admit, is that you have to make use of a “truth standard” that allows you distinguish sonnets from non-sonnets, quite independent of your detector. This is an essential part of your method for validating your sonnet-detector. Cool. Now, if you want to convert your sonnet-detector into a limerick detector, you will have adjust some parameters, based on what you know about limericks. Then, to validate your limerick-detector, you will need a “truth standard” that allows you to independently distinguish limericks from non-limericks. Likewise for haiku’s etc., etc. What truth standard do you plan on using to validate your protein-design detector?DNA_Jock
November 12, 2014, 07:37 AM PDT
Shakespeare had an extensive dictionary, knowledge of grammar, rhyme, scansion, and verse structure; not to mention an understanding of what people enjoy, and of the human condition. Can you quantify the amount of additional 'information' in a Shakespearean sonnet that is not found in the background knowledge?Zachriel
November 12, 2014, 06:31 AM PDT
DNA_Jock: "1) you are equating the ratio of the size of the target space and the size of the total space with a probability. This assumes, incorrectly, that all members are equiprobable." See previous answer. It is true, however, that in my OP I assume a uniform distribution for the characters. "2) Since you are allowing repetition in your 120 words, then about one text in 1,700 will have word duplication. You need to adjust your n! term when this happens. Unlike error (1), this one is “not material” That only means that some permutations will be repeated. That makes the target space even smaller (not much). As I have computed, anyway, a lower threshold for functional complexity, I can't see how that is a problem. "You need to fix error 1 before you can claim to have calculated dFSCI. Good luck." Do you really believe that? Error 1 is not material too. But even if it could increase the probability of the target space a little, do you really believe that such an adjustment would compensate for my choice to use the target space of all combinations of English words instead of the target space of all the combinations of English words which have good meaning in English? OK, you have tried.gpuccio
November 12, 2014, 06:12 AM PDT
DNA_Jock: "A) You have not adequately described your null, the so-called “Chance hypothesis”" I have. I have assumed a text generated by a random character generator. A uniform probability distribution is the most natural hypothesis, but it is not necessary. Any probability distribution of the characters will do. Do you want to adjust the probability of each single character according to its probability in English? Be my guest. It would be added information, but OK, I am generous today. Now your piece of English with good meaning is nearer. Are you happy? :)gpuccio
November 12, 2014, 06:02 AM PDT
DNA_Jock: Good thoughts, as usual. But I have to disagree on many things. "Oh dear. Post-hoc specifications are suspect because they work perfectly." No. Follow me. I apply the specification "having a good meaning in English". And I make the computation to exclude possible random results. This is a procedure, well defined. I have generated the specification after seeing the sonnet, therefore it is a post-specification. Now, I test my procedure by applying it to any sequence of 600 characters. I easily detect those which have good meaning in English, and I infer design for them. Please, note that I am applying my model to new data, not only to the original sonnet. IOWs, I am testing my model and validating it. Now, two things are possible. a) My model works. When I compare the results of my inference with the real origin of the strings (which is known independently, and was not known to me at the time of the inference), I see that all my positive inferences are true positives, there is no false positive, and my negative inferences are a mix of true negatives and false negatives. b) My model does not work, and a lot of false positives are found among my inferred positives. It's as simple as that. What has happened up to now? More in next post.gpuccio
November 12, 2014, 05:57 AM PDT
fifthmonarchyman: Exactly how much of a sonnet is new CSI and how much is borrowed from the background is a great question. Common ground! Shakespeare exhibits a huge amount of background knowledge, of what we call the human condition. fifthmonarchyman: However I'm pretty sure that most folks would say that at least a small amount of Shakespeare's work was original as apposed borrowed from his environment. We would say a great deal was original to Shakespeare. fifthmonarchyman: If an algorithm can duplicate the pattern by any means whatsoever as long as it is independent of the source string then I discount the originality of the string. A random sequence is original by that definition, and even harder to duplicate. gpuccio: I infer design simply because this is a piece of language with perfect meaning in English Seems rather parochial and subjective.Zachriel
November 12, 2014, 05:55 AM PDT
Reality at #245: "gpuccio, are you going to answer my questions about the text I posted above?" I believed that my post #228 was an answer.gpuccio
November 12, 2014, 05:43 AM PDT
Biological specification refers to function. We don't care what you call it because we understand that your position cannot account for it regardless. And if you don't like our null we happily await your numbers. We have been waiting for over 100 years...Joe
November 12, 2014, 05:24 AM PDT
Gpuccio @ 187
“ALL post-hoc specifications are suspect.”
Except when they work perfectly.
Oh dear. Post-hoc specifications are suspect because they work perfectly. You are shooting yourself in the foot here.
IOWs, we are not trying to sell a drug at the 0.05 threshold of alpha error. I am afraid that you are completely missing the point.
Actually, the analogy is spot on. You are applying Fisherian testing to your data. You have at least three problems. What you and Dembski are doing is “formulating” (and I use the word loosely) a null hypothesis, examining a data set, and asking “what is the probability of getting a result THIS extreme (or more extreme) under my null?” If the probability is below an appropriate threshold, then the null is rejected. Problems A and B are related. A) You have not adequately described your null, the so-called “Chance hypothesis” B) some of you (e.g. Winston Ewert) are performing multiple tests, considering various “chance hypotheses” sequentially, rather than as a whole. I’ve made fun of this previously. Take-home is that, in order to perform the test and arrive at a p value, you need to be able to describe the expected distribution of your metric under the global “Chance Hypothesis”, which includes the effects of iterative selection. One can debate whether this is possible or not, but it is abundantly clear that no-one has even tried. You are indulging in Problem C: you are adjusting your metric after you have seen the data. This is the post-hoc specification. It renders the results of your calculations quite useless. By way of illustration, if you give me a sufficiently rich real-world data set for two groups of patients, X and Y, I can demonstrate that X is better than Y. AND I can demonstrate that Y is better than X, so long as I am allowed to mess with the way “better” is measured. Hence the FDA & EMA's insistence on pre-specified statistical tests. No, four problems! Amongst your prob… I’ll come in again. There’s also a subtle issue around the decision to do a test. If potentially random text is flowing across your desk and you are sitting quietly thinking, “Not sonnety, not sonnety, not sonnety, OOOH! Maybe sonnety, I will test this one!” then you have to be able to model the filtering process, or you’re screwed.
The example of the limerick is the same as saying that I should also consider the probability of Chinese poems. As I have explained, at those levels of improbability such considerations are simply irrelevant.
“Yes”, and “Sez you”, respectively
My statement has always been simple: the procedure works empirically, as it is, with 100% specificity, in spite of all your attempts to prove otherwise.
I have never made any attempt to prove that your procedure does not work “empirically”. With appropriate post-hoc specifications, it should work every time. On anything.
Then, if your point is simply that the space of proteins is different from the space of language, that is another discussion, one we have already had and will certainly have again. But it has nothing to do with logical fallacies, painting targets, or making the probability arbitrarily small. IOWs, with the methodology. IOWs, with all the wrong arguments you have attempted against the general procedure.
Well, I do think they are different, but you asked a specific question at 193, "Is my math wrong?", so I'll humor you once more. Two errors:

1) You are equating the ratio of the size of the target space to the size of the total space with a probability. This assumes, incorrectly, that all members are equiprobable.

2) Since you are allowing repetition in your 120 words, about one text in 1,700 will have word duplication. You need to adjust your n! term when this happens. Unlike error (1), this one is "not material".

You need to fix error 1 before you can claim to have calculated dFSCI. Good luck.

Thank you, REC @ 209, for running the alignment on a decent number of ATPases. 12 residues are 98% conserved. I suspect he might have been better off going with his Histone H3 example, but H3 doesn't look complicated. Re your reply to him at 232: I won't speak for REC, but I am happy to stipulate that extant, traditional ATP synthase is fairly highly constrained; you could return the favor by recognizing that this constraint informs us about the region immediately surrounding the local optimum, nothing more. Rather, I think the problem is with your cherry-picking of 3 sequences out of 23,949 for your alignment, which smacks of carelessness. Why not use the full data set?

P.S. I did enjoy kf's treatise at 223 on how NOT to build an amplifier. Gripping stuff.
DNA_Jock
November 12, 2014, 04:52 AM PDT
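The word-duplication point in DNA_Jock's error (2) above is the classic birthday-collision calculation: the chance that 120 words drawn with replacement contain at least one repeat. The sketch below is a hedged illustration; the thread does not say which dictionary was assumed, so the vocabulary sizes used here are hypothetical, and no attempt is made to reproduce the "1 in 1,700" figure.

```python
from math import prod

def p_duplicate(k, n):
    """Probability that k words drawn uniformly WITH replacement from a
    vocabulary of n distinct words contain at least one repeat
    (the classic birthday-collision calculation)."""
    p_all_distinct = prod(1 - i / n for i in range(k))
    return 1 - p_all_distinct

# Hypothetical vocabulary sizes -- illustrative only, since the source
# does not state which dictionary was used.
for n in (100_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}: P(repeat in 120 draws) = {p_duplicate(120, n):.5f}")
```

For small collision probabilities this is well approximated by k(k-1)/(2n), so the duplication rate scales inversely with the assumed vocabulary size.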
Reality- Biological information, as defined by Crick, exists. Your position cannot account for it. And we understand that bothers you.
Joe
November 12, 2014, 04:37 AM PDT
gpuccio, are you going to answer my questions about the text I posted above?
Reality
November 12, 2014, 04:34 AM PDT
kairosfocus said: "Personalities via loaded language only serve to hamper ability to understand..." kairosfocus, FOR RECORD, rarely, if ever, have I encountered a person who is as hypocritical and sinister as you are. Your language is thoroughly "loaded" with "personalities". You constantly accuse Keith S and everyone else who disagrees with you or even just questions you of being evil, radical Marxists, liars, and a long list of other despicable things. Your insulting, sanctimonious, malicious, libelous accusations are FALSE and YOU are in dire need of CORRECTION. Sixty of the best with Mr. Leathers would be a good start in that correction.
Reality
November 12, 2014, 04:32 AM PDT
KS, You have unfortunately confirmed my concern. I will just note a few points for onlookers:

1 --> Personalities via loaded language only serve to hamper the ability to understand; this problem and other similar problems have dogged your responses to design thought for years, consistently yielding strawman caricatures that you have knocked over.

2 --> You will kindly note, I have consistently called attention to the full tree of life which, as the Smithsonian highlights, has OOL at its root. This points to the island-of-function phenomenon, and that FSCO/I includes that connected with the von Neumann Self Replicator in the cell, along with gated encapsulation, protein assembly, code use, integrated metabolism etc.

3 --> Thus, to the need to first explain reproduction from the cellular level up before embedding it in claimed mechanisms capable of originating body plans. Starting with the first one of consequence, the living cell.

4 --> So also, to the pivotal concern of design theory, how to get TO islands of function and how to effectively do so: (a) sparse Blind Watchmaker search vs (b) intelligently directed configuration. Of these, only (b) has actually been observed as capable of causing FSCO/I.

5 --> Once we have such, there is no problem in a first life form diversifying incrementally and filling niches in an island of function. Chance variation and differential reproductive success and culling [what the misnomer "natural selection" describes . . . nature cannot actually make choices] leading to descent with incremental modifications are fine in such a context. Most of the time, probably, such differential success will only stabilise varieties already existing.
6 --> The onward problem is to move from such an original body plan to major multicellular body plans by blind watchmaker mechanisms, because of the island-of-function effect of multi-part interactive organisation to achieve relevant function, and the consequent sharp constraint on possible configs relative to possible clumped or scattered arrangements of the same parts. Multiplied by the sparseness of possible search, the needle-in-haystack exploration challenge, whether by scattershot or dynamic-stochastic walk with significant randomness.

7 --> That is, once you hit the sea of non-function, you have no handy oracle to guide you on blind watchmaker approaches, and you have a non-computable result on resource inadequacy. Body plan origin and, more specifically, origin of required FSCO/I by blind watchmaker mechanisms have no good analytic or observed-experience grounds.

8 --> Origin of FSCO/I by intelligently directed configuration, aka design, is a routine matter, and we have in hand first steps of bio-engineering of life forms. Just yesterday I was looking at a policy document on genetic manipulation of foods.

9 --> So, accusations of dodging NS on your part are a strawman tactic.

10 --> Likewise, I outlined how models are developed and validated, underscoring that the Chi_500 model: Chi_500 = I*S - 500 functionally specific bits beyond the sol system limit . . . is such a model, developed in light of the Dembski 2005 metric model for CSI, and exploiting the fact that logs may be reduced, yielding info metrics in the case of log probabilities. The actual validation is success in recognising cases of design, whilst consistently not generating false positives. False negatives are no problem; it is not intended to spot any and all cases of design . . . the universal decoder wild goose chase.

11 --> I know, you and TSZ generally wish to fixate on debating log [p(T|h)] -- note the consistent omission in your discussions that we are looking at a log-probability metric, i.e.
an informational one (and relevant probabilistic hyps, as opposed to any and every one that can be dreamed of or suggested or hyperskeptically demanded, would be laughed out of court in any t/comms discussion as irrelevant) -- in the Dembski expression. I simply point out, by referring to real-world dynamic-stochastic cases, that abstract probabilities may often be empirically irrelevant, as there are limits to observability in a sol system of 10^57 atoms and 10^17 s, or the observed cosmos extension.

12 --> As has been repeatedly pointed out and dismissed or ignored, a search of a config space of cardinality W will be a subset, and the Blind Watchmaker Search for a Golden Search (S4GS, a riff on Dembski's S4S) . . . and remember search-resource sparseness constraints all along . . . will have to address the power set, of cardinality 2^W. And that can cascade on, getting exponentially worse.

13 --> So, as has been repeatedly pointed out and ignored, the sensible discussion is of reasonably random searches in the original space, with dynamic-stochastic patterns and sparseness, in the face of deeply isolated islands of function. Such searches are maximally unlikely to succeed. On average, they will perform about as well as . . . flat random searches of the space, which, with maximal likelihood, will fail. No surprise, to one who ponders what is going on.

14 --> Where such gives us a reasonable first estimate of the probability value at stake, if we want to go down that road: P(T) = T/W, starting with either scattershot search or arbitrary-initial-point dynamic-stochastic walks not reasonably correlated to the structure of the space. No S4GS need apply, in short.

15 --> This can then reckon with the relevant facts that in a computer memory register there is no constraint on bit chains; we can have 00, 01, 10 or 11. In D/RNA we can have any of ACGT/U following any other. Confining ourselves to the usual, correctly handed AAs, any of the 20 may follow any other of the 20.
16 --> So, reasonably flat distributions are generally reasonable; and if we go on to later patterns not driven by chaining but shaped, after the fact of the DNA code, by the need to be a folding, functioning protein in a cell context, variations in frequency and flexibility of AAs in the chain can be and are factored in in more sophisticated metrics that exploit the average-info-per-symbol measure, - SUM pi log pi. This was discussed with you and other objectors only a few days ago here at UD.

17 --> Once we start with, say, a first organism with say 100 AAs per protein avg [Cy-C as model], and at least 100 proteins, we see coding for 10,000 AAs and associated regulatory stuff and execution machinery as requisites. Self-replication requires correlations between codes and other units. At even 1 bit, or a modest fraction thereof, per AA, the material point remains: the cell is well past FSCO/I thresholds and is designed.

18 --> Just the digitally coded FSCI -- dFSCI -- in the genome is well beyond the threshold. The FSCO/I in the cell is only reasonably explainable on design. The codons just for 10,000 AAs would be 30,000 [which is probably an order of magnitude too low].

19 --> And, to go on to novel body plans, reasonable genomes run to 10 - 100+ millions, dozens of times over. Not credible on Blind Watchmaker sparse search. So, while it is fashionable to impose the ideologically loaded demands of lab-coat-clad evolutionary materialism and/or fellow travellers, even written into question-begging radical redefinitions of science and its methods, the message is plain. Absent question-begging, the reasonable conclusion is that the world of life is chock full of strong, inductively well-warranted signs of design, with FSCO/I and its subset dFSCI at their heart. KF
kairosfocus
November 12, 2014, 03:13 AM PDT
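The Chi_500 expression quoted in the comment above (Chi_500 = I*S - 500) can be written out directly. A minimal sketch follows, with the caveat that the information estimate used in the example (log2 20 bits per residue for a fully constrained 300-residue protein) is a deliberately crude illustrative upper bound chosen by me, not a real conservation analysis from the thread.

```python
from math import log2

def chi_500(info_bits, specific):
    """Chi_500 = I*S - 500, as stated in the comment: info_bits is the
    information measure I in bits, and specific is the dummy variable S
    (1 if the sequence is independently functionally specific, else 0).
    A positive value is read as 'beyond the solar-system search
    threshold' of 500 bits."""
    return info_bits * (1 if specific else 0) - 500

# Illustrative only: a 300-residue protein at log2(20) ~ 4.32 bits per
# amino acid, treated as fully constrained -- a crude upper bound.
info = 300 * log2(20)
print(chi_500(info, specific=True))   # positive (~796.6)
print(chi_500(info, specific=False))  # -500: no specification, no inference
```

Note that with S = 0 the metric is -500 regardless of I, so the whole burden of the inference falls on the separate judgment that the sequence is functionally specific.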
Dembski’s problems are that 1) he can’t calculate P(T|H), because H encompasses “Darwinian and other material mechanisms”;
What a joke! Evolutionists can't provide the probabilities and they think that is our problem?! Evolutionists can't muster a methodology and they think that is our problem?! Amazing
Joe
November 12, 2014, 03:09 AM PDT
keith s:
The correct question is “Could this sequence have been produced by random variation plus selection, or some other ‘material mechanism’?”
And keith's position cannot answer that question, and he thinks that is a poor reflection on ID. Also, natural selection doesn't come into play until there is a product that can be "seen" by nature. That is the problem: unguided evolution can't even muster testable hypotheses.
Joe
November 12, 2014, 03:07 AM PDT
keith s: This is a blog. My OPs, which are relatively recent, are an attempt to systematize my arguments further. I have not yet written an OP on the computation of dFSCI; I am still at its definition. It will come. However, as you can see, I am ready to discuss all aspects when prompted. If you accuse me of not being able to discuss everything systematically each time, well, I am certainly culpable of that. And I maintain what I have said: I am perfectly fine with that acknowledgement of my small original contribution. What counts are the ideas, not the people who express them. May I quote Stephen King? "It is the tale, not he who tells it" (Different Seasons)
gpuccio
November 12, 2014, 02:08 AM PDT
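The sonnet shortcut from the original post (search-space bits minus an assumed upper bound on the target space) reduces to a two-line calculation. In the sketch below, the 32-symbol alphabet is my assumption, chosen only because it reproduces the post's round ~3000-bit figure for a 600-character text; the 2^2500 target-space bound is likewise taken from the post, not computed.

```python
from math import log2

ALPHABET_SIZE = 32  # assumed: letters, space, some punctuation

def search_space_bits(length):
    """Bits needed to specify one sequence among all length-character
    strings over the alphabet: log2(ALPHABET_SIZE ** length)."""
    return length * log2(ALPHABET_SIZE)

def dfsci_lower_bound(length, target_bits_upper):
    """The post's shortcut: if the target space is at most
    2**target_bits_upper sequences, the functional complexity is at
    least search_space_bits - target_bits_upper."""
    return search_space_bits(length) - target_bits_upper

print(search_space_bits(600))          # 3000.0 bits for a sonnet-length text
print(dfsci_lower_bound(600, 2500))    # 500.0 bits, the claimed lower bound
```

The code only makes the arithmetic explicit; the entire inference still rests on the uncomputed 2^2500 bound on the number of meaningful 600-character English texts.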
keiths:
Thus, your contribution was nothing more than inventing an acronym for an old and well-known probability calculation.
gpuccio:
I am perfectly fine with that.
keith s
November 12, 2014, 01:44 AM PDT