Uncommon Descent Serving The Intelligent Design Community

Evolutionist: You’re Misrepresenting Natural Selection


How could the most complex designs in the universe arise all by themselves? How could biology’s myriad wonders be fueled by random events such as mutations?

Comments
Elizabeth:

And I see other papers that do attempt explanations.

Please quote a paper that gives an explicit evolutionary path for a new protein domain. Otherwise, accept that no such path has ever been shown.

My fundamental point here (and where I do have some expertise) is not the details of protein evolution, but the method by which you calculate the probability when you simply do not have the information with which to calculate all possible evolutionary paths

If that is your position, we can stop here. I have said many times that "possible evolutionary paths" are not a scientific argument unless they are shown to exist. I will calculate the probability according to what is known, not according to what is imagined or merely hoped. This is science, not a silly game. So, if you want to go on with a scientific methodology, let's go on, but know that I will not consider "possible unknown evolutionary paths", of which nobody has ever shown that they are possible, as a scientific argument, and I will never accept that it is my duty to show that they are not logically possible. I am satisfied that they have never been found, and therefore they do not exist empirically as part of an explanation.gpuccio
January 16, 2012, 01:51 AM PDT
Why should we believe that A becomes B?Petrushka
January 15, 2012, 06:12 PM PDT
It is clear that this is not your field. There is no explanation of the origin of protein domains, as you can see from Axe’s paper about that issue.
No, I cannot see that. And I see other papers that do attempt explanations. So why should I (as a non-expert) accept your assertion (or Axe's) that there is no explanation? My fundamental point here (and where I do have some expertise) is not the details of protein evolution, but the method by which you calculate the probability when you simply do not have the information with which to calculate all possible evolutionary paths, nor, if you did, to calculate what possible reproductive benefit, in what environments, each intermediate step might confer.Elizabeth Liddle
January 15, 2012, 04:04 PM PDT
Well obviously it is not an analogy to push very far! I just used it to demonstrate that there are two senses of "relatedness" here, and there is a danger of conflating them.Elizabeth Liddle
January 15, 2012, 03:36 PM PDT
Dr Liddle, As regards word chains, it is a stretch of the imagination to assume that semantics can change "en route". If I now all of a sudden start overriding English words with new meanings without telling you about it, that will be the end of our communication. For anything like word chains to happen in practice, one needs to assume that the sender and the receiver must agree a priori (or be told synchronously) on semantics for the entire length of the word chain. Semantics is totally independent of physicality. Another example is Ken Miller's suggested preadaptational use of Behe's malfunctional mousetrap as a catapult: the system must be told of the alternative use a priori. Haphazard semantic switching is nonsensical. First, someone has to decide what information needs to be passed, and only then is its semantics instantiated into physicality. On the basis of massive observation, semantics comes first, while its instantiation happens second. In other words, (new) semantics cannot emerge spontaneously without agency. There are simply no observations whatsoever to support the assertion that the opposite can ever be the case.Eugene S
January 15, 2012, 02:19 PM PDT
Elizabeth (and Eugene): I think the best way to go on is to trace my general reasoning, and then go into details. So, here it is:

a) If biological systems were modified only by random variation, any transition could be simply modeled probabilistically. Let's follow this reasoning just to start:

a1) Our problem is how a new basic protein domain (a new protein superfamily) emerges in the course of evolution. I think we can agree that the general models are practically all transition models. I don't believe that anyone proposes that a new sequence is built "from scratch", just adding one nucleotide after another. So, I will focus on transition models (indeed, the reasoning would be very similar for non-transition models).

a2) In a transition model, the new superfamily must in some way arise from existing sequences. To be general enough, we will consider three variants: a pre-existing functional protein gene, a duplicated and inactivated gene, and a non coding sequence.

a3) In all three cases, the original sequence is unrelated to the final sequence, in the sense I have defined. If it is a functional protein gene, because it is part of another superfamily. If it is a duplicated, inactivated gene, the reason is the same. If it is a non coding DNA sequence, there is simply no reason to believe that any non coding sequence should be near, at the sequence level, to what we will obtain: that is simply too unlikely, and would amount to assuming that the starting point, for some strange reason, has already "almost found" the target.

b) The transition from A to B, to the extent that it is only the result of RV (of all kinds, including drift), can be described as a random walk. Each new state generated by any variation event is a "step" in the random walk, and a probabilistic "attempt" at generating a new sequence. Drift does not generate new states, but changes the representation of existing states in the population. But, being a random phenomenon that can affect any existing state in the same way, it does not change the probabilistic scenario.

c) How can we describe probabilistically that scenario, where only RV acts, and some specific functional target emerges? Our first approach will be to ask: what is the probability of reaching B (the specific functional target that did emerge in natural history) from A (any unrelated starting sequence)?

d) I will now make a very simple statement: if A is completely free to change, then our search space is essentially made of two gross subsets: a much smaller one (near-A), which could be defined as all the sequences that share some homology with A (let's say more than 10% homology); and a much bigger one (not-near-A), all the rest of possible sequences.

e) It should be obvious that we cannot simply apply a uniform distribution to the search space. Indeed, the sequences in near-A will be reached much more easily by a random walk, and in a smaller number of steps. The probability of being reached depends grossly on how similar each sequence is to A.

f) But B, by definition, is in not-near-A. Now, it should be equally obvious that all sequences in not-near-A have more or less the same probability of being reached by a random walk starting from A. If n_SP is the number of sequences in the search space, 1/n_SP is the probability of each sequence being reached if we apply a uniform probability distribution.

As the total probability is the sum of the probability of reaching a state in near-A plus the probability of reaching a state in not-near-A, and as the states in near-A are more likely than the states in not-near-A, we can safely assume that the states in not-near-A have a probability lower than 1/n_SP, and that their probability can be considered grossly uniform within not-near-A. That means simply that if we take the probability of each unrelated state to be simply 1/n_SP, we are certainly overestimating it.

I will stop here for now, and wait for comments (Eugene, you are obviously invited to the discussion).gpuccio
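To make the 1/n_SP bound above concrete, here is a minimal numerical sketch; the 150-amino-acid length and the 20-letter alphabet are assumptions chosen for illustration, not figures taken from the comment.

```python
from math import log10

# Illustration only: a 150-amino-acid sequence over the 20-letter
# amino-acid alphabet (assumed figures), so the search space holds
# 20^150 possible sequences.
L = 150
n_SP = 20 ** L                   # number of sequences in the search space
log10_n_SP = L * log10(20)       # log10 of n_SP, to avoid printing a huge integer

# The comment treats 1/n_SP as an overestimate of the probability that a
# random walk from A reaches any given state in not-near-A.
print(f"log10(n_SP) = {log10_n_SP:.1f}")        # about 195.2
print(f"1/n_SP      = 10^-{log10_n_SP:.1f}")    # about 10^-195.2
```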
January 15, 2012, 01:48 PM PDT
Elizabeth: I will be brief about your concerns, because otherwise I will never get to the other points.

a) The necessity relation is the way variation affects reproduction, according to a cause-effect relation between the variation in biochemical function of the affected protein and the reproductive fitness. This is a necessity relation, although you seem to do your best not to admit it. That random components can influence the final reproductive result is true, but that fact in no way "cancels" the causal relationship between protein function variation and differential reproduction. That causal relation is what interests us, because all the rest (the random components) will not influence the basic computation of probabilities, while the causal influence of protein function variation, not being random, will have a definite effect on the probabilistic scenario, as I am going to show.

This seems circular. Clearly the descendent of one domain will be "related" to the other (by definition). After many intervening generations and changes, there may be no commonality, but that doesn't mean they are unrelated.

No. That's not the meaning I am giving to "unrelated". As I have clearly stated, by "unrelated" I mean that they have no sequence homology. Nothing more, nothing less. I ask you to accept this meaning for the following discussion.

But you are assuming that two dissimilar proteins are unrelated. They may not be – it may simply be that the intermediate versions are no longer extant. Think of a word chain:

I am not discussing at this point the "possible" intermediates. That is another discussion, for another moment. Here, unrelated just means "with no primary sequence homology". Nothing else.

I don't know what "absolutely distant" means. And I don't see why you are assuming that the distance can't be traversed.

Absolutely distant just means that there is the highest distance in the sequence space, because the two sequences have no homology. Two sequences cannot be more distant than that. And I have never said, least of all "assumed", that the distance "cannot be traversed". What I said is: "No protein A can become protein B, with a completely different sequence, without passing through the 'distance' that separates the two states." That is not "the distance cannot be traversed". It is, on the contrary, "the distance must be traversed", if we have to reach B from A. Do you think it's the same concept? I have to ask you to stick more precisely to what I say, if we have to go on with the discussion.

Well, either is free to change as long as the other remains functional (if it's important.

Of course.

Well, I still don't know what you mean by "unrelated".

Well, I hope at least that is now clear.

This is not my field, but there seems to be a substantial literature on protein domain evolution. I think you are possibly making the error of regarding "protein domains" as some absolute category (rather like "species", which often suffers from a similar problem) as opposed to a convenient simplifying categorisation by scientists to refer to common evolutionary units. Have you done a literature search?

It is clear that this is not your field. There is no explanation of the origin of protein domains, as you can see from Axe's paper about that issue. My definition of protein domains is taken from the literature, and from SCOP, the database of the known proteome classification. The classification I often use (2000 independent, unrelated domains) corresponds to SCOP's concept of "protein superfamilies". The result of 6000 groupings sharing less than 10% homology is taken from SCOP, too. The following is from the SCOP site:

"Classification. Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold, described below. The exact position of boundaries between these levels are to some degree subjective. Our evolutionary classification is generally conservative: where any doubt about relatedness exists, we made new divisions at the family and superfamily levels. Thus, some researchers may prefer to focus on the higher levels of the classification tree, where proteins with structural similarity are clustered. The different major levels in the hierarchy are:

Family: Clear evolutionary relationship. Proteins clustered together into families are clearly evolutionarily related. Generally, this means that pairwise residue identities between the proteins are 30% and greater. However, in some cases similar functions and structures provide definitive evidence of common descent in the absence of high sequence identity; for example, many globins form a family though some members have sequence identities of only 15%.

Superfamily: Probable common evolutionary origin. Proteins that have low sequence identities, but whose structural and functional features suggest that a common evolutionary origin is probable are placed together in superfamilies. For example, actin, the ATPase domain of the heat shock protein, and hexokinase together form a superfamily.

Fold: Major structural similarity. Proteins are defined as having a common fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Different proteins with the same fold often have peripheral elements of secondary structure and turn regions that differ in size and conformation. In some cases, these differing peripheral regions may comprise half the structure. Proteins placed together in the same fold category may not have a common evolutionary origin: the structural similarities could arise just from the physics and chemistry of proteins favoring certain packing arrangements and chain topologies."

As you can see, at the superfamily level it is only "probable" that the proteins in the same superfamily share a common evolutionary origin. Therefore, the 2000-superfamilies grouping is still a very "generous" grouping. I usually take that number because I believe it is a fair intermediate between the "1000 foldings" concept and the "6000 groupings sharing less than 10% homology" concept. That's it for your last comments. Now, on to probabilities.gpuccio
January 15, 2012, 01:17 PM PDT
Elizabeth: Well, it seems we are beginning to communicate better. Before going on with the reasoning, let’s try to clarify the small misunderstandings we still have. Using random in the sense in which you are using it, of course drift is random. That’s why it’s called drift! It belongs in the “NS” part of your discussion, though, not the “RV” part. As you say, it does not change the genome. It is part of the process by which certain genotypes become more prevalent, as is NS. But you can dispense with drift as an additional factor simply by modelling differential reproduction stochastically. That’s OK for me: NS and drift act in similar ways: changing the representation of some genome in the population. Neither can change the genome. And still, it’s important to remember that drift, unlike NS, is totally random, because the fact that some gene becomes more represented in the population because of drift has no necessity relation with the gene itself (the effect is random), while NS expands, or reduces, the representation of genes according to a necessity relation (the causal effect of the specific variation on reproduction). If we agree on that, we can go on.
Well, as I said, you can model the two together by modelling differential reproduction stochastically, and including a bias to represent natural selection. I’d be very unhappy about calling that a “necessity relation” because there is no “necessity” that any one phenotypic feature will always result in increased (or reduced) reproduction. Whether that feature is “selected” or not has a statistical answer, not a “necessity” answer. This is why I think that the chance vs necessity distinction is unhelpful, and potentially misleading.
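A minimal sketch of what "modelling differential reproduction stochastically, including a bias to represent natural selection" could look like; the population size, starting frequency and selection coefficient are illustrative assumptions, not anyone's published model, and setting s = 0 reduces it to pure drift.

```python
import random

# Minimal sketch: each individual in the next generation inherits allele "a"
# with a sampling probability biased by a selection coefficient s.
# With s = 0 this is pure drift; with s > 0 the sampling is biased toward "a".
def next_generation(freq_a, pop_size, s=0.0):
    w_a = freq_a * (1.0 + s)          # weight of allele a, boosted by s
    w_b = 1.0 - freq_a                # weight of the alternative allele
    p = w_a / (w_a + w_b)             # biased sampling probability
    count_a = sum(random.random() < p for _ in range(pop_size))
    return count_a / pop_size

random.seed(42)                        # parameter values below are assumptions
freq = 0.01                            # starting frequency of allele a
for _ in range(200):
    freq = next_generation(freq, pop_size=1000, s=0.05)
print(f"frequency of allele a after 200 generations: {freq:.2f}")
```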
What do you mean by “unrelated”? One will be the direct descendent of the other! Or do you mean “completely different” sequence – if so, why? Yes, it means that A and B have completely different primary sequences. That is already implied by the premise (they belong to different basic domains, and basic domains have less than 10% homology, which is more or less the random similarity you can expect between completely different random sequences of that type and length).
This seems circular. Clearly the descendent of one domain will be “related” to the other (by definition). After many intervening generations and changes, there may be no commonality, but that doesn’t mean they are unrelated.
IOWs, here we are not discussing how a protein in a superfamily can be transformed into another protein of the same superfamily, with high homology, similar structure, and similar, or slightly different, function. We are discussing how a new protein domain can emerge from an existing protein domain, or from some non coding DNA. That’s why the starting sequence is certainly unrelated to the final one in the case of a protein belonging to another, previously existing, protein domain. In the case of non coding DNA, we are not sure of anything, but obviously there is no reason in the world why non coding DNA should be potentially related to the new protein domain, except in the case that it was designed for that.
But you are assuming that two dissimilar proteins are unrelated. They may not be – it may simply be that the intermediate versions are no longer extant. Think of a word chain: Hand Hard Hold Hood Food Foot. Foot is a “new domain”. If hard, hold, hood, and food are no longer extant, we cannot assign it to the same “family” as “hand”. But that doesn’t mean that “hand” wasn’t ancestral to it.
and that all the variations happen at sequence level. As opposed to what? What I mean is that it is the primary sequence that varies, at each random event. Therefore, if the primary sequences are unrelated and therefore distant in the search space, the variation has to traverse the search space anyway. We can imagine the search space with the following topology: the distance between states (sequences) is defined as the percent of homology. Sequences with less than 10% homology are absolutely distant in the search space. And that’s exactly our case. No protein A can become protein B, with a completely different sequence, without passing through the “distance” that separates the two states.
I don’t know what “absolutely distant” means. And I don’t see why you are assuming that the distance can’t be traversed.
Why? And what do you mean by “its 3D structure and function”? You mean that the protein it codes for has to not be coded for at some point in the change? What? Well, if the gene has been duplicated, and the original gene is kept functional by the effect of negative NS, then the duplicated gene is free to change in neutral mode (that would be the subject of my next post, if we can arrive there). The same is true for the start from non coding DNA.
Well, either is free to change as long as the other remains functional (if it’s important.
If the variation must happen in a functional gene, I really don’t know how that could work. The emergence of the new function would inevitably imply, long before the new function emerges, the loss of the old function. I suppose that’s why darwinists very early elaborated the model of gene duplication.
Well, that’s certainly one possible mechanism.
But if the duplicated gene changes in neutral mode, it will very early become unrelated to the original gene, while being at the same time unrelated to the new gene (unless design guides the transition, and probably even in that case).
Well, I still don’t know what you mean by “unrelated”.
At this point, I could anticipate my next point. But let’s first finish our work on what has already been said. But why should it ever reach that state? Why are you insisting that a new domain (“B”) must be totally dissimilar to its parent, “A”? That seems to me a quite unsafe assumption. I know there is a fair amount of research into the origins of protein domains, but obviously it’s not my field. But what makes you think that this research is wrong? Again, the difference between A and B is implied by the fact that they belong to different basic domains, or superfamilies. And, as I have already said, as far as I know there is no model that shows intermediates between protein domains, either in the proteome, or in the lab. I have been saying these things for years now, and no interlocutor has ever shown any “research about the origin of protein domains” that contradicts that. So, please, show me the research you think of, and in that case I will try to explain why it is wrong.
This is not my field, but there seems to be a substantial literature on protein domain evolution. I think you are possibly making the error of regarding “protein domains” as some absolute category (rather like “species”, which often suffers from a similar problem) as opposed to a convenient simplifying categorisation by scientists to refer to common evolutionary units. Have you done a literature search?
But I am not holding my breath… So, if you answer these points, I will go on about the role of NS and the computation of probabilities.
OK. I’m sort of interested in how you compute the probabilities, but I’m a bit concerned by some of your assumptions.Elizabeth Liddle
January 14, 2012, 02:46 PM PDT
GPuccio, So, if you answer these points, I will go on about the role of NS and the computation of probabilities. Could I ask you to continue regardless of whether or not you get the feedback? It is really interesting. Cheers.Eugene S
January 14, 2012, 01:58 PM PDT
Thank you for conceding. I agree, anyway.
It would be good of you to respond to the business of cousin extinction. We see it all the time when one variant is noticeably superior to another. My concession is that as the technology permits, it needs to be addressed.Petrushka
January 4, 2012, 01:07 PM PDT
Elizabeth: Well, it seems we are beginning to communicate better. Before going on with the reasoning, let's try to clarify the small misunderstandings we still have.

Using random in the sense in which you are using it, of course drift is random. That's why it's called drift! It belongs in the "NS" part of your discussion, though, not the "RV" part. As you say, it does not change the genome. It is part of the process by which certain genotypes become more prevalent, as is NS. But you can dispense with drift as an additional factor simply by modelling differential reproduction stochastically.

That's OK for me: NS and drift act in similar ways: changing the representation of some genome in the population. Neither can change the genome. And still, it's important to remember that drift, unlike NS, is totally random, because the fact that some gene becomes more represented in the population because of drift has no necessity relation with the gene itself (the effect is random), while NS expands, or reduces, the representation of genes according to a necessity relation (the causal effect of the specific variation on reproduction). If we agree on that, we can go on.

What do you mean by "unrelated"? One will be the direct descendent of the other! Or do you mean "completely different" sequence – if so, why?

Yes, it means that A and B have completely different primary sequences. That is already implied by the premise (they belong to different basic domains, and basic domains have less than 10% homology, which is more or less the random similarity you can expect between completely different random sequences of that type and length). IOWs, here we are not discussing how a protein in a superfamily can be transformed into another protein of the same superfamily, with high homology, similar structure, and similar, or slightly different, function. We are discussing how a new protein domain can emerge from an existing protein domain, or from some non coding DNA. That's why the starting sequence is certainly unrelated to the final one in the case of a protein belonging to another, previously existing, protein domain. In the case of non coding DNA, we are not sure of anything, but obviously there is no reason in the world why non coding DNA should be potentially related to the new protein domain, except in the case that it was designed for that.

and that all the variations happen at sequence level. As opposed to what?

What I mean is that it is the primary sequence that varies, at each random event. Therefore, if the primary sequences are unrelated and therefore distant in the search space, the variation has to traverse the search space anyway. We can imagine the search space with the following topology: the distance between states (sequences) is defined as the percent of homology. Sequences with less than 10% homology are absolutely distant in the search space. And that's exactly our case. No protein A can become protein B, with a completely different sequence, without passing through the "distance" that separates the two states.

Why? And what do you mean by "its 3D structure and function"? You mean that the protein it codes for has to not be coded for at some point in the change? What?

Well, if the gene has been duplicated, and the original gene is kept functional by the effect of negative NS, then the duplicated gene is free to change in neutral mode (that would be the subject of my next post, if we can arrive there). The same is true for the start from non coding DNA. If the variation must happen in a functional gene, I really don't know how that could work. The emergence of the new function would inevitably imply, long before the new function emerges, the loss of the old function. I suppose that's why darwinists very early elaborated the model of gene duplication. But if the duplicated gene changes in neutral mode, it will very early become unrelated to the original gene, while being at the same time unrelated to the new gene (unless design guides the transition, and probably even in that case). At this point, I could anticipate my next point. But let's first finish our work on what has already been said.

But why should it ever reach that state? Why are you insisting that a new domain ("B") must be totally dissimilar to its parent, "A"? That seems to me a quite unsafe assumption. I know there is a fair amount of research into the origins of protein domains, but obviously it's not my field. But what makes you think that this research is wrong?

Again, the difference between A and B is implied by the fact that they belong to different basic domains, or superfamilies. And, as I have already said, as far as I know there is no model that shows intermediates between protein domains, either in the proteome, or in the lab. I have been saying these things for years now, and no interlocutor has ever shown any "research about the origin of protein domains" that contradicts that. So, please, show me the research you think of, and in that case I will try to explain why it is wrong. But I am not holding my breath... So, if you answer these points, I will go on about the role of NS and the computation of probabilities.gpuccio
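As a quick check on the claim that less than 10% homology is roughly the similarity expected between completely unrelated random sequences, here is a small sketch; the 150-residue length and the 20-letter alphabet are assumptions for illustration only.

```python
import random

# Illustration only: two independently generated random 150-residue sequences
# over the 20-letter amino-acid alphabet. Their expected pairwise identity is
# 1/20 = 5%, i.e. below the 10% threshold used in the comment for "unrelated".
AA = "ACDEFGHIKLMNPQRSTVWY"
L = 150

def percent_identity(s1, s2):
    # Fraction of aligned positions carrying the same residue (no gaps).
    return 100.0 * sum(a == b for a, b in zip(s1, s2)) / len(s1)

random.seed(0)
a = "".join(random.choice(AA) for _ in range(L))
b = "".join(random.choice(AA) for _ in range(L))
print(f"identity between two random sequences: {percent_identity(a, b):.1f}%")
```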
January 4, 2012, 01:05 PM PDT
Elizabeth: Now, let’s model the system of random variation in a reasonable biological model. To do that, I will try to do what darwinists never do: to be specific, and to refer always to explicit causal models.
I do wish you’d stop throwing out these completely unsupported, and IMO completely erroneous generalisations. You can’t get a scientific paper published (or only with difficulty) if you aren’t specific, and, if your hypothesis is a causal one, without specifying your causal model. And yet, tens of thousands of papers on evolutionary biology are published each year.
a) First of all, what are the data that we are trying to explain? For consistency, I will stick to my usual scenario: the emergence, in the course of natural history, of about 2000 – 6000 unrelated protein domains (according to how we group them). That kind of event has happened repeatedly in natural history, and I believe that we can agree that those 2000 or so protein domains are the essential core of biological function in life as we know it. IOWs, they are very important empirical data that need to be causally explained.
OK, so you are talking specifically about the evolution of protein domains. Yes, we can agree that the protein domains that are the core of life as we know it are the core of life as we know it. We do not, of course, know that they are the core of life as it might have been had some things turned out a bit differently. Hence my repeated reminder of the importance of correct specification of the null hypothesis.
b) The main characteristics of our 2000 or so basic protein domains are: b1) they are completely unrelated at sequence level: less than 10% homology. b2) they have specific and different 3D structures due to very good and stable folding. b3) they have specific and different biochemical functions (biochemical activities, if you prefer), that can be objectively defined and measured. b4) There are no known functional intermediates that bridge those groups, neither in the proteome nor in the lab.

c) Neo-darwinism has a tentative explanation for the emergence of those protein structures: they would be the result of RV + NS.

d) Let's leave NS alone, for the moment, and consider RV. What does RV mean? It means any form of modification of the genome that is not designed and whose result cannot be described by a necessity model, but only by a probabilistic model. So, RV includes many proposed mechanisms: single point mutations, insertions and deletions, inversions, frameshift mutations, chromosomal modifications, duplications, sexual rearrangements, and so on.

e) Now, why do we call all those mechanisms “RV”? As we have discussed, each of those mechanisms operates in accordance with biochemical necessity laws. But the result of the variation cannot in any way be anticipated by a necessity model. It can only be described probabilistically. Let’s take, for example, a single point mutation: it can occur at any site, even if some sites are more likely than others. We can never say, by biochemical computations, which nucleotide will change as a result of some duplication error. But we can try to describe those events probabilistically. That is true for all the “variation engines” considered.
More or less. Even drift is random, although in a strict sense it does not change the genome, but only the representations of some genomes in the population. Using random in the sense in which you are using it, of course drift is random. That’s why it’s called drift! It belongs in the “NS” part of your discussion, though, not the “RV” part. As you say, it does not change the genome. It is part of the process by which certain genotypes become more prevalent, as is NS. But you can dispense with drift as an additional factor simply by modelling differential reproduction stochastically.
So, we can say that all variation that changes the genome (excluding possible design interventions) is random, in the sense that it can only be described probabilistically. f) But how can we describe probabilistically those variations? It’s not easy because, as we have said, not all events have the same probability. The probability of a single point mutation is not the same as that of an inversion, for example. Different biochemical mechanisms explain different forms of variation.
Exactly. Both differential reproduction and variation generation are stochastic processes based on what you would call “necessity laws”. Everything is caused by something, it’s just that to model it, you have to take a stab at the probability distribution of those causal events. And remember also that we are dealing with a dynamic system here, not a static one, in which the state of the system at any given time strongly constrains what happens next. In other words, given Genome X, the probability that its descendent will resemble it very closely is astronomically higher than the probability that it will not.
g) There is a way, however, to simplify the reasoning,
There is indeed :)
if we stick to an explicit scenario: the emergence of a new protein domain. h) In principle, there are at least two ways to generate a functional sequence of codons corresponding to a functional sequence of AAs, IOWs to generate a functional protein coding gene: h1) It could emerge gradually “from scratch”, starting with a codon and adding the others. h2) It could emerge by a random walk from some existing sequence, by gradual or less gradual modifications of the sequence. I will stick to the second scenario, because I believe that nobody is really proposing the first.
Right. And a random walk is indeed the scenario I described, in which the state the sequence is in at Time t strongly constrains the state it is in at Time t+1.
i) So, what we need in our scenario is: i1) a starting sequence; i2) a series of modifications that realize a random walk; i3) a final result. Now, I can already hear your objections about the target, and so on.
Nope that’s fine. I accept that your target is a sequence that codes for a protein domain.
But try to understand the context. We are trying to explain how existing functional domains originated. What I am doing is exploring the probability that one specific existing functional protein domain originated by means of RV such as we observe in biological systems. I will deal later with problems such as the contribution of NS, or the idea that other functional sequences could have arisen. So, please follow me. j) The starting sequence. I will stick to the more common scenarios: j1) the starting sequence is an existing, unrelated gene (IOWs, a gene coding for some other protein, with a different basic domain, different sequence, different structure, different function); j2) the starting sequence is an existing piece of non coding DNA. Now, the only way to simplify the discussion is to stress that the starting sequence and the final sequence are totally unrelated at sequence level,
What do you mean by “unrelated”? One will be the direct descendent of the other! Or do you mean “completely different” sequence – if so, why? and that all the variations happen at sequence level. As opposed to what?
That implies that, if we call A the starting sequence, and B the final sequence, A must completely lose its characteristic primary sequence, to get to B. In the same way, A has to lose its 3D structure and function.
Why? And what do you mean by “its 3D structure and function”? You mean that the protein it codes for has to not be coded for at some point in the change? What?
Now, the fundamental point: once A changes so much that it becomes unrelated to its original sequence, at that point all unrelated states have more or less the same probability to be reached by a random walk.
But why should it ever reach that state? Why are you insisting that a new domain (“B”) must be totally dissimilar to its parent, “A”? That seems to me a quite unsafe assumption. I know there is a fair amount of research into the origins of protein domains, but obviously it’s not my field. But what makes you think that this research is wrong?Elizabeth Liddle
January 4, 2012, 10:38 AM PDT
Elizabeth: Now, let's model the system of random variation in a reasonable biological model. To do that, I will try to do what darwinists never do: to be specific, and to refer always to explicit causal models.

a) First of all, what are the data that we are trying to explain? For consistency, I will stick to my usual scenario: the emergence, in the course of natural history, of about 2000 - 6000 unrelated protein domains (according to how we group them). That kind of event has happened repeatedly in natural history, and I believe that we can agree that those 2000 or so protein domains are the essential core of biological function in life as we know it. IOWs, they are very important empirical data that need to be causally explained.

b) The main characteristics of our 2000 or so basic protein domains are: b1) they are completely unrelated at sequence level: less than 10% homology. b2) they have specific and different 3D structures due to very good and stable folding. b3) they have specific and different biochemical functions (biochemical activities, if you prefer), that can be objectively defined and measured. b4) There are no known functional intermediates that bridge those groups, neither in the proteome nor in the lab.

c) Neo-darwinism has a tentative explanation for the emergence of those protein structures: they would be the result of RV + NS.

d) Let's leave NS alone, for the moment, and consider RV. What does RV mean? It means any form of modification of the genome that is not designed and whose result cannot be described by a necessity model, but only by a probabilistic model. So, RV includes many proposed mechanisms: single point mutations, insertions and deletions, inversions, frameshift mutations, chromosomal modifications, duplications, sexual rearrangements, and so on.

e) Now, why do we call all those mechanisms "RV"? As we have discussed, each of those mechanisms operates in accordance with biochemical necessity laws. But the result of the variation cannot in any way be anticipated by a necessity model. It can only be described probabilistically. Let's take, for example, a single point mutation: it can occur at any site, even if some sites are more likely than others. We can never say, by biochemical computations, which nucleotide will change as a result of some duplication error. But we can try to describe those events probabilistically. That is true for all the "variation engines" considered. Even drift is random, although in a strict sense it does not change the genome, but only the representations of some genomes in the population. So, we can say that all variation that changes the genome (excluding possible design interventions) is random, in the sense that it can only be described probabilistically.

f) But how can we describe those variations probabilistically? It's not easy because, as we have said, not all events have the same probability. The probability of a single point mutation is not the same as that of an inversion, for example. Different biochemical mechanisms explain different forms of variation.

g) There is a way, however, to simplify the reasoning, if we stick to an explicit scenario: the emergence of a new protein domain.

h) In principle, there are at least two ways to generate a functional sequence of codons corresponding to a functional sequence of AAs, IOWs to generate a functional protein coding gene: h1) It could emerge gradually "from scratch", starting with a codon and adding the others. h2) It could emerge by a random walk from some existing sequence, by gradual or less gradual modifications of the sequence. I will stick to the second scenario, because I believe that nobody is really proposing the first.

i) So, what we need in our scenario is: i1) a starting sequence; i2) a series of modifications that realize a random walk; i3) a final result. Now, I can already hear your objections about the target, and so on. But try to understand the context. We are trying to explain how existing functional domains originated. What I am doing is exploring the probability that one specific existing functional protein domain originated by means of RV such as we observe in biological systems. I will deal later with problems such as the contribution of NS, or the idea that other functional sequences could have arisen. So, please follow me.

j) The starting sequence. I will stick to the more common scenarios: j1) the starting sequence is an existing, unrelated gene (IOWs, a gene coding for some other protein, with a different basic domain, different sequence, different structure, different function); j2) the starting sequence is an existing piece of non coding DNA. Now, the only way to simplify the discussion is to stress that the starting sequence and the final sequence are totally unrelated at sequence level, and that all the variations happen at sequence level. That implies that, if we call A the starting sequence, and B the final sequence, A must completely lose its characteristic primary sequence, to get to B. In the same way, A has to lose its 3D structure and function.

Now, the fundamental point: once A changes so much that it becomes unrelated to its original sequence, at that point all unrelated states have more or less the same probability to be reached by a random walk. More on that in the next post.gpuccio
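A small simulation sketch of the random walk described above, under purely illustrative assumptions (a 150-residue sequence, one random point substitution per step): it shows identity to the starting sequence A decaying toward the random-similarity level, while identity to an unrelated sequence B stays near that level.

```python
import random

AA = "ACDEFGHIKLMNPQRSTVWY"   # 20-letter amino-acid alphabet (illustration only)
L = 150                        # assumed sequence length
random.seed(1)

def identity(s1, s2):
    # Percent of positions carrying the same residue.
    return 100.0 * sum(x == y for x, y in zip(s1, s2)) / len(s1)

A = [random.choice(AA) for _ in range(L)]   # starting sequence
B = [random.choice(AA) for _ in range(L)]   # unrelated target sequence
walker = A[:]                               # copy of A undergoing the random walk

for step in range(1, 2001):
    pos = random.randrange(L)               # one random point substitution per step
    walker[pos] = random.choice(AA)
    if step % 500 == 0:
        print(f"step {step:4d}: identity to A = {identity(walker, A):5.1f}%, "
              f"identity to B = {identity(walker, B):5.1f}%")
```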
January 4, 2012, 08:44 AM PDT
Elizabeth: You say: If by “chance causes things” you mean “we made a stochastic model that fit the data”, then fair enough. But why not be precise? Because it’s easy to equivocate accidentally if you use language imprecisely, and in Dembski’s formulation he rejects the “Chance” null and infers design without ever making a stochastic model. And evolutionists would, and do, argue that an appropriate stochastic model fits the data very well. I don't really think that your comments about Dembski, and Behe, are correct. The point in ID is that the random system imagined by darwinists cannot explain the data, not even with the introduction of NS. I think Dembski has stated very clearly that he assumes a uniform distribution for the random system, which is the only reasonable thing to do. Then, he establishes the UPB (indeed, too generous a threshold) as a limit of what a random system can empirically explain. Maybe he has not gone into the biological details (but Behe certainly has). Anyway, while waiting to understand better the nature of your objections to Dembski and Behe, I will try to analyze the biological random system for you, and to show that it cannot explain the data. I have really already done that, but it could be useful to review the reasoning in detail and in order, now that maybe we have clarified some epistemological points. In the next post.gpuccio
January 4, 2012, 08:10 AM PDT
Petrushka: I concede that research needs to be done to demonstrate a path to protein coding. Thank you for conceding. I agree, anyway. It appears that this kind of research will be very difficult, possibly impossible for the foreseeable future. I am more optimistic. And I really look forward to that. So I suspect you will be able to hold on to your opinion. And you to yours. At least for a while. Not too much, I hope :)gpuccio
January 3, 2012, 02:14 PM PDT
As I have said many times, it is inconceivable that all your “cousins” should systematically die
The systematic elimination of less functional protein coding sequences is no more mysterious than the systematic extinction of less intelligent hominids. Or the systematic extinction of marsupials from South America when a land bridge enabled the crossing over of more competitive mammals. I concede that research needs to be done to demonstrate a path to protein coding. It appears that this kind of research will be very difficult, possibly impossible for the foreseeable future. So I suspect you will be able to hold on to your opinion.Petrushka
January 3, 2012, 10:37 AM PDT
Petrushka: I find your comments reasonable this time, although obviously I don't agree with the substance of them. I believe that the 2000 basic protein superfamilies have no sequence relatives. Indeed, the number of unrelated groupings at sequence level in SCOP (less than 10% homology) is about 6000. As I have said many times, it is inconceivable that all your "cousins" should systematically die, especially when, if you introduce NS in the scenario, there should be hundreds of thousands of them (or probably more). The majority of protein domains have been "invented by microbes" (indeed, a little more than 50%). But it is true that only a small number is found in higher animals. The reason for that is not necessarily the one you give (although that could contribute). Another reason can well be that most relevant biochemical activities have already been designed at that stage, and regulatory networks become more important than ever. I don't deny the utility of bone lengths, only I would like to know the molecular basis. Simple dimensions could be regulated by simple factors, while morphological and functional connections seem better candidates for complex regulatory information.gpuccio
January 3, 2012, 09:03 AM PDT
It is possible for Axe's research to have been correctly done, but not present a problem for evolution. Axe did not test an evolutionary scenario as Thornton did. Axe did not ask an evolutionary question, which would be: given two alleles, can you get from one to the other by small steps? I'm not convinced that the protein problem doesn't support evolution rather than present a problem. In the entire history of life there have been only 2000 or so protein domains invented. That's about one every two million years (and they are spread out over several billion years, so any putative designer would have had to make multiple interventions and many visits). Some domains have sequence relatives and some don't. But not having living relatives does not mean that one is specially created. It simply means that you have no living cousins. The most interesting thing about protein domains is that nearly all of them have been invented by microbes, which is what you would expect if it requires large numbers of trials. Microbes have large populations and short reproductive cycles. The designer of proteins has been rather stingy with vertebrates. Most evolution of vertebrates has skipped the necessity of new proteins in favor of regulatory networks. And whatever you might say about the objective existence of protein folds, the utility of bone length and such is settled in the arena of reproductive success.Petrushka
January 2, 2012, 06:00 PM PDT
Elizabeth: Because chance does not cause anything. It’s just a way of saying that whatever caused something, it was not something you predicted, or could easily have predicted. I have tried to explain in detail why that is wrong, but if you just object to the wording, we can reach an agreement. Let’s say that not rejecting the null means that a probabilistic explanation, based on our understanding of the system and our probabilistic modeling of it, explains quite well what we observe. For instance, in a test control experiment, the probabilistic variation due to random sampling can well explain the observed difference. Rejecting the null means the opposite: that some other explanation, possibly the causal model hypothesized by the researchers, is reasonably needed. So, as you can see, chance does cause things, if we just mean that true deterministic causes, which we model probabilistically because there is no other way to do that, are the best explanation for what we observe.
Well, I think it’s important to be precise. If by “chance causes things” you mean “we made a stochastic model that fit the data”, then fair enough. But why not be precise? Because it’s easy to equivocate accidentally if you use language imprecisely, and in Dembski’s formulation he rejects the “Chance” null and infers design without ever making a stochastic model. And evolutionists would, and do, argue that an appropriate stochastic model fits the data very well.
That’s the only thing I ever meant, and the only thing necessary for ID theory.
Well, depends on the theory. It’s a major problem in Dembski’s. And in Behe’s.
So, your final statement in your post 19.2: Chance doesn’t “cause” anything. But lots of unmodelled factors do. However, under the null hypothesis, only rarely will those unmodelled factors combine to give you results like those you have observed. And ID simply does not attempt to model those unmodelled factors. It simply assumes that under the null (no design) the observed data will be very rare. In other words, it assumes what it sets out to demonstrate, which is fallacious. is completely unwarranted. It’s not that ID is not “attempting to model unmodelled factors”. It’s simply that those factors (such as which amino acid will change because of a random mutation) cannot be modeled explicitly in a necessity form, and must be modelled probabilistically. It’s not some strange position of IDists, it’s the only way to proceed scientifically.
Right. That’s what I’ve been saying! It requires a stochastic model! But I’m not seeing those stochastic models.
And ID does not assume anything like what you state. It just tries to model probabilistically the system that, according to neo-darwinist theory, is the probabilistic cause of the emergence of genetic information.
Not that I’ve seen. Dembski doesn’t. Do you have a reference? Don't rush, though, because I'm going to have to take another break :) I've bookmarked the thread though. And thank you for a very interesting conversation. We've at least managed a couple of points of agreement :) Cheers LizzieElizabeth Liddle
January 2, 2012, 03:52 PM PDT
Thanks for your responses M.I. My holiday is over, but I'll bookmark the thread and hope to get back to it! Some food for thought. Thanks! LizzieElizabeth Liddle
January 2, 2012, 03:24 PM PDT
Hi Elizabeth, It’s been a couple of days, but I wanted to clarify my argument (while I have the chance) and try and address some of the points you raised in 21.1. I hope you had a good new year. Let me restate the case I’ve made, that within protein sequence space there exists an objective target space for function which is determined by the laws of physics, and not by the post hoc definition of a given observer. (I don’t mean to be overly repetitious, but I want to restate my argument with regard to some of the points you raised.)

S = {sequence space}
F = {folding proteins}
F1 = {functional proteins}
F1 ⊆ F ⊂ S
n(F1) ≤ n(F) < n(S), which implies 0 < P(F1) ≤ P(F) < 1, for a single trial

The function n(X) determines the number of elements in a given set X. The set S consists of all sequences of a given length n. For a little more specificity, let’s assume that n is greater than or equal to 150: n ≥ 150.

The set F consists of those sequences in S which will fold into stable proteins. This set is determined by the laws of physics. This set is unchanging. That is, it’s the same today as it was yesterday and will be tomorrow. This set is deterministic. If a sequence can be folded into a stable three-dimensional structure, then it exists in this set. I’m making the following assumptions: n(F) < n(S). That is the same as saying that not all sequences will fold properly. In addition, I’m claiming that the set F is a narrow subset of S -- there are many more elements in F’ (the complement of F) than there are in F: n(F) < n(F’). This also implies, for a partition 0 < n(F) < n(S), that n(F) is closer to zero than it is to n(S); or, that n(F) / n(S) < 1/2. This implies that P(F) < 0.5 for any single trial.

The set F1 is a subset of F which consists of all folded proteins that can function in a biological context -- any context -- past, present, or future. If a protein can possibly be functional and beneficial to any organism at any time, then it exists in set F1. It’s important to note that F1 may, in my assumptions, be equal to F. This is another way of saying that every protein which folds may indeed be functional in some biological context -- some organism, at some point in past, present, or future. That is, F1 ⊆ F, which implies n(F1) ≤ n(F). This makes F1 a static set regardless of what RV+NS might do at some point in an organism’s history. The size of the set F1 is bounded by the size of F.

Given the above, it is clear that F1 is not arbitrary. Let me illustrate. If all of sequence space, the set S, is a dart board, and the set F is a smaller target within the confines of the dart board (or we can say that it is several small targets scattered around the board uniformly), when a dart is thrown by a blindfolded participant, we are far more likely to miss a target in F than we are to hit one. That being the case, F1 can’t be arbitrary -- striking a functional sequence is less likely than striking a non-functional one. Not only is the probability for F1 closer to zero than to one, but if we know that we struck a space in F then the probability of F1 having occurred is greater: P(F1) < P(F1|F). Striking a target in F increases the probability that F1 has also been struck. There are two implications to this: 1) Striking the target is not as likely as missing it.
So the cards analogy (Miller’s) takes a dirt nap -- one sequence is not as good as any other with respect to function -- and although each individual sequence has equal probability (axiomatic), functional sequences do not share an equal probability with non-functional ones: P(F1) < P(F1’) because P(F) < P(F’). So if a functional sequence is found, we can infer a rare event, and not one that is probabilistically insignificant, as is suggested by the cards analogy, and ones like it. 2) When a target in F1 is struck, the fact that it may be functional is not post hoc but objective. One cannot be accused of fitting the data to some arbitrary notion of function, because not all sequences can be functional; and the ones that can are contained in F. So the notion that the observation of function is dreamt up by pro-design ideologues is incorrect. Function exists within a narrow subset F, the folding proteins, which are determined by the laws of physics. We don’t have protein function outside of F. When we observe F1, we are not drawing an arrow around the target, we are observing an empirically validated, physical constraint on S: the set F. The estimate provided by Axe’s research suggests that 10^-74 of all sequences will fold into stable structures. This is provisional. However, the notion that it is impossible to estimate how many sequences will fold is untenable, since this is determined by the laws of physics. I’ll include again the quote from Meyer which puts this in layman’s terms:
Since proteins can’t perform functions unless they first fold into stable structures, Axe’s measure of the frequency of folded sequences within sequence space also provided a measure of the frequency of functional proteins—any functional proteins—within that space of possibilities. Indeed, by taking what he knew about protein folding into account, Axe estimated the ratio of (a) the number of 150-amino-acid sequences that produce any functional protein whatsoever to (b) the whole set of possible amino-acid sequences of that length. Axe’s estimated ratio of 1 to 10^74 implied that the probability of producing any properly sequenced 150-amino-acid protein at random is also about 1 in 10^74. In other words, a random process producing amino-acid chains of this length would stumble onto a functional protein only about once in every 10^74 attempts. Meyer, Stephen C. (2009-06-06). Signature in the Cell (pp. 210-211). Harper Collins, Inc. Kindle Edition.
Here are some objections you raised, or variations on them (let me know if I’ve missed the point with any of these) along with some comments.

a) Axe’s research is wrong. This may very well be the case. But unless research is proposed which drastically reduces this estimate, we can infer, at the least, that F is very narrow. I do not accept that any sort of reasonable estimate is just plain impossible. This is provisional, of course. Even if it were shown that say, 1 out of every 10^20 sequences can fold into stable proteins, we would still have a narrow subset of S, called F, within which function exists -- and so it could not be considered that function resides with arbitrary sequences in regard to S; we would just have a much larger target.

b) We may find that long, unfolded sequences have some sort of function. That might indeed be demonstrated at some point. However empirically, we find the opposite to be true: if sequences fold into stable proteins, there is a possibility of function; otherwise, no. Again, this is provisional.

c) RV+NS, the NDE mechanism, makes the set F1 non-static. That is, function is determined relative to environmental pressures and so forth, meaning that what is functional is subject to change. This is addressed earlier. Any potentially functional sequence, for any point in time, exists in set F1, which is bounded by set F. If there is a contextually selectable functional set, it is a subset of F1, and thereby a subset of F.

SEL = {functions selectable by N.S.}
SEL ⊆ F1 ⊆ F

If NDE can act significantly outside of the bounds of F, it should be demonstrable. This is provisional. My claim is that NDE is not free to drift in and out of the set F unconstrained (RV could explore spaces which do not fold, but N.S. cannot select for them, as I understand it). It can certainly try, and it’s reasonable to speculate that such might be possible in principle, but AFAICT empirical observations do not warrant this. Even if it were shown that in limited cases, long polypeptides could serve a function in some narrow context, stable protein function is crucial to biological function and acts as the rule, not the exception; and the rule is important -- exceptions do not exist. (That is not to say that unfolded polypeptides will never be shown to have some limited, contextual role within an organism, only that they cannot substitute for folded, highly specific, functional proteins. I think this was one of gpuccio’s points earlier.) Thanks for a good discussion. I appreciate that you took some time with my previous comments. Best, m.i.material.infantacy
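As a rough numerical companion to the argument above, here is a sketch that simply takes the quoted 1-in-10^74 estimate at face value and asks how the chance of at least one hit on the functional target set scales with an assumed number of independent trials (the trial counts are illustrative assumptions, not figures from the comment).

```python
from math import log10

# Illustration only: take the fraction of functional 150-residue sequences
# as p = 10^-74 (the estimate quoted above) and an assumed number of
# independent random trials N. When N*p << 1, the probability of hitting
# the target set at least once is approximately N*p.
p = 1e-74
for exp_N in (12, 20, 40):          # assumed trial counts: 10^12, 10^20, 10^40
    N = 10.0 ** exp_N
    p_at_least_one = N * p          # ~ 1 - (1 - p)^N for N*p << 1
    print(f"N = 10^{exp_N}: P(at least one functional hit) ~ 10^{log10(p_at_least_one):.0f}")
```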
January 2, 2012, 03:17 PM PDT
Elizabeth: Because chance does not cause anything. It’s just a way of saying that whatever caused something, it was not something you predicted, or could easily have predicted.

I have tried to explain in detail why that is wrong, but if you just object to the wording, we can reach an agreement. Let's say that not rejecting the null means that a probabilistic explanation, based on our understanding of the system and our probabilistic modeling of it, explains quite well what we observe. For instance, in a test/control experiment, the probabilistic variation due to random sampling can well explain the observed difference. Rejecting the null means the opposite: that some other explanation, possibly the causal model hypothesized by the researchers, is reasonably needed.

So, as you can see, chance does cause things, if we just mean that true deterministic causes, which we model probabilistically because there is no other way to do so, are the best explanation for what we observe. That's the only thing I ever meant, and the only thing necessary for ID theory.

So, your final statement in your post 19.2: “Chance doesn’t ‘cause’ anything. But lots of unmodelled factors do. However, under the null hypothesis, only rarely will those unmodelled factors combine to give you results like those you have observed. And ID simply does not attempt to model those unmodelled factors. It simply assumes that under the null (no design) the observed data will be very rare. In other words, it assumes what it sets out to demonstrate, which is fallacious.” is completely unwarranted.

It's not that ID is not "attempting to model unmodelled factors". It's simply that those factors (such as which amino acid will change because of random mutation) cannot be modeled explicitly in a necessity form, and must be modelled probabilistically. That is not some strange position of IDists; it's the only way to proceed scientifically. And ID does not assume anything like what you state. It just tries to model probabilistically the system that, according to neo-darwinist theory, is the probabilistic cause of the emergence of genetic information.gpuccio
January 2, 2012, 11:44 AM PDT
And that’s exactly what is wrong in your reasoning. Why do you reprimand your students? Now, as you say: “All it does is to allow them to say, as you said yourself, that the observed results are unlikely to be observed if the null is true” All? What else are you looking for? The observed results are so unlikely (for instance, five sigma) that the only reasonable empirical choice is to reject the null hypothesis.
Yes.
IOWs, to “rule out effects that are “due to chance alone”.
No. That is not the same thing. "The null is true" is not the same as saying "the results are due to chance alone". Because "due to chance" is meaningless. "Chance" doesn't cause things. This is not a nitpick.
Now, what you have said seems senseless to me, but I will try to make sense of it, giving possible interpretations that could be in some way true. Please, let me know if that’s what you meant: a) Their alpha criterion does not allow them to automatically affirm their H1 hypothesis. Perfectly true, but it’s not what you said.
Well, it allows them to claim that their H1 is supported. I'd be happy if they said that.
b) Their alpha criterion does not allow them to exclude that some small effects due to chance alone are nevertheless present in the system. True, but absolutely irrelevant. Why are you interested in subliminal random effects, when you have a five sigma in favour of a non-random explanation?
Well, they should bear that in mind, but that is not the source of my objection to the phrase "the results are unlikely to be due to chance alone".
c) Their alpha criterion does not allow them to logically exclude an explanation “due to chance alone”. True, but who cares? Our science is mainly empirical. It can almost never falsify data interpretations “logically”. Five sigma is a very good empirical falsification, and an absolute reason to reject the null hypothesis.
I have no objection to rejecting the null. I object to them claiming that it is unlikely their results are due to "chance alone".
So again, why do you reprimand your students? (I care for them, you know…)
I care for them too :) Because chance does not cause anything. It's just a way of saying that whatever caused something, it was not something you predicted, or could easily have predicted. And the reason it isn't a nitpick is that if you reject the null, you have to be really clear what the null is. The null is not "chance alone". The null is "H1 is false". And working out the expected distribution under the null is (as I'm sure you agree) an integral part of the test, and not always easy.Elizabeth Liddle
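Since "five sigma" keeps recurring as the example, here is a minimal sketch of how a sigma threshold maps to the tail probability used in the rejection decision (the only assumption is a Gaussian distribution of the test statistic under the null):

```python
# Tail probability P(Z >= sigma) under a standard normal null distribution.
from scipy.stats import norm

for sigma in (2, 3, 5):
    p = norm.sf(sigma)   # one-sided upper-tail probability
    print(f"{sigma} sigma -> one-sided p ~ {p:.2e}")

# 5 sigma gives p ~ 2.9e-07: the observed result would be very surprising
# if the null were true, which is the limited claim both parties accept.
```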
January 2, 2012, 09:49 AM PDT
Elizabeth: And that is exactly what is wrong with ID. Lori Ann White has done what I always reprimand students for doing – saying that their alpha criterion allows them to rule out effects that are “due to chance alone”. It does no such thing. All it does is to allow them to say, as you said yourself, that the observed results are unlikely to be observed if the null is true.

And that's exactly what is wrong in your reasoning. Why do you reprimand your students? Now, as you say: "All it does is to allow them to say, as you said yourself, that the observed results are unlikely to be observed if the null is true" All? What else are you looking for? The observed results are so unlikely (for instance, five sigma) that the only reasonable empirical choice is to reject the null hypothesis. IOWs, to "rule out effects that are due to chance alone".

Now, what you have said seems senseless to me, but I will try to make sense of it, giving possible interpretations that could be in some way true. Please, let me know if that's what you meant:

a) Their alpha criterion does not allow them to automatically affirm their H1 hypothesis. Perfectly true, but it's not what you said.

b) Their alpha criterion does not allow them to exclude that some small effects due to chance alone are nevertheless present in the system. True, but absolutely irrelevant. Why are you interested in subliminal random effects, when you have a five sigma in favour of a non-random explanation?

c) Their alpha criterion does not allow them to logically exclude an explanation "due to chance alone". True, but who cares? Our science is mainly empirical. It can almost never falsify data interpretations "logically". Five sigma is a very good empirical falsification, and an absolute reason to reject the null hypothesis.

So again, why do you reprimand your students? (I care for them, you know...) :)gpuccio
January 2, 2012, 09:06 AM PDT
Elizabeth: Let's go on.

“Random noise” is simply unmodelled variance (as you’ve said yourself). For instance, we can often reduce the residuals in our models by including a covariate that models some of that “noise”, so it ceases to be noise – we’ve found, in effect, a systematic relationship between some of the previously unmodelled variance and a simply modelled factor. Age, for instance, in my field, or “working memory capacity” is a useful one, as measured by digit span.

OK. That's perfectly in line with what I said in my previous post.

Furthermore, sampling error is not a “cause of random error”, but represents the variability in summary statistics of samples resulting from variance in the population that is not included in your model.

Why not? Sampling error is similar to measurement error. We want to measure something in a population, and instead we measure it in a sample. That implies a possible error in the measurement, because the sample does not represent the population perfectly. If the sampling technique is really random, then the error is random too. And, obviously, it depends critically on sample size, as is well known. That applies also to tests comparing two or more samples: there, sampling error is treated as the possible source of random differences between the groups arising from random sampling, differences that could appear to be effects of a necessity cause (such as a true difference between groups), but are not.

Certainly some variance is due to measurement error. But it would be very foolish to assume that you have modelled every variable impacting on your data apart from measurement error.

I certainly don't assume any such thing. I could never work in medicine that way! But sampling error is a random error due to sampling, and not another causal variable. And anyway my point is that, even if there are other variables that I have not modelled, still I can often reliably assume a causal relation for my modelled variable.

You are also making a false distinction between “the effects of random noise and the assumed effect of necessity”. Apart from measurement error, all the rest of your “random noise” may well be “effects of necessity”. What makes them “noise” is simply the fact that you haven’t modelled them. Model them, and they become “effects of necessity”. And, as I said, you can model them as covariates, or you can model them as stochastic terms. Either way, you aren’t going to get away without a stochastic term in your model, even if it just appears as the error term.

Right and wrong. As I have said, everything in the model is "effect of necessity" (if we are not experimenting with quantum mechanics), even random noise. The difference is in how we treat those effects. The effects of one strong but unknown variable will show as "structure" in our residuals, for instance, and will deserve special attention, because that variable could be detected and added to the model. But the effects of many unknown independent variables, all of them contributing slightly to what we observe, can be treated only probabilistically, whether they are measurement errors, sampling error, or hidden variables.

So, to sum up:

a) All that we observe is the result of necessity.

b) Much of what we observe can be treated only probabilistically, not because it is essentially different, but because the form and impact of the cause-effect relationship is beyond a detailed necessity treatment.

c) We must always be well aware of what we are doing: are we treating some variable as a possible necessity cause, and affirming that causal relation, or are we just modelling unknown variables probabilistically? That's where your arguments are confused.

I disagree. I think yours is the cognitive mistake, and I think it is the mistake of inadequately considering what “random causes” means. “Random causes” are not “explanations”. They are the opposite of “explanations”. They are the unexplained (aka unmodelled) variance in your data. Thinking that “random” is an “explanation” is a really big cognitive mistake! But you are not the only person on this board to make it.

This is definitely wrong. Random causes, that is, unknown causes that we model probabilistically, can very well be an explanation of what we observe. They explain probabilistically, exactly because they are not modelled in detail. But explain they do, just the same.

Let's take the simplest experimental model, where we test differences in some variable between two groups, a test group and a control group, by the methodology of Fisher's hypothesis testing. The reasoning goes as follows: the groups differ only in the tested variable, and our null hypothesis is that our variable has no causal relation with what we observe in the data. And yet, in the data we do observe a difference (well, it would be very unlikely not to observe any). At this point, we have two competing explanations: we accept the null hypothesis, and assume that the observed difference is well explained by the known possible cause that is sampling error, treated probabilistically and not in detail; or we reject the null hypothesis, and assume that our variable (or any other possible causal model) explains the observed difference. The decision, as you know, is usually taken in probabilistic terms, because the hypothesis we are rejecting or assuming as best explanation is a probabilistic hypothesis, where the cause is treated probabilistically.

Therefore, "random causes" are causes just the same. Their treatment is probabilistic, but we can still have good reasons to assume that those causes, treated probabilistically, are the best explanation for what we observe. The important point, which darwinists never want to recognize, is that when we assume a probabilistic cause in our model, the only way to analyze it, to decide whether it is a credible cause or not, is a probabilistic analysis. That is exactly the point of ID. Neo-darwinism assumes RV as the engine of variation. That is a causal assumption for something that can be treated only probabilistically. Therefore, darwinists have the duty to analyze its probabilistic credibility. They don't want to do that, so we do it for them. :)gpuccio
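A minimal sketch of the two-group reasoning described in the comment above (the data are invented, and a permutation test stands in for Fisher's exact framework; nothing here comes from the papers under discussion):

```python
# Two-group comparison: does random sampling alone plausibly explain the
# observed difference in means, or should the null be rejected?
import numpy as np

rng = np.random.default_rng(0)
control = rng.normal(10.0, 2.0, size=30)   # hypothetical control measurements
treated = rng.normal(11.5, 2.0, size=30)   # hypothetical treated measurements
observed = treated.mean() - control.mean()

# Permutation test: reshuffle the group labels and see how often sampling
# variation alone produces a difference at least as large as observed.
pooled = np.concatenate([control, treated])
n_perm = 10_000
count = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[30:].mean() - pooled[:30].mean()
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / n_perm
print(f"observed difference = {observed:.2f}, permutation p = {p_value:.4f}")
# Small p: reject the null and prefer a causal model for the difference.
# Large p: sampling error, treated probabilistically, remains the best explanation.
```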
January 2, 2012, 08:54 AM PDT
Elizabeth: In the meantime, I will try to finish answering your post 19.2.

Scientific methodology involves fitting models to data. Yes, you can build a non-stochastic model, but you still have to deal with the error term, in other words the residuals, aka the stuff impacting on your data that you haven’t modelled. And you can either make assumptions about the distributions of your residuals (assume a Gaussian, for instance) or you can actually include specified distributions for the uncertain factors in your model. If you don’t – if you report a non-stochastic model with the assumption that the residuals are normally distributed, and they aren’t – you will be making a serious error, and your model will be unreliable.

Well, yes and no. Statistical modeling is one of my favourite activities, and I would say that a normal distribution of residuals, for instance in a regression model, is what we expect if our necessity model really explains our data completely, except for random error. Now, that random error could just be measurement error (as is more likely in a physics experiment, where the relation between causal model and data is often more direct and simple). In that case, we certainly expect a normal distribution of residuals.

Another possibility, much more common in biology and medicine, is that our causal model, while true, explains only part of what we observe. In biology, there may be a lot of hidden variables that we cannot take into account (each individual is different, each individual is much more complex than we can understand, and so on). That's why our biological (and especially medical) causal models rarely explain all, and often not even much, of what we observe. Just to make an example: if we test a new drug, we don't expect it to cure 100% of the cases. It may cure just 15%. And yet the causal relation between drug administration and cure can be strong and reliable. In the same way, our regression for a diagnostic hypothesis relating two variables (with a specific causal model in the background) can be strongly significant while the effect size (such as R^2) is relatively small. And yet the information is sometimes precious, and the causal relation strongly affirmed.

Now the point is: if our supposed cause is really acting in the data, but it is only one of many causes, our statistical relation will be diluted by many other effects, not only by "errors" such as measurement errors. If those other diluting effects are many and independent, then we expect our residuals to be normally distributed. But take the case where there is one major hidden causal factor in the system that we don't know of. Then we can expect that the residuals will not be normally distributed; IOWs, the residuals will contain structure that is not accounted for in the model. Identifying that structure and adding the unknown term to the original model leads to a better model, but it is not always possible. Anyway, in that case, the fact that the residuals are not normally distributed does not mean that our causal model is not good: our supposed cause can still be a very valid and credible cause, but we have reason to believe that there is at least one other big, detectable cause that may explain the structure in the residuals, and we have the duty to look for it, if there is a chance of understanding what it is.

So, what you say is partially right, but in no way does it invalidate the importance of statistical analysis for detecting causal relations in data, even when they are diluted by measurement errors, or simply by other unknown causal relations. In the end, the purpose of science is and remains to detect causes.gpuccio
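A minimal sketch of the residual check described above (the data, the hidden step effect, and all numbers are invented for illustration):

```python
# Fit a simple linear model, then test whether the residuals look normal.
# Non-normal residuals with visible structure hint at a hidden causal factor.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
hidden = 3.0 * (x > 5)                      # one big unmodelled cause
y = 2.0 * x + hidden + rng.normal(0, 1, size=200)

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

w_stat, w_p = stats.shapiro(residuals)      # normality test on the residuals
print(f"R^2 = {fit.rvalue**2:.3f}, Shapiro-Wilk p = {w_p:.4f}")
# A low Shapiro-Wilk p flags structure in the residuals: the linear model
# may still capture a real causal relation, but something big is missing.
```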
January 2, 2012, 08:15 AM PDT
Elizabeth: Yes, I use it in the frequentist sense. In the case of protein domains, we have data about functional domains taken from 4 billion years of sampling. That's something. Again, we are not looking for final and precise measurements, but for a reasonable estimate. If a functional protein emerged 4 billion years ago, neutral variation has reasonably explored the functional space, while keeping the function the same. Even if the functional space had not been completely explored, that would still be a good approximation. Protein families include sequences with big differences, further evidence that the functional space, or a great part of it, has been traversed in the course of evolution. When you get, by the Durston method, functional complexities of hundreds of bits for many protein families, you would have to assume that only a very tiny part of the functional space has been explored by neutral evolution in order to ignore that finding. There is absolutely no reason to believe that. All the data we have, both from observation of the proteome and its natural history, and from lab data, confirm that the functional space is extremely small compared to the search space.

Just consider the rugged landscape paper, "Experimental Rugged Fitness Landscape in Protein Sequence Space":

"Although each sequence at the foot has the potential for evolution, adaptive walking may cease above a relative fitness of 0.4 due to mutation-selection-drift balance or trapping by local optima. It should be noted that the stationary fitness determined by the mutation-selection-drift balance with a library size of N(d)all is always lower than the fitness at which local optima with a basin size of d reach their peak frequencies (Figure 4). This implies that at a given mutation rate of d, most adaptive walks will stagnate due to the mutation-selection-drift balance but will hardly be trapped by local optima. Although adaptive walking in our experiment must have encountered local optima with basin sizes of 1, 2, and probably 3, the observed stagnations are likely due only to the mutation-selection-drift balance. Therefore, stagnation was overcome by increasing the library size. In molecular evolutionary engineering, larger library size is generally favorable for reaching higher stationary fitness, while the mutation rate, d, may be adjusted to maintain a higher degree of diversity but should not exceed the limit given by N = N(d)all to keep the stationary fitness as high as possible. In practice, the maximum library size that can be prepared is about 10^13 [28,29]. Even with a huge library size, adaptive walking could increase the fitness, ~W, up to only 0.55. The question remains regarding how large a population is required to reach the fitness of the wild-type phage. The relative fitness of the wild-type phage, or rather the native D2 domain, is almost equivalent to the global peak of the fitness landscape. By extrapolation, we estimated that adaptive walking requires a library size of 10^70 with 35 substitutions to reach comparable fitness."

Emphasis mine. Don't you think that "a library size of 10^70" (strangely similar as a number to some of Axe's estimates) for "35 substitutions to reach comparable fitness" (strangely similar to my threshold for biological dFSCI) means something, in a lab experiment based on retrieving an existing function in a set where NS is strongly working?gpuccio
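For readers unfamiliar with the Durston-style measure invoked above, here is a simplified sketch of the underlying idea (a toy alignment of hypothetical sequences; this is not Durston's published code): the functional information at each aligned site is the reduction from the ground-state uncertainty of log2(20) bits to the entropy of the amino acids actually observed at that site across the family.

```python
# Simplified Durston-style functional sequence complexity (FSC) estimate:
# sum over aligned sites of (log2(20) - observed site entropy).
import math
from collections import Counter

def site_entropy(column):
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def functional_bits(alignment):
    ground = math.log2(20)   # null state: 20 equiprobable amino acids per site
    n_sites = len(alignment[0])
    columns = [[seq[i] for seq in alignment] for i in range(n_sites)]
    return sum(ground - site_entropy(col) for col in columns)

# Toy alignment of four hypothetical 5-residue sequences.
family = ["MKVLA", "MKVLG", "MRVLA", "MKILA"]
print(f"estimated functional bits: {functional_bits(family):.1f}")
```

With only a handful of sequences this is of course a crude estimate; the published method works on large family alignments and applies corrections, but the reduction-of-uncertainty logic is the same.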
January 2, 2012, 04:04 AM PDT
Yes, I thought you were using probability in the frequentist sense. The problem there, though, is frequency estimates depend on having data sampled from the entire relevant population, and that's what we don't have in the case of protein domains. We only have data sampled from the population that happened to make it. A bit like estimating the sex ratio of the human population from the sex ratio in CEOs. "Stochastic", in English, is more precise than "random" because "random" has many meanings in regular English usage, including "equiprobable" and "purposeless". In Italian (or the Italian equivalent) it may be more precise. "Stochastic" however is used much more rarely, and with much more precise meaning, and denotes a system, or process, or model, that is not deterministic. Even a straightforward regression model is a stochastic model because it contains an error term, assumed to be normally distributed. That's in addition, of course, to the error associated with the fitted parameters.Elizabeth Liddle
January 2, 2012, 02:37 AM PDT
heh. Did I just verify that I am human?Elizabeth Liddle
January 2, 2012, 02:29 AM PDT
Elizabeth: If that is your problem, you can be reassured: in my reasoning, probability is always used as a frequency estimate. The concept of reduction of uncertainty is used in Durston's method to approximate the target space in protein families, and is treated mathematically according to Shannon's concepts. The reduction of uncertainty due to the constraint of the specific function is computed from the probability of each amino acid at each site, but it is a different concept. So, there is absolutely no problem here. And I would be happy if you could specify in what sense "stochastic" is more precise than "random".gpuccio
January 2, 2012, 12:40 AM PDT