What are the limits of Random Variation? A simple evaluation of the probabilistic resources of our biological world
October 31, 2017 · Posted by gpuccio under Intelligent Design
Coming from a long and detailed discussion about the limits of Natural Selection, here:
I realized that some attention could be given to the other great protagonist of the neo-darwinian algorithm: Random Variation (RV).
For the sake of clarity, as usual, I will try to give explicit definitions in advance.
Let’s call an RV event any random event that, in the course of Natural History, acts on an existing organism at the genetic level, so that the genome of that individual organism is changed in its descendants.
That’s more or less the same as the neo-darwinian concept of descent with modifications.
A few important clarifications:
a) I use the term variation instead of mutation because I want to include in the definition all possible kinds of variation, not only single point mutations.
b) Random here means essentially that the mechanisms that cause the variation are in no way related to function, whatever it is. IOWs, the function that may or may not arise as a result of the variation is not related to the mechanism that effects the change, but only to the specific configuration which arises randomly from that mechanism.
In all the present discussion we will not consider how NS can change the RV scenario: I have discussed that in great detail in the quoted previous thread, and those who are interested in that aspect can refer to it. In brief, I will remind here that NS does not act on the sequences themselves (IOWs the functional information), but, if and when and to the extent that it can act, it acts by modifying the probabilistic resources.
So, an important concept is that:
All new functional information that may arise by the neo-darwinian mechanism is the result of RV.
Examining the Summers paper about chloroquine resistance:
I have argued in the old thread that the whole process of generation of the resistance in natural strains can be divided into two steps:
a) The appearance of an initial new state which confers the initial resistance. In our example, that corresponds to the appearance of one of two possible resistant states, both of which require two neutral mutations. IOWs, this initial step is the result of mere RV, and NS has no role in that. Of course, the initial resistant state, once reached, can be selected. We have also seen that the initial state of two mutations is probably the critical step in the whole process, in terms of time required.
b) From that point on, a few individual steps of one single mutation, each of them conferring greater resistance, can optimize the function rather easily.
Now, point a) is exactly what we are discussing in this new thread.
So, what are the realistic powers of mere RV in the biological world, in terms of functional information? What can it really achieve?
Another way to ask the same question is: how functionally complex can the initial state that first implements a new function be, if it arises from mere RV?
And now, let’s define the probabilistic resources.
Let’s call probabilistic resources, in a system where random events take place, the total number of different states that can be reached by RV events in a certain window of time.
In a system where two dice are tossed each minute, and the numbers deriving from each toss are the states we are interested in, the probabilistic resources of the system in one day amount to 1440 states (one per minute).
The greater the probabilistic resources, the easier it is to find some specific state, which has some specific probability of being found in one random attempt.
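The dice system above can be sketched in a few lines of Python. The choice of a double six as the target state is my own illustrative assumption; the point is simply that 1440 attempts make a 1/36 event almost certain to appear:

```python
# Toy system from the text: two dice tossed once per minute,
# each toss counting as one state reached.
tosses_per_day = 24 * 60              # 1440 random attempts in one day
p_target = 1 / 36                     # e.g. a double six (illustrative target)

# Probability of reaching the target state at least once in one day:
p_at_least_once = 1 - (1 - p_target) ** tosses_per_day
print(tosses_per_day)                 # 1440
print(p_at_least_once)                # very close to 1
```

With 1440 states and a per-attempt probability of 1/36, missing the target on every single toss is overwhelmingly unlikely; the probabilistic resources comfortably cover a target of this (very low) complexity.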
So, what are the states generated by RV? They are, very simply, all different genomes that arise in any individual of any species by RV events, or if you prefer by descent with modification.
Please note that we are referring here to heritable variation only: we are not interested in somatic genetic variation, which is not transmitted to descendants.
So, what are the probabilistic resources in our biological world? How can they be estimated?
I will use here a top-down method. So, I will not rely on empirical data like those from Summers or Behe or others, but only on what is known about the biological world and natural history.
The biological probabilistic resources derive from reproduction: each reproduction event is a new state reached, if its genetic information is different from the previous state. So, the total number of states reached in a system in a certain window of time is simply the total number of reproduction events where the genetic information changes. IOWs, where some RV event takes place.
Those resources depend essentially on three main components:
- The population size
- The number of reproductions of each individual (the reproduction rate) in a certain time
- The time window
So, I have tried to compute the total probabilistic resources (total number of different states) for some different biological populations, in different time windows, appropriate for the specific population (IOWs, for each population, from the approximate time of its appearance up to now). As usual, I have expressed the final results in bits (log2 of the total number).
Here are the results:
| Population | Size | Reproduction rate (per day) | Mutation rate | Time window | Time (in days) | Number of states | Bits | + 5 sigma | Specific AAs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bacteria | 5.00E+30 | 24 | 0.003 | 4 billion years | 1.46E+12 | 5.26E+41 | 138.6 | 160.3 | 37 |
| Fungi | 1.00E+27 | 24 | 0.003 | 2 billion years | 7.3E+11 | 5.26E+37 | 125.3 | 147.0 | 34 |
| Insects | 1.00E+19 | 0.2 | 0.06 | 500 million years | 1.825E+11 | 2.19E+28 | 94.1 | 115.8 | 27 |
| Fish | 4E+12 | 0.1 | 5 | 400 million years | 1.46E+11 | 2.92E+23 | 78.0 | 99.7 | 23 |
| Hominidae | 5.00E+09 | 0.000136986 | 100 | 15 million years | 5.48E+09 | 3.75E+17 | 58.4 | 80.1 | 19 |
The mutation rate is expressed as mutations per genome per reproduction.
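The table values follow from multiplying the components listed above. A minimal sketch in Python (the function name is my own), using the bacteria row as a check:

```python
import math

def probabilistic_resources(pop_size, repro_per_day, mut_rate, days):
    """Total states reached = population size x reproductions per day
    x mutations per genome per reproduction x time window in days."""
    states = pop_size * repro_per_day * mut_rate * days
    return states, math.log2(states)

# Bacteria row: 5.00E+30 individuals, 24 reproductions/day,
# 0.003 mutations per genome per reproduction, 4 billion years in days.
states, bits = probabilistic_resources(5.00e30, 24, 0.003, 1.46e12)
print(f"{states:.2e}")            # 5.26e+41
print(round(bits, 1))             # 138.6
print(round(bits + 21.7, 1))      # 160.3 (with the 5 sigma margin)
```

The same call reproduces every row of the table; only the four inputs change.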
This is only a tentative estimate, and of course a gross one. I have tried to get the best reasonable values from the sources I could find, but many values could be somewhat different; sometimes it was really difficult to find any good reference, and I just had to make an educated guess. Of course, I will be happy to acknowledge any suggestion or correction based on good sources.
But, even if we consider all those uncertainties, I would say that these numbers do tell us something very interesting.
First of all, the highest probabilistic resources are found in bacteria, as expected: this is due mainly to the huge population size and high reproduction rate. The numbers for fungi are almost comparable, although significantly lower.
So, the first important conclusion is that, in these two basic classes of organisms, the probabilistic resources, with this hugely optimistic estimate, are still under 140 bits.
The penultimate column simply adds 21.7 bits: the margin corresponding to a 5 sigma threshold (p ≈ 2.9 × 10^-7), the standard used for inferences about fundamental issues in physics. What does that mean?
It means, for example, that any sequence with 160 bits of functional information is, by far, beyond any reasonable probability of being the result of RV in the system of all bacteria in 4 billion years of natural history, even with the most optimistic assumptions.
The last column gives the number of specific AAs that correspond to the bit value in the penultimate column (based on a maximum information value of 4.32 bits per AA, i.e. log2 of the 20 possible amino acids).
For bacteria, that corresponds to 37 specific AAs.
IOWs, a sequence of 37 specific AAs is already well beyond the probabilistic resources of the whole population of bacteria in the whole world reproducing for 4 billion years!
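The bits-to-AAs conversion used in the last column can be checked directly; each specific AA site carries at most log2(20) ≈ 4.32 bits:

```python
import math

# Maximum information per AA site: log2 of the 20 possible amino acids.
bits_per_aa = math.log2(20)          # ~4.32 bits

# Convert the "+ 5 sigma" bit thresholds of the table into specific AA counts:
thresholds = {"Bacteria": 160.3, "Fungi": 147.0, "Insects": 115.8,
              "Fish": 99.7, "Hominidae": 80.1}
aa_limits = {pop: round(bits / bits_per_aa) for pop, bits in thresholds.items()}
print(aa_limits)
# {'Bacteria': 37, 'Fungi': 34, 'Insects': 27, 'Fish': 23, 'Hominidae': 19}
```

The results match the "Specific AAs" column of the table for all five populations.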
For fungi, 147 bits and 34 AAs are the upper limit.
Of course, values become lower for the other classes. Insects still perform reasonably well, with 116 bits and 27 AAs. Fish and Hominidae have even lower values.
We can notice that Hominidae gain something in the mutation rate, which is known to be higher, and which I have considered here at 100 new mutations per genome per reproduction (a reasonable estimate for Homo sapiens). Moreover, I have considered here a very generous population of 5 billion individuals, again taking a recent value for Homo sapiens. These are not realistic choices, but again generous ones, just to make my darwinist friends happy.
Another consideration: I have given here total populations (or at least generous estimates for them), and not effective population sizes. Again, the idea is to give the highest chances to the neo-darwinian algorithm.
So, these are very simple numbers, and they should give an idea of what I would call the upper threshold of what mere RV can do, estimated by a top down reasoning, and with extremely generous assumptions.
Another important conclusion is the following:
All the components of the probabilistic resources have a linear relationship with the total number of states.
That is true for population size, for reproduction rate, mutation rate and time.
For example, everyone can see that the different time windows, ranging from 4 billion years to 15 million years, which seems a very big difference, correspond to only 3 orders of magnitude in the total number of states. Indeed, the highest variations are probably in population size.
However, the probabilistic resources required to find a sequence grow exponentially with the number of necessary AA sites (each specific site adds about 4.32 bits): a range from 19 to 37 AAs (a difference of only 18 AAs) corresponds to a range of about 24 orders of magnitude in the distribution of probabilistic resources.
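The contrast between the linear and the exponential side can be put in numbers, using the table's figures (the result, about 23.4 orders of magnitude, is the "about 24 orders of magnitude" rounding of the text):

```python
import math

# Linear side: doubling any single component (population size, reproduction
# rate, mutation rate, or time) adds exactly one bit of probabilistic resources.
bits = math.log2(5.26e41)                 # bacteria row
bits_doubled_time = math.log2(2 * 5.26e41)
print(bits_doubled_time - bits)           # 1 bit gained

# Exponential side: each extra specific AA site multiplies the search space
# by 20, i.e. demands log2(20) ~ 4.32 more bits of probabilistic resources.
extra_aas = 37 - 19                       # range across the table: 18 AAs
bits_gap = extra_aas * math.log2(20)
print(round(bits_gap, 1))                 # 77.8 bits
print(round(bits_gap * math.log10(2), 1)) # 23.4 orders of magnitude
```

Doubling a linear component buys one bit; a mere 18 extra specific AA sites demand nearly 78 bits.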
Can I remind here briefly, without any further comments, that in my OP here:
I have analyzed the informational jump in human conserved information at the appearance of vertebrates? One important result is that 10% of all human proteins (about 2000) have an information jump from pre-vertebrates to vertebrates of at least (about) 500 bits (corresponding to about 116 AAs)!
Now, some important final considerations:
- I am making no special inferences here, and I am drawing no special conclusions. I don’t think it is really necessary. The numbers speak for themselves.
- I will welcome any suggestion, correction, or comment, especially if based on facts or reasonable arguments. The discussion is open.
- Again, this is about mere RV. This is about the neutral case. NS has nothing to do with these numbers.
- For those interested in a discussion about the possible role of NS, I can suggest the thread linked at the beginning of this OP.
- I will be happy to answer any question about NS too, of course, but I would be even happier if someone tried to answer my two questions challenge, given at post #103 of the other thread, which nobody has answered yet. I paste it here for the convenience of all:
Will anyone on the other side answer the following two simple questions?
1) Is there any conceptual reason why we should believe that complex protein functions can be deconstructed into simpler, naturally selectable steps? That such a ladder exists, in general, or even in specific cases?
2) Is there any evidence from facts that supports the hypothesis that complex protein functions can be deconstructed into simpler, naturally selectable steps? That such a ladder exists, in general, or even in specific cases?