Uncommon Descent Serving The Intelligent Design Community

Extra Characters to the Biological Code


Even allowing for compression, I’ve always thought that the known informational content was not enough data. This makes sense from an engineering point of view, because there doesn’t seem to be enough data storage space in a few billion base pairs of nuclear DNA to specify all the detail in a mammal or similarly complex animal. There is enough room to store a component library of the nuts and bolts required to build individual cells of different types, but not the whole animal.

Obviously no one can argue against the assertion that we do not fully comprehend the biological code. Unlike computer code, we cannot simply determine at a glance which informational content defines which biological function. The title of geneticist Giuseppe Sermonti’s book is “Why Is a Fly Not a Horse?” In it he writes that the only thing we know for certain about why a horse is a horse and not a fly is that its mother was a horse.

Thus, given our current level of knowledge, any calculation that quantifies biological informational content is going to be a rough estimate. Personally, when measuring the functional sequence complexity of protein-coding sequences, I have long biased my calculations by rounding up to several extra informational bits. And that practice seems justified by this recent news:

“Anyone who studied a little genetics in high school has heard of adenine, thymine, guanine and cytosine–the A, T, G and C that make up the DNA code. But those are not the whole story. The rise of epigenetics in the past decade has drawn attention to a fifth nucleotide, 5-methylcytosine (5-mC), that sometimes replaces cytosine in the famous DNA double helix to regulate which genes are expressed. And now there’s a sixth: 5-hydroxymethylcytosine.

In experiments to be published online April 16 by Science, researchers reveal an additional character in the mammalian DNA code, opening an entirely new front in epigenetic research.

The work, conducted in Nathaniel Heintz’s Laboratory of Molecular Biology at The Rockefeller University, suggests that a new layer of complexity exists between our basic genetic blueprints and the creatures that grow out of them. “This is another mechanism for regulation of gene expression and nuclear structure that no one has had any insight into,” says Heintz, who is also a Howard Hughes Medical Institute investigator. “The results are discrete and crystalline and clear; there is no uncertainty. I think this finding will electrify the field of epigenetics.”

Genes alone cannot explain the vast differences in complexity among worms, mice, monkeys and humans, all of which have roughly the same amount of genetic material. Scientists have found that these differences arise in part from the dynamic regulation of gene expression rather than the genes themselves. Epigenetics, a relatively young and very hot field in biology, is the study of nongenetic factors that manage this regulation.”

Go to Science Daily for more.

Comments
AD, I will look that up. Today I checked out the April 10 issue of Science from the library. I haven't gotten to Ingolia yet, but will. I've been wanting to read more about protein signaling, and there's an article about it on p. 198 (Smock & Gierasch).
womanatwell
April 30, 2009, 03:06 PM PST
womanatwell, you may be applying the term microevolution too loosely. How "closely related" were the two Hydra species examined in the paper? Take a look at the phylogenetic trees in reference 34 (Hemmrich G, et al., Molecular phylogenetics in Hydra, a classical model in evolutionary developmental biology. Mol Phyl Evol. 2007;44:281–290) and you will see that H. oligactis and H. magnipapillata are not sister species. (Not so very closely related.) Incidentally, your reference to Behe and TEOE reminded me of gpuccio's statement at #48:
But there is another approach which gives us a more realistic idea of where we are with darwinian explanations. Behe in TEOE has suggested that, in natural models like malaria, random mutations can, at best, provide two coordinated necessary mutations under a very strong selective pressure.
I have been looking into Behe's claims about the chloroquine resistance data and how they relate to his "edge," and I find them questionable...
Adel DiBagno
April 30, 2009, 06:08 AM PST
gpuccio,
Some time ago I was in favor of a completely gradualistic design implementation, except for OOL. But the data about the two “explosions”, and possible others, have convinced me that probably design has been implemented with different modalities in natural history: sometimes more gradually, sometimes more suddenly.
I agree. I am even starting to wonder about microevolution, since they are finding species-specific unique genes, as in: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2586386 As Behe says in TEOE, HIV continually mutates, but remains HIV.
womanatwell
April 30, 2009, 05:39 AM PST
gpuccio, Good to see that you are back. To avoid distraction, I'll reserve further comment until you have come up with support for your claim that
There are even paper in the literature trying to support that (I will give you the reference to the most important one later, but you probably know it, it’s the one about generating functional calcium binding proteins from a random set of sequences).
Sorry, I don't recognize that reference.
Adel DiBagno
April 29, 2009, 01:53 PM PST
womanatwell: Very good arguments. And thank you for the links. ATP synthase is one wonderful example of functional complexity, but it's only one of the many available. At present, it seems that a lot of fundamental proteins had to "be there" very early. I am convinced that life started very complex and organized. All OOL theories are absolute myths: I can only pity darwinists who have to try to explain what cannot be explained (at least, not their way). OOL is an example of sudden emergence of complexity. The Ediacaran and Cambrian explosions are two more. While in general speciation can be thought of as more gradual, even from an ID point of view, for these three great steps gradualism is practically prohibited by the facts themselves, as we know them. Some time ago I was in favor of a completely gradualistic design implementation, except for OOL. But the data about the two "explosions", and possibly others, have convinced me that design has probably been implemented with different modalities in natural history: sometimes more gradually, sometimes more suddenly. The transition from prokaryotes to eukaryotes is another good candidate for "acute" design implementation. All these are issues which can be partially clarified as our understanding of natural history improves.
gpuccio
April 29, 2009, 01:27 PM PST
AD, thanks. The thing about ATP synthase is that it's in all three domains--Archaea, Bacteria and Eukaryotes--so it would have been there before any branching. It needs a working membrane so that there is an osmotic/electrochemical pull on the protons from one side to the other. The energy is converted to the high-energy phosphate bond of ATP. It is used in just about all the cell's metabolism, including the construction of DNA, RNA and proteins.
womanatwell
April 27, 2009, 03:07 PM PST
womanatwell, Nice references. Rotary motors! You have made me and gpuccio happy.
Adel DiBagno
April 27, 2009, 01:39 PM PST
gpuccio [73]: Hasta la vista. Thanks for considering me reasonable. Can't fight the facts. That would be unprofessional.
Adel DiBagno
April 27, 2009, 01:34 PM PST
Adel: I will be away for a couple of days. I am very sorry that Nakashima has been banned. I was not aware of that. You ask: "What is the source of your belief about the hopes of Darwinists?" The hope for "huge functional spaces" has been expressed many times by darwinists here in the course of discussions. It is usually expressed as the conviction that big "slopes" exist which easily allow passage by random variation from one island of functionality to another. There are even papers in the literature trying to support that (I will give you the reference to the most important one later, but you probably know it; it's the one about generating functional calcium-binding proteins from a random set of sequences). Those papers have many times been quoted against ID, and against my personal arguments in particular. So, I don't think I am making that up. But if you agree with me that "the fraction of possible protein sequences and protein domains that are functional is a very small fraction of the total search space", then I am very happy about that. It confirms my idea that you are a reasonable guy :-)
gpuccio
April 26, 2009, 08:16 PM PST
Here's a picture of ATP Synthase from RCSB Protein Data Bank: http://www.rcsb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/pdb72_1.html
womanatwell
April 26, 2009, 12:33 PM PST
Adel DiBagno, I'm honored that my presence is requested. An important aspect to consider in the possible usefulness of proteins is their ability to fold into usable shapes. ATP synthase is a collection of at least 8 types of proteins that are perfectly shaped to work together. Some are used once, some 3 times, some more than 3 times. They act together to capture an ADP molecule and add a phosphate to it to produce ATP, the energy molecule of the cell. In one of the simplest versions, that of E. coli, there are a total of 6000 amino acids. This energy source had to be there pretty early on. You can read from the abstract of this paper: http://www.pnas.org/content/100/23/13270.abstract?cited-by=yes&legid=pnas;100/23/13270 that protein folds are not easy to come by. In nature, collections of amino acids that fold are rare, much less ones that fold into just the right shape.
womanatwell
April 26, 2009, 12:23 PM PST
What part of transcription and translation, complete with proof-reading, error-correction and editing, strikes you as being cobbled together via an accumulation of genetic accidents? And how can we test the premise that a bacterial flagellum, for example, arose from a population that never had one via an accumulation of genetic accidents?
Joseph
April 26, 2009, 08:04 AM PST
gpuccio, I have seen on the Poofery thread that Mr Nakashima has been banned. I am sorry to learn that, because I had hoped that more personalities could be engaged in this discussion. I hope that womanatwell will come back. Anyway, your explanation of Durston et al. was most lucid and helpful. I will defer further questions and comments about that contribution to FSCI because I want to focus for the moment on a closely related issue. You said in #56:
The size of the target space for a specific function is the most difficult variable to assess, even as an order of magnitude. Indeed, at present no one can define it with certainty. That’s where the opinions of IDists and darwinists necessarily diverge: we do believe that the target space, however big, is anyway a tiny fraction of the search space. Darwinists do hope in huge functional spaces, and profit as much as they can of the present partial ignorance about the relationship between protein structure and function. But one thing is certain: this is an issue which is going to be clarified, and in a relatively short time. So, this particular “gap” in our knowledge will be filled, and we will see who is right.
(My emphasis) What is the source of your belief about the hopes of Darwinists? (References, please.) I don't remember when I learned that the fraction of possible protein sequences and protein domains that are functional was a very small fraction of the total search space, just as the fraction of viable life forms is a fraction of the total conceivable search space pertaining thereunto. But I've known those things for quite a while, and I'm not especially perceptive. So that issue seems already to have been clarified, and both sides are right.
Adel DiBagno
April 26, 2009, 05:39 AM PST
What does ID have to offer?
1- That living organisms are NOT reducible to matter, energy, chance and necessity
2- That the DNA sequence is NOT the information
3- That, like all other designs, we can study and figure out this one so that we can better maintain it.
Joseph
April 25, 2009, 02:12 PM PST
Adel, My conclusion is spot on. Otherwise you would just put up the data. So until you answer my questions don't be asking anything of me.
Joseph
April 25, 2009, 02:07 PM PST
Adel: By the way: no insult taken. Ignorance is more something of a compliment for me :-)
gpuccio
April 25, 2009, 10:00 AM PST
Adel: Durston's FSC is a measure of FSCI; only the method of measurement is different. In the traditional approach, to measure FSCI in a protein, you have to know both the search space (which is simple) and the target space (which is difficult), and then you have to calculate the ratio of the second to the first. In the Durston approach, you consider not a single protein, but a big family of proteins with the same function and similar structure. Then you align all the primary structures, and compute the H (uncertainty) for each position, according to how much that position varies in the family. So, if an aminoacid is always the same in all the proteins, the H will be the least, and the reduction of uncertainty with respect to the ground state will be the highest. IOW, that position can only host that specific aminoacid if the function is to be conserved, and it contributes very much to the total functional information. On the contrary, if one position is occupied preferentially by 2 or 3 aminoacids, and rarely by a few others, its informative power will be less. Finally, if a position can be occupied with the same frequency by any of the 20 aminoacids, its H will be as high as in the ground state, and therefore its contribution to the reduction of uncertainty will be null. IOW, that position has no functional informative value. In the ground state (a random, non-functional sequence of the same length) H will be the highest. So, the highest value of H per position is log 20 (in base 2), that is 4.32 bits. If a position always bears the same aminoacid, its H will be 0, and so the uncertainty reduction, and the Fit value for that position, will be 4.32 bits. If a position changes more, its Fit value will be lower. If a position changes randomly, its H will be 4.32, and its Fit value 0. The total Fit value for a molecule is obtained by the sum of the individual Fit values per position.
The average Fit value per position is obtained by dividing by the number of positions. So, let's see the example of the Ribosomal S12 protein family, cited in the paper. The protein is 121 AAs long. So, the ground state (a random sequence of that length) has an H value of about 523 bits, corresponding to the size of the whole search space of 20^121 sequences. The Fit value of the protein family is 359 bits (not 379: there is an error in the text). That means that the H value of the functional state (the protein family) is about 164 bits. So, the reduction of H from the ground state is 523 - 164 = 359 bits. That's the Fit value for the protein family. What does that mean? a) 523 bits is the H of the ground (random) state, which corresponds to the whole search space of 20^121 (about 10^157). b) The H of the protein family (the functional state) is much lower: only 164 bits, which corresponds to about 10^49. IOW, only about 10^49 sequences of that length are expected to express that function. That is an "indirect" way of measuring the target space, and the true wonderful intuition in the method. c) The difference, 359 bits, expresses the functional information of the molecule in Fits. Please note that it is the same as the ratio of the target space (10^49) to the search space (10^157): 10^-108 (-log2 of that is about 359 bits). So, the value in Fits expresses exactly the probability of finding the target space in the search space by a random search. For this molecule, that probability according to the above method is 1:10^108. As I have arbitrarily set my threshold for rejecting any random hypothesis in the biological context at 1:10^30 - 1:10^50, by my criteria such a molecule is 78 - 58 orders of magnitude beyond the threshold.
IOW, unless a credible necessity mechanism is offered for its emergence (that is, a detailed series of selectable sub-modifications starting from another previously existing protein with a completely different function), the best explanation at present is that it is designed. Is everything clear? This is a method of analysis. It is simple. It is quantitative. It can be easily applied to what we know. Is it perfect? Certainly not. It is obviously based on many assumptions. What I believe is that, if and when we have all the data to calculate the target space "directly", IOW to know with certainty how many sequences of a certain length can express a specific function, the Fit value of those proteins will be shown to be higher (there is a reason for that belief, but for the moment I will not debate it). But the method is here, and it can be applied, and it definitely measures, although with some approximation and probable error, the informational content of known proteins, which, as you can see, is not a myth or a vague argument, but a precise reality.
gpuccio
April 25, 2009, 09:59 AM PST
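The per-position uncertainty calculation gpuccio describes in the comment above can be sketched in a few lines of Python. This is only an illustrative toy, not the published Durston method: the four-sequence "family" below is invented for demonstration, and a real application would use large curated alignments (and sample-size corrections this sketch omits).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
H_GROUND = math.log2(len(AMINO_ACIDS))  # ground-state H: ~4.32 bits/position

def fits(alignment):
    """Total functional bits (Fits) for an aligned protein family:
    the sum over positions of (ground-state H minus observed column H)."""
    length = len(alignment[0])
    total = 0.0
    for i in range(length):
        column = [seq[i] for seq in alignment]
        n = len(column)
        # Shannon uncertainty of this alignment column
        h = -sum((c / n) * math.log2(c / n) for c in Counter(column).values())
        total += H_GROUND - h
    return total

# Toy family of four aligned 5-residue sequences (invented for illustration):
# three positions are fully conserved, two vary slightly.
family = ["MKVLA", "MKVLG", "MKILA", "MKVLA"]
print(round(fits(family), 2))  # → 19.99
```

Fully conserved positions each contribute the maximum 4.32 bits; a randomly varying position would contribute nothing, which matches the intuition in the comment above.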
womanatwell: In the model we are interested in, that is proteins, there should not be so much of a problem like the one you suggest. Protein function is usually tied to a specific 3D structure and active site conformation. Usually, if two proteins have the same function in different species, it is very likely that their 3D structures are similar. So, we could define a function as connected to a 3D structure. If there are proteins with similar function, but completely different structure, they could be treated separately. Essentially, the functional information is necessary to get the correct folding "and" the correct active site. It is interesting that the relationship between primary structure and tertiary structure is very complex, and difficult to compute. For instance, myoglobins and related molecules have almost the same structure (and function) in very distant species, and yet the primary structure is sometimes very different. The Durston method has the great value of easily assigning an "average" value to each aminoacid in terms of H reduction, but it is obviously an approximation. Sometimes, an aminoacid can change without influencing the function only if many other coordinated changes occur at the same time. It is interesting that what we observe in protein families is conservation of function, and sometimes adaptation of it to different environments (what we could call "fine tuning" of the function), rather than "evolution" of the function. One of the great surprises of recent sequencing of genomes is that many proteins are very old, and are already present in "simple" organisms, where their function is difficult to understand (see for instance the paper "Sea Anemone Genome Reveals Ancestral Eumetazoan Gene Repertoire and Genomic Organization"). And, at the same time, practically all species also reveal species-specific proteins, which apparently have no known homologues.
So, we have two different and serious problems for which the current darwinian paradigm has really no convincing answer: 1) How could so many different proteins, with so many different functions and structures, "evolve" so efficiently as to be already present even in the first stages of life? Darwinian theory can only avoid that problem by seeking refuge in the misty mythologies of OOL "theories". 2) How could so many species "evolve" specific new proteins, without a trace of homologues in similar species? Darwinists can only hope that, in time, such homologues will be found. I believe they will not.
gpuccio
April 25, 2009, 09:02 AM PST
Joseph, Your conclusion is unwarranted. By the way, where did you study marine biology? Do you have any publications?
Adel DiBagno
April 25, 2009, 06:14 AM PST
gpuccio, Thanks for the links. They work. I will read them and try to understand the issues. In the interim, I've been reading and re-reading the Durston et al. paper and I have to confess that I don't follow the math. Any help you'd care to give in explaining the authors' argument would be welcome. I would especially appreciate an explanation of how the measure they term FSC relates to FSCI as measured by you or your colleagues here. Table 1 lists FSC (in Fits) for 35 protein families, in values ranging from 46 Fits to 2,416 Fits. What are we to make of those numbers? (How do those numbers relate to the argument from design?) Please excuse any apparent delays in responses by me. I've been placed in moderation for a perceived insult to you in an earlier post. I didn't intend my reference to ignorance as an insult, and I hope that you didn't take it that way. We are all ignorant of most things. I enjoy this site as a way to reduce my ignorance.
Adel DiBagno
April 25, 2009, 05:40 AM PST
gpuccio, Sorry, don't know what key I hit to cut me off. Thanks for your explanations and links. I read the Durston paper and will read the discussions, but that will take a while. Since we are here on the thread, I'd like to ask how the comparisons of completely different molecules with the same functions can help quantitatively. For example, how can you really compare a typewriter quantitatively with a computer-printer system? Even though they both print letters, they are so different that I don't see how you narrow down the space of possible letter-printing machines. BTW, "Fit" is a wonderful term for functional bit.
womanatwell
April 25, 2009, 05:36 AM PST
gpuccio,
womanatwell
April 25, 2009, 05:30 AM PST
Nakashima, Adel: Regarding the links to past discussion about quantitative calculation of FSCI. One of the best discussions (IMO) on that subject took place recently on Mark Frank's blog, in a more "intimate" environment. I think many good points were expressed there by both parties, thanks also to the contribution of very good "adversaries", such as Mark himself, Zachriel and others. Here is the link: The Clapham Omnibus, and the thread, titled "Let's calculate some CSI", is of January 3, 2009. A very good discussion about the Durston approach was conducted here by Durston himself, until for some "mysterious" reasons he had to "go away". I proposed to go on with the discussion, but nobody complied. The discussion started at this thread, of January 28, 2009: Mathematically Defining Functional Information In Biology, and went on in this other one, of February 3, 2009: Durston Cont'd (I hope the links work...) But the subject has been discussed many times here, and under very different perspectives. These are just the examples I remember best.
gpuccio
April 25, 2009, 01:07 AM PST
You are sorry, Adel; otherwise you would have answered the questions. That you didn't pretty much demonstrates you can't. And that you can't proves my point.
Joseph
April 24, 2009, 02:47 PM PST
The point is that YOUR position doesn’t have anything to offer besides “it evolved”.
Sorry, Joseph, the point is that your position has nothing to offer.
Adel DiBagno
April 24, 2009, 02:14 PM PST
Nakashima: I will try to sum up very briefly my views about calculation of FSCI in proteins. At present there are at least two different approaches. 1) The first, and more fundamental, is to define a function in a context (you are perfectly right, every function is defined in a context). So, for instance, let's consider an enzyme which catalyzes a specific reaction in a cell. The function is then defined as a minimum level (arbitrarily fixed) of enzymatic activity in that cell environment. Then we look at the protein length, and define as search space the total space of configurations of that length L, that is 20^L (that is an approximation, because obviously shorter and longer proteins can have the same activity, but it is simpler to reason about a fixed length, at least as a first approach). Up to now, everything is simple. Now comes the difficult part. We have to calculate, at least approximately, the subset of the search space which has the defined function at the defined minimum level. Let's call that the target space. The size of the target space for a specific function is the most difficult variable to assess, even as an order of magnitude. Indeed, at present no one can define it with certainty. That's where the opinions of IDists and darwinists necessarily diverge: we do believe that the target space, however big, is anyway a tiny fraction of the search space. Darwinists do hope in huge functional spaces, and profit as much as they can of the present partial ignorance about the relationship between protein structure and function. But one thing is certain: this is an issue which is going to be clarified, and in a relatively short time. So, this particular "gap" in our knowledge will be filled, and we will see who is right. Once we have an approximate idea of the size of the target space, the rest is easy.
The ratio of the target space to the search space expresses well enough the probability of accessing the target space by a random search "from scratch", under the very reasonable assumption of a uniform distribution for the nucleotide sequences in a random biochemical system. At that point, that probability has to be compared with the existing probabilistic resources in the assumed biological model (available time, reproduction rate, population size, etc.), or, more simply, a threshold can be assumed low enough to reject the random hypothesis in "any" biological context (I have suggested that such a threshold could be fixed at about 10^-30 or 10^-50 for the biological context. Let's remember that Dembski's UPB of 10^150 was an extreme value intended to cut out any random hypothesis in the whole known universe...). One thing should be clear. The above model refers to calculations of absolute FSCI for one protein, for one specific function, and assuming random generation from scratch. And it does not take into account any necessity mechanism, like NS. IOW, the above scenario is more appropriate for a partial analysis of OOL scenarios. So, let's go to different mechanisms which could more directly interest darwinists. We can apply the same principles to calculate the variation in FSCI in an evolutionary transition. But to do that, a specific evolutionary transition has to be proposed. IOW, if darwinists decide to propose a specific model of protein transition (something like: this protein superfamily derived from this other one in such and such time, in such and such population, with such and such mutation rate), then such a model can be quantitatively tested.
In that case, we have to calculate the minimum necessary variation which transforms protein and function A into protein and function B: the search space and the target space will be defined for that variation, and not for the whole protein, and again the probability will be assessed of a random emergence of such a functional variation, and compared with the available probabilistic resources in the defined model. And NS? Well, it can be incorporated in the model easily. Once darwinists define explicitly what has been selected, and why, at the different steps, we can again do the calculation for each gap between selectable steps, and compare the probability with the available resources for that step. In any case, any explicit model can be tested quantitatively, even if we may have to wait some time to get the necessary details to test it. But generic models, "just so stories", can never be tested. 2) Let's go to the second approach: in a way, it's easier, and bypasses some of the difficulties in the above approach. It's the "Durston" method, of calculating the variation in Shannon's H in protein families, and deducing FSCI from that. I will not go into detail about that, and just refer you to the Durston paper, already cited in a previous post.
gpuccio
April 24, 2009, 09:59 AM PST
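The first approach in the comment above (search space 20^L, an assumed target-space size, and the negative log2 of their ratio) can be illustrated with a short Python sketch. The numbers here are assumptions borrowed from the Durston discussion elsewhere in this thread (a 121-residue protein, a target space of roughly 10^49), not derived values.

```python
import math

def fsci_bits(length, target_space):
    """-log2(target/search) for a protein of the given length over the
    20-letter amino-acid alphabet (first-approach approximation)."""
    search_space = 20 ** length  # total sequence space of that length
    return -math.log2(target_space / search_space)

# Assumed numbers for illustration: 121 residues, target space ~10^49
bits = fsci_bits(121, 10 ** 49)
print(round(bits))  # → 360
```

The result, about 360 bits, lands near the ~359-Fit figure quoted for the S12 family; the small discrepancy comes from using 10^49 as a round-number stand-in for the measured functional H.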
Mr Nakashima [54], I think you have me confused with gpuccio. I'm as curious as you are.
Adel DiBagno
April 24, 2009, 07:14 AM PST
Mr DiBagno, I am sadly ignorant of how to calculate FSCI and/or FCSI. So I have to ask: if I want to calculate the FSCI of another protein, do I assume anything about the prior probability of oxytocin (and many other small proteins)? Do I assume the solvent is water at a specific temperature and pressure? It seems to me that function is very much dependent on context.
Nakashima
April 24, 2009, 07:00 AM PST
Adel, What part of transcription and translation, complete with proof-reading, error-correction and editing, strikes you as being cobbled together via an accumulation of genetic accidents? And how can we test the premise that a bacterial flagellum, for example, arose from a population that never had one via an accumulation of genetic accidents? The point is that YOUR position doesn't have anything to offer besides "it evolved".
Joseph
April 24, 2009, 06:06 AM PST
Khan:
true enough, but if the egg had as much control over development as you claim, you would expect the cloned gaur to have at least some cow-like features. but it did not.
A gaur is very cow-like. Or are you saying that it should have been more feminine? And BTW, cartilage is NOT bone. And that means there isn't any such thing as "cartilaginous bones". Thanks for proving that you are either dishonest or have absolutely no clue.
Joseph
April 24, 2009, 06:02 AM PST