A fierce argument has been raging over at Barry Arrington’s post, A dog is a chien is a perro is a hund, over whether the genetic code is really a semiotic code, or whether “code” is merely a scientific term of convenience in this case. In this post, I hope to clarify the issues and sharpen the discussion between the two sides.
Let’s begin with Barry Arrington’s argument:
An arrangement of signs is arbitrary when the identical purpose could be accomplished through a different arrangement of signs if the rules of the semiotic code were different…
Here’s an example of an arbitrary arrangement of signs: DOG. This is the arrangement of signs English speakers use when they intend to represent Canis lupus familiaris. … Now, the point is that there is nothing inherent in a dog that requires it to be represented in the English language with the letters “D” followed by “O” followed by “G.” If the rules of the semiotic code (i.e., the English language) were different, the identical purpose could be accomplished through a different arrangement of signs. We know this because in other codes the same purpose is accomplished with vastly different signs. In French the purpose is accomplished with the following arrangement of signs: C H I E N. In Spanish the purpose is accomplished with the following arrangement of signs: P E R R O. In German the purpose is accomplished with the following arrangement of signs: H U N D…
How does this apply to the DNA code? The arrangement of signs constituting a particular instruction in the DNA code is arbitrary in the same way that the arrangement of signs for representing Canis lupus familiaris is arbitrary. For example, suppose in a particular strand of DNA the arrangement “AGC” means “add amino acid X.” There is nothing about amino acid X that requires the instruction “add amino acid X” to be represented by “AGC.” …
Why is all of this important to ID? It is important because it shows that the DNA code is not analogous to a semiotic code. It is isometric with a semiotic code. In other words, the digital code embedded in DNA is not “like” a semiotic code, it “is” a semiotic code. This in turn is important because there is only one known source for a semiotic code: intelligent agency.
In what follows, I’d like to sort out some of the issues relating to whether we can properly speak of a genetic code, and whether talk of a “genetic code” is indispensable to biology. I’ll also look at some objections to the term “genetic code,” before drawing a conclusion as to how the issue might be successfully adjudicated on a scientific basis.
What is “the genetic code”?
Here is how Wikipedia defines the genetic code:
The genetic code is the set of rules by which information encoded within genetic material (DNA or mRNA sequences) is translated into proteins (amino acid sequences) by living cells. Biological decoding is accomplished by the ribosome, which links amino acids in an order specified by mRNA, using transfer RNA (tRNA) molecules to carry amino acids and to read the mRNA three nucleotides at a time. The genetic code is highly similar among all organisms, and can be expressed in a simple table with 64 entries.
The code defines how sequences of these nucleotide triplets, called codons, specify which amino acid will be added next during protein synthesis. With some exceptions, a three-nucleotide codon in a nucleic acid sequence specifies a single amino acid. Because the vast majority of genes are encoded with exactly the same code (see the RNA codon table), this particular code is often referred to as the canonical or standard genetic code, or simply the genetic code, though in fact some variant codes have evolved. For example, protein synthesis in human mitochondria relies on a genetic code that differs from the standard genetic code.
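To make the "simple table with 64 entries" concrete, here is a minimal Python sketch of my own (it is not taken from the Wikipedia article). It builds the standard codon table from the conventional U, C, A, G ordering, adds the four reassignments used in human mitochondria, and reads a message three nucleotides at a time. The names `STANDARD_CODE`, `MITO_CODE` and `translate` are hypothetical, and the sketch assumes translation begins at the first nucleotide of the string, which real ribosomes do not do.

```python
from itertools import product

# Standard genetic code in the conventional U, C, A, G ordering.
# One-letter amino acid symbols; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() varies the last position fastest, which matches the string above.
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Human (vertebrate) mitochondrial variant: four codons are reassigned.
MITO_CODE = {**STANDARD_CODE, "UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"}


def translate(mrna: str, code: dict = STANDARD_CODE) -> str:
    """Read mRNA three nucleotides at a time, stopping at the first stop codon.

    Assumption: reading starts at index 0; trailing nucleotides that do not
    fill a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


message = "AUGAGCUGAUGG"
print(translate(message))             # MS   (UGA is read as stop)
print(translate(message, MITO_CODE))  # MSWW (UGA is read as tryptophan)
```

The same string of nucleotides yields two different products depending on which table is applied. Incidentally, the codon "AGC" in Barry Arrington's example is assigned to serine in the standard table.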
Is the genetic code just a metaphor, or is it real?
On May 2, 2011, Professor Gregory Chaitin, a world-famous mathematician and computer scientist, gave a talk entitled, Life as Evolving Software. The talk was given at PPGC UFRGS (Portal do Programa de Pos-Graduacao em Computacao da Universidade Federal do Rio Grande do Sul Mestrado), in Brazil. Professor Chaitin is an avowed neo-Darwinist who is currently endeavoring to create a new mathematical version of Darwin’s theory which rigorously proves that evolution can really work. In 2012, Professor Chaitin published a book entitled, Proving Darwin: Making Biology Mathematical (Pantheon, ISBN: 978-0-375-42314-7). Here are some short excerpts from what Professor Chaitin said about the software of life in his talk in May 2011:
[P]eople often talk about DNA as a kind of programming language, and they mean it sort of loosely, as some kind of metaphor, and we all know about that metaphor. It’s especially used a lot, I think, in evo-devo. But it’s a very natural metaphor, because there are lots of analogies. For example, people talk about computer viruses. And another analogy is: there is this sort of principle in biology as well as in the software world that you don’t start over. If you have a very large software project, and it’s years old, then the software tends to get complicated. You start having the whole history of the software project in the software, because you can’t start over… You … can try adding new stuff on top…
So the point is that now there is a well-known analogy between the software in the natural world and the software that we create in technology. But what I’m saying is, it’s not just an analogy. You can actually take advantage of that, to develop a mathematical theory of biology, at some fundamental level…
Here’s basically the idea. We all know about computer programming languages, and they’re relatively recent, right? Fifty or sixty years, maybe, I don’t know. So … this is artificial digital software – artificial because it’s man-made: we came up with it. Now there is natural digital software, meanwhile, … by which I mean DNA, and this is much, much older – three or four billion years. And the interesting thing about this software is that it’s been there all along, in every cell, in every living being on this planet, except that we didn’t realize that … there was software there until we invented software on our own, and after that, we could see that we were surrounded by software…
So this is the main idea, I think: I’m sort of postulating that DNA is a universal programming language. I see no reason to suppose that it’s less powerful than that. So it’s sort of a shocking thing that we have this very very old software around…
So here’s the way I’m looking at biology now, in this viewpoint. Life is evolving software. Bodies are unimportant, right? The hardware is unimportant. The software is important…
In the opinion of this eminent Darwinist scientist, then, talk of a genetic code is quite literal: “it’s not just an analogy.”
Is the information in life quantifiable?
Some ID critics object that the functional complex specified information we see in living things is unquantifiable. However, this criticism can be easily rebutted. The term “functional information” has been rigorously defined by Szostak and his colleagues in recent scientific papers:
1. Hazen, R.M.; Griffin, P.L.; Carothers, J.M.; Szostak, J.W. 2007, Functional information and the emergence of biocomplexity, Proc Natl Acad Sci U S A, 104 Suppl 1, 8574-81.
2. Szostak, J.W. 2003, Functional information: Molecular messages, Nature, 423, (6941) 689.
3. Carothers, J.M.; Oestreich, S.C.; Davis, J.H.; Szostak, J.W. 2004, Informational complexity and functional activity of RNA structures, J Am Chem Soc, 126, (16) 5130-7.
For instance, here is how Hazen, Griffin, Carothers and Szostak define functional information in the abstract of their 2007 paper:
Complex emergent systems of many interacting components, including complex biological systems, have the potential to perform quantifiable functions. Accordingly, we define “functional information,” I(Ex), as a measure of system complexity. For a given system and function, x (e.g., a folded RNA sequence that binds to GTP), and degree of function, Ex (e.g., the RNA–GTP binding energy), I(Ex) = -log2[F(Ex)], where F(Ex) is the fraction of all possible configurations of the system that possess a degree of function [greater than or equal to] Ex. Functional information, which we illustrate with letter sequences, artificial life, and biopolymers, thus represents the probability that an arbitrary configuration of a system will achieve a specific function to a specified degree.
The information in a protein is therefore quantifiable, and has been quantified, as those who are familiar with Dr. Douglas Axe’s work will be aware.
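For readers who would like to see the definition quoted above at work, here is a minimal Python sketch. It is a toy of my own devising, not the authors' procedure: the "system" is every RNA string of length six, and the "degree of function" is simply the number of positions that match a hypothetical target motif. `TARGET`, `degree_of_function` and `functional_information` are illustrative names only. Real cases require measuring or estimating F(Ex), because exhaustive enumeration is rarely possible.

```python
import math
from itertools import product

ALPHABET = "ACGU"
TARGET = "GGAUCC"  # hypothetical motif, chosen only for illustration


def degree_of_function(seq: str) -> int:
    """Toy stand-in for Ex: the number of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def functional_information(threshold: int) -> float:
    """I(Ex) = -log2 F(Ex), with F(Ex) found by exhaustive enumeration.

    F(Ex) is the fraction of all sequences whose degree of function is
    greater than or equal to the threshold.
    """
    sequences = ["".join(s) for s in product(ALPHABET, repeat=len(TARGET))]
    n_functional = sum(degree_of_function(s) >= threshold for s in sequences)
    if n_functional == 0:
        raise ValueError("no configuration reaches this degree of function")
    # log2(total / functional) is the same quantity as -log2(functional / total)
    return math.log2(len(sequences) / n_functional)


print(functional_information(0))  # 0.0 bits: every sequence qualifies
print(functional_information(5))  # about 7.75 bits: 19 of 4096 qualify
print(functional_information(6))  # 12.0 bits: only the target itself qualifies
```

The stricter the required degree of function, the smaller the qualifying fraction and the larger the number of bits, which is exactly the behavior the definition describes.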
Do we have to speak of a “genetic code”?
On Barry Arrington’s post, A dog is a chien is a perro is a hund, ID critic Alan Fox has proposed that talk of a “genetic code,” while convenient for scientific purposes, is ultimately reducible to chemistry:
Interactions between molecules involve their chemical properties; charge, conformation, level of hydrophilic and lipophilic residues etc. Nothing analogous to language goes on here. (Comment 118)
DNA sequences translate to specific protein sequences by chemical interactions. (Comment 174)
I see no communicative element in the chemical processes that occur when DNA sequences are transcribed into RNA and translated into polypeptide sequences. It’s all a result of the inherent physical and chemical properties of the interacting molecules… To lump chemical processes in with aspects of linguistics is such a stretch that any set that encompasses both is large enough and fuzzy enough to be meaningless… At the cellular and sub-cellular level and consequently and cumulatively at the level of the organism there is a huge amount of communication going on. It is chemical communication… “Encode” could be used as a defined shorthand for some step in the chemical processes that go on in the cell, of course. Maybe there is a scientific definition in the context of biochemistry. (Comment 184)
DNA transcription and translation is a chemical chain of reactions that depends on the spacial conformation and inherent chemical properties of atoms and molecules. (Comment 274)
Eric Anderson challenged Alan Fox at one point in the exchange of opinions:
I hope you aren’t saying that specific protein sequences arise automatically by chemical reactions once the sequence of nucleotides is exposed? There is a whole system in place that takes the 4-character digital code and translates it on the basis of the genetic code into a subsequent physical chain of amino acids. This does not just happen by chemistry. The translation (and it is not just called that by analogy, it is really what is going on) is precisely one of the things that highlights the semiotic nature of the system we are dealing with. (Comment 179)
In a similar vein, Joe responded:
Except there isn’t any physical and chemical properties that DETERMINE which codon REPRESENTS what amino acid. (Comment 192)
Without being uncharitable to Alan Fox, I’d like to get to what seems to me to be the fundamental issue dividing those who insist that talk of a “genetic code” is indispensable to biology from those who say it is not. The critical question, it seems to me, is whether life can be described and explained from the bottom up. What Fox is saying is that we can give a reductionist, bottom-up account which has the same explanatory power as talk of a genetic code. The latter might be more convenient than the former, but it is no more powerful.
If it turns out, then, that the genetic code is a top-down feature of life, then it will indeed be indispensable to biology. But is it? To answer this question, we need to examine the various hypotheses regarding the origin of the genetic code.
Hypotheses regarding the origin of the code
I’d like to quote again from the Wikipedia article on the genetic code, as no-one will accuse Wikipedia of being biased in favor of Intelligent Design:
If amino acids were randomly assigned to triplet codons, then there would be 1.5 x 10^84 possible genetic codes to choose from. However, the genetic code used by all known forms of life is nearly universal with few minor variations. This suggests that a single evolutionary history underlies the origin of the genetic code. Many hypotheses on the evolutionary origins of the universal genetic code have been proposed.
Four themes run through the many hypotheses about the evolution of the genetic code:
* Chemical principles govern specific RNA interaction with amino acids. Experiments with aptamers showed that some amino acids have a selective chemical affinity for the base triplets that code for them. Recent experiments show that of the 8 amino acids tested, 6 show some RNA triplet-amino acid association.
* Biosynthetic expansion. The standard modern genetic code grew from a simpler earlier code through a process of “biosynthetic expansion”. Here the idea is that primordial life “discovered” new amino acids (for example, as by-products of metabolism) and later incorporated some of these into the machinery of genetic coding. Although much circumstantial evidence has been found to suggest that fewer different amino acids were used in the past than today, precise and detailed hypotheses about which amino acids entered the code in what order have proved far more controversial.
* Natural selection has led to codon assignments of the genetic code that minimize the effects of mutations. A recent hypothesis suggests that the triplet code was derived from codes that used longer than triplet codons (such as quadruplet codons). Longer than triplet decoding would have higher degree of codon redundancy and would be more error resistant than the triplet decoding. This feature could allow accurate decoding in the absence of highly complex translational machinery such as the ribosome and before cells began making ribosomes.
* Information channels: Information-theoretic approaches model the process of translating the genetic code into corresponding amino acids as an error-prone information channel. The inherent noise (that is, the error) in the channel poses the organism with a fundamental question: how can a genetic code be constructed to withstand the impact of noise while accurately and efficiently translating information? These “rate-distortion” models suggest that the genetic code originated as a result of the interplay of the three conflicting evolutionary forces: the needs for diverse amino-acids, for error-tolerance and for minimal cost of resources. The code emerges at a coding transition when the mapping of codons to amino-acids becomes nonrandom. The emergence of the code is governed by the topology defined by the probable errors and is related to the map coloring problem.
Transfer RNA molecules appear to have evolved before modern aminoacyl-tRNA synthetases, so the latter cannot be part of the explanation of its patterns.
There are enough data to refute the possibility that the genetic code was randomly constructed (“a frozen accident”). For example, the genetic code clusters certain amino acid assignments. Amino acids that share the same biosynthetic pathway tend to have the same first base in their codons. Amino acids with similar physical properties tend to have similar codons. A robust hypothesis for the origin of genetic code should also address or predict the following gross features of the codon table:
1. absence of codons for D-amino acids
2. secondary codon patterns for some amino acids
3. confinement of synonymous positions to third position
4. limitation to 20 amino acids instead of a number closer to 64
5. relation of stop codon patterns to amino acid coding patterns
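The figure of 1.5 x 10^84 at the start of the passage quoted above can be checked with a few lines of Python. The sketch below rests on my own reading of that figure, which should be treated as an assumption: that it counts the ways of assigning 64 codons to 21 meanings (20 amino acids plus "stop") such that every meaning is used at least once. The function name `count_onto` is hypothetical.

```python
from math import comb


def count_onto(n_codons: int, n_meanings: int) -> int:
    """Count assignments of codons to meanings that use every meaning at least once.

    Uses inclusion-exclusion: start from all assignments, subtract those that
    miss at least one meaning, add back those that miss at least two, and so on.
    """
    return sum(
        (-1) ** k * comb(n_meanings, k) * (n_meanings - k) ** n_codons
        for k in range(n_meanings + 1)
    )


every_meaning_used = count_onto(64, 21)  # assumption: 20 amino acids plus stop
unrestricted = 21 ** 64                  # any assignment, even one that skips meanings

print(f"{every_meaning_used:.2e}")  # on the order of 1.5 x 10^84
print(f"{unrestricted:.2e}")        # on the order of 4 x 10^84
```

Either way of counting gives a number of possible tables in the region of 10^84, so the point of the quoted passage does not depend on which convention is chosen.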
It seems to me that what the “code skeptics” are saying is that if we can account for the origin of the genetic code in terms of either bottom-up processes (e.g. unknown chemical principles that make the code a necessity), or bottom-up constraints (i.e. a kind of selection process that occurred early in the evolution of life, and that favored the code we have now), then we can dispense with the code metaphor. The ultimate explanation for the code has nothing to do with choice or agency; it is ultimately the product of necessity.
In responding to the “code skeptics,” we need to keep in mind that they are bound by their own methodology to explain the origin of the genetic code in non-teleological, causal terms. They need to explain how things happened in the way that they suppose. Thus if a code-skeptic were to argue that living things have the code they do because it is one which accurately and efficiently translates information in a way that withstands the impact of noise, then he/she is illicitly substituting a teleological explanation for an efficient causal one. We need to ask the skeptic: how did Nature arrive at such an ideal code as the one we find in living things today?
By contrast, a “top-down” explanation of life goes beyond such reductionistic accounts. On a top-down account, it makes perfect sense to say that the genetic code has the properties it has because they help it to withstand the impact of noise while accurately and efficiently translating information. The “because” here is a teleological one. A teleological explanation like this ties in perfectly well with intelligent agency: normally the question we ask an agent when they do something is: “Why did you do it that way?” The question of how the agent did it is of secondary importance, and it may be the case that if the agent is a very intelligent one, we might not even understand his/her “How” explanation. But we would still want to know “Why?” And in the case of the genetic code, we have an answer to that question.
We currently lack even a plausible natural process which could have generated the genetic code. On the other hand, we know that intelligent agents can generate codes. The default hypothesis should therefore be that the code we find in living things is the product of an Intelligent Agent.
The question we now have to ask ourselves is whether a teleological account of life implies an Intelligent Designer. Recently, the philosopher Thomas Nagel has argued for a form of teleological naturalism, which I discussed in a recent post. Teleological naturalism at least recognizes the inadequacy of non-purposive causal explanations of the cosmos. That’s a big step in the right direction, and I respect Thomas Nagel for taking that step. Nevertheless, it has a fatal flaw: Nature, being unintelligent, does not and cannot look forward. It is precisely for this reason that the Intelligent Design movement contends that if the genetic code can only be understood from a top-down teleological perspective, then the emergence of the genetic code can only be adequately explained in terms of a Mind which produced it.
Does talk of a “genetic code” beg the question, by assuming the existence of a conscious sender and receiver and a rule-maker?
On Barry Arrington’s post, A dog is a chien is a perro is a hund, Alan Fox objected to talk of a “genetic code” on the grounds that “there is no giver or receiver of information in a communicative sense” (comment 274). He also objected to “rule” terminology on the grounds that it begs the question of the existence of a rule-maker. However, Chance Ratcliff successfully rebutted this objection when he defined the rule using the mathematical notion of a mapping: “the DNA to polypeptide mapping function is in the form F:A→B, where F: is performed essentially by RNA polymerase, aminoacyl trna synthetase, and the ribosome, and works by mechanism to convert elements of A (codons) into elements of B (amino acids)” (comment 308).
Likewise, there is no reason why we cannot speak of a molecule as a sender or receiver of information.
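Chance Ratcliff's mapping F:A→B can be written down directly, which shows that "rule" here need mean nothing more than a function from one finite set to another. The following Python sketch is my own illustration, not his: `transcribe` is a coding-strand shorthand (T becomes U) standing in for the work of RNA polymerase, and the table lookup stands in for the work of the aminoacyl-tRNA synthetases and the ribosome. All of the names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard codon table, RNA alphabet, conventional U, C, A, G ordering.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def transcribe(dna_codon: str) -> str:
    """Coding-strand shorthand for transcription: T becomes U (a simplification)."""
    return dna_codon.replace("T", "U")


def F(dna_codon: str) -> str:
    """F: A -> B, composed of transcription followed by a table lookup."""
    return CODON_TABLE[transcribe(dna_codon)]


A = ["".join(c) for c in product("TCAG", repeat=3)]  # domain: all 64 DNA codons
B = {F(codon) for codon in A}                        # image: 20 amino acids plus stop
degeneracy = Counter(F(codon) for codon in A)        # how many codons share each output

print(len(A), len(B))                    # 64 21
print(degeneracy["L"], degeneracy["W"])  # 6 1
```

F is defined for every element of A, reaches all 21 outputs, and is many-to-one: six codons map to leucine but only one to tryptophan. Nothing in the definition of the function itself presupposes a conscious sender, receiver, or rule-maker.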
The road ahead
The Wikipedia article on the genetic code mentions five striking facts about the genetic code which any successful account must be able to explain. It seems to me that Intelligent Design would do well to focus on these “nitty-gritty” questions, in order to demonstrate its scientific superiority to “bottom-up,” reductionistic accounts of life. That, for the time being, is the way forward, I believe. If ID proponents can explain a lot more about the peculiar properties of life than their Darwinian counterparts, then the younger generation of scientists, who are not wedded to old dogmas and fossilized ways of thinking, will start to take notice.
What do readers think?