Uncommon Descent Serving The Intelligent Design Community

Guest Post Part 2: Qualitative Complex and Specified Information within genes – an example

All that follows is from Dr. JDD:

I hope the first of these two posts sufficiently introduced the simplified concept of AltORFs that overlap existing genes. It appears to me that this area of research is vastly underrepresented, not only in the literature but also in the minds of many PhD-level scientists today. The very fact that it is barely mentioned in papers such as those referenced previously (Nature publications on the human proteome, for example) illustrates this point to a degree.
The paper I wish to discuss is this one from 2013:

An out-of-frame overlapping reading frame in the ataxin-1 coding sequence encodes a novel ataxin-1 interacting protein. Bergeron D, et al. J Biol Chem. 2013 Jul 26;288(30):21824-35

ATXN1 is a gene that encodes a protein (ataxin-1) implicated in neurodegeneration. In particular, the expansion of a polyglutamine (poly-Q) repeat in the middle of the ~815 amino acid protein is a hallmark of spinocerebellar ataxia type 1 (you can read more about this condition online). As a side note, this is quite common in several neurodegenerative diseases: a repeat expansion of a triplet codon that encodes a specific amino acid, usually Q. There is often some variation in how many Q’s are present in such a region from one person to the next, but above a certain number of repeats a degenerative phenotype is observed, with severity of disease usually associated with the number of repeats. Understanding the regulation of the gene/protein, and of other genes/proteins associated with it, is therefore essential for better understanding the disease.

The normal function of ataxin-1 was unclear; however, it is known to bind RNA and several transcription factors, and it localises to the cell nucleus. The researchers examined the genetic sequence of ATXN1 and noticed a potential overlapping ORF (open reading frame) that spanned from bp 30 to bp 587.

They then elucidated the sequence of what they found was the dominant product of this AltORF, which was around 180aa in length (compared with ~815aa for ataxin-1). This AltORF was an alternative frame read that started on the 3rd codon of the normal ORF. As a result, here is the protein sequence that is yielded from this AltORF:


If we take the amino acid sequence translated from the same region of the consensus ORF we get this sequence:


As you can see, these do not align at all. If you run the alt-ATXN1 amino acid sequence through a simple BLASTp search against nr sequences (default parameters), nothing in the database aligns with it with any measurable homology. It therefore appears to be unique. It does not even align with any part of the ATXN1 protein (which is not unexpected given it is a +3 frame-shift).
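
To make the frame-shift point concrete, here is a minimal Python sketch of how one stretch of DNA yields unrelated amino acid sequences depending on where reading starts. It uses the standard genetic code and a short made-up CAG-repeat sequence, not the real ATXN1 sequence. The helper name `translate` and its `offset` argument (simply the number of nucleotides skipped before reading begins) are my own illustration and are not meant to reproduce the frame numbering used in the paper.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, offset=0):
    """Translate dna from a nucleotide offset, stopping at the first stop codon."""
    protein = []
    for i in range(offset, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":  # stop codon ends the reading frame
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical toy sequence: a start codon, four CAG repeats, a stop codon.
TOY_DNA = "ATGCAGCAGCAGCAGTAA"

print(translate(TOY_DNA, 0))  # -> MQQQQ (poly-Q, as in the normal frame)
print(translate(TOY_DNA, 1))  # -> CSSSS
print(translate(TOY_DNA, 2))  # -> AAAAV
```

The same nucleotides give poly-Q in one frame and entirely different residues in the other two, which is why no alignment between the two proteins should be expected.
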
So the researchers show that this protein is real, it is expressed in the cell, and most astonishingly, it interacts with the ataxin-1 protein. Not only does it localise to the same place (nucleus) but there is clear interaction between the two proteins that appears to be specific.

Now let us take a step back for one second. Why is this astonishing? I am sure many of our materialist friends will say this is not astonishing at all (despite the researchers using that common word “surprisingly” we so often read). They will maintain that of course it co-evolved, as it is within the same gene, so it would only make sense for it to be transcribed and translated at the same time as the normal ATXN1, as it is on the same transcript. But do not let such playing down of this issue fool you: think about the complexity and impact of a 180aa protein with no homology to an 815aa protein co-evolving, where the evolution of one inherently impacts that of the other (in that region), yet they seem to be functioning together. If you can remove your bias from the evolutionary paradigm and not shout “co-evolution” then you can see the astonishing nature of this unique gene within a gene. This is the stuff programmers dream of: embedding useful, functional and meaningful code within code. If that does not amaze you I do not know what will in molecular biology, quite frankly!

But qualitatively, we must come back to the point about this being an example of information that is complex (dual code layering) and specified (shared functionality: proteins interacting with each other and localising to the same place). I can give you no fancy formulas, probabilities or statistical significance analysis. However, Douglas Axe, Michael Behe, Stephen Meyer and others, including people here at UD, have looked objectively at the probability of unguided random processes generating a single protein of, say, the size of ataxin-1 (~815aa long) that is complex and clearly has a specific role and function in the cell. If you lean towards their arguments, such work casts doubt on the ability of such a protein to arise in the proposed manner. When you then add another layer of complexity to that, well, the probabilities do not really need to be calculated (I am not sure how they can be calculated). I think this is a case of “data speaking for itself.” I know most/all materialists will disagree with me here, but I write this more to share the fascination of biology with those who will be able to see how astonishingly complex our genes are, and think objectively about the implications.

So, let us return to the paper. The researchers do demonstrate that the Alt-ATXN1 protein interacts with ataxin-1. Another striking feature is that it does not contain a canonical nuclear localisation signal (NLS), yet it is clearly transported into the nucleus after translation in the cytoplasm. The authors show a more novel route by which Alt-ATXN1 reaches the nucleus, which allows it to co-localise with ataxin-1 and interact with it.

It should be emphasised that interaction does not automatically mean functionality or an essential role. However, the researchers do show a specific interaction of the two proteins, and they show that significant amounts of each are expressed. The products of some quite small AltORFs are thought to be broken down almost instantly in the cell, and so have a very short cellular half-life. This does not seem to be the case here. The paper therefore presents several lines of evidence suggesting that Alt-ATXN1 has a cellular role to fulfil. The authors suggest from some of these results that Alt-ATXN1 appears to be a “mediator of the function of ATXN1 in normal and/or pathological conditions.” They also demonstrate RNA-binding activity of the Alt-ATXN1 protein. It would therefore be foolish and naive to conclude that Alt-ATXN1 is insignificant, merely the accidental by-product of aberrant translation of an out-of-frame coding sequence.

In summary, this work shows several things that are significant when we consider random mutations within genes globally. Some key questions follow:

1) How common are such AltORFs?

2) How many AltORFs produce translated products that are directly involved with the normal ORF product?

3) How can we model the pathway of “evolution” to account for such AltORF products?
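
As a very rough illustration of how one might begin to approach question 1 computationally, the sketch below scans the three forward reading frames of a sequence for ATG-to-stop open reading frames and reports where they sit. Everything here is a simplifying assumption of mine: the function name `find_orfs`, the `min_codons` cut-off and the toy sequence are all hypothetical, and this is not the method used in the papers discussed. A real survey would also have to consider spliced transcripts, the reverse strand, non-ATG starts and experimental evidence of translation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG-to-stop ORF in the forward frames.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts the codons before the stop (including ATG).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence with a short ORF in frame 1 nested inside
# a longer ORF in frame 0.
TOY_GENE = "ATGCATGGCCTAAGCTAA"

for frame, start, end in find_orfs(TOY_GENE):
    print(frame, start, end, TOY_GENE[start:end])
```

On this toy input the scan reports the full-length ORF in frame 0 and a shorter one in frame 1 that lies entirely within it, which is the kind of overlap discussed in this post.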

Consider the comment on the Nature paper in my first post: they anticipate/estimate up to 3.88 AltORFs per gene. Additionally, around 50% of the MS spectra from the peptides identified to define the human proteome could not be mapped to sequences in the public datasets, implying that we are indeed missing many proteins (often with tissue-restricted expression) and that our understanding of how the genetic information defines the proteinaceous compartment is still relatively naïve. If indeed we find that overlapping AltORFs are common (even if only 2-5% of proteins contain such AltORFs), this to my mind poses a serious problem for understanding and accepting that unguided processes generate such codes with specific function and complexity. This is especially true if such proteins are indispensable in developmental pathways.

Perhaps you are reading and already knew some (or all) of this and it is old news. Personally, I still get fascinated by such work even when I contemplate it for the second, third and times beyond. To me, it cries out incredible design. Sure, design that is broken, but still design, and I would actually be less OK if I saw design that was not broken, given my faith. We all must have faith though – faith that something so intricate and complex as multiple layers of code within the same code can be accounted for by random unguided accidents, or faith that what cries out design is actually the product of a designer outside our known space-time dimension and therefore invisible to our eyes (and full understanding). Personally I strongly believe that no more faith is required for the latter when you consider the vastness of even the most minuscule cellular process, and quite frankly, how little we know.

I actually quite like such posts. It is all the more important that their authors are biologists as they can provide very interesting details. In terms of perfect design, I think that biosystems were designed to meet a whole number of (conflicting) criteria. Failing to recognize that may lead to severe underestimations of the 'quality of service'. I strongly suspect that biosystems are very close to (if not spot-on) Pareto-optimal. EugeneS
Mung, Many Thanks. EugeneS
Eric, in fact, bacteria are too perfect, thus they could not have arisen first and given rise to eukaryotes. :) I was thinking on this exact point today. Given the evolutionary narrative, it should have been the earliest forms of life that had the most "junk" in them. So once again things seem upside down from what would be expected from evolution. And to add further to the subject of the OP, in eukaryotes you don't start with DNA -> messenger RNA. Before that you have the splicing of introns, so that's yet another overlay of a code. Mung
Dr JDD @10:
Broken design is what would fit with the Christian design paradigm . . .
Or the Judeo paradigm. Or various other groups. Or, more importantly, the paradigm of any rational individual who recognizes that designed systems can break, fail, degrade over time. What is a failed paradigm is the absurd, illogical, and juvenile insistence from so many ID opponents that anything designed in biology must be "perfect" (always according to their subjective, poorly-thought-out, personal preference of what such perfect design should look like). Eric Anderson
nad med @11: ". . . something must regulate the rule by which both proteins are generated specially when they differ in time and quantity . . ." Exactly. And this is an area that has been vastly underrepresented in the literature, though it is becoming more appreciated as a significant issue. Control of processes is absolutely critical. In most complex, integrated, functional systems it requires significantly more information than the specification of the parts themselves. This realization gives the lie to the old gene => makes protein => makes us simplistic storyline. It should also give anyone pause who thinks that there is a lot of junk DNA around because most of the DNA doesn't code for proteins. An extremely exciting and wide-open area of research for decades to come. Eric Anderson
Dr. JDD: I mean, if we have a sequence within a sequence, then something must regulate the rule by which both proteins are generated specially when they differ in time and quantity. Do we know how that is regulated? nad med
Nad Med: Broken design is what would fit with the Christian design paradigm, i.e. degradation of the genome and biological components from their original design. Dr John Sanford, I believe, terms this genetic entropy. What we see around us looks imperfect (a criticism of design touted as flawed) but that is a broken version of the original. Apologies, I'm not sure what you mean in your second question exactly? Dr JDD
Hi EugeneS, https://uncommondesc.wpengine.com/intelligent-design/guest-post-part-1-of-2-qualitative-complex-and-specified-information-within-genes-an-introduction/ Mung
Dr JDD: I could not find a hyperlink from here to the first of the two posts. Thanks. EugeneS
Dr. JDD : What is the supervisor of the process ? Thanks nad med
Dr. JDD : What do you mean by broken design ? Thanks nad med
Box - I have access to that article. It is a useful review, if you want it. There is an open access article of original research from the same authors describing how nearly 3% of all proteins are from AltORFs, which is not insignificant when you consider that the number of proteins vastly outnumbers the number of traditional genes due to alternative splicing etc. Thanks for the interest. As stated, in and of themselves these are not proofs at all that unguided processes cannot account for such observations. However, any additional layering of code only decreases chance and increases the complexity, and hence the problem of chance generating such a code. I do hope the many IDers who are smarter than me with probabilistic models will start to interrogate such findings more and see how they can or cannot fit with current proposed evolutionary models. Another thing that is very interesting to me is how there is very high homology in mammals for the parent protein but the alternative format shows dramatically less homology, as low as 35% from human to mouse (where the traditional protein is >80%). Dr JDD
Dionisio, Indeed! Keep up the good work. Overlapping code may very well prove to be of special interest, because, as Dr JDD points out, it appears to be well within the reach of probability calculation a la Gauger & Axe. If so, it will be very interesting to see its devastating effect on materialism. - - - - - Found in translation: functions and evolution of a recently discovered alternative proteome may be a relevant article. Unfortunately I have no access. Box
Box, Very interesting article. Thank you. These days it's very exciting to read about the discoveries reported by biology researchers, right? :) Dionisio
Thank you for discussing this article Dr JDD. The following article may be of interest: The A2A Adenosine Receptor Is a Dual Coding Gene, Chien-fei Lee et al.
We report here that in addition to the production of the A2AR protein, translation from an upstream, out-of frame AUG of the rat A2AR gene produces a 134-amino acid protein (designated uORF5).
uORF5 has an “unknown function”, right? Nope.
Expression of the uORF5 protein suppressed the AP1-mediated transcription promoted by nerve growth factor and modulated the expression of several proteins that were implicated in the MAPK pathway.
We already knew this, right? Nope.
Taken together, our results show that the rat A2AR gene encodes two distinct proteins (A2AR and uORF5) in an A2AR-dependent manner. Our study reveals a new example of the complexity of the mammalian genome and provides novel insights into the function of A2AR.
So only in rats, right? Nope.
The expression of uORF5 was detected in rat tissues where the Adora2a transcript is markedly expressed (Fig. 3C) and under conditions in which Adora2a is up-regulated (e.g. hypoxia) (Fig. 3F). In addition, an ORF for a similar mouse uORF5 protein, which shares 84% aa identity with rat uORF5, was detected in the mouse Adora2a gene. In the human and chimpanzee genomes, the ORF for a shorter uORF5-like protein, which shares 75% aa identity with rat uORF5, was also detected.
Dr. JDD, Very interesting post. Thank you. Dionisio