This paper will be quoted for many years as the basis for faith in the de novo origin by evolution of useful proteins. How good is the evidence?
De Novo Origination of a New Protein-Coding Gene in a yeast species by Wen Wang et al
“THE total number of different proteins in all organisms on earth is estimated to be 10exp10 to 10exp12. How the protein repertoire evolved the giant diversity that underlies the evolution of the complexity of life is attracting many evolutionary biologists to the field.
Origination of new genes is an important mechanism generating genetic novelties during the evolution of an organism. Processes creating new genes using pre-existing genes as the raw materials are well characterized, such as exon shuffling, gene duplication, retroposition, gene fusion, and fission. However, the process of how a new gene is de novo created from noncoding sequence is largely unknown.
On the basis of genome comparison we have identified a new de novo protein-coding gene, BSC4 that may be involved in the DNA repair pathway, when shifted to a nutrient-poor environment.
We propose that a new de novo protein-coding gene may have evolved from a previously expressed non-coding sequence.”
DOI: 10.1534/genetics.107.084491 May 2008
7 Replies to “Claims of De Novo active protein”
It seems very simple. Judging from the abstract, there is no evidence. That’s what they say:
“The BSC4 gene has an open reading frame (ORF) encoding a 132-amino-acid-long peptide, while there is no homologous ORF in all the sequenced genomes of other fungal species, including its closely related species such as S. paradoxus and S. mikatae. The functional protein-coding feature of the BSC4 gene in S. cerevisiae is supported by population genetics, expression, proteomics, and synthetic lethal data. The evidence suggests that BSC4 may be involved in the DNA repair pathway during the stationary phase of S. cerevisiae and contribute to the robustness of S. cerevisiae, when shifted to a nutrient-poor environment. Because the corresponding noncoding sequences in S. paradoxus, S. mikatae, and S. bayanus also transcribe, we propose that a new de novo protein-coding gene may have evolved from a previously expressed noncoding sequence.”
Anyway, the paper could be really interesting. Unfortunately, I could not gain access to the full paper, up to now.
In my opinion, the demonstration of a functional protein which has no homologue in the related species is strong evidence for ID. The proposal that it evolved from non coding sequence could be either a fairy tale or an interesting issue from the point of view of ID; I cannot judge unless I can read the full text.
I am always interested in non coding sequences, after all, especially when they transcribe…
In order to know whether the gene was original or a retread (I’m assuming a minor alteration), we’d have to know the history of the non-coding sequence.
This is an interesting article, thank you for bringing it to my attention. Add this to the list of “unknown unknowns”.
By “10exp10 to 10exp12” do you mean 10 to the tenth power and ten to the 12th power? YES
Well, I have read the paper, and I will try to give my impressions.
First of all, it’s interesting. Indeed, it could be a first step in a very rewarding approach to some fundamental questions.
Second, it’s highly speculative. It makes a lot of assumptions, none of which is really without risk. Even so, it is worthwhile to make at least some of them, provided that nobody considers these speculations as real evidence. We have here a really tentative approach, and as such it should be considered.
Third, and this is the bad news, the interpretation of the results (and of the assumptions, because there are more assumptions than results) is strictly conventional, and applies all the standard darwinian fairy tales as though they were consecrated truths. But now, what could you expect from a mainstream paper at the discussion level?
Anyway, here the important thing is not the discussion, but rather the facts and the interesting assumptions linked to the facts. So, let’s go into some detail.
To sum up, the scenario is as follows.
1) The authors have studied a gene in the yeast Saccharomyces cerevisiae, called BSC4. It has the characteristics of a protein coding gene: it has an ORF and a stop codon. The corresponding protein would be 132 amino acids long.
2) There is no final proof that it is really a protein coding gene, or that the protein really exists, or of what its function is. The authors, however, give a series of indirect lines of evidence, some of them rather convincing, that it could be so, and that the putative protein could have some function in the DNA repair pathway. I think we can accept this, always remembering that it is not supported by direct evidence, such as the isolation of the real protein.
3) The gene, as it is, is specific to S. cerevisiae, and shows no homology with any other protein coding gene in four other closely related species of yeast, nor in less closely related yeasts. On that basis, the authors consider it a de novo protein coding gene, at present restricted to a single yeast species. That is another assumption, but a rather credible one. Obviously, other mechanisms, like HGT from some other species, could be discovered in the future, as the authors are ready to admit.
4) Here comes the interesting part. The authors have observed that the flanking regions of the gene have good homologues in the four related species. So, after manual alignment, they compared the gene with the base-pair sequence lying between those flanking regions in the four related species (where it is not a protein coding gene). The four related sequences and the BSC4 gene show some degree of homology, not too high, but not too low: 35.71%. That means that approximately one third of the nucleotides are the same in all five species, if we allow that the alignment is correct. That is, in my opinion, a very interesting finding.
5) What can we say of the non coding sequences “homologous” to the BSC4 gene? Not much. There is some evidence that they can be transcribed. Some parts of them, as we have seen, seem to be relatively conserved. All considered, they could be RNA genes, probably with regulatory functions. But, again, that’s a broad assumption.
6) So, what is the interpretation according to the authors? The RNA genes were already there, and they were transcribed, maybe for a function, maybe not. At some point, and only in S. cerevisiae, a new protein coding gene “evolved” from those RNA genes, profiting from the existing transcription apparatus, and acquiring the ORF and other details necessary for translation. The translated protein in some way turned out to be apt for use in the DNA repair pathway, and so it was fixed by the usual hobbits working under NS.
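For readers who want to see how an identity figure like the 35.71% in point 4 is obtained, here is a minimal sketch. It assumes the simplest convention: a column counts as identical only if all aligned sequences carry the same base there, and the denominator is the full alignment length, gaps included. The paper may use a different convention, and the sequences below are a made-up toy alignment, not data from the paper. The function name `percent_identity` is purely illustrative.

```python
def percent_identity(aligned):
    """Percentage of alignment columns where every sequence has the same base.

    Assumption: gap columns ('-') never count as identical, and the
    denominator is the full alignment length.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    identical = sum(
        1
        for column in zip(*aligned)
        if "-" not in column and len(set(column)) == 1
    )
    return 100.0 * identical / length


# Hypothetical toy alignment of five sequences (NOT the BSC4 data).
toy = [
    "ATGCATGCAT",
    "ATGAATGGAT",
    "ATGC-TGCAT",
    "ATGCATGTAT",
    "ATGTATGCAT",
]

print(percent_identity(toy))  # 7 of 10 columns are identical -> 70.0
```

Note how much the result depends on the alignment itself: shifting a single gap can change which columns match, which is why the caveat about manual alignment matters.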
a) If the assumptions are true, we have here a protein coding gene in one species at approximately the same position where, in closely related species, there are RNA genes, probably regulatory.
b) There is some homology, which would not seem completely random, between those sequences. While it is definitely too low to suspect any random derivation (approximately 350 nucleotides are completely different), it could have functional reasons. That would be interesting, especially if the genes are really of two completely different kinds (regulatory RNA genes, and a protein coding gene). Indeed, protein coding genes obey the genetic code for amino acids, while there is no reason why regulatory RNA genes should do the same (see the following discussion in “problems”).
a) What is the function of the non coding sequences? Why are they transcribed?
b) Is that function conserved in S. cerevisiae, even after the “transformation” of the sequence into a protein coding gene? And if not, what has been lost? Please notice that here, apparently, there is no gene duplication: the same sequence acts as an RNA gene in the four related species, and as a protein coding gene in S. cerevisiae.
c) What is the real advantage of the new protein coding gene? Was it really fixed by NS? Can we get any evidence of that?
d) Is it possible that the BSC4 gene still retains the functions of the RNA genes? That would explain the conserved sequences, and it would be extremely interesting, because it would mean that two completely different types of function (and probably of language) are implemented in the same sequence.
e) And, last but not least, the fundamental ID question, which obviously is never even whispered by the authors: how did the BSC4 gene arise? How did about 350 different nucleotides become fixed in the functional solution, in a practically infinite search space? (We are speaking of 4^350 combinations for the nucleotides to be changed, admitting that the existing 128 nucleotides remained constantly fixed; or, if you want, of 10^171 combinations for the protein). Were there intermediate functions, intermediate selections? And why?
f) Finally, a very interesting note: the proposed scenario allows us to understand well the ridiculous aspects of the concept of cooption. Indeed, here a cooption has to have happened, if we accept the proposed darwinian mechanism as a causal mechanism. The original RNA genes have their function. That’s why parts of them, probably the most important functional ones, are conserved. The protein coding gene has its function, which is completely different: to code for a protein. The sequence in the protein coding gene represents amino acids; the sequence in the RNA genes, as far as we know, doesn’t. So, the regulatory function of the RNA genes in some way causes the fixation of the common sequences for a definite function, and then those same sequences are “coopted” in the protein coding gene, for another function, with another language. And still, those are the only conserved sequences!
How reasonable does all that sound?
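The two figures quoted in point e) can be checked with a few lines of arithmetic. This is only a sketch of the raw counting: it takes the numbers from the comment at face value (350 variable nucleotides, a 132-residue protein, 20 amino acids) and says nothing about how many of those sequences would be functional. The helper name `log10_sequence_space` is illustrative.

```python
import math


def log10_sequence_space(alphabet_size, length):
    """Return log10 of the number of possible sequences of a given length."""
    return length * math.log10(alphabet_size)


# 350 nucleotides free to vary, 4 possible bases each.
nucleotide_exp = log10_sequence_space(4, 350)
# 132 residues, 20 possible amino acids each.
protein_exp = log10_sequence_space(20, 132)

print(f"4^350  is about 10^{nucleotide_exp:.1f}")  # 10^210.7
print(f"20^132 is about 10^{protein_exp:.1f}")     # 10^171.7

# Cross-check with exact integers: count the decimal digits.
print(len(str(4 ** 350)))   # 211 digits
print(len(str(20 ** 132)))  # 172 digits
```

So 4^350 is roughly 10^210, and the 10^171 quoted for the protein matches 20^132.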
Thanks, gpuccio, for taking the time to read and summarize. Seems like an interesting paper.
It is very clear that the actual research and results are quite far from the headlines. If I am reading your summary correctly, the authors have identified a unique coding sequence that is not present in other species of yeast.
The question of its origin, however, seems wide open. It would be interesting indeed if, as the authors suggest, the sequence “may have evolved from a previously expressed non-coding sequence,” but even if this were the case, this is hardly a good example of the kinds of de novo changes that are required for the traditional evolutionary paradigm.
I think the authors’ prior statement of the field remains fully intact: Namely, “the process of how a new gene is de novo created from noncoding sequence is largely unknown.”
From an ID perspective, we should also note that “how a new gene is de novo created from noncoding sequence” is very different from how the overall sequence came on the scene in the first place.
I doubt this paper will serve as evidence for, or will have much relevance to, the idea of the de novo origination of genetic information, but the paper does bring out some interesting questions that merit further research and evaluation.
Thank you for your comments.
I would like to add some reflections about this paper. The interest of the paper depends critically on the correctness of the fundamental assumption, that BSC4 is really a protein coding gene. That could be confirmed if the protein is really isolated, and, better still, if its function is discovered.
But let’s accept, for the moment, that the gene is what it seems. I have to remark again that this is a very interesting situation. Not only do we have a well defined new protein coding gene (no homology to other known protein coding genes); we have it in a position well delimited by homologous flanking regions, and in partial homology to corresponding segments in related species which are not protein coding genes.
Now, from a darwinian point of view, what could one expect? Perhaps that the non coding segments may be viewed as “links” to the evolution of the final gene. But here there is nothing which supports that view. The homologous parts are well conserved in all related species studied. So, one would think that they are functional. OK, it’s “only” about 128 nucleotides, and there is the problem of manual aligning which could create biases, but still… The longest sequence conserved is of 12 consecutive nucleotides. 12 consecutive nucleotides in four species. 128 non consecutive. Why? What function can they ensure in non coding genes, which is retained in the coding gene?
We could think that those 128 nucleotides are only a passive demonstration of derivation, and not of function. But then, how and why did the other 350 nucleotides change (finding, let’s remember it, a rare functional configuration), while those 128 have remained the same? Pure chance?
No, we have to go back to the concept of function. And again, the only reasonable answer seems to be: the protein coding gene is designed to attain a completely new function, but it was necessary to maintain, inside it, the sequences responsible for another function, which has nothing to do with the protein. It could be a regulatory function, linked to the general 3D structure of that segment of DNA. Or just the transcription of a functional RNA segment, as an alternative to the protein coding mRNA.
Another good question is: what do darwinists think of a scenario like this, where we have a new gene with no homologue, and a partial homology with non coding segments? About 350 nucleotides are completely new. How did they “evolve”? 350 independent mutations? Non selected? Does anybody want to compute the probabilistic resources?
Or, rather, selected? Selected for what? For the protein function? And where are all the intermediates? Negatively selected? And why do we have at least four species without the protein, and without any visible approach to the protein, which are perfectly thriving today? And not one of the intermediate functional steps of the new protein?
Niches? Fairies? TE? Any guess?