Uncommon Descent Serving The Intelligent Design Community

Many genes relatively new, scientists find


Science writer Carl Zimmer in the New York Times on recent discoveries in the ongoing evolution of genes:

New genes were long thought to derive from duplications or mistakes in older genes. But small mutations can also form new genes from scratch.

For some scientists, like Dr. Tautz, the data pointed to an inescapable conclusion: Orphan genes had not been passed down through the generations for billions of years. They had come into existence much later.

“It’s almost like Sherlock Holmes,” said Dr. Tautz, citing the detective’s famous dictum: “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

Dr. Begun and his colleagues renamed orphan genes “de novo genes,” from the Latin for new. He found that many of his fellow scientists weren’t ready to accept this idea.

Can’t think why not, can you? 😉

While many de novo genes ultimately vanish, some cling to existence and take on essential jobs. Dr. Tautz said the rise of these genes might be as important a factor in evolution as gene duplication.

Now how will this affect attempts to construct evolutionary histories via genome mapping?

More Zimmer on the genome:

One gets the impression that the guy who thought our genomes were in some sense “us” spoke too quickly.

Follow UD News at Twitter!

Comments
Hi Piotr, Thank you for your response and your posts. I have really enjoyed reading the posts in this thread. This is good stuff! I see you're getting it from all sides, so I will try not to expect you to respond to every single point. :) I'll try to summarize what I see as some of the issues.

How do we know how young or how old an orphan gene is? It's not like we're looking at DNA from long-dead organisms (except by inference)! "If it's relatively short and only somewhat complex, then it's young" just doesn't seem satisfying.

Regarding probabilities, you seem to think it more likely that there will be many young simple orphan genes than that there will be one or a few old complex orphan genes. But what makes one more probable than the other? It's as if you're saying simple function is easy to find but complex function is not, so we are more likely to find instances of simple function than instances of complex function. But this is based on what? Again, please don't feel you have to respond to all points.
Complex proteins are modular, and they can evolve by multiplying and/or shuffling their modules. Again, actual gene families can be used to illustrate it.
I still have a great deal to learn in this area, so what do you mean by modular? Are you talking about modular protein domains? Are you saying that all complex proteins are modular, and if so in what sense?
Protein domains represent the minimum levels of organization that provide a stable globular core...modular domains tend to be stable and fold on their own...they are not only folding modules but functional modules. ...Most protein adaptor domains are relatively small, comprising 60 to 130 amino acids. This is sufficient for a shallow concave binding site that can interact with a flexible peptide, carbohydrate, or lipid. These binding sites differ from most enzyme active sites, which usually lie between two domains and are able to take two or more molecules out of the aqueous environment. The nature of the binding site of adaptor modules has a number of interesting consequences for the functions of protein modules...The small size of the core binding region also means that there is often insufficient interaction to allow specificity.
– Modular Protein Domains
Mung
May 2, 2014 at 04:59 PM PDT
Piotr's premise is nothing but wishful thinking. And why does similarity automatically mean homologs? Talk about weak thinking.
Joe
May 2, 2014 at 04:24 PM PDT
Piotr: It's past midnight where I am too. See you tomorrow, if possible. :)
gpuccio
May 2, 2014 at 03:27 PM PDT
@gpuccio It's past midnight where I am. I shall be back tomorrow if I find the time.
Piotr
May 2, 2014 at 03:20 PM PDT
What would be the role of duplication to explain the things I have outlined?
I mean internal duplication, e.g. when a whole exon is duplicated. It isn't something that could be achieved by point mutations accumulating over a zillion years.
Piotr
May 2, 2014 at 03:18 PM PDT
Piotr: As you can see, I have already answered. The problem is that new protein superfamilies appear all the time without any detectable homologue which can explain their random evolution, least of all naturally selectable intermediates. The problem here is not to argue about how to use the term "orphan" or de novo, but to explain how and why new complex sequence information arises against all probabilistic considerations. I think you are trying to evade the real issue. Your choice.
gpuccio
May 2, 2014 at 03:16 PM PDT
Piotr: The second "exception", Branchiostoma floridae, is much less stunning. There is only a partial alignment of interest. Here is the blast:
Length: Zebrafish 1057, BF 1204
Identities: 29/90 (32%)
Positives: 48/90 (53%)
Expect: 3e-06
Nothing really important here.
gpuccio
May 2, 2014 at 03:13 PM PDT
gpuccio: The fact remains that jawless fish, and practically all other species except vertebrates, have no trace of it.
You mean gnathostomes (lampreys are also vertebrates). But homologues have been found in outgroups (lancelets, starfish and sea urchins), which militates against its de novo origin in gnathostomes and suggests the loss of the gene in lampreys and hagfish as an alternative. Note that even when studying "modern" orphan genes researchers cautiously check a number of criteria before they declare a gene an orphan, ruling out all other explanations. By contrast, you recklessly claim such a status for a gene that must be 500+ million years old. It's a cavalier approach.
Piotr
May 2, 2014 at 03:10 PM PDT
Piotr: OK, here is the blast of RAG1 in zebrafish against its homolog in Strongylocentrotus purpuratus (a sea urchin):
Length: Zebrafish 1057, SP 983
Identities: 244/1010 (24%)
Positives: 402/1010 (39%)
Expect: 1e-55
I agree that it is a strong homology, but completely isolated. However, we have here 244 identities. In humans, we have 632. 632 - 244 = 388. And the length of the SP protein is almost the same (983 AAs). So, even if the protein in SP is a precursor (but then, why was it lost in all other species preceding vertebrates?), still 388 fixed AAs are to be explained after that precursor. I must admit that this strange observation would be a good argument against CD!
gpuccio
May 2, 2014 at 03:04 PM PDT
I asked for a serious reference. This ASA talk is hardly a serious review paper.
You obviously did not read it. It is about as serious as it gets. There are hundreds of references backing up everything Wilcox says.
jerry
May 2, 2014 at 02:53 PM PDT
Piotr:
I never said that. I pointed out that there's strong evidence against its gene being a de novo one in early vertebrates.
What "strong evidence"? It appeared in vertebrates as a de novo gene(I know, there is some isolated homology in a couple of isolated species, which is really difficult to explain, whatever the theory one supports: are you referring to that?) The fact remains that jawless fish, and practically all other species except vertebrates, have no trace of it.gpuccio
May 2, 2014 at 02:52 PM PDT
Piotr: I don't follow you. What would be the role of duplication to explain the things I have outlined? Please explain. I can make allowance for any kind of random variation. Nothing changes. I used the term mutation to mean any possible random variation. What counts is the number of states which can be randomly reached in the system in the time span. Each variation is a new state. So no, I am not arguing against a straw man. I like to argue against men of flesh and bone.
gpuccio
May 2, 2014 at 02:48 PM PDT
So, RAG1 is 1040 AAs long because it started short at the beginning of the vertebrate radiation?
I never said that. I pointed out that there's strong evidence against its gene being a de novo one in early vertebrates.
Piotr
May 2, 2014 at 02:46 PM PDT
@gpuccio, #102: I don't see where in your model allowance is made for things like internal duplication. Of course, if you only consider point mutations, you argue against a straw man.
Piotr
May 2, 2014 at 02:42 PM PDT
Piotr:
Your answer is a non-explanation. We still don’t know why “the designer” behaves in this fashion. My explanation is simple: I reject your assertion that protein length can’t evolve.
It's not my assertion. It's only that we don't observe anything like that. Again, give empirical support to that idea.
The first vertebrates are more than twice as old as the first mammals. When the mammalian de novo genes arose, the vertebrate ones had already been evolving for 300 million years.
So, RAG1 is 1040 AAs long because it started short at the beginning of the vertebrate radiation? But then how is it that 632 AAs are absolutely conserved up to humans, and 786 are similar? Facts please, not imagination. I have given you some definite facts.
gpuccio
May 2, 2014 at 02:36 PM PDT
@Jerry I asked for a serious reference. This ASA talk is hardly a serious review paper.
Piotr
May 2, 2014 at 02:34 PM PDT
Errata corrige: Piotr:
If you don't mind, I would appreciate some information about the flavour of ID that you prefer. Guided evolution, which accepts universal common descent, speciation, etc., over billions of years, but "the designer" is required to tweak the genome every time some new function is to evolve?
Yes. I have always defended that position. To be precise, every time some new complex function is to evolve. Remember, complex functional specification is the tool to infer design.
gpuccio
May 2, 2014 at 02:30 PM PDT
Piotr:
If you don't mind, I would appreciate some information about the flavour of ID that you prefer. Guided evolution, which accepts universal common descent, speciation, etc., over billions of years, but "the designer" is required to tweak the genome every time some new function is to evolve?
Yes. I have always defended that position. To be precise, every time some new complex function is to evolve. Remember, complex functional specification is the tool to infer design.
gpuccio
May 2, 2014 at 02:29 PM PDT
Piotr:
First, it isn’t my theory; it reflects the consensus of those who know more about protein evolution than either of us. Secondly, it is consistent with the statistics relating the mean length of proteins to their evolutionary age.
First, as I already said, I don't accept ideas on authority, from anyone. If you report a theory, be ready to defend it yourself. Second, the theory needs not only to be "consistent" with protein lengths, a fact that can be better explained in many other ways. It needs to be supported by some observation in the proteome, for example some real cases of proteins which start short and become longer across speciation while maintaining or improving their function. I am waiting.
gpuccio
May 2, 2014 at 02:27 PM PDT
Piotr:
OK, I’ll repeat ad nauseam if necessary: why should it be the case?
Proteins are as long as is necessary for their function. We have seen that the mean length in vertebrates is 624 AAs, but RAG1 is more than 1000. There are short old proteins, and long recent proteins. On average, the length becomes lower with time. As I have said, the only reasonable explanation for that is that the kind of function which is necessary in more recent speciation is different: more regulatory and less focused on fundamental biochemical activities. Your theory that proteins become longer after starting short has no empirical support. I have given specific examples. You give us some, something real.
If they are very important and highly conserved, stabilising selection conserves their size as well. It doesn’t mean that the ATP gene family sprang into existence ready-made.
So, what is your idea? How did the beta subunit of ATP synthase, with its 330 conserved AAs, "spring into existence"? A protein which is present in all living cells? And how did the RAG1 protein, with its 700 conserved amino acids, "spring into existence"? A protein which appears in jawed fish and is ultraconserved up to humans, and has no detectable homolog at all in lamprey, a jawless fish?
gpuccio
May 2, 2014 at 02:23 PM PDT
Piotr: Here I post again the computation for my biological probability bound of 150 bits:
Just follow me a little bit.
a) "Simultaneous" obviously does not mean "in one attempt". What we have to consider is the whole system, which is made of:
a1) a population size (a number of replicators)
a2) a mean replication time
a3) a time span (the time available for the new "species", or whatever, to appear), IOWs for the transition from A (the precursor) to B (the new thing)
a4) a mutation rate
a5) the number of new proteins that characterizes the new state (B) versus A
a6) the probability for each new protein to arise in a random system, in one attempt
b) Given those numbers, we can make a few easy computations.
c) I will assume an extremely generous model:
c1) Our population is the whole prokaryotic population on our planet. I will estimate it at 5*10^30 individuals (I have found that on the internet).
c2) I assume a mean replication time of one division every 30 minutes.
c3) I assume a time span of 4 billion years (2.1*10^15 minutes).
c4) I assume a mutation rate of 0.003 mutations per genome per generation (from the internet, again).
c5) I assume that B is characterized, versus A, by 3 new proteins, completely unrelated at the sequence level to all the proteins in A, and unrelated to one another.
c6) I assume the same functional complexity for each of the 3 proteins: 357 bits (fits), which is the median value for the 35 protein families evaluated in Durston's paper.
Multiplying the population size (c1) by the number of generations in the time span (c3 divided by c2) and by the mutation rate (c4), we get the total number of possible mutations in our system in the time span of 4 billion years. The result, with those numbers, is 1.0512*10^42. That is an upper threshold for the total number of individual new states that can be reached in our system in the time span (if each mutation gives a new state).
gpuccio
May 2, 2014 at 02:08 PM PDT
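The arithmetic in the comment above is easy to check. Here is a minimal Python sketch that reproduces the quoted figure using only the commenter's own assumptions (population size, division time, time span, and mutation rate are his inputs, not measured values):
```python
# Back-of-the-envelope check of the mutation count in the comment above.
# Every input is the commenter's own assumption, not a measured value.

population = 5e30                        # assumed prokaryotic population of Earth
minutes_per_generation = 30              # assumed mean division time
time_span_minutes = 4e9 * 365 * 24 * 60  # 4 billion years ~= 2.1024e15 minutes
mutation_rate = 0.003                    # assumed mutations per genome per generation

generations = time_span_minutes / minutes_per_generation
total_new_states = population * generations * mutation_rate

print(f"{total_new_states:.4e}")         # -> 1.0512e+42, the figure quoted above
```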
Have you got a reference for that? It’s widely believed that the differences between humans and chimps are mostly due to differences in gene regulation, but it doesn’t follow that the regulatory networks of the human genome are “incredibly more complex”. They are just different for some key genes.
Go to https://uncommondescent.com/news/1177-human-orphan-genes-removed-by-evolutionists-from-databases/#comment-496683
Here is part of the summary:
What shall we say about the genes which make us human? We and chimps share 96% to 99% of our protein coding sequences. Why are we different? Not the 1.5% of our genome that codes for proteins but the 98.5% that controls their production. Literally, no other primate lineage has evolved as fast as our lineage has during the last 1.5 million years, and it's all due to unique changes in our control genome. At least 80%, probably more, of our "non-coding" genome is also transcribed, starting from multiple start points, transcribed in both directions, with overlapping reading frames of many sizes and a whole spectrum of alterations, producing a whole zoo of 'new' types of RNA control elements – piRNA, siRNA, miRNA, sdRNA, xiRNA, moRNA, snoRNA, MYS-RNA, crasiRNA, TEL-sRNA, PARs, and lncRNA. Most of these unique RNA transcripts – and there are thousands, if not millions, of them – are uniquely active in developing human neural tissue – uniquely active compared to their activity in chimpanzees, much less other primates or mammals. It is the new epigenetic world
Here is the link to the review paper: http://www.asa3.org/ASA/meetings/belmont2013/papers/ASA2013Wilcox.pdf
jerry
May 2, 2014 at 02:04 PM PDT
Piotr: This is another post of mine:
Thank you for your thoughtful post. We really need more people with biological experience here! I think that, when we try to "explain" the evolution of functional sequences, like proteins, the mutation rate can be safely approximated in favour of the darwinian theory: the theory will however fail, and without any possible doubt. I will try to be more clear. What really counts here is not the mutation rate itself, but the number of states, or sequence configurations, that can be really achieved by the system in the available time.

When Dembski proposed his famous universal probability bound, he set the threshold very high (500 bits of complexity, about 10^150 possible configurations), so that he could exclude any possible random search in the whole universe, even using all quantum states from the big bang on as bits for the computation. That is really remarkable, because 500 bits is equivalent to the complexity of a 115 AAs sequence (if the target space were made of one single state). Even considering functional redundancy, we are well beyond that threshold in many complex proteins. For example, in Durston's famous paper where he analyzes 35 protein families, 12 protein families have functional complexities beyond this universal probability bound, with the highest functional complexity being 2416 bits (Flu PB2).

But I always felt that Dembski was being too generous here. So some time ago I tried to compute a gross threshold which was more appropriate for a biological system. I considered our planet, with a life span of 5 billion years, as though it had been fully covered by prokaryotes from the beginning of its existence to now, and I tried to compute, grossly, the total number of states which could have been tested by such a system, considering a mean bacterial genome, reproduction time, and a very generous estimation of a global bacterial population on our planet. The result, which can be more or less appropriate, was that 150 bits (10^45) of functional complexity were more than enough to exclude a random generation of a functional sequence in the whole life of our planet. Now, that is even more remarkable, because 150 bits is equivalent to about 35 AAs, and in Durston's paper 29 protein families out of 35 were well beyond that threshold.

Are we still exaggerating in favour of darwinism? Yes, we certainly are. First of all, prokaryotic life certainly did not begin 5 billion years ago (which is even more than earth's real life span). Second, the earth was certainly not fully covered by prokaryotes from the beginning of life. Third, the appearance of new protein families is not restricted to prokaryotes, but goes on in higher beings, up to mammals. And mammals reproduce much more slowly than prokaryotes, and, even more important, there are not as many of them on our planet. Therefore, the number of states that can be reached / tested by mammals, or more generally by metazoa, is much smaller than what can be reached / tested by prokaryotes, whatever the mutation rate. And still, new complex functional protein families which never existed before, and are totally unrelated to what existed before, continued to emerge in metazoa, up to mammals.

So, maybe 150 bits is still too generous as a biological probability bound. After all, both Behe and Axe, starting from bottom-up considerations, tend to fix the threshold of what random variation can achieve at 3-5 AAs (about 13-22 bits). But I feel that I can safely be generous. We win anyway. So, let it be 150 bits, for now :)
More in next post.
gpuccio
May 2, 2014 at 01:45 PM PDT
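The bit-to-amino-acid conversions used in the comment above (500 bits ≈ 115 AAs, 150 bits ≈ 35 AAs) follow from assuming that each position in a fully specified sequence carries log2(20) ≈ 4.32 bits; functional redundancy would lower this. A short Python sketch of that assumption:
```python
import math

BITS_PER_AA = math.log2(20)  # ~4.32 bits per fully specified residue (20 amino acids)

def aa_equivalent(bits: float) -> float:
    """Length in amino acids of a fully specified sequence carrying `bits`."""
    return bits / BITS_PER_AA

print(f"500 bits ~= {aa_equivalent(500):.0f} AAs")  # ~116 (the '115 AAs' above)
print(f"150 bits ~= {aa_equivalent(150):.0f} AAs")  # ~35, as stated
print(f"2^150  ~= {2.0**150:.2e} states")           # ~1.43e+45, the '10^45' above
```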
Piotr:
I suppose you mean Durston’s “fits”, units related to, but not identical with ordinary “bits”. Is there a magic threshold above which complexity is not evolvable? If so, what is it and how do you know?
Yes. As I have said, I have proposed 150 bits as an upper threshold for biological evolution of function on our planet by pure random variation. Here is a recent discussion I had with VJ (if you don't accuse me of "Gish gallop"):
Let's talk a minute about the edge of evolution. IOWs, about how much functional variation RV + NS can really achieve in the real world. Behe has put it above two AAs, in his very good book, considering very good arguments from observable scenarios. Axe, with different experimental considerations, puts it at about 5 AAs. I believe that is probably very true. Dembski, with very general considerations, proposes 500 bits (about 116 AAs) as a universal probability bound. I have proposed 150 bits (about 35 AAs) starting from a model like the one I posted here.

You mention 1:10^70 as the number of folding proteins, from Axe again, I believe. That is probably true. But remember, folding is not all. A protein must not only fold to be visible to NS. It must confer a reproductive advantage in a definite environment. I often quote the following paper: "Experimental Rugged Fitness Landscape in Protein Sequence Space" http://www.plosone.org/article.....ne.0000096 which experiments with a mutated protein in a phage, trying to find the original wild-type sequence which confers full infectivity. Well, here RV and NS are acting at their most, starting from a random library of peptide sequences, in a very favorable context (the function is maintained, even if at low levels, so NS can act just from the beginning). I always quote the final conclusion of the authors:

"The question remains regarding how large a population is required to reach the fitness of the wild-type phage. The relative fitness of the wild-type phage, or rather the native D2 domain, is almost equivalent to the global peak of the fitness landscape. By extrapolation, we estimated that adaptive walking requires a library size of 10^70 with 35 substitutions to reach comparable fitness."

Well, 10^70 is the order of magnitude proposed by Axe for folding proteins, and 35 AAs is my "universal biological bound". Coincidences? Probably, but what it means is that we are on the right track, and that our intuitions about the functional space of proteins are quite right.

Durston has computed functional information for 35 different protein families. Only six of them have functional complexity below 150 bits (my biological bound). All of them are very short peptides (33-55 AAs). Insulin (65 AAs) already has a functional complexity of 156 bits. The biggest protein in the list, Paramyx RNA Pol, has a functional complexity of 1886 bits.

In my simple model, even if only one new unrelated protein of the median functional complexity of 357 bits were found to confer some definite advantage to a new species, the probability of that single event in 4 billion years, in the whole planet, would still be 5.78e-132. IOWs, not a single new protein superfamily of median complexity could ever be generated on our planet by neutral mutations alone. And no functional precursors to superfamilies have ever been shown to exist, so the role of NS in that scenario is nil.
More in next post.
gpuccio
May 2, 2014 at 01:42 PM PDT
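The generic form of the probability argument in the comment above can be sketched as follows: with N random attempts and a target of F fits, the expected number of hits is at most N × 2^(-F). The Python sketch below plugs in the thread's own numbers (N ≈ 1.05 × 10^42, F = 357); it illustrates the order-of-magnitude logic only, since the derivation behind the exact 5.78e-132 figure quoted above is not shown in the thread:
```python
from math import log10

n_attempts = 1.0512e42  # total reachable states, from the earlier comment
fits = 357              # median functional complexity cited from Durston's paper

# Upper bound on the probability of at least one hit: P <= N * 2^(-F).
log10_p = log10(n_attempts) - fits * log10(2)
print(f"P <= 10^{log10_p:.1f}")  # -> P <= 10^-65.4: negligible on this model
```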
The quote was from Doug Axe, page 420 of "The Nature of Nature".
Joe
May 2, 2014 at 01:28 PM PDT
Reality refutes Piotr:
In essence, it appears to be physically implausible for the large protein structures we see in biology to have been built up from tiny ancestral structures in a way that: 1) employed only simple mutation events, and 2) progressed from one well-formed structure to another. Simply put, the reason for this is that folded protein structures consist of discrete multi-residue units in hierarchical arrangements that cannot be built through continuous accretion. The material on the outer surface of an accretive structure, such as a stalagmite, is converted to interior material as successive layers are added. For structures of that kind the distinction between exterior and interior is one of time-dependent geometry rather than of substance. By contrast, the process by which proteins fold involves a substantive distinction between interior and exterior that is evident in the final folded form. Since an evolutionary progression from tiny protein structures to large globular ones would have to repeatedly convert exterior surface to interior interface, this means that any such progression would have to coordinate the addition of appropriate new residues with the simultaneous conversion of existing ones. Considering that these structural additions and conversions would both involve many residues, it seems inescapable that one or the other of the above two conditions would be violated. Furthermore, on top of these conditions is the primary consideration in this section: that of function.
IOW the only way they could have evolved is by design.
Joe
May 2, 2014 at 01:25 PM PDT
Mung:
What’s your evidence for the gradual increase in length over time and increased specificity over time and heightened complexity over time of the older proteins?
There's enough evidence of things like exon/domain duplication via unequal crossing-over, for example. You can easily google up any number of examples. Complex proteins are modular, and they can evolve by multiplying and/or shuffling their modules. Again, actual gene families can be used to illustrate it.
Piotr
May 2, 2014 at 01:17 PM PDT
@Mung: OK, back to your post #71
Are these “young orphan genes” and how would you know? Is it because they are short and don’t code for anything all that complex? Because there are no duplicates of them?
One of the important criteria for the identification of true orphan genes is the presence of a related non-coding sequence in close outgroups: link
Why are numerous short functional orphan genes more probable than a single one that codes for a complex protein?
I'm not sure what you mean by "probable" here. Both simple and complex proteins exist, so there must be possible pathways leading to both. I would say that the accidental origin "from scratch" of a gene encoding a complex functional protein is practically impossible, while the accidental emergence of functionality in a small protein with a random amino acid sequence is possible despite its small probability (see Szostak's experiments with ATP-binding short proteins randomly generated in vitro).
Piotr
May 2, 2014 at 01:03 PM PDT
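For context on the Szostak experiments Piotr cites: Keefe and Szostak (2001) estimated that roughly 1 in 10^11 random 80-residue sequences binds ATP. A minimal sketch of the expectation this implies (the prevalence figure is from that paper; the library size here is an illustrative assumption, not the experiment's):
```python
prevalence = 1e-11    # ~1 ATP binder per 10^11 random 80-mers (Keefe & Szostak 2001)
library_size = 1e12   # hypothetical library size, chosen only for illustration

expected_binders = prevalence * library_size
print(f"Expected ATP binders in the library: {expected_binders:g}")  # -> 10
```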
Nothing, and obviously it happened at the OoL: the intervention event. However, there isn't any intervention now. Organisms have to rely on their built-in responses to environmental cues, most of which occur within the immune system: something else that unguided evolution cannot explain.
A fine statement of young earth creationism. God started it all off then it was hands off, everything else mechanistic.
That has nothing to do with YEC.
Evolution.
ID is not anti-evolution.
Can we call this intelligent PRE-DESIGN so as to distinguish it from intelligent design?
If you can show there is a difference. Good luck with that.
Does part of that pre-design preclude species from eventually evolving into new “kinds”?
Only if such a thing is possible.
Joe
May 2, 2014 at 12:41 PM PDT
Doug Axe, in an essay in "The Nature of Nature", explains why protein length doesn't grow. I'll have another read of it later.
Joe
May 2, 2014 at 12:36 PM PDT