Uncommon Descent Serving The Intelligent Design Community

Interesting proteins: DNA-binding proteins SATB1 and SATB2


With this OP, I am starting a series (I hope) of articles whose purpose is to present interesting proteins which can be of specific relevance to ID theory, for their functional context and evolutionary history.

DNA-binding protein SATB1

SATB1 (accession number Q01826) is a very intriguing molecule. Let’s start with some information we can find at Uniprot, a fundamental protein database, about what is known of its function (in the human form):

Crucial silencing factor contributing to the initiation of X inactivation mediated by Xist RNA that occurs during embryogenesis and in lymphoma

And:

Transcriptional repressor controlling nuclear and viral gene expression in a phosphorylated and acetylated status-dependent manner, by binding to matrix attachment regions (MARs) of DNA and inducing a local chromatin-loop remodeling. Acts as a docking site for several chromatin remodeling enzymes

IOWs, it is an important regulatory protein involved in many different, and not necessarily well understood, processes; it binds to DNA and is involved in chromatin remodeling.

It is also involved in hematopoiesis (especially in T cell development), and has important roles in the biology of some tumors:

Modulates genes that are essential in the maturation of the immune T-cell CD8SP from thymocytes. Required for the switching of fetal globin species, and beta- and gamma-globin genes regulation during erythroid differentiation. Plays a role in chromatin organization and nuclear architecture during apoptosis.

Reprograms chromatin organization and the transcription profiles of breast tumors to promote growth and metastasis.

Keywords for molecular function: Chromatin regulator, DNA-binding, Repressor

Now, some information about the protein itself. I will relate, again, to the human form of the protein:

Length: 763 AAs. It’s a rather big protein, like many important regulatory molecules.

Its subcellular location is in the nucleus.

It is a multi-domain protein, with at least 5 detectable domains and many DNA binding sites.

Evolutionary history of SATB1

Now, let’s see some features of the evolutionary history of this protein in the course of metazoa evolution.

I will use here the same tools that I have developed and presented in my previous OP:

The amazing level of engineering in the transition to the vertebrate proteome: a global analysis

So, I invite all those who are interested in the technical details to refer to that OP.

Here is a graph of the levels of homology to the human protein detectable in other metazoan groups, expressed as mean bitscore per aminoacid site:

 

Fig. 1: Evolutionary history of SATB1 by human-conserved functional information

 

The green line represents the evolutionary history of our protein, while the red dotted line is the reference mean line for the groups considered, as already presented in my previous post quoted above (Fig. 2 of that post).

As everyone can see, this specific protein has a very sudden gain in human-conserved information with the transition from pre-vertebrates to vertebrates. So, it represents a very good example of the information jump that I have tried to quantify globally in my previous post.

Here, the jump is almost 1.5 bits per aminoacid site. What does that mean?

Let’s remember that the protein is 763 AA long. Therefore, an increase of 1.5 bits per aminoacid corresponds to more than 1000 bits of information. To be precise, the jump from the best pre-vertebrate hit to the best hit in cartilaginous fish is:

1049 bits
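The arithmetic here can be checked directly. The minimal Python sketch below uses only figures quoted in the text (best pre-vertebrate hit 154 bits, best cartilaginous fish hit 1203 bits, human length 763 AAs). Dividing the difference by the human length is an assumption about how a per-site value might be obtained, not necessarily how Fig. 1 was computed.

```python
# Figures quoted in the text (human SATB1, NCBI BLASTP with default settings)
HUMAN_SATB1_LENGTH = 763        # amino acids
BEST_PREVERTEBRATE_BITS = 154   # Parasteatoda tepidariorum (spider)
BEST_CARTILAGINOUS_BITS = 1203  # Rhincodon typus (whale shark)


def information_jump(before_bits: float, after_bits: float) -> float:
    """Difference in bitscore between the later and the earlier best hit."""
    return after_bits - before_bits


jump = information_jump(BEST_PREVERTEBRATE_BITS, BEST_CARTILAGINOUS_BITS)
per_site = jump / HUMAN_SATB1_LENGTH          # assumed per-site normalization
rough_estimate = 1.5 * HUMAN_SATB1_LENGTH     # "1.5 bits per site" x length

print(f"jump: {jump} bits")                          # jump: 1049 bits
print(f"per site: {per_site:.2f} bits/AA")           # per site: 1.37 bits/AA
print(f"1.5 bits/AA x 763 AA = {rough_estimate}")    # 1144.5
```

The best-hit ratio (about 1.37 bits per site) is a little lower than the "almost 1.5" read from the graph, so the two should be taken as rough, independent estimates.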

But let’s see more in detail how the jump happens.

I will show here in detail some results of protein blasts. All of them have been obtained using the Blastp software at the NCBI site:

https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

with default settings.

Here is the result of blasting the human protein against all known protein sequences except for vertebrate sequences:

Fig. 2: Results of blasting human SATB1 against all non vertebrate protein sequences

 

As can be seen, we find only low homologies in non vertebrates, and they are essentially restricted to a small part of the molecule, corresponding to the first two domains of the protein, or just to the first domain. The image clearly shows that the rest of the sequence has no significant homologies detectable in non vertebrates (except for a couple of very low homologies for the third domain).

The best hit in non vertebrates is 154 bits with Parasteatoda tepidariorum, a spider. Here it is:

Fig. 3: The best hit in non vertebrates (with a spider)

The upper line (Query) is the human sequence. The bottom line (Sbjct) is the aligned sequence of the spider. In the middle line, letters are identities, “+” characters are similarities (substitutions which are frequently observed in proteins, and are probably quasi-neutral), and empty spaces are less frequent substitutions, those that are more likely to affect protein structure and function if they happen at a functionally important aminoacid site.
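As a small illustration of how the middle line encodes these categories, here is a minimal Python sketch that counts identities and positives from a midline string. The alignment is a hypothetical, hand-made toy, not the real SATB1/spider alignment; the only assumption about the format is the one stated above: letters for identities, "+" for similarities, spaces for everything else (including gaps).

```python
def count_midline(midline: str) -> tuple[int, int]:
    """Return (identities, positives) for a BLAST-style pairwise midline.

    Letters mark identities, '+' marks similarities (positive-scoring
    substitutions), and spaces mark other substitutions or gaps.
    """
    identities = sum(1 for ch in midline if ch.isalpha())
    similarities = midline.count("+")
    return identities, identities + similarities


# Hypothetical toy alignment (hand-made, NOT taken from any real BLAST output)
query = "MKLVEQRAIT"
midline = "MK+VE+ A+T"
sbjct = "MKIVEKWAVT"
assert len(query) == len(midline) == len(sbjct)

identities, positives = count_midline(midline)
length = len(midline)
print(f"Identities = {identities}/{length}, Positives = {positives}/{length}")
# Identities = 6/10, Positives = 9/10
```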

The alignment here is restricted to AAs 71 – 245 (the first two domains), and involves only 177 AAs. Of these, only 78 (44%) are identities and 111 (62%) are positives (identities + similarities). So, in the whole protein we have only 78 identities out of 763 (10.2%).
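The percentages in this paragraph are simple ratios of the counts quoted above. A minimal sketch that recomputes them (the printed values keep one decimal, while the text drops the decimals):

```python
ALIGNED_LENGTH = 177      # alignment length of the spider best hit
IDENTITIES = 78
POSITIVES = 111           # identities + similarities
HUMAN_SATB1_LENGTH = 763


def percent(part: int, whole: int) -> float:
    """Percentage of `part` over `whole`."""
    return 100.0 * part / whole


ident_aln = percent(IDENTITIES, ALIGNED_LENGTH)
pos_aln = percent(POSITIVES, ALIGNED_LENGTH)
ident_full = percent(IDENTITIES, HUMAN_SATB1_LENGTH)

print(f"identities over the aligned region: {ident_aln:.1f}%")   # 44.1%
print(f"positives over the aligned region:  {pos_aln:.1f}%")     # 62.7%
print(f"identities over the whole protein:  {ident_full:.1f}%")  # 10.2%
```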

The spider protein is labeled as “uncharacterized protein”, and that is the case in most of the other non vertebrate hits.

All the other non vertebrate hits, with a couple of exceptions, are well below 100 bits, most of them between 70 and 86 bits.

IOWs, the protein as we know it in vertebrates essentially does not exist in non vertebrates.

Even non vertebrate deuterostomes, which should be the nearest precursors of the first vertebrates, have extremely low homology bitscores with the human protein:

Saccoglossus kowalevskii (hemichordate): 87 bits

Branchiostoma floridae (cephalochordate): 67 bits

The information jump in vertebrates

Now, what happens with the first vertebrates?

The oldest split in jawed vertebrates is the one between cartilaginous fish and bony fish (from which the human lineage derives). Therefore, homologies that are conserved between cartilaginous fish and humans were reasonably already present in the last common ancestor of jawed vertebrates, before the split between cartilaginous fish and bony fish, and have been conserved for about 420 million years.

So, let’s see the best hit between the human protein and cartilaginous fish. It is with Rhincodon typus (whale shark). Here it is:

 

Fig. 4: The best hit of human SATB1 in cartilaginous fish (with the whale shark)

 

Here, the alignment involves practically the whole molecule (756 AAs), and we have 1203 bits of homology, 603 identities (79%), 659 positives (86%).

IOWs, the two molecules are almost identical. And the homology is extremely high not only in the domain parts, but also in the rest of the protein sequence.

Now, the evolutionary time between pre-vertebrates and the first split in vertebrates is certainly rather small, a few million years, or at most 20 – 30 million years. Not a big chronological window at all, in evolutionary terms.

However, in that window, this protein appears almost complete. 603 aminoacids are already identical to those that will remain in the human form of the protein, and only 78 of them were detectable in the best hit before the appearance of vertebrates.

1049 bits of new, original functional information. In such a short evolutionary window.

Functionality

Why functional? Because those 603 aminoacids have remained the same through more than 400 million years of evolution. They have escaped neutral or quasi-neutral variation, which would certainly have completely transformed the sequence over such a long evolutionary time if those aminoacid sites were not under extreme functional constraint and purifying (negative) selection.

Now, I say that this fact cannot in any way be explained by any neo-darwinian model. Absolutely not.

Moreover, there is absolutely no evidence in the available proteome of any intermediate form, of any gradual development of the functional sequence that will be conserved up to humans (except, of course, for the 50 – 78 AAs which are already detectable in the first two domains in many pre-vertebrates).

By the way, Callorhinchus milii, the elephant shark, has almost identical values of homology:

1184 bits, 599 identities, 654 positives

But, how important is this protein?

In the ExAC database, a database of variations in the human genome, missense mutations are 110 out of 260.3 expected, with a z score of 4.56, an extremely high measure of functional constraint.
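The z score of 4.56 comes from ExAC's own statistical model and is not recomputed here. The raw counts, however, can be restated as an observed/expected ratio; a minimal sketch:

```python
OBSERVED_MISSENSE = 110     # ExAC observed missense variants in SATB1 (from the text)
EXPECTED_MISSENSE = 260.3   # ExAC expected count (from the text)

oe_ratio = OBSERVED_MISSENSE / EXPECTED_MISSENSE
depletion = 1.0 - oe_ratio

print(f"observed/expected: {oe_ratio:.2f}")                 # 0.42
print(f"depletion of missense variants: {depletion:.0%}")   # 58%
```

In other words, only about 42% of the missense variation expected under the model is actually observed in the human population.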

The recent medical literature has a lot of articles about the important role of SATB1 in at least two big fields:

  • T cell development
  • Tumor development (many different kinds of tumors)

If we want to sum up in a few words what is known, we could say that SATB1 is considered a master regulator, essentially a complex transcription repressor, involved mainly (but not only) in the development of the immune system, in particular T cells. Dysregulation of this protein is linked to many aspects of tumor invasiveness (especially metastases). The protein seems to act, among other possibilities, as a global organizer of chromatin states.

Here is a very brief recent bibliography:

Essential Roles of SATB1 in Specifying T Lymphocyte Subsets

SATB1 overexpression correlates with gastrointestinal neoplasms invasion and metastasis: a meta-analysis for Chinese population

SATB1-mediated Functional Packaging of Chromatin into Loops

DNA-binding protein SATB2

But there is more. There is another protein which is very similar to SATB1. It is called DNA-binding protein SATB2 (accession number Q9UPW6).

Its length is very similar to SATB1: 733 AAs.

Uniprot describes its function as follows:

Binds to DNA, at nuclear matrix- or scaffold-associated regions. Thought to recognize the sugar-phosphate structure of double-stranded DNA. Transcription factor controlling nuclear gene expression, by binding to matrix attachment regions (MARs) of DNA and inducing a local chromatin-loop remodeling. Acts as a docking site for several chromatin remodeling enzymes

Which is very similar to SATB1. But now come the differences. While SATB1 is involved mainly in T cell development and tumor development, SATB2 is:

Required for the initiation of the upper-layer neurons (UL1) specific genetic program and for the inactivation of deep-layer neurons (DL) and UL2 specific genes, probably by modulating BCL11B expression. Repressor of Ctip2 and regulatory determinant of corticocortical connections in the developing cerebral cortex. May play an important role in palate formation. Acts as a molecular node in a transcriptional network regulating skeletal development and osteoblast differentiation

So, similar proteins with rather different specificities. While SATB1 is mainly connected to adaptive immunity (T cell development), SATB2 seems to be more linked to neuronal development. Like SATB1, it is involved in cancer development, although usually in different types of cancer.

Here is a brief recent bibliography about SATB2:

Mutual regulation between Satb2 and Fezf2 promotes subcerebral projection neuron identity in the developing cerebral cortex

SATB1 and SATB2 play opposing roles in c-Myc expression and progression of colorectal cancer

However, how similar is SATB2 to SATB1 in terms of sequence homology?

Here is a direct blast of the two human molecules:

 

Fig. 5: Blast of human SATB1 vs human SATB2

 

OK, they are very similar, but… only 460 identities, 550 positives, 854 bits. IOWs, these two human proteins are similar, but not as similar as the two sequences of SATB1 in the shark and in humans.

Now, here is the evolutionary history of SATB2:

 

Fig. 6: Evolutionary history of SATB2 by human-conserved functional information

 

As everyone can see, it is almost identical to the evolutionary history of SATB1. To see it even better, Fig. 7 shows the two evolutionary histories together (the green line is SATB1, the brown line is SATB2):

 

Fig. 7: Evolutionary history of SATB1 and SATB2 by human-conserved functional information

 

In particular, pre-vertebrate history and the jump in cartilaginous fish are practically identical. And yet these are two different molecules, as we have seen, with different specificities and with about one third of their sequence not identical.

Now, let’s blast human SATB2 against cartilaginous fish. Again the best hit is with the whale shark:

 

Fig. 8: The best hit of human SATB2 in cartilaginous fish (with the whale shark)

 

And the numbers are very similar, incredibly similar I would say, to those we found for SATB1:

1197 bits, 592 identities, 662 positives.

But what if we blast SATB1 of the whale shark against SATB2 of the whale shark?

Here are the results:

 

Fig. 9: Blast of whale shark SATB1 vs whale shark SATB2

Now, please, compare the numbers we got here with those from the similar blast between the two proteins in humans:

SATB1 human vs SATB2 human: 460 identities, 550 positives, 854 bits

SATB1 shark vs SATB2 shark: 468 identities, 556 positives, 856 bits

Almost exactly the same numbers! Wow!

What does that mean?

It means that this system of two similar proteins with different function arises in vertebrates as a whole system, already complete, with the two components already differentiated, and is conserved almost identical up to humans. Indeed, SATB1 and SATB2 have the same degree of homology both in sharks and in humans, and the two SATB1 proteins in shark and humans, as well as the two SATB2 proteins in shark and humans, have greater similarity, after more than 400 million years of divergence, than SATB1 and SATB2 show when compared, both in sharks and in humans.

Would you describe that as sudden appearance of huge amounts of functional information, followed by an extremely long stasis? I certainly would!

The following table sums up these results:

Sequence 1 | Sequence 2 | Bitscore
SATB1 Human | SATB2 Human | 854
SATB1 Shark | SATB2 Shark | 856
SATB1 Human | SATB1 Shark | 1203
SATB2 Human | SATB2 Shark | 1197

IOWs, the whole system appeared practically as it is today, before the split of cartilaginous fish and bony fish, and has retained its essential form up to now.

So, the total amount of new functional information implied by the whole system of these two proteins is about 1545 bits (considering 855 bits of common information, and 345 bits x 2 of specific information in each molecule).
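The 855 and 345 bit figures can be reproduced from the table above, assuming (the text does not spell this out) that the common part is the mean of the two within-species SATB1/SATB2 scores, and that the specific part of each protein is the mean human/shark score minus that common part. A minimal Python sketch under that assumption:

```python
from statistics import mean

# Bitscores from the table (NCBI BLASTP, default settings)
paralog_bits = {"human": 854, "shark": 856}      # SATB1 vs SATB2 within one species
ortholog_bits = {"SATB1": 1203, "SATB2": 1197}   # human vs whale shark, same protein

# Every cross-species score for the same protein exceeds every within-species SATB1/SATB2 score
assert min(ortholog_bits.values()) > max(paralog_bits.values())

common = mean(paralog_bits.values())                          # assumed shared part
specific_per_protein = mean(ortholog_bits.values()) - common  # assumed specific part
total = common + 2 * specific_per_protein

print(f"common: {common:.0f} bits")                               # 855
print(f"specific per protein: {specific_per_protein:.0f} bits")   # 345
print(f"total: {total:.0f} bits")                                 # 1545
```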

An amazing amount, for a system of just two molecules, considering that 500 bits is Dembski’s Universal Probability Bound!
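Dembski's bound is usually stated as 1 chance in 10^150, which is where the rounded 500-bit figure comes from. Treating bits as base-2 logarithms, as the text does, a minimal sketch of the conversion and of how the 1545-bit estimate compares with the bound:

```python
import math

UPB_ODDS = 10**150       # Dembski's universal probability bound: 1 in 10^150
SYSTEM_BITS = 1545       # SATB1 + SATB2 estimate given in the text
THRESHOLD_BITS = 500     # rounded bound used in the text

upb_bits = math.log2(UPB_ODDS)                   # 150 * log2(10)
system_exponent = SYSTEM_BITS * math.log10(2)    # 2^1545 as a power of 10

print(f"10^150 corresponds to {upb_bits:.1f} bits")                      # 498.3 bits
print(f"2^{SYSTEM_BITS} is about 10^{system_exponent:.0f}")              # 10^465
print(f"ratio to the 500-bit bound: {SYSTEM_BITS / THRESHOLD_BITS:.2f}") # 3.09
```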

Let’s remember that in my previous post, quoted above, I showed that the informational jump from pre-vertebrates to vertebrates is more than 1.7 million bits. That’s a very big number, but big numbers are sometimes hard to digest. So, I believe that seeing that just two important molecules can contribute about 1500 bits can help us understand what we are really seeing here.

Moreover, it’s certainly no coincidence that those two molecules seem to be fundamental in two very particular fields:

a) The adaptive immune system

b) The nervous system

if we consider that those are exactly the two most relevant developments in vertebrates.

And, as a final note, please consider that these are very complex master regulators, which interact with tens of other complex proteins to effect their functions. The whole system is certainly much more irreducibly complex than we can imagine.

But still, just the analysis of these two sister proteins is more than enough to demonstrate that the neo-darwinian paradigm is completely inappropriate to explain what we can see in the proteome and in its natural history. And this is only one example among thousands.

So, I want to conclude by repeating once more this strong and very convinced statement:

The observed facts described here cannot in any way be explained by any neo-darwinian model. Absolutely not. They are extremely strong evidence for a design inference.

Comments
gpuccio @271:
"That can work in theory, but it cannot explain the observed differences between similar proteins, and they rather regular relationship with evolutionary time distance.
Assuming it's "their" instead of "they", are the observed differences between similar proteins the result of accumulated mutations through biological history? Are those differences functional or just structural? Does "common design" relate to the name of this website? Or is that a separate issue? All the years working on a software development project for engineering design applications, and lately trying to understand how biological systems function, have not left much spare time to look at this part of the discussion on how we got the biological systems to begin with. I've been looking at the terminology often seen here (uncommon/common descent, etc.) with a "huh?", but kind of letting it go by, thinking that I could come back to clarify it later. Perhaps it's time for me to start looking into this too. :)
Dionisio
July 29, 2017, 02:54 AM PDT
gpuccio @270:
But, of course, there is no difficulty for RV + NS, in the RodW version, to generate 731 specific AA sites and 1300+ bits of functional information, with population size and mutation rate and generation time of the first chordates (whatever they were), on a timescale of, at most, 30 million years (but probably much less).
Well, perhaps with some imagination one could accept such a scenario, don't you think? :) See, Harry Potter is widely accepted out there. :) I personally prefer Cinderella's story, where a pumpkin turns into an elegant carriage, mice become beautiful horses and a grasshopper is hired as their coachman. At least those things make sense, don't they? :)
Dionisio
July 29, 2017, 02:33 AM PDT
gpuccio @258: "...either a cryptic existing ORF becomes transcribed and translated..." In such a case, isn't the information already present* in the cryptic existing ORF before it becomes a gene? (*) or stored in or associated with
Dionisio
July 29, 2017, 02:23 AM PDT
gpuccio's discussion with RodW seems interesting. RodW should get credit for that too. :) Note how gpuccio dissects the text and comments on separate parts, making sure nothing is left out. Not many folks here have such a detail-oriented commenting style (as far as I see). It would help if others could take note and try to imitate such an exegetical approach.
Dionisio
July 29, 2017, 02:20 AM PDT
Dionisio: Common design would be the theory that functional similarities can be explained by functional constraints, and do not imply common descent. IOWs, the designer designs it from scratch each time, but has to use the same solutions because they are the required solutions, or the best. That can work in theory, but it cannot explain the observed differences between similar proteins, and their rather regular relationship with evolutionary time distance. I refer in particular to the divergence of synonymous sites, measured by the Ks, which remains IMO the best (not the only) argument for physical common descent. Moreover, I must say that explaining all sequence similarities in proteins by functional constraints, even when a short evolutionary time separates the species (see for example chimps and humans), seems a little far-fetched.
gpuccio
July 29, 2017, 01:51 AM PDT
ET: "For population sizes and mutation rates appropriate for Drosophila, a pair of mutations can switch off one transcription factor binding site and activate another on a timescale of several million years, even when we make the conservative assumption that the second mutation is neutral." Thank you for the interesting article. But, of course, there is no difficulty for RV + NS, in the RodW version, to generate 731 specific AA sites and 1300+ bits of functional information, with population size and mutation rate and generation time of the first chordates (whatever they were), on a timescale of, at most, 30 million years (but probably much less). And, definitely, I am not making the conservative assumption that any relevant part of those mutations can be neutral. You say: "I understand what RodW is saying and he is being honest, at least." I agree. And he is not condescending like wd400, at least (well, not too much!) :)
gpuccio
July 29, 2017, 01:44 AM PDT
gpuccio @251: On pages 6 & 7 of the paper referenced through the link @262 they talk about rhodopsin proteins within the context of quantum physics.
Dionisio
July 29, 2017, 01:42 AM PDT
gpuccio, @256 you mentioned 'common design'. What is that in biology?
Dionisio
July 28, 2017, 07:46 PM PDT
gpuccio @251: Very interesting comment on the proteins Rhodopsin and Long-wave-sensitive opsin 1. Thanks. BTW, are they also in the VIP club along with SATB1 and SATB2? :)
Dionisio
July 28, 2017, 07:32 PM PDT
ET,
What he fails to realize is that his position has nothing but luck.
OK, but why does he fail to realize that? Is it because it's too difficult to understand? Or because it hasn't been explained well? Or a language issue?
Dionisio
July 28, 2017, 07:20 PM PDT
Dionisio: I understand what RodW is saying and he is being honest, at least. Evolutionism does demand that chance occurrences, i.e. luck, produce the DNA, RNA and proteins (he forgot the DNA) and that, unless the organism(s) are less fit during the build-up, natural selection will preserve whatever is there. It's just that the sheer amount of luck involved makes it not only inconceivable but also as close to impossible as something can get. Even Richard Dawkins admits that science can allow for only so much luck. What he fails to realize is that his position has nothing but luck.
ET
July 28, 2017, 06:02 PM PDT
ET, Sorry to disappoint you with this news, but no matter how hard you may try, some folks won't understand what you mean, apparently because they simply don't want to.
Dionisio
July 28, 2017, 05:52 PM PDT
RodW:
Yes. Its luck that produces the RNA and protein. Presumably this happens often over evolutionary time but the vast majority have no function. When one by chance does have a function NS acts on it to change the sequence relatively fast. This is what the evidence shows.
What evidence shows that? The following paper pretty much squashes that scenario: Waiting for Two Mutations: With Applications to Regulatory Sequence Evolution and the Limits of Darwinian Evolution:
For population sizes and mutation rates appropriate for Drosophila, a pair of mutations can switch off one transcription factor binding site and activate another on a timescale of several million years, even when we make the conservative assumption that the second mutation is neutral.
And that is just for binding sites!
ET
July 28, 2017, 05:15 PM PDT
gpuccio: When you are ready to take a break from the "heated" discussions you're having here, you might want to take a look at this: https://uncommondescent.com/intelligent-design/neuroscience-tries-to-be-physics-asks-is-matter-conscious/#comment-636746
Dionisio
July 28, 2017, 04:44 PM PDT
RodW:
All DNA sequences potentially code for protein.
Like the footprints in the sand of birds on the beach can potentially produce a love poem. :)
Mung
July 28, 2017, 02:47 PM PDT
My dear gpuccio, You just don't understand how evolution works. I'll explain it to you once I figure it out.
Mung
July 28, 2017, 02:42 PM PDT
Perhaps the comment @255 could serve as an illustration of what the term "incoherence" stands for. :)
Dionisio
July 28, 2017, 02:21 PM PDT
RodW: We agree on some things, but you misunderstand others. "Transcription is already happening in 80% of the genome, even non-functional parts." I never said anything different. Transcription happens, as ENCODE teaches. Whether it is functional or not is all to be decided. However, transcripts of non coding DNA are usually not translated. My point is that a new gene must be both transcribed and translated. "No. An ORF is any sequence that doesn’t contain a STOP codon..." The prevailing theories about the origin of new genes are two, as I say in my post: either a cryptic existing ORF becomes transcribed and translated, or some non coding sequence acquires an ORF by mutation. You seem to privilege the first scenario, but evidence exists also for the second. Of course, both things could happen. "Yes. Its luck that produces the RNA and protein. Presumably this happens often over evolutionary time but the vast majority have no function. When one by chance does have a function NS acts on it to change the sequence relatively fast. This is what the evidence shows." No. If you look at one of the papers I reference, it shows that new genes have specific properties from the very beginning of their appearance, and that cannot be explained by NS. So, that's not what the evidence shows. "Right, NS begins with the emergence of a new protein that has some level of function. NS improves that function." What evidence do you have that most new genes are random sequences, and that some instead have some naturally selectable function? Not much is known about the function of new genes. In no way has it been shown that some basic function emerges randomly with reasonable probability and can lead by NS to a new complex function in a new protein. "No, see above" No what? You agreed that NS cannot act until a naturally selectable function emerges. 
My point is that none of the classic imagined neo-darwinian concepts, such as a) gradual evolution of function, b) recombination, or c) starting near a functional island and profiting from imaginary connections between functional islands, can apply to the emergence of new genes from non coding DNA. If they apply at all, they apply only after, and only if, a naturally selectable function appears by mere chance. Do you agree on that? I said: Here we are starting from a completely random non coding sequence, according to the neo-darwinist theory. Until it becomes transcribed and translated, there is no reason at all that it may have any information about any protein sequence. None at all. You comment: "No. All DNA sequences potentially code for protein." You don't understand my point. If a non coding DNA sequence suddenly becomes translated, while it was never translated before (IOWs, a new gene), there is no specific information in the DNA sequence about any protein function. IOWs, the codons that emerge as the new translation of a sequence that had never coded for a protein cannot of course have any special relationship with protein space. IOWs, they cannot be the result of any process of: a) gradual evolution of function, b) recombination, c) starting near a functional island and profiting from imaginary connections between functional islands, and so on. As I said before. Instead, those processes have usually been invoked for the emergence of new genes from gene duplication, because in that case the original sequence that undergoes transformation was a protein coding sequence. In that sense, origin from non coding DNA is a much worse scenario than origin from gene duplication, from a neo-darwinian point of view (even if some different specific problems arise also in the gene duplication scenario). Do you understand my point now? "Just about everything in living things evolve from slight modifications of previous things. 
De novo genes shocked most scientists. 10 years ago I would have said its absolutely impossible." It is still impossible, without design. You have just become acquainted with the idea. "There was no ‘preparation’ of the sequence. Randome sequences that are transcribed and translated can occasionally produce functional proteins." No, again you did not read well what I wrote. I will repeat it: The second paper I quoted above seems to demonstrate that new genes, when they emerge, already have specific properties that “normal” non coding DNA does not have. IOWs, the long preparation of the sequence through apparently neutral variation, before any possible intervention of NS, seems to generate specific functional properties before the sequence is translated into a protein. How do you think that can happen? That paper finds those specific properties in the vast majority of new genes. Those properties are not found in non coding DNA. So, obviously, there is a preparation through mutations before the new genes emerge, and NS can have no role in that. How do you explain it? Moreover, if the scenario you suggest were true, cells should be replete with non functional, random new genes, out of which only once in a while (if ever) a functional, naturally selectable gene could emerge. But that is not the case, of course. "To summarize: because RNA polymerase initiates transcription by a stochastic mechanism there will always be a large amount of non-functional transcription. Those transcripts will occasionally produce proteins at a low rate and some small fraction of those will by luck have selectable function. All of this has already been demonstrated. ( I’m pretty sure nonfunctional peptides have been detected)" You are sure of too many things. None of that has been "demonstrated". Of course there is a lot of transcription that is not destined for translation, but that was not at all expected; it was shown by ENCODE and by the powerful new molecular techniques of the last few years. 
That a large amount of non translated transcription is non functional is at the center of scientific debate. It is not demonstrated at all, indeed there is growing evidence that non translated transcription is essential for almost all cell functions. "Those transcripts will occasionally produce proteins at a low rate and some small fraction of those will by luck have selectable function." Demonstrated? Where? "I’m pretty sure nonfunctional peptides have been detected" It's very difficult to demonstrate that something is not functional. I can agree that non functional peptides can exist, but then? What has that to do with all our discussions here? "The evidence that they produce functional proteins is that the de novo genes have non-functional counterparts in closely related species." In the papers I have read, they have non translated counterparts, not non functional counterparts. That has been shown in primates. Can you give examples of what you say? "They are much shorter than the average protein" That is true. "and they have a higher than neutral mutation rate of the type that indicates positive selection." Examples, please? "And most important…. they contain mostly intrinsically disordered regions. In other words they don’t contain the folded domains that we find in most older proteins." That new genes are different, for some aspects, from older genes is true. I don't see how that helps your theory, however. "In other words they don’t contain the folded domains that we find in most older proteins." That is not necessarily true, even if it can be true in part. The fact that we don't recognize domains in new genes is in part because new genes are categorized as such exactly because they don't have recognizable homologies. I am not sure that we have enough structural information about new genes to exclude that they contain folded parts which could be, in the future, recognized as new domains. 
"Doug Axe’s entire argument is that protein domains can’t evolve at random and protein domains are required for function." I cannot speak for Axe, but I think that his point is that functional domains cannot evolve at random. Which is true. That protein domains are required for many functions is certainly true, but it is certainly not true that only domains have function. We have the example of intrinsically disordered proteins (the old ones, not new genes) that are functional, and all my reasoning in this OP is about the idea that interdomain sequences are functional. It was you, if I remember correctly, who said otherwise. Moreover, my point is that even new genes, with or without domains, if they are functional, cannot arise at random. Unfortunately, not much is known about the functions of new genes. I think we have to wait some more time to understand better. Another important point is that the emergence of new domains has certainly been slowing down in the course of natural history. Almost half of existing superfamilies were already present in LUCA. Does that mean that no new information has been generated, say, between chimp and humans? Of course not. The simple fact is, most new information in the most "recent" part of natural history is probably regulatory, and I believe that we still don't understand it. "Not only that but they show that the stability of de novo genes can gradually improve by natural selection" Examples, please. It is really strange to invoke NS for genes whose function we don't know. "Even YEC’s agree that NS can improve the function of biological entities." Well, I don't agree. Certainly not when it comes to generating new complex functional information. The few examples of NS we know are about very trivial traits. But maybe I cannot understand your point because I am not a YEC. 
:) "This topic, and the papers gpuccio has posted show that it's much easier to get functional proteins from scratch – without the need for intelligent intervention- than anyone had imagined." ??????? No comment!
gpuccio
July 28, 2017 at 02:06 PM PDT
gpuccio #149
Well, for a new gene to arise from a non coding sequence, at least two things must take place: a) Transcription of the sequence must take place b) An ORF must be acquired, so that the transcribed sequence may be translated and become a gene Nobody knows exactly in what order the two things may happen, in the limited understanding that we have today of the issue. One hypothesis is that the non coding sequence is already transcribed, then acquires an ORF and is translated. Another hypothesis is that many ORFs exist but are not transcribed (“cryptic ORFs”), and at some point one of them is transcribed.
a) Transcription is already happening in 80% of the genome, even in non-functional parts. One of the papers in the refs for the second paper (by Tautz) calculates how long it takes before every region is covered by transcription. b) No. An ORF is any stretch of sequence, in a given reading frame, that doesn't contain a STOP codon. A random sequence of DNA will have a length distribution of ORFs around an average (IIRC ~150 bases), so the genome contains lots of them. Very long ORFs are relatively rare, and the fact that de novo genes tend to be much shorter than older genes is supporting evidence that they really do arise this way.
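The claim that random DNA is full of short ORFs and poor in long ones can be checked with a toy model. The sketch below is a minimal illustration, assuming every position is an independent, equally likely A, C, G or T (real genomes are not like this), and treating an ORF loosely, as in the comment above, as a stop-free stretch in one reading frame. Under those assumptions each codon is a stop (TAA, TAG or TGA) with probability 3/64, so stop-free run lengths follow a geometric distribution. The function names are hypothetical, written only for this example.

```python
import random
from statistics import mean

STOP_CODONS = {"TAA", "TAG", "TGA"}
P_STOP = 3 / 64  # chance a random codon is a stop, assuming all 64 codons are equally likely


def stop_free_runs(seq: str, frame: int = 0) -> list[int]:
    """Return the lengths (in codons) of stop-free stretches in one reading frame.

    Each stretch is closed by a stop codon; a trailing stretch with no closing stop is dropped.
    """
    runs = []
    current = 0
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            runs.append(current)
            current = 0
        else:
            current += 1
    return runs


def expected_run_codons(p_stop: float = P_STOP) -> float:
    """Mean number of sense codons before the next stop (geometric distribution)."""
    return (1 - p_stop) / p_stop


def prob_run_at_least(n_codons: int, p_stop: float = P_STOP) -> float:
    """Probability that the next n_codons codons contain no stop at all."""
    return (1 - p_stop) ** n_codons


if __name__ == "__main__":
    random.seed(1)
    seq = "".join(random.choice("ACGT") for _ in range(300_000))
    runs = stop_free_runs(seq)
    print(f"simulated mean: {mean(runs):.1f} codons")
    print(f"expected mean:  {expected_run_codons():.1f} codons")
    print(f"P(>= 100 codons without a stop): {prob_run_at_least(100):.4f}")
```

Under this idealized model the average stop-free stretch is about 20 codons (roughly 60 bases), and a stretch of 100 codons or more turns up in less than 1% of cases. Real genomes have skewed base composition, so this is only an order-of-magnitude check on the shape of the distribution, not a figure for any actual genome.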
Whatever. In both cases, one thing is for certain: unless and until the sequence is both transcribed and translated, no form of NS can take place for its function as a protein sequence, because no protein exists.
Yes. Translation could occur from the first MET in the RNA or, at low efficiency, from many other (non-AUG) start codons.
IOWs, if a new gene emerges from a non coding sequence, whatever the modality, its “evolution” is completely neutral up to the moment it is both transcribed and translated. IOWs, up to the moment it becomes a new gene.
Yes. It's luck that produces the RNA and protein. Presumably this happens often over evolutionary time, but the vast majority have no function. When one by chance does have a function, NS acts on it to change the sequence relatively fast. This is what the evidence shows.
That means that the emergence of a new gene, in the absence of any design intervention, is left purely to random variation. NS can have no role at all.
Right, NS begins with the emergence of a new protein that has some level of function. NS improves that function.
Of course, darwinists imagine that after the emergence of the new gene, NS comes to the rescue, and operates its miracles. OK, faith has no limits. But remember, all the classic reasonings of neo-darwinism: a) Gradual evolution of function b) Recombination c) Starting near a functional island, and profiting from imaginary connections between functional islands and so on, here do not apply.
No, see above
Here we are starting from a completely random non coding sequence, according to the neo-darwinist theory. Until it becomes transcribed and translated, there is no reason at all that it may have any information about any protein sequence. None at all.
No. All DNA sequences potentially code for protein.
So, the imagined NS should operate on a completely random sequence, when the sequence is already transcribed and translated. Not good for the neo-darwinian scenario. Not good at all.
Just about everything in living things evolves from slight modifications of previous things. De novo genes shocked most scientists. 10 years ago I would have said it's absolutely impossible.
Just a few final remarks: 2) The second paper I quoted above seems to demonstrate that new genes, when they emerge, already have specific properties that “normal” non coding DNA does not have. IOWs, the long preparation of the sequence through apparently neutral variation, before any possible intervention of NS, seems to generate specific functional properties before the sequence is translated into a protein. How do you think that can happen?
There was no 'preparation' of the sequence. Random sequences that are transcribed and translated can occasionally produce functional proteins.
3) So much for the idea that non coding DNA is junk: it is a source of new genes, the clue to all functional evolution.
No one ever thought that non-coding = junk. In the 1950s scientists knew about regulatory sites. But there is nonfunctional DNA that can give rise to new genes. This is what the 2 papers you posted, and the references therein, show.

To summarize: because RNA polymerase initiates transcription by a stochastic mechanism there will always be a large amount of non-functional transcription. Those transcripts will occasionally produce proteins at a low rate and some small fraction of those will by luck have selectable function. All of this has already been demonstrated. (I'm pretty sure nonfunctional peptides have been detected.)

The evidence that they produce functional proteins is that the de novo genes have non-functional counterparts in closely related species. They are much shorter than the average protein (ORFs tend to be short) and they have a higher than neutral mutation rate of the type that indicates positive selection. And most important... they contain mostly intrinsically disordered regions. In other words they don't contain the folded domains that we find in most older proteins.

This last point should ring a bell with everyone here, and not just because of gpuccio's post on SATB. Doug Axe's entire argument is that protein domains can't evolve at random and protein domains are required for function. But de novo genes show that domains ARE NOT required for function. Not only that, but they show that the stability of de novo genes can gradually improve by natural selection (although I don't know if anyone has suggested that new domains have formed). Even YEC's agree that NS can improve the function of biological entities.

The difficulty in appreciating and accepting evolution for everyone (IDers AND scientists) has always been: how do you get the new structure, new body part, new protein etc. from scratch? 
This topic, and the papers gpuccio has posted, show that it's much easier to get functional proteins from scratch, without the need for intelligent intervention, than anyone had imagined.
RodW
July 28, 2017 at 12:45 PM PDT
RodW: I am not sure that I understand your comments at #255. Of course ID and neo-darwinian theory are alternative explanations of facts: the facts involved being, of course, the functional information we see in biological beings, and the problem of how to explain its origin. "Alternative" means that they cannot both be true. Either biological information evolved through RV and NS, or it was designed. So, I certainly agree with you: the evidence must clearly be on one side. And I have no doubts on which side it is! :) You say: "To follow up on what wd400 said I’d say that when one looks very superficially, ID and Evo might seem compatible, and arguments about ‘common design’ might seem reasonable, but when one looks in detail that’s not the case." I have never thought that ID and neo-darwinian evolution are compatible. So, it seems that I agree with you, and you with me. "and arguments about ‘common design’ might seem reasonable," I have never, never made arguments about "common design". If you read my OP and my posts carefully, you will see that my argument is about common descent through design. I don't believe that common design can really explain what we observe in biological information. I have defended that idea many times. I think that common descent cannot be reasonably denied, with what we know. My main argument about that is the evidence from progressive divergence of Ks (IOWs, the rather regular rate of neutral variation at synonymous, largely neutral, sites). I have debated a lot, here, with other IDists who do not accept common descent. But I certainly don't believe that common descent and RV + NS can explain the appearance of complex functional information. I believe that ID theory proves that only common descent + design can explain it. Finally, you say: "YEC say that Noah’s flood explains the pattern of the fossil record. But if you know anything at all about the fossil record you can see that’s absurd." I agree. So what? What has that to do with ID?
gpuccio
July 28, 2017 at 12:24 PM PDT
gpuccio: OK, I'm working my way through them. #144. You suggest that both ID and evolution are based on alternate interpretations of the facts. I don't think this could be the case. It's not like the evidence from fossils, anatomy, molecular bio, genetics etc. could point to either ID or Evo and it's a kind of glass-is-half-empty/full choice that we all make based on our preference. The evidence must clearly be on one side. To follow up on what wd400 said, I'd say that when one looks very superficially, ID and Evo might seem compatible, and arguments about 'common design' might seem reasonable, but when one looks in detail that's not the case. YEC say that Noah's flood explains the pattern of the fossil record. But if you know anything at all about the fossil record you can see that's absurd.
RodW
July 28, 2017 at 11:44 AM PDT
RodW: You could look at wd400's arguments in post #133 and at my answers in posts #144, 149, 157, 159, 171 and 213.
gpuccio
July 28, 2017 at 10:55 AM PDT
OK, well, is there any point in posting here now? If so, direct me to some posts and I'll weigh in.
RodW
July 28, 2017 at 10:23 AM PDT
Thanks, gpuccio. I am going to have to learn how to do this kind of research myself. Perhaps you should hand out homework assignments. Ha!
Mung
July 27, 2017 at 07:44 PM PDT
Mung:

More on our two opsins (Rhodopsin and Long-wave-sensitive opsin 1). The scenario here is similar to what we have observed for SATB1 and SATB2. Let's see why.

Each of the two proteins is extremely conserved from sharks to humans, and exhibits a big jump in vertebrates (IOW, most of the conserved information first appears in sharks).

a) Rhodopsin (348 AAs)
Total homology between shark (Scyliorhinus canicula) and humans: 608 bits; 288 identities; 321 positives
Jump in vertebrates: 380 bits; 165 identities

b) Long-wave-sensitive opsin (364 AAs)
Total homology between shark (Callorhinchus milii) and humans: 573 bits; 264 identities; 305 positives
Jump in vertebrates: 324 bits; 118 identities

So, there is no doubt that the two molecules are extremely conserved (83% and 73% identity between sharks and humans). There is also no doubt that both exhibit a very big informational jump at the beginning of vertebrate history (380 + 324 bits).

Now, let's compare the two human molecules. Rhodopsin and LWS opsin in humans share: 281 bits; 149 identities; 211 positives.

Now, if we consider that the two molecules share a very similar basic identity with the opsins present in non vertebrates, we can see that the best hits in non vertebrate deuterostomes are:
228 bits and 126 identities for Rhodopsin
249 bits and 146 identities for LWS opsin

So, we can say that the two molecules, in humans, share essentially the basic structure and sequence of opsins that were already present in non vertebrates, but are completely different for the part of information which is specific to vertebrates.

And, if we compare the two corresponding molecules in sharks (Rhodopsin and Red-sensitive opsin-like protein), again we find a similar result: 275 bits; 142 identities; 217 positives. That is incredibly similar to what we had found comparing the two human molecules: 281 bits; 149 identities; 211 positives. This is not a coincidence. 
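The identity percentages and the combined jump quoted above follow from simple arithmetic on the BLAST figures. This is a minimal sketch, assuming (as the quoted percentages appear to do) that identities are divided by the full length of the human protein rather than by the alignment length; the dictionary layout and the function name are hypothetical, just for illustration.

```python
def percent_identity(identities: int, length: int) -> float:
    """Identities as a percentage of the full protein length (not the alignment length)."""
    return 100 * identities / length


# Shark-to-human figures quoted above: human protein length in AAs,
# identical positions, and the BLAST bit-score jump in vertebrates.
opsins = {
    "Rhodopsin": {"length": 348, "identities": 288, "jump_bits": 380},
    "LWS opsin": {"length": 364, "identities": 264, "jump_bits": 324},
}

for name, d in opsins.items():
    print(f"{name}: {percent_identity(d['identities'], d['length']):.1f}% identical")

total_jump = sum(d["jump_bits"] for d in opsins.values())
print(f"Combined vertebrate jump: {total_jump} bits")
```

This prints 82.8% for Rhodopsin and 72.5% for LWS opsin, which round to the 83% and 73% mentioned above, and a combined jump of 380 + 324 = 704 bits.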
Of course, the basic structure and sequence of opsins is based on pre-vertebrate information which is shared by the two molecules in the same way in sharks and humans. IOWs, it is basic information that is conserved from non vertebrate deuterostomes to sharks and humans. But the rest is another matter entirely. Each of the two molecules has more than 300 bits of new information which: a) appears for the first time in vertebrates b) is conserved between sharks and humans, but c) is different in each of the two molecules. It's the same scenario that we have seen for SATB1 and SATB2, and that is evidence that the two proteins of my OP are no rare exception. It's also important to note that these opsins are extremely functionally constrained. For example, 83% of the rhodopsin sequence is identical between sharks and humans, a result that is certainly not common. We can say, therefore, that almost all the sequence of the molecule is functional, an idea that certainly will not make our darwinist friends happy, considering how they like to think that only a very small part of the sequence of proteins is really specific! So, Mung, you are, as usual, right: "Perhaps Doug Axe should have chosen opsins" :)
gpuccio
July 27, 2017 at 04:17 PM PDT
gpuccio @246: "there is no shortage at all of much more extreme situations" Agree. The couple of VIPs you just disturbed in this thread are a clear example.
Dionisio
July 27, 2017 at 02:09 PM PDT
Can you suggest a metric?
Similarly gullible people must be closely related.
Mung
July 27, 2017 at 11:22 AM PDT
bill cole: I have no definitive answer to that. Probably some genes are "lost" in some branches because their specific function is not necessary in that branch, or it is simply performed by some alternative pathway. The "coming back" could perhaps be explained by the original gene having been passed on to descendant lineages before it was lost in that specific branch. The important thing, IMO, is to acknowledge that loss of genes is not a rare thing, but it is certainly not the rule. In most cases, the natural history of a protein is rather linear. Of course, some information can be lost in specific species, rather than in whole big branches, but that has no special relevance for the general scenario.
gpuccio
July 27, 2017 at 10:31 AM PDT
gpuccio
The third one, SWS1, has a different curve, with low homology in sharks, and a big jump in bony fish and amphibians, lost in crocodiles, conserved in further species. Again, that can already be seen in the figure at genomewiki.
Genes being lost and then coming back seems to be a big problem for orthodox common descent. Thoughts?
bill cole
July 27, 2017 at 10:18 AM PDT
Dionisio at #237: Yes, but that seems more about smaller jumps, about "tweaking" of function in a protein family, at least if we judge from the abstract: "Using Ancestral Sequence Reconstruction, we have generated a putative ancestral lipases/acyltransferases, PaleoLAc. This enzyme shares a high level of identity with CpLIP2 but has a different catalytic behavior." (emphasis mine). Unfortunately, I cannot access the full article to have more details. Tweaking an existing function in a protein family, with limited AA changes, is indeed a "grey zone". Axe has shown that even in that kind of scenario neo-darwinian evolution is not a satisfying explanation, but as there is no shortage at all of much more extreme situations, as we have seen, I would stick to them for the moment.
gpuccio
July 27, 2017 at 07:43 AM PDT