Uncommon Descent Serving The Intelligent Design Community

Homologies, differences and information jumps

In recent posts, I have been discussing some important points about the reasonable meaning of homologies and differences in the proteome in the course of natural history. For the following discussion, just to be clear, I will accept a scenario of Common Descent (as explained in many recent posts) in the context of an ID approach. I will also accept the very reasonable concept that neutral or quasi-neutral random variation happens over time, and that negative (purifying) selection is the main principle which limits random variation in functional sequences.

My main points are the following:

  1. Given those premises, homologies through natural history are certainly an indicator of functional constraints, because they mean that some sequence cannot be significantly transformed by random variation. Another way to express this concept is that variation in a sequence with strong functional constraints is not neutral, but deleterious, and therefore negative selection will in most cases suppress variation and conserve the functional sequence through time. This is a very important point, because it means that strong homologies through time point to high functional complexity, and therefore to design. I have used this kind of argument, for example, for proteins like the beta chain of ATP synthase (highly conserved from LUCA to humans) and Histone H3 (highly conserved in all eukaryotes).
  2. Differences between homologues, instead, can have two completely different meanings:
  •  2a) They can be the result of accumulating neutral variation in parts of the molecule which are not functionally constrained
  • 2b) They can be the expression of differences in function in different species and contexts

I do believe that both 2a and 2b happen and have an important role in shaping the proteome. 2b, in particular, is often underestimated. It is also, in many cases, a very good argument for ID.

 

Now, I will try to apply this reasoning to one example. I have chosen a regulatory protein, one which is not really well understood, but which certainly has an important role in epigenetic regulation. The protein is called “Prickle”, and we will consider in particular the one known as “Prickle 1”. It has come to my attention through an interesting paper linked by Dionisio (to whom go my sincere thanks and appreciation):

Planar polarization of Vangl2 in the vertebrate neural plate is controlled by Wnt and Myosin II signaling

In brief, Prickle is a molecule implicated, among other things, in planar polarization events and in the regulation of the nervous system in vertebrates.

Let’s have a look at the protein. From Wikipedia:

Prickle is part of the non-canonical Wnt signaling pathway that establishes planar cell polarity.[2] A gain or loss of function of Prickle1 causes defects in the convergent extension movements of gastrulation.[3] In epithelial cells, Prickle2 establishes and maintains cell apical/basal polarity.[4] Prickle1 plays an important role in the development of the nervous system by regulating the movement of nerve cells.[5]

And:

Mutations in Prickle genes can cause epilepsy in humans by perturbing Prickle function.[12] One mutation in Prickle1 gene can result in Prickle1-Related Progressive Myoclonus Epilepsy-Ataxia Syndrome.[2] This mutation disrupts the interaction between prickle-like 1 and REST, which results in the inability to suppress REST.[2] Gene knockdown of Prickle1 by shRNA or dominant-negative constructs results in decreased axonal and dendritic extension in neurons in the hippocampus.[5] Prickle1 gene knockdown in neonatal retina causes defects in axon terminals of photoreceptors and in inner and outer segments.[5]

The human protein is 831 AAs long.

Its structure is interesting: according to Uniprot, in the first part of the molecule we can recognize 4 domains:

1 PET domain:  AAs 14 – 122

3 LIM zinc-binding domains:  AAs 124 – 313

In the rest of the sequence (AAs 314 – 831) no known domain is recognized.

Here is the FASTA sequence of the human protein. AAs 1 – 313 form the 4 domain part, and AAs 314 – 831 form the no domain part:

 

>sp|Q96MT3|PRIC1_HUMAN Prickle-like protein 1 OS=Homo sapiens GN=PRICKLE1 PE=1 SV=2
MPLEMEPKMSKLAFGCQRSSTSDDDSGCALEEYAWVPPGLRPEQIQLYFACLPEEKVPYV
NSPGEKHRIKQLLYQLPPHDNEVRYCQSLSEEEKKELQVFSAQRKKEALGRGTIKLLSRA
VMHAVCEQCGLKINGGEVAVFASRAGPGVCWHPSCFVCFTCNELLVDLIYFYQDGKIHCG
RHHAELLKPRCSACDEIIFADECTEAEGRHWHMKHFCCLECETVLGGQRYIMKDGRPFCC
GCFESLYAEYCETCGEHIGVDHAQMTYDGQHWHATEACFSCAQCKASLLGCPFLPKQGQI
YCSKTCSLGEDVHASDSSDSAFQSARSRDSRRSVRMGKSSRSADQCRQSLLLSPALNYKF
PGLSGNADDTLSRKLDDLSLSRQGTSFASEEFWKGRVEQETPEDPEEWADHEDYMTQLLL
KFGDKSLFQPQPNEMDIRASEHWISDNMVKSKTELKQNNQSLASKKYQSDMYWAQSQDGL
GDSAYGSHPGPASSRRLQELELDHGASGYNHDETQWYEDSLECLSDLKPEQSVRDSMDSL
ALSNITGASVDGENKPRPSLYSLQNFEEMETEDCEKMSNMGTLNSSMLHRSAESLKSLSS
ELCPEKILPEEKPVHLPVLRRSKSQSRPQQVKFSDDVIDNGNYDIEIRQPPMSERTRRRV
YNFEERGSRSHHHRRRRSRKSRSDNALNLVTERKYSPKDRLRLYTPDNYEKFIQNKSARE
IQAYIQNADLYGQYAHATSDYGLQNPGMNRFLGLYGEDDDSWCSSSSSSSDSEEEGYFLG
QPIPQPRPQRFAYYTDDLSSPPSALPTPQFGQRTTKSKKKKGHKGKNCIIS

So, this is a very interesting situation, which is not so rare. We have the first part of the sequence (313 AAs), which comprises well known and conserved domains, while “the rest” (518 AAs) is apparently not understood in terms of structure and function.
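The coordinates above can be checked with a few lines of Python. This is only a sketch: the residue ranges are the ones quoted from Uniprot in the text, while the names `SEGMENTS`, `segment_length` and `split_sequence` are made up for illustration. Ranges are 1-based and inclusive, so AAs 314 – 831 span 518 positions.

```python
# Inclusive, 1-based residue ranges as quoted in the post.
SEGMENTS = {
    "PET domain": (14, 122),
    "LIM domains": (124, 313),
    "domain part": (1, 313),
    "no-domain part": (314, 831),
}


def segment_length(start, end):
    """Number of residues in an inclusive 1-based range."""
    return end - start + 1


def split_sequence(seq, boundary=313):
    """Split a protein string after residue `boundary` (1-based)."""
    return seq[:boundary], seq[boundary:]


lengths = {name: segment_length(*rng) for name, rng in SEGMENTS.items()}

for name, n in lengths.items():
    print(f"{name}: {n} AAs")
```

Passing the 831 AA string from the FASTA record above (with the header line and line breaks removed) to `split_sequence` would give the two queries that are blasted separately in what follows.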

So, to better understand what all this could mean, I have blasted those two parts of the human molecule separately.

(Those who are not interested in the technical details, can choose here to go on to the conclusions  🙂 )

The first part of the sequence (AAs 1 – 313) shows no homologies in prokaryotes. So, we are apparently in the presence of domains which appear in eukaryotes.

In fungi, we find some significant, but weak, homologues. The best hit is an expect of 2e-21, with 56 identities and 93 positives (99.4 bits).

Multicellular organisms have definitely stronger homologies:

C. elegans:  144 identities, 186 positives, expect 2e-90 (282 bits)

Drosophila melanogaster:  202 identities, 244 positives, expect 5e-152 (447 bits)

Let’s go to non vertebrate chordates:

Cephalochordata (Branchiostoma floridae):  222 identities, 256 positives, expect 6e-165 (484 bits)

Tunicata (Ciona intestinalis): 196 identities, 241 positives, expect 2e-149 (442 bits)

Now, vertebrates:

Cartilaginous fishes (Callorhinchus milii): 266 identities, 290 positives, expect 0.0 (588 bits)

Bony fishes (Lepisosteus oculatus): 274 identities, 292 positives, expect 0.0 (598 bits)

Mammals (Mouse): 309 identities, 312 positives, expect 0.0 (664 bits)

IOWs, what we see here is that the 4 domain part of the molecule, absent in prokaryotes, is already partially observable in single celled eukaryotes, and is strongly recognizable in all multicellular beings. It is interesting that homology with the human form is not very different between Drosophila and non vertebrate chordates, while there is a significant increase in vertebrates, and practical identity already in mouse. That is a very common pattern, and IMO it can be explained as a mixed result of different functional constraints and neutral evolution over different time spans since the splits.
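To compare these hits on a common scale, identities and positives can be expressed as a percentage of the 313 AA query. This is a rough sketch: BLAST reports identities over the aligned region, whose length is not given here, so dividing by the full query length is an assumption that may slightly understate local identity. The helper names `percent_of_query` and `summarize` are hypothetical.

```python
QUERY_LEN = 313  # AAs 1-313 of human Prickle 1 (the 4 domain part)

# (taxon, identities, positives, bit score), copied from the results above.
HITS = [
    ("Fungi (best hit)", 56, 93, 99.4),
    ("C. elegans", 144, 186, 282),
    ("Drosophila melanogaster", 202, 244, 447),
    ("Branchiostoma floridae", 222, 256, 484),
    ("Ciona intestinalis", 196, 241, 442),
    ("Cartilaginous fishes", 266, 290, 588),
    ("Bony fishes", 274, 292, 598),
    ("Mouse", 309, 312, 664),
]


def percent_of_query(count, query_len=QUERY_LEN):
    """Share of the query covered by `count` residues, in percent."""
    return round(100.0 * count / query_len, 1)


def summarize(hits=HITS):
    """Return (taxon, % identities, % positives, bits) for each hit."""
    return [
        (taxon, percent_of_query(ident), percent_of_query(pos), bits)
        for taxon, ident, pos, bits in hits
    ]


for taxon, ident_pct, pos_pct, bits in summarize():
    print(f"{taxon:26s} {ident_pct:5.1f}% id  {pos_pct:5.1f}% pos  {bits} bits")
```

On these numbers, identity to the human query goes from under 20% in fungi to roughly 60–70% in Drosophila and non vertebrate chordates, and to 85% or more in vertebrates.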

Now, let’s go to “the rest” of the molecule: AAs 314 – 831 (518 AAs). No recognizable domains here.

What is the behaviour of this sequence in natural history?

Again, let’s start from the human sequence and blast it.

With Prokaryotes: no homologies

With Fungi: no homologies

C. elegans: no homologies

Drosophila melanogaster: no homologies

Let’s go to non vertebrate chordates:

Cephalochordata (Branchiostoma floridae):  no significant homologies

Tunicata (Ciona intestinalis): no significant homologies

So, there is no significant homology in the whole range of eukaryotes, excluding vertebrates and including chordates which are not vertebrates.

Now, what happens with vertebrates?

Here are the numbers:

Cartilaginous fishes (Callorhinchus milii): 350 identities, 429 positives, expect 0.0 (597 bits)

Bony fishes (Lepisosteus oculatus): 396 identities, 446 positives, expect 0.0 (662 bits)

Mammals (Mouse): 466 identities, 491 positives, expect 0.0 (832 bits)

IOWs, what we see here is that the no domain part of the molecule is practically non existent in prokaryotes, in single celled eukaryotes and in all multicellular beings which are not vertebrates. In vertebrates, the sequence is not only present in practically all vertebrates, but it is also extremely conserved, from sharks to humans. So, we have a steep informational jump from non chordates and non vertebrate chordates, where the sequence is practically absent, to the very first vertebrates, where the sequence is already highly specific.
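The contrast between the two parts can be made explicit by putting the bit scores side by side. The sketch below is illustrative only: it treats every "no homologies" or "no significant homologies" result as 0 bits, which is a simplifying assumption, and it takes the cartilaginous fish hit as the first vertebrate data point. The names `BITS`, `bits_per_residue` and `jump` are hypothetical.

```python
PART_LEN = {"domain part": 313, "no-domain part": 518}

NON_VERTEBRATES = [
    "Prokaryotes", "Fungi", "C. elegans", "Drosophila",
    "Cephalochordata", "Tunicata",
]
FIRST_VERTEBRATE = "Cartilaginous fishes"

# Bit scores against the human query, taken from the two result lists.
# Assumption: a result reported as "no (significant) homologies" counts as 0.
BITS = {
    "domain part": {
        "Prokaryotes": 0, "Fungi": 99.4, "C. elegans": 282,
        "Drosophila": 447, "Cephalochordata": 484, "Tunicata": 442,
        "Cartilaginous fishes": 588, "Bony fishes": 598, "Mouse": 664,
    },
    "no-domain part": {
        "Prokaryotes": 0, "Fungi": 0, "C. elegans": 0,
        "Drosophila": 0, "Cephalochordata": 0, "Tunicata": 0,
        "Cartilaginous fishes": 597, "Bony fishes": 662, "Mouse": 832,
    },
}


def bits_per_residue(part, taxon):
    """Bit score normalised by the length of the query segment."""
    return BITS[part][taxon] / PART_LEN[part]


def jump(part):
    """Bits gained between the best non-vertebrate hit and the first vertebrate hit."""
    best_before = max(BITS[part][t] for t in NON_VERTEBRATES)
    return BITS[part][FIRST_VERTEBRATE] - best_before


for part in BITS:
    print(f"{part}: jump of {jump(part)} bits at the vertebrate transition")
```

Under these assumptions the domain part gains about 104 bits at the transition to vertebrates, while the no domain part gains 597 bits.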

What does that mean from an ID point of view? It’s simple:

a) The sequence of 518 AAs which represents the major part of the human protein must be reasonably considered highly functional, because it is strongly conserved throughout vertebrate evolution. As we have said at the beginning, the only reasonable explanation for high conservation throughout a span of time which must be more than 400 million years long is the presence of strong functional constraints in the sequence.

b) The sequence and its function, whatever it may be (but it is probably an important regulatory function), are highly specific to vertebrates.

We have here a very good example of a part of a protein which practically appears in vertebrates while it is absent before, and which is reasonably highly functional in vertebrates.

So, to sum up:

  1. Prickle 1 is a functional protein which is found in all eukaryotes.
  2. The human sequence can be divided in two parts, with different properties.
  3. The first part, while undergoing evolutionary changes, is rather well conserved in all eukaryotes. Its function can be better understood, because it is made of known domains with known structure.
  4. The second part does not include any known domain or structure, and is practically absent in all eukaryotes except vertebrates.
  5. In vertebrates, it is highly conserved and almost certainly highly functional, probably as a regulatory epigenetic sequence.
  6. Because of its properties, this second part, with its functional sequence, is a very reasonable object for a strong design inference.

 

I have added a graph to show better what is described in the conclusions, in particular the information jump in vertebrates for the second part of the sequence:

Graph3

Note: Thanks to the careful checking of Alicia Cartelli, I have corrected a couple of minor imprecisions in the data and the graph (see posts #83 and #136). Thank you, Alicia, for your commitment. The sense of the post, however, does not change.

Those who are interested in the evolutionary behaviour of protein Prickle 2 can take a look at my posts #127 and #137.

Comments
Dionisio at #15: No, obviously there is probably no clue about the appearance of those domains, as there is usually no clue about the appearance of any new domain. However, my point here is that, at least, those known domains show some "graduality" of modification in eukaryotes, while the second sequence "arises" suddenly in vertebrates, and is very well conserved afterwards.
gpuccio
February 2, 2016, 02:33 AM PDT
I have added a graph to the post. I hope it shows more clearly the point I am discussing here.
gpuccio
February 2, 2016, 02:29 AM PDT
Before the OP gets to what is mentioned @10 & @14, it states this:
The first part of the sequence (AAs 1 – 313) shows no homologies in prokaryotes. So, we are apparently in the presence of domains which appear in eukaryotes.
Their functionality is well known, but how much is known about how exactly they appeared? Can someone point to serious literature that explains it well? Thanks.
Dionisio
February 2, 2016, 02:17 AM PDT
Dionisio at #10: As soon as I have time, I will look for that. But probably there is not much: blast does not recognize any domain in the sequence, which usually means that there is no information about the 3D structure. The protein function is not really well understood, but its regulatory role in many contexts is well proven.
gpuccio
February 2, 2016, 01:17 AM PDT
Dionisio: Yes, the shark is there because cartilaginous fishes are the oldest vertebrates (except for lampreys, which are jawless vertebrates). But you are right, a little bit of aggression could be helpful, sometimes... :)
gpuccio
February 2, 2016, 01:14 AM PDT
Dionisio at #6: Correction made! :)
gpuccio
February 2, 2016, 12:59 AM PDT
gpuccio: You have answered my question about the E value satisfactorily. Thank you. BTW, one of my children -who works on something related to cancer research- has used that Blast program, but I prefer to ask you publicly about the E value, so that other readers here -who might be as technically challenged as I am- could benefit from your clear explanation too. :)
Dionisio
February 2, 2016, 12:57 AM PDT
So, we have a steep informational jump from non chordates and non vertebrate chordates, where the sequence is practically absent, to the very first vertebrates, where the sequence is already highly specific. We have here a very good example of a part of a protein which practically appears in vertebrates while it is absent before, and which is reasonably highly functional in vertebrates.
Is anyone aware of any literature that explains the appearance of this highly specific sequence? In the meantime, would it help to look for papers on any known functionality associated with that vertebrate-only sequence?
Dionisio
February 2, 2016, 12:53 AM PDT
Dionisio: Your questions are most welcome! :) In Blast, the expect value (E value) is a measure of the improbability of finding the observed homology by chance in the existing sequence database. Here is the description from the Blast FAQ page:
Q: What is the Expect (E) value? The Expect value (E) is a parameter that describes the number of hits one can "expect" to see by chance when searching a database of a particular size. It decreases exponentially as the Score (S) of the match increases. Essentially, the E value describes the random background noise. For example, an E value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size one might expect to see 1 match with a similar score simply by chance. The lower the E-value, or the closer it is to zero, the more "significant" the match is. However, keep in mind that virtually identical short alignments have relatively high E values. This is because the calculation of the E value takes into account the length of the query sequence. These high E values make sense because shorter sequences have a higher probability of occurring in the database purely by chance. For more details please see the calculations in the BLAST Course. The Expect value can also be used as a convenient way to create a significance threshold for reporting results.
The bit score is another way to express the same concept. The lower the expect value, the higher the bit score. When the expect value is very low, it is simply given as "0.0", but the bit score can still be used to evaluate differences between different blasts.
gpuccio
February 2, 2016, 12:52 AM PDT
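The relation between bit score and E value described in the comment above can be written out explicitly. In the standard BLAST statistics the expected number of chance hits is E = m × n × 2^(−S′), where S′ is the bit score, m the query length and n the database length. The sketch below simply evaluates that formula. The database size used in the example is an arbitrary assumption (it is not stated anywhere in the post), and real BLAST runs use effective lengths, so the output is not expected to reproduce the reported E values.

```python
def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 313                 # the 4 domain part of human Prickle 1
ASSUMED_DB_LEN = 1_000_000_000  # hypothetical database size, in residues

# Bit scores reported in the post for three of the hits.
for bits in (99.4, 282, 447):
    e = expect_value(bits, QUERY_LEN, ASSUMED_DB_LEN)
    print(f"{bits:6.1f} bits -> E ~ {e:.1e}")
```

Each additional bit halves the E value, which is why hits of several hundred bits are reported with an expect of "0.0" while their bit scores still differ.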
BTW, impressive shark image at the top of the OP. Why did you choose that particular picture? Does it have to do with the shark being an allegedly ancient species? Or was it intended to impose a certain level of discipline in the follow-up discussion, so everyone must stick to the discussed topic or else... :) Thank you.
Dionisio
February 2, 2016, 12:18 AM PDT
Sorry for asking so many distracting irrelevant questions. I prefer to understand most technical terms encountered within the text before trying to understand more deeply what the whole article actually means. What does the following expression mean?
The best hit is an expect of 2e-21.
I'll try to refrain from asking too many questions though. Please, delete any of my posts that you deem irrelevant, distracting or off topic. Thank you.
Dionisio
February 2, 2016, 12:09 AM PDT
The protein is called “Prickle”, and we will consider in particular the from known as “Prickle 1”.
What does that mean? Thank you.
Dionisio
February 1, 2016, 11:27 PM PDT
[...] we have to try to bring the discussion into some biological detail.
Yes, agree. Thank you for taking the time to write this insightful OP, which should lead to a serious discussion on some related biological details.
Dionisio
February 1, 2016, 11:21 PM PDT
Dionisio: Thank you for the revision! :) Yes, I know, the post is rather technical. But I think that we have to try to bring the discussion into some biological detail. Thank you for taking the time to consider it!
gpuccio
February 1, 2016, 10:13 PM PDT
(hihly conserved from LUCA to humans)
Dionisio
February 1, 2016, 03:57 PM PDT
This is a very important point. because it means that stron homologies thorugh time point to high functional complexity, and therefore to design.
Dionisio
February 1, 2016, 03:53 PM PDT
Very insightful article. Thank you. Haven't digested it entirely yet, though. It'll take me some time to process it completely.
Dionisio
February 1, 2016, 03:52 PM PDT