A good number of proteomics researchers believe there are millions of protein isoforms. A protein isoform is a slight variation of a basic protein. I’m not averse to thinking that a limited number of “genes” govern a limited number of basic protein classes, but that there are millions or billions of protein isoforms.
Consider the Drosophila Dscam gene, which through alternative splicing can in principle produce 38,016 distinct mRNAs and presumably 38,016 isoforms.
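Where does a number like 38,016 come from? In Drosophila Dscam it arises from mutually exclusive splicing: each transcript takes exactly one exon from each of four variable clusters (12 alternatives for exon 4, 48 for exon 6, 33 for exon 9 and 2 for exon 17). The sketch below simply multiplies the cluster sizes, assuming every choice is independent, so it gives the number of possible transcripts, not the number actually observed in a fly.

```python
from math import prod

# Number of mutually exclusive alternative exons in each variable
# cluster of the Drosophila Dscam gene. Each mature transcript is
# assumed to include exactly one exon from each cluster.
DSCAM_EXON_CLUSTERS = {
    "exon 4": 12,
    "exon 6": 48,
    "exon 9": 33,
    "exon 17": 2,
}


def splice_combinations(clusters: dict[str, int]) -> int:
    """Count possible transcripts if one exon is chosen independently per cluster."""
    return prod(clusters.values())


print(splice_combinations(DSCAM_EXON_CLUSTERS))  # 38016
```

The same multiplication explains why a handful of variable exon clusters can push one gene into the tens of thousands of potential isoforms.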
And how about the dystrophin gene, which spans more than 2 million base pairs and codes for a protein of about 3,685 amino acids? Only a few isoforms have been identified so far, but with a gene that gigantic, one might guess there is much left to discover about it. Given the size of the dystrophin gene, I would not be surprised if we eventually find thousands of isoforms.
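To put "gigantic" in perspective, a quick calculation compares the protein-coding sequence with the gene's full genomic span. Each amino acid needs one three-nucleotide codon, plus one stop codon. Taking the gene as roughly 2.2 to 2.5 million base pairs and the protein as roughly 3,500 to 3,685 amino acids (approximate figures, assumed here), the coding sequence comes out to only about half a percent of the gene; the rest is introns and other non-coding sequence.

```python
def coding_fraction(gene_bp: int, protein_aa: int) -> float:
    """Fraction of a gene's genomic span that is protein-coding sequence.

    Assumes one 3-nt codon per amino acid plus a single stop codon,
    and ignores untranslated regions (UTRs).
    """
    coding_nt = (protein_aa + 1) * 3
    return coding_nt / gene_bp


low = coding_fraction(2_500_000, 3_500)   # larger gene, shorter protein
high = coding_fraction(2_200_000, 3_685)  # smaller gene, longer protein
print(f"{low:.2%} to {high:.2%}")  # 0.42% to 0.50%
```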
From the literature:
First, the nonredundant set of proteins has here been defined as a single representative protein from every gene locus. At present, the human genome contains 22,000–23,000 genes, although the estimated number is still changing on a monthly basis (www.ensembl.org). Thus, the nonredundant set of proteins in the human proteome is probably between 20,000 and 25,000. On the other hand, if one includes all the protein variants generated by RNA splicing or specific proteolytic processing, the number of protein isoforms rapidly increases. Splicing is a common phenomenon and frequently gives rise to different forms of the same protein, often through events linked to a targeting of the protein variants to different compartments of the cell (50). Site-specific proteolysis is particularly common in the processing of preproteins of neuropeptides, but has also been shown to be involved in the maturation of proteins, such as the cleavage of the C-peptide from the proinsulin molecule to create functional insulin and C-peptide (51). The estimated number of protein variants represented by splicing and proteolysis is still unknown, but the number might be between 50,000 and 500,000.
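The quoted ranges can be turned into an average per gene. Dividing the low estimate of variants by the high gene count, and the high estimate by the low gene count, brackets the implied average at roughly 2 to 25 splicing or proteolytic variants per gene. This is only arithmetic on the quoted figures; it excludes the immune-system variants discussed next.

```python
def variants_per_gene(total_variants: int, genes: int) -> float:
    """Average number of protein variants per gene locus."""
    return total_variants / genes


# Ranges quoted above: 20,000-25,000 genes, 50,000-500,000 variants.
lowest = variants_per_gene(50_000, 25_000)    # fewest variants, most genes
highest = variants_per_gene(500_000, 20_000)  # most variants, fewest genes
print(lowest, highest)  # 2.0 25.0
```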
Another form of variation in the human proteome is the combinatorial variants created by somatic rearrangement in cells involved in the immune system. A well-known example is immunoglobulin G (IgG). The number of different IgG molecules in a human individual is probably more than 10 million, all with different complementarity-determining regions and thus different binding properties.
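The combinatorial part of somatic rearrangement can be sketched by multiplying gene-segment choices: a heavy chain joins one V, one D and one J segment, a light chain (kappa or lambda) joins one V and one J, and any heavy chain can pair with any light chain. The segment counts below are approximate textbook-style figures, assumed for illustration; real counts vary between individuals and sources. This counts only segment combinations. Imprecise joining, added nucleotides at the junctions, and somatic hypermutation multiply the repertoire much further, which is how totals above 10 million become possible.

```python
from math import prod

# Approximate numbers of functional germline gene segments (assumed,
# illustrative values; exact counts differ between individuals).
HEAVY = {"V": 40, "D": 23, "J": 6}
KAPPA = {"V": 40, "J": 5}
LAMBDA = {"V": 30, "J": 4}


def segment_combinations(segments: dict[str, int]) -> int:
    """One segment is chosen from each family, so the counts multiply."""
    return prod(segments.values())


heavy = segment_combinations(HEAVY)                                 # V x D x J
light = segment_combinations(KAPPA) + segment_combinations(LAMBDA)  # kappa or lambda
pairs = heavy * light                                               # any heavy with any light

print(f"{heavy:,} heavy x {light:,} light = {pairs:,} pairings")
# 5,520 heavy x 320 light = 1,766,400 pairings
```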
Who knows how many more combinatorial variants there are? If these isoform variations serve as information for the spatial organization of molecules, a kind of quasi-GPS, there could be billions of isoforms! 😯
Due to the complex nature of biologics, which oftentimes consist of millions or billions of isoforms, characterization of molecular isoforms is an ever-challenging task that is heavily dependent on technical advances in the related areas. This presentation will review the major isoforms in protein therapeutics, especially in monoclonal antibodies.
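How can a single therapeutic protein reach "millions" of isoforms? If a molecule has several sites that can each be modified in more than one way, and the sites vary independently, the number of possible forms is the product of the options at each site. The sketch below uses hypothetical site counts, not data for any real antibody, and because real modifications are not fully independent, the result is an upper bound rather than a measured count.

```python
from math import prod


def proteoform_count(states_per_site: list[int]) -> int:
    """Upper bound on distinct forms of one molecule.

    Each entry is the number of alternative states at one modifiable
    site (2 = modified or unmodified). Sites are assumed to vary
    independently, so the per-site options multiply.
    """
    return prod(states_per_site)


# Hypothetical molecule: 20 sites that can each be modified or not.
print(proteoform_count([2] * 20))  # 1048576

# Hypothetical: add two glycosylation sites with 10 glycan variants each.
print(proteoform_count([2] * 20 + [10, 10]))  # 104857600
```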
How does this bear on ID? Isoforms involve alternative splicing, post-translational modification, and the regulation of these processes to make something useful. Not only would the coding portions of the genes need to evolve, but so would the alternative splicing mechanisms, non-coding regions, post-translational modification machinery, and the regulation of these complex processes, etc… Evolutionary biology is rather sparse on mechanistic explanations for how these things would evolve…
Finally, if each gene codes for a few thousand isoforms on average, then it becomes more reasonable to speculate that the non-coding regions of DNA, which make up roughly 98% of the human genome, might be critical for creating, managing, and regulating the millions or billions of isoforms in humans. The information needed to create and manage isoforms has to be stored somewhere, and if not in the coding regions, then where?
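A back-of-the-envelope check on the scale involved: taking about 20,000 genes (the low end of the range quoted earlier) and reading "a few thousand" hypothetically as 1,000 to 5,000 isoforms per gene gives tens of millions of isoforms in total. Reaching a billion would require an average of about 50,000 isoforms per gene.

```python
def total_isoforms(genes: int, isoforms_per_gene: int) -> int:
    """Total isoforms if every gene produced the same average number."""
    return genes * isoforms_per_gene


def required_per_gene(target_total: int, genes: int) -> float:
    """Average isoforms per gene needed to reach a target total."""
    return target_total / genes


GENES = 20_000  # low end of the 20,000-25,000 range quoted above

for per_gene in (1_000, 3_000, 5_000):  # hypothetical "a few thousand"
    print(f"{per_gene:,} per gene -> {total_isoforms(GENES, per_gene):,} isoforms")
# 1,000 per gene -> 20,000,000 isoforms
# 3,000 per gene -> 60,000,000 isoforms
# 5,000 per gene -> 100,000,000 isoforms

print(required_per_gene(1_000_000_000, GENES))  # 50000.0
```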
Cocktail designation is for speculative ideas that I consider to have substantial merit.