As scientists sequence more genomes from different organisms, they are discovering that roughly 10 to 20% of each genome's protein-coding sequences are new, that is, unlike any other known protein-coding sequence. This was one of the biggest surprises to come out of whole-genome sequencing, though by no means the biggest.
Why? Given common descent, the fact that most housekeeping genes are shared among living things, and the hitherto prevailing assumption that evolution proceeds by small incremental changes, orphan genes (protein-coding sequences without known protein-coding antecedents) were expected to be rare, if not nonexistent.
At this point it is necessary to explain a little about how such orphan sequences come to be identified.
4. The existence of such surprising species- or clade-specific proteins raises interesting questions about where orphans come from. Some might have arisen from gene duplication followed by rapid adaptive evolution (see #3 above). If that is the case, we should see traces of the ancestral gene left behind in the orphan protein's three-dimensional structure. Others propose recruitment from non-coding DNA by a combination of mechanisms, including insertion of transposable elements. This is possible, but the insertion or other mechanisms would have to be lucky events to produce a stable, functional protein, that is, one that is of use to the organism. Exactly how lucky is one of the issues we are debating.
5. Then there is the elephant in the room that evolutionary biologists don’t want to acknowledge. Perhaps we see so many species- and clade-specific orphan genes because they are uniquely designed for species- and clade-specific functions. Certainly, this runs contrary to the expectation of common descent.
It's convenient that discussing these problems gets dismissed as "religion," unless researchers suggest that space aliens did it. Maybe they'll have to do that in order to enable a discussion.