And we are not sure which ones they are. From ScienceDaily:
[A study led by the Spanish National Cancer] Research Centre (CNIO) reveals that up to 20% of genes classified as coding (those that produce the proteins that are the building blocks of all living things) may not be coding after all because they have characteristics that are typical of non-coding or pseudogenes (obsolete coding genes).
They don’t mean “junk DNA,” do they? Not this again.
The work once again highlights doubts about the number of real genes present in human cells 15 years after the sequencing of the human genome. Although the most recent data indicates that the number of genes encoding human proteins could exceed 20,000, Federico Abascal, of the Wellcome Trust Sanger Institute in the United Kingdom and first author of the work, states: “Our evidence suggests that humans may only have 19,000 coding genes, but we still do not know which 19,000 genes are.”
For his part, David Juan, of the Pompeu Fabra University and participant in the study, reiterates the importance of these results: “Surprisingly, some of these unusual genes have been well studied and have more than 100 scientific publications based on the assumption that the gene produces a protein.”
This study suggests that there is still a large amount of uncertainty, since the final number of coding genes could be 2,000 more or 2,000 fewer than it is now. The human proteome still requires much work, especially given its importance to the medical community.
Paper (open access): Federico Abascal, David Juan, Irwin Jungreis, Laura Martinez, Maria Rigau, Jose Manuel Rodriguez, Jesus Vazquez, Michael L Tress. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Research, 2018; 46 (14): 7070. DOI: 10.1093/nar/gky587
Been a while since we’ve heard much about humans as the 98% or 99% chimpanzee. If the human genome is this fuzzy, how would we know? And doubtless, things have gotten more complex.
See also: Human genome shrinks again, lower than projected nematode worm (2014)