All? Some? None?
Clark et al., The Reality of Pervasive Transcription:
Current estimates indicate that only about 1.2% of the mammalian genome codes for amino acids in proteins. However, mounting evidence over the past decade has suggested that the vast majority of the genome is transcribed, well beyond the boundaries of known genes, a phenomenon known as pervasive transcription [1]. Challenging this view, an article published in PLoS Biology by van Bakel et al. concluded that “the genome is not as pervasively transcribed as previously reported” [2] and that the majority of the detected low-level transcription is due to technical artefacts and/or background biological noise. These conclusions attracted considerable publicity [3]–[6]. Here, we present an evaluation of the analysis and conclusions of van Bakel et al. compared to those of others and show that (1) the existence of pervasive transcription is supported by multiple independent techniques; (2) re-analysis of the van Bakel et al. tiling arrays shows that their results are atypical compared to those of ENCODE and lack independent validation; and (3) the RNA sequencing dataset used by van Bakel et al. suffered from insufficient sequencing depth and poor transcript assembly, compromising their ability to detect the less abundant transcripts outside of protein-coding genes. We conclude that the totality of the evidence strongly supports pervasive transcription of mammalian genomes, although the biological significance of many novel coding and noncoding transcripts remains to be explored.
However, van Bakel et al. respond:
Clark et al. criticize several aspects of our study [1], and specifically challenge our assertion that the degree of pervasive transcription has previously been overstated. We disagree with much of their reasoning and their interpretation of our work. For example, many of our conclusions are based on overall sequence read distributions, while Clark et al. focus on transcript units and seqfrags (sets of overlapping reads). A key point is that one can derive a robust estimate of the relative amounts of different transcript types without having a complete reconstruction of every single transcript.
In this brief response, we first revisit what is meant by pervasive transcription, and its potential significance. We then discuss the major points raised by Clark et al. in the order presented in their critique. Finally, we demonstrate that conclusions very similar to those of our original study are reached with a dataset with far greater read depth, obtained by strand-specific sequencing of rRNA-depleted total RNA from a single cell type.
Thoughts?