Endogenous Retroviruses and Junk DNA

The latest version of the Carnival of Evolution pointed to an article by Steven Quistad on Small Things Considered. The article reviewed a recent paper on endogenous retroviruses [The Rise of Genomic Superspreaders].

Retroviruses are RNA viruses that go through a stage where their RNA genomes are copied into DNA by reverse transcriptase. The DNA copy may integrate into the host genome and be carried along for many generations, producing low levels of virus particles [Retrotransposons/Endogenous Retroviruses]. Most of these events occur in somatic cells, so the integrated virus is not passed along to progeny, but from time to time the virus integrates into germ line DNA, where it becomes heritable.

There have been 31 such events in our lineage, meaning that we have copies of 31 different retroviruses in our genome. The retroviruses may have produced further copies in germ line DNA, so that each of the 31 retroviruses is now represented by a family of sequences scattered throughout the genome. Today, these retrovirus sequences represent a total of 8% of our genome! That's over 200,000,000 base pairs of DNA, spread across about 100,000 different sites.1
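The arithmetic behind those numbers can be checked in a few lines. This is only a rough sketch: the genome size of about 3.2 billion base pairs is my assumption and is not a figure from the paper or the review, while the 8% and the 100,000 sites are the values quoted above.

```python
# Assumption: a haploid human genome of roughly 3.2 billion bp.
# This value is NOT from the article; it is a commonly quoted round number.
GENOME_BP = 3_200_000_000

ERV_PERCENT = 8        # from the text: 8% of the genome
ERV_SITES = 100_000    # from the text: about 100 thousand sites

# Integer arithmetic avoids floating-point rounding on the total.
erv_bp = GENOME_BP * ERV_PERCENT // 100

# Average amount of retrovirus-derived sequence per site.
mean_bp_per_site = erv_bp / ERV_SITES

print(f"{erv_bp:,} bp of retrovirus-derived sequence")
print(f"about {mean_bp_per_site:,.0f} bp per site on average")
```

Under that assumed genome size the total comes to 256,000,000 bp, which agrees with "over 200,000,000," and the average works out to roughly 2,500 bp per site.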

There's no selective pressure to maintain the functionality of these retrovirus sequences so, as you might have guessed, most of them have accumulated mutations over millions of years. (The original insertion events took place at various times ranging from 100 million years ago to only a few million years ago.) Almost all of the 8% consists of defective retrovirus sequences. It's junk.2

But it's a special kind of junk. Retrovirus DNA has strong promoters that bind various transcription factors, and the flanking enhancers ensure that the region around these promoters will be in open chromatin that has all the characteristics of real promoter sites. A substantial proportion of the defective retroviruses will still produce transcripts because the promoter region may not be mutated even though there may be lethal mutations elsewhere in the sequence.

What does this mean? It means that there will be thousands of junk DNA sites that bind transcription factors and RNA polymerase and may even be transcribed. When you're doing whole genome analyses, like those in the ENCODE study, you need to be careful to distinguish between functional promoters and non-functional promoters.
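One simple first step in being careful is to flag every transcription factor binding site that falls inside an annotated retrovirus sequence. The snippet below is a hypothetical sketch, not the method used by ENCODE or by the paper: the `Interval` type, the `flag_peaks` helper, and all of the coordinates are made up for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A half-open genomic interval: start is inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def flag_peaks(peaks, erv_annotations):
    """Pair each binding peak with True if it overlaps any annotated ERV sequence."""
    return [
        (peak, any(overlaps(peak, erv) for erv in erv_annotations))
        for peak in peaks
    ]


# Made-up coordinates, for illustration only.
ervs = [Interval("chr1", 1000, 4000), Interval("chr2", 500, 900)]
peaks = [
    Interval("chr1", 3900, 4100),  # straddles the end of the first ERV
    Interval("chr1", 5000, 5200),  # nowhere near an ERV
    Interval("chr2", 900, 1000),   # starts exactly where the second ERV ends
]

for peak, in_erv in flag_peaks(peaks, ervs):
    print(peak, "overlaps ERV" if in_erv else "no ERV overlap")
```

A flag like this only tells you where to look. Overlap with a defective retrovirus doesn't prove that a binding site is non-functional, since a small number of these sequences may have acquired a secondary function, but it does mean that binding alone shouldn't be counted as evidence of function.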

1. The typical retrovirus genome is about 3,000 bp in length but many of the defective retrotransposon sequences have been truncated by deletions.

2. Except for an extremely small number that might have acquired a secondary function such as enhancing expression of a nearby gene.