RT inhibitors are under investigation; time will show how successful they are. RNase HI of Escherichia coli was the first RNase H whose three-dimensional structure was solved, revealing a conserved protein architecture, the RNase H fold (Katayanagi et al.). The conserved amino acids in E. RNase H-like proteins are directed to their nucleic acid substrates by additional factors. The best-studied example is fusion to the RT domain in retroviruses, which places the two active centers 18 nucleotides apart.
An unusual structural bend is located within the extended polypurine tract (PPT), which comprises 18 nucleotides interrupted by a CU dinucleotide. This cut requires a high degree of precision, leaving a terminal dinucleotide that is essential for integration and LTR formation (Sarafianos et al.). RNases H do not normally release mononucleotides, as exonucleases would. Exonuclease activity is controversial; however, it was recently reported that RNase H-like enzymes can act as exonucleases, depending on the orientation of a C-terminal alpha helix (Majorek et al.).
Furthermore, some RNases H can also cleave dsRNA, an activity that involves aspartic acid residues not required for cleavage of hybrid nucleic acids, as described for archaea (Ohtani et al.).
The importance of nucleic acid structure may be a remnant of a function in the ancient RNA world, where RNA structure was an important feature. The specificities of RNase H-like enzymes are governed by the structural properties of the nucleic acid substrate, fused protein domains, protein-protein interaction partners, metal ion cofactors, and guide nucleic acids that direct the enzymes to their substrates in a sequence-specific manner.
Thus, RNases H are team players, supported by partner or fused proteins that enable specific functions (Figure 1C and Table 1). Retroviral RNase H is discussed above. Zinc finger (ZNF) domains with desired sequence specificities can be artificially fused to an RNase H enzyme to achieve cleavage at specific target sequences, with potential use in gene therapy (Sulej et al.).
Transposases are the most abundant RNase H-like proteins (Majorek et al.). They cleave DNA to excise the transposon and reinsert it into the host chromosome. Phage terminases contain an RNase H domain that cuts phage dsDNA concatemers into single genomes; terminases are crucial for packaging DNA into the virion and for assembly (Feiss and Rao). Both proteins resemble each other in structure, domain organization, RNA binding and cleavage properties.
The enzyme contains two nuclease domains and is directed to its target dsDNA by a guide RNA, a transcript originating from a previous invader that matches the DNA sequence of a new invader. The RuvC domain was originally identified in the RuvC resolvase, which cleaves Holliday junctions, the four-stranded DNA intermediates that form during recombination.
Thus, RNases H can specialize according to need. Poxviruses also encode an RNase H-like Holliday junction resolvase. Then, V, D and J gene fragments are recombined. Both enzymes originate from a transposon and may have entered vertebrate genomes from invertebrates such as sea urchin, sea star, and Aplysia more than Mio years ago (Kapitonov and Koonin; Moelling and Broecker). Prp8, the core protein of the eukaryotic spliceosome, has both an RNase H and an RT domain, suggesting an evolutionary origin from an ancient retroelement.
The catalysis originally performed by a single ribozyme has evolved into the spliceosome, a large multicomponent RNA-protein complex involving dozens of proteins, with Prp8 at its center. RecQ domains have been identified in various DNA polymerases of prokaryotes, eukaryotes, and viruses such as phage T7 (Majorek et al.).
Mutations can lead to Werner syndrome, which is associated with accelerated aging. Studying evolutionarily ancient proteins and their abundance is difficult, since no protein fossils are available. They were first identified by protein architecture, conserved domains, or conserved amino acids in their catalytic centers.
RNase H is among the most frequent proteins in the biosphere, more abundant than enzymes involved in nucleotide metabolism, polymerases, or kinases (Ma et al.). Transposases were described as the most ubiquitous (the majority of sequenced genomes encode at least one transposase) and most abundant (present in the highest copy number per genome) genes in nature (Aziz et al.).
Ubiquitous genes are essential and indispensable in every genome, whereas abundant genes can be frequent in only a few ecosystems. Transposases are ubiquitous but also have the highest copy number per genome and may accelerate biological diversification and evolution. They carry RNase H folds, of which recently more than about 60, unique domains were identified based on comparative structural analyses.
RNase H-like domains can be grouped into families by their evolutionary relationships (Majorek et al.). Recently, metagenomic sequencing was performed on large numbers of marine samples, including small eukaryotes (protists) and prokaryotes, ranging from 0. According to a recent study, RTs are more prevalent in metagenomes than in metatranscriptomes, reaching up to Their weak transcriptional activity may reflect the active proliferation of retroelements that may contribute to genome evolution or adaptive processes of plankton.
Prokaryotes also harbor retroelements and DNA transposons, but less frequently than eukaryotes. This is likely due to the fact that transposons, retrotransposons, and other retroelements are extremely abundant on our planet. The RT has previously been better characterized than the RNase H-like superfamily; we therefore asked about the total abundance of RNase H-like genes. To this end, we analyzed the distribution and abundance of the RNase H gene superfamily in prokaryote-enriched samples of the Tara Oceans project (Sunagawa et al.).
RNase H-like genes homologous to a set of RNase H-like gene families were identified in all regions. On average, about 10 to 15 RNase H-like gene copies per cell were detectable at all three levels. These numbers are comparable to previous findings that an average of about 13 transposase and integrase genes, the two most abundant types of RNase H-like genes, are present per genome across viral, prokaryotic, and eukaryotic species (Aziz et al.).
Thus, our data provide evidence that RNase H-like genes are probably among the most abundant gene superfamilies in plankton organisms throughout the global oceans.

Figure 2. Effective abundance of RNase H-like gene family members across the global ocean. The geographic distribution of effective abundances per cell of all genes homologous to a set of RNase H-like gene families (Majorek et al.). Samples correspond to prokaryote-enriched size fractions collected in the context of the TARA Oceans project (Sunagawa et al.). For each sample, the sum of RNase H-like gene abundances was normalized by the median abundance of 10 universal single-copy phylogenetic marker genes (Sunagawa et al.).

Interestingly, certain plankton populations and gene functions, as judged by taxonomic marker gene sequences and gene family abundances, were dominant and shared among different ocean regions; these were designated core taxa and core gene families, respectively (Sunagawa et al.).
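The per-sample normalization described above (summed RNase H-like gene abundances divided by the median abundance of 10 universal single-copy marker genes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used for the TARA Oceans data: it assumes per-gene abundances for one sample (for example, length-normalized read counts) are already available, and the function name and numbers are hypothetical.

```python
from statistics import median

def rnase_h_copies_per_cell(rnase_h_abundances, marker_abundances):
    """Estimate RNase H-like gene copies per cell for one sample.

    Sums the abundances of all RNase H-like genes and divides by the median
    abundance of universal single-copy marker genes, which serves as a proxy
    for the number of cells (genomes) sampled.
    """
    if not marker_abundances:
        raise ValueError("need at least one marker gene")
    cells = median(marker_abundances)
    if cells == 0:
        raise ValueError("marker-gene median is zero; cannot normalize")
    return sum(rnase_h_abundances) / cells

# Hypothetical sample: 10 single-copy marker genes and five RNase H-like families.
markers = [98, 102, 100, 95, 105, 99, 101, 97, 103, 100]
rnase_h_like = [420, 310, 250, 180, 90]
print(rnase_h_copies_per_cell(rnase_h_like, markers))  # 12.5
```

Using the median of several marker genes rather than a single marker keeps one unusually high or low marker from distorting the per-cell estimate.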
Dispersal by ocean currents is thought to distribute these species and their genes. The less abundant species were not easily detected (Sunagawa et al.). We observed a similar dominance of core sequences in the human gut microbiota when we analyzed the virome, in comparison with the bacterial and fungal communities, of a patient who underwent fecal microbiota transfer (Broecker et al.).
Thus, in the oceans, RNase H-like proteins appear to represent important core sequences, as they were identified in all samples tested here (Figure 2). Group II self-splicing introns, a large class of mobile ribozymes, are found in all domains of life: in bacteria, archaea, and eukaryotes, including plants and marine plankton.
It is site-specific: intron sequences are reverse-transcribed into DNA at the target site by target-primed reverse transcription. Group II introns are the only ones in bacteria, which are mobile (Simon and Zimmerly). Thus, an RNase H was required for the retromobility of the evolutionarily most ancient retroelements (Figure 3A). NC (nucleocapsid) is a nucleoprotein that enhances ribozyme activity and reverse transcription.
(C) Phylogenetic tree of RNase H domains found in giant viruses (Mimiviridae, Phycodnaviridae, and Pithoviridae), eukaryotes, various retrotransposons, retroviruses, and pararetroviruses (hepadnaviruses and caulimoviruses).

This msDNA accumulates to high levels within the cell, yet its function remains elusive.
The msDNA is about 3. Diversity-generating retroelements (DGRs) are a class of mobile elements found in phages that infect Bordetella bacteria, which cause whooping cough. Bordetella bacteriophage BPP-1 integrates into the bacterial genome as a temperate phage and encodes an error-prone RT.
The RT generates variants of the phage protein at the tips of the phage tail fibers. This allows changes in tropism by switching the binding specificity for bacterial receptors.
The number of variants is extraordinary and reminiscent of V(D)J antibody diversity. DGRs are widely distributed in bacterial chromosomes and in phage and plasmid genomes (Miller et al.).
The abundance of RT and RNase H genes in prokaryotes and unicellular eukaryotes raises the question of their abundance in eukaryotic genomes. The human genome contains about 40, complete or partially truncated HERVs. HERVs are proviruses of ancient retroviral infections that have mostly degenerated to solitary LTRs about , per genome via homologous recombination. They do not use RNA primers, which would require an RNase H, but a target-site-primed mechanism of reverse transcription.
Their movement requires endonucleases, not RNases H. These sequences can be regarded as a graveyard of previous infections by retroviruses and retroviral-like elements, or as a viral archive (Lander et al.).
Most TEs have accumulated mutations and deletions over time, rendering them inactive. Interestingly, DNA transposons are rare in humans. The total number of retroviruses and their genes calculated from the number of , solitary LTRs would exceed the total size of the present human genome, which is about 3.
Could it have been larger in the past, as is the case for some plants? TEs are even more abundant in some plant genomes, e. The large genomes may be a consequence of breeding to improve yields for the food supply. Similarly, prokaryotic genomes can contain phage sequences within CRISPR arrays, albeit in smaller quantities per genome (see below). The promoters of retroviruses or retroelements, such as the LTRs, can influence host gene expression, for instance by supplying transcription factor binding sites, altering chromatin structure, or promoting regulatory non-coding RNAs, both in trans and in cis (Broecker et al.).
However, whether RE activity is causal for cancer or simply a bystander effect remains unknown. Retroviruses and REs can mobilize flanking sequences and thereby cause gene duplication events, one of the most impactful mechanisms for creating genes with novel functions (Feschotte and Pritham). Despite the accumulation of often deleterious mutations, TEs frequently contain functional promoters and contribute a large fraction of the human transcriptome (Faulkner et al.).
TEs can influence cellular genome architecture and function through gene duplication or mutagenic integration events (Chuong et al.). After a duplication, one copy can continue to fulfill the necessary function while the other can change substantially. An interesting example of a gene duplication event is the RNase H of retroviruses, in which the RNase H copy linked to the RT has deteriorated into an inactive enzyme that serves only as a linker to the neighboring active RNase H (Ustyantsev et al.).
Every retrovirus infection supplies about 10 novel genes to the cell, including those encoding the RT, RNase H, and integrase. Viral integration events can lead to horizontal gene transfer (HGT) or recombination events. For example, there are about retroviral oncogenes known, some of which, such as Raf and ErbB, are relevant targets for human cancer therapies today (Flint et al.).

In summary, the global abundance of RTs and RNases H can be attributed to retroelements, retrotransposons, retro- and pararetroviruses, and endogenous viruses in eukaryotic genomes.
RTs are more prevalent and also occur without RNases H. In the early RNA world, before proteins and DNA arose, simple self-replicating RNAs with ribozyme activity formed as primary biological entities; they were non-coding (nc) but relied on structural information based on robust hairpin-loop structures. Such ribozymes are capable of cleaving, joining, and evolving, as demonstrated experimentally (Lincoln and Joyce). Replication must have occurred in a prebiotic environment, at hydrothermal vents deep in the dark oceans, with energy supplied by chemical reactions rather than light.
Ribozymes lack coding information but rely on structural information, and today are still important for the biological function of the vast majority of ncRNAs in eukaryotic cells and RNA viruses.
These ancient mobile genetic elements are still present today, not only in eukaryotes but also in prokaryotes (Lambowitz and Zimmerly). They are naked viruses that have remained free of proteins to this day. Since such elements were not considered viruses, they were designated viroids.
Like some viruses, viroids can be pathogenic and inflict significant damage on many plants (Moelling). They contain a core region that acts as an siRNA for gene regulation. Their enzymatic activity has been lost in some viroid species today, presumably owing to the rich environment of the host cells that harbor them.
Gene loss as a consequence of a rich milieu is a known principle; even in the nutrient-rich gut of an obese patient, microbial complexity is reduced (Moelling). Based on in vitro studies, it is easy to imagine that the catalytic activity of ribozymes was improved early in evolution by RNA-binding proteins (RBPs) or small positively charged peptides.
Nucleocapsid proteins can be detected in every RNA virus today, as nucleocapsids or ribonucleoproteins (Flint et al.). They are surprisingly multifunctional proteins serving many purposes. NCs, which are essential components of all RNA viruses today, are rich in basic amino acids such as lysine and arginine; short basic peptides may have formed as their smaller precursors. Such peptides could have formed in the prebiotic environment at hydrothermal vents, before the translation machinery, codons, or DNA arose.
The protein translation machinery must have evolved later, since ribozymes themselves contributed to it by supplying the enzymatically active component at the center of the protein synthesis apparatus. Ribosomes today consist of about one hundred scaffold proteins that maintain ribosomal structure and function, as well as several ribosomal RNAs, which serve as the basis for identifying bacterial species in microbiomes.
Viruses with only tRNA-like structures may have contributed to the evolution of the protein synthesis machinery. Such narnaviruses still exist today in fungal species (Moelling). There is also a rare example of a retroviroid, described in carnation plants. This viroid exploits an RT, presumably provided in trans by a plant pararetrovirus such as cauliflower mosaic virus (Patel et al.).

A frequent evolutionary advance is that RNA leads to proteins that fulfill similar functions with significantly increased efficiency. It is easy to envisage that ribozymes became RNases H, probably through multistage processes.
How the RT evolved is still a matter of speculation. They appear to be frozen intermediates and may be relics from early steps in evolution. They may be more ancient than the separation between prokaryotes and eukaryotes. Thus, the retrons may point to the earliest possible roots of these elements and of these two enzymes (Moelling). RNase H and RT are involved in intron splicing, which proceeds via loops called lariats. Today, the eukaryotic spliceosome includes dozens of proteins. Surprisingly, Prp8 at the core of the spliceosome contains an RT and an RNase H domain, albeit both without enzymatic activity.
LINEs no longer have ribozyme activity and encode a limited number of proteins in addition to the RT. An evolutionary advance of non-LTR retrotransposons, compared to group II introns, is their independence from a foreign RNase H for retrotransposition. Interestingly, the RNase H domain, compared to that of non-LTR retrotransposons, has lost a subdomain, perhaps resulting in weaker catalytic activity (Malik and Eickbush). An important evolutionary event was the duplication of the RNase H domain, with one copy becoming a tether or connection region, an inactive RNase H.
It has been suggested that archaeal RNase H2 is derived from retroviral elements (Ohtani et al.). Compared to LTR retrotransposons, retroviruses additionally gained an Envelope (Env) protein that is required for cell-to-cell transmission. The Env proteins of different retrovirus lineages may have been acquired independently from different viral sources.
The Env-derived cellular protein syncytin contributes to syncytium formation by cell-cell fusion and originates from the endogenous retrovirus ERV-W, which dates back about 35 million years (Dewannieux et al.). Owing to the immunosuppressive properties of ERV-W Env, the derived syncytin helps prevent immune rejection of the embryo by the mother in humans and other species.
It is not widely known that there are not only retroviruses but even retrophages in bacteria. Only a few such intermediate-type viruses are known, such as the BPP-1 retrophage hosted by Bordetella. This temperate phage expresses an RT whose infidelity exerts a mutagenic effect on the phage receptor gene, which can alter phage tropism. At least 36 types of such retrophages exist (Guo et al.).
The infidelity of the RT leads to about one mutation per thousand nucleotides per round of replication. This is a major force for changes in tropism and for evolution in general. Thus, diversity-generating retroelements (DGRs) demonstrate how a phage retroelement with a mutagenic RT contributes to genetic diversity and to the variation of surface proteins, not only of phage particles but also of bacterial cells themselves, such as Legionella pneumophila (Guo et al.).
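The error rate quoted above translates directly into an expected mutation load. The sketch below is a back-of-the-envelope illustration, assuming that errors are independent and spread uniformly along the template (a simplification of real DGR mutagenesis); the region length is hypothetical.

```python
import math

# Error rate quoted in the text: about 1 mutation per 1,000 nt per round.
RT_ERROR_RATE = 1e-3

def expected_mutations(length_nt, rounds=1, rate=RT_ERROR_RATE):
    """Mean number of mutations in a region of `length_nt` nucleotides
    after `rounds` rounds of error-prone reverse transcription."""
    return length_nt * rate * rounds

def prob_mutated(length_nt, rounds=1, rate=RT_ERROR_RATE):
    """Probability of at least one mutation, treating mutations as
    independent rare events (Poisson approximation)."""
    return 1.0 - math.exp(-expected_mutations(length_nt, rounds, rate))

# A hypothetical 100-nt variable region copied once:
print(round(prob_mutated(100), 3))  # 0.095
```

Under this model, about one copy in ten of a 100-nt region carries a change after a single round, and after ten rounds the probability rises to about 0.63.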
Viruses or virus-like elements, as the beginning of an RNA world, have built up into larger and more complex entities. Recently, intermediates between viruses and bacteria, the giant viruses or Megavirales, have attracted attention. They are the biggest viruses known, surpassing many bacteria in size, and some encode genes involved in the protein translation machinery, an indicator of independent life. Are they half-finished bacteria, or have they regressed from bacteria?
Interestingly, giant virus genomes can harbor retrotransposons (Maumus et al.). The virus is related to Cafeteria roenbergensis virus (CroV), a giant marine virus widespread in protists with more than genes and a dsDNA genome of , nucleotides.
Sequence alignment of RNase H-like enzyme sequences compared with known sequences, with the highly conserved amino acids D, E, D, D and the partially conserved H indicated. The origin of the RNases H and a comparison will be the subject of further analysis (Russo et al.). The conserved amino acids DEDD are hallmarks of RNases H and were identified in all of them, indicating enzymatically active proteins.
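The DEDD check described above (whether all four catalytic residues are retained) can be expressed as a small function over aligned sequences. This is a sketch under stated assumptions: the alignment column indices and the toy sequences are hypothetical, and, as in the text, an intact DEDD motif is treated only as an indication of enzymatic activity, not as proof.

```python
# Hypothetical 0-based alignment columns of the four catalytic residues.
# Real positions depend on the alignment and are not given in the text.
DEDD_COLUMNS = (2, 7, 12, 17)

def dedd_status(aligned_seq, columns=DEDD_COLUMNS):
    """Return the residues at the catalytic columns and whether they read DEDD.

    A gap ('-') or a substitution at any of the columns makes the check fail.
    """
    residues = "".join(aligned_seq[c] for c in columns)
    return residues, residues == "DEDD"

# Toy aligned sequences (hypothetical, not real RNase H sequences).
alignment = {
    "candidate_1": "MKDVLTGEASGNDPARLD",
    "candidate_2": "MKDVLTGQASGNDPARLD",  # E -> Q at the second catalytic site
    "candidate_3": "MKDVLTGEASGN-PARLD",  # gap at the third catalytic site
}
for name, seq in alignment.items():
    residues, intact = dedd_status(seq)
    print(name, residues, "intact" if intact else "altered")
```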
A prominent phycodnavirus is an algal virus that infects the coccolithophore Emiliania huxleyi and can terminate its algal blooms. Coccoliths of such algae built up the chalk of the white cliffs of Dover millions of years ago.
However, two conflicting results that E. The results are shown in Figs. This band is produced more effectively in the presence of manganese ions than in the presence of magnesium ions. These results indicate that E. This result is consistent with that previously reported. Thus, E.
It has been reported that hydrolysis of this substrate by E. This substrate is not cleaved by E. It has been reported that the RNase H activity of E. As a result, E. The concentration of the substrate was 1. The enzymes used to cleave these substrates are shown above the gel. The metal cofactors used to cleave these substrates are also shown above the gel together with their concentrations.
C, substrate before enzymatic reaction; M, marker. The differences in the lengths of the arrows reflect relative cleavage intensities at the position indicated.
To examine whether E. This result suggests that E. Inability of E. This result indicates that the presence of two upstream ribonucleotides is sufficient for the cleavage of an Okazaki fragment-like substrate by E. The RNA-DNA junctions of these primary products are cleaved only when the concentration of the enzyme is greatly elevated. This result suggests that the substrate specificity of E. Decreased substrate specificity and cleavage-site selectivity in the presence of manganese ions have also been reported for other RNases H (23, 24, 26, 27). The enzymes, metal cofactors and substrates are shown above the gel.
The open box represents the DNA strand. The finding that E. A crude extract from E. However, it remains to be determined whether the purified protein of E. As shown in Fig. Likewise, as shown in Fig. The RNase H activity of E. As a result, the single ribonucleotide was removed from the substrate.
These results also suggest that E. C, substrate before enzymatic reaction. C, substrate before enzymatic reaction; M, markers. The open and solid arrows represent the first and second cleavage sites, respectively. Tannous, unpublished results. These results suggest that single ribonucleotides embedded in dsDNA can be removed by the cooperative action of E. The arrows represent the cleavage sites. DNA (deoxyribonucleic acid) is the universal hereditary material that encodes the genetic information essential for the existence of all cellular life and some viruses.
It is characterized by its ability to store and transfer information and to self-replicate, which is catalyzed by enzymatic machinery.
To preserve the integrity of the transferred information, DNA should remain unmodified. However, DNA is frequently subject to various modifications that can render it unstable if left unrepaired. Of these modifications, single ribonucleotide monophosphates (rNMPs) misincorporated into the DNA backbone have both negative and positive consequences for the genome. Recent studies indicate that RNase H2 protects the genome by initiating the pathway that removes these intruders and restores the DNA to its original form with the assistance of several other enzymes (11, 12, 18, 19, 20, 38). In this study, we showed for the first time that E.
Two possible RER pathways, in which both enzymes are included, are schematically shown in Fig. Tannous, unpublished result, whereas Halo-RNase H1 is an acidic protein and is purified by anion-exchange column chromatography. Because E.
DNA strands and single ribonucleotides are shown as blue and red boxes, respectively. Single ribonucleotides and flanking deoxyribonucleotides are labeled R and D, respectively.
Two possible pathways, in which a single ribonucleotide embedded in dsDNA is removed by RNases H1 and H2 in a stepwise manner, are shown. The in vivo studies showing that E. Thus, the significance of this work is that it highlights a possible new role for bacterial type 1 RNases H in DNA repair, which should be further investigated in vivo, as should the corresponding role of RNases H from other organisms.
Overproduction and purification of E. The value of E. FAM represents 6-carboxyfluorescein. These duplexes were used as substrates. (C) Denaturing gel electrophoresis showing the limited digestion of double-stranded substrates. The uncleaved duplexes are indicated with an asterisk and the cleaved fragments with a tilde. (D) Correlation of heptamer counts for construct R4b (positions 9-15) between two untreated controls (top left), two human RNase H1-treated samples (top right), and a control and a treated sample (bottom).
To better understand the sequence preferences of the three RNase H enzymes, we used our sequencing data to determine (i) the position-specific changes in nucleotide content after RNase H treatment, (ii) the importance (information content) of different positions of the recognized sequence, and (iii) the best-cleaved motifs (Figure 2A and Supplementary Figure S2a). The sequence preferences of the E. Importantly, for each enzyme the observed preferred sequences are very similar for the three different duplexes tested, demonstrating that they do not depend on the context of the specific constructs and that H-SPA is reproducible.
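For the second quantity listed above, a common way to quantify how informative each position is, and the one used for sequence logos, is the information content in bits: 2 minus the Shannon entropy of the base frequencies at that position. The sketch below assumes this is what "importance" means here, which the text does not spell out; it applies no small-sample correction, and the input sequences are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"

def position_frequencies(seqs):
    """Per-position base frequencies for a set of equal-length sequences."""
    freqs = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[b] for b in BASES)
        freqs.append({b: counts[b] / total for b in BASES})
    return freqs

def information_content(freqs):
    """Information content in bits per position: 2 - Shannon entropy."""
    ic = []
    for f in freqs:
        entropy = -sum(p * math.log2(p) for p in f.values() if p > 0)
        ic.append(2.0 - entropy)
    return ic

# Hypothetical set of best-cleaved sequences.
best_cleaved = ["GGACT", "GGTCA", "GGACG", "GGTCC"]
print(information_content(position_frequencies(best_cleaved)))
```

Fully conserved positions score 2 bits, a two-way split scores 1 bit, and a position where all four bases are equally frequent scores 0.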
Our data demonstrate that all three enzymes exhibit clear sequence preferences. This supports the notion previously observed for E.
The intensity of the red and blue colors indicates k_rel, the hydrolysis rate of molecules with a given nucleotide fixed at a given position relative to the average hydrolysis rate of the randomized pool. Note that only the randomized parts of the probed duplexes are displayed.
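One way to obtain a relative hydrolysis rate like the k_rel described above from end-point measurements is to assume single-exponential (first-order) cleavage. This is a sketch under that assumption, not necessarily how k_rel was derived in the study: if the uncleaved fraction after time t is f = exp(-k t) for both a sequence subset and the whole pool, t cancels and k_rel = ln(f_subset) / ln(f_pool). The input fractions are hypothetical.

```python
import math

def k_rel(frac_left_subset, frac_left_pool):
    """Relative hydrolysis rate of a sequence subset versus the whole pool.

    Assumes first-order cleavage, f = exp(-k * t), with both uncleaved
    fractions measured at the same time t, so t cancels:
    k_subset / k_pool = ln(f_subset) / ln(f_pool).
    """
    for f in (frac_left_subset, frac_left_pool):
        if not 0.0 < f < 1.0:
            raise ValueError("fractions must lie strictly between 0 and 1")
    return math.log(frac_left_subset) / math.log(frac_left_pool)

# Hypothetical: half of the pool is uncleaved, but only a quarter of the
# molecules carrying G at one position.
print(round(k_rel(0.25, 0.5), 3))  # 2.0
```

A value above 1 marks a nucleotide-position combination that is cleaved faster than average; a value below 1 marks one that is avoided.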
With respect to the reference substrate, the k_rel of the preferred substrate is 3. (C) The design of the dumbbell substrate mimics. (D) Cleavage of a reference substrate in the presence of increasing concentrations of a preferred or avoided dumbbell substrate mimic. To validate the observed preferences of human RNase H1, we synthesized three duplexes with the sequences found to be the most preferred, the most avoided, or the closest to the average for cleavage by the RNase H (Supplementary Figure S4).
We therefore tested nicked dumbbell versions of the preferred and avoided substrates, designed so that the cleaved strand cannot dissociate from the duplex (Figure 2C). To further validate our data and explore the importance of the human RNase H1 sequence preferences, we used our comprehensive dataset for duplex R4a to construct a position weight matrix (PWM), which, for each nucleotide at each position, quantifies the relative change in concentration after RNase H cleavage of the subset of molecules carrying that nucleotide at that position (Supplementary Data).
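A PWM of the kind described above can be sketched from read counts before and after treatment. This is a simplified illustration, not the authors' count-based statistical pipeline: it compares the fraction of reads carrying each base at each position in treated versus control libraries, adds a pseudocount (our assumption) to avoid division by zero, and uses hypothetical reads. Negative values mean that molecules with that base at that position were depleted, that is, preferentially cleaved.

```python
import math
from collections import Counter

BASES = "ACGT"

def pwm_log2_change(control_reads, treated_reads, pseudocount=1.0):
    """Position-specific log2 change in relative abundance after treatment."""
    pwm = []
    for i in range(len(control_reads[0])):
        c = Counter(r[i] for r in control_reads)
        t = Counter(r[i] for r in treated_reads)
        c_total = sum(c[b] + pseudocount for b in BASES)
        t_total = sum(t[b] + pseudocount for b in BASES)
        pwm.append({
            b: math.log2(((t[b] + pseudocount) / t_total)
                         / ((c[b] + pseudocount) / c_total))
            for b in BASES
        })
    return pwm

# Hypothetical reads: G at position 0 is depleted after treatment.
control = ["AC", "CC", "GC", "TC"] * 25
treated = ["AC"] * 30 + ["CC"] * 30 + ["GC"] * 10 + ["TC"] * 30
pwm = pwm_log2_change(control, treated)
print({b: round(v, 2) for b, v in pwm[0].items()})
```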
This finding suggests that features such as stacking or bending of the duplex, which are affected by dinucleotide dependencies, are involved in RNase H1 cleavage efficiency. More complex models using 3-, 4- or 5-nt words as input did not further improve prediction performance (Supplementary Figure S5), indicating that long-range dependencies in substrate recognition by the RNase H1 catalytic domain are negligible.
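A dinucleotide model of the kind discussed above scores a sequence as a sum of position-specific weights for adjacent base pairs, so the contribution of a base can depend on its neighbor, which a single-nucleotide model cannot capture. This minimal sketch uses hypothetical weights and assumes the convention that more negative scores mean stronger predicted cleavage (matching negative log2 fold changes).

```python
def dinucleotide_score(seq, weights, default=0.0):
    """Sum of position-specific dinucleotide weights.

    weights[(i, "XY")] is the contribution of dinucleotide XY starting at
    position i; pairs without an entry contribute `default`.
    """
    return sum(weights.get((i, seq[i:i + 2]), default)
               for i in range(len(seq) - 1))

def mononucleotide_score(seq, weights, default=0.0):
    """Single-nucleotide counterpart: weights[(i, "X")] per position."""
    return sum(weights.get((i, base), default) for i, base in enumerate(seq))

# Hypothetical weights (negative = preferred for cleavage).
di_w = {(0, "GG"): -0.8, (1, "GA"): -0.3, (2, "AC"): 0.2}
print(round(dinucleotide_score("GGAC", di_w), 2))  # -0.9
```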
When requiring the previously reported interaction with 11 bp of the duplex, we find that, of the eight possible binding modes, the one observed in the crystal structure places the cleavage site exactly at the position predicted to have the highest cleavage preference (Figure 3C).
Since the structure was solved using a catalytically deficient mutant, this result supports the idea that the RNase H sequence preferences at least partially depend on binding of the duplex substrate.

Figure 3. RNase H sequence preferences correlate with gapmer efficiency. (A) Correlation between the log2 fold changes of different hexamers observed for the R4b construct and the corresponding log2 fold changes predicted by a single-nucleotide model built from the R4a data.
(B) As in (A), but with predictions from a dinucleotide model. (C) Each bar corresponds to the cleavage site of a potential binding mode of RNase H1. The filled bar corresponds to the RNase H1 binding mode observed in the crystal structure, which is also shown in the drawing below the plot.
(D) Correlation between the change in target RNA level for MAPT (30) after treatment with different gapmers and the corresponding downregulation predicted by the dinucleotide model for the different binding modes of RNase H1 on each gapmer-target duplex. The drawing below the plot indicates the RNase H1 binding mode associated with the best observed correlation.
RNase H1 is central to the therapeutic action of gapmers, so we investigated whether our sequence-preference model could predict gapmer potency. For the target site of each gapmer tested in these studies, we calculated the predicted cleavage using the H-SPA 9-nt-wide dinucleotide model and compared this value with the observed reduction in mRNA level.
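The per-target prediction described above can be sketched as a scan of a 9-nt dinucleotide model along the target duplex. In this sketch, each window start stands in for one possible binding mode (an assumption), the weights are hypothetical, more negative scores mean stronger predicted cleavage, and restricting the scan to the cleavable central gap of a gapmer is left to the caller.

```python
WINDOW = 9  # width of the dinucleotide model stated in the text

def window_score(window, weights):
    """Sum position-specific dinucleotide weights over one 9-nt window.
    weights[(i, "XY")] is the weight of dinucleotide XY at offset i;
    missing entries count as 0."""
    return sum(weights.get((i, window[i:i + 2]), 0.0) for i in range(WINDOW - 1))

def scan_target(target, weights):
    """Score every 9-nt window along a target sequence."""
    if len(target) < WINDOW:
        raise ValueError("target is shorter than the model window")
    return [window_score(target[s:s + WINDOW], weights)
            for s in range(len(target) - WINDOW + 1)]

# Hypothetical weights: only a GC step at offset 4 is favoured.
weights = {(4, "GC"): -1.0}
scores = scan_target("AAAAAAAAGCAAAA", weights)
best = min(range(len(scores)), key=scores.__getitem__)
print(scores, best)  # [0.0, 0.0, 0.0, 0.0, -1.0, 0.0] 4
```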
Predictions for positions in the central part, which can be cleaved by RNase H, correlated significantly with the observed gapmer activity, whereas predictions for the flanks, which cannot be cleaved, did not (Figure 3D and E), indicating that the H-SPA dinucleotide model has the potential to improve gapmer design.
The RNase H activity of the HIV-1 reverse transcriptase is essential for the viral life cycle, but so far it has not been possible to make a general prediction of cleavage positions in the viral genome. This agrees with the findings of a previous study that analyzed a limited number of sequences for in vitro cleavage efficiency by HIV-1 RNase H (18), but our dataset allows a much more comprehensive characterization of the sequence preferences (Figure 2A and Supplementary Figure S6).
For the HIV-1 enzyme, we observed efficient cleavage only of the construct with seven RNA positions, which allows HIV-1 RNase H to cleave at several different positions and thereby complicates interpretation of the results. We therefore reasoned that the sequence preferences observed in Figure 2A could be refined by aligning sequence words by their cleavage position. For optimal cleavage of this motif by HIV-1 RNase H, at least four ribonucleotides upstream and at least two ribonucleotides downstream of the cleavage site are required.
In addition, we find that having fewer than three ribonucleotides upstream of the cleavage site completely inhibits cleavage. This allowed us to split all heptamers into seven groups according to cleavage location. For each group, we used the most downregulated quartile to create sequence logos (Figure 4C). The logos from the different groups largely resemble each other when aligned by the assigned cleavage location.
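The grouping and selection steps described above can be sketched as follows. This sketch assumes each heptamer already carries an assigned cleavage site and an observed log2 fold change (how sites are assigned is not shown), and it keeps the lowest quarter of each group, rounding down but keeping at least one heptamer, which is our choice; the records are hypothetical.

```python
from collections import defaultdict

def best_cleaved_quartiles(records):
    """Group heptamers by assigned cleavage site and keep, for each group,
    the most strongly depleted quarter (lowest log2 fold change).

    `records` is an iterable of (heptamer, cleavage_site, log2_fold_change).
    """
    groups = defaultdict(list)
    for heptamer, site, lfc in records:
        groups[site].append((lfc, heptamer))
    selected = {}
    for site, items in groups.items():
        items.sort()  # most negative log2 fold change first
        n_keep = max(1, len(items) // 4)
        selected[site] = [h for _, h in items[:n_keep]]
    return selected

# Hypothetical records: (heptamer, assigned cleavage site, log2 fold change).
records = [
    ("GGACTGA", 3, -2.0), ("ACGTACG", 3, -0.5),
    ("TTTTAAA", 3, 0.1), ("GGGCAAT", 3, -1.0),
    ("CCGGAAT", 5, -0.3), ("GCGCGCG", 5, -1.4),
]
print(best_cleaved_quartiles(records))
```

The selected heptamers of each group would then be passed to a sequence-logo tool.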
The position of the arrow indicates the cleavage site as aligned to the scissors symbol in the box, and the arrow length represents cleavage efficiency. (C) Sequence logos of the best-cleaved quartile of sets of heptamers predicted to have the same cleavage site. The arrows indicate the predicted cleavage site, with length proportional to the observed cleavage efficiency.
(B) Schematic of HIV reverse transcription. White scissors at the black circle indicate specific areas shown zoomed in in subsequent panels. The red rhombi show the observed count of distances between positions predicted to be efficiently cleaved in the HIV-1 genome that fall into the indicated distance intervals.
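The distance counts described in the legend above can be sketched as a histogram of distances between predicted cleavage positions. Whether the original analysis used all pairwise distances or only distances between neighboring sites is not stated; this sketch assumes all pairs, uses half-open bins, and takes hypothetical positions and bin edges.

```python
from itertools import combinations

def distance_histogram(positions, bin_edges):
    """Count pairwise distances between predicted cleavage positions that
    fall into half-open bins [bin_edges[k], bin_edges[k + 1])."""
    counts = [0] * (len(bin_edges) - 1)
    for a, b in combinations(sorted(positions), 2):
        d = b - a
        for k in range(len(counts)):
            if bin_edges[k] <= d < bin_edges[k + 1]:
                counts[k] += 1
                break
    return counts

# Hypothetical predicted cleavage positions (nt) and 10-nt bins.
positions = [10, 28, 50, 68]
edges = [0, 10, 20, 30, 40, 50, 60]
print(distance_histogram(positions, edges))  # [0, 2, 1, 0, 2, 1]
```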
(G) Predicted cleavage efficiency of the best-cleaved site in the terminal 18 nt of the different human tRNAs plus CCA, and of the corresponding reverse complement. tRNA-Lys3 is indicated in red. Conceptually, our strategy resembles the one previously used to detect the sequence preference of RNase P (29), but in our method we introduced a double-stranded substrate with mixed chemistry and increased the length of the randomized gap.
To deal with the increased sequence complexity of our experimental set-up, which exceeded the number of obtained sequencing reads, we also developed a novel data analysis approach focusing on enrichment of k-mers at specific positions of the randomized sequence.
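The idea behind the positional k-mer approach described above is that a randomized region of n positions has 4**n possible sequences, so most full sequences are observed rarely or never, whereas a k-mer at a fixed offset has only 4**k possible values (64 for k = 3) and each is observed many times. The sketch below illustrates this pooling with a simple frequency ratio and pseudocount; it is not the authors' count-based statistical model, and the reads are hypothetical.

```python
import math
from collections import Counter

def positional_kmer_counts(reads, k, start):
    """Count the k-mers found at a fixed offset `start` of each read."""
    return Counter(r[start:start + k] for r in reads if len(r) >= start + k)

def kmer_log2_change(control_reads, treated_reads, k, start, pseudocount=1.0):
    """log2 change in relative frequency of each positioned k-mer after
    treatment; negative values mark k-mers depleted by cleavage."""
    c = positional_kmer_counts(control_reads, k, start)
    t = positional_kmer_counts(treated_reads, k, start)
    kmers = set(c) | set(t)
    c_total = sum(c.values()) + pseudocount * len(kmers)
    t_total = sum(t.values()) + pseudocount * len(kmers)
    return {m: math.log2(((t[m] + pseudocount) / t_total)
                         / ((c[m] + pseudocount) / c_total))
            for m in sorted(kmers)}

# Hypothetical reads: the 3-mer GGC at offset 2 is depleted after treatment.
control = ["AAGGCAA", "AATTCAA"] * 50
treated = ["AAGGCAA"] * 10 + ["AATTCAA"] * 50
change = kmer_log2_change(control, treated, k=3, start=2)
print({m: round(v, 2) for m, v in change.items()})
```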
The computational strategies and methods developed for H-SPA data analysis use robust count-based RNA-seq analysis (28), which explicitly takes experimental variability into account and could be adapted for the analysis of other types of SPAs. We were particularly interested in the preferences of the human enzyme, which is necessary for gapmer-mediated target knockdown (8-10) and therefore influences the pharmaceutical activity of an entire class of human drugs.
Moreover, these preferences correlate significantly with the efficiency of antisense oligonucleotides (ASOs) in cell culture, strongly suggesting that our results can be used to improve ASO design.
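A correlation of this kind can be illustrated by scoring each ASO's target region with a position-specific scoring matrix (PSSM) and computing a Spearman rank correlation with measured knockdown. This is a hypothetical sketch. The PSSM, the rule of taking the best-scoring window, and the toy values are assumptions, not the scoring used in the study.

```python
def pssm_score(window, pssm):
    """Sum of per-position scores; pssm is a list of {base: score} dicts."""
    return sum(col[b] for col, b in zip(pssm, window))


def best_window_score(target, pssm):
    """Highest PSSM score over all windows of the ASO target region
    (assumption: efficiency tracks the best predicted cleavage site)."""
    w = len(pssm)
    return max(pssm_score(target[i:i + w], pssm)
               for i in range(len(target) - w + 1))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical 2-position PSSM and toy ASO data (not from the study);
# target regions are written in DNA letters.
pssm = [{"A": 1, "C": 0, "G": 0, "T": 0}, {"A": 0, "C": 0, "G": 2, "T": 0}]
targets = ["CAGT", "CCCC", "AAGG"]
knockdown = [0.8, 0.1, 0.7]
scores = [best_window_score(t, pssm) for t in targets]
print(scores, spearman(scores, knockdown))
```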
One of the limitations of the H-SPA method is the unnatural structure of the probed substrates. Nevertheless, only one of the modified nucleotides directly interacts with the RNase H1 enzyme (22; Supplementary Figure S8), and, reassuringly, we observe strikingly similar sequence preferences for the human and E. coli enzymes, even though E. coli RNase HI lacks the hybrid-binding domain (HBD) present in human RNase H1.
Thus, although we cannot rule out that HBD binding preferences affect ASO efficiency in vivo, our results indicate that the catalytic domain is central to the target preferences of human RNase H1.
Interestingly, our results show that preferred sequences, besides being processed at a higher relative rate, also show slower dissociation of the enzyme from the cleaved substrate, which in some of our biochemical assays resulted in depletion of the active enzyme. These two mechanisms have opposite effects on knockdown efficiency, and which one dominates will depend on the exact cellular conditions.
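The trade-off between faster processing and slower enzyme release can be made concrete with a minimal mass-action model. The sketch below integrates E + S -> ES -> EP -> E + P by forward Euler steps with enzyme in short supply. It assumes irreversible substrate binding, and the rate constants `k_on`, `k_cat`, and `k_rel` are hypothetical illustrative values, not parameters fitted to the study's data.

```python
def simulate(k_cat, k_rel, e0=1.0, s0=100.0, k_on=1.0, t_end=50.0, dt=0.001):
    """Forward-Euler integration of a minimal mass-action scheme:
        E + S -> ES     (k_on, binding treated as irreversible)
        ES    -> EP     (k_cat, cleavage)
        EP    -> E + P  (k_rel, release of the cleaved substrate)
    Returns the amount of released product P at t_end. All parameters are
    illustrative, not fitted values."""
    e, s, es, ep, p = e0, s0, 0.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        bind = k_on * e * s
        cut = k_cat * es
        release = k_rel * ep
        e += (release - bind) * dt
        s -= bind * dt
        es += (bind - cut) * dt
        ep += (cut - release) * dt
        p += release * dt
    return p


reference = simulate(k_cat=1.0, k_rel=1.0)
faster_cut = simulate(k_cat=5.0, k_rel=1.0)
faster_cut_slow_release = simulate(k_cat=5.0, k_rel=0.1)
print(reference, faster_cut, faster_cut_slow_release)
```

With these illustrative values, faster cutting alone yields more product than the reference, whereas faster cutting combined with tenfold slower release yields less, because the enzyme remains sequestered on the cleaved substrate. With abundant enzyme the balance can shift, mirroring the dependence on cellular conditions noted in the text.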
It has been reported that the concentration of RNase H is a limiting factor for antisense activity.
In contrast to the similar preferences of the human and E. coli enzymes, we observed pronounced differences upstream of the cleavage site between HIV-1 RNase H and the other tested RNases H: HIV-1 RNase H tends to cleave after GC-rich sequences, whereas the two other enzymes prefer G-rich sequences. Future studies are needed to resolve to what degree the polymerase domain dictates the overall sequence preferences.
The difference in preferences between HIV-1 and human RNase H probably reflects the structural differences between these two enzymes. Our R7 substrate is long enough to bind the HIV-1 polymerase and RNase H domains simultaneously and is efficiently cleaved in our experiments, indicating that the constant dsDNA part of the R7 substrate binds the polymerase domain, even though an RNA-DNA heteroduplex has been shown to bind the polymerase domain with higher affinity than dsDNA. Additionally, in this region the enzyme contacts exclusively the non-cleaved DNA strand, which has the same sugar chemistry in our R7 substrate as in the natural substrate (38).
The comprehensive nature of our analysis allowed us to derive a preference motif rich enough in information to predict RNase H cleavage sites globally in the HIV genome.
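Global prediction of cleavage sites from a preference motif amounts to scanning the genome with a scoring matrix. The sketch below assumes a simple log2-odds PSSM built from aligned example sites against a uniform background. The pseudocount, the threshold, and the function names are hypothetical, and the study's own cleavage model may be constructed differently.

```python
import math

BASES = "ACGT"  # assumption: genome written in DNA letters (U as T)


def log_odds_pssm(sites, pseudocount=0.5, background=None):
    """Build a log2-odds PSSM from aligned, equal-length example sites."""
    background = background or {b: 1 / len(BASES) for b in BASES}
    n = len(sites)
    pssm = []
    for i in range(len(sites[0])):
        col = {}
        for b in BASES:
            count = sum(1 for s in sites if s[i] == b)
            freq = (count + pseudocount) / (n + len(BASES) * pseudocount)
            col[b] = math.log2(freq / background[b])
        pssm.append(col)
    return pssm


def scan(genome, pssm, threshold):
    """Return (start_position, score) for every window scoring >= threshold.
    Mapping a window start to the exact cleavage site depends on how the
    motif was aligned, which is left to the caller."""
    w = len(pssm)
    hits = []
    for i in range(len(genome) - w + 1):
        window = genome[i:i + w]
        if any(b not in BASES for b in window):
            continue  # skip ambiguous bases such as N
        score = sum(col[b] for col, b in zip(pssm, window))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Toy motif and sequence, not the study's motif or the HIV-1 genome.
pssm = log_odds_pssm(["GGAC", "GGAC"])
print(scan("TTGGACTT", pssm, threshold=2.0))
```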
Despite millions of years of extremely rapid evolution (56), the same host molecule, tRNA-Lys3, is used by almost all lentiviruses to prime reverse transcription.
Cleavage of the R7 substrate resembles the internal mode of cleavage, and we assume that the observed preferences are also relevant for end-directed cleavage by HIV-1 RNase H. In support of this, we find that HIV-1 genomic positions predicted by our cleavage model to be efficiently cleaved are preferentially separated by 13-19 nt (Figure 5C).
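The spacing analysis can be sketched as counting distances between all pairs of predicted sites and comparing the fraction falling in the 13-19 nt window with that for randomly placed sites. The uniform-placement null model, the 40-nt cutoff, and the function names are assumptions; this section does not state how the enrichment was assessed.

```python
import random
from collections import Counter


def pairwise_distance_counts(positions, max_distance=40):
    """Count distances (nt) between all pairs of predicted sites up to max_distance."""
    pos = sorted(set(positions))
    counts = Counter()
    for i, a in enumerate(pos):
        for b in pos[i + 1:]:
            d = b - a
            if d > max_distance:
                break  # positions are sorted, so every later site is farther away
            counts[d] += 1
    return counts


def fraction_in_window(counts, lo=13, hi=19):
    """Fraction of counted distances falling within [lo, hi] nt."""
    total = sum(counts.values())
    inside = sum(c for d, c in counts.items() if lo <= d <= hi)
    return inside / total if total else 0.0


def random_fraction(n_sites, genome_length, lo=13, hi=19, max_distance=40,
                    n_iter=1000, seed=0):
    """Mean in-window fraction for sites placed uniformly at random
    (a hypothetical null model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_iter):
        sites = rng.sample(range(genome_length), n_sites)
        total += fraction_in_window(
            pairwise_distance_counts(sites, max_distance), lo, hi)
    return total / n_iter


predicted = [0, 15, 30, 100]  # toy positions, not HIV-1 predictions
observed = fraction_in_window(pairwise_distance_counts(predicted))
print(observed, random_fraction(len(predicted), 1000, n_iter=100))
```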
Interestingly, a similar analysis of the HIV-2 genome did not show significant enrichment (Supplementary Figure S7b), which may reflect a less important role of the viral RNase H in HIV-2 reverse transcription, as suggested by its much lower RNase H activity. Finally, we note that the two sites in the HIV-1 genome predicted to be most efficiently cleaved are located within the Rev response element, suggesting functional relevance.
In conclusion, this study uncovers the sequence preferences of important RNase H enzymes, thereby providing valuable information for the future design of antisense oligonucleotides and contributing to a better understanding of HIV-1 biology.
The presented method, both experimental and computational, can be readily applied to other enzymes acting on duplexed oligonucleotides, such as human RNase H2 or restriction endonucleases.