"Title" "Authors" "Corresponding Author" "Publication Date" "Abstract" "Keywords" "Full Text"
"Remobilization of Sleeping Beauty transposons in the germline of Xenopus tropicalis" "Donald A Yergeau, Clair M Kelley, Emin Kuliyev, Haiqing Zhu, Michelle R Johnson Hamlet, Amy K Sater, Dan E Wells, Paul E Mead" "Paul E Mead" "24 November 2011" "The Sleeping Beauty (SB) transposon system has been used for germline transgenesis of the diploid frog, Xenopus tropicalis. Injecting one-cell embryos with plasmid DNA harboring an SB transposon substrate together with mRNA encoding the SB transposase enzyme resulted in non-canonical integration of small-order concatemers of the transposon. Here, we demonstrate that SB transposons stably integrated into the frog genome are effective substrates for remobilization., Transgenic frogs that express the SB 10 transposase were bred with SB transposon-harboring animals to yield double-transgenic 'hopper' frogs. Remobilization events were observed in the progeny of the hopper frogs and were verified by Southern blot analysis and cloning of the novel integrations sites. Unlike the co-injection method used to generate founder lines, transgenic remobilization resulted in canonical transposition of the SB transposons. The remobilized SB transposons frequently integrated near the site of the donor locus; approximately 80% re-integrated with 3 Mb of the donor locus, a phenomenon known as 'local hopping'., In this study, we demonstrate that SB transposons integrated into the X. tropicalis genome are effective substrates for excision and re-integration, and that the remobilized transposons are transmitted through the germline. This is an important step in the development of large-scale transposon-mediated gene- and enhancer-trap strategies in this highly tractable developmental model system." 
"Donor Locus, Sleep Beauty, Tropicalis Genome, Outcross Progeny, Sleep Beauty Transposase" " Remobilization of Sleeping Beauty transposons in the germline of Xenopus tropicalis: Donald A Yergeau1, Clair M Kelley1, Emin Kuliyev1, Haiqing Zhu1, Michelle R Johnson Hamlet1, Amy K Sater2, Dan E Wells2 & Paul E Mead1 : Mobile DNA volume 2, Article number: 15 (2011) Cite this article : 4349 Accesses: 6 Citations: 0 Altmetric: Metrics details: Abstract: Background: The Sleeping Beauty (SB) transposon system has been used for germline transgenesis of the diploid frog, Xenopus tropicalis. Injecting one-cell embryos with plasmid DNA harboring an SB transposon substrate together with mRNA encoding the SB transposase enzyme resulted in non-canonical integration of small-order concatemers of the transposon. Here, we demonstrate that SB transposons stably integrated into the frog genome are effective substrates for remobilization.: Results: Transgenic frogs that express the SB 10 transposase were bred with SB transposon-harboring animals to yield double-transgenic 'hopper' frogs. Remobilization events were observed in the progeny of the hopper frogs and were verified by Southern blot analysis and cloning of the novel integrations sites. Unlike the co-injection method used to generate founder lines, transgenic remobilization resulted in canonical transposition of the SB transposons. The remobilized SB transposons frequently integrated near the site of the donor locus; approximately 80% re-integrated with 3 Mb of the donor locus, a phenomenon known as 'local hopping'.: Conclusions: In this study, we demonstrate that SB transposons integrated into the X. tropicalis genome are effective substrates for excision and re-integration, and that the remobilized transposons are transmitted through the germline. 
This is an important step in the development of large-scale transposon-mediated gene- and enhancer-trap strategies in this highly tractable developmental model system.: Background: Amphibian model systems have provided a wealth of information on the molecular mechanisms controlling early vertebrate development. Frogs of the Xenopus genus are particularly well suited for embryological study as these animals adapt well to captivity and the females can be induced to lay large numbers of eggs throughout the year. The most commonly used amphibian model is the South African clawed frog, X. laevis. Genetic manipulation of this species is not practical due to the long generation time (> 1 year) and the pseudo-tetraploid nature of the genome. Another species of the Xenopus genus, X. tropicalis, shares the embryological advantages of its South African cousin and is better suited for genetic studies as it is a true diploid and has a relatively short generation time (approximately 6 months). The potential of applying modern genetics to this classical embryological model system has resulted in the rapid development of genomic tools for X. tropicalis in recent years (reviewed in [1, 2]), and the publication of the genome sequence [3].: Our studies have focused on using the class II DNA 'cut-and-paste' transposable elements to modify the frog genome for gene- and enhancer-trapping and for insertional mutagenesis [4–9]. Transposable elements have been used for many years to experimentally modify the genomes of plants and invertebrates and, more recently, have been applied to vertebrate model systems [10, 11]. Transgenesis with non-autonomous transposable elements offers advantages over other transgenic methodologies. First, transposable elements efficiently integrate into the target genomes. Second, as the transposon is excised from the donor plasmid prior to integration, plasmid sequences, which may cause epigenetic silencing [12, 13], are not integrated at the targeted locus. 
Third, once integrated into the genome, the transposon transgene is an effective substrate for excision and re-integration (remobilization) following re-expression of the cognate transposase enzyme. The ability to remobilize transposons resident in the genome can be used for a variety of applications, including large-scale transposon 'hopping' screens using gene- or enhancer-trap constructs.: Remobilization of a non-autonomous transposon transgene is achieved by expressing the transposase enzyme in the same cell that harbors the transposon. This can be done simply by injecting fertilized one-cell embryos from the outcross of transposon transgenic animals with mRNA encoding the transposase. As development proceeds, the injected mRNA is translated by the host cell, and the resulting transposase catalyzes the excision and re-integration reactions. This approach has been used successfully with the Tol2 transposon system in fish and frogs [7, 14–16]. Another approach is to develop transgenic animals that express the transposase enzyme under the control of tissue-specific promoters and to cross these animals with those that harbor a transposon substrate to generate double-transgenic progeny. This approach has been used very successfully for somatic remobilization of the Sleeping Beauty (SB) transposon to identify cancer genes in mice [17, 18]. Outcrossing double-transgenic animals that carry both the transposase enzyme and the transposon substrate can result in novel remobilization events in the progeny [19–23].: We, and others, have used a co-injection strategy with the SB [24] transposon system to generate transgenic Xenopus that express fluorescent proteins under the control of ubiquitous or tissue-specific promoters [4, 6, 25]. The integration events generated by this method in the frog are not caused by the simple transposition of the transposon from the plasmid into the frog genomic DNA.
Analysis of the integration sites indicated that several copies of the transposon, and parts of the flanking plasmid sequence, are introduced at discrete loci as small-order concatemers. This unexpected non-canonical integration mechanism makes cloning the integration site complicated and time consuming [6]. Although the co-injection strategy produced non-canonical integration events, we next investigated whether SB transposons stably integrated into the X. tropicalis genome are nevertheless effective substrates for remobilization. Using a double-transgenic strategy, we show that SB transposons in the frog genome can be remobilized following re-expression of the SB transposase and that the remobilized integration events occur via canonical transposition.: Results: Generation and analysis of transgenic X. tropicalis expressing SB10 transposase: A transgenic X. tropicalis line was engineered to express the SB10 transposase under the control of a synthetic regulatory element, the chicken β-actin promoter coupled with a cytomegalovirus enhancer (CAGGS [26]) [27]. To track the inheritance of the SB10 transgene, an X. laevis γ1-crystallin-red fluorescent protein (RFP) [28] reporter was cloned downstream of the CAGGS-SB10 transgene in a head-to-head orientation (Figure 1a). The presence of the linked γ1-crystallin-RFP reporter allows screening for the CAGGS-SB10 transgene based on the presence of red eyes (Figure 1b). We used the simple linear plasmid DNA injection method described by Etkin and Pearman to generate the transgenic SB transposase-expressing frogs [29]. Injected embryos were scored for RFP expression in the lens, and RFP-positive tadpoles (27 RFP-positive from 570 injected, 4.7%) were raised to adulthood. A single founder (CAGGS-SB10;γcRFP 2M) was identified from a total of five animals outcrossed to date.
Outcross of male founder CAGGS-SB10;γcRFP 2M with a wild-type female resulted in 779 RFP-positive tadpoles from a total of 3,333 offspring (23.4%). The non-Mendelian inheritance of the transgene indicates that the germline of the CAGGS-SB10;γcRFP 2M founder was mosaic for the transgene. Subsequent outcross of F1 animals derived from CAGGS-SB10;γcRFP 2M resulted in the expected 50% of the progeny expressing the dominant lens-specific RFP reporter (in a representative F1 outcross there were 239 RFP-positive tadpoles from a total of 479, 49.9%). Southern blot analysis of RFP-positive tadpoles indicated that several copies of the transgene were integrated at a single locus in the founder (Figure 1c). Reverse transcriptase (RT)-PCR and Western blot analyses were used to verify that SB10 transposase was expressed in the transgenic line. RT-PCR analysis showed that RFP-positive tadpoles at stage 40 [30] express mRNA encoding the SB transposase enzyme (Figure 1d). As expected, sibling tadpoles that did not express the RFP reporter in the lens were also negative for SB10 mRNA expression. In adults, robust expression of SB transposase was detected in protein lysates prepared from testes harvested from RFP-positive male frogs, but not from RFP-negative animals (Figure 1e). SB10 is also expressed in the liver of the transgenic frogs, but not in the RFP-negative littermates.: Generation of a transgenic Xenopus tropicalis that expresses SB10 transposase. (a) Schematic of the pCAGGS-SB10;γcRFP construct used to develop SB (SB10) transposase-expressing transgenic frogs. The two transgenes were cloned in a tail-to-tail orientation. Not to scale. (b) Red lens in the right eye of an adult F1 transgenic frog from outcross of founder CAGGS-SB10;γcRFP 2M. The border of the eye is indicated by the dashed white line.
(c) Southern blot analysis of genomic DNA harvested from RFP-positive and control animals indicated integration of multiple copies of the CAGGS-SB10;γcRFP linear transgene. The DNA was digested with BamHI and the blot was probed with a radiolabelled SB10 cDNA probe (see schematic (a)). (d) RT-PCR analysis of SB10 expression in tadpoles. SB RNA was detected in RFP-positive tadpoles (+RFP) but not in RFP-negative (-RFP) progeny from CAGGS-SB10;γcRFP 2M. RNA from a wild-type tadpole was used as a negative control (St. 15). A mock reverse transcription reaction without added RT, using RNA harvested from an RFP-positive tadpole (+RFP(-RT)), was also used as a negative control. Primers for X. tropicalis α-actin were used as a control for RNA recovery. (e) Western blot analysis of SB transposase expression in tissues harvested from adult transgenic frogs. A monoclonal antibody to SB was used to demonstrate abundant transposase expression in the testis and liver of RFP-positive adults, but not in their RFP-negative siblings. Protein lysates from tadpoles injected with SB10 mRNA at the one-cell stage were prepared at stage 15 (control lane). The blots were stripped and re-probed with a monoclonal antibody that recognizes Xenopus α-actin. PCR: polymerase chain reaction; RFP: red fluorescent protein; RT: reverse transcriptase; SB: Sleeping Beauty.: Generation of double-transgenic 'hopper' frogs: The CAGGS-SB10;γcRFP 2M line was outcrossed with SB transposon transgenic animals that express GFP under the control of the CAGGS promoter (pT2 βGFP [6]). Double-transgenic F2 'hopper' frogs (ubiquitous GFP and lens-specific RFP) were outcrossed with wild-type frogs and the progeny (F3) were either analyzed for remobilization events or raised and outcrossed (Figure 2). Five independent substrate donor lines were used to generate double-transgenic hopper lines for this study.
As the methodology for generating and analyzing the hopper lines is the same for each donor locus, two donor lines (pT2 βGFP 8F and 7M) are described in detail below.: Breeding strategy to generate double-transgenic hopper frogs. The F2 hopper frogs were outcrossed with wild-type animals and the progeny were scored for GFP and RFP expression. The GFP-positive/RFP-negative F3 progeny were either raised to adulthood for outcross or genomic DNA was harvested after stage 45 for molecular analyses. GFP: green fluorescent protein; RFP: red fluorescent protein.: 8F hoppers: The pT2 βGFP 8F founder harbors two independently segregating alleles: a concatemer of three SB transposons integrated at a single locus on scaffold 57, at base number 2456981 (57:2456981) of the JGI X. tropicalis genomic sequence v4.1 assembly, and another allele with a single-copy transposon integration [6]. Thus, the F2 hopper frogs inherited either one, or both, of the 8F integration events. Southern blot analysis of progeny from 8Fhopper♂35 indicated that this double-transgenic hopper had inherited only the trimeric concatemer of pT2 βGFP on scaffold 57. Double-transgenic (RFP+/GFP+) progeny (F3) from the outcross of 8Fhopper♂35 were raised and outcrossed, and the resulting progeny (F4) were analyzed for modification of the parental pT2 βGFP locus (Figure 2).: Observation of GFP expression in the hopper outcross populations indicated that, in most cases, the GFP expression of the progeny was identical to that of the 8F founder, suggesting that the parental SB transposon locus was intact. In a small number of the outcross progeny, we observed markedly different GFP expression in either small populations of cells within the tadpole (Figure 3a, b) or in whole tadpoles (Figure 4a, b; 40 from 20,015 GFP-positive tadpoles). We reasoned that the change in GFP expression might result from modification of the parental pT2 βGFP locus in the remobilized progeny.
Embryos with small subsets of cells with increased GFP intensity likely represent stochastic transposase activity in somatic tissues (somatic remobilization; Figure 3a, b). An organism-wide change in GFP intensity (Figure 4a) likely represents modification of the parental transposon donor locus during gametogenesis that is passed on to the resulting progeny. Remobilization of a transposon from the donor locus to a novel site will likely alter the local epigenetic environment of the transgene, and will also subject the re-integrated transposon to the influence of nearby gene regulatory sequences that differ from those at the parental locus.: Somatic remobilization of pT2 βGFP in double transgenic tadpoles. Outcross of double transgenic hopper frogs resulted in progeny that inherited both transgenes. In rare instances, we identified double transgenic tadpoles that express intense levels of the GFP transgene reporter in individual cells or in small groups of cells. The change in GFP expression seen in these somatic cells is likely due to sporadic remobilization of the pT2 βGFP transposon, and the change in GFP intensity is likely due to the influence of the local chromatin environment at the novel integration site. The region of each tadpole shown (dashed box) is indicated on the cartoon inset. (a) Tail of a double transgenic tadpole with a single muscle cell expressing intense GFP (arrow). (b) Double transgenic tadpole with high-level GFP expression in a subset of cells in the branchial cartilage (arrow). The immobilized tadpole was also photographed using a dsRED filter and the two images were overlaid to demonstrate that this animal had inherited the CAGGS-SB10;γcRFP transgene (RFP expression in the lens is indicated by the white arrowhead). GFP: green fluorescent protein; RFP: red fluorescent protein.: Excision and re-integration of SB transposons in the progeny of double-transgenic hopper frogs.
(a) GFP expression in sibling tadpoles derived from the outcross of an 8F hopper frog. Tadpoles 1 and 2 are significantly brighter than their GFP-positive siblings (tadpoles 3, 4 and 5). Tadpole 6 is a GFP-negative tadpole. Dorsal view, with anterior facing towards the right. (b) Representative data for the outcross population of 8F hopper frogs. Table includes data from breeding four F2 (8F♂54, 8F♂55, 8F♂56 and 8F♀61) and seven F3 (8F35♂A, B, C etc.) double-transgenic hoppers with wild-type frogs. The outcross progeny were scored for GFP expression, and the GFP-bright progeny were either harvested for integration site analysis or raised to adulthood and outcrossed. Apparent remobilization activity in individual 8F hopper frogs ranged from 0% to 0.7%, with an average rate of two remobilization events per thousand GFP-positive progeny (0.2%). (c) Southern blot analysis of genomic DNA harvested from the progeny of double transgenic 8F hopper frogs. Genomic DNA was digested with BglII and the blot was probed with a radiolabelled GFP cDNA probe. DNA from the tadpoles in lanes 3, 4 and 5 shows the same banding pattern as the parental pT2 βGFP 8F founder line. Lanes 1 and 2 show examples of remobilization of an SB transposon. The dashed arrow indicates the change in the mobility of the transposon-harboring BglII fragment. Lane 6 contains DNA from GFP-negative siblings. GFP: green fluorescent protein; SB: Sleeping Beauty.: Genomic DNA harvested from GFP-positive progeny of double transgenic (pT2 βGFP 8F:CAGGS-SB10;γcRFP) 8F hopper frogs was analyzed by Southern blot. Digestion of genomic DNA from pT2 βGFP 8F tadpoles with BglII resulted in three bands when the blot was hybridized with a GFP probe (Figure 4c). Changes in the Southern blot hybridization pattern were used to determine whether the parental concatemer had been altered by expression of the SB transposase.
Analysis of progeny from the outcross of F3 double transgenic hopper frogs with wild-type animals indicated that most of the progeny had inherited the unaltered pT2 βGFP 8F parental concatemer (Figure 4c; lanes 3, 4 and 5). Examples of germline remobilization of the pT2 βGFP transposon from 'GFP-bright' tadpoles (Figure 4a; tadpoles 1 and 2) harvested from the outcross of 8Fhopper♂58 are shown in Figure 4c (lanes 1 and 2, dashed arrows). These data indicate that, as predicted, the GFP-bright individuals in the outcross population of the hopper frogs are tadpoles in which the parental transposon donor locus has been modified. Thus, remobilized animals can be identified in the outcross population simply by observing the tadpoles for changes in GFP intensity. Outcross of eleven 8F hopper double transgenic frogs indicated that the frequency of remobilized progeny varied from 0.07% to 0.71% (Figure 4b). The variation in remobilization activity between individual hopper frogs likely reflects subtle differences in epigenetic modification of the substrate and enzyme transgenes in each animal that may alter the activity of the excision and re-integration reactions.: Analysis of the cloned flanking sequences of the parental locus (57:2456981) from the remobilized tadpoles (Figure 4c; lanes 1 and 2) showed no sequence change, indicating that the remobilized transposon was excised from within the donor concatemer (data not shown). Extension primer tag selection linker-mediated PCR (EPTS LM-PCR) and standard genomic PCR [6, 8] were used to clone the integration sites of the novel bands. The re-integration event from tadpole 8Fhopper♂58-1 had occurred on the same scaffold as the parental integration site, and thus represented a 'local hop' (Table 1 and Figure 5; tadpole 8Fhopper♂58-1). Genomic PCR and sequencing were used to verify both the 5' and 3' ends of the novel insertion site.
The integration site of the remobilized pT2 βGFP transposon is at 57:2491386, 34,405 bp away from the parental locus on chromosome 6 (Figure 5b). Sequence analysis of the integration site indicated that the remobilization event was catalyzed by a canonical transposition event; that is, the transposon inserted precisely at the predicted boundary of the inverted repeat/direct repeats (IR/DRs). Furthermore, the integrated transposon is flanked by the expected TA dinucleotide target site duplication generated by SB transposase [31–33]. Thus, unlike the co-injection method used to generate the pT2 βGFP founder lines, which results in unexpected concatemer formation (Figure 5a, [6]), the remobilization events catalyzed by re-expression of SB transposase occur via canonical transposition (Figure 5b). To date, we have identified 40 remobilization events, based on differences in GFP expression intensity, from 20,015 GFP-positive tadpoles from the outcross of 8F hopper frogs (Figure 4b). Southern analysis has confirmed excision and re-integration of an SB transposon from the parental locus, yielding an apparent remobilization frequency of approximately 0.2%.: Integration site analysis of remobilized SB transposons. (a) Schematic representation of the 8F donor locus showing the predicted orientation of the trimeric concatemer in scaffold 57. This injection-mediated integration event occurred by a non-canonical mechanism. (Not to scale.) (b) Schematic representation of the novel integration event in the remobilized tadpole shown in Figure 4b. EPTS LM-PCR was used to clone the sequence flanking the 5' end of the pT2 βGFP transposon, and the 5' and 3' flanking sequences were verified using PCR primers designed to the scaffold sequence. The novel integration event occurred on the same scaffold (57 at position 2544323 bp) of the Joint Genome Institute X. tropicalis genome sequence assembly v4.1 as the 8F transposon donor locus (57:2456981) and represented a local hop.
The sequence of the SB transposon and genomic DNA junctions (arrows) indicated that the remobilization event occurred via a canonical transposition reaction. The pT2 βGFP transposon is flanked by the expected TA dinucleotide target site duplication (TSD; bold underlined), and the transposon is inserted precisely, without any flanking plasmid sequence from the donor site. The genomic DNA sequence of scaffold 57 is capitalized and the transposon sequence is in lowercase italics. (Not to scale.) (c) The preferred sequence for SB transposon re-integration in the X. tropicalis genome. WebLogo analysis (http://weblogo.berkeley.edu) of the five base pairs flanking the TA target site. Letter heights reflect information content in bits (y-axis), with a maximum of two bits. The table shows the base distribution of the pT2 βGFP transposon re-integration target sites. EPTS LM-PCR: extension primer tag selection linker-mediated polymerase chain reaction; PCR: polymerase chain reaction; SB: Sleeping Beauty.: Pre-sorting tadpoles based on GFP intensity may underestimate the total remobilization activity if a re-integration event results in GFP expression that is not markedly different from the parental expression. To test this, we outcrossed a double transgenic hopper frog (8Fhopper♂51) and analyzed all of the GFP-positive progeny by Southern blot. The 8Fhopper♂51 frog inherited both of the pT2 βGFP transposon alleles from the 8F founder. The progeny from this 8Fhopper♂51 outcross displayed GFP expression patterns and intensities that were indistinguishable from those of the parental alleles (data not shown). From the 677 GFP-positive progeny analyzed by Southern blot, we identified four excision-only events and two remobilizations. Samples of genomic DNA with changes evident by Southern blot analysis were used in EPTS LM-PCR to clone the integration sites of the remobilization events.
In this experiment, the remobilization frequency was 0.3% (two remobilization events out of 677 GFP-positive tadpoles). These data indicate that the actual remobilization frequency may be somewhat higher than that estimated by simple visual inspection of the GFP-positive progeny. The observed rate of excision-only events in this outcross population was 4 out of 677, that is, 0.6%.: Scoring the outcross progeny of hopper frogs for changes in GFP intensity may also overestimate the remobilization frequency, as this method may not distinguish between remobilization events and excision-only events. We analyzed 25 GFP-bright tadpoles from the outcross of 8F and 7M (see below) hopper frogs by Southern blot analysis and by cloning the novel insertion sites by EPTS LM-PCR. Only one GFP-bright tadpole had an excision-only modification of the parental transposon donor locus (4%); the other 24 GFP-bright tadpoles (96%) had re-integration events that were evident from novel bands on the Southern blot and from cloning of the sequences flanking the canonical re-transposition events. Thus, while it is possible to identify excision-only events by changes in GFP expression, the vast majority of GFP-bright progeny represent re-integration events.: Remobilization of transposons resident in the genome may result in chromosomal rearrangements near the donor locus [34–41]. In mice, germline remobilization of SB transposons from a high-copy-number (approximately 30 copies) concatemer resulted in frequent alteration of the genomic sequences flanking the transposon donor locus; nine out of nine remobilized pedigrees examined displayed genomic alterations spanning 10^5 bp to 10^7 bp flanking the donor site [41]. To determine whether SB remobilization in the frog resulted in similar genomic alterations near the donor locus, we examined the sequences flanking the 8F donor locus by PCR.
Genomic DNA samples from eight remobilization events and eight excision-only events were used to amplify the sequences flanking the 5' and 3' ends of the 8F concatemer on chromosome 6. In each case, genomic PCR using primers that amplify the 5' and 3' junctions of the 8F locus generated appropriately sized products (data not shown), indicating that the sequences directly flanking the donor locus are intact following excision of pT2 βGFP transposons from the donor site.: Sequence analysis of the re-integration target sites indicated a base distribution flanking the canonical TA dinucleotide similar to that observed for SB integration in mammalian genomes [42, 43] (Figure 5d). Transposons of the Tc1/mariner family, including SB, integrate at TA dinucleotides. The consensus sequence for SB integration in frogs, as in mammals, is the palindrome ATATATAT, in which the central TA (ATA-TA-TAT) is the canonical target, although none of the re-integration events observed in the frog has this exact palindrome.: Cloning the integration sites of the novel loci indicated that the remobilized transposons frequently integrate near the parental locus (Figures 5 and 6; 12 out of 15 classed as local hopping, 80%). In two cases, we identified remobilization events that had re-integrated within the parental transposon concatemer on scaffold 57 (Table 1 and Figure 6a). The scaffold identity was used to 'map' the chromosomal location [44] of the novel integration events and showed that, while local hopping was more frequent, re-integration on other chromosomes was also detected (Figure 6b; three of fifteen integrations (20%) were on different chromosomes).: Schematic representation of remobilized SB transposons in the X. tropicalis genome. (a) Local hopping on X. tropicalis chromosome 6 depicts the integration sites for eight local (< 200 kb) remobilization events (hops). (Not to scale.) This region of X. tropicalis chromosome 6 is syntenic with human chromosome 7.
The VISTA (http://genome.lbl.gov/vista) alignment shows regions of homology between the frog and human genomic sequences; pink represents non-coding regions and blue represents conserved exons. The position of the 8F donor concatemer is indicated by the grey box. The remobilized transposition events are depicted by the grey triangles. The position and orientation of predicted genes near the 8F locus are depicted in the lower section of the panel. (b) Schematic representation of the X. tropicalis chromosomes indicating the distribution of remobilized SB transposons. The parental 8F donor site is on chromosome 6 (thick line). The predicted loci of the remobilized transposons are depicted by the thin black lines. Approximately 80% of the remobilization events occur near the donor locus (local hopping), and the remaining 20% are distributed randomly throughout the genome. SB: Sleeping Beauty.: GFP-bright progeny from the outcross of 8F hopper frogs were raised to the adult stage and outcrossed to demonstrate that the remobilized transposon alleles are stably transmitted through the germline. Genomic DNA was harvested from GFP-positive and GFP-negative siblings and used for Southern blot analysis and for cloning the novel integration site by EPTS LM-PCR. For example, remobilized female frog 62E3 produced GFP-bright progeny, and integration site analysis showed a single copy of the pT2 βGFP transposon on scaffold 140 (140:1237072). The novel re-integration event was on the same chromosome as the donor locus (chromosome 6, linkage group 2), approximately 1 cM from the parental 8F concatemer, and represents a local hop (data not shown).
The 62E3 integration event was in the 3' UTR of a muscle-related coiled-coil protein gene (GenBank accession number XM_002935280.1).: 7M hoppers: The pT2 βGFP 7M founder had a concatemer of 8 to 10 pT2 βGFP transposons at a single locus within a repeat on scaffold 38 (linkage group 10, chromosome 10), which mapped, by fluorescence in situ hybridization (FISH) analysis, near a telomere on chromosome 10 (Figure 7). Double transgenic 7M hopper frogs were generated by breeding heterozygous pT2 βGFP 7M F1 frogs with heterozygous CAGGS-SB10;γcRFP 2M F1 frogs, and the progeny were sorted for GFP and RFP expression. The double-heterozygous 7M hopper frogs were outcrossed with wild-type animals, and remobilization events were scored in the progeny by observing the outcross population for changes in GFP expression. To date, ten 7M hoppers have been outcrossed and 112 remobilized (GFP-bright) tadpoles have been identified from 11,646 GFP-positive progeny (Figure 7a). Genomic DNA from several GFP-bright tadpoles was analyzed by Southern blot, and these data verified that the banding pattern had changed from the parental 7M pattern, indicative of transposon remobilization. The novel integration sites were cloned (Table 2), and sequence analysis confirmed that the remobilized transposons had re-integrated via canonical SB-mediated transposition (data not shown). The average apparent rate of remobilization was approximately 1%, five times higher than that observed for the 8F hopper animals. The higher rate of remobilization observed in the 7M hoppers compared with the 8F hoppers may be due to the larger number of potential substrate transposons in the donor concatemer (three for 8F compared with 8 to 10 for 7M). A range of remobilization activities, from 0% to 5%, was noted between the different 7M hopper frogs.
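The pooled frequencies reported for the two donor lines can be checked with simple arithmetic. The Python sketch below is an illustration added for clarity, not part of the original analysis; the helper name remobilization_rate is hypothetical, and it assumes the apparent remobilization frequency is GFP-bright tadpoles divided by all GFP-positive outcross progeny, pooled across hopper frogs. A pooled ratio may differ slightly from an average of per-frog rates.

```python
# Illustrative sketch, not the authors' analysis. Assumption: the apparent
# remobilization frequency is GFP-bright tadpoles / all GFP-positive progeny,
# pooled across hopper frogs. remobilization_rate is a hypothetical helper.
def remobilization_rate(gfp_bright: int, gfp_positive: int) -> float:
    if gfp_positive <= 0:
        raise ValueError('gfp_positive must be a positive count')
    return gfp_bright / gfp_positive


# Pooled counts quoted in the text (Figures 4b and 7a).
rate_8f = remobilization_rate(40, 20_015)   # 8F hoppers
rate_7m = remobilization_rate(112, 11_646)  # 7M hoppers

print(f'8F: {rate_8f:.2%}')                # 8F: 0.20%
print(f'7M: {rate_7m:.2%}')                # 7M: 0.96%
print(f'7M/8F: {rate_7m / rate_8f:.1f}x')  # 7M/8F: 4.8x
```

The pooled 7M rate (about 0.96%) is roughly 4.8 times the pooled 8F rate (about 0.20%), in line with the approximately 1% and five-fold figures given in the text.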
The 7M hoppers were produced by breeding frogs that were heterozygous for the SB 10 enzyme transgene with frogs that were heterozygous for the pT2 βGFP 7M allele. Double-heterozygous males (7Mhopper♂1, 7Mhopper♂2, 7Mhopper♂3, 7Mhopper♂5, 7Mhopper♂14, 7Mhopper♂20) produced offspring with an average remobilization frequency of approximately 0.44%. Approximately 50% of the outcross progeny from these males were GFP-positive, as expected for Mendelian inheritance of a heterozygous dominant allele. Two 7M hopper males (7Mhopper♂9 and 7Mhopper♂11) produced a much higher proportion of remobilized (GFP-bright) tadpoles than their male sibling hoppers (approximately 1.9% compared with 0.44%). Intriguingly, these animals appear to be homozygous for both the enzyme (CAGGS-SB 10;γcRFP) and substrate (pT2 βGFP) transgenes: 100% of the outcross progeny were RFP-positive, and nearly all (> 98%) were also GFP-positive. Southern blot analysis of the outcross progeny indicated that all of the GFP-positive animals (n = 107 for 7Mhopper♂9; n = 114 for 7Mhopper♂11) had inherited the 7M concatemer, and the banding pattern was identical to the parental locus. The GFP-bright tadpoles in the outcross populations of 7Mhopper♂9 and 7Mhopper♂11 showed changes in the parental 7M locus indicative of excision and re-integration of a transposon from the substrate donor locus. The rare (< 2%) GFP-negative tadpoles in these outcross populations had not inherited the pT2 βGFP transgene, as determined by Southern blot and genomic PCR for GFP sequences (data not shown). 
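The inference that 7Mhopper♂9 and 7Mhopper♂11 are homozygous rests on simple Mendelian expectations for an outcross to wild type. The sketch below computes the expected fraction of progeny carrying the enzyme (RFP) transgene, the substrate (GFP) transgene, and both, from the number of copies (0, 1 or 2) of each in the hopper parent. It assumes the two transgenes assort independently, which the text does not establish; the function names are illustrative.

```python
from fractions import Fraction


def carrier_fraction(copies: int) -> Fraction:
    """Fraction of outcross progeny inheriting a transgene from a parent with
    `copies` alleles (1 = heterozygous, 2 = homozygous); the wild-type
    partner contributes none."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    return Fraction(copies, 2)


def expected_outcross(enzyme_copies: int, substrate_copies: int) -> dict:
    """Expected RFP+, GFP+ and double-positive fractions, assuming the
    enzyme and substrate transgenes assort independently."""
    rfp = carrier_fraction(enzyme_copies)
    gfp = carrier_fraction(substrate_copies)
    return {"RFP+": rfp, "GFP+": gfp, "RFP+GFP+": rfp * gfp}


for label, genotype in [("double heterozygote", (1, 1)),
                        ("double homozygote", (2, 2))]:
    fractions = expected_outcross(*genotype)
    print(label, {k: float(v) for k, v in fractions.items()})
```

The observed 100% RFP-positive and > 98% GFP-positive progeny fit the double-homozygote expectation (all progeny positive) rather than the 50% per transgene expected from a double heterozygote.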
The unexpected non-Mendelian inheritance in hoppers 7M♂9 and 7M♂11 is unexplained; however, these data suggest that increasing the copy number of the transposon substrates in the hopper lines may increase the remobilization frequency observed in the outcross population.: Transposon hopping from the 7M donor locus. (a) The outcross progeny from nine 7M hopper adults were scored for changes in GFP intensity indicative of transposon remobilization. The 7M donor locus contains a concatemer of approximately eight to ten pT2 βGFP transposons on chromosome 10. The remobilization frequency is expressed as the percentage of GFP-bright tadpoles in the GFP-positive outcross population. 7M hopper frogs 7M♂9 and 7M♂11 are 'functionally homozygous' for both the transposon substrate allele and the transposase enzyme transgene (see text for details), and display higher remobilization activity (approximately 1.8%) than their 'heterozygous' male hopper littermates (7M♂1, 7M♂2, 7M♂3, 7M♂5, 7M♂14, 7M♂20; approximately 0.44%). Outcross of 7M hopper 7M♀25 produced a remobilization rate of > 4%. (b) Schematic representation of the X. tropicalis chromosomes indicating the distribution of remobilized SB transposons. The parental 7M donor site (thick line) is located on scaffold 38, which maps to chromosome 10. Remobilization of discrete transposons away from the 7M donor locus is represented by the grey arrows. (Not to scale.) (c) Fluorescence in situ hybridization of metaphase chromosomes verifies that the 7M parental donor locus is located near a telomere of X. tropicalis chromosome 10. GFP: green fluorescent protein; SB: Sleeping Beauty.: To date, four 7M female hopper frogs have been outcrossed, and the mean remobilization rate from these animals is 2.54%. 
This may reflect individual differences in excision and re-integration activity between hopper animals, or it may indicate that remobilization driven by the CAGGS-SB 10 transgene is more efficient in the female germline (mean 2.54%; n = 4) than in the male germline (mean 0.76%; n = 9; unpaired Student's t-test, P = 0.0088, degrees of freedom, 11). With the 8F hoppers, we observed a modest increase in the mean remobilization efficiency in female hoppers (0.56%; n = 2) compared with male hoppers (0.25%; n = 7); however, this difference was not statistically significant, possibly because of the small sample size (Student's t-test, P = 0.25, degrees of freedom, 1).: The 7M pT2 βGFP concatemer is located on scaffold 38, which maps to chromosome 10 (Figure 7b and 7c). Genomic DNA harvested from representative GFP-bright tadpoles from the 7M hopper outcrosses was analyzed by Southern blot, and the novel integration sites were cloned by EPTS LM-PCR. As with the remobilized 8F hopper progeny, the novel integration events from the 7M hoppers were canonical SB-mediated transposition events, and a strong bias toward local re-integration was again observed: re-integration events on the same scaffold (scaffold 38) as the transposon donor were cloned from the GFP-bright tadpoles.: Discussion: SB transposons can be remobilized in X. tropicalis: Here, we demonstrate that SB transposons integrated into the frog genome are effective substrates for remobilization following re-expression of the SB transposase. Unlike the integration events observed with the co-injection strategy, which were mediated by a complex non-canonical mechanism, the remobilized SB transposons re-integrated via canonical SB-mediated transposition. The observed frequency of excision and subsequent re-integration of the parental pT2 βGFP transposon was low (on average, less than 1%). Our data in X. 
tropicalis are similar to those observed in other in vitro [45] and in vivo [19, 46, 47] systems, where low-copy number transposon donor sites served poorly as substrates for remobilization. In mammals, increasing the number of transposon substrates by using donor sites that contain high-order concatemers resulted in increased remobilization activity [19, 46]. For example, in AB1 embryonic stem cells, a single SB transposon was 'knocked in' to the Hprt gene on the mouse X chromosome, and subsequent transient expression of SB transposase (SB 10) resulted in a transposition rate of circa 3.5 × 10⁻⁵ events per cell per generation [45]. In mice, low-copy number SB transposon donor sites yield a low frequency of remobilization events that are passed through the germline [19, 47]. For example, single-copy SB transposon donors yield novel re-integration sites in approximately one embryo in every hundred (around 1%) in an outcross of double-transgenic 'seed' mice [47]. Donor sites containing high-order concatemers were more active; however, Geurts and colleagues noted that there was not a linear correlation between donor site copy number and remobilization activity [47]. This suggests that other factors, such as methylation status and other features of the local chromatin environment, may also influence the ability of integrated SB transposons to serve as substrates for remobilization. Horie and colleagues observed that low-copy number SB transposon concatemers served very poorly as substrates for remobilization in mice, and that additional copies of the SB transposon in the concatemer acted synergistically to increase the frequency of excision and re-integration. With a donor site that contained around 20 copies of the transposon substrate, a remobilization rate of 1.25 transpositions per genome per animal (125%) was observed [19]. 
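The female-versus-male comparisons of remobilization rate reported earlier for the 7M and 8F hoppers use an unpaired Student's t-test. For reference, a minimal standard-library sketch of the pooled-variance t statistic and its degrees of freedom (n1 + n2 - 2) follows. The per-animal rates are not listed in this section, so the example values are hypothetical placeholders; no P value is computed here, since that requires the t distribution from a statistics package.

```python
import math
from statistics import mean, variance


def students_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom), where df = len(a) + len(b) - 2.
    Each group needs at least two observations.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (sample variances use n - 1).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Hypothetical per-animal remobilization rates (%), for illustration only.
female = [2.0, 3.0, 2.5, 2.7]
male = [0.5, 1.0, 0.8, 0.6, 0.7, 0.9, 0.4, 1.1, 0.8]
t, df = students_t(female, male)
print(f"t = {t:.2f}, df = {df}")  # df = 11 (4 + 9 - 2)
```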
Keng and colleagues also reported a similar transposition frequency using double-transgenic mice that contained either 20 copies (1.16 transpositions per GFP-positive mouse) or 100 copies (1.14 transpositions per GFP-positive mouse) of the substrate transposon in the donor concatemer [46].: In this study, we used low-copy number donor sites as substrates for remobilization because the integration events observed with the plasmid-mRNA co-injection strategy were mediated by a complex, non-canonical integration mechanism [6]. We reasoned that, if a similar non-canonical mechanism were used in the remobilization step, starting with a simple substrate would facilitate cloning of the remobilization event. Although the remobilization frequency observed in the outcross of the double-transgenic frogs is low, the re-integration events are canonical SB-mediated transpositions. Several strategies are available to increase the frequency of remobilization events in the frog genome. Increasing the copy number of SB transposon substrates is likely to increase the frequency of novel re-integration events in the outcross progeny from hopper frogs substantially. Transgenic frogs with multiple copies of the pT2 βGFP transposon have been generated (7M and ♀622E), each harboring more than seven copies of the pT2 βGFP transposon [6]. The total number of transposon substrates in each hopper line can be increased further by incrossing the hopper lines with other pT2 βGFP founders that contain multiple copies of the SB transposon. In addition to donor site copy number, the transgenic transposase enzyme may also influence remobilization activity in the frog. The transgenic SB enzyme frog described here was generated using the first-generation SB transposase (SB 10; [24]). 
In recent years, several hyperactive mutant forms of the SB enzyme have been developed, including SB 11, which has approximately three-fold higher activity [48], and SB 100X, which has a 100-fold increase in enzymatic activity compared with SB 10 [49]. In addition to the choice of enzyme variant, different promoters with varying transcriptional activity could be used to drive expression of the SB transgene to enhance the rate of germline remobilization. This may not be as simple as choosing the most powerful promoter and/or enhancer available, as SB is sensitive to overproduction inhibition, in which increasing levels of enzyme impair overall transposition efficiency [48].: Why are different integration mechanisms observed with the co-injection and the double-transgenic strategies?: There are several possible reasons for the different integration mechanisms in the two SB-mediated methods, that is, injection-mediated and breeding-mediated transposition. First, the concentration of the substrate is vastly higher in the injection method, in which approximately 75 pg (around 10.8 × 10⁶ copies) of the plasmid harboring the SB transposon substrate was co-injected with SB mRNA. By comparison, the pT2 βGFP 8F founder used in the transgenic remobilization strategy contained three copies of the SB transposon. The SB transposase catalyzes transposition as a dimer of dimers (tetramer) bound to the inverted repeat/direct repeat (IR/DR) elements that flank the transposon [32]. The massive excess of substrate in the injection-mediated strategy may prevent correct assembly of the transposase on the substrate and may result in non-canonical enzymatic activity. Second, the integrated transposon may be a better substrate for SB activity because of DNA methylation and heterochromatinization [50, 51]. 
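The copy-number estimate for the injected plasmid follows from the standard mass-to-molecules conversion, copies = mass / (length × mass per base pair) × Avogadro's number. The sketch below assumes an average of 650 g/mol per base pair of double-stranded DNA (660 is also commonly used). The plasmid length is not stated in this section; solving the relation for the quoted 75 pg and 10.8 × 10⁶ copies implies a plasmid of roughly 6.3 to 6.4 kb, which is our inference rather than a reported value.

```python
AVOGADRO = 6.02214076e23  # molecules per mole
BP_MASS = 650.0           # assumed average g/mol per base pair of dsDNA


def plasmid_copies(mass_pg: float, length_bp: float, bp_mass: float = BP_MASS) -> float:
    """Number of double-stranded DNA molecules in `mass_pg` picograms."""
    grams = mass_pg * 1e-12
    moles = grams / (length_bp * bp_mass)
    return moles * AVOGADRO


def implied_length_bp(mass_pg: float, copies: float, bp_mass: float = BP_MASS) -> float:
    """Solve the same relation for plasmid length, given mass and copy number."""
    return mass_pg * 1e-12 * AVOGADRO / (copies * bp_mass)


print(f"{implied_length_bp(75, 10.8e6):.0f} bp")  # 6434 bp
print(f"{plasmid_copies(75, 6434):.3g} copies")   # 1.08e+07 copies
```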
Recent studies have shown that CpG methylation and supercoiling of SB transposon-harboring plasmids result in highly efficient transposition by the co-injection method in mammals compared with non-methylated linear plasmid DNA donors [52]. Finally, the differences in the integration mechanisms observed with the two strategies may reflect differences in the availability of host factors for SB transposition in developing gametes and in early cleavage-stage Xenopus embryos.: Potential uses for SB remobilization in X. tropicalis: The demonstration that SB transposons stably integrated into the frog genome are effective substrates for remobilization is an important step in the development of large-scale insertional mutagenesis and enhancer- or gene-trap screens in the frog. The breeding-based remobilization strategy described here provides a simple and robust method for generating novel transgenic lines without the need for labor- and skill-intensive micro-injection. The frog provides several important advantages for transposon-based genetic screens. First, each outcross can generate several thousand progeny. The high fecundity of X. tropicalis means that, even if the remobilization frequency is low, multiple novel re-integration events can be identified in a single outcross.: A second advantage is that Xenopus have a long lifespan in captivity that may reach two decades or more, and the animals remain fertile for more than ten years. This has important implications for remobilization strategies, as double-transgenic hopper frogs can be maintained and outcrossed at regular intervals for many years; male frogs can be outcrossed every two weeks and females every two months. Laboratories with limited animal holding space can, with a small cadre of hopper frogs, perform large-scale enhancer- or gene-trap screens, keeping only those tadpoles with interesting GFP expression profiles. 
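The fecundity argument can be made quantitative with a binomial model: if each GFP-positive tadpole independently carries a remobilized transposon with probability p, an outcross scoring n such tadpoles yields n·p events on average and at least one event with probability 1 - (1 - p)ⁿ. Treating events as independent across progeny is a simplifying assumption, and the clutch size and rates in the example are illustrative values in the range discussed above, not results.

```python
def expected_events(n_progeny: int, rate: float) -> float:
    """Mean number of remobilized tadpoles among n_progeny (binomial mean)."""
    return n_progeny * rate


def prob_at_least_one(n_progeny: int, rate: float) -> float:
    """Probability that an outcross of n_progeny has >= 1 remobilized tadpole."""
    return 1.0 - (1.0 - rate) ** n_progeny


# Illustrative: 2,000 GFP-positive tadpoles at a 0.2% or 1% remobilization rate.
for rate in (0.002, 0.01):
    print(f"rate {rate:.1%}: mean {expected_events(2000, rate):.0f} events, "
          f"P(>=1) = {prob_at_least_one(2000, rate):.3f}")
```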
The long lifespan of the hopper frogs may also be important if epigenetic silencing of the transgenic transposase locus occurs over a series of generations. Stably integrated transgenes are frequently subject to epigenetic silencing over successive generations [12, 53, 54], and silencing of the transposase locus would likely abolish hopping activity. Because each generation of frogs lives for many years, the need to regenerate hopper lines is unlikely to be a problem in this species.: The propensity of SB to catalyze local hopping events is a third advantage that can be exploited to generate insertional mutants of genes near the donor locus. In mice, approximately 75% of remobilized SB transposons re-integrate within 3 Mb of the donor locus [19]. Our data on remobilization from the 8F locus indicate that local hopping is also a feature of SB transposition in X. tropicalis. Transgenic SB transposon frogs that have integrations in gene-dense regions of the genome can be used as donors for insertional mutagenesis strategies, as re-integration within a nearby gene may disrupt the normal activity of that locus. In the example presented here, the 8F transposon donor is located on scaffold 57, which maps to X. tropicalis chromosome 6 and is syntenic with human chromosome 7 (Figure 6a). It lies in a gene-dense region, with approximately 50 genes in the 3 Mb flanking the transposon donor. The genome of X. tropicalis is approximately half the size of the human genome [3], while its gene content is similar to that of humans; thus, the overall gene density is relatively high in the frog. Different DNA 'cut and paste' transposon systems offer unique advantages for manipulating the vertebrate genome. For example, the local hopping activity of SB can be exploited to saturate the genomic sequences flanking the transposon donor locus with novel re-integration events. 
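Classifying a re-integration event as a local hop amounts to comparing its position with the donor locus. A minimal sketch follows, assuming each site is given as a (chromosome, base-pair position) pair on a common chromosome-level assembly and using the 3 Mb window quoted above for mice. The coordinates in the example are invented; in this study, scaffold-to-chromosome assignments from the genetic map would be the limiting information.

```python
from typing import NamedTuple

LOCAL_WINDOW_BP = 3_000_000  # 3 Mb window, as quoted for local hopping in mice


class Site(NamedTuple):
    chromosome: str
    position: int  # base-pair coordinate on a chromosome-level assembly


def is_local_hop(donor: Site, insertion: Site, window: int = LOCAL_WINDOW_BP) -> bool:
    """True if the insertion lies on the donor's chromosome within `window` bp."""
    return (insertion.chromosome == donor.chromosome
            and abs(insertion.position - donor.position) <= window)


def local_fraction(donor: Site, insertions: list[Site]) -> float:
    """Fraction of re-integration events classified as local hops."""
    if not insertions:
        return 0.0
    return sum(is_local_hop(donor, s) for s in insertions) / len(insertions)


# Invented coordinates, for illustration only.
donor = Site("chr6", 50_000_000)
events = [Site("chr6", 51_200_000), Site("chr6", 48_500_000),
          Site("chr6", 60_000_000), Site("chr2", 50_000_000)]
print(f"{local_fraction(donor, events):.0%} local")  # 50% local
```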
We have recently demonstrated that Tol2 transposons stably integrated into the frog genome are effective substrates for remobilization [7]. The local hopping activity of Tol2 is less pronounced than that of SB; approximately 20% of Tol2 re-integration events occur near the donor locus, compared with approximately 80% for SB. Using nested transposon substrates, for example an SB transposon cloned within a Tol2 element, genome-wide remobilization screens could be performed using Tol2 to distribute the dual substrate randomly throughout the genome, followed by SB remobilization to saturate regions of interest locally with novel insertion events. In the study described here, we used a simple ubiquitous promoter element to drive expression of the GFP reporter. Substrate transposons that harbor potentially more mutagenic elements, such as polyadenylation trap elements [55], could be used to disrupt the activity of the 'trapped' gene efficiently.: Finally, the frog is an excellent model for embryological and biochemical studies due to its small size, simple husbandry and the ease of manipulating embryos at all stages of development. Furthermore, as a tetrapod species, X. tropicalis shares a similar body plan with mammals, allowing analysis of developmental processes that are unique to higher vertebrates, such as limb and digit pattern formation. Combining these features with a simple and robust method for generating novel transgenic lines will provide valuable tools for this highly tractable developmental model system.: Conclusions: SB transposons stably integrated into the X. tropicalis genome are substrates for remobilization: Co-injection of plasmid DNA harboring an SB transposon together with mRNA encoding the SB transposase results in efficient transgenesis of X. tropicalis. The integration events mediated by this co-injection approach are complex and frequently contain low-order concatemers of the transposon. 
In this study, we demonstrate that SB transposons stably integrated in the frog genome are effective substrates for remobilization. Transgenic frogs that express SB 10 transposase in the germline were bred with SB transposon frogs, and the double-transgenic progeny were outcrossed to wild-type animals. Remobilization events were readily identified by increased GFP expression in offspring in which the parental transposon concatemer had been modified by the SB 10 enzyme. Integration site analysis of the GFP-bright progeny indicated that the transposon re-integration events had occurred via a canonical cut-and-paste mechanism. The rate of remobilization observed in the frog was similar to that observed in other species when a low-copy number concatemer was used as the transposon donor.: SB transposon remobilization as a tool for genetic manipulation of X. tropicalis: The diploid frog X. tropicalis offers several advantages for large-scale forward genetic screens in a tetrapod model, including vast numbers of progeny per spawn, a long lifespan and the availability of genomic resources, including the genome sequence and a genetic map. Here, we have demonstrated that the cut-and-paste activity of SB transposase can be exploited to generate novel transposon transgenics simply by breeding double-transgenic hopper frogs. Novel transposon lines are readily identified by changes in GFP reporter expression in the remobilized progeny compared with the parental GFP pattern. The local hopping activity of SB can be exploited to saturate genomic regions that flank the transposon donor sites. The ability to generate thousands of progeny in each outcross, combined with the ease of identifying novel insertion events, will allow large-scale SB-mediated screens to be performed in X. tropicalis.: Methods: Plasmids and generation of transgenic lines: The generation of the pT2 βGFP construct and the transgenic pT2 βGFP X. tropicalis line 8F have been described previously [6]. 
The pCAGGS-SB 10 construct was a gift from Dr David Largaespada [27]. A 3,613 bp HincII/BamHI fragment containing the promoter/enhancer and the SB 10 transposase from pCAGGS-SB 10 was cloned into pBluescript SK+ (pBS-SK+) to generate pBS-CAGGS-SB 10. The 2.2 kb X. laevis γ1 crystallin promoter driving dsRed construct was a gift from Dr Robert Grainger (2.2 γ1 crystallin-RFP; [28]). An approximately 3.5 kb γ1-crystallin promoter-RFP fragment was PCR amplified from 2.2 γ1 crystallin-RFP using primers DSR1 5'-GTAAGCGGCAGGGTCGGA-3' and DSR2 5'-GCCTCGAGCGATTTCGGCCTATTGGT-3', cloned into pGEM-T Easy (Promega, Madison, WI, USA), and fully sequenced to yield pGEM-γcRFP. An approximately 3.5 kb SacI restriction fragment from pGEM-γcRFP encoding the γ1 crystallin-RFP reporter was cloned into the unique SacI restriction site of pBS-CAGGS-SB 10. A single clone was selected with the two mini-genes oriented tail-to-tail in the pBS-SK+ plasmid (pCAGGS-SB 10;γcRFP; Figure 1a). The pCAGGS-SB 10;γcRFP construct was linearized with ScaI and injected into fertilized X. tropicalis embryos at the one-cell stage (500 pg of linear plasmid DNA in 3 nL of water) as described previously [6, 8]. Tadpoles were scored for expression of RFP in the lens after stage 40 [30]. RFP-positive tadpoles (27 of 570 injected, 4.7%) were selected, raised to adulthood and outcrossed to determine germline transmission of the transgene. Male frog CAGGS-SB 10;γcRFP 2M produced progeny that expressed RFP robustly in the lens and was selected for further analysis.: Husbandry and micro-injection of X. tropicalis: X. tropicalis tadpoles were maintained at 28°C in static tanks and were staged according to Nieuwkoop and Faber [30]. Adult animals were housed in a recirculating aquarium at 26°C. 
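Primer properties such as GC content and an approximate melting temperature are commonly checked when choosing PCR conditions for primers like DSR1 and DSR2. The rough sketch below uses the basic GC-content formula Tm ≈ 64.9 + 41 × (G + C - 16.4) / N; it ignores salt, oligonucleotide concentration and nearest-neighbour effects, so it is only a first approximation, and the text does not say how the annealing conditions in this study were chosen. If a primer carries a non-annealing 5' tail, only the template-binding portion should be scored.

```python
def gc_count(primer: str) -> int:
    """Number of G and C bases in a primer (case-insensitive)."""
    seq = primer.upper()
    return seq.count("G") + seq.count("C")


def gc_fraction(primer: str) -> float:
    """GC content as a fraction of primer length."""
    return gc_count(primer) / len(primer)


def basic_tm(primer: str) -> float:
    """Approximate Tm (deg C) from the basic GC-content formula.

    Ignores salt, oligo concentration and nearest-neighbour effects.
    """
    return 64.9 + 41.0 * (gc_count(primer) - 16.4) / len(primer)


primers = {
    "DSR1": "GTAAGCGGCAGGGTCGGA",
    "DSR2": "GCCTCGAGCGATTTCGGCCTATTGGT",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, Tm ~{basic_tm(seq):.1f} C")
```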
Transgenic adult frogs were identified by implanting a radio-frequency identification microchip (microSensys GmbH, Erfurt, Germany) beneath the skin of the dorsal surface of each animal [56]. The unique 16-digit alphanumeric code on each chip provides a convenient method for identifying individual animals throughout their lifespan. Female X. tropicalis were pre-primed overnight with a 1:5 dilution of human chorionic gonadotropin (hCG) and primed on the day of injection with 200 U of hCG. Fertilized eggs were obtained by natural matings. Injected eggs were allowed to heal at 28°C and transferred to tanks for growth at 28°C [56]. This project was approved by St. Jude Children's Research Hospital's Institutional Animal Care and Use Committee.: RT-PCR analysis of SB expression: Total cellular RNA was isolated using the RNAeasy kit (Clontech, Mountain View, CA, USA) from individual RFP-positive and RFP-negative stage 40 embryos generated by outcross of CAGGS-SB 10;γcRFP 2M. First-strand cDNA was synthesized and used as template for 32P-labeled PCR reactions as described previously [57]. Primers for SB 10 amplification (SB3 5'-GCCGCTCAGCAAGGAAGA-3' and SB4 5'-GAAGACCCATTTGCGACCAAG-3') annealed at 56°C and produced a 383 bp fragment. Primers to Xenopus α-actin were used as a control for RNA recovery.: Western blot analysis of SB 10 protein in tissues harvested from transgenic frogs: Whole embryo or adult tissue samples were snap frozen in a dry ice and ethanol bath and stored at -80°C. 100 µL of RIPA buffer (150 mM sodium chloride, 50 mM Tris-HCl pH 7.5, 1% (v/v) nonyl phenoxypolyethoxylethanol (IGEPAL), 0.5% (w/v) sodium deoxycholate, 0.1% (w/v) sodium dodecyl sulfate, 1X Complete Protease Inhibitor Cocktail (Roche, Indianapolis, IN, USA)) was added to the frozen samples, mixed by vortex, extracted with Freon (200 µL of 1,1,2-trichlorotrifluoroethane (Sigma-Aldrich, St. 
Louis, MO, USA)) and centrifuged at 16,100 × g for 15 minutes at 4°C. The upper phase was transferred to a fresh tube, and the protein concentration was measured using the Bradford assay (Bio-Rad, Hercules, CA, USA). Aliquots of each sample were diluted with an equal volume of 2× Laemmli Sample Buffer containing β-mercaptoethanol (Bio-Rad) and denatured by heating at 100°C for 5 minutes. Proteins were separated by electrophoresis on 4% to 15% (w/v) Criterion precast polyacrylamide gels (Bio-Rad); pre-stained SDS-PAGE standards (Bio-Rad) were used as molecular weight markers. Proteins were transferred to Hybond-P polyvinylidene difluoride (PVDF; GE Healthcare Life Sciences, Piscataway, NJ, USA) membranes at 50 V for 1.5 hours at 4°C. Protein transfer was verified by staining the membrane with Ponceau S. Monoclonal anti-SB transposase antibody (MAB2798; R&D Systems, Minneapolis, MN, USA) was reconstituted at a final concentration of 500 µg/mL and diluted 1:500 to probe membranes blocked with Superblock (Pierce, Rockford, IL, USA) containing 0.05% (v/v) Tween 20. A goat anti-mouse horseradish peroxidase-conjugated secondary antibody was used at a 1:10,000 dilution, and signal was developed using the Chemiluminescent Detection System (Pierce). Membranes were stripped with Restore Western Blot Stripping Solution (Pierce) and re-probed with a mouse monoclonal antibody specific for Xenopus β-actin (ab8224; Abcam, Cambridge, MA, USA) as a control for protein recovery.: Fluorescent protein expression analysis: A Leica FLIII fluorescent dissecting microscope was used to analyze GFP and RFP expression. Digital images were captured using a Nikon Ri1 color digital camera and the Nikon Elements Basic Research software package (Nikon, Melville, NY, USA). 
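The antibody working concentrations follow from C₁V₁ = C₂V₂. Treating "1:500" as one part stock in 500 parts final volume, the 500 µg/mL anti-SB stock gives a working concentration of 1 µg/mL. The sketch below also computes the stock volume needed for a given probing volume; the 10 mL volume in the example is an assumption, not a value from this protocol.

```python
def working_concentration(stock: float, dilution_factor: float) -> float:
    """Concentration after a 1:dilution_factor dilution
    (1 part stock in dilution_factor parts final volume)."""
    return stock / dilution_factor


def stock_volume(final_volume: float, dilution_factor: float) -> float:
    """Volume of stock needed for final_volume at 1:dilution_factor
    (from C1*V1 = C2*V2)."""
    return final_volume / dilution_factor


primary = working_concentration(500.0, 500)  # anti-SB: 500 ug/mL stock at 1:500
print(f"primary antibody: {primary:g} ug/mL")  # primary antibody: 1 ug/mL
# Hypothetical 10 mL (10,000 uL) of probing solution.
print(f"anti-SB stock: {stock_volume(10_000.0, 500):g} uL")       # anti-SB stock: 20 uL
print(f"secondary stock: {stock_volume(10_000.0, 10_000):g} uL")  # secondary stock: 1 uL
```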
Tadpoles were immobilized for photography by brief anesthesia in 0.015% (w/v) tricaine methanesulphonate.: Southern blot hybridization: Genomic DNA was harvested from individual tadpoles by overnight proteinase K digestion at 56°C and phenol/chloroform/isoamyl alcohol extraction using standard protocols [8]. Genomic DNA (3 µg to 5 µg) was digested with BglII, separated on a 0.7% (w/v) agarose gel and transferred to Hybond N+ hybridization transfer membranes (GE Healthcare Life Sciences, Piscataway, NJ, USA). The membranes were probed with a 32P-radiolabeled fragment of the GFP open reading frame (approximately 700 bp) and exposed to a GE Healthcare Life Sciences phosphorimager screen for detection.: Genomic PCR and transposon integration site analysis: Integration site analysis was performed using EPTS LM-PCR to the right arm (IR/DR) of the SB transposon as described previously [6, 8]. The integration sites were verified by genomic PCR with primers that bind to scaffold sequences beyond the EPTS LM-PCR products. PCR primers were also designed to amplify the predicted sequences that flank the left IR/DR of the SB transposon. All genomic PCR products were cloned into either pGEM-T Easy (Promega) or TOPO-TA (Invitrogen, Carlsbad, CA, USA) and sequenced. Novel sequences were queried against the Joint Genome Institute X. 
tropicalis genome (version 4.1; http://genome.jgi-psf.org/Xentr4/Xentr4.home.html), and scaffolds were assigned to chromosomes and linkage groups based on the genetic map developed at the University of Houston [44].: Abbreviations: bp, base pair; CAGGS, chicken β-actin promoter coupled with a cytomegalovirus enhancer; EPTS LM-PCR, extension primer tag selection linker-mediated polymerase chain reaction; FISH, fluorescence in situ hybridization; GFP, green fluorescent protein; hCG, human chorionic gonadotropin; IR/DR, inverted repeat/direct repeat; PCR, polymerase chain reaction; RFP, red fluorescent protein; RT, reverse transcriptase; SB, Sleeping Beauty; UTR, untranslated region.: References: Hirsch N, Zimmerman LB, Grainger RM: Xenopus, the next generation: X. tropicalis genetics and genomics. Dev Dyn. 2002, 225: 422-433. 10.1002/dvdy.10178.: Showell C, Conlon FL: Decoding development in Xenopus tropicalis. Genesis. 2007, 45: 418-426. 10.1002/dvg.20286.: Hellsten U, Harland RM, Gilchrist MJ, Hendrix D, Jurka J, Kapitonov V, Ovcharenko I, Putnam NH, Shu S, Taher L, Blitz IL, Blumberg B, Dichmann DS, Dubchak I, Amaya E, Detter JC, Fletcher R, Gerhard DS, Goodstein D, Graves T, Grigoriev IV, Grimwood J, Kawashima T, Lindquist E, Lucas SM, Mead PE, Mitros T, Ogino H, Ohta Y, Poliakov AV, Pollet N, Robert J, Salamov A, Sater AK, Schmutz J, Terry A, Vize PD, Warren WC, Wells D, Wills A, Wilson RK, Zimmerman LB, Zorn AM, Grainger R, Grammer T, Khokha MK, Richardson PM, Rokhsar DS: The genome of the Western clawed frog Xenopus tropicalis. Science. 2010, 328: 633-636. 10.1126/science.1183670.: Doherty JR, Johnson Hamlet MR, Kuliyev E, Mead PE: A flk-1 promoter/enhancer reporter transgenic Xenopus laevis generated using the Sleeping Beauty transposon system: an in vivo model for vascular studies. Dev Dyn. 2007, 236: 2808-2817. 10.1002/dvdy.21321.: Hamlet MR, Yergeau DA, Kuliyev E, Takeda M, Taira M, Kawakami K, Mead PE: Tol2 transposon-mediated transgenesis in Xenopus tropicalis. Genesis. 2006, 44: 438-445. 
10.1002/dvg.20234.: Yergeau DA, Johnson Hamlet MR, Kuliyev E, Zhu H, Doherty JR, Archer TD, Subhawong AP, Valentine MB, Kelley CM, Mead PE: Transgenesis in Xenopus using the Sleeping Beauty transposon system. Dev Dyn. 2009, 238: 1727-1743. 10.1002/dvdy.21994.: Yergeau DA, Kelley CM, Kuliyev E, Zhu H, Sater AK, Wells DE, Mead PE: Remobilization of Tol2 transposons in Xenopus tropicalis. BMC Dev Biol. 2010, 10: 11.: Yergeau DA, Kuliyev E, Mead PE: Injection-mediated transposon transgenesis in Xenopus tropicalis and the identification of integration sites by modified extension primer tag selection (EPTS) linker-mediated PCR. Nat Protoc. 2007, 2: 2975-2986. 10.1038/nprot.2007.428.: Yergeau DA, Mead PE: Manipulating the Xenopus genome with transposable elements. Genome Biol. 2007, 8 (Suppl 1): S11. 10.1186/gb-2007-8-s1-s11.: Kawakami K: Tol2: a versatile gene transfer vector in vertebrates. Genome Biol. 2007, 8 (Suppl 1): S7. 10.1186/gb-2007-8-s1-s7.: Sivasubbu S, Balciunas D, Amsterdam A, Ekker SC: Insertional mutagenesis strategies in zebrafish. Genome Biol. 2007, 8 (Suppl 1): S9. 10.1186/gb-2007-8-s1-s9.: Whitelaw E, Sutherland H, Kearns M, Morgan H, Weaving L, Garrick D: Epigenetic effects on transgene expression. Methods Mol Biol. 2001, 158: 351-368.: Chen ZY, He CY, Meuse L, Kay MA: Silencing of episomal transgene expression by plasmid bacterial DNA elements in vivo. Gene Ther. 2004, 11: 856-864. 10.1038/sj.gt.3302231.: Kondrychyn I, Garcia-Lecea M, Emelyanov A, Parinov S, Korzh V: Genome-wide analysis of Tol2 transposon reintegration in zebrafish. BMC Genomics. 2009, 10: 418. 10.1186/1471-2164-10-418.: Nagayoshi S, Hayashi E, Abe G, Osato N, Asakawa K, Urasaki A, Horikawa K, Ikeo K, Takeda H, Kawakami K: Insertional mutagenesis by the Tol2 transposon-mediated enhancer trap approach generated mutations in two developmental genes: tcf7 and synembryn-like. Development. 
2008, 135: 159-169.: Parinov S, Kondrichin I, Korzh V, Emelyanov A: Tol2 transposon-mediated enhancer trap to identify developmentally regulated zebrafish genes in vivo. Dev Dyn. 2004, 231: 449-459. 10.1002/dvdy.20157.: Collier LS, Carlson CM, Ravimohan S, Dupuy AJ, Largaespada DA: Cancer gene discovery in solid tumours using transposon-based somatic mutagenesis in the mouse. Nature. 2005, 436: 272-276. 10.1038/nature03681.: Dupuy AJ, Akagi K, Largaespada DA, Copeland NG, Jenkins NA: Mammalian mutagenesis using a highly mobile somatic Sleeping Beauty transposon system. Nature. 2005, 436: 221-226. 10.1038/nature03691.: Horie K, Kuroiwa A, Ikawa M, Okabe M, Kondoh G, Matsuda Y, Takeda J: Efficient chromosomal transposition of a Tc1/mariner-like transposon Sleeping Beauty in mice. Proc Natl Acad Sci USA. 2001, 98: 9191-9196. 10.1073/pnas.161071798.: Keng VW, Ryan BJ, Wangensteen KJ, Balciunas D, Schmedt C, Ekker SC, Largaespada DA: Efficient transposition of Tol2 in the mouse germline. Genetics. 2009, 183: 1565-1573. 10.1534/genetics.109.100768.: Kitada K, Ishishita S, Tosaka K, Takahashi R, Ueda M, Keng VW, Horie K, Takeda J: Transposon-tagged mutagenesis in the rat. Nat Methods. 2007, 4: 131-133. 10.1038/nmeth1002.: Lu B, Geurts AM, Poirier C, Petit DC, Harrison W, Overbeek PA, Bishop CE: Generation of rat mutants using a coat color-tagged Sleeping Beauty transposon system. Mamm Genome. 2007, 18: 338-346. 10.1007/s00335-007-9025-5.: Takeda J, Keng VW, Horie K: Germline mutagenesis mediated by Sleeping Beauty transposon system in mice. Genome Biol. 2007, 8 (Suppl 1): S14. 10.1186/gb-2007-8-s1-s14.: Ivics Z, Hackett PB, Plasterk RH, Izsvák Z: Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell. 1997, 91: 501-510. 
10.1016/S0092-8674(00)80436-5.: Sinzelle L, Vallin J, Coen L, Chesneau A, Du Pasquier D, Pollet N, Demeneix B, Mazabraud A: Generation of transgenic Xenopus laevis using the Sleeping Beauty transposon system. Transgenic Res. 2006, 15: 751-760. 10.1007/s11248-006-9014-6.: Okabe M, Ikawa M, Kominami K, Nakanishi T, Nishimune Y: 'Green mice' as a source of ubiquitous green cells. FEBS Lett. 1997, 407: 313-319. 10.1016/S0014-5793(97)00313-X.: Dupuy AJ, Fritz S, Largaespada DA: Transposition and gene disruption in the male germline of the mouse. Genesis. 2001, 30: 82-88. 10.1002/gene.1037.: Offield MF, Hirsch N, Grainger RM: The development of Xenopus tropicalis transgenic lines and their use in studying lens developmental timing in living embryos. Development. 2000, 127: 1789-1797.: Etkin LD, Pearman B: Distribution, expression and germ line transmission of exogenous DNA sequences following microinjection into Xenopus laevis eggs. Development. 1987, 99: 15-23.: Nieuwkoop PD, Faber J: Normal table of Xenopus laevis (Daudin): a systematical and chronological survey of the development from the fertilized egg till the end of metamorphosis. 1994, New York & London: Garland Publishing, Inc.: Plasterk RH, Izsvák Z, Ivics Z: Resident aliens: the Tc1/mariner superfamily of transposable elements. Trends Genet. 1999, 15: 326-332. 10.1016/S0168-9525(99)01777-1.: Izsvák Z, Khare D, Behlke J, Heinemann U, Plasterk RH, Ivics Z: Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J Biol Chem. 2002, 277: 34581-34588. 10.1074/jbc.M204001200.: Cui Z, Geurts AM, Liu G, Kaufman CD, Hackett PB: Structure-function analysis of the inverted terminal repeats of the Sleeping Beauty transposon. J Mol Biol. 2002, 318: 1221-1235. 
10.1016/S0022-2836(02)00237-1.: Zhang J, Yu C, Pulletikurti V, Lamb J, Danilova T, Weber DF, Birchler J, Peterson T: Alternative Ac/Ds transposition induces major chromosomal rearrangements in maize. Genes Dev. 2009, 23: 755-765. 10.1101/gad.1776909.: Zhang J, Peterson T: A segmental deletion series generated by sister-chromatid transposition of Ac transposable elements in maize. Genetics. 2005, 171: 333-344. 10.1534/genetics.104.035576.: Zhang J, Peterson T: Transposition of reversed Ac element ends generates chromosome rearrangements in maize. Genetics. 2004, 167: 1929-1937. 10.1534/genetics.103.026229.: Zhang J, Peterson T: Genome rearrangements by nonlinear transposons in maize. Genetics. 1999, 153: 1403-1410.: Lister C, Jackson D, Martin C: Transposon-induced inversion in Antirrhinum modifies nivea gene expression to give a novel flower color pattern under the control of cycloidearadialis. Plant Cell. 1993, 5: 1541-1553.: Martin C, Lister C: Genome juggling by transposons: Tam3-induced rearrangements in Antirrhinum majus. Dev Genet. 1989, 10: 438-451. 10.1002/dvg.1020100605.: Moerman DG, Kiff JE, Waterston RH: Germline excision of the transposable element Tc1 in C. elegans. Nucleic Acids Res. 1991, 19: 5669-5672. 10.1093/nar/19.20.5669.: Geurts AM, Collier LS, Geurts JL, Oseth LL, Bell ML, Mu D, Lucito R, Godbout SA, Green LE, Lowe SW, Hirsch BA, Leinwand LA, Largaespada DA: Gene mutations and genomic rearrangements in the mouse as a result of transposon mobilization from chromosomal concatemers. PLoS Genet. 2006, 2: e156. 10.1371/journal.pgen.0020156.: Vigdal TJ, Kaufman CD, Izsvák Z, Voytas DF, Ivics Z: Common physical properties of DNA affecting target site selection of Sleeping Beauty and other Tc1/mariner transposable elements. J Mol Biol. 2002, 323: 441-452. 10.1016/S0022-2836(02)00991-9.: Yant SR, Wu X, Huang Y, Garrison B, Burgess SM, Kay MA: High-resolution genome-wide mapping of transposon integration in mammals. Mol Cell Biol. 2005, 25: 2085-2094. 
10.1128/MCB.25.6.2085-2094.2005.: Wells DE, Gutierrez L, Xu Z, Krylov V, Macha J, Blankenburg KP, Hitchens M, Bellot LJ, Spivey M, Stemple DL, Kowis A, Ye Y, Pasternak S, Owen J, Tran T, Slavikova R, Tumova L, Tlapakova T, Seifertova E, Scherer SE, Sater AK: A genetic map of Xenopus tropicalis. Dev Biol. 2011, 354: 1-8. 10.1016/j.ydbio.2011.03.022.: Luo G, Ivics Z, Izsvák Z, Bradley A: Chromosomal transposition of a Tc1/mariner-like element in mouse embryonic stem cells. Proc Natl Acad Sci USA. 1998, 95: 10769-10773. 10.1073/pnas.95.18.10769.: Keng VW, Yae K, Hayakawa T, Mizuno S, Uno Y, Yusa K, Kokubu C, Kinoshita T, Akagi K, Jenkins NA, Copeland NG, Horie K, Takeda J: Region-specific saturation germline mutagenesis in mice using the Sleeping Beauty transposon system. Nat Methods. 2005, 2: 763-769. 10.1038/nmeth795.: Geurts AM, Wilber A, Carlson CM, Lobitz PD, Clark KJ, Hackett PB, McIvor RS, Largaespada DA: Conditional gene expression in the mouse using a Sleeping Beauty gene-trap transposon. BMC Biotechnol. 2006, 6: 30. 10.1186/1472-6750-6-30.: Geurts AM, Yang Y, Clark KJ, Liu G, Cui Z, Dupuy AJ, Bell JB, Largaespada DA, Hackett PB: Gene transfer into genomes of human cells by the Sleeping Beauty transposon system. Mol Ther. 2003, 8: 108-117. 10.1016/S1525-0016(03)00099-6.: Mátés L, Chuah MK, Belay E, Jerchow B, Manoj N, Acosta-Sanchez A, Grzela DP, Schmitt A, Becker K, Matrai J, Ma L, Samara-Kuko E, Gysemans C, Pryputniewicz D, Miskey C, Fletcher B, VandenDriessche T, Ivics Z, Izsvák Z: Molecular evolution of a novel hyperactive Sleeping Beauty transposase enables robust stable gene transfer in vertebrates. Nat Genet. 2009, 41: 753-761. 10.1038/ng.343.: Yusa K, Takeda J, Horie K: Enhancement of Sleeping Beauty transposition by CpG methylation: possible role of heterochromatin formation. Mol Cell Biol. 2004, 24: 4004-4018. 
10.1128/MCB.24.9.4004-4018.2004.: Ikeda R, Kokubu C, Yusa K, Keng VW, Horie K, Takeda J: Sleeping Beauty transposase has an affinity for heterochromatin conformation. Mol Cell Biol. 2007, 27: 1665-1676. 10.1128/MCB.01500-06.: Carlson DF, Geurts AM, Garbe JR, Park CW, Rangel-Filho A, O'Grady SM, Jacob HJ, Steer CJ, Largaespada DA, Fahrenkrug SC: Efficient mammalian germline transgenesis by cis-enhanced Sleeping Beauty transposition. Transgenic Res. 2011, 20: 29-45. 10.1007/s11248-010-9386-5.: Allen ND, Norris ML, Surani MA: Epigenetic control of transgene expression and imprinting by genotype-specific modifiers. Cell. 1990, 61: 853-861. 10.1016/0092-8674(90)90195-K.: Carver AS, Dalrymple MA, Wright G, Cottom DS, Reeves DB, Gibson YH, Keenan JL, Barrass JD, Scott AR, Colman A, Garner I: Transgenic livestock as bioreactors: stable expression of human alpha-1-antitrypsin by a flock of sheep. Biotechnology (N Y). 1993, 11: 1263-1270.: Clark KJ, Balciunas D, Pogoda HM, Ding Y, Westcot SE, Bedell VM, Greenwood TM, Urban MD, Skuster KJ, Petzold AM, Ni J, Nielsen AL, Patowary A, Scaria V, Sivasubbu S, Xu X, Hammerschmidt M, Ekker SC: In vivo protein trapping produces a functional expression codex of the vertebrate proteome. Nat Methods. 2011, 8: 506-515. 10.1038/nmeth.1606.: Yergeau DA, Kelley CM, Zhu H, Kuliyev E, Mead PE: Transposon transgenesis in Xenopus. Methods. 2010, 51: 92-100. 10.1016/j.ymeth.2010.03.001.: Doherty JR, Zhu H, Kuliyev E, Mead PE: Determination of the minimal domains of Mix.3/Mixer required for endoderm development. Mech Dev. 2006, 123: 56-66. 10.1016/j.mod.2005.09.006.: Acknowledgements: We thank Cheryl Winter for animal husbandry, and Drs David Largaespada and Robert Grainger for providing plasmids (pCAGGS-SB 10 and 2.2 γ1-crystallin-RFP, respectively). We thank the following St. 
Jude Children's Research Hospital shared resources: the Hartwell Center of Bioinformatics and Biotechnology for DNA sequencing and bioinformatics support; the Cytogenetics Lab for FISH analysis; and the Animal Resource Center. Support for this study was provided by the National Institutes of Health (HD042994, MH079381 to PEM and HD046661 to AKS and DEW) and by the American Lebanese and Syrian Associated Charities (PEM).: Author information: Corresponding author: Correspondence to Paul E Mead.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: DAY carried out embryo injections, scored tadpoles, performed molecular analysis of transposon integration sites and helped prepare the manuscript. CMK performed molecular analyses of transposon integration sites, scored tadpoles and helped prepare the manuscript. EK performed embryo injections, scored progeny, assisted with molecular analyses and helped with general husbandry. HZ performed embryo injections and helped score progeny. MRJH generated the SB 10 transposase transgenic line. AKS and DEW provided mapping data to assign sequence scaffolds to the X. tropicalis linkage groups and/or chromosomes. PEM conceived the study, directed the project and wrote the manuscript. All authors read and approved the final manuscript.: About this article: Cite this article: Yergeau, D.A., Kelley, C.M., Kuliyev, E. et al. 
Remobilization of Sleeping Beauty transposons in the germline of Xenopus tropicalis. Mobile DNA 2, 15 (2011). https://doi.org/10.1186/1759-8753-2-15: Received: 22 July 2011: Accepted: 24 November 2011: Published: 24 November 2011: DOI: https://doi.org/10.1186/1759-8753-2-15: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Soluble expression, purification and characterization of the full length IS2 Transposase" "Leslie A Lewis, Mekbib Astatke, Peter T Umekubo, Shaheen Alvi, Robert Saby, Jehan Afrose" "Leslie A Lewis" "27 October 2011" "The two-step transposition pathway of insertion sequences of the IS3 family, and several other families, involves first the formation of a branched figure-of-eight (F-8) structure by an asymmetric single strand cleavage at one optional donor end and joining to the flanking host DNA near the target end. Its conversion to a double stranded minicircle precedes the second insertional step, where both ends function as donors. In IS2, the left end which lacks donor function in Step I acquires it in Step II. The assembly of two intrinsically different protein-DNA complexes in these F-8 generating elements has been intuitively proposed, but a barrier to testing this hypothesis has been the difficulty of isolating a full length, soluble and active transposase that creates fully formed synaptic complexes in vitro with protein bound to both binding and catalytic domains of the ends. We address here a solution to expressing, purifying and structurally analyzing such a protein., A soluble and active IS2 transposase derivative with GFP fused to its C-terminus functions as efficiently as the native protein in in vivo transposition assays. In vitro electrophoretic mobility shift assay data show that the partially purified protein prepared under native conditions binds very efficiently to cognate DNA, utilizing both N- and C-terminal residues. As a precursor to biophysical analyses of these complexes, a fluorescence-based random mutagenesis protocol was developed that enabled a structure-function analysis of the protein with good resolution at the secondary structure level. 
The results extend previous structure-function work on IS3 family transposases, identifying the binding domain as a three-helix H + HTH bundle and explaining the function of an atypical leucine zipper-like motif in IS2. In addition, gain- and loss-of-function mutations in the catalytic active site define its role in regional and global binding and identify functional signatures that are common to the three-dimensional catalytic core motif of the retroviral integrase superfamily., Intractably insoluble transposases, such as the IS2 transposase, prepared by solubilization protocols are often refractory to whole protein structure-function studies. The results described here have validated the use of GFP-tagging and fluorescence-based random mutagenesis in overcoming this limitation at the secondary structure level." "Coiled Coil, Catalytic Core, Rous Sarcoma Virus, Insertion Reaction, Catalytic Active Site" " Soluble expression, purification and characterization of the full length IS2 Transposase: Leslie A Lewis, Mekbib Astatke, Peter T Umekubo, Shaheen Alvi, Robert Saby & Jehan Afrose: Mobile DNA volume 2, Article number: 14 (2011): Abstract: Background: The two-step transposition pathway of insertion sequences of the IS3 family, and several other families, involves first the formation of a branched figure-of-eight (F-8) structure by an asymmetric single strand cleavage at one optional donor end and joining to the flanking host DNA near the target end. Its conversion to a double stranded minicircle precedes the second insertional step, where both ends function as donors. In IS2, the left end, which lacks donor function in Step I, acquires it in Step II. 
The assembly of two intrinsically different protein-DNA complexes in these F-8-generating elements has been intuitively proposed, but a barrier to testing this hypothesis has been the difficulty of isolating a full length, soluble and active transposase that creates fully formed synaptic complexes in vitro with protein bound to both binding and catalytic domains of the ends. We address here a solution to expressing, purifying and structurally analyzing such a protein.: Results: A soluble and active IS2 transposase derivative with GFP fused to its C-terminus functions as efficiently as the native protein in in vivo transposition assays. In vitro electrophoretic mobility shift assay data show that the partially purified protein prepared under native conditions binds very efficiently to cognate DNA, utilizing both N- and C-terminal residues. As a precursor to biophysical analyses of these complexes, a fluorescence-based random mutagenesis protocol was developed that enabled a structure-function analysis of the protein with good resolution at the secondary structure level. The results extend previous structure-function work on IS3 family transposases, identifying the binding domain as a three-helix H + HTH bundle and explaining the function of an atypical leucine zipper-like motif in IS2. In addition, gain- and loss-of-function mutations in the catalytic active site define its role in regional and global binding and identify functional signatures that are common to the three-dimensional catalytic core motif of the retroviral integrase superfamily.: Conclusions: Intractably insoluble transposases, such as the IS2 transposase, prepared by solubilization protocols are often refractory to whole protein structure-function studies. 
The results described here have validated the use of GFP-tagging and fluorescence-based random mutagenesis in overcoming this limitation at the secondary structure level.: Background: IS2, a 1.3 kb transposable element, is a member of the IS3 family, the largest and most widespread family of insertion sequences (IS) ([1, 2]; see also ISfinder: http://www-is.biotoul.fr/is.html). These insertion sequences are characterized by imperfect terminal inverted repeats, the right (IRR) and left (IRL) ends, that flank an internal protein-coding sequence (Figure 1a). This coding sequence comprises two overlapping open reading frames, OrfA and OrfB, with OrfB in the -1 frame relative to OrfA (Figure 1a, i), and in IS2 it is regulated by a weak extended -10 (E-10) promoter (Figure 1b, ii). Within the overlap, a ribosomal slippage window [3, 4], characterized in IS2 by an A6G motif (Figure 1a, i), enables translational frameshifting that produces the functional transposase (TPase), OrfAB, at low frequency; an A7G mutation (Figure 1a, ii) has permitted the production of an engineered, frame-fused OrfAB as the principal translation product [5, 6]. The ends of these elements are bipartite structures (Figure 1b, upper) with an internal protein-binding domain and an outer catalytic domain (CD) [7, 8]; in most cases the CD terminates with a CA-3' dinucleotide, the essential substrate for the cleavage and joining (donor function) reactions (see [9]). In IS2, IRL instead terminates with a TA-3' dinucleotide, which creates a functional Pribnow box for a minicircle junction promoter (see below).: Organization of the IS2 insertion sequence and its transposition pathway. (A) Wild-type IS2 with left and right inverted repeats (IRL, blue; IRR, red) and the two overlapping open reading frames, orfA and orfB, expanded to show the detail of the A6G slippery codon window, which restricts OrfAB formation to low levels (i). High levels of the transposase (TPase) are produced by altering the window to A7G (ii). (B) Upper. 
Aligned sequences of IRR and IRL ((i) and (ii)) with the binding domains (yellow) and color-coded catalytic domains. Conserved residues are in uppercase and diverged residues are in lowercase. The catalytic domain (CD) of IRL contains an additional G/C base pair that is essential for its role in target function [7]. The E-10 promoter, PIRL [19] (ii), drives the events of Step I of the transposition pathway [6], resulting in the formation of the minicircle shown in panel C. Lower: Abutted ends at the minicircle junction (MCJ) form a more powerful promoter (Pjunc) that is indispensable for controlling the events of Step II of the transposition pathway. The only functional form of Pjunc contains a single base pair spacer (x) between the abutted ends, which creates the mandatory 17 bp spacer. (C) The two-step transposition pathway of IS2. Step I (I) occurs in the TPase-DNA complex, the synaptic complex I (SC I). Asymmetric single-strand cleavage of the active IRR donor is followed by strand transfer to the donor-inactive IRL target end, creating the figure-of-eight structure. Host replication mechanisms (HR) convert it into a covalently closed, double-stranded circular intermediate [10], the minicircle. In Step II (II), a second synaptic complex (SC II) is assembled. Cleavages at the abutted CDs result in two exposed 3'OH groups, which carry out transesterification attacks on the target DNA. CD: catalytic domain; E-10: extended -10 promoter; IRR/IRL: right and left inverted repeats; IS: insertion sequence; MCJ: minicircle junction; orf: open reading frame; SC: synaptic complex.: Transposition mechanisms initially discovered in the IS3 family (see [2]) have been described as a two-step copy-and-paste pathway [10], which is now known to be widespread and is found in several other families of insertion sequences, such as IS30, IS21 and IS256 [11–14]. 
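The -1 translational frameshift at the A6G window described in the Background, and the effect of the engineered A7G change, can be illustrated with a minimal Python sketch. This is a simplified model under stated assumptions. The toy sequences are hypothetical and are not IS2 DNA, and the slip point is chosen by hand. The ribosome is modelled as reading the zero frame up to that point, stepping back one nucleotide, and continuing in the -1 frame. The sketch shows three things. Normal reading stops at an OrfA-like stop codon. A -1 slip yields a longer fused product, the OrfAB analogue. Adding one A to the window (A6G to A7G) puts the downstream frame in register with no slip at all.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order; '*' marks a stop codon.
BASES = 'TCAG'
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {''.join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


# Translate seq from index `start` until the first stop codon (stop not included).
def translate(seq: str, start: int = 0) -> str:
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == '*':
            break
        protein.append(aa)
    return ''.join(protein)


# Simplified -1 frameshift: read the zero frame up to slip_at, then step back
# one nucleotide and continue reading in the -1 frame until a stop codon.
def translate_with_minus1_shift(seq: str, slip_at: int) -> str:
    if slip_at % 3 != 0:
        raise ValueError('slip_at must fall on a zero-frame codon boundary')
    upstream = translate(seq[:slip_at])
    downstream = translate(seq, start=slip_at - 1)
    return upstream + downstream


# Hypothetical toy sequences (not IS2 DNA): an A6G window (AAAAAAG) at
# positions 6-12, and the same sequence with one extra A in the window (A7G).
TOY_A6G = 'ATGGCT' + 'AAAAAAG' + 'GCTAAGCTGA'
TOY_A7G = 'ATGGCT' + 'AAAAAAAG' + 'GCTAAGCTGA'

print(translate(TOY_A6G))                        # MAKKG   (zero frame: OrfA analogue)
print(translate_with_minus1_shift(TOY_A6G, 12))  # MAKKRLS (-1 slip: OrfAB analogue)
print(translate(TOY_A7G))                        # MAKKRLS (A7G: fused without a slip)
```

Real ribosomal slippage occurs only at low frequency, which is why wild-type IS2 makes little OrfAB. The sketch models only the change of reading frame, not its efficiency.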
In the IS3 family members IS911 [8, 15] and IS2 (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted), Step I occurs within a synaptic complex (SC) or transpososome (Figure 1c, SC I) that is formed when the TPase binds to the two ends. In general, however, in these circle-forming elements the first step involves a circularization process (Figure 1c) in which either end (optionally) is the substrate for an asymmetric cleavage reaction that leads to a donor-to-target intrastrand joining reaction near the other end to form a branched figure-of-eight (F-8) structure [6, 16–18]. Host replication mechanisms [10] convert the F-8 into a covalently closed, double-stranded minicircle (Figure 1c, HR) with the abutted ends generally separated by one or more base pairs derived from the host DNA flanking the target end. These abutted ends constitute the minicircle junction (MCJ), at which a powerful promoter (Figure 1b, lower; Pjunc [19–21]) is assembled that generates the higher levels of TPase needed for the formation of the second synaptic complex (Figure 1c, SC II).: In SC II, the MCJ, a reactive junction, is the substrate for strand transfer reactions; it is cleaved at the abutted termini of IRR and IRL, creating 3'OH groups that permit both ends to function symmetrically as donors (Figure 1c, Step II). It has therefore been proposed that intrinsically different transpososomes must be assembled at each of the two steps [7, 8]. This is particularly true for IS2. Although both right and left ends in other IS3 family elements, such as IS911 [16], IS3 [22] and IS150 [23], possess donor function in Step I reactions, in IS2 the right end is the exclusive donor and the left end the only functional target; this type of asymmetry has also been described for copies of IS256 in Tn4001 [13]. 
In IS2, the left end has evolved, through altered residues at positions 2 (creating a TA-3' terminal dinucleotide), 5 and 7 and an additional base pair at position 9 of its catalytic domain (Figure 1b, upper), into a unique target that ensures accuracy of the joining reaction through the insertion of a single base pair between the abutted ends [7]. This accuracy is essential for the formation of an MCJ with the mandatory 17 bp Pjunc spacer between the -10 Pribnow box and an outwardly reading -35 motif in the right end [19]. Despite these changes in the catalytic domain of IRL, which suppress donor function in Step I, IRL retains the donor function [19] needed for strand transfer to the target site in the Step II SC.: IS3 family TPases have been identified as members of the TPase/retroviral integrase superfamily (referred to as RISF) of polynucleotidyl transferases [9, 24–27], and functional comparisons of their protein-DNA interactions with those of other RISF TPases should be useful. To date, a complete and comparative biophysical analysis of the protein-DNA interactions in fully formed Step I and Step II SCs, with protein complexed to both the protein-binding and catalytic domains of the inverted repeats (IRs), has not been reported for any IS3 family member or other circle-forming element. The main obstacle has been the difficulty of isolating full length proteins capable of binding efficiently and generating fully formed complexes with the IRs [8, 28]. Partial footprints of the ends have, however, been obtained with cell-free extracts in IS2 [5], and similar analyses with the truncated N-terminal half of the protein have been reported for IS911 [8, 15, 17] and IS30 [29]. 
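The spacer arithmetic behind the requirement for a single junction base pair can be written out explicitly. Let $s$ be the Pjunc spacer between the -35 motif and the -10 Pribnow box. Let $d$ be the base pairs contributed to that spacer by the abutted IRR and IRL ends, and let $x$ be the number of base pairs inserted between the abutted ends. The sketch assumes that $d$ is fixed by the end sequences, so only $x$ varies. It restates the constraint described above and is not an additional result.

```latex
\begin{aligned}
s &= d + x \\
17\ \text{bp} &= d + 1\ \text{bp} \quad \Rightarrow \quad d = 16\ \text{bp} \\
x = 0\ \text{bp} &\;\Rightarrow\; s = 16\ \text{bp}, \qquad x = 2\ \text{bp} \;\Rightarrow\; s = 18\ \text{bp}
\end{aligned}
```

Because $d$ is set by the ends, a junction of zero or two base pairs moves the spacer off 17 bp. This is why the accurate single base pair insertion directed by the IRL target end is essential.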
In order to carry out a detailed biophysical study with fully formed complexes in IS2, it was first necessary to resolve the problem of the intractable insolubility of the TPase.: We report here a protocol utilizing a green fluorescent protein (GFPuv) tag that generates an IS2 TPase derivative that functions normally in vivo. We show for the first time that preparation under native conditions results in the recovery of a full length, soluble derivative that, when partially purified, binds very efficiently to cognate DNA sequences in vitro. This binding utilizes residues at both the N- and C-termini of the protein and is shown elsewhere to generate fully formed SCs with double-stranded cognate IRR, IRL and MCJ sequences, with TPase bound to both the protein-binding and catalytic domains of the ends (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted).: Aspects of the structure-function relationships of the IS2 and IS911 TPases have been reported previously [30–34]. Using the GFP-tagged TPase derivative, we show here that gain- and loss-of-function mutations are readily recovered in all of the principal domains of the protein (for examples, see Table 1), and we have used them to confirm, extend and further refine these structure-function relationships in IS2 and other IS3 family TPases. In addition, we have been able to describe the role of a residue whose mutation appears to have consequences primarily beyond its own domain. Specifically, an N-terminal three-helix (H + HTH) bundle constitutes a binding domain. Its architecture includes the HTH motif in helices 2 and 3, and at least one residue in helix 3 appears to play a more global role by affecting cleavage reactions in the catalytic active site (CAS). 
Adjacent to this is an atypical leucine zipper-like motif, null mutations in which have allowed us to decipher its mode of function in oligomerization and binding. Within the C-terminal half of the protein, a middle domain is located adjacent to a 5α helix/5β strand secondary structure motif, the CAS, which is highly conserved in the RISF. Gain- and loss-of-function mutations in this latter domain help describe its role in regional binding (that is, binding to the catalytic domain of the ends; Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted) and in global binding of the protein. Equally importantly, they support the supposition that, at the tertiary level, the organization and function of the CAS are similar to those of the three-dimensional α/β/α catalytic core motif of proteins of the RISF.: Results: Purification of the IS2 TPase by conventional methods: Conventional methods for purifying active full length IS2 TPase under native conditions generated insoluble protein in inclusion bodies. Standard solubilization protocols [35–37] and attempts at directed evolution [38] were unsuccessful. The protein was, however, easily purified to homogeneity using denaturing protocols and refolded either on-column [39, 40] or in solution [41–43] in native buffers. In all cases, these TPase preparations bound very poorly to oligonucleotide substrates containing the cognate IRR DNA sequence in gel-retardation studies (for example, see Figure 5a, lane 2).: Creation of an IS2orfAB::GFP fusion construct: Fusion of the GFPuv gene to the C-terminus, but not the N-terminus, of IS2orfAB generated a soluble fusion product under native conditions (see Methods). In brief, IS2orfAB was cloned into pGLO-ATG2 (Figure 2a), a modified version of the commercially available pGLO plasmid. 
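A C-terminal fusion such as IS2orfAB::GFP yields the fusion protein only if two conditions hold. The orfAB stop codon must have been removed, and the cassette junction must keep orfAB and GFP in one reading frame. The Python sketch below checks both conditions for a cassette that runs from an Eco RI site (GAATTC) into a junction at an Nhe I site (GCTAGC). All sequences are short hypothetical stand-ins, not the pLL2522 sequence. The function names `find_site` and `is_in_frame_fusion` are illustrative only.

```python
NHE_I = 'GCTAGC'  # Nhe I recognition site (fusion junction)
ECO_RI = 'GAATTC'  # Eco RI recognition site (5' end of the cassette)
STOP_CODONS = {'TAA', 'TAG', 'TGA'}


# Return the index of the first occurrence of `site`, or -1 if it is absent.
def find_site(seq: str, site: str) -> int:
    return seq.find(site)


# True if the Nhe I junction lies in the reading frame that starts at
# orf_start and no stop codon occurs in that frame before the junction.
def is_in_frame_fusion(construct: str, orf_start: int) -> bool:
    junction = find_site(construct, NHE_I)
    if junction < orf_start or (junction - orf_start) % 3 != 0:
        return False  # site missing or junction out of frame with the ORF
    codons = (construct[i:i + 3] for i in range(orf_start, junction, 3))
    return not any(codon in STOP_CODONS for codon in codons)


# Hypothetical cassettes: Eco RI site, toy ORF, Nhe I junction, reporter start.
REPORTER = 'AAAGGA'
good = ECO_RI + 'ATGAAACGT' + NHE_I + REPORTER          # stop removed, in frame
stop_kept = ECO_RI + 'ATGAAACGTTAA' + NHE_I + REPORTER  # ORF stop codon retained
frame_off = ECO_RI + 'ATGAAACG' + NHE_I + REPORTER      # 8 nt ORF: out of frame

for name, seq in [('good', good), ('stop_kept', stop_kept), ('frame_off', frame_off)]:
    print(name, is_in_frame_fusion(seq, orf_start=len(ECO_RI)))
```

The check uses the first Nhe I occurrence, so a real construct with an internal Nhe I site would need the junction position supplied explicitly.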
The strategy was to clone an Eco RI-Nhe I cassetted version of IS2orfAB (Figure 2d) into the cloning sites created at the 5' end of the GFP gene to generate pLL2522 (IS2orfAB::GFP clones; Figure 2e). The resulting slow-growing colonies fluoresced much less intensely than control colonies carrying only the pGLO plasmid (Figure 3a).: Structure of plasmids used to create the IS2orfAB::GFP fusion construct. Modifications and alterations are indicated in red. (a) pGLO-ATG2, a derivative of the commercially available pGLO plasmid (Biotechnology Explorer GFP Chromatography kit, Bio-Rad Inc., Hercules, CA, USA) containing the GFPuv gene under the control of the PBAD promoter. An Eco RI-Nhe I cassetting site was created in the 5' multiple cloning site (MCS) to facilitate the cloning of the IS2orfAB fused-frame gene. A unique Eco RI site was deleted from its position adjacent to the GFP stop codon and transferred to a position downstream of the PBAD promoter and 9 bp from an existing Nhe I site, which encodes the first two amino acids of GFP. The mutagenizing primer for this last step also deleted the GFP start codon to create pGLO-ATG2. (b) pLL18, a pUC19 derivative with IS2 carrying the Kmr reporter gene [6]. IS2 in this construct contains the engineered orfAB gene described in Figure 1a (ii). (c) pLL2509A was created by removing the left inverted repeat and repositioning the existing Eco RI site to a location downstream of the PIRL promoter, effectively excluding this endogenous IS2 promoter from subsequent cloning of the cassetted orfAB gene. (d) pLL2521HK was created by successively adding (i) the 3'-located cassetting Nhe I site, which included removal of the orfAB stop codon, and (ii) the 6xHis tag, downstream of the Eco RI cassetting site. (e) pLL2522 was formed when the Nhe I-Eco RI cassetted orfAB (part d) was cloned into the corresponding 5' cloning site of pGLO-ATG2 (part a). 
bp: base pair; GFP: green fluorescent protein; IS: insertion sequence.: Comparative growth and fluorescence of colonies with the pGLO, pLL2522 and pLL2524-XXX plasmids. (A) Contrasting growth patterns of colonies of E. coli XL1-Blue cells (Stratagene Inc.) transformed with (a) the pGLO plasmid and (b) the pLL2522 (IS2orfAB::GFP) plasmid. Cells were plated on lysogeny broth (LB) plus carbenicillin and arabinose, incubated at 37°C for 48 hours and irradiated with UV light. (B) XL1-Blue cells transformed with ligation products generated by cloning PCR products from GeneMorph II random mutagenesis of IS2orfAB DNA into the Eco RI/Nhe I sites of pGLO-ATG2. Colonies were generated as described above and viewed after 72 hours at 37°C. Arrows identify the faster-growing, more brightly fluorescing colonies, the vast majority of which contained plasmids pLL2524-XXX (IS2orfAB::GFP-GMF) with loss-of-function mutations in the orfAB gene. Isolated colonies at the periphery of the Petri dish (see white asterisk) occasionally produced false positives without mutations or with silent mutations, for example, A42T. PCR: polymerase chain reaction.: Overexpression of the putative IS2OrfAB-GFP fusion protein: We assumed that fluorescence in colonies with the pLL2522 plasmid indicated a soluble fusion protein. The supposition that the diminished fluorescence (see below) was not due to partial solubility of the protein [44] was confirmed by the bright fluorescence of the supernatant after a standard native lysis procedure. Partial purification (see Methods) yielded two prominent bands on SDS-PAGE analysis of these isolates (arrows; Figure 4a, lanes 2-4 and 4b, lane 2) that were absent from the control pGLO (Figure 4b, lane 1) and pGLO-ATG2 (Figure 4b, lane 3) preparations. 
These were determined to be the 74 kDa fusion protein (comprising the 46 kDa IS2 OrfAB TPase and the 27 kDa GFP) and the 17.5 kDa OrfA protein, the product of ribosomal frameshifting [3, 4]. The 74 kDa protein was also expressed from plasmid pTW2orfAB::GFP, in which orfAB::GFP was cloned into a pTWIN2 vector (IMPACT; New England Biolabs, Ipswich, MA). In this case it was easily purified to near homogeneity using the manufacturer's protocol, followed by an ion exchange Q-Sepharose polishing step (HiTrap Q XL, GE Healthcare, Piscataway, NJ; Figure 4c).: 12% SDS-PAGE analysis of proteins prepared under native conditions. (A) Analysis of fluorometrically determined peak fractions from Ni-NTA gravity-flow affinity chromatography purification of the 6xHis-tagged OrfAB-GFP. Lanes: 1. Prestained protein molecular weight markers (New England Biolabs). 2-4. Partial purification of the 74 kDa His-tagged OrfAB-GFP fusion protein (upper arrow) from cells with the pLL2522 plasmid. The lower arrow identifies the 17.5 kDa OrfA protein generated by programmed -1 translational frameshifting. These lanes represent peak fractions (determined fluorometrically) that were run before pooling. (B) Analysis of the pooled fractions in part (A) following concentration and dialysis (see Methods). Lanes: 1. Hydrophobic interaction chromatography purification of the 27 kDa GFP from cells with the pGLO plasmid. 2. Pooled fractions from the purification protocol. 3. Protein preparation from the pGLO-ATG2 control plasmid. 4. Prestained protein molecular weight markers. (C) Purification of the 74 kDa OrfAB-GFP fusion protein to near homogeneity with the IMPACT system (New England Biolabs) from overexpression of the fused orfAB::GFP genes cloned into the pTWIN2 vector. The eluted protein was subjected to a polishing step on an ion exchange HiTrap Q Sepharose column (GE Healthcare Biosciences). 
GFP: green fluorescent protein; kDa: kilodaltons; orf: open reading frame.: Electrophoretic mobility shift assays using purified and partially purified preparations of the IS2 OrfAB-GFP fusion protein. (A) The purified OrfAB-GFP fusion protein preparation shown in Figure 4c and the purified native protein from refolding experiments were used in gel retardation reactions. 0.46 µM of the fusion protein and 6.02 µg of the refolded protein were reacted for 30 minutes at room temperature (20°C) with 2 nM of 32P-labeled annealed 87-mer oligonucleotides containing the 41 bp inverted right repeat sequence. The reactions were run at 4°C at 120 mA for 2400 Vhr in a 5% native polyacrylamide gel. The arrow shows complexes formed with low efficiency. Lanes: 1. Protein-free control. 2. Refolded native OrfAB. 3. OrfAB-GFP. (B) Partially purified preparations of the OrfAB-GFP fusion protein shown in Figure 4a and crude extracts from overexpression of the pTW2 OrfAB-GFP construct used in binding reactions. Approximately 80 nM of protein from the partially purified preparations shown in Figure 4a and from the crude extracts was reacted with 2 nM of the 32P-labeled annealed 87-mer oligonucleotides as described in part A. The reactions were run for 1400 Vhr at 4°C. Lanes: 1. Protein-free control. 2. Partially purified preparation of OrfAB-GFP. 3. Crude extract from the overexpressed pTW2 OrfAB-GFP plasmid. bp: base pairs; GFP: green fluorescent protein; orf: open reading frame; Vhr: volt-hour.: Electrophoretic mobility shift assays with IS2 OrfAB-GFP: Preparations of the OrfAB-GFP fusion protein purified to near homogeneity also bound poorly to cognate DNA sequences in gel retardation assays (Figure 5a, lane 3). Neither OrfA nor host factors, such as the bacterial histone-like protein HU and integration host factor [45–47], enhanced binding efficiency (data not shown). 
By contrast, the partially purified preparations of OrfAB-GFP shown in Figure 4a, lanes 2-4, drove all of the DNA into the complex (Figure 5b, lane 2). A similar result was obtained with the crude extract from the overexpression of pTW2orfAB::GFP (Figure 5b, lane 3). The multimeric nature of these complexes has been demonstrated in concurrent footprinting studies, in which complexes similar to those shown in Figure 5b were formed with MCJ DNA substrates containing abutted IRR and IRL ends. In those studies, the protein binding domains and the catalytic domains of both ends were protected along their entire lengths, suggesting that the complex consisted of at least a dimer (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted).

Fluorescence levels can be used to isolate IS2 TPase loss-of-function mutants leading to a structure-function analysis of the protein

We asked whether loss-of-function mutants of the IS2 TPase could be isolated as faster-growing, more brightly fluorescing colonies. This approach tested the idea that the low fluorescence of slow-growing colonies carrying the pLL2522 plasmid might be due to the toxicity of the fusion protein, and it also offered the possibility of obtaining and analyzing random mutations along the entire length of the protein. Random mutagenesis of IS2orfAB was carried out with the PCR-based GeneMorph II Random Mutagenesis kit (Stratagene, Santa Clara, CA) at very low, low and medium mutation rates. PCR products were cloned into the EcoRI/NheI sites of pGLO-ATG2, and the ligation products were transformed into XL1-Blue cells (Stratagene). After 72 hours at 37°C, faster-growing, more brightly fluorescing colonies were observed against a background of less intensely fluorescing colonies (Figure 3b).
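A common first approximation for error-prone PCR libraries such as the one described above treats the number of mutations per amplicon as Poisson-distributed. The minimal sketch below assumes that model; the mean values assigned to the "very low", "low" and "medium" settings are hypothetical placeholders, not the rates specified by the GeneMorph II protocol or measured in this study. It illustrates why lower mutation rates enrich for clones carrying a single substitution, which are the simplest to interpret in a structure-function analysis.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a clone carries exactly k mutations (Poisson model)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def single_mutant_fraction(mean: float) -> float:
    """Among clones with at least one mutation, the fraction carrying exactly one.

    Under the Poisson assumption this equals mean / (exp(mean) - 1).
    """
    if mean <= 0:
        raise ValueError("mean number of mutations must be positive")
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


# Hypothetical mean mutations per orfAB amplicon for each kit setting;
# illustrative only, not taken from the kit documentation or this study.
HYPOTHETICAL_MEANS = {"very low": 0.5, "low": 1.0, "medium": 2.0}

if __name__ == "__main__":
    for setting, mean in HYPOTHETICAL_MEANS.items():
        unmutated = poisson_pmf(0, mean)
        single = single_mutant_fraction(mean)
        print(f"{setting:>8}: {unmutated:.2f} unmutated, "
              f"{single:.2f} of mutated clones carry one mutation")
```

Raising the mean reduces the share of unmutated clones but also lowers the fraction of mutants that carry only one substitution.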
Recovery and analysis of the plasmids pLL2524-XXX (that is, 001-110) from these brighter fluorescing isolates (referred to here as GMF strains 1-110) showed that they carried mutations at frequencies corresponding to the protocol-based mutation rates.

From the 110 brightly fluorescing colonies isolated, twenty-one orfAB sequences containing single mutations and two with interesting double mutations were analyzed for the nature of their amino acid substitutions (Table 1) and for the corresponding effect of the substitutions on transposition frequencies (Table 2), as determined by a lacZ papillation assay [48]. In addition, the relative binding efficiencies of the TPases from 22 of the 23 mutants to the cognate IRR DNA sequence were determined on electrophoretic mobility shift assay (EMSA) gels (Figure 6 and Tables 1 and 2).

Figure 6. Electrophoretic mobility shift assays. Binding efficiencies of the IS2 OrfAB transposase derivatives from 22 randomly induced mutants. Reactions were carried out for 30 minutes at 20°C with 10 nM of 32P-labeled annealed 50-mer oligonucleotides (except where stated in part (f) below) containing the inverted right repeat sequence and 0.11 µM of the partially purified mutant or wild type IS2 OrfAB-GFP protein derivatives (see Methods). Domain locations of the substitutions are color-coded and identified by a single letter code: binding domain (B), yellow; leucine zipper-like (L), blue; catalytic active site (C), green; and middle interval (M), orange. Reactions were separated on 5% native polyacrylamide gels at 4°C at 120 mA as follows: (a) 450 Vhr; (b) 420 Vhr; (c) 300 Vhr; (d) 450 Vhr; (e) 450 Vhr; (f) 12% native PAGE for 300 Vhr using 87-mer annealed oligonucleotides. Binding efficiencies are scored as follows: 5 = identical to that of the wild type, that is, no dissociation of the complex.
4.5 = a slight loss of compactness of the undissociated complex seen in the wild type control. 4.0 = as in 4.5 but with a faster migrating tail of dissociated complexes. 3.5 = as in 4.0 but with a more prominent faster migrating tail of dissociated complexes. 3.0 = significant loss of compactness of the complex with a small amount of uncomplexed DNA. 2.5 = as in 3.0 but with significantly more uncomplexed DNA. 2.0 = as in 3.0 but mostly composed of uncomplexed DNA. 0.5 = mostly composed of uncomplexed DNA with a small tail of dissociated complex. 0 = no complex formation, identical to that of the protein-free controls (lane 1 in each panel) or the GFP control (part a, lane 10). Double mutations are indicated within rectangular boxes. For GMF 18, the operative mutation, L97H, is shown in red (gel c, lane 4). GFP: green fluorescent protein; orf: open reading frame; Vhr: volt hour.

Sequence analysis of the wild type IS2 TPase and secondary structure analysis of the IS3 family TPases

The wild type IS2orfAB DNA sequence and those of five other members (IS861, IS3, IS911, IS407 and IS51) of the five principal sub-groups of the IS3 family [1, 30] were translated into protein sequences using the ExPASy SWISS PROT translation toolkit [49]. These sequences were aligned using the ClustalW2 multiple sequence alignment tool [50], producing many groups of short aligned sequences (Figure 7), which were then analyzed for their secondary structure (Figure 8) using the Protein Structure Prediction (PSIPRED) Server [51]. Figure 7 merges the sequence alignment data with the secondary structure data for IS2 and reveals a pattern that is essentially conserved in all five principal subgroups of the IS3 family (data not shown).

Figure 7. Alignment of OrfAB sequences from IS3-family sub-groups correlated with secondary structure data of IS2.
Sequences (from top to bottom: IS861 (IS150 subgroup), IS3, IS911 (IS3 subgroup), IS2, IS407 and IS51) were aligned using the ClustalW2 multiple sequence alignment tool [50]. Coordinates above the sequences are those of IS2. Amino acid groups are color coded as follows: red, acidic residues; blue, basic residues; green, non-polar hydrophobics; cyan, aromatics (Y and F); dark green, tryptophan; gray, proline; light purple, amides; blue-gray, small polar; aquamarine, small non-polar; ochre, glycine; magenta, histidine; and brown, cysteine. Secondary structure elements for IS2 (green cylinders for α helices and red arrows for β strands) were determined by the Protein Structure Prediction protocol (see Figure 8) and are shown above the sequences. For the N-terminus of the protein they are labeled B α1-3 (putative binding domain), the putative leucine zipper-like domain and the middle interval (elements M α1-7). In the C-terminal half of the sequences, elements of a putative catalytic active site motif are labeled C β1-5 and C α1-6. IS: insertion sequence.

Figure 8. Secondary structure elements of the IS2 OrfAB TPase. Elements were generated by the Protein Structure Prediction server [51]. The transposase (TPase) sequence has been color coded to identify the four putative domains: binding (yellow), oligomerization (leucine zipper-like; blue), a middle interval (orange) and the catalytic active site (CAS; green). The numbering of α helix 7 in the middle interval reflects the alignment of the six principal α helices found in the IS3 family (Figure 10a). The numbering of α helices 2 and 3 in the CAS reflects the organization of the aligned elements in TPase and integrase sequences of the TPase/retroviral integrase superfamily (Figure 9c). Vertical arrows and substituted amino acids identify the locations of the 23 substitutions within the secondary structures of the IS2 TPase.
CAS: catalytic active site; TPase: transposase.

Although DNA binding domains in TPases have long been located at their N-termini [52], and an HTH motif in the IS911 TPase of the IS3 family has been confirmed experimentally by Rousseau et al. [34], the precise secondary structure of all the elements that contribute to the three-dimensional architecture of the binding domain in this family, and specifically in IS2, has not been demonstrated (see [5, 33]). We asked whether the three N-terminal α helices might comprise such a binding domain in IS2. Using the PSIPRED server [53] (Figure 9a) and the PHD secondary structure analysis algorithm of the Pole Bioinformatique Lyonnais (PBIL) [54, 55], we arrived at a consensus that three α helices forming a putative binding domain in the IS2 TPase lie between residues 13 and 55 (Figure 9b). In addition, a PBIL HTH Determination Algorithm based on the protocol of Dodd and Egan [56] detected an HTH motif at residues 30-51 (Figure 9c), corresponding approximately to helices 2 and 3 in Figures 8, 9a and 9b. Similar predictions have been made for an HTH motif in IS2 (residues 31-50) [5, 33] and in the IS3 family (including IS2, residues 30-55) [34], the latter study assuming that a third N-terminal helix might form part of the binding domain. In this study, using randomly recovered mutations, we show that at the secondary structure level the binding domain of the IS2 TPase consists of a three-helix H + HTH bundle, and we provide evidence for the precise locations of the three helices.

Figure 9. Secondary structure predictions for the first 120 amino acids of the IS2 OrfAB TPase. (A) Comparison of secondary structure predictions based on the Protein Structure Prediction server protocol [51] and the PROF Secondary Structure Protocol [53]. The PCOILS analysis for coiled coils [57, 58] is also shown.
Disordered regions (D) determined by the VSL2 predictor package from the DisProt database [111, 112] correspond well with these secondary structure predictions. (B) Secondary structure analysis of the first 60 amino acids of the IS2 TPase generated by the Pole Bioinformatique Lyonnais [54] PHD Secondary Structure Analysis algorithm [55]. H/h = alpha helix; C/c = random coil; e = extended strand. (C) Identification of a putative HTH motif in the first 60 amino acids of the IS2 TPase generated by the Pole Bioinformatique Lyonnais HTH Determination Algorithm of Dodd and Egan [56]. TPase: transposase.

A PCOILS analysis for coiled coils [57, 58] predicted a coiled coil motif in the IS2 TPase between residues 73 and 100 (Figure 9a). Lei and Hu [33], using deletion derivatives of IS2 OrfA, showed that a sequence between residues 58 and 105 was responsible for dimerization, and both they and Haren et al. [30] predicted that the sequence between residues 73 and 100 of IS2 OrfA possessed an atypical heptad repeat with some similarity to the canonical leucine zipper (LZ) of DNA binding proteins. In this study, however, the 2ZIP server [59] scored the potential for an LZ within the first 120 residues of IS2 OrfAB at a probability of zero, even though the existence of a coiled coil domain between residues 73 and 100 was confirmed with a probability of 0.8 to 1.0. Here, using loss-of-function point mutations, we show how this sequence functions as an LZ-like motif and describe its role in the oligomerization, DNA binding and transposition properties of the IS2 TPase.

The alignment corresponding to IS2 residues 103 to 400 in Figure 7 matches that previously published for the IS3 family TPases and the retroviral integrases [60], as well as the alignment published for residues 236-354 of the IS3, IS4 and IS6-family TPases and the integrases of several retroelements [61].
The latter sequence, the CAS, is characterized by an invariant triad of catalytic carboxylates, the D, D(35)E motif [9, 27, 62, 63]. We asked how closely the aligned residues 101 to 400 in Figure 7 correlate with a structure-based alignment of the α helices and β strands generated by the PSIPRED analysis in Figure 8; that is, how similar these elements are in sequence and length in the IS3 family TPases and in the HIV-1 and Rous sarcoma virus (RSV) integrases.

Of the six α helices in the middle interval (residues 105 to 210 of IS2), only α helices 2, 5 and 6 were well aligned across all six TPases of the IS3 family sub-groups (Figure 10a). Only α helices 4, 5 and 6 in the IS3 family, located just upstream of the CAS (Figure 8), aligned with the NH2-terminal α helices of the integrases.

Figure 10. Structure-based alignments of middle interval and catalytic active site elements of IS3-family transposases and HIV-1 and Rous sarcoma virus integrases. (A) α helices identified in the middle interval of the IS2 transposase (TPase) and the corresponding sequences of five other members of the principal sub-groups in the IS3 family. Where applicable, the sequences of corresponding elements in the Rous sarcoma virus (RSV) and HIV-1 integrases were also aligned (red lettering). All coordinates are those of IS2. Functionally conserved non-polar hydrophobic residues are highlighted in yellow and identified as h1 and h2 (see Methods, alignment tools). Functionally conserved basic residues (b) are highlighted in blue. NA = no alignments identified in the integrases of RSV and HIV-1. (B) α helices and β strands in the catalytic active sites (CASs) of the TPases of IS2, five other IS3 family members, and the integrases of RSV and HIV-1 (red lettering). Functionally conserved hydrophobic and basic residues are identified as described in part A.
In addition, functionally conserved acidic residues or their amides (a) are highlighted in purple, non-polar aromatics (aro) in green, polar serines and/or threonines (p) in orange and prolines (pro) in mauve. DDE residues are indicated by large black dots. Sequences in parentheses are not components of the α helices or β strands. In the TPases, α helix 2 (2+3) aligns with helices 2 and 3 in the integrases. Residues conserved in α helix 2 of the integrases and in its remnants in IS407 are enclosed in a black rectangle. Large double asterisks indicate short α helices with no homology to other sequences (see part C graphic). Substitutions are indicated by red ovals; twin ovals indicate A341P and A341T. (C) Graphic alignment of α helices and β strands of the CASs of the TPases of IS2 and five other members of the IS3 family and of the integrases of RSV and HIV-1. Black dots within the elements represent the positions of the DDE triad. DDE: the catalytic triad of two aspartates and a glutamate; CAS: catalytic active site; IS: insertion sequence; RSV: Rous sarcoma virus; TPase: transposase.

Structure-based alignments of the sequences corresponding to IS2 residues 236 to 398 in the IS3 family TPases and the HIV-1 and RSV integrases revealed five well-aligned α helices and five equally well-aligned β strands (Figure 10b). These elements are almost perfectly conserved in length, show high levels of identity (the same amino acid in at least 85% of the eight sequences) and contain high proportions of functionally conserved residues per element (approximately 50% in the β strands and 25% in the α helices).
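The identity criterion above (the same amino acid in at least 85% of the eight aligned sequences) means that at least seven of the eight sequences must share a residue, since 7/8 = 0.875 while 6/8 = 0.75. The sketch below applies that criterion column by column. The toy alignment is invented for illustration and is not the Figure 10b data, and counting gaps as non-matching positions that still count in the denominator is an assumption.

```python
from collections import Counter

IDENTITY_THRESHOLD = 0.85  # same amino acid in >= 85% of the aligned sequences


def column_identity(column: str) -> float:
    """Fraction of sequences sharing the most common residue in one column.

    Gaps ('-') never count as a shared residue, but every sequence,
    gapped or not, counts in the denominator.
    """
    counts = Counter(residue for residue in column if residue != "-")
    if not counts:
        return 0.0
    return counts.most_common(1)[0][1] / len(column)


def identical_columns(alignment: list[str]) -> list[int]:
    """0-based indices of alignment columns that meet the identity threshold."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i
        for i in range(length)
        if column_identity("".join(seq[i] for seq in alignment)) >= IDENTITY_THRESHOLD
    ]


# Toy eight-sequence alignment (hypothetical, for illustration only).
toy_alignment = [
    "DLVE",
    "DLIE",
    "DLVE",
    "DMVE",
    "DLVK",
    "DLIE",
    "DLVE",
    "NLV-",
]

print(identical_columns(toy_alignment))  # -> [0, 1]
```

Columns 0 and 1 pass because seven of eight sequences agree; columns 2 and 3 fail with only six.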
This is significant for the present study because all but one of the eight random mutations recovered in this domain occurred at these conserved residues.

These α helices and β strands occur in a conserved order (Figure 10c) characteristic of the integrases and of the TPases with the DDE motif of two aspartates and a glutamate, for example, Mu [64], Tn5 and the IS1 family [65, 66]. In IS3 family TPases, the counterparts of integrase α helices 2 and 3 are present as a single helix (α helix 2). Interestingly, remnants of integrase α helix 2 are seen in IS2 and IS407, most clearly in IS407 as two well-conserved residues within the first three amino acids of the single α helix (Figure 10b). In IS911 of the IS3 family, this group of tightly conserved elements has been proposed to constitute the putative CAS [2, 24, 34].

The three-dimensional structure of this unit, the catalytic core, has been determined for several members of the TPase/retroviral integrase superfamily (RISF), including the TPases of the DDE family, such as Mu [64] and Tn5 [67]; the integrases of HIV-1 [68–71] and of the avian (ASV) and Rous (RSV) sarcoma viruses [72, 73]; and other nucleases, for example, RNase H1 [74, 75] and RuvC [76]. For comprehensive reviews see [25, 26, 77]. This catalytic core is characterized by a five-stranded, partially buried β sheet of mixed parallel and antiparallel strands with a polar face, with six α helices distributed on both sides of it. The two aspartate residues of the DDE catalytic triad are located on adjacent strands of the β sheet (strands 1 and 4), and the glutamate residue is located on the nearby α helix 4 [78].
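In the D, D(35)E notation, the number in parentheses is the count of residues separating the second aspartate from the glutamate. A minimal check using the IS2 coordinates given elsewhere in this section for the triad (D240, D306 and E342) confirms that IS2 fits the motif; the function names are illustrative only.

```python
def dde_gaps(d1: int, d2: int, e: int) -> tuple[int, int]:
    """Residues strictly between D1 and D2, and strictly between D2 and E."""
    if not d1 < d2 < e:
        raise ValueError("expected residue numbers ordered D1 < D2 < E")
    return d2 - d1 - 1, e - d2 - 1


def dde_label(d1: int, d2: int, e: int) -> str:
    """Format the triad in the conventional D, D(n)E notation."""
    _, d2_to_e = dde_gaps(d1, d2, e)
    return f"D, D({d2_to_e})E"


# DDE triad of the IS2 TPase, residue numbers as cited in the text.
IS2_TRIAD = (240, 306, 342)

print(dde_gaps(*IS2_TRIAD))   # -> (65, 35)
print(dde_label(*IS2_TRIAD))  # -> D, D(35)E
```

The same subtraction places W237 three residues before D240 and A341 immediately before E342.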
We show here that randomly induced mutations in this putative catalytic core that affect residues other than the DDE triad alter the function of the motif in both positive and negative ways. These mutations identify additional signatures characteristic of the catalytic core and support the intuitive contention that, in the IS3 family, the core is organized and functions like its three-dimensional counterpart in the RISF. Additional mutations also provide insights into its role in both the regional and the global binding strategies of the protein.

Effect of TPase mutations on TPase binding efficiencies and on in vivo transposition frequencies of IS2

Eleven of the twenty-five mutations (from the twenty-one single mutants and two double mutants) were within the putative binding domain, five were located in the coiled coil domain, eight in the putative CAS and one in the middle interval (Table 1; see also Figure 8 for an overview of the locations of these mutations within the secondary structures of the TPase). The binding efficiencies of the partially purified TPases from 22 of the mutants were studied by EMSA (Figure 6) using a pair of annealed oligomers (50 bp in length) containing 41 bp of cognate IRR DNA [6]. The substrate was labeled at the 5' end of the upper strand with γ-32P (see Methods). A summary of the binding efficiencies, together with the in vivo transposition frequencies of all 23 mutants (determined from lacZ transposition assays), is shown in Table 2.

The putative binding domain

Nine mutants with substitutions in the putative binding domain are described in Table 2 (rows 4-12). Binding data are shown in the EMSA gel (Figure 6, yellow highlights). Proteins from three of the mutants, GMF isolates 28 (S44N), 29 (L58I) and 34 (R13H) (Figure 6a, lanes 7-9), formed no complexes, indicative of structural defects.
The TPase from the double mutant GMF 36 (R37Q/S44N; Figure 6b, lane 2), however, formed a partially restored, unstable, dissociated complex that was absent with isolate 28 (S44N). Two GMF isolates, 9 (R50H; Figure 6a, lane 5) and 13 (S57G; Figure 6c, lane 2), also produced proteins that formed mostly dissociated complexes, likely indicating deficiencies in binding to the DNA substrate (see Discussion). All six mutants whose TPases were completely defective or deficient in binding (GMF isolates 9, 13, 28, 29, 34 and 36) had significantly reduced or undetectable levels of transposition (Table 2, rows 5-10). The remaining three mutants with substitutions in the putative binding domain, GMF isolates 4 (A42T; Figure 6c, lane 3), 37 (W49R) and 40 (V35L) (Figure 6b, lanes 3 and 5), showed marginal or no observable effects on binding efficiency. Two of these three mutants, GMF isolates 4 (A42T) and 40 (V35L) (Table 2, rows 4 and 12), had in vivo transposition frequencies (approximately 1.3) that were statistically comparable to those of the wild type controls. The two wild type controls, one fused to GFP (Table 2, row 3) and the other not (Table 2, row 2), showed identical transposition frequencies within experimental error.

The third mutant with little or no loss of binding efficiency, GMF 37 (W49R), was the single exception to the consistent relationship between binding efficiency and transposition frequency described above (Table 2, row 11). Although this TPase derivative bound the substrate proficiently, the substitution completely abolished transposition. This apparent inconsistency in the properties of GMF 37 may be explained if W49 in IS2, one of the most highly conserved residues in the IS3 family (Figure 7 and [34]) and also conserved in the homeodomain proteins [79], plays a more global role in effecting transposition.
Its role may not be limited to binding domain function, and it is unlikely to be involved in DNA sequence recognition in helix 3 (see Discussion).

The abolition of both DNA binding and in vivo transposition in R13H and L58I (Table 2, rows 8 and 9), and the significant reduction in transposition frequency and binding in S57G (Table 2, row 6), suggest that the binding domain is a three-helix bundle encompassing residues 13 to 58. Furthermore, the ability of the R37Q/S44N double substitution in helices 2 and 3 (Table 2, row 10) to partially restore both the binding and the transposition lacking in S44N suggests that these residues may be involved in hydrogen-bonded stabilization of the two helices in which the HTH motif may be located (see Figure 11 and the Discussion for a complete elaboration of these ideas).

Figure 11. Analysis of the locations and phenotypes of nine randomly induced substitutions in the binding domain of IS2 OrfAB. The location of the three-α-helix bundle that constitutes the binding domain (green cylinders) is based on the prediction of the PBIL PHD Secondary Structure Analysis Algorithm ([55]; see Figure 9b). The sequence in red indicates the prospective HTH motif identified by the PBIL HTH Determination Algorithm of Dodd and Egan [56]. The Pabo and Sauer [95] consensus sequence for prokaryotic HTH motifs is shown within the large brackets and correlates well with this prospective motif (red lettering). Four of the nine mutations fell within this 12-residue consensus sequence, including the double mutation represented by the combination of the red bracket and the hooked arrow. The phenotype of this double mutation is indicated by the vertical red arrow.
Binding efficiencies are as described in Figure 6, and transposition frequencies were calculated as described in Table 2.

The coiled coil motif

Five of the randomly induced mutations (in GMF isolates 6, 7, 18, 94 and 106) fell in the coiled coil segment (Table 2, rows 13-17 and Figure 6, blue highlights). Although isolate GMF 18 carries the double substitution A42T+L97H, its phenotype, that is, the loss of transposition and an unstable complex (Table 2, row 15; Figure 6a, lane 6), should be attributed to L97H, because analysis of the A42T mutation showed that the transposition frequency of the GMF 4 mutant and the binding efficiency of its protein are identical to those of the wild type. Another mutant, GMF 106 (L83V; see Figure 6d, lane 5), showed complete loss of binding proficiency, and two others, GMF 6 (Q79L; Figure 6a, lane 4) and GMF 7 (N94D; Figure 6c, lane 5), showed marked dissociation of their complexes in the EMSA gel. All five mutations effectively eliminated transposition (Table 2, rows 13-17).

The four heptads that make up the putative LZ motif in the IS2 TPase, and the substitutions within them, are shown in Figure 12a. This proposed LZ motif contains zipper-functional leucines in only two of the four d positions assigned to a canonical LZ [80, 81]; see also the aligned predicted LZ sequences in the IS3 family [30]. Two of the five randomly induced substitutions in the coiled coil segment, L97H (GMF 18) and L83V (GMF 106), affected these hydrophobic residues. The three other substitutions also affected residues critical to the function of an LZ-like motif: Q79L (position g) and N94D (the buried Asn at position a) likely affected residues required for inter-subunit stabilization, and K89M appears to have altered a c-position residue essential for the integrity of the helical structure.
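The heptad positions quoted above can be checked for mutual consistency. The sketch below assumes that residue 73, the first residue of the predicted coiled coil, occupies heptad position a; this register is an inference, chosen because it places L83 and L97 at d, N94 at a, Q79 at g and K89 at c, as described in the text. The helper names are hypothetical.

```python
HEPTAD = "abcdefg"
COIL_START, COIL_END = 73, 100  # coiled coil predicted by PCOILS (residues 73-100)
REGISTER_START = 73             # assumption: residue 73 sits at heptad position a


def heptad_assignment(residue: int) -> tuple[int, str]:
    """Return (heptad number 1-4, position a-g) for a residue in the coiled coil."""
    if not COIL_START <= residue <= COIL_END:
        raise ValueError(f"residue {residue} lies outside the coiled coil")
    offset = residue - REGISTER_START
    return offset // 7 + 1, HEPTAD[offset % 7]


def position_of(substitution: str) -> int:
    """Residue number from one-letter substitution notation, e.g. 'L97H' -> 97."""
    return int(substitution[1:-1])


COILED_COIL_SUBSTITUTIONS = ["Q79L", "L83V", "K89M", "N94D", "L97H"]

for sub in COILED_COIL_SUBSTITUTIONS:
    heptad, position = heptad_assignment(position_of(sub))
    print(f"{sub}: heptad {heptad}, position {position}")
```

Residues 73 to 100 span exactly 28 residues, so this register yields the four complete heptads described for the motif.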
Figure 12 and the Discussion contain a detailed explanation of how each of these five randomly isolated mutations produced an amino acid change that would critically compromise a zipper-like function of the domain.

Figure 12. Analysis of the coiled coil domain in IS2 OrfAB aligned with similar domains in the IS3 family. (A) The coiled coil sequence in IS2 identified by the PCOILS analysis of coiled coils [57, 58], annotated to show the four putative heptad repeats of a leucine zipper-like motif. Italicized letters a to g represent the repeated positions within each heptad. The critical d positions, which favor hydrophobic leucines, are highlighted in green (or in red for a non-canonical amino acid). The buried asparagine at position a (N94) is shown in red, while green lettering identifies the three canonical a-position hydrophobics. The five randomly induced mutations are indicated by arrows, with the corresponding GMF mutant strain listed beneath each mutation. (B) Alignment of the coiled coil domains of seven members from the five principal subgroups of the IS3 family, showing their relationships to the putative heptads of a leucine zipper motif. Annotation is as described in part A, except that for the IS2 sequence the a positions are highlighted in aqua. (C) Analysis of the potential of the coiled coil sequence in IS2 to function as a leucine zipper, and of the effect of mutations recovered within the motif on that function. The data suggest that the sequence, although it fails the 2ZIP test for a leucine zipper [59], may indeed have that function. Stabilization by the two d-position leucines is indicated by vertical bold green lines, by the a-position hydrophobics by narrow green lines and by the buried asparagine by a vertical broken red line.
Weak salt bridges between glutamines at the g and e positions in heptads 1 and 2 are indicated by a large narrow-lined red X, and the canonical ionic salt bridges between the g- and e-position E and K residues in heptads 3 and 4 are indicated by a large bold red X. Binding efficiencies (see Figure 6) and transposition frequencies (see Table 2) are listed below the schematic. Additional annotation is as described in part A. GFP: green fluorescent protein; IS: insertion sequence.

The catalytic active site

Eight of the twenty-five mutations occurred in the proposed CAS of the protein (GMF isolates 3, 22, 24, 31, 38, 68, 71 and 96; Table 2, rows 18-25), and seven of them altered conserved residues (Figure 10b). EMSA gel reactions are shown in Figure 6 (green highlights). Three protein derivatives, from GMF 22, 24 and 31 (A341P, L266P and V301M; Figure 6e, lanes 2-4), produced no complexes. Three others, GMF 3 (R291H; Figure 6a, lane 3) and GMF 68 and 71 (H267D and E391K; Figure 6d, lanes 2-3), showed mostly dissociated complexes. Two mutant derivatives with proficient binding were GMF 38 (A341T; Figure 6b, lane 4) and GMF 96 (W237R; Figure 6f, lane 4); transposition frequency was enhanced by about 50% in the former and abolished in the latter (Table 2, rows 22 and 25). Transposition was eliminated in the six mutant derivatives with deficient or completely defective binding (Table 2, rows 18-21 and 23-24). The locations of these substitutions in three α helices and three β strands of the CAS are shown in Figure 10b.
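Because substitutions are assigned to domains purely by residue number, that bookkeeping can be made explicit. The sketch below uses the domain boundaries stated in this section (binding domain 13-58, coiled coil 73-100, middle interval 105-210, CAS 236-398) and the 25 substitutions from the 21 single and 2 double mutants, and it reproduces the 11/5/8/1 distribution reported earlier in this section. Treating the boundaries as inclusive is an assumption, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Domain boundaries (IS2 OrfAB residue numbers, assumed inclusive).
DOMAINS = [
    ("binding domain", 13, 58),
    ("coiled coil", 73, 100),
    ("middle interval", 105, 210),
    ("CAS", 236, 398),
]

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

# The 25 substitutions: 21 single mutants plus both residues of the
# double mutants A42T+L97H (GMF 18) and R37Q/S44N (GMF 36).
SUBSTITUTIONS = [
    "R13H", "V35L", "A42T", "S44N", "W49R", "R50H", "S57G", "L58I",
    "Q79L", "L83V", "K89M", "N94D",
    "V179L",
    "W237R", "L266P", "H267D", "R291H", "V301M", "A341P", "A341T", "E391K",
    "A42T", "L97H",  # GMF 18
    "R37Q", "S44N",  # GMF 36
]


def residue_number(substitution: str) -> int:
    """Parse one-letter substitution notation such as 'V179L'."""
    match = SUBSTITUTION.fullmatch(substitution)
    if match is None:
        raise ValueError(f"not in one-letter substitution notation: {substitution}")
    return int(match.group(2))


def domain_of(residue: int) -> str:
    """Name of the domain containing the residue, or 'unassigned'."""
    for name, first, last in DOMAINS:
        if first <= residue <= last:
            return name
    return "unassigned"


tally = Counter(domain_of(residue_number(s)) for s in SUBSTITUTIONS)
# 25 substitutions: 11 binding domain, 5 coiled coil, 1 middle interval, 8 CAS
print(len(SUBSTITUTIONS), dict(tally))
```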
Two of the eight substitutions altered residues conserved only in the IS3 family (R291H and V301M), one affected a non-conserved residue (H267D), and the remaining five altered residues conserved in the RISF.

The six TPase derivatives whose binding efficiencies were partially or completely reduced give some insight into the contribution of the putative catalytic core to both regional (catalytic domain) and global (catalytic and binding domains) binding by the TPase. Three mutations eliminated global binding, indicating that these substitutions were structurally destabilizing. The A341P substitution, located one residue from E342 of the DDE catalytic triad, altered a position normally conserved for a hydrophobic amino acid in α helix 4 of the RISF. The helix-breaking proline had a devastating effect on binding, and most of the DNA remained uncomplexed (Figure 6e, lane 2). Binding was completely eliminated in two other derivatives (Figure 6e, lanes 3-4). First, the L266P substitution occurred in β strand 3, where proline replaced a hydrophobic residue that is essentially conserved in the RISF. Second, V301M changed another very hydrophobic residue in β strand 4 that is conserved in the IS3 family as either valine or leucine and is located adjacent to the second Asp of the DDE triad in the RISF (D306 in IS2).

EMSA gels of TPase derivatives with three other substitutions showed unstable complexes, suggesting a reduction in the binding affinity of the CAS for its DNA contacts. R291H replaced a positively charged residue in α helix 1, which is essentially invariant in the IS3 family, with one that readily assumes a neutral state (Figure 6a, lane 3). E391K substituted a basic residue for one that is essentially conserved as glutamate or glutamine in α helix 6 of the RISF.
H267D introduced a negatively charged residue at a non-conserved position in β strand 3 (Figure 6d, lanes 2-3). Together, the results from these six substitutions suggest not only that the catalytic core plays a role in binding to the catalytic domain of the end (unstable complexes), but also that its integrity contributes to the global binding proficiency of the full length protein (see Discussion).

Two mutations that did not affect binding proficiency provided insights into the role of β strand 1 and α helix 4 in facilitating the catalytic functions of the IS2 TPase (Table 2, rows 22 and 25). The 50% increase in transposition frequency of the A341T mutant likely results from the substitution of a polar residue at this conserved hydrophobic position in the RISF, creating the potential for an additional specific or stochastic contact with the terminus carrying the CA-3' dinucleotide. The W237R mutation, located three residues from D240 of the catalytic triad, replaced an aromatic residue in β strand 1 that is highly conserved in the RISF with a basic amino acid, and it completely eliminated transposition without affecting global binding proficiency. The replaced residue is probably involved in positioning the DNA in the catalytic pocket [82], and the change did not affect the integrity of the β strand (see Discussion).

The middle interval

The V179L (GMF 101) substitution occurred in α helix M5 (Figure 10a).
This change disrupted binding (Figure 6f, lane 5) and completely eliminated transposition (Table 2, row 26), suggesting that at least α helices M4-M6 of the middle region of the protein, which align with the first three N-terminal helices of the integrase protein (IN), contribute to the overall structural and functional architecture needed for binding by the protein.

Discussion

Rationale for soluble expression of the GFP-tagged IS2 TPase

GFP has been widely used as a reporter or biological marker [83], and extensively in fusion constructs to determine the solubility of target proteins, in protein folding assays and in directed evolution [44, 84]. Although its use to promote the soluble expression of proteins that misfold or aggregate when overproduced in Escherichia coli has been approached with caution [85], success has been reported for a plant actin [86]. We reasoned that, given its robust solubility, GFP might be used to facilitate soluble expression of the intractably insoluble IS2 TPase under native conditions.

The full length fusion protein achieves very efficient binding to cognate DNA sequences

The inefficient binding to cognate DNA of full length native or GFP-tagged IS2 TPase purified to homogeneity contrasts starkly with the extremely efficient binding of partially purified OrfAB-GFP, which uses residues at both the N- and C-termini of the TPase. In addition, footprinting studies reported elsewhere show that the protein binds to both the protein binding and catalytic domains of the IRR, generating fully formed complexes (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). We have not explored the reasons for this difference in detail here, but reports of inefficient binding by full length TPases of insertion sequences are not uncommon.
For example, IS911 [8, 15] and IS30 [28, 29] both transpose via the two-step circle-forming pathway. For both elements, successful footprinting studies have been conducted only with truncated versions of the TPase that retain the DNA binding domain but lack the C-terminus. Inefficient binding was also initially reported for IS50 [87, 88]. For both IS50 [88] and IS911 [15], it has been proposed that the C-terminus interferes with binding domain function. Recently, a full length calmodulin-binding peptide fusion derivative of the IS256 TPase, which catalyzes circle formation in this element [12], was shown to bind the ends. It did so much less efficiently than N-terminal fragments containing the DNA binding domain, lending additional support to this hypothesis [89]. Other reports of inefficient binding by recombinant TPases of both prokaryotic and eukaryotic transposons, such as IS903 [90], Tc1 [91] and TAG1 [92], have led to speculation that improper folding during purification may cause inefficient binding. Our results with the partially purified IS2 TPase suggest that some agent facilitates and/or maintains proper folding in these TPases. This agent may be an unidentified component or, speculatively, even nonspecific or IR DNA.: The DNA binding domain of IS2 OrfAB consists of a three-helix bundle with a defined HTH motif: The PHD secondary structure algorithm of PBIL [55] predicts three α helices at positions 13 to 26, 32 to 38 and 43 to 55, which might comprise the binding domain of the IS2 TPase. This prediction represents the best fit to our data (compare Figures 9b and 11). The only discrepancy is our decision to include residues 56 to 58 in helix 3, because substitutions S57G and L58I both reduce binding and transposition. 
L58I substitutes isoleucine for leucine. Isoleucine has a branched β carbon that makes it difficult to adopt an α-helical conformation, whereas leucine shows a distinct preference for α helices [93]. The absence of complex formation (Figure 6c, lane 2) suggests that the substitution destabilized the α helix and likely the entire binding domain. We discuss the role that S57 plays in the recognition helix of an HTH motif below. These two substitutions suggest that residues 57 and 58 lie within helix 3. A less likely alternative, given the potential role of S57 described below, is that they are required to stabilize the helix. The R13H substitution completely abolished both binding and transposition (Figure 6a, lane 9). It replaced a polar, hydrophilic, positively charged residue that often has a structural role [94] with one that is less likely to carry a charge. This suggests that helix 1 plays an important role in the structural architecture responsible for binding the cognate DNA sequence in IS2. These data suggest that the binding domain includes all three helices and comprises residues 13 to 58 (Figure 11).: The HTH motif predicted by the HTH secondary structure analysis protocol of PBIL [54] also fits our data very well. The motif includes residues M30 to K51 and is associated with helices 2 and 3 of the putative binding domain (compare Figures 9c and 11). The consensus sequence of Pabo and Sauer [95], which generally characterizes the HTH motif in prokaryotes, supports the conclusion that the motif resides in helices 2 and 3 (Figure 11). The consensus sequence is [ho-G/A-(X)2]helix 1-[ho-G-ho-X]turn-[(X)3-I/L/V-...]helix 2, where ho is a hydrophobic residue and X is any residue. Applied to residues M30 to L58, it gives a very reasonable fit: [V35-A36-R37-Q38]helix 1-[H39-G40-V41-A42]turn-[A43-S44-Q45-L46...]helix 2. 
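The fit above can be checked position by position. The following Python sketch encodes each consensus position as a set of allowed residues (or None for any residue) and reports the residues that violate it. The hydrophobic set is an assumption of the sketch. Histidine is admitted at the optional hydrophobic positions, as the text allows. The names are illustrative only.

```python
# Hydrophobic residues (an assumed set for this sketch).
HYDROPHOBIC = set("AVLIMFWC")
# The text counts H39 as an optional hydrophobic because histidine can be buried.
HO = HYDROPHOBIC | {"H"}
ANY = None  # position may hold any residue (X)

# Pabo and Sauer consensus as given in the text:
# helix 1 [ho, G/A, X, X], turn [ho, G, ho, X], helix 2 [X, X, X, I/L/V].
CONSENSUS = [
    ("helix 1", HO), ("helix 1", {"G", "A"}), ("helix 1", ANY), ("helix 1", ANY),
    ("turn", HO), ("turn", {"G"}), ("turn", HO), ("turn", ANY),
    ("helix 2", ANY), ("helix 2", ANY), ("helix 2", ANY), ("helix 2", {"I", "L", "V"}),
]


def mismatches(window: str, first_residue: int) -> list[str]:
    """Return residues in `window` that violate the consensus; [] means a full fit."""
    if len(window) != len(CONSENSUS):
        raise ValueError(f"window must be {len(CONSENSUS)} residues long")
    misses = []
    for offset, (residue, (segment, allowed)) in enumerate(zip(window, CONSENSUS)):
        if allowed is not ANY and residue not in allowed:
            misses.append(f"{residue}{first_residue + offset} ({segment})")
    return misses


# IS2 OrfAB residues V35 to L46, as listed in the text.
print(mismatches("VARQHGVAASQL", 35))   # []
# A hypothetical G40A change violates the invariant turn glycine.
print(mismatches("VARQHAVAASQL", 35))   # ['A40 (turn)']
```

The optional hydrophobic positions are treated as required here. This is stricter than the consensus, but it does not change the result for the IS2 window, where all of them are satisfied.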
The critical residues in this fit fall into two groups. The first group is the optional hydrophobics (ho): V35 in helix 1, and H39 and V41 in the turn (histidine can be buried like a hydrophobic residue [93]). The second group is three conserved hydrophobics: A36 in helix 1, the invariant glycine G40 at the second position of the turn (both weak hydrophobics), and L46 in helix 3 (Figure 11).: Notably, four of the nine randomly induced substitutions in the binding domain affected residues in this consensus sequence. R37 lies in helix 1 and S44 in helix 2 of the proposed HTH motif. Comparing the S44N substitution with the R37Q/S44N double replacement gives additional insight into how these two residues stabilize the motif. S44N has a drastic effect: no detectable binding and an 80-85% reduction in transposition frequency (Figure 6a, lane 7). R37Q/S44N partially reverses this effect, with about 60% and 65% reductions in binding and transposition frequency, respectively (Figure 6b, lane 2). We therefore propose the following. S44 and R37 are likely involved in interhelix H-bonding and contribute to stabilizing the HTH. In the S44N derivative, arginine and asparagine apparently form H-bonds less effectively, which destabilizes the motif. In the double mutant, however, H-bonding by glutamine and asparagine appears to be partially restored, most likely because this pair of amino acids has a greater capacity to form H-bonds [94].: Four of the seven mutations that disrupted binding occurred in the second helix of this HTH motif (Figure 11), consistent with the convention that this is the recognition helix. Two of these substitutions, R50H and S57G, help identify residues that likely make specific DNA contacts. The R50H substitution lies in the putative recognition helix. The resulting derivative generated the partially dissociated complex in Figure 6a, lane 5, and completely eliminated transposition. 
Here the positively charged arginine is replaced by histidine, which readily sheds its proton to assume a neutral state. Histidine is therefore less effective in binding DNA [93], which suggests that R50 plays a pivotal role in recognizing the cognate DNA sequence. The IS2 transposition pathway requires a separate binding event for each of its two steps. Even a moderate reduction in binding would therefore probably reduce transposition frequency drastically, as seen with R50H. S57G substitutes glycine, a small residue without a side chain, for serine, a polar hydrophilic residue whose fairly reactive OH group usually forms hydrogen bonds. Because S57 lies in the putative recognition helix, a role in DNA contact could also explain why this substitution generates the dissociated complex in Figure 6c, lane 2.: Two substitutions, A42T and V35L, produced little or no change in the wild type phenotype and lend additional support to our identification of the HTH from the Pabo and Sauer predictions. Replacing A42 in the four-residue turn with any small amino acid would probably have little effect on protein function (A42T; Figure 6c, lane 3). Likewise, replacing the optional hydrophobic V35 with leucine in the first helix of the HTH would not be expected to impair HTH function significantly (Figure 6b, lane 5; see Figure 11). These results support the conclusion that in IS2, N-terminal helices 2 and 3 contain the HTH motif, with a four-residue turn between them. The IS2 binding domain thus consists of four segments:
- residues 13 to 26 (helix 1);
- residues 32 to 38 (helix 2, which is helix 1 of the HTH; Figure 11);
- residues 39 to 42 (the turn);
- residues 43 to 58 (helix 3, which is helix 2 of the HTH; Figure 11).
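The domain boundaries assigned above can be expressed as a small lookup table. This Python sketch maps each binding-domain substitution discussed in this section to its structural element. It then lists the substitutions that fall within the V35-L46 consensus window. The residue ranges and substitutions come from the text; the helper names are hypothetical.

```python
import re

# (first residue, last residue, element) as assigned in the text.
BINDING_DOMAIN = [
    (13, 26, "helix 1"),
    (32, 38, "helix 2 (HTH helix 1)"),
    (39, 42, "turn"),
    (43, 58, "helix 3 (HTH helix 2, recognition helix)"),
]
CONSENSUS_WINDOW = range(35, 47)  # V35 to L46, the Pabo and Sauer fit


def element_of(residue: int) -> str:
    """Name the binding-domain element that contains `residue`."""
    for first, last, name in BINDING_DOMAIN:
        if first <= residue <= last:
            return name
    if 13 <= residue <= 58:
        return "linker between helices 1 and 2"  # residues 27-31 (assumed label)
    return "outside the binding domain"


def parse(substitution: str) -> tuple[str, int, str]:
    """Split a substitution such as 'R50H' into ('R', 50, 'H')."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", substitution)
    if match is None:
        raise ValueError(f"not a single substitution: {substitution!r}")
    wild_type, position, mutant = match.groups()
    return wild_type, int(position), mutant


# Binding-domain substitutions discussed in this section.
SUBSTITUTIONS = ["R13H", "V35L", "R37Q", "A42T", "S44N",
                 "W49R", "R50H", "S57G", "L58I"]

for sub in SUBSTITUTIONS:
    print(sub, "->", element_of(parse(sub)[1]))

print([s for s in SUBSTITUTIONS if parse(s)[1] in CONSENSUS_WINDOW])
# ['V35L', 'R37Q', 'A42T', 'S44N']
```

Residues 27 to 31 fall between helices 1 and 2 and are labelled as a linker. That label is an assumption of the sketch, not an assignment made in the text.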
The A42T mutant has an interesting phenotype. It was selected as a bright colony (see the legend to Table 3) and, although phenotypically silent, it is not toxic to the cell. Its protein may be produced in lower amounts, or the mutation may simply have made the protein more soluble.: These results are in general accord with, and extend, the work of Prere et al. [52], Hu et al. [5], Lei and Hu [33] and Rousseau et al. [34] on IS3 family TPases. Hu et al. predicted an HTH motif in the IS2 TPase at residues 31 to 50. Lei and Hu experimentally demonstrated loss of binding by IS2 OrfA deletion derivatives lacking as few as the first 12 residues (likely destabilizing helix 1) and as many as 57 residues from the N-terminus. PSIPRED secondary structure analyses of the TPases of all other prototypes of the principal IS3 family subgroups show three helices in positions similar to those shown for IS2 (data not shown).: There is substantial evidence that TPases possess multihelix binding domains containing at least one HTH motif. IS30, which transposes via a circle-forming pathway, has an N-terminal binding domain with two HTH motifs, one of which is a component of an H + HTH structure [28]. The MuA Iβ and Iγ DNA-binding subdomains form bipartite binding structures. They are composed of five and four α helices, respectively, each including an HTH motif [96, 97]. In the Iβ subdomain of MuA, all five helices are involved in the interaction with the DNA. 
Similar results have been reported for the TPases of Tc3 [98] and of the Tc1-like element Sleeping Beauty [99]. Their multihelix structures with two HTH motifs resemble those of the homeodomain family of helix-turn-helix DNA-binding proteins [100] or the paired DNA binding domain family [101].: The W49R substitution lies in the second, putative recognition helix of the HTH. It generated a protein whose binding efficiency was unimpaired (Figure 6b, lane 3) but which had no capacity for transposition (Table 2, row 11). Resolving this apparent contradiction leads us to conclude that W49 may not interact directly with the protein binding domains of IRR and IRL. Figure 7 shows that few residues in N-terminal helix 3 (B α-3) of IS2 are conserved among IS3 family TPases. This is expected for the recognition helices of elements whose ends share little sequence identity. W49 in IS2, however, corresponds to what has been described as one of the most highly conserved residues in the TPases of the IS3 family [34]. W49R disrupts transposition but not binding in IS2, even though a charged hydrophilic residue replaces a highly hydrophobic one. This suggests that the function of W49 may extend globally in the protein rather than being confined to the binding functions of the HTH motif.: A similar, but not identical, inconsistency between binding efficiency and transposition was observed with the equivalent residue, W42, in IS911 [34]. There, the W42F derivative showed little or no binding as a truncated OrfAB lacking the CAS. It nevertheless transposed efficiently in vivo when the CAS of the IS911 TPase was present. 
This suggested that the CAS could somehow compensate for the binding deficiency caused by the W42F substitution.: Our results suggest that this conserved tryptophan in IS3 family TPases may interact with the CAS of the protein. For example, it may promote the folding that allows that motif to be positioned correctly when binding the catalytic domain of IRR. W49R may fail to confer the accuracy of CAS binding needed for recombination (for example, by permitting minor misfolding) without affecting regional DNA binding. Concurrent footprinting studies described elsewhere have shown extensive binding of the IS2 TPase to the catalytic domain of IRR, the donor end in this insertion sequence (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). The role of the CAS in global binding of the protein is addressed in this study in the discussion of CAS mutations that reduce binding efficiency.: The IS2 TPase possesses an LZ-like oligomerization motif at its N-terminus that facilitates binding to the ends of the element: The sequence of the coiled coil motif of the IS2 OrfAB TPase (residues 73-100; Figure 12a) differs significantly from that of the canonical LZ. Indeed, when this sequence is submitted to the 2-ZIP server (2zip.molgen.mpg.de/cgi-bin/2zip.pl; [59]), no LZ is predicted. Nevertheless, all five substitutions recovered in the coiled coil domain in this study indicate that an LZ-like motif exists within residues 73 to 100 of the IS2 TPase, and that its function is required for binding and transposition.: We have aligned the four OrfAB LZ-like heptads of IS2 with corresponding sequences from prototype elements of the four other subgroups of the IS3 family (Figure 12b). Haren et al.
[30] have, however, created a detailed alignment of putative LZ sequences from OrfA, covering 15 members of the five subgroups (IS2, IS3, IS51, IS150 and IS407) of the IS3 family. They have also specifically demonstrated a canonical LZ motif with a four-heptad repeat in OrfAB of IS911 [30, 31]. These alignments reveal that the putative IS2 LZ-like motif is the only sequence in which just two of the four d positions are occupied by leucine (L83 and L97). IS2 alone lacks a leucine at the d position of the first heptad (see A76; Figure 12b). Nevertheless, three of the four a positions (L73, I80 and L87) are occupied by the hydrophobic residues leucine or isoleucine. The fourth a position, N94 in the fourth heptad, is occupied by the buried polar asparagine that is essential for inter-subunit H-bonding in canonical LZ structures [102]. Another significant difference from the canonical LZ is that the stabilizing ionic (g/e' g'/e) salt bridges of this putative IS2 LZ-like motif are restricted to the third and fourth heptads (Figure 12c). It is possible, however, that the glutamine residues at the g and e positions of the first and second heptads (Q79 and Q84) provide weak non-ionic inter-subunit stabilizing interactions. Based on the analysis of all five mutations, we propose how a potential LZ-like structure (Figure 12c) would be stabilized. The N-terminal half of the structure would be relatively weakly stabilized by three concerted interactions: the d-located leucine L83 in the second heptad, the a-located hydrophobics L73 and I80, and hydrogen bonds between the g- and e-located Q79 and Q84 in the first and second heptads, respectively. 
The C-terminal half of the motif, on the other hand, would be more strongly stabilized by three features:
- the d-located leucine L97 in the fourth heptad;
- the a-located asparagine N94, also in the fourth heptad, whose buried hydrogen bonds contribute significantly to stabilizing the zipper;
- the canonical ionic salt bridges formed by the g- and e-located residues E93 and K98 in the third and fourth heptads, respectively.

Of the five substitutions, L83V and L97H affected the canonical d-located leucines. The L83V substitution (Figure 6c, lane 5) completely abolished both binding and transposition. This suggests that introducing the Cβ-branched valine destroyed the primary stabilizing interaction at the N-terminal end, and consequently the entire LZ-like motif. The Q79L substitution appears to have affected the weak g/e' g'/e inter-subunit stabilizing interactions at the N-terminal end of the zipper-like structure. Because the primary stabilizing interaction is still present, however, its effect on binding efficiency (Figure 6a, lane 4) was less drastic than that of the replacement at L83 (L83V).: L97H, by contrast, had a much less drastic effect on binding (Figure 6a, lane 6), although transposition was all but abolished. The L97H substitution destabilized the putative motif at its C-terminal end. However, the two other strong stabilizing interactions described above appear to allow a level of oligomerization that permits unstable binding with minimal dissociation. Similarly, N94D altered the buried a-located asparagine required for stabilization of the zipper. The two remaining stabilizing interactions at the C-terminal end appear to have produced a phenotype similar to that of L97H (Figure 6c, lane 5).: The K89M substitution (Figure 12c) also abolished transposition completely and provides further evidence for a functional LZ-like motif. 
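The heptad assignments used in this argument all follow from a single register in which L73 occupies the first a position. A minimal Python sketch maps a residue in the 73-100 interval to its heptad and position. Applied to the five substituted residues, it reproduces the positions assigned to them in this section. The names are illustrative only.

```python
HEPTAD = "abcdefg"
LZ_FIRST, LZ_LAST = 73, 100  # residues 73-100: four heptads, L73 at the first 'a'


def heptad_position(residue: int) -> tuple[int, str]:
    """Return (heptad number, position letter) for a residue of the LZ-like motif."""
    if not LZ_FIRST <= residue <= LZ_LAST:
        raise ValueError(f"residue {residue} lies outside {LZ_FIRST}-{LZ_LAST}")
    offset = residue - LZ_FIRST
    return offset // 7 + 1, HEPTAD[offset % 7]


# The five substitutions discussed in the text: (wild type, position, mutant).
for wt, pos, mut in [("Q", 79, "L"), ("L", 83, "V"), ("K", 89, "M"),
                     ("N", 94, "D"), ("L", 97, "H")]:
    heptad, letter = heptad_position(pos)
    print(f"{wt}{pos}{mut}: heptad {heptad}, position {letter}")
# Q79L: heptad 1, position g
# L83V: heptad 2, position d
# K89M: heptad 3, position c
# N94D: heptad 4, position a
# L97H: heptad 4, position d
```

The same function places A76 at the first d position and E93 and K98 at the g and e positions of the third and fourth heptads. This is consistent with the salt-bridge pairing proposed in this section.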
The K89M phenotype is consistent with the location of K89 at a c position. This position is part of the solvent-exposed helical surface, which must be occupied by a hydrophilic residue. A hydrophobic residue would disrupt that surface and consequently abolish zipper function [103, 104].: The CAS of the TPase of IS2 and other IS3 family members shares the functional properties of the three-dimensional catalytic core of the TPase/RISF: The eight substitutions W237R, L266P, H267D, R291H, V301M, A341T, A341P and E391K (Table 2, rows 18-25) fell into three α helices and three β strands of the putative CAS (Figure 10b). Four of these (W237R, L266P, H267D and V301M) affected the putative β sheet of the catalytic core and abolished transposition. Only W237R had no effect on binding (Figure 6f, lane 4), a result that helps identify the function of W237 and of β strand 1 in the CAS. Two of the remaining four, A341T and A341P, lie adjacent to the third member of the catalytic triad, E342. They affected a position in α helix 4 that is occupied by a highly conserved hydrophobic residue in the RISF, V151 in HIV-1 (Figure 10b; see also [105]). A341T had no negative effect on binding efficiency (Figure 6b, lane 4) and increased the frequency of transposition by about 50% (Table 2, row 22). This result also sheds light on the function of α helix 4 in the IS2 CAS. Substitutions were also recovered in two other α helices: E391K in α helix 6 and R291H in α helix 1. These two, together with H267D in β strand 3, reduced but did not eliminate binding. They helped identify residues and elements that likely function in binding the CAS to the catalytic domain.: The W237R and A341T substitutions eliminated and enhanced cleavage, respectively. Based on the deduced function of the two wild type residues, they provide strong evidence that the three-dimensional structure of the catalytic core of the IS2 TPase functions similarly to that in the RISF. 
W237 is highly conserved in β strand 1 of the RISF and aligns with W61 in HIV-1 and RSV. This tryptophan lies on β strand 1, three residues from the first of the catalytic aspartates (D240 in IS2 and D64 in HIV-1). Crosslinking studies with W61 of HIV-1 [106] showed that this residue interacts with the 3' end of the DNA and positions it within the catalytic pocket, and the location of W237 is consistent with the same role. A similar role for W237 would then explain how W237R eliminates transposition without affecting binding.: The A341T substitution highlights the essential supporting role that residues adjacent to E342 in α helix 4 play in the chemistry of cleavage and joining. We draw this conclusion from the extent of conservation of this α helix in the RISF. For example, the co-crystal structure of the Tn5 TPase has shown that Y319, R322, K330 and K333, which flank E326 (the triad glutamic acid) in α helix 4, make specific contacts with the DNA of the catalytic domain. These contacts involve both the 3' and 5' ends, that is, the transferred and non-transferred strands [67]. In α helix 4 of IS2, these four residues align directly with E336, N338, K346 and K349 (N338 and K349 are highly conserved residues). The IS2 residues flank E342 [61] and presumably have the same function as their equivalents in Tn5. In addition, K346 and the conserved K349 in IS2 align with K156 and K159 in HIV-1 integrase (Figure 10b). These two IN residues have been shown to contact the DNA. K159 interacts directly with the adenosine of the terminal CA-3' dinucleotide, where it helps orient the DNA properly for cleavage [83]. Earlier, van Gent et al. [107] had shown that a K159V substitution in HIV-1 significantly slowed the rate of integration without significantly reducing the amount of integration in an overnight incubation. 
They inferred that this mutation reduced by one the number of residues flanking E152 (the triad glutamic acid) that are available to contact the DNA, and thus reduced the efficiency of protein-DNA interaction. In addition, Calmels et al. [108] examined random mutations immediately flanking E152 in HIV-1 that increased binding to a strand transfer substrate. They found that 75% of these mutations included a V151T mutation, the homologue of A341T in IS2. The 50% increase in transposition by A341T can then be explained by enhanced interaction with the catalytic domain of IRR, owing to an additional specific or stochastic DNA contact by the substituted threonine. This is likely, given that A341 lies close to the four residues that putatively contact the catalytic domain of the IS2 IRR and directly adjacent to E342. W237R lies on β strand 1 and A341T on α helix 4. Together, these two results suggest that the three-dimensional structures of these elements, and by extension that of the catalytic core, are functionally similar to those of the RISF.: We have been able to distinguish three classes of CAS substitutions:
- those that do not affect the binding efficiency of the protein (W237R and A341T);
- those that disrupt the structural integrity of the catalytic core, and thus of the entire protein, preventing any complex formation (A341P, L266P and V301M; Figure 6e, lanes 2-4);
- those that reduce the binding efficiency of the CAS for the cognate DNA (H267D, R291H and E391K; Figure 6a, lane 1 and Figure 6d, lanes 2-3).

These last three produced partially dissociated complexes, identifying residues that are likely important binding contacts between the CAS and the catalytic domain. H267D replaced a basic residue with a negatively charged one at a non-conserved position on β strand 3. The increased substrate dissociation is consistent with reduced contact with the DNA. 
R291H introduced a weakly basic residue at a position in α helix 1 that is occupied by a conserved arginine in four of the five subgroups of the IS3 family. The substitution reduced binding efficiency, likely compromising the DNA anchoring function provided by R291. E391K lies in α helix 6, which is characterized by two highly conserved residues. One is a proline (P389 in IS2) in RSV and the IS3 family; the other is a glutamic acid or glutamine in the RISF. E391K altered the latter. Replacing this acidic residue with the basic lysine reduced the overall binding affinity for the DNA of the catalytic domain without completely eliminating it. The phenotypes of H267D, R291H and E391K suggest that the corresponding wild type residues are critical contacts that facilitate binding of the CAS to the catalytic domain of IRR.: By contrast, A341P, the helix-breaking proline substitution in α helix 4, altered a conserved hydrophobic residue of the RISF and significantly reduced complex formation. L266P altered a conserved hydrophobic residue of the RISF in β strand 3. V301M altered a very hydrophobic residue in β strand 4 that is conserved in the IS3 family and associated with the second aspartate of the catalytic triad (D306). Both of these completely eliminated complex formation. All three substitutions replaced very hydrophobic residues and eliminated binding. This suggests that their principal effect was to disrupt the α helix or β strand, or the putative β sheet, and thus the catalytic core. The integrity of the catalytic core is clearly essential for proper folding of the full length protein and therefore for global binding.: These results underscore the importance of binding of the catalytic core to the CD for regional and global binding of the full length protein. 
On the one hand, the W49R substitution in the recognition helix of the HTH apparently failed to confer the accuracy needed for binding of the catalytic core to the DNA of the catalytic domain. This is most likely due to a minor folding impairment, and it eliminated transposition while still permitting global binding. On the other hand, mutating a single anchoring residue in the catalytic core of the full length protein may not alter its structural integrity. Such a mutation nonetheless significantly impairs global binding, as shown by partial dissociation of the complex. In the binding reactions with wild type proteins shown in Figures 2 and 6, all of the DNA is driven into the complex. From this we conclude that these reactions result from fully formed complexes in which both the DNA binding domain and the CAS of the protein are bound to the ends. This conclusion is supported by data showing extensive protection of the protein binding and catalytic domains of IRR, or of the abutted ends of the minicircle junction (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). Impaired binding by either domain of the protein thus produces dissociation of the complex.: The integrity of the middle interval contributes to the binding capability of the IS2 TPase: The V179L substitution affects a hydrophobic residue that is functionally conserved in α helix M5 in the RISF (Figure 10a). Two of the three residues conserved in the IS3 family are also conserved in the RISF, and V179L affected one of them. The disruption of binding and abolition of transposition in IS2 likely resulted from replacement of the Cβ-branched valine, which affected the backbone of the α helix, distorting or disrupting it [93]. 
This result suggests that at least α helices M4 to M6 of the middle interval of the protein are critical to the architecture that underlies global binding to the cognate IS2 DNA. These helices align with good conservation with the first three α helices of IN.: Conclusions: These results validate the GFP-tagging strategy for obtaining full length, soluble, active preparations of a protein such as the IS2 TPase under native conditions. Prepared conventionally under native conditions, this protein is usually insoluble, and when solubilized it is refractory to whole-protein structure-function or biophysical studies. The strategy has yielded, for the first time among circle-forming insertion sequences with a two-step transposition pathway, a full length protein capable of very efficient in vitro binding to cognate DNA. This protein also forms fully formed complexes involving residues at both the N- and C-termini of the TPase (Lewis et al, Protein-DNA interactions define the mechanistic aspects of circle formation and insertion reactions in IS2 transposition, submitted). In addition, the fluorescence-based random mutagenesis approach has helped refine our understanding of structure-function relationships in IS2 and other IS3 family TPases. It has identified residues that facilitate binding, oligomerization and (by comparison with the integrases) catalysis, as well as residues that define possible interactions between structural motifs of the protein.: Methods: Bacterial strains and media: E. coli strain JM105 (New England Biolabs) was used for most procedures involving plasmid DNA preparation, cloning and the lacZ papillation assay. Supercompetent XL1-Blue cells (Stratagene Inc, Santa Clara, CA, USA) were used as transformation hosts for cloning and overexpression of the fused orfAB and GFPuv genes in pLL2522. 
BL21(DE3)pLysS cells (Novagen-EMD4Biosciences, La Jolla, CA, USA) were used for overexpression of the OrfAB-GFP fusion product cloned into the pTWIN2 vector (New England Biolabs).: Cultures were routinely grown in lysogeny broth (LB) medium at 37°C, supplemented where necessary with carbenicillin (Cb, 50 µg/mL), kanamycin (Km, 40 µg/mL) or chloramphenicol (Cm, 20 µg/mL). For overexpression from pGLO, pLL2522 and pLL2524-XXX (plasmids carrying the GMF mutations), cultures were grown at 28°C in 2x YT medium supplemented with Cb and arabinose (6 mg/mL).: DNA procedures: Plasmid DNA for in-laboratory protocols was prepared using the standard alkaline lysis procedure of the Wizard DNA Purification System (Promega Corp., Madison, WI, USA). The PureLink HQ Mini Plasmid Purification Kit (Invitrogen Corp., Carlsbad, CA, USA) was used to prepare DNA samples for outsourced sequencing reactions (see below).: Restriction endonuclease digestion was carried out with enzymes and buffers from New England Biolabs. Diagnostic gels were made with 0.8% SeaKem agarose, and preparative gels with 0.6% SeaPlaque Low Melting Temperature agarose (Cambrex Corp., East Rutherford, NJ, USA). DNA was purified from preparative gels with Gelase (Epicentre Biotechnologies, Madison, WI, USA) following the manufacturer's instructions. It was then concentrated to a 50 µL volume in a Microcon-100 Filter Device (Millipore, Billerica, MA, USA). The solution was dried to a pellet in a Savant SpeedVac DNA concentrator, resuspended in 12 µL ultrapure H2O and frozen at -20°C until use. Standard cloning procedures were as previously described [7].: Standard PCR and PCR-mediated in vitro site-directed mutagenesis were carried out with Vent DNA polymerase (New England Biolabs) according to the manufacturer's instructions. The reaction protocols were as described earlier [6]. 
PCR products were cleaned up with the Direct PCR Purification Buffer and the Wizard PCR Preps Resin (Promega Corp.).: Plasmid constructs and mutagenizing oligonucleotides: pLL2522, which contains the fused orfAB and GFPuv genes (Figure 2e), was prepared following the procedure illustrated in Figure 2. All mutagenizing sites in this section are in lower case.: pGLO-ATG2, which contains 3'-located Eco RI-Nhe I cloning sites (Figure 2a), was created with two oligonucleotides. The first removed an Eco RI site adjacent to the two stop codons (upper case) at the 3' end of GFPuv: 5'GGATCATCAGGTACCGAGCg CGt ATTCATTA TTTGTAGAGCTCATCCATGCC3'. The second created a new cassetting Eco RI site upstream of the existing Nhe I site (upper case, containing the first two codons of GFP) and destroyed the ATG start codon at the 5' end of the gene: 5'TCCCCTTCCCCGCTATGg ATCAGCTGAgaattc TTCTCCTTCTTAAAGTTAAA3'.: pLL2521HK (Figure 2d), which contains an Eco RI-Nhe I cassetted orfAB gene, was created in successive steps:
- removing the upstream Eco RI site in pLL18 (Figure 2b) with the oligonucleotide 5'AGACTATCACTTATCCGCGGAACAGTCTAGAGCTCcccctc ACTGGCCGTC3';
- placing an Eco RI site adjacent to the IS2 start codon (pLL2509A; Figure 2c) with the oligonucleotide 5'ACTAGTTTTTAGACCGTCATTGGAgaattc ATGATTGATGTGTTAGGGCC3';
- adding an Nhe I site and altering the adjacent stop codon at the 3' end of IS2 orfAB, creating pLL2520, with the oligonucleotide 5'GGGCCCgcgctagc ACCGGTTATTTCCAGACATCTGTTATCACTTAACC3';
- adding a 6X His tag downstream of the IS2 orfAB start codon (Figure 2d) with the oligonucleotide 5'GTATGcatcatcatcatcatcatagcagatatctggtattgagtataagc ATTGATGTCTAAGGGCCGGAG3'.

The lacZ papillation assay constructs required the Eco RI-Kpn I cassetted orfAB-GFPuv fusion sequence (Figure 2e) to be fused to the Kmr reporter gene. For this final step, a Kpn I site was added adjacent to and downstream of the Nhe I site (upper case lettering) in the sequence that connects orfAB to
the Kmr gene. For this we used the primer 5'AACTGATCCAGGGCCCGggtacc AGCTAGC ACCAGTTATTTC3'.: pLL2522 was produced by cloning the cassetted Eco RI-Nhe I orfAB gene into pGLO-ATG2 (Figure 2e).: pUH2509 is a construct used for lacZ papillation assays that contains IS2 with the frame-fused orfAB gene from pLL18 (Figure 2b). It was created as follows. IRL in pLL18 was deleted, and a Sac II site was added while the weak native E-10 promoter (upper case lettering) was retained. This formed pLL2509A (Figure 2c), into which the Xba I-Sac II cassetted lacZ gene could be cloned. We used the oligonucleotide 5'CCAGTGGAATTCGAGCTCTAGACTGTTccgcgg ATAAGTGATAGTCT TAATATTAGTTTTTTAGACTAGTCATTGG3'. lacZ was obtained from pLL135 [19]. The 3' end of the gene was modified to add the necessary Sac II site, generating pLL135II, using 5'GGTACCGGGGATCCgccg AGACATGATAAGATACATTGATGAGTTTGG3'. The 5' end of lacZ was modified to remove the lacUV5 promoter and to add an Xba I site as well as the IS2 IRL (upper case lettering), generating pLL135IRLLZ. None of the three reading frames reading into the IRL sequence contained stop codons. We used the oligonucleotide 5'ATGTTCTTTCCTCGAGtctaga TAGACTGGCCCCCTGAATCTCCAGACAACCAATATCACTTAATTAT TGCCGTAAGCCGTGGCCG3'. The Xba I-Sac II fragment from pLL135IRLLZ was cloned into pLL2509A to produce plasmid pUH2509. This plasmid contains a 6.4 kb version of IS2 consisting of (from 5' to 3'): IRL, the promoterless lacZ gene sequence, the orfAB sequence without functional left or right ends, the Kmr gene and IRR.: pUH2523, the construct containing the fused orfAB::GFPuv genes used for lacZ papillation assays, was created as follows. (i) In pLL2521HK, orfAB linked to the Kmr gene is cassetted within Eco RI and Kpn I restriction sites (Figure 2d). To add the Kmr reporter gene to the fused orfAB::GFP genes, we therefore replaced orfAB in pLL2521HK (Figure 2d) with the Eco RI-Kpn I cassetted orfAB::GFP sequence shown in Figure 2e, creating pLL2523. 
(ii) The lacZ papillation assay plasmid pUH2509 possesses an Spe I site downstream of the E-10 promoter of IS2orfAB and an Nru I site within the Kmr reporter gene, as do all constructs in which Kmr is present as a reporter gene (see, for example, pLL2521HK in Figure 2d). The Spe I-Nru I fragment from pUH2509 was replaced by the corresponding fragment from pLL2523 to create pUH2523. Similarly, Spe I-Nru I fragments from pLL2524-XXX, plasmids containing mutated orfAB genes (see below), were used to create the lacZ papillation plasmids pUH2524-XXX.: pUH2523ΔorfAB, the null-mutant construct used as a control in lacZ papillation assays (Table 2, row 1), was created by deleting a 1743 bp fragment between two Mfe I restriction sites, located 103 bp from the start of the IS2orfAB sequence and 156 bp from the end of the GFPuv sequence in pUH2523, followed by blunt ligation of the ends.: pTW2orfAB::GFP was created by cloning the fused orfAB::GFP genes into the pTWIN2 vector of the IMPACT system (Intein Mediated Purification with an Affinity Chitin-binding Tag; New England Biolabs) to improve purification of the fusion protein. The construct was cloned into the N-terminal multiple cloning site of the vector by first creating an Sbf I site close to the existing Eco RI site with 5'GGCATACATGAATTCCTCGAGGcctgcagg CTGCGTATCCGGTGACACC3' to accommodate the Eco RI/Sbf I cassetted orfAB::GFP sequence.: Creation and cloning of mutations in IS2 orfAB from a PCR-based random mutagenesis protocol: The GeneMorph II Random Mutagenesis Kit (Stratagene) was used to create mutations within orfAB in pLL2521HK (Figure 2d) using a 30-cycle PCR-based protocol. Primers were M13F (forward) and KmR1 (reverse; [6]). Mutations were generated at very low, low and medium rates (900 ng of target DNA within 3.6 µg of plasmid DNA; 500 ng of target within 2.0 µg of plasmid DNA; and 250 ng of target within 1.0 µg of plasmid DNA, respectively). 
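The three mutagenesis conditions above keep the same ratio of target to plasmid DNA (900/3600, 500/2000 and 250/1000 ng, each 0.25). This suggests that the amplified orfAB region accounts for roughly a quarter of the plasmid's length. In error-prone PCR of this kind, the mutation frequency is tuned mainly through the starting amount of target. Less template needs more doublings to reach a given yield, so more polymerase errors accumulate. The minimal sketch below assumes the relation duplications = log2(yield / initial target). The 8 µg yield is a hypothetical value, not one reported in this study.

```python
import math

# Conditions from the text: (label, ng of target DNA, ng of total plasmid DNA).
CONDITIONS = [
    ('very low', 900.0, 3600.0),
    ('low', 500.0, 2000.0),
    ('medium', 250.0, 1000.0),
]


def target_fraction(target_ng: float, plasmid_ng: float) -> float:
    # Mass fraction of the plasmid that is PCR target; for a single plasmid
    # species this equals target length divided by plasmid length.
    return target_ng / plasmid_ng


def duplications(initial_target_ng: float, yield_ng: float) -> float:
    # Number of template doublings, assuming yield_ng is the amount of
    # amplified target recovered after PCR (a hypothetical input here).
    return math.log2(yield_ng / initial_target_ng)


if __name__ == '__main__':
    hypothetical_yield_ng = 8000.0  # assumption, not reported in the study
    for label, target, plasmid in CONDITIONS:
        print(f'{label:>8}: fraction={target_fraction(target, plasmid):.2f}, '
              f'duplications={duplications(target, hypothetical_yield_ng):.2f}')
```

With a fixed hypothetical yield, the 900, 500 and 250 ng inputs give about 3.2, 4.0 and 5.0 doublings. This matches the ordering of the very low, low and medium rates.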
PCR products were cloned into the Eco RI-Nhe I sites of pGLO-ATG2, transformed into XL1-Blue Supercompetent cells and plated onto LB plus Cb plus arabinose agar. After 72 hours at 37°C, plates were examined for brightly fluorescing colonies against a background of less brightly fluorescing colonies. Plasmids from the brighter fluorescing clones carrying mutations in the orfAB sequence were designated pLL2524-XXX, where XXX stands for 001 to 110.: LacZ papillation assays: Papillation was best observed when pUH2509, pUH2523 or pUH2524-XXX plasmid DNA was transformed into JM105 cells. The DNA concentration was titrated to produce about 50 to 60 transformants per plate on LB plus Km plus Cb plus arabinose agar. Plates were incubated in airtight bags to minimize drying. The number of papillae plateaued after 20 to 25 days at 37°C.: Preparation of the wild type and mutant OrfAB-GFP fusion proteins under native conditions: pLL2522 and the mutant plasmid DNAs were transformed into XL1-Blue cells (Stratagene), plated onto LB plus Cb plus arabinose agar and incubated for 48 hours at 37°C. A single fluorescing colony was inoculated into 10.0 mL of similarly supplemented 2x YT broth and incubated overnight at 28°C. After centrifugation, the pellet was checked for fluorescence, washed in 3.0 mL Native Wash Buffer pH 8.0 (50 mM sodium phosphate monobasic monohydrate, 300 mM NaCl), and resuspended in 3.0 mL BugBuster Protein Extraction Reagent (Novagen-EMD4Biosciences). The reagent was supplemented with 1.0 µL of Benzonase (Novagen-EMD4Biosciences) per 10.0 mL overnight (o/n) culture and 3.0 µL of Protease Arrest (Calbiochem-EMD4Biosciences, La Jolla, CA, USA) per mL of lysate. The suspension was then nutated at 4°C for 30 minutes. If necessary, the suspension was subjected to a single round of freezing and thawing to complete lysis. 
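The lysis additives above are specified on two different bases. Benzonase is dosed per volume of original culture (1.0 µL per 10.0 mL), while Protease Arrest is dosed per volume of lysate (3.0 µL per mL). The two amounts are therefore easy to mix up when the culture volume changes, and the sketch below makes the scaling explicit. It relies on two simplifications not stated in the text:

- the lysate volume equals the BugBuster volume;
- the BugBuster volume scales with culture volume (3.0 mL per 10.0 mL culture).

```python
from typing import NamedTuple


class LysisMix(NamedTuple):
    bugbuster_ml: float
    benzonase_ul: float
    protease_arrest_ul: float


def lysis_mix(culture_ml: float) -> LysisMix:
    # 3.0 mL BugBuster per 10.0 mL culture pellet (scaling is an assumption).
    bugbuster_ml = 3.0 * culture_ml / 10.0
    # 1.0 uL Benzonase per 10.0 mL overnight culture (from the text).
    benzonase_ul = 1.0 * culture_ml / 10.0
    # 3.0 uL Protease Arrest per mL of lysate; the lysate volume is assumed
    # to equal the BugBuster volume.
    protease_arrest_ul = 3.0 * bugbuster_ml
    return LysisMix(bugbuster_ml, benzonase_ul, protease_arrest_ul)


if __name__ == '__main__':
    print(lysis_mix(10.0))  # the 10.0 mL culture described in the text
```

For the 10.0 mL culture in the text, this gives 3.0 mL BugBuster, 1.0 µL Benzonase and 9.0 µL Protease Arrest.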
The lysate was checked for bright fluorescence before and after centrifugation at 16,000 × g for 1 hour at 4°C.: 6xHis-tag purification of the protein was achieved by gravity-flow affinity chromatography using Ni-NTA agarose (Qiagen, Valencia, CA, USA) under native conditions, essentially following the manufacturer's instructions. The crude lysate was loaded onto a 1.0 mL bed of the nickel-charged resin in a 5.0 mL column, and the chromatographic separation was followed under UV light. The protein bound as a tight, brightly fluorescing band at the top of the column and remained bound through washes with 10 to 60 mM imidazole, at which point slight dissociation of the band was observed. To prevent further dissociation, the band was eluted with 250 mM imidazole and its progress through the column was followed. Peak fractions (determined fluorometrically) were subjected to diagnostic 12% PAGE using Ac:Bis (30%:8%) polyacrylamide gels (Figure 4a). Fractions showing both the 74 kDa OrfAB-GFP and the 17 kDa OrfA proteins were pooled (approximately 700 µL) and concentrated to about 75 µL in a YM-10 Microcon Centrifugal Filter Device (Millipore). They were then dialyzed overnight against 300 mM NaCl, 50 mM tris(hydroxymethyl)aminomethane (Tris-Cl), pH 8.0 and 1.5 mM dithiothreitol using Slide-A-Lyzer cassettes (Pierce/Thermo Scientific, Rockford, IL, USA) and stored in 50% glycerol at -20°C. Concentrations of GFP in the sample shown in Figure 4a were measured by spectrophotometry at 280 nm and 397 nm, while those of the wild type and mutant versions of the fused OrfAB-GFP proteins were measured at 397 nm. Comparative levels of fluorescence of GFP and the fusion proteins were measured fluorometrically and used to confirm the concentration data.: For the overexpression of the OrfAB-GFP fusion protein in the pTWIN2 derivative (IMPACT, New England Biolabs), plasmid pTWorfAB::GFP was transformed into BL21(DE3)pLysS cells. Single colonies were inoculated into 10 mL 2xYT plus Cb plus Cm and grown overnight at 37°C. 
Two milliliters of this starter culture was inoculated into 120 mL of the same medium (to establish an optical density (OD) of 0.2) and grown at 37°C to an OD of 0.8. The culture was then induced with 1.0 mM isopropyl β-D-1-thiogalactopyranoside and allowed to grow overnight at 16°C. The culture was lysed as described above and the cleared lysate loaded onto the chitin column. The protein was purified per the manufacturer's instructions, with binding and elution monitored by UV light-induced fluorescence. Peak fractions were collected, pooled and analyzed as described above, purified on ion exchange Q Sepharose columns (HiTrap Q XL, GE Healthcare) following the manufacturer's instructions, and concentrated, dialyzed and stored as described above.: Electrophoretic mobility shift assays: Annealed 50-mer oligonucleotides containing the 41 bp IRR sequence were used in all but one of the EMSA experiments (Figure 6a-e). The upper strand was labeled at the 5' end with γ32P-ATP. Primer A - upper strand (the IRR sequence is within the square brackets): 5'GGATCC[TTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCA]GCG3'. Primer B - lower strand: 5'CGC[TGGATTTGCCCCTATATTTCCAGACATCTGTTATCACTTAA]GGATCC3'.: Reactions shown in Figure 6f used annealed 87-mer oligonucleotides containing the IRR sequence. The top strand (primer A) was labeled at its 5' end with γ32P-ATP. Primer A - 5'GCTGACTTGACGGGACGGGGATCC[TTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCA]ATCGACCTGCAGGCATATAAGC3'. Primer B - 5'GCTTATATGCCTGCAGGTCGAT[TGGATTTGCCCCTATATTTCCAGACATCTGTTATCACTTAA]GGATCCCCGTCCCGTCAAGTCAGC3'.: A 20 µL labeling reaction contained 40 units of T4 polynucleotide kinase in 1X T4 polynucleotide kinase reaction buffer (New England Biolabs), 20 µM of the primer (upper strand) and 50 µCi of γ32P-ATP. The reaction was incubated at 37°C for 30 minutes and heat-inactivated at 90°C for 5 minutes. 
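The oligonucleotide sequences above can be checked in a few lines. The 41 bp IRR plus its flanking bases should give 50-mers and 87-mers. Each primer B should also be the exact reverse complement of its primer A, so that the pair anneals into a fully double-stranded probe. The labeling arithmetic (20 µM primer in a 20 µL reaction, i.e. 400 pmol) is included as well. This is a minimal sketch using only the sequences and volumes given in the text; the helper names are illustrative.

```python
# Sequences copied from the text; the IRR is the bracketed part of primer A.
IRR = 'TTAAGTGATAACAGATGTCTGGAAATATAGGGGCAAATCCA'
# Bracketed part of primer B.
IRR_RC = 'TGGATTTGCCCCTATATTTCCAGACATCTGTTATCACTTAA'

A50 = 'GGATCC' + IRR + 'GCG'
B50 = 'CGC' + IRR_RC + 'GGATCC'
A87 = 'GCTGACTTGACGGGACGGGGATCC' + IRR + 'ATCGACCTGCAGGCATATAAGC'
B87 = 'GCTTATATGCCTGCAGGTCGAT' + IRR_RC + 'GGATCCCCGTCCCGTCAAGTCAGC'

_COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse to read the other strand 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


def picomoles(conc_um: float, volume_ul: float) -> float:
    # 1 uM x 1 uL = 1e-6 mol/L x 1e-6 L = 1e-12 mol = 1 pmol.
    return conc_um * volume_ul


if __name__ == '__main__':
    for name, top, bottom in (('50-mer', A50, B50), ('87-mer', A87, B87)):
        print(name, len(top), len(bottom), reverse_complement(top) == bottom)
    print('primer in labeling reaction (pmol):', picomoles(20.0, 20.0))
```

If a check like this ever fails, the likeliest cause is a transcription error in a printed sequence, which is exactly what it is meant to catch.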
A 100-µL annealing reaction contained 10 pmol and 13 pmol of the labeled and unlabeled strands, respectively, 20 mM Tris-Cl pH 8.0 and 100 mM NaCl. The reaction was placed in a boiling water bath, cooled to 65°C, held there for 15 minutes and allowed to cool to room temperature.: Binding of the TPase to its cognate DNA was carried out for 30 minutes at room temperature (20°C) in a 15-µL reaction mixture. The mixture contained 20 mM Tris-Cl pH 8.0, 1 mM ethylenediaminetetraacetic acid, 5.0 µg/mL calf thymus DNA, 10 nM of the radioactively labeled annealed primers and 0.13 µM of the partially purified preparation of the OrfAB-GFP fusion protein. Reactions were separated on 5% native polyacrylamide gels at 4°C for an average of 450 volt hours (Vhrs) (see Figure 6).: Secondary structure algorithms and protein alignment tools: The ExPASy SWISS PROT translation toolkit [49] of the Swiss Institute of Bioinformatics was used to translate into protein sequences the DNA sequences from the prototypes of the principal subgroups of the IS3 family, that is, IS2, IS3, IS51, and IS407 plus IS911 of the IS3 subgroup and IS861 of the IS150 subgroup. Similar translations were done for sequences of the HIV-1 and RSV integrases. The ClustalW2 multiple alignment tool [50] was used for the alignment of protein sequences in Figure 7. Structure-based alignments in Figure 10 were determined from several sources:

- the sequences shown in Figure 7;
- published RSV and HIV-1 sequences [73, 109, 110];
- the alignments of Fayet et al. [60] and Rezsohazy et al. [61];
- the PSIPRED secondary structure determinations for the members of the IS3 family subgroups and the two integrases.

In these aligned sequences, functionally conserved non-polar hydrophobic residues were designated h1 when all sequences possessed only very hydrophobic residues (L, I, V, C, M, F or W). They were designated h2 when less hydrophobic residues were present or when the conserved residues were found in fewer than 80% of the sequences. 
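The h1/h2 rule above can be expressed as a classifier applied to one alignment column at a time. This is a sketch of one literal reading of the rule, not the authors' procedure. It makes two assumptions:

- the set of less hydrophobic residues is A, G and Y;
- a column must reach a 50% floor to be treated as conserved at all.

Columns that the quoted rule leaves undefined return None. One example is a column of 80 to 99% very hydrophobic residues with one polar residue.

```python
from typing import Optional

# Very hydrophobic residues, as listed in the text.
STRONG = frozenset('LIVCMFW')
# Less hydrophobic residues: an assumed set, not given in the text.
WEAK = frozenset('AGY')


def classify_column(column: str, min_fraction: float = 0.5) -> Optional[str]:
    # column holds one residue per aligned sequence at a single position.
    residues = column.upper()
    if not residues:
        return None
    n = len(residues)
    strong = sum(r in STRONG for r in residues) / n
    hydrophobic = sum(r in STRONG or r in WEAK for r in residues) / n
    if strong == 1.0:
        return 'h1'  # every sequence has a very hydrophobic residue
    if hydrophobic == 1.0:
        return 'h2'  # all hydrophobic, but some only weakly so
    if min_fraction <= strong < 0.8:
        return 'h2'  # very hydrophobic in fewer than 80% of sequences
    return None  # not covered by the rule as quoted


if __name__ == '__main__':
    for col in ('LIVF', 'LIVA', 'LLSS', 'LSSS'):
        print(col, classify_column(col))
```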
Three different algorithms were used for secondary structure predictions: the PSIPRED server [51], the PROF Secondary Structure Prediction Protocol [53] using the Bioinformatics Information toolkit of the Max Planck Institute for Developmental Biology, and the PHD Secondary Structure Analysis Algorithm [55] from the secondary structure prediction protocol of PBIL (pbil.univ-lyon.fr; [54]). The PCOILS algorithm for coiled coils from the Bioinformatics Information toolkit of the Max Planck Institute for Developmental Biology [57, 58] was used to predict the presence of a coiled coil motif. The 2ZIP server [59] from the same institution was used to predict the presence of an LZ within the coiled coil motif.: Abbreviations: carbenicillin: catalytic active site: catalytic domain: electrophoretic mobility shift assay: extended-10 promoter: figure-of-eight: green fluorescent protein: right and left inverted repeats: insertion sequences: inverted repeat: kilobases: kilodaltons: lysogeny broth: leucine zipper: minicircle junction: sodium chloride: optical density: open reading frame: polymerase chain reaction: TPase/retroviral integrase superfamily: Rous sarcoma virus: synaptic complex: transposase: tris(hydroxymethyl)aminomethane: volt hour.: References: Chandler M, Mahillon J: Insertion sequences revisited. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: ASM Press, 305-366.: Rousseau P, Normand C, Loot C, Turlan C, Alazard R, Duval-Valentin G, Chandler M: Transposition of IS911. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: ASM Press, 367-383.: Polard P, Prere MF, Chandler M, Fayet O: Programmed translational frameshifting and initiation at an AUU codon in gene expression of bacterial insertion sequence IS911. J Mol Biol. 1991, 222: 465-477. 
10.1016/0022-2836(91)90490-W.: Vogele K, Schwartz E, Welz C, Schiltz E, Rak B: High-level ribosomal frameshifting directs the synthesis of IS150 gene products. Nucleic Acids Res. 1991, 19: 4377-4385. 10.1093/nar/19.16.4377.: Hu ST, Lee LC, Lei GS: Detection of an IS2-encoded 46-kilodalton protein capable of binding terminal repeats of IS2. J Bacteriol. 1996, 178: 5652-5659.: Lewis LA, Grindley ND: Two abundant intramolecular transposition products, resulting from reactions initiated at a single end, suggest that IS2 transposes by an unconventional pathway. Mol Microbiol. 1997, 25: 517-529. 10.1046/j.1365-2958.1997.4871848.x.: Lewis LA, Gadura N, Greene M, Saby R, Grindley ND: The basis of asymmetry in IS2 transposition. Mol Microbiol. 2001, 42: 887-901. 10.1046/j.1365-2958.2001.02662.x.: Normand C, Duval-Valentin G, Haren L, Chandler M: The terminal inverted repeats of IS911: requirements for synaptic complex assembly and activity. J Mol Biol. 2001, 308: 853-871. 10.1006/jmbi.2001.4641.: Polard P, Chandler M: Bacterial TPases and retroviral integrases. Mol Microbiol. 1995, 15: 13-23. 10.1111/j.1365-2958.1995.tb02217.x.: Duval-Valentin G, Marty-Cointin B, Chandler M: Requirement of IS911 replication before integration defines a new bacterial transposition pathway. Embo J. 2004, 23: 3897-3906. 10.1038/sj.emboj.7600395.: Kiss J, Olasz F: Formation and transposition of the covalently closed IS30 circle: the relation between tandem dimers and monomeric circles. Mol Microbiol. 1999, 34: 37-52. 10.1046/j.1365-2958.1999.01567.x.: Loessner I, Dietrich K, Dittrich D, Hacker J, Ziebuhr W: TPase-dependent formation of circular IS256 derivatives in Staphylococcus epidermidis and Staphylococcus aureus. J Bacteriol. 2002, 184: 4709-4714. 10.1128/JB.184.17.4709-4714.2002.: Prudhomme M, Turlan C, Claverys JP, Chandler M: Diversity of Tn4001 transposition products: the flanking IS256 elements can form tandem dimers and IS circles. J Bacteriol. 2002, 184: 433-443. 
10.1128/JB.184.2.433-443.2002.: Schmid S, Berger B, Haas D: Target joining of duplicated insertion sequence IS21 is assisted by IstB protein in vitro. J Bacteriol. 1999, 181: 2286-2289.: Rousseau P, Tardin C, Tolou N, Salome L, Chandler M: A model for the molecular organisation of the IS911 transpososome. Mob DNA. 2010, 1: 16. 10.1186/1759-8753-1-16.: Polard P, Chandler M: An in vivo TPase-catalyzed single-stranded DNA circularization reaction. Genes Dev. 1995, 9: 2846-2858. 10.1101/gad.9.22.2846.: Polard P, Ton-Hoang B, Haren L, Betermier M, Walczak R, Chandler M: IS911-mediated transpositional recombination in vitro. J Mol Biol. 1996, 264: 68-81. 10.1006/jmbi.1996.0624.: Szabo M, Kiss J, Nagy Z, Chandler M, Olasz F: Sub-terminal sequences modulating IS30 transposition in vivo and in vitro. J Mol Biol. 2008, 375: 337-352. 10.1016/j.jmb.2007.10.043.: Lewis LA, Cylin E, Lee HK, Saby R, Wong W, Grindley ND: The left end of IS2: a compromise between transpositional activity and an essential promoter function that regulates the transposition pathway. J Bacteriol. 2004, 186: 858-865. 10.1128/JB.186.3.858-865.2004.: Szeverenyi I, Bodoky T, Olasz F: Isolation, characterization and transposition of an (IS2)2 intermediate. Mol Gen Genet. 1996, 251: 281-289.: Ton-Hoang B, Betermier M, Polard P, Chandler M: Assembly of a strong promoter following IS911 circularization and the role of circles in transposition. Embo J. 1997, 16: 3357-3371. 10.1093/emboj/16.11.3357.: Sekine Y, Aihara K, Ohtsubo E: Linearization and transposition of circular molecules of insertion sequence IS3. J Mol Biol. 1999, 294: 21-34. 10.1006/jmbi.1999.3181.: Haas M, Rak B: Escherichia coli insertion sequence IS150: transposition via circular and linear intermediates. J Bacteriol. 2002, 184: 5833-5841. 10.1128/JB.184.21.5833-5841.2002.: Haren L, Ton-Hoang B, Chandler M: Integrating DNA: TPases and retroviral integrases. Annu Rev Microbiol. 1999, 53: 245-281. 
10.1146/annurev.micro.53.1.245.: Nowotny M: Retroviral integrase superfamily: the structural perspective. EMBO Rep. 2009, 10: 144-151. 10.1038/embor.2008.256.: Rice PA, Baker TA: Comparative architecture of TPase and integrase complexes. Nat Struct Biol. 2001, 8: 302-307. 10.1038/86166.: Rowland SJ, Dyke KG: Tn552, a novel transposable element from Staphylococcus aureus. Mol Microbiol. 1990, 4: 961-975. 10.1111/j.1365-2958.1990.tb00669.x.: Nagy Z, Szabo M, Chandler M, Olasz F: Analysis of the N-terminal DNA binding domain of the IS30 TPase. Mol Microbiol. 2004, 54: 478-488. 10.1111/j.1365-2958.2004.04279.x.: Stalder R, Caspers P, Olasz F, Arber W: The N-terminal domain of the insertion sequence 30 TPase interacts specifically with the terminal inverted repeats of the element. J Biol Chem. 1990, 265: 3757-3762.: Haren L, Normand C, Polard P, Alazard R, Chandler M: IS911 transposition is regulated by protein-protein interactions via a leucine zipper motif. J Mol Biol. 2000, 296: 757-768. 10.1006/jmbi.1999.3485.: Haren L, Polard P, Ton-Hoang B, Chandler M: Multiple oligomerisation domains in the IS911 TPase: a leucine zipper motif is essential for activity. J Mol Biol. 1998, 283: 29-41. 10.1006/jmbi.1998.2053.: Hu ST, Hwang JH, Lee LC, Lee CH, Li PL, Hsieh YC: Functional analysis of the 14 kDa protein of insertion sequence 2. J Mol Biol. 1994, 236: 503-513. 10.1006/jmbi.1994.1161.: Lei GS, Hu ST: Functional domains of the InsA protein of IS2. J Bacteriol. 1997, 179: 6238-6243.: Rousseau P, Gueguen E, Duval-Valentin G, Chandler M: The helix-turn-helix motif of bacterial insertion sequence IS911 TPase is required for DNA binding. Nucleic Acids Res. 2004, 32: 1335-1344. 10.1093/nar/gkh276.: Barth S, Huhn M, Matthey B, Klimka A, Galinski EA, Engert A: Compatible-solute-supported periplasmic expression of functional recombinant proteins under stress conditions. Appl Environ Microbiol. 2000, 66: 1572-1579. 
10.1128/AEM.66.4.1572-1579.2000.: Davis GD, Elisee C, Newham DM, Harrison RG: New fusion protein systems designed to give soluble expression in Escherichia coli. Biotechnol Bioeng. 1999, 65: 382-388. 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I.: Galloway CA, Sowden MP, Smith HC: Increasing the yield of soluble recombinant protein expressed in E. coli by induction during late log phase. Biotechniques. 2003, 34: 524-526, 528, 530.: Jenkins TM, Hickman AB, Dyda F, Ghirlando R, Davies DR, Craigie R: Catalytic domain of human immunodeficiency virus type 1 integrase: identification of a soluble mutant by systematic replacement of hydrophobic residues. Proc Natl Acad Sci USA. 1995, 92: 6057-6061. 10.1073/pnas.92.13.6057.: Compaan DM, Ellington WR: Functional consequences of a gene duplication and fusion event in an arginine kinase. J Exp Biol. 2003, 206: 1545-1556. 10.1242/jeb.00299.: Stempfer G, Holl-Neugebauer B, Rudolph R: Improved refolding of an immobilized fusion protein. Nat Biotechnol. 1996, 14: 329-334. 10.1038/nbt0396-329.: Armstrong N, de Lencastre A, Gouaux E: A new protein folding screen: application to the ligand binding domains of a glutamate and kainate receptor and to lysozyme and carbonic anhydrase. Protein Sci. 1999, 8: 1475-1483. 10.1110/ps.8.7.1475.: Chen GQ, Gouaux E: Overexpression of a glutamate receptor (GluR2) ligand binding domain in Escherichia coli: application of a novel protein folding screen. Proc Natl Acad Sci USA. 1997, 94: 13431-13436. 10.1073/pnas.94.25.13431.: Rudolph R, Lilie H: In vitro folding of inclusion body proteins. Faseb J. 1996, 10: 49-56.: Waldo GS, Standish BM, Berendzen J, Terwilliger TC: Rapid protein-folding assay using green fluorescent protein. Nat Biotechnol. 1999, 17: 691-695. 10.1038/10904.: Chalmers R, Guhathakurta A, Benjamin H, Kleckner N: IHF modulation of Tn10 transposition: sensory transduction of supercoiling status via a proposed protein/DNA molecular spring. Cell. 1998, 93: 897-908. 
10.1016/S0092-8674(00)81449-X.: Chalmers RM, Kleckner N: Tn10/IS10 TPase purification, activation, and in vitro reaction. J Biol Chem. 1994, 269: 8029-8035.: Craig NL, Nash HA: E. coli integration host factor binds to specific sites in DNA. Cell. 1984, 39: 707-716. 10.1016/0092-8674(84)90478-1.: Krebs MP, Reznikoff WS: Use of a Tn5 derivative that creates lacZ translational fusions to obtain a transposition mutant. Gene. 1988, 63: 277-285. 10.1016/0378-1119(88)90531-8.: Gasteiger E, Gattiker A, Hoogland C, Ivanyi I, Appel RD, Bairoch A: ExPASy: The proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 2003, 31: 3784-3788. 10.1093/nar/gkg563.: Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.: McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics. 2000, 16: 404-405. 10.1093/bioinformatics/16.4.404.: Prere MF, Chandler M, Fayet O: Transposition in Shigella dysenteriae: isolation and analysis of IS911, a new member of the IS3 group of insertion sequences. J Bacteriol. 1990, 172: 4090-4099.: Ouali M, King RD: Cascaded multiple classifiers for secondary structure prediction. Protein Sci. 2000, 9: 1162-1176. 10.1110/ps.9.6.1162.: Combet C, Blanchet C, Geourjon C, Deleage G: NPS@: network protein sequence analysis. Trends Biochem Sci. 2000, 25: 147-150. 10.1016/S0968-0004(99)01540-6.: Rost B, Sander C: Combining evolutionary information and neural networks to predict protein secondary structure. Proteins. 1994, 19: 55-72. 10.1002/prot.340190108.: Dodd IB, Egan JB: Improved detection of helix-turn-helix DNA-binding motifs in protein sequences. Nucleic Acids Res. 1990, 18: 5019-5026. 10.1093/nar/18.17.5019.: Lupas A: Coiled coils: new structures and new functions. Trends Biochem Sci. 
1996, 21: 375-382.: Lupas A, Van Dyke M, Stock J: Predicting coiled coils from protein sequences. Science. 1991, 252: 1162-1164. 10.1126/science.252.5009.1162.: Bornberg-Bauer E, Rivals E, Vingron M: Computational approaches to identify leucine zippers. Nucleic Acids Res. 1998, 26: 2740-2746. 10.1093/nar/26.11.2740.: Fayet O, Ramond P, Polard P, Prere MF, Chandler M: Functional similarities between retroviruses and the IS3 family of bacterial insertion sequences?. Mol Microbiol. 1990, 4: 1771-1777. 10.1111/j.1365-2958.1990.tb00555.x.: Rezsohazy R, Hallet B, Delcour J, Mahillon J: The IS4 family of insertion sequences: evidence for a conserved TPase motif. Mol Microbiol. 1993, 9: 1283-1295. 10.1111/j.1365-2958.1993.tb01258.x.: Katzman M, Mack JP, Skalka AM, Leis J: A covalent complex between retroviral integrase and nicked substrate DNA. Proc Natl Acad Sci USA. 1991, 88: 4695-4699. 10.1073/pnas.88.11.4695.: Kulkosky J, Jones KS, Katz RA, Mack JP, Skalka AM: Residues critical for retroviral integrative recombination in a region that is highly conserved among retroviral/retrotransposon integrases and bacterial insertion sequence TPases. Mol Cell Biol. 1992, 12: 2331-2338.: Rice P, Mizuuchi K: Structure of the bacteriophage Mu TPase core: a common structural motif for DNA transposition and retroviral integration. Cell. 1995, 82: 209-220. 10.1016/0092-8674(95)90308-9.: Ohta S, Tsuchida K, Choi S, Sekine Y, Shiga Y, Ohtsubo E: Presence of a characteristic D-D-E motif in IS1 TPase. J Bacteriol. 2002, 184: 6146-6154. 10.1128/JB.184.22.6146-6154.2002.: Ton-Hoang B, Turlan C, Chandler M: Functional domains of the IS1 TPase: analysis in vivo and in vitro. Mol Microbiol. 2004, 53: 1529-1543. 10.1111/j.1365-2958.2004.04223.x.: Davies DR, Goryshin IY, Reznikoff WS, Rayment I: Three-dimensional structure of the Tn5 synaptic complex transposition intermediate. Science. 2000, 289: 77-85. 
10.1126/science.289.5476.77.: Chen JC, Krucinski J, Miercke LJ, Finer-Moore JS, Tang AH, Leavitt AD, Stroud RM: Crystal structure of the HIV-1 integrase catalytic core and C-terminal domains: a model for viral DNA binding. Proc Natl Acad Sci USA. 2000, 97: 8233-8238. 10.1073/pnas.150220297.: Dyda F, Hickman AB, Jenkins TM, Engelman A, Craigie R, Davies DR: Crystal structure of the catalytic domain of HIV-1 integrase: similarity to other polynucleotidyl transferases. Science. 1994, 266: 1981-1986. 10.1126/science.7801124.: Goldgur Y, Dyda F, Hickman AB, Jenkins TM, Craigie R, Davies DR: Three new structures of the core domain of HIV-1 integrase: an active site that binds magnesium. Proc Natl Acad Sci USA. 1998, 95: 9150-9154. 10.1073/pnas.95.16.9150.: Maignan S, Guilloteau JP, Zhou-Liu Q, Clement-Mella C, Mikol V: Crystal structures of the catalytic domain of HIV-1 integrase free and complexed with its metal cofactor: high level of similarity of the active site with other viral integrases. J Mol Biol. 1998, 282: 359-368. 10.1006/jmbi.1998.2002.: Bujacz G, Jaskolski M, Alexandratos J, Wlodawer A, Merkel G, Katz RA, Skalka AM: High-resolution structure of the catalytic domain of avian sarcoma virus integrase. J Mol Biol. 1995, 253: 333-346. 10.1006/jmbi.1995.0556.: Yang ZN, Mueser TC, Bushman FD, Hyde CC: Crystal structure of an active two-domain derivative of Rous sarcoma virus integrase. J Mol Biol. 2000, 296: 535-548. 10.1006/jmbi.1999.3463.: Katayanagi K, Miyagawa M, Matsushima M, Ishikawa M, Kanaya S, Ikehara M, Matsuzaki T, Morikawa K: Three-dimensional structure of ribonuclease H from E. coli. Nature. 1990, 347: 306-309. 10.1038/347306a0.: Yang W, Hendrickson WA, Crouch RJ, Satow Y: Structure of ribonuclease H phased at 2 A resolution by MAD analysis of the selenomethionyl protein. Science. 1990, 249: 1398-1405. 
10.1126/science.2169648.: Ariyoshi M, Vassylyev DG, Iwasaki H, Nakamura H, Shinagawa H, Morikawa K: Atomic structure of the RuvC resolvase: a holliday junction-specific endonuclease from E. coli. Cell. 1994, 78: 1063-1072. 10.1016/0092-8674(94)90280-1.: Rice P, Craigie R, Davies DR: Retroviral integrases and their cousins. Curr Opin Struct Biol. 1996, 6: 76-83. 10.1016/S0959-440X(96)80098-4.: Lovell S, Goryshin IY, Reznikoff WR, Rayment I: Two-metal active site binding of a Tn5 TPase synaptic complex. Nat Struct Biol. 2002, 9: 278-281. 10.1038/nsb778.: Wintjens R, Rooman M: Structural classification of HTH DNA-binding domains and protein-DNA interaction modes. J Mol Biol. 1996, 262: 294-313. 10.1006/jmbi.1996.0514.: Landschulz WH, Johnson PF, McKnight SL: The leucine zipper: a hypothetical structure common to a new class of DNA binding proteins. Science. 1988, 240: 1759-1764. 10.1126/science.3289117.: O'Shea EK, Klemm JD, Kim PS, Alber T: X-ray structure of the GCN4 leucine zipper, a two-stranded, parallel coiled coil. Science. 1991, 254: 539-544. 10.1126/science.1948029.: Jenkins TM, Esposito D, Engelman A, Craigie R: Critical contacts between HIV-1 integrase and viral DNA identified by structure-based analysis and photo-crosslinking. Embo J. 1997, 16: 6849-6859. 10.1093/emboj/16.22.6849.: Zimmer M: Green fluorescent protein (GFP): applications, structure, and related photophysical behavior. Chem Rev. 2002, 102: 759-781. 10.1021/cr010142r.: Gupta RD, Tawfik DS: Directed enzyme evolution via small and effective neutral drift libraries. Nat Methods. 2008, 5: 939-942. 10.1038/nmeth.1262.: Hanson DA, Ziegler SF: Fusion of green fluorescent protein to the C-terminus of granulysin alters its intracellular localization in comparison to the native molecule. J Negat Results Biomed. 2004, 3: 2. 10.1186/1477-5751-3-2.: Liu AX, Zhang SB, Xu XJ, Ren DT, Liu GQ: Soluble expression and characterization of a GFP-fused pea actin isoform (PEAc1). Cell Res. 2004, 14: 407-414. 
10.1038/sj.cr.7290241.: Davies DR, Mahnke Braam L, Reznikoff WS, Rayment I: The three-dimensional structure of a Tn5 TPase-related protein determined to 2.9-A resolution. J Biol Chem. 1999, 274: 11904-11913. 10.1074/jbc.274.17.11904.: Wiegand TW, Reznikoff WS: Interaction of Tn5 TPase with the transposon termini. J Mol Biol. 1994, 235: 486-495. 10.1006/jmbi.1994.1008.: Hennig S, Ziebuhr W: Characterization of the TPase encoded by IS256, the prototype of a major family of bacterial insertion sequence elements. J Bacteriol. 2010, 192: 4153-4163. 10.1128/JB.00226-10.: Derbyshire KM, Grindley ND: Binding of the IS903 TPase to its inverted repeat in vitro. Embo J. 1992, 11: 3449-3455.: Vos JC, van Luenen HG, Plasterk RH: Characterization of the Caenorhabditis elegans Tc1 TPase in vivo and in vitro. Genes Dev. 1993, 7: 1244-1253. 10.1101/gad.7.7a.1244.: Mack AM, Crawford NM: The Arabidopsis TAG1 TPase has an N-terminal zinc finger DNA binding domain that recognizes distinct subterminal motifs. Plant Cell. 2001, 13: 2319-2331.: Betts MJ, Russell RB: Amino acid properties and consequences of substitutions. Bioinformatics for Geneticists. Edited by: Barnes MI, Gray IC. 2003, Chichester, UK: John Wiley and Sons Ltd., 289-316.: Henikoff S, Henikoff JG: Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA. 1992, 89: 10915-10919. 10.1073/pnas.89.22.10915.: Pabo CO, Sauer RT: Transcription factors: structural families and principles of DNA recognition. Annu Rev Biochem. 1992, 61: 1053-1095. 10.1146/annurev.bi.61.070192.005201.: Clubb RT, Schumacher S, Mizuuchi K, Gronenborn AM, Clore GM: Solution structure of the I gamma subdomain of the Mu end DNA-binding domain of phage Mu TPase. J Mol Biol. 1997, 273: 19-25. 10.1006/jmbi.1997.1312.: Schumacher S, Clubb RT, Cai M, Mizuuchi K, Clore GM, Gronenborn AM: Solution structure of the Mu end DNA-binding ibeta subdomain of phage Mu TPase: modular DNA recognition by two tethered domains. Embo J. 
1997, 16: 7532-7541. 10.1093/emboj/16.24.7532.: van Pouderoyen G, Ketting RF, Perrakis A, Plasterk RH, Sixma TK: Crystal structure of the specific DNA-binding domain of Tc3 TPase of C.elegans in complex with transposon DNA. Embo J. 1997, 16: 6044-6054. 10.1093/emboj/16.19.6044.: Izsvak Z, Khare D, Behlke J, Heinemann U, Plasterk RH, Ivics Z: Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J Biol Chem. 2002, 277: 34581-34588. 10.1074/jbc.M204001200.: Kissinger CR, Liu BS, Martin-Blanco E, Kornberg TB, Pabo CO: Crystal structure of an engrailed homeodomain-DNA complex at 2.8 A resolution: a framework for understanding homeodomain-DNA interactions. Cell. 1990, 63: 579-590. 10.1016/0092-8674(90)90453-L.: Xu W, Rould MA, Jun S, Desplan C, Pabo CO: Crystal structure of a paired domain-DNA complex at 2.5 A resolution reveals structural basis for Pax developmental mutations. Cell. 1995, 80: 639-650. 10.1016/0092-8674(95)90518-9.: Gonzalez L, Woolfson DN, Alber T: Buried polar residues and structural specificity in the GCN4 leucine zipper. Nat Struct Biol. 1996, 3: 1011-1018. 10.1038/nsb1296-1011.: Graddis TJ, Myszka DG, Chaiken IM: Controlled formation of model homo- and heterodimer coiled coil polypeptides. Biochemistry. 1993, 32: 12664-12671. 10.1021/bi00210a015.: O'Shea EK, Lumb KJ, Kim PS: Peptide 'Velcro': design of a heterodimeric coiled coil. Curr Biol. 1993, 3: 658-667. 10.1016/0960-9822(93)90063-T.: Baker TA, Luo L: Identification of residues in the Mu TPase essential for catalysis. Proc Natl Acad Sci USA. 1994, 91: 6654-6658. 10.1073/pnas.91.14.6654.: Esposito D, Craigie R: Sequence specificity of viral end DNA binding by HIV-1 integrase reveals critical regions for protein-DNA interaction. Embo J. 1998, 17: 5832-5843. 10.1093/emboj/17.19.5832.: van Gent DC, Groeneger AA, Plasterk RH: Mutational analysis of the integrase protein of human immunodeficiency virus type 2. 
Proc Natl Acad Sci USA. 1992, 89: 9598-9602. 10.1073/pnas.89.20.9598.: Calmels C, de Soultrait VR, Caumont A, Desjobert C, Faure A, Fournier M, Tarrago-Litvak L, Parissi V: Biochemical and random mutagenesis analysis of the region carrying the catalytic E152 amino acid of HIV-1 integrase. Nucleic Acids Res. 2004, 32: 1527-1538. 10.1093/nar/gkh298.: Andrake MD, Skalka AM: Retroviral integrase, putting the pieces together. J Biol Chem. 1996, 271: 19633-19636. 10.1074/jbc.271.33.19633.: Valkov E, Gupta SS, Hare S, Helander A, Roversi P, McClure M, Cherepanov P: Functional and structural characterization of the integrase from the prototype foamy virus. Nucleic Acids Res. 2009, 37: 243-255. 10.1093/nar/gkn938.: Obradovic Z, Peng K, Vucetic S, Radivojac P, Dunker AK: Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins. 2005, 61 (Suppl 7): 176-182.: Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN, Obradovic Z, Dunker AK: DisProt: the Database of Disordered Proteins. Nucleic Acids Res. 2007, 35: D786-793. 10.1093/nar/gkl893.: Acknowledgements: We thank W Wong for technical assistance and NDF Grindley for useful discussions. This research was supported by US Public Health Service grant NIGMS/MBRS GM08153 to LAL and a York College FDSP award 990110 to LAL.: Author information: Corresponding author: Leslie A Lewis.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: PTU created the fusion construct and carried out the overexpression and protein purification experiments, the secondary structure analysis and the in silico determination of the amino acid substitutions in the mutant strains. SA carried out all cloning experiments involving the creation of plasmids with the orfAB mutations. RS performed the PCR and PCR-based mutagenesis experiments. 
JA carried out all of the lacZ papillation experiments. LAL designed the study and provided facilities and funding. LAL and MA wrote the manuscript. All authors have read and approved the final manuscript.: Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: About this article: Cite this article: Lewis, L.A., Astatke, M., Umekubo, P.T. et al. Soluble expression, purification and characterization of the full length IS2 Transposase. Mobile DNA 2, 14 (2011). https://doi.org/10.1186/1759-8753-2-14: Received: 26 August 2011: Accepted: 27 October 2011: Published: 27 October 2011: DOI: https://doi.org/10.1186/1759-8753-2-14: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Characterization and potential functional significance of human-chimpanzee large INDEL variation" "Nalini Polavarapu, Gaurav Arora, Vinay K Mittal, John F McDonald" "John F McDonald" "25 October 2011" "Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression., Extensive, large INDEL variation exists between the human and chimpanzee genomes. This variation is primarily attributable to retrotransposon insertions within the human lineage. There is a significant correlation between differences in gene expression and large human-chimpanzee INDEL variation mapping in genes or in proximity to them., The results presented herein are consistent with the hypothesis that large INDELs, particularly those associated with retrotransposons, have played a significant role in human-chimpanzee regulatory evolution." 
"insertion and deletion, differential gene expression, retrotransposon, noninterspersed sequence, human insertion, short interspersed nuclear element" " Characterization and potential functional significance of human-chimpanzee large INDEL variation: Nalini Polavarapu, Gaurav Arora, Vinay K Mittal & John F McDonald: Mobile DNA volume 2, Article number: 13 (2011): Abstract: Background: Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six million years ago, their genomes are more than 98.5% identical at protein-coding loci. This modest degree of nucleotide divergence is not sufficient to explain the extensive phenotypic differences between the two species. It has been hypothesized that the genetic basis of the phenotypic differences lies at the level of gene regulation and is associated with the extensive insertion and deletion (INDEL) variation between the two species. To test the hypothesis that large INDELs (80 to 12,000 bp) may have contributed significantly to differences in gene regulation between the two species, we categorized human-chimpanzee INDEL variation mapping in or around genes and determined whether this variation is significantly correlated with previously determined differences in gene expression.: Results: Extensive, large INDEL variation exists between the human and chimpanzee genomes. This variation is primarily attributable to retrotransposon insertions within the human lineage. 
There is a significant correlation between differences in gene expression and large human-chimpanzee INDEL variation mapping in genes or in proximity to them.: Conclusions: The results presented herein are consistent with the hypothesis that large INDELs, particularly those associated with retrotransposons, have played a significant role in human-chimpanzee regulatory evolution.: Background: Although humans and chimpanzees have accumulated significant differences in a number of phenotypic traits since diverging from a common ancestor about six to eight million years ago, their genomes are more than 98.5% identical at protein-coding loci [1]. Since this modest degree of nucleotide divergence does not seem sufficient to explain the extensive phenotypic differences that exist between the two species, it has been hypothesized that the genetic basis of the differences lies at the level of gene regulation [2] and is associated with the extensive insertion and deletion (INDEL) variation between the two species [3].: A number of comparative genomic studies focusing on specific chromosomal regions of humans and nonhuman primates have revealed that significant INDEL variation exists between these species [4, 5]. For example, in a comparison of human chromosome 21 and the syntenic chimpanzee chromosome 22, as many as 68,000 INDELs were identified [6]. We have shown previously that interspersed repeats, particularly retrotransposons (RTs), have contributed significantly to the INDEL variation between humans and chimpanzees [7]. Because RT sequences located in or near genes have the capacity to significantly alter patterns of gene expression, it has long been recognized that these elements may be important factors in regulatory evolution [8–16]. Other sources of INDEL variation between chimpanzees and humans are simple tandem repeats (TRs) and other noninterspersed sequences (NISs) [17]. 
Because NISs in or near genes are capable of altering gene expression, they also have been postulated to play a role in regulatory evolution [18–23].: In this article, we present our detailed characterization of large INDEL variation (80 to 12,000 bp in length) associated with human and chimpanzee genes and test whether this variation is significantly correlated with differences in gene expression in a variety of tissues. We characterize INDELs by type (that is, chimpanzee insertion (CI), chimpanzee deletion (CD), human insertion (HI) and human deletion (HD) of interspersed sequences and/or NISs). Our results indicate that both interspersed repeats (predominantly RTs) and NISs have contributed significantly to human-chimpanzee genome evolution, primarily due to insertions within the human lineage. This variation is significantly correlated with previously determined differences in gene expression, consistent with the hypothesis that large INDEL variation has played a significant role in human-chimpanzee evolution.: Results and discussion: The computational pipeline of our analysis is outlined in Figure 1 (see Methods for additional information).: Computational pipeline for the detection and characterization of human and chimpanzee insertions and deletions. Using information from the designated databases, we characterized insertions and deletions (INDELs) and analyzed them using various in-house Perl scripts and open source algorithms (Multiz, RepeatMasker [44] and Tandem Repeats Finder [45]). The multiple alignment program Multiz was used to classify chimpanzee gaps (CGs) as insertions or deletions. The UCSC Genome Browser [40] pairwise alignment databases were used for human gap (HG) classification as insertions or deletions. 
Human and chimpanzee INDELs were associated with the known human and chimpanzee Ensembl genes [30] obtained from the UCSC Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables), and the presence of INDELs was correlated with the microarray gene expression data. INDEL sequences obtained from their corresponding reference genomes were searched for various repeat elements using RepeatMasker and Tandem Repeats Finder and classified according to the families of repeat sequences (partial or complete) present within each INDEL. The characterized INDELs were then assessed using various statistical methods.: Characterization of human and chimpanzee gaps: We use the term \"human gaps\" (HGs) to refer to sequences present in chimpanzees but absent in humans and the term \"chimpanzee gaps\" (CGs) for sequences present in humans but absent in chimpanzees [7]. Collectively, these gaps constitute the INDEL variation (defined in this study as gaps ranging in size from 80 to 12,000 bp) between humans and chimpanzees. Using the database available at the UCSC Genome Bioinformatics web site [24], we identified a total of 26,509 INDELs (11,365 HGs and 15,144 CGs) (Table 1). The majority (18,574/26,509, or 70%) of these INDELs are interspersed sequences associated with transposable elements (TEs) (that is, complete, truncated or composite TE sequences repeated multiple times throughout the genome). Nearly all the TE-associated INDELs (18,476/18,574, or 99.5%) are homologous to RT sequences. The remaining 30% (7,935/26,509) of INDELs, which are not associated with TEs, are composed of what we refer to as \"noninterspersed sequences,\" or NISs, the majority of which (5,335/7,935, or 67%) are gaps of unique sequence (US) (that is, sequences uniquely associated with a single INDEL). 
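The categories used in this characterization can be summarized as a small decision rule. TE-associated INDELs are labeled by repeat class, and INDELs containing more than one interspersed-repeat class are grouped as mosaic elements (MEs), as described in Methods. Noninterspersed sequences (NISs) are split into tandem repeats (TRs) and unique sequence (US). The Python sketch below is illustrative only. It assumes each INDEL has already been annotated with the set of interspersed-repeat classes found in it (for example, by a RepeatMasker-style scan) and with a flag indicating whether a tandem repeat was detected. The function name `categorize_indel` and the label strings are hypothetical; the authors used in-house Perl scripts.

```python
# Hypothetical sketch of the INDEL sequence categories described in the text.
# Assumed inputs: the set of interspersed-repeat classes annotated in one INDEL
# sequence, and whether a tandem repeat (TR) was detected in it.

INTERSPERSED_CLASSES = {'SINE', 'LINE', 'ERV', 'SVA', 'DNA'}


def categorize_indel(repeat_classes: set, has_tandem_repeat: bool) -> str:
    unknown = repeat_classes - INTERSPERSED_CLASSES
    if unknown:
        raise ValueError(f'unexpected repeat classes: {sorted(unknown)}')
    if len(repeat_classes) > 1:
        # More than one type of interspersed repeat: mosaic element (ME).
        return 'ME'
    if len(repeat_classes) == 1:
        # A single interspersed-repeat class: TE-associated INDEL.
        return next(iter(repeat_classes))
    # No interspersed repeat: a noninterspersed sequence (NIS),
    # classified as a tandem repeat (TR) or unique sequence (US).
    return 'TR' if has_tandem_repeat else 'US'


print(categorize_indel({'SINE'}, False))         # SINE
print(categorize_indel({'LINE', 'ERV'}, False))  # ME
print(categorize_indel(set(), True))             # TR
print(categorize_indel(set(), False))            # US
```

In this sketch the tandem-repeat flag is consulted only when no interspersed repeat is present. This mirrors the statement in Methods that Tandem Repeats Finder was applied to the NIS INDELs.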
The remainder of the NISs (2,600/7,935, or 33%) is composed of TRs (Table 1).: The majority of human-chimpanzee INDELs are a result of insertions: A sequence present in humans but missing at the orthologous genomic position in chimpanzees (or vice versa) can result from either an insertion in one species or a deletion in the other. Because DNA TEs make up less than 0.4% (98/26,509, or 0.37%) of human-chimpanzee INDEL variation, the following analysis of the relative contribution of insertions and deletions is limited to RTs and NISs.: By using Rhesus macaques (Macaca mulatta) as an out-group, we determined that 63% (16,518/26,411) of the INDEL variation between humans and chimpanzees is due to insertions (Table 2). The vast majority of all insertions are associated with RTs (12,683/16,518, or 77%), and the majority of RT insertions have occurred in the human lineage (8,648/12,683, or 68%) (Table 3). Indeed, 61% ((5,399 + 10,607)/26,411) of all human-chimpanzee INDELs can be attributed to events (insertions or deletions) that occurred within the human lineage after the two species diverged from a common ancestor (Table 2). Of the RT-associated INDELs, 64% ((3,086 + 8,648)/18,476) occurred in the human lineage (Table 3), as did 54% ((2,313 + 1,959)/7,935) of the NIS-associated INDELs (Table 4). 
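The out-group reasoning above can be written as a parsimony rule. If the Rhesus macaque genome carries the sequence, it was most likely present in the common ancestor and was deleted in the species that lacks it. If the macaque lacks it, the sequence was most likely inserted in the species that carries it. The Python sketch below illustrates that rule under stated assumptions and is not the authors' pipeline. It uses the CG/HG and CI/CD/HI/HD abbreviations from this article, and it ignores the partial insertions and deletions that the study excluded. The function name `classify_gap` is hypothetical.

```python
# Hypothetical sketch: out-group (Rhesus macaque) classification of a gap.
# CG = sequence present in humans but absent in chimpanzees.
# HG = sequence present in chimpanzees but absent in humans.


def classify_gap(gap_type: str, present_in_macaque: bool) -> str:
    if gap_type == 'CG':
        # Macaque has it: ancestral, so lost in chimpanzee (CD);
        # macaque lacks it: gained in human (HI).
        return 'CD' if present_in_macaque else 'HI'
    if gap_type == 'HG':
        # Macaque has it: lost in human (HD); otherwise gained in chimpanzee (CI).
        return 'HD' if present_in_macaque else 'CI'
    raise ValueError(f'gap_type must be CG or HG, got {gap_type!r}')


print(classify_gap('CG', True))   # CD
print(classify_gap('HG', False))  # CI

# Share of RT- and NIS-associated INDELs due to insertions (Table 2).
insertions, total = 16_518, 26_411
print(f'{insertions / total:.0%} of INDELs are insertions')  # 63% of INDELs are insertions
```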
In contrast to RT-associated INDELs, where insertions clearly predominate (12,683/18,476, or 69%) (Table 3), NIS-associated INDELs are more equally attributable to insertion (3,835/7,935, or 48%) and deletion (4,100/7,935, or 52%) events (Table 4).: We grouped INDELs associated with RTs into five groups based upon the subclass of RTs associated with each INDEL: (1) short interspersed nuclear elements (SINEs), (2) long interspersed nuclear elements (LINEs), (3) endogenous retroviruses (ERVs), (4) biologically active composite elements consisting of fragments of SINEs, VNTRs (variable number of tandem repeats), and Alu elements (SVAs), and (5) \"mosaic elements\" (MEs), a term we use to refer to inactive sequences composed of a mosaic of more than one class of the above-named RT homologous sequences. Of the RTs associated with HGs, 49% (3,494/7,121) are homologous to SINEs, 26% (1,847/7,121) to LINEs, 7% (519/7,121) to ERVs, 2% (114/7,121) to SVAs and 16% (1,147/7,121) to MEs (Table 1). Of the RTs associated with CGs, 62% (7,021/11,355) are homologous to SINEs, 18% (2,052/11,355) to LINEs, 3% (356/11,355) to ERVs, 6% (681/11,355) to SVAs and 11% (1,245/11,355) to MEs (Table 1). These values are proportional to the relative frequency of the various classes of RTs in the human and chimpanzee genomes [1, 25].: Consistent with the relative transpositional activity of RT families in humans and chimpanzees [1, 25], we found that the majority of the RT-associated insertions involve SINEs and LINEs (Table 3). RTs with low or undetectable transpositional activity (ERVs and SVAs) were rarely associated with insertions. We found that the frequency of ERV insertions is 1.3-fold higher in chimpanzees than in humans (208/156 = 1.3-fold) (Table 3), predominantly due to the expansion of two chimpanzee-specific endogenous retrovirus families (CERV 1/PTERV 1 and CERV 2) three to five million years ago [7, 26, 27]. 
In contrast, we found that the frequency of SVA-associated insertions is 6.9-fold higher in humans than in chimpanzees (680/98 = 6.9-fold) (Table 3), which is consistent with the presence of transpositionally active SVA subfamilies in the human lineage [28, 29]. Overall, we found that the frequency of RT-associated insertions is more than twofold higher in humans than in chimpanzees (8,648/4,035 = 2.1-fold). The frequency of LINE-associated, SVA-associated and ERV-associated deletions is, on average, higher in humans than in chimpanzees, whereas the frequency of SINE-associated and ME-associated deletions is nearly the same in both species (Table 3).: As stated above, we grouped INDELs associated with NISs into two classes: those associated with TRs and those not associated with TRs, which we classify as USs. We found that the majority of NIS INDELs are associated with US (5,335/7,935, or 67%), most of which (3,034/5,335, or 57%) are deletions (Table 4). In contrast, the majority of TR-associated INDELs are insertions (1,534/2,600, or 59%).: Most INDELs located in or in proximity to human and chimpanzee genes are the consequence of retrotransposon insertions within the human lineage: Of the 34,914 human/chimpanzee genes listed in the Ensembl database (March 2006 build) [30], 10,597 (10,597/34,914, or about 30%) are associated with INDELs (that is, they have one or more INDELs located within the gene or within 5 kb upstream or downstream of it) (Table 5). The majority of INDELs associated with human genes are insertions (HI/(HI + HD): 4,193/(4,193 + 2,034) = 67%), whereas INDELs associated with chimpanzee genes are about equally divided between insertions and deletions (CI/(CI + CD): 2,125/(2,125 + 2,245) = 49%) (Table 5). The percentage of genes associated with RT-containing INDELs ((6,873 + 816)/10,597 = 73%) is more than two times greater than the percentage of genes associated with NIS-containing INDELs ((2,908 + 816)/10,597 = 35%) (Table 5). 
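The gene-association criterion used here is an interval-overlap test. An INDEL counts as associated with a gene if it lies within the gene or within 5 kb upstream or downstream of it, which is equivalent to testing overlap against the gene extended by 5 kb on each side. The Python sketch below assumes 0-based, half-open coordinates on a single assembly. The `Interval` type and the function names are hypothetical, and the study itself matched coordinates with in-house Perl scripts. Because the window is symmetric, strand orientation does not affect the result.

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # within the gene or 5 kb upstream/downstream


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)


def is_associated(indel: Interval, gene: Interval, flank: int = FLANK_BP) -> bool:
    # Extend the gene by `flank` bp on both sides and test for any overlap.
    if indel.chrom != gene.chrom:
        return False
    window_start = max(0, gene.start - flank)
    window_end = gene.end + flank
    return indel.start < window_end and indel.end > window_start


def genes_with_indels(genes: list, indels: list) -> set:
    # Naive all-pairs scan; a sorted sweep or interval tree would scale better.
    return {g for g in genes if any(is_associated(i, g) for i in indels)}


gene = Interval('chr1', 100_000, 120_000)
print(is_associated(Interval('chr1', 124_000, 124_300), gene))  # True (4 kb past the gene end)
print(is_associated(Interval('chr1', 126_000, 126_300), gene))  # False (6 kb past the gene end)
```

Counting genes with at least one associated INDEL, as in Table 5, corresponds to the size of the set returned by `genes_with_indels`.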
Most RT INDELs associated with genes are the result of insertions or deletions within the human lineage (((3,149 + 326) + (1,139 + 155))/(6,873 + 816) = 62%), and the majority of these events are insertions ((3,149 + 326)/((3,149 + 326) + (1,139 + 155)) = 73%) (Table 5). In contrast, NIS INDELs associated with genes are more evenly divided between the human lineage (((718 + 326) + (740 + 155))/(2,908 + 816) = 52%) and the chimpanzee lineage (((674 + 175) + (776 + 160))/(2,908 + 816) = 48%). Similarly, NIS insertions (((718 + 326) + (674 + 175))/(2,908 + 816) = 51%) and NIS deletions (((740 + 155) + (776 + 160))/(2,908 + 816) = 49%) occur at nearly the same frequency.: Human-chimpanzee INDEL variation is correlated with differences in gene expression: Although the identification, quantification and characterization of human-chimpanzee INDEL (and other types of genetic) variation are relatively straightforward, establishing whether this variation is of functional and/or adaptive significance is not. One approach taken by evolutionary biologists in addressing this question is to correlate differences in genetic variation between species with differences in levels of gene expression [31]. Such comparative studies can be problematic because the lack of a significant correlation between differences in gene expression and a specific genetic variant or class of variants at a particular life stage or in a particular tissue does not preclude the possibility that significant correlations exist at other life stages and/or in other tissues not examined. 
Nevertheless, if statistically significant correlations are found at even a single life stage or in a single tissue, they can be informative and suggestive of potentially productive areas of future research.: To explore possible correlations between human-chimpanzee INDEL variation and differences in gene expression, we reanalyzed a previously published human-chimpanzee expression data set consisting of expression arrays from five different tissues (brain, testis, heart, liver and kidney) [31]. A major goal of this previous study was to correlate sequence differences with expression differences, and a number of microarray probe sets for which sequences of sufficient quality could not be obtained in both humans and chimpanzees (for example, sequences required for calculating Ka/Ks ratios) were excluded. Since the quality of the chimpanzee genome sequence has improved in recent years, and because our interest is in the possible contribution of INDELs to chimpanzee-human expression differences, we reanalyzed this microarray data set, including probe sets that had previously been excluded.: Of the 20,676 (Affymetrix May 2004 build) genes examined in our reanalysis, we found that 17,755 (17,755/20,676, or 86%) are expressed genes (we define \"expressed genes\" as those designated as \"present\" by default in MAS 5.0 Affymetrix software (Affymetrix Inc, Santa Clara, CA, USA) in at least one tissue in either chimpanzees or humans) and that 15,004 (15,004/17,755, or 85%) of these expressed genes display a significant between-species difference (P < 0.05) in expression in at least one of the five tissues examined (Table 6). 
The most dramatic difference in gene expression between humans and chimpanzees is in testis, where 70% of expressed genes (10,803/15,445) display a significant difference in expression between chimpanzees and humans, followed by heart (51%), brain (49%), kidney (47%) and liver (39%) (Table 6).: Of all expressed genes (in the tissues and adult life stages examined), 30% to 31% were associated with INDELs (brain: (2,266 + 2,153)/14,133 = 31%; testis: (3,438 + 1,256)/15,445 = 30%; heart: (2,233 + 1,948)/13,497 = 31%; liver: (1,696 + 2,466)/13,684 = 30%; and kidney: (2,179 + 2,144)/14,059 = 31%) (Table 7). Of differentially expressed (DE) genes, 32% to 33% (brain: 2,266/(2,266 + 4,618) = 33%; testis: 3,438/(3,438 + 7,365) = 32%; heart: 2,233/(2,233 + 4,610) = 33%; liver: 1,696/(1,696 + 3,612) = 32%; and kidney: 2,179/(2,179 + 4,410) = 33%) were associated with INDELs (Table 7).: The proportion of DE genes associated with INDELs was significantly greater (P < 0.05) than the proportion of non-differentially expressed (non-DE) genes associated with INDELs in all five tissues, indicating that the association of INDELs with genes may be of functional significance (Table 8). Partitioning these differences between RT-associated and NIS-associated INDELs indicates that both types of INDELs contribute to them, although most of the INDEL-associated DE genes are associated with RTs (Tables 9 and 10).: To further explore the hypothesis that INDELs may contribute to gene expression differences between chimpanzees and humans, we compared the proportion of DE genes among genes associated with INDELs with the proportion of DE genes among genes not associated with INDELs. 
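The comparison described here, worked through for testis in the example that follows, is a test for equality of two proportions. The Python sketch below computes a Pearson chi-square statistic with Yates' continuity correction for a 2 x 2 table. We believe this corresponds to the default two-sample proportions test in R, although the text states only that R proportions tests were used. The function name `two_proportion_test` is hypothetical. With the testis counts from Table 7 (3,438 of 4,694 INDEL-associated genes DE, versus 7,365 of 10,751 genes without INDELs DE), it should give a P value close to the one reported for testis.

```python
import math


def two_proportion_test(x1: int, n1: int, x2: int, n2: int) -> tuple:
    # Chi-square test of equal proportions (x1/n1 vs x2/n2) with Yates'
    # continuity correction; one degree of freedom.
    n = n1 + n2
    pooled = (x1 + x2) / n
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    # In a 2 x 2 table every cell has the same |observed - expected|.
    yates = min(0.5, abs(x1 - expected[0]))
    chi2 = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, P(X > chi2) equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Testis (Table 7): DE genes among INDEL-associated genes vs. among the rest.
chi2, p = two_proportion_test(3_438, 4_694, 7_365, 10_751)
print(f'DE fraction: {3_438 / 4_694:.0%} vs {7_365 / 10_751:.0%}')  # DE fraction: 73% vs 69%
print(f'chi-square = {chi2:.1f}, P = {p:.2e}')
```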
We reasoned that if the presence or absence of an INDEL in or in proximity to chimpanzee and human genes is not a contributing factor to differences in gene expression, the proportion of genes associated (or not associated) with INDELs should be approximately equal for DE and non-DE genes. For example, of the 15,445 genes expressed in testis, 4,694 (3,438 + 1,256) were associated with INDELs and 10,751 (7,365 + 3,386) were not associated with INDELs (Table 7). Of the 4,694 expressed genes associated with INDELs, 73% (3,438/4,694) were DE genes. In contrast, of the 10,751 genes expressed in testis that were not associated with INDELs, 69% (7,365/10,751) were DE genes. These proportions are significantly different (P = 3.93E-09), which is consistent with the hypothesis that the association of genes with an INDEL is of functional significance for DE genes in testis at the life stage examined (Table 11). The same analysis was carried out for genes expressed in the other tissues, and the results indicate that the proportion of INDEL-associated genes that are DE is consistently higher than the proportion of DE genes among genes not associated with INDELs (Table 11).: Little overlap exists between differentially expressed genes associated with INDELs and differentially expressed genes associated with nucleotide sequence differences between species: As indicated previously, the gene expression data used in our analysis were originally generated by Khaitovich et al. [31], who used them to look for correlations with human-chimpanzee nucleotide variation. We were interested in determining the degree of overlap between the DE genes associated with INDEL variation identified in our study and the DE genes previously associated with nucleotide variation in the Khaitovich et al. 
study.: The results presented in Figure 2 indicate that, on average, fewer than 9% of the genes found to be differentially expressed between humans and chimpanzees in these two studies were associated with both nucleotide and INDEL variation. Of the 2,266 DE genes in brain associated with INDEL variation, only 132 (132/2,266, or approximately 6%) were also associated with differences in nucleotide sequence. Similarly low proportions were found for DE genes in heart (170/2,233, or approximately 8%), liver (124/1,696, or approximately 7%) and kidney (185/2,179, or approximately 8%). Interestingly, the greatest degree of overlap was associated with DE genes in testis (680/3,438, or approximately 20%).: Overlap (blue region) between genes significantly differentially expressed between humans and chimpanzees and associated with nucleotide differences (green region) [31] or large insertion and deletion differences (red region) between the species. On average, fewer than 9% of genes differentially expressed at the life stages and tissues examined were associated with both types of variation. The number of differentially expressed genes associated with nucleotide differences as determined by Khaitovich et al. [31], as well as the number of differentially expressed genes associated with large insertions and deletions (INDELs) as determined in this study, are shown. The number of overlapping genes is shown at the intersection.: Testis is also the tissue where we found INDEL variation to be most highly and consistently correlated with differences in gene expression (Tables 8, 9 and 10). 
As previously pointed out by Khaitovich et al. [31], a majority of DE genes between human and chimpanzee testes are involved in reproduction and map to the X chromosome, making them potentially more responsive than autosomal loci to selection for differences in reproductive function.: Summary and conclusions: Over the approximately six million years since the human and chimpanzee lineages diverged from a common ancestor, the two species evolved a variety of distinctive morphological, behavioral, cognitive and other phenotypic traits [32]. To explore the genetic basis of the phenotypic differences that distinguish humans from chimpanzees, a number of comparative genomic studies have been conducted in recent years [1, 33]. Perhaps the most surprising finding of these studies is the paucity of protein-coding nucleotide variation between these two species, which supports the earlier contention that the basis of the phenotypic differences lies in the realm of gene regulation [2].: Direct evidence in support of the regulatory hypothesis has recently been provided by a number of comparative microarray studies showing that significant differences in gene expression patterns exist between humans and chimpanzees, especially in organs (for example, brain and testis) and functions (for example, cognitive ability and fertility) directly related to some of the major phenotypic traits distinguishing the two species [31, 32]. Questions remain, however, concerning the genetic basis of the differences in gene regulation that separate humans from chimpanzees. One hypothesis is that the substantial INDEL variation that exists between humans and chimpanzees may contribute significantly to the regulatory differences between the species [3, 7]. 
In an effort to address this hypothesis, we categorized the large (80 to 12,000 bp) INDEL variation between humans and chimpanzees that is located in or near genes and conducted a preliminary analysis to assess whether this variation might be of functional significance. We found that 70% of the 26,509 human-chimpanzee INDELs are homologous to RT sequences (primarily SINEs and LINEs) that have inserted within the human genome subsequent to the divergence of the two species from a common ancestor. The remaining 30% of the human-chimpanzee INDEL variation is associated with US NISs or with NISs composed of TRs.: Large INDELs were found to map within or in proximity to (± 5 kb) 30% of human-chimpanzee genes. The majority of INDELs mapping within or in proximity to human genes are RT sequences, and the INDELs mapping within or in proximity to chimpanzee genes are about equally distributed between RTs and NISs. SINEs and LINEs were the most frequent categories of RTs associated with human-chimpanzee genes, which is consistent with the fact that these are the most transpositionally active classes of RTs in both species.: We found that, across all tissues examined, the proportion of INDEL-associated genes that are DE is significantly greater than the proportion of DE genes among genes not associated with INDELs. Similarly, the proportion of DE genes associated with INDELs was significantly greater than the proportion of non-DE genes associated with INDELs across all tissues examined. These findings, coupled with the observation that there is relatively little overlap (fewer than 9% averaged across all tissues) between DE genes associated with nucleotide variation and those associated with large INDEL variation, are consistent with the hypothesis that large INDELs have contributed significantly to regulatory differences between humans and chimpanzees at the life stage and in the tissues examined in this study. 
Indeed, we have previously presented evidence that RT INDELs may have contributed to differences in apoptotic function between the two species, possibly accounting for a pleiotropic coupling between the relatively larger size of the human brain and an increased propensity for cancer development [34].: Although more extensive studies involving larger sample sizes and multiple life stages are needed to assess more precisely the relative contribution of INDELs and nucleotide differences to human-chimpanzee differences in gene expression, the preliminary analyses presented herein and those previously reported by Khaitovich et al. [31] indicate that both classes of genetic variation contribute significantly to differences in patterns of gene expression between the two species, especially in testis.: The fact that most of the human-chimpanzee INDEL variation that correlates with differences in gene expression is attributable to HIs is interesting for two reasons. First, it is consistent with the considerable body of evidence suggesting that much of the divergence in gene expression between chimpanzees and humans may have been driven by accelerated regulatory evolution within the human lineage [35–39]. Our results are consistent with the hypothesis that an accelerated rate of INDELs (predominantly RT insertions) within the human lineage may also have contributed significantly to the regulatory differences between these two species. Second, our data suggest that, at least with respect to the evolutionary contribution of INDELs to chimpanzee-human divergence in gene expression, selection operating on de novo mutations (for example, insertions that occurred after the divergence of the two species from a common ancestor) may have been more important than selection operating on standing INDEL variation preexisting in common ancestral populations. 
This second conclusion is contingent on the generally held presumption that transposition rates in humans and chimpanzees are approximately equal. Whereas previous analyses of gene expression and protein-coding sequence variation between chimpanzees and humans have revealed a pattern consistent with neutral evolution and negative selection [31], our findings are consistent with the hypothesis that INDELs in general, and RT insertions within the human lineage in particular, have been a positive driving force behind human regulatory evolution.: Methods: Initial data sets: Reference genome coordinates for CGs (on the human genome assembly hg16, July 2003 build) and HGs (on the chimpanzee genome assembly panTro1, November 2003 build) of sizes ranging from 80 to 12,000 bp were obtained using the UCSC Table Browser [40, 41]. The CG data set was originally generated by aligning the chimpanzee genome against the human genome build hg16 (July 2003 build) and the HG data set by aligning the human genome against the chimpanzee genome build panTro1 (November 2003 build) [40, 41]. The CG and HG genomic coordinates were updated to the hg18 version (March 2006 build) and the panTro2 version (March 2006 build) of the human and chimpanzee genomes, respectively, using the Batch Coordinate Conversion liftOver tool [42]. Gap sequences not represented in the new versions of the genome assemblies (76 CGs and 2,581 HGs) were removed in this process. Genomic sequences corresponding to the updated gap coordinates were downloaded from the UCSC Genome Database.: We derived gap coordinates from the older UCSC genome browser assemblies (hg16 (2003) and panTro1 (2003)) because these gap coordinates are not provided in the newer assemblies (hg18 (2006) and panTro2 (2006)). The gaps derived from the earlier assemblies, however, were confirmed (after conversion with the liftOver tool) in the newer assemblies by multiple and pairwise genome alignments (see Figure 1). 
Only those gaps that were confirmed to be present in the more recent assemblies were used in our analysis. Only regions of the human and chimpanzee genomes that could be unambiguously aligned with one another (that is, well-assembled contigs of both genome assemblies) were used in identification of the INDELs. Genomic regions containing ambiguous bases (N's) and/or assembly gaps were excluded from our analysis. HGs and CGs characterized as partial deletions or partial insertions due to incomplete sequencing of the Rhesus macaque (out-group) genome were also excluded from our analysis.: Identification of INDELs: CGs and HGs were further categorized as INDELs by comparing reference genome alignments of the human genome (hg18), the chimpanzee genome (panTro2) and the Rhesus macaque genome (rheMac2). Reference genome sequences were obtained from the UCSC Genome Browser [43]. To identify INDELs, we followed different approaches for CGs and HGs. For CGs, the chimpanzee and Rhesus macaque genomes were aligned with the human genome to produce a three-way multiple-genome alignment. For HGs, instead of performing whole-genome multiple alignments, we consolidated pairwise alignments of human-chimpanzee, chimpanzee-Rhesus macaque and human-Rhesus macaque genomes that were already available in the UCSC Genome Browser database. Genomic coordinates of gaps were used to locate the regions associated with CGs and HGs in the genomic alignments (multiple-genome alignment for CGs and consolidated pairwise alignment for HGs). Using the presence or absence of the gap sequence in the out-group (Rhesus macaque) genome, we characterized each CG as a chimpanzee deletion or a human insertion, and each HG as a human deletion or a chimpanzee insertion. Pairwise alignment consolidation and comparison of genomic regions were done using in-house Perl scripts.: Characterization of sequences associated with INDELs: The RepeatMasker program [44] was used to identify all interspersed repeats in the INDEL sequences. 
The interspersed repeats were further classified by type: SINEs, LINEs, ERVs, SVAs or DNA elements. INDEL sequences containing more than one type of interspersed repeat (for example, ERVs inserted within LINE elements) were classified as MEs. The Tandem Repeats Finder program [45] was used to identify TR sequences within the INDELs characterized as NISs (that is, INDELs not containing interspersed repeat sequences). The remaining NISs were classified as USs.: Association of human and chimpanzee genes with the INDEL variation: The genomic coordinates of the genic regions of the human and chimpanzee Ensembl genes were downloaded from the UCSC Genome Bioinformatics website [24]. An INDEL was considered to be associated with a gene if its genomic coordinates mapped within the gene or within 5 kb upstream or downstream of it. In-house Perl scripts were used to match these coordinates.: Microarray gene expression data analysis: Human-chimpanzee gene expression data from five tissues (brain, heart, liver, kidney and testis) of six humans and five chimpanzees were obtained from a previous study [31]. The samples were assayed on Affymetrix Human Genome U133 Plus 2.0 arrays. We reanalyzed the expression data as follows. The data were processed using the MAS normalization method implemented in the Affymetrix function library of the Bioconductor package (http://www.bioconductor.org/) for the R statistical programming environment (http://www.r-project.org/) [46]. Genes with significant sequence differences in their Affymetrix probes between humans and chimpanzees and with inconsistent hybridization patterns among samples of the same species were removed. This filtering separates genuine differences in expression from apparent differences caused by probe mismatch, because the chimpanzee expression data were obtained by hybridization to the human Affymetrix chip. 
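The sequence categories defined above (single interspersed repeat classes, MEs, TRs and USs) amount to a small decision rule. The sketch below is illustrative only: categorize_indel is a hypothetical helper, and it assumes RepeatMasker and Tandem Repeats Finder output has already been reduced to a set of class labels and a tandem-repeat flag per INDEL.

```python
from typing import Set

# Interspersed-repeat classes named in the text; mapping raw RepeatMasker
# class/family labels onto these names is assumed to happen upstream.
INTERSPERSED = {'SINE', 'LINE', 'ERV', 'SVA', 'DNA'}


def categorize_indel(repeat_classes: Set[str], has_tandem_repeat: bool) -> str:
    found = repeat_classes & INTERSPERSED
    if len(found) > 1:
        return 'ME'  # more than one interspersed type: mosaic element
    if len(found) == 1:
        return next(iter(found))  # single interspersed class, e.g. 'LINE'
    # No interspersed repeat: a noninterspersed sequence (NIS), split by
    # whether a tandem-repeat finder reported a hit.
    return 'TR' if has_tandem_repeat else 'US'


print(categorize_indel({'ERV', 'LINE'}, False))  # ME
print(categorize_indel(set(), True))             # TR
print(categorize_indel(set(), False))            # US
```

The order of the checks matters: the ME test runs first so that an INDEL carrying, say, an ERV inside a LINE is never reported under either single class.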
Genes with detection P-values of less than 0.065 were retained for further analysis. The expression values of these genes were normalized across samples by Z-score calculation using TIBCO Spotfire DecisionSite software (http://spotfire.tibco.com/products/decisionsite.cfm; TIBCO Software, Inc, Somerville, MA, USA). Genes with t-test P-values less than 0.05 between human and chimpanzee were considered DE genes.: Correlating INDEL variation with differential gene expression: For each of the five tissues, genes were partitioned into DE and non-DE sets according to their expression differences between chimpanzee and human, and each set was cross-referenced with the INDEL-associated genes. We looked for evidence of selection by comparing the proportion of DE genes associated with INDELs with the proportion of DE genes not associated with INDELs across all tissues examined. Similarly, we compared the proportion of DE genes associated with INDELs with the proportion of non-DE genes associated with INDELs across all tissues examined. Proportions tests (R statistical software package [46]) were used to determine whether the differences in proportions were statistically significant (P < 0.05).: Categories of genes associated with INDEL variation between humans and chimpanzees: Genes associated with HGs and CGs were analyzed in two ways: (1) by the type of gap sequence, that is, whether the gap sequence associated with the gene is homologous to an interspersed repeat or not. For this analysis, we divided the INDEL variation data set into two categories: (a) interspersed INDEL variation and (b) noninterspersed INDEL variation, with interspersed INDEL variation further divided into RT INDEL variation and non-RT INDEL variation; (2) by the location of the INDEL variation relative to the gene, that is, upstream of the transcription start site or downstream (within 5 kb downstream of the transcription termination site) of the gene. 
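The gene-association rule (an INDEL within a gene or within 5 kb of it) and the location categories can be expressed as interval overlaps. This sketch assumes zero-based, half-open coordinates (as in UCSC BED files), treats the whole transcribed region as a single genic category rather than separating exons and introns, and uses hypothetical function names. Because every overlapping region is reported, one INDEL can place a gene in several categories.

```python
from typing import List, Tuple

WINDOW = 5_000  # 5-kb flanking window used in the text


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    # Half-open intervals [start, end) overlap if each starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def indel_regions(indel: Tuple[int, int], gene: Tuple[int, int], strand: str) -> List[str]:
    # Returns every region of the gene that the INDEL touches; an empty list
    # means the INDEL is not associated with this gene.
    start, end = gene
    left = (max(0, start - WINDOW), start)
    right = (end, end + WINDOW)
    # The transcription start site is at the left end for '+' genes and at
    # the right end for '-' genes.
    upstream, downstream = (left, right) if strand == '+' else (right, left)
    regions = []
    if overlaps(indel, upstream):
        regions.append('upstream')
    if overlaps(indel, gene):
        regions.append('genic')
    if overlaps(indel, downstream):
        regions.append('downstream')
    return regions


# Hypothetical gene on the '+' strand and an INDEL spanning its start.
print(indel_regions((8_000, 12_000), (10_000, 20_000), '+'))  # ['upstream', 'genic']
```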
Some genes were associated with INDEL variation in two or more regions, for example, a gap starting upstream of the gene and ending in the first intron. Such genes were included in more than one category, depending on the regions covered by the gap sequences. The genes associated with RT INDEL variation were further divided according to RT class, that is, whether the gap sequence is homologous to SINEs, LINEs, ERVs, SVAs or MEs. As in the previous analysis, some genes were associated with multiple gap sequences, each homologous to a different class of RT sequences. Such genes were included in more than one category, depending on the number of RT classes contained in the gap sequences.: Linking INDEL variation with differential expression: The genes in each of the above-defined categories were checked for their expression levels in humans and chimpanzees in each of the five tissues. We used the same criteria described above to consider a gene as detected or DE between humans and chimpanzees; genes that were detected but not DE were classified as non-DE. We used the R statistical software package to measure the statistical significance of the differential expression of genes associated with different categories of INDEL variation. We looked for evidence of selection by comparing the proportion of DE genes associated with INDELs with the proportion of DE genes not associated with INDELs across all tissues examined. Similarly, we compared the proportion of DE genes associated with INDELs with the proportion of non-DE genes associated with INDELs across all tissues examined. We used a proportions test to measure the statistical significance of the comparisons described above. 
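The proportions test can be illustrated with a two-group Pearson chi-squared statistic. This is a sketch modeled on the two-sample case of R's prop.test(), which applies the Yates continuity correction by default; the counts in the example are hypothetical, not data from this study, and prop_test_2 is an illustrative name.

```python
import math
from typing import Tuple


def prop_test_2(x1: int, n1: int, x2: int, n2: int,
                correct: bool = True) -> Tuple[float, float]:
    # Pearson chi-squared test (1 df) for equal proportions x1/n1 and x2/n2,
    # modeled on the two-group case of R's prop.test(), including its default
    # Yates continuity correction.
    p_pool = (x1 + x2) / (n1 + n2)
    if p_pool in (0.0, 1.0):
        raise ValueError('pooled proportion must lie strictly between 0 and 1')
    delta = x1 / n1 - x2 / n2
    yates = min(0.5, abs(delta) / (1 / n1 + 1 / n2)) if correct else 0.0
    observed = [x1, n1 - x1, x2, n2 - x2]
    expected = [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]
    stat = sum((abs(o - e) - yates) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: 30 of 100 INDEL-associated genes DE vs 15 of 100 others.
stat, p = prop_test_2(30, 100, 15, 100)
print(round(stat, 2), p < 0.05)  # 5.62 True
```

The erfc identity avoids any dependency on a statistics library, since a chi-squared variable with one degree of freedom is the square of a standard normal variable.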
P < 0.05 was considered statistically significant.: Identification of differentially expressed genes that are correlated with both INDELs and single nucleotide variation: A list of DE genes between humans and chimpanzees in the five tissues tested (brain, testis, heart, liver and kidney), as well as of those associated with single-nucleotide variation, was obtained from the supplementary information published by Khaitovich et al. [31]. These genes were compared with the DE genes and the INDEL variation-associated genes obtained in our analyses (Additional file 1).: Abbreviations: analysis of variance: base pair: chimpanzee deletion: coding sequence: chimpanzee endogenous virus/Pan troglodytes endogenous retrovirus: chimpanzee gap: chimpanzee insertion: differentially expressed: endogenous retrovirus: human deletion: human gap: human genome: human insertion: insertion and deletion: kilobase pair: long interspersed nuclear element: mosaic element: noninterspersed sequence: non-differentially expressed: chimpanzee genome: Rhesus macaque genome: retrotransposon sequence: short interspersed nuclear element: biologically active composite elements consisting of fragments of SINE, VNTRs and Alu elements: transposable element: tandem repeat: unique sequence: variable number of tandem repeats.: References: Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang SP, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, Archidiacono N, Bork P, Butler J, Chang JL, Cheng Z, Chinwalla AT, de Jong P, Delehaunty KD, Fronick CC, Fulton LL, Gilad Y, Glusman G, Gnerre S, Graves TA, Hayakawa T, Hayden KE, Huang X, Ji H, Kent JW, King MC, Chimpanzee Sequencing and Analysis Consortium, et al: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437: 69-87. 10.1038/nature04072.: King MC, Wilson AC: Evolution at two levels in humans and chimpanzees. Science. 1975, 188: 107-116. 
10.1126/science.1090005.: Britten RJ: Divergence between samples of chimpanzee and human DNA sequences is 5%, counting INDELs. Proc Natl Acad Sci USA. 2002, 99: 13633-13635. 10.1073/pnas.172510699.: Frazer KA, Chen X, Hinds DA, Pant PVK, Patil N, Cox DR: Genomic DNA insertions and deletions occur frequently between humans and nonhuman primates. Genome Res. 2003, 13: 341-346. 10.1101/gr.554603.: Chen F, Chen C, Li W, Chuang T: Human-specific insertions and deletions inferred from mammalian genome sequences. Genome Res. 2007, 17: 16-22.: Watanabe H, Fujiyama A, Hattori M, Taylor TD, Toyoda A, Kuroki Y, Noguchi H, BenKahla A, Lehrach H, Sudbrak R, Kube M, Taenzer S, Galgoczy P, Platzer M, Scharfe M, Nordsiek G, Blöcker H, Hellmann I, Khaitovich P, Pääbo S, Reinhardt R, Zheng HJ, Zhang XL, Zhu GF, Wang BF, Fu G, Ren SX, Zhao GP, Chen Z, Lee YS, et al: DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature. 2004, 429: 382-388. 10.1038/nature02564.: Polavarapu N, Bowen NJ, McDonald JF: Identification, characterization and comparative genomics of chimpanzee endogenous retroviruses. Genome Biol. 2006, 7: R51.: McClintock B: The significance of responses of the genome to challenge. Science. 1984, 226: 792-801.: McDonald JF: Macroevolution and retroviral-like elements. Bioscience. 1990, 40: 183-191. 10.2307/1311363.: McDonald JF: Evolution and consequences of transposable elements. Curr Opin Genet Dev. 1993, 3: 855-864. 10.1016/0959-437X(93)90005-A.: Britten RJ: Mobile elements inserted in the distant past have taken on important functions. Gene. 1997, 205: 177-182. 10.1016/S0378-1119(97)00399-5.: Kidwell MG, Lisch DR: Transposable elements and host genome evolution. Trends Ecol Evol. 2000, 15: 95-99. 10.1016/S0169-5347(99)01817-0.: Kidwell MG, Lisch DR: Perspective: transposable elements, parasitic DNA and genome evolution. Evolution. 2001, 55: 1-24.: Bowen NJ, Jordan IK: Transposable elements and the evolution of eukaryotic complexity. 
Curr Issues Mol Biol. 2002, 4: 65-76.: van de Lagemaat LN, Landry JR, Mager DL, Medstrand P: Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003, 19: 530-536. 10.1016/j.tig.2003.08.004.: Feschotte C: Transposable elements and the evolution of regulatory networks. Nat Rev Genet. 2008, 9: 397-405. 10.1038/nrg2337.: Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE: An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 2006, 16: 1182-1190. 10.1101/gr.4565806.: Tautz D, Trick M, Dover GA: Cryptic simplicity in DNA is a major source of genetic variation. Nature. 1986, 322: 652-656. 10.1038/322652a0.: Kashi Y, King D, Soller M: Simple sequence repeats as a source of quantitative genetic variation. Trends Genet. 1997, 13: 74-78. 10.1016/S0168-9525(97)01008-1.: Pardue M, Lowenhaupt K, Rich A, Nordheim A: (dC-dA)n.(dG-dT)n sequences have evolutionarily conserved chromosomal locations in Drosophila with implications for roles in chromosome structure and function. EMBO J. 1987, 6: 1781-1789.: Yee HA, Wong AK, van de Sande JH, Rattner JB: Identification of novel single-stranded d(TC)n binding proteins in several mammalian species. Nucleic Acids Res. 1991, 19: 949-953. 10.1093/nar/19.4.949.: Sinha S, Siggia ED: Sequence turnover and tandem repeats in cis-regulatory modules in Drosophila. Mol Biol Evol. 2005, 22: 874-885. 10.1093/molbev/msi090.: Tomilin NV: Regulation of mammalian gene expression by retroelements and non-coding tandem repeats. Bioessays. 2008, 30: 338-348. 10.1002/bies.20741.: Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 
2002, 12: 996-1006.: Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov P, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridian A, Sougnez C, Thomann-Stange N, International Human Genome Sequencing Consortium, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.: Maksakova IA, Romanish MT, Gagnier L, Dunn CA, van de Lagemaat LN, Mager DL: Retroviral elements and their hosts: insertional mutagenesis in the mouse germ line. PLoS Genet. 2006, 2: e2. 10.1371/journal.pgen.0020002.: Yohn CT, Jiang Z, McGrath SD, Hayden KE, Khaitovich P, Johnson ME, Eichler MY, McPherson JD, Zhao S, Pääbo S, Eichler EE: Lineage-specific expansions of retroviral insertions within the genomes of African great apes but not humans and orangutans. PLoS Biol. 2005, 3: e110. 10.1371/journal.pbio.0030110.: Ostertag EM, Goodier JL, Zhang Y, Kazazian HH: SVA elements are non-autonomous retrotransposons that cause human diseases. Am J Hum Genet. 2003, 73: 1444-1451. 10.1086/380207.: Wang H, Xing J, Grover D, Hedges DJ, Han K, Walker JA, Batzer MA: SVA elements: A hominid-specific retrotransposon family. J Mol Biol. 2005, 354: 994-1007. 10.1016/j.jmb.2005.09.085.: Birney E, Andrews TD, Bevan P, Caccamo M, Chen Y, Clarke L, Coates G, Cuff J, Curwen V, Cutts T, Down T, Eyras E, Fernandez-Suarez XM, Gane P, Gibbins B, Gilbert J, Hammond M, Hotz HR, Iyer V, Jekosch K, Kahari A, Kasprzyk A, Keefe D, Keenan S, Lehvaslaiho H, McVicker G, Melsopp C, Meidl P, Mongin E, Pettett R, et al: An overview of Ensembl. Genome Res. 2004, 14: 925-928. 10.1101/gr.1860604.: Khaitovich P, Hellmann I, Enard W, Nowick K, Leinweber M, Franz H, Weiss G, Lachmann M, Pääbo S: Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science. 
2005, 309: 1850-1854. 10.1126/science.1108296.: Varki A, Altheide T: Comparing the human and chimpanzee genomes: searching for needles in a haystack. Genome Res. 2005, 15: 1746-1758. 10.1101/gr.3737405.: Li WH, Gu Z, Wang H, Nekrutenko A: Evolutionary analyses of the human genome. Nature. 2001, 409: 847-849. 10.1038/35057039.: Arora G, Polavarapu N, McDonald JF: Did natural selection for increased cognitive ability in humans lead to an elevated risk of cancer? Med Hypotheses. 2009, 73: 453-456. 10.1016/j.mehy.2009.03.035.: Enard W, Khaitovich P, Klose J, Heissig F, Giavalisco P, Nieselt-Struwe K, Muchmore E, Varki A, Ravid R, Doxiadis GM, Bontrop RE, Pääbo S: Intra- and interspecies variation in primate gene expression patterns. Science. 2002, 296: 340-343. 10.1126/science.1068996.: Gu J, Gu X: Induced gene expression in human brain after split from chimpanzee. Trends Genet. 2003, 19: 63-65. 10.1016/S0168-9525(02)00040-9.: Prabhakar S, Noonan JP, Pääbo S, Rubin EM: Accelerated evolution of conserved non-coding sequences in human. Science. 2006, 314: 786. 10.1126/science.1130738.: Wang QF, Prabhakar S, Chanan S, Cheng JF, Rubin EM, Boffelli D: Detection of weakly conserved ancestral mammalian regulatory sequences by primate comparisons. Genome Biol. 2007, 8: R1. 10.1186/gb-2007-8-1-r1.: Hawks J, Wang ET, Cochran GM, Harpending HC, Moyzis RK: Recent acceleration of human adaptive evolution. Proc Natl Acad Sci USA. 2007, 104: 20753-20758.: Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ, University of California Santa Cruz: The UCSC Genome Browser Database. Nucleic Acids Res. 2003, 31: 51-54. 10.1093/nar/gkg129.: Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 
2004, 32 (Database): D493-D496.: UCSC Genome Browser Utilities: Batch Coordinate Conversion (liftOver). http://genome.ucsc.edu/cgi-bin/hgLiftOver: UCSC Genome Database ftp website. ftp://hgdownload.cse.ucsc.edu/goldenPath/: Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0 1996-2010. http://www.repeatmasker.org/: Benson G: Tandem Repeats Finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27: 573-580. 10.1093/nar/27.2.573.: The R Project for Statistical Computing. http://www.r-project.org/: Acknowledgements: This work was supported by the Georgia Tech Foundation.: Author information: Corresponding author: Correspondence to John F McDonald.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: NP and JM conceptualized the original study. NP, GA and VM conducted the analyses. JM, NP, GA and VM wrote the manuscript. All authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: Insertion and deletion-associated genes differentially expressed between humans and chimpanzees. Microsoft Excel file listing all insertion and deletion (INDEL)-associated genes differentially expressed between humans and chimpanzees for each tissue type (brain, testis, heart, liver and kidney). (XLS 498 KB): Cite this article: Polavarapu, N., Arora, G., Mittal, V.K. et al. Characterization and potential functional significance of human-chimpanzee large INDEL variation. Mobile DNA 2, 13 (2011). 
https://doi.org/10.1186/1759-8753-2-13: Received: 07 September 2011: Accepted: 25 October 2011: Published: 25 October 2011: DOI: https://doi.org/10.1186/1759-8753-2-13: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Crypton transposons: identification of new diverse families and ancient domestication events" "Kenji K Kojima, Jerzy Jurka" "Jerzy Jurka" "19 October 2011" "\"Domestication\" of transposable elements (TEs) led to evolutionary breakthroughs such as the origin of telomerase and the vertebrate adaptive immune system. These breakthroughs were accomplished by the adaptation of molecular functions essential for TEs, such as reverse transcription, DNA cutting and ligation or DNA binding. Cryptons represent a unique class of DNA transposons using tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. Cryptons were originally identified in fungi and later in the sea anemone, sea urchin and insects., Herein we report new Cryptons from animals, fungi, oomycetes and diatom, as well as widely conserved genes derived from ancient Crypton domestication events. Phylogenetic analysis based on the YR sequences supports four deep divisions of Crypton elements. We found that the domain of unknown function 3504 (DUF3504) in eukaryotes is derived from Crypton YR. DUF3504 is similar to YR but lacks most of the residues of the catalytic tetrad (R-H-R-Y). Genes containing the DUF3504 domain are potassium channel tetramerization domain containing 1 (KCTD1), KIAA1958, zinc finger MYM type 2 (ZMYM2), ZMYM3, ZMYM4, glutamine-rich protein 1 (QRICH1) and \"without children\" (WOC). The DUF3504 genes are highly conserved and are found in almost all jawed vertebrates. The sequence, domain structure, intron positions and synteny blocks support the view that ZMYM2, ZMYM3, ZMYM4, and possibly QRICH1, were derived from WOC through two rounds of genome duplication in early vertebrate evolution. WOC is observed widely among bilaterians. There could be four independent events of Crypton domestication, and one of them, generating WOC/ZMYM, predated the birth of bilaterian animals. 
This is the third-oldest domestication event known to date, following the domestication generating telomerase reverse transcriptase (TERT) and Prp8. Many Crypton-derived genes are transcriptional regulators with additional DNA-binding domains, and the acquisition of the DUF3504 domain could have added new regulatory pathways via protein-DNA or protein-protein interactions., Cryptons have contributed to animal evolution through domestication of their YR sequences. The DUF3504 domains are domesticated YRs of animal Crypton elements." "tyrosine recombinase, Crypton, domestication, transposon, DUF3504" " Crypton transposons: identification of new diverse families and ancient domestication events: Kenji K Kojima & Jerzy Jurka: Mobile DNA volume 2, Article number: 12 (2011): Abstract: Background: \"Domestication\" of transposable elements (TEs) led to evolutionary breakthroughs such as the origin of telomerase and the vertebrate adaptive immune system. These breakthroughs were accomplished by the adaptation of molecular functions essential for TEs, such as reverse transcription, DNA cutting and ligation or DNA binding. Cryptons represent a unique class of DNA transposons using tyrosine recombinase (YR) to cut and rejoin the recombining DNA molecules. Cryptons were originally identified in fungi and later in the sea anemone, sea urchin and insects.: Results: Herein we report new Cryptons from animals, fungi, oomycetes and diatom, as well as widely conserved genes derived from ancient Crypton domestication events. Phylogenetic analysis based on the YR sequences supports four deep divisions of Crypton elements. We found that the domain of unknown function 3504 (DUF3504) in eukaryotes is derived from Crypton YR. DUF3504 is similar to YR but lacks most of the residues of the catalytic tetrad (R-H-R-Y). Genes containing the DUF3504 domain are potassium channel tetramerization domain containing 1 (KCTD1), KIAA1958, zinc finger MYM type 2 (ZMYM2), ZMYM3, ZMYM4, glutamine-rich protein 1 (QRICH1) and \"without children\" (WOC). The DUF3504 genes are highly conserved and are found in almost all jawed vertebrates. The sequence, domain structure, intron positions and synteny blocks support the view that ZMYM2, ZMYM3, ZMYM4, and possibly QRICH1, were derived from WOC through two rounds of genome duplication in early vertebrate evolution. WOC is observed widely among bilaterians. 
There could be four independent events of Crypton domestication, and one of them, generating WOC/ZMYM, predated the birth of bilaterian animals. This is the third-oldest domestication event known to date, following the domestication generating telomerase reverse transcriptase (TERT) and Prp8. Many Crypton-derived genes are transcriptional regulators with additional DNA-binding domains, and the acquisition of the DUF3504 domain could have added new regulatory pathways via protein-DNA or protein-protein interactions.: Conclusions: Cryptons have contributed to animal evolution through domestication of their YR sequences. The DUF3504 domains are domesticated YRs of animal Crypton elements.: Background: The structural and mechanistic variety of transposable elements (TEs) is well documented [1]. They encode proteins that include diverse functional domains involved in catalysis or interaction with DNA, RNA and other proteins. Because of this diverse repertoire, TEs can supply functional modules to generate new genes. \"Molecular domestication\" of transposable elements [2] led to evolutionary milestones such as the origin of telomerase and the vertebrate adaptive immune system. Telomerase reverse transcriptase (TERT) solves the end-replication problem of linear chromosomes and was derived from a reverse transcriptase (RT) related to Penelope-like elements very early in eukaryotic evolution [3, 4]. V(D)J recombination is a mechanism used in jawed vertebrates to generate a variety of immunoglobulins and T-cell receptors. It is catalyzed by the product of recombination activating gene 1 (RAG1), which was derived from a transposase encoded by the Transib family of DNA transposons [5]. Many kinds of transposon proteins have been domesticated, including transposases, integrases, RTs, and envelope and gag proteins [6]. 
Herein we report in-depth studies of another type of transposon enzyme, tyrosine recombinase (YR), which was repeatedly domesticated during animal evolution.: To date, four types of enzymes are known to catalyze DNA integration of eukaryotic transposons: DDE-transposase, YR, rolling-circle replication initiator and the combination of RT and endonuclease (EN) [7]. DDE-transposase is the most abundant gene in nature [8] and is carried by many DNA transposon superfamilies, self-synthesizing transposons (Polinton), as well as long terminal repeat (LTR) retrotransposons (Gypsy, Copia, BEL and endogenous retroviruses) [1, 9–11]. DDE-transposases share three conserved amino acids (DDD or DDE) at their catalytic sites, which are separated by amino acid sequences of varying length. Some domesticated DDE-transposases became DNA-binding proteins, such as CENP-B in mammals and Daysleeper in Arabidopsis thaliana [12, 13]. Non-LTR retrotransposons and Penelope-like elements use a combination of RT and EN in their transposition [14–17]. Helitron is the only group of eukaryotic transposons encoding a rolling-circle replication initiator [9].: YR genes are ubiquitous in prokaryotes but rare in eukaryotes [18, 19]. All YRs found in eukaryotes are encoded by mobile elements: yeast 2-micron circle plasmids [20], ciliate Euplotes crassus transposons (Tec1, Tec2 and Tec3) [21, 22], three groups of retrotransposons (DIRS/Pat, Ngaro and VIPER) [23–25], and Cryptons [19]. The YR encoded by the yeast 2-micron plasmid, known as \"flippase\" (FLP), is widely used for site-specific recombination in the FLP-FRT system [26]. Tec1 and Tec2 transposons encode a DDE-transposase in addition to YR, and therefore the YR domains in these transposons are probably involved in resolving transposition intermediates. To date, the only YR-encoding transposons found in vertebrate genomes are DIRS and Ngaro retrotransposons. 
Cryptons were originally found in the basidiomycete Cryptococcus neoformans and several pathogenic fungi. Their boundaries are difficult to define because they have neither terminal inverted repeats (TIRs) nor long direct repeats; instead, they have short direct repeats at both termini. These 4- or 6-bp direct repeats are considered substrates for recombination. By analogy to prokaryotic YR-encoding transposons, Goodwin et al. [19] proposed that Cryptons are excised from the host genome as extrachromosomal circular DNA and integrated at a different locus in the genome. YR typically recognizes recombination sites consisting of two inverted repeats that are 11 to 13 bp long and separated by a segment 6 to 8 bp long [27]. Recently, transposons encoding only a YR have been found in sea urchin, insects and cnidarians and classified as Cryptons [28, 29]. YRs contain four catalytically important residues (R-H-R-Y), but overall sequence identity among different genes and transposons is very low [18, 19]. The conserved tyrosine residue becomes covalently linked to the DNA during the recombination reaction. In this paper, we report Cryptons from various species, including medaka fish, and six human genes that originated from ancient domestication events of Crypton YRs.: Results: The diversity of Crypton elements in terms of their sequence and domain structure: We identified 94 Crypton elements from 24 species representing animals, fungi and stramenopiles (including oomycetes and a diatom) (Figure 1, Table 1 and Additional file 1). Phylogenetic clustering of Cryptons on the basis of their YR domain sequences revealed four groups reflecting the systematics of their hosts (Figure 2, open circles), although two of these groups are only weakly supported because of low bootstrap values. Herein we designate them as CryptonF, CryptonS, CryptonA and CryptonI to indicate their corresponding hosts: fungi, stramenopiles, animals and insects. 
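Because Crypton boundaries are marked by short terminal direct repeats rather than TIRs, a first-pass check on a candidate element is to compare its two ends in both orientations. The sketch below is a simplification under stated assumptions: it scans a single toy sequence, whereas real annotation compares many copies and their flanks, and the function names and length limits are hypothetical.

```python
COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def revcomp(seq: str) -> str:
    # Reverse complement of a DNA string (A/C/G/T only).
    return seq.translate(COMPLEMENT)[::-1]


def terminal_repeat(seq: str, inverted: bool = False,
                    min_len: int = 4, max_len: int = 20) -> int:
    # Longest k in [min_len, max_len] such that the first k bases recur at the
    # 3' end, either directly (Crypton-like) or as a reverse complement (TIR).
    # Returns 0 if no such repeat exists.
    seq = seq.upper()
    for k in range(min(max_len, len(seq) // 2), min_len - 1, -1):
        left, right = seq[:k], seq[-k:]
        if left == (revcomp(right) if inverted else right):
            return k
    return 0


# Toy sequences (not real elements): one with 6-bp terminal direct repeats,
# one with 6-bp terminal inverted repeats.
direct = 'ACGTTG' + 'T' * 20 + 'ACGTTG'
tir = 'CAGTTG' + 'T' * 10 + 'CAACTG'
print(terminal_repeat(direct), terminal_repeat(direct, inverted=True))  # 6 0
print(terminal_repeat(tir), terminal_repeat(tir, inverted=True))        # 0 6
```

The min_len floor matters because 1- to 3-bp matches between the two ends occur frequently by chance.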
CryptonA and CryptonI are structurally similar, whereas CryptonF, CryptonS and CryptonA/CryptonI have distinct protein domain structures (see Figure 1 and the detailed descriptions in the next three sections). Because of the low resolution of the phylogenetic tree, we could not determine whether these four Crypton groups are related to one another or to other YR-encoding elements, and we cannot rule out the possibility that they originated independently.: Schematic structures of Cryptons. Crypton-Cn1 and MarCry-1_FO belong to the CryptonF group. YR = tyrosine recombinase; GCR1_C = DNA-binding domain; DDE = DDE-transposase; C48 = C48 peptidase; HTH = helix-turn-helix motif.: Phylogeny of Cryptons, DUF3504 genes and other eukaryotic tyrosine recombinases. The numbers at nodes are bootstrap values over 40. Open circles indicate the clusters of Cryptons, and filled circles show the clusters of DUF3504 genes. YR = tyrosine recombinase. Prefixes of names are as follows. Cry = Crypton; 1958 = KIAA1958. Accession numbers of DUF3504 genes are shown in Additional file 5. Sequences of the transposable elements are deposited in Repbase http://www.girinst.org/repbase/. Other abbreviations and accession numbers are as follows. FLP = FLP recombinase of the 2-micron plasmid in Saccharomyces cerevisiae (NP_040488); FLP_Klac = FLP recombinase of the plasmid pKD1 in Kluyveromyces lactis (YP_355327); CRE = Cre recombinase of the enterobacteria phage P1 (YP_006472); Vlf1_AcNPV = very late expression factor 1 from the Autographa californica nucleopolyhedrovirus (NP_054107); Tn916 = Tn916 transposase from Enterococcus faecalis (NP_0687929); XerD = XerD from Escherichia coli (NP_417370); Lambda = lambda phage recombinase (NP_040609); At_Ti = recombinase from the Agrobacterium tumefaciens Ti plasmid (NP_059767); SpPat1 from Strongylocentrotus purpuratus (obtained at http://biocadmin.otago.ac.nz/fmi/xsl/retrobase/home.xsl). Suffixes for species names are as follows. 
Animals: Hs = human, Homo sapiens; Oa = platypus, Ornithorhynchus anatinus; Gg = chicken, Gallus gallus; Tg = zebra finch, Taeniopygia guttata; Ac/ACa = lizard, Anolis carolinensis; Xt/XT = frog, Xenopus tropicalis; Dr/DR = zebrafish, Danio rerio; OL = medaka, Oryzias latipes; Cm = chimaera, Callorhinchus milii; SP = sea urchin, Strongylocentrotus purpuratus; SK = acorn worm, Saccoglossus kowalevskii; Dm = fruit fly, Drosophila melanogaster; Tc/TC/TCa = beetle, Tribolium castaneum; NVi = parasitic wasp, Nasonia vitripennis; CQ = southern house mosquito, Culex quinquefasciatus; AA = yellow fever mosquito, Aedes aegypti; DPu = water flea, Daphnia pulex; Acal = sea hare, Aplysia californica; Sm = blood fluke, Schistosoma mansoni; NV = sea anemone, Nematostella vectensis. Fungi: RO = Rhizopus oryzae; CGlo = Chaetomium globosum; TS = Talaromyces stipitatus; CI = Coccidioides immitis; FO = Fusarium oxysporum. Stramenopiles: PI = Phytophthora infestans; PS = Phytophthora sojae; PU = Pythium ultimum; HAra = Hyaloperonospora arabidopsidis; ALai = Albugo laibachii; PTri = Phaeodactylum tricornutum. Plants: CR = Chlamydomonas reinhardtii.: CryptonF elements from fungi and oomycetes, and CryptonF-derived genes: We identified CryptonF elements in nine species of fungi and four species of oomycetes (Table 1 and Additional file 1). These elements encode a protein that includes YR and GCR1_C DNA-binding domains (Figure 1). Most of the fungal Cryptons and the five oomycete Cryptons are associated with 6-bp terminal direct repeats, which are likely substrates for Crypton integration (Additional file 1). In Fusarium oxysporum, a Crypton is fused with a Mariner-type DNA transposon, and this composite transposon is hereafter named MarCry-1_FO (Figure 1). 
The analysis of four MarCry-1_FO copies with more than 97% identity to each other revealed the presence of 16-bp TIRs and target site duplications (TSDs) of the TA dinucleotide, indicating that their Mariner-type DDE-transposase is responsible for transposition. CryptonF-2_PS from Phytophthora sojae and related elements encode a C48 peptidase (Ulp1 protease) in addition to a YR (Figure 1). The oomycete CryptonF elements are nested within fungal CryptonF elements in the phylogenetic tree (Figure 2), indicating horizontal transfer between fungi and oomycetes.: Four genes from Saccharomyces cerevisiae were derived from CryptonF elements (Figure 3 and Additional files 2 and 3). It was previously reported that the GCR1_C protein domain encoded by Gcr1, Msn1 and Hot1 genes is similar to the C-terminal part of fungal Cryptons [19]. In addition to these three genes, we found that Cbf2/Ndc10 contains a C-terminal domain similar to CryptonF proteins. The central portions of Cbf2 and Gcr1 are similar to CryptonF YR domains, but the catalytic site is not preserved (data not shown). Vanderwaltozyma polyspora carries two paralogous genes of Gcr1 and Msn1. Candida tropicalis and related species (Candida albicans, Pichia stipitis and Pichia guilliermondii) harbor another gene derived from a CryptonF element, represented by XP_002548716 in C. tropicalis. It is designated herein as Crypton-derived gene 1 (Cdg1) (Figure 3). The only domain shared by CryptonF elements and all Crypton-derived genes is the GCR1_C domain. The phylogenetic analysis of GCR1_C domains (Figure 3C) indicates that Hot1 and Msn1 are paralogous and that the gene related to Hot1/Msn1 in C. tropicalis represents an outgroup of both genes. Therefore, it is likely that four domestication events (for Hot1/Msn1, Gcr1, Cbf2 and Cdg1) occurred in this group.: Distribution and schematic structures of Crypton-derived genes in Saccharomycetaceae fungi. 
(A) Schematic protein structures encoded by Crypton-derived genes and Cryptons. (B) Distribution of Crypton-derived genes. Each gene identified in the haploid genome is represented by a plus symbol. (C) Phylogeny of Crypton-derived genes and Cryptons based on GCR1_C domain sequences. The numbers at nodes are bootstrap values above 50. Accession numbers of genes are shown in Additional file 2. \"Cry\" stands for Crypton. Suffixes for species names are as follows. Sc = Saccharomyces cerevisiae; Cg = Candida glabrata; Vp = Vanderwaltozyma polyspora; Zr = Zygosaccharomyces rouxii; Lt = Lachancea thermotolerans; Kl = Kluyveromyces lactis; Ag = Ashbya gossypii; Ct = Candida tropicalis; Ca = Candida albicans; Ps = Pichia stipitis; Pg = Pichia guilliermondii.: We could not find any Crypton insertions in the subphylum Saccharomycotina (including S. cerevisiae, C. tropicalis and related species). The distribution of Crypton-derived genes indicates that Cryptons were active in the past and that the DNA-binding domain GCR1_C was most likely derived from Cryptons.: CryptonS, a new group of Cryptons from oomycetes and a diatom: We found CryptonS elements in seven oomycete species and one diatom species (Figure 1, Table 1 and Additional file 1). CryptonS elements do not encode any GCR1_C domain, but the C-terminal region is conserved among CryptonS elements. CryptonS elements are associated with 5- or 6-bp terminal direct repeats. The majority of CryptonS elements share TATGG termini. Some CryptonS elements encode an additional protein containing a C48 peptidase domain. The peptidases encoded by CryptonS and CryptonF elements in oomycetes belong to the same family and are related to the Ulp1 protease family. 
Domain shuffling between the two groups of Crypton elements could explain the similarity, but more data are needed to determine the relationship between these peptidases and other cellular peptidases.: Cryptons in animals (CryptonA and CryptonI groups): We identified Cryptons in seven metazoan species belonging to five phyla (Table 1 and Additional file 1). CryptonI elements were found only in insects, whereas CryptonA elements were found in various animals, including cnidarians. Animal Cryptons (both CryptonA and CryptonI) have no C-terminal domain (Figure 1). We did not find any terminal repeats in animal Cryptons. CryptonI-1_RPro from Rhodnius prolixus hosts a non-autonomous derivative family, CryptonI-1N1_RPro, in which the 5'-terminal 438 bp and the 3'-terminal 260 bp are 98% identical to those of CryptonI-1_RPro. This is the first report of non-autonomous Crypton elements. Comparison of 50 copies of CryptonI-1_RPro and CryptonI-1N1_RPro revealed no terminal repeats (either direct or inverted). In medaka, we also found two families of non-autonomous derivatives (CryptonA-1N1_OL and CryptonA-1N2_OL) of CryptonA-1_OL. As with other DNA transposons, non-autonomous Crypton elements are much more abundant than their autonomous counterparts.: We can safely rule out the theoretical possibility that the Crypton sequences in the medaka genomic data used in this study result from contamination. First, we identified more than 2,700 copies of autonomous and non-autonomous Crypton elements with DNA sequence identities to their consensus sequences ranging from 59% to 98%. The nucleotide diversity of Cryptons from medaka is consistent with their long-term presence in the medaka lineage. Second, we found many Crypton sequences in the database of expressed sequence tags (ESTs) from three different medaka strains: Hd-rR, CAB and HNI (data not shown). 
We also found several Cryptons with inserted medaka-specific transposons such as piggyBac-N1_OL and RTE-1_OL (Table 2).: Crypton-derived sequences in the ATF7IP gene: Identification of Cryptons in three deuterostome species (medaka, sea urchin and acorn worm) prompted us to extend our analysis of Cryptons to other chordates, including four sequenced actinopterygian species (Fugu rubripes, Tetraodon nigroviridis, Gasterosteus aculeatus and Danio rerio). Although multiple copies of Crypton elements were found only in medaka, sequences similar to Cryptons were found in various chordate species (Table 3). Most of them do not encode any functional recombinases, owing to frameshifts, deletions and substitutions at catalytically essential residues.: However, two similar sequences (ABQF01015803 from the zebra finch Taeniopygia guttata and AAVX01068049 from the chimaera (elephant shark) Callorhinchus milii) include an intact YR open reading frame (Figure 4A). We did not analyze the sequence from chimaera further, because the sequenced region was only 2,661 bp in length. The Crypton-like sequence in zebra finch lies inside an intron of a gene coding for activating transcription factor 7 interacting protein (ATF7IP) (Figure 4B). There is a YR sequence at the orthologous locus of the chicken Gallus gallus, which encodes a protein 97% identical to that of zebra finch, but it contains a frameshift inside the YR region. The orthologous YR sequence from the turkey Meleagris gallopavo contains a frameshift at the same position (data not shown). Because chicken and zebra finch diverged some 107 million years ago (MYA) [30], this unusually high similarity indicates strong selection operating on these YR sequences. An exon-intron prediction program predicts alternative splicing in the ATF7IP gene from zebra finch, although at present there are no mRNAs or ESTs corresponding to the fusion transcript. 
It is possible that the YR is translated as part of the ATF7IP protein and retains catalytic activity in some birds.: Crypton-derived sequence in an intron of the ATF7IP gene. (A) Alignment of proteins coded by deuterostome Cryptons and Crypton-derived sequences. Catalytically essential residues are shown below the alignment. (B) Illustration of the conservation of ATF7IP loci. The position of the YR sequence is indicated by the open box. Black boxes represent exons of the chicken ATF7IP gene. Gray boxes indicate conserved blocks between chicken and each respective species based on the Net Tracks of the UCSC Genome Browser http://genome.ucsc.edu/. Lines between gray boxes indicate that the boxes are connected by unalignable sequences. (C) Nucleotide alignment of Crypton-derived sequences.: Using the University of California Santa Cruz (UCSC) Genome Browser http://genome.ucsc.edu/, we found partial Crypton sequences at the orthologous positions of the ATF7IP gene in the human, horse, kangaroo and platypus genomes (Figures 4B and 4C). Closely related sequences are also present in the genomes of rhesus macaque and tarsier. Therefore, the Crypton insertion in the ATF7IP gene must have occurred in the common ancestor of amniotes more than 325 MYA [30]. None of the mammalian orthologous sequences encode intact YR proteins, and many mammalian species lack the YR sequence. This indicates little, if any, selective pressure on this sequence in mammals.: Ancient domestication of Cryptons in animals: Most vertebrate genes similar to Cryptons code for proteins (Additional file 4). In the human genome, there are seven proteins similar to Crypton YRs, which are annotated as parts of six genes (Figure 5 and Additional file 5). The KIAA1958 gene encodes two isoforms, both of which include YR-derived sequences. 
The other genes are potassium channel tetramerization domain containing 1 (KCTD1), zinc finger, myeloproliferative and mental retardation type 2 (ZMYM2)/zinc finger protein 198 (ZNF198), ZMYM3/ZNF261, ZMYM4/ZNF262 and glutamine-rich protein 1 (QRICH1) (Figure 5). A PSI-BLAST search of these proteins against the National Center for Biotechnology Information (NCBI) conserved domain database (CDD) revealed that they share a domain of unknown function (DUF3504 superfamily; E-value = 1e-29). The six genes are widespread among vertebrates (Figure 6) and are highly conserved among phylogenetically distant species (Table 4). The phylogeny of each gene agreed with the species phylogeny (data not shown). The nucleotide sequences corresponding to all seven DUF3504 domains were present in the NCBI EST database, indicating that they are expressed. The data clearly show that they are neither pseudogenes nor defective Cryptons (see the accession numbers of DUF3504 genes in Additional file 5). However, none of them preserves the YR catalytic site. All of them have lost the catalytic tyrosine and the second conserved arginine, and all but KCTD1 have also lost the conserved histidine.: Schematic structures of DUF3504 proteins. The KIAA1958 gene has two isoforms, each of which encodes a DUF3504 domain. The structures of KCTD1, KIAA1958, QRICH1, ZMYM2, ZMYM3 and ZMYM4 are from humans. The structure of WOC is from Drosophila melanogaster.: Distribution of Cryptons and Crypton-derived genes. Each gene identified in the haploid genome is represented by a plus symbol. Minus symbols indicate the absence of Cryptons or Crypton-derived genes. Asterisks indicate the presence of their disrupted fragments. The branch ages are based on TimeTree [30]. The unit of time is indicated. Crypton-derived genes listed at nodes of the tree indicate the times of their domestication inferred from their distribution in different species. 
KIAA1958L, QRICH1, ZMYM2, ZMYM3 and ZMYM4 are not shown, because they most likely arose by gene duplication.: Although the resolution is low because of high divergence and the short length of YR sequences, animal DUF3504 genes tend to cocluster with animal Cryptons (CryptonA) in the YR phylogenetic tree (Figure 2). There are four independent clusters of DUF3504 genes: KCTD1, KIAA1958a, KIAA1958b/KIAA1958L and WOC/ZMYM/QRICH1 (Figure 2, filled circles). KCTD1 coclusters with several animal Cryptons, and the clustering is supported by a bootstrap value of 100%. Cryptons form a paraphyletic cluster, which indicates that the DUF3504 domain of KCTD1 was derived from a Crypton YR. The position of KIAA1958a is distinct from both CryptonA and CryptonI, and WOC/ZMYM/QRICH1 clusters as a sister group to all animal CryptonA elements. Therefore, phylogeny alone does not support the domestication of animal Cryptons leading to WOC/ZMYM/QRICH1 and KIAA1958a.: The DUF3504 domain was derived from YR, not vice versa, because DUF3504 lacks the complete catalytic tetrad essential for YR activity. YR is essential for transposition, and repeated generation of active YRs from defective YRs is highly improbable. The distributions of WOC/ZMYM/QRICH1 and KIAA1958a are restricted to bilaterians and jawed vertebrates, respectively. Apart from Cryptons, the only other possible sources of YRs in animal genomes are the retrotransposon families DIRS and Ngaro [23, 24]. However, all searches with the YRs from CryptonA-1_OL, Crypton-1_SP and CryptonA-1_SK matched the DUF3504 sequence with E-values = 8e-12, whereas YRs of DIRS and Ngaro did not match the DUF3504 sequence at all (even when the threshold E-value was set at 100). Several representatives of DUF3504 are actually Crypton sequences; for example, XP_001639277 is the protein coded by Crypton-1_NV. 
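The direction-of-derivation argument above rests on whether the four catalytic residues of the YR core (two arginines, a histidine and the tyrosine) survive at their expected alignment positions. A minimal Python sketch of such a check follows. The tetrad order is taken from the text; the column indices and toy sequences are hypothetical placeholders, not positions from the alignments in Figure 4A or Additional file 5.

```python
# Hypothetical sketch: which residues of the YR catalytic tetrad survive
# in one row of a protein alignment? The tetrad order (Arg, His, Arg, Tyr)
# follows the text; the column indices and toy sequences below are
# placeholders, not positions from the alignments used in this study.

TETRAD = (('R1', 'R'), ('H', 'H'), ('R2', 'R'), ('Y', 'Y'))


def tetrad_status(aligned_seq, columns):
    # columns: 0-based alignment columns of R1, H, R2 and Y, in that order
    if len(columns) != len(TETRAD):
        raise ValueError('expected one column per tetrad residue')
    status = {}
    for (label, residue), col in zip(TETRAD, columns):
        # a gap or any substitution counts as loss of the residue
        status[label] = aligned_seq[col].upper() == residue
    return status


def is_catalytically_intact(aligned_seq, columns):
    return all(tetrad_status(aligned_seq, columns).values())


if __name__ == '__main__':
    cols = (2, 6, 10, 14)              # hypothetical tetrad columns
    crypton_like = 'AARSGDHKLVRPQIYG'  # toy row with all four residues
    kctd1_like = 'AARSGDHKLVQPQIFG'    # toy row: His kept, Arg2 and Tyr lost
    print(tetrad_status(crypton_like, cols))
    print(tetrad_status(kctd1_like, cols))
```

The second toy row mimics the pattern described for KCTD1 (histidine retained, second arginine and tyrosine lost); in practice the columns would come from a curated alignment against characterized YRs.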
The patchy distribution of Cryptons and the inconsistency between Crypton and host phylogenies indicate ancient amplification and extinction events in Crypton evolution. The ancient amplifications would have generated many lineages of Cryptons, and it is likely that WOC/ZMYM/QRICH1 and KIAA1958a were derived from lineages of Cryptons that are now extinct. We cannot completely rule out the possibility that the two genes and CryptonA elements were independently derived from DIRS-like retrotransposons or from some as-yet-uncharacterized type of mobile element, but this would imply independent origins of CryptonA and the other Crypton groups (CryptonF, CryptonS and CryptonI). Therefore, four independent domestication events of animal Cryptons are the most parsimonious explanation for the origins of animal DUF3504 genes.: A representative of DUF3504 from Halocynthia roretzi (BAB40645) has orthologs in other tunicates: Ciona intestinalis (XP_002125964), Ciona savignyi (AACT01002283 and AACT10141791) and Oikopleura dioica (CBY34656). They could also represent a Crypton domestication event. Another representative (YP_025778) is encoded in the mitochondrial genome of the green alga Pseudendoclonium akinetum. It could also be a candidate Crypton-derived gene; owing to the lack of related sequences, however, we did not analyze it further.: All DUF3504 genes encode much longer proteins than their DUF3504 domains, and it is possible that the preexisting genes captured entire Crypton protein-coding sequences. However, the only recognizable domain encoded by animal Cryptons is YR (DUF3504), and there is little sequence similarity beyond YRs among Cryptons themselves. Therefore, it is unlikely that any significant sequence similarity between the DUF3504 and Crypton proteins was preserved beyond the DUF3504 domains.: KCTD1 gene: In the KCTD1 gene, the DUF3504 domain is confined to a single exon. 
Among the vertebrate genes, the KCTD1 DUF3504 domain is the closest to the Crypton YRs in terms of protein sequence similarity. The sequence identity between the KCTD1 DUF3504 domain and the YR of Crypton-1_SP is 32%, which exceeds the identity between different lineages of Cryptons. For example, Crypton-1_SP in the CryptonA lineage and Crypton-1_TC in the CryptonI lineage show less than 30% sequence identity to each other. KCTD1 encodes two protein isoforms of different lengths. The longer isoform (isoform b) contains both an N-terminal DUF3504 domain and a C-terminal BTB/POZ (Broad-complex, Tramtrack and Bric-a-brac/poxvirus and zinc finger) domain (Figure 5), whereas the shorter one (isoform a) contains only the BTB/POZ domain. The shorter isoform is approximately 80% identical to the KCTD15 protein, and related genes are found in various organisms, including lancelet, sea urchin and insects (Additional file 6). The KCTD15 gene does not have any DUF3504 domain and is found in gnathostomes from mammals to chimaera. We infer that KCTD1 and KCTD15 arose by duplication of a single gene early in vertebrate evolution and that a Crypton copy was subsequently inserted upstream of the KCTD1 gene, generating a new transcriptional variant that encodes isoform b. This insertion must have occurred before the divergence of the Chondrichthyes (sharks, rays, skates and chimaeras) about 527 MYA [30].: KIAA1958 gene: The KIAA1958 gene, of unknown function, has two isoforms (a and b) that contain different DUF3504 domains encoded by different exons (Figure 5). The DUF3504 domains of human isoforms a and b are only 26% identical to each other. Although alternative splicing has not been confirmed experimentally, the high conservation of both DUF3504-coding sequences indicates that both encode functional proteins (Table 4 and Additional file 4). Neither of the two DUF3504-coding sequences is interrupted by introns. 
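Figures such as 32% (KCTD1 DUF3504 versus the Crypton-1_SP YR), less than 30% (CryptonA versus CryptonI) or 26% (KIAA1958 isoforms a and b) depend on how identity is counted over an alignment. The Python sketch below assumes identity is the fraction of identical residues among columns where neither sequence has a gap; the text does not state which convention was used, so this denominator is an assumption, and the toy sequences are not real KCTD1 or Crypton fragments.

```python
# Sketch: percent identity between two rows of an alignment. Identity is
# computed here over columns where neither sequence has a gap; this is one
# common convention, assumed for illustration. Dividing by the alignment
# length or by the shorter sequence gives different numbers.


def percent_identity(a, b, gap='-'):
    if len(a) != len(b):
        raise ValueError('aligned sequences must have equal length')
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


if __name__ == '__main__':
    # toy aligned fragments, not real KCTD1 or Crypton-1_SP sequences
    print(round(percent_identity('MKLV-RRAYE', 'MRLVQRKAYD'), 1))
```

In the toy pair, nine columns are compared and six match, giving 66.7%; counting the gap column in the denominator would lower this to 60%, which is why the convention matters when comparing values near a threshold such as 30%.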
We found both isoforms in a wide range of tetrapods (Figure 6). Zebrafish lacks isoform a, whereas Chondrichthyes (chimaera) lack isoform b. Some Actinopterygii (medaka, stickleback, fugu and pufferfish) lack both isoforms. Chicken, zebra finch and platypus have an additional gene similar to KIAA1958 isoform b, designated KIAA1958L. The exons encoding isoform-specific regions of KIAA1958b are positioned upstream of those of KIAA1958a. The KIAA1958L gene likely originated from a duplication of the segment that includes the KIAA1958b-specific exons but not the KIAA1958a-specific exons, and this duplication event predated the divergence of mammals and birds. KIAA1958L is less conserved than KIAA1958 isoforms a and b. The DUF3504 domains of KIAA1958L proteins from platypus and chicken are only 44% identical, whereas those of KIAA1958b are 97% identical between the two species. KIAA1958a is nonfunctional in chicken and zebra finch, but it is intact in lizard. Isoform b was not found in Chondrichthyes, and it may have originated later, after the divergence of Chondrichthyes from other gnathostomes. KIAA1958a might have originated in the common ancestor of gnathostomes. Another possibility is that both isoforms were acquired in the common ancestor of gnathostomes and that isoforms a and b were lost in the lineages of Actinopterygii and Chondrichthyes, respectively.: ZMYM2, ZMYM3, ZMYM4 and QRICH1 genes: The ZMYM2, ZMYM3 and ZMYM4 genes are present in diverse gnathostomes from human to chimaera (Figure 6). ZMYM2, ZMYM3 and ZMYM4 are similar to arthropod WOC in sequence and structure [31, 32]. The DUF3504 domains from the Drosophila melanogaster WOC gene and the human ZMYM2 gene are 41% identical at the protein level. Some introns also occupy corresponding positions in ZMYM2 and WOC (Figure 7A). 
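Shared intron positions, used here for ZMYM2 and WOC and below for QRICH1, are usually identified by projecting each gene's introns onto a protein alignment and checking whether they fall in the same column with the same phase. The Python sketch below illustrates that idea under an assumed encoding of intron positions (residue index plus phase); the toy sequences and intron coordinates are hypothetical, not those behind Figure 7A.

```python
# Hypothetical sketch: shared intron sites between two genes, found by
# projecting intron positions onto a protein alignment. Each intron is
# (residue_index, phase): residue_index is the 0-based index, in the
# ungapped protein, of the codon the intron interrupts (phase 1 or 2)
# or immediately follows (phase 0). This encoding is an assumption.


def residue_to_column(aligned_seq, gap='-'):
    # map each ungapped residue index to its alignment column
    mapping = {}
    residue = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            mapping[residue] = col
            residue += 1
    return mapping


def project_introns(aligned_seq, introns):
    cols = residue_to_column(aligned_seq)
    projected = set()
    for index, phase in introns:
        if index not in cols or phase not in (0, 1, 2):
            raise ValueError(f'bad intron position: {(index, phase)}')
        projected.add((cols[index], phase))
    return projected


def shared_introns(aln_a, introns_a, aln_b, introns_b):
    # an intron site counts as conserved if it falls in the same alignment
    # column with the same phase in both genes
    common = project_introns(aln_a, introns_a) & project_introns(aln_b, introns_b)
    return sorted(common)


if __name__ == '__main__':
    aln_a = 'MKV-LLRGE'   # toy rows, not real ZMYM2 or WOC sequences
    aln_b = 'MRVQLIR-E'
    print(shared_introns(aln_a, [(2, 0), (5, 1)], aln_b, [(2, 0), (6, 1), (7, 2)]))
```

Because the comparison happens in alignment coordinates, an intron at residue 5 in one toy protein and residue 6 in the other can still be recognized as the same site when an upstream gap shifts the numbering.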
In addition to chordates and arthropods, we found sequence fragments related to WOC in echinoderms (Strongylocentrotus purpuratus), hemichordates (Saccoglossus kowalevskii), mollusks (Pinctada maxima and Aplysia californica) and platyhelminths (Schistosoma mansoni, S. japonicum and Schmidtea mediterranea) (Additional file 5). We found no evidence that WOC forms multiple gene families in invertebrates. The ZMYM2, ZMYM3 and ZMYM4 genes are listed in the data set of ohnologs recently reported by Makino and McLysaght [33], which indicates that they arose from a single gene through the two rounds of whole-genome duplication early in vertebrate evolution, before the split between jawed vertebrates and agnathans [34, 35]. The synteny blocks of ZMYM2, ZMYM3 and ZMYM4 share several genes in addition to the ZMYM genes (Figure 7B). The most parsimonious scenario is that the WOC/ZMYM gene family originated from the domestication of a Crypton in the common ancestor of bilaterians.: Paralogous relationships of WOC/ZMYM/QRICH1 genes. (A) Two conserved intron positions among WOC, ZMYM2, ZMYM3, ZMYM4 and QRICH1. Introns are printed in lowercase letters and shaded. Protein sequences are shown below nucleotide sequences. The upper and lower intron positions correspond to the 20th and 22nd introns of human ZMYM2, respectively. (B) The synteny blocks of ZMYM2, ZMYM3 and ZMYM4. Ohnologous relationships reported by Makino and McLysaght [33] are indicated by dotted lines. GJB = gap junction protein β; GJA = gap junction protein α; DLGAP3 = discs large homolog-associated protein 3; C1orf212 = chromosome 1 open reading frame 212. Other gene names are defined in the text.: There are three other ZMYM genes (ZMYM1, ZMYM5 and ZMYM6) in the synteny blocks (Figure 7B), but they have no DUF3504 domain. The N-terminal part of ZMYM5 is similar to that of ZMYM2, whereas those of ZMYM1 and ZMYM6 are similar to that of ZMYM4. These three genes are present only in eutherian mammals. 
These data support independent gene duplication events within each synteny block. It is noteworthy that the C-terminal parts of ZMYM1, ZMYM5 and ZMYM6 were derived from transposases of hAT-type DNA transposons, but these hAT-derived sequences are not closely related to each other. The C-terminal part of ZMYM6 is most similar to Charlie elements in the human genome, whereas that of ZMYM1 is more similar to plant hAT elements such as HAT-1_Mad from apple (data not shown).: The QRICH1 gene was found in diverse vertebrates, including lamprey (Figure 6). The DUF3504 domain in QRICH1 is quite similar to those of ZMYM2, ZMYM3 and ZMYM4. In addition, five of the eight introns of QRICH1 are at sites corresponding to those of ZMYM2, ZMYM3 and ZMYM4 (Figure 7A and data not shown). The high structural and sequence similarity between WOC, ZMYM2/3/4 and QRICH1 indicates that QRICH1 originated from either WOC or ZMYM genes. In the neighborhood of QRICH1, we could not find any genes paralogous to genes in the synteny blocks of ZMYM2, ZMYM3 and ZMYM4. However, because QRICH1 is present in the lamprey genome, it must have originated at a time close to the whole-genome duplication events.: Discussion: Evolution of WOC: the third-oldest event of transposon domestication: The most ancient transposon-derived genes known to date are TERT, which was generated by the domestication of a Penelope-like retroelement [4], and Prp8, a spliceosomal component derived from a retrointron (group II self-splicing intron) [36]. TERT retains the catalytic activity of RT, but Prp8 does not. These two genes are shared by almost all eukaryotes. Another example of an ancient domestication event is the RAG1 gene [5]. It is distributed widely among gnathostomes, but no RAG1 ortholog was found in agnathans, including lamprey and hagfish. 
Given that agnathans have a different type of adaptive immune system based on \"variable lymphocyte receptors\" [37], the domestication of RAG1 likely occurred in the last common ancestor of gnathostomes after their divergence from agnathans. Other ancient transposon domestication events gave rise to the HARBI1 and PGBD5 genes, both of which are present in vertebrates from humans to actinopterygian fish [38, 39]. The KCTD1b, KIAA1958a and KIAA1958b genes are as old as or older than the HARBI1 and PGBD5 genes (Figure 6). The transposon-derived CENP-B, a highly conserved mammalian centromere protein, and three CENP-B-like proteins (Abp1, Cbh1 and Cbh2) in fission yeast resemble each other in sequence and function, but they were derived independently from different pogo-like transposases [40]. The human genome harbors a significant number of genes derived from transposons [6]. Some of them were domesticated in the distant past, and no traces remain of the repetitive sequences or TEs from which they were derived. For example, the HARBI1 gene was derived from PIF/Harbinger and PHSA (THAP domain-containing protein 9, or THAP9) from a P-like element [38, 41]. Both HARBI1 and PHSA were found by screening mammalian genes against DNA transposons from zebrafish. Similarly, the key to our discovery of Crypton-derived genes was screening genes against the Cryptons preserved in medaka, because only a few remnants of Cryptons remain in the vertebrate genomes sequenced to date, except in medaka.: The ancestral gene for WOC/ZMYM probably originated in the common ancestor of all bilaterians more than 910 MYA [30]. This is the third-oldest transposon domestication event known to date, following the two domestication events of RT [4, 36]. Our study indicates that domestication of Crypton-like elements in eukaryotes was relatively common in the distant past. 
This implies that Cryptons are very ancient and, given their rare occurrence in the genomic fossil record and their great diversity, that they were probably much more active in the distant past than in more recent evolutionary history.: Functional implications for domesticated Crypton YRs: No function of DUF3504 domains has been reported to date. Even so, the YR origin of DUF3504 domains offers some clues to their functions. YR forms a multimer when it binds substrate DNA during recombination [42]. On that basis, we can envision two possible functions derived from YRs: DNA binding and protein-protein interaction. Several observations hint at the functions of domesticated YRs. First, many genes derived from YRs are transcriptional regulators. Gcr1, KCTD1, WOC, ZMYM2, ZMYM3, ZMYM4 and ATF7IP are either transcriptional activators or repressors [43–48]. Cbf2 acts as a centromeric protein that binds directly to centromere-specific sequences and is essential for spindle pole body formation [49, 50]. Although these proteins usually contain a DNA-binding domain other than DUF3504, exemplified by the GCR1_C domain of Gcr1 and Cbf2, the DUF3504 domain could also function as a DNA-binding domain. Second, there is an interesting resemblance between the functions of domesticated DDE-transposases and YRs. Daysleeper is a transcription factor derived from a hAT DDE-transposase that binds a specific motif to regulate transcription [12]. CENP-B is a centromere protein derived from the DDE-transposase of a pogo-like transposon [13]. In these genes, transposase-derived domains act as DNA-binding domains. Third, a large family of prokaryotic transcriptional activators, AraC/XylS, shows structural similarity to YRs. The overall fold of the 129-amino acid protein MarA, a member of the AraC/XylS family, almost entirely recapitulates the YR domain of Cre recombinase [51]. MarA can simultaneously bind RNA polymerase and DNA to form a ternary complex [52]. 
These data support a putative DNA- or protein-binding function for DUF3504.: To date, relatively little is known about Cryptons. There have been no studies of their transposition, transcription, translation or regulation. The sequence similarity between Cryptons is very low, especially in their non-protein-coding regions. We compared DNA sequences of Cryptons from different species, but we could not find any conserved nucleotide sequences among them. Furthermore, all Crypton domestication events are very old. Therefore, it is very difficult to propose any specific functions of DUF3504 domains. Instead, herein we propose potential pathways in which DUF3504 domains could be involved.: KCTD1 and KCTD15 are paralogs that diverged during the early evolution of vertebrates (Additional file 6). KCTD1 isoform b, generated by a Crypton insertion upstream of the original KCTD1 gene, is widely conserved among jawed vertebrates (Figure 6), although it is unclear whether agnathans carry the KCTD1b gene. The high conservation of KCTD1b (Table 4) indicates an essential function shared among jawed vertebrates. KCTD1 represses the activity of the AP-2α transcription factor, and the BTB/POZ domain is responsible for the interaction [46]. AP-2α plays an essential role in neural border (NB) and neural crest (NC) formation during embryonic development [53]. NB is the precursor of NC. KCTD15 is expressed in NB and inhibits NC induction [54]. NC cells are a transient, multipotent, migratory cell population unique to vertebrates that gives rise to diverse cell lineages. We can speculate that, by adding a new protein-protein or protein-DNA interaction, KCTD1b can contribute to the network of NC formation through the regulation of AP-2α.: Among DUF3504 genes, the function of WOC/ZMYM is of special interest because two of the genes in this group, ZMYM2 and ZMYM3, are linked to human diseases. 
A chromosomal translocation between ZMYM2 and fibroblast growth factor receptor 1 (FGFR1) causes lymphoblastic lymphoma and a myeloproliferative disorder [55]. A translocation involving ZMYM3 is associated with X-linked mental retardation [56]. Mutations of their ortholog, WOC, cause larval lethality in D. melanogaster [31].: WOC/ZMYM gene-encoded proteins are involved in various processes, including transcription, DNA repair and splicing. WOC is a transcriptional regulator that colocalizes with the initiating forms of RNA polymerase II [31, 32]. The WOC proteins also colocalize with all telomeres, and WOC mutants show frequent telomeric fusions [31, 32]. ZMYM2, ZMYM3 and ZMYM4 are components of a multiprotein corepressor complex that includes histone deacetylase 1 (HDAC1) and HDAC2 [47, 48]. ZMYM2 binds to various transcriptional regulators, including Smad proteins [57]. It also binds to proteins involved in homologous recombination, such as RAD18, HHR6A and HHR6B, which are human orthologs of the yeast RAD proteins [58], and to spliceosomal components including SFPQ (splicing factor, proline- and glutamine-rich) [59].: Interestingly, the SFPQ gene is a component of the syntenic cluster of ZMYM4 (Figure 7B). The paralog of SFPQ in the cluster of ZMYM3 is NONO (non-POU domain-containing, octamer-binding), which is a partner of SFPQ in heteromers [60]. PSPC1 (paraspeckle component 1), located in the cluster of ZMYM2, is also similar to SFPQ and NONO. In addition to their involvement in splicing, the SFPQ proteins contribute to DNA repair by interacting with RAD51 [61]. They also recruit HDAC1 to the STAT6 transcription complex [62]. Therefore, it is likely that WOC/ZMYM and SFPQ/NONO/PSPC1 proteins act cooperatively in transcriptional regulation, splicing and DNA repair, and that they have coevolved by maintaining their functional relationships. 
Their DUF3504 domains may contribute to some of these protein-protein interactions.: Evolution of Cryptons: To date, Cryptons have been identified in a limited number of fungal and animal species. Herein we report Cryptons in additional species, but their known distribution remains patchy (Table 1). CryptonF elements are present in three phyla of fungi (Ascomycota, Basidiomycota and Zygomycota) and two orders of oomycetes (Peronosporales and Saprolegniales). Our phylogenetic analysis supports the horizontal transfer of CryptonF elements between fungi and oomycetes (Figure 2), which is consistent with frequent horizontal transfer of genes between them [63]. CryptonS elements are also present in two oomycete orders and one diatom species. Both oomycetes and diatoms are lineages of stramenopiles, and the origin of CryptonS elements could date back to their common ancestor.: Animal Cryptons (CryptonA and CryptonI) were found in six phyla: Chordata, Echinodermata, Hemichordata, Arthropoda, Mollusca and Cnidaria. CryptonI elements have the same overall structure as CryptonA elements and were observed only in insect genomes. It is possible that CryptonI elements constitute a branch of CryptonA that has evolved more rapidly in insects. The overall distribution in fungi, oomycetes and animals indicates that Cryptons have long been present in these three eukaryotic groups, probably with some contribution from horizontal transfer. It is likely that Cryptons originated in the common ancestor of these three groups, although, because of the low resolution of the YR phylogeny, we cannot rule out independent origins.: The identification of Crypton elements in medaka is surprising. The nucleotide diversity of Cryptons in the medaka genome clearly shows that Cryptons were maintained in the lineage leading to medaka for a long time. 
It is possible that Cryptons invaded the medaka lineage after its split from the three other actinopterygian fish whose genomes have been sequenced (Gasterosteus, Takifugu and Tetraodon). However, vertical transmission of Cryptons in the lineage leading to medaka is the preferred scenario, given the domestication of a Crypton in the common ancestor of bilaterian animals that led to the origin of WOC genes. In most identified host organisms, Cryptons are preserved at very low copy numbers (Additional file 1). We found several fragments of Cryptons in various vertebrates, including zebrafish (Table 3). Crypton-derived genes originated at different times during vertebrate evolution (Figure 6). This is consistent with the hypothesis that Cryptons persisted at very low copy numbers in vertebrate genomes and were occasionally amplified in certain lineages.: Conclusions: This study has revealed the diversity of a unique class of DNA transposons, Cryptons, and their repeated domestication. The DUF3504 domains are domesticated YRs of animal Crypton elements. Our findings add to the repertoire of domesticated proteins and provide further evidence for an important role of transposable elements as a reservoir of new cellular functions.: Methods: Data source: Genome sequences of various species were obtained mostly from GenBank, and sequences of known Cryptons, DIRS and Ngaro were obtained from Repbase http://www.girinst.org/repbase/. All characterized Cryptons have been deposited in Repbase.: Sequence analysis: New Cryptons were characterized by repeated BLAST [64] and CENSOR [65] searches of genome sequences from various species, using known Cryptons as queries. All analyses were done with default settings. The consensus sequences of elements were derived by applying the majority rule to the corresponding sets of aligned Crypton copies. 
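The majority-rule consensus step can be illustrated with a short Python sketch. It assumes the copies are already aligned (gaps written as '-'), writes 'N' where no character reaches a strict majority, and drops columns where most copies have a gap. The text does not specify how ties or gap-majority columns were handled, and the manual gap adjustment is not modeled, so these choices are assumptions.

```python
from collections import Counter


def majority_consensus(aligned_copies, gap='-', min_fraction=0.5):
    # Column-wise majority rule: keep the most frequent character when it
    # occurs in more than min_fraction of copies, write 'N' otherwise, and
    # drop columns where the majority character is a gap.
    if not aligned_copies:
        raise ValueError('need at least one aligned copy')
    length = len(aligned_copies[0])
    if any(len(s) != length for s in aligned_copies):
        raise ValueError('all copies must be aligned to the same length')
    n = len(aligned_copies)
    consensus = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned_copies)
        char, count = counts.most_common(1)[0]
        if count / n <= min_fraction:
            consensus.append('N')   # no strict majority in this column
        elif char != gap:
            consensus.append(char)
        # otherwise most copies have a gap here, so the column is skipped
    return ''.join(consensus)


if __name__ == '__main__':
    copies = ['ACGT-A', 'ACGTTA', 'ACCT-A', 'TCGT-A']  # toy aligned copies
    print(majority_consensus(copies))  # -> ACGTA
```

With real element families the input would be hundreds of genomic copies aligned with a tool such as MAFFT or MUSCLE, and the resulting consensus approximates the ancestral active element rather than any single copy.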
Alignment gaps were manually adjusted to maximize similarity to other related elements. Characterization of DUF3504 genes was performed by BLAST searches against both protein and nucleotide databases with known DUF3504 genes as queries. We predicted exon-intron boundaries with the aid of SoftBerry FGENESH (http://linux1.softberry.com/berry.phtml?topic=fgenesh&group=programs&subgroup=gfind) and manually adjusted them through comparison with orthologous sequences in other species.: Sequence alignment and phylogenetic analysis: We used MAFFT [66] with the linsi option or MUSCLE [67] with default settings to align both nucleotide and protein sequences of various Cryptons and Crypton-derived proteins. We constructed maximum likelihood trees by using PhyML [68, 69] with 100 bootstrap replicates [70] under the LG amino acid substitution model. We also constructed trees with other substitution models (WAG, RtREV and DCMut) and with the neighbor-joining method, but the resolution did not improve. The tree topology search method was Nearest Neighbor Interchange (NNI), and the initial tree was BIONJ. The phylogenetic trees were drawn with FigTree 1.3.1 software (http://tree.bio.ed.ac.uk/software/figtree/).: Abbreviations: base pair: endonuclease: million years ago: reverse transcriptase: transposable element: telomerase reverse transcriptase: terminal inverted repeat: target site duplication: tyrosine recombinase.: References: Kapitonov VV, Jurka J: A universal classification of eukaryotic transposable elements implemented in Repbase. Nat Rev Genet. 2008, 9: 411-414.: Miller WJ, Hagemann S, Reiter E, Pinsker W: P-element homologous sequences are tandemly repeated in the genome of Drosophila guanche. Proc Natl Acad Sci USA. 1992, 89: 4018-4022. 10.1073/pnas.89.9.4018.: Greider CW, Blackburn EH: The telomere terminal transferase of Tetrahymena is a ribonucleoprotein enzyme with two kinds of primer specificity. Cell. 1987, 51: 887-898. 
10.1016/0092-8674(87)90576-9.: Gladyshev EA, Arkhipova IR: Telomere-associated endonuclease-deficient Penelope-like retroelements in diverse eukaryotes. Proc Natl Acad Sci USA. 2007, 104: 9352-9357. 10.1073/pnas.0702741104.: Kapitonov VV, Jurka J: RAG1 core and V(D)J recombination signal sequences were derived from Transib transposons. PLoS Biol. 2005, 3: e181. 10.1371/journal.pbio.0030181.: Volff JN: Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays. 2006, 28: 913-922. 10.1002/bies.20452.: Curcio MJ, Derbyshire KM: The outs and ins of transposition: from Mu to kangaroo. Nat Rev Mol Cell Biol. 2003, 4: 865-877.: Aziz RK, Breitbart M, Edwards RA: Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res. 2010, 38: 4207-4217. 10.1093/nar/gkq140.: Kapitonov VV, Jurka J: Self-synthesizing DNA transposons in eukaryotes. Proc Natl Acad Sci USA. 2006, 103: 4540-4545. 10.1073/pnas.0600833103.: Bao W, Jurka MG, Kapitonov VV, Jurka J: New superfamilies of eukaryotic DNA transposons and their internal divisions. Mol Biol Evol. 2009, 26: 983-993. 10.1093/molbev/msp013.: Bao W, Kapitonov VV, Jurka J: Ginger DNA transposons in eukaryotes and their evolutionary relationships with long terminal repeat retrotransposons. Mob DNA. 2010, 1: 3. 10.1186/1759-8753-1-3.: Bundock P, Hooykaas P: An Arabidopsis hAT-like transposase is essential for plant development. Nature. 2005, 436: 282-284. 10.1038/nature03667.: Tudor M, Lobocka M, Goodell M, Pettitt J, O'Hare K: The pogo transposable element family of Drosophila melanogaster. Mol Gen Genet. 1992, 232: 126-134. 10.1007/BF00299145.: Feng Q, Moran JV, Kazazian HH, Boeke JD: Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996, 87: 905-916. 10.1016/S0092-8674(00)81997-2.: Kojima KK, Fujiwara H: An extraordinary retrotransposon family encoding dual endonucleases. Genome Res. 2005, 15: 1106-1117. 
10.1101/gr.3271405.: Yang J, Malik HS, Eickbush TH: Identification of the endonuclease domain encoded by R2 and other site-specific, non-long terminal repeat retrotransposable elements. Proc Natl Acad Sci USA. 1999, 96: 7847-7852. 10.1073/pnas.96.14.7847.: Pyatkov KI, Arkhipova IR, Malkova NV, Finnegan DJ, Evgenev MB: Reverse transcriptase and endonuclease activities encoded by Penelope-like retroelements. Proc Natl Acad Sci USA. 2004, 101: 14719-14724. 10.1073/pnas.0406281101.: Nunes-Düby SE, Kwon HJ, Tirumalai RS, Ellenberger T, Landy A: Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res. 1998, 26: 391-406. 10.1093/nar/26.2.391.: Goodwin TJ, Butler MI, Poulter RT: Cryptons: a group of tyrosine-recombinase-encoding DNA transposons from pathogenic fungi. Microbiology. 2003, 149: 3099-3109. 10.1099/mic.0.26529-0.: Broach JR, Hicks JB: Replication and recombination functions associated with the yeast plasmid, 2 µ circle. Cell. 1980, 21: 501-508. 10.1016/0092-8674(80)90487-0.: Doak TG, Witherspoon DJ, Jahn CL, Herrick G: Selection on the genes of Euplotes crassus Tec1 and Tec2 transposons: evolutionary appearance of a programmed frameshift in a Tec2 gene encoding a tyrosine family site-specific recombinase. Eukaryot Cell. 2003, 2: 95-102. 10.1128/EC.2.1.95-102.2003.: Jacobs ME, Sánchez-Blanco A, Katz LA, Klobutcher LA: Tec3, a new developmentally eliminated DNA element in Euplotes crassus. Eukaryot Cell. 2003, 2: 103-114. 10.1128/EC.2.1.103-114.2003.: Goodwin TJ, Poulter RT: A new group of tyrosine recombinase-encoding retrotransposons. Mol Biol Evol. 2004, 21: 746-759. 10.1093/molbev/msh072.: Goodwin TJ, Poulter RT: The DIRS1 group of retrotransposons. Mol Biol Evol. 2001, 18: 2067-2082.: Lorenzi HA, Robledo G, Levin MJ: The VIPER elements of trypanosomes constitute a novel group of tyrosine recombinase-enconding retrotransposons. Mol Biochem Parasitol. 2006, 145: 184-194. 
10.1016/j.molbiopara.2005.10.002.: Golic KG, Lindquist S: The FLP recombinase of yeast catalyzes site-specific recombination in the Drosophila genome. Cell. 1989, 59: 499-509. 10.1016/0092-8674(89)90033-0.: van Duyne GD: A structural view of tyrosine recombinase site-specific recombination. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: American Society of Microbiology Press, 93-117.: Jurka J, Kapitonov VV: First Cryptons from invertebrates. Repbase Rep. 2008, 8: 232-233.: Jurka J: First Cryptons from insects. Repbase Rep. 2009, 9: 468-480.: Hedges SB, Kumar S: The TimeTree of Life. 2009, New York: Oxford University Press.: Raffa GD, Cenci G, Siriaco G, Goldberg ML, Gatti M: The putative Drosophila transcription factor Woc is required to prevent telomeric fusions. Mol Cell. 2005, 20: 821-831. 10.1016/j.molcel.2005.12.003.: Font-Burgada J, Rossell D, Auer H, Azorín F: Drosophila HP1c isoform interacts with the zinc-finger proteins WOC and Relative-of-WOC to regulate gene expression. Genes Dev. 2008, 22: 3007-3023. 10.1101/gad.481408.: Makino T, McLysaght A: Ohnologs in the human genome are dosage balanced and frequently associated with disease. Proc Natl Acad Sci USA. 2010, 107: 9270-9274. 10.1073/pnas.0914697107.: Kuraku S, Meyer A, Kuratani S: Timing of genome duplications relative to the origin of the vertebrates: did cyclostomes diverge before or after? Mol Biol Evol. 2009, 26: 47-59.: Ohno S: Evolution by Gene Duplication. 1970, New York: Springer-Verlag.: Dlakic M, Mushegian A: Prp8, the pivotal protein of the spliceosomal catalytic center, evolved from a retroelement-encoded reverse transcriptase. RNA. 2011, 17: 799-808. 10.1261/rna.2396011.: Pancer Z, Amemiya CT, Ehrhardt GR, Ceitlin J, Gartland GL, Cooper MD: Somatic diversification of variable lymphocyte receptors in the agnathan sea lamprey. Nature. 2004, 430: 174-180. 
10.1038/nature02740.: Kapitonov VV, Jurka J: Harbinger transposons and an ancient HARBI1 gene derived from a transposase. DNA Cell Biol. 2004, 23: 311-324. 10.1089/104454904323090949.: Sarkar A, Sim C, Hong YS, Hogan JR, Fraser MJ, Robertson HM, Collins FH: Molecular evolutionary analysis of the widespread piggyBac transposon family and related \"domesticated\" sequences. Mol Genet Genomics. 2003, 270: 173-180. 10.1007/s00438-003-0909-0.: Casola C, Hucks D, Feschotte C: Convergent domestication of pogo-like transposases into centromere-binding proteins in fission yeast and mammals. Mol Biol Evol. 2008, 25: 29-41.: Hammer SE, Strehl S, Hagemann S: Homologs of Drosophila P transposons were mobile in zebrafish but have been domesticated in a common ancestor of chicken and human. Mol Biol Evol. 2005, 22: 833-844. 10.1093/molbev/msi068.: Guo F, Gopaul DN, van Duyne GD: Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature. 1997, 389: 40-46. 10.1038/37925.: Holland MJ, Yokoi T, Holland JP, Myambo K, Innis MA: The GCR1 gene encodes a positive transcriptional regulator of the enolase and glyceraldehyde-3-phosphate dehydrogenase gene families in Saccharomyces cerevisiae. Mol Cell Biol. 1987, 7: 813-820.: Rep M, Reiser V, Gartner U, Thevelein JM, Hohmann S, Ammerer G, Ruis H: Osmotic stress-induced gene expression in Saccharomyces cerevisiae requires Msn1p and the novel nuclear factor Hot1p. Mol Cell Biol. 1999, 19: 5474-5485.: Liu L, Ishihara K, Ichimura T, Fujita N, Hino S, Tomita S, Watanabe S, Saitoh N, Ito T, Nakao M: MCAF1/AM is involved in Sp1-mediated maintenance of cancer-associated telomerase activity. J Biol Chem. 2009, 284: 5165-5174.: Ding X, Luo C, Zhou J, Zhong Y, Hu X, Zhou F, Ren K, Gan L, He A, Zhu J, Gao X, Zhang J: The interaction of KCTD1 with transcription factor AP-2a inhibits its transactivation. J Cell Biochem. 2009, 106: 285-295. 
10.1002/jcb.22002.: Hakimi MA, Dong Y, Lane WS, Speicher DW, Shiekhattar R: A candidate X-linked mental retardation gene is a component of a new family of histone deacetylase-containing complexes. J Biol Chem. 2003, 278: 7234-7239. 10.1074/jbc.M208992200.: Gocke CB, Yu H: ZNF198 stabilizes the LSD1-CoREST-HDAC1 complex on chromatin through its MYM-type zinc fingers. PLoS One. 2008, 3: e3255. 10.1371/journal.pone.0003255.: Jiang W, Lechner J, Carbon J: Isolation and characterization of a gene (CBF2) specifying a protein component of the budding yeast kinetochore. J Cell Biol. 1993, 121: 513-519. 10.1083/jcb.121.3.513.: Goh PY, Kilmartin JV: NDC10: a gene involved in chromosome segregation in Saccharomyces cerevisiae. J Cell Biol. 1993, 121: 503-512. 10.1083/jcb.121.3.503.: Gillette WK, Martin RG, Rosner JL: Probing the Escherichia coli transcriptional activator MarA using alanine-scanning mutagenesis: residues important for DNA binding and activation. J Mol Biol. 2000, 299: 1245-1255. 10.1006/jmbi.2000.3827.: Martin RG, Gillette WK, Martin NI, Rosner JL: Complex formation between activator and RNA polymerase as the basis for transcriptional activation by MarA and SoxS in Escherichia coli. Mol Microbiol. 2002, 43: 355-370. 10.1046/j.1365-2958.2002.02748.x.: de Crozé N, Maczkowiak F, Monsoro-Burq AH: Reiterative AP2a activity controls sequential steps in the neural crest gene regulatory network. Proc Natl Acad Sci USA. 2011, 108: 155-160. 10.1073/pnas.1010740107.: Dutta S, Dawid IB: Kctd15 inhibits neural crest formation by attenuating Wnt/β-catenin signaling output. Development. 2010, 137: 3013-3018. 10.1242/dev.047548.: Xiao S, Nalabolu SR, Aster JC, Ma J, Abruzzo L, Jaffe ES, Stone R, Weissman SM, Hudson TJ, Fletcher JA: FGFR1 is fused with a novel zinc-finger gene, ZNF198, in the t(8;13) leukaemia/lymphoma syndrome. Nat Genet. 1998, 18: 84-87. 
10.1038/ng0198-84.: van der Maarel SM, Scholten IHJM, Huber I, Philippe C, Suijkerbuijk RF, Gilgenkrantz S, Kere J, Cremers FPM, Ropers HH: Cloning and characterization of DXS6673E, a candidate gene for X-linked mental retardation in Xq13.1. Hum Mol Genet. 1996, 5: 887-897. 10.1093/hmg/5.7.887.: Warner DR, Roberts EA, Greene RM, Pisano MM: Identification of novel Smad binding proteins. Biochem Biophys Res Commun. 2003, 312: 1185-1190. 10.1016/j.bbrc.2003.11.049.: Kunapuli P, Somerville R, Still IH, Cowell JK: ZNF198 protein, involved in rearrangement in myeloproliferative disease, forms complexes with the DNA repair-associated HHR6A/6B and RAD18 proteins. Oncogene. 2003, 22: 3417-3423. 10.1038/sj.onc.1206408.: Kasyapa CS, Kunapuli P, Cowell JK: Mass spectroscopy identifies the splicing-associated proteins, PSF, hnRNP H3, hnRNP A2/B1, and TLS/FUS as interacting partners of the ZNF198 protein associated with rearrangement in myeloproliferative disease. Exp Cell Res. 2005, 309: 78-85. 10.1016/j.yexcr.2005.05.019.: Shav-Tal Y, Zipori D: PSF and p54nrb/NonO: multi-functional nuclear proteins. FEBS Lett. 2002, 531: 109-114. 10.1016/S0014-5793(02)03447-6.: Rajesh C, Baker DK, Pierce AJ, Pittman DL: The splicing-factor related protein SFPQ/PSF interacts with RAD51D and is necessary for homology-directed repair and sister chromatid cohesion. Nucleic Acids Res. 2011, 39: 132-145. 10.1093/nar/gkq738.: Dong L, Zhang X, Fu X, Zhang X, Gao X, Zhu M, Wang X, Yang Z, Jensen ON, Saarikettu J, Yao Z, Silvennoinen O, Yang J: PTB-associated splicing factor (PSF) functions as a repressor of STAT6-mediated Ige gene transcription by recruitment of HDAC1. J Biol Chem. 2011, 286: 3451-3459. 10.1074/jbc.M110.168377.: Richards TA, Dacks JB, Jenkinson JM, Thornton CR, Talbot NJ: Evolution of filamentous plant pathogens: gene exchange across eukaryotic kingdoms. Curr Biol. 2006, 16: 1857-1864. 
10.1016/j.cub.2006.07.052.: Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.: Kohany O, Gentles AJ, Hankus L, Jurka J: Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics. 2006, 7: 474. 10.1186/1471-2105-7-474.: Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005, 33: 511-518. 10.1093/nar/gki198.: Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.: Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.: Guindon S, Lethiec F, Duroux P, Gascuel O: PHYML Online: a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 2005, 33 (Web Server issue): W557-W559.: Anisimova M, Gascuel O: Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol. 2006, 55: 539-552. 10.1080/10635150600755453.: Acknowledgements: This work was supported by National Institutes of Health grant 5 P41 LM006252. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.: Author information: Corresponding author: Correspondence to Jerzy Jurka.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: KKK initiated the research. KKK and JJ performed the research and wrote the manuscript. 
Both authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: PDF file listing Crypton elements found in this study. (PDF 103 KB): Additional file 2: PDF file listing Crypton-derived genes in fungi. (PDF 88 KB): Additional file 3: PDF file showing alignment of Cryptons and Crypton-derived genes in Saccharomycetaceae fungi in FASTA format. (PDF 132 KB): Additional file 4: PDF file showing alignment of Cryptons and Crypton-derived genes in animals in FASTA format. (PDF 104 KB): Additional file 5: PDF file listing the accession numbers for DUF3504 genes. (PDF 98 KB): Additional file 6: PDF file showing alignment of KCTD1, KCTD15 and related protein sequences in FASTA format. (PDF 83 KB): Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: About this article: Cite this article: Kojima, K.K., Jurka, J. Crypton transposons: identification of new diverse families and ancient domestication events. Mobile DNA 2, 12 (2011). 
https://doi.org/10.1186/1759-8753-2-12: Received: 19 August 2011: Accepted: 19 October 2011: Published: 19 October 2011: DOI: https://doi.org/10.1186/1759-8753-2-12: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Retrotransposition of R2 elements in somatic nuclei during the early development of Drosophila" "Michael T Eickbush, Thomas H Eickbush" "Thomas H Eickbush" "29 September 2011" "R2 retrotransposable elements exclusively insert in the 28S rRNA genes of their host. Their RNA transcripts are produced by self-processing from a 28S R2 cotranscript. Because full-length R2 transcripts are found in most tissues of R2-active animals, we tested whether new R2 insertions occurred in somatic tissues even though such events would be an evolutionary dead end., PCR assays were used to identify somatic R2 insertions in isolated adult tissues and larval imaginal discs of Drosophila simulans. R2 somatic mosaics were detected encompassing cells from individual tissues as well as tissues from multiple body segments. The somatic insertions had 5' junction sequences characteristic of germline insertions suggesting they represented authentic retrotransposition events., Body segments are specified early in Drosophila development, thus the detection of the same somatic insertion in cells from multiple tissues suggested that the R2 retrotransposition events had occurred before the blastoderm stage of Drosophila development. R2 activity at this stage, when embryonic nuclei are rapidly dividing in a common cytoplasm, suggests that some retrotransposition events appearing as germline events may correspond to germline mosaicism." "Mobile Element, Somatic Tissue, Imaginal Disc, Nurse Cell, rDNA Locus" " Retrotransposition of R2 elements in somatic nuclei during the early development of Drosophila: Michael T Eickbush & Thomas H Eickbush: Mobile DNA volume 2, Article number: 11 (2011): Abstract: Background: R2 retrotransposable elements exclusively insert in the 28S rRNA genes of their host. Their RNA transcripts are produced by self-processing from a 28S R2 cotranscript. 
Because full-length R2 transcripts are found in most tissues of R2-active animals, we tested whether new R2 insertions occurred in somatic tissues even though such events would be an evolutionary dead end.: Findings: PCR assays were used to identify somatic R2 insertions in isolated adult tissues and larval imaginal discs of Drosophila simulans. R2 somatic mosaics were detected that encompassed cells from single tissues as well as tissues from multiple body segments. The somatic insertions had 5' junction sequences characteristic of germline insertions, suggesting they represented authentic retrotransposition events.: Conclusions: Body segments are specified early in Drosophila development; thus the detection of the same somatic insertion in cells from multiple tissues suggested that the R2 retrotransposition events had occurred before the blastoderm stage of Drosophila development. R2 activity at this stage, when embryonic nuclei are rapidly dividing in a common cytoplasm, suggests that some retrotransposition events appearing as germline events may correspond to germline mosaicism.: Findings: Mobile element insertions during the development of somatic tissues provide no benefit to the element, as these insertions are not transferred to subsequent generations. Thus in animals, where the separation of somatic and germline tissues is established early, the ability of a mobile element to generate new insertions in somatic tissues would most likely be selected against. Consistent with this prediction, early studies in Drosophila melanogaster showed that P element transposition was dependent upon a germline-specific RNA splicing component [1], and I elements were only transcribed in ovaries [2]. 
However, counter to this model, mobile elements in other animals have been shown to generate new insertions in somatic tissues (for example, Tc1 elements in Caenorhabditis elegans [3] and L1 elements in mammals [4, 5]).: Several explanations can be put forward for the somatic activity of mobile elements. First, somatic events may be inconsequential to the host, so there is little selective pressure for a mobile element to evolve germline specificity. Second, somatic events may be harmful, but it is risky for a mobile element to become dependent on a germline-specific mechanism, as such dependence provides another opportunity for the host to control the element. Third, somatic events may occasionally benefit the host. This last, fascinating possibility has been suggested to explain the ability of L1 to retrotranspose in nerve tissues [6].: R2 non-LTR retrotransposable elements specifically insert into the tandemly repeated rRNA genes of many animal genera (Figure 1A) [7, 8]. Each R2 insertion blocks the production of functional 28S rRNA from the inserted gene. Because animals contain many more rRNA genes than are needed for transcription [9, 10], in most individuals inserted rRNA units are simply not transcribed. However, studies in Drosophila simulans indicate that in individuals where R2-inserted units are distributed throughout the rDNA locus, inserted rDNA units are transcribed [11–13]. The R2 transcripts are processed from the cotranscript [14], and new germline retrotransposition events can be observed in the progeny. Because rRNA transcription is essential in all tissues, and full-length R2 transcripts are readily detected in most larval and adult tissues of active lines ([12] and D. Eickbush and T. Eickbush, unpublished results), we tested whether R2 retrotransposition also occurs in somatic tissues. The D. 
simulans stock selected for study, sim89, had high levels of R2 transcripts in many tissues, and new insertions could be detected in progeny that had originated in either the male or female parent [12, 15]. For each animal, the screens for R2 somatic mosaics were conducted with either seven adult tissues (antenna, proboscis, the rest of the head, wing, haltere and individual legs from different thoracic segments) or four third instar larval tissues that are precursors to adult tissues (brain and three pairs of imaginal discs). Larval-specific tissues were not used because they are composed of polytene cells, which under-replicate R2-inserted rDNA units [16]. New insertions were assayed using the same 5' junction PCR assays previously used to detect germline events [11, 15, 17]. These assays exploit the fact that, although all R2 insertions occur at the identical location in the 28S genes when monitored from the 3' end, over half of the R2 retrotransposition events result in a deletion of element sequences starting at the element's 5' end and extending to locations throughout its 3.6 kb length. These 5' truncated copies can also contain short deletions or duplications of upstream 28S gene sequences. As a result, somatic R2 insertions containing 5' truncations will generate PCR bands whose lengths seldom match those of the PCR bands derived from germline-inherited 5' truncated elements. The PCR assays were conducted using a single primer located 80 bp upstream of the R2 insertion site in combination with a series of primers to sequences spaced throughout the length of the R2 element (Figure 1B) [12].: Diagram of R2 insertions within the rRNA genes of Drosophila and the PCR assay used to monitor somatic mosaicism. (A) Diagram of the tandemly repeated rRNA genes of Drosophila simulans and the location of R2 insertions. Black boxes, 18S, 5.8S and 28S rRNA genes (the 5.8S gene between the 18S and 28S genes is not labeled); white boxes, transcribed spacer regions. 
(B) About half of the R2 insertions have deletions of their 5' end that can extend to nearly the entire length of the element. All R2 copies have the same 3' junction with the 28S gene. Arrows above the R2/28S diagrams indicate the positions of the oligonucleotide primers used to assay for the 5' truncations. The DNA extraction method, the primers and the PCR protocols were identical to those in previous reports [11–13]. (C) Examples of the ethidium-stained PCR products derived from larval tissues. The larval tissues were dissected in Drosophila Ringer's solution. (D) Examples of the ethidium-stained PCR products derived from adult tissues. Adult tissues were placed directly in the DNA extraction solution. PCR bands interpreted as somatic insertions are indicated with arrows. To be scored as a somatic mosaic, the amplified band had to be detected with two sets of primer combinations (shown below the figures). The following abbreviations for tissues were used: An = antenna; Br = larval brain; D1-D3 = individual pairs of imaginal discs (the specific disc pair used was not known); Ha = haltere; Hd = head; L1 and L2 = individual legs from different body segments; Pr = proboscis; Wi = wing.: Somatic mosaics were defined as the presence of unique PCR bands in only a subset of the tissues tested from a single animal. To be scored as a somatic insertion, each new PCR band also had to be reproducibly detected using two different PCR primer combinations. Examples of an R2 insertion in one of the four tissues tested from a third instar larva, and of another insertion detected in three of seven adult tissues, are shown in Figure 1C, D. The PCR bands representing potential somatic events were less intense than the bands derived from the R2 elements inherited from the mother or father, as expected if not all cells of a tissue type contained the insertion. 
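The two scoring criteria can be expressed as a small filter. In this hypothetical sketch each tissue maps band labels (for example, an inferred 5' truncation point) to the set of primer combinations that amplified that band. A band is kept if it appears in some but not all tissues from one animal and was detected with at least two distinct primer combinations. The data layout, the function name, the band labels and the primer names P1/P2 are illustrative assumptions, not part of the published assay.

```python
def somatic_candidates(observations, min_primer_pairs=2):
    # observations: tissue -> {band_label: set of primer combinations detecting it}
    all_tissues = set(observations)
    tissues_with = {}  # band_label -> tissues where the band was seen
    primers_for = {}   # band_label -> primer combinations that detected it
    for tissue, bands in observations.items():
        for label, primer_pairs in bands.items():
            tissues_with.setdefault(label, set()).add(tissue)
            primers_for.setdefault(label, set()).update(primer_pairs)

    candidates = []
    for label, seen_in in tissues_with.items():
        # A band present in every tissue behaves like an inherited copy.
        only_a_subset = seen_in != all_tissues
        enough_primers = len(primers_for[label]) >= min_primer_pairs
        if only_a_subset and enough_primers:
            candidates.append(label)
    return sorted(candidates)


animal = {
    'wing': {'inherited': {'P1', 'P2'}, 'new_1': {'P1', 'P2'}},
    'haltere': {'inherited': {'P1', 'P2'}, 'new_1': {'P1', 'P2'}},
    'antenna': {'inherited': {'P1', 'P2'}, 'new_2': {'P1'}},
}
print(somatic_candidates(animal))  # ['new_1']
```

Here new_2 is rejected because only one primer combination detected it, mirroring the reproducibility requirement in the text.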
Generally, new bands could be reproducibly observed if they had at least one-tenth the intensity of the bands derived from inherited R2 copies. In total, tissues from 29 individuals (14 females, 15 males) were scored. A total of 15 potential somatic insertions were detected in 7 animals (2 females with 4 total events, and 5 males with 11 total events). The detection of greater numbers of new insertions in males compared to females was likely due to the greater sensitivity of the PCR assay in males. The rRNA genes in D. simulans are located on the X chromosome [18]; thus males contain a single rDNA locus, compared to two copies of the rDNA locus in females. The somatic events were detected in essentially all tissues examined, although the numbers of events were not sufficient to draw conclusions about relative frequencies.: To confirm that the PCR bands detected in only a subset of tissues corresponded to new R2 insertions arising from retrotransposition mechanisms similar to those of germline events, PCR bands representing 12 events that were well separated from the germline bands were excised from the gel, reamplified and sequenced. As shown in Figure 2, the 5' ends of the somatic R2 insertions had the characteristics associated with germline R2 insertions [19]. First, the 5' junctions of the R2 sequences with the 28S gene occurred at a variety of positions near the R2 insertion site. As with germline insertions, most junctions were within a few base pairs of the insertion site, but a few were found at distances of approximately 25 and 50 nucleotides. Second, seven insertions had microidentities of 1-3 nucleotides between the R2 element and the upstream 28S sequence (sequences highlighted in blue). These microidentities are suggested to arise when the R2 DNA polymerase (also known as reverse transcriptase) anneals the upstream target DNA of the 28S gene to the newly made cDNA strand to prime second strand DNA synthesis. 
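Microidentities, and the non-templated nucleotides described in the next part of the text, can both be read off from how far a junction sequence matches the 28S reference from its left end and the R2 reference from its right end. If the two matches overlap, the overlap is a microidentity; if a gap remains between them, the gap holds non-templated nucleotides. The sketch below is a simplified illustration, not the authors' analysis. It assumes the read is oriented 28S-first, that its first base aligns with the first base of `upstream_28s` and its last base with the last base of `r2_to_read_end`, and it ignores sequencing errors. The function names and the toy sequences are hypothetical and are not the D. simulans 28S or R2 sequences.

```python
def common_prefix_len(a, b):
    # Number of leading positions at which a and b agree.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def classify_junction(read, upstream_28s, r2_to_read_end):
    # Bases at the left end of the read that the 28S reference explains.
    left = common_prefix_len(read, upstream_28s)
    # Bases at the right end of the read that the R2 reference explains.
    right = common_prefix_len(read[::-1], r2_to_read_end[::-1])
    overlap = left + right - len(read)
    if overlap > 0:
        return ('microidentity', overlap)   # bases explained by both references
    if overlap < 0:
        return ('non-templated', -overlap)  # bases explained by neither
    return ('clean', 0)


toy_28s = 'AAACCCGCCC'
toy_r2 = 'TTTTGTACGA'
print(classify_junction('AAACCCGTACGA', toy_28s, toy_r2))   # ('microidentity', 1)
print(classify_junction('AAACCCATTACGA', toy_28s, toy_r2))  # ('non-templated', 2)
```

In the first toy read the G at the junction fits both references, so it counts as a one-nucleotide microidentity; in the second, the inserted AT matches neither.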
Microidentities at the 5' junction of truncated copies are a common property of L1 and other non-LTR retrotransposons [20]. Third, in cases with no microidentity, 1-9 nucleotides that did not correspond to either the upstream 28S gene or the R2 element were present at the junction (sequences highlighted in orange). These bases are postulated to represent non-templated synthesis by the R2 reverse transcriptase at the second-strand cleavage site, continuing until a microidentity between these added nucleotides and the cDNA strand enables the polymerase to prime second strand DNA synthesis. In summary, the physical properties of the 5' junctions of the new PCR bands detected in somatic tissues suggest they represent authentic retrotransposition events.: Diagram of the 5' junction sequences of the somatic R2 insertions with the 28S gene. Putative somatic insertions such as those shown in Figure 1C, D were excised from a gel, re-amplified with the same PCR primers, purified on a second gel and subjected to double-stranded DNA sequencing. The 28S gene sequence is shown at the top of the figure. Short regions upstream and downstream of the R2 insertion site (arrow) are not shown because no R2 junctions occurred in these areas. For each junction, the sequences corresponding to R2 are highlighted with tan shaded boxes. Nucleotides that could correspond to either the 28S sequence or R2 (described as microidentities in the text) are indicated with a blue box. Sequences that correspond to neither the 28S gene nor R2 (described as non-templated nucleotides in the text) are indicated with an orange box. The number at each junction corresponds to the first nucleotide of the R2 element, based on the consensus Drosophila simulans R2 sequence [18].: Because the development of Drosophila has been intensively investigated, the timing of the retrotransposition events that generated the observed somatic mosaics can be estimated. 
By mid-embryogenesis (10-12 h), small clusters of cells (10-40 cells) are specified to become individual imaginal discs [21, 22]. Each imaginal disc primordium divides during the 3 larval instars to form 10,000 to 60,000 cells by late larval development [23]. Because the observed somatic events were present in a significant fraction of the cells in a third instar larval disc or an adult tissue, the retrotransposition events probably occurred before or early in imaginal disc development. Those retrotransposition events detected in more than one disc or adult appendage probably occurred even earlier in development, before determination of body segments at the blastoderm stage (2-3 h). Of the 15 events we observed, 5 were detected in cells derived from more than 1 body segment. Because we surveyed only a fraction of all body segments in either the larva or the adult, it is likely that a larger fraction of the somatic R2 insertion events we observed occurred before the blastoderm stage. This developmental period corresponds to rapid nuclear division in a common cytoplasm. During this period there is little RNA synthesis but active protein synthesis using the RNA synthesized by the nurse cells and deposited in the oocyte during oogenesis [24]. Because rRNA synthesis also does not occur in these first hours of development [25], R2 retrotransposition events occurring during this time probably use RNA templates synthesized by the nurse cells during oogenesis.: It should be noted that the observed somatic retrotransposition events likely occurred at a time when embryonic nuclei had not yet entered the pole plasm of the egg to become the germline. Thus, in addition to somatic mosaicism, there is also likely to be germline mosaicism of R2 elements in Drosophila. As a result, re-evaluation of a previous study of retrotransposition in the germline of males and females appears warranted [15]. 
We have previously suggested that the rate of R2 insertion inherited through the male germline was one-third to one-quarter the rate of insertions through the female germline. Based on the findings in this report, it is possible that all of the insertions scored as inherited through the male germline (that is, during spermatogenesis) actually occurred during early embryogenesis.: Because preblastoderm development is similar in male and female embryos, we suggest that the higher rate of R2 insertions observed through the female germline reflects this germline mosaicism as well as authentic germline events during oogenesis. Two separate periods of R2 activity in females are also consistent with experiments that monitored a large fraction of the offspring from individual females. In the most comprehensive study, new insertions were assayed in 213 progeny of a single female [15]. Of the 32 different R2 insertions detected in these progeny, 27 were found in only 1 individual and 4 were detected in 2 individuals. These insertions appeared to have occurred late in the development of the germline (that is, during oogenesis). The final R2 insertion was detected in 13 progeny and could correspond to an insertion during early development. Additional evidence for germline mosaicism was found in the analysis of progeny from another female in which 6 of 17 individuals contained the same new R2 insertion.: In conclusion, we suggest that R2 elements are active early in Drosophila development and, as is the case with L1 elements in mice and humans [4, 5], can lead to both somatic and germline mosaicism. Determining whether R2 elements are also active in somatic tissues later in development will require assaying many smaller samples from individual tissues or more sensitive approaches to detect insertions in smaller percentages of cells. 
Finally, R2 should serve as a reminder in the study of other mobile elements that events early in development can give rise to insertion mosaics that could be misinterpreted as germline events in the subsequent generation.: References: Laski FA, Rio DC, Rubin GM: Tissue specificity of Drosophila P element transposition is regulated at the level of mRNA splicing. Cell. 1986, 44: 7-19. 10.1016/0092-8674(86)90480-0.: Chaboissier MC, Busseau I, Prosser J, Finnegan DJ, Bucheton A: Identification of a potential RNA intermediate for transposition of the LINE-like element I factor in Drosophila melanogaster. EMBO J. 1990, 9: 3557-3563.: Emmons SW, Yesner L: High-frequency excision of transposable element Tc1 in the nematode Caenorhabditis elegans is limited to somatic cells. Cell. 1984, 36: 599-605. 10.1016/0092-8674(84)90339-8.: van den Hurk JA, Meij IC, Seleme MC, Kano H, Nikopoulos K, Hoefsloot LH, Sistermans EA, de Wijs IJ, Mukhopadhyay A, Plomp AS, de Jong PTVM, Kazazian HH, Cremers FPM: L1 retrotransposition can occur early in human embryonic development. Hum Mol Genet. 2007, 16: 1587-1592. 10.1093/hmg/ddm108.: Kano H, Godoy I, Courtney C, Vetter MR, Gerton GL, Ostertag EM, Kazazian HH: L1 retrotransposition occurs mainly in embryogenesis and creates somatic mosaicism. Genes Dev. 2009, 23: 1303-1312. 10.1101/gad.1803909.: Muotri AR, Chu VT, Marchetto MCN, Deng W, Moran JV, Gage FH: Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005, 435: 903-910. 10.1038/nature03663.: Eickbush TH: R2 and related site-specific non-LTR retrotransposons. Mobile DNA II. Edited by: Craig N, Craigie R, Gellert M, Lambowitz A. 2002, Washington DC: American Society of Microbiology Press, 813-835.: Kojima KK, Fujiwara H: Long-term inheritance of the 28S rDNA-specific retrotransposon R2. Mol Biol Evol. 2005, 22: 2157-2165. 
10.1093/molbev/msi210.: Conconi A, Widmer AR, Koller T, Sogo JM: Two different chromatin structures coexist in ribosomal RNA genes throughout the cell cycle. Cell. 1989, 57: 753-761. 10.1016/0092-8674(89)90790-3.: Ye J, Eickbush TH: Chromatin structure and transcription of R1- and R2-inserted rRNA genes of Drosophila melanogaster. Mol Cell Biol. 2006, 26: 8781-8790. 10.1128/MCB.01409-06.: Zhang X, Eickbush TH: Characterization of active R2 retrotransposition in the rDNA locus of Drosophila simulans. Genetics. 2005, 170: 195-205. 10.1534/genetics.104.038703.: Eickbush DG, Ye J, Zhang X, Burke WD, Eickbush TH: Epigenetic regulation of retrotransposons within the nucleolus of Drosophila. Mol Cell Biol. 2008, 28: 6452-6461. 10.1128/MCB.01015-08.: Zhou J, Eickbush TH: The pattern of R2 retrotransposon activity in natural populations of Drosophila simulans reflects the dynamic nature of the rDNA locus. PLoS Genetics. 2009, 5: e1000386. 10.1371/journal.pgen.1000386.: Eickbush DG, Eickbush TH: R2 retrotransposons encode a self-cleaving ribozyme for processing from an rRNA co-transcript. Mol Cell Biol. 2010, 30: 3142-3150. 10.1128/MCB.00300-10.: Zhang X, Zhou J, Eickbush TH: Rapid R2 retrotransposition leads to the loss of previously inserted copies via large deletions of the rDNA locus. Mol Biol Evol. 2008, 25: 229-237.: Endow SA, Glover DM: Differential replication of ribosomal gene repeats in polytene nuclei of Drosophila. Cell. 1979, 17: 597-605. 10.1016/0092-8674(79)90267-8.: Perez-Gonzalez CE, Eickbush TH: Dynamics of R1 and R2 elements in the rDNA locus of Drosophila. Genetics. 2001, 158: 1557-1567.: Lohe AR, Roberts PA: An unusual Y chromosome of Drosophila simulans carrying amplified rDNA spacer without rRNA genes. Genetics. 1990, 125: 399-406.: Stage DE, Eickbush TH: Origin of nascent lineages and the mechanisms used to prime second-strand DNA synthesis in the R1 and R2 retrotransposons of Drosophila. Genome Biol. 
2009, 10: R49. 10.1186/gb-2009-10-5-r49.: Ostertag EM, Kazazian HH: Biology of mammalian L1 retrotransposons. Annu Rev Genet. 2001, 35: 501-538. 10.1146/annurev.genet.35.102401.091032.: Technau GM: A single cell approach to problems of cell lineage and commitment during embryogenesis of Drosophila melanogaster. Development. 1987, 100: 1-12.: Cohen SM: Imaginal disc development. The Development of Drosophila melanogaster. Edited by: Bate M, Arias AM. 1993, Cold Spring Harbor, NY, USA: Cold Spring Harbor Laboratory Press, 2:: Klebes A, Biehs B, Cifuentes F, Kornberg TB: Expression profiling of Drosophila imaginal discs. Genome Biol. 2002, 3: RESEARCH0038.: Nasiadka A, Dietrich BH, Krause HM: Anterior-posterior patterning in the Drosophila embryo. Adv Develop Biol Biochem. 2002, 12: 155-186.: McKnight SL, Miller OL: Ultrastructural patterns of RNA synthesis during early embryogenesis of Drosophila melanogaster. Cell. 1976, 8: 305-319. 10.1016/0092-8674(76)90014-3.: Acknowledgements: The research was supported by funds from the National Institutes of Health grant GM42790. The authors would like to thank B Burke for help with the DNA sequencing, D Eickbush for comments on the manuscript, and M Welte for discussions of early Drosophila development.: Author information: Corresponding author: Correspondence to Thomas H Eickbush.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: MTE helped design the experiments, conducted all the experiments, and helped perfect the manuscript. THE helped design the experiments and wrote the first draft of the manuscript. 
Both authors read and approved the final manuscript.: Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: About this article: Cite this article: Eickbush, M.T., Eickbush, T.H. Retrotransposition of R2 elements in somatic nuclei during the early development of Drosophila. Mobile DNA 2, 11 (2011). https://doi.org/10.1186/1759-8753-2-11: Received: 08 July 2011: Accepted: 29 September 2011: Published: 29 September 2011: DOI: https://doi.org/10.1186/1759-8753-2-11: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Alu pair exclusions in the human genome" "George W Cook, Miriam K Konkel, James D Major III, Jerilyn A Walker, Kyudong Han, Mark A Batzer" "Mark A Batzer" "23 September 2011" "The human genome contains approximately one million Alu elements which comprise more than 10% of human DNA by mass. Alu elements possess direction, and are distributed almost equally in positive and negative strand orientations throughout the genome. Previously, it has been shown that closely spaced Alu pairs in opposing orientation (inverted pairs) are found less frequently than Alu pairs having the same orientation (direct pairs). However, this imbalance has only been investigated for Alu pairs separated by 650 or fewer base pairs (bp) in a study conducted prior to the completion of the draft human genome sequence., We performed a comprehensive analysis of all (> 800,000) full-length Alu elements in the human genome. This large sample size permits detection of small differences in the ratio between inverted and direct Alu pairs (I:D). We have discovered a significant depression in the full-length Alu pair I:D ratio that extends to repeat pairs separated by ≤ 350,000 bp. Within this imbalance bubble (those Alu pairs separated by ≤ 350,000 bp), direct pairs outnumber inverted pairs. Using PCR, we experimentally verified several examples of inverted Alu pair exclusions that were caused by deletions., Over 50 million full-length Alu pairs reside within the I:D imbalance bubble. Their collective impact may represent one source of Alu element-related human genomic instability that has not been previously characterized." 
"Space Length, Chimpanzee Genome, Direct Pair, Inverted Pair, Target Prime Reverse Transcription" " Alu pair exclusions in the human genome: George W Cook, Miriam K Konkel, James D Major III, Jerilyn A Walker, Kyudong Han & Mark A Batzer : Mobile DNA volume 2, Article number: 10 (2011): Abstract: Background: The human genome contains approximately one million Alu elements which comprise more than 10% of human DNA by mass. Alu elements possess direction, and are distributed almost equally in positive and negative strand orientations throughout the genome. Previously, it has been shown that closely spaced Alu pairs in opposing orientation (inverted pairs) are found less frequently than Alu pairs having the same orientation (direct pairs). However, this imbalance has only been investigated for Alu pairs separated by 650 or fewer base pairs (bp) in a study conducted prior to the completion of the draft human genome sequence.: Results: We performed a comprehensive analysis of all (> 800,000) full-length Alu elements in the human genome. This large sample size permits detection of small differences in the ratio between inverted and direct Alu pairs (I:D). We have discovered a significant depression in the full-length Alu pair I:D ratio that extends to repeat pairs separated by ≤ 350,000 bp. Within this imbalance bubble (those Alu pairs separated by ≤ 350,000 bp), direct pairs outnumber inverted pairs. Using PCR, we experimentally verified several examples of inverted Alu pair exclusions that were caused by deletions.: Conclusions: Over 50 million full-length Alu pairs reside within the I:D imbalance bubble. Their collective impact may represent one source of Alu element-related human genomic instability that has not been previously characterized.: Background: Retrotransposons are mobile DNA elements that populate genomes via their respective RNA transcripts. 
The retrotransposon with the highest copy number in the human genome is the Alu element [1]. Alu elements lack the necessary repertoire of enzymes to effect their independent insertion and are thus classified as non-autonomous mobile elements. For recent reviews, see [2–4].: Following transcription, Alu RNA is thought to require the assistance of the LINE1 open reading frame 2 protein (ORF2p) both for nicking the genome at the insertion site and for reverse transcription of the Alu RNA transcript [5, 6]. The endonuclease and reverse transcriptase functions of ORF2p are referred to as L1EN and L1RT, respectively. While L1EN has been shown to have some tolerance for target site variation, it most frequently cleaves at the T/A transition within the sequence 5'-TTTTAA-3' [7–10]. Following cleavage, the poly-T sequence of the target site becomes accessible to the complementary poly(A) tail of Alu RNA. Hybridization of these two sequences results in a short RNA-DNA hybrid that both orients the RNA transcript and primes reverse transcription of the Alu RNA by L1RT. Identical sequences flanking the insertion are characteristic of most Alu elements [11]. These flanking sequences are referred to as target site duplications (TSDs) [2, 12]. The presence of TSDs suggests that a nick occurs on the complementary strand of DNA 3' to the L1EN cleavage site on the first strand. However, little is known of the mechanisms associated with this second nick or the eventual insertion of the 5' end of the Alu element [13, 14]. This process of Alu element mobilization and insertion is commonly referred to as target primed reverse transcription (TPRT) [15, 16]. Within the human lineage, TPRT is also used by two additional non-long terminal repeat (non-LTR) retrotransposons, LINE1 and SVA (SINE-R, variable number of tandem repeats and Alu) elements [8]. Apart from rare exceptions [17, 18], the majority of non-LTR retrotransposon insertions are dependent upon the activity of L1EN. 
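To make the L1EN target-site description concrete, here is a minimal sketch that scans a DNA string for exact matches of the consensus 5'-TTTT/AA-3' site on either strand and reports where the nick would fall. It assumes exact matching only (the text notes that L1EN tolerates variant sites), and the function name `l1en_sites` is hypothetical, not part of any published tool.

```python
def l1en_sites(seq: str) -> list[tuple[str, int]]:
    """Find exact 5'-TTTT/AA-3' consensus matches on either strand.

    Returns (strand, boundary) tuples. `boundary` is a 0-based index on the
    top strand: the nick falls between seq[boundary - 1] and seq[boundary].
    Only the exact consensus is matched; real L1EN sites vary.
    """
    seq = seq.upper()
    sites = []
    for p in range(len(seq) - 5):
        window = seq[p:p + 6]
        if window == "TTTTAA":
            # Top strand reads TTTT/AA: the nick follows the fourth T.
            sites.append(("+", p + 4))
        if window == "TTAAAA":
            # Reverse complement of TTTTAA, i.e. the motif read on the
            # bottom strand; its T/A step lies between top positions p+1 and p+2.
            sites.append(("-", p + 2))
    return sites


print(l1en_sites("GGTTTTAAGG"))  # [('+', 6)]
print(l1en_sites("TTTTAAAA"))    # [('+', 4), ('-', 4)]
```

The second example shows why adenine-rich stretches matter: a short run such as TTTTAAAA offers a consensus site on both strands at once.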
As with Alu elements, LINE1 and SVA element insertions are typically characterized by TSDs that flank each element.: Alu elements also possess several features that provide directionality. Including the poly(A) tail, full-length Alu elements are approximately 300 bp in length (Additional File 1, Figure S1) and are dimeric structures with two adenine-rich regions flanking the 3' monomer [2, 19]. The middle adenine-rich region separates the two monomers and the 3' adenine-rich region forms the variable-length poly(A) tail. Additionally, the 5' monomer possesses the A and B boxes required for transcription by RNA polymerase III, and the 3' monomer contains a 31-bp insert not present in the 5' monomer [20, 21].: Inverted pairs of full-length Alu elements form near-palindromic sequences that are separated by spacers of other DNA sequences of varying size and composition. Palindromic sequences have been shown to be unstable in Escherichia coli [22], yeast [23] and mice [24]. The genomic instability of inverted Alu pairs has also been demonstrated in a yeast experimental system [25]. Other previous research has reported that inverted Alu pairs are potential sources of chromosomal instability when separated by ≤ 650 bp in humans [26]. The ability of Alu sequences to interact is directly correlated with the degree of sequence identity between the copies [25]. It is estimated that the majority of full-length human Alu elements share sequence identity ranging between 65 and 85 percent [26].: Alu element insertions have been linked to several genetic diseases including hemophilia, hypercholesterolemia and various cancers [4, 27]. While multiple diseases have been attributed to Alu element insertions, their most important role may be in shaping human genome architecture through various post-insertion interactions. Such interactions could result in deletions, duplications, inversions and a host of other complex genomic structural changes [9, 28]. 
Alu element interactions with each other have been found to generate recombination-mediated deletions and inversions [10, 29, 30]. In addition, Alu elements have been associated with multiple deletions related to various cancers [8, 31] and with copy number variation breakpoints [9, 32–35].: It has also been shown in humans that closely spaced adjacent Alu pairs in opposing orientation (inverted pairs) are found less frequently than Alu pairs having the same orientation (direct pairs) [26]. However, this imbalance has previously only been investigated for Alu pairs separated by ≤ 650 bp in a study conducted prior to the completion of the draft human genome sequence. Here, we have performed a comprehensive analysis of all (> 800,000) full-length Alu elements (275 to 325 bp) in the public human genome assembly (hg18). Using the large data set of full-length Alu elements enabled us to detect small imbalances in the ratio between inverted and direct Alu pairs (I:D). We report a potential new insight into human genomic instability: a non-random depression in the I:D ratio for full-length Alu pairs whose elements are separated by up to 350,000 bp (P < 0.05). Over 50 million (59,357,435) full-length Alu pairs reside within this I:D imbalance window. This phenomenon of full-length Alu pair I:D imbalance is hypothesized to reflect the activity of four separate mechanisms which result in Alu pair exclusions (APEs).: Results: The size distribution of the human genomic Alu element population is shown in Additional File 1, Figure S1. Full-length Alu elements, having lengths between 275 and 325 bp, account for approximately 69 percent of all human Alu elements. Slightly over two percent of human Alu elements are longer than 325 bp, and 29 percent are truncated (< 275 bp). Sequences of less than 30 bp cannot be reliably identified as Alu elements and are therefore excluded from this study (P < 0.05). 
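The length classes used above can be written as a small binning function. This is a sketch under one stated assumption: the 275 to 325 bp full-length window is treated as inclusive at both ends, which the text implies but does not spell out. The input lengths in the usage line are toy values, not real annotations.

```python
from collections import Counter


def alu_length_class(length_bp: int) -> str:
    """Bin an annotated Alu by length using the thresholds stated in the text.

    Assumes the 275-325 bp full-length window is inclusive at both ends.
    """
    if length_bp < 30:
        return "excluded"  # too short to be called reliably as Alu
    if length_bp < 275:
        return "truncated"
    if length_bp <= 325:
        return "full-length"
    return "longer than 325 bp"


lengths = [12, 180, 275, 301, 325, 340]  # toy values, not real annotations
print(Counter(alu_length_class(n) for n in lengths))
```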
Alu element length constraints provide a full-length Alu element sample size of 806,880 (Methods).: The directionality of Alu elements creates four possible types of Alu pairs (Additional File 1, Figure S2). In two of these four configurations both elements share the same (or direct) orientation, and in the other two the elements have opposite (or inverted) orientations. A pair of Alu elements in which both members are positioned on the positive strand is in the 'forward' orientation. Conversely, when both members of the pair are positioned on the negative strand, the pair is defined as being in the 'reverse' orientation. Throughout this manuscript, the sequence separating the two members of each pair is referred to as the spacer. When an inverted Alu pair is oriented with the poly(A) tails pointing toward each other, the pair is termed 'tail-to-tail', and when an inverted pair is oriented with the poly(A) tails pointing away from each other, it is termed 'head-to-head'.: I:D ratio for adjacent full-length Alu pairs departs from unity: Departures from unity in the full-length Alu pair (FAP) I:D ratio may be suggestive of non-random insertion or deletion of Alu elements within the human genome. Testing for randomness was performed using binomial distributions assuming an equal probability for Alu insertions to occur on the positive and negative strands (Methods). Adjacent FAPs contain no Alu elements within the spacer. The human adjacent FAP population of 560,485 contains 252,748 inverted pairs and 307,737 direct pairs. The I:D ratio for this population is 0.8213. Any I:D ratio outside of 0.9947 to 1.0053 reflects a non-random distribution (P < 0.05). The adjacent FAP I:D ratio of 0.8213 corresponds to P < 0.000001 and therefore falls well outside the 95 percent confidence interval for randomness.: Furthermore, the adjacent FAP I:D ratio departure from unity appears to be a function of the FAP spacer size. 
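The pair classification and the randomness test above can be sketched in a few lines. This assumes that, under randomness, each pair is inverted with probability 0.5. It uses the normal approximation to the binomial, so its bounds may differ from the published 0.9947 to 1.0053 range in the fourth decimal place. The function names are illustrative, not the authors' code.

```python
import math


def pair_type(first: str, second: str) -> str:
    """Classify an Alu pair from the strands ('+' or '-') of its 5'-most
    (first) and 3'-most (second) members along the positive strand."""
    if first == second:
        return "forward" if first == "+" else "reverse"
    # The poly(A) tail sits at each element's 3' end: it points right for a
    # '+' element and left for a '-' element.
    return "tail-to-tail" if first == "+" else "head-to-head"


def random_id_range(n_pairs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% range of I:D if each pair is inverted with p = 0.5.

    Normal approximation to the binomial; may differ from the paper's
    bounds in the fourth decimal place.
    """
    half_width = z * math.sqrt(0.25 / n_pairs)  # SD of the inverted fraction
    return ((0.5 - half_width) / (0.5 + half_width),
            (0.5 + half_width) / (0.5 - half_width))


def two_sided_p(inverted: int, n_pairs: int) -> float:
    """Two-sided normal-approximation P-value for H0: inverted fraction = 0.5."""
    z = (inverted / n_pairs - 0.5) / math.sqrt(0.25 / n_pairs)
    return math.erfc(abs(z) / math.sqrt(2))


inverted, direct = 252_748, 307_737          # adjacent FAP counts from the text
n_pairs = inverted + direct                   # 560,485
print(pair_type("-", "+"))                    # head-to-head
print(round(inverted / direct, 4))            # 0.8213
print(random_id_range(n_pairs))               # close to (0.9947, 1.0053)
print(two_sided_p(inverted, n_pairs) < 1e-6)  # True
```

Because the range depends only on the number of pairs, halving the population widens it. Calling `random_id_range(280_242)` gives roughly the 0.9925 to 1.0075 window quoted for the half-size populations.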
The median spacer size for adjacent FAPs is 930 bp (mean spacer length = 921 bp). Adjacent FAPs with spacers shorter and longer than this median have I:D ratios of 0.7105 and 0.9477, respectively. The expected I:D range for a random distribution of these half-size FAP populations is 0.9925 to 1.0075 (P < 0.05). A more thorough analysis of the variation of FAP I:D ratio versus spacer size requires adjustment of the data set and is provided later in this section (see CLIQUEs, catenated L1EN induced queues of uninterrupted Alu, LINE1 and SVA elements).: The adjacent FAP I:D imbalance calculation reported above provides a macroscopic view of the entire human genome. Human chromosome one was chosen to determine whether a similar I:D bias (non-random distribution of Alu elements with respect to orientation) was evident across a smaller region of the genome. A comparison of the actual distribution with a simulated random distribution of Alu elements on chromosome one indicated that orientational clustering of Alu elements occurs over 40 percent more frequently than would be expected if Alu insertions were orientationally random (Additional File 1, Figure S3).: Three patterns of I:D ratio: Figure 1A illustrates the I:D ratio for adjacent human FAPs separated by ≤ 500 bp. This range includes over one-third of the human adjacent FAP population and is the first breakdown of this I:D parameter by individual spacer length. Three distinct patterns of FAP density and I:D ratio are evident from Figure 1.: Frequency of closely spaced FAPs. (A) Human adjacent FAP frequency versus the spacer size (bp) separating the two members of the FAP. 
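The chromosome one comparison between observed and randomized orientations could be approached as a shuffle test. The clustering metric used in Additional File 1, Figure S3 is not described here. This sketch therefore assumes one plausible definition: the number of consecutive Alu elements sharing a strand, compared with the average over random reorderings that keep the same numbers of '+' and '-' elements. Treat the function names and the metric as hypothetical.

```python
import random


def direct_neighbours(strands: list[str]) -> int:
    """Number of consecutive elements that share an orientation."""
    return sum(a == b for a, b in zip(strands, strands[1:]))


def clustering_excess(strands: list[str], n_shuffles: int = 200, seed: int = 0) -> float:
    """Fractional excess of same-orientation neighbours over shuffled order.

    Shuffling keeps the counts of '+' and '-' elements fixed but randomizes
    their order, standing in for orientationally random insertion. A value
    of 0.4 would mean 40% more clustering than the shuffled average.
    """
    rng = random.Random(seed)
    observed = direct_neighbours(strands)
    shuffled = list(strands)
    total = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += direct_neighbours(shuffled)
    expected = total / n_shuffles
    return observed / expected - 1.0 if expected else 0.0


print(clustering_excess(["+"] * 50 + ["-"] * 50))  # roughly 1.0: strongly clustered
print(clustering_excess(["+", "-"] * 50))          # -1.0: perfectly alternating
```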
The number of inverted pairs (blue and green lines) is much lower than the number of direct pairs (red and black lines) when the spacer size is ≤ 24 bp (I:D = 0.076); (B) Spacer lengths within 24 to 36 bp define the only region within the human genome where head-to-head (inverted) FAPs outnumber either type of direct oriented FAPs. Bp: base pair; FAP: full-length Alu pair.: The first pattern is the combined high FAP density and low I:D ratio (0.073) for spacer lengths of ≤ 24 bp. An unexpected inflection point in the frequency of direct FAPs occurs after a spacer size of 6 bp (Figure 1A). This pattern may indicate an orientational insertion preference for Alu elements within the TSD of an existing Alu element. The second FAP I:D ratio pattern evident in Figure 1A (magnified in Figure 1B) is the 13 bp span of elevated FAPs in the head-to-head orientation within the spacer size range of 24 to 36 bp. This span contains 1.6 percent of adjacent human FAPs and is the only spacer size range within the human genome where the FAP I:D ratio exceeds unity (I:D = 1.053). Previous research identified an elevated presence of Alu pairs (> 275 bp) in this orientation for the spacer size range of 21 to 40 bp [26]. As can be seen in Figure 1B, the most accentuated head-to-head frequencies occur between spacer lengths of 24 and 36 bp. For this span of spacer sizes, head-to-head (inverted) FAPs outnumber either forward or reverse (direct) FAPs. Although the most elevated head-to-head frequencies reside within the spacer size range of 24 to 36 bp, Figure 1B also reveals an attenuated excess of head-to-head FAPs over tail-to-tail inverted FAPs within the spacer size range of 37 to 50 bp.: The third FAP density and I:D ratio pattern is evident in Figure 1A. It is characterized by similar FAP frequencies among the four Alu pair types for spacer sizes between 51 and 500 bp. 
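The three patterns come from tallying adjacent FAPs by spacer length and pair type. The following is a minimal sketch of that tally for one spacer window. It assumes each pair is already reduced to a hypothetical `(spacer_bp, first_strand, second_strand)` record, and the records in the usage lines are toy data, not real counts.

```python
from collections import Counter


def pair_type(first: str, second: str) -> str:
    """forward/reverse for direct pairs, tail-to-tail/head-to-head for inverted."""
    if first == second:
        return "forward" if first == "+" else "reverse"
    return "tail-to-tail" if first == "+" else "head-to-head"


def window_counts(pairs: list[tuple[int, str, str]], lo_bp: int, hi_bp: int) -> Counter:
    """Tally FAP types whose spacer length lies in [lo_bp, hi_bp]."""
    return Counter(pair_type(s1, s2) for spacer, s1, s2 in pairs if lo_bp <= spacer <= hi_bp)


def id_ratio(counts: Counter) -> float:
    """I:D for one window; NaN if the window holds no direct pairs."""
    inverted = counts["head-to-head"] + counts["tail-to-tail"]
    direct = counts["forward"] + counts["reverse"]
    return inverted / direct if direct else float("nan")


# Toy (spacer_bp, first_strand, second_strand) records, not real data.
pairs = [(5, "+", "+"), (10, "-", "-"), (20, "+", "-"),
         (30, "-", "+"), (31, "-", "+"), (33, "+", "+")]
print(id_ratio(window_counts(pairs, 0, 24)))   # 0.5
print(id_ratio(window_counts(pairs, 24, 36)))  # 2.0
```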
This third pattern persists for adjacent FAPs with spacer sizes of > 500 bp (data not shown).: CLIQUEs, catenated L1EN induced queues of uninterrupted Alu, LINE1 and SVA elements: The common dependence of Alu, LINE1 and SVA insertions upon L1 enzymes raises the possibility that the clustering of closely spaced Alu elements (≤ 50 bp) observed in Figure 1A also involves various combinations of all three element types. A similar clustering pattern exists in the form of catenated Alu clusters (see Additional File 1, Catenated Alu Clusters and Figure S4). A total of 412,380 such Alu-LINE1-SVA clusters are present within the human genome. These clusters comprise 16.6 percent of all human DNA and contain 52.6 percent of the Alu, LINE1 and SVA sequence within the human genome. Retrotransposons residing within these L1EN-induced clusters can exist in both orientations but exhibit a clear bias for one orientation: the I:D ratio for adjacent FAPs within these clusters is 0.3847. These clusters are enriched with potential L1EN target sites, both because their shared TPRT insertion mechanism creates L1EN-induced TSDs flanking all three types of retrotransposon and because of the adenine-rich regions within Alu elements (see Discussion, APE mechanisms). This enrichment of potential L1EN target sites inherently increases the likelihood of further Alu, LINE1 and SVA insertions within these clusters. The common participation of Alu, LINE1 and SVA elements within catenated clusters is consistent with L1EN activity. These catenated L1EN induced queues of uninterrupted Alu, LINE1 and SVA elements are hereafter referred to as CLIQUEs.: The potential for TPRT-related insertion bias within TSDs makes CLIQUE identification an important consideration in evaluating deviations from unity in the FAP I:D ratio. 
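The exact rule for chaining elements into a CLIQUE is given in Additional File 1 rather than here. The sketch below therefore assumes a simple interval-chaining rule: Alu, LINE1 and SVA elements on one chromosome join the same cluster when the gap to the cluster's current end is at most `max_gap` bp. Both `max_gap` (default 0, meaning directly abutting or overlapping) and the `Element` record are hypothetical.

```python
from typing import NamedTuple


class Element(NamedTuple):
    start: int   # 1-based, inclusive (assumed coordinate convention)
    end: int
    family: str  # e.g. "Alu", "LINE1", "SVA", or another repeat family
    strand: str


def build_cliques(elements: list[Element], max_gap: int = 0) -> list[list[Element]]:
    """Chain Alu/LINE1/SVA elements on one chromosome into clusters.

    An element joins the current cluster if it starts no more than `max_gap`
    bp after the cluster's end. Only clusters of two or more elements are kept.
    """
    retro = sorted((e for e in elements if e.family in {"Alu", "LINE1", "SVA"}),
                   key=lambda e: e.start)
    clusters: list[list[Element]] = []
    current: list[Element] = []
    current_end = 0
    for e in retro:
        if current and e.start - current_end - 1 <= max_gap:
            current.append(e)
            current_end = max(current_end, e.end)
        else:
            if len(current) > 1:
                clusters.append(current)
            current, current_end = [e], e.end
    if len(current) > 1:
        clusters.append(current)
    return clusters


els = [Element(1, 300, "Alu", "+"), Element(301, 6300, "LINE1", "+"),
       Element(6301, 6600, "Alu", "-"), Element(10000, 10300, "Alu", "+"),
       Element(20000, 20300, "MIR", "+"), Element(20301, 20600, "Alu", "+")]
print([[e.family for e in c] for c in build_cliques(els)])  # [['Alu', 'LINE1', 'Alu']]
```

Non-Alu/LINE1/SVA repeats (the toy "MIR" entry) are ignored rather than treated as breaking a chain. That choice is also an assumption.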
The potential for L1EN orientational bias to propagate within CLIQUEs could conceivably cause FAPs separated by more than 10 kb to be orientationally related. As an example, CLIQUE number 397,134 (chrX:74,530,726-74,548,236) is 17,511 bp in length and contains two full-length Alu elements which form a FAP in the forward orientation with a spacer size of 11,870 bp. Because of this potential for orientational bias, FAPs whose members reside within the same CLIQUE were excluded when determining genome-wide FAP I:D ratios. Excluding FAPs generated within the same CLIQUE reduces the adjacent FAP sample size from 560,485 to 460,588 and increases the adjacent FAP I:D ratio from 0.821 to 0.955. The smaller sample size for CLIQUE-corrected adjacent FAPs slightly widens the range of I:D ratios consistent with randomness, from 0.9947-1.0053 to 0.9942-1.0058 (P < 0.05). However, the CLIQUE-adjusted adjacent I:D ratio (0.955) remains statistically different from random (P < 0.00001), although it varies with spacer size. The most closely spaced 10 percent of human adjacent FAPs (spacer size = 51-205 bp) have an I:D ratio of 0.898, while the most distantly spaced 10 percent (spacer size = approximately 7,400-50,000 bp) have an I:D ratio of 0.989. This relationship is illustrated in Additional File 1, Figure S6.: A calculated 52.6 percent of human LINE1, Alu and SVA sequences reside in CLIQUEs. The average CLIQUE is 1,169 bp in length and is occupied by 3.3 elements. The median CLIQUE length is 638 bp and 95 percent of all CLIQUEs have lengths less than 4,100 bp. The most CLIQUE-rich chromosome is chromosome 19 (0.252 CLIQUEs per kb) and the least rich is chromosome Y (0.061 CLIQUEs per kb). 
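The CLIQUE correction amounts to dropping adjacent FAPs whose two members share a CLIQUE before counting. Here is a minimal sketch, assuming each Alu already carries a precomputed CLIQUE identifier (None outside CLIQUEs) and that coordinates are 1-based and inclusive. The record layout and function name are hypothetical.

```python
def adjacent_fap_counts(alus: list[tuple[int, int, str, "int | None"]],
                        exclude_same_clique: bool = True,
                        min_len: int = 275, max_len: int = 325) -> tuple[int, int]:
    """Return (inverted, direct) counts of adjacent FAPs.

    `alus` lists every annotated Alu as (start, end, strand, clique_id),
    sorted by start. Consecutive entries have no Alu in their spacer, so any
    consecutive pair whose two members are full length is an adjacent FAP.
    """
    inverted = direct = 0
    for a, b in zip(alus, alus[1:]):
        if not all(min_len <= end - start + 1 <= max_len for start, end, _, _ in (a, b)):
            continue  # at least one member is not full length
        if exclude_same_clique and a[3] is not None and a[3] == b[3]:
            continue  # both members sit inside one CLIQUE
        if a[2] == b[2]:
            direct += 1
        else:
            inverted += 1
    return inverted, direct


toy_alus = [(1, 300, "+", 7), (301, 600, "+", 7),      # same CLIQUE
            (2000, 2300, "-", None), (5000, 5100, "-", None),  # second is truncated
            (8000, 8299, "+", None), (9000, 9299, "+", None)]
print(adjacent_fap_counts(toy_alus))                             # (1, 1)
print(adjacent_fap_counts(toy_alus, exclude_same_clique=False))  # (1, 2)
```

In this toy set the excluded pair is a direct one, so removing it raises the I:D ratio. That mirrors, in miniature, the 0.821 to 0.955 shift reported above.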
Over half of the longest 100 CLIQUEs are found on chromosome X, with the longest being over 55,000 bp at locus chrX:75,592,945-75,648,671 (Additional File 1, Figure S5).: Non-adjacent Alu pairs: One of the findings in this study is that the FAP I:D imbalance is not limited to adjacent FAPs. FAPs with intervening Alu elements within their spacers also show non-random I:D ratios. This non-random I:D imbalance (P < 0.05) was detected in FAPs whose spacers contain up to 106 intervening Alu elements and span over 350,000 bp. At the whole-genome level, the human FAP I:D imbalance window encompasses ± 107 of an Alu's neighboring Alu elements (Methods). No size constraint was placed upon intervening Alu elements. Therefore, while the entire inventory of human Alu elements is used in this study, only I:D ratios for FAPs are reported. The smallest CLIQUE-adjusted FAP sample size (460,588) occurs for adjacent FAPs. Sample sizes of 551,764 to 557,454 exist for all FAP families with more than three intervening Alu elements within the spacer (Additional File 1, Table S1). The inclusion of FAPs with intervening Alu elements requires terminology for defining different FAP types (Figure 2 and Methods).: Naming convention for FAPs. This example from chr1:154,126,854-154,134,237 (7,384 bp) illustrates the FAP naming convention. The central Alu is always the element being evaluated and the second member of the pair is designated by its sequential separation from the central Alu. The central Alu is designated with the number '0'. The absolute value of the sequential separation of a given Alu element from the central Alu is defined as its APSN. Additionally, Alu elements located 5' of the central Alu are assigned a negative value, and those located 3' of the central Alu a positive value. 
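The APSN convention maps naturally onto list indices. The sketch below makes two assumptions: all annotated Alu elements are in one list sorted by position, and element i paired with element i + k is the FAP (0, k). Intervening elements may have any length, but both pair members must be full length. The function name and tuple layout are hypothetical.

```python
def counts_by_apsn(alus: list[tuple[int, int, str]], max_apsn: int,
                   min_len: int = 275, max_len: int = 325) -> dict[int, list[int]]:
    """Tally [inverted, direct] FAPs for each APSN from 1 to max_apsn.

    `alus` holds every annotated Alu as (start, end, strand), sorted by start.
    Counting only positive k visits each unordered pair once, since the same
    pair seen from its 3' member has APSN -k.
    """
    full = [min_len <= end - start + 1 <= max_len for start, end, _ in alus]
    counts = {k: [0, 0] for k in range(1, max_apsn + 1)}
    for i, (_, _, strand_i) in enumerate(alus):
        if not full[i]:
            continue
        for k in range(1, max_apsn + 1):
            j = i + k
            if j >= len(alus):
                break
            if full[j]:
                same = strand_i == alus[j][2]
                counts[k][1 if same else 0] += 1  # index 1 = direct, 0 = inverted
    return counts


alus = [(1, 300, "+"), (1000, 1100, "-"), (2000, 2299, "-"), (3000, 3299, "+")]
print(counts_by_apsn(alus, 3))  # {1: [1, 0], 2: [1, 0], 3: [0, 1]}
```

In the toy list, the truncated second element is never a pair member but still counts toward APSN, which is why the first and third elements form a (0,2) FAP.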
APSN: Alu pair sequence number; FAP: full-length Alu pair.: I:D ratio versus Alu pair sequence number: Adjusting the adjacent (0,1) FAP population for CLIQUEs increases its median spacer size from 930 to 1,296 bp. The CLIQUE-adjusted I:D ratios for FAPs with spacers below and above this new median are 0.951 and 0.959, respectively. Both of these I:D ratios are outside of the 0.9918 to 1.0082 range which would be expected for a random distribution (P < 0.05). The small difference between these I:D ratios raises the possibility that FAPs with much larger spacers may also be subject to an FAP I:D imbalance. Unfortunately, this hypothesis is difficult to test using only adjacent FAPs, as 95 percent of this population has spacer sizes of less than 11,005 bp.: The inclusion of intervening Alu elements within FAP spacers permits identification of the boundaries of the FAP I:D imbalance phenomenon (Figure 2). The FAP I:D ratio as a function of Alu pair sequence number (APSN) is shown in Figure 3. Both unadjusted and CLIQUE-corrected I:D curves are provided in this figure. Figure 3A shows FAP I:D ratios across APSN values of ± 1,000 and reveals that the FAP I:D ratio depression appears to be limited to APSNs of ≤ 100. Further refinement of this I:D depression boundary was accomplished by grouping 10 consecutive APSNs together. This increased the FAP sample size from approximately 555,000 to over 5.5 million. The larger sample size improved the precision of detection of the I:D depression boundary to an APSN value of ± 107 (Methods).: FAP I:D ratio versus Alu pair sequence number (APSN), with (red) and without (blue) correction for CLIQUEs. (A) The I:D ratio of full-length Alu pairs for APSNs of ± 1,000 Alu elements. Note that a bubble of depressed I:D ratio exists for those elements within about ± 100 Alu elements of the central Alu element. (B) A closer view of the I:D imbalance bubble. The 95% confidence interval for each value is estimated at ± 0.6%. 
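Pooling 10 consecutive APSNs multiplies the sample size about tenfold and narrows the random range about sqrt(10)-fold. That is what sharpens the boundary estimate. The Methods used for the ± 107 boundary are not reproduced here, so this is only a sketch. It assumes per-APSN `(inverted, direct)` counts and the same p = 0.5 normal-approximation range used for the adjacent pairs, and the toy counts are invented.

```python
import math


def random_id_range(n_pairs: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% I:D range if each pair is inverted with p = 0.5."""
    half_width = z * math.sqrt(0.25 / n_pairs)
    return ((0.5 - half_width) / (0.5 + half_width),
            (0.5 + half_width) / (0.5 - half_width))


def pool_apsn(counts: dict[int, tuple[int, int]], width: int = 10):
    """Pool consecutive APSNs in blocks of `width` and test each block.

    `counts` maps APSN -> (inverted, direct). Returns tuples of
    (first_apsn, last_apsn, pooled I:D, outside_random_range).
    """
    apsns = sorted(counts)
    rows = []
    for i in range(0, len(apsns), width):
        block = apsns[i:i + width]
        inverted = sum(counts[k][0] for k in block)
        direct = sum(counts[k][1] for k in block)
        lo, hi = random_id_range(inverted + direct)
        ratio = inverted / direct
        rows.append((block[0], block[-1], ratio, not (lo <= ratio <= hi)))
    return rows


print(random_id_range(555_000))    # per-APSN sample size from the text
print(random_id_range(5_550_000))  # pooled sample size: a narrower range
toy = {k: (47_500, 52_500) for k in range(1, 21)}  # invented counts
for row in pool_apsn(toy):
    print(row)  # pooled I:D about 0.905, flagged as outside the random range
```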
Therefore, the bubble of I:D imbalance extends to approximately APSN = ± 85 around the central Alu. A more rigorous treatment of the data (see text) extends this I:D imbalance boundary to APSN = ± 107. (C) Over 99% of the impact of CLIQUEs on the FAP I:D ratio dissipates beyond APSN = 5. The largest CLIQUEs, while rare, contain up to 32 Alu elements. CLIQUEs have no impact on the FAP I:D ratio for APSN > 31. APSN: Alu pair sequence number; CLIQUE: catenated LINE1 endonuclease induced queue of uninterrupted Alu, LINE1 and SVA elements; FAP: full-length Alu pair; I:D Ratio: ratio between inverted and direct Alu pairs.: Over 50 million FAPs reside within the CLIQUE-adjusted FAP I:D imbalance window. Based on the CLIQUE-adjusted I:D values illustrated in Figure 3, human direct FAPs outnumber inverted FAPs by 629,027 (Additional File 1, Table S1). Random variation reduces this difference to 613,924 (P < 0.05). Figure 3C magnifies Figure 3A to APSN values of ± 15 and illustrates that the greatest departure between CLIQUE-adjusted and unadjusted FAP I:D ratios occurs for APSNs of less than five. The largest APSN for a FAP residing within a single human CLIQUE is 31, that is, the FAP (0,31). Consequently, no CLIQUE adjustments to the FAP I:D ratio are required for APSN values greater than 31.: PCR evidence of Alu pair exclusions in the chimpanzee genome: We have presented computational evidence for a significant FAP I:D ratio imbalance in the human genome. To investigate our hypothesis that this imbalance may be due to the increased instability of inverted Alu pairs, resulting in APEs, we compared the human genome (hg18) to the chimpanzee genome (panTro2) to identify potential APE deletions. A total of 58 APE deletion candidate loci were identified for evaluation by PCR (Methods) in the chimpanzee genome through comparison of the human, chimpanzee, orangutan and rhesus macaque genome draft sequences. Fourteen of these loci were selected for PCR examination. 
These validations confirmed that 10 of these 14 loci had undergone chimpanzee-specific deletions consistent with inverted FAP instability. PCR primer design was problematic for the remaining four loci. No instances of false-positive identification of chimpanzee-specific deletions were observed. The characteristics of the 10 loci confirmed as chimpanzee-specific deletions are summarized in Table 1. Gel images from the PCR interrogation of five of the loci are shown in Figure 4.: Chimpanzee-specific APE deletions. PCR analysis confirmed chimpanzee-specific APE deletions in orthologous human, chimpanzee, gorilla, orangutan and rhesus macaque loci. Human adjacent inverted FAP loci were chosen with spacer sizes between 651 and 1,500 bp and a minimum of 1,000 bp of Alu-free flanking sequence. Loci were selected for PCR when the chimpanzee ortholog was > 350 bp shorter than the human locus. Using identical primers, PCRs were then performed for human, chimpanzee, gorilla, orangutan and rhesus macaque. APE: Alu pair exclusion; bp: base pair; FAP: full-length Alu pair; PCR: polymerase chain reaction.: A secondary purpose of these PCR examinations was to assess the accuracy of the hg18 and panTro2 genome assemblies at loci involved in APE deletions. If the combined hg18/panTro2 genome assemblies were only 50% accurate in identifying inverted APE deletion loci, the probability of successfully validating five of these events in five consecutive PCR evaluations would be P = 0.5^5 = 0.03125. The fact that we were able to validate 10 such APE events in 10 consecutive PCR reactions with no evidence of false positives provides over 95% confidence that these two assemblies are at least 74 percent accurate (0.74^10 = 0.04924).
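The accuracy bound in the preceding paragraph treats each PCR-tested candidate as an independent trial that validates with probability equal to the assembly accuracy. As a minimal sketch under that independence assumption (the function names are ours, not part of the original analysis), the arithmetic in Python is:

```python
# Hypothetical helpers illustrating the validation arithmetic; they assume
# each candidate locus validates independently with probability `accuracy`.


def prob_all_validated(accuracy: float, n_loci: int) -> float:
    """Chance that all n candidate loci validate by PCR."""
    return accuracy ** n_loci


def min_supported_accuracy(n_validated: int, alpha: float = 0.05) -> float:
    """Lowest accuracy still compatible (at level alpha) with n consecutive
    validations and no failures, i.e. the solution of accuracy ** n = alpha."""
    return alpha ** (1 / n_validated)


print(prob_all_validated(0.5, 5))            # five of five at 50% accuracy
print(prob_all_validated(0.74, 10))          # ten of ten at 74% accuracy
print(round(min_supported_accuracy(10), 4))  # bound implied by ten of ten
```

Solving accuracy^10 = 0.05 gives about 0.741, which is the source of the "at least 74 percent" figure; 0.74^10 ≈ 0.0492 falls just below 0.05.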
When we compared the PCR-based estimate of chimpanzee-specific inverted APE deletions to the computationally derived estimate of human inverted APE deletions for this same data set, we found these results to be within 15 percent of each other (108 versus 94). The computation was based upon the human FAP I:D ratio (0.931) for loci satisfying the original PCR criteria (Methods). Thus, these data provide strong evidence for the existence of APE-induced genomic deletions. The characteristics of the 10 loci confirmed as chimpanzee-specific deletions are also summarized in Additional File 1, Table S4.: Chimpanzee-specific APE deletions within these (human) orthologous loci were estimated to have occurred during the six million years following the divergence between the human and chimpanzee lineages [32].: Comparison of orthologous human-chimpanzee direct and inverted FAP loci: An effort was made to better compare the characteristics of deletions within direct and inverted FAP loci. Loci selection criteria for this evaluation were identical to those used for PCR validation with two exceptions: direct FAP loci were included, and chimpanzee loci were limited to those that were 1,000 to 2,000 bp shorter than their human ortholog. The second constraint was applied to avoid lengthy deletions that could be more difficult to analyze and to provide a reasonable sample size for manual analysis. Surprisingly, these criteria generated an almost equal number of shorter direct (193) and inverted (187) chimpanzee orthologs. A subsequent examination of the shorter direct chimpanzee FAP loci revealed that inverted APE-related deletions can plausibly be attributed to 93 (48%) of these shorter orthologous loci. These deletions are consistent with an interaction between a member of the direct FAP and a flanking Alu element in the opposite orientation.
Furthermore, excluding chimpanzee orthologs that are shorter because of a human-specific retrotransposon insertion, fully 75 percent of the remaining shorter chimpanzee loci can plausibly be attributed to a flanking inverted APE-related deletion (see Methods and Additional File 1, Table S2). The attribution of shorter chimpanzee orthologs to possible inverted APE-related deletions is based upon the hypothesized APE deletion mechanism, involving the resolution of Alu-induced double-strand breaks, outlined in Additional File 1, Figure S7. This hypothesized APE deletion pattern applies to interactions between inverted FAPs with spacer sizes over 50 bp.: Discussion: Non-random differences between direct and inverted FAPs exist for spacer sizes from zero to at least 350,000 bp. These differences may reflect orientation biases for either Alu element insertions or deletions. The instability of Alu pairs with spacer sizes below 650 bp has been previously described [26]. Our research suggests that additional mechanisms may be operational.: APE mechanisms: Four separate mechanisms are proposed for generating APEs within the human genome (Figure 5). Although some overlap likely exists among the spacer size ranges in which these four mechanisms operate, the first three mechanisms appear to be confined to adjacent FAPs that are separated by ≤ 100 bp. The first of these small-spacer APEs is identified by the observation that inverted Alu pairs form near-palindromic sequences that are vulnerable to hairpin formation and can induce double-strand breaks. This mechanism is termed 'hairpin APE' (Figure 5) and is thought to be operational between spacer sizes of 0 and approximately 100 bp [25].: Estimated ranges for four potential APE mechanisms for FAPs. This semi-log chart illustrates the activity of the one previously identified [25] and three new APE mechanisms.
The APE Type 1 mechanism can also be termed 'hairpin APEs' and has been previously identified as related to Alu-Alu hairpin formation with subsequent deletion. The range of this mechanism has been demonstrated to extend up to 100 bp in a yeast model [25]. The APE Type 2 mechanism can be described as 'TSD APEs' and refers to a potential orientational preference for Alu element insertions within the TSD of existing Alu elements. This mechanism would preferentially form direct-oriented FAPs. Like the TSD APE (Type 2) mechanism, the Type 3 APE mechanism appears to reflect an insertional preference, in this case for the formation of head-to-head (inverted) FAPs. Type 3 APEs occur approximately within the range of 21 to 50 bp (Figure 1). The proposed mechanism for formation of Type 4 APEs is described in Figure 6 and Additional File 1, Figure S7 and is hypothesized to arise through a DNA conformation termed a 'doomsday junction'. APE: Alu pair exclusion; bp: base pair; FAP: full-length Alu pair.: The second mechanism is termed 'TSD APE' and appears to be active for spacer lengths of less than 23 bp (Figure 1B). This spacer length only slightly exceeds the 7 to 20 bp size range for TSDs [2]. The nexus of high FAP density coupled with low I:D ratio is unique to human FAPs with these spacer lengths. The instability of inverted Alu pairs with spacer lengths of ≤ 100 bp has been demonstrated in a yeast model [25]. This instability would be expected to reduce the FAP I:D ratio. However, the coincident phenomena of high FAP density and low FAP I:D ratio may also be associated with the TPRT insertion mechanism. Alu elements inherently provide an increased density of L1EN target sites. These target sites are generated by Alu TSDs and by the adenine-rich region within Alu elements [36] (see also Additional File 1, Alu-Alu Insertions).
The additional L1EN target sites, coupled with the Alu insertion bias associated with the RNA/DNA hybrid during the TPRT mechanism, are consistent with the two superimposed patterns observed in Figure 1A. The instability of inverted Alu pairs almost certainly contributes to the low I:D ratios associated with closely spaced human FAPs. However, attributing the entirety of the low I:D ratio observed for FAPs with spacer sizes of ≤ 20 bp to this instability may be an overestimate.: The third small-spacer APE mechanism is termed 'head-to-head APE' and involves the elevated frequency of head-to-head FAPs present between spacer sizes of 23 and 50 bp. This elevated frequency is more pronounced for spacer sizes between 25 and 35 bp and very pronounced for spacer sizes of 27 to 30 bp. Within the spacer range of 25 to 35 bp, head-to-head (inverted) FAPs outnumber either type of direct-oriented FAP (Additional File 1, Figure S2). For spacer sizes of 27 to 30 bp, head-to-head FAPs actually outnumber the sum of both direct-oriented FAP types. If direct-oriented FAPs are relatively stable entities, this region of elevated head-to-head frequency may evidence an insertion-related phenomenon. A more detailed discussion of this possibility is provided in Additional File 1 (Possible Epigenetics Associated with Head-to-Head FAPs with Spacer Sizes of 24-36 bp).: The fourth APE mechanism differs markedly from the first three small-spacer APE mechanisms in that it involves the loss of inverted FAPs separated by approximately 50 bp to at least 350,000 bp. The first (hairpin) APE mechanism overlaps this range up to a spacer size of 100 bp. Over 99 percent of all CLIQUE-corrected FAPs (not residing within the same CLIQUE) have spacer sizes greater than 100 bp. The higher energy state required for formation of single-stranded DNA makes hairpin loop formation a rare event between inverted Alu pairs separated by more than 100 bp [25, 37].
Three possible pathways for interactions of distantly separated inverted FAPs are illustrated in Figure 6 and Additional File 1, Figure S7. Each of these pathways results in the ectopic annealing of single-stranded DNA associated with inverted FAPs. This annealing, which is hypothesized to result in a 'double-bubble' type structure, could potentially overcome the thermodynamic hurdle associated with single-stranded large-spacer hairpins. This structure is termed a 'doomsday junction' or DDJ (illustrated in Figure 6, Steps 6A and 6B and Additional File 1, Figure S7, Step 5).: Possible mechanisms for formation of G and S phase DDJs. Steps 1 and 2 illustrate an inverted FAP. When the DNA in Step 1 is bent 180°, the two Alu elements within the inverted FAP are aligned. Steps 3A-6A and 3B-6B illustrate two possible mechanisms for interactions between inverted Alu elements without the formation of a hairpin loop. Steps 3A-6A, DNA Breathing (G phase) Mediated APE Deletion. (3A) DNA breathing bubbles are typically < 20 bp [45] and are characterized by flipping of the unpaired nucleotide bases away from the center line of the double helix [37]. A bubble in this conformation could be susceptible to interaction with a bubble of similar sequence. (4A) Simultaneous bubbles may arise in identical sections of aligned Alu elements. (5A) Simultaneous homologous bubble alignment could initiate bubble-bubble interaction with the potential for forming a 'double-bubble' conformation. (6A) The ectopic formation of the double-bubble conformation within two aligned breathing bubbles could potentially extend to the entire length of the two aligned Alu elements. The high GC content of Alu elements would likely increase the stability of the hypothesized doomsday junction. DDJs likely possess four sections of single-stranded DNA at each end, which could be susceptible to single-strand nuclease attack.
Steps 3B-6B, Replication Fork (S phase) Mediated APE Deletion. (3B-5B) Initiation and growth of a replication bubble. (5B) Coincident progression of the DNA replication bubble through an inverted FAP. (6B) Invasion and ectopic annealing of high-homology replication forks. APE: Alu pair exclusion; DDJ: doomsday junction; FAP: full-length Alu pair.: Nuclease attack of DNA hairpins has been found to occur at the base, rather than the loop, of DNA hairpins in yeast [23]. If DDJs exist, and if single-strand nucleases are active in primates, the eight single-stranded sections of DNA on the periphery of DDJs (Figure 6, Steps 6A and 6B and Additional File 1, Figure S7, Step 5) could form attractive nuclease targets. Such nicking could help resolve the DDJ. However, this nicking could potentially result in various combinations of flanking deletions on either side of the two Alu elements forming the DDJ. The resultant tell-tale deletion patterns that we would predict from this mechanism are outlined in Additional File 1, Figure S8. The varied repair products from nuclease attack on these single-stranded structures could result in partial or total removal of one or both Alu elements. These proposed patterns are consistent with those observed by PCR of possible chimpanzee-specific APE deletions shown in Figure 4 and Additional File 1, Figure S8D. The pattern is also consistent with deletion patterns in 199 of 380 orthologous human-chimpanzee FAP loci (52%) where a potential chimpanzee deletion had occurred (Additional File 1, Table S2). This proportion increases to 75 percent when the 114 loci with human-specific retrotransposon insertions are removed from the data set.: G-phase doomsday APEs: Figure 6 and Additional File 1, Figure S7 outline separate mechanisms by which DDJs could form during the G and S phases of the cell cycle. We propose that G-phase DDJs result from the ectopic invasion and annealing of high-homology bubbles associated with DNA breathing (Figure 6, Steps 1-6A).
Nucleosomes and other chromatin structures mitigate DNA breathing and thus may reduce the potential for G-phase DDJ formation. Therefore, in addition to their multifarious roles in signaling and protein binding, nucleosomes may also serve to minimize the interaction between high-homology DNA strands. The instability of closely spaced inverted Alu elements shown here and noted by previous researchers may be evidence that nucleosomes are either absent from hairpin-prone DNA sequences or provide insufficient interference to prevent hairpin formation [3, 25, 26]. The postulated G-phase DDJ phenomenon may likewise overcome nucleosome interference.: If simultaneous DNA breathing bubbles were to arise between aligned homologous sequences, the flipped-out conformation of complementary bases on both strands could provide additional potential for intra-strand interaction (Figure 6, Step 4A) [10]. This altered genomic structure formed by the hypothetical interaction between two homologous DNA bubbles would effectively create the double-bubble conformation associated with DDJs. The initial smaller double-bubble structure (Figure 6, Step 5A) could then expand to form a larger double-bubble extending along almost the entire length of the two aligned Alu elements (Figure 6, Step 6A). The high GC content (> 60%) of Alu elements composing the large bubble conformation would likely enhance the stability of the hypothesized DDJ.: S-phase doomsday APEs: S-phase DDJs are proposed to result from invasion and subsequent annealing of high-homology DNA replication forks (Figure 6, Steps 1-6B and Additional File 1, Figure S4). Coincident passage of replication forks through inverted FAPs could provide an environment susceptible to formation of an S-phase DDJ. Unlike the chromatin interference present in G phase, replicating S-phase DNA is forced to lift its chromatin kimono and becomes much more vulnerable to ectopic DNA interaction.
While single-strand binding proteins stabilize single-stranded portions of the replication fork, they are eventually displaced by a newly replicated strand of DNA. This second strand could conceivably be supplied from an invading second replication fork.: Notably, upon formation of an S-phase DDJ, the DNA replication apparatus would be completely assembled and could potentially proceed, albeit in an ectopic fashion, and conceivably generate segmental duplications. In addition, the double-bubble binding of near-homologous Alu elements within a DDJ could invite the activity of cellular mismatch repair mechanisms. Such mismatch repair activity could help explain elevated mutation rates that have previously been observed close to deletions [38].: Finally, the DDJ mechanisms outlined in Figure 6 and Additional File 1, Figure S7 do not preclude interactions between direct-oriented FAPs. However, the distinctive 'V' shape of replication forks may sterically hinder interactions between direct pairs and thus preferentially favor interactions between inverted pairs. Regardless of the mechanism(s) associated with the human FAP I:D ratio imbalance, this metric is not an absolute measure of change in the number of either direct or inverted FAPs, but of the relative change between the two types.: Conclusions: Direct and inverted FAPs are distributed non-randomly in the human genome. This non-random pattern exists for APSNs ≤ 107 and for spacer sizes extending to at least 350,000 bp. A total of 59,357,435 FAPs (CLIQUE corrected) reside within this window, and direct FAPs outnumber inverted FAPs by 629,027 (over two percent). Random variation only reduces this imbalance to 613,924 (P < 0.05). Outside of CLIQUEs, no known orientation insertion preferences exist for Alu elements. We believe that APE-related deletions may be responsible for a substantial proportion of the imbalance of over 600,000 between inverted and direct human FAPs.
Future investigations of the APE phenomenon should better illuminate the mechanisms involved and characterize its extent in primate genomes.: Methods: Data acquisition and management: Data used in this research were obtained from the RepeatMasker [39] output for the hg18 (2006) human genome assembly. These data were downloaded from the UCSC Table Browser http://genome.ucsc.edu/cgi-bin/hgTables [40] and imported into Excel 2010 (Microsoft Corporation; Redmond, Washington). Orthologous chimpanzee, orangutan and rhesus macaque loci were obtained using the panTro2, ponAbe2 and rheMac2 genome assemblies, respectively. Statistics were calculated using Minitab 15 (Minitab Inc.; State College, Pennsylvania).: Histogram of human Alu size distribution: The RepeatMasker scan of the hg18 human genome assembly identifies potential Alu fragments as small as 12 bp. Using a haploid genome size of 3.1 × 10^9 bp, a total of 185 instances of a given 12 bp sequence should randomly occur in human DNA. However, most Alu elements have sequence identities between 65 and 85 percent [26]. Using the lower sequence identity (65%) increases the number of random instances of a 12 bp target sequence occurring in the human genome from 185 to 32,485 (Additional File 1, Figure S1). The target sequence must increase in length to 26 bp before statistical significance (P < 0.05) occurs. This sequence size increases to 29 bp for 60% identity. For this study, only Alu sequences of ≥ 30 bp are used. For perspective, a 30 bp Alu fragment is roughly 10 percent of the length of a full-length Alu element. Finally, it should be noted that a 12 bp sequence (at 65% identity) becomes significant (P < 0.05) when a segment of DNA shorter than 4,770 bp is being evaluated.: Sequences of less than 30 bp in length cannot be reliably determined to be actual Alu elements and are therefore excluded from this truncated percentage.
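The random-match figures above can be reproduced under one simple model, which is our assumption rather than something the text states: an exact k-bp match occurs with probability 4^-k, and requiring only a fraction s of per-base identity inflates that by s^-k, so a genome of G bp is expected to contain G/(4s)^k chance hits. A minimal Python sketch with illustrative function names:

```python
# Assumption: chance hits of a k-bp target scale as G / (4 * identity) ** k.
# Function names are hypothetical and not part of the original analysis.
GENOME_BP = 3.1e9  # haploid human genome size used in the text


def expected_chance_hits(k: int, identity: float = 1.0, genome_bp: float = GENOME_BP) -> float:
    """Expected random occurrences of a k-bp target in genome_bp of sequence."""
    return genome_bp / (4 * identity) ** k


def min_significant_length(identity: float, alpha: float = 0.05, genome_bp: float = GENOME_BP) -> int:
    """Shortest target length whose expected chance hits fall below alpha."""
    if 4 * identity <= 1:
        raise ValueError("identity must exceed 0.25 for hits to decrease with length")
    k = 1
    while expected_chance_hits(k, identity, genome_bp) >= alpha:
        k += 1
    return k


def max_window_bp(k: int, identity: float = 1.0, alpha: float = 0.05) -> float:
    """Longest stretch of DNA in which fewer than alpha chance hits are expected."""
    return alpha * (4 * identity) ** k


print(round(expected_chance_hits(12)))        # exact 12-mer
print(round(expected_chance_hits(12, 0.65)))  # 12-mer at 65% identity
print(min_significant_length(0.60))           # threshold length at 60% identity
print(round(max_window_bp(12, 0.65)))         # window for a 12-mer at 65% identity
```

Under this model the exact 12-mer gives about 185 hits, the 65% identity 12-mer about 32,485, the 60% identity threshold falls at 29 bp and the 12-mer window at about 4,771 bp, in line with the text; for 65% identity the model places the threshold at the 26 to 27 bp boundary (about 0.050 expected hits at 26 bp).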
A lower size limit of 275 bp is set to avoid I:D ratio directional bias caused by fragmented elements that can be generated by Alu insertions into a preexisting Alu element (Additional File 1). The upper Alu element size limit of 325 bp is set to avoid confounding results through inclusion of the smaller population of larger elements.: Terminology for non-adjacent Alu pairs: The central Alu in this naming convention is always designated with the number '0'. The second member of the pair is designated by its sequential separation from the central Alu: by a negative number if it is located 5' of the central Alu element, and by a positive number if it is located 3' of the central Alu element. The value of the sequential separation of a given Alu element from the central Alu is defined as its APSN. Adjacent FAPs are thus described as -1,0 and 0,1 pairs. Similarly, FAPs separated by 25 intervening Alu elements are described as -26,0 and 0,26 pairs, respectively.: Determination of 95% confidence interval for FAP I:D ratios: FAP sample sizes used in this study range from 555,354 to 567,242 (APSNs 0,1 to 0,107). These sample sizes were obtained using counting functions within the Alu element Excel spreadsheet. Following removal of FAPs residing within the same CLIQUE (CLIQUE-adjusted), these data set sizes are reduced to between 460,588 and 557,364. CLIQUE-adjusted sample sizes below 550,000 only exist for APSNs ≤ 4. For a FAP sample size of 550,000, the number of direct and inverted FAPs should range between 274,272 and 275,728 (P < 0.05). Any excess of one FAP type is offset by an equal and opposite deficit in the other. Therefore, the I:D ratio for a sample size of 550,000 is expected to range from 0.9947 to 1.0053 (P < 0.05).
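The APSN convention can be made concrete with a short Python sketch that counts inverted and direct pairs for one APSN. It assumes a hypothetical input: Alu elements from one chromosome, already sorted by position, each with start, end and strand, with both members of a pair required to fall within the 275 to 325 bp full-length window. Counting only 0,+APSN pairs visits each pair once, since a -APSN,0 pair is the same pair viewed from its other member. CLIQUE adjustment is not modelled, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Alu:
    """Hypothetical record for one RepeatMasker Alu hit."""
    start: int   # 0-based start coordinate (bp)
    end: int     # end coordinate, exclusive (bp)
    strand: str  # '+' or '-'

    @property
    def length(self) -> int:
        return self.end - self.start


def is_full_length(alu: Alu, min_bp: int = 275, max_bp: int = 325) -> bool:
    return min_bp <= alu.length <= max_bp


def count_pairs(alus: List[Alu], apsn: int) -> Tuple[int, int]:
    """Return (inverted, direct) counts of 0,apsn full-length Alu pairs.

    Assumption: `alus` is sorted by start coordinate; shorter elements still
    occupy a position, so they count as intervening elements.
    """
    inverted = direct = 0
    for i in range(len(alus) - apsn):
        a, b = alus[i], alus[i + apsn]
        if not (is_full_length(a) and is_full_length(b)):
            continue
        if a.strand == b.strand:
            direct += 1    # same orientation: direct pair
        else:
            inverted += 1  # opposite orientation: inverted pair
    return inverted, direct


def id_ratio(alus: List[Alu], apsn: int) -> float:
    inverted, direct = count_pairs(alus, apsn)
    return inverted / direct if direct else math.nan


alus = [Alu(0, 300, '+'), Alu(1_000, 1_300, '+'), Alu(2_000, 2_300, '-'),
        Alu(3_000, 3_100, '+'), Alu(4_000, 4_300, '-')]
print(count_pairs(alus, 1), id_ratio(alus, 1))
```

In the toy list the 100 bp element at 3,000 bp is skipped as a pair member but still shifts the APSN of the elements on either side of it.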
For the smallest CLIQUE-adjusted FAP family (0,1; 460,588 FAPs), the expected I:D range widens to 0.9942 to 1.0058, compared with 0.9947 to 1.0053 for 550,000 FAPs.: Determination of maximum APSN within the FAP I:D ratio imbalance window: The FAP I:D ratio imbalance boundary beyond an APSN of approximately 85 (Figure 3B) was determined by increasing the precision of the method. This added precision was achieved by increasing the FAP sample size, which was in turn obtained by calculating a 10-point moving average of the FAP I:D ratio across consecutive APSNs beyond the ± 85 range. This approach increased the FAP sample size from approximately 550,000 to 5.55 million and reduced the 95 percent confidence interval for randomness from 1 ± 0.0053 to 1 ± 0.0017. The highest ten consecutive APSNs with an I:D average outside of these new confidence limits was the APSN range 103 to 112. The midpoint of this range is the APSN value of 107.: Determination of maximum spacer size within the FAP I:D ratio imbalance window: Approximately 90 percent of the adjacent FAPs have spacer sizes below 6,400 bp. In addition, the I:D ratio for the upper 10 percent of this family is 0.9838, which is below the lower significance limit of approximately 0.995. Consequently, determination of the boundary of the FAP I:D imbalance bubble (Figure 3B) requires examination of larger APSN families. The number of FAPs within a given size range can be summed across various APSNs. This summation was used to determine the spacer size boundaries for the FAP I:D imbalance window.: APSN families smaller than 0,25 contain very few members with spacer sizes between 300,000 and 400,000 bp. However, 3,541,238 FAPs reside within this spacer range for APSNs of 0,25 to 0,107. This spacer size range was divided into two separate ranges of 300,000 to 350,000 and 351,000 to 400,000 bp. The number of FAPs within these spacer ranges was determined to be 1,974,605 (I:D = 0.9951) and 1,566,633 (I:D = 0.9956), respectively.
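The 95% ranges quoted in these Methods are consistent with a normal approximation to the binomial in which each FAP is independently inverted or direct with probability 1/2. The following Python sketch is our reconstruction, not the Minitab/Excel procedure used in the study; it computes the range of I:D ratios and the direct-minus-inverted excess expected by chance for a given number of FAPs (function names are illustrative):

```python
import math

# Reconstruction (assumption): normal approximation to a fair binomial.
Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def id_ratio_range(n_faps: int, z: float = Z_95) -> tuple:
    """(low, high) I:D ratios expected by chance for n_faps pairs."""
    half = n_faps / 2
    margin = z * math.sqrt(n_faps * 0.25)  # z times the binomial SD of one count
    low_count, high_count = half - margin, half + margin
    # An excess of one orientation is an equal deficit of the other,
    # so the extreme ratios are low/high and high/low.
    return low_count / high_count, high_count / low_count


def chance_excess(n_faps: int, z: float = Z_95) -> float:
    """Direct-minus-inverted difference attributable to chance.

    D - I = 2D - n, so its standard deviation is sqrt(n).
    """
    return z * math.sqrt(n_faps)


for n in (550_000, 460_588, 5_550_000, 1_974_605, 1_566_633):
    low, high = id_ratio_range(n)
    print(f"{n:>9,d}: {low:.4f} to {high:.4f}")

print(round(629_027 - chance_excess(59_357_435)))
```

The loop reproduces the 0.9947 to 1.0053, 0.9942 to 1.0058, 1 ± 0.0017, 0.9972 to 1.0028 and 0.9969 to 1.0031 ranges reported in these Methods, and the last line gives about 613,926, within a few counts of the reported 613,924.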
The expected ranges for FAP I:D ratios for the 300,000 to 350,000 bp and 351,000 to 400,000 bp spacer ranges are 0.9972 to 1.0028 and 0.9969 to 1.0031, respectively (P < 0.05). Both observed I:D ratios lie outside these ranges, showing that the FAP I:D imbalance window extends beyond 350,000 bp.: Selection of loci for validation of APE deletions in the chimpanzee genome: The methodology employed for selection of potential APE deletion loci used five criteria: pair orientation, APSN, Alu element size, spacer size and Alu-free flanking sequence 5' and 3' of the pair being evaluated. Only inverted Alu pairs were chosen as potential experimental loci, as they have previously been demonstrated to be unstable [25]. The second criterion, APSN, was limited to 0,1 (adjacent) FAPs, because any intervening Alu element necessarily forms a second, more closely spaced inverted pair with one of the two elements of that FAP. Any deletion at such a locus could therefore reasonably be attributed to interactions involving the intervening element. For this reason, only the pool of adjacent human FAPs (APSN = 0,1) was used to identify candidate APE deletion loci.: The third criterion, Alu element size, was limited to the 275 to 325 bp constraints set for FAPs. The fourth criterion, spacer length separating the two FAP elements, was limited to elements separated by 651 to 1,500 bp. The lower spacer size limit was set by the upper limit of previous work [26], and the upper limit was set to provide an acceptable number of candidate loci. The fifth criterion, 5' and 3' Alu-free flanking sequence around a 0,1 FAP, was set to a minimum of 1,000 bp. This constraint was necessary to avoid attributing an APE deletion to nearby elements. These criteria created locus sizes between 3,201 and 4,150 bp.: A total of 13,664 human loci were identified which satisfied these five criteria.
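The five selection criteria translate directly into a filter. A minimal Python sketch, assuming a hypothetical record for each adjacent (0,1) FAP that already carries both Alu lengths, their relative orientation, the spacer length and the Alu-free flank lengths; taking the locus as both 1,000 bp flanks plus the pair and its spacer reproduces the 3,201 to 4,150 bp locus sizes:

```python
from dataclasses import dataclass, replace

FLANK_BP = 1_000  # minimum Alu-free sequence required 5' and 3' of the pair


@dataclass(frozen=True)
class AdjacentFAP:
    """Hypothetical record for one 0,1 (adjacent) full-length Alu pair."""
    len_5p: int       # length of the 5' Alu (bp)
    len_3p: int       # length of the 3' Alu (bp)
    inverted: bool    # True if the two Alus lie on opposite strands
    spacer_bp: int    # sequence separating the two Alus (bp)
    alu_free_5p: int  # Alu-free sequence upstream of the pair (bp)
    alu_free_3p: int  # Alu-free sequence downstream of the pair (bp)


def is_candidate(fap: AdjacentFAP) -> bool:
    # Criterion 2 (APSN = 0,1) is implied by only building AdjacentFAP records.
    return (
        fap.inverted                                                     # 1: orientation
        and 275 <= fap.len_5p <= 325 and 275 <= fap.len_3p <= 325        # 3: Alu size
        and 651 <= fap.spacer_bp <= 1_500                                # 4: spacer size
        and fap.alu_free_5p >= FLANK_BP and fap.alu_free_3p >= FLANK_BP  # 5: flanks
    )


def locus_span(fap: AdjacentFAP, flank_bp: int = FLANK_BP) -> int:
    """Both flanks plus the two Alus and the spacer."""
    return 2 * flank_bp + fap.len_5p + fap.len_3p + fap.spacer_bp


smallest = AdjacentFAP(275, 275, True, 651, FLANK_BP, FLANK_BP)
largest = AdjacentFAP(325, 325, True, 1_500, FLANK_BP, FLANK_BP)
print(locus_span(smallest), locus_span(largest))
print(is_candidate(smallest), is_candidate(replace(smallest, inverted=False)))
```

Keeping each criterion on its own line makes it easy to relax one at a time, for example admitting direct pairs as in the direct versus inverted comparison in the Results.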
These 13,664 loci represent approximately 0.03 percent of the approximately 50 million CLIQUE-adjusted FAPs within the I:D imbalance window shown in Figure 3B. These loci were then compared to the chimpanzee panTro2 genome assembly using the LiftOver feature of the UCSC Genome Browser [40–42]. This screening identified 715 loci (slightly over five percent) for which the chimpanzee ortholog was more than 350 bp shorter than the human locus. The 350 bp threshold was set to reduce the number of false-positive loci (for example, human-specific Alu insertions could otherwise be flagged as potential sites of chimpanzee APE-related deletions). The 715 loci were individually inspected using the UCSC Genome Browser for the human, chimpanzee, orangutan and rhesus macaque genomes [40–44]. These inspections reduced the number of PCR candidate loci to 58. Four criteria accounted for approximately 90 percent of this reduction. These four criteria, in order of magnitude, were:: 1. The presence of N's in the chimpanzee genome assembly (382 loci): 2. The insertion of a human-specific transposable element as the cause of the smaller chimpanzee locus (141 loci): 3. A deletion present, but so large that it encompassed an adjacent Alu element, making the deletion non-diagnostic (56 loci): 4. Complementary deletions also present in orangutan or rhesus (38 loci): The remaining 58 loci were selected as potential candidates for further examination with PCR.: Estimation of APE deletions in chimpanzee genome by observation: Although only 58 of the 715 loci were accepted for further examination by PCR, an additional 94 of these loci showed considerable evidence of being potential APE deletions (rejection criteria 3 and 4, above). Adding these 94 loci to the 58 PCR candidate loci increases the number of APE-related deletion loci to 152.
It was also assumed that the 382 loci containing N's in the chimpanzee assembly (rejection criterion 1) were indeterminate and could be neither accepted nor rejected as APE-related deletions. Removing these 382 loci from the original set of 715 loci reduces the total number of individually inspected loci to 333. It is estimated that 152 likely APE-related deletion loci exist among these 333 loci (45.6%). Of the 14 loci evaluated by PCR, 10 were informative (71.4%). The PCR results from the remaining four loci were uninformative, and no false-positive instances of chimpanzee-specific deletions were observed. Multiplying these two proportions provides an estimate that 32.6 percent (108) of the 333 loci were likely APE-related deletions. Therefore, within these 13,664 inverted FAP loci, a total of 108 APE-type deletions are estimated to have occurred in chimpanzee (by observation) since the human-chimpanzee divergence.: Primer design for PCR: Candidate PCR amplicon sequences were obtained with the BLAT feature of the UCSC Genome Browser [40, 42]. These sequences were aligned using the BioLign software (developed by Tom Hall and available from the Buckler Lab website: http://www2.maizegenetics.net/bioinformatics). These alignments were manually inspected for common identity between the four primate species. Forward and reverse oligonucleotide primers were selected from regions of common alignment. Primer sequences are shown in Additional File 1, Table S2.: PCR amplification: All PCR amplifications were conducted in 27.5 µL reactions using 25 ng DNA template, 0.2 µM oligonucleotide primer, 1.25 units TaKaRa LA Taq™, 0.4 mM dNTPs, and 1X TaKaRa LA Taq™ buffer containing 2.3 mM MgCl2. A list of primers is provided in Additional File 1, Table S2.
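The observational estimate in "Estimation of APE deletions in chimpanzee genome by observation" is a product of two proportions, and the computational estimate in "Comparison of APE deletions in chimpanzee genome by computation and observation" scales the inverted FAP deficit by the fraction of primate evolution since the human-chimpanzee split. A Python sketch of both, using only counts reported in the text (the parameter and function names are hypothetical):

```python
# Counts are taken from the text; parameter and function names are hypothetical.


def observational_estimate(inspected=715, with_ns_in_chimp=382, pcr_candidates=58,
                           flagged_on_inspection=94, pcr_tested=14, pcr_informative=10):
    """Return (combined proportion, estimated APE deletions) by observation."""
    informative_loci = inspected - with_ns_in_chimp        # 333 loci
    likely_ape = pcr_candidates + flagged_on_inspection    # 152 loci
    inspection_rate = likely_ape / informative_loci        # about 45.6%
    pcr_rate = pcr_informative / pcr_tested                # about 71.4%
    combined = inspection_rate * pcr_rate                  # about 32.6%
    return combined, combined * informative_loci


def computational_estimate(inverted=13_664, direct=14_680,
                           myr_since_split=6, myr_of_alu_activity=65):
    """Return (I:D ratio, inverted deficit, deletions expected since the split)."""
    deficit = direct - inverted                            # 1,016 pairs
    return inverted / direct, deficit, deficit * myr_since_split / myr_of_alu_activity


share, observed = observational_estimate()
ratio, deficit, computed = computational_estimate()
print(f"observation: {share:.1%} of 333 loci, about {observed:.1f} deletions")
print(f"computation: I:D = {ratio:.3f}, deficit = {deficit}, about {computed:.1f} deletions")
```

The sketch gives about 108.6 and 93.8 deletions; the text reports these as 108 and 94, so the observational value appears to have been truncated rather than rounded.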
The primate panel contained templates from Homo sapiens (HeLa; cell line ATCC CCL-2); Pan troglodytes (common chimpanzee \"Clint\", cell line Coriell Cell Repositories NS06006); Gorilla gorilla (Western lowland gorilla; cell line Coriell Cell Repositories NG05251); Pongo abelii (Sumatran orangutan; cell line Coriell Cell Repositories NG06209); and Macaca mulatta (rhesus macaque; cell line Coriell Cell Repositories NG07110). PCRs began with an initial denaturation of 80 sec at 94°C. Denaturing, annealing and extension times and temperatures were 20 sec at 94°C, 20 sec at optimum temperatures (Additional File 1, Table S2) and 8 min 30 sec at 68°C, respectively, for 32 cycles. The 32 cycles were followed by a final extension of 10 min at 68°C. Following amplification, all PCR products were electrophoresed on 1.5% agarose gels stained with ethidium bromide at a concentration of 1 µl per 50 mL of gel solution. Gels were run for 45 to 55 min at 175 volts. Finally, fragments were visualized using UV fluorescence.: Comparison of APE deletions in chimpanzee genome by computation and observation: Using the original criteria for isolating potential experimental loci, 13,664 inverted FAPs and 14,680 direct FAPs were identified. The I:D ratio for these FAPs is 0.931, and the difference between these inverted and direct FAPs is 1,016, which we believe corresponds to APE-associated deletion events. All Alu element insertions have occurred over the 65 million years of primate evolution. It is estimated that the most recent common ancestor of humans and chimpanzees lived approximately six million years ago [32]. Consequently, approximately 12 million years of genome evolution are estimated to separate extant humans and chimpanzees. For this 12-million-year period of evolution to be incorporated into calculated APE rate estimates, both orthologous chimpanzee-specific and human-specific APE-related deletions must be estimated.
Only chimpanzee-specific APE-related deletions are measured in this study, so only half of the 12 million years of evolution (six million years) is used in this estimate. A conservative estimate of 94 chimpanzee-specific APE deletions would therefore be expected over the six million years since the human-chimpanzee divergence (1,016 × 6 ÷ 65 ≈ 94). This number is concordant with the 108 APE deletions previously estimated by observational methods (see Estimation of APE Deletions in Chimpanzee Genome by Observation above).: Moving average distributions of actual and random Alu clustering: The RepeatMasker scan of the hg18 human chromosome assembly recovers 102,592 Alu elements in chromosome 1. Because orientational clustering bias has been shown to occur within CLIQUEs, only the 5' Alu element in each CLIQUE was included in this evaluation. Chromosome 1 contains 50,262 Alu elements that do not reside within a CLIQUE. Human chromosome 1 contains 34,916 CLIQUEs, of which 26,277 contain at least one Alu element. Consequently, only 76,539 (50,262 + 26,277) Alu elements were used in this clustering evaluation. A value of +1 was assigned to each Alu on the positive strand and a value of -1 to each Alu on the negative strand. Moving averages were calculated in Excel over 50, 100, 200, 500 and 1,000 sequential directional data points.: Five sets of 76,539 random +1 and -1 values (equivalent in size to the revised data set of Alu elements in human chromosome 1, above) were generated using Minitab 15. These data were transferred to Excel, and moving averages were calculated for each set of random data over 50, 100, 200, 500, 1,000, 2,000, 5,000 and 10,000 sequential directional data points. These 48 sets of moving average data (one set of actual data and five sets of random data for eight separate moving averages) were then transferred back to Minitab.
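The random baseline described in this subsection can be sketched in Python as a stand-in for the Minitab and Excel steps (the function names and the fixed seed are illustrative assumptions): strand orientations are coded +1 and -1, moving averages are taken over each window size, and the mean and standard deviation of each window's averages are averaged across the five random replicates. For independent ±1 values the moving average over w points has a standard deviation close to 1/√w, which is what the random curves should show.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers; the study used Minitab 15 and Excel for these steps.


def moving_average(values: Sequence[int], window: int) -> List[float]:
    """Average of every run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one point
        averages.append(total / window)
    return averages


def mean_sd(xs: Sequence[float]) -> Tuple[float, float]:
    m = math.fsum(xs) / len(xs)
    return m, math.sqrt(math.fsum((x - m) ** 2 for x in xs) / len(xs))


def random_baseline(n: int = 76_539, windows=(50, 100, 200, 500, 1_000),
                    replicates: int = 5, seed: int = 1) -> Dict[int, Tuple[float, float]]:
    """Mean and SD of moving averages over random +1/-1 series, each
    averaged across `replicates` independent series."""
    rng = random.Random(seed)
    series = [[rng.choice((1, -1)) for _ in range(n)] for _ in range(replicates)]
    baseline = {}
    for w in windows:
        stats = [mean_sd(moving_average(s, w)) for s in series]
        baseline[w] = (sum(m for m, _ in stats) / replicates,
                       sum(sd for _, sd in stats) / replicates)
    return baseline


for w, (m, sd) in random_baseline().items():
    print(f"window {w:>5}: mean {m:+.4f}, sd {sd:.4f} (1/sqrt(w) = {1 / math.sqrt(w):.4f})")
```

For the actual chromosome 1 data, the same moving_average function would be applied to the +1/-1 coded strands of the 76,539 retained Alu elements; the windows argument can be extended to (50, 100, 200, 500, 1_000, 2_000, 5_000, 10_000) to match the eight random-data window sizes.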
Individual means and standard deviations for each set of random distributions were determined using the Minitab 15 histogram 'with fit and groups' algorithm. The five individual means and standard deviations were then averaged for each set of random moving averages, and the random data curves were generated using these averaged means and standard deviations (Additional File 1, Figure S3).: Abbreviations: APE: Alu pair exclusion; Alu pair sequence number; bp: base pair; CLIQUE: catenated LINE1 endonuclease induced queue of uninterrupted Alu, LINE1 and SVA elements; doomsday junction; dNTP: deoxyribonucleotide triphosphate; FAP: full-length Alu pair; I:D ratio: ratio between inverted and direct Alu pairs; LINE1: long interspersed element 1; endonuclease domain of LINE1 ORF2 protein; reverse transcriptase domain of LINE1 ORF2 protein; LTR: long terminal repeat; ORF: open reading frame; PCR: polymerase chain reaction; SINE: short interspersed elements; SVA: SINE/variable number of tandem repeats/Alu; TPRT: target primed reverse transcription; TSD: target site duplication. See also Additional File 2.: References: Lander ES, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.: Cordaux R, Batzer MA: The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009, 10 (10): 691-703. 10.1038/nrg2640.: Lee J, Han K, Meyer TJ, Kim HS, Batzer MA: Chromosomal inversions between human and chimpanzee lineages caused by retrotransposons. PLoS One. 2008, 3 (12): e4047. 10.1371/journal.pone.0004047.: Belancio VP, Hedges DJ, Deininger P: Mammalian non-LTR retrotransposons: for better or worse, in sickness and in health. Genome Res. 2008, 18 (3): 343-358. 10.1101/gr.5558208.: Luan DD, Korman MH, Jakubczak JL, Eickbush TH: Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell. 1993, 72 (4): 595-605. 
10.1016/0092-8674(93)90078-5.: Mathias SL, Scott AF, Kazazian HH, Boeke JD, Gabriel A: Reverse transcriptase encoded by a human transposable element. Science. 1991, 254 (5039): 1808-1810. 10.1126/science.1722352.: Repanas K, Zingler N, Layer LE, Schumann GG, Perrakis A, Weichenrieder O: Determinants for DNA target structure selectivity of the human LINE-1 retrotransposon endonuclease. Nucleic Acids Res. 2007, 35 (14): 4914-4926. 10.1093/nar/gkm516.: Konkel MK, Batzer MA: A mobile threat to genome stability: The impact of non-LTR retrotransposons upon the human genome. Semin Cancer Biol. 2010, 20 (4): 211-221. 10.1016/j.semcancer.2010.03.001.: 1000 Genomes Project Consortium: A map of human genome variation from population-scale sequencing. Nature. 2010, 467 (7319): 1061-1073. 10.1038/nature09534.: Fogedby HC, Metzler R: Dynamics of DNA breathing: weak noise analysis, finite time singularity, and mapping onto the quantum Coulomb problem. Phys Rev E Stat Nonlin Soft Matter Phys. 2007, 76 (6 Pt 1): 061915.: Batzer MA, Kilroy GE, Richard PE, Shaikh TH, Desselle TD, Hoppens CL, Deininger PL: Structure and variability of recently inserted Alu family members. Nucleic Acids Res. 1990, 18 (23): 6793-6798. 10.1093/nar/18.23.6793.: Grimaldi G, Singer MF: A monkey Alu sequence is flanked by 13-base pair direct repeats by an interrupted alpha-satellite DNA sequence. Proc Natl Acad Sci USA. 1982, 79 (5): 1497-1500. 10.1073/pnas.79.5.1497.: Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002, 3 (5): 370-379. 10.1038/nrg798.: Goodier JL, Kazazian HH: Retrotransposons revisited: the restraint and rehabilitation of parasites. Cell. 2008, 135 (1): 23-35. 10.1016/j.cell.2008.09.022.: Luan DD, Eickbush TH: RNA template requirements for target DNA-primed reverse transcription by the R2 retrotransposable element. Mol Cell Biol. 1995, 15 (7): 3882-3891.: Kazazian HH: Mobile elements and disease. Curr Opin Genet Dev. 1998, 8 (3): 343-350. 
10.1016/S0959-437X(98)80092-0.: Morrish TA, Gilbert N, Myers JS, Vincent BJ, Stamato TD, Taccioli GE, Batzer MA, Moran JV: DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nat Genet. 2002, 31 (2): 159-165. 10.1038/ng898.: Srikanta D, Sen SK, Conlin EM, Batzer MA: Internal priming: an opportunistic pathway for L1 and Alu retrotransposition in hominins. Gene. 2009, 448 (2): 233-241. 10.1016/j.gene.2009.05.014.: Weiner AM, Deininger PL, Efstratiadis A: Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annu Rev Biochem. 1986, 55: 631-661. 10.1146/annurev.bi.55.070186.003215.: Watson JB, Sutcliffe JG: Primate brain-specific cytoplasmic transcript of the Alu repeat family. Mol Cell Biol. 1987, 7 (9): 3324-3327.: Quentin Y: Fusion of a free left Alu monomer and a free right Alu monomer at the origin of the Alu family in the primate genomes. Nucleic Acids Res. 1992, 20 (3): 487-493. 10.1093/nar/20.3.487.: Collins J: Instability of palindromic DNA in Escherichia coli. Cold Spring Harb Symp Quant Biol. 1981, 45 (Pt 1): 409-416.: Lengsfeld BM, Rattray AJ, Bhaskara V, Ghirlando R, Paull TT: Sae2 is an endonuclease that processes hairpin DNA cooperatively with the Mre11/Rad50/Xrs2 complex. Mol Cell. 2007, 28 (4): 638-651. 10.1016/j.molcel.2007.11.001.: Lewis S, Akgun E, Jasin M: Palindromic DNA and genome stability. Further studies. Ann N Y Acad Sci. 1999, 870: 45-57. 10.1111/j.1749-6632.1999.tb08864.x.: Lobachev KS, Stenger JE, Kozyreva OG, Jurka J, Gordenin DA, Resnick MA: Inverted Alu repeats unstable in yeast are excluded from the human genome. EMBO J. 2000, 19 (14): 3822-3830. 10.1093/emboj/19.14.3822.: Stenger JE, Lobachev KS, Gordenin D, Darden TA, Jurka J, Resnick MA: Biased distribution of inverted and direct Alus in the human genome: implications for insertion, exclusion, and genome stability. Genome Res. 2001, 11 (1): 12-27. 
10.1101/gr.158801.: Deininger PL, Batzer MA: Alu repeats and human disease. Mol Genet Metab. 1999, 67 (3): 183-193. 10.1006/mgme.1999.2864.: Hedges DJ, Deininger PL: Inviting instability: Transposable elements, double-strand breaks, and the maintenance of genome integrity. Mutat Res. 2007, 616 (1-2): 46-59. 10.1016/j.mrfmmm.2006.11.021.: Sen SK, Han K, Wang J, Lee J, Wang H, Callinan PA, Dyer M, Cordaux R, Liang P, Batzer MA: Human genomic deletions mediated by recombination between Alu elements. Am J Hum Genet. 2006, 79 (1): 41-53. 10.1086/504600.: Han K, et al: Alu recombination-mediated structural deletions in the chimpanzee genome. PLoS Genet. 2007, 3 (10): 1939-49.: Franke G, Bausch B, Hoffmann MM, Cybulla M, Wilhelm C, Kohlhase J, Scherer G, Neumann HP: Alu-Alu recombination underlies the vast majority of large VHL germline deletions: Molecular characterization and genotype-phenotype correlations in VHL patients. Hum Mutat. 2009, 30 (5): 776-786. 10.1002/humu.20948.: Xing J, Zhang Y, Han K, Salem AH, Sen SK, Huff CD, Zhou Q, Kirkness EF, Levy S, Batzer MA, Jorde LB: Mobile elements create structural variation: analysis of a complete human genome. Genome Res. 2009, 19 (9): 1516-1526. 10.1101/gr.091827.109.: Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME: Origins and functional impact of copy number variation in the human genome. Nature. 2010, 464 (7289): 704-712. 10.1038/nature08516.: Mills RE, et al: Mapping copy number variation by population-scale genome sequencing. Nature. 2011, 470 (7332): 59-65. 10.1038/nature09708.: Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 
2003, 73 (4): 823-834. 10.1086/378594.: Levy A, Schwartz S, Ast G: Large-scale discovery of insertion hotspots and preferential integration sites of human transposed elements. Nucleic Acids Res. 2010, 38 (5): 1515-1530. 10.1093/nar/gkp1134.: SantaLucia J, Hicks D: The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct. 2004, 33: 415-440. 10.1146/annurev.biophys.32.110601.141800.: Tian D, Wang Q, Zhang P, Araki H, Yang S, Kreitman M, Nagylaki T, Hudson R, Bergelson J, Chen JQ: Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature. 2008, 455 (7209): 105-108. 10.1038/nature07175.: Smit A, Hubley R, Green P: RepeatMasker Open-3.0. 1996-2010, http://www.repeatmasker.org: Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ: The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004, 32 (Database issue): D493-D496.: Chimpanzee Sequencing and Analysis Consortium: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437 (7055): 69-87. 10.1038/nature04072.: Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res. 2002, 12 (4): 656-664.: Rhesus Macaque Genome Sequencing and Analysis Consortium: Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007, 316 (5822): 222-234.: Locke DP, et al: Comparative and demographic analysis of orang-utan genomes. Nature. 2011, 469 (7331): 529-533. 10.1038/nature09687.: Jeon JH, Sung W, Ree FH: A semiflexible chain model of local denaturation in double-stranded DNA. J Chem Phys. 2006, 124 (16): 164905. 10.1063/1.2192774.: Acknowledgements: The authors would like to thank all members of the Batzer Laboratory for their advice and feedback. They would especially like to thank Thomas J Meyer for his insightful suggestions and sage advice in the preparation of the manuscript and figures. 
Special thanks are given to Jungnam Lee for her important admonition that intervening Alu elements be considered in this study of Alu pairs. This research was supported by National Institutes of Health R01 GM59290 (MAB), and the Louisiana Board of Regents Governor's Biotechnology Initiative GBI (2002-005) (MAB).: Author information: Corresponding author: Correspondence to Mark A Batzer.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: GWC, MKK, JAW and KH designed the research; GWC and JDM performed the research; MAB contributed new reagents and analytic tools; GWC analyzed the data and wrote the paper. All authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: Supplemental Information. This file contains fundamental background information related to Alu pair exclusion research. It also contains discussions and data that support the findings within the manuscript. (DOCX 3 MB): Additional file 2: Definition of Terms. This file contains a list with definitions of abbreviations and novel terminology introduced within the manuscript. (TIFF 172 KB): About this article: Cite this article: Cook, G.W., Konkel, M.K., Major, J.D. et al. Alu pair exclusions in the human genome. Mobile DNA 2, 10 (2011). 
https://doi.org/10.1186/1759-8753-2-10: Received: 24 June 2011: Accepted: 23 September 2011: Published: 23 September 2011: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"DNA binding activities of the Herves transposase from the mosquito Anopheles gambiae" "Amandeep S Kahlon, Robert H Hice, David A O'Brochta, Peter W Atkinson" "Peter W Atkinson" "20 June 2011" "Determining the mechanisms by which transposable elements move within a genome increases our understanding of how they can shape genome evolution. Class 2 transposable elements transpose via a 'cut-and-paste' mechanism mediated by a transposase that binds to sites at or near the ends of the transposon. Herves is a member of the hAT superfamily of class 2 transposons and was isolated from Anopheles gambiae, a medically important mosquito species that is the major vector of malaria in sub-Saharan Africa. Herves is transpositionally active and intact copies of it are found in field populations of A gambiae. In this study we report the binding activities of the Herves transposase to the sequences at the ends of the Herves transposon and compare these to other sequences recognized by hAT transposases isolated from other organisms., We identified the specific DNA-binding sites of the Herves transposase. Active Herves transposase was purified using an Escherichia coli expression system and bound in a site-specific manner to the subterminal and terminal sequences of the left and right ends of the element, respectively, and also interacted with the right but not the left terminal inverted repeat. We identified a common subterminal DNA-binding motif (CG/AATTCAT) that is critical and sufficient for Herves transposase binding., The Herves transposase binds specifically to a short motif located at both ends of the transposon but shows differential binding with respect to the left and right terminal inverted repeats. Despite similarities in the overall structures of hAT transposases, the regions to which they bind in their respective transposons differ in sequence ensuring the specificity of these enzymes to their respective transposon. 
The asymmetry with which the Herves terminal inverted repeats are bound by the transposase may indicate that these differ in their interactions with the enzyme." "Binding Motif, Terminal Inverted Repeat, Specific Competitor, Herves Element, Transposase Binding" " DNA binding activities of the Herves transposase from the mosquito Anopheles gambiae: Amandeep S Kahlon1, Robert H Hice2, David A O'Brochta3 & Peter W Atkinson1,2 : Mobile DNA volume 2, Article number: 9 (2011): Abstract: Background: Determining the mechanisms by which transposable elements move within a genome increases our understanding of how they can shape genome evolution. Class 2 transposable elements transpose via a 'cut-and-paste' mechanism mediated by a transposase that binds to sites at or near the ends of the transposon. Herves is a member of the hAT superfamily of class 2 transposons and was isolated from Anopheles gambiae, a medically important mosquito species that is the major vector of malaria in sub-Saharan Africa. Herves is transpositionally active and intact copies of it are found in field populations of A gambiae. In this study we report the binding activities of the Herves transposase to the sequences at the ends of the Herves transposon and compare these to other sequences recognized by hAT transposases isolated from other organisms.: Results: We identified the specific DNA-binding sites of the Herves transposase. Active Herves transposase was purified using an Escherichia coli expression system and bound in a site-specific manner to the subterminal and terminal sequences of the left and right ends of the element, respectively, and also interacted with the right but not the left terminal inverted repeat. 
We identified a common subterminal DNA-binding motif (CG/AATTCAT) that is critical and sufficient for Herves transposase binding.: Conclusions: The Herves transposase binds specifically to a short motif located at both ends of the transposon but shows differential binding with respect to the left and right terminal inverted repeats. Despite similarities in the overall structures of hAT transposases, the regions to which they bind in their respective transposons differ in sequence, ensuring the specificity of these enzymes to their respective transposon. The asymmetry with which the Herves terminal inverted repeats are bound by the transposase may indicate that these differ in their interactions with the enzyme.: Background: Transposable elements (TEs) are ubiquitous components of genomes in which they impact genomic evolution and maintenance [1–6]. Their mobility has led to their adoption as genetic tools in modern genetics; one of their many uses in biotechnology is the introduction of foreign genes into insect disease vectors of medical and agricultural importance [7–14]. Anopheles gambiae is the principal vector of the malaria-causing parasite Plasmodium falciparum in sub-equatorial Africa and is a mosquito species in which robust TE-based genetic tools need to be developed. At present there are six reports of successful genetic transformation of this mosquito, one using the P element and five using the piggyBac element, and transformation remains a low-frequency event [9, 15–19]. Isolating active, well-adapted, endogenous TEs from A gambiae and understanding their biology is likely to improve the efficiency of genetic transformation in this species, since these native active TEs are likely to have adapted to overcome or evade the host response systems that are proposed to inactivate mobile DNA [20, 21].: Herves is an active class 2 TE that was isolated from A gambiae [22]. 
It contains a transposase-encoding open reading frame (ORF) that is flanked by left (Herves-L) and right (Herves-R) end sequences. The Herves-L end is unusually long (1,478 bp) compared with the Herves-R end (421 bp) and contains three 100 bp imperfect tandem repeats, commencing 146 bp from the end (Figure 1a). Herves has 11 bp imperfect terminal inverted repeats (TIRs) at the left (L-TIR) and right (R-TIR) ends (Figure 1a) [22]. It is transpositionally active and can genetically transform Drosophila melanogaster [22]. Population dynamics studies suggest that Herves has been recently active within field populations of A gambiae from Kenya and that many intact copies of it are present in these populations [23]. Class 2 TEs often accumulate internal deletions over time that render the elements inactive and so unable to cause further harm to the host organism [24]. The presence of intact forms of Herves and other hAT TEs, such as Hermes, indicates that at least some hAT elements are, for reasons unknown, less prone to accumulating internal deletions; however, the significance of this remains unknown in the absence of information concerning MITEs (miniature inverted-repeat transposable elements) generated from them [23, 25, 26].: Herves transposase binds to the terminal sequences of L and R ends of the Herves element. (a) Schematic representation of the Herves element. Numbers indicate the distance in bp internal to either the left (L) or right (R) end. (b) SDS-PAGE analysis of purified Herves transposase. A Coomassie-stained gel shows the 70 kDa purified Herves transposase. E1-E5 represent different elutions obtained during the final step of protein purification. (c) Transposase binding to the terminal fragments of the Herves-L and Herves-R ends. Electrophoretic mobility shift assay (EMSA) analysis with the Herves-L bp 1-100 (lanes 1-4) and Herves-R bp 1-100 (lanes 5-7) probes. The DNA fragments were incubated in the presence (+) or absence (-) of pure transposase. 
A homologous fragment was used as the specific competitor. The E1 flanking sequence from the Hermes transposable element was used as the non-specific competitor [44]. Specific and non-specific competitors were used at 200-fold molar excess to the probe. Arrows indicate various protein-DNA complexes.: Class 2 transposases typically bind to the TIRs and nearby internal sequences and mediate transposition to a new genomic location by the classical 'cut-and-paste' mechanism [27]. Other cis-acting sequences, which usually consist of short repeat sequence motifs located close to the TIRs, are also important for proper transposase binding and efficient excision and transposition [28–35]. In many cases, native cis elements are not optimized for maximal mobility; thus, new and improved TE gene vectors can be designed by altering these elements to increase or decrease transposase binding [36, 37]. The identification and characterization of these transposase binding sites and of the specific DNA-binding transposase residues is therefore important to our understanding of the biology and post-integration behavior of TEs. This study aimed to identify the DNA sequences of the Herves element bound by its transposase.: Results: Purification of Herves transposase and its binding to the Herves-L end: Herves transposase is 603 amino acids in length and is predicted to have a molecular weight of 70 kDa. Herves transposase was purified from an Escherichia coli expression system and its size was confirmed by SDS-PAGE (Figure 1b). To examine the binding of Herves transposase to the Herves-L end, we focused on the terminal 100 bp region. In electrophoretic mobility shift assays (EMSAs), a radioactively labeled Herves-L bp 1-100 probe was incubated in the presence or absence of purified Herves transposase, and a 200-fold molar excess of unlabeled specific or non-specific DNA fragments was added as competitor. 
The transposase interacted with the Herves-L 100 bp probe and formed three transposase-DNA complexes (Figure 1c). The specific competitor competed for the transposase, whereas the non-specific competitor did not affect binding (Figure 1c), implicating a sequence-specific interaction between the transposase and the probe.: To locate the transposase binding site(s) within this terminal 100 bp sequence, overlapping oligonucleotides (approximately 30 bp in length) were competed with the Herves-L 100 bp probe for transposase binding. The DNA fragments Herves-L bp 12-48 and bp 28-60 competed with this probe in all three transposase-DNA complexes, whereas the Herves-L bp 1-30, bp 48-75, and bp 76-100 fragments had no effect (Figure 2a). This suggested that the Herves transposase binds tightly and specifically within the L bp 12-60 region. Because the overlapping Herves-L bp 1-30 and bp 48-75 fragments did not alter binding, a binding motif(s) was likely present in Herves-L bp 28-48 (Figure 2a). We also observed that the L bp 12-48 and bp 28-60 fragments competed only partially with the 100 bp probe, whereas the specific bp 1-100 fragment competed fully for transposase binding, implicating the existence of additional binding motifs that act cooperatively with the binding motif(s) in the bp 28-48 region (Figure 2a).: Herves-L bp 12-48 and bp 28-60 are important for transposase binding. Electrophoretic mobility shift assays (EMSAs) with (a) Herves-left (L) bp 1-100 and (b) Herves-L bp 12-48 as probes. Overlapping 30 bp fragments were used as competitors for transposase binding to the probe. The specific and non-specific competitors were used at 200-fold molar excess to the probe, unless specified otherwise. The asterisk (*) indicates various protein-DNA complexes. 
(a) Specific and non-specific competitors were used as described for Figure 1; (b) a 30 bp non-homologous fragment from the ß2 tubulin gene of Aedes aegypti was used as the non-specific competitor.: To confirm these results, each of the unlabeled 30 bp fragments was tested against the Herves-L bp 12-48 probe for binding to the transposase. The transposase bound this probe (Figure 2b), forming two transposase-DNA complexes. The unlabeled L bp 28-60 fragment specifically competed for binding of the transposase (Figure 2b). Herves-L bp 48-75, bp 61-90, and bp 76-100 competed partially with the probe, indicating weak transposase binding to these regions. These results suggested that Herves-L bp 12-48 and bp 28-60 bind Herves transposase strongly and to a similar extent, leading us to believe that the DNA binding motif lay within the Herves-L bp 28-48 fragment.: We performed DNase I protection assays to confirm the EMSA results and to identify the DNA region bound by pure transposase more precisely. The terminal Herves-L bp 1-100 fragment was 3' end-labeled, and the two labeled probes were incubated separately with Herves transposase, then with DNase I, and analyzed on a denaturing polyacrylamide gel. The two 3' end-labeled probes were protected at bp 25-73 and bp 30-75, respectively (Figure 3). Increasing amounts of transposase led to greater protection of the Herves-L 100 bp probe (Figure 3). Overall, the DNase I protection assay results confirmed the EMSA findings, indicating sequence-specific binding of transposase to at least the Herves-L bp 28-48 region.: Transposase binding analysis to the Herves-left (L) end. The single-end-labeled Herves-L 1-100 bp fragment (100 nM) was incubated in the presence (+, ++) or absence (-) of DNase I or the transposase. The ++ indicates 1.4 µM of transposase or 0.2 units of DNase I, whereas + indicates 850 nM of transposase or 0.1 units of DNase I. 32P indicates the position where the probe was labeled. 
The solid bars indicate regions protected by the transposase from DNase I degradation. The red asterisk (*) indicates hypersensitive sites.: Transposase binds to the Herves-R end: To investigate the binding of transposase to the Herves-R end, the Herves-R 1-100 bp fragment was radiolabeled and used in EMSAs. Herves transposase interacted specifically with the probe and formed two transposase-DNA complexes (Figure 1c). Unlabeled specific competitor competed with the probe for transposase, and, notably, the addition of a non-specific competitor led to the formation of a single, higher-molecular-weight complex (Figure 1c) whose molecular composition is unknown.: Overlapping 30 bp oligonucleotides were then used as probes to identify the transposase binding site(s) within the Herves-R 1-100 bp region by EMSA. The bp 1-30, bp 15-45, and bp 61-90 fragments elicited specific binding of transposase (Figure 4a). Fragment bp 31-60 showed weak, non-specific binding, whereas the bp 46-75 and bp 91-110 fragments failed to bind (Figure 4a).: Transposase binding to the Herves-right (R) end. (a) Herves transposase binds to Herves-R bp 1-30, bp 15-45 and bp 61-90. Electrophoretic mobility shift assay (EMSA) analysis of transposase binding to the overlapping 30 bp fragments (bp 1-30, bp 31-60, bp 61-90, bp 91-110, bp 15-45 and bp 46-75). The asterisk (*) indicates various protein-DNA complexes. (b) Herves transposase binding to Herves-R bp 15-45 and bp 61-90. The Herves-R bp 61-90 fragment was used as a probe in EMSAs. The fraction of the transposase-bound probe was quantified using a phosphoimager. A homologous fragment was used as the specific competitor at 50-fold, 100-fold and 200-fold molar excess, whereas the non-specific competitor was used as described for Figure 2b. Overlapping fragments (bp 1-30, bp 31-60, bp 91-110, bp 15-45, and bp 46-75) were used as competitors of transposase binding to the probe, at 200-fold molar excess. 
(c) DNase I protection assay of the Herves-R end. The single-end-labeled Herves-R 1-100 bp fragment (100 nM) was incubated in the presence (+, ++) or absence (-) of DNase I or the transposase. The ++ indicates 1.4 µM of transposase or 0.5 units of DNase I, whereas + indicates 850 nM of transposase or 0.25 units of DNase I. 32P indicates the end of the probe that was labeled. The solid bars on the sides indicate the region of the probe protected by the transposase. The asterisk (*) indicates hypersensitive sites.: Two transposase-DNA complexes formed with the Herves-R bp 1-30 fragment, compared with a single complex each with the Herves-R bp 15-45 and bp 61-90 fragments, implicating the existence of two transposase binding sites within the Herves-R bp 1-30 fragment and one site within each of the Herves-R bp 15-45 and bp 61-90 fragments (Figure 4a). To determine relative transposase binding preferences, each 30 bp overlapping DNA fragment was allowed to compete against the Herves-R bp 61-90 probe for transposase binding using EMSAs. Fragment bp 15-45 successfully competed against the probe for transposase, whereas the Herves-R bp 1-30 and bp 31-60 fragments had no effect (Figure 4b). These data suggest that the transposase binds strongly to the terminal Herves-R end at positions bp 15-45 and bp 61-90.: We performed DNase I protection assays to identify specific binding motifs in the R end of Herves; however, these were inconclusive, showing only some evidence of protection at bp 23-35 and bp 63-92 (Figure 4c).: Mutational analysis of the Herves transposase binding motif: Because the Herves-L bp 28-48 fragment showed the strongest binding to transposase, a detailed analysis was performed to define the nucleotides critical for binding. We analyzed 22 sequence variants for their ability to compete with the Herves-L bp 28-60 probe for transposase. Each sequence variant differed from the wild-type sequence by a single nucleotide. 
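The naming used for such variants can be illustrated with a small Python sketch that enumerates single-base substitutions. Figure 5 gives G28T as an example of this naming. The study tested a chosen set of 22 variants rather than every possible substitution, and the Herves-L bp 28-60 sequence is not reproduced in this text. For those reasons, the demo below uses the CGATTCAT motif at a hypothetical start coordinate of 1. The function name is illustrative only.

```python
BASES = 'ACGT'


def point_variants(seq, start=1):
    '''All single-nucleotide substitutions of seq, named like G28T.

    start is the element coordinate of seq[0] (e.g. 28 for a bp 28-60 fragment).
    '''
    seq = seq.upper()
    variants = {}
    for offset, ref in enumerate(seq):
        for alt in BASES:
            if alt == ref:
                continue  # skip the wild-type base
            name = f'{ref}{start + offset}{alt}'
            variants[name] = seq[:offset] + alt + seq[offset + 1:]
    return variants


# Illustration only: the motif itself, placed at a hypothetical coordinate of 1.
demo = point_variants('CGATTCAT', start=1)
print(len(demo))       # 24 (8 positions x 3 alternative bases)
print(demo['G2T'])     # CTATTCAT
```

A real design would pass the actual fragment sequence with start=28 and then keep only the substitutions that are chosen for testing.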
An unlabeled wild-type Herves-L bp 28-60 fragment competed successfully against the probe for transposase binding, whereas mutating nucleotides Herves-L bp 32-36 and bp 43-45 abolished this competition, indicating that nucleotides at these positions mediate the binding of transposase (Figure 5a). We identified a conserved binding motif, CG/AATTCAT, in both regions, suggesting that it constitutes the transposase-binding motif. To confirm these results, we simultaneously mutated this putative motif at both locations within the Herves-L bp 28-60 fragment and allowed the mutant (Herves-L 31-47mut) to compete against the wild-type Herves-L bp 28-60 probe. Mutating both sites abolished the interaction, confirming that CG/AATTCAT is the binding site for Herves transposase in the Herves-L end (Figure 5b).: CGATTCAT acts as the transposase binding motif. (a) Electrophoretic mobility shift assay (EMSA) analysis of transposase binding to the Herves-left (L) bp 28-60 probe, with single-nucleotide sequence variants as competitors. For example, G28T indicates that G at position 28 was changed to T in the Herves-L bp 28-60 fragment. The fraction of the transposase-bound probe in each lane was quantified using a phosphoimager. Mutations that have no effect on transposase binding are expected to produce values similar to the specific competitor. (b) The Herves-L 31-47 region is important for binding. The asterisk (*) indicates various protein-DNA complexes. The Herves-L 28-60 bp fragment was used as a probe in the EMSA experiment. The Herves-L 31-47mut carries mutations at every position within bp 31-47. (c) Role of the right (R) terminal inverted repeat (TIR) in transposase binding. The Herves-R bp 1-30 fragment was used as a probe. The Herves-R TIRmut and R 15-22mut fragments consist of the Herves-R bp 1-30 sequence with mutations in the TIR and bp 15-22, respectively. The probe was incubated in the presence (+) or absence (-) of the transposase or competitors. 
Unlabeled homologous and non-homologous fragments were used as specific and non-specific competitors, respectively. The asterisk (*) indicates various protein-DNA complexes.: The CG/AATTCAT motif is conserved between the Herves-L and Herves-R ends: We identified similar potential binding motifs within the Herves-R bp 15-22 and bp 73-86 regions. Furthermore, the bp 1-30 region also contains the R-TIR, a potential candidate for transposase binding. To determine whether the R-TIR or the CG/AATTCAT motif mediated the binding of transposase to the Herves-R bp 1-30 region, we mutated each region (Herves-R TIRmut and Herves-R bp 15-22mut) and subjected them to EMSA. Mutating either potential binding site abolished its ability to compete against the wild-type probe, suggesting that the CG/AATTCAT motif and the R-TIR are both important for transposase binding to the Herves-R end (Figure 5c).: The CGATTCAT motif is sufficient for purified Herves transposase binding: The CG/AATTCAT motif and its derivatives are repeated several times within the transposase binding regions at both the Herves-L and Herves-R ends (Figure 6). To determine whether the CG/AATTCAT motif was sufficient for transposase binding, we used a probe containing four direct repeats of the CGATTCAT sequence to measure relative binding to the Herves transposase. The transposase bound to the (CGATTCAT)4 probe and formed two transposase-DNA complexes (Figure 7). Competition with unlabeled specific and non-specific competitors showed that the interaction was sequence specific.: The CGATTCAT binding motif is sufficient for transposase binding. The probe (CGATTCAT)4 represents four direct repeats of the CGATTCAT sequence motif. Unlabeled (CGATTCAT)4 and Herves-left (L) bp 28-60 were used as specific competitors, whereas ß2 tubulin was used as the non-specific competitor. 
Transposase binding was compared among sequence variants of the binding motif, namely CGATTCTT, CGATTCAC and CGTTCAT (each used as four direct repeats). The asterisk (*) indicates various protein-DNA complexes.: Sequence of Herves-left (L) bp 1-100 and Herves-right (R) bp 1-100 showing sequence repeats. The solid arrow indicates the conserved CGATTCA transposase-binding motif, whereas the dotted arrow indicates the single-nucleotide sequence variants.: We used Herves-L bp 28-60 as a specific competitor against the (CGATTCAT)4 probe and found that it outcompeted the probe for transposase (Figure 7). Furthermore, splitting the CGATTCAT motif in half abolished binding (data not shown). Together, these data indicate that the CGATTCAT motif is sufficient for transposase binding.: We also tested the ability of unlabeled sequence variants of the CGATTCAT motif (CGATTCTT, CGATTCAC and CGTTCAT) to compete against radiolabeled CGATTCAT for transposase binding. None of the sequence variants competed fully with CGATTCAT for the transposase, indicating that CGATTCAT is the strongest binding motif (Figure 7). Nevertheless, CGATTCAC competed partially, suggesting that this variant may also contribute to transposase binding.: Discussion: We purified active Herves transposase and demonstrated that it binds site-specifically to subterminal and terminal sequences at the Herves-L and Herves-R ends, respectively. Such asymmetrical binding may affect transposition frequency. The Drosophila P element transposase has been shown to bind asymmetrically to the P element ends, and interchanging the L end sequence with the R end sequence led to fewer transposition events [38]. This phenomenon also occurs for the Ac element in maize and the Tag1 element in Arabidopsis [28, 31].: There was strong transposase binding to the Herves-L bp 12-48 and bp 28-60 regions and relatively weak binding to bp 48-75, as shown by EMSA and DNase I footprinting. 
None of these fragments, however, outcompeted the L bp 1-100 probe for transposase binding, suggesting that binding is cooperative between two or more regions. Furthermore, the overlapping Herves-L bp 12-48 and bp 28-60 fragments showed similar levels of binding, indicating that the binding motif lies in their overlapping region, Herves-L bp 28-48. In contrast to the L end, binding on the Herves-R end occurred toward the terminal sequences, in regions bp 15-45 and bp 61-90.: EMSA results with the Herves-L bp 28-60 probe and single-nucleotide sequence variants indicated that the CGATTCAT motif, or its derivatives, mediated binding of the transposase. The CGATTCAT transposase-binding motif and its derivatives are repeated and conserved in the Herves-L and Herves-R end sequences. Our results suggest that this motif is important and sufficient for transposase binding, because (1) mutating the CGATTCAT motif at either end abolished binding, and (2) the transposase bound specifically to a synthetic tetramer of the motif.: TEs frequently have multiple transposase-binding sites adjacent to their TIRs [33, 39–42]. In other hAT elements, such as Ac, Tol2 and Tag1, the respective transposases bind to short sequence repeats [31, 33, 34, 43]. For Ac and Tol2, the transposase-binding sequence motifs differ between the L and R ends [33, 34]. The Herves transposase-binding CGATTCAT motif, however, is highly conserved at both ends, with several single-nucleotide variants (CGATTCAC, CGTTCAT and CGATTCTT) also present. Our results suggest that these additional motifs may also mediate transposase binding. Although these derivatives are related to the CGATTCAT motif, their ability to bind transposase differs: the transposase binds strongly to CGATTCAT but only weakly to the CGATTCAC and CGTTCAT motifs. It is also possible that the transposase recognizes only a subset or family of related sequences in which GATTC or ATTCA is the central sequence. 
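The discussion above describes a core motif recurring, with single-nucleotide variants, across both Herves ends. The following minimal Python sketch shows how such occurrences could be located on both strands of a sequence. It is illustrative only and is not the authors' method. It assumes that CG/AATTCAT means C, then G or A, then ATTCAT. The names MOTIFS, reverse_complement and scan_motifs are hypothetical. A hit reports only sequence presence, not binding strength, which only the EMSA competition data address.

```python
import re

# Assumption: 'CG/AATTCAT' is read as C, then G or A, then ATTCAT.
# The other entries are the single-nucleotide variants named in the text;
# a hit here says nothing about binding strength, which only EMSA measures.
MOTIFS = {
    'CG/AATTCAT': 'C[GA]ATTCAT',
    'CGATTCAC': 'CGATTCAC',
    'CGATTCTT': 'CGATTCTT',
    'CGTTCAT': 'CGTTCAT',
}

COMPLEMENT = str.maketrans('ACGTacgt', 'TGCAtgca')


def reverse_complement(seq: str) -> str:
    # Complement each base, then reverse to read the opposite strand 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]


def scan_motifs(seq: str) -> list[tuple[str, str, int]]:
    # Return (motif, strand, start) tuples; starts are 0-based forward-strand coordinates.
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    hits = []
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        finder = re.compile(f'(?=({pattern}))')
        for m in finder.finditer(seq):
            hits.append((name, '+', m.start()))
        for m in finder.finditer(rc):
            # Map a minus-strand hit back to its leftmost forward-strand coordinate.
            hits.append((name, '-', n - m.start() - len(m.group(1))))
    return sorted(hits, key=lambda h: (h[2], h[1], h[0]))


if __name__ == '__main__':
    # The synthetic (CGATTCAT)4 probe described in the text.
    probe = 'CGATTCAT' * 4
    for hit in scan_motifs(probe):
        print(hit)
```

On the (CGATTCAT)4 probe, the scan reports the four direct repeats on the forward strand at positions 0, 8, 16 and 24. The same function could be applied to the Herves-L and Herves-R bp 1-100 sequences, which are not reproduced in this text.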
Similar results have been reported for the Tag1 element, for which the R-TGACCC and L-AAACCC motifs have different affinities for the transposase [31, 43]. The sequences that flank these motifs differ, and although they may not influence transposase binding, they may regulate transposition [31].: We observed no binding to the L-TIR. Several related hAT transposases, such as those of Ac and Tag1, do not bind their L-TIR and R-TIR sequences [33, 43]. This raises the possibility that transposase binding to the L-TIR requires a host factor; however, nuclear extracts from a Drosophila S2 cell line expressing Herves transposase did not bind the L-TIR, making the case for such a factor less compelling. Nevertheless, purified Herves transposase interacted with the R-TIR sequence, and this binding appeared to be cooperative, since both the R-TIR and the CGATTCAT motif at bp 15-22 participated in it.: We have identified the sites within the Herves element to which the Herves transposase binds and shown that it binds asymmetrically to sequences at either end of the element. Future work will determine whether Herves mutants with altered transposase binding show altered transpositional activity in vivo, which could lead to the development of this endogenous TE of Anopheles gambiae as a genetic tool in this medically important mosquito species.: Conclusions: We identified the specific DNA-binding sites of the Herves transposase, a member of the hAT transposon superfamily. The transposase bound asymmetrically to the L and R ends of the Herves transposon: it bound to both subterminal regions but interacted only with the R TIR, not the L TIR. We identified a common subterminal DNA-binding motif (CG/AATTCAT) that is critical and sufficient for Herves transposase binding. 
The asymmetry of transposase binding to the L and R ends may indicate that these ends interact differently with the enzyme during the transposition reaction. The differences in transposase-binding sites among hAT transposases illustrate that this superfamily provides a fascinating diversity with which to study the biology of transposition.: Methods: Plasmid constructions: The Herves ORF was cloned into pBAD myc/HisA (Invitrogen, Carlsbad, CA). A 766-bp fragment of the Herves ORF was amplified using the Herves F-BspHI (GATCAATCATGATGGCTCCAACAAACGCAAC) and Herves R-KpnI (GTTCAAGGTACCTTGAATCCAATTAGCTATATTCTTACC) primers, which introduce a BspHI site (incorporated into the Herves start codon) and a KpnI site, respectively.: The resulting fragment was cloned into NcoI/KpnI-digested pBAD myc/HisA to generate pBADHvPCR1. The remaining Herves ORF (1,118 bp) was amplified using the Herves F-KpnI (CAAGGTACCTTGAACAAATTTGACATAGAGGATAAG) and Herves R-HindIII (TATCAAGCTTTGAACAAATTTGACATAGAGGATAAG) primers and cloned into KpnI/HindIII-digested pBADHvPCR1 to generate pBADHv1.: Herves transposase purification: Herves transposase was purified by His-tag purification as described [47]. pBADHv1-transformed E. coli LMG194 cells were grown overnight at 30°C in LB medium containing carbenicillin (100 mg/ml). The overnight culture was diluted 1:100 in LB with carbenicillin (100 mg/ml) and grown at 30°C and 230 rpm to an absorbance of 0.6 at 600 nm. The cultures were then induced with 0.1% L-arabinose and shaken at 16°C for 18 h. The cells were harvested and washed by centrifugation with binding buffer (0.5 M NaCl, 20 mM Tris-Cl pH 7.9, 10% glycerol, 10 mM imidazole). The cells were lysed by two passes through a French press at 20,000 psi. The cell lysate was cleared by centrifugation and by filtration through 0.45 µm syringe filters. Cleared lysate was loaded onto Sepharose (Amersham/GE Healthcare, Piscataway, NJ) chromatography columns that had been pre-equilibrated with Ni2+. 
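The plasmid-construction step depends on two things: the restriction sites carried by the four primers, and cloning a BspHI-cut fragment into an NcoI-cut vector. The Python sketch below is illustrative and not part of the original protocol. It locates the sites in the primer sequences, copied verbatim from the text, and compares the sticky ends the enzymes leave. The recognition sequences and cut positions are the standard ones for these enzymes, as catalogued for example by REBASE. The helper names are hypothetical, and the overhang logic assumes palindromic recognition sites.

```python
# Standard recognition sites and the number of bases before the top-strand cut.
ENZYMES = {
    'BspHI': ('TCATGA', 1),    # T^CATGA
    'NcoI': ('CCATGG', 1),     # C^CATGG
    'KpnI': ('GGTACC', 5),     # GGTAC^C
    'HindIII': ('AAGCTT', 1),  # A^AGCTT
}

# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    'Herves F-BspHI': 'GATCAATCATGATGGCTCCAACAAACGCAAC',
    'Herves R-KpnI': 'GTTCAAGGTACCTTGAATCCAATTAGCTATATTCTTACC',
    'Herves F-KpnI': 'CAAGGTACCTTGAACAAATTTGACATAGAGGATAAG',
    'Herves R-HindIII': 'TATCAAGCTTTGAACAAATTTGACATAGAGGATAAG',
}


def find_sites(seq: str) -> list[tuple[str, int]]:
    # Report every recognition site as (enzyme, 0-based start), ordered by position.
    seq = seq.upper()
    hits = []
    for name, (site, _cut) in ENZYMES.items():
        start = seq.find(site)
        while start != -1:
            hits.append((name, start))
            start = seq.find(site, start + 1)
    return sorted(hits, key=lambda h: h[1])


def sticky_end(name: str) -> tuple[str, str]:
    # For a palindromic site cut at offset c on the top strand, the bottom
    # strand is cut at len(site) - c; the bases between the two cuts overhang.
    site, cut = ENZYMES[name]
    other = len(site) - cut
    if cut < other:
        return site[cut:other], '5-prime'
    if cut > other:
        return site[other:cut], '3-prime'
    return '', 'blunt'


def compatible(a: str, b: str) -> bool:
    # Palindromic overhangs are self-complementary, so ends from two such
    # enzymes can anneal when overhang sequence and polarity both match.
    return sticky_end(a) == sticky_end(b)


if __name__ == '__main__':
    for label, primer in PRIMERS.items():
        print(label, find_sites(primer))
    print('BspHI/NcoI compatible:', compatible('BspHI', 'NcoI'))
```

Under these assumptions, the scan finds exactly one site per primer: BspHI, KpnI, KpnI and HindIII, respectively. BspHI (T^CATGA) and NcoI (C^CATGG) both leave a 5'-CATG overhang. This is consistent with ligating the BspHI-digested PCR fragment into NcoI-digested pBAD myc/HisA as described.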
The columns were washed with 10 ml binding buffer and 6 ml wash buffer (0.5 M NaCl, 20 mM Tris-Cl pH 7.9, 10% glycerol, 50 mM imidazole). His-tagged Herves was eluted in five 1 ml fractions of elution buffer (0.5 M NaCl, 20 mM Tris-Cl pH 7.9, 10% glycerol, 200 mM imidazole). The purified Herves transposase was dialyzed in a Slide-A-Lyzer dialysis cassette (Thermo Fisher Scientific, Waltham, MA), overnight in dialysis buffer 1 (0.5 M NaCl, 20 mM Tris base, 10% glycerol, pH 8.0) and then for 3 h in dialysis buffer 2 (0.5 M NaCl, 20 mM Tris base, 2 mM dithiothreitol (DTT), 25% glycerol, pH 8.0). The dialyzed, purified Herves transposase was stored at -80°C.: EMSAs: The DNA fragment (100 nM) to be tested for transposase binding was end-labeled using T4 polynucleotide kinase and 32P ATP and purified on a Bio-Spin 30 column (Bio-Rad, Hercules, CA). The labeled DNA fragment (probe) was incubated at 4°C for 45 min with 1× EMSA binding buffer (16 mM Tris pH 8.0, 0.2 µg bovine serum albumin (BSA), 0.4 µg T3 single-stranded oligo, 0.5 µg poly(dI-dC), 1 mM DTT, 150 mM NaCl, 0.25% Triton X) and 850 nM Herves transposase. Where applicable, specific and non-specific DNA fragments were added as specific and non-specific competitors, respectively. The reaction was incubated with the probe for an additional 40 min at 4°C. The non-specific competitors were a 126 bp genomic DNA fragment (E1) that flanks the Hermes TE from Musca domestica and a 30 bp DNA oligo from Aedes aegypti β2 tubulin. The EMSA reaction products were analyzed on a 5% TBE polyacrylamide gel (Bio-Rad).: DNase I protection assay: DNA fragments (100 bp each) from the Herves-L and Herves-R ends, containing an EcoRV restriction site at the L-end or R-end, were cloned into pJET1.2 (Fermentas/Thermo Fisher Scientific, Piscataway, NJ) to generate pL5'EcoRV, pL3'EcoRV, pR5'EcoRV, and pR3'EcoRV. 
The transferred and non-transferred strands from the Herves-L and Herves-R ends were selectively radiolabeled at one end by digesting pL5'EcoRV, pL3'EcoRV, pR5'EcoRV, and pR3'EcoRV with XhoI and EcoRV and labeling them with [32P] dATP using Klenow fragment (NEB, Ipswich, MA). Herves transposase was allowed to bind to 100 nM single-end-labeled DNA fragment (probe) under the same binding conditions as in the EMSA. The optimal transposase concentrations were determined empirically (Additional files 1 and 2). The DNA probe was subjected to DNase I digestion for 2 min at 4°C. The reaction was stopped by adding stop solution (92% ethanol, 0.7 M ammonium acetate, 0.35 µg tRNA) and incubating for 15 min in a dry ice/ethanol bath. DNA was extracted with phenol/chloroform and precipitated with ethanol. The reaction products were analyzed on a 10% denaturing polyacrylamide sequencing gel alongside a nucleotide ladder generated with the DNA sequencing kit 2.0 (USB) (Additional file 3).: References: Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR: Pack-MULE transposable elements mediate gene evolution in plants. Nature. 2004, 431: 569-573. 10.1038/nature02953.: Kidwell MG, Lisch D: Transposable elements as sources of genomic variation. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington, DC: American Society for Microbiology Press, 59-90.: Levis RW, Ganesan R, Houtchens K, Tolar LA, Sheen FM: Transposons in place of telomeric repeats at a Drosophila telomere. Cell. 1993, 75: 1083-1093. 10.1016/0092-8674(93)90318-K.: Hurst GDD, Werren JH: The role of selfish genetic elements in eukaryotic evolution. Nat Rev Genet. 2001, 2: 597-606.: Dimitri P, Junakovic N: Revising the selfish DNA hypothesis: new evidence on accumulation of transposable elements in heterochromatin. Trends Genet. 1999, 15: 123-124. 
10.1016/S0168-9525(99)01711-4.: Hua-Van A, Le Rouzic A, Maisonhaute C, Capy P: Abundance, distribution and dynamics of retrotransposable elements and transposons: similarities and differences. Cytogenet Genome Res. 2005, 110: 426-440. 10.1159/000084975.: Atkinson PW: Genetic engineering in insects of agricultural importance. Insect Biochem Mol Biol. 2002, 32: 1237-1242. 10.1016/S0965-1748(02)00086-3.: Smith RC, Walter MF, Hice RH, O'Brochta DA, Atkinson PW: Testis-specific expression of the β2 tubulin promoter of Aedes aegypti and its application as a genetic sex-separation marker. Insect Mol Biol. 2007, 16: 61-71. 10.1111/j.1365-2583.2006.00701.x.: Grossman GL, Rafferty CS, Clayton JR, Stevens TK, Mukabayire O, Benedict MQ: Germline transformation of the malaria vector, Anopheles gambiae, with the piggyBac transposable element. Insect Mol Biol. 2001, 10: 597-604. 10.1046/j.0962-1075.2001.00299.x.: Catteruccia F, Nolan T, Loukeris TG, Blass C, Savakis C, Kafatos FC, Crisanti A: Stable germline transformation of the malaria mosquito Anopheles stephensi. Nature. 2000, 405: 959-962. 10.1038/35016096.: Jasinskiene N, Coates CJ, Benedict MQ, Cornel AJ, Rafferty CS, James AA, Collins FH: Stable transformation of the yellow fever mosquito, Aedes aegypti, with the Hermes element from the housefly. Proc Natl Acad Sci USA. 1998, 95: 3743-3747. 10.1073/pnas.95.7.3743.: Michel K, Stamenova A, Pinkerton AC, Franz G, Robinson AS, Gariou-Papalexiou A, Zacharopoulou A, O'Brochta DA, Atkinson PW: Hermes-mediated germ-line transformation of the Mediterranean fruit fly, Ceratitis capitata. Insect Mol Biol. 2001, 10: 155-162. 10.1046/j.1365-2583.2001.00250.x.: Coates CJ, Jasinskiene N, Miyashiro L, James AA: Mariner transposition and transformation of the yellow fever mosquito, Aedes aegypti. Proc Natl Acad Sci USA. 1998, 95: 3748-3751. 10.1073/pnas.95.7.3748.: O'Brochta DA, Atkinson PW, Lehane MJ: Transformation of Stomoxys calcitrans with a Hermes gene vector. Insect Mol Biol. 
2000, 9: 531-538. 10.1046/j.1365-2583.2000.00217.x.: Miller LH, Sakai RK, Romans P, Gwadz RW, Kantoff P, Coon HG: Stable integration and expression of a bacterial gene in the mosquito, Anopheles gambiae. Science. 1987, 237: 779-781. 10.1126/science.3039658.: Kim W, Koo H, Richman AM, Seeley D, Vizioli J, Klocko AD, O'Brochta DA: Ectopic expression of a cecropin transgene in the human malaria vector mosquito Anopheles gambiae (Diptera: Culicidae): effects on susceptibility to Plasmodium. J Med Entomol. 2004, 41: 447-455. 10.1603/0022-2585-41.3.447.: Lombardo F, Lycett GJ, Lanfrancotti A, Coluzzi M, Arcà B: Analysis of apyrase 5' upstream region validates improved Anopheles gambiae transformation technique. BMC Res Notes. 2009, 2: 24. 10.1186/1756-0500-2-24.: Meredith JM, Basu S, Nimmo DD, Larget-Thiery I, Warr EL, Underhill A, McArthur CC, Carter V, Hurd H, Bourgouin C, Eggleston P: Site-specific integration and expression of an anti-malarial gene in transgenic Anopheles gambiae significantly reduces Plasmodium infections. PLoS One. 2011, 6: e14587. 10.1371/journal.pone.0014587.: Papathanos PA, Windbichler N, Menichelli M, Burt A, Crisanti A: The vasa regulatory region mediates germline expression and maternal transmission of proteins in the malaria mosquito Anopheles gambiae: a versatile tool for genetic control strategies. BMC Mol Biol. 2009, 10: 65. 10.1186/1471-2199-10-65.: Saito K, Siomi MC: Small RNA-mediated quiescence of transposable elements in animals. Dev Cell. 2010, 19: 687-697. 10.1016/j.devcel.2010.10.011.: Senti KA, Brennecke J: The piRNA pathway: a fly's perspective on the guardian of the genome. Trends Genet. 2010, 26: 499-509. 10.1016/j.tig.2010.08.007.: Arensburger P, Kim YJ, Orsetti J, Aluvihare C, O'Brochta DA, Atkinson PW: An active transposable element, Herves, from the African malaria mosquito Anopheles gambiae. Genetics. 2005, 169: 697-708. 
10.1534/genetics.104.036145.: Subramanian RA, Arensburger P, Atkinson PW, O'Brochta DA: Transposable element dynamics of the hAT element Herves in the human malaria vector Anopheles gambiae s.s. Genetics. 2007, 176: 2477-2487. 10.1534/genetics.107.071811.: Engels WR, Johnson-Schlitz DM, Eggleston WB, Sved J: High-frequency P element loss in Drosophila is homolog dependent. Cell. 1990, 62: 515-525. 10.1016/0092-8674(90)90016-8.: Subramanian RA, Cathcart LA, Krafsur ES, Atkinson PW, O'Brochta DA: Hermes transposon distribution and structure in Musca domestica. J Hered. 2009, 100: 473-480. 10.1093/jhered/esp017.: Galindo MI, Ladevèze V, Lemeunier F, Kalmes R, Periquet G, Pascual L: Spread of autonomous transposable element hobo in the genome of Drosophila melanogaster. Mol Biol Evol. 1995, 12: 723-734.: Craig NL: Mobile DNA: an introduction. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz A. 2002, Washington, DC: ASM Press, 3-11.: Coupland G, Plum C, Chatterjee S, Post A, Starlinger P: Sequences near the termini are required for transposition of the maize transposon Ac in transgenic tobacco plants. Proc Natl Acad Sci USA. 1989, 86: 9385-9388. 10.1073/pnas.86.23.9385.: Li X, Harrell RA, Handler AM, Beam T, Hennessy K, Fraser MJ: piggyBac internal sequences are necessary for efficient transformation of target genomes. Insect Mol Biol. 2005, 14: 17-30. 10.1111/j.1365-2583.2004.00525.x.: Liu D, Wang R, Galli M, Crawford NM: Somatic and germinal excision activities of the Arabidopsis transposon Tag1 are controlled by distinct regulatory sequences within Tag1. Plant Cell. 2001, 13: 1851-1863.: Liu D, Mack A, Wang R, Galli M, Belk J, Ketpura NI, Crawford NM: Functional dissection of the cis-acting sequences of Arabidopsis transposable element Tag1 reveals dissimilar subterminal sequence and minimal spacing requirements for transposition. Genetics. 
2001, 157: 817-830.: Li X, Heinrich JC, Scott MJ: piggyBac-mediated transposition in Drosophila melanogaster: an evaluation of the use of constitutive promoters to control transposase gene expression. Insect Mol Biol. 2001, 10: 447-455.: Kunze R, Starlinger P: The putative transposase of transposable element Ac from Zea mays L. interacts with subterminal sequences of Ac. EMBO J. 1989, 8: 3177-3185.: Urasaki A, Morvan G, Kawakami K: Functional dissection of the Tol2 transposable element identified the minimal cis-sequence and a highly repetitive sequence in the subterminal region essential for transposition. Genetics. 2006, 174: 639-649. 10.1534/genetics.106.060244.: Atkinson H, Chalmers R: Delivering the goods: viral and non-viral gene therapy systems and the inherent limits on cargo DNA and internal sequences. Genetica. 2010, 138: 485-498. 10.1007/s10709-009-9434-3.: Yang G, Nagel DH, Feschotte C, Hancock CN, Wessler SR: Tuned for transposition: molecular determinants underlying the hyperactivity of a Stowaway MITE. Science. 2009, 325: 1391-1394. 10.1126/science.1175688.: Guynet C, Achard A, Hoang BT, Barabas O, Hickman AB, Dyda F, Chandler M: Resetting the site: redirecting integration of an insertion sequence in a predictable way. Mol Cell. 2009, 34: 612-619. 10.1016/j.molcel.2009.05.017.: Mullins MC, Rio DC, Rubin GM: cis-acting DNA sequence requirements for P-element transposition. Genes Dev. 1989, 3: 729-738. 10.1101/gad.3.5.729.: Cristancho MA, Gaitan AL: Isolation, characterization and amplification of simple sequence repeat loci in coffee. Crop Breed Appl Biotechnol. 2008, 8: 321-329.: Craigie R, Mizuuchi K: Site-specific recognition of the bacteriophage Mu ends by the MuA protein. Cell. 1984, 39: 387-394. 10.1016/0092-8674(84)90017-5.: Vos JC, Plasterk RHA: Tc1 transposase of Caenorhabditis elegans is an endonuclease with a bipartite DNA binding domain. EMBO J. 
1994, 13: 6125-6132.: Colloms SD, van Luenen HGAM, Plasterk RHA: DNA binding activities of the Caenorhabditis elegans Tc3 transposase. Nucleic Acids Res. 1994, 22: 5548-5554. 10.1093/nar/22.25.5548.: Mack AM, Crawford NM: The Arabidopsis TAG1 transposase has an N-terminal zinc finger DNA binding domain that recognizes distinct subterminal motifs. Plant Cell. 2001, 13: 2319-2331.: Warren WD, Atkinson PW, O'Brochta DA: The Hermes transposable element from the housefly, Musca domestica, is a short inverted repeat-type element of the hobo, Ac, and Tam3 (hAT) element family. Genet Res Camb. 1994, 64: 87-97. 10.1017/S0016672300032699.: Acknowledgements: This research was supported by PHS grants AI45741 and GM48102 to PWA and DAO, respectively, and by the Interdepartmental Graduate Program in Cell, Molecular and Developmental Biology at the University of California, Riverside.: Author information: Correspondence to Peter W Atkinson.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: ASK performed the experiments and wrote early drafts of the manuscript. RHH provided technical expertise and assisted in experimental design. DAO participated in experimental design and edited the manuscript. PWA conceived the study, directed the experimentation and edited subsequent drafts of the manuscript. All authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: DNase I protection of the Herves right (R) end. Various concentrations of Herves transposase (as indicated) were titrated to determine the optimum concentration for protection assays on the Herves right end. Concentrations higher than 850 nM (such as 1 µM or 1.2 µM) or lower than 850 nM (150 nM, 300 nM and 428 nM) produced non-specific protection of the probe or no protection at all, respectively. 
(a) 100 nM or (b) 50 nM and 100 nM of the single-end-labeled Herves-R 1-100 bp fragment was incubated in the absence (-) or presence of the transposase at the concentrations indicated. 32P indicates the labeled end of the probe. (PDF 256 KB): Additional file 2: DNase I protection of the Herves left (L) end. Various concentrations of Herves transposase (as indicated) were titrated to determine the optimum concentration for protection assays on the Herves left end. Concentrations higher than 850 nM (such as 1 µM or 1.2 µM) or lower than 850 nM (150 nM, 300 nM and 428 nM) produced non-specific protection of the probe or no protection at all, respectively. (a) 50 nM or (b) 100 nM of the single-end-labeled Herves-L 1-100 bp fragment was incubated in the absence (-) or presence of the transposase at the concentrations indicated. 32P indicates the labeled end of the probe. (PDF 126 KB): Additional file 3: Figure 3 with DNA ladder. The panel is identical to that shown on the left of Figure 3 but includes a DNA ladder. The nucleotide positions were determined by the Sanger sequencing reactions shown in lanes G and A. (PDF 95 KB): Cite this article: Kahlon, A.S., Hice, R.H., O'Brochta, D.A. et al. DNA binding activities of the Herves transposase from the mosquito Anopheles gambiae. Mobile DNA 2, 9 (2011). 
https://doi.org/10.1186/1759-8753-2-9: Received: 30 August 2010: Accepted: 20 June 2011: Published: 20 June 2011: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Mobile DNA and the TE-Thrust hypothesis: supporting evidence from the primates" "Keith R Oliver, Wayne K Greene" "Keith R Oliver" "31 May 2011" "Transposable elements (TEs) are increasingly being recognized as powerful facilitators of evolution. We propose the TE-Thrust hypothesis to encompass TE-facilitated processes by which genomes self-engineer coding, regulatory, karyotypic or other genetic changes. Although TEs are occasionally harmful to some individuals, genomic dynamism caused by TEs can be very beneficial to lineages. This can result in differential survival and differential fecundity of lineages. Lineages with an abundant and suitable repertoire of TEs have enhanced evolutionary potential and, if all else is equal, tend to be fecund, resulting in species-rich adaptive radiations, and/or they tend to undergo major evolutionary transitions. Many other mechanisms of genomic change are also important in evolution, and whether the evolutionary potential of TE-Thrust is realized is heavily dependent on environmental and ecological factors. The large contribution of TEs to evolutionary innovation is particularly well documented in the primate lineage. In this paper, we review numerous cases of beneficial TE-caused modifications to the genomes of higher primates, which strongly support our TE-Thrust hypothesis." "Signal Recognition Particle, Human Lineage, Primate Genome, Chimpanzee Genome, Ectopic Recombination" " Mobile DNA and the TE-Thrust hypothesis: supporting evidence from the primates: Keith R Oliver & Wayne K Greene: Mobile DNA volume 2, Article number: 8 (2011): Abstract: Transposable elements (TEs) are increasingly being recognized as powerful facilitators of evolution. We propose the TE-Thrust hypothesis to encompass TE-facilitated processes by which genomes self-engineer coding, regulatory, karyotypic or other genetic changes. 
Although TEs are occasionally harmful to some individuals, genomic dynamism caused by TEs can be very beneficial to lineages. This can result in differential survival and differential fecundity of lineages. Lineages with an abundant and suitable repertoire of TEs have enhanced evolutionary potential and, if all else is equal, tend to be fecund, resulting in species-rich adaptive radiations, and/or they tend to undergo major evolutionary transitions. Many other mechanisms of genomic change are also important in evolution, and whether the evolutionary potential of TE-Thrust is realized is heavily dependent on environmental and ecological factors. The large contribution of TEs to evolutionary innovation is particularly well documented in the primate lineage. In this paper, we review numerous cases of beneficial TE-caused modifications to the genomes of higher primates, which strongly support our TE-Thrust hypothesis.: Introduction: Building on the groundbreaking work of McClintock [1] and numerous others [2–14], we further advanced the proposition of transposable elements (TEs) as powerful facilitators of evolution [15] and now formalise this into 'The TE-Thrust hypothesis'. In this paper, we present a large body of specific evidence in support of this hypothesis, which we suggest may have great explanatory power. We focus mainly on the well-studied higher primate (monkey, ape and human) lineages. We emphasize the part played by the retro-TEs, especially the primate-specific non-autonomous Alu short interspersed element (SINE), together with its requisite autonomous partner long interspersed element (LINE)-1 or L1 (Figure 1A). In addition, both ancient and recent endogenizations of exogenous retroviruses (endogenous retroviruses (ERVs)/solo long terminal repeats (sLTRs)) have been very important in primate evolution (Figure 1A). The Alu element has been particularly instrumental in the evolution of primates by TE-Thrust. 
This suggests that, at least in some mammalian lineages, specific SINE-LINE pairs have a large influence on the trajectory and extent of evolution in the different clades within that lineage.: Summary of the effect of TEs on primate evolution. (A) Transposable elements (TEs) implicated in the generation of primate-specific traits. (B) Types of events mediated by TEs underlying primate-specific traits. Passive events entail TE-mediated duplications, inversions or deletions. (C) Aspects of primate phenotype affected by TEs. Based on the published data shown in Tables 3 to 6.: The TE-Thrust Hypothesis: The ubiquitous, very diverse, and mostly extremely ancient TEs are powerful facilitators of genome evolution, and therefore of phenotypic diversity. TE-Thrust acts to build, sculpt and reformat genomes, either actively by TE transposition and integration (active TE-Thrust), or passively, because after integration, TEs become dispersed homologous sequences that facilitate ectopic DNA recombination (passive TE-Thrust). TEs can cause very significant and/or complex coding, splicing, regulatory and karyotypic changes to genomes, resulting in phenotypes that can adapt well to biotic or environmental challenges, and can often invade new ecological niches. TEs are usually strongly controlled in the soma, where they can be damaging [16, 17], but they are allowed some limited mobility in the germline and early embryo [18–20], where, although they can occasionally be harmful, they can also cause beneficial changes that can become fixed in a population, benefiting the existing lineage, and sometimes generating new lineages.: There is generally no Darwinian selection for individual TEs or TE families, although there may be exceptions, such as the primate-specific Alu SINEs in gene-rich areas [21, 22]. 
Instead, according to the TE-Thrust hypothesis, there is differential survival of those lineages that contain or can acquire suitable germline repertoires of TEs, as these lineages can more readily adapt to environmental or ecological changes, and can potentially undergo, mostly intermittently, fecund radiations. We hypothesize that lineages lacking a suitable repertoire of TEs are, if all else is equal, liable to stasis, possibly becoming 'living fossils' or even becoming extinct.: TE activity is usually intermittent [23–27], with periodic bursts of transposition due to interplay between various cellular controls, various stresses, de novo syntheses, de novo modifications, new infiltrations of DNA-TEs (by horizontal transfer), or new endogenizations of retroviruses. However, the vast majority of viable TEs usually undergo slow mutational decay and become non-viable (incapable of activity), although some superfamilies have remained active for more than 100 Myr. Episodic TE activity and inactivity, together with differential survival of lineages, suggest an explanation for punctuated equilibrium, evolutionary stasis, fecund lineages and adaptive radiations, all found in the fossil record, and for extant 'fossil species' [15, 28].: TE-Thrust is expected to be optimal in lineages in which TEs are active and/or those that possess a high content of homogeneous TEs, both of which can promote genomic dynamism [15]. 
We hypothesize four main modes of TE-Thrust (Table 1), but as these are extremes of continuums, many intermediate modes are possible.: Mode 1: periodically active heterogeneous populations of TEs result in stasis with the potential for intermittent punctuation events.: Mode 2: periodically active homogeneous populations of TEs result in: 1) gradualism as a result of ectopic recombination, if the TE population is large, with the potential for periodic punctuation events, or 2) stasis with the potential for periodic punctuation events if the TE population is small.: Mode 3: non-viable heterogeneous populations of TEs, in the absence of new infiltrations, result in prolonged stasis, which can sometimes result in extinctions and/or 'living fossils'.: Mode 4: non-viable homogeneous populations of TEs, in the absence of new infiltrations, can result in: 1) gradualism as a result of ectopic recombination, if the TE population is large or 2) stasis if the TE population is small.: These modes of TE-Thrust are in agreement with the findings of palaeontologists [29] and some evolutionary biologists [30] that punctuated equilibrium is the most common mode of evolution, but that gradualism and stasis also occur. Many extant 'living fossils' are also known.: We acknowledge that TE-Thrust acts by enhancing evolutionary potential, and whether that potential is actually realized is heavily influenced by environmental, ecological and other factors. Moreover, there are many other 'engines' of evolution besides TE-Thrust, such as point mutation, simple sequence repeats, endosymbiosis, epigenetic modification and whole-genome duplication [31–35], among others. These often complement TE-Thrust; for example, point mutations can endow duplicated or retrotransposed genes with new functions [36, 37]. 
There may also be other 'engines' of evolution, as yet unknown, or hypothesized but unconfirmed.: Higher primate genomes are well suited to TE-Thrust as they possess large homogeneous populations of TEs: Human and other extant higher primate genomes are well endowed with a relatively small repertoire of TEs (Table 2). These TEs, which have been extensively implicated in engineering primate-specific traits (Table 3; Table 4; Table 5; Table 6), are largely relics of an evolutionary history marked by periodic bursts of TE activity [25, 38, 39]. TE activity is presently much reduced, but extant simian lineage genomes remain well suited for passive TE-Thrust, with just two elements, Alu and L1, accounting for over 60% of the total TE DNA sequence [21, 40, 41]. In humans, there are 10 times as many mostly homogeneous class I retro-TEs as there are very heterogeneous class II DNA-TEs [21]. Only L1, Alu, SVA (SINE-R, variable number of tandem repeats (VNTR), Alu) and possibly some ERVs remain active in humans [42].: L1 and the primate-specific Alu predominate in simians [21, 40, 41], and thus strongly contribute to TE-Thrust in this lineage (Figure 1A). The autonomous L1 is almost universal in mammals, whereas the non-autonomous Alu, like most SINEs, is conspicuously lineage-specific; very unusually, it arose de novo from a 7SL RNA-encoding gene. The confinement of Alu to a single mammalian order is typical of younger SINEs, whereas ancient SINEs, or exapted remnants of them, may be detectable across multiple vertebrate classes [43]. Alu possesses additional unusual characteristics: extreme abundance (1.1 million copies, or one every 3 kb on average in the human genome), frequent location in gene-rich regions, and a lack of evolutionary divergence [21, 44]. The relatively high sequence homology among Alus is most easily explained as the result of functional selection helping to prevent mutational drift. 
Thus, Alus have been hypothesized to serve biological functions in their own right, leading to their selection and maintenance in the primate genome [22]. For example, A-to-I RNA editing, which is highly prevalent in the human transcriptome, occurs mainly within Alu elements [45], which would seem to provide primates with a genetic sophistication beyond that of other mammals. Alus may therefore represent not a peculiar, evolutionarily neutral invasion, but rather positively selected functional elements that are resistant to mutational degradation [46]. This has significance for TE-Thrust, as it would greatly prolong the usefulness of Alus as facilitators of evolution within primate lineages.: Other human retro-TEs include the fossil tRNA-derived mammalian-wide interspersed repeat (MIR) SINE, which amplified approximately 130 Mya [21, 47], and the much younger SVA, a non-autonomous composite element partly derived from ERV and Alu sequences, which is specific to the great apes and humans [48]. Like Alus, SVAs are mobilised by L1-encoded enzymes; also like Alus, a typical full-length SVA is GC-rich, and thus constitutes a potential mobile CpG island. Importantly, ERVs are genome builders/modifiers of exogenous origin [49]. Invasion of ERVs seems to be particularly associated with a key mammalian innovation, the placenta (Table 4). The endogenization of retroviruses and the horizontal transfer of DNA-TEs into germlines clearly show that the Weismann Barrier is permeable, contrary to traditional theory.: The DNA-TEs, which comprise just 3% of the human genome, are extremely diverse, but are now completely inactive [21, 50]. 
Although some have been exapted within the simian lineage as functional coding sequences (Table 3; Table 4; Table 5; Table 6), DNA-TEs, it seems, cannot now be a significant factor for TE-Thrust in primates, unless there are new infiltrations.: TE-Thrust influences evolutionary trajectories: A key proposal of our TE-Thrust hypothesis is that TEs can promote the origin of new lineages and drive lineage divergence through the engineering of specific traits. Ancestral TEs shared across very many lineages can, by chance, lead to the delayed generation of traits in one lineage but not in another. For example, more than 100 copies of the ancient amniote-distributed AmnSINE1 are conserved as non-coding elements specifically among mammals [51]. However, because younger SINEs often show narrow lineage specificity, we hypothesize that they (with their partner LINEs) may have a large influence upon the trajectory and outcomes of evolution within clades, as is apparent with the Alu/L1 pair in primates (Figure 1A). Probably not all SINEs are equal in this ability; it seems that some SINEs are more readily mobilised than others and, when mobilised, some are more effective than others at facilitating evolution by TE-Thrust. The extremely abundant primate Alu dimer seems to illustrate this. Whereas the overwhelming majority of SINEs are derived from tRNAs, Alus may have proliferated so successfully because they are derived from the gene encoding 7SL RNA [52], the RNA component of the signal recognition particle (SRP), which localises to ribosomes. Alu RNAs can therefore bind SRP proteins and thus be retained on the ribosome, in position to be retrotransposed by newly synthesized proteins encoded by their partner L1 LINEs [53].: Among the primates, the simians have undergone the greatest evolutionary transitions and radiation. Of the approximately 367 extant primate species, 85% are simians; the remainder are prosimians, which diverged about 63 Mya. 
Significantly, large amplifications of L1, and thus of Alus and other sequences confined to simians, offer a plausible explanation for the lack of innovation in the evolutionary trajectory of the prosimian lineages compared with the simian lineages. Since their divergence from the basal primates, the simians have experienced repeated periods of intense L1 activity, from about 40 Mya to about 12 Mya [54]. The highly active simian L1s were responsible for the very large amplification of younger Alus and of many gene retrocopies [55]. Differential activity of the L1/Alu pair may have driven the trajectory and divergence of the simians, compared with the prosimians. The greater endogenization of some retroviruses in simians compared with prosimians [56] may also have played a part. These events may also explain the larger genome size of the simians compared with prosimians [57].: A significant feature of Alus is their dimeric structure, involving a fusion of two slightly dissimilar arms [58]. This added length and complexity seems to increase their effectiveness as a reservoir of evolutionarily useful DNA sequence or as an inducer of ectopic recombination. It may therefore be no coincidence that simian genomes are well endowed with dimeric Alus. Viable SINEs in the less fecund and less evolutionarily innovative prosimians are heterogeneous, and include the conventional dimeric Alu, Alu-like monomers, Alu/tRNA dimers and tRNA SINEs [59]. This contrasts distinctly with simian SINEs; in simians, viable SINEs are almost entirely dimeric Alus. Thus, both qualitatively and quantitatively, the Alu dimer seems to represent a key example of the power of a SINE to strongly influence evolutionary trajectory.: Although these coincident events cannot by themselves establish cause and effect, distinct Alu subfamilies (AluJ, AluS, AluY) correlate with the divergence of simian lineages [38, 39]. 
The AluJ subfamily was active about 65 Mya, when the simians and the prosimians separated and diverged; the AluS subfamily became active at about 45 Mya, when the Old World monkeys proliferated; and a surge in AluY activity and expansion began about 30 Mya, contemporaneous with the split between apes and Old World monkeys [38, 39]. Thus, periodic expansions of Alu subfamilies in particular seem to correspond temporally with major divergence points in primate evolution. More recent Alu activity may be a factor in the divergence of the human and chimpanzee lineages, with Alus having been three times more active in humans than in chimpanzees [40, 60]. Moreover, at least two new Alu subfamilies (AluYa5 and AluYb8) have amplified specifically within the human genome since the human-chimpanzee split [40, 60, 61].: Passive TE-Thrust mediated by the Alu/L1 pair has also been evident as a force contributing to lineage divergence in the primates. Ectopic recombinations between Alus, in particular, are a frequent cause of lineage-specific deletion, duplication or rearrangement. Comparisons between the human and chimpanzee genomes have revealed the extent to which these elements have passively exerted their effects in the relatively recent evolutionary history of primates. An examination of human-specific Alu recombination-mediated deletion (ARMD) identified 492 ARMD events responsible for the loss of about 400 kb of sequence in the human genome [62]. Likewise, Han et al. [63] reported 663 chimpanzee-specific ARMD events, deleting about 771 kb of genomic sequence, including exonic sequences in six genes. Both studies suggested that ARMD events may have contributed to the genomic and phenotypic diversity between chimpanzees and humans. L1-mediated recombination also seems to be a factor in primate evolution, with Han et al. [64] reporting 50 L1-mediated deletion events in the human and chimpanzee genomes. 
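Because both ARMD studies report an event count and a total amount of deleted sequence, their scale can be compared per event by simple division. The Python sketch below does only that. It assumes the rounded totals quoted above (about 400 kb and about 771 kb) and spreads each total evenly across its events, so it gives an average, not the size distributions reported in [62, 63]. The names ARMD_TOTALS and mean_deletion_bp are illustrative only.

```python
# Approximate totals reported for lineage-specific Alu recombination-mediated
# deletions (ARMD): (number of events, total sequence deleted in bp).
# The totals are rounded ('about 400 kb', 'about 771 kb'), so the means are rough.
ARMD_TOTALS = {
    'human-specific [62]': (492, 400_000),
    'chimpanzee-specific [63]': (663, 771_000),
}


def mean_deletion_bp(events: int, deleted_bp: int) -> float:
    '''Return the average sequence length removed per ARMD event, in bp.'''
    if events <= 0:
        raise ValueError('events must be positive')
    return deleted_bp / events


for lineage, (events, deleted_bp) in ARMD_TOTALS.items():
    mean_bp = mean_deletion_bp(events, deleted_bp)
    print(f'{lineage}: {events} events, ~{mean_bp:.0f} bp deleted per event')
```

On these figures, an average chimpanzee-specific ARMD event removed roughly 1.2 kb, compared with roughly 0.8 kb for a human-specific event. Whether that difference is meaningful could only be judged from the per-event data in the original studies.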
The observed high enrichment of TEs such as Alu at low-copy-repeat junctions indicates that TEs have been an important factor in the generation of segmental duplications, which are uniquely abundant in primate genomes [39]. Such genomic duplications provide a major avenue for genetic innovation by allowing the functional specialization of coding or regulatory sequences. Karyotypic changes are thought to be an important factor in speciation [65]. Major differences between the human and chimpanzee genomes include nine pericentric inversions, and these have also been linked to TE-mediated recombination events [66]. It thus seems that both the active and passive effects of Alu and L1 have greatly facilitated and influenced the trajectory of simian evolution by TE-Thrust. Transfer RNA-derived SINEs, with suitable partner LINEs, probably perform this role in other lineages.: TE-Thrust affects evolutionary trajectory by engineering lineage-specific traits: TEs can generate genetic novelties, and thus specific phenotypic traits, in numerous ways. Besides passively promoting exon, gene or segmental duplications (or deletions) by unequal recombination, or disrupting genes by insertion, TEs can actively contribute to gene structure or regulation via exaptation. On multiple occasions, TEs have been domesticated to provide the raw material for entire genes or novel gene fusions [11]. More frequently, TEs have contributed partially to individual genes through exonization after acquisition of splice sites [67, 68]. Independent exons generated by TEs are often alternatively spliced, and thereby produce novel expressed isoforms that increase the size of the transcriptome [69]. The generation of novel gene sequences during evolution seems to be heavily outweighed by genetic or epigenetic changes in the transcriptional regulation of pre-existing genes [34, 70]. 
Consistent with this, much evidence indicates that a major way in which TEs have functionally modified primate genomes is by actively inserting novel regulatory elements adjacent to genes, thus silencing or enhancing expression levels or changing expression patterns, often in a tissue-specific manner [71–73]. Moreover, because they are highly repetitive and widely scattered, TEs can affect gene expression on a genome-wide scale by acting as distributors of regulatory sequences or CpG islands in a modular form [74]. Many functional binding sites of developmentally important transcription factors have been found to reside on Alu repeats [75]. These include oestrogen receptor-dependent enhancer elements [76] and retinoic acid response elements, which seem to have been seeded next to retinoic acid target genes throughout the primate genome by the AluS subfamily [77]. As a consequence, TEs are able to contribute significantly to the species-specific rewiring of mammalian transcriptional regulatory networks during pre-implantation embryonic development [78]. Similarly, primate-specific ERVs have been implicated in shaping the human p53 transcriptional network [79] and rewiring the core regulatory network of human embryonic stem cells [80].: Certain classes of retro-TEs can actively generate genetic novelty, using their retrotranspositional mechanism to partially or fully duplicate existing cellular genes. Duplication, which has been particularly important in vertebrates, is a crucial aspect of evolution and constitutes the primary means by which organisms evolve new genes [81]. LINEs and SVAs have a propensity to transduce host DNA because of their weak transcriptional termination sites, so that 3' flanking regions are often included in their transcripts. This can lead to gene duplication, exon shuffling or regulatory-element seeding, depending on the nature of the sequence involved [37, 82, 83]. 
Duplication of genes can also occur via the retrotransposition of mRNA transcripts by LINEs. The resulting copies are termed retrocopies; after subsequent useful mutation, they can sometimes evolve into retrogenes with a new, related function. There are reportedly over one thousand transcribed retrogenes in the human genome [84], and about one new retrogene per million years has emerged in the human lineage during the past 63 Myr [26]. Some primate retrogenes seem to have evolved highly beneficial functions, such as GLUD2 [37].: Specific evidence for TE-Thrust: examples of traits engineered by TEs in the higher primates: TEs seem to have heavily influenced the trajectories of primate evolution and contributed to primate characteristics, as the simians in particular have undergone major evolutionary advancements in cognitive ability and physiology (especially reproductive physiology). The advancement and radiation of the simians seem to be due, in part and all else being equal, to exceptionally powerful TE-Thrust, owing to their especially effective Alu dimer, partnered by very active novel L1 families and supplemented by ERVs and LTRs. These have engineered major changes in the genomes of the lineage(s) leading to the simian radiations and major transitions. We identified more than 100 documented instances in which TEs affected individual genes and thus were apparently implicated at a molecular level in the origin of higher primate-specific traits (Table 3; Table 4; Table 5; Table 6). The Alu SINE dominated, being responsible for nearly half of these cases, with ERVs/sLTRs responsible for a third, followed by L1-LINEs at 15% (Figure 1A). Just 2% were due to the young SVAs, and 1% each to ancient MIR SINEs and DNA-TEs. More than half of the observed changes wrought by TEs were regulatory (Figure 1B). 
As discussed below, TEs seem to have influenced four main aspects of the primate phenotype: brain and sensory function, reproductive physiology, immune defence, and metabolic/other (Figure 1C and Table 3; Table 4; Table 5; Table 6). Notably, ERVs, which are often highly transcribed in the germline and placenta [85], were strongly associated with reproductive traits, whereas Alus influenced these four aspects almost equally (Figure 2).: Figure 2. Comparison of aspects of primate phenotype affected by (A) Alu elements and (B) LTR/ERVs. Based on the published data shown in Tables 3 to 6.: Brain and sensory function: The large brain, advanced cognition and enhanced colour vision of higher primates distinguish them from other mammals. The molecular basis of these characteristics remains to be fully defined, but on the evidence already available, TEs (particularly Alus) seem to have contributed substantially via the origination of novel genes and gene isoforms, or via altered gene transcription (Table 3). Most of the neuronal genes affected by TEs are restricted to the apes, and they seem to have roles in synaptic function and plasticity, and hence in learning and memory. These genes include multiple neurotransmitter receptor genes and glutamate dehydrogenase 2 (GLUD2), a retrocopy of GLUD1 that has acquired crucial point mutations. GLUD2 encodes a glutamate dehydrogenase that seems to have increased the cognitive powers of the apes by enhancing neurotransmitter recycling [37]. The cell cycle-related kinase (CCRK) gene represents a good example of how the epigenetic modification of TEs can be mechanistically linked to the transcriptional regulation of nearby genes [86]. In simians, this gene possesses regulatory CpGs contained within a repressor Alu element, and these CpGs are more methylated in the human cerebral cortex than in that of the chimpanzee. Concordantly, CCRK is expressed at higher levels in the human brain [86]. 
TEs may also affect the brain at a somatic level, because embryonic neural progenitor cells have been found to be permissive to L1 activity in humans [87]. This potentially provides a mechanism for increasing neural diversity and individuality. Because our human lineage benefits from a diversity of individual talents as well as shared ones, this phenomenon, if confirmed, could increase the 'fitness' of the human lineage, and is entirely consistent with the concept of differential survival of lineages stated in our TE-Thrust hypothesis.: The trichromatic vision of Old World monkeys and apes greatly enhanced their ability to find fruits and other foods, and probably aided them in group identity. This trait evidently originated in an Alu-mediated gene-duplication event about 40 Mya that subsequently resulted in two separate cone photoreceptor (opsin) genes [36], the tandem OPN1LW and OPN1MW, which are sensitive to long- and medium-wave light, respectively. Most other mammals possess only dichromatic vision.: Reproductive physiology: Compared with other mammals, simian reproduction is characterized by relatively long gestation periods and by a hemochorial-type placenta that has evolved additional refinements to ensure efficient fetal nourishment. Available data suggest that TE-Thrust has contributed much of the uniqueness of the higher primate placenta, which seems to be more invasive than that of other mammals and releases a large number of factors that modify maternal metabolism during pregnancy. These characteristics appear to be due to the generation of novel placental genes and to various TEs having been exapted as regulatory elements to expand or enhance the expression of pre-existing mammalian genes in the primate placenta (Table 4). The growth hormone (GH) gene locus is particularly notable for having undergone rapid evolution in the higher primates compared with most other mammals. 
A crucial aspect of this evolutionary advance was a burst of gene-duplication events in which Alu-mediated recombination is implicated as a driving force [88]. The simians thus possess between five and eight GH gene copies, and these show functional specialization, being expressed in the placenta, where they are thought to influence fetal access to maternal resources during pregnancy [88, 89]. Longer gestation periods in simians were accompanied by adaptations to ensure an adequate oxygen supply. One key event was an L1-mediated duplication of the HBG globin gene in the lineage leading to the higher primates, which generated HBG1 and HBG2 [90]. HBG2 subsequently acquired expression specifically in the simian fetus, where it ensures the high oxygen affinity of fetal blood for more efficient oxygen transfer across the placenta. Old World primates additionally express HBG1 in the fetus, owing to an independent LINE insertion at the beta-globin locus [91]. Thus, the important process of placental gas exchange has been extensively improved by TEs in simians, in contrast to many other mammals, including prosimians, in which fetal and adult haemoglobins are the same.: Two prominent examples of functionally exapted genes whose sequences are entirely TE-derived are syncytin-1 (ERVWE1) and syncytin-2 (ERVFRD-1). Both of these primate-specific genes are derived from ERV envelope (env) genes [92, 93]. The syncytins play a crucial role in simian placental morphogenesis by mediating the development of the fetomaternal interface, which has a fundamental role in allowing the adequate exchange of nutrients and other factors between the maternal bloodstream and the fetus. 
In a remarkable example of convergent evolution, which attests to the importance of this innovation, two ERV env genes, syncytin-A and syncytin-B, independently emerged in the rodent lineage about 20 Mya [94], as did syncytin-Ory1 within the lagomorphs 12–30 Mya, and these exhibit functional characteristics analogous to those of the primate syncytin genes [95]. This example, as well as many others (Table 3; Table 4; Table 5; Table 6), suggests that TE-Thrust may be an important factor in convergent evolution, a phenomenon that can be difficult to explain by traditional theories.: Immune defence: Immune-related genes were probably crucial to the primate lineage by affording protection from potentially lethal infectious diseases. TEs have been reported to contribute to higher primate-restricted transcripts, or to the expression of a wide variety of immunologically relevant genes (Table 5). One example is the insertion of an AluY element into intron 1 of the fucosyltransferase 1 (FUT1) gene in a common ancestor of humans and other apes. This enabled erythrocytic expression of FUT1, and thus of the ABO blood antigens [96], an adaptation linked to the selective pressure exerted by malarial infection [97]. A particularly good example of a primate-specific adaptation that can be accounted for by a TE is the regulation of the cathelicidin antimicrobial peptide (CAMP) gene by the vitamin D pathway. Only simians possess a functional vitamin D response element in the promoter of this gene, derived from the insertion of an AluSx element. This genetic alteration enhances the innate immune response of simians to infection, and potentially counteracts the anti-inflammatory properties of vitamin D [98].: Metabolic/other: TEs seem to underlie a variety of other primate adaptations, particularly those associated with metabolism (Table 6). 
A striking example, related to dietary change, was the switch in expression of certain α-amylase genes (AMY1A, AMY1B and AMY1C) from the pancreas to the salivary glands of Old World primates. This event, which was caused by the genomic insertion of an ERV acting as a tissue-specific promoter [99], facilitated the use of a higher-starch diet in some Old World primates. These included the human lineage, in which starch consumption became increasingly important, as evidenced by the average human having about three times more AMY1 gene copies than chimpanzees [100]. Another example was the loss of a 100 kb genomic region in the gibbons, due to homologous recombination between AluSx sites [101], which left gibbons without the ASIP gene, involved in the regulation of energy metabolism and pigmentation; this loss may help to account for their distinctively low body mass, so beneficial for these highly active arboreal primates.: TE-Thrust and divergence of the human lineage: Human and chimpanzee genomes exhibit discernible differences in TE repertoire, TE activity and TE-mediated recombination events [21, 40, 54, 60–64]. Thus, although nucleotide substitutions in crucial genes are important [31], TE-Thrust is likely to have made a significant contribution to the relatively recent divergence of the human lineage [102, 103]. In support of this, at least eight of the examples listed (Table 3; Table 4; Table 5; Table 6) are unique to humans. A notable example of a human-specific TE-mediated genomic mutation was the disruption of the CMAH gene, which is involved in the synthesis of a common sialic acid (Neu5Gc), by an AluY element over 2 Mya [104]. This may have conferred on human ancestors a survival advantage by decreasing the infectious risk from microbial pathogens known to prefer Neu5Gc as a receptor.: Conclusions: A role for TEs in evolution has long been recognized by many, yet its importance has probably been underestimated. 
Using primates as exemplar lineages, we have assessed specific evidence, and conclude that it points strongly to an instrumental role for TEs, via TE-Thrust, in engineering the divergence of the simian lineage from other mammalian lineages. TEs, particularly Alu SINEs, have essentially acted as a huge primate-restricted stockpile of potential exons and regulatory regions, and have thereby provided the raw material for these evolutionary transitions. Through active TE-Thrust, TEs, including Alu SINEs, L1 LINEs, ERVs and LTRs, have contributed directly to the primate transcriptome and, even more significantly, have provided regulatory elements that alter gene expression patterns. Via passive TE-Thrust, homologous Alu and L1 elements scattered throughout the simian genome have promoted unequal recombination events, leading both to genomic gain, in the form of segmental and gene duplications, and to genomic loss. Collectively, these events seem to have heavily influenced the trajectories of primate evolution and contributed to characteristic primate traits, as the simian clades especially have undergone major evolutionary advancements in cognitive ability and physiology. Although as yet incompletely documented, the evidence presented here supports the hypothesis that TE-Thrust may be a driving force behind numerous advantageous features of higher primates. These very beneficial features apparently include enhanced brain function, superior fetal nourishment, valuable trichromatic colour vision, improved metabolism, and resistance to infectious-disease agents. 
Such large evolutionary benefits to various primate clades, brought about by various TE repertoires, powerfully demonstrate that if TEs are 'junk' DNA, then there is indeed much treasure in the junkyard, and that the TE-Thrust hypothesis could become an important part of some future paradigm shift in evolutionary theory.: Abbreviations: ARMD, Alu recombination-mediated deletion; DNA-TE, DNA transposon; ERV, endogenous retrovirus; L1, LINE-1; LINE, long interspersed nuclear element; LTR, long terminal repeat; MIR, mammalian-wide interspersed repeat; Mya, million years ago; Myr, million years; retro-TE, retrotransposable element; RT, reverse transcriptase; SINE, short interspersed nuclear element; SVA, SINE-VNTR-Alu; TE, transposable element.: References: McClintock B: Controlling elements and the gene. Cold Spring Harb Symp Quant Biol. 1956, 21: 197-216.: Georgiev GP: Mobile genetic elements in animal cells and their biological significance. Eur J Biochem. 1984, 145: 203-220. 10.1111/j.1432-1033.1984.tb08541.x.: Brosius J: Retroposons - seeds of evolution. Science. 1991, 251: 753. 10.1126/science.1990437.: Fedoroff NV: Transposable elements as a molecular evolutionary force. Ann NY Acad Sci. 1999, 870: 251-264. 10.1111/j.1749-6632.1999.tb08886.x.: Kidwell MG, Lisch DR: Perspective: transposable elements, parasitic DNA, and genome evolution. Evolution. 2001, 55: 1-24.: Bowen NJ, Jordan IK: Transposable elements and the evolution of eukaryotic complexity. Curr Issues Mol Biol. 2002, 4: 65-76.: Deininger PL, Moran JV, Batzer MA, Kazazian HH: Mobile elements and mammalian genome evolution. Curr Opin Genet Dev. 2003, 13: 651-658. 10.1016/j.gde.2003.10.013.: Kazazian HH: Mobile elements: drivers of genome evolution. Science. 2004, 303: 1626-1632. 10.1126/science.1089670.: Wessler SR: Eukaryotic transposable elements: teaching old genomes new tricks. The Implicit Genome. Edited by: Caporale LH. 2006, New York: Oxford University Press, 138-165.: Biémont C, Vieira C: Genetics: junk DNA as an evolutionary force. Nature. 2006, 443: 521-524. 
10.1038/443521a.: Volff JN: Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. BioEssays. 2006, 28: 913-922. 10.1002/bies.20452.: Feschotte C, Pritham EJ: DNA transposons and the evolution of eukaryotic genomes. Annu Rev Genet. 2007, 41: 331-368. 10.1146/annurev.genet.40.110405.090448.: Muotri AR, Marchetto MC, Coufal NG, Gage FH: The necessary junk: new functions for transposable elements. Hum Mol Genet. 2007, 16: R159-R167. 10.1093/hmg/ddm196.: Böhne A, Brunet F, Galiana-Arnoux D, Schultheis C, Volff JN: Transposable elements as drivers of genomic and biological diversity in vertebrates. Chromosome Res. 2008, 16: 203-215. 10.1007/s10577-007-1202-6.: Oliver KR, Greene WK: Transposable elements: powerful facilitators of evolution. BioEssays. 2009, 31: 703-714. 10.1002/bies.200800219.: Matzke MA, Mette MF, Aufsatz W, Jakowitsch J, Matzke AJ: Host defenses to parasitic sequences and the evolution of epigenetic control mechanisms. Genetica. 1999, 107: 271-287. 10.1023/A:1003921710672.: Schulz WA, Steinhoff C, Florl AR: Methylation of endogenous human retroelements in health and disease. Curr Top Microbiol Immunol. 2006, 310: 211-250. 10.1007/3-540-31181-5_11.: Dupressoir A, Heidmann T: Germ line-specific expression of intracisternal A-particle retrotransposons in transgenic mice. Mol Cell Biol. 1996, 16: 4495-4503.: Brouha B, Meischl C, Ostertag E, de Boer M, Zhang Y, Neijens H, Roos D, Kazazian HH: Evidence consistent with human L1 retrotransposition in maternal meiosis I. Am J Hum Genet. 2002, 71: 327-336. 10.1086/341722.: van den Hurk JA, Meij IC, Seleme MC, Kano H, Nikopoulos K, Hoefsloot LH, Sistermans EA, de Wijs IJ, Mukhopadhyay A, Plomp AS, de Jong PT, Kazazian HH, Cremers FP: L1 retrotransposition can occur early in human embryonic development. Hum Mol Genet. 2007, 16: 1587-1592. 
10.1093/hmg/ddm108.: Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.: Walters RD, Kugel JF, Goodrich JA: InvAluable junk: the cellular impact and function of Alu and B2 RNAs. IUBMB Life. 2009, 61: 831-837. 10.1002/iub.227.: Haring E, Hagemann S, Pinsker W: Ancient and recent horizontal invasions of Drosophilids by P elements. J Mol Evol. 2000, 51: 577-586.: Gerasimova TI, Matjunina LV, Mizrokhi LJ, Georgiev GP: Successive transposition explosions in Drosophila melanogaster and reverse transpositions of mobile dispersed genetic elements. EMBO J. 1985, 4: 3773-3779.: Kim TM, Hong SJ, Rhyu MG: Periodic explosive expansion of human retroelements associated with the evolution of the hominoid primate. J Korean Med Sci. 2004, 19: 177-185. 10.3346/jkms.2004.19.2.177.: Marques AC, Dupanloup I, Vinckenbosch N, Reymond A, Kaessmann H: Emergence of young human genes after a burst of retroposition in primates. PLoS Biol. 2005, 3: e357. 10.1371/journal.pbio.0030357.: Ray DA, Feschotte C, Pagan HJ, Smith JD, Pritham EJ, Arensburger P, Atkinson PW, Craig NL: Multiple waves of recent DNA transposon activity in the bat, Myotis lucifugus. Genome Res. 2008, 18: 717-728. 10.1101/gr.071886.107.: Zeh DW, Zeh JA, Ishida Y: Transposable elements and an epigenetic basis for punctuated equilibria. BioEssays. 2009, 31: 715-726. 10.1002/bies.200900026.: Gould SJ: The Structure of Evolutionary Theory. 2002, Cambridge: The Belknap Press of Harvard University Press.: Ridley M: Evolution. 
2004, Oxford: Blackwell Science.: Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, Kern AD, Dehay C, Igel H, Ares M, Vanderhaeghen P, Haussler D: An RNA gene expressed during cortical development evolved rapidly in humans. Nature. 2006, 443: 167-172. 10.1038/nature05113.: Kashi Y, King DG: Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 2006, 22: 253-259. 10.1016/j.tig.2006.03.005.: Margulis L, Chapman MJ: Endosymbioses: cyclical and permanent in evolution. Trends Microbiol. 1998, 6: 342-346. 10.1016/S0966-842X(98)01325-0.: Monk M: Epigenetic programming of differential gene expression in development and evolution. Dev Genet. 1995, 17: 188-197. 10.1002/dvg.1020170303.: McLysaght A, Hokamp K, Wolfe KH: Extensive genomic duplication during early chordate evolution. Nat Genet. 2002, 31: 200-204. 10.1038/ng884.: Dulai KS, von Dornum M, Mollon JD, Hunt DM: The evolution of trichromatic color vision by opsin gene duplication in New World and Old World primates. Genome Res. 1999, 9: 629-638.: Burki F, Kaessmann H: Birth and adaptive evolution of a hominoid gene that supports high neurotransmitter flux. Nat Genet. 2004, 36: 1061-1063. 10.1038/ng1431.: Batzer MA, Deininger PL: Alu repeats and human genomic diversity. Nat Rev Genet. 2002, 3: 370-379. 10.1038/nrg798.: Bailey JA, Liu G, Eichler EE: An Alu transposition model for the origin and expansion of human segmental duplications. Am J Hum Genet. 2003, 73: 823-834. 10.1086/378594.: Mikkelsen TS, Hillier LW, Eichler EE, Zody MC, Jaffe DB, Yang SP, Enard W, Hellmann I, Lindblad-Toh K, Altheide TK, Archidiacono N, Bork P, Butler J, Chang JL, Cheng Z, Chinwalla AT, de Jong P, Delehaunty KD, Fronick CC, Fulton LL, Gilad Y, Glusman G, Gnerre S, Graves TA, Hayakawa T, Hayden KE, Huang XQ, Ji HK, Kent WJ, King MC, et al: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437: 69-87. 
10.1038/nature04072.: Gibbs RA, Rogers J, Katze MG, Bumgarner R, Weinstock GM, Mardis ER, Remington KA, Strausberg RL, Venter JC, Wilson RK, Batzer MA, Bustamante CD, Eichler EE, Hahn MW, Hardison RC, Makova KD, Miller W, Milosavljevic A, Palermo RE, Siepel A, Sikela JM, Attaway T, Bell S, Bernard KE, Buhay CJ, Chandrabose MN, Dao M, Davis C, Delehaunty KD, Ding Y, et al: Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007, 316: 222-234.: Mills RE, Bennett EA, Iskow RC, Devine SE: Which transposable elements are active in the human genome? Trends Genet. 2007, 23: 183-191. 10.1016/j.tig.2007.02.006.: Gilbert N, Labuda D: CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs. Proc Natl Acad Sci USA. 1999, 96: 2869-2874. 10.1073/pnas.96.6.2869.: Labuda D, Striker G: Sequence conservation in Alu evolution. Nucleic Acids Res. 1989, 17: 2477-2491. 10.1093/nar/17.7.2477.: Levanon EY, Eisenberg E, Yelin R, Nemzer S, Hallegger M, Shemesh R, Fligelman ZY, Shoshan A, Pollock SR, Sztybel D, Olshansky M, Rechavi G, Jantsch MF: Systematic identification of abundant A-to-I editing sites in the human transcriptome. Nat Biotechnol. 2004, 22: 1001-1005. 10.1038/nbt996.: Mattick JS, Mehler MF: RNA editing, DNA recoding and the evolution of human cognition. Trends Neurosci. 2008, 31: 227-233. 10.1016/j.tins.2008.02.003.: Krull M, Petrusma M, Makalowski W, Brosius J, Schmitz J: Functional persistence of exonized mammalian-wide interspersed repeat elements (MIRs). Genome Res. 2007, 17: 1139-1145. 10.1101/gr.6320607.: Ostertag EM, Goodier JL, Zhang Y, Kazazian HH: SVA elements are nonautonomous retrotransposons that cause disease in humans. Am J Hum Genet. 2003, 73: 1444-1451. 10.1086/380207.: Mayer J, Meese E: Human endogenous retroviruses in the primate lineage and their influence on host genomes. Cytogenet Genome Res. 2005, 110: 448-456. 
10.1159/000084977.: Pace JK, Feschotte C: The evolutionary history of human DNA transposons: evidence for intense activity in the primate lineage. Genome Res. 2007, 17: 422-432. 10.1101/gr.5826307.: Nishihara H, Smit AF, Okada N: Functional non-coding sequences derived from SINEs in the mammalian genome. Genome Res. 2006, 16: 864-874. 10.1101/gr.5255506.: Ullu E, Tschudi C: Alu sequences are processed 7SL RNA genes. Nature. 1984, 312: 171-172. 10.1038/312171a0.: Dewannieux M, Esnault C, Heidmann T: LINE-mediated retrotransposition of marked Alu sequences. Nat Genet. 2003, 35: 41-48. 10.1038/ng1223.: Khan H, Smit A, Boissinot S: Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006, 16: 78-87.: Ohshima K, Hattori M, Yada T, Gojobori T, Sakaki Y, Okada N: Whole-genome screening indicates a possible burst of formation of processed pseudogenes and Alu repeats by particular L1 subfamilies in ancestral primates. Genome Biol. 2003, 4: R74-10.1186/gb-2003-4-11-r74.: Bénit L, Lallemand JB, Casella JF, Philippe H, Heidmann T: ERV-L elements: a family of endogenous retrovirus-like elements active throughout the evolution of mammals. J Virol. 1999, 73: 3301-3308.: Liu G, Zhao S, Bailey JA, Sahinalp SC, Alkan C, Tuzun E, Green ED, Eichler EE: Analysis of primate genomic variation reveals a repeat-driven expansion of the human genome. Genome Res. 2003, 13: 358-368. 10.1101/gr.923303.: Quentin Y: Fusion of a free left Alu monomer and a free right Alu monomer at the origin of the Alu family in the primate genomes. Nucleic Acids Res. 1992, 20: 487-493. 10.1093/nar/20.3.487.: Schmid CW: Does SINE evolution preclude Alu function? Nucleic Acids Res. 1998, 26: 4541-4550. 10.1093/nar/26.20.4541.: Mills RE, Bennett EA, Iskow RC, Luttig CT, Tsui C, Pittard WS, Devine SE: Recently mobilized transposons in the human and chimpanzee genomes. Am J Hum Genet. 2006, 78: 671-679. 
10.1086/501028.: Hedges DJ, Callinan PA, Cordaux R, Xing J, Barnes E, Batzer MA: Differential Alu mobilization and polymorphism among the human and chimpanzee lineages. Genome Res. 2004, 14: 1068-1075. 10.1101/gr.2530404.: Sen SK, Han K, Wang J, Lee J, Wang H, Callinan PA, Dyer M, Cordaux R, Liang P, Batzer MA: Human genomic deletions mediated by recombination between Alu elements. Am J Hum Genet. 2006, 79: 41-53. 10.1086/504600.: Han K, Lee J, Meyer TJ, Wang J, Sen SK, Srikanta D, Liang P, Batzer MA: Alu recombination-mediated structural deletions in the chimpanzee genome. PLoS Genet. 2007, 3: 1939-1949.: Han K, Sen SK, Wang J, Callinan PA, Lee J, Cordaux R, Liang P, Batzer MA: Genomic rearrangements by LINE-1 insertion-mediated deletion in the human and chimpanzee lineages. Nucleic Acids Res. 2005, 33: 4040-4052. 10.1093/nar/gki718.: Rieseberg LH: Chromosomal rearrangements and speciation. Trends Ecol Evol. 2001, 16: 351-358. 10.1016/S0169-5347(01)02187-5.: Kehrer-Sawatzki H, Sandig C, Chuzhanova N, Goidts V, Szamalek JM, Tänzer S, Müller S, Platzer M, Cooper DN, Hameister H: Breakpoint analysis of the pericentric inversion distinguishing human chromosome 4 from the homologous chromosome in the chimpanzee (Pan troglodytes). Hum Mutat. 2005, 25: 45-55. 10.1002/humu.20116.: Sela N, Mersch B, Hotz-Wagenblatt A, Ast G: Characteristics of transposable element exonization within human and mouse. PLoS One. 2010, 5: e10907-10.1371/journal.pone.0010907.: Shen S, Lin L, Cai JJ, Jiang P, Kenkel EJ, Stroik MR, Sato S, Davidson BL, Xing Y: Widespread establishment and regulatory impact of Alu exons in human genes. Proc Natl Acad Sci USA.: Sorek R, Ast G, Graur D: Alu-containing exons are alternatively spliced. Genome Res. 2002, 12: 1060-1067. 10.1101/gr.229302.: Carroll SB: Evolution at two levels: on genes and form. PLoS Biol. 
2005, 3: e245-10.1371/journal.pbio.0030245.: Nigumann P, Redik K, Mätlik K, Speek M: Many human genes are transcribed from the antisense promoter of L1 retrotransposon. Genomics. 2002, 79: 628-634. 10.1006/geno.2002.6758.: Jordan IK, Rogozin IB, Glazko GV, Koonin EV: Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet. 2003, 19: 68-10.1016/S0168-9525(02)00006-9.: van de Lagemaat LN, Landry JR, Mager DL, Medstrand P: Transposable elements in mammals promote regulatory variation and diversification of genes with specialized functions. Trends Genet. 2003, 19: 530-536. 10.1016/j.tig.2003.08.004.: Feschotte C: Transposable elements and the evolution of regulatory networks. Nat Rev Genet. 2008, 9: 397-405. 10.1038/nrg2337.: Polak P, Domany E: Alu elements contain many binding sites for transcription factors and may play a role in regulation of developmental processes. BMC Genomics. 2006, 7: 133-10.1186/1471-2164-7-133.: Norris J, Fan D, Aleman C, Marks JR, Futreal PA, Wiseman RW, Iglehart JD, Deininger PL, McDonnell DP: Identification of a new subclass of Alu DNA repeats which can function as estrogen receptor-dependent transcriptional enhancers. J Biol Chem. 1995, 270: 22777-22782. 10.1074/jbc.270.39.22777.: Vansant G, Reynolds WF: The consensus sequence of a major Alu subfamily contains a functional retinoic acid response element. Proc Natl Acad Sci USA. 1995, 92: 8229-8233. 10.1073/pnas.92.18.8229.: Xie D, Chen CC, Ptaszek LM, Xiao S, Cao X, Fang F, Ng HH, Lewin HA, Cowan C, Zhong S: Rewirable gene regulatory networks in the preimplantation embryonic development of three mammalian species. Genome Res. 2010, 20: 804-815. 10.1101/gr.100594.109.: Wang T, Zeng J, Lowe CB, Sellers RG, Salama SR, Yang M, Burgess SM, Brachmann RK, Haussler D: Species-specific endogenous retroviruses shape the transcriptional network of the human tumor suppressor protein p53. Proc Natl Acad Sci USA. 2007, 104: 18613-18618. 
10.1073/pnas.0703637104.: Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, Ng HH, Bourque G: Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010, 42: 631-634. 10.1038/ng.600.: Ohno S: Evolution by Gene Duplication. 1970, New York: Springer-Verlag: Moran JV, DeBerardinis RJ, Kazazian HH: Exon shuffling by L1 retrotransposition. Science. 1999, 283: 1530-1534. 10.1126/science.283.5407.1530.: Goodier JL, Ostertag EM, Kazazian HH: Transduction of 3'-flanking sequences is common in L1 retrotransposition. Hum Mol Genet. 2000, 9: 653-657. 10.1093/hmg/9.4.653.: Vinckenbosch N, Dupanloup I, Kaessmann H: Evolutionary fate of retroposed gene copies in the human genome. Proc Natl Acad Sci USA. 2006, 103: 3220-3225. 10.1073/pnas.0511307103.: Prudhomme S, Bonnaud B, Mallet F: Endogenous retroviruses and animal reproduction. Cytogenet Genome Res. 2005, 110: 353-364. 10.1159/000084967.: Farcas R, Schneider E, Frauenknecht K, Kondova I, Bontrop R, Bohl J, Navarro B, Metzler M, Zischler H, Zechner U, Daser A, Haaf T: Differences in DNA methylation patterns and expression of the CCRK gene in human and nonhuman primate cortices. Mol Biol Evol. 2009, 26: 1379-1389. 10.1093/molbev/msp046.: Coufal NG, Garcia-Perez JL, Peng GE, Yeo GW, Mu Y, Lovci MT, Morell M, O'Shea KS, Moran JV, Gage FH: L1 retrotransposition in human neural progenitor cells. Nature. 2009, 460: 1127-1131. 10.1038/nature08248.: De Mendoza A, Escobedo DE, Dávila IM, Saldaña H: Expansion and divergence of the GH locus between spider monkey and chimpanzee. Gene. 2004, 336: 185-193. 10.1016/j.gene.2004.03.034.: Lacroix MC, Guibourdenche J, Frendo JL, Muller F, Evain-Brion D: Human placental growth hormone-a review. Placenta. 2002, 23: S87-94.: Fitch DH, Bailey WJ, Tagle DA, Goodman M, Sieu L, Slightom JL: Duplication of the gamma-globin gene mediated by L1 long interspersed repetitive elements in an early ancestor of simian primates. Proc Natl Acad Sci USA. 
1991, 88: 7396-7400. 10.1073/pnas.88.16.7396.: Johnson RM, Prychitko T, Gumucio D, Wildman DE, Uddin M, Goodman M: Phylogenetic comparisons suggest that distance from the locus control region guides developmental expression of primate beta-type globin genes. Proc Natl Acad Sci USA. 2006, 103: 3186-3191. 10.1073/pnas.0511347103.: Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, LaVallie E, Tang XY, Edouard P, Howes S, Keith JC, McCoy JM: Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature. 2000, 403: 785-789. 10.1038/35001608.: Blaise S, de Parseval N, Bénit L, Heidmann T: Genomewide screening for fusogenic human endogenous retrovirus envelopes identifies syncytin 2, a gene conserved on primate evolution. Proc Natl Acad Sci USA. 2003, 100: 13013-13018. 10.1073/pnas.2132646100.: Dupressoir A, Marceau G, Vernochet C, Bénit L, Kanellopoulos C, Sapin V, Heidmann T: Syncytin-A and syncytin-B, two fusogenic placenta-specific murine envelope genes of retroviral origin conserved in Muridae. Proc Natl Acad Sci USA. 2005, 102: 725-730. 10.1073/pnas.0406509102.: Heidmann O, Vernochet C, Dupressoir A, Heidmann T: Identification of an endogenous retroviral envelope gene with fusogenic activity and placenta-specific expression in the rabbit: a new \"syncytin\" in a third order of mammals. Retrovirology. 2009, 6: 107-10.1186/1742-4690-6-107.: Apoil PA, Roubinet F, Despiau S, Mollicone R, Oriol R, Blancher A: Evolution of alpha 2-fucosyltransferase genes in primates: relation between an intronic Alu-Y element and red cell expression of ABH antigens. Mol Biol Evol. 2000, 17: 337-351.: Cserti CM, Dzik WH: The ABO blood group system and Plasmodium falciparum malaria. Blood. 2007, 110: 2250-2258. 10.1182/blood-2007-03-077602.: Gombart AF, Saito T, Koeffler HP: Exaptation of an ancient Alu short interspersed element provides a highly conserved vitamin D-mediated innate immune response in humans and primates. BMC Genomics. 
2009, 10: 321-10.1186/1471-2164-10-321.: Ting CN, Rosenberg MP, Snow CM, Samuelson LC, Meisler MH: Endogenous retroviral sequences are required for tissue-specific expression of a human salivary amylase gene. Genes Dev. 1992, 6: 1457-1465. 10.1101/gad.6.8.1457.: Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J, Villanea FA, Mountain JL, Misra R, Carter NP, Lee C, Stone AC: Diet and the evolution of human amylase gene copy number variation. Nat Genet. 2007, 39: 1256-1260. 10.1038/ng2123.: Nakayama K, Ishida T: Alu-mediated 100-kb deletion in the primate genome: the loss of the agouti signaling protein gene in the lesser apes. Genome Res. 2006, 16: 485-490. 10.1101/gr.4763906.: Cordaux R, Batzer MA: The impact of retrotransposons on human genome evolution. Nat Rev Genet. 2009, 10: 691-703. 10.1038/nrg2640.: Britten RJ: Transposable element insertions have strongly affected human evolution. Proc Natl Acad Sci USA. 2010, 107: 19945-19948. 10.1073/pnas.1014330107.: Hayakawa T, Satta Y, Gagneux P, Varki A, Takahata N: Alu-mediated inactivation of the human CMP-N-acetylneuraminic acid hydroxylase gene. Proc Natl Acad Sci USA. 2001, 98: 11399-11404. 10.1073/pnas.191268198.: Parrott AM, Mathews MB: snaR genes: recent descendants of Alu involved in the evolution of chorionic gonadotropins. Cold Spring Harb Symp Quant Biol. 2009, 74: 363-373. 10.1101/sqb.2009.74.038.: Watson JB, Sutcliffe JG: Primate brain-specific cytoplasmic transcript of the Alu repeat family. Mol Cell Biol. 1987, 7: 3324-3327.: Li CY, Zhang Y, Wang Z, Zhang Y, Cao C, Zhang PW, Lu SJ, Li XM, Yu Q, Zheng X, Du Q, Uhl GR, Liu QR, Wei L: A human-specific de novo protein-coding gene associated with human brain functions. PLoS Comput Biol. 2010, 6: e1000734-10.1371/journal.pcbi.1000734.: Cordaux R, Udit S, Batzer MA, Feschotte C: Birth of a chimeric primate gene by capture of the transposase gene from a mobile element. Proc Natl Acad Sci USA. 2006, 103: 8101-8106. 
10.1073/pnas.0601161103.: Mola G, Vela E, Fernández-Figueras MT, Isamat M, Muñoz-Mármol AM: Exonization of Alu-generated splice variants in the survivin gene of human and non-human primates. J Mol Biol. 2007, 366: 1055-1063. 10.1016/j.jmb.2006.11.089.: Lai F, Chen CX, Carter KC, Nishikura K: Editing of glutamate receptor B subunit ion channel RNAs by four alternatively spliced DRADA2 double-stranded RNA adenosine deaminases. Mol Cell Biol. 1997, 17: 2413-2424.: Rodriguez IR, Mazuruk K, Schoen TJ, Chader GJ: Structural analysis of the human hydroxyindole-O-methyltransferase gene. Presence of two distinct promoters. J Biol Chem. 1994, 269: 31969-31977.: Fornasari D, Battaglioli E, Flora A, Terzano S, Clementi F: Structural and functional characterization of the human alpha3 nicotinic subunit gene promoter. Mol Pharmacol. 1997, 51: 250-261.: Ebihara M, Ohba H, Ohno SI, Yoshikawa T: Genomic organization and promoter analysis of the human nicotinic acetylcholine receptor alpha6 subunit (CHNRA6) gene: Alu and other elements direct transcriptional repression. Gene. 2002, 298: 101-108. 10.1016/S0378-1119(02)00925-3.: Romanish MT, Nakamura H, Lai CB, Wang Y, Mager DL: A novel protein isoform of the multicopy human NAIP gene derives from intragenic Alu SINE promoters. PLoS One. 2009, 4: e5761-10.1371/journal.pone.0005761.: Kjeldbjerg AL, Villesen P, Aagaard L, Pedersen FS: Gene conversion and purifying selection of a placenta-specific ERV-V envelope gene during simian evolution. BMC Evol Biol. 2008, 8: 266-10.1186/1471-2148-8-266.: Larsson E, Andersson AC, Nilsson BO: Expression of an endogenous retrovirus (ERV3 HERV-R) in human reproductive and embryonic tissues - evidence for a function for envelope gene products. Ups J Med Sci. 1994, 99: 113-120. 10.3109/03009739409179354.: Hsu DW, Lin MJ, Lee TL, Wen SC, Chen X, Shen CK: Two major forms of DNA (cytosine-5) methyltransferase in human somatic tissues. Proc Natl Acad Sci USA. 1999, 96: 9751-9756. 
10.1073/pnas.96.17.9751.: Damert A, Löwer J, Löwer R: Leptin receptor isoform 219.1: an example of protein evolution by LINE-1-mediated human-specific retrotransposition of a coding SVA element. Mol Biol Evol. 2004, 21: 647-651. 10.1093/molbev/msh056.: Piriyapongsa J, Polavarapu N, Borodovsky M, McDonald J: Exonization of the LTR transposable elements in human genome. BMC Genomics. 2007, 8: 291-10.1186/1471-2164-8-291.: Huh JW, Kim TH, Yi JM, Park ES, Kim WY, Sin HS, Kim DS, Min DS, Kim SS, Kim CB, Hyun BH, Kang SK, Jung JS, Lee WH, Takenaka O, Kim HS: Molecular evolution of the periphilin gene in relation to human endogenous retrovirus m element. J Mol Evol. 2006, 62: 730-737. 10.1007/s00239-005-0109-0.: Komiyama H, Aoki A, Tanaka S, Maekawa H, Kato Y, Wada R, Maekawa T, Tamura M, Shiroishi T: Alu-derived cis-element regulates tumorigenesis-dependent gastric expression of GASDERMIN B (GSDMB). Genes Genet Syst. 2010, 85: 75-83. 10.1266/ggs.85.75.: Cohen CJ, Lock WM, Mager DL: Endogenous retroviral LTRs as promoters for human genes: a critical assessment. Gene. 2009, 448: 105-114. 10.1016/j.gene.2009.06.020.: Bièche I, Laurent A, Laurendeau I, Duret L, Giovangrandi Y, Frendo JL, Olivi M, Fausser JL, Evain-Brion D, Vidaud M: Placenta-specific INSL4 expression is mediated by a human endogenous retrovirus element. Biol Reprod. 2003, 68: 1422-1429.: Dunn CA, Romanish MT, Gutierrez LE, van de Lagemaat LN, Mager DL: Transcription of two human genes from a bidirectional endogenous retrovirus promoter. Gene. 2006, 366: 335-342. 10.1016/j.gene.2005.09.003.: Scofield MA, Xiong W, Haas MJ, Zeng Y, Cox GS: Sequence analysis of the human glycoprotein hormone alpha-subunit gene 5'-flanking DNA and identification of a potential regulatory element as an Alu repetitive sequence. Biochim Biophys Acta. 
2000, 1493: 302-318.: Wu J, Grindlay GJ, Bushel P, Mendelsohn L, Allan M: Negative regulation of the human epsilon-globin gene by transcriptional interference: role of an Alu repetitive element. Mol Cell Biol. 1990, 10: 1209-1216.: Trujillo MA, Sakagashira M, Eberhardt NL: The human growth hormone gene contains a silencer embedded within an Alu repeat in the 3'-flanking region. Mol Endocrinol. 2006, 20: 2559-2575. 10.1210/me.2006-0147.: Hewitt SM, Fraizer GC, Saunders GF: Transcriptional silencer of the Wilms' tumor gene WT1 contains an Alu repeat. J Biol Chem. 1995, 270: 17908-17912. 10.1074/jbc.270.30.17908.: Bi S, Gavrilova O, Gong DW, Mason MM, Reitman M: Identification of a placental enhancer for the human leptin gene. J Biol Chem. 1997, 272: 30583-30588. 10.1074/jbc.272.48.30583.: Wheelan SJ, Aizawa Y, Han JS, Boeke JD: Gene-breaking: a new paradigm for human retrotransposon-mediated gene evolution. Genome Res. 2005, 15: 1073-1078. 10.1101/gr.3688905.: Huh JW, Kim YH, Lee SR, Kim H, Kim DS, Kim HS, Kang HS, Chang KT: Gain of new exons and promoters by lineage-specific transposable elements-integration and conservation event on CHRM3 gene. Mol Cells. 2009, 28: 111-117. 10.1007/s10059-009-0106-z.: Mätlik K, Redik K, Speek M: L1 antisense promoter drives tissue-specific transcription of human genes. J Biomed Biotechnol. 2006, 2006: 71753-: Romanish MT, Lock WM, van de Lagemaat LN, Dunn CA, Mager DL: Repeated recruitment of LTR retrotransposons as promoters by the anti-apoptotic locus NAIP during mammalian evolution. PLoS Genet. 2007, 3: e10-10.1371/journal.pgen.0030010.: Medstrand P, Landry JR, Mager DL: Long terminal repeats are used as alternative promoters for the endothelin B receptor and apolipoprotein C-I genes in humans. J Biol Chem. 2001, 276: 1896-1903. 
10.1074/jbc.M006557200.: Schulte AM, Lai S, Kurtz A, Czubayko F, Riegel AT, Wellstein A: Human trophoblast and choriocarcinoma expression of the growth factor pleiotrophin attributable to germ-line insertion of an endogenous retrovirus. Proc Natl Acad Sci USA. 1996, 93: 14759-14764. 10.1073/pnas.93.25.14759.: Landry JR, Rouhi A, Medstrand P, Mager DL: The Opitz syndrome gene Mid1 is transcribed from a human endogenous retroviral promoter. Mol Biol Evol. 2002, 19: 1934-1942.: Huh JW, Ha HS, Kim DS, Kim HS: Placenta-restricted expression of LTR-derived NOS3. Placenta. 2008, 29: 602-608. 10.1016/j.placenta.2008.04.002.: Sin HS, Huh JW, Kim DS, Kang DW, Min DS, Kim TH, Ha HS, Kim HH, Lee SY, Kim HS: Transcriptional control of the HERV-H LTR element of the GSDML gene in human tissues and cancer cells. Arch Virol. 2006, 151: 1985-1994. 10.1007/s00705-006-0764-5.: Xing J, Wang H, Belancio VP, Cordaux R, Deininger PL, Batzer MA: Emergence of primate genes by retrotransposon-mediated sequence transduction. Proc Natl Acad Sci USA. 2006, 103: 17608-17613. 10.1073/pnas.0603224103.: Lee Y, Ise T, Ha D, Saint Fleur A, Hahn Y, Liu XF, Nagata S, Lee B, Bera TK, Pastan I: Evolution and expression of chimeric POTE-actin genes in the human genome. Proc Natl Acad Sci USA. 2006, 103: 17885-17890. 10.1073/pnas.0608344103.: Babushok DV, Ohshima K, Ostertag EM, Chen X, Wang Y, Mandal PK, Okada N, Abrams CS, Kazazian HH: A novel testis ubiquitin-binding protein gene arose by exon shuffling in hominoids. Genome Res. 2007, 17: 1129-1138. 10.1101/gr.6252107.: Lahn BT, Page DC: Retroposition of autosomal mRNA yielded testis-specific gene family on human Y chromosome. Nat Genet. 1999, 21: 429-433. 10.1038/7771.: Betrán E, Long M: Expansion of genome coding regions by acquisition of new genes. Genetica. 2002, 115: 65-80. 10.1023/A:1016024131097.: Zhang R, Wang YQ, Su B: Molecular evolution of a primate-specific microRNA family. Mol Biol Evol. 2008, 25: 1493-1502. 
10.1093/molbev/msn094.: Than NG, Romero R, Goodman M, Weckle A, Xing J, Dong Z, Xu Y, Tarquini F, Szilagyi A, Gal P, Hou Z, Tarca AL, Kim CJ, Kim JS, Haidarian S, Uddin M, Bohn H, Benirschke K, Santolaya-Forgas J, Grossman LI, Erez O, Hassan SS, Zavodszky P, Papp Z, Wildman DE: A primate subfamily of galectins expressed at the maternal-fetal interface that promote immune cell death. Proc Natl Acad Sci USA. 2009, 106: 9731-9736. 10.1073/pnas.0903568106.: Caras IW, Davitz MA, Rhee L, Weddell G, Martin DW, Nussenzweig V: Cloning of decay-accelerating factor suggests novel use of splicing to generate two proteins. Nature. 1987, 325: 545-549. 10.1038/325545a0.: Singer SS, Männel DN, Hehlgans T, Brosius J, Schmitz J: From \"junk\" to gene: curriculum vitae of a primate receptor isoform gene. J Mol Biol. 2004, 341: 883-886. 10.1016/j.jmb.2004.06.070.: Bekpen C, Marques-Bonet T, Alkan C, Antonacci F, Leogrande MB, Ventura M, Kidd JM, Siswara P, Howard JC, Eichler EE: Death and resurrection of the human IRGM gene. PLoS Genet. 2009, 5: e1000403-10.1371/journal.pgen.1000403.: Thomson SJ, Goh FG, Banks H, Krausgruber T, Kotenko SV, Foxwell BM, Udalova IA: The role of transposable elements in the regulation of IFN-lambda1 gene expression. Proc Natl Acad Sci USA. 2009, 106: 11564-11569. 10.1073/pnas.0904477106.: Brini AT, Lee GM, Kinet JP: Involvement of Alu sequences in the cell-specific regulation of transcription of the gamma chain of Fc and T cell receptors. J Biol Chem. 1993, 268: 1355-1361.: Hambor JE, Mennone J, Coon ME, Hanke JH, Kavathas P: Identification and characterization of an Alu-containing, T-cell-specific enhancer located in the last intron of the human CD8 alpha gene. Mol Cell Biol. 1993, 13: 7056-7070.: Dunn CA, Medstrand P, Mager DL: An endogenous retroviral long terminal repeat is the dominant promoter for human beta 1,3-galactosyltransferase 5 in the colon. Proc Natl Acad Sci USA. 2003, 100: 12841-12846. 
10.1073/pnas.2134464100.: Gerlo S, Davis JR, Mager DL, Kooijman R: Prolactin in man: a tale of two promoters. Bioessays. 2006, 28: 1051-1055. 10.1002/bies.20468.: Piedrafita FJ, Molander RB, Vansant G, Orlova EA, Pfahl M, Reynolds WF: An Alu element in the myeloperoxidase promoter contains a composite SP1-thyroid hormone-retinoic acid response element. J Biol Chem. 1996, 271: 14412-14420. 10.1074/jbc.271.24.14412.: Ackerman H, Udalova I, Hull J, Kwiatkowski D: Evolution of a polymorphic regulatory element in interferon-gamma through transposition and mutation. Mol Biol Evol. 2002, 19: 884-890.: Hess JF, Fox M, Schmid C, Shen CK: Molecular evolution of the human adult alpha-globin-like gene region: insertion and deletion of Alu family repeats and non-Alu DNA sequences. Proc Natl Acad Sci USA. 1983, 80: 5970-5974. 10.1073/pnas.80.19.5970.: Huh JW, Kim DS, Ha HS, Lee JR, Kim YJ, Ahn K, Lee SR, Chang KT, Kim HS: Cooperative exonization of MaLR and AluJo elements contributed an alternative promoter and novel splice variants of RNF19. Gene. 2008, 424: 63-70. 10.1016/j.gene.2008.07.030.: Wu M, Li L, Sun Z: Transposable element fragments in protein-coding regions and their contributions to human functional proteins. Gene. 2007, 401: 165-171. 10.1016/j.gene.2007.07.012.: Yi P, Zhang W, Zhai Z, Miao L, Wang Y, Wu M: Bcl-rambo beta, a special splicing variant with an insertion of an Alu-like cassette, promotes etoposide- and Taxol-induced cell death. FEBS Lett. 2003, 534: 61-68. 10.1016/S0014-5793(02)03778-X.: Lee JR, Huh JW, Kim DS, Ha HS, Ahn K, Kim YJ, Chang KT, Kim HS: Lineage specific evolutionary events on SFTPB gene: Alu recombination-mediated deletion (ARMD), exonization, and alternative splicing events. Gene. 2009, 435: 29-35. 10.1016/j.gene.2009.01.008.: Landry JR, Medstrand P, Mager DL: Repetitive elements in the 5' untranslated region of a human zinc-finger gene modulate transcription and translation efficiency. Genomics. 2001, 76: 110-116. 
10.1006/geno.2001.6604.: Le Goff W, Guerin M, Chapman MJ, Thillet J: A CYP7A promoter binding factor site and Alu repeat in the distal promoter region are implicated in regulation of human CETP gene expression. J Lipid Res. 2003, 44: 902-910. 10.1194/jlr.M200423-JLR200.: Shephard EA, Chandan P, Stevanovic-Walker M, Edwards M, Phillips IR: Alternative promoters and repetitive DNA elements define the species-dependent tissue-specific expression of the FMO1 genes of human and mouse. Biochem J. 2007, 406: 491-499. 10.1042/BJ20070523.: McHaffie GS, Ralston SH: Origin of a negative calcium response element in an ALU-repeat: implications for regulation of gene expression by extracellular calcium. Bone. 1995, 17: 11-14. 10.1016/8756-3282(95)00131-V.: Reinton N, Haugen TB, Orstavik S, Skålhegg BS, Hansson V, Jahnsen T, Taskén K: The gene encoding the C gamma catalytic subunit of cAMP-dependent protein kinase is a transcribed retroposon. Genomics. 1998, 49: 290-297. 10.1006/geno.1998.5240.: Jin H, Selfe J, Whitehouse C, Morris JR, Solomon E, Roberts RG: Structural evolution of the BRCA1 genomic region in primates. Genomics. 2004, 84: 1071-1082. 10.1016/j.ygeno.2004.08.019.: Szabó Z, Levi-Minzi SA, Christiano AM, Struminger C, Stoneking M, Batzer MA, Boyd CD: Sequential loss of two neighboring exons of the tropoelastin gene during primate evolution. J Mol Evol. 1999, 49: 664-671. 10.1007/PL00006587.: Acknowledgements: We are grateful to Professor Jen McComb of Murdoch University for critical assessment of the manuscript.: Author information: Corresponding author: Correspondence to Keith R Oliver.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: KRO and WKG contributed equally to the writing and the research for this article. 
Both authors approved the final manuscript.: Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: About this article: Cite this article: Oliver, K.R., Greene, W.K. Mobile DNA and the TE-Thrust hypothesis: supporting evidence from the primates. Mobile DNA 2, 8 (2011). https://doi.org/10.1186/1759-8753-2-8: Received: 23 February 2011: Accepted: 31 May 2011: Published: 31 May 2011: DOI: https://doi.org/10.1186/1759-8753-2-8: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"A revised nomenclature for transcribed human endogenous retroviral loci" "Jens Mayer, Jonas Blomberg, Ruth L Seal" "Ruth L Seal" "04 May 2011" "Endogenous retroviruses (ERVs) and ERV-like sequences comprise 8% of the human genome. A hitherto unknown proportion of ERV loci are transcribed and thus contribute to the human transcriptome. A small proportion of these loci encode functional proteins. As the role of ERVs in normal and diseased biological processes is not yet established, transcribed ERV loci are of particular interest. As more transcribed ERV loci are likely to be identified in the near future, the development of a systematic nomenclature is important to ensure that all information on each locus can be easily retrieved., Here we present a revised nomenclature of transcribed human endogenous retroviral loci that sorts loci into groups based on Repbase classifications. Each symbol is of the format ERV + group symbol + unique number. Group symbols are based on a mixture of Repbase designations and well-supported symbols used in the literature. The presented guidelines will allow newly identified loci to be easily incorporated into the scheme., The naming system will be employed by the HUGO Gene Nomenclature Committee for naming transcribed human ERV loci. We hope that the system will contribute to clarifying a certain aspect of a sometimes confusing nomenclature for human endogenous retroviruses. The presented system may also be employed for naming transcribed loci of human non-ERV repeat loci." 
"European Molecular Biology Laboratory, Human Genome Organisation, Nomenclature Scheme, Primer Binding Site Sequence, Partial Open Reading Frame" " A revised nomenclature for transcribed human endogenous retroviral loci: Jens Mayer, Jonas Blomberg & Ruth L Seal: Mobile DNA volume 2, Article number: 7 (2011): Abstract: Background: Endogenous retroviruses (ERVs) and ERV-like sequences comprise 8% of the human genome. A hitherto unknown proportion of ERV loci are transcribed and thus contribute to the human transcriptome. A small proportion of these loci encode functional proteins. As the role of ERVs in normal and diseased biological processes is not yet established, transcribed ERV loci are of particular interest. As more transcribed ERV loci are likely to be identified in the near future, the development of a systematic nomenclature is important to ensure that all information on each locus can be easily retrieved.: Results: Here we present a revised nomenclature of transcribed human endogenous retroviral loci that sorts loci into groups based on Repbase classifications. Each symbol is of the format ERV + group symbol + unique number. Group symbols are based on a mixture of Repbase designations and well-supported symbols used in the literature. The presented guidelines will allow newly identified loci to be easily incorporated into the scheme.: Conclusions: The naming system will be employed by the HUGO Gene Nomenclature Committee for naming transcribed human ERV loci. We hope that the system will contribute to clarifying a certain aspect of a sometimes confusing nomenclature for human endogenous retroviruses. The presented system may also be employed for naming transcribed loci of human non-ERV repeat loci.: Human endogenous retroviruses: Human endogenous retroviruses (ERVs) are remnants of infections by ancient exogenous retroviruses. 
Proviruses formed in the germline by numerous distinct exogenous retroviruses could be inherited by subsequent generations. About 8% of the human genome consists of sequences that are potentially of retroviral origin [1], distributed across about 700,000 loci. In addition to proviruses, these sequences include solitary long terminal repeats (LTRs), nonretroviral sequences flanked by LTRs (which may not derive directly from infectious retroviruses), and LTR-like sequences. ERVs and related sequences thus belong to the repetitive portion of the human genome, which makes up about 45% of the genome's mass and also includes mobile DNA such as L1, Alu and SVA elements.: Detailed analysis of the human genome sequence by wet-lab and bioinformatics approaches has led to the definition of ERV groups, whose number depends on the method used: 31 groups were defined by Sperber et al. [2] and Blomberg et al. [3], 42 by Mager and Medstrand [4], 30 by Gifford and Tristem [5], and several hundred human ERV and LTR families by Repbase [6].: Almost all human ERV loci no longer encode retroviral proteins, because they were incorporated into the host genome long ago and have since accumulated nonsense mutations. Many loci lack large proviral portions, and most have been reduced to so-called solitary LTRs by homologous recombination between proviral LTRs. For more detailed information on human ERVs, we refer interested readers to recent reviews on the topic and the references therein [7–10].: Although their protein-coding capacity is very limited, many human ERV loci are still transcribed, usually from promoter sequences within the proviral LTRs. Evidently, mutations have not yet rendered all LTRs in the human genome defective. In principle, promoters in flanking, non-ERV sequences may also contribute to transcription of these loci. 
Probably every human tissue and cell type, diseased or not, contains ERV transcripts [11, 12]. Usually more than one ERV group is found to be transcribed, and the patterns of transcribed ERV groups differ between tissue and cell types. Transcription of ERV loci is thus regulated in some way. Although expression of ERV sequences has been associated with a number of human diseases, such as germ cell tumours, melanoma and multiple sclerosis, the involvement of ERVs in human disease remains to be elucidated. On the other hand, some ERV loci very likely provide important biological functions, such as the syncytin [13] and syncytin 2 loci [14], referred to herein as ERVW-1 and ERVFRD-1, respectively. Other loci harbouring only partial open reading frames, such as a recently characterized HERV-W locus on chromosome Xq22.3 [15] (ERVW-2), may likewise produce partial retroviral proteins with potential biological functions. It is therefore of particular interest to determine which ERV loci actually contribute to the human transcriptome.: Recent studies have identified transcribed ERV loci in normal and diseased human cells and tissues by reassigning ERV cDNA sequences to individual loci in the human reference genome sequence, using characteristic nucleotide differences between individual loci of a given ERV group. Many more transcribed ERV loci are likely to be identified in future studies. It is therefore necessary to introduce a nomenclature for transcribed human ERV sequences.: Previous nomenclature used in the literature: The lack of an established nomenclature for transcribed ERV elements has led to confusion within the literature. These problems were previously reviewed in detail [16]. ERVs have been classified into groups (formerly known as \"families\", which is heresy to virologists because \"family\" refers to Retroviridae), although different classification systems have been used. 
For instance, some groups were initially defined by molecular genetic means, others by sequence similarity and others by primer binding site sequences. Growing amounts of sequence information also showed that some ERV group designations needed to be revised. Different names have been used for the same ERV group. Likewise, individual loci have been referred to using a variety of different symbols (for example, see the aliases listed in Table 1 for the ERVK-6 locus). The use of different symbols for the same locus makes it difficult to retrieve all information on that particular locus.: Previous ERV nomenclature and the Human Genome Organisation Gene Nomenclature Committee: The Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) works under the auspices of HUGO and is the only worldwide authority that assigns standardised nomenclature to human genes [17]. The HGNC has previously focused on approving nomenclature for protein-coding genes, pseudogenes, phenotypes and noncoding RNA. In the past, the committee approved symbols for specific human ERVs only at the request of individual researchers. These symbols did not follow a systematic nomenclature: some were of a simple format (for example, ERV1), some provided information on the group to which the ERV belonged (for example, ERVK2) and others included information on proteins encoded by the ERV (for example, ERVWE1 (endogenous retroviral family W, env(C7), member 1)). On reviewing the literature, it was clear that (1) many of the most frequently published loci were not represented by HGNC symbols, (2) because they followed more than one system, HGNC symbols were not serving the community, and (3) the nomenclature needed both updating and expansion.: HGNC editors curate relevant information for each gene that has approved nomenclature. 
In addition to approving a gene symbol and name for each transcribed human ERV, the HGNC records all known symbol aliases so that information on each gene can be retrieved using any known symbol. HGNC entries also include the chromosomal location of the ERV locus, links to GenBank, European Molecular Biology Laboratory (EMBL) and DNA Databank of Japan (DDBJ) sequence records and links to at least one PubMed reference. Where appropriate, links are also provided to annotation projects at both the genomic and proteomic levels. HGNC names are propagated to other major biological databases, such as Ensembl, UniProt and Entrez Gene. Therefore, this new nomenclature will provide a useful resource that is currently unavailable to the ERV community and other researchers concerned with ERVs.: A gene-based nomenclature: The primary definition of a gene used by the HGNC is \"a DNA segment that contributes to phenotype/function\" [18]. It is beyond the scope of this nomenclature effort to standardise the nomenclature of ERVs in general or to attempt to name every ERV element in the genome. As discussed above, there is evidence that some human ERVs encode functional proteins and that some encode transcripts and/or proteins which may be associated with disease, so the transcriptionally active loci come under the remit of the HGNC for naming. This category of ERVs represents most of the individual loci that have been published with individual names, so it is worth developing a standardised nomenclature for this subset. The three criteria for being accepted as a transcriptionally active ERV are as follows: (1) The ERV must be represented by an mRNA sequence in a public database, (2) the reported cDNA sequence must map unambiguously to the reference genome to allow identification and (3) the sequence must represent a viral gene rather than solely a solitary LTR. We acknowledge that there are sources of uncertainty. 
Many ERVs may be expressed at a low level [19], a \"leakage\" that can be hard to distinguish from potentially more significant expression. Groups of recently integrated ERVs may be highly expressed, but their transcripts may be identical or almost identical and therefore hard to map unambiguously. However, these difficulties should not prevent the naming of ERV loci that fulfil the criteria mentioned above. One symbol is approved per ERV locus, regardless of how many viral genes the ERV may encode.: A systematic ERV nomenclature scheme: The nomenclature scheme described in this paper aims to be concise so that it is user-friendly. It also aims to be informative to researchers, including those who are less familiar with the field. To that end, the scheme is hierarchical, with each symbol beginning with the root symbol \"ERV\" so that the symbols are instantly recognisable and can be grouped together in searches. Note that many researchers have published papers using symbols beginning with \"HERV\", but HGNC guidelines never allow H for \"human\" in symbols, mainly because this would preclude extending the nomenclature scheme to other species. Each ERV symbol then includes an identifier that represents the group to which the ERV belongs.: For the nomenclature scheme to be systematic, a single method of sorting ERVs into groups had to be selected. The Repbase system [6] is a widely known, comprehensive database of repetitive elements that groups ERVs on the basis of sequence similarity. RepeatMasker annotations using Repbase designations are available on the University of California, Santa Cruz (UCSC) [20] and Ensembl [21] genome browsers, making these ERV groups highly accessible and recognisable to researchers in the field. Therefore, the nomenclature system uses the Repbase classification to define the groups within which ERVs are named. 
Repbase group names, however, do not follow a systematic nomenclature and often contain a disallowed \"H\" for \"human\". When deciding on the group identifier to be included in each symbol, we compared Repbase symbols with those that have appeared frequently in the literature. Where a well-supported symbol was present in the literature, we used it in place of the Repbase symbol; for example, we used ERVW instead of the Repbase group designation HERV17, as we felt that such symbols would be more likely to be used by the ERV community. For a comparison of the group symbols used in the new nomenclature scheme with Repbase designations, see Table 2.: Finally, each ERV within a particular group is uniquely identified by a number, for example, ERVK-1. Numbers are assigned consecutively within each group so that the nomenclature system can be expanded. The number only makes each symbol unique and has no intrinsic meaning: ERVK-2 has merely been assigned the next number after ERVK-1, and this provides no information on the position of the ERVs within the genome or on the order in which they were published. Numerical identifiers keep the symbols as short as possible to encourage widespread use by researchers. Newly identified transcribed loci will take the next available consecutive number for their particular group; for example, a newly identified transcribed ERVK locus will take the symbol ERVK-26. Each symbol is accompanied by an expanded gene name that clearly and succinctly explains the derivation of the symbol; for example, the full name of ERVFRD-1 is \"endogenous retrovirus group FRD, member 1\".: We are aware that the proposed nomenclature scheme cannot encompass all conceivable (and sometimes known) unusual structures of ERV loci, such as hybrid loci consisting of different ERV groups and ERV insertions into existing ERV loci [22]. 
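The numbering rule just described (root ERV, a group identifier, then the next consecutive number within that group) can be sketched as a small registry. This is a minimal, hypothetical illustration, not HGNC software: the class ErvSymbolRegistry, the SYMBOL pattern, the helper functions and the assumption that group identifiers consist only of upper-case letters and digits are introduced here for illustration, and unusual loci that the HGNC names case by case are not modelled.

```python
import re
from collections import defaultdict

# Hypothetical pattern for an approved symbol: root ERV, a group identifier,
# a hyphen and a number without leading zeros. Restricting group identifiers
# to upper-case letters and digits is an assumption (it fits ERVK-6, ERVW-1,
# ERVFRD-1); symbols starting with H for human, such as HERVK-1, do not match.
SYMBOL = re.compile(r'ERV([A-Z0-9]+)-([1-9][0-9]*)')


def parse_symbol(symbol):
    # Split a symbol into (group identifier, member number).
    match = SYMBOL.fullmatch(symbol)
    if match is None:
        raise ValueError(f'not a valid ERV symbol: {symbol}')
    return match.group(1), int(match.group(2))


class ErvSymbolRegistry:
    # Hypothetical in-memory registry; it does not model the HGNC database.

    def __init__(self):
        self._highest = defaultdict(int)  # group identifier -> highest number used

    def register(self, symbol):
        # Record an already approved symbol.
        group, number = parse_symbol(symbol)
        self._highest[group] = max(self._highest[group], number)

    def next_symbol(self, group):
        # Numbers are consecutive within a group and carry no positional
        # or chronological meaning.
        self._highest[group] += 1
        return f'ERV{group}-{self._highest[group]}'


def full_name(symbol):
    # Expanded gene name in the style of the ERVFRD-1 example.
    group, number = parse_symbol(symbol)
    return f'endogenous retrovirus group {group}, member {number}'
```

In this sketch, registering ERVK-1 to ERVK-25 and then requesting the next ERVK symbol yields ERVK-26, matching the example in the text, while the first locus in a previously unused group receives member number 1.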
HGNC, after conferring with researchers who submit newly identified transcribed loci, will decide whether or how to name such unique loci on a case-by-case basis. For example, the scheme will not incorporate ERV locus transcripts that are part of another gene's transcript, as these elements will not be considered separate loci.: Table 1 lists transcribed human ERVs that have been named according to the new nomenclature system. All ERVs in the table either have been published or have been annotated by the RefSeq project. An initial list was sent to a number of researchers in the field for their comments, and the list was expanded as these researchers suggested more loci. Where no transcript sequence was available, authors were asked to submit representative sequences to the GenBank, EMBL and DDBJ databases. We encourage researchers to contact the HGNC if they know of further ERVs that can be included in the scheme.: Finally, although only human gene nomenclature is under the remit of the HGNC, we wish to mention that the naming system introduced here for transcribed ERVs could, in principle, also be applied to other, non-ERV repetitive sequences in the human genome, as well as to repetitive DNA in nonhuman species. Future research will probably reveal numerous transcribed repetitive DNA sequences in various species. Judging from ERV designations in different species alone, a standardised naming system for transcribed repeat loci may be highly beneficial in avoiding future confusion.: References: Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921. 10.1038/35057062.: Sperber GO, Airola T, Jern P, Blomberg J: Automated recognition of retroviral sequences in genomic data--RetroTector. Nucleic Acids Res. 2007, 35 (15): 4964-4976. 10.1093/nar/gkm515.: Blomberg J, Goran S, Jern P, Benachenhou F: Towards a retrovirus database, RetroBank. 
Proceedings of the Centennial Retrovirus Meeting, 29 April - 4 May 2010. Edited by: Daniel R, Hejnar J, Skalka AM, Svoboda J Prague. 2010, Czech Republic: Medimond International Proceedings, 19-22.: Mager DL, Medstrand P: Retroviral Repeat Sequences. Nature Encyclopedia of the Human Genome. Edited by: Cooper D. 2003, Hampshire, England: Macmillan Publishers Ltd, 5: 57-63.: Gifford R, Tristem M: The evolution, distribution and diversity of endogenous retroviruses. Virus Genes. 2003, 26 (3): 291-315. 10.1023/A:1024455415443.: Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110 (1-4): 462-467. 10.1159/000084979.: Blikstad V, Benachenhou F, Sperber GO, Blomberg J: Evolution of human endogenous retroviral sequences: a conceptual account. Cell Mol Life Sci. 2008, 65 (21): 3348-3365. 10.1007/s00018-008-8495-2.: Kurth R, Bannert N: Beneficial and detrimental effects of human endogenous retroviruses. Int J Cancer. 2010, 126 (2): 306-314. 10.1002/ijc.24902.: Mayer J, Meese E: Human endogenous retroviruses in the primate lineage and their influence on host genomes. Cytogenet Genome Res. 2005, 110 (1-4): 448-456. 10.1159/000084977.: Ruprecht K, Mayer J, Sauter M, Roemer K, Mueller-Lantzsch N: Endogenous retroviruses and cancer. Cell Mol Life Sci. 2008, 65 (21): 3366-3382. 10.1007/s00018-008-8496-1.: Hu L: Endogenous retroviral RNA expression in humans. PhD thesis. 2007, Department of Medical Sciences, Clinical Virology, Uppsala University: Seifarth W, Frank O, Zeilfelder U, Spiess B, Greenwood AD, Hehlmann R, Leib-Mösch C: Comprehensive analysis of human endogenous retrovirus transcriptional activity in human tissues with a retrovirus-specific microarray. J Virol. 2005, 79 (1): 341-352. 
10.1128/JVI.79.1.341-352.2005.: Mi S, Lee X, Li X, Veldman GM, Finnerty H, Racie L, LaVallie E, Tang XY, Edouard P, Howes S, et al: Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature. 2000, 403 (6771): 785-789. 10.1038/35001608.: de Parseval N, Heidmann T: Human endogenous retroviruses: from infectious elements to human genes. Cytogenet Genome Res. 2005, 110 (1-4): 318-332. 10.1159/000084964.: Roebke C, Wahl S, Laufer G, Stadelmann C, Sauter M, Mueller-Lantzsch N, Mayer J, Ruprecht K: An N-terminally truncated envelope protein encoded by a human endogenous retrovirus W locus on chromosome Xq22.3. Retrovirology. 2010, 7 (1): 69. 10.1186/1742-4690-7-69.: Blomberg J, Benachenhou F, Blikstad V, Sperber G, Mayer J: Classification and nomenclature of endogenous retroviral sequences (ERVs): problems and recommendations. Gene. 2009, 448 (2): 115-123. 10.1016/j.gene.2009.06.007.: Seal RL, Gordon SM, Lush MJ, Wright MW, Bruford EA: genenames.org: the HGNC resources in 2011. Nucleic Acids Res. 2011, 39 (Database issue): D514-D519.: Wain HM, Bruford EA, Lovering RC, Lush MJ, Wright MW, Povey S: Guidelines for human gene nomenclature. Genomics. 2002, 79 (4): 464-470. 10.1006/geno.2002.6748.: Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, et al: Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007, 447 (7146): 799-816. 10.1038/nature05874.: Mangan ME, Williams JM, Kuhn RM, Lathe WC: The UCSC genome browser: what every molecular biologist should know. Curr Protoc Mol Biol. 2009, Chapter 19: Unit19 19-: Flicek P, Amode MR, Barrell D, Beal K, Brent S, Chen Y, Clapham P, Coates G, Fairley S, Fitzgerald S: Ensembl 2011. Nucleic Acids Res. 2011, 39 (Database issue): D800-D806.: 
Flockerzi A, Burkhardt S, Schempp W, Meese E, Mayer J: Human Endogenous Retrovirus HERV-K14 Families: Status, Variants, Evolution, and Mobilization of Other Cellular Sequences. J Virol. 2005, 79 (5): 2941-2949. 10.1128/JVI.79.5.2941-2949.2005.: Flockerzi A, Ruggieri A, Frank O, Sauter M, Maldener E, Kopper B, Wullich B, Seifarth W, Muller-Lantzsch N, Leib-Mosch C, et al: Expression patterns of transcribed human endogenous retrovirus HERV-K(HML-2) loci in human tissues and the need for a HERV Transcriptome Project. BMC Genomics. 2008, 9: 354. 10.1186/1471-2164-9-354.: Villesen P, Aagaard L, Wiuf C, Pedersen FS: Identification of endogenous retroviral reading frames in the human genome. Retrovirology. 2004, 1: 32. 10.1186/1742-4690-1-32.: Sugimoto J, Matsuura N, Kinjo Y, Takasu N, Oda T, Jinno Y: Transcriptionally active HERV-K genes: identification, isolation, and chromosomal mapping. Genomics. 2001, 72 (2): 137-144. 10.1006/geno.2001.6473.: Berkhout B, Jebbink M, Zsiros J: Identification of an active reverse transcriptase enzyme encoded by a human endogenous HERV-K retrovirus. J Virol. 1999, 73 (3): 2365-2375.: de Parseval N, Lazar V, Casella JF, Benit L, Heidmann T: Survey of human genes of retroviral origin: identification and transcriptome of the genes with coding capacity for complete envelope proteins. J Virol. 2003, 77 (19): 10414-10422. 10.1128/JVI.77.19.10414-10422.2003.: Barbulescu M, Turner G, Seaman MI, Deinard AS, Kidd KK, Lenz J: Many human endogenous retrovirus K (HERV-K) proviruses are unique to humans. Curr Biol. 1999, 9 (16): 861-868. 10.1016/S0960-9822(99)80390-X.: Turner G, Barbulescu M, Su M, Jensen-Seaman MI, Kidd KK, Lenz J: Insertional polymorphisms of full-length endogenous retroviruses in humans. Curr Biol. 2001, 11 (19): 1531-1535. 10.1016/S0960-9822(01)00455-9.: Zsiros J, Jebbink MF, Lukashov VV, Voute PA, Berkhout B: Evolutionary relationships within a subgroup of HERV-K-related human endogenous retroviruses. J Gen Virol. 
1998, 79 (Pt 1): 61-70.: Tonjes RR, Czauderna F, Kurth R: Genome-wide screening, cloning, chromosomal assignment, and expression of full-length human endogenous retrovirus type K. J Virol. 1999, 73 (11): 9187-9195.: Okahara G, Matsubara S, Oda T, Sugimoto J, Jinno Y, Kanaya F: Expression analyses of human endogenous retroviruses (HERVs): tissue-specific and developmental stage-dependent expression of HERVs. Genomics. 2004, 84 (6): 982-990. 10.1016/j.ygeno.2004.09.004.: Frank O, Verbeke C, Schwarz N, Mayer J, Fabarius A, Hehlmann R, Leib-Mosch C, Seifarth W: Variable transcriptional activity of endogenous retroviruses in human breast cancer. J Virol. 2008, 82 (4): 1808-1818. 10.1128/JVI.02115-07.: Blaise S, de Parseval N, Benit L, Heidmann T: Genomewide screening for fusogenic human endogenous retrovirus envelopes identifies syncytin 2, a gene conserved on primate evolution. Proc Natl Acad Sci USA. 2003, 100 (22): 13013-13018. 10.1073/pnas.2132646100.: Cohen M, Powers M, O'Connell C, Kato N: The nucleotide sequence of the env gene from the human provirus ERV3 and isolation and characterization of an ERV3-specific cDNA. Virology. 1985, 147 (2): 449-458. 10.1016/0042-6822(85)90147-3.: Larsson E, Andersson AC, Nilsson BO: Expression of an endogenous retrovirus (ERV3 HERV-R) in human reproductive and embryonic tissues--evidence for a function for envelope gene products. Ups J Med Sci. 1994, 99 (2): 113-120. 10.3109/03009739409179354.: Blond JL, Beseme F, Duret L, Bouton O, Bedin F, Perron H, Mandrand B, Mallet F: Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol. 1999, 73 (2): 1175-1185.: Laufer G, Mayer J, Mueller BF, Mueller-Lantzsch N, Ruprecht K: Analysis of transcribed human endogenous retrovirus W env loci clarifies the origin of multiple sclerosis-associated retrovirus env sequences. Retrovirology. 
2009, 6: 37. 10.1186/1742-4690-6-37.: Komurian-Pradel F, Paranhos-Baccala G, Bedin F, Ounanian-Paraz A, Sodoyer M, Ott C, Rajoharison A, Garcia E, Mallet F, Mandrand B, et al: Molecular cloning and characterization of MSRV-related sequences associated with retrovirus-like particles. Virology. 1999, 260 (1): 1-9. 10.1006/viro.1999.9792.: Jeong BH, Lee YJ, Carp RI, Kim YS: The prevalence of human endogenous retroviruses in cerebrospinal fluids from patients with sporadic Creutzfeldt-Jakob disease. J Clin Virol. 2010, 47 (2): 136-142. 10.1016/j.jcv.2009.11.016.: Yao Y, Schroder J, Nellaker C, Bottmer C, Bachmann S, Yolken RH, Karlsson H: Elevated levels of human endogenous retrovirus-W transcripts in blood cells from patients with first episode schizophrenia. Genes Brain Behav. 2008, 7 (1): 103-112.: de Parseval N, Diop G, Blaise S, Helle F, Vasilescu A, Matsuda F, Heidmann T: Comprehensive search for intra- and inter-specific sequence polymorphisms among coding envelope genes of retroviral origin found in the human genome: genes and pseudogenes. BMC Genomics. 2005, 6: 117. 10.1186/1471-2164-6-117.: Yi JM, Kim HS: Expression and phylogenetic analyses of human endogenous retrovirus HC2 belonging to the HERV-T family in human tissues and cancer cells. J Hum Genet. 2007, 52 (4): 285-296. 10.1007/s10038-007-0115-8.: Shiroma T, Sugimoto J, Oda T, Jinno Y, Kanaya F: Search for active endogenous retroviruses: identification and characterization of a HERV-E gene that is expressed in the pancreas and thyroid. J Hum Genet. 2001, 46 (11): 619-625. 10.1007/s100380170012.: Prusty BK, zur Hausen H, Schmidt R, Kimmel R, de Villiers EM: Transcription of HERV-E and HERV-E-related sequences in malignant and non-malignant human haematopoietic cells. Virology. 2008, 382 (1): 37-45. 
10.1016/j.virol.2008.09.006.: Takahashi Y, Harashima N, Kajigaya S, Yokoyama H, Cherkasova E, McCoy JP, Hanada K, Mena O, Kurlander R, Tawab A, et al: Regression of human kidney cancer following allogeneic stem cell transplantation is associated with recognition of an HERV-E antigen by T cells. J Clin Invest. 2008, 118 (3): 1099-1109.: Kjeldbjerg AL, Villesen P, Aagaard L, Pedersen FS: Gene conversion and purifying selection of a placenta-specific ERV-V envelope gene during simian evolution. BMC Evol Biol. 2008, 8: 266. 10.1186/1471-2148-8-266.: Liang QY, Ding JY, Zheng S: Identification and detection of a novel human endogenous retrovirus-related gene, and structural characterization of its related elements. Genet Mol Biol. 2009, 32 (4): 704-U738. 10.1590/S1415-47572009005000082.: Patzke S, Lindeskog M, Munthe E, Aasheim HC: Characterization of a novel human endogenous retrovirus, HERV-H/F, expressed in human leukemia cell lines. Virology. 2002, 303 (1): 164-173. 10.1006/viro.2002.1615.: Lindeskog M, Blomberg J: Spliced human endogenous retroviral HERV-H env transcripts in T-cell leukaemia cell lines and normal leukocytes: alternative splicing pattern of HERV-H transcripts. J Gen Virol. 1997, 78 (Pt 10): 2575-2585.: Moles JP, Tesniere A, Guilhou JJ: A new endogenous retroviral sequence is expressed in skin of patients with psoriasis. Br J Dermatol. 2005, 153 (1): 83-89. 10.1111/j.1365-2133.2005.06555.x.: La Mantia G, Pengue G, Maglione D, Pannuti A, Pascucci A, Lania L: Identification of new human repetitive sequences: characterization of the corresponding cDNAs and their expression in embryonal carcinoma cells. Nucleic Acids Res. 1989, 17 (15): 5913-5922. 10.1093/nar/17.15.5913.: Andersson ML, Lindeskog M, Medstrand P, Westley B, May F, Blomberg J: Diversity of human endogenous retrovirus class II-like sequences. J Gen Virol. 
1999, 80 (Pt 1): 255-260.: Acknowledgements: Research in the laboratory of JM is supported by Deutsche Forschungsgemeinschaft (DFG) and Homburger Forschungsförderungsprogramm (HOMFOR). The HGNC is funded by The Wellcome Trust grant 081979/Z/07/Z and National Human Genome Research Institute grant P41 HG03345. We also acknowledge those who provided extra information for the list of transcribed ERV loci: Yoshihiro Jinno, Finn Skou Pedersen, Benoit Barbeau, Christine Leib-Mösch and Patrick M Alliel.: Author information: Corresponding author: Correspondence to Ruth L Seal.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: All authors contributed to the presented nomenclature scheme and wrote the manuscript.: About this article: Cite this article: Mayer, J., Blomberg, J. & Seal, R.L. A revised nomenclature for transcribed human endogenous retroviral loci. Mobile DNA 2, 7 (2011). https://doi.org/10.1186/1759-8753-2-7: Received: 11 February 2011: Accepted: 04 May 2011: Published: 04 May 2011: DOI: https://doi.org/10.1186/1759-8753-2-7: Collection: Mobile DNA All Reviews: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons" "Guillaume Cambray, Neus Sanchez-Alberola, Susana Campoy, Émilie Guerin, Sandra Da Re, Bruno González-Zorn, Marie-Cécile Ploy, Jordi Barbé, Didier Mazel, Ivan Erill" "Didier Mazel" "30 April 2011" "Integrons are found in hundreds of environmental bacterial species, but are mainly known as the agents responsible for the capture and spread of antibiotic-resistance determinants between Gram-negative pathogens. The SOS response is a regulatory network, controlled by the repressor protein LexA, that addresses DNA damage and thereby promotes genetic variation in times of stress. We recently reported a direct link between the SOS response and the expression of integron integrases in Vibrio cholerae and a plasmid-borne class 1 mobile integron. SOS regulation enhances cassette swapping and capture in stressful conditions, while freezing the integron in steady environments. We conducted a systematic study of available integron integrase promoter sequences to analyze the extent of this relationship across the Bacteria domain., Our results showed that LexA controls the expression of a large fraction of integron integrases by binding to Escherichia coli-like LexA binding sites. In addition, the results provide experimental validation of LexA control of the integrase gene for another Vibrio chromosomal integron and for a multiresistance plasmid harboring two integrons. There was a significant correlation between lack of LexA control and predicted inactivation of integrase genes, even though experimental evidence also indicates that LexA regulation may be lost to enhance expression of integron cassettes., Ancestral-state reconstruction on an integron integrase phylogeny led us to conclude that the ancestral integron was already regulated by LexA. 
The data also indicated that SOS regulation has been actively preserved in mobile integrons and large chromosomal integrons, suggesting that unregulated integrase activity is selected against. Nonetheless, additional adaptations have probably arisen to cope with unregulated integrase activity. Identifying them may be fundamental in deciphering the uneven distribution of integrons in the Bacteria domain." "Integrase Gene, LexA Protein, IntI Sequence, LexA Binding, lexA Gene" " Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons: Guillaume Cambray, Neus Sanchez-Alberola, Susana Campoy, Émilie Guerin, Sandra Da Re, Bruno González-Zorn, Marie-Cécile Ploy, Jordi Barbé, Didier Mazel & Ivan Erill: Mobile DNA volume 2, Article number: 6 (2011): Abstract: Background: Integrons are found in hundreds of environmental bacterial species, but are mainly known as the agents responsible for the capture and spread of antibiotic-resistance determinants between Gram-negative pathogens. The SOS response is a regulatory network, controlled by the repressor protein LexA, that addresses DNA damage and thereby promotes genetic variation in times of stress. We recently reported a direct link between the SOS response and the expression of integron integrases in Vibrio cholerae and a plasmid-borne class 1 mobile integron. SOS regulation enhances cassette swapping and capture in stressful conditions, while freezing the integron in steady environments. We conducted a systematic study of available integron integrase promoter sequences to analyze the extent of this relationship across the Bacteria domain.: Results: Our results showed that LexA controls the expression of a large fraction of integron integrases by binding to Escherichia coli-like LexA binding sites. 
In addition, the results provide experimental validation of LexA control of the integrase gene for another Vibrio chromosomal integron and for a multiresistance plasmid harboring two integrons. There was a significant correlation between lack of LexA control and predicted inactivation of integrase genes, even though experimental evidence also indicates that LexA regulation may be lost to enhance expression of integron cassettes.: Conclusions: Ancestral-state reconstruction on an integron integrase phylogeny led us to conclude that the ancestral integron was already regulated by LexA. The data also indicated that SOS regulation has been actively preserved in mobile integrons and large chromosomal integrons, suggesting that unregulated integrase activity is selected against. Nonetheless, additional adaptations have probably arisen to cope with unregulated integrase activity. Identifying them may be fundamental in deciphering the uneven distribution of integrons in the Bacteria domain.: Background: Integrons are bacterial genetic elements capable of incorporating exogenous, promoterless open reading frames (ORFs), referred to as gene cassettes, by site-specific recombination (Figure 1). First described in the late 1980s in connection with the emergence of antibiotic resistance [1], integrons always contain three functional components: an integrase gene (intI), which mediates recombination; a primary recombination site (attI); and an outward-orientated promoter (PC) [2]. Cassette integrations occur mainly at the attI site, ensuring the correct expression of newly captured cassettes by placing them under the control of the PC promoter [3, 4]. To date, two main subsets of integrons have been described. On the one hand, mobile integrons, also referred to as multiresistance integrons, contain relatively few (two to eight) cassettes and collectively encode resistance to a broad spectrum of antibiotics [5–7]. 
They have been conventionally divided into five different classes according to their intI gene sequence: class 1 for intI1, class 2 for intI2, class 3 for intI3, class 4 for intISXT (formerly intI9) and class 5 for intIHS [8, 9]. Mobile integrons are typically associated with transposons and conjugative plasmids, which ensure their dissemination across bacterial species. They are present mostly in the Proteobacteria, but have also been reported in other bacterial phyla, such as Gram-positive bacteria [9]. By contrast, chromosomal integrons have been identified in the genomes of many bacterial species [10]. Because their phylogeny reflects a predominant pattern of vertical inheritance, these integrons are not catalogued according to the class nomenclature described above, but according to their host species [8, 9]. A subfamily of these, termed superintegrons (SIs), has been specifically identified in the Vibrionaceae and, to some extent, in the Xanthomonadaceae and Pseudomonadaceae [11–16]. Superintegrons typically encompass between 20 and 200 cassettes with species-specific sequence signatures [9], and seem to be ancient residents of the host genome [13]. Most of the genes in the superintegron cassettes are of unknown function [10], but some of them are related to existing resistance cassettes [17–20]. Although stable under laboratory conditions, superintegrons have been reported to be the most variable loci of V. cholerae natural isolates [12, 21], suggesting that integron reorganization might occasionally be upregulated in natural environments. Integron integrases mediate recombination by interacting with single-stranded (ss) attC sites present in all reported cassettes, employing a unique, site-specific recombination process [22–24]. 
Despite the importance of integrons in the acquisition and spread of antibiotic-resistance determinants and, from a broader perspective, in bacterial adaptation, little was known about the regulatory control and dynamics of cassette recombination until recently, when we reported that the expression of the integron integrases in the V. cholerae superintegron and in a class 1 mobile integron was controlled by the SOS response [25].: Figure 1. Schematic organization of integrons. The functional platform of integrons consists of an intI gene encoding an integrase, its own promoter Pint, a cassette promoter PC, and a primary recombination site attI. The system maintains an array that can consist of more than 200 cassettes. Only the first few cassettes are strongly expressed from the PC promoter, as indicated by the fading fill color. Cassettes generally contain a promoterless open reading frame (ORF) flanked by two recombination sites termed attC. Cassettes can be excised from any position in the array through attC × attC recombination mediated by the integrase. The resulting circular intermediate can then be integrated by the integrase, preferentially at attI. Exogenous circular intermediates can also be integrated, owing to the low specificity of the integrase activity, rendering the system prone to horizontal transfer. The SOS response directly controls the expression of many integron integrases through the binding of its repressor protein, LexA, to a target site in the Pint promoter.: The SOS response is a global regulatory network governed by a repressor protein (LexA) and principally targeted at addressing DNA damage [26, 27]. LexA represses SOS genes by binding to highly specific binding sites present in their promoter regions. In E. coli and most β- and γ-Proteobacteria, these sites consist of a 16-bp palindromic motif (CTGTatatatatACAG), commonly known as the LexA box [26]. 
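To make the structure of the LexA box concrete, the sketch below scans a DNA sequence for the E. coli-type motif. It is a simplified illustration under stated assumptions, not the site-search method used in this study: it requires the conserved CTGT and ACAG half-sites exactly, treats the eight central positions (lowercase atatatat in the consensus) as unconstrained and only reports how A/T-rich they are. The names find_lexa_boxes and reverse_complement are hypothetical. Because ACAG is the reverse complement of CTGT, any matching site also matches on the opposite strand, which is why the motif counts as palindromic and a single-strand scan suffices here.

```python
import re

# Simplified E. coli-type LexA box: exact CTGT and ACAG half-sites around an
# 8-nt spacer. Leaving the spacer unconstrained is an assumption; the
# lookahead lets overlapping candidate sites all be reported.
LEXA_BOX = re.compile(r'(?=(CTGT[ACGT]{8}ACAG))')
COMPLEMENT = str.maketrans('ACGT', 'TGCA')


def reverse_complement(seq):
    # Complement each base, then reverse to read the opposite strand 5' to 3'.
    return seq.translate(COMPLEMENT)[::-1]


def find_lexa_boxes(seq):
    # Return (0-based start, site, A/T count in the spacer, exact palindrome?)
    # for every candidate site on the given strand.
    seq = seq.upper()
    hits = []
    for match in LEXA_BOX.finditer(seq):
        site = match.group(1)
        spacer = site[4:12]
        at_count = spacer.count('A') + spacer.count('T')
        hits.append((match.start(), site, at_count, site == reverse_complement(site)))
    return hits
```

For example, find_lexa_boxes('ggCTGTATATATATACAGgg') reports one site starting at position 2 whose spacer consists entirely of A and T and which is a perfect palindrome. Real binding-site searches typically score every position of the motif rather than demanding exact half-sites, so this sketch will miss degenerate sites.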
The SOS response is typically induced by the presence of ssDNA fragments, which can arise from a number of environmental stresses [28] but are typically linked to replication-fork stalling caused by DNA lesions. These ssDNA fragments are bound non-specifically by the universal recombination protein RecA [29], enabling it to promote LexA inactivation by autocatalytic cleavage [30] and thus inducing the SOS response. Up to 40 genes have been shown to be directly regulated by LexA in E. coli [31]; they encode proteins that stabilize the replication fork, repair DNA, promote translesion synthesis and arrest cell division. Since its initial description in E. coli [26], the SOS response has been characterized in many other bacterial classes and phyla, and LexA has been shown to recognize markedly divergent motifs in different bacterial groups [27].: In recent years, the SOS response has been linked to clinically relevant phenotypes, such as the activation and dissemination of virulence factors carried in bacteriophages [32–34], transposons [35], pathogenicity islands [36] and integrating conjugative elements encoding antibiotic-resistance genes [27, 37, 38]. Moreover, it is now established that some widely used antibiotics, such as fluoroquinolones, trimethoprim and β-lactams, can trigger SOS induction and can thus promote the dissemination of antibiotic-resistance genes [27, 37, 39–42]. This establishes a positive feedback loop that has been suggested to have important consequences for the emergence and dissemination of antibiotic resistance [43]. Our recent work, showing a link between the SOS response and integrase-mediated recombination [25], further reinforces this line of reasoning. Such a link provides bacteria with an antibiotic-induced mechanism for gene acquisition, reorganization and dispersal. 
In hindsight, the coupling of genetic elements capable of cassette integration with a global stress response emerges as an elegant and powerful pairing. Integrons can be seen as stockpiles of genetic diversity that can, in addition, tap into a huge and variable pool of cassettes through horizontal gene transfer from the surrounding bacterial communities (Figure 1) [10, 44]. Nonetheless, efficient expression of these acquired traits depends strongly on integrase-mediated recombination. Newly acquired cassettes sitting in the proximal region of the integron are strongly expressed from the PC promoter, but they can be displaced gradually towards distal parts of the integron by the incorporation of new cassettes, and can thus become partially silenced. Infrequent excision and integration events can also relocate cassettes to distal or proximal parts of the integron, and can thus reinstate the expression of previously acquired cassettes (Figure 1). The timing of all these events is therefore of fundamental importance, and depends on the regulatory systems controlling the expression of the integron integrase gene. In this context, the discovery of a link between the SOS response and integrase expression is an important first step towards unraveling the precise mechanisms controlling integrase expression.: In this study, we expanded on this recent connection between the SOS response and integron integrase expression through a systematic study of integron integrase promoter regions. By combining in silico and in vitro methods, we show that LexA control of integrase expression is a widespread phenomenon that arose very early in the evolutionary history of integrons and has since been maintained through positive selection in mobile integrons and large chromosomal integrons. 
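The cassette dynamics described above can be captured by a deliberately simple list model. In this hypothetical sketch, integration always places the incoming cassette at attI (position 0), excision removes a cassette from any position, and PC-driven expression is assumed to decay geometrically with distance from attI; the decay factor of 0.5 and the cassette labels are illustrative, not measured values.

```python
# Toy model of an integron cassette array. Assumptions (hypothetical):
# integration always occurs at attI (index 0), and PC-driven expression
# decays geometrically with distance from attI. Cassette labels are unique.
from typing import Dict, List, Tuple


def integrate(array: List[str], cassette: str) -> List[str]:
    # attI x attC integration: the incoming cassette becomes the first one.
    return [cassette] + array


def excise(array: List[str], cassette: str) -> Tuple[List[str], str]:
    # attC x attC excision from any position; returns the shortened array and
    # the circular intermediate. Raises ValueError if the cassette is absent.
    i = array.index(cassette)
    return array[:i] + array[i + 1:], cassette


def relocate(array: List[str], cassette: str) -> List[str]:
    # Excise a cassette and reintegrate it at attI (most proximal position).
    remaining, circle = excise(array, cassette)
    return integrate(remaining, circle)


def expression(array: List[str], decay: float = 0.5) -> Dict[str, float]:
    # Relative expression from PC: 1.0 for the first cassette, then decay**i.
    return {c: decay ** i for i, c in enumerate(array)}


if __name__ == '__main__':
    array: List[str] = []
    for cassette in ['aadA', 'dfrA', 'blaOXA']:  # hypothetical acquisition order
        array = integrate(array, cassette)
    print(array)              # ['blaOXA', 'dfrA', 'aadA']
    array = relocate(array, 'aadA')
    print(array)              # ['aadA', 'blaOXA', 'dfrA']
    print(expression(array))  # {'aadA': 1.0, 'blaOXA': 0.5, 'dfrA': 0.25}
```

In this toy model, relocating aadA back to the first position restores the highest expression level, mirroring how excision and reintegration can reinstate a previously acquired cassette.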
We report a significant correlation between the loss of LexA control and integrase inactivation, indicating that unregulated recombination may be deleterious in these genetic elements. Exceptions to this rule suggest that secondary adaptations to tolerate unregulated integrase expression may have arisen in some clades, and that identifying such adaptations might shed light on the uneven distribution of integrons in the Bacteria domain. We discuss the implications of these findings for the adaptive dynamics of integrons and for the acquisition and dissemination of antibiotic-resistance determinants.: Results and discussion: Identification of LexA binding sites in intI promoters: We recently identified E. coli-like LexA binding sites in the promoter region of intI1 integrase genes from mobile integrons and of the intIA integrase from the V. cholerae superintegron (Figure 2A, B). In V. cholerae and some of these mobile integrons, the identified LexA boxes partially overlap the -10 element of the intI promoter in a classic operator organization. We have shown that expression of the V. cholerae and E. coli pAT674 integrase genes is indeed controlled by the SOS response, leading to heightened rates of integrase-mediated recombination upon SOS induction [25].: In silico analysis of integrase promoters. (A) Alignment of representative promoter regions of Vibrionaceae intIA homologs. Putative LexA binding sequences are boxed, putative σ70 promoter elements (-35 and -10) are underlined, and the translation start site of intIA is boxed and in bold type. The multiple sequence alignment was performed using CLUSTALW with default parameters [89]. (B) Representative examples of LexA binding sites identified upstream of different integrase genes from mobile integrons, with (1-5) denoting the integrase class. 
The accession numbers provided correspond to IntI proteins from: Escherichia coli pSa (AAA92752), Providencia stuartii ABR23a (ABG21674), Serratia marcescens AK9373 (BAA08929), Vibrio cholerae 569B (AAC38424) and Vibrio salmonicida VS224 pRVS1 (CAC35342). (C) Sequence logos [100] of the profile used to search for β/γ-Proteobacteria LexA binding sites (top) and of the profile derived from the 93 distinct binding sites identified (bottom). Lan = Listonella anguillarum; Lpe_CIP = L. pelagia CIP 102762T; Val_12G01 = Vibrio alginolyticus 12G01; Vch_N16961 = V. cholerae O1 biovar El Tor str. N16961; Vha_ATCC = Vibrio harveyi ATCC BAA-1116; Vha_HY01 = V. harveyi HY01; Vme = Vibrio metschnikovii; Vmi = Vibrio mimicus; Vna_CIP = Vibrio natriegens strain CIP 10319; Vpa = Vibrio parahaemolyticus; Vpa_RIMD = V. parahaemolyticus RIMD 2210633; Vsh_AK1 = Vibrio shilonii AK1; Vsp_DAT722 = Vibrio sp. DAT722; Vsp_Ex25 = Vibrio sp. Ex25; Vvu_CIP754 = Vibrio vulnificus CIP 75.4; Vvu_YJ016 = V. vulnificus YJ016 (see Additional file 11 for corresponding accession numbers).: To gain insight into the general relevance of this observation, we undertook an exhaustive in silico study of integrase regulation by the LexA protein. Using a TBLASTN search (National Center for Biotechnology Information (NCBI); http://blast.ncbi.nlm.nih.gov/), we identified 1,483 homologs of intIA in the non-redundant (NR) (971), environmental (ENV) (381) and Whole Genome Shotgun (WGS) (131) subdivisions of the GenBank database. When sufficient data were available (1,103 sequences), the nucleotide sequences spanning 100 bp upstream of the translation start site and the first 50 bp of the coding region (-100, +50) were systematically searched for LexA binding sites. We conducted independent searches for all 15 LexA binding motifs described to date in the literature [27]. 
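The two sequence-handling steps mentioned here, extracting the (-100, +50) window around a translation start and summarizing aligned binding sites as an information-content logo, can be sketched in a few lines of Python. The coordinate convention is an assumption of this example, and information content is computed as 2 minus the Shannon entropy of each column, without the small-sample correction that logo software may apply.

```python
# Sketch under stated assumptions: (1) extract the (-100, +50) window around a
# translation start; (2) per-column information content of aligned sites, as
# plotted in a sequence logo (no small-sample correction).
import math
from collections import Counter
from typing import List, Optional

COMPLEMENT = str.maketrans('ACGTN', 'TGCAN')


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(genome: str, start: int, strand: str,
                    upstream: int = 100, downstream: int = 50) -> Optional[str]:
    # `start` is the 0-based forward-strand index of the first base of the
    # start codon as read in the gene's orientation (for '-' genes, the
    # rightmost base of the start codon). The window is returned in gene
    # orientation, or None if it runs off the sequence (insufficient data).
    if strand == '+':
        lo, hi = start - upstream, start + downstream
    elif strand == '-':
        lo, hi = start - downstream + 1, start + upstream + 1
    else:
        raise ValueError('strand must be + or -')
    if lo < 0 or hi > len(genome):
        return None
    region = genome[lo:hi]
    return region if strand == '+' else revcomp(region)


def information_content(sites: List[str]) -> List[float]:
    # Bits per column: 2 - H, with H the Shannon entropy of base frequencies.
    ic = []
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        ic.append(2.0 - entropy)
    return ic


if __name__ == '__main__':
    # Hypothetical aligned sites, for illustration only.
    sites = ['CTGTATATATATACAG', 'CTGTATAAATGTACAG', 'CTGTTTATAAATACAG']
    print([round(v, 2) for v in information_content(sites)])
```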
Putative LexA binding sites were detected in 56.6% (624) of the 1,103 sequences for which the (-100, +50) region was available (see Additional file 1), with 40 sequences displaying two LexA binding sites in tandem. All the identified LexA binding sites corresponded exclusively to the motif found in E. coli and most β/γ-Proteobacteria (Figure 2C). Given that we searched for 14 additional LexA binding motifs and that the sample of integrase sequences contained representatives from the respective clades in which these motifs have been reported, including one α-Proteobacteria species, this strongly suggests that putative LexA regulation of intI genes is restricted to organisms harboring LexA proteins able to recognize the β/γ-Proteobacteria motif. The LexA binding motif of the α-Proteobacteria is markedly divergent from that seen in E. coli and the β/γ-Proteobacteria, and this divergence is known to have arisen after the split of the α- and β/γ-Proteobacteria subclasses [45–49]. Hence, it seems very likely that LexA regulation of integrase genes also arose after this evolutionary branching point. When we examined the core 16-bp sequence of the identified E. coli-like LexA binding sites, we identified 93 distinct sequences (see Additional file 2). These LexA binding sites presented substantial diversity while maintaining a high level of conservation, as reflected in their joint information-content logo (Figure 2C). Importantly, E. coli-like LexA sites were detected in almost all Vibrionaceae superintegrons (Figure 2A) and in all but one of the mobile integron classes (Figure 2B), indicating that putative LexA regulation of intI genes is widespread among integrons.: Predicted LexA binding sites correspond to functional transcriptional-control elements: We have previously shown that LexA regulates the expression of intI in V. 
cholerae, and our in silico search identified LexA binding sites in the promoter region of intI for all sequenced Vibrio species (see Additional file 1). To further assess the overall functionality of the predicted LexA binding sites, we evaluated LexA regulation of the integrase in Vibrio parahaemolyticus strain ATCC 17802, which harbors a LexA binding site upstream of its intIA gene in a genomic context that differs substantially from that of V. cholerae (Figure 2A). Using quantitative reverse transcriptase (RT)-PCR, we determined the intIA expression level in both the wild-type strain and its lexA(Def) derivative (lacking a functional lexA gene). We found an expression ratio (mutant to wild type) of 9.28, revealing strong LexA regulation of intIA expression (Figure 3A). Furthermore, electrophoretic mobility-shift assays (EMSA) with purified V. parahaemolyticus LexA protein showed that the observed upregulation of intIA expression was directly mediated by LexA in this organism (Figure 3A).: Electrophoretic mobility-shift assay (EMSA) and quantitative real-time reverse transcription PCR on different intI genes and their respective promoters. (A) Vibrio parahaemolyticus integron. EMSA of the V. parahaemolyticus intIA promoter with purified V. parahaemolyticus LexA protein. Competitive assays using either non-specific or non-labeled Pint DNA are also shown. The intIA expression factor was calculated as the ratio of the relative intIA mRNA concentration in the V. parahaemolyticus lexA mutant strain to that in the wild type. (B) E. coli pMUR050 integrons. EMSA of the pMUR050 intI1- and intI1+ (containing the GGG insertion) promoters with purified E. coli LexA protein. The expression factor for both intI genes was calculated as the ratio of each relative intI mRNA concentration in the E. coli lexA-sulA mutant strain to that in the wild type. 
In all cases, the expression factor of recA is shown as a control, and all expression factors are mean values from three independent experiments (each in triplicate).: In several class 1 integrons, heightened expression of the cassette genes has been shown to rely on a secondary cassette promoter called PC2, located just upstream of the intI1 gene (see Additional file 3). PC2 is enabled by a GGG insertion (on the top strand) that increases the spacing between the -35 box and a sequence resembling the -10 box consensus from 14 to 17 bp, thereby generating a functional σ70 promoter [3, 4, 50, 51]. In all its reported instances, this GGG insertion disrupts a seemingly functional LexA binding site. Therefore, the GGG insertion that enables PC2 is likely to abolish integrase regulation by LexA simultaneously. We tested this hypothesis using the E. coli multi-resistant plasmid pMUR050 [52], which provides an ideal background because it harbors two integrons with inactivated copies of the intI1 gene. The promoter regions of both intI genes are almost identical and differ only in that one (PintI1-) contains a functional LexA binding site, whereas the other (PintI1+) carries the aforementioned GGG insertion, which disrupts the LexA binding site and enables the PC2 promoter (see Additional file 3). As expected, EMSA confirmed that E. coli LexA binds the PintI1- promoter, but that the GGG insertion effectively prevents LexA binding to PintI1+ (Figure 3B). Furthermore, RT-PCR in wild-type and lexA-defective backgrounds confirmed that LexA regulation was present only for the intI1- gene carrying the intact LexA binding site, which showed strong deregulation (ratio of 6.55) in the lexA mutant (Figure 3B). 
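The effect of the GGG insertion can be illustrated on a synthetic promoter. The sequence below is a hypothetical toy, not the pMUR050 or PC2 sequence: a LexA box (CTGT-N8-ACAG) overlaps a -10-like element (TATAAT) that sits 14 bp downstream of a -35 box (TTGACA). Inserting GGG immediately after the left LexA arm lengthens the spacer to 17 bp and, at the same time, pushes the two arms out of the CTGT-N8-ACAG register.

```python
# Synthetic illustration only (not the pMUR050 sequence): a toy promoter in
# which a LexA box overlaps a -10-like element. Inserting GGG after the left
# LexA arm lengthens the -35/-10 spacer and breaks the CTGT-N8-ACAG register.
import re

LEXA_BOX = re.compile(r'CTGT[ACGT]{8}ACAG')


def spacer_length(promoter: str, box35: str, box10: str) -> int:
    # Bases between the end of the -35 box and the start of the -10 box.
    i35 = promoter.find(box35)
    if i35 < 0:
        raise ValueError('-35 box not found')
    end35 = i35 + len(box35)
    i10 = promoter.find(box10, end35)
    if i10 < 0:
        raise ValueError('-10 box not found')
    return i10 - end35


def insert_at(seq: str, pos: int, insert: str = 'GGG') -> str:
    return seq[:pos] + insert + seq[pos:]


# Hypothetical toy: -35 box, 10 bp, LexA left arm (CTGT), -10-like box
# (TATAAT), 2 bp, LexA right arm (ACAG), flank.
TOY = 'TTGACA' + 'AGCTTCGAAG' + 'CTGT' + 'TATAAT' + 'AT' + 'ACAG' + 'GC'

if __name__ == '__main__':
    mutant = insert_at(TOY, 20)  # GGG placed right after the CTGT arm
    for label, seq in (('original', TOY), ('GGG insert', mutant)):
        print(label, spacer_length(seq, 'TTGACA', 'TATAAT'),
              LEXA_BOX.search(seq) is not None)
    # original 14 True
    # GGG insert 17 False
```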
Thus, the GGG insert not only enables the secondary cassette promoter PC2, but concomitantly disrupts the LexA binding site of the integrase promoter.: To check whether the GGG insert did in fact activate PC2 and increase cassette expression in the pMUR intI1+ integron, we compared RT-PCR expression profiles for the first cassette gene of both pMUR integrons. We found that cassette-gene expression was enhanced in the integron containing the GGG insertion, and that this increase was independent of LexA (data not shown). In silico searches for disrupted LexA binding sites found 44 instances of similar GGG inserts in integrons from a wide variety of species (see Additional file 4), all corresponding to class 1 mobile integrons harboring multiple antibiotic-resistance cassettes. Together, these results suggest that LexA regulation may eventually be lost under strong selection for higher basal levels of antibiotic-resistance transcripts.: Ancestral-state reconstruction of LexA regulation and integrase functionality: The confirmed LexA regulation in the V. cholerae and V. parahaemolyticus superintegrons suggested that SOS control of intI genes probably originated very early in their evolutionary history. Likewise, the complete absence of LexA binding motifs other than that of E. coli in all the intI promoters analyzed in this study indicated that LexA regulation must have been lost in integrons borne by species without LexA, or in which LexA recognizes a divergent motif [27, 47]. At the same time, there is ample evidence of extensive (10% to 30%) and independent integrase inactivation across the Bacteria domain, implying that loss of integrase functionality may be an adaptive trait under particular selective pressures [53]. In this respect, the evidence of integrase inactivation in bacterial groups in which LexA is known not to recognize the E. 
coli motif [16, 54, 55], such as the Xanthomonadales, suggests that loss of LexA regulation might be linked to mutational inactivation of the integrase gene.: To explore this hypothesis, we developed an automated system to assess integrase functionality based on the detection of generic (nonsense and indel) mutations and of integrase-specific missense mutations known to inactivate the protein (see Methods). This method was applied to the 1,135 intIA homologs identified in this work for which sufficient coding sequence was available. Consistent with previous results [53], we found that a substantial fraction of integrase genes (43%; see Additional file 5) seem to be inactivated. For the 755 intIA homologs with sufficient sequence to apply both analyses, the predicted inactivation status of each integrase sequence (active/inactive) was combined with the predicted presence of a LexA binding site in its promoter (-100, +50) region, as computed previously. A correlation analysis was then carried out to determine whether loss of LexA regulation and integrase inactivation are linked. This analysis showed a significant correlation between both traits (Pearson r = 0.58, Spearman ρ = 0.53; P < 0.001) (Figure 4), lending credence to the idea that loss of LexA regulation is associated with integrase inactivation.: Correlation between inferred LexA regulation and integron integrase functionality. The plot was generated from the frequency values for each trait at each reference panel taxon, as derived from reciprocal BLAST mapping (see Additional file 14). Pearson and Spearman rank correlations and their respective P values were computed in Excel (Microsoft Corp., Redmond, WA, USA). The asterisk rating system is used for correlation P values (***P < 0.001). 
P values correspond to a two-tailed Student t-test of the null hypothesis (no correlation).: To gain insight into the evolutionary history of this correlation, we generated a phylogenetic tree of 44 representative IntI sequences and applied ancestral-state reconstruction methods to both phenotypic characters (predicted integrase functionality and LexA regulation). The tree (Figure 5) is in overall agreement with previously published IntI phylogenies [9, 53, 56]. As in previous phylogenies, two major ecological groups can be outlined on the tree: marine and freshwater/soil bacteria. Chromosomal superintegrons and class 5 mobile integrons borne by marine species form a monophyletic clade that sits at the root of the tree. From this early branch, a second radiation encompassing chromosomal integrons and all other mobile integron classes splits neatly into integrons borne by marine bacteria and integrons borne by soil/freshwater bacteria. In the marine species, class 2 and 4 mobile integrons form a monophyletic cluster with Shewanella chromosomal integrons, which is also in agreement with previous analyses [57, 58]. In the soil/freshwater clade, class 1 and 3 mobile integrases form a distinct group, suggesting an early split from their chromosomal counterparts in the Proteobacteria [59].: Phylogenetic tree of IntI protein sequences. The tree is the majority-rule consensus tree generated by MrBayes. The tree was rooted using the Escherichia coli and Thiobacillus denitrificans XerCD protein sequences as outgroup. Bayesian posterior probabilities for each branch are displayed at each branching point. Inferred states for phenotypic traits derived from parsimony ancestral-state reconstruction are displayed as follows. Integrase functionality: solid lines on tree branches represent inferred integrase functionality in that branch, and dotted lines indicate non-functionality. 
LexA regulation: at each taxon and branching point, small filled circles represent inferred presence of LexA regulation, and open circles indicate loss of LexA regulation. For clarity, the results of maximum likelihood reconstruction are not shown (see Additional files 6 and 7 for these). The number of sequences mapping to each taxon in the reciprocal BLAST mapping analysis is shown in brackets after the taxon name. Stacked pie charts next to this number indicate the observed percentage of integrase functionality (upper pie) and LexA regulation (lower pie) among all the analyzed integrase sequences mapping to that taxon. The label M followed by a subscript number (MX) indicates mobile integron classes (1 to 5). Background colors delineate the main division into marine and soil/freshwater radiations and the XerCD outgroup.: Both parsimony and maximum likelihood (ML) reconstructions of the ancestral state for LexA regulation strongly supported the notion that this feature was present in the common ancestor of bacterial integrons. LexA regulation (Figure 5, filled circles) is pervasive among Vibrio superintegrons and is also widespread within the marine integron radiation. It is also most likely (0.7 likelihood in ML reconstruction; see Additional file 6) that LexA regulation was present in the ancestor of the soil/freshwater radiation and has subsequently been lost (Figure 5, open circles) in many of its internal clades. A notable exception to this trend is provided by the class 1 and class 3 integrons, in which LexA regulation is still the norm. Our results thus suggest that some particular trait in the environment of both chromosomal superintegrons and mobile integrons must be exerting considerable selective pressure towards preservation of integrase LexA regulation. In the chromosomal integrons of the Vibrionaceae, the most likely source of this pressure is the need to stabilize large integrons, which may include essential genes [15]. 
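The parsimony side of the ancestral-state reconstruction can be illustrated with the classic Fitch algorithm for a binary character. This is a generic textbook sketch on a hypothetical four-taxon tree; it does not reproduce the software, tree or settings behind Figure 5, and it implements only the bottom-up pass (root state set and minimum number of changes), not the top-down assignment of internal states or the ML reconstruction.

```python
# Minimal Fitch parsimony for a binary trait (e.g. LexA regulation present=1,
# absent=0) on a rooted, fully bifurcating tree written as nested tuples.
# Generic textbook algorithm for illustration; taxon names are hypothetical.
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple['Tree', 'Tree']]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    # Bottom-up pass: return (candidate state set at this node, min. changes).
    if isinstance(tree, str):               # leaf: observed state
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch(left, states)
    rset, rcost = fitch(right, states)
    common = lset & rset
    if common:                              # children agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # disagreement costs one change


if __name__ == '__main__':
    tree = (('marineA', 'marineB'), ('soilC', 'soilD'))
    lexa = {'marineA': 1, 'marineB': 1, 'soilC': 0, 'soilD': 1}
    print(fitch(tree, lexa))  # ({1}, 1)
```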
In mobile integrons, selection might favor integrons that remain largely inactive but are capable of generating sharp bursts of recombination activity in times of need for evolutionary innovation.: The reconstruction of ancestral states for inferred integrase functionality is relatively congruent with the hypothesis that the loss of LexA regulation might be associated with integrase inactivation (Figure 5; see Additional file 7). Even though there is sporadic evidence of inactivation (Figure 5, dotted lines), integrases from almost all marine species in the tree were found to be active (Figure 5, solid lines). By contrast, integrase inactivation was found to be monophyletic for two soil/freshwater subgroups, hinting at consistent selective pressure towards inactivation.: Phylogenetic distribution of predicted LexA regulation and integrase functionality: To further analyze the correlation between integrase LexA regulation and inactivation, we used reciprocal BLAST searches [60] to map the 755 IntI homologs with sufficient available sequence to assess both traits onto the panel of IntI sequences used to reconstruct the phylogenetic tree. Even though reciprocal BLAST provides only a crude estimate of phylogenetic relationship, this mapping allowed us to observe the apparent frequencies of both traits (Figure 5, pie charts) in the clusters represented by each tree taxon. Overall, the results of this analysis broadly agree with those of the ancestral-state reconstruction, and give further credence to the idea that loss of LexA regulation is associated with integrase inactivation. Nonetheless, close inspection of these results also reveals a complex pattern of phylogenetic distribution for both traits.: Among marine species, LexA regulation of intI genes is clearly prevalent, and loss of LexA regulation is present in only a few instances. One such instance is the V. 
cholerae SXT integrative-conjugative element (ICE), which harbors a class 4 integrase and for which SOS-dependent transfer has been reported through an indirect path involving the phage-like SetR repressor [37, 38, 61]. Nonetheless, mapping results show that five of the twenty sequences clustering with the V. cholerae SXT integrase have predicted LexA binding sites. These sequences belong to mobile integrases in Alteromonadales species that lack homologs of the SetR transcriptional regulator, suggesting that LexA regulation may have been preserved in the absence of SetR-mediated SOS regulation (see Additional file 8). Another exception is Lutiella nitroferrum, but the absence of predicted LexA sites is not surprising in a member of the Neisseriaceae, because all sequenced members of this family lack a lexA gene [27]. Similar reasoning applies to another exception, Rhodopirellula baltica, because the LexA of Planctomycetes is known not to recognize the conventional E. coli LexA binding motif [27].: Conversely, loss of LexA regulation seems to be the norm among soil and freshwater species harboring chromosomal integrons. In most cases, this loss of regulation has an obvious explanation. Some families, such as the Nitrosomonadaceae and the Chromatiaceae, simply do not possess any LexA homologs, thus explaining the absence of LexA binding sites upstream of their intI genes [27]. A similar argument can be made for the Xanthomonadaceae, in which neither of the two identified LexA proteins recognizes the β/γ-Proteobacteria LexA binding motif [54], and for the Spirochetes, the δ-Proteobacteria and the Cyanobacteria, in which LexA also recognizes divergent motifs [48, 62, 63]. However, reciprocal BLAST mapping indicates that residual LexA regulation persists within several of these groups. The M. flagellatus cluster, for instance, has six of thirty-two mapped sequences with predicted LexA binding sites. 
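The per-taxon percentages shown as pie charts in Figure 5 amount to grouping homologs by their mapped panel taxon and tallying both traits. The sketch below assumes the reciprocal BLAST step has already produced a table of scores against the panel and simply assigns each homolog to its highest-scoring taxon; the taxon labels, scores and trait values are hypothetical.

```python
# Sketch of per-taxon trait tallies, assuming mapping scores are already
# available as {homolog: {panel_taxon: bit_score}}. All data are hypothetical.
from collections import defaultdict
from typing import Dict, List, Tuple


def best_hits(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    # Assign each homolog to its highest-scoring panel taxon.
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def trait_frequencies(assignment: Dict[str, str],
                      traits: Dict[str, Tuple[bool, bool]]
                      ) -> Dict[str, Tuple[int, float, float]]:
    # traits[homolog] = (LexA-regulated, functional).
    # Returns, per taxon: (n mapped, % LexA-regulated, % functional).
    groups: Dict[str, List[Tuple[bool, bool]]] = defaultdict(list)
    for homolog, taxon in assignment.items():
        if homolog in traits:
            groups[taxon].append(traits[homolog])
    result = {}
    for taxon, values in groups.items():
        n = len(values)
        lexa = 100.0 * sum(v[0] for v in values) / n
        functional = 100.0 * sum(v[1] for v in values) / n
        result[taxon] = (n, lexa, functional)
    return result


if __name__ == '__main__':
    scores = {'q1': {'Mfl': 50.0, 'Vch': 20.0}, 'q2': {'Mfl': 40.0},
              'q3': {'Vch': 90.0, 'Mfl': 10.0}}
    traits = {'q1': (True, False), 'q2': (False, False), 'q3': (True, True)}
    print(trait_frequencies(best_hits(scores), traits))
```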
Careful examination reveals that, in this and all other cases of residual LexA regulation of soil/freshwater bacterial integrons, the regulated integrases turn out to be harbored by β/γ-Proteobacteria species or to originate from environmental samples (see Additional file 9). This strongly suggests that, for the most part, LexA regulation is positively maintained when a suitable genomic background (a compatible lexA gene encoding a repressor that recognizes the β/γ-Proteobacteria motif) is available.: Several factors may partly explain why no LexA binding motif other than the β/γ-Proteobacteria motif is found regulating integron integrases. An obvious explanation is the lack of evolutionary time to develop such motifs. This is manifestly true for many mobile integrons subject to lateral gene transfer. Indeed, predicted β/γ-Proteobacteria LexA binding sites can still be seen in the mobile integrons harbored by species from distant phyla, such as the Actinobacteria. Integrase inactivation is another mechanism that several groups, such as the Xanthomonadales, seem to have evolved to compensate for unregulated integrase expression [16]. Even though this constitutes a general trend (Figure 4) and functionality can be temporarily restored through non-native recombination, the observed correlation is moderate (Pearson r = 0.58***). Moreover, integrase functionality has been demonstrated experimentally in several soil/freshwater chromosomal integrons in which the integrase is clearly not regulated [64, 65], suggesting that additional mechanisms must be at play.: Class 1 and 3 mobile integron integrases depart sharply from the trend towards loss of LexA regulation seen among soil/freshwater integrons. Reciprocal BLAST mapping supports the results of ancestral-state reconstruction methods, providing ample support for the persistence of LexA regulation in these well-sampled mobile integron classes. 
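For reference, the Pearson and Spearman coefficients reported for Figure 4 can be computed from per-taxon frequencies as follows. This is a plain-Python sketch (the study used Excel); Spearman's ρ is computed as the Pearson correlation of average ranks, and the t statistic is the quantity used in a two-tailed test of r = 0. Turning t into a P value requires the Student t distribution with n - 2 degrees of freedom (for example from SciPy), which is not reimplemented here.

```python
# Pearson and Spearman correlations (average ranks for ties) in plain Python,
# plus the t statistic for testing r = 0. P values are not computed here.
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    # 1-based ranks; tied values share the mean of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(r: float, n: int) -> float:
    # t = r * sqrt((n - 2) / (1 - r^2)); undefined when |r| = 1.
    return r * math.sqrt((n - 2) / (1 - r * r))
```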
In addition, the high percentages of LexA regulation seen in both these integron classes (81% and 64%, respectively) are consistent with the high percentages of predicted regulation in the marine mobile integrons of classes 2 and 5 (84% and 89%, respectively; Figure 5). Beyond its fundamental relevance to bacterial adaptation, the high prevalence of predicted LexA regulation of mobile integron integrases has serious clinical implications, as it establishes a generic system for genetic interchange under the control of a general stress response shared by a large group of human and animal pathogens. Furthermore, bacterial conjugation has recently been shown to induce the SOS response in recipient bacteria, triggering integrase-mediated cassette recombination [66]. In this setting, it is important to note that integron cassettes encoding resistance to several antibiotics known to induce the SOS response, such as trimethoprim, quinolones and β-lactams, are common today [5, 67]. This suggests that the indirect triggering effect of these antibiotics on the capture of resistance cassettes may have resulted in a very efficient selection mechanism.: A less obvious consequence of integrase SOS regulation in clinically relevant mobile integrons is its repercussion on antibiotic-resistance policies. Current policies rely largely on the detrimental effects that most resistance mechanisms inflict on bacteria, which eventually lead to loss of resistance genes in the absence of antibiotic exposure [68]. Because most cassettes are promoterless, the most ancient cassettes (located at the distal part of the integron) are subject to severe polar effects, leading to rare or non-existent protein products (Figure 1) [4]. In this context, the incorporation of SOS regulation in integrons provides a mechanism by which antibiotic-resistance genes and other useful adaptations can be silently set aside while current adaptive traits are maintained. 
In times of stress, such as exposure to antibiotics, the relevant resistance cassettes can be recalled by integrase-mediated translocation, and thus selected for only when their expression is required. Furthermore, the cassette genes that have been temporarily relegated to distal positions in integrons may also sustain increased evolution rates, generating a substantial pool of variability to draw on when the appropriate selective pressures resurface [69].: Reciprocal BLAST mapping also shows that predicted integrase inactivation is very common among soil/freshwater bacteria, coinciding with a prevailing loss of putative LexA regulation. Nonetheless, predicted integrase inactivation is also relatively common in marine species. Even though predicted integrase inactivation correlates well with reduced LexA regulation (Figure 4), there are notable outliers to this trend in both radiations. For instance, among mobile class 5 integrons, only 44% of mapped integrases seem to be functional, despite predicted LexA regulation in 89% of them. The opposite is also true: many mobile integrons with putatively functional integrases have disrupted, absent or non-native LexA binding sites. This suggests that lack of LexA regulation can be tolerated, or selected for, when it provides an adaptive benefit. We have shown here that in some mobile class 1 integrons, the LexA binding site has been disrupted by a GGG insertion that drastically increases the expression of antibiotic-resistance cassettes (Figure 3). In a similar vein, sustained integrase activity (with its associated shuffling of gene cassettes [21, 25]) is likely to be preferable to permanent inactivation under the shifting selective conditions associated with clinical settings and mobile integrons. 
This would explain why integrase inactivation is not seen as frequently in mobile integron classes associated with clinical settings, in spite of their dissemination into bacterial species that do not harbor a lexA gene capable of regulating the existing LexA binding site.: Overall, however, the pattern of integrase inactivation is broadly in agreement with that reported previously [53]. In fact, we found a higher proportion (46%) of inactivated IntI proteins than that reported previously [53], indicating that integrase inactivation is a pervasive phenomenon that is typically correlated with loss of LexA regulation. Hence, our findings suggest that putative integrase inactivation is the main mechanism evolved to deal with lack of LexA regulation, but it seems likely that other factors provide heightened tolerance to unregulated integrase activity in soil/freshwater bacteria. Smaller integron sizes and reduced integrase activity may both help make unregulated integrase expression more tolerable, but regulation by an alternative transcription factor is an obvious possibility that needs to be carefully explored. This is particularly true because most integrase functionality assays have been carried out in a non-native context [64, 65] and may thus have missed regulatory effects. Defining precisely the multiple mechanisms behind this adaptation is an important goal, because the lack of a mechanism to mitigate the effects of integrase activity upon loss of LexA regulation may well lie at the root of the intriguing absence of chromosomal integrons from many bacterial phyla [53].: Conclusions: The results presented here illustrate the extent of SOS regulation of integron integrases, and provide several important clues to the evolution of this regulation and of bacterial integrons. 
The combination of in silico and in vivo assays allows us to conclude that LexA regulation was probably present in the primordial integron and that its loss may be linked to a number of factors, including inactivation of the integrase gene and enhancement of resistance cassette expression. Our findings have important clinical implications for the evolution of antibiotic resistance, and suggest that the emergence of mechanisms to palliate unregulated integrase expression may provide an explanation for the uneven distribution of integrons across the Bacteria domain.: Methods: Data mining and preprocessing: A custom set of scripts was developed using Biopython to search for intI homologs in the NCBI GenBank databases (NR, ENV and WGS). The scripts retrieved and re-annotated both the intI coding sequences and their corresponding upstream sequences. The scripts used the whole VchIntIA protein sequence (AAC38424) and its IntI-specific domain [70] (positions 186 to 245 in VchIntIA) as queries for a TBLASTN search. To limit the number of false positives, a cut-off e-value of 10^-5 was set, and only sequences matching both queries were retrieved.: TBLASTN results were used to identify frameshift and deletion events of up to 100 bp; larger events were not considered. The nucleotide sequences spanning the full length of the processed hits and 1 kb upstream of the hit start were recovered. Conceptual translations of these sequences (corrected for frameshifts when necessary) were then used to search a curated reference panel using BLASTP. The reference panel comprised 43 phylogenetically diverse IntI proteins, phage integrases and XerCD recombinases. The reference sequence of the best reciprocal hit was used to consistently re-annotate the start and stop points of all retrieved sequences, thereby allowing homogenization of the dataset and efficient detection of in-frame premature stop codons.
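The two-query TBLASTN filter described above can be sketched as follows. This is a minimal illustration, not the authors' Biopython scripts. It assumes the results were saved in BLAST+ tabular format (-outfmt 6, whose default 12 columns place the subject ID in column 2 and the e-value in column 11). The function names and the toy hit lines are hypothetical.

```python
from typing import Iterable, Set

EVALUE_CUTOFF = 1e-5  # cut-off e-value stated in the Methods


def subjects_passing(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Return subject IDs with at least one HSP at or below the e-value cut-off.

    Assumes BLAST+ tabular output (-outfmt 6): tab-separated columns with
    sseqid in column 2 and evalue in column 11.
    """
    passing: Set[str] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        if float(fields[10]) <= cutoff:
            passing.add(fields[1])
    return passing


def retained_homologs(full_length_hits: Iterable[str], domain_hits: Iterable[str]) -> Set[str]:
    """Keep only subjects matched by BOTH queries: the whole VchIntIA protein
    and its IntI-specific domain (positions 186-245)."""
    return subjects_passing(full_length_hits) & subjects_passing(domain_hits)


if __name__ == "__main__":
    # Toy hit lines with hypothetical IDs and values (12 columns each).
    full = [
        "VchIntIA\tseqA\t60.0\t300\t100\t5\t1\t300\t10\t910\t1e-50\t400",
        "VchIntIA\tseqB\t30.0\t200\t120\t8\t1\t200\t5\t605\t1e-3\t40",
        "VchIntIA\tseqC\t45.0\t250\t110\t6\t1\t250\t1\t750\t1e-20\t200",
    ]
    domain = [
        "IntI_domain\tseqA\t70.0\t60\t18\t0\t1\t60\t500\t680\t1e-12\t90",
        "IntI_domain\tseqB\t50.0\t60\t30\t0\t1\t60\t300\t480\t1e-8\t60",
    ]
    print(sorted(retained_homologs(full, domain)))  # ['seqA']
```

In the example, seqB fails the full-length query (1e-3 is above the cut-off) and seqC has no domain hit, so only seqA is retained. The subsequent frameshift correction and reciprocal BLASTP re-annotation are not modelled.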
Sequences with a best reciprocal hit not belonging to the IntI family (that is, phage integrases and XerCD recombinases) were removed from further analysis. Similarly, all IntI homologs lacking a significant amount of coding sequence at both ends of the predicted coding region (+30 bp downstream of the start codon and -90 bp upstream of the stop codon) were also removed from further analysis. Duplicates resulting from the use of partially redundant databases were removed, with duplicates defined as two sequences having the same sequence, coordinates and NCBI taxonomic assignment, and the same strain or plasmid number when applicable. The final annotated dataset comprised 1,483 sequences, and is available online as supplementary material in both GenBank (.GBK) and spreadsheet-compatible (.XLS) format (see Additional files 10 and 11).: Assessment of protein functionality: Integrase functionality was assessed systematically using a custom rule-based system operating on aligned IntI sequences. To generate functional rules to detect inactivation, we analyzed published structural and mutational studies of both the chromosomal V. cholerae IntI4 and the mobile IntI1 integron integrases [22, 70–74]. From this analysis, we identified a list of six essential residues in the catalytic site that cannot be mutated (R135, K160, H267, R270, H293, Y302; positions relative to the V. cholerae IntIA sequence), and eight residues essential for binding, for which only a limited range of substitutions is likely to be tolerated (L202 (→LIVM), P203 (→PST), K209, Y210 (→YFWH), P211 (→PRQ), R239 (→KRH), H240 (→KRH), H241 (→KRH); positions again relative to the V. cholerae IntIA sequence).: A multiple alignment of all IntI sequences in the reference panel was generated using MUSCLE software (http://www.drive5.com/muscle/) with an opening gap penalty of -20, and otherwise standard parameters [75].
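The catalytic and binding-residue rules listed above can be encoded as a small lookup table. The sketch below is hypothetical and simplified. It checks one gapped pairwise alignment of a query against VchIntIA directly, whereas the authors propagated the rules through the MUSCLE reference-panel alignment and then through pairwise alignments to each best hit. A gap at a rule position counts as loss of that residue, and positions outside the aligned region are not evaluated.

```python
from typing import Dict, List, Tuple

# Catalytic residues that cannot be mutated (positions relative to VchIntIA).
CATALYTIC: Dict[int, str] = {135: "R", 160: "K", 267: "H", 270: "R", 293: "H", 302: "Y"}

# Binding residues and the residues tolerated at each position (K209 is invariant).
BINDING: Dict[int, str] = {
    202: "LIVM", 203: "PST", 209: "K", 210: "YFWH",
    211: "PRQ", 239: "KRH", 240: "KRH", 241: "KRH",
}


def reference_columns(ref_aln: str, query_aln: str) -> Dict[int, Tuple[str, str]]:
    """Map 1-based ungapped reference positions to (reference residue, query residue)."""
    columns: Dict[int, Tuple[str, str]] = {}
    pos = 0
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion in the query: no reference coordinate
        pos += 1
        columns[pos] = (ref_res, query_res)
    return columns


def inactivating_mutations(ref_aln: str, query_aln: str) -> List[str]:
    """List rule violations as strings such as 'Y302F' ('-' marks a deleted residue)."""
    columns = reference_columns(ref_aln, query_aln)
    violations: List[str] = []
    for pos, allowed in list(CATALYTIC.items()) + list(BINDING.items()):
        if pos not in columns:
            continue  # position not covered by the alignment
        ref_res, query_res = columns[pos]
        if query_res not in allowed:
            violations.append(f"{ref_res}{pos}{query_res}")
    return violations
```

For example, a query carrying Y302F returns ['Y302F'] and would be tagged as carrying an inactivating mutation. A Y210W substitution passes, because W is among the residues tolerated at that position.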
The MUSCLE alignment was used to propagate the functional rules defined on the VchIntIA sequence to the reference panel IntI sequences. The consistency of this propagation was reviewed manually. Pairwise alignments of all the TBLASTN-identified homologs with their corresponding best hits were used to further propagate the functional model and allow a decision on whether each particular protein should be considered functional. IntI sequences containing an internal stop, a frameshift and/or any number of inactivating mutations were tagged as 'non-functional'. If either the start or the stop of the sequence was unavailable (see above), the functionality of the corresponding protein was tagged as 'unknown'. Otherwise, the protein was considered functional by default.: The automated rule-based system was evaluated against a reference set of integron integrase sequences for which activity has been experimentally assessed [64, 65, 76–80]. This reference set encompasses active and inactive integrases from both marine and soil/freshwater chromosomal integrons, and class 1, 2 and 3 mobile integrons. The rule-based system correctly predicted integrase activity in all these cases. It also detected all indels, frameshifts and nonsense mutations that have been reported previously in independent studies as leading to integrase inactivation [16, 53].: In silico searches for LexA binding sites: The presence of LexA binding sites in all the retrieved intI homolog sequences was assessed by scanning them using xFITOM (http://compbio.umbc.edu/2280/), a generic program for binding site search in genomic sequences [81, 82]. Searches were conducted using the Ri index [83] and a motif-normalized threshold as reported previously [84]. Identified sites were considered 'w/functional box' if located between positions -100 and +50 relative to the re-annotated intI start codon. When the sequence in the specified range was not fully available, this feature was tagged as 'unknown'.
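The site search itself was run with xFITOM, which is not reproduced here. As a rough, hypothetical illustration of its two ingredients (an Ri-style score and the -100/+50 window around the start codon), the sketch below makes several simplifying assumptions:

- It builds per-position weights 2 + log2 f(b, l) from a set of aligned sites. This is Schneider's individual information under an equiprobable background, with a pseudocount standing in for the small-sample correction.
- It scans both strands.
- It takes a site's first base as its position.
- It leaves the threshold as a free parameter, because the motif-normalized threshold of [84] is not specified here.

The training sites in the usage test are toy sequences, not a published motif.

```python
import math
from typing import Dict, List, Sequence

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def ri_weights(sites: Sequence[str], pseudocount: float = 0.5) -> List[Dict[str, float]]:
    """Per-position weights 2 + log2 f(b, l), i.e. log2(f / 0.25) against a uniform background."""
    weights = []
    for l in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[l]] += 1
        total = sum(counts.values())
        weights.append({b: 2 + math.log2(counts[b] / total) for b in BASES})
    return weights


def ri_score(weights: List[Dict[str, float]], word: str) -> float:
    """Sum of per-position weights for a word of the motif's width."""
    return sum(column[base] for column, base in zip(weights, word))


def lexa_box_status(seq: str, start_idx: int, weights: List[Dict[str, float]],
                    threshold: float, window=(-100, 50)) -> str:
    """Classify the region around an intI start codon located at 0-based index start_idx."""
    width = len(weights)
    first, last = start_idx + window[0], start_idx + window[1]
    if first < 0 or last + width > len(seq):
        return "unknown"  # the specified range is not fully available
    for pos in range(first, last + 1):
        word = seq[pos:pos + width]
        if set(word) - set(BASES):
            continue  # skip words containing ambiguous bases such as N
        reverse = word.translate(COMPLEMENT)[::-1]  # reverse complement
        if max(ri_score(weights, word), ri_score(weights, reverse)) >= threshold:
            return "with functional box"
    return "without functional box"
```

Returning on the first hit is enough here, because only presence or absence within the window is recorded.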
Searches were conducted using the 15 different LexA binding motifs reported to date [27], which include those of widely sampled phylogenetic groups, such as the Firmicutes, the Actinobacteria, the Cyanobacteria or the Alpha Proteobacteria [62, 85–87]. We also identified, and specifically searched for, a particular motif consisting of a LexA binding site inactivated by the insertion of a GGG triplet. These sites are referred to as 'broken', and were categorized as 'without functional box'. The results of integrase functionality and LexA binding site searches are fully annotated in the main dataset files (see Additional files 10 and 11).: Phylogenetic analyses: Alignments of the reference-panel protein sequences were carried out using a combined procedure to improve alignment quality, as described previously [88]. Protein sequences were first aligned with CLUSTALW (version 1.83; http://www.ebi.ac.uk/Tools/msa/clustalw2/) [89] using Gonnet matrices and gap-opening penalties of 10 (default), 25 and 5 for the multiple alignment stage, thus generating three different alignments. These three alignments, together with a local alignment generated by the T-COFFEE Lalign method, were integrated as libraries into T-COFFEE (version 1.37; http://www.ebi.ac.uk/Tools/msa/tcoffee/) [90] for optimization. The optimized alignment was then processed with Gblocks (version 0.91b; http://molevol.cmima.csic.es/castresana/Gblocks.html) [91] with the half-gaps setting and otherwise default parameters to select conserved positions and discard poorly aligned and phylogenetically unreliable information. Phylogenetic analyses were then carried out using MrBayes (version 3.1.1; http://mrbayes.csit.fsu.edu/) and PHYML (version 2.4.1; http://code.google.com/p/phyml/) [92] for Bayesian and maximum-likelihood inference of tree topologies, respectively, as reported previously [88].
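The 'broken' LexA sites described above can be illustrated with a deliberately simplified, hypothetical check that does not reproduce the authors' xFITOM-based search. A window that contains no intact site, but becomes one when a single internal GGG triplet is deleted, is reported as broken. To keep the example self-contained, the CTGTN8ACAG consensus of the E. coli LexA box stands in for a proper scoring model.

```python
import re
from typing import List

# E. coli-type LexA box consensus CTGTN8ACAG (16 bp), used here as a simple stand-in.
LEXA_BOX = re.compile(r"CTGT[ACGT]{8}ACAG")
BOX_LEN = 16


def is_box(word: str) -> bool:
    """True if the whole word matches the consensus."""
    return LEXA_BOX.fullmatch(word) is not None


def broken_boxes(seq: str) -> List[int]:
    """Start positions of 19-bp windows that match the consensus only after removing one GGG."""
    width = BOX_LEN + 3
    hits: List[int] = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if LEXA_BOX.search(window):
            continue  # an intact box already lies inside this window
        # Internal triplets only; a terminal GGG would leave an intact box, skipped above.
        for i in range(1, width - 3):
            if window[i:i + 3] == "GGG" and is_box(window[:i] + window[i + 3:]):
                hits.append(start)
                break
    return hits
```

For instance, "TTTTT" + "CTGTATATGGGATATACAG" + "TTTTT" yields [5], whereas the same flanks around the intact box CTGTATATATATACAG yield an empty list.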
A mixed four-category γ-distributed rate plus proportion of invariable sites model [invgamma] was applied and its parameters were estimated independently by the program. Eight independent MrBayes Metropolis-coupled Markov chain Monte Carlo runs were carried out with four independent chains for 10^6 generations. The resulting phylogenetic trees were plotted with TreeView (version 1.6.6; http://taxonomy.zoology.gla.ac.uk/rod/treeview.html) [93] and edited for presentation using CorelDraw Graphic Suite (version 12; Corel Corp., Fremont, CA, USA).: Ancestral-state reconstruction was conducted with the Mesquite ancestral-state reconstruction package (Mesquite Software Inc., Austin, TX, USA) [94] using the majority-rule consensus tree generated by MrBayes. The results of in silico searches for LexA binding sites were mapped onto a discrete (1/0/?) character for each taxon of the tree. Reconstruction of LexA binding site presence was first carried out using the ML reconstruction method [95, 96] and the AsymmMk model (Asymmetrical Markov k-state two-parameter model), estimating asymmetric rates of change between character states. The estimated rates (0.145 forward, 0.813 backward) were then converted into parsimony steps by direct inversion (6.89, 1.23), and used to generate the step matrix for parsimony reconstruction [97]. The results from in silico integrase functionality assessment were also mapped onto a discrete (1/0/?) character for each taxon. Ancestral-state reconstruction for this character was carried out using both an ordered parsimony model and an AsymmMk-based maximum-likelihood model. The results of both reconstruction methods were broadly in agreement (see Additional file 7), but for clarity, only parsimony results are superimposed on Figure 5.: EMSA: The V. parahaemolyticus and E. coli lexA genes were amplified using suitable primers (see Additional file 12) and cloned into a pET15b vector (see Additional file 13).
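The rate-to-step conversion reported above is simple arithmetic: 1/0.145 ≈ 6.897 and 1/0.813 ≈ 1.230, which truncate to the published 6.89 and 1.23. The sketch below shows how such a step matrix can drive a Sankoff-style weighted-parsimony reconstruction on a toy tree. It is not Mesquite's implementation. It assumes that 'forward' denotes the 0→1 change (gain of a LexA binding site, coded 1) and 'backward' the 1→0 change. The nested-tuple tree format is hypothetical.

```python
import math
from typing import Dict, Tuple, Union

FORWARD_RATE = 0.145   # assumed 0 -> 1 (gain of LexA regulation)
BACKWARD_RATE = 0.813  # assumed 1 -> 0 (loss of LexA regulation)


def truncate2(x: float) -> float:
    """Truncate to two decimals, which reproduces the published 6.89 and 1.23."""
    return math.floor(x * 100) / 100


# STEP[(parent_state, child_state)]: cost of that change along a branch.
STEP: Dict[Tuple[int, int], float] = {
    (0, 0): 0.0,
    (1, 1): 0.0,
    (0, 1): truncate2(1 / FORWARD_RATE),   # 6.89
    (1, 0): truncate2(1 / BACKWARD_RATE),  # 1.23
}

Tree = Union[str, tuple]  # a leaf name, or a tuple of child subtrees


def sankoff(tree: Tree, observed: Dict[str, Union[int, str]]) -> Dict[int, float]:
    """Minimum cost of each subtree given state 0 or 1 at its root (Sankoff algorithm)."""
    if isinstance(tree, str):
        state = observed[tree]
        # '?' (unknown) leaves are compatible with both states at no cost.
        return {s: 0.0 if state in ("?", s) else math.inf for s in (0, 1)}
    children = [sankoff(child, observed) for child in tree]
    return {
        s: sum(min(STEP[(s, t)] + child[t] for t in (0, 1)) for child in children)
        for s in (0, 1)
    }


if __name__ == "__main__":
    # Toy tree ((A, B), C): A and B carry a LexA box, C does not.
    print(sankoff((("A", "B"), "C"), {"A": 1, "B": 1, "C": 0}))  # {0: 6.89, 1: 1.23}
```

With these weights, an ancestral LexA box followed by one loss (cost 1.23) is cheaper than an ancestral absence followed by one gain (cost 6.89). This is the sense in which the asymmetric step matrix favours losses over gains.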
Overexpression and purification of the V. parahaemolyticus and E. coli LexA proteins were performed as described previously for other LexA proteins [84]. Each DNA probe was constructed using two complementary 100 bp synthetic oligonucleotides (see Additional file 12). EMSA experiments were performed as described previously [84], using 80 nmol/l V. parahaemolyticus LexA or 200 nmol/l E. coli LexA protein and 20 ng of each DIG-labeled DNA probe in the binding mixture. For EMSA competition assays, a 200-fold excess of either specific or non-specific unlabeled DNA was added to the binding mixture. In all cases, samples were loaded onto 6% non-denaturing Tris-glycine polyacrylamide gels. Digoxigenin-labeled DNA-protein complexes were detected following the manufacturer's protocol (Roche Applied Science, Indianapolis, IN, USA).: RNA extraction and RT-PCR: RT-PCR experiments were performed (Titan One Tube RT-PCR System; Roche) with suitable oligonucleotides (see Additional file 12 for the list), following the manufacturer's instructions. Real-time quantitative RT-PCR analysis of total RNA was carried out in a PCR system (LightCycler; Roche) using a commercial kit (LCRNA Master SYBR Green I Kit; Roche) according to the manufacturer's instructions. Transcription of pMUR050 intID1 and intID2 genes (under control of PintI1- and PintI1+, respectively) was determined in wild-type E. coli K12 and in a lexA-defective strain (UA6189). Both strains contained either the pUA1105 (intA1) or the pUA1106 (intA2) plasmid. Expression of the V. parahaemolyticus intI gene was tested in the ATCC17802 wild-type strain and in a lexA-defective strain (UA10001) (see Additional file 13). In both cases, expression of the recA gene was used as the positive control, and the mRNA concentration for each gene was normalized to that of the housekeeping dxs gene.
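The dxs normalization just described, together with the mutant/wild-type ratio defined in the next sentence, amounts to a ratio of ratios. The sketch below makes two assumptions that the text does not state. First, each experiment's concentrations have already been averaged over its technical triplicate. Second, the per-experiment factors are then averaged. The numbers in the example are invented for illustration and are not the study's data.

```python
from statistics import mean
from typing import Iterable, Tuple

Pair = Tuple[float, float]  # (target gene mRNA concentration, dxs mRNA concentration)


def relative_expression(sample: Pair) -> float:
    """Target mRNA concentration normalized to the housekeeping dxs gene."""
    gene, dxs = sample
    return gene / dxs


def expression_factor(mutant: Pair, wild_type: Pair) -> float:
    """Relative expression in the lexA mutant divided by that in the wild type."""
    return relative_expression(mutant) / relative_expression(wild_type)


def mean_expression_factor(experiments: Iterable[Tuple[Pair, Pair]]) -> float:
    """Mean factor over independent experiments, each given as (mutant, wild_type)."""
    return mean(expression_factor(m, w) for m, w in experiments)


if __name__ == "__main__":
    # Three invented experiments (arbitrary units), not data from this study.
    runs = [((8.0, 2.0), (1.0, 2.0)), ((6.0, 1.5), (0.5, 1.0)), ((9.0, 3.0), (1.0, 2.0))]
    print(round(mean_expression_factor(runs), 2))  # 7.33
```

A factor above 1 indicates derepression in the lexA mutant, which is the signature expected of a LexA-regulated gene such as the recA positive control.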
The expression factor was calculated as the ratio of the relative mRNA concentration for each gene in the corresponding lexA mutant strain with respect to that in the wild type. In each case, the mean value from three independent experiments (each in triplicate) was calculated. Strains UA6189 and UA10001 were constructed, respectively, using the Lambda-Red recombinase system [98] and the marker exchange procedure, as described previously [99].: Funding: This work was supported by grants from the Ministère de la Recherche et de l'Enseignement supérieur, the Conseil Régional du Limousin, the Fondation pour la Recherche Médicale (FRM) and from the Institut National de la Santé et de la Recherche Médicale (Inserm) for the Ploy laboratory; by the Institut Pasteur, the Centre National de la Recherche Scientifique (CNRS URA 2171), the FRM and the EU (NoE EuroPathoGenomics, LSHB-CT-2005-512061), for the Mazel laboratory; and by grants BFU2008-01078/BMC from the Ministerio de Ciencia e Innovación de España and 2009SGR-1106 from the Generalitat de Catalunya, for the Barbé laboratory. NSA was supported by the Fundació Cellex at the Erill laboratory.: References: Stokes HW, Hall RM: A novel family of potentially mobile DNA elements encoding site-specific gene-integration functions: integrons. Molecular microbiology. 1989, 3: 1669-1683. 10.1111/j.1365-2958.1989.tb00153.x.: Collis CM, Kim MJ, Stokes HW, Hall RM: Integron-encoded IntI integrases preferentially recognize the adjacent cognate attI site in recombination with a 59-be site. Molecular microbiology. 2002, 46: 1415-1427. 10.1046/j.1365-2958.2002.03260.x.: Levesque C, Brassard S, Lapointe J, Roy PH: Diversity and relative strength of tandem promoters for the antibiotic-resistance genes of several integrons. Gene. 1994, 142: 49-54. 10.1016/0378-1119(94)90353-0.: Collis CM, Hall RM: Expression of antibiotic resistance genes in the integrated cassettes of integrons. Antimicrobial agents and chemotherapy.
1995, 39: 155-162.: Rowe-Magnus DA, Mazel D: The role of integrons in antibiotic resistance gene capture. Int J Med Microbiol. 2002, 292: 115-125. 10.1078/1438-4221-00197.: Fluit AC, Schmitz FJ: Resistance integrons and super-integrons. Clin Microbiol Infect. 2004, 10: 272-288. 10.1111/j.1198-743X.2004.00858.x.: Partridge SR, Tsafnat G, Coiera E, Iredell JR: Gene cassettes and cassette arrays in mobile resistance integrons. FEMS microbiology reviews. 2009, 33: 757-784. 10.1111/j.1574-6976.2009.00175.x.: Cambray G, Guerout AM, Mazel D: Integrons. Annual review of genetics. 2010, 44: 141-166. 10.1146/annurev-genet-102209-163504.: Mazel D: Integrons: agents of bacterial evolution. Nature reviews Microbiology. 2006, 4: 608-620. 10.1038/nrmicro1462.: Boucher Y, Labbate M, Koenig JE, Stokes HW: Integrons: mobilizable platforms that promote genetic diversity in bacteria. Trends in microbiology. 2007, 15: 301-309. 10.1016/j.tim.2007.05.004.: Mazel D, Dychinco B, Webb VA, Davies J: A distinctive class of integron in the Vibrio cholerae genome. Science (New York, NY). 1998, 280: 605-608. 10.1126/science.280.5363.605.: Rowe-Magnus DA, Guerout AM, Mazel D: Super-integrons. Research in microbiology. 1999, 150: 641-651. 10.1016/S0923-2508(99)00127-8.: Rowe-Magnus DA, Guerout AM, Ploncard P, Dychinco B, Davies J, Mazel D: The evolutionary history of chromosomal super-integrons provides an ancestry for multiresistant integrons. Proceedings of the National Academy of Sciences of the United States of America. 2001, 98: 652-657. 10.1073/pnas.98.2.652.: Vaisvila R, Morgan RD, Posfai J, Raleigh EA: Discovery and distribution of super-integrons among pseudomonads. Molecular microbiology. 2001, 42: 587-601.: Rowe-Magnus DA, Guerout AM, Biskri L, Bouige P, Mazel D: Comparative analysis of superintegrons: engineering extensive genetic diversity in the Vibrionaceae. Genome research. 2003, 13: 428-442.
10.1101/gr.617103.: Gillings MR, Holley MP, Stokes HW, Holmes AJ: Integrons in Xanthomonas: a source of species genome diversity. Proceedings of the National Academy of Sciences of the United States of America. 2005, 102: 4419-4424. 10.1073/pnas.0406620102.: Rowe-Magnus DA, Guerout AM, Mazel D: Bacterial resistance evolution by recruitment of super-integron gene cassettes. Molecular microbiology. 2002, 43: 1657-1669. 10.1046/j.1365-2958.2002.02861.x.: Melano R, Petroni A, Garutti A, Saka HA, Mange L, Pasteran F, Rapoport M, Rossi A, Galas M: New carbenicillin-hydrolyzing beta-lactamase (CARB-7) from Vibrio cholerae non-O1, non-O139 strains encoded by the VCR region of the V. cholerae genome. Antimicrobial agents and chemotherapy. 2002, 46: 2162-2168. 10.1128/AAC.46.7.2162-2168.2002.: Petroni A, Melano RG, Saka HA, Garutti A, Mange L, Pasteran F, Rapoport M, Miranda M, Faccone D, Rossi A, Hoffman PS, Galas MF: CARB-9, a carbenicillinase encoded in the VCR region of Vibrio cholerae non-O1, non-O139 belongs to a family of cassette-encoded beta-lactamases. Antimicrobial agents and chemotherapy. 2004, 48: 4042-4046. 10.1128/AAC.48.10.4042-4046.2004.: Le Roux F, Zouine M, Chakroun N, Binesse J, Saulnier D, Bouchier C, Zidane N, Ma L, Rusniok C, Lajus A, Buchrieser C, Médigue C, Polz MF, Mazel D: Genome sequence of Vibrio splendidus: an abundant planctonic marine species with a large genotypic diversity. Environmental microbiology. 2009, 11: 1959-1970. 10.1111/j.1462-2920.2009.01918.x.: Labbate M, Boucher Y, Joss MJ, Michael CA, Gillings MR, Stokes HW: Use of chromosomal integron arrays as a phylogenetic typing system for Vibrio cholerae pandemic strains. Microbiology (Reading, England). 2007, 153: 1488-1498. 10.1099/mic.0.2006/001065-0.: MacDonald D, Demarre G, Bouvier M, Mazel D, Gopaul DN: Structural basis for broad DNA-specificity in integron recombination. Nature. 2006, 440: 1157-1162. 
10.1038/nature04643.: Bouvier M, Ducos-Galand M, Loot C, Bikard D, Mazel D: Structural features of single-stranded integron cassette attC sites and their role in strand selection. PLoS genetics. 2009, 5: e1000632. 10.1371/journal.pgen.1000632.: Bouvier M, Demarre G, Mazel D: Integron cassette insertion: a recombination process involving a folded single strand substrate. The EMBO journal. 2005, 24: 4356-4367. 10.1038/sj.emboj.7600898.: Guerin E, Cambray G, Sanchez-Alberola N, Campoy S, Erill I, Da Re S, Gonzalez-Zorn B, Barbe J, Ploy M-C, Mazel D: The SOS Response Controls Integron Recombination. Science. 2009, 324: 1034. 10.1126/science.1172914.: Walker GC: Mutagenesis and inducible responses to deoxyribonucleic acid damage in Escherichia coli. Microbiol Rev. 1984, 48: 60-93.: Erill I, Campoy S, Barbe J: Aeons of distress: an evolutionary perspective on the bacterial SOS response. FEMS microbiology reviews. 2007, 31: 637-656. 10.1111/j.1574-6976.2007.00082.x.: Aertsen A, Michiels CW: Upstream of the SOS response: figure out the trigger. Trends in microbiology. 2006, 14: 421-423. 10.1016/j.tim.2006.08.006.: Sassanfar M, Roberts JW: Nature of the SOS-inducing signal in Escherichia coli. The involvement of DNA replication. Journal of molecular biology. 1990, 212: 79-96. 10.1016/0022-2836(90)90306-7.: Little JW: Mechanism of specific LexA cleavage: autodigestion and the role of RecA coprotease. Biochimie. 1991, 73: 411-421. 10.1016/0300-9084(91)90108-D.: Fernandez De Henestrosa AR, Ogi T, Aoyagi S, Chafin D, Hayes JJ, Ohmori H, Woodgate R: Identification of additional genes belonging to the LexA regulon in Escherichia coli. Molecular microbiology. 2000, 35: 1560-1572.: Waldor MK, Friedman DI: Phage regulatory circuits and virulence gene expression. Current opinion in microbiology. 2005, 8: 459-465.
10.1016/j.mib.2005.06.001.: Quinones M, Davis BM, Waldor MK: Activation of the Vibrio cholerae SOS response is not required for intestinal cholera toxin production or colonization. Infection and immunity. 2006, 74: 927-930. 10.1128/IAI.74.2.927-930.2006.: Kimmitt PT, Harwood CR, Barer MR: Toxin gene expression by shiga toxin-producing Escherichia coli: the role of antibiotics and the bacterial SOS response. Emerging infectious diseases. 2000, 6: 458-465. 10.3201/eid0605.000503.: Aleshkin GI, Kadzhaev KV, Markov AP: High and low UV-dose responses in SOS-induction of the precise excision of transposons tn1, Tn5 and Tn10 in Escherichia coli. Mutation research. 1998, 401: 179-191.: Ubeda C, Maiques E, Knecht E, Lasa I, Novick RP, Penades JR: Antibiotic-induced SOS response promotes horizontal dissemination of pathogenicity island-encoded virulence factors in staphylococci. Molecular microbiology. 2005, 56: 836-844. 10.1111/j.1365-2958.2005.04584.x.: Kelley WL: Lex marks the spot: the virulent side of SOS and a closer look at the LexA regulon. Molecular microbiology. 2006, 62: 1228-1238. 10.1111/j.1365-2958.2006.05444.x.: Beaber JW, Hochhut B, Waldor MK: SOS response promotes horizontal dissemination of antibiotic resistance genes. Nature. 2004, 427: 72-74. 10.1038/nature02241.: Phillips I, Culebras E, Moreno F, Baquero F: Induction of the SOS response by new 4-quinolones. The Journal of antimicrobial chemotherapy. 1987, 20: 631-638. 10.1093/jac/20.5.631.: Miller C, Thomsen LE, Gaggero C, Mosseri R, Ingmer H, Cohen SN: SOS response induction by beta-lactams and bacterial defense against antibiotic lethality. Science (New York, NY). 2004, 305: 1629-1631. 10.1126/science.1101630.: Maiques E, Ubeda C, Campoy S, Salvador N, Lasa I, Novick RP, Barbe J, Penades JR: beta-lactam antibiotics induce the SOS response and horizontal transfer of virulence factors in Staphylococcus aureus. Journal of bacteriology. 2006, 188: 2726-2729. 
10.1128/JB.188.7.2726-2729.2006.: Goerke C, Koller J, Wolz C: Ciprofloxacin and trimethoprim cause phage induction and virulence modulation in Staphylococcus aureus. Antimicrobial agents and chemotherapy. 2006, 50: 171-177. 10.1128/AAC.50.1.171-177.2006.: Avison MB: New approaches to combating antimicrobial drug resistance. Genome biology. 2005, 6: 243. 10.1186/gb-2005-6-13-243.: Michael CA, Gillings MR, Holmes AJ, Hughes L, Andrew NR, Holley MP, Stokes HW: Mobile gene cassettes: a fundamental resource for bacterial evolution. The American naturalist. 2004, 164: 1-12. 10.1086/421733.: Tapias A, Barbe J: Mutational analysis of the Rhizobium etli recA operator. Journal of bacteriology. 1998, 180: 6325-6331.: Campoy S, Fontes M, Padmanabhan S, Cortes P, Llagostera M, Barbe J: LexA-independent DNA damage-mediated induction of gene expression in Myxococcus xanthus. Molecular microbiology. 2003, 49: 769-781.: Mazon G, Erill I, Campoy S, Cortes P, Forano E, Barbe J: Reconstruction of the evolutionary history of the LexA-binding sequence. Microbiology. 2004, 150: 3783-3795. 10.1099/mic.0.27315-0.: Jara M, Nunez C, Campoy S, Fernandez de Henestrosa AR, Lovley DR, Barbe J: Geobacter sulfurreducens has two autoregulated lexA genes whose products do not bind the recA promoter: differing responses of lexA and recA to DNA damage. Journal of bacteriology. 2003, 185: 2493-2502. 10.1128/JB.185.8.2493-2502.2003.: Campoy S, Salvador N, Cortes P, Erill I, Barbe J: Expression of canonical SOS genes is not under LexA repression in Bdellovibrio bacteriovorus. Journal of bacteriology. 2005, 187: 5367-5375. 10.1128/JB.187.15.5367-5375.2005.: Kim T-E, Kwon H-J, Cho S-H, Kim S, Lee B-K, Yoo H-S, Park Y-H, Kim S-J: Molecular differentiation of common promoters in Salmonella class 1 integrons. Journal of Microbiological Methods. 2007, 68: 453-457.
10.1016/j.mimet.2006.09.019.: Jove T, Da Re S, Denis F, Mazel D, Ploy MC: Inverse correlation between promoter strength and excision activity in class 1 integrons. PLoS genetics. 2010, 6: e1000793. 10.1371/journal.pgen.1000793.: Gonzalez-Zorn B, Catalan A, Escudero JA, Dominguez L, Teshager T, Porrero C, Moreno MA: Genetic basis for dissemination of armA. The Journal of antimicrobial chemotherapy. 2005, 56: 583-585. 10.1093/jac/dki246.: Nemergut DR, Robeson MS, Kysela RF, Martin AP, Schmidt SK, Knight R: Insights and inferences about integron evolution from genomic data. BMC genomics. 2008, 9: 261. 10.1186/1471-2164-9-261.: Yang MK, Yang YC, Hsu CH: Characterization of Xanthomonas axonopodis pv. citri LexA: recognition of the LexA binding site. Mol Genet Genomics. 2002, 268: 477-487. 10.1007/s00438-002-0754-6.: Campoy S, Mazon G, Fernandez de Henestrosa AR, Llagostera M, Monteiro PB, Barbe J: A new regulatory DNA motif of the gamma subclass Proteobacteria: identification of the LexA protein binding site of the plant pathogen Xylella fastidiosa. Microbiology (Reading, England). 2002, 148: 3583-3597.: Diaz-Mejia JJ, Amabile-Cuevas CF, Rosas I, Souza V: An analysis of the evolutionary relationships of integron integrases, with emphasis on the prevalence of class 1 integrons in Escherichia coli isolates from clinical and environmental origins. Microbiology (Reading, England). 2008, 154: 94-102. 10.1099/mic.0.2007/008649-0.: Boucher Y, Nesbo CL, Joss MJ, Robinson A, Mabbutt BC, Gillings MR, Doolittle WF, Stokes HW: Recovery and evolutionary analysis of complete integron gene cassette arrays from Vibrio. BMC Evolutionary Biology. 2006, 6: 3. 10.1186/1471-2148-6-3.: Larouche A, Roy PH: Analysis by mutagenesis of a chromosomal integron integrase from Shewanella amazonensis SB2BT. Journal of bacteriology. 2009, 191: 1933-1940.
10.1128/JB.01537-08.: Gillings M, Boucher Y, Labbate M, Holmes A, Krishnan S, Holley M, Stokes HW: The evolution of class 1 integrons and the rise of antibiotic resistance. Journal of bacteriology. 2008, 190: 5095-5100. 10.1128/JB.00152-08.: Fuchsman CA, Rocap G: Whole-genome reciprocal BLAST analysis reveals that planctomycetes do not share an unusually large number of genes with Eukarya and Archaea. Applied and environmental microbiology. 2006, 72: 6841-6844. 10.1128/AEM.00429-06.: Beaber JW, Waldor MK: Identification of operators and promoters that control SXT conjugative transfer. Journal of bacteriology. 2004, 186: 5945-5949. 10.1128/JB.186.17.5945-5949.2004.: Mazon G, Lucena JM, Campoy S, Fernandez de Henestrosa AR, Candau P, Barbe J: LexA-binding sequences in Gram-positive and cyanobacteria are closely related. Mol Genet Genomics. 2004, 271: 40-49. 10.1007/s00438-003-0952-x.: Cune J, Cullen P, Mazon G, Campoy S, Adler B, Barbe J: The Leptospira interrogans lexA gene is not autoregulated. Journal of bacteriology. 2005, 187: 5841-5845. 10.1128/JB.187.16.5841-5845.2005.: Holmes AJ, Holley MP, Mahon A, Nield B, Gillings M, Stokes HW: Recombination activity of a distinctive integron-gene cassette system associated with Pseudomonas stutzeri populations in soil. Journal of bacteriology. 2003, 185: 918-928. 10.1128/JB.185.3.918-928.2003.: Leon G, Roy PH: Excision and integration of cassettes by an integron integrase of Nitrosomonas europaea. Journal of bacteriology. 2003, 185: 2036-2041. 10.1128/JB.185.6.2036-2041.2003.: Baharoglu Z, Bikard D, Mazel D: Conjugative DNA transfer induces the bacterial SOS response and promotes antibiotic resistance development through integron activation. PLoS genetics. 2010, 6: e1001165. 10.1371/journal.pgen.1001165.: Fonseca EL, Dos Santos Freitas F, Vieira VV, Vicente AC: New qnr Gene Cassettes Associated with Superintegron Repeats in Vibrio cholerae O1. Emerging infectious diseases. 2008, 14: 1129-1131.
10.3201/eid1407.080132.: Andersson DI, Levin BR: The biological cost of antibiotic resistance. Current opinion in microbiology. 1999, 2: 489-493. 10.1016/S1369-5274(99)00005-3.: Gupta RD, Tawfik DS: Directed enzyme evolution via small and effective neutral drift libraries. Nature methods. 2008, 5: 939-942. 10.1038/nmeth.1262.: Messier N, Roy PH: Integron integrases possess a unique additional domain necessary for activity. Journal of bacteriology. 2001, 183: 6699-6706. 10.1128/JB.183.22.6699-6706.2001.: Johansson C, Boukharta L, Eriksson J, Aqvist J, Sundstrom L: Mutagenesis and homology modeling of the Tn21 integron integrase IntI1. Biochemistry. 2009, 48: 1743-1753. 10.1021/bi8020235.: Gravel A, Messier N, Roy PH: Point mutations in the integron integrase IntI1 that affect recombination and/or substrate recognition. Journal of bacteriology. 1998, 180: 5437-5442.: Demarre G, Frumerie C, Gopaul DN, Mazel D: Identification of key structural determinants of the IntI1 integron integrase that influence attC × attI1 recombination efficiency. Nucleic acids research. 2007, 35: 6475-6489. 10.1093/nar/gkm709.: Frumerie C, Ducos-Galand M, Gopaul DN, Mazel D: The relaxed requirements of the integron cleavage site allow predictable changes in integron target specificity. Nucleic acids research. 2010, 38: 559-569. 10.1093/nar/gkp990.: Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research. 2004, 32: 1792-1797. 10.1093/nar/gkh340.: Drouin F, Melancon J, Roy PH: The IntI-like tyrosine recombinase of Shewanella oneidensis is active as an integron integrase. Journal of bacteriology. 2002, 184: 1811-1815. 10.1128/JB.184.6.1811-1815.2002.: Collis CM, Kim MJ, Partridge SR, Stokes HW, Hall RM: Characterization of the class 3 integron and the site-specific recombination system it determines. Journal of bacteriology. 2002, 184: 3017-3026. 
10.1128/JB.184.11.3017-3026.2002.: Hansson K, Sundstrom L, Pelletier A, Roy PH: IntI2 integron integrase in Tn7. Journal of bacteriology. 2002, 184: 1712-1721. 10.1128/JB.184.6.1712-1721.2002.: Biskri L, Bouvier M, Guerout AM, Boisnard S, Mazel D: Comparative study of class 1 integron and Vibrio cholerae superintegron integrase activities. Journal of bacteriology. 2005, 187: 1740-1750. 10.1128/JB.187.5.1740-1750.2005.: Martinez E, de la Cruz F: Genetic elements involved in Tn21 site-specific integration, a novel mechanism for the dissemination of antibiotic resistance genes. The EMBO journal. 1990, 9: 1275-1281.: Erill I, O'Neill MC: A reexamination of information theory-based methods for DNA-binding site identification. BMC bioinformatics. 2009, 10: 57. 10.1186/1471-2105-10-57.: Bhargava N, Erill I: xFITOM: a generic GUI tool to search for transcription factor binding sites. Bioinformation. 2010, 5: 49-50.: Schneider TD: Information Content of Individual Genetic Sequences. Journal of Theoretical Biology. 1997, 189: 427-441. 10.1006/jtbi.1997.0540.: Abella M, Campoy S, Erill I, Rojo F, Barbe J: Cohabitation of two different lexA regulons in Pseudomonas putida. Journal of bacteriology. 2007, 189: 8855-8862. 10.1128/JB.01213-07.: Cheo DL, Bayles KW, Yasbin RE: Elucidation of regulatory elements that control damage induction and competence induction of the Bacillus subtilis SOS system. Journal of bacteriology. 1993, 175: 5907-5915.: Movahedzadeh F, Colston MJ, Davis EO: Characterization of Mycobacterium tuberculosis LexA: recognition of a Cheo (Bacillus-type SOS) box. Microbiology (Reading, England). 1997, 143 (Pt 3): 929-936.: Fernandez de Henestrosa AR, Rivera E, Tapias A, Barbe J: Identification of the Rhodobacter sphaeroides SOS box. Molecular microbiology. 1998, 28: 991-1003. 10.1046/j.1365-2958.1998.00860.x.: Erill I, Campoy S, Mazon G, Barbe J: Dispersal and regulation of an adaptive mutagenesis cassette in the bacteria domain. Nucleic acids research.
2006, 34: 66-77. 10.1093/nar/gkj412.: Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic acids research. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.: Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of molecular biology. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.: Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular biology and evolution. 2000, 17: 540-552.: Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic biology. 2003, 52: 696-704. 10.1080/10635150390235520.: Page RD: TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12: 357-358.: Mesquite: a modular system for evolutionary analysis. Version 2.72. http://mesquiteproject.org: Pagel M: The maximum likelihood approach to reconstructing ancestral character states of discrete characters on phylogenies. Systematic biology. 1999, 48: 612-622. 10.1080/106351599260184.: Schluter D, Price T, Mooers AO, Ludwig D: Likelihood of ancestor states in adaptive radiation. Evolution. 1997, 51: 1699-1711. 10.2307/2410994.: Omland KE: The assumptions and challenges of ancestral state reconstructions. Systematic biology. 1999, 48: 604-611. 10.1080/106351599260175.: Datsenko KA, Wanner BL: One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proceedings of the National Academy of Sciences of the United States of America. 2000, 97: 6640-6645. 10.1073/pnas.120163297.: Abella M, Erill I, Jara M, Mazon G, Campoy S, Barbe J: Widespread distribution of a lexA-regulated DNA damage-inducible multiple gene cassette in the Proteobacteria phylum. Molecular microbiology. 2004, 54: 212-222.
10.1111/j.1365-2958.2004.04260.x.: Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome research. 2004, 14: 1188-1190. 10.1101/gr.849004.: Acknowledgements: We thank Mike C. O'Neill for his careful reading and comments on the different versions of this manuscript. We also thank Nicholas Friedman for his assistance in ancestral-state reconstruction techniques.: Author information: Corresponding authors: Correspondence to Didier Mazel or Ivan Erill.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: GC implemented the functionality assessment method and carried out data mining, preprocessing and statistical analysis. NSA performed protein purification, RT-PCR and mobility-shift assays. SC and JB designed and directed the in vitro and in vivo studies and provided expertise on LexA binding motifs. EG, SDR and MCP coordinated in vitro and in vivo analyses and provided expertise on integrase expression. BGZ provided the pMUR plasmid and participated in coordination. GC, NSA, DM and IE conceived of the study and participated in its design and coordination. IE developed the site-search method, carried out phylogenetic reconstruction and ancestral-state reconstruction, and directed the in silico and statistical analyses. DM and GC developed the functionality assessment method. DM and IE coordinated and directed this work and drafted the manuscript. All authors read and approved the final manuscript.: Guillaume Cambray and Neus Sanchez-Alberola contributed equally to this work.: Electronic supplementary material: Additional file 1: Identified LexA binding sites in the promoter region (-100, +50 of the start codon) of integrase homologs from the WGS, NR and ENV NCBI databases. (XLS 114 KB): Additional file 2: List of the 93 unique, distinct LexA binding sites identified in this work. 
(XLS 27 KB): Additional file 3: (A) Schematic representation of the pMUR050 plasmid, showing (bold) the location of the two intI1 homologs. (B) Schematic representation of the promoter region of both intI1 homologs, showing the organization of the PintI1- and PintI1+ promoters, the standard cassette promoter (PC) and the secondary cassette promoter (PC2) enabled by the GGG insertion. For both genes, promoter elements are also mapped onto their corresponding sequence fragments. Red boxes depict LexA binding sites, black boxes outline the -35 and -10 elements of the PintI1 promoter, and green boxes depict the secondary PC2 promoter. (JPEG 951 KB): Additional file 4: List of the 45 LexA binding sites presenting GGG disruption identified in this work. (XLS 19 KB): Additional file 5: Predicted functionality status for the 1,135 intI homolog sequences for which sufficient coding sequence was available to determine inactivation using the in silico method reported in this work. (XLS 376 KB): Additional file 6: Phylogenetic tree of IntI protein sequences showing the maximum likelihood ancestral-state reconstruction of LexA regulation, as inferred from in silico analyses, using an asymmetric two-state Markov model (AsymmMk) in Mesquite [94]. The tree is the majority-rule consensus tree generated by MrBayes, and was rooted using the Escherichia coli and Thiobacillus denitrificans XerCD protein sequences as outgroup. At each taxon and branching point, pie-filled circles indicate the likelihood of LexA regulation at each node, with a completely filled circle indicating certainty of LexA regulation, and a completely open circle indicating certainty of lack of LexA regulation. Taxon name colors indicate the natural habitat of each organism (blue for marine, green for soil/freshwater, black for ambiguous) or membership in the outgroup (red). Azo = Azoarcus sp. EbN1; Dar = Dechloromonas aromatica; Eco = E. 
coli; Gme = Geobacter metallireducens; Lan = Listonella anguillarum; Lar = Lentisphaera araneosa; Lni = Lutiella nitroferrum; Lpe = Listonella pelagia; Mfl = Methylobacillus flagellatus; Neu = Nitrosomonas europaea; Nmo = Nitrococcus mobilis; Pal = Pseudomonas alcaligenes; Pme = Pseudomonas mendocina; Ppr = Photobacterium profundum; PstuBA = Pseudomonas stutzeri BAM; PstuQ = Pseudomonas stutzeri Q; Rei = Reinekea sp.; Rba = Rhodopirellula baltica; Rge = Rubrivivax gelatinosus; Sde = Saccharophagus degradans; Sam = Shewanella amazonensis; Ssp = Shewanella sp. MR-7; Son = Shewanella oneidensis; Spu = Shewanella putrefaciens; SynSp = Synechococcus sp.; Tden = Treponema denticola; Tde = Thiobacillus denitrificans; Vch = Vibrio cholerae; Vfi = Vibrio fischeri; Vme = Vibrio metschnikovii; Vmi = Vibrio mimicus; Vpa = Vibrio parahaemolyticus; Vsp = Vibrio splendidus; Vvu = Vibrio vulnificus; Xca = Xanthomonas campestris; Xor = Xanthomonas oryzae; Xsp = Xanthomonas sp. (PDF 346 KB): Additional file 7: Phylogenetic tree of IntI protein sequences showing the maximum likelihood ancestral-state reconstruction of integrase functionality, as inferred from in silico analyses, using an asymmetric two-state Markov model (AsymmMk) in Mesquite [94]. The tree is the majority-rule consensus tree generated by MrBayes, and was rooted using the Escherichia coli and Thiobacillus denitrificans XerCD protein sequences as outgroup. At each taxon and branching point, pie-filled circles indicate the likelihood of integrase functionality at each node, with a completely filled circle indicating certainty of integrase functionality and a completely open circle indicating certainty of integrase inactivation. Taxon name colors indicate the natural habitat of each organism (blue for marine, green for soil/freshwater, black for ambiguous) or membership in the outgroup (red). Azo = Azoarcus sp. 
EbN1; Dar = Dechloromonas aromatica; Eco = Escherichia coli; Gme = Geobacter metallireducens; Lan = Listonella anguillarum; Lar = Lentisphaera araneosa; Lni = Lutiella nitroferrum; Lpe = Listonella pelagia; Mfl = Methylobacillus flagellatus; Neu = Nitrosomonas europaea; Nmo = Nitrococcus mobilis; Pal = Pseudomonas alcaligenes; Pme = Pseudomonas mendocina; Ppr = Photobacterium profundum; PstuBA = Pseudomonas stutzeri BAM; PstuQ = Pseudomonas stutzeri Q; Rei = Reinekea sp.; Rba = Rhodopirellula baltica; Rge = Rubrivivax gelatinosus; Sde = Saccharophagus degradans; Sam = Shewanella amazonensis; Ssp = Shewanella sp. MR-7; Son = Shewanella oneidensis; Spu = Shewanella putrefaciens; SynSp = Synechococcus sp.; Tden = Treponema denticola; Tde = Thiobacillus denitrificans; Vch = Vibrio cholerae; Vfi = Vibrio fischeri; Vme = Vibrio metschnikovii; Vmi = Vibrio mimicus; Vpa = Vibrio parahaemolyticus; Vsp = Vibrio splendidus; Vvu = Vibrio vulnificus; Xca = Xanthomonas campestris; Xor = Xanthomonas oryzae; Xsp = Xanthomonas sp. (PDF 161 KB): Additional file 8: List of sequences mapping to the VchIntISXT (AAK95987) taxon according to reciprocal BLAST results. Sequences belonging to the Alteromonadales order are shown in yellow. (XLS 88 KB): Additional file 9: Sequences from soil/freshwater bacteria clusters showing evidence of residual LexA regulation. Sequences with identified LexA binding sites are shown in yellow. (XLS 30 KB): Additional file 10: Complete and fully annotated set of IntI homologs identified in this work in GenBank format. (TXT 5 MB): Additional file 11: Complete and fully annotated set of IntI homologs identified in this work in Excel (XLS) format. (XLS 4 MB): Additional file 12: Oligonucleotides used in this work. (DOC 50 KB): Additional file 13: Strains and plasmids used in this work. 
(DOC 32 KB): Additional file 14: Statistical summary of reciprocal BLAST mapping results with regard to the two in silico predicted traits analyzed here: LexA regulation and integrase functionality. (XLS 20 KB): About this article: Cite this article: Cambray, G., Sanchez-Alberola, N., Campoy, S. et al. Prevalence of SOS-mediated control of integron integrase expression as an adaptive trait of chromosomal and mobile integrons. Mobile DNA 2, 6 (2011). https://doi.org/10.1186/1759-8753-2-6: Received: 09 November 2010: Accepted: 30 April 2011: Published: 30 April 2011: DOI: https://doi.org/10.1186/1759-8753-2-6: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Reliable transgene-independent method for determining Sleeping Beauty transposon copy numbers" "Orsolya Kolacsek, Virág Krízsik, Anita Schamberger, Zsuzsa Erdei, Ágota Apáti, György Várady, Lajos Mátés, Zsuzsanna Izsvák, Zoltán Ivics, Balázs Sarkadi, Tamás I Orbán" "Tamás I Orbán" "03 March 2011" "The transposon-based gene delivery technique is emerging as a method of choice for gene therapy. The Sleeping Beauty (SB) system has become one of the most favored methods, because of its efficiency and its random integration profile. Copy-number determination of the delivered transgene is a crucial task, but a universal method for measuring this is lacking. In this paper, we show that a real-time quantitative PCR-based, transgene-independent (qPCR-TI) method is able to determine SB transposon copy numbers regardless of the genetic cargo., We designed a specific PCR assay to amplify the left inverted repeat-direct repeat region of SB, and used it together with the single-copy control gene RPPH1 and a reference genomic DNA of known copy number. The qPCR-TI method allowed rapid and accurate determination of SB transposon copy numbers in various cell types, including human embryonic stem cells. We also found that this sensitive, rapid, highly reproducible and non-radioactive method is just as accurate and reliable as the widely used blotting techniques or the transposon display method. Because the assay is specific for the inverted repeat region of the transposon, it could be used in any system where the SB transposon is the genetic vehicle., We have developed a transgene-independent method to determine copy numbers of transgenes delivered by the SB transposon system. The technique is based on a quantitative real-time PCR detection method, offering a sensitive, non-radioactive, rapid and accurate approach with potential for use in gene therapy." 
"Green Fluorescent Protein, Sleeping Beauty, gDNA Sample, Relative Standard Curve Method, High Speed Cell Sorter" " Reliable transgene-independent method for determining Sleeping Beauty transposon copy numbers: Orsolya Kolacsek, Virág Krízsik, Anita Schamberger, Zsuzsa Erdei, Ágota Apáti, György Várady, Lajos Mátés, Zsuzsanna Izsvák, Zoltán Ivics, Balázs Sarkadi & Tamás I Orbán: Mobile DNA volume 2, Article number: 5 (2011): Abstract: Background: The transposon-based gene delivery technique is emerging as a method of choice for gene therapy. The Sleeping Beauty (SB) system has become one of the most favored methods, because of its efficiency and its random integration profile. Copy-number determination of the delivered transgene is a crucial task, but a universal method for measuring this is lacking. In this paper, we show that a real-time quantitative PCR-based, transgene-independent (qPCR-TI) method is able to determine SB transposon copy numbers regardless of the genetic cargo.: Results: We designed a specific PCR assay to amplify the left inverted repeat-direct repeat region of SB, and used it together with the single-copy control gene RPPH1 and a reference genomic DNA of known copy number. The qPCR-TI method allowed rapid and accurate determination of SB transposon copy numbers in various cell types, including human embryonic stem cells. We also found that this sensitive, rapid, highly reproducible and non-radioactive method is just as accurate and reliable as the widely used blotting techniques or the transposon display method. Because the assay is specific for the inverted repeat region of the transposon, it could be used in any system where the SB transposon is the genetic vehicle.: Conclusions: We have developed a transgene-independent method to determine copy numbers of transgenes delivered by the SB transposon system. 
The technique is based on a quantitative real-time PCR detection method, offering a sensitive, non-radioactive, rapid and accurate approach with potential for use in gene therapy.: Background: Transposon-based systems have become the method of choice for gene delivery, and their applications as potential genetic vehicles are receiving great interest [1–3]. In recent years, the Sleeping Beauty (SB) transposon has emerged as the most favorable delivery system, because of its random integration profile and the lack of similar transposon-like elements in the human genome, which significantly reduces the risks often associated with viral-based methods [4–6]. Owing to its advantageous characteristics, SB is the first transposon-based system to be used in a clinical trial for a hematologic malignancy [7]. Recently, a novel hyperactive version of the originally reconstituted SB transposase was developed [8], which, apart from making the system more favorable than other widely used non-viral methods, further substantiates its applicability as a mutagenic tool for genetic analyses, similar to the transposon-based systems in D. melanogaster and C. elegans [9, 10]. Despite these clear advantages, the SB system still requires rigorous characterization to establish standard methods for its application. One of the important issues in setting up gene-therapy guidelines or genome-wide mutagenesis protocols is copy-number determination in stable clones [11–13].: Various technical methods have been developed to determine transgene copy numbers after gene delivery, including Southern blotting and the specific PCR-based transposon display method [14, 15]. In most cases, these are performed using radioactively labeled probes; fluorescent labeling can also be used, but its detection sensitivity is generally lower. 
Depending on the transgene used, other techniques, such as in situ hybridization or quantification of fluorescent marker proteins like green fluorescent protein (GFP), can also be employed [16]. Although these methods are widely accepted and used, they are usually laborious and require specific chemicals and equipment. In addition, these detection methods are often limited to measuring a specific transgene, and lengthy pilot experiments are often required to establish the conditions needed to accurately quantify a new gene of interest within a particular delivery system [17–19].: During this study, we aimed to develop an accurate method for quantifying SB transposon copy numbers, independent of the transgene sequence. We term this the real-time quantitative PCR-based, transgene-independent (qPCR-TI) method. It can be used for any SB-based gene delivery experiment without a priori optimization of the protocol.: To establish this method, we used specific probe sets designed for the left and right inverted repeat-direct repeat (IRDR) regions, which are the recognition motifs of the transposase and are therefore required for any SB transposition reaction [20]. As an internal control for normalization, a probe for the RPPH1 gene, the H1 RNA subunit of the RNaseP enzyme complex, was used. This gene is a widely accepted single-copy gene of the haploid human genome [21]. Comparing this system with the radioactive transposon display and Southern/dot blotting techniques, we provide evidence that using the IRDR-L specific probe set in comparative 2^-ΔΔCt measurements can reliably and accurately quantify SB transposon copy numbers in various cell lines, regardless of the transgene used. 
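Because RPPH1 is present once per haploid human genome, the number of RPPH1 templates in a reaction equals the number of haploid genome equivalents loaded, which is what makes it a natural normalizer. The minimal sketch below estimates that number from the gDNA mass for the 10 to 40 ng template range and the 30 ng input mentioned in this article. It assumes about 3.3 pg of DNA per haploid human genome, a commonly used approximation rather than a value from this study, and the function and constant names are hypothetical.

```python
# Approximate DNA mass of one haploid human genome, in picograms
# (a commonly cited approximation, assumed here rather than taken from this study)
HAPLOID_GENOME_PG = 3.3


def haploid_genome_equivalents(gdna_ng, genome_pg=HAPLOID_GENOME_PG):
    # Convert ng to pg and divide by the mass of one haploid genome;
    # with one RPPH1 copy per haploid genome this is also the RPPH1 template count
    return gdna_ng * 1000.0 / genome_pg


for ng in (10, 30, 40):
    print(ng, 'ng of gDNA ~', round(haploid_genome_equivalents(ng)), 'RPPH1 templates')
```

Under these assumptions, 30 ng corresponds to roughly 9,100 haploid genomes. The estimate is only as good as the DNA quantification, which the text identifies as a source of error in dilute samples.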
Apart from being sensitive, accurate and rapid, this real-time PCR-based quantification method also offers a powerful non-radioactive alternative to other standard methods.: Results and Discussion: The exact and rapid quantification of transgene copy numbers is often required in gene-delivery experiments. As we generally use the SB transposon system in our laboratory, we aimed to develop a real-time PCR-based technique that would be transgene-independent, specific for the transposon regions and therefore widely applicable. To optimize the qPCR-TI method, we began with clones of HEK-293 cells carrying SB transposons with two transcription units expressing GFP and the puromycin-resistance gene, both under the control of the CAG promoter (Figure 1A). This transgene setup allowed generation of clones with various copy numbers by either fluorescence-activated cell sorting (FACS) or antibiotic selection. Specific TaqMan® (Applied Biosystems, Foster City, CA, USA) assays were designed for the two IRDR motifs of the SB transposon and for the GFP sequence (Figure 1A). The widely applicable SB transposon version used throughout this study has two asymmetric IRDR regions ('left' and 'right' [22]). As in most transposon flanking sequences, the two IRDR regions are repeat-rich, which makes PCR primer design difficult. Moreover, the left and right IRDRs are very similar to each other, which makes designing specific assays for them even harder. Nevertheless, we were able to develop specific assays for each: neither the IRDR-L nor the IRDR-R probe set gave a signal when only the other template was present (data not shown).: Real-time PCR assay designed for different transposon and transgene regions. (A) Structure of the SB transposons used, with asymmetric IRDRs [22]. For each construct, the TaqMan® assays (TQ) used for copy-number determination are indicated. Sequences are not drawn to scale. 
IRDR-L/-R = inverted repeat-direct repeat left/right regions; pA = SV40 polyadenylation signals. (B) Efficiencies of the real-time assays determined by standard curves. For all assays, a dilution series was prepared from pooled genomic DNA samples from clones containing integrated transposon 1. The efficiency of the IRDR-R TaqMan® assay was notably lower than that of the others (<90%).: As the first (and simplest) approach, absolute quantification of DNA samples was performed using plasmid dilution series complemented with transposon-free non-specific genomic (g)DNA. However, the difficulty of determining the exact nucleic-acid concentration of very dilute samples and the differences in purity between samples made it necessary to abandon absolute quantification and to switch to relative quantification with an internal copy control. The RPPH1 gene, the H1 RNA subunit of the RNaseP enzyme complex, was chosen because it is a widely accepted single-copy gene of the haploid human genome [21] (http://www.ncbi.nlm.nih.gov/ieb/research/acembly/index.html). However, the assay efficiency for the IRDR-R region differed significantly from that of the others, including the RPPH1 endogenous control assay. Various conditions for the IRDR-R set were tried, and although template concentration seemed to be a crucial factor, the widely accepted template range of 10 to 40 ng still produced efficiency values significantly lower than those of the other assays (<90%) (Figure 1B). Sequence constraints arising from the similarity to IRDR-L prevented us from designing other specific assays with different combinations of primers and probes in this short (228 bp) and repeat-rich region. Therefore, if this assay were to be included, the relative standard curve method would be the only acceptable quantification method, as it is the most suitable for comparing reactions with suboptimal PCR efficiency. 
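Assay efficiencies such as those in Figure 1B are conventionally derived from the slope of a standard curve, Ct = slope × log10(quantity) + intercept, via E = 10^(-1/slope) - 1, so that a slope of about -3.32 corresponds to 100% efficiency (a perfect doubling per cycle). The minimal sketch below fits such a curve to a two-fold dilution series and, as in the relative standard curve method, interpolates template amounts back from Ct values. The function names and the dilution data are hypothetical illustrations, not the study's measurements.

```python
import math


def fit_standard_curve(quantities, cts):
    # Ordinary least-squares fit of Ct against log10(template quantity)
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency(slope):
    # E = 10^(-1/slope) - 1; E = 1.0 means the product doubles every cycle
    return 10 ** (-1.0 / slope) - 1.0


def quantity_from_ct(ct, slope, intercept):
    # Invert the standard curve to read a template amount from a Ct value
    return 10 ** ((ct - intercept) / slope)


# Hypothetical two-fold dilution series of pooled gDNA, in ng
ng = [40, 20, 10, 5]
# Ideal assay: Ct rises by exactly 1 cycle per two-fold dilution
ideal_cts = [20 + math.log2(40 / q) for q in ng]
# Less efficient assay: Ct rises by 1.2 cycles per two-fold dilution
poor_cts = [20 + 1.2 * math.log2(40 / q) for q in ng]

for label, cts in (('ideal', ideal_cts), ('poor', poor_cts)):
    slope, intercept = fit_standard_curve(ng, cts)
    e = efficiency(slope)
    flag = 'below 90%' if e < 0.9 else 'ok'
    print(label, round(slope, 2), round(100 * e, 1), flag)
```

The second series mimics an assay like IRDR-R: its steeper slope gives roughly 78% efficiency, which is why a method that assumes equal efficiencies would not be appropriate for it.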
Apart from setting up standard curves (for both the transposon-specific assays and the RPPH1 endogenous control), relative quantification also requires a calibrator (a reference sample with a known copy number, preferably '1') to ensure the precision of quantification.: In the search for a potential calibrator sample, generated clones were screened by FACS for the lowest possible GFP signal, on the assumption that single-copy clones would be among them (the signal could also vary because of positional effects of different integration sites). Although the CAG promoter we used is known to be less prone to silencing [23–25], we had to make sure that the lowest fluorescent signals were also associated with the lowest real-time signals when normalized to the RPPH1 level, in order to exclude the potential presence of silenced copies. Using the GFP TaqMan® assay, several clones with one integrated transposon copy and numerous others with three or four copies were found (Figure 2A,B). Using the IRDR-L set, very similar copy numbers were calculated with the relative standard curve method (Figure 2C), whereas the IRDR-R TaqMan® set gave unreliable results, mainly because of the problems discussed earlier (Additional file 1). Because the assays for RPPH1, GFP and the transposon IRDR-L had very similar efficiency values (Figure 1B), we also tried another approach, calculating the copy numbers in the examined clones by the comparative Ct (2^-ΔΔCt) method in the same experiments. The results based on GFP or IRDR-L agreed with each other and with the results of the relative standard curve method. Moreover, technical errors could be further reduced by using a pool of gDNA samples with known copy number as a reference. 
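The comparative Ct calculation described here can be written as copies = (calibrator copies) × 2^-ΔΔCt, with ΔΔCt = (Ct of IRDR-L minus Ct of RPPH1) in the sample minus the same difference in the calibrator. A minimal sketch follows, assuming that both assays run at close to 100% efficiency (the condition the text establishes for IRDR-L and RPPH1) and that replicate Ct values are averaged first. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def copies_ddct(target_cts, ref_cts, cal_target_cts, cal_ref_cts, cal_copies=1.0):
    # Delta Ct: normalize the transposon (IRDR-L) signal to the single-copy RPPH1 signal
    d_sample = mean(target_cts) - mean(ref_cts)
    d_cal = mean(cal_target_cts) - mean(cal_ref_cts)
    # Delta-delta Ct relative to the calibrator of known copy number
    dd = d_sample - d_cal
    # Assumes ~100% efficiency for both assays, so one cycle equals a two-fold difference
    return cal_copies * 2 ** (-dd)


# Hypothetical triplicate Ct values for a one-copy calibrator and a test clone
calibrator = {'irdr_l': [26.0, 26.1, 25.9], 'rpph1': [25.0, 25.1, 24.9]}
clone = {'irdr_l': [24.0, 24.1, 23.9], 'rpph1': [25.0, 24.9, 25.1]}

n = copies_ddct(clone['irdr_l'], clone['rpph1'],
                calibrator['irdr_l'], calibrator['rpph1'], cal_copies=1.0)
print(round(n, 2))
```

Here the clone's IRDR-L signal appears two cycles earlier than the calibrator's relative to RPPH1, so the estimate is four copies. Because no standard curves are needed, the same plate can hold more samples, which matches the practical advantage noted in the text.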
We therefore concluded that, once the specific but less efficient IRDR-R assay was left out, the comparative Ct method could be used for reliable and precise transposon copy-number determination with the IRDR-L TaqMan® assay. Abandoning the relative standard curve method also allowed more samples to be included on one reaction plate, as dilution series with several parallels were no longer required.: Comparing copy-number determination by green fluorescent protein (GFP) or transposon-specific real-time PCR. (A) Fluorescence-activated cell sorting (FACS) analysis of different HEK-293 derived clones expressing GFP. Higher fluorescent intensities indicate higher copy numbers, although signals can vary because of integration position effects and/or transgene silencing. The control sample shows the autofluorescence detected in non-transfected HEK-293 cells. (B) Copy numbers determined by the transgene (GFP) specific real-time PCR assay, normalized to the level of the single-copy control RPPH1; clones analyzed by FACS (A) and other clones established subsequently were examined. Various clones with low GFP expression were determined to have one integrated transposon copy, whereas the majority with higher GFP fluorescence were found to have four transposon copies. In the case of clone 5.a, further analysis revealed that it was not a clone but rather a mixture of clones with an average copy number of around 4.5. (C) Comparison of two techniques. The copy values determined by the transgene-independent TaqMan® assay for the IRDR-L sequence correlated well with the GFP-based copy numbers. Clone 2.r originated from random integration, so its transposon repeat sequence might not be intact, and the partial presence of IRDR-L could result in a lower signal; therefore, this clone was not included among the controls for later experiments. a = clones obtained from active transposition; r = clones obtained from random integration (from transfection with the mutant transposase). 
For copy numbers, values are means ± SEM of at least three independent measurements.: To test the qPCR-TI method on other samples, we examined clones of the HUES9 human embryonic stem cell line expressing the GFP-tagged ABCG2 transporter [26], generated with the SB transposon system. Again, the GFP and the IRDR-L TaqMan® assays could be compared with each other (Figure 1A, transposon 2). As a general assay setup, the RPPH1 control and reference samples (a pool of clones with known copy numbers) were used. As shown in Figure 3, the 2^-ΔΔCt method produced the same copy numbers with either probe set. These experiments therefore supported the use of the IRDR-L repeat-specific assay for transposon copy-number determination, as it gave the same results as the assay specific for the internal transgene.: Copy-number determinations of green fluorescent protein (GFP)-ABCG2 expressing HUES9 clones. The sample 'pool' indicates the equimolar mixture of gDNA samples from the first four single-copy clones in Figure 2C. Later examination of the G2C3 line indicated that it is not derived from a clone but rather from a mixture of cells with five and six transposon copies. Values are means ± SEM of at least three independent measurements.: To compare our transgene-independent quantification approach with other techniques, we measured copy numbers of clones generated from HeLa cells by transposons containing a neomycin-resistance (neoR) gene (Figure 1A, transposon 3). Such clones were ideal for comparison because they carry a different transgene sequence and because their copy numbers had also been determined by Southern/dot blotting or the transposon display method [5]. Several clones were tested, and the qPCR-TI method gave the same copy numbers as the radioactive methods (Table 1). For higher (>5) copy-number clones, the qPCR-TI method was also reasonably accurate, with only occasional, small relative errors (≤9%). 
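The figure captions report copy numbers as means ± SEM of at least three independent measurements, and agreement with the radioactive methods for high copy-number clones is expressed as a relative error. The small helpers below show both calculations under their usual definitions (SEM = sample standard deviation / √n, relative error = |measured - reference| / reference); the helper names and example values are hypothetical, not data from Table 1.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    # Mean and standard error of the mean (sample standard deviation / sqrt(n); n >= 2)
    return mean(values), stdev(values) / math.sqrt(len(values))


def relative_error(measured, reference):
    # Fractional deviation of a qPCR-TI estimate from a reference-method value
    return abs(measured - reference) / reference


# Hypothetical independent qPCR-TI estimates for one high copy-number clone
m, sem = mean_sem([10.6, 11.1, 11.0])
print(f'{m:.2f} +/- {sem:.2f} copies')
print(f'{100 * relative_error(m, 10):.1f}% relative to a reference value of 10 copies')
```

At high copy numbers a one-copy disagreement is a small relative error, while at one or two copies the same absolute difference is a large one, which is the asymmetry the following discussion of outlier clones turns on.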
The slight differences in some cases could be due to the inaccuracy of the standard methods in this range [14, 15]. In addition, it has been suggested that very high copy numbers are more reliably measured by dot blotting than by transposon display. We found that the copy number of clone 4 determined by the dot-blot technique correlated well with the qPCR-TI data. Among low copy-number clones, only one (2/2 of neoR; see Table 1) did not give identical results across the different techniques. At such low copy numbers, a one-copy difference represents a large percentage error, but this discrepancy might be related to the integration sites in that particular clone (see discussion below). Taken together, the results for the neoR transposon clones indicated that the qPCR-TI technique is just as sensitive and accurate as the other widely used methods.: Further proof of principle came from determining the transposon copy numbers in HUES9 clones previously generated with a transgene of different sequence. In those experiments, amaxaGFP (a fluorescent protein from a Pontellina copepod species, http://www.lonzabio.com) was carried by the transposon to generate clones of an embryonic stem-cell line, and the transposon integration sites were determined by splinkerette PCR and inverse PCR [27]. Based on these integration assays, copy numbers were estimated to be one to six in various clones, although these methods may not reliably detect all integrated copies because of differences in the flanking genomic sequences. When several clones were examined by the qPCR-TI method using the IRDR-L assay, the measured transposon copy numbers almost always matched those previously inferred from the mapped integration sites (Table 1). One exception was clone B1, for which qPCR-TI gave a result one copy higher, as with the 2/2 neoR clone. 
Again, at this copy-number level a one-copy difference represents a large percentage discrepancy. However, because all the other low copy-number clones gave identical results with the various techniques, the two outliers might reflect the lower sensitivity of the standard methods, whose detection depends on the transgene integration sites [15]. These comparisons led us to conclude that the qPCR-TI method provides reliable results for different SB transposon constructs and is therefore a consistent, transgene-independent copy-number quantification method.: The experiments described above validated the newly developed transgene-independent method for determining SB transposon copy numbers: (i) it provided the same results as the assays specific for the carried transgene sequence and (ii) it could reliably replace widely used standard radioactive techniques. The TaqMan® assay designed for the IRDR-L region of the transposon provides the basis for transgene independence, as this region is present in all SB constructs. In fact, 'symmetric' SB transposons with two IRDR-L (but not two IRDR-R) flanking sequences are functional [28], and the qPCR-TI method is also applicable to such constructs (with an obvious correction factor of 0.5). We found evidence that the PCR efficiency of this probe set is similar to that of the RPPH1 single-copy control, so reliable quantification can be performed using the comparative 2^-ΔΔCt method. To ensure precise and rapid quantification, reference samples (calibrators) with known copy numbers are also included, preferably a pool of gDNAs from different clones, to minimize discrepancies arising from differences in sample preparation and DNA purity. 
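Two adjustments mentioned here can be made explicit. A symmetric IRDR-L/IRDR-L transposon contributes two IRDR-L targets, so the IRDR-L-based value is multiplied by 0.5; and the expected copy number of a calibrator pool is the average of its members' copy numbers. The sketch assumes each gDNA in the pool contributes an equal number of genomes (an equimolar pool, as with the pool of single-copy clones used in this study); the function names are hypothetical.

```python
def transposon_copies(irdr_l_copies, symmetric=False):
    # Asymmetric SB transposons carry one IRDR-L; symmetric ones carry two,
    # hence the correction factor of 0.5
    return irdr_l_copies * (0.5 if symmetric else 1.0)


def pool_copy_number(member_copies, weights=None):
    # Weighted mean copy number of a gDNA pool; equal weights model an equimolar pool
    if weights is None:
        weights = [1.0] * len(member_copies)
    total = sum(weights)
    return sum(c * w for c, w in zip(member_copies, weights)) / total


# A pool of four single-copy clones behaves as a one-copy calibrator
print(pool_copy_number([1, 1, 1, 1]))
# An IRDR-L-based value of 6 from a symmetric construct means 3 transposons
print(transposon_copies(6.0, symmetric=True))
```

Pooling averages out clone-to-clone differences in DNA preparation, while the calibrator's expected value stays a simple, known number.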
The method could also be extended to non-human gDNA samples; however, a suitable and validated single-copy reference gene control must always be used.: Another technical point to consider is the transposition-independent, random integration of the transgene. Because this is a stochastic process, it can lead to integration of the carried transcription unit without the transposon IRDR sequences. In such cases, the qPCR-TI method underestimates transgene copy numbers, as it only detects copies resulting from bona fide transposition. As a general rule, we always include control experiments with gene delivery using the mutant transposase to estimate the level of random integration [20]. According to previous experiments, this phenomenon is generally very rare when using the new hyperactive SB100x transposase, but its extent can vary between cell lines. Nevertheless, if such random background integration increases significantly, it may be necessary to measure the copy numbers of the transgene itself in the samples generated with the active transposase.: Conclusions: We have developed a sensitive and reliable real-time PCR-based (qPCR-TI) method for measuring SB transposon copy numbers. When compared with widely used standard methods, such as various blotting techniques or transposon display, it proved to be just as accurate, while also being faster and non-radioactive. The real advantage of this method, however, is its transgene independence, which makes it applicable for anyone working with Sleeping Beauty transposon constructs. 
Therefore, we believe that qPCR-TI could become the method of choice for gene therapy and general gene-delivery applications.: Methods: Cell-culture maintenance and creation of clones: Human embryonic kidney cells (HEK-293) were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal calf serum, 1% L-glutamine and 1% penicillin/streptomycin (Invitrogen, Carlsbad, CA, USA). Transfected cell populations were first enriched for transgene expression by flow cytometry (see below). Subsequently, cell clones were created by serial dilutions in 96-well plates. Selected clones were further analyzed by flow cytometry and harvested for genomic DNA isolation (see below). The HUES9 embryonic stem-cell line (originally provided by Dr. Douglas Melton, Harvard University, USA) was maintained essentially as described previously [29], using cells from passage 35. To create transgene-expressing HUES9 clones, we used our previously developed method for human embryonic stem-cell lines [27].: Transfection and transposition: HEK-293 and HUES9 cells were transfected using a transfection reagent (FuGENE® 6; Roche Applied Science, Rotkreuz, Switzerland) in accordance with the manufacturer's instructions. The transfection mix contained 1 µg of a given transposon plasmid (Figure 1A) and 100 ng of the hyperactive SB100x Sleeping Beauty transposase, in a 10:1 ratio to minimize the overproduction inhibition phenomenon [5, 8]. To visualize the random integration background, a control transfection with the inactive DDE motif mutant of the transposase was carried out, using the same experimental setup [20].: Flow cytometry: GFP-expressing cells were analyzed by a flow cytometer (FACSCalibur; Becton-Dickinson, San Jose, CA, USA) with Cellquest-Pro analysis software (Becton-Dickinson). Mock-transfected cells were used as labeling controls, and propidium iodide or 7-aminoactinomycin D staining was used to exclude non-viable cells. 
To select and clone cells expressing GFP, a fluorescence-based cell sorter (FACSAria High Speed Cell Sorter; Becton-Dickinson) was used in accordance with the manufacturer's instructions.: Genomic DNA isolation, transposon display and Southern/dot blotting: After treatment with trypsin, cells were separated by centrifugation and washed with 1 × phosphate-buffered saline. After careful removal of the liquid supernatant, the dry cell pellets were stored at -80°C until further processing. Genomic DNAs were isolated from the cells by standard phenol-chloroform extraction after cell lysis and proteinase K digestion. DNA samples were quantified with a spectrophotometer (GeneQuant II; Pharmacia Biotech, Piscataway, NJ, USA). Transposon display and Southern-/dot-blotting techniques were performed essentially as described previously [5, 14].: Quantitative real-time PCR: Reactions were performed on a real-time PCR platform (StepOne™ or StepOnePlus™; Applied Biosystems, Foster City, CA, USA) in accordance with the manufacturer's instructions. The gDNA samples (30 ng each) were run in triplicate, in singleplex reactions with a final volume of 20 µl using TaqMan® chemistry. All primers and probes were designed by Primer Express software (version 3.0; Applied Biosystems), and probes were labeled with 5'-FAM and 3'-nonfluorescent (minor groove binding) quencher molecules. Sequences for the TaqMan® assays are given in Table 2. Final concentrations of primers and probes were 250 and 900 nM, respectively. Data were analyzed by StepOne software (version 2.1; Applied Biosystems).: References: Ivics Z, Izsvak Z: Transposons for gene therapy!. Curr Gene Ther. 2006, 6: 593-607. 10.2174/156652306778520647.: VandenDriessche T, Ivics Z, Izsvak Z, Chuah MK: Emerging potential of transposons for gene therapy and generation of induced pluripotent stem cells. Blood. 2009, 114: 1461-1468. 
10.1182/blood-2009-04-210427.: Claeys Bouuaert C, Chalmers RM: Gene therapy vectors: the prospects and potentials of the cut-and-paste transposons. Genetica. 2010, 138: 473-484. 10.1007/s10709-009-9391-x.: Izsvak Z, Ivics Z: Sleeping beauty transposition: biology and applications for molecular therapy. Mol Ther. 2004, 9: 147-156. 10.1016/j.ymthe.2003.11.009.: Grabundzija I, Irgang M, Mates L, Belay E, Matrai J, Gogol-Doring A, Kawakami K, Chen W, Ruiz P, Chuah MK, VandenDriessche T, Izsvák Z, Ivics Z: Comparative analysis of transposable element vector systems in human cells. Mol Ther. 2010, 18: 1200-1209. 10.1038/mt.2010.47.: Hackett PB, Largaespada DA, Cooper LJ: A transposon and transposase system for human application. Mol Ther. 2010, 18: 674-683. 10.1038/mt.2010.2.: Williams DA: Sleeping beauty vector system moves toward human trials in the United States. Mol Ther. 2008, 16: 1515-1516. 10.1038/mt.2008.169.: Mates L, Chuah MK, Belay E, Jerchow B, Manoj N, Acosta-Sanchez A, Grzela DP, Schmitt A, Becker K, Matrai J, Ma L, Samara-Kuko E, Gysemans C, Pryputniewicz D, Miskey C, Fletcher B, VandenDriessche T, Ivics Z, Izsvák Z: Molecular evolution of a novel hyperactive Sleeping Beauty transposase enables robust stable gene transfer in vertebrates. Nat Genet. 2009, 41: 753-761. 10.1038/ng.343.: Ryder E, Russell S: Transposable elements as tools for genomics and genetics in Drosophila. Brief Funct Genomic Proteomic. 2003, 2: 57-71. 10.1093/bfgp/2.1.57.: Mates L, Izsvak Z, Ivics Z: Technology transfer from worms and flies to vertebrates: transposition-based genome manipulations and their future perspectives. Genome Biol. 2007, 8 (Suppl 1): S1. 10.1186/gb-2007-8-s1-s1.: Bian Q, Belmont AS: BAC TG-EMBED: one-step method for high-level, copy-number-dependent, position-independent transgene expression. Nucleic Acids Res. 
2010, 38: e127. 10.1093/nar/gkq178.: Sivalingam J, Krishnan S, Ng WH, Lee SS, Phan TT, Kon OL: Biosafety assessment of site-directed transgene integration in human umbilical cord-lining cells. Mol Ther. 2010, 18: 1346-1356. 10.1038/mt.2010.61.: Huang X, Haley K, Wong M, Guo H, Lu C, Wilber A, Zhou X: Unexpectedly high copy number of random integration but low frequency of persistent expression of the Sleeping Beauty transposase after trans delivery in primary human T cells. Hum Gene Ther. 2010, 21: 1577-1590. 10.1089/hum.2009.138.: Wicks SR, de Vries CJ, van Luenen HG, Plasterk RH: CHE-3, a cytosolic dynein heavy chain, is required for sensory cilia structure and function in Caenorhabditis elegans. Dev Biol. 2000, 221: 295-307. 10.1006/dbio.2000.9686.: Devon RS, Porteous DJ, Brookes AJ: Splinkerettes--improved vectorettes for greater efficiency in PCR walking. Nucleic Acids Res. 1995, 23: 1644-1645. 10.1093/nar/23.9.1644.: Moeller F, Nielsen FC, Nielsen LB: New tools for quantifying and visualizing adoptively transferred cells in recipient mice. J Immunol Methods. 2003, 282: 73-82. 10.1016/j.jim.2003.07.007.: Wang LJ, Chen YM, George D, Smets F, Sokal EM, Bremer EG, Soriano HE: Engraftment assessment in human and mouse liver tissue after sex-mismatched liver cell transplantation by real-time quantitative PCR for Y chromosome sequences. Liver Transpl. 2002, 8: 822-828. 10.1053/jlts.2002.34891.: Ballester M, Castello A, Ibanez E, Sanchez A, Folch JM: Real-time quantitative PCR-based system for determining transgene copy number in transgenic animals. Biotechniques. 2004, 37: 610-613.: Joshi M, Keith Pittman H, Haisch C, Verbanac K: Real-time PCR to determine transgene copy number and to quantitate the biolocalization of adoptively transferred cells from EGFP-transgenic mice. Biotechniques. 2008, 45: 247-258. 
10.2144/000112913.: Ivics Z, Hackett PB, Plasterk RH, Izsvak Z: Molecular reconstruction of Sleeping Beauty, a Tc1-like transposon from fish, and its transposition in human cells. Cell. 1997, 91: 501-510. 10.1016/S0092-8674(00)80436-5.: Baer M, Nilsen TW, Costigan C, Altman S: Structure and transcription of a human gene for H1 RNA, the RNA component of human RNase P. Nucleic Acids Res. 1990, 18: 97-103. 10.1093/nar/18.1.97.: Cui Z, Geurts AM, Liu G, Kaufman CD, Hackett PB: Structure-function analysis of the inverted terminal repeats of the sleeping beauty transposon. J Mol Biol. 2002, 318: 1221-1235. 10.1016/S0022-2836(02)00237-1.: Chung S, Andersson T, Sonntag KC, Bjorklund L, Isacson O, Kim KS: Analysis of different promoter systems for efficient transgene expression in mouse embryonic stem cell lines. Stem Cells. 2002, 20: 139-145. 10.1634/stemcells.20-2-139.: Liew CG, Draper JS, Walsh J, Moore H, Andrews PW: Transient and stable transgene expression in human embryonic stem cells. Stem Cells. 2007, 25: 1521-1528. 10.1634/stemcells.2006-0634.: Xia X, Zhang Y, Zieth CR, Zhang SC: Transgenes delivered by lentiviral vector are suppressed in human embryonic stem cells in a promoter-dependent manner. Stem Cells Dev. 2007, 16: 167-176. 10.1089/scd.2006.0057.: Orban TI, Seres L, Ozvegy-Laczka C, Elkind NB, Sarkadi B, Homolya L: Combined localization and real-time functional studies using a GFP-tagged ABCG2 multidrug transporter. Biochem Biophys Res Commun. 2008, 367: 667-673. 10.1016/j.bbrc.2007.12.172.: Orban TI, Apati A, Nemeth A, Varga N, Krizsik V, Schamberger A, Szebenyi K, Erdei Z, Varady G, Karaszi E, Homolya L, Német K, Gócza E, Miskey C, Mátés L, Ivics Z, Izsvák Z, Sarkadi B: Applying a \"double-feature\" promoter to identify cardiomyocytes differentiated from human embryonic stem cells following transposon-based gene delivery. Stem Cells. 2009, 27: 1077-1087. 
10.1002/stem.45.: Izsvak Z, Khare D, Behlke J, Heinemann U, Plasterk RH, Ivics Z: Involvement of a bifunctional, paired-like DNA-binding domain and a transpositional enhancer in Sleeping Beauty transposition. J Biol Chem. 2002, 277: 34581-34588. 10.1074/jbc.M204001200.: Apati A, Orban TI, Varga N, Nemeth A, Schamberger A, Krizsik V, Erdelyi-Belle B, Homolya L, Varady G, Padanyi R, Karászi E, Kemna EW, Német K, Sarkadi B: High level functional expression of the ABCG2 multidrug transporter in undifferentiated human embryonic stem cells. Biochim Biophys Acta. 2008, 1778: 2700-2709. 10.1016/j.bbamem.2008.08.010.: Acknowledgements: We thank Dr Douglas Melton for the gift of the HUES9 cell line. TIO is a recipient of the János Bolyai Scholarship from the Hungarian Academy of Sciences. This work was supported by grants from OTKA (NK72057), ETT (213-09), ES2Heart Jedlik (OM00203/2007), STEMKILL Jedlik (OM00108/2008) and National Development Agency grant KMOP-1.1.2-07/1-2008-0003.: Author information: Corresponding author: Correspondence to Tamás I Orbán.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: OK established the HEK clone; OK and VK optimized the real-time PCR and performed copy-number measurements; AS, ZE and ÁA established the HUES9 clones; GV helped in FACS measurements; LM measured copy numbers in HeLa clones; ZsI and ZI gave technical help and advice with the SB transposon work; BS provided financial support and discussed the data; and TIO designed the overall strategy, analyzed the data and wrote the paper.: An erratum to this article is available at http://dx.doi.org/10.1186/1759-8753-4-11.: Electronic supplementary material: Additional file 1: Supplementary Figure 1: Comparison of the IRDR-R assay with the GFP-specific real-time PCR method. 
Selected HEK-293 clones were examined for transposon copy numbers in parallel by the established green fluorescent protein (GFP)-specific assay and by the assay specific for the Sleeping Beauty (SB) right inverted repeat-direct repeat (IRDR-R). In contrast to the left IRDR (IRDR-L) real-time assay, the IRDR-R-specific assay failed to reproduce previously determined copy numbers consistently (see Figure 2C). For this particular experiment, 30 ng genomic DNA (gDNA) was used for the reaction. Although starting gDNA amounts higher than the recommended range of 10 to 40 ng improved the reproducibility of the IRDR-R assay, it still did not reach the reliability of the GFP or IRDR-L assays. (TIFF 285 KB): About this article: Cite this article: Kolacsek, O., Krízsik, V., Schamberger, A. et al. Reliable transgene-independent method for determining Sleeping Beauty transposon copy numbers. Mobile DNA 2, 5 (2011). https://doi.org/10.1186/1759-8753-2-5: Received: 05 November 2010: Accepted: 03 March 2011: Published: 03 March 2011: DOI: https://doi.org/10.1186/1759-8753-2-5: Collection: Mobile DNA Tools: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Plant centromeric retrotransposons: a structural and cytogenetic perspective" "Pavel Neumann, Alice Navrátilová, Andrea Koblížková, Eduard Kejnovský, Eva Hribová, Roman Hobza, Alex Widmer, Jaroslav Doležel, Jirí Macas" "Pavel Neumann" "03 March 2011" "The centromeric and pericentromeric regions of plant chromosomes are colonized by Ty3/gypsy retrotransposons, which, on the basis of their reverse transcriptase sequences, form the chromovirus CRM clade. Despite their potential importance for centromere evolution and function, they have remained poorly characterized. In this work, we aimed to carry out a comprehensive survey of CRM clade elements with an emphasis on their diversity, structure, chromosomal distribution and transcriptional activity., We have surveyed a set of 190 CRM elements belonging to 81 different retrotransposon families, derived from 33 host species and falling into 12 plant families. The sequences at the C-terminus of their integrases were unexpectedly heterogeneous, despite the understanding that they are responsible for targeting to the centromere. This variation allowed the division of the CRM clade into the three groups A, B and C, and the members of each differed considerably with respect to their chromosomal distribution. The differences in chromosomal distribution coincided with variation in the integrase C-terminus sequences possessing a putative targeting domain (PTD). A majority of the group A elements possess the CR motif and are concentrated in the centromeric region, while members of group C have the type II chromodomain and are dispersed throughout the genome. Although representatives of the group B lack a PTD of any type, they appeared to be localized preferentially in the centromeres of tested species. All tested elements were found to be transcriptionally active., Comprehensive analysis of the CRM clade elements showed that genuinely centromeric retrotransposons represent only a fraction of the CRM clade (group A). 
These centromeric retrotransposons represent an active component of centromeres of a wide range of angiosperm species, implying that they play an important role in plant centromere evolution. In addition, their transcriptional activity is consistent with the notion that the transcription of centromeric retrotransposons has a role in normal centromere function." "Long Terminal Repeat, Bacterial Artificial Chromosome Clone, Centromeric Region, Centromere Function, Centromeric Localization" " Plant centromeric retrotransposons: a structural and cytogenetic perspective: Pavel Neumann, Alice Navrátilová, Andrea Koblížková, Eduard Kejnovský, Eva Hribová, Roman Hobza, Alex Widmer, Jaroslav Doležel & Jirí Macas: Mobile DNA volume 2, Article number: 4 (2011): Abstract: Background: The centromeric and pericentromeric regions of plant chromosomes are colonized by Ty3/gypsy retrotransposons, which, on the basis of their reverse transcriptase sequences, form the chromovirus CRM clade. Despite their potential importance for centromere evolution and function, they have remained poorly characterized. In this work, we aimed to carry out a comprehensive survey of CRM clade elements with an emphasis on their diversity, structure, chromosomal distribution and transcriptional activity.: Results: We have surveyed a set of 190 CRM elements belonging to 81 different retrotransposon families, derived from 33 host species and falling into 12 plant families. The sequences at the C-terminus of their integrases were unexpectedly heterogeneous, despite the understanding that they are responsible for targeting to the centromere. This variation allowed the division of the CRM clade into the three groups A, B and C, and the members of each differed considerably with respect to their chromosomal distribution. 
The differences in chromosomal distribution coincided with variation in the integrase C-terminus sequences possessing a putative targeting domain (PTD). A majority of the group A elements possess the CR motif and are concentrated in the centromeric region, while members of group C have the type II chromodomain and are dispersed throughout the genome. Although representatives of the group B lack a PTD of any type, they appeared to be localized preferentially in the centromeres of tested species. All tested elements were found to be transcriptionally active.: Conclusions: Comprehensive analysis of the CRM clade elements showed that genuinely centromeric retrotransposons represent only a fraction of the CRM clade (group A). These centromeric retrotransposons represent an active component of centromeres of a wide range of angiosperm species, implying that they play an important role in plant centromere evolution. In addition, their transcriptional activity is consistent with the notion that the transcription of centromeric retrotransposons has a role in normal centromere function.: Background: Long terminal repeat (LTR) retrotransposons represent a common class of mobile genetic elements in eukaryotic genomes [1–7]. Because of their replicative mode of transposition based on an RNA intermediate, they compose the majority of the DNA of many eukaryotic genomes. They are particularly abundant in plant genomes and are intimately involved in the evolution of genome structure and size [8, 9]. Plant retrotransposon families differ considerably from one another, not only with respect to their sequence and structure but also with regard to their chromosomal distribution. Thus, while some plant retrotransposon families are essentially randomly dispersed, others are concentrated in distinct chromosomal regions [10, 11]. Among the latter category are the centromeric retrotransposons, which accumulate preferentially in the centromeric region. 
(Note that the term \"centromeric\" is used hereinafter to refer to both the centromeric and pericentromeric regions, as these are difficult to distinguish from one another.) They usually accompany arrays of satellite DNA, which are the dominant centromeric sequences in most species [12]. However, centromeres of some species, such as wheat [13], are dominated by centromeric retrotransposons.: A number of centromeric retrotransposons have been fully characterized in grass species: specifically, RIRE7 and CRR in rice (Oryza sativa) [14–17], CRM in maize (Zea mays) [18, 19], CRW in wild einkorn wheat (Triticum boeoticum) [13], CRS in sugar cane (Saccharum officinarum) [20], Bilby in cereal rye (Secale cereale) [21] and Cereba in barley (Hordeum vulgare) [22, 23]. In sorghum (Sorghum bicolor), pHind22 and pSau3A9 have been partially characterized [24]. Equivalent elements extracted from dicotyledonous species include Beetle1 (sugar beet, Beta vulgaris) and Beetle2 (wild beet, Beta procumbens) [25, 26] as well as CRA (Arabidopsis thaliana, hereinafter referred to as At) [27, 28]. Their phylogeny, based on their reverse transcriptase (RT) sequences, reveals that they are chromoviruses (Chromoviridae), a lineage of Ty3/gypsy retrotransposons possessing an integrase chromodomain [27, 29, 30]. Further classification of chromoviruses has shown that these centromeric retrotransposons form a phylogenetically distinct clade designated CRM [27, 29, 30]. Although the chromoviruses are widespread within eukaryotic genomes, CRM elements are specific to plants, both angiosperms and gymnosperms [27]. Few of these elements have been described in any detail, and little is known of their chromosomal distribution. 
Thus it remains unclear both whether all CRM elements are in reality centromeric retrotransposons and how widespread genuine centromeric retrotransposons are in plant genomes.: The most distinctive structural feature of a centromeric retrotransposon is the presence of an integrase chromodomain, which is widely assumed to ensure correct targeting to the centromeric region [30]. Although chromodomains are present at the integrase C-terminus in all chromoviruses, their sequence is highly polymorphic [27, 29–31]. On the basis of their similarity to cellular chromodomains (for example, those present in HP1 or Swi6 proteins), chromovirus chromodomains have been classified into types I and II and a CR motif [31]. Types I and II chromodomains have sequence and structural similarity both to cellular chromodomains and to each other. However, while type I chromodomains contain all three aromatic residues known to recognize methylated lysine 9 of histone H3 (H3K9), type II chromodomains lack the first and usually also the last of these residues. Unlike all other plant chromoviruses, which carry a type II chromodomain, centromeric retrotransposons possess a CR motif, which is key for the recognition of centromeric chromatin [31]. Although the CR motif is found at the position corresponding to a chromodomain, Gao et al. [31] showed that it has neither sequence nor structural similarity to types I and II chromodomains, suggesting that it is not a genuine chromodomain. For this reason, all sequences found at the position of a chromodomain are collectively referred to hereinafter as putative targeting domains (PTDs). 
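The type I versus type II distinction described above rests on which of the three aromatic-cage residues are present. The Python sketch below is a deliberately simplified heuristic, not the classification procedure of Gao et al. [31]: it assumes the putative chromodomain has already been aligned so that the residues at the three cage positions can be read off, and it treats F, W and Y as aromatic. The function name and example residues are hypothetical, and a CR motif cannot be recognized this way because it lacks similarity to either chromodomain type.

```python
# F, W and Y are treated as the aromatic residues of a chromodomain cage.
AROMATIC = frozenset('FWY')


def classify_cage(cage):
    # cage: the residues at the three aligned aromatic-cage positions, in order.
    # Simplified heuristic: all three aromatic -> type I-like;
    # first position non-aromatic -> type II-like; anything else is left open.
    if len(cage) != 3:
        raise ValueError('expected residues at exactly three cage positions')
    aromatic = [res in AROMATIC for res in cage.upper()]
    if all(aromatic):
        return 'type I-like'
    if not aromatic[0]:
        return 'type II-like'
    return 'unresolved'


for cage in ('YWF', 'LWA', 'YAF'):
    print(cage, classify_cage(cage))
```

Because type II chromodomains lack the first cage residue and usually also the last, the sketch keys the type II call on the first position alone; a real analysis would rely on full alignments and profile searches rather than three residues.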
Although the CR motif's interacting partner has yet to be identified, it has been established that, unlike the type I chromodomains, it involves neither a dimethylated nor a trimethylated form of histone H3 lysine 9 (H3K9me2, H3K9me3) [31].: Circumstantial evidence suggests that centromeric retrotransposons have been influential in the evolution of centromeres, as well as in their structure and function. Their transpositional activity contributes to high evolutionary dynamics of centromeres by generating new insertions, which may be further subjected to illegitimate and unequal homologous recombination [32, 33]. Transcription driven by centromeric retrotransposon promoters has been proposed to underlie the substitution of histone H3 by CenH3 (centromere-specific variant of histone H3 which is essential for the establishment and maintenance of centromere function and kinetochore assembly) [12]. As the RNA component of maize centromeric chromatin includes CRM retrotransposon transcripts, it has been suggested that centromeric retrotransposons are also important determinants of the structure of centromeric chromatin [34]. Because transcripts of CRR elements are processed by the RNA interference (RNAi) machinery of rice, Neumann et al. [35] proposed that these elements play a role in RNAi-mediated formation and maintenance of centromeric chromatin. However, as yet there have been no systematic attempts to investigate the function of centromeric retrotransposons in centromere activity, largely because of a lack of sufficient representatives to build a generalized picture that is valid across a spectrum of plant species. Thus, here we set out to produce a comprehensive survey of plant CRM retrotransposons. 
We have analyzed their nucleotide and protein sequences, with the goal of illuminating their structure, diversity, type of PTD, chromosomal distribution and transcriptional activity.: Results: Identification of putative centromeric retrotransposon sequences: The in silico search for CRM elements detected 145 novel elements, which fell into 63 families on the basis of species of origin and sequence similarity. An additional three families were identified from sequence contigs assembled from 454-derived sequences of pea and white campion. Together with the sequences described in the literature, we gathered 190 elements representing 81 different retrotransposon families and distributed across 33 plant species belonging to 12 plant families (Figure 1; see also Additional file 1: Origin and structural features of sequences used in this work, and Additional file 2: CRM sequences used in this study). The phylogenetic analysis of representatives of each family, based on their RT domain sequences, clustered all the de novo sequences with previously identified CRM members (Figure 1A; see also Additional file 3: Alignment of RT domains). The same result was obtained when the analysis was extended to integrase and whole polyprotein sequences, confirming that the RT domain is an appropriate basis for this classification (data not shown).: Diversity of CRM families and their species of origin. (A) Neighbor-joining tree inferred from a comparison of reverse transcriptase (RT) domain sequences. The non-chromovirus element Tat4-1 was used as an outgroup, while members of the Tekay, Reina and Galadriel clades were included as representatives of other plant chromoviruses. Alignment of the RT domains is provided in Additional file 3: Alignment of RT domains. On the basis of differences at the C-terminus of integrase, the CRM families were divided into groups A, B and C (Figure 3). 
Previously described CRM members are shown in purple (see also Additional file 1: Origin and structural features of sequences used in this work). Families with confirmed centromeric localization are marked with orange stars (fluorescence in situ hybridization results) or green stars (in silico localization). Families having a dispersed chromosomal distribution are labeled with orange or green hexagons. Bootstrap values are shown only for the major nodes. Elements belonging to the Tekay, Reina and Galadriel clades are listed in Additional file 1: Origin and structural features of sequences used in this work. It should be noted that because of the limitations of the neighbor-joining method and the lack of representatives from a wider range of evolutionarily distant species, the tree topology may not fully reflect real phylogenetic relationships between different groups of CRM elements. (B) Taxonomic classification of the species containing the CRM elements. Dates of divergence between major groups of plants are from the work by Chaw et al. [105]. The names of CRM families present in the species are shown in brackets.: Elements belonging to the CRM clade are variable at their integrase C-terminus: The integrase protein is probably critical for the correct targeting of centromeric retrotransposons to the centromeric region. Most of the CRM integrases possessed a zinc finger with an HHCC motif at the N-terminus and a core domain containing the D,D(35)E motif around the active site (Figure 2). Between the core domain and the C-terminus, which presumably includes the DNA-binding region and the PTD, sequence divergence prevented full alignment. While the putative DNA-binding region contained several strongly conserved amino acid residues, the PTDs and their flanking sequences were variable. Surprisingly, this also applied to the CR motif, which was relatively well conserved in previously described elements, except for Beetle1 and Beetle2 [26, 31]. 
Of 81 CRM clade families, only 50 showed similarity to the CR motif. The integrases of the remaining families either possessed a type II chromodomain in place of the CR motif or lacked a PTD of any type. On the basis of the presence or absence and type of PTD, the elements were divided into three groups (Figures 1A and 3).: Graphical representation of the conserved portion of the integrase protein sequence. Integrase sequences extracted from CRM, Tekay, Reina and Galadriel chromoviruses and aligned using the Muscle program are shown as sequence logo plots [96]. CRM clade members are shown in the upper part of the figure, and those from the other clades are shown in the lower part. Despite the overall high level of sequence similarity, several amino acid residues are conserved only within the CRM clade. The HHCC and DD35E motifs are indicated by green and red stars, respectively.: Alignments of sequences at the integrase C-terminus. Group A elements possess a CR motif with several strongly conserved amino acid residues near the N-terminus. These residues are not present in Beetle1, Beetle2 or SilL1, but are well conserved in elements from grass species (see bottom part of the alignment). The numbers between two aligned blocks specify the number of amino acid residues not shown. No putative targeting domain (PTD) is encoded by group B elements. The type II chromodomain of group C elements shares sequence similarity with Tekay, Galadriel and Reina clade members. The dotted line above each alignment marks a region conserved among all plant chromoviruses. The arrow shows the portion of the integrase lying within the 3' long terminal repeat (LTR). Asterisks indicate stop codons at the end of each open reading frame. A highly conserved GPY/F motif [36, 37] is indicated by a black trapezoid above the beginning of each alignment.: Group A members possessed the CR motif, although in a few cases the motif was significantly altered (Figure 3). 
Apart from Beetle1 and Beetle2, the most altered CR motifs occurred in SilL1 and SilL2. Comparison of these elements with 454 sequencing reads containing partial sequences of SilL1 and SilL2 showed high protein similarity in this region, suggesting that the altered CR motif sequences in these subfamilies were most likely due not to mutations in the two analyzed sequences but rather to a real divergence of these families from other group A elements.: Integrase sequences of the group B elements lacked any PTD, and they terminated shortly beyond the conserved glycine-proline-tyrosine/phenylalanine (GPY/F) motif [36, 37] (Figure 3). To confirm that the absence of a PTD was not an in silico translation error, evidence for the presence of the CR motif or the type II chromodomain typical of all other plant chromoviruses was sought within predicted polypeptides translated in all possible reading frames. Although these searches used the BLASTP and RPS-BLAST (National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/) and MAST programs to maximize sensitivity, the results were consistently negative. Together with their intact coding region and the similarity shown by the polyprotein termini, this evidence strongly suggested that this group of chromoviruses encodes neither the type II chromodomain nor the CR motif.: Elements encoding the type II chromodomain were defined as group C (Figure 3). Similarly to other plant chromovirus clades (Tekay, Reina, Galadriel), the chromodomain of group C elements lacked the conserved aromatic cage residues known to interact with methylated H3K9. It should be noted that this group included all gymnosperm sequences, some of which were highly similar to the partial chromodomain-lacking sequence of the Spdl element [GenBank:AF229251] [38], present in white spruce (Picea glauca) and classified as a CRM member by Gorinsek et al. 
[27].: Structural features of the CRM elements: Complete CRM elements ranged in size from approximately 5.1 to 10.2 kbp. They were flanked by two LTRs ranging from 299 to 1,225 bp. The LTR termini featured the highly conserved inverted repeat motif 5'-TGATG/CATCA-3'. Upon insertion, CRM elements generated a 5-bp target site duplication, the sequence of which varied substantially from insertion to insertion. Thus these elements do not appear to target specific sequences in the genome. The estimated insertion ages ranged from 0 to 6.7 million years (see Additional file 1: Origin and structural features of sequences used in this work), indicating recent insertion activity of CRM elements. The 5' LTR was followed by a primer binding site, while the 3' LTR was preceded by a polypurine tract (Figures 4A and 4B). Although the primer binding site of all CRM subfamilies was complementary to 12 to 18 nucleotides at the 3' end of tRNAMet, its sequence was only partially conserved, corresponding to various types of tRNAMet (Figure 4B). The polypurine tract ranged in length from 4 to 13 bp, and its sequence in group A elements was highly similar even between distantly related species (Figure 4B; see also Additional file 1: Origin and structural features of sequences used in this work). The A-rich stretch within the 5' UTR, common to many rice CRRs [17], was present in a number of group A elements (including those present in dicotyledonous species), suggesting that it may be an important structural feature. However, it was absent in most members of groups B and C. The polyprotein region extended into the 3' LTR in all group A elements, but only in a few elements of groups B and C (Figure 4A). Although the coding sequence was interrupted by nonsense codons and/or frame shifts in many elements, it seemed to be organized as a single open reading frame in the intact autonomous elements. 
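Several of the structural features listed above (the TGATG...CATCA LTR termini, the 5-bp target site duplication and the insertion ages) can be checked computationally on a candidate element. The Python sketch below is illustrative only. The excerpt does not state how insertion ages were estimated, so the sketch assumes the common approach of converting the divergence K between the 5' and 3' LTRs of one element into time with T = K / (2r), using the Kimura two-parameter distance and an illustrative substitution rate r of 1.3e-8 per site per year; that rate is an assumption, not a value from this study. All function names and sequences are hypothetical.

```python
import math

# Transitions are purine-purine (A<->G) or pyrimidine-pyrimidine (C<->T) changes.
TRANSITIONS = {frozenset('AG'), frozenset('CT')}


def has_crm_ltr_termini(ltr):
    # True if the LTR starts with TGATG and ends with CATCA or TATCA.
    ltr = ltr.upper()
    return ltr.startswith('TGATG') and ltr[-5:] in ('CATCA', 'TATCA')


def has_tsd(upstream, downstream, length=5):
    # A target site duplication means the last `length` bases before the 5' LTR
    # equal the first `length` bases after the 3' LTR.
    if len(upstream) < length or len(downstream) < length:
        return False
    return upstream[-length:].upper() == downstream[:length].upper()


def k2p_distance(seq1, seq2):
    # Kimura two-parameter distance between two aligned, equal-length sequences;
    # columns containing gaps or ambiguous bases are skipped.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in 'ACGT' and b in 'ACGT']
    if not pairs:
        raise ValueError('no comparable aligned positions')
    n = len(pairs)
    diffs = [frozenset(pair) for pair in pairs if pair[0] != pair[1]]
    p = sum(1 for d in diffs if d in TRANSITIONS) / n      # transition proportion
    q = sum(1 for d in diffs if d not in TRANSITIONS) / n  # transversion proportion
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age(ltr5, ltr3, rate=1.3e-8):
    # T = K / (2r): the two LTRs are identical at insertion and then each
    # accumulates substitutions, so their divergence grows at twice the rate r.
    # The default rate is an illustrative assumption, not a value from this study.
    return k2p_distance(ltr5, ltr3) / (2 * rate)


# Hypothetical 100-bp LTR pair differing by a single A->G transition.
ltr5 = 'TGATG' + 'A' * 90 + 'CATCA'
ltr3 = 'TGATG' + 'A' * 44 + 'G' + 'A' * 45 + 'CATCA'
print(has_crm_ltr_termini(ltr5), has_tsd('GGCATTACGT', 'TACGTCCAGA'))  # True True
print(round(insertion_age(ltr5, ltr3) / 1e6, 2))  # 0.39 (million years)
```

Two identical LTRs give an age of zero, which corresponds to the youngest insertions mentioned above; the estimate scales inversely with the assumed rate, so ages from different studies are only comparable when the same r is used.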
The putative polyprotein sequences contained all the domains necessary for replication and integration (gag, protease, RT, RNase H and integrase) (Figure 4A), showing a pronounced level of similarity between elements (Figure 4C). A relatively high level of similarity was also found between nucleotide sequences of the elements (see Additional file 4: Dot plot comparison of full-length CRM elements).: Structural analysis of CRM elements. (A) Polyprotein coding (white boxes), noncoding (gray boxes), putative targeting domain (PTD) (hatched boxes) and long terminal repeats (LTRs) (arrowed). Pbs, primer binding site; ppt, polypurine tract; aaa, A-rich stretch. The group A member coding region extends into the 3' LTR, which encodes the CR motif. Group B elements lack any PTD. Group C possesses a type II chromodomain-coding domain which terminates close to the 5' end of the 3' LTR. The graph is not drawn in proportion to segment lengths in base pairs. (B) Most elements share TGATG and T/CATCA inverted repeats at, respectively, the 5' and 3' end of the LTR. The primer binding site complementary to the 3' end of tRNAMet differs in sequence between various families. Group A elements contain highly conserved polypurine tract sequences. (C) A protein similarity plot shows that the CRM polyproteins are highly conserved, varying mainly within their C-terminal PTD regions. Individual polyprotein domains: GAG, capsid domain, similar to pfam37032; ZF, nucleocapsid GAG protein zinc finger; PRO, protease; RT, reverse transcriptase RNase H; IN, integrase; PTD, putative targeting domain.: Not all retrotransposon families within the CRM clade are accumulated in centromeres: Although the elements described above formed a well-defined phylogenetic clade, it remained to be established whether they were all preferentially localized in centromeric regions. 
The chromosomal distribution of selected families was investigated both experimentally, by fluorescence in situ hybridization (FISH), and computationally in those species for which the whole genome sequence was available. A centromeric FISH signal was observed for all of the group A sequences tested (including PiSat1 in pea, SilL1 and SilL2 in white campion and PopT2 in black cottonwood) (Figure 5). A weak MedT1/2 centromeric signal was observed in barrel medic (data not shown). No VitV2 FISH signal was detected in grape, a result attributable to a copy number of only approximately 50 per haploid genome, as estimated both by dot blot hybridization and by an in silico search of the whole grape genome sequence. The distribution of the few VitV2 copies in the whole genome sequence was essentially random (data not shown), but it must be borne in mind that the published chromosome sequences are still incomplete and the positions of the centromeres are as yet ill-defined [39].: Fluorescence in situ hybridization (FISH)-based visualization of the intrachromosomal distribution of chromoviruses. (A) Pea chromosomes hybridized with PiSat1 (group A). (B and C) Black cottonwood interphase nucleus and metaphase chromosomes hybridized with PopT2 (group A). Note that most of the signal is associated with chromocenters (bright 4',6-diamidino-2-phenylindole (DAPI)-stained spots in the interphase nucleus). Three metaphase chromosomes were enlarged to show the localization of PopT2 to the centromeric region more clearly (C1-C3). (D and E) White campion chromosomes hybridized with SilL1 and SilL2 (group A). (F) Banana chromosomes hybridized with MusA1 (group B). Since all of the banana chromosomes are metacentric or submetacentric, signals located around the center of the chromosome are taken to reflect loci near or within the centromere. Three of the chromosomes with identifiable centromeres were enlarged (F1-F3). 
(G and H) Norway spruce chromosomes counterstained with DAPI and hybridized with a Spdl-like sequence (group C). (I) Pea chromosomes hybridized with Peabody (Tekay clade). Positive hybridization signals are shown in green, and DAPI-counterstained DNA appears in red.: Surprisingly, FISH with MusA1 in banana demonstrated that even elements lacking a PTD (group B members) can preferentially accumulate in centromeres (Figure 5F). The At elements CRA5 and CRA6, other representatives of group B, were also found in centromeres, but it should be noted that they are present in only a few copies in the genome (data not shown).: On the other hand, the two tested representatives of group C showed a dispersed distribution along the chromosomes. The distribution of VitV1 elements in the current genome assembly appeared random (data not shown), a result that could not be validated cytogenetically because the copy number of this element is too low (100 to 500 copies per haploid genome). When FISH was performed in Norway spruce using a probe that shared 96% and 94% identity with the white spruce Spdl and PicG1 sequences, respectively, the hybridization signal was dispersed along the whole length of all chromosomes (Figures 5G and 5H), unlike the pattern obtained with the centromeric satellite 2F [40], which was used to label centromeric regions (data not shown).: The same in silico strategy was extended to investigate the intrachromosomal distribution of elements belonging to other chromovirus clades represented in the complete At and rice genome sequences. While non-CRM chromoviruses are concentrated in the centromeric region of At, those in rice appear to be dispersed throughout the genome (data not shown) [31]. 
The distinct chromosomal distributions of rice and At type II chromodomain-containing chromoviruses, together with our own unpublished findings from preliminary experiments in other species and with other data in the literature, suggest that the distribution of these elements may correlate with genome size. We therefore also carried out FISH with a probe derived from the pea retrotransposon Peabody, the most abundant chromovirus family (Tekay clade) in this species, with a copy number of about 10,000 per haploid genome (4,300 Mbp/1C) [41–43]. The hybridization signal covered every chromosome almost uniformly, although it was absent from secondary constrictions and major heterochromatic blocks (Figure 5I).: Elements possessing the CR motif are common in the angiosperms: Because the search for novel CRM retrotransposons was restricted to full-length sequences predicted by the LTR_Finder program (http://tlife.fudan.edu.cn/ltr_finder/), partial elements were excluded. To widen the search, GenBank was trawled for species not identified by the initial search, using as queries a set of all polyprotein domains extracted from the chromovirus elements shown in Figure 1A. This yielded >100 sequences showing >70% identity to CRM representatives (data not shown), originating from both angiosperm and gymnosperm species. The angiosperm sequences were related to representatives of all three groups, whereas all of the gymnosperm sequences were of the group C type. While the C-terminal portion of the integrase gene bearing the CR motif was present in a broad range of angiosperm species, it was not detected in any nonangiosperms. 
Thus, group A elements are either angiosperm-specific or have not yet been sequenced in gymnosperms.: The CRM elements are transcribed: A growing body of evidence suggests that noncoding transcripts derived from centromeric repetitive DNA, as well as small RNA produced via their RNAi-mediated degradation, are important for the proper function of the centromere [34, 44, 45]. When reverse transcription polymerase chain reaction (RT-PCR) was performed to assay the transcriptional activity of a number of centromeric retrotransposon families, amplicons of the expected length were recovered in every case. Thus, transcriptional activity appears to be a general feature of centromeric retrotransposons (Additional file 5: Transcription of centromeric retrotransposons). A database search for small RNA sequences in At [46], barrel medic [47] and black cottonwood [48], as well as in pea (using an in-house database), identified small RNA sequences matching CRM clade elements in each of these species. With the exception of black cottonwood, for which very little sequence data were available (about 27,000 sequences, 14 of which were identical to PopT retrotransposons), >100 distinct small RNA sequences were identified per species. The abundance of individual small RNA was estimated from the frequency with which they occurred in each library and proved to be very low: only one or a few reads per tens of thousands to several million reads (data not shown). The overall frequency of these small RNA was also low, especially in pea and barrel medic (8 and 13 transcripts per quarter million (TPQ), respectively). The highest overall frequency was found in At siliques (449 TPQ; see Additional file 5: Transcription of centromeric retrotransposons). The small RNA ranged in size from 18 to 27 nt, but most were 24 nt (Additional file 5: Transcription of centromeric retrotransposons), and they originated from throughout the whole element sequence. 
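Assuming that TPQ here means the number of reads matching an element family scaled to a library of 250,000 small RNA reads (the usual reading of transcripts per quarter million), the frequencies and the modal 24-nt size reported above can be computed as in this minimal Python sketch. The function names and toy reads are hypothetical, and the BLASTN step that assigns reads to CRM elements is assumed to have been done already.

```python
from collections import Counter


def tpq(matching_reads: int, library_size: int) -> float:
    # Transcripts per quarter million: reads matching the element family,
    # scaled to a library of 250,000 reads.
    if library_size <= 0:
        raise ValueError('library_size must be positive')
    return matching_reads * 250_000 / library_size


def length_profile(reads: list[str]) -> tuple[Counter, int]:
    # Count small RNA lengths (nt) and return the most frequent length.
    counts = Counter(len(r) for r in reads)
    modal_length, _ = counts.most_common(1)[0]
    return counts, modal_length


if __name__ == '__main__':
    # Hypothetical toy data, not values from this study.
    print(tpq(52, 1_000_000))
    reads = ['A' * 24] * 5 + ['A' * 21] * 2 + ['A' * 18]
    counts, mode = length_profile(reads)
    print(sorted(counts.items()), mode)
```

Normalizing to a fixed library size is what makes frequencies from libraries of very different depths (a few tens of thousands of reads for black cottonwood versus millions elsewhere) comparable.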
In At, centromeric retrotransposon small RNA were represented in four different tissues, suggesting that these elements are constitutively transcribed and that RNAi is involved in processing their transcripts. A similar number of small RNA sequences was also present in At mutant lines in which the activity of various RNAi genes was disrupted (Additional file 5: Transcription of centromeric retrotransposons).: Discussion: The classification of CRM retrotransposons: Since centromeric retrotransposons have been classified as belonging to the chromovirus CRM clade [27, 29, 30], it was expected that the elements identified here would be largely concentrated in the centromeric region. Although the members of the CRM clade are regarded as the most highly conserved of the plant chromoviruses [27], the present data showed that they vary sufficiently to allow their subdivision into three groups, distinguished by the structure of their integrase C-termini (Figure 3). The CR motif, shown by Gao et al. [31] to be particularly well conserved, was indeed present in most of the group A members, but was lacking in those from groups B and C. Group C elements possessed a type II chromodomain in place of the CR motif, while those in group B appeared to lack any kind of PTD. Although the true phylogenetic relationships between elements from different groups remain to be resolved, the presence of the type II chromodomain in group C elements probably reflects an evolutionary divergence of the CRM and the other plant chromovirus clades from a common ancestor possessing this type of PTD. 
On the other hand, group B elements probably derived from those belonging to groups A and/or C, either by deletion of the PTD-coding region or by the successive accumulation of mutations.: CRM clade members are not confined to the centromeric region of plant chromosomes: Although they appeared to be closely related phylogenetically, CRM clade members were not universally localized to the centromeric region of the chromosome. The Spdl-like sequence (group C) in particular was dispersed over the whole length of the Norway spruce chromosomes. A second CRM clade member with this type of genomic distribution was VitV1 (group C), although the evidence for its dispersed distribution relies entirely on an assembly of the grape genome known to be as yet incomplete [39]. Such an intrachromosomal distribution is consistent with that of other, non-CRM chromovirus families containing the same type of PTD, such as Peabody in pea (Figure 5I) and a Peabody-like sequence in white campion [49]. On the other hand, group B elements, although they lack a PTD, tended to be concentrated in the centromeric region. Whether this reflects a PTD-independent targeting mechanism or whether group B elements accumulate in the centromeric region via some other process remains unclear.: What makes centromeric retrotransposons centromeric?: The general assumption is that the PTD is responsible for targeting centromeric retrotransposons to the centromeric region. Experimental evidence that the CRM PTD mediates this targeting has been obtained in At [31]. Chromatin immunoprecipitation-based experiments have demonstrated that centromeric retrotransposons are associated with histones CenH3 and H3K9me2 [13, 17, 19, 35, 50] and are depleted in the euchromatic fraction marked with H3K4me2 [35]. While the interaction with CenH3 has yet to be tested, it has been demonstrated that the CRM PTD does not interact with H3K9me2 [31]. 
Provided that the integrase C-terminus ensures centromere-specific integration, it is reasonable to assume that the CR motif is a key component of the targeting process, since this is the only relatively well-conserved portion of an otherwise rather variable sequence (Figure 3). This line of argument is challenged, however, by the centromeric localization of plant Ty3/gypsy retrotransposons lacking the CR motif. These include CRM group B members and, at least in At, representatives of three major Ty3/gypsy retrotransposon lineages, two of which (Tat and Athila) lack any sort of PTD [51, 52]. Some chromoviruses possessing the type II chromodomain, especially those belonging to the Tekay clade, are concentrated in the centromeric regions of At [31] and banana [53]. In contrast, chromoviruses possessing type II chromodomains are dispersed along the chromosome arms in rice [31]. Peabody (Tekay clade) and PIGY (Athila lineage) elements are both highly dispersed in pea [54] (Figure 5I). Relatives of these two families are also dispersed in white campion [49]. Thus, while centromeric localization is the norm for elements possessing the CR motif, the localization of elements from lineages or clades lacking the CR motif is less predictable, although they tend to be dispersed in large genomes. Heterochromatin in small genomes, as defined by the presence of methylated H3K9, is localized principally in the centromeric region, whereas in larger genomes heterochromatic sites occur along the length of the chromosomes [55]. Consequently, the apparently inconsistent intrachromosomal distribution of elements with particular types of PTD may simply reflect the contrasting distribution of heterochromatin. A corollary of this model is that elements possessing the CR motif must be able to recognize a mark specific to centromeric chromatin, whereas those with a type II chromodomain recognize a mark of heterochromatin more generally. 
No experimental evidence is yet available to either support or refute this notion, nor has any mechanism been proposed to explain the colonization of the centromeric regions by elements that lack a PTD. However, a previous study of At showed that the accumulation of retrotransposons in centromeres may result not only from targeting but also from purifying selection against insertions in centromere-distal regions [51]. For the time being, therefore, we suggest that the term \"centromeric retrotransposons\" be reserved for group A elements, because only these are likely to actively target the centromeric region.: How widespread are the centromeric retrotransposons?: The present data show that CRM retrotransposons are widespread among seed plants. However, representatives of groups A and B were found in the angiosperms (both mono- and dicotyledonous species), but not in the gymnosperms or in evolutionarily older species such as the moss Physcomitrella patens, whose genome has recently been sequenced [56, 57]. All CRM elements with confirmed centromeric localization belong to one or the other of these two groups. Thus, genuinely centromeric retrotransposons are either angiosperm-specific or are yet to be discovered in the other groups of plants. The gymnosperm CRM elements that we have identified belong to group C and are noncentromeric. Note that the Pinus pinaster pPpgy1 sequence was wrongly cited by Gorinsek et al. [27] as being centromeric ([58] and J.S.P. Heslop-Harrison, personal communication), and we believe that it is more likely to be a member of another chromovirus clade. Some insertions have proven to be very recent, and this maintenance of transpositional activity suggests that CRM clade members are probably not all mere relics of earlier activity. The degree of amplification in the host genome differs among retrotransposon families. 
For instance, the copy number of At CRA is small, whereas that of the Norway spruce Spdl-like sequence reaches 50,000 to 100,000. Published data for rice and maize [17, 28], combined with the present data for banana, white campion, pea, grape and black cottonwood, indicate that the copy number of group A and B members, in the range of hundreds to a few thousand, is lower than that achieved by at least some group C families.: The role of centromeric retrotransposons in centromere function: Whether centromeric retrotransposons play any role in centromere function is of fundamental interest. One possibility is that they are merely parasitic and target the centromeric region to escape negative selection against insertions in distal regions of the chromosome [31]. The opposing hypothesis holds that they play a positive role in centromere function [59], in which case their targeting is also beneficial to the host. Centromeric sequences are polymorphic, yet the centromere is a functionally highly conserved cytological structure [12, 60]. Most centromeric sequences are repetitive. While centromeric satellites evolve rapidly at the sequence level (to the extent that they are largely species-specific) [61–63], centromeric retrotransposons appear to evolve more slowly. However, as centromeres are thought to be determined more epigenetically than genetically [64], it is unlikely that the centromeric retrotransposon sequence itself is a direct determinant of centromere identity and function. Instead, these repetitive sequences probably help to create a genomic environment conducive to the establishment of centromeric chromatin. The promoters of centromeric retrotransposons may be important not only for their own transcription but also for the transcription of adjacent sequences, as suggested by Jiang [12]. 
While it remains to be confirmed that their transcription is required for the deposition of CenH3 into the centromere, it does seem clear that transcripts of centromeric repeats play some role in determining the integrity of centromeric chromatin and pericentromeric heterochromatin [45, 65–67]. CRM element transcripts remain bound to CenH3 chromatin, suggesting that they have a stabilizing role in the structure of the maize centromere [34]. All centromeric retrotransposons tested to date are actively transcribed [26, 34, 35] (Additional file 5: Transcription of centromeric retrotransposons), so it is reasonable to suggest that their function is similar to that of CRM. The outer centromeric repeats in the pericentromeric heterochromatin of fission yeast (Schizosaccharomyces pombe) are required for the RNAi-mediated formation of heterochromatin necessary for the establishment of CENP-A (a synonym for CenH3) chromatin in the core domain [44, 68]. A portion of the centromeric retrotransposons is also associated with the heterochromatin mark H3K9me2, and at least some of their transcripts are processed via the RNAi pathway [35]. However, although the dependence of both heterochromatin formation and centromere function on RNAi has been demonstrated repeatedly [69–77], defective cell division has not yet been associated with RNAi mutants in plants [78]. As the production of small RNA derived from centromeric retrotransposon transcripts was not compromised in RNAi mutants (Additional file 5: Transcription of centromeric retrotransposons), the absence of this predicted phenotype in these mutants may reflect a sufficient level of redundancy in the RNAi machinery. However, considering the very low frequency of small RNA sequences, we cannot exclude the possibility that they are merely an artefact of high-throughput sequencing. 
Therefore, it remains an open question both whether RNAi plays an important role in the regulation of centromeric retrotransposons and whether it is required for normal centromere function in plants.: Conclusions: Although centromeric retrotransposons were classified as the CRM clade of chromoviruses, our results show that genuinely centromeric retrotransposons represent only a fraction of this clade, referred to as group A in this paper. All tested elements from this group have centromeric localization, and most of them contain the CR motif at the C-terminus of their integrase. This motif is crucial for centromere targeting, and its N-terminal part is relatively well conserved even among evolutionarily distant species. Some chromoviruses containing altered CR motif sequences or lacking the CR motif also have centromeric localization. It remains unclear, however, whether their localization in centromeres results from centromere targeting or from some other mechanism.: The genuinely centromeric retrotransposons are present in both major angiosperm groups (mono- and dicotyledonous species), but have not been found in the gymnosperms or in evolutionarily older species. They represent the only relatively conserved component within the highly diverse sequences of plant centromeres. Their transpositional activity contributes to the highly dynamic evolution of centromeres by generating new insertions, which may subsequently undergo illegitimate and unequal homologous recombination. In addition, their transcriptional activity is consistent with the notion that the transcription of centromeric retrotransposons has a role in normal centromere function.: Methods: Plant material: Seeds of pea (Pisum sativum) cv. Carrera were obtained from Osiva Boršov (Boršov nad Vltavou, Czech Republic). Seeds of barrel medic (Medicago truncatula) cv. Jemalong were obtained from the Crop Research Institute (Prague-Ruzyne, Czech Republic). 
Seeds of white campion (Silene latifolia) were obtained from the Institute of Biophysics (Brno, Czech Republic). Seeds of Norway spruce (Picea abies) were harvested from natural stands at Strážkovice, Czech Republic. Banana (Musa acuminata cv. Calcutta 4 ITC 0249) plants were received from the International Transit Centre, Katholieke Universiteit (Leuven, Belgium), and grape (Vitis vinifera) cv. Pinot Noir plants were obtained from N.O.S. (Nepomuk, Czech Republic). Black cottonwood (Populus trichocarpa) cuttings were a gift from the Silva Tarouca Research Institute for Landscape and Ornamental Gardening (Pruhonice, Czech Republic).: In silico discovery of centromeric retrotransposons and sequence analysis: The in silico search strategy depended on the ability to discriminate reliably between CRM chromoviruses and other LTR retrotransposons on the basis of their RT domain protein sequence [27, 79]. Thus, all green plant (Viridiplantae) sequences available in the GenBank database were queried with the RIRE7 RT domain using TBLASTN [80, 81], with an e-value threshold of 1e-5. Rice sequences were excluded because the CRR elements have already been well characterized [17]. Full-length elements were identified among the resulting hits using LTR_Finder [82]. Elements from different species, elements that could not be aligned with the others over the whole length of their sequences and elements sharing less than 70% similarity in the LTR were classified as distinct families. Because the relaxed TBLASTN stringency yielded a diverse set of full-length retrotransposons, including elements from various Ty3/gypsy lineages, BLASTX was used to compare their sequences with a comprehensive database of RT domains extracted from all the major groups of plant Ty3/gypsy retrotransposons (data not shown). Only elements whose best hits were to previously described CRM members were retained. 
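The family-splitting rules in the preceding paragraph (different species, no full-length alignment, or less than 70% LTR similarity each imply distinct families) can be expressed as a grouping procedure. The Python sketch below is an assumption about how such rules might be applied, not the procedure used in this work: it links two elements only when all three criteria are met and then takes connected components (single linkage) as families. LTRs are assumed to be pre-aligned, full-length alignability is supplied by a caller-defined function, and all names are hypothetical.

```python
from itertools import combinations
from typing import Callable


def ltr_identity(a: str, b: str) -> float:
    # Percent identity of two pre-aligned LTRs ('-' marks a gap);
    # columns that are gaps in both sequences are ignored.
    if len(a) != len(b):
        raise ValueError('LTRs must be aligned to the same length')
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if (x, y) != ('-', '-')]
    if not cols:
        return 0.0
    matches = sum(1 for x, y in cols if x == y)
    return 100.0 * matches / len(cols)


def group_families(elements: list[dict], threshold: float = 70.0,
                   alignable: Callable[[dict, dict], bool] = lambda a, b: True
                   ) -> list[list[str]]:
    # elements: dicts with 'name', 'species' and 'ltr' (aligned LTR sequence).
    parent = {e['name']: e['name'] for e in elements}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(elements, 2):
        same_family = (a['species'] == b['species'] and alignable(a, b)
                       and ltr_identity(a['ltr'], b['ltr']) >= threshold)
        if same_family:
            parent[find(a['name'])] = find(b['name'])

    families: dict[str, list[str]] = {}
    for e in elements:
        families.setdefault(find(e['name']), []).append(e['name'])
    return sorted(sorted(members) for members in families.values())


if __name__ == '__main__':
    demo = [
        {'name': 'e1', 'species': 'X', 'ltr': 'ACGTACGTAC'},
        {'name': 'e2', 'species': 'X', 'ltr': 'ACGTACGTAA'},
        {'name': 'e3', 'species': 'X', 'ltr': 'TTTTTTTTTT'},
        {'name': 'e4', 'species': 'Y', 'ltr': 'ACGTACGTAC'},
    ]
    print(group_families(demo))
```

Single linkage is only one possible choice; it can chain moderately divergent elements into one family, so a stricter rule (for example, requiring the threshold against every member) may be preferable when families are closely spaced.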
As the best hit-based criterion could theoretically have selected chromovirus elements related to, but not necessarily falling within, the CRM clade, a phylogenetic analysis was carried out to clarify the relationships between the various elements. An additional search was conducted of 454-originated sequence data obtained from pea [42], white campion (J. Macas, E. Kejnovský, P. Novák, P. Neumann, A. Koblížková, B. Vyskot, unpublished data) and banana [53]. Contigs assembled from these sequence reads according to the method described by Macas et al. [42] were used to identify RT domains as described above. De novo full-length or nearly full-length sequences of these elements in white campion and pea were obtained from, respectively, bacterial artificial chromosome (BAC) clones and sequenced amplicons. Banana full-length elements corresponding to 454-generated sequences were already represented in GenBank. Although TBLASTN was used for most of the similarity searches, some searches were performed at the protein level using programs implemented in either HMMER [83, 84] or MEME [85, 86]. Sequence analysis was conducted using software within the EMBOSS or Staden packages [87, 88], multiple alignments were performed using Clustal X [89] or Muscle [90], and pairwise alignments were performed using the Stretcher program [91]. Protein domains were identified by searching the Conserved Domains Database with RPS-BLAST [92], and by searching a local database with BLASTP and BLASTX [81]. Phylogenetic analyses relied on the neighbor-joining method using observed evolutionary distances, as implemented in the SeaView program [93]. Bootstrap values were calculated from 1,000 replications. Phylogenetic trees were drawn and edited using the iTOL [94] and FigTree [95] programs. The timing of individual insertion events was estimated by comparing the 5' and 3' LTRs as described by Liu et al. [13]. 
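LTR-based dating rests on the fact that the two LTRs of an element are identical at insertion and diverge afterwards, so the insertion time is T = K / 2r, where K is the number of substitutions per site between the 5' and 3' LTRs and r is the substitution rate. The Python sketch below assumes the Kimura two-parameter estimate of K and a default rate of 1.3 × 10^-8 substitutions per site per year, a value commonly applied to grass LTR retrotransposons; neither choice is stated in this paper, so both should be checked against Liu et al. [13] before reuse. Function names are hypothetical.

```python
import math

PURINES = {'A', 'G'}
PYRIMIDINES = {'C', 'T'}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    # Kimura two-parameter distance between two aligned LTRs;
    # columns containing gaps or ambiguous bases are skipped.
    if len(ltr5) != len(ltr3):
        raise ValueError('LTRs must be aligned to the same length')
    pairs = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper())
             if a in 'ACGT' and b in 'ACGT']
    n = len(pairs)
    if n == 0:
        raise ValueError('no comparable sites')
    diffs = [(a, b) for a, b in pairs if a != b]
    transitions = sum(1 for a, b in diffs
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    p = transitions / n                   # proportion of transitions
    q = (len(diffs) - transitions) / n    # proportion of transversions
    if p == 0 and q == 0:
        return 0.0
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError('divergence too high for the K2P correction')
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_age(ltr5: str, ltr3: str, rate: float = 1.3e-8) -> float:
    # Years since insertion: T = K / (2r); the default rate is an assumption.
    return k2p_distance(ltr5, ltr3) / (2 * rate)


if __name__ == '__main__':
    ltr5 = 'ACGT' * 25          # toy 100-bp LTR
    ltr3 = 'G' + ltr5[1:]       # a single A->G transition
    print(round(insertion_age(ltr5, ltr3)))  # about 390,000 years
```

Because T scales inversely with r, any age such as the 6.7 million years quoted in the Results changes proportionally if a different substitution rate is assumed.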
Sequence logos were generated using the WebLogo tool [96]. The distribution of BLAST hits across the whole genome sequence was visualized using the NCBI MapViewer [97]. Small RNA sequences originating from centromeric retrotransposons were identified using BLASTN searches.: PCR, cloning, sequencing and hybridization: The sequences of all the PCR primers used for retrotransposon amplification and cloning are listed in Additional file 6: PCR primer sequences and targets. Longer fragments were amplified using LA DNA polymerase (Top-Bio, Prague, Czech Republic). Each 30 µl PCR contained 1 × PCR buffer, 0.2 mM deoxynucleoside triphosphates (dNTPs), 0.3 µM of each primer, 2% (wt/vol) dimethyl sulfoxide, 0.3 U of LA DNA polymerase and 150 ng of template. The reaction profile included 35 cycles of 15 seconds at 94°C, 30 seconds at 60°C and 7 minutes at 68°C, preceded by an initial denaturation step (94°C for 60 seconds) and followed by a final extension step (10 minutes at 68°C). Shorter fragments were amplified using Platinum Taq DNA Polymerase (Invitrogen, Carlsbad, CA, USA). Here, each 25 µl PCR contained 1 × PCR buffer, 0.2 mM dNTPs, 0.2 µM of each primer, 1.5 mM MgCl2, 1 U of Platinum Taq DNA Polymerase and 5 ng of template. The reaction profile included 35 cycles of 30 seconds at 94°C, 30 seconds at 55°C and 1 to 3 minutes at 72°C, preceded by an initial denaturation step (3 minutes at 94°C) and followed by a final extension step (10 minutes at 72°C). All PCR products were cloned into the pCR4 TOPO plasmid (Invitrogen). The resulting clones were either fully (cID58-2 and cID58-6) or partially sequenced to verify that they contained the intended insert. The sequences of the two fully sequenced inserts have been deposited in GenBank [GenBank:GU136551 and GenBank:GU136552]. A complete list of the clones used in this work is provided in Additional file 6: PCR primer sequences and targets.: A set of 20,000 white campion BAC clones (A. 
Widmer, unpublished data) was spotted onto a filter and screened by independent hybridizations with α-[32P]-dATP-labeled cID51-1 and cID51-2 (Prime-It II Random Primer Labeling Kit; Stratagene, La Jolla, CA, USA). Hybridization was performed as described by Yang et al. [98], followed by a high-stringency wash in 0.1 × saline-sodium citrate (SSC) buffer and 0.1% sodium dodecyl sulfate at 65°C. Clones hybridizing strongly with both probes were isolated, and the presence of centromeric elements was verified by PCR. BAC clone BAC105E4 was sequenced using GS FLX technology (454 Life Sciences/Roche, Branford, CT, USA) to a depth of 20 × at GATC Biotech AG (Konstanz, Germany). Reads (mean length, 250 bp) were assembled into contigs using CAP3 [99]. The sequences of the SilL1 and SilL2 retrotransposons present in this BAC clone have been deposited in GenBank [GenBank:GU136549 and GenBank:GU136550]. Copy numbers were estimated for the Spdl-like sequence (clones cID79-1 and cID81-4), VitV1 (cID73-4 and cID74-1), VitV2 (cID91-2) and VitV3 (cID90-10) as described elsewhere [43]. These estimates were based on the published 1C genome sizes of grape (0.43 pg [100]) and Norway spruce (18.6 pg [101]).: Fluorescence in situ hybridization: Root meristems were obtained from young seedlings (barrel medic, Norway spruce, pea, white campion) or plants (banana, black cottonwood, grape). Metaphase cells in the root meristems of banana, pea, Norway spruce and white campion were accumulated following the methods described by Doleželová et al. [102], Neumann et al. [43], Uberall et al. [103] and Kejnovský et al. [11], respectively, while for the remaining species, mitotic metaphases were accumulated by treating the roots with 2.5 µM amiprophos-methyl (in 1 × Hoagland's solution) for 2 hours at room temperature. 
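The copy-number estimates described in the hybridization methods above depend on converting the 1C genome size from picograms to base pairs. The Python sketch below assumes the widely used conversion of 1 pg = 978 Mbp and a dot blot design in which a known mass of cloned insert gives the same signal as a known mass of genomic DNA; the actual procedure is that of [43], so the function and parameter names here are hypothetical. Copies per haploid genome are then the fraction of genomic DNA matched by the probe, multiplied by the genome size and divided by the length of the probed sequence.

```python
BP_PER_PG = 0.978e9  # assumed conversion: 1 pg of DNA ~ 978 Mbp


def genome_size_bp(pg_1c: float) -> float:
    # Convert a 1C nuclear DNA content (pg) into base pairs.
    return pg_1c * BP_PER_PG


def copy_number(probe_ng: float, genomic_ng: float,
                probe_len_bp: int, pg_1c: float) -> float:
    # probe_ng of cloned insert gives the same signal as genomic_ng of
    # genomic DNA, so the probed sequence makes up probe_ng / genomic_ng
    # of the genome; dividing the matching bp by the probe length
    # gives copies per haploid (1C) genome.
    fraction = probe_ng / genomic_ng
    return fraction * genome_size_bp(pg_1c) / probe_len_bp


if __name__ == '__main__':
    # 1C sizes are from the text; the dot blot numbers are invented.
    print(round(genome_size_bp(0.43) / 1e6, 1))   # grape, Mbp
    print(round(genome_size_bp(18.6) / 1e6, 1))   # Norway spruce, Mbp
    print(round(copy_number(0.1189, 1000, 1000, 0.43)))
```

The same signal ratio translates into roughly 43 times more copies in Norway spruce than in grape, simply because of the difference in genome size, which is why the published 1C values are needed for each species.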
Mitotic spreads of barrel medic, Norway spruce, pea, black cottonwood and grape chromosomes were made using a conventional squashing method, followed by RNase A and pepsin treatment [104]. FISH probes for these species were labeled with biotin-deoxyuridine triphosphate (biotin-dUTP) by nick translation [104] of a plasmid containing a retrotransposon insert. The following clones were used as sources of FISH probes: cID58-2 (PiSat1), cID64-3 plus cID68-2 (MedT1, MedT2), cID79-1 plus cID81-4 (Spdl-like sequences), cID85-11 (PopT2), cID91-2 (VitV2), cID90-10 (VitV3), cID73-4 plus cID74-1 (both VitV1) and Psat32 (partial sequence of the Peabody retrotransposon [43]). Hybridization was performed overnight at 28°C, followed by posthybridization washes, first in 2 × SSC at 32°C for 5 minutes and then in 50% (vol/vol) formamide in 2 × SSC at 32°C for 10 minutes. Biotinylated probes were detected as described by Leitch et al. [104] using fluorescein-avidin DN and biotinylated anti-avidin D. Chromosomes were counterstained with 4',6-diamidino-2-phenylindole. Images were captured with a DS-Qi1Mc cooled camera (Nikon, Tokyo, Japan) and analyzed using NIS Elements 3.0 software (Laboratory Imaging, Prague, Czech Republic). For white campion, chromosome preparation, probe labeling, hybridization and signal detection followed the methods described by Kejnovský et al. [11]. Probes consisting of SilL1 and SilL2 LTR fragments were amplified from BAC105E4 (for primer sequences, see Additional file 6: PCR primer sequences and targets). Chromosome preparation of banana and the subsequent hybridization and signal detection procedures followed the methods described by Doleželová et al. [102]. The MusA1 probe was PCR-labeled with biotin-dUTP from a template of clone cID53-1 DNA.: RT-PCR: Total RNA was isolated from leaves using TRIzol reagent (Invitrogen) and treated with DNase I (Ambion, Austin, TX, USA). 
First-strand synthesis was carried out using the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen) according to the manufacturer's recommendations, with random hexamers as primers. A 5 ng sample of the resulting cDNA was used as template for a 25 µl PCR containing 1 × PCR buffer, 0.2 mM dNTPs, 0.2 µM of each primer, 1.5 mM MgCl2 and 1 U of Platinum Taq DNA Polymerase (Invitrogen). The amplification regime included 35 cycles of 30 seconds at 94°C, 50 seconds at 55°C and 1 to 3 minutes at 72°C, preceded by an initial denaturation step (3 minutes at 94°C) and followed by a final extension step (10 minutes at 72°C). All relevant primer sequences are given in Additional file 6: PCR primer sequences and targets.: References: Arkhipova IR: Transposable elements in the animal kingdom. Mol Biol. 2001, 35: 157-167. 10.1023/A:1010485915642.: Bennetzen JL: The contributions of retroelements to plant genome organization, function and evolution. Trends Microbiol. 1996, 4: 347-353. 10.1016/0966-842X(96)10042-1.: Deininger PL, Batzer MA: Mammalian retroelements. Genome Res. 2002, 12: 1455-1465. 10.1101/gr.282402.: Hirochika H, Hirochika R: Ty1-copia group retrotransposons as ubiquitous components of plant genomes. Jpn J Genet. 1993, 68: 35-46. 10.1266/jjg.68.35.: Kumekawa N, Ohtsubo E, Ohtsubo H: Identification and phylogenetic analysis of gypsy-type retrotransposons in the plant kingdom. Genes Genet Syst. 1999, 74: 299-307. 10.1266/ggs.74.299.: Suoniemi A, Tanskanen J, Schulman AH: Gypsy-like retrotransposons are widespread in the plant kingdom. Plant J. 1998, 13: 699-705. 10.1046/j.1365-313X.1998.00071.x.: Wöstemeyer J, Kreibich A: Repetitive DNA elements in fungi (Mycota): impact on genomic architecture and evolution. Curr Genet. 2002, 41: 189-198.: Feschotte C, Jiang N, Wessler SR: Plant transposable elements: where genetics meets genomics. Nat Rev Genet. 2002, 3: 329-341. 
10.1038/nrg793.: Vitte C, Panaud O: LTR retrotransposons and flowering plant genome size: emergence of the increase/decrease model. Cytogenet Genome Res. 2005, 110: 91-107. 10.1159/000084941.: Balint-Kurti PJ, Clendennen SK, Doleželová M, Valárik M, Doležel J, Beetham PR, May GD: Identification and chromosomal localization of the monkey retrotransposon in Musa sp. Mol Gen Genet. 2000, 263: 908-915. 10.1007/s004380000265.: Kejnovský E, Kubát Z, Macas J, Hobza R, Mrácek J, Vyskot B: Retand: a novel family of gypsy-like retrotransposons harboring an amplified tandem repeat. Mol Genet Genomics. 2006, 276: 254-263.: Jiang J: A molecular view of plant centromeres. Trends Plant Sci. 2003, 8: 570-575. 10.1016/j.tplants.2003.10.011.: Liu Z, Yue W, Li D, Wang R, Kong X, Lu K, Wang G, Dong Y, Jin W, Zhang X: Structure and dynamics of retrotransposons at wheat centromeres and pericentromeres. Chromosoma. 2008, 117: 445-456. 10.1007/s00412-008-0161-9.: Bao W, Zhang W, Yang Q, Zhang Y, Han B, Gu M, Xue Y, Cheng Z: Diversity of centromeric repeats in two closely related wild rice species, Oryza officinalis and Oryza rhizomatis. Mol Genet Genomics. 2006, 275: 421-430. 10.1007/s00438-006-0103-2.: Cheng ZK, Dong FG, Langdon T, Shu OY, Buell CR, Gu MH, Blattner FR, Jiang JM: Functional rice centromeres are marked by a satellite repeat and a centromere-specific retrotransposon. Plant Cell. 2002, 14: 1691-1704. 10.1105/tpc.003079.: Kumekawa N, Ohmido N, Fukui K, Ohtsubo E, Ohtsubo H: A new gypsy-type retrotransposon, RIRE7: preferential insertion into the tandem repeat sequence TrsD in pericentromeric heterochromatin regions of rice chromosomes. Mol Genet Genomics. 2001, 265: 480-488. 10.1007/s004380000436.: Nagaki K, Neumann P, Zhang DF, Ouyang S, Buell CR, Cheng ZK, Jiang JM: Structure, divergence, and distribution of the CRR centromeric retrotransposon family in rice. Mol Biol Evol. 2005, 22: 845-855. 
10.1093/molbev/msi069.: Nagaki K, Song J, Stupar RM, Parokonny AS, Yuan Q, Ouyang S, Liu J, Hsiao J, Jones KM, Dawe RK, Buell CR, Jiang J: Molecular and cytological analyses of large tracks of centromeric DNA reveal the structure and evolutionary dynamics of maize centromeres. Genetics. 2003, 163: 759-770.: Zhong CX, Marshall JB, Topp C, Mroczek R, Kato A, Nagaki K, Birchler JA, Jiang JM, Dawe RK: Centromeric retroelements and satellites interact with maize kinetochore protein CENH3. Plant Cell. 2002, 14: 2825-2836. 10.1105/tpc.006106.: Nagaki K, Murata M: Characterization of CENH3 and centromere-associated DNA sequences in sugarcane. Chromosome Res. 2005, 13: 195-203. 10.1007/s10577-005-0847-2.: Francki MG: Identification of Bilby, a diverged centromeric Ty1-copia retrotransposon family from cereal rye (Secale cereale L.). Genome. 2001, 44: 266-274. 10.1139/gen-44-2-266.: Hudakova S, Michalek W, Presting GG, ten Hoopen R, dos Santos K, Jasencakova Z, Schubert I: Sequence organization of barley centromeres. Nucleic Acids Res. 2001, 29: 5029-5035. 10.1093/nar/29.24.5029.: Presting GG, Malysheva L, Fuchs J, Schubert I: A TY3/GYPSY retrotransposon-like sequence localizes to the centromeric regions of cereal chromosomes. Plant J. 1998, 16: 721-728. 10.1046/j.1365-313x.1998.00341.x.: Miller JT, Dong F, Jackson SA, Song J, Jiang J: Retrotransposon-related DNA sequences in the centromeres of grass chromosomes. Genetics. 1998, 150: 1615-1623.: Gindullis F, Desel C, Galasso I, Schmidt T: The large-scale organization of the centromeric region in Beta species. Genome Res. 2001, 11: 253-265. 10.1101/gr.162301.: Weber B, Schmidt T: Nested Ty3-gypsy retrotransposons of a single Beta procumbens centromere contain a putative chromodomain. Chromosome Res. 2009, 17: 379-396. 10.1007/s10577-009-9029-y.: Gorinsek B, Gubensek F, Kordis D: Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol. 2004, 21: 781-798. 
10.1093/molbev/msh057.: Sharma A, Presting GG: Centromeric retrotransposon lineages predate the maize/rice divergence and differ in abundance and activity. Mol Genet Genomics. 2008, 279: 133-147. 10.1007/s00438-007-0302-5.: Gorinsek B, Gubensek F, Kordis D: Phylogenomic analysis of chromoviruses. Cytogenet Genome Res. 2005, 110: 543-552. 10.1159/000084987.: Kordis D: A genomic perspective on the chromodomain-containing retrotransposons: chromoviruses. Gene. 2005, 347: 161-173. 10.1016/j.gene.2004.12.017.: Gao X, Hou Y, Ebina H, Levin HL, Voytas DF: Chromodomains direct integration of retrotransposons to heterochromatin. Genome Res. 2008, 18: 359-369. 10.1101/gr.7146408.: Ma J, Wing RA, Bennetzen JL, Jackson SA: Plant centromere organization: a dynamic structure with conserved functions. Trends Genet. 2007, 23: 134-139. 10.1016/j.tig.2007.01.004.: Wu J, Fujisawa M, Tian Z, Yamagata H, Kamiya K, Shibata M, Hosokawa S, Ito Y, Hamada M, Katagiri S, Kurita K, Yamamoto M, Kikuta A, Machita K, Karasawa W, Kanamori H, Namiki N, Mizuno H, Ma J, Sasaki T, Matsumoto T: Comparative analysis of complete orthologous centromeres from two subspecies of rice reveals rapid variation of centromere organization and structure. Plant J. 2009, 60: 805-819. 10.1111/j.1365-313X.2009.04002.x.: Topp CN, Zhong CX, Dawe RK: Centromere-encoded RNAs are integral components of the maize kinetochore. Proc Natl Acad Sci USA. 2004, 101: 15986-15991. 10.1073/pnas.0407154101.: Neumann P, Yan H, Jiang J: The centromeric retrotransposons of rice are transcribed and differentially processed by RNA interference. Genetics. 2007, 176: 749-761. 10.1534/genetics.107.071902.: Lloréns C, Fares MA, Moya A: Relationships of gag-pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis. BMC Evol Biol. 2008, 8: 276.: Malik HS, Eickbush TH: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol.
1999, 73: 5186-5190.: L'Homme Y, Séguin A, Tremblay FM: Different classes of retrotransposons in coniferous spruce species. Genome. 2000, 43: 1084-1089.: Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, French-Italian Public Consortium for Grapevine Genome Characterization: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-467. 10.1038/nature06148.: Sarri V, Minelli S, Panara F, Morgante M, Jurman I, Zuccolo A, Cionini PG: Characterization and chromosomal organization of satellite DNA sequences in Picea abies. Genome. 2008, 51: 705-713. 10.1139/G08-048.: Bennett MD, Leitch IJ: Angiosperm DNA C-values database (release 7.0, Dec. 2010). http://www.kew.org/cvalues/: Macas J, Neumann P, Navrátilová A: Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics. 2007, 8: 427. 10.1186/1471-2164-8-427.: Neumann P, Nouzová M, Macas J: Molecular and cytogenetic analysis of repetitive DNA in pea (Pisum sativum L.). Genome. 2001, 44: 716-728. 10.1139/gen-44-4-716.: Folco HD, Pidoux AL, Urano T, Allshire RC: Heterochromatin and RNAi are required to establish CENP-A chromatin at centromeres. Science. 2008, 319: 94-97. 10.1126/science.1150944.: Wong LH, Brettingham-Moore KH, Chan L, Quach JM, Anderson MA, Northrop EL, Hannan R, Saffery R, Shaw ML, Williams E, Choo KHA: Centromere RNA is a key component for the assembly of nucleoproteins at the nucleolus and centromere. Genome Res. 2007, 17: 1146-1160.
10.1101/gr.6022807.: Gustafson AM, Allen E, Givan S, Smith D, Carrington JC, Kasschau KD: ASRP: the Arabidopsis Small RNA Project Database. Nucleic Acids Res. 2005, 33: D637-D640. 10.1093/nar/gki127.: Szittya G, Moxon S, Santos DM, Jing R, Fevereiro MPS, Moulton V, Dalmay T: High-throughput sequencing of Medicago truncatula short RNAs identifies eight new miRNA families. BMC Genomics. 2008, 9: 593. 10.1186/1471-2164-9-593.: Barakat A, Wall PK, DiLoreto S, dePamphilis CW, Carlson JE: Conservation and divergence of microRNAs in Populus. BMC Genomics. 2007, 8: 481. 10.1186/1471-2164-8-481.: Čermák T, Kubát Z, Hobza R, Koblížková A, Widmer A, Macas J, Vyskot B, Kejnovský E: Survey of repetitive sequences in Silene latifolia with respect to their distribution on sex chromosomes. Chromosome Res. 2008, 16: 961-976.: Houben A, Schroeder-Reiter E, Nagaki K, Nasuda S, Wanner G, Murata M, Endo TR: CENH3 interacts with the centromeric retrotransposon cereba and GC-rich satellites and locates to centromeric substructures in barley. Chromosoma. 2007, 116: 275-283. 10.1007/s00412-007-0102-z.: Pereira V: Insertion bias and purifying selection of retrotransposons in the Arabidopsis thaliana genome. Genome Biol. 2004, 5: R79. 10.1186/gb-2004-5-10-r79.: Peterson-Burch BD, Nettleton D, Voytas DF: Genomic neighborhoods for Arabidopsis retrotransposons: a role for targeted integration in the distribution of the Metaviridae. Genome Biol. 2004, 5: R78. 10.1186/gb-2004-5-10-r78.: Hřibová E, Neumann P, Matsumoto T, Roux N, Macas J, Doležel J: Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing. BMC Plant Biol. 2010, 10: 204.: Neumann P, Požárková D, Koblížková A, Macas J: PIGY, a new plant envelope-class LTR retrotransposon. Mol Genet Genomics. 2005, 273: 43-53. 10.1007/s00438-004-1092-7.: Houben A, Demidov D, Gernand D, Meister A, Leach CR, Schubert I: Methylation of histone H3 in euchromatin of plant chromosomes depends on basic nuclear DNA content.
Plant J. 2003, 33: 967-973. 10.1046/j.1365-313X.2003.01681.x.: Novikova O, Mayorov V, Smyshlyaev G, Fursov M, Adkison L, Pisarenko O, Blinov A: Novel clades of chromodomain-containing Gypsy LTR retrotransposons from mosses (Bryophyta). Plant J. 2008, 56: 562-574. 10.1111/j.1365-313X.2008.03621.x.: Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, Tanahashi T, Sakakibara K, Fujita T, Oishi K, Shin-I T, Kuroki Y, Toyoda A, Suzuki Y, Hashimoto S, Yamaguchi K, Sugano S, Kohara Y, Fujiyama A, Anterola A, Aoki S, Ashton N, Barbazuk WB, Barker E, Bennetzen JL, Blankenship R: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008, 319: 64-69. 10.1126/science.1150646.: Friesen N, Brandes A, Heslop-Harrison JSP: Diversity, origin, and distribution of retrotransposons (gypsy and copia) in conifers. Mol Biol Evol. 2001, 18: 1176-1188.: Slotkin RK, Martienssen R: Transposable elements and the epigenetic regulation of the genome. Nat Rev Genet. 2007, 8: 272-285. 10.1038/nrg2072.: Hall AE, Keith KC, Hall SE, Copenhaver GP, Preuss D: The rapidly evolving field of plant centromeres. Curr Opin Plant Biol. 2004, 7: 108-114. 10.1016/j.pbi.2004.01.008.: Henikoff S, Ahmad K, Malik HS: The centromere paradox: stable inheritance with rapidly evolving DNA. Science. 2001, 293: 1098-1102. 10.1126/science.1062939.: Houben A, Schubert I: DNA and proteins of plant centromeres. Curr Opin Plant Biol. 2003, 6: 554-560. 10.1016/j.pbi.2003.09.007.: Nagaki K, Walling J, Hirsch C, Jiang J, Murata M: Structure and evolution of plant centromeres. Prog Mol Subcell Biol. 2009, 48: 153-179.: Dawe RK, Henikoff S: Centromeres put epigenetics in the driver's seat. Trends Biochem Sci. 2006, 31: 662-669.
10.1016/j.tibs.2006.10.004.: Carone DM, Longo MS, Ferreri GC, Hall L, Harris M, Shook N, Bulazel KV, Carone BR, Obergfell C, O'Neill MJ, O'Neill RJ: A new class of retroviral and satellite encoded small RNAs emanates from mammalian centromeres. Chromosoma. 2009, 118: 113-125. 10.1007/s00412-008-0181-5.: Maison C, Bailly D, Peters AHFM, Quivy JP, Roche D, Taddei A, Lachner M, Jenuwein T, Almouzni G: Higher-order structure in pericentric heterochromatin involves a distinct pattern of histone modification and an RNA component. Nat Genet. 2002, 30: 329-334. 10.1038/ng843.: Muchardt C, Guillemé M, Seeler JS, Trouche D, Dejean A, Yaniv M: Coordinated methyl and RNA binding is required for heterochromatin localization of mammalian HP1α. EMBO Rep. 2002, 3: 975-981. 10.1093/embo-reports/kvf194.: Kagansky A, Folco HD, Almeida R, Pidoux AL, Boukaba A, Simmer F, Urano T, Hamilton GL, Allshire RC: Synthetic heterochromatin bypasses RNAi and centromeric repeats to establish functional centromeres. Science. 2009, 324: 1716-1719. 10.1126/science.1172026.: Cam HP, Sugiyama T, Chen ES, Chen X, FitzGerald PC, Grewal SIS: Comprehensive analysis of heterochromatin- and RNAi-mediated epigenetic control of the fission yeast genome. Nat Genet. 2005, 37: 809-819. 10.1038/ng1602.: Deshpande G, Calhoun G, Schedl P: Drosophila argonaute-2 is required early in embryogenesis for the assembly of centric/centromeric heterochromatin, nuclear division, nuclear migration, and germ-cell formation. Genes Dev. 2005, 19: 1680-1685. 10.1101/gad.1316805.: Durand-Dubief M, Bastin P: TbAGO1, an Argonaute protein required for RNA interference, is involved in mitosis and chromosome segregation in Trypanosoma brucei. BMC Biol. 2003, 1: 2. 10.1186/1741-7007-1-2.: Fukagawa T, Nogami M, Yoshikawa M, Ikeno M, Okazaki T, Takami Y, Nakayama T, Oshimura M: Dicer is essential for formation of the heterochromatin structure in vertebrate cells. Nat Cell Biol. 2004, 6: 784-791.
10.1038/ncb1155.: Kanellopoulou C, Muljo SA, Kung AL, Ganesan S, Drapkin R, Jenuwein T, Livingston DM, Rajewsky K: Dicer-deficient mouse embryonic stem cells are defective in differentiation and centromeric silencing. Genes Dev. 2005, 19: 489-501. 10.1101/gad.1248505.: Pal-Bhadra M, Leibovitch BA, Gandhi SG, Rao M, Bhadra U, Birchler JA, Elgin SCR: Heterochromatic silencing and HP1 localization in Drosophila are dependent on the RNAi machinery. Science. 2004, 303: 669-672. 10.1126/science.1092653.: Pidoux AL, Allshire RC: The role of heterochromatin in centromere function. Philos Trans R Soc Lond B Biol Sci. 2005, 360: 569-579. 10.1098/rstb.2004.1611.: Provost P, Silverstein RA, Dishart D, Walfridsson J, Djupedal I, Kniola B, Wright A, Samuelsson B, Rådmark O, Ekwall K: Dicer is required for chromosome segregation and gene silencing in fission yeast cells. Proc Natl Acad Sci USA. 2002, 99: 16648-16653. 10.1073/pnas.212633199.: Volpe T, Schramke V, Hamilton GL, White SA, Teng G, Martienssen RA, Allshire RC: RNA interference is required for normal centromere function in fission yeast. Chromosome Res. 2003, 11: 137-146. 10.1023/A:1022815931524.: May BP, Lippman ZB, Fang Y, Spector DL, Martienssen RA: Differential regulation of strand-specific transcripts from Arabidopsis centromeric satellite repeats. PLoS Genet. 2005, 1: e79. 10.1371/journal.pgen.0010079.: Lloréns C, Futami R, Bezemer D, Moya A: The Gypsy Database (GyDB) of mobile genetic elements. Nucleic Acids Res. 2008, 36: 38-46.: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.: Altschul SF, Madden TL, Schaffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.: Xu Z, Wang H: LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007, 35: W265-W268.
10.1093/nar/gkm286.: Durbin R, Eddy SR, Krogh A, Mitchison GJ: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1998, Cambridge, UK: Cambridge University Press: Eddy SR: A probabilistic model of local sequence alignment that simplifies statistical significance estimation. PLoS Comput Biol. 2008, 4: e1000069. 10.1371/journal.pcbi.1000069.: Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol. 1994, 2: 28-36.: Bailey TL, Gribskov M: Combining evidence using p-values: application to sequence homology searches. Bioinformatics. 1998, 14: 48-54. 10.1093/bioinformatics/14.1.48.: Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.: Staden R: The Staden sequence analysis package. Mol Biotechnol. 1996, 5: 233-241. 10.1007/BF02900361.: Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.: Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.: Myers EW, Miller W: Optimal alignments in linear space. Comput Appl Biosci. 1988, 4: 11-17.: Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 2003, 31: 383-387.
10.1093/nar/gkg087.: Gouy M, Guindon S, Gascuel O: SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010, 27: 221-224. 10.1093/molbev/msp259.: Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007, 23: 127-128. 10.1093/bioinformatics/btl529.: FigTree. http://tree.bio.ed.ac.uk/software/figtree/: Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.: Map Viewer. http://www.ncbi.nlm.nih.gov/mapview/: Yang H, McLeese J, Weisbart M, Dionne JL, Lemaire I, Aubin RA: Simplified high throughput protocol for northern hybridization. Nucleic Acids Res. 1993, 21: 3337-3338. 10.1093/nar/21.14.3337.: Huang X, Madan A: CAP3: a DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.: Lodhi MA, Reisch BI: Nuclear DNA content of Vitis species, cultivars, and other genera of the Vitaceae. Theor Appl Genet. 1995, 90: 11-16. 10.1007/BF00220990.: Siljak-Yakovlev S, Cerbah M, Coulaud J, Stoian V, Brown SC, Zoldos V, Jelenic S, Papes D: Nuclear DNA content, base composition, heterochromatin and rDNA in Picea omorika and Picea abies. Theor Appl Genet. 2002, 104: 505-512. 10.1007/s001220100755.: Doleželová M, Valárik M, Swennen R, Horry JP, Doležel J: Physical mapping of the 18S-25S and 5S ribosomal RNA genes in diploid bananas. Biol Plantarum. 1998, 41: 497-505.: Uberall I, Vrána J, Bartoš J, Šmerda J, Doležel J, Havel L: Isolation of chromosomes from Picea abies and their analysis by flow cytometry. Biol Plantarum. 2004, 48: 199-203.: Leitch AR, Schwarzacher T, Jackson D, Leitch IJ: In situ Hybridization. 1994, Oxford, UK: BIOS Scientific: Chaw SM, Chang CC, Chen HL, Li WH: Dating the monocot-dicot divergence and the origin of core eudicots using whole chloroplast genomes. J Mol Evol. 2004, 58: 424-441.
10.1007/s00239-003-2564-9.: Acknowledgements: This research was financially supported by grants from the Academy of Sciences of the Czech Republic (KJB500960802 to PN and AVOZ50510513 to JM), the Ministry of Education, Youth and Sport of the Czech Republic (LC06004 to JM and JD) and the Czech Science Foundation (522/09/0083 to RH). We thank H. Štěpančíková and J. Látalová for their excellent technical assistance and Dr J. Weger (The Silva Tarouca Research Institute for Landscape and Ornamental Gardening, Průhonice, Czech Republic) for providing black cottonwood cuttings.: Author information: Corresponding author: Correspondence to Pavel Neumann.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: PN and JM designed the study. PN carried out bioinformatics analyses and participated in some experiments. JM, EK, JD and EH were involved in 454 sequencing. JM processed sequence data from 454 sequencing. AN carried out fluorescence in situ hybridization (FISH) experiments in pea, black cottonwood, barrel medic, grape and Norway spruce. EK and EH carried out FISH in white campion and banana, respectively. AK participated in cloning and sequencing experiments. AW constructed the bacterial artificial chromosome (BAC) cloning library of white campion. RH and EK screened the BAC library and sequenced the BAC clone BAC105E4. PN and JM drafted the manuscript. All authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: Origin and structural features of sequences used in this work. (A) Origin and (B) sequence and structural features of CRM clade chromoviruses. (C) Elements belonging to the Tekay, Reina and Galadriel clades. (XLS 112 KB): Additional file 2: CRM sequences used in this study. (FASTA 1 MB): Additional file 3: Alignment of reverse transcriptase domains.
(FASTA 24 KB): Additional file 4: Dot plot comparison of full-length CRM elements. The elements are ordered according to group and plant family. Each family is represented by one element. (PDF 69 KB): 13100_2010_29_MOESM5_ESM.TIFF: Additional file 6: Polymerase chain reaction primer sequences and targets. (XLSX 12 KB): Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: Cite this article: Neumann, P., Navrátilová, A., Koblížková, A. et al. Plant centromeric retrotransposons: a structural and cytogenetic perspective. Mobile DNA 2, 4 (2011). https://doi.org/10.1186/1759-8753-2-4: Received: 08 October 2010: Accepted: 03 March 2011: Published: 03 March 2011: DOI: https://doi.org/10.1186/1759-8753-2-4: Collection: Mobile DNA All Reviews: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Effect of attC structure on cassette excision by integron integrases" "André Larouche, Paul H Roy" "Paul H Roy" "18 February 2011" "Integrons are genetic elements able to integrate and disseminate genes as cassettes by a site-specific recombination mechanism. These elements contain a gene coding for an integrase that carries out recombination by interacting with two different target sites: the attI site in cis with the integrase and the palindromic attC site of a gene cassette. Integron integrases (IntIs) bind specifically to the bottom strand of attC sites. The extrahelical bases resulting from folding of attC bottom strands are important for recognition by integrases. These enzymes are directly involved in the accumulation and formation of new cassette arrangements in the variable region of integrons. Thus, it is important to better understand interactions between IntIs and their substrates., We compared the ability of five IntIs to carry out excision of several cassettes flanked by different attC sites. The results showed that for most cassettes, IntI1 was the most active integrase. However, IntI2*179E and SonIntIA could easily excise cassettes containing the attCdfrA1
site located upstream, whereas IntI1 and IntI3 had only weak excision activity for these cassettes. Analysis of the secondary structure adopted by the bottom strand of attCdfrA1 showed that the identity of the extrahelical bases and the distance between them (A-N7-8-C) differ from those of the attCs contained in the cassettes most easily excised by IntI1 (T-N6-G). We used the attCdfrA1 site upstream of the sat2 gene cassette as a template and varied the identity and spacing of the extrahelical bases in order to determine how these modifications influence the ability of IntI1, IntI2*179E, IntI3 and SonIntIA to excise cassettes. Our results show that IntI1 is more efficient in cassette excision using T-N6-G or T-N6-C attCs, while IntI3 recognizes only a limited range of attCs. IntI2*179E and SonIntIA are more tolerant of changes to the identity and spacing of extrahelical bases., This study provides new insights into the factors that influence the efficiency of cassette excision by integron integrases. It also suggests that, in their ability to recognize and excise cassettes, IntI2 and SonIntIA have followed an evolutionary path different from that of IntI1 and IntI3." "Bottom Strand, Resistance Gene Cassette, Tyrosine Recombinases, attC Site, Excision Activity" " Effect of attC structure on cassette excision by integron integrases: André Larouche & Paul H Roy: Mobile DNA volume 2, Article number: 3 (2011): Abstract: Background: Integrons are genetic elements able to integrate and disseminate genes as cassettes by a site-specific recombination mechanism. These elements contain a gene coding for an integrase that carries out recombination by interacting with two different target sites: the attI site in cis with the integrase and the palindromic attC site of a gene cassette. Integron integrases (IntIs) bind specifically to the bottom strand of attC sites. The extrahelical bases resulting from folding of attC bottom strands are important for recognition by integrases. These enzymes are directly involved in the accumulation and formation of new cassette arrangements in the variable region of integrons.
Thus, it is important to better understand interactions between IntIs and their substrates.: Results: We compared the ability of five IntIs to carry out excision of several cassettes flanked by different attC sites. The results showed that for most cassettes, IntI1 was the most active integrase. However, IntI2*179E and SonIntIA could easily excise cassettes containing the attCdfrA1 site located upstream, whereas IntI1 and IntI3 had only weak excision activity for these cassettes. Analysis of the secondary structure adopted by the bottom strand of attCdfrA1 showed that the identity of the extrahelical bases and the distance between them (A-N7-8-C) differ from those of the attCs contained in the cassettes most easily excised by IntI1 (T-N6-G). We used the attCdfrA1 site upstream of the sat2 gene cassette as a template and varied the identity and spacing of the extrahelical bases in order to determine how these modifications influence the ability of IntI1, IntI2*179E, IntI3 and SonIntIA to excise cassettes. Our results show that IntI1 is more efficient in cassette excision using T-N6-G or T-N6-C attCs, while IntI3 recognizes only a limited range of attCs. IntI2*179E and SonIntIA are more tolerant of changes to the identity and spacing of extrahelical bases.: Conclusions: This study provides new insights into the factors that influence the efficiency of cassette excision by integron integrases. It also suggests that, in their ability to recognize and excise cassettes, IntI2 and SonIntIA have followed an evolutionary path different from that of IntI1 and IntI3.: Background: In recent years, Gram-negative pathogens such as Pseudomonas aeruginosa, Acinetobacter baumannii, and Klebsiella pneumoniae have become increasingly resistant to antibiotics.
The widespread dissemination of bacterial resistance genes is mediated by horizontal transfer, and many of these genes are integrated and expressed as operons in DNA elements called integrons.: Integrons are genetic elements that can integrate and disseminate genes as cassettes by a site-specific recombination mechanism [1]. They contain an integrase gene (intI), a recombination site (attI), and a promoter region (Pc) that directs the expression of captured genes (Figure 1) [2]. Cassettes located within the variable region of integrons all share certain characteristics. First, the integrated cassettes are composed of a gene and an imperfect inverted repeat, called an attC site, located downstream of the gene (Figure 1) [3–6]. Second, the boundaries of each integrated cassette are defined by two GTTRRRY sequences that are targets for recombination events mediated by integron integrases (IntIs).: General structure of class 1 integrons. Cassettes are inserted in the variable region of integrons by a site-specific recombination mechanism. The attI1 and attC sites are shown by a vertical rectangle and oval, respectively, and promoters are denoted by Pint, Pc and P. Integrated cassettes are composed of a gene and an attC recombination site. Genes are as follows: intI1, integrase gene; qacEΔ1, antiseptic resistance gene; sul1, sulphonamide resistance gene; orf5, gene of unknown function.: Studies of the site-specific recombination mechanism mediated by IntIs have demonstrated that IntIs form a separate subfamily within the larger family of tyrosine recombinases, characterized by the presence of an additional domain required for their activity [7, 8]. IntIs can share as little as 35% sequence identity, indicating a long evolutionary history for these enzymes.
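The GTTRRRY boundary sequence above is written in IUPAC degenerate notation (R = A or G, Y = C or T). As an illustration only, the short Python sketch below turns such a consensus into a regular expression, scans a hypothetical toy sequence for it, and shows that the reverse complement of GTTRRRY is RYYYAAC, the inverse core site of attC. The function names and the toy sequence are assumptions made for this example; this is not the authors' analysis method.

```python
import re

# IUPAC nucleotide codes needed for the core-site consensus (R = purine, Y = pyrimidine).
IUPAC_TO_CLASS = {
    'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
    'R': '[AG]',
    'Y': '[CT]',
    'N': '[ACGT]',
}

# Complement of each code; a degenerate purine (R) pairs with a degenerate pyrimidine (Y).
COMPLEMENT = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G', 'R': 'Y', 'Y': 'R', 'N': 'N'}


def iupac_to_regex(consensus):
    # Translate an IUPAC consensus such as GTTRRRY into a regular expression.
    return ''.join(IUPAC_TO_CLASS[code] for code in consensus)


def reverse_complement(consensus):
    # Reverse the consensus and complement each (possibly degenerate) code.
    return ''.join(COMPLEMENT[code] for code in reversed(consensus))


def find_sites(sequence, consensus):
    # Zero-width lookahead so that overlapping matches are all reported.
    pattern = re.compile('(?=' + iupac_to_regex(consensus) + ')')
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == '__main__':
    print(reverse_complement('GTTRRRY'))          # RYYYAAC
    print(find_sites('aagttagacaa', 'GTTRRRY'))   # [2] (hypothetical toy sequence)
```

A consensus match only flags a candidate core site on one strand; a real search would scan both strands, and whether a site is actually used also depends on the attI or attC context recognized by the IntIs.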
The IntI catalytic domain is similar to that of other members of the tyrosine recombinase family and contains the conserved residues Arg146, Lys171, His277, Arg280 and His/Trp303, together with the nucleophilic tyrosine Tyr312 (numbering is that of IntI1).: Unlike other members of the family, the IntI recombinases can exchange DNA using two sites with different structures, the non-palindromic attI and the palindromic attC. Integration of cassettes occurs preferentially by recombination of the attC site in a closed-circular cassette with the attI site of an integron [3], while excision of a cassette, generating a circular form, occurs preferentially by recombination between two attC sites, one of them associated with the upstream cassette [4].: The attI and attC sequences are complex attachment sites that include the crossover site and additional binding sites (Figure 2), suggesting that integrase monomers bound at these additional sites act as accessory factors [9–11]. attI sites are located at the end of the 5' conserved region of integrons and their sequences vary considerably. Unlike the attI sites, attC sites share a common set of characteristics that enable them to be identified despite the diversity of their sequence and size [6, 12]. They are characterized by a palindrome of variable length and sequence between the RYYYAAC inverse core site and the GTTRRRY core site [12]. The size of these recombination sites (57 to 141 bp) is currently the main criterion for classification of attCs [13, 14]. They consist of two pairs of binding sites in opposite orientation (1L-2L and 2R-1R), each pair forming a simple site (LH and RH), separated by a segment of variable length and sequence that includes an inverted repeat (Figure 2) [12]. These features are generally well recognized by IntI enzymes, since many attC sites can act as recombination sites for IntIs sharing less than 50% amino acid sequence identity [6, 12, 15–20].: Integron recombination sites.
(a) Sequence of the double-stranded (ds) attI1 site. (b) Sequence of the ds attCant(3'')-Ia site. (c) Secondary structure of the folded bottom strand of the attCant(3'')-Ia site, according to MFOLD (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi). The inverted repeats L, 1L, 2L, R, 1R and 2R are shown by horizontal black arrows. The attI1 direct repeats bound by IntI1 are indicated by horizontal lines with an empty arrowhead. The crossover positions are indicated by vertical arrows and the extrahelical bases are identified by asterisks.: As members of the tyrosine recombinase family, IntIs use a type I topoisomerase-like cleavage mechanism [8, 21, 22]. Four integrase monomers are involved in the site-specific recombination reaction, in which the exchange of one DNA strand leads to the formation of a Holliday junction [23, 24]. For most tyrosine recombinases, this intermediate is resolved by the exchange of the second strand [22, 25]. IntIs differ from other tyrosine recombinases in their use of a folded single-stranded attC site [26, 27] and in the exchange of only one strand, with the intermediate possibly resolved by DNA replication (Figure 3) [28, 29].: Model of cassette integration mediated by integron integrases. The site-specific recombination reaction is carried out between the folded bottom strand of an attC site and the attI site of an integron. (a) Two integrase monomers bind each DNA molecule. (b) The attacking monomer on each DNA molecule cleaves one of the two strands. (c) Strand exchange of the cleaved strands. (d) Ligation of the exchanged strands forms a Holliday junction. (e) This intermediate structure could be resolved by DNA replication, generating three products, one of which contains the inserted cassette. The attacking monomers are shown by gray circles and the non-attacking monomers are shown by white circles. Y: catalytic tyrosine residues.
Adapted from [28].: IntI recombinases bind specifically to the bottom strand (bs) of attC [26], and the extrahelical bases resulting from folding of the attC bs are important for recognition by IntIs [27, 29, 30]. The three-dimensional structure of VchIntIA bound to the bottom strand of the Vibrio cholerae repeat (VCRbs) showed that the β-4,5 strands of the non-attacking subunits interact with the extrahelical base T12'' (the first extrahelical base of the folded attC bs), while the α-I2 helix of the attacking subunits forms several important contacts in trans with DNA in the region of the extrahelical base G20'' (the second extrahelical base of the folded attC bs; Figure 4A and 4B) [29].: cis and trans extrahelical base interactions. (a) cis interactions of the VchIntIA non-attacking subunit made by P232, H240 and H241 (in blue) with the extrahelical base T12''. (b) trans interactions of the VchIntIA attacking subunit made by Q145, W157, K209, Y210 and W219 (in magenta) with the extrahelical base G20''. The non-attacking subunit is in yellow, the attacking subunit in green, DNA in orange, and the extrahelical bases T12'' and G20'' are in red. Based on the structure of the VchIntIA-VCRbs complex (PDB: 2A3V) [29].: To date, more than 100 IntIs have been reported in the literature and databases, and it is estimated that about 10% of partially or completely sequenced bacterial genomes carry genes coding for these enzymes [31]. Activity for cassette excision and integration has been demonstrated for IntI1 [5, 16, 32], IntI2*179E [18], IntI3 [15], SonIntIA [17], NeuIntIA [19] and VchIntIA [32]. Moreover, it has been shown that IntIs can excise cassettes containing a variety of attC sites [17, 19, 33]. However, it is not well understood why these enzymes can easily recognize and excise some cassettes, while others are poorly excised or not excised at all.: We compared the ability of several IntIs to excise cassettes flanked by different attC sites.
Preliminary results of these excision tests, combined with molecular modelling based on the structure of the VchIntIA integrase, lead us to suggest that IntIs prefer certain attC sites to others and that these preferences could be related to recognition of the extrahelical bases. In this study, we used the attCdfrA1 site upstream of the sat2 cassette as a template, altering the identity of and spacing between its extrahelical bases to determine how these modifications influence the efficiency of cassette excision by IntI1, IntI2*179E, IntI3 and SonIntIA.: Results: Comparative excision activities of IntI1, IntI3, IntI2*179E, SonIntIA and VchIntIA on cassettes containing different attC sites: To determine why some cassettes are excised by several IntIs while others are poorly (or not) excised, we compared the efficiency of five IntIs in excising cassettes flanked by different attI and attC sites. Nineteen clones (pLQ423 to pLQ431 and pLQ437 to pLQ446; Table 1) containing various resistance gene cassettes cloned into pACYC184 were used to compare the recombination activity of IntI1, IntI3, IntI2*179E, SonIntIA and VchIntIA in qualitative excision tests (QL-ETs). The results showed a pronounced effect of the identity and spacing of the extrahelical bases of the attC sites on the efficiency of cassette excision. All integrases efficiently excised cassettes flanked by attC sites whose extrahelical bases are T and G separated by six nucleotides, with some exceptions for VchIntIA. IntI1 and IntI2*179E also efficiently excised cassettes with their homologous attI site upstream and this same attC site downstream. IntI1 was also able to recognize attI2, but IntI2*179E was unable to recognize attI1.
Notably, IntI2*179E and SonIntIA readily recognized and excised cassettes with the attCdfrA1 site located upstream of the cassette, whereas IntI1 and IntI3 showed only weak excision activity on the same cassettes and VchIntIA excised none of them. The folded attCdfrA1 bs has the extrahelical bases A71 or A72 (either of these adenines could pair with the thymine at position 23) and C80, separated by seven or eight nucleotides (Figure 5). The unusual specificity shown by IntI2*179E and SonIntIA led us to choose clone pLQ430, with attCdfrA1 upstream of the sat2 cassette (whose own T-N6-G attC site lies downstream), to test the effect of changes to the upstream attCdfrA1 site on the efficiency of cassette excision by the various integrases.: Secondary structure of the folded bottom strand of the attCant(3'')-Ia, attCsat2, attCaac(6')-Ia-orfG and attCdfrA1 sites. The extrahelical bases are identified by arrows.: Comparative excision activities of IntI1, IntI3, IntI2*179E and SonIntIA on cassettes with an upstream attCdfrA1 site or mutant attCdfrA1 sites: We then determined the effect of different attC structures on excision by IntI1, IntI3, IntI2*179E and SonIntIA. We made several mutants of the attCdfrA1 site upstream of the sat2 cassette in pLQ430, with altered extrahelical base identity and spacing, and tested them in quantitative excision tests (QN-ETs).: The first set of mutants was made using various substitutions to determine whether a cytosine or a guanine at position 80 of the attCdfrA1 bottom strand (bs) could alter recognition and excision by IntI1, IntI3, IntI2*179E and SonIntIA (Figure 6A). First, we compared the ability of the four IntIs to excise the cassette in clones pLQ430 (A-N7-8-C attCdfrA1 + sat2) and pAL4316 (A-N7-8-G attCdfrA1 + sat2).
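Labels such as T-N6-G or A-N7-8-C encode the identities of the two extrahelical bases and the number of nucleotides between them. As a minimal sketch, assuming the spacing is simply the second position minus the first minus one (which reproduces N7-8 for A71/A72 with C80 and N6 for a thymine at position 73 with C80), the notation can be derived as follows. `ExtrahelicalPair` and `label` are hypothetical helpers, not tools used in the study, and the sketch does not model which side of the stem each base lies on (the ‡ case).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtrahelicalPair:
    """One candidate pair of extrahelical bases on a folded attC bottom strand."""

    first_base: str   # identity of the first extrahelical base, e.g. "T"
    first_pos: int    # its position on the bottom strand
    second_base: str  # identity of the second extrahelical base, e.g. "G"
    second_pos: int   # its position on the bottom strand

    def spacing(self) -> int:
        """Number of nucleotides lying strictly between the two bases."""
        return self.second_pos - self.first_pos - 1


def label(candidates: list[ExtrahelicalPair]) -> str:
    """Build a label such as 'T-N6-G'; ambiguous spacings become e.g. 'A-N7-8-C'."""
    firsts = {c.first_base for c in candidates}
    seconds = {c.second_base for c in candidates}
    if len(firsts) != 1 or len(seconds) != 1:
        raise ValueError("candidate pairs must share the same base identities")
    spacings = sorted({c.spacing() for c in candidates})
    return f"{firsts.pop()}-N{'-'.join(map(str, spacings))}-{seconds.pop()}"


# Wild-type attCdfrA1: either A71 or A72 can be extrahelical, with C80.
print(label([ExtrahelicalPair("A", 71, "C", 80), ExtrahelicalPair("A", 72, "C", 80)]))  # A-N7-8-C
# Thymine at position 73 with C80 (the T-N6-C mutant).
print(label([ExtrahelicalPair("T", 73, "C", 80)]))  # T-N6-C
# ΔAT22-ΔAT72 site: extrahelical bases at positions 69 and 76.
print(label([ExtrahelicalPair("A", 69, "C", 76)]))  # A-N6-C
```

Passing both adenine candidates at once is what turns the single spacing into the "7-8" range used for the wild-type site.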
The results of our QN-ETs showed that excision of the sat2 cassette by IntI1 remained very weak when the cytosine at position 80 was changed to guanine, with the adenines at positions 71 and 72 of the attCdfrA1 site retained and the distance between the extrahelical bases kept at seven or eight nucleotides (Figure 7A). Recognition and excision of this mutant cassette by IntI3 was only slightly decreased. However, the C80G substitution of the attCdfrA1 site decreased excision by IntI2*179E from 51% to 21% and by SonIntIA from 25% to 12%.: Secondary structure of the folded bottom strand of several mutants of the attCdfrA1 site. (a) First set of mutants. (b) Second set of mutants. The extrahelical bases are identified by arrows. The ‡ indicates that the first extrahelical base is located on the opposite side of the folded bottom strand structure.: Excision percentage of the sat2 cassette by the IntI1, IntI3, IntI2*179E and SonIntIA integron integrases. For each integrase, the bars indicate the excision percentage determined in the in vivo quantitative excision tests. Excision percentages correspond to the average of two independent assays. We tested the excision percentage of the sat2 gene cassette, which codes for streptothricin resistance [pLQ430 and several mutants (A: first set of mutants; B: second set of mutants) created from clone pLQ430 carrying the attCdfrA1 + sat2 cassette]. The ‡ indicates that the first extrahelical base is located on the opposite side of the folded bottom strand structure. Error bars show standard error.: We also compared the excision activity of IntI1, IntI3, IntI2*179E and SonIntIA on clones pAL4318 (T-N6-C attCdfrA1 + sat2) and pAL4319 (T-N6-G attCdfrA1 + sat2). These mutants were made by an A22T substitution, and by A22T and C80G substitutions, respectively, in attCdfrA1.
When the first extrahelical base is a thymine (at position 73) and the distance between the extrahelical bases is six nucleotides, the C80G substitution slightly decreased excision by IntI1 and IntI3. The decrease was more pronounced with IntI2*179E and SonIntIA, from 71% to 51% and from 54% to 33%, respectively. Together, these results suggest that a cytosine as the second extrahelical base favors cassette excision by IntIs, in particular by IntI2*179E and SonIntIA. These QN-ETs therefore supported our hypothesis that IntIs differ in their preferences for the extrahelical bases.: We then tested several other mutants of the attCdfrA1 site, designed to eliminate ambiguity in the spacing of the extrahelical bases. The second set of mutants was based on double deletions of bases A22 and T23 and of A72 and T73 from the wild-type bs of the attCdfrA1 site. This fixed the spacing at six bases and made these attCs more comparable to T-N6-G attCs (Figure 6B). Several additional substitutions were tested to determine whether an adenine or a thymine at position 69, or a cytosine or a guanine at position 76, of the attCdfrA1 ΔAT22-ΔAT72 site (corresponding to positions 72 and 80 of the wild-type site) could alter its recognition by IntI1, IntI3, IntI2*179E and SonIntIA (Figure 6B).: The results of our QN-ETs showed that reducing the distance between the extrahelical bases of the attCdfrA1 site (A-N6-C instead of A-N7-8-C) led to a significant increase in excision by IntI1, IntI2*179E and SonIntIA, while the activity of IntI3 was unchanged (Figure 7B). This preference for a shorter distance between the extrahelical bases was particularly marked for the IntI1 integrase.
The excision activity of IntI1 increased from 3% to 33.5%, that of IntI2*179E from 51% to 79% and that of SonIntIA from 26% to 45% when the distance between the extrahelical bases was six nucleotides rather than seven or eight.: Mutations of the extrahelical bases to those of the consensus of the attCs most readily excised by IntI1 (A69T and C76G) were tested individually and in combination on the attCdfrA1 ΔAT22-ΔAT72 site (clone pAL4305: A-N6-C). With a cytosine at position 76, the A69T substitution increased excision of the sat2 cassette by IntI1 from 33.5% to 41.5% and by IntI3 from 17% to 38.5%, whereas excision by IntI2*179E and SonIntIA was unchanged. With an adenine at position 69, the C76G substitution significantly decreased excision of this cassette by IntI1 (33.5% to 14%), IntI3 (17% to 4%), IntI2*179E (79% to 45.5%) and SonIntIA (45% to 30%). The combination of the A69T and C76G substitutions (clone pAL4311) increased excision of the sat2 cassette by IntI1 (33.5% to 43%) and IntI3 (17% to 24%), while it decreased excision by IntI2*179E from 79% to 58% and by SonIntIA from 45% to 35%.: Comparison of the excision activity of the four IntIs on clones pAL4310 (A-N6-G attCdfrA1 + sat2) and pAL4311 (T-N6-G attCdfrA1 + sat2) showed that the A69T substitution increased the efficiency of IntI1, IntI3, IntI2*179E and SonIntIA in excising the sat2 cassette when the second extrahelical base is a guanine and the distance between the extrahelical bases is six nucleotides. Comparison of excision on clones pAL4308 (T-N6-C attCdfrA1 + sat2) and pAL4311 (T-N6-G attCdfrA1 + sat2) showed that the activity of IntI1 was unchanged by the C76G substitution when the first extrahelical base is a thymine and the distance between the extrahelical bases is six nucleotides.
However, with a thymine at position 69 of the attCdfrA1 site, the C76G substitution significantly decreased excision by IntI3 (38.5% to 24%), IntI2*179E (83% to 58%) and SonIntIA (44% to 35%).: The VchIntIA-VCRbs three-dimensional structure shows that IntIs interact closely with the extrahelical bases T12'' and G20'' [29]. We therefore tested three other mutant cassettes to determine whether varying the distance between the extrahelical bases could alter the interaction between IntIs and their substrates. QN-ETs using clones pAL4307 (A-N5-C attCdfrA1 + sat2) and pAL4309 (T-N5-C attCdfrA1 + sat2) showed that IntI1 can excise the sat2 cassette (15% excision) when the upstream attCdfrA1 site contains the extrahelical bases T and C separated by five nucleotides (pAL4309), but not when the extrahelical bases are A and C separated by the same distance (pAL4307). However, IntI3, IntI2*179E and SonIntIA were very inefficient at excising this cassette from either clone.: In integrons, folded bottom-strand attCs are characterized by two extrahelical bases located on the same side of the structure. To determine whether IntIs can recognize and excise cassettes whose attC site has its two extrahelical bases on opposite sides of the folded bs structure, we changed the position of the first extrahelical base and tested the excision activity of IntI1, IntI2*179E, IntI3 and SonIntIA. Johansson et al. [27] previously reported that binding of IntI1 to the attCaadA1 bs was decreased by deletion of the T32 extrahelical base combined with insertion of a T or an A between positions 16 and 17, which generates a bulge on the opposite side of a potential stem-loop. They showed that an adenine between positions 16 and 17 weakly affects binding by IntI1, while a thymine significantly decreases it [27].
However, they did not test the excision activity of IntI1 on a cassette containing this mutant attC site. Our QN-ETs using clone pAL4304 (A‡-N6-C attCdfrA1 + sat2; the ‡ indicates that the first extrahelical base is located on the opposite side of the folded bs structure, so that the two extrahelical bases lie on opposite sides) showed that IntI1 was unable to excise the sat2 cassette when the altered A‡-N6-C attCdfrA1 site is located upstream, while IntI3 showed an excision activity of only 3%. However, IntI2*179E showed an excision activity of 24% on this very unusual attC site, and SonIntIA an excision level of 9%.: Discussion: The dissemination of antibiotic resistance by mobilization of resistance gene cassettes is a key factor affecting the clinical usefulness of antibiotics. The recruitment of these cassettes by integrons is carried out by IntIs, and the recombination efficiency of these enzymes varies greatly. In this work, we studied some of the parameters that affect the specificity of these recombinases for attC sites.: Influence of extrahelical base identity, spacing and position on specificity of IntIs: The data presented in this study showed that IntI1 efficiently excised cassettes with T-N6-G or T-N6-C attC sites, while IntI3 recognized a limited range of attCs and recombined mainly cassettes with T-N6-C attC sites (Figure 7). For their part, IntI2*179E and SonIntIA tolerated changes to the identity of the extrahelical bases, efficiently excising cassettes with attCs carrying most of the extrahelical base combinations tested (A-N6-C, A-N6-G, T-N6-C and T-N6-G). In their IntI binding study, Johansson and colleagues [27] showed that substituting the first extrahelical base (T) with a cytosine or an adenine, or substituting the second extrahelical base (G) with any alternative base, does not affect binding of IntI1 to the attCant(3'')-Ia site.
However, they did not test the excision efficiency of IntI1 on these mutant cassettes. Taken together, these results suggest that the identities of the extrahelical bases do not affect the binding of IntIs but do influence the recombination reaction. They confirm our hypothesis that the attC preferences of IntIs are related to the recognition of the extrahelical bases.: Surprisingly, our results showed that a cytosine rather than a guanine at the second extrahelical base position increased cassette excision by IntI3, IntI2*179E and SonIntIA, whether the first extrahelical base is a thymine or an adenine. The excision activity of IntI1 was increased by a cytosine at the second extrahelical position only when the first extrahelical base is an adenine. These results were unexpected, since most attC sites have a guanine at the second extrahelical position. They suggest that cassettes with increased mobility could emerge. The only example of a cassette whose attC site has a cytosine as the second extrahelical base is blaVIM-2, but its mobility remains to be evaluated. Results recently obtained by Bouvier et al. [34] show that attC x attC recombination (attCaadA7 x VCR) carried out by IntI1 is slightly increased by a guanine (G) to cytosine (C) substitution of the second extrahelical base of the VCR when the first extrahelical base is a thymine. In their study, mutations were made on the downstream attC partner, whereas in this study they were made on the upstream attC partner. Our QL-ETs and the results obtained by Bouvier et al. [34] suggest that changes to the downstream attC partner may have a greater impact on the ability of IntIs to excise cassettes.: Since the VchIntIA-VCRbs three-dimensional structure shows that IntIs interact closely with the extrahelical bases T12'' and G20'' [29], we tested the effect of the distance between these bases on this interaction.
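To put the size of this spacing effect in numbers, the excision percentages reported in the Results for the wild-type attCdfrA1 site (A-N7-8-C, pLQ430) and the ΔAT22-ΔAT72 mutant (A-N6-C, pAL4305) can be expressed as fold changes. This short Python sketch only does that arithmetic on the published values quoted for this comparison; `fold_change` is a hypothetical helper, and IntI3 is omitted because the text gives only its mutant value (17%).

```python
# Excision percentages quoted in the Results for the upstream attCdfrA1 site:
# wild type (A-N7-8-C, pLQ430) versus the ΔAT22-ΔAT72 mutant (A-N6-C, pAL4305).
reported = {
    "IntI1": (3.0, 33.5),
    "IntI2*179E": (51.0, 79.0),
    "SonIntIA": (26.0, 45.0),
}


def fold_change(before: float, after: float) -> float:
    """Ratio of the excision percentage after the change to the one before it."""
    if before <= 0:
        raise ValueError("baseline excision percentage must be positive")
    return after / before


for integrase, (n7_8, n6) in reported.items():
    print(f"{integrase}: {n7_8}% -> {n6}% ({fold_change(n7_8, n6):.1f}-fold)")
# IntI1: 3.0% -> 33.5% (11.2-fold)
# IntI2*179E: 51.0% -> 79.0% (1.5-fold)
# SonIntIA: 26.0% -> 45.0% (1.7-fold)
```

The roughly 11-fold gain for IntI1, against less than 2-fold for IntI2*179E and SonIntIA, is one way to read "particularly marked" and "more tolerant" quantitatively.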
Our data showed that IntI1, IntI3, IntI2*179E and SonIntIA excised cassettes most efficiently when the spacing between the extrahelical bases of the attC was six nucleotides. They also showed that IntI2*179E and SonIntIA were more tolerant than IntI1 and IntI3 of changes in the spacing between the extrahelical bases. Johansson et al. [27] showed that increasing the distance between the two extrahelical bases (from six to eight or ten nucleotides) does not affect binding of IntI1 to the attCant(3'')-Ia site, but they did not test the recombination activity of IntI1 on these mutant attC sites. It therefore appears that the distance between the extrahelical bases is important for the excision reaction but not for binding of the attC bs by IntIs.: As mentioned above, Johansson et al. [27] reported that an adenine inserted between positions 16 and 17 on the opposite side of a potential stem-loop, combined with deletion of the T32 extrahelical base, decreased binding of IntI1 to the attCaadA1 bs. In our study, we observed that IntI2*179E and SonIntIA, despite reduced excision activity, can excise the sat2 cassette when the altered A‡-N6-C attCdfrA1 site, which has one extrahelical base at position 22 and another at position 78, is located upstream. IntI1 and IntI3 have no apparent activity on this very atypical site. The influence of the position of the first extrahelical base (corresponding to T32) was also tested by Bouvier et al. [34], who showed that relocating this base to the corresponding position on the opposite strand decreases VCRbs excision by IntI1. Together, these results show that IntI2*179E and SonIntIA, in addition to being more tolerant of changes in the identity of and spacing between the extrahelical bases, are more tolerant than IntI1 and IntI3 of changes in the position of the first extrahelical base.
They also suggest that relocating the first extrahelical base as a thymine decreases binding by IntIs and probably affects excision activity, whereas relocating it as an adenine does not affect binding but does decrease excision activity.: QN-ETs using the attCdfrA1 site: IntI1 versus IntI2*179E: The results of our QN-ETs using IntI1 and IntI2*179E with the T-N6-G attCdfrA1 site raised an important issue. It is not clear why the excision percentage observed with IntI1 on cassettes containing the T-N6-G attCdfrA1 site is not higher than that observed with IntI2*179E on the same substrates, given that our QL-ETs showed IntI1 to be generally more effective than IntI2*179E in excising cassettes containing T-N6-G attC sites (for example, aac(6')-Ia-orfG and ant(3'')-Ia). These differences may be explained by the different nucleotides near the extrahelical bases in the attCdfrA1 mutants used for our QN-ETs and in the attC sites used in our QL-ETs, since the identity of the bases located near the extrahelical bases has been shown to influence the binding of IntI1 [27].: Structural elements of IntI1, IntI2*179E, IntI3, SonIntIA and VchIntIA involved in attC recognition: IntIs bind specifically to the attC bs [26], and the extrahelical bases resulting from its folding are important for recognition by these enzymes [27, 29]. The three-dimensional structure of the VchIntIA-VCRbs complex reveals that the extrahelical base T12'' is stabilized by cis interactions with the β-4,5 strands of the non-attacking subunits, becoming inserted between two stacked histidines (H240 and H241 in VchIntIA (Figure 4A); H250 and H251 in IntI1) and a highly conserved proline [P232 in VchIntIA (Figure 4A); P242 in IntI1] [29].
The attacking subunits make important DNA contacts in trans with the extrahelical base G20'' through residues Q145, W157, K209, Y210 and W219 in VchIntIA (Figure 4B), which correspond to K156, R168, K219, Y220 and W229 in IntI1 [29]. The protein-DNA interactions are otherwise essentially nonspecific [29].: We compared the region between the αI2 helix and the β-4,5 strands of IntI1, IntI3, IntI2*179E, SonIntIA and VchIntIA and observed that many residues are conserved among these enzymes (Figure 8), reflecting the importance of this region in attC recognition by IntIs. However, we identified some differences between the IntI2*179E and SonIntIA sequences and those of IntI1 and IntI3 that could be responsible for the greater versatility of IntI2*179E and SonIntIA in excising cassettes containing non-T-N6-G attCs. One interesting difference is the presence of two cysteine residues in the β-4 and β-5 strands of IntI2*179E and SonIntIA; the same positions are occupied by a serine and an arginine in IntI1 and IntI3. We previously found that the cysteine residue in the β-5 strand is essential for the excision activity of Shewanella-type integrases, while the cysteine in the β-4 strand is less important [33]. Mutagenesis of the two cysteines in SonIntIA suggests that there is no disulfide bridge between the β-4 and β-5 strands of these integrases [33]. However, we do not know whether these cysteine residues play a role in the ability of IntI2*179E and SonIntIA to tolerate changes to extrahelical base identity and spacing. Other differences (indicated by arrows in Figure 8), located at various positions between the αI2 helix and the β-4,5 strands, could also contribute to the greater versatility of IntI2*179E and SonIntIA in cassette excision.: Sequence alignment of the αI2 helix and β-4,5 strand region of some integron integrases (IntIs).
IntI2*, class 2 IntI from Tn7; SonIntIA, IntI from Shewanella oneidensis MR-1; VchIntIA, IntI from the Vibrio cholerae chromosomal integron; IntI1, class 1 IntI from plasmid pVS1; IntI3, class 3 IntI from a Serratia marcescens plasmid. Residues potentially related to the difference in specificity of IntI2 and SonIntIA versus IntI1 and IntI3 are indicated by arrows. Positions of the αI2 helix and the β-4,5 strands in this figure are based on the VchIntIA structure (Protein Data Bank accession no. 2A3V) [29].: Some residues and motifs located outside the αI2 helix and the β-4,5 strands could also be related to the different preferences of IntIs for extrahelical base identity and spacing. For example, Demarre et al. [35] found IntI1 mutations conferring higher activity on wild-type and mutant attC sites in the loop between the β-1 and β-2 strands. Interestingly, the β-2 strand contains R168 (W157 in VchIntIA), one of the residues that interact with the extrahelical base G20'' [29]. Also, Johansson et al. [30] showed that substituting the tryptophan at position 199 of IntI1 with alanine, a small, aliphatic, uncharged residue, decreases DNA binding. Interestingly, when the tryptophan was replaced by another aromatic residue (W199Y), the enzyme regained its affinity for the attC bs [30]. These results suggest that an aromatic or bulky residue at this position is important. The authors proposed that the decreased binding of the IntI1 W199A mutant could be explained by structural changes in the αI2 helix and β-4,5 strand region [30]. Thus, in addition to residues that interact directly with the extrahelical bases, we must also consider residues that are important for protein structure when trying to identify the factors that affect the ability of IntIs to excise cassettes.
The alignment of the IntIs used in our study shows that position 199 is occupied by a tryptophan in IntI1 and IntI3 but by a glutamine in IntI2*179E and SonIntIA. This difference may help explain the different abilities of these IntIs to excise cassettes.: In summary, IntI2 and SonIntIA seem to have followed an evolutionary path different from that of IntI1 and IntI3 with respect to their ability to recognize and excise cassettes. IntI2*179E and SonIntIA, although generally less efficient in cassette excision, tolerate a wider variety of extrahelical base configurations in attC sites. We believe that the ability of IntI2*179E and SonIntIA to excise cassettes whose attCs show a broader range of extrahelical base identity and spacing could be related to greater flexibility of their αI2 helix and β-4,5 strand domains.: Acquisition of new cassettes: Analysis of the variable region of class 1 integrons has shown that these multiresistance integrons contain a large number of different cassettes, and that those containing T-N6-G attCs are found at various positions within the variable region. This can be explained by our finding that these cassettes are readily recognized and excised by IntI1. The IntI1 integrase is efficient in cassette integration and, since cassettes are preferentially integrated by attI x attC recombination [3], cassette order tends to reflect the order in which antibiotics were introduced, with the most recently acquired cassettes closest to the promoter.: Class 2 integrons, carrying the IntI2* or IntI2 integrases, and class 3 integrons, carrying the IntI3 integrase, have only a limited range of cassettes. The cassette arrangements associated with class 2 integrons include dfrA1-sat2-aadA1-orfX, estX-sat2-aadA1-orfX and sat2-aadA1-orfX [18]. The position of cassettes within this class of multiresistance integrons is conserved, since most class 2 integrons carry the inactive IntI2* integrase.
Recently, class 2 integrons with active integrases have been found [36, 37], but they still have a relatively limited number of cassette arrangements. The dfrA1 and estX cassettes contain non-T-N6-G attCs, while the sat2, aadA1 and orfX cassettes contain T-N6-G attCs. Two arrangements of resistance gene cassettes have been found associated with class 3 integrons: blaIMP-1-aac(6')-Ib and blaGES-1-blaOXA/aac(6')-Ib [38, 39]. The blaGES-1 cassette contains a T-N6-G attC site, while the blaIMP-1, aac(6')-Ib and blaOXA/aac(6')-Ib cassettes contain non-T-N6-G attCs.: The cases of the dfrA1, blaIMP-1 and blaGES-1 cassettes are particularly interesting. Although first found in class 2 and class 3 integrons, these cassettes are now more frequently disseminated by class 1 integrons, which may reflect the greater versatility of IntI1 in cassette rearrangement. Moreover, the dfrA1 cassette is nearly always located in the first position in class 1 integrons. As we have shown, cassettes containing non-T-N6-G attCs are weakly excised by IntI1, which suggests that integration of cassettes containing attC sites like that of dfrA1 may hinder their own subsequent excision and that of their downstream neighbors (attC x attC excision). In the case of the blaIMP-1 cassette, the additional secondary structure located between the extrahelical bases of the attC bs does not interfere with the ability of IntIs to excise cassettes when this recombination site is located either upstream or downstream of the gene (Table 1). The blaGES-1 cassette is associated with a T-N6-G attC site that would facilitate its acquisition by integrons (in particular, class 1 integrons) and its dissemination among bacteria.: Léon and Roy [40] have shown that there is no relationship between a cassette structural gene and its associated attC site.
According to their new model of gene cassette formation, group IIC-attC introns can separately target a transcriptional terminator adjoining a gene and an isolated attC. The gene and the attC can then be joined by homologous recombination between the introns, followed by transcription, RNA splicing and reverse transcription, leading to the formation of a cassette [40]. The structural characteristics of the attC bottom strand would then determine the subsequent mobility of the cassette.: Conclusions: In conclusion, this work and previous studies [27, 29, 32, 34] clearly show that the attC structure is an important factor that facilitates the integration of new cassettes into integrons. In our study, we carried out excision tests with several IntIs on cassettes containing a wide variety of attC sites. This work could aid the development of a site-specific recombination system based on IntIs. In contrast to the Cre recombinase of the Cre-lox system, IntIs have a more relaxed specificity for their recognition sites, attI and attC. The main advantage of a site-specific recombination system using an IntI is that it would allow insertion of genes (cassettes) in tandem. The results presented in this article could be used to optimize such a system.: Methods: Bacterial strains and growth media: Escherichia coli strains were cultured at 37°C in Luria-Bertani (LB) broth or on LB agar supplemented with ampicillin (100 µg/mL; Sigma, MO, USA), chloramphenicol (50 µg/mL; Sigma) or streptothricin (3 µg/mL).
DH5α cells [F− endA1 glnV44 thi-1 recA1 relA1 gyrA96 deoR nupG φ80dlacZΔM15 Δ(lacZYA-argF)U169 hsdR17(rK− mK+) λ−] were used as the host for construction and maintenance of all plasmid clones and for QL-ETs, while HB101 cells [F− mcrB mrr hsdS20(rB− mB−) recA13 leuB6 ara-14 proA2 lacY1 galK2 xyl-5 mtl-1 rpsL20(SmR) glnV44 λ−] were used for QN-ETs.: Bioinformatic analysis: Sequence analysis was done using the Genetics Computer Group programs (Wisconsin Package version 10.3; Accelrys). Folding of attC bottom strands was done using the MFOLD software (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi).: Mutagenesis method: Several mutations were introduced within the attCdfrA1 site located upstream of the sat2 cassette cloned into pACYC184 (clone pLQ430). Specific mutations were introduced into pLQ430 using the QuikChange site-directed mutagenesis system with PfuTurbo DNA polymerase (Stratagene, CA, USA). Primer pairs, designed with the OLIGO software package (version 4.1; National Biosciences, MN, USA), were used to create 16 mutants of the attCdfrA1 site. The forward primers are shown in Additional File 1. Mutagenesis products were digested with DpnI, transformed into E. coli DH5α, grown in LB medium for 1 h and selected for chloramphenicol resistance by plating on LB agar containing chloramphenicol. DNA from several colonies was purified using a QIAprep spin miniprep kit (Qiagen, Düsseldorf, Germany) and sequenced to confirm the presence of the desired mutations and the integrity of the surrounding sequences. The isolates were maintained as glycerol stock cultures at −80°C.: Qualitative excision tests (QL-ETs): IntI1, IntI3, IntI2*179E, SonIntIA and VchIntIA clones (see Additional File 2) were introduced by transformation into E. coli DH5α containing various cassettes cloned into pACYC184 (Table 1). E. coli was grown in LB medium at 37°C to an optical density at 600 nm of 0.5.
Cassette excision was induced by overexpression of the integrase gene with 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG; Sigma) and incubation at 37°C overnight. Cells were cultured in the presence of ampicillin and chloramphenicol. Plasmid DNA was then extracted from 5-mL cultures with a QIAprep spin miniprep kit (Qiagen).: To determine the ability of IntIs to excise cassettes, we used polymerase chain reaction (PCR) primers pACYC184-5' and pACYC184-3' (see Additional File 1) to detect reductions in the length of the cassette clones. PCR conditions were 5 min at 95°C; 30 cycles of 30 s at 95°C, 30 s at 62°C and 3 min 30 s at 68°C; and a final elongation step of 5 min at 68°C.: Quantitative excision tests (QN-ETs): Cells containing integrase clones were transformed with various plasmids containing gene cassettes cloned into pACYC184 (see Additional File 2). One colony of each double transformant was used to inoculate 5 mL of LB medium and grown at 37°C to an optical density at 600 nm of 0.5. Cells were cultured in the presence of ampicillin and chloramphenicol. IPTG was then added to a final concentration of 1 mM to induce cassette excision, and cultures were incubated at 37°C overnight; plasmid DNA was extracted (Qiagen) from each culture. The DNA was incubated with PstI for 1 h at 37°C to digest the integrase clone and prevent its co-transformation into the E. coli strain used to determine, by replica plating, which antibiotic resistance cassettes had been excised. The cassette clones were then transformed into E. coli HB101, and colonies were selected for chloramphenicol resistance.: One hundred colonies from each transformation were replica-plated onto LB chloramphenicol + streptothricin plates and incubated at 37°C overnight.
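The arithmetic behind these replica plates can be sketched in Python. In this minimal sketch, a replicated colony is counted as having excised sat2 when it fails to grow on chloramphenicol + streptothricin; two independent assays are then averaged, and the standard error is taken as the sample standard deviation divided by √n, which is our assumption about how the error bars in Figure 7 were computed. The function names and the colony counts are hypothetical, not data from the study.

```python
import math
from statistics import mean, stdev


def excision_percentage(non_growers: int, replicated: int = 100) -> float:
    """Share (in %) of replicated CmR colonies that fail to grow on
    chloramphenicol + streptothricin, i.e. that lost the sat2 cassette."""
    if replicated <= 0 or not 0 <= non_growers <= replicated:
        raise ValueError("need 0 <= non_growers <= replicated and replicated > 0")
    return 100.0 * non_growers / replicated


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two independent assays")
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical counts of non-growing colonies out of 100 in two independent
# assays (illustrative numbers, not data from this study).
assays = [excision_percentage(20), excision_percentage(24)]
m, se = mean_and_sem(assays)
print(f"excision = {m:.1f}% ± {se:.1f}% (SE, n = {len(assays)})")
# excision = 22.0% ± 2.0% (SE, n = 2)
```

With only two assays, the standard error reduces to half the difference between them, so the error bars mainly show how well the duplicates agree.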
The proportion of transformants that could not grow indicated the excision percentage of the sat2 cassette for each integrase and each upstream attC tested.: Abbreviations: adenine: arginine: bottom strand: cytosine: guanine: histidine: integron integrase: Luria-Bertani: lysine: nucleotide: qualitative excision tests: quantitative ETs: arginine (amino acid symbol) or purine (nucleotide symbol): thymine: transposon: tryptophan: tyrosine: Vibrio cholerae repetitive DNA sequence: tryptophan: pyrimidine.: References: Stokes HW, Hall RM: A novel family of potentially mobile DNA elements encoding site-specific gene-integration functions: integrons. Mol Microbiol. 1989, 3: 1669-1683. 10.1111/j.1365-2958.1989.tb00153.x.: Hall RM, Collis CM: Mobile gene cassettes and integrons: capture and spread of genes by site-specific recombination. Mol Microbiol. 1995, 15: 593-600. 10.1111/j.1365-2958.1995.tb02368.x.: Collis CM, Grammaticopoulos G, Briton J, Stokes HW, Hall RM: Site-specific insertion of gene cassettes into integrons. Mol Microbiol. 1993, 9: 41-52. 10.1111/j.1365-2958.1993.tb01667.x.: Collis CM, Hall RM: Gene cassettes from the insert region of integrons are excised as covalently closed circles. Mol Microbiol. 1992, 6: 2875-2885. 10.1111/j.1365-2958.1992.tb01467.x.: Collis CM, Hall RM: Site-specific deletion and rearrangement of integron insert genes catalyzed by the integron DNA integrase. J Bacteriol. 1992, 174: 1574-1585.: Hall RM, Brookes DE, Stokes HW: Site-specific insertion of genes into integrons: role of the 59-base element and determination of the recombination cross-over point. Mol Microbiol. 1991, 5: 1941-1959. 10.1111/j.1365-2958.1991.tb00817.x.: Messier N, Roy PH: Integron integrases possess a unique additional domain necessary for activity. J Bacteriol. 2001, 183: 6699-6706. 
10.1128/JB.183.22.6699-6706.2001.: Nunes-Düby SE, Kwon HJ, Tirumalai RS, Ellenberger T, Landy A: Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucleic Acids Res. 1998, 26: 391-406.: Gravel A, Fournier B, Roy PH: DNA complexes obtained with the integron integrase IntI1 at the attI1 site. Nucleic Acids Res. 1998, 26: 4347-4355. 10.1093/nar/26.19.4347.: Collis CM, Kim MJ, Stokes HW, Hall RM: Binding of the purified integron DNA integrase IntI1 to integron- and cassette-associated recombination sites. Mol Microbiol. 1998, 29: 477-490. 10.1046/j.1365-2958.1998.00936.x.: Collis CM, Hall RM: Comparison of the structure-activity relationships of the integron-associated recombination sites attI3 and attI1 reveals common features. Microbiology. 2004, 150: 1591-1601. 10.1099/mic.0.26596-0.: Stokes HW, O'Gorman DB, Recchia GD, Parsekhian M, Hall RM: Structure and function of 59-base element recombination sites associated with mobile gene cassettes. Mol Microbiol. 1997, 26: 731-745. 10.1046/j.1365-2958.1997.6091980.x.: Recchia GD, Hall RM: Gene cassettes: a new class of mobile element. Microbiology. 1995, 141: 3015-3027. 10.1099/13500872-141-12-3015.: Recchia GD, Hall RM: Origins of the mobile gene cassettes found in integrons. Trends Microbiol. 1997, 5: 389-394. 10.1016/S0966-842X(97)01123-2.: Collis CM, Kim MJ, Partridge SR, Stokes HW, Hall RM: Characterization of the class 3 integron and the site-specific recombination system it determines. J Bacteriol. 2002, 184: 3017-3026. 10.1128/JB.184.11.3017-3026.2002.: Collis CM, Recchia GD, Kim MJ, Stokes HW, Hall RM: Efficiency of recombination reactions catalyzed by class 1 integron integrase IntI1. J Bacteriol. 2001, 183: 2535-2542. 10.1128/JB.183.8.2535-2542.2001.: Drouin F, Mélançon J, Roy PH: The IntI-like tyrosine recombinase of Shewanella oneidensis is active as an integron integrase. J Bacteriol. 2002, 184: 1811-1815. 
10.1128/JB.184.6.1811-1815.2002.: Hansson K, Sundström L, Pelletier A, Roy PH: IntI2 integron integrase in Tn7. J Bacteriol. 2002, 184: 1712-1721. 10.1128/JB.184.6.1712-1721.2002.: Léon G, Roy PH: Excision and integration of cassettes by an integron integrase of Nitrosomonas europaea. J Bacteriol. 2003, 185: 2036-2041.: Martinez E, de la Cruz F: Genetic elements involved in Tn21 site-specific integration, a novel mechanism for the dissemination of antibiotic resistance genes. EMBO J. 1990, 9: 1275-1281.: Esposito D, Scocca JJ: The integrase family of tyrosine recombinases: evolution of a conserved active site domain. Nucleic Acids Res. 1997, 25: 3605-3614. 10.1093/nar/25.18.3605.: Grainge I, Jayaram M: The integrase family of recombinase: organization and function of the active site. Mol Microbiol. 1999, 33: 449-456. 10.1046/j.1365-2958.1999.01493.x.: Gopaul DN, van Duyne GD: Structure and mechanism in site-specific recombination. Curr Opin Struct Biol. 1999, 9: 14-20. 10.1016/S0959-440X(99)80003-7.: van Duyne GD: A structural view of Cre-loxP site-specific recombination. Annu Rev Biophys Biomol Struct. 2001, 30: 87-104. 10.1146/annurev.biophys.30.1.87.: Grindley ND: Site-specific recombination: synapsis and strand exchange revealed. Curr Biol. 1997, 7: R608-612. 10.1016/S0960-9822(06)00314-9.: Francia MV, Zabala JC, de la Cruz F, Garcia Lobo JM: The IntI1 integron integrase preferentially binds single-stranded DNA of the attC site. J Bacteriol. 1999, 181: 6844-6849.: Johansson C, Kamali-Moghaddam M, Sundström L: Integron integrase binds to bulged hairpin DNA. Nucleic Acids Res. 2004, 32: 4033-4043. 10.1093/nar/gkh730.: Bouvier M, Demarre G, Mazel D: Integron cassette insertion: a recombination process involving a folded single strand substrate. EMBO J. 2005, 24: 4356-4367. 10.1038/sj.emboj.7600898.: MacDonald D, Demarre G, Bouvier M, Mazel D, Gopaul DN: Structural basis for broad DNA-specificity in integron recombination. Nature. 2006, 440: 1157-1162. 
10.1038/nature04643.: Johansson C, Boukharta L, Eriksson J, Aqvist J, Sundström L: Mutagenesis and homology modeling of the Tn21 integron integrase IntI1. Biochemistry. 2009, 48: 1743-1753. 10.1021/bi8020235.: Mazel D: Integrons: agents of bacterial evolution. Nat Rev Microbiol. 2006, 4: 608-620. 10.1038/nrmicro1462.: Biskri L, Bouvier M, Guerout AM, Boisnard S, Mazel D: Comparative study of class 1 integron and Vibrio cholerae superintegron integrase activities. J Bacteriol. 2005, 187: 1740-1750. 10.1128/JB.187.5.1740-1750.2005.: Larouche A, Roy PH: Analysis by mutagenesis of a chromosomal integron integrase from Shewanella amazonensis SB2BT. J Bacteriol. 2009, 191: 1933-1940. 10.1128/JB.01537-08.: Bouvier M, Ducos-Galand M, Loot C, Bikard D, Mazel D: Structural features of single-stranded integron cassette attC sites and their role in strand selection. PLoS Genet. 2009, 5 (9): e1000632. 10.1371/journal.pgen.1000632.: Demarre G, Frumerie C, Gopaul DN, Mazel D: Identification of key structural determinants of the IntI1 integron integrase that influence attC x attI1 recombination efficiency. Nucleic Acids Res. 2007, 35: 6475-6489. 10.1093/nar/gkm709.: Barlow RS, Gobius KS: Diverse class 2 integrons in bacteria from beef cattle sources. J Antimicrob Chemother. 2006, 58: 1133-1138. 10.1093/jac/dkl423.: Márquez C, Labbate M, Ingold AJ, Chowdhury PR, Ramirez MS, Centrón D, Borthagaray G, Stokes HW: Recovery of a functional class 2 integron from an Escherichia coli strain mediating a urinary tract infection. Antimicrob Agents Chemother. 2008, 52: 4153-4154.: Arakawa Y, Murakami M, Suzuki K, Ito H, Wacharotayankun R, Ohsuka S, Kato N, Ohta M: A novel integron-like element carrying the metallo-β-lactamase gene blaIMP. Antimicrob Agents Chemother. 1995, 39: 1612-1615.: Correia M, Boavida F, Grosso F, Salgado MJ, Lito LM, Cristino JM, Mendo S, Duarte A: Molecular characterization of a new class 3 integron in Klebsiella pneumoniae. Antimicrob Agents Chemother. 
2003, 47: 2838-2843. 10.1128/AAC.47.9.2838-2843.2003.: Léon G, Roy PH: Potential role of group IIC-attC introns in integron cassette formation. J Bacteriol. 2009, 191: 6040-6051.: Acknowledgements: This work was supported by Canadian Institutes of Health Research (CIHR) grant MT-13564 to PHR.: Author information: Correspondence to Paul H Roy.: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: AL performed the experiments and wrote the article. PHR supervised the work and participated in writing the article.: Electronic supplementary material: Additional File 1: Primers used in this study. (DOC 44 KB): Additional File 2: Integrase clones and mutant cassette clones used in this study. (DOC 62 KB): Rights and permissions: This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.: About this article: Cite this article: Larouche, A., Roy, P.H. Effect of attC structure on cassette excision by integron integrases. Mobile DNA 2, 3 (2011). 
https://doi.org/10.1186/1759-8753-2-3: Received: 09 August 2010: Accepted: 18 February 2011: Published: 18 February 2011: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Characterization of a synthetic human LINE-1 retrotransposon ORFeus-Hs" "Wenfeng An, Lixin Dai, Anna Maria Niewiadomska, Alper Yetil, Kathryn A O'Donnell, Jeffrey S Han, Jef D Boeke" "Jef D Boeke" "14 February 2011" "Long interspersed elements, type 1 (LINE-1, L1) are the most abundant and only active autonomous retrotransposons in the human genome. Native L1 elements are inefficiently expressed because of a transcription elongation defect thought to be caused by high adenosine content in L1 sequences. Previously, we constructed a highly active synthetic mouse L1 element (ORFeus-Mm), partially by reducing the nucleotide composition bias. As a result, the transcript abundance of ORFeus-Mm was greatly increased, and its retrotransposition frequency was > 200-fold higher than its native counterpart. In this paper, we report a synthetic human L1 element (ORFeus-Hs) synthesized using a similar strategy. The adenosine content of the L1 open reading frames (ORFs) was reduced from 40% to 27% by changing 25% of the bases in the ORFs, without altering the amino acid sequence. By studying a series of native/synthetic chimeric elements, we observed increased levels of full-length L1 RNA and ORF1 protein and retrotransposition frequency, mostly proportional to increased fraction of synthetic sequence. Overall, the fully synthetic ORFeus-Hs has > 40-fold more RNA but is at most only ~threefold more active than its native counterpart (L1RP); however, its absolute retrotransposition activity is similar to ORFeus-Mm. Owing to the elevated expression of the L1 RNA/protein and its high retrotransposition ability, ORFeus-Hs and its chimeric derivatives will be useful tools for mechanistic L1 studies and mammalian genome manipulation." 
"Codon Optimization, Saline Sodium Citrate, ORF1 Protein, Synthetic Sequence, Puromycin Selection" " Characterization of a synthetic human LINE-1 retrotransposon ORFeus-Hs: Wenfeng An, Lixin Dai, Anna Maria Niewiadomska, Alper Yetil, Kathryn A O'Donnell, Jeffrey S Han & Jef D Boeke: Mobile DNA volume 2, Article number: 2 (2011): Abstract: Long interspersed elements, type 1 (LINE-1, L1) are the most abundant and only active autonomous retrotransposons in the human genome. Native L1 elements are inefficiently expressed because of a transcription elongation defect thought to be caused by high adenosine content in L1 sequences. Previously, we constructed a highly active synthetic mouse L1 element (ORFeus-Mm), partially by reducing the nucleotide composition bias. As a result, the transcript abundance of ORFeus-Mm was greatly increased, and its retrotransposition frequency was > 200-fold higher than its native counterpart. In this paper, we report a synthetic human L1 element (ORFeus-Hs) synthesized using a similar strategy. The adenosine content of the L1 open reading frames (ORFs) was reduced from 40% to 27% by changing 25% of the bases in the ORFs, without altering the amino acid sequence. By studying a series of native/synthetic chimeric elements, we observed increased levels of full-length L1 RNA and ORF1 protein and retrotransposition frequency, mostly proportional to increased fraction of synthetic sequence. Overall, the fully synthetic ORFeus-Hs has > 40-fold more RNA but is at most only ~threefold more active than its native counterpart (L1RP); however, its absolute retrotransposition activity is similar to ORFeus-Mm. 
Owing to the elevated expression of the L1 RNA/protein and its high retrotransposition ability, ORFeus-Hs and its chimeric derivatives will be useful tools for mechanistic L1 studies and mammalian genome manipulation.: Background: The human genome is littered with transposable element sequences; some are mere fossil records of ancient insertion events, whereas others remain active. Of these active elements, the long interspersed elements, type 1 (LINE-1 or L1) are among the most active; they are capable of autonomous retrotransposition [1] and provide the enzymatic activities for the non-autonomous retrotransposition of short interspersed nuclear elements (SINEs) such as Alu elements [2]. Full-length L1 elements are approximately 6 kb long and consist of a 5' untranslated region (UTR) containing an internal promoter, two open reading frames (ORFs), ORF1 and ORF2, and a 3' UTR followed by a poly(A) tail encoded in the DNA [3–8]. The L1 ORF1 protein (ORF1p) is a non-specific nucleic acid binding protein with nucleic acid chaperone activity [9–12]. The ORF2 protein (ORF2p) provides the catalytic activity necessary for retrotransposition, with both endonuclease and reverse transcriptase activities [13, 14].: L1s make up approximately 17% of the human genome. However, despite their abundance, the replication and control mechanisms of these elements are poorly understood, partly because their messenger RNA (mRNA) and proteins are expressed at low levels [15]. We have previously linked inefficient L1 expression to a transcription elongation defect potentially caused by high adenosine content in the ORFs. We subsequently constructed a synthetic L1, termed ORFeus, in which the codons of both ORFs were synonymously optimized, based on a mouse L1 protein sequence [16, 17]. 
This element was at least 200-fold more active for retrotransposition than the native mouse element L1spa [18].: In this paper, we describe our use of similar techniques to construct a synthetic human L1 (ORFeus-Hs) element and several synthetic/native chimeric L1 elements. Although we observed increased levels of L1 mRNA and ORF1p, retrotransposition in this element, as measured by two different retrotransposition reporter assays [1, 19], increased by only about threefold at most. We discuss various models to explain the possible restrictions on ORFeus-Hs activity. Certain chimeric synthetic/native constructs were more active than the fully synthetic constructs, suggesting that recoding may have abolished one or more cis elements or introduced one or more deleterious sequences into ORFeus-Hs. Moreover, one of these chimeras produced slightly more mRNA and ORF1 protein than ORFeus-Hs. ORFeus-Hs represents a valuable tool for studying mechanisms of L1 replication and control, particularly at the protein level, because its nucleic acid and protein products can be detected more easily.: Results: Construction of ORFeus-Hs and the synthetic/native L1 chimeras: The ORFeus-Hs open reading frames were designed using the same principles used to construct murine ORFeus, which we now refer to as ORFeus-Mm to distinguish it from the main topic of this paper; ORFeus-Mm was referred to as smL1 in the original publication [16]. Each amino acid was recoded to its preferred codon (that is, only 20 codons, one per amino acid, were used), except where internal restriction sites were strategically positioned to facilitate assembly of the complete synthetic ORFs (see Additional file 1, Figure S1). The synthetic ORFs were fused to a cytomegalovirus (CMV) promoter-enhancer with a Kozak signal, to the native L1 5' UTR promoter, or to a combination of both (see Additional file 1, Figure S2 for the sequences of these segments). 
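The recoding rule described above (one preferred codon per amino acid) has two properties that can be checked mechanically. First, the recoded ORFs must encode exactly the same protein as the native ORFs. Second, their adenosine content should fall: according to the abstract, it dropped from 40% to 27% with 25% of bases changed. The Python sketch below checks both properties for a native/recoded pair using the standard genetic code. The function names and the toy sequences are hypothetical illustrations. This is not the authors' design software, and the toy codons are not the ORFeus-Hs codon table.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order ('*' = stop).
BASES = 'TCAG'
AMINO = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {''.join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    # Translate an in-frame DNA sequence codon by codon (A/C/G/T only).
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError('sequence length is not a multiple of 3')
    return ''.join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def a_content(seq):
    # Fraction of positions that are adenosine (A).
    if not seq:
        raise ValueError('empty sequence')
    return seq.upper().count('A') / len(seq)


def fraction_changed(native, recoded):
    # Fraction of aligned positions that differ between two equal-length ORFs.
    if len(native) != len(recoded):
        raise ValueError('sequences must have equal length')
    diffs = sum(a != b for a, b in zip(native.upper(), recoded.upper()))
    return diffs / len(native)


def check_recoding(native, recoded):
    # A recoding is synonymous only if both sequences encode the same protein.
    if translate(native) != translate(recoded):
        raise ValueError('recoding changes the protein sequence')
    return {
        'a_native': a_content(native),
        'a_recoded': a_content(recoded),
        'changed': fraction_changed(native, recoded),
    }


# Toy example (hypothetical, not L1 sequence): Lys-Glu-Ile-Asn written with
# A-rich codons, then with synonymous codons that end in G or C.
print(check_recoding('AAAGAAATAAAT', 'AAGGAGATCAAC'))
# {'a_native': 0.75, 'a_recoded': 0.5, 'changed': 0.3333333333333333}
```

Raising an error on any non-synonymous change mirrors the requirement that recoding must leave the amino acid sequence untouched.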
These constructs were also tagged with either an enhanced green fluorescent protein (EGFP-AI) [19] or a neomycin (Neo-AI) [1] retrotransposition marker to monitor retrotransposition frequency.: Finally, because synthetic and native elements showed distinct retrotransposition frequencies, and to further study the sequence requirements for L1 retrotransposition, we made several chimeric L1 elements consisting of various combinations of native and synthetic L1 sequences (Figure 1; see Additional file 2, Table S1).: Schematic representation of native, synthetic and chimeric human L1 elements. Three sets of constructs differing in the promoter region are illustrated: the first set carries a cytomegalovirus (CMV) promoter and a Kozak (K) signal, the second set has a dual CMV-L1 5' untranslated region (UTR) promoter, and the third has a 5' UTR promoter only. All elements are cloned into a pCEP-Puro vector backbone. AMPR = ampicillin resistance gene; B, E, P = restriction sites BamHI, EcoRI and PmlI, respectively, at the junctions in the various chimeras; EBNA-1 = Epstein-Barr nuclear antigen 1 gene permitting extrachromosomal replication; Intron = human gamma globin intron; Marker = either enhanced green fluorescent protein or neomycin marker; ORF = open reading frame; PuroR = puromycin resistance gene; SV40pA = Simian virus 40 polyadenylation signal. Blue = native sequence; purple = synthetic sequence.: Active retrotransposition by ORFeus-Hs and synthetic/native L1 chimeras: To explore the effects of our sequence manipulations on L1 retrotransposition, retrotransposition frequency was measured using several independent assays (Table 1). Briefly, the transfected HEK-293T cells were harvested, and EGFP-positive cells were counted by fluorescence-activated cell sorting (FACS) analysis. 
Retrotransposition levels of the corresponding native L1 sequences with their various promoters (pLD223, pWA174, pLD143) were used as a reference for the other constructs (Table 1, row 1).: Interestingly, the construct containing both the CMV and native L1 promoters exhibited a significantly higher percentage of EGFP-positive cells (14%) than the constructs containing either promoter on its own (2.8% and 3.8%, respectively). For the partially synthetic chimeric L1 constructs, retrotransposition levels appeared to increase with the length of the synthetic L1 segment, regardless of which promoter drove transcription (Table 1, rows 2 to 4). However, when the fully synthetic L1 constructs (pWA163, pWA165, pLD255) were compared with their respective native constructs (pLD223, pWA174, pLD143), the change in retrotransposition varied (Table 1, row 5). The ORFeus-Hs construct driven by the CMV promoter alone was about threefold more active than its native L1 counterpart, whereas the ORFeus-Hs construct driven by the 5' UTR promoter alone was only slightly more active (~1.2 times) than its native counterpart. Perhaps most unexpectedly, the ORFeus-Hs construct driven by both the CMV and 5' UTR promoters was actually less active (0.58 times) than its native counterpart. Thus, the promoter strongly influenced the relative retrotransposition frequency of the synthetic retrotransposons.: Additionally, in all of the comparisons, the chimera B constructs were consistently more active than their respective fully synthetic L1 constructs (Table 1, rows 4 and 5). 
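The fold values above compare each fully synthetic construct with the native construct that carries the same promoter. A minimal Python sketch of that bookkeeping follows. It assumes the relative frequency is simply the ratio of the two EGFP-positive percentages measured by FACS. Any background subtraction or transfection-efficiency correction the authors may have applied is not modeled, and the function name is hypothetical. The construct pairing follows the order in which the text lists the synthetic and native constructs.

```python
# Native construct carrying the same promoter as each fully synthetic construct,
# paired in the order listed in the text (Table 1, row 5 versus row 1).
NATIVE_REFERENCE = {
    'pWA163': 'pLD223',  # CMV promoter only
    'pWA165': 'pWA174',  # CMV promoter plus L1 5' UTR
    'pLD255': 'pLD143',  # L1 5' UTR promoter only
}


def relative_frequency(pct_egfp, construct, reference=NATIVE_REFERENCE):
    # pct_egfp maps construct name -> percentage of EGFP-positive cells (FACS).
    # Returns the construct's percentage divided by that of its matched native control.
    ref_pct = pct_egfp[reference[construct]]
    if ref_pct <= 0:
        raise ValueError('native reference has no EGFP-positive cells')
    return pct_egfp[construct] / ref_pct
```

Under this pairing, the reported results correspond to ratios of about 3 for pWA163 (CMV only), about 1.2 for pLD255 (5' UTR only) and 0.58 for pWA165 (dual promoter).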
The B chimeras consisted of the fully synthetic ORF1 and the first three quarters of synthetic ORF2, with the last quarter of ORF2 derived from the native L1RP element.: Similar trends were observed with a two-step retrotransposition assay [1], in which cells were first enriched for plasmid-bearing cells by puromycin selection and then selected for the Neo-AI reporter (see Additional file 2, Table S2). This indicates that these effects reflect intrinsic aspects of retrotransposition rather than the specific retrotransposition reporter used.: The 3' UTR, inter-ORF and constitutive transport elements are not essential to ORFeus-Hs retrotransposition: We noted that several of the initial synthetic constructs had extremely short 3' UTRs. A portion of the 3' UTR is dispensable for (native) human L1 retrotransposition [1]. We therefore investigated whether the virtually complete absence of the UTR in these constructs could explain the modest increase in retrotransposition rates (relative to the mouse synthetic element). To test this hypothesis, we restored the full-length 3' UTR sequence in the various ORFeus-Hs constructs and measured their retrotransposition levels (see Additional file 2, Table S2). This alteration did increase ORFeus-Hs retrotransposition frequency, but only 1.1- to 2-fold compared with the construct lacking a 3' UTR (Table 2, rows 2 and 3).: It has also been reported that the last 20 nucleotides of ORF2 and the first 70 nucleotides of the 3' UTR contain a constitutive transport element (CTE) that is important for export of full-length mRNA to the cytoplasm [20]. Because this sequence is embedded within ORF2, which was extensively recoded in ORFeus-Hs, it was a candidate to explain the difference in activity between chimera B and ORFeus-Hs. We evaluated the effect of restoring the CTE sequence alone to the wild-type sequence (synthetic sequence → native sequence). 
We therefore constructed plasmids in which both the native CTE and the 3' UTR were restored to ORFeus-Hs. Interestingly, this maneuver actually reduced retrotransposition slightly (Table 2, rows 2 and 4), suggesting that the native CTE contains a sequence that is slightly inhibitory to retrotransposition by these constructs. It is important to recognize that the constructs described here contain introns. Their transcripts may therefore exit the nucleus via a mechanism distinct from that used by native L1 RNAs, which do not normally undergo splicing as part of the retrotransposition process. It is thus formally possible that retrotransposition in the absence of an intron-containing reporter would actually depend on these sequences.: Finally, we replaced the native L1 inter-ORF region in ORFeus-Hs with a randomized version (see Methods). This randomization modestly increased the retrotransposition frequency of ORFeus-Hs (see Additional file 2, Table S2). Thus, as previously reported for native L1 [21, 22], the native inter-ORF sequence is not crucial for retrotransposition of ORFeus-Hs either.: Synthetic sequence increases levels of L1 mRNA: We examined differences in transcript levels between the native L1RP constructs, ORFeus-Hs and the various semi-synthetic L1 chimeras. HEK-293T cells were transfected with the following vectors, all of which contain the CMV promoter:
- native L1RP with (pWA174) and without (pLD223) an L1 5' UTR sequence;
- the three chimeras with increasing lengths of synthetic sequence, with (pLD224, pLD227, pLD225) and without (pWA172, pWA170, pWA176) L1 5' UTR sequences;
- the fully synthetic ORFeus-Hs constructs with (pWA165) and without (pWA163) L1 5' UTRs.

The cells were harvested 24 hours after transfection, and total RNA was isolated. Levels of EGFP-containing transcripts were measured by RNA blotting and normalized to the transcript levels of a control gene (acidic ribosomal phosphoprotein P0; ARPP0) (Figure 2). 
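The normalization described above amounts to two divisions. Each L1 band is divided by the ARPP0 band in the same lane. The result is then expressed relative to the native construct of the same promoter set, which Figure 3 sets to 1 (pLD223 or pWA174). A minimal Python sketch follows, assuming background-subtracted densitometry values are already in hand. The function names are hypothetical, and this is not the authors' quantification pipeline. The last helper shows that an RNA-versus-protein discrepancy is simply a ratio of two fold changes. An example is the four- to fivefold difference reported for the chimera B and fully synthetic CMV-only constructs.

```python
def normalized_signal(l1_signal, loading_signal):
    # Divide an L1 band intensity by the loading-control band from the same lane
    # (ARPP0 for RNA blots; tubulin plays the same role for ORF1p immunoblots).
    if loading_signal <= 0:
        raise ValueError('loading-control signal must be positive')
    return l1_signal / loading_signal


def fold_over_native(lane, native_lane):
    # lane and native_lane are (l1_signal, loading_signal) pairs; the native
    # construct of the same promoter set is thereby assigned a value of 1.
    return normalized_signal(*lane) / normalized_signal(*native_lane)


def rna_to_protein_discrepancy(rna_fold, protein_fold):
    # How many times larger the RNA fold increase is than the ORF1p fold increase.
    if protein_fold <= 0:
        raise ValueError('protein fold change must be positive')
    return rna_fold / protein_fold
```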
Lanes 1 to 5 show constructs without a 5' UTR and lanes 6 to 10 show those with a 5' UTR. In both sets, spliced and unspliced L1 mRNAs can be identified by their predicted mobilities. In all cases, mRNA levels increased monotonically, but not linearly, with retrotransposition levels. ORFeus-Hs L1 transcripts (Figure 2, lanes 5 and 10) were significantly more abundant than those of their native counterparts (Figure 2, lanes 1 and 6). L1 transcript levels also rose as the length of the synthetic segment in the chimeric constructs increased (Figure 2, lanes 2 to 4 and 7 to 9). In addition, the chimera B constructs appeared to have slightly higher transcript levels than the fully synthetic ORFeus-Hs constructs (Figure 2, lanes 9 and 10). Finally, we observed a complex context effect. The first three constructs (native, chimera P and chimera E) carrying both the CMV and L1 5' UTR promoters had higher transcript levels than their counterparts with the CMV promoter alone, which may account for the higher retrotransposition levels observed. By contrast, the last two (chimera B and fully synthetic) showed the opposite effect: the CMV promoter alone produced more RNA than the combination of CMV and 5' UTR (Figure 3).: Total RNA analysis of L1 expression. Expression levels of native, partially synthetic and fully synthetic (ORFeus-Hs) L1 elements were compared in HEK-293T cells. The vectors used were: pLD223, pWA172, pWA170, pWA176, pWA163, pWA174, pLD224, pLD227, pLD225, pWA165. Top, L1 mRNA expression; note both spliced and unspliced transcripts. Bottom, RNA expression of the loading control, ARPP0.: Relative increases in RNA, protein and retrotransposition frequency. (A) Relative increases in RNA, protein and retrotransposition frequency in the constructs containing CMV only. Chimera B = pWA176; chimera E = pWA170; chimera P = pWA172; native = pLD223; synthetic = pWA163. 
(B) Relative increases in RNA, protein and retrotransposition frequency in the constructs containing both CMV and 5' UTR promoters. Chimera E = pLD227; chimera B = pLD225; chimera P = pLD224; native = pWA174; synthetic = pWA165. Values for pLD223 and pWA174 were set as the controls for their respective groups of constructs. Data are means of at least three independent experiments ± standard error. Blue = RNA; gray = relative retrotransposition frequency; purple = ORF1p.: Increases in ORF1p expression and ORF2 RT activity in ORFeus-Hs: To determine the effect of codon optimization on protein expression, HEK-293T cells were transfected and harvested as described above. Cell lysates were analyzed by SDS-PAGE, transferred to membranes, and probed with an anti-ORF1p antibody (Figure 4A). As with RNA transcripts, ORF1 protein levels were considerably elevated for the synthetic and partly synthetic L1s (Figure 4A). The results were quantified by densitometry (Figure 3). In cells transfected with the native L1RP constructs (pLD223, pWA174), only a low level of ORF1p was observed at ~41 kDa (Figure 4A, lanes 1 and 6). As with RNA, ORF1p levels increased with the length of synthetic sequence, but the increase in ORF1 protein was much smaller (Figure 4A, lanes 2 to 5 and 7 to 10). For each construct, RNA, protein and retrotransposition changed in the same direction. Notably, ORF1p levels from the chimera B L1 elements (pWA176 and pLD225) were also slightly higher than those from the fully synthetic ORFeus-Hs elements (pWA163, pWA165) (Figure 3, Figure 4A).: Analysis of ORF1 protein expression and relative ORF2 RT activity. (A) The same constructs as in Figure 2 were analyzed. The vectors used were: pLD223, pWA172, pWA170, pWA176, pWA163, pWA174, pLD224, pLD227, pLD225, pWA165. Top, protein expression of ORF1. 
Bottom, protein expression of the tubulin loading control. (B) L1 element amplification protocol (LEAP) assay performed using ribonucleoproteins (RNPs) prepared from cells transfected with pWA174 (L1RP) or pWA165 (ORFeus-Hs). The numbers 0, 0.1, 0.2, 0.5, 1, 5 and 10 indicate the amount (µg) of total protein from the RNP preparation added to each LEAP reaction. The arrows indicate the mobility of the LEAP PCR product. SuperScript III denotes a positive control in which 100 U of SuperScript III reverse transcriptase (Invitrogen, Carlsbad, CA, USA) was added to the LEAP reaction.: Immunoblotting for ORF2 protein was not of sufficient quality to allow quantification. Instead, to evaluate ORF2p activity from ORFeus-Hs, we performed an L1 element amplification protocol (LEAP) [23] using ribonucleoprotein (RNP) prepared from 293T cells transfected with either L1RP (pWA174) or ORFeus-Hs (pWA165). As little as 0.1 µg of RNP from cells transfected with ORFeus-Hs produced a signal as strong as that produced by 10 µg of native RNP (Figure 4B), whereas native L1 RNP did not produce a visible signal until at least 5 µg of RNP was added. To roughly quantify the reverse transcription (RT) activity in these two samples, we titrated the ORFeus-Hs RNP down to 0.025 µg per reaction (see Additional file 1, Figure S3) and compared its activity with that of native L1 RNP. Approximately 10 µg of native L1 RNP contained RT activity similar to that of ~0.05 µg of ORFeus-Hs RNP (see Additional file 1, Figure S3B). Although this experiment is only semiquantitative, it indicates that cells transfected with ORFeus-Hs had ORF2 RT activity at least two orders of magnitude higher than cells transfected with native L1. 
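The two-orders-of-magnitude estimate follows from an equal-signal titration. About 10 µg of native L1 RNP gave RT activity similar to about 0.05 µg of ORFeus-Hs RNP. Activity per microgram of RNP protein therefore differs by roughly 10 / 0.05 = 200-fold, or about 2.3 orders of magnitude. A minimal Python sketch of that arithmetic follows. It assumes the LEAP signal scales with the amount of active RNP over the range compared; as the text notes, the assay is only semiquantitative. The function name is hypothetical.

```python
import math


def activity_ratio(native_ug, synthetic_ug):
    # Equal-signal titration: if native_ug of native RNP and synthetic_ug of
    # ORFeus-Hs RNP give LEAP signals of similar strength, RT activity per
    # microgram of RNP protein differs by roughly native_ug / synthetic_ug.
    if native_ug <= 0 or synthetic_ug <= 0:
        raise ValueError('RNP amounts must be positive')
    return native_ug / synthetic_ug


# Amounts reported for comparable RT activity (Additional file 1, Figure S3B).
ratio = activity_ratio(10, 0.05)
print(round(ratio))                 # prints 200
print(round(math.log10(ratio), 1))  # prints 2.3
```

The coarser comparison in Figure 4B (0.1 µg of ORFeus-Hs RNP versus 10 µg of native RNP) gives about 100-fold, which matches the stated lower bound of at least two orders of magnitude.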
This increase in activity is much more in line with RNA abundance than with ORF1p abundance (see below), and suggests that L1 RNA may be limiting for RNP activity.: An interesting finding was that although L1 RNA and protein levels all changed in the same direction, the magnitudes of the increases differed dramatically (Figure 3). In the chimera B and fully synthetic constructs, the increases in RNA were much larger than the increases in ORF1 protein, by a factor of four to five in the CMV-only constructs. The fold increases in retrotransposition frequency did not correlate well with those in RNA abundance. However, in the CMV promoter constructs they were consistently 1.3 to 2 times larger than the protein increases. By contrast, for the CMV/L1 5' UTR promoter constructs, the increase in protein level was larger than that in retrotransposition frequency.: Discussion: The data presented here provide an interesting contrast between the synthetic versions of human and mouse retrotransposons, ORFeus-Hs and ORFeus-Mm. Our previous data showed that generating ORFeus-Mm with optimized codons, presumably free of sequences that might hamper transcription, resulted in a highly active element. Its retrotransposition levels were as much as 200-fold higher than those of the native element [16]. This was shown to be due in part to higher levels of mouse L1 transcripts [15], and presumably to correspondingly higher levels of protein products. However, when we used similar techniques to develop a highly active human retrotransposon, we were able to increase retrotransposition by only two- to threefold at most. 
In contrast to the findings for mouse L1 elements, the synthetic sequences did not increase human L1 protein and retrotransposition levels by the same margin.: Native L1 elements contain premature polyadenylation sites [15, 24] and cryptic splice sites [25] that give rise to prematurely polyadenylated and spliced L1 RNAs. These RNA isoforms could limit full-length L1 RNA production or compete for the proteins encoded by the L1 ORFs [25]. Our recoding process removed most or all of these signals, which probably contributed to the increased abundance of full-length L1 RNA (Figure 2, Figure 3). Although the function of these signals in nature remains unknown, they are dispensable for L1 retrotransposition in tissue-culture assays.: One obvious reason that ORFeus-Hs did not increase retrotransposition frequency 200-fold is that the native mouse L1 element (L1spa) has much lower activity than the native human L1RP element [18]. In fact, codon optimization raised the retrotransposition ability of both mouse and human L1 elements to a similar level (Table 1). This could reflect an upper limit on the L1 retrotransposition that tissue-culture cells can readily tolerate and/or a rate-limiting step(s) in the retrotransposition process. Elevated levels of L1 RNA/protein and their shuttling between nucleus and cytoplasm may strongly affect the cell, perhaps exceeding its capacity to process L1 RNP retrotransposition intermediates [26]. Consistent with this, cells overexpressing either ORFeus-Mm or ORFeus-Hs were considerably more sensitive to antibiotics than cells transfected with native L1s. For example, HEK-293T cells transfected with ORFeus-Hs grew more slowly in 2 µg/ml puromycin than cells overexpressing native L1RP (see Additional file 1, Figure S4). 
These results are consistent with studies reporting that L1 protein expression leads to high levels of double-strand breaks, apoptosis and/or cellular senescence [27–30]. It is formally possible that the codon-optimization changes made in ORFeus corrected a mutation(s) in a cis element(s) that hampers the retrotransposition efficiency of native L1, but results from both the study of Han et al. [16] on the mouse element and the present study on the human element show that L1 activity increases progressively as larger proportions of the native sequence are recoded, consistent with the reported elongation defect.: We noted that the three sets of L1 constructs, driven by CMV only, CMV plus 5' UTR, or the 5' UTR only, showed very different trends in retrotransposition frequency. This suggests that different promoters somehow produce RNAs of different 'quality'. One difference in quality is the structure of the RNA 5' end: the CMV promoter fragment also contains the 51 bp viral 5' UTR upstream of the L1 ORF1 AUG codon, whereas the native L1 promoter introduces the native 907 bp L1 5' UTR in its place. In the double-promoter construct, there are thought to be two transcription start sites, one identical to the native site and one extended at its 5' end by the CMV 5' UTR. We have not directly examined the relative abundance of these two RNA forms. Thus, the 5' UTR sequences differ among the three types of elements, and it is possible that interactions between the 5' UTRs and the rest of the RNA sequence influence retrotransposition efficiency.: Unexpected discrepancies among the increases in RNA abundance, protein abundance and retrotransposition frequency were noted across the various constructs, with RNA increasing much more dramatically than protein, and protein increasing more than retrotransposition frequency.
This suggests that, compared with native RNA sequences, the synthetic sequences either reduce translational efficiency, decrease the stability of the encoded protein, or both. The larger relative increase in RNA suggests that the primary effect of codon optimization was to raise levels of full-length L1 mRNA. Because codon optimization is predicted to increase translational efficiency, it is surprising that protein levels actually decreased relative to RNA template abundance. It is formally possible that recoding leads to enhanced protein degradation; native codon usage may provide signals for proper folding of the nascent protein [31]. Alternatively, if the interaction between the RNA and the protein to form RNP intermediates is abrogated in the fully or mostly synthetic elements, any ORF1 protein that is not incorporated into RNPs may become very unstable, potentially explaining the RNA-protein discrepancy observed here.: Perhaps most surprisingly, we found that changes in ORF2 sequences in chimera B had a significant effect on the expression of ORF1p (compare pLD225 and pLD165 in Figure 4). This is consistent with models in which ORF2 protein, or the RNA sequence encoding it, might participate in the regulation of ORF1 protein translation or stability. Finally, although recoding of the human element did not increase retrotransposition frequency as dramatically as with ORFeus-Mm, ORFeus-Hs and its chimeric derivatives remain useful tools for L1 studies, and have the highest retrotransposition frequency of the available human L1s reported.
Higher levels of full-length L1 mRNA and ORF1p provide a convenient and rapid marker for the early stages of the L1 replication cycle, detectable by methods such as immunofluorescence and immunoprecipitation.: Methods: Plasmid construction: Synthetic human ORF1 and ORF2 sequences were created by replacing each codon in the human L1 ORFs with codons favored in strongly expressed human genes, and by introducing strategically placed restriction enzyme sites, using a DNA-shuffling approach [32]. Oligonucleotides (60-mers; Integrated DNA Technologies, Coralville, IA, USA) collectively encoding both strands of the ORFeus-Hs reading frames were used, and gene synthesis and assembly were performed as previously described [16]. Synthetic/native L1 chimeras were assembled by exploiting various native restriction sites (Figure 1; Additional file 2, Table S1). The sequence of the inter-ORF region was randomized as follows: the number of each base in the native inter-ORF region was counted, and that number of each base was then arranged in the order in which the corresponding letters occur in a famous novel [33] (Table 3).: Cell culture, transfection and retrotransposition assay: Retrotransposition assays with EGFP-AI indicators were performed in HEK-293T cells. HEK-293T cells (ATCC, Manassas, VA, USA) were maintained in Dulbecco's modified Eagle medium (DMEM; Invitrogen, Carlsbad, CA, USA) with 10% fetal bovine serum (FBS) and penicillin/streptomycin (penicillin, 100 units/ml; streptomycin, 100 µg/ml), and were passaged upon confluence. HEK-293T cells were seeded at 2 × 10^5 cells per well in six-well plates and grown overnight. The next day, transfections were performed with 1 µg plasmid and 2.5 µl transfection reagent (Fugene HD; Roche Applied Science, Indianapolis, IN, USA) according to the manufacturer's protocol. The day after transfection, cells were treated with trypsin and transferred to 60 mm plates with complete medium containing 1 µg/ml puromycin.
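The inter-ORF randomization described under Plasmid construction keeps the native base counts and takes the order from a text. The exact rule for mapping the novel's letters to bases is not given here, so the sketch below assumes one simple reading: scan the text, keep only the letters A, C, G and T, and emit each as a base while that base's quota remains. The function name and the toy inputs are hypothetical; this is not the authors' code.

```python
from collections import Counter


def composition_preserving_order(native_seq, source_text):
    '''Return a sequence with exactly the base counts of native_seq, ordered by
    the order in which the letters A, C, G and T occur in source_text.
    One possible reading of the described procedure, not the authors' code.'''
    native_seq = native_seq.upper()
    if set(native_seq) - set('ACGT'):
        raise ValueError('native sequence must contain only A, C, G and T')
    quota = Counter(native_seq)  # how many of each base are still needed
    out = []
    for letter in source_text.upper():
        if quota[letter] > 0:  # a base whose quota is not yet filled
            out.append(letter)
            quota[letter] -= 1
            if len(out) == len(native_seq):
                break
    if len(out) != len(native_seq):
        raise ValueError('source text ran out before every base quota was filled')
    return ''.join(out)


# Hypothetical toy inputs, not the real inter-ORF region or novel text.
print(composition_preserving_order('AACGT', 'Call a cat. Get tag.'))  # CAATG
```

Letters other than A, C, G and T (and bases whose quota is already used up) are simply skipped, so the output always has the native composition but a text-determined order.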
After 3 days of puromycin selection, cells were washed in 1× phosphate-buffered saline (PBS) and kept on ice before FACS analysis (FACSCalibur instrument; BD Biosciences, Sparks, MD, USA), using forward scatter versus green fluorescence plots. The gate for EGFP-positive cells was set by analyzing cells transfected with a control expression plasmid (pCEP-Puro; puromycin-resistant, EGFP-negative). A minimum of 20,000 cells per sample were analyzed. Data were analyzed using CellQuest software. A minimum of eight independent experiments was performed for each construct.: Retrotransposition assays with Neo-AI indicators were performed in HeLa cells. HeLa cells were maintained in DMEM (Invitrogen, Carlsbad, CA, USA) supplemented with 10% FBS and penicillin/streptomycin, and were passaged upon confluence. Cells were seeded at 2.5 × 10^5 per well in a six-well plate, transfected (FuGene6; Roche Applied Science, Indianapolis, IN, USA) the following day, and selected by growth in 2.5 µg/ml puromycin for 3 days. For each transfection, three 100 mm dishes were then seeded with 1 × 10^4 to 4 × 10^4 cells each and grown under G418 selection (500 µg/ml) for 10 to 14 days. A minimum of six independent experiments was performed for each construct. Retrotransposition activity was normalized to the activity of L1RP.: Northern blot assays: Total RNA was purified (RNeasy; Qiagen, Valencia, CA, USA), and 10 µg RNA was loaded on a 1.2% agarose/formaldehyde gel, blotted overnight onto a nylon membrane (GeneScreen Plus; PerkinElmer, Waltham, MA, USA) in 10× saline sodium citrate (SSC), crosslinked by ultraviolet radiation, and baked. Prehybridizations and hybridizations were both performed in an ultrasensitive hybridization buffer (ULTRAhyb; Applied Biosystems/Ambion, Austin, TX, USA) at 42°C. Washes were performed in 2× SSC with 0.1% sodium dodecyl sulfate (SDS) and in 0.1× SSC with 0.1% SDS.
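The two retrotransposition readouts in the cell culture methods reduce to simple arithmetic: a frequency of EGFP-positive cells among at least 20,000 analyzed cells, and an activity expressed relative to L1RP. The sketch below assumes that the relative value is the ratio of mean activities across independent experiments; the text does not say whether normalization was done per experiment. Function names and numbers are hypothetical.

```python
from statistics import mean

MIN_CELLS = 20_000  # minimum number of cells analyzed per sample, per the Methods


def egfp_frequency(egfp_positive, total_analyzed):
    '''Fraction of gated cells that are EGFP-positive in one sample.'''
    if total_analyzed < MIN_CELLS:
        raise ValueError('fewer cells analyzed than the stated minimum')
    return egfp_positive / total_analyzed


def relative_to_l1rp(construct_values, l1rp_values):
    '''Mean activity of a construct divided by the mean activity of L1RP.
    Assumes normalization of means, not per-experiment ratios.'''
    return mean(construct_values) / mean(l1rp_values)


# Hypothetical example values.
freq = egfp_frequency(600, 20_000)               # 0.03, i.e. 3% EGFP-positive
print(relative_to_l1rp([3.0, 5.0], [1.0, 1.0]))  # 4.0
```

Taking the ratio of means is only one design choice; averaging per-experiment ratios instead would weight each independent experiment equally and can give a different number when L1RP activity varies between experiments.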
Radioactive signals were detected with a phosphorimager (Typhoon; GE Healthcare, Piscataway, NJ, USA) and quantified using ImageQuant software. Northern probes were first amplified by PCR, gel-purified and labeled with [alpha-32P]-TP (Random Prime-It II Kit; Stratagene, Santa Clara, CA, USA). The primers listed in Table 3 were used to amplify the EGFP and ARPP0 probes.: Immunoblot assays: After the 3-day puromycin selection, cells were lysed in buffer (M-PER; Thermo Scientific, Rockford, IL, USA) and centrifuged for 15 minutes at 13,000 g to obtain the cleared cell lysate. The cell lysate (10 µl) was mixed with 10 µl 2× loading buffer (0.1 mol/l Tris-HCl pH 6.8, 4.0% SDS, 20% glycerol, 5% β-mercaptoethanol and 0.2% bromophenol blue), and samples were separated on 4% to 20% SDS-PAGE gels. After transfer to nitrocellulose, membranes were probed with an anti-ORF1p IgY antibody, which had been raised in chickens immunized with purified human L1 ORF1p overexpressed in Escherichia coli, and purified from egg yolks (Gallus Immunotech Inc, Fergus, ON, Canada). Western blots were developed with detection reagent (ECL-plus; GE Healthcare, Piscataway, NJ, USA) and imaged (LAS3000 instrument (Fujifilm) and Image Reader LAS-3000 software) at the 'high' setting. Signals in the resulting image files were quantified by band density using Multi-Gauge software (Fujifilm). Only nonsaturated signals were quantified, and the background was subtracted. Results were normalized to the tubulin controls and are presented as fold difference relative to the fully native construct. A total of three western blots were performed, and a representative blot is shown.: LEAP: LEAP was performed according to Kulpa et al. [23]. Briefly, 293T cells were transfected with pWA174 (L1RP) or pWA165 (ORFeus-Hs), and then selected on 1 µg/ml puromycin for 2 weeks.
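The immunoblot quantification in the Immunoblot assays section (background subtraction, normalization to tubulin, fold difference relative to the fully native construct) can be sketched as follows. This assumes a single background value per lane for both bands, which is a simplification, and assumes saturated bands were already excluded, as the text specifies. The function names and density values are hypothetical.

```python
def tubulin_normalized(orf1_density, tubulin_density, background):
    '''Background-subtracted ORF1p density divided by background-subtracted
    tubulin density. Assumes one background value applies to both bands.'''
    orf1_net = orf1_density - background
    tubulin_net = tubulin_density - background
    if tubulin_net <= 0:
        raise ValueError('tubulin signal must exceed background')
    return orf1_net / tubulin_net


def fold_vs_native(sample_lane, native_lane):
    '''Each lane is an (orf1_density, tubulin_density, background) tuple of
    nonsaturated band densities.'''
    return tubulin_normalized(*sample_lane) / tubulin_normalized(*native_lane)


# Hypothetical band densities, not values from the blots in this study.
native = (1100.0, 2100.0, 100.0)     # normalized ratio 0.5
synthetic = (2100.0, 2100.0, 100.0)  # normalized ratio 1.0
print(fold_vs_native(synthetic, native))  # 2.0
```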
On harvest day, ~800 million cells were washed three times with PBS and then resuspended in 10 ml cold PBS. Cells were pelleted at 3,000 g for 5 minutes in a swinging-bucket rotor, then lysed with 1 ml buffer (1.5 mmol/l KCl, 2.5 mmol/l MgCl2, 5 mmol/l Tris-HCl, 1% deoxycholic acid, 1% Triton X-100, 1× protease inhibitor cocktail) for 5 minutes on ice. The lysate was clarified by centrifugation at 3,000 g for 5 minutes at 4°C, and the supernatant was transferred onto an 8.5% to 17% sucrose cushion. The gradient was spun at 39,000 rpm (SW40.1 rotor; 178,000 g) for 2 hours at 4°C. The pellet was resuspended in 100 µl 5 mmol/l Tris (pH 7.5) with 1× protease inhibitor, and glycerol was added to a final concentration of 50%. RNP preparations were stored at -80°C.: For the LEAP reaction, various amounts of RNP were added to 50 µl of 50 mmol/l Tris-HCl pH 7.5, 50 mmol/l KCl, 5 mmol/l MgCl2, 10 mmol/l dithiothreitol, 0.4 µmol/l 3' LEAP primer (JB11560; Table 3), 20 units RNasin, 0.2 mmol/l dNTPs and 0.05% Tween 20, and incubated at 37°C for 1 hour. LEAP reaction product (1 µl) was used as template in a 50 µl PCR with 5 µl 10× PCR buffer (Roche Applied Science, Indianapolis, IN, USA), 5 µl 2.5 mmol/l dNTPs, 1 µl each of primers JB11564 (5 µmol/l) and JB14067 (5 µmol/l) (Table 3), and 1 µl FastStart Taq polymerase (Roche Applied Science, Indianapolis, IN, USA). The reaction consisted of an initial denaturation at 94°C for 5 minutes, followed by 35 cycles of 94°C for 30 seconds, 56°C for 30 seconds and 72°C for 30 seconds, and a final extension at 72°C for 7 minutes. An aliquot (10 µl) of each PCR product was loaded onto a 1.5% agarose gel. Band density was quantified using Multi-Gauge software.: Cell viability assay: 293T cells were transfected with pWA174 (L1RP), pLD225 (chimera B) or pWA165 (ORFeus-Hs) in six-well plates (2 µg L1 plasmid + 10 ng pCAG-eGFP per 40,000 cells).
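The rotor speed and force quoted for the sucrose-cushion spin are linked by the standard relation RCF = 1.118 × 10^-5 × r × N², with r the radius in cm and N the speed in rpm. The sketch below solves for the radius at which 39,000 rpm corresponds to 178,000 g. It only does that arithmetic and makes no claim about which rotor radius (minimum, average or maximum) the quoted figure refers to.

```python
K = 1.118e-5  # constant in RCF = K * r_cm * rpm**2


def rcf(radius_cm, rpm):
    '''Relative centrifugal force (multiples of g) at a radius in cm.'''
    return K * radius_cm * rpm ** 2


def radius_for_rcf(rcf_g, rpm):
    '''Radius in cm at which the given speed produces rcf_g.'''
    return rcf_g / (K * rpm ** 2)


r = radius_for_rcf(178_000, 39_000)
print(round(r, 1))            # 10.5
print(round(rcf(r, 39_000)))  # 178000
```

For these inputs the implied radius is about 10.5 cm. Comparing it with the manufacturer's published radii for the rotor shows which radius the quoted g-force refers to.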
The following day, cells were treated with trypsin, and equal numbers of cells were plated in a 96-well plate (~10,000 cells/well) with 2 µg/ml puromycin or without puromycin selection. Another portion of the cells was analyzed by FACS to determine transfection efficiency. Two days later, 20 µl cell-viability reagent (CellTiter-Blue®; Promega, Madison, WI, USA) was added to each well. The plate was incubated at 37°C for 2 hours and then read on a microplate reader at an excitation wavelength of 550 nm and an emission wavelength of 600 nm. Relative cell viability is presented as the ratio of viable puromycin-resistant cells to total viable cells (without puromycin selection). Four independent transfections were performed for each construct, and all values were normalized to the transfection efficiency determined by FACS.: References: Moran JV, Holmes SE, Naas TP, DeBerardinis RJ, Boeke JD, Kazazian HH: High frequency retrotransposition in cultured mammalian cells. Cell. 1996, 87: 917-927. 10.1016/S0092-8674(00)81998-4.: Dewannieux M, Esnault C, Heidmann T: LINE-mediated retrotransposition of marked Alu sequences. Nat Genet. 2003, 35: 41-48. 10.1038/ng1223.: Dombroski BA, Mathias SL, Nanthakumar E, Scott AF, Kazazian HH: Isolation of an active human transposable element. Science. 1991, 254: 1805-1808. 10.1126/science.1662412.: Fanning TG: Size and structure of the highly repetitive BAM HI element in mice. Nucleic Acids Res. 1983, 11: 5073-5091. 10.1093/nar/11.15.5073.: Fanning TG, Singer MF: LINE-1: a mammalian transposable element. Biochim Biophys Acta. 1987, 910: 203-212.: Grimaldi G, Skowronski J, Singer MF: Defining the beginning and end of KpnI family segments. EMBO J. 1984, 3: 1753-1759.: Loeb DD, Padgett RW, Hardies SC, Shehee WR, Comer MB, Edgell MH, Hutchison CA: The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol Cell Biol.
1986, 6: 168-182.: Scott AF, Schmeckpeper BJ, Abdelrazik M, Comey CT, O'Hara B, Rossiter JP, Cooley T, Heath P, Smith KD, Margolet L: Origin of the human L1 elements: proposed progenitor genes deduced from a consensus DNA sequence. Genomics. 1987, 1: 113-125. 10.1016/0888-7543(87)90003-6.: Hohjoh H, Singer MF: Sequence-specific single-strand RNA binding protein encoded by the human LINE-1 retrotransposon. EMBO J. 1997, 16: 6034-6043. 10.1093/emboj/16.19.6034.: Kolosha VO, Martin SL: In vitro properties of the first ORF protein from mouse LINE-1 support its role in ribonucleoprotein particle formation during retrotransposition. Proc Natl Acad Sci USA. 1997, 94: 10155-10160. 10.1073/pnas.94.19.10155.: Kolosha VO, Martin SL: High-affinity, non-sequence-specific RNA binding by the open reading frame 1 (ORF1) protein from long interspersed nuclear element 1 (LINE-1). J Biol Chem. 2003, 278: 8112-8117. 10.1074/jbc.M210487200.: Martin SL, Bushman FD: Nucleic acid chaperone activity of the ORF1 protein from the mouse LINE-1 retrotransposon. Mol Cell Biol. 2001, 21: 467-475. 10.1128/MCB.21.2.467-475.2001.: Feng Q, Moran JV, Kazazian HH, Boeke JD: Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996, 87: 905-916. 10.1016/S0092-8674(00)81997-2.: Mathias SL, Scott AF, Kazazian HH, Boeke JD, Gabriel A: Reverse transcriptase encoded by a human transposable element. Science. 1991, 254: 1808-1810. 10.1126/science.1722352.: Han JS, Szak ST, Boeke JD: Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature. 2004, 429: 268-274. 10.1038/nature02536.: Han JS, Boeke JD: A highly active synthetic mammalian retrotransposon. Nature. 2004, 429: 314-318. 10.1038/nature02535.: An W, Han JS, Wheelan SJ, Davis ES, Coombes CE, Ye P, Triplett C, Boeke JD: Active retrotransposition by a synthetic L1 element in mice. Proc Natl Acad Sci USA. 2006, 103: 18662-18667.
10.1073/pnas.0605300103.: Naas TP, DeBerardinis RJ, Moran JV, Ostertag EM, Kingsmore SF, Seldin MF, Hayashizaki Y, Martin SL, Kazazian HH: An actively retrotransposing, novel subfamily of mouse L1 elements. EMBO J. 1998, 17: 590-597. 10.1093/emboj/17.2.590.: Ostertag EM, Prak ET, DeBerardinis RJ, Moran JV, Kazazian HH: Determination of L1 retrotransposition kinetics in cultured cells. Nucleic Acids Res. 2000, 28: 1418-1423. 10.1093/nar/28.6.1418.: Lindtner S, Felber BK, Kjems J: An element in the 3' untranslated region of human LINE-1 retrotransposon mRNA binds NXF1(TAP) and can function as a nuclear export element. RNA. 2002, 8: 345-356. 10.1017/S1355838202027759.: Alisch RS, Garcia-Perez JL, Muotri AR, Gage FH, Moran JV: Unconventional translation of mammalian LINE-1 retrotransposons. Genes Dev. 2006, 20: 210-224. 10.1101/gad.1380406.: Li PW, Li J, Timmerman SL, Krushel LA, Martin SL: The dicistronic RNA from the mouse LINE-1 retrotransposon contains an internal ribosome entry site upstream of each ORF: implications for retrotransposition. Nucleic Acids Res. 2006, 34: 853-864. 10.1093/nar/gkj490.: Kulpa DA, Moran JV: Cis-preferential LINE-1 reverse transcriptase activity in ribonucleoprotein particles. Nat Struct Mol Biol. 2006, 13: 655-660. 10.1038/nsmb1107.: Belancio VP, Whelton M, Deininger P: Requirements for polyadenylation at the 3' end of LINE-1 elements. Gene. 2007, 390: 98-107. 10.1016/j.gene.2006.07.029.: Belancio VP, Hedges DJ, Deininger P: LINE-1 RNA splicing and influences on mammalian gene expression. Nucleic Acids Res. 2006, 34: 1512-1521. 10.1093/nar/gkl027.: Kulpa DA, Moran JV: Ribonucleoprotein particle formation is necessary but not sufficient for LINE-1 retrotransposition. Hum Mol Genet. 2005, 14: 3237-3248. 10.1093/hmg/ddi354.: Belgnaoui SM, Gosden RG, Semmes OJ, Haoudi A: Human LINE-1 retrotransposon induces DNA damage and apoptosis in cancer cells. Cancer Cell Int. 
2006, 6: 13. 10.1186/1475-2867-6-13.: Haoudi A, Semmes OJ, Mason JM, Cannon RE: Retrotransposition-Competent Human LINE-1 Induces Apoptosis in Cancer Cells With Intact p53. J Biomed Biotechnol. 2004, 2004: 185-194. 10.1155/S1110724304403131.: Gasior SL, Wakeman TP, Xu B, Deininger PL: The human LINE-1 retrotransposon creates DNA double-strand breaks. J Mol Biol. 2006, 357: 1383-1393. 10.1016/j.jmb.2006.01.089.: Wallace NA, Belancio VP, Deininger PL: L1 mobile element expression causes multiple types of toxicity. Gene. 2008, 419: 75-81. 10.1016/j.gene.2008.04.013.: Hatfield GW, Roth DA: Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering. Biotechnol Annu Rev. 2007, 13: 27-42.: Stemmer WP, Crameri A, Ha KD, Brennan TM, Heyneker HL: Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene. 1995, 164: 49-53. 10.1016/0378-1119(95)00511-4.: Huxley A: Brave New World. 1932, London: HarperCollins.: Acknowledgements: This work was supported in part by NIH grant CA16519.: Author information: Corresponding author: Correspondence to Jef D Boeke.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: WA, LD, JSH and JDB designed the experiments; WA, LD, AMN, AY, KAO and JSH performed the experiments. LD, AMN and JDB wrote the manuscript. All authors read and approved the final manuscript.: Wenfeng An, Lixin Dai, Anna Maria Niewiadomska contributed equally to this work.: Electronic supplementary material: Additional file 1: Supplementary Figures 1-4. (1) Alignment of native human L1RP with ORFeus-Hs. BioEdit was used to create a nucleic acid alignment of native human L1 and ORFeus-Hs, starting at the ATG of open reading frame (ORF)1 and ending at the stop codon of ORF2.
For these sequences, the base composition of L1RP is 40% A (1998), 21% C (1047), 19% G (906), 20% T (967), and that of ORFeus-Hs is 27% A (1322), 34% C (1648), 27% G (1314), 12% T (624). L1RP (GenBank accession number AF148856) was used as the native human L1 sequence. Identities are marked with asterisks. Start and stop codons of ORF1 and ORF2, and the restriction sites used to clone building blocks (Mfe I, BsmB I, Asc I, Age I, BstB I, Nru I, Xma I, Mlu I, Nhe I, EcoR V, Nde I, Cla I, Xho I) and to make native/synthetic chimeras (Pml I, EcoR I and BamH I), are highlighted in gray boxes. (2) Alignment of the different promoters used in this study. The Kozak sequence and the boundaries of the cytomegalovirus (CMV) promoter and the 5' untranslated region (UTR) are highlighted. (3) Quantification of the L1 element amplification protocol (LEAP). (A) LEAP was performed using the indicated amounts of ribonucleoprotein (RNP) preparation, and equal amounts of PCR product were loaded onto a 1.5% agarose gel. The arrow indicates the mobility of the LEAP PCR product on the gel. (B) Band densities were quantified using the Multi-Gauge program and plotted as a function of the amount of ORFeus-Hs RNP. A trend line was fitted to the values from lanes 3 to 6, and the X-axis values for lanes 1 and 2 (0.03 and 0.05 µg, respectively) were calculated from the trend line. Blue diamonds, data from ORFeus-Hs RNP (lanes 3 to 8); red triangles, data from L1RP RNP (lanes 1 and 2). X-axis, amount of ORFeus-Hs RNP (µg); Y-axis, pixel value. (4) Cell-viability assay. Relative cell viability is presented as the ratio of viable puromycin-resistant cells to total viable cells (without puromycin selection). Four independent transfections were performed for each construct, and triplicate readings were acquired from each transfection. All values were normalized to the transfection efficiency determined by fluorescence-activated cell sorting, and standard error is shown.
(PDF 3 MB): Additional file 2: Supplementary Tables. Tables 1 and 2. (DOC 81 KB): About this article: Cite this article: An, W., Dai, L., Niewiadomska, A.M. et al. Characterization of a synthetic human LINE-1 retrotransposon ORFeus-Hs. Mobile DNA 2, 2 (2011). https://doi.org/10.1186/1759-8753-2-2: Received: 27 October 2010: Accepted: 14 February 2011: Published: 14 February 2011: DOI: https://doi.org/10.1186/1759-8753-2-2: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "
"Reverse transcription of the pFOXC mitochondrial retroplasmids of Fusarium oxysporum is protein primed" "Jeffrey T Galligan, Sarah E Marchetti, John C Kennell" "John C Kennell" "21 January 2011" "The pFOXC retroplasmids are small, autonomously replicating DNA molecules found in mitochondria of certain strains of the filamentous fungus Fusarium oxysporum and are among the first linear genetic elements shown to replicate via reverse transcription. The plasmids have a unique clothespin structure that includes a 5'-linked protein and telomere-like terminal repeats, with pFOXC2 and pFOXC3 having iterative copies of a 5 bp sequence. The plasmids contain a single large open reading frame (ORF) encoding an active reverse transcriptase (RT). The pFOXC-RT is associated with the plasmid transcript in a ribonucleoprotein (RNP) complex and can synthesize full-length (-) strand cDNA products. In reactions containing partially purified RT preparations with exogenous RNAs, the pFOXC3-RT has been shown to initiate cDNA synthesis by use of snapped-back RNAs, as well as loosely associated DNA primers., The complete sequence of the distantly related pFOXC1 plasmid was determined and found to terminate in 3-5 copies of a 3 bp sequence. Unexpectedly, the majority of (-) strand cDNA molecules produced from endogenous pFOXC1 transcripts were attached to protein. In vitro experiments using partially purified pFOXC3-RT preparations having a single radiolabeled deoxyribonucleotide triphosphate (dNTP) generated a nucleotide-labeled protein that migrated at the size of the pFOXC-RT. The nucleotide preference of deoxynucleotidylation differed between pFOXC3 and pFOXC1 and showed complementarity to the respective 3' terminal repeats. 
In reactions that include exogenous RNA templates corresponding to the 3' end of pFOXC1, a protein-linked cDNA product was generated following deoxynucleotidylation, suggesting that reverse transcription initiates with a protein primer., The finding that reverse transcription is protein primed suggests the pFOXC retroplasmids may have an evolutionary relationship with hepadnaviruses, the only other retroelement family known to initiate reverse transcription via a protein primer. Moreover, the similarity to protein-primed linear DNA elements supports models in which the terminal repeats are generated and maintained by a DNA slideback mechanism. The ability of the pFOXC-RT to utilize RNA, DNA and protein primers is unique among polymerases and suggests that the pFOXC plasmids may be evolutionary precursors of a broad range of retroelements, including hepadnaviruses, non-long terminal repeat (non-LTR) retrotransposons and telomerase." "Reverse Transcription Reaction, Plasmid Transcript, cDNA Product, Micrococcal Nuclease, Retroelement Family" " Reverse transcription of the pFOXC mitochondrial retroplasmids of Fusarium oxysporum is protein primed: Jeffrey T Galligan, Sarah E Marchetti & John C Kennell: Mobile DNA volume 2, Article number: 1 (2011): Abstract: Background: The pFOXC retroplasmids are small, autonomously replicating DNA molecules found in mitochondria of certain strains of the filamentous fungus Fusarium oxysporum and are among the first linear genetic elements shown to replicate via reverse transcription. The plasmids have a unique clothespin structure that includes a 5'-linked protein and telomere-like terminal repeats, with pFOXC2 and pFOXC3 having iterative copies of a 5 bp sequence. The plasmids contain a single large open reading frame (ORF) encoding an active reverse transcriptase (RT).
The pFOXC-RT is associated with the plasmid transcript in a ribonucleoprotein (RNP) complex and can synthesize full-length (-) strand cDNA products. In reactions containing partially purified RT preparations with exogenous RNAs, the pFOXC3-RT has been shown to initiate cDNA synthesis by use of snapped-back RNAs, as well as loosely associated DNA primers.: Results: The complete sequence of the distantly related pFOXC1 plasmid was determined and found to terminate in 3-5 copies of a 3 bp sequence. Unexpectedly, the majority of (-) strand cDNA molecules produced from endogenous pFOXC1 transcripts were attached to protein. In vitro experiments using partially purified pFOXC3-RT preparations having a single radiolabeled deoxyribonucleotide triphosphate (dNTP) generated a nucleotide-labeled protein that migrated at the size of the pFOXC-RT. The nucleotide preference of deoxynucleotidylation differed between pFOXC3 and pFOXC1 and showed complementarity to the respective 3' terminal repeats. In reactions that include exogenous RNA templates corresponding to the 3' end of pFOXC1, a protein-linked cDNA product was generated following deoxynucleotidylation, suggesting that reverse transcription initiates with a protein primer.: Conclusions: The finding that reverse transcription is protein primed suggests the pFOXC retroplasmids may have an evolutionary relationship with hepadnaviruses, the only other retroelement family known to initiate reverse transcription via a protein primer. Moreover, the similarity to protein-primed linear DNA elements supports models in which the terminal repeats are generated and maintained by a DNA slideback mechanism. 
The ability of the pFOXC-RT to utilize RNA, DNA and protein primers is unique among polymerases and suggests that the pFOXC plasmids may be evolutionary precursors of a broad range of retroelements, including hepadnaviruses, non-long terminal repeat (non-LTR) retrotransposons and telomerase.: Background: Retroplasmids are autonomously replicating genetic elements that represent a lineage of mobile elements that replicate via reverse transcription. Thus far, retroplasmids have only been found in mitochondria of filamentous fungi and, like mitochondrial DNA plasmids, they exist in both linear and circular forms (reviewed in [1]). As a group, retroplasmids are relatively small and simple. They range in size from 1.9 to approximately 4.0 kb, and have a single open reading frame (ORF) that encodes a reverse transcriptase (RT). Retroplasmid RTs lack domains that are often associated with other RTs, such as an RNase H domain, and are thought to be ancestral to a broad range of retroelements because they are deeply rooted within RT phylogenies [2, 3]. Their primitive nature is supported by the mechanisms used to initiate cDNA synthesis: the RT encoded by the circular Mauriceville plasmid of Neurospora crassa has been shown to initiate cDNA synthesis de novo (that is, without a primer), suggesting that it is mechanistically related to RNA-dependent RNA polymerases [4], whereas the RT encoded by the linear pFOXC3 plasmid of Fusarium oxysporum can initiate reverse transcription using RNA or DNA primers having minimal base-pairing interactions with templates [5].
The structural and mechanistic features of the plasmids, together with their possible evolutionary origin in the precursors of mitochondria, have led to speculation that mitochondrial retroplasmids represent a type of 'molecular fossil', defined as a contemporary genetic element that is ancient in origin and can reveal information about the evolutionary past [4, 6, 7].: The pFOXC retroplasmids are 1.9 kb linear DNA molecules found in mitochondria of certain formae speciales of the fungal plant pathogen F. oxysporum. Plasmids identified to date fall into two homology groups, the pFOXC2/pFOXC3 group and the pFOXC1 group [1, 8, 9]. The RTs encoded by pFOXC2 and pFOXC3 have 93% amino acid sequence identity within the highly conserved RT domains, yet each shares less than 40% sequence identity with the pFOXC1-RT within these conserved domains [6], indicating a relatively high degree of evolutionary divergence between the two homology groups. The plasmid DNAs have a 'clothespin' structure that includes a hairpin at one terminus and telomere-like repeats at the other (Figure 1), with plasmids pFOXC2 and pFOXC3 each having 3-5 copies of a 5 bp sequence (5'-ATCTA; Table 1; [6]). Characterization of in vivo replication intermediates showed that the pentameric repeats are transcribed and that reverse transcription begins at or near the 3' end of the plasmid transcript. The number of repeats in (-) strand cDNA intermediates was found to be slightly greater than the number in the plasmid transcripts, suggesting that the terminal repeats are maintained and generated during the reverse transcription step of plasmid replication [5]. The existence of linear genetic elements that have telomere-like repeats and replicate via reverse transcription suggests that the pFOXC plasmids may be contemporary descendants of primitive chromosomes and/or have a direct evolutionary relationship to telomerase.
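To make the repeat counts concrete, the sketch below counts how many complete tandem copies of a repeat unit, such as the 5 bp 5'-ATCTA unit of pFOXC2 and pFOXC3, end a sequence read 5' to 3'. The toy sequence is hypothetical rather than plasmid sequence. The function counts only exact copies and ignores partial or variant repeats.

```python
def count_terminal_repeats(seq, unit):
    '''Number of complete, exact tandem copies of unit at the 3' end of seq
    (both strings read 5' to 3'). Partial or variant copies are not counted.'''
    seq, unit = seq.upper(), unit.upper()
    if not unit:
        raise ValueError('repeat unit must be non-empty')
    copies = 0
    end = len(seq)
    # Walk backwards from the 3' end one unit at a time while copies match.
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        copies += 1
        end -= len(unit)
    return copies


# Hypothetical toy sequence, not pFOXC plasmid DNA.
toy = 'GGCATTC' + 'ATCTA' * 4
print(count_terminal_repeats(toy, 'ATCTA'))  # 4
```

Comparing such counts between plasmid transcripts and (-) strand cDNA intermediates is the kind of comparison behind the observation that cDNAs carry slightly more repeats.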
In addition, the 3' repeats of the pFOXC plasmids bear a striking resemblance to the 3' tails of certain long and short interspersed elements (LINEs and SINEs, respectively) that have been shown to generate additional repeats during retrotransposition [10, 11].: Schematic diagram of the pFOXC plasmids. The pFOXC retroplasmids are approximately 1.9 kb linear, double-stranded DNA molecules that have a clothespin structure. They possess a covalently closed hairpin and iterative terminal repeats (black boxes) with a 5'-linked protein (circle). The plasmids have a single open reading frame (ORF) encoding a reverse transcriptase (RT; open box). The locations of conserved domains characteristic of reverse transcriptases are indicated by shaded regions 1, 2, 2a and 3-7, with domain 2a also being conserved among non-long terminal repeat (non-LTR) retrotransposons. Details concerning the plasmid ORFs and iterative repeats are given in Table 1.: The pFOXC plasmids also have a 5'-linked protein, which is reminiscent of a variety of linear DNA and RNA genetic elements that replicate by use of a protein primer. Protein-primed replication has been most thoroughly studied with linear DNA viruses, such as adenovirus and bacteriophage φ29 (reviewed in [12, 13]). These linear DNA molecules possess a covalently linked 5' terminal protein (TP). During initiation of replication, the TP associates with the viral DNA polymerase, and the TP-polymerase complex binds to the genome terminus, which includes one or more iterations of a 1-3 nucleotide sequence. The polymerase catalyzes formation of a phosphodiester bond between the initiating nucleotide and the hydroxyl group of a serine, threonine or tyrosine residue of the TP [13]. Linear mitochondrial DNA plasmids also possess covalently bound 5' terminal proteins, and their replication appears to initiate via protein priming associated with the plasmid-encoded DNA polymerase [14, 15].
To date, hepadnaviruses are the only retroelement family shown to employ a protein primer. The partially double-stranded, circular hepadnaviral genome encodes an RT that initiates reverse transcription at a specific site of a structured RNA (ε). It does so using a tyrosine residue located in the TP domain of the RT itself [16, 17]. Similar protein-priming mechanisms have also been reported for certain RNA viruses (for example, poliovirus [18]).

Previous studies used isolated mitochondrial ribonucleoprotein (mtRNP) particles from pFOXC plasmid-containing strains. They demonstrated that reverse transcription begins opposite the 3' end of the plasmid RNA and can generate full-length (-) strand cDNA products [6]. Studies using an in vitro reverse transcription system showed that the pFOXC-RT can initiate cDNA synthesis using snapped-back RNAs or DNA primers. The RT preferred DNA primers that bound to the 3'-most terminal repeat of in vitro synthesized RNA templates. The RT was also found to copy DNA templates and to extend weakly associated primers with up to three 3' mismatches to the template [5].

Here, we provide evidence that the pFOXC retroplasmids can use a protein primer to initiate reverse transcription. Analysis of reverse transcription reactions using pFOXC1-containing mtRNPs demonstrated that a large portion of the products is associated with protein. An in vitro system using partially purified pFOXC1-RT preparations with exogenous RNA templates corresponding to the 3' end of the pFOXC1 plasmid produced a protein-linked cDNA. Analogous reactions using pFOXC3-containing mtRNPs revealed a nucleotide-linked protein that migrated at the size of the plasmid RT, suggesting that the RT serves as the primer.
These findings suggest that the pFOXC plasmids are evolutionarily related to protein-primed genetic elements, including hepadnaviruses. This finding may shed new light on the evolutionary origins of the telomerase complex.

Results

Prior studies of reverse transcription reactions using mtRNPs from plasmid-containing strains failed to provide evidence that large RNA or DNA molecules are associated with nascent cDNA replication intermediates [6]. However, studies of in vitro reactions using exogenous templates indicated that the pFOXC3-RT could use snapped-back RNAs or loosely associated DNAs to prime cDNA synthesis [5]. To investigate whether snapped-back RNAs (or possibly small DNAs) are used for initiation in vivo, plasmid replication products were re-examined by primer extension analysis in the hope of capturing remnants of nucleic acid primers. For this analysis, total nucleic acids were isolated from mitochondria. They were used in reactions with an end-labeled primer complementary to a region approximately 100 nucleotides downstream of the 5' end of (-) strand cDNAs. Moloney murine leukemia virus (MMLV)-RT was used to extend the primers because it can copy both RNA and DNA templates, and products were separated on a denaturing polyacrylamide gel. A minor fraction (<5%) of the labeled products extended beyond the site corresponding to the 3' end of the plasmid DNA (data not shown). These products were isolated, amplified by anchored PCR, cloned and sequenced. The sequences of several clones indicated that the primer extension products terminated at the site corresponding to the 3' end of the plasmid. They carried no additional sequence suggestive of an attached primer. It is likely that a fraction of the primer extension products reassociated with DNA or cDNA templates during electrophoresis, or were otherwise retarded during migration into the polyacrylamide gel.
These findings, together with previous analysis of plasmid replication intermediates [5], provided no evidence of nucleic acid primers associated with nascent (-) strand cDNAs in vivo.

Analysis of the distantly related pFOXC1

To gain additional insight into the mechanism of reverse transcription of the pFOXC retroplasmids, experiments were carried out with pFOXC1, which is found in the mitochondria of F. oxysporum f. sp. conglutinans. Previous reports indicated that a 785 bp internal BglII restriction fragment of pFOXC1 contained a potential open reading frame with similarity to the RTs of pFOXC2 and pFOXC3 [6, 19]. However, the amino acid sequence identity within the highly conserved RT regions was surprisingly low (<40%), indicating that pFOXC1 is distantly related. To complete the sequence of pFOXC1, the terminal regions of the plasmid DNA were cloned using an anchored PCR approach, and multiple clones were sequenced. The complete plasmid DNA is approximately 1,867 bp; the length is approximate because the 3' end of the plasmid DNA is heterogeneous (GenBank accession no. HQ026775; Table 1). The plasmid contains a single, large open reading frame predicted to encode a 497 amino acid polypeptide with the highly conserved domains of known reverse transcriptases. The predicted pFOXC1-RT is 30 amino acids shorter than the RTs of pFOXC2 and pFOXC3 and is among the shortest functional RTs described to date. As in the other retroplasmids, the 3' end of the pFOXC1 plasmid has short repeats. Interestingly, they differ in both length and sequence from the terminal repeats of pFOXC2 and pFOXC3.
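Because the 3' terminus of these plasmids is heterogeneous in length, each cloned end can be scored by the number of complete repeat units it ends with. The units are 5'-ATCTA in pFOXC2 and pFOXC3 and the trinucleotide 5'-CAA in pFOXC1. The following minimal Python sketch counts them. It assumes clone sequences are written 5' to 3' and end exactly at the plasmid terminus, and it ignores partial units. The helper name and the example sequences are hypothetical; they are not the sequenced clones.

```python
def count_terminal_repeats(seq: str, unit: str) -> int:
    """Count complete, tandem copies of `unit` at the 3' end of `seq`.

    Sequences are compared case-insensitively; partial copies are ignored.
    """
    seq, unit = seq.upper(), unit.upper()
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    count = 0
    end = len(seq)
    # Step backwards from the 3' end one repeat unit at a time.
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


# Hypothetical clone ends (illustrative only, not sequenced data).
examples = [
    ("GGTACCTTA" + "CAA" * 3, "CAA"),
    ("GGTACCTTA" + "CAA" * 5, "CAA"),
    ("GTTCA" + "ATCTA" * 4, "ATCTA"),
]
for clone_end, unit in examples:
    print(unit, count_terminal_repeats(clone_end, unit))
```

Tallying these counts across many clones gives the copy-number range (for example, three to five) reported for a heterogeneous terminus.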
Sequences of clones of the terminus downstream of the ORF showed that this end terminates with three to five copies of the trinucleotide 5'-CAA (Table 1 and Additional file 1).

Reverse transcription reactions using mtRNP particles isolated from a pFOXC1-containing strain (777) were subjected to a variety of pretreatments and post-treatments, as previously described for pFOXC2-containing and pFOXC3-containing strains [6]. Mitochondrial RNPs isolated from the pFOXC1-containing strain had reverse transcriptase activity, as measured by the incorporation of [32P]-labeled dNTPs into high molecular weight products. The level of activity was comparable to that previously reported for pFOXC2-containing and pFOXC3-containing strains. Reverse transcriptase activity was distinguished from DNA polymerase activity by its sensitivity to pretreatment with RNase A and its insensitivity to the DNA-dependent DNA polymerase inhibitor actinomycin D (Additional file 2).

When labeled cDNA products were analyzed on agarose gels, most of the reaction products were unexpectedly retained in the loading well (Figure 2a, lanes 1-3). To prevent protein aggregation, SDS was included in the gel (to a concentration of 0.2%), and reactions were heated in the presence of 0.2% SDS (65°C, 5 min) prior to electrophoresis. This treatment enabled the majority of the labeled products to migrate into agarose gels (Figure 2a, lanes 4-6). The cDNA products were in the size range of 0.5 to 2.0 kb. A larger product (>12 kb) was also found in reactions that were not pretreated with actinomycin D (Figure 2a, lane 4), which most likely reflects mitochondrial DNA polymerase activity. As expected, labeled products were not generated when the mtRNPs were pretreated with RNase A (Figure 2a, lane 6).

Figure 2. A portion of pFOXC1 and pFOXC3 endogenous reverse transcription products is associated with protein.
(a) Mitochondrial ribonucleoproteins (mtRNPs) from pFOXC1-containing strains were untreated (lanes 1 and 4), or pretreated with actinomycin D (A; lanes 2 and 5) or RNase A (R; lanes 3 and 6) for 5 min prior to reverse transcription reactions. Following precipitation, products were heated and electrophoresed in 1.2% agarose gels without (lanes 1-3) or with (lanes 4-6) 0.2% SDS in the gel and loading buffer. (b) Products from endogenous reverse transcription reactions were pretreated with actinomycin D and used pFOXC1-containing (lanes 1-6) or pFOXC3-containing (lanes 7-9) mtRNPs. They were subjected to the following treatments: lane 1, no treatment; lane 2, incubation with proteinase K (K); lane 3, extraction with phenol-CIA (φ); lane 4, incubation with proteinase K followed by extraction with phenol-CIA (K, φ). Lanes 5 and 6 contain acetone-precipitated products recovered from the organic phase of the phenol extractions shown in lanes 3 and 4, respectively. Minus-strand cDNA products from pFOXC3-containing mtRNPs were subjected to the following treatments: lane 7, no treatment; lane 8, incubation with proteinase K followed by extraction with phenol-CIA (K, φ); lane 9, extraction with phenol-CIA (φ). Products were heated at 65°C in 0.2% SDS and then electrophoresed in a 1.2% agarose gel containing 0.2% SDS. Sizes of 5'-end-labeled λ-PstI restriction fragment markers are indicated in kb on the left. The location of the wells is indicated with a gray arrow. The full-length (-) strand cDNA product is indicated on the right with a black arrow. A high molecular weight band detected in reactions lacking actinomycin D is indicated with an asterisk.

When products of the endogenous reactions were resolved further by extending the time of electrophoresis, a major cDNA product was observed that migrated at 1.9 kb, matching the length of the plasmid RNA.
This is analogous to the major products obtained with pFOXC2-containing and pFOXC3-containing mtRNPs, which were previously shown to represent (-) strand cDNA products (Figure 2b; [6]). Products of reverse transcription reactions using pFOXC1-containing mtRNPs were extracted with phenol-chloroform-isoamyl alcohol (CIA). Only a small portion was recovered by ethanol precipitation of the aqueous phase; the majority of the cDNA products remained in the organic phase (Figure 2b, lane 3). Treatment of the reaction products with proteinase K slightly increased their migration compared with untreated reactions (Figure 2b, compare lanes 1 and 2). It also prevented cDNA products from being extracted by phenol-CIA (Figure 2b, lane 4). To confirm that the untreated cDNA products extracted with phenol-CIA were trapped in the organic phase, the phenol-CIA layer was precipitated with acetone, and labeled products were recovered (Figure 2b, lane 5). In contrast, no labeled products were recovered from the organic phase of extractions of reactions that had been post-treated with proteinase K (Figure 2b, lane 6). These results suggest that the products of reverse transcription are attached to protein.

Given the effect that SDS had on the pFOXC1 reaction products, parallel reactions were conducted using mtRNPs containing pFOXC3 (Figure 2b, lanes 7-9). A fraction of the cDNAs was removed by extraction with phenol-CIA (Figure 2b, lane 9). This suggests that a smaller, yet significant, portion of pFOXC3 cDNA products is associated with protein and had previously been overlooked.
Taken together, these results indicate that varying portions of the labeled (-) strand cDNA products derived from endogenous reverse transcription reactions are attached to protein.

Evidence of protein-primed reverse transcription

A portion of endogenous (-) strand cDNA products is attached to protein. In addition, the plasmid DNA was previously shown to be retained in the organic phase following phenol extraction unless pretreated with proteinase K. It is also insensitive to digestion with λ exonuclease, a 5'-to-3' exonuclease [6, 9]. Together, these observations led us to hypothesize that the pFOXC-RT uses a protein to prime reverse transcription. To test this hypothesis, reverse transcription reactions were performed using exogenous RNAs corresponding to the 3' end of the plasmid transcripts ('exogenous' reactions). As described previously [5], endogenous nucleic acids associated with mtRNP particles were first degraded by treatment with micrococcal nuclease (MN) to liberate the plasmid RT. Free Ca2+ was then chelated with ethylene glycol tetraacetic acid (EGTA). The MN-treated mtRNPs were used in reactions containing in vitro synthesized RNAs, an [α-32P]-labeled nucleotide and a full complement of unlabeled nucleotides.
However, in contrast to previous studies, in which the labeled products were precipitated with ethanol and resolved on denaturing urea polyacrylamide gels, exogenous reactions were terminated by boiling in Laemmli buffer. Products were then resolved by SDS-PAGE to detect possible protein-linked cDNA products.

In reactions using MN-treated pFOXC1-containing mtRNPs, cDNA products were not detected in the absence of an exogenous RNA (Figure 3a, lane 1). A 92 nucleotide transcript was then included; it corresponds to the 3' end of the pFOXC1 RNA and carries four copies of the 5'-CAA repeat. With this transcript present, a discrete band migrating at approximately 120 kDa was observed, together with non-discrete products migrating below 35 kDa (Figure 3a, lane 2). Omitting thymidine triphosphate (TTP) or deoxyguanosine triphosphate (dGTP) from the reactions prevented synthesis of both the approximately 120 kDa product and the majority of the products under 35 kDa. In contrast, omitting deoxycytidine triphosphate (dCTP) only slightly reduced the intensity of the 120 kDa band (Figure 3a, lanes 3-5). Significantly, when the products of a reaction containing an exogenous RNA and a full complement of nucleotides were post-treated with proteinase K, the 120 kDa product was eliminated. A prominent band was then observed at approximately 34 kDa (Figure 3a, lane 6). When a duplicate reverse transcription reaction was extracted with phenol-CIA, labeled products migrating above approximately 35 kDa were eliminated (Figure 3a, lane 7). To demonstrate that the 120 kDa product was retained in the organic phase following extraction, the phenol-CIA layer was precipitated with acetone, and the products were resolved on SDS polyacrylamide gels. The 120 kDa product was recovered from the organic phase (Figure 3a, lane 9), which strongly suggests that the labeled product is covalently attached to protein.
In contrast, the non-discrete reverse transcription products migrating below 35 kDa were neither affected by proteinase K treatment nor removed by extraction with phenol-CIA. This indicates that they are not associated with protein and are likely cDNAs initiated from snapped-back RNAs.

Figure 3. Protein-primed reverse transcription and identification of a nucleotide-linked protein in exogenous reverse transcription reactions. Mitochondrial ribonucleoprotein (mtRNP) particles were treated with micrococcal nuclease (MN) and incubated with actinomycin D. They were then used in reactions containing [α-32P]dATP with various combinations of deoxyribonucleotide triphosphates (dNTPs), in the absence or presence of an exogenous RNA, as indicated. (a) MN-treated pFOXC1-containing mtRNP particles were incubated in the absence (lane 1) or presence (lane 2) of a 92 nucleotide RNA corresponding to the 3' terminus of the pFOXC1 transcript, with a full complement of nucleotides. Reactions in lanes 3-5 were performed as in lane 2, but a single unlabeled nucleotide was omitted, as indicated. Products from exogenous reactions containing a full complement of nucleotides and exogenous RNA were post-treated with proteinase K (K; lane 6) or extracted with phenol-CIA (φ; lane 7) prior to precipitation with ethanol. Products from a duplicate of the reaction in lane 2 were recovered from the aqueous (lane 8) and organic (lane 9) phases. (b) MN-treated pFOXC3-containing mtRNPs were incubated with [α-32P]dATP and combinations of unlabeled nucleotides, as indicated. Incubations were carried out in the absence (lanes 2-4) or presence (lanes 1, 5 and 6) of a 98 nucleotide RNA corresponding to the 3' terminus of the pFOXC3 transcript. Products from a duplicate of the reaction shown in lane 5 were post-treated with proteinase K (K; lane 6). All reactions were boiled in Laemmli buffer prior to separation by 4-20% gradient SDS-PAGE. Prestained protein size markers are indicated on the left in kDa.
Protein-linked products and (-) strand cDNA products are indicated on the right.

Similar experiments were performed with pFOXC3-containing mtRNPs. Exogenous reactions contained a 98 nucleotide RNA corresponding to the 3' end of the pFOXC3 plasmid, with three copies of the 5 bp repeat, and used [α-32P]dATP with a full complement of nucleotides. In these reactions, labeled products were observed in the range of 20-115 kDa, with the majority below 35 kDa (Figure 3b, lane 1). As in the experiments using MN-treated pFOXC1 mtRNPs, reactions were performed in the presence and absence of an RNA template and with different combinations of nucleotides. Interestingly, most of these reactions produced a single radiolabeled band that migrated at 60 kDa (Figure 3b, lanes 2-4). The intensity of the band varied with the specific combination of unlabeled nucleotides used. Unlike the product of reactions with pFOXC1-containing mtRNPs, its generation did not depend on the presence of an exogenous RNA template. Post-treatment of exogenous reactions with proteinase K eliminated the 60 kDa product. It had little effect on the RNA-primed (that is, snapped-back) cDNA products migrating below 35 kDa (Figure 3b, compare lanes 5 and 6). Experiments with mtRNPs isolated from a plasmid-free strain of F. oxysporum failed to produce labeled products (data not shown). Collectively, the results suggest that reactions using MN-treated pFOXC3-containing mtRNPs produce a nucleotide-linked protein in the absence of an exogenous RNA template.

Identification of the nucleotide-linked protein in pFOXC3-containing mtRNPs

The predicted size of the pFOXC3-RT polypeptide is 62 kDa, approximately the size of the nucleotide-linked protein detected in reverse transcription reactions using MN-treated pFOXC3-containing mtRNPs.
To determine whether the labeled protein is the pFOXC3-RT itself, an antibody was raised against a synthetic peptide corresponding to amino acids 55-68 of the predicted pFOXC3-RT polypeptide (C3-RT55-68). This antibody detected a protein that migrates at 58-60 kDa in pFOXC3-containing mtRNPs and is absent from mtRNPs isolated from a plasmid-free (P-F) strain (Figure 4 and data not shown). The C3-RT55-68 antibody was then used to identify the pFOXC3-RT following exogenous reverse transcription reactions. For this experiment, products from reverse transcription reactions using MN-treated pFOXC3-containing mtRNPs were separated by SDS-PAGE and transferred to a nitrocellulose membrane. The membrane was then probed with the C3-RT55-68 antibody. A band corresponding to the pFOXC3-RT was identified at 60 kDa (Figure 4a), and this band was absent from mtRNPs of the plasmid-free strain. The membrane was also subjected to phosphorimager analysis. A single radiolabeled band representing the nucleotide-linked protein was detected at the same position as the pFOXC3-RT. These results suggest that the protein covalently linked to DNA in exogenous reverse transcription reactions is the pFOXC3-RT itself.

Figure 4. A nucleotide-linked protein comigrates with pFOXC3-reverse transcriptase (RT) and is unaffected by strong base treatment. Reverse transcription reactions used mitochondrial ribonucleoprotein (mtRNP) particles isolated from a pFOXC3-containing strain or a plasmid-free strain (P-F). Reactions contained [α-32P]dATP and unlabeled deoxyguanosine triphosphate (dGTP) and thymidine triphosphate (TTP). Products were separated by 10% SDS-PAGE and transferred to nitrocellulose. (a) The two panels on the left are from a nitrocellulose membrane probed with protein A-purified pFOXC3-RT55-68 rabbit antiserum and visualized by chemiluminescence. The panel on the right is a phosphorimage of radiolabeled protein on the same nitrocellulose membrane.
Prestained protein size markers are indicated on the left in kDa. The gray arrow indicates non-specific bands detected in the plasmid-free mtRNP preparation. (b) Micrococcal nuclease (MN)-treated pFOXC3-containing mtRNPs were incubated with [α-32P]dATP and unlabeled dGTP and TTP. The products were separated by 4-20% gradient SDS-PAGE, and radiolabeled products were detected with a phosphorimager. The gel was then rehydrated in 1 M KOH and incubated at 55°C. It was then neutralized and dried prior to detection with a phosphorimager.

To provide further evidence that the pFOXC3-RT is the nucleotide-linked protein, we attempted to use the pFOXC3-RT55-68 antibody to immunoprecipitate (IP) the labeled protein. We modified the buffer, detergents (both anionic and non-ionic) and antibody affinity resins. Despite these modifications, no combination of antibody, buffer, detergent and beads was found that selectively captured the nucleotide-linked protein. In a separate approach, we attempted to identify the isolated nucleotide-linked protein by mass spectrometry. Unfortunately, the initial attempt was inconclusive; none of the reported peptides had significant homology to the RT or to mitochondrial or fungal proteins (data not shown). Thus, at this point, we do not have definitive evidence that the protein labeled in the RT reactions is the pFOXC3-RT. It remains the leading candidate, and alternative approaches to identify the labeled protein are being explored.

Genetic elements that initiate DNA synthesis via protein priming use the hydroxyl group in the side chain of a serine, threonine or tyrosine of the primer protein to initiate synthesis (reviewed in [13]). Phosphoserine and phosphothreonine linkages have been shown to be labile under strong base treatment, whereas phosphotyrosine linkages are not [20].
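The inference drawn from the strong-base experiment rests on the stability rule just stated: phosphoserine and phosphothreonine linkages are base-labile, whereas phosphotyrosine is base-stable [20]. The following minimal Python sketch encodes that lookup. It assumes the 32P label is lost only through cleavage of the nucleotide-protein bond; other causes, such as protein loss from the gel, are not modelled. The table and function names are hypothetical.

```python
# Behaviour of phosphoamino acid linkages under strong base (e.g. 1 M KOH at
# 55°C), as summarized in the text [20]: True means the linkage survives.
ALKALI_STABLE = {
    "serine": False,
    "threonine": False,
    "tyrosine": True,
}


def candidate_priming_residues(label_retained: bool) -> list[str]:
    """Residues whose linkage behaviour matches the observed KOH outcome.

    Simplifying assumption: the 32P label is lost only if the
    nucleotide-protein bond itself is cleaved.
    """
    return sorted(res for res, stable in ALKALI_STABLE.items() if stable == label_retained)


print(candidate_priming_residues(True))   # ['tyrosine']
print(candidate_priming_residues(False))  # ['serine', 'threonine']
```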
To determine which type of amino acid residue forms the nucleotide linkage, reverse transcription products from MN-treated pFOXC3-containing mtRNPs were separated by SDS-PAGE. The products were then incubated under alkaline conditions (1 M KOH, 55°C, 2 h) previously shown to cleave phosphoserine and phosphothreonine linkages [20]. Following base treatment, the reaction products retained the 32P label, suggesting that the linkage occurs through a tyrosine residue of the protein (Figure 4b).

Nucleotide preference of deoxynucleotidylation

As shown in Figure 3b, the intensity of the 60 kDa nucleotide-linked protein (nucleotide-protein) varied with the combination of dNTPs used in the reactions. We investigated the specificity of the deoxynucleotidylation activity that generates the 60 kDa nucleotide-protein product. A comprehensive set of experiments was performed using one radiolabeled dNTP with different combinations of unlabeled dNTPs (summarized in Tables 2 and 3). The 60 kDa nucleotide-protein product was generated with each of the four radiolabeled deoxynucleotides alone. However, when additional nucleotides were included, the band was markedly more intense in reactions containing dATP and/or dGTP. In most cases, adding one, two or three unlabeled dNTPs increased the level of the 60 kDa nucleotide-protein product relative to reactions with a single deoxynucleotide (Table 2). In reactions using [α-32P]dATP plus one additional nucleotide, the relative amount of the 60 kDa nucleotide-protein increased slightly with dGTP (1.6-fold) and dCTP (1.3-fold) and was unaffected by TTP. In reactions with [α-32P]dGTP, adding dATP led to a threefold increase in band intensity. A more pronounced effect was observed in reactions with two additional unlabeled nucleotides.
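The fold changes quoted here and in Table 2 express band intensity relative to the reaction containing the labeled dNTP alone. The following minimal Python sketch shows that normalization. It assumes phosphorimager volumes with an optional background value subtracted; the source does not state how background was handled. The function name is hypothetical. The volumes below are invented solely so that the ratios match those reported for [α-32P]dATP plus one unlabeled dNTP; they are not measurements.

```python
def fold_change(signal: float, reference: float, background: float = 0.0) -> float:
    """Background-subtracted band intensity relative to a reference lane."""
    ref = reference - background
    if ref <= 0:
        raise ValueError("reference signal must exceed background")
    return (signal - background) / ref


# Invented volumes (arbitrary units) chosen to reproduce the ratios quoted in
# the text for [alpha-32P]dATP plus one unlabeled dNTP; not real measurements.
volumes = {
    "dATP alone": 1000.0,
    "+ dGTP": 1600.0,
    "+ dCTP": 1300.0,
    "+ TTP": 1000.0,
}

reference = volumes["dATP alone"]
for lane, volume in volumes.items():
    print(f"{lane}: {fold_change(volume, reference):.1f}-fold")
```

Subtracting the same background from both lanes before dividing keeps a uniform gel background from compressing large fold changes toward 1.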
Figure 5a shows the results of reactions carried out with [α-32P]dATP alone and in the presence of dCTP and dGTP, or of dGTP and TTP. The intensity of the 60 kDa nucleotide-protein increased more than 10-fold in the presence of two unlabeled dNTPs compared with [α-32P]dATP alone (Table 2). The increase was smaller when dCTP and TTP were used (2.6-fold). Similar results were obtained with [α-32P]dGTP as the label: the highest increases were observed with dATP plus dCTP and with dATP plus TTP, but not with dCTP plus TTP. Experiments with [α-32P]dCTP showed that dATP with dGTP was the most effective combination (2.4-fold increase). In reactions with [α-32P]TTP, all three two-nucleotide combinations (dATP with dGTP, dATP with dCTP, and dCTP with dGTP) were effective (8- to 11-fold increase). The results differed somewhat depending on which nucleotide was used as the label. Nevertheless, these experiments indicate that the deoxynucleotidylation associated with pFOXC3-containing mtRNPs requires two or more additional deoxynucleotides for maximum labeling and has a strong preference for dATP and dGTP.

Figure 5. Analysis of the nucleotide-linked protein in the presence of different combinations of deoxynucleotides and dideoxynucleotides. Micrococcal nuclease (MN)-treated pFOXC3-containing mitochondrial ribonucleoproteins (mtRNPs) were incubated with [α-32P]dATP or [α-32P]deoxyguanosine triphosphate (dGTP), and with unlabeled deoxyribonucleotide triphosphates (dNTPs) or dideoxynucleotide triphosphates (ddNTPs), as indicated. Reactions were terminated with Laemmli buffer, and reaction products were separated by 4% to 20% SDS-PAGE. Prestained protein size markers are indicated on the left in kDa. (a) Reactions containing [α-32P]dATP or [α-32P]dGTP with one unlabeled deoxynucleotide. (b) Reactions containing [α-32P]dATP or [α-32P]dGTP with a full complement of deoxynucleotides or dideoxynucleotides.
(c) Reactions containing [α-32P]dGTP and a single dideoxynucleotide with a complement of deoxynucleotides, as indicated. (d) Reactions containing [α-32P]dATP and a full complement of deoxynucleotides, incubated with or without phosphonoformate (PFA) at the indicated concentrations. Prestained protein size markers are indicated on the left in kDa, and bands showing a slight size difference are indicated by an arrow.

Experiments using MN-treated pFOXC3-containing mtRNPs with RNAs corresponding to the 3' end of pFOXC3 did not yield a discrete product representing a protein-linked cDNA like that observed in reactions with pFOXC1-containing mtRNPs. However, MN-treated pFOXC3 mtRNPs were also used in reactions containing [α-32P]dATP or [α-32P]dGTP with a full complement of unlabeled dNTPs in the absence of an RNA. These reactions unexpectedly yielded a radiolabeled cDNA product that migrated at approximately 115 kDa (Figure 3b, lane 1; Figure 5b-d). In a side-by-side comparison on a 10% SDS-polyacrylamide gel, this product was smaller than the 120 kDa product observed in reactions with pFOXC1-containing mtRNPs and an exogenous RNA. The 115 kDa product was sensitive to proteinase K treatment. When analyzed on denaturing 8 M urea polyacrylamide gels, its cDNAs varied in length, with the majority shorter than 50 nucleotides (not shown). The intensity of the 115 kDa band varied between mtRNP preparations. This suggests that it could represent cDNA products derived from RNA templates protected from micrococcal nuclease treatment. This was further supported by experiments using dideoxynucleotide triphosphates (ddNTPs). When ddNTPs were included in place of unlabeled dNTPs, the 115 kDa product was eliminated. The 60 kDa band was unaffected compared with reactions containing only the labeled deoxynucleotide (Figure 5b; compare lane 3 with 1, and 6 with 4).
In reactions containing a single dideoxynucleotide, dideoxyadenosine triphosphate (ddATP) and dideoxythymidine triphosphate (ddTTP) completely eliminated the 115 kDa product. Reactions containing dideoxycytidine triphosphate (ddCTP) yielded products that migrated between 60 and 115 kDa (Figure 5c). Previous studies of protein-primed reverse transcription in hepadnaviruses showed that deoxynucleotidylation can be distinguished from DNA synthesis by its insensitivity to the pyrophosphate analog phosphonoformate (PFA; [21]). Figure 5d shows that the 60 kDa nucleotide-protein product forms at PFA concentrations up to 10 mM, whereas the 115 kDa product is substantially reduced. This demonstrates that the 60 kDa and 115 kDa products derive from separate reactions and further suggests that the latter represents a protein-linked cDNA.

The 60 kDa band became more intense when [α-32P]dATP or [α-32P]dGTP was used with two or more unlabeled dNTPs. This suggests that multiple nucleotides are added to a protein in these reactions. A slight increase in the size of the nucleotide-linked protein was also observed in reactions containing more than one dNTP (Figure 5b and data not shown). To determine whether there was a preferred order of nucleotide addition, single dideoxynucleotides were included in reactions containing a labeled dNTP and two unlabeled dNTPs (Table 3). The inclusion of ddATP inhibited synthesis of the 60 kDa nucleotide-protein in all cases. It caused a 40% reduction in band intensity in a reaction using [α-32P]dGTP with dCTP and TTP. The inclusion of dideoxyguanosine triphosphate (ddGTP) inhibited reactions containing dATP and dCTP and those containing dCTP and TTP, but not those containing dATP and TTP.
These findings suggest that deoxyadenosine monophosphate (dAMP) is added before deoxyguanosine monophosphate (dGMP) in the deoxynucleotidylation reactions.

Deoxynucleotidylation precedes cDNA synthesis

The nucleotide preference for production of the 120 kDa cDNA-protein product associated with pFOXC1-containing mtRNPs was also examined. In reactions with [α-32P]dATP and a full complement of unlabeled dNTPs, a 120 kDa protein-linked cDNA was produced. However, in experiments using [α-32P]TTP with a full complement of unlabeled dNTPs, an approximately 60 kDa product was detected (Figure 6). This product was generated with and without an exogenous RNA (Figure 6a, lanes 2 and 3), similar to the 60 kDa nucleotide-protein product observed in reactions with pFOXC3-containing mtRNPs. The 3' end of the pFOXC1 plasmid RNA is highly A-rich (ending in four repeats of CAA). Copying of the 92 nucleotide in vitro RNA could therefore be impeded without sufficient TTP. Increasing the concentration of unlabeled TTP decreased the intensity of the 60 kDa band owing to competition with the radionucleotide (data not shown). However, chasing the reaction with 20 µM dNTPs after 1, 3 and 5 min led to synthesis of the 120 kDa protein-primed cDNA product (Figure 6b). This finding suggests that deoxynucleotidylation of a 60 kDa protein with TTP precedes the copying of RNA templates, and that higher concentrations of TTP are required for elongation.

Figure 6. A nucleotide-linked protein generated with [α-32P]thymidine triphosphate (TTP) in pFOXC1-containing mitochondrial ribonucleoproteins (mtRNPs) is chased into a protein-cDNA product. (a) Micrococcal nuclease (MN)-treated pFOXC1-containing mtRNPs were incubated with [α-32P]dATP or [α-32P]TTP and a full complement of deoxynucleotides, with or without a 92 nucleotide RNA template, as indicated. (b) MN-treated pFOXC1-containing mtRNPs were incubated with [α-32P]TTP, with or without a 92 nucleotide RNA corresponding to the 3' end of pFOXC1.
All four unlabeled deoxyribonucleotide triphosphates (dNTPs) were added to a final concentration of 20 µM at the times indicated. All reactions were terminated with Laemmli buffer, and reaction products were separated by 4-20% gradient SDS-PAGE. Prestained protein size markers are indicated on the left in kDa.

Discussion

The evidence presented here suggests that reverse transcription catalyzed by the RTs encoded by the pFOXC mitochondrial retroplasmids is protein primed. The initial observation that the plasmid DNAs have a 5'-linked protein led to speculation that a protein is used to initiate reverse transcription [6]. However, protein-linked cDNAs were not observed in previous studies, which suggested that other mechanisms of initiation were involved. Prior studies focused on the mechanism of reverse transcription of the pFOXC3 retroplasmid. The efficiency with which the pFOXC3-RT used RNA and DNA primers suggested that nucleic acid primers were employed. In those studies, partially purified pFOXC3-RT preparations were used in reactions containing exogenous RNAs corresponding to the 3' end of the plasmid RNAs. cDNA synthesis was found to initiate from complementary DNA oligonucleotides or from snapped-back RNAs with minimal base pairing to the template [5]. In this report, we describe unsuccessful attempts to recover RNA:cDNA hybrid molecules, which would represent snapped-back initiation, among in vivo replication intermediates. When our studies expanded to include the distantly related pFOXC1 retroplasmid, we found that a high percentage of cDNAs generated from endogenous RNA templates are associated with protein.
In experiments using nuclease-treated pFOXC1-containing mtRNPs, a protein-linked cDNA was produced in the presence of an exogenous RNA that corresponds to the 3' end of the plasmid transcript. In reactions carried out in the absence of an exogenous RNA with a single radiolabeled deoxynucleotide, a 60 kDa nucleotide-linked protein was generated that appears to serve as the primer for reverse transcription.: Heating the products of reverse transcription reactions in the presence of 0.2% SDS prior to separation in SDS-agarose or SDS-polyacrylamide gels made it possible to resolve and analyze protein-linked products. A large portion of the (-) strand cDNA products generated from endogenous RNAs associated with pFOXC1-containing mtRNPs was removed by extraction with phenol-CIA, whereas post-treatment of the reactions with proteinase K prevented their removal, indicating that the majority of the labeled cDNAs are associated with protein. When similar reactions were carried out with pFOXC3-containing mtRNPs, a smaller but significant portion of products was also found to be associated with protein. The finding that some of the labeled products were not eliminated by extraction with phenol-CIA suggests that a portion of cDNA products is not attached to protein. This could indicate either that nascent protein-primed cDNAs are subject to a cleavage event that removes the protein primer or, because the isolated mtRNPs contain plasmid intermediates at all stages of replication, that a fraction of the replication products represents pre-existing cDNAs that are extended during the reactions.: Reactions using micrococcal nuclease-treated pFOXC1-containing mtRNPs having an exogenous RNA that corresponds to the terminal 92 nucleotides of the plasmid transcript produced a radiolabeled 120 kDa product when resolved by SDS-PAGE. This product was only observed in reactions having exogenous RNA and was sensitive to post-treatment with proteinase K.
The proteinase K-treated reactions contained a new product of approximately 34 kDa, which is close to the size predicted for a full-length cDNA copy of the RNA template, although the precise length of nucleic acids is difficult to determine in these gel systems. The 120 kDa product was also removed by extraction with phenol-CIA and could be recovered from the organic phase of the extraction. Taken together, these results are consistent with those expected for a protein-primed cDNA product. Interestingly, while the 120 kDa product was not detected in reactions that lacked dGTP or TTP, it was synthesized in the absence of dCTP. The reasons for this are unclear but could reflect the propensity of the pFOXC-RT to misincorporate nucleotides [5].: Reactions carried out with pFOXC3-containing mtRNPs and a similarly sized (98 nucleotide) in vitro RNA corresponding to the 3' end of the pFOXC3 plasmid failed to produce a protein-linked cDNA. Instead, a 60 kDa band was produced, even in the absence of an exogenous RNA. This product was also generated in reaction mixtures having each of the four deoxynucleotides alone or in combination with other dNTPs. Thus, the 60 kDa product appears to represent a deoxynucleotide monophosphate-linked protein (that is, [dNMP]n-protein). The 60 kDa product was eliminated by proteinase K treatment and removed by extraction with phenol-CIA (data not shown), confirming its proteinaceous nature. Significantly, the nucleotide-linked protein was found to comigrate with the pFOXC3-RT, suggesting that the RT is self-priming; however, initial efforts to identify the nucleotide-linked protein by immunoprecipitation and mass spectrometry were unsuccessful, so it remains possible that another protein is involved.
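As a rough cross-check on the approximately 34 kDa product, the expected mass of a full-length cDNA copy of the 92 nucleotide template can be estimated from its length alone. This is a minimal sketch. It assumes an average residue mass of 303.7 Da plus an end correction of 79.0 Da (roughly one terminal phosphate), a composition-averaged approximation found in common nucleic acid reference tables. For comparison, it also applies the coarser rule of thumb of about 330 Da per nucleotide. The sketch ignores base composition and any amino acid or peptide left at the 5' end after proteinase K digestion. The function name is illustrative only.

```python
# Rough single-stranded DNA mass estimate from length alone.
# Assumptions: average residue mass of 303.7 Da plus a 79.0 Da end
# correction; base composition is ignored, as is any amino acid or
# peptide remaining on the 5' end after proteinase K digestion.


def ssdna_mass_da(n_nt: int, avg_residue_da: float = 303.7,
                  end_correction_da: float = 79.0) -> float:
    # Approximate mass in daltons of a single-stranded DNA n_nt long
    return n_nt * avg_residue_da + end_correction_da


if __name__ == '__main__':
    # Full-length cDNA copy of the 92 nucleotide in vitro RNA template
    full_length = ssdna_mass_da(92)
    # Coarser rule of thumb: about 330 Da per nucleotide, no end correction
    rule_of_thumb = ssdna_mass_da(92, avg_residue_da=330.0, end_correction_da=0.0)
    print(f'{full_length / 1000:.1f} kDa (average-residue formula)')
    print(f'{rule_of_thumb / 1000:.1f} kDa (330 Da per nucleotide)')
```

Under these assumptions the estimates come to about 28 and 30 kDa. Both are somewhat below the gel-based value, which fits the caveat that nucleic acid sizes are difficult to determine precisely in these SDS gel systems.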
As with hepadnaviral RTs, the only other family of RTs known to initiate cDNA synthesis via protein priming, the initiating amino acid appears to be a tyrosine, as indicated by the resistance of the labeled 60 kDa nucleotide-protein to strong alkaline treatment.: Reactions with pFOXC3-containing mtRNPs also generated an approximately 115 kDa product when a full complement of deoxynucleotides was used in the reaction mix. This product appears to represent a protein-primed cDNA, as it was sensitive to proteinase K treatment (not shown) as well as to phosphonoformate, which inhibits DNA synthesis without affecting deoxynucleotidylation of proteins that serve as primers [21, 22]. Formation of the 115 kDa product was also strongly suppressed by dideoxynucleotides at concentrations that had little or no effect on the 60 kDa nucleotide-linked protein product. The intensity of the 115 kDa band also varied depending on the MN-treated mtRNP preparation used in the reactions, suggesting that the product could derive from the copying of endogenous RNAs that were incompletely digested or protected during micrococcal nuclease treatment. Similar studies of reverse transcription using MN-treated mtRNPs isolated from strains of N. crassa containing the Mauriceville retroplasmid showed that more than 20 nucleotides of plasmid nucleic acid remained associated with the plasmid RT following MN digestion [23].: A comprehensive analysis of the synthesis of the 60 kDa product in reactions with MN-treated pFOXC3-containing mtRNPs under different nucleotide combinations revealed that deoxynucleotidylation was greatest when three dNTPs were used, and that reaction mixtures having dATP and dGTP were the most productive.
Reactions having individual dideoxynucleotides (together with the appropriate complement of dNTPs) showed that ddATP was the only dideoxynucleotide to inhibit deoxynucleotidylation in all reactions in which it was included, indicating that dAMP is the initial nucleotide incorporated. The next greatest effect was found in reactions that included ddGTP together with [α-32P]TTP plus unlabeled dATP and dCTP, in which synthesis of the 60 kDa nucleotide-protein was inhibited by more than 40%. Taken together, these findings suggest that the first two nucleotides of deoxynucleotidylation are 5'-AG. Because the relative intensity of the nucleotide-labeled product was greatest with three dNTPs, a third dNTP appears to be required for maximal labeling; however, the results do not resolve whether dCTP or TTP is preferred. The total number of dNMPs incorporated during deoxynucleotidylation also remains to be determined.: When different nucleotide combinations were used in exogenous reactions having MN-treated pFOXC1-containing mtRNPs, a 60 kDa labeled protein was produced in reactions having [α-32P]TTP. As in reactions with MN-treated pFOXC3-containing mtRNPs, generation of this product did not depend on the addition of an exogenous RNA, indicating that deoxynucleotidylation was not templated by the added RNA; however, based on analysis of the 115 kDa product described above, the priming protein may remain associated with partially digested endogenous RNAs following MN treatment. Interestingly, even in reactions having [α-32P]TTP with a full complement of unlabeled dNTPs together with an exogenous RNA, the 120 kDa product was not observed (Figure 6a, lane 3). Yet when reactions having [α-32P]TTP as the sole nucleotide (at a concentration of 0.33 µM) were chased with all four dNTPs (to a final concentration of 20 µM), the 120 kDa band was produced.
This suggests that concentrations of TTP greater than 0.33 µM are necessary for the nucleotide-protein complex to engage in cDNA elongation. This finding is consistent with studies of adenoviral protein-primed DNA synthesis showing that the Km for elongation is significantly greater than the Km for initiation [24]. Taken together, these experiments suggest that deoxynucleotidylation occurs prior to reverse transcription.: The finding that cDNA synthesis catalyzed by the pFOXC-RTs is protein-primed is consistent with earlier models of repeat addition that were based on a slideback mechanism associated with protein-primed elements. In cases where protein priming occurs at the termini of these elements (that is, φ29 and adenovirus), the nucleotide that functions as template for the initial deoxynucleotidylation reaction is located 2-4 nucleotides upstream of the 3' terminus. Following the addition of one or more nucleotides, the nucleotide-protein complex undergoes a slideback of 1-3 nucleotides before elongation (reviewed in [13]). Collectively, our studies support a model of retroplasmid replication in which the plasmid RT serves as the primer for reverse transcription; when combined with the slideback mechanism associated with protein-primed DNA elements, this model would account for the maintenance and potential extension of the 3' terminal repeats (Figure 7). As previously proposed, the plasmid RNA appears to serve both as an mRNA for the synthesis of the RT and as the template for (-) strand cDNA synthesis. The initial step of DNA synthesis involves the covalent linkage of a dNMP to the protein primer, which is likely to be the pFOXC-RT. In vitro experiments indicate that deoxynucleotidylation occurs in the absence of an RNA template (as shown), yet we cannot rule out that the RT remains associated with remnants of endogenous RNA that serve as the template for this reaction.
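The model requires that the deoxynucleotidylated RT be complementary to the 3' terminal repeats of the plasmid transcripts. That requirement can be checked with a short script. This is a minimal sketch that assumes idealized templates built from exact repeats: three copies of 5'-AUCUA for pFOXC3 and four copies of 5'-CAA for pFOXC1. It allows Watson-Crick pairing only, and the function names are hypothetical. The script reports where a short primer such as 5'-AG (pFOXC3) or 5'-TT (pFOXC1) could pair on the RNA. It also reports the spacing between the penultimate and terminal pairing sites, which corresponds to the shift a one-repeat slideback would need.

```python
# Idealized check of primer/template complementarity for the slideback model.
# Assumptions: templates are exact tandem repeats, and only Watson-Crick
# pairing (A-U, T-A, G-C, C-G) is allowed between cDNA and RNA.

# DNA base that pairs with each RNA base
DNA_PARTNER = {'A': 'T', 'U': 'A', 'G': 'C', 'C': 'G'}


def cdna_for(rna: str) -> str:
    # cDNA (written 5'->3') fully complementary to an RNA written 5'->3'
    return ''.join(DNA_PARTNER[base] for base in reversed(rna))


def pairing_sites(rna: str, primer: str) -> list[int]:
    # 0-based RNA start positions (5'->3') of each stretch the primer can
    # pair with. A match at index i of the complementary cDNA pairs with
    # rna[len(rna) - i - len(primer) : len(rna) - i].
    cdna = cdna_for(rna)
    n, k = len(rna), len(primer)
    sites = [n - i - k for i in range(n - k + 1) if cdna[i:i + k] == primer]
    return sorted(sites)


def slideback_distance(rna: str, primer: str) -> int:
    # Spacing between the two 3'-most pairing sites (penultimate -> terminal)
    sites = pairing_sites(rna, primer)
    if len(sites) < 2:
        raise ValueError('need at least two pairing sites')
    return sites[-1] - sites[-2]


if __name__ == '__main__':
    pfoxc3_end = 'AUCUA' * 3   # idealized pFOXC3 3' end
    pfoxc1_end = 'CAA' * 4     # idealized pFOXC1 3' end
    print(pairing_sites(pfoxc3_end, 'AG'), slideback_distance(pfoxc3_end, 'AG'))
    print(pairing_sites(pfoxc1_end, 'TT'), slideback_distance(pfoxc1_end, 'TT'))
```

In both idealized templates the spacing equals one repeat unit (5 nt for pFOXC3, 3 nt for pFOXC1). That is the shift a single slideback from the penultimate to the terminal repeat would require. Real plasmid termini vary in repeat number. The third nucleotide of the pFOXC3 primer is left out because the text does not resolve whether dCTP or TTP is preferred.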
Experiments using pFOXC3-containing mtRNPs indicate that deoxynucleotidylation involves the covalent linkage of dAMP to a tyrosine residue of a 60 kDa protein, followed by the incorporation of dGMP and at least one other dNMP. The resulting nucleotide-protein complex would have partial complementarity to the 3' terminal repeat (5'-AUCUA) of the plasmid transcript. Likewise, in reactions using pFOXC1-containing mtRNPs, deoxynucleotidylation leads to the incorporation of one or more thymidine monophosphate (TMP) molecules attached to a 60 kDa protein, and the resulting [TMP]n-protein complex would have complementarity to the 3' terminal repeat (5'-CAA) of the pFOXC1 plasmid transcript. Based on analogies to protein-primed linear DNAs, the TMP-labeled protein would associate with the A residues of the penultimate repeat, and one or more nucleotides would be added before a slideback that repositions the nascent cDNA opposite the terminal repeat. It is not known whether a second RT molecule is involved in elongation (as shown) or whether the RNA is spooled through the active site of the attached RT. The resulting (-) strand cDNA would maintain the integrity of the terminal repeats and would be protected by a 5'-linked protein. The model could also accommodate more than one slideback, as has been demonstrated for DNA slidebacks associated with the PRD1 bacteriophage [13]. This would potentially extend the repeated region and help ensure that sequence information is not lost during replication.: Model for protein-primed reverse transcription by the pFOXC-reverse transcriptase (RT). Transcription of the pFOXC plasmid DNA molecules produces full-length RNAs that appear to function both as mRNAs for the synthesis of the RT and as templates for (-) strand cDNA synthesis [6].
Transcripts of pFOXC3 terminate in approximately three pentameric repeats, whereas transcripts of pFOXC1 terminate in approximately four copies of a 3-nucleotide sequence (the 3' terminus of the in vitro RNA used in this study is shown). Following production of the plasmid-encoded RT, deoxynucleotidylation occurs with the covalent addition of dAMP to a tyrosine residue of the 60 kDa pFOXC3-RT, followed by incorporation of deoxyguanosine monophosphate (dGMP) and a third nucleotide. Deoxynucleotidylation of the pFOXC1-RT results in the addition of thymidine monophosphate (TMP) to the RT, followed by one or more deoxynucleotide monophosphates (dNMPs) (a second TMP is shown). The resulting RT-(dNMP)n complex would have complementarity to the corresponding terminal repeat. Based on studies of protein-primed DNA elements, the model predicts that the complex anneals to the penultimate 3' repeat of the template (shown for pFOXC1 only). Following the synthesis of a unit-length repeat, the RT-(dNMP)n complex undergoes a slideback and is repositioned opposite the terminal repeat. The nascent cDNA is elongated via reverse transcription of the template by the 5'-linked RT or by a separate RT recruited to the complex. The model could also accommodate an increase in the number of repeats, depending on the number of slideback events that occur.: The finding that the pFOXC plasmids replicate via protein-primed reverse transcription makes them only the second retroelement family known to use a protein to initiate cDNA synthesis. Amino acid sequence comparisons fail to identify regions of the pFOXC-RTs that show high similarity to the terminal protein domain of hepadnaviral RTs. The pFOXC-RTs also lack homology to the TPs of protein-primed DNA elements or to the genome-linked proteins of RNA viruses (VPgs).
Yet, despite the surprisingly high degree of evolutionary divergence between the pFOXC1-RTs and pFOXC2/pFOXC3-RTs, it is noteworthy that a 20 amino acid region of high similarity (greater than that of the conserved RT domains) is present near the amino terminus and contains a tyrosine residue that could serve as the primer. Efforts are underway to express the RTs in heterologous hosts to test the importance of this region and further characterize the mechanism of cDNA initiation.: Conclusions: We provide evidence that the pFOXC retroplasmids initiate reverse transcription using a protein primer, making them only the second retroelement lineage known to use a protein to prime cDNA synthesis. Taken together with previous studies, these results suggest that the pFOXC-RTs are unique among polymerases in their ability to use RNA, DNA or protein to initiate DNA synthesis. This provides additional support for the hypothesis that mitochondrial retroelements represent a type of 'molecular fossil' that holds clues about the evolutionary past. Our findings also suggest that the role of protein-primed replication in maintaining the termini of linear genetic elements is not restricted to elements that replicate using DNA-dependent DNA polymerases or RNA-dependent RNA polymerases, and can include elements that replicate via reverse transcription. Whether this is a case of convergent evolution or a primordial feature conserved among polymerases remains to be determined. It is intriguing that the 3' terminal repeats of the pFOXC1 and pFOXC2/pFOXC3 retroplasmid families differ so markedly in both length and sequence. If, as proposed, a slideback mechanism is involved in the maintenance of the terminal repeats, further study of the pFOXC retroplasmids could reveal mechanistic adaptations of the RT that accommodate the progressive lengthening of slidebacks.
These studies could also provide insight into how the RTs of telomerases (TERTs) add short DNA repeats via iterative slidebacks on an RNA template, as well as into the duplication of 3' repeats during retrotransposition of certain non-long terminal repeat (non-LTR) retroelements.: Methods: F. oxysporum strains and growth conditions: Strains used in this study were F. oxysporum 777, f. sp. conglutinans (pFOXC1-containing strain), F. oxysporum 725, f. sp. matthioli (pFOXC3-containing strain) and plasmid-free strain F. oxysporum 9129, f. sp. cubense. These strains are maintained at the US Department of Agriculture Agricultural Research Service (USDA-ARS) Cereal Disease Laboratory (St Paul, MN, USA). Strains were grown on potato dextrose agar plates, and conidia were preserved in 20% glycerol and stored at -80°C. Conidia were germinated for 3-4 days in 1-2 l of 1 × Vogel's medium [25] with shaking at 150 rpm at 25°C.: Isolation of mitochondria and mtRNP particles: Mitochondria were prepared from mycelia by the flotation gradient method [26]. Mitochondrial RNP complexes were isolated by resuspending mitochondrial pellets in 3.5 ml of 25 mM Tris, pH 7.5, 500 mM KCl, 25 mM CaCl2, 20 mM dithiothreitol (HKCT-D) and lysing them by addition of Nonidet P-40 to a final concentration of 1%. Lysates were layered over a 1.85 M sucrose cushion containing HKCT-D and centrifuged in a Beckman Ti50 rotor (226,000 g, 17 h, 4°C; [27]).
Mitochondrial RNP pellets were stored at -80°C and resuspended in a solution of 50 mM Tris-HCl, pH 8.2, 0.5 mM ethylenediaminetetraacetic acid (EDTA), 10 mM KCl, and 5 mM dithiothreitol (DTT) at concentrations of 1-2 A260 OD units/µl.: Cloning of the termini of pFOXC1 and (-) strand cDNA products: Plasmid pFOXC1 was isolated from agarose gels and either digested with exonuclease III (New England Biolabs, Beverly, MA, USA) to cleave single-stranded regions in the hairpin or used directly in tailing reactions with terminal deoxynucleotidyl transferase (Promega, Madison, WI, USA) and dGTP. The tailed products were amplified by anchored PCR using plasmid-specific primers C1 5' (5'-GCTGGATCCCCGACACTGATTCATG) or C1 3' (5'-GAAGGATCCAGTATCAAATGGGGACTC), and dCBAM (ATATAGGAC16). Products were digested with Bgl II and Bam HI, cloned into the Bam HI site of Bluescribe (pBS; Stratagene, La Jolla, CA, USA) and sequenced.: Reverse transcription reactions: Mitochondrial RNP particles were used directly in reactions (that is, endogenous reactions) or were treated with nuclease to degrade nucleic acids prior to use in reactions having added RNAs (that is, exogenous reactions). Endogenous reactions included 0.4 A260 OD units of mtRNP particles in a solution having 50 mM Tris-HCl, pH 8.2, 5 mM MgCl2, 50 mM KCl, 20 µCi of [α-32P]dCTP, 125 µM each of dATP, dGTP and TTP, and 5 mM DTT. Unless otherwise indicated, mtRNP particles were preincubated with actinomycin D (100 µg/ml, Sigma-Aldrich, St Louis, MO, USA) for 5 min at 4°C prior to addition to reaction mixtures. Reactions were incubated at 37°C for 15 min, then chased by the addition of dCTP to 100 µM and incubated for an additional 10 min. Reactions were stopped by the addition of EDTA to a concentration of 125 mM prior to post-treatment and/or precipitation with ethanol. Post-treatment of endogenous reactions included incubation with proteinase K at 50°C for 15 min, with or without extraction with an equal volume of phenol-CIA (25:24:1).
Products were recovered from the phenol phase by precipitation with four volumes of 100% acetone. Precipitated products were resuspended in 10 mM Tris-HCl, pH 7.0, 1 mM EDTA (TE) and quantified as previously described [6, 28]. Endogenous products were either separated directly on 1.0% to 1.2% agarose gels or incubated in a loading dye mixture containing 0.2% SDS and heated for 5 min at 65°C prior to loading on a 1.0% to 1.2% agarose gel containing 0.2% SDS. For exogenous reactions, the pFOXC-RT was released from the mtRNP particles by degrading endogenous nucleic acids with MN (Takara Bio USA, Madison, WI, USA). In most cases, 5-10 A260 OD units of mtRNPs were incubated in a reaction having 50 mM Tris, pH 8.2, 1 mM CaCl2 with 5 IU MN/A260 OD unit for 15 s at 37°C, followed by 10 min at room temperature. To inhibit MN activity, EGTA was added to a concentration of 10-20 mM. The MN-treated mtRNPs were incubated at 4°C with 100 µg/ml actinomycin D prior to use in reaction mixtures having 1-2 µg of an in vitro synthesized RNA template. Reverse transcription reactions were carried out in a reaction buffer having 50 mM Tris-HCl, pH 8.2, 20 mM MgCl2, and 0.33 µM [α-32P]dNTP and/or 20 µM dNTPs and/or 100 µM ddNTPs, as indicated. Where indicated, reactions were post-treated with proteinase K (0.2 mg/ml) at 50°C for 15 min, with or without extraction with an equal volume of phenol-CIA. Reactions were boiled in Laemmli buffer (125 mM Tris-HCl, 2% SDS, 10% glycerol, 5% 2-mercaptoethanol) for 5 min and separated via 7.5%, 10% or 4-20% gradient SDS-PAGE, as indicated. Gels were electrophoresed for 1-2 h at 120 V, fixed in a solution of 20% ethanol and 10% glycerol for 30-45 min, and dried in a gel drying apparatus. Dried gels were exposed to a phosphorimager screen overnight and analyzed with a Storm 860 Phosphorimager and ImageQuant V.5.2 (GE Healthcare Biosystems, Piscataway, NJ, USA).
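The exogenous reactions contain 0.33 µM [α-32P]dNTP and are chased with unlabeled dNTPs to 20 µM. The arithmetic below shows what that chase does to the TTP pool. It also shows why competition from unlabeled TTP lowers the labeled signal. This is a minimal sketch with three assumptions: the 20 µM is added on top of the labeled nucleotide, labeled and unlabeled TTP are used equally well, and the Km values are hypothetical placeholders that were not measured in this study. With those placeholder values, simple Michaelis-Menten saturation illustrates why a step with a higher Km, like elongation in the adenovirus system [24], would benefit most from the chase.

```python
# Worked arithmetic for the TTP chase in the exogenous reactions.
# Assumptions: the 20 uM chase is added on top of 0.33 uM labeled TTP,
# labeled and unlabeled TTP are used interchangeably by the enzyme, and the
# Km values below are hypothetical placeholders, not measured constants.

LABELED_UM = 0.33   # [alpha-32P]TTP in the reaction
CHASE_UM = 20.0     # unlabeled TTP added in the chase


def labeled_fraction(labeled_um: float, unlabeled_um: float) -> float:
    # Fraction of the TTP pool that carries the label (isotope dilution)
    return labeled_um / (labeled_um + unlabeled_um)


def saturation(substrate_um: float, km_um: float) -> float:
    # Michaelis-Menten fractional rate v/Vmax = [S] / (Km + [S])
    return substrate_um / (km_um + substrate_um)


if __name__ == '__main__':
    total_um = LABELED_UM + CHASE_UM
    print(f'TTP rises {total_um / LABELED_UM:.0f}-fold after the chase')
    print(f'labeled fraction falls to {labeled_fraction(LABELED_UM, CHASE_UM):.1%}')
    # Hypothetical Km values: a low-Km initiation step, a high-Km elongation step
    for step, km in (('initiation', 0.1), ('elongation', 5.0)):
        before = saturation(LABELED_UM, km)
        after = saturation(total_um, km)
        print(f'{step}: {before:.0%} of Vmax before, {after:.0%} after')
```

With these illustrative numbers, the low-Km step is already mostly saturated at 0.33 µM. The high-Km step rises from about 6% to about 80% of its maximal rate. Meanwhile the label is diluted roughly 60-fold. The actual Km values of the pFOXC-RTs are not reported here.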
Where indicated, products in gels were transferred to a nitrocellulose membrane for detection by phosphorimager and western blot analysis.: Synthesis of in vitro RNA templates: Templates for RNA synthesis of the pFOXC3 transcript containing three repeats were generated and transcribed as previously described [5]. Templates for the synthesis of the 92 nucleotide pFOXC1 RNA were generated by amplification of a clone containing the 3' terminal region of pFOXC1 with a primer having a T7 promoter sequence (C192nt+Bam; 5'-CCGGATCCTAATACGACTCACTATAGGCTGAGGAAATTTG) and another corresponding to the end of the plasmid having 4 copies of the repeat (C14R+Eco; 5'-CCGAATTCTTGTTGTTGTTGTTTCCAACCTC). This fragment was inserted into the multiple cloning site of pBluescribe. An amplicon was generated using a pBS forward primer and C1 92nt 4R (5'-TTGTTGTTGTTGTTTCCAACCTC) and used as a template to generate run-off transcripts. In vitro transcription was carried out in a 20 µl reaction volume containing 100 ng of template DNA in reaction buffer containing 40 mM Tris-HCl, pH 7.9, 6 mM MgCl2, 10 mM DTT, 2 mM spermidine, and 50 IU of T7 RNA polymerase (New England Biolabs, Ipswich, MA, USA). Reactions were incubated for 30-60 min at 37°C. DNA was digested with 2 IU of RQ1 DNase (Thermo Fisher Scientific, Waltham, MA, USA) in 40 mM Tris-HCl, pH 8.0, 10 mM MgSO4, 1 mM CaCl2 for 30 min at 37°C. Transcripts were extracted with phenol-CIA, precipitated with ethanol, and resuspended in dH2O.: Development of an antibody to pFOXC3-RT and western blot analyses: A synthetic peptide corresponding to positions 55-68 of the predicted pFOXC3 polypeptide (KEVKRANRYLAFQE) was synthesized and used to produce a rabbit polyclonal antibody (Sigma Genosys, Woodlands, TX, USA). Following electrophoresis, protein samples were electroblotted to nitrocellulose (0.45 µm; MSI Laboratories, Westboro, MA, USA) in transfer buffer (25 mM Tris, pH 8.5, 200 mM glycine, 20% methanol).
The membrane was washed in Tris-buffered saline (TBS; 20 mM Tris, pH 7.6, 125 mM NaCl) with 0.1% Tween 20 and incubated for 1 h in blocking buffer (5% non-fat dry milk, 0.1% Tween 20 in TBS). The membrane was washed again and incubated with protein A-purified pFOXC3-RT55-68 antibody in blocking buffer for 2-16 h. Following incubation with primary antibody, the membrane was washed and incubated with horseradish peroxidase (HRP)-linked anti-rabbit IgG secondary antibody (1:2,000; Cell Signaling Technology, Beverly, MA, USA) in blocking buffer for 1 h. The membrane was washed and incubated with a chemiluminescent HRP substrate (Thermo Scientific, Rockford, IL, USA) for 5 min, followed by detection using X-ray film and analysis with a LAS-4000 chemiluminescent imaging reader (Fujifilm, Tokyo, Japan).: Analysis of nucleotide-amino acid linkage: Exogenous reverse transcription products were separated via 4-20% gradient SDS-PAGE, and the gel was dried and subjected to phosphorimager analysis. The gel was then rehydrated in 10 gel volumes of 1 M KOH, incubated at 55°C for 2 h, and neutralized with four changes of 10% acetic acid/10% isopropanol. The gel was dried again and subjected to phosphorimager analysis.: The sequence of the pFOXC1 plasmid was deposited in GenBank under accession number HQ026775.: References: Galligan J, Kennell J: Retroplasmids: linear and circular plasmids that replicate via reverse transcription. Microbial Linear Plasmids. Edited by: Meinhardt F, Klassen R. 2007, Berlin, Germany: Springer, 7: 163-185.: Eickbush TH: Origin and evolutionary relationships of retroelements. Evolutionary Biology of Viruses. Edited by: Morse SS. 1994, New York, USA: Raven Press, 121-157.: Eickbush TH: Telomerase and retrotransposons: which came first? Science. 1997, 277: 911-912.
10.1126/science.277.5328.911.: Wang H, Lambowitz AM: The Mauriceville plasmid reverse transcriptase can initiate cDNA synthesis de novo and may be related to reverse transcriptase and DNA polymerase progenitor. Cell. 1993, 75: 1071-1081. 10.1016/0092-8674(93)90317-J.: Simpson EB, Ross SL, Marchetti SE, Kennell JC: Relaxed primer specificity associated with reverse transcriptases encoded by the pFOXC retroplasmids of Fusarium oxysporum. Eukaryotic Cell. 2004, 3: 1589-1600. 10.1128/EC.3.6.1589-1600.2004.: Walther TC, Kennell JC: Linear mitochondrial plasmids of F. oxysporum are novel, telomere-like retroelements. Mol Cell. 1999, 4: 229-238. 10.1016/S1097-2765(00)80370-6.: Weiner AM, Maizels N: The genomic tag hypothesis: modern viruses as molecular fossils of ancient strategies for genomic replication, and clues regarding the origin of protein synthesis. Biol Bull. 1999, 196: 327-330. 10.2307/1542962.: Hirota N, Hashiba T, Yoshida H, Kikumoto T, Yoshio E: Detection and properties of plasmid-like DNA in isolates from twenty three formae speciales of Fusarium oxysporum. Ann Phytopath Soc Japan. 1992, 58: 386-392.: Kistler HC, Leong SA: Linear plasmidlike DNA in the plant pathogenic fungus Fusarium oxysporum f. sp. conglutinans. J Bacteriol. 1986, 167: 587-593.: Kajikawa M, Okada N: LINEs mobilize SINEs in the eel through a shared 3' sequence. Cell. 2002, 111: 433-444. 10.1016/S0092-8674(02)01041-3.: Ohshima K, Okada N: SINEs and LINEs: symbionts of eukaryotic genomes with a common tail. Cytogenet Genome Res. 2005, 110: 475-490. 10.1159/000084981.: de Jong RN, van der Vliet PC, Brenkman AB: Adenovirus DNA replication: protein priming, jumping back and the role of the DNA binding protein DBP. Curr Top Microbiol Immunol. 2003, 272: 187-211.: Salas M, Miller JT, Leis J, DePamphilis ML: Mechanisms for priming DNA synthesis. DNA Replication in Eukaryotic Cells. Edited by: DePamphilis ML. 
1996, New York, USA: Cold Spring Harbor, 131-176.: Kim EK, Jeong JH, Youn HS, Koo YB, Roe JH: The terminal protein of a linear mitochondrial plasmid is encoded in the N-terminus of the DNA polymerase gene in white-rot fungus Pleurotus ostreatus. Curr Genet. 2000, 38: 283-290. 10.1007/s002940000157.: Klassen R, Meinhardt F: Linear protein-primed replicating plasmids in eukaryotic microbes. Microbial Linear Plasmids. Volume 7. Edited by: Meinhardt F, Klassen R. 2007, Berlin, Germany: Springer, 187-226.: Weber M, Bronsema V, Bartos H, Bosserhoff A, Bartenschlager R, Schaller H: Hepadnavirus P protein utilizes a tyrosine residue in the TP domain to prime reverse transcription. J Virol. 1994, 68: 2994-2999.: Zoulim F, Seeger C: Reverse transcription in hepatitis B viruses is primed by a tyrosine residue of the polymerase. J Virol. 1994, 68: 6-13.: Paul AV, Yin J, Mugavero J, Rieder E, Liu YE, Wimmer E: A \"slide-back\" mechanism for the initiation of protein-primed RNA synthesis by the RNA polymerase of poliovirus. J Biol Chem. 2003, 278: 43951-43960. 10.1074/jbc.M307441200.: Kistler HC, Benny U, Powell WA: Linear mitochondrial plasmids of Fusarium oxysporum contain genes with sequence similarity to genes encoding a reverse transcriptase from Neurospora spp. Appl Environ Microbiol. 1997, 63: 3311-3313.: Cooper JA, Hunter T: Changes in protein phosphorylation in Rous sarcoma virus-transformed chicken embryo cells. Mol Cell Biol. 1981, 1: 165-178.: Wang GH, Seeger C: The reverse transcriptase of hepatitis B virus acts as a protein primer for viral DNA synthesis. Cell. 1992, 71: 663-670. 10.1016/0092-8674(92)90599-8.: Seifer M, Standring DN: Recombinant human hepatitis B virus reverse transcriptase is active in the absence of the nucleocapsid or the viral replication origin, DR1. J Virol.
1993, 67: 4513-4520.: Wang H, Kennell JC, Kuiper MT, Sabourin JR, Saldanha R, Lambowitz AM: The Mauriceville plasmid of Neurospora crassa: characterization of a novel reverse transcriptase that begins cDNA synthesis at the 3' end of template RNA. Mol Cell Biol. 1992, 12: 5131-5144.: Mul YM, van der Vliet PC: The adenovirus DNA binding protein effects the kinetics of DNA replication by a mechanism distinct from NFI or Oct-1. Nucleic Acids Res. 1993, 21: 641-647. 10.1093/nar/21.3.641.: Davis RH, de Serres FJ: Genetic and microbiological research techniques for Neurospora crassa. Methods Enzymol. 1970, 17: 79-143.: Lambowitz AM: Preparation and analysis of mitochondrial ribosomes. Methods Enzymol. 1979, 59: 421-433.: Garriga G, Lambowitz AM: Protein-dependent splicing of a group I intron in ribonucleoprotein particles and soluble fractions. Cell. 1986, 46: 669-680. 10.1016/0092-8674(86)90342-9.: Kennell JC, Moran JV, Perlman PS, Butow RA, Lambowitz AM: Reverse transcriptase activity associated with maturase-encoding group II introns in yeast mitochondria. Cell. 1993, 73: 133-146. 10.1016/0092-8674(93)90166-N.: Acknowledgements: We thank Matthew Althage and Nikola Kellner for assistance in cloning pFOXC1, Chytra Mandyam for pFOXC1 RT assays and Haedar Abuirqeba for assistance in the preparation and analysis of mtRNPs. We also thank Dr John Tavis for advice at various stages of the project.
This work was supported by grant 1R15GM07605201A1 from the National Institutes of Health and grant MCB0196483 from the National Science Foundation.: Author information: Affiliations: Corresponding author: Correspondence to John C Kennell.: Additional information: Competing interests: The authors declare that they have no competing interests.: Authors' contributions: JG was responsible for the execution and analysis of exogenous reverse transcription assays and western blots, and participated in experimental design, construction of figures, and drafting and editing of the manuscript. SM was responsible for execution and analysis of endogenous and exogenous reverse transcription assays and editing of the manuscript. JK conceived the study, was responsible for its design and coordination, and drafted the manuscript. All authors read and approved the final manuscript.: Electronic supplementary material: Additional file 1: Supplementary Table 1 Sequence of clones of the terminus of pFOXC1. (DOCX 11 KB): Additional file 2: Supplementary Table 2 Reverse transcriptase activity associated with mitochondrial ribonucleoproteins (mtRNPs) containing pFOXC1 or pFOXC3. (DOCX 12 KB): Cite this article: Galligan, J.T., Marchetti, S.E. & Kennell, J.C. Reverse transcription of the pFOXC mitochondrial retroplasmids of Fusarium oxysporum is protein primed. Mobile DNA 2, 1 (2011).
https://doi.org/10.1186/1759-8753-2-1: Received: 06 October 2010: Accepted: 21 January 2011: Published: 21 January 2011: DOI: https://doi.org/10.1186/1759-8753-2-1: ISSN: 1759-8753: © 2020 BioMed Central Ltd unless otherwise stated. Part of Springer Nature. "