Nucleic Acids Res 2009 Nov 01;37(21):e143. doi: 10.1093/nar/gkp752.
MGEScan-non-LTR: computational identification and classification of autonomous non-LTR retrotransposons in eukaryotic genomes.
Rho M, Tang H.
Abstract
Computational methods for genome-wide identification of mobile genetic elements (MGEs) have become increasingly necessary for both genome annotation and evolutionary studies. Non-long terminal repeat (non-LTR) retrotransposons are a class of MGEs that have been found in most eukaryotic genomes, sometimes in extremely high numbers. In this article, we present a computational tool, MGEScan-non-LTR, for the identification of non-LTR retrotransposons in genomic sequences, following a computational approach inspired by a generalized hidden Markov model (GHMM). Three different states represent two different protein domains and inter-domain linker regions encoded in the non-LTR retrotransposons, and their scores are evaluated by using profile hidden Markov models (for protein domains) and Gaussian Bayes classifiers (for linker regions), respectively. In order to classify the non-LTR retrotransposons into one of the 12 previously characterized clades using the same model, we defined separate states for different clades. MGEScan-non-LTR was tested on the genome sequences of four eukaryotic organisms, Drosophila melanogaster, Daphnia pulex, Ciona intestinalis and Strongylocentrotus purpuratus. For the D. melanogaster genome, MGEScan-non-LTR found all known "full-length" elements and simultaneously classified them into the clades CR1, I, Jockey, LOA and R1. Notably, for the D. pulex genome, in which no non-LTR retrotransposon has been annotated, MGEScan-non-LTR found a significantly larger number of elements than did RepeatMasker, using the current version of the RepBase Update library. We also identified novel elements in the other two genomes, which have only been partially studied for non-LTR retrotransposons.
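The abstract states that linker regions are scored with Gaussian Bayes classifiers. The sketch below illustrates that general idea for a single feature and two classes (linker versus random sequence). It is not the tool's actual implementation: the feature (a mean hydrophobicity value), the training numbers, the equal prior, and all function names are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Estimate (mean, standard deviation) of a sample."""
    return mean(values), stdev(values)


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def linker_posterior(x, linker_params, random_params, prior_linker=0.5):
    """Posterior probability that x comes from the linker class (Bayes' rule)."""
    p_linker = gaussian_pdf(x, *linker_params) * prior_linker
    p_random = gaussian_pdf(x, *random_params) * (1.0 - prior_linker)
    return p_linker / (p_linker + p_random)


# Hypothetical feature values; real training data would come from known elements.
linker_train = [0.30, 0.35, 0.28, 0.40, 0.33, 0.37]
random_train = [-0.10, 0.00, -0.05, 0.05, -0.12, 0.02]

linker_params = fit_gaussian(linker_train)
random_params = fit_gaussian(random_train)

posterior_high = linker_posterior(0.34, linker_params, random_params)
posterior_low = linker_posterior(-0.03, linker_params, random_params)
print(f"P(linker | 0.34)  = {posterior_high:.3f}")
print(f"P(linker | -0.03) = {posterior_low:.3f}")
```

A value near the linker training mean yields a posterior close to 1, and a value near the random-sequence mean yields a posterior close to 0. Extending the sketch to several hydrophobicity scales would require combining per-feature likelihoods, which the abstract does not specify.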
Figure 1. The model for identifying and classifying non-LTR retrotransposons. The circles represent states corresponding to protein domains and linker regions. The shaded ovals represent super states corresponding to clades. This model classifies the elements in 12 clades independently.
Figure 2. The distribution of E-values estimated by the pHMMs of 12 clades in Drosophila genomes. The x-axis represents each element (listed in the Supplementary Table S2) and the y-axis represents E-values. For each element, the E-values measured by the pHMMs of the 12 clades are plotted. Each element was assigned to the clade from which the lowest E-value was obtained. Note that the highest points of different elements (along the x-axis) do not necessarily belong to the same clade. Lines connecting the points were added for visualization purposes only.
Figure 3. The distribution of hydrophobicity values in the (a) KD, (b) WW and (c) HH hydrophobicity scales, and (d) the probability for the linker region. In (a)–(c), the x-axis represents the hydrophobicity scale and the y-axis represents the frequency. The distribution on the left-hand side with green dots was plotted with the values obtained from random sequences. The distribution on the right-hand side with blue dots was plotted with the values obtained from the elements in the training set. The yellow and red lines represent Gaussian fits to the plotted data. In (d), the x-axis represents the probability for the linker region and the y-axis represents the frequency.
Figure 4. Phylogenetic analysis of RT domain sequences in the D. pulex elements, along with previously known elements from other genomes. For the D. pulex elements, LOA elements are highlighted in red, I elements in blue, L2 elements in green, L1 elements in pink and NeSL elements in cyan. A cluster of L2 elements (L2_Tr_AF086712, L2_Maul_2, L2_Poll_AAN15747, L2 Danio, L2_Oryzlas, L2_Xlphophorus and L2_L2_Mars_1) from fish genomes is indicated by a green circle.
Figure 5. (a) Phylogenetic tree and (b) schematic representation of the elements in five ancient clades of CRE, GENIE, R2, R4 and NeSL. The neighbor-joining tree was built with 1000 rounds of bootstrapping. Note that the cluster of D. pulex elements is connected with the cluster of HEROs in the NeSL clades with a very high bootstrap value (100%). The schematic representation of the D. pulex NeSL elements is most similar to that of one of the HERO elements. The gray bars downstream of the RT represent the cysteine-histidine motif (C-X2-C-X8-H-X4-C). The blank bars downstream of the RT represent the REL domain motif (PD..D).
Figure 6. Multiple sequence alignments of the RT domain sequences from CiI, CiL1, CiL2, CiLOA and CiRTE in the C. intestinalis genome. Seven well conserved regions are highlighted in boxes.
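The assignment rule described in the Figure 2 caption (each element goes to the clade whose pHMM gives the lowest E-value) can be sketched as follows. The element names and E-values are invented for illustration, and `assign_clade` is a hypothetical helper, not part of MGEScan-non-LTR.

```python
def assign_clade(evalues):
    """Return the clade whose profile HMM produced the lowest E-value."""
    if not evalues:
        raise ValueError("no E-values supplied")
    return min(evalues, key=evalues.get)


# Hypothetical E-values per element, keyed by clade name.
element_scores = {
    "element_1": {"CR1": 1e-5, "I": 1e-40, "Jockey": 1e-12, "LOA": 1e-8, "R1": 1e-3},
    "element_2": {"CR1": 1e-60, "I": 1e-4, "Jockey": 1e-20, "LOA": 1e-9, "R1": 1e-7},
}

assignments = {name: assign_clade(scores) for name, scores in element_scores.items()}
print(assignments)  # {'element_1': 'I', 'element_2': 'CR1'}
```

Ties and significance thresholds are not handled here; the page does not say how the tool treats them.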