A method for identifying alternative or cryptic donor splice sites within gene and mRNA sequences. Comparisons among sequences from vertebrates, echinoderms and other groups.
Buckley KM, Florea LD, Smith LC.
Abstract
BACKGROUND: As the amount of genome sequencing data grows, so does the problem of computational gene identification, and in particular, of identifying the splicing signals that flank exon borders. Traditional methods for identifying splicing signals have been created and optimized using sequences from model organisms, mostly vertebrate and yeast species. However, as genome sequencing extends across the animal kingdom and includes various invertebrate species, the need for methods to recognize splice signals in these organisms increases as well. With that aim in mind, we generated a model for identifying donor and acceptor splice sites that was optimized using sequences from the purple sea urchin, Strongylocentrotus purpuratus. This model was then used to assess the possibility of alternative or cryptic splicing within the highly variable immune response gene family known as 185/333.
RESULTS: A donor splice site model was generated from S. purpuratus sequences that incorporates non-adjacent dependences among positions within the 9 nt splice signal and uses position weight matrices to determine the probability that the site is used for splicing. The Purpuratus model was shown to predict splice signals better than a similar model created from vertebrate sequences. Although the Purpuratus model was able to correctly predict the true splice sites within the 185/333 genes, no evidence for alternative or trans-gene splicing was observed.
CONCLUSION: The data presented herein describe the first published analyses of echinoderm splice sites and suggest that the previous methods of identifying splice signals that are based largely on vertebrate sequences may be insufficient. Furthermore, alternative or trans-gene splicing does not appear to be acting as a diversification mechanism in the 185/333 gene family.
Figure 1. Purpuratus donor splice site model. A. Analysis of the frequency of each base within the splice site reveals the S. purpuratus donor splice site consensus sequence. The nine-nt windows surrounding the donor splice sites from 292 annotated S. purpuratus gene models (2845 donor sequences) were extracted, and the frequency of each nt at each position within the window was calculated. The values shown in bold are the consensus nucleotides. Positions 1 and 2 are invariant because only canonical splice sites were used in this analysis. B. The Purpuratus splice site model incorporates non-adjacent dependences among the bases within the splice site. The model is implemented such that the splice site score of a given candidate sequence is computed using the matrix selected by applying the set of rules shown in the flowchart. For example, the sequence AAGGTAAGT would be scored using the matrix A-2G5G-1A4T6 (A-2→A-2G5→A-2G5G-1→A-2G5G-1A4→A-2G5G-1A4T6). Non-adjacent dependences were calculated for the 2845 S. purpuratus donor splice sites for each of the seven variable positions, between the consensus nt at that position and the non-consensus nucleotides in the other six positions (Table 1). The position with the most dependences was used to serially subdivide the sites until either the subdivision became too small to obtain reliable data or no more significant dependences were observed. The position frequency matrices shown were calculated for each of the terminal subdivisions and were ultimately used in the Purpuratus splice site model.
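The per-position frequency calculation described in panel A can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy windows are hypothetical, and the position labels (-3 to -1 for the exonic bases, 1 to 6 for the intronic bases) are an assumption inferred from the AAGGTAAGT example in panel B.

```python
from collections import Counter

# Assumed position labels for a 9-nt donor window: three exonic bases
# (-3..-1) followed by six intronic bases (1..6); there is no position 0.
POSITIONS = [-3, -2, -1, 1, 2, 3, 4, 5, 6]


def label_positions(window):
    """Map each assumed position label to the base found there."""
    if len(window) != len(POSITIONS):
        raise ValueError("expected a 9-nt window")
    return dict(zip(POSITIONS, window.upper()))


def position_frequencies(windows):
    """Return {position: {base: frequency}} for a list of 9-nt windows."""
    freqs = {}
    for i, pos in enumerate(POSITIONS):
        counts = Counter(w[i].upper() for w in windows)
        total = sum(counts.values())
        freqs[pos] = {base: counts.get(base, 0) / total for base in "ACGT"}
    return freqs


def consensus(freqs):
    """Pick the most frequent base at each position."""
    return "".join(max(freqs[p], key=freqs[p].get) for p in POSITIONS)


# Hypothetical toy windows, not data from the paper.
toy_windows = ["AAGGTAAGT", "CAGGTAAGT", "AAGGTGAGT", "ATGGTAAGA"]
freqs = position_frequencies(toy_windows)
labels = label_positions("AAGGTAAGT")
print(consensus(freqs))
print({p: labels[p] for p in (-2, 5, -1, 4, 6)})
```

Under this labeling, the example sequence AAGGTAAGT carries A at -2, G at 5, G at -1, A at 4, and T at 6, which matches the matrix name A-2G5G-1A4T6 given in the caption. The flowchart rules that select among matrices are not reproduced here because they are not given in the text.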
Figure 2. Analysis of known positive and negative splice sites using the Purpuratus and Vertebrate splice site models. Histograms of the scores given to known positive (solid lines) and negative (dashed lines) splice sites were generated (bin size = 2) for the Purpuratus (A) and Vertebrate (B) splice site models by analyzing the genes used to generate the models (Additional files 2, 3, and 4; [28]). For example, 22% of the known positive sites received scores between 4 and 6. The average of the means (Table 3) is shown by a vertical dotted line. The gray region is bounded on the left by N0.95 and on the right by P0.05 (Table 3), which are shown as dashed/dotted lines. The ✳ symbols located on the 0.25% line indicate the means of the positive and negative scores.
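The binning behind these histograms (bin size = 2, values reported as percentages of sites) can be sketched as below. The function name and the scores are hypothetical, and treating each bin as the half-open interval [edge, edge + 2) is an assumption, since the caption does not say which edge is inclusive.

```python
import math
from collections import Counter


def histogram_percent(scores, bin_size=2):
    """Map each bin's lower edge to the percentage of scores that fall
    in the half-open interval [edge, edge + bin_size)."""
    counts = Counter(math.floor(s / bin_size) * bin_size for s in scores)
    n = len(scores)
    return {edge: 100.0 * c / n for edge, c in sorted(counts.items())}


# Hypothetical scores, not values from the paper.
toy_scores = [4.5, 5.0, 7.2, -1.0, 3.9]
hist = histogram_percent(toy_scores)
for edge, pct in hist.items():
    print(f"[{edge}, {edge + 2}): {pct:.0f}%")
```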
Figure 3. Histograms to evaluate the models. Genes isolated from S. purpuratus (circles), vertebrates (diamonds), and protostomes (triangles) were collected and analyzed using the Purpuratus (A) and Vertebrate (B) models. Histograms of the known positive (solid lines) and negative (dashed lines) donor splice sites were generated (bin size = 2). The average of the means (Table 3) is shown by a vertical dotted line. Values corresponding to N0.95 and P0.05 (Table 3) flank the left and right sides of the gray region, respectively, and are shown as dashed/dotted lines. The tables within the graphs indicate the percentages of known positive (Pos.) and negative (Neg.) S. purpuratus (Purp.), vertebrate (Vert.), and protostome (Prot.) sequences that were classified as positive or negative using the average of the means as the threshold.
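The thresholding used for the inset tables can be sketched as follows, assuming that "average of the means" is the midpoint between the mean positive score and the mean negative score, and that a site is called positive when its score is strictly above that threshold. Both readings are assumptions; the function names and scores are hypothetical.

```python
from statistics import mean


def average_of_means(pos_scores, neg_scores):
    """Midpoint between the mean positive and mean negative score."""
    return (mean(pos_scores) + mean(neg_scores)) / 2


def percent_called_positive(scores, threshold):
    """Percentage of scores strictly above the threshold (assumed rule)."""
    return 100.0 * sum(s > threshold for s in scores) / len(scores)


# Hypothetical scores, not values from the paper.
known_pos = [6.0, 8.0, 4.0, 2.0]
known_neg = [-4.0, -2.0, 0.0, 2.0]

threshold = average_of_means(known_pos, known_neg)
pos_rate = percent_called_positive(known_pos, threshold)
neg_rate = percent_called_positive(known_neg, threshold)
print(threshold, pos_rate, neg_rate)
```

With a single shared threshold, the percentage of known positives called positive and the percentage of known negatives called positive summarize how well a model separates the two classes for each group of species.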
Allen, Computational gene prediction using multiple sources of evidence. 2004, Pubmed
Ast, How did alternative splicing evolve? 2004, Pubmed
Berget, Exon recognition in vertebrate splicing. 1995, Pubmed
Brites, The Dscam homologue of the crustacean Daphnia is diversified by alternative splicing like in insects. 2008, Pubmed
Brockton, Localization and diversity of 185/333 proteins from the purple sea urchin--unexpected protein-size range and protein expression in a new coelomocyte type. 2008, Pubmed, Echinobase
Buckley, Extraordinary diversity among members of the large gene family, 185/333, from the purple sea urchin, Strongylocentrotus purpuratus. 2007, Pubmed, Echinobase
Buckley, The 185/333 gene family is a rapidly diversifying host-defense gene cluster in the purple sea urchin Strongylocentrotus purpuratus. 2008, Pubmed, Echinobase
Buckley, Sequence variations in 185/333 messages from the purple sea urchin suggest posttranscriptional modifications to increase immune diversity. 2008, Pubmed, Echinobase
Burge, Finding the genes in genomic DNA. 1998, Pubmed
Burge, Prediction of complete gene structures in human genomic DNA. 1997, Pubmed
Burset, Evaluation of gene structure prediction programs. 1996, Pubmed
Burset, Analysis of canonical and non-canonical splice sites in mammalian genomes. 2000, Pubmed
Cai, Modeling splice sites with Bayes networks. 2000, Pubmed
Carter, Vertebrate gene finding from multiple-species alignments using a two-level strategy. 2006, Pubmed
Davidson, A genomic regulatory network for development. 2002, Pubmed, Echinobase
Dewey, Accurate identification of novel human genes through simultaneous gene prediction in human, mouse, and rat. 2004, Pubmed
Graveley, The organization and evolution of the dipteran and hymenopteran Down syndrome cell adhesion molecule (Dscam) genes. 2004, Pubmed
Harris, Distribution and consensus of branch point signals in eukaryotic genes: a computerized statistical analysis. 1990, Pubmed
Hibino, The immune gene repertoire encoded in the purple sea urchin genome. 2006, Pubmed, Echinobase
Huang, Optimized mixed Markov models for motif identification. 2006, Pubmed
Kan, Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. 2001, Pubmed
Lander, Initial sequencing and analysis of the human genome. 2001, Pubmed
LeBlanc, Sea urchin small RNA ribonucleoprotein particles: identification, synthesis, and subcellular localization during early embryonic development. 1992, Pubmed, Echinobase
Mathé, Current methods of gene prediction, their strengths and weaknesses. 2002, Pubmed
Murakami, Gene recognition by combination of several gene-finding programs. 1998, Pubmed
Nair, Macroarray analysis of coelomocyte gene expression in response to LPS in the sea urchin. Identification of unexpected immune diversity in an invertebrate. 2005, Pubmed, Echinobase
Pertea, Computational gene finding in plants. 2002, Pubmed
Rast, Genomic insights into the immune system of the sea urchin. 2006, Pubmed, Echinobase
Rast, New approaches towards an understanding of deuterostome immunity. 2000, Pubmed, Echinobase
Schmucker, Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. 2000, Pubmed
Sodergren, The genome of the sea urchin Strongylocentrotus purpuratus. 2006, Pubmed, Echinobase
Stanke, Gene prediction with a hidden Markov model and a new intron submodel. 2003, Pubmed
Terwilliger, Distinctive expression patterns of 185/333 genes in the purple sea urchin, Strongylocentrotus purpuratus: an unexpectedly diverse family of transcripts in response to LPS, beta-1,3-glucan, and dsRNA. 2007, Pubmed, Echinobase
Terwilliger, Unexpected diversity displayed in cDNAs expressed by the immune cells of the purple sea urchin, Strongylocentrotus purpuratus. 2006, Pubmed, Echinobase
Thanaraj, Prediction of exact boundaries of exons. 2000, Pubmed
Wasserman, Applied bioinformatics for the identification of regulatory elements. 2004, Pubmed
Yu, Minimal introns are not "junk". 2002, Pubmed
Zhang, Computational prediction of eukaryotic protein-coding genes. 2002, Pubmed