Abstract
A comprehensive transcriptome analysis has been performed on protein-coding RNAs of Strongylocentrotus purpuratus, including 10 different embryonic stages, six feeding larval and metamorphosed juvenile stages, and six adult tissues. In this study, we pooled the transcriptomes from all of these sources and focused on the insights they provide for gene structure in the genome of this recently sequenced model system. The genome had initially been annotated by use of computational gene model prediction algorithms. A large fraction of these predicted genes were recovered in the transcriptome when the reads were mapped to the genome and appropriately filtered and analyzed. However, in a manually curated subset, we discovered that more than half of the computational gene model predictions were imperfect, containing errors such as missing exons, prediction of nonexistent exons, erroneous intron/exon boundaries, fusion of adjacent genes, and prediction of multiple genes from single genes. The transcriptome data have been used to provide a systematic upgrade of the gene model predictions throughout the genome, greatly improving the research usability of the genomic sequence. We have constructed new public databases that incorporate information from the transcriptome analyses. The transcript-based gene model data were used to define average structural parameters for S. purpuratus protein-coding genes. In addition, we constructed a custom sea urchin gene ontology, and assigned about 7000 different annotated transcripts to 24 functional classes. Strong correlations became evident between given functional ontology classes and structural properties, including gene size, exon number, and exon and intron size.
PMID: 22709795
PMCID: PMC3460201
Journal: Genome Res
Figure 1. Computational simulation of quantitative variations at different sequencing depths. The ordinate, ratio of FPKM per transcript species in the two data sets compared, is given in log2; the abscissa, mean of the two FPKM values, in log10. (Blue dots) 20 million (M) reads; (green dots) 2M reads; (red dots) 0.2M reads. (Vertical dashed line) Average FPKM 5; (horizontal dotted lines) ± twofold change. The plot shows that in the 20M read data set, prevalence estimations for almost all mRNAs over FPKM 5 are within twofold.
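The plot coordinates described in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the arithmetic (log2 ratio against log10 mean, plus the fraction of transcripts within twofold above a mean FPKM of 5), not the authors' simulation code. The function names and the example FPKM values are hypothetical.

```python
import math

def ma_point(fpkm_a, fpkm_b):
    """Return (log10 of the mean FPKM, log2 of the FPKM ratio) for one transcript.

    Both inputs must be positive; these are the abscissa and ordinate
    described in the Figure 1 caption.
    """
    mean = (fpkm_a + fpkm_b) / 2
    return math.log10(mean), math.log2(fpkm_a / fpkm_b)

def fraction_within_twofold(pairs, min_mean_fpkm=5.0):
    """Fraction of transcripts with mean FPKM above the threshold whose
    two estimates differ by no more than twofold (|log2 ratio| <= 1)."""
    kept = [(a, b) for a, b in pairs
            if a > 0 and b > 0 and (a + b) / 2 > min_mean_fpkm]
    if not kept:
        return float("nan")
    within = sum(1 for a, b in kept if abs(math.log2(a / b)) <= 1.0)
    return within / len(kept)

# Made-up FPKM pairs (data set 1, data set 2), for illustration only.
pairs = [(10.0, 12.0), (8.0, 20.0), (100.0, 90.0), (1.0, 4.0)]
print(fraction_within_twofold(pairs))
```

In this toy input the last pair falls below the FPKM 5 cutoff and is ignored, and one of the remaining three pairs differs by more than twofold.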
Figure 2. Length distributions of protein-coding genes and their components. Essentially these plots are smoothed versions of a histogram where the ordinate represents the frequency of the given length in base pairs. All distributions have very long tails, and the plots only show part of the distributions: (A) genes, 0–100 kb; (B) introns and mRNA, 0–10 kb; (C) UTRs and CDS, 0–5 kb; (D) exons, 0–1 kb.
Figure 3. Lengths of exons and introns with respect to their relative positions in genes. (A) Labeling method for introns and exons used in the following panels. (B,C) Average length of exons and introns diagrammed in A. (D,E) Average length of each exon and intron in all genes containing 10 exons.
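As a rough sketch of the kind of calculation behind panels D and E of Figure 3, the following Python computes per-position average exon and intron lengths over all genes with a fixed number of exons. The gene representation (a list of half-open `(start, end)` exon intervals, ordered 5′ to 3′ on the plus strand) and the function names are assumptions made for illustration; the source does not describe its data format.

```python
def exon_and_intron_lengths(exons):
    """Given exon intervals as (start, end) pairs (half-open, plus strand),
    return the list of exon lengths and the list of intron lengths."""
    exons = sorted(exons)
    exon_lens = [end - start for start, end in exons]
    # Each intron spans the gap between one exon's end and the next exon's start.
    intron_lens = [nxt[0] - cur[1] for cur, nxt in zip(exons, exons[1:])]
    return exon_lens, intron_lens

def mean_length_by_position(genes, n_exons):
    """Average exon and intron length at each position, using only genes
    that contain exactly n_exons exons."""
    selected = [exon_and_intron_lengths(g) for g in genes if len(g) == n_exons]
    if not selected:
        return [], []
    exon_means = [sum(col) / len(col) for col in zip(*(e for e, _ in selected))]
    intron_means = [sum(col) / len(col) for col in zip(*(i for _, i in selected))]
    return exon_means, intron_means

# Made-up gene models, for illustration only.
genes = [
    [(0, 100), (500, 650), (1000, 1200)],
    [(0, 200), (300, 350), (900, 1000)],
    [(0, 50), (80, 120)],  # two exons, skipped when n_exons=3
]
print(mean_length_by_position(genes, 3))
```

For minus-strand genes the intervals would need to be reversed before averaging so that position 1 is always the 5′-most exon.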
Figure 4. Discrepant predicted and observed gene structure displayed in the IGV genome browser. A selectable variety of aligned features is shown in horizontal tracks with the feature label to the left: Repeat sequences (gray; shows the number of matches using 76-bp sequence windows in the whole genome, using Bowtie with the same parameters as when mapping the reads); Gap (gray; sequence regions of the genome assembly that lie in gaps and are therefore undetermined; several short gaps are shown in A); GLEAN model (red; the original gene model predicted by the GLEAN method); RNA-seq gene models (blue; the models produced by this study; the blank terminal regions are UTRs); Coverage (green; a graphical presentation of the number of sequencing reads that align at a particular location); Reads (gray; the alignment of individual reads to the genome sequence). (Orange arrows) Individual RNA sequence-derived exons. (A) The genomic structure of the gene blimp1. The overall structure of the GLEAN gene model is correct except longer UTRs are recovered and an alternatively spliced isoform that uses a distant 5′ exon is recovered. (B) The genomic structure of the gene hnf6. The GLEAN model predicted an incorrect exon1/intron1 boundary, and the 3′ exon is not supported by sequence. The correct 3′ exons and two isoforms were identified from the RNA sequence data.
Figure 5. Numbers of gene models associated with major functional classes. The distribution is based on the custom sea urchin ontology discussed in the text.
Figure 6. Gene structure parameters for individual ontological classes. The four panels show average gene length, exon length, intron length, and exon number. (Black horizontal lines) The average value of the feature in the whole gene set. The “Unclassified” class refers to gene models that were not included in these ontological classes. The “Novel” class refers to gene models newly identified in this study as described in the text; these tend to be atypically small genes with few exons.