Searching annotated genes
Gene search database has 29891 Strongylocentrotus purpuratus annotated genes
Gene search database has 28094 Lytechinus variegatus annotated genes
Gene search database has 29697 Patiria miniata annotated genes
The Gene Search page is the entry point to the database of annotated genes predicted from the genome sequences of Strongylocentrotus purpuratus, Lytechinus variegatus and Patiria miniata. Each search returns a list of matches, each linked to its gene information page. The query can be a text fragment, optionally with wildcard symbols. The asterisk "*" denotes any alphanumeric text (bra* = Sp-Bra or brachyury or Sp-Cobra). The question mark "?" denotes exactly one alphanumeric character (br? = Sp-Bra or Sp-Fibropellin).
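The wildcard rules can be illustrated by translating a query into a regular expression. This is a minimal sketch, not the SpBase search code. It assumes that "*" may also match empty text, that matching ignores case, and that the query may match anywhere in a name. That fragment behaviour is why bra* also finds Sp-Cobra.

```python
import re


def wildcard_to_regex(query):
    """Translate a Gene Search style query into a compiled regular expression.

    Assumptions for this sketch: '*' matches any run (possibly empty) of
    alphanumeric characters, '?' matches exactly one alphanumeric character,
    every other character is literal, and matching ignores case.
    """
    parts = []
    for ch in query:
        if ch == "*":
            parts.append("[A-Za-z0-9]*")
        elif ch == "?":
            parts.append("[A-Za-z0-9]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def search(query, names):
    """Return the names containing a match for the query anywhere (fragment search)."""
    pattern = wildcard_to_regex(query)
    return [name for name in names if pattern.search(name)]


names = ["Sp-Bra", "brachyury", "Sp-Cobra", "Sp-Fibropellin", "Sp-Otx"]
print(search("bra*", names))  # ['Sp-Bra', 'brachyury', 'Sp-Cobra']
print(search("br?", names))   # ['Sp-Bra', 'brachyury', 'Sp-Cobra', 'Sp-Fibropellin']
```

Because the query is treated as a fragment, br? here also matches brachyury and Sp-Cobra. The examples on this page list only some of the possible hits.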
Gene Official ID (Example: SPU_000001, LVA_000001 or PMI_000001)
This query uses the SpBase form of the official gene identifier, which is unique in the database. For Strongylocentrotus purpuratus the identifier is the three letters "SPU" followed by an underscore and six digits.
The gene identifiers for Strongylocentrotus purpuratus are derived from, and continue, the Official Gene Set identifiers generated at Baylor (the GLEAN numbers). The relationship between SPU numbers and GLEAN numbers is as follows:
GLEAN3_##### = SPU_0#####
SPU numbers higher than the original 28,944 GLEAN3 numbers denote annotated genes for which no GLEAN prediction existed. These genes came from individually sequenced cDNAs or other evidence.
Lytechinus variegatus genes have format LVA_0#####
Patiria miniata genes have format PMI_0#####
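The identifier formats and the GLEAN3-to-SPU rule above can be sketched as small conversion helpers. The function names are hypothetical. The reverse mapping assumes that GLEAN3 numbers run contiguously up to 28,944, which the text implies but does not state.

```python
import re

# Official IDs: three-letter species prefix, underscore, six digits.
ID_PATTERN = re.compile(r"^(SPU|LVA|PMI)_(\d{6})$")

SPECIES = {"SPU": "Strongylocentrotus purpuratus",
           "LVA": "Lytechinus variegatus",
           "PMI": "Patiria miniata"}


def species_of(gene_id):
    """Return the species for an official gene ID such as SPU_000001."""
    m = ID_PATTERN.match(gene_id)
    if m is None:
        raise ValueError(f"not an official gene ID: {gene_id!r}")
    return SPECIES[m.group(1)]


def glean_to_spu(glean_id):
    """GLEAN3_##### -> SPU_0##### (a zero is prefixed to the five digits)."""
    m = re.fullmatch(r"GLEAN3_(\d{5})", glean_id)
    if m is None:
        raise ValueError(f"not a GLEAN3 ID: {glean_id!r}")
    return f"SPU_0{m.group(1)}"


def spu_to_glean(spu_id):
    """Inverse mapping. Returns None for SPU numbers above 28,944, which had
    no GLEAN prediction (assumes contiguous GLEAN3 numbering)."""
    m = re.fullmatch(r"SPU_0(\d{5})", spu_id)
    if m is None or int(m.group(1)) > 28944:
        return None
    return f"GLEAN3_{m.group(1)}"
```

For example, glean_to_spu("GLEAN3_13015") gives "SPU_013015", and spu_to_glean("SPU_030267") gives None.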
Scaffold (Example: Scaffold100, Scaffold3148)
The scaffold identifier refers to a scaffold of the genome assembly: Version 3.1 for Strongylocentrotus purpuratus, Version 0.4 for Lytechinus variegatus and Version 1.0 for Patiria miniata. The query is the word "Scaffold" followed by the scaffold number, with no space between them. It returns a list of the annotated genes predicted from the scaffold sequence. To see all gene predictions on that scaffold, view the scaffold in JBrowse with the GLEAN:Prediction track displayed (for Strongylocentrotus purpuratus) or the MAKER2-prediction track displayed (for Lytechinus variegatus and Patiria miniata).
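The scaffold query format can be enforced with a small normalizer. This is a hypothetical helper, not part of SpBase. Accepting lowercase input and stray spaces is an assumption made for illustration. The helper always emits the "Scaffold<number>" form with no space.

```python
import re


def scaffold_query(text):
    """Normalize input such as '100', 'scaffold100' or 'Scaffold 100' to 'Scaffold100'.

    Tolerating case differences and spaces is an assumption of this sketch;
    the search itself expects no space between 'Scaffold' and the number.
    """
    m = re.fullmatch(r"\s*(?:scaffold)?\s*(\d+)\s*", text, re.IGNORECASE)
    if m is None:
        raise ValueError(f"cannot read a scaffold number from {text!r}")
    return f"Scaffold{m.group(1)}"
```
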
Official gene name
The official gene names of the annotated genes take the following form:
Sp-gene_symbol. The first three characters are always "Sp-".
Lv-gene_symbol. The first three characters are always "Lv-".
Pm-gene_symbol. The first three characters are always "Pm-".
Symbols are usually 3-5 characters, but additional characters may be added as necessary, up to 10, to distinguish similar symbols. Symbols should begin with an uppercase letter followed by all lowercase letters, unless extensive previous usage dictates otherwise. Punctuation is used only to separate two adjacent numbers (e.g., Lamb1-2) or to designate related sequences (e.g., Es10-rs1) and pseudogenes (e.g., Adh5-ps1).
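The naming rules above can be checked with a short sketch. The helper names are hypothetical. The length and capitalization rules are "usually" and "should" guidelines, so deviations are reported as warnings rather than rejected.

```python
import re

PREFIXES = {"Sp-": "Strongylocentrotus purpuratus",
            "Lv-": "Lytechinus variegatus",
            "Pm-": "Patiria miniata"}


def split_official_name(name):
    """Split an official name such as 'Sp-Bra' into (species, symbol)."""
    prefix, symbol = name[:3], name[3:]
    if prefix not in PREFIXES or not symbol:
        raise ValueError(f"not an official gene name: {name!r}")
    return PREFIXES[prefix], symbol


def symbol_warnings(symbol):
    """Soft checks of the symbol guidelines; returns a list of warnings."""
    warnings = []
    if len(symbol) > 10:
        warnings.append("longer than 10 characters")
    elif not 3 <= len(symbol) <= 5:
        warnings.append("outside the usual 3-5 characters")
    # One uppercase letter, then no further uppercase letters
    # (digits and '-' are allowed, e.g. Lamb1-2, Adh5-ps1).
    if not re.fullmatch(r"[A-Z][^A-Z]*", symbol):
        warnings.append("not an uppercase letter followed by lowercase")
    return warnings
```

For example, symbol_warnings("Adh5-ps1") reports only that the symbol is longer than the usual 3-5 characters.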
The query can be a text fragment and doesn't need to include the general prefix (Sp-,Lv- or Pm-). A list of the genes whose names match the text fragment will be returned.
Synonym (Example: Src, EXT, chrd)
To account for variable usage of gene names, we have tried to collect all known synonyms for annotated genes. For example, Sp-Bra has been called brachyury in many publications, and a mouse name that has been used for brachyury is "T", or the small-T locus. The synonym category can be searched with text fragments as above, and a list of genes whose synonyms match the query is returned.
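A synonym search can be sketched as a fragment match over a table that maps official names to synonyms. The table below is a hypothetical excerpt, not SpBase data. Case-insensitive matching is an assumption.

```python
# Hypothetical excerpt of a synonym table: official name -> known synonyms.
SYNONYMS = {
    "Sp-Bra": ["brachyury", "T"],
    "Sp-Chrd": ["chordin", "chrd"],
}


def genes_by_synonym(fragment, table=SYNONYMS):
    """Return official names whose synonyms contain the fragment (case-insensitive)."""
    frag = fragment.lower()
    return [gene for gene, synonyms in table.items()
            if any(frag in s.lower() for s in synonyms)]
```

With this table, genes_by_synonym("BRACH") returns ["Sp-Bra"].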
PubMed ID (Example: 9628328)
PubMed is the bibliographic component of NCBI's Entrez retrieval system. It provides access to a database of citations from biomedical journals and is maintained by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), one of the institutes of the National Institutes of Health (NIH). PubMed links also lead to full-text journal articles at the Web sites of participating publishers and to other related Web resources. The PubMed ID (PMID) is a whole number of up to eight digits. Wildcards are not useful here.
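Since PMIDs are plain numbers and wildcards do not apply, a query can be validated before it is used. This sketch also builds the PubMed web address for the ID. The helper name is hypothetical. The eight-digit upper bound reflects current PMIDs and is an assumption.

```python
import re


def pubmed_url(pmid):
    """Check that a PubMed ID is a plain number (no wildcards) and build its PubMed link.

    The 1-8 digit bound is an assumption based on current PMIDs.
    """
    pmid = pmid.strip()
    if not re.fullmatch(r"[0-9]{1,8}", pmid):
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    return f"https://pubmed.ncbi.nlm.nih.gov/{int(pmid)}/"
```
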
Gene Information (search results)
Searching for a gene by its gene ID, gene name, scaffold, synonym or PubMed ID returns that gene's information page. The page includes general gene information, expression data, functional annotation, associated GO terms, sequence information, reagent data and PubMed references. The Lytechinus variegatus and Patiria miniata genomes were sequenced recently, and annotation of their genes is in progress. For these genomes the page provides general gene information, functional annotation, associated GO terms, sequence information and PubMed references.
Identification (general gene information)
ID: The official gene ID (SPU, LVA or PMI ID).
The relationship between SPU numbers and GLEAN numbers is: GLEAN3_##### = SPU_0#####
LVA genes have format LVA_0#####
PMI genes have format PMI_0#####
Common Name: The official gene name of the annotated gene, in the form Sp-gene_symbol, Lv-gene_symbol or Pm-gene_symbol.
Synonym: Alternative names for the gene that are commonly used in publications or in other organisms.
Family Member: The InterProScan program was employed to identify all known protein motifs, families and domains.
Best Genbank Hit: The best match from a protein sequence comparison (blastp) against annotated proteins in the non-redundant protein set (nr) or the reference sequences (refseq_protein).
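Choosing a best hit from blastp results can be sketched as follows. The sketch assumes BLAST+ tabular output (-outfmt 6 with its default twelve columns). The text does not say how SpBase ranks hits, so ranking by highest bit score is an assumption.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text):
    """Return {query: (subject, evalue, bitscore)}, keeping the highest bit score.

    Ranking by bit score is an assumption of this sketch.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, line.split("\t")))
        score = float(rec["bitscore"])
        current = best.get(rec["qseqid"])
        if current is None or score > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], float(rec["evalue"]), score)
    return best
```

The input can come from a command such as `blastp -query proteins.fa -db nr -outfmt 6`, where the file name is a placeholder.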
Gene Model Check: After the first release of the Sp genome data in 2006, the resulting GLEAN gene models (predicted genes) were initially characterized by nearly 200 scientists in the US and abroad. The annotators focused on groups of proteins that are highly similar to proteins with known functions in human and other organisms. The annotation data were published in the December 2006 issue of Developmental Biology. In total, 10,244 genes were manually annotated, and the Gene Model Check identifies these genes as manually annotated. The remaining genes were annotated by protein sequence comparison (blastp) against annotated proteins in the non-redundant protein set (nr) or the reference sequences (refseq_protein). These genes are termed IEA (electronically annotated genes). All Lv and Pm genes are electronically annotated.
Ortholog/Homolog: The name of the gene in other species that shares a common ancestor with the gene of interest is given here.
Expression data are provided for:
1. Early Embryo Spatial and Temporal Expression
2. Early Whole Embryo Expression TimeCourse
3. QPCR time course
4. Locally expressed regulatory genes in early sea urchin development (measured with the NanoString nCounter, an RNA counting device)
5. Transcriptome (tiling array data mapped to genome assembly V2.6)
6. Temporal expression profiles of sequences found in 35,282 gene predictions within the sea urchin genome
Dr. Qiang Tu organized the Sp proteome into functional groups based on the 34 papers published in the December 2006 issue of Developmental Biology. This scheme consists of 24 groups and 138 sub-groups (http://spbase.org/SpBase/misc/Qiang-138-subgroups). In addition, the 3,362 Pfam domain names found in the GLEAN proteins were regrouped according to a variety of attributes, including structure, biological process, intracellular association and enzymatic activity, and compressed into 135 new groups. Pm and Lv genes that matched SPU genes were annotated with the functional category class and sub-class of the matching SPU gene.
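The two steps above can be sketched with lookup tables. The first step collapses Pfam domain names into the regrouped categories. The second transfers the class and sub-class of a matching SPU gene to an Lv or Pm gene. All tables and group names below are hypothetical placeholders, not SpBase data.

```python
# Hypothetical lookup tables for illustration only.
PFAM_TO_GROUP = {"Homeobox": "GroupA", "Pkinase": "GroupB"}
SPU_CATEGORY = {"SPU_000001": ("ClassA", "SubclassA1")}


def pfam_groups(domains, pfam_to_group=PFAM_TO_GROUP):
    """Collapse a protein's Pfam domain names into the regrouped categories."""
    return sorted({pfam_to_group[d] for d in domains if d in pfam_to_group})


def transfer_categories(matches, spu_category=SPU_CATEGORY):
    """Give each Lv/Pm gene the (class, sub-class) of the SPU gene it matched.

    `matches` maps an LVA_/PMI_ gene ID to its matching SPU_ gene ID; genes
    whose SPU match has no category are left unannotated.
    """
    return {gene: spu_category[spu]
            for gene, spu in matches.items() if spu in spu_category}
```
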
The InterProScan program was employed to identify all known protein motifs, families, domains and Gene Ontology (GO) terms (Cellular Component, Biological Process and Molecular Function categories) in SPU_ (GLEAN3) proteins. Additional protein domain assignments came from SUPERFAMILY (using its HMM library): http://supfam.cs.bris.ac.uk/SUPERFAMILY/cgi-bin/gen_list.cgi?genome=tu
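Collecting GO terms per protein from InterProScan results can be sketched as follows. The sketch assumes the tab-separated output of InterProScan 5 run with --goterms. In that format, column 1 is the protein ID and column 14 lists GO terms separated by "|". The InterProScan version used for SpBase is not stated, so treat the column layout as an assumption.

```python
import re
from collections import defaultdict


def go_terms_by_protein(tsv_text):
    """Collect GO IDs per protein from InterProScan TSV output.

    Assumes the InterProScan 5 layout produced with --goterms: column 1 is the
    protein ID and column 14 lists GO terms separated by '|' ('-' or absent
    when there are none). Some releases append a source such as '(InterPro)'
    to each term; only the GO:nnnnnnn ID is kept.
    """
    terms = defaultdict(set)
    for line in tsv_text.splitlines():
        row = line.split("\t")
        if len(row) < 14 or row[13] in ("", "-"):
            continue
        for term in row[13].split("|"):
            match = re.match(r"GO:\d{7}", term)
            if match:
                terms[row[0]].add(match.group(0))
    return dict(terms)
```
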
Sequence information is provided for three gene models:
a) GLEAN predicted gene (CDS and peptides)
b) GLEAN predicted gene with 3’ UTRs (exons)
c) Transcriptome gene (exons and peptides)
SPU id nomenclature
- SPU genes (GLEAN predicted) have version 1.
Example: SPU_013015.1 is the GLEAN predicted gene.
- SPU genes with 3’ UTRs have version 2.
Example: SPU_013015.2 is the GLEAN predicted gene with 3’ UTRs.
- WHL genes assigned to SPU (GLEAN) genes have version 3. WHL isoforms are assigned sub-versions a, b, c, …, q.
Example: SPU_013015.3a is WHL gene WHL22.600041, which has a single isoform.
SPU_010424.3a, SPU_010424.3b, SPU_010424.3c and SPU_010424.3d are the four isoforms of WHL gene WHL22.532435.
- If an SPU gene has more than one WHL gene mapped to it, the additional WHL gene gets version 4. Example: SPU_021500.3a is mapped to WHL gene WHL22.100773 and SPU_021500.4a is mapped to WHL22.100781.
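The versioned SPU identifiers can be decoded with a regular expression. This sketch reads ".1" as the GLEAN model and ".2" as the GLEAN model with 3’ UTRs. It treats version 3 or higher as a WHL transcriptome model, following the SPU_021500.4a example. The function name is hypothetical.

```python
import re

# SPU_######.<version><optional isoform letter a-q>
VERSIONED = re.compile(r"^(SPU_\d{6})\.(\d+)([a-q]?)$")


def describe_model(model_id):
    """Split a versioned SPU ID into (gene, version, isoform, model type)."""
    m = VERSIONED.match(model_id)
    if m is None:
        raise ValueError(f"not a versioned SPU ID: {model_id!r}")
    gene, version, isoform = m.group(1), int(m.group(2)), m.group(3)
    if version == 1:
        kind = "GLEAN predicted gene"
    elif version == 2:
        kind = "GLEAN predicted gene with 3' UTRs"
    else:
        # Version 3, plus 4 and up for additional WHL genes (assumed from the example).
        kind = "WHL transcriptome gene"
    return gene, version, isoform or None, kind
```

For example, describe_model("SPU_010424.3d") returns ("SPU_010424", 3, "d", "WHL transcriptome gene").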
The sequence information can be downloaded using the download tab. There is also a link out to GBrowse, a genome browser.
Always refer to the Foot notes link for information on the respective sequence.
For LVA and PMI genes, sequences are provided for the MAKER2 predicted gene (CDS and peptides).
Reagent data were mined from the literature. The reagent types are QPCR primers (forward, reverse), reverse transcriptase PCR (RT-PCR) primers (forward, reverse), whole mount in situ hybridization (WMISH) primers (forward, reverse), WMISH probes, splicing morpholinos and translation-blocking morpholinos.
The comment section indicates whether the gene has a duplicate or an isomer, overlaps a ribosomal RNA, or has a gene model derived directly from the RNA-seq transcriptome. The WHL genes that were auto-assigned SPU IDs are described under Transcriptome data (RNA-seq).
The Reference section lists the publications in which the gene has been referred to.
Duplicate genes have been retired from the annotation database. An SPU gene is a duplicate if two or more SPU genes have exactly the same start and end coordinates as a single WHL gene. The primary gene typically has references, experimentally supported gene expression data, etc., whereas the duplicate lacks these data. Searching for a duplicate gene still leads to its gene information page, but the header section states explicitly that it is a duplicate gene. Example: SPU_011552 has duplicate gene SPU_009384.
Isomers have been retired from the annotation database. An SPU gene is an isomer if two or more SPU genes overlap a single WHL gene. Example: SPU_005343 is the primary gene and SPU_009374 is its isomer. The primary gene typically has references, experimentally supported gene expression data, etc., whereas the isomer lacks these data. Searching for an isomer gene in the annotation database redirects you to the primary gene. The comment section of the primary gene states explicitly that the secondary isomer has been retired.
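The duplicate and isomer rules can be sketched as a comparison of SPU gene coordinates against one WHL gene. This is an illustrative reading of the text, not the SpBase pipeline. The example pairs suggest that "more than two" means "two or more", so the sketch uses that threshold. It also assumes that all intervals lie on the same scaffold and use inclusive coordinates. Picking the primary gene depends on references and expression data, so the sketch does not attempt it.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    start: int
    end: int  # inclusive; all intervals assumed to be on the same scaffold


def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end


def classify(spu_models, whl):
    """Label SPU genes against a single WHL gene.

    Two or more SPU genes with exactly the WHL start and end -> 'duplicate';
    otherwise two or more SPU genes overlapping the WHL gene -> 'isomer'.
    """
    exact = [g for g, iv in spu_models.items() if iv == whl]
    if len(exact) >= 2:
        return "duplicate", exact
    hits = [g for g, iv in spu_models.items() if overlaps(iv, whl)]
    if len(hits) >= 2:
        return "isomer", hits
    return "single", hits
```
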
Transcriptome data (RNA-seq)
A comprehensive transcriptome analysis has been performed on the protein-coding RNAs of Strongylocentrotus purpuratus, covering 10 embryonic stages, 6 feeding larval and metamorphosed juvenile stages, and 6 adult tissues. To generate a more complete set of gene models, we pooled the transcriptomes from all these sources. 22,465 WHL genes (transcriptome models) mapped to GLEAN genes. There are 3,000 WHL genes that have no SPU gene assigned. In a BLAST comparison against the NCBI nt database, 900 of these genes matched Sp genes, and the remainder were mostly bacterial hits. These genes have been auto-assigned SPU IDs starting at SPU_030267.
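Auto-assignment of new SPU IDs can be sketched as numbering WHL genes consecutively from SPU_030267. The text gives only the starting ID. Consecutive numbering in input order is an assumption, and the WHL names in the usage line are hypothetical.

```python
def assign_new_spu_ids(whl_genes, first=30267):
    """Give each unassigned WHL gene the next SPU number, starting at SPU_030267.

    Consecutive numbering in input order is an assumption of this sketch.
    """
    return {whl: f"SPU_{first + i:06d}" for i, whl in enumerate(whl_genes)}


print(assign_new_spu_ids(["WHL_a", "WHL_b"]))
# {'WHL_a': 'SPU_030267', 'WHL_b': 'SPU_030268'}
```
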