Other data

Table of Contents

  1. EST
  2. BAC end
  3. BACs
  4. Repeats
  5. Quantitative PCR primers
  6. Reagents
  7. Mapping all features to 2.6
  8. Mapping all features to 3.1


EST

A search for S. purpuratus ESTs in the NCBI dbEST database produces 141,833 hits. A search of the NCBI tracedb database gives 66,555 ESTs, of which all but 14,040 were present in the first set. Based on discussions with Baylor, the remaining 14,040 are also expected to be in the first set; this needs to be investigated. In the meantime, we continue our calculations with the first set of 141,833 ESTs. The ESTs in the first set have descriptive names and could be assigned to the following tissue types:

Coelomocyte (2647)

Egg (6006)

Different stages of embryo (23545)

Gut (939)

Lantern (1113)

Larva (12484)

Mesentery (1201)

PMC (51097)

Radial nerve (2026)

Testis (2043)

Tube foot (2139)

Rest (36593)

The names of the ESTs in 'Rest' start with 'yd' (36,523 ESTs) or 'MPMG' (70 ESTs). The 'yd' ESTs are full-length clones from Coffman, and the 'MPMG' ESTs are from Albert Poustka at the Max Planck Institute for Molecular Genetics, Berlin.
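
The counts above can be cross-checked with a few lines of Python. This is only a sanity check on the numbers quoted on this page; the dictionary names are chosen for illustration and are not part of SpBase.

```python
# EST counts per tissue type, copied from the list above.
tissue_counts = {
    "Coelomocyte": 2647,
    "Egg": 6006,
    "Different stages of embryo": 23545,
    "Gut": 939,
    "Lantern": 1113,
    "Larva": 12484,
    "Mesentery": 1201,
    "PMC": 51097,
    "Radial nerve": 2026,
    "Testis": 2043,
    "Tube foot": 2139,
    "Rest": 36593,
}

# Breakdown of the 'Rest' category by name prefix.
rest_by_prefix = {"yd": 36523, "MPMG": 70}

total_ests = sum(tissue_counts.values())
rest_total = sum(rest_by_prefix.values())

print(total_ests)  # 141833, the size of the dbEST set
print(rest_total)  # 36593, the size of the 'Rest' category
```

The per-tissue counts add up to the 141,833 dbEST hits, and the two prefixes account for every EST in 'Rest'.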

BAC end

The BAC-end sequence-tagged connector (STC) scan data are valuable in two respects. First, they provide a virtual map and immediate access to any given genomic region for further study. Second, they provide sequences from a significant, random sample of the whole genome. After editorial removal of substandard sequence, the average length of the sequence reads in the final STC database was 610 bp. The data set consists of about 76,020 BAC-end sequences. The genome size of S. purpuratus is around 800 Mbp, based on sperm DNA content, so the STC sequences in total amount to about 5% of the genomic sequence length. On average, they occur about every 10 kb in the genome.

Additional BAC-ends were collected by the Human Genome Sequencing Center, Baylor College of Medicine, as part of the purple sea urchin genome project. These sequences are also available from the GenBank database.
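
The coverage and spacing figures for the STC data set follow from simple arithmetic, sketched here in Python. The inputs are the approximate values quoted above (read count, mean read length, genome size), so the results are only expected to agree with the rounded figures in the text to within rounding. The variable names are illustrative.

```python
# Approximate values quoted in the text.
n_reads = 76_020          # BAC-end (STC) sequences
mean_read_bp = 610        # average read length after editing
genome_bp = 800_000_000   # estimated S. purpuratus genome size

# Total sequence in the STC set and its share of the genome.
total_bp = n_reads * mean_read_bp
fraction_of_genome = total_bp / genome_bp

# Average distance between consecutive BAC-end reads,
# assuming they are spread evenly over the genome.
mean_spacing_bp = genome_bp / n_reads

print(f"total STC sequence: {total_bp:,} bp")
print(f"share of genome:    {fraction_of_genome:.1%}")
print(f"mean spacing:       {mean_spacing_bp / 1000:.1f} kb")
```

With these inputs the total is about 46 Mbp, between 5% and 6% of the genome, with one read roughly every 10 kb.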


BACs

Large, high-density libraries arrayed on filters
A major product of the Sea Urchin Genome Resource is genomic BAC libraries and cDNA libraries arrayed in 384-well plates and spotted on nylon macroarray filters (Rast et al., 2000; Cameron et al., 2000). The filters are approximately 22 x 22 cm and contain a total of 18,432 duplicated spots arranged in uniquely identifiable, geometrically predefined patterns in 4 x 4 blocks (Maier et al., 1994). The cDNA libraries are likely to be missing very few, if any, expressed genes, considering that about 8,000 to 10,000 expressed genes are arrayed in 100,000 clones. Hundreds of screenings (see below) of these libraries support that conclusion. The genomic BAC libraries are constructed in a common BAC vector at an average insert size of 140 kb. Therefore, 100,000 clone libraries contain about 13x coverage of the 800 Mb sea urchin genome. We have both genomic and cDNA libraries from several species (below). In most cases we also have purified genomic DNA from the same animal used to make the genomic library. This obviates some of the problems in using the sequence, such as primer design, that stem from the highly polymorphic state of these invertebrate genomes.
BAC Libraries
Sp BAC(Mbo1)
Sp BAC genomic
Sp small BAC
Et small BAC
Lv small BAC
Gut Sp cDNA
Pl small BAC
Am small BAC
Am large BAC

Sp - Strongylocentrotus purpuratus
Sf - Strongylocentrotus franciscanus
Lv - Lytechinus variegatus
Ap - Arbacia punctulata
Et - Eucidaris tribuloides
Am - Asterina miniata
Pf - Ptychodera flava
Pl - Paracentrotus lividus


Repeats

The frequency distribution and canonical sequences of all middle and highly repetitive sequence families in the genome were obtained from the STCs. Over 500 simple sequence repeats that are useful for genotyping have been catalogued. These collections are included in our sequence database and are available for downloading.

Quantitative PCR primers

The Davidson laboratory at Caltech has generated a panel of quantitative PCR primers for measuring mRNA abundance for genes involved in early development of the sea urchin in general and in the endomesoderm gene regulatory network in particular. A table of primer sequences and comments can be viewed here or downloaded here.


Reagents

The new reagent_table structure in SpBase is as follows:

1. SPU_#
2. data_type (see below)
3. data_set (1, 2, 3, ... to indicate the primer pair and the version of a given data item)
4. data_prefix (for storing the primer name)
5. data (sequence data)
6. reference (PMID)

The current nomenclature for data_type is:

1. QPCR_F (Forward primer)
2. QPCR_R (Reverse primer)
3. RT-PCR_F (Forward primer)
4. RT-PCR_R (Reverse primer)
5. S-MO (splicing Morpholino)
6. T-MO (translation block Morpholino)
7. WMISH_F (WMISH Forward primer)
8. WMISH_R (WMISH Reverse primer)
9. WMISH_P (WMISH probe)
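
As an illustration of how one row of this table could be represented and checked, here is a minimal Python sketch. It is not the SpBase implementation: the class name `ReagentRow`, the attribute name `spu_id` (standing in for the SPU_# column), the helper `primer_pairs`, and the example values (a placeholder SPU number, made-up sequences, and a placeholder reference) are all hypothetical. Only the column names and the data_type codes come from the lists above.

```python
from dataclasses import dataclass

# data_type codes listed above, with their meanings.
DATA_TYPES = {
    "QPCR_F": "QPCR forward primer",
    "QPCR_R": "QPCR reverse primer",
    "RT-PCR_F": "RT-PCR forward primer",
    "RT-PCR_R": "RT-PCR reverse primer",
    "S-MO": "splicing morpholino",
    "T-MO": "translation-block morpholino",
    "WMISH_F": "WMISH forward primer",
    "WMISH_R": "WMISH reverse primer",
    "WMISH_P": "WMISH probe",
}


@dataclass(frozen=True)
class ReagentRow:
    """One row of reagent_table (hypothetical in-memory form)."""
    spu_id: str       # SPU_# gene identifier
    data_type: str    # one of the keys of DATA_TYPES
    data_set: int     # 1, 2, 3, ... primer pair / version
    data_prefix: str  # primer name
    data: str         # sequence
    reference: str    # PMID

    def __post_init__(self):
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data_type: {self.data_type!r}")
        if self.data_set < 1:
            raise ValueError("data_set is numbered from 1")


def primer_pairs(rows):
    """Pair forward and reverse primers that share gene, assay and data_set."""
    found = {}
    for row in rows:
        assay, _, role = row.data_type.rpartition("_")
        if role in ("F", "R"):
            key = (row.spu_id, assay, row.data_set)
            found.setdefault(key, {})[role] = row.data
    return {k: (v["F"], v["R"]) for k, v in found.items() if set(v) == {"F", "R"}}


# Placeholder rows for illustration only; not real SpBase data.
rows = [
    ReagentRow("SPU_000000", "QPCR_F", 1, "example_F", "ACGTACGTACGTACGTAC", "PMID placeholder"),
    ReagentRow("SPU_000000", "QPCR_R", 1, "example_R", "TGCATGCATGCATGCATG", "PMID placeholder"),
    ReagentRow("SPU_000000", "S-MO", 1, "example_MO", "ACGTTGCAACGTTGCAACGTTGCAA", "PMID placeholder"),
]
print(primer_pairs(rows))
```

In this sketch, data_set is what ties a forward primer to its reverse partner, which is one reading of the description of that column given above.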

The table contains only Strongylocentrotus purpuratus reagent data, taken from the text and tables of published papers. The papers were selected from the collection of sea urchin papers by searching for keywords such as WMISH, QPCR, RT-PCR, and morpholino, and for patterns typical of oligonucleotide sequences.

Mapping all features to 2.6

We created a genomic coordinate match file covering the regions that are identical between the V2.1 assembly and the V2.6 assembly. This file was used to map features from the V2.1 assembly to the V2.6 assembly based on their known locations in V2.1.
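
The idea of mapping through identical regions can be sketched as follows. The actual format of the match file is not described here, so this Python example makes several assumptions: each identical region is stored as a block with old and new sequence names, start positions and a length; coordinates are 0-based and half-open; both assemblies have the same orientation within a block; and a feature is mapped only if it lies entirely inside one block. The names `Block` and `map_feature` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Block(NamedTuple):
    """One region that is identical in the old and new assembly (assumed layout)."""
    old_seq: str
    old_start: int
    new_seq: str
    new_start: int
    length: int


def map_feature(
    seq: str, start: int, end: int, blocks: Sequence[Block]
) -> Optional[Tuple[str, int, int]]:
    """Shift a feature from old to new coordinates, or return None if unmapped."""
    for b in blocks:
        inside = (
            b.old_seq == seq
            and b.old_start <= start
            and end <= b.old_start + b.length
        )
        if inside:
            offset = b.new_start - b.old_start
            return (b.new_seq, start + offset, end + offset)
    return None


# Made-up block and features for illustration only.
blocks = [Block("old_scaffold_1", 100, "new_scaffold_A", 500, 1000)]

print(map_feature("old_scaffold_1", 150, 250, blocks))    # inside the block
print(map_feature("old_scaffold_1", 1050, 1200, blocks))  # runs past the block end
```

A feature that crosses a block boundary is left unmapped in this sketch; whether the real procedure handles such features differently is not stated above.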

Mapping all features to 3.1

We mapped all genomic features to the V3.1 assembly using the following procedure. The features were first mapped onto the V3.1 assembly using BLAT. For some genes or features, BLAT produced multiple matches, some covering the entire gene sequence and some covering only part of it. We kept the match with the highest fraction of the gene mapped.
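
The selection rule in the last step can be sketched in Python. The record layout below is an assumption made for the example, not the format actually used: each match carries the feature name, the feature length, and the number of feature bases covered by the match. The names `Match` and `best_matches` are hypothetical, and ties are resolved by keeping the first match seen, which the procedure above does not specify.

```python
from typing import Dict, Iterable, NamedTuple


class Match(NamedTuple):
    """One alignment of a feature to the assembly (assumed fields)."""
    feature: str
    feature_len: int    # length of the feature sequence
    matched_bases: int  # feature bases covered by this match
    target: str         # scaffold name in the new assembly
    target_start: int
    target_end: int

    @property
    def fraction_mapped(self) -> float:
        return self.matched_bases / self.feature_len


def best_matches(matches: Iterable[Match]) -> Dict[str, Match]:
    """Keep, for each feature, the match covering the largest fraction of it."""
    best: Dict[str, Match] = {}
    for m in matches:
        current = best.get(m.feature)
        if current is None or m.fraction_mapped > current.fraction_mapped:
            best[m.feature] = m
    return best


# Made-up matches for illustration only.
matches = [
    Match("gene_a", 1000, 400, "scaffold_1", 10, 410),
    Match("gene_a", 1000, 1000, "scaffold_2", 5000, 6000),
    Match("gene_b", 500, 450, "scaffold_3", 0, 450),
]

for name, m in best_matches(matches).items():
    print(name, m.target, f"{m.fraction_mapped:.0%}")
```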
