Repeats

Repetitive Sequences of the S. purpuratus Genome

The sea urchin genome contains a large number of distinct families of repeated sequence. For an initial examination, 37,187 of the STCs were compared with one another. The result is a total of about 3,000,000 statistically significant matches arising from sequences repeated at least once in the sample. Of these, 242,597 are matches that, according to calculation, would be recognized by hybridization under normal incubation conditions (60°C, 0.18 M Na+). This is termed the 'hybridization criterion' for the significance of repeated sequences, and it is the criterion applied throughout this description. The quantity and frequency of repeated sequence are larger if shorter or less well matched repeated sequence motifs are considered.
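The calculation behind the hybridization criterion is not given here. As a rough illustration only, the sketch below tests a match against the stated conditions (60°C, 0.18 M Na+) using a common textbook approximation for the melting temperature of long DNA duplexes. The coefficients, the penalty of about 1°C per 1% mismatch, the rule that a match counts when its estimated melting temperature is at or above the incubation temperature, and the function names are all assumptions made for this example, not the authors' procedure.

```python
import math


def estimated_tm(length_bp, gc_fraction, mismatch_fraction, na_molar=0.18):
    """Approximate melting temperature (deg C) of an imperfect DNA duplex.

    Textbook approximation (an assumption, not the source's formula):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 675/N - (% mismatch)
    """
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 41.0 * gc_fraction          # 0.41 per percent GC
        - 675.0 / length_bp           # penalty for short duplexes
        - 100.0 * mismatch_fraction   # assumed ~1 deg C per 1% mismatch
    )


def meets_hybridization_criterion(length_bp, gc_fraction, mismatch_fraction,
                                  incubation_c=60.0, na_molar=0.18):
    """True if the duplex is estimated to be stable at the incubation temperature."""
    tm = estimated_tm(length_bp, gc_fraction, mismatch_fraction, na_molar)
    return tm >= incubation_c


# A 300 bp match at 40% GC: 10% divergence passes, 25% divergence does not.
print(meets_hybridization_criterion(300, 0.40, 0.10))
print(meets_hybridization_criterion(300, 0.40, 0.25))
```

Under these assumed coefficients, both shorter and more divergent matches fall below the threshold, which is consistent with the statement that relaxing the criterion increases the amount of sequence counted as repeated.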

The observed matches were used to classify sequences into families whose members are similar to a canonical sequence of the family. These families vary widely in the number of copies per genome (the 'frequency of repetition'). The total content of repetitive sequence in the frequency classes above 500 copies/genome is summarized in Table 1. A separate entry there is reserved for a specific repeat sequence (2109B) studied earlier, which is the highest frequency repetitive element of the S. purpuratus genome. In addition, there are large numbers of lower frequency families for which the actual genomic frequency is still uncertain; these are not included in Table 1. About 3,120 of these lower frequency repeats occur in the sample of 37,187 BAC STCs, in which they are present in two to ten copies. Altogether, repeated sequences meeting the hybridization criterion make up a little less than a third of the genome, in agreement with reassociation kinetic estimates made a quarter of a century ago.
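The text does not describe how families were built from the pairwise matches. One simple way to obtain families centered on a canonical sequence is a greedy "star" grouping, sketched below under that assumption: the unassigned sequence with the most unassigned match partners is taken as canonical, its partners become the family members, and the step repeats. The function name and the sequence identifiers are hypothetical.

```python
from collections import defaultdict


def group_into_families(matches):
    """Greedy star grouping of sequences from pairwise matches.

    matches: iterable of (id_a, id_b) pairs that passed the match criterion.
    Returns a dict mapping each canonical id to the set of its family members.
    Illustrative only; not the procedure used in the source.
    """
    neighbors = defaultdict(set)
    for a, b in matches:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    unassigned = set(neighbors)
    families = {}
    while unassigned:
        # Canonical sequence: most unassigned partners; ties go to the smallest id.
        canonical = max(sorted(unassigned),
                        key=lambda s: len(neighbors[s] & unassigned))
        members = (neighbors[canonical] & unassigned) | {canonical}
        families[canonical] = members
        unassigned -= members
    return families


example = [("s1", "s2"), ("s1", "s3"), ("s1", "s4"), ("s5", "s6")]
families = group_into_families(example)
for canonical, members in families.items():
    print(canonical, len(members))
```

The size of each family in the sample is then the starting point for estimating its frequency of repetition in the whole genome. This greedy loop is quadratic in the worst case and is meant for illustration rather than for millions of matches.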

Table 1. Parameters of the S. purpuratus Genome

Genome size (Mb)*                      800
Number of genes                        27,350
Average gene spacing (bp)              29,000
Frequency of middle repeat families    500-8500 elements/genome (7.5% of genome DNA)
2109B repeat family                    22,000 elements/genome (0.3% of genome DNA)
Simple sequence repeat                 70,000 elements/genome
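The entries of Table 1 can be cross-checked with simple arithmetic. The short sketch below recomputes the average gene spacing from the genome size and gene number, and converts the percentage entries into megabases of DNA. The constant and function names are introduced here for illustration, and the genome size is taken as exactly 800 Mb.

```python
GENOME_SIZE_BP = 800_000_000   # 800 Mb, from Table 1
NUMBER_OF_GENES = 27_350       # from Table 1


def average_gene_spacing_bp(genome_size_bp, gene_count):
    """Mean distance between genes if they were spread evenly over the genome."""
    return genome_size_bp / gene_count


def genome_percent_to_mb(percent_of_genome, genome_size_bp=GENOME_SIZE_BP):
    """Convert a percentage of genome DNA into megabases."""
    return genome_size_bp * percent_of_genome / 100 / 1_000_000


spacing = average_gene_spacing_bp(GENOME_SIZE_BP, NUMBER_OF_GENES)
print(f"average gene spacing: {spacing:,.0f} bp")            # close to the 29,000 bp entry
print(f"7.5% of genome: {genome_percent_to_mb(7.5):.1f} Mb")
print(f"0.3% of genome: {genome_percent_to_mb(0.3):.1f} Mb")
```

The computed spacing of roughly 29,250 bp rounds to the 29,000 bp listed in the table, and the 7.5% and 0.3% entries correspond to about 60 Mb and 2.4 Mb of DNA, respectively.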