|
Figure 1. Amplification of 185/333 sequences. A. gDNA from animal 2 was amplified by PCR using primers flanking the 185/333 genes (185-5'UTR and 185-3'UTR; see Table 1) and showed bands of five major sizes from 1.2 to 2 kb and a minor band around 4 kb. Triangles indicate the positions of the bands. Similar amplification of gDNA from six additional animals produced identical band patterns (data not shown). B. Amplification of intergenic regions by PCR using primers that annealed within genes but were oriented away from each other (185-LR1 and 185-F5; Table 1) revealed a major band of about 3 kb in all animals (lanes 1–4, animals 10–13). C. gDNA from animal 2 was amplified by PCR using primers that annealed in the 5' UTR (185-5'UTR; Table 1) and in the type I repeats found in elements Ex4, Ex5, and Ex6 (185-R5; Table 1), which revealed the presence of genes that lacked introns. Animal 2 gDNA (lane 1) shows a band of 450 bp (indicated with a caret). A cDNA (Sp0313, DQ183171; lane 2) also amplifies a band of about 450 bp. A cloned gene (2-02, Additional file 1; GenBank accession number EF607673; lane 3) with a typical intron amplifies a band within the range of bands in A. D. PCR amplification of 185/333 from eight different BAC clones using the 185-5'UTR and 185-3'UTR primers (Table 1). Each lane contains template DNA from a different BAC (1, 126J14; 2, 108H07; 3, 053M03; 4, 121A07; 5, 004D19; 6, 182N21; 7, 148J22; 8, 019L13).
|
|
Figure 2. cDNA-based element patterns of the 185/333 genes. A. Exon element patterns. Gene sequences were manually aligned based on alignments of cDNA sequences. The insertion of gaps (black lines) defined the 25 elements (shown as colored boxes) [30]. A consensus sequence containing all the elements and the locations of the two exons is shown at the top. The source of each gDNA is indicated by the colored dots in the columns labeled "Animal". Blue dots indicate that the pattern was isolated from animal 2, red dots from animal 4, and green dots from animal 10. The frequency with which the patterns were found is indicated in the column labeled "Freq". The element patterns are sorted into groups (grey background shading) based on the subtype of Ex15 (specified by the letter in the box). The locations of the five types of repeats [27,30] are shown at the bottom of A (red boxes indicate type 1 repeats; blue = type 2; green = type 3; yellow = type 4; and purple = type 5). Patterns not previously identified from cDNA analysis [27,30] are denoted with an asterisk (*). The position of the stop codon in element Ex25 is indicated by the letter in the box and by the size and shape of the box (Additional file 2, Additional file 3). There are 31 unique patterns based on both exon elements and intron type. B. Intronless genes with unique element patterns amplified from BAC DNA. Untranslated sequence (due to a missense deletion) is indicated by lighter shading and narrower boxes for E2.7. The locations of the repeats are the same as in A.
|
|
Figure 3. Phylogenetic tree of intron sequences. Intron sequences were aligned manually (Additional file 2). Maximum likelihood and maximum parsimony methods both produced the unrooted tree shown. The five major clades were used to define the five intron types, as shown.
|
|
Figure 4. Diversity of the 185/333 gene elements. Nucleotide diversity scores (black bars) and amino acid diversity scores (gray bars) of individual exon elements are shown. The average element nucleotide diversity is indicated by a dashed black line and the average element amino acid diversity by a dashed gray line. The absolute range of diversity scores is 0 (entirely conserved) to 1.609 for nucleotide alignments (based on an even distribution of five states: four nucleotides and a gap) or 3.044 for amino acid alignments (21 possible states) [30]. The highest element diversity score previously observed was from element 11 (0.5381) from a set of cDNAs isolated from bacterially challenged coelomocytes pooled from five animals [30]. This score was similar to that obtained from a modeled element in which 40% of the positions contained four different states in 20% of the sequences. Elements with a diversity score of 0 were omitted (Ex11, Ex15a, Ex15c, and Ex15f-g of the cDNA-based alignment (A) and Er7, Er10a, Er10c, Er10f-g, Er12-13, Er18, and Er20 of the repeat-based alignment (B)). The number (#), length in nucleotides (L), and nucleotide to amino acid diversity ratio (R) are indicated in the table to the left of the graphs. Ex25 includes both Ex25a and Ex25b. Ex25a includes all sequences from the start of Ex25 up to and including the first stop codon. Ex25b includes only the sequence following the first stop codon to the second stop codon (Additional file 2 and Additional file 3). Element Er27 in the repeat-based alignment (B) is treated similarly.
|
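The score bounds quoted in the Figure 4 legend (1.609 for five states, 3.044 for 21 states) match ln 5 and ln 21, which is what a Shannon entropy with natural logarithms gives for an even distribution of states. The following is a minimal sketch under that assumption; the function names are hypothetical, and averaging the per-column values into an element score is an assumption of this sketch rather than a method stated in the legend.

```python
import math
from collections import Counter


def column_diversity(column):
    """Shannon entropy (natural log) of the states in one alignment column.

    Assumption: this entropy is the diversity score; it reproduces the
    bounds quoted in the legend (0, ln 5 = 1.609, ln 21 = 3.044).
    """
    counts = Counter(column)
    total = len(column)
    # p * ln(1/p) summed over the observed states
    return sum((n / total) * math.log(total / n) for n in counts.values())


def element_diversity(sequences):
    """Mean per-column entropy over aligned, equal-length sequences.

    Assumption: the element score is the average over its positions.
    """
    columns = list(zip(*sequences))
    return sum(column_diversity(c) for c in columns) / len(columns)


if __name__ == "__main__":
    print(column_diversity("AAAAA"))            # fully conserved column
    print(round(column_diversity("ACGT-"), 3))  # five evenly used states
```

A fully conserved column scores 0, and a column that uses the four nucleotides and a gap equally often reaches the nucleotide maximum.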
|
Figure 5. Diversity of nucleotide positions. A. Each bar represents the nucleotide diversity of a single nucleotide position within the alignment of all cloned 185/333 genes. Green bars represent nucleotide positions within the intron. Gaps due to missing elements were excluded from the diversity calculations. B. Hypervariable positions within the 185/333 genes were identified using diversity analysis on clones belonging to cDNA-based sets (Table 5) with more than two members (94 genes total). Because sets had variable numbers of members, which artificially increases diversity for positions in sets with fewer members, each diversity score greater than 0 was assigned a value of 1. Each bar in the graph represents the number of sets (out of 8 total) that contained a polymorphism at the specific nucleotide position. Green bars indicate positions within the intron. Bars below the x-axis denote the borders between elements (cDNA-based alignment).
|
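The counting procedure described for Figure 5B can be sketched as follows: within each set, a position is flagged 1 if its diversity is greater than 0 (that is, if more than one state is present), and the flags are then summed across sets. The function names and the toy sequences are hypothetical. Ignoring all gap characters is a simplifying assumption of this sketch; the legend only states that gaps due to missing elements were excluded.

```python
def polymorphic_flags(sequences, gap="-"):
    """Return 1 for each aligned position with more than one non-gap state."""
    flags = []
    for column in zip(*sequences):
        states = set(column) - {gap}  # assumption: gaps are ignored entirely
        flags.append(1 if len(states) > 1 else 0)
    return flags


def sets_with_polymorphism(sets, min_members=3):
    """Count, per position, how many sets contain a polymorphism.

    Only sets with more than two members are used, as in the legend.
    """
    per_set = [polymorphic_flags(s) for s in sets if len(s) >= min_members]
    return [sum(column) for column in zip(*per_set)]


if __name__ == "__main__":
    toy_sets = [
        ["ACGT", "ACGA", "ACGT"],  # polymorphic at the last position
        ["ACGT", "ATGT", "ACGT"],  # polymorphic at the second position
        ["ACGT", "ACGT", "ACGT"],  # fully conserved
    ]
    print(sets_with_polymorphism(toy_sets))
```

Reducing each score to 0 or 1 before summing keeps small sets, whose raw diversity scores are inflated, from dominating the per-position totals.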
|
Figure 6. Repeat-based alignment of the 185/333 genes. The alignment was optimized for repeats and 27 elements (colored boxes) were defined. Gaps due to missing elements are indicated by horizontal black lines. There are 17 different exon element patterns, which, when combined with the intron types (Figure 3), form a total of 21 unique gene patterns. The frequency (Freq) of patterns indicates how often the pattern was identified. The source animal for each pattern is indicated by the presence of a colored dot as in Figure 2. The subtype of element Er10 (which corresponds to Ex15, Figure 2A) is indicated by the letter in the box. Intron types are unaffected by this alignment because they do not have repeats; their designation corresponds to that shown in Figure 3.
|
|
Figure 7. Correlation of elements in the cDNA-based and repeat-based alignments. The elements of the cDNA-based and repeat-based alignments are shown as blocks. Lines connecting the blocks indicate the relative positions of elements in the two alignments. Elements ExL to Ex7 correlate closely with elements ErL to Er6; the only difference is that Ex1 and Ex2 of the cDNA-based alignment are merged to form Er1 in the repeat-based alignment. In Ex8 to Ex25, however, the sequences are interdigitated to accommodate the alignment of the repeats, particularly those in Ex23.
|
|
Figure 8. Structure of part of the 185/333 locus. Scaffold_v2_79421 from the S. purpuratus genome assembly (Version 2, June 15, 2006) contains four linked 185/333 genes (diagram not to scale). Each of the genes includes a single intron in the predicted location. The element patterns of the genes, based on the cDNA-based alignment, are indicated. Pattern D8 was not isolated from the cloned genes; it is similar to pattern D1 but contains Ex12 rather than Ex10. The orientations of the genes are indicated by the arrows. Dinucleotide repeats (GA; striped ovals) flank the genes, and trinucleotide repeats (GAT; solid parallelograms) are present on the 5' side of the genes.
|