BACKGROUND: The immune system of the purple sea urchin, Strongylocentrotus purpuratus, is complex and sophisticated. An important component of sea urchin immunity is the Sp185/333 gene family, which is significantly upregulated in immunologically challenged animals. The Sp185/333 genes are less than 2 kb with two exons and are members of a large diverse family composed of greater than 40 genes. The S. purpuratus genome assembly, however, contains only six Sp185/333 genes. This underrepresentation could be due to the difficulties that large gene families present in shotgun assembly, where multiple similar genes can be collapsed into a single consensus gene.
RESULTS: To understand the genomic organization of the Sp185/333 gene family, a BAC insert containing Sp185/333 genes was assembled, with careful attention to avoiding artifacts resulting from collapse or artificial duplication/expansion of very similar genes. Twelve candidate BAC assemblies were generated with varying parameters and the optimal assembly was identified by PCR, restriction digests, and subclone sequencing. The validated assembly contained six Sp185/333 genes that were clustered in a 34 kb region at one end of the BAC, with five of the six genes tightly clustered within 20 kb. The Sp185/333 genes in this cluster were no more similar to each other than to previously sequenced Sp185/333 genes isolated from three different animals. This was unexpected given their proximity and the putative effects of gene homogenization in closely linked, similar genes. All six genes displayed significant similarity, including both 5' and 3' flanking regions, which were bounded by microsatellites. Three of the Sp185/333 genes and their flanking regions were tandemly duplicated such that each repeated segment consisted of a gene plus 0.7 kb 5' and 2.4 kb 3' of the gene (4.5 kb total). Both edges of the segmental duplications were bounded by different microsatellites.
CONCLUSIONS: The high sequence similarity of the Sp185/333 genes and flanking regions suggests that the microsatellites may promote genomic instability and are involved in gene duplication and/or gene conversion and the extraordinary sequence diversity of this family.
Figure 1. The Sp185/333 genes on 7096 have four different element patterns. The genes are aligned according to the repeat-based alignment [15]. The genes have two exons and a single intron that is shown as a white box (not to scale). The first exon encodes the leader (L). The Greek letters indicate the intron type based on sequence analysis [15]. The second exon has large gaps (horizontal lines) inserted to optimize the alignment, which define blocks of sequence called elements (gray and colored boxes). The consensus of all possible elements is shown at the bottom. Variations in the presence or absence of elements define element patterns (A2γ, B8β, D1α, and E2δ, which are abbreviated according to [15]). Elements that correlate with each of the six types of repeats are shown in different colors (type 1 in red; type 2 in blue; type 3 in green; type 4 in yellow; type 5 in purple; type 6 in orange; [15]). The figure is modified from [15].
Figure 2. Varying the unitigger rate affects assembly of the region of 7096 that contains three D1 genes. The sequences assembled using various unitigger rates in the Celera WGS assembler are shown with oriented and ordered scaffolds (see Table 3 for details). The assembly generated by Baylor (Genbank:AC204781.3) is shown at the bottom (B). Sequence differences and scaffold fragmentation among the assemblies are shown for the D1 gene region, which is represented by the "D1 region" box on the consensus assembly and is illustrated in the enlarged regions of the various assemblies. The multi-colored D1-g/y gene and gene fragments indicate hybrid genes in which a mix of sequencing reads from D1-g and D1-y are incorrectly assigned to a single region. The gene fragments marked D1 and shown in gray do not contain enough sequence to identify them as D1-g or D1-y. The primer positions are indicated by number (see Table 2) at the bottom with the forward primers (For) on the top line and the reverse primers (Rev) on the bottom line. Assemblies 3, 6, 11, 12, and 15 have been omitted because of assembly errors based on incorrect positioning of primers.
Figure 3. Experimental evidence supports assembly 9. A. PCR amplification confirms the sizes of the regions surrounding the A2, B8, D1-b, and E2 genes. Amplicons in lanes 1 (~4 kb), 2 (~3.6 kb), 4 (~3.6 kb), and 5 (~4 kb) correspond to the sizes of the A2, B8, D1-b, and E2 genes plus their flanking regions according to sizes predicted in all the candidate assemblies (see Figure 2). A single amplicon of ~4 kb (lane 3) was generated from primers predicted to amplify each of the D1-g and D1-y genes plus flanking regions. See Table 2 and Figure 2 for primer information. B. Diagram of a region of assembly 9 showing the D1 genes (B8, orange; see also Figure 2). The subcloned regions of 7096 containing D1 genes (amplified with primers 2F and 1R; see Table 2 and Figure 2) are indicated. The assembled sequence for these subclones contains either one (D1-y) or two (D1-g) AseI restriction sites (purple lines). One of the AseI sites in the D1-g gene is mutated by a SNP. Because of the gap in assembly 9, which includes this region (dashed line), one of the AseI sites is predicted (dashed purple line) based on sequence similarity with the D1-y subclone. C. A SNP obliterates an AseI restriction site and differentiates D1-y and D1-g genes. PCR amplicons using 2F and 1R primers produce 4 kb fragments. When digested with AseI, the clones containing a D1-y gene could be differentiated from those with a D1-g gene. Lane 1, D1-y gene (4.2 kb, 2.3 kb, and 0.9 kb). Lane 2, D1-g gene (4.2 kb and 3.2 kb). Lane 3, vector without insert has one AseI site (4 kb).
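The logic of the AseI digest in panel C (a single base change removes a site and so merges two fragments into one) can be illustrated with a minimal sketch. It assumes the clones are circular plasmids and uses the commonly listed AseI recognition sequence ATTAAT, cut as AT^TAAT; neither detail is stated in the legend. The function name `circular_digest_sizes` and the toy sequences are hypothetical and are not the real subclone sequences.

```python
# Assumption: AseI recognizes ATTAAT and cuts after the second base (AT^TAAT).
ASEI_SITE = "ATTAAT"
ASEI_CUT_OFFSET = 2


def circular_digest_sizes(seq, site=ASEI_SITE, cut_offset=ASEI_CUT_OFFSET):
    """Return fragment sizes (largest first) from cutting a circular sequence.

    An empty list means the circle has no site and stays uncut.
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site)-1 bases so sites spanning the origin are found.
    wrapped = seq + seq[: len(site) - 1]
    cuts = sorted((i + cut_offset) % n for i in range(n) if wrapped.startswith(site, i))
    if not cuts:
        return []
    sizes = []
    for k, start in enumerate(cuts):
        end = cuts[(k + 1) % len(cuts)]
        # A single cut linearizes the circle, giving one full-length fragment.
        sizes.append((end - start) % n or n)
    return sorted(sizes, reverse=True)


# Toy 150 nt circle with two sites, and the same circle with one site lost to a SNP.
two_sites = "ATTAAT" + "C" * 94 + "ATTAAT" + "G" * 44
one_site = "ATTAAT" + "C" * 94 + "ATTAGT" + "G" * 44
print(circular_digest_sizes(two_sites))
print(circular_digest_sizes(one_site))
```

The fragment sizes always sum to the length of the circle, which is a quick consistency check when comparing predicted digests with band sizes on a gel.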
Figure 4. The experimentally validated assembly of 7096 contains six Sp185/333 genes. The finished assembly and sequence of the region in 7096 containing the Sp185/333 genes is shown after experimental confirmation by PCR, PFGE, AseI digests, and sequencing subclones. The BAC contains six Sp185/333 genes: one A2 gene, one B8 gene, three D1 genes, and one E2 gene (see Figure 1 for element pattern information). All are located at the 3' end of the BAC insert. Gene orientations are indicated and spacing is to scale unless otherwise noted. GA microsatellites are shown flanking each gene and GAT microsatellites are shown to the 5' side of B8 and the three D1 genes.
Figure 5. The Sp185/333 genes from 7096 are as diverse as those randomly isolated from three animals. Mean pairwise diversity scores of the six Sp185/333 genes from 7096 are compared to other Sp185/333 genes previously isolated from three other sea urchins (blue bars; 29 genes from animal 4, 87 genes from animal 2, and 49 genes from animal 10 [15]). The D1 genes (9 genes from animal 4, 20 genes from animal 2, 6 genes from animal 10, and 3 genes from 7096) were analyzed separately (red bars).
Figure 6. The element sequence diversity for the Sp185/333 genes clustered on 7096 is not different from the element diversity for the Sp185/333 genes with unknown genomic organization. Unique Sp185/333 genes from three sea urchins previously isolated (see legend for Figure 5, [15]) and the six genes from 7096 were aligned according to the repeat-based alignment (as in Figure 1, [15]). Genes from individual animals and those from the BAC were used to calculate pairwise diversity scores for each element in MEGA [37]. Average diversity scores and the standard deviations are shown. Elements that were not present in a majority of the sequences were omitted from the analysis.
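As a rough illustration of what a mean pairwise diversity score measures, the sketch below computes the proportion of differing sites (p-distance) for every pair of aligned sequences and averages the results. This is an assumption made for illustration: the legends state only that scores were calculated in MEGA [37], not which distance model or gap treatment was used. The function names are hypothetical, and columns with a gap in either sequence are simply skipped.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def mean_pairwise_diversity(seqs):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (not real Sp185/333 sequence).
print(mean_pairwise_diversity(["ACGT", "ACGT", "ACGA"]))
```

Applying the same function to the columns of a single element, rather than the whole gene, gives a per-element score of the kind plotted in Figure 6.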
Figure 7. Alignment of sequences surrounding the GAT and GA microsatellites located to the 5' side of most of the Sp185/333 genes. A region that is approximately 240 to 700 nucleotides 5' of each gene is shown. A GAT microsatellite is not present 5' of the A2 and E2 genes. The red boxes indicate the GAT microsatellite sequences; the green boxes indicate the GA microsatellite sequences. Dots indicate identity with respect to the sequence on the first line. The alignment was produced using MEGA [37] and edited by hand.
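Locating GA and GAT microsatellites in a flanking sequence can be sketched with a regular expression. This is a minimal illustration, not the method used in the study: it finds only perfect tandem repeats on the given strand, and the `min_repeats` threshold of 4 is an arbitrary assumption. The function name `find_microsatellites` and the test sequence are hypothetical.

```python
import re


def find_microsatellites(seq, motif, min_repeats=4):
    """Return (start, end, repeat_count) for each perfect tandem run of motif.

    Coordinates are 0-based with an exclusive end, as in Python slicing.
    """
    pattern = re.compile(
        r"(?:%s){%d,}" % (re.escape(motif), min_repeats), re.IGNORECASE
    )
    return [
        (m.start(), m.end(), (m.end() - m.start()) // len(motif))
        for m in pattern.finditer(seq)
    ]


# Toy sequence: a (GA)6 run followed by a (GAT)5 run.
toy = "TTT" + "GA" * 6 + "CCC" + "GAT" * 5 + "AA"
print(find_microsatellites(toy, "GA"))
print(find_microsatellites(toy, "GAT"))
```

Real microsatellites often contain interruptions, so a production search would normally tolerate mismatches and scan both strands.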
Figure 8. Transposon fragments are present in the flanking regions for the A2 and E2 genes. A. Gypsy10-long terminal repeat (LTR)_S fragment. The LTR [GenBank:AAGJ02039135.1] is represented by 139 nt of 2430 nt of the consensus sequence. It is present 684 nt to the 3' side of A2 and continues 87 nt into the GA microsatellite. B. Three Tc1-N1 SP transposon fragments. Transposon fragments [44] (267 nt, 75 nt, and 140 nt of 553 nt consensus sequence) are present in tandem and positioned 453 nt 5' of E2 and 70 nt 5' of the GA microsatellite. Fragment orientation is indicated with arrows.
Figure 9. Microsatellites border the conserved sequence flanking the Sp185/333 genes. A. Pairwise diversity of Sp185/333 genes and flanking regions. Pairwise diversity was calculated among five regions (gene and four flanking regions) in MEGA [37]. The Sp185/333 gene (on the x-axis in red; 5' to 3' orientation) represents a generic gene with the intron shown as a thinner region. The flanking regions are defined by the edge of the gene and the location of the GA microsatellites (purple triangles). Region 1 (~250 nt; dark gray) is upstream of the 5' GA microsatellite. Region 2 (~430 nt; light gray) is between the GA microsatellite and the start codon. Region 3 (~330 nt; light gray) is between the stop codon and the 3' GA microsatellite. Region 4 (~330 nt; dark gray) is downstream of the 3' GA microsatellite. The two colors in each line correspond to the two genes that were used in the pairwise comparison and match the gene colors shown in Figure 4. Three categories of pairwise diversity are i) high (squares) in regions flanking the gene, ii) low (circles) in all regions including the gene sequences, and iii) hybrid (triangles) where pairwise diversity is low in regions 1 and 2 and high in regions 3 and 4. B. The microsatellites are boundaries for sequence conservation. Alignments of the genes and flanking sequences were used to calculate the entropy over a 30 nt window that slides 1 nt for each calculation. Entropy scores are shown for the analysis with all six genes (blue line) and for only the three D1 genes (red line). The black lines show the average diversity of the regions indicated on the x-axis for all six genes (dashed line), or only the three D1 genes (dotted line).
Figure 10. Dot plot of the Sp185/333 gene cluster shows gene and segment duplication. The Sp185/333 gene cluster of 34 kb is plotted against itself. The colored pentagrams indicate the positions of each of the Sp185/333 genes. Matching sequence appears as diagonal lines of dots indicating similar sequences are present in the same or opposite orientation. The genes plus a short region upstream of each of the genes are conserved. The dotted box illustrates the region of segmental duplications as indicated by the length and number of parallel, diagonal lines. The total length of the segmental duplications is ~13.7 kb and consists of three segments that each include a D1 gene (yellow, green and blue).
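The sliding-window entropy in panel B can be sketched as follows. The legend gives the window (30 nt) and step (1 nt) but not the exact formula, so this example assumes Shannon entropy in bits per alignment column, averaged across each window, with gap characters counted as an ordinary symbol. The function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sliding_entropy(alignment, window=30, step=1):
    """Mean per-column entropy for each window along an alignment.

    alignment is a list of equal-length strings; gaps count as a symbol.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all sequences must have the same aligned length")
    per_column = [column_entropy([s[i] for s in alignment]) for i in range(length)]
    return [
        sum(per_column[i : i + window]) / window
        for i in range(0, length - window + 1, step)
    ]


# Toy alignment: identical except for the last column.
print(sliding_entropy(["AAAA", "AAAT"], window=2))
```

A window score of zero means every column in that window is identical across all sequences; higher values indicate more variable stretches.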
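A self-comparison dot plot of this kind can be approximated by exact k-mer matching, as in the sketch below. The word size `k=12` and the exact-match criterion are assumptions for illustration and are not the settings of the tool used for the figure. The function names are hypothetical. Forward hits trace diagonals for sequence repeated in the same orientation, and reverse-complement hits trace anti-diagonals for sequence repeated in the opposite orientation.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string (other symbols unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_dot_plot(seq, k=12):
    """Return (forward, reverse) lists of (i, j) dots for exact k-mer matches.

    forward: the k-mer at i equals the k-mer at j.
    reverse: the reverse complement of the k-mer at i equals the k-mer at j.
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i : i + k]].append(i)
    forward, reverse = [], []
    for i in range(len(seq) - k + 1):
        word = seq[i : i + k]
        forward.extend((i, j) for j in index.get(word, ()))
        reverse.extend((i, j) for j in index.get(reverse_complement(word), ()))
    return forward, reverse


# Toy sequence with one direct repeat of ACCTGA.
fwd, rev = self_dot_plot("ACCTGA" + "TTTT" + "ACCTGA", k=6)
print((0, 10) in fwd, rev)
```

Tandem segmental duplications such as the three D1 segments would appear as several long parallel diagonals offset from the main diagonal by multiples of the segment length.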
Figure 11. Duplications of genes and larger segments may be mediated by microsatellites. A. Alignment of the Sp185/333 genes and flanking regions. The genes (indicated as colored arrows) and proximal flanking regions between the GA microsatellites (purple triangles) are conserved. (For definitions of distal and proximal flanking regions, see the text and legend for Figure 9.) Conserved sequence between the GAT microsatellites (orange triangles) includes the three D1 genes and the associated intergenic regions. The GAT microsatellites are split into half triangles, except for the one located on the 5' side of D1-b, to show their positions relative to each gene. All of the intergenic sequence between the genes is shown, except for the region between A2 and B8. The fine dotted lines indicate how the sequences fit together on the BAC. The legend shows variations in color that relate to ranges of pairwise diversity scores based on results in B. B. Pairwise sequence diversity relative to the D1-y and D1-g genes. The level of sequence conservation is based on pairwise diversity scores for each of the Sp185/333 genes compared to the D1-y and D1-g genes. Colors in the table correlate to colors in the alignment in A.
Adams, The genome sequence of Drosophila melanogaster. 2000.
Bagshaw, High frequency of microsatellites in S. cerevisiae meiotic recombination hotspots. 2008.
Boothroyd, A yeast-endonuclease-generated DNA break induces antigenic switching in Trypanosoma brucei. 2009.
Brites, The Dscam homologue of the crustacean Daphnia is diversified by alternative splicing like in insects. 2008.
Britten, The single-copy DNA sequence polymorphism of the sea urchin Strongylocentrotus purpuratus. 1978.
Brockton, Localization and diversity of 185/333 proteins from the purple sea urchin--unexpected protein-size range and protein expression in a new coelomocyte type. 2008.
Buckley, Extraordinary diversity among members of the large gene family, 185/333, from the purple sea urchin, Strongylocentrotus purpuratus. 2007.
Buckley, Sequence variations in 185/333 messages from the purple sea urchin suggest posttranscriptional modifications to increase immune diversity. 2008.
Buckley, The 185/333 gene family is a rapidly diversifying host-defense gene cluster in the purple sea urchin Strongylocentrotus purpuratus. 2008.
Bullock, Effects of poly[d(pGpT).d(pApC)] and poly[d(pCpG).d(pCpG)] repeats on homologous recombination in somatic cells. 1986.
Cameron, SpBase: the sea urchin genome database and web site. 2009.
Cameron, A sea urchin genome project: sequence scan, virtual map, and additional resources. 2000.
Cannon, Identification of diversified genes that contain immunoglobulin-like variable regions in a protochordate. 2002.
Cannon, The phylogenetic origins of the antigen-binding receptors and somatic diversification mechanisms. 2004.
Dheilly, Highly variable immune-response proteins (185/333) from the sea urchin, Strongylocentrotus purpuratus: proteomic analysis identifies diversity within and between individuals. 2009.
Dishaw, Genomic complexity of the variable region-containing chitin-binding proteins in amphioxus. 2008.
Dong, AgDscam, a hypervariable immunoglobulin domain-containing receptor of the Anopheles gambiae innate immune system. 2006.
Drost, A microarray-based genotyping and genetic mapping approach for highly heterozygous outcrossing species enables localization of a large fraction of the unassembled Populus trichocarpa genome sequence. 2009.
Flajnik, Evolution of innate and adaptive immunity: can we draw a line? 2004.
Gendrel, (CA/GT)(n) microsatellites affect homologous recombination during yeast meiosis. 2000.
Ghosh, Sp185/333: a novel family of genes and proteins involved in the purple sea urchin immune response. 2010.
Gilad, Human specific loss of olfactory receptor genes. 2003.
Havlak, The Atlas genome assembly system. 2004.
Hibino, The immune gene repertoire encoded in the purple sea urchin genome. 2006.
Hoskins, Sequence finishing and mapping of Drosophila melanogaster heterochromatin. 2007.
Jensen-Seaman, Comparative recombination rates in the rat, mouse, and human genomes. 2004.
Kapitonov, Harbinger transposons and an ancient HARBI1 gene derived from a transposase. 2004.
Kong, A high-resolution recombination map of the human genome. 2002.
Kumar, MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. 2008.
Lander, Initial sequencing and analysis of the human genome. 2001.
Leister, Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance gene. 2004.
Li, Chromatin modification and epigenetic reprogramming in mammalian development. 2002.
Li, Analysis on frequency and density of microsatellites in coding sequences of several eukaryotic genomes. 2004.
Loker, Invertebrate immune systems--not homogeneous, not simple, not well understood. 2004.
McDowell, Molecular diversity at the plant-pathogen interface. 2008.
Medvedev, Maximum likelihood genome assembly. 2009.
Méndez-Lago, Novel sequencing strategy for repetitive DNA in a Drosophila BAC clone reveals that the centromeric region of the Y chromosome evolved from a telomere. 2009.
Messier-Solek, Highly diversified innate receptor systems and new forms of animal immunity. 2010.
Meyers, Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. 2003.
Multerer, Two cDNAs from the purple sea urchin, Strongylocentrotus purpuratus, encoding mosaic proteins with domains found in factor H, factor I, and complement components C6 and C7. 2004.
Murphy, RecA independent recombination of poly[d(GT)-d(CA)] in pBR322. 1986.
Myers, A fine-scale map of recombination rates and hotspots across the human genome. 2005.
Myers, A whole-genome assembly of Drosophila. 2000.
Nair, Macroarray analysis of coelomocyte gene expression in response to LPS in the sea urchin. Identification of unexpected immune diversity in an invertebrate. 2005.
Napierala, Structure-dependent recombination hot spot activity of GAA.TTC sequences from intron 1 of the Friedreich's ataxia gene. 2004.
Payen, Segmental duplications arise from Pol32-dependent repair of broken forks through two alternative replication-based mechanisms. 2008.
Raftos, Evolutionary immunology: early vertebrates reveal diverse immune recognition strategies. 2008.
Rast, New approaches towards an understanding of deuterostome immunity. 2000.
Rast, Marine invertebrate genome sequences and our evolving understanding of animal immunity. 2008.
Rebeiz, GenePalette: a universal software tool for genome sequence visualization and analysis. 2004.
Richly, Mode of amplification and reorganization of resistance genes during recent Arabidopsis thaliana evolution. 2002.
Schatz, Hawkeye: an interactive visual analytics tool for genome assemblies. 2007.
Schlötterer, Evolutionary dynamics of microsatellite DNA. 2000.
Schmucker, Dscam and DSCAM: complex genes in simple animals, complex animals yet simple genes. 2009.
Schmucker, Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. 2000.
She, Shotgun sequence assembly and recent segmental duplications within the human genome. 2004.
Sodergren, The genome of the sea urchin Strongylocentrotus purpuratus. 2006.
Strathmann, Transposon-facilitated DNA sequencing. 1991.
Terwilliger, Unexpected diversity displayed in cDNAs expressed by the immune cells of the purple sea urchin, Strongylocentrotus purpuratus. 2006.
Terwilliger, Distinctive expression patterns of 185/333 genes in the purple sea urchin, Strongylocentrotus purpuratus: an unexpectedly diverse family of transcripts in response to LPS, beta-1,3-glucan, and dsRNA. 2007.
Tóth, PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats. 2006.
Traherne, Human MHC architecture and evolution: implications for disease association studies. 2008.
Wahls, The Z-DNA motif d(TG)30 promotes reception of information during gene conversion events while stimulating homologous recombination in human cells in culture. 1990.
Watson, Extensive diversity of Ig-superfamily proteins in the immune system of insects. 2005.
Zhang, Diversification of Ig superfamily genes in an invertebrate. 2004.