BMC Genomics
2016 Nov 09;17(1):900. doi: 10.1186/s12864-016-3241-x.
Short tandem repeats, segmental duplications, gene deletion, and genomic instability in a rapidly diversified immune gene family.
Oren M, Barela Hudgell MA, D'Allura B, Agronin J, Gross A, Podini D, Smith LC.
Abstract
BACKGROUND: Genomic regions with repetitive sequences are considered unstable and prone to swift DNA diversification processes. A highly diverse immune gene family of the sea urchin (Strongylocentrotus purpuratus), called Sp185/333, is composed of clustered genes with similar sequence as well as several types of repeats ranging in size from short tandem repeats (STRs) to large segmental duplications. This repetitive structure may have been the basis for the incorrect assembly of this gene family in the sea urchin genome sequence. Consequently, we have resolved the structure of the family and profiled the members by sequencing selected BAC clones using Illumina and PacBio approaches.
RESULTS: BAC insert assemblies identified 15 predicted genes that are organized into three clusters. Two of the gene clusters have almost identical flanking regions, suggesting that they may be non-matching allelic clusters residing at the same genomic locus. GA STRs surround all genes and appear in large stretches at locations of putatively deleted genes. GAT STRs are positioned at the edges of segmental duplications that include a subset of the genes. The unique locations of the STRs suggest their involvement in gene deletions and segmental duplications. Genomic profiling of the Sp185/333 gene diversity in 10 sea urchins shows that no gene repertoires are shared among individuals, indicating a very high gene diversification rate for this family.
CONCLUSIONS: The repetitive genomic structure of the Sp185/333 family, which includes STRs in strategic locations, may serve as a platform for a controlled mechanism that regulates the processes of gene recombination, gene conversion, duplication and deletion. The outcome is genomic instability and allelic mismatches, which may further drive the swift diversification of the Sp185/333 gene family and may thereby improve the immune fitness of the species.
Fig. 1. The Sp185/333 genes have two exons and a mosaic of elements in the second exon. a An alignment cartoon illustrates the structure of several genes with two exons (shown in relative size scale) and one intron (int; not shown to scale). Elements in the second exon are indicated as colored rectangles, and gaps that have been artificially inserted to optimize the alignment are shown as horizontal black lines. All known elements are numbered at the top. Element patterns share mosaics of elements, and the naming of element patterns (on the left) is based on the sequence of element 10 (equivalent of element 15 in [25]). The imperfect, tandem type I repeats in the 5′ half of the second exon are indicated as red rectangles (elements 2–5) and have been evaluated computationally for duplications, deletions and recombinations [31]. Five additional types of repeats are imperfect and interspersed in elements 11–26 (see [20]). This figure is modified from the repeat-based alignment published in [20]. b The approximate locations of primers are indicated with arrows within the standard Sp185/333 gene structure. These primers are used to amplify Sp185/333 gene sequences and to identify the genes within the BAC insert assemblies. Primer sequences are listed in Additional file 1: Table S1. The arrows between (a) and (b) indicate the correlation between elements in (a) and locations of primers shown in (b)
Fig. 2. Three distinct Sp185/333 gene amplicon size patterns are detected in large insert BAC clones. Amplicon peak height and area indicate three major amplicon size patterns from fragment analysis, which are designated as Cluster 1 (a), Cluster 2 (b) and Cluster 3 (c). Amplicon size patterns identified less often (1 or 2 clones of 27 analyzed; see Additional file 3: Figure S2A) and that partially match amplicons in either Cluster 1 or 2 are designated as Cluster 1′, Cluster 1″ (a) or Cluster 2′ (b). Fragment length analysis of each cluster was carried out for each of the corresponding BACs (listed below) using primers F6 and R9 (see Additional file 1: Table S1; Additional file 2: Figure S1B). Means and standard error were calculated when multiple BACs were used for analysis of Clusters 1 and 2. Cluster 1 BACs are 10A2, 10B1, 10C6, 4074J14, 4079E24, 61M13. Cluster 2 BACs are 4C3, 4007J10, 4024N22, 4029F3, 4091I3, 10K9, 64C18, 4011G4. Cluster 3 BACs are 10C18, 10H9, 10H10, 10M18, 3090I9, 3104N4, 3104P4, 4028F7, 4067A10. Cluster 1′ BACs are 3033E12, 4093A10. Cluster 1″ BAC is 4093A10. The Cluster 2′ BAC is 409O3
Fig. 3. Repetitive sequences in the Sp185/333 gene clusters are assembled correctly by PacBio but not by Illumina. Five BAC insert assemblies are compared to self by pairwise alignments and illustrated as dot plots. Lines and dots that are displaced from the central diagonal are repetitive sequences that include Sp185/333 gene sequences (regions with displaced diagonals are indicated with boxes) and STRs (dots). a The Sp185/333 gene clusters assembled from Illumina reads for BAC assemblies corresponding to Clusters 1, 2 and 3 have Sp185/333 genes that are fragmented and shorter than expected (the displaced diagonals are composed of short lines or dots) based on gene size predictions (not shown) and according to previous reports [20, 21]. b The Sp185/333 gene clusters assembled from PacBio reads for BACs compared to self. The PacBio assemblies show intact and longer displaced diagonals representing genes of expected length. The orientations of the genes are indicated by the displaced diagonals, which are either parallel (same orientation) or perpendicular (opposite orientation) to the central diagonal
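The self-comparison idea behind a dot plot can be sketched in a few lines of Python. This is a minimal illustration that uses exact k-mer matches, not the pairwise alignment tool used to draw the figure; the function names, the value of `k`, and the toy sequence are all assumptions made for the example.

```python
from collections import defaultdict

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))

def self_dots(seq, k=8):
    """Return off-diagonal exact k-mer matches of seq against itself.

    Each hit is (i, j, strand) with i < j. Strand "+" means the same k-mer
    occurs at both positions (same orientation); "-" means position j holds
    the reverse complement of the k-mer at position i (opposite orientation).
    """
    seq = seq.upper()
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    dots = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        for j in index.get(kmer, []):
            if i < j:
                dots.add((i, j, "+"))
        for j in index.get(revcomp(kmer), []):
            if i < j:
                dots.add((i, j, "-"))
    return sorted(dots)

# Toy sequence (invented for illustration): one unit repeated in the same
# orientation and once more as its reverse complement.
unit = "ACCGTTAG"
toy = "TTTT" + unit + "CCCC" + unit + "GGGG" + revcomp(unit)
dots = self_dots(toy, k=8)
```

In a plot of these hits, runs of "+" matches form lines parallel to the central diagonal and runs of "-" matches form lines perpendicular to it, which is how the caption reads gene orientation from the displaced diagonals.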
Fig. 4. Comparisons among BAC insert assemblies suggest two genomic loci for three Sp185/333 gene clusters. A map of all BAC insert sequences is based on pairwise alignments among all BAC insert sequences and indicates that Clusters 1 and 2 may be allelic. a Nine BAC insert sequences, including five that contain Sp185/333 gene Cluster 1 and four that contain Sp185/333 gene Cluster 2, match almost perfectly within the regions that flank the gene clusters. b The two BAC inserts that include Cluster 3 align separately as a different locus
Fig. 5. Three clusters of Sp185/333 genes in the sequenced genome of S. purpuratus. The genes within Clusters 1, 2 and 3 range in size from 1170 to 1894 nt and are spaced apart by 3–12.8 kb. All genes have two exons, as indicated by the rectangle (first exon) and pentagon (second exon), which also indicates gene orientation within each cluster. Element patterns are listed above each gene and are indicated in different colors as initially defined by [21], which is identified by the shaded area in Cluster 1. All genes are surrounded by GA STRs (green triangles). The long stretches of GA STRs located on both sides of Cluster 3 are positioned at distances that correlate with corresponding genes in the other clusters. This map is based on three verified BAC assemblies (accession numbers: KU668452, KU668453, KU668454). Segmental duplications are surrounded by GAT STRs (black triangles denote GAT STRs of ≥ 35 repeats and gray triangles denote GAT STRs of 4–17 repeats; see Additional file 7: Table S3). Triangle location above or below the line indicates the orientation of the STRs, which is relative to the proximal gene
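As a rough illustration of how GA and GAT STRs might be located and binned into the two GAT size classes named in the caption, the following Python sketch scans a sequence for perfect tandem runs with a regular expression. It is not the authors' procedure: real STRs are often imperfect, and a dedicated program such as Tandem Repeats Finder (Benson, in the reference list) handles those. The function names, the `min_repeats` default, and the toy sequence are assumptions.

```python
import re

def find_strs(seq, unit, min_repeats=4):
    """Return (start, end, repeat_count) for each perfect tandem run of `unit`.

    Coordinates are 0-based, end-exclusive. Only exact repeats are found.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit), min_repeats))
    hits = []
    for match in pattern.finditer(seq.upper()):
        count = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), count))
    return hits

def gat_size_class(repeat_count):
    """Bin a GAT STR using the two classes given in the Fig. 5 caption."""
    if repeat_count >= 35:
        return "long"          # black triangles: 35 or more repeats
    if 4 <= repeat_count <= 17:
        return "short"         # gray triangles: 4 to 17 repeats
    return "unclassified"      # the caption does not define other sizes

# Toy sequence (invented for illustration, not taken from the BAC assemblies).
toy = "ACGT" + "GA" * 6 + "TTCC" + "GAT" * 5 + "ACGT"
ga_hits = find_strs(toy, "GA")
gat_hits = find_strs(toy, "GAT")
gat_classes = [gat_size_class(count) for _, _, count in gat_hits]
```

STR orientation relative to the proximal gene, which the caption encodes by triangle position, is not modeled here; scanning the reverse complement of the sequence would be one way to extend the sketch.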
Fig. 6. Tandem segmental duplications are bounded by GAT STRs. a Segmental duplications that include D1 genes are present in all three clusters (dark gray rectangles), are bounded by GAT STRs (black triangles denote GAT STRs of 35 or more repeats; gray triangles denote 4–17 repeats), and are indicated by the Cluster number and order of appearance (1–1, 1–2, etc.). 3* indicates a partial segmental duplication in Cluster 3, which is flanked by a GAT STR on the left and a GA STR on the right. The locations of putatively deleted genes in Cluster 3 are marked as “Gene?” in light gray and correlate with the positions of long GA repeats (see Fig. 5). b An alignment of the full length segmental duplications was employed for phylogenetic analysis by maximum parsimony using MEGA. Bootstrap numbers are indicated and based on 500 iterations. The regions surrounding B8 and B8a are defined as the outgroup (OG-1, OG-2). The sequences of the segmental duplications used for the alignment and phylogenetic tree are from BAC 10B1 (GenBank accession number KU668451): duplication 1–1 is nt 110752–115376; duplication 1–2 is nt 115460–120091; duplication 1–3 is nt 120177–124609; outgroup OG-1 is nt 106160–110660. BAC 10K9 (GenBank accession number KU668453): duplication 2–1 is nt 121690–126221; duplication 2–2 is nt 126365–130976; outgroup OG-2 is nt 117100–121600. BAC 3104P4 (GenBank accession number KU668454): duplication 3–1 is nt 91585–93433 and the partial duplication 3* is nt 96723–98626
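The coordinates in this caption can be turned into segment lengths and inter-duplication gaps with simple arithmetic. The Python sketch below assumes the nt positions are 1-based and inclusive (the usual GenBank convention, but an assumption here); the dictionary layout and function names are illustrative only.

```python
# Coordinates copied from the Fig. 6 caption: (BAC, start nt, end nt).
# Assumed to be 1-based and inclusive.
segments = {
    "1-1": ("10B1", 110752, 115376),
    "1-2": ("10B1", 115460, 120091),
    "1-3": ("10B1", 120177, 124609),
    "OG-1": ("10B1", 106160, 110660),
    "2-1": ("10K9", 121690, 126221),
    "2-2": ("10K9", 126365, 130976),
    "OG-2": ("10K9", 117100, 121600),
    "3-1": ("3104P4", 91585, 93433),
    "3*": ("3104P4", 96723, 98626),
}

def span_length(start, end):
    """Length in nt of an inclusive coordinate range."""
    return end - start + 1

def gap_between(left, right):
    """Number of nt lying strictly between two segments on the same BAC."""
    bac_left, _, left_end = segments[left]
    bac_right, right_start, _ = segments[right]
    if bac_left != bac_right:
        raise ValueError("segments are on different BACs")
    return right_start - left_end - 1

lengths = {name: span_length(start, end)
           for name, (_, start, end) in segments.items()}

for name, length in lengths.items():
    print(f"{name}: {length} nt")
```

Under that assumption the full duplications in Clusters 1 and 2 come out at roughly 4.4 to 4.6 kb each, while 3–1 and the partial duplication 3* are under 2 kb, and `gap_between("1-1", "1-2")` gives the short interval separating adjacent duplications.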
Alekseyev, Comparative genomics reveals birth and death of fragile regions in mammalian evolution. 2010, Pubmed
Al-Sharif, Sea urchin coelomocytes specifically express a homologue of the complement component C3. 1998, Pubmed, Echinobase
Benson, Tandem repeats finder: a program to analyze DNA sequences. 1999, Pubmed
Boehm, VLR-based adaptive immunity. 2012, Pubmed
Brockton, Localization and diversity of 185/333 proteins from the purple sea urchin--unexpected protein-size range and protein expression in a new coelomocyte type. 2008, Pubmed, Echinobase
Buckley, Diversity of animal immune receptors and the origins of recognition complexity in the deuterostomes. 2015, Pubmed, Echinobase
Buckley, Extraordinary diversity among members of the large gene family, 185/333, from the purple sea urchin, Strongylocentrotus purpuratus. 2007, Pubmed, Echinobase
Buckley, The 185/333 gene family is a rapidly diversifying host-defense gene cluster in the purple sea urchin Strongylocentrotus purpuratus. 2008, Pubmed, Echinobase
Butler, Forensic DNA typing by capillary electrophoresis using the ABI Prism 310 and 3100 genetic analyzers for STR analysis. 2004, Pubmed
Cameron, A sea urchin genome project: sequence scan, virtual map, and additional resources. 2000, Pubmed, Echinobase
Carrillo-Bustamante, The evolution of natural killer cell receptors. 2016, Pubmed
Chin, Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. 2013, Pubmed
Dawkins, Arms races between and within species. 1979, Pubmed
Deitsch, Common strategies for antigenic variation by bacterial, fungal and protozoan pathogens. 2009, Pubmed
Dheilly, Highly variable immune-response proteins (185/333) from the sea urchin, Strongylocentrotus purpuratus: proteomic analysis identifies diversity within and between individuals. 2009, Pubmed, Echinobase
Eno, Methods for karyotyping and for localization of developmentally relevant genes on the chromosomes of the purple sea urchin, Strongylocentrotus purpuratus. 2009, Pubmed, Echinobase
Gemayel, Variable tandem repeats accelerate evolution of coding and regulatory sequences. 2010, Pubmed
Ghosh, Sp185/333: a novel family of genes and proteins involved in the purple sea urchin immune response. 2010, Pubmed, Echinobase
Gnerre, High-quality draft assemblies of mammalian genomes from massively parallel sequence data. 2011, Pubmed
Gordon, Long-read sequence assembly of the gorilla genome. 2016, Pubmed
Hibino, The immune gene repertoire encoded in the purple sea urchin genome. 2006, Pubmed, Echinobase
Joshi, Perspectives of genomic diversification and molecular recombination towards R-gene evolution in plants. 2013, Pubmed
Kasahara, Variable Lymphocyte Receptors: A Current Overview. 2015, Pubmed
Kuang, Multiple genetic processes result in heterogeneous rates of evolution within the major cluster disease resistance genes in lettuce. 2004, Pubmed
Langmead, Fast gapped-read alignment with Bowtie 2. 2012, Pubmed
Leister, Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance gene. 2004, Pubmed
Liberti, Expression of Ciona intestinalis variable region-containing chitin-binding proteins during development of the gastrointestinal tract and their role in host-microbe interactions. 2014, Pubmed
Lohse, RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. 2012, Pubmed
López-Flores, The repetitive DNA content of eukaryotic genomes. 2012, Pubmed
Lun, A recombinant Sp185/333 protein from the purple sea urchin has multitasking binding activities towards certain microbes and PAMPs. 2016, Pubmed, Echinobase
Martin, Comparative genomic analysis, diversity and evolution of two KIR haplotypes A and B. 2004, Pubmed
McDowell, Molecular diversity at the plant-pathogen interface. 2008, Pubmed
Miller, An Sp185/333 gene cluster from the purple sea urchin and putative microsatellite-mediated gene diversification. 2010, Pubmed, Echinobase
Multerer, Two cDNAs from the purple sea urchin, Strongylocentrotus purpuratus, encoding mosaic proteins with domains found in factor H, factor I, and complement components C6 and C7. 2004, Pubmed, Echinobase
Nadalin, GapFiller: a de novo assembly approach to fill the gap within paired reads. 2012, Pubmed
Nair, Macroarray analysis of coelomocyte gene expression in response to LPS in the sea urchin. Identification of unexpected immune diversity in an invertebrate. 2005, Pubmed, Echinobase
Noé, YASS: enhancing the sensitivity of DNA similarity search. 2005, Pubmed
Nydam, Creation and maintenance of variation in allorecognition Loci: molecular analysis in various model systems. 2011, Pubmed
Ogden, Multiple sequence alignment accuracy and phylogenetic inference. 2006, Pubmed
Papamichos-Chronakis, Chromatin and the genome integrity network. 2013, Pubmed
Parniske, Novel disease resistance specificities result from sequence exchange between tandemly repeated genes at the Cf-4/9 locus of tomato. 1997, Pubmed
Pearson, Repeat instability: mechanisms of dynamic mutations. 2005, Pubmed
Rebeiz, GenePalette: a universal software tool for genome sequence visualization and analysis. 2004, Pubmed
Rosa, Hydractinia allodeterminant alr1 resides in an immunoglobulin superfamily-like gene complex. 2010, Pubmed
Rosengarten, Genetic diversity of the allodeterminant alr2 in Hydractinia symbiolongicarpus. 2011, Pubmed
Roth, Characterization of the highly variable immune response gene family, He185/333, in the sea urchin, Heliocidaris erythrogramma. 2014, Pubmed, Echinobase
Sherman, Extraordinary Diversity of Immune Response Proteins among Sea Urchins: Nickel-Isolated Sp185/333 Proteins Show Broad Variations in Size and Charge. 2015, Pubmed, Echinobase
Smith, Diversification of innate immune genes: lessons from the purple sea urchin. 2010, Pubmed, Echinobase
Smith, Recombination events generating a novel Rp1 race specificity. 2005, Pubmed
Smith, Innate immune complexity in the purple sea urchin: diversity of the sp185/333 system. 2012, Pubmed, Echinobase
Sodergren, The genome of the sea urchin Strongylocentrotus purpuratus. 2006, Pubmed, Echinobase
Sommer, Minimus: a fast, lightweight genome assembler. 2007, Pubmed
Taketa, Botryllus schlosseri allorecognition: tackling the enigma. 2015, Pubmed
Tang, Genome assembly, rearrangement, and repeats. 2007, Pubmed
Teng, Regulation and Evolution of the RAG Recombinase. 2015, Pubmed
Terwilliger, Unexpected diversity displayed in cDNAs expressed by the immune cells of the purple sea urchin, Strongylocentrotus purpuratus. 2006, Pubmed, Echinobase
Terwilliger, Distinctive expression patterns of 185/333 genes in the purple sea urchin, Strongylocentrotus purpuratus: an unexpectedly diverse family of transcripts in response to LPS, beta-1,3-glucan, and dsRNA. 2007, Pubmed, Echinobase
Thys, DNA secondary structure at chromosomal fragile sites in human disease. 2015, Pubmed
Treangen, Repetitive DNA and next-generation sequencing: computational challenges and solutions. 2011, Pubmed
Trowsdale, Major histocompatibility complex genomics and human disease. 2013, Pubmed
Uhrberg, The KIR gene family: life in the fast lane of evolution. 2005, Pubmed
Walter, Diversification of both KIR and NKG2 natural killer cell receptor genes in macaques - implications for highly complex MHC-dependent regulation of natural killer cells. 2017, Pubmed
Wilson, Plasticity in the organization and sequences of human KIR/ILT gene families. 2000, Pubmed
Zhang, Representation of an immune responsive gene family encoding fibrinogen-related proteins in the freshwater mollusc Biomphalaria glabrata, an intermediate host for Schistosoma mansoni. 2004, Pubmed