Nucleic Acids Res 2017 Sep 06;45(15):e142. doi: 10.1093/nar/gkx574.
A massively parallel strategy for STR marker development, capture, and genotyping.
Kistler L, Johnson SM, Irwin MT, Louis EE, Ratan A, Perry GH.
Abstract
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci: 97.3–99.6% of STRs characterized with ≥10× non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies.
Figure 2. Orthologous locations of diademed sifaka STR targets on the human reference genome. (A) Genome-wide distribution of genic (blue, n = 2267) and intergenic (red, n = 1982) diademed sifaka STR loci that could be mapped to the human genome.
Figure 3. Target Enrichment Results. (A) Number of target STR loci recovered in shotgun (n = 2) and captured (n = 5) libraries. Read data were randomly downsampled using SAMtools (32) after read mapping and before genotype calling to normalize all libraries to 30 million input reads for cross-comparability. Actual reads generated per captured sample ranged from 34.8 million to 73 million (Supplemental Table S1). (B) Per-site coverage is highly correlated among samples, illustrating non-random variation in marker enrichment. Marker coverage in the 30 million read subsample is compared between Titania and Oberon capture data (left axis, blue), and Titania and Romeo's tissue library (right axis, red). (C) Enrichment of reads carrying target STRs in subsamples of 30 million reads. Left axis shows the proportion of callable reads; right axis shows the estimated enrichment level given the genomic expectation of 0.000156 of reads on target with no enrichment. For the fecal libraries, enrichment values are given without any correction for the high proportion of exogenous DNA, whereas previous estimates of endogenous fecal DNA content suggest actual enrichment similar to the tissue samples.
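The arithmetic behind panels (A) and (C) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the example read counts are hypothetical, and only the 30 million read target and the 0.000156 genomic expectation are taken from the caption. Fold enrichment is assumed here to be the observed on-target proportion divided by that expectation.

```python
# Values stated in the Figure 3 caption.
TARGET_READS = 30_000_000        # common subsample size per library
GENOMIC_EXPECTATION = 0.000156   # on-target proportion with no enrichment


def downsample_fraction(total_reads: int, target_reads: int = TARGET_READS) -> float:
    """Fraction of reads to keep so a library ends up near target_reads.

    Hypothetical helper: returns 1.0 when the library is already at or
    below the target size.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return min(1.0, target_reads / total_reads)


def fold_enrichment(on_target_reads: int, total_reads: int,
                    expected: float = GENOMIC_EXPECTATION) -> float:
    """Observed on-target proportion divided by the unenriched expectation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return (on_target_reads / total_reads) / expected


if __name__ == "__main__":
    # A library of 73 million reads (upper end of the range in the caption)
    # would keep roughly 41% of its reads.
    print(f"keep fraction: {downsample_fraction(73_000_000):.3f}")

    # Hypothetical count: 1.56 million on-target reads in a 30 million
    # read subsample, i.e. a proportion of 0.052.
    print(f"fold enrichment: {fold_enrichment(1_560_000, TARGET_READS):.1f}")
```

With these assumed counts the on-target proportion is 0.052, which works out to roughly 333-fold enrichment over the genomic expectation. The keep fraction is the kind of value one would hand to a read-subsampling tool.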
Figure 4. SNP-STR compound markers. (A) Simulated example of a phased STR-linked SNP locus, where the ‘A’ SNP allele associates with six repeats of the TC motif and the ‘G’ allele associates with seven. (B) Proportions of genic, non-genic, and unplaced STR loci (based on mapping analysis to the human reference genome; see Figure 1) with number of SNPs detected at ≥4× coverage on associated inserts from three lemur tissue sample libraries (Oberon, Titania, and Romeo). Forty-two STR loci associated with >15 SNPs (n = 8 genic, n = 25 non-genic, n = 9 unplaced) are not shown.
Figure 5. Simulated performance of STR discovery using BaitSTR. (A) At variable k-mer lengths and coverage levels, the number of simulated bi-allelic STRs in a random synthetic genome that could be discovered with no requirement of local extension. A total of 1000 markers were present. (B) Using the same simulated STRs in a synthetic genome, this simulation required successful block extension to 200 nt non-repeat flanks and a total contig length of 500 nt. (C) At variable coverage levels, the number of heterozygous STRs discovered in the NA12878 genome data, along with a low frequency of false positives and the number of heterozygous SNPs recovered from extended blocks.
Agrafioti, SNPSTR: a database of compound microsatellite-SNP markers. 2007.
Arandjelovic, Two-step multiplex polymerase chain reaction improves the speed and accuracy of genotyping using DNA from noninvasive and museum samples. 2009.
Bonatelli, Using Next Generation RAD Sequencing to Isolate Multispecies Microsatellites for Pilosocereus (Cactaceae). 2015.
Carlson, MIPSTR: a method for multiplex genotyping of germline and somatic STR variation across many individuals. 2015.
Carpenter, Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. 2013.
Chikhi, Informed and automated k-mer size selection for genome assembly. 2014.
Ellegren, Microsatellites: simple sequences with complex evolution. 2004.
Fan, A brief review of short tandem repeat mutation. 2007.
Fordyce, Second-generation sequencing of forensic STRs using the Ion Torrent™ HID STR 10-plex and the Ion PGM™. 2015.
Fu, DNA analysis of an early modern human from Tianyuan Cave, China. 2013.
Gnirke, Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. 2009.
Guichoux, Current trends in microsatellite genotyping. 2011.
Gymrek, lobSTR: A short tandem repeat profiler for personal genomes. 2012.
Haak, Massive migration from the steppe was a source for Indo-European languages in Europe. 2015.
Hoban, The number of markers and samples needed for detecting bottlenecks under realistic scenarios, with and without recovery: a simulation-based study. 2013.
Hu, pIRS: Profile-based Illumina pair-end reads simulator. 2012.
Kent, BLAT--the BLAST-like alignment tool. 2002.
Kistler, Comparative and population mitogenomic analyses of Madagascar's extinct, giant 'subfossil' lemurs. 2015.
Koboldt, VarScan: variant detection in massively parallel sequencing of individual and pooled samples. 2009.
Lander, Genomic mapping by fingerprinting random clones: a mathematical analysis. 1988.
Li, The Sequence Alignment/Map format and SAMtools. 2009.
Li, Fast and accurate short read alignment with Burrows-Wheeler transform. 2009.
Manichaikul, Robust relationship inference in genome-wide association studies. 2010.
Melsted, Efficient counting of k-mers in DNA sequences using a bloom filter. 2011.
Meyer, Illumina sequencing library preparation for highly multiplexed target capture and sequencing. 2010.
Perry, Comparative RNA sequencing reveals substantial genetic variation in endangered primates. 2012.
Perry, Genomic-scale capture and sequencing of endogenous DNA from feces. 2010.
Perry, A genome sequence resource for the aye-aye (Daubentonia madagascariensis), a nocturnal lemur from Madagascar. 2012.
Quinlan, BEDTools: a flexible suite of utilities for comparing genomic features. 2010.
Quéméré, Genetic data suggest a natural prehuman origin of open habitats in northern Madagascar and question the deforestation narrative in this region. 2012.
Scheible, Short tandem repeat typing on the 454 platform: strategies and considerations for targeted sequencing of common forensic markers. 2014.
Schlötterer, Slippage synthesis of simple sequence DNA. 1992.
Schoebel, Lessons learned from microsatellite development for nonmodel organisms using 454 pyrosequencing. 2013.
Seguin-Orlando, Ligation bias in illumina next-generation DNA libraries: implications for sequencing ancient genomes. 2013.
Seo, Reduction of stutter ratios in short tandem repeat loci typing of low copy number DNA samples. 2014.
Smith, Identification of common molecular subsequences. 1981.
Snyder-Mackler, Efficient Genome-Wide Sequencing and Low-Coverage Pedigree Analysis from Noninvasively Collected Samples. 2016.
Trapnell, TopHat: discovering splice junctions with RNA-Seq. 2009.
Vartia, A novel method of microsatellite genotyping-by-sequencing using individual combinatorial barcoding. 2016.
Veeramah, The impact of whole-genome sequencing on the reconstruction of human population history. 2014.
Willems, The landscape of human STR variation. 2014.