Uncovering transcriptional dark matter via gene annotation independent single-cell RNA sequencing analysis.
Wang MFZ, Mantri M, Chou SP, Scuderi GJ, McKellar DW, Butcher JT, Danko CG, De Vlaminck I.
Abstract
Conventional scRNA-seq expression analyses rely on the availability of a high-quality genome annotation. Yet, as we show here with scRNA-seq experiments and analyses spanning human, mouse, chicken, mole rat, lemur and sea urchin, genome annotations are often incomplete, particularly for organisms that are not routinely studied. To overcome this hurdle, we created an scRNA-seq analysis routine that recovers biologically relevant transcriptional activity beyond the scope of the best available genome annotation by performing scRNA-seq analysis on any region in the genome for which transcriptional products are detected. Our tool generates a single-cell expression matrix for all transcriptionally active regions (TARs), performs single-cell TAR expression analysis to identify biologically significant TARs, and then annotates TARs using gene homology analysis. This procedure uses single-cell expression analyses as a filter to direct annotation efforts to biologically significant transcripts and thereby uncovers biology to which scRNA-seq would otherwise be in the dark.
Fig. 1. Generating de novo features based on genome coverage. a Workflow to generate TARs and to identify biologically meaningful uTARs. b Total genome assembly sequence length for human (hg38 and hg16), mouse (mm10), chicken (GRCg6a), gray mouse lemur (Mmur_3.0), naked mole rat (HetGla_1.0), and sea urchin (Spur_4.2). c Total number of annotated transcripts in existing annotations normalized to the assembly sequence length for humans (hg38 GENCODE v30, hg16 RefSeq), mouse (GENCODE vM21), chicken (GRCg6a Ensembl v96), gray mouse lemur (Mmur_3.0 RefSeq), naked mole rat (HetGla_1.0 RefSeq), and sea urchin (Spur_4.2 RefSeq). d Relative number of unique scRNA-seq reads outside of gene annotations contained in uTARs for each cell shown as violin plots (3849 cells in hg38 and hg19, 6113 in mouse, 14008 in chicken, 6321 in lemur, 2657 in naked mole rat, 2658 in sea urchin). Mean values (black dots) and 2 standard deviations above and below the mean (black bars) are shown. e Relative number of unique scRNA-seq reads outside of gene annotations for different human genome assemblies and annotations at different times (3849 cells). f Example of groHMM-defined aTAR (red) and uTAR (maroon) features along hg16 chr22 with RefSeq hg16 gene annotations shown in blue. Sense strand coverage is plotted in black and antisense strand coverage in gray (log-e scale).
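The split between aTARs and uTARs in Fig. 1 can be illustrated with a minimal sketch. It assumes that a TAR counts as annotated (aTAR) when it overlaps any annotated gene interval on the same chromosome and strand, and as unannotated (uTAR) otherwise. The function name `label_tars`, the tuple layout, and the toy coordinates are hypothetical and are not taken from the published pipeline.

```python
from collections import defaultdict


def label_tars(tars, genes):
    """Label each TAR 'aTAR' or 'uTAR' by overlap with gene annotations.

    Both inputs are iterables of (chrom, start, end, strand) tuples with
    half-open [start, end) coordinates. Requiring the same strand is an
    assumption made for this illustration.
    """
    # Group gene intervals by (chromosome, strand) for direct lookup.
    genes_by_key = defaultdict(list)
    for chrom, start, end, strand in genes:
        genes_by_key[(chrom, strand)].append((start, end))

    labels = []
    for chrom, start, end, strand in tars:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            g_start < end and start < g_end
            for g_start, g_end in genes_by_key[(chrom, strand)]
        )
        labels.append("aTAR" if overlaps else "uTAR")
    return labels


# Toy coordinates, purely illustrative.
genes = [("chr22", 1000, 5000, "+")]
tars = [
    ("chr22", 4500, 6000, "+"),  # overlaps the gene on the same strand
    ("chr22", 8000, 9000, "+"),  # intergenic
    ("chr22", 1200, 1800, "-"),  # antisense to the gene
]
print(label_tars(tars, genes))  # ['aTAR', 'uTAR', 'uTAR']
```

The linear scan over genes keeps the sketch short; a genome-scale run would more plausibly use sorted intervals or a dedicated interval tool.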
Fig. 2. Reads in uTARs can separate cell types in different organisms. a UMAP dimensional reduction on annotated gene expression features (top row) and uTARs (second row) for mouse spleen, mouse kidney, different time points in chicken embryonic heart development, gray mouse lemur lung tissue, and sea urchin embryonic tissue. Cells are colored in each column based on gene expression clustering. Relative number of uTAR reads for each cell in every cluster is also shown as violin plots (third row, colors correspond to UMAPs); 6113 cells in mouse spleen, 610 cells in mouse kidney, 4365 in chicken day 4, 2198 in chicken day 14, 6321 in gray mouse lemur lungs, 2657 in naked mole rat spleen, and 2658 in sea urchin embryo. b Silhouette coefficient values based on 2D UMAP coordinates of gene expression (blue), aTARs (red), and uTARs (maroon) for 11 samples. UMAPs for samples labeled with (*) are shown in Supplementary Fig. 1b. Cell labels are defined by gene annotation clustering. c Correlation between top 5 PC loadings and pseudo-bulk read coverage of uTARs across 11 samples. Horizontal line at uTAR PC loading = 0.5, vertical line at uTAR pseudo-bulk read coverage = 1e+4, r² = 4.0e-3. Quadrant numbers represent the number of uTARs in the respective quadrant. d Relative percentage of uTARs containing homology to any sequence (blue) and mRNA sequences (light blue) as a function of log-e fold change expression for each cell type in naked mole rat spleen data. BLAST sequence homology results relative to nucleotide collection database thresholds: mean uTAR peak query length = 686 ± 731 bp, uTAR peak percent identity > 71%, e-value < 0.053, bit score > 52.8.
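The silhouette coefficient in panel b measures how well fixed cell labels separate in a 2D embedding. The sketch below computes the mean silhouette from coordinates and labels using Euclidean distance, which is an assumption since the caption does not state the metric. The function name `mean_silhouette` and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for 2D points with fixed cluster labels."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy embedding: two tight groups that are far apart.
coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = ["A", "A", "B", "B"]   # labels follow the spatial groups
mixed_labels = ["A", "B", "A", "B"]  # labels cut across the spatial groups
print(mean_silhouette(coords, good_labels))
print(mean_silhouette(coords, mixed_labels))
```

Scoring the same gene-expression-derived labels on embeddings built from different feature sets, as the caption describes, makes the values comparable: a higher score means that feature set separates the labeled cell types more cleanly.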
Fig. 3. Biologically relevant information is contained in uTAR features. Differential uTAR feature analysis for mouse spleen data (a), chicken heart day 4 data (b), gray mouse lemur EPCAM+ lung data (c), naked mole rat spleen data (d), and sea urchin embryo data (e). Dot plot (left) of differentially expressed uTAR features that are labeled based on sequence homology; cell clusters are numbered along the x-axis. Dot size corresponds to the percentage of cells that express the uTAR feature, while darker blue color corresponds to a higher level of log-e-normalized expression. UMAP (second left) colored and dimensionally reduced using gene expression features, where cell clusters are labeled above the UMAP. Total coverage plot (top) of 5 uTARs along the length of the uTAR feature on the x-axis. The corresponding feature plot on the UMAP projection is shown below the coverage plots, where darker brown color correlates with higher log-e-normalized expression in each cell.
Fig. 4. Spatial transcriptomics to map uTAR expression in chicken embryonic hearts. a Spatial log-e-normalized expression of canonical TNNT2 myocyte marker, SH3BGR uTAR, canonical COL1A1 epicardial cell marker, RUNX1T1 uTAR, and annotated RUNX1T1 gene for chicken embryonic heart at day 4 (5 hearts) and day 14 (1 heart) post fertilization. b Dendrogram computed on Pearson correlation of log-e-normalized spatial expression for canonical gene markers and uTARs (underlined) in a day 4 chicken heart tissue section.
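A dendrogram like the one in panel b can be built by hierarchical clustering on a correlation-derived distance. The sketch below assumes a distance of 1 minus the Pearson correlation and average linkage; the caption states neither, so both are illustrative choices. The feature names and expression values are hypothetical toy data.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_by_correlation(profiles):
    """Average-linkage agglomerative clustering on distance 1 - Pearson r.

    profiles maps a feature name to its expression values across spots.
    Returns the merges in order as (members_a, members_b, distance).
    """
    names = list(profiles)
    dist = {
        (a, b): 1 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between two clusters.
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Toy spatial profiles over four spots (hypothetical values).
profiles = {
    "marker_A": [1.0, 2.0, 3.0, 4.0],
    "uTAR_1": [2.0, 4.0, 6.0, 8.5],
    "marker_B": [4.0, 3.0, 2.0, 1.0],
}
merges = cluster_by_correlation(profiles)
for members_a, members_b, d in merges:
    print(members_a, members_b, round(d, 3))
```

In this toy input the hypothetical uTAR joins the marker whose spatial pattern it tracks before either joins the anti-correlated marker. This is the kind of grouping a dendrogram makes visible.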