The single cell transcriptomics revolution
More than 10 years ago, bulk RNA sequencing (“RNA-Seq”) was described as a “revolutionary tool for transcriptomics”1, and to this day it remains the gold standard for genome-wide expression analysis2. One of its drawbacks, however, is that the tissue of interest is homogenized before sequencing, so only an average portrait of gene expression is obtained.
Recent advances in RNA-Seq include spatially resolved methods and single-cell RNA-Seq (scRNA-Seq). With scRNA-Seq, it is now possible to study tissue heterogeneity, an aspect that has been shown to be critical for understanding several healthy and disease states. Despite this breakthrough, Transposable Elements are not commonly analyzed in such studies.
Improving locus-specific analysis of TEs with SoloTE
Transposable Elements (TEs) are highly repetitive elements that can be found in almost every eukaryotic genome known to date. For example, they occupy almost 50% of the human and mouse genomes. Although they used to be labelled as “junk DNA”, they are now recognized as regulators of gene expression3. Barbara McClintock pioneered the idea that TEs are genetic entities that can control gene expression, and to this day we are still discovering ways in which this occurs. For instance, transcriptional activation of TEs can influence the expression of neighboring genes3 (Fig. 1a). Because of this, analyzing their expression with locus resolution can help us better understand their role in gene regulation.
You just know sooner or later, it will come out in the wash, but you may have to wait some time.
When I first read about scRNA-Seq, I was amazed and set about learning as much as I could about it. Even though it shares some similarities with bulk RNA-Seq, the computational analyses are quite different, and I found a whole new world to explore. Up to this point, I had spent most of my career studying TEs in bulk RNA-Seq data, so I was familiar with the approaches developed to study them at the locus level. When I came across scTE, a method published to study TEs in scRNA-Seq experiments, I noticed that it omits the genomic location of TEs (Fig. 1b)4. This is similar to the approach used in early bulk RNA-Seq TE analysis methods, and with this in mind, I saw an opportunity to improve upon it.
Although the scTE paper showed that TE expression can be associated with specific cell groups, that tool does not make it feasible to predict and understand the impact of TE expression on genetic programs. With SoloTE we aimed to improve the quantification of TEs in scRNA-Seq studies by taking advantage of the sequence-level differences between them (Fig. 1c), thereby maintaining the genomic location of expressed TEs. This, in turn, allows us to predict the impact that their transcriptional activation might have on specific genes, and in our paper we show several examples of this.
Figure 1. a. Schematic of the influence of an expressed TE on neighboring gene expression. b. Sample diagram showing an example of TE expression ("Real") and how scTE aggregates expression at the family level, losing the genomic location of expressed TEs. Two TE families are depicted (Family 1 in red, and Family 2 in cyan), and yellow diamonds correspond to sequence mutations in the TE. Smaller rectangles above correspond to reads belonging to each TE. c. Example of how SoloTE processes reads and estimates TE expression.
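To make the difference in Fig. 1b and 1c concrete, here is a minimal Python sketch that contrasts family-level counting with locus-aware counting. It is only an illustration and not SoloTE's or scTE's actual code: the toy reads, the `maps_uniquely` flag, and the rule of falling back to the family name for ambiguous reads are all assumptions made for this example.

```python
from collections import Counter

# Toy reads: (cell barcode, TE family, best-guess TE locus, maps_uniquely).
# All values are made up for illustration.
READS = [
    ("cell1", "Family1", "chr1:1000-1300", True),
    ("cell1", "Family1", "chr1:1000-1300", True),
    ("cell1", "Family1", "chr5:7000-7300", False),
    ("cell2", "Family2", "chr2:500-900", True),
    ("cell2", "Family1", "chr5:7000-7300", True),
]


def count_by_family(reads):
    """Family-level counting: every read is credited to its TE family,
    so the genomic location of the expressed copy is lost."""
    return Counter((cell, family) for cell, family, _locus, _unique in reads)


def count_by_locus(reads):
    """Locus-aware counting (hypothetical rule): keep the locus for reads
    that map uniquely, and fall back to the family for ambiguous reads."""
    counts = Counter()
    for cell, family, locus, unique in reads:
        feature = f"{family}|{locus}" if unique else family
        counts[(cell, feature)] += 1
    return counts


family_counts = count_by_family(READS)
locus_counts = count_by_locus(READS)

for key, value in sorted(locus_counts.items()):
    print(key, value)
```

In this toy example, the three Family 1 reads from cell1 collapse into a single family-level count, whereas the locus-aware table keeps the two uniquely mapping reads attached to their genomic position.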
To assess our tool, we first used simulated data, in which we knew the true locus of TE expression, and compared it with scTE on several scRNA-Seq metrics, such as cell cluster marker detection and UMAP dimensionality reduction. As our tool outperformed scTE in these metrics, we then aimed to uncover the repertoire of activated TEs and their impact on gene expression in 3 datasets: murine embryonic 2C-like cells5, early gastric cancer6, and the Alzheimer’s disease APP/PS1 mouse model7.
We first studied the 2C-like dataset, because it is a well-known example of TE expression restricted to a specific cell group. In contrast, both the early gastric cancer (EGC) and the Alzheimer’s disease APP/PS1 mouse model (AD APP/PS1) are datasets in which, to the best of our knowledge, TE expression had not been explored before. Nonetheless, previous work using bulk RNA-Seq has indicated alterations in TE activity in these diseases8,9. Overall, SoloTE detected a greater number of marker TEs than scTE (Fig. 2a), and we were able to confirm the expected TE expression pattern in the 2C-like cells (Fig. 2b). For the EGC and AD APP/PS1 data, we found expression that could be associated with the disease state (examples shown in Fig. 2c and Fig. 2d).
Figure 2. Marker TEs detected with SoloTE. a. Overview of the number of marker TEs detected with scTE (blue) or SoloTE (red). b. Marker TEs detected in the 2C-like dataset. c. Marker TEs detected in the Early Gastric Cancer (EGC) dataset. d. Marker TEs detected in the Alzheimer's Disease (AD) APP/PS1 mouse model.
Finally, using the locus information of the detected TEs, we correlated their expression with that of the genes located closest to them. Through this analysis, we were able to find genes whose expression might be modulated by TEs. For example, in EGC we found that TEs might be negatively modulating CDH1 and LINC-PINT, genes whose down-regulation has previously been associated with the pathology. In AD APP/PS1, we found a negative correlation with Olfr1033, an olfactory receptor. Interestingly, olfactory dysfunction has also been associated with AD, further highlighting the importance of this association.
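The idea behind this last analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the exact procedure from the paper: the gene names and coordinates are hypothetical, "closest" is measured between midpoints, and the expression values are invented so that the correlation is easy to verify by hand.

```python
import math


def nearest_gene(te_chrom, te_start, te_end, genes):
    """Return the name of the gene on the same chromosome whose midpoint is
    closest to the TE midpoint. `genes` holds (name, chrom, start, end)."""
    te_mid = (te_start + te_end) / 2
    same_chrom = [g for g in genes if g[1] == te_chrom]
    return min(same_chrom, key=lambda g: abs((g[2] + g[3]) / 2 - te_mid))[0]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical gene annotation and per-cell expression values.
GENES = [
    ("GeneA", "chr1", 5000, 9000),
    ("GeneB", "chr1", 50000, 60000),
    ("GeneC", "chr2", 1000, 2000),
]
GENE_EXPR = {"GeneA": [8, 6, 4, 2, 0], "GeneB": [1, 1, 2, 1, 1]}
TE_EXPR = [0, 1, 2, 3, 4]  # expression of one TE locus across five cells

closest = nearest_gene("chr1", 1000, 1300, GENES)
r = pearson(TE_EXPR, GENE_EXPR[closest])
print(closest, round(r, 3))
```

A strongly negative coefficient, as in this toy case, is the kind of pattern that would flag a gene as potentially down-modulated by a nearby TE. Correlation alone does not establish regulation, so such hits are candidates for follow-up rather than conclusions.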
With SoloTE, we hope that the scientific community can start exploring the fascinating patterns of TE activity in scRNA-Seq studies. Although there is room for improvement, we think it is a first step towards taking advantage of the unprecedented resolution that scRNA-Seq provides, allowing us to further understand the impact of these often-overlooked elements on gene expression.
Read the full article at: https://www.nature.com/articles/s42003-022-04020-5
1. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009). doi:10.1038/nrg2484.
2. Stark, R., Grzelak, M. & Hadfield, J. RNA sequencing: the teenage years. Nat. Rev. Genet. 20, 631–656 (2019). doi:10.1038/s41576-019-0150-2.
3. Lanciano, S. & Cristofari, G. Measuring and interpreting transposable element expression. Nat. Rev. Genet. 21, 721–736 (2020). doi:10.1038/s41576-020-0251-y.
4. He, J. et al. Identifying transposable element expression dynamics and heterogeneity during development at the single-cell level with a processing pipeline scTE. Nat. Commun. 12, 1456 (2021). doi:10.1038/s41467-021-21808-x.
5. Zhao, T. et al. Single-Cell RNA-Seq Reveals Dynamic Early Embryonic-like Programs during Chemical Reprogramming. Cell Stem Cell 23, 31–45.e7 (2018). doi:10.1016/j.stem.2018.05.025.
6. Zhang, P. et al. Dissecting the Single-Cell Transcriptome Network Underlying Gastric Premalignant Lesions and Early Gastric Cancer. Cell Rep. 27, 1934–1947.e5 (2019). doi:10.1016/j.celrep.2019.04.052.
7. Yang, H. S. et al. Natural genetic variation determines microglia heterogeneity in wild-derived mouse models of Alzheimer’s disease. Cell Rep. 34, 108739 (2021). doi:10.1016/j.celrep.2021.108739.
8. Guo, C. et al. Tau Activates Transposable Elements in Alzheimer’s Disease. Cell Rep. 23, 2874–2880 (2018). doi:10.1016/j.celrep.2018.05.004.
9. Chenais, B. Transposable Elements in Cancer and Other Human Diseases. Curr. Cancer Drug Targets 15, 227–242 (2015). doi:10.2174/1568009615666150317122506.