The nuclear lamina (NL) is a meshwork found beneath the inner nuclear membrane. The study of the NL is hindered by the insolubility of the meshwork, which has driven the development of proximity ligation methods to identify the NL-associated/proximal proteins, RNA, and DNA. To simplify and improve temporal labeling, we fused APEX2 to the NL protein lamin-B1 to map proteins, RNA, and DNA. The identified NL-interacting/proximal RNAs show a long 3′ UTR bias, a finding consistent with an observed bias toward longer 3′ UTRs in genes deregulated in lamin-null cells. A C-rich motif was identified in these 3′ UTRs. Our APEX2-based proteomics identifies a C-rich motif binding regulatory protein that exhibits altered localization in lamin-null cells. Finally, we use APEX2 to map lamina-associated domains (LADs) during the cell cycle and uncover short, H3K27me3-rich variable LADs. Thus, the APEX2-based tools presented here permit identification of proteomes, transcriptomes, and genome elements associated with or proximal to the NL.
Introduction
The nuclear lamina (NL) is a substructure of the nucleus that resides beneath the inner nuclear membrane. The major structural components of the NL are the intermediate filament proteins, the A- and B-type lamins (Burke and Stewart, 2013; Dechat et al., 2008). The study of the NL has gained increased interest due to the distinct human pathologies that are caused by mutant versions of NL proteins and the connections of lamins to aging (Chen et al., 2014, 2015; Hatch and Hetzer, 2014; Schreiber and Kennedy, 2013; Tran et al., 2016; Yue et al., 2019). The importance of the NL in general organismal biology was highlighted by the discovery of differential expression of A- and B-type lamins during embryonic development (Röber et al., 1989; Stewart and Burke, 1987) and the discovery of the essential nature of NL genes for viability and overall genome organization (Coffinier et al., 2010; Kim et al., 2011; Sullivan et al., 1999; Vergnes et al., 2004). More recently, the NL has been implicated in aging with the apparent reduction or alteration of lamin proteins in aged animals and in a form of cellular aging termed senescence (Chen et al., 2014, 2015; Freund et al., 2012; Frost et al., 2016; Lattanzi et al., 2014; Shimi et al., 2011; Tran et al., 2016; Yue et al., 2019; Zheng et al., 2018). Studies of the NL have revealed functions for the NL in highly dynamic processes such as genome organization, gene transcription, signal transduction, protein/RNA trafficking, and cell division. The NL may influence the trafficking of RNA and proteins through a lamin meshwork by ensuring an even distribution of nuclear pore complexes (NPCs) throughout the nuclear envelope (Guo et al., 2014; Guo and Zheng, 2015). The NL also organizes the genome by interacting with regions of DNA known as lamina-associated domains (LADs; Guelen et al., 2008). 
However, our understanding of these NL functions is rather limited, in part due to an incomplete characterization of the molecular components of the NL and our limited knowledge of the dynamics associated with it.
The study of the NL is challenging as it has long been recognized as a proteinaceous structure with limited solubility (Gerace et al., 1984; Moir et al., 2000). This insolubility occurs throughout most of a cell’s life with the exception of animal cell mitosis when the nuclear envelope is disassembled. Early biochemical studies of the NL employed fractionation of this structure, but this method is limited by the ability to screen for different biological molecules and the amount of starting material required. This problem led to the development and utilization of a number of proximity ligation methods such as enzyme-mediated biotin identification (BioID), DNA adenine methylation-mediated identification (DamID), tyramide signal amplification (TSA), and biotinylation by antibody recognition (BAR, referred to here as “TSA-BAR”; Bar et al., 2018; Chen et al., 2018b; Guelen et al., 2008; Roux et al., 2012). The principle behind each of these methods is the enzymatic tagging of proteins and/or nucleic acids in the proximity of the NL with a molecule that is either readily identified or amenable to purification. As examples, DamID uses a DNA adenine methyltransferase to label DNA at the NL, and BioID utilizes the promiscuous E. coli biotin ligase, BirA, to label NL proteins with biotin (Guelen et al., 2008; Roux et al., 2012). More recently the TSA method, classically used to increase the signal of immunostaining procedures by regional horseradish peroxidase (HRP)-catalyzed deposition of biotin, was used to map the NL-associated proteome (TSA-BAR method) and LADs (TSA-sequencing or "TSA-seq"; Bar et al., 2018; Chen et al., 2018b). These proximity methods, while extremely useful, do have some restrictions. DamID, for example, targets DNA and does not provide proteomic information (van Steensel and Henikoff, 2000), while TSA-BAR requires specific antibodies that might not discriminate between isoforms (Bar et al., 2018; Chen et al., 2018b). 
Further, some of these approaches have limited temporal resolution, or they require large amounts of cells as starting material. An alternative enzyme suitable for the study of the NL, called ascorbate peroxidase (APEX), was recently developed for the purposes of proteomic and RNA identification (Fazal et al., 2019; Hung et al., 2014; Kaewsapsak et al., 2017; Rhee et al., 2013). This enzyme, which has been extensively engineered into the highly reactive form, APEX2 (Lam et al., 2015), uses hydrogen peroxide to catalyze the covalent addition of a radicalized biotin-phenol moiety to both protein and RNA species.
Here we describe the use of APEX2 to obtain a more complete picture of the NL by the identification of its interacting or proximal proteins, RNA, and DNA. We show that the NL interacts with or is proximal to proteins involved in RNA regulation such as mRNA splicing and stability, and that the APEX2 identified NL proteome exhibits strong overlap with that identified by the related TSA-BAR method (Bar et al., 2018). The use of APEX2 to identify NL-associated RNA species suggests an interesting role for the NL in the regulation of a select group of mRNAs. Finally, the APEX2 method allows easy study of LADs in different cell cycle stages.
Results
APEX2-lamin-B1 labels the nuclear periphery
We transfected HEK293FT cells using no plasmid or plasmid expressing FLAG-APEX2-lamin-B1 (human lamin-B1) and performed the APEX2 labeling reaction using a protocol similar to that described by Ting and colleagues (Fig. 1, Method #1; Hung et al., 2016). The cells were incubated with biotin-phenol in their culture medium and then treated with hydrogen peroxide for 1 min before processing. After cell fixation, we performed streptavidin labeling and immunostaining using the FLAG-M2 antibody. The latter shows that most transfected cells had low or modest FLAG-APEX2-lamin-B1 expression (Fig. S1 A). We find that the APEX2 reaction is very robust, but the streptavidin signal is diffuse throughout the entire nucleus, whereas the FLAG-M2 staining shows a clear NL localization (Fig. 1 B). We tried limiting diffusion by reducing the APEX2 reaction down to 15 s but continued to observe a similar diffuse-staining pattern for streptavidin (Fig. S1 B). We next determined whether the APEX2 reaction could be performed on fixed cells (Fig. 1, Method #2). We found that fixing cells with 1% paraformaldehyde (PFA), permeabilizing with Triton X-100, and incubating with biotin-phenol and hydrogen peroxide resulted in a strong streptavidin staining that coincided with FLAG-M2 staining for the APEX2-lamin-B1 fusion protein (Fig. 1 C, white arrows). This suggests that APEX2 on lamin-B1 might have labeled nearby nuclear proteins that can diffuse throughout the nucleus in live cells before fixation. The APEX2 reaction does result in diffusion of labeling, as has been observed for the HRP-based TSA-seq method (Chen et al., 2018b). We have measured the diffusion in cells where labeling was performed after fixation and found it to be up to 1 µm from the FLAG signal without using reagents to limit diffusion (Fig. S1 C). This result demonstrates that APEX2-based biotinylation can be performed in fixed cells, which may capture both stable and transient NL-associated and NL-proximal proteins.
The robustness of the APEX2 enzyme under fixation conditions prompted us to examine the stability of the enzyme under other conditions. We found that the APEX2 enzyme was still reactive in fixed cells that were stored at 4°C for up to 3 wk (Fig. S1 D) and that the enzyme also withstood flash freezing in liquid N2 (Fig. S1 E). Our results show that the APEX2 reaction is robust under different experimental and cell fixation conditions permitting the biotinylation of NL-associated or proximal proteins; in unfixed cells, the diffuse nuclear signal was likely due to diffusion of biotinylated protein inside the nucleus.
APEX2-lamin-B1 labeling identifies NL-interacting or NL-proximal RNA with long 3′ UTR
Previous studies have used APEX2 to isolate RNAs associated with cellular organelles and structures. This procedure can be done by precipitating protein/RNA complexes, or by directly precipitating RNA since the APEX2 reaction will label RNA (Fazal et al., 2019; Kaewsapsak et al., 2017). In this study, we chose to precipitate protein/RNA complexes with streptavidin, which we refer to as the APEX2-lamin-B1 RNA-identification procedure (RIP), as this is expected to yield a more extensive dataset. We generated RNA sequencing (RNA-seq) datasets for total nuclear and total cytosol RNA, and then used both native APEX2-lamin-B1-RIP and PFA-fixed APEX2-lamin-B1-RIP to obtain NL-interacting and NL-proximal RNAs (Fig. 2 A and Table S1). The biological replicates for each experimental condition are consistent with Pearson correlation values ranging from 0.84 to 1.00 (Fig. S2, A and B). As expected, a greater number (∼42%) of nuclear RNA-seq reads map to introns, while only ∼2% of cytosolic reads are intronic (Fig. 2 B). Our native APEX2-lamin-B1-RIP– and fixed APEX2-lamin-B1-RIP–identified RNAs show low intronic reads, and the intron/exon profile is similar to that seen for cytosolic RNAs (Fig. 2, A and B). Interestingly, we observe known mRNA transcripts that retain introns in both the nuclear and APEX2-lamin-B1-RIP samples (Fig. 2 C, black bars; Boutz et al., 2015; Fazal et al., 2019; Lareau et al., 2007).
To identify fraction enrichments, we compared the nuclear and cytosolic RNA-seq (genes with >5 counts per million). We found a total of 3,596 cytosolic- and 5,073 nuclear-enriched RNAs using differential expression analysis with a false discovery rate (FDR) cutoff of 0.05 from two replicate experiments (Fig. S2 C, right panel). We found the expected enrichment (in our case, ∼15–20-fold over cytosolic reads) of known nuclear residents, XIST, MALAT1, and NEAT in the nuclear fraction (Fig. S2 C, left panel, red dots; Hutchinson et al., 2007). Differential analysis of APEX2-lamin-B1-RIP and nuclear RNA datasets (FDR ≤ 0.05) yielded 3,622 genes in the PFA-fixed dataset and 2,333 in the unfixed dataset, of which 1,058 and 1,806 were considered enriched, respectively (Fig. 2 D and Fig. S2 D). To confidently define RNA species that interact with NL or are near the NL, we intersected significantly different genes from both of our APEX2-lamin-B1-RIP and PFA-fixed APEX2-lamin-B1-RIP datasets, yielding a consensus set of 707 RNAs (Fig. S2 D, right panel). The genes encoding the 707 RNAs identified by our APEX2-lamin-B1-RIP (lamin-B1-RIP) do not show a biased localization to LAD boundaries (Fig. S2 E). This suggests that compared with all other RNAs, these RNAs have increased interactions with or are more proximal to the NL. These lamin-B1-RIP RNAs were primarily protein-coding mRNAs and contained a small percentage (4.1%) of RNAs that encoded secretory proteins (29 of 707; Table S1). The latter argues against excessive cytosolic RNA contamination since the rough ER and RNAs encoding secretory protein are often perinuclear (Gerace and Burke, 1988; Newport and Forbes, 1987; Shibata et al., 2006). We did not observe a differential enrichment for noncoding RNAs (Human Gene Organization, HUGO-defined) such as XIST or MALAT1, although they were present in the lamin-B1-RIP RNA-seq data (see Discussion and Table S1).
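The two set-level steps in this paragraph, the expression filter (>5 counts per million) and the intersection of the native and PFA-fixed hit lists, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, gene labels, and read counts below are hypothetical, and the differential expression test itself (FDR cutoff) is assumed to have been run elsewhere.

```python
def counts_per_million(counts):
    """Scale raw read counts so that the library sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def expressed_genes(counts, min_cpm=5.0):
    """Keep genes whose abundance exceeds min_cpm counts per million."""
    cpm = counts_per_million(counts)
    return {gene for gene, value in cpm.items() if value > min_cpm}


def consensus(enriched_native, enriched_fixed):
    """Genes called in both the native and the PFA-fixed RIP datasets."""
    return set(enriched_native) & set(enriched_fixed)


# Toy inputs (hypothetical gene labels and counts, not data from the study).
toy_counts = {"GENE_A": 900_000, "GENE_B": 99_996, "GENE_C": 4}
native_hits = {"GENE_A", "GENE_B", "GENE_D"}
fixed_hits = {"GENE_B", "GENE_D", "GENE_E"}

kept = expressed_genes(toy_counts)          # GENE_C falls below 5 CPM
shared = consensus(native_hits, fixed_hits)  # genes present in both lists
print(sorted(kept), sorted(shared))
```

Requiring a gene to pass in both preparations is the conservative choice: it trades sensitivity for confidence that a hit does not depend on fixation.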
Most of these lamin-B1-RIP mRNAs (93.4%) are also found in the cytosolic-enriched population (Fig. 2 E, “lamin-B1 RIP”), but only 18.4% of cytosolic-enriched RNAs are lamin-B1-RIP mRNAs. We found a small population of mRNAs (45) exclusively in the RIP fraction (Fig. 2 E, “RIP only”) and excluded these from further analysis due to the low number. We compared the expression levels and the overall length of the mRNAs among the lamin-B1-RIP, cytosolic, and nuclear fractions and found that mRNAs in the lamin-B1-RIP and cytosolic fractions are expressed at higher levels than those enriched in the nuclear fraction, but mRNA lengths are similar among all three fractions (Fig. 2 F).
A comparison of specific features in the lamin-B1-RIP, total nuclear, and cytosolic mRNAs shows that the lamin-B1-RIP population is biased toward longer 3′ UTRs while other features such as the 5′ UTRs, coding exons, and introns were similar (Fig. 3 A and Fig. S3 A). This suggests that a subset of mRNAs with long 3′ UTRs have increased association with the NL or structures near NL. We considered an alternate possibility that genes encoding mRNAs with longer 3′ UTRs are physically close to LADs. An analysis of 3′ UTR lengths for mRNAs in the human genome reveals a biphasic distribution for 3′ UTR size (Fig. S3 B, left panel), and indeed, genes with 3′ UTRs greater than the median size appear closer to LADs (Fig. S3 B, middle and right). However, we find that genes encoding the lamin-B1-RIP RNAs with 3′ UTRs greater than median size are farther away from LADs borders (Fig. 3 B). Thus, lamin-B1-RIP RNAs with longer 3′ UTRs were not identified because of the physical location of their encoding genes. Next, we examined the up- and down-regulated genes in our previously published RNA-seq datasets of the WT and lamin-null (triple knockout of lamin-A/C, -B1, and -B2, TKO) mouse embryonic stem cells (mESC; Zheng et al., 2018). The 3′ UTRs in both the up- and down-regulated genes were larger than the unchanged RNA population (Fig. 3 C), but other features were similar in size (Fig. S3 C). Taken together, the APEX2-based lamin-B1-RIP procedure identifies potential RNA regulation via 3′ UTRs and could possibly explain some of the differentially expressed genes upon lamin deletion in mESCs.
In an effort to identify potential regulatory motifs, we performed a MEME motif search of nonredundant 3′ UTRs from lamin-B1-RIP RNAs that were greater than the median size. This search identified enrichment for a short nucleotide motif high in C residues (CCCCWCCCC, W can be A or U; Fig. 3 D). We next performed a motif search of down- and up-regulated genes in TKO mESCs and found a significant enrichment for a similar C-rich motif (Fig. 3 D). C-rich motifs were previously found to regulate α-globin and mu-opioid receptor mRNA stability (Hwang et al., 2017; Kong and Liebhaber, 2007). The use of the APEX2-based RIP method to identify NL-associated or NL-proximal RNAs reveals a potential role for the NL in regulating some transcripts with long 3′ UTRs that contain C-rich motifs.
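Once a motif has been reported, scanning a 3′ UTR for it is straightforward. The sketch below is only an illustration of such a scan using the motif as written in this paragraph (CCCCWCCCC, W = A or U). It is not MEME and performs no de novo discovery or enrichment statistics. The function name and the toy sequence are hypothetical, and accepting T in place of U is an added assumption so that cDNA-style sequences can also be scanned.

```python
import re

# W is the IUPAC code for A or U; T is also accepted (assumption) for
# DNA-alphabet input. The lookahead reports overlapping sites as well.
C_RICH_MOTIF = re.compile(r"(?=(C{4}[AUT]C{4}))")


def find_c_rich_sites(utr):
    """Return 0-based start positions of CCCCWCCCC in a UTR sequence."""
    seq = utr.upper()
    return [match.start() for match in C_RICH_MOTIF.finditer(seq)]


# Toy sequence (hypothetical, not taken from any transcript in the study).
toy_utr = "GGACCCCUCCCCAGGCCCCACCCCUU"
sites = find_c_rich_sites(toy_utr)
print(sites)
```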
APEX2-lamin-B1–based labeling identifies potential NL-associated proteins with RNA splicing and stability functions
APEX has been used extensively to identify proteomes interacting with different organelles. To explore if APEX2-lamin-B1 can be used to identify an NL-interacting proteome, including those that interact with the NL or NL-proximal structures transiently, we performed the APEX2 reaction with unfixed cells before protein purification to facilitate mass spectrometry. Biotinylated proteins were observed and could be efficiently precipitated by streptavidin beads (streptavidin pulldown [StrePD]; Fig. 4 A). As anticipated, known NL proteins (lamin-A/C and emerin) were detected by Western blotting (Fig. 4 A). Mass spectrometry identified 338 putative NL-interacting proteins with an average of five or more spectral counts from two replicate experiments (Table S2). To assess the reliability of our proteomic data, we compared our dataset with a recently published NL proteomic dataset generated by the TSA-BAR method (Bar et al., 2018) and found 23% overlap with the pooled, nonredundant TSA-BAR and APEX2 dataset (Fig. S4 A). Our dataset also shows a reasonable overlap with many previous datasets (Fig. S4 A) generated by different methods (Bar et al., 2018; Depreux et al., 2015; Dittmer et al., 2014; Dreger et al., 2001; Engelke et al., 2014; Fu et al., 2015; Kubben et al., 2010; Roux et al., 2012; Schirmer et al., 2003; Thul et al., 2017). The variability between studies might reflect technical and/or true differences among different cell types.
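One way to read the overlap figure quoted above is as the number of shared proteins divided by the size of the pooled, nonredundant list, i.e., intersection over union. That reading is an assumption, and the sketch below only illustrates the arithmetic; the function name and protein labels are hypothetical placeholders, not entries from either dataset.

```python
def pooled_overlap(set_a, set_b):
    """Shared entries as a fraction of the pooled, nonredundant list."""
    a, b = set(set_a), set(set_b)
    pooled = a | b
    if not pooled:
        return 0.0
    return len(a & b) / len(pooled)


# Toy protein lists (hypothetical labels).
apex2_toy = {"PROT_1", "PROT_2", "PROT_3", "PROT_4"}
tsa_bar_toy = {"PROT_1", "PROT_2", "PROT_5", "PROT_6"}

frac = pooled_overlap(apex2_toy, tsa_bar_toy)
print(f"{frac:.1%}")
```

Because the denominator grows with both lists, this measure penalizes datasets of very different sizes more than a simple "fraction of our hits seen before" would.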
We next compared our dataset against a list of 120 proteins identified in at least three previous NL proteomic experiments and found that 66 (55%) of these proteins are in our APEX2 study (Table S1; and Fig. 4 B, black circles), and a smaller number of hits were seen in more than half (6 out of 11) of the studies examined (Fig. 4 B, yellow circles). Normalizing our mass spectrometry–identified proteins based on their tyrosine content, a residue readily labeled by the biotin phenoxyl radical, did not change the overall result (Fig. S4 B), and the tyrosine content of our mass spectrometry proteins was not higher than that for nuclear proteins (Fig. S4 C; Hung et al., 2016). Regardless, we caution that our APEX2 mass spectrometry hits are candidates and that we do not have evidence that the spectral counts are proportional to a given protein’s abundance at the NL. We note that our APEX2-based NL mass spectrometry candidates did not contain many secreted proteins (3/338) or transmembrane proteins (27/338; Fig. S3 D). Further, our mass spectrometry data detect mainly nucleoplasmic nuclear pore components (TPR and NUP153; Table S2; Frosst et al., 2002; Sukegawa and Blobel, 1993). These data suggest that the APEX2 labeling was predominantly of NL-interacting proteins that include those transiently interacting with NL. A select number of RNA splicing proteins (e.g., SC-35, SRSF1, and HNRNPA1) found in our mass spectrometry were independently seen by Western blot experiments (Fig. S4 E). Immunostaining showed that a small subfraction of several splicing factors, HNRNPA1, ASF1, and SFRS7, are found at NL (Fig. 4 C, inset arrowheads), consistent with the idea that a fraction of these proteins may interact with NL.
A Gene Ontology (GO)-term analysis of our NL proteome dataset revealed enrichment for proteins involved in RNA splicing/stability, nuclear protein localization, and DNA replication (Table S2). We identified the five components (IGF2BP1, HNRNPU, SYNCRIP, YBX1, and DHX9) of the coding region instability determinant (CRD)-mediated mRNA stabilization complex involved in β-catenin–mediated C-MYC RNA stability (Noubissi et al., 2006; Weidensdorfer et al., 2009) and seven members of the T-chaperonin complex ([CCT], TCP1, CCT2, CCT3, CCT4, CCT6A, CCT7, and CCT8), which contributes to protein folding and the localization of proteins in nuclear subregions such as telomeres and Cajal bodies (Freund et al., 2014; Gestaut et al., 2019; Wrighton, 2015). We also identify proteins involved in the initiation of DNA replication (MCM2/4/6/7, RFC1/3, and RPA1), a process that is negatively affected by lamin mutants (Moir et al., 2000).
Since we observed an enrichment for a C-rich sequence (CCCWCCC) in the 3′ UTR in our APEX2-lamin-B1-RIP experiments above (Fig. 3 D), we anticipated that our APEX2-lamin-B1 proteome experiments would identify a protein or a complex of proteins that could bind polyC (poly[rC]) sequences. Indeed, our proteome contains three known poly(rC) binding proteins (PCBPs), PCBP1, PCBP2, and PCBP3, which had abundances greater than other known NL proteins such as lamin-A/C ("LMNA") and emerin ("EMD"; Table S2; Fig. 4 B, red circles). Immunostaining of the most abundant PCBP protein (PCBP2) from our mass spectrometry study in TKO mESC and in lamin-B1 and -B2 null mouse embryonic fibroblasts treated with a Lmna siRNA reveals a reduced localization of the protein in the nucleus relative to the cytoplasm (Fig. S4, F and G). The effect is most apparent in mouse embryonic fibroblasts (MEFs), which grow as a monolayer and can be reliably quantitated (Fig. S3 H). The proteome identified by APEX2-lamin-B1 here suggests a role for the NL in RNA regulation, and also identifies proteins at the NL that are involved in protein localization and DNA replication.
APEX2-lamin-B1–based labeling identifies both stable and variable LADs (vLADs) in G1, S, and G2 cells
The protein-rich NL interacts with specific DNA regions known as LADs (Guelen et al., 2008). To see if APEX2 could be used to map LADs by StrePDs of DNA–protein complexes (hereafter referred to as APEX2-ID), we performed the reaction in K562 cells, which is a readily transfectable cell line previously used for both DamID and TSA-seq mapping of LADs (Chen et al., 2018b). We found that APEX2-ID–mapped K562 LADs were similar to DamID and TSA-seq (both based on DNA labeling; Fig. 5 A), with Pearson coefficients of 0.84 and 0.81 (Fig. 5 B), respectively. The distribution of LADs, which were defined by Hidden Markov Modeling (HMM; see below) across chromosomes is similar between all methods, and is most similar between the related APEX2-ID and TSA-seq methods (Fig. S5 A). As anticipated, gene expression in APEX2-ID–defined LADs was much lower than those outside of LADs (Fig. S5 B). We conclude that APEX2-ID reliably maps LADs in cultured cells.
LAD mapping is typically done in asynchronous cell populations, and it is not clear exactly how similar LADs are in different stages of the cell cycle. To determine this, we performed APEX2-ID in HCT116 cells, a cell line often used in studies of mitosis. Fixed cells were FACS-sorted for G1, S, and G2/M populations and then subjected to the APEX2 procedure. Since the NL disassembles in M phase, LAD maps for the G2/M population correspond to the G2 phase. We normally observed ∼40–60% APEX2-lamin-B1 transfection efficiency and sorted for ∼1 million cells in each cell cycle stage. LAD maps of each cell cycle stage reveal no large changes in profile across G1, S, and G2 phases (Fig. 5 C). Next, we performed chromatin immunoprecipitation (ChIP) sequencing (ChIP-seq) for H3K9me3, which is abundant in heterochromatin/LADs. The results show that the pattern for H3K9me3 also remains largely the same throughout the cell cycle, and these data correlated well with our cell cycle LADs data (Fig. 5 C and Fig. S5 C). A cross-referencing of our LADs and H3K9me3 datasets with the publicly available H3K9me3, H3K27me3, H3K27ac, H3K4me1, and H3K4me3 ENCODE Project datasets for HCT116 cells indicates that our LADs/H3K9me3 datasets are reliable, with positive correlation with repressive chromatin marks (H3K9me3 and H3K27me3) and negative correlation with active chromatin marks (H3K4me1, H3K4me3, and H3K27ac; Fig. S5 C).
While LAD patterns are similar overall across G1, S, and G2, we noticed some variability (Fig. 5 C, red boxes). To examine this, we first defined LADs using a three-state HMM at each cell cycle stage. The three-state model distinguishes between (1) strong LAD signals, (2) an intermediate LAD signal that is characterized by a mixture of both enrichment and lack of enrichment or a weak signal, and (3) a signal that is clearly not LADs. The presence of intermediate-type signals suggests that APEX2-ID, like TSA-seq, does not produce binary genome mapping data (Chen et al., 2018b). The average number of LAD calls in S-phase is slightly reduced across virtually all chromosomes (Fig. S5, D and E). When we compare LADs in each cell cycle, we find that >80% of LADs are shared among all cell cycle stages. These stable LADs have a median size of ∼1.5 megabases and are strongly enriched for the heterochromatin marker H3K9me3 (Fig. 5, D and E). LADs that are not found in every cell cycle stage, which we call vLADs, tend to be smaller in size (median ∼500 kb, Fig. 5 D), and are instead strongly enriched for H3K27me3 (ENCODE data, Fig. 5 E). The average lamin-B1 log2(StrePD/input) signal was reduced in vLADs across the cell cycle, and this was most pronounced during S-phase (Fig. 5 F). We identified the cell cycle stage when a specific vLAD was present (e.g., G1-, S-, or G2-specific vLADs) by comparing the HMM calls against the previous cell cycle stage and then measured the lamin-B1 signal across the cell cycle. As expected, G1 vLADs had their highest lamin-B1 signal in G1, but this was followed by a reduction of signal in S phase and a subsequent gain of signal in G2 (Fig. S5 F). This patterning was similar for G2 vLADs where the highest signal was in G2, followed by G1 and then S (Fig. S5 F). S-phase vLADs, on the other hand, had modest lamin-B1 signal differences between cell cycle stages.
Together these results show that the APEX2-ID is capable of mapping LADs with temporal resolution and reveals properties of LADs during the cell cycle.
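The distinction between stable LADs and vLADs can be expressed as a simple rule on per-stage LAD calls: a genomic bin is stable if it is called a LAD in every stage and variable if it is called in some stages but not all. The Python sketch below illustrates that rule together with the log2(StrePD/input) signal. It is a simplified illustration, not the analysis code from the study: it takes binary calls as given rather than fitting the three-state HMM, the pseudocount is an added assumption to avoid division by zero, and all names and values are hypothetical.

```python
import math


def log2_ratio(pulldown, input_, pseudocount=1.0):
    """log2(StrePD/input) for one bin; the pseudocount is an assumption."""
    return math.log2((pulldown + pseudocount) / (input_ + pseudocount))


def classify_bins(calls_by_stage):
    """Label each bin 'stable', 'variable', or 'non-LAD' from per-stage calls."""
    stages = list(calls_by_stage)
    n_bins = len(calls_by_stage[stages[0]])
    if any(len(calls_by_stage[s]) != n_bins for s in stages):
        raise ValueError("all stages must cover the same bins")
    labels = []
    for i in range(n_bins):
        n_lad = sum(bool(calls_by_stage[s][i]) for s in stages)
        if n_lad == len(stages):
            labels.append("stable")      # LAD in every stage
        elif n_lad > 0:
            labels.append("variable")    # LAD in some stages only (vLAD)
        else:
            labels.append("non-LAD")
    return labels


# Toy binary LAD calls for four bins (hypothetical values).
toy_calls = {
    "G1": [1, 1, 0, 0],
    "S":  [1, 0, 0, 1],
    "G2": [1, 1, 0, 0],
}
labels = classify_bins(toy_calls)
print(labels, log2_ratio(3, 1))
```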
Discussion
The NL has been difficult to study due in part to its insolubility. Here, we use APEX2 as a multifunctional tool capable of identifying proteins, RNA, and chromatin at or near the NL to reveal insights into the function of this nuclear substructure. The APEX2 enzyme is very robust, and importantly, we show that its enzymatic activity persists after standard cellular fixation with PFA. This is particularly advantageous as fixation limits the diffusion of biomolecules away from the region of interest.
Using APEX2, we were able to isolate RNA species that interact either transiently or stably with the NL, or structures near the NL. The vast majority of these RNAs appeared to be spliced, as evidenced by the low intronic read counts (Fig. 2 B), and while speculative, this could represent a group of spliced RNAs that interact with the NL or NL-proximal structures for a longer time than other RNAs. Interestingly, these experiments also revealed a small fraction of mRNAs containing retained introns (Fig. 2 C). This suggests a potential role for the NL or structures near the NL in facilitating removal of retained introns for some RNAs, or alternatively the degradation of intron-retaining transcripts (Boutz et al., 2015; Lareau et al., 2007). This observation is consistent with previous work (Fazal et al., 2019), and it will be interesting to further explore whether the nuclear periphery plays a role in processing transcripts with retained introns.
Our APEX2-lamin-B1-RIP approach suggests a role for the NL in regulating RNAs with long 3′ UTRs. mRNAs with shorter 3′ UTRs have been reported to be more readily exported than those with longer 3′ UTRs in Drosophila cells (Chen and van Steensel, 2017), and it was proposed that RNA regulatory processes could explain why species with longer 3′ UTRs are transported more slowly. The length of the 3′ UTR affects RNA stability and localization by providing a platform for other regulatory elements, such as microRNAs, or through the use of alternative polyadenylation sites (Mayr, 2016). The length of the 3′ UTR, and in particular the use of alternative polyadenylation sites, is implicated in a number of pathologies including cancer, cell senescence, and cellular stress response (Chang et al., 2015; Chen et al., 2018a; Mayr and Bartel, 2009). Senescence of cultured primary cells, for example, is a phenomenon often connected to the reduction of lamin-B1 (Freund et al., 2014; Shimi et al., 2011). Whether the regulation of mRNAs with longer 3′ UTRs is truly linked to the NL, and whether it is connected to any of these observations, remains open for exploration.
We note that our dataset highlights methodological variation, which is to be expected when the downstream approaches differ. The RNAs identified in our study, which were isolated by pulldown of both RNA-protein complexes and presumed biotinylated RNAs, do not strongly overlap (∼1.5%) with those identified in a previous study that used APEX2-lamin-A/C proximity labeling and pulldown of biotinylated RNAs from a purified RNA input and identified noncoding RNAs as NL residents (Fazal et al., 2019). However, it might be possible that stable RNA residents at the NL are more readily labeled by the biotin-phenoxyl radical than trafficking RNAs. Nevertheless, APEX2 can be used to label both RNA directly and RNA-protein complexes and allows multiple downstream approaches that could be used to distinguish between stable and transient RNA residents.
APEX is commonly used to identify proteins (Chen and Perrimon, 2017; Rhee et al., 2013). Our APEX2-assisted identification of proteins interacting with the lamin-B1 containing NL or structures near the NL uncovered PCBPs as candidates that can potentially regulate NL or NL-proximal RNAs with longer 3′ UTRs. Poly(rC) proteins have been reported to affect the stability of RNAs, and they can also more broadly affect the transcriptional landscape in cancer cells (Behm-Ansmant et al., 2007; Choi et al., 2009; Perron et al., 2018). Deregulation of these proteins, and subsequently mRNAs containing 3′ UTR C-rich motifs at or near the NL, could explain some of the gene deregulation observed in TKO mESCs (Zheng et al., 2018). Indeed, our bioinformatic analysis identified a C-rich motif in many of our APEX2-identified NL or NL-proximal RNAs with longer 3′ UTRs and also in some of the genes deregulated in TKO mESCs (Fig. 3 D). Further, we observed that PCBP2 nuclear localization was disrupted in cells lacking all lamins (Fig. S4, F–H). The exact mechanism behind this is currently unknown; however, the APEX2 proteomic method raises the possible role of the NL in mRNA stability via PCBPs.
Finally, we show here that the APEX2 method is capable of mapping large genomic features such as LADs in asynchronous cell populations and in cells sorted from specific cell cycle stages. We find that most LADs are unchanged during the cell cycle (Fig. 5 C), which is consistent with a recent study using an antibody-targeted DamID approach (van Schaik et al., 2019). However, we find that some LADs do vary during the cell cycle, and we call these variable LADs (vLADs). Unlike stable or cell cycle–consistent LADs, which are enriched for H3K9me3 and possess higher lamin-B1 signal, vLADs are enriched for H3K27me3 and have modest lamin-B1 signal. Our previous modeling studies of histone lamin landscapes (HiLands) in mESCs identified two types of LADs, HiLands-B and -P (Zheng et al., 2015), that were largely concordant with facultative and constitutive LADs, respectively (Meuleman et al., 2013; Zheng et al., 2015). Interestingly, HiLands-B LADs have similar features to the vLADs identified here: they are short, exhibit weak lamin-B1 signal, and are enriched for H3K27me3. On the other hand, the HiLands-P LADs are similar to the cell cycle–stable LADs we describe here, being larger in size and H3K9me3-rich. It will be important to further investigate the similarities between the cell cycle–variable and –stable LADs we found here and the HiLands-B and -P LADs defined in mESCs. Interestingly, we observed that lamin deletion caused HiLands-B detachment from the NL and HiLands-P decompaction at the NL in cell cycle–asynchronous mESCs (Zheng et al., 2018). Based upon these considerations, it will be important to understand whether the differential changes of HiLands-B and -P LADs are related to cell cycle states in lamin-TKO mESCs (Zheng et al., 2018). 
The APEX2-ID method described here reveals that LAD regions enriched for H3K9me3 are stable structures throughout the major stages of the cell cycle and that LAD variability exists in regions enriched primarily for the heterochromatin marker H3K27me3.
The mapping techniques described in this study, enabled by APEX2, are ideal for applications that require mapping genomic, RNA, and/or protein interactions occurring at a defined cellular structure such as the NL. Further, the APEX2-linked system is an ideal platform for situations where a discriminating antibody is unavailable, such as when dealing with genetic variants and closely related protein isoforms. In the future, it will be of interest to carry out temporal studies of changes in NL protein composition, RNA, and LADs in response to mechanical perturbation or cell stress, as these have been previously linked to the NL (Dahl et al., 2006; Dou et al., 2016; Shimi and Goldman, 2014; Swift et al., 2013). It will be of additional interest to exploit the direct labeling of RNA and RNA–protein complexes and to see whether this can be expanded to DNA. In this study, we expand on a flexible and temporally capable method of characterizing the NL structure that we believe will be a valuable tool.
Materials and methods
Cell culture
HEK293FT (Thermo Fisher Scientific; R70007), HCT116 (American Type Culture Collection; CCL-247), and K562 (American Type Culture Collection; CCL-243) cell lines were cultured in DMEM, McCoy’s 5a, and RPMI media, respectively. The media was supplemented with 10% FBS for DMEM and McCoy’s 5a and 15% FBS for RPMI. mESCs were cultured in 2i media, and mouse embryonic fibroblasts were cultured in DMEM with 10% FBS. The cells were grown at 37°C in 5% CO2.
Plasmid construction, siRNA, and transfection
The APEX2-lamin-B1 fusion was generated by replacing the tubulin cDNA in Addgene plasmid 66171 (Lam et al., 2015) with the human lamin-B1 cDNA using the XhoI and BamHI cloning sites. A (GGGGS)×3 linker was placed between the APEX2 and lamin-B1 (Huston et al., 1988). The siRNA against lamin-A/C is commercially available (Thermo Fisher Scientific; 4390771). Transfections were performed using Lipofectamine 2000 (Thermo Fisher Scientific; 11668030) according to the manufacturer’s instructions. The APEX2-lamin-B1 construct was expressed in cells for 24–48 h. The plasmid is available at Addgene (#139442).
FACS sorting
Cells were harvested by trypsinization and neutralized in medium containing FBS. The cells were fixed for 10 min at room temperature by adding freshly prepared 4% PFA to a final concentration of 1%. The fixation was quenched with 125 mM glycine, and the cells were pelleted and washed with PBS. Cells were resuspended in phenol red-free HBSS + 2% FBS and stained with Hoechst 33342 for at least 20 min at room temperature before FACS sorting for G1, S, and G2/M phases of the cell cycle.
APEX2 reaction
The APEX2 reaction was done under either previously published unfixed (Hung et al., 2014) or PFA-fixed conditions, depending on the material to be isolated. For protein identification by mass spectrometry, cells expressing the APEX2-lamin-B1 construct were incubated with 6 ml of media with 250 µM biotin-phenol (Iris Biotech; LS-3500.0250) for 30 min at 37°C. An equal volume of 2 mM H2O2 was added, and the cells were incubated for 1 min at room temperature. The reaction was quenched by adding a solution containing Trolox (Sigma-Aldrich; 238813) and sodium ascorbate (Sigma-Aldrich; PHR1279) to final concentrations of 5 mM and 10 mM, respectively. The solution was aspirated, and the cells were immediately lysed on the dish with 1 ml of RIPA buffer (50 mM Tris, 150 mM NaCl, 0.1% [wt/vol] SDS, 0.5% [wt/vol] sodium deoxycholate and 1% [vol/vol] Triton X-100, pH 7.5) containing a protease inhibitor cocktail tablet (Sigma-Aldrich; cOmplete, EDTA-free; 11873580001). For isolation of RNA-bound biotinylated proteins under unfixed conditions, cells were treated as above except that lysis was done in the presence of 100 U/ml RNasin (Promega; N251B).
For isolation of APEX2-labeled RNA– and DNA–protein complexes under fixed conditions, cells were harvested by trypsinization and neutralized with medium containing 10% FBS. The cells were then pelleted and fixed for 10 min at room temperature by adding freshly prepared 4% PFA to a final concentration of 1%. The fixation was quenched with 125 mM glycine, and the cells were washed with PBS, pH 7.2. The cells were then permeabilized with PBS + 0.25% Triton X-100 for 5 min at room temperature. In the case of RNA, all solutions were supplemented with 100 U/ml RNasin. The cells were briefly pelleted and washed with PBS. The pellet was resuspended in 100 µl of DMEM containing 10% FBS (included because the reaction requires heme; Martell et al., 2012) and 100 µM biotin-phenol and incubated for 5 min at room temperature. An equal volume of 2 mM H2O2 (final concentration, 1 mM) was added to the solution, mixed, and incubated for 20 s. The reaction was then quenched with Trolox (5 mM) and sodium ascorbate (10 mM). A small aliquot (40 µl) of the APEX2 reaction was stained and examined by fluorescence microscopy (see below) to confirm the reaction. The remainder of the APEX2 reaction could be used immediately for isolation of material or snap frozen and stored at –80°C for later use.
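As a worked check of the dilutions quoted above, the short Python sketch below computes the volume of 4% PFA stock needed to bring a sample to 1% final and the final H2O2 concentration after adding an equal volume of 2 mM stock. The function names and the 900 µl example volume are illustrative assumptions and are not part of the protocol.

```python
def stock_volume_to_add(sample_volume, stock_conc, final_conc):
    """Volume of stock to add to a sample so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_volume + v) for v.
    Units only need to be consistent (e.g. µl and % or mM).
    """
    if final_conc >= stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


def final_concentration(stock_conc, stock_volume, sample_volume):
    """Concentration of the added reagent after mixing stock into a sample."""
    return stock_conc * stock_volume / (stock_volume + sample_volume)


# Hypothetical 900 µl cell suspension fixed with 4% PFA to 1% final.
pfa_volume = stock_volume_to_add(900.0, 4.0, 1.0)

# Equal volumes of 2 mM H2O2 and labeling mix (100 µl each, as in the fixed protocol).
h2o2_final_mM = final_concentration(2.0, 100.0, 100.0)

print(f"4% PFA to add: {pfa_volume:.0f} µl")
print(f"H2O2 final concentration: {h2o2_final_mM:.1f} mM")
```

Adding one third of the sample volume as 4% stock yields 1% final, and an equal-volume addition always halves the stock concentration, which matches the 1 mM final H2O2 stated in the protocol.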
Fluorescence microscopy
For initial testing of the APEX2 reaction, cells were grown on coverslips and transfected with the APEX2-lamin-B1 construct. The reaction was performed either with live cells or after 1% PFA fixation as described above. The APEX2 reaction was confirmed by incubating with mouse anti-FLAG M2 antibody (1:1,000; Sigma-Aldrich; F3165) overnight at 4°C, followed by incubation with streptavidin conjugated to Alexa 488 (1:200; Biolegend; 405235) and an anti-mouse secondary antibody (1:1,000) conjugated to an Alexa fluorophore. Other primary antibodies used for immunofluorescence include rabbit anti-HNRNPA1 (1:2,500; ProteinTech; 11176-1-AP), rabbit anti-SRSF1 (1:2,500; ProteinTech; 12929-2-AP), rabbit anti-SRSF7 (1:2,500; Novus; NBP1-92382), rat anti-Nup153 (Abcam; ab81463), and rabbit anti-PCBP2 (Invitrogen; PAS-30116). For cell suspensions, all washes were done by pelleting at 500 g for 5 min, and the cells were resuspended in ProLong Gold antifade reagent for mounting and imaging. Imaging was done at room temperature using the Leica Application Suite software v2.7.3.9723, an SP5 confocal microscope (Leica), a 40×/1.3 NA oil objective, and Type F immersion liquid (Leica; 11513859).
Isolation of protein, RNA, and DNA
The isolation of protein, RNA, and DNA was performed using the same precipitation pipeline with only minor differences that depended on the desired molecules. For all protocols, cells were initially lysed in RIPA buffer containing a protease inhibitor cocktail tablet (Sigma-Aldrich; cOmplete, EDTA-free) for 30 min with rotation at 4°C. In the specific case of RNA isolation, the RIPA buffer was also supplemented with RNasin (100 U/ml). In the case of DNA isolation, the resulting lysate was sonicated using a Diagenode Bioruptor Pico and cleared by centrifugation at 15,000 g for 30 s. At this point, one tenth of the lysate was set aside as the input fraction. Lysates, regardless of target, were incubated with 70 µl streptavidin magnetic beads (Pierce; 88817) overnight at 4°C using end-over-end rotation. The next day, the magnetic beads were harvested on a magnetic rack and washed in sequence with 2 × 1 ml RIPA, 1 × 1 ml high-salt buffer (1 M KCl, 50 mM Tris-Cl, pH 8.0, and 5 mM EDTA), 1 × 1 ml urea wash buffer (2 M urea and 10 mM Tris-Cl, pH 8.0), and then 1 × 1 ml RIPA (Hung et al., 2014). For RNA experiments, the streptavidin beads were pretreated with a solution of 0.1 M NaOH and 0.05 M NaCl to remove RNases and rinsed with a solution containing 0.1 M NaCl before use, and the urea buffer wash was omitted. The material obtained by this streptavidin bead–mediated pulldown (StrePD) was then processed for each specific target. For protein, the StrePD beads were resuspended in 1× SDS-PAGE sample buffer and subjected to Western blotting or mass spectrometry (see below). For RNA, RNase-free DNase I (Sigma-Aldrich; 716728001) was added to the StrePD beads and incubated at 37°C for 10 min, followed by an incubation with Proteinase K at 60°C for 30 min. The RNA was then extracted using Trizol reagent according to the manufacturer’s protocol. 
For DNA, RNase A (Qiagen; 19101) was added to the StrePD beads and incubated at 37°C for 10 min, followed by an incubation with Proteinase K (Takara; 9034) at 60°C overnight. Both StrePD DNA and DNA from the input lysate were extracted using Ampure XP beads (Agencourt; A63881). Quantitation of nucleic acids was done using the Qubit system (Thermo Fisher Scientific).
Nuclear and cytosol preparation for RNA-seq
Plasma membranes were disrupted by gently resuspending cell pellets in 10 mM Hepes, pH 7.5, 60 mM KCl, 1 mM EDTA, pH 8.0, 1 mM DTT, 1 mM PMSF, and 0.075% vol/vol IGEPAL CA-630 supplemented with 100 U/ml RNasin and rotating the mixture at 4°C for 5 min. The volume of lysis buffer used was approximately five times the cell pellet volume. The nuclei were pelleted at 200 g for 5 min. Half of the cytosolic supernatant was carefully removed from the top and centrifuged a second time under the same conditions, and the resulting supernatant was used as the cytosolic fraction. To obtain nuclei, the remainder of the cytosolic fraction was removed, and the pellet was washed twice with 10 mM Hepes, pH 7.5, 60 mM KCl, 1 mM EDTA, pH 8.0, 1 mM DTT, and 1 mM PMSF supplemented with 100 U/ml RNasin. The nuclei were then resuspended in the starting lysis volume with lysis buffer. One half of the nuclear fraction was used for RNA extraction using Trizol reagent, followed by quantification by Nanodrop (Thermo Fisher Scientific).
Western blotting and mass spectrometry
The APEX2 reaction was performed as described above, and the cells were lysed in RIPA buffer. Aliquots were taken for input, post-streptavidin flow-through, and StrePD. Proteins were separated on an SDS-PAGE gel and transferred to nitrocellulose for Western blotting with the indicated reagents/antibodies. Detection reagents used were streptavidin-HRP (1:1,000; GE Healthcare; RPN1231V), rabbit anti-lamin-B1 (1:10,000; Abcam; ab16048), mouse anti-lamin-A/C (1:5,000; Active Motif; 39287), rabbit anti-emerin (1:5,000; Santa Cruz; sc-15378), mouse anti-SC-35 (1:2,500; Sigma-Aldrich; S4045), rabbit anti-HNRNPA1 (1:2,500; ProteinTech; 11176-1-AP), and rabbit anti-SRSF1 (1:2,500; ProteinTech; 12929-2-AP). Imaging for streptavidin-HRP was done on a Licor Odyssey Fc machine. All other Western blots were imaged with a Licor Odyssey CLx machine.
Mass spectrometry
Proteins were precipitated with 23% TCA and washed with acetone. Protein pellets were solubilized in 8 M urea and 100 mM Tris, pH 8.5; reduced with 5 mM Tris(2-carboxyethyl)phosphine hydrochloride (Sigma-Aldrich); and alkylated with 55 mM 2-chloroacetamide (Fluka Analytical). Digested proteins were analyzed by four-step MudPIT using an Agilent 1200 G1311 quaternary pump and a Thermo LTQ Orbitrap Velos with an electrospray stage built in-house (Wolters et al., 2001).
Protein and peptide identification and protein quantitation were done with Integrated Proteomics Pipeline-IP2 (Integrated Proteomics Applications). Tandem mass spectra were extracted from raw files using RawConverter (He et al., 2015) with a monoisotopic peak option. Peptide matching was done against a reviewed Uniprot human protein database (released January 22, 2014; 20,275 entries) with common contaminants and with reversed sequences using ProLuCID (Peng et al., 2003; Xu et al., 2015) with a fixed modification of 57.02146 on cysteine and differential modification of 363.146012 on tyrosine. Peptide candidates were filtered using DTASelect with these parameters: -p 1 -y 1 --trypstat --pfp 0.01 -DM 10 --DB --dm -in -t 1 (Tabb et al., 2002). GO-term analysis was done using Panther (Mi et al., 2019).
Cell cycle H3K9me3 ChIP-seq
Asynchronously growing HCT116 cells were fixed with 1% PFA and FACS-sorted on a FACSAria III (BD Biosciences) to obtain ∼1 million cells for each of the cell cycle stages (G1, S, and G2/M). The cells were lysed with RIPA buffer (see above) supplemented with 500 µM PMSF for 30 min with rotation at 4°C. The lysate was sonicated using a Diagenode Bioruptor Pico and immunoprecipitated with anti-H3K9me3 (Abcam; ab8898) complexed to Protein A/G dynabeads. The precipitates were washed with a low-salt buffer (20 mM Tris-Cl, pH 8.0, 150 mM NaCl, 2 mM EDTA, 1% Triton X-100, and 0.1% SDS), high-salt buffer (20 mM Tris-Cl, pH 8.0, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100, and 0.1% SDS), and a LiCl buffer (10 mM Tris-Cl, pH 8.0, 250 mM LiCl, 1 mM EDTA, 1% deoxycholic acid, and 1% IGEPAL-CA-630) supplied from the Millipore Chromatin Immunoprecipitation kit (17-295). Precipitates were resuspended in RIPA buffer and digested with Proteinase K overnight at 60°C. DNA for both input and ChIP was recovered using Ampure XP beads.
RNA-seq and DNA sequencing
RNA library building was done using the Illumina TruSeq RNA library kit v2 (Illumina; RS-122-2201) with Ribo-depletion. DNA libraries were prepared using the Rubicon Genomics ThruPlex kit (Rubicon Genomics; R400428). Sequencing was performed on the Illumina NextSeq 500 platform (Illumina). Raw and processed data were deposited at NCBI GEO under accession no. GSE159482.
Data analysis
RNA-seq data were aligned using Bowtie 2.3.2 and TopHat2 with default settings using the hg19 assembly. Counting into features was done using featureCounts from the Subread package v1.5.2 (Liao et al., 2014) using the -s 0 parameter. Exonic and intronic reads were counted using RefFlat coding exons and RefFlat introns. We removed any intron containing snoRNAs, miRNAs, and/or lincRNAs. WT and lamin-null RNA-seq data were obtained from NCBI GEO under accession no. GSE89520 (Zheng et al., 2018). Differential enrichment was calculated using the glmFit method in the edgeR v3.24.3 R package (Robinson et al., 2010). We used a threshold of FDR ≤0.05 to determine fraction enrichment. We obtained mRNA features from the University of California, Santa Cruz (UCSC) Genome Browser. For analyses of gene distance, we used the start position relative to the nearest LAD HMM call (see below) to determine the position and excluded genes at the ends of chromosomes. Cumulative probability was plotted using the ecdf function in base R. Plotting was done in RStudio v0.98.953 (R v3.5.1) using base functions and pheatmap v1.0.12 or Microsoft Excel 2016. Motif analysis was done using the MEME Suite v5.1.0 with -mod zoops (for RIP), -mod anr (for mESC data), -minw 5, -maxw 10, and -markov_order 0 settings (Bailey and Elkan, 1994). Browser tracks for RNA-seq data were displayed with the Integrative Genomics Viewer v2.3.94.
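The cumulative probability curves were generated with the ecdf function in R. For readers working outside R, the same empirical cumulative distribution can be sketched in Python as shown here; the function name make_ecdf and the example 3′ UTR lengths are hypothetical and serve only to illustrate the calculation.

```python
from bisect import bisect_right


def make_ecdf(values):
    """Return a step function F(x) = fraction of values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one value is required")

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return ecdf


# Hypothetical 3' UTR lengths (nt) for a small gene set.
utr_lengths = [350, 1200, 1200, 4800]
ecdf = make_ecdf(utr_lengths)

for length in (0, 1200, 4800):
    print(length, ecdf(length))
```

Evaluating the returned function on a grid of lengths for two gene sets gives the two curves that are compared in a cumulative probability plot.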
LAD mapping was done by first aligning input and StrePD reads to the hg19 assembly using Bowtie 2.3.2 and then counting reads in 100-kb genomic windows using the coverage function in Bedtools v2.26.0. Enrichment was calculated by first normalizing the StrePD and input data by read count to 1 million and then transforming the StrePD/input ratio by log2. LAD (DamID and TSAseq) data from K562 were obtained from GEO under accession no. GSE66019 (Chen et al., 2018b). A three-state HMM was used to call LADs for each replicate (e.g., G1, S, or G2 replicates and K562 replicates). The HMM LAD calls were intersected between replicates to obtain consensus LADs. The consensus LADs were then used for downstream comparisons. To identify cell cycle–variable LADs, we compared the consensus HMM call from a particular stage against the previous cell cycle stage. For example, S vLADs were identified by comparing the S phase HMM calls against G1 HMM calls. K562 RNA-seq data were obtained from GEO under accession no. GSM958731, and epigenome data for HCT116 cells were obtained from the ENCODE Project (HCT116 reference epigenome series ENCSR361KMF). For H3K9me3 ChIP data, input and ChIP reads were aligned using Bowtie 2.3.2 and the hg19 assembly. Peak calling was performed using MACS2 v2.1.1.20160309 (Zhang et al., 2008) with default settings. Quantitation of histone signal in LADs was done using the bigWigAverageOverBed function in kentUtils v3.62. Correlative analysis, HMM, statistics, and graphical plotting were done either in RStudio using the pheatmap v1.0.12, corrplot v0.84, Hmisc v4.2-0, and depmixS4 v1.4-0 packages or Microsoft Excel 2016. Browser tracks were displayed using the University of California Santa Cruz Genome Browser with a smoothing window of two to three.
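The enrichment, consensus, and vLAD steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the code used in this study: the three-state HMM step is omitted and per-window LAD calls are taken as given booleans, the pseudocount is an added assumption to avoid division by zero in empty windows, and all function names and example values are hypothetical.

```python
import math


def log2_enrichment(strepd_counts, input_counts, pseudocount=0.5):
    """Per-window log2(StrePD/input) after scaling each library to 1 million reads.

    pseudocount is an assumption of this sketch; set it to 0 only if no
    window has a zero count.
    """
    strepd_total = sum(strepd_counts)
    input_total = sum(input_counts)
    ratios = []
    for s, i in zip(strepd_counts, input_counts):
        s_norm = (s + pseudocount) * 1e6 / strepd_total
        i_norm = (i + pseudocount) * 1e6 / input_total
        ratios.append(math.log2(s_norm / i_norm))
    return ratios


def consensus_calls(*replicate_calls):
    """A window is a consensus LAD only if every replicate calls it a LAD."""
    return [all(window) for window in zip(*replicate_calls)]


def variable_windows(previous_stage, current_stage):
    """Window indices whose consensus call changed relative to the previous stage."""
    gained = [k for k, (p, c) in enumerate(zip(previous_stage, current_stage)) if c and not p]
    lost = [k for k, (p, c) in enumerate(zip(previous_stage, current_stage)) if p and not c]
    return gained, lost


# Hypothetical read counts in four 100-kb windows.
enrichment = log2_enrichment([40, 10, 10, 40], [20, 20, 40, 20], pseudocount=0)

# Hypothetical per-replicate LAD calls for G1 and S phase.
g1 = consensus_calls([True, True, False, True], [True, False, False, True])
s_phase = consensus_calls([True, True, False, False], [True, True, False, False])
gained, lost = variable_windows(g1, s_phase)

print(enrichment)
print("gained in S:", gained, "lost in S:", lost)
```

Comparing each stage only against the stage that precedes it, as in variable_windows, mirrors the description of S vLADs being identified from S phase calls relative to G1 calls.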
Protein tyrosine content was determined with the Biostrings v2.50.2 package. Diffusion of APEX2 biotin labeling (streptavidin signal) from the FLAG signal was measured by line profiling with Fiji (ImageJ v2.00-rc-69/1.53c). The data were fit with the y = y0 + A * exp(R0 * x) equation (Chen et al., 2018b) using the R package minpack.lm v1.2-1 to perform the iterative Levenberg–Marquardt algorithm.
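To illustrate the curve fit, the sketch below fits y = y0 + A * exp(R0 * x) to a synthetic line profile with a small hand-written Levenberg–Marquardt loop in pure Python. It is a simplified stand-in for the minpack.lm routine used in this study, not a reimplementation of it; the function names, the damping schedule, the starting values, and the synthetic data are all assumptions made for illustration.

```python
import math


def sse(xs, ys, p):
    """Sum of squared residuals for y = y0 + A * exp(R0 * x)."""
    return sum((y - (p[0] + p[1] * math.exp(p[2] * x))) ** 2 for x, y in zip(xs, ys))


def solve3(m, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    aug = [row[:] + [b[k]] for k, row in enumerate(m)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0, 0.0, 0.0]
    for k in (2, 1, 0):
        sol[k] = (aug[k][3] - sum(aug[k][j] * sol[j] for j in range(k + 1, 3))) / aug[k][k]
    return sol


def fit_decay(xs, ys, p0, max_iter=200, lam=1e-3):
    """Levenberg-Marquardt fit returning [y0, A, R0]."""
    p = list(p0)
    cost = sse(xs, ys, p)
    for _ in range(max_iter):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for x, y in zip(xs, ys):
            e = math.exp(p[2] * x)
            jac = (1.0, e, p[1] * x * e)       # d/dy0, d/dA, d/dR0
            r = y - (p[0] + p[1] * e)
            for i in range(3):
                jtr[i] += jac[i] * r
                for j in range(3):
                    jtj[i][j] += jac[i] * jac[j]
        damped = [[jtj[i][j] + (lam * jtj[i][i] if i == j else 0.0) for j in range(3)]
                  for i in range(3)]
        step = solve3(damped, jtr)
        trial = [a + d for a, d in zip(p, step)]
        try:
            trial_cost = sse(xs, ys, trial)
        except OverflowError:
            trial_cost = float("inf")          # treat overflow as a rejected step
        if trial_cost < cost:
            p, cost, lam = trial, trial_cost, lam / 10
            if max(abs(d) for d in step) < 1e-12:
                break
        else:
            lam *= 10
    return p


# Synthetic, noise-free profile: y0 = 1, A = 5, R0 = -0.5 (hypothetical values).
xs = [0.5 * k for k in range(21)]
ys = [1.0 + 5.0 * math.exp(-0.5 * x) for x in xs]
fitted = fit_decay(xs, ys, p0=(min(ys), max(ys) - min(ys), -1.0))
print([round(v, 4) for v in fitted])
```

A negative fitted R0 corresponds to signal that decays with distance from the origin of the line profile; with real, noisy profiles the starting values would need to be chosen from the data in a similar way.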
Online supplemental material
Fig. S1 (related to Fig. 1) reveals that the APEX2 reaction in APEX2-lamin-B1–expressing cells is quick and durable. Fig. S2 (related to Fig. 2) displays APEX RIP biological replicates and enrichment analysis. Fig. S3 (related to Fig. 3) shows additional analyses of RIP RNA features. Fig. S4 (related to Fig. 4) provides additional analyses of our APEX2-lamin-B1–identified proteome. Fig. S5 (related to Fig. 5) displays additional analyses of APEX2-ID LADs. Table S1 is related to the RNA-seq experiments conducted in this study. Table S2 is related to the proteomic experiments performed in this study.
Acknowledgments
We would like to thank members of the Zheng and Goldman laboratories, and Matthew Sieber for advice and discussions. We also thank Allison Pinder, Frederick Tan, and Xiaobin Zheng for their help with sequencing and data analysis.
This study was funded by the National Institutes of Health, National Institute of General Medical Sciences (GM106023 to R.D. Goldman and Y. Zheng, GM110151 to Y. Zheng, and 8 P41 GM103533 to J.R. Yates III).
The authors declare no competing financial interests.
Author contributions: J.R. Tran designed, performed, and interpreted experiments. D.I. Paulson performed analysis of LAD data. J.J. Moresco performed mass spectrometry and assisted with data interpretation. S.A. Adam, J.R. Yates III, R.D. Goldman, and Y. Zheng participated in the design of the study. J.R. Yates III, R.D. Goldman, and Y. Zheng were responsible for project funding. J.R. Tran and Y. Zheng cowrote the manuscript. All authors participated in revisions of the manuscript.