Glioblastoma is an incurable brain cancer characterized by high genetic and pathological heterogeneity. Here, we mapped active chromatin landscapes with gene expression, whole exomes, copy number profiles, and DNA methylomes across 44 patient-derived glioblastoma stem cells (GSCs), 50 primary tumors, and 10 neural stem cells (NSCs) to identify essential super-enhancer (SE)–associated genes and the core transcription factors that establish SEs and maintain GSC identity. GSCs segregate into two groups dominated by distinct enhancer profiles and unique developmental core transcription factor regulatory programs. Group-specific transcription factors enforce GSC identity; they exhibit higher activity in glioblastomas versus NSCs, are associated with poor clinical outcomes, and are required for glioblastoma growth in vivo. Although transcription factors are commonly considered undruggable, group-specific enhancer regulation of the MAPK/ERK pathway predicts sensitivity to MEK inhibition. These data demonstrate that transcriptional identity can be leveraged to identify novel dependencies and therapeutic approaches.
Glioblastoma (GBM; World Health Organization grade IV glioma) is the most prevalent and lethal primary, intrinsic brain tumor (Ostrom et al., 2016). Standard therapy of surgery, radiation, and alkylating chemotherapy offer only palliation, and ultimately almost all patients succumb to this disease (Stupp et al., 2009). Other than conventional therapies, the only US Food and Drug Administration–approved GBM therapies include tumor-treating fields (Stupp et al., 2017) and bevacizumab (Gilbert et al., 2014). Although GBM has been extensively characterized at the molecular level, current clinical management has limited guidance from molecular pathology, beyond mutations in isocitrate dehydrogenase 1/2 (IDH1/2) and MGMT promoter methylation (Hegi et al., 2005; Cancer Genome Atlas Research Network, 2008; Parsons et al., 2008; Yan et al., 2009; Brennan et al., 2013; Frattini et al., 2013). Additionally, gliomas have high intratumoral phenotypic heterogeneity, leading to the hypothesis that self-renewing GBM stem cells (GSCs) persist after therapy and are able to regenerate refractory tumors (Singh et al., 2004; Bao et al., 2006; Chen et al., 2012a; Patel et al., 2014; Suvà et al., 2014). GBMs possess remarkable cellular plasticity and are highly responsive to changes in nutrient state, oxygen levels, and the local tumor microenvironment (Li et al., 2009b; Eyler et al., 2011; Flavahan et al., 2013; Zhou et al., 2015; Quail et al., 2016; Miller et al., 2017; Quail and Joyce, 2017). These data support highly dynamic and robust tumor ecosystems that resist standard and genetically defined targeted therapies.
Despite these challenges, transcriptional profiling of GBM lacking IDH1 or IDH2 or classic driver mutations has identified molecular subgroups that predominate among patient tumors (Phillips et al., 2006; Verhaak et al., 2010; Wang et al., 2018a). Several transcriptional profiling paradigms have been proposed, but predominantly there have been at least three biologically distinct profiles (Phillips et al., 2006; Verhaak et al., 2010). The first is characterized by a proneural gene expression program, likely defined by the OLIG1 and OLIG2 transcription factors (TFs), which are master regulators of oligodendrocyte identity (Lu et al., 2002; Yu et al., 2013). The second is marked by mesenchymal gene expression signatures and, like many mesenchymal tumors, has been associated with increased resistance to radiotherapy (Carro et al., 2010; Kim et al., 2016). The third, referred to as classical or proliferative, is typically driven by recurrent genomic alterations found in EGFR, PTEN, and CDKN2A. Although bulk tumors tend to fall into one category or the other, single-cell RNA sequencing (RNA-seq) suggests that cells of distinct states exist within the same tumor (Patel et al., 2014). Emerging evidence suggests that tumors can transition between these states, designated as proneural-to-mesenchymal transition, suggesting that this plasticity of cellular identity is a hallmark of the disease (Bhat et al., 2013; Mao et al., 2013; Segerman et al., 2016).
At its core, cellular identity is defined and governed by a small number of master TFs that are tissue specific and establish the cell’s chromatin and epigenetic landscape (Takahashi and Yamanaka, 2006). Overexpression of master TFs reprogram or transdifferentiate cells (Weintraub et al., 1989; Graf and Enver, 2009). As a rule, master TFs are regulated by large adjacent enhancer elements (often termed super-enhancers [SEs]), and master TFs bind to their own enhancers and those of other master TFs, thereby forming an autoregulatory TF core regulatory circuitry. Cancers often maintain portions of the core regulatory circuitry from their cells of origin, and at the same time, oncogenic deregulation of SEs and core regulatory circuitry are common themes across cancers (Mansour et al., 2014; Lin et al., 2016; van Groningen et al., 2017; Wang et al., 2018b; Zeid et al., 2018). Recently, we and others demonstrated in medulloblastoma, ependymoma, and neuroblastoma that molecularly defined tumor subgroups each possess distinct core regulatory circuitries and that defining TFs were often faithful tracers of cell-of-origin and critically shown to be nonmutated tumor dependencies (Lin et al., 2016; Garancher et al., 2018; Wang et al., 2018b; Zeid et al., 2018).
GBMs exhibit distinct chromatin and epigenetic profiles distinguished from other brain tumors
To define the TF core regulatory circuitries of GBM, we integrated active enhancer landscapes defined by histone H3 lysine acetylated chromatin (H3K27ac) with gene expression, DNA methylomes, copy number variations (CNVs), and whole exomes across a large cohort of GSCs (n = 44), primary GBM surgical specimens (n = 50), and neural stem cells (NSCs; n = 10; Fig. 1 and Fig. S1). 101 samples were profiled by at least one genomic measurement, and within this dataset, 79 samples were matched for H3K27ac, gene expression, and methylation. In sum, this integrated genomic and epigenomics dataset represents the largest integrated cohort of chromatin and epigenetic profiling compiled for GBMs. All datasets were subjected to rigorous quality control based on Encyclopedia of DNA elements (ENCODE) best practices, and only datasets passing stringent quality control metrics are presented in this study (Fig. S1).
Overall, we find strong concordance between gene expression, DNA methylation, and active enhancers. For instance, at the SOX2 locus we see consistent hypomethylation of promoter DNA, the presence of active euchromatin modifications (H3K27ac) at both the promoter and proximal enhancer, and evidence of abundant mRNA (Fig. 1, a and b). Globally, there is a positive correlation between H3K27ac at gene promoters and proximal enhancer with mRNA levels (Fig. S1 a) and anticorrelation between DNA methylation and mRNA levels (Fig. S1 a). These data suggest that our combined profiling efforts faithfully capture the epigenetic, chromatin, and transcriptional state of GBM.
Previously, we and others have shown that reference chromatin and epigenetic landscapes of brain tumors can be used to accurately to characterize tumor identity, classify tumor subgroups, and faithfully recapitulate molecular features of disease (Capper et al., 2018). Using t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction, we mapped chromatin and epigenetic profiles of our samples onto a comparative landscape of additional tumor types and tissues. By active enhancer H3K27ac profiles, GSCs and GBM surgical samples form clusters distinct from other brain tumors (e.g., medulloblastoma and ependymoma), as well as normal brain and other tissues drawn from the ENCODE and Roadmap Epigenomics Consortium datasets (Fig. 1 c). Integrating DNA methylomes into the German Cancer Research Center brain tumor DNA methylation atlas (https://www.molecularneuropathology.org/mnp; Capper et al., 2018), GSCs and GBM surgical samples cluster closely together with reference non-IDH1–mutated GBMs and were distinct from other brain tumors, as well as genetically defined GBM subtypes (e.g., IDH1/2 mutant; Fig. 1 d). Supporting this finding, the most frequent somatic alterations detected in GSC models included TP53 mutations, PTEN mutations, EGFR amplifications, and CDKN2A/p16 deletions, further reinforcing that our models recapitulate genetic features of primary tumors while capturing molecular features of tumor heterogeneity (Fig. S1 b). Within GBMs, classically defined mutational spectra (e.g., EGFR mutations) fail to cluster either by active enhancer or DNA methylation landscapes, suggesting an additional contribution of chromatin and epigenetic state to GBM identity (Fig. 1).
Recurrent SE genes define GSC identity and uncover novel dependencies
To delineate genes important for defining and maintaining the GSC state, we mapped SE loci across a large panel of patient-derived GSC cultures. Our analysis revealed 8,533 distinct SE loci detected in at least one GSC sample (Fig. 2 a; Materials and methods). Across all samples, we identified a subset of SE loci and genes shared by a high proportion of GSCs (>75% of the samples) that represent potential key mediators of GSC identity (Fig. 2, a and b). These recurrent SEs were ordered by the product of SE frequency and amplitude (H3K27ac levels), termed FASE score. Many of these genes serve functional roles in GSC maintenance, including CDK6, SOX2, EGFR, BRD4, POU3F2, and SALL3 (Suvà et al., 2014; Lathia et al., 2015; Liu et al., 2015). Novel core GSC identity transcripts were also identified, such as MIR4316 and CXXC5 (Fig. 2 a and Fig. S2). SE-associated genes exhibited elevated H3K27ac in GSCs, primary GBM tumor tissues, and in some cases, NSCs (Fig. 2 b). Globally, proximity of an SE to a gene correlated with increased expression in GSCs and GBM surgical samples (Fig. S2 a). To delineate the impact of core SE transcription on clinical outcome, we derived a gene signature composed of the top 250 SE-regulated genes identified in GSCs, which predicted patient survival in an independent set of transcriptional profiles defined by The Cancer Genome Atlas (TCGA; Verhaak et al., 2010; Brennan et al., 2013; Wang et al., 2018a). Patients with GBM that harbors enrichment of the core GSC SE signature have inferior survival compared with those with reduced enrichment (Fig. 2 c). Patients with high expression of this core GSC SE signature have higher-grade tumor histology and are less likely to have tumors with IDH1 mutations and a glioma CpG-island methylator phenotype (Noushmehr et al., 2010; Fig. 2 d). Our data support a paradigm in which many genes that define GSC identity are highly expressed, delineated, and regulated by the presence of SEs. To determine if core GSC identity genes were required for tumor cell proliferation, we targeted several top SE genes—SOX2, SEPT9, CXXC5, CDK6, SALL3, and EGFR—using shRNAs in a panel of patient-derived GSC models (Fig. 2 e). Core GSC SE-associated identity genes were essential for tumor cell proliferation over a 7-d time course, suggesting that GSCs depend on expression of core SE genes to facilitate cell growth. Collectively, our data uncovered a set of GSC identity genes that are regulated by SEs and are required for GSC maintenance. Tumors that harbor enrichment of this core GSC SE signature exhibit features of increased tumor malignancy and correlate with a worse prognosis. Core GSC SE genes, therefore, represent a lead set of non-mutationally classified dependency genes as promising targets for therapeutic testing.
GSCs are composed of at least two distinct SE states
Given that core GSC SE genes define a shared axis of GSC identity genes (Singh et al., 2017), we next sought to investigate SE landscape heterogeneity in a large series of GSC models. Using unsupervised hierarchical clustering of H3K27ac activity, we found that two highly distinct SE states predominated among GSCs (Fig. 3 a; Materials and methods). These subgroups, annotated as group 1 and group 2, comprised the majority (24/30) of GSCs profiled and exhibited high similarity within groups and low similarity between groups. Differential analysis of SEs between group 1 and group 2 revealed 597 group 1 and 651 group 2 SEs that drive group-specific differences (Fig. 3 b; Materials and methods). These regions show opposing patterns of H3K27ac at exemplary loci and globally (Fig. 3 c). Differential H3K27ac patterns often appear binary—most regions have greater than fivefold change in H3K27ac (Fig. 3 d)—indicative of de novo chromatin acetylation in each subgroup as opposed to a more quantitative increase/decrease in acetylation. Almost all genes proximal to and associated with GSC group-specific SEs are differentially expressed between groups consistent with enhancer acetylation correlating to target gene expression (Fig. 3 d). Using single-sample gene set enrichment analysis, TCGA-defined molecular signatures from bulk primary GBM samples profiles were used to classify group 1 and group 2 samples. While group 2 was highly associated with “mesenchymal” features, group 1 samples contained both “proneural” and “classical” or “proliferative” features (Phillips et al., 2006; Verhaak et al., 2010; Mao et al., 2013; Wang et al., 2018a), suggesting potentially convergent pathways at the level of active chromatin regulation. We hypothesized that group-specific SE-associated genes are also likely to be enriched for dependencies but in a group-specific manner. Genetic knockdown of group-specific genes, such as BCAN and HMGA2, in GSCs expressing concordant SEs attenuated tumor growth both in vitro and in vivo (Fig. 3, e and f), reinforcing the relationship between SE presence and dependency. Overall, these data suggest that GBM and GSCs are composed of at least two distinct SE states and that maintenance of SEs that regulate group-specific identity is required for tumor growth/proliferation.
Group-specific TFs define group identity and are required for glioma growth
Cellular identity in both tumors and normal tissues is defined by a small number of TFs that establish cell-type SEs, are often regulated by large SEs, and regulate one another in what has been termed a transcriptional core regulatory circuitry (Boyer et al., 2005). Reverse analysis of active enhancer landscapes maps putative TF core regulatory circuitry (Lin et al., 2016; Mack et al., 2018). Here, we strengthened TF enhancer–target gene regulatory predictions by implementing a ridge regression model, which incorporates gene expression to identify TFs where gene expression of the TF and target correlate or anticorrelate significantly (Fig. 4 a). Using this approach, we identified cohorts of SE-associated TFs that form highly interconnected circuitries and are enriched at the SEs of each respective group (Fig. 4 b; Materials and methods). Group-specific TFs were also shown to have group-specific SE association, expression, and exhibited highly restricted expression patterns across human tissues (Fig. 4 c). Functional analysis of high-confidence target gene sets identified thematic pathways consistent with features of oligodendrocyte and neuronal differentiation for group 1 and JAK/STAT signaling for group 2 (Fig. S4 f). shRNA knockdown of RUNX2, a group 2–specific TF with no prior association to GBM, was lethal in vitro and in vivo (Fig. 4, d and e) but not in group 1 GSCs. OLIG2, a group 1–specific TF that has been implicated in oligodendrocyte identity, showed a reciprocal pattern of dependency with its knockdown highly sensitive in group 1 but not group 2 (Fig. 4 d). These data suggest that GSC subgroups are defined by and driven by highly tumor-specific TFs identified within core regulatory programs and that group-specific TFs are themselves highly specific dependencies.
Group-specific TF core regulatory circuitries are recapitulated in primary tumors
Primary GBMs exhibit high intratumoral heterogeneity and are highly dynamic to signaling queues from the tumor microenvironment (Hu et al., 2016; Venkatesh et al., 2017; Griveau et al., 2018; Wang et al., 2018b). As TF core regulatory circuitries were defined by using in vitro GSCs, we next sought to determine whether primary GBMs contained similar group-specific transcriptional regulation both across surgical tumor specimens and in the individual cells comprising tumors. Using group-specific TFs and their high-confidence target genes, we found that by RNA-seq and H3K27ac profiling GBM biopsy specimens strongly associated with one group or the other with a high overall degree of similarity (Fig. 5, a–c; Materials and methods). These data are consistent with prior multi-omic profiling efforts on primary tumors that originally elucidated GBM subgroups (Phillips et al., 2006; Verhaak et al., 2010). Moreover, they suggest that tumor identity as defined by GSC TF core regulatory circuitry is stable across GBM independent of cell culture conditions.
Within tumors, single-cell RNA-seq profiling has revealed varying degrees of heterogeneity with some tumor cell populations more homogenously aligning with a specific GBM subgroup and others containing more mixed populations and traits of multiple subgroups (Patel et al., 2014). To investigate the TF core regulatory circuitry of single cells within GBM tumors, we analyzed these single-cell expression data from five GBM tumors with 50–100 cells each. The expression profile of each cell was correlated against signatures composed of group-specific TFs and their high-confidence target genes (Fig. 5 d; Materials and methods). Considering significant correlations to TFs, two of the tumors contained mostly cells assigned to group 1 (MGH26 and MGH30). Cells for tumors MGH28 and MGH29 showed a strong correlation to group 2 and more moderate correlation to group 1. For MGH31, only a small number of cells significantly correlated with a specific subgroup, and no group-specific bias was observed. Our findings demonstrate the high sample–sample variance in group 1 and group 2 identities and potential utilization of other distinct GSC states found to be less abundant across GBM (Fig. 3 a). Given our delineation of group 1– and group 2–specific genetic dependencies, there may be utility in estimating GSC group/state-specific compositions within GBM samples as a basis for molecular targeting.
Group-specific aberrant activation of developmental TFs in GBM
The presence of at least two GSC subgroups suggests either multiple cells of origin for GBMs or multiple transforming events that fundamentally alter cell state and transcriptional identity. To further investigate this question, we interrogated a cohort of NSCs, commonly considered to be useful comparators for GSCs and upstream or primordial to the GBM developmental cell of origin. By H3K27ac enhancer landscapes and TF activity, NSCs subdivided and clustered with either group 1 or group 2 GSCs (Fig. 6, a–d). Further analysis revealed that although most group-specific TFs are consistent between group-specific GSCs and their associated NSCs, a small number of TFs exhibited aberrant enhancer activity and expression in GSCs relative to NSCs (Fig. 6, e and f). For group 1, GSC-specific TFs included OLIG1/2, NR2E1, NKX2-2, EN1, and MEOX2. OLIG1 and OLIG2 are master regulators of oligodendrocyte identity and GSC drivers (Lu et al., 2002; Yu et al., 2013; Tachon et al., 2019; Fig. 6 e). NR2E1 (TLX) is an orphan nuclear hormone receptor previously implicated in both GSC and NSC renewal (Zhu et al., 2014). NKX2-2, EN1, and MEOX2 are developmental TFs with neural-specific expression whose roles in glioma are heretofore unknown. For group 2, HOXA and HOXB developmental TFs were up-regulated compared with NSCs. Additionally, a cohort of TFs with known roles in mesenchymal identity and inflammatory signal transduction were up-regulated, including RUNX2, STAT3/5, TFAP2A, and IRX3 (Fig. 6 e). As data above support OLIG1/2 and RUNX2 as GSC-specific dependencies, we hypothesized that GSC-specific TFs define a highly oncogenic gene expression program. For each group, we defined high-confidence target gene sets for GSC-specific and GSC/NSC shared TFs. Thus, when compared with NSCs, GSCs activate and integrate highly lethal oncogenic TFs into their core regulatory circuitry.
Core cell identity genes uncover therapeutic targets in GBM
Core GSC SE genes were integrated with the Washington University Drug Gene Interaction Database (Cotto et al., 2018) and annotated 718 SE associated genes as targets with drug interactions with 309 genes nonannotated (Figs. 2 a and 7 a). Many of these candidates were divided into targets defined as “clinically actionable” or classified as “druggable” proteins, kinase targets, or histone-associated proteins (Fig. 7 a). Four GSC models driven by predicted clinically actionable SE genes EGFR and CDK6 were treated with a combination of EGFR and CDK4/6 inhibitors, lapatinib, and palbociclib, revealing cellular growth inhibition by treatments (Fig. 7, b and c). GBM models were further inhibited by combinatory treatments against EGFR and CDK6 in mice bearing orthotopic xenograft tumors with relevant GSC SEs (Fig. 7 d), validating this approach as potentially informative of preclinical efficacy for precision-based therapy selection. Given our findings that GBM identity (and group-specific specification) are regulated by SEs and distinct TF programs, and that these represent potential cellular dependency genes, we asked whether this list of potential oncogenic drivers could be explored for therapeutic targeting by small-molecule inhibitors. High-confidence group-specific target genes and pathways were mined with publicly available gene–drug interaction databases (Cotto et al., 2018) to pinpoint 16 small-molecule inhibitors of interest targeting group 1 or group 2 SE genes. A high-throughput screen was performed on 12 GSC models (n = 7 for group 1 and n = 5 for group 2), testing 16 drugs, at 10 concentrations, in eight technical replicates, and in two biological replicates (Fig. 7 e). This compound screen identified MAP3K1 (MEKK1) as a target with almost exclusive group 1 enhancer regulation (Fig. 7 f). MAP3K1 is upstream of canonical MAPK/ERK and JNK activity, and we hypothesized that group 1 GSCs may harbor selective dependency to MAPK/ERK inhibition. Using AZD8330, a potent and selective inhibitor to the MAPK/ERK MEK family of kinases, we observed selective effects on GSC group 1 viability at low micromolar doses with little effect in group 2 GSCs (Fig. 7, e and g). Our findings, therefore, demonstrate that SE programs that regulate GSC identity (both core and group specific) can be effectively mapped and used for therapeutic selection as a basis for discovery of non-mutationally defined oncogenic dependencies.
Numerous studies have sought to define groups of GBMs that share transcriptional profiles. Classification schemes based on these transcriptional signatures have ranged in the number of groups widely, but none has reliably informed clinical management beyond the identification of the IDH1 or IDH2 mutant tumors, which are tumors with a distinct molecular basis. Transcriptional profiles appear dynamic, displaying both spatial and temporal transitions. We recently showed that defined radiographical features were associated with selected chromatin regulation and transcriptional profiles: contrast enhancement with an EZH2-associated vascular (proneural) transcriptional signature and necrosis with a BMI1-associated hypoxic (mesenchymal signature; Jin et al., 2017). GBMs display spatial diversity of genetic events, which have not, to date, been associated with specific regional variation, but genetic analyses have suggested that tumors may accumulate mutations selectively based on heterochromatin and euchromatin states, supporting crosstalk between epigenetic and genetic tumor regulation (Chen et al., 1998; Zheng et al., 2014; Makova and Hardison, 2015).
Further, GBMs may transition in the dominant signature upon recurrence after radiation and chemotherapy (Phillips et al., 2006; Verhaak et al., 2010). Despite these complexities or perhaps because of them, the identification of core, reinforcing TF circuitry may provide relatively preserved molecular regulation over time and space. We demonstrated across a large panel of GSCs maintenance of such consistent chromatin patterns as defined by presence and elevated expression of core SE genes. Our results are in line with previous studies that use comparable approaches to characterize the active chromatin landscapes of GSCs in an effort to delineate GSC identity (Suvà et al., 2014; Singh et al., 2017). However, owing to the large and diverse panel of GSC models that we characterized in our study, we were able to 1) robustly quantify the frequency of core SEs detected, and 2) dissect the molecular subgroups of GSCs as defined distinct SE landscapes. Elevated expression of the top 250 most frequent GSC SE-linked genes stratified patients with poor survival. Furthermore, our characterization of a large number of GSC models from patients revealed consistent patterns of at least four GSC subgroups harboring unique SE signatures. The two predominant groups, which we assigned group 1 and group 2, have some similarities in terms of their regulation by key TFs (i.e., OLIG1/2 and RUNX2) that define proneural and mesenchymal GBM subgroups, respectively. However, at the transcriptional level, group 1 and group 2 do not significantly match proneural and mesenchymal subgroup signature genes, thus supporting that the GSC groups we identify are restricted to specific tumor cell populations, highly consistent between distinct sets of patients, and defined by distinct transcriptional programs.
A proneural–mesenchymal transition (Bhat et al., 2013; Mao et al., 2013; Barnes et al., 2018) has been described in numerous studies on GBM as a cell identity switch in response to microenvironmental and/or treatment-associated factors (i.e., irradiation). As nearly all of our specimens were derived from newly diagnosed primary unirradiated specimens, the proneural–mesenchymal transition process does not explain the group 2 GSC signatures we observed in treatment-naive GSC models and tumors. This does not exclude the possibility that group 1 and group 2 GSC populations may transition between states as this was not explicitly tested in our study and may be an important area of further investigation, particularly in the setting of prospective therapies targeting GSC group identity.
Our data also suggest that GSCs may adopt TF circuitry similar to distinct signatures of neural stem and progenitor cells. These findings may represent maintenance of distinct cells of origin or convergent evolution driven by the collective effects of genetic and epigenetic events. Genetically engineered mouse models of GBM have supported divergent cells of origin (e.g., NSCs, oligodendroglial progenitor cells, or astrocytes; Chen et al., 2012b), which may reflect divergent precursor cell states adopted by GBMs. Further expansion of normal to tumor comparisons (at the chromatin level) in a large series of adult NSC, oligodendroglial progenitor cell, and immature/mature astrocyte cell populations (cultured under the same growth conditions) would be potentially informative of transcriptional programs that mediate tumor initiation and potential cells of origin.
Targeted therapies have transformed the therapeutic landscape of several cancers, but many cancer types fail to respond to molecularly guided precision medicine approaches that are based solely on tumor genetics or transcriptional signatures. Prominently, GBM ranks among the most deeply characterized cancers, yet current therapies focus on generalized treatment paradigms. Here, we demonstrated that the epigenetic landscapes of GSCs and extension to GBM tumors yields novel insights into cellular dependencies and core transcriptional regulation. While TFs represent essential factors in cellular transformation and tumor maintenance, they have been notoriously challenging to target. We find that core TF circuits are associated with the presence of SEs that identify essential genes that are amenable to therapeutic targeting and may provide the basis of developing novel therapeutic paradigms. These findings are in keeping with growing literature that supports that SEs and associated TF circuitries represent important nonmutated cancer dependencies (Durbin et al., 2018; Mack et al., 2018; Ott et al., 2018). Such vulnerabilities may be identified from SEs shared across GSCs and those that occur in a GSC subtype–specific fashion. It is important to note that, although our approach points to a novel set of target genes, single-agent approaches are unlikely to be effective against this highly genetically and phenotypically heterogeneous disease. As a key example, EGFR inhibition used in EGFR SE-containing GSC tumor models, although effective in vitro, showed limited efficacy as a single agent in vivo. Combinatorial approaches against multiple SE genes in addition to mutated drivers may represent a potential approach that may prove efficacious. Potential translation of targets (and combinations of targets) will require comprehensive in vitro and in vivo functional validation to prioritize and screen candidates for further drug testing. To this end, we recently developed an in vivo screening method to evaluate a prioritized set of candidate genes using inducible loss-of-function technology in mouse orthotopic GBM xenografts (Miller et al., 2017). This approach could be used to systematically evaluate the targets we identified in this study with expansion to multiple GSC group-specific tumor models. In sum, our data support that targeting of transcriptional identity has the potential for a large therapeutic index as we find elements of core TF circuits and their associated pathways to be not only highly GBM specific but also related to the presence of distinct GSC populations defined by specific chromatin signatures.
Materials and methods
Derivation of GSC models and maintenance of xenografts
GBM tissues were acquired from excess surgical resection tissues from patients at the Case Western Reserve University. All specimens were reviewed by a neuropathologist, and appropriate informed consent was obtained from patients in accordance with a Cleveland Clinic Institutional Review Board–approved protocol (090401). All studies involving human patients were conducted in accordance with the Declaration of Helsinki. To limit cell culture–based artifacts, patient-derived xenografts were generated and dissociated as needed to serve as a renewable source of tumor cells for study. Short tandem repeat analyses were performed to authenticate the identity of each tumor model used in this article every 6 mo. Cells were stored at −160°C when not being actively cultured. All cells were thawed within 1 mo of these experimental procedures. All experiments conformed to the relevant regulatory standards.
In vivo tumorigenesis
50,000 human-derived GSCs were implanted into the right cerebral cortex of NSG mice (NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ; The Jackson Laboratory) at a depth of 3.5 mm to establish intracranial xenograft models. All murine experiments were performed under an animal protocol approved by the Cleveland Clinic and University of California, San Diego, Institutional Animal Care and Use Committee. Healthy, wild-type female mice of NSG background, 4–6 wk old, were randomly selected and used in this study for intracranial injection. None of the mice had experienced any treatment or procedures before the experiments described. Mice were housed together in a controlled environment with 14 h of light and 10 h of dark per day. Animal husbandry staff at the Cleveland Clinic and University of California, San Diego, regularly observed all animals, and no more than five mice were housed in each cage. Animals were monitored until neurological signs were observed, at which point they were sacrificed. Hunched posture, gait changes, lethargy, and weight loss served as neurological signs or signs of morbidity, which indicated an endpoint condition. Brains were harvested and fixed in 4% formaldehyde, cryopreserved in 30% sucrose, and then cryosectioned. H&E staining was performed on sections for histological analysis. In parallel survival experiments, mice were observed until the development of neurological signs as described above. Healthy 4–6-wk-old female NSG mice were randomly selected and used in this study for intracranial injection. For inhibitor treatments, mice were treated with control DMSO vehicle, palbociclib (50 mg/kg every day), lapatinib (50 mg/kg every day), or respective combinations until endpoint criteria were reached. Mice were sacrificed when neurological signs were observed, and tissues were collected for further analysis. Brains were harvested and fixed in 4% formaldehyde, cryopreserved in 30% sucrose, and cryosectioned. H&E staining was performed on brain sections for histological analysis.
Tumor dissociation and GSC culture
Dissociation of xenografted tumors was performed using a papain dissociation system according to the manufacturer’s instructions. Cells were then cultured in Neurobasal medium supplemented with 2% B27, 1% l-glutamine, 1% sodium pyruvate, 1% penicillin/streptomycin, 10 ng/ml basic fibroblast growth factor, and 10 ng/ml epidermal growth factor (EGF) for ≥6 h to recover expression of surface antigens. Because no single-cell surface marker is uniformly informative for marking GSCs, we used functional criteria to define and validate the presence of GSCs. GSCs were isolated immediately following dissociation or following transient xenograft passage in immunocompromised NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ (NSG) mice using prospective sorting followed by assays to confirm stem cell marker expression, sphere formation, and secondary tumor initiation. When experiments were performed, we isolated AC133 marker–positive populations using CD133/1 antibody-conjugated magnetic beads. Neurobasal medium was used to wash specimens, which were then acutely dissociated to remove nontumor tissue and subjected to enzymatic dissociation using the Papain dissociation system (Worthington Biomedical Corp; LK003150). The isolated tumor cells were briefly placed in Neurobasal medium with B27 supplement (Life Technologies; 12587010) to permit recovery following enzymatic dissociation. Cells were labeled with CD133/2(293C)-allophycocyanin antibody kit (Miltenyi Biotec; 130098826), and the CD133+ cells were sorted and analyzed by flow cytometry. The sorted CD133+ cells were cultured in NBM-B27 medium containing 20 ng/ml of both EGF (R&D Systems; 236-EG-01M) and recombinant human basic fibroblast growth factor (R&D Systems; 4114-TC-01M) for a short period before treatment and analysis (Bao et al., 2006; Flavahan et al., 2013). Expression of stem cell markers (SOX2 and OLIG2), functional assays of self-renewal (serial neurosphere passage), and tumor propagation by using in vivo limiting dilution assays were used to validate GSC phenotypes.
Proliferation and neurosphere formation assay
Cell proliferation experiments were conducted by plating cells of interest at a density of 1,000 cells/well in a 96-well plate with six replicates. CellTiter-Glo (Promega) was used to measure cell proliferation. All data were normalized to day 0 and presented as mean ± SD. In vitro limiting dilution assays were used to assess neurosphere formation capacity as previously reported (Flavahan et al., 2013). Briefly, decreasing numbers of cells were plated into each well of a 96-well plate. The presence and number of neurospheres in each well were recorded 7 d after plating. All tumorsphere and proliferation experiments were performed at least six times.
Chromatin immunoprecipitation (ChIP) of 5–10 mg of flash-frozen primary GBM tumors was performed by using 5 μg H3K27ac antibody per ChIP experiment (Active Motif; 39133). In the case of cells, 5 million cells were used for ChIP studies. ChIPs were performed as described previously (Wang et al., 2018b). Enriched DNA was quantified by using PicoGreen (Invitrogen) and ChIP libraries were amplified and barcoded by using the Thruplex DNA-seq library preparation kit (Rubicon Genomics) according to the manufacturer’s recommendations. Following library amplification, DNA fragments were agarose gel (1.0%) size selected (<1 kb), assessed by using a Bioanalyzer (Agilent Technologies), and sequenced at The Centre for Applied Genomics (The Hospital for Sick Children) by using Illumina Hi-Seq 2500 150-bp single-end reads (GeneWiz).
RNA-seq library preparation
Total RNA was extracted from cell pellets or liquid nitrogen–pulverized tissue using the miRNeasy mini kit (Qiagen) according to instructions from the manufacturer. Stranded RNA library preparation was performed with ribosomal RNA depletion according to instructions from the manufacturer (Epicentre) to achieve greater coverage of mRNA and other long noncoding transcripts. Paired-end sequencing was performed on the Illumina HiSeq 2500 with 2 × 150-bp paired-end read configuration.
Trizol reagent (Sigma-Aldrich) was used to isolate total cellular RNA from cell pellets. The qScript cDNA Synthesis Kit (Quanta BioSciences) was used for reverse transcription into cDNA. Quantitative real-time PCR was performed by using Applied Biosystems 7900HT cycler using SYBR-Green PCR Master Mix (Thermo Fisher Scientific). Quantitative PCR (qPCR) primers used in this study were human ASCL1 forward 5′-CCCAAGCAAGTCAAGCGACA-3′ and reverse 5′-AAGCCGCTGAAGTTGAGCC-3′; human BCAN forward 5′-TGGAAGGAGACAGCTCAGAGG-3′ and reverse 5′-CGGGACAGGAAAGTCCACTTG-3′; human C1ORF61 forward 5′-CTTTTCCAGCTTTGGGAGTCA-3′ and reverse 5′-GTGGCTCTATCGTCCACACG-3′; human CDK6 forward 5′-GCTGACCAGCAGTACGAATG-3′ and reverse 5′-GCACACATCAAACAACCTGACC-3′; human CXCL8 forward 5′-TTTTGCCAAGGAGTGCTAAAGA-3′ and reverse 5′-AACCCTCTGCACCCAGTTTTC-3′; human CXXC5 forward 5′-CCGAGCGTCGGAACAAG-AG3′ and reverse 5′-CCACTGCTGCCAAAAGAAGAG-3′; human EGFR forward 5′-AGGCACGAGTAACAAGCTCAC-3′ and reverse 5′-ATGAGGACATAACCAGCCACC-3′; human HMGA2 forward 5′-ACCCAGGGGAAGACCCAAA-3′ and reverse 5′-CCTCTTGGCCGTTTTTCTCCA-3′; human MSRB3 forward 5′-CGGTTCAGGTTGGCCTTCATT-3′ and reverse 5′-GTGCATCCCATAGGAAAAGTCA-3′; human RUNX2 forward 5′-TGGTTACTGTCATGGCGGGTA-3′ and reverse 5′-TCTCAGATCGTTGAACCTTGCTA-3′; human SALL3 forward 5′-CCAATGTGTCGGTGTTCGAG-3′ and reverse 5′-CCGGGTAAGGGTTCATCTGG-3′; human SEPT9 forward 5′-TTCGGCTACGTGGGGATTG-3′ and reverse 5′-CTGCCCGACCACCATGATG-3′; human SOX2 forward 5′-GCCGAGTGGAAACTTTTGTCG-3′ and reverse 5′-GGCAGCGTGTACTTATCCTTCT-3′; and 18S RNA forward 5′-ACCCGTTGAACCCCATT-3′ and reverse 5′-CCATCCAATCGGTAGTAGCG-3′.
Plasmids and lentiviral transduction
Lentiviral clones expressing two nonoverlapping shRNAs directed against human ASCL1 (TRCN0000235657, TRCN0000244309), human BCAN (TRCN0000151872, TRCN0000371366), human C1ORF61 (TRCN0000172451, TRCN0000167021), human CDK6 (TRCN0000196261, TRCN0000196337), human CXCL8 (TRCN0000058030, TRCN0000232050), human CXXC5 (TRCN0000139498, TRCN0000142729), EGFR (TRCN0000121329, TRCN0000121068), HMGA2 (TRCN0000342671, TRCN0000021966), MSRB3 (TRCN0000064761, TRCN0000064759), RUNX2 (TRCN0000013656, TRCN0000013655), SALL3 (TRCN0000418831, TRCN0000019755), SEPT9 (TRCN0000119069, TRCN0000445092), SOX2 (TRCN0000355694, TRCN0000231642), or a nontargeting control shRNA that has no targets in the human genome were obtained from Sigma-Aldrich. Nonoverlapping shRNAs were selected based on knockdown efficiency and were then used for all following experiments. 293FT cells were used to generate lentiviral particles through cotransfection of the packaging vectors pCMV-dR8.2 dvpr and pCI-VSVG by using a standard calcium phosphate transfection method in Neurobasal complete medium.
Whole-exome sequencing variant discovery
BWA-MEM version 0.7.12 (Li and Durbin, 2010) was used to align paired-end exome sequencing reads to the hg19 reference genome, and SAMtools (Li et al., 2009a) was used for sorting. PCR duplicates were removed with PicardTools (http://broadinstitute.github.io/picard/). Single-nucleotide variants and indels were identified with the Genome Analysis Toolkit (McKenna et al., 2010) version 3.8 in accordance with the Genome Analysis Toolkit best practices with the principal steps of base quality score recalibration, variant genotyping for single-nucleotide variants and indels, and variant hard-filtering with standard recommendations (DePristo et al., 2011; Van der Auwera et al., 2013). Variants with predicted moderate or high impact on gene function (primarily nonsynonymous variants or those that directly affect splice donors or acceptors) were annotated with SNPeff (Cingolani et al., 2012). Variants were further annotated with allelic frequencies from the Exome Aggregation Consortium release 1 (Lek et al., 2016) or the 1000 Genomes Project (Auton et al., 2015). Potential pathogenic variants were identified in a group of commonly mutated PI3K pathway genes (John Lin et al., 2017) and GBM genes (Brennan et al., 2013) with an allelic frequency of <1/10,000 in both Exome Aggregation Consortium and the 1000 Genomes Project. CopywriteR (Kuilman et al., 2015) was used to identify CNVs. Focal CNVs of commonly altered GBM and PI3K genes were identified with a log2 (copy number ratio) of >2 or >1 and focal (i.e., <10 megabases and clearly delimited) for amplifications with a threshold of <−2 or <−1 and focal for deletions. A small minority of focal CNVs were also excluded manually based on small size and having limited support in the BAM file.
H3K27ac ChIP-seq processing
The H3K27ac ChIP-seq was processed following the guidelines of ENCODE (phase-3) by using the AQUAS pipeline (https://github.com/kundajelab/chipseq_pipeline). This ChIP-seq processing pipeline (1) mapped the sequenced reads to the human reference genome hg19/GRCh37 using the Burrows-Wheeler Aligner (Li and Durbin, 2009), (2) filtered low-quality, duplicate, multimapping, unmapped reads along with reads mapping to the mitochondrial genome, and (3) performed peak calling using MACS2 with a P value threshold of 10−5. The quality control measures such as mapping statistics, library complexity (PCR bottlenecking coefficients, PBC1, and PBC2), cross-correlation scores (normalized strand cross-correlation coefficient and relative strand cross-correlation coefficient) and fraction of reads in the peaks as defined by ENCODE data standards (https://www.encodeproject.org/chip-seq/histone/) were also determined using the pipeline. Additionally, we also determined the fraction of reads mapping within 2-kb of an annotated promoter as a quality control measure. The H3K27ac ChIP-seq experiments with PBC1 >0.8 with total mapped reads >10 million and MACS2 peaks (P < 10−05) >5,000 were used for further analysis. Experiments detected as outliers based on acetylation signal and potential impurity (n = 4) were further removed to provide a robust set of experiments of GSCs (n = 30), primary tumors (n = 7), and NSCs (n = 7).
The sequenced reads of RNA-seq experiments were aligned to human reference genome hg19/GRCh37 using STAR aligner (Dobin et al., 2013). The expression of the genes was determined as fragments per kilobase million (FPKM) by counting the number of uniquely mapping fragments and normalizing it by the length coding sequence and the library size.
Gene annotation was based on UCSC RefSeq. Genes expressed with >1 FPKM in ≥50% of the samples of each group were considered to be expressed genes.
Defining GSCs groups
Unsupervised consensus clustering using the ConsensusClusterPlus package from Bioconductor was performed on the top 10% most variable regions by acetylation signal (n = 14,621). This was done by calculating the median absolute deviation of tags per base per million of all the peaks with an acetylation signal. Unsupervised clustering was based on the Euclidean distance between the experiments with maximum clusters k = 6 using 80% of samples and 80% of features for every k. The consensus clustering results showed that GSCs optimally fell into four clusters, group 1 (n = 15), group 2 (n = 9), group 3 (n = 2), and group 4 (n = 4).
Defining NSCs groups
NSC samples were clustered with the GSC samples using the consensus clustering approach as described above with maximum clusters k = 7 for on top 10% most variable regions by acetylation signal (n = 16,817).
Mapping SEs using H3K27ac enhancer definitions
Identifying group-specific SEs
The SEs that were present in at least two samples of each of the two GSC groups were compared for differential occupancy (n = 2,445). SEs with significantly differential H3K27ac occupancy (false discovery rate [FDR]–adjusted P < 0.01) were identified by modeling H3K27ac read counts as negative binomial distribution using a generalized linear model (Anders and Huber, 2010). Bioconductor package DESeq2 was used for this purpose.
Identifying differentially expressed genes between the GSC groups
Differentially expressed genes (FDR-adjusted P < 0.1) between the two GSC groups were identified using the generalized linear model as described above from the RNA-seq gene counts. Genes expressed in either of the GSC groups were used for this.
Occupancy heatmap of SEs
The occupancy heatmap of significantly differential SEs was generated using the Bioconductor package genomation. The heatmap is centered at the center of the differential SEs with total width as the 75th quartile width of all the differential SEs. The H3K27ac density in the heatmaps is shown in reads per million (rpm).
Identifying group-specific TFs
SE-associated genes have been shown to be critical for establishing cell identity. Thus, we wanted to identify group-specific SE-associated TFs that are key regulators of SE-associated target genes. We predicted TF binding to SEs by motif and used proximity to assign SEs to genes. The H3K27ac valleys were used as the nucleosome-free regions where the TFs could potentially bind. TF binding sites were obtained from TRANSFAC (Matys et al., 2006) and detected at nucleosome-free regions by using FIMO (Grant et al., 2011). For each group of GSCs, we identified the SE-associated target genes and TFs. We used ridge regression to predict the target gene activity from the binding TF activity and then used a permutation test to assign significance to ridge regression coefficients. The regularizer in the ridge regression was chosen by 10-fold cross-validation. The target gene activity was randomly permuted followed by model fitting by ridge regression. We performed 10,000 permutations to obtain a null distribution of the regression coefficients. This null distribution was used to assign the significance to the true model coefficients. TFs with P < 0.05 were considered the potential key regulators of the target genes. TFs with significant correlation to >2% of target genes in both the groups were considered as consensus TFs (n = 24) while the ones that were significantly correlated with >1% target genes in the specific groups only were the potential group-specific TFs (group 1 TFs = 14 and group 2 TFs = 18).
High-confidence group-specific target genes
Genes with group-specific SEs with predicted binding sites for TFs along with significantly different gene expression between the two groups were categorized as group-specific target genes (common target genes = 111, group 1 target genes = 125, and group 2 target genes = 171).
Gene ontology enrichment analysis
Functional annotation enrichment done for group-specific TFs, common TFs, and target genes was performed by using DAVID with the expressed genes as the background.
Primary GBM classification
A gene-signature set was created for group 1 by combining the group 1–specific TFs and group 1 target genes. Similarly, the group 2 gene signature was defined by combining group 2–specific TFs and group 2 target genes. k-means (k = 2) was used to classify the GSC samples and primary GBMs into two groups by using the quantile normalized log2(rpm/bp) acetylation signal at the SEs of these genes with 10,000 iterations with 100 different starting points. This created two clusters, cluster 1 corresponding to group 1 GSCs (GSCs = 15, primary GBMs = 6) and cluster 2 corresponding to group 2 GSCs (GSCs = 9, primary GBM = 1). Primary GBMs that clustered with group 1 GSCs were assigned to be group 1–like GSCs, and the one that clustered with group 2 GSC was assigned to be group 2–like GSC. This methodology was repeated on the gene expression (log2 RPKM) of group 1 and group 2 gene signatures yielding consistent assignments of primary GBMs to be like group 1 GSC and group 2 GSC. This created two clusters, cluster 1 corresponding to group 1 GSCs (GSCs = 14/15 and primary GBMs = 38) and cluster 2 corresponding to group 2 GSCs (GSCs = 9/9 and primary GBM = 5).
Group representation at single-cell level in bulk tumors
Single-cell RNA-seq samples from Gene Expression Omnibus dataset GSE57872 (Patel et al., 2014) were aligned and processed as the other RNA-seq samples as described previously. This study has single-cell RNA-seq from five bulk tumors. Only single-cell samples and genes that had been described to have robust expression in the study were used for further analysis (n = 468). We additionally filtered for single-cell samples that had expression for at ≥30% of filtered genes (n = 5,516). These overlapped with four group 1–specific TFs, four group 2–specific TFs, 60 group 1 target genes, and 64 group 2 target genes. The gene expression in tags per million of the single-cell samples and RNA-seq of GSCs were log2 transformed and quantile normalized together. Spearman rank correlation coefficient for the group-specific target genes and TFs was determined for every single-cell sample of each of the five tumor types with all the GSC samples in group 1 or group 2. To assess the robustness of the correlation, the gene expression values of the GSC sample were randomized 5,000 times, and the correlation coefficient was calculated. Only if the absolute correlation coefficient was >99% of the random absolute correlation coefficients then the single-cell sample was considered to have a correlation with the GSC sample. For a group level similarity score, the median correlation coefficient for every single-cell sample with all the samples in the GSC group was determined. This was done only for cells that had significant correlation with >75% samples of each group.
Compound drug screening assay
Compounds and DMSO vehicle controls were transferred to black, clear-bottom 1,536-well plates (Greiner; 789092) using an acoustic transfer system (ATS) Gen4+ instrument (EDC Biosystems). Cells were dispensed in a volume of 10 µl, and a density of 1,000 cells per well by using a Multidrop Combi liquid handler (Thermo Fisher Scientific; 5840300). Plates were incubated at 37°C and 5% CO2 for 72 h, at which point CellTiter-Glo reagent (Promega; G7572) was added to each well using the Multidrop Combi instrument. Plates were shaken at 25°C for 30 min and then read for luminescence output by using an EnVision plate reader (Perkin Elmer). Luminescence intensity of compound wells was then expressed as a percentage of the average of the DMSO controls for the corresponding plate for each cell line and plotted against the log of the concentration of compound using GraphPad Prism 7 (GraphPad Software Version 7.04). Data were fit to a log10 (concentration) versus a normalized response curve, and 50% cytotoxic concentration values were obtained for each compound for individual cell lines.
Data accession statement and information
Online supplemental material
Fig. S1 shows a summary of study DNA methylation, H3K27ac, DNA exome, and copy number data. Fig. S2 shows validation of core GSC SE genes. Fig. S3 shows functional validation of group-specific GSC identity genes. Fig. S4 shows gene expression and pathway analysis of group-specific GSC TFs. Fig. S5 shows examples of group-specific GSC TF enhancer and gene expression profiles.
We thank Mary McGraw for her excellent assistance in sample curation and processing.
This work was supported by National Institutes of Health grants CA217066 (to B.C. Prager), CA217065 (to R.C. Gimple), CA203101 (to L.J.Y. Kim), and CA197718, CA154130, CA169117, CA171652, NS087913, NS089272, and NS103434 (to J.N. Rich). C.Y. Lin is supported by the Cancer Prevention and Research Institute of Texas (RR150093), the National Institutes of Health, and the National Cancer Institute (1R01CA215452-01), and is a Pew-Stewart Scholar for Cancer Research (Alexander and Margaret Stewart Trust). S.C. Mack is supported by Cancer Prevention and Research Institute of Texas scholar award (RR170023), Alex’s Lemonade Stand Foundation for Childhood Cancer young investigator award, Rally Foundation research grant, BEAR Necessities Pediatric Cancer Foundation Grant, Children’s Cancer Research Fund award, Children’s Brain Tumor Foundation Award, and Baylor College of Medicine Junior Faculty Award.
C.Y. Lin is a shareholder and inventor of intellectual property licensed to Syros Pharmaceuticals. The other authors declare no competing financial interests.
Author contributions: S.C. Mack and X. Wang led and performed a majority of the functional studies and were actively involved in study design, data analysis, interpretation, and manuscript preparation. I. Singh led the majority of the bioinformatics analysis and also was actively involved in study design, data analysis, interpretation, and manuscript preparation. R. Hirsch, R.C. Gimple, L.J.Y. Kim, A. Morton, B.C. Prager, K.C. Bertrand, C. Mah, and L. Chavez, contributed to data collection, analysis, and study design. Q. Wu, S. Lai, R. Villagomez, C. Lee, and W. Zhou, assisted with in vitro and in vivo functional studies. J.A. Bernatchez, Z. Zhu, Z. Qiu, and J.L. Siqueira-Neto led the drug validation studies, analysis, and data interpretation. G.H. Barnett, M.A. Vogelbaum, A.E. Sloan, and S. Bao assisted with the collection of patient samples, study design, and data interpretation. S.C. Mack, P.C. Scacheri, C.Y. Lin, and J.N. Rich contributed to study design, data interpretation, and manuscript preparation.
S.C. Mack, I. Singh, and X. Wang contributed equally to this paper.