Mammalian genomes are folded into unique topological structures that undergo precise spatiotemporal restructuring during healthy development. Here, we highlight recent advances in our understanding of how the genome folds inside the 3D nucleus and how these folding patterns are miswired during the onset and progression of mammalian disease states. We discuss potential mechanisms underlying the link among genome misfolding, genome dysregulation, and aberrant cellular phenotypes. We also discuss cases in which the endogenous 3D genome configurations in healthy cells might be particularly susceptible to mutation or translocation. Together, these data support an emerging model in which genome folding and misfolding is critically linked to the onset and progression of a broad range of human diseases.
Introduction
The nucleus is nonrandomly organized into distinct substructures with specialized functions (Misteli, 2007; de Wit and de Laat, 2012; Gorkin et al., 2014). Macroscopic structures such as the nuclear pore(s), lamina, and nucleolus provide spatial anchor points around which the genetic material is folded into complex higher order configurations (Peric-Hupkes et al., 2010; Mattout et al., 2015; Yang et al., 2015). Colocalization and tethering of specific genetic loci to nuclear substructures has been correlated with alterations in genome functions such as transcription (Guelen et al., 2008; Capelson et al., 2010; Németh and Längst, 2011) and replication (Pope et al., 2014). Moreover, the folding patterns of genomic loci with respect to one another are also critically linked to how chromatin is read and interpreted throughout the lifetime of a given cell.
Recent high-throughput and high-resolution sequencing studies have provided new insight into the global organization of the genome. Interphase chromosomes are folded into independent spatial territories in the 3D nucleus (Cremer and Cremer, 2001). Chromosomes are further partitioned into megabase-scale topologically associating domains (TADs; Dixon et al., 2012; Hou et al., 2012; Nora et al., 2012; Sexton et al., 2012) and smaller, nested subTADs (Phillips-Cremins et al., 2013; Rao et al., 2014). TADs/subTADs represent segments of the genome in which all pairs of loci interact more frequently with one another than surrounding regions. TADs/subTADs can also form higher order “A” and “B” compartments of active and inactive chromatin, respectively (Lieberman-Aiden et al., 2009; Rao et al., 2014; Fraser et al., 2015). Genomic loci can form specific long-range looping interactions within and across TAD/subTAD boundaries (Chambeyron and Bickmore, 2004; Sanborn et al., 2015). At each level in the chromatin folding hierarchy, the folding patterns exhibit a complex connection to genome function and dysfunction in cellular models of healthy development and disease (Roix et al., 2003; Branco and Pombo, 2006; Fudenberg et al., 2011; Ibn-Salem et al., 2014; Qian et al., 2014; Lupiáñez et al., 2015; Flavahan et al., 2016; Hnisz et al., 2016).
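The definition of a TAD as a segment whose loci contact one another more often than they contact surrounding regions can be illustrated with a simple insulation-style score. The sketch below is a minimal illustration on a toy contact matrix, not the domain-calling procedure of any cited study; the function name `insulation_score`, the window size, and the toy contact values are all assumptions made for the example.

```python
def insulation_score(contacts, window):
    """Mean contact frequency across each bin.

    For bin i, average the contacts between the `window` bins upstream
    and the `window` bins starting at i. Low values mark positions that
    few contacts cross, i.e. candidate domain boundaries.
    """
    n = len(contacts)
    scores = [None] * n  # bins too close to the matrix edge stay undefined
    for i in range(window, n - window):
        crossing = [contacts[r][c]
                    for r in range(i - window, i)
                    for c in range(i, i + window)]
        scores[i] = sum(crossing) / len(crossing)
    return scores


# Toy matrix (hypothetical values): two 10-bin domains with strong
# contacts inside each domain and weak contacts between them.
size, domain_end = 20, 10
toy = [[10.0 if (r < domain_end) == (c < domain_end) else 1.0
        for c in range(size)]
       for r in range(size)]

scores = insulation_score(toy, window=3)
defined = [i for i, s in enumerate(scores) if s is not None]
boundary = min(defined, key=lambda i: scores[i])
print(boundary)  # bin with the fewest contacts crossing it
```

On this toy matrix the minimum falls at the first bin of the second domain, where the window contains only between-domain contacts. Real Hi-C matrices would additionally need normalization and a principled choice of window size.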
In this review, we discuss recent evidence linking chromatin topology to a range of abnormal phenotypic states. We highlight two emerging models for the role of 3D genome architecture in disease. Misconfigured chromatin folding can occur as a consequence of disease onset, leading to a secondary cascade of genome dysfunction. Alternatively, unique topological folding patterns in healthy cells can be more susceptible to inherited and somatic mutations compared with the rest of the genome. Together these data support an emerging model in which genome folding and misfolding is critically linked to the onset and progression of a broad range of human diseases.
CTCF connects the genome at different length scales in healthy mammalian cells
CCCTC-binding factor (CTCF) is a ubiquitously expressed zinc finger (ZNF) protein with widespread roles in regulating diverse genome functions such as transcriptional activation, repression, insulation, replication, recombination, and splicing (Phillips and Corces, 2009; Ong and Corces, 2014). It has been hypothesized that the pleiotropic effects of CTCF can be explained by its unifying role as an architectural protein connecting long-range genomic interactions (Phillips and Corces, 2009; Phillips-Cremins and Corces, 2013). Seminal studies reported CTCF at the base of critical looping interactions and a disruption of looping as a consequence of CTCF knockdown (Kurukuti et al., 2006; Hadjur et al., 2009; Handoko et al., 2011; Splinter et al., 2012). Moreover, in a high-resolution analysis of fine-scale chromatin interactions within TADs, constitutively bound CTCF sites were found to be enriched at the base of developmentally constitutive looping interactions (Phillips-Cremins et al., 2013). Consistent with these results, a genome-wide Hi-C study reported that CTCF knockdown in a human cell line increased long-range interactions across TAD boundaries and disrupted genome folding within TADs (Zuin et al., 2014). Moreover, 10,000 chromatin looping interactions were identified in the highest (1–5 kb) resolution human Hi-C experiment to date. The large majority of these interactions were anchored by CTCF (Rao et al., 2014). Finally, a recent study degrading CTCF on short time scales demonstrated disruption of thousands of loops and hundreds of TAD boundaries genome-wide (Nora et al., 2017). Together, these data indicate that CTCF is an essential organizer of long-range chromatin interactions across the genome.
Recent papers coupling high-resolution proximity ligation experiments (de Wit and de Laat, 2012; Dekker et al., 2013) with CRISPR/Cas9 genome editing (Doudna and Charpentier, 2014; Hsu et al., 2014) have provided additional direct evidence for the unique role for CTCF as an architectural protein. A key discovery emerging from these experiments is the first causal evidence that orientation of the CTCF consensus binding sequence is a critical component of the specificity of loop formation. The consensus sequence orientation of CTCF correlates with looping directionality (Vietri Rudan et al., 2015). Moreover, the majority of CTCF-mediated loops genome-wide contain CTCF consensus sites in a convergent orientation, whereas divergently oriented sequences are severely depleted (Rao et al., 2014). In their analysis, Rao et al. (2014) focused specifically on a subset of 4,000 genome-wide loops identified in humans that exhibit CTCF binding on both anchoring segments and only a single copy of the consensus sequence. Thus, an open question is whether loops can form by tandem consensus orientations. Subsequent studies reported 65% of looping interactions with consensus convergency and 10–40% exhibiting tandem/same-direction orientation (de Wit et al., 2015; Guo et al., 2015; Tang et al., 2015). The causal link between convergent CTCF consensus sequences and looping was confirmed via CRISPR/Cas9-mediated inversion of consensus directionality, which led to disruption of looping without perturbing CTCF occupancy (de Wit et al., 2015; Guo et al., 2015; Sanborn et al., 2015). Although the functionality of tandem loops remains to be elucidated, these results suggest that convergent CTCF orientation is a causal mechanistic feature of long-range chromatin looping. Together, the genome editing studies to date are consistent with an essential role for CTCF as a looping facilitator.
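The orientation classes discussed above (convergent, tandem, and divergent) can be made concrete with a short sketch. It assumes the common convention that a loop is convergent when the upstream anchor carries a forward-strand ("+") motif and the downstream anchor carries a reverse-strand ("-") motif, so that the two motifs point toward each other. The function name `classify_anchor_pair` and the toy loop list are hypothetical and are not taken from any of the cited studies.

```python
from collections import Counter


def classify_anchor_pair(upstream_strand, downstream_strand):
    """Classify a loop by the CTCF motif strands at its two anchors.

    `upstream_strand` belongs to the anchor with the smaller genomic
    coordinate. Assumed convention: "+" is the forward motif orientation.
    """
    pair = (upstream_strand, downstream_strand)
    if pair == ("+", "-"):
        return "convergent"  # motifs point toward each other
    if pair == ("-", "+"):
        return "divergent"   # motifs point away from each other
    if pair in (("+", "+"), ("-", "-")):
        return "tandem"      # motifs point in the same direction
    raise ValueError(f"unexpected strand pair: {pair}")


# Toy loops (hypothetical): (upstream strand, downstream strand)
toy_loops = [("+", "-"), ("+", "-"), ("+", "+"), ("-", "-"), ("-", "+")]
tally = Counter(classify_anchor_pair(up, down) for up, down in toy_loops)
print(tally)
```

Applied to a real loop list, a tally of this kind is how one would arrive at the proportions of convergent and tandem loops; anchors with multiple or no motifs would need separate handling.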
CTCF is also enriched at the boundaries of megabase-scale TADs (Dixon et al., 2012). Seminal genetic experiments in which a CTCF-containing boundary was deleted in embryonic stem cells provided the first indication that CTCF was required for the demarcation of TAD boundaries (Nora et al., 2012). More recently, it was reported that both active and inactive developmentally regulated genes are often contained within subTADs anchored by CTCF-mediated looping interactions (Dowen et al., 2014). At least some subTADs create “insulated neighborhoods” that are thought to restrict enhancer-promoter contacts and guard against key developmentally regulated enhancers activating off-target genes. Disruption of domain boundaries via targeted deletion of CTCF binding sites led to ectopic enhancer looping and activation of genes outside of the insulated neighborhood (Dowen et al., 2014; Guo et al., 2015; Lupiáñez et al., 2015; Flavahan et al., 2016; Hnisz et al., 2016). As discussed in detail elsewhere (Phillips-Cremins and Corces, 2013; Beagan and Phillips-Cremins, 2016), we have hypothesized that classic enhancer blocking and barrier insulation mechanisms could be a downstream consequence of the role for CTCF in demarcating TAD boundaries. Thus, an emerging model is that CTCF functions primarily as an organizer of long-range looping interactions and TAD/subTAD boundaries, with diverse, context-specific roles in genome function as a downstream consequence of its primarily architectural role.
Modular engagement of CTCF ZNFs with the genome in healthy cells
A leading model is that the diversity of the functions of CTCF can be explained, at least in part, through differential, combinatorial engagement of its 11 ZNFs with variants of the consensus sequence across the genome (Ohlsson et al., 2001; Phillips and Corces, 2009; Nakahashi et al., 2013). Ohlsson et al. (2001) originally proposed the “CTCF code hypothesis,” in which differential ZNF engagement would in turn lead to alternative CTCF conformations, diversity in binding partners, and differential post-translational modifications. A recent study tested this hypothesis by examining genome-wide CTCF occupancy after functional disruption of each of the 11 ZNFs (Nakahashi et al., 2013). Mutation of core ZNFs (ZNFs 4–7) markedly reduced the number of genome-wide CTCF-occupied binding sites, whereas disruption of the peripheral ZNFs had a less pronounced effect. These results confirm and extend seminal in vitro biochemistry experiments showing that the central CTCF ZNFs bind to the core consensus sequence to control occupancy (Renda et al., 2007). The role for the peripheral ZNFs in occupancy and CTCF-mediated long-range interactions remains an exciting open question under active investigation. We posit that differential CTCF ZNF engagement with the genome might ultimately result in diverse genome folding patterns. Variation in CTCF ZNF-mediated architectures might contribute to the functional differences in CTCF sites across the genome serving as looping facilitators, enhancer blocking insulators, and TAD boundary elements.
An important unanswered question is how CTCF occupancy is linked to its underlying consensus sequence and combinatorial ZNF engagement with the genome. Previous studies profiling genome-wide CTCF occupancy identified a 20-bp core consensus motif that is present in >80% of sites (Kim et al., 2007; Nakahashi et al., 2013). Subsequent higher resolution chromatin immunoprecipitation with lambda exonuclease digestion experiments uncovered additional secondary CTCF motifs adjacent to the core consensus sequence that are present in various combinations throughout the genome (Rhee and Pugh, 2011). More recently, Nakahashi et al. (2013) parsed CTCF consensus sequences into several distinct modular classes and studied their influence on the binding propensity of the CTCF ZNF mutants. Although the large majority of CTCF sites contain only the core consensus sequence, a small proportion contain various combinations of upstream, downstream, and core consensus motifs (Fig. 1 A). Importantly, CTCF binding strength was increased at sites containing a combination of core and upstream motifs compared with the core consensus alone, whereas the downstream motif appeared to destabilize/reduce CTCF occupancy when in the presence of the upstream and/or core consensus motifs. Thus, ZNF interactions with the genome beyond the core consensus have a complex but critical role in CTCF occupancy.
To further understand the interplay between CTCF occupancy and ZNF engagement with the genome, Nakahashi et al. (2013) also examined the relationship between the modular CTCF motifs and their interaction with the different ZNFs. As predicted from previous studies, mutations to the core ZNFs dramatically reduced CTCF occupancy at all sites independent of the underlying motif. Moreover, mutations to the C-terminal ZNFs 9–11 resulted in a sharp decrease in CTCF binding, but primarily at motifs containing the core consensus coupled with the upstream motif. In contrast, mutations to the N-terminal ZNFs 1–3 showed minimal effect on CTCF binding strength regardless of the underlying motif. Together, these results support a working model in which combinations of specific consensus sequences modulate the engagement of individual ZNFs with the genome and, ultimately, CTCF occupancy (Fig. 1 A). CTCF binding can be oriented by the genetic sequence, and the C-terminal ZNFs can stabilize CTCF binding by associating with the upstream motif. Future studies integrating Hi-C analysis with targeted ZNF mutations in the endogenous Ctcf gene will shed valuable new light on the impact of altered genome-wide CTCF occupancy on genome organization, gene expression, and cellular function.
Heterozygous CTCF gene deletions are implicated in cancer
Mounting evidence suggests that CTCF and the genome configurations that it organizes might be linked to the establishment and/or maintenance of disease phenotypes (Lupiáñez et al., 2015; Flavahan et al., 2016; Hnisz et al., 2016). An early indication that CTCF may be implicated in cancer was the discovery that deletion of 16q, a region containing the Ctcf gene, is observed in human cancers (Carter et al., 1990; Radford et al., 1995). Indeed, heterozygous deletion of the 16q22.1 genomic locus is one of the most common genetic events in breast cancer, and nearly all cases of a particular subtype of breast cancer, lobular carcinoma, exhibit the CTCF deletion (Rakha et al., 2006). Consistent with the possibility that CTCF levels may contribute to the pathogenesis of cancer, a Ctcf+/− mouse model exhibits an increased rate of spontaneous, radiation-induced, and chemically induced tumor formation (Kemp et al., 2014). Although the mechanistic role of CTCF in tumor generation has not been explicitly elucidated, these data highlight that perturbations to its protein and/or mRNA levels might be causally linked to certain cancers.
It is currently unknown whether disruption of CTCF levels contributes to cancer through primary or secondary effects. We favor a model in which CTCF depletion results in selective perturbation of a subset of genome folding configurations that are highly sensitive to CTCF levels. Seminal genome-wide mapping studies indicated that CTCF binds to 35,000–75,000 diverse sites across the mammalian genome. Initial analyses across a small number of cell types supported the idea that the large majority of CTCF sites are invariantly bound across cell types (Kim et al., 2007; Chen et al., 2008, 2012; Cuddapah et al., 2009; Klein et al., 2011; Rhee and Pugh, 2011; Yamane et al., 2011; Schmidt et al., 2012; Wang et al., 2012). However, a more recent comparison of CTCF binding across 40 human cell types demonstrated that of the 110,000 CTCF sites possible in any given cell type, only a small number (20,000 sites) are constitutive, suggesting that cell type–specific/dynamic occupancy could occur at a markedly larger fraction than previously realized (Maurano et al., 2015). Constitutive sites contain strong CTCF binding and have a high sequence homology to the core CTCF consensus sequence (Cuddapah et al., 2009; Wang et al., 2012). Importantly, constitutive CTCF sites were reported to be resistant to CTCF knockdown, whereas dynamic sites were more susceptible to perturbation (Schmidt et al., 2012). The differential susceptibility of different classes of CTCF sites to knockdown might explain why reduction of CTCF levels to 10–20% of its WT levels often only results in modest changes in gene expression (Soshnikova et al., 2010; Zuin et al., 2014). 
Thus, we hypothesize that developmentally invariant TAD boundaries and looping interactions connected by constitutive CTCF would be more resistant to the effects of CTCF haploinsufficiency, whereas the sites in the genome containing tissue-specific CTCF occupancy and architecture might be more readily susceptible to CTCF level changes during cancer pathogenesis.
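The distinction between constitutive and cell type–specific CTCF sites can be expressed as simple set operations over per-cell-type occupancy calls. The sketch below is illustrative only: the site identifiers and cell-type names are hypothetical placeholders, the function name `split_sites` is an assumption, and it presumes that peaks have already been matched to a shared set of site identifiers across cell types.

```python
def split_sites(occupancy_by_cell_type):
    """Split CTCF sites into constitutive and cell type-specific sets.

    `occupancy_by_cell_type` maps a cell-type name to the collection of
    site identifiers occupied in that cell type. A site is called
    constitutive here if it is occupied in every cell type supplied.
    """
    site_sets = [set(sites) for sites in occupancy_by_cell_type.values()]
    all_sites = set().union(*site_sets)
    constitutive = set.intersection(*site_sets)
    dynamic = all_sites - constitutive
    return constitutive, dynamic


# Hypothetical occupancy calls for three cell types
toy_occupancy = {
    "cell_type_1": {"site_a", "site_b", "site_c"},
    "cell_type_2": {"site_a", "site_b", "site_d"},
    "cell_type_3": {"site_a", "site_b", "site_c", "site_e"},
}

constitutive, dynamic = split_sites(toy_occupancy)
print(sorted(constitutive), sorted(dynamic))
```

Under the hypothesis stated above, loops and boundaries anchored by sites in the `dynamic` set would be the ones expected to be more sensitive to reduced CTCF levels.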
CTCF ZNF mutations are implicated in tumorigenesis
Mutations to the genetic sequence encoding CTCF ZNFs have been identified in patient tumor samples and specific cancer models. For example, recurrently mutated hotspots were uncovered in CTCF ZNFs 1, 2, and 5 in multiple endometrioid tumor samples (Ciriello et al., 2013; Fig. 1, A and B). Moreover, a screen of 100 breast, prostate, and Wilms’ tumors for Ctcf mutations identified somatic missense mutations in ZNFs 3 and 7 (Filippova et al., 2002; Fig. 1, A and B). These data suggest that multiple subtypes of cancer can exhibit missense mutations in the Ctcf gene that lead to altered ZNF structure.
It is tempting to speculate that many of the ZNF mutations identified in cancer could alter CTCF occupancy patterns and, consequently, 3D genome folding and gene expression (Filippova et al., 1996, 2002; Burcin et al., 1997; Awad et al., 1999; Nakahashi et al., 2013). Filippova et al. (2002) provided evidence to support this idea by reporting that (1) in vitro synthesized CTCF with ZNF 3 or ZNF 7 mutations alters protein binding in gel shift assays and (2) some ZNF mutations affected reporter expression in transgene assays in a probe-dependent manner. Although the downstream effects of CTCF ZNF mutations have yet to be determined in vivo, these mutations likely affect the specificity and affinity of CTCF binding throughout the genome and have the potential to affect 3D genome organization and gene expression (Fig. 1 C). Future studies that introduce specific cancer-associated mutations to CTCF ZNFs along with an integrated downstream analysis including profiling of CTCF occupancy genome-wide, gene expression, and global genome topology will help elucidate the implications of CTCF ZNF mutations in cancer.
Genomic CTCF-binding site mutations in cancer
In addition to perturbations to the CTCF protein itself, an emerging body of evidence suggests that the genetic sequence underlying CTCF-binding sites can also be disrupted in cancer and could be a mechanism for oncogene activation. In a whole-genome sequencing study of 200 human samples of colorectal cancer and patient-matched controls, Katainen et al. (2015) reported that the frequency of somatic mutations was significantly higher at CTCF-binding sites than in flanking regions and higher than the genome-wide rate. The most frequent mutation was in the core consensus of CTCF, suggesting that cancer-associated point mutations might directly affect CTCF ZNF engagement and occupancy (Fig. 1 B). In support of this hypothesis, single-nucleotide polymorphisms (SNPs) in the core CTCF-binding motif have been shown to be associated with abrogated CTCF binding (Nakahashi et al., 2013). Together, these data indicate that somatic mutations acquired throughout the lifetime of an individual might accumulate at CTCF-occupied consensus sequences and could negatively affect binding and 3D genome folding in a manner that contributes to gene expression deregulation in pathology.
Several lines of evidence support the idea that somatic mutations observed in some cancers might be specifically linked to the role of CTCF in connecting chromatin architecture. First, the increased mutation frequency was not observed at binding sites for other queried transcription factors that are not known to be involved in 3D genome folding, suggesting that CTCF sites are uniquely susceptible to mutation. Second, CTCF-only sites without cohesin and cohesin-only sites without CTCF did not exhibit increases in mutation frequency. Given the propensity for combination CTCF/cohesin co-occupied sites to participate in long-range interactions (Hadjur et al., 2009; see below for a more detailed discussion on cohesin), we posit that genome configurations connected by CTCF and cohesin might be susceptible to genome mutations. Third, SNPs in the core CTCF consensus have also been linked to perturbations in CTCF-mediated chromatin looping (Tang et al., 2015), and a recent study showed that introducing a single additional base pair to one specific CTCF-binding motif was sufficient to disrupt chromatin architecture (Sanborn et al., 2015). Together, these data support the possibility that genome mutations can occur at CTCF-binding sites, leading to abrogated CTCF occupancy and disruption of chromatin looping.
Perturbation of TAD boundaries in disease
Megabase-scale TADs are largely invariant across cell types; therefore, their role in gene expression regulation has not been straightforward to dissect. One leading idea is that TADs function to selectively restrict the long-range interaction landscape of enhancers to prevent off-target gene activation. Indeed, enhancer-promoter contacts typically occur within the demarcated boundaries of TADs, and disruption of TAD structure leads to ectopic expression of genes adjacent to the perturbed boundary (Nora et al., 2012; Anderson et al., 2014; Dowen et al., 2014; Symmons et al., 2014; Dekker and Heard, 2015; Lupiáñez et al., 2015; Hnisz et al., 2016). Emerging evidence suggests that disruption of TAD boundaries and concomitant aberrant enhancer-promoter contact is a pathological mechanism implicated in several different human diseases, including cancer and congenital diseases such as aberrant limb development (Ibn-Salem et al., 2014; Lupiáñez et al., 2015; Hnisz et al., 2016; Ji et al., 2016).
Two recent studies shed light onto a mechanism linking TAD boundary and looping disruption to gene expression deregulation in cancer. In the first study, Ji et al. (2016) identified chromatin loops demarcating subTADs/TADs around key developmentally regulated enhancers and their target genes. Deletion of the loop-forming CTCF sites led to ectopic enhancer looping and activation of genes outside of the domain. Consistent with previous studies (Katainen et al., 2015), >7,000 mutations overlapping anchors of CTCF/cohesin-mediated looping interactions were observed in the International Cancer Genome Consortium database, suggesting that loops could be altered in disease (Ji et al., 2016). In the second study, Hnisz et al. (2016) identified a second functional class of subTADs that form insulated neighborhoods around key proto-oncogenes to keep them in an inactivated state. CRISPR/Cas9 deletion of key CTCF sites disrupts the subTAD around the silent proto-oncogenes TAL1 or LMO2, resulting in proto-oncogene activation via ectopic long-range interactions with distal enhancers outside of the domain. Together, these data support a model in which some “silencing” subTADs function to prevent proto-oncogenes from activation and some “activating” subTADs create insulated neighborhoods around key developmentally regulated enhancer-promoter interactions. Disruption of boundaries via mutation in the CTCF consensus sequence might result in (1) the escape of enhancers from “activating” subTADs to ectopically up-regulate nearby cancer-associated genes outside of the domain or (2) the ectopic looping of distal enhancers into a “silencing” subTAD to aberrantly up-regulate inactive oncogenes (Fig. 1 C). These results suggest that even a small number of disrupted CTCF-binding sites could have a profound effect on gene expression changes that ultimately lead to disease.
Beyond cancer models, 3D genome domain disruption has also been observed in other mammalian disorders. One study focused on limb malformation diseases demonstrated that perturbation of TAD boundaries via genetic rearrangements can lead to altered gene expression and shortened and/or fused digits (Lupiáñez et al., 2015). The authors first identified genetic rearrangements that were associated with phenotypes of limb malformation in a patient population. Nearly all disease-associated genetic rearrangements were adjacent to the developmentally regulated gene EPH Receptor A4 (Epha4) and involved deletion or displacement of a TAD boundary (Fig. 2). To test the hypothesis that TAD boundary disruption contributes to digit malformation, genetic rearrangements that mimic those found in the limb malformation disease states were introduced in mice with CRISPR/Cas9. Importantly, the disease-associated genetic rearrangements recapitulated the disease phenotypes, disrupted TAD integrity, and resulted in ectopic long-range contacts over the mutated boundary between the Epha4 enhancer and developmentally regulated genes in the adjacent domain (Fig. 2). Consistent with the cancer studies, these data also support the idea that disruption of TAD boundary integrity, from either deletion (Fig. 2 B) or inversion (Fig. 2 C), can lead to off-target long-range contacts and aberrant gene deregulation. A lingering question from these studies is whether it is truly CTCF demarcation of boundaries that is the critical mechanistic driver of boundary disruption or other regulatory elements at the boundaries. This issue was directly addressed by creating transgenic mice with deletions resembling those seen in disease phenotypes, but with the CTCF-binding sites intact. Strikingly, ectopic 3D contacts, aberrant gene expression, and abnormal phenotypes were not observed despite the large-scale genomic deletion (Lupiáñez et al., 2015). 
Together, these data suggest that a subset of constitutive CTCF sites create subTADs via the formation of loops and that disruption of these boundaries in disease can lead to aberrant gene expression activation.
Aberrant CTCF-binding site methylation is implicated in cancer
The mechanism by which differential CTCF occupancy is specified across cellular states is a critical aspect of understanding how the genome reconfigures in healthy development and disease. A leading hypothesis is that cell type–specific methylation of CpGs within the CTCF-binding motif is refractory to CTCF occupancy (Renda et al., 2007). Renda et al. (2007) demonstrated with in vitro gel shift assays that methylation of two critical CpGs in the 12-bp core CTCF-binding motif disrupts CTCF binding (Fig. 1 A). A more recent study examined the link between genome-wide CTCF occupancy and DNA methylation (Wang et al., 2012). By integrating CTCF chromatin immunoprecipitation sequencing in 19 different cell lines with bisulfite sequencing data, the authors confirmed that increased DNA methylation was negatively associated with CTCF occupancy. Two specific CpGs within the core CTCF-binding motif showed a strong correlation with differential occupancy due to DNA methylation, one of which Renda et al. (2007) had previously identified (Fig. 1 A). Intriguingly, global reduction in DNA methylation via knockout of DNA methyltransferases was insufficient to reinstate CTCF binding at the vast majority of CTCF-binding motifs (Maurano et al., 2015). We posit that additional cell type–specific epigenetic features would need to be remodeled to facilitate CTCF rebinding after removing DNA methylation. Together, these results support a model in which the acquisition of DNA methylation readily disrupts CTCF binding, whereas the removal of DNA methylation is insufficient for ubiquitous reengagement of CTCF with the genome.
Because DNA methylation can modulate CTCF occupancy during normal development (Bell and Felsenfeld, 2000; Hark et al., 2000; Kanduri et al., 2000), it is plausible that aberrant DNA methylation patterns could disrupt genome architecture in disease. A recent study investigated a particular subtype of cancer, isocitrate dehydrogenase (IDH) mutant gliomas, and asked whether the widespread DNA hypermethylation that occurs in this disease could lead to perturbations to 3D genome organization (Flavahan et al., 2016). The authors identified a subset of CTCF sites with a higher degree of methylation and abrogated CTCF occupancy in IDH mutant gliomas compared with non-IDH mutant gliomas. A critical TAD boundary was perturbed in the IDH mutant glioma condition near the oncogene platelet derived growth factor receptor α (PDGFRA). The perturbed boundary contained aberrant methylation at a key CpG residue in the CTCF-binding motif and reduced occupancy of CTCF in the IDH mutant glioma condition (Fig. 1 C). As a consequence of the loss of boundary integrity, a long-range enhancer 900 kb upstream aberrantly contacted the PDGFRA promoter in the IDH mutant cells, leading to up-regulation of PDGFRA and a selective growth advantage. These results provide a striking example of the causal link among aberrant DNA methylation, altered CTCF occupancy, miswired 3D genomic contacts, and misregulation of gene expression.
Overall, the new dimension of aberrant CTCF-binding site methylation and altered genome organization in disease inspires new ideas for druggable targets and therapeutic intervention. Indeed, Flavahan et al. (2016) treated an in vitro model of IDH mutant gliomas with the demethylating agent 5-azacytidine and found that CTCF occupancy was increased and PDGFRA expression was down-regulated. Although a subset of CTCF sites exhibit poor ability to regain occupancy after methylation (Maurano et al., 2015), these results suggest that there might also be a subset of readily “reactivated” sites that are responsive to drug intervention. Of note, these results also suggest that TAD boundary disruption might be reversible with a specific, well-designed therapy.
The nuclear lamina and laminopathies
The nuclear lamina is a structure lining the nuclear membrane that plays an integral role in chromatin organization and gene expression (Mattout et al., 2015). It is composed of a meshwork of filamentous lamin proteins that provides structural support to the nucleus and anchor points for 3D chromatin folding configurations (Dittmer and Misteli, 2011). DamID, a technique that involves the fusion of DNA adenine methyltransferase to a protein of interest and genome-wide mapping of methylated adenine residues, has identified 1,300 lamina-associated domains (LADs) ranging from 0.1–10 Mb in size (Guelen et al., 2008). LADs have sharply demarcated boundaries that are enriched for CTCF and YY1 (Guelen et al., 2008; Harr et al., 2015) and in some cases directly overlap with TADs (Dixon et al., 2012). Chromatin in LADs is enriched for the histone modification H3K9me2/3, and genes in LADs tend to be inactive (Guelen et al., 2008; Wen et al., 2009; Peric-Hupkes et al., 2010). Interestingly, key pluripotency genes were found to associate with the nuclear lamina and become repressed as embryonic stem cells differentiate, whereas genes are released from the lamina concurrently with activation of lineage-specific transcription (Peric-Hupkes et al., 2010). Thus, the nuclear lamina provides anchor points around which chromatin is organized in distinct patterns during development.
Mutations to the lamin and lamina-associated proteins lead to a set of disorders known collectively as “laminopathies” (Dittmer and Misteli, 2011; Mattout et al., 2015). Laminopathy phenotypes include cardiomyopathy, neuropathy, and premature aging and can affect single or multiple organ systems (Mattout et al., 2015). Uncovering the mechanistic basis of these disorders across the diverse set of mutations and phenotypes represents an active area of investigation, and much remains unknown. Knockout of LMNA, the gene containing the highest number of laminopathy-associated familial mutations, leads to severely delayed postnatal growth and muscular dystrophy in mice (Sullivan et al., 1999; Nikolova et al., 2004). Moreover, homozygous disruption of the LMNB1 gene is fatal at birth in mice (Vergnes et al., 2004), and a limited number of laminopathy-associated mutations in LMNB1 or LMNB2 have been identified (Dittmer and Misteli, 2011). Intriguingly, retinal cells of nocturnal animals are deficient in two key nuclear lamina proteins (LMNA and LBR) and exhibit an inversion of the typical distribution of heterochromatin from the nuclear periphery to nuclear interior (Solovei et al., 2009, 2013). Deletion of the LMNA and LBR genes recapitulates the inversion of heterochromatin from the periphery to the nuclear interior in cells across many tissue types (Solovei et al., 2013). Further investigation is required to elucidate the mechanism by which mutations in nuclear lamina proteins can lead to a diverse array of disease phenotypes, but it is clear that the dramatic reorganization of chromatin structure and concomitant changes in gene expression are linked to the disruption of the nuclear lamina (Burke and Stewart, 2002; Nikolova et al., 2004; Galiová et al., 2008; Peric-Hupkes et al., 2010; Dittmer and Misteli, 2011; Solovei et al., 2013; Talamas and Capelson, 2015). 
Thus, laminopathies represent another class of disorders in which disruption of proper chromatin organization may play a significant role in driving pathogenic phenotypes.
Cohesin and cohesinopathies
Cohesin is a multisubunit, ring-like protein complex that plays a crucial role in sister chromatid cohesion (Michaelis et al., 1997), regulation of gene expression (Parelho et al., 2008; Rubio et al., 2008; Wendt et al., 2008; Kagey et al., 2010; Faure et al., 2012), and DNA replication (Guillou et al., 2010; Nasmyth, 2011). Recently, subunits of the cohesin complex have also emerged as key architectural proteins connecting the genome into unique configurations during interphase (Hadjur et al., 2009; Kagey et al., 2010; Phillips-Cremins et al., 2013; Seitan et al., 2013; Haarhuis et al., 2017). One leading model is that cohesin works in concert with CTCF to anchor long-range looping interactions that are largely invariant between cell types by forming a ring around the fragments anchoring the base of the loop. Cohesin subunits can also anchor short-range, cell type–specific loops between enhancers and promoters in a CTCF-independent manner (Kagey et al., 2010; Phillips-Cremins et al., 2013), suggesting that cohesin can work with many classes of architectural DNA-binding proteins.
An intriguing model based on loop extrusion has been proposed for the physical mechanism by which cohesin forms chromatin loops (Nasmyth, 2001; Alipour and Marko, 2012; Sanborn et al., 2015; Fudenberg et al., 2016). In the loop extrusion model, a segment of DNA becomes entrapped within the ring-like structures of cohesin, and then the loop lengthens as cohesin slides along the DNA until it reaches a boundary element. This model has gained traction recently because of a series of empirical and simulation-based Hi-C studies (Sanborn et al., 2015; Fudenberg et al., 2016; Haarhuis et al., 2017). Fudenberg et al. (2016) simulated TAD structures that recapitulate Hi-C data by modeling chromatin as a polymer that is subject to the activity of loop extruding factors such as cohesin and boundary elements such as CTCF. Moreover, Sanborn et al. (2015) experimentally manipulated the placement and orientation of CTCF-binding sites and found that the loop extrusion model could correctly predict the effect the manipulations would have on Hi-C heatmaps.
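The core idea of the loop extrusion model can be illustrated with a minimal one-dimensional sketch. It is not the polymer simulation of Fudenberg et al. (2016): the chromatin fiber is reduced to a row of bins, the function name `extrude_loop` and all bin positions are hypothetical, and boundary orientation is ignored. Both legs of a cohesin ring move outward one bin per step until each reaches a boundary element or the end of the fiber.

```python
def extrude_loop(load_site, boundaries, n_bins):
    """Extrude a loop outward from load_site on a fiber of n_bins bins.

    Each leg advances one bin per step and halts when it sits on a
    boundary element (e.g. a CTCF-bound site) or at the end of the fiber.
    Returns the (left, right) bin indices of the final loop anchors.
    """
    boundary_set = set(boundaries)
    left = right = load_site
    while True:
        # A leg may move only if it is not on a boundary and not at a fiber end.
        can_left = left > 0 and left not in boundary_set
        can_right = right < n_bins - 1 and right not in boundary_set
        if not (can_left or can_right):
            break
        if can_left:
            left -= 1
        if can_right:
            right += 1
    return left, right


# Hypothetical fiber of 100 bins with boundary elements at bins 20 and 60.
print(extrude_loop(35, [20, 60], 100))  # (20, 60)
# Removing the boundary at bin 60 lets the right leg run to the fiber end.
print(extrude_loop(35, [20], 100))      # (20, 99)
```

In this toy setting, deleting a boundary element extends the loop into the neighboring domain, which is the qualitative behavior the model predicts when CTCF-binding sites are removed.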
Recently, compelling experimental evidence supporting the loop extrusion model for cohesin function has been reported (Haarhuis et al., 2017). The authors explored the effect of two cohesin regulators, WAPL and SCC4, on chromatin organization via CRISPR-Cas9 genome editing and high-resolution Hi-C. WAPL is a protein responsible for the unloading of cohesin from the genome, whereas SCC4 works together with NIPBL to load cohesin onto the genome. Deleting the Wapl gene led to an increase in the number of cohesin sites genome-wide and a striking increase in loop length, suggesting that the amount of time a cohesin molecule remains in contact with the genome is a critical factor governing loop length. Conversely, disruption of the Scc4 gene led to decreased Smc1 binding genome-wide, more diffuse TAD boundaries, and a significant decrease in loop size. Thus, experimental studies are consistent with the role for cohesin as a loop extrusion factor and indicate that the dynamics of cohesin engagement with the genome determine the length of chromatin loops.
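The proposed dependence of loop length on cohesin residence time can be made concrete with a toy calculation. This is an illustrative sketch under stated assumptions, not the analysis performed by Haarhuis et al. (2017): a per-step unloading probability (`unload_prob`, a hypothetical stand-in for WAPL activity) is checked before every extrusion step, each step enlarges the loop by two bins, and boundary elements and cohesin loading are left out. Under these assumptions the number of steps is geometrically distributed, so the expected loop length is 2(1 − p)/p bins for unloading probability p.

```python
import random


def expected_loop_length(unload_prob, bins_per_step=2):
    """Expected loop length (in bins) when unloading is tested before each step."""
    if not 0 < unload_prob <= 1:
        raise ValueError("unload_prob must be in (0, 1]")
    return bins_per_step * (1 - unload_prob) / unload_prob


def simulate_mean_loop_length(unload_prob, n_cohesins=20000, bins_per_step=2, seed=0):
    """Monte Carlo estimate of the mean loop length under the same assumptions."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_cohesins):
        length = 0
        # Keep extruding until the cohesin is unloaded.
        while rng.random() >= unload_prob:
            length += bins_per_step
        total += length
    return total / n_cohesins


# Lower unloading probability (longer residence time) gives longer loops.
for p in (0.2, 0.1, 0.05):
    print(p, expected_loop_length(p), simulate_mean_loop_length(p))
```

Halving the unloading probability roughly doubles the expected loop length in this sketch, in line with the qualitative observation that loss of the unloading factor lengthens loops.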
Mutations in cohesin subunits or regulators have been observed in a class of diseases known as cohesinopathies (Liu and Krantz, 2008; Ball et al., 2014). One of the most widely studied cohesinopathies, Cornelia de Lange syndrome, is caused by mutations in the various subunits of cohesin (Rad21, Smc1A, Smc3) or the cohesin-loading factor NIPBL (Liu and Krantz, 2008). Cornelia de Lange syndrome is characterized by upper limb defects, distinct craniofacial features, growth retardation, microcephaly, and intellectual disability (Liu and Krantz, 2008). The precise mechanism(s) by which cohesinopathy-associated mutations give rise to disease have yet to be elucidated. Given the crucial role of cohesin in facilitating short-range enhancer-promoter interactions and longer range CTCF-mediated interactions, it is possible that the disruption of loop extrusion could be a driving pathological mechanism behind Cornelia de Lange syndrome (Kagey et al., 2010; Phillips-Cremins et al., 2013; Sanborn et al., 2015; Fudenberg et al., 2016; Haarhuis et al., 2017). Future studies that explore the effect of cohesinopathy-associated mutations on loop extrusion and gene expression will likely uncover new mechanisms of disease as well as offer further insight into the mechanisms of cohesin-mediated chromatin organization.
Nuclear organization and chromosomal rearrangements in cancer
Genomic rearrangements are implicated in the pathogenesis of many types of cancer (Vogelstein et al., 2013), and studies investigating the mechanism of the formation of rearrangements have long hypothesized that nuclear organization plays a fundamental role. Rearrangements occur when two or more double-stranded DNA breaks (DSBs) are located proximally enough to fuse (Wijchers and de Laat, 2011). Two models have been proposed for the mechanism by which this occurs: the “breakage first” and the “contact first” models (Misteli and Soutoglou, 2009). The “breakage first” model asserts that two DSBs may be distally located and then traverse large spatial distances to find each other and rearrange (Aten et al., 2004). However, imaging studies indicated that chromatin fragments do not typically move distances greater than 0.5–1 µm (Soutoglou et al., 2007; Roukos et al., 2013), suggesting that the model might not be valid. In contrast, the “contact first” model asserts that genomic rearrangements occur between two loci that were already proximally located at the time of DSB formation (Fig. 3). In support of this hypothesis, DNA FISH studies have uncovered that the frequency of translocations between two chromosomes is related to their spatial proximity in interphase nuclei (Croft et al., 1999; Roix et al., 2003). Roix et al. (2003) queried the nuclear positions of seven genes on distinct chromosomes involved in translocations in patients with B cell lymphomas. Intriguingly, they observed that gene pairs that were located spatially closer to each other in nuclei of healthy B cells were more likely to be involved in translocations in the patient population. Consistent with this finding, high-resolution in situ hybridization demonstrated that chromosome territories intermingle significantly in healthy cells and that the degree of intermingling between specific chromosomes correlates with translocation frequency (Branco and Pombo, 2006).
To assess the relationship between 3D proximity and the formation of translocations genome-wide, an independent study (Zhang et al., 2012) mapped all observed translocation partners of experimentally induced DSBs in pro-B cells using high-throughput genome-wide translocation sequencing (Chiarle et al., 2011; Klein et al., 2011). When Hi-C maps of healthy pro-B cells were compared with translocation maps, the 3D interaction counts between whole chromosomes were found to exhibit a strong correlation with the frequency of translocation, suggesting that spatial genomic proximity precedes translocation. Thus, both locus-specific microscopy and genome-wide sequencing experiments support a contact-first model in which the spatial proximity of DNA is related to the frequency of translocation when DSBs are abundant.
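The type of comparison described above can be sketched as a correlation between interchromosomal Hi-C contact counts and translocation counts for the same chromosome pairs. The values below are hypothetical placeholders chosen only to make the example run; they are not data from any of the cited studies, and Pearson correlation is used here simply as one reasonable choice of statistic.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values keyed by chromosome pair (not real measurements).
hic_contacts = {("chrA", "chrB"): 100, ("chrA", "chrC"): 250,
                ("chrB", "chrC"): 400, ("chrA", "chrD"): 820}
translocations = {("chrA", "chrB"): 2, ("chrA", "chrC"): 5,
                  ("chrB", "chrC"): 9, ("chrA", "chrD"): 15}

pairs = sorted(hic_contacts)
r = pearson([hic_contacts[p] for p in pairs],
            [translocations[p] for p in pairs])
print(f"Pearson r between contact and translocation counts: {r:.2f}")
```

A positive coefficient on real data would be consistent with the contact-first model, although a correlation alone cannot establish that proximity causes translocation.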
Disease-associated alleles and long-range enhancer-promoter interactions
3D genome folding patterns can also provide new insight into understanding how genetic sequence variation is linked to differences in disease susceptibility among individuals. Genome-wide association studies (GWAS) have shown utility in identifying candidate SNPs statistically associated with complex diseases. However, to date the functional role for many candidate SNPs in the underlying mechanisms of pathogenesis remains undefined. One critical reason why the functional significance of SNPs has been difficult to assess is that the target genes they regulate are unknown. Indeed, more than 2,600 SNPs have been associated with human disease, but the large majority fall in noncoding intronic and intergenic regions with unknown target genes (Hindorff et al., 2009; Maurano et al., 2012; Schaub et al., 2012). Approaches in which GWAS hits are linked to the closest, or most biologically plausible, gene candidate have often been misleading, resulting in time-consuming genetic and functional dissection of genes that end up being noncausal for the trait of interest. The ability to dissect a candidate locus into the constituent SNPs that causally contribute to altered protein expression and/or function would be transformative toward understanding the mechanisms underlying disease susceptibility.
Understanding how the genome folds in 3D has recently shown promise in facilitating the connection of distal SNPs with their target genes. A classic example of this approach is found at the fat mass and obesity associated (FTO) gene, where GWAS have uncovered a 47-kb locus of SNPs in high linkage disequilibrium exhibiting a strong statistical association with obesity and type II diabetes (Dina et al., 2007; Frayling et al., 2007; Scuteri et al., 2007). Notably, although located in an FTO intron, the SNPs have failed to exhibit a clear causal link to FTO expression. Thus, although FTO encodes an enzyme involved in metabolism and body weight, its variability in expression might not be specifically governed by its intronic SNPs. To understand the regulatory landscape of the broader genomic locus around FTO, chromosome conformation capture was used in E9.5 mouse embryos, adult mouse brains, and human cell lines (Smemo et al., 2014). In adult mouse brain tissue, the obesity-associated locus did not interact with the FTO promoter but did form an unexpected 3D connection ∼500 kb downstream with the Irx3 gene. Using transgene assays, the intronic SNPs were found to have enhancer activity in some of the cells and tissues in which Irx3 was expressed. Moreover, in an expression quantitative trait locus mapping study with ∼150 human brain samples, a strong statistical association was reported between 11 of the obesity-linked intronic SNPs and Irx3, but not FTO, expression. Notably, knockout of Irx3 resulted in (1) a 25%–30% reduction in body weight, (2) resistance to obesity induced by a high-fat diet, and (3) reduction in occurrence of age-induced metabolic disorders compared with WT mice. These results support a model in which the obesity-associated SNPs overlap an enhancer that functions via long-range interactions to contact and differentially regulate Irx3 instead of FTO (Smemo et al., 2014).
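The difference between nearest-gene assignment and contact-based assignment can be shown with a small sketch. The gene names, coordinates, and contact score below are hypothetical and only mimic the layout of the locus described above (a variant inside one gene that contacts a second gene roughly 500 kb away); they are not real genomic coordinates, and the functions are not taken from any published pipeline.

```python
def nearest_gene(snp_pos, tss_by_gene):
    """Return the gene whose transcription start site (TSS) is closest to the SNP."""
    return min(tss_by_gene, key=lambda gene: abs(tss_by_gene[gene] - snp_pos))


def contacted_genes(snp_pos, tss_by_gene, contacts):
    """Return genes whose TSS lies in an anchor that contacts the SNP's anchor.

    contacts is a list of (anchor_a, anchor_b, score) tuples, where each
    anchor is a (start, end) interval on the same chromosome. Genes are
    returned in order of decreasing contact score.
    """
    best_score = {}
    for anchor_a, anchor_b, score in contacts:
        # A contact is symmetric, so test the SNP against both anchors.
        for snp_anchor, partner in ((anchor_a, anchor_b), (anchor_b, anchor_a)):
            if not snp_anchor[0] <= snp_pos <= snp_anchor[1]:
                continue
            for gene, tss in tss_by_gene.items():
                if partner[0] <= tss <= partner[1]:
                    best_score[gene] = max(score, best_score.get(gene, score))
    return sorted(best_score, key=best_score.get, reverse=True)


# Hypothetical locus: the SNP sits 50 kb from GENE_NEAR and 500 kb from GENE_FAR.
tss_by_gene = {"GENE_NEAR": 1_000_000, "GENE_FAR": 1_550_000}
snp_pos = 1_050_000
contacts = [((1_040_000, 1_060_000), (1_540_000, 1_560_000), 12.5)]

print(nearest_gene(snp_pos, tss_by_gene))               # GENE_NEAR
print(contacted_genes(snp_pos, tss_by_gene, contacts))  # ['GENE_FAR']
```

The two rules nominate different genes for the same variant, which is why a chromatin contact map can redirect functional follow-up away from the closest gene.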
The long-range gene-enhancer regulation model is not restricted to obesity and the FTO locus. Evidence continues to accumulate suggesting that a large proportion of disease-associated SNPs fall in enhancers. Furthermore, in several recent high-resolution mapping studies, enhancers have shown a strong propensity for interacting distally beyond the next adjacent gene (Li et al., 2012; Sanyal et al., 2012; Sahlén et al., 2015). Therefore, to move beyond the SNP-proximal transcription start site model, several additional recent studies have pursued the mapping of higher order genome folding around their candidate susceptibility loci. Chromatin interactions were mapped around 14 colorectal cancer risk loci in cancer cell lines as well as around SNPs linked to four autoimmune disorders in human B and T cell lines (Jäger et al., 2015; Martin et al., 2015) thus uncovering novel interactions between risk loci and disease-relevant genes. Although much work is left to be done in mapping enhancer activity, gene expression, and 3D interactions in biologically relevant cell models with genotypes linked to disease states, these initial studies provide an important advance in linking genetic variation with 3D genome folding and disease.
Conclusions
The mammalian genome is organized in a nested hierarchy of unique topological features ranging from loops to chromosome territories. Perturbations to genome organization at each length scale have been observed in human disease. At the level of chromosome territories, the spatial proximity of chromosomes directly influences the probability of translocation between specific genomic loci (Croft et al., 1999; Roix et al., 2003; Parada et al., 2004; Branco and Pombo, 2006; Zhang et al., 2012). Perturbation of TAD and subTAD boundaries via mutations, deletions, or aberrant methylation of CTCF-binding sites has been linked to pathological activation of genes by the ectopic looping of enhancers in limb malformation syndromes (Lupiáñez et al., 2015) and cancer (Flavahan et al., 2016; Hnisz et al., 2016). Mutations affecting proper cohesin function and the structural properties of the nuclear lamina give rise to diseases known as cohesinopathies and laminopathies, respectively. Additionally, a mutation in an intron leading to aberrant enhancer-promoter contacts has implicated a new target gene linked to obesity (Smemo et al., 2014). Data thus far indicate that CTCF can be disrupted in disease via (1) reduction in global CTCF levels, (2) mutations of CTCF ZNFs, (3) mutations to the binding motif, or (4) aberrant methylation of the consensus sequence. Perturbations to proper CTCF function can lead to severe disruptions in 3D genome organization in a wide range of human diseases. Aberrant 3D genome folding represents a new dimension in understanding sporadic and familial disease states and might inspire a unique class of therapeutic interventions based on preventing or rewiring pathological 3D contacts.
Acknowledgments
Jennifer E. Phillips-Cremins is a New York Stem Cell Foundation – Robertson Investigator and an Alfred P. Sloan Foundation Fellow. This research was supported by The New York Stem Cell Foundation (to J.E. Phillips-Cremins), the Alfred P. Sloan Foundation (to J.E. Phillips-Cremins), and the National Institutes of Health Director’s New Innovator Award from the National Institute of Mental Health (1DP2MH11024701; J.E. Phillips-Cremins).
The authors declare no competing financial interests.