Imaging studies, high-resolution chromatin conformation maps, and genome-wide occupancy data of architectural proteins have revealed that genome topology is tightly intertwined with gene expression. Cross-talk between gene-regulatory elements is often organized within insulated neighborhoods, and regulatory cues that induce transcriptional changes can reshape chromatin folding patterns and gene positioning within the nucleus. The cause–consequence relationship of genome architecture and gene expression is intricate, and its molecular mechanisms are under intense investigation. Here, we review the interdependency of transcription and genome organization with emphasis on enhancer–promoter contacts in gene regulation.
Overview
The genetic material of eukaryotic cells is encased by the nuclear envelope. The inner of the two nuclear membranes is attached to a fibrillar network of proteins that constitutes the nuclear lamina, which has a role in genome organization (Akhtar and Gasser, 2007; van Steensel and Belmont, 2017). Chromosomes occupy distinct territories within the nucleus, and their radial positioning correlates with gene content and activity (Cremer et al., 2006). Gene-poor regions and repressed genes are closer to the nuclear periphery and are often found in close contact with the lamina as part of lamina-associated domains (van Steensel and Belmont, 2017). Gene-rich, active regions tend to localize centrally (Bickmore and van Steensel, 2013) but are also found at nuclear pore complexes that perforate the nuclear envelope. In addition, the nucleoplasm contains numerous presumably self-organizing structures with roles in gene expression, such as the nucleolus, Cajal bodies, splicing bodies, and promyelocytic leukemia bodies, that are not surrounded by membranes (Mao et al., 2011). Transcriptionally inactive chromatin can aggregate near nuclear chromocenters (pericentromere-associated domains; Wijchers et al., 2015) and is found near the nucleolus in so-called nucleolus associating domains (Németh et al., 2010; van Koningsbruggen et al., 2010). Repositioning of loci from the nucleolus to the lamina after cell division suggests mobility of transcriptionally repressed DNA between different heterochromatic sites (Kind et al., 2013). Differentiation-induced activation can move genes away from the lamina (Peric-Hupkes et al., 2010), and as an additional example of gene mobility upon activation, loci can be extruded from chromosome territories (CTs; Ragoczy et al., 2003; Chambeyron and Bickmore, 2004). This indicates that gene positioning within different nuclear environments is closely linked to gene activity (Fig. 1).
Imaging studies have provided numerous fundamental insights into global nuclear organization, but techniques based on chromosome conformation capture (3C) enabled unprecedented views of chromosome folding, reaching single-kilobase resolution of chromatin contacts (Cullen et al., 1993; Dekker et al., 2002). 3C and its high-throughput derivatives (Denker and de Laat, 2016) are based on proximity ligation of DNA fragments and have detected numerous layers of chromosome organization. At the lowest scale, transcriptional regulation was found to involve long-range contacts between regulatory elements, such as enhancers, and their target genes. Enhancers serve as a binding platform for transcription factors that can boost gene transcription, and a characteristic chromatin signature has allowed for their annotation in many cell types (Calo and Wysocka, 2013). Enhancers mostly reside in the noncoding part of the genome and they often act in a cell type–specific manner, thereby establishing specialized gene expression programs characteristic of specific cell types. Genes can be influenced by multiple enhancers that can be located more than a megabase away from their target promoter, and in many cases, chromatin looping was found to allow for communication between elements regardless of distance. However, the regulatory space of enhancers is confined by megabase-scale topologically associating domains (TADs) that are flanked by boundary elements across which the probability of chromatin interactions is reduced (Dixon et al., 2012; Nora et al., 2012; Sexton et al., 2012). Active and repressive chromatin tend to segregate and occupy distinct regions in the nucleus, referred to as A (transcriptionally active) and B (inactive) compartments (Lieberman-Aiden et al., 2009). In addition, intra-chromosomal contacts are more frequent than inter-chromosomal contacts, which is compatible with the concept of CTs that had been described in imaging studies (Cremer and Cremer, 2001).
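As a rough illustration of how proximity-ligation data of this kind are summarized, the following minimal sketch tallies ligation products into intra-chromosomal (cis) and inter-chromosomal (trans) contacts and bins the cis contacts of one chromosome into a sparse contact matrix. The read pairs, the coordinates, the bin size, and the function names `classify_pairs` and `bin_cis_contacts` are all invented for illustration and do not correspond to any published pipeline.

```python
from collections import Counter

# Hypothetical ligation products: (chrom_a, pos_a, chrom_b, pos_b).
# All coordinates are made up for illustration only.
pairs = [
    ("chr1", 10_000, "chr1", 60_000),
    ("chr1", 15_000, "chr1", 1_200_000),
    ("chr1", 40_000, "chr2", 300_000),
    ("chr2", 5_000, "chr2", 25_000),
]


def classify_pairs(pairs):
    """Count intra-chromosomal (cis) and inter-chromosomal (trans) pairs."""
    counts = Counter()
    for chrom_a, _, chrom_b, _ in pairs:
        counts["cis" if chrom_a == chrom_b else "trans"] += 1
    return counts


def bin_cis_contacts(pairs, chrom, bin_size):
    """Tally cis contacts on one chromosome into (bin_i, bin_j) cells, bin_i <= bin_j."""
    matrix = Counter()
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        if chrom_a == chrom_b == chrom:
            i, j = sorted((pos_a // bin_size, pos_b // bin_size))
            matrix[(i, j)] += 1
    return matrix


counts = classify_pairs(pairs)
chr1_matrix = bin_cis_contacts(pairs, "chr1", bin_size=50_000)
print(counts, dict(chr1_matrix))
```

In real datasets the counts are additionally normalized for biases such as fragment length and sequencing depth, which this sketch deliberately omits.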
In this review we explore in greater depth the mutual relationship between enhancer function and architectural features of chromatin, including how certain interactions are favored and stabilized while others are disfavored, and how enhancer activity is regulated by and impacts upon a complex, multi-layered nuclear architectural framework.
Physical proximity can mediate communication among regulatory elements
Transcriptional enhancers are DNA sequences that augment gene transcription (Banerji et al., 1981; Moreau et al., 1981). Enhancers function via a plethora of mechanisms that are initiated by sequence-specific DNA-binding proteins and their coregulators (Bulger and Groudine, 2011; Buecker and Wysocka, 2012; de Laat and Duboule, 2013). Enhancer-bound transcription factors can impact gene activity by initiating the opening of chromatin via recruitment of chromatin remodeling complexes to generate nucleosome-free regions. They can also recruit histone-modifying complexes, basal transcription factors including RNA polymerase 2 (pol2), and/or proteins involved in transcriptional pause-release. Many of these features along with sequence conservation have been used to annotate enhancers genome-wide (Calo and Wysocka, 2013).
Enhancers typically contain clusters of transcription factor binding sites (Fig. 1). Transcription factors and their cofactors can form assemblies (termed enhanceosomes) via cooperative chromatin binding, which can result in synergistic transcriptional outcomes (Merika and Thanos, 2001). In addition, genes can be controlled by multiple enhancers that in some cases are close to each other along the linear chromosome (Fig. 1). Clusters of enhancer elements have been varyingly named locus control region (LCR) as well as stretch-, spread-, or super-enhancers based on size, histone modifications, and level of occupancy by nuclear factors (Heinz et al., 2015; Pott and Lieb, 2015). Constituent enhancer elements can also be scattered across a locus, residing upstream, within, or downstream of the genes they control, and can cluster in 3D to form “hubs” (Patrinos et al., 2004). Yet neither the size of a composite enhancer nor its positioning allows for reliable predictions of its mechanism of action.
Vertebrate enhancers can be located up to over a megabase away from the promoters they act on (Lettice et al., 2003; Sagai et al., 2004). An answer to how these elements can communicate over large distances came with the development of 3C (Cullen et al., 1993; Dekker et al., 2002). 3C and its high-throughput derivatives (4C, 5C, capture-C, and Hi-C, among others) employ stabilization of chromosomal contacts via chemical cross-linking, restriction digestion of DNA, and ligation of DNA fragments, followed by quantification of chimeric DNA fragments (Denker and de Laat, 2016). The underlying concept is that the closer genomic regions are positioned in nuclear space, the more likely they are to be cross-linked and ligated to each other. Original analyses of the β-globin locus by 3C (Tolhuis et al., 2002) or a method called RNA-TRAP (Carter et al., 2002) revealed that an enhancer, the LCR, contacts the β-globin promoter, looping out the intervening ∼50 kb of DNA. The β-globin LCR, which is used frequently here as an example, is a strong erythroid-specific enhancer required for the normal expression of all genes in the cluster. It consists of a cluster of DNase I hypersensitive sites that cooperate to ensure proper globin expression during erythroid differentiation (Bender et al., 1998, 2001; Bulger et al., 2003; Fang et al., 2005). Similar regulatory loops were found for other genes (e.g., Murrell et al., 2004a; Würtele and Chartrand, 2006; Vernimmen et al., 2007), and broad assessments of 3D contacts have revealed that looping is a widespread phenomenon (Sanyal et al., 2012; Rao et al., 2014). 
Systematic examination of chromatin contacts during lineage commitment and development revealed that certain contacts are stable across different developmental stages and cell types, while others, mostly associated with variable chromatin states, are dynamic (Mifsud et al., 2015; Schoenfelder et al., 2015; Javierre et al., 2016; Andrey et al., 2017; Freire-Pritchett et al., 2017). Stable loops may make genes permissive for rapid transcriptional activation upon external stimuli but may also serve as a structure within which more dynamic contacts are formed. The fact that some enhancer–promoter interactions are established de novo to initiate gene expression while other loops are preformed indicates that contacts among regulatory elements do not necessarily result in active transcription and that expression or recruitment of additional transcription factors may be required for the loop to become functional. Moreover, ongoing transcription is not necessary to maintain enhancer–promoter loops (Mitchell and Fraser, 2008; Palstra et al., 2008).
It is important to realize that enhancer–promoter looping, even though widespread, might not be a universal mechanism of distal enhancer function. For example, a neuronal enhancer at the sonic hedgehog (Shh) locus appears to decompact the locus upon activation, thereby increasing its distance from the Shh promoter (Benabdallah et al., 2017, Preprint). This particular scenario seems irreconcilable with an enhancer–promoter looping mechanism. Additional experiments in this study support a mechanism by which enhancer activity spreads along the chromosome to activate the promoter. Nonetheless, distinct Shh enhancers have been shown to engage in looped contacts with the Shh gene promoter (Amano et al., 2009; Williamson et al., 2016). It is intriguing that seemingly different mechanisms evolved to control the Shh promoter over large distances in distinct cell types. An additional consideration is that proximity among regulatory elements might not necessarily be a reflection of simple loops but of more complex chromatin folding patterns in which active loci can be condensed (Williamson et al., 2016). Hence, both decompaction and compaction (perhaps as a result of complex looping structures) can occur upon enhancer activation. Indeed, architectural studies at the Igh locus suggest that spatial confinement may help in forging enhancer–promoter interactions (Lucas et al., 2014; as will be discussed). In conclusion, transcriptional output is dependent on distal enhancer–promoter communication, which can be enabled by chromatin looping. As multiple elements or forces may be involved, high-resolution analysis (using multiple techniques/viewpoints/anchors/probes) may be necessary to interpret the contact complexity of specific loci.
Functional assessment of chromatin loops as a driving force of transcription
Evidence that chromatin loops can be instructive to activate transcription came from tethering experiments in which proximity between regulatory elements was forced in living cells (Nolis et al., 2009; Deng et al., 2012). Zinc-finger–mediated tethering of the candidate looping factor Ldb1 to the β-globin promoter led to recruitment of the LCR and transcription activation in immature erythroid precursors (Deng et al., 2012). This approach was also used to rewire the LCR with a different gene to activate its expression while reducing transcription of the enhancer-deprived gene (Deng et al., 2014). dCas9-mediated tethering of YY1 to the Etv4 promoter in mouse embryonic stem cells (mESCs) increased interaction frequency between the promoter and its enhancer and resulted in increased transcription (Weintraub et al., 2017). These studies indicate that inducing proximity between regulatory elements and promoters can causally underlie transcription.
What is the temporal relationship between transcription and architectural chromatin features? Transcription is a discontinuous process with periods of intense mRNA production (bursts) followed by longer periods of transcriptional silence (Raj and van Oudenaarden, 2008). Output can be modulated by altering burst size, reflective of the number of RNA molecules produced, and/or burst fraction, reflective of frequency and duration of the bursts. Forced juxtaposition between β-globin and the LCR leads to an increase in burst frequency but not burst size (Bartman et al., 2016). Early pioneering RNA FISH studies at this locus (Wijgerde et al., 1995), later revisited using single-molecule RNA FISH (Bartman et al., 2016), provided evidence in favor of a model in which two genes that are under the control of the same enhancer display alternating and thus mutually exclusive contacts with the enhancer. Hence, alternating bursting of these genes might be a reflection of competing enhancer–promoter loops (Fig. 2 A). These findings contrast with a study using an engineered locus in Drosophila melanogaster in which a single enhancer was capable of regulating two flanking genes in a manner such that both genes exhibited synchronous bursting behavior (Fukaya et al., 2016). The latter result suggests that looped contacts are not always mutually exclusive (Fig. 2 A), and that competition for them is not a universal mechanism for gene control at multi-gene loci. Recent live measurements of the proximity between the endogenous even-skipped (eve) enhancers and an integrated eve promoter-driven LacZ reporter gene in Drosophila embryos demonstrated a close correlation between transcriptional bursting and enhancer–promoter juxtaposition (Chen et al., 2018). Together with the aforementioned studies, this suggests that dynamic chromatin contacts can underlie bursty behavior of gene transcription. Moreover, in Chen et al. (2018), the presence of an ectopic eve promoter in the reporter construct diminished expression of the endogenous gene, presumably as a result of promoter competition.
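The distinction between burst size and burst frequency can be made concrete with a toy simulation. The sketch below assumes a deliberately simplified model that is not taken from any of the cited studies: time is divided into discrete steps, a burst fires in each step with a fixed probability (standing in for burst frequency), and every burst releases the same number of RNA molecules (burst size). The function name `simulate_bursts` and all parameter values are hypothetical.

```python
import random


def simulate_bursts(burst_probability, burst_size, n_steps, seed=0):
    """Toy bursting model.

    In each time step a burst fires with a fixed probability and, if it
    fires, releases `burst_size` RNA molecules.
    Returns (number of bursts, mean RNA produced per time step).
    """
    rng = random.Random(seed)
    n_bursts = sum(1 for _ in range(n_steps) if rng.random() < burst_probability)
    return n_bursts, n_bursts * burst_size / n_steps


n_steps = 100_000
# Baseline: hypothetical burst probability 0.1 per step, 5 RNAs per burst.
base_bursts, base_output = simulate_bursts(0.1, 5, n_steps)
# More frequent bursts of unchanged size.
freq_bursts, freq_output = simulate_bursts(0.2, 5, n_steps)
# Equally frequent bursts of doubled size.
size_bursts, size_output = simulate_bursts(0.1, 10, n_steps)

print(base_output, freq_output, size_output)
```

Both perturbations roughly double the mean output in this toy model, but only the first changes the number of bursts, which is why experiments that resolve individual bursts are needed to tell the two modes apart.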
Alternating chromatin loops have also been invoked to explain changes in gene expression over much longer time scales such as during ontogeny. For example, the developmental switch in the expression of β-type and α-type globin genes is thought to involve mutually exclusive interactions with the enhancers (Foley and Engel, 1992; Palstra et al., 2003; Vernimmen et al., 2007), likely controlled by developmentally regulated nuclear factors that foster specific enhancer–promoter contacts, and by developmentally dynamic architectural constraints, as will be discussed below. Competition among poised or active genes for shared enhancers can modulate gene expression and even underlie aberrant gene expression in disease. De novo formation of a transcriptional start site by a gain-of-function regulatory single-nucleotide polymorphism upstream of the human α-globin genes sequestered the distal enhancer away from the α-globin promoters (Fig. 2 B), reducing their expression and resulting in α-thalassemia (De Gobbi et al., 2006). Conversely, increased expression of the MYC proto-oncogene in cancer can result from mutations in the PVT1 promoter (Fig. 2 C), a gene in close proximity to MYC, which normally competes for the same enhancer (Cho et al., 2018). CRISPR interference–based mapping of functional regulatory sequences revealed similar competitive relationships between promoters and enhancers at the MYC and GATA1 loci (Fulco et al., 2016).
Genes can be activated simply by residing within the regulatory range of an enhancer even if their expression is ostensibly irrelevant to the function of the cell in which the enhancer is active, a phenomenon called the bystander effect (Fig. 1). For example, the gene CD79b, which is transcribed and functional in B-lymphocytes, sits between the human growth hormone gene cluster and a distal LCR that drives gene expression in the pituitary gland. Transcription of CD79b is activated in the pituitary gland as a result of being close to the enhancer even though it plays no discernible role in pituitary function (Cajiao et al., 2004). As another example, the NME4 gene that resides 300 kb away from the human α-globin genes is in physical contact with, and under the regulatory influence of, the α-globin enhancer. NME4 competes for the activity of the α-globin enhancer even though it is dispensable for erythroid cell function (Lower et al., 2009). Finally, in the forced chromatin looping experiments described above, tethering of the adult-type β-globin gene to the LCR additionally activated the interspersed embryonic bH1-globin gene, presumably as a result of bringing it closer to the LCR (Deng et al., 2012), which corroborates that proximity can favor regulatory influence.
Transcriptional activity is linked to nuclear positioning
Transcriptional activation (by enhancers) often results in relocalization of genes toward the nuclear interior. Enhancers such as the β-globin LCR can drive positioning of the β-globin locus away from heterochromatin (Francastel et al., 1999) and promote movement of the locus toward the nuclear interior during erythroid cell maturation, which is accompanied by increased gene activity (Ragoczy et al., 2006). The β-globin LCR directs the locus toward foci of engaged (Ser5 phosphorylated) pol2, also referred to as transcription factories (Ragoczy et al., 2006). The function, if any, of gene movement to the nuclear center, although common and generally observed across species, is unclear. As a case in point, the CFTR gene and its neighbors migrate to the nuclear center upon activation in human but not murine cells, in which they are constitutively positioned centrally (Sadoni et al., 2008), suggesting that central nuclear localization might be permissive but not instructive for gene expression. Perhaps similarly, relocalization of β-globin away from its CT does not necessarily result in transcription but primes the region for activation (Ragoczy et al., 2003). Ectopic integration of the LCR in a different chromosome moved the integration site away from the CT and resulted in activation of some, but not all, surrounding genes (Noordermeer et al., 2008). This indicates that regulatory elements can influence nuclear localization, but does not settle the question of whether it is the LCR per se, chromatin opening, or active transcription that causes positioning outside the CT. Experiments in embryonic stem cells showed that targeted chromatin decondensation at a chosen gene in the absence of transcription activation is sufficient to mobilize the locus toward the nuclear interior (Therizols et al., 2014). This suggests that chromatin state but not transcription can be a driving force in nuclear repositioning.
While many observations regarding enhancer function and nuclear architecture have been made at the β-globin locus, most concepts gleaned from this locus are likely to apply to other genes as well. However, it is important to bear in mind that considerations of enhancer function in the context of genome topology are fraught with the limitation that correlation of a particular feature and gene activity does not imply causation.
Gene silencing is an important component of transcriptional programs, and maintenance of this state is indispensable for cell fate specification. The consequences of impaired gene silencing were first observed in Drosophila in the form of severe phenotypes in mutants with derepression of developmental patterning genes such as hox genes (Lewis, 1978; Struhl, 1981; Duncan, 1982; Ingham, 1985). Polycomb group (PcG) protein complexes are now known to recognize silent genes and to maintain their inactive status through repressive epigenomic modifications such as histone 3 lysine 27 trimethylation (H3K27me3; Schwartz and Pirrotta, 2007). Interchromosomal as well as intrachromosomal interactions between PcG-silenced genes were found to underlie the aggregation of silenced chromatin into PcG bodies (Pirrotta and Li, 2012), and recent super-resolution imaging in Drosophila revealed very dense packaging of PcG chromatin (Boettiger et al., 2016). There is considerable evidence that H3K27me3 and H3K9me2/3 promote peripheral localization (Harr et al., 2015), and deposition of the former repressive mark by forced EZH2 recruitment was sufficient to induce repression and compartment (active to inactive) switching (Wijchers et al., 2016). Targeted perturbations, in which genes were tethered to specific nuclear sites, further demonstrated the direct impact of gene positioning on activity (reviewed in Deng and Blobel, 2014; Bartman and Blobel, 2015). For instance, tethering genes to the nuclear lamina resulted in repression of some but not all genes (Andrulis et al., 1998; Finlan et al., 2008; Kumaran and Spector, 2008; Reddy et al., 2008). This indicates that relocalization can drive transcriptional changes but, as discussed, regulatory elements can also drive movement of loci to compartments with a different activity status. Importantly, most of the links between subnuclear positioning and gene expression are still correlative with unexplored cause–effect relationships.
Regulatory interactions in gene repression
Heterochromatinization and loss of gene-specific looping are associated with long-term gene silencing, but architectural changes that occur at the initial stages of repression are less well understood. Especially in higher eukaryotes, studies on the link between genome organization and early gene repression are underrepresented. Polycomb response elements (PREs) in Drosophila are sufficient to recruit PcG proteins that mark regions with H3K27me3, and repress nearby genes (Simon et al., 1993; Chan et al., 1994; Wang et al., 2004). PREs, like enhancers, can be distant from their targets and are able to form long-range contacts with the genes they act on (Schwartz and Cavalli, 2017). Notably, no mammalian counterpart of PREs has been identified so far. One alternative mechanism by which polycomb can be recruited to chromatin involves noncoding RNAs that can bind to the polycomb component PRC2 (Schaaf et al., 2013; Kaneko et al., 2014; Berrozpe et al., 2017). For instance, initiation of X-chromosome inactivation by Xist results from recruiting PRC2 to Xist-occupied sites. Xist spreads across the X-chromosome by scanning and occupying genomic loci that reside in spatial vicinity, which then alters chromosome structure of these sites and allows H3K27me3 to spread to new, and from there other, genomic sites (Engreitz et al., 2013).
Gene repression can be accompanied by acute loss of enhancer–promoter contacts. For example, a repressive cue leads to rapid dissociation between the enhancer and promoter at the Kit locus in maturing murine erythroblasts (Jing et al., 2008). Loss of enhancer–promoter contacts upon cell state change was observed in other cell types as well and is indispensable for shutdown of the pluripotency program and differentiation of mESCs (Whyte et al., 2012; Respuela et al., 2016; Schnappauf et al., 2016; Bonev et al., 2017). At genes under circadian regulation, rhythmic enhancer–promoter contacts are found (Aguilar-Arnal et al., 2013; Kim et al., 2018; Mermet et al., 2018), and the transcriptional repressor Rev-erbα has been implicated in the circadian disruption of enhancer–promoter loops (Kim et al., 2018). This echoes studies in Drosophila suggesting that the transcriptional repressor Snail functions to disrupt enhancer–promoter contacts, earning its description as “anti-looping” factor (Chopra et al., 2012). However, as mentioned in other contexts above, establishing a cause–effect relationship between lost enhancer activity and the disruption of long-range contacts remains an unmet challenge.
Novel chromatin loops can be established during transcriptional repression in yeast (Yadon et al., 2013) and murine cells (Jing et al., 2008). In the case of the former, it is thought that looping juxtaposes sites bound by the repressive chromatin remodeler Isw2 with silenced genes. In the case of the latter, acute repression of the Kit gene was associated not only with a loss of the enhancer–promoter loop but also with a concomitant gain of a repression-specific interaction of a promoter-proximal region with an intronic segment. Whether de novo loop formation at the Kit gene contributes to repression remains an open question. It is possible that alternative looped contacts compete with activating loops to diminish enhancer–promoter contacts. This indicates that loops are dynamic and that chromatin architectural changes seem as tightly coupled to gene repression as to activation.
Which proteins establish and maintain chromatin loops?
A longstanding and difficult-to-answer question relates to the identity of nuclear factors that forge chromatin loops. Early in vitro studies using electron microscopy showed that purified transcription factors capable of dimerization such as Sp1 or the viral protein E2 can form loops when bound to chromatin-free DNA templates (Knight et al., 1991; Su et al., 1991). Pinpointing the proteins at the base of the loops that form the actual connections in vivo remains an unresolved issue. Nonetheless, several proteins have been identified that are thought to contribute to chromatin looping, and a short summary of different factors that were found at the base of dynamic and/or stable long-range contacts will be provided. While some proteins act in a more cell type–specific manner (e.g., GATA1, Ldb1, and EKLF at the β-globin locus in erythroblasts [Drissen et al., 2004; Vakoc et al., 2005; Song et al., 2007], TAF3 in mESCs [Liu et al., 2011], and SATB1 in thymocytes [Cai et al., 2006]), others are thought to serve as more general looping factors (e.g., Mediator, CTCF, cohesin, and YY1 [Rollins et al., 1999; Splinter et al., 2006; Kagey et al., 2010; Lai et al., 2013; Phillips-Cremins et al., 2013; Ing-Simmons et al., 2015; Beagan et al., 2017]).
ChIA-PET and HiChIP are methods that enrich for chromatin contacts associated with selected proteins or chromatin modifications of interest (Fullwood et al., 2009; Mumbach et al., 2016). These approaches have revealed that transcription factors (e.g., estrogen receptor; Fullwood et al., 2009), architectural proteins (e.g., CTCF and cohesin; Tang et al., 2015; Mumbach et al., 2016), and pol2 or histone acetylation (Zhang et al., 2013; Mumbach et al., 2017) can be linked to contacts among multiple enhancers and/or promoters in cis and in trans, thereby creating regulatory nodes, which led to the speculation that transcriptional regulation of multiple genes might be coordinated. However, the degree to which chromatin interactions between chromosomes impact gene expression remains a subject of debate (Cremer and Cremer, 2001; Meaburn and Misteli, 2007; Williams et al., 2010; Cavalli and Misteli, 2013; Bonev and Cavalli, 2016).
While the above reports provide insights into which factors and chromatin marks occupy the base of the loops, they do not establish evidence for their direct involvement as “glue” between contact sites. Studies that monitor in vivo long-range interactions have typically relied on loss-of-function assays of candidate looping factors or regulatory elements themselves (Drissen et al., 2004; Patrinos et al., 2004; Vakoc et al., 2005). However, these approaches typically fail to distinguish direct from secondary effects that might result from transcriptional perturbations. Even in the case of targeted tethering of the candidate looping factor Ldb1 to a predetermined site in the genome, which was successful in forging long-range chromatin contacts in the globin locus as discussed, definitive proof that this occurs in vivo via direct contacts rather than protein intermediates is difficult to attain. This challenge still stands for most, if not all, factors implicated in chromatin looping.
Enhancer function is restricted by insulators
Most enhancers are promiscuous since they are capable of augmenting the expression of commonly used reporter genes with minimal promoters (Picard and Schaffner, 1983). Methods to identify enhancers rely on this concept (e.g., Arnold et al., 2013; Symmons et al., 2014). Moreover, as discussed, ectopic integration of the β-globin LCR into a novel genomic region leads to close association and activation of many but not all nearby genes, including nonerythroid ones (Noordermeer et al., 2008). What, then, underlies the high degree of specificity of enhancer action in vivo? One mode to constrain enhancer function comes from enhancer-blocking insulators (West et al., 2002). Insulators are position-dependent as their placement between an enhancer and promoter nullifies enhancer activity, while placement upstream or downstream is innocuous (Fig. 3; West et al., 2002). This strict position dependence distinguishes them from transcriptional repressor elements (such as PREs) that can silence genes in a more position-independent manner. Interestingly, insulators were found to be able to reposition and colocalize PRE-repressed genes in Drosophila, thereby ensuring their silencing (Sigrist and Pirrotta, 1997; Comet et al., 2011; Li et al., 2011, 2013).
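The position dependence of enhancer blocking can be stated as a simple rule on linear coordinates. The following minimal sketch encodes only that rule, under the simplifying assumption that any insulator lying strictly between an enhancer and a promoter fully blocks their communication; the function name `enhancer_can_act` and the coordinates are hypothetical, and real insulators are neither this absolute nor purely one-dimensional.

```python
def enhancer_can_act(enhancer, promoter, insulators):
    """Return True if no insulator lies strictly between enhancer and promoter.

    Positions are linear genomic coordinates (arbitrary units). This is a
    deliberately simplified, all-or-nothing view of enhancer blocking.
    """
    left, right = sorted((enhancer, promoter))
    return not any(left < position < right for position in insulators)


# Hypothetical layout: enhancer at 100, promoter at 500.
between = enhancer_can_act(100, 500, insulators=[300])      # insulator in between
upstream = enhancer_can_act(100, 500, insulators=[50])      # insulator upstream
downstream = enhancer_can_act(100, 500, insulators=[700])   # insulator downstream

print(between, upstream, downstream)
```

A repressor element such as a PRE would not follow this rule, since its effect does not depend on lying between the two elements.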
CTCF is currently thought to be the major enhancer-blocking insulator protein in mammals (Bell et al., 1999). Thus, even though CTCF can play a role in fostering enhancer–promoter communication, it can also prevent it. This has been well characterized at the imprinting control region (ICR) between the paternally expressed Igf2 and the maternally expressed H19 genes, which operates in part via DNA methylation–sensitive CTCF binding (Bell and Felsenfeld, 2000; Hark et al., 2000; Engel et al., 2004). On the maternal chromosome, CTCF occupies the unmethylated ICR to shield the Igf2 gene from a downstream enhancer. On the paternal chromosome, the ICR is methylated, preventing CTCF binding and enabling the enhancer to interact with the Igf2 promoter (Murrell et al., 2004b; Kurukuti et al., 2006). In addition, it has been proposed that CTCF forms looped contacts with the blocked enhancer (Fig. 3; Yoon et al., 2007). Another instructive example of CTCF insulating an enhancer, presumably via loop formation, is provided by ectopic insertion of a CTCF-bound element between the β-globin genes and the LCR (Hou et al., 2008). The ectopic element pairs with a CTCF-bound element upstream of the LCR, encasing it in a loop (Fig. 3), which is thought to prevent LCR contacts with the globin promoters. Pairing of insulators has been described in Drosophila (Blanton et al., 2003; Byrd and Corces, 2003), but how widespread this dual function of CTCF of preventing loops while engaging in new loops is in mammals remains unclear. Loop formation via pairing of CTCF sites might not be required for enhancer-blocking activity, since it can occur in reporter assays with single CTCF elements and no known partner elements (Bell et al., 1999).
CTCF might alternatively prevent tracking of an enhancer toward a promoter (Zhao and Dean, 2004), which would explain the position dependence of an enhancer blocker and at the same time provide a rationale as to why enhancers cannot simply loop across an insulator element to reach a promoter. This suggests that the role of CTCF as an insulator may involve looping in some but not all cases. Regardless, loss of CTCF can lead to aberrant gene expression, presumably due to enhancer–promoter miswiring.
Originally appreciated as a factor important for sister chromatid cohesion (Michaelis et al., 1997), cohesin was found to colocalize at >50% of CTCF sites across multiple cell lines (Parelho et al., 2008; Rubio et al., 2008; Wendt et al., 2008). It is likely to contribute to the insulation function of CTCF through mediating long-range chromatin interactions. Similarly, the insulating potential of CTCF is dependent on other cofactors (Ghirlando and Felsenfeld, 2016). In addition, insulation can also be obtained in an ostensibly CTCF-independent manner. For instance, the β-globin LCR can function promiscuously under experimental conditions, so why does it not activate fetal-type globin genes in adult erythroid cells and vice versa? Fine-scale chromosomal contact maps in human erythroid cells identified a regulatory element that engages in developmental stage–specific long-range chromatin contacts in a manner that insulates the “wrong” type of globin genes from contacting the LCR (Huang et al., 2017). Deletion of this element (which does not contain any detectable CTCF-occupied sites) in adult cells leads to increased LCR contacts with fetal genes and reactivates them. How this element functions is currently not understood, but it provides an example of a developmental stage- and tissue-specific architectural element that influences enhancer–promoter wiring. This further suggests that different architectural proteins, some of which may be unknown, cooperate to set up a framework to promote or prevent contacts between regulatory elements.
Architectural constraints define regulatory domains
The introduction of Hi-C has enabled the mapping of chromatin interactions genome-wide (Lieberman-Aiden et al., 2009). Contact maps revealed that chromosomes are partitioned into roughly megabase-scale TADs (Dixon et al., 2012; Nora et al., 2012; Sexton et al., 2012). TADs are highly conserved between cell types and species (Dixon et al., 2012; Ho et al., 2014; Vietri Rudan et al., 2015). Accordingly, recent analysis of late primate evolution revealed that boundaries are depleted for evolutionary changes that disrupt their function (Fudenberg and Pollard, 2018, Preprint). The invariant nature of these domains suggests they represent a more general principle of chromosomal organization and have little influence on tissue-specific gene expression programs. This is consistent with the view that specific transcriptional profiles are mainly driven by unique long-range contacts between regulatory elements within TADs as well as compartmentalization of chromatin contacts between TADs (Dixon et al., 2015; Beagan et al., 2016, 2017; Bonev et al., 2017). Uniformity of TADs across tissues also suggests that gene expression cannot be the only determinant of TAD formation even though it likely contributes to it (Bonev et al., 2017). Of note, TADs were detected with high-resolution imaging as well as more recently developed genomic methods that are independent of proximity ligation, suggesting these findings are not reflective of technical biases (Beagrie et al., 2017; Nir et al., 2018, Preprint; Wang et al., 2016; Quinodoz et al., 2018; Szabo et al., 2018). Sub-TADs, which are nested within larger TADs, are more variable between cell types (Phillips-Cremins et al., 2013; Rao et al., 2014). Communication between regulatory elements is generally restricted to such contact domains because interactions across TAD or sub-TAD boundaries are depleted. This was corroborated by insertion of a LacZ reporter-based enhancer trap at random sites in the mouse genome (Symmons et al., 2014).
The reporter genes were mostly responsive to enhancers within the same TAD, indicating that TADs are not just physical structures but function to constrain enhancer influence. The other side of the coin is that TADs can also foster long-range enhancer–promoter contacts. Integration of the Shh limb enhancer at different sites within its TAD showed that the enhancer functions largely independently of distance (Symmons et al., 2016). However, when the TAD was disrupted, the enhancer seemed to function only at shorter distances. Thus, physical constraints imposed by TAD boundaries might not only prevent aberrant enhancer–promoter pairing but also facilitate long-range contacts between them, a concept reminiscent of the bystander effects of forced chromatin looping discussed above.
In line with its insulating and looping potential, CTCF was found to be enriched at TAD boundaries (Dixon et al., 2012; Nora et al., 2012; Phillips-Cremins et al., 2013). While insulator function often refers to blocking the communication between cis-regulatory elements (enhancer blocking insulators) or limiting the spreading of repressive chromatin (barrier insulators), boundary annotations of topological domains are often contact-based (usually measured by Hi-C). While the different functions of insulators can be separated (Recillas-Targa et al., 2002; West et al., 2004), they can be related, and as a result, the definition of these functions has become somewhat blurred. In general, chromatin interactions across insulator/boundary elements are disfavored, and CTCF is often found at the base of loops that define domains (loop domains; Rao et al., 2014). However, not all CTCF-occupied sites are boundaries (Dixon et al., 2012; Nora et al., 2012), which suggests that CTCF binding alone is insufficient for (sub)TAD boundary formation. CTCF binding per se can be tissue-specific, modulated by contextual transcription factors (e.g., Behera et al., 2018), but there are also examples of sites where CTCF is bound across a range of tissues but engaged in looped interactions only in one of them. This is the case, for instance, for CTCF sites flanking the α- and β-globin loci (e.g., Hou et al., 2008; Hanssen et al., 2017; Huang et al., 2017).
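To make the notion of a contact-based boundary concrete, the following minimal Python sketch computes a sliding-window insulation profile on a toy contact matrix. It is an illustration under stated assumptions, not the procedure of any study cited here: the function name `insulation_profile`, the window size, and the two-domain toy matrix are hypothetical, and real Hi-C analyses include normalization and boundary-calling steps that are omitted.

```python
def insulation_profile(contacts, window):
    """Mean contact frequency spanning each bin of a square contact matrix.

    For bin i, average all contacts between the `window` bins upstream of i
    and the `window` bins downstream of i. Low values mean that few contacts
    cross the bin, as expected at a domain boundary. Bins at the matrix
    edges that lack flanking bins on one side yield None.
    """
    n = len(contacts)
    profile = []
    for i in range(n):
        upstream = range(max(0, i - window), i)
        downstream = range(i + 1, min(n, i + window + 1))
        values = [contacts[a][b] for a in upstream for b in downstream]
        profile.append(sum(values) / len(values) if values else None)
    return profile


# Toy matrix (hypothetical values): two 4-bin domains with frequent contacts
# inside each domain (10) and rare contacts between them (1).
n_bins, domain_size = 8, 4
toy = [[10 if a // domain_size == b // domain_size else 1
        for b in range(n_bins)] for a in range(n_bins)]

profile = insulation_profile(toy, window=2)

# Pick the bin with the lowest score (ties resolved toward the lower index).
scored = [(score, i) for i, score in enumerate(profile) if score is not None]
lowest_score, boundary_bin = min(scored)
```

In this toy example the profile dips at bins 3 and 4, which flank the junction between the two domains. The sketch does not capture the point made above that a CTCF-occupied site is not necessarily a boundary, because it uses contact frequencies only.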
The functional relevance of domain boundaries in safeguarding proper gene regulation has been impressively demonstrated in cases where their disruption leads to disease (reviewed in Krijger and de Laat, 2016). Structural variants, such as inversion, duplication, or deletion of a CTCF-associated TAD boundary, can result in pairing between a limb enhancer and a gene normally not regulated by this enhancer, leading to digit malformation (Lupiáñez et al., 2015). Cooks syndrome, characterized by limb malformations, can be caused by a specific duplication event that forms a new TAD in which the KCNJ2 gene is contacted and activated by SOX9 enhancers (Franke et al., 2016). Targeted deletion of CTCF sites at boundaries near the miR-290-295 and Pou5f1 loci in mESCs enabled new enhancer contacts and activation of nearby genes (Dowen et al., 2014). Similar observations have been made at a boundary near an erythroid gene where deletion of a CTCF site enables a tissue- and developmental stage–specific enhancer to act on a housekeeping gene (Hsu et al., 2017). At the HoxA locus, deletion of a CTCF site leads to spreading of active chromatin into a previously repressed domain and inappropriate gene activation (Narendra et al., 2015). In patients with T cell acute lymphoblastic leukemia, microdeletions perturb a CTCF-associated boundary, which results in proto-oncogene activation (Hnisz et al., 2016). In gain-of-function IDH mutant gliomas, DNA hypermethylation prevents CTCF binding and perturbs a domain boundary, allowing an enhancer to aberrantly activate a proto-oncogene (Flavahan et al., 2016). Deletion of a CTCF–cohesin cobound site that demarcates one end of an erythroid-specific sub-TAD surrounding the mouse α-globin locus extended the sub-TAD, enabling an α-globin regulatory element to contact and augment the expression of genes outside the sub-TAD (Hanssen et al., 2017). These examples illustrate how CTCF loss can lead to enhancer–promoter miswiring.
It may, however, be oversimplified to conceptually equate domain boundaries with functional insulation by CTCF. Numerous recent papers show that features such as housekeeping genes, RNA polymerase II occupancy, transcriptional activity, and short interspersed nuclear elements are also enriched at TAD boundaries (Dixon et al., 2012; Nora et al., 2012; Rowley et al., 2017). Transposable element-mediated expansion of transcription factor binding sites (such as for CTCF) was found to be a substrate for co-option of regulatory functions during evolution. Therefore, CTCF and gene activity may (co)evolve to establish regulatory domains (Schmidt et al., 2012; Sundaram et al., 2014).
Global perturbation of architectural proteins and the 3D genome
The previously described examples at single loci highlight the potential consequences of perturbing (CTCF) boundaries, but recent experiments in which architectural proteins were depleted globally report limited transcriptional changes, even though domain configuration was markedly perturbed. Auxin-mediated degradation of CTCF for 24 h in mESCs decreased domain insulation at around 80% of the 5,525 boundaries that were called, but only ∼400 genes were transcriptionally affected (Nora et al., 2017). Approximately half of the genes whose expression changed were down-regulated. Those tended to have CTCF bound close to their transcriptional start site. In addition to the loss of CTCF’s potential role as a transcriptional activator (West et al., 2002), expression changes may result from misregulation of CTCF-mediated loops that arrange pairing among cis-regulatory elements (Phillips and Corces, 2009; Ghirlando and Felsenfeld, 2016). The genes that were up-regulated upon CTCF depletion were often located close to boundaries, raising the possibility that they came under the inappropriate influence of enhancers. However, gene activation was rare under these conditions (Nora et al., 2017), suggesting that other requirements, such as the presence of certain transcription factors, may have to be met in specific cases in order for aberrant transcription to occur even when genome partitioning defects spatially allow for it. This is the case for the β-globin locus, for example, where mutations or deletion of CTCF sites led to new interactions but did not result in activation of surrounding nonerythroid genes by the LCR (Bulger et al., 2003; Splinter et al., 2006).
Contact domains have been proposed to be formed by a mechanism called loop extrusion (Nasmyth, 2001; Sanborn et al., 2015; Fudenberg et al., 2016). Upon recruitment to genomic elements, cohesin complexes are thought to propel the chromatin fiber through their ring-shaped structure until extrusion stalls at convergently oriented CTCF sites (de Wit et al., 2015; Guo et al., 2015; Vietri Rudan et al., 2015). This, in concert with tightly regulated loading and unloading of extrusion factors, results in looped domain-like configurations. This model found support in studies in which cohesin subunits and the factors that load and unload components of the complex were targeted. Knockout of the cohesin unloading factor WAPL in HAP1 cells increased the size of loops between convergent CTCF sites due to loop extrusion beyond primary CTCF sites at which it would normally stall (Haarhuis et al., 2017). Conversely, knockout of the cohesin subunit Rad21 or its loading factor NIPBL resulted in loss of TADs and revealed a 3D organization that was more reflective of compartments based on epigenomic chromatin landscapes (Rao et al., 2017; Schwarzer et al., 2017). This is in line with recent findings in Drosophila, in which genes with comparable transcriptional status were found to converge in mini-domains (Rowley et al., 2017). In-depth analysis of Hi-C data in a human lymphoblastoid cell line revealed comparable active/inactive switches at domain borders. With CTCF present at the majority of these borders (Rowley et al., 2017), this corroborates the interplay between architectural proteins and transcriptional activity in domain formation.
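The stalling logic of the loop extrusion model can be illustrated with a deliberately simplified, deterministic Python sketch. Everything in it is an assumption made for illustration: the function name `extrude_loop`, the bin coordinates, and the rule that each leg of a single extruder stalls only at a CTCF motif pointing into the growing loop. Unloading of extruders, multiple extruders on the same fiber, and stochastic bypass of CTCF sites are not modeled.

```python
def extrude_loop(load_site, ctcf_sites, fiber_length):
    """Toy one-dimensional loop extrusion on a fiber of `fiber_length` bins.

    ctcf_sites maps a bin index to a motif orientation: '+' points toward
    higher coordinates and '-' toward lower coordinates. Assumption of this
    sketch: the leg moving toward lower coordinates stalls only at '+'
    sites and the leg moving toward higher coordinates only at '-' sites,
    so the two anchors of a finished loop are convergently oriented.
    Legs also stop at the ends of the fiber.
    """
    if not 0 <= load_site < fiber_length:
        raise ValueError("load_site must lie on the fiber")
    left = right = load_site
    left_stalled = right_stalled = False
    while not (left_stalled and right_stalled):
        if not left_stalled:
            if ctcf_sites.get(left) == '+' or left == 0:
                left_stalled = True
            else:
                left -= 1
        if not right_stalled:
            if ctcf_sites.get(right) == '-' or right == fiber_length - 1:
                right_stalled = True
            else:
                right += 1
    return left, right


# Hypothetical fiber of 50 bins; the '-' site at bin 12 faces away from the
# loop and therefore does not stall the leftward-moving leg.
sites = {10: '+', 12: '-', 20: '-', 40: '-'}
intact_loop = extrude_loop(15, sites, 50)

# Mimic deletion of the CTCF site at bin 20.
sites_deleted = {pos: strand for pos, strand in sites.items() if pos != 20}
extended_loop = extrude_loop(15, sites_deleted, 50)
```

In this toy setting, the intact fiber yields a loop anchored at bins 10 and 20, whereas removing the site at bin 20 lets extrusion continue to the next correctly oriented site at bin 40. This is only loosely analogous to the enlarged domains reported after CTCF site deletion and makes no quantitative claim.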
Early embryonic lethality of CTCF knockout mice suggests that this protein is indispensable for proper gene regulation (Moore et al., 2012). In addition, cell death is observed in mammalian cells upon prolonged CTCF depletion, and cohesin loss interferes with cell division (Soshnikova et al., 2010; Watson et al., 2014; Gupta et al., 2016; Nora et al., 2017). However, CTCF is not required for embryonic development in Drosophila (Gambetta and Furlong, 2018), and while pronounced developmental and disease phenotypes at the organismal level are observed in individuals with mutations in genes that code for cohesin subunits, these alterations do not prevent viability (Watrin et al., 2016). As discussed, acute depletion of architectural proteins impaired 3D folding without dramatically altering the transcriptome. The ostensibly mild changes at the level of gene expression after domain disruptions raise questions regarding the degree to which large-scale chromatin architecture impinges on gene expression. One needs to keep in mind, however, that boundary loss has been measured mainly by Hi-C. As suggested by recent super-resolution imaging (Bintu et al., 2018), it is possible that smaller shifts and heterogeneity at boundaries as a result of architectural protein depletion might lead to loss of boundary calls in cell population–based experiments even though boundaries could be functionally preserved at the single-cell level. Together with the other examples in which boundary loss does lead to enhancer–promoter miswiring and disease, this suggests that consequences may have to be investigated on a case-by-case basis because of context specificity.
Conclusion
In this review, we attempted to highlight examples of the closely intertwined relationship between chromosomal architectural features and gene regulation. While most studies, especially earlier ones, are correlative when linking gene activity to nuclear architecture, an increasing number includes specifically targeted perturbations, allowing inferences about causal relationships. We have learned that gene positioning, chromosomal looping, and gene expression are in mutually influential relationships. However, the state of a gene cannot predict architectural features surrounding it with certainty and vice versa. For example, as illustrated above, escape from the nuclear periphery, extrusion from the chromosome territory, and enhancer looping are frequently seen at active genes, but none of these features are fully deterministic. Moreover, the interplay between regulatory elements and their nuclear environment may be distinct between genes, where, for example, in one scenario, enhancer–promoter loops exist before activation, whereas in another, looping might be rate-limiting. These considerations are also important when using chromosomal contact maps to interpret genome-wide association study signals and making inferences of nucleotide variants on variable traits or disease mechanisms.
While this field has witnessed a massive expansion in knowledge in a relatively short time span, key challenges remain. These include establishing the identity of nuclear factors that directly assemble the myriad of chromosomal configurations observed. This issue is not easily addressed since depletion, even transiently, of nuclear factors can have wide-ranging secondary effects, and since even precise editing of transcription factor binding sites can affect chromatin association of nearby factors. Hence, loss-of-function and gain-of-function studies have to be combined to strengthen the conclusions. Another important issue that the field has to contend with is that while general principles of chromatin organization and gene regulation have been recognized, there are clearly differences in the way different regulatory elements function and genes are controlled. This necessitates deep reductionist experimentation at individual loci if the goal is to fully understand a specific gene.
Another widely faced challenge in the field is cell-to-cell and even allele-to-allele variability in gene regulation and architecture. Heterogeneity in gene expression is influenced by responsiveness to regulatory cues (e.g., signaling gradients and asymmetric distribution of cellular content during mitosis), cell cycle stage, and stochastic effects. Especially when it comes to the cell cycle, everything that is measurable about higher-order nuclear architecture undergoes dramatic reorganization during the life of a cell, most strikingly during mitosis. This needs to be taken into account when targeted experimental perturbations can affect cell cycle progression.
The road ahead promises that many of the above challenges can be met. This optimism is rooted in the stunning improvements of single-cell technologies and live imaging tools, as well as more refined genome and epigenome editing methods. Hence, our insights into the intricacies of the nucleus will benefit from ever finer spatial and temporal resolution.
Acknowledgments
We thank Jennifer E. Phillips-Cremins, Peter H.L. Krijger, and Haoyue Zhang for critically reading the manuscript and for helpful suggestions.
Cited work from the laboratory of G.A. Blobel is supported by grants from the National Institutes of Health (RO1DK54937, R01HL119479, R24DK106766, U01HL129998A2017, and R37DK058044). M.W. Vermunt is supported by the Rubicon research program, which is financed by the Netherlands Organization for Scientific Research (project no. 019.173EN.006).
The authors declare no competing financial interests.