The assembly of sequence-specific enhancer-binding transcription factors (TFs) at cis-regulatory elements in the genome has long been regarded as the fundamental mechanism driving cell type–specific gene expression. However, despite extensive biochemical, genetic, and genomic studies over the past three decades, our understanding of the molecular mechanisms underlying enhancer-mediated gene regulation remains incomplete. Recent advances in imaging technologies now enable direct visualization of TF-driven regulatory events and transcriptional activities at the single-cell, single-molecule level. The ability to observe the remarkably dynamic behavior of individual TFs in live cells at high spatiotemporal resolution has begun to provide novel mechanistic insights and promises new advances in deciphering causal–functional relationships of TF targeting, genome organization, and gene activation. In this review, we survey current transcription imaging techniques and summarize converging results from various lines of research that may instigate a revision of models to describe key features of eukaryotic gene regulation.
Early research established that transcription in prokaryotes is primarily controlled by proximal promoter-bound factors (e.g., repressors and σ factors) that often directly interact with RNA polymerase (Ptashne and Gann, 1997). However, as organismal complexity increased, a key feature of higher eukaryotes (metazoans) emerged: a dramatic expansion of genome size, especially of the intergenic regions that contain widely dispersed cis-regulatory elements such as enhancers and insulators (Levine et al., 2014). Specifically, enhancers communicate with promoters to regulate gene activities, whereas insulators block the communication between enhancers and promoters (Heintzman and Ren, 2009). In the past decades, functional components of the gene-regulatory machinery have been extensively characterized by classical in vitro biochemistry and in vivo genetic research (Levine et al., 2014). Moreover, the recent rapid development of next-generation sequencing–based genomewide high-throughput assays (e.g., chromatin immunoprecipitation sequencing, assay for transposase-accessible chromatin sequencing, and Hi-C) has provided additional insights into transcription factor (TF)-binding patterns as well as chromatin architecture and genome organization at the cell population level with rich sequence information (Barski et al., 2007; Lieberman-Aiden et al., 2009; ENCODE Project Consortium, 2012; Buenrostro et al., 2013). However, these end point assays are not able to reveal 3D molecular structures or intrinsic dynamics associated with gene-regulatory activities within individual living cells. For example, how quickly and by what means does a TF find its target site inside the nucleus of a live cell? How do multiple TFs dynamically assemble at enhancer DNA? How are cis-regulatory elements organized inside the 3D space of the nucleus? Most importantly, what are the structural basis and dynamics underlying enhancer–promoter communication?
Recent advances in fluorescence labeling techniques in combination with superresolution microscopy modalities have overcome several technical barriers to visualizing and tracking the dynamic movement of individual TFs in single living cells (Mazza et al., 2012; Gebhardt et al., 2013; Chen et al., 2014b), even within developing embryos (Mir et al., 2017). These studies reveal hitherto unknown intranuclear structures and TF kinetic features in live cells with unprecedented spatiotemporal resolution, providing unique opportunities to dissect transcription dynamics and genome organization. In this review, we first outline emerging transcription imaging methods and then discuss recent studies that converge on a gene regulation model that integrates TF dynamics, genome organization, and gene activity in live cells.
Imaging and labeling strategies to probe TF dynamics
The successful development of superresolution TF imaging strategies became possible largely as a consequence of dual advances in new biomolecular labeling techniques and advanced imaging modalities (Fig. 1 A). Early pioneering studies (McNally et al., 2000; Stenoien et al., 2001) to measure TF diffusion and binding dynamics relied on fluorescent proteins (FPs; Tsien, 1998) and the use of FRAP assays (Fig. 1 B; Axelrod et al., 1976). In a typical FRAP experiment, a focused laser beam rapidly bleaches FPs within a selected region in the nucleus. The subsequent rate of fluorescence recovery in the bleached region is dependent on the diffusive and binding kinetics (Kon and Koff) of labeled molecules outside the target area as well as the dissociation rate (Koff) and diffusion kinetics of the bleached molecules within the target area. FRAP has proven to be an effective tool for studying TFs with relatively long-lived binding residence times (several seconds to hours), and as such, it provides valuable information about the behavior of some TFs in live cells. For example, it was found that core histone subunits are generally very stable with little exchange occurring even after ∼1–2 h. Somewhat surprisingly, a small fraction of H2B molecules displayed a considerably more dynamic behavior and exchanged within minutes. These results suggest that the core of the nucleosome is very stable, whereas H2B on the surface of active nucleosomes may exchange more frequently (Kimura and Cook, 2001). Strikingly, heterochromatin protein 1 (HP1) displayed remarkably transient residence times (∼10–20 s) in live cells compared with core histones (approximately hours; Cheutin et al., 2003), suggesting a rather dynamic maintenance of heterochromatin in live cells. 
In addition, FRAP assays targeting artificially amplified gene arrays suggested that nuclear receptor TFs (e.g., estrogen receptors and glucocorticoid receptors [GRs]) have short residence times (approximately seconds) even at cognate target sites (McNally et al., 1999, 2000; Voss et al., 2011), suggesting rapid dynamic exchange of TFs at enhancer sites in live cells in the time scales of several seconds. Until now, FRAP assays offered the best available method to estimate residence times for proteins that predominantly operated in a binding-dominant mode. However, FRAP measurements rely on averaging over a large, often heterogeneous population of molecules, and the resulting data analysis is a priori highly model dependent (Mueller et al., 2010; Mazza et al., 2012). As a result, FRAP has limited power to resolve complex fast-diffusion dynamics or to measure residence times for minor subpopulations of bound molecules.
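As a rough illustration of how a residence time can be extracted from a FRAP curve, the sketch below assumes the simplest reaction-dominant regime, in which diffusion is much faster than binding and the normalized recovery follows F(t) = 1 − C_eq·exp(−koff·t), one of the model classes analyzed by Sprague et al. (2004). The function names, the noise-free synthetic curve, and the parameter values are illustrative assumptions and not data from the studies cited above.

```python
import math

def frap_recovery(t, k_off, bound_fraction):
    """Normalized recovery in the reaction-dominant regime (assumed model)."""
    return 1.0 - bound_fraction * math.exp(-k_off * t)

def fit_reaction_dominant(times, intensities):
    """Estimate k_off and bound fraction by a log-linear least-squares fit.

    Uses ln(1 - F) = ln(C_eq) - k_off * t, which only holds for the
    single-exponential model assumed here.
    """
    ys = [math.log(1.0 - f) for f in intensities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
             / sum((t - mean_t) ** 2 for t in times))
    intercept = mean_y - slope * mean_t
    return -slope, math.exp(intercept)

# Hypothetical curve: k_off = 0.2 per second, 60% of molecules bound.
times = [0.5 * i for i in range(1, 21)]
curve = [frap_recovery(t, k_off=0.2, bound_fraction=0.6) for t in times]

k_off_est, bound_est = fit_reaction_dominant(times, curve)
residence_time = 1.0 / k_off_est  # mean residence time in seconds
print(k_off_est, bound_est, residence_time)
```

Because a single exponential is imposed from the outset, this kind of fit illustrates the model dependence noted above: a curve produced by two bound populations, or by diffusion-limited recovery, would still yield a number, but not a meaningful one.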
A more recently developed and complementary imaging technology for dissecting TF dynamics is direct observation of individual molecules in motion within live cells (Fig. 1 C). Although single-particle tracking is a long-established technique that was first applied to study random movements (Brownian motion) of tiny objects (e.g., pollen) under the microscope as early as the 1820s (Brown, 1828), single-molecule tracking (SMT) in live cells was not possible until the realization of fluorescence microscopy (Fluoreszenzmikroskop., 1911) and subsequent development of suitable protein-labeling techniques. For example, SMT was first applied to study the dynamics of membrane proteins by antibody labeling more than two decades ago (Ghosh and Webb, 1994). However, because of the high packing density of intracellular, and particularly intranuclear, proteins and the lack of tools for sparse labeling, reliable intracellular single-molecule imaging even in fixed cells was not possible until the discovery of photoactivatable or photoswitchable FPs (or dyes; Patterson and Lippincott-Schwartz, 2002). These are proteins whose fluorescence upon excitation at certain wavelengths (usually 405 nm) can be either switched on or off or modified to a different emission spectrum (Lukyanov et al., 2005). These advanced labeling strategies, when combined with the subsequent development of photoactivated localization microscopy (PALM; Betzig et al., 2006) and stochastic optical reconstruction microscopy (STORM; Rust et al., 2006), ushered in a new era of superresolution imaging. With photoactivation, sparse labeling can be achieved by tunable stochastic activation of fluorophores, separating the appearance of individual molecules temporally. However, FPs are generally not very photostable and thus cannot support long-term single-molecule observation (Manley et al., 2008; Xia et al., 2013).
Recent development of self-labeling tags (e.g., HaloTag and SNAPTag) and a new suite of photostable cell-permeable organic dyes (e.g., JF549 and JF646; Grimm et al., 2015) further expanded the spatiotemporal length scales of single-molecule measurements in live cells, enabling high signal-to-noise single-molecule detection in individual living cells. The 2D and 3D imaging modalities and requirements for a specific SMT application have been extensively discussed previously; for detailed reviews, we refer the readers to Liu et al. (2015), Manzo and Garcia-Parajo (2015), Presman et al. (2017), and Shen et al. (2017). The next major challenge of labeling was to devise means to sparsely label biomolecules in live cells for reliable single-molecule imaging and SMT under physiologically relevant conditions. Encouragingly, the rapid development of genome-editing techniques (Doudna and Charpentier, 2014) enabled functional labeling and imaging of endogenous proteins as reported in several recent studies (Teves et al., 2016; Hansen et al., 2017a).
Because of its ability to monitor the dynamics of individual biomolecules, SMT has the resolving power to monitor complex TF diffusion and binding kinetics (Fig. 1 E; Mazza et al., 2012; Gebhardt et al., 2013; Chen et al., 2014b; Izeddin et al., 2014; Ball et al., 2016; Paakinaho et al., 2017), investigate subpopulation-associated structures (Fig. 1 F; Liu et al., 2014; Mir et al., 2017; Wollman et al., 2017), and determine the order of multistep molecular binding events in live cells (Chen et al., 2014b; Xie et al., 2017). Moreover, robust computational pipelines have been devised to achieve accurate data analysis (Jaqaman et al., 2008; Sergé et al., 2008; Hansen et al., 2017b).
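To make the idea of extracting diffusion kinetics from single-molecule trajectories concrete, the following minimal sketch simulates a freely diffusing particle in 2D and recovers its diffusion coefficient from the mean squared displacement, using the standard relation MSD(τ) = 4Dτ for unconfined 2D Brownian motion. It is not one of the published pipelines cited above; the function names, the simulated track, and the parameter values are assumptions chosen for illustration, and real data would additionally require handling of localization error, blinking, and mixed bound and diffusing populations.

```python
import math
import random

def simulate_track(n_steps, diffusion_coeff, dt, seed=0):
    """Simulate a 2D Brownian trajectory (hypothetical, free diffusion only)."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * diffusion_coeff * dt)  # per-axis step size
    x, y = 0.0, 0.0
    track = [(x, y)]
    for _ in range(n_steps):
        x += rng.gauss(0.0, sigma)
        y += rng.gauss(0.0, sigma)
        track.append((x, y))
    return track

def mean_squared_displacement(track, lag):
    """Average squared displacement over all frame pairs separated by `lag`."""
    squared = [
        (track[i + lag][0] - track[i][0]) ** 2
        + (track[i + lag][1] - track[i][1]) ** 2
        for i in range(len(track) - lag)
    ]
    return sum(squared) / len(squared)

dt = 0.01      # assumed frame interval in seconds
true_d = 2.0   # assumed diffusion coefficient in square micrometers per second
track = simulate_track(20000, true_d, dt)

msd_1 = mean_squared_displacement(track, 1)
msd_4 = mean_squared_displacement(track, 4)
d_est = msd_1 / (4.0 * dt)  # MSD = 4 * D * tau in two dimensions
print(d_est, msd_4 / msd_1)
```

For free diffusion the MSD grows linearly with lag time, so the ratio of the lag-4 to lag-1 values should be close to 4; a ratio well below that would hint at confinement or binding.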
An important technical aspect to consider for SMT experiments in live cells is the effect of motion blurring. In fixed samples, molecules are by definition stationary. Thus, image acquisition times do not significantly influence single-molecule detection. However, in live cells, rapid molecular diffusion introduces motion blur during image acquisition, limiting localization precision. Various techniques have been deployed to counteract or, in some cases, take advantage of the motion blur effect (Fig. 1 C). At one end of the spectrum, high-intensity stroboscopic illumination pulses (<1 ms) have been used to minimize motion blur and capture rapid binding and unbinding events in the range of milliseconds (Elf et al., 2007). At the other extreme, low laser power and long acquisition times can be deployed to image relatively stable TF–chromatin binding events by blending blurred images of fast-diffusing unbound molecules into the background while also preserving in clear relief stable binding events as bright diffraction-limited spots for measuring binding events that last for seconds (Chen et al., 2014b; Swinstead et al., 2016; Hansen et al., 2017a; Xie et al., 2017). In addition, long dark periods between acquisitions can be introduced to reduce photobleaching, thereby increasing the dynamic range of single-molecule detection up to ∼10–20 min (Gebhardt et al., 2013; Normanno et al., 2015; Paakinaho et al., 2017; Liu et al., 2018). In essence, because the number of photons (photon budget) that can be emitted from any FP or dye molecule is finite, distinct illumination patterns may be used to capture desired TF dynamics occurring at widely different time scales. For this reason, SMT-based residence time measurements are highly sensitive to changes in imaging setups (e.g., the laser power, the objective NA, the camera sensitivity, the photostability of the label, and the image acquisition strategy).
For example, it was reported that dimeric GR has a residence time of ∼1.45 s by using an EOS2 fusion protein (Gebhardt et al., 2013), whereas the GR-stable binding residence time measured by using the more photostable HaloTag is several-fold longer (∼7.4 s; Swinstead et al., 2016). Similarly, CTCF residence times estimated by different SMT imaging strategies showed drastic differences (∼1–2 min [Hansen et al., 2017a] versus >15 min [Agarwal et al., 2017]). These results highlight that specific imaging regimes are only optimized to detect the TF dwell times in particular temporal domains. Therefore, the best practice to avoid bias in data interpretation is to derive conclusions based on relative changes of residence times following well-controlled perturbations.
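The dependence of apparent residence times on label photostability can be illustrated with a simple correction. The sketch below assumes that dissociation and photobleaching are independent first-order processes, so that the observed disappearance rate of a bound spot is the sum of the two rates. This is a common simplification and not necessarily the analysis used in any of the studies cited above; the function name and all numbers are hypothetical.

```python
def corrected_residence_time(apparent_time_s, bleach_time_s):
    """Remove an assumed first-order photobleaching contribution.

    k_apparent = k_off + k_bleach, so k_off = k_apparent - k_bleach.
    Raises ValueError when bleaching is as fast as the apparent decay,
    because the true dissociation rate is then not resolvable.
    """
    k_apparent = 1.0 / apparent_time_s
    k_bleach = 1.0 / bleach_time_s
    k_off = k_apparent - k_bleach
    if k_off <= 0.0:
        raise ValueError("dwell time is limited by photobleaching")
    return 1.0 / k_off

# Hypothetical measurement: spots vanish after 5 s on average, while a
# stably bound control label bleaches with a 20 s time constant.
example_corrected = corrected_residence_time(5.0, 20.0)
print(example_corrected)
```

The correction grows rapidly as the bleaching time approaches the apparent dwell time, which is one way to see why a less photostable label reports shorter residence times for the same factor.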
Another useful technique to probe fast TF diffusion dynamics is fluorescence correlation spectroscopy (FCS; Magde et al., 1972; Chen et al., 2008). Specifically, in FCS, a high-sensitivity point detector is used to record motion-induced fluorescence fluctuations at the focal spot (Fig. 1 D). Because molecules with different diffusion kinetics give rise to distinct photon burst patterns, the molecular diffusion characteristics can be estimated from subsequent autocorrelation function calculations. This technique can be quite useful for resolving fast diffusion dynamics. One limitation, however, is that the concentration of molecules in cells must be low enough (<10 nM) to generate robust fluctuations for detection. Combining FCS with photoactivatable GFPs (PA-FCS) can overcome this problem and allow FCS measurement in densely labeled samples (White et al., 2016). Another limitation of FCS is that immobile fluorophores in the focus region do not generate fluorescence fluctuations and can be bleached quickly (Stasevich et al., 2010). Thus, this technique is not suitable for studying stable TF binding events. In addition, FCS suffers from the same model-fitting problem as FRAP: mechanistically distinct models can often fit the same FCS curve equally well (Mazza et al., 2012).
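The autocorrelation calculation at the heart of FCS can be sketched in a few lines. The example below computes the normalized fluctuation autocorrelation G(τ) = ⟨δI(t)·δI(t+τ)⟩ / ⟨I⟩² from a discretely sampled intensity trace. The function name and the toy traces are assumptions for illustration; a real analysis would use photon-counting data and then fit G(τ) to a diffusion model, which is where the model dependence discussed above enters.

```python
def fcs_autocorrelation(trace, lag):
    """Normalized autocorrelation of intensity fluctuations at one lag."""
    mean_intensity = sum(trace) / len(trace)
    deltas = [value - mean_intensity for value in trace]
    n_pairs = len(trace) - lag
    products = [deltas[i] * deltas[i + lag] for i in range(n_pairs)]
    return (sum(products) / n_pairs) / (mean_intensity ** 2)

# Toy traces (hypothetical): a constant signal has no fluctuations,
# whereas an alternating signal is anticorrelated at odd lags.
constant_trace = [2.0] * 8
alternating_trace = [1.0, 3.0] * 4

g_constant = fcs_autocorrelation(constant_trace, 1)
g_odd = fcs_autocorrelation(alternating_trace, 1)
g_even = fcs_autocorrelation(alternating_trace, 2)
print(g_constant, g_odd, g_even)
```

The constant trace also illustrates the limitation noted above for immobile fluorophores: a signal that does not fluctuate contributes nothing to G(τ), regardless of how many molecules are present.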
The assembly of multiple TFs at distal enhancers is a signature feature of gene regulation in metazoans. However, one longstanding unresolved question in the field (Halford, 2009; Mirny et al., 2009) is how a TF molecule efficiently navigates through the complex and crowded nuclear environment, searching through several billion base pairs of DNA in the form of chromatin before locating a cognate target site. Early FRAP studies found that site-specific TFs show fast recovery after photobleaching, suggesting that a large fraction of TF molecules are in fast diffusion states (McNally et al., 2000; Stenoien et al., 2001; Sprague et al., 2004). The recent development of higher quantum yield and more stable self-labeling dyes enabled long-term tracking of TFs at the single molecule level. A study focused on imaging Sox2 and Oct4 dynamics in live embryonic stem (ES) cells observed that Sox2 and Oct4 molecules use a 3D diffusion–dominant trial-and-error target search mechanism (Chen et al., 2014b). Specifically, Sox2 molecules spend most of the time (∼97%) in stochastic diffusion and collision with nonspecific DNA before engaging a cognate-binding site. It is estimated that only a small fraction of Sox2 molecules (∼3%) in ES cells are bound to specific recognition sites in the equilibrium state. Consistent with previous studies on nuclear receptors (McNally et al., 1999, 2000; Voss et al., 2011; Paakinaho et al., 2017), Sox2 also displayed short mean residence times on the scale of ∼12 s at specific target sites and even more fleeting dwell times (0.7 s) at nonspecific sites. Interestingly, recent SMT studies also show that in live cells, Sox2 interacts with the mitotic chromosomes for bookmarking (Deluz et al., 2016; Teves et al., 2016). 
Depletion of Sox2 at the M–G1 transition abolishes the ability of ES cells to differentiate into neuroectodermal lineages but does not interfere with reprogramming toward induced pluripotent stem cells, suggesting that dynamic interaction of Sox2 with mitotic chromosome is critical for poised activation of neural gene expression programs (Deluz et al., 2016). Surprisingly, one recent study reveals that the FOXA1 pioneer TF fails to generate significant DNA footprints at chromatin-binding sites and likely only has a short specific binding residence time in the range of seconds (Swinstead et al., 2016). These results highlight a highly dynamic turnover of TFs at enhancer sites in live cells. Simulation experiments suggest that one important feature of rapid and dynamic TF binding and unbinding is that TF occupancy at the target site is highly sensitive to TF concentrations in the nucleus (Fig. 2 B; Chen et al., 2014b). Specifically, to compensate for short dwell times, high TF concentrations, at least locally, are required to increase TF sampling frequency at the enhancer to maintain functionally relevant TF temporal occupancy. However, if a TF binds DNA with very long residence times, its occupancy at a target site remains high regardless of its concentration in the cell and thus, the system becomes much less tunable (Fig. 2 B). It seems plausible that during evolution, there has been a tradeoff between TF DNA binding affinity and a more tunable on/off system. Rapid TF unbinding is likely desirable at the enhancers of those genes that need precise spatiotemporal expression patterns. Indeed, one study found that even a slight enhancement of Hox TF DNA binding affinity disrupts normal Drosophila melanogaster developmental programs (Crocker et al., 2015).
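The concentration dependence described above can be illustrated with a two-state equilibrium model, in which the fraction of time a site is occupied is kon·C / (kon·C + koff) with koff = 1 / residence time. This is a deliberately simple stand-in and not the simulation reported by Chen et al. (2014b); the function name, the association rate constant, and the concentrations are hypothetical values chosen only to show the trend.

```python
def site_occupancy(concentration_nM, k_on_per_nM_s, residence_time_s):
    """Equilibrium occupancy of a single site in a two-state binding model."""
    k_off = 1.0 / residence_time_s
    on_rate = k_on_per_nM_s * concentration_nM
    return on_rate / (on_rate + k_off)

concentrations_nM = [1, 10, 100]   # assumed local TF concentrations
k_on = 0.001                       # assumed association rate, per nM per second

# Short residence time (10 s): occupancy tracks concentration.
short_lived = [site_occupancy(c, k_on, 10.0) for c in concentrations_nM]
# Very long residence time (10,000 s): occupancy is high at every concentration.
long_lived = [site_occupancy(c, k_on, 10000.0) for c in concentrations_nM]

for c, s, l in zip(concentrations_nM, short_lived, long_lived):
    print(f"{c:>4} nM  short-lived: {s:.3f}  long-lived: {l:.3f}")
```

With these assumed numbers, a 100-fold change in concentration moves the short-lived binder from about 1% to 50% occupancy, whereas the long-lived binder stays above 90% throughout, which is the loss of tunability described in the text.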
Another interesting question is whether multiple TFs bind to enhancers in a symmetric (random order) or asymmetric (hierarchically ordered) manner. One model is that TFs are symmetrically required for the binding and establishment of open chromatin at the target site. However, this model was quickly excluded by the finding that the FoxA1 factor binding in the genome opens up the chromatin and is required for the binding of other TFs to the target sites (Cirillo and Zaret, 1999; Cirillo et al., 2002). More recently, it was shown that the binding of FoxA1 to chromatin in many cases is itself dependent on steroid receptors (Swinstead et al., 2016). These results suggest that although the assembly of multiple TFs at enhancer sites is hierarchical, the order of assembly at particular enhancer sites could be highly context dependent. Supporting this model, single-molecule analysis found that in ES cells, Sox2 is kinetically favored to engage with a target site first and assists the subsequent binding of Oct4 (Chen et al., 2014b) in a hierarchically ordered fashion. Consistent with this observation, it was revealed by PA-FCS that the binding dynamics of Sox2 rather than that of Oct4 predicts the lineage fate of a cell in early embryos (White et al., 2016). Interestingly, a lead binder, Zelda, was identified in Drosophila with similar functions (Harrison et al., 2011; Mir et al., 2017). In the analysis of a more complex system, deletion of a single Oct4/Sox2 composite site at a distal Klf4 enhancer abolishes chromatin accessibility and the binding of Esrrb and Stat3 to the enhancer (Xie et al., 2017). In contrast, depletion of Esrrb and Stat3 had no significant effect on the binding of Sox2 and Oct4 to the enhancer, adding further support for an ordered assembly process at least for some enhancers in ES cells.
Interestingly, Stat3 is the downstream component of the leukemia inhibitory factor signaling pathway (Dahéron et al., 2004), whereas Esrrb has been shown to interact with core promoter factors and Pol II (Percharde et al., 2012), suggesting a functional division of labor among individual TFs assembling at complex enhancers. Specifically, this model envisions that a small subset of lead binders (e.g., Sox2) are required to scan the genome, find their target sites, and establish a permissive chromatin environment to assist other auxiliary TFs to bind subsequently (Fig. 2 A). The advantage of such a hierarchical and modular system is that TFs have overlapping as well as differentiating functions, which provides more flexibility and greater specificity, enabling the generation of rapid, nonlinear spatiotemporal outputs (Fig. 2 C). In contrast, in a symmetric system where partner TFs exhibit the same target search efficiency and equal ability to open up chromatin to initiate transcription, gene activity output would be a linear function of TF binding occupancy at the enhancer (Fig. 2 C).
Roughly 2 m (approximately six billion base pairs) of linear DNA must be packed into the nucleus of each diploid human cell (∼5–10 µm in diameter). With the same packaging density, it is possible to put a strand of DNA >6,000 times the Earth’s circumference inside a chicken egg. Effective DNA packing into the cell nucleus is partially achieved by wrapping DNA around histones into nucleosomes. Currently, we have crystal structures of the nucleosome (Luger et al., 1997). Based on in vitro structural work (Schalch et al., 2005; Woodcock and Ghosh, 2010; Song et al., 2014), it was proposed that nucleosome polymers can assemble into 30-nm fibers. Recently, a new DNA EM dye has been developed to image detailed high-resolution chromatin fiber organization in single fixed cells. These studies revealed very disordered structures, and fibers of ∼5–24 nm were found in situ (Ou et al., 2017), suggesting that the 30-nm fiber is probably not the dominant form of chromatin in the nucleus. Another finding from this study suggests that 3D subnuclear domains are assembled with distinct chromatin densities, and this feature likely determines the global accessibility and activity of DNA in the nucleus.
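The packing figures quoted at the start of this section can be checked with a short calculation. The sketch below uses the B-form DNA rise of about 0.34 nm per base pair to recover the ∼2 m contour length, and then scales the same length-per-volume density to a chicken egg. The egg volume (taken here as 50 cm³), the spherical-nucleus approximation, and the variable names are assumptions introduced for this estimate and do not come from the text.

```python
import math

BASE_PAIRS = 6e9                  # diploid human genome, from the text
RISE_PER_BP_M = 0.34e-9           # B-form DNA rise per base pair, in meters
EARTH_CIRCUMFERENCE_M = 4.0075e7  # equatorial circumference, in meters
EGG_VOLUME_M3 = 50e-6             # assumed chicken egg volume (50 cm^3)

dna_length_m = BASE_PAIRS * RISE_PER_BP_M

def earth_laps_in_egg(nucleus_diameter_m):
    """DNA length fitting in an egg at nuclear packing density, in Earth laps."""
    radius = nucleus_diameter_m / 2.0
    nucleus_volume = (4.0 / 3.0) * math.pi * radius ** 3  # sphere assumed
    density_m_per_m3 = dna_length_m / nucleus_volume
    return density_m_per_m3 * EGG_VOLUME_M3 / EARTH_CIRCUMFERENCE_M

laps_large_nucleus = earth_laps_in_egg(10e-6)  # 10 µm diameter
laps_small_nucleus = earth_laps_in_egg(5e-6)   # 5 µm diameter
print(dna_length_m, laps_large_nucleus, laps_small_nucleus)
```

Because the result scales with the inverse cube of the nuclear diameter, the estimate changes eightfold across the quoted 5–10 µm range; under these assumptions the >6,000 figure falls between the two extremes.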
Currently, large subnuclear compartments such as heterochromatin, nuclear speckles, Cajal bodies, histone locus clusters, and nucleoids can be visualized in live cells with light microscopy (Mao et al., 2011). However, because of the high packing density of native chromatin fibers and a lack of noninvasive tools for specific gene locus labeling, the structural information underlying enhancer–promoter communication in the cell remains out of reach by conventional structural biology or light microscopy. The current information regarding potential structures driving enhancer–promoter communication is mainly derived from indirect methods such as chromosome conformation capture (3C)-based assays (de Wit and de Laat, 2012). Specifically, chemically cross-linked chromatin fibers are digested by restriction enzymes, and the contact frequency between chromatin fragments is inferred from proximity ligation reactions performed with mixtures of large populations of cells. In the past decade, extensive genomewide studies (4C, 5C, and Hi-C) have provided a model for chromatin folding, topological domains, and potential 3D genome organization (Dekker et al., 2013; Dekker and Mirny, 2016). The two main features extracted from these studies are: (1) as proposed originally from the earliest enhancer research (Su et al., 1991), enhancers and promoters likely communicate with each other via chromatin loops, and (2) the genome is organized into topologically associated domains (TADs) that might compartmentalize local gene activities. However, the data from imaging and genomics sometimes show significant discrepancies (Belmont, 2014; Williamson et al., 2014; Boettiger et al., 2016; Wang et al., 2016b), suggesting that 3C-based assays might have limited ability to capture the true 3D structures and physical proximity of elements in single live cells. 
There is also the added possibility that chemical cross-linking may introduce some unintended consequences not easily anticipated or interpreted. For example, recent SMT studies show that in live cells, Sox2 interacts with the mitotic chromosomes for bookmarking (Deluz et al., 2016; Teves et al., 2016), whereas most strikingly, chemical cross-linking leads to exclusion of Sox2 from mitotic chromosomes (Teves et al., 2016). In addition, a recent SMT study on CTCF and cohesin reveals that chromatin loop formation might be a highly dynamic (∼min) and regulated event in the cell (Hansen et al., 2017a). Results from such imaging studies using live cells suggest that the interaction frequency detected by 3C assays more likely reflects a complex convolution of physical proximity, DNA looping frequency, local chromatin environment, and cellular heterogeneity (Fudenberg and Imakaev, 2017). Consistent with this notion, single-cell Hi-C experiments reveal that stochastic clusters of contacts can occur across TAD boundaries in single cells but average into TADs in ensemble assays (Flyamer et al., 2017), suggesting that TADs might be the result of computational normalization of cross-linking events over millions of cells.
Despite some room for debate, live-cell imaging data do converge with genomic studies on the presence of topological structures in the nucleus that likely also shape local gene-regulatory activities. What is the reciprocal relationship between these topological structures and the spatial distribution of various TFs in the nucleus? By imaging long-lived Sox2 binding events in live cells with lattice light-sheet microscopy, it was possible to systematically map 3D Sox2 enhancer organization in ES cells. It was observed that Sox2 stable binding sites (most likely enhancers) form spatially restricted clusters in the nucleus of live ES cells (Liu et al., 2014). These enhancer clusters are spatially segregated from heterochromatic regions but overlap with a subset of Pol II–enriched clusters (Fig. 3 A). Furthermore, SMT experiments revealed that inside enhancer clusters, Sox2 displays significantly faster forward association rates (kon), thereby increasing local TF concentrations and allowing rapid TF rebinding to stretches of open chromatin (Fig. 3 B). Consistent with this early observation, SMT experiments confirmed that TFs also form clusters in yeast and Drosophila cells and that TF clustering likely regulates transcriptional output from downstream genes (Mir et al., 2017; Wollman et al., 2017). In addition, detailed analysis of long-lived Sox2-binding events at the single-molecule level revealed that Sox2 hops between clustered binding sites in spatially restricted subnuclear regions (Liu et al., 2018). These results support the existence of certain topological structures and TF hubs in the nucleus that can shape local TF target search dynamics and potentially fine-tune the rates of TF complex assembly at cis-regulatory elements.
What are the forces holding together such cis-regulatory element hubs? From the earliest cloning and characterization of classical sequence-specific TFs, a puzzling feature emerged: the discovery of simple repetitive, largely unstructured amino acid motifs (i.e., glycine- and proline-rich acidic repeats) that serve as activation domains (ADs) coupled with DNA-binding domains (Courey and Tjian, 1988). More recent evidence suggests that such simple repetitive amino acid motifs, now referred to as low-complexity domains (LCDs), are found to be highly prevalent in a variety of regulatory proteins including many classical TFs and other proteins such as Fused in sarcoma (FUS), TAF15, and Ewing sarcoma protein (EWS; Kwon et al., 2013) and can participate in forming dynamic phase-separated compartments in live cells (Patel et al., 2015; Shin and Brangwynne, 2017). We speculate that the in vivo enhancer clustering or “hub formation” observed in live-cell studies opens the possibility that high local concentrations of TFs may result from the formation of LCD clustering, at least transiently. Thus, it is possible that these enhancer clusters could serve as multivalent docking sites for dynamic TF recruitment via weak protein–protein interactions potentially directed by LCD-containing proteins. In turn, it is also possible that such weak protein–protein interactions reinforce and contribute to the establishment of cis-regulatory element clustering. Indeed, LCDs within the ADs of Sp1 and Foxp2 can dynamically interact with PolyQ arrays on the surface of mutant huntingtin aggregates in live cells (Li et al., 2016). Similarly, we can imagine that such hubs of weak multivalent protein–protein interactions within enhancer clusters could be assisted by stronger sequence-specific protein–DNA transactions that together influence the local chromatin environment to regulate gene activities. 
Interestingly, in vitro structural analysis on reconstituted Mig1 suggests that TF clusters in the yeast nucleus are stabilized by interactions between LCDs, highlighting a crucial role of LCDs in mediating the formation of TF clusters (Fig. 3 C). Consistent with the increased forward association rate for Sox2 in the enhancer cluster, it is proposed that formation of these TF clusters in yeast reduces promoter search times through intersegment transfer while stabilizing gene expression (Fig. 3 D; Wollman et al., 2017). An interesting theoretical work also suggests that in the presence of TFs, DNA could undergo phase separation (Le Treut et al., 2016). Consistent with this TF hub model, other well-described phase-separated structures known as nuclear bodies (e.g., nuclear speckles, promyelocytic leukemia protein bodies, and Cajal bodies) are thought to be capable of influencing genome organization by sequestering target genes in specialized microenvironments (Brown et al., 2008; Ching et al., 2013; Wang et al., 2016a). Modeling of cell population–based genomic data also suggests clustering and higher-order organization of cis-regulatory elements in the nucleus (Canals-Hamann et al., 2013; Hnisz et al., 2013, 2017; Dai et al., 2016; Beagrie et al., 2017). However, these studies and hypothetical models do not provide spatial information about the clusters at the single-cell, single-molecule level. Despite emerging evidence for TF clustering, it is important to note that the relationship between genome organization, gene positioning, and gene activities has not been clearly established. On one hand, it was shown that gene positioning in the nucleus could be potentially critical for mediating gene activities. For example, it was shown in pro-B cells and muscle cells that activation or repression of specific genes is associated with translocation of genes into new locations (Kosak et al., 2002; Moen et al., 2004).
On the other hand, genomic data suggest that regions contacting GR-regulated genes are not particularly enriched for GR targeting sites or for any functional group of genes (Hakim et al., 2011), suggesting that functional genome organization does not likely respond to the clustered binding of individual TFs. The contact regions are, however, enriched for DNaseI-hypersensitive sites, indicating that the nucleus is preorganized in a conformation favorable to rapid transcriptional reprogramming, and this organization is likely orchestrated by chromatin sites accessible to diverse regulatory factors.
Core promoters and Pol II transcription
In addition to enhancer-mediated control, Pol II transcription is also extensively regulated at the core promoter by diverse sequence elements and cell type–specific core promoter factors (Goodrich and Tjian, 2010; Juven-Gershon and Kadonaga, 2010). Early fluorescence imaging and EM research led to the proposal that genes may be transcribed in “transcription factories” formed by clustered Pol II molecules in the nucleus (Cook, 1999). Later, live-cell imaging by FRAP and superresolution imaging in fixed cells showed that contrary to the “factory” hypothesis, there were no detectable stably bound and transcriptionally engaged Pol II within the clustered hot spots in the nucleus (Darzacq et al., 2007; Zhao et al., 2014). More recently, a live-cell PALM study revealed that Pol II molecules indeed form subdiffraction-size clusters (Cisse et al., 2013), but these clusters displayed lifetimes of only a few seconds incompatible with actively transcribing Pol II. These striking findings provide compelling evidence that alternative mechanisms must underlie Pol II–mediated transcription. A followup study on the β-actin locus revealed that dynamic Pol II cluster formation precedes mRNA production (Fig. 4 A; Cho et al., 2016), consistent with the notion that the observed Pol II clustering might participate in transcription initiation rather than elongation. Similarly, other research showed that low-complexity sequences within the Pol II C-terminal domain are required for phosphorylation-dependent Pol II hydrogel formation in vitro (Kwon et al., 2013). These data highlight the rapid molecular kinetics that likely drives transcription initiation. In light of observed TF enhancer clusters and Pol II hubs, one hypothesis is that multivalent weak interactions and cooperative binding within the enhancer clusters lead to increased local TF concentrations. 
Such TF hubs would in turn dictate local target search dynamics of key transcriptional preinitiation components including chromatin-remodeling complexes and general TFs, thereby triggering dynamic Pol II clustering and subsequent transcription initiation from nearby genes (Fig. 4 C). It is tempting to speculate that the enhancer clustering and its associated cofactors could thus form local TF hubs required for coordinated and synergistic gene regulation. Whether the enhancer clusters observed represent actively transcribed regions remains unclear, but the significant colocalization between enhancer clusters and Pol II would be consistent with such an interpretation (Liu et al., 2014).
This model predicts that rapid and highly dynamic weak interactions, characterized by local high concentrations and physical proximity of TFs and DNA elements, rather than conventional stable lock-and-key protein–protein and TF–DNA contacts, are the driving force for enhancer–promoter communication. Results from several lines of research are consistent with this prediction. Most interestingly, when a single enhancer was used to drive the expression of two symmetric genes located upstream and downstream of the enhancer, it was observed that the two genes have synchronized transcription-bursting kinetics (Fig. 4 B; Fukaya et al., 2016). This result argues against a stable lock-and-key looping model for enhancer–promoter communication. Consistent with this finding, an early RNA FISH study showed that transcription bursting of two genes inserted into the same genomic locus is highly correlated, whereas the correlation disappears when the two genes are at separate genomic locations (Raj et al., 2006). Both examples suggest the existence of coordinated control of gene activities by the local chromatin environment. Another interesting study based on CRISPR/Cas9 cis-regulatory deletion screening found that even deletions of a promoter could affect the expression of other nearby genes (Diao et al., 2016). These results strongly suggest that although enhancer elements play an important role in driving gene expression, their influence can be extensively modulated by the local chromatin environment and the level of available weak interactions in proximity.
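The logic behind the bursting-correlation argument can be illustrated with a toy simulation. The sketch below is not taken from any of the cited studies: it assumes a simple two-state (ON/OFF) telegraph model, and the function names and every parameter value (`p_on`, `p_off`, `p_fire`, the number of steps) are hypothetical choices made for illustration only. Two genes that read out the same fluctuating local state show correlated output, whereas two genes driven by independent states do not.

```python
import random


def simulate_state(n_steps, p_on, p_off, rng):
    """Two-state (telegraph) trajectory: 1 = permissive/ON, 0 = OFF."""
    state, trajectory = 0, []
    for _ in range(n_steps):
        if state == 0 and rng.random() < p_on:
            state = 1
        elif state == 1 and rng.random() < p_off:
            state = 0
        trajectory.append(state)
    return trajectory


def transcribe(state_trajectory, p_fire, rng):
    """A gene fires (1) with probability p_fire only while the state is ON."""
    return [1 if s and rng.random() < p_fire else 0 for s in state_trajectory]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# All numbers below are illustrative assumptions, not measured values.
rng = random.Random(0)
N_STEPS, P_ON, P_OFF, P_FIRE = 20000, 0.05, 0.10, 0.8

# Case 1: two genes share one local environment (same ON/OFF trajectory).
shared = simulate_state(N_STEPS, P_ON, P_OFF, rng)
gene_a = transcribe(shared, P_FIRE, rng)
gene_b = transcribe(shared, P_FIRE, rng)
corr_shared = pearson(gene_a, gene_b)

# Case 2: two genes at separate locations (independent trajectories).
gene_c = transcribe(simulate_state(N_STEPS, P_ON, P_OFF, rng), P_FIRE, rng)
gene_d = transcribe(simulate_state(N_STEPS, P_ON, P_OFF, rng), P_FIRE, rng)
corr_indep = pearson(gene_c, gene_d)

print(f"shared environment:       r = {corr_shared:.2f}")
print(f"independent environments: r = {corr_indep:.2f}")
```

In this toy model the correlation arises purely from the shared state variable, with no stable one-to-one contact between a regulator and either gene. It says nothing about the molecular identity of that state, which is the open question discussed above.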
Live-cell imaging of transcriptional regulation has the potential to greatly advance our understanding of molecular mechanisms underlying precise spatiotemporal gene expression programs. In this review, we have highlighted a few key concepts derived from recent studies linking TF dynamics, genome organization, and gene expression. Collectively, these studies reveal a highly dynamic yet regulated TF assembly process at enhancer elements. Specifically, rapid and dynamic binding and unbinding of TFs make the enhancer system highly tunable and sensitive to TF concentration fluctuations in the cell. Hierarchically ordered TF assembly at enhancers enables rapid and flexible regulatory outputs with high specificity (Fig. 2). These live-cell studies also support the existence of topological structures in the nucleus that shape local TF target search dynamics and potentially other gene-regulatory activities. Based on these results, we propose an enhancer-mediated regulatory mechanism driven by and dependent on dynamic weak protein–protein and protein–DNA interactions (Fig. 3) that is quite distinct from prevalent “textbook” models. An important feature of this revised model is that physical proximity (i.e., local high concentration of TFs and DNA elements), rather than direct stable lock-and-key–type interactions between distal enhancer elements and gene-proximal promoters, may be sufficient for TFs bound to distal cis-elements to communicate with core promoters and deliver transcription activation. We envision that cis-regulatory element clustering, with its high local TF and cofactor concentrations accompanied by altered target search features, may be sufficient to serve as an alternative mechanism for achieving distal enhancer–directed transcription activation (Fig. 3 D).
In the future, it will be critical to further probe mechanistic links between TF dynamics, Pol II clustering, and transcription output, along with gaining a deeper understanding of the molecular basis and functional relevance of topological structures in living cells. To address these problems, we will need to devise strategies to image functionally linked events (e.g., simultaneous measurements of TF binding, 3D single-locus gene position, and active mRNA production) at single-molecule resolution and in single living cells. With continued development of noninvasive multicolor imaging modalities (Schermelleh et al., 2008; Chen et al., 2014a; Balzarotti et al., 2017), more robust live-cell single-locus labeling tools (Chen et al., 2013; Ochiai et al., 2015), and large-scale FISH platforms (Beliveau et al., 2015; Shachar et al., 2015), we foresee a promising future for more accurately delineating gene-regulatory mechanisms and deciphering the dynamic behavior of key TFs at single-cell and single-molecule resolution. Indeed, we are optimistic that we can extend these types of quantitative measurements to whole organisms in vivo and perhaps even throughout the course of embryonic development with next-generation deep-tissue imaging techniques (Ji, 2017).
We thank Claudia Cattoglio, Anders Sejr Hansen, Liangqi Xie, and Mustafa Mir for proofreading the manuscript.
This work was funded solely by the Howard Hughes Medical Institute.
The authors declare no competing financial interests.