Replication of mammalian genomes starts at sites termed replication origins, which historically have been difficult to locate as a result of large genome sizes, limited power of genetic identification schemes, and rareness and fragility of initiation intermediates. However, origins are now mapped by the thousands using microarrays and sequencing techniques. Independent studies show modest concordance, suggesting that mammalian origins can form at any DNA sequence but are suppressed by read-through transcription or that they can overlap the 5′ end or even the entire gene. These results require a critical reevaluation of whether origins form at specific DNA elements and/or epigenetic signals or require no such determinants.
Introduction
In 1963, Cairns invented DNA fiber autoradiography to spread and visualize [3H]thymidine-labeled chromosomal DNA from Escherichia coli cells (Fig. 1 A). Using this technique, bidirectional replication was found to start at a single site (Cairns, 1963; Prescott and Kuempel, 1972), but its position could not be determined. Meanwhile, genetic studies suggested that in E. coli, each autonomous replication unit (replicon) contained a cis-acting element, the replicator, and a trans-acting element, the structural gene for the initiator, whose interaction triggered replication (Jacob et al., 1964). Isolation of the chromosomal replicator (oriC) and initiator (DnaA) followed and led to in vitro reconstitution of replication initiation (Bramhill and Kornberg, 1988). In this reaction, DnaA binds and unwinds oriC to load the replicative helicase DnaB onto single DNA strands, seeding the assembly of two divergent replisomes (Bell and Kaguni, 2013; Costa et al., 2013).
When Huberman and Riggs (1968) applied DNA autoradiography to eukaryotic cells, they found that replication started (fired) at multiple points spaced at 20–400-kb intervals and progressed bidirectionally at 2–3 kb/min (Fig. 1 B). A new term (replication origin) was coined to designate the start sites. The word replicon then designated the DNA replicated from a single origin. In mammalian cells with a typical 8–10-h S phase, groups of 5–10 adjacent replicons replicated synchronously within ∼1 h, implying a sequential activation of origin clusters through S phase. Yurov and Liapunova (1977) later unveiled ∼1–2-Mb-long mammalian replicons that replicated through a larger S phase window. The fork progression rate was fairly constant between cell types and organisms, whereas origin spacing and synchrony were more flexible and accounted for developmental and evolutionary variations of S phase length (Berezney et al., 2000). Replicons shortened when fork progression was artificially perturbed, revealing additional flexibility in response to stress (Taylor, 1977; Gilbert, 2007). However, the anonymity of labeled tracts precluded determination of whether origins corresponded to specific DNA sequences.
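The figures above imply very different completion times for short and long replicons. A minimal sketch of the arithmetic follows, assuming (for illustration only) that two neighboring origins fire at the same moment and that their converging forks travel at the same constant speed; the function name `merge_time_min` is hypothetical.

```python
def merge_time_min(spacing_kb: float, fork_speed_kb_per_min: float) -> float:
    """Minutes until two converging forks meet between neighboring origins.

    Assumption: both origins fire simultaneously and both forks move at the
    same constant speed, so each fork covers half of the inter-origin distance.
    """
    if spacing_kb < 0 or fork_speed_kb_per_min <= 0:
        raise ValueError("spacing must be >= 0 and fork speed must be > 0")
    return spacing_kb / (2.0 * fork_speed_kb_per_min)


if __name__ == "__main__":
    # Origin spacings quoted in the text: 20-400 kb, plus the 1-2-Mb replicons.
    for spacing_kb in (20, 400, 1000, 2000):
        fast = merge_time_min(spacing_kb, 3.0)  # 3 kb/min
        slow = merge_time_min(spacing_kb, 2.0)  # 2 kb/min
        print(f"{spacing_kb:>5} kb: {fast:6.1f}-{slow:6.1f} min")
```

Under these assumptions a 400-kb interval takes 100 min at 2 kb/min, whereas a 2-Mb replicon takes 500 min, which is consistent with megabase-sized replicons occupying a much larger part of an 8–10-h S phase.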
Eukaryotic replicators were first isolated from budding yeast as 100–200-bp DNA segments that conferred autonomous replication to recombinant plasmids (Stinchcomb et al., 1979; Struhl et al., 1979; Chan and Tye, 1980). Autonomously replicating sequences (ARSs) required a degenerate 11-bp T-rich ARS consensus sequence (ACS) together with nonconsensus elements 3′ to the ACS for function (Newlon and Theis, 1993). On ARS plasmids, replication does initiate at the ARS element and nowhere else, as first shown by 2D agarose gel electrophoretic analysis of replicating restriction fragments (Fig. 2, A and B; Brewer and Fangman, 1987; Huberman et al., 1987). In their chromosomal context, ARSs fire with variable efficiency (0–100% of cell cycles) and at different times in S phase (correlated with efficiency); termination occurs wherever converging forks meet (Fangman and Brewer, 1991; Raghuraman and Brewer, 2010). Origins sometimes replicate passively from adjacent origins, and different cells activate different origin cohorts. Genome-wide replication profiling and mathematical modeling have corroborated these conclusions (Raghuraman et al., 2001; Yang et al., 2010; Bechhoefer and Rhind, 2012; Gillespie et al., 2012; Hawkins et al., 2013).
The eukaryotic initiator was first isolated from budding yeast as a heterohexameric origin recognition complex (ORC) that bound yeast replicators in vitro (Bell and Stillman, 1992). ORC genes are conserved throughout eukaryotes. Mutations in yeast ORC genes caused defects in initiation (Bell et al., 1993; Foss et al., 1993; Micklem et al., 1993). In vivo footprinting showed that the ORC binds ARSs through the cell cycle and that additional proteins join ORC in G1 to form a prereplicative complex (pre-RC; Diffley and Cocker, 1992; Diffley et al., 1994). Only ∼400 of the ∼20,000 ACSs in the yeast genome are actually occupied by the ORC in vivo and function as origins. Origin ACSs are specifically flanked on the 3′ side by A-rich nucleosome-excluding signals that allow ORC binding. The ORC subsequently repositions the flanking nucleosomes, which probably facilitates pre-RC assembly (Lipford and Bell, 2001; Berbenetz et al., 2010; Eaton et al., 2010). Initiation was mapped at nucleotide resolution at plasmid and chromosomal ARS1 by replication initiation point mapping (Fig. 3 C, 4). A single initiation site was found for each leading strand, next to the ORC binding site, at positions separated by 18 bp on the plasmid but 2 bp in the chromosome (Bielinsky and Gerbi, 1998, 1999).
Unlike the bacterial initiator, ORC does not unwind origin DNA. In G1 phase, ORC and replication factors Cdc6 and Cdt1 load the minichromosome maintenance proteins 2–7 (MCM2–7), which form the core of the replicative helicase, as inactive head-to-head double hexamers onto double-stranded DNA (dsDNA). This step is termed origin licensing or pre-RC formation. In S phase, pre-RCs are acted on by S-phase protein kinases and many accessory factors, which reconfigure the inactive MCM2–7 double hexamer from a dsDNA binding mode to a single-stranded DNA binding mode, rendering the helicase active for origin unwinding and bidirectional replisome assembly (Fu et al., 2011; Tognetti et al., 2014).
A single ORC can load multiple MCM2–7 double hexamers onto dsDNA during licensing, but only a small fraction is activated in an unperturbed S phase. Unfired MCM2–7 double hexamers provide backup origins that can facilitate completion of normal S phase (Lucas et al., 2000; Hyrien et al., 2003) or rescue artificially stalled forks (Woodward et al., 2006; Ge et al., 2007; Ibarra et al., 2008). As cells complete S phase, however, unfired MCM2–7 double hexamers are cleared from chromatin by still elusive mechanisms. MCM2–7 double hexamers cannot be reloaded until the next G1 phase because of multiple cell cycle regulatory mechanisms (Siddiqui et al., 2013), which prevent DNA re-replication in a single cell cycle.
Random versus site-specific initiation in metazoans
In contrast to yeast, autonomous replication assays generally failed to identify metazoan replicators. DNA of any source replicated with an efficiency proportional to size but largely independent of DNA sequence in Xenopus laevis eggs or egg extracts (Harland and Laskey, 1980; Méchali and Kearsey, 1984; Blow and Laskey, 1986) and in human cells (Krysan et al., 1989). 2D gels showed random initiation in both cases (Krysan and Calos, 1991; Hyrien and Méchali, 1992; Mahbubani et al., 1992). Random initiation was also observed within the transcriptionally silent chromosomes of early Xenopus and Drosophila melanogaster embryos (Shinomiya and Ina, 1991; Hyrien and Méchali, 1993). Thus, random initiation is not a unique feature of exogenous DNA and is compatible with organismal viability. Consistent with this apparent lack of metazoan replicators, metazoan ORC bound DNA in vitro without sequence specificity, albeit with increased affinity for negatively supercoiled DNA (Vashee et al., 2003; Remus et al., 2004; Schaarschmidt et al., 2004).
In spite of that, replication initiates at specific positions within transcriptionally active metazoan chromosomes. EM first revealed that active nucleolar chromatin of fly larvae replicates from origins restricted to nontranscribed spacer elements between ribosomal RNA genes (McKnight and Miller, 1977), a location conserved in Xenopus and human somatic cells (Bozzoni et al., 1981; Little et al., 1993; Hyrien et al., 1995). The transition from random to specific initiation in intergenic sequences occurred when transcription resumed at the midblastula transition in developing Xenopus and Drosophila embryos (Hyrien et al., 1995; Sasaki et al., 1999). Thus, although the entire genome was a potential substrate for initiation, the efficiency of individual sites was epigenetically modulated in coordination with transcriptional activity during development.
Metazoan ORC may be targeted to specific sites by cofactors such as HMGA1 (Thomae et al., 2008), ORC-associated protein ORCA (Shen et al., 2010), noncoding RNAs (Norseen et al., 2008; Krude et al., 2009; Zhang et al., 2011), or specific histone modifications (Shen and Prasanth, 2012; Méchali et al., 2013; Sherstyuk et al., 2014). Tethering ORC to an array of Gal4 DNA binding sites was sufficient to generate a mammalian origin (Takeda et al., 2005). Tethering PR-Set7, the methylase responsible for H4K20 monomethylation, promoted trimethylation by Suv4-20h followed by ORC recruitment through ORC1 and ORCA (Tardat et al., 2010; Beck et al., 2012). Metazoan but not yeast ORC1 contains a bromo-adjacent homology (BAH) domain that specifically recognizes H4K20me2, and ORC1 BAH mutations that caused primordial dwarfism abolished this interaction and impaired ORC loading and cell cycle progression (Kuo et al., 2012). Thus, various trans-regulatory mechanisms may create sequence-specific origins despite the lack of classical replicators in metazoans.
Interestingly, yeast replicators were not strictly required for in vitro replication in yeast extracts; replication only became origin dependent in the presence of competitor DNA and limiting ORC concentrations (Gros et al., 2014; On et al., 2014). As restrictions imposed by chromatin were bypassed in this system, epigenetic mechanisms may contribute to origin specification in yeast as in mammals.
The Chinese hamster dihydrofolate reductase (DHFR) initiation zone
Identification of the first and most studied mammalian origin took advantage of a methotrexate-resistant Chinese hamster ovary cell line (CHOC400) that carried ∼1,000 copies of a ∼240-kb amplicon including the DHFR gene (Hamlin et al., 2010). Separation of EcoRI-digested CHOC400 DNA on agarose gels revealed ∼50 high-copy number restriction fragments above the background of single-copy fragments. Autoradiography of DNA labeled with [3H]thymidine in early S phase identified a half-dozen earliest-labeled fragments (Heintz and Hamlin, 1982) that all mapped within the 55-kb spacer between the convergently transcribed DHFR and 2BE2121 genes (Looney and Hamlin, 1987).
These results gave hope that mammalian replicators could be identified by more precise mapping within the spacer. Early studies indeed pointed to one or two preferential initiation sites (Handeli et al., 1989; Leu and Hamlin, 1989; Burhans et al., 1990; Vassilev et al., 1990). However, 2D gels revealed that initiation could in fact occur at any of a large number (>40) of potential sites with different efficiencies within the 55-kb spacer (Vaughn et al., 1990; Dijkwel and Hamlin, 1992; Dijkwel et al., 2002), even in nonamplified CHO cells (Dijkwel and Hamlin, 1995). This dispersive initiation was confirmed by macroarray hybridization of Okazaki fragments or short nascent strands (SNSs; replicative 300–1,000-nt single strands, assumed to be centered on origins; Fig. 3) labeled in permeabilized cells (Wang et al., 1998; Dijkwel et al., 2002; Sasaki et al., 2006) and by DNA combing (Fig. 1 C; Lubelsky et al., 2011). ORCs and minichromosome maintenance proteins were located by chromatin immunoprecipitation at low nucleosome occupancy sites, and enrichment was related to initiation efficiency (Lubelsky et al., 2011).
Initiation was only detected in <30% of the spacer copies, and active spacers appeared to support a single initiation event, implying a mean efficiency of initiation per kilobase of ∼0.5% (Dijkwel and Hamlin, 1992). Sites named ori-β, ori-β′, and ori-γ appeared preferred, though (Handeli et al., 1989; Leu and Hamlin, 1989; Burhans et al., 1990; Dijkwel et al., 2002), and analysis of ectopically relocalized ori-β suggested that small deletions could reduce its ectopic origin activity (Altman and Fanning, 2004). However, in loco deletion of ori-β, ori-β′, or even a central 40-kb segment spanning ori-β, ori-β′, and ori-γ did not reduce but actually increased initiation in the rest of the spacer (Mesner et al., 2003). Therefore, none of the preferred initiation sites contained any critical, nonredundant element required for initiation, and if redundant replicators existed, each appeared to control initiation only locally and inefficiently.
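The per-kilobase efficiency quoted above follows from simple division. The sketch below reproduces it, assuming exactly one initiation event per active spacer and taking 30% as the upper bound on the fraction of active spacers; the function name `per_kb_efficiency` is hypothetical.

```python
def per_kb_efficiency(fraction_active: float,
                      events_per_active_zone: float,
                      zone_kb: float) -> float:
    """Mean probability of initiation per kilobase per cell cycle.

    Assumption: initiation events are spread evenly over the zone, so the
    zone-wide initiation frequency is simply divided by the zone length.
    """
    if zone_kb <= 0:
        raise ValueError("zone length must be positive")
    return fraction_active * events_per_active_zone / zone_kb


# Values taken from the text: <30% of spacers active, one event each, 55-kb spacer.
efficiency = per_kb_efficiency(0.30, 1.0, 55.0)
print(f"{efficiency:.2%} per kb")
```

Because 30% is an upper bound, the result (just over 0.5% per kb) is also an upper bound, in line with the ∼0.5% estimate.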
Transcription circumscribes and stimulates replication initiation
Dispersive initiation bounded by transcribed genes, as in Xenopus and Drosophila post-midblastula transition embryos (Hyrien et al., 1995; Sasaki et al., 1999), was also observed by 2D gels at the CHO rhodopsin locus (Dijkwel et al., 2000) and in human ribosomal DNA (Little et al., 1993). Deleting the DHFR gene promoter broadened the initiation zone to include the inactive DHFR gene (Saha et al., 2004). Conversely, deleting the DHFR transcription terminator allowed transcription to invade all but 8 kb of the spacer and confined initiation to that segment (Mesner and Hamlin, 2005). When fragments containing ori-β, ori-β′, or DHFR gene sequences were integrated at ectopic locations, they all sustained dispersive initiation, whereas the adjacent, transcribed neomycin-resistance marker did not. In contrast, when a cosmid containing an active DHFR gene was relocated, dispersive initiation was detected in the nontranscribed bacterial vector sequences but not in the DHFR gene (Lin et al., 2005). These genetic experiments strongly suggested that any DNA sequence contained potential initiation sites but that these sites could be silenced by read-through transcription. Consistently, transcription inhibited autonomous plasmid replication in human cells (Haase et al., 1994).
When bare DNA or early G1 CHOC400 nuclei were added to Xenopus egg extracts, replication initiated at random sequences. When late G1 nuclei were the template, however, replication initiated specifically within the DHFR initiation zone (Gilbert et al., 1995; Wu and Gilbert, 1996). This transition, named the origin decision point (ODP), occurred in G1 ∼4 h after metaphase and was abolished by transcription inhibitors (but not protein synthesis inhibitors). However, transcription of the DHFR domain was detected before the ODP and did not increase at the ODP. Therefore, transcription was necessary but not sufficient to circumscribe initiation (Dimitrova, 2006; Sasaki et al., 2006). The ODP perhaps activates a mechanism for unloading pre-RCs from transcribed genes in G1, reminiscent of pre-RC unloading ahead of progressing forks during S phase.
Deletion of the DHFR gene promoter broadened the initiation zone but also lowered its overall efficiency (Saha et al., 2004). Conversely, zone truncation by internal deletion (Mesner et al., 2003) or invading transcription (Mesner and Hamlin, 2005) increased initiation in the remaining nontranscribed sequences. Thus, nearby transcription had positive effects on adjacent initiation, causing compensatory changes in zone size and local initiation rate.
Broad and narrow initiation zones
Before the genomic era, only a few mammalian origins were identified. Mapping single-copy origins was challenging, and various techniques were elaborated to capture and quantify the rare and fragile initiation intermediates (Figs. 1–3).
Early strand polarity (Fig. 3 C, 2 and 3) or SNS abundance (Fig. 3, B and C, 1) assays pointed to narrowly localized origins upstream of the MYC gene (Vassilev and Johnson, 1990), between the LMNB2 and TIMM13 genes (Biamonti et al., 1992; Giacca et al., 1994; Kumar et al., 1996), and between the δ- and β-globin genes (Kitsberg et al., 1993). In contrast to the apparent lack of human replicators (Krysan et al., 1989), an ARS was identified upstream of the MYC gene in HeLa cells (McWhinney and Leffak, 1990), and the leading strand switch at the β-globin origin was suppressed by a natural 8-kb deletion spanning the origin (Kitsberg et al., 1993) or by a remote deletion encompassing a distant transcriptional regulatory element, the locus control region (Aladjem et al., 1995). When wild-type or mutated MYC, β-globin, or LMNB2 origin fragments were ectopically relocalized, SNS assays detected ectopic origin activity, and certain mutations reduced SNS abundance in a manner consistent with a modular replicator structure (Aladjem et al., 1998; Malott and Leffak, 1999; Liu et al., 2003; Paixão et al., 2004; Wang et al., 2004; Buzina et al., 2005), as for the DHFR ori-β relocation experiments (Altman and Fanning, 2004). However, later SNS studies revealed broader and more dispersive initiation than initially thought at the MYC (Waltz et al., 1996; Trivedi et al., 1998) and human β-globin (Kamath and Leffak, 2001) loci and broad initiation zones at the homologous mouse and chicken β-globin domains (Aladjem et al., 2002; Prioleau et al., 2003). Although two replication initiation point mapping (Fig. 3 C, 4) studies (Abdurashidova et al., 2000; Lee and Romero, 2012) reported highly localized (but partly conflicting) leading-strand start sites within the LMNB2/TIMM13 intergene, DNA combing (Fig. 1 C) detected broadly dispersed initiation over ∼800 kb of surrounding DNA with only some preference for a ∼200-kb area upstream of the LMNB2 gene (Palumbo et al., 2010).
In summary, sites initially believed to represent efficient and specific replicators may in fact be embedded in broad initiation zones. The significance of ectopic relocation experiments is thus limited, as only local effects were monitored while surrounding sequences may also support initiation.
A prevalence of dispersive initiation zones has been observed in mammalian cells. DNA combing identified 36 fully or predominantly intergenic initiation zones in a 1.5-Mb region of human chromosome 14q11.2 (Lebofsky et al., 2006). Each zone (2.6–21.6 kb in size) fired in only a fraction of the cell cycles and seldom sustained more than one initiation, reminiscent of the DHFR initiation zone. Broad initiation zones were identified by single molecule analysis of replicated DNA (SMARD; Fig. 1 C) at the mouse Igh locus (Norio et al., 2005; Demczuk et al., 2012; Gauthier et al., 2012), at the human POU5F1, NANOG (Schultz et al., 2010), and FMR1 (Gerhardt et al., 2014) loci, and at human subtelomeres (Drosopoulos et al., 2012). Six narrow intergenic origins identified by DNA combing in the polygenic Chinese hamster AMPD2 amplicon (Anglana et al., 2003) may consist of single sites or narrow zones. The most efficient one, oriGNAI3, had been previously detected by neutral/alkaline 2D gel, SNS abundance, and leading-strand polarity assays (Toledo et al., 1998, 1999; Svetlova et al., 2001). Origin hierarchy was regulated by fork speed such that oriGNAI3 predominance was stronger when forks progressed faster (Anglana et al., 2003), perhaps because faster forks left nearby weaker origins less chance to fire.
Developmental, metabolic, and hierarchical regulation of origins
Developmental activation or repression of initiation sites was observed at the mouse Igh locus during B cell development (Norio et al., 2005). Extensive SMARD analysis suggested that potential origins were abundant throughout the locus but fired at a rate that changed abruptly (≤77-fold) between adjacent domains (50–650 kb in size) while staying constant within domains and implicated the developmental regulator Pax5 in modifying origin usage during differentiation (Demczuk et al., 2012; Gauthier et al., 2012). Changes in origin usage were also observed at the chicken β-globin locus during terminal erythrocyte differentiation (Dazy et al., 2006), at the mouse HoxB9 locus during in vitro differentiation of embryonic carcinoma cells (Grégoire et al., 2006), and at the human POU5F1 locus during human embryonic stem cell differentiation (Schultz et al., 2010).
By analogy to transcription, it was proposed that histone acetylation may increase origin accessibility and activity. Trichostatin A, a histone hyperacetylating agent, increased initiation genome wide and evened out initiation preference at specific human origins (Kemp et al., 2005). However, no clear link was observed between developmental regulation of origin activity and histone acetylation at the chicken β-globin and mouse HoxB9 loci (Prioleau et al., 2003; Grégoire et al., 2006). In the AMPD2 amplicon, Trichostatin A attenuated origin hierarchy but also altered pyrimidine pools and slowed fork progression; supplying nucleotide precursors restored both fork speed and origin hierarchy, which were therefore independent of origin histone acetylation (Gay et al., 2010).
Genome-wide analysis of purified replication intermediates
DNA microarrays and massive DNA sequencing have caused an explosion in the number of mammalian genome-wide origin maps (Table 1) and replication timing profiles (Gilbert, 2010, 2012; Rhind and Gilbert, 2013). In general, replication timing profiles were highly reproducible but lacked the resolution to map individual origins, whereas origin maps had higher resolution but were less concordant.
The first high-throughput mapping of human origins probed microarrays spanning the MYC, LMNB2, β-globin, and FMR1 origins and a 1-Mb region on chromosome 22 with short (0.3–1.0 kb) DNA single strands (short single strands [SSS]), assumed to represent newly synthesized origin DNA, from lymphoblastoid cells. The four control origins and 28 new origins were detected (Lucas et al., 2007). However, these SSS (1% of starting total DNA) were much more abundant than expected (∼0.001%), suggesting massive contamination by irrelevant nicked DNA. Cadoret et al. (2008) reported that ∼99% of SSS from HeLa cells were eliminated by λ 5′-exonuclease, which digests DNA lacking an RNA primer (Bielinsky and Gerbi, 1998). When the remaining material (λ-SNS) was hybridized to ENCODE microarrays, 283 peaks were identified compared with nine peaks with undigested SSS, suggesting that λ-exonuclease treatment was mandatory to identify origins (Cadoret et al., 2008), as confirmed by Cayrou et al. (2011). In contrast, Valenzuela et al. (2011) detected no difference between SSS and λ-SNS from MCF-7 cells (Table 1). No major differences between SSS and BrdU-labeled, immunopurified SNS (BrdU-SNS) were detected by PCR analyses of the LMNB2 (Kumar et al., 1996) and GNAI3 (Toledo et al., 1998, 1999) origins. How could origins be detected in SSS if >99% are irrelevant broken strands? Do they preferentially break during SSS isolation, overreplicate, or accumulate as unligated strands? Strikingly, Gómez and Antequera (2008) reported that origins were overrepresented ∼10-fold in total human DNA, as a result of reiterative synthesis and release of short (200 bp) dsDNA molecules with 5′ RNA primers. Whether such “abortive initiation” generates SSS, BrdU-SNS, or λ-SNS peaks remains unclear.
Disturbingly, only 11–35% pairwise matching was observed between independent studies profiling either λ-SNS (Cadoret et al., 2008; Karnani et al., 2010), BrdU-SNS (Karnani et al., 2010), or replication bubble–containing EcoRI fragments (trapped in gelling agarose; Fig. 2 C; Mesner et al., 2011) from HeLa cells along ENCODE microarrays. Importantly, the purity of the trapped bubbles was evaluated to be >80% by 2D gel analysis, a method independent of the origin isolation scheme (Fig. 2, B and C).
The ENCODE λ-SNS formed narrow peaks, whereas the bubble fragments clustered into zones that showed dispersive initiation by 2D gels. It was possible that flat SNS signals produced by dispersive initiation were overlooked by peak-calling algorithms or that lack of saturation limited the overlap of ENCODE studies. Consistently, Mukhopadhyay et al. (2014) reported 70% genome-wide overlap between independently prepared λ-SNS and BrdU-SNS. Furthermore, when Besnard et al. (2012) sequenced λ-SNS from HeLa and three other cell types to saturation (∼250,000 peaks in each), many peaks did cluster into zones. However, the complete genomic set of HeLa λ-SNS still only overlapped 51%, 14%, and 6% of the bubbles, λ-SNS, and BrdU-SNS of Mesner et al. (2011) and Karnani et al. (2010), respectively, in contrast to 80% of the λ-SNS of Cadoret et al. (2008).
Bubbles and SNS showed different conservation between cell types. HeLa (adenocarcinoma) and GM06990 (lymphoblastoid) cells shared only 28–43% of their ENCODE bubble fragments (Mesner et al., 2011), but λ-SNS were more conserved between cell lines (Sequeira-Mendes et al., 2009; Cayrou et al., 2011; Valenzuela et al., 2011; Picard et al., 2014). The λ-SNS of the four cell lines studied by Besnard et al. (2012) overlapped by 65–84% pairwise, with 50% ubiquitous peaks that included 91% of the λ-SNS of Cadoret et al. (2008) or Martin et al. (2011) and even 81% of the SSS of Lucas et al. (2007). When Mesner et al. (2013) sequenced bubbles from GM06990 cells, however, only 33–37% overlapped any of the λ-SNS of Besnard et al. (2012), and only 45–46% of the latter overlapped the bubbles. Picard et al. (2014) called λ-SNS peaks from the Besnard et al. (2012) data using a broader window and clustered them into zones comparable to bubbles, thus raising the overlap with bubbles to 65%; yet, only 44% of the bubbles overlapped the λ-SNS zones. In summary, bubble maps remain discordant from SNS and are markedly more flexible between cell lines (Table 1). The contrasting properties of these origin populations are further discussed at the end of this review.
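The overlap percentages quoted in this section are directional: the fraction of set A that overlaps set B need not equal the fraction of B that overlaps A, particularly when narrow peaks are compared with broad zones. The sketch below illustrates this with invented toy intervals on a single chromosome. The function name `fraction_overlapping`, the 1-bp overlap criterion, and the data are assumptions for illustration and do not reproduce the procedure of any cited study.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def fraction_overlapping(query: List[Interval], reference: List[Interval]) -> float:
    """Fraction of query intervals sharing at least 1 bp with any reference interval."""
    if not query:
        return 0.0
    hits = 0
    for q_start, q_end in query:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(q_start < r_end and r_start < q_end for r_start, r_end in reference):
            hits += 1
    return hits / len(query)


# Toy data (invented): five narrow "peaks" and three broad "zones".
peaks = [(100, 110), (150, 160), (180, 190), (500, 510), (900, 910)]
zones = [(90, 200), (300, 400), (600, 700)]

peaks_in_zones = fraction_overlapping(peaks, zones)  # 3 of 5 peaks
zones_with_peaks = fraction_overlapping(zones, peaks)  # 1 of 3 zones
print(f"{peaks_in_zones:.0%} of peaks overlap a zone")
print(f"{zones_with_peaks:.0%} of zones overlap a peak")
```

In this toy case three peaks fall in one zone, so 60% of peaks overlap zones but only a third of zones contain a peak. The quadratic scan is adequate for small examples; genome-scale comparisons would normally use sorted intervals or an interval tree.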
Genome-wide localization of Homo sapiens ORC in HeLa cells was reported by Dellino et al. (2013), who identified 13,600 ORC1 binding sites with no consensus sequence. Only 11%, 30%, and 47% of the 229 ENCODE ORC1 peaks matched the Karnani λ-SNS, the Cadoret λ-SNS, and the Mesner bubbles, and only 8%, 23%, and 20% of the latter, respectively, coincided with ORC1 peaks. As ORC has functions other than replication initiation, it is likely that only a fraction of these peaks correspond to true origins.
Potential genetic and epigenetic determinants of origins
Despite incomplete overlap, many studies reported a correlation with transcription start sites (TSSs) and CpG islands (Table 1). Moreover, G-rich or G-quadruplex (G4) motifs were associated with 70–90% of human, mouse, and Drosophila λ-SNS peaks (Besnard et al., 2012; Cayrou et al., 2012a). Interestingly, Homo sapiens ORC bound randomly to dsDNA but preferentially to G4 motifs on single-stranded DNA (Hoshina et al., 2013). G4s were required in an orientation-dependent manner for λ-SNS accumulation at two origins in DT40 chicken cells (Valton et al., 2014). However, 36% of GM06990 bubble fragments did not contain G4s (Mesner et al., 2013). The concern was raised that λ-exonuclease can pause in a strand-specific manner at GC-rich sequences (Perkins et al., 2003; Conroy et al., 2010). This could explain the λ-SNS enrichment in CpG islands and G4s, their strong conservation between cell types, and the orientation effects observed by Valton et al. (2014), although Cayrou et al. (2011, 2012b) reported that λ-SNS were eliminated by previous RNase or alkali treatment and absent from mitotic or quiescent cells.
Until recently, no particular histone modification showed a striking association with origins. H3K79me2 was more enriched at origins mapped by Martin et al. (2011) than any other single chromatin modification, but prevention of H3K79 methylation did not alter origin density (Fu et al., 2013). Monomethylation of H4K20 by PR-Set7 followed by di- and trimethylation by Suv4-20h has been implicated in pre-RC assembly (Tardat et al., 2010; Beck et al., 2012). The ORC1 BAH domain specifically recognizes H4K20me2 (Kuo et al., 2012), but this histone modification seems too abundant (80% of all H4 molecules) to explain origin specificity (Schotta et al., 2008). ORCA showed a preference for H4K20me3, which, in combination with H4K20me2, may guide origin choice more selectively (Beck et al., 2012). Association of H4K20me1 with origins, though not observed by Martin et al. (2011), was detected by Picard et al. (2014). Cell cycle investigations of all three H4K20 methylation states may provide further insight. More details on epigenetic modulation of origins can be found in Sherstyuk et al. (2014).
Genome-wide analysis of replication fork directionality
Origins can be predicted from DNA sequence alone (Hyrien et al., 2013). Lobry (1996) discovered that bacteria have an asymmetric composition of the two DNA strands, with enrichment of the leading strand in G over C and T over A. The GC and TA skews, S_GC = (G − C)/(G + C) and S_TA = (T − A)/(T + A), reflect replication direction throughout evolution because the leading and lagging strands experience different rates of nucleotide substitution. Detecting an abrupt change of sign of S_GC is thus used to predict bacterial origins and termini (Grigoriev, 1998).
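The skew definitions above translate directly into code. The following is a minimal sketch, assuming non-overlapping fixed-size windows and a toy sequence invented for illustration; the function names (`skew`, `windowed_gc_skew`, `sign_changes`) are hypothetical and do not come from any cited study. Because the leading strand is enriched in G over C, a negative-to-positive sign change along the sequence is treated here as a candidate origin and a positive-to-negative change as a candidate terminus.

```python
from typing import List, Tuple


def skew(seq: str, plus: str, minus: str) -> float:
    """Compositional skew (plus - minus)/(plus + minus), e.g. (G - C)/(G + C)."""
    n_plus = seq.count(plus)
    n_minus = seq.count(minus)
    total = n_plus + n_minus
    return (n_plus - n_minus) / total if total else 0.0


def windowed_gc_skew(seq: str, window: int) -> List[float]:
    """GC skew in consecutive non-overlapping windows (a trailing partial window is dropped)."""
    seq = seq.upper()
    return [skew(seq[i:i + window], "G", "C")
            for i in range(0, len(seq) - window + 1, window)]


def sign_changes(skews: List[float]) -> Tuple[List[int], List[int]]:
    """Return (upward, downward): window indices where the skew sign flips."""
    upward = [i for i in range(1, len(skews)) if skews[i - 1] < 0 < skews[i]]
    downward = [i for i in range(1, len(skews)) if skews[i - 1] > 0 > skews[i]]
    return upward, downward


# Toy sequence (invented): C-rich left half, G-rich right half.
toy = "CCCA" * 5 + "GGGT" * 5
skews = windowed_gc_skew(toy, window=10)
candidate_origins, candidate_termini = sign_changes(skews)
print(skews, candidate_origins, candidate_termini)
```

The TA skew can be obtained from the same helper as `skew(seq, "T", "A")`. Real genomes are noisy, so practical analyses typically smooth the profile or use cumulative skew rather than raw window-to-window sign changes.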
Upward skew jumps (+S jumps) similar to bacterial origins have been detected at 1,546 sites in the human genome (Brodie of Brodie et al., 2005; Touchon et al., 2005). They frequently occurred between divergent housekeeping genes (Huvet et al., 2007) in open chromatin (Audit et al., 2009) and arose from additive effects of replication- and transcription-associated mutational asymmetries (Chen et al., 2011). Between upward jumps, the skew decreased in a linear manner, suggesting a progressive inversion of replication fork directionality across megabase-sized segments termed N domains because of the resulting N-shaped skew profile (Huvet et al., 2007). Thus, S-jump origins must have been highly active over evolution, whereas any intervening origins must have fired dispersively to account for this progressive inversion (Hyrien et al., 2013).
When compared with somatic cell replication timing profiles, S jumps coincided with early replicating peaks and N-domain centers with late-replicating U-shaped valleys (Audit et al., 2007; Chen et al., 2010; Hansen et al., 2010; Baker et al., 2012). Hundreds of megabase-sized domains with a U-shaped timing profile were identified independent of skew analysis (Baker et al., 2012; Audit et al., 2013). Demonstrating that the timing gradient equaled the ratio of fork speed to fork directionality led us to predict an N-shaped fork directionality profile of U domains strikingly similar to skew N domains (Guilbaud et al., 2011; Baker et al., 2012). U domains coincided with chromatin self-interaction domains revealed by conformation capture (Lieberman-Aiden et al., 2009). The U and N shapes of the timing and fork directionality profiles were quantitatively explained by a cascade model for sequential activation of origins with increasing synchrony from domain borders to center (Hyrien et al., 2013).
Potential models for mammalian genome replication
Given the incomplete concordance of origin features and locations between studies (Table 1), a definitive portrait of mammalian origins is premature. Nonetheless, most studies agreed that a fraction of origins overlapped the 5′ end or the entirety of active transcription units and their chromatin marks, especially in early replicating regions, whereas late origins were less efficient, more dispersed, and not associated with these marks. λ-SNS tended to highlight the former category, whereas bubbles were more often detected in nontranscribed genic or intergenic sequences, regardless of replication timing, and were anticorrelated with both activating and repressive chromatin marks in late-replicating zones. Comparison of replication timing and chromatin interaction data suggested that early and late-replicating sequences reside in two segregated chromatin compartments (Ryba et al., 2010). Replication timing can be predicted by DNase I hypersensitivity better than by TSSs, suggesting that origins colocalize with promoters just because they colocalize with DNase hypersensitivity (Gindin et al., 2014). Consistently, the cascade model proposed for N/U-domain replication combines efficient initiation at early replicating master origins in open chromatin between active genes with more random and later initiation elsewhere (Hyrien et al., 2013).
Direct determination of replication fork directionality by sequencing of Okazaki fragments, a powerful technique first validated in yeast (McGuffee et al., 2013), has been recently achieved in human cells (unpublished data). These new data, which confirm the predicted directionality profiles of N/U domains, will hopefully help dispel the mist that still surrounds mammalian replication origins.
Acknowledgments
I thank Benjamin Audit, Alain Arneodo, Francesco de Carli, Chun-Long Chen, Nataliya Petryk, Claude Thermes, and Xia Wu for their comments on the manuscript.
Work in my laboratory was supported by grants from the Ligue Nationale Contre le Cancer (Comité de Paris), the Cancéropole Ile-de-France (ERABL), and the Agence Nationale pour la Recherche (REFOPOL-BLAN2010-161501).
The author declares no competing financial interests.
Abbreviations used in this paper
ACS, ARS consensus sequence
ARS, autonomously replicating sequence
BAH, bromo-adjacent homology
CldU, chlorodeoxyuridine
DHFR, dihydrofolate reductase
dsDNA, double-stranded DNA
IdU, iododeoxyuridine
ODP, origin decision point
ORC, origin recognition complex
pre-RC, prereplicative complex
SMARD, single molecule analysis of replicated DNA
SNS, short nascent strand
SSS, short single strands
TSS, transcription start site