Replication of mammalian genomes starts at sites termed replication origins, which historically have been difficult to locate as a result of large genome sizes, limited power of genetic identification schemes, and rareness and fragility of initiation intermediates. However, origins are now mapped by the thousands using microarrays and sequencing techniques. Independent studies show modest concordance, suggesting that mammalian origins can form at any DNA sequence but are suppressed by read-through transcription or that they can overlap the 5′ end or even the entire gene. These results require a critical reevaluation of whether origins form at specific DNA elements and/or epigenetic signals or require no such determinants.
Introduction
In 1963, Cairns invented DNA fiber autoradiography to spread and visualize [3H]thymidine-labeled chromosomal DNA from Escherichia coli cells (Fig. 1 A). Using this technique, bidirectional replication was found to start at a single site (Cairns, 1963; Prescott and Kuempel, 1972), but its position could not be determined. Meanwhile, genetic studies suggested that in E. coli, each autonomous replication unit (replicon) contained a cis-acting element, the replicator, and a trans-acting element, the structural gene for the initiator, whose interaction triggered replication (Jacob et al., 1964). Isolation of the chromosomal replicator (oriC) and initiator (DnaA) followed and led to in vitro reconstitution of replication initiation (Bramhill and Kornberg, 1988). In this reaction, DnaA binds and unwinds oriC to load the replicative helicase DnaB onto single DNA strands, seeding the assembly of two divergent replisomes (Bell and Kaguni, 2013; Costa et al., 2013).
Replication mapping by DNA fiber techniques. (A and B) DNA fiber autoradiography (Cairns, 1963; Huberman and Riggs, 1968). Cells are labeled with [3H]thymidine, gently lysed on a glass slide, covered with photographic emulsion, and exposed for several months to reveal the stretches of radiolabeled spread DNA as silver grain tracks. (A) Intact E. coli chromosomal DNA labeled for several generations showed θ forms, suggesting replication by the fork mechanism and a single initiation event per chromosome. Brief sequential pulses of low and high activity produced grain tracks denser on both ends than in the middle, providing evidence for bidirectional replication (Prescott and Kuempel, 1972). (B) Pulse-labeled DNA from eukaryotic cells showed tandem tracks, indicating multiple bidirectional origins (Huberman and Riggs, 1968). (C) In DNA fiber fluorography, cells are typically pulsed with chlorodeoxyuridine (CldU) and then iododeoxyuridine (IdU; or vice versa), and the labeled tracks are detected with appropriate fluorescent antibodies, shortening imaging times to seconds. DNA can be spread by direct cell lysis on a glass slide (Jackson and Pombo, 1998), by attachment of purified DNA molecule ends to silanized coverslips before parallel stretching by a receding air/water meniscus (DNA combing; Michalet et al., 1997), or by capillary stretching between slide and coverslip (Norio and Schildkraut, 2001). CldU/IdU detection (green/blue) can be combined with FISH with specific DNA probes (red) to identify and orient target DNA molecules (Norio and Schildkraut, 2001; Anglana et al., 2003). Unlabeled DNA (dotted line) may be simultaneously detected in a fourth color with anti-DNA antibodies. In a variation called SMARD (Norio and Schildkraut, 2001), the labeled DNA is cut with a rare cutter restriction endonuclease, and a large (100–500 kb) target fragment is enriched by pulsed-field gel electrophoresis before stretching and detection. In this case, long CldU/IdU labeling times are needed to chase forks out of the labeled fragments before their electrophoretic separation. Tens to hundreds of single DNA molecules 100–1,000 kb in size identified by FISH are analyzed in a typical SMARD or DNA combing experiment.
Replication mapping by DNA fiber techniques. (A and B) DNA fiber autoradiography (Cairns, 1963; Huberman and Riggs, 1968). Cells are labeled with [3H]thymidine, gently lysed on a glass slide, covered with photographic emulsion, and exposed for several months to reveal the stretches of radiolabeled spread DNA as silver grain tracks. (A) Intact E. coli chromosomal DNA labeled for several generations showed θ forms, suggesting replication by the fork mechanism and a single initiation event per chromosome. Brief sequential pulses of low and high activity produced grain tracks denser on both ends than in the middle, providing evidence for bidirectional replication (Prescott and Kuempel, 1972). (B) Pulse-labeled DNA from eukaryotic cells showed tandem tracks, indicating multiple bidirectional origins (Huberman and Riggs, 1968). (C) In DNA fiber fluorography, cells are typically pulsed with chlorodeoxyuridine (CldU) and then iododeoxyuridine (IdU; or vice versa), and the labeled tracks are detected with appropriate fluorescent antibodies, shortening imaging times to seconds. DNA can be spread by direct cell lysis on a glass slide (Jackson and Pombo, 1998), by attachment of purified DNA molecule ends to silanized coverslips before parallel stretching by a receding air/water meniscus (DNA combing; Michalet et al., 1997), or by capillary stretching between slide and coverslip (Norio and Schildkraut, 2001). CldU/IdU detection (green/blue) can be combined with FISH with specific DNA probes (red) to identify and orient target DNA molecules (Norio and Schildkraut, 2001; Anglana et al., 2003). Unlabeled DNA (dotted line) may be simultaneously detected in a fourth color with anti-DNA antibodies. In a variation called SMARD (Norio and Schildkraut, 2001), the labeled DNA is cut with a rare cutter restriction endonuclease, and a large (100–500 kb) target fragment is enriched by pulsed-field gel electrophoresis before stretching and detection. In this case, long CldU/IdU labeling times are needed to chase forks out of the labeled fragments before their electrophoretic separation. Tens to hundreds of single DNA molecules 100–1,000 kb in size identified by FISH are analyzed in a typical SMARD or DNA combing experiment.
When Huberman and Riggs (1968) applied DNA autoradiography to eukaryotic cells, they found that replication started (fired) at multiple points spaced at 20–400-kb intervals and progressed bidirectionally at 2–3 kb/min (Fig. 1 B). A new term (replication origin) was coined to designate the start sites. The word replicon then designated the DNA replicated from a single origin. In mammalian cells with a typical 8–10-h S phase, groups of 5–10 adjacent replicons replicated synchronously within ∼1 h, implying a sequential activation of origin clusters through S phase. Yurov and Liapunova (1977) later unveiled ∼1–2-Mb-long mammalian replicons that replicated through a larger S phase window. The fork progression rate was fairly constant between cell types and organisms, whereas origin spacing and synchrony were more flexible and accounted for developmental and evolutionary variations of S phase length (Berezney et al., 2000). Replicons shortened when fork progression was artificially perturbed, revealing additional flexibility in response to stress (Taylor, 1977; Gilbert, 2007). However, the anonymity of labeled tracts precluded determination of whether origins corresponded to specific DNA sequences.
Eukaryotic replicators were first isolated from budding yeast as 100–200-bp DNA segments that conferred autonomous replication to recombinant plasmids (Stinchcomb et al., 1979; Struhl et al., 1979; Chan and Tye, 1980). Autonomously replicating sequences (ARSs) required a degenerate 11-bp T-rich ARS consensus sequence (ACS) together with nonconsensus elements 3′ to the ACS for function (Newlon and Theis, 1993). On ARS plasmids, replication does initiate at the ARS element and nowhere else, as first shown by 2D agarose gel electrophoretic analysis of replicating restriction fragments (Fig. 2, A and B; Brewer and Fangman, 1987; Huberman et al., 1987). In their chromosomal context, ARSs fire with variable efficiency (0–100% of cell cycles) and at different times in S phase (correlated with efficiency); termination occurs wherever converging forks meet (Fangman and Brewer, 1991; Raghuraman and Brewer, 2010). Origins sometimes replicate passively from adjacent origins, and different cells activate different origin cohorts. Genome-wide replication profiling and mathematical modeling have corroborated these conclusions (Raghuraman et al., 2001; Yang et al., 2010; Bechhoefer and Rhind, 2012; Gillespie et al., 2012; Hawkins et al., 2013).
Replication mapping by restriction fragment shape or strand composition analysis. (A) Neutral/alkaline 2D gel technique (Huberman et al., 1987). A restriction digest of total DNA is enriched for partially single-stranded, replication fork–containing fragments, by chromatography on BND (benzyl-naphtyl-DEAE)-cellulose. The enriched material is first separated in neutral agarose so that replication intermediates (RIs) of each fragment are resolved according to mass (horizontal arrows). Parental and nascent strands are then melted and resolved in an orthogonal direction in alkaline agarose (vertical arrows). After membrane transfer, center or end probes (noted left [L], middle [M], and right [R]) are used to reveal whether nascent strands grow from the center by internal initiation or from either end by entry of outside-initiated forks. The diagonal smear of nascent strands detected by each probe is indicated in the same color as the probe. (B) Neutral/neutral 2D gel technique (Brewer and Fangman, 1987). A restriction digest of total DNA is enriched in replication intermediates and separated in a first electrophoresis as in A. Branched fragments of similar mass but various shapes are then resolved in an orthogonal neutral electrophoresis using conditions that maximize contribution of shape to migration rate. Transfer and hybridization reveal whether the fragment’s replication intermediates contain two diverging forks (bubbles, internal initiation), one fork (simple Ys, passive replication), or two converging forks (double Ys, termination; not depicted). Panels illustrate the patterns obtained in case of centered, off-centered, or random initiation within the restriction fragment. (C) Bubble trap (Mesner et al., 2006). A restriction digest of total DNA enriched in replication intermediates by isolation on the nuclear matrix and chromatography on BND-cellulose is mixed with molten agarose, allowed to solidify, and electrophoresed out of the agarose plug. Bubbles become topologically trapped in the gel as a result of agarose fiber polymerization through their circular structure, whereas replication intermediates of other shapes can migrate out of the plug. Trapped bubbles are then cloned in a plasmid library and either hybridized to microarrays (Mesner et al., 2011) or sequenced (Mesner et al., 2013). Library purity is estimated to >80% by 2D gel analysis of the trapped material or by probing 2D gels of total replication intermediates with individual clones and scoring for a bubble arc.
Replication mapping by restriction fragment shape or strand composition analysis. (A) Neutral/alkaline 2D gel technique (Huberman et al., 1987). A restriction digest of total DNA is enriched for partially single-stranded, replication fork–containing fragments, by chromatography on BND (benzyl-naphtyl-DEAE)-cellulose. The enriched material is first separated in neutral agarose so that replication intermediates (RIs) of each fragment are resolved according to mass (horizontal arrows). Parental and nascent strands are then melted and resolved in an orthogonal direction in alkaline agarose (vertical arrows). After membrane transfer, center or end probes (noted left [L], middle [M], and right [R]) are used to reveal whether nascent strands grow from the center by internal initiation or from either end by entry of outside-initiated forks. The diagonal smear of nascent strands detected by each probe is indicated in the same color as the probe. (B) Neutral/neutral 2D gel technique (Brewer and Fangman, 1987). A restriction digest of total DNA is enriched in replication intermediates and separated in a first electrophoresis as in A. Branched fragments of similar mass but various shapes are then resolved in an orthogonal neutral electrophoresis using conditions that maximize contribution of shape to migration rate. Transfer and hybridization reveal whether the fragment’s replication intermediates contain two diverging forks (bubbles, internal initiation), one fork (simple Ys, passive replication), or two converging forks (double Ys, termination; not depicted). Panels illustrate the patterns obtained in case of centered, off-centered, or random initiation within the restriction fragment. (C) Bubble trap (Mesner et al., 2006). A restriction digest of total DNA enriched in replication intermediates by isolation on the nuclear matrix and chromatography on BND-cellulose is mixed with molten agarose, allowed to solidify, and electrophoresed out of the agarose plug. Bubbles become topologically trapped in the gel as a result of agarose fiber polymerization through their circular structure, whereas replication intermediates of other shapes can migrate out of the plug. Trapped bubbles are then cloned in a plasmid library and either hybridized to microarrays (Mesner et al., 2011) or sequenced (Mesner et al., 2013). Library purity is estimated to >80% by 2D gel analysis of the trapped material or by probing 2D gels of total replication intermediates with individual clones and scoring for a bubble arc.
The eukaryotic initiator was first isolated from budding yeast as a heterohexameric origin recognition complex (ORC) that bound yeast replicators in vitro (Bell and Stillman, 1992). ORC genes are conserved throughout eukaryotes. Mutations in yeast ORC genes caused defects in initiation (Bell et al., 1993; Foss et al., 1993; Micklem et al., 1993). In vivo footprinting showed that the ORC binds ARSs through the cell cycle and that additional proteins join ORC in G1 to form a prereplicative complex (pre-RC; Diffley and Cocker, 1992; Diffley et al., 1994). Only ∼400 of the ∼20,000 ACSs in the yeast genome are actually occupied by the ORC in vivo and function as origins. Origin ACSs are specifically flanked on the 3′ side by A-rich nucleosome-excluding signals that allow ORC binding. The ORC subsequently repositions the flanking nucleosomes, which probably facilitates pre-RC assembly (Lipford and Bell, 2001; Berbenetz et al., 2010; Eaton et al., 2010). Initiation was mapped at nucleotide resolution at plasmid and chromosomal ARS1 by replication initiation point mapping (Fig. 3 C, 4). A single initiation site was found for each leading strand, next to the ORC binding site, at positions separated by 18 bp on the plasmid but 2 bp in the chromosome (Bielinsky and Gerbi, 1998, 1999).
Replication mapping by nascent strand abundance or polarity analysis. (A) Schematic drawing of nascent strands, Okazaki fragments, and leading strands synthesized at an origin. (B) Simplified flowcharts for isolating short nascent strands (SNSs), Okazaki fragments, or leading strands. (C) Principles of origin mapping by replicative strand analysis. In SNS abundance assays (B, 1–3; and C, 1), total DNA is denatured and SSS in the 0.5–3-kb range are isolated on sucrose gradients or agarose gels, taking care to exclude the smaller Okazaki fragments (<0.5 kb). SSS include SNS synthesized specifically at origins as well as inadvertently sheared or nicked strands, which sample the entire genome and typically form the vast majority of SSS molecules. SNS enrichment has been achieved (a) by lysing cells directly into the well of an alkaline agarose gel to minimize breakage before size fractionation (B, 1; in-gel lysis [IGL]-SNS), (b) by labeling neosynthetized DNA with BrdU and purifying the Br-DNA by immunoprecipitation (B, 2) or isopycnic centrifugation (B, 3; BrdU-SNS), and (c) by treating SSS with λ-exonuclease, a 5′-exonuclease that digests DNA but not RNA, to eliminate all DNA strands except nascent strands with an attached RNA primer (B, 4; λ-SNS). The latter strategy requires heat denaturation and neutral gradient purification of SSS to avoid RNA primer hydrolysis. Origins can be mapped by determining relative SNS abundance at closely spaced genomic positions (C, 1) by quantitative PCR (Vassilev et al., 1990), microarray hybridization (Lucas et al., 2007), or high-throughput sequencing (Besnard et al., 2012). Alternatively, SNS have been metabolically labeled with radioactive precursors and used to probe macroarrays representing highly amplified genomic loci (Dijkwel et al., 2002). (B, 4; and C, 2) Lagging-strand polarity assay (Hay and DePamphilis, 1982). Cells are pulse labeled with BrdUTP and radioactive precursors. Labeled Okazaki fragments are purified by size and immunoprecipitation and hybridized to immobilized, strand-specific probes spanning the locus of interest to determine the lagging-strand template (Burhans et al., 1990; Wang et al., 1998). Alternatively, Okazaki fragments accumulated after ligase inactivation are size purified and sequenced (Smith and Whitehouse, 2012). Template switches of opposite direction are observed at initiation and termination sites. The length of DNA over which the switch occurs indicates the size of the initiation or termination zone. (B, 5; and C, 3) Leading-strand polarity assay (Handeli et al., 1989). Cells are treated with emetine to prevent lagging-strand synthesis. Leading strands are density labeled with BrdU, isolated by isopycnic centrifugation, and hybridized with strand-specific probes spanning the locus of interest to determine the template of leading strand synthesis. The precise mechanism by which emetine, a protein synthesis inhibitor, specifically inhibits lagging-strand synthesis is still unclear (Burhans et al., 1991). (C, 4) Replication initiation point mapping (Bielinsky and Gerbi, 1998). (top) SNS 5′ ends can be mapped at nucleotide resolution by extension of a labeled downstream primer (red + blue arrows) followed by sequencing gel electrophoresis. (bottom) The leading-strand start sites are distinguished from the 5′ end of upstream joined Okazaki fragments by preventing joining in yeast ligase mutants (Bielinsky and Gerbi, 1999) or lagging-strand synthesis in mammalian cells treated with emetine (Abdurashidova et al., 2000). Ligation-mediated (Abdurashidova et al., 2000) or one-way (Romero and Lee, 2008) PCR amplification of SNS has been used to increase the sensitivity to the level required for the human genome.
Replication mapping by nascent strand abundance or polarity analysis. (A) Schematic drawing of nascent strands, Okazaki fragments, and leading strands synthesized at an origin. (B) Simplified flowcharts for isolating short nascent strands (SNSs), Okazaki fragments, or leading strands. (C) Principles of origin mapping by replicative strand analysis. In SNS abundance assays (B, 1–3; and C, 1), total DNA is denatured and SSS in the 0.5–3-kb range are isolated on sucrose gradients or agarose gels, taking care to exclude the smaller Okazaki fragments (<0.5 kb). SSS include SNS synthesized specifically at origins as well as inadvertently sheared or nicked strands, which sample the entire genome and typically form the vast majority of SSS molecules. SNS enrichment has been achieved (a) by lysing cells directly into the well of an alkaline agarose gel to minimize breakage before size fractionation (B, 1; in-gel lysis [IGL]-SNS), (b) by labeling neosynthetized DNA with BrdU and purifying the Br-DNA by immunoprecipitation (B, 2) or isopycnic centrifugation (B, 3; BrdU-SNS), and (c) by treating SSS with λ-exonuclease, a 5′-exonuclease that digests DNA but not RNA, to eliminate all DNA strands except nascent strands with an attached RNA primer (B, 4; λ-SNS). The latter strategy requires heat denaturation and neutral gradient purification of SSS to avoid RNA primer hydrolysis. Origins can be mapped by determining relative SNS abundance at closely spaced genomic positions (C, 1) by quantitative PCR (Vassilev et al., 1990), microarray hybridization (Lucas et al., 2007), or high-throughput sequencing (Besnard et al., 2012). Alternatively, SNS have been metabolically labeled with radioactive precursors and used to probe macroarrays representing highly amplified genomic loci (Dijkwel et al., 2002). (B, 4; and C, 2) Lagging-strand polarity assay (Hay and DePamphilis, 1982). Cells are pulse labeled with BrdUTP and radioactive precursors. Labeled Okazaki fragments are purified by size and immunoprecipitation and hybridized to immobilized, strand-specific probes spanning the locus of interest to determine the lagging-strand template (Burhans et al., 1990; Wang et al., 1998). Alternatively, Okazaki fragments accumulated after ligase inactivation are size purified and sequenced (Smith and Whitehouse, 2012). Template switches of opposite direction are observed at initiation and termination sites. The length of DNA over which the switch occurs indicates the size of the initiation or termination zone. (B, 5; and C, 3) Leading-strand polarity assay (Handeli et al., 1989). Cells are treated with emetine to prevent lagging-strand synthesis. Leading strands are density labeled with BrdU, isolated by isopycnic centrifugation, and hybridized with strand-specific probes spanning the locus of interest to determine the template of leading strand synthesis. The precise mechanism by which emetine, a protein synthesis inhibitor, specifically inhibits lagging-strand synthesis is still unclear (Burhans et al., 1991). (C, 4) Replication initiation point mapping (Bielinsky and Gerbi, 1998). (top) SNS 5′ ends can be mapped at nucleotide resolution by extension of a labeled downstream primer (red + blue arrows) followed by sequencing gel electrophoresis. (bottom) The leading-strand start sites are distinguished from the 5′ end of upstream joined Okazaki fragments by preventing joining in yeast ligase mutants (Bielinsky and Gerbi, 1999) or lagging-strand synthesis in mammalian cells treated with emetine (Abdurashidova et al., 2000). Ligation-mediated (Abdurashidova et al., 2000) or one-way (Romero and Lee, 2008) PCR amplification of SNS has been used to increase the sensitivity to the level required for the human genome.
Unlike the bacterial initiator, ORC does not unwind origin DNA. In G1 phase, ORC and replication factors Cdc6 and Cdt1 load the minichromosome maintenance proteins 2–7 (MCM2–7), which form the core of the replicative helicase, as inactive head-to-head double hexamers onto double-stranded DNA (dsDNA). This step is termed origin licensing or pre-RC formation. In S phase, pre-RCs are acted on by S-phase protein kinases and many accessory factors, which reconfigure the inactive MCM2–7 double hexamer from a dsDNA binding mode to a single-stranded DNA binding mode, rendering the helicase active for origin unwinding and bidirectional replisome assembly (Fu et al., 2011; Tognetti et al., 2014).
A single ORC can load multiple MCM2–7 double hexamers onto dsDNA during licensing, but only a small fraction is activated in an unperturbed S phase. Unfired MCM2–7 double hexamers provide backup origins that can facilitate completion of normal S phase (Lucas et al., 2000; Hyrien et al., 2003) or rescue artificially stalled forks (Woodward et al., 2006; Ge et al., 2007; Ibarra et al., 2008). As cells complete S phase, however, unfired MCM2–7 double hexamers are cleared from chromatin by still elusive mechanisms. MCM2–7 double hexamers cannot be reloaded until the next G1 phase because of multiple cell cycle regulatory mechanisms (Siddiqui et al., 2013), which prevent DNA re-replication in a single cell cycle.
Random versus site-specific initiation in metazoans
In contrast to yeast, autonomous replication assays generally failed to identify metazoan replicators. DNA of any source replicated with an efficiency proportional to size but largely independent of DNA sequence in Xenopus laevis eggs or egg extracts (Harland and Laskey, 1980; Méchali and Kearsey, 1984; Blow and Laskey, 1986) and in human cells (Krysan et al., 1989). 2D gels showed random initiation in both cases (Krysan and Calos, 1991; Hyrien and Méchali, 1992; Mahbubani et al., 1992). Random initiation was also observed within the transcriptionally silent chromosomes of early Xenopus and Drosophila melanogaster embryos (Shinomiya and Ina, 1991; Hyrien and Méchali, 1993). Thus, random initiation is not a unique feature of exogenous DNA and is compatible with organismal viability. Consistent with this apparent lack of metazoan replicators, metazoan ORC bound DNA in vitro without sequence specificity, albeit with increased affinity for negatively supercoiled DNA (Vashee et al., 2003; Remus et al., 2004; Schaarschmidt et al., 2004).
In spite of that, replication initiates at specific positions within transcriptionally active metazoan chromosomes. EM first revealed that active nucleolar chromatin of fly larvae replicates from origins restricted to nontranscribed spacer elements between ribosomal RNA genes (McKnight and Miller, 1977), a location conserved in Xenopus and human somatic cells (Bozzoni et al., 1981; Little et al., 1993; Hyrien et al., 1995). The transition from random to specific initiation in intergenic sequences occurred when transcription resumed at the midblastula transition in developing Xenopus and Drosophila embryos (Hyrien et al., 1995; Sasaki et al., 1999). Thus, although the entire genome was a potential substrate for initiation, the efficiency of individual sites was epigenetically modulated in coordination with transcriptional activity during development.
Metazoan ORC may be targeted to specific sites by cofactors such as HMGA1 (Thomae et al., 2008), ORC-associated protein ORCA (Shen et al., 2010), noncoding RNAs (Norseen et al., 2008; Krude et al., 2009; Zhang et al., 2011), or specific histone modifications (Shen and Prasanth, 2012; Méchali et al., 2013; Sherstyuk et al., 2014). Tethering ORC to an array of Gal4 DNA binding sites was sufficient to generate a mammalian origin (Takeda et al., 2005). Tethering PR-Set7, the methylase responsible for H4K20 monomethylation, promoted trimethylation by Suv4-20h followed by ORC recruitment through ORC1 and ORCA (Tardat et al., 2010; Beck et al., 2012). Metazoan but not yeast ORC1 contains a bromo-adjacent homology (BAH) domain that specifically recognizes H4K20me2, and ORC1 BAH mutations that caused primordial dwarfism abolished this interaction and impaired ORC loading and cell cycle progression (Kuo et al., 2012). Thus, various trans-regulatory mechanisms may create sequence-specific origins despite the lack of classical replicators in metazoans.
Interestingly, yeast replicators were not strictly required for in vitro replication in yeast extracts; replication only became origin dependent in the presence of competitor DNA and limiting ORC concentrations (Gros et al., 2014; On et al., 2014). As restrictions imposed by chromatin were bypassed in this system, epigenetic mechanisms may contribute to origin specification in yeast as in mammals.
The Chinese hamster dihydrofolate reductase (DHFR) initiation zone
Identification of the first and most studied mammalian origin took advantage of a methotrexate-resistant Chinese ovary cell line (CHOC400) that carried ∼1,000 copies of a ∼240-kb amplicon including the DHFR gene (Hamlin et al., 2010). Separation of EcoRI-digested CHOC400 DNA on agarose gels revealed ∼50 high-copy number restriction fragments above the background of single-copy fragments. Autoradiography of DNA labeled with [3H]thymidine in early S phase identified a half-dozen of earliest labeled fragments (Heintz and Hamlin, 1982) that all mapped within the 55-kb spacer between the convergently transcribed DHFR and 2BE2121 genes (Looney and Hamlin, 1987).
These results gave hope that mammalian replicators could be identified by more precise mapping within the spacer. Early studies indeed pointed to one or two preferential initiation sites (Handeli et al., 1989; Leu and Hamlin, 1989; Burhans et al., 1990; Vassilev et al., 1990). However, 2D gels revealed that initiation could in fact occur at any of a large number (>40) of potential sites with different efficiencies within the 55-kb spacer (Vaughn et al., 1990; Dijkwel and Hamlin, 1992; Dijkwel et al., 2002), even in nonamplified CHO cells (Dijkwel and Hamlin, 1995). This dispersive initiation was confirmed by macroarray hybridization of Okazaki fragments or short nascent strands (SNSs; replicative 300–1,000-nt single strands, assumed to be centered on origins; Fig. 3) labeled in permeabilized cells (Wang et al., 1998; Dijkwel et al., 2002; Sasaki et al., 2006) and by DNA combing (Fig. 1 C; Lubelsky et al., 2011). ORCs and minichromosome maintenance proteins were located by chromatin immunoprecipitation at low nucleosome occupancy sites, and enrichment was related to initiation efficiency (Lubelsky et al., 2011).
Initiation was only detected in <30% of the spacer copies, and active spacers appeared to support a single initiation event, implying a mean efficiency of initiation per kilobase of ∼0.5% (Dijkwel and Hamlin, 1992). Sites named ori-β, ori-β′, and ori-γ appeared preferred, though (Handeli et al., 1989; Leu and Hamlin, 1989; Burhans et al., 1990; Dijkwel et al., 2002), and analysis of ectopically relocalized ori-β suggested that small deletions could reduce its ectopic origin activity (Altman and Fanning, 2004). However, in loco deletion of ori-β, ori-β′, or even a central 40-kb segment spanning ori-β, ori-β′, and ori-γ did not reduce but actually increased initiation in the rest of the spacer (Mesner et al., 2003). Therefore, none of the preferred initiation sites contained any critical, nonredundant element required for initiation, and if redundant replicators existed, each appeared to control initiation only locally and inefficiently.
Transcription circumscribes and stimulates replication initiation
Dispersive initiation bounded by transcribed genes, as in Xenopus and Drosophila post-midblastula transition embryos (Hyrien et al., 1995; Sasaki et al., 1999), was also observed by 2D gels at the CHO rhodopsin locus (Dijkwel et al., 2000) and in human ribosomal DNA (Little et al., 1993). Deleting the DHFR gene promoter broadened the initiation zone to include the inactive DHFR gene (Saha et al., 2004). Conversely, deleting the DHFR transcription terminator allowed transcription to invade all but 8 kb of the spacer and confined initiation to that segment (Mesner and Hamlin, 2005). When fragments containing ori-β, ori-β′, or DHFR gene sequences were integrated at ectopic locations, they all sustained dispersive initiation, whereas the transcribed, neomycin-resistance adjacent marker did not, but when a cosmid containing an active DHFR gene was relocated, dispersive initiation was detected in the nontranscribed bacterial vector sequences but not in the DHFR gene (Lin et al., 2005). These genetic experiments strongly suggested that any DNA sequence contained potential initiation sites but that these sites could be silenced by read-through transcription. Consistently, transcription inhibited autonomous plasmid replication in human cells (Haase et al., 1994).
When bare DNA or early G1 CHOC400 nuclei were added to Xenopus egg extracts, replication initiated at random sequences. When late G1 nuclei were the template, however, replication initiated specifically within the DHFR initiation zone (Gilbert et al., 1995; Wu and Gilbert, 1996). This transition, named the origin decision point (ODP), occurred in G1 ∼4 h after metaphase and was abolished by transcription inhibitors (but not protein synthesis inhibitors). However, transcription of the DHFR domain was detected before the ODP and did not increase at the ODP. Therefore, transcription was necessary but not sufficient to circumscribe initiation (Dimitrova, 2006; Sasaki et al., 2006). The ODP perhaps activates a mechanism for unloading pre-RCs from transcribed genes in G1, reminiscent of pre-RC unloading ahead of progressing forks during S phase.
Deletion of the DHFR gene promoter broadened the initiation zone but also lowered its overall efficiency (Saha et al., 2004). Conversely, zone truncation by internal deletion (Mesner et al., 2003) or invading transcription (Mesner and Hamlin, 2005) increased initiation in the remaining nontranscribed sequences. Thus, nearby transcription had positive effects on adjacent initiation, causing compensatory changes in zone size and local initiation rate.
Broad and narrow initiation zones
Before the genomic era, only few mammalian origins were identified. Mapping single-copy origins was challenging, and various techniques were elaborated to capture and quantify the rare and fragile initiation intermediates (Figs. 1–3).
Early strand polarity (Fig. 3 C, 2 and 3) or SNS abundance (Fig. 3, B and C,1) assays pointed to narrowly localized origins upstream of the MYC gene (Vassilev and Johnson, 1990), between the LMNB2 and TIMM13 genes (Biamonti et al., 1992; Giacca et al., 1994; Kumar et al., 1996), and between the δ- and β-globin genes (Kitsberg et al., 1993). In contrast to the apparent lack of human replicators (Krysan et al., 1989), an ARS was identified upstream of the MYC gene in HeLa cells (McWhinney and Leffak, 1990), and the leading strand switch at the β-globin origin was suppressed by a natural 8-kb deletion spanning the origin (Kitsberg et al., 1993) or by a remote deletion encompassing a distant transcriptional regulatory element, the locus control region (Aladjem et al., 1995). When wild-type or mutated MYC, β-globin, or LMNB2 origin fragments were ectopically relocalized, SNS assays detected ectopic origin activity, and certain mutations reduced SNS abundance in a manner consistent with a modular replicator structure (Aladjem et al., 1998; Malott and Leffak, 1999; Liu et al., 2003; Paixão et al., 2004; Wang et al., 2004; Buzina et al., 2005), as for the DHFR ori-β relocation experiments (Altman and Fanning, 2004). However, later SNS studies revealed broader and more dispersive initiation than initially thought at the MYC (Waltz et al., 1996; Trivedi et al., 1998) and human β-globin (Kamath and Leffak, 2001) loci and broad initiation zones at the homologous mouse and chicken β-globin domains (Aladjem et al., 2002; Prioleau et al., 2003). Although two replication initiation point mapping (Fig. 3 C, 4) studies (Abdurashidova et al., 2000; Lee and Romero, 2012) reported highly localized—but partly conflicting—leading-strand start sites within the LMNB2/TIMM13 intergene, DNA combing (Fig. 1 C) detected broadly dispersed initiation over ∼800 kb of surrounding DNA with only some preference for a ∼200-kb area upstream of the LMNB2 gene (Palumbo et al., 2010). In summary, sites initially believed to represent efficient and specific replicators may in fact be embedded in broad initiation zones. The significance of ectopic relocation experiments is thus limited, as only local effects were monitored while surrounding sequences may also support initiation.
A prevalence of dispersive initiation zones has been observed in mammalian cells. DNA combing identified 36 fully or predominantly intergenic initiation zones in a 1.5-Mb region of human chromosome 14q11.2 (Lebofsky et al., 2006). Each zone (2.6–21.6 kb in size) fired in only a fraction of the cell cycles and seldom sustained more than one initiation, reminiscent of the DHFR initiation zone. Broad initiation zones were identified by single molecule analysis of replicated DNA (SMARD; Fig. 1 C) at the mouse Igh locus (Norio et al., 2005; Demczuk et al., 2012; Gauthier et al., 2012), at the human POU5F1, NANOG (Schultz et al., 2010), and FMR1 (Gerhardt et al., 2014) loci, and at human subtelomeres (Drosopoulos et al., 2012). Six narrow intergenic origins identified by DNA combing in the polygenic Chinese hamster AMPD2 amplicon (Anglana et al., 2003) may consist of single sites or narrow zones. The most efficient one, oriGNAI3, had been previously detected by neutral/alkaline 2D gel, SNS abundance, and leading-strand polarity assays (Toledo et al., 1998, 1999; Svetlova et al., 2001). Origin hierarchy was regulated by fork speed such that oriGNAI3 predominance was stronger when forks progressed faster (Anglana et al., 2003), perhaps because faster forks left nearby weaker origins less chance to fire.
Developmental, metabolic, and hierarchical regulation of origins
Developmental activation or repression of initiation sites was observed at the mouse Igh locus during B cell development (Norio et al., 2005). Extensive SMARD analysis suggested that potential origins were abundant throughout the locus but fired at a rate that changed abruptly (≤77-fold) between adjacent domains (50–650 kb in size) while staying constant within domains and implicated the developmental regulator Pax5 in modifying origin usage during differentiation (Demczuk et al., 2012; Gauthier et al., 2012). Changes in origin usage were also observed at the chicken β-globin locus during terminal erythrocyte differentiation (Dazy et al., 2006), at the mouse HoxB9 locus during in vitro differentiation of embryonic carcinoma cells (Grégoire et al., 2006), and at the human POU5F1 locus during human embryonic stem cell differentiation (Schultz et al., 2010).
By analogy to transcription, it was proposed that histone acetylation may increase origin accessibility and activity. Trichostatin A, a histone hyperacetylating agent, increased initiation genome wide and evened out initiation preference at specific human origins (Kemp et al., 2005). However, no clear link was observed between developmental regulation of origin activity and histone acetylation at the chicken β-globin and mouse HoxB9 loci (Prioleau et al., 2003; Grégoire et al., 2006). In the AMPD2 amplicon, Trichostatin A attenuated origin hierarchy but also altered pyrimidine pools and slowed fork progression; supplying nucleotide precursors restored both fork speed and origin hierarchy, which were therefore independent of origin histone acetylation (Gay et al., 2010).
Genome-wide analysis of purified replication intermediates
DNA microarrays and massive DNA sequencing have caused an explosion in the number of mammalian genome-wide origin maps (Table 1) and replication timing profiles (Gilbert, 2010, 2012; Rhind and Gilbert, 2013). In general, replication timing was highly reproducible but not resolutive enough to map individual origins, whereas origin maps were more resolutive but less concordant.
Summary of origin features reported in mammalian genome-wide mapping studies
| Study | Origin purification | Detection | Genome span | Cell type | Origin number | Mean origin spacing | Mean origin size | Main origin features |
| Mb | kb | kb | ||||||
| Human | ||||||||
| Lucas et al., 2007 | SSS | Microarray | 1,425 | 11365 | 32 | <50 | <5 | 80% within transcription units. Correlated with chromatin acetylation. |
| Cadoret et al., 2008 | λ-SNS | Microarray | 30 (ENCODE) | HeLa | 283 | 63 | <5 | Clustered in GC-rich regions. Rare in GC-poor regions. Associated with CGI, TRE, and c-JUN and c-FOS BSs, with open chromatin due to CGIs, with DNase HSSs and with evolutionarily conserved regions. |
| Karnani et al., 2010 | λ-SNS/BrdU-SNS | Microarray | 30 (ENCODE) | HeLa | 150 | 58/28 | 1.4/1.7 | AT-rich but within GC-rich regions, associated with conserved evolutionary elements. λ-SNS and λ-SNS + BrdU SNS intersects enriched in TSSs, but BrdU-SNS specific peaks depleted in TSSs. |
| Mesner et al., 2011 | Bubble trap | Microarray | 30 (ENCODE) | Early S HeLa/HeLa/GM06990 | 111 (646)/128 (657)/177 (988) | 58/69/41 | 15.2/18.1/14.5 | Broad initiation zones covering 15–22% of the genome, within intergenic regions as well as within or overlapping active and inactive genes. 20% encompass 5′ end of or entire active genes and activating histone marks. Overlap by only ∼1/3 between cell types and affected by synchronization. |
| Valenzuela et al., 2011 | SSS, λ-SNS/λ-SNS | Microarray sequencing | 34/3,000 (WG) | MCF-7, BT474, H520/MCF-7 | 8,281 | 4 | NR | >70% conserved in all cell lines. Enriched at active TSSs and in H3K4me3 and Pol-II. Associated with conserved evolutionary elements. |
| Martin et al., 2011 | λ-SNS | Sequencing | 3,000 (WG) | K562, MCF-7 | NR | NR | NR | Clustered near regions of moderate transcription. Rare in highly transcribed or nontranscribed regions. Excluded from TSSs but enriched ∼0.5 kb downstream. Strongly associated with meCpGs and DNase HSSs./Weakly associated with umCpGs, miRNA transcripts, CTCF, Pol-II and c-JUN BSs, H3K4 me, H3K29ac, and H3K27ac. |
| Besnard et al., 2012 | λ-SNS | Sequencing | 3,000 (WG) | HeLa, IMR-90, hESC H9, iPSCs from IMR-90 | 250,000 | 11 | 0.5 | Often grouped in clusters (mean size of 11 kb). At saturation cover ∼10% of the genome. One half within genes, <18% with TSS and CGI. 65–84% pairwise overlap between cell lines, few and inefficient cell type–specific origins. Density correlated with percentage of GC, timing, and efficiency. 91% associated with G4. Strand asymmetric distribution of G, C, and G4. |
| Mesner et al., 2013 | Bubble trap | Sequencing | 3,000 (WG) | GM06990 | 72,812 (123,297) | NR | 20 | Broad initiation zones covering 24% of the genome. 17,999 early, 25,735 mid-, and 29,020 late-replicating zone of mean size 27 kb, 18 kb, and 16 kb./Early zones more focused and efficient than late zones. Majority in nontranscribed DNA regardless of firing time. Early but not mid- and late zones associated with transcribed genes and activating marks DNase I HSSs (58%), H3K4me3, H3K27me3, H3K36me3, and CTCF BSs. At megabase scale, late zones anticorrelated with both activating and repressive marks. Densities were highest in both highly accessible and highly compact chromatin. |
| Dellino et al., 2013 | ORC1-ChIP | Sequencing | 3,000 (WG) | HeLa | 13,600 | NR | NR | Mostly associated with TSSs of coding and noncoding RNAs. 39% of all expressed TSSs in HeLa cells. Most and least transcribed sites associated with coding and noncoding RNAs, respectively. No consensus sequence. |
| Picard et al., 2014 | λ-SNS | Sequencing | 3,000 (WG) | K562 | 59,185 | NR | 3.4 | Reanalyzed data of Besnard et al. (2012) and found 80,000–90,000 origins for each cell type. Early origins shared between cell types. Cell type–specific origins replicate late. 80% of CGI-associated origins are constitutive. 76% of CGIs are origins. Efficiency correlated with H4K20me1 + H3K27me3. |
| Mukhopadhyay et al., 2014 | λ-SNS/BrdU-SNS | Sequencing | 3,000 (WG) | Primary basophilic erythroblasts | 100,000 | NR | NR | Association with G4 (37%), CGIs (7%), and TSSs (13%). DNase I HSSs associated with but not required for origin formation. |
| Mouse | ||||||||
| Sequeira-Mendes et al., 2009 | λ-SNS | Microarray | 10.1 | mESC PGK12, MEFs, NIH-3T3 | 97 | 103 | NR | Most within transcription units. Half at CGI promoters. Efficiency conserved across cell types and correlated with embryonic TSSs. |
| Cayrou et al., 2011 | λ-SNS | Microarray | 60.4/118.3 | mESC GCR8, mTC P19, MEFs/Kc (Drosophila) | 2,748/6,184 | 21/19 | NR | 44% conserved between cell types. Spacing fivefold smaller than IOD on combed DNA. Inferred firing efficiency 20%. Preferentially intragenic./Bimodal distribution of SNS around CGI. G-rich motifs and local nucleotide skew./Drosophila origins correlated with HP1 BSs. |
| Study | Origin purification | Detection | Genome span | Cell type | Origin number | Mean origin spacing | Mean origin size | Main origin features |
| Mb | kb | kb | ||||||
| Human | ||||||||
| Lucas et al., 2007 | SSS | Microarray | 1,425 | 11365 | 32 | <50 | <5 | 80% within transcription units. Correlated with chromatin acetylation. |
| Cadoret et al., 2008 | λ-SNS | Microarray | 30 (ENCODE) | HeLa | 283 | 63 | <5 | Clustered in GC-rich regions. Rare in GC-poor regions. Associated with CGI, TRE, and c-JUN and c-FOS BSs, with open chromatin due to CGIs, with DNase HSSs and with evolutionarily conserved regions. |
| Karnani et al., 2010 | λ-SNS/BrdU-SNS | Microarray | 30 (ENCODE) | HeLa | 150 | 58/28 | 1.4/1.7 | AT-rich but within GC-rich regions, associated with conserved evolutionary elements. λ-SNS and λ-SNS + BrdU SNS intersects enriched in TSSs, but BrdU-SNS specific peaks depleted in TSSs. |
| Mesner et al., 2011 | Bubble trap | Microarray | 30 (ENCODE) | Early S HeLa/HeLa/GM06990 | 111 (646)/128 (657)/177 (988) | 58/69/41 | 15.2/18.1/14.5 | Broad initiation zones covering 15–22% of the genome, within intergenic regions as well as within or overlapping active and inactive genes. 20% encompass 5′ end of or entire active genes and activating histone marks. Overlap by only ∼1/3 between cell types and affected by synchronization. |
| Valenzuela et al., 2011 | SSS, λ-SNS/λ-SNS | Microarray sequencing | 34/3,000 (WG) | MCF-7, BT474, H520/MCF-7 | 8,281 | 4 | NR | >70% conserved in all cell lines. Enriched at active TSSs and in H3K4me3 and Pol-II. Associated with conserved evolutionary elements. |
| Martin et al., 2011 | λ-SNS | Sequencing | 3,000 (WG) | K562, MCF-7 | NR | NR | NR | Clustered near regions of moderate transcription. Rare in highly transcribed or nontranscribed regions. Excluded from TSSs but enriched ∼0.5 kb downstream. Strongly associated with meCpGs and DNase HSSs./Weakly associated with umCpGs, miRNA transcripts, CTCF, Pol-II and c-JUN BSs, H3K4 me, H3K29ac, and H3K27ac. |
| Besnard et al., 2012 | λ-SNS | Sequencing | 3,000 (WG) | HeLa, IMR-90, hESC H9, iPSCs from IMR-90 | 250,000 | 11 | 0.5 | Often grouped in clusters (mean size of 11 kb). At saturation cover ∼10% of the genome. One half within genes, <18% with TSS and CGI. 65–84% pairwise overlap between cell lines, few and inefficient cell type–specific origins. Density correlated with percentage of GC, timing, and efficiency. 91% associated with G4. Strand asymmetric distribution of G, C, and G4. |
| Mesner et al., 2013 | Bubble trap | Sequencing | 3,000 (WG) | GM06990 | 72,812 (123,297) | NR | 20 | Broad initiation zones covering 24% of the genome. 17,999 early, 25,735 mid-, and 29,020 late-replicating zone of mean size 27 kb, 18 kb, and 16 kb./Early zones more focused and efficient than late zones. Majority in nontranscribed DNA regardless of firing time. Early but not mid- and late zones associated with transcribed genes and activating marks DNase I HSSs (58%), H3K4me3, H3K27me3, H3K36me3, and CTCF BSs. At megabase scale, late zones anticorrelated with both activating and repressive marks. Densities were highest in both highly accessible and highly compact chromatin. |
| Dellino et al., 2013 | ORC1-ChIP | Sequencing | 3,000 (WG) | HeLa | 13,600 | NR | NR | Mostly associated with TSSs of coding and noncoding RNAs. 39% of all expressed TSSs in HeLa cells. Most and least transcribed sites associated with coding and noncoding RNAs, respectively. No consensus sequence. |
| Picard et al., 2014 | λ-SNS | Sequencing | 3,000 (WG) | K562 | 59,185 | NR | 3.4 | Reanalyzed data of Besnard et al. (2012) and found 80,000–90,000 origins for each cell type. Early origins shared between cell types. Cell type–specific origins replicate late. 80% of CGI-associated origins are constitutive. 76% of CGIs are origins. Efficiency correlated with H4K20me1 + H3K27me3. |
| Mukhopadhyay et al., 2014 | λ-SNS/BrdU-SNS | Sequencing | 3,000 (WG) | Primary basophilic erythroblasts | 100,000 | NR | NR | Association with G4 (37%), CGIs (7%), and TSSs (13%). DNase I HSSs associated with but not required for origin formation. |
| Mouse | ||||||||
| Sequeira-Mendes et al., 2009 | λ-SNS | Microarray | 10.1 | mESC PGK12, MEFs, NIH-3T3 | 97 | 103 | NR | Most within transcription units. Half at CGI promoters. Efficiency conserved across cell types and correlated with embryonic TSSs. |
| Cayrou et al., 2011 | λ-SNS | Microarray | 60.4/118.3 | mESC GCR8, mTC P19, MEFs/Kc (Drosophila) | 2,748/6,184 | 21/19 | NR | 44% conserved between cell types. Spacing fivefold smaller than IOD on combed DNA. Inferred firing efficiency 20%. Preferentially intragenic./Bimodal distribution of SNS around CGI. G-rich motifs and local nucleotide skew./Drosophila origins correlated with HP1 BSs. |
WG, whole genome; NR, not reported; iPSCs, induced pluripotent stem cells; mESC, mouse embryonic stem cell; hESC, human embryonic stem cell; MEFs, mouse embryonic fibroblasts; mTC, mouse teratocarcinoma; CGI, CpG islands; TRE, transcriptional regulatory elements; ChIP, chromatin immunoprecipitation; HSS, hypersensitive site; CTCF, CCCTC-binding factor; Pol II, RNA polymerase II; meCpG, methylated CpG dinucleotide; umCpG, unmethylated CpG; G4, G-quadruplex elements; BS, binding site; IOD, inter-origin distance; HP1, heterochromatin protein 1; For Mesner et al. (2011, 2013), the numbers in parentheses indicate the number of individual EcoRI fragments clustering into the indicated number of initiation zones.
The first high-throughput mapping of human origins probed microarrays spanning the myc, LMNB2, β-globin, and FMR1 origins and a 1-Mb region on chromosome 22 with short (0.3–1.0 kb) DNA single strands (short single strands [SSS]), assumed to represent newly synthesized origin DNA, from lymphoblastoid cells. The four control and 28 new origins were detected (Lucas et al., 2007). However, these SSS (1% of starting total DNA) were much more abundant than expected (∼0.001%), suggesting massive contamination by irrelevant nicked DNA. Cadoret et al. (2008) reported that ∼99% of SSS from HeLa cells were eliminated by λ 5′-exonuclease, which digests DNA lacking an RNA primer (Bielinsky and Gerbi, 1998). When the remaining material (λ-SNS) was hybridized to ENCODE microarrays, 283 peaks were identified compared with nine peaks with undigested SSS, suggesting that λ-exonuclease treatment was mandatory to identify origins (Cadoret et al., 2008), as confirmed by Cayrou et al. (2011). In contrast, Valenzuela et al. (2011) detected no difference between SSS and λ-SNS from MCF-7 cells (Table 1). No major differences between SSS and BrdU-labeled, immunopurified SNS (BrdU-SNS) were detected by PCR analyses of the LMNB2 (Kumar et al., 1996) and GNAI3 (Toledo et al., 1998, 1999) origins. How could origins be detected in SSS if >99% are irrelevant broken strands? Do they preferentially break during SSS isolation, overreplicate, or accumulate as unligated strands? Strikingly, Gómez and Antequera (2008) reported that origins were overrepresented ∼10-fold in total human DNA, as a result of reiterative synthesis and release of short (200 bp) dsDNA molecules with 5′RNA primers. Whether such “abortive initiation” generates SSS, BrdU-SNS, or λ-SNS peaks remains unclear.
Disturbingly, only a 11–35% pairwise matching was observed between independent studies profiling either λ-SNS (Cadoret et al., 2008; Karnani et al., 2010), BrdU-SNS (Karnani et al., 2010), or replication bubble–containing EcoRI fragments (trapped in gelling agarose; Fig. 2 C; Mesner et al., 2011) from HeLa cells along ENCODE microarrays. Importantly, the purity of the trapped bubbles was evaluated to >80% by 2D gel analysis, a method independent of the origin isolation scheme (Fig. 2, B and C).
The ENCODE λ-SNS formed narrow peaks, whereas the bubble fragments clustered into zones that showed dispersive initiation by 2D gels. It was possible that flat SNS signals produced by dispersive initiation were overlooked by peak-calling algorithms or that lack of saturation limited the overlap of ENCODE studies. Consistently, Mukhopadhyay et al. (2014) reported 70% genome-wide overlap between independently prepared λ-SNS and BrdU-SNS. Furthermore, when Besnard et al. (2012) sequenced λ-SNS from HeLa and three other cell types to saturation (∼250,000 peaks in each), many peaks did cluster into zones. However, the complete genomic set of HeLa λ-SNS still only overlapped 51%, 14%, and 6% of the bubbles, λ-SNS, and BrdU-SNS of Mesner et al. (2011) and Karnani et al. (2010), respectively, in contrast to 80% of the λ-SNS of Cadoret et al. (2008).
Bubbles and SNS showed different conservation between cell types. HeLa (adenocarcinoma) and GM06990 (lymphoblastoid) cells shared only 28–43% of their ENCODE bubble fragments (Mesner et al., 2011), but λ-SNS were more conserved between cell lines (Sequeira-Mendes et al., 2009; Cayrou et al., 2011; Valenzuela et al., 2011; Picard et al., 2014). The λ-SNS of the four cell lines studied by Besnard et al. (2012) overlapped by 65–84% pairwise, with 50% ubiquitous peaks that included 91% of the λ-SNS of Cadoret et al. (2008) or Martin et al. (2011) and even 81% of the SSS of Lucas et al. (2007). When Mesner et al. (2013) sequenced bubbles from GM06990 cells, however, only 33–37% overlapped any of the λ-SNS of Besnard et al. (2012), and only 45–46% of the latter overlapped the bubbles. Picard et al. (2014) called λ-SNS peaks from the Besnard et al. (2012) data using a broader window and clustered them into zones comparable to bubbles, thus raising the overlap with bubbles to 65%; yet, only 44% of the bubbles overlapped the λ-SNS zones. In summary, bubble maps remain discordant from SNS and are markedly more flexible between cell lines (Table 1). The contrasting properties of these origin populations are further discussed at the end of this review.
Genome-wide localization of Homo sapiens ORC in HeLa cells was reported by Dellino et al. (2013), who identified 13,600 ORC1 binding sites with no consensus sequence. Only 11%, 30%, and 47% of the 229 ENCODE ORC1 peaks matched the Karnani λ-SNS, the Cadoret λ-SNS, and the Mesner bubbles, and only 8%, 23%, and 20% of the latter, respectively, coincided with ORC1 peaks. As ORC has other functions than replication initiation, it is likely that only a fraction of these peaks correspond to true origins.
Potential genetic and epigenetic determinants of origins
Despite incomplete overlap, many studies reported a correlation with transcription start sites (TSSs) and CpG islands (Table 1). Moreover, G-rich or G-quadruplex (G4) motifs were associated with 70–90% of human, mouse, and Drosophila λ-SNS peaks (Besnard et al., 2012; Cayrou et al., 2012a). Interestingly, Homo sapiens ORC bound randomly to dsDNA but preferentially to G4 motifs on single-stranded DNA (Hoshina et al., 2013). G4s were required in an orientation-dependent manner for λ-SNS accumulation at two origins in DT40 chicken cells (Valton et al., 2014). However, 36% of GM06990 bubble fragments did not contain G4s (Mesner et al., 2013). The concern was raised that λ-exonuclease can pause in a strand-specific manner at GC-rich sequences (Perkins et al., 2003; Conroy et al., 2010). This could explain the λ-SNS enrichment in CpG islands and G4s, their strong conservation between cell types, and the orientation effects observed by Valton et al. (2014), although Cayrou et al. (2011, 2012b) reported that λ-SNS were eliminated by previous RNase or alkali treatment and absent from mitotic or quiescent cells.
Until recently, no particular histone modification showed a striking association with origins. H3K79me2 methylation was more enriched, at origins mapped by Martin et al. (2011), than any other single chromatin modification, but prevention of H3K79 methylation did not alter origin density (Fu et al., 2013). Monomethylation of H4K20 by Pr7-Set followed by di- and trimethylation by Suv4-20h has been implicated in pre-RC assembly (Tardat et al., 2010; Beck et al., 2012). ORC1 BAH domain specifically recognizes H4K20me2 (Kuo et al., 2012), but this histone modification seems too abundant (80% of all H4 molecules) to explain origin specificity (Schotta et al., 2008). ORCA showed a preference for H4K20me3, which, in combination with H4K20me2, may guide origin choice more selectively (Beck et al., 2012). Association of H4K20me1 with origins, though not observed by Martin et al. (2011), was detected by Picard et al. (2014). Cell cycle investigations of all three H4K20 methylation states may provide further insight. More details on epigenetic modulation of origins can be found in Sherstyuk et al. (2014).
Genome-wide analysis of replication fork directionality
Origins can be predicted from DNA sequence alone (Hyrien et al., 2013). Lobry (1996) discovered that bacteria have an asymmetric composition of the two DNA strands, with enrichment of the leading strand in G over C and T over A. The GC and TA skews SGC = (G − C)/(G + C) and STA = (T − A)/(T + A) reflect replication direction throughout evolution because the leading and lagging strands experience different rates of nucleotide substitution. Detecting an abrupt change of sign of SGC is thus used to predict bacterial origins and termini (Grigoriev, 1998).
Upward skew jumps (+S jumps) similar to bacterial origins have been detected at 1,546 sites in the human genome (Brodie of Brodie et al., 2005; Touchon et al., 2005). They frequently occurred between divergent housekeeping genes (Huvet et al., 2007) in open chromatin (Audit et al., 2009) and arose from additive effects of replication- and transcription-associated mutational asymmetries (Chen et al., 2011). Between upward jumps, the skew decreased in a linear manner, suggesting a progressive inversion of replication fork directionality across megabase-sized segments termed N domains because of the resulting N-shaped skew profile (Huvet et al., 2007). Thus, S-jump origins must have been highly active over evolution, whereas any intervening origins must have fired dispersively to account for this progressive inversion (Hyrien et al., 2013).
When compared with somatic cell replication timing profiles, S jumps coincided with early replicating peaks and N-domain centers with late-replicating U-shaped valleys (Audit et al., 2007; Chen et al., 2010; Hansen et al., 2010; Baker et al., 2012). Hundreds of megabase-sized domains with a U-shaped timing profile were identified independent of skew analysis (Baker et al., 2012; Audit et al., 2013). Demonstrating that the timing gradient equaled the ratio of fork speed to fork directionality led us to predict an N-shaped fork directionality profile of U domains strikingly similar to skew N domains (Guilbaud et al., 2011; Baker et al., 2012). U domains coincided with chromatin self-interaction domains revealed by conformation capture (Lieberman-Aiden et al., 2009). The U and N shapes of the timing and fork directionality profiles were quantitatively explained by a cascade model for sequential activation of origins with increasing synchrony from domain borders to center (Hyrien et al., 2013).
Potential models for mammalian genome replication
Given the incomplete concordance of origin features and locations between studies (Table 1), a definitive portrait of mammalian origins is premature. Nonetheless, most studies agreed that a fraction of origins overlapped the 5′ end or the entirety of active transcription units and their chromatin marks, especially in early replicating regions, whereas late origins were less efficient, more dispersed, and not associated with these marks. λ-SNS tended to highlight the former category, whereas bubbles were more often detected in nontranscribed genic or intergenic sequences, regardless of replication timing, and were anticorrelated with both activating and repressive chromatin marks in late-replicating zones. Comparison of replication timing and chromatin interaction data suggested that early and late-replicating sequences reside in two segregated chromatin compartments (Ryba et al., 2010). Replication timing can be predicted by DNase I hypersensitivity better than by TSSs, suggesting that origins colocalize with promoters just because they colocalize with DNase hypersensitivity (Gindin et al., 2014). Consistently, the cascade model proposed for N/U-domain replication combines efficient initiation at early replicating master origins in open chromatin between active genes with more random and later initiation elsewhere (Hyrien et al., 2013).
Direct determination of replication fork directionality by sequencing of Okazaki fragments, a powerful technique first validated in yeast (McGuffee et al., 2013), has been recently achieved in human cells (unpublished data). These new data, which confirm the predicted directionality profiles of N/U domains, will hopefully contribute to clarifying the mist that still surrounds mammalian replication origins.
Acknowledgments
I thank Benjamin Audit, Alain Arneodo, Francesco de Carli, Chun-Long Chen, Nataliya Petryk, Claude Thermes, and Xia Wu for their comments on the manuscript.
Work in my laboratory was supported by grants from the Ligue Nationale Contre le Cancer (Comité de Paris), the Cancéropole Ile-de-France (ERABL), and the Agence Nationale pour la Recherche (REFOPOL-BLAN2010-161501).
The author declares no competing financial interests.
References
- ACS
ARS consensus sequence
- ARS
autonomously replicating sequence
- BAH
bromo-adjacent homology
- CldU
chlorodeoxyuridine
- DHFR
dihydrofolate reductase
- dsDNA
double-stranded DNA
- IdU
iododeoxyuridine
- ODP
origin decision point
- ORC
origin recognition complex
- pre-RC
prereplicative complex
- SMARD
single molecule analysis of replicated DNA
- SNS
short nascent strand
- SSS
short single strands
- TSS
transcription start site
