DNA polymerase ι (Pol ι) is an attractive candidate for somatic hypermutation in antibody genes because of its low fidelity. To identify a role for Pol ι, we analyzed mutations in two strains of mice with deficiencies in the enzyme: 129 mice with negligible expression of truncated Pol ι, and knock-in mice that express full-length Pol ι that is catalytically inactive. Both strains had normal frequencies and spectra of mutations in the variable region, indicating that loss of Pol ι did not change overall mutagenesis. We next examined if Pol ι affected tandem mutations generated by another error-prone polymerase, Pol ζ. The frequency of contiguous mutations was analyzed using a novel computational model to determine if they occur during a single DNA transaction or during two independent events. Analyses of 2,000 mutations from both strains indicated that Pol ι–compromised mice lost the tandem signature, whereas C57BL/6 mice accumulated significant amounts of double mutations. The results support a model where Pol ι occasionally accesses the replication fork to generate a first mutation, and Pol ζ extends the mismatch with a second mutation.
Upon encounter with antigen, B cells express activation-induced deaminase (AID), which deaminates cytosine to uracil in DNA (Maul et al., 2011). The uracil base is then used to induce a vast array of mutations and DNA breaks to promote somatic hypermutation and class switch recombination. During somatic hypermutation, uracils are detected by either the mismatch repair protein complex, MSH2-MSH6 (Wiesendanger et al., 2000), or the base excision repair protein, uracil DNA glycosylase (UNG; Rada et al., 2002). However, these proteins do not function in the canonical repair pathways of removing base damage and allowing faithful DNA synthesis by high fidelity DNA polymerases (Pols) β, δ, and ε. Instead, low fidelity Pols η, ζ, and Rev1 are recruited to synthesize multiple mutations. Pols η and ζ function mainly during synthesis in gaps created by MSH2–MSH6 and exonuclease 1 (Bardwell et al., 2004; Martomo et al., 2005). Pol η is responsible for the majority of mutations of A:T bp (Zeng et al., 2001; Delbos et al., 2007), and Pol ζ contributes to synthesis of tandem double mutations (Daly et al., 2012; Saribasak et al., 2012). Rev1, a deoxycytidyl transferase, inserts C opposite abasic sites generated after removal of uracils by UNG to produce transversions of C:G bp (Jansen et al., 2006). The abasic site could also be nicked by an apurinic/apyrimidinic endonuclease to create a single strand break (Stavnezer et al., 2014), which can allow entry by Pol η to generate mutations of A:T bp (Delbos et al., 2007).
Pol ι would also appear to be an attractive candidate for somatic hypermutation because of its very high misincorporation rate (Tissier et al., 2000). Indeed, Pol ι may be present during gap synthesis because it physically interacts with Pol η through ubiquitination (McIntyre et al., 2013), and both polymerases are recruited to DNA damage foci (Kannouche et al., 2003). However, there was no alteration in mutation frequency or spectra in the 129 strain of mice (McDonald et al., 2003; Martomo et al., 2006), which does not express full length Pol ι due to a naturally occurring point mutation in exon 2 that produces a nonsense codon. It has recently been reported that there is a high incidence of exon 2 skipping in 129-derived strains, and the truncated protein has residual polymerase activity (Aoufouchi et al., 2015). However, another study demonstrated that human Pol ι lacking exon 2 is inactive (Makarova et al., 2011), likely because it is missing critical active-site contacts required for polymerase function. Thus, it remains unclear whether Pol ι does, or does not, participate in somatic hypermutation.
Because of this controversy, we generated a knock-in mouse with catalytically inactive Pol ι. We also considered a new role for Pol ι to act together with Pol ζ during mutagenesis. Multiple polymerases can work sequentially when bypassing DNA lesions, and this includes Pol ι and Pol ζ (Johnson et al., 2000; Ziv et al., 2009). Using a novel computational model to statistically analyze the frequency of tandem mutations, we demonstrate that Pol ι cooperates with Pol ζ to produce contiguous mutations.
RESULTS AND DISCUSSION
Catalytically inactive Pol ι binds DNA and is expressed in knock-in mice
To generate the mutant protein, active site residues D126 and E127 were both changed to alanine (Pol ιm; Fig. 1 A), which should disrupt the ability of Pol ι to chelate Mg2+ required for DNA synthesis. A similar alteration of the corresponding residues in Pol η abolished its catalytic activity (Tissier et al., 2000). To confirm inactivation of Pol ι activity, primer extension assays were performed using Pol ι and Pol ιm proteins expressed in E. coli. As expected, Pol ιm displayed no extension, whereas an identical concentration of wild-type Pol ι had robust synthesis (Fig. 1 B). To see if the lack of synthesis by Pol ιm was caused by an inability to bind to a primer terminus, we compared the mutant enzyme to wild-type enzyme in an electrophoretic mobility shift assay (EMSA; Fig. 1 C). The analysis revealed that Pol ι bound the primer-template with a Kd of 21 ± 3 nM, whereas Pol ιm bound with a Kd of 8.8 ± 1.7 nM, indicating that the mutant binds the primer terminus at least as well as the wild-type enzyme. To test the in vivo properties of the mutant protein, we examined if it could accumulate in DNA replication foci analogous to wild-type protein (Kannouche et al., 2003). EGFP-Pol ι and EGFP-Pol ιm constructs were transfected into proliferating HEK293T cells, and fluorescence images of cell nuclei were obtained. Spontaneous replication foci from 600 nuclei in each group were then analyzed. As shown in Fig. 1 D, Pol ι foci were observed in approximately half of the nuclei, and Pol ιm foci occurred in approximately one-third of the nuclei. This shows that Pol ιm, like wild-type Pol ι, is present in physiological transactions during DNA replication, albeit in a lesser amount.
We next generated a mouse strain expressing Pol ιm. Knock-in mice were made from C57BL/6 embryonic cells using standard genetic techniques. To confirm the existence of the D126A-E127A allele, a 320-bp region surrounding exon 4 was amplified by PCR and digested with TseI (Fig. 1, A and E). Homozygous Polim/m mice were then bred and tested for expression of the mutant Pol ι protein. Testis extracts, which contain abundant quantities of the protein, were prepared from Poli+/+ (C57BL/6), Poli129/129, and Polim/m mice, and analyzed by Western blotting (McDonald et al., 2003). Pol ι protein was expressed at high levels in extracts from Poli+/+ and Polim/m mice, compared with Poli129/129 extracts (Fig. 1 F). Quantitation of the blots revealed that steady-state levels of mutant Pol ι were approximately two-thirds of those observed for wild type (Fig. 1 G). The lower levels of Pol ιm may be a result of either reduced expression or increased turnover. In contrast, Pol ι was virtually absent in extracts from 129 mice, as previously reported (McDonald et al., 2003).
Mice with defective Pol ι have normal frequency and spectra of mutations
The frequency and types of mutation were measured in B cells from Peyer’s patches, which undergo constitutive activation from gut flora in the small intestine. Germinal center B cells (GL7+ B220+) were isolated by flow cytometry, and DNA was prepared and PCR amplified to produce a 492-bp region downstream of the rearranged JH4 gene segment on the Igh locus. Mutations in this intron region reflect the direct effects of polymerase mutagenesis in the absence of selection. The frequencies in mutations/base pairs from Poli+/+ (1.4 × 10−2), Poli129/129 (1.9 × 10−2), and Polim/m (1.5 × 10−2) sequences (Fig. 2 A), and the number of mutations per sequence (Fig. 2 B) were similar. The types of mutations in the two strains expressing defective Pol ι were also analogous to those from Poli+/+ mice (Fig. 2 C). Thus, the absence of Pol ι in the Poli129/129 strain, or the presence of catalytically inactive protein in the Polim/m strain, did not affect the frequency or spectra of mutations, perhaps because wild-type Pol ι quickly falls off the template after partial synthesis.
Pol ι acts sequentially with Pol ζ to generate tandem mutations
Pol ζ generates a unique error signature in variable regions: tandem or adjacent double mutations (Daly et al., 2012; Saribasak et al., 2012). Because the error frequency of Pol ζ is only 10−3 (Zhong et al., 2006), it would be inefficient at generating the first mutation. However, Pol ζ excels at extending from an initial mismatch with a second mutation to generate a tandem double mutation (Stone et al., 2012). Which polymerase introduces the first mutation? In somatic hypermutation, we reported that tandem mutations were elevated in the absence of Pol η (Saribasak et al., 2012), indicating that Pol η is not involved. We propose that the very error-prone Pol ι, with an error frequency as high as 10−1 when copying some bases (Tissier et al., 2000), generates the first mismatch. To address this hypothesis, we developed a sensitive computational model to statistically determine if tandem substitutions were generated during one synthesis event or during two independent events. The model was based on two parameters: (1) the frequency of mutations per nucleotide and (2) the number of mutations per sequence. Frequencies were calculated from 2,866 mutations in C57BL/6 sequences. A list of the frequencies for the first 10 bases is displayed in Fig. 3 A; a complete listing for 492 bases in the JH4 intron is shown in Table S1. The data demonstrate that each nucleotide accumulates mutations at a different frequency, which might be caused by hot spot locations and sequence environment (MacCarthy et al., 2009). Sequences with more mutations will have an increased probability of two independent events occurring adjacently during clonal expansion. However, a tandem event in a sequence with only two mutations would be highly significant, and would likely result from a single synthesis event.
Starting with the nonmutated germline sequence, the program predicts the location of mutations depending on these two parameters. For example, Fig. 3 B shows eight sequences, with three sequences containing two mutations, four sequences with three mutations, and one sequence with four mutations. There are four predicted tandem events in this simulated dataset. The process is repeated 100,000 times to calculate the number of expected tandems. Their distribution is represented as a histogram, in which the area under each part of the curve reflects the number of simulations that yielded that number of tandems. We then analyzed data from Peyer’s patches to count the number of actual tandem mutations. In Poli+/+ mice, 425 sequences were examined, and 123 tandems were observed (Table 1). This was compared against the expected distribution of tandems given by the computational program (Fig. 3 C); a p-value of 10−5 indicates that the observed number of tandems (red line) was significantly higher than the simulated number. Tandem mutations were then examined in datasets from Poli129/129 and Polim/m mice. The observed number of tandems was counted for each of these (Table 1), and compared against the expected distribution given by the computational model. Fig. 3 (D and E) shows that Poli129/129 and Polim/m mice had no increase in observed tandems beyond what was expected by random chance (P = 0.06 and 0.22, respectively). Thus, they did not accrue a surplus of contiguous mutations, suggesting that loss of functional Pol ι prevented tandem generation.
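The counting step can be illustrated with a minimal Python sketch. It assumes that each sequence is recorded as a list of mutated positions within the 492-bp JH4 intron, and that a run of three adjacent mutations counts as two tandem pairs; both are assumptions for illustration, as is the function name `count_tandems`, and the positions shown are hypothetical.

```python
def count_tandems(positions):
    """Count pairs of mutations at adjacent positions (i and i + 1) in one sequence.

    Assumption: a run of three adjacent mutations is counted as two tandem pairs.
    """
    sites = sorted(set(positions))
    return sum(1 for a, b in zip(sites, sites[1:]) if b - a == 1)


# Hypothetical sequences, each given as a list of mutated positions.
example_sequences = [
    [10, 11],            # two mutations, adjacent: one tandem
    [3, 250],            # two mutations, not adjacent: no tandem
    [5, 20, 21, 300],    # four mutations, one adjacent pair
]
tandems_per_sequence = [count_tandems(seq) for seq in example_sequences]
total_tandems = sum(tandems_per_sequence)
```

The dataset-level tandem count is simply the sum over sequences, which is the quantity compared between observed and simulated data.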
Pols ι and ζ cooperate at the Ig loci
Although the absence of functional Pol ι had no effect on the frequency or spectra of mutation, it did affect the signature of Pol ζ. In yeast and mice, adjacent mutations were abolished in the absence of Pol ζ, and elevated in the presence of a mutagenic form of the polymerase (Daly et al., 2012; Stone et al., 2012). Using a computational model to assess if tandem mutations are produced during a single DNA transaction or during two independent events, we show that Poli+/+ mice have a highly significant excess of tandem substitutions, implying that many of them are synthesized simultaneously. To estimate the frequency of tandem mutations that result from a single synthesis event, the mean number of expected tandems (82) was subtracted from the observed number (123) to acquire the number of single transaction events. This number was then compared with the total mutations (2,581), to show that tandem events make up 1.6% of mutations. This contribution to somatic hypermutation is small, which explains why it was missed before (McDonald et al., 2003; Martomo et al., 2006). Nonetheless, the value is reduced in Poli129/129 and Polim/m mice to 0.7 and 0.5%, respectively. Thus, Pol ι-compromised mice lost most of the tandem signature, suggesting that this polymerase plays a role with Pol ζ to produce the double substitutions.
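The arithmetic behind the 1.6% estimate for Poli+/+ mice can be checked with a short Python sketch. The numbers are those given above; the function name is ours and is used only for illustration.

```python
def single_event_tandem_percent(observed, expected_mean, total_mutations):
    """Percentage of mutations attributed to tandems made in a single synthesis event.

    The excess of observed tandems over the mean of the simulated (null)
    distribution is divided by the total number of mutations in the dataset.
    """
    return 100.0 * (observed - expected_mean) / total_mutations


# Poli+/+ values from the text: 123 observed, 82 expected, 2,581 total mutations.
wild_type_percent = single_event_tandem_percent(123, 82, 2581)
print(f"{wild_type_percent:.1f}%")  # prints "1.6%"
```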
To model this in Fig. 4 A, Pol η is the dominant polymerase in mutation synthesis, and generates mostly single mutations. Pol η does not convert these to tandems because Pol η–deficient mice and yeast have an abundance of tandems (Saribasak et al., 2012; Stone et al., 2012). In Fig. 4 B, we propose that occasionally Pol ι accesses a repair gap, and due to its low fidelity and distributive synthesis, it fills in one or two bases and then dissociates from the template after introducing a mismatch. This predicts that the first mutation in a tandem pair should have the error signature of Pol ι. To enrich for tandems that have likely resulted from one synthesis event, we examined sequences with ≤5 total mutations. The first mutation in five of seven sequences containing a tandem pair had an A to G or A to T substitution. This error signature corresponds to the two most probable mutations generated by Pol ι in vitro (Tissier et al., 2000), supporting a role for Pol ι in incorporating the first mismatch. This mispair can be extended by either Pol η, which faithfully inserts the next nucleotide, or Pol ζ, which has a unique catalytic property that allows it to synthesize a second misinsertion, and then extend from the double mismatch. Could Pol ι insert the second mismatch? Analysis of mismatch extension by Pol ι suggests that it can generate a second mismatch only in a specific sequence context (Vaisman et al., 2001). However, Pol ζ can generate the second mismatch and extend from that mismatch without an apparent sequence bias (Zhong et al., 2006).
A role for Pol ι in somatic hypermutation has been proposed in a BL2 human cell line that was deficient for the polymerase, and had decreased mutation frequency (Faili et al., 2002), and it was suggested that Pol ι is involved in the UNG pathway in these cells (Weill and Reynaud, 2008). However, in mice deficient for the polymerase, there was no decrease in the frequency of mutation. Further, our data indicate a role for Pol ι in the MSH2–MSH6 pathway, as tandem mutations are eliminated in Msh2−/− or Msh6−/− mice (Saribasak et al., 2012). It is unclear if these conflicting results are caused by a species difference or by a unique feature of the BL2 cell line.
A biological role for the three translesion polymerases is evident in mice deficient for the enzymes. Pol η–deficient mice have increased levels of sensitivity to ultraviolet irradiation (Lin et al., 2006), Pol ζ–deficient mice sustain developmental defects (Wittschieben et al., 2000), and Pol ι–deficient mice accumulate mesenchymal tumors (Ohkumo et al., 2006). An immunological role for Pol ι may reside in its ability to participate in changing two bases in a codon during one mutagenic DNA synthesis reaction, to rapidly increase the pool of diverse antibodies for selection by antigen. The interplay of these three polymerases in the minefield of AID-induced lesions in variable regions raises interesting questions about hierarchy, repair, and mutagenesis.
MATERIALS AND METHODS
Protein expression and purification
Full length N-terminal His-tagged human Pol ι was expressed using the E. coli strain RW644 and expression plasmid pJM868, as previously described (Frank et al., 2012). The Pol ι mutant was generated by chemically synthesizing a 615-bp BglII–HindIII fragment encoding the N terminus of the E. coli codon-optimized Poli gene with the D126A-E127A substitution (Genscript). The fragment was subsequently subcloned as a BglII–HindIII fragment into plasmid pCT14 (Donigan et al., 2014), and the resulting plasmid, pKD005, was grown in RW644. Pol ι and Pol ιm proteins were purified and determined to be >95% pure by SDS-PAGE analysis.
Primer extension and EMSA
Primer extension reactions (10 µl) contained 2 nM polymerase, 10 nM 32P-labeled 16-mer primer annealed to a 28-mer template, T10AGC (5′-GCAAAAAAAAAAAAGCACGTCCGTACCA-3′; the position of the primer is underlined), and 100 µM nucleotides in reaction buffer (0.5 mM MnCl2, 40 mM Tris-HCl, pH 8.0, 10 mM dithiothreitol, 250 µg/ml bovine serum albumin, 2.5% glycerol, and 10 mM 2-mercaptoethanol). DNA substrates were confirmed to be >95% annealed by incubating with DNA Pol I Klenow fragment exo− (New England Biolabs). The reactions were then incubated at 37°C for 20 min, and quenched by the addition of 10 µl of 95% formamide and 10 mM EDTA. Samples were heated to 100°C for 5 min, and resolved on an 18% polyacrylamide-urea gel. Reaction products were visualized and quantified using a Fuji FLA-5100 Phosphorimager and ImageGauge software. EMSA gels were used to determine the DNA binding constants (KD(DNA)) for Pol ι and Pol ιm (Donigan et al., 2014). Serial dilutions of the enzymes (0.2–819 nM for Pol ι and 0.2–429 nM for Pol ιm) were incubated with radiolabeled DNA substrate in reaction buffer and separated by native gel electrophoresis. The fraction of bound DNA was quantified relative to protein concentration, as previously described (Donigan et al., 2014).
The Poli gene was codon optimized for maximal expression in human cells. The ∼2.2-kb gene was synthesized (Genscript) and subcloned as a XhoI–BamHI fragment into pEGFP-C1 (Invitrogen), to generate EGFP-Pol ι (pDH24). The Polim gene was constructed by synthesizing an ∼500-bp XhoI–PmlI fragment containing D126A and E127A substitutions, as well as silent mutations that generated a unique BstXI restriction enzyme site for subsequent subclone identification. The fragment was cloned as pEGFP-Pol ιm (pDH33), and the insert was confirmed by digestion with BstXI. HEK293T cells (ATCC) were plated onto Number 1 coverslips and transfected with these plasmids using TurboFectin 8.0, according to the manufacturer's protocol (OriGene). Cells were fixed using 2% formaldehyde (16% methanol-free formaldehyde; Polysciences, Inc.) in PBS, and mounted onto slides using Mowiol mounting medium. Fluorescence images of cell nuclei were acquired on an Axiovert 40 CFL (Zeiss) with an X-cite Series 120Q light source using Zen 2 (blue edition) software.
Polim/m mouse generation and characterization
The Polim allele was generated in a C57BL/6 background (Ozgene Pty Ltd.) using standard genetic manipulation techniques. The mutant allele was distinguished from the wild-type allele by PCR amplification and subsequent digestion with TseI restriction enzyme (New England Biolabs). To this end, genomic tail DNA was amplified with forward 5′-ACACACACAATACTACACA-3′ and reverse 5′-AAGCTGCTGGAGTCTTTT-3′ primers to generate a 320-bp fragment. The product was then digested with TseI, and separated on an agarose gel to visualize products. The 320-bp wild-type band was not cut, whereas the mutant band generated 270- and 50-bp fragments. For Western blots, protein extracts from mouse testes were prepared and analyzed with affinity-purified rabbit anti-Pol ι (McDonald et al., 2003) and mouse anti–β-actin (clone AC-15; Sigma-Aldrich), followed by goat anti–rabbit or goat anti–mouse IgG peroxidase antibodies (Sigma-Aldrich), respectively. Chemiluminescent detection was performed using ECL plus Western blotting substrate (Thermo Fisher Scientific).
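The band pattern described above can be turned into a simple genotype call, sketched here in Python. The wild-type (320 bp, uncut) and mutant (270 and 50 bp) patterns are from the text; the heterozygous pattern (all three bands), the size tolerance, and the function name are assumptions added for illustration.

```python
def genotype_from_bands(band_sizes_bp, tolerance_bp=10):
    """Call the Poli genotype from TseI-digested PCR product band sizes (in bp).

    Assumptions: bands within `tolerance_bp` of the expected size are accepted,
    and a heterozygote shows both the uncut 320-bp band and the 270/50-bp pair.
    """
    def has_band(size):
        return any(abs(band - size) <= tolerance_bp for band in band_sizes_bp)

    uncut = has_band(320)                   # wild-type allele is not cut by TseI
    cut = has_band(270) and has_band(50)    # mutant allele yields two fragments

    if uncut and cut:
        return "Poli+/m"
    if cut:
        return "Polim/m"
    if uncut:
        return "Poli+/+"
    return "undetermined"


calls = [genotype_from_bands(lane) for lane in ([320], [270, 50], [320, 270, 50])]
```

In practice the 50-bp fragment may be faint on an agarose gel, so a real scoring scheme might rely on the 270-bp band alone.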
Homozygous Polim/m and C57BL/6 mice were bred in the NIA mouse facility. 129X1/SvJ mice were purchased from The Jackson Laboratory. Mice of both sexes were used at 2–6 mo of age. All animal procedures were reviewed and approved by the Animal Care and Use Committee of the National Institute on Aging.
Lymphocytes were isolated from Peyer’s patches and stained with FITC-labeled anti-B220 (clone RA3-6B2; eBioscience) and Alexa Fluor 647–labeled anti-GL7 (clone GL7; BioLegend) antibodies, followed by cell sorting for the B220+ GL7+ germinal center B cell population. Cells were then lysed in digestion buffer (10 mM Tris, pH 8.0, 25 mM EDTA, 100 mM NaCl, 1% SDS, and 0.1 mg/ml proteinase K) at 55°C overnight. Genomic DNA was isolated by phenol/chloroform extraction and ethanol precipitation. The JH4 intronic region was amplified and analyzed as previously described (Saribasak et al., 2012). For mutation frequency and spectra, mutations were analyzed from previously published data (2,866 mutations) from C57BL/6 mice (Rada et al., 2004; Martomo et al., 2008; Schenten et al., 2009; Saribasak et al., 2011; Zanotti et al., 2015). Poli129/129 data were from McDonald et al. (2003) and data generated in this study, and Polim/m data were generated in-house.
For analysis of tandems, a subset of sequences containing only full-length clones was used, and is listed in Table 1. The method starts with the nonmutated germline sequence, together with the mutation frequencies observed at each site in the C57BL/6 dataset (Table S1). The general idea was to use the expected mutation frequencies to generate simulated datasets that represent a null model, which could then be compared with a target dataset, usually from a mutant. In preparation for the simulation, probabilities for mutating each site were calculated such that they were proportional to the mutation frequency at that site. The R function “sample” was then used to generate the simulated mutations at these sites according to the assigned probabilities. When we compared simulated data with a target dataset, we ensured that the simulated dataset contained the same number of sequences and equal numbers of mutations per sequence as the target dataset, to avoid any bias in the simulated datasets. Simulated datasets that, for example, contain higher numbers of mutations per sequence would be biased toward having more tandem mutations. Because we were interested specifically in measuring the number of tandem mutations, only sequences containing at least two mutations in the target dataset were considered. As shown schematically in Fig. 3 B, the number of tandem mutations in each simulated dataset was counted. For each target dataset we generated a total of 100,000 simulated datasets, and the number of tandem mutations gave us an expected, or null model, distribution of tandem mutations, after normalization. We then considered how the number of tandem mutations observed in the target dataset (for example, observed, in Fig. 3 C) compared with this null model distribution. The p-value represents the fraction of simulations (out of 100,000) in which the number of simulated tandems was greater than the observed value, plus half of the fraction in which it was equal to the observed value.
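The procedure can be outlined with a minimal Python sketch. This is not the R code used for the analysis: it substitutes Python's `random.choices` for the R function “sample”, assumes that sites within one sequence are drawn without replacement, and counts a run of three adjacent mutations as two tandems. All function names and the toy inputs are hypothetical.

```python
import random


def count_tandems(positions):
    """Count pairs of mutations at adjacent sites in one sequence."""
    sites = sorted(set(positions))
    return sum(1 for a, b in zip(sites, sites[1:]) if b - a == 1)


def sample_sites(site_freqs, k, rng):
    """Draw k distinct sites, each with probability proportional to its frequency.

    Assumption: sampling is without replacement within a sequence.
    """
    sites = list(range(len(site_freqs)))
    weights = list(site_freqs)
    chosen = []
    for _ in range(k):
        i = rng.choices(range(len(sites)), weights=weights, k=1)[0]
        chosen.append(sites.pop(i))
        weights.pop(i)
    return chosen


def simulate_tandem_counts(site_freqs, muts_per_seq, n_sim, seed=0):
    """Return the tandem count of each simulated dataset (the null distribution).

    Each simulated dataset has the same number of sequences and the same number
    of mutations per sequence as the target; sequences with fewer than two
    mutations are skipped.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        total = 0
        for k in muts_per_seq:
            if k >= 2:
                total += count_tandems(sample_sites(site_freqs, k, rng))
        counts.append(total)
    return counts


def mid_p_value(simulated_counts, observed):
    """Fraction of simulations above the observed count plus half of the ties."""
    greater = sum(1 for c in simulated_counts if c > observed)
    equal = sum(1 for c in simulated_counts if c == observed)
    return (greater + 0.5 * equal) / len(simulated_counts)


# Toy run with hypothetical per-site frequencies for a 20-bp region.
toy_freqs = [1, 1, 5, 5, 1, 1, 1, 8, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 2, 1]
null_counts = simulate_tandem_counts(toy_freqs, [2, 2, 3, 4, 1], n_sim=1000)
p = mid_p_value(null_counts, observed=3)
```

With real data, `site_freqs` would hold the 492 per-site frequencies from Table S1, `muts_per_seq` the mutation counts of the target dataset, and `n_sim` would be 100,000.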
Online supplemental material
Table S1 lists the mutation frequency at individual nucleotides in the JH4 intron. Data were accumulated from published C57BL/6 sequences containing 2,866 mutations within the 492-bp region.
We thank Z. Cao for technical support, and R. Wersto, C. Nguyen, and T. Wallace of the National Institute on Aging flow cytometry unit for sorting. We gratefully acknowledge Ranjan Sen for critical reading of the manuscript.
This work was supported in part by National Institutes of Health (NIH) grant GM111741 to T. MacCarthy, and the Intramural Research Programs of the National Institute of Child Health and Human Development to R. Woodgate, and National Institute on Aging to P.J. Gearhart.
The authors declare no competing financial interests.