Somatic hypermutation (SHM) is restricted to VDJ regions and their adjacent flanks in immunoglobulin (Ig) genes, whereas constant regions are spared. Mutations occur after about 100 nucleotides downstream of the promoter and extend to 1–2 kb. We have asked why the very 5′ and most of the 3′ region of Ig genes are unmutated. Does the activation-induced cytosine deaminase (AID) that initiates SHM not gain access to these regions, or does AID gain access, but the resulting uracils are repaired error-free because error-prone repair does not gain access? The distribution of mutations was compared between uracil DNA glycosylase (Ung)-deficient and wild-type mice in endogenous Ig genes and in an Ig transgene. If AID gains access to the 5′ and 3′ regions that are unmutated in wild-type mice, one would expect an “AID footprint,” namely transition mutations from C and G in Ung-deficient mice in the regions normally devoid of SHM. We find that the distribution of total mutations and transitions from C and G is indistinguishable in wild-type and Ung-deficient mice. Thus, AID does not gain access to the 5′ and constant regions of Ig genes. The implications for the role of transcription and Ung in SHM are discussed.
Somatic hypermutation (SHM) of Ig genes requires the function of activation-induced cytosine deaminase (AID; references 1 and 2). Although unidentified, AID may have an RNA substrate (3). However, in vitro, AID deaminates monomeric dCTP (4), single-stranded DNA (5–7), transcribed double-stranded DNA (8–11), and supercoiled double-stranded DNA (12). Moreover, studies in cells deficient in uracil DNA glycosylase (Ung) activity strongly support the idea that AID deaminates cytosines in DNA directly to produce U:G DNA mispairs (13–15). In the absence of Ung, C mutations are almost exclusively C to T (and for C deamination on the opposite strand, complementary G to A) transitions, as expected if deoxyuridines resulting from AID cytidine deamination are left unrepaired before replication. Indeed, in activated B cells, uracil excision appears to be accomplished mostly by Ung, with minimal contribution from the SMUG1 uracil glycosylase (16, 17). Although transition mutations in Ung–wild-type mammals are the most common SHMs from C and G, transversions are also frequent and are thought to arise either entirely (13, 15) or in part (18) from Ung-mediated base excision repair. Ung-mediated excision of the uracil base creates an abasic site that, if used as a template for DNA synthesis, could lead to any mutation from C or G (for review see reference 19). Alternatively, processing of the abasic site to generate a single-stranded gap might lead to mutations when followed by error-prone DNA synthesis involving lesion-bypass polymerases (pols), of which pol η, pol ι, and pol ζ have been implicated in SHM (18). Finally, the U:G mispair could also be processed by the mismatch repair system. The SHM pattern in mismatch repair (Msh)2- and pol η–deficient B cells supports a model in which Msh2/6-mediated removal of the U:G mismatch, followed by pol η and pol ι error-prone DNA synthesis, leads mainly to mutations from A and T (20, 21).
The mechanism by which SHM is targeted almost exclusively to Ig DNA is a mystery. Although Ig V regions accumulate mutations at an average frequency of 10⁻⁴–10⁻³ mutations/base pair/cell generation (∼10⁶-fold higher than the frequency of spontaneous mutations), with a few exceptions (22–25), most other genes are not mutated. Transcription of the target DNA is required for normal SHM because (a) removing an enhancer from a murine Ig transgene decreases the SHM frequency (26); (b) mutation frequency correlates positively with the level of transcription regulated by an inducible promoter upstream of a transgene in a mutating B cell line (27); (c) the frequency of mutations is dependent on distance from the promoter (28–33); and (d) duplicating an Igκ promoter upstream of the normally unmutated Ig C region in an Igκ transgene results in initiation of mutations in the C region with a similar dependence on distance from the promoter (34). The latter observation led to the proposal that a “mutator factor,” perhaps an endonuclease, is recruited to transcription complexes initiating at Ig loci, travels with the complex, and gets deposited stochastically where it nicks the DNA, perhaps when the RNA pol pauses (34, 35). Error-prone repair of nicks would generate mutations. Due to a high chance of deposition within the first 1–2 kb from the promoter, the mutator factor was proposed to be absent from transcription complexes beyond 1–2 kb downstream of the transcription start, thereby protecting the C region from mutation (34, 35).
We sought to determine whether AID is a transcription-coupled mutator factor in vivo. In vitro, deamination of double-stranded linear DNA by AID requires transcription (8–11) or a putative negatively supercoiled conformation (12). Negative supercoils are generated behind an elongating transcription complex (36). However, the idea that transcription alone creates the appropriate substrate for AID does not explain how DNA within the first ∼200 bp and beyond 1–2 kb of the transcription start (i.e., Ig C regions), and most other genes that are highly transcribed in germinal center B cells, remain essentially mutation-free. To address this question, we considered two possibilities. In the first, transcription complexes at or near the transcription initiation site load and carry AID, which deaminates cytosines for 1–2 kb before becoming inactive or dissociating from the elongating transcription complex. In the second, AID deaminates cytosines made accessible by transcription or supercoiled conformation throughout Ig genes and in other loci, but repair of the resulting U:G mismatches is error-free in the very 5′ and C regions of Ig loci and in other loci (as during Ung-mediated repair of replication errors; references 37 and 38), whereas error-prone repair occurs in V regions because error-prone repair factors (or factors that recruit them) are targeted preferentially to Ig V regions.
Whether targeting of AID or of error-prone repair causes the skewed distribution of SHM was addressed by comparing the distribution of mutations relative to the transcription start site in vivo, in murine B cells wild-type (Ung+) or deficient (Ung−) for Ung. Importantly, these experiments were performed under conditions of endogenous rather than overexpressed levels of AID and therefore should be most appropriate for detecting Ig-specific targeting of SHM. SHMs in endogenous Ig loci and in an unselected transgene (39) were analyzed. We reasoned that C to T (and complementary G to A) transition mutations in the absence of Ung represent AID-catalyzed cytosine deaminations, i.e., an AID “footprint,” based on the following considerations. In the absence of Ung, uracils created by AID result in transitions from C and G when the DNA is replicated. It is unlikely that many transitions from C or G occur by the interaction of error-prone pols with mismatch repair. The major error-prone pols found to be involved in SHM are pol η and pol ι (18, 20, 21, 40). Both have the highest error rates when copying T, leading to transitions from A (T) (41, 42). In the Ung− mice, where MMR would still be active, there is no increase in mutations from A or T (15 and see Figs. 1–4). Thus, the transitions from C and G are likely an AID footprint (see Discussion). We have mapped mutations relative to each transcription initiation site. Strikingly, the AID footprint is virtually absent in the first 100 transcribed nt, low between 100–200 nt, and significantly reduced beyond ∼1.5 kb of the transcription start, supporting the idea that AID activity is coupled to the transition between transcription initiation and elongation from Ig promoters.
To determine whether AID deamination, or post-AID processing of the deaminated cytosine, determines the mutation distribution within Ig genes, we mapped C to T (and G to A) mutations in two endogenous Ig loci, the λ1 light chain locus and VJ558-rearranged IgH genes, as well as in an Igκ transgene in germinal center B cells of immunized Ung− and Ung+ control mice.
AID deaminations are essentially absent from regions upstream and within ∼100 bp downstream of the transcription start site in endogenous rearranged Vλ1 genes
Upstream and immediately downstream of the transcription start site, SHMs have been reported to occur at very low frequencies, if at all (28–31). To determine whether this 5′ region is protected or not from AID-mediated cytosine deamination, we mapped mutations in rearranged λ1 light chain genes. In mice, there are only three Vλ genes (43), and ∼60% of λ-expressing B cells derive from Vλ1/Jλ1 rearrangements (44). Genomic DNA from PNAhigh germinal center B cells of immunized Ung+ or Ung− mice was used as a template to PCR amplify rearranged λ1 genes. As reported previously for SHM in other Ig loci of Ung− cells (15), mutations from C and G are almost exclusively transitions in the Ung− as compared with Ung+ mice (Fig. 1 A). The specific mutation pattern shift favoring C to T (and G to A) mutations in Ung− mice in all loci assayed here (see below) further substantiates that C and G transitions likely represent an AID footprint.
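The classification underlying the AID footprint can be illustrated with a minimal sketch. It assumes that each mutation is recorded as a (reference base, mutated base) pair; the function names and the toy input are hypothetical and are not taken from the analysis in this study.

```python
# Purines and pyrimidines; a transition stays within one class.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine-to-purine or pyrimidine-to-pyrimidine substitutions."""
    if ref == alt:
        raise ValueError("not a substitution")
    return {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES


def cg_transition_fraction(mutations):
    """Fraction of mutations from C or G that are transitions (C>T or G>A).

    `mutations` is an iterable of (ref, alt) pairs. Mutations from A or T
    are ignored. Returns None if there are no mutations from C or G.
    """
    from_cg = [(r, a) for r, a in mutations if r in ("C", "G")]
    if not from_cg:
        return None
    n_transitions = sum(is_transition(r, a) for r, a in from_cg)
    return n_transitions / len(from_cg)


# Hypothetical toy input, not data from the study.
toy = [("C", "T"), ("G", "A"), ("C", "G"), ("A", "G"), ("G", "A")]
print(cg_transition_fraction(toy))  # 3 of the 4 C/G mutations are transitions
```

In this toy set, the A to G mutation is excluded from the denominator because only mutations from C and G are informative for the footprint.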
A 991-bp region of Vλ1 between 453 bp upstream and 538 bp downstream (ending at the 3′ region of Jλ1) of the start of transcription of λ1 was analyzed for SHM (Fig. 1 C). Both the subset of transition mutations from C and G as well as the total mutations are plotted with respect to their position in Fig. 1 B (note that all mutations from A, C, G, and T are plotted below the zero ordinate, whereas only the transitions from C and G are plotted above the zero ordinate). Similar to previous reports, we found only three mutations between −453 and +100 bp relative to the transcription start in wild-type mice. Mutations in Ung− mice, including C and G transition mutations, were almost entirely absent from −453 to +242 bp. In both Ung+ and Ung− mice, mutations eventually reached a plateau (at about +200 or +350 bp, respectively). Elevated numbers of mutations at complementarity-determining region (CDR)1, CDR2, and CDR3 (Fig. 1 B) suggest that B cells expressing a functional Vλ1-Jλ1 gene were selected in germinal centers for their affinity to the immunogen.
The paucity of mutations upstream of the plateau is not due to a lack of AID hotspots in the upstream and very 5′ regions (Table I). Therefore, because C or G transition mutations in Ung− mice, i.e., AID footprints, are virtually absent upstream and within the first 100 bp of the transcription start, AID appears not to gain efficient access to this 5′ region of λ1.
Mutations in the 5′ region of the RS transgene
B cells require surface Ig expression for survival (for review see reference 45), and mutations in Ig genes are selected for their ability to confer increased antigen affinity. Because mutations in the promoter, leader, or leader-V intron region of the endogenous λ1 gene may have been selected against and might therefore be undetected, we also examined the 5′ region of an unselected murine Ig transgene. The RS transgene is based on a rearranged Igκ167 gene, including its native promoter and both intronic and 3′ enhancer elements, and was reported previously to acquire 5–10-fold more SHMs than similar transgenes due to two E boxes (5′ CAGGTG 3′), DNA binding elements located ∼900 bp from the promoter (in the RS region; Fig. 2 C) to which E2A proteins can bind (46). Despite increasing the mutation frequency of all regions of the RS transgene, the E boxes do not increase the level of RS transgene transcription (46).
The RS transgenic mice were bred onto Ung− and Ung+ backgrounds and immunized to collect activated B cells from germinal centers. SHMs were mapped in a 5′ region of the RS transgene between nt positions –301 and +964 relative to the transcription start (Fig. 2 C). As in the endogenous λ1 gene, Ung deficiency caused a shift in the C and G mutation pattern predominantly to transitions (Fig. 2 A). Therefore, the molecular mechanisms governing mutation patterns appear to be similar between endogenous Ig genes and the RS transgene. In the Ung+ background, there were no mutations upstream of the promoter and in the first 83 bp of the transcribed region (Fig. 2 B). The numbers of mutations per 100 bp become comparable between +200 and +964 bp, such that the mutation frequency reaches an apparent plateau after +200 bp, similar to λ1. In Ung− mice, there was one mutation upstream of the promoter and no mutations within the first 101 bp from the transcription start. Three mutations were seen in the next 50 bp, and an apparent plateau of mutations was reached around +180 (Fig. 2 B). Thus, the very 5′ end of the RS transgene lacks mutations in both Ung+ and Ung− situations.
In Ung− mice, one of the 138 clones analyzed contained eight mutations upstream of the transcription start of the RS transgene. The mutations were all from A or T, and mostly from A to T, and therefore did not represent the AID footprint directly. The RS transgenic mouse has two copies of the RS transgene in tandem. We discovered previously that the 3′ κ enhancer contains a promoter that can drive SHM (47). Thus, this rare clone may have acquired its 5′ mutations through the promoter in the 3′ κ enhancer of the upstream transgene copy, and it was not included in the analysis of Ung− mice.
To rule out the possibility that lack of AID hotspots results in the mutation pattern observed in the 5′ region of the RS transgene, the distribution of WRC and GYW motifs (7) was analyzed (Table II). On average, there are 11.3 hotspots per 100 bp in the 5′ end of the RS gene. The upstream regions are not impoverished for hotspots, and the consistency of hotspots throughout the gene to +964 bp does not correlate with the absence of mutations in the 5′ end of the transcribed region.
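A hotspot density of this kind can be computed as in the following sketch. It assumes the standard IUPAC codes (W = A/T, R = A/G, Y = C/T) and counts the C of each WRC and the G of each GYW on the given strand; the function names, the window handling, and the toy sequence are hypothetical and may differ from the procedure used for Table II.

```python
import re

# IUPAC codes: W = A/T, R = A/G, Y = C/T.
# Lookaheads allow overlapping motifs to be found separately.
WRC = re.compile(r"(?=[AT][AG]C)")
GYW = re.compile(r"(?=G[CT][AT])")


def hotspot_positions(seq: str):
    """Sorted 0-based positions of the C in WRC and the G in GYW motifs."""
    seq = seq.upper()
    c_sites = {m.start() + 2 for m in WRC.finditer(seq)}  # C is the third base
    g_sites = {m.start() for m in GYW.finditer(seq)}      # G is the first base
    return sorted(c_sites | g_sites)


def hotspots_per_window(seq: str, window: int = 100):
    """Number of hotspot bases in consecutive windows of `window` bp."""
    n_windows = (len(seq) + window - 1) // window
    counts = [0] * n_windows
    for pos in hotspot_positions(seq):
        counts[pos // window] += 1
    return counts


# Hypothetical toy sequence, not part of the RS transgene.
toy = "AGCTGCAAACGTA"
print(hotspot_positions(toy))       # [1, 2, 4, 5, 9, 10]
print(hotspots_per_window(toy, 5))  # [3, 2, 1]
```

Comparing such per-window counts with per-window mutation counts is one way to check that a mutation-poor region is not simply hotspot-poor.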
Because the number of mutations in the first 200 bp is very low, the frequency of mutations in the 5′ portion of the RS transgene was analyzed statistically. To test whether a very low number of mutations in the first 200 bp can result from random mutation, we defined our null hypothesis to be that mutations occur at a flat rate between +1 and +964 bp of the transcription start. For the Ung+ mice, in 10,000 simulations of randomly placing a total of 318 mutations between bp +1 and +964, observing only two mutations or fewer within the first 100 bp is unlikely (P ≅ 0; Table II). For +101 to +200 bp, observing only 16 mutations or fewer is also very unlikely in this region if mutations occur randomly because there were only 5 out of 10,000 cases with 16 or fewer mutations in the simulations (P = 0.0005). For the Ung− mice, the same null hypothesis was tested by placing 260 mutations randomly between nt +1 to +964 in the RS transgene. Again, observing no mutations between +1 and +100 (P ≅ 0) and 14 mutations between +101 and +200 bp (P = 0.0036) was very unlikely, indicating that mutations are very unlikely to be distributed randomly in this region. The same conclusions are obtained when applying similar statistical analysis to the mutations in the endogenous λ1 gene, despite smaller numbers of mutations (Table I).
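The logic of this test can be sketched as a small Monte Carlo simulation. The sketch assumes that each mutation is placed independently and uniformly on positions +1 to +964 and that several mutations may fall on the same position; these details, the function name, and the random seed are assumptions and may not match the simulation actually used.

```python
import random


def simulate_p_value(total_mutations, region_length, window_start, window_end,
                     observed, n_sim=10_000, seed=0):
    """Estimate P(count in window <= observed) under uniform random placement.

    Each simulation places `total_mutations` mutations independently and
    uniformly on positions 1..region_length, then counts how many fall in
    window_start..window_end (inclusive).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        count = 0
        for _ in range(total_mutations):
            pos = rng.randint(1, region_length)  # both endpoints inclusive
            if window_start <= pos <= window_end:
                count += 1
        if count <= observed:
            hits += 1
    return hits / n_sim


# Ung+ case from the text: 318 mutations over +1 to +964, with two or
# fewer observed in the first 100 bp. Fewer simulations keep the demo fast.
print(simulate_p_value(318, 964, 1, 100, observed=2, n_sim=1_000))
```

Under the flat-rate null hypothesis, roughly 318 × 100/964 ≈ 33 mutations would be expected in any 100-bp window, which is why a count of two or fewer is essentially never seen in the simulations.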
The rejection of the null hypothesis for the first 200 bp of both Ung+ and Ung− backgrounds indicates that the mutation processes occurring within the first 200 bp are statistically different from the processes at +201 to +964. The results suggest that SHM does not operate in the first 100 bp and operates at a reduced level from ∼+100 to ∼+200 bp. Moreover, AID appears not to have access to the 5′ end of Ig genes (see Discussion).
The frequency of AID deaminations decreases with increasing distance downstream of transcription initiation in endogenous IgH chain genes
To analyze the decline in mutation frequency in 3′ regions beyond 1–2 kb from the transcription start, mutations were mapped in the intron between JH4 and Cμ in VHJ558-rearranged IgH genes. IgH chain genes rearrange using one of four possible JH segments, such that the intronic DNA immediately downstream of J4 is closer to the start of transcription in D-J4 rearrangements compared with D-J rearrangements with upstream JH segments, J1, J2, or J3, due to deletion of the DNA between the recombined D and J segments. This feature allows comparison of the frequency of SHM in an identical, endogenous segment of DNA when located at different distances from the start of transcription (Fig. 3 C). Mutation frequencies between J3 and J4 rearrangements reportedly differ 3.6-fold (4.0 × 10⁻³ vs. 14.4 × 10⁻³, respectively) in a 314-bp segment encompassing two thirds of the 3′ end of J4 and the immediate 3′ intronic flank (33).
We performed a similar analysis of mutations in the J4-flanking intron segment in immunized Ung+ and Ung− mice to determine whether AID gains equal access to this region regardless of the distance between this DNA segment and the start of transcription. (We did not consider switch region mutations that would be located further 3′ and are probably induced independently of VDJ region mutations.) The distribution of mutations in J2- and J4-rearranged genes is shown in Fig. 3 B, top and bottom, respectively. For both J2- and J4-rearranged genes, the sequence was analyzed between the 3′ end of the rearranged J segment and the 3′ primer used for PCR amplification (located 1,108 bp downstream of J4; Fig. 3 B). Data in Fig. 3 represent mutations pooled from three mice, either Ung+ or Ung−, and as reported previously for this locus (15), mutations from C or G in Ung− mice are predominantly transitions (Fig. 3 A). Moreover, the distribution of mutations in the J4-Cμ intron region is similar between Ung+ and Ung− mice. In both Ung+ and Ung− mice, all mutations are more frequent within and near the rearranged J segment than further downstream (Fig. 3 B). Finally, in the segment 3′ of J4 present in both J2 and J4 rearrangements, but located ∼950 bp further from the transcription start in J2-rearranged genes, mutations are ∼12- and 5-fold higher in J4-rearranged genes versus J2-rearranged genes in Ung+ and Ung− mice, respectively (Fig. 3 B). The subset of C and G transition mutations differs ∼33- and 6-fold in Ung+ and Ung− mice, respectively, for the same region. Thus, independent of DNA sequence, high levels of cytosine deamination by AID occur only up to ∼600-700 bp downstream of the 3′ end of the rearranged JH gene. Further 3′, mutations and transitions from C and G also decrease drastically in the absence of Ung.
There appear to be more mutations in the 3′ region (beyond +1,284) in the Ung− mice as compared with the Ung+ mice. We speculate that this might be due to the technical difficulty of PCR amplifying long regions of contiguous genomic Ig DNA, which produces a significant fraction of hybrid PCR products as observed previously for the IgH locus using the same PCR primers (33). We excluded all detectable hybrid products from this analysis using polymorphisms in the analyzed region (see Materials and methods), but hybrids between DNA molecules derived from the same allele would not be detectable. As a consequence, some of the mutations downstream of J2 might be derived from hybrids (for example, derived from a J3- or J4-rearranged segment with mutations after +1,284), possibly leading to an overestimate of the mutations 3′ of J4 in clones scored as J2-rearranged. Conversely, hybrid PCR artifacts would lead to an underestimation of the mutations after +1,284 in J4-rearranged clones via similar events. We therefore expect that the observed difference in mutation frequency between Ung+ and Ung− mice beyond +1,284 is not significant.
Mutations in the mutation plateau and 3′ regions of the RS transgene
Just as for the λ light chain analysis, mutations in the heavy chain J-C intron may have been selected against because of unknown functional consequences (e.g., RNA splicing, IgH intron enhancer function, etc.) and thus would be undetected. Therefore, we also examined the 3′ distribution of mutations in the unselected RS transgene. To examine whether there is a decline in mutation frequency with increasing distance from the start of transcription as is seen in endogenous IgH genes, two sections in the RS transgene were analyzed (Fig. 4, B and C). As in the endogenous IgH locus (Fig. 3 B), mutations are frequent between +162 and +1,078 and 10–13-fold less frequent 500 bp further downstream between +1,580 and +2,900. A similar decline in mutation frequency was observed in the subset of transition mutations from C and G, both in the presence and absence of Ung (e.g., a sevenfold decrease in Ung−). The paucity of mutations in the 3′ region beyond 1.5 kb of the RS transgene is particularly striking when considering the large mutation load of some clones (e.g., in Ung− mice, the maximum number of mutations per clone in the region between +162 and +1,078 was 55, whereas a maximum of only 9 mutations per clone was found between +1,580 and +2,900), suggesting that not only the 5′ but also the 3′ boundary of mutations is rather defined in the RS transgene. Moreover, DNA elements sufficient to define both the transcriptional correlation of SHM, as well as the mutation pattern, must be contained within the elements present in the RS transgene.
The WRC trinucleotide, in which the C (or the G in the complementary GYW sequence) is mutated, is a hotspot for AID targeting in vitro (7). To determine whether this is also true in vivo, we compared C and G mutations in hotspots and nonhotspots in Ung+ versus Ung− mice in the RS transgene between +162 and +1,078 bp. Interestingly, 59 and 64% of mutations from C and G were found in hotspots in Ung+ and Ung− mice, respectively, suggesting that the WRC/GYW hotspots are also preferred AID substrates in vivo, and that targeting of these hotspots is determined by AID rather than post-AID repair steps.
Relationship of SHM to transcription
Evidence for a correlation between SHM and transcription is abundant (for review see references 48 and 49). Yet, how SHM and transcription are related mechanistically is unknown. Recent in vitro studies of the function of AID suggest that transcription creates a substrate for AID, perhaps single-stranded DNA (for review see reference 49) or negatively supercoiled double-stranded DNA (12). However, it is not understood how, in B cells undergoing SHM in vivo, mutations are restricted mostly to V regions of Ig DNA even though Ig C-region DNA and many other genes are also transcribed.
C to T (and the corresponding G to A) transition mutations in the absence of Ung likely represent AID-catalyzed mutations (also see below). If AID initiates the SHM process, and AID (rather than error-prone repair) is directed to the target by transcription, then the frequency of AID-promoted transition mutations from C and G in Ung− mice should follow the distribution frequency of typical SHM in an Ung+ mouse. Precisely this pattern was observed both in endogenous Ig loci and in the RS transgene in vivo, where C and G transition mutations overlapped with the distribution of all mutations in both Ung+ and Ung− mice. Moreover, the fact that the promoter and general region upstream of the transcription start are essentially devoid of C or G transition mutations in Ung− mice suggests that targeting of AID requires transcribed DNA.
Transcription coupling of AID activity might be accomplished by association of AID with the transcription complex during transcription elongation. (AID–RNA pol complexes have been observed so far only in experiments in which AID was epitope-tagged and overexpressed.) Whether physiological levels of AID bind RNA pol has not been demonstrated. Alternatively, transcribed V-region DNA might be in a special conformation that allows access to AID that is not associated with RNA pol. AID might be tethered to Ig-specific enhancers (or to cis elements specific to other genes that undergo SHM) and may thus be available in the vicinity of target DNA.
Within transcribed DNA, the first ∼100 transcribed bp are essentially mutation-free, indicating that these bp are not deaminated by AID. There are several possible reasons for the lack of access by AID to the very 5′ end. First, although AID may interact with the transcription complex (34), it may do so only when RNA pol II is in the processive mode. In this case, AID may not be loaded at the promoter, but rather after the pol has switched from the abortive to the processive mode (51, 52). These two modes are distinguished by the phosphorylation state of the pol. In the abortive phase, the COOH-terminal domain of the largest subunit of RNA pol II is phosphorylated at serine-5 of the heptad repeats. Gradually, these heptad repeats acquire a different phosphorylation pattern, such that in the processive phase, serine-5 phosphorylation has been removed and serine-2 is phosphorylated. Although the switch to the processive mode likely begins before +100 bp from the transcription start, an AID-compatible phosphorylation state may not be present until after +100 bp. AID would be expected to access DNA that is associated with RNA pol, or, more likely, in negative supercoils arising behind the pol (see below). This scenario implies that AID travels with the elongating transcription complex.
Second, the lack of mutations in the upstream region might be due to a lack of negative supercoils in that region. Based on the possibility that AID gains access to DNA in negatively supercoiled regions accumulating behind the RNA pol (12), transcription-associated negative supercoils may need to reach a certain threshold level before AID can interact with the DNA. It is conceivable that negative supercoil propagation in the upstream direction is restricted to the transcribed portion of the gene due to occupancy of the promoter by multiple factors that anchor the gene in a particular position in the nucleus. This scenario could apply whether or not AID was associated with the transcription complex.
Finally, an “inhibitory” nucleosome might be positioned somewhere over the first +146 bp downstream of the promoter. Promoters are often flanked by positioned nucleosomes due to the occupancy of the promoter itself by transcription factors that may exclude nucleosomes (53). In this case, AID would begin to gain access to the DNA after the dyad center of the nucleosome (at ∼+73 bp). Finding this result would mean that transcription can ignore such a nucleosome but AID cannot.
In contrast to the 5′ end, the clear decrease of mutations beyond ∼1 kb, in Ung− mice to the same degree as in Ung+ mice, is explained most easily by the hypothesis that AID is associated with the elongating transcription complex. AID may act stochastically on the DNA during elongation and, as a result of interacting with DNA, be unable to remain bound to the transcription complex. Alternatively, AID may need to be deposited on the DNA by a conformational change of the transcription complex, e.g., when the RNA pol pauses. If AID cannot reload after deposition, again, mutations would stochastically decrease in frequency toward the 3′ end of the gene.
Alternative hypotheses for the lack of mutations in the 3′ ends of Ig genes are as follows. First, AID might instead be tethered in the vicinity of the Ig genes, for example, at specific enhancers from where it prefers to target transcribed DNA without associating with the transcription complex. However, if so, why would AID mutations rise and fall in a relatively constant pattern from the 5′ to the 3′ region of the gene? When the intronic Igκ enhancer was moved to a position 3′ of the C region and an Ig promoter placed upstream of the C region, the V region was still mutated in the normal pattern (34). Second, chromatin differences between the V- and C-region DNA in heavy chain genes have been observed (54). However, in murine light chain genes, both the V and C regions were found to be in an active chromatin conformation in vivo (55). Thus, chromatin differences between V- and C-region DNA may not explain restriction of mutations to V-region DNA. Finally, it is possible that during SHM, certain rounds of transcription of the Ig genes abort somewhere in the middle of the gene. Clearly, full-length mRNAs are produced in PNAhigh B cells and mutating B cell lines, but short transcripts may also be made, although they would be predicted to be unstable. (We have been unable to find evidence for short transcripts; unpublished data). For AID to associate only with abortive transcription complexes, the rounds of transcription resulting in short transcripts would have to be differentiable from rounds resulting in full-length transcripts. Thus, the original hypothesis (34, 56) that a mutator factor, i.e., AID, travels with the RNA pol during Ig gene transcription seems more likely. 
That is, if all bases in a length of DNA are equal targets for AID, but AID always begins acting from the same point (just downstream of the transcription start) and travels in only one direction, cytosines at the start of AID's path have a greater chance to become deaminated than those further downstream if AID is deposited at its site of action.
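This argument can be made concrete with a minimal sketch under strong simplifying assumptions that go beyond the text: one AID molecule travels with each transcription complex, it is deposited with the same probability at every base, and it cannot reload after deposition. The parameter `p_deposit` and the function name are hypothetical, and the value used below is purely illustrative.

```python
def deamination_profile(p_deposit: float, length: int):
    """Probability that AID is deposited (and acts) at each position.

    Position 1 is the first base at which AID can act. AID must still be
    bound to the complex to act at a position, so the probability at
    position k is p_deposit * (1 - p_deposit) ** (k - 1).
    """
    if not 0.0 <= p_deposit <= 1.0:
        raise ValueError("p_deposit must be between 0 and 1")
    profile = []
    still_bound = 1.0  # probability that AID has not yet been deposited
    for _ in range(length):
        profile.append(still_bound * p_deposit)
        still_bound *= 1.0 - p_deposit
    return profile


# Illustrative value only: a per-base deposition probability of 0.002.
profile = deamination_profile(0.002, 2000)
print(profile[0] / profile[-1])  # ratio of the first to the last position
```

In this simplified model the profile decays geometrically with distance, so positions near the start of AID's path are always favored over those further downstream, whatever the value of `p_deposit`.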
Ung and mismatch repair in SHM
A predominance of transition mutations from C and G is not only a hallmark of SHM in Ung− mice, but also of mice lacking proteins of the mismatch repair system (16, 57–59). In addition, mutations at A and T bases are reduced in Msh2- and Msh6-deficient mice, leading to the proposal that both Ung and Msh2 can recognize the U:G mismatch generated by AID, although the SHM pattern would be different depending on the DNA repair enzymes involved. That is, Ung-mediated uracil removal and subsequent replication over the abasic site, or nicking and removal of the abasic site followed by error-prone DNA synthesis, were proposed to generate mutations from C or G almost exclusively. In contrast, Msh2-mediated removal of a single-stranded DNA patch that extends beyond the original U:G mispair could lead to mutations from any base, including A or T bases, during resynthesis of the removed strand. In this scenario, Msh2-mediated SHM would necessarily require error-prone synthesis. pol η has been implicated in this role, especially because (a) the pattern of synthesis errors made in vitro by Pol η (60), which appears to be stimulated by Msh2 (61), resembles the SHM pattern from A and T, and (b) SHMs from A and T are reduced in Pol η–deficient humans and mice (21, 61). However, if considering Msh2 and Ung as independent paths to generate SHMs from a common U:G mismatch, the SHM patterns in mice lacking one of the repair systems are difficult to explain. For example, why are mutations from A or T in Ung− mice unaffected? One would expect them to be increased if, in the absence of Ung, the Msh2–Pol η/Pol ι pathway predominates. Importantly, mismatch repair appears not to be limiting in germinal center B cells (62). Also, why are transitions from C and G increased in Msh2-deficient mice (57–59)? In the absence of Msh2 in this model, Ung would have been expected to gain more access, thereby still generating transversion mutations from C and G.
To explain this conundrum, we propose that Msh2 and Ung are factors in codependent systems in SHM. Msh2-mediated mismatch “repair” is expected not to be strand-specific for the U-containing strand, presumed here to be either the transcribed or the nontranscribed strand, because in Ung−/Msh2− double knockout mice, C to T and G to A mutations occur in roughly equal proportions (no strand bias for AID deamination; reference 16). As such, Msh2 and its cofactors are expected to remove either the U- or the G-containing DNA strand, as observed in Escherichia coli for a U:G mismatch in the absence of Ung (63). Removing the U-containing strand, followed by DNA synthesis to replace it, could result in mutations made by an error-prone pol, such as pol η or pol ι. Removing the G-containing strand would result in a similar scenario, except that in addition, the nonexcised strand contains a uracil base (Fig. 5). Interestingly, in vitro, human Ung is able to excise U from single-stranded DNA 2.5-fold (and a predicted 40-fold in vivo) better than from double-stranded DNA (17, 64). Thus, the single-stranded region produced by mismatch repair might provide a preferred substrate for Ung. A corollary of this hypothesis is that the ability of Ung to excise AID-deaminated cytosines is hampered by loss of Msh2. Ung inhibition results in an increase in C and G transitions, precisely what is seen in both Ung- and Msh2-deficient cells, and to a higher degree in the former as expected if Ung plays the major role in removing uracils. In Ung− mice, 94–98% of the mutations from C or G are transitions (15 and this paper), whereas in Msh2− mice, only 84–88% of mutations from C or G are transitions (16, 62). Thus, in the scenario described, all of the transition mutations from C and G observed in both Msh2- and Ung-deficient situations would be due to AID deamination events left unrepaired by Ung. 
Therefore, we expect that the great majority of C and G transitions mapped here in Ung− mice represent an accurate in vivo AID footprint.
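The percentages quoted above (94–98% in Ung− mice versus 84–88% in Msh2− mice) are the share of mutations from C or G that are transitions. As a minimal sketch of how such a share could be tallied from a list of scored mutations, the following Python fragment may help; the function names and the input format (pairs of germline base and mutated base) are assumptions made for illustration and are not taken from the analysis used in this paper.

```python
# Transitions exchange a purine for a purine or a pyrimidine for a pyrimidine.
TRANSITIONS = {("C", "T"), ("T", "C"), ("G", "A"), ("A", "G")}


def is_transition(germline, mutated):
    """Return True if the substitution germline -> mutated is a transition."""
    return (germline, mutated) in TRANSITIONS


def cg_transition_fraction(mutations):
    """Fraction of mutations from C or G that are transitions.

    `mutations` is an iterable of (germline_base, mutated_base) pairs
    (hypothetical input format). Mutations from A or T are ignored.
    Returns None when there are no mutations from C or G.
    """
    from_cg = [(g, m) for g, m in mutations if g in {"C", "G"}]
    if not from_cg:
        return None
    n_transitions = sum(is_transition(g, m) for g, m in from_cg)
    return n_transitions / len(from_cg)
```

With this definition, a data set in which nearly every C or G mutation is C to T or G to A yields a value close to 1, which is the pattern interpreted here as the AID footprint.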
We found that the footprint of AID overlaps with the distribution of SHMs in Ung− mice overall. Interestingly, the overlap implies that during Msh2-mediated mismatch repair at a U:G mismatch in Ig V regions during SHM, the length of single-stranded DNA removed is only a few bp, compared with the estimated length of as much as 1 kb during mismatch repair outside of SHM (65). Removal of a long tract of DNA by Msh2 and its cofactors during SHM, initiated by an AID-generated mismatch, would have been expected to expand the distribution of SHMs beyond the AID footprint because error-prone pols could potentially participate in resynthesis of the excised strand anywhere along the length of the gap.
Thus, the interactions of various DNA repair systems and AID appear rather complex. Finding that AID does not gain access to the complete Ig gene elucidates one important aspect of regulating AID, a potentially hazardous mutator factor.
Materials and Methods
Mice, immunizations, and cell sorting.
The Ung− mice on a hybrid C57BL/6J-129SV background (37) were provided by D. Barnes and T. Lindahl (Imperial Cancer Research Fund, Clare Hall Laboratories, UK) and maintained in our facility by breeding with C57BL/6J or RS transgenic mice, also on a C57BL/6J background. Ung− mice were genotyped by PCR using two primer sets: UngF (5′ GTGAATGCAGGGCTCACTTAAGTC 3′) and UngR2 (5′ CAGTGCCTATAACTTCAGCTCC) produce a 493-bp product in Ung+ mice and a 1.6-kb product in Ung− mice; NeoF (5′ GTGAATGCAGGGCTCACTTAAGTC 3′) and UngR2 produce an ∼600-bp product only in Ung− mice. PCR conditions were 1 cycle at 96°C for 4 min; 7 cycles at 96°C for 30 s, 58–55°C for 30 s (touchdown annealing), and 72°C for 1.5 min; 30 identical cycles except with a constant annealing temperature of 55°C; and a final extension at 72°C for 7 min. The RS transgenic mice have been described previously (39). The RS κ transgene does not encode a functional protein due to a stop codon in the leader exon and is therefore not selectable.
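The touchdown annealing step is specified only as a temperature range (58–55°C) and a number of cycles (7). One plausible reading, sketched below in Python, is that the annealing temperature falls in equal steps from the first to the last touchdown cycle. The even spacing and the function name are assumptions; the actual per-cycle decrement is not stated in the text.

```python
def touchdown_schedule(start_c, end_c, cycles):
    """Annealing temperature (in degrees C) for each touchdown cycle.

    Assumes the temperature changes by the same amount every cycle,
    starting at `start_c` and reaching `end_c` in the last cycle.
    """
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    if cycles == 1:
        return [float(start_c)]
    step = (end_c - start_c) / (cycles - 1)
    return [round(start_c + i * step, 2) for i in range(cycles)]
```

Under this assumption, `touchdown_schedule(58, 55, 7)` steps down by 0.5°C per cycle and ends at 55°C, the constant annealing temperature used for the remaining 30 cycles.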
To immunize mice, 2 × 10⁸ PBS-washed sheep red blood cells were injected intraperitoneally on days 1 and 22. B cells were isolated on day 25 from the spleen and Peyer's patches of immunized mice aged 4–10 mo. Cells were stained with anti–mouse B220/CD45-PE (BD Biosciences), anti–mouse PNA-FITC (Sigma-Aldrich), and anti–mouse GL7-FITC (BD Biosciences) antibodies and sorted on a Mo-Flo or FACSAria (BD Biosciences) cell sorter at the Immunology Core Facility at the University of Chicago. PNAlowB220+ and PNAhighB220+ cells were collected for DNA extraction using DNeasy columns (QIAGEN).
The Animal Care and Use Committee of the University of Chicago has approved the animal studies.
PCR amplification and sequencing.
To amplify a 991-bp segment of the Igλ gene with a Vλ1-Jλ1 rearrangement, primers were designed to hybridize at −453 bp relative to the transcriptional start of the Vλ1 gene (5′ TCCCAGATTTAGACTCATTATACTACACAC) and at +538 bp in the Jλ1 region (5′ CACGGACAGGATCCTAGGACAGTCAGTTTG). PCRs were performed with DNA template from 12,000 splenic PNAhigh B cells using Pfu pol (Stratagene) for 25 cycles at 98°C for 45 s, 54°C for 45 s, and 72°C for 2 min.
VJ558-rearranged IgH genes were amplified as described previously (33, 66) using ∼10,000–20,000 cell equivalents of template DNA from splenic germinal center B cells and Pfu turbo pol (Stratagene). PCR conditions were 1 cycle at 95°C for 4 min; 13 cycles at 95°C for 40 s, 64–58°C for 40 s (touchdown annealing), and 72°C for 4 min; 27 cycles at 95°C for 40 s, 57°C for 40 s, and 72°C for 4 min; and a final extension at 72°C for 7 min. J1, J2, J3, and J4 rearrangements were resolved approximately by agarose gel electrophoresis, bands were excised, and DNA was purified using a gel extraction kit (Qiaquick; QIAGEN). The actual J rearrangement was ultimately revealed by DNA sequencing. Clones derived from hybrid PCR products, identified by polymorphisms located throughout the region as analyzed by sequencing (33), were excluded.
To amplify a 1,265-bp segment of the RS transgene from −301 to +964 bp relative to the transcription start, the following primers were used: the forward primer (5′ GTTACTTTTGTCTCCTTGTCATTACAGG), which anneals ∼300 bp upstream of the transcription start in the RS transgene, and the reverse primer (5′ TGTAGCACCTGTCCATGGTTAGCA), which anneals to the RS insert, a sequence that is unique to the RS transgene.
Two sets of primers were used to amplify the two downstream sections of the RS transgene from PNAhigh Peyer's patch B cells: JRH1F (5′ GCAGATGTAGATTCAGGTGCT) and 950R (5′ GCCCTCTCCATTTTCTCAAGAT, unique to the RS transgene), which amplify the region between +141 and +1,100 nt relative to the transcription start; and RsendF (5′ CTACCAGAATTCAGTTCTCACGTTC, unique to the RS transgene) and mIgkJCR (5′ CTAATGGTTTGTAACCACATGGGAC), which amplify the region between +950 and +2,982 nt relative to the transcription start. PCR was performed using 1 cycle at 95°C for 4 min, 27 cycles at 95°C for 30 s, 55°C for 30 s, and 72°C for 3 min, followed by 1 cycle at 72°C for 7 min.
For all DNA, PCR products were gel purified using the Qiaquick gel extraction kit and cloned using the Zero Blunt Topo PCR Cloning Kit (Invitrogen). Individual colonies were picked for automated DNA preparation and sequencing using M13F (5′ GTAAAACGACGGCCAGT) and M13R (5′ CACACAGGAAACAGCTATGACCAT) primers at the University of Chicago DNA Sequencing Core Facility. Additional sequencing primers were JCintronRseq (5′ CAGTTCTGAATAGGGTATGAGAGAGCC), which hybridizes to nt 2,120–2,146 (numbered according to GenBank/EMBL/DDBJ accession no. X53774), between the end of J4 and the 3′ primer used for PCR (complementary to nt 2,429–2,458, located 3′ of the IgH intron enhancer; reference 66); and mIgkJCseqF (5′ GTGGGATAGCAGAGTTGAGTGAGC) and mIgkJCR, which were used to sequence the furthest 3′ portion of the RS transgene.
For all mutation analyses shown, all duplicate mutations were included in the totals to purposefully overestimate mutations found in the mutation-poor 5′ and 3′ regions of the genes examined. Nevertheless, we found only a few duplicated clones, and the data therefore are not significantly altered by retaining them (not depicted).
Simulations were performed using the R statistical software (R Foundation: http://www.r-project.org). To test the null hypothesis of a flat mutation rate, simulated datasets were created by generating random positions along the sequence. P-values were calculated as twice the fraction of simulations that had fewer mutations in the region investigated than in the collected data (e.g., a p-value of 0.0004 indicates that 2 simulated datasets out of 10,000 had a lower number of mutations in the region).
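The simulation described above can be sketched as follows. This is an illustrative Python version, not the R code used for the analysis. It assumes that each simulated mutation falls at a uniformly random position along the sequence, that positions may repeat, and that the p-value is capped at 1. The function and parameter names are hypothetical.

```python
import random


def flat_rate_p_value(seq_len, region, n_mutations, observed_in_region,
                      n_sim=10_000, seed=0):
    """Monte Carlo p-value under the null hypothesis of a flat mutation rate.

    seq_len            -- length of the sequenced segment (nt)
    region             -- (start, end) of the region tested, 0-based, end exclusive
    n_mutations        -- total mutations observed across the whole segment
    observed_in_region -- mutations observed inside `region`
    Returns twice the fraction of simulated datasets that placed fewer
    mutations in the region than were observed, capped at 1.0 (assumption).
    """
    rng = random.Random(seed)
    start, end = region
    fewer = 0
    for _ in range(n_sim):
        # Scatter the same number of mutations uniformly along the sequence.
        hits = sum(start <= rng.randrange(seq_len) < end
                   for _ in range(n_mutations))
        if hits < observed_in_region:
            fewer += 1
    return min(1.0, 2 * fewer / n_sim)
```

For example, `flat_rate_p_value(1000, (0, 100), 200, 2)` asks how often 200 uniformly placed mutations would leave fewer than 2 in the first 100 nt of a 1,000-nt segment. A very small value argues against a flat mutation rate across the region.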
We thank B. Eisfelder, R. Duggan, and J. Marvin for skillful flow cytometric cell sorting, and W. Buikema and C. Hall for DNA sequencing. We thank T. Martin, S. Ratnam, P. Engler, and K. Padjen for critical reading of the manuscript.
The work was supported by National Institutes of Health grants AI47380 and AI053130. S. Longerich was supported by an AAUW International Ph.D. fellowship.
The authors have no conflicting financial interests.
Abbreviations used: AID, activation-induced cytosine deaminase; CDR, complementarity-determining region; Msh, mismatch repair; pol, polymerase; SHM, somatic hypermutation; Ung, uracil DNA glycosylase.