Somatic hypermutation (SHM) in immunoglobulin genes is required for high affinity antibody–antigen binding. Cultured cell systems, mouse model systems, and human genetic deficiencies have been the key players in identifying likely SHM pathways, whereas “pure” biochemical approaches have been far less prominent. That appears about to change. Here we comment on how, when, and why biochemistry is likely to emerge from the shadows and into the spotlight to elucidate how somatic mutations of antibody variable (V) regions are generated.
Wilson et al. (1) report in this issue that the error-prone DNA polymerase (pol) η is stimulated by the heterodimeric MSH2–MSH6 mismatch repair recognition complex. Humans deficient in pol η (2) and mice deficient in MSH2–MSH6 (3–6) show a significant reduction in hypermutation at A:T sites. This study, which shows a functional interaction between error-prone pol η and MSH2–MSH6, suggests that this low fidelity polymerase might be working hand-in-hand with the mismatch repair proteins to generate A:T mutations. At the very least, these data provide an impetus for more extensive biochemical analysis.
Properties and pathways of SHM
SHM is characterized by specific types of base substitutions in the variable portion of immunoglobulin genes, which occur at a million-fold higher frequency than normal somatic mutations (10⁻³/bp versus 10⁻⁹/bp). V region mutations at G:C and A:T sites occur on both strands of DNA in vivo. Mutations at G:C sites tend to be favored in WRC hot spot motifs (W = A or T, R = purine). These mutations are attributable to the C → U deamination specificity of activation-induced cytidine deaminase (AID; reference 7) acting either on single-stranded DNA (ssDNA) or on the nontranscribed strand within a moving transcription bubble (7–9). Mutations at A:T sites show a preference for WA hot spot motifs reminiscent of the in vitro behavior of pol η. Pol η is highly error prone and tends to generate AT → GC mutations preferentially in WA motifs in vitro (10). The expression of AID, which occurs during a short time span in B cells, is essential for SHM, as is active transcription of the V gene (11, 12). In fact, biochemical studies with semipurified AID provided the evidence that ssDNA is the substrate for AID, thus explaining the perplexing need for transcription (7–9, 13).
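The hot spot motifs described above can be located in a sequence with a few lines of code. The following is a minimal Python sketch, not a method taken from any of the cited studies: the function names `hotspot_positions` and `reverse_complement` and the example sequence are hypothetical, and the sketch assumes that the targeted base is the C of WRC and the A of WA.

```python
import re

# Motif definitions from the text: W = A or T, R = purine (A or G).
# Lookaheads are used so that overlapping motifs are all reported.
WRC = re.compile(r"(?=[AT][AG]C)")
WA = re.compile(r"(?=[AT]A)")


def hotspot_positions(seq):
    """Return 0-based positions of the targeted base of each motif on this strand.

    Assumption (illustrative): the targeted base is the C of WRC
    and the A of WA.
    """
    seq = seq.upper()
    return {
        "WRC": [m.start() + 2 for m in WRC.finditer(seq)],  # C is the 3rd base
        "WA": [m.start() + 1 for m in WA.finditer(seq)],    # A is the 2nd base
    }


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


if __name__ == "__main__":
    top = "AGCTACGTA"  # made-up sequence, for illustration only
    print(hotspot_positions(top))
    # Because mutations occur on both strands, scan the other strand as well.
    print(hotspot_positions(reverse_complement(top)))
```

Scanning both the sequence and its reverse complement reflects the observation that V region mutations occur on both DNA strands; positions reported for the reverse complement are relative to that strand.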
A synopsis of potential SHM pathways responsible for the V region mutations is shown in Fig. 1. C → T mutations might be initiated by AID tracking along a moving transcription bubble and generating U residues on the nontranscribed DNA strand or, less often, on the transcribed DNA strand (7–9). Faithful copying of U by a high fidelity polymerase, such as pol δ or pol ε, would generate C to T transitions, whereas aberrant copying of U by a low fidelity polymerase, such as pol η, could cause both transitions (the replacement of a purine with a different purine and pyrimidine with a different pyrimidine) and transversions (the replacement of a purine with a pyrimidine or vice versa) (Fig. 1, left). However, this simplified picture cannot account for the equal numbers of mutations at C on the transcribed strand, nor can it explain mutations at A:T sites.
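The distinction between transitions and transversions drawn above is easy to state in code. The sketch below is illustrative only; the function name `classify_substitution` is hypothetical, and the function simply applies the definitions given in the text to a pair of DNA bases.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref, alt):
    """Classify a single-base substitution as a transition or a transversion.

    Transition: purine -> purine or pyrimidine -> pyrimidine.
    Transversion: purine -> pyrimidine or pyrimidine -> purine.
    """
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    # Faithful copying of U yields C -> T, which is a transition.
    print(classify_substitution("C", "T"))
    # C -> G or C -> A would be transversions.
    print(classify_substitution("C", "G"))
```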
Further SHM diversity could arise if the U residue created by AID is excised rather than copied. The removal of U residues by the enzyme uracil N-glycosylase (UNG) results in an abasic site that can then be removed by base excision repair. Alternatively, a mismatched U:G base pair can be repaired by mismatch repair. In this process, single base mismatches are recognized by an MSH2–MSH6 dimer which then recruits other proteins to excise the mismatch and replace the excised DNA. In either case, the repair patches—short in the case of base excision repair and much longer in the case of mismatch repair—would expose the transcribed strand to the action of pol η to generate mutations at A:T sites (Fig. 1, right). AID might also attack C residues on the transcribed strand.
Biochemical issues and challenges
The “holy grail” from a biochemical perspective would be to reconstitute SHM entirely in a cell-free system using purified proteins. This possibility might not be all that remote given the availability of eukaryotic base excision repair, mismatch repair, and RNA pol II transcription systems in vitro. However, defining where pol η and AID might fit into any one of these systems is itself a difficult problem, and full reconstitution of SHM would appear to require integration of all of the components of the base excision repair, mismatch repair, and transcription systems, a daunting task. Reconstitution is further complicated by the finding that deletions and mutations of the COOH-terminal end of AID result in the loss of class switch recombination (which requires AID-induced mutations in switch regions upstream from the constant regions, double-stranded DNA breaks, and recombination) but preserve V region SHM (14, 15), whereas mutations at the NH2-terminal end of AID affect SHM but not class switch recombination (16). These findings suggest that proteins such as RNA pol II and replication protein A, which interact with AID (17, 18), and possibly other unidentified proteins, are required for the selective targeting of AID to certain parts of the immunoglobulin gene. A sensible point of departure, therefore, would be to take a step back and investigate individual protein–protein and protein–nucleic acid interactions.
MSH2–MSH6 stimulates pol η activity
The paper by Wilson et al. (1) identifies a potentially relevant functional interaction between MSH2–MSH6 and pol η, proteins that have been shown to be involved in SHM diversification (Fig. 1, right). The authors show that endogenous MSH2 binds to pol η in cell extracts, and that MSH2–MSH6 binds U:G mismatched base pairs, as has also been noted previously (19). The principal observation, however, is that the rate of pol η–catalyzed DNA synthesis is increased 2.4-fold in the presence of MSH2–MSH6. This increase reflects an enhancement in the catalytic efficiency (Vmax/Km) of pol η with no discernible change in polymerase processivity. Control experiments showed that DNA synthesis was unaffected by MSH2–MSH3, which recognizes larger mismatches, and that MSH2–MSH6 did not stimulate pol ι, an error-prone polymerase of unknown function. Although the 2.4-fold increase in DNA synthesis is admittedly modest, it may nonetheless imply a coupling of mismatch repair and DNA synthetic functions, a coupling that could not be established solely by detecting protein–protein binding interactions using analyses such as chromatin immunoprecipitation.
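To see why a change in Vmax/Km shows up as a change in the rate of synthesis, it helps to write out the standard Michaelis–Menten relation. This is a textbook sketch rather than a reanalysis of the data in (1): it assumes simple Michaelis–Menten kinetics and a substrate concentration well below Km, and the symbols v and [S] and the subscripts + and − (with and without MSH2–MSH6) are introduced here for illustration only.

```latex
v = \frac{V_{\max}\,[S]}{K_m + [S]}
  \;\approx\; \frac{V_{\max}}{K_m}\,[S]
  \quad \text{for } [S] \ll K_m,
\qquad
\frac{v_{+}}{v_{-}} \;\approx\;
  \frac{\left(V_{\max}/K_m\right)_{+}}{\left(V_{\max}/K_m\right)_{-}}
```

Under these assumptions, a 2.4-fold increase in Vmax/Km would appear as a roughly 2.4-fold increase in the rate of synthesis at a given substrate concentration.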
A possible coupling of error-prone DNA synthesis with mismatch repair should be viewed in the broader context of determining the temporal fate of a U:G base mispair. MSH2–MSH6 can bind to U:G (19) (Fig. 1, right), but competition for access to the U residue might also include high and low fidelity DNA polymerases vying with proteins involved in base excision repair. Each process, depending on when it occurs, could leave its own mutagenic signature. Because heavy enzymatic traffic is likely to converge at the U residue once AID has acted, the hypermutagenic outcome depends on the enzyme queue. Do replication and repair enzymes encounter U residues in a temporally ordered process or in a random manner governed by mass action?
A recent paper by Rada et al. (20) addresses the temporal access to U by analyzing SHM and class switch recombination in mice deficient for both mismatch repair and base excision repair machinery. The data showed that A:T hypermutation in immunoglobulin V regions was eliminated entirely in mice lacking both MSH2 and UNG. It had been shown previously that the elimination of either mismatch repair (MSH2−/−) or base excision repair (UNG−/−) caused a reduction in mutations at A:T sites, with the loss of mismatch repair having a far greater effect (4, 21). It was concluded from that study that mismatch repair is primarily responsible for SHM at A:T sites with base excision repair used as a backup pathway (20). This suggests that AID-generated U residues do lead to a process in which MSH2–MSH6 competes with UNG for access to the U residue (Fig. 1, right).
Studies using the Msh2−/−Ung−/− double knock-out mice nicely defined the biochemical realities in vivo by, for example, eliminating a role for uracil DNA glycosylases other than UNG and focusing our attention on the origins of A:T mutation (20). However, many questions cannot be addressed in vivo, in short-term culture, or in cell lines because molecules such as proliferating cell nuclear antigen and RNA pol II are critical for all cell processes. For example, a possible queuing mechanism directing protein access to U:G might involve the proliferating cell nuclear antigen. Along with its role in increasing the processivity of pol δ during DNA replication, proliferating cell nuclear antigen is known to bind pol η (22) and is required for mismatch repair (23). It is possible that proliferating cell nuclear antigen facilitates binding of pol η with MSH2–MSH6 proximal to a U:G mismatch, thus coordinating the initiation of mismatch repair with subsequent error-prone gap-filling synthesis. Now that we have learned that MSH2–MSH6 appears to stimulate pol η activity directly (1), we can begin to imagine how pol η might be selectively recruited to generate mutations at A:T sites as part of the mismatch repair machinery.
Help from biochemistry
A biochemical approach to reconstitute SHM using purified proteins should help us to advance beyond speculating on the molecular mechanisms. It might also make it possible to address the differences that govern the targeting of each of the repair mechanisms to V and switch regions, since the different DNA substrates that characterize these regions could be used.
An attempt to model G:C and A:T hypermutation raises different questions and engenders different challenges. If we assume that both mutational processes originate from the action of AID on ssDNA, perhaps by tracking along the nontranscribed strand of a transcription bubble, as suggested from in vitro experiments using a bacterial T7 transcription assay (7–9), then how do mutations originate at the same frequency at C sites on the transcribed strand in vivo (24)? The possibility that AID attacks a transcribed strand gap during mismatch repair or base excision repair seems unlikely because Msh2−/−Ung−/− mutant and normal mice show a similar distribution of G:C-targeted mutations, except for the absence of transversions in mutant mice, which is presumably caused by the absence of UNG-generated abasic lesions (20). Instead, the mammalian RNA pol II transcription apparatus, in contrast to T7, may provide greater access to the transcribed strand.
Thus, a mechanistic understanding of G:C and A:T hypermutation would benefit from studying AID in a mammalian transcription system for G:C mutations, and by studying pol η in conjunction with MSH2–MSH6-based mismatch repair for A:T mutations. The rationale for splitting the two classes of mutations into two distinct systems is based on the mouse data showing that Msh2−/−Ung−/− mice appear to mutate G:C but not A:T sites (20). However, if AID acting on ssDNA during transcription is responsible for triggering SHM (and class switch recombination), then what is responsible for confining these mutations to immunoglobulin V regions? Perhaps specialized transcription factors or chromatin accessibility restrict the targeting of AID to IgV genes.
A biochemical approach may also help in resolving several extant controversies. One involves the possibility that AID may be acting on RNA in vivo, not on DNA (25). A second posits that antibody gene diversification requires the presence of UNG during class switch recombination, not for its ability to excise U, but rather to stabilize a protein complex needed for the mutational process (26). Arguments for (27) and against (28) this hypothesis have recently appeared in the literature. A third controversy involves the suggestion that endonuclease-catalyzed blunt end double-stranded DNA breaks, not AID, are responsible for initiating SHM (29, 30). It would seem that the burden of proof in each of these examples rests on showing that AID can use RNA as a substrate, that catalytically inactive UNG is an essential subunit of a complex whose activity is currently unknown, and that an endonuclease can be identified having a specificity commensurate with the properties of class switching and mutagenesis.
In addressing the targeting and trafficking of mutator proteins to immunoglobulin V but not C regions, and in investigating why one repair pathway or error-prone polymerase is chosen in preference to another, biochemistry should play a decisive role in deciphering the mechanisms of antibody diversification. To paraphrase Arthur Kornberg (31), we cannot be fully confident in our fundamental understanding of somatic hypermutation until the process has been reconstituted successfully using purified proteins, with the ultimate goal being to “capture it alive.”