The process of somatic hypermutation (SHM) of immunoglobulin (Ig) genes requires activation-induced cytidine deaminase (AID). Although mistargeting of AID is detrimental to genome integrity, the mechanism and the cis-elements responsible for targeting of AID are largely unknown. We show that three CAGGTG cis-elements in the context of Ig enhancers are sufficient to target SHM to a nearby transcribed gene. The CAGGTG motif binds E47 in nuclear extracts of the mutating cells. Replacing CAGGTG with AAGGTG in the construct without any other E47 binding site eliminates SHM. The CA versus AA effect requires AID. CAGGTG does not enhance transcription, chromatin acetylation, or overall target gene activity. The other cis-elements of Ig enhancers alone cannot attract the SHM machinery. Collectively with other recent findings, we postulate that AID targets all genes expressed in mutating B cells that are associated with CAGGTG motifs in the appropriate context. Ig genes are the most highly mutated genes, presumably because of multiple CAGGTG motifs within the Ig genes, high transcription activity, and the presence of other cooperating elements in Ig enhancers.
The immune system has evolved effective mechanisms using a limited number of Ig genes to defend against a plethora of pathogens. To produce a diverse repertoire of antibodies, B cells undergo a series of genetic alterations to create a much greater variety of antibodies than specified by the number of Ig genes. Somatic hypermutation (SHM) is one of the diversification processes in activated B cells. SHM requires the activity of a mutation factor, activation-induced cytidine deaminase (AID; Muramatsu et al., 2000), and transcription of the target gene (Storb et al., 2001; Barreto et al., 2005). In many studies, the mutation frequencies of Ig genes were found to correlate positively with the level of transcription (Bachl et al., 2001). The frequency and the range of mutations depended on the distance of the targeted sequences from the promoter (Lebecque and Gearhart, 1990; Motoyama et al., 1994; Rada et al., 1994; Rogerson, 1994; Wu and Claflin, 1998; Rada and Milstein, 2001), and initiation of transcription inside an Ig gene induced SHM at the distal constant region that is normally unmutated (Peters and Storb, 1996). SHM targets Ig genes at high rates and several other non-Ig genes, such as BCL6, at intermediate rates (Pasqualucci et al., 1998; Shen et al., 1998; Müschen et al., 2000; Gordon et al., 2003), and may target all other transcribed genes at very low frequencies (Liu et al., 2008).
The specific association of AID with the mutable target genes is likely to be regulated by cis-elements within or near the target genes. The Ig enhancers are required for SHM in endogenous Ig genes. Deletion of either intronic or 3′ enhancer regions in Ig transgenes eliminated or drastically decreased SHM (Betz et al., 1994). The Ig enhancers comprise multiple cis-elements, all of which also occur in other enhancers in one combination or another. A previous study in our laboratory showed that the presence of two CAGGTG motifs in addition to the four CAGGTG and eight CAGCTG motifs already present in an Igκ transgene greatly enhanced the frequency of SHM without increasing transcription (Michael et al., 2003). The CAGGTG motif is present in Ig heavy and light chain enhancers that are shown to be required for Ig expression and SHM, as well as in all of the frequent non-Ig targets of SHM, such as BCL6 (Pasqualucci et al., 1998; Shen et al., 1998). Moreover, T cell lymphomas from mice with overexpressed transgenic AID indicated that all of the mutated genes found in these tumors shared the CAGGTG motif in the enhancer and/or promoter (Kotani et al., 2005).
These findings showed that CAGGTG was an enhancer of SHM but did not test whether CAGGTG is sufficient/required to attract AID to a nearby gene. To determine whether the CAGGTG motif is sufficient/required for SHM targeting, we generated mutable GFP transgenes that contained either three CAGGTG motifs or no CAGGTG motif in the entire construct. Our findings show that SHM occurs with otherwise identical transgenes only when they contain CAGGTG motifs.
Transgenic DT40 cell lines
To address whether the presence of the CAGGTG motif is required for targeting of SHM, we generated a mutable enhanced GFP (eGFP) transgene containing three CAGGTG motifs (C-GFP) and a control transgene containing AAGGTG (A-GFP) but completely lacking the CAGGTG motif. The transgenes have GFP and neomycin resistance genes driven by individual CMV promoters (Fig. 1).
Because SHM in an Ig transgene context requires both an intronic enhancer/matrix attachment region and a 3′ enhancer (Betz et al., 1994), we used enhancers from a mouse Igκ gene. These enhancers contain one CAGGTG motif in the intronic enhancer and two CACCTG motifs in the 3′ enhancer, which are the reverse complements of CAGGTG. In the A-GFP transgene, these CAGGTG motifs were changed to AAGGTG or AACCTG, motifs that showed no enhancement of SHM frequency when present in an Ig transgene (Michael et al., 2003). Because one of the binding factors known to bind the CAGGTG motif is an E47 homodimer, a predominant form of E protein in the B cell lineage, we also considered much weaker binding motifs of the E47 homodimer, such as CAGCTG and CACGTG (Hsu et al., 1994). To eliminate the possibility of E47 homodimer binding to CAGCTG, which is present in the neomycin resistance gene, we changed this CAGCTG to CCGCTG. This change does not alter the amino acid coding, and thus, neomycin resistance was preserved. This makes the C-GFP construct transgene have only three CAGGTG motifs in the entire construct and no other CAGGTG, CACCTG, CAGCTG, or CACGTG motifs, whereas the A-GFP transgene is completely devoid of these motifs.
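The motif bookkeeping described above (CAGGTG and its reverse complement CACCTG, plus the weaker CAGCTG and CACGTG sites) can be checked mechanically. The following is a minimal sketch, not the procedure used in the study: the function names and the toy sequence are hypothetical, and the real construct sequences are not reproduced here.

```python
# Illustrative sketch: scan both strands of a sequence for E-box hexamers.
# Function names and the toy sequence below are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Motifs named in the text as strong or weak E47 homodimer binding sites.
E47_MOTIFS = ("CAGGTG", "CAGCTG", "CACGTG")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif_sites(seq: str, motifs=E47_MOTIFS):
    """Return sorted (position, strand, motif) hits, 0-based on the top strand."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        # Palindromic motifs (CAGCTG, CACGTG) equal their reverse complement,
        # so setdefault keeps them from being reported twice.
        patterns = {motif: "+"}
        patterns.setdefault(reverse_complement(motif), "-")
        for pattern, strand in patterns.items():
            start = seq.find(pattern)
            while start != -1:
                hits.append((start, strand, motif))
                start = seq.find(pattern, start + 1)
    return sorted(hits)


# Toy "C-type" sequence with one CAGGTG, one CACCTG, and one CAGCTG.
toy_c = "TTCAGGTGAACACCTGTTCAGCTGAA"
# Toy "A-type" sequence with the substitutions described in the text.
toy_a = "TTAAGGTGAAAACCTGTTCCGCTGAA"

print(find_motif_sites(toy_c))
print(find_motif_sites(toy_a))
```

A scan of this kind returning an empty list for all four motifs corresponds to the stated property of the A-GFP transgene, whereas the C-GFP transgene would be expected to show exactly three CAGGTG/CACCTG hits.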
To monitor SHM targeting, we inserted a premature stop codon in the GFP gene and a 300-bp spacer from the mutable region of the human BCL6 gene to position this stop codon at the peak of the mutable region observed in Ig genes. The spacer from BCL6 does not contain any CAGGTG sites. We chose it because it was known not to interfere with continuous SHM in its vicinity. This seemed important because we had shown that a transcriptional promoter placed in front of the C region can initiate a new wave of mutations (Peters and Storb, 1996), and we suspected that anything inhibiting transcription elongation may affect the progress of SHM further 3′. For cells transfected with these transgenes to express functional GFP, they must mutate the stop codon. Because this stop codon is also part of an AID hotspot (GYW), it should have a reasonable chance of being mutated by AID.
Both the C-GFP and A-GFP constructs were transfected into an actively mutating chicken B cell line, DT40 CL18 (Sale et al., 2001), and stably transfected clones were selected. To verify SHM in this study, GFP fluorescence must be linked directly to a mutation of the premature stop codon by DNA sequencing. In cells where multiple copies of the transgene are integrated, mutation of a premature stop codon in one of the transgene copies may escape detection by PCR cloning and sequencing analyses. Therefore, for accurate assessment of SHM activity, independent C-GFP and A-GFP clones with only a single copy of the respective transgene were selected by Southern blot analyses (unpublished data).
Because SHM depends on transcription levels as well as transcription initiation (Peters and Storb, 1996), we further selected clones with equivalent levels of transgene transcription. To assess the levels of stable transcripts as a measure of transcription of the transgenes, we measured GFP mRNA levels by quantitative real-time PCR. Five C-GFP clones and five A-GFP clones were selected that expressed very similar levels of GFP mRNA (P > 0.1; Fig. 2).
The transgenes with C-GFP, but not those with A-GFP, are mutable in all transfectants
From each transgenic clone, 12 subclones were derived by single-cell sorting of GFP−IgM+ cells. IgM+ cells were chosen to eliminate potential differences in growth efficiency between IgM+ and IgM− cell cultures. To determine the effect of the CAGGTG motif on SHM targeting, C-GFP and A-GFP subclones were cultured individually and GFP fluorescence was measured by flow cytometry weekly for 6 wk. When assayed in this manner, all C-GFP transgenic clones showed GFP+ subclones and, overall, 54 out of 60 subclones showed GFP+ cells. On the other hand, only two of the AAGGTG clones showed any significant numbers of GFP+ cells and, overall, only 18 out of 60 AAGGTG subclones showed any GFP+ cells during the 6-wk-long experiment (Fig. 3). The difference in the occurrence of GFP+ cells between C-GFP subclones and A-GFP subclones was highly significant (P < 0.00001). The mean frequencies of GFP+ cells in 10⁶ cells were also higher in C-GFP clones, ranging between 20 and 60, whereas A-GFP clones were consistently <10 GFP+ cells/10⁶ cells (Fig. 4 B).
There were many C-GFP subclones with >100 GFP+ cells/10⁶ cells at each week (22 C-GFP subclones; Fig. 3, green), and some had high numbers of GFP+ cells for 4–5 wk. In contrast, A-GFP subclones showed only a few subclones with high numbers of GFP+ cells (four A-GFP subclones). To verify expression of functional GFP by C-GFP and A-GFP clones, subclones with a high number of GFP+ cells were sorted and cultured for 1 wk. Indeed, FACS analyses showed that most of the GFP+ cells in C-GFP clones could be cultured (Fig. 4 A), whereas only the GFP+ cells of one A-GFP subclone (A2) survived (the others may have been autofluorescent dying cells).
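The subclone counts above (54 of 60 C-GFP versus 18 of 60 A-GFP subclones with GFP+ cells) form a 2 × 2 table that can be tested directly. The text does not state which test produced the reported P value; the sketch below assumes a two-sided Fisher's exact test as one reasonable choice and treats subclones as independent observations, which is a simplification because subclones are nested within clones.

```python
# Illustrative sketch: two-sided Fisher's exact test using only the standard library.
# The choice of test is an assumption; the text does not name the test used.
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x positives in row 1, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Rows: C-GFP, A-GFP; columns: subclones with GFP+ cells, subclones without.
p_value = fisher_exact_two_sided(54, 6, 18, 42)
print(f"P = {p_value:.3g}")  # compare against the reported P < 0.00001
```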
Because the levels of transcripts of the C-GFP and A-GFP transgenes were equivalent in the 10 clones (Fig. 2), this eliminates the possibility that the difference in the occurrence of SHM was caused by varied transcription levels of the target genes. Furthermore, the AID mRNA levels were also equivalent among all 10 clones (P > 0.1; Fig. 2), indicating that the enhanced mutation frequencies of the C-GFP clones are independent of variability in AID expression.
The GFP premature stop codon is mutated
To analyze GFP sequences for mutations, sorted GFP+ cells from C-GFP and A-GFP subclones were cultured for 1 wk before DNA cloning and sequencing (Fig. 4 A). Although there were a few A-GFP clones with GFP+ cells, the A2-4 subclone (Fig. 3) was the only A-GFP subclone from which we were able to sort enough viable GFP+ cells to culture and expand for genomic DNA isolation. Sequencing analyses of the GFP region of C-GFP clones showed that 86% of the sequences contained a mutation at the premature stop codon (121 out of 141 sequences; Fig. 5). These were mostly G-to-C transversions at the third nucleotide of the stop codon. The SHM frequency of the GFP-expressing C-GFP clones was 1.38 × 10⁻³ mutations/bp. This is comparable to the SHM frequency of 3.05 × 10⁻³ in DT40 cells selected for loss of membrane Ig because of mutations of IgL genes (Arakawa et al., 2004). Because the latter cells overexpressed AID, the mutation frequency would be expected to be somewhat higher than in the cells we used that produce limiting amounts of AID (Arakawa et al., 2004). In accord with the SHM pattern of the IgL gene in DT40 cells, transversions were frequent.
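The arithmetic behind these figures can be made explicit. In the sketch below, the fraction of sequences mutated at the stop codon and the ratio of the two reported frequencies use numbers from the text; the function name and the example inputs to the per-base-pair calculation are hypothetical, because the total number of mutations and the length of the sequenced region are not given here.

```python
# Illustrative sketch of the mutation-frequency arithmetic.
def mutation_frequency(n_mutations: int, n_sequences: int, region_bp: int) -> float:
    """Mutations per base pair = mutations / (sequences x sequenced length)."""
    return n_mutations / (n_sequences * region_bp)


# From the text: 121 of 141 sequences carried a stop-codon mutation.
stop_codon_fraction = 121 / 141
print(f"{stop_codon_fraction:.0%}")  # 86%

# From the text: reported frequencies in IgL (Arakawa et al., 2004) and in C-GFP.
fold_difference = 3.05e-3 / 1.38e-3
print(f"{fold_difference:.1f}-fold")  # 2.2-fold

# Hypothetical inputs, only to show the units: 10 mutations in 20 reads of 500 bp.
print(mutation_frequency(10, 20, 500))  # 0.001
```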
The mutated stop codon by G-to-C transversion, TAC, is yet another hotspot for AID. Because there were several sorted GFP+ cells containing mutations in GFP but not in the stop codon, a fraction of the sorted cells apparently reverted TAC back to the TAG stop codon while the green cells were cultured (Fig. 4 A and Fig. 5). In addition, cells lost the green color, presumably because of internal GFP mutations. There are AID hotspots in the chromophore itself (Thr-Tyr-Gly) as well as in the rest of the GFP protein. The entire 27-kD polypeptide structure is essential for the development and maintenance of fluorescence (Olympus FluoView Resource Center at http://www.olympusfluoview.com/). Thus, many sites in GFP must remain unchanged for green color but are susceptible to inactivating changes by SHM. We also sequenced the GFP transgene of unsorted cells after 6 wk of culture and found a few mutations, but none at the stop codon.
The GFP mutations depend on AID
To determine whether C-GFP transgenes were mutated by an AID-dependent mechanism, C-GFP and A-GFP constructs were transfected into AID−/− Cre DT40 or the parental clone, AID+ Cre DT40, and six independent clones were selected for each of the four combinations (Fig. 6). There were no significant differences in GFP mRNA levels between AID−/− and AID+ cells (Fig. S1). The AID+ but not the AID−/− cells expressed AID mRNA (Fig. S1). The flow cytometry assay of GFP showed no GFP+ cells in the AID−/− background in both C-GFP and A-GFP clones. However, in the AID+ cells, mutations were significant for C-GFP clones. Out of the six A-GFP clones, only two showed sporadic, very low numbers of GFP+ cells. We assume that, as in the experiments shown in Fig. 3, in the A-GFP clones of Fig. 6, the transgene was inserted near CAGGTG sites (see CAGGTG motifs near… and Fig. 10). The data indicate that the mutability differences of C-GFP and A-GFP transgenes are dependent on AID-mediated SHM (Fig. 6).
No correlation between mutation frequency and DNase I hypersensitive sites
Between rearranged Ig alleles, which are mutable, and unrearranged Ig alleles, there are dramatic differences in DNase I hypersensitive sites (Storb et al., 1986; Thompson and Neiman, 1987). To investigate factors leading to differences in C-GFP and A-GFP mutation frequencies, the overall accessibility of DNA between C-GFP and A-GFP clones within the transgene-specific region was examined by treating nuclei with DNase I (Fig. 7). The digestibility of the two C-GFP clones appears greater than that of the A-GFP clones (Fig. 7 A). However, the ethidium bromide–stained gel showed that there was less DNA in the C-GFP clones, and overall, the DNA was more digested with lower amounts of DNA (Fig. 7 B). Thus, the digestibility is equal. This is also supported by reprobing the Southern blot with an ovalbumin probe: the ovalbumin gene is also more digested in the C-GFP clones (Fig. 7 C). The ovalbumin gene is not expressed in DT40; therefore, one does not see DNase I hypersensitive sites but an overall increasing digestion with increasing DNase I. Among C-GFP clones, there was no variability in hypersensitive sites despite the differences in integration sites (Fig. 7 D). This was true among A-GFP clones as well. However, contrary to our expectation, DNase I hypersensitive sites for C-GFP and A-GFP clones also showed identical patterns (Fig. 7 D). Thus, it appears that the C-GFP and A-GFP transgenes exist in equally accessible chromatin.
Protein binding to CAGGTG
To study the possibility that a protein bound to CAGGTG may mediate an increase in SHM targeting, each of the three CAGGTG and AAGGTG motifs in the transgenes was assayed for protein binding by EMSA. Nuclear extracts of parental DT40 (CL18) cells showed a specific protein binding band with all probes containing the CAGGTG motif that was not seen with the AAGGTG motif (Fig. 8 A). This protein binding specific to the CAGGTG motif was not affected by the presence of an unlabeled AAGGTG competitor probe (50×), but it was diminished by an unlabeled CAGGTG competitor probe (unpublished data). Because one of the factors known to bind CAGGTG is the E47 homodimer, we included anti-E47 or IgG control antibodies in the binding reaction. Clearly, inhibition of the protein–DNA complexes was observed with anti-E47 antibody for all three CAGGTG probes (Fig. 8 B). We also used nuclear extracts of DT40 cells expressing a human E47 transgene and observed inhibition of the protein–DNA complexes with anti–human E47 antibody as well (unpublished data). These results suggest that the predominant protein binding specifically to CAGGTG motifs, but not to AAGGTG motifs, is likely to be E47. If the binding of E47 to the CAGGTG motif leads to targeting of SHM to the E47-bound gene, it may be involved in a direct recruitment of AID and SHM factors. However, in our EMSA conditions, we did not observe differences in protein–DNA binding patterns between AID+ and AID null nuclear extracts (Fig. S2).
Histone acetylation of the GFP transgene
One of the known functions of E47 is the recruitment of the SAGA histone acetyltransferase complex (Massari et al., 1999). To determine whether there were differences in the histones associated with C-GFP and A-GFP, chromatin immunoprecipitation (ChIP) assays were performed using antibodies against acetylated H3 and H4 histones. ChIP results from both histones H3 and H4 showed no significant differences between C-GFP and A-GFP clones (Fig. 9; primers for ChIP PCR are shown in Fig. 1). This result suggests that the preferential targeting of SHM to the C-GFP transgene is not caused by hyperacetylation of histones H3 and H4 when E47 can interact with the transgene.
Collectively with the DNase I hypersensitivity data, these findings suggest that the difference in SHM targeting triggered by the presence or absence of the CAGGTG cis-element is not caused by changes in accessibility of the target gene; rather, it may be caused by a specific recruitment of AID and other SHM machinery mediated by E47 protein bound to the CAGGTG cis-element.
CAGGTG motifs near the integration site of mutated A-GFP clones
Sequencing of sorted GFP+ cells from A2 subclones showed mutations at the premature stop codon. Because the hexamer motif, CAGGTG, is in theory present once every ∼4 kb of the genome, we sought to determine whether a CAGGTG motif is present in the neighborhood of the A-GFP transgene integration site in the mutated A2 clone. Cloning of a fragment containing the integration site showed that the A2 clone has two CACCTG motifs ∼1 kb downstream from the 3′ end of the transgene (Fig. 10). The presence of these motifs might have induced the targeting of SHM to the A-GFP transgene in this particular clone. Because the chicken genome sequence is incomplete and a contig for this particular integration site is unavailable, we were only able to match this integration site with a cDNA library from chicken. Therefore, the chromosome and nature of this genomic region are uncertain.
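The ∼4 kb figure follows from simple counting: a specific hexamer is one of 4⁶ possible six-base sequences. The sketch below works through this under the idealized assumptions of equal base frequencies and independent positions, which real genomes do not satisfy exactly; the function name is hypothetical.

```python
# Illustrative sketch: expected spacing of a fixed-length motif in random sequence.
# Assumes equal base frequencies and independent positions (an idealization).
def expected_spacing_bp(motif_length: int, alphabet_size: int = 4) -> int:
    """Expected number of positions per occurrence of one specific motif."""
    return alphabet_size ** motif_length


one_strand = expected_spacing_bp(6)
print(one_strand)  # 4096, i.e. roughly once every 4 kb on a given strand

# CAGGTG is not palindromic, so counting CACCTG on the same strand as well
# (i.e. the motif in either orientation) halves the expected spacing.
either_orientation = one_strand // 2
print(either_orientation)  # 2048
```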
To determine whether unmutated A-GFP clones lack the CAGGTG motif nearby, we attempted to sequence integration sites from these clones. Surprisingly, the A9 clone had the same integration site as the A2 clone. Although we did not observe a significant number of GFP+ cells in the A9 subclones, upon random sequencing of GFP in unsorted A9-1 cells grown for 6 wk, we did observe two transition mutations upstream of the GFP gene out of eight sequences (Fig. 10). This is consistent with the idea that the CAGGTG cis-element can positively influence SHM targeting of a nearby gene.
We have searched for other Ig enhancer motifs near the integration site of the A2 and A9 clones that may have enhanced the mutability of the GFP transgene. We found two potential PU.1 binding sequences; however, unlike PU.1 motifs found in proximity to CAGGTG in Ig enhancers, they were far from the CACCTG motifs in the A2 and A9 integration site. This finding and the enhanced mutability seen in transgenic mice with CAGGTG in the J region without other nearby motifs found in Ig enhancers (Michael et al., 2003) suggest that the presence of CAGGTG in a context other than an Ig enhancer may aid the targeting of SHM to a nearby gene (see Discussion).
The findings reported in this paper show that SHM occurs in transgenes containing CAGGTG but not those without CAGGTG. The only difference between the C-GFP and A-GFP clones is a C-to-A mutation in three CAGGTG motifs in the Ig enhancers. The C-GFP and A-GFP transgenic constructs are transcribed into identical GFP mRNA sequences and their only differences, CAGGTG versus AAGGTG, lie after the 3′ poly A site. Because the mRNA terminates 5′ of the CAGGTG/AAGGTG sites, it is extremely unlikely that the GFP mRNAs from these transgenes have different structures or stability. Therefore, finding the same transgene mRNA levels in the C-GFP and A-GFP transgenes should reflect equal mRNA transcription efficiency. Thus, the effect on mutability seems independent of transcription activity, chromatin acetylation, or overall target gene activity. It seems unlikely that, within the C-GFP pools, only the cells that mutated were, at that particular time, in more accessible chromatin than the rest of the cells; there is no precedent for a well-transcribed gene being in a heterochromatic state most of the time. The results show that the CAGGTG cis-element in the context of Ig enhancers is sufficient and likely required to attract AID to a target gene.
There are several studies showing a requirement for Ig enhancers for SHM induction (for review see Longerich et al., 2006). It was also clearly shown that CAGGTG, an element of Ig enhancers, enhances SHM (Michael et al., 2003). However, it was not known whether CAGGTG was sufficient/required for SHM. CAGGTG is a binding site for E2A proteins. In previous studies, the IgL enhancer of DT40 was not required for SHM (Yang et al., 2006); however, an element 3′ of the enhancer that contains five predicted E2A binding sites was required when the enhancer was deleted (Kothapalli et al., 2008; Blagodatski et al., 2009). This supported the idea that E2A proteins may be required for SHM. In the present study, the EMSA assays (Fig. 8) show that there are clear differences in protein binding to CAGGTG- versus AAGGTG-containing probes. Furthermore, the proteins binding to CAGGTG sequences were inhibited by addition of anti-E47 antibody, suggesting that the E47 protein may be the one that leads to increased targeting of SHM to the GFP transgene. Besides E47, there may be other known or unknown helix-loop-helix proteins that are involved or can substitute in experiments where low levels of SHM still existed in cells with E2A deletion (Schoetz et al., 2006; Kitao et al., 2008).
The finding that only transgenes with CAGGTG motifs attract AID shows that the other cis-elements of Ig enhancers alone cannot attract the SHM machinery. However, proteins that bind to other motifs in Ig enhancers may be cofactors in SHM. The avid targeting of Ig genes compared with non-Ig genes may be caused by the combined action of CAGGTG binding factors with factors that bind other Ig enhancer motifs. Mutation of PU.1 or NF-EM5 motifs in the 3′ κ enhancer was found to reduce the mutation rate in a κ transgene (Kodama et al., 2001). The available evidence on enhanced mutability by additional CAGGTG motifs in a J region of RS-transgenic mice without other adjacent Ig enhancer motifs (Michael et al., 2003; Fig. S3), as well as finding low mutagenesis in A2 and A9 clones with CACCTG elements present within 1 kb of the transgene outside of enhancers (Fig. 10), may indicate that the CAGGTG motif and other cis-elements need not be at the same site. The CAGGTG motif and other potentially required motifs for SHM targeting may act independently or coordinately over some distance. Furthermore, mutations of the Bcl6 gene and other non-Ig genes imply that the presence of CAGGTG in or near a gene that has the appropriate conditions, such as transcription activity and chromatin configuration similar to Ig genes, is able to attract the SHM machinery.
A known downstream effect of E2A binding is the recruitment of p300 and SAGA histone acetyltransferase complexes (Massari et al., 1999). Acetylated histones H3 and H4 were suggested to allow SHM induction in the V region of mutating BL2 cells (Woo et al., 2003) or the IgL locus in DT40 cells (Kitao et al., 2008). On the other hand, the ChIP data reported in this paper from DT40 cells and by others from mouse B cells (Odegard et al., 2005) suggested that changes of acetylated histones do not play a major role in inducing SHM. The highly transcribed Ig loci in mature B cells as well as developing B cells seem to be occupied by acetylated histones (McMurry and Krangel, 2000; Nambu et al., 2003; Osipovich et al., 2004). It is possible that the highly active transcription from the CMV promoter in the present study may have caused equally high acetylation of histones H3 and H4 in C-GFP and A-GFP cells; nevertheless, mutability was not rescued in A-GFP cells.
The CAGGTG motif is present in all of the frequent targets of SHM. Diffuse large B cell lymphomas undergo SHM of Ig genes as well as certain non-Ig genes. Searching non-Ig genes in diffuse large B cell lymphoma sequences revealed CAGGTG motifs in both mutated and several unmutated genes (unpublished data; Pasqualucci et al., 2001). However, some of the unmutated genes did not contain any CAGGTG motif within the coding regions, whereas all of the mutated genes observed contained at least one CAGGTG motif. Because this hexamer motif, theoretically, is present every ∼4 kb, it is unlikely that all CAGGTG motifs in the genome of activated B cells are associated with DNA binding proteins. Therefore, whether CAGGTG motifs are associated with E2A proteins may determine whether AID associates efficiently with the respective gene.
An extensive analysis of dozens of genes expressed in mutating B cells has shown that essentially all of these genes undergo some level of SHM (Liu et al., 2008). This was revealed by checking mutations in mice with deletions of both Ung and Msh2. Because most of the genes had no significant mutations in wild-type mice, the authors concluded that most non-Ig genes in mutating B cells are accessed by AID but protected from mutation by restriction of error-prone repair to Ig genes and some non-Ig genes, such as BCL6. Other genes would suffer C deamination by AID, but the uracil would be repaired error free by base excision repair and mismatch repair (MMR).
An important finding of this report by Liu et al. (2008) is that most of the genes that can interact with AID have CAGGTG motifs nearby within ∼2 kb. It remains to be determined whether non-Ig genes that show no significant mutations in wild-type mice possess a special mechanism that prevents error-prone repair. Alternatively, the lack of mutations in most non-Ig genes may actually reflect what is also going on in Ig genes that have been targeted by AID (Storb et al., 2009). In Ig genes of Ung/MMR null mice there are about twice as many transitions at C/G than in wild-type mice despite about the same mutation frequencies overall (Rada et al., 2004; Shen et al., 2006; Storb et al., 2009). C/G transitions in these null mice are footprints of AID. Thus, a conservative calculation shows that in wild-type mice, 39–47% or more of the uracils created in Ig genes by AID are repaired error free (Storb et al., 2009). In the study by Liu et al. (2008), the non-Ig genes show mutations in Ung/MMR null mice at very low frequencies. If, indeed, in Ung/MMR wild-type mice one half or more of AID-induced mutations in non-Ig genes are also repaired error free, they are difficult or impossible to detect because of the background of PCR errors. Furthermore, even the groups of genes with the least mutations (groups II and III in Liu et al., 2008) in wild-type mice have 35% of their rare mutations at A/T and 20% of transversions at C/G; thus, they did undergo error-prone repair in wild-type mice. Therefore, it seems that uracils in non-Ig genes targeted by AID can also be processed by error-prone repair. It will be very interesting to learn if and how the same genes can be treated either by high-fidelity or error-prone repair, or whether there simply are situations where the high-fidelity repair capacity is overwhelmed by excess uracils in the genome.
The important take-home message from the findings of Pasqualucci et al. (2001) and Liu et al. (2008) is that AID may attack most non-Ig genes, at least those that are expressed, and that CAGGTG may be required to target AID to any gene, at least under conditions of limited AID levels as they exist during SHM. Parsa et al. (2007) and Wang et al. (2004) found that transfected non-Ig genes inserted at various places in the genome can also be mutated without Ig enhancers. These non-Ig transgenes contained multiple CAGGTG as well as CAGCTG sites. Thus, these publications agree with our finding in this paper that CAGGTG is clearly an enabler of SHM without leading to increased transcription or chromatin activation of the target gene. We postulate that AID may be directed to expressed genes that are associated with CAGGTG elements through the interaction of E-box proteins, such as E47, bound to CAGGTG, either with AID and the transcription complex or with the transcription complex alone, thus somehow enabling AID access to the transcribed gene. Ig genes may be the most highly mutated genes because of the coincidence of several conditions: very active transcription, a high proportion of CAGGTG sites that are located within the gene, and SHM-favoring combinations of other cis-elements present in the enhancers. Many mouse and human Ig genes contain CAGGTG motifs besides the several present in the enhancers (unpublished data). The enhanced mutability by additional CAGGTG motifs in a J region of RS transgenic mice (Michael et al., 2003) indicates that CAGGTG motifs located within genes can increase SHM targeting. The distance of CAGGTG to the target may also play a role. The A2 and A9 A-GFP clones with two distant CACCTG sites ∼1 kb 3′ of the A-GFP transgene have a 5.6 times lower mutation frequency than the average C-GFP transgene (based on the number of GFP+ cells).
The relative roles of numbers and distance from the target of CAGGTG motifs and their vicinity to enhancers are not known. It will be important to determine whether CAGGTG alone can suffice to attract AID and, if not, which other motifs of Ig enhancers are essential.
As suggested by a transcription model of SHM (Peters and Storb, 1996; Michael et al., 2003), E-box proteins bound to CAGGTG motifs in Ig enhancers may associate with the transcription complex and facilitate binding of AID to the RNA polymerase. This and other models implicating CAGGTG but not requiring the tethering of AID to the transcription complex remain to be investigated.
MATERIALS AND METHODS
Cell culture and transgenic clones.
DT40 CL18 cells derived from avian leukosis virus–induced chicken bursal B cells were a gift of H. Arakawa and J.M. Buerstedde (Institute of Molecular Radiology, Neuherberg, Germany; Arakawa et al., 2002). The cells were cultured in RPMI 1640 with 1% penicillin/streptomycin, 1% L-glutamine (Invitrogen), 1% chicken serum, and β-mercaptoethanol (Sigma-Aldrich) at 39.5°C with 5% CO2. C-GFP and A-GFP constructs were linearized with Blp I and transfected as previously described (Arakawa et al., 2004). After 12 h, transfected cells were treated with 2 mg/ml G418 for selection of neomycin resistance and single clones were isolated by subsequent limiting dilutions. Single clones from limiting dilutions were expanded to collect their genomic DNA for PCR analysis using a 5′ primer from the CMV promoter and a 3′ primer from the eGFP gene. For analyzing copy numbers of the transgene, Pst I, which cuts inside the transgene, as well as Stu I and Spe I, noncutters of the transgene, were used to digest genomic DNA. A radioactive probe for eGFP was used for Southern blot analysis.
Flow-cytometric analysis and cell sorting.
Five C-GFP and five A-GFP DT40 clones with a single copy of the transgene were stained with PE-conjugated anti–chicken IgM antibody (Santa Cruz Biotechnology, Inc.) and sorted for sIgM+GFP− single cells on a cell sorter (FACSAria; BD) at the University of Chicago Flow Cytometry Facility. 60 C-GFP and 60 A-GFP subclones derived from sorting were monitored for GFP expression of 50,000 live cells on an LSR II (BD) using CL18 cells and GFP+ pseudo-V KO (a gift of H. Arakawa and J.M. Buerstedde; Arakawa et al., 2004) as gating controls.
RT-PCR analysis.
Total RNA was made from DT40 cells with RNA STAT-60 (Tel-Test Inc.), recovered in 50 µl, and stored at −80°C. Equal amounts of RNA were used for making cDNA by the SuperScript III First-Strand Synthesis System for RT-PCR (Invitrogen). Real-time PCRs were run and analyzed on a MyiQ system with SYBR Green SuperMix (both from Bio-Rad Laboratories). Primers used were eGFP Fwd4 (5′-CCTACCAGGGATCCACCGGTCG-3′), eGFP Rev3 (5′-GATCGCGCTTCTCGTTGGGGTCTTTGCTCA-3′), ggActin1 (5′-CCCCAAGCTTACTCCCACAGCCAGCCATGG-3′), ggActin2 (5′-GGCTCTAGATAGTCCGTCAGGTCACGGCCA-3′), ggAID1 (5′-GTTTCTGTGCACCAGAGGGCTGAACA-3′), and ggAID4 (5′-CTCCTTTCTTGGCTGGGTGAGAGGTC-3′). PCR conditions were 95°C for 30 s, 64°C for 45 s, and 72°C for 60 s for 40 cycles. The data from chicken actin were used as a reference for the relative quantification of eGFP or AID cDNA levels using the Pfaffl method (Pfaffl, 2001).
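For readers unfamiliar with the Pfaffl method, the following is a minimal Python sketch of the efficiency-corrected ratio it defines: the target amplification efficiency raised to the target ΔCt (control minus sample), divided by the reference efficiency raised to the reference ΔCt. The function name, the argument names, and the Ct values are hypothetical illustrations and are not data from this study; an efficiency of 2.0 (perfect doubling per cycle) is assumed only for the example.

```python
def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Relative expression of a target gene versus a reference gene.

    e_target and e_ref are amplification efficiencies expressed as the
    fold increase per cycle (2.0 means perfect doubling).
    """
    # Delta Ct is defined as control minus sample for each gene.
    delta_ct_target = ct_target_control - ct_target_sample
    delta_ct_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_ct_target) / (e_ref ** delta_ct_ref)


# Hypothetical Ct values: the target (for example eGFP) crosses threshold
# two cycles earlier in the sample, while the reference (actin) is unchanged.
ratio = pfaffl_ratio(2.0, 24.0, 22.0, 2.0, 18.0, 18.0)
print(ratio)
```

With equal efficiencies of 2.0 the expression reduces to the familiar 2^(ΔΔCt) form; measured efficiencies would normally be taken from standard curves for each primer pair.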
Identification of somatic mutations.
Mutations in the transgene were detected by PCR cloning. The eGFP primers Fwd2 (5′-TTTCTGCTGCTGCTTGCGTACGG-3′) and Rev6 (5′-GCCGTTCTTCTGCTTGTCGGCCATGATATAG-3′) were used for amplification with Pfu polymerase (Agilent Technologies) at 95°C for 30 s, 58°C for 30 s, and 72°C for 90 s for 24 cycles, and the products were cloned with a PCR cloning kit (Zero Blunt TOPO; Invitrogen). DNA sequencing was performed by the University of Chicago Cancer Research Center DNA Sequencing Facility.
Mapping DNase I hypersensitive sites.
Nuclei from DT40 cells were isolated and digested as previously described (Sambrook, 2001) with modified lysis buffer containing 0.05% NP-40. For each digestion, 10^7 DT40 cells were used with 0, 10, or 50 U DNase I (Applied Biosystems) in 250 µl of solution. Isolated DNA was digested with Pst I and Bcl I (New England Biolabs, Inc.), and a Sac I/BsrG I fragment from eGFP was used to make a probe using a random-priming synthesis kit (Roche) for Southern blot analysis.
ChIP assays for acetylated histones H3 and H4 were performed as previously described (Agata et al., 2001) with a few modifications. For shearing chromatin, 10 U MNase (Sigma-Aldrich) was used in buffer containing 50 mM Tris, 1 mM CaCl2, and protease inhibitors (Roche) at 37°C. Antibodies used were anti–acetyl histone H3 or anti–acetyl histone H4 (both from Millipore). Primers for the GFP region were eGFP Fwd4 (5′-CCTACCAGGGATCCACCGGTCG-3′) and eGFP Rev1 (5′-TGCCGGTGGTGCAGATGAACTTCAGGGTCA-3′), which amplify a 185-bp fragment. For quantification, the real-time PCR conditions with SYBR Green described under RT-PCR analysis were used.
Nuclear extracts were prepared as described by Dignam et al. (1983). The EMSA was performed as previously described (Bain et al., 1993; Spaulding et al., 2007). Oligonucleotides used for EMSA were 22mers: Ei (5′-CCAGGCAGGTGGCCCAGATTAC-3′), A-Ei (5′-CCAGGAAGGTGGCCCAGATTAC-3′), 3′E1 (5′-CAAAGCCTCATACACCTGCTCC-3′), A-3′E1 (5′-CAAAGCCTCATAAACCTGCTCC-3′), 3′E2 (5′-TACCCCAGCACCTGGCCAAGGC-3′), and A-3′E2 (5′-TACCCCAGAACCTGGCCAAGGC-3′). These oligonucleotides were annealed to the complementary strand and labeled with [γ-32P]ATP using polynucleotide kinase. Approximately 0.1 ng of labeled probe was added to 10 µg of nuclear extract in binding buffer containing 1 µg BSA and 0.8 µg poly(dI-dC). Competitor oligonucleotides were incubated with the extract for 10 min before the labeled probe was added, and the mixture was incubated for a further 20 min at room temperature. Antibodies used were anti-E47 and control IgG (Santa Cruz Biotechnology, Inc.).
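As a quick consistency check on the probe design, the short Python sketch below scans the six EMSA oligonucleotides listed above for the CAGGTG motif on either strand. The sequences are copied from the text; the function names, the dictionary, and the "top"/"bottom" labels are illustrative assumptions and were not part of the original analysis. In this scan the Ei probe carries CAGGTG on the listed strand, the 3′E1 and 3′E2 probes carry it on the complementary strand (read as CACCTG), and none of the A- variants contains the motif on either strand.

```python
MOTIF = "CAGGTG"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_strands(seq, motif=MOTIF):
    """List the strands ('top' = as written, 'bottom' = complement)
    on which the motif occurs in a double-stranded probe."""
    seq = seq.upper()
    strands = []
    if motif in seq:
        strands.append("top")
    if reverse_complement(motif) in seq:
        strands.append("bottom")
    return strands


# Probe sequences as listed in the text (5' to 3'); ASCII apostrophes
# are used in the keys in place of the prime symbol.
OLIGOS = {
    "Ei": "CCAGGCAGGTGGCCCAGATTAC",
    "A-Ei": "CCAGGAAGGTGGCCCAGATTAC",
    "3'E1": "CAAAGCCTCATACACCTGCTCC",
    "A-3'E1": "CAAAGCCTCATAAACCTGCTCC",
    "3'E2": "TACCCCAGCACCTGGCCAAGGC",
    "A-3'E2": "TACCCCAGAACCTGGCCAAGGC",
}

results = {name: motif_strands(seq) for name, seq in OLIGOS.items()}
for name, strands in results.items():
    print(name, len(OLIGOS[name]), strands or "no CAGGTG")
```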
Online supplemental material.
Fig. S1 shows relative mRNA expression for GFP, AID, and β-actin. Fig. S2 depicts an EMSA to test whether AID associates with the CAGGTG motif. Fig. S3 shows the location of two transgene-specific CAGGTG motifs in the J region of the RS transgene.
We are especially grateful to T.E. Martin for suggestions during the course of this work. We would also like to express our gratitude to H. Arakawa and J.M. Buerstedde for DT40 CL18 cells and ψV KO DT40 cells, B. Kee for technical advice and human E47 cDNA, L. Pasqualucci and R. Dalla-Favera for the sequences of non-Ig genes checked for SHM, W. Buikema for DNA sequencing, and R. Duggan for flow cytometric cell sorting. We also acknowledge the contributions of G. Bozek for technical advice and T.E. Martin for critical reading of the manuscript.
The work was supported by National Institutes of Health grants AI047380 and AI053130. A. Tanaka was supported by an Abbott Graduate Fellowship. P. Kodgire has been supported by Cancer Research Institute and Lady Tata postdoctoral fellowships.
The authors have no conflicting financial interests.