Here we describe a time-efficient strategy for endogenous C-terminal gene tagging in mammalian tissue culture cells. An online platform is used to design two long gene-specific oligonucleotides for PCR with generic template cassettes to create linear dsDNA donors, termed PCR cassettes. PCR cassettes encode the tag (e.g., GFP), a Cas12a CRISPR RNA for cleavage of the target locus, and short homology arms for directed integration via homologous recombination. The integrated tag is coupled to a generic terminator shielding the tagged gene from the co-inserted auxiliary sequences. Co-transfection of PCR cassettes with a Cas12a-encoding plasmid leads to robust endogenous expression of tagged genes, with tagging efficiency of up to 20% without selection, and up to 60% when selection markers are used. We used target-enrichment sequencing to investigate all potential sources of artifacts. Our work outlines a quick strategy particularly suitable for exploratory studies using endogenous expression of fluorescent protein–tagged genes.
Targeted insertions of transgenes into the genomes of mammalian cells (knock-ins) for applications such as protein tagging are critical genomic modifications for functional studies of genes within their endogenous context, thus reducing the likelihood of artifacts due to overexpression (Doyon et al., 2011). In mammalian cells, such knock-ins are complicated by inefficient targeting and a high likelihood of off-target integrations. Knock-in efficiency in mammalian cells can be enhanced by inducing site-specific double-strand breaks (DSBs) using programmable endonucleases such as zinc finger nucleases, transcription activator-like effector nucleases (TALENs), or CRISPR-associated (Cas) endonucleases (Dambournet et al., 2014). These lesions can promote integration of the desired heterologous DNA sequences via DSB repair pathways such as homologous recombination (HR) or canonical nonhomologous end joining (c-NHEJ; Scully et al., 2019). While zinc finger nucleases and TALENs were initially shown to yield high on-target editing rates, CRISPR-Cas endonucleases are nowadays preferred due to their simple usage and versatility (Zhang, 2019). Among the different Cas endonucleases, Cas9 has found its way into most genome engineering applications, mainly for historical reasons. The subsequently characterized Cas12a, in comparison, has the reported advantage of being more specific in vivo (Kleinstiver et al., 2016; Kim et al., 2017), and its CRISPR RNA (crRNA) structure is simpler (Zetsche et al., 2015). In addition, Cas9 induces DSBs close to the protospacer-adjacent motif (PAM) site, while Cas12a cuts farther away from it. This might increase targeting efficiency, because the target sequence is not as easily destroyed by indel formation and may be recleaved after repair (Zetsche et al., 2015; Moreno-Mateos et al., 2017).
A variety of methods have been developed to use Cas9/12a for knock-in applications (reviewed in Yamamoto and Gerbi, 2018). They can be classified by the DSB repair pathway they depend on. Methods that rely on c-NHEJ require a correctly positioned cut site for the endonuclease, and alternative processing of the DNA ends can generate out-of-frame integrations. Methods relying on HR are more flexible in terms of target sites, enabling highly precise genomic modifications. However, HR is only active in late S/G2 phase of the cell cycle (Moynahan and Jasin, 2010), decreasing the likelihood that this pathway is selected for the repair of a particular DSB. Irrespective of the method and targeted DNA repair pathway, suitable reagents are required to provide all components necessary for integration, such as recombinant proteins, RNAs, single-stranded DNA, or tailored, gene-specific plasmids that must be cloned (Yamamoto and Gerbi, 2018). In yeast, genomic tagging has been simplified to a strategy based on PCR (Baudin et al., 1993; Wach et al., 1994), now commonly referred to as PCR tagging. It requires two gene-specific DNA oligonucleotides (oligos) for PCR and a generic template plasmid that provides the tag and a selection marker to generate a PCR cassette. Upon transformation into cells, the homologous sequences provided by the oligos direct precise insertion of the PCR cassette into the genome by the efficient HR machinery of this species.
In mammalian cells, long linear double-stranded DNA donors containing short homology arms (50–100 bp) have been shown to suffice for efficient HR if a DSB is simultaneously induced at the modification site (Orlando et al., 2010; Zheng et al., 2014; Zhang et al., 2017). Hence, the use of PCR for the generation of repair templates for gene tagging in mammalian cells is in principle possible. However, it is complicated by the requirement to simultaneously introduce a crRNA for CRISPR-Cas–mediated cleavage at the modification site.
Here we develop mammalian PCR tagging. Similar to yeast PCR tagging, this method depends on two gene-specific oligos and a single PCR. In contrast to the yeast method, one of the oligos also contains the sequence encoding the crRNA, so that the PCR generates a fragment, termed a PCR cassette, that additionally contains a functional gene for expression of the crRNA, which directs integration of the cassette into the genome. We optimized the design of the oligos and explored the effect of oligo protection by chemical modification, the use of selection markers, and applications in different cell lines. Using targeted next-generation sequencing, we characterize tagging fidelity, off-target insertions, and by-product formation such as repair template concatemerization. We facilitate adoption of mammalian PCR tagging by introducing a toolbox of PCR template plasmids allowing genomic integration of various tags. A web application allows the rapid design of the two oligos needed for mammalian PCR tagging of individual genes. Finally, we discuss applications of the method for basic research in cell biology and for screening purposes.
Implementation of mammalian PCR tagging and method optimization
Mammalian PCR tagging requires two oligos for C-terminal tagging of proteins. The M1 tagging oligo provides homology to the 5′ region of the insertion site. The M2 tagging oligo provides homology to the 3′ region of the insertion site as well as the sequence of the crRNA for guiding the Cas12a endonuclease, along with a (T)6 element that functions as a polymerase III terminator (Arimbasseri et al., 2013). The PCR with the M1/M2 tagging oligos is performed on the template plasmid, which provides the desired tag (e.g., GFP) and a U6 Pol III promoter for the crRNA. The template plasmid also contains a heterologous 3′ UTR after the fluorescent protein reporter to properly terminate the gene fusion before the crRNA expression unit. PCR generates a PCR cassette that contains locus-specific homology arms as well as a functional gene for expression of a locus-specific crRNA for Cas12a (Fig. 1 a and Fig. S1). Based on our experience with similar PCR cassettes in yeast (Buchmuller et al., 2019), we predicted that upon transfection, the crRNA would be expressed and would assemble with Cas12a, simultaneously expressed from a cotransfected plasmid, into a functional complex that cleaves the target gene (Fig. 1, a and b).
DSB repair can occur via different pathways. One option is that the DSB is repaired by HR using the transfected PCR cassette as a template, as it contains homology arms that match the region adjacent to the cleaved site. This yields the desired integrants expressing the appropriately tagged proteins from the target locus. Other repair pathways, such as c-NHEJ, produce less well-defined outcomes and likely do not yield a functionally tagged gene.
To test whether this approach permits efficient gene tagging in mammalian cells, we designed a template plasmid containing the bright green fluorescent protein mNeonGreen (Shaner et al., 2013). We designed 16 M1/M2 tagging oligo pairs for tagging 16 different genes encoding proteins with a diverse range of cellular localizations (Table S1) and with high endogenous expression levels (Geiger et al., 2012; Schaab et al., 2012). This allows easy detection of the corresponding mNeonGreen-tagged fusion proteins by fluorescence microscopy. We cotransfected the PCR cassettes together with a Cas12a-encoding plasmid into HEK293T cells and quantified fluorescent cells 3 d later. For all genes, we observed between 0.2% and 13% fluorescent cells with the expected protein-specific localization pattern (Fig. 1 c), e.g., ER for CANX, mitochondrial staining for TOMM20, or diffuse and dotted nuclear staining for HNRNPA1 and PCNA, respectively (Fig. 1 d). We validated that the appearance of cells with a correctly localized fluorescence signal depended on the presence of Cas12a and on matching combinations of homology arms and crRNA, irrespective of whether these were on the same or on different PCR products (Fig. 1 e). In the presence of a crRNA for a locus different from the one targeted by the homology arms, we very rarely found cells in which the cassette had integrated into the foreign locus, indicating that in addition to HR, other integration pathways such as c-NHEJ are also used (Fig. 1 e and Fig. S2 a). Together, these results establish that the crRNA is transcribed from the transfected PCR cassette and directs Cas12a to cleave the target locus. Furthermore, we conclude that the Cas12a-mediated DSB is frequently repaired by HR using linear donor templates with short homology arms.
In addition to cells with the expected localization of green fluorescence, we also observed in several transfections cells with diffuse cytoplasmic fluorescence of variable brightness (Fig. 1, c and d; see examples labeled with arrows in Fig. 1 d). This fluorescence was independent of Cas12a and of matching combinations of crRNA and homology arms (Fig. 1 e), indicating that the diffuse cytoplasmic signal resulted from the transfected PCR cassettes alone.
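The oligo layout described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the algorithm used by the authors' design tool: the function name `design_tagging_oligos`, the default arm length of 55 nt, and all annealing and direct-repeat sequences are hypothetical placeholders that must be replaced by the actual template plasmid and crRNA scaffold sequences. The sketch further assumes that the cassette replaces the endogenous STOP codon and that the U6–crRNA unit is oriented so that the (T)6 terminator lies next to the 3′ homology arm; it does not check whether integration destroys the crRNA target site.

```python
# Hypothetical sketch: assemble M1/M2 tagging oligos for C-terminal PCR tagging.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def design_tagging_oligos(locus: str, stop_index: int, spacer: str,
                          direct_repeat: str, m1_anneal: str, m2_anneal: str,
                          arm_length: int = 55) -> tuple[str, str]:
    """Return (M1, M2), both written 5'->3'.

    locus         -- sense-strand genomic sequence around the STOP codon
    stop_index    -- 0-based index of the first base of the STOP codon
    spacer        -- crRNA spacer sequence (DNA letters)
    direct_repeat -- placeholder for the Cas12a crRNA constant region
    m1_anneal     -- placeholder: template sequence at the start of the tag
    m2_anneal     -- placeholder: last bases of the U6 promoter (sense strand)
    """
    if stop_index < arm_length or stop_index + 3 + arm_length > len(locus):
        raise ValueError("locus too short for the requested homology arms")
    # 5' arm: last `arm_length` nt of the ORF, ending just before the STOP
    # codon, so the tag is fused in frame; the cassette supplies a new STOP.
    arm5 = locus[stop_index - arm_length:stop_index]
    # 3' arm: sequence directly downstream of the endogenous STOP codon.
    arm3 = locus[stop_index + 3:stop_index + 3 + arm_length]
    # M1 is the forward primer: 5' arm followed by the tag-annealing part.
    m1 = arm5 + m1_anneal
    # Sense strand at the cassette's 3' end (assumed orientation):
    #   U6 promoter | direct repeat | spacer | TTTTTT | 3' arm
    # M2 is the reverse primer, i.e., the reverse complement of that stretch.
    m2 = revcomp(m2_anneal + direct_repeat + spacer + "TTTTTT" + arm3)
    return m1, m2
```

Because M2 is built as a reverse complement, reverse-complementing it again recovers the sense-strand order (promoter end, crRNA, terminator, 3′ arm), which is a convenient sanity check on any real design.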
Diffuse cytoplasmic fluorescence is caused by unstable extra-chromosomal DNA molecules
The nature of the diffuse cytoplasmic fluorescence observed in a fraction of the cells was unclear. We reasoned that it could originate from extrachromosomal DNA molecules or from fragments that had integrated at chromosomal off-target loci. To investigate the fate of the transfected fragments, we used Anchor-Seq (Meurer et al., 2018; Buchmuller et al., 2019; Fig. 2 a) to specifically amplify, from cells 3 d after transfection, the junctions between PCR cassettes and their upstream flanking DNA sequences. We detected junctions indicative of PCR cassettes inserted into the correct chromosomal locus (Fig. 2 b). However, the detection sensitivity for correctly inserted cassettes was limited because a large number of reads did not extend beyond the sequence of the M1 or M2 tagging oligos (Fig. 2 b). This suggests that these reads derive from transfected PCR cassettes that are still present in the cultured cells. In addition, we observed a substantial fraction of reads originating from ligated ends of transfected cassettes, consistent with the idea that the free ends of the transfected PCR cassettes were recognized and processed by c-NHEJ. Although different types of fusions were detected, the most abundant was a head-to-tail fusion of the PCR fragment (Fig. 2 b).
To further explore the nature of the cassette fusions, we transfected a mixture of the PCR cassettes used for tagging the genes shown in Fig. 1 c. This revealed hybrid fusions between PCR cassettes targeting different genes (Fig. 2 b), supporting the idea that after transfection, the cassettes are ligated together, e.g., via c-NHEJ–mediated DNA damage repair. However, head-to-tail fusions among cassettes for the same gene remained the most abundant events, even in the transfection of the mixture. This is best explained by a preference for intramolecular ligation and subsequent concatemerization by HR, as reported in previous studies (Folger et al., 1982, 1985).
In head-to-tail fusions, the crRNA gene is ligated to the 5′ end of the mNeonGreen sequence, with the homology arms of the M1 and M2 tagging oligos in between. The U6 Pol III promoter used here has previously been shown to also mediate Pol II–driven expression (Rumi et al., 2006; Gao et al., 2018), which could lead to expression of mNeonGreen from such fusions. To assess this, we next transfected a PCR cassette in which the ATG codons at positions 1 and 10 of the mNeonGreen ORF were both substituted with codons for valine. This largely, but not completely, suppressed the population of cells with diffuse cytoplasmic signal, while the fraction of cells with specific localization indicative of correct gene tagging was unchanged (Fig. 2 c). This indicates that the ATG needed for this expression is often provided by mNeonGreen itself. Additionally, the crRNA or homology sequences within the M1 or M2 tagging oligo may provide an ATG in frame with the mNeonGreen ORF.
If head-to-tail fused PCR cassettes are rarely or never incorporated into the genome, they are unlikely to be stable. Consistent with this, we observed a gradual loss of cells with diffuse cytoplasmic fluorescence during subsequent growth, while the fraction of cells with a correctly localized fluorescence signal remained constant (Fig. 2 d). This argues that head-to-tail fused fragments formed as by-products do not compromise the general applicability of mammalian PCR tagging for targeted knock-in of PCR cassettes.
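The read categories described above can be made concrete with a toy classifier. This sketch uses exact string matching and assumes reads that are already oriented like the cassette top strand and that contain the full 5′ homology arm; a real Anchor-Seq analysis would rely on proper read alignment. The function names, the 15-nt minimum overlap, and all example sequences are hypothetical.

```python
from collections import Counter


def classify_junction(read: str, arm5: str, genomic_upstream: str,
                      cassette_3prime_end: str, min_overlap: int = 15) -> str:
    """Classify one junction read by the sequence found upstream of the 5' arm.

    read                -- read in cassette top-strand orientation
    arm5                -- 5' homology arm contributed by the M1 oligo
    genomic_upstream    -- genomic sequence expected directly upstream of the arm
    cassette_3prime_end -- last bases of the PCR cassette (M2 side)
    """
    pos = read.find(arm5)
    if pos < 0:
        return "no_anchor"
    prefix = read[:pos]
    if len(prefix) < min_overlap:
        # The read stops at (or just beyond) the oligo end: free cassette.
        return "unextended"
    tail = prefix[-min_overlap:]
    if genomic_upstream.endswith(tail):
        # Correct genomic flank next to the arm: on-target insertion.
        return "on_target"
    if cassette_3prime_end.endswith(tail):
        # Another cassette's 3' end next to the arm: head-to-tail fusion.
        return "head_to_tail"
    return "other"


def summarize(reads, **kwargs) -> Counter:
    """Count read categories for a batch of reads."""
    return Counter(classify_junction(r, **kwargs) for r in reads)
```

In practice the "unextended" class corresponds to reads that, as noted above, do not reach beyond the M1 or M2 oligo sequence and therefore dilute the on-target signal.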
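The start-codon substitution and the check for upstream in-frame ATGs can be expressed as small sequence helpers. The function names and the toy ORF are hypothetical, and the sketch assumes an ORF that begins at its first base and whose length is a multiple of three; GTG is used as the valine codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(orf: str) -> list[str]:
    """Split an ORF (length must be a multiple of 3) into codons."""
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def in_frame_atg_codons(orf: str) -> list[int]:
    """1-based numbers of all in-frame ATG codons."""
    return [n for n, c in enumerate(codons(orf), start=1) if c == "ATG"]


def replace_codons(orf: str, numbers=(1, 10), replacement: str = "GTG") -> str:
    """Replace the ATG at each given 1-based codon number (GTG encodes Val)."""
    cs = codons(orf)
    for n in numbers:
        if cs[n - 1] != "ATG":
            raise ValueError(f"codon {n} is {cs[n - 1]}, not ATG")
        cs[n - 1] = replacement
    return "".join(cs)


def upstream_in_frame_atg(upstream: str) -> bool:
    """True if `upstream`, fused directly 5' of an ORF, carries an ATG in the
    ORF's frame with no in-frame STOP codon between it and the ORF."""
    # Walk codons backwards from the fusion point, staying in the ORF frame.
    for end in range(len(upstream), 2, -3):
        codon = upstream[end - 3:end]
        if codon in STOP_CODONS:
            return False
        if codon == "ATG":
            return True
    return False
```

The second helper reflects the point that fused crRNA or homology-arm sequence can still supply a start codon after the internal ATGs have been removed.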
Parameters influencing tagging efficiency
To explore PCR tagging further, we determined tagging efficiency as a function of various parameters.
We first explored basic parameters such as DNA amount and transfection method. We found that equal amounts of Cas12a plasmid DNA and PCR cassette DNA are optimal (Fig. S2 b), whereas the transfection method did not seem to influence the outcome (Fig. S2 c). Furthermore, we noticed that PCR cassettes can be purified using standard DNA clean-up columns (which do not remove long oligos). However, we observed that inefficient PCR amplification, which leaves the final product substantially contaminated with M1 and M2 tagging oligos, can lower the yield of integration at the correct loci (data not shown).
We also tested whether Cas12a could be delivered as mRNA or protein instead of being expressed from a plasmid. We found that this required electroporation and that successful tagging could be achieved with all three delivery modes (Fig. S2 d). This indicates the modularity of the system, but for the sake of simplicity, we used plasmid-borne expression for the remainder of the study.
Length of homology arms
From yeast, it is known that ∼28–36 nt of continuous sequence homology are minimally required for HR of transfected DNA with the genome (Rothstein, 1991). For PCR tagging in yeast, homology arms between 45 and 55 nt in length are routinely used. To gain insight into the requirements in mammalian cells, we tested integration efficiency as a function of homology arm length. This revealed that homology arms as short as 30 nt on both sides already allow efficient integration of the cassette (Fig. 3 a), and that longer arms further increase integration efficiency.
Dependence on homology arms
Our control experiment (Fig. 1 e) suggested that PCR tagging depends on the presence of homology arms. However, a fraction of the productive events could still be mediated not by HR but by alternative DNA repair pathways. To test this directly, we generated a series of PCR cassettes with different types of ends. In particular, we generated a PCR cassette with compatible overhangs for direct ligation by using the type IIS restriction enzyme HgaI. This enzyme generates ends with 5-nt 5′ overhangs on both sides, which were designed to be compatible with the ends produced by Cas12a (Zetsche et al., 2015) in the corresponding genomic locus (Fig. 3 b). We observed in-frame integration of the HgaI-cut fragment, but with lower frequency than integration in the presence of homology arms (Fig. 3 b). This demonstrates that homology arms are required for efficient integration: insertion of PCR cassettes via c-NHEJ can be observed, but it is rather inefficient.
End-to-end joining of transfected dsDNA inside cells can be reduced when bulky modifications such as biotin are introduced at the 5′ end of the DNA fragment. Such modification has been reported to enhance targeting efficiency approximately twofold in medaka (Gutierrez-Triana et al., 2018), and biotin modification may contribute to enhanced targeting efficiency in mouse embryos (Gu et al., 2018), leading preferentially to insertion of a single copy of the donor DNA. We tested M1/M2 tagging oligos with multiple phosphorothioate bonds (to prevent exonuclease degradation), with and without biotin at the 5′ end. Because chemical oligo synthesis proceeds in the 3′-to-5′ direction, oligo preparations without size selection are contaminated by shorter species lacking the 5′ modifications. Therefore, we additionally included size-selected (PAGE-purified) oligos.
In all cases, we observed a two- to threefold reduction in the frequency of cells with diffuse cytoplasmic fluorescence (Fig. 3 c). This is consistent with the idea that the modifications partially suppress end-to-end ligation and therefore concatemer formation. Quantification of targeting efficiency revealed up to a two- to threefold increase in tagging efficiency for TOMM20, CLTC, and DDX21, irrespective of whether the oligos were size selected. However, for HNRNPA1 and CANX, the modifications only slightly enhanced tagging efficiency.
Taken together, these experiments demonstrate the robustness of the procedure and its dependence on homology arms for efficient recombination with the target locus. Phosphorothioate-modified oligos have an overall positive effect on tagging efficiency and reduce diffuse cytoplasmic fluorescence, most likely by reducing end-to-end ligation of fragments by c-NHEJ.
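How HgaI generates staggered ends can be sketched as follows. HgaI is listed with the recognition specificity GACGC(5/10), i.e., it cuts 5 nt downstream of the site on one strand and 10 nt downstream on the other, leaving a 5-nt 5′ overhang. The function `hgai_cuts` and the toy sequences are hypothetical; cut positions are reported as boundaries on the top-strand coordinate axis, and sites whose cuts would fall outside the sequence are skipped.

```python
def hgai_cuts(seq: str) -> list[tuple[int, int, str]]:
    """Return (top_cut, bottom_cut, overhang) for each HgaI site in `seq`.

    A cut at boundary b falls between seq[b - 1] and seq[b]. The overhang is
    the single-stranded 5' extension, written as top-strand sequence.
    """
    results = []
    for i in range(len(seq) - 4):
        site = seq[i:i + 5]
        if site == "GACGC" and i + 15 <= len(seq):
            # Site on the top strand: cuts lie downstream (to the right),
            # 5 nt on the top strand and 10 nt on the bottom strand.
            top, bottom = i + 10, i + 15
            results.append((top, bottom, seq[top:bottom]))
        elif site == "GCGTC" and i - 10 >= 0:
            # Site on the bottom strand (GACGC read 5'->3' leftwards): cuts
            # lie upstream, 5 nt on the bottom and 10 nt on the top strand.
            top, bottom = i - 10, i - 5
            results.append((top, bottom, seq[top:bottom]))
    return results
```

In a design like the one described above, the 5-nt stretch between the two cuts would be chosen to match the overhang left by Cas12a at the genomic cut site.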
Tagging fidelity and off-target integrations
Integration of DNA by HR into the genome of mammalian cells might be associated with mutations, caused either by the integration process itself or by faulty oligos (Fig. 4 a). In addition, integration by c-NHEJ and off-target integration of the cassette elsewhere in the genome might occur. To investigate this in more detail, we transfected HEK293 cells with PCR cassettes targeting three different genes. The Cas12a cleavage sites were chosen at different positions relative to the STOP codon: before (CANX), after (HNRNPA1), or directly at the STOP codon (CLTC; Fig. 4 b). In all cases, we used protected primers (5S biotin; Fig. 3 c) to reduce the load of concatemerized cassettes. Insertion junctions at the targeted gene were amplified from unselected cell populations 18 d after transfection.
We used PCR to amplify the insertion junction between the 3′ end of the ORF and the inserted tag. This yielded two distinct amplicon populations. The shorter bands correspond in size to junctions formed by HR-mediated tag insertion, and the longer bands to the size expected for fragment insertion by c-NHEJ (Fig. 4, a–c). Although PCR efficiency can differ between nonidentical fragments, the results suggest that a considerable fraction of the insertion junctions in the population are formed by HR. Illumina dye sequencing of the shorter bands revealed >80% correct sequences (Fig. 4 d), and most other reads contained mutations that were enriched at the end of the homology region near the junction with the tag (Fig. 4 e). This suggests that these mutations result from faulty synthesis of the long oligos and that HR selects against PCR cassettes containing faulty sequences in the region of the homology arms. Similar selection against faulty oligos has been observed in yeast (Buchmuller et al., 2019), where mismatch repair systems are known to prevent recombination between short imperfect sequences (Anand et al., 2017).
We next generated amplicons of the wild-type loci to quantify alterations resulting from DSBs that were not repaired by HR with the PCR cassette. Illumina dye sequencing revealed that 7–12% of amplicons contained small deletions close to the positions of the Cas12a-induced DSBs (Fig. 4 f). Depending on the exact position of the Cas12a DSB relative to the STOP codon (Fig. 4 b) and on how the DSB is repaired by the c-NHEJ machinery, this may modify the C terminus of the protein through a frameshift or alter transcript stability (e.g., via nonsense-mediated decay; Fig. 4 g).
Next, we used Anchor-Seq to identify potential off-target integrations. We used transfected cells that had been passaged for 30 d to minimize PCR cassette–derived concatemers. We observed multiple off-target integration events throughout the genome (Fig. 4 h). Comparing integration sites between replicates and controls without Cas12a plasmid did not identify integration sites shared between samples (with the exception of integrations at the target locus). This indicates that most, if not all, off-target integration events were caused by random integration of the donor template rather than by off-target activity of Cas12a.
Together, these data indicate that a large fraction of on-target integration events yield the expected gene fusions.
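A per-position mismatch profile of the kind used to locate oligo-derived errors can be sketched as below. The function name is hypothetical, and the sketch assumes ungapped reads of the same length as the reference, so it only captures substitutions; indels would require alignment.

```python
from collections import Counter


def mismatch_profile(reads: list[str], reference: str) -> tuple[float, Counter]:
    """Return (fraction of perfect reads, Counter of 0-based mismatch positions)."""
    if not reads:
        raise ValueError("no reads given")
    perfect = 0
    positions: Counter = Counter()
    for read in reads:
        if len(read) != len(reference):
            raise ValueError("sketch assumes ungapped reads of reference length")
        # Positions where the read differs from the expected junction sequence.
        diffs = [i for i, (a, b) in enumerate(zip(read, reference)) if a != b]
        if not diffs:
            perfect += 1
        positions.update(diffs)
    return perfect / len(reads), positions
```

Positions that accumulate many mismatches close to the tag junction would point to errors carried over from the synthesized oligos rather than to the integration process.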
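The dependence of the outcome on deletion position and size can be summarized with a coarse classifier. This is a hypothetical helper, assuming 0-based coordinates on the sense strand, with `stop_index` marking the first base of the STOP codon and a deletion that starts within the ORF or downstream of it; it ignores effects such as nonsense-mediated decay or newly created STOP codons.

```python
def classify_deletion(stop_index: int, del_start: int, del_len: int) -> str:
    """Coarse effect of deleting [del_start, del_start + del_len) near the STOP."""
    del_end = del_start + del_len
    if del_start >= stop_index + 3:
        # Entirely downstream of the STOP codon: coding sequence untouched.
        return "3' UTR only"
    if del_end <= stop_index:
        # Entirely within the coding sequence: frame depends on length.
        return "in-frame deletion" if del_len % 3 == 0 else "frameshift"
    # Remaining case: the deletion removes part of the STOP codon.
    return "overlaps STOP codon"
```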
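The comparison of integration sites between samples can be sketched as a pairwise proximity search. The function name, the 100-bp matching window, the exclusion of the target locus, and the example coordinates are all hypothetical, and the quadratic loop is only meant for small site lists.

```python
from itertools import combinations

Site = tuple[str, int]  # (chromosome, position)


def recurrent_sites(samples: dict[str, list[Site]], window: int = 100,
                    exclude=()) -> list[tuple[str, str, int]]:
    """Return (sample, chrom, pos) for sites found within `window` bp of a site
    in at least one other sample; `exclude` holds (chrom, start, end) regions,
    e.g., the targeted locus, that are ignored."""
    def excluded(chrom: str, pos: int) -> bool:
        return any(chrom == c and start <= pos <= end for c, start, end in exclude)

    hits = set()
    for (name_a, sites_a), (name_b, sites_b) in combinations(samples.items(), 2):
        for chrom_a, pos_a in sites_a:
            if excluded(chrom_a, pos_a):
                continue
            for chrom_b, pos_b in sites_b:
                if excluded(chrom_b, pos_b):
                    continue
                # Sites on the same chromosome and close together are shared.
                if chrom_a == chrom_b and abs(pos_a - pos_b) <= window:
                    hits.add((name_a, chrom_a, pos_a))
                    hits.add((name_b, chrom_b, pos_b))
    return sorted(hits)
```

An empty result after excluding the target locus would correspond to the situation described above, in which off-target insertions look random rather than nuclease-directed.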
Selection of clones using antibiotic resistance markers and multi-locus tagging
Next, we generated template plasmids that additionally carry selection markers for different antibiotics and used them to generate PCR cassettes for tagging twelve genes, including five genes that we had not tagged before (Table S1). PCR cassettes were incubated with DpnI or FspEI to selectively digest the bacterially derived, methylated template plasmid DNA, which also contains the selection marker and could otherwise confer resistance without tagging. Selection using either Zeocin or Puromycin resistance yielded cell populations highly enriched in cells exhibiting the correct localization of the fluorescent fusion protein (Fig. 5 a). The selected populations still contained cells with diffuse cytoplasmic fluorescence, but their fraction either remained constant or decreased, consistent with the idea that the transcripts leading to this fluorescence originate predominantly from extrachromosomal concatemers.
After enrichment of positive cells by Zeocin selection, we isolated individual clones for detailed analysis. In all clones, PCR identified correct insertion junctions on the side of the fluorescent protein tag; on the other side of the PCR cassette, correct junctions were found in four out of five clones. Antibodies detected the corresponding mNeonGreen fusion protein (Fig. S3). HEK293T cells are aneuploid and appear to have up to five copies of the CANX gene (Lin et al., 2014). We also detected wild-type CANX protein in all clones, indicating that not all copies were tagged (Fig. S3). PCR analysis revealed concatemers in four of the clones. Therefore, correctly tagged clones appear to frequently contain integrated concatemers at the tagged locus, as also predicted from previous work (Folger et al., 1982, 1985). Clones with concatemers might be enriched during antibiotic selection due to the presence of multiple resistance genes. In either case, the inserted additional copies are unlikely to interfere with the tagged gene, since they are insulated from the inserted tag by a proper transcription terminator.
To gain insight into the frequency of multiple tagging events, we next generated two PCR cassettes each for CANX and HNRNPA1, one for tagging with the red fluorescent protein mScarlet-I (Bindels et al., 2017) and one for tagging with mNeonGreen. The resulting four cassettes were then cotransfected in pairs into HEK293T cells, using all four possible red–green and gene–gene combinations. This revealed three types of cells, with green, red, or both green and red fluorescence in the nucleus or the ER, respectively, as shown for the HNRNPA1-mScarlet-I/HNRNPA1-mNeonGreen transfection (Fig. S4). The frequencies of the three cell types were roughly equal, regardless of whether the same gene or two different genes were tagged (Fig. 5 b). This indicates efficient double tagging of different loci and demonstrates that more than one allele is often tagged. This suggests applications of PCR tagging for the analysis of protein–protein interactions using epitope tagging, or of protein colocalization using different fluorescent proteins. We validated this possibility in double-tagging experiments (Fig. 5 c), which demonstrated simultaneous detection of various cellular structures after a single transfection.
Together, this analysis demonstrates that all positive clones contain insertions by HR that yield the correct fusion protein. Insertions are not necessarily single copy, but are likely concatenated segments of PCR cassettes. Nevertheless, since the PCR cassette provides a STOP codon and a 3′ UTR along with the tag, the generated transcript is properly defined.
Applications of PCR tagging in different cell lines
So far, we have described and characterized mammalian PCR tagging as a robust workflow for chromosomal tagging in HEK293T and HEK293 cells. To assess the general applicability of PCR tagging, we used additional human and murine cell lines, targeting genes that had already been tagged successfully in our initial experiments. In each cell line, we identified cells with correctly localized green fluorescence for most genes. However, we note that for some of these cell lines, transfection efficiency was in the lower range, so that we observed tagging frequencies of 0.2–5% (Fig. 6, a–d). Examples of tagged murine myoblast (C2C12) cells are shown in Fig. S5 a. For HeLa cells, which also provide only moderate transfection levels, we additionally subjected the cells to selection and found up to 40% of cells exhibiting the correct localization (Fig. S5 b). In conclusion, these results demonstrate that PCR tagging works in different mammalian cell lines and species, including differentiated cells and mouse embryonic stem cells (mESCs), and that combining transfection with selection vastly increases tagging efficiency.
crRNA design, PAM site selection, and genomic coverage
Next, we asked how well Cas12a-targeted PCR tagging covers the human genome. Our tagging approach relies on relatively short homology arms in the PCR cassette. This constrains the target sequence space, since cleavage of the target locus must occur within the region covered by the homology arms while leaving enough sequence for recombination. In addition, insertion of the cassette needs to destroy the crRNA target site in order to prevent recleavage of the locus (also see legend to Fig. 4 g). For C-terminal protein tagging, these criteria confine potentially useful PAM sites to a window extending 17 nt on either side of the STOP codon (including the STOP codon itself), with the PAM site or protospacer sequence overlapping the STOP codon (Fig. 7 a). So far, we have used Cas12a from Lachnospiraceae bacterium ND2006 (LbCas12a; Zetsche et al., 2015), but PAM sites that are recognized by this Cas12a (TTTV; Gao et al., 2017) and that are located in this region of a gene are relatively infrequent; they would allow C-terminal tagging of only about one third of all human genes (Fig. 7 b). To increase this number, we first tested Cas12a variants with altered PAM specificities (Gao et al., 2017). The results demonstrated that other variants and PAM sites are also functional and can be used for PCR tagging (Fig. 7 c). Considering these and additional enCas12a variants (Gao et al., 2017; Kim et al., 2016; Tóth et al., 2018; Kleinstiver et al., 2019; Sanson et al., 2019, Preprint) renders ∼97% of all human genes accessible for C-terminal PCR tagging (Fig. 7 b). To further increase the number of suitable PAM sites for any single Cas12a variant, we extended the search space into the 3′ UTR (typically 50 nt; Fig. 7 a) and adjusted the design of the M2 tagging oligo such that a small deletion removes the binding site of the crRNA. Since tagging introduces a generic terminator for proper termination of the tagged gene, this small deletion is unlikely to have an impact on the tagged gene.
Considering the extended search space and the currently available palette of Cas12a variants, we calculated that close to 100% of all human ORFs (Fig. 7 b) are amenable to C-terminal PCR tagging.
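The PAM-site constraint for C-terminal tagging can be sketched for the TTTV PAM of LbCas12a. This is a minimal illustration, not the search implemented on the authors' server: it assumes a 20-nt protospacer, requires the PAM to lie within 17 nt on either side of the STOP codon, and accepts a site only if the PAM or protospacer overlaps the STOP codon, following the criteria described above. The 3′ UTR extension and other Cas12a variants are not covered, and all names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def find_cas12a_sites(seq: str, stop_index: int, flank: int = 17,
                      spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, PAM start on top strand, spacer 5'->3') for usable sites."""
    win = (max(0, stop_index - flank), min(len(seq), stop_index + 3 + flank))
    stop = (stop_index, stop_index + 3)

    def usable(pam, proto):
        in_window = win[0] <= pam[0] and pam[1] <= win[1]
        return in_window and (overlaps(pam, stop) or overlaps(proto, stop))

    sites = []
    # '+' strand: 5'-TTTV-[protospacer]-3' on the top strand (V = A/C/G).
    for m in re.finditer(r"(?=TTT[ACG])", seq):
        p = m.start()
        pam, proto = (p, p + 4), (p + 4, p + 4 + spacer_len)
        if proto[1] <= len(seq) and usable(pam, proto):
            sites.append(("+", p, seq[proto[0]:proto[1]]))
    # '-' strand: TTTV on the bottom strand reads [protospacer]-BAAA on top.
    for m in re.finditer(r"(?=[CGT]AAA)", seq):
        p = m.start()
        pam, proto = (p, p + 4), (p - spacer_len, p)
        if proto[0] >= 0 and usable(pam, proto):
            sites.append(("-", p, revcomp(seq[proto[0]:proto[1]])))
    return sites
```

The lookahead patterns let overlapping PAM candidates (for example within a TTTTA run) be reported, and the spacer returned for the minus strand is already reverse-complemented to the PAM-containing strand.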
PCR tagging toolkit for mammalian cells
To further facilitate application of mammalian PCR tagging, e.g., for quick C-terminal fluorescent protein labeling, we set up a webpage for oligo design (Fig. 1 a). The online tool (www.pcr-tagging.com; Fueller et al., 2019) requires as input the Ensembl transcript ID (www.ensembl.org) of the target gene. Alternatively, the genomic DNA (gDNA) sequence around the desired insertion site, i.e., the STOP codon of the gene of interest for C-terminal tagging, can be provided. The software then generates the sequence of the M1 tagging oligo, which specifies the junction between the gene and the tag. Next, the software identifies all PAM sites for the available Cas12a variants, uses these to generate crRNA sequences, and assembles the corresponding M2 tagging oligos. M2 tagging oligos are designed such that integration of the PCR cassette disrupts the crRNA binding site or PAM site, preventing recleavage of the locus. M2 tagging oligos are then ranked based on the quality of the PAM site and the presence of motifs that might interfere with crRNA synthesis or function. M1/M2 tagging oligos can be used with template plasmids based on different backbones: either without a marker or with Zeocin or Puromycin resistance genes (Fig. 8 a). We generated a series of template plasmids containing different state-of-the-art reporter genes (Table 1; examples shown in Fig. 8 b).
Ongoing efforts aim to improve crRNA prediction and to eliminate crRNAs with potential off-target binding activity. The current version of the server already allows us to flexibly add novel Cas12a variants by adjusting the PAM site specificity and the sequence of the corresponding constant region of the crRNA.
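A ranking step of the kind described above might look like the following sketch. The scoring is an assumption, not the server's actual scheme: PAM scores are supplied by the caller, designs whose PAM plus protospacer would survive intact in the edited allele are discarded to avoid recleavage, and a run of four T in the spacer, which can prematurely terminate Pol III transcription of the crRNA, is penalized. `Candidate`, `rank_candidates`, and the example sequences are hypothetical.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Candidate:
    name: str    # hypothetical identifier for one M2 oligo design
    pam: str     # PAM read 5'->3' on the PAM-containing strand
    spacer: str  # protospacer on the same strand


def rank_candidates(candidates: list[Candidate], pam_scores: dict[str, float],
                    edited_locus: str, terminator_penalty: float = 1.0) -> list[str]:
    """Return candidate names, best first, after removing recleavable designs."""
    ranked = []
    for c in candidates:
        target = c.pam + c.spacer
        # Discard designs whose target would remain intact after integration,
        # since the edited allele could then be cut again.
        if target in edited_locus or revcomp(target) in edited_locus:
            continue
        score = pam_scores.get(c.pam, 0.0)
        # A T4 run inside the spacer may terminate Pol III transcription early.
        if "TTTT" in c.spacer:
            score -= terminator_penalty
        ranked.append((score, c.name))
    # Highest score first; ties broken by name for a stable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in ranked]
```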
In conclusion, Cas12a-mediated PCR tagging of mammalian genes using short homology arms is a rapid, robust, and flexible method enabling endogenous gene tagging. The versatility of the method suggests many types of applications for functional or analytical gene and protein studies in mammalian cells.
In this paper, we demonstrate efficient targeted integration of DNA fragments several kilobase pairs in size into the genome of mammalian cells, guided by short homology arms (<100 bp). Integration is assisted by CRISPR-Cas12a and a crRNA that is expressed from the DNA fragment itself. This enables a PCR-only strategy for producing the gene-specific reagents for tagging. In addition, we present a software tool for oligo design and establish streamlined procedures for application in several cell lines.
PCR tagging can be easily scaled up and parallelized since it needs only two oligos per gene. In yeast, where PCR tagging is very efficient even in the absence of targeted DSB induction, the ease of upscaling permitted the creation of many types of genome-wide resources. In these, all genes were modified in the same manner, i.e., by gene deletion or by tagging with a fluorescent protein or affinity tag (Gavin et al., 2002; Ghaemmaghami et al., 2003; Huh et al., 2003; Meurer et al., 2018; Winzeler et al., 1999). We believe that in mammalian cell culture, similar endeavors are now within reach using the approach presented in this study.
Tagging efficiency might be influenced by various factors, including chromatin structure and expression level. Our choice of relatively highly expressed genes as convenient reporters to validate and investigate the method might bias the observed efficiency. While a genome-wide analysis of Cas12a-assisted tagging efficiency in yeast did not reveal a correlation between expression level and tagging efficiency (Buchmuller et al., 2019), further experiments will be needed to determine whether this also holds in mammalian cells.
The use of tagged genes always raises the question of whether the tag fusion is functional. Two questions matter here: How does tagging affect gene regulation, and how does it affect protein function? Many aspects of protein tagging have been discussed in the literature, e.g., from functional or structural points of view. Ultimately, however, one has to be aware that a cell expressing a tagged gene is a mutant and that the tag does not necessarily report correctly on the behavior of the untagged protein (Lundberg and Borner, 2019). As part of good laboratory practice, this calls for phenotypic analyses to investigate the functionality of the tagged gene/protein and/or orthogonal experiments to independently validate the conclusions derived from the tagged clone(s). In haploid yeasts, a genome-wide analysis of the influence of a C-terminal tag revealed that >95% of the ∼1,000 essential yeast genes, when endogenously tagged with a large tag such as a fluorescent protein reporter, retain enough functionality not to cause an obvious growth phenotype under standard growth conditions (Khmelinskii et al., 2014).
Various methods for gene tagging with long DNA fragments in mammalian cells have been developed, including methods tailored to particular DNA repair pathways, such as c-NHEJ or HR, that repair DSBs induced by CRISPR-Cas9 or other endonucleases. In all cases, the heterologous sequence to be inserted needs to be provided as a circular or linear repair template, generated either ex vivo or in vivo by endonuclease excision of the repair template (Agudelo et al., 2017, Preprint; He et al., 2016; Lackner et al., 2015; Merkle et al., 2015; Suzuki et al., 2016; Zhang et al., 2017; Zhu et al., 2015; Roberts et al., 2017; Chen et al., 2018). In nongermline cells, the precision of HR-mediated insertion is often limited by the coexistence of alternative repair pathways, and errors such as small indels are frequently observed at one or the other side of the inserted fragment. Therefore, a substantial number of clones needs to be screened to obtain a few correct ones (Koch et al., 2018). PCR tagging does not generate seamlessly integrated tags, since the tag is accompanied by a generic transcription termination site that replaces the endogenous 3′ UTR. This actually has the advantage of reducing the errors associated with tag insertion: an erroneous integration at the downstream end of the PCR cassette, e.g., one caused by c-NHEJ instead of HR, will only affect the 3′ UTR of the gene, which is not used by the tagged allele. Obviously, this constitutes a compromise and includes the possibility that important gene regulatory sequences, e.g., miRNA binding sites, targeting motifs, or sequences regulating mRNA stability, are omitted from the tagged gene. While no global dataset on the regulatory impact of the 3′ UTR on gene expression is available for mammalian cells, data from yeast, where seamless tagging was compared with tagging using a generic 3′ UTR, demonstrated that the expression of only ∼11% of the genes was altered more than twofold (Meurer et al., 2018).
Based on our detailed analysis of three genes in Fig. 4 c and Fig. 5 a, we conclude that c-NHEJ is not the dominant repair outcome and that enriched populations containing the correct gene fusions are easy to obtain. Because enriched populations are composed of many different clones, they can be used for a rapid assessment of experimental questions by simply scoring multiple cells, for example, the localization of one protein, or even the colocalization of two proteins, upon endogenous expression in a specific condition, environment, or cell line. Since the cells are derived from different clones, clone-specific effects can be spotted rapidly and taken into account in the analysis. This avoids the need for perfectly characterized cell lines carrying exactly the intended genomic modification and saves considerable time.
DSB induction at off-target locations by Cas12a in vivo has been investigated extensively and found to be reduced compared with canonical Cas9 variants (Kleinstiver et al., 2016; Kim et al., 2016, 2017). In agreement with this, our unbiased targeted next-generation sequencing experiment using Anchor-Seq (Fig. 4 h) detected no obvious Cas12a-related off-target activity. Nevertheless, we observed random integrations of the PCR cassettes, which are common when foreign DNA is introduced into mammalian cells (Folger et al., 1982; Saito et al., 2017). Analysis of multiple independent clonal cell lines helps to exclude unwanted effects of such off-target integrations.
The toolset available for PCR tagging can easily be expanded by constructing new template plasmids. Maintaining a certain level of standardization, such as preserving the primer annealing sites for the M1 and M2 tagging oligos in new template cassettes, makes it possible to reuse already purchased M1/M2 tagging oligos for the same gene in many different tagging experiments. We recommend the use of chemically modified M1 and M2 primers (e.g., with 5S and biotin), as we noticed a considerable enhancement in tagging efficiency.
Further improvements in tagging efficiency might be possible, e.g., by targeting the repair template to the CRISPR endonuclease cut site (Roy et al., 2018) or by using Cas12a variants that are only active in the S/G2 phase of the cell cycle (Smirnikhina et al., 2019).
Beyond mammalian cells, this strategy could also improve tagging methodology in other species, e.g., in the many fungal species that require a DNA DSB for targeted integration of a foreign DNA fragment.
In conclusion, PCR-mediated C-terminal gene tagging is a simple, noncommercial, easily adoptable method for exploratory studies of protein localization or other functional aspects using endogenous-level expression. The oligos are simple to design (www.pcr-tagging.com; Fueller et al., 2019), all other resources are openly available (via Addgene or colleagues), and reagents can be freely exchanged. We believe that for many applications, PCR tagging is quicker than the construction of a plasmid for transient transfection or exogenous chromosomal integration.
With PCR tagging at hand, many different and exciting experimental avenues are becoming possible, from the rapid assessment of protein localizations to high-throughput localization studies of many proteins.
Materials and methods
Throughout the manuscript and in Table 1, we use the following terms in a consistent manner in order to denote the different components and processes:
Plasmids and oligos
Construction of template cassettes
For cloning, standard restriction enzyme digests or oligo annealing and ligation using enzymes from NEB were used. Most elements of the template cassettes (M1-mNeonGreen-SV40polyA-ZeocinR-BGHpolyA-hU6promoter) were custom synthesized (gBlock, IDT) and cloned via BsiWI and XbaI into a BsiWI- and SpeI-cut pFA6a backbone. The SV40 promoter was cloned separately into the cassette via SalI and EcoRI, since it contains repeats and could not be synthesized together with the other elements. In addition to the ZeocinR marker, we also introduced a PuromycinR marker. Because the standard DNA sequence of this marker is very GC-rich and difficult to amplify by PCR, we synthesized a new version with lower GC content and cloned it via EcoRI and PstI into the cassette. To obtain a cassette without a marker, the SV40promoter-ZeocinR-BGHpolyA sequence was removed by digestion with SalI and XhoI and subsequent religation of the backbone. This resulted in three different plasmids based on the pFA6a backbone (see Fig. 8 a).
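GC-rich stretches that hamper PCR, as in the standard PuromycinR sequence, can be located with a sliding-window GC calculation. This is a minimal sketch; the window size (50 nt), step (10 nt), and 70% threshold are illustrative assumptions, not values used in this study.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def high_gc_windows(seq: str, window: int = 50, step: int = 10,
                    threshold: float = 0.7) -> list[int]:
    """Start positions of sliding windows whose GC fraction exceeds threshold."""
    return [start for start in range(0, len(seq) - window + 1, step)
            if gc_fraction(seq[start:start + window]) > threshold]
```

A recoded marker sequence would be expected to yield fewer flagged windows than the original while encoding the same protein.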
The mNeonGreen ORF of these template plasmids is flanked by unique restriction sites and is therefore easily exchangeable. New tags can be introduced using the BamHI and SpeI sites. For high cloning flexibility, the sticky ends of these two sites are compatible with the sticky ends produced by other enzymes (BclI/BglII and AvrII/NheI/XbaI, respectively).
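Why these enzyme pairs can be combined follows from the 5′ overhangs they leave. The sketch below derives the overhang from each palindromic recognition site and its top-strand cut position (standard values entered by hand) and treats two ends as compatible when one overhang is the reverse complement of the other; the helper names are hypothetical.

```python
# Palindromic recognition sites and top-strand cut positions
# (the enzyme cuts after this many bases)
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BclI": ("TGATCA", 1),
    "BglII": ("AGATCT", 1),
    "SpeI": ("ACTAGT", 1),
    "AvrII": ("CCTAGG", 1),
    "NheI": ("GCTAGC", 1),
    "XbaI": ("TCTAGA", 1),
}


def revcomp(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overhang(enzyme: str) -> str:
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, cut = ENZYMES[enzyme]
    return site[cut:len(site) - cut]


def compatible(a: str, b: str) -> bool:
    """5' overhangs anneal when one is the reverse complement of the other."""
    return overhang(a) == revcomp(overhang(b))
```

Ligating compatible ends from different enzymes generally creates a hybrid site that neither parent enzyme recognizes; for example, a BamHI end joined to a BglII end gives GGATCT.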
All tags listed in Table 1 and Table S2 were cloned either by amplification from template plasmids with oligos containing restriction sites or by annealing of two oligos, and were ligated into BamHI/SpeI-cut backbones of pMaM523/526/541 (for details see Table S2) to yield template cassettes called pMaCTag (plasmid for mammalian C-terminal tagging) with the following naming scheme: pMaCTag-xy: tag xy, no marker, pMaM526 backbone; pMaCTag-Zxy: tag xy, ZeocinR marker, pMaM523 backbone; and pMaCTag-Pxy: tag xy, PuromycinR marker, pMaM541 backbone.
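The naming scheme can be expressed as a simple lookup. In this sketch, "xy" stands for the tag identifier, as in the text, and the function name is hypothetical.

```python
from typing import Optional

# marker -> (name prefix, backbone), as listed in the naming scheme above
NAMING = {
    None: ("", "pMaM526"),
    "ZeocinR": ("Z", "pMaM523"),
    "PuromycinR": ("P", "pMaM541"),
}


def template_cassette(tag: str, marker: Optional[str] = None) -> tuple[str, str]:
    """Return (plasmid name, backbone) for a tag and an optional selection marker."""
    prefix, backbone = NAMING[marker]
    return f"pMaCTag-{prefix}{tag}", backbone
```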
M1 and M2 tagging oligo design
The online oligo design tool (www.pcr-tagging.com; Fueller et al., 2019) was implemented using Shiny. The interactive web application was developed in R v3.4.4 (R Core Team, 2014) with the R packages shiny v1.1.0 (Chang et al., 2018) and shinyjs v1.0 (Attali, 2017). The R package Biostrings v2.46.0 (Pagès et al., 2018) is used for searching PAM sites. The latest code is available from our GitHub repository (https://github.com/knoplab/mammalian_PCR_tagging_oligo_design_tool). Oligo design principles are as follows.
M1 tagging oligo
The design of the M1 tagging oligo is straightforward, as it contains only two functional elements: the primer annealing site for PCR, which is constant in all template cassettes (5′-TCAGGTGGAGGAGGTAGTG-3′), and the sequence of the homology arm, which is derived from the target locus.
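A minimal sketch of M1 assembly follows. It assumes that the homology arm is the genomic sequence immediately 5′ of the STOP codon (STOP excluded), so that the cassette is joined directly after the last sense codon. The default arm length of 55 nt is an illustrative assumption (the text states only that arms are <100 bp), and m1_oligo is a hypothetical helper, not the web tool's code.

```python
M1_ANNEAL = "TCAGGTGGAGGAGGTAGTG"  # constant annealing site (from the text)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def m1_oligo(genomic: str, stop_start: int, arm_len: int = 55) -> str:
    """Sketch: 5' homology arm (bases directly upstream of STOP) + annealing site.

    genomic: sense-strand gDNA around the insertion site
    stop_start: 0-based index of the first base of the STOP codon
    arm_len: assumed homology arm length
    """
    g = genomic.upper()
    if g[stop_start:stop_start + 3] not in STOP_CODONS:
        raise ValueError("no STOP codon at stop_start")
    if stop_start < arm_len:
        raise ValueError("not enough sequence upstream of the STOP codon")
    return g[stop_start - arm_len:stop_start] + M1_ANNEAL
```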
Example: M1 tagging oligo (for TOMM70)
M2 tagging oligo
The design of the M2 tagging oligo is more complex. It contains the annealing site for PCR (5′-GCTAGCTGCATCGGTACC-3′), the direct repeat sequence of the crRNA, which is specific to the Cas12a variant, the protospacer sequence of the crRNA, which depends on the PAM sites available at the target locus, a terminator for RNA polymerase III (Pol III), and the homology arm. The crRNA is selected according to the criteria outlined below.
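The element order can be sketched as follows. This assumes that, in the sense orientation of the cassette, the direct repeat, protospacer, Pol III terminator, and downstream homology arm follow the U6 promoter in that order, and that the annealing sequence given above is written as it appears at the 3′ end of the M2 oligo. The direct repeat (as DNA) must be supplied for the chosen Cas12a variant, the TTTTTT terminator is an assumed example, and m2_oligo is a hypothetical helper.

```python
M2_ANNEAL = "GCTAGCTGCATCGGTACC"  # from the text; assumed to be in oligo orientation


def revcomp(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def m2_oligo(downstream_arm: str, protospacer: str, direct_repeat: str,
             terminator: str = "TTTTTT", anneal: str = M2_ANNEAL) -> str:
    """Sketch of a 5'->3' M2 tagging oligo (reverse primer).

    Assumed sense-strand order in the cassette:
    ...U6 promoter | direct repeat | protospacer | terminator | downstream arm
    The oligo is the reverse complement of the crRNA unit plus arm,
    followed by the annealing site at its 3' end.
    """
    crrna_unit = direct_repeat + protospacer + terminator + downstream_arm
    return revcomp(crrna_unit) + anneal
```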
Example: M2 tagging oligo (for TOMM70)
(1) The crRNA binding site should be located in a region of the genome where it is destroyed upon cassette integration, to prevent recleavage. It can lie on either side of, and in close proximity to, the insertion site (within 17 nt up- or downstream). If no suitable crRNA binding site is found in this confined search space, the software offers the option to select PAM sites in the region 3′ of the insertion site (extended search space). In this case, the homology arm of the M2 tagging oligo is adjusted such that the target site of the crRNA is deleted. This results in a small deletion in the 3′ UTR of the gene downstream of the cassette insertion site. Since the PCR cassette contains a transcriptional terminator, we consider this noncritical. With these criteria, suitable crRNAs can be designed for C-terminal tagging of the vast majority of mammalian genes (Fig. 7 b).
(2) The protospacer sequence should preferably not contain four or more consecutive Ts, since these might cause premature termination of Pol III transcription of the crRNA (Arimbasseri et al., 2013). In practice, however, we observed that crRNAs containing TTTT are frequently functional.
(3) PAM sites are ranked according to the literature (Gao et al., 2017; Kim et al., 2016; Tóth et al., 2018; Kleinstiver et al., 2019; Sanson et al., 2019, Preprint). In addition, unconventional PAM sites were considered (MCCC for the AsCas12a RR variant and RATR for the LbCas12a RVR variant), based on depositor comments on the Addgene webpage. When ranking crRNAs, conventional PAM sites are preferred.
(4) If multiple crRNAs fulfill these criteria, they are ranked according to the position of the cleavage site, with a preference for a greater distance downstream of the STOP codon.
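Criteria (1)–(4) can be sketched as a filter followed by a sort. The Candidate fields, the treatment of TTTT as a soft penalty, and the precedence of the sort keys are illustrative assumptions, not the tool's exact scoring. Candidates outside the 17-nt window would instead be handled by the extended search, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    protospacer: str
    conventional_pam: bool
    site_offset: int      # distance (nt) of the binding site from the insertion site
    cut_after_stop: int   # cleavage position relative to the STOP codon (nt)


def rank_crrnas(candidates: list[Candidate], window: int = 17) -> list[Candidate]:
    # (1) keep binding sites close enough to be destroyed by cassette integration
    usable = [c for c in candidates if abs(c.site_offset) <= window]
    # (3) conventional PAM first, then (2) TTTT-free, then (4) larger distance after STOP
    return sorted(usable, key=lambda c: (not c.conventional_pam,
                                         "TTTT" in c.protospacer,
                                         -c.cut_after_stop))
```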
Synthesis of M1 and M2 tagging oligo
All M1 and M2 tagging oligos were obtained from Sigma-Aldrich at a 0.05-µmol synthesis scale and were RP1 cartridge purified, unless otherwise stated (as in Fig. 3 c).
PCR of template cassettes using M1 and M2 tagging oligos
PCR using long oligos is not always straightforward and requires optimized protocols. We routinely use a self-purified DNA polymerase for PCR (Pfu-Sso7d; Wang et al., 2004). Alternatively, commercial high-fidelity polymerases can be used for cassette PCR. We have tested Phusion (Thermo Fisher Scientific) and Velocity polymerase (Bioline). We note that Phusion polymerase does not work for PCR cassette amplification with M1 and M2 tagging oligos when used with the buffer provided by the manufacturer, whereas good yields are obtained with our buffer. Velocity polymerase works with the buffer provided by the manufacturer.
We found that all polymerases work well using the buffer conditions and amplification scheme shown below, yielding similar amounts of PCR cassette.
For the PCR mixture, the following were used: 5.0 µl of 10× HiFi-buffer (200 mM Tris-HCl, pH 8.8, 100 mM (NH4)2SO4, 500 mM KCl, 1% [vol/vol] Triton X-100, 1 mg/ml BSA, and 20 mM MgCl2); 5.0 µl of deoxynucleotides (10 mM stock; Bioline, BIO-39026); 1.0 µl of MgCl2 (50 mM stock); 5.0 µl of betaine (5 M stock; Sigma-Aldrich, 61962); 0.3 µl of template DNA (200 ng/µl stock); 2.5 µl of M1 tagging oligo (10 µM stock); 2.5 µl of M2 tagging oligo (10 µM stock); x µl of H2O up to 50 µl; and 1 µl self-purified DNA polymerase (1 U/µl), 0.5 µl Phusion, or 0.25 µl Velocity polymerase.
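The volumes above can be checked with a simple C1V1 = C2V2 calculation. The sketch below, with hypothetical helper names, computes final concentrations, the total MgCl2 (using the stated 20 mM MgCl2 in the 10× buffer), and the water needed to reach 50 µl for each polymerase.

```python
TOTAL_UL = 50.0

# component: (stock concentration, volume in µl), taken from the recipe above
RECIPE = {
    "10x HiFi buffer (x)": (10.0, 5.0),
    "dNTPs (mM, as labeled)": (10.0, 5.0),
    "MgCl2 supplement (mM)": (50.0, 1.0),
    "betaine (M)": (5.0, 5.0),
    "template (ng/µl)": (200.0, 0.3),
    "M1 oligo (µM)": (10.0, 2.5),
    "M2 oligo (µM)": (10.0, 2.5),
}
POLYMERASE_UL = {"Pfu-Sso7d": 1.0, "Phusion": 0.5, "Velocity": 0.25}


def final_conc(stock: float, volume_ul: float, total_ul: float = TOTAL_UL) -> float:
    """C2 = C1 * V1 / V2."""
    return stock * volume_ul / total_ul


def water_ul(polymerase: str) -> float:
    used = sum(v for _, v in RECIPE.values()) + POLYMERASE_UL[polymerase]
    return TOTAL_UL - used


def total_mgcl2_mM() -> float:
    # 20 mM in the 10x buffer gives 2 mM at 1x, plus the 50 mM supplement
    return final_conc(20.0, 5.0) + final_conc(*RECIPE["MgCl2 supplement (mM)"])


for name, (stock, vol) in RECIPE.items():
    print(f"{name}: {final_conc(stock, vol):g}")
print(f"total MgCl2: {total_mgcl2_mM():g} mM")
for pol in POLYMERASE_UL:
    print(f"water with {pol}: {water_ul(pol):.2f} µl")
```

With these numbers, each oligo ends at 0.5 µM, betaine at 0.5 M, total MgCl2 at 3 mM, and the template at 60 ng per reaction; with 1 µl of self-purified polymerase, 27.7 µl of water is needed.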
The PCR mixture was assembled on ice, and PCR was performed in a Biometra TRIO (Analytik Jena) using the following program: 3 min at 95°C; 30 cycles of 20 s at 95°C, 30 s at 64°C, and 45 s per kb at 72°C (see Table S2); 5 min at 72°C; and hold at 4°C.
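Because the elongation step scales with cassette length (45 s per kb), the program can be generated from the cassette size. In this sketch, the function names are hypothetical, rounding up to whole seconds is an assumption, and extra_s can carry the additional 2 min suggested in the troubleshooting note.

```python
import math


def elongation_s(cassette_bp: int, s_per_kb: float = 45.0, extra_s: int = 0) -> int:
    """Elongation time in seconds, rounded up to whole seconds."""
    return math.ceil(cassette_bp / 1000 * s_per_kb) + extra_s


def pcr_program(cassette_bp: int, extra_s: int = 0) -> list[tuple]:
    """Steps as (name, temperature in °C, duration in s or None for hold, cycles)."""
    return [
        ("initial denaturation", 95, 180, 1),
        ("denaturation", 95, 20, 30),
        ("annealing", 64, 30, 30),
        ("elongation", 72, elongation_s(cassette_bp, extra_s=extra_s), 30),
        ("final elongation", 72, 300, 1),
        ("hold", 4, None, 1),
    ]
```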
After PCR, 0.4 µl DpnI, or 0.4 µl FspEI together with 1.67 µl enzyme activator, was added to the reaction mixture and incubated at 37°C for 1 h to digest the methylated plasmid template, which contains a selection marker and would otherwise contaminate the transfection.
PCR products were analyzed by agarose gel electrophoresis and purified using column purification (Macherey-Nagel).
Note: occasionally, a particular pair of oligos does not yield a PCR product. In this case, it is worth testing whether adding 2 min to the calculated elongation time solves the problem. If not, synthesis of one of the primers may have failed. The faulty primer can be identified by pairwise PCR with established M1 and M2 primers. Usually, reordering the same primer solves the problem, and providers may waive the cost of reordering.
Preparation of gDNA
gDNA for experiments shown in all figures except Fig. 4 was isolated from HEK293T cells using a protocol adapted from Greene and Sambrook (2012). After washing with PBS, confluent cells from one well of a 6-well plate were lysed in 600 µl SNET buffer (20 mM Tris, pH 8.0, 400 mM NaCl, 5 mM EDTA, pH 8.0, and 1% SDS), and 2 µl RNase A (10 mg/ml RNase A, 10 mM Tris-HCl, pH 8.0, and 10 mM MgCl2) was added for 30 min at room temperature. Proteinase K (20 mg/ml proteinase K, 50 mM Tris-HCl, pH 8.0, 1.5 mM CaCl2, and 50% glycerol) was then added for another 30 min at room temperature. Proteins were precipitated with 200 µl of 3 M K-acetate solution, followed by precipitation of the DNA with isopropanol and washing with 70% ethanol. The DNA was dried and dissolved in TE buffer (10 mM Tris and 1 mM EDTA). gDNA for experiments shown in Fig. 4 was purified using the High Pure PCR Template Preparation Kit (Roche) according to the manufacturer's instructions, followed by RNase A digestion and a final purification with the High Pure PCR Product Purification Kit (Roche).
Targeted next-generation sequencing of tagged and wild-type alleles
Tag- and wild type–specific amplicons from cells used in the experiments shown in Fig. 4 were generated from 200 ng gDNA using junction-specific primers (Table S4) by a two-step nested PCR with Velocity polymerase (Bioline). The first PCR was performed for 15 cycles with 60°C annealing and 30 s elongation, and the product was purified with AMPure XP beads (Beckman Coulter). The second PCR was performed for 15 cycles for wild type–specific amplicons and 21 cycles for tag-specific amplicons, using 60°C annealing and 30 s elongation. PCR products were size selected by gel electrophoresis on 2% agarose/TAE and gel extracted by column purification (Macherey-Nagel). Amplicons were sequenced paired-end with 500 cycles on a MiSeq system (Illumina) using the Amplicon-EZ (150–500 bp) service by Genewiz, yielding at least 13,123 reads per sample. Paired reads were merged and aligned to the respective expected amplicon references using CRISPResso (v2.0.29; Kleinstiver et al., 2019) with the parameters “cleavage_offset” set to 1 and “window_around_sgrna” set to 0. Mutations were subsequently quantified using a custom R script that excluded primer binding sites from the analysis.
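The custom quantification script is not reproduced here; the sketch below only illustrates the idea of calling a read modified when it differs from the reference outside masked primer-binding positions. The input format (pairs of equal-length, gapped alignment strings) and the function names are assumptions, and insertions are attributed to the preceding reference base.

```python
def is_modified(ref_aln: str, read_aln: str, masked: set[int]) -> bool:
    """True if the read has a mismatch, deletion, or insertion outside masked positions.

    ref_aln/read_aln: equal-length gapped alignment strings ('-' = gap).
    masked: 0-based reference positions to ignore (primer binding sites).
    """
    ref_pos = -1
    for r, q in zip(ref_aln, read_aln):
        if r != "-":
            ref_pos += 1  # insertions keep the index of the preceding base
        if r != q and ref_pos not in masked:
            return True
    return False


def modified_fraction(alignments, primer_spans):
    """Fraction of reads modified outside primer spans given as half-open (start, end)."""
    masked = {i for start, end in primer_spans for i in range(start, end)}
    return sum(is_modified(r, q, masked) for r, q in alignments) / len(alignments)
```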
Next-generation sequencing of gDNA with Anchor-Seq
Sequencing libraries for the experiments determining cassette junction sites presented in Fig. 2 were prepared based on our previously published Anchor-Seq protocol (Meurer et al., 2018), with modifications to the adapter design to include unique molecular identifiers (UMIs; Table S4; Buchmuller et al., 2019). Quantified libraries were sequenced paired-end with 300 cycles on a NextSeq 550 sequencing system (Illumina) with a spike-in of 20% phiX gDNA library (Illumina). Technical sequences (adapter and cassette sequences) were trimmed from raw reads using custom scripts (Julia v0.6.0 and BioSequences v0.8.0). The trimmed reads were aligned to the human reference genome (Genome Reference Consortium Human Build 38 for alignment pipelines, ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/) using bowtie2 (v18.104.22.168; Langmead and Salzberg, 2012). Template cassette sequences were included in the reference genome as decoys. Aligned reads were grouped with UMI-tools (Smith et al., 2017) based on the UMIs included in the Anchor-Seq adapters. Enriched integration sites were further evaluated and counted using IGV (v2.4.10; Robinson et al., 2011).
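The UMI-based counting step can be illustrated with a simplified sketch that counts distinct UMIs per mapped position and strand. Unlike UMI-tools, which by default also merges UMIs that likely differ only by sequencing errors (directional method), this sketch assumes exact UMI matches; the tuple input format and function name are hypothetical.

```python
from collections import defaultdict


def molecules_per_site(reads):
    """reads: iterable of (chrom, pos, strand, umi); returns {(chrom, pos, strand): n_umis}."""
    umis = defaultdict(set)
    for chrom, pos, strand, umi in reads:
        umis[(chrom, pos, strand)].add(umi)
    return {site: len(u) for site, u in umis.items()}
```

Counting molecules rather than reads keeps PCR duplicates from inflating the apparent enrichment of an integration site.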
Sequencing libraries for mapping the genomic sites of off-target integrations presented in Fig. 4 were prepared by a modified Anchor-Seq protocol using tagmentation instead of sonication for gDNA fragmentation (Picelli et al., 2014). In detail, 100 ng/µl Tn5(E54K,L372P) transposase (purified according to Hennig et al., 2018) was loaded with 1.25 µM annealed adapters (P5-UMI-gri501…506-ME.fw, Tn5hY-Rd2-Wat-SC3) in 50 mM Tris-HCl (pH 7.5) by incubating the reaction for 1 h at 23°C. Tagmentation reactions were prepared by mixing loaded transposase with 1 µg gDNA and tagmentation buffer (10 mM Tris-HCl, pH 7.5, 10 mM MgCl2, and 25% [vol/vol] dimethylformamide) and incubated for 10 min at 55°C. For our batch of Tn5 transposase, we achieved reasonable tagmentation using an enzyme/gDNA mass ratio of 0.75. Tagmentation reactions were purified with AMPure XP beads (Beckman Coulter) according to the manufacturer's instructions. Total eluates were used as input for a first PCR with cassette- and Tn5 adapter–specific primers (5Btn-hmNeong.rv, P5.fw) using NEBNext Q5 Hot Start polymerase (New England BioLabs) for 15 cycles with 68°C annealing and 1 min elongation. Biotinylated amplicons were first purified by column purification (Macherey-Nagel) and then enriched using Dynabeads MyOne Streptavidin C1 beads (Invitrogen) according to the manufacturer’s protocol. These beads were then used as input for a second PCR with cassette- and Tn5 adapter–specific primers (P7-gri701…706-hmNeong.rv, P5.fw) using NEBNext Q5 Hot Start polymerase (New England BioLabs) for 25 cycles with 68°C annealing and 1 min elongation. PCR products were size selected for 400–550 bp using a 2% agarose/TAE gel and column purification (Macherey-Nagel). Libraries were sequenced as above. Raw reads were trimmed as described above but aligned with bwa mem (v0.7.17-r1188; Li, 2013, Preprint) to the human reference genome supplemented with the PCR cassette sequences.
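Scaling the tagmentation reaction to other gDNA inputs follows directly from the enzyme/gDNA mass ratio. This sketch assumes that the 100 ng/µl transposase concentration stated above also applies to the loaded enzyme; as noted, the ratio is batch specific, and the function name is hypothetical.

```python
def transposase_ul(gdna_ng: float, mass_ratio: float = 0.75,
                   transposase_ng_per_ul: float = 100.0) -> float:
    """Volume of loaded Tn5 (µl) giving the requested enzyme/gDNA mass ratio."""
    return gdna_ng * mass_ratio / transposase_ng_per_ul
```

For 1 µg gDNA, this gives 7.5 µl (750 ng) of loaded transposase.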
Mapped insertion sites were summarized by a custom R script and further evaluated and counted using IGV (v2.4.10; Thorvaldsdóttir et al., 2013).
In vitro transcription (IVT) of LbCas12a mRNA
Template for IVT of LbCas12a mRNA was amplified from pY016 with primers (CMV-fw, bGH_polyA_IVT.rv; Table S4) using self-purified DNA polymerase. The PCR product was column purified using the Monarch PCR & DNA Cleanup Kit (New England BioLabs). The IVT reaction, including DNaseI digestion, was performed with the mMESSAGE mMACHINE T7 Transcription Kit (Invitrogen) according to the manufacturer’s instructions. After quality control by gel electrophoresis, the IVT product was purified by phenol-chloroform extraction and subsequent lithium acetate and isopropanol precipitation, and the mRNA was reconstituted in nuclease-free water.
Cell counting and fluorescence microscopy
For Fig. 6, b–d, cells were grown on coverslips (no. 1.5, Thermo Fisher Scientific), washed once with PBS, and fixed with 3% PFA for 10 min at 37°C. After fixation, coverslips were washed three times with PBS, incubated in PBS containing 0.1 µg/ml DAPI for 10 min, and embedded in Mowiol. Coverslips were coated with 0.1% gelatin type B (Sigma-Aldrich) or 0.2% gelatin type A (Sigma-Aldrich) for culturing C2C12 cells and mESCs, respectively. Images of RPE-1 and C2C12 cells were acquired as Z stacks using a Zeiss Axio Observer Z1 equipped with a 40× NA 1.3 PlanNeo oil immersion objective and an AxioCam MRm CCD camera, using ZEN software. Images of mESC colonies were acquired as Z stacks using a Nikon A1R confocal microscope equipped with a Nikon Plan Apo λ 20× NA 0.75 objective, using NIS-Elements software. Maximum intensity projections of the Z stacks were prepared using Fiji (Schindelin et al., 2012; Schneider et al., 2012).
For cell counting, random fields of view were inspected in the HOECHST/DAPI channel, and all nuclei present in the entire field of view were counted. Cells expressing fluorescent protein from the transfected cassettes were then counted in the same fields of view using the appropriate illumination wavelengths. In some experiments, counting was done on images recorded in the same manner.
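Tagging efficiency from such counts is the fraction of fluorescent cells among all counted nuclei. This sketch pools the counts over all fields of view rather than averaging per-field percentages, which is an assumption about how counts are combined; the function name is hypothetical.

```python
def tagging_efficiency(fields: list[tuple[int, int]]) -> float:
    """Percent fluorescent cells; fields holds (nuclei, fluorescent cells) per field of view."""
    nuclei = sum(n for n, _ in fields)
    positive = sum(p for _, p in fields)
    if nuclei == 0:
        raise ValueError("no nuclei counted")
    return 100.0 * positive / nuclei
```

Pooling weights each field by its number of nuclei, so sparsely populated fields do not dominate the estimate.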
For Fig. 5 c, images were taken with a Zeiss LSM 780 confocal microscope using a Plan-APOCHROMAT 63×, 1.40 NA oil objective (panels i–iii) or with a Leica DMi8 spinning-disk microscope using an HC PL APO 63×, 1.40 NA oil objective (panel iv).
For live-cell imaging in all other figures, cells were split into eight-well µ-slides (Ibidi) 24 h after transfection. Transfected cells were analyzed 3 d after transfection or as described in the figure legends. Cells were stained with Hoechst 33342 (4 µg/ml in PBS, Thermo Fisher Scientific) for 5 min, and the medium was then changed to FluoroBrite (Thermo Fisher Scientific) supplemented with 10% FBS (Gibco) and 20 mM Hepes-KOH, pH 7.4 (Thermo Fisher Scientific).
For counting and imaging, different microscopes were used: a Nikon Ti-E widefield epifluorescence microscope or a DeltaVision, each with 60× oil immersion objectives (1.49 NA, Nikon; 1.40 NA, DeltaVision). Z stacks of 11 planes with 0.5 µm spacing were recorded with 100 ms exposure time. Single-plane images and maximum intensity Z projections are shown. Subcellular localizations were identified and scored visually.
Cells were solubilized in SDS sample buffer (50 mM Tris-HCl, pH 6.8, 10 mM EDTA, 5% glycerol, 2% SDS, and 0.01% bromophenol blue) containing 5% β-mercaptoethanol. All samples were incubated for 15 min at 65°C. Denatured and fully reduced proteins were resolved by Tris-glycine SDS-PAGE, followed by Western blot analysis using the following antibodies: rat monoclonal anti-HA (11867423001; Roche), mouse monoclonal anti-V5 (V8012; Sigma-Aldrich), anti–S-tag mouse monoclonal antibody (MA1-981; Thermo Fisher Scientific), rabbit polyclonal anti-mNeonGreen tag (53061S; Cell Signaling), and rabbit anti-Calnexin (ab22595; Abcam).
hTERT-immortalized retinal pigment epithelial cells (RPE-1, ATCC, CRL-4000) were grown in DMEM/F12 (Sigma-Aldrich) supplemented with 10% FBS (Biochrom), 2 mM L-glutamine (Thermo Fisher Scientific), and 0.348% sodium bicarbonate (Sigma-Aldrich). Mouse myoblast C2C12 cells (gift from Edgar R. Gomes, Instituto de Medicina Molecular, Lisbon, Portugal) were grown in DMEM high glucose (Sigma-Aldrich) supplemented with 20% FBS (Biochrom). Mouse embryonic stem cell line E14 (gift from Frank van der Hoeven, DKFZ, Germany) was grown in knockout DMEM (Thermo Fisher Scientific) supplemented with 10% ESC-qualified FBS (Thermo Fisher Scientific), 2 mM GlutaMax (Thermo Fisher Scientific), 0.1 mM β-mercaptoethanol, and 10³ U murine leukemia inhibitory factor (ESGRO, Millipore). mESCs were grown under feeder-free conditions on 0.2% gelatin type B–coated dishes (Sigma-Aldrich).
HEK293T, HeLa, and U2OS cells were grown in DMEM high glucose (Life Technologies) supplemented with 10% (vol/vol) FBS (Gibco).
All cell lines were grown at 37°C with 5% CO2 and regularly screened for mycoplasma contamination.
Selection was performed using 1 µg/ml puromycin (Sigma-Aldrich) or 500 µg/ml Zeocin (Invitrogen) for HEK293T cells. For HeLa cells, 300 µg/ml Zeocin was used.
Transfection of HEK293T, HeLa, and U2OS cells was performed using Lipofectamine 2000 (Invitrogen) according to the manufacturer's protocol in a 24-well format. Unless stated otherwise, 500 ng Cas12a plasmid and 500 ng PCR cassette were used to transfect one well of a 24-well plate.
Plasmids containing Cas12a variants and PCR cassettes were electroporated into RPE-1, C2C12, and mESCs using 2-mm gap cuvettes and a NEPA-21 electroporator (Nepa Gene) according to the manufacturer’s instructions. OPTI-MEM (Thermo Fisher Scientific) was used as electroporation buffer.
For electroporation of HEK293T cells, the Neon Transfection System (Thermo Fisher Scientific) was used according to the manufacturer's protocol, with two pulses of 20 ms at 1,150 V.
Generation of clonal lines
After Zeocin selection, cells from a confluent plate were trypsinized and counted in a Neubauer chamber. The cells were diluted to an average of three cells per well and seeded into a 96-well plate. After 5 d, wells were checked for single clones. After another 7–10 d, cells were checked for fluorescence, and positive clones were transferred to a 24-well plate.
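The dilution for seeding three cells per well can be sketched as follows. A Neubauer large square holds 0.1 µl, so the mean count per large square × 10⁴ gives cells per milliliter. The 100 µl well volume and the function names are assumptions; because the resulting volume of cell suspension is usually below 1 µl, an intermediate dilution is advisable in practice.

```python
def cells_per_ml(counts_per_large_square: list[int], dilution_factor: float = 1.0) -> float:
    """Neubauer estimate: mean count per 1-mm² large square (0.1 µl) × 10^4 × dilution."""
    mean = sum(counts_per_large_square) / len(counts_per_large_square)
    return mean * 1e4 * dilution_factor


def seeding_plan(conc_per_ml: float, cells_per_well: int = 3,
                 well_volume_ul: float = 100.0, wells: int = 96) -> tuple[float, float]:
    """Return (µl of cell suspension, µl of medium) for one plate."""
    cells_needed = cells_per_well * wells
    suspension_ul = cells_needed / (conc_per_ml / 1000.0)
    medium_ul = wells * well_volume_ul - suspension_ul
    return suspension_ul, medium_ul
```

For a mean count of 50 cells per large square (5 × 10⁵ cells/ml), a 96-well plate needs 288 cells, i.e., 0.576 µl of suspension in a total of 9.6 ml.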
Online supplemental material
Fig. S1 shows the PCR strategy, and Fig. S2 explores transfection parameters. Fig. S3 shows the analysis of clones from a CANX-mNeonGreen tagging experiment. Fig. S4 shows multi-color integration, and Fig. S5 shows tagging in C2C12 and HeLa cells. Table S1 contains the gene names and brief information about all genes tagged in this study. Table S2 lists information about the construction and features of the plasmid resources generated in this study, including the Addgene ordering numbers. Table S3 lists additional plasmids constructed in this work. Table S4 lists all oligos used in this study. Data S1 contains information about replicates and independent experiments supporting the results shown in Fig. 1, c and e; Fig. 5, a and b; Fig. 6, a and b; and Fig. 7 c.
The authors wish to thank Jan Dohnálek for help with implementing the tagmentation-based Anchor-Seq protocol, Cyril Mongis for help with IT infrastructure, and Anne Schlaitz and Frauke Melchior for critical reading of the manuscript.
We acknowledge support from the Deutsche Forschungsgemeinschaft (grant KN498/12-1). Additional support was provided by the Collaborative Research Center (SFB1036), the state of Baden-Württemberg through bwHPC for high-performance computing, and SDS@hd for data storage (grant INST 35/1314-1 FUGG). K. Herbst was supported by a Heidelberg Biosciences International Graduate School fellowship and J.D. Knopf by a fellowship of the Boehringer Ingelheim Fonds. G. Pereira and B. Kurtulmus are supported by the Collaborative Research Center (SFB873), the 3DMM2O Cluster of Excellence (EXC-2082/1-390761711), and the Heisenberg Program of the Deutsche Forschungsgemeinschaft (PE1883/4 granted to G. Pereira).
The authors declare no competing financial interests.
Author contributions: M. Knop and M. Meurer designed the project and together with M.K. Lemberg, J. Fueller, K. Herbst, and G. Pereira designed the experiments, with input from B.C. Buchmuller. J. Fueller, K. Herbst, M. Meurer, B. Kurtulmus, J.D. Knopf, and D. Kirrmaier performed the experiments. K. Herbst analyzed the next-generation sequencing data, and K. Gubicza wrote the webtool for primer design. All authors analyzed the data and discussed the results. M. Knop wrote the manuscript with input from all authors.
J. Fueller, K. Herbst, and M. Meurer contributed equally to this paper.
K. Gubicza and B. Kurtulmus contributed equally to this paper.
B.C. Buchmuller’s present address is Technische Universität Dortmund, Chemische Biologie, Dortmund, Germany.