The identification of interaction partners in protein complexes is a major goal in cell biology. Here we present a reliable affinity purification strategy to identify specific interactors that combines quantitative SILAC-based mass spectrometry with characterization of common contaminants binding to affinity matrices (bead proteomes). This strategy can be applied to affinity purification of either tagged fusion protein complexes or endogenous protein complexes, illustrated here using the well-characterized SMN complex as a model. GFP is used as the tag of choice because it shows minimal nonspecific binding to mammalian cell proteins, can be quantitatively depleted from cell extracts, and allows the integration of biochemical protein interaction data with in vivo measurements using fluorescence microscopy. Proteins binding nonspecifically to the most commonly used affinity matrices were determined using quantitative mass spectrometry, revealing important differences that affect experimental design. These data provide a specificity filter to distinguish specific protein binding partners in both quantitative and nonquantitative pull-down and immunoprecipitation experiments.
Most biological processes involve the action and regulation of multiprotein complexes. In many cases, distinct properties such as subcellular localization, catalytic activity, and substrate specificity are determined by different polypeptides in a holoenzyme complex, and specific protein interaction partners may be present in nonstoichiometric amounts. For example, catalytic subunits such as protein phosphatase 1 (PP1) can interact with a spectrum of alternative protein partners, which bind nonstoichiometrically to generate a range of holoenzymes with different specificities (for review see Moorhead et al., 2007). This can make it difficult to distinguish specific but low-abundance interacting proteins from the larger number of low-affinity but abundant contaminant proteins that are inevitably recovered by standard methods such as pull-down or immunoprecipitation. A key goal in most areas of cell biology, therefore, is to characterize the protein components of multiprotein complexes through the reliable identification of specific protein interaction partners.
Any putative interaction partner identified either through affinity purification or biochemical fractionation must be validated to confirm its physiological relevance. These downstream validation experiments, involving detailed molecular characterization, are both costly and time consuming, so it is imperative to focus resources on the subset of potential interactions with a high probability of biological significance. Continuing improvements in the sensitivity and resolution of mass spectrometry, for example, allow ever larger numbers of proteins to be identified in immunoaffinity and pull-down experiments. In addition to bona fide interaction partners, however, these expanding lists include increasing numbers of contaminants, including proteins that bind nonspecifically to the affinity matrix. The problem of nonspecific binding cannot be solved satisfactorily by high-stringency purification; although this can reduce nonspecific binding, it will inevitably also remove low-abundance and low-affinity specific partners. The most effective strategy must therefore preserve all specific interaction events and accept that many nonspecific proteins will also copurify, which must then be identified and discarded.
To solve this problem, we and others have demonstrated that a quantitative mass spectrometry–based approach combined with isotope labeling can help to distinguish which of the many proteins identified in a pull-down or immunoprecipitation experiment represent specific binding. This is done by the inclusion of a negative control, which provides a background of contaminant proteins that bind nonspecifically to the affinity matrix and/or the fusion tag, against which proteins that bind specifically to the protein of interest clearly stand out (for review see Vermeulen et al., 2008). For example, using a combination of stable isotope labeling with amino acids in cell culture (SILAC)–based quantitative proteomics (Ong et al., 2002) with immunoprecipitation of GFP-tagged fusion proteins, we revealed differences in binding partners for two different isoforms of the nuclear protein phosphatase, PP1 (Trinkle-Mulcahy et al., 2006). Other groups have used a similar approach based on tagged bait proteins to map the spectrum of human 26S proteasome interacting proteins (Wang and Huang, 2008) and to detect dynamic members of transcription factor complexes (Mousson et al., 2008). Isotope-based quantitative approaches have also been used to define tagged protein complexes in yeast (Ranish et al., 2003; Tackett et al., 2005) and both tagged and endogenous protein complexes in mammalian cells (Blagoev et al., 2003; Cristea et al., 2005; Selbach and Mann, 2006).
Although the isotope labeling strategy used in a SILAC affinity purification approach provides great help in separating specific from nonspecific interactors, experience shows that not all specific interactions can be unambiguously determined, particularly near the threshold level where signal-to-noise ratios are close to background. Here we describe a new SILAC-based mass spectrometry strategy that specifically addresses this issue, incorporating methods to increase the signal, i.e., the abundance of purified protein complexes, while reducing or filtering out the noise, i.e., proteins that bind nonspecifically to the affinity matrix, tag, and/or antibody.
The efficiency of detecting interaction partners relies upon efficient depletion of the targeted complex. Here we show that GFP-tagged proteins can be near quantitatively depleted using the recently developed GFP binder (Rothbauer et al., 2008). The GFP binder is an Escherichia coli–expressed 16-kD protein derived from a llama heavy chain antibody that binds with high affinity and specificity to GFP. This underlines the utility of using GFP as a dual tag for both affinity purification and in vivo fluorescence microscopy. Furthermore, characterizing the proteins that bind nonspecifically to three of the most commonly used affinity matrices, in either whole cell, nuclear, or cytoplasmic extracts of mammalian cells, provides a “bead proteome” filter. This facilitates distinguishing specific from nonspecific binding proteins and thereby allows objective prioritization of suitable targets for detailed molecular characterization.
In summary, we present here a powerful and reliable workflow that can be applied to analyze affinity-purified protein complexes isolated using either tagged fusion proteins or via immunoprecipitation of endogenous proteins.
Optimized workflow for quantitative analysis of endogenous and tagged protein complexes
A standard workflow for SILAC-based analysis of protein interaction partners in pull-down experiments is summarized in Fig. 1. In brief, the total protein components isolated from either an immunoprecipitation or affinity pull-down experiment are size fractionated using SDS-PAGE.
The gel is cut into typically 5–10 slices, each of which is digested with trypsin and the resulting peptides eluted and analyzed by high sensitivity mass spectrometry (see Materials and methods).
The procedures described here are optimized protocols derived from over 50 separate interaction analyses. They are applied routinely for the analysis of interaction partners binding to fluorescent protein (FP)–tagged fusion proteins in whole cell, cytoplasmic, and nuclear extracts (Fig. 1 A). Cells expressing the tagged protein are grown in “heavy” media, i.e., containing 13C-substituted arginine and lysine. As a control, either parental (untransfected) cells or cells expressing free GFP are grown in “light”, i.e., unlabeled (12C), media. Initially, cell lines expressing free GFP were routinely used as the control. However, experience showed that the level of nonspecific protein binding to free GFP in mammalian cell lines was so low that nonexpressing cells also provide a suitable negative control.
In this approach, the negative “light” control and the experimental “heavy” sample are mixed before mass spectrometric analysis. This reduces the experimental variability that inevitably results when samples are processed independently. In the experiments reported here, extracts were mixed before the GFP immunoprecipitation step. However, separate immunoprecipitations can also be performed and the affinity matrices mixed before eluting proteins for further analysis. Individual steps in the protocol can be adapted to the requirements of particular experiments. However, we recommend always minimizing the duration of the binding step to the affinity matrix, to reduce potential losses of dynamic or weakly associated factors. The present protocol has been optimized using extracts from HeLa and U2OS cells; for extracts from other cell lines, conditions should be optimized individually to ensure efficient protein recovery.
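Labeling both arginine and lysine is convenient because trypsin, used for the in-gel digestion described above, cleaves after these residues, so nearly every tryptic peptide carries at least one label and appears as a light/heavy pair separated by a fixed mass offset. The sketch below illustrates this under stated assumptions: the heavy amino acids are taken to be uniformly 13C6-labeled (Arg6, Lys6), the cleavage rule is the simplified "after K or R, not before P" rule, and the protein sequence and function names are hypothetical.

```python
import re

# Mass difference between 13C and 12C, in daltons.
C13_MINUS_C12 = 1.0033548

# Assumption: heavy Arg and Lys each carry six 13C atoms (Arg6, Lys6).
HEAVY_SHIFT = {"R": 6 * C13_MINUS_C12, "K": 6 * C13_MINUS_C12}


def trypsin_digest(sequence):
    """Split after K or R unless followed by P (simplified trypsin rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def heavy_mass_shift(peptide):
    """Mass offset (Da) of the heavy peptide relative to the light one."""
    return sum(HEAVY_SHIFT.get(residue, 0.0) for residue in peptide)


def mz_spacing(peptide, charge):
    """Spacing (m/z) between light and heavy peaks at a given charge state."""
    return heavy_mass_shift(peptide) / charge


if __name__ == "__main__":
    toy_protein = "MAGEKPLLRSTAVKGEFR"  # hypothetical sequence
    for peptide in trypsin_digest(toy_protein):
        print(peptide, round(heavy_mass_shift(peptide), 2), round(mz_spacing(peptide, 2), 2))
```

In the toy sequence, the K followed by P is not cleaved, so the first peptide carries two labels and shows twice the offset of the others.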
A similar SILAC strategy can also be applied for the analysis of protein interaction partners recovered from direct immunoprecipitation of endogenous complexes (Fig. 1 B). In this case, a control must be performed with a nonspecific antibody, e.g., either preimmune IgG, or an antibody raised against a tag or epitope that is not expressed in these cells. Because separate, parallel immunoprecipitations are required for the control and test samples, care must be taken when mixing the beads to ensure that equal quantities of material are compared.
An important requirement for maximizing the identification of protein interaction partners is to isolate the target protein efficiently while achieving a high signal-to-noise ratio. In the case of FP-tagged proteins, our results show that this is best achieved using the recently developed GFP binder (Rothbauer et al., 2008), which reproducibly provides near-quantitative depletion of GFP fusion proteins (Fig. 2).
Direct comparison with commercially available anti-GFP monoclonal antibodies (mAbs) shows that an affinity matrix coupled to the GFP binder routinely produces higher depletion efficiencies and improves signal-to-noise ratios (Fig. 2, A and B; and unpublished data).
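Depletion efficiency, as used here, can be expressed as the fraction of the tagged protein removed from the extract by the affinity matrix. A minimal sketch, assuming densitometry values for the fusion-protein band in equivalent loads of input and unbound fractions; the function name and the numbers are hypothetical and are not the data shown in Fig. 2.

```python
def depletion_efficiency(input_signal, unbound_signal):
    """Fraction of target removed: 1 - (unbound / input), equal loads assumed."""
    if input_signal <= 0:
        raise ValueError("input signal must be positive")
    return 1.0 - unbound_signal / input_signal


if __name__ == "__main__":
    input_band = 1000.0  # hypothetical densitometry value for the input lane
    # Hypothetical unbound-fraction signals for two capture reagents.
    for reagent, unbound in [("GFP binder", 30.0), ("anti-GFP mAb", 450.0)]:
        print(f"{reagent}: {depletion_efficiency(input_band, unbound):.0%} depleted")
```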
GFP is a 27-kD protein, and a tag of this size could potentially bind to a range of cell proteins. We note that in vivo FRAP measurements in both the cytoplasm and nucleus show that GFP expressed in live cells recovers rapidly after photobleaching (Fig. S1). This indicates that GFP in vivo diffuses predominantly as a free protein and therefore binds weakly, if at all, to most cellular protein complexes. Nonetheless, a subset of GFP molecules could still associate with cell proteins, and such association could also increase upon cell fractionation. To test this more rigorously, the SILAC pull-down method was used to analyze directly which proteins in mammalian cell extracts copurify with GFP isolated using either the GFP binder or a commercially available anti-GFP mAb (Fig. 2 C). Data from four independent experiments generated a short list of potential GFP-interacting proteins that should be considered as possible contaminants when identified in any interaction analysis of a GFP-tagged protein. However, none of these putative contaminants was recovered in all four experiments, and most were also identified as proteins that bind nonspecifically to affinity matrices (see below). Consistent with the FRAP data, no major contaminating proteins copurified reproducibly with free GFP in the extracts tested. However, we draw attention to six proteins, namely variants of heat shock 70-kD protein, cytokeratins 8 and 18, and ubiquitin, which were most frequently detected as copurifying with GFP-tagged fusion proteins (Fig. 2 C). It is possible that these proteins, which all bind nonspecifically to the Sepharose matrix, do not bind GFP directly but are instead up-regulated in the cell line overexpressing GFP. In summary, the SILAC data demonstrate that GFP, despite its size of 27 kD, is an effective tag for use in pull-down experiments.
It shows low levels of nonspecific interactions and can be quantitatively depleted from cell extracts using the GFP binder.
Characterization of Sepharose bead proteome
Next, a systematic assessment was made of which proteins in cell extracts bind nonspecifically to the Sepharose matrix, which has been used routinely in pull-down experiments and with the GFP binder (Tables I and II).
We define the set of proteins binding to the affinity matrix as a “bead proteome.” Data were pooled from 27 independent SILAC pull-down experiments on 11 separate GFP fusion proteins in either whole cell, cytoplasmic, or nuclear extracts prepared from HeLa and U2OS cells using standard RIPA buffer (see Materials and methods). Analysis of the combined dataset reveals a wide range of cellular proteins that routinely bind to the Sepharose matrix and which therefore must be regarded as potential nonspecific contaminants whenever they are identified in protein interaction studies. These include histones, hnRNP proteins, heat shock proteins, ribosomal proteins, translation and initiation factors, DEAD box proteins, and multiple cytoskeletal proteins (Table I). Over 100 additional proteins of other classes were also identified (Table II).
These common matrix-binding contaminants have therefore been incorporated into a filter set, against which candidate specific interaction partners for any target protein under study can be compared.
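The pooling step can be sketched as a simple frequency count: a protein enters the filter set if it appears among the background (ratio near 1) proteins in a minimum number of independent experiments, and candidates are then flagged against that set. The `min_experiments` cut-off, the function names, and the protein lists are hypothetical. Flagging is used rather than deletion because, as discussed below, some bead-proteome proteins can still be specific partners.

```python
from collections import Counter


def build_bead_proteome(background_sets, min_experiments=2):
    """Proteins seen as background in at least `min_experiments` experiments."""
    counts = Counter(protein for background in background_sets for protein in background)
    return {protein for protein, n in counts.items() if n >= min_experiments}


def flag_candidates(candidates, bead_proteome):
    """Map each candidate to True if it is a known matrix-binding protein."""
    return {protein: protein in bead_proteome for protein in candidates}


if __name__ == "__main__":
    # Hypothetical background lists from three independent pull-downs.
    experiments = [
        {"HSP70", "RPL7", "H2B", "desmin"},
        {"HSP70", "RPL7", "transketolase"},
        {"HSP70", "desmin", "transketolase", "hnRNP A1"},
    ]
    filter_set = build_bead_proteome(experiments)
    print(sorted(filter_set))
    print(flag_candidates(["GEMIN2", "desmin", "USP9X"], filter_set))
```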
Comparison of Sepharose, agarose, and magnetic bead proteomes
Using the SILAC protocol, we compared nonspecific protein binding to Sepharose with that to two other commonly used affinity matrices, agarose and magnetic beads (Fig. 3). In this case, labeling was conducted using three isotopic states, i.e., 12C-arg and 12C-lys for agarose, 13C-arg and D4-lys for Sepharose, and 13C/15N-arg and 13C/15N-lys for magnetic beads. Nonspecific protein binding was observed for all three matrices after incubation with either nuclear or cytoplasmic extracts, whether the incubation time was short (30 min) or long (18 h). At both time points, a similar distribution of classes of contaminating proteins was observed, although the levels of protein binding can increase with longer incubation. An interesting difference was apparent in the relative performance of Sepharose and magnetic beads with nuclear versus cytoplasmic extracts. Magnetic beads, which showed more nonspecific binding to structural/motility protein classes and less nonspecific binding to nucleic acid–binding factors, had lower backgrounds of contaminating proteins in nuclear extracts than Sepharose.
In contrast, Sepharose, which showed more nonspecific interactions with nucleic acid–binding factors, gave lower nonspecific backgrounds than magnetic beads in cytoplasmic extracts (Fig. 3 C; Table S1). Agarose beads showed levels of nonspecific binding similar to those of Sepharose in nuclear extracts, whereas in cytoplasmic extracts they showed lower nonspecific binding than either Sepharose or magnetic beads (Fig. S2). Overall, we conclude that affinity matrices are a major source of nonspecific protein binding in all protein interaction studies, and the detailed comparison of the three main types of affinity matrix shows that no single type of bead is ideally suited to all applications. Rather, nonspecific background can be reduced by choosing the affinity matrix according to whether protein interaction studies are performed using cytoplasmic extracts, nuclear extracts, or other cellular fractions.
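For the three-matrix comparison, the three labeling states must give distinguishable peptide masses so that proteins recovered on each matrix can be quantified in a single run. A sketch of the expected offsets, assuming the commonly used isotopologues 13C6-Arg/D4-Lys for the medium state and 13C6 15N4-Arg/13C6 15N2-Lys for the heavy state (the text specifies only "13C/15N"); the peptide sequences and function name are hypothetical.

```python
# Isotope mass differences (Da).
C13 = 1.0033548   # 13C - 12C
N15 = 0.9970349   # 15N - 14N
H2 = 1.0062768    # 2H (D) - 1H

# Assumed per-residue offsets for each labeling state / matrix.
SHIFTS = {
    "light (agarose)": {"R": 0.0, "K": 0.0},
    "medium (Sepharose)": {"R": 6 * C13, "K": 4 * H2},  # Arg6, Lys4 (D4)
    "heavy (magnetic)": {"R": 6 * C13 + 4 * N15, "K": 6 * C13 + 2 * N15},  # Arg10, Lys8
}


def peptide_shift(peptide, state):
    """Mass offset (Da) of a peptide in the given labeling state vs. light."""
    table = SHIFTS[state]
    return sum(table.get(residue, 0.0) for residue in peptide)


if __name__ == "__main__":
    for peptide in ("AEFVEVTK", "AGLQFPVGR"):  # hypothetical tryptic peptides
        offsets = {state: round(peptide_shift(peptide, state), 3) for state in SHIFTS}
        print(peptide, offsets)
```

Under these assumptions, a lysine peptide appears at roughly +0, +4, and +8 Da and an arginine peptide at roughly +0, +6, and +10 Da, so the three matrices give resolvable peak triplets.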
Application of SILAC strategy to identify protein interaction partners
Having identified parameters affecting nonspecific protein binding, the optimized workflow described above was tested for the analysis of a previously characterized multiprotein complex. As a model system, we selected for analysis the intensively studied and well-characterized SMN complex. SMN is the product of the major human gene responsible for the inherited genetic disorder spinal muscular atrophy (for review see Kolb et al., 2007) and is known to form a complex with multiple specific partner proteins, including gemins and snRNP proteins (see Table III and references therein).
Because SMN is found in multiprotein complexes in both the nucleus and the cytoplasm (Fig. 4 A), and because some of its previously identified interactions were reported to be compartment specific (Fig. 4 B), we fractionated cells into nuclear and cytoplasmic extracts to compare the interaction partners identified by SILAC in both compartments.
A HeLa cell line stably expressing GFP-SMN (Sleeman et al., 2003) was grown in media containing 13C-labeled arginine and lysine, with parental HeLa cells grown in normal (unlabeled, 12C) media as a negative control. The cells were harvested and fractionated into cytoplasmic and nuclear extracts, pull-downs were performed using the GFP binder, and proteins were analyzed by mass spectrometry. This identified over 20 proteins previously described to copurify with SMN. The average SILAC ratio and number of peptides identified for each protein in cytoplasmic and nuclear extracts are listed in Table III.
To facilitate identification of specific binding partners, we used a data analysis approach that incorporates both SILAC ratios (i.e., 13C:12C peptide ratios) and relative peptide abundance (Fig. 4, C and D). Plots of log SILAC ratio versus total peptide intensity show that SMN itself and the known core members of the SMN complex (e.g., gemins 2–8, shown in yellow in Fig. 4, C and D) are readily identified.
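Table III and Fig. 4 summarize peptide-level measurements as protein-level values. A minimal sketch of that aggregation, assuming each quantified peptide is reported as heavy and light intensities; the protein ratio is taken here as the arithmetic mean of peptide ratios (the exact averaging used for Table III is not specified), and the plotting coordinates are log2 ratio and log10 summed intensity. The function name and input values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def summarize(peptides):
    """Collapse (protein, heavy, light) peptide rows into protein-level values."""
    pairs_by_protein = defaultdict(list)
    for protein, heavy, light in peptides:
        if light > 0:  # the H/L ratio is undefined without a light signal
            pairs_by_protein[protein].append((heavy, light))

    summary = {}
    for protein, pairs in pairs_by_protein.items():
        ratios = [heavy / light for heavy, light in pairs]
        avg = mean(ratios)
        summary[protein] = {
            "n_peptides": len(ratios),
            "mean_ratio": avg,
            "sd_ratio": stdev(ratios) if len(ratios) > 1 else float("nan"),
            "log2_ratio": math.log2(avg),  # y-axis of a ratio-vs-intensity plot
            "log10_intensity": math.log10(sum(h + l for h, l in pairs)),  # x-axis
        }
    return summary


if __name__ == "__main__":
    rows = [  # hypothetical peptide measurements
        ("GEMIN2", 900.0, 60.0), ("GEMIN2", 1200.0, 100.0),
        ("HSP70", 5000.0, 4800.0), ("HSP70", 6000.0, 6100.0),
    ]
    for protein, values in summarize(rows).items():
        print(protein, values["n_peptides"], round(values["mean_ratio"], 2),
              round(values["log2_ratio"], 2), round(values["log10_intensity"], 2))
```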
These data also show that p80 coilin, which was previously shown to interact with SMN specifically in the nucleus, was here also found by SILAC as a specific interaction partner only in nuclear extract (Fig. 4 D and Table III). Furthermore, the cytoplasm-specific interaction partner PRMT5 was also found here as a specific interaction partner only in cytoplasmic extract (Fig. 4 C and Table III). These results demonstrate the effectiveness of the SILAC approach for identifying specific protein binding partners and show that it can resolve compartment-specific interactions.
Almost all of the other previously reported SMN interaction partners were also found in this analysis (see Table III), although in some cases their SILAC ratios were close to those of nonspecific Sepharose-binding contaminants. The analysis of the SMN complex thus illustrates the importance of supplementing SILAC ratios with additional data, including peptide abundance and bead proteome information, to help assess specificity where SILAC ratios are close to background levels. For example, both PRMT5 and Unrip, which have been reported to interact with SMN, show relatively low SILAC ratios compared with the gemins. However, the fact that neither protein was detected binding nonspecifically to either GFP or Sepharose increases the probability that they are specific binders. In contrast, certain proteins with higher SILAC ratios, such as desmin and transketolase, were commonly found in the Sepharose bead proteome, which reduces the probability that they are specific binding partners of SMN. Peptides were also found for hnRNP Q and RNA helicase A, both reported to interact with SMN (Mourelatos et al., 2001; Pellizzoni et al., 2001b; Rossoll et al., 2002). These peptides were not quantifiable, however, and we therefore did not include these proteins in the list of unambiguously identified known SMN interaction partners. Interestingly, U1 70k protein was found to copurify with GFP-SMN from cytoplasmic extracts, with 15 separate peptides detected with high SILAC ratios. SMN has been reported to bind U1 snRNA and the U1 snRNP-specific A protein, but an interaction with the U1-specific 70k protein had not previously been detected (Pellizzoni et al., 2002b).
We have developed a useful strategy for analyzing the SILAC data to help distinguish specific interactions (Fig. 5). Data acquired from SILAC-based quantitative immunoprecipitation experiments are first plotted in a histogram. This helps to visualize the grouping of nonspecific binding proteins, which generally fall within a bell-shaped curve regardless of the absolute value of the SILAC ratios. Although under ideal conditions a ratio of 1 should be obtained for nonspecific binding, this absolute value can vary experimentally in either direction. This is illustrated in Fig. 5 A, where the absolute peak values for the bell-shaped curves for the separate nuclear and cytoplasmic extracts differ slightly.
Within each experiment, the SILAC ratios can thus be evaluated with respect to the actual background ratio curve determined and a corresponding threshold set for that experiment (Fig. 5 A, hashed blue and red lines).
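The histogram step can be partly automated. The sketch below is one possible approach, not the procedure used for Fig. 5: the background peak is located as the most populated bin of log2 ratios, its width is estimated robustly with the median absolute deviation (MAD), and the threshold is set at peak + k × width. The bin width, k = 3, the function name, and the toy ratios are hypothetical choices.

```python
import math
from statistics import median


def background_threshold(ratios, bin_width=0.25, k=3.0):
    """Return (background peak ratio, ratio threshold) from log2-ratio histogram."""
    logs = [math.log2(r) for r in ratios if r > 0]
    if not logs:
        raise ValueError("no positive ratios")
    counts = {}
    for x in logs:  # histogram: count values per bin index
        b = math.floor(x / bin_width)
        counts[b] = counts.get(b, 0) + 1
    peak_bin = max(counts, key=counts.get)
    peak = (peak_bin + 0.5) * bin_width  # centre of the most populated bin
    # MAD around the peak, scaled to approximate a normal standard deviation.
    spread = 1.4826 * median(abs(x - peak) for x in logs)
    return 2 ** peak, 2 ** (peak + k * spread)


if __name__ == "__main__":
    # Hypothetical ratios: a background cluster near 1 plus two specific binders.
    toy = [1.0] * 6 + [1.1] * 3 + [0.9] * 3 + [20.0, 15.0]
    peak_ratio, threshold = background_threshold(toy)
    print(round(peak_ratio, 2), round(threshold, 2))
```

Because the peak is estimated per experiment, a background centred slightly above or below 1 shifts the threshold with it, as in the nuclear and cytoplasmic curves of Fig. 5 A.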
To further extend this analysis and improve confidence, the bead proteome data are next applied as a filter to highlight proteins known to bind nonspecifically to the affinity matrix and to reveal proteins that may bind specifically yet are close to or below the chosen threshold. As illustrated for the cytoplasmic extract, SILAC ratios are first plotted for all proteins previously identified as binding nonspecifically to Sepharose (Fig. 5 B). Proteins that may bind to the GFP tag itself (Fig. 2 C) are also included in this list (Fig. 5 B, green). In the case of hnRNP proteins, which are commonly found in the Sepharose bead proteome, multiple members of the hnRNP family detected among SMN-associated proteins are identified as likely contaminants, with SILAC ratios at or below the threshold level. However, hnRNP U alone stands out with a higher SILAC ratio in both nuclear and cytoplasmic experiments, consistent with previous evidence that hnRNP U is a specific component of the SMN complex (Liu and Dreyfuss, 1996). This demonstrates that not all proteins in the bead proteome necessarily bind nonspecifically, and they should therefore not be excluded from further analysis on this basis alone.
Although the majority of potential contaminants have SILAC ratios at or near the chosen threshold, some, such as desmin and transketolase, show significantly higher ratios. This may reflect either a genuine interaction with GFP-SMN or variability inherent in the experiment or in the quantitation. Importantly, highlighting these proteins as potential contaminants allows them to be assigned lower priority for future detailed analysis.
Next, filtering out proteins known to bind nonspecifically to Sepharose leaves a list of putative interacting partners that can also be analyzed separately (Fig. 5 C). As shown here, over two-thirds of these proteins have a SILAC ratio sufficiently high to indicate specific interaction with GFP-SMN, and indeed most are known SMN interaction partners, as detailed in Table III. Of the remaining proteins, several are known SMN interacting partners that, in this experiment, have SILAC ratios close to threshold and thus may have been overlooked in the initial analysis (e.g., Sm proteins, PRMT5, and Unrip). This emphasizes the importance of the enhanced workflow for highlighting specific interaction partners among a sea of contaminants.
Most of the remaining proteins shown in Fig. 5 C have low SILAC ratios and correspond to metabolic enzymes, which at this stage appear to be low-priority targets for further analysis. However, one of the remaining novel proteins, USP9X, had a higher SILAC ratio (Fig. 5 C); USP9X is a deubiquitinating enzyme that was recently shown to regulate AMPK-related kinases (Al-Hakim et al., 2008). We therefore selected it as the highest priority for follow-up analysis.
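The triage described in the preceding paragraphs can be expressed as a few rules. A sketch, assuming a protein-to-ratio mapping, a per-experiment threshold, and a set of known matrix or tag binders; the tier definitions and their ordering are our illustrative reading of the workflow, and the function name and ratios are hypothetical rather than values from Fig. 5.

```python
def triage(ratios, threshold, bead_proteome):
    """Assign each protein a priority tier (1 = highest) and sort the list."""
    rows = []
    for protein, ratio in ratios.items():
        known_binder = protein in bead_proteome
        if ratio >= threshold and not known_binder:
            tier = 1  # above threshold, no record of nonspecific binding
        elif ratio >= threshold:
            tier = 2  # above threshold but a known matrix/tag binder: verify
        elif not known_binder:
            tier = 3  # near or below threshold, not a known binder: may be missed
        else:
            tier = 4  # below threshold and a known contaminant
        rows.append((protein, tier, ratio))
    # Highest priority first; within a tier, higher ratios first.
    return sorted(rows, key=lambda row: (row[1], -row[2]))


if __name__ == "__main__":
    ratios = {"GEMIN2": 12.0, "USP9X": 4.0, "desmin": 6.0, "PRMT5": 1.8, "hnRNP A1": 1.1}
    for protein, tier, ratio in triage(ratios, threshold=2.0, bead_proteome={"desmin", "hnRNP A1"}):
        print(tier, protein, ratio)
```

Tier 2 corresponds to cases like hnRNP U or desmin, where only an independent test such as Western blotting can decide, and tier 3 to cases like PRMT5 or Unrip, which a ratio threshold alone would discard.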
Validation of USP9X by Western blotting
To test whether the identification of USP9X by SILAC analysis can be verified by an independent method, we next performed Western blotting analysis on protein complexes affinity purified with GFP binder from both cytoplasmic and nuclear extracts (Fig. 6). In this case, cells expressing free GFP were used as a control. An antibody specific to USP9X detected specific pull-down of USP9X by GFP-SMN, especially in the cytoplasmic extracts (Fig. 6 A). This confirms the identification of USP9X in the previous SILAC experiments, and is consistent with the fact that USP9X peptides were only identified by SILAC in the cytoplasmic extract (for an example of a mass spectrum for a USP9X SILAC peptide, see Fig. 6 B).
The predominantly cytoplasmic signal of USP9X is also consistent with immunofluorescence analysis. Immunostaining of HeLa cells with anti-USP9X antibody revealed that it is enriched in the cytoplasm, although a weak nucleoplasmic pool is also detected (Fig. 6 C). The localization of endogenous USP9X is the same in the presence (bottom cell) and absence (top cell) of GFP-SMN, and in both cases there is no apparent accumulation in gems. The fact that USP9X had not previously been identified as associating with this well-characterized protein complex suggests that it may be of low abundance, interact transiently with the SMN complex, and/or bind with low affinity.
As a positive control, Western blotting was also performed to confirm the enrichment of SMN and U1A under the same affinity purification conditions in both cytoplasmic and nuclear extracts, and the nuclear extract–specific enrichment of coilin (Fig. 6 D). For comparison, sample mass spectra for SMN, U1A, and coilin peptides identified by SILAC analysis are shown (Fig. 6 E). Although high SILAC ratios reliably distinguish binding specificity, we note that the absolute SILAC ratio cannot currently be used to infer stoichiometry of binding. As shown by the high standard deviation values measured for high SILAC ratios (see Table III, ratios >10 in bold), it is difficult to accurately quantitate ratio values when one of the components used to generate the ratio is present in very low amounts (see representative peptide spectra in Fig. 6 E).
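The large standard deviations at high ratios follow from error propagation for a quotient. Assuming, for illustration, the same additive noise σ on both channels, the relative error of R = H/L is approximately sqrt((σ/H)² + (σ/L)²), which is dominated by the light term as the control signal approaches the noise level. The noise level, intensities, and function name below are hypothetical.

```python
import math


def ratio_with_relative_error(heavy, light, sigma):
    """H/L ratio and its first-order relative error for equal additive noise sigma."""
    ratio = heavy / light
    relative_error = math.sqrt((sigma / heavy) ** 2 + (sigma / light) ** 2)
    return ratio, relative_error


if __name__ == "__main__":
    sigma = 10.0  # hypothetical noise level, arbitrary intensity units
    for heavy, light in [(1000.0, 900.0), (1000.0, 200.0), (1000.0, 20.0)]:
        r, e = ratio_with_relative_error(heavy, light, sigma)
        print(f"H={heavy:.0f} L={light:.0f}: ratio {r:.1f} +/- {100 * e:.0f}%")
```

With identical absolute noise, a ratio near 1 is determined to within about 1%, whereas a ratio of 50 carries roughly 50% uncertainty; the first-order approximation itself breaks down as L approaches σ.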
After confirming the positive identification of USP9X, we also tested by Western blotting other proteins that had high SILAC ratios yet were considered more likely to be contaminants based on the SILAC workflow. For example, both desmin and transketolase had high SILAC ratios in the cytoplasmic extract (Fig. 5 B), but did not show specific pull-down as judged by Western blotting (unpublished data). This confirms that they were indeed contaminants, most likely binding nonspecifically to Sepharose beads.
SILAC analysis by direct immunoprecipitation
Finally, we also evaluated the SILAC method using direct immunoprecipitation with an antibody specific for the endogenous SMN protein. This is important because not all proteins remain functional or are correctly expressed after tagging with GFP, and we therefore wanted to test whether a similar workflow could be applied to identify protein partners using antibodies to endogenous proteins. For these experiments we used a monoclonal anti-SMN antibody (BD Biosciences), which was tested and found to specifically immunoprecipitate SMN (see Fig. S3 and Table S2). A similar overall workflow was applied, with minor modifications (see Fig. 1 B). SILAC analysis of the immunoprecipitated proteins again identified many of the core SMN complex proteins, although the number of peptides and the overall quality of the data were notably poorer than those obtained using the GFP binder and GFP-tagged SMN (Fig. S3 and Table S2). One likely reason is the less efficient depletion of endogenous SMN by the anti-SMN mAb compared with the near-quantitative depletion of GFP-SMN by the GFP binder. This does not appear to be simply a question of overall expression levels, however, because GFP-SMN is expressed in the stable cell line at a lower level than endogenous SMN (Sleeman et al., 2003). To test this idea, we compared pull-downs of GFP-SMN using the GFP binder with pull-downs using the commercial anti-GFP mAb previously shown to deplete GFP less efficiently (see Fig. 2). The quality of the resulting data, including the number of peptides identified and quantified, was clearly better with the GFP binder than with the commercial anti-GFP mAb (Fig. S3; Table S2).
In summary, these data show that the SILAC approach can be successfully applied for the analysis of endogenous proteins directly immunoprecipitated with antibodies. However, the overall quality of the resulting data will inevitably be affected by the specificity and efficiency of the available antibodies.
This study describes a method based on quantitative SILAC mass spectrometry (Ong et al., 2002) that has been optimized to facilitate the reliable detection of bona fide protein interaction partners in cell extracts by immuno- and/or affinity purification. This approach has been made possible by recent major advances in the sensitivity and mass accuracy of mass spectrometry–based proteomics (Domon and Aebersold, 2006; Cox and Mann, 2007). These technological improvements facilitate detection of lower abundance proteins and allow a genuinely high-throughput approach. Increased sensitivity of detection alone does not reliably identify specific interaction partners, however, because the many nonspecifically bound proteins that routinely copurify in pull-down experiments are detected as well. To minimize contaminants, many previous studies have used high-stringency purification methods. This is also not ideal, because stringent purification often results in the loss of specific binding partners, for example those interacting in substoichiometric amounts or binding with lower affinity. The strategy described here takes advantage of the sensitivity of modern mass spectrometry–based proteomics to identify, en masse, the components of protein complexes purified under lower stringency conditions, which preserves more specific interactions.
A key feature of the method involves combining SILAC ratios with bead proteomes and other data filtering to distinguish likely specific interacting proteins from the much larger pool of nonspecific binding proteins (see Fig. 5). This is particularly valuable in assessing whether proteins with SILAC ratios close to threshold values represent specific interaction partners. This strategy can be applied directly to analyze endogenous protein complexes isolated by immunoprecipitation. In addition, we show that it can provide a powerful dual strategy when applied to the analysis of proteins interacting with GFP-tagged fusion proteins in a “what you see is what you get” approach. Importantly, this allows the integration of biochemical in vitro information derived from analysis of pull-down experiments, with in vivo data describing the localization, dynamics, and protein interactions derived from fluorescence microscopy. In contrast, the use of separate tags for affinity purification studies and microscopy analysis does not allow a direct comparison of the data obtained. GFP has been used previously as an affinity tag for proteomics studies (Cristea et al., 2005; Trinkle-Mulcahy et al., 2006). The results in this study underline the suitability of GFP as a dual strategy tag. First, both in vivo photobleaching experiments and SILAC mass spectrometry data show that GFP exhibits minimal nonspecific binding to mammalian cell proteins. Second, the recent advent of the GFP binder affinity probe allows near-quantitative depletion of GFP fusion proteins from cell extracts, thereby improving signal-to-noise ratios and maximizing the range of protein complexes that can be recovered. Based on the successful analysis of over 20 separate GFP fusion proteins in whole cell, cytoplasmic, and nuclear extracts, our results indicate that a similar strategy can be readily applied for the analysis of interaction partners binding to most, if not all, GFP-tagged proteins.
In the SILAC-based strategy for analyzing protein interaction partners (see Fig. 1), the heavy-to-light isotope ratio measured for each detected peptide provides an unbiased and often clear-cut index for distinguishing specific from nonspecific binding proteins (for examples of peptide spectra, see Fig. 6). In some cases, however, particularly for lower abundance proteins, the 13C/12C (SILAC) ratio alone is not sufficient to establish specificity unambiguously. Both the order of steps in the workflow and the details of the experimental protocol can introduce variability. For example, how accurately the amounts of material mixed together before or after immunoprecipitation are controlled can affect the ratio. The ratio can also be affected by dissociation of proteins from the complex during isolation. Depending on the complex under study, exchange may also occur between the isotope-labeled proteins on the affinity matrix and proteins in the control extract (Wang and Huang, 2008). For these reasons, our results show that it is important to minimize the binding time whenever possible, which also helps to reduce the level of nonspecific protein binding. This latter point is illustrated by the larger cohort of nonspecific binding proteins recovered after extended (18 h) incubation of the extracts with all three affinity matrices (see Fig. S2 and Table S1). Finally, it is also important to optimize the efficiency of protein pull-down. This is best illustrated by comparing a commercial anti-GFP mAb with the GFP binder for affinity purification of GFP-SMN (see Fig. S3 and Table S2).
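To make the use of the ratio as a specificity index concrete, the Python sketch below sorts proteins into three classes by their log2 heavy/light ratio. The cutoffs `SPECIFIC_LOG2` and `BACKGROUND_LOG2`, the function name `classify_ratio`, and the example ratios are hypothetical and chosen only for illustration; they are not thresholds or data from this study. The intermediate "borderline" class stands for the near-threshold proteins discussed above, for which the ratio alone is not decisive.

```python
import math

# Hypothetical cutoffs on log2(heavy/light); not values used in this study.
SPECIFIC_LOG2 = 1.0    # at least 2-fold enriched in the tagged pull-down
BACKGROUND_LOG2 = 0.5  # below ~1.41-fold is treated as nonspecific background


def classify_ratio(ratio: float) -> str:
    """Classify one protein by its heavy/light SILAC ratio."""
    if ratio <= 0:
        raise ValueError("SILAC ratios must be positive")
    log2_ratio = math.log2(ratio)
    if log2_ratio >= SPECIFIC_LOG2:
        return "specific"
    if log2_ratio < BACKGROUND_LOG2:
        return "background"
    # Near-threshold proteins need additional evidence (e.g. a bead proteome).
    return "borderline"


# Invented example ratios, for illustration only.
example = {"SMN1": 12.0, "GEMIN2": 8.5, "PROTEIN_X": 1.6, "HSPA8": 1.05}
for gene, ratio in example.items():
    print(gene, classify_ratio(ratio))
```

Working on a log2 scale makes enrichment and depletion symmetric around zero, which is a common convention when setting ratio cutoffs.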
As illustrated here by the analysis of the well-characterized SMN complex, a useful criterion to add to the SILAC ratio is to filter all identified proteins against a database of proteins found to bind nonspecifically to affinity matrices under a range of conditions. This helped to distinguish known SMN interaction partners from likely contaminants (see Table III and Fig. 5). In the case of Sepharose, the bead proteome was derived from 27 different SILAC-based pull-down experiments, including separate analyses of pull-downs performed in whole cell, nuclear, and cytoplasmic extracts from both HeLa and U2OS cell lines. Identical results were obtained for both cell lines, and the data have therefore been combined in the Sepharose bead proteome presented (Tables I and II). Interestingly, similar sets of protein contaminants were identified in the separate cytoplasmic and nuclear extracts, including ribosomal, heat shock, hnRNP, and intermediate filament proteins. We extended the analysis of the bead proteome to include direct comparisons of Sepharose, agarose, and magnetic beads, which to the best of our knowledge are currently the three most commonly used affinity matrices. Unexpectedly, the spectrum of predominant contaminating proteins differed between these matrices, and these differences also varied between nuclear and cytoplasmic extracts. Thus, no single bead matrix gave universally lower levels of contaminants under all circumstances. For cytoplasmic extracts, the lowest background levels were obtained using either Sepharose or agarose. Magnetic beads, in contrast, showed more nonspecific binding of the cytoskeletal and structural proteins that are abundant in cytoplasmic extracts. Conversely, magnetic beads showed lower nonspecific binding of nucleic acid–associated proteins and thus gave lower backgrounds than either Sepharose or agarose when used with nuclear extracts.
These data provide objective grounds for concluding that no single type of affinity matrix is best for all purposes, and they highlight the importance of choosing the most suitable combination of reagents for the specific experiment to be performed.
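A bead proteome filter can be sketched as a simple set lookup keyed by affinity matrix and extract type, reflecting the observation above that contaminant profiles depend on both. In this hypothetical Python example, `BEAD_PROTEOMES`, `filter_candidates`, the 2.0 ratio cutoff, and all gene memberships and ratios are illustrative placeholders rather than the contents of Tables I and II. Enriched proteins that also appear in the relevant bead proteome are flagged for independent validation rather than discarded, since a protein that binds beads could in principle still be a genuine partner.

```python
from typing import Dict, List, Set, Tuple

# Bead proteomes keyed by (matrix, extract). The memberships below are
# illustrative placeholders, not the contents of Tables I and II.
BEAD_PROTEOMES: Dict[Tuple[str, str], Set[str]] = {
    ("sepharose", "nuclear"): {"HNRNPA1", "RPL7", "HSPA8"},
    ("magnetic", "nuclear"): {"HSPA8"},
}


def filter_candidates(
    ratios: Dict[str, float],
    matrix: str,
    extract: str,
    threshold: float = 2.0,  # hypothetical SILAC ratio cutoff
) -> Dict[str, List[str]]:
    """Split proteins by SILAC ratio and bead proteome membership."""
    background = BEAD_PROTEOMES.get((matrix, extract), set())
    groups: Dict[str, List[str]] = {
        "likely_specific": [],
        "check_bead_contaminant": [],
        "below_threshold": [],
    }
    for gene, ratio in sorted(ratios.items()):
        if ratio < threshold:
            groups["below_threshold"].append(gene)
        elif gene in background:
            # Enriched but a known bead binder for this matrix/extract.
            groups["check_bead_contaminant"].append(gene)
        else:
            groups["likely_specific"].append(gene)
    return groups


# Invented example ratios, for illustration only.
ratios = {"SMN1": 12.0, "GEMIN2": 8.5, "HNRNPA1": 2.3, "HSPA8": 1.1}
print(filter_candidates(ratios, "sepharose", "nuclear"))
print(filter_candidates(ratios, "magnetic", "nuclear"))
```

With the same ratios, HNRNPA1 is flagged against the illustrative Sepharose/nuclear set but not against the magnetic/nuclear set, mirroring the matrix dependence described above.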
The identification of many proteins that clearly bind nonspecifically to affinity matrices commonly used in protein–protein interaction experiments raises an important question about the accuracy of the published literature. In many cases, published studies have listed as potential interaction partners proteins that we show here bind nonspecifically to affinity matrices. The bead proteome filters thus provide a useful and objective resource that cell biologists can consult to avoid expending time and effort on the analysis of proteins that may prove to be simple contaminants. In the future, accumulating information from many laboratories on the range of nonspecific protein interactions observed with different cell types, extracts, tags, and affinity matrices will provide an invaluable resource, and we propose that this should be established as a freely accessible online database.
In summary, the present data show that a strategy combining SILAC analysis with bead proteome filtering and enhanced data analysis procedures can reliably characterize specific protein interaction partners while using isolation procedures that preserve the binding of lower abundance and lower affinity proteins. We show that this approach can also resolve interaction events confined to either the nuclear or the cytoplasmic compartment. Inevitable differences in the biochemical properties of different proteins mean that no single isolation protocol is likely to be ideal in every case. Nonetheless, we successfully applied a similar isolation protocol to analyze over 20 different GFP fusion proteins in multiple cell extracts from two separate mammalian cell lines. Even when precise isolation conditions must be varied, our data indicate general principles that apply, including the importance of keeping incubation times short during affinity purification and the need to optimize the overall efficiency of affinity depletion. We show that the strategy can be used to analyze tagged or endogenous complexes, and we conclude that it provides a general approach that can be widely applied to the analysis of protein binding partners in different fields of cell biology.
Materials and methods
HeLaEGFP and HeLaEGFP-SMN stable cell lines were obtained and characterized as described previously (Sleeman et al., 2003). Cells were grown in custom-made DMEM (minus arginine and lysine; Invitrogen) supplemented with 10% dialyzed fetal calf serum (Invitrogen) and penicillin/streptomycin (Invitrogen). The selection marker G418 was added to SILAC media used with stable cell lines expressing GFP-tagged proteins. For double encoding experiments, L-arginine (84 μg/ml; Sigma-Aldrich) and L-lysine (146 μg/ml; Sigma-Aldrich) were added to the “light” media, and L-arginine 13C and L-lysine 13C (Cambridge Isotope Laboratory) were added to the “heavy” media at the same concentrations. For triple encoding experiments, L-arginine and L-lysine were added to the “light” media, L-arginine 13C and L-lysine 4,4,5,5-D4 (Cambridge Isotope Laboratory) to the “medium” media, and L-arginine 13C/15N and L-lysine 13C/15N (Cambridge Isotope Laboratory) to the “heavy” media. The amino acid concentrations are based on the formula for normal DMEM (Invitrogen). Once prepared, the SILAC media were mixed well, filtered through a 0.22-μm filter (Millipore) using a suction pump, and stored at 4°C. HeLa and U2OS cell lines were passaged in SILAC media for at least 5–6 cell doublings before harvesting to ensure complete incorporation of the isotopic amino acids (Ong and Mann, 2007; Harsha et al., 2008). PBS-based nonenzymatic cell dissociation buffer (Invitrogen) was used to passage cells, because trypsin-EDTA solutions may contain amino acids.
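The requirement for at least 5–6 doublings can be rationalized with a simple dilution model. Assuming that pre-existing unlabeled protein is diluted only by cell division (no turnover, and no residual light arginine or lysine in the medium), the labeled fraction after n doublings is 1 - 0.5^n. The hypothetical `labeled_fraction` helper below is a back-of-envelope sketch, not a substitute for measuring incorporation by mass spectrometry; because protein turnover only adds newly labeled protein, the model gives a lower-bound estimate under these assumptions.

```python
def labeled_fraction(doublings: float) -> float:
    """Fraction of protein carrying the label after a number of doublings.

    Assumes pre-existing unlabeled protein is diluted only by cell division
    (no turnover) and that the medium contains no light Arg/Lys.
    """
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return 1.0 - 0.5 ** doublings


for n in (4, 5, 6):
    print(n, f"{labeled_fraction(n):.1%}")
```

Under this model, 5 and 6 doublings give about 96.9% and 98.4% labeled protein, respectively.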
Preparation of cellular extracts
Whole cell extracts were prepared by solubilizing trypsinized and pelleted cells in ice-cold RIPA buffer (50 mM Tris, pH 7.5, 150 mM NaCl, 1% NP-40, 0.5% deoxycholate, and protease inhibitors), sonicating briefly on ice (5 × 10 s at full power), and clearing the extracts by centrifugation at 2,800 g (3,500 rpm, GH-3.8 rotor; Beckman Coulter GS-6) for 10 min at 4°C. For preparation of cytoplasmic and nuclear fractions, cells from 10 × 14-cm dishes were trypsinized, pelleted, and resuspended in 5 ml of ice-cold swelling buffer (10 mM Hepes, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, and protease inhibitors) for 5 min, and the cells were then broken open to release nuclei using a pre-chilled Dounce homogenizer (20 strokes with a tight pestle). Dounced cells were centrifuged at 228 g (1,000 rpm, GH-3.8 rotor; Beckman Coulter GS-6) for 5 min at 4°C to pellet nuclei and other fragments. The supernatant was retained as the cytoplasmic fraction; before use, 1 ml of 5x RIPA buffer was added to it and the extract was cleared as described above. The nuclear pellet was resuspended in 3 ml of 0.25 M sucrose/10 mM MgCl2, layered over a 3-ml cushion of 0.88 M sucrose/0.5 mM MgCl2, and centrifuged at 2,800 g (3,500 rpm, GH-3.8 rotor; Beckman Coulter GS-6) for 10 min at 4°C. The resulting cleaner nuclear pellet was resuspended in 5 ml of RIPA buffer, sonicated, and cleared as described above. Total protein concentrations were measured using a Bradford assay.
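The paired g-force and rpm values above are related by the standard conversion RCF = 1.118 × 10^-5 × r × rpm^2, with r in cm. The Python sketch below assumes an effective radius of 20.4 cm, which is back-calculated from the quoted speeds rather than taken from Beckman Coulter rotor documentation; the function names are hypothetical.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard conversion factor for radius in cm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given relative centrifugal force."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Assumed effective radius, back-calculated from the speeds quoted above.
RADIUS_CM = 20.4

print(round(rcf_from_rpm(3500, RADIUS_CM)))  # 2794, quoted as 2,800 g
print(round(rcf_from_rpm(1000, RADIUS_CM)))  # 228
print(round(rpm_from_rcf(2800, RADIUS_CM)))  # 3504
```

Because RCF scales with the radius, the same rpm setting gives a different g-force on a rotor with a different radius, so quoting g values (as above) keeps the protocol transferable.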
Immunoaffinity purification of GFP-tagged and endogenous proteins
Monoclonal anti-GFP antibodies (Roche) were covalently coupled to protein G–Sepharose beads (GE Healthcare) at 2 mg/ml. The beads were incubated with antibody for 1 h at 4°C and then washed twice with 10 volumes of 0.1 M sodium borate, pH 9. Next, the beads were incubated with 10 volumes of borate buffer containing 20 mM dimethylpimelimidate (DMP; Sigma-Aldrich) for 30 min at room temperature. The beads were pelleted and resuspended with 10 volumes of freshly prepared 20 mM DMP in borate buffer for an additional 30-min incubation. The beads were washed twice with 10 volumes of ice-cold 50 mM glycine (pH 2.5) to remove unbound antibody and then washed several times with PBS or RIPA buffer for use and/or storage at 4°C. Monoclonal anti-SMN antibodies (BD Biosciences) were covalently coupled to protein G–Sepharose at 1 mg/ml using a similar protocol. GFP binder (ChromoTek) was prepared and covalently coupled to NHS-activated Sepharose 4 Fast Flow beads (GE Healthcare) at 1 mg/ml as described previously (Rothbauer et al., 2008).
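For planning the DMP crosslinking step, the amount of reagent per incubation follows from concentration × volume × molecular weight. This Python sketch assumes DMP is supplied as the dihydrochloride salt, with a molecular weight of about 259.17 g/mol; the value on the supplier's label should take precedence. The helper name `dmp_mass_mg` is hypothetical. Because the protocol uses two sequential incubations, each with 10 bead volumes of 20 mM DMP, the total is twice the per-incubation amount.

```python
DMP_MW_G_PER_MOL = 259.17  # dimethyl pimelimidate dihydrochloride (assumed form)


def dmp_mass_mg(bead_volume_ml: float, volumes: float = 10.0,
                conc_mm: float = 20.0, mw: float = DMP_MW_G_PER_MOL) -> float:
    """Mass of DMP (mg) for one incubation in `volumes` bead volumes of buffer."""
    buffer_l = bead_volume_ml * volumes / 1000.0  # ml -> l
    moles = conc_mm / 1000.0 * buffer_l           # mM -> mol/l, times litres
    return moles * mw * 1000.0                    # g -> mg


per_incubation = dmp_mass_mg(0.5)  # e.g. 0.5 ml of packed beads
print(f"{per_incubation:.1f} mg per incubation, {2 * per_incubation:.1f} mg total")
```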
For the GFP immunoaffinity experiments, extracts from each cell line were precleared by incubation with Sepharose beads alone for 30 min at 4°C and then mixed in a 1:1 ratio based on total protein concentration. GFP alone or GFP-SMN was affinity purified by incubation with either anti-GFP mAbs or GFP binder conjugated to Sepharose beads. Incubation times varied according to the antibody and the experiment; we recommend a maximum incubation of 1 h where possible. The affinity matrix was washed four times with RIPA buffer. To ensure efficient elution of bound proteins, a bead-equivalent volume of 1% SDS was added, the matrix was boiled for 10 min, and a 4x volume of dH2O was then added. The matrix was vortexed, and the solution was removed and concentrated back to the original bead-equivalent volume (and 1% SDS concentration) using a SpeedVac. Proteins were reduced and alkylated in this solution, first by adding 10 mM DTT and boiling for 2 min, and then by adding 50 mM iodoacetamide and incubating at room temperature in the dark for 30 min. A small aliquot of Laemmli sample buffer was added, and proteins were separated by electrophoresis halfway down NuPAGE 12% Bis-Tris gels. Gels were Coomassie stained and destained overnight before gel slices were excised. Peptides resulting from in-gel digestion with trypsin (Promega) were extracted from the gel slices for automated LC-MS/MS analysis. For validation of SILAC results, GFP and GFP-SMN were affinity purified separately using the GFP binder and subjected to 1D SDS/PAGE and Western blotting. Primary antibodies used for Western blotting (and immunofluorescence, where indicated) included anti-USP9X (AbCam mAb, 1:500 WB, 1:50 IF), anti-coilin (204/10 rabbit polyclonal, 1:1,000 WB), anti-SMN (BD Biosciences mAb, 1:1,000 WB), anti-U1A (856 rabbit polyclonal, 1:2,000 WB), anti-desmin (Abcam mAb; 1:500 WB), and anti-transketolase (goat polyclonal, 1:500; Santa Cruz Biotechnology, Inc.).
HRP-conjugated secondary antibodies (Thermo Fisher Scientific) were detected using the ECL-Plus reagent (GE Healthcare).
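Mixing extracts 1:1 on the basis of total protein means taking equal milligrams of protein, not equal volumes, from each extract. The Python sketch below converts Bradford concentrations into the volume needed from each extract; the function name, the extract labels, and the concentrations are hypothetical. The same arithmetic applies to the equivalent total protein amounts used in the endogenous SMN and bead comparison experiments.

```python
from typing import Dict


def volumes_for_equal_protein(concentrations_mg_per_ml: Dict[str, float],
                              protein_mg: float) -> Dict[str, float]:
    """Volume (ml) of each extract that supplies protein_mg of total protein."""
    volumes = {}
    for name, conc in concentrations_mg_per_ml.items():
        if conc <= 0:
            raise ValueError(f"invalid protein concentration for {name}")
        volumes[name] = protein_mg / conc
    return volumes


# Hypothetical Bradford readings (mg/ml) for the two SILAC-labeled extracts.
bradford = {"GFP_extract": 2.5, "GFP-SMN_extract": 4.0}
print(volumes_for_equal_protein(bradford, protein_mg=5.0))
```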
For the endogenous SMN immunoaffinity experiment and the bead proteome experiment comparing Protein G–Agarose (GE Healthcare), Protein G–Sepharose (GE Healthcare), and the magnetic Protein G–Dynabeads (Invitrogen), equivalent total protein amounts of extracts were incubated separately on the appropriate matrices and combined carefully after one wash step in RIPA buffer. After a further three wash steps in RIPA buffer, bound proteins were eluted and subjected to 1D SDS/PAGE followed by band excision and peptide digestion as described above.
Mass spectrometry and data analysis
An aliquot of the tryptic digest (prepared in 5% acetonitrile/0.1% trifluoroacetic acid in water) was analyzed by LC-MS on an LTQ-Orbitrap mass spectrometer system (ThermoElectron) coupled to a Dionex 3000 nano-LC system (Camberley). The peptide mixture was loaded onto an LC-Packings PepMap C18 trap column (0.3 × 5 mm) equilibrated in 0.1% TFA in water at 20 μl/min and washed for 3 min at the same flow rate; the trap column was then switched in-line with an LC-Packings PepMap C18 analytical column (0.075 × 150 mm) equilibrated in 0.1% formic acid/water. The peptides were separated with a 55-min discontinuous gradient of acetonitrile/0.1% formic acid (2–40% acetonitrile over 40 min) at a flow rate of 300 nl/min, and the HPLC was interfaced to the mass spectrometer with an FS360-20-10 picotip (New Objective) fitted to a nanospray 1 interface (ThermoElectron), with a voltage of 1.1 kV applied to the liquid junction.
The Orbitrap was set to acquire survey scans at 60,000 resolution, and the top five ions in each duty cycle were selected for MS/MS in the LTQ linear ion trap. The raw files were processed to generate a Mascot generic file using the program Raw2msm (Olsen et al., 2005) and searched against the UniProt human database using the Mascot search engine v.2.2 (Matrix Science) run on an in-house server, with the following criteria: peptide tolerance, 10 ppm; enzyme, trypsin; fixed modification, carbamidomethylation of cysteine. Variable modifications were oxidation of methionine and the SILAC labels: medium, Label:13C(6) (R) and Label:2H(4) (K); heavy, Label:13C(6)15N(4) (R) and Label:13C(4)15N(2) (K).
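The search parameters above imply fixed mass offsets between the light, medium, and heavy forms of each peptide. The Python sketch below computes those offsets from the isotope mass differences (13C - 12C, 15N - 14N, 2H - 1H), converts them to m/z for a given charge state, and computes the ppm error against which the 10 ppm tolerance applies. The label compositions are copied from the search parameters as listed; the helper names and the peptide `PEPTIDER` are hypothetical, and every R or K residue in a peptide is assumed to carry the label.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
C13_MINUS_C12 = 1.0033548
N15_MINUS_N14 = 0.9970349
H2_MINUS_H1 = 1.0062767


def label_shift(n13c: int = 0, n15n: int = 0, n2h: int = 0) -> float:
    """Mass added to one residue by its stable-isotope label (Da)."""
    return n13c * C13_MINUS_C12 + n15n * N15_MINUS_N14 + n2h * H2_MINUS_H1


# Label compositions as listed in the Mascot search parameters above.
LABELS = {
    "medium": {"R": label_shift(n13c=6), "K": label_shift(n2h=4)},
    "heavy": {"R": label_shift(n13c=6, n15n=4), "K": label_shift(n13c=4, n15n=2)},
}


def mz_offset(peptide: str, state: str, charge: int) -> float:
    """m/z offset of a labeled peptide relative to its light form."""
    shifts = LABELS[state]
    total = sum(shifts.get(residue, 0.0) for residue in peptide)
    return total / charge


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


print(f"{mz_offset('PEPTIDER', 'heavy', 2):.4f}")
print(abs(ppm_error(500.0025, 500.0)) <= 10.0)  # within a 10 ppm tolerance
```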
Quantitation was performed using the program MS-Quant (http://msquant.sourceforge.net). For each arginine- and/or lysine-containing peptide, the ratio was calculated in each single-scan mass spectrum as the peak area of the labeled peptide divided by the peak area of the corresponding unlabeled peptide. The ratios of all arginine- and lysine-containing peptides sequenced for each protein were then averaged. Individual spectra were inspected using QualBrowser software (XCalibur; ThermoElectron). ProteinCenter (Proxeon Bioinformatics) proteomics data mining and management software was used to eliminate redundancy, compare datasets, convert protein IDs to gene symbols, and perform initial Gene Ontology characterization.
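The ratio calculation described above can be sketched in a few lines of Python: each quantified peptide contributes the ratio of labeled to unlabeled peak area, and the peptide ratios for a protein are combined with an arithmetic mean. This illustrates the arithmetic only and is not the MS-Quant implementation; the function names and peak areas are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def peptide_ratio(labeled_area: float, unlabeled_area: float) -> float:
    """Labeled/unlabeled peak-area ratio for one peptide in one spectrum."""
    if unlabeled_area <= 0:
        raise ValueError("unlabeled peak area must be positive")
    return labeled_area / unlabeled_area


def protein_ratios(peptides: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Average the peptide ratios of each protein (arithmetic mean)."""
    by_protein: Dict[str, List[float]] = defaultdict(list)
    for protein, labeled, unlabeled in peptides:
        by_protein[protein].append(peptide_ratio(labeled, unlabeled))
    return {protein: mean(ratios) for protein, ratios in by_protein.items()}


# Hypothetical peak areas: (protein, labeled area, unlabeled area).
peptides = [
    ("SMN1", 9.0e6, 1.0e6),
    ("SMN1", 1.1e7, 1.0e6),
    ("HSPA8", 2.0e7, 2.0e7),
]
print(protein_ratios(peptides))
```

Medians, or averages of log-transformed ratios, are common alternatives that are less sensitive to a single outlying peptide; the arithmetic mean is used here because it matches the procedure described in the text.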
Fluorescence microscopy and photobleaching experiments
Fluorescence imaging was performed on a DeltaVision Spectris widefield deconvolution microscope (Applied Precision) fitted with an environmental chamber (Solent Scientific) to maintain the temperature at 37°C, a CoolMax charge-coupled device camera (Roper Scientific), and a quantifiable laser module (QLM; Applied Precision) with a 488-nm laser. For fixed cell imaging, a mixture of parental HeLa cells and HeLa cells stably expressing GFP-SMN was fixed with paraformaldehyde on glass coverslips, permeabilized with Triton X-100, stained with both anti-USP9X (detected with TRITC-conjugated anti–mouse secondary antibodies) and the DNA stain DAPI, and mounted in FluorSave mounting media (Calbiochem). Cells were imaged using a 60x NA 1.4 Plan-Apochromat objective (Olympus) and the appropriate filter sets (Chroma Technology Corp.), with 20 optical sections of 0.5 μm each acquired. softWoRx software (Applied Precision) was used for both acquisition and deconvolution. For the FRAP experiments, HeLa cells stably expressing free GFP were cultured in glass-bottomed dishes (WILLCO, Intracel) and mounted on the same system. A single section was imaged before photobleaching, a region of interest was then bleached to ∼50% of its original intensity using the 488-nm laser, and a rapid series of images was acquired after the photobleach. Recovery curves were plotted, and the mobile fraction and half-time of recovery were determined using softWoRx.
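The fitting procedure used by softWoRx is not described here, so the Python sketch below uses a common single-exponential recovery model, F(t) = F0 + (Finf - F0)(1 - exp(-k t)), as an assumed stand-in. The mobile fraction is then (Finf - F0)/(Fpre - F0) and the half-time is ln 2/k. The plateau Finf is estimated from the last few points and k from a log-linear fit to the early recovery; intensities are assumed to be background-subtracted and corrected for acquisition photobleaching. The function name and the synthetic curve are hypothetical.

```python
import math
from statistics import mean


def frap_analysis(times, intensities, pre_bleach, tail_points=3, max_fraction=0.9):
    """Estimate mobile fraction and recovery half-time from a FRAP curve.

    times/intensities start at the first post-bleach image; pre_bleach is the
    pre-bleach intensity of the same region, in the same units.
    """
    f0 = intensities[0]
    f_inf = mean(intensities[-tail_points:])  # plateau estimate
    mobile_fraction = (f_inf - f0) / (pre_bleach - f0)

    # Linearize F(t) = f0 + (f_inf - f0) * (1 - exp(-k t)) as
    # ln(1 - recovered) = -k t, using only the early part of the curve.
    xs, ys = [], []
    for t, f in zip(times, intensities):
        recovered = (f - f0) / (f_inf - f0)
        if 0.0 < recovered < max_fraction:
            xs.append(t - times[0])
            ys.append(math.log(1.0 - recovered))
    if not xs:
        raise ValueError("no points in the early recovery phase")
    # Least-squares slope of a line through the origin gives -k.
    k = -sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    half_time = math.log(2) / k
    return mobile_fraction, half_time


# Synthetic curve: bleach to 50%, 80% mobile, k = 0.5 per s (illustrative only).
t = [0.5 * i for i in range(61)]
f = [0.5 + 0.4 * (1 - math.exp(-0.5 * ti)) for ti in t]
mobile, t_half = frap_analysis(t, f, pre_bleach=1.0)
print(f"mobile fraction {mobile:.2f}, t1/2 {t_half:.2f} s")
```

Estimating the plateau from the last points assumes that the acquisition ran long enough for recovery to level off; for slowly recovering proteins, a nonlinear fit of all three parameters would be more appropriate.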
Online supplemental material
Table S1 contains a comprehensive list of all proteins identified in the comparative bead proteome SILAC experiment, including separate datasets for cytosolic and nuclear extracts and for 30-min and 18-h incubations. Preferential enrichment on either Sepharose or magnetic beads is indicated, and commonly found keratins are listed separately. Table S2 compares the quality of data obtained for known SMN complex members using either GFP binder or mAb anti-GFP to affinity purify GFP-SMN and mAb anti-SMN to affinity purify endogenous SMN. Fig. S1 demonstrates the rapid recovery of free GFP in both the cytoplasm and nucleoplasm after photobleaching in live cells. Fig. S2 compares the distribution of nonspecific protein binding between Sepharose and agarose and between magnetic beads and agarose. Fig. S3 graphically compares data obtained using either GFP binder or mAb anti-GFP to affinity purify GFP-SMN and mAb anti-SMN to affinity purify endogenous SMN. The Coomassie-stained gels used to separate proteins before mass spectrometric analysis are shown, and SILAC ratios are plotted against total peptide abundance for known SMN complex members.
© 2008 Trinkle-Mulcahy et al. This article is distributed under the terms of an Attribution–Noncommercial–Share Alike–No Mirror Sites license for the first six months after the publication date (see http://www.jcb.org/misc/terms.shtml). After six months it is available under a Creative Commons License (Attribution–Noncommercial–Share Alike 3.0 Unported license, as described at http://creativecommons.org/licenses/by-nc-sa/3.0/).
Abbreviations used in this paper: FP, fluorescent protein; SILAC, stable isotope labeling with amino acids in cell culture.
We would like to thank Drs. Douglas Lamont and Kenneth Beattie of the Fingerprints Proteomics Facility at the University of Dundee for technical assistance.
Work in the Lamond laboratory was funded by a Wellcome Trust Program Grant (073980/Z/03/Z), and by an interdisciplinary RASOR (Radical Solutions for Researching the Proteome) initiative, which is supported by the Biotechnology and Biological Sciences Research Council, Engineering and Physical Sciences Research Council, Scottish Higher Education Funding Council, and Medical Research Council (MRC). A. Lamond is a Wellcome Trust Principal Research Fellow. S. Boulon is funded by a Human Frontier Science Program fellowship. F.-M. Boisvert is funded by a Caledonian Research Foundation fellowship. N.A. Morrice is supported by the MRC, and F. Vandermoere and R. Urcia are funded by the RASOR collaboration. U. Rothbauer and H. Leonhardt are members of the Munich Center for Integrated Protein Science (CiPSM) and shareholders of ChromoTek (Munich, Germany).