Under steady-state conditions, major histocompatibility complex (MHC) I molecules are associated with self-peptides that are collectively referred to as the MHC class I peptide (MIP) repertoire. Very little is known about the genesis and molecular composition of the MIP repertoire. We developed a novel high-throughput mass spectrometry approach that yields an accurate definition of the nature and relative abundance of unlabeled peptides presented by MHC I molecules. We identified 189 and 196 MHC I–associated peptides from normal and neoplastic mouse thymocytes, respectively. By integrating our peptidomic data with global profiling of the transcriptome, we reached two conclusions. The MIP repertoire of primary mouse thymocytes is biased toward peptides derived from highly abundant transcripts and is enriched in peptides derived from cyclins/cyclin-dependent kinases and helicases. Furthermore, we found that ∼25% of MHC I–associated peptides were differentially expressed on normal versus neoplastic thymocytes. Approximately half of those peptides are derived from molecules directly implicated in neoplastic transformation (e.g., components of the PI3K–AKT–mTOR pathway). In most cases, overexpression of MHC I peptides on cancer cells entailed posttranscriptional mechanisms. Our results show that high-throughput analysis and sequencing of MHC I–associated peptides yields unique insights into the genesis of the MIP repertoire in normal and neoplastic cells.
MHC class I molecules present short peptides at the cell surface, typically 8–11 mers, for scrutiny by CD8 T lymphocytes (1, 2). Generation of peptide–MHC I complexes is initiated by proteasomal degradation of source proteins in the cytosol (3). Peptides generated by the proteasome are translocated in the endoplasmic reticulum, where they are subjected to N-terminal trimming, and then incorporated in MHC I proteins and exported at the cell surface (4–7). Presentation of microbial peptides by MHC I is required to elicit CD8 T cell responses against intracellular pathogens (8). Under steady-state conditions, i.e., in the absence of infection, cell surface MHC I molecules are associated solely with self-peptides. These peptides, collectively referred to as the MHC class I peptide (MIP) repertoire (9), play vital roles. They shape the repertoire of developing thymocytes (10, 11), transmit survival signals to mature CD8 T cells (12), amplify responses against intracellular pathogens (13), allow immunosurveillance of neoplastic cells (14), and influence mating preferences in mice (15). The MIP repertoire is also involved in immunopathology because it can be targeted by autoreactive T cells that initiate autoimmune diseases and alloreactive T cells that cause graft rejection and graft-versus-host disease (16, 17).
Despite the tremendous importance of the MIP repertoire, we know very little about its genesis and molecular composition. Proteomic analysis of the MIP repertoire is a daunting task because estimates suggest that it encompasses thousands of peptides that are present in low copy numbers per cell (1, 18). Each MHC molecule recognizes peptides through a broadly defined consensus motif of amino acids serving as anchors to the appropriate binding pockets on the MHC molecules. Such motifs were first established by pool Edman sequencing of unfractionated peptide mixtures eluted from MHC molecules (19). Direct biochemical characterization of specific MHC I–associated peptides has typically involved immunoaffinity purification of MHC molecules after cell lysis, fractionation of the peptides by chromatography, and sequencing, initially by Edman's method and, more recently, by mass spectrometry (MS). Refinements in MS methods pioneered by Hunt et al. and Falk et al. represented major progress and led to the characterization of several MHC I–associated peptides (18, 20, 21), in spite of limitations such as low peptide yield, preferential loss of peptides with low affinity for MHC I, and contamination of the MHC molecules by cellular debris and detergents (22, 23).
Two high-throughput strategies were therefore implemented to provide comprehensive molecular definition of the MIP repertoire. The first is based on transfection of cell lines with expression vectors coding for soluble secreted MHCs (lacking a functional transmembrane domain) and elution of peptides associated with secreted MHCs (24, 25). This interesting approach improves MHC I peptide recovery, but presents some limitations, as follows: (a) it cannot be used on freshly explanted cells; (b) cell transfection by itself may perturb the MIP repertoire (26); and (c) the MIP repertoire associated with soluble MHC corresponds to the repertoire of peptides that can bind the transfected MHC allele (what “can be presented”), but not necessarily to peptides that are normally presented at the cell surface (what “is presented”). The second approach hinges on chemical or metabolic labeling to provide quantitative profiles of MHC I–associated peptides (9, 27–29). Both labeling methods have drawbacks: chemical derivatization suffers from variable modification yields and unexpected side reaction products, whereas metabolic labeling is only applicable to certain MHC I allelic products and to cell culture model systems (30, 31). Nonetheless, high-throughput peptide sequencing analyses have provided crucial insights into the structure of the MIP repertoire. First, the source proteins for the MIP repertoire are found in almost every compartment of the cell (25). Second, only a limited correlation is observed between the amounts of the MHC I–associated peptides presented by cells and the relative expression of source proteins from which these peptides are derived (28). A likely explanation for this discrepancy is that the MIP repertoire preferentially derives from defective ribosomal products (DRiPs) and short-lived proteins relative to slowly degraded proteins (2, 32, 33). 
Third, for peptides differentially expressed on normal versus neoplastic cells, no clear correlation was found between mRNA levels and corresponding MHC peptide levels (9).
The goal of our work was to understand the structure and genesis of the MIP repertoire. In accordance with a recent advocacy for “systems immunology” (34), we used novel bioinformatics tools based on peptide mapping and segmentation to obtain a global and accurate quantification of native unlabeled peptides present in the MIP repertoire (unpublished data) (35, 36). In particular, we addressed four specific questions. Does the MIP repertoire reflect the composition of the transcriptome? Are some gene families overrepresented in the MIP repertoire? What is the impact of neoplastic transformation on the MIP repertoire? Can a high-throughput, unlabeled-peptide–sequencing platform provide valuable insights for the identification of immunogenic tumor-associated epitopes?
Experimental approach for the identification and quantification of MHC I–associated peptides
Mild acid elution (MAE) of MHC I–associated peptides from living cells presents several advantages over immunoprecipitation of peptide–MHC I complexes. Because it involves fewer purification steps and no detergents, MAE yields ∼10 times more MHC I–associated peptides than immunoprecipitation and introduces no bias linked to preferential loss of low-affinity peptides (37–39). However, MAE has never been used for high-throughput sequencing of the MIP repertoire because it is assumed that eluted peptides contain not only MHC I–associated peptides, but also “contaminant” peptides. We reasoned that this problem could be overcome by using MHC I–deficient cells as a negative control. Because β2-microglobulin (β2m) is essential for formation of stable peptide–MHC I complexes, β2m-deficient cells are MHC I deficient. We therefore compared peptides eluted from three cell populations, as follows: the WT EL4 thymoma cell line, a β2m-deficient EL4 mutant cell line (C4.4-25−), and C4.4-25− cells transfected with a genomic clone of murine β2m (E50.16+) (40). Flow cytometry analysis showed that expression of H2Db, H2Kb, and Qa2 MHC I molecules was abrogated in β2m-deficient EL4 cells, but was restored in β2m transfectants (Fig. 1 A).
Samples obtained by MAE of peptides from 8 × 10⁷ cells were analyzed using an on-line 2D-nanoLC-MS system (see Materials and methods for more details). Each individual LC-MS run was visualized as a three-dimensional map where each isotopic profile (dot) corresponds to a given ion that is defined by its specific abundance, m/z, and retention time coordinates (Fig. 1 B). We first assessed the reproducibility of data obtained with our method using peptide eluates from WT EL4 cells. We found that 95% of peptide ions showed a variation of less than ±1.4- and ±2.2-fold in abundance across 2D-nanoLC-MS instrumental and biological replicates, respectively (Fig. S1). Therefore, peptides were considered to be differentially expressed when the fold difference in abundance was ≥2.5 (P < 0.05). We next compared profiles from WT and β2m-deficient EL4 cells. Contour profiles obtained after computational analyses showed a higher level of peptide complexity for WT compared with β2m-deficient EL4 cells (Fig. 1 B). We assessed the proportion of MHC I–associated versus contaminant peptides using in-house peptide detection software (Mass Sense; unpublished data). To identify and define individual peptide clusters, peptide lists generated by Mass Sense were compared across replicate injections of both WT- and β2m− mutant–derived samples using segmentation analyses with hierarchical clustering. Heat map representation was used to visualize differences in peptide cluster expression between WT- and β2m− mutant–derived samples (Fig. 1 B). Peptides that were recovered only from WT EL4 cells were considered to be part of the MIP repertoire. Because β2m-deficient cells express low amounts of incomplete MHC I molecules at the cell surface (41), we also considered that peptides significantly more abundant on WT relative to β2m-deficient cells were MHC I associated. 
In the latter case, we used a very stringent criterion; to be considered as MHC I associated, a peptide had to be at least five times more abundant on WT relative to β2m-deficient cells. Out of 3,716 unique peptide clusters that were reproducibly detected across three biological replicates, 1,487 peptide clusters were overexpressed or uniquely detected within the WT EL4 sample. Thus, we estimated that ∼40% of acid-eluted materials correspond to specific-MHC I peptides.
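The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the labels, and the convention that an abundance of 0 means "not detected" are hypothetical and are not taken from the in-house software; only the fivefold criterion and the cluster counts come from the text.

```python
def classify_cluster(wt_abundance, b2m_null_abundance, min_fold=5.0):
    """Label one peptide cluster from its abundance in WT and β2m-deficient eluates.

    Assumption: an abundance of 0 (or None) means the cluster was not
    detected in that sample.
    """
    wt = wt_abundance or 0.0
    ko = b2m_null_abundance or 0.0
    if wt <= 0:
        return "contaminant"          # absent from WT: cannot be MHC I associated
    if ko <= 0:
        return "mhc1_unique"          # recovered only from WT cells
    if wt / ko >= min_fold:
        return "mhc1_overexpressed"   # at least min_fold more abundant on WT
    return "contaminant"


def mhc1_fraction(labels):
    """Fraction of clusters labelled as MHC I associated."""
    hits = sum(label.startswith("mhc1") for label in labels)
    return hits / len(labels)


# Counts reported in the text: 1,487 of 3,716 clusters.
print(round(1487 / 3716, 3))  # 0.4
```

A real pipeline would apply this rule to replicate-averaged abundances together with a significance test; the sketch covers only the fold-change step.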
We obtained MS/MS assignments for 881 of the 3,716 peptide clusters, all of which had a characteristic fragmentation associated with peptide backbone cleavages. Although these identifications correspond to 24% of the entire ion population detected, other MS/MS spectra were assigned to modified residues, de novo peptide sequences, or spectra of poor quality. Peptide coordinates were computationally related to their corresponding sequenced peptides. Out of 881 sequenced peptides, 383 had a consensus motif for H2Db, H2Kb, Qa1, or Qa2 MHC I molecules. Stringent validation criteria were next applied to select peptide ions showing mass accuracy of <30 parts per million (ppm), with MS/MS spectra displaying consistent sets of b- and/or y-type fragment ion series corresponding to the MHC I peptide sequence. This manual validation enabled the identification of 178 unique MS/MS spectra corresponding to MHC I–associated peptides: 158 were detected uniquely in eluates from WT EL4 cells, whereas 20 were overexpressed at least fivefold on WT relative to β2m-deficient EL4 cells (Fig. 1 C). To confirm that differences between WT and β2m-deficient EL4 cells were truly β2m dependent, we analyzed peptides recovered from E50.16+ cells (β2m-deficient EL4 cells transfected with a genomic clone of murine β2m). As expected, we observed that 95% of MHC I peptides were expressed at the cell surface of β2m+ transfectants. Absence of a few MHC I peptides on β2m transfectants is probably explained by the lower expression of H2Db and H2Kb on β2m transfectants compared with WT cells (Fig. 1 A). For the 178 sequenced MHC I–associated peptides, relative abundance in the three EL4 cell variants (WT, β2m− mutants, and β2m+ transfectants) is presented in Table S1. Data for a representative set of peptides are shown in Table I. Notably, although MHC I peptides were 8–13 mers, the length of contaminant peptides was more variable, ranging from 6 to 26 amino acids with a median value of 13 residues (unpublished data).
Definition of the MIP repertoire presented by discrete MHC I allelic products
Large databases of MHC I peptides and computational binding methods are now part of emerging biomedical resources (42–45). Information in these databases concerns mainly, though not exclusively, viral and bacterial-derived peptides presented by MHC I allelic products in humans and mice. Each MHC I peptide in Table S1 was manually classified according to restriction size and binding motif favored by Db, Kb, Qa1, and Qa2 class I molecules (19, 46, 47). From 178 MHC I peptides identified from variant EL4 cell lines, 92% were presented by classical MHC Ia allelic products H2Db and H2Kb, and 8% were presented by MHC Ib molecules Qa1 and Qa2 (Fig. 2 A and Table S2). When viewed in toto, peptide pools presented by H2Db, H2Kb, and Qa2 displayed the canonical binding motifs documented for these MHC I molecules (Fig. 2 B).
A priori, two factors may determine whether specific peptides are presented by MHC I: peptide affinity for MHC molecules and the processing of source proteins along the MHC I presentation pathway. Protein attributes that are relevant to MHC I processing include rates of protein translation, DRiP formation, and degradation (32, 33, 48, 49). To evaluate the importance of peptide binding affinity, each of the 164 source proteins of peptides presented by H2Db (n = 96) or H2Kb (n = 68) was analyzed with the smm algorithm. We scored and ranked the predicted binding affinity of all peptides from individual proteins for H2Db or H2Kb (for example, a protein of 400 amino acids contains 392 nonapeptide sequences). We then asked how each peptide that we had sequenced by MS/MS would rank relative to other peptides from its source protein. Remarkably, 91% of H2Db-associated peptides eluted from EL4 cells ranked in the top 1% in terms of H2Db binding affinity (Fig. 2 C and Table S2). Similar results were obtained from the SYFPEITHI database (unpublished data). That some peptide sequences predicted to be top MHC I binders were not found in the MIP repertoire of EL4 cells is not surprising. This means either that proteases involved in MHC I processing do not generate these specific “degradation products” or that they are present in subthreshold amounts. For H2Kb, 85% of peptides ranked in the top 5% for predicted H2Kb binding affinity (Fig. 2 C and Table S2). These data support the concept that MHC I binding affinity plays a dominant role in shaping the content of the MIP repertoire.
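The within-protein ranking described above can be illustrated with a short Python sketch. It assumes a user-supplied scoring function, here called `predict_ic50` (a hypothetical stand-in for a real predictor such as smm, whose actual interface is not reproduced here), and it ranks the eluted peptide against all peptides of the same length from its source protein.

```python
def kmers(protein, k=9):
    """All overlapping k-mers of a protein sequence, in order."""
    return [protein[i:i + k] for i in range(len(protein) - k + 1)]


def rank_percentile(protein, peptide, predict_ic50):
    """Percentile rank of `peptide` among same-length peptides of `protein`.

    `predict_ic50` is any callable returning a predicted IC50 (nM); lower
    means tighter binding. It is a placeholder for a real predictor.
    """
    candidates = kmers(protein, len(peptide))
    if peptide not in candidates:
        raise ValueError("peptide does not occur in the source protein")
    target = predict_ic50(peptide)
    # Number of candidates predicted to bind strictly better than the eluted peptide.
    better = sum(predict_ic50(c) < target for c in candidates)
    return 100.0 * (better + 1) / len(candidates)


# A 400-residue protein contains 400 - 9 + 1 nonapeptides, as stated in the text.
print(len(kmers("A" * 400, 9)))  # 392
```

With this convention, a peptide "ranks in the top 1%" when `rank_percentile` returns a value of at most 1.0; how ties are counted is a design choice of the sketch.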
Discrimination between MHC I–associated peptides and contaminant peptides using bioinformatic tools
In the aforementioned experiments, we used β2m-deficient cells as negative control to discriminate MHC I–associated peptides from contaminant peptides. The need for an MHC I–deficient negative control would nevertheless represent a significant hurdle for high-throughput sequencing of the MIP repertoire of primary cells. We therefore asked whether bioinformatic tools could be used to identify MHC I–associated peptides among a peptide mixture obtained by MAE. We first ranked the 164 H2Db- and H2Kb-associated peptides eluted from EL4 cells as a function of their MHC binding score estimated with smm and SYFPEITHI algorithms (Fig. 3, red bars). We found that ∼97% of MHC I–associated peptides were scored below an smm binding threshold of 1,000 nM (IC50) and above a SYFPEITHI binding threshold of 22 for H2Db and 13 for H2Kb (Fig. 3). The aforementioned thresholds therefore provided a 3% false-negative rate (3% of MHC I–associated peptides were wrongly classified as contaminants). To estimate the false-positive rate obtained with these thresholds, we analyzed the 215 contaminant peptides eluted from EL4 cells representing high-quality assignment (Mascot score >45, mass accuracy <30 ppm, MS/MS manually inspected) from a list of 498 candidates that were not significantly overexpressed on WT relative to β2m-deficient EL4 cells. Only 56 out of the 215 contaminant peptides had the canonical length for binding H2Db or H2Kb, and very few of them satisfied our smm and SYFPEITHI thresholds (Fig. 3, blue bars). Overall, the false-positive rate was <2% (<2% of contaminants were wrongly classified as MHC I–associated peptides). Similar results were obtained for Qa2 peptides using the Rankpep algorithm with a binding threshold of 120. We synthesized nine peptides across the predicted affinity spectrum and tested their ability to bind H2Kb or H2Db. 
Peptides classified as MHC I associated according to their smm score did bind to MHC I, whereas peptides classified as contaminants did not (Fig. S2). These experimental data further validate bioinformatic discrimination between MHC I–associated peptides and contaminant peptides. We therefore conclude that peptides obtained by MAE can be categorized as MHC I–associated versus contaminant peptides with 1–3% false-positive and -negative rates using publicly available algorithms.
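The threshold-based discrimination above can be expressed as a small Python sketch. The numeric thresholds are those quoted in the text; requiring both the smm and the SYFPEITHI criteria at once, treating the SYFPEITHI cutoff as inclusive, and the function names are assumptions made for illustration only.

```python
SMM_MAX_IC50_NM = 1000.0                    # smm threshold quoted in the text
SYFPEITHI_MIN = {"H2Db": 22, "H2Kb": 13}    # SYFPEITHI thresholds quoted in the text


def is_mhc1_associated(smm_ic50_nm, syfpeithi_score, allele):
    """Apply both thresholds; combining them with AND is an assumption."""
    return (smm_ic50_nm < SMM_MAX_IC50_NM
            and syfpeithi_score >= SYFPEITHI_MIN[allele])


def error_rates(true_binders, contaminants):
    """Return (false_negative_rate, false_positive_rate).

    Each argument is a list of (smm_ic50_nm, syfpeithi_score, allele) tuples
    whose true class is known, e.g. from the β2m-deficient control.
    """
    false_neg = sum(not is_mhc1_associated(*p) for p in true_binders)
    false_pos = sum(is_mhc1_associated(*p) for p in contaminants)
    return false_neg / len(true_binders), false_pos / len(contaminants)
```

Given scores for the 164 validated peptides and the 215 contaminants, `error_rates` would return the two rates discussed above; the scores themselves must come from the prediction tools and are not generated here.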
Global portrayal of the MIP repertoire of primary mouse thymocytes
In the next series of experiments, peptides eluted from primary mouse thymocytes by MAE were analyzed by on-line 2D-nanoLC-MS/MS. Of the sequenced peptides, 189 were classified as MHC I associated based on the smm, SYFPEITHI, and Rankpep thresholds selected above: 84 H2Db-, 91 H2Kb-, 13 Qa2-, and 1 Qa1-associated peptides (Table S3). Based on analyses described in the previous paragraph, we estimate that the false-positive rate for the 189 thymocyte peptides is <2%. GO term enrichment analysis was performed on the 189 genes coding for those peptides by applying stringent criteria from 772 functional annotations (Fig. 4 A). In accordance with a study on HLA-B*1801–associated peptides (25), the MIP repertoire of primary thymocytes showed a modest but significant twofold enrichment in proteins located in the cytoplasm and the nucleus. More interestingly, we found an 11–19-fold enrichment in proteins related to cyclin and cyclin-dependent kinases (Ccng1, Cdkn1b, Chek1, Bccip, Cul2, Ccnd3, and CcnF), which regulate the cell cycle. We also found an 11–17-fold enrichment in helicases (Dhx15, Ddx5, Ddx6, Hells, and Ddx47), which are required for efficient and accurate replication, repair, and recombination of the genome.
MHC I–associated peptides derive preferentially from highly abundant mRNAs
We next sought to determine whether the MIP repertoire of thymocytes was molded at the mRNA level. To this end, we compared the relative abundance of two sets of thymic transcripts: (a) those encoding MHC I peptides eluted from primary thymocytes (Table S3), and (b) 36,182 transcripts encoded by the mouse genome (50). We found a dramatic enrichment in highly abundant mRNAs among transcripts coding for MHC I peptides (Fig. 4, B and C). Thus, whereas 9% of total mRNAs were expressed at high levels, 42% of those coding for MHC I peptides were (Fig. 4 C). Conversely, 62% of total mRNAs showed low abundance, but only 20% of those coding for MHC I peptides were expressed at low levels. Nonetheless, some MHC I peptides are coded by low-abundance mRNAs (Fig. 4, B and C). We hypothesized that low-abundance mRNAs may contribute to the MIP repertoire because they code for peptides with very high affinity for MHC I. Our data did not support that hypothesis. We found no correlation between the computed MHC I binding affinity of peptides and the abundance of their mRNAs (Fig. 4 D). MHC I peptides derived from low-abundance transcripts did not display superior computed MHC binding affinity. Although the ability to detect MHC I peptides is limited by the MS sensitivity (high attomole range) and dynamic range of detection (13 orders of magnitude on log2 scale of Fig. 4), our results demonstrate that MHC I peptides derive preferentially, but not exclusively, from highly abundant mRNAs.
Evidence that the MIP repertoire of thymocytes conceals a tissue-specific signature
Because MHC I peptides derived preferentially from highly abundant mRNAs, and abundance of discrete mRNAs varies among different tissues and organs, we hypothesized the MIP repertoire might harbor a tissue-specific signature. Using the SymAtlas database (http://symatlas.gnf.org/SymAtlas/) (50), we analyzed the relative expression in 58 mouse tissues of mRNAs encoding thymocyte MHC I–associated peptides (see Materials and methods for details on calculation). Remarkably, the mean expression of the 180 mRNAs coding thymocyte MHC I peptides was higher in the thymus than in all other tissues and organs (Fig. 5 A).
Large-scale quantitative analyses of transcriptional expression profiles across different tissues have revealed broadly expressed and tissue-specific groupings (50, 51). To determine whether the MIP repertoire of thymocytes is imprinted by tissue-specific genes, we attributed a z score to individual source mRNAs (52). Genes with high z score (from ∼2.5 to 7) correspond to those that are preferentially overexpressed in the thymus. Among the 180 genes encoding thymocyte MHC I peptides, 30 showed a high z score (Fig. 5, B and C). The z score distribution of genes encoding thymocyte MHC I peptides is illustrated in Fig. 5 B, and a heat map representation of the 30 genes with high z scores is depicted in Fig. 5 C. Next, we estimated how additive removal of high z score genes would affect the thymus specificity of the gene set. Thymus specificity of the gene set was lost after removal of the 30 high z score genes (Fig. 5 D). In contrast, removal of up to 70 random genes did not affect the thymus specificity of the gene set. These results suggest that the MIP repertoire of thymocytes conceals a tissue-specific signature that is constituted by ∼30 genes, i.e., ∼17% of genes that are represented in the thymocytes' MIP repertoire.
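As a rough illustration of how a tissue-specificity z score of this kind can be computed, the Python sketch below standardizes a gene's thymus expression against its expression across all profiled tissues. The exact calculation used in the study is described in Materials and methods and in reference 52 and may differ; the formula, the use of the sample standard deviation, and all names here are assumptions.

```python
from statistics import mean, stdev


def tissue_z_score(expression_by_tissue, tissue="thymus"):
    """Z score of one gene's expression in `tissue` relative to all tissues.

    Assumed formula: (value in tissue - mean over tissues) / sample SD.
    """
    values = list(expression_by_tissue.values())
    spread = stdev(values)
    if spread == 0:
        return 0.0  # flat profile: no tissue specificity
    return (expression_by_tissue[tissue] - mean(values)) / spread


def high_z_genes(profiles, tissue="thymus", cutoff=2.5):
    """Names of genes whose z score in `tissue` reaches the cutoff."""
    return [gene for gene, profile in profiles.items()
            if tissue_z_score(profile, tissue) >= cutoff]
```

Here `profiles` maps a gene name to a dictionary of tissue names and expression values; the 2.5 cutoff mirrors the lower bound of the high z score range mentioned above.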
Among the 30 genes with high thymic z score, 16 are expressed, albeit at lower levels, in other organs, whereas 14 genes are expressed almost exclusively in hematolymphoid organs: Rhoh, Centb1, Cxcr4, Depdc1a, Foxp1, Dnmt1, 9230105E10Rik, Cd3e, C330027C09Rik, Igtp, Mns1, Dock2, Actr2, and Vps13d (Fig. 5 C). In accordance with the concept that tissue-specific expression is indicative of gene function in mammals (51), we noted that hematolymphoid genes represented in the thymocytes' MIP repertoire play critical roles in T cell development. For example, Rhoh is important for positive thymocyte selection, Cxcr4 for migration of T cell progenitors, and Cd3e for T cell development (53–55). Further studies are needed to cogently assess the functional importance of genes that impart tissue specificity to the thymocytes' MIP repertoire. However, evidence suggests that many, if not all, of these genes are important for thymocyte function. In conclusion, two major and related points can be made regarding the connection between the transcriptome and the MIP repertoire. First, the MIP repertoire is enriched in peptides derived from highly abundant transcripts. Second, our data suggest that thymocytes' MIP repertoire conceals a tissue-specific signature that derives from ∼17% of MHC I–associated peptides.
The MIP repertoire of normal versus neoplastic thymocytes
Ultimately, the genesis of MHC I peptides must be regulated by mRNA translation and protein degradation by the proteasome (2, 49), two processes that are profoundly perturbed in neoplastic cells (56, 57). To evaluate the impact of neoplastic transformation on the MIP repertoire, we compared the MIP repertoire of primary thymocytes from C57BL/6 female mice to that of neoplastic thymocytes. As a source of neoplastic thymocytes, we used in vivo grown EL4 cells. EL4 cells were originally derived from a C57BL/6 female mouse. Based on aforementioned experiments on the reproducibility of estimation of peptide abundance by MS analyses (Fig. S1), peptides were considered to be differentially expressed when the fold difference in abundance was ≥2.5 (P < 0.05). Table S3 and S4 present the complete list of peptides eluted from primary thymocytes and in vivo grown EL4 cells and their computed MHC binding score. Overall, 25% of MHC I peptides were differentially expressed on normal versus neoplastic thymocytes (Table II and Fig. 6 A). Thus, 22 peptides were underexpressed and 21 were overexpressed on neoplastic relative to normal thymocytes (Table II). Differentially expressed peptides derived from genes implicated in several biological processes, such as cell cycle progression, apoptosis, signal transduction, cytoskeleton assembly, and differentiation, as well as regulation of transcription and translation (Table II). As an example, the X-linked lymphocyte-regulated 3 (Xlr3a/b) gene encoded a peptide found only on EL4 cells (Fig. 6 B and Table II). Xlr genes are important for T cell differentiation and are overexpressed in several lymphoid malignancies (58). In contrast, an MHC I peptide from cyclin-dependent kinase inhibitor 1B (Cdkn1b) was underexpressed on neoplastic cells (Fig. 6 C, Table II). Cdkn1b is known to act as a potent tumor suppressor gene in a variety of cancers (57). 
Remarkably, ∼50% of differentially expressed peptides derived from genes that are known to be involved in tumorigenesis (Table II and Table S5). For example, 10 differentially expressed peptides (Bach2, Cdkn1b, Cxcr4, Eif3s2, Eif3s10, Igtp, Pa2g4, Pi3kap1, Ptpn6, and Sgk) originated from genes related to the PI3K–AKT–mTOR pathway, which is the oncogenic signaling pathway most commonly targeted by genomic aberrations in cancer (59, 60).
Genesis of peptides overexpressed on tumor cells
An important question is whether differential expression of MHC I peptides on neoplastic relative to normal thymocytes correlates with changes in mRNA levels of source transcripts. To test this, we selected 19 peptides overexpressed on EL4 cells relative to primary thymocytes, 15 that were underexpressed, and 13 that were not differentially expressed. Then, we analyzed the level of expression of their source mRNAs in neoplastic versus primary thymocytes using quantitative real-time PCR. Scatterplot representation of the correlation between relative mRNA expression and relative MHC I peptide expression is depicted in Fig. 6 D. From the linear regression, a Spearman coefficient of 0.63 was calculated, showing a significant but moderate correlation between peptide and mRNA expression ratios. The strength of the correlation was conspicuously decreased by a set of 14 peptides that were more abundant on neoplastic cells, but whose mRNA levels were not overexpressed (Fig. 6 D, dotted box). Exclusion of these 14 genes increased the correlation coefficient to 0.78. Remarkably, for the 19 peptides overexpressed on EL4 cells, increased transcript levels were present in EL4 cells in only 5 cases (Fig. 6 D). Peptides overexpressed on neoplastic cells are particularly important because they can be used as targets in cancer immunotherapy (61, 62). Thus, the salient finding here is that 74% (14 of 19) of peptides overexpressed on EL4 cells would have been missed by studies of mRNA expression levels. An important implication is that, at least in our model, overexpression of MHC I–associated peptides on neoplastic cells generally entails posttranscriptional mechanisms.
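For readers who wish to reproduce this kind of comparison, the following Python sketch computes a Spearman rank correlation between two lists of expression ratios (for example, peptide and mRNA ratios). It is a generic textbook implementation written for illustration, not the analysis code used in the study, and the function names are hypothetical.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because the coefficient depends only on ranks, it is unchanged by taking logarithms of the ratios. The 74% figure quoted above is simply 14 of 19 overexpressed peptides, i.e. `round(100 * 14 / 19)`.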
Testing the immunogenicity of peptides overexpressed on neoplastic cells
Finally, we wished to determine whether peptides overexpressed on neoplastic thymocytes (in vivo grown EL4 cells) would be able to elicit a specific CD8 T cell response. To test this, we used the following two peptides: (a) STLTYSRM, which is derived from serum/glucocorticoid-regulated kinase (Sgk) and presented by H2Kb, and (b) VAAANREVL, which is derived from X-linked lymphocyte-regulated 3 (Xlr3a/b) and presented by H2Db. One important difference between the two peptides is that VAAANREVL was not found on primary thymocytes (fold difference relative to EL4 cells ≥ 85), whereas low levels of STLTYSRM were present on primary thymocytes (fold difference = 10; Table II and Table S6). TAP-deficient T2-Db and T2-Kb cells were first incubated with titrated amounts of synthetic peptides to evaluate their ability to bind and stabilize H2Kb and H2Db. In contrast to the Ld-restricted peptide RPQASGVYM, loading with either STLTYSRM or VAAANREVL resulted in an increase in cell surface expression of H2Kb or H2Db (Fig. S2), thereby confirming their ability to bind their respective MHC allele. We next immunized C57BL/6 mice with DCs coated or not coated with VAAANREVL and STLTYSRM synthetic peptides. Splenocytes from immunized mice were tested for in vitro cytotoxicity against primary thymocytes and EL4 target cells that were not loaded with exogenous peptides. Splenocytes from mice primed with unloaded DCs showed no cytotoxic activity. In contrast, splenocytes from mice primed with coated DCs killed EL4 cells, but not primary thymocytes (Fig. 7). Thus, peptides overexpressed by 10- to ≥85-fold on neoplastic cells elicited specific cytotoxic activity against endogenously presented epitopes.
We have developed a novel method for high-throughput analysis of MHC I peptides and performed a comprehensive study of the MIP repertoire of normal and neoplastic thymocytes. Our studies yielded important insights into the genesis of the MIP repertoire and how it is modified by neoplastic transformation.
A novel method for high-throughput, MS-based analysis of MHC I peptides
In the last few years, high-throughput screening methods coupled with bioinformatics tools have helped to unravel the complexity of biological matrices. Thus, emerging technologies in mass spectrometry have led to the development of peptide detection algorithms that can be used, in combination with segmentation analyses, to compare unlabeled peptide populations (unpublished data) (36). This powerful approach allowed us to obtain an accurate quantification of native unlabeled MHC I peptides without any chemical or metabolic labeling modifications. Because sample preparation requires no additional purification steps, higher sensitivity can be achieved from limiting amounts of material than with chemical derivatization, where peptide recovery can be affected by variable modification yields or side reaction products (31). Thus, by combining our MS strategy with MAE, we were able to generate a comprehensive portrayal of the MIP repertoire of EL4 cells from <10⁸ cells. The ability to identify large numbers of MHC I peptides from limiting amounts of cells is a noteworthy advantage. Moreover, our quantification method can be used to analyze any type of cell population, whereas metabolic labeling strategies can only be applied to cell culture model systems. Nevertheless, low-abundance peptides (<100 copies/cell) remain challenging to identify in view of the sensitivity of present high-throughput, MS-based methods. The portrayal of the MIP repertoire described in this study and others is far from complete, and low-abundance peptides that may be biologically relevant could remain elusive to current detection methods.
We took advantage of EL4 variant cell lines (WT, β2m− mutant, and β2m+ transfectant) to unambiguously discriminate between MHC I–associated peptides and contaminants in peptide mixtures obtained by MAE. A large proportion of contaminants was made of long peptides derived from the C-terminal end of source proteins (unpublished data). We noted that some of these contaminants have previously been considered as MHC I–associated peptides in studies where peptides were obtained by MAE or immunoaffinity purification (23, 63). The need to have an MHC I–deficient negative control to distinguish MHC I peptides from contaminants would be a cumbersome limitation for analysis of primary cells. However, we showed that the use of computational models such as SYFPEITHI and smm obviated this need. Thus, thresholds used herein yielded false-positive and -negative rates of ∼2% in identification of MHC I peptides.
The MIP repertoire of primary cells
High-throughput analysis of peptides obtained by MAE provided us with a global portrayal of peptides presented by different MHC I allelic products (in this study: H2Db, H2Kb, Qa1, and Qa2). We found that, with rare exceptions, discrete MHC I molecules presented peptides derived from different sets of source proteins (Tables S2 and S3). A corollary is that expression of multiple MHC I allelic products (a consequence of gene duplication and diversification) favors representation of largely nonoverlapping sets of source proteins in the MIP repertoire. By integrating global profiling of the mouse protein-encoding transcriptome (50) with the MIP repertoire of thymocytes, we found that the thymocytes' MIP repertoire is enriched in peptides derived from highly abundant transcripts. Furthermore, our data suggest that the repertoire of MHC I–associated peptides conceals a tissue-specific signature that derives from ∼17% of genes represented in the MIP repertoire. Cogent evaluation of this exciting concept will require comprehensive analyses of MHC I–associated peptides eluted from various tissues and organs. Why would the MIP repertoire show a stronger correlation with the transcriptome (this study) than the proteome (28)? Probably because the proteome is enriched in slowly degraded proteins (with a mean t1/2 of >1,000 min), whereas the MIP repertoire originates mainly from rapidly degraded proteins (with a mean t1/2 of ∼10 min) that have recently been translated and degraded (28, 49). In other words, MHC I molecules sample what is being translated rather than what has been translated (49). Considering the pervasive roles of MIPs, particularly in CD8 T cell development and function, it will be extremely interesting to determine whether the MIP repertoire of specialized cell types does conceal a tissue-specific signature. Of particular interest are thymus cortical epithelial cells, which support thymocyte positive selection, and thymus medullary epithelial cells, which express promiscuous transcripts involved in tolerance induction. In addition, although the MIP repertoire is enriched in peptides derived from highly abundant mRNAs, some peptides presented by MHC class I molecules derive from low-abundance mRNAs (Fig. 4). How can low-abundance transcripts successfully compete with more abundant transcripts for representation in the MIP repertoire? Attractive possibilities would be that successful transcripts generate more DRiPs than others or that they are translated by special “immunoribosomes” (33, 49).
The proteasome, which is the primary source of MHC I peptides, is much more ancient than the MHC. The MHC appeared in gnathostomes, whereas proteasomes are found in all eukaryotes. Fundamental functions of the proteasome are to regulate cell cycle, proliferation, and apoptosis (65). In line with this, we found that the MIP repertoire of primary thymocytes was enriched in peptides derived from cyclins, cyclin-dependent kinases, and helicases (Fig. 4 A). These highly conserved proteins regulate cell proliferation, cell cycle arrest, and apoptosis. Peptides derived from cyclin, cyclin-dependent kinase, and helicase gene families have also been found in other studies of MHC I peptides (9, 25, 63). We surmise that the presence of these peptide families in the MIP repertoire may represent an imprint of primordial functions of the proteasome. Nevertheless, an alternative hypothesis would be that overrepresentation of peptides derived from proteins regulating proliferation and apoptosis is a thymocyte-specific feature because thymocytes display particularly high rates of proliferation and apoptosis.
The MIP repertoire of neoplastic cells
Neoplastic transformation is associated with many genomic and proteomic changes. How these changes may impinge on the MIP repertoire and thereby be perceived by CD8 T cells is a fundamental question in cancer immunology that can be addressed directly only by MS-based analysis of the MIP repertoire. Weinzierl et al. recently reported a seminal study in which they integrated mass spectrometry and microarray data on renal cell carcinomas and autologous normal kidney tissues (9). They found a poor correlation between changes in peptide expression and changes in mRNA abundance in normal versus cancer cells (r = 0.32). We found a better correlation between mRNA levels and expression of corresponding MHC I peptides in normal and neoplastic thymocytes (Fig. 6 D; r = 0.63). We surmise that the stronger correlation in our case may be caused by estimation of transcript levels with quantitative real-time PCR analyses rather than microarrays. Indeed, quantitative real-time PCR provides a more accurate estimation of quantitative differences than microarrays (66). However, our data support the main conclusion of Weinzierl et al., which is that a majority of peptides overexpressed on neoplastic cells (74% in our case) would have been missed by estimation of transcript levels. Together with data from Weinzierl et al., our results suggest that in general, overexpression of MHC I peptides on neoplastic cells is caused by posttranscriptional mechanisms and can be detected only by MS-based expression profiling approaches. Further studies on diverse cancer types are needed to test the generality of these observations. Relevant mechanisms may include dysregulation of microRNA expression, protein translation, and proteasomal degradation (67–69).
Our data suggest that the MIP repertoire gives a unique perspective into the mechanisms of carcinogenesis. Approximately 50% of peptides differentially expressed on normal versus neoplastic cells are coded by genes involved in neoplastic transformation (Table II and Table S5). It is quite remarkable that 10 of the differentially expressed peptides derive from genes related to the PI3K–AKT–mTOR pathway. The PI3K–AKT–mTOR pathway is the most prominent pathway regulating protein translation, cell growth, and cell proliferation, and is activated in most cancers (59). In addition, our data lead us to speculate that the MIP repertoire might give unique insights into acquired epigenetic abnormalities that are so important in cancer. Indeed, Dnmt1, which regulates DNA methylation and prevents genomic instability (70), is expressed at high levels in thymocytes (Fig. 5 C). We found that a Dnmt1-derived peptide was overexpressed ∼9-fold on neoplastic relative to primary thymocytes (Table II). This raises the enticing possibility that enhanced proteasomal degradation of Dnmt1 might be responsible for the genomic instability of EL4 cells. Finally, we were able to generate specific cytotoxic T cell responses against EL4 cells by priming WT mice with two peptides overexpressed on EL4 cells (Fig. 7). The finding that efficient cytotoxic responses could be elicited against endogenously expressed tumor peptides identified with our MS-based expression analyses validates the analytical potential of the present approach. Though further work is needed to evaluate the therapeutic value of vaccination with these peptides, our results strongly support the concept that high-throughput analysis of the MIP repertoire is a promising discovery platform for the identification of immunogenic tumor-associated peptides (27, 61, 62).
Materials and methods
Chemicals and materials.
Citric acid, aprotinin, iodoacetamide, and sodium phosphate dibasic (Na2HPO4) were purchased from Sigma-Aldrich; high-performance liquid chromatography–grade water, methanol (MeOH), and acetonitrile were obtained from Thermo Fisher Scientific; formic acid (FA) and ammonium acetate were purchased from EM Science; fused-silica capillaries were obtained from Polymicro Technologies; and Teflon and PEEK tubing were purchased from Supelco. The Jupiter Proteo C12 4 μm material used for packing homemade precolumn and column was obtained from Phenomenex. Strong cation exchange (SCX) material was purchased from PolyLC.
Cell lines and flow cytometry.
The EL4 (β2m+ WT), C4.4-25− (β2m− mutant), and E50.16+ (β2m+ transfectant of C4.4-25−) cell lines were provided by R. Glas (Karolinska Institutet, Karolinska University Hospital, Huddinge, Sweden). T2-Kb and T2-Db cell lines (gifts from S. Joyce, Vanderbilt University School of Medicine, Nashville, TN) were maintained as previously described (71). MHC class I molecules at the cell surface were stained with PE-conjugated anti–H-2Kb (clone AF6-88.5; BD Biosciences), FITC-conjugated anti–H-2Db (clone KH95; BD Biosciences), and biotin-conjugated anti–Qa-2 (clone 1-1-2; BD Biosciences) and analyzed on a BD LSR II flow cytometer using FACSDiva software (BD Biosciences).
Peptide extraction and mass spectrometry analysis.
Freshly isolated thymocytes were obtained from 4–6-wk-old C57BL/6 female mice purchased from The Jackson Laboratory. All mice were maintained under specific pathogen–free conditions according to the standards of the Canadian Council on Animal Care. Experimental protocols were approved by the Comité de Déontologie Animale of the Université de Montréal. Thymocytes were separated from stroma according to standard procedures (72). For isolation of EL4 cells grown in vivo, mice were injected intraperitoneally with 200,000 EL4 cells, and ascites fluid was harvested 17 d later. Purity of EL4 cells was assessed by flow cytometry (73). Three biological replicates were prepared and analyzed for normal thymocytes and EL4 cells. Cell surface MHC I peptides were isolated from viable cells using a slightly modified version of a previously described protocol (37). In brief, 4 ml of citrate-phosphate buffer at pH 3.3 (0.131 M citric acid/0.066 M Na2HPO4, 150 mM NaCl) containing aprotinin and iodoacetamide (1:100) was added to each flask, and cell pellets were resuspended by gentle pipetting for 1 min to denature MHC I complexes. Cell suspensions were then pelleted and the resulting supernatant was collected. Peptide extracts were desalted using Oasis HLB cartridges (30 mg; Waters), and bound material was eluted with 1 ml H2O/80% MeOH/0.2% FA (vol/vol) and diluted to H2O/40% MeOH/0.2% FA (vol/vol). Peptides were then passed through ultrafiltration devices (Amicon Ultra; Millipore) to isolate peptides <5,000 Daltons and to remove β2m proteins. The resulting flowthrough was then lyophilized and stored at −30 or −80°C until analysis.
To extract equivalent amounts of MHC I peptides from EL4 cells and normal thymocytes, we assessed the amount of MHC I molecules on both cell populations using previously described methods (74). Equivalent amounts of MHC I material from neoplastic (8 × 10⁷) and normal (2.3 × 10⁹) thymocytes were separated by on-line 2D separation (SCX/C12 Jupiter Proteo 4 μm) using an Eksigent nanoLC-2D system. Samples were diluted in H2O/2% acetonitrile/0.2% FA before LC-MS analyses. The homemade SCX column (0.3 mm i.d. × 45 mm) was connected directly to the switching valve. During sample loading, the SCX column was positioned off-line from the C12 precolumn to remove interfering species. Salt fractions (10 μl each) were loaded on the SCX column at 5 μl/min for 6 min to sequentially elute peptides onto the C12 precolumn using pulsed fractions of 0, 75, 300, and 1,000 mM of ammonium acetate (pH 3.0). A 69-min gradient from 3–60% acetonitrile (0.2% FA) was used to elute peptides from a homemade reversed-phase column (150 μm i.d. × 100 mm) with a flow rate set at 600 nl/min. An on-line 2D-nanoLC-MS system was used to provide enhanced selectivity, as well as a higher capacity to detect low-abundance MHC I peptides. To achieve high mass accuracy and sensitivity from a limited amount of starting material, abundance profiles and tandem MS experiments were performed simultaneously using an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific) equipped with a nanoelectrospray ion source (1.5–1.7 kV). MS scans were acquired in the FT mode (Orbitrap) with a resolution set at 60,000 (m/z 400). Each full MS spectrum was followed by three MS/MS spectra (four scan events), where the three most abundant multiply charged ions were selected for MS/MS sequencing. Tandem MS experiments were performed using collision-induced dissociation in the linear ion trap (IT mode). Target ions already selected for MS/MS fragmentation were dynamically excluded for 80 s, and a minimal intensity value of 10,000 was fixed for precursor ion selection.
Peptide detection and clustering.
Raw data files generated from the Orbitrap were processed using in-house peptide detection software (Mass Sense) to identify all ions according to their corresponding m/z values, charge state, retention time, and intensity. From this process, lists of detected peptide ions were generated to define each individual LC-MS analysis. A user-defined intensity threshold above the background noise (typically between 5,000 and 15,000) was fixed to limit false-positive identification. Segmentation analyses were performed across sample sets using hierarchical clustering to generate lists of nonredundant peptide clusters (35). User-specified tolerances were typically set to ±0.04 m/z, ±1.5 min, and ±1 fraction for 2D-LC-MS Orbitrap experiments. Identification files from Mascot were converted to Excel format (Microsoft), and sequenced peptides were aligned with their corresponding peptide clusters using in-house clustering software.
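The grouping of peptide ions across LC-MS analyses can be illustrated with a minimal sketch. It is not the in-house Mass Sense or clustering software, whose algorithms are not described here; it assumes a simple single-linkage grouping in which two ions are linked when they agree within the stated tolerances (±0.04 m/z, ±1.5 min, ±1 fraction). The `Ion` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Ion:
    """One detected peptide ion (hypothetical record layout)."""
    mz: float
    rt_min: float
    fraction: int
    intensity: float
    run: str


def within_tolerance(a, b, mz_tol=0.04, rt_tol=1.5, frac_tol=1):
    """True when two ions agree in m/z, retention time, and SCX fraction."""
    return (abs(a.mz - b.mz) <= mz_tol
            and abs(a.rt_min - b.rt_min) <= rt_tol
            and abs(a.fraction - b.fraction) <= frac_tol)


def cluster_ions(ions, mz_tol=0.04, rt_tol=1.5, frac_tol=1):
    """Single-linkage grouping (assumed here): ions joined directly or through
    a chain of within-tolerance neighbours end up in the same cluster."""
    parent = list(range(len(ions)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(ions)):
        for j in range(i + 1, len(ions)):
            if within_tolerance(ions[i], ions[j], mz_tol, rt_tol, frac_tol):
                parent[find(i)] = find(j)

    clusters = {}
    for i, ion in enumerate(ions):
        clusters.setdefault(find(i), []).append(ion)
    return list(clusters.values())


ions = [
    Ion(500.25, 30.0, 1, 2.0e5, "replicate_1"),
    Ion(500.27, 30.8, 1, 1.8e5, "replicate_2"),
    Ion(620.31, 45.2, 2, 9.0e4, "replicate_1"),
]
for cluster in cluster_ions(ions):
    print([(ion.run, ion.mz) for ion in cluster])
```

The pairwise comparison is quadratic in the number of ions, which is acceptable for illustration but would need indexing by m/z for full LC-MS data sets.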
MS/MS sequencing and protein identification.
The data were searched against the International Protein Index mouse database using the Mascot (Matrix Science) search engine. The mass tolerance on precursor and fragment ions was set to ±0.1 and ±0.4 Daltons, respectively. Searches were performed without enzyme specificity. All relevant hits were inspected manually for mass accuracy and to validate sequence assignment according to the observation of consistent b- and y-type fragment ion series. Of the MHC I peptides identified, 85% were sequenced at least twice across replicate injections, and all had mass accuracy within 30 ppm of the theoretical values (without the use of a lock mass). To estimate the false-positive rate across non–MHC I–associated peptide identification, β2m-deficient EL4 samples were searched against a concatenated forward/reverse International Protein Index mouse database to establish a cutoff score threshold (>45) for a false-positive rate of <2%. All identified MHC I peptides were searched by BLAST against the nonredundant NCBInr database (restricted to mouse entries) to identify their corresponding source proteins (Gene ID).
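The two numerical criteria in this paragraph, mass accuracy in ppm and a score cutoff derived from a forward/reverse search, can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in Mascot or by the authors: the false-positive rate is estimated here as the ratio of reverse (decoy) hits to forward (target) hits at or above a given score, and the function names and input layout are hypothetical.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def score_cutoff(hits, max_rate=0.02):
    """Lowest score whose estimated false-positive rate stays within max_rate.

    hits: iterable of (score, is_decoy) pairs from a concatenated
    forward/reverse search. The rate at a cutoff is estimated as
    decoy hits / target hits among hits scoring at or above it
    (an assumption; other estimators exist). Returns None if no
    cutoff qualifies.
    """
    ordered = sorted(hits, key=lambda hit: hit[0], reverse=True)
    targets = decoys = 0
    best = None
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Count every hit that shares this score before evaluating it.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and decoys / targets <= max_rate:
            best = score
    return best


# Toy data: 21 forward hits scoring 50-70, three reverse hits, one weak forward hit.
hits = [(s, False) for s in range(50, 71)] + [(45, True), (44, True), (43, True), (40, False)]
print(score_cutoff(hits, max_rate=0.02))
print(round(ppm_error(1000.01, 1000.0), 3))
```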
Quantitative real-time PCR.
Freshly isolated thymocytes and EL4 cells were homogenized in TRIzol RNA preparation reagent (Invitrogen), and total RNA was isolated as instructed by the manufacturer. 1 μg of total RNA was converted to cDNA using random hexamer priming (Thermoscript RT-PCR System; Invitrogen). Gene expression level was determined using primer and probe sets from Universal ProbeLibrary (Table S7). PCR reactions for 384-well plate formats were performed using 2 μl of cDNA samples (50 ng), 5 μl of the TaqMan PCR Master Mix (Applied Biosystems), 2 μM of each primer, and 1 μM of the Universal TaqMan probe in a total volume of 10 μl. The ABI PRISM 7900HT Sequence Detection System (Applied Biosystems) was used to detect the amplification level and was programmed to an initial step of 10 min at 95°C, followed by 40 cycles of 15 s at 95°C and 1 min at 60°C. All reactions were run in triplicate, and the mean values were used for quantification. The mouse GAPDH or 18S ribosomal RNAs were used as endogenous controls.
Microarray dataset cross comparison.
Normalized mRNA expression data (50) (http://symatlas.gnf.org/SymAtlas/) used in this study were visualized with TIGR MultiExperiment Viewer (http://www.tigr.org/software/microarray.shtml). Normalized mean expression values were calculated as follows for each tissue: average expression value of 180 peptide source genes/average expression value of 36,182 transcripts.
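The ratio defined above can be written as a short function. This is a minimal sketch assuming that expression values for one tissue are held in a dictionary keyed by gene; the gene names and values in the example are made up for illustration and are not taken from the dataset.

```python
from statistics import mean


def normalized_mean_expression(tissue_expression, source_genes):
    """Mean expression of peptide source genes divided by the mean
    expression of all transcripts measured in the same tissue."""
    source_values = [tissue_expression[g] for g in source_genes
                     if g in tissue_expression]
    all_values = list(tissue_expression.values())
    return mean(source_values) / mean(all_values)


# Illustrative values only.
thymus = {"geneA": 10, "geneB": 20, "geneC": 30, "geneD": 60}
print(normalized_mean_expression(thymus, {"geneC", "geneD"}))  # 1.5
```

A value above 1 indicates that, in that tissue, peptide source genes are expressed above the average of all transcripts.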
Expression Analysis Systematic Explorer analysis and statistical methods.
The Expression Analysis Systematic Explorer (EASE) software (75) was downloaded from the Database for Annotation, Visualization, and Integrated Discovery (http://david.abcc.ncifcrf.gov/ease/ease.jsp) (76). For gene enrichment analysis, statistical and corrected P values were calculated from the Fisher exact test and the global false-discovery rate, respectively.
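For orientation, a plain one-sided Fisher exact test for enrichment of a gene category can be computed from the hypergeometric distribution. The sketch below is not the EASE implementation and omits the false-discovery-rate correction; the function name and the counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_list, list_size, hits_in_background, background_size):
    """One-sided Fisher exact P value: probability of drawing at least
    `hits_in_list` category members when `list_size` genes are sampled
    without replacement from a background of `background_size` genes
    containing `hits_in_background` category members."""
    max_hits = min(list_size, hits_in_background)
    total = comb(background_size, list_size)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, max_hits + 1)
    )
    return tail / total


# Hypothetical counts: 8 category members among 180 source genes,
# 200 category members among 20,000 background genes.
print(enrichment_p_value(8, 180, 200, 20000))
```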
Z score transformation.
Raw intensity data for each tissue were log2 transformed and used for the calculation of z scores. Z scores were calculated by subtracting the overall average tissue intensity (for a single gene) from the raw intensity data within a given tissue, and dividing that result by the SD of the overall tissue intensities: z score = $(I_{G_{Th}} - \bar{I}_{G_{T_1 \ldots T_n}}) / \mathrm{SD}_{G_{T_1 \ldots T_n}}$, where $I$ is the intensity, $G_{Th}$ is any gene on the microarray from the thymus tissue, and $T_1 \ldots T_n$ represents the aggregate measure of all tissues.
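The transformation can be sketched for a single gene as follows. Two points are assumptions rather than statements from the text: the z score is computed on the log2-transformed values, and the SD is the sample SD (n − 1 denominator). The tissue names and intensities are illustrative.

```python
from math import log2
from statistics import mean, stdev


def tissue_z_scores(raw_intensities):
    """Z scores across tissues for one gene.

    raw_intensities: dict mapping tissue name to raw intensity (> 0).
    Values are log2 transformed, then centered on the across-tissue mean
    and scaled by the across-tissue sample SD (assumed).
    """
    logged = {tissue: log2(value) for tissue, value in raw_intensities.items()}
    values = list(logged.values())
    center = mean(values)
    spread = stdev(values)
    return {tissue: (x - center) / spread for tissue, x in logged.items()}


# Illustrative intensities for one gene in three tissues.
print(tissue_z_scores({"thymus": 8, "liver": 2, "brain": 4}))
```

A gene with a high positive z score in thymus is expressed well above its own average across tissues, which is the property used to look for a tissue-specific signature.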
Peptide-binding and in vitro cytolytic assay.
Peptides were synthesized by the Centre Hospitalier de l'Université Laval Research Center (Quebec) and purified by high performance liquid chromatography (purity > 90%). Rescue of class I expression in T2-Db and T2-Kb cells by allele-specific peptides was determined as previously described (77). Bone marrow–derived DCs were generated as previously described (78). On day 9 of culture, peptides were added at a concentration of 2 × 10⁻⁶ M and incubated with DCs for 3 h at 37°C. For mouse immunization, 10⁶ peptide-pulsed DCs were injected i.v. into C57BL/6 females on days 0 and 7. At day 14, splenocytes were harvested from the spleens of immunized mice and depleted of red blood cells using 0.83% NH4Cl. Cells were plated at 5 × 10⁶ cells/well in 24-well plates and restimulated with 2 × 10⁻⁶ M peptide at 37°C. After 6 d, cytotoxicity was evaluated by a CFSE-based assay (79). The percentage of specific lysis was calculated as follows: [(number of remaining CFSE+ cells after incubation of target cells alone − number of remaining CFSE+ cells after incubation with effector cells)/number of CFSE+ cells after incubation of target cells alone] × 100.
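The specific lysis calculation is simple arithmetic and can be checked with a short function. The function name and the cell counts in the example are illustrative only.

```python
def percent_specific_lysis(targets_alone, targets_with_effectors):
    """Percentage of CFSE+ target cells lost in the presence of effectors.

    targets_alone: CFSE+ cells remaining after incubation of targets alone.
    targets_with_effectors: CFSE+ cells remaining after incubation with
    effector cells.
    """
    if targets_alone <= 0:
        raise ValueError("targets_alone must be positive")
    return (targets_alone - targets_with_effectors) / targets_alone * 100


# Illustrative counts: 1,000 targets remain alone, 250 remain with effectors.
print(percent_specific_lysis(1000, 250))  # 75.0
```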
Online supplemental material.
Fig. S1 illustrates that 95% of peptide ions showed a variation of less than ±1.4- and ±2.2-fold in abundance across 2D-nanoLC-MS instrumental and biological replicates, respectively. Fig. S2 depicts experimental data that validate bioinformatic discrimination between MHC I–associated peptides and contaminant peptides. Table S1 shows the relative abundance of peptides eluted from WT, β2m−, and β2m+ EL4 cell lines. Table S2 shows the computed MHC binding affinity of MHC I–associated peptides eluted from EL4 cells. Table S3 shows the computed MHC binding affinity of MHC I–associated peptides eluted from primary mouse thymocytes. Table S4 shows the computed MHC binding affinity of MHC I–associated peptides eluted from in vivo grown EL4 cells. Table S5 is a list (with references) of cancer-related genes coding for MHC I peptides differentially expressed in neoplastic (EL4) versus normal thymocytes. Table S6 shows the relative abundance of MHC I–associated peptides eluted from primary thymocytes versus in vivo grown EL4 cells. Table S7 is a list of primers used for quantitative RT-PCR analyses.
We are grateful to the staff of the following core facilities at the Institute for Research in Immunology and Cancer for their outstanding support: Animal facility, Bioinformatics, Flow Cytometry, and Proteomics. We thank Dr. R. Glas for kindly providing us with WT EL4, C4.4-25−, and E50.16+ cell lines.
This work was supported by funds from the Canadian Cancer Society and the Terry Fox Foundation through the National Cancer Institute of Canada. M.-H. Fortier and E. Caron are supported by training grants from the Natural Sciences and Engineering Research Council of Canada and the Canadian Institutes of Health Research, respectively. C. Perreault and P. Thibault hold Canada Research Chairs in Immunobiology and Proteomics and Bioanalytical Spectrometry, respectively.
The authors have no conflicting financial interests.
Abbreviations used: 2D-nanoLC-MS, two-dimensional liquid chromatography nanoelectrospray MS; β2m, β2-microglobulin; DRiP, defective ribosomal product; FA, formic acid; MAE, mild acid elution; MIP, MHC class I peptide; MS, mass spectrometry; MS/MS, tandem MS; ppm, parts per million; SCX, strong cation exchange.
M.-H. Fortier and E. Caron, and C. Perreault and P. Thibault contributed equally to this work.