Metazoan organisms assemble two isoforms of the oligosaccharyltransferase (OST) that have different catalytic subunits (STT3A or STT3B) and partially nonoverlapping roles in asparagine-linked glycosylation. The STT3A isoform of the OST is primarily responsible for co-translational glycosylation of the nascent polypeptide as it enters the lumen of the endoplasmic reticulum. The C-terminal 65–75 residues of a glycoprotein will not contact the translocation channel–associated STT3A isoform of the OST complex before chain termination. Biosynthetic pulse labeling of five human glycoproteins showed that extreme C-terminal glycosylation sites were modified by an STT3B-dependent posttranslocational mechanism. The boundary for STT3B-dependent glycosylation of C-terminal sites was determined to fall between 50 and 55 residues from the C terminus of a protein. C-terminal NXT sites were glycosylated more rapidly and efficiently than C-terminal NXS sites. Bioinformatics analysis of glycopeptide databases from metazoan organisms revealed a lower density of C-terminal acceptor sites in glycoproteins because of reduced positive selection of NXT sites and negative selection of NXS sites.
Asparagine-linked glycosylation is an evolutionarily conserved protein modification reaction that occurs on N-(X≠P)-T/S/C consensus sites (sequons) on newly synthesized proteins in the lumen of the ER of eukaryotic cells and on the exoplasmic surface of archaebacteria and certain proteobacteria (Larkin and Imperiali, 2011). The donor substrate for N-linked glycosylation in most eukaryotic organisms is the dolichol pyrophosphate–linked oligosaccharide GlcNAc2Man9Glc3. The crystal structure of an acceptor peptide bound to the Campylobacter lari PglB, a eubacterial oligosaccharyltransferase (OST), showed that the hydroxyamino acid (T/S) in the sequon helps position the asparagine next to the active site residues involved in catalysis (Lizak et al., 2011). The architecture of the PglB active site is consistent with biochemical experiments, indicating that functional acceptor sites must be in unfolded or flexible regions of newly synthesized polypeptides (Kowarik et al., 2006).
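The N-(X≠P)-T/S/C consensus described above can be expressed as a short pattern match. The following is a minimal sketch, not the procedure used in this study: the function name, the returned tuple layout, and the convention that distance from the C terminus is the number of residues following the asparagine are all assumptions made for illustration.

```python
import re

# N-(X != P)-T/S/C consensus. The lookahead keeps the match zero-width past
# the Asn, so overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"N(?=[^P][TSC])")

def find_sequons(seq):
    """Return (1-based Asn position, tripeptide, residues after the Asn)."""
    hits = []
    for match in SEQUON.finditer(seq):
        i = match.start()
        asn_position = i + 1
        # Assumed convention: distance = residues that follow the Asn.
        residues_after_asn = len(seq) - asn_position
        hits.append((asn_position, seq[i:i + 3], residues_after_asn))
    return hits

# Hypothetical test peptide: N-P-T is rejected, N-G-T and N-R-S are accepted.
print(find_sequons("MNPTANGTKNRSQ"))
```

The proline exclusion is handled by the character class rather than by a second filtering pass, which keeps the pattern close to the way the consensus is written in the text.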
In higher eukaryotes, the OST is a hetero-oligomeric membrane protein composed of seven to eight nonidentical subunits (Kelleher and Gilmore, 2006). The STT3 subunit contains the OST active site (Yan and Lennarz, 2002; Nilsson et al., 2003) and is the sole OST subunit that is conserved between the eukaryotic, archaebacterial, and eubacterial enzymes (Wacker et al., 2002). Vertebrate, insect, and multicellular plant genomes encode two STT3 proteins (STT3A and STT3B; Kelleher et al., 2003). The canine and human STT3A and STT3B proteins are incorporated into OST complexes that have distinct kinetic properties, with the STT3B isoform displaying a reduced stringency of selection of the dolichol pyrophosphate–linked GlcNAc2Man9Glc3 donor relative to lumenally oriented assembly intermediates (dolichol pyrophosphate–linked GlcNAc2Man5-9Glc0-2; Kelleher et al., 2003). Selective depletion of STT3A or STT3B in HeLa cells via siRNA treatment has shown that the two OST isoforms have partially nonoverlapping cellular roles in N-linked glycosylation (Ruiz-Canada et al., 2009). The STT3A isoform is associated with the protein translocation channel (Shibatani et al., 2005) and mediates co-translational glycosylation of NXT/S sites on a nascent polypeptide as it enters the ER lumen (Nilsson et al., 2003; Ruiz-Canada et al., 2009). Glycosylation of the rapidly folding protein prosaposin is particularly sensitive to STT3A depletion (Ruiz-Canada et al., 2009). The STT3B isoform of the OST complex can mediate posttranslational glycosylation of NXT/S sites that are skipped by the translocation channel–associated STT3A complex (Ruiz-Canada et al., 2009). As very few posttranslational glycosylation sites have been described (Bolt et al., 2005; Bas et al., 2011; Tamura et al., 2011), it is not known why certain sequons are skipped by STT3A and then modified by STT3B, nor is it known whether glycosylation site skipping is a common or rare event.
One class of glycosylation site that might be prone to skipping by the STT3A isoform of the OST complex comprises NXT/S sites located near the extreme C terminus of a protein. In vitro experiments using ribosome-tethered nascent polypeptides indicate that an NXT/S site in a nascent polypeptide first becomes accessible to the OST active site when it is located 65–75 residues from the peptidyltransferase center on the ribosome (Whitley et al., 1996; Nilsson et al., 2003; Deprez et al., 2005). Movement of a nascent polypeptide past the OST active site is limited by the protein synthesis elongation rate, which is roughly six residues/second in mammalian cells (Hershey, 1991). After chain termination, sequons located in the last 65–75 residues of a protein are no longer constrained by the elongation rate and might move past STT3A more rapidly and be skipped. Although these C-terminal glycosylation sites may be prone to skipping, modification of such sites can be critical for proper folding and subsequent vesicular transport of secretory proteins (Matzuk and Boime, 1988; McKinnon et al., 2010).
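As a rough illustration of the timing argument above, the sketch below converts the 65–75 residue spacing into a delay at the cited elongation rate of roughly six residues per second. The function names and the default cutoff are illustrative assumptions and are not part of the original analysis.

```python
ELONGATION_RATE = 6.0  # residues per second; approximate value cited in the text

def seconds_to_reach_ost(spacing_residues, rate=ELONGATION_RATE):
    """Time needed to synthesize the residues that separate a sequon
    from the peptidyltransferase center before it can reach the OST."""
    return spacing_residues / rate

def reaches_ost_before_termination(residues_after_asn, spacing_residues=65):
    """True if enough residues follow the sequon for it to reach the OST
    active site while the chain is still attached to the ribosome.
    The 65-residue default is the lower end of the reported range."""
    return residues_after_asn >= spacing_residues

for spacing in (65, 75):
    print(spacing, round(seconds_to_reach_ost(spacing), 1))
```

Under these assumptions, a sequon spends on the order of 11 to 12.5 s between its synthesis and its first access to the OST active site, and any sequon followed by fewer residues than the spacing cannot be reached before chain termination.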
Glycosylation sites that are inserted into the C-terminal epitope tags of recombinant proteins are posttranslationally glycosylated when expressed in Xenopus laevis oocytes (Pult et al., 2011). However, it is not known which OST isoform is responsible for C-terminal glycosylation, nor is it known what factors influence the modification efficiency of naturally occurring C-terminal sites. A comprehensive biochemical and bioinformatics analysis of naturally occurring C-terminal glycosylation sites has not been previously reported. Our bioinformatics analysis of large glycopeptide databases from fungal, plant, and metazoan organisms indicates that the efficiency and mechanism of glycosylation of extreme C-terminal sites are evolutionarily conserved. Here, we show that extreme C-terminal glycosylation sequons in human proteins are glycosylated by an STT3B-dependent posttranslocational pathway. The kinetics and modification efficiency of C-terminal sites are strongly dependent on the acceptor site sequence. NXT sites are modified more rapidly and efficiently than NXS sites by STT3B even when both sites are present in the same polypeptide. Glycosylation sites that are located >50 residues from the C terminus show a reduced dependence on STT3B, indicating that 50 residues can be considered as a boundary for extreme C-terminal glycosylation sites.
Bioinformatics analysis of C-terminal glycosylation sites
The most comprehensive list of experimentally verified N-glycosylation sites for a mammal has been obtained by glycoproteomic analysis of four tissues and plasma from mice (Zielinska et al., 2010). Based upon detection of previously documented murine glycopeptides, coverage of the murine glycoproteome may be as high as 75%. Using this large database of 4,922 high confidence murine glycosylation sites that match the N-(X≠P)-T/S/C sequon, we determined the percentage of modified sites that fall within each decile of protein length (Fig. 1 A). As observed previously with smaller glycopeptide datasets (Gavel and von Heijne, 1990; Ben-Dor et al., 2004), the N-terminal and C-terminal regions of murine glycoproteins have a lower density of N-linked glycans than internal regions.
Biological explanations for N-terminal or C-terminal reductions in glycan density are obscured by decile analysis because of the wide range in glycoprotein sequence length. To obtain a length-based assessment of glycan distribution, we calculated the normalized glycan density (glycans/1,000 amino acid residues) in 25-residue increments starting from the C terminus of the murine glycoproteins. The C-terminal regions of glycoproteins have 2.5-fold fewer N-linked glycans than internal regions (Fig. 1 B). The transition between the regions of lower and higher glycan density was surprisingly abrupt and occurred roughly 75 residues from the C terminus. The NXT/NXS ratio for the C-terminal glycopeptides was higher than for subsequent internal 75-residue blocks, each of which was similar to the bulk NXT/NXS ratio (Fig. 1 B, inset).
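The length-based density calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used to generate Fig. 1 B: the input layout (protein length plus 1-based positions of glycosylated asparagines), the function name, and the choice to normalize each bin by the residues that proteins actually contribute to it are all hypothetical.

```python
def glycan_density_from_c_terminus(proteins, bin_size=25, n_bins=8):
    """Glycans per 1,000 residues in bins counted from the C terminus.

    proteins: iterable of (length, [1-based glycosylated Asn positions]).
    Bin 0 holds the last `bin_size` residues, bin 1 the preceding ones, etc.
    """
    glycans = [0] * n_bins
    residues = [0] * n_bins
    for length, sites in proteins:
        for b in range(n_bins):
            # Residues of this protein inside bin b; short proteins
            # contribute nothing to bins beyond their own length.
            residues[b] += max(0, min(length, (b + 1) * bin_size) - b * bin_size)
        for pos in sites:
            b = (length - pos) // bin_size  # residues after the Asn
            if b < n_bins:
                glycans[b] += 1
    return [1000.0 * g / r if r else 0.0 for g, r in zip(glycans, residues)]

# Two hypothetical proteins: 100 residues with glycans at 95 and 40,
# and 30 residues with a glycan at position 10.
print(glycan_density_from_c_terminus([(100, [95, 40]), (30, [10])], n_bins=4))
```

Normalizing by the residues present in each bin avoids penalizing distal bins simply because short proteins do not extend that far, which is the length bias that the decile analysis cannot remove.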
The density of total sequons (NXT/S/C) was reduced in the C-terminal 75 residues of the murine glycoproteins (Fig. 1 C). Color-coded dashed lines indicate the expected frequency for NXT, NXS, or NXC sequons based upon the amino acid composition of the murine glycoproteins. There is a strong positive selection for NXT sequons and a moderate selection for NXS sequons in murine glycoproteins. Examination of the C-terminal 75 residues revealed reduced positive selection for NXT sites as well as negative selection for NXS sites. NXC sites, which are modified at a 10-fold lower frequency than NXT sites, are subject to negative selection regardless of location. The apparent frequency of sequon modification (Fig. 1 D) was calculated by dividing the glycan density by the sequon density. Extreme C-terminal sequons are modified less frequently than internal sequons primarily because of a twofold reduction in the modification frequency of NXS sites (Fig. 1 D, inset).
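The two quantities used above can be sketched in a few lines. The apparent modification frequency follows the definition given in the text (glycan density divided by sequon density). The expected sequon density is computed here under an assumption of independent residue frequencies, which is one common way to derive an expectation from amino acid composition; the exact formula behind the dashed lines in Fig. 1 C is not stated in the text, so this part is an assumption, as are the function names.

```python
from collections import Counter

def expected_sequon_density(sequences, third):
    """Expected N-(X != P)-`third` sequons per 1,000 residues, assuming
    residues occur independently at their overall composition frequency."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())

    def freq(aa):
        return counts[aa] / total

    return 1000.0 * freq("N") * (1.0 - freq("P")) * freq(third)

def apparent_modification_frequency(glycan_density, sequon_density):
    """Glycan density divided by sequon density (same units for both)."""
    return glycan_density / sequon_density

# Hypothetical toy composition: 20% N, 10% P, 20% T.
print(expected_sequon_density(["NNPTTSSAAA"], "T"))
print(apparent_modification_frequency(10.0, 40.0))
```

An observed sequon density above the expected value would correspond to positive selection in the sense used above, and a density below it to negative selection.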
STT3B-dependent glycosylation of extreme C-terminal sequons
Secretory proteins and soluble lysosomal proteins were the source of 52% of the glycopeptides located in the final 75 residues of murine glycoproteins. To address the mechanism of glycosylation of C-terminal sequons, we selected several secretory or lysosomal glycoproteins for analysis in HeLa cells. Sex hormone–binding globulin (SHBG), a 402-residue secreted glycoprotein, is an ideal substrate to investigate glycosylation of C-terminal sites (Fig. 2 A) because the two secreted forms of SHBG differ by having either one or two N-linked oligosaccharides (Bocchinfuso et al., 1992). HeLa cells transfected with expression vectors encoding wild-type SHBG or various SHBG glycosylation site mutants were pulse labeled and chased for 0 or 20 min. Wild-type SHBG could be resolved into two major polypeptides after pulse–chase labeling (Fig. 2 B). Removal of N-linked glycans by endoglycosidase H digestion yielded a single more rapidly migrating polypeptide, confirming that the SHBG doublet is caused by heterogeneous glycosylation. Most of the SHBG chains have a single glycan at the conclusion of the 3-min pulse-labeling period, whereas 50% of SHBG chains acquire a second glycan during the 20-min chase period. Trimming of glucose residues from N-linked glycans during the 20-min chase incubation is responsible for the minor increase in gel mobility of monoglycosylated SHBG that occurs during the chase incubation (Fig. 2 B, compare 0- and 20-min chase samples). The two single-site SHBG mutants (N380Q and N396Q) were pulse labeled to determine which site was glycosylated during each phase of the labeling experiment. The N396GT site was efficiently modified, primarily during the initial pulse (Fig. 2 B, N380Q mutant). The N380RS site was incompletely modified (0.7 glycan/site), primarily during the chase period (Fig. 2 B, N396Q mutant). Additional mutants were analyzed to determine why the kinetics and modification efficiency of the N380RS and N396GT sites differed.
Converting the N380RS into an N380RT site (S382T and S382T N396Q mutants) increased both the glycosylation rate and extent of site occupancy. Conversely, replacing threonine with serine in the N396GT site (T398S and T398S N380Q mutants) decreased the rate and extent of glycosylation. These differences in sequon modification (NXT vs. NXS) are consistent with the bioinformatics analysis, indicating that C-terminal NXS sites are modified less efficiently than internal NXS sites or C-terminal NXT sites (Fig. 1 D, inset).
Wild-type SHBG and the two single-site mutants were expressed in HeLa cells that had been transfected with previously characterized siRNAs that are specific for the STT3A or STT3B mRNAs (Ruiz-Canada et al., 2009). A 10-fold depletion of STT3A did not reduce glycosylation of wild-type SHBG or either single-site mutant relative to a negative control siRNA (Fig. 2, C and D). siRNA-mediated depletion of STT3B, which lowers STT3B levels by roughly sixfold (unpublished data; Ruiz-Canada et al., 2009), reduced glycosylation of wild-type SHBG and both single-site mutants. Interestingly, the suboptimal N380RS site was more sensitive than the N396GT site to the reduction in STT3B levels (Fig. 2 D), likely because of the different kinetics of modification. The effect of simultaneous depletion of both STT3A and STT3B was similar to that of STT3B depletion alone, indicating that STT3A has little, if any, role in glycosylation of SHBG (Fig. 2 C).
A more extensive pulse–chase analysis of SHBG glycosylation was conducted to determine the t1/2 for glycosylation of the C-terminal sites (Fig. 2 E). Quantification of the pulse–chase data yielded the expected relationship between a nonglycosylated precursor, a monoglycosylated intermediate, and the fully glycosylated product. Disappearance of nonglycosylated SHBG, which corresponds to glycosylation of the N396GT site, occurred with a t1/2 of 2 ± 1 min. Accumulation of diglycosylated SHBG, which is diagnostic of glycosylation of the N380RS site, occurred with a t1/2 of 9 ± 2 min. The N396Q mutant was analyzed to determine whether the kinetics of glycosylation of the N380RS site were influenced by glycosylation of the N396GT site (Fig. 2 G, top and bottom, black symbols). The t1/2 for glycosylation of the N380RS site in the N396Q mutant was 7 ± 2 min, indicating that glycosylation of the two sites is independent. Depletion of STT3B reduced the rate and final extent of glycosylation of both acceptor sites in wild-type SHBG (Fig. 2 F). The calculated t1/2 values for glycosylation of the N396GT site and the N380RS site in STT3B-depleted cells increased to 5 and >30 min, respectively. Depletion of STT3B reduced the rate of glycosylation of the N380RS site by more than fourfold in the N396Q mutant (Fig. 2 G, open symbols). Collectively, these experiments indicate that the extreme C-terminal sites in SHBG are glycosylated by STT3B after the complete polypeptide has passed through the protein translocation channel; hence, modification of these sites is obligatorily posttranslational and mechanistically posttranslocational.
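A t1/2 of the kind reported above can be estimated from pulse–chase quantification by fitting a first-order decay. The text does not state how the t1/2 values were calculated, so the following is only a sketch under explicit assumptions: a single first-order process, a precursor fraction that decays toward zero, and strictly positive measured fractions. The function name and the example data are hypothetical.

```python
import math

def half_time_first_order(times_min, fraction_remaining):
    """Estimate t1/2 (min) by least-squares fitting ln(f) = ln(A) - k*t.

    Assumes a single first-order process with decay toward zero and
    fractions > 0. Incomplete modification would need a plateau term.
    """
    ys = [math.log(f) for f in fraction_remaining]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, ys))
    k = -sxy / sxx  # first-order rate constant, per minute
    return math.log(2) / k

# Hypothetical chase data: fraction of unmodified chains at each time point.
print(half_time_first_order([0, 2, 4, 8], [0.8, 0.4, 0.2, 0.05]))
```

Because a site such as N380RS does not reach full occupancy, a fit of this form would need an additional parameter for the final extent of modification, which is the same limitation noted for the STT3B-depleted samples.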
Internal glycosylation sites enhance C-terminal glycosylation of β-glucuronidase (β-GUS)
Our analysis of the murine glycopeptide database indicated that most glycoproteins with an extreme C-terminal glycan have one or more internal NXT/S sequons. Human β-GUS has three internal sites (sequons 1–3) and one C-terminal site (Fig. 3 A, sequon 4). Using a Myc-DDK–tagged β-GUS construct, we confirmed a previous observation (Shipley et al., 1993) that mutation of two or three internal NXT/S sites (β-GUSΔ23 or β-GUSΔ123) resulted in the accumulation of glycoform doublets, indicating incomplete modification of N631 (Fig. 3 B). These doublets were prominent in the 0-min chase samples (β-GUSΔ23 and β-GUSΔ123) unless the N631ET site (β-GUSΔ4) was eliminated (Fig. 3 B). β-GUS chains lacking 1–4 glycans were synthesized by HeLa cells when STT3A and STT3B were simultaneously depleted (Fig. 3 C). Depletion of STT3A alone did not cause hypoglycosylation of wild-type β-GUS or either mutant, whereas depletion of STT3B resulted in glycoform doublet accumulation for wild-type β-GUS and the β-GUSΔ123 mutant (Fig. 3 C). These results indicate that the three internal sequons in β-GUS can be glycosylated by either OST isoform, whereas STT3B is necessary for efficient glycosylation of the C-terminal N631ET site.
In vivo biosynthesis of β-GUS takes roughly 2 min. A kinetic analysis of wild-type β-GUS glycosylation showed that 70% of the translocated β-GUS chains were fully glycosylated by the end of a 5-min pulse-labeling period (Fig. 3 D). The remaining 30% of β-GUS chains, which lack a single glycan, disappeared during the chase. The nonglycosylated band that co-migrates with the endoglycosidase H digestion product is a nontranslocated precursor that was excluded from the kinetic analysis. Quantification of the pulse–chase experiment (Fig. 3 E, squares) yielded a t1/2 of 1.9 ± 0.2 min for posttranslocational glycosylation of wild-type β-GUS. Pulse–chase analysis of the β-GUSΔ123 mutant showed that the N631ET site was glycosylated at a reduced rate (t1/2 = 7.6 ± 4 min) when the internal sites were absent (Fig. 3 E, circles). Although STT3B depletion strongly inhibited glycosylation of the N631ET site (Fig. 3 D), a t1/2 could not be calculated without making an assumption concerning the final extent of modification (Fig. 3 E, triangles). Glycosylation of β-GUS mutants with one internal site (Δ12, Δ13, or Δ23) is sensitive to STT3B depletion (Fig. 3 F). Pulse–chase experiments analyzed as in Fig. 3 E showed that a single internal glycan, regardless of position (Δ12, Δ13, or Δ23), reduced the t1/2 for glycosylation of the C-terminal N631ET site relative to the β-GUS Δ123 mutant (Fig. 3 G).
C-terminal glycosylation of type II membrane proteins
Type II (Ncytosol-Clumenal) membrane proteins are the second most abundant class of murine proteins with glycans in their C-terminal 75 residues (22% of C-terminal glycopeptides). Human CD40 ligand was selected as an example of a type II membrane protein with a single C-terminal glycosylation site (Fig. 4 A). Glycosylation of the N240VT site was reduced slightly by STT3B depletion in the Myc-DDK– and MycΔDDK-tagged versions of CD40L (Fig. 4 B). Replacing the NVT sequon with an NVS sequon (T242S mutant) reduced the modification frequency of the site and caused an increase in sensitivity to STT3B depletion.
CD69 has a C-terminal NVT site and an internal NAC site (Fig. 4 C). Although depletion of either STT3A or STT3B caused reductions in glycosylation of CD69-Myc-DDK, it was not clear which site or sites were affected (Fig. 4 D). Elimination of the NAC site (N111Q mutant) confirmed a previous study that heterogeneous glycosylation of CD69 can be ascribed to incomplete modification of the N111AC site (Vance et al., 1997). In the context of the shorter epitope tag (MycΔDDK), glycosylation of the N166VT site was mildly sensitive to STT3B depletion but not to STT3A depletion. Replacing the NVT site with an NVS site (T168S mutant) reduced the glycosylation of N166. Thus, extreme C-terminal sequons in type II membrane proteins are also STT3B substrates.
Another role for the STT3B isoform of the OST is to glycosylate internal sites that are skipped by STT3A. We constructed a CD69 N166Q-MycΔDDK mutant to determine which OST isoform mediates glycosylation of the N111AC site (Fig. 4 D). Depletion of either STT3A or STT3B caused a partial reduction in N111AC modification, indicating that both STT3A and STT3B contribute to the observed, albeit partial, glycosylation of the suboptimal N111AC site.
Defining a boundary for C-terminal glycosylation
The serum glycoprotein transferrin (Tf) has one internal and one C-terminal glycosylation site (Fig. 5 A). Hypoglycosylated Tf is a diagnostic marker for the family of diseases known as congenital disorders of glycosylation type 1 (CDG-1). Most subtypes of CDG-1 are caused by incomplete assembly of dolichol pyrophosphate–linked GlcNAc2Man9Glc3, the oligosaccharide donor for N-glycosylation (Haeuptle and Hennet, 2009). Mass spectrometry of serum Tf from CDG-1 patients has indicated that the C-terminal site is more frequently skipped than the internal site (Hülsmeier et al., 2007).
HeLa cells transfected with DDK-His–tagged Tf were subjected to pulse–chase analysis to determine whether the C-terminal site in Tf undergoes posttranslocational glycosylation (Fig. 5 B). Immunoprecipitation of the pulse-labeled samples with anti-DDK sera indicated that Tf was fully glycosylated at the end of the 5-min pulse. No evidence for posttranslocational modification of either sequon was obtained by pulse–chase analysis of the two single-site Tf mutants.
Depletion of STT3A plus STT3B resulted in synthesis of Tf-DDK-His chains that lack one to two N-linked glycans (Fig. 5 C). Monoglycosylated Tf was also produced when STT3A was depleted but was absent when STT3B was depleted. Analysis of the N630Q mutant indicated that the N432KS site was particularly sensitive to STT3A depletion. A subset of the siRNA depletion experiments was conducted using an untagged Tf construct (Fig. 5 D) to determine whether STT3B is required for glycosylation of the N630VT site when this site is located closer (68 residues) to the C terminus. As observed for the DDK-His–tagged construct, wild-type Tf was sensitive to STT3A depletion. The C-terminal N630VT site (N432Q mutant) was relatively insensitive to depletion of either STT3A or STT3B.
To define a boundary for STT3B-dependent posttranslocational glycosylation, additional NVT sequons were inserted into the Tf-DDK-His construct or the N432Q Tf-DDK-His construct. The I-1, I-2, and I-3 sequons are located 84, 64, and 14 residues from the C terminus of Tf-DDK-His, respectively (Fig. 5 A). The I-1 and I-2 sites were more efficiently glycosylated than the I-3 site (Fig. 5 E). Depletion of STT3B eliminated glycosylation of the I-3 site but did not reduce glycosylation of the I-1 or I-2 sites (Fig. 5 F). Truncation of the DDK-His tag by 11 residues (Tf I-2ΔHis) reduced the distance between the inserted I-2 site and the C terminus to 53 residues. As the glycosylated form of Tf I-2ΔHis co-migrated with wild-type Tf and was insensitive to STT3B depletion (Fig. 5 G), it was initially unclear whether the inserted site had been modified. Limited digestion of the Tf I-2 and Tf I-2ΔHis products with endoglycosidase H produced a glycan ladder that allows facile counting of N-linked glycans (Fig. 5 H). Both the Tf I-2 and Tf I-2ΔHis products have three N-linked glycans, consistent with efficient modification of the inserted NVT site.
To obtain a better understanding of the boundary for posttranslocational C-terminal glycosylation, the location of murine glycopeptides was plotted as a running sum versus distance from the C terminus (Fig. 5 I, top plot). Remarkably, the plotted data could be analyzed as two intersecting lines, the slopes of which are proportional to the observed glycan density. The calculated intersection point for the two lines is 66 residues, which falls within the range of reported distances in amino acid residues between the peptidyltransferase site on the ribosome and the OST active site (Whitley et al., 1996; Deprez et al., 2005). The location and STT3B dependence of the C-terminal glycosylation sites in SHBG, β-GUS Myc-DDK, Tf, Tf-DDK-His, CD40L, CD69, blood coagulation factor VII, and prosaposin are plotted on the same abscissa (Fig. 5 I, bottom plot). With one noteworthy exception (Fig. 5 I, factor VII, inverted black triangle), sequons that show dependence upon STT3B fall on the low glycan density arm of the glycopeptide versus distance plot. STT3B-independent sites, including those in Tf and prosaposin, are located on the high glycan density arm of the plot. The I-2ΔHis and I-2 sites in Tf fall in an intermediate region near the intersection point where the observed glycan density begins to exceed the linear regression fit for the low-density arm of the plot (Fig. 5 I, gray square and gray circle).
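The two-line analysis of the running-sum plot can be illustrated with a simple breakpoint search. This is a sketch of one way to fit two intersecting lines, not necessarily the regression procedure used for Fig. 5 I: it tries every split of the data into a proximal and a distal segment, fits each by ordinary least squares, keeps the split with the smallest total squared error, and reports where the two fitted lines cross. The function names, the minimum segment size, and the synthetic data are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, squared error)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def two_line_intersection(xs, ys, min_points=3):
    """Best two-segment fit; returns (x at intersection, slope 1, slope 2).

    xs must be sorted by distance from the C terminus. The slopes are
    proportional to glycan density in each region. Raises ZeroDivisionError
    if the two fitted slopes are identical.
    """
    best = None
    for split in range(min_points, len(xs) - min_points + 1):
        s1, b1, e1 = fit_line(xs[:split], ys[:split])
        s2, b2, e2 = fit_line(xs[split:], ys[split:])
        if best is None or e1 + e2 < best[0]:
            best = (e1 + e2, s1, b1, s2, b2)
    _, s1, b1, s2, b2 = best
    return (b2 - b1) / (s1 - s2), s1, s2

# Synthetic running sum: slope 1 up to 60 residues, slope 3 beyond.
xs = list(range(0, 151, 10))
ys = [x if x <= 60 else 3 * x - 120 for x in xs]
print(two_line_intersection(xs, ys))
```

With real glycopeptide positions, the ratio of the two slopes would give the fold difference in glycan density between the C-terminal and internal regions.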
C-terminal glycosylation in model eukaryotes
Recently, N-linked glycopeptides from six diverse model organisms were identified by mass spectrometry (Zielinska et al., 2012). These databases were analyzed as in Fig. 5 I to determine the density of sequons and glycans in the C-terminal 150 residues of glycoproteins (Fig. 6, glycans are blue lines, and sequons are red lines). Sequon and glycan densities that were calculated by linear regression analysis are shown adjacent to each line or pair of intersecting lines. With the exception of Schizosaccharomyces pombe, there is a reduced density of N-glycans in the C-terminal 75 residues of the glycoproteins. Reductions in C-terminal glycan density ranged from 1.4-fold for Saccharomyces cerevisiae to 2.3–2.5-fold for Drosophila melanogaster, Danio rerio, and Mus musculus. With the exception of Caenorhabditis elegans, the reduced C-terminal glycan density is in part explained by a reduced density of C-terminal sequons (Fig. 6). We next asked whether NXS sequons, NXT sequons, or both were reduced in C-terminal segments of glycoproteins (Table 1 and Table 2). For all organisms except S. pombe and C. elegans, positive selection of NXT sequons is reduced near the C terminus (Table 1), whereas NXS sequons are subject to negative selection (Table 2). As observed for M. musculus (Fig. 1 D), the apparent modification frequency of NXT sequons shows little or no reduction in the C-terminal 75 residues (Table 1). The modification frequency for C-terminal NXS sequons was reduced in all organisms except S. pombe and Arabidopsis thaliana (Table 2). Thus, the overall reduction in C-terminal glycan density in glycoproteins from metazoan organisms is caused by a reduction in sequon density and a lower modification efficiency of C-terminal NXS sites (Table 2).
The number of eukaryotic STT3 sequences in the protein sequence database has increased roughly fivefold since we last examined the relationship between eukaryotic STT3 proteins (Kelleher and Gilmore, 2006). Using this larger dataset, it is now clear that the genomes of all sequenced metazoan organisms, with the exception of species in the genus Caenorhabditis, encode both STT3A and STT3B (Fig. S1). The genomes of all multicellular plants and some unicellular plants also encode two STT3 proteins. One clade of plant STT3 proteins is more closely related to metazoan STT3A proteins, whereas the second clade is related to protist STT3 proteins. Fungal genomes encode a single STT3 protein. Metazoan STT3B proteins, which include the STT3 proteins of Caenorhabditis species, are more closely related to fungal STT3 proteins. Thus, the three model organisms (S. cerevisiae, S. pombe, and C. elegans) that show the least reduction in C-terminal glycan density express a single OST catalytic subunit that is more closely related to the STT3B isoform responsible for posttranslocational glycosylation of extreme C-terminal sites in human cells.
Hypoglycosylation can cause defects in glycoprotein folding, secretion, and function that contribute to the pathophysiological symptoms displayed by patients that have congenital disorders of glycosylation (Haeuptle and Hennet, 2009). Although extreme C-terminal sequons are prone to be skipped by the translocon-associated STT3A isoform of the OST, these sequons are often evolutionarily conserved (e.g., N396 in SHBG and N631 in β-GUS), indicating that glycans at these positions are important. The α subunit of human chorionic gonadotropin (Matzuk and Boime, 1988) and von Willebrand factor (McKinnon et al., 2010) are examples of proteins that have extreme C-terminal glycans that are necessary for efficient protein folding and secretion. Here, we have presented evidence that extreme C-terminal sequons in proteins are glycosylated by a STT3B-dependent posttranslocational pathway. Our analysis of large glycoprotein databases has revealed that C-terminal sequons are not uncommon and that C-terminal NXT sites appear to be glycosylated as efficiently as internal NXT sites.
Posttranslocational glycosylation of C-terminal acceptor sites
Because of the distance between the peptidyltransferase site and the OST active site, sequons in the last 65–75 residues of a protein are not glycosylated before chain termination (Fig. 7 b). Pulse–chase labeling of two secretory proteins (SHBG and β-GUS) indicated that C-terminal sites located in the last 50 residues of a protein are prone to be skipped by the translocation channel–associated STT3A isoform of the OST complex; hence, their glycosylation was insensitive to STT3A depletion. Glycosylation of extreme C-terminal NXT sites was rapid (t1/2 of ∼2 min) and efficient, whereas glycosylation of the NXS site in SHBG was roughly fourfold slower and did not reach completion. Three important conclusions can be drawn from the pulse–chase analysis of SHBG glycosylation. First, the order in which the sequons are modified in wild-type SHBG is not determined by the direction of nascent polypeptide exit from the translocation channel, as the extreme C-terminal N396GT site is modified before the N380RS site. This observation excludes the possibility that glycosylation of C-terminal sites in proteins is achieved by reducing the rate of polypeptide egress from the lumenal face of the protein translocation channel after termination of protein synthesis. Thus, these sites are posttranslocationally glycosylated after the completed protein enters the ER lumen (Fig. 7 c). Second, our results indicate that the kinetics and extent of C-terminal glycosylation are strongly influenced by the hydroxyamino acid in the consensus site. Finally, the difference in modification kinetics of the two extreme C-terminal sites in SHBG is not consistent with a scanning mechanism wherein a single copy of STT3B engages a nascent polypeptide and scans for skipped NXT/S sites.
As the distance between a glycosylation site and the C terminus increases, the dependence upon STT3B decreases. The N630VT site in Tf, which is 68 residues from the C terminus, is efficiently glycosylated in STT3B-depleted cells. Analysis of several Tf derivatives and β-GUS Myc-DDK indicated that the boundary for STT3B dependence falls in the vicinity of 50–55 residues from the C terminus of the protein. Thus, hypoglycosylation of Tf in CDG-1 patients is explained by increased acceptor site skipping by STT3A in cells that lack the fully assembled oligosaccharide donor rather than by a defect in posttranslocational glycosylation by STT3B. The previously analyzed posttranslational glycosylation site in factor VII (Bolt et al., 2005; Ruiz-Canada et al., 2009) is well outside the range for extreme C-terminal sites (84 residues from the C terminus). Clearly, factors other than an extreme C-terminal location can cause glycosylation sites, such as the N360IT site in factor VII, to be skipped by STT3A.
What prevents complete modification of extreme C-terminal glycosylation sites, such as the N380RS site in SHBG? For a slowly modified site, folding of the nascent polypeptide (Fig. 7 g) could prevent glycosylation as a result of the architecture of the OST active site (Lizak et al., 2011), which can only accommodate an unfolded region of a protein. A second factor that could limit posttranslocational glycosylation would be diffusion of the nascent polypeptide away from the lumenal surface of the ER membrane where the OST is located (Fig. 7 f). The distance between the lumenal surface of the ER membrane and the OST active site has been estimated to be 3 nm (Nilsson and von Heijne, 1993). However, this value is somewhat greater than the 1.5-nm distance between the acceptor peptide binding site and the transmembrane spans of Campylobacter lari PglB (Lizak et al., 2011). Electron microscopy of yeast cells indicates that the thickness of ER cisternae in the vicinity of membrane-bound ribosomes is on average 36 nm (West et al., 2011). Thus, roughly 80% of the lumenal volume of the ER is outside a narrow zone of glycosylation near the lumenal membrane surface. C-terminal sites in glycoproteins that are tethered to the membrane by transmembrane spans or glycosylphosphatidylinositol anchors would be retained closer to the zone of glycosylation, likely explaining why the C-terminal sites in the two type II membrane proteins (CD40L and CD69) were less sensitive to STT3B depletion than the C-terminal sites in β-GUS or SHBG.
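The volume estimate above can be reproduced with a simple geometric approximation. The sketch below assumes that a cisterna behaves as a flat sheet of uniform thickness with a zone of glycosylation of fixed depth along both lumenal membrane faces; this geometry, the function name, and the `faces` parameter are assumptions introduced for illustration and are not stated in the text.

```python
def fraction_outside_zone(cisterna_thickness_nm, zone_depth_nm, faces=2):
    """Fraction of lumenal volume beyond the zone of glycosylation,
    treating the cisterna as a flat sheet with a zone on each face."""
    return 1.0 - faces * zone_depth_nm / cisterna_thickness_nm

# 36-nm cisterna with the 3-nm estimate, then with the 1.5-nm PglB distance.
print(round(fraction_outside_zone(36, 3), 2))
print(round(fraction_outside_zone(36, 1.5), 2))
```

With the 3-nm estimate this flat-sheet approximation places about 83% of the lumenal volume outside the zone, in line with the rough figure of 80% given above; the shorter 1.5-nm distance would make the zone narrower still.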
Internal glycans stimulate posttranslocational glycosylation by STT3B
A single internal glycan, regardless of location, was sufficient to increase the rate and extent of C-terminal glycosylation of the N631ET site in β-GUS. A similar enhancement of posttranslocational glycosylation by a second glycan was observed for two other substrates (factor VII and KCNE1) that we analyzed previously (Ruiz-Canada et al., 2009; Bas et al., 2011). A challenge for the future will be to determine whether posttranslocational delivery of substrates to STT3B is mediated by ER chaperones or instead simply occurs by diffusion (Fig. 7 c).
Conserved features of extreme C-terminal glycosylation
Analysis of the glycopeptide databases from seven model organisms revealed both conserved and nonconserved features of C-terminal glycosylation. As reported previously, there is strong positive selection for NXT sequons in glycoprotein sequences for organisms that have the ER–glycoprotein quality control pathway (Cui et al., 2009), and this includes all seven model organisms analyzed here. The reduction in C-terminal glycan density was most pronounced for the M. musculus, D. rerio, and D. melanogaster glycoproteins. The organisms that showed the least reduction in C-terminal glycan density (S. cerevisiae, S. pombe, and C. elegans) showed little or no reduction in sequon density between the C-terminal segment and internal regions of glycoproteins. Together with the phylogenetic analysis of STT3 proteins, these results suggest that posttranslocational glycosylation of sequons by an STT3B-like OST complex preceded the evolution of a dedicated translocation channel–associated STT3A complex.
Cellular roles of the STT3A and STT3B isoforms of the OST
The STT3A and STT3B isoforms of the OST complex have partially nonoverlapping roles in asparagine-linked glycosylation. The three internal glycosylation sites in β-GUS exemplify the class of sites that can be modified by either STT3A or STT3B, as hypoglycosylation of these internal sites only occurred when expression of both OST isoforms was reduced.
One prominent role of the STT3B complex is to mediate glycosylation of sequons that are skipped by the STT3A complex, including those at the extreme C terminus of proteins. For reasons that remain to be determined, the translocation channel–associated STT3A complex skips certain internal sequons. Previously characterized skipped sites include an N-terminal site in cathepsin C adjacent to the signal sequence cleavage site and a posttranslocational glycosylation site in blood coagulation factor VII (Ruiz-Canada et al., 2009). The internal N111AC site in CD69 is frequently skipped by STT3A, as NXC sites are inherently suboptimal because of a reduced binding affinity of the acceptor sequon to the OST active site (Bause, 1984). The N111AC site is also poorly modified by the STT3B complex; hence, occupancy of this site remains low.
A novel role for the STT3B isoform of the OST in the ER-associated degradation pathway was recently described (Sato et al., 2012). Posttranslational glycosylation of a normally unmodified sequon in a folding-defective form of transthyretin directs the malfolded protein into the glycan-dependent branch of the ER-associated degradation pathway. It remains to be determined whether this aspect of the ER-associated degradation pathway is applicable to other malfolded proteins with silent glycosylation sites. These results, together with previous findings concerning a co-translational role for STT3A (Ruiz-Canada et al., 2009), indicate that the duplication of the STT3 gene in metazoan organisms resulted in the evolution of OST isoforms with distinct roles in N-glycosylation that together facilitate optimal glycosylation of the large number of OST substrates that are expressed by higher eukaryotes.
Materials and methods
Cell culture and plasmid or siRNA transfection
HeLa cells (CCL-13; American Type Culture Collection) were cultured in 10-cm2 dishes at 37°C in DMEM (Gibco), 10% fetal bovine serum with 100 U/ml penicillin and 100 µg/ml streptomycin. HeLa cells were seeded at 30% confluency for siRNA transfection or 80% confluency for plasmid transfection in 60-mm dishes and grown for 24 h before transfection with siRNA (60 nM negative control, 50 nM STT3A, and 60 nM STT3B) or 8 µg plasmid and Lipofectamine 2000 in Opti-MEM (Gibco) using a protocol from the manufacturer (Invitrogen). Plasmid transfection was performed after 48 h of siRNA transfection, and cells were assayed 24 h later. The siRNAs specific for the STT3A and STT3B mRNAs were characterized previously (Ruiz-Canada et al., 2009). The STT3A siRNAs are 5′-GGCCGUUUCUCUCACCGGCdTdT-3′ annealed with 5′-UCCGGUGAGAGAAACGGCCdTdT-3′. The STT3B siRNAs are 5′-GCUCUAUAUGCAAUCAGUAdTdT-3′ annealed with 5′-CACUGAUUGCAUAUAGAGCdTdT-3′. Negative control siRNA was purchased from QIAGEN.
The Myc-DDK–tagged β-GUS, CD40 ligand, and CD69 expression vectors were purchased from OriGene. Tf was amplified from an OriGene cDNA clone and cloned into a pCMV6-AC-DDK-His vector (OriGene). The SHBG, N380Q, and N396Q SHBG mutants in the vector pRC/cytomegalovirus (Invitrogen) were gifts from G. Hammond (Child and Family Research Institute, Vancouver, British Columbia, Canada). Site-directed mutagenesis was used to generate glycosylation site mutants or to introduce a stop codon after the Myc or DDK tags in the expression vector. With the exception of the N173NT site in β-GUS that was inactivated by the T175A mutation, glycosylation sites were inactivated by asparagine to glutamine substitutions.
Radiolabeling and immunoprecipitation of glycoproteins
Cell culture medium was replaced with methionine- and cysteine-free DMEM (Gibco) containing 10% dialyzed fetal bovine serum 20 min before the addition of 200 µCi/ml of Tran35S label (PerkinElmer). Pulse-labeling periods were terminated by the addition of unlabeled methionine (3.75 mM) and cysteine (0.75 mM). Cells from one culture dish at each time point were lysed at 4°C by a 30-min incubation with 1 ml radioimmunoprecipitation assay lysis buffer or β-GUS lysis buffer (10 mM Tris-Cl, pH 8.5, 14 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 1 mM MgCl2, and protease inhibitor cocktail, as defined in Kelleher et al.). Cell lysates were clarified by centrifugation (2 min at 13,000 rpm) and precleared by incubation for 2 h with control IgG and a mixture of protein A/G–Sepharose beads (Invitrogen). The precleared lysates were incubated overnight with protein or epitope tag–specific antibodies, followed by the addition of a second aliquot of protein A/G–Sepharose beads and a further 4-h incubation. Beads were washed five times with radioimmunoprecipitation lysis buffer or β-GUS wash buffer (150 mM Tris-Cl, pH 7.4, 500 mM NaCl, 0.4% NP-40, and 1% sodium deoxycholate) and twice with 10 mM Tris-HCl before eluting proteins with gel loading buffer. Antibodies were obtained from the following sources: anti-Tf (Dako), anti-DDK (anti-FLAG; Sigma-Aldrich), anti-SHBG (R&D Systems), and anti-Myc (Santa Cruz Biotechnology, Inc.). As indicated, immunoprecipitated proteins were digested with endoglycosidase H (New England Biolabs, Inc.). Dry gels were exposed to a phosphor screen (Fujifilm), scanned in a laser scanner (Typhoon FLA 9000; GE Healthcare), and quantified using AlphaEaseFC (Alpha Innotech).
Analysis of SHBG and β-GUS pulse–chase experiments
The pulse–chase data for β-GUS and SHBG glycosylation were fit to single and double exponential equations using KaleidaGraph 3.6 (Synergy Software). For wild-type SHBG, which has two glycans, consumption of SHBG-0 was fit to the equation $y = a_0 e^{-0.693t/t_{1/2(0)}} + b_0$; formation of SHBG-2 was fit to the equation $y = a_2 (1 - e^{-0.693t/t_{1/2(2)}}) + b_2$. Consumption of SHBG-1 was fit to the equation $y = a_0 e^{-0.693t/t_{1/2(0)}} + a_2 (1 - e^{-0.693t/t_{1/2(2)}}) + b_1$ using the parameters derived for SHBG-0 and SHBG-2 ($a_0$, $a_2$, $t_{1/2(0)}$, and $t_{1/2(2)}$).
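For readers who want to evaluate these models outside a plotting package, the three equations can be written as plain functions. This is a minimal sketch, not the fitting procedure used here: the function names and the numerical parameter values are hypothetical, and `math.log(2)` stands in for the rounded constant 0.693.

```python
import math

LN2 = math.log(2)  # ~0.693, the constant that turns a rate into a half-time


def consumption(t, a, t_half, b):
    """Single exponential decay, the form used for SHBG-0."""
    return a * math.exp(-LN2 * t / t_half) + b


def formation(t, a, t_half, b):
    """Single exponential rise, the form used for SHBG-2."""
    return a * (1.0 - math.exp(-LN2 * t / t_half)) + b


def intermediate(t, a0, t_half0, a2, t_half2, b1):
    """Double exponential used for SHBG-1, reusing the SHBG-0 and SHBG-2
    amplitudes and half-times and fitting only the offset b1."""
    decay = a0 * math.exp(-LN2 * t / t_half0)
    rise = a2 * (1.0 - math.exp(-LN2 * t / t_half2))
    return decay + rise + b1


# Hypothetical parameters (percent of total protein, minutes), for illustration only
a0, t_half0, b0 = 80.0, 10.0, 5.0
a2, t_half2, b2 = 60.0, 10.0, 0.0

for t in (0, 10, 20, 40):
    print(t, round(consumption(t, a0, t_half0, b0), 1),
          round(formation(t, a2, t_half2, b2), 1))
```

At t equal to the half-time, the exponential term has fallen to half of its amplitude, which is why the fitted t1/2 values can be read directly as half-times for loss of the unglycosylated form or appearance of the fully glycosylated form.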
Bioinformatics analysis of glycoproteomic databases and STT3 proteins
A list of 5,052 high confidence (class I) murine N-glycosylation sites was obtained from the supplemental materials in Zielinska et al. (2010). We excluded 112 sites that did not match the currently accepted consensus sequon (N-X-T/S/C), as current evidence indicates that nonsequons, when modified, show occupancy levels that do not exceed 1–3% (Valliere-Douglass et al., 2009, 2010). An additional 18 putative glycopeptides were excluded because they map to proteins that do not enter the secretory pathway (12 sites) or are no longer in the murine protein database as a result of a change in open reading frame annotation (six sites). The remaining 4,922 class I sites are in 1,902 murine proteins. The glycoprotein sequences were downloaded, and the locations of the glycopeptides were verified. When several nonidentical protein sequence files were obtained for a given glycopeptide, we selected the file with the largest open reading frame to avoid duplication (for alternatively spliced proteins) or analysis of incomplete protein sequences (N- or C-terminal truncations). The amino acid compositions of the 1,902 glycoproteins were used to calculate expected sequon densities. A complete list of 13,326 sequons (N-(X≠P)-T/S/C) in the 1,902 murine proteins was generated.
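The sequon enumeration and the expected-density calculation can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the scripts used for the analysis: the function names are hypothetical, and the expected density is taken to be the product of the pooled residue frequencies f(N) × (1 − f(P)) × (f(T) + f(S) + f(C)), a formula that the text does not specify.

```python
import re
from collections import Counter

# Lookahead so that overlapping sequons (e.g. N-N-S-C) are all reported
SEQUON = re.compile(r"(?=(N[^P][STC]))")


def find_sequons(seq):
    """Return (1-based Asn position, tripeptide, residues between the Asn
    and the C terminus) for every N-(X≠P)-T/S/C sequon in seq."""
    hits = []
    for m in SEQUON.finditer(seq):
        pos = m.start() + 1
        hits.append((pos, m.group(1), len(seq) - pos))
    return hits


def expected_sequon_density(seqs):
    """Expected sequons per residue from pooled amino acid composition.

    Assumption: residues are independent, so the density is
    f(N) * (1 - f(P)) * (f(T) + f(S) + f(C)).
    """
    counts = Counter("".join(seqs))
    total = sum(counts.values())

    def f(aa):
        return counts[aa] / total

    return f("N") * (1.0 - f("P")) * (f("T") + f("S") + f("C"))


# Toy sequence (hypothetical): N6 is followed by proline and is not a sequon
toy = "MNGTANPSQNNSCK"
for pos, tripeptide, from_c_term in find_sequons(toy):
    print(pos, tripeptide, from_c_term)

# Site bookkeeping from the text: 5,052 - 112 - 18 = 4,922 class I sites
remaining_sites = 5052 - 112 - 18
```

The distance to the C terminus returned for each sequon is the quantity needed to separate extreme C-terminal sites from internal sites when comparing glycan and sequon densities.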
High confidence (class I) sites for S. cerevisiae, S. pombe, C. elegans, D. melanogaster, A. thaliana, and D. rerio were obtained from the supplemental materials in Zielinska et al. (2012) and analyzed in a similar manner. The generated sequon lists for all model organisms except M. musculus only included N-(X≠P)-T/S sequons because of the low number of N-X-C–linked glycans in the initial class I site databases of these model organisms.
Direct retrieval and BLAST (Basic Local Alignment Search Tool) searches of the NCBI Protein and UniProt databases retrieved 999 eukaryotic STT3 sequences. Of these, 369 sequences were left after elimination of duplicates and severely truncated sequences. For display purposes (Fig. S1), the collection of 369 STT3 sequences was trimmed to 152 sequences representing diverse eukaryotes yet retaining multiple nonidentical STT3 sequences from selected metazoan, plant, and protist organisms. Phylogenetic alignment of the complete dataset (369 STT3 sequences) yielded a similar tree. The 152 sequences were used for phylogenetic analysis on the SATCHMO-JS web server (Hagopian et al., 2010). The tree drawing was performed using the ETE (python environment for phylogenetic tree exploration) program (Huerta-Cepas et al., 2010).
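The elimination of duplicates and severely truncated sequences could be carried out along the following lines. This is a hedged sketch only: the criteria actually applied are not stated, so the rule used here (drop exact duplicate sequences, then drop sequences shorter than a fraction of the median length) and the `min_fraction` cutoff are assumptions, and the function name is hypothetical.

```python
from statistics import median


def filter_sequences(records, min_fraction=0.5):
    """Remove exact duplicate sequences, then remove truncated ones.

    records: list of (identifier, sequence) tuples.
    min_fraction: hypothetical cutoff; sequences shorter than this fraction
    of the median length are treated as severely truncated.
    """
    seen = set()
    unique = []
    for ident, seq in records:
        key = seq.upper()
        if key not in seen:  # keep the first copy of each sequence
            seen.add(key)
            unique.append((ident, seq))
    if not unique:
        return []
    cutoff = min_fraction * median(len(seq) for _, seq in unique)
    return [(ident, seq) for ident, seq in unique if len(seq) >= cutoff]


# Toy records (hypothetical identifiers and sequences)
records = [
    ("a", "MKTAYIAKQR"),
    ("b", "MKTAYIAKQR"),   # exact duplicate of "a"
    ("c", "MKT"),          # severely truncated
    ("d", "MKTAYIAKQRQ"),
]
kept = filter_sequences(records)
print([ident for ident, _ in kept])
```

A length cutoff relative to the median is only one possible definition of truncation; alignment coverage against a reference STT3 sequence would be a stricter alternative.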
Online supplemental material
Fig. S1 shows the phylogenetic analysis of eukaryotic STT3 proteins. Table S1 lists the accession numbers of the 152 STT3 proteins that were used to generate Fig. S1.
The authors thank Dr. Geoffrey Hammond for providing SHBG expression vectors.
Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number GM43687.