Arg (R)-rich dipeptide repeat proteins (DPRs; poly(PR): Pro-Arg and poly(GR): Gly-Arg), encoded by a hexanucleotide expansion in the C9ORF72 gene, induce neurodegeneration in amyotrophic lateral sclerosis (ALS). Although R-rich DPRs undergo liquid–liquid phase separation (LLPS), which affects multiple biological processes, mechanisms underlying LLPS of DPRs remain elusive. Here, using in silico, in vitro, and in cellulo methods, we determined that the distribution of charged Arg residues regulates the complex coacervation with anionic peptides and nucleic acids. Proteomic analyses revealed that alternate Arg distribution in poly(PR) facilitates entrapment of proteins with acidic motifs via LLPS. Transcription, translation, and diffusion of nucleolar nucleophosmin (NPM1) were impaired by poly(PR) with an alternate charge distribution but not by poly(PR) variants with a consecutive charge distribution. We propose that the pathogenicity of R-rich DPRs is mediated by disturbance of proteins through entrapment in the phase-separated droplets via sequence-controlled multivalent protein–protein interactions.
Introduction
Amyotrophic lateral sclerosis (ALS) is a fatal motor neuron disease affecting both upper and lower motor neurons. Frontotemporal dementia (FTD) is a type of brain disorder with degeneration of the frontal and temporal lobes of cerebrum. It has been shown that these neurodegenerative diseases share some genetic and clinical features, and defective C9ORF72 is the leading cause of familial cases of ALS (C9-ALS) and FTD (Renton et al., 2011; DeJesus-Hernandez et al., 2011). C9-ALS patients carry abnormally expanded GGGGCC hexanucleotide repeats in intron 1 of C9ORF72, and the unconventional repeat-associated non-ATG-initiated (RAN) translations (Zu et al., 2011) produce five dipeptide repeat proteins (DPRs), poly(GA): Gly-Ala, poly(PA): Pro-Ala, poly(GP): Gly-Pro, poly(GR): Gly-Arg, and poly(PR): Pro-Arg, from the hexanucleotide expansion (Zu et al., 2011; Mori et al., 2013). Two Arg-rich (R-rich) DPRs, poly(PR) and poly(GR), are toxic (Wen et al., 2014; Kwon et al., 2014; Mizielinska et al., 2014; Lee et al., 2016) to a wide variety of potential targets, including (a) protein translation (Kanekura et al., 2016; Lee et al., 2016; Zhang et al., 2018; Moens et al., 2019); (b) dynamics of membrane-less organelles (MLOs; Lin et al., 2016; Lee et al., 2016; Boeynaems et al., 2017); (c) metabolism of RNA (Kwon et al., 2014); (d) nucleocytoplasmic transport (Jovičič et al., 2015; Zhang et al., 2015) and (e) DNA damage response (Andrade et al., 2020) for both poly(PR) and poly(GR); (f) ubiquitin-proteasome system (Gupta et al., 2017) and (g) heterochromatin anomalies for poly(PR) alone (Zhang et al., 2019); and (h) mitochondrial function for poly(GR) alone (Choi et al., 2019). Thus, the processes impaired by R-rich DPRs are highly diverse and regulated by independent mechanisms, indicating that the interaction between R-rich DPRs and a variety of proteins involved in various processes generates ALS-related neurotoxicity.
Arg, a positively charged, highly polar amino acid with dipole moment, plays a pivotal role in protein–protein interactions and protein–nucleic acid interactions (Boeynaems et al., 2019). Abundant positive charges and a highly repetitive nature enable these R-rich DPRs to bind to proteins and nucleic acids at multiple sites via electrostatic forces and cation-π interactions, resulting in liquid–liquid phase separation (LLPS; Lee et al., 2016; Lin et al., 2016). LLPS contributes to MLO formation (Frottin et al., 2019). MLOs, such as nucleoli and stress granules, consist of condensed proteins and RNA. The biophysical characteristics of proteins that undergo LLPS involve multivalent, relatively low-affinity interactions via intrinsically disordered regions and/or frequent presence of charged amino acids (Babinchak et al., 2019). Indeed, R-rich C9ORF72 DPRs phase separate with proteins or nucleic acids to form droplets in vitro and preferentially accumulate in MLOs, such as nucleoli and stress granules in cells (Lee et al., 2016; Boeynaems et al., 2017). The R-rich DPRs modulate the dynamics of phase-separated proteins, thereby impeding the functions of MLOs as well as proteins composing MLOs. In particular, poly(PR) inhibits nucleolar nucleophosmin (NPM1) diffusion (White et al., 2019).
Although many peptides undergo LLPS, not all of them are cytotoxic. The primary sequence of a peptide is important for its cytotoxicity (Ukmar-Godec et al., 2019). For example, although the highly cationic Arg12 (R12) peptide phase separates with RNA under in vitro conditions, it is located in the cytosol and has no toxic effect (Meloni et al., 2014; Meloni et al., 2015). By contrast, the (PR)12 peptide, which carries the same number of Args, is mainly located in the nucleolus and exerts toxicity by inhibiting protein translation (Kanekura et al., 2018), indicating that the toxicity of cationic peptides is defined not only by the ability to undergo LLPS but also by as-yet-unknown rules. These lines of evidence prompted us to structurally decode the complex coacervation and toxicity of poly(PR).
Here, we show that the alternating structure of poly(PR) governs LLPS and the resulting toxicity, including inhibition of transcription/translation and impairment of diffusion of NPM1. The toxicity of cationic peptides, including poly(PR), is determined not only by whether it undergoes LLPS but also by the number of molecules each peptide captures in the droplets via multivalent interactions. The molecular dynamics (MD) calculation shows that alternate insertion of Pro into consecutive Args is not advantageous for binding energy, but quantitative proteomic analysis indicates that alternate distribution of Arg is favorable for forming multivalent interaction with client molecules, resulting in LLPS. Poly(PR) variants with large periodicities (e.g., PnRn) have higher binding energy, form sticky droplets with poor fluidity, trap fewer proteins, and fail to exert toxicity. Data mining from the proteomic analysis revealed that proteins harboring acidic stretches and/or abundant acidic amino acids are overrepresented in the (PR)12 interactome compared with the R12 interactome. Enrichment of proteins with acidic motifs is, at least in part, mediated by LLPS. Thus, we propose that entrapment of diverse proteins with acidic motifs by poly(PR) via multivalent interactions underlies motor neuronal cell death in C9-ALS.
Results
Size of the periodicity determines the toxicity of poly(PR)
The most remarkable biochemical properties of R-rich DPRs [poly(PR) and poly(GR)] are repetitive structures and an extraordinarily highly charged nature due to abundant Arg residues. Arg is a positively charged, highly polar amino acid that interacts with negatively charged molecules, such as acidic proteins and nucleic acids. Protein–protein and protein–nucleic acid interactions are expected to confer cytotoxic properties to R-rich DPRs. Poly-Arg peptide, often used as a drug delivery carrier due to its cell-permeability properties, is highly charged and interacts with anionic molecules in a similar manner (Kosuge et al., 2008) but exerts no cytotoxic effect (Meloni et al., 2014; Meloni et al., 2015). Therefore, we speculated that the distribution of charged Args in a peptide primary sequence determines its toxicity. To study the role of distribution of Arg in poly(PR)-mediated toxicity, we adopted two biological models, protein translation and diffusion of nucleolar NPM1, which are reportedly inhibited by poly(PR) (Kanekura et al., 2016; Hartmann et al., 2018; Moens et al., 2019; White et al., 2019).
First, we investigated the effect of Arg distribution in the inhibition of translation by poly(PR). Real-time monitoring of fluorescence from GFP produced by in vitro translation (IVT) using HeLa cell lysates revealed that (PR)12, but not R12, affects protein translation (Fig. 1 A), suggesting that alternately inserting Pro residues into a continuous R12 sequence confers the peptide with ability to inhibit protein translation. To investigate the effect of Arg distribution on the inhibition of protein translation, we synthesized a series of (PR)12 variants with different periodicities (size of periodicity, t, = 2, 4, 8, 24; Fig. 1 B); proportionally different variants of (PR)12 with equal numbers of Arg, (P1R2)6 (Pro:Arg ratio = 1:2) and (P1R3)4 (Pro:Arg ratio = 1:3); or variants of (PR)12 with equal numbers of Pro, (P2R1)6 (Pro:Arg ratio = 2:1), and (P3R1)4 (Pro:Arg ratio = 3:1; Fig. 1 C). The IVT-HeLa assay showed that (PR)12 effectively suppressed translation of GFP, whereas (PR)12 variants with larger periodicities showed a much milder effect compared with (PR)12 (Fig. 1 D). The IVT-HeLa assay also showed that an increment of Pro residues up to a 1:1 ratio conferred the peptide with the ability to inhibit protein translation, whereas (PR)12 variants with fewer Arg residues (<50%) could not inhibit protein translation (Fig. 1 E). Therefore, periodical insertion of Pro into consecutive R12 ≤50% confers toxicity to the peptide. Next, we investigated whether the periodicity of poly(PR) affects protein translation in live cells by expressing a series of (PR)50 variants (Fig. 1 F). When we measured protein translation rate in live cells by puromycin labeling, GFP-(PR)50 inhibited protein translation most, and GFP-(PR)50 variants with large periodicity failed to inhibit (Fig. 1, G and H). 
Intriguingly, we found that GFP-(PR)50 accumulated almost exclusively in the nucleolus; however, GFP-(PR)50 variants with large periodicity localized to both the nucleolus and cytosol, and GFP-(P16R16)3 failed to localize to the nucleolus and localized almost exclusively to the cytosol (Fig. 1 G, lower panel, and Fig. 1 I).
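The variant nomenclature used above can be made concrete with a short sketch that writes out the Pro/Arg sequences. It assumes that every repeat unit starts with Pro and that no tags or linkers are attached; the function names are hypothetical and are not taken from the study.

```python
def block_variant(block: int, total_pairs: int = 12) -> str:
    """Return (P_block R_block)_m with `total_pairs` Pro and `total_pairs` Arg.

    The size of periodicity is t = 2 * block, so block = 1, 2, 4, 12
    gives t = 2, 4, 8, 24 for a 24-residue peptide.
    Assumption: each unit starts with Pro; tags and linkers are ignored.
    """
    if block < 1 or total_pairs % block != 0:
        raise ValueError("block must be a positive divisor of total_pairs")
    unit = "P" * block + "R" * block
    return unit * (total_pairs // block)


def ratio_variant(n_pro: int, n_arg: int, repeats: int) -> str:
    """Return (P_n_pro R_n_arg)_repeats, e.g. (P1R2)6 or (P3R1)4."""
    return ("P" * n_pro + "R" * n_arg) * repeats


if __name__ == "__main__":
    for block in (1, 2, 4, 12):
        print(f"t = {2 * block:2d}: {block_variant(block)}")
    for n_pro, n_arg, repeats in ((1, 2, 6), (1, 3, 4), (2, 1, 6), (3, 1, 4)):
        print(f"(P{n_pro}R{n_arg}){repeats}: {ratio_variant(n_pro, n_arg, repeats)}")
```

All four periodicity variants contain 12 Pro and 12 Arg, so only the arrangement of charge differs. The ratio variants instead hold either the Arg count or the Pro count at 12.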
Next, we tested whether the size of periodicity of (PR)50 variants affects the diffusion of NPM1 (White et al., 2019). FRAP of nucleolar GFP-(PR)50 variants showed that GFP-(PR)50 had higher fluidity than GFP-(P8R8)6 (Fig. 1 J), indicating that the interaction between (PR)50 and surrounding molecules in the nucleolar matrix was relatively loose compared with that of (P8R8)6. Next, we tested the effect of (PR)50 variants on the diffusion of GFP-NPM1. As reported previously, (PR)50 suppressed the mobility of GFP-NPM1; however, (P4R4)12 had a modest effect and (P8R8)6 had little effect on it (Fig. 1, K and L). To quantitatively evaluate FRAP results, we used a simplified equation to extract diffusion coefficients from FRAP data using half recovery time (Materials and methods). Half recovery time indicated that diffusion of NPM1 coexpressed with (PR)50 was much slower than when coexpressed with (P8R8)6, implying that more NPM1 molecules are affected by (PR)50 than by (P8R8)6 (Fig. 1 M). These data clearly indicated that toxicity of poly(PR) was dependent on alternating Arg distribution.
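As an illustration of how a half recovery time can be converted into an apparent diffusion coefficient, the sketch below uses the Soumpasis approximation for a circular bleach spot, D = 0.224 r²/t½. This section does not state the exact equation, so the choice of approximation is an assumption. The function names, the normalization (first point immediately after bleaching, last point taken as the plateau), and the example values are hypothetical.

```python
def half_recovery_time(times, intensities):
    """Estimate t_half from a monotonically recovering FRAP curve.

    Assumptions: intensities[0] is the first post-bleach value and
    intensities[-1] approximates the recovery plateau. The crossing of the
    half-recovery level is located by linear interpolation.
    """
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    start, plateau = intensities[0], intensities[-1]
    target = start + 0.5 * (plateau - start)
    for i in range(len(times) - 1):
        f1, f2 = intensities[i], intensities[i + 1]
        if f1 <= target <= f2:
            if f2 == f1:
                return times[i] - times[0]
            frac = (target - f1) / (f2 - f1)
            return times[i] + frac * (times[i + 1] - times[i]) - times[0]
    raise ValueError("curve never crosses the half-recovery level")


def diffusion_coefficient(bleach_radius_um, t_half_s):
    """Apparent D (um^2/s) from the Soumpasis approximation D = 0.224 r^2 / t_half."""
    if t_half_s <= 0:
        raise ValueError("t_half must be positive")
    return 0.224 * bleach_radius_um ** 2 / t_half_s


if __name__ == "__main__":
    # Made-up example curve, not data from the study.
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    f = [0.2, 0.4, 0.6, 0.7, 0.8]
    th = half_recovery_time(t, f)
    print(th, diffusion_coefficient(1.0, th))
```

Because D scales inversely with t½ at a fixed bleach radius, a longer half recovery time translates directly into a proportionally smaller apparent diffusion coefficient.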
Alternating Arg distribution regulates the (PR)12 interactome
In our study, alternate insertion of Pro residues into consecutive R12 conferred on the peptide the ability to inhibit protein translation. Conventional proteomics indicates that poly(PR) preferentially interacts with RNA-binding proteins and ribosomal proteins to inhibit protein translation (Lee et al., 2016; Hartmann et al., 2018; Moens et al., 2019). These results suggest that the alternate distribution of Arg, along with the amino acid that alternates with Arg, may determine the interactome of R-rich DPRs, which concomitantly results in cytotoxicity. The P12R12 variant, which has exactly the same composition as (PR)12 (number of Args, number of Pros, and peptide length), may enhance or inhibit protein–protein interactions to diminish the harmful properties of (PR)12. Thus, we performed quantitative liquid chromatography–mass spectrometry (LC-MS) to identify the interactomes of these peptides (Ueda et al., 2011). Immunoprecipitation (IP) with HEK293 cell lysates followed by LC-MS identified ∼2,000 interacting proteins (Fig. 2 A; Table S1). The (PR)12 interactome is composed mostly of nuclear and cytosolic proteins, consistent with the compartments in which poly(PR) mainly localizes (Fig. 2 B). To test whether the repeat length impacts the interactome of poly(PR), we compared the (PR)12 interactome with that of (PR)50 (Lee et al., 2016) and that of (PR)100 (Lin et al., 2016) and confirmed that the interactome of (PR)12 is quite similar to that of (PR)50 (Fig. S2, A and B) and that of (PR)100 (Fig. S2 C). The probability of observing these levels of overlap by chance was extremely low (P = 1.05 × 10⁻¹²⁶ for (PR)50 and P = 8.57 × 10⁻¹²³ for (PR)100). Scatterplots and volcano plots showed that (GR)12 and R12 had similar enrichment levels in their interactomes (R² = 0.756; Fig. 2, C and D), whereas enrichment in the interactome of (PR)12 was very different from that of R12 (R² = 0.415; Fig. 2, E and F).
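The probability of an interactome overlap arising by chance can be estimated with a one-sided hypergeometric test, sketched below. This section does not name the test that was used, so the hypergeometric model, the background size, and the function name are assumptions made for illustration only.

```python
from math import comb


def overlap_p_value(background: int, set_a: int, set_b: int, shared: int) -> float:
    """One-sided hypergeometric P(overlap >= shared).

    background: number of proteins that could have been detected (assumed).
    set_a, set_b: sizes of the two interactomes being compared.
    shared: number of proteins found in both.
    Exact integer arithmetic is used so that very small tails are not lost
    to rounding before the final division.
    """
    if not (0 <= shared <= min(set_a, set_b) and max(set_a, set_b) <= background):
        raise ValueError("inconsistent set sizes")
    total = comb(background, set_b)
    tail = sum(
        comb(set_a, i) * comb(background - set_a, set_b - i)
        for i in range(shared, min(set_a, set_b) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers, not the values from the study.
    print(overlap_p_value(background=20000, set_a=1900, set_b=400, shared=150))
```

The result depends strongly on the assumed background, so the choice of background (for example, all proteins detectable in the lysate rather than the whole proteome) should be reported alongside the P value.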
When the interactome of R12 and that of (PR)12 were compared, 1,503 proteins of the 1,921 identified proteins were common to the interactomes of R12 and (PR)12. The interactomes of (P2R2)6, (P4R4)3 and P12R12 were also qualitatively similar (Fig. S2, D–F). The signals of 171 proteins (11.4%) were increased by ≥10-fold, while the signals of only 7 proteins (0.5%) were decreased to <10% in (PR)12, compared with those of R12 (Fig. 2 G). Signal intensities of the enriched interactome of (PR)12 (>10-fold compared with R12) were, in most cases, reduced in the P12R12 interactome (Fig. 2, H–L), suggesting that the enrichment of the interactome was dependent on the periodicity of (PR)12. IP followed by immunoblotting (IB) analyses indicated that the signal intensities of coprecipitated proteins were inversely correlated with the size of periodicity (Fig. 2 M). Another IP-IB analysis showed that consecutive P12 residues did not contribute to protein–protein interaction, while Pro residues inserted alternately into consecutive R12 induced a drastic increase in protein–protein interactions, showing that the role of alternating Pro is promoting multivalent interaction with client proteins (Fig. 2 N). Overall, these findings indicated that (a) peptide–protein interaction was affected by the size of (PR)12 periodicity, and (b) the number of molecules that 12 Arg residues interacted with was increased by alternate insertion of Pro.
We obtained 196 proteins in the (PR)12 interactome that were enriched at least fivefold compared with the R12 interactome, and 294 proteins that were detected only in the (PR)12 interactome (Fig. 2 A). To identify biological processes involving these proteins, we investigated functional protein association networks using the STRING database (Szklarczyk et al., 2019). The >5-fold enriched proteome and the (PR)12-specific interactome integrate well and form several functional nodes (Fig. 2 O). The PANTHER overrepresentation test (Mi et al., 2019) found that components of many biological processes reportedly impaired by poly(PR) were included, such as RNA splicing (gene ontology [GO]: 0008380; false discovery rate [FDR] = 3.38 × 10⁻³²), nucleocytoplasmic transport (GO: 0006913; FDR = 1.37 × 10⁻¹⁵), and translation (GO: 0006412; FDR = 4.73 × 10⁻¹⁰; Fig. 2 P and Table S2). We also found previously unknown molecular targets of poly(PR), such as transcription, DNA-templated (GO: 0006351; FDR = 1.32 × 10⁻¹³; Fig. 2 P). The heatmap showed that all three classes of RNA polymerases may be targeted by (PR)12 (Fig. S1). To examine whether RNA polymerase activity is indeed impaired by poly(PR), RNA synthesis was evaluated by incorporation of 5-ethynyl uridine (EU), followed by visualization using Alexa Fluor 594 azide via a click chemistry reaction (Fig. S3, A and B). We observed nucleolar RNA staining without treatment, but the signal intensity of newly synthesized RNA in HeLa cells treated with (PR)20 was attenuated (Fig. S3, B and C). Furthermore, overexpression of GFP-(PR)50 suppressed transcription, whereas GFP-(PR)50 variants with large periodicities failed to suppress it (Fig. S3, E and F). These data show that the (PR)12-enriched interactome contains molecular targets of poly(PR) pathophysiology.
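The FDR values quoted for the GO terms come from a multiple-testing correction applied to per-term P values. A minimal sketch of the Benjamini-Hochberg procedure is given below as one common way to obtain such values. Whether PANTHER was run with exactly this correction is not stated in this section, so the procedure is an assumption, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest P), and the result
    is made monotone by taking a running minimum from the largest rank down.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up P values for four hypothetical GO terms.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The adjusted values are never smaller than the raw P values, so a term reported at a given FDR had a raw P value at or below that level.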
Alternate Pro residues enhance multivalent interactions with acidic motifs
Shared features in the enriched interactome of (PR)12 were analyzed to clarify the affinity of (PR)12 for these proteins. Sequences of the top 20 enriched proteins tended to have long acidic stretches consisting of consecutive D/E or repetitive clusters of D/E (Fig. 3 A and Fig. S4, A and B). Thus, we screened for consecutive acidic stretches in the sequences of proteins with at least a fivefold increase in (PR)12 or a decrease of >50% in (PR)12, compared with R12, and found that (PR)12 preferred proteins with longer acidic stretches (Fig. 3 B). To detect repetitive small clusters of D/E, we defined a weighted scoring system for D/E using 6-aa peptides (D/E score; Fig. 3 C; Materials and methods) and calculated the score from the start codon to the stop codon of each protein. The total D/E scores of proteins with a greater affinity for (PR)12 were significantly higher than those of proteins with a greater affinity for R12 (P < 0.0001; Fig. 3, D and E). Next, we calculated the overall amino acid occurrence in the interactome. The whole interactomes of (PR)12 and R12 showed no remarkable differences, but the occurrence of charged residues (D, E, K, and R) was higher in both than in the human proteome (Fig. 3 F; Kozlowski, 2017). When we calculated amino acid occurrence in four subgroups (>500-, >200-, >5-fold, or <0.5-fold enriched in (PR)12 compared with R12), we found that acidic amino acids were overrepresented in the (PR)12 interactome compared with R12 (Fig. 3 G). Aromatic amino acids were not enriched, suggesting that electrostatic forces rather than cation-π interactions play a central role in determining the (PR)12 interactome (Fig. 3 G).
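The two sequence screens described above can be sketched as follows: the longest run of consecutive D/E residues, and a sliding 6-aa window score for smaller D/E clusters. The actual weights of the D/E score are defined in Materials and methods and are not reproduced here. The weighting below (a window contributes its D/E count only when it holds at least three acidic residues) is a placeholder assumption, as are the function names.

```python
ACIDIC = frozenset("DE")


def longest_acidic_stretch(seq: str) -> int:
    """Length of the longest run of consecutive D/E residues."""
    best = run = 0
    for aa in seq.upper():
        run = run + 1 if aa in ACIDIC else 0
        best = max(best, run)
    return best


def de_window_score(seq: str, window: int = 6, weights=None) -> int:
    """Sum a per-window weight over all overlapping windows of `window` residues.

    weights maps the number of D/E residues in a window to its contribution.
    The default is a placeholder and NOT the weighting used in the study.
    """
    if weights is None:
        weights = {n: (n if n >= 3 else 0) for n in range(window + 1)}
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - window + 1):
        n_acidic = sum(aa in ACIDIC for aa in seq[i:i + window])
        total += weights[n_acidic]
    return total


if __name__ == "__main__":
    toy = "MKDEEDAAEKLDDEEEEDDK"  # invented sequence, for illustration only
    print(longest_acidic_stretch(toy), de_window_score(toy))
```

Summing over overlapping windows rewards both long acidic tracts and dense clusters interrupted by a few neutral residues, which a longest-run screen alone would miss.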
When compared with the interactome of R12, the signal intensities of some (PR)12 interactors were drastically increased, >10,000-fold (Fig. 2 G). Even if 1 (PR)12 molecule interacts with 12 client molecules via each R and R12 interacts with only 1 molecule, the maximum difference can be 12-fold, so that such an unusually high condensation process may be explained by LLPS formation. Overrepresentation of charged amino acids in the (PR)12-enriched interactome substantiates this postulate. To confirm this, we mixed recombinant proteins with R12, (PR)12, or P12R12. Recombinant human Importin-7 (IPO7) and recombinant human ARF GTPase activating protein 1 (GIT1) phase separated with (PR)12 but not with R12 or P12R12, indicating that alternating Arg distribution facilitates LLPS (Fig. 3, H–K). We investigated whether 1,6-hexanediol, an aliphatic alcohol that disrupts LLPS, affects the IP assay (Lin et al., 2016). When the concentration of 1,6-hexanediol in the lysate was increased, the immunoprecipitated NAP1L4 signal, which is the most enriched interactor of (PR)12, faded accordingly (Fig. 3 L). These data revealed that enrichment of the interactome may be explained, at least partially, by LLPS.
Periodicity of (PR)12 regulates phase separation
LLPS formation with recombinant IPO7 and GIT1 indicates that the periodicity of (PR)12 regulates phase separation. Studies indicate that the distribution of charged amino acids affects electrostatic force and phase separation, and especially blocky sequences of charged residues exhibit strong charge correlations due to the sequence alignment of two nearby chains, which favors binding (Chang et al., 2017). To understand how periodicity of (PR)12 regulates LLPS, we tested LLPS formation with poly-rA as a model for RNA or with poly-E as a model for acidic stretches of proteins. Phase-separated droplets of (PR)12 variants and RNA wetted the hydrophobic glass surface similarly (Fig. 4 A). However, when glass coated with a hydrophilic polymer was used, most droplets floated, and fewer droplets adhered to the surface (Fig. 4 B). Droplets of (PR)12 variants with larger periodicities lost their spherical structure under this condition, and P12R12 formed sticky nonspherical condensates that failed to minimize surface tension, a typical characteristic of liquids (Fig. 4 B). FRAP analysis of tetramethylrhodamine (TAMRA)-labeled RNA mixed with each peptide showed that RNA mixed with (PR)12 had higher fluidity than RNA mixed with (P4R4)3 or P12R12 (Fig. 4 C). When mixed with poly-E, (PR)12 and (P2R2)6 formed phase-separated droplets (Fig. 4 D). (P4R4)3 also formed droplets, but they contained aggregates of poly-E. Importantly, P12R12 did not form droplets.
Considering that the fluidity of (PR)12 droplets is higher than that of (PR)12 variants with large periodicities (Fig. 4 C), Pro residues may act as looseners of the Arg–RNA and Arg–poly-E interactions. In support of this, whole-droplet bleaching showed that RNA in (PR)12 droplets exchanged with the surroundings more readily than RNA in R12 droplets (Fig. 4 E). To test this further, we examined the effects of (PR)12 variants with different Pro:Arg proportions on LLPS. When mixed with RNA, (PR)12, (P1R2)6, and (P1R3)4 formed droplets (Fig. 4 F), whereas (P2R1)6 and (P3R1)4 did not (data not shown). FRAP analysis indicated that the fluidity of the droplets increased with the ratio of Pro residues (Fig. 4 G). We observed the same tendency in phase separation and FRAP analysis when the peptides were mixed with poly-E peptide, indicating that alternate Pro residues loosened protein–protein interactions and protein–RNA interactions (Fig. 4, G and I). Visible droplets of (P2R1)6 and (P3R1)4 were not observed when mixed with poly-E (data not shown). To test whether the effect of Arg distribution on the FRAP recovery rate is also seen at a physiologically relevant size of poly(PR), we performed FRAP analysis with recombinant (PR)50 and (P16R16)3. As expected, the FRAP recovery rate of (P16R16)3 was slower than that of (PR)50 when mixed with RNA or poly-E (Fig. S5, A and B).
We further investigated the effects of (PR)12 variants on LLPS with proteins harboring acidic stretches. NPM1 phase separates with poly(PR) via its acidic tract-3 domain consisting of 20 consecutive acidic amino acids (White et al., 2019), and the periodicity of (PR)50 regulates the diffusion of NPM1 in cells (Fig. 1 K–M). Therefore, we speculated that the periodicity of (PR)12 may regulate the phase separation of NPM1 and that modulation of LLPS underlies the impaired diffusion of NPM1 in cells. When recombinant human NPM1 was mixed with (PR)12 variants, NPM1 phase separated with the variants and the droplets showed wetting properties on the glass surface, except for P12R12, which did not phase separate with NPM1 (Fig. 4 J). NPM1 was shown to phase separate with ribosomal RNA (rRNA; White et al., 2019). We therefore tested whether (PR)12 variants affect the phase separation of NPM1 and rRNA. Recombinant human NPM1 phase separated with human rRNA, and when (PR)12 was added, the signal intensities of NPM1 and rRNA in the droplets increased, indicating that more NPM1 molecules were captured in the phase-separated droplets (Fig. 4, K and L). When mixed with (P4R4)3, NPM1 formed large droplets, but rRNA formed fibrillar structures without liquid properties (Fig. 4 K). P12R12 failed to form droplets with NPM1, resulting in rRNA fibrils (Fig. 4 K). This indicated that addition of (PR)12 increased the involvement of NPM1 and rRNA in LLPS droplets, causing the droplets to become condensed, whereas addition of (PR)12 variants with large periodicities did not. Based on the results of the in vitro and in cellulo experiments, we speculated that the rate of dynamic exchange of NPM1 in phase equilibrium was disrupted by complex coacervation with (PR)n. (PR)n condenses NPM1 and rRNA in phase-separated droplets via multivalent interactions, thereby affecting the mobility of NPM1 in cells, whereas PnRn failed to condense NPM1 and instead accumulated on rRNA. Thus, PnRn did not significantly affect NPM1 mobility (Fig. 1 M and Fig. 4, K and L).
Alternate distribution of Arg is not advantageous with respect to binding energy
As (PR)12 variants have the same length and density of Arg, the only difference is in the distribution of Arg. To determine the manner in which Arg distribution affects molecular interaction, we performed MD simulation to calculate the binding energy between RNA and (PR)4 variants with different distributions of Arg residues (sizes of periodicity t = 2, 4, 8; Fig. 5 A). This calculation focused on one-by-one interactions between a peptide and RNA in water (Fig. 5 B). We analyzed the root mean square deviations (RMSDs) for each peptide by comparing it with the reference structure. (PR)4 and P4R4 had relatively small and constant values >30 ns, while (P2R2)2 had a larger RMSD (Fig. 5 C), suggesting that proline plays an important role in the rigidity of the peptide during its interaction with RNA. (PR)4 had the smallest RMSD among the peptides, because the prolines were inserted evenly in the sequence. On the other hand, (P2R2)2 showed the largest RMSD, indicating that RR was more flexible than PR. P4R4 had the highest binding energy, while (PR)4 had the lowest (Fig. 5 E). We examined each binding energy component according to the equation Eb = Evdw + Eele + Epol + Enonpol, where Evdw is van der Waals energy, Eele is the electrostatic energy, Epol is polar solvation energy and Enonpol is nonpolar solvation energy. The first two represent molecular interactions, while the last two are related to solvation energy. Both van der Waals forces and electrostatic forces showed negative values, and absolute values increased in the order of (PR)4, (P2R2)2 and P4R4 (Fig. 5 E). Interestingly, Epol was positive for all peptides, while (PR)4 displayed a significantly smaller value than the others, indicating that solvation of water was significantly modulated by the presence and spatial distribution of Pro. Decomposed energies for each residue in the peptide sequence and RNA (poly-A) are shown (Fig. 5, F–I). 
While Arg residues contributed to binding energy, the Epol of each Arg depended on the peptide sequence. (PR)4 displayed a relatively suppressed Epol, due to the presence of water molecules between (PR)4 and RNA. However, P4R4 showed a relatively large Epol, suggesting that water molecules surrounding the peptide were excluded from the interaction between peptides and RNA. We also analyzed the radius of gyration (Rg), which approximately estimates the compactness of molecules. We plotted the free energy of the system with respect to Rg and RMSD (Fig. 5, J–M). The energy minima of (PR)4 had a larger Rg value and a smaller RMSD, whereas R8 was distributed at a smaller Rg and a larger RMSD, indicating that Pro residues conferred rigidity to the structure of peptides. We propose that such structural stability with relatively extended conformation intervenes in the interaction between cationic amino acids and anionic protein/nucleic acids. Our proposal would also be supported by relevant work demonstrating the folding of PR into a helical structure (Edun et al., 2021).
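The two structural descriptors used above, RMSD against a reference structure and the radius of gyration, can be computed from atomic coordinates as in the sketch below. It assumes that the frame has already been superposed on the reference, because no alignment is performed here. The function names and the toy coordinates are hypothetical and unrelated to the simulations in this study.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally long coordinate lists.

    Assumption: `coords` has already been superposed on `reference`.
    """
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (x - a) ** 2 + (y - b) ** 2 + (z - c) ** 2
        for (x, y, z), (a, b, c) in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses if none are given."""
    if not coords:
        raise ValueError("need at least one atom")
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    center = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    squared = sum(
        m * sum((c[k] - center[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(squared / total)


if __name__ == "__main__":
    extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (1.9, 3.3, 0.0)]
    print(radius_of_gyration(extended), radius_of_gyration(compact))
```

A larger Rg corresponds to a more extended conformation, and a smaller RMSD over the trajectory corresponds to less deviation from the reference structure, which is how the energy minima of the peptides are compared above.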
Next, we verified the MD simulation results experimentally. The strength of binding was assessed using the critical salt concentration (Alberti et al., 2019), because increasing the salt concentration interferes with ionic interactions and hinders LLPS. When the concentration of NaCl in mixtures containing phase-separated droplets of (PR)12 variants and RNA was increased, P12R12 droplets survived up to 800 mM, whereas (PR)12 droplets persisted at 600 mM but had dissolved by 700 mM, indicating that the blocky charge sequence induced a stronger interaction (Fig. 5, N and O).
Our data indicated that poly(PR) entraps proteins with acidic motifs via multivalent interactions and LLPS, both in vitro and in cellulo. We propose that the highly multivalent interactions enabled by the alternating distribution of Arg and Pro in poly(PR) contribute to the disturbance of biochemical reactions and to the pathophysiology of C9-ALS.
Discussion
ALS is a lethal motor neuron disease affecting >200,000 patients worldwide (Logroscino et al., 2018). To date, almost 30 ALS-causative genes have been identified, and C9ORF72 has been named as the most prevalent causative gene of familial ALS (Maurel et al., 2018). Here, we shed light on the molecular mechanism underlying poly(PR) toxicity that contributes to pathogenesis of C9-ALS.
Many neurodegenerative disease–related proteins, such as TAR DNA-binding protein 43 (TDP-43), fused in sarcoma (FUS), and tau, reportedly undergo LLPS. These basically phase separate by themselves via intrinsically disordered regions, a process known as simple coacervation (Wegmann et al., 2018; Babinchak et al., 2019; Yoshizawa et al., 2018). Disease-causing mutations in these proteins disturb LLPS homeostasis and contribute to aggregate/amyloid formation. C9-DPRs as well as cationic peptides, such as R12, undergo LLPS when mixed with other molecules, a process known as complex coacervation. Complex coacervation is driven by a combination of electrostatic forces, cation–π interactions, and hydrophobic interactions. Studies indicate that poly(PR) and poly(GR) undergo LLPS in the presence of low-complexity proteins and RNA-binding proteins, leading to cytotoxicity (Lee et al., 2016; Lin et al., 2016). However, the R12 peptide, which lacks substantial toxicity, also undergoes LLPS in a similar manner. Thus, simply undergoing LLPS does not explain toxicity. We hypothesized that an alternatingly charged structure of poly(PR) may play a key role in cytotoxicity. FRAP analyses and critical salt concentration experiments using a series of (PR)12 variants indicated that the order of Pro and Arg determines the environment of phase-separated droplets, concomitantly leading to the differences seen in the inhibition of protein translation.
Based on the highly charged nature of Arg, we surmised that the interactomes of (PR)12 and R12 were qualitatively similar. To understand the pathophysiology of C9-DPRs, we used quantitative proteome analyses to compare their interactomes. Quantitative proteome analyses revealed that R12 interacts with a wide variety of proteins, including ribosomal proteins and RNA-binding proteins, which are reportedly molecular targets of poly(PR) (Fig. 3 C; Radwan et al., 2020; Moens et al., 2019). Although R12 and (PR)12 share most of the interactomes, they were quantitatively divergent, and alternating insertion of Pro into Arg sequences resulted in multivalent protein interactions, causing an increase in the number of molecules with which one peptide can interact (Fig. 1 M and Fig. 2, F and M). These multivalent interactions were reduced when Arg was consecutively distributed, as shown by minimization of the effect exerted by P12R12 on protein translation. Multivalency of protein interaction by poly(PR) also affects the mobility of phase-separated NPM1 in cells by modulating dynamic phase equilibrium. When a major fraction of NPM1 is entrapped by poly(PR) via weak but multivalent interaction, the ratio of freely mobile NPM1 is decreased, disrupting the overall mobility of NPM1. When a minor portion of NPM1 is entrapped by poly(PR) with large periodicity via strong but less-multivalent interactions, the poly(PR) variant strongly influences the minor fraction of NPM1. However, most NPM1 remain mobile, maintaining the overall mobility of NPM1. This clarifies the absence of toxicity in consecutive polyR, or poly(PR) variants with consecutive Args, as opposed to the toxicity of poly(PR).
Quantitative proteomic analyses also revealed that poly(PR) targets proteins with acidic stretches and/or abundant acidic clusters. Cationic peptides interact with acidic proteins. However, a comparison of the interactomes of R12, (GR)12, and (PR)12 indicated that Pro insertion led to selective enrichment of proteins with acidic motifs up to 10,000-fold when compared with R12. These enriched proteins display extraordinarily high acidic amino acid contents of >30%, suggesting susceptibility to LLPS. IPO7 and GIT1 phase separate with (PR)12 but not with R12 or P12R12, suggesting that extreme enrichment is mediated, at least in part, by LLPS. This is substantiated by the attenuation of NAP1L4 IP by 1,6-hexanediol.
The next issue to be clarified is the determinant of toxicity of poly(PR). MD calculations indicated that structural rigidity promoted by Pro may negatively impact the formation of protein–protein or protein–nucleotide interactions, which may be beneficial for multivalent interactions in the LLPS droplets. Our results demonstrated that the specific distribution of Arg in poly(PR) plays a pivotal role in phase separation, determination of interactomes, and cytotoxic properties, including inhibition of transcription and translation and disturbed NPM1 dynamics. In addition, the difference in subcellular localization of (PR)50 variants should be noted. It remains unclear why (PR)50 localizes to the nucleolus whereas (P16R16)3 exclusively localizes to the cytosol. It also remains uncertain why poly(PR) mainly localizes to the cytosol rather than the nucleolus in the postmortem tissues of C9-ALS patients, even though poly(PR) localizes to the nucleolus when overexpressed in mammalian cells, including human motor neurons derived from induced pluripotent stem cells of C9orf72 patients (Wen et al., 2014; Mackenzie et al., 2015). The localization may be affected by continuous inhibition of nucleocytoplasmic transport (Zhang et al., 2015) or recruitment to the cytosol by the interactors of poly(PR) (Hartmann et al., 2018), and further investigation is warranted. Another important problem to be solved is the pathophysiological roles of DPRs. Several poly(PR) transgenic mice exhibit ALS-like phenotypes including motor neuronal loss (Hao et al., 2019; Zhang et al., 2019); however, 500-repeat C9orf72 BAC mice lack motor deficits (Mordes et al., 2020).
Although there are several possible explanations for this discrepancy, such as that accumulation of RAN translation products increases with age (Cleary and Ranum, 2017) and may be exaggerated in humans owing to their long lifespans and to potential species-dependent differences in the efficacy of RAN translation, the discrepancy remains to be investigated. It should be noted that we mainly used (PR)12 in this study because of limitations of the experimental design; however, the number of repeats observed in C9-ALS patients is >50, so we may not have captured all elements of human pathology.
We originally postulated that the number of molecules that a DPR binds to is determined by the number of Arg residues. However, it was increased by insertion of alternating Pro residues, which function as spacers to enable the initiation of LLPS. Our results are supported by a report indicating that aromatic amino acids, which contribute to protein–protein interactions via π–π and cation–π interactions, function as “stickers” whose appropriately distanced positioning is facilitated by “spacers” and thereby induce phase separation (Martin et al., 2020). Thus, the balance and distribution of stickers [Arg, in the case of poly(PR)] and spacers [Pro, in the case of poly(PR)] regulate phase separation and subsequent biological events.
In conclusion, we revealed the previously unknown role of charge distribution in poly(PR). Alternate charge distributions govern phase separation, multivalent protein interactions, and cytotoxicity. Thus, it is the disruption of the dynamic exchanges of proteins in the phase equilibrium, and not simply undergoing LLPS, that determines the toxicity of R-rich peptides.
Materials and methods
Reagents
1-Step Human Coupled IVT kit, poly-rA RNA, Click-iT RNA Alexa Fluor 594 imaging kit, and SYTO-RNASelect reagent were purchased from Thermo Fisher Scientific. Recombinant human IPO7 (Ag27257), recombinant human GIT1 (Ag10658), rabbit polyclonal anti-NAP1L4 antibody (16018-1-AP), rabbit polyclonal anti-PAF1 antibody (15441-1-AP), rabbit polyclonal anti-IPO7 antibody (28289-1-AP), rabbit polyclonal anti-RPL5 antibody (15430-1-AP), and rabbit polyclonal anti-EIF3A antibody (11423-1-AP) were purchased from Proteintech. Horseradish peroxidase–conjugated anti-rabbit secondary antibody (#7074S) was purchased from Cell Signaling. Chambered coverglasses (uncoated or hydrophilic polymer MAS-coated) were obtained from Matsunami Glass. 1,6-Hexanediol was purchased from Tokyo Chemical Industry.
LLPS of peptides, nucleotides, and recombinant proteins
All peptides were chemically synthesized by Genscript with purity >85%. Human rRNA was extracted from HEK293 cells using the RNeasy kit (Qiagen). Human NPM1 cDNA was amplified from NPM1-GFP (gift of Xin Wang, Center for Cancer Research, National Cancer Institute, Bethesda, MD; Addgene plasmid #17578; RRID: Addgene_17578) by PCR and subcloned into the pET28a vector. BL21 (DE3) strain (New England Biolabs) was used for production of recombinant proteins. The expression of recombinant NPM1 was induced by 0.1 mM IPTG at 16°C overnight, and the expressed NPM1 was purified with a Ni-Sepharose column (Clontech) followed by dialysis and snap freezing. For the expression of recombinant GFP-(PR)50 and GFP-(P16R16)3, cDNAs encoding GFP-(PR)50 and GFP-(P16R16)3 were subcloned into the pET28a vector. The expression of the proteins was induced by 1 mM IPTG at 25°C overnight, and the recombinant proteins were denatured with 6 M guanidinium chloride-PBS, followed by purification with Ni-NTA agarose (Thermo Fisher Scientific) and serial dialysis. Poly-E was purchased from Fujifilm. Fluorescent labeling of recombinant NPM1 or poly-E was achieved using a HiLyte-Fluor 555 labeling kit (Dojindo), following the manufacturer’s protocol. TAMRA-labeled rA15 was synthesized by Fasmac (Kanagawa, Japan). For observation of LLPS of (PR)12 variants and poly-rA, each peptide (final concentration, 100 µM) and poly-rA (final concentration, 0.5 mg/ml) containing 100 nM of TAMRA-rA15 were mixed, and the droplets were observed on a chambered coverglass with an LSM-710 confocal microscope (Carl Zeiss) equipped with a water-immersion 60× lens of NA 1.2 at room temperature. FRAP analyses were also performed using the LSM-710 confocal microscope with a water-immersion 60× lens of NA 1.2 at room temperature, and data were analyzed with Zen software (Carl Zeiss).
For observation of LLPS of (PR)12 variants and poly-E, each peptide (final concentration, 200 µM) and poly-E (final concentration, 1 mg/ml) containing 1 µg/ml of poly-E labeled with HiLyte555 were mixed. For observation of LLPS of (PR)12 variants and human NPM1, each peptide (final concentration, 50 µM) and NPM1 (final concentration, 20 µM) containing 0.2 µM of NPM1 labeled with HiLyte555 were mixed. For observation of LLPS of (PR)12 variants, human NPM1, and human rRNA, each peptide (final concentration, 50 µM), NPM1 (final concentration, 20 µM) containing 0.2 µM of NPM1 labeled with HiLyte555, and 50 ng/µl of human rRNA were mixed. RNA was visualized using SYTO-RNASelect (Thermo Fisher Scientific). For LLPS with IPO7 or GIT1, 0.2 µg/µL of GST-fused human recombinant IPO7 (Proteintech) or 0.1 µg/µL of His6-fused human recombinant GIT1 (Proteintech) was mixed with 100 µM/µL of R12, (PR)12, P12R12, or water and observed with an FV10i inverted confocal microscope (Olympus) with a 60× oil-immersion objective lens (NA 1.35) at room temperature.
FRAP analysis of live cells
HEK293 cells cultured in DMEM (Thermo Fisher Scientific) supplemented with 10% FBS and antibiotics were plated on a chambered coverglass (Matsunami). The cells were transiently transfected with GFP-(PR)50 variants or GFP-NPM1 in combination with mCherry-(PR)50 variants using Lipofectamine 2000 (Thermo Fisher Scientific). FRAP analyses were performed using the LSM-710 confocal microscope with a water-immersion 60× lens of NA 1.2 at 37°C, and data were analyzed with Zen software.
FRAP fitting
Quantitative LC-MS and IP analyses
1 nmol of HA peptide, HA-tagged (GR)12, HA-(PR)12, or HA-P12R12 was mixed with 200 µg of HEK293 cell lysates. IP was performed with IP buffer (150 mM NaCl, 20 mM Hepes, pH 7.4, 1 mM EDTA, 0.5% Triton X-100, and protease inhibitor cocktails) and 10 µl of anti-HA antibody beads (Fujifilm). After rotation at 4°C for 4 h, the beads were washed 4× with IP buffer and the precipitated proteins were eluted with 2× Laemmli sample buffer (Bio-Rad). The extracted samples were analyzed with LC-MS. Briefly, samples were reduced with 10 mM Tris(2-carboxyethyl)phosphine at 100°C for 10 min, alkylated with 50 mM iodoacetamide at ambient temperature for 45 min, and subjected to SDS-PAGE. Electrophoresis was terminated at a migration distance of 2 mm from the top edge of the separation gel. After Coomassie brilliant blue staining, protein bands were excised, destained, and finely cut before in-gel digestion with Trypsin/Lys-C Mix (Promega) at 37°C for 12 h. The resulting peptides were extracted from gel fragments and analyzed with an Orbitrap Fusion Lumos Mass Spectrometer (Thermo Fisher Scientific) combined with UltiMate 3000 RSLC nano-flow HPLC system (Thermo Fisher Scientific) in higher-energy C-trap dissociation MS/MS mode. Peptides were identified and quantified using Proteome Discoverer 2.4 software (Thermo Fisher Scientific), where the MS/MS spectra were searched against the Homo sapiens protein database in SwissProt (https://www.uniprot.org/), with an FDR of 1% as an identification filter. The results from the HA control peptide were omitted due to the very low abundance of proteins in the extract. The total counts of peptide fragments were 5.861 × 10¹⁰ for all HA peptides.
IP and IB analyses
1 nmol of HA peptide, HA-tagged (GR)12, or HA-(PR)12 variant was mixed with 200 µg of HEK293 cell lysates. IP was performed as described above. The immunoprecipitated samples were then subjected to SDS-PAGE analyses, followed by IB analyses. The signals were visualized by ECL-select reagent (GE-Amersham) and detected by ChemiDoc touch imaging system (Bio-Rad).
MD simulation
All biomolecular complexes between (PR)4 variants and RNA shown in Fig. 5 were modeled using AVOGADRO software and then studied at pH 7, with both N-terminal Pro residue and C-terminal Arg residue being protonated. MD simulations were performed using the open-source simulation package GROMACS v5.1.4 (Hanwell et al., 2012; Pronk et al., 2015). The amber99sb-ildn forcefield was used for all energetic parameters for inter- and intramolecular interactions (Lindorff-Larsen et al., 2010; Hornak et al., 2006). Simulations were performed in a cubic box (50 × 50 × 50 Å³) with periodic boundary conditions applied in all directions, where all complexes were located at the center of the box. The entire system was solvated with an explicit TIP3P water model and neutralized with sodium ions (Mahoney and Jorgensen, 2000; Jorgensen et al., 1983). The initial whole simulation system is shown in Fig. 2 A. Subsequently, energy minimization with the steepest descent method was performed to reach the maximum force <1,000 kJ/mol. The systems were preequilibrated for 1 ns at a constant temperature of 300 K using the V-rescale coupling method in a canonical (number, volume, absolute temperature [NVT]) ensemble and 1 ns at a constant pressure of 1 bar using the Parrinello–Rahman coupling method (Parrinello and Rahman, 1981) in the isothermal-isobaric (number, pressure, absolute temperature [NpT]) ensemble. The coupling constants were set to 0.1 and 2.0 ps for temperature and pressure, respectively. Electrostatic interactions were calculated using the particle mesh Ewald method (Darden et al., 1993), and van der Waals interactions were treated with a smooth cutoff at a distance of 10 Å. The LINCS algorithm (Hess et al., 1997) was used to constrain the bond lengths, thus permitting use of integration time steps of 2 fs. Finally, a 30-ns MD simulation was performed for all the complexes.
Simulation trajectories were analyzed using the Visual Molecular Dynamics program (Humphrey et al., 1996), and all trajectories were stored every 10 ps for further analysis. Structural deviations along the trajectories were evaluated using the RMSDs of the protein backbone in Fig. 2 C.
By using the g_mmpbsa tool, the average binding energy was calculated from the last 10 ns of the MD trajectories. The dielectric constant of the aqueous solvent was set to 80, the interior dielectric constant was set to 4, and the surface tension constant γ was set to 0.022 kJ/mol. The binding energy results for all simulated systems, including the contribution of residues in terms of all components, namely MM energy, polar energy, and nonpolar energy, are shown in Fig. 5, E–I.
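The simulation lengths and intervals stated above imply the following step and frame counts. This is a minimal bookkeeping sketch in Python, not part of the original workflow: the function and constant names are illustrative, the counts exclude the initial frame at t = 0, and it is an assumption that every stored frame in the last 10 ns was passed to g_mmpbsa.

```python
# Simulation parameters as stated in the text.
TIME_STEP_FS = 2         # integration time step (fs)
PRODUCTION_NS = 30       # length of the production run (ns)
SAVE_INTERVAL_PS = 10    # trajectory storage interval (ps)
ANALYSIS_NS = 10         # final segment used for the binding energy (ns)

FS_PER_NS = 1_000_000
PS_PER_NS = 1_000


def md_bookkeeping():
    """Return (integration steps, stored frames, frames in the analysis window).

    Frame counts exclude the initial frame at t = 0 (an assumption).
    """
    n_steps = PRODUCTION_NS * FS_PER_NS // TIME_STEP_FS
    n_frames = PRODUCTION_NS * PS_PER_NS // SAVE_INTERVAL_PS
    n_analysis_frames = ANALYSIS_NS * PS_PER_NS // SAVE_INTERVAL_PS
    return n_steps, n_frames, n_analysis_frames


if __name__ == "__main__":
    steps, frames, analysis_frames = md_bookkeeping()
    print(f"{steps:,} integration steps")
    print(f"{frames:,} stored frames")
    print(f"{analysis_frames:,} frames in the last {ANALYSIS_NS} ns")
```

Under these assumptions, each 30-ns run corresponds to 15,000,000 integration steps and 3,000 stored frames, of which 1,000 fall within the last 10 ns.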
Free energy landscape
Data mining
To extract features of the protein sequence data that carry significant discriminatory information and thereby obtain accurate predictions, we performed data mining. Protein sequence data were analyzed in Python v3.7.3 using the Spyder environment of the Anaconda data science platform (https://docs.anaconda.com/anaconda/).
Step 1: Detecting consecutive acidic stretches
Consider one protein (e.g., EIF 3M-Human) and a part of its whole amino acid sequence as an example. We scanned the sequence continuously from the start codon to the stop codon in a procedure we refer to as fishing; each catch (a “fish”) consisted of six consecutive amino acids. The same fishing process was applied to all other selected proteins, so that all amino acids of each protein were covered.
Step 2: Scoring
After fishing for all proteins, we scored all sequences. Here, we focused on the occurrence of D/E amino acids in each sequence of six consecutive amino acids. In particular, if the D/E residues are consecutive, they are expected to have a greater effect on the electrostatic force (Fig. 1 F). Therefore, we adopted a weighted scoring system as follows. A sequence with no D/E amino acids scored 0; with one D/E amino acid, 1; with two, 2; with three, 4; with four, 6; with five, 8; and with six, 10. Consequently, all six-amino-acid sequences of each protein were serially scored, and the scores were summed to obtain the final score for that protein.
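The fishing and scoring steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script used in this study: the names acidic_score, WINDOW, and SCORE_BY_ACIDIC_COUNT are hypothetical, and the text does not state whether consecutive fish overlap, so the step size is exposed as a parameter (step=1 slides the window by one residue; step=6 gives non-overlapping windows, which leaves trailing residues unscored when the length is not a multiple of six).

```python
# Score assigned to a six-residue window, keyed by its number of D/E residues
# (values taken from the scoring scheme described in the text).
WINDOW = 6
SCORE_BY_ACIDIC_COUNT = {0: 0, 1: 1, 2: 2, 3: 4, 4: 6, 5: 8, 6: 10}
ACIDIC = frozenset("DE")


def acidic_score(sequence, step=1):
    """Sum the weighted D/E scores of all six-residue windows in a sequence.

    `step` is an assumption, not stated in the text: step=1 yields
    overlapping windows, step=6 yields non-overlapping windows.
    Sequences shorter than six residues score 0.
    """
    seq = sequence.upper()
    total = 0
    for start in range(0, len(seq) - WINDOW + 1, step):
        window = seq[start:start + WINDOW]
        n_acidic = sum(residue in ACIDIC for residue in window)
        total += SCORE_BY_ACIDIC_COUNT[n_acidic]
    return total


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the proteomics data.
    for name, seq in [("acidic stretch", "MKDEEDDEEAK"), ("no acidic", "MKAAGGLLPPK")]:
        print(name, acidic_score(seq))
```

Because the weights rise faster than the D/E count above two residues, a protein with clustered acidic residues receives a higher final score than one with the same number of D/E residues spread evenly along the sequence.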
IVT
An IVT assay was performed using a one-step human coupled IVT kit (Thermo Fisher Scientific) as reported previously (Kanekura et al., 2016). Briefly, 100 µM of each peptide was added to HeLa cell lysates containing substrates and accessory proteins and incubated at 37°C, and fluorescence was monitored every 1 min using a LightCycler (Roche).
Evaluation of translation rate in cells by puromycin labeling
HeLa cells transiently transfected with GFP-(PR)50 variants were treated with 2 µg/ml of puromycin for 1 h. After fixation with 4% formalin-PBS, the cells were permeabilized with 0.5% Triton X-100, followed by immunostaining with anti-Puromycin antibody (mouse monoclonal, clone 12D10; Millipore) at 1:500 dilution and visualization with Alexa Fluor 594–labeled anti-mouse secondary antibody at 1:500 dilution (Thermo Fisher Scientific). The cells were mounted with ProLong Gold mounting reagent (Thermo Fisher Scientific). The images were obtained with an FV10i inverted confocal microscope (Olympus) with a 60× oil-immersion objective lens (NA 1.35) at room temperature. The signal intensities were determined by ImageJ software.
Evaluation of transcription in cells
HeLa cells treated with 10 µM (PR)20 peptide or transiently transfected with GFP-(PR)50 variants were incubated with 1 mM of EU for 2 h. After fixation with 4% formalin-PBS, the cells were permeabilized with 0.5% Triton X-100, followed by immunostaining with anti-GFP antibody (rabbit polyclonal; MBL) at 1:500 dilution for GFP-(PR)50 variant–expressing cells. Then newly synthesized RNA was visualized by Click-iT RNA Alexa Fluor 594 Imaging kit (Thermo Fisher Scientific). GFP-(PR)50 signals were visualized by Alexa Fluor 488–labeled anti-rabbit secondary antibody (Thermo Fisher Scientific). The cells were mounted with ProLong Gold mounting reagent (Thermo Fisher Scientific). The images were obtained by an FV10i inverted confocal microscope (Olympus) with a 60× oil-immersion objective lens (NA 1.35) at room temperature. The signal intensities were determined by ImageJ software.
Statistical analysis
All data are represented as mean values ± SD, and each statistical analysis is detailed in the figure legend. Data distribution was assumed to be normal, but this was not formally tested. Statistical analyses of the data were performed with SPSS software 26 (IBM). n indicates the number of biological replicates in an experiment unless otherwise mentioned.
Online supplemental material
Fig. S1 shows the heatmap for the interactome of R12, (GR)12, (PR)12, or P12R12. Fig. S2 compares the interactome of (PR)12 with the interactome of (PR)50 or the interactome of (PR)100. Fig. S3 shows that the transcription of RNA is potentially targeted by poly(PR) toxicity. Fig. S4 characterizes the 20 most enriched (PR)12-interacting proteins. Fig. S5 shows FRAP analyses of recombinant (PR)50 and (P16R16)3. Table S1 contains quantitative proteomics data. Table S2 contains GO enrichment data.
Data availability
All data are available in the main text or the supplementary materials. Most WB data and imaging data are deposited in https://data.mendeley.com/datasets/44sc4yb3wb/1. Further information and requests for resources and reagents should be directed to and will be fulfilled by Kohsuke Kanekura.
Acknowledgments
We thank Addgene and Dr. Xin Wang for providing us with NPM1-GFP (Addgene plasmid #17578; RRID: Addgene_17578). We thank Editage for English language editing.
This work was supported by grants from the Japan Society for the Promotion of Science KAKENHI (16H06247, 17H03923, and 20H03593 to K. Kanekura; 17K15671 to Y. Harada; 16H05973 and 20H02564 to Y. Hayamizu; and 17H04067 to M. Kuroda). This work was also supported in part by the Japan Agency for Medical Research and Development (16ek0109180h0001 and 17ae0101016s0904), Strategic Research Foundation Grant-aided Project for Private Universities from the Ministry of Education, Culture, Sports, Science and Technology of Japan (M. Kuroda), Takeda Science Foundation (K. Kanekura), Japan Intractable Diseases (Nanbyo) Research Foundation (K. Kanekura), the Tokyo Biochemical Research Foundation (K. Kanekura), and the Ichiro Kanehara Foundation (K. Kanekura).
The authors declare no competing financial interests.
Author contributions: C. Chen, Y. Hayamizu, M. Sugimoto, S. Narumi, M. Kuroda, and K. Kanekura designed the research and wrote the paper. K. Kanekura, Y. Yamanaka, T. Miyagi, S. Narumi, and Y. Harada designed and performed all the imaging studies, protein preparation, LLPS formation, and FRAP analyses. K. Kanekura and K. Ueda designed and performed proteomics. C. Chen, P. Li, S. Tezuka, Y. Hayamizu, and K. Kanekura designed and performed MD simulations and data mining from LC-MS.