The calcium-binding protein calmodulin (CaM) directly binds to membrane transport proteins to modulate their function in response to changes in intracellular calcium concentrations. Because CaM recognizes and binds to a wide variety of target sequences, identifying CaM-binding sites is difficult, requiring intensive sequence gazing and extensive biochemical analysis. Here, we describe a straightforward computational script that rapidly identifies canonical CaM-binding motifs within an amino acid sequence. Analysis of the target sequences from high resolution CaM–peptide structures using this script revealed that CaM often binds to sequences that have multiple overlapping canonical CaM-binding motifs. The addition of a positive charge discriminator to this meta-analysis resulted in a tool that identifies potential CaM-binding domains within a given sequence. To allow users to search for CaM-binding motifs within a protein of interest, perform the meta-analysis, and then compare the results to target peptide–CaM structures deposited in the Protein Data Bank, we created a website and online database. The availability of these tools and analyses will facilitate the design of CaM-related studies of ion channels and membrane transport proteins.
INTRODUCTION
The Ca2+-binding protein calmodulin (CaM) directly binds to membrane transport proteins to regulate membrane excitability and Ca2+-dependent intracellular signal transduction cascades. CaM communicates changes in intracellular Ca2+ levels to channels and transporters by binding to the cytoplasmic domains of these proteins to modulate protein function (calmodulation) (Mruk et al., 2012; Biswas et al., 2013; Ben-Johny and Yue, 2014). The physiological relevance of calmodulation is highlighted by the disease-associated mutations that disrupt CaM–ion channel protein interactions (Weiss et al., 2003; Ghosh et al., 2006; Shamgar et al., 2006; Etxeberria et al., 2008; Alaimo et al., 2009; Hino et al., 2012). Accordingly, many biochemical and structural investigations have focused on determining CaM-binding sites in peptides derived from the water-soluble domains of membrane proteins. Although these concerted efforts have resulted in identifying more proteins that bind to CaM, identifying CaM-binding sites in proteins is still mostly a haphazard exploration, requiring unguided, brute force experimentation.
Part of the challenge of discovering and investigating novel CaM–membrane transport protein interactions is that CaM-binding sites do not share high sequence similarity. Instead, CaM targets often share common biochemical and biophysical characteristics such as a propensity to form amphipathic α helices, net positive charge, moderate hydrophilicity, and hydrophobic anchor residues (Rhoads and Friedberg, 1997). Because of the lack of a well-defined CaM-binding consensus sequence, CaM-binding motifs have been classified by the spacing between hydrophobic anchor residues and broadly divided into two subgroups: Ca2+-independent binding and Ca2+-dependent binding (Table 1). A comparison of proteins that bind to CaM in the absence of Ca2+ identified the hallmark IQ sequence motif (Rhoads and Friedberg, 1997). In this motif, amino acids at positions 1, 2, 5, 6, 11, and 14 are highly conserved. Sequences containing different amino acids at these positions are classified as IQ-like motifs, which CaM binds to in the presence or absence of Ca2+. In addition to IQ-like motifs, Ca2+ binding to CaM induces a conformational change that promotes binding to additional target proteins, which do not contain easily identifiable motifs. A closer examination of these targets showed that the primary requirement for CaM binding is the presence of bulky hydrophobic residues (Phe, Ile, Leu, Val, and Trp) at the first and last position of the binding region (Rhoads and Friedberg, 1997), which anchor the protein into the two lobes of CaM (LaPorte et al., 1980). The remaining intermediate residues between the anchors are highly variable in both sequence and spacing because the central CaM helix is flexible (Seaton et al., 1985; Ikura et al., 1991; Barbato et al., 1992), enabling CaM to bind to a wide variety of protein targets.
Although most CaM-binding motifs can be grouped into categories based on the spacing between anchor residues, several high resolution structures suggest that CaM also binds atypical sequences. For example, the crystal structures of CaM bound to a peptide from the ryanodine receptor (Maximciuc et al., 2006), the plasma membrane Ca2+-ATPase (PMCA) pump (Juranic et al., 2010), and the NMDA receptor (Ataman et al., 2007) show that CaM can bind to motifs that contain hydrophobic anchors either further apart (16 and 17 residues) or closer together (6 residues) than the classical Ca2+-dependent binding motifs. In addition, helicity of the unbound target peptide is not a strict requirement because CaM binds to the disordered proteins neuromodulin and neurogranin (Kumar et al., 2013). Thus, the repertoire of peptide sequences that CaM binds to is difficult to categorize.
Several algorithms have been developed to predict CaM-binding domains within the proteome (Radivojac et al., 2006; Hamilton, M., A.S.N. Reddy, and A. Ben-Hur. 2011. Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine; Minhas and Ben-Hur, 2012; Wang et al., 2012). The web-accessible algorithm (Calmodulin Target Database) relies on the biochemical and biophysical properties of classic CaM targets to predict a CaM-binding site within a user-inputted sequence (Yap et al., 2000). Thus, CaM-binding sites that do not meet these biophysical criteria or stretches of protein sequence that contain multiple or partially overlapping CaM-binding sites will be missed. Moreover, these algorithms are only predictive; they do not display all of the known canonical CaM-binding motifs, leaving the experimentalist in the dark. Therefore, we developed a script to identify every canonical CaM-binding motif within a given sequence. Analysis of PDB-deposited, CaM-target peptide structures revealed that CaM often binds to regions that have multiple overlapping CaM-binding motifs. Combining this identification method with a simple charge discriminator results in reasonable predictive power (71% true positive [TP]; 78% true negative [TN]) on a test set of biochemically characterized CaM-binding motifs. Because we have found this analysis useful in both the design of experiments and analysis of experimental results, we have made the meta-analysis Perl script available for download as a ZIP file (see Online supplemental material below) and created a database that allows users to search for CaM-binding motifs within a protein of interest and perform the meta-analysis (Mruk et al., 2014). Future updates to the database and scripts will be available at the Calmodulation Database and Meta-Analysis website (http://cam.umassmed.edu), which also allows users to compare their sequences to target peptide–CaM structures deposited in the Protein Data Bank (PDB).
MATERIALS AND METHODS
Dataset generation
To generate a set of proteins known to bind CaM, we searched the PDB for structures of CaM–peptide complexes deposited through December 2012. For each structure, the full sequence of the protein from which the peptide is derived was retrieved from the Universal Protein Resource Knowledgebase (UniProt). Duplicate sequences were identified on the basis of their UniProt identification numbers and removed from the dataset, yielding a final sample size of 48 different proteins and 52 CaM-binding sequences. The coordinates of CaM-binding proteins within the full sequences were determined by visual inspection of the structures deposited in the PDB.
Motif identification and sequence characterization
To identify the canonical CaM-binding motifs (listed in Table 1) in the protein sequences within our dataset, we used degenerate text pattern matching via regular expressions. We used a custom Perl script that searched the full sequence of each member of our dataset for subsequences that matched any of the motifs listed in Table 1. Matches to motifs were recorded and used to calculate the motif score for each amino acid, which we define as the number of motifs that include that amino acid.
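The regular-expression search and per-residue motif score described above can be sketched as follows. This is a minimal Python illustration, not the authors' Perl script. The motif classes in MOTIFS are an assumed subset built from anchor spacings named in the text, with the anchor class [FILVW] taken from the hydrophobic residues listed in the Introduction; Table 1 remains the authoritative list. All function names are hypothetical. A lookahead is used so that overlapping matches are all reported.

```python
import re

# Assumption: anchors are the bulky hydrophobic residues named in the text.
ANCHOR = "[FILVW]"

# Assumption: an illustrative subset of motif classes, keyed by name,
# each given as the 1-based positions that must hold an anchor residue.
MOTIFS = {
    "1-10": (1, 10),
    "1-5-10": (1, 5, 10),
    "1-12": (1, 12),
    "1-14": (1, 14),
    "1-5-8-14": (1, 5, 8, 14),
}


def motif_regex(positions):
    """Build a degenerate pattern: anchors at `positions`, any residue elsewhere."""
    length = positions[-1]
    parts = [ANCHOR if i in positions else "." for i in range(1, length + 1)]
    # The lookahead makes the match zero-width, so overlapping hits are found.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motifs(seq):
    """Return (motif name, start, end) for every match; 0-based, end exclusive."""
    hits = []
    for name, positions in MOTIFS.items():
        for m in motif_regex(positions).finditer(seq):
            hits.append((name, m.start(), m.start() + positions[-1]))
    return sorted(hits, key=lambda h: (h[1], h[0]))


def motif_scores(seq):
    """Motif score per residue: the number of matched motifs that include it."""
    scores = [0] * len(seq)
    for _, start, end in find_motifs(seq):
        for i in range(start, end):
            scores[i] += 1
    return scores
```

For a toy peptide such as LAAAIAAAAL, both the 1-10 and 1-5-10 patterns match at the first residue, so every residue receives a motif score of 2.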
To determine hydrophobicity, we calculated a rolling window average at multiple window sizes. For a window size N, the Kyte and Doolittle (1982) hydrophobicity values for the first N amino acids were summed and divided by N. This process was repeated for the window spanning amino acids 2 to N + 1, and for each window thereafter. Net charge was determined in a similar fashion, but the sum was reported instead of the average. The meta-analysis Perl script is available for download as a ZIP file as part of the online supplemental material (see below) and at http://cam.umassmed.edu (Mruk et al., 2014).
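As a hedged illustration of the rolling window calculation, the following Python sketch averages Kyte and Doolittle (1982) values and sums charges over every window of size N. The charge assignment (Lys and Arg as +1, Asp and Glu as −1, His treated as neutral) is an assumption, because the text does not specify how individual residues were scored, and the function names are hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: formal side-chain charges, with His counted as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def windows(seq, n):
    """All contiguous windows of length n, starting at residues 1, 2, 3, ..."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def rolling_hydrophobicity(seq, n):
    """Average hydropathy per window (sum of values divided by n)."""
    return [sum(KD[aa] for aa in w) / n for w in windows(seq, n)]


def rolling_net_charge(seq, n):
    """Net charge per window (the sum is reported, not the average)."""
    return [sum(CHARGE.get(aa, 0) for aa in w) for w in windows(seq, n)]
```

A sequence of length L yields L − N + 1 values for each property, one per window.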
Parameter optimization
To establish an optimal combination of window size, motif count, hydrophobicity, and net charge for discriminating CaM-binding regions from nonbinding regions, we used our CaM-binding protein dataset to generate both a CaM-binding–negative and CaM-binding–positive test set. For each of the three window sizes we tested (8, 10, and 15 amino acids long), each full protein sequence in the dataset was divided into all possible windows. If a window was wholly contained within the CaM-binding regions we determined by visual inspection of deposited structures above, it was assigned to the positive test set. If a window was wholly outside of a CaM-binding region, it was instead assigned to the negative test set.
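The assignment of windows to the two test sets can be sketched in Python as follows. The coordinate convention (0-based, end exclusive) and the function name are assumptions made for illustration. The sketch also assumes that windows straddling a boundary of a CaM-binding region, which are neither wholly inside nor wholly outside, are left out of both sets; the text implies this but does not state it explicitly.

```python
def label_windows(seq_len, binding_regions, n):
    """Split all windows of length n into positive and negative test sets.

    binding_regions: list of (start, end) pairs, 0-based, end exclusive.
    Returns two lists of window start positions.
    """
    positive, negative = [], []
    for start in range(seq_len - n + 1):
        end = start + n
        # Wholly contained within at least one binding region.
        inside = any(s <= start and end <= e for s, e in binding_regions)
        # Shares at least one residue with a binding region.
        overlaps = any(start < e and s < end for s, e in binding_regions)
        if inside:
            positive.append(start)
        elif not overlaps:
            negative.append(start)
        # Windows that only partially overlap a region go to neither set.
    return positive, negative
```

For a 20-residue sequence with a binding region covering residues 6 to 15 and a window of 5, six windows fall in the positive set and only the two windows at the termini fall in the negative set.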
For each window in each test set, the motif score for each amino acid was averaged across the window. To determine the set of parameters that most accurately identified windows as CaM-binding regions or non–CaM-binding regions, we measured the percentage of windows correctly assigned to either the CaM-binding–positive test set (TPs) or the CaM-binding–negative test set (TNs) for 1,800 combinations of motif count, charge, and hydrophobicity. We chose the set of parameters that gave the best sum of TP and TN rates, which was as follows: window size of 10 amino acids, average motif count of ≥2, net charge of ≥1, and average hydrophobicity between −3 and 2.5 (Table S1).
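A minimal Python sketch of this classification and parameter search is given below. Each window is represented by an assumed tuple of (average motif score, net charge, average hydrophobicity); the default thresholds are the optimized values reported above, and treating the hydrophobicity bounds as inclusive is an assumption. The function names and the form of the parameter grid are hypothetical and are not taken from the Perl script.

```python
def predicts_binding(avg_motif, net_charge, avg_hydro,
                     min_motif=2.0, min_charge=1, hydro_range=(-3.0, 2.5)):
    """True if a window passes all three cutoffs."""
    low, high = hydro_range
    return (avg_motif >= min_motif
            and net_charge >= min_charge
            and low <= avg_hydro <= high)


def rates(positive, negative, **params):
    """Fraction of positive windows called binding (TP) and of
    negative windows called nonbinding (TN)."""
    tp = sum(predicts_binding(*w, **params) for w in positive) / len(positive)
    tn = sum(not predicts_binding(*w, **params) for w in negative) / len(negative)
    return tp, tn


def best_parameters(positive, negative, grid):
    """Pick the parameter set in `grid` with the largest TP + TN sum."""
    return max(grid, key=lambda p: sum(rates(positive, negative, **p)))
```

Here `grid` would be a list of dictionaries, one per combination of cutoffs, so the 1,800 combinations described above correspond to a list of 1,800 entries for each window size.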
Database comparison
To compare our database to the Calmodulin Target Database (Yap et al., 2000), amino acid sequences were culled from the 52 CaM-binding domains and entered into both our script and the Calmodulin Target Database. Results were tabulated (Table S2).
Online supplemental material
Fig. S1 shows that the addition of a positive charge discriminator to the script excludes all KCNQ4 transmembrane domains except the positively charged S4 segment. Table S1 compares the TP/TN values for the optimized parameters, and Table S2 compares the CaM motif identification and binding region predictions of the meta-analysis and the Calmodulin Target Database (Yap et al., 2000). In addition, the Perl script is available as part of the online supplemental material for download as a ZIP file, and it is also available at http://cam.umassmed.edu (Mruk et al., 2014). To run the standalone script, type (with spaces): perl NameofFile AminoAcidSequence 0or1 StartingAminoAcidNumber (optional). Two tab-delimited text files will be saved: (1) the meta-analysis results and (2) a list of motifs sorted by amino acid position (Option 0) or motif classification (Option 1). Note that Perl is not included with Windows OS and must be manually downloaded and installed. Perl is available free of charge.
RESULTS
CaM’s promiscuous binding and multifarious roles in KCNQ (Kv7.x) channel modulation motivated us to devise a method to identify CaM-binding motifs in the C terminus of these channels. Previous biochemical studies show that CaM binds to helix A and/or helix B of the KCNQ C terminus through an IQ-like motif and two adjacent 1–5–10 motifs, respectively (Yus-Najera et al., 2002) (Fig. 1 A). However, for CaM to interact with full-length KCNQ channels, both target sites must be intact (Gómez-Posada et al., 2011). In addition to the canonical CaM-binding motifs, the recently determined crystal structure of CaM bound to helix B of KCNQ4 identified a noncanonical 1–14 motif in which methionine acts as the first anchoring residue (Fig. 1 A) (Xu et al., 2013). A retrospective glance at the KCNQ C termini readily identifies these published motifs, and further sequence gazing yields additional canonical motifs that are also present in the two helices. Because the unaided scanning for CaM motifs in membrane transport proteins is prone to human bias and error, we initially wrote a script that identifies all of the canonical CaM-binding motifs within a given sequence. Examination of helix A of KCNQ channels using this script identifies the conserved IQ-like motifs. For KCNQ2 channels, the “IQ” isoleucine is also part of an as yet unnoticed 1–12 motif (Fig. 1 B). In contrast to helix A, helix B of the KCNQ family contains multiple (5–16) canonical motifs (Table 2); KCNQ4’s helix B has CaM motifs from every subgroup except for IQ (Fig. 1 C). We next ran our script on 48 unique CaM–peptide structures deposited in the PDB, as these structures contain well-annotated CaM-binding sites. This analysis revealed that CaM often binds to target peptides that contain multiple overlapping canonical CaM-binding motifs, suggesting a straightforward method for predicting CaM-binding domains. 
However, using this criterion alone on full-length channel sequences resulted in hydrophobic stretches (e.g., transmembrane domains) misidentified as potential CaM-binding motifs. Because most confirmed CaM motifs have a net positive charge, we added a simple positive charge discriminator to the script, which increased specificity by ∼50% while having only a modest effect on sensitivity (positive charge discriminator: 71% TP, 78% TN; neutral charge: 85% TP, 51% TN) (Table S1). Although the charge discriminator excludes most KCNQ4 transmembrane regions, the meta-analysis does predict CaM binding to the positively charged voltage sensor (Fig. S1). Counterintuitively, the addition of a stringent hydrophobicity parameter (to exclude transmembrane domains) did not significantly improve the accuracy of the meta-analysis for all proteins or membrane proteins (Table S1).
Mapping our meta-analysis (Fig. 1 C) onto the KCNQ4 helix B–CaM crystal structure (PDB accession no. 4GOW) illustrates the utility of this simple tool (Fig. 1 D). The motif score is the number of unique canonical CaM-binding motifs that include a given residue in the amino acid sequence (displayed in hexadecimal, with scores of ≥15 returning a value of “Z”). The individual canonical motifs and their locations are shown below the KCNQ4 helix B amino acid sequence. Because both the motif score and positive charge of the highlighted sequence are equal to or greater than the cutoff values determined using the CaM target peptide test set (Materials and methods), the meta-analysis predicts a CaM-binding region in the KCNQ4 B helix. This prediction misses the first anchor in the structure by only one residue, which is expected because methionine is a noncanonical anchor. In contrast, our script identifies potential anchor residues in the distal C-terminal end of helix B that do not form contacts with CaM in the crystal structure. Although these residues are beyond the grasp of CaM in this structure, the presence of multiple canonical CaM-binding motifs within the distal end of helix B hints that CaM has the opportunity to adopt more than one conformation when bound to helix B of KCNQ channels.
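The single-character display of the motif score can be sketched in Python as follows. This is one reading of the description, stated here as an assumption: scores of 0 to 14 are written as one hexadecimal digit (0–9, A–E), and any score of 15 or more is written as Z. The function names and the `cap` parameter are hypothetical.

```python
def score_char(score, cap=15):
    """One display character per residue: a hex digit below `cap`, else 'Z'."""
    if score < 0:
        raise ValueError("motif score cannot be negative")
    if score >= cap:
        return "Z"
    return format(score, "X")  # uppercase hexadecimal digit


def score_track(scores):
    """Join per-residue characters into a line that aligns with the sequence."""
    return "".join(score_char(s) for s in scores)
```

Printing the sequence and its score track on consecutive lines gives a one-to-one alignment between residues and motif scores.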
Given our success at identifying CaM-binding regions in KCNQ channels, we applied the meta-analysis to the voltage-gated calcium channel, CaV1.2. CaV1.2 is an ideal test subject because it contains three well-characterized CaM-binding domains, all of which have been co-crystallized with CaM (Fig. 2 A). We first performed the meta-analysis on the pre-IQ/IQ region of CaV1.2, which predicted three regions for CaM binding (Fig. 2 B): the two regions in the pre-IQ domain each contained the anchor residue identified in the crystal structure (PDB accession no. 3G43; Fig. 2 A), and the third region corresponded with the crystallized IQ domain (PDB accession no. 3G43; Fig. 2 A). We next examined the N-terminal spatial Ca2+-transforming element (NSCaTE) peptide (Dick et al., 2008) containing the noncanonical motif xWxxx(I/L)xxxx (Taiakina et al., 2013). Currently, high resolution structures with this motif have been determined with either the N- or C-lobes of CaM (Liu and Vogel, 2012), but not with full-length CaM. Although there are canonical CaM motifs within this region of CaV1.2, they do not substantially overlap with the noncanonical motif that CaM is bound to in the crystal structures (Fig. 2 C, left). Meta-analysis on the NSCaTE target peptide did not predict a CaM-binding domain in the structure (PDB accession no. 2LQC; Fig. 2 A, right). Lastly, we ran the meta-analysis on the calcium-binding protein (CaBP)1-binding domain in CaV1.2 (Fig. 2 C, right) (Zhou et al., 2005). CaBPs 1–5 are homologous to CaM (Haeseleer et al., 2000) but have been shown to differentially regulate voltage-gated calcium channels and transient receptor potential channels (Lee et al., 2002; Haeseleer et al., 2004; Kinoshita-Kawada et al., 2005; Zhu, 2005; Oz et al., 2011). 
Interestingly, the N-terminal half of the CaBP1-binding domain is devoid of canonical CaM-binding motifs, whereas the other half contains multiple motifs, resulting in the prediction of a CaM-binding domain in the C-terminal end of the CaBP1-binding domain (Fig. 2 C, right).
We also tested how our meta-analysis fared with target peptides from the voltage-gated sodium channel, NaV1.5. Examination of the IQ domain of the C-terminal region of NaV1.5 channels identified multiple canonical CaM-binding motifs (Fig. 3 A). Similar to KCNQ4 channels, the meta-analysis predicted a larger site for CaM binding than was determined by the NMR structure (PDB accession no. 4DCK; Fig. 3 B). More recently, Sarhan et al. (2012) crystallized the structure of CaM bound to the DIII–IV inactivation gate of NaV1.5 channels, which binds using an unorthodox tyrosine anchor. The meta-analysis predicted two CaM-binding domains in the DIII–IV linker (Fig. 3 C); however, these domains do not substantially overlap with residues interacting with CaM in the crystal structure (PDB accession no. 4DJC; Fig. 3 D). In addition, the noncanonical motif, phenylalanine–isoleucine–phenylalanine (Potet et al., 2009), is also missed by the meta-analysis. Given that the meta-analysis relies on canonical CaM-binding motifs, it was not surprising that these noncanonical CaM-binding sites were not identified.
Lastly, we ran the meta-analysis on peptides derived from a membrane transporter: the PMCA isoform 4 (PMCA4). PMCA4 has a splice site that is located in the middle of a CaM-binding domain, which results in two different targets (Strehler, 1991; Strehler et al., 1991). Although CaM does not bind to a canonical motif in either target peptide, our meta-analysis mapped well to the NMR-determined CaM-binding regions of both the C20 and C28 peptides (PDB accession nos. 1CFF and 2KNE; Fig. 4 A). A comparison of the canonical motifs in each peptide (Fig. 4 B) reveals that the C28 peptide contains several 1–14 motifs using the tryptophan anchor at the number 1 position that are absent in the shorter C20 peptide.
These test cases highlight the predictive power and limitations of our canonical motif-clustering meta-analysis. To determine how our simple meta-analysis compares to an algorithm that relies on the biophysical properties of the target peptide, we ran 53 PDB CaM target sequences with one CaM-binding site through the Calmodulin Target Database (Yap et al., 2000). For small target peptides (<100 residues), our meta-analysis (67%) fared better than the Calmodulin Target Database (<50%) at predicting a CaM-binding domain in these structures (Table S2). In contrast, our meta-analysis overestimates the number of CaM-binding domains for targets >100 residues compared with the algorithm used by the Calmodulin Target Database. The imperfection of both computational methods suggests that neither canonical motif clustering nor the biochemical properties of the target are sufficient to flawlessly predict the molecular complexity of CaM target recognition and binding.
Because delineating every canonical CaM-binding motif in the cytoplasmic domains of ion channels and membrane transporters has more utility than computational predictions of CaM-binding sites, we created the Calmodulation Database and Meta-Analysis website: http://cam.umassmed.edu (Mruk et al., 2014). The website allows users to find canonical CaM-binding motifs within an inputted sequence and uses the meta-analysis to predict potential CaM-binding domains within that sequence. In addition to searching a given protein sequence, the database can be searched to find PDB files that contain a specific canonical CaM-binding motif in the target peptide, which may or may not be anchored by the N- and C-lobes of CaM. Additionally, PDB files that contain target peptides derived from any protein, species, or type of membrane transport protein can be retrieved and subsequently analyzed. The worldwide availability of this simple meta-analysis and database will enable ion channel and membrane transport researchers to readily identify the canonical CaM-binding motifs within their protein of interest, assist in the design of CaM target peptides, and compare CaM–peptide structures to their protein(s) of interest.
DISCUSSION
Given the increasing interest in calmodulation of ion channels and transporters, there existed a need for a simple tool that quickly identifies the potential CaM-binding regions of these membrane proteins. Therefore, we wrote a computational script to identify every canonical CaM-binding motif within a given protein sequence. Using this script on target sequences from high resolution CaM–peptide structures, we found that CaM often binds to peptide sequences containing multiple overlapping canonical motifs. Combining our motif identification script with a simple charge discriminator yielded a useful tool that can predict CaM-binding regions with both specificity and sensitivity.
To determine the strengths and weaknesses of this simple meta-analysis, we mapped our CaM-binding domain predictions onto a panel of high resolution CaM–peptide structures. For structures in which CaM binds to canonical motifs, such as the IQ domains from CaV1.2 and NaV1.5, our meta-analysis correctly predicts the region to which CaM binds. In contrast, the binding sites in noncanonical target peptide–CaM structures (CaV1.2 NSCaTE peptide and NaV1.5 DIII–DIV linker) are missed by the meta-analysis. In fact, the multiple flanking canonical motifs in the NaV1.5 DIII–DIV linker induce our meta-analysis to predict two potential binding domains; however, neither of these regions forms protein–protein interactions with CaM in the crystal structure (Fig. 3 D). Although these noncanonical binding domains are missed by our meta-analysis, CaM-binding domains were correctly predicted for both the C20 and C28 peptides of PMCA4, which bind to CaM in a noncanonical fashion (Fig. 4). In the shorter C20 peptide structure, only one CaM lobe makes contact with the peptide. Closer examination of the sequence shows that the anchoring tryptophan has the potential to act as the first anchor for canonical motifs (1–5–10, 1–5–8–14), but the shorter C20 peptide lacks the requisite terminal anchors (Fig. 4 B). Indeed, these motifs are completed in the longer C28 peptide, which in combination with a phenylalanine in position 18, anchors both lobes of CaM, wrapping the C28 peptide in an antiparallel manner.
Several proteomic computational methods have been developed to identify CaM-binding proteins and the location of the CaM-binding sites on these proteins (Yap et al., 2000; Radivojac et al., 2006; Hamilton, M., A.S.N. Reddy, and A. Ben-Hur. 2011. Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine; Minhas and Ben-Hur, 2012; Wang et al., 2012). The majority of these prediction methods use structural information to predict whether specific residues may belong to a protein–protein interface. Despite the increasing number of CaM–peptide structures, predicting the exact CaM-binding site on any one protein remains challenging, in part, because CaM’s interactions with its targets can be dynamic and not well represented by a single structure. Therefore, prediction methods dependent on sequence information alone could be useful in identifying CaM-binding sites. However, simple sequence alignments of potential partners to known CaM-binding sites afford poor sensitivity (TP ∼40%) (Minhas and Ben-Hur, 2012). Accordingly, most protein–protein predictive algorithms that rely on sequence information alone are based on neural networks (Ofran and Rost, 2003; Minhas and Ben-Hur, 2012) or hidden Markov models (Friedrich et al., 2006). Because we optimized the parameters of the meta-analysis on high resolution CaM target peptide structures in the PDB, this limited dataset is not compatible with these computational methods. In spite of this limitation, the meta-analysis TP/TN values are respectable when compared with the Calmodulin Target Database and a published neural network algorithm for predicting CaM-binding regions (Yap et al., 2000; Minhas and Ben-Hur, 2012).
For those interested in the calmodulation of ion channels and membrane transporters, the significant advantage of the web-based meta-analysis is the ability to efficiently visualize all of the canonical motifs within a given sequence. This palatable presentation of every canonical CaM-binding motif highlights the complexity of CaM binding and may explain the discrepancies observed between KCNQ–CaM structural and biochemical studies. For example, our meta-analysis predicts a larger CaM-binding domain that contains several potential anchor residues in the distal C-terminal end of helix B that do not form contacts with CaM in the KCNQ4 crystal structure (Fig. 1, C and D). This prediction, however, is consistent with biochemical and mutational data that demonstrate that this region is important for CaM association with KCNQ channels (Ghosh et al., 2006; Gómez-Posada et al., 2011). Because crystal structures are static snapshots, it is possible that CaM adopts more than one conformation when bound to functioning KCNQ channels or shuttles between the multiple binding motifs within helix B on a single KCNQ C terminus.
Although most attention has been paid to the calmodulation of ion transport proteins, additional CaBPs also regulate ion channel function. Similar to CaM, the CaBP family (CaBPs 1–8) contains paired EF hands that undergo structural rearrangements upon Ca2+ binding. CaBP1 has been shown to compete for binding to the IQ- and to the N-terminal domains of CaV1.2 channels (Lee et al., 2002). Consistent with this finding, our meta-analysis picks up a potential CaM-binding region within the CaBP1-binding domain (Fig. 2 C, right), lending credence to the hypothesis that the CaBPs may recognize their targets similarly to the way CaM binds to its targets.
Our meta-analysis suggests that CaM binds to regions rich with canonical motifs. Using the canonical motif finder, structurally defined CaM-binding domains can be redesigned to determine whether the overlapping CaM-binding sites are necessary for the binding and calmodulation of ion transport proteins. Together with the meta-analysis, these tools make it simpler to find canonical motifs, identify potential CaM-binding regions, and design future biochemical and structural experiments, thereby accelerating our understanding of calmodulation of ion channels and membrane transport proteins.
Acknowledgments
This work was supported by a grant from the National Institutes of Health (GM-070650 to W.R. Kobertz).
The authors declare no competing financial interests.
Angus C. Nairn served as editor.
References
Author notes
K. Mruk and B.M. Farley contributed equally to this paper.
K. Mruk’s present address is Dept. of Chemical and Systems Biology, Stanford School of Medicine, Stanford, CA 94305.
B.M. Farley’s present address is Dept. of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720.