The potential of spin labeling to reveal the dynamic dimension of macromolecules has been recognized since the dawn of the methodology in the 1960s. However, it was the development of pulsed electron paramagnetic resonance spectroscopy to detect dipolar coupling between spin labels and the availability of turnkey instrumentation in the 21st century that realized the full promise of spin labeling. Double electron-electron resonance (DEER) spectroscopy has seen widespread applications to channels, transporters, and receptors. In these studies, distance distributions between pairs of spin labels obtained under different biochemical conditions report the conformational states of macromolecules, illuminating the key movements underlying biological function. These experimental studies have spurred the development of methods for the rigorous analysis of DEER spectroscopic data along with methods for integrating these distributions into structural models. In this tutorial, we describe a model-based approach to obtaining a minimum set of components of the distance distribution that correspond to functionally relevant protein conformations with a set of fractional amplitudes that define the equilibrium between these conformations. Importantly, we review and elaborate on the error analysis reflecting the uncertainty in the various parameters, a critical step in rigorous structural interpretation of the spectroscopic data.
- Award Id(s): GM113195,MH070039,NS117873,GM124310,DC007664,GM128087,GM077659
The modern tool kit of structural biology includes diverse, complementary methods spanning the spectrum from computational to experimental that can visualize global structures at atomic resolution as well as probe them on local site-specific levels and can reveal the dynamic dimension under equilibrium or time-resolved conditions. The integration of these methods has been touted as the next frontier in the quest to reveal the interplay of structure and dynamics that underpins protein function. Probe methods, including electron paramagnetic resonance (EPR) spectroscopy of site-specifically spin-labeled proteins, have contributed a unique perspective on protein dynamics (Mchaourab et al., 2011; Claxton et al., 2015; Jeschke, 2018a). In particular, pulsed EPR methods are distinguished by the ability to investigate proteins in multiple conformational states in near-native environments with high sensitivity. Because integration of structural biology data is mostly performed a posteriori, the process requires not only knowledge of the rules of interpretation (i.e., how models are derived from data) but also well-established tools for assessment of rigor and reproducibility of the underlying analysis.
The pulsed EPR technique double electron-electron resonance (DEER) spectroscopy measures long-range (>20 Å) distances between electron spins (Jeschke, 2012; Borbat and Freed, 2014; Jeschke, 2014), typically extrinsic probes introduced into biological macromolecules by site-directed spin labeling (Roser et al., 2016). DEER has been applied to multiple proteins, RNA, and DNA systems. In the protein space, unique insight was revealed on ion channel architecture and gating (Pliotas et al., 2012; Dellisanti et al., 2013; Dalmas et al., 2014; Dürr et al., 2014; Puljung et al., 2014; Raghuraman et al., 2014; Arrigoni et al., 2016; Zhu et al., 2016; Evans et al., 2020), transporter alternating access and energy landscapes (Claxton et al., 2010; Mittal et al., 2012; Georgieva et al., 2013; Kazmier et al., 2014a; Kazmier et al., 2014b; Masureel et al., 2014; Joseph et al., 2015; Dastvan et al., 2016; Martens et al., 2016; Timachi et al., 2017; Verhalen et al., 2017; Göddeke et al., 2018; Dastvan et al., 2019; Joseph et al., 2019; Nyenhuis et al., 2020), and receptor activation and allosteric modulation (Park et al., 2006; Kim et al., 2012; Kang et al., 2015; Yee et al., 2015; Van Eps et al., 2017; Van Eps et al., 2018; Wingler et al., 2019; Elgeti and Hubbell, 2021). The range of applications of DEER include evaluation of static structures in solution (e.g., Zhou et al., 2005) or native-like environments (e.g., Barrett et al., 2012), probing ligand-induced conformational changes, characterization of equilibrium fluctuations, and investigation of protein–protein interactions (Hilger et al., 2005; Hilger et al., 2007; Kim et al., 2011; Edwards et al., 2014). The most frequent strategy is to attach pairs of spin labels at introduced cysteines in a purified protein (Fig. 1 A), although promising alternative strategies have been introduced (Roser et al., 2016; Ackermann et al., 2021; Gamble Jarvi et al., 2021). Multiple pairs are designed to provide a pattern of characteristic distances or distance changes relevant to the structure or model to be tested. Distance restraints from DEER have been integrated with computation methods to model protein structures ab initio (Alexander et al., 2008; Alexander et al., 2013; Jeschke, 2016), to refine structures, and to generate models of intermediate states (Kazmier et al., 2014b; Evans et al., 2020) or protein complexes (Kim et al., 2012; Edwards et al., 2014). In combination with cryo-EM, DEER distance distributions can provide a complementary approach to pinpoint the mechanistic identity of various conformations and to validate models of conformational changes (Dürr et al., 2014; Zhu et al., 2016).
The cornerstones of DEER data interpretation are the rigorous transformation of the raw signals into complex distance distributions, the structural interpretation of these distributions as spatial restraints, and the generation of structural models consistent with the derived restraints. As an illustration, consider a hypothetical membrane transporter doubly labeled at the extracellular surface of two transmembrane (TM) helices (Fig. 1 B) with two conformations, outward closed (OC) and outward open (OO), that differ by a large translocation of one helix relative to the other. The distribution, P, of distances, R, between the labels for each conformation (Fig. 1 C) is determined by its tertiary structures, the degree of disorder in the structure, and the rotameric disorder of the labels. The different distance distributions for the two conformations produce distinct DEER time traces (Fig. 1 D).
More generally, distinct distance components with varying widths in the P(R) reflect the collection of conformations of the protein backbone that are populated under the experimental conditions and the rotameric states of the spin-label side chain at each of these conformations. Biochemical conditions can be manipulated to shift the energies of protein conformers, leading to a set of DEER signal traces that differ by the populations of these components. Here, we focus on tools for the rigorous analysis of multiple DEER time traces obtained under varying conditions from one or more doubly spin-labeled mutants of a given protein. We describe a model-based approach to obtaining a minimum set of distance components related to functionally relevant conformations with a set of fractional amplitudes that define the equilibrium between these conformations together with error analysis reflecting the uncertainty in the various parameters.
From DEER signal to distance distribution
Other approaches to the analysis of DEER data include the commonly used Tikhonov regularization (Chiang et al., 2005; Jeschke et al., 2006; Ibáñez and Jeschke, 2020; Ibáñez et al., 2020), alternative regularization approaches (Ibáñez and Jeschke, 2019), denoising methods followed by singular value decomposition (Srivastava and Freed, 2017; Srivastava et al., 2017; Srivastava and Freed, 2019), and neural networks (Worswick et al., 2018; Amey et al., 2021).
Model-based analysis of DEER data
The model-based approach to the analysis of DEER data is illustrated in Fig. 2 using data from T4 lysozyme (T4L) labeled at residues 65 and 80 with the methanethiosulfonate spin label (MTSSL). After a trivial scaling and shift of the time axis, the DEER data are fit using a model-based approach that includes the best-fit background factor. The optimal fit, according to BIC values, models P(R) as a sum of two Gaussians. Table 1 gives the statistics obtained from fitting the data in Fig. 2 to n = 1, 2, and 3 Gaussian components. The BIC values for the n = 1 and n = 3 models both differ from that of the n = 2 model by <10, suggesting that none of these models can be strongly rejected based on this dataset. In fact, the n = 3 model is preferred according to the AICc values. The small amplitude of the third component and the large uncertainty of this amplitude (0.02 ± 0.06), together with the very narrow width of this Gaussian (0.04 ± 2.81 Å), all support the rejection of this more complex model (see Table 2).
The second component in the optimal P(R) does not give rise to a distinct second mode and is fully consistent with a single backbone conformation of T4L with some inherent disorder and a distribution of rotameric states of the two labels. In other cases discussed below, individual well-separated Gaussian components clearly correspond to distinct backbone conformations. A general rule of thumb is that the number of conformations is less than or equal to the number of Gaussian components.
As has been previously discussed (Hustedt et al., 2018), model-based analysis allows for rigorous error analysis of the best-fit parameter values and the calculation of a confidence band for the best-fit distance distribution from the full covariance matrix using propagation of errors. The 2σ (95%) confidence band for the best-fit n = 2 Gaussian P(R) is shown in Fig. 2 C as a shaded region and reproduced in Fig. 2 D along with the best-fit P(R) for 15 replicate datasets collected for the same sample. These results demonstrate that the confidence band obtained from fitting a given dataset properly estimates the reproducibility of the distance distribution.
The P(R) obtained using Tikhonov regularization is also shown in Fig. 2 C. The model-free P(R) is quite close to the P(R) modeled as the sum of two Gaussians. Both the model-based approach as developed primarily at Vanderbilt University (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018) and the model-free approach using Tikhonov regularization, as developed primarily by Jeschke et al. (2006), have been extensively used to analyze DEER data and are well documented. Although historically the use of Tikhonov regularization to give model-free fits to DEER data has required an initial background correction step, recently an approach for applying Tikhonov regularization to uncorrected data has been developed (Ibáñez and Jeschke, 2020; Ibáñez et al., 2020).
Global analysis paradigm
To demonstrate putting these concepts into practice, consider the three simulated DEER traces in Fig. 4 that were calculated for bimodal P(R) generated from the same two unimodal distributions selected from the test set generated by Edwards and Stoll (2018) with different relative amplitudes (Table 3). The underlying biological model is that the protein is undergoing a ligand-dependent shift in the equilibrium between two conformational states. Given the P(R) obtained from fitting these data individually, either to bimodal Gaussian distributions (Fig. 4 B) or using a Tikhonov regularization (Fig. 4 C), it would be hard to argue convincingly whether the protein is or is not behaving according to this model of its functional dynamics (i.e., that the protein adopts the same two conformational states at all ligand concentrations albeit in different proportions).
However, when the fit requires the means and SDs of the two Gaussian components to be equal at different ligand concentrations, the best-fit distance distributions (Fig. 4 D) more accurately and precisely correspond to the true distributions. For this analysis, the background factors, η, the modulation depths, Δ, and the relative amplitudes of the two components vary for the three datasets (see Figs. S1, S2, and S3). The benefits of the linked global analysis are clearly demonstrated by considering the low-amplitude components in datasets A and C, which are poorly recovered in the individual fits, regardless of the approach used (Fig. 4, B and C). In the global fit, however, the same two Gaussian components are used to fit all three datasets. Both of the recovered Gaussians, shown as cyan lines in Fig. 4 E, agree well with the true components. Due to the ill-conditioned nature of the problem, the considerable differences between the best-fit distributions for the individual versus global fits (cf. Fig. 4, B and D) result in negligible changes in the best-fit DEER traces (cf. solid black and dashed colored lines in Fig. 4 A). Global analysis not only results in a more accurate recovery of the true distance distributions but also results in increased precision. This is reflected in the narrow confidence bands of the linked versus individual fits and in smaller parameter uncertainties (Table 3).
Table 4 shows the statistics obtained from fitting the three simulated datasets in Fig. 4 using six different models. Models 2, 4, and 6 use n = 1, 2, and 3 Gaussian components, each corresponding to a separate conformation with no parameters linked. Thus, these models are equivalent to fitting the data individually. For models 1, 3, and 5, all of the and values are linked, and the same set of Gaussian components is used to fit both datasets. Although the most complex model, 6, with gives the lowest SSR and values, model 3 with gives the lowest BIC values and is strongly preferred.
A core principle of global analysis is that the models tested should move beyond mathematical statements about the functions and parameter sets used to fit data and incorporate the fundamental biological questions being addressed in the experiments (Beechem et al., 1991; Beechem, 1992). Moving beyond loose interpretations of the peaks in the P(R) obtained from individual fits, the results in Fig. 4 and Table 4 allow a DEER practitioner to confidently assert that the data are consistent with, and in fact optimally modeled, as a protein undergoing a ligand-dependent shift in the equilibrium between two conformational states that are themselves ligand independent.
With the recent development of the DeerLab software package (Ibáñez et al., 2020), it is now possible to perform a global analysis of DEER data using Tikhonov regularization. The results of such an analysis of these three simulated datasets are shown in Fig. S4. Note that unlike model-based fitting (Fig. 4 E), the regularization-based global analysis (Fig. S4 C) does not correctly disentangle the two component distributions corresponding to the two conformations in equilibrium. In this tutorial, we focus on the use of our GLADDvu software for the model-based global analysis of DEER data.
A description of how parameters are linked in GLADDvu is provided in the Appendix. Screenshots showing the use of GLADDvu to fit these simulated datasets to model 3 are shown in Figs. S1, S2, S3, S5, and S6. A demonstration of the use of GLADDvu to perform all of the fits listed in Table 4 is available in Video 3.
Global analysis of DEER data
Mchaourab and coworkers have globally analyzed DEER data to investigate the ligand-dependent functional dynamics of a number of membrane transporters, including ion-dependent secondary transporters (Masureel et al., 2014; Dastvan et al., 2016; Martens et al., 2016; Claxton et al., 2018; Paz et al., 2018; Jagessar et al., 2020) and ATP-dependent multidrug transporters (Mishra et al., 2014; Verhalen et al., 2017; Dastvan et al., 2019). Using previously published data, we present below details of how global analysis has advanced the understanding of ligand-dependent conformational equilibrium in a way that would not be possible from the individual fits.
P-glycoprotein 1 (Pgp)
The use of DEER to investigate the conformational dynamics of the ABC transporter Pgp provides an instructive example of the benefits of global analysis. Although the initial DEER study was performed on the basis of a single crystal structure (Verhalen et al., 2017), the recent determination of multiple cryo-EM structures (Alam et al., 2018; Kim and Chen, 2018; Alam et al., 2019) illustrates the unique and complementary insight contributed by DEER analysis (Dastvan et al., 2019). Pgp is a mammalian ATP-binding cassette transporter capable of pumping a wide array of xenobiotic substrates out of cells (Aller et al., 2009; Jin et al., 2012; Li et al., 2014; Kim and Chen, 2018). Moreover, a number of inhibitors have been developed in search of a strategy to increase the efficacy of cancer chemotherapy (Chufan et al., 2016). Here, we focus on data collected in the presence of different substrates and inhibitors of Pgp.
An extensive set of spin label pairs was introduced to study the functional dynamics of Pgp under different biochemical conditions that favor specific conformational states (Verhalen et al., 2017; Dastvan et al., 2019). Here, we focus on a pair of sites, 511 and 1043, within NBDs 1 and 2, respectively, which come together to form two nucleotide binding sites (NBS; Fig. 5). In the presence of ATP and inorganic vanadate (Vi), Pgp can form a state that mimics the post-hydrolysis transition state where ADP and phosphate are trapped in the binding sites. Under these conditions, DEER data from Pgp show an equilibrium between distinctly different conformational states (Verhalen et al., 2017) dependent on the substrate present. Fig. 6 shows 10 DEER traces collected for Pgp labeled at 511/1043 in the presence of ATP/Vi and either no substrate, 7 different substrates, or 2 inhibitors of transport (Dastvan et al., 2019).
The analysis of these 10 datasets individually using either Tikhonov regularization or model-based fitting gives broad multicomponent distance distributions that depend strongly on the substrate (Fig. S7). Using either approach, however, it is difficult to draw firm conclusions about the conformations evident in these distributions. To test the utility of global analysis for these data, three different models were used in combined fits (Table 5). Models 1 and 2 employ three and four conformational states, respectively, each modeled as a single Gaussian component. The and values of these Gaussians were linked across all 10 datasets while the amplitudes of the components, the modulation depths, background factors, and scale factors varied. For the conformation with the shortest mean interspin distance, the two NBDs are tightly bound together, trapping ADP/Vi within NBS 2. The intermediate distance component (and its corresponding conformation) is highly populated only in the presence of the two inhibitors, tariquidar and zosuquidar, or vinblastine, which can act as an inhibitor at high concentrations. The two components with the longest mean distances correspond to those seen in the absence of any nucleotide (apo state). Model 3 specifically tests whether the apo-like contributions to P(R), evident at the longer distances, can be fit as a single conformation modeled by two Gaussian components with a linked ratio across all conditions. Based on the values in Table 5, model 2 is the optimal model. The increase in BIC for model 3 versus model 2 indicates that there are distinct apo-like conformations for different substrates. The separation of these underlying conformational states is much clearer using model-based global analysis, and the quality of the fits in Fig. 6 A suggests that, indeed, there is a common set of protein conformations across all 10 experimental conditions.
Global analysis also allows enhanced quantitative analysis of the relative populations of the various conformations. Those substrates that induced the largest population of the tightly bound component were most efficient in stimulating ATP turnover, resulting in a linear relationship between this component’s population and the natural logarithm of stimulated ATP turnover rate constant, kcat (Fig. 6 C; Dastvan et al., 2019). These results demonstrate that, using global analysis, the equilibrium populations of protein conformations determined from the global analysis of DEER data can be quantitatively related to the function dynamics of the protein and that important mechanistic insights can be obtained.
A demonstration of the use of GLADDvu to perform the fits of these Pgp datasets to models 1–3 is available in Video 4.
Using the model-based approach, all of the DEER traces with the exception of the pH 6.0 data are optimally fit with n = 2 Gaussian components. At the lower pH values, the two components with means separated by ∼10 Å clearly correspond to OC and OO conformations. At the higher pH values, the two components have similar values, one broad and one narrow. These results suggest that three Gaussian components will be needed for the optimal global fit to the eight traces and raise the question of what the broad component evident at higher pH values represents.
The full set of models tested against the eight LmrP DEER traces are listed in Table 6. Models 1 and 2 fit the datasets individually with two or three Gaussian components and no parameters linked. Models 3 and 4 fit the data globally with two or three conformations, with each modeled as a single Gaussian component. For these models, the values of and are linked across all datasets, and the amplitudes, modulation depths, backgrounds, and scale factors are not linked. Models 5, 6, and 7 all use three Gaussians with the values of and linked and different ways of linking some of the amplitude variables. For model 5, the OC conformation is modeled by a single Gaussian, and the OO conformation is modeled by two Gaussians, with a linked ratio across all eight traces. For model 6, the OO conformation is modeled by a single Gaussian, and the OC conformation is modeled by two Gaussians, with a linked ratio across all eight traces. For model 7, the OO and OC conformations are both modeled by single Gaussians, and a third component contributes a linked fraction across all eight datasets.
Of the seven models tested, model 7 is optimal according to the BIC values (Table 6). Model 7 fits the data to n = 3 Gaussian components (Fig. 8). For two of these, the fractional contribution is pH dependent, one corresponding to OC state and one corresponding to the OO state The fractional contribution of the third Gaussian is independent of pH. This third component could account for one or more of a number of factors, including non-Gaussian character of the true deviation of the background factors from true exponentials, or the presence of nonfunctional protein in the sample. Assuming the latter, the fraction of OO functional protein have been plotted and fit to determine a pK for the transition between the two conformations. Significantly, when the appropriate values of are fit to a titration curve, models 3, 5, 6, and 7 all give similar pK values (Table 6). Although including a third, pH-independent population did not affect the pK determined from the titration, the ability to identify such nonfunctional proteins can in future studies spur changes and optimization of purification protocols.
GLADDvu was designed to be a fully flexible tool suitable for a wide variety of experimental situations beyond the examples presented in this tutorial. These examples demonstrated how global analysis of DEER data from one spin-labeled mutant obtained under different ligand conditions gives enhanced qualitative insights into the functionally relevant dynamics of integral membrane proteins complete with error analysis of the fit parameters. One could consider analyzing data from multiple mutants under a common set of ligand conditions with different and a common set of Such analyses would test whether domains are moving as rigid bodies or whether the spin labels have any influence on equilibrium coefficients.
The model-based approach available with GLADDvu allows the ligand-dependent fractions of various components to be determined by the fit while providing the uncertainties of the parameters as well as the confidence bands for the distance distributions. Recently, a model-free approach to the global analysis of DEER data has been developed (Ibáñez and Jeschke, 2020; Ibáñez et al., 2020). As is the case for the analysis of individual time traces, different approaches to the global analysis of multiple datasets will provide complementary insights into the fundamental biological issues under investigation.
Online supplemental material
Fig. S1 shows a screenshot of the answer panel for dataset A of GLADDvu during the global analysis of simulated data using model 3. Fig. S2 shows a screenshot of the answer panel for dataset B of GLADDvu during the global analysis of simulated data using model 3. Fig. S3 shows a screenshot of the answer panel for dataset C of GLADDvu during the global analysis of simulated data using model 3. Fig. S4 shows a global analysis of the same simulated DEER data as in Fig. 4 using Tikhonov regularization. Fig. S5 shows a screenshot of the Fit panel of GLADDvu during the global analysis of simulated data using model 3. Fig. S6 shows a screenshot of the P(R) panel of GLADDvu during the global analysis of simulated data using model 3. Fig. S7 shows distance distributions from individual fits to the DEER data for Pgp in Fig. 6. Fig. S8 shows results from individual fits for LmrP data in Fig. 8. Videos 1, 2, 3, and 4 demonstrate the fitting of the data in Fig. 2 using DD and the fitting of the data in Figs. 2, 4, and 6 using GLADDvu, respectively.
Our MATLAB software, DD, for the model-based fitting of single DEER datasets has been described in detail elsewhere (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018). A separate MATLAB program, GLADDvu, has been developed for globally fitting multiple DEER datasets. GLADDvu is designed for maximal flexibility, allowing the user to create both simple and complex models with parameters that are fixed, floated, or linked as desired. A fixed parameter is kept at its initial value and is not varied during the fit. A floated parameter is varied to find the best-fit values that minimize the global reduced χ2 value. A linked parameter for one experimental dataset is required to have the same value as another floated parameter for the same or a different dataset.
After reading in multiple files, each dataset is phased. Then, for each separate experiment, the 0 of the time axis is adjusted such that the signal maximum occurs at and the values of the DEER signal are scaled such that The noise variance for a given dataset, is estimated from the variance of its imaginary component. Various background models can be selected, including exponential, stretched exponential, and excluded volume. The distance distribution can be modeled as the sum of n components (Eq. 6), including Gaussians (Eq. 7). Minimization of is attempted using MATLAB’s interior point, global search, or particle swarm algorithms.
The model used to analyze data are specified by the Background, Components, Shape, and Linked dropdown menus and can be further modified using the individual Answer File spreadsheets for each dataset (Figs. S1, S2, and S3). As seen in the third column of these spreadsheets, for each of the three datasets, there are eight parameters that are floated (i.e., they are not fixed). These are (depth), (eta), (amp_2), and a scale factor (∼1). The and parameters for experiments 2 and 3 are linked to the corresponding parameters for experiment 1, as specified in the fourth and fifth columns of the Answer File spreadsheet, ensuring that the same two Gaussian components are used to define the best-fit P(R) for the two datasets.
Joseph A. Mindell served as editor.
This work was supported by National Institutes of Health grants GM128087 and GM077659 (H.S. Mchaourab). R.A. Stein was also supported by National Institutes of Health grants GM113195 (M. Maduke), MH070039 (J.E. Gouaux), NS117873 (J. Cordero Morales), GM124310 (A. Rajca), and DC007664 (D. Minor).
The authors declare no competing financial interests.
Author contributions: All authors contributed to the conceptual development of this work. E.J. Hustedt and H.S. Mchaourab wrote the manuscript with the assistance of R.A. Stein. E.J. Hustedt wrote DD and GLADDvu with signficant input from R.A. Stein. R.A. Stein provided all of the experimental data in collaboration with authors of the referenced papers.