The potential of spin labeling to reveal the dynamic dimension of macromolecules has been recognized since the dawn of the methodology in the 1960s. However, it was the development of pulsed electron paramagnetic resonance spectroscopy to detect dipolar coupling between spin labels and the availability of turnkey instrumentation in the 21st century that realized the full promise of spin labeling. Double electron-electron resonance (DEER) spectroscopy has seen widespread applications to channels, transporters, and receptors. In these studies, distance distributions between pairs of spin labels obtained under different biochemical conditions report the conformational states of macromolecules, illuminating the key movements underlying biological function. These experimental studies have spurred the development of methods for the rigorous analysis of DEER spectroscopic data along with methods for integrating these distributions into structural models. In this tutorial, we describe a model-based approach to obtaining a minimum set of components of the distance distribution that correspond to functionally relevant protein conformations with a set of fractional amplitudes that define the equilibrium between these conformations. Importantly, we review and elaborate on the error analysis reflecting the uncertainty in the various parameters, a critical step in rigorous structural interpretation of the spectroscopic data.
Introduction
The modern tool kit of structural biology includes diverse, complementary methods spanning the spectrum from computational to experimental that can visualize global structures at atomic resolution as well as probe them on local site-specific levels and can reveal the dynamic dimension under equilibrium or time-resolved conditions. The integration of these methods has been touted as the next frontier in the quest to reveal the interplay of structure and dynamics that underpins protein function. Probe methods, including electron paramagnetic resonance (EPR) spectroscopy of site-specifically spin-labeled proteins, have contributed a unique perspective on protein dynamics (Mchaourab et al., 2011; Claxton et al., 2015; Jeschke, 2018a). In particular, pulsed EPR methods are distinguished by the ability to investigate proteins in multiple conformational states in near-native environments with high sensitivity. Because integration of structural biology data is mostly performed a posteriori, the process requires not only knowledge of the rules of interpretation (i.e., how models are derived from data) but also well-established tools for assessment of rigor and reproducibility of the underlying analysis.
The pulsed EPR technique double electron-electron resonance (DEER) spectroscopy measures long-range (>20 Å) distances between electron spins (Jeschke, 2012; Borbat and Freed, 2014; Jeschke, 2014), typically extrinsic probes introduced into biological macromolecules by site-directed spin labeling (Roser et al., 2016). DEER has been applied to multiple proteins, RNA, and DNA systems. In the protein space, unique insight was revealed on ion channel architecture and gating (Pliotas et al., 2012; Dellisanti et al., 2013; Dalmas et al., 2014; Dürr et al., 2014; Puljung et al., 2014; Raghuraman et al., 2014; Arrigoni et al., 2016; Zhu et al., 2016; Evans et al., 2020), transporter alternating access and energy landscapes (Claxton et al., 2010; Mittal et al., 2012; Georgieva et al., 2013; Kazmier et al., 2014a; Kazmier et al., 2014b; Masureel et al., 2014; Joseph et al., 2015; Dastvan et al., 2016; Martens et al., 2016; Timachi et al., 2017; Verhalen et al., 2017; Göddeke et al., 2018; Dastvan et al., 2019; Joseph et al., 2019; Nyenhuis et al., 2020), and receptor activation and allosteric modulation (Park et al., 2006; Kim et al., 2012; Kang et al., 2015; Yee et al., 2015; Van Eps et al., 2017; Van Eps et al., 2018; Wingler et al., 2019; Elgeti and Hubbell, 2021). The range of applications of DEER include evaluation of static structures in solution (e.g., Zhou et al., 2005) or native-like environments (e.g., Barrett et al., 2012), probing ligand-induced conformational changes, characterization of equilibrium fluctuations, and investigation of protein–protein interactions (Hilger et al., 2005; Hilger et al., 2007; Kim et al., 2011; Edwards et al., 2014). The most frequent strategy is to attach pairs of spin labels at introduced cysteines in a purified protein (Fig. 1 A), although promising alternative strategies have been introduced (Roser et al., 2016; Ackermann et al., 2021; Gamble Jarvi et al., 2021). Multiple pairs are designed to provide a pattern of characteristic distances or distance changes relevant to the structure or model to be tested. Distance restraints from DEER have been integrated with computation methods to model protein structures ab initio (Alexander et al., 2008; Alexander et al., 2013; Jeschke, 2016), to refine structures, and to generate models of intermediate states (Kazmier et al., 2014b; Evans et al., 2020) or protein complexes (Kim et al., 2012; Edwards et al., 2014). In combination with cryo-EM, DEER distance distributions can provide a complementary approach to pinpoint the mechanistic identity of various conformations and to validate models of conformational changes (Dürr et al., 2014; Zhu et al., 2016).
Site-directed spin labeling and DEER. (A) Model showing the MTSSL attached to an α-helix at a cysteine residue. (B) Models showing spin labels at extracellular sites of two helices of an integral membrane protein in two different conformational states. (C) The expected distance distributions for the OO (red) and OC (black) states. (D) The corresponding intramolecular DEER signals.
Site-directed spin labeling and DEER. (A) Model showing the MTSSL attached to an α-helix at a cysteine residue. (B) Models showing spin labels at extracellular sites of two helices of an integral membrane protein in two different conformational states. (C) The expected distance distributions for the OO (red) and OC (black) states. (D) The corresponding intramolecular DEER signals.
The cornerstones of DEER data interpretation are the rigorous transformation of the raw signals into complex distance distributions, the structural interpretation of these distributions as spatial restraints, and the generation of structural models consistent with the derived restraints. As an illustration, consider a hypothetical membrane transporter doubly labeled at the extracellular surface of two transmembrane (TM) helices (Fig. 1 B) with two conformations, outward closed (OC) and outward open (OO), that differ by a large translocation of one helix relative to the other. The distribution, P, of distances, R, between the labels for each conformation (Fig. 1 C) is determined by its tertiary structures, the degree of disorder in the structure, and the rotameric disorder of the labels. The different distance distributions for the two conformations produce distinct DEER time traces (Fig. 1 D).
More generally, distinct distance components with varying widths in the P(R) reflect the collection of conformations of the protein backbone that are populated under the experimental conditions and the rotameric states of the spin-label side chain at each of these conformations. Biochemical conditions can be manipulated to shift the energies of protein conformers, leading to a set of DEER signal traces that differ by the populations of these components. Here, we focus on tools for the rigorous analysis of multiple DEER time traces obtained under varying conditions from one or more doubly spin-labeled mutants of a given protein. We describe a model-based approach to obtaining a minimum set of distance components related to functionally relevant conformations with a set of fractional amplitudes that define the equilibrium between these conformations together with error analysis reflecting the uncertainty in the various parameters.
From DEER signal to distance distribution
Other approaches to the analysis of DEER data include the commonly used Tikhonov regularization (Chiang et al., 2005; Jeschke et al., 2006; Ibáñez and Jeschke, 2020; Ibáñez et al., 2020), alternative regularization approaches (Ibáñez and Jeschke, 2019), denoising methods followed by singular value decomposition (Srivastava and Freed, 2017; Srivastava et al., 2017; Srivastava and Freed, 2019), and neural networks (Worswick et al., 2018; Amey et al., 2021).
Model-based analysis of DEER data
The model-based approach to the analysis of DEER data is illustrated in Fig. 2 using data from T4 lysozyme (T4L) labeled at residues 65 and 80 with the methanethiosulfonate spin label (MTSSL). After a trivial scaling and shift of the time axis, the DEER data are fit using a model-based approach that includes the best-fit background factor. The optimal fit, according to BIC values, models P(R) as a sum of two Gaussians. Table 1 gives the statistics obtained from fitting the data in Fig. 2 to n = 1, 2, and 3 Gaussian components. The BIC values for the n = 1 and n = 3 models both differ from that of the n = 2 model by <10, suggesting that none of these models can be strongly rejected based on this dataset. In fact, the n = 3 model is preferred according to the AICc values. The small amplitude of the third component and the large uncertainty of this amplitude (0.02 ± 0.06), together with the very narrow width of this Gaussian (0.04 ± 2.81 Å), all support the rejection of this more complex model (see Table 2).
Q-band DEER data and fit for T4L spin-labeled with MTSSL at residues 65 and 80. (A) The crystal structure of T4L (PDB accession no. 2LZM; Matsumura et al., 1989) showing the location of the two spin labels. pdb, Protein Data Bank. (B) The scaled DEER data are shown as filled black circles together with the best model-based fit (red line) and background factor (green line). (C) The best-fit P(R) modeled as the sum of two Gaussians is shown as a red line with the associated 2σ (95%) confidence band as a shaded region. The model-free P(R) obtained using Tikhonov regularization in DeerAnalysis 2019 is also shown (dotted blue line). The optimal regularization factor was determined by generalized cross-validation and the starting time point for background fitting using the validation tool. (D) The solid red line and shaded region duplicate those shown in C. The best-fit P(R) modeled as the sum of 2 Gaussians obtained from 15 replicate datasets are shown as thin black lines.
Q-band DEER data and fit for T4L spin-labeled with MTSSL at residues 65 and 80. (A) The crystal structure of T4L (PDB accession no. 2LZM; Matsumura et al., 1989) showing the location of the two spin labels. pdb, Protein Data Bank. (B) The scaled DEER data are shown as filled black circles together with the best model-based fit (red line) and background factor (green line). (C) The best-fit P(R) modeled as the sum of two Gaussians is shown as a red line with the associated 2σ (95%) confidence band as a shaded region. The model-free P(R) obtained using Tikhonov regularization in DeerAnalysis 2019 is also shown (dotted blue line). The optimal regularization factor was determined by generalized cross-validation and the starting time point for background fitting using the validation tool. (D) The solid red line and shaded region duplicate those shown in C. The best-fit P(R) modeled as the sum of 2 Gaussians obtained from 15 replicate datasets are shown as thin black lines.
Model statistics from fitting of T4L 65/80 data in Fig. 2
| n . | q . | SSR . | . | AICc . | BIC . | ΔAICc . | ΔBIC . |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 0.001984 | 1.0803 | −3,051.3 | −3,030.3 | 17.5 | 5.4 |
| 2 | 8 | 0.001823 | 1.0042 | −3,067.0 | −3,035.7 | 1.8 | 0 |
| 3 | 11 | 0.001765 | 0.9842 | −3,068.8 | −3,027.3 | 0 | 8.3 |
| n . | q . | SSR . | . | AICc . | BIC . | ΔAICc . | ΔBIC . |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 0.001984 | 1.0803 | −3,051.3 | −3,030.3 | 17.5 | 5.4 |
| 2 | 8 | 0.001823 | 1.0042 | −3,067.0 | −3,035.7 | 1.8 | 0 |
| 3 | 11 | 0.001765 | 0.9842 | −3,068.8 | −3,027.3 | 0 | 8.3 |
Fit parameters for T4L 65/80 data in Fig. 2
| n . | 1 . | 2 . | 3 . |
|---|---|---|---|
| Δ | 0.298 ± 0.002 | 0.297 ± 0.002 | 0.30 ± 0.02 |
| η | 4.61 ± 0.01 | 4.61 ± 0.01 | 4.61 ± 0.01 |
| Scale | 1.003 ± 0.002 | 1.001 ± 0.002 | 1.00 ± 0.03 |
| r0 | 25.73 ± 0.06 | 22.8 ± 0.5 | 29 ± 4 |
| σR | 3.55 ± 0.08 | 1.3 ± 0.8 | 0.04 ± 2.81 |
| Fraction | 0.10 ± 0.09 | 0.02 ± 0.06 | |
| r0 | 26.2 ± 0.4 | 26.1 ± 0.6 | |
| σR | 3.4 ± 0.2 | 3.5 ± 0.2 | |
| Fraction | 0.90 ± 0.09 | 0.86 ± 0.16 | |
| r0 | 23.1 ± 0.7 | ||
| σR | 1.6 ± 1.0 | ||
| Fraction | 0.12 ± 0.16 |
| n . | 1 . | 2 . | 3 . |
|---|---|---|---|
| Δ | 0.298 ± 0.002 | 0.297 ± 0.002 | 0.30 ± 0.02 |
| η | 4.61 ± 0.01 | 4.61 ± 0.01 | 4.61 ± 0.01 |
| Scale | 1.003 ± 0.002 | 1.001 ± 0.002 | 1.00 ± 0.03 |
| r0 | 25.73 ± 0.06 | 22.8 ± 0.5 | 29 ± 4 |
| σR | 3.55 ± 0.08 | 1.3 ± 0.8 | 0.04 ± 2.81 |
| Fraction | 0.10 ± 0.09 | 0.02 ± 0.06 | |
| r0 | 26.2 ± 0.4 | 26.1 ± 0.6 | |
| σR | 3.4 ± 0.2 | 3.5 ± 0.2 | |
| Fraction | 0.90 ± 0.09 | 0.86 ± 0.16 | |
| r0 | 23.1 ± 0.7 | ||
| σR | 1.6 ± 1.0 | ||
| Fraction | 0.12 ± 0.16 |
The second component in the optimal P(R) does not give rise to a distinct second mode and is fully consistent with a single backbone conformation of T4L with some inherent disorder and a distribution of rotameric states of the two labels. In other cases discussed below, individual well-separated Gaussian components clearly correspond to distinct backbone conformations. A general rule of thumb is that the number of conformations is less than or equal to the number of Gaussian components.
As has been previously discussed (Hustedt et al., 2018), model-based analysis allows for rigorous error analysis of the best-fit parameter values and the calculation of a confidence band for the best-fit distance distribution from the full covariance matrix using propagation of errors. The 2σ (95%) confidence band for the best-fit n = 2 Gaussian P(R) is shown in Fig. 2 C as a shaded region and reproduced in Fig. 2 D along with the best-fit P(R) for 15 replicate datasets collected for the same sample. These results demonstrate that the confidence band obtained from fitting a given dataset properly estimates the reproducibility of the distance distribution.
The P(R) obtained using Tikhonov regularization is also shown in Fig. 2 C. The model-free P(R) is quite close to the P(R) modeled as the sum of two Gaussians. Both the model-based approach as developed primarily at Vanderbilt University (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018) and the model-free approach using Tikhonov regularization, as developed primarily by Jeschke et al. (2006), have been extensively used to analyze DEER data and are well documented. Although historically the use of Tikhonov regularization to give model-free fits to DEER data has required an initial background correction step, recently an approach for applying Tikhonov regularization to uncorrected data has been developed (Ibáñez and Jeschke, 2020; Ibáñez et al., 2020).
Videos demonstrating the use of either DD (Video 1) or GLADDvu (Video 2) to perform the model-based fits of this T4L dataset are available online. DD and GLADDvu are available for download at https://github.com/erichustedt/.
Demonstration of the use of DD (https://github.com/erichustedt/DD) to perform the model-based fits of a T4L dataset. The fits listed in Table 1 and Table 2 using n = 1, 2, and 3 Gaussians are perforrmed using DD. The optimal results, as determined by BIC values, were obtained for n = 2 and are shown in Fig. 2. The dataset (T4L065-080M_08) is available for dowload.
Demonstration of the use of DD (https://github.com/erichustedt/DD) to perform the model-based fits of a T4L dataset. The fits listed in Table 1 and Table 2 using n = 1, 2, and 3 Gaussians are perforrmed using DD. The optimal results, as determined by BIC values, were obtained for n = 2 and are shown in Fig. 2. The dataset (T4L065-080M_08) is available for dowload.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based fits of a T4L dataset. The fits listed in Table 1 and Table 2 using n = 1, 2, and 3 Gaussians are perforrmed using GLADDvu. The optimal results, as determined by BIC values, were obtained for n = 2 and are shown in Fig. 2. The dataset (T4L065-080M_08) is available for dowload.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based fits of a T4L dataset. The fits listed in Table 1 and Table 2 using n = 1, 2, and 3 Gaussians are perforrmed using GLADDvu. The optimal results, as determined by BIC values, were obtained for n = 2 and are shown in Fig. 2. The dataset (T4L065-080M_08) is available for dowload.
Global analysis paradigm
DEER data assuming that two conformations are in equilibrium.(A) Simulated DEER data assuming that ligand binding (green line) favors the OC state. (B) The corresponding P(R).
DEER data assuming that two conformations are in equilibrium.(A) Simulated DEER data assuming that ligand binding (green line) favors the OC state. (B) The corresponding P(R).
To demonstrate putting these concepts into practice, consider the three simulated DEER traces in Fig. 4 that were calculated for bimodal P(R) generated from the same two unimodal distributions selected from the test set generated by Edwards and Stoll (2018) with different relative amplitudes (Table 3). The underlying biological model is that the protein is undergoing a ligand-dependent shift in the equilibrium between two conformational states. Given the P(R) obtained from fitting these data individually, either to bimodal Gaussian distributions (Fig. 4 B) or using a Tikhonov regularization (Fig. 4 C), it would be hard to argue convincingly whether the protein is or is not behaving according to this model of its functional dynamics (i.e., that the protein adopts the same two conformational states at all ligand concentrations albeit in different proportions).
Global analysis of simulated DEER data. (A) The data (filled circles) were simulated for P(R) modeled as the sum of two components using the amplitudes in Table 3. The true P'(R) of the components are distributions 5347 and 483 from the set of distributions calculated by Edwards and Stoll (2018) using MMM (Jeschke, 2018b) for T4L labeled at 118/154 and 8/126 with means/SDs of 25.1/1.9 Å and 32.6/3.0 Å, respectively. The best-fit lines for model 3 (R0 and σR values linked) and model 4 (no parameters linked) are shown as solid black and dashed colored lines, respectively. (B)P(R) for model 4 for datasets A (red), B (green), and C (blue) with the associated 2σ (95%) confidence bands as shaded regions. The black lines are the true P(R) used for the simulations. (C) Best-fit P(R) from Tikhonov regularization using DeerLab version 0.13.2 using the same color scheme. The optimal regularization factor was determined by generalized cross-validation. (D) Best-fit P(R) for model 3 using the same color scheme with the associated 2σ (95%) confidence bands as the shaded regions. The black lines are the true P(R) used for the simulations. (E) The P'(R) obtained for model 3 is shown in cyan, and the true P'(R) used for the simulations is superimposed as shaded regions. Both the x and y axis scales are the same for B–E.
Global analysis of simulated DEER data. (A) The data (filled circles) were simulated for P(R) modeled as the sum of two components using the amplitudes in Table 3. The true P'(R) of the components are distributions 5347 and 483 from the set of distributions calculated by Edwards and Stoll (2018) using MMM (Jeschke, 2018b) for T4L labeled at 118/154 and 8/126 with means/SDs of 25.1/1.9 Å and 32.6/3.0 Å, respectively. The best-fit lines for model 3 (R0 and σR values linked) and model 4 (no parameters linked) are shown as solid black and dashed colored lines, respectively. (B)P(R) for model 4 for datasets A (red), B (green), and C (blue) with the associated 2σ (95%) confidence bands as shaded regions. The black lines are the true P(R) used for the simulations. (C) Best-fit P(R) from Tikhonov regularization using DeerLab version 0.13.2 using the same color scheme. The optimal regularization factor was determined by generalized cross-validation. (D) Best-fit P(R) for model 3 using the same color scheme with the associated 2σ (95%) confidence bands as the shaded regions. The black lines are the true P(R) used for the simulations. (E) The P'(R) obtained for model 3 is shown in cyan, and the true P'(R) used for the simulations is superimposed as shaded regions. Both the x and y axis scales are the same for B–E.
Fit parameters for simulated data in Fig. 4
| Data set . | A . | B . | C . | ||||
|---|---|---|---|---|---|---|---|
| Simulation . | Fraction . | 0.75 . | 0.25 . | 0.5 . | 0.5 . | 0.25 . | 0.75 . |
| Fit to model 3: linked analysis | r0 | 25.4 ± 0.3 | 33.0 ± 0.4 | 25.4 ± 0.3 | 33.0 ± 0.4 | 25.4 ± 0.3 | 33.0 ± 0.4 |
| σr | 2.0 ± 0.3 | 2.6 ± 0.4 | 2.0 ± 0.3 | 2.6 ± 0.4 | 2.0 ± 0.3 | 2.6 ± 0.4 | |
| Fraction | 0.20 ± 0.04 | 0.46 ± 0.05 | 0.70 ± 0.06 | ||||
| Fit to model 4: individual fits | r0 | 25.2 ± 0.4 | 32.8 ± 3.1 | 26.2 ± 0.9 | 33.7 ± 1.1 | 25.6 ± 0.7 | 33.1 ± 0.5 |
| σr | 1.9 ± 0.4 | 3.4 ± 2.6 | 2.6 ± 0.9 | 2.1 ± 0.9 | 1.6 ± 0.7 | 2.6 ± 0.5 | |
| Fraction | 0.24 ± 0.15 | 0.36 ± 0.14 | 0.70 ± 0.07 | ||||
| Data set . | A . | B . | C . | ||||
|---|---|---|---|---|---|---|---|
| Simulation . | Fraction . | 0.75 . | 0.25 . | 0.5 . | 0.5 . | 0.25 . | 0.75 . |
| Fit to model 3: linked analysis | r0 | 25.4 ± 0.3 | 33.0 ± 0.4 | 25.4 ± 0.3 | 33.0 ± 0.4 | 25.4 ± 0.3 | 33.0 ± 0.4 |
| σr | 2.0 ± 0.3 | 2.6 ± 0.4 | 2.0 ± 0.3 | 2.6 ± 0.4 | 2.0 ± 0.3 | 2.6 ± 0.4 | |
| Fraction | 0.20 ± 0.04 | 0.46 ± 0.05 | 0.70 ± 0.06 | ||||
| Fit to model 4: individual fits | r0 | 25.2 ± 0.4 | 32.8 ± 3.1 | 26.2 ± 0.9 | 33.7 ± 1.1 | 25.6 ± 0.7 | 33.1 ± 0.5 |
| σr | 1.9 ± 0.4 | 3.4 ± 2.6 | 2.6 ± 0.9 | 2.1 ± 0.9 | 1.6 ± 0.7 | 2.6 ± 0.5 | |
| Fraction | 0.24 ± 0.15 | 0.36 ± 0.14 | 0.70 ± 0.07 | ||||
However, when the fit requires the means and SDs of the two Gaussian components to be equal at different ligand concentrations, the best-fit distance distributions (Fig. 4 D) more accurately and precisely correspond to the true distributions. For this analysis, the background factors, η, the modulation depths, Δ, and the relative amplitudes of the two components vary for the three datasets (see Figs. S1, S2, and S3). The benefits of the linked global analysis are clearly demonstrated by considering the low-amplitude components in datasets A and C, which are poorly recovered in the individual fits, regardless of the approach used (Fig. 4, B and C). In the global fit, however, the same two Gaussian components are used to fit all three datasets. Both of the recovered Gaussians, shown as cyan lines in Fig. 4 E, agree well with the true components. Due to the ill-conditioned nature of the problem, the considerable differences between the best-fit distributions for the individual versus global fits (cf. Fig. 4, B and D) result in negligible changes in the best-fit DEER traces (cf. solid black and dashed colored lines in Fig. 4 A). Global analysis not only results in a more accurate recovery of the true distance distributions but also results in increased precision. This is reflected in the narrow confidence bands of the linked versus individual fits and in smaller parameter uncertainties (Table 3).
Screenshot of the answer panel for dataset A of GLADDvu during the global analysis of simulated data using model 3. Related to Table 4 and Fig. 4 D.
Screenshot of the answer panel for dataset B of GLADDvu during the global analysis of simulated data using model 3. Related to Table 4 and Fig. 4 D.
Screenshot of the answer panel for dataset C of GLADDvu during the global analysis of simulated data using model 3. Related to Table 4 and Fig. 4 D.
Table 4 shows the statistics obtained from fitting the three simulated datasets in Fig. 4 using six different models. Models 2, 4, and 6 use n = 1, 2, and 3 Gaussian components, each corresponding to a separate conformation with no parameters linked. Thus, these models are equivalent to fitting the data individually. For models 1, 3, and 5, all of the and values are linked, and the same set of Gaussian components is used to fit both datasets. Although the most complex model, 6, with gives the lowest SSR and values, model 3 with gives the lowest BIC values and is strongly preferred.
Model statistics from fitting simulated data in Fig. 4
| Model . | n . | Linked . | q . | SSR . | . | BIC . | ΔBIC . |
|---|---|---|---|---|---|---|---|
| 1 | 1 | r0 and σr | 11 | 0.13044 | 1.7759 | -6776.0 | 422.3 |
| 2 | 1 | Nothing | 15 | 0.08557 | 1.1268 | -7106.9 | 91.4 |
| 3 | 2 | r0 and σr | 16 | 0.07588 | 0.9945 | -7198.3 | 0.0 |
| 4 | 2 | Nothing | 24 | 0.07503 | 0.9945 | -7153.3 | 45.1 |
| 5 | 3 | r0 and σr | 21 | 0.07532 | 0.9944 | -7170.2 | 28.1 |
| 6 | 3 | Nothing | 33 | 0.07377 | 0.9878 | -7108.0 | 90.3 |
| Model . | n . | Linked . | q . | SSR . | . | BIC . | ΔBIC . |
|---|---|---|---|---|---|---|---|
| 1 | 1 | r0 and σr | 11 | 0.13044 | 1.7759 | -6776.0 | 422.3 |
| 2 | 1 | Nothing | 15 | 0.08557 | 1.1268 | -7106.9 | 91.4 |
| 3 | 2 | r0 and σr | 16 | 0.07588 | 0.9945 | -7198.3 | 0.0 |
| 4 | 2 | Nothing | 24 | 0.07503 | 0.9945 | -7153.3 | 45.1 |
| 5 | 3 | r0 and σr | 21 | 0.07532 | 0.9944 | -7170.2 | 28.1 |
| 6 | 3 | Nothing | 33 | 0.07377 | 0.9878 | -7108.0 | 90.3 |
The models differ in the number of Gaussian components, n, used to define the distance distributions and in how their parameters are linked for the different datasets. In all models, the modulation depths, Δ, and the background parameters, η, for the different datasets are not linked. A video (Video 3) demonstrating the use of GLADDvu to perform the fits of these simulated data is available.
A core principle of global analysis is that the models tested should move beyond mathematical statements about the functions and parameter sets used to fit data and incorporate the fundamental biological questions being addressed in the experiments (Beechem et al., 1991; Beechem, 1992). Moving beyond loose interpretations of the peaks in the P(R) obtained from individual fits, the results in Fig. 4 and Table 4 allow a DEER practitioner to confidently assert that the data are consistent with, and in fact optimally modeled, as a protein undergoing a ligand-dependent shift in the equilibrium between two conformational states that are themselves ligand independent.
With the recent development of the DeerLab software package (Ibáñez et al., 2020), it is now possible to perform a global analysis of DEER data using Tikhonov regularization. The results of such an analysis of these three simulated datasets are shown in Fig. S4. Note that unlike model-based fitting (Fig. 4 E), the regularization-based global analysis (Fig. S4 C) does not correctly disentangle the two component distributions corresponding to the two conformations in equilibrium. In this tutorial, we focus on the use of our GLADDvu software for the model-based global analysis of DEER data.
Global analysis of the same simulated DEER data as in Fig. 4 using Tikhonov regularization. (A) The data are shown as filled circles. The best-fit lines for model 3 (R0 and σR values linked) are shown as solid black lines. The dashed colored lines were obtained using DeerLab version 0.13.2 (Ibáñez et al., 2020) to perform a global analysis using nonparametric distance distributions based on Tikhonov regularization. (B)P(R) from DeerLab for datasets A (red), B (green), and C (blue). The black lines are the true P(R) used for the simulations. (C) The P'(R) of the underlying components of the global analysis obtained from DeerLab are shown in cyan, and the true P'(R) used for the simulations are superimposed as shaded regions. The global Tikhonov fit with unconstrained fractions of the conformations does not recover the proper distributions of the hypothetical pure conformational states if traces representing the pure conformations are not included in the data. The model-based approach (Fig. 4) does recover the pure conformational states because it favors P'(R) with a minimum number of components. If applicable, a chemical equilibrium model and Tikhonov regularization, as is possible with DeerLab, effectively constrains the mole fractions and can recover the true P'(R).
Global analysis of the same simulated DEER data as in Fig. 4 using Tikhonov regularization. (A) The data are shown as filled circles. The best-fit lines for model 3 (R0 and σR values linked) are shown as solid black lines. The dashed colored lines were obtained using DeerLab version 0.13.2 (Ibáñez et al., 2020) to perform a global analysis using nonparametric distance distributions based on Tikhonov regularization. (B)P(R) from DeerLab for datasets A (red), B (green), and C (blue). The black lines are the true P(R) used for the simulations. (C) The P'(R) of the underlying components of the global analysis obtained from DeerLab are shown in cyan, and the true P'(R) used for the simulations are superimposed as shaded regions. The global Tikhonov fit with unconstrained fractions of the conformations does not recover the proper distributions of the hypothetical pure conformational states if traces representing the pure conformations are not included in the data. The model-based approach (Fig. 4) does recover the pure conformational states because it favors P'(R) with a minimum number of components. If applicable, a chemical equilibrium model and Tikhonov regularization, as is possible with DeerLab, effectively constrains the mole fractions and can recover the true P'(R).
A description of how parameters are linked in GLADDvu is provided in the Appendix. Screenshots showing the use of GLADDvu to fit these simulated datasets to model 3 are shown in Figs. S1, S2, S3, S5, and S6. A demonstration of the use of GLADDvu to perform all of the fits listed in Table 4 is available in Video 3.
Screenshot of the fit panel of GLADDvu during the global analysis of simulated data using model 3. Related to Table 4 and Fig. 4 D.
Screenshot of the P(R) panel of GLADDvu during the global analysis of simulated data using model 3. Related to Table 4 and Fig. 4 D.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based global analysis of three simulated datasets. The fits to the six different models listed in Table 4 are performed. The optimal results, as determined by BIC values, were obtained for model 3 and are shown in Fig. 4. The datasets (Fig. 4, A–C) are available for dowload.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based global analysis of three simulated datasets. The fits to the six different models listed in Table 4 are performed. The optimal results, as determined by BIC values, were obtained for model 3 and are shown in Fig. 4. The datasets (Fig. 4, A–C) are available for dowload.
Global analysis of DEER data
Mchaourab and coworkers have globally analyzed DEER data to investigate the ligand-dependent functional dynamics of a number of membrane transporters, including ion-dependent secondary transporters (Masureel et al., 2014; Dastvan et al., 2016; Martens et al., 2016; Claxton et al., 2018; Paz et al., 2018; Jagessar et al., 2020) and ATP-dependent multidrug transporters (Mishra et al., 2014; Verhalen et al., 2017; Dastvan et al., 2019). Using previously published data, we present below details of how global analysis has advanced the understanding of ligand-dependent conformational equilibrium in a way that would not be possible from the individual fits.
P-glycoprotein 1 (Pgp)
The use of DEER to investigate the conformational dynamics of the ABC transporter Pgp provides an instructive example of the benefits of global analysis. Although the initial DEER study was performed on the basis of a single crystal structure (Verhalen et al., 2017), the recent determination of multiple cryo-EM structures (Alam et al., 2018; Kim and Chen, 2018; Alam et al., 2019) illustrates the unique and complementary insight contributed by DEER analysis (Dastvan et al., 2019). Pgp is a mammalian ATP-binding cassette transporter capable of pumping a wide array of xenobiotic substrates out of cells (Aller et al., 2009; Jin et al., 2012; Li et al., 2014; Kim and Chen, 2018). Moreover, a number of inhibitors have been developed in search of a strategy to increase the efficacy of cancer chemotherapy (Chufan et al., 2016). Here, we focus on data collected in the presence of different substrates and inhibitors of Pgp.
An extensive set of spin label pairs was introduced to study the functional dynamics of Pgp under different biochemical conditions that favor specific conformational states (Verhalen et al., 2017; Dastvan et al., 2019). Here, we focus on a pair of sites, 511 and 1043, within NBDs 1 and 2, respectively, which come together to form two nucleotide binding sites (NBS; Fig. 5). In the presence of ATP and inorganic vanadate (Vi), Pgp can form a state that mimics the post-hydrolysis transition state where ADP and phosphate are trapped in the binding sites. Under these conditions, DEER data from Pgp show an equilibrium between distinctly different conformational states (Verhalen et al., 2017) dependent on the substrate present. Fig. 6 shows 10 DEER traces collected for Pgp labeled at 511/1043 in the presence of ATP/Vi and either no substrate, 7 different substrates, or 2 inhibitors of transport (Dastvan et al., 2019).
The conformational shift of Pgp. Left: The mouse Pgp structure in the inward-facing conformation (Li et al., 2014). Highlighted are two residues, 511 and 1043, which are on opposite halves of Pgp. Middle: The structure of human Pgp in the outward-facing conformation (Kim and Chen, 2018). Right: The closure of the NBDs is shown in detail, highlighting the nearness of the residues 511 and 1043 in the outward-facing conformation. PDB, Protein Data Bank.
The conformational shift of Pgp. Left: The mouse Pgp structure in the inward-facing conformation (Li et al., 2014). Highlighted are two residues, 511 and 1043, which are on opposite halves of Pgp. Middle: The structure of human Pgp in the outward-facing conformation (Kim and Chen, 2018). Right: The closure of the NBDs is shown in detail, highlighting the nearness of the residues 511 and 1043 in the outward-facing conformation. PDB, Protein Data Bank.
Global analysis of Q-band DEER data for Pgp spin labeled with MTSSL at residues 511 and 1043. (A) Samples were prepared in mixed micelles with ATP and Vi in the presence and absence of substrates and inhibitors as indicated (Dastvan et al., 2019). The scaled data are shown as filled colored circles, and the best fits obtained from global analysis using model 2 (Table 5) are shown as solid black lines. (B) The best-distance distributions obtained using model 2 to fit the Pgp data with the associated 2σ (95%) confidence bands as shaded regions. The R0 values of the four components are indicated by the arrows. (C) The substrate-induced population of the shortest distance component is correlated with the activation energy of ATP turnover.
Global analysis of Q-band DEER data for Pgp spin labeled with MTSSL at residues 511 and 1043. (A) Samples were prepared in mixed micelles with ATP and Vi in the presence and absence of substrates and inhibitors as indicated (Dastvan et al., 2019). The scaled data are shown as filled colored circles, and the best fits obtained from global analysis using model 2 (Table 5) are shown as solid black lines. (B) The best-distance distributions obtained using model 2 to fit the Pgp data with the associated 2σ (95%) confidence bands as shaded regions. The R0 values of the four components are indicated by the arrows. (C) The substrate-induced population of the shortest distance component is correlated with the activation energy of ATP turnover.
The analysis of these 10 datasets individually using either Tikhonov regularization or model-based fitting gives broad multicomponent distance distributions that depend strongly on the substrate (Fig. S7). Using either approach, however, it is difficult to draw firm conclusions about the conformations evident in these distributions. To test the utility of global analysis for these data, three different models were used in combined fits (Table 5). Models 1 and 2 employ three and four conformational states, respectively, each modeled as a single Gaussian component. The and values of these Gaussians were linked across all 10 datasets while the amplitudes of the components, the modulation depths, background factors, and scale factors varied. For the conformation with the shortest mean interspin distance, the two NBDs are tightly bound together, trapping ADP/Vi within NBS 2. The intermediate distance component (and its corresponding conformation) is highly populated only in the presence of the two inhibitors, tariquidar and zosuquidar, or vinblastine, which can act as an inhibitor at high concentrations. The two components with the longest mean distances correspond to those seen in the absence of any nucleotide (apo state). Model 3 specifically tests whether the apo-like contributions to P(R), evident at the longer distances, can be fit as a single conformation modeled by two Gaussian components with a linked ratio across all conditions. Based on the values in Table 5, model 2 is the optimal model. The increase in BIC for model 3 versus model 2 indicates that there are distinct apo-like conformations for different substrates. The separation of these underlying conformational states is much clearer using model-based global analysis, and the quality of the fits in Fig. 6 A suggests that, indeed, there is a common set of protein conformations across all 10 experimental conditions.
Distance distributions from individual fits to the DEER data for Pgp in Fig. 6. (A) The model-free P(R) were obtained using Tikhonov regularization in DeerAnalysis 2019 (Jeschke et al., 2006). The optimal regularization factor was determined by generalized cross-validation and the starting time point for background fitting using the validation tool. (B) The model-based P(R) were obtained assuming a 3-D exponential background and using Gaussian components in DD (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018). The optimal number of Gaussians, n, was determined using BIC values.
Distance distributions from individual fits to the DEER data for Pgp in Fig. 6. (A) The model-free P(R) were obtained using Tikhonov regularization in DeerAnalysis 2019 (Jeschke et al., 2006). The optimal regularization factor was determined by generalized cross-validation and the starting time point for background fitting using the validation tool. (B) The model-based P(R) were obtained assuming a 3-D exponential background and using Gaussian components in DD (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018). The optimal number of Gaussians, n, was determined using BIC values.
Statistics of fitting Pgp data (Fig. 6)
| Model . | n . | Linked . | q . | SSR . | . | BICWLS . | ΔBICWLS . |
|---|---|---|---|---|---|---|---|
| 1 | 3 | Gaussians | 56 | 2.5980E-02 | 1.2183 | −36,842.8 | 62.2 |
| 2 | 4 | Gaussians | 68 | 2.4681E-02 | 1.1593 | −36,905.0 | 0.0 |
| 3 | 4 | Gaussians and ratio of two longest | 59 | 2.5623E-02 | 1.1921 | −36,885.7 | 19.3 |
| Model . | n . | Linked . | q . | SSR . | . | BICWLS . | ΔBICWLS . |
|---|---|---|---|---|---|---|---|
| 1 | 3 | Gaussians | 56 | 2.5980E-02 | 1.2183 | −36,842.8 | 62.2 |
| 2 | 4 | Gaussians | 68 | 2.4681E-02 | 1.1593 | −36,905.0 | 0.0 |
| 3 | 4 | Gaussians and ratio of two longest | 59 | 2.5623E-02 | 1.1921 | −36,885.7 | 19.3 |
The models differ in the number of Gaussian components, n, used to define the distance distributions and in how their parameters are linked for the different datasets. In all models, the modulation depths, Δ, and the background parameters, η, for the different datasets are not linked. A video (Video 4) demonstrating the use of GLADDvu to perform the fits of these Pgp data is available.
Global analysis also allows enhanced quantitative analysis of the relative populations of the various conformations. Those substrates that induced the largest population of the tightly bound component were most efficient in stimulating ATP turnover, resulting in a linear relationship between this component’s population and the natural logarithm of stimulated ATP turnover rate constant, kcat (Fig. 6 C; Dastvan et al., 2019). These results demonstrate that, using global analysis, the equilibrium populations of protein conformations determined from the global analysis of DEER data can be quantitatively related to the function dynamics of the protein and that important mechanistic insights can be obtained.
A demonstration of the use of GLADDvu to perform the fits of these Pgp datasets to models 1–3 is available in Video 4.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based global analysis of the Pgp datasets. The fits to the three different models listed in Table 5 are perforrmed. The optimal results, as determined by BIC values, were obtained for model 2 and are shown in Fig. 6.
Demonstration of the use of GLADDvu (https://github.com/erichustedt/GLADDvu) to perform the model-based global analysis of the Pgp datasets. The fits to the three different models listed in Table 5 are perforrmed. The optimal results, as determined by BIC values, were obtained for model 2 and are shown in Fig. 6.
LmrP
X-ray crystal structure of LmrP. Related to Debruycker et al. (2020). On the left is a side view of LmrP, and on the right is the top (extracellular) view of LmrP. Highlighted are two residues, 160 and 310, on opposite halves of LmrP where spin labels have been placed. PDB, Protein Data Bank.
X-ray crystal structure of LmrP. Related to Debruycker et al. (2020). On the left is a side view of LmrP, and on the right is the top (extracellular) view of LmrP. Highlighted are two residues, 160 and 310, on opposite halves of LmrP where spin labels have been placed. PDB, Protein Data Bank.
Global analysis of Q-band DEER data for LmrP spin labeled with MTSSL at residues 160 and 310. Related to Masureel et al. (2014). Samples were prepared in dodecyl maltoside micelles at eight different pH values ranging from 4.5 to 8.0 as indicated. (A) Overlay of all eight datasets and the fits obtained from global analysis using model 7 in Table 6. (B) The P(R) obtained for model 7. (C) The OO fraction obtained for model 7 as a function of pH fit to determine the pK value.
Global analysis of Q-band DEER data for LmrP spin labeled with MTSSL at residues 160 and 310. Related to Masureel et al. (2014). Samples were prepared in dodecyl maltoside micelles at eight different pH values ranging from 4.5 to 8.0 as indicated. (A) Overlay of all eight datasets and the fits obtained from global analysis using model 7 in Table 6. (B) The P(R) obtained for model 7. (C) The OO fraction obtained for model 7 as a function of pH fit to determine the pK value.
Results from individual fits for LmrP data in Fig. 8. On the left, the data are shown as black dots with the individual model-based fits as blue lines. The corresponding P(R) are shown (black lines) on the right together with those obtained from Tikhonov regularization using DeerAnalysis 2019 (Jeschke et al., 2006; red lines). The widths of the narrow components at pH 6.5 and 7.0 have large relative uncertainties, and owing to the fact that the modulation is not fully damped over the measured evolution time.
Results from individual fits for LmrP data in Fig. 8. On the left, the data are shown as black dots with the individual model-based fits as blue lines. The corresponding P(R) are shown (black lines) on the right together with those obtained from Tikhonov regularization using DeerAnalysis 2019 (Jeschke et al., 2006; red lines). The widths of the narrow components at pH 6.5 and 7.0 have large relative uncertainties, and owing to the fact that the modulation is not fully damped over the measured evolution time.
Using the model-based approach, all of the DEER traces with the exception of the pH 6.0 data are optimally fit with n = 2 Gaussian components. At the lower pH values, the two components with means separated by ∼10 Å clearly correspond to OC and OO conformations. At the higher pH values, the two components have similar values, one broad and one narrow. These results suggest that three Gaussian components will be needed for the optimal global fit to the eight traces and raise the question of what the broad component evident at higher pH values represents.
The full set of models tested against the eight LmrP DEER traces are listed in Table 6. Models 1 and 2 fit the datasets individually with two or three Gaussian components and no parameters linked. Models 3 and 4 fit the data globally with two or three conformations, with each modeled as a single Gaussian component. For these models, the values of and are linked across all datasets, and the amplitudes, modulation depths, backgrounds, and scale factors are not linked. Models 5, 6, and 7 all use three Gaussians with the values of and linked and different ways of linking some of the amplitude variables. For model 5, the OC conformation is modeled by a single Gaussian, and the OO conformation is modeled by two Gaussians, with a linked ratio across all eight traces. For model 6, the OO conformation is modeled by a single Gaussian, and the OC conformation is modeled by two Gaussians, with a linked ratio across all eight traces. For model 7, the OO and OC conformations are both modeled by single Gaussians, and a third component contributes a linked fraction across all eight datasets.
Models used to fit LmrP data (Fig. 8)
| Model . | n . | Linked . | q . | SSR . | . | BICWLS . | ΔBICWLS . | pK . |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | Nothing | 64 | 0.075598 | 1.1230 | −30,382.8 | 139.7 | — |
| 2 | 3 | Nothing | 88 | 0.074582 | 1.1129 | −30,242.2 | 280.4 | — |
| 3 | 2 | Gaussians | 36 | 0.079455 | 1.1651 | −30,474.8 | 47.7 | 4.6 ± 0.6 |
| 4 | 3 | Gaussians | 46 | 0.077139 | 1.1329 | −30,483.0 | 39.6 | — |
| 5 | 3 | Custom | 39 | 0.078279 | 1.1463 | −30,498.7 | 23.8 | 4.6 ± 0.7 |
| 6 | 3 | Custom | 39 | 0.078139 | 1.1419 | −30,509.5 | 13.1 | 4.8 ± 1.0 |
| 7 | 3 | Custom | 39 | 0.077450 | 1.1365 | −30,522.6 | 0.0 | 4.7 ± 1.3 |
| Model . | n . | Linked . | q . | SSR . | . | BICWLS . | ΔBICWLS . | pK . |
|---|---|---|---|---|---|---|---|---|
| 1 | 2 | Nothing | 64 | 0.075598 | 1.1230 | −30,382.8 | 139.7 | — |
| 2 | 3 | Nothing | 88 | 0.074582 | 1.1129 | −30,242.2 | 280.4 | — |
| 3 | 2 | Gaussians | 36 | 0.079455 | 1.1651 | −30,474.8 | 47.7 | 4.6 ± 0.6 |
| 4 | 3 | Gaussians | 46 | 0.077139 | 1.1329 | −30,483.0 | 39.6 | — |
| 5 | 3 | Custom | 39 | 0.078279 | 1.1463 | −30,498.7 | 23.8 | 4.6 ± 0.7 |
| 6 | 3 | Custom | 39 | 0.078139 | 1.1419 | −30,509.5 | 13.1 | 4.8 ± 1.0 |
| 7 | 3 | Custom | 39 | 0.077450 | 1.1365 | −30,522.6 | 0.0 | 4.7 ± 1.3 |
The models differ in the number of Gaussian components, n, used to define the distance distributions and in how their parameters are linked for the different datasets. In all models, the modulation depths, Δ, and the background parameters, η, for the different datasets are not linked. pK values are the negative logarithms of the dissociation constants.
Of the seven models tested, model 7 is optimal according to the BIC values (Table 6). Model 7 fits the data to n = 3 Gaussian components (Fig. 8). For two of these, the fractional contribution is pH dependent, one corresponding to OC state and one corresponding to the OO state The fractional contribution of the third Gaussian is independent of pH. This third component could account for one or more of a number of factors, including non-Gaussian character of the true deviation of the background factors from true exponentials, or the presence of nonfunctional protein in the sample. Assuming the latter, the fraction of OO functional protein have been plotted and fit to determine a pK for the transition between the two conformations. Significantly, when the appropriate values of are fit to a titration curve, models 3, 5, 6, and 7 all give similar pK values (Table 6). Although including a third, pH-independent population did not affect the pK determined from the titration, the ability to identify such nonfunctional proteins can in future studies spur changes and optimization of purification protocols.
Looking ahead
GLADDvu was designed to be a fully flexible tool suitable for a wide variety of experimental situations beyond the examples presented in this tutorial. These examples demonstrated how global analysis of DEER data from one spin-labeled mutant obtained under different ligand conditions gives enhanced qualitative insights into the functionally relevant dynamics of integral membrane proteins complete with error analysis of the fit parameters. One could consider analyzing data from multiple mutants under a common set of ligand conditions with different and a common set of Such analyses would test whether domains are moving as rigid bodies or whether the spin labels have any influence on equilibrium coefficients.
The model-based approach available with GLADDvu allows the ligand-dependent fractions of various components to be determined by the fit while providing the uncertainties of the parameters as well as the confidence bands for the distance distributions. Recently, a model-free approach to the global analysis of DEER data has been developed (Ibáñez and Jeschke, 2020; Ibáñez et al., 2020). As is the case for the analysis of individual time traces, different approaches to the global analysis of multiple datasets will provide complementary insights into the fundamental biological issues under investigation.
Online supplemental material
Fig. S1 shows a screenshot of the answer panel for dataset A of GLADDvu during the global analysis of simulated data using model 3. Fig. S2 shows a screenshot of the answer panel for dataset B of GLADDvu during the global analysis of simulated data using model 3. Fig. S3 shows a screenshot of the answer panel for dataset C of GLADDvu during the global analysis of simulated data using model 3. Fig. S4 shows a global analysis of the same simulated DEER data as in Fig. 4 using Tikhonov regularization. Fig. S5 shows a screenshot of the Fit panel of GLADDvu during the global analysis of simulated data using model 3. Fig. S6 shows a screenshot of the P(R) panel of GLADDvu during the global analysis of simulated data using model 3. Fig. S7 shows distance distributions from individual fits to the DEER data for Pgp in Fig. 6. Fig. S8 shows results from individual fits for LmrP data in Fig. 8. Videos 1, 2, 3, and 4 demonstrate the fitting of the data in Fig. 2 using DD and the fitting of the data in Figs. 2, 4, and 6 using GLADDvu, respectively.
Appendix
GLADDvu
Our MATLAB software, DD, for the model-based fitting of single DEER datasets has been described in detail elsewhere (Brandon et al., 2012; Stein et al., 2015; Hustedt et al., 2018). A separate MATLAB program, GLADDvu, has been developed for globally fitting multiple DEER datasets. GLADDvu is designed for maximal flexibility, allowing the user to create both simple and complex models with parameters that are fixed, floated, or linked as desired. A fixed parameter is kept at its initial value and is not varied during the fit. A floated parameter is varied to find the best-fit values that minimize the global reduced χ2 value. A linked parameter for one experimental dataset is required to have the same value as another floated parameter for the same or a different dataset.
After reading in multiple files, each dataset is phased. Then, for each separate experiment, the 0 of the time axis is adjusted such that the signal maximum occurs at and the values of the DEER signal are scaled such that The noise variance for a given dataset, is estimated from the variance of its imaginary component. Various background models can be selected, including exponential, stretched exponential, and excluded volume. The distance distribution can be modeled as the sum of n components (Eq. 6), including Gaussians (Eq. 7). Minimization of is attempted using MATLAB’s interior point, global search, or particle swarm algorithms.
The model used to analyze data are specified by the Background, Components, Shape, and Linked dropdown menus and can be further modified using the individual Answer File spreadsheets for each dataset (Figs. S1, S2, and S3). As seen in the third column of these spreadsheets, for each of the three datasets, there are eight parameters that are floated (i.e., they are not fixed). These are (depth), (eta), (amp_2), and a scale factor (∼1). The and parameters for experiments 2 and 3 are linked to the corresponding parameters for experiment 1, as specified in the fourth and fifth columns of the Answer File spreadsheet, ensuring that the same two Gaussian components are used to define the best-fit P(R) for the two datasets.
Acknowledgments
Joseph A. Mindell served as editor.
This work was supported by National Institutes of Health grants GM128087 and GM077659 (H.S. Mchaourab). R.A. Stein was also supported by National Institutes of Health grants GM113195 (M. Maduke), MH070039 (J.E. Gouaux), NS117873 (J. Cordero Morales), GM124310 (A. Rajca), and DC007664 (D. Minor).
The authors declare no competing financial interests.
Author contributions: All authors contributed to the conceptual development of this work. E.J. Hustedt and H.S. Mchaourab wrote the manuscript with the assistance of R.A. Stein. E.J. Hustedt wrote DD and GLADDvu with signficant input from R.A. Stein. R.A. Stein provided all of the experimental data in collaboration with authors of the referenced papers.

