A critical but often overlooked question in the study of ligands binding to proteins is whether the parameters obtained from analyzing binding data are practically identifiable (PI), i.e., whether the estimates obtained from fitting models to noisy data are accurate and unique. Here we report a general approach to assess and understand binding parameter identifiability, which provides a toolkit to assist experimentalists in the design of binding studies and in the analysis of binding data. The partial fraction (PF) expansion technique is used to decompose binding curves for proteins with n ligand-binding sites exactly and uniquely into n components, each of which has the form of a one-site binding curve. The association constants of the PF component curves, being the roots of an n-th order polynomial, may be real or complex. We demonstrate a fundamental connection between binding parameter identifiability and the nature of these one-site association constants: all binding parameters are identifiable if the constants are all real and distinct; otherwise, at least some of the parameters are not identifiable. The theory is used to construct identifiability maps from which the practical identifiability of binding parameters for any two-, three-, or four-site binding curve can be assessed. Instructions for extending the method to generate identifiability maps for proteins with more than four binding sites are also given. Further analysis of the identifiability maps leads to the simple rule that the maximum number of structurally identifiable binding parameters (shown in the previous paper to be equal to n) will also be PI only if the binding curve line shape contains n resolved components.
Introduction
The thermodynamic theory of ligands binding to macromolecules containing multiple binding sites is reviewed in several monographs (Poland, 1978; Hill, 1985; Wyman and Gill, 1990; Ben-Naim, 2010). A remarkable feature of the theory is that the mathematical form of the partition function for these systems (also known as the binding polynomial) is conserved for virtually all physically reasonable binding models (Wyman and Gill, 1990). The total binding isotherm, which is derived from the partition function, also takes on a conserved form (see companion article Middendorf and Aldrich in this issue). For all models consisting entirely of bimolecular association reactions and conformational equilibria, the canonical form of the total binding relation is
where n is the number of ligand-binding sites on the protein and v is the mean number of occupied sites at free ligand concentration x (Middendorf and Aldrich, 2017). The parameters {p0, p1, …, pn} in Eq. 1 are functions of molecular properties such as the association constants of the ligands for the binding sites on the protein, the quantitative effect of binding at a subset of the sites on binding at other sites (cooperativity), and protein conformational equilibrium constants. (Parameter p0 in Eq. 1 is zero if the protein is assumed to occupy a single conformation.) Eq. 1 applies specifically to total binding curves, which are obtained from binding measurements that detect the net occupancy of all binding sites in a protein without specifying the occupancies of the individual sites.
The binding parameters in Eq. 1 cannot be measured directly, but rather are quantified by fitting models to binding data. Knowledge of the parameter values can yield meaningful insights into mechanism, but only if the parameters are identifiable, meaning that the fitted estimates of them are accurate and unique. This problem has been addressed previously (Colquhoun, 1969; Reich and Zinke, 1974; Reich et al., 1974a,b). However, the lack of a general understanding of the factors determining binding parameter identifiability has hampered the development and validation of quantitative models for molecular systems that are governed by multiple, coupled equilibria, such as ligand-activated receptors.
In our companion paper (Middendorf and Aldrich, 2017), we showed that the conserved form of the total binding relation (Eq. 1) implies that simple, general rules govern binding parameter structural identifiability: equilibrium total binding data constrain n structurally identifiable (SI) parameters if the binding model consists only of bimolecular binding reactions and n + 1 SI parameters for binding models comprising any combination of binding and conformational equilibria. Structural identifiability is a “best-case” scenario that assumes that the binding data contain no noise or systematic artifacts (Bellman and Åström, 1970). Parameter estimation fails for models containing more than the number of SI parameters, as infinite ranges of parameter values yield perfect fits to any binding curve line shape (see Fig. 1 of Middendorf and Aldrich [2017]).
Noise in experimental data creates an additional barrier to obtaining meaningful parameter estimates (Ljung, 1987; Walter and Pronzato, 1997). Parameters are practically identifiable (PI) if they can be determined accurately and with acceptable precision under these more stringent, real-world experimental conditions (Raue et al., 2009). Here, we develop a technique for deconstructing binding curves into their simplest components. The method is the basis of a general approach for assessing and understanding the practical identifiability of binding parameters for proteins with any number of binding sites.
An important goal of this work is to provide tools to investigators that are simple to implement and have broad applicability to the design of binding experiments and to the analysis of binding data. Our approach is essentially model independent, so that the identifiability assessment need only be performed once for a given binding curve rather than being repeated for each candidate model under consideration. Identifiability maps, which classify the identifiability of binding parameters for all possible binding curve line shapes for a specified number (n) of binding sites, are constructed for the specific cases of n = 2, 3, and 4. Instructions are provided for extending the theory to generate these maps for systems with n > 4. In situations in which the parameters are not PI, this knowledge is useful for guiding the design of alternative experimental approaches. We also introduce a useful method for displaying parameter uncertainties that avoids many of the problems normally encountered when visualizing error surfaces for systems with more than a few parameters.
Materials and methods
Results
Nonidentifiability of calmodulin (CaM) binding parameters
An example of a protein with nonidentifiable total binding parameters is CaM, which contains four nonidentical EF-hand calcium-binding sites. Many published studies have used a four-site Adair–Klotz model (Fig. 1 A, top) to fit the Ca2+ total binding curve for CaM. The model parameters K1–K4 are macroscopic equilibrium association constants that quantify ligand binding to receptors containing 0–3 bound ligands, respectively. The model assumes a single protein conformation and does not distinguish between the individual binding sites in the protein nor the multiple ligation states with a given total number of bound ligands. For example, R2 in Fig. 1 A (top) represents all six ligation states containing two bound calcium ions. The total binding relation for this model is which has the form of Eq. 1 with p0 = 0, p1 = K1, p2 = K1K2, p3 = K1K2K3, and p4 = K1K2K3K4.
The previous paper (Middendorf and Aldrich, 2017) showed, based on structural identifiability considerations, that total binding data for a protein with n sites constrains a maximum of n binding parameters (if a single conformation of the protein is assumed). This is an important limitation because it excludes single-conformation models with distinct affinity parameters for the n binding sites and any additional, explicit cooperativity factors for quantifying interactions between the sites. All such models require more than n parameters. The Adair–Klotz model avoids this problem, but at a cost in mechanistic insight: parameters K1–K4 do not distinguish between affinity and cooperativity.
To explore practical identifiability of the Adair–Klotz binding parameters for CaM, a synthetic, noiseless total binding curve (Fig. 1 A, middle left and right, solid curves) was generated using Eq. 2 and the parameter estimates from one study (Linse et al., 1991). Curves nearly identical to this reference curve were obtained using other combinations of parameters with values very different from those in the reference set. For example, the curves represented by circles in Fig. 1 A (middle left) and triangles in Fig. 1 A (middle right) were calculated using comparable values of parameters K3 and K4, but K1 was >3,000 times larger, and K2 was >3,500 times smaller, for the circles compared with the triangles. The considerable flexibility in these parameter values indicates that the Adair–Klotz parameters are likely not PI when constrained by CaM total binding data.
Parameter identifiability for CaM was studied systematically by quantifying the full range of parameter values consistent with the synthetic binding data. These ranges are represented compactly in parameter fit-error curves. For example, the blue curve labeled K2 in Fig. 1 A (bottom) depicts the results of 300 curve fits. For each fit, K2 was held constant at a value in the range 10 M−1 ≤ K2 ≤ 1010 M−1 (indicated on the abscissa), and parameters K1, K3, and K4 were adjusted to yield the best least-squares fit to the synthetic data using nonlinear regression. The ordinate plots the root mean square (rms) error for the fits. (Similar approaches were used in Colquhoun and Ogden [1988] and Solc and Aldrich [1990].) The minimum in the blue curve (which is not visible because the fit error of zero cannot be shown on the logarithmically scaled ordinate) corresponds to the point at which K2 was fixed at the “correct” value (i.e., the value used to generate the synthetic data). The fit error increased when K2 was held constant at a value above or below the correct value. A remarkable feature of this curve is the large range of K2 values over which the fit error maintains a small, nearly constant value. In fact, the abscissa in Fig. 1 A (bottom) is truncated for display purposes at an upper limit of K2 = 1010 M−1; the magnitude of the “asymptotic error” (blue horizontal arrow) remains nearly constant for all K2 > 1010 M−1. Asymptotes with small, nearly constant error extending over an infinite range of parameter values were also present in the fit-error curves for parameters K1 (red curve), K3 (black curve), and K4 (green curve; Fig. 1 A, bottom).
The noise in an experimental binding curve will mask the true value of a parameter if it is larger than the asymptotic error in that parameter’s fit-error curve. In such cases, the parameter is not PI because an infinite range of values yield indistinguishably good fits to the data. Parameters K1–K4 for CaM are not PI because the typical noise in experimental binding curves (∼5%, dashed black line; Stefan et al., 2009) is much larger than the asymptotic errors for these parameters (∼0.27% rms for K1 and K2 and ∼0.48% rms for K3 and K4; Fig. 1 A, bottom). For this example, the only constraints on the parameter values are a lower limit of 2 × 104 M−1 for K4 and an upper limit of 4 × 106 M−1 for K1, based on the intersections of the corresponding fit-error curves with the assumed noise threshold. Typical total binding data would place no constraints on the values of K2 and K3 because the fit errors are <5% for all values of these parameters.
Fig. 2 shows how this enormous flexibility in parameter estimates is achieved: as a given parameter is stepped through a range of values, the other parameters undergo systematic variations that compensate for the incorrect value of the given parameter to produce close fits to the data. A major goal of this paper is to understand the mechanisms of these extremely effective compensations because this knowledge may contribute to the design of experiments that provide more powerful constraints on the binding parameters.
To mimic experimental data more realistically, Gaussian noise with a standard deviation of 0.05 (equivalent to 5% rms noise) was added to the noiseless solid curve in Fig. 1 A (middle left). Each arrow in Fig. 1 B (top left) shows the starting guess for parameter K1 used in fitting 1,000 of such noisy simulated binding curves; the curves below the arrows represent the distribution of estimates for K1 obtained from the fits. It is noteworthy that (a) the starting guesses vary over many orders of magnitude, yet the peaks of the distributions are generally very close to the initial guess; (b) the peaks of the distributions are often very different from the “correct” value of the parameter (indicated by the dashed vertical line); and (c) the distributions are generally very narrow. Similar results were obtained for parameters K2–K4 (Fig. 1 B, top right and bottom left and right). These results lead to the same conclusion as the fit-error plots in Fig. 1 A (bottom): the Adair–Klotz parameters K1–K4 are not PI when constrained by CaM total binding data.
The narrow parameter distributions in Fig. 1 B (all graphs) are misleading, as the individual distributions give the incorrect impression that the estimates are precise and reliable. This observation may partially explain the wide range of values (15-, 6-, 101-, and 66-fold for K1–K4, respectively) reported for CaM binding parameters by different groups. These estimates are summarized in Stefan et al. (2009). These wide ranges are in fact a vast underestimate of the true uncertainties in the binding parameter values, which are infinite (Fig. 1 A, bottom), and thus provide little insight into the mechanism of ligand binding to CaM. The asymptotic errors in the fit-error curves for parameters K1–K4 are so low (tenths of a percentage; Fig. 1 A, bottom) that it is unlikely that experimental refinements could improve the quality of binding data sufficiently to make the CaM binding parameters identifiable. Thus, even well-executed total binding studies (Stefan et al., 2009) are insufficient to overcome the inherently low constraining power of total binding measurements for this system. The inescapable conclusions of this analysis are that parameter estimates obtained from fitting CaM total binding data are meaningless and that reliable estimates are only possible if more powerful experimental approaches are used.
A natural question arises concerning the generality of the results in Fig. 1: is CaM an atypical, pathological case, or do binding curves for real proteins frequently lack the power to provide meaningful constraints on binding parameters? To answer this question satisfactorily, it is necessary to analyze binding curves for proteins with different numbers of binding sites (specified by the factor n in Eq. 1) and, further, to examine all possible line shapes for each value of n. In the following, binding parameter identifiability is investigated systematically, starting with two-site binding curves.
Shape and location parameters for two-site binding curves
For a protein that contains two ligand-binding sites and occupies a single conformation, Eq. 1 simplifies to
Because the units of p1 and p2 are M−1 and M−2, respectively, a unitless proportionality constant, a, can be defined by the relation
Substituting this expression into Eq. 3 yields the reparameterized binding relation
Because p1 and x always appear together and raised to the same power in Eq. 5, multiplying p1 by a constant, k, is equivalent to scaling the concentration axis by the same constant:
Two important conclusions can be drawn from this analysis: if the bound fraction is plotted against the logarithm of ligand concentration, then (1) multiplying p1 by a factor k is equivalent to shifting the binding curve horizontally by an amount log(k) because log(kx) = log(k) + log(x); and (2) varying p1 has no effect on the shape of the binding curve. These effects are illustrated in Fig. 3 (A–C, left). In each panel, two two-site binding curves were computed for the indicated value of parameter a. Multiplying p1 for the dashed curves by 1,000 yields solid curves that are shifted leftward by three log units, with no change in line shape. In contrast, increasing parameter a 1,000-fold from 0.0025 (Fig. 3 A, left) to 2.5 (Fig. 3 C, left) changes the line shapes of the curves significantly.
A simple substitution (Eq. 4) has separated the binding parameters into two classes that are “orthogonal” in the sense that parameters from each class have distinct effects on two-site binding curves: p1 is a location parameter that determines the position of a curve on the concentration axis without affecting its shape, whereas a is a unitless shape parameter that determines the curve’s line shape.
Practical identifiability of parameters for two-site binding curves
The two parameter classes are also orthogonal with respect to parameter identifiability. Increasing p1 by a factor of 1,000 does not change the identifiability of parameters p1 (Fig. 3, A–C, middle, solid vs. dashed curves) or a (Fig. 3, A–C, right, solid vs. dashed curves). In contrast, if the shape parameter is increased by the same factor of 1,000, the binding curve line shape changes in such a way that both parameters, which were initially identifiable (Fig. 3 A, middle and right), become nonidentifiable (Fig. 3 C, middle and right).
The results in Fig. 3 simplify the task of understanding parameter identifiability. It is unnecessary to consider the infinite number of parameter combinations {p1, a} that give rise to all possible two-site binding curves. Rather, the universe of possible line shapes is confined to a single axis representing the values of the continuous variable a.
Fig. 4 A shows the effect of parameter a on the line shape of two-site binding curves. To facilitate the comparison, the value of p1 for each curve was adjusted so that the midpoints are all equal to 10−8 M. The curves exhibit two clearly resolved components for small values of a. As a is increased, the resolution between these components is decreased until the curve appears as a single component for shape parameter values in the range 0.1 < a < 1. Above this range, increasing a has almost no effect on the line shape.
A convenient metric for characterizing binding curve line shapes is their midpoint slope, which can be computed from Eq. 3 as follows. The logarithmic derivative of the binding relation is given by
where the substitution logx = (lnx)/(ln10) was used. The regular derivative of the binding relation is given by
The ligand concentration at which the binding curve reaches half-saturation, xh, is derived by setting Eq. 3 equal to 1 and is given by xh = (1/p2)1/2. Substituting Eq. 8 into Eq. 7 and evaluating the resulting expression at xh yields the midpoint slope
where Eq. 4 was used to simplify the expression.
As predicted above, the midpoint slope depends on shape parameter a but not on location parameter p1. For small values of a, the midpoint slope increases sharply as a is increased (Fig. 4 B, solid curve). For large values of a, the midpoint slope is much less sensitive to increases in a and approaches a limiting value of ln(10). The transition between these two regimens occurs at a critical value of the shape parameter, ac, equal to 1/2 (Fig. 4 B, dashed vertical line).
The uncertainty in a parameter estimated by fitting experimental data should become large if there are conditions for which the experimental observable (the binding curve line shape in our case) becomes insensitive to changes in that parameter. For example, the results in Fig. 4 B suggest that the identifiability of shape parameter a should deteriorate as this parameter increases. Consistent with this prediction, the asymptotic fit error (Fig. 4 C) is large (>20% rms) for two-site binding curves with small values of a but drops well below typical experimental noise levels when a is close to unity or higher, indicating that the shape parameter is PI in the “small-a” regime and not PI in the “large-a” regime. The transition between the two identifiability regimes (Fig. 4 D, dashed vertical line) occurs close to the critical value, ac, identified in Fig. 4 B. These results suggest that (a) two distinct mechanisms couple changes in the shape parameter magnitude to changes in the binding curve line shape; (b) a switch between the mechanisms occurs at a = ac; (c) “mechanism I,” operative for a < ac, produces relatively large changes in line shape per unit change in a, and because of this strong coupling, the shape parameter is identifiable in this range; and (d) “mechanism II,” operative for a > ac, produces relatively small changes in line shape per unit change in a, and because of this weak coupling, the shape parameter is not identifiable in this range. In the following sections, mathematical analysis of two-site binding curve line shapes identifies the nature of these mechanisms, reveals why they operate over specific ranges of a values, and explains quantitatively why the transition between mechanisms occurs at a = ac = 1/2.
Partial fraction (PF) expansion of two-site binding relation
To clarify the relationship between shape parameter magnitude, binding curve line shape, and binding parameter PI, it is useful to express the two-site binding relation (Eq. 5) in terms of simpler components using a PF expansion:
In Eq. 10, the basis functions for the expansion are two one-site binding curves with association constants q and r. It is tempting to relate q and r to the microscopic affinities of the protein binding sites. However, this relation is not always correct, and for our purposes, it is unnecessary to attribute any particular mechanistic significance to q and r. Combining the terms on the right side of Eq. 10 yields
The equality in Eq. 11 holds if the coefficients of like powers of x are equal, so that
and
Solving for r in Eq. 12a and substituting this expression into Eq. 12b yields a quadratic equation (the two-site “PF polynomial”):
The PF expansion constants q and r for any two-site binding curve specified by parameters {p1, a} are the roots of the corresponding PF polynomial. Because Eq. 13 has solutions for all real values of p1 and a, it follows that every two-site binding relation can be expanded exactly using this mathematically equivalent form (Eq. 10). Furthermore, binding curves with distinct parameters {p1, a} have unique PF components that specify their underlying structure.
Fig. 5 (A–D, left) shows the PF component curves (red and blue traces) corresponding to the two-site binding curves in Fig. 4 A. The reason for choosing one-site binding curves as the basis functions for the PF expansion is that all one-site binding curves have the same line shape. Thus, the essential information obtained from the expansion is conveyed in the simplified form of “PF spectra,” which depict the midpoints xr = 1/r and xq = 1/q of the PF components as vertical lines (Fig. 5, A–D, right).
Critical value of shape parameter a
The shape parameter can be expressed explicitly in terms of the one-site PF constants by substituting Eq. 12a into Eq. 12b:
Eq. 14 can be rewritten in the equivalent form
where GM and AM represent the geometric and arithmetic means of q and r, respectively. An important inequality of classical analysis requires GM ≤ AM for any two real, positive numbers (Hardy et al., 1934), with equality applying when the two numbers are equal. Applying this inequality to Eq. 15 indicates that a ≤ 1/2 for real values of q and r and that a = 1/2 when q = r. This value is equal to the critical value of the shape parameter, ac, deduced above from line shape (Fig. 4 B) and parameter identifiability (Fig. 4 D) considerations.
How are values of a > 1/2 achieved? The nature of the roots of a polynomial are given by its discriminant, D (Gelfand et al., 1994). The roots are real and unequal if D > 0, real and equal if D = 0, and complex if D < 0. The discriminant of the two-site PF polynomial (Eq. 13) is
Eq. 16 indicates that shape parameter values larger than the critical value of 1/2 are only possible if q and r are complex. In fact, because the coefficients of the PF polynomial are real numbers derived from real physical quantities, q and r must be complex conjugates for a > 1/2 (Uspensky, 1948):
and
where i represents the square root of −1 and the real numbers qreal and qimag are the real and imaginary parts of q and r.
We have demonstrated a striking correlation between the nature of the PF expansion constants and the identifiability of two-site binding parameters. In the range 0 < a < ac, q and r are real and distinct, and the binding parameters p1 and a are identifiable, whereas in the range a > ac, q and r are complex conjugates, and the binding parameters are not identifiable (Figs. 3 and 4). The next two sections describe how the mathematical nature (real versus complex) of the PF constants for a binding curve determines the identifiability of the curve’s binding parameters.
Strong-coupling mechanism (mechanism I) and real-valued PF components
The PF component spectra in Fig. 5 (A–F, right) provide a visual representation of the relationship between shape parameter magnitude and binding curve line shape. When the midpoints of the PF components are plotted against the logarithm of ligand concentration, the spacing between them, Δ1, is given by the ratio of the one-site association constants,
because log(Δ1) = log(r/q) = log(xq/xr) = log(xq) − log(xr). (Without loss of generality, it is assumed that r ≥ q, so that Δ1 ≥ 1.) Substituting Eq. 18 into Eq. 14 yields
Eq. 19 is the mathematical representation of mechanism I, the coupling between the shape parameter and the binding curve line shape for a < ac. The shape parameter is increased from 0.001 (Fig. 5 A, right) to 1/2 (Fig. 5 D, right) by decreasing Δ1 from 106 to 1. In other words, for values below the critical value, the shape parameter is increased by moving the PF components closer together until they overlap at a = ac. Because the change in line shape per unit change in the shape parameter is large (compare curves for a < 1/2 in Fig. 4 A), the coupling is strong, and the shape parameter is identifiable in this range.
Weak-coupling mechanism (mechanism II) and complex-valued PF components
For a > ac, the PF constants are complex conjugates, and the binding curve cannot be decomposed into two real-valued one-site components. Thus, there is no spacing between components as there is when the PF constants are real-valued, and so mechanism I is not operative. Instead, we define the real number Γ as the ratio
Eq. 21 is the mathematical representation of mechanism II, the coupling between the shape parameter and the binding curve line shape for a > ac. For example, the shape parameter is increased from 1/2 (Fig. 5 D, right) to 100 (Fig. 5 F, right) by increasing the ratio of the imaginary to the real components of q and r from 0 to 200. Because the change in line shape per unit change in the shape parameter is relatively small (compare curves for a > 1/2 in Fig. 4 A), the coupling is weak, and the binding parameters are not identifiable in this range. Because the main effect of Γ is to (slightly) alter the steepness of two-site binding curves, we will refer to it later as a slope factor.
Interpretation of complex-valued PF constants
For a > ac, the PF constants q and r are complex conjugates (Eqs. 17a and 17b), and hence the one-site PF component curves cannot be represented on a physically meaningful (i.e., real-valued) concentration axis. Our interpretation is that two-site binding curves with shape parameters in this range cannot be decomposed meaningfully into simpler components (Fig. 5, E and F). Consistent with this idea, two-site curves with a > 1/2 have the appearance of single components, with no indication of underlying resolved structure (Fig. 4 A). It is worth noting that there is no inconsistency between the complex-valued one-site components q and r and the real-valued two-site binding parameters p1 and a. The expressions for parameters p1 and a in Eqs. 12a and 12b depend on the sum q + r and the product q r, both of which take on real values when q and r are complex conjugates.
Summary of root-locus and identifiability analyses for two-site binding curves
The practical identifiability of parameters for two-site total binding curves is summarized in Fig. 5 H. Mathematical analysis of Eq. 14 indicates that the line representing all possible values of shape parameter a (or, equivalently, all possible two-site binding curve line shapes) can be divided into two segments at the point a = ac. The mapping of PF constants q and r onto the line of shape parameter values (the “root-locus map”) is simple: q and r are real and distinct in the left segment, real and equal at a = ac, and complex conjugates in the right segment. This map coincides with the identifiability map for the shape parameter, which is PI when smaller than ac and not PI when larger than ac. The differences in identifiability are caused by the quantitatively different effects of the shape parameter on the binding curve line shape in these two regimens. For example, the midpoint slope becomes 50 times steeper when the shape parameter is increased 100-fold from 0.005 to 0.5, but only increases by a factor of 2 when a is increased 100-fold from 0.5 to 50.
Parameter compensations in the fitting of two-site binding curves
It is instructive to examine individual curve fits from fit-error plots to gain insight into the parameter compensations that render binding parameters nonidentifiable. Fig. 6 A (top) shows the shape parameter fit-error plot for a simulated two-site binding curve with a = 100. When a was forced to take on various incorrect values (Fig. 6 A, top, the points labeled A–C and E–H), the nonlinear regression algorithm obtained the best fits by matching the midpoint of the fit to that of the simulated data (Fig. 6 A, second panel). We showed earlier that the midpoint, xh, of a two-site binding curve is given by xh = (1/p2)1/2, which, using Eq. 4, is equivalent to Thus, when a is forced to take on an incorrect value, the midpoint of the fit curve can still be matched perfectly to that of the simulated data curve by varying p1 so that the product of parameters a and p1 remains constant (Fig. 6 A, third and fourth panels). Because this strategy yields excellent fits over a large range of shape parameter values (Fig. 5 A, second panel, points C–H), the shape parameter is not PI for binding curves with a = 100.
The reason why this strategy works is that the midpoint slopes of the simulated data and fit curves are all close to the limiting value of ln(10) (Eq. 9) and do not change much as a is varied around the correct value of 100 (see Fig. 4 B). Only when the shape parameter is forced to take on values close to or below the critical value (as in Fig. 6 A, top, points A and B) do the line shapes of the fit curves deviate significantly from that of the synthetic data curve.
For synthetic data curves with shape parameters smaller than the critical value, such as a = 0.1, the fitting software uses the same strategy of matching the midpoints of the data and fit curves (Fig. 6 B). This strategy is less successful in this case, however, because the midpoint slope changes significantly as a is varied away from the correct value of 0.1 (see Fig. 4 B). As a result, the fit error increases sharply for values of a above and below the correct value (Fig. 6 B, second panel), the asymptotic error is higher than typical experimental noise, and the shape parameter is PI.
Shape and location parameters for three-site total binding curves
The analysis of parameter identifiability for three-site binding curves proceeds similarly as for two-site curves, and starts with the canonical form of the binding relation. For a protein that contains three ligand-binding sites and occupies a single conformation, Eq. 1 simplifies to
This equation can be reparameterized by noting that the units of parameters p1, p2, and p3 are M−1, M−2, and M−3, respectively, so that unitless proportionality constants a and b can be defined by the relations
and
Substituting these relations into Eq. 23 yields
As for two-site curves (Eq. 6), p1 and x always appear together and raised to the same power, so that replacing p1 by k p1 is equivalent to replacing x by k x:
This substitution has the effect of shifting the binding curve along the concentration axis by a factor of log(k) without altering its line shape.
The effects predicted in Eq. 26 are illustrated in Fig. 7 (A–E, left). In each panel, two three-site binding curves were computed for the indicated values of shape parameters a and b. Multiplying p1 for the dashed curves by 1,000 yields curves that are shifted leftward by three log units (solid curves), with no change in line shape. In contrast, increasing parameters a and b changes the curves’ line shapes significantly. Again, simple reparameterizations (Eqs. 24a and 24b) have created two orthogonal classes of binding parameters: p1 is a location parameter that determines the horizontal position of the curves, whereas the two shape parameters a and b determine their line shape.
Approach to understanding parameter identifiability for three-site binding curves
The identifiability of parameters constrained by two-site total binding curves deteriorates as shape parameter a is increased (Fig. 4, C and D). This simple relationship is maintained for location parameter p1 (Fig. 7, A–E, middle) and shape parameter b (Fig. 7, A–E, right, blue curves) for three-site binding curves. However, shape parameter a is not PI for three-site curves with small (Fig. 7 A, right) or large (Fig. 7, D and E, right) values of a and b but is PI for intermediate (Fig. 7, B and C, right) values of a and b.
In the following, we show that this confusing identifiability pattern can be understood using a strategy similar to the one described above for two-site binding curves. Many of the insights gained there will carry over to the analysis of three-site curves. However, new phenomena will be encountered in the three-site case that have no equivalent for two-site curves. Our approach consists of the following steps: (1) the PF expansion is used to decompose three-site binding curves into their simplest components; (2) a root-locus map is computed, in which the regions of the (a, b) shape parameter space are characterized according to whether they yield real- or complex-valued PF components; (3) parameter fit-error curves are computed for numerous binding curves throughout the (a, b) space, and these results are used to create a parameter identifiability map of this space; (4) the identifiability pattern is interpreted in terms of the effects of real versus complex PF components on the binding curve line shapes; and (5) the validity of this interpretation is tested by its ability to explain why parameter compensations are effective in some cases, leading to nonidentifiable parameters, and ineffective in other cases, leading to identifiable parameters.
PF expansion of three-site total binding relation
The canonical three-site total binding curve (Eq. 25) can be expressed as the sum of three one-site binding curves with association constants q, r, and s:
Combining the terms on the right side of Eq. 27 yields
Equating the coefficients of like powers of x in Eq. 28 yields
and
Eliminating variables r and s from Eqs. 29a, 29b, and 29c yields a cubic equation (the three-site PF polynomial):
The association constants q, r, and s of the one-site PF components are the roots of Eq. 30. Because the coefficients in Eq. 30 are products of the real numbers p1, a, and b, there are two possibilities for the three roots of this equation: they are either all real, or else one is real and two are complex conjugates (Uspensky, 1948). The family of PF spectra in Fig. 8 A provide a concrete picture of these abstract scenarios. The PF constants q, r, and s are all real and distinct in subpanels −i, o, and i; they are all real, and two are equal in subpanels −ii and ii; and one is real, and two are complex conjugates in subpanels −iv, −iii, iii, and iv. A subtlety that occurs in the case of three binding sites (but not for two sites) is that there are two distinct ways in which the complex conjugates can be paired: (1) s real and (q, r) complex conjugate (Fig. 8 A, subpanels −iv and −iii) or (2) q real and (r, s) complex conjugate (Fig. 8 A, subpanels iii and iv). We show below that these two pairings behave differently with respect to parameter identifiability.
Root-locus maps and the shape parameter space
We showed above that the universe of all possible two-site line shapes can be represented as points on the line of values of shape parameter a (Fig. 5 H). The critical value of the shape parameter divides this line into two segments and completely specifies the two-site root-locus map. This map provides a simple framework for interpreting the identifiability of two-site binding parameters.
The universe of all possible three-site line shapes can be represented as points in the plane of shape parameter pairs (a, b; Fig. 8 B). In the next several sections, we show how real and complex values of the PF components {q, r, s} project onto the (a, b) parameter space. This root-locus map will be essential for interpreting parameter identifiability for three-site binding curves.
Critical values of shape parameters for three-site binding curves
In this section, we determine the critical values of shape parameters a and b, which are needed to create the root-locus map of the (a, b) parameter space. The quantitative relationship between shape parameter a and the PF constants for three-site binding curves is obtained by solving Eq. 29b for a and substituting Eq. 29a into this expression, yielding
A similar procedure using Eqs. 29a and 29c yields the corresponding relationship for shape parameter b:
Eq. 31b can be rewritten as
Because AM ≥ GM for real, positive values of q, r, and s by the AM-GM inequality (Hardy et al., 1934), Eq. 32 requires that the PF components cannot all be real if b exceeds the critical value bc = 1/3.
The AM-GM inequality is not applicable to Eq. 31a. However, the numerator and denominator of this equation are related to the elementary symmetric polynomials in three variables of degree two and one, respectively (Uspensky, 1948). Maclaurin discovered a family of inequality relations between these polynomials (Cvetkovski, 2012). The inequality relevant to Eq. 31a states that, for real, positive numbers q, r, and s,
with equality applying when the numbers are all equal. Comparison of Eq. 33 with Eq. 31a indicates that the maximum value of a that yields all real values for the set {q, r, s} is the critical value ac = (1/3)1/2. It is worth noting that this critical value is different than the critical value ac = 1/2 for the case of two binding sites (Table 1).
The critical values ac and bc partition the (a, b) shape parameter space into four quadrants (labeled I–IV in Fig. 8 B). Shape parameter pairs in quadrants II, III, and IV correspond to binding curves whose PF components must include a pair of complex conjugate values because one or both of the shape parameters exceed their critical value. However, knowledge of the critical values ac and bc is not sufficient to complete the three-site root-locus map. It remains to establish the nature of the PF constants that map to quadrant I. This question is addressed in the next section.
Mapping real-valued PF constants to quadrant I of the (a, b) shape parameter space
Three-site binding curves whose PF spectra consist of all real values of q, r, and s are completely specified by two spacing factors. The factor Δ2 = s/q represents the distance between the midpoints of the outermost PF components (Fig. 8 A, subpanel ii, red and blue components) because log(Δ2) = log(xq) − log(xs). Similarly, the factor Δ1 = r/q represents the distance between the midpoints of the middle and rightmost one-site components (Fig. 8 A, subpanel i, black and blue components) because log(Δ1) = log(xq) − log(xr). (Without loss of generality, it is assumed that s ≥ r ≥ q, so that Δ2 ≥ Δ1 ≥ 1.) Substituting these relations for Δ1 and Δ2 into Eqs. 31a and 31b yields
and
Eqs. 34a and 34b were used to map the PF spectra in subpanels −i, o, and i of Fig. 8 A onto quadrant I in the (a, b) parameter space (Fig. 8 B), where they are shown as black circles. This process was repeated for many additional values of Δ1 while holding Δ2 constant at 106; the locus of points so obtained is shown by the thin black line in Fig. 8 B.
It should be possible to use Eqs. 34a and 34b to identify all points in quadrant I corresponding to real values of q, r, and s. We will demarcate this region, which contains infinitely many (a, b) pairs, by determining its boundaries, of which there are two. One boundary corresponds to all sets of real PF constants for which r = q (equivalent to Δ1 = 1; see Fig. 8 A, subpanel ii). This boundary is represented by the bold blue curve marked “r = q” in Fig. 8 B, which was calculated using a range of Δ2 values in Eqs. 34a and 34b while holding Δ1 = 1. The second boundary corresponds to all sets of real PF constants for which r = s (equivalent to Δ1 = Δ2; see Fig. 8 A, subpanel ii), and is represented by the bold red curve marked “r = s” in Fig. 8 B. This curve was calculated using a range of Δ2 values in Eqs. 34a and 34b while holding Δ1 = Δ2. The triangular region enclosed by the bold blue and red curves in Fig. 8 B thus represents all (a, b) pairs that yield real and distinct values of q, r, and s.
Mapping complex-valued PF constants to the (a, b) shape parameter space
By default, the area in quadrant I not enclosed by the bold red and blue curves must correspond to (a, b) pairs whose PF expansion contains a pair of complex conjugate association constants. The PF spectra in those cases consists of a one-site component (derived from the real-valued PF association constant) and a two-site component (derived from the pair of complex conjugate association constants). There are two possibilities to consider.
The first possibility is that s is real and q and r are complex conjugates (such as Fig. 8 A, subpanels −iv and −iii). In this case, the midpoint of the one-site PF component is given by xs = 1/s. By combining Eqs. 12b, 17a, 17b, and 22, the midpoint of the two-site component is xqr = 1/(qreal2 + qimag2)1/2. By generalizing the previous definition of the spacing factor to include complex values, we obtain Δ2 = s/|q|, where |q| represents the magnitude of the complex number q and is given by |q| = (qreal2 + qimag2)1/2. Thus, Δ2 represents the separation between the midpoints of the one-site and two-site components because log(Δ2) = log(s/|q|) = log(xqr/xs) = log(xqr) − log(xs).
The spacing factor Δ1 is not useful here because it is always equal to 1 when q and r are complex conjugates. We use instead the slope factor Γ, which is the ratio of the imaginary to the real parts of q and r (Eq. 20). Substituting the above relations for Δ2 and Γ into Eqs. 31a and 31b yields
and
Eqs. 35a and 35b were used to map the PF spectra in Fig. 8 A (subpanels −iv, −iii, and −ii; for which Δ2 = 106) onto the (a, b) parameter space, where they are shown as blue circles (Fig. 8 B) This process was repeated for many additional values of Γ while holding Δ2 constant at 106; the locus of points obtained is shown by the thin blue line in Fig. 8 B.
It is worth noting that in the transition from PF components with all real values of q, r, and s (Eqs. 34a and 34b) to PF components containing a complex conjugate pair (Eqs. 35a and 35b), one degree of freedom has been transformed from a spacing factor, Δ1, to a slope factor, Γ (see Fig. 8 C). We show later that this phenomenon helps to explain the parameter compensations that underlie parameter nonidentifiability.
The second possibility for complex-valued PF constants is that q is real and r and s are complex conjugates (such as Fig. 8 A, subpanels iii and iv):
and
Using the same approach as for the previous case, we define the spacing factor, Δ2 = |s|/q, which again represents the separation between the midpoints of the one-site and two-site components. The spacing factor Δ1 is not useful here because it is always equal to Δ2 when r and s are complex conjugates. We use instead the slope factor Γ, which in this case is defined by Γ = simag/sreal.
and
Eqs. 37a and 37b were used to map the PF spectra in Fig. 8 A (subpanels ii, iii, and iv, for which Δ2 = 106) onto the (a, b) parameter space and are shown as red circles (Fig. 8 B). This process was repeated for many additional values of Γ while holding Δ2 constant at 106; the locus of points obtained is shown by the thin red line in Fig. 8 B.
Contours of constant Δ2 reveal a second level of organization in the (a, b) space
The union of the thin blue, black, and red segments in Fig. 8 B represents a “contour of constant Δ2,” for Δ2 = 106. The pattern of these roots hints at another level of organization in the root-locus map beyond the distinction between real and complex roots that defines the boundaries between quadrants I–IV. In this finer level of organization, complex PF constants segregate into two distinct regions according to whether q and r are complex conjugates (thin blue line) or r and s are complex conjugates (thin red line).
A more complete three-site root-locus map (Fig. 9) was constructed by calculating many additional contours of constant Δ2 for values of Δ2 ranging from 1 to 1021 using Eqs. 34a, 34b, 35a, 35b, 37a, and 37b. Consistent with the pattern in Fig. 8 B, all of the contours of constant Δ2 in Fig. 9 are composed of segments that cluster into three areas. Sets of {q, r, s} that are all real are located inside the red/blue triangle in quadrant I (region A, black), whereas sets for which r and s are complex conjugates cluster to a connected region in the right part of the space (region B, red), and sets for which q and r are complex conjugates cluster to a connected region in the left part of the space (region C, blue). Also shown in Fig. 9 are the values of Δ1 for the points in region A and the values of Γ for the points in regions B and C. We show later that this second level of organization in the root-locus map is functionally significant, because parameter identifiability is different in regions A, B, and C.
Identifiability map for three-site binding curves
The blue, black, and red circles in Fig. 9 represent a sample of 64 points distributed over a large area in the (a, b) parameter space. Binding curves for each pair of (a, b) values are shown at the corresponding locations in Fig. 10 and illustrate the wide variety of line shapes that are possible for three-site binding curves.
We explored parameter PI systematically by computing fit-error curves for parameters p1, a, and b for each of the three-site binding curves in Fig. 10. The values next to each binding curve represent, from top to bottom, the asymptotic errors from the fit-error curves for parameters p1, a, and b. For example, the asymptotic errors for parameters p1, a, and b were 0.16, 9.5 × 10−6, and 0.19 for the binding curve corresponding to (a, b) = (10−4, 10−3). Values are shown in bold text if the asymptotic error was greater than 5% (indicating that the parameter is PI) and in normal text if the asymptotic error was <5% (indicating that the parameter is not PI).
The simple identifiability pattern in Fig. 10 comprises three distinct regions: in quadrants II, III, and IV (dark green), none of the parameters are PI; inside the triangular region of quadrant I (yellow), all three parameters are PI; and in the remaining part of quadrant I (light green), only parameters p1 and b are PI. These results are similar to those for two-site curves (Fig. 5 H) in two respects: (1) all of the parameters are PI in the region of shape parameter space that contains only distinct, real-valued PF components (Fig. 10, yellow shading), and (2) nonidentifiable parameters are only obtained in the regions of shape parameter space that contain complex-valued PF components (Fig. 10, light green and dark green shading). However, the PI map for three-site curves contains an additional subtlety that has no equivalent for two-site curves. In the light green area of quadrant I, only a subset of the parameters are identifiable, whereas in the yellow region of quadrant I, all of the parameters are identifiable. The identifiability pattern in Fig. 10 reflects both levels of organization (quadrants I–IV and regions A–C) present in the root-locus map (Fig. 9).
The effectiveness of parameter compensations explains regions in which all, some, or none of the binding parameters are identifiable
Binding curves in the yellow-shaded region in quadrant I (Fig. 10) have three real, distinct PF roots, and thus three distinct one-site components in their PF expansions (such as in Fig. 8 A, subpanels −i, o, and i). The spacing between these components, quantified by the factors Δ1 and Δ2, completely specifies the line shape of three-site binding curves. Eqs. 34a and 34b indicate that it is not possible to change one of the shape parameters to an incorrect value (as is done during the calculation of a fit-error curve) without altering either Δ1 or Δ2, which in turn would change the line shape and yield a poor-quality fit to the data curve. As a result, the binding parameters are PI in this region.
Upon moving from the yellow region to one of the green regions in Fig. 10, two of the real PF constants are replaced by a complex conjugate pair, which collapses two of the one-site PF components into a single two-site PF component (such as Fig. 8 A, subpanels −iv, −iii, iii, and iv). The binding curve line shape in these regions is determined by the slope of the two-site component (determined by parameter Γ; see Eq. 20), and by the spacing between the two-site component and the remaining one-site component (specified by parameter Δ2).
We noted earlier that changing parameter Γ has small effects (a factor of 2 at most) on the midpoint slope of the two-site PF component. Furthermore, during a curve fit, location parameter p1 can be varied to shift the calculated curve along the abscissa without changing its shape. Thus, to achieve the best fit to binding curves derived from the green areas of Fig. 10, it is only necessary to vary parameters a and b so that Δ2 for the fit and data curves are equal. The key to understanding whether the parameter compensations occurring during data fitting are effective (leading to non-PI parameters) or ineffective (leading to PI parameters) is to understand how Δ2 depends on a and b.
Consider example binding curves from two regions of the (a, b) parameter space with different identifiability properties. The first curve (“curve A”), for which (a, b) = (10, 0.1), maps to quadrant II (circled in Fig. 10). For this curve, parameters p1, a, and b are all not PI. The second curve (“curve B”), for which (a, b) = (10−4, 10−4), maps to the light green region of quadrant I. For this curve, parameters p1 and b are PI, but parameter a is not PI. The spacing factor Δ2 = 106 for curves A and B (note that they are on the same contour of constant Δ2), but the relative positions of their one- and two-site components are reversed. Given this symmetry, it is perhaps surprising that the parameter identifiability for these curves is different. In the following, we show that the distinct identifiability behaviors are explained by differences in how Δ2 depends on a and b in quadrant I compared with quadrant II.
The contours of constant Δ2 in Fig. 9 graphically depict the relationship between Δ2 and the shape parameters a and b throughout the (a, b) parameter space. In region B, the shape parameters are given by Eqs. 37a and 37b. Numerical comparison of factors Δ2 and Γ indicates that Δ2 >> (1 + Γ2)1/2 in quadrant II (which lies entirely within region B). Applying this approximation to Eqs. 37a and 37b yields
and
Combining Eqs. 38a and 38b yields
Eq. 39 indicates that Δ2 remains constant if the ratio of shape parameters a and b remains constant. Indeed, for a wide range of values of parameter a (Fig. 11 A, top, points C–G), the nonlinear regression selects values for parameter b that preserve this ratio (Fig. 11 A, bottom), leading to low-error fits to curve A (Fig. 11 A, middle). Similar arguments apply to the fit-error curve for parameter b (Fig. 11 B, top), except in this case parameter a is varied in response to the linear ramp of b values to maintain a constant ratio a/b (Fig. 11 B, bottom), yielding low-error fits to curve A (Fig. 11 B, middle). These efficient parameter compensations explain why shape parameters a and b are not PI for curve A (or for any other curve from quadrant II).
In region C (Fig. 9), the shape parameters are given by Eqs. 35a and 35b. Numerical examination of factors Δ2 and Γ indicates that (1 + Γ2)1/2 Δ2 >> 1 in the green-shaded part of quadrant I (Fig. 10), which lies entirely within region C (Fig. 9). Applying this approximation to Eq. 35b yields
consistent with the finding that the contours of constant Δ2 are horizontal lines in this region. An asymmetry has been created because Δ2 depends on only one of the shape parameters. The lack of dependence of Δ2 on shape parameter a in Eq. 40 means by definition that parameter a is not identifiable for curve B (Fig. 11 C). However, parameter b is identifiable because varying it changes Δ2, and this effect cannot be compensated for by changing parameter a, as Δ2 does not depend on a (Fig. 11 D).
Extension of the theory to proteins with more than three binding sites
Our approach to assessing binding parameter identifiability is easily extended to systems with more than three binding sites. The analysis starts with the total binding relation (Eq. 1), whose form is conserved for all values of n. This equation can be reparameterized using substitutions of the following form (see Eqs. 24a and 24b):
Here, we have adopted a more flexible nomenclature for the shape parameters, in which the subscript denotes the power of ligand concentration that each shape factor appears with in the binding relation. Thus, shape parameter a becomes a2, b becomes a3, etc. Substituting Eq. 41 into the general n-site total binding relation (Eq. 1) yields
Eq. 42 has the remarkable property that multiplying p1 by a constant k is equivalent to scaling the abscissa by that same constant:
Eq. 43 indicates that parameter p1 is a location parameter that determines the position of the binding curve on the concentration axis, and the (n − 1)-dimensional set {a2, a3, …, an} are shape parameters that determine the binding curve line shape.
The reparameterized n-site binding curve can be decomposed into n one-site components using a PF expansion. The association constants for the one-site components are the roots of the n-site PF polynomial equation (see Eqs. 13 and 30 for examples for two and three sites), which has the general form
where a1 = 1. The association constants of the PF components, obtained by solving Eq. 44, are either all real, or else pairs of them are complex conjugates, with the remainder real.
The shape parameters ak for an n-site system are related in a simple way to the elementary symmetric polynomials (Uspensky, 1948), and explicit formulas for them in terms of the PF constants {q, r, s, …} can be written down by inspection for any value of n. The examples in the second column of Table 1 illustrate the predictable pattern followed by the ak for various values of n.
The critical values of the shape parameters for any value of n are easily computed using the appropriate set of Maclaurin inequalities (Cvetkovski, 2012). These inequalities yield a general formula for the critical value of parameter ak for n sites:
where Cn,k is the binomial coefficient n choose k. Critical values of the shape parameters for two-, three-, and four-site binding curves computed using Eq. 45 are compiled in Table 1.
Mathematical analysis of the PF components can be used to create a root-locus map of the shape parameter space. For example, the PF spectra for n = 4 (Fig. 12 A) represent cases where the PF roots are all real and all are distinct (subpanel o), two are equal (i, iii, and v), three are equal (ii and iv), and two pairs are equal (vi). These diagrams were used to guide the computation of the surface in Fig. 12 B. The black “sheet” in Fig. 12 B represents the locus of (a, b, c) triples corresponding to equal values of the PF constants q and r (Fig. 12 A, v), and was calculated from the shape parameter equations for four sites in the fourth column of Table 1 with Δ1 = 1. Similarly, the red sheet corresponds to PF spectra for which s = t (Fig. 12 A, iii), calculated for Δ2 = Δ3, and the blue sheet to those for which r = s (Fig. 12 A, i), calculated for Δ1 = Δ2. Together, these sheets enclose the region of all (a, b, c) triples whose PF components have distinct, real-valued association constants. This region corresponds to all four-site binding curves with four resolved features. This volume is the four-site equivalent to the triangular area enclosed by bold red and blue curves in Fig. 8 B. Our results for two- and three-site binding curves suggest that all n binding parameters are identifiable for four-site binding curves that map to the volume enclosed by the blue, red, and black sheets in Fig. 12 B.
Root-locus maps for systems with more than four binding sites would be generated in an analogous fashion. An important advantage of using the PF expansion approach is that the complexity of the calculations does not increase rapidly when the number of binding sites is increased, as it would with other methods. For example, the root-locus maps could be computed using the discriminant, but the number of terms in this function increases exponentially with degree; the discriminants for n = 4, 5, and 6 contain 16, 59, and 246 terms (Gelfand et al., 1994). In contrast, the geometric picture provided by PF spectra remain simple as the number of sites increases (Fig. 13), and the equations used to compute the boundaries of the region in the shape parameter space corresponding to all real-valued PF constants remain manageable (see Table 1 for examples for two, three, and four sites).
Discussion
Summary
Many proteins, such as ligand-activated receptors, contain multiple ligand binding sites. Function is understood at a detailed, mechanistic level for only a small number of these molecules. One barrier to progress is the difficulty in reliably quantifying site affinities and cooperative interactions because the uncertainties in parameters estimated by fitting binding data may be very large. For example, an infinite range of parameter values for the Adair–Klotz model yield close fits to the binding curve for CaM (Fig. 1), and hence these data yield almost no insight into the mechanism of calcium binding. We showed that this property is not unique to CaM, but rather is a general phenomenon encountered when parameters are estimated by fitting binding curves (Figs. 3, 4, 6, 7, 10, and 11). Thus, to understand the factors that determine binding parameter identifiability, our approach is to pose the general question, “Under what conditions are the ranges of binding parameter estimates not infinite?” rather than to simply compute the range of parameter values consistent with given data fitted by a specified model.
The infinity of potential models and the infinity of binding curve line shapes argues against using purely numerical approaches to exhaustively compute multidimensional error surfaces. Rather, we exploit features of binding problems that enable the use of analytical methods, which yield general insights into parameter identifiability. First, the binding relations for the universe of all models composed entirely of binding equilibria collapse into a single, conserved functional form for proteins with an arbitrary number of binding sites. In the previous paper (Middendorf and Aldrich, 2017), analysis of the canonical binding relation using matrix algebra revealed simple rules governing binding parameter structural identifiability: at most, n parameters can be quantified reliably by fitting total binding data for a protein with n binding sites. (The number of SI parameters increases to n + 1 for models that also include conformational change of the receptor.) This absolute limit rules out the use of models that include distinct association constants for the n binding sites and explicit cooperative interactions between those sites. Unfortunately, these are the quantities of greatest mechanistic interest! If some of the site affinities can be assumed to be identical (as a result of, for example, symmetry, as in the case of homo-multimeric proteins), then it may be possible to quantify both the affinities and cooperative interactions. In other cases, simplifying assumptions are forced on the investigator. An example of such a compromise is the Adair–Klotz model (Fig. 1 A), which uses n macroscopic association constants that do not distinguish between site affinities and cooperative interactions.
The practical identifiability of binding parameters, the subject of the present paper, is an inherently numerical problem. We assessed binding parameter PI by computing fit-error curves. Analytical tools such as the PF expansion technique and mathematical inequalities were used to guide and interpret these numerical calculations. The results of our hybrid numerical/analytical approach is summarized in identifiability maps (see Figs. 5 H, 10, and 12 B), which indicate that the n SI parameters described above can only be reliably quantified for binding curves containing n resolved components. This result reveals a significant limit to the power of equilibrium binding measurements because the ligand-binding curves for many proteins (including CaM, Fig. 1) do not satisfy this criterion.
Our approach to understanding binding parameter identifiability and the structure of total binding curves
Our analysis of binding parameter identifiability is based on five key ideas. (1) The mathematical form of the total binding relation at equilibrium is conserved for most physically reasonable binding models. (2) The parameters for such models can be converted into model-independent location and shape parameters, which determine the position and line shape, respectively, of the binding curve. (3) Every total binding curve can be decomposed into simpler component curves using a PF expansion. (4) Practical identifiability of binding parameters can be assessed by computing parameter fit-error curves. (5) The practical identifiability of a binding curve’s parameters is ultimately determined by the mathematical nature (real vs. complex) of the constants in its PF expansion, which can be visualized on a root-locus map (as in Figs. 5 H, 9, and 12 B). These ideas are discussed in the following sections.
Conserved form of total binding relation
When biophysical parameters are estimated by fitting models to data, each candidate model under consideration may generate a different equation relating the parameters to the experimental observable. Parameter estimation must then be repeated for each of the (possibly numerous) candidate models. However, two remarkable properties of binding systems can be exploited to simplify this process. First, as we showed in the previous paper (Middendorf and Aldrich, 2017), the parameters of mechanistic models whose unitary transitions comprise any combination of bimolecular ligand–protein association reactions and unimolecular conformational equilibria can be converted into a model-independent, SI parameter set {p1, p2, …, pn}. Second, because ligand binding occurs to a finite, saturable lattice of sites, the total binding relation takes on a canonical form (Eq. 1) for nearly any physically reasonable model.
These observations suggested to us an alternative approach to analyzing binding curves in which the data are only fit once, using the canonical binding equation to estimate the universal, SI parameters described above. If the parameters are PI, then the reliable fit parameter estimates obtained from this one-time fit are easily converted algebraically to parameters of specific mechanistic models. In cases in which parameters are not PI, this fact will be readily established, because any of the fits to the data will yield shape parameters that map to a “non-PI” region of the appropriate identifiability map.
It is worth noting that total binding measurements are not useful for selecting a preferred binding model from a group of candidate models, as all such models for a given number of sites reduce to an equivalent mathematical form (Eq. 1).
Location and shape parameters
The universal, SI parameter set {p1, p2, …, pn} developed in the previous paper (Middendorf and Aldrich, 2017) can be transformed by simple substitutions (Eq. 41) into an alternate, but still SI, parameter set that contains two distinct classes of parameters: location parameter p1 determines a binding curve’s horizontal position (Fig. 3, A–C, left; and Fig. 7, A–E, left), whereas the n − 1 shape parameters determine the curve’s line shape (Figs. 4 A and 10). We also discovered that the dimensionality of the identifiability problem can be reduced from n to n − 1 parameters because the location parameter has no effect on identifiability (Fig. 3, A–C, middle).
PF expansion of n-site total binding curves
The PF expansion decomposes binding curves into their simplest components. Every n-site binding curve can be expressed uniquely and exactly as a sum of n one-site binding curves (see Eqs. 10 and 27). The PF spectrum of a binding curve (Fig. 5, A–F, right; and Fig. 8 A) reveals its essential structural components, much as the Fourier transform reveals the frequency components of a time-dependent signal. The simplicity of PF spectra can be contrasted with the often complicated line shapes of total binding curves (Figs. 3, 4, 7, and 10).
The PF expansion constants (q, r, s, …) are the roots of an n-th order PF polynomial equation (Eq. 44) and are readily computed using common numerical techniques. These roots may all be real, or some may occur as complex conjugate pairs. The latter are only physically meaningful if both members of a pair are merged into a single component with a two-site line shape. Thus, one resolved feature in a binding curve (out of the maximum possible of n) may be lost for each pair of complex conjugate components in its PF expansion. This observation provides a very simple rule for qualitatively assessing binding parameter identifiability: one or more parameters will not be identifiable for binding curves with less than n resolved components. Consistent with this rule, the binding parameters for CaM are not PI because there is no underlying structure resolved in the sigmoidal binding curve for this molecule (Fig. 1).
The PF expansion technique is also useful conceptually: the PF constants (q, r, s, …) act as a “bridge” between the mathematical parameters (p1, a, b, …) of the binding equation and the spacing (Δ1, Δ2, Δ3, …) and slope (Γ) factors that characterize the binding curve line shape (Fig. 8 C).
Parameter fit-error curves
Visualizing error surfaces becomes unmanageable for systems described by more than a few parameters because of the large number of slices through the multidimensional surface that must be examined. Parameter fit-error plots collapse this error surface to two dimensions by plotting the least-squared fit error against a range of values for a single parameter; the remaining n − 1 parameters assume their best-fit values at each point. Moreover, because multiple fit-error curves can be displayed together, the ranges of all parameters consistent with experimental data can be represented compactly in a single, two-dimensional plot (Fig. 1 A, bottom).
Many binding parameter fit-error curves (including those in Figs. 1, 3, 4, 7, and 11) have the remarkable property that there is an infinite range of parameter values over which the fit error is small and essentially constant. If the magnitude of the fit error in this range (the “asymptotic error”) is smaller than the noise in an experimental binding curve, then an infinite range of parameter values will yield equally good fits to the data, indicating that the binding parameter is not identifiable.
Relation of real and complex PF components to binding parameter identifiability
The regions of constant error in parameter fit-error curves are caused by efficient compensations between the parameters, which can be understood by the following arguments. First, consider the case in which such compensations are not possible. The PF decomposition of an n-site binding curve yields a component with a one-site line shape for each real root of its PF polynomial. If all of the roots are real and distinct, then there are n − 1 spacings between the one-site components (Fig. 13). There is only one combination of the n − 1 shape parameters that reproduces the n − 1 spacings between the components in the binding curve correctly, yielding a low-error fit. (See the column labeled “Shape parameters as function of spacing factors” in Table 1 for equations for two-, three-, and four-site curves.) As a result, the parameters are PI when the roots of the PF polynomial are real and distinct.
The behavior is very different when a binding curve’s PF polynomial has complex roots or real roots that are equal. For each pair of complex conjugate roots, two of the one-site PF components are merged into a two-site component, with a loss of one spacing between the features in the binding curve. For each pair of real roots that are equal, two of the one-site PF components overlap perfectly, again with a loss of one spacing between the features in the binding curve. For either of these cases, there are an infinite number of combinations of the n − 1 shape parameters that can reproduce the set of correct spacings because there are fewer than n − 1 spacings. (See the last column in Table 1 for equations for two-, three-, and four-site curves.) By adjusting the location parameter appropriately, a calculated curve with the correct spacings can be brought into proper register with a given data curve, producing a close fit. The result is that one or more of the parameters are not PI when the PF polynomial has one or more pairs of complex roots, or one or more real, equal roots. In the limit that two or more real roots are unequal, but very close in magnitude, the behavior is essentially the same as for equal roots.
Using root-locus maps to understand parameter identifiability
The identifiability of binding parameters can be assessed by calculating parameter fit-error curves. Unfortunately, each set of fit-error curves is specific to a particular binding curve line shape and must be recalculated when a new binding curve is analyzed. It would be preferable to consult an “identifiability map” that classifies points throughout the space of the shape parameters (which correspond to different binding curve line shapes) with respect to identifiability. Such maps were constructed for two-site curves (Fig. 5 H) and three-site curves (Fig. 8 B) using the “brute force” approach of calculating fit-error curves for many different binding curve line shapes. The disadvantage of this approach is that it requires extensive numerical computations.
Our understanding of the causes underlying parameter identifiability (outlined in the previous section) suggests a simpler and more efficient approach. Because parameter identifiability is determined by the nature (real vs. complex) of the roots of the PF polynomial, the identifiability map coincides with a second map that classifies the points in the shape parameter space based on whether their PF roots are all real or contain complex conjugate pairs. This latter map, the root-locus map, is easily calculated. Remarkably, for two-site (Fig. 5 H), three-site (Fig. 8 B), and four-site (Fig. 12 B) systems, a single, connected region contains all sets of shape parameters whose PF roots are real and distinct. Therefore, the region of this space in which all binding parameters are PI can be demarcated by computing just its border. We showed above that the n − 1 sides of this border are calculated by evaluating n − 1 algebraic expressions, such as Eqs. 34a and 34b for three-site systems. Thus, the difficult and time-consuming problem of computing a large number of parameter fit-error curves can be replaced by a small number of simple algebraic calculations.
Our approach is similar to root-locus analysis (Evans, 1948), which is used in control theory to determine the stability of feedback systems. The common idea is that the nature of the roots of an equation modeling a physical system can be used to predict the behavior of the system for all parameter values.
Significance of complex-valued PF association constants
Complex-valued binding affinities were considered previously by Klotz (Klotz, 1993, 1997), who encountered them when factoring binding polynomials (the denominator of Eq. 1). The appellation “ghost affinities” given by Klotz to these purely mathematical constructs reflects their uncertain physical significance.
We have shown that complex values arise often during a similar process, the PF expansion of total binding relations. However, the PF components have a clear physical interpretation as structural elements of binding curves. We suggest that pairs of one-site components with complex conjugate association constants have no physical significance individually, and are only meaningful when combined into a two-site component. The merging of two one-site components into a two-site component is the reverse of the process shown in Eq. 10. Because the complex conjugate PF constants q and r are related to the two-site parameters p1 and a as the sum q + r (Eq. 12a) and the product q r (Eq. 12b), the process of merging the complex-valued one-site components produces a real-valued, physically meaningful two-site component.
Complex affinities, which might at first glance seem to be nonphysical artifacts, arise naturally in binding problems and do not conflict with the physical constraints of such problems. Complex numbers appear in many other physical problems, most notably in the analysis of electrical circuits. The frequent occurrence of complex-valued affinities for proteins (Klotz, 1993, 1997), coupled with our finding that they are indicators of nonidentifiability, suggests that the binding parameters of many, if not most, real proteins will be nonidentifiable when constrained by total binding data. This idea is consistent with the qualitative observation that binding curves for real proteins containing n binding sites often have fewer than n resolved components (such as CaM; Fig. 1 A, middle left).
Identifiability of binding parameters constrained by other types of experimental data
Our findings highlight the magnitude of the challenge in understanding the molecular mechanisms of proteins with multiple ligand binding sites. We expect that for many proteins, techniques that are more powerful than equilibrium total binding measurements will be required to quantify site affinities and cooperative interactions. We are now applying the mathematical tools and approach developed here and in the previous paper (Middendorf and Aldrich, 2017) to site-specific binding (Di Cera, 1995) and kinetic measurements to determine how much reliable, additional information can be obtained from those methods. We note that reliable estimates of some of the Adair–Klotz parameters for calcium binding to CaM (Fig. 1 A, middle left) could in principle be achieved by measuring the limiting slope of the total binding curve (the site occupancy at very low values).
Acknowledgments
The authors wish to thank D. Brent Halling, Suzanna Bennett, and Ben Liebeskind for helpful discussions and critical reading of the manuscript.
This work was supported by National Institutes of Health grant R01-NS077821 to R.W. Aldrich.
The authors declare no competing financial interests.
Christopher Miller served as guest editor.