A critical but often overlooked question in the study of ligands binding to proteins is whether the parameters obtained from analyzing binding data are practically identifiable (PI), i.e., whether the estimates obtained from fitting models to noisy data are accurate and unique. Here we report a general approach to assess and understand binding parameter identifiability, which provides a toolkit to assist experimentalists in the design of binding studies and in the analysis of binding data. The partial fraction (PF) expansion technique is used to decompose binding curves for proteins with n ligand-binding sites exactly and uniquely into n components, each of which has the form of a one-site binding curve. The association constants of the PF component curves, being the roots of an n-th order polynomial, may be real or complex. We demonstrate a fundamental connection between binding parameter identifiability and the nature of these one-site association constants: all binding parameters are identifiable if the constants are all real and distinct; otherwise, at least some of the parameters are not identifiable. The theory is used to construct identifiability maps from which the practical identifiability of binding parameters for any two-, three-, or four-site binding curve can be assessed. Instructions for extending the method to generate identifiability maps for proteins with more than four binding sites are also given. Further analysis of the identifiability maps leads to the simple rule that the maximum number of structurally identifiable binding parameters (shown in the previous paper to be equal to n) will also be PI only if the binding curve line shape contains n resolved components.

The thermodynamic theory of ligands binding to macromolecules containing multiple binding sites is reviewed in several monographs (Poland, 1978; Hill, 1985; Wyman and Gill, 1990; Ben-Naim, 2010). A remarkable feature of the theory is that the mathematical form of the partition function for these systems (also known as the binding polynomial) is conserved for virtually all physically reasonable binding models (Wyman and Gill, 1990). The total binding isotherm, which is derived from the partition function, also takes on a conserved form (see companion article Middendorf and Aldrich in this issue). For all models consisting entirely of bimolecular association reactions and conformational equilibria, the canonical form of the total binding relation is

(1)

where n is the number of ligand-binding sites on the protein and v is the mean number of occupied sites at free ligand concentration x (Middendorf and Aldrich, 2017). The parameters {p0, p1, …, pn} in Eq. 1 are functions of molecular properties such as the association constants of the ligands for the binding sites on the protein, the quantitative effect of binding at a subset of the sites on binding at other sites (cooperativity), and protein conformational equilibrium constants. (Parameter p0 in Eq. 1 is zero if the protein is assumed to occupy a single conformation.) Eq. 1 applies specifically to total binding curves, which are obtained from binding measurements that detect the net occupancy of all binding sites in a protein without specifying the occupancies of the individual sites.

The binding parameters in Eq. 1 cannot be measured directly, but rather are quantified by fitting models to binding data. Knowledge of the parameter values can yield meaningful insights into mechanism, but only if the parameters are identifiable, meaning that the fitted estimates of them are accurate and unique. This problem has been addressed previously (Colquhoun, 1969; Reich and Zinke, 1974; Reich et al., 1974a,b). However, the lack of a general understanding of the factors determining binding parameter identifiability has hampered the development and validation of quantitative models for molecular systems that are governed by multiple, coupled equilibria, such as ligand-activated receptors.

In our companion paper (Middendorf and Aldrich, 2017), we showed that the conserved form of the total binding relation (Eq. 1) implies that simple, general rules govern binding parameter structural identifiability: equilibrium total binding data constrain n structurally identifiable (SI) parameters if the binding model consists only of bimolecular binding reactions and n + 1 SI parameters for binding models comprising any combination of binding and conformational equilibria. Structural identifiability is a “best-case” scenario that assumes that the binding data contain no noise or systematic artifacts (Bellman and Åström, 1970). Parameter estimation fails for models containing more than the number of SI parameters, as infinite ranges of parameter values yield perfect fits to any binding curve line shape (see Fig. 1 of Middendorf and Aldrich [2017]).

Noise in experimental data creates an additional barrier to obtaining meaningful parameter estimates (Ljung, 1987; Walter and Pronzato, 1997). Parameters are practically identifiable (PI) if they can be determined accurately and with acceptable precision under these more stringent, real-world experimental conditions (Raue et al., 2009). Here, we develop a technique for deconstructing binding curves into their simplest components. The method is the basis of a general approach for assessing and understanding the practical identifiability of binding parameters for proteins with any number of binding sites.

An important goal of this work is to provide tools to investigators that are simple to implement and have broad applicability to the design of binding experiments and to the analysis of binding data. Our approach is essentially model independent, so that the identifiability assessment need only be performed once for a given binding curve rather than being repeated for each candidate model under consideration. Identifiability maps, which classify the identifiability of binding parameters for all possible binding curve line shapes for a specified number (n) of binding sites, are constructed for the specific cases of n = 2, 3, and 4. Instructions are provided for extending the theory to generate these maps for systems with n > 4. In situations in which the parameters are not PI, this knowledge is useful for guiding the design of alternative experimental approaches. We also introduce a useful method for displaying parameter uncertainties that avoids many of the problems normally encountered when visualizing error surfaces for systems with more than a few parameters.

All numerical computations were performed using Igor Pro version 6.37 (WaveMetrics). Parameter fit-error curves (Figs. 1, 3, 4, 6, 7, and 11) were generated from nonlinear least-squares fits to synthetic binding curves that used the Levenberg–Marquardt algorithm.

### Nonidentifiability of calmodulin (CaM) binding parameters

An example of a protein with nonidentifiable total binding parameters is CaM, which contains four nonidentical EF-hand calcium-binding sites. Many published studies have used a four-site Adair–Klotz model (Fig. 1 A, top) to fit the Ca2+ total binding curve for CaM. The model parameters K1K4 are macroscopic equilibrium association constants that quantify ligand binding to receptors containing 0–3 bound ligands, respectively. The model assumes a single protein conformation and does not distinguish between the individual binding sites in the protein nor the multiple ligation states with a given total number of bound ligands. For example, R2 in Fig. 1 A (top) represents all six ligation states containing two bound calcium ions. The total binding relation for this model is which has the form of Eq. 1 with p0 = 0, p1 = K1, p2 = K1K2, p3 = K1K2K3, and p4 = K1K2K3K4.

Figure 1.

Evidence that calcium total binding parameters for CaM are not PI. (A, top) Four-site, sequential Adair–Klotz model. Ri represents CaM with i Ca2+ ions bound; K1K4 are macroscopic association equilibrium constants. (A, middle left) Solid curve represents total fraction of binding sites occupied by ligand (Y), computed using Adair parameters reported in Linse et al. (1991). The function Y (with a maximum value of 1) is the normalized version of the function v (maximum value of 2) appearing in Eq. 2. Parameters are (in M−1): {K1, K2, K3, K4} = {79,433, 3.98 × 106, 25,119, 398,107}. Circles represent total binding curve computed using parameter set {160,344, 1.85 × 106, 37,678, 287,086}. (A, middle right) Same as A (middle left), except triangles represent total binding curve computed using parameter set {50.3, 6.6 × 109, 15,093, 617,904}. (A, bottom) Parameter fit-error plots. Each curve represents the results of 300 curve fits in which K1, K2, K3, or K4 was held at one of 300 values ranging from 10 to 1010 M−1. At each of these values, the chosen K was held constant while the other association constants were varied freely to achieve the best least-squares fit to the synthetic data (solid lines in middle graphs) using nonlinear regression. Curves depict rms error in best fit for systematic variation of K1 (red), K2 (blue), K3 (black), or K4 (green). Dashed line shows 5% rms error threshold. (B) Effect of noise on parameter estimates. (top left) Each curve is a histogram representing the distribution of estimates of parameter K1 obtained from 1,000 separate nonlinear regression fits in which all four parameters were allowed to vary freely. For each fit, random, Gaussian-distributed noise with a standard deviation of 0.05 (corresponding to 5% rms noise) was added to the noiseless, synthetic data (solid curve in middle graphs of A). Each histogram corresponds to a different starting value for K1. Starting values are indicated by inverted arrows. (B, top right and bottom left and right) Same as top left graph of B, except the histograms represent distributions of estimates for parameter K2, K3, and K4, respectively, for different starting values of these parameters (indicated by inverted arrows). Dashed vertical lines represent “correct” values of parameters (taken from Linse et al. [1991]).

Figure 1.

Evidence that calcium total binding parameters for CaM are not PI. (A, top) Four-site, sequential Adair–Klotz model. Ri represents CaM with i Ca2+ ions bound; K1K4 are macroscopic association equilibrium constants. (A, middle left) Solid curve represents total fraction of binding sites occupied by ligand (Y), computed using Adair parameters reported in Linse et al. (1991). The function Y (with a maximum value of 1) is the normalized version of the function v (maximum value of 2) appearing in Eq. 2. Parameters are (in M−1): {K1, K2, K3, K4} = {79,433, 3.98 × 106, 25,119, 398,107}. Circles represent total binding curve computed using parameter set {160,344, 1.85 × 106, 37,678, 287,086}. (A, middle right) Same as A (middle left), except triangles represent total binding curve computed using parameter set {50.3, 6.6 × 109, 15,093, 617,904}. (A, bottom) Parameter fit-error plots. Each curve represents the results of 300 curve fits in which K1, K2, K3, or K4 was held at one of 300 values ranging from 10 to 1010 M−1. At each of these values, the chosen K was held constant while the other association constants were varied freely to achieve the best least-squares fit to the synthetic data (solid lines in middle graphs) using nonlinear regression. Curves depict rms error in best fit for systematic variation of K1 (red), K2 (blue), K3 (black), or K4 (green). Dashed line shows 5% rms error threshold. (B) Effect of noise on parameter estimates. (top left) Each curve is a histogram representing the distribution of estimates of parameter K1 obtained from 1,000 separate nonlinear regression fits in which all four parameters were allowed to vary freely. For each fit, random, Gaussian-distributed noise with a standard deviation of 0.05 (corresponding to 5% rms noise) was added to the noiseless, synthetic data (solid curve in middle graphs of A). Each histogram corresponds to a different starting value for K1. Starting values are indicated by inverted arrows. (B, top right and bottom left and right) Same as top left graph of B, except the histograms represent distributions of estimates for parameter K2, K3, and K4, respectively, for different starting values of these parameters (indicated by inverted arrows). Dashed vertical lines represent “correct” values of parameters (taken from Linse et al. [1991]).

Close modal
(2)

The previous paper (Middendorf and Aldrich, 2017) showed, based on structural identifiability considerations, that total binding data for a protein with n sites constrains a maximum of n binding parameters (if a single conformation of the protein is assumed). This is an important limitation because it excludes single-conformation models with distinct affinity parameters for the n binding sites and any additional, explicit cooperativity factors for quantifying interactions between the sites. All such models require more than n parameters. The Adair–Klotz model avoids this problem, but at a cost in mechanistic insight: parameters K1K4 do not distinguish between affinity and cooperativity.

To explore practical identifiability of the Adair–Klotz binding parameters for CaM, a synthetic, noiseless total binding curve (Fig. 1 A, middle left and right, solid curves) was generated using Eq. 2 and the parameter estimates from one study (Linse et al., 1991). Curves nearly identical to this reference curve were obtained using other combinations of parameters with values very different from those in the reference set. For example, the curves represented by circles in Fig. 1 A (middle left) and triangles in Fig. 1 A (middle right) were calculated using comparable values of parameters K3 and K4, but K1 was >3,000 times larger, and K2 was >3,500 times smaller, for the circles compared with the triangles. The considerable flexibility in these parameter values indicates that the Adair–Klotz parameters are likely not PI when constrained by CaM total binding data.

Parameter identifiability for CaM was studied systematically by quantifying the full range of parameter values consistent with the synthetic binding data. These ranges are represented compactly in parameter fit-error curves. For example, the blue curve labeled K2 in Fig. 1 A (bottom) depicts the results of 300 curve fits. For each fit, K2 was held constant at a value in the range 10 M−1K2 ≤ 1010 M−1 (indicated on the abscissa), and parameters K1, K3, and K4 were adjusted to yield the best least-squares fit to the synthetic data using nonlinear regression. The ordinate plots the root mean square (rms) error for the fits. (Similar approaches were used in Colquhoun and Ogden [1988] and Solc and Aldrich [1990].) The minimum in the blue curve (which is not visible because the fit error of zero cannot be shown on the logarithmically scaled ordinate) corresponds to the point at which K2 was fixed at the “correct” value (i.e., the value used to generate the synthetic data). The fit error increased when K2 was held constant at a value above or below the correct value. A remarkable feature of this curve is the large range of K2 values over which the fit error maintains a small, nearly constant value. In fact, the abscissa in Fig. 1 A (bottom) is truncated for display purposes at an upper limit of K2 = 1010 M−1; the magnitude of the “asymptotic error” (blue horizontal arrow) remains nearly constant for all K2 > 1010 M−1. Asymptotes with small, nearly constant error extending over an infinite range of parameter values were also present in the fit-error curves for parameters K1 (red curve), K3 (black curve), and K4 (green curve; Fig. 1 A, bottom).

The noise in an experimental binding curve will mask the true value of a parameter if it is larger than the asymptotic error in that parameter’s fit-error curve. In such cases, the parameter is not PI because an infinite range of values yield indistinguishably good fits to the data. Parameters K1K4 for CaM are not PI because the typical noise in experimental binding curves (∼5%, dashed black line; Stefan et al., 2009) is much larger than the asymptotic errors for these parameters (∼0.27% rms for K1 and K2 and ∼0.48% rms for K3 and K4; Fig. 1 A, bottom). For this example, the only constraints on the parameter values are a lower limit of 2 × 104 M−1 for K4 and an upper limit of 4 × 106 M−1 for K1, based on the intersections of the corresponding fit-error curves with the assumed noise threshold. Typical total binding data would place no constraints on the values of K2 and K3 because the fit errors are <5% for all values of these parameters.

Fig. 2 shows how this enormous flexibility in parameter estimates is achieved: as a given parameter is stepped through a range of values, the other parameters undergo systematic variations that compensate for the incorrect value of the given parameter to produce close fits to the data. A major goal of this paper is to understand the mechanisms of these extremely effective compensations because this knowledge may contribute to the design of experiments that provide more powerful constraints on the binding parameters.

Figure 2.

Parameter compensations in fitting of synthetic CaM binding data. (A–D) Values of best-fit values of Adair–Klotz parameters, taken from fits used to generate parameter fit-error curves in Fig. 1 A (bottom). (A) Best-fit values of parameters K2K4 as parameter K1 (shown in bold) was varied systematically from 10 to 1010 M−1 (corresponding to red curve labeled K1 in Fig. 1 A, bottom). (B–D) Similar to A, except that parameters K2 (B), K3 (C), or K4 (D) were held constant at values in the range from 10 to 1010 M−1 while the other parameters were freely varied to achieve best least-squares fit to synthetic binding curve in Fig. 1 A (middle left).

Figure 2.

Parameter compensations in fitting of synthetic CaM binding data. (A–D) Values of best-fit values of Adair–Klotz parameters, taken from fits used to generate parameter fit-error curves in Fig. 1 A (bottom). (A) Best-fit values of parameters K2K4 as parameter K1 (shown in bold) was varied systematically from 10 to 1010 M−1 (corresponding to red curve labeled K1 in Fig. 1 A, bottom). (B–D) Similar to A, except that parameters K2 (B), K3 (C), or K4 (D) were held constant at values in the range from 10 to 1010 M−1 while the other parameters were freely varied to achieve best least-squares fit to synthetic binding curve in Fig. 1 A (middle left).

Close modal

To mimic experimental data more realistically, Gaussian noise with a standard deviation of 0.05 (equivalent to 5% rms noise) was added to the noiseless solid curve in Fig. 1 A (middle left). Each arrow in Fig. 1 B (top left) shows the starting guess for parameter K1 used in fitting 1,000 of such noisy simulated binding curves; the curves below the arrows represent the distribution of estimates for K1 obtained from the fits. It is noteworthy that (a) the starting guesses vary over many orders of magnitude, yet the peaks of the distributions are generally very close to the initial guess; (b) the peaks of the distributions are often very different from the “correct” value of the parameter (indicated by the dashed vertical line); and (c) the distributions are generally very narrow. Similar results were obtained for parameters K2K4 (Fig. 1 B, top right and bottom left and right). These results lead to the same conclusion as the fit-error plots in Fig. 1 A (bottom): the Adair–Klotz parameters K1K4 are not PI when constrained by CaM total binding data.

The narrow parameter distributions in Fig. 1 B (all graphs) are misleading, as the individual distributions give the incorrect impression that the estimates are precise and reliable. This observation may partially explain the wide range of values (15-, 6-, 101-, and 66-fold for K1K4, respectively) reported for CaM binding parameters by different groups. These estimates are summarized in Stefan et al. (2009). These wide ranges are in fact a vast underestimate of the true uncertainties in the binding parameter values, which are infinite (Fig. 1 A, bottom), and thus provide little insight into the mechanism of ligand binding to CaM. The asymptotic errors in the fit-error curves for parameters K1K4 are so low (tenths of a percentage; Fig. 1 A, bottom) that it is unlikely that experimental refinements could improve the quality of binding data sufficiently to make the CaM binding parameters identifiable. Thus, even well-executed total binding studies (Stefan et al., 2009) are insufficient to overcome the inherently low constraining power of total binding measurements for this system. The inescapable conclusions of this analysis are that parameter estimates obtained from fitting CaM total binding data are meaningless and that reliable estimates are only possible if more powerful experimental approaches are used.

A natural question arises concerning the generality of the results in Fig. 1: is CaM an atypical, pathological case, or do binding curves for real proteins frequently lack the power to provide meaningful constraints on binding parameters? To answer this question satisfactorily, it is necessary to analyze binding curves for proteins with different numbers of binding sites (specified by the factor n in Eq. 1) and, further, to examine all possible line shapes for each value of n. In the following, binding parameter identifiability is investigated systematically, starting with two-site binding curves.

### Shape and location parameters for two-site binding curves

For a protein that contains two ligand-binding sites and occupies a single conformation, Eq. 1 simplifies to

(3)

Because the units of p1 and p2 are M−1 and M−2, respectively, a unitless proportionality constant, a, can be defined by the relation

$p 2 =a 2 p 1 2 .$
(4)

Substituting this expression into Eq. 3 yields the reparameterized binding relation

$v=p 1 x+2a 2 p 1 2 x 2 1+p 1 x+a 2 p 1 2 x 2 .$
(5)

Because p1 and x always appear together and raised to the same power in Eq. 5, multiplying p1 by a constant, k, is equivalent to scaling the concentration axis by the same constant:

$( kp 1 )x+2a 2 ( kp 1 ) 2 x 2 1+( kp 1 )x+a 2 ( kp 1 ) 2 x 2 =p 1 ( kx )+2a 2 p 1 2 ( kx ) 2 1+p 1 ( kx )+a 2 p 1 2 ( kx ) 2 .$
(6)

Two important conclusions can be drawn from this analysis: if the bound fraction is plotted against the logarithm of ligand concentration, then (1) multiplying p1 by a factor k is equivalent to shifting the binding curve horizontally by an amount log(k) because log(kx) = log(k) + log(x); and (2) varying p1 has no effect on the shape of the binding curve. These effects are illustrated in Fig. 3 (A–C, left). In each panel, two two-site binding curves were computed for the indicated value of parameter a. Multiplying p1 for the dashed curves by 1,000 yields solid curves that are shifted leftward by three log units, with no change in line shape. In contrast, increasing parameter a 1,000-fold from 0.0025 (Fig. 3 A, left) to 2.5 (Fig. 3 C, left) changes the line shapes of the curves significantly.

Figure 3.

Practical identifiability of two-site binding curves depends on shape parameter a but not on location parameter p1. (A–C, left) Simulated two-site total binding curves for indicated values of shape parameter a. Location parameter p1 was equal to (in M−1) 1010 (A, solid curve), 107 (A, dashed curve), 108 (B, solid curve), 105 (B, dashed curve), 108 (C, solid curve), and 105 (C, dashed curve). (A–C, middle) Fit-error plots for location parameter for binding curves in A–C (left). (A–C, right) Shape parameter fit-error plots for binding curves in A–C (left). Dashed horizontal lines in A–C (middle and right) denote 5% error threshold. Binding parameters are denoted “PI” if the asymptotic value of the rms fit error is ≥5% and “NOT PI” if <5%.

Figure 3.

Practical identifiability of two-site binding curves depends on shape parameter a but not on location parameter p1. (A–C, left) Simulated two-site total binding curves for indicated values of shape parameter a. Location parameter p1 was equal to (in M−1) 1010 (A, solid curve), 107 (A, dashed curve), 108 (B, solid curve), 105 (B, dashed curve), 108 (C, solid curve), and 105 (C, dashed curve). (A–C, middle) Fit-error plots for location parameter for binding curves in A–C (left). (A–C, right) Shape parameter fit-error plots for binding curves in A–C (left). Dashed horizontal lines in A–C (middle and right) denote 5% error threshold. Binding parameters are denoted “PI” if the asymptotic value of the rms fit error is ≥5% and “NOT PI” if <5%.

Close modal

A simple substitution (Eq. 4) has separated the binding parameters into two classes that are “orthogonal” in the sense that parameters from each class have distinct effects on two-site binding curves: p1 is a location parameter that determines the position of a curve on the concentration axis without affecting its shape, whereas a is a unitless shape parameter that determines the curve’s line shape.

### Practical identifiability of parameters for two-site binding curves

The two parameter classes are also orthogonal with respect to parameter identifiability. Increasing p1 by a factor of 1,000 does not change the identifiability of parameters p1 (Fig. 3, A–C, middle, solid vs. dashed curves) or a (Fig. 3, A–C, right, solid vs. dashed curves). In contrast, if the shape parameter is increased by the same factor of 1,000, the binding curve line shape changes in such a way that both parameters, which were initially identifiable (Fig. 3 A, middle and right), become nonidentifiable (Fig. 3 C, middle and right).

The results in Fig. 3 simplify the task of understanding parameter identifiability. It is unnecessary to consider the infinite number of parameter combinations {p1, a} that give rise to all possible two-site binding curves. Rather, the universe of possible line shapes is confined to a single axis representing the values of the continuous variable a.

Fig. 4 A shows the effect of parameter a on the line shape of two-site binding curves. To facilitate the comparison, the value of p1 for each curve was adjusted so that the midpoints are all equal to 10−8 M. The curves exhibit two clearly resolved components for small values of a. As a is increased, the resolution between these components is decreased until the curve appears as a single component for shape parameter values in the range 0.1 < a < 1. Above this range, increasing a has almost no effect on the line shape.

Figure 4.

Effects of shape parameter a on line shape and parameter identifiability. (A) Two-site binding curves for various values of shape parameter a. To facilitate comparison of the structure of the curves, the location parameter was adjusted for all curves so that the midpoints equal 10−8 M. (B) Slope of two-site binding curves as function of shape parameter a (see Eq. 9). Thin dashed lines are extrapolations of the slope at small and large values of the shape parameter. Heavy vertical dashed line indicates that the slope changes at a critical value of the shape parameter, ac, equal to 1/2. (C) Parameter error plots for two-site binding curves with shape parameter values in the range 10−5a ≤ 1,000. (D) Asymptotic fit error as a function of shape parameter a. Thin dashed lines are extrapolations of the asymptotic error for small and large values of the shape parameter. Heavy dashed vertical line indicates that the asymptotic error undergoes a transition near the critical value of a shown in B.

Figure 4.

Effects of shape parameter a on line shape and parameter identifiability. (A) Two-site binding curves for various values of shape parameter a. To facilitate comparison of the structure of the curves, the location parameter was adjusted for all curves so that the midpoints equal 10−8 M. (B) Slope of two-site binding curves as function of shape parameter a (see Eq. 9). Thin dashed lines are extrapolations of the slope at small and large values of the shape parameter. Heavy vertical dashed line indicates that the slope changes at a critical value of the shape parameter, ac, equal to 1/2. (C) Parameter error plots for two-site binding curves with shape parameter values in the range 10−5a ≤ 1,000. (D) Asymptotic fit error as a function of shape parameter a. Thin dashed lines are extrapolations of the asymptotic error for small and large values of the shape parameter. Heavy dashed vertical line indicates that the asymptotic error undergoes a transition near the critical value of a shown in B.

Close modal

A convenient metric for characterizing binding curve line shapes is their midpoint slope, which can be computed from Eq. 3 as follows. The logarithmic derivative of the binding relation is given by

$∂v ∂( logx ) =xln( 10 )∂v ∂x ,$
(7)

where the substitution logx = (lnx)/(ln10) was used. The regular derivative of the binding relation is given by

(8)

The ligand concentration at which the binding curve reaches half-saturation, xh, is derived by setting Eq. 3 equal to 1 and is given by xh = (1/p2)1/2. Substituting Eq. 8 into Eq. 7 and evaluating the resulting expression at xh yields the midpoint slope

(9)

where Eq. 4 was used to simplify the expression.

As predicted above, the midpoint slope depends on shape parameter a but not on location parameter p1. For small values of a, the midpoint slope increases sharply as a is increased (Fig. 4 B, solid curve). For large values of a, the midpoint slope is much less sensitive to increases in a and approaches a limiting value of ln(10). The transition between these two regimens occurs at a critical value of the shape parameter, ac, equal to 1/2 (Fig. 4 B, dashed vertical line).

The uncertainty in a parameter estimated by fitting experimental data should become large if there are conditions for which the experimental observable (the binding curve line shape in our case) becomes insensitive to changes in that parameter. For example, the results in Fig. 4 B suggest that the identifiability of shape parameter a should deteriorate as this parameter increases. Consistent with this prediction, the asymptotic fit error (Fig. 4 C) is large (>20% rms) for two-site binding curves with small values of a but drops well below typical experimental noise levels when a is close to unity or higher, indicating that the shape parameter is PI in the “small-a” regime and not PI in the “large-a” regime. The transition between the two identifiability regimes (Fig. 4 D, dashed vertical line) occurs close to the critical value, ac, identified in Fig. 4 B. These results suggest that (a) two distinct mechanisms couple changes in the shape parameter magnitude to changes in the binding curve line shape; (b) a switch between the mechanisms occurs at a = ac; (c) “mechanism I,” operative for a < ac, produces relatively large changes in line shape per unit change in a, and because of this strong coupling, the shape parameter is identifiable in this range; and (d) “mechanism II,” operative for a > ac, produces relatively small changes in line shape per unit change in a, and because of this weak coupling, the shape parameter is not identifiable in this range. In the following sections, mathematical analysis of two-site binding curve line shapes identifies the nature of these mechanisms, reveals why they operate over specific ranges of a values, and explains quantitatively why the transition between mechanisms occurs at a = ac = 1/2.

### Partial fraction (PF) expansion of two-site binding relation

To clarify the relationship between shape parameter magnitude, binding curve line shape, and binding parameter PI, it is useful to express the two-site binding relation (Eq. 5) in terms of simpler components using a PF expansion:

$p 1 x+2a 2 p 1 2 x 2 1+p 1 x+a 2 p 1 2 x 2 =qx 1+qx +rx 1+rx .$
(10)

In Eq. 10, the basis functions for the expansion are two one-site binding curves with association constants q and r. It is tempting to relate q and r to the microscopic affinities of the protein binding sites. However, this relation is not always correct, and for our purposes, it is unnecessary to attribute any particular mechanistic significance to q and r. Combining the terms on the right side of Eq. 10 yields

(11)

The equality in Eq. 11 holds if the coefficients of like powers of x are equal, so that

$p 1 =q+r$
(12a)

and

$a 2 p 1 2 =qr.$
(12b)

Solving for r in Eq. 12a and substituting this expression into Eq. 12b yields a quadratic equation (the two-site “PF polynomial”):

$q 2 −p 1 q+a 2 p 1 2 =0.$
(13)

The PF expansion constants q and r for any two-site binding curve specified by parameters {p1, a} are the roots of the corresponding PF polynomial. Because Eq. 13 has solutions for all real values of p1 and a, it follows that every two-site binding relation can be expanded exactly using this mathematically equivalent form (Eq. 10). Furthermore, binding curves with distinct parameters {p1, a} have unique PF components that specify their underlying structure.

Fig. 5 (A–D, left) shows the PF component curves (red and blue traces) corresponding to the two-site binding curves in Fig. 4 A. The reason for choosing one-site binding curves as the basis functions for the PF expansion is that all one-site binding curves have the same line shape. Thus, the essential information obtained from the expansion is conveyed in the simplified form of “PF spectra,” which depict the midpoints xr = 1/r and xq = 1/q of the PF components as vertical lines (Fig. 5, A–D, right).

Figure 5.

PF decomposition of two-site total binding curves. (A–F, left) Black curves are two-site binding curves computed for values of shape parameter equal to 0.001 (A), 0.01 (B), 0.1 (C), 0.5 (D), 10 (E), and 100 (F). Red and blue curves represent the one-site PF component curves whose sum is equal to the two-site (black) curve. Striped curves in E and F (left) represent two-site curves that cannot be decomposed into real-valued PF components because the association constants q and r are complex conjugates. (A–F, right) PF spectra showing midpoints xr (red) and xq (blue) of one-site curves from A–D (left) as vertical lines. Midpoints of one-site curves are given by the reciprocals of the association constants: xr = 1/r and xq = 1/q. Striped lines in E and F (right) represent situations in which q and r are complex conjugates; in these cases the midpoint xqr is for the two-site curve. Also shown are the values of the shape parameter a, the spacing parameter Δ1 (see Eq. 18), and the slope parameter Γ (see Eq. 20). (G) Real (left) and imaginary (right) parts of component association constants q (blue) and r (red) as function of shape parameter a. Roman numerals i–vi relate components of q and r to spectra with the same labels in A–F (right). (H) Graphical representation of relation between magnitude of shape parameter, PF component constants q and r, and parameter identifiability for scenarios labeled i–vi in A–F (right). Dashed vertical line depicts critical value of a at which transition occurs between the two mechanisms controlling magnitude of shape parameter.

Figure 5.

PF decomposition of two-site total binding curves. (A–F, left) Black curves are two-site binding curves computed for values of shape parameter equal to 0.001 (A), 0.01 (B), 0.1 (C), 0.5 (D), 10 (E), and 100 (F). Red and blue curves represent the one-site PF component curves whose sum is equal to the two-site (black) curve. Striped curves in E and F (left) represent two-site curves that cannot be decomposed into real-valued PF components because the association constants q and r are complex conjugates. (A–F, right) PF spectra showing midpoints xr (red) and xq (blue) of one-site curves from A–D (left) as vertical lines. Midpoints of one-site curves are given by the reciprocals of the association constants: xr = 1/r and xq = 1/q. Striped lines in E and F (right) represent situations in which q and r are complex conjugates; in these cases the midpoint xqr is for the two-site curve. Also shown are the values of the shape parameter a, the spacing parameter Δ1 (see Eq. 18), and the slope parameter Γ (see Eq. 20). (G) Real (left) and imaginary (right) parts of component association constants q (blue) and r (red) as function of shape parameter a. Roman numerals i–vi relate components of q and r to spectra with the same labels in A–F (right). (H) Graphical representation of relation between magnitude of shape parameter, PF component constants q and r, and parameter identifiability for scenarios labeled i–vi in A–F (right). Dashed vertical line depicts critical value of a at which transition occurs between the two mechanisms controlling magnitude of shape parameter.

Close modal

### Critical value of shape parameter a

The shape parameter can be expressed explicitly in terms of the one-site PF constants by substituting Eq. 12a into Eq. 12b:

$a=( qr ) 1/2 q+r .$
(14)

Eq. 14 can be rewritten in the equivalent form

$a=( 1/2 )GM /AM ,$
(15)

where GM and AM represent the geometric and arithmetic means of q and r, respectively. An important inequality of classical analysis requires GM ≤ AM for any two real, positive numbers (Hardy et al., 1934), with equality applying when the two numbers are equal. Applying this inequality to Eq. 15 indicates that a ≤ 1/2 for real values of q and r and that a = 1/2 when q = r. This value is equal to the critical value of the shape parameter, ac, deduced above from line shape (Fig. 4 B) and parameter identifiability (Fig. 4 D) considerations.

How are values of a > 1/2 achieved? The nature of the roots of a polynomial are given by its discriminant, D (Gelfand et al., 1994). The roots are real and unequal if D > 0, real and equal if D = 0, and complex if D < 0. The discriminant of the two-site PF polynomial (Eq. 13) is

$D=p12(1−4a2).$
(16)

Eq. 16 indicates that shape parameter values larger than the critical value of 1/2 are only possible if q and r are complex. In fact, because the coefficients of the PF polynomial are real numbers derived from real physical quantities, q and r must be complex conjugates for a > 1/2 (Uspensky, 1948):

$q=q real −iq imag$
(17a)

and

$r=q real +iq imag ,$
(17b)

where i represents the square root of −1 and the real numbers qreal and qimag are the real and imaginary parts of q and r.

We have demonstrated a striking correlation between the nature of the PF expansion constants and the identifiability of two-site binding parameters. In the range 0 < a < ac, q and r are real and distinct, and the binding parameters p1 and a are identifiable, whereas in the range a > ac, q and r are complex conjugates, and the binding parameters are not identifiable (Figs. 3 and 4). The next two sections describe how the mathematical nature (real versus complex) of the PF constants for a binding curve determines the identifiability of the curve’s binding parameters.

### Strong-coupling mechanism (mechanism I) and real-valued PF components

The PF component spectra in Fig. 5 (A–F, right) provide a visual representation of the relationship between shape parameter magnitude and binding curve line shape. When the midpoints of the PF components are plotted against the logarithm of ligand concentration, the spacing between them, Δ1, is given by the ratio of the one-site association constants,

$Δ1=r/q,$
(18)

because log(Δ1) = log(r/q) = log(xq/xr) = log(xq) − log(xr). (Without loss of generality, it is assumed that rq, so that Δ1 ≥ 1.) Substituting Eq. 18 into Eq. 14 yields

$a=Δ11/21+Δ1.$
(19)

Eq. 19 is the mathematical representation of mechanism I, the coupling between the shape parameter and the binding curve line shape for a < ac. The shape parameter is increased from 0.001 (Fig. 5 A, right) to 1/2 (Fig. 5 D, right) by decreasing Δ1 from 106 to 1. In other words, for values below the critical value, the shape parameter is increased by moving the PF components closer together until they overlap at a = ac. Because the change in line shape per unit change in the shape parameter is large (compare curves for a < 1/2 in Fig. 4 A), the coupling is strong, and the shape parameter is identifiable in this range.

### Weak-coupling mechanism (mechanism II) and complex-valued PF components

For a > ac, the PF constants are complex conjugates, and the binding curve cannot be decomposed into two real-valued one-site components. Thus, there is no spacing between components as there is when the PF constants are real-valued, and so mechanism I is not operative. Instead, we define the real number Γ as the ratio

$Γ=qimagqreal,$
(20)

which, when substituted along with Eqs. 17a and 17b into Eq. 14, yields

(21)

Eq. 21 is the mathematical representation of mechanism II, the coupling between the shape parameter and the binding curve line shape for a > ac. For example, the shape parameter is increased from 1/2 (Fig. 5 D, right) to 100 (Fig. 5 F, right) by increasing the ratio of the imaginary to the real components of q and r from 0 to 200. Because the change in line shape per unit change in the shape parameter is relatively small (compare curves for a > 1/2 in Fig. 4 A), the coupling is weak, and the binding parameters are not identifiable in this range. Because the main effect of Γ is to (slightly) alter the steepness of two-site binding curves, we will refer to it later as a slope factor.

### Interpretation of complex-valued PF constants

For a > ac, the PF constants q and r are complex conjugates (Eqs. 17a and 17b), and hence the one-site PF component curves cannot be represented on a physically meaningful (i.e., real-valued) concentration axis. Our interpretation is that two-site binding curves with shape parameters in this range cannot be decomposed meaningfully into simpler components (Fig. 5, E and F). Consistent with this idea, two-site curves with a > 1/2 have the appearance of single components, with no indication of underlying resolved structure (Fig. 4 A). It is worth noting that there is no inconsistency between the complex-valued one-site components q and r and the real-valued two-site binding parameters p1 and a. The expressions for parameters p1 and a in Eqs. 12a and 12b depend on the sum q + r and the product q r, both of which take on real values when q and r are complex conjugates.

### Summary of root-locus and identifiability analyses for two-site binding curves

The practical identifiability of parameters for two-site total binding curves is summarized in Fig. 5 H. Mathematical analysis of Eq. 14 indicates that the line representing all possible values of shape parameter a (or, equivalently, all possible two-site binding curve line shapes) can be divided into two segments at the point a = ac. The mapping of PF constants q and r onto the line of shape parameter values (the “root-locus map”) is simple: q and r are real and distinct in the left segment, real and equal at a = ac, and complex conjugates in the right segment. This map coincides with the identifiability map for the shape parameter, which is PI when smaller than ac and not PI when larger than ac. The differences in identifiability are caused by the quantitatively different effects of the shape parameter on the binding curve line shape in these two regimens. For example, the midpoint slope becomes 50 times steeper when the shape parameter is increased 100-fold from 0.005 to 0.5, but only increases by a factor of 2 when a is increased 100-fold from 0.5 to 50.

### Parameter compensations in the fitting of two-site binding curves

It is instructive to examine individual curve fits from fit-error plots to gain insight into the parameter compensations that render binding parameters nonidentifiable. Fig. 6 A (top) shows the shape parameter fit-error plot for a simulated two-site binding curve with a = 100. When a was forced to take on various incorrect values (Fig. 6 A, top, the points labeled A–C and E–H), the nonlinear regression algorithm obtained the best fits by matching the midpoint of the fit to that of the simulated data (Fig. 6 A, second panel). We showed earlier that the midpoint, xh, of a two-site binding curve is given by xh = (1/p2)1/2, which, using Eq. 4, is equivalent to Thus, when a is forced to take on an incorrect value, the midpoint of the fit curve can still be matched perfectly to that of the simulated data curve by varying p1 so that the product of parameters a and p1 remains constant (Fig. 6 A, third and fourth panels). Because this strategy yields excellent fits over a large range of shape parameter values (Fig. 5 A, second panel, points C–H), the shape parameter is not PI for binding curves with a = 100.

Figure 6.

Parameter compensations affecting binding parameter identifiability for two-site binding curves. (A and B, top) Parameter fit-error curves for binding curves with a = 100 (A) and a = 0.1 (B). Dashed horizontal lines indicate 5% rms error level. (A, second panel) Examples of fits to synthetic data curve from selected points on shape parameter fit-error curve in A (top). Fits were obtained using nonlinear regression while holding shape parameter a fixed to the value indicated by the corresponding letter in A (top) and varying parameter p1 to achieve best (i.e., least-squared error) fit. Synthetic data curve in A (second panel) is indicated by red dots. (B, second panel) Same as A (second panel), except fits are from selected points labeled A–K in B (top). (A and B, third panel) Trajectories of parameters a and p1 for all fits used to construct fit-error curves in A (top and second panels). (A and B, bottom) Midpoints of all fits from fit-error curves in A (top and second panels). The fit midpoint (black) matches the midpoint of the synthetic data (10−8 M, red dashed line) for nearly all values of shape parameter a that were tested.

Figure 6.

Parameter compensations affecting binding parameter identifiability for two-site binding curves. (A and B, top) Parameter fit-error curves for binding curves with a = 100 (A) and a = 0.1 (B). Dashed horizontal lines indicate 5% rms error level. (A, second panel) Examples of fits to synthetic data curve from selected points on shape parameter fit-error curve in A (top). Fits were obtained using nonlinear regression while holding shape parameter a fixed to the value indicated by the corresponding letter in A (top) and varying parameter p1 to achieve best (i.e., least-squared error) fit. Synthetic data curve in A (second panel) is indicated by red dots. (B, second panel) Same as A (second panel), except fits are from selected points labeled A–K in B (top). (A and B, third panel) Trajectories of parameters a and p1 for all fits used to construct fit-error curves in A (top and second panels). (A and B, bottom) Midpoints of all fits from fit-error curves in A (top and second panels). The fit midpoint (black) matches the midpoint of the synthetic data (10−8 M, red dashed line) for nearly all values of shape parameter a that were tested.

Close modal
(22)

The reason why this strategy works is that the midpoint slopes of the simulated data and fit curves are all close to the limiting value of ln(10) (Eq. 9) and do not change much as a is varied around the correct value of 100 (see Fig. 4 B). Only when the shape parameter is forced to take on values close to or below the critical value (as in Fig. 6 A, top, points A and B) do the line shapes of the fit curves deviate significantly from that of the synthetic data curve.

For synthetic data curves with shape parameters smaller than the critical value, such as a = 0.1, the fitting software uses the same strategy of matching the midpoints of the data and fit curves (Fig. 6 B). This strategy is less successful in this case, however, because the midpoint slope changes significantly as a is varied away from the correct value of 0.1 (see Fig. 4 B). As a result, the fit error increases sharply for values of a above and below the correct value (Fig. 6 B, second panel), the asymptotic error is higher than typical experimental noise, and the shape parameter is PI.

### Shape and location parameters for three-site total binding curves

The analysis of parameter identifiability for three-site binding curves proceeds similarly as for two-site curves, and starts with the canonical form of the binding relation. For a protein that contains three ligand-binding sites and occupies a single conformation, Eq. 1 simplifies to

(23)

This equation can be reparameterized by noting that the units of parameters p1, p2, and p3 are M−1, M−2, and M−3, respectively, so that unitless proportionality constants a and b can be defined by the relations

$p 2 =a 2 p 1 2$
(24a)

and

$p 3 =b 3 p 1 3 .$
(24b)

Substituting these relations into Eq. 23 yields

(25)

As for two-site curves (Eq. 6), p1 and x always appear together and raised to the same power, so that replacing p1 by k p1 is equivalent to replacing x by k x:

(26)

This substitution has the effect of shifting the binding curve along the concentration axis by a factor of log(k) without altering its line shape.

The effects predicted in Eq. 26 are illustrated in Fig. 7 (A–E, left). In each panel, two three-site binding curves were computed for the indicated values of shape parameters a and b. Multiplying p1 for the dashed curves by 1,000 yields curves that are shifted leftward by three log units (solid curves), with no change in line shape. In contrast, increasing parameters a and b changes the curves’ line shapes significantly. Again, simple reparameterizations (Eqs. 24a and 24b) have created two orthogonal classes of binding parameters: p1 is a location parameter that determines the horizontal position of the curves, whereas the two shape parameters a and b determine their line shape.

Figure 7.

Quantitative effect of shape parameters a and b on practical identifiability of three-site binding parameters. (A–E, left) Three-site total binding curves computed using the indicated values of shape parameters a and b. Location parameter p1 was equal to (in M−1) 1010 (A, solid curve), 107 (A, dashed curve), 1010 (B, solid curve), 107 (B, dashed curve), 1010 (C, solid curve), 107 (C, dashed curve), 109 (D, solid curve), 106 (D, dashed curve), 107 (E, solid curve), and 104 (E, dashed curve). (A–E, middle) Location parameter p1 fit-error plots for binding curves in A–E (left). (A–E, right) Fit-error plots for shape parameter a (red) and shape parameter b (blue) for binding curves in A–E (left). Dashed horizontal lines in A–E (middle and left) indicate 5% error threshold. Binding parameters are denoted “PI” if the asymptotic value of the rms fit error is ≥5% and “NOT PI” if <5%.

Figure 7.

Quantitative effect of shape parameters a and b on practical identifiability of three-site binding parameters. (A–E, left) Three-site total binding curves computed using the indicated values of shape parameters a and b. Location parameter p1 was equal to (in M−1) 1010 (A, solid curve), 107 (A, dashed curve), 1010 (B, solid curve), 107 (B, dashed curve), 1010 (C, solid curve), 107 (C, dashed curve), 109 (D, solid curve), 106 (D, dashed curve), 107 (E, solid curve), and 104 (E, dashed curve). (A–E, middle) Location parameter p1 fit-error plots for binding curves in A–E (left). (A–E, right) Fit-error plots for shape parameter a (red) and shape parameter b (blue) for binding curves in A–E (left). Dashed horizontal lines in A–E (middle and left) indicate 5% error threshold. Binding parameters are denoted “PI” if the asymptotic value of the rms fit error is ≥5% and “NOT PI” if <5%.

Close modal

### Approach to understanding parameter identifiability for three-site binding curves

The identifiability of parameters constrained by two-site total binding curves deteriorates as shape parameter a is increased (Fig. 4, C and D). This simple relationship is maintained for location parameter p1 (Fig. 7, A–E, middle) and shape parameter b (Fig. 7, A–E, right, blue curves) for three-site binding curves. However, shape parameter a is not PI for three-site curves with small (Fig. 7 A, right) or large (Fig. 7, D and E, right) values of a and b but is PI for intermediate (Fig. 7, B and C, right) values of a and b.

In the following, we show that this confusing identifiability pattern can be understood using a strategy similar to the one described above for two-site binding curves. Many of the insights gained there will carry over to the analysis of three-site curves. However, new phenomena will be encountered in the three-site case that have no equivalent for two-site curves. Our approach consists of the following steps: (1) the PF expansion is used to decompose three-site binding curves into their simplest components; (2) a root-locus map is computed, in which the regions of the (a, b) shape parameter space are characterized according to whether they yield real- or complex-valued PF components; (3) parameter fit-error curves are computed for numerous binding curves throughout the (a, b) space, and these results are used to create a parameter identifiability map of this space; (4) the identifiability pattern is interpreted in terms of the effects of real versus complex PF components on the binding curve line shapes; and (5) the validity of this interpretation is tested by its ability to explain why parameter compensations are effective in some cases, leading to nonidentifiable parameters, and ineffective in other cases, leading to identifiable parameters.

### PF expansion of three-site total binding relation

The canonical three-site total binding curve (Eq. 25) can be expressed as the sum of three one-site binding curves with association constants q, r, and s:

(27)

Combining the terms on the right side of Eq. 27 yields

(28)

Equating the coefficients of like powers of x in Eq. 28 yields

$p 1 =q+r+s,$
(29a)
$a 2 p 1 2 =qr+qs+rs,$
(29b)

and

$b 3 p 1 3 =qrs.$
(29c)

Eliminating variables r and s from Eqs. 29a, 29b, and 29c yields a cubic equation (the three-site PF polynomial):

$q 3 −p 1 q 2 +a 2 p 1 2 q−b 3 p 1 3 =0.$
(30)

The association constants q, r, and s of the one-site PF components are the roots of Eq. 30. Because the coefficients in Eq. 30 are products of the real numbers p1, a, and b, there are two possibilities for the three roots of this equation: they are either all real, or else one is real and two are complex conjugates (Uspensky, 1948). The family of PF spectra in Fig. 8 A provide a concrete picture of these abstract scenarios. The PF constants q, r, and s are all real and distinct in subpanels −i, o, and i; they are all real, and two are equal in subpanels −ii and ii; and one is real, and two are complex conjugates in subpanels −iv, −iii, iii, and iv. A subtlety that occurs in the case of three binding sites (but not for two sites) is that there are two distinct ways in which the complex conjugates can be paired: (1) s real and (q, r) complex conjugate (Fig. 8 A, subpanels −iv and −iii) or (2) q real and (r, s) complex conjugate (Fig. 8 A, subpanels iii and iv). We show below that these two pairings behave differently with respect to parameter identifiability.

Figure 8.

Mapping PF component diagrams onto space of shape parameters for three-site binding curves. (A) Family of PF spectra for three-site binding curves. Scale parameter Δ2, the spacing between the outermost midpoints, was equal to 106 for all of the spectra. The midpoints xs (red), xr (black), and xq (blue) of the one-site components, shown as vertical lines, are given by xs = 1/s, xr = 1/r, and xq = 1/q. Striped lines in subpanels −iv and −iii represent situations in which q and r are complex conjugates; in these cases, the midpoint xqr is for the corresponding two-site component. Striped lines in subpanels iii and iv are similar, except that here r and s are complex conjugates, and the midpoint of the two-site PF component is xrs. The PF constants q, r, and s have the following relationships: all real and distinct in −i, o, and i, all real with two equal in −ii and ii, and one real and one pair of complex conjugates in −iv, −iii, iii, and iv. (B) Root-locus map for PF components of three-site binding curves. Bold lines represent (a, b) pairs for which the corresponding PF constants q and r are equal (blue line) or for which r and s are equal (red line). The points inside the region demarcated by these two lines represent all (a, b) pairs whose PF expansions give rise to real, distinct values of q, r, and s. Lowercase roman numerals identify points in the (a, b) parameter space corresponding to the PF spectra in A. Sum of thin blue, black, and red lines comprise all (a, b) pairs whose PF spectra have a spacing Δ2 = 106. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters. (C) Relationships between location and shape parameters {p1, a, b}, PF constants {q, r, s}, and spacing and slope factors {Δ1, Δ2, Γ} for three-site binding curves. Equations for interconverting between these factors are also given. Symbols such as |q| represent the magnitude of the complex number q, not its absolute value.

Figure 8.

Mapping PF component diagrams onto space of shape parameters for three-site binding curves. (A) Family of PF spectra for three-site binding curves. Scale parameter Δ2, the spacing between the outermost midpoints, was equal to 106 for all of the spectra. The midpoints xs (red), xr (black), and xq (blue) of the one-site components, shown as vertical lines, are given by xs = 1/s, xr = 1/r, and xq = 1/q. Striped lines in subpanels −iv and −iii represent situations in which q and r are complex conjugates; in these cases, the midpoint xqr is for the corresponding two-site component. Striped lines in subpanels iii and iv are similar, except that here r and s are complex conjugates, and the midpoint of the two-site PF component is xrs. The PF constants q, r, and s have the following relationships: all real and distinct in −i, o, and i, all real with two equal in −ii and ii, and one real and one pair of complex conjugates in −iv, −iii, iii, and iv. (B) Root-locus map for PF components of three-site binding curves. Bold lines represent (a, b) pairs for which the corresponding PF constants q and r are equal (blue line) or for which r and s are equal (red line). The points inside the region demarcated by these two lines represent all (a, b) pairs whose PF expansions give rise to real, distinct values of q, r, and s. Lowercase roman numerals identify points in the (a, b) parameter space corresponding to the PF spectra in A. Sum of thin blue, black, and red lines comprise all (a, b) pairs whose PF spectra have a spacing Δ2 = 106. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters. (C) Relationships between location and shape parameters {p1, a, b}, PF constants {q, r, s}, and spacing and slope factors {Δ1, Δ2, Γ} for three-site binding curves. Equations for interconverting between these factors are also given. Symbols such as |q| represent the magnitude of the complex number q, not its absolute value.

Close modal

### Root-locus maps and the shape parameter space

We showed above that the universe of all possible two-site line shapes can be represented as points on the line of values of shape parameter a (Fig. 5 H). The critical value of the shape parameter divides this line into two segments and completely specifies the two-site root-locus map. This map provides a simple framework for interpreting the identifiability of two-site binding parameters.

The universe of all possible three-site line shapes can be represented as points in the plane of shape parameter pairs (a, b; Fig. 8 B). In the next several sections, we show how real and complex values of the PF components {q, r, s} project onto the (a, b) parameter space. This root-locus map will be essential for interpreting parameter identifiability for three-site binding curves.

### Critical values of shape parameters for three-site binding curves

In this section, we determine the critical values of shape parameters a and b, which are needed to create the root-locus map of the (a, b) parameter space. The quantitative relationship between shape parameter a and the PF constants for three-site binding curves is obtained by solving Eq. 29b for a and substituting Eq. 29a into this expression, yielding

(31a)

A similar procedure using Eqs. 29a and 29c yields the corresponding relationship for shape parameter b:

(31b)

Eq. 31b can be rewritten as

$b=( 1/3 )GM /AM .$
(32)

Because AM ≥ GM for real, positive values of q, r, and s by the AM-GM inequality (Hardy et al., 1934), Eq. 32 requires that the PF components cannot all be real if b exceeds the critical value bc = 1/3.

The AM-GM inequality is not applicable to Eq. 31a. However, the numerator and denominator of this equation are related to the elementary symmetric polynomials in three variables of degree two and one, respectively (Uspensky, 1948). Maclaurin discovered a family of inequality relations between these polynomials (Cvetkovski, 2012). The inequality relevant to Eq. 31a states that, for real, positive numbers q, r, and s,

(33)

with equality applying when the numbers are all equal. Comparison of Eq. 33 with Eq. 31a indicates that the maximum value of a that yields all real values for the set {q, r, s} is the critical value ac = (1/3)1/2. It is worth noting that this critical value is different than the critical value ac = 1/2 for the case of two binding sites (Table 1).

The critical values ac and bc partition the (a, b) shape parameter space into four quadrants (labeled I–IV in Fig. 8 B). Shape parameter pairs in quadrants II, III, and IV correspond to binding curves whose PF components must include a pair of complex conjugate values because one or both of the shape parameters exceed their critical value. However, knowledge of the critical values ac and bc is not sufficient to complete the three-site root-locus map. It remains to establish the nature of the PF constants that map to quadrant I. This question is addressed in the next section.

### Mapping real-valued PF constants to quadrant I of the (a, b) shape parameter space

Three-site binding curves whose PF spectra consist of all real values of q, r, and s are completely specified by two spacing factors. The factor Δ2 = s/q represents the distance between the midpoints of the outermost PF components (Fig. 8 A, subpanel ii, red and blue components) because log(Δ2) = log(xq) − log(xs). Similarly, the factor Δ1 = r/q represents the distance between the midpoints of the middle and rightmost one-site components (Fig. 8 A, subpanel i, black and blue components) because log(Δ1) = log(xq) − log(xr). (Without loss of generality, it is assumed that srq, so that Δ2 ≥ Δ1 ≥ 1.) Substituting these relations for Δ1 and Δ2 into Eqs. 31a and 31b yields

(34a)

and

(34b)

Eqs. 34a and 34b were used to map the PF spectra in subpanels −i, o, and i of Fig. 8 A onto quadrant I in the (a, b) parameter space (Fig. 8 B), where they are shown as black circles. This process was repeated for many additional values of Δ1 while holding Δ2 constant at 106; the locus of points so obtained is shown by the thin black line in Fig. 8 B.

It should be possible to use Eqs. 34a and 34b to identify all points in quadrant I corresponding to real values of q, r, and s. We will demarcate this region, which contains infinitely many (a, b) pairs, by determining its boundaries, of which there are two. One boundary corresponds to all sets of real PF constants for which r = q (equivalent to Δ1 = 1; see Fig. 8 A, subpanel ii). This boundary is represented by the bold blue curve marked “r = q” in Fig. 8 B, which was calculated using a range of Δ2 values in Eqs. 34a and 34b while holding Δ1 = 1. The second boundary corresponds to all sets of real PF constants for which r = s (equivalent to Δ1 = Δ2; see Fig. 8 A, subpanel ii), and is represented by the bold red curve marked “r = s” in Fig. 8 B. This curve was calculated using a range of Δ2 values in Eqs. 34a and 34b while holding Δ1 = Δ2. The triangular region enclosed by the bold blue and red curves in Fig. 8 B thus represents all (a, b) pairs that yield real and distinct values of q, r, and s.

### Mapping complex-valued PF constants to the (a, b) shape parameter space

By default, the area in quadrant I not enclosed by the bold red and blue curves must correspond to (a, b) pairs whose PF expansion contains a pair of complex conjugate association constants. The PF spectra in those cases consists of a one-site component (derived from the real-valued PF association constant) and a two-site component (derived from the pair of complex conjugate association constants). There are two possibilities to consider.

The first possibility is that s is real and q and r are complex conjugates (such as Fig. 8 A, subpanels −iv and −iii). In this case, the midpoint of the one-site PF component is given by xs = 1/s. By combining Eqs. 12b, 17a, 17b, and 22, the midpoint of the two-site component is xqr = 1/(qreal2 + qimag2)1/2. By generalizing the previous definition of the spacing factor to include complex values, we obtain Δ2 = s/|q|, where |q| represents the magnitude of the complex number q and is given by |q| = (qreal2 + qimag2)1/2. Thus, Δ2 represents the separation between the midpoints of the one-site and two-site components because log(Δ2) = log(s/|q|) = log(xqr/xs) = log(xqr) − log(xs).

The spacing factor Δ1 is not useful here because it is always equal to 1 when q and r are complex conjugates. We use instead the slope factor Γ, which is the ratio of the imaginary to the real parts of q and r (Eq. 20). Substituting the above relations for Δ2 and Γ into Eqs. 31a and 31b yields

(35a)

and

(35b)

Eqs. 35a and 35b were used to map the PF spectra in Fig. 8 A (subpanels −iv, −iii, and −ii; for which Δ2 = 106) onto the (a, b) parameter space, where they are shown as blue circles (Fig. 8 B) This process was repeated for many additional values of Γ while holding Δ2 constant at 106; the locus of points obtained is shown by the thin blue line in Fig. 8 B.

It is worth noting that in the transition from PF components with all real values of q, r, and s (Eqs. 34a and 34b) to PF components containing a complex conjugate pair (Eqs. 35a and 35b), one degree of freedom has been transformed from a spacing factor, Δ1, to a slope factor, Γ (see Fig. 8 C). We show later that this phenomenon helps to explain the parameter compensations that underlie parameter nonidentifiability.

The second possibility for complex-valued PF constants is that q is real and r and s are complex conjugates (such as Fig. 8 A, subpanels iii and iv):

$s=s real +is imag$
(36a)

and

$r=s real −is imag .$
(36b)

Using the same approach as for the previous case, we define the spacing factor, Δ2 = |s|/q, which again represents the separation between the midpoints of the one-site and two-site components. The spacing factor Δ1 is not useful here because it is always equal to Δ2 when r and s are complex conjugates. We use instead the slope factor Γ, which in this case is defined by Γ = simag/sreal.

Substituting the above expressions for Δ2 and Γ into Eqs. 31a and 31b yields

(37a)

and

$b=(1+Γ2)1/2/Δ21/32+(1+Γ2)1/2/Δ2.$
(37b)

Eqs. 37a and 37b were used to map the PF spectra in Fig. 8 A (subpanels ii, iii, and iv, for which Δ2 = 106) onto the (a, b) parameter space and are shown as red circles (Fig. 8 B). This process was repeated for many additional values of Γ while holding Δ2 constant at 106; the locus of points obtained is shown by the thin red line in Fig. 8 B.

### Contours of constant Δ2 reveal a second level of organization in the (a, b) space

The union of the thin blue, black, and red segments in Fig. 8 B represents a “contour of constant Δ2,” for Δ2 = 106. The pattern of these roots hints at another level of organization in the root-locus map beyond the distinction between real and complex roots that defines the boundaries between quadrants I–IV. In this finer level of organization, complex PF constants segregate into two distinct regions according to whether q and r are complex conjugates (thin blue line) or r and s are complex conjugates (thin red line).

A more complete three-site root-locus map (Fig. 9) was constructed by calculating many additional contours of constant Δ2 for values of Δ2 ranging from 1 to 1021 using Eqs. 34a, 34b, 35a, 35b, 37a, and 37b. Consistent with the pattern in Fig. 8 B, all of the contours of constant Δ2 in Fig. 9 are composed of segments that cluster into three areas. Sets of {q, r, s} that are all real are located inside the red/blue triangle in quadrant I (region A, black), whereas sets for which r and s are complex conjugates cluster to a connected region in the right part of the space (region B, red), and sets for which q and r are complex conjugates cluster to a connected region in the left part of the space (region C, blue). Also shown in Fig. 9 are the values of Δ1 for the points in region A and the values of Γ for the points in regions B and C. We show later that this second level of organization in the root-locus map is functionally significant, because parameter identifiability is different in regions A, B, and C.

Figure 9.

Root-locus map for PF components of three-site binding curves. The blue, black, and red circles represent 64 shape parameter pairs spanning a large region of the (a, b) parameter space. Each of these pairs corresponds to a different three-site binding curve line shape (see Eq. 25). The thin blue-, black-, and red lines indicate regions in which q, r, and s are real and distinct (black, region A); q is real and r and s are either real and equal or are complex conjugates (red, region B); or s is real and q and r are either real and equal or are complex conjugates (blue, region C). In region A, the shape parameters depend on the spacing factors Δ1 and Δ2 (see Eqs. 34a and 34b), the values of which are indicated below each black circle. In Region B, the shape parameters depend on spacing factor Δ2 and on slope factor Γ = simag/sreal (see Eqs. 37a and 37b). In region C, the shape parameters depend on spacing factor Δ2 and on slope factor Γ = qimag/qreal (see Eqs. 35a and 35b). The values of Γ are indicated below each red and blue circle. Lines of constant Δ2, which connect binding curves with similar line shapes, are shown for numerous values in the range 1 ≤ Δ2 ≤ 1021. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters.

Figure 9.

Root-locus map for PF components of three-site binding curves. The blue, black, and red circles represent 64 shape parameter pairs spanning a large region of the (a, b) parameter space. Each of these pairs corresponds to a different three-site binding curve line shape (see Eq. 25). The thin blue-, black-, and red lines indicate regions in which q, r, and s are real and distinct (black, region A); q is real and r and s are either real and equal or are complex conjugates (red, region B); or s is real and q and r are either real and equal or are complex conjugates (blue, region C). In region A, the shape parameters depend on the spacing factors Δ1 and Δ2 (see Eqs. 34a and 34b), the values of which are indicated below each black circle. In Region B, the shape parameters depend on spacing factor Δ2 and on slope factor Γ = simag/sreal (see Eqs. 37a and 37b). In region C, the shape parameters depend on spacing factor Δ2 and on slope factor Γ = qimag/qreal (see Eqs. 35a and 35b). The values of Γ are indicated below each red and blue circle. Lines of constant Δ2, which connect binding curves with similar line shapes, are shown for numerous values in the range 1 ≤ Δ2 ≤ 1021. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters.

Close modal

### Identifiability map for three-site binding curves

The blue, black, and red circles in Fig. 9 represent a sample of 64 points distributed over a large area in the (a, b) parameter space. Binding curves for each pair of (a, b) values are shown at the corresponding locations in Fig. 10 and illustrate the wide variety of line shapes that are possible for three-site binding curves.

Figure 10.

Superimposed maps of binding curve line shapes and binding parameter identifiability for three-site binding curves. Binding curves computed for a given (a, b) pair are shown in the corresponding position in the (a, b) parameter space. Numerical values represent, from top to bottom, the asymptotic errors for parameters p1, a, and b for each binding curve. Values are shown in bold text if the corresponding asymptotic error is ≥5% (indicating that the parameter is PI) and in normal text if the asymptotic error is <5% (indicating that the parameter is not PI). Regions are colored according to parameter PI: yellow if all parameters are PI, light green if a subset of the parameters is PI, and dark green if none of the parameters are PI. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters. Circled binding curves marked A and B indicate example curves A and B referred to in main text.

Figure 10.

Superimposed maps of binding curve line shapes and binding parameter identifiability for three-site binding curves. Binding curves computed for a given (a, b) pair are shown in the corresponding position in the (a, b) parameter space. Numerical values represent, from top to bottom, the asymptotic errors for parameters p1, a, and b for each binding curve. Values are shown in bold text if the corresponding asymptotic error is ≥5% (indicating that the parameter is PI) and in normal text if the asymptotic error is <5% (indicating that the parameter is not PI). Regions are colored according to parameter PI: yellow if all parameters are PI, light green if a subset of the parameters is PI, and dark green if none of the parameters are PI. Uppercase roman numerals I–IV identify the four quadrants demarcated by the critical values (ac and bc) of the shape parameters. Circled binding curves marked A and B indicate example curves A and B referred to in main text.

Close modal

We explored parameter PI systematically by computing fit-error curves for parameters p1, a, and b for each of the three-site binding curves in Fig. 10. The values next to each binding curve represent, from top to bottom, the asymptotic errors from the fit-error curves for parameters p1, a, and b. For example, the asymptotic errors for parameters p1, a, and b were 0.16, 9.5 × 10−6, and 0.19 for the binding curve corresponding to (a, b) = (10−4, 10−3). Values are shown in bold text if the asymptotic error was greater than 5% (indicating that the parameter is PI) and in normal text if the asymptotic error was <5% (indicating that the parameter is not PI).

The simple identifiability pattern in Fig. 10 comprises three distinct regions: in quadrants II, III, and IV (dark green), none of the parameters are PI; inside the triangular region of quadrant I (yellow), all three parameters are PI; and in the remaining part of quadrant I (light green), only parameters p1 and b are PI. These results are similar to those for two-site curves (Fig. 5 H) in two respects: (1) all of the parameters are PI in the region of shape parameter space that contains only distinct, real-valued PF components (Fig. 10, yellow shading), and (2) nonidentifiable parameters are only obtained in the regions of shape parameter space that contain complex-valued PF components (Fig. 10, light green and dark green shading). However, the PI map for three-site curves contains an additional subtlety that has no equivalent for two-site curves. In the light green area of quadrant I, only a subset of the parameters are identifiable, whereas in the yellow region of quadrant I, all of the parameters are identifiable. The identifiability pattern in Fig. 10 reflects both levels of organization (quadrants I–IV and regions A–C) present in the root-locus map (Fig. 9).

### The effectiveness of parameter compensations explains regions in which all, some, or none of the binding parameters are identifiable

Binding curves in the yellow-shaded region in quadrant I (Fig. 10) have three real, distinct PF roots, and thus three distinct one-site components in their PF expansions (such as in Fig. 8 A, subpanels −i, o, and i). The spacing between these components, quantified by the factors Δ1 and Δ2, completely specifies the line shape of three-site binding curves. Eqs. 34a and 34b indicate that it is not possible to change one of the shape parameters to an incorrect value (as is done during the calculation of a fit-error curve) without altering either Δ1 or Δ2, which in turn would change the line shape and yield a poor-quality fit to the data curve. As a result, the binding parameters are PI in this region.

Upon moving from the yellow region to one of the green regions in Fig. 10, two of the real PF constants are replaced by a complex conjugate pair, which collapses two of the one-site PF components into a single two-site PF component (such as Fig. 8 A, subpanels −iv, −iii, iii, and iv). The binding curve line shape in these regions is determined by the slope of the two-site component (determined by parameter Γ; see Eq. 20), and by the spacing between the two-site component and the remaining one-site component (specified by parameter Δ2).

We noted earlier that changing parameter Γ has small effects (a factor of 2 at most) on the midpoint slope of the two-site PF component. Furthermore, during a curve fit, location parameter p1 can be varied to shift the calculated curve along the abscissa without changing its shape. Thus, to achieve the best fit to binding curves derived from the green areas of Fig. 10, it is only necessary to vary parameters a and b so that Δ2 for the fit and data curves are equal. The key to understanding whether the parameter compensations occurring during data fitting are effective (leading to non-PI parameters) or ineffective (leading to PI parameters) is to understand how Δ2 depends on a and b.

Consider example binding curves from two regions of the (a, b) parameter space with different identifiability properties. The first curve (“curve A”), for which (a, b) = (10, 0.1), maps to quadrant II (circled in Fig. 10). For this curve, parameters p1, a, and b are all not PI. The second curve (“curve B”), for which (a, b) = (10−4, 10−4), maps to the light green region of quadrant I. For this curve, parameters p1 and b are PI, but parameter a is not PI. The spacing factor Δ2 = 106 for curves A and B (note that they are on the same contour of constant Δ2), but the relative positions of their one- and two-site components are reversed. Given this symmetry, it is perhaps surprising that the parameter identifiability for these curves is different. In the following, we show that the distinct identifiability behaviors are explained by differences in how Δ2 depends on a and b in quadrant I compared with quadrant II.

The contours of constant Δ2 in Fig. 9 graphically depict the relationship between Δ2 and the shape parameters a and b throughout the (a, b) parameter space. In region B, the shape parameters are given by Eqs. 37a and 37b. Numerical comparison of factors Δ2 and Γ indicates that Δ2 >> (1 + Γ2)1/2 in quadrant II (which lies entirely within region B). Applying this approximation to Eqs. 37a and 37b yields

(38a)

and

$b=(1+Γ2)1/2/Δ21/32.$
(38b)

Combining Eqs. 38a and 38b yields

$Δ2=(a/b)3.$
(39)

Eq. 39 indicates that Δ2 remains constant if the ratio of shape parameters a and b remains constant. Indeed, for a wide range of values of parameter a (Fig. 11 A, top, points C–G), the nonlinear regression selects values for parameter b that preserve this ratio (Fig. 11 A, bottom), leading to low-error fits to curve A (Fig. 11 A, middle). Similar arguments apply to the fit-error curve for parameter b (Fig. 11 B, top), except in this case parameter a is varied in response to the linear ramp of b values to maintain a constant ratio a/b (Fig. 11 B, bottom), yielding low-error fits to curve A (Fig. 11 B, middle). These efficient parameter compensations explain why shape parameters a and b are not PI for curve A (or for any other curve from quadrant II).

Figure 11.

Parameter compensations affecting binding parameter identifiability for three-site binding curves. (A and B, top) Fit-error curves for shape parameter a (A) and shape parameter b (B) for three-site binding curve (curve A in Fig. 10) for which (a, b) = (10, 0.1). Dashed horizontal lines indicate 5% rms error level. (A, middle) Examples of fits to synthetic data curve from selected points labeled A–G on curve in A (top). Fits were obtained using nonlinear regression while holding shape parameter a fixed to the value indicated by the corresponding letter in A (top) and varying parameters p1 and b to achieve best fit. (B, middle) Same as A (middle), except fits are from selected points on curve in B (top), and fits were performed holding shape parameter b fixed while p1 and a were varied to achieve best fit. (A and B, bottom) Trajectories of parameters a, b, and p1 for all fits used to construct fit-error curves in A and B (top). (C and D, top) Fit-error curves for shape parameter a (C) and shape parameter b (D) for three-site binding curve (curve B in Fig. 10) for which (a, b) = (10−4, 10−4). (C, middle and bottom) Same as for A (middle and bottom), except fits and parameter trajectories were from points in C (top). (D, middle and bottom) Same as for B (middle and bottom) B3, except fits and parameter trajectories were from points in D (top). Spacing parameter Δ2 is illustrated for binding curve in A (middle). Synthetic data curves in A–D (middle) are indicated by red dots.

Figure 11.

Parameter compensations affecting binding parameter identifiability for three-site binding curves. (A and B, top) Fit-error curves for shape parameter a (A) and shape parameter b (B) for three-site binding curve (curve A in Fig. 10) for which (a, b) = (10, 0.1). Dashed horizontal lines indicate 5% rms error level. (A, middle) Examples of fits to synthetic data curve from selected points labeled A–G on curve in A (top). Fits were obtained using nonlinear regression while holding shape parameter a fixed to the value indicated by the corresponding letter in A (top) and varying parameters p1 and b to achieve best fit. (B, middle) Same as A (middle), except fits are from selected points on curve in B (top), and fits were performed holding shape parameter b fixed while p1 and a were varied to achieve best fit. (A and B, bottom) Trajectories of parameters a, b, and p1 for all fits used to construct fit-error curves in A and B (top). (C and D, top) Fit-error curves for shape parameter a (C) and shape parameter b (D) for three-site binding curve (curve B in Fig. 10) for which (a, b) = (10−4, 10−4). (C, middle and bottom) Same as for A (middle and bottom), except fits and parameter trajectories were from points in C (top). (D, middle and bottom) Same as for B (middle and bottom) B3, except fits and parameter trajectories were from points in D (top). Spacing parameter Δ2 is illustrated for binding curve in A (middle). Synthetic data curves in A–D (middle) are indicated by red dots.

Close modal

In region C (Fig. 9), the shape parameters are given by Eqs. 35a and 35b. Numerical examination of factors Δ2 and Γ indicates that (1 + Γ2)1/2 Δ2 >> 1 in the green-shaded part of quadrant I (Fig. 10), which lies entirely within region C (Fig. 9). Applying this approximation to Eq. 35b yields

$Δ2=b−3/2,$
(40)

consistent with the finding that the contours of constant Δ2 are horizontal lines in this region. An asymmetry has been created because Δ2 depends on only one of the shape parameters. The lack of dependence of Δ2 on shape parameter a in Eq. 40 means by definition that parameter a is not identifiable for curve B (Fig. 11 C). However, parameter b is identifiable because varying it changes Δ2, and this effect cannot be compensated for by changing parameter a, as Δ2 does not depend on a (Fig. 11 D).

### Extension of the theory to proteins with more than three binding sites

Our approach to assessing binding parameter identifiability is easily extended to systems with more than three binding sites. The analysis starts with the total binding relation (Eq. 1), whose form is conserved for all values of n. This equation can be reparameterized using substitutions of the following form (see Eqs. 24a and 24b):

$p k =a k k p 1 k .$
(41)

Here, we have adopted a more flexible nomenclature for the shape parameters, in which the subscript denotes the power of ligand concentration that each shape factor appears with in the binding relation. Thus, shape parameter a becomes a2, b becomes a3, etc. Substituting Eq. 41 into the general n-site total binding relation (Eq. 1) yields

(42)

Eq. 42 has the remarkable property that multiplying p1 by a constant k is equivalent to scaling the abscissa by that same constant:

(43)

Eq. 43 indicates that parameter p1 is a location parameter that determines the position of the binding curve on the concentration axis, and the (n − 1)-dimensional set {a2, a3, …, an} are shape parameters that determine the binding curve line shape.

The reparameterized n-site binding curve can be decomposed into n one-site components using a PF expansion. The association constants for the one-site components are the roots of the n-site PF polynomial equation (see Eqs. 13 and 30 for examples for two and three sites), which has the general form

$∑ k=0 k=n ( −1 ) k a k k p 1 k q n−k =0,$
(44)

where a1 = 1. The association constants of the PF components, obtained by solving Eq. 44, are either all real, or else pairs of them are complex conjugates, with the remainder real.

The shape parameters ak for an n-site system are related in a simple way to the elementary symmetric polynomials (Uspensky, 1948), and explicit formulas for them in terms of the PF constants {q, r, s, …} can be written down by inspection for any value of n. The examples in the second column of Table 1 illustrate the predictable pattern followed by the ak for various values of n.

The critical values of the shape parameters for any value of n are easily computed using the appropriate set of Maclaurin inequalities (Cvetkovski, 2012). These inequalities yield a general formula for the critical value of parameter ak for n sites:

$a k( critical ) =( C n,k ) 1/k n ,$
(45)

where Cn,k is the binomial coefficient n choose k. Critical values of the shape parameters for two-, three-, and four-site binding curves computed using Eq. 45 are compiled in Table 1.

Mathematical analysis of the PF components can be used to create a root-locus map of the shape parameter space. For example, the PF spectra for n = 4 (Fig. 12 A) represent cases where the PF roots are all real and all are distinct (subpanel o), two are equal (i, iii, and v), three are equal (ii and iv), and two pairs are equal (vi). These diagrams were used to guide the computation of the surface in Fig. 12 B. The black “sheet” in Fig. 12 B represents the locus of (a, b, c) triples corresponding to equal values of the PF constants q and r (Fig. 12 A, v), and was calculated from the shape parameter equations for four sites in the fourth column of Table 1 with Δ1 = 1. Similarly, the red sheet corresponds to PF spectra for which s = t (Fig. 12 A, iii), calculated for Δ2 = Δ3, and the blue sheet to those for which r = s (Fig. 12 A, i), calculated for Δ1 = Δ2. Together, these sheets enclose the region of all (a, b, c) triples whose PF components have distinct, real-valued association constants. This region corresponds to all four-site binding curves with four resolved features. This volume is the four-site equivalent to the triangular area enclosed by bold red and blue curves in Fig. 8 B. Our results for two- and three-site binding curves suggest that all n binding parameters are identifiable for four-site binding curves that map to the volume enclosed by the blue, red, and black sheets in Fig. 12 B.

Figure 12.

Mapping PF component diagrams onto space of shape parameters for four-site binding curves. (A) Examples of PF spectra for four-site binding curves. Spectra show cases in which PF constants q, r, s, and t are real-valued. Spacing parameters Δ1, Δ2, and Δ3 are shown in subpanel o. (B) Root-locus map for PF components of four-site binding curves. Lowercase roman numerals identify points in the (a, b, c) parameter space corresponding to the PF spectra in A. Blue sheet represents all (a, b, c) triples for which r = s (corresponding to subpanel i in A); red and black sheets represent all (a, b, c) triples for which s = t and q = r, respectively (corresponding to subpanels iii and v in A, respectively). The “seam” where blue and red sheets intersect corresponds to all (a, b, c) triples for which r = s = t (corresponding to subpanel ii in A). The seam where blue and black sheets intersect corresponds to all (a, b, c) triples for which q = r = s (corresponding to subpanel iv in A). The seam where red and black sheets intersect corresponds to all (a, b, c) triples for which q = r and s = t (corresponding to subpanel vi in A). Volume inside of the blue, red, and black sheets corresponds to all (a, b, c) triples for which the PF constants q, r, s, and t are real and distinct. In the volume outside of the blue, red, and black sheets, the PF constants will contain one or two pairs of complex conjugate values. The cube in B is bounded by the planes defined by the critical values of shape parameters a, b, and c (see Table 1).

Figure 12.

Mapping PF component diagrams onto space of shape parameters for four-site binding curves. (A) Examples of PF spectra for four-site binding curves. Spectra show cases in which PF constants q, r, s, and t are real-valued. Spacing parameters Δ1, Δ2, and Δ3 are shown in subpanel o. (B) Root-locus map for PF components of four-site binding curves. Lowercase roman numerals identify points in the (a, b, c) parameter space corresponding to the PF spectra in A. Blue sheet represents all (a, b, c) triples for which r = s (corresponding to subpanel i in A); red and black sheets represent all (a, b, c) triples for which s = t and q = r, respectively (corresponding to subpanels iii and v in A, respectively). The “seam” where blue and red sheets intersect corresponds to all (a, b, c) triples for which r = s = t (corresponding to subpanel ii in A). The seam where blue and black sheets intersect corresponds to all (a, b, c) triples for which q = r = s (corresponding to subpanel iv in A). The seam where red and black sheets intersect corresponds to all (a, b, c) triples for which q = r and s = t (corresponding to subpanel vi in A). Volume inside of the blue, red, and black sheets corresponds to all (a, b, c) triples for which the PF constants q, r, s, and t are real and distinct. In the volume outside of the blue, red, and black sheets, the PF constants will contain one or two pairs of complex conjugate values. The cube in B is bounded by the planes defined by the critical values of shape parameters a, b, and c (see Table 1).

Close modal

Root-locus maps for systems with more than four binding sites would be generated in an analogous fashion. An important advantage of using the PF expansion approach is that the complexity of the calculations does not increase rapidly when the number of binding sites is increased, as it would with other methods. For example, the root-locus maps could be computed using the discriminant, but the number of terms in this function increases exponentially with degree; the discriminants for n = 4, 5, and 6 contain 16, 59, and 246 terms (Gelfand et al., 1994). In contrast, the geometric picture provided by PF spectra remain simple as the number of sites increases (Fig. 13), and the equations used to compute the boundaries of the region in the shape parameter space corresponding to all real-valued PF constants remain manageable (see Table 1 for examples for two, three, and four sites).

Figure 13.

Graphical depiction of PF decomposition of total binding curves for different numbers of binding sites. (A–E) PF spectra for two-site (A), three-site (B), four-site (C), five-site (D), and six-site (E) binding curves. Spectra illustrate cases in which the PF constants for n-site binding curves are all real-valued and distinct, so that there are n − 1 spacing factors (Δ1, Δ2, …, Δn−1).

Figure 13.

Graphical depiction of PF decomposition of total binding curves for different numbers of binding sites. (A–E) PF spectra for two-site (A), three-site (B), four-site (C), five-site (D), and six-site (E) binding curves. Spectra illustrate cases in which the PF constants for n-site binding curves are all real-valued and distinct, so that there are n − 1 spacing factors (Δ1, Δ2, …, Δn−1).

Close modal

### Summary

Many proteins, such as ligand-activated receptors, contain multiple ligand binding sites. Function is understood at a detailed, mechanistic level for only a small number of these molecules. One barrier to progress is the difficulty in reliably quantifying site affinities and cooperative interactions because the uncertainties in parameters estimated by fitting binding data may be very large. For example, an infinite range of parameter values for the Adair–Klotz model yield close fits to the binding curve for CaM (Fig. 1), and hence these data yield almost no insight into the mechanism of calcium binding. We showed that this property is not unique to CaM, but rather is a general phenomenon encountered when parameters are estimated by fitting binding curves (Figs. 3, 4, 6, 7, 10, and 11). Thus, to understand the factors that determine binding parameter identifiability, our approach is to pose the general question, “Under what conditions are the ranges of binding parameter estimates not infinite?” rather than to simply compute the range of parameter values consistent with given data fitted by a specified model.

The infinity of potential models and the infinity of binding curve line shapes argues against using purely numerical approaches to exhaustively compute multidimensional error surfaces. Rather, we exploit features of binding problems that enable the use of analytical methods, which yield general insights into parameter identifiability. First, the binding relations for the universe of all models composed entirely of binding equilibria collapse into a single, conserved functional form for proteins with an arbitrary number of binding sites. In the previous paper (Middendorf and Aldrich, 2017), analysis of the canonical binding relation using matrix algebra revealed simple rules governing binding parameter structural identifiability: at most, n parameters can be quantified reliably by fitting total binding data for a protein with n binding sites. (The number of SI parameters increases to n + 1 for models that also include conformational change of the receptor.) This absolute limit rules out the use of models that include distinct association constants for the n binding sites and explicit cooperative interactions between those sites. Unfortunately, these are the quantities of greatest mechanistic interest! If some of the site affinities can be assumed to be identical (as a result of, for example, symmetry, as in the case of homo-multimeric proteins), then it may be possible to quantify both the affinities and cooperative interactions. In other cases, simplifying assumptions are forced on the investigator. An example of such a compromise is the Adair–Klotz model (Fig. 1 A), which uses n macroscopic association constants that do not distinguish between site affinities and cooperative interactions.

The practical identifiability of binding parameters, the subject of the present paper, is an inherently numerical problem. We assessed binding parameter PI by computing fit-error curves. Analytical tools such as the PF expansion technique and mathematical inequalities were used to guide and interpret these numerical calculations. The results of our hybrid numerical/analytical approach is summarized in identifiability maps (see Figs. 5 H, 10, and 12 B), which indicate that the n SI parameters described above can only be reliably quantified for binding curves containing n resolved components. This result reveals a significant limit to the power of equilibrium binding measurements because the ligand-binding curves for many proteins (including CaM, Fig. 1) do not satisfy this criterion.

### Our approach to understanding binding parameter identifiability and the structure of total binding curves

Our analysis of binding parameter identifiability is based on five key ideas. (1) The mathematical form of the total binding relation at equilibrium is conserved for most physically reasonable binding models. (2) The parameters for such models can be converted into model-independent location and shape parameters, which determine the position and line shape, respectively, of the binding curve. (3) Every total binding curve can be decomposed into simpler component curves using a PF expansion. (4) Practical identifiability of binding parameters can be assessed by computing parameter fit-error curves. (5) The practical identifiability of a binding curve’s parameters is ultimately determined by the mathematical nature (real vs. complex) of the constants in its PF expansion, which can be visualized on a root-locus map (as in Figs. 5 H, 9, and 12 B). These ideas are discussed in the following sections.

### Conserved form of total binding relation

When biophysical parameters are estimated by fitting models to data, each candidate model under consideration may generate a different equation relating the parameters to the experimental observable. Parameter estimation must then be repeated for each of the (possibly numerous) candidate models. However, two remarkable properties of binding systems can be exploited to simplify this process. First, as we showed in the previous paper (Middendorf and Aldrich, 2017), the parameters of mechanistic models whose unitary transitions comprise any combination of bimolecular ligand–protein association reactions and unimolecular conformational equilibria can be converted into a model-independent, SI parameter set {p1, p2, …, pn}. Second, because ligand binding occurs to a finite, saturable lattice of sites, the total binding relation takes on a canonical form (Eq. 1) for nearly any physically reasonable model.

These observations suggested to us an alternative approach to analyzing binding curves in which the data are only fit once, using the canonical binding equation to estimate the universal, SI parameters described above. If the parameters are PI, then the reliable fit parameter estimates obtained from this one-time fit are easily converted algebraically to parameters of specific mechanistic models. In cases in which parameters are not PI, this fact will be readily established, because any of the fits to the data will yield shape parameters that map to a “non-PI” region of the appropriate identifiability map.

It is worth noting that total binding measurements are not useful for selecting a preferred binding model from a group of candidate models, as all such models for a given number of sites reduce to an equivalent mathematical form (Eq. 1).

### Location and shape parameters

The universal, SI parameter set {p1, p2, …, pn} developed in the previous paper (Middendorf and Aldrich, 2017) can be transformed by simple substitutions (Eq. 41) into an alternate, but still SI, parameter set that contains two distinct classes of parameters: location parameter p1 determines a binding curve’s horizontal position (Fig. 3, A–C, left; and Fig. 7, A–E, left), whereas the n − 1 shape parameters determine the curve’s line shape (Figs. 4 A and 10). We also discovered that the dimensionality of the identifiability problem can be reduced from n to n − 1 parameters because the location parameter has no effect on identifiability (Fig. 3, A–C, middle).

### PF expansion of n-site total binding curves

The PF expansion decomposes binding curves into their simplest components. Every n-site binding curve can be expressed uniquely and exactly as a sum of n one-site binding curves (see Eqs. 10 and 27). The PF spectrum of a binding curve (Fig. 5, A–F, right; and Fig. 8 A) reveals its essential structural components, much as the Fourier transform reveals the frequency components of a time-dependent signal. The simplicity of PF spectra can be contrasted with the often complicated line shapes of total binding curves (Figs. 3, 4, 7, and 10).

The PF expansion constants (q, r, s, …) are the roots of an n-th order PF polynomial equation (Eq. 44) and are readily computed using common numerical techniques. These roots may all be real, or some may occur as complex conjugate pairs. The latter are only physically meaningful if both members of a pair are merged into a single component with a two-site line shape. Thus, one resolved feature in a binding curve (out of the maximum possible of n) may be lost for each pair of complex conjugate components in its PF expansion. This observation provides a very simple rule for qualitatively assessing binding parameter identifiability: one or more parameters will not be identifiable for binding curves with less than n resolved components. Consistent with this rule, the binding parameters for CaM are not PI because there is no underlying structure resolved in the sigmoidal binding curve for this molecule (Fig. 1).

The PF expansion technique is also useful conceptually: the PF constants (q, r, s, …) act as a “bridge” between the mathematical parameters (p1, a, b, …) of the binding equation and the spacing (Δ1, Δ2, Δ3, …) and slope (Γ) factors that characterize the binding curve line shape (Fig. 8 C).

### Parameter fit-error curves

Visualizing error surfaces becomes unmanageable for systems described by more than a few parameters because of the large number of slices through the multidimensional surface that must be examined. Parameter fit-error plots collapse this error surface to two dimensions by plotting the least-squared fit error against a range of values for a single parameter; the remaining n − 1 parameters assume their best-fit values at each point. Moreover, because multiple fit-error curves can be displayed together, the ranges of all parameters consistent with experimental data can be represented compactly in a single, two-dimensional plot (Fig. 1 A, bottom).

Many binding parameter fit-error curves (including those in Figs. 1, 3, 4, 7, and 11) have the remarkable property that there is an infinite range of parameter values over which the fit error is small and essentially constant. If the magnitude of the fit error in this range (the “asymptotic error”) is smaller than the noise in an experimental binding curve, then an infinite range of parameter values will yield equally good fits to the data, indicating that the binding parameter is not identifiable.

### Relation of real and complex PF components to binding parameter identifiability

The regions of constant error in parameter fit-error curves are caused by efficient compensations between the parameters, which can be understood by the following arguments. First, consider the case in which such compensations are not possible. The PF decomposition of an n-site binding curve yields a component with a one-site line shape for each real root of its PF polynomial. If all of the roots are real and distinct, then there are n − 1 spacings between the one-site components (Fig. 13). There is only one combination of the n − 1 shape parameters that reproduces the n − 1 spacings between the components in the binding curve correctly, yielding a low-error fit. (See the column labeled “Shape parameters as function of spacing factors” in Table 1 for equations for two-, three-, and four-site curves.) As a result, the parameters are PI when the roots of the PF polynomial are real and distinct.

The behavior is very different when a binding curve’s PF polynomial has complex roots or real roots that are equal. For each pair of complex conjugate roots, two of the one-site PF components are merged into a two-site component, with a loss of one spacing between the features in the binding curve. For each pair of real roots that are equal, two of the one-site PF components overlap perfectly, again with a loss of one spacing between the features in the binding curve. For either of these cases, there are an infinite number of combinations of the n − 1 shape parameters that can reproduce the set of correct spacings because there are fewer than n − 1 spacings. (See the last column in Table 1 for equations for two-, three-, and four-site curves.) By adjusting the location parameter appropriately, a calculated curve with the correct spacings can be brought into proper register with a given data curve, producing a close fit. The result is that one or more of the parameters are not PI when the PF polynomial has one or more pairs of complex roots, or one or more real, equal roots. In the limit that two or more real roots are unequal, but very close in magnitude, the behavior is essentially the same as for equal roots.

### Using root-locus maps to understand parameter identifiability

The identifiability of binding parameters can be assessed by calculating parameter fit-error curves. Unfortunately, each set of fit-error curves is specific to a particular binding curve line shape and must be recalculated when a new binding curve is analyzed. It would be preferable to consult an “identifiability map” that classifies points throughout the space of the shape parameters (which correspond to different binding curve line shapes) with respect to identifiability. Such maps were constructed for two-site curves (Fig. 5 H) and three-site curves (Fig. 8 B) using the “brute force” approach of calculating fit-error curves for many different binding curve line shapes. The disadvantage of this approach is that it requires extensive numerical computations.

Our understanding of the causes underlying parameter identifiability (outlined in the previous section) suggests a simpler and more efficient approach. Because parameter identifiability is determined by the nature (real vs. complex) of the roots of the PF polynomial, the identifiability map coincides with a second map that classifies the points in the shape parameter space based on whether their PF roots are all real or contain complex conjugate pairs. This latter map, the root-locus map, is easily calculated. Remarkably, for two-site (Fig. 5 H), three-site (Fig. 8 B), and four-site (Fig. 12 B) systems, a single, connected region contains all sets of shape parameters whose PF roots are real and distinct. Therefore, the region of this space in which all binding parameters are PI can be demarcated by computing just its border. We showed above that the n − 1 sides of this border are calculated by evaluating n − 1 algebraic expressions, such as Eqs. 34a and 34b for three-site systems. Thus, the difficult and time-consuming problem of computing a large number of parameter fit-error curves can be replaced by a small number of simple algebraic calculations.

Our approach is similar to root-locus analysis (Evans, 1948), which is used in control theory to determine the stability of feedback systems. The common idea is that the nature of the roots of an equation modeling a physical system can be used to predict the behavior of the system for all parameter values.

### Significance of complex-valued PF association constants

Complex-valued binding affinities were considered previously by Klotz (Klotz, 1993, 1997), who encountered them when factoring binding polynomials (the denominator of Eq. 1). The appellation “ghost affinities” given by Klotz to these purely mathematical constructs reflects their uncertain physical significance.

We have shown that complex values arise often during a similar process, the PF expansion of total binding relations. However, the PF components have a clear physical interpretation as structural elements of binding curves. We suggest that pairs of one-site components with complex conjugate association constants have no physical significance individually, and are only meaningful when combined into a two-site component. The merging of two one-site components into a two-site component is the reverse of the process shown in Eq. 10. Because the complex conjugate PF constants q and r are related to the two-site parameters p1 and a as the sum q + r (Eq. 12a) and the product q r (Eq. 12b), the process of merging the complex-valued one-site components produces a real-valued, physically meaningful two-site component.

Complex affinities, which might at first glance seem to be nonphysical artifacts, arise naturally in binding problems and do not conflict with the physical constraints of such problems. Complex numbers appear in many other physical problems, most notably in the analysis of electrical circuits. The frequent occurrence of complex-valued affinities for proteins (Klotz, 1993, 1997), coupled with our finding that they are indicators of nonidentifiability, suggests that the binding parameters of many, if not most, real proteins will be nonidentifiable when constrained by total binding data. This idea is consistent with the qualitative observation that binding curves for real proteins containing n binding sites often have fewer than n resolved components (such as CaM; Fig. 1 A, middle left).

### Identifiability of binding parameters constrained by other types of experimental data

Our findings highlight the magnitude of the challenge in understanding the molecular mechanisms of proteins with multiple ligand binding sites. We expect that for many proteins, techniques that are more powerful than equilibrium total binding measurements will be required to quantify site affinities and cooperative interactions. We are now applying the mathematical tools and approach developed here and in the previous paper (Middendorf and Aldrich, 2017) to site-specific binding (Di Cera, 1995) and kinetic measurements to determine how much reliable, additional information can be obtained from those methods. We note that reliable estimates of some of the Adair–Klotz parameters for calcium binding to CaM (Fig. 1 A, middle left) could in principle be achieved by measuring the limiting slope of the total binding curve (the site occupancy at very low values).

The authors wish to thank D. Brent Halling, Suzanna Bennett, and Ben Liebeskind for helpful discussions and critical reading of the manuscript.

This work was supported by National Institutes of Health grant R01-NS077821 to R.W. Aldrich.

The authors declare no competing financial interests.

Christopher Miller served as guest editor.

Bellman
,
R.
, and
K.
Åström
.
1970
.
On structural identifiability
.
Math. Biosci.
7
:
329
339
.
Ben-Naim
,
A.
2010
.
Cooperativity and regulation in biochemical processes.
,
New York, NY
. 358 pp.
Colquhoun
,
D.
1969
.
A comparison of estimators for a two-parameter hyperbola
.
J. R. Stat. Soc. Ser. C Appl. Stat.
18
:
130
140
.
Colquhoun
,
D.
, and
D.C.
Ogden
.
1988
.
Activation of ion channels in the frog end-plate by high concentrations of acetylcholine
.
J. Physiol.
395
:
131
159
.
Cvetkovski
,
Z.
2012
.
Inequalities: Theorems, Techniques and Selected Problems.
Springer-Verlag
,
Berlin
. 444 pp.
Di Cera
,
E.
1995
.
Thermodynamic Theory of Site-Specific Binding Processes in Biological Macromolecules.
Cambridge University Press
,
Cambridge, UK
. 296 pp.
Evans
,
W.R.
1948
.
Graphical analysis of control systems
.
Trans. AIEE.
67
:
547
551
.
Gelfand
,
I.M.
,
M.M.
Kapranov
, and
A.V.
Zelevinsky
.
1994
.
Discriminants, Resultants, and Multidimensional Determinants.
Birkhauser
,
Boston, MA
. 523 pp.
Hardy
,
G.H.
,
J.E.
Littlewood
, and
G.
Pólya
.
1934
.
Inequalities.
Cambridge University Press
,
Cambridge, UK
. 324 pp.
Hill
,
T.L.
1985
.
Cooperativity Theory in Biochemistry.
Springer-Verlag
,
New York, NY
. 459 pp.
Klotz
,
I.M.
1993
.
A perspective into ligand-receptor affinities using complex numbers
.
90
:
7191
7194
.
Klotz
,
I.M.
1997
.
Ligand-Receptor Energetics: A Guide for the Perplexed.
John Wiley and Sons, Inc.
,
New York, NY
. 192 pp.
Linse
,
S.
,
A.
, and
S.
Forsén
.
1991
.
Calcium binding to calmodulin and its globular domains
.
J. Biol. Chem.
266
:
8050
8054
.
Ljung
,
L.
1987
.
System Identification: Theory for the User.
Prentice Hall
,
Englewood Cliffs, New Jersey
. 519 pp.
Middendorf
,
T.R.
, and
R.W.
Aldrich
.
2017
.
Structural identifiability of equilibrium ligand-binding parameters
.
J. Gen. Physiol.
149
.
Poland
,
D.
1978
.
Cooperative Equilibria in Physical Biochemistry.
Clarendon Press
,
Oxford, UK
. 344 pp.
Raue
,
A.
,
C.
Kreutz
,
T.
Maiwald
,
J.
Bachmann
,
M.
Schilling
,
U.
Klingmüller
, and
J.
Timmer
.
2009
.
Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood
.
Bioinformatics.
25
:
1923
1929
.
Reich
,
J.
, and
I.
Zinke
.
1974
.
Analysis of kinetic and binding measurements IV: Redundancy of model parameters
.
Studia Biophysica.
43
:
91
107
.
Reich
,
J.
,
J.
Winkler
, and
I.
Zinke
.
1974
a
.
Analysis of kinetic and binding measurements II: Reliability of parameter estimates
.
Studia Biophysica.
42
:
181
193
.
Reich
,
J.
,
J.
Winkler
, and
I.
Zinke
.
1974
b
.
Analysis of kinetic and binding measurements III: Consistency of the mathematical model
.
Studia Biophysica.
43
:
77
99
.
Solc
,
C.K.
, and
R.W.
Aldrich
.
1990
.
Gating of single non-Shaker A-type potassium channels in larval Drosophila neurons
.
J. Gen. Physiol.
96
:
135
165
.
Stefan
,
M.I.
,
S.J.
Edelstein
, and
N.
Le Novère
.
2009
.
Computing phenomenologic Adair-Klotz constants from microscopic MWC parameters
.
BMC Syst. Biol.
3
:
68
.
Uspensky
,
J.V.
1948
.
Theory of Equations.
McGraw-Hill
,
New York
.
353
pp.
Walter
,
E.
, and
L.
Pronzato
.
1997
.
Identification of Parametric Models.
Masson
,
Paris
. 413 pp.
Wyman
,
J.
, and
S.J.
Gill
.
1990
.
Binding and Linkage: Functional Chemistry of Biological Macromolecules.
University Science Books
,
Mill Valley, CA
. 330 pp.

Abbreviations used:

• AM

arithmetic mean

•
• CaM

calmodulin

•
• GM

geometric mean

•
• PF

partial fraction

•
• PI

practically identifiable

•
• rms

root mean square

•
• SI

structurally identifiable