To understand how ion channels and other proteins function at the molecular and cellular levels, one must decrypt their kinetic mechanisms. Sophisticated algorithms have been developed that can be used to extract kinetic parameters from a variety of experimental data types. However, formulating models that not only explain new data, but are also consistent with existing knowledge, remains a challenge. Here, we present a two-part study describing a mathematical and computational formalism that can be used to enforce prior knowledge into the model using constraints. In this first part, we focus on constraints that enforce explicit linear relationships involving rate constants or other model parameters. We develop a simple, linear algebra–based transformation that can be applied to enforce many types of model properties and assumptions, such as microscopic reversibility, allosteric gating, and equality and inequality parameter relationships. This transformation converts the set of linearly interdependent model parameters into a reduced set of independent parameters, which can be passed to an automated search engine for model optimization. In the companion article, we introduce a complementary method that can be used to enforce arbitrary parameter relationships and any constraints that quantify the behavior of the model under certain conditions. The procedures described in this study can, in principle, be coupled to any of the existing methods for solving molecular kinetics for ion channels or other proteins. These concepts can be used not only to enforce existing knowledge but also to formulate and test new hypotheses.

## Introduction

Ion channels are highly adapted to perform specific functions in the cell. To give just one example, voltage-gated sodium (Nav) channels have finely tuned kinetic properties that allow neurons and other excitable cells to generate action potentials of specific shape and frequency (Bean, 2007). The properties that enable Nav and other channels to perform such complex and well-calibrated behavior are encoded in the kinetic mechanism, defined as a set of conformational and functional states, interconnected by a network of allowed state transitions that may depend on ligand concentration, membrane potential, or other physical variables (Colquhoun and Hawkes, 1995a,b). To understand how ion channels function, one must decrypt the kinetic mechanism. The same is true for all proteins, from ion channels and receptors to enzymes and molecular motors (Popescu and Auerbach, 2003; Milescu et al., 2006; Müllner et al., 2010; Syed et al., 2010).

A kinetic mechanism can be solved by fitting experimental data with a mathematical model. However, decades of ion channel research have shown that kinetic mechanisms cannot be fully captured by any single type of experiment. Instead, to update or construct a new model, one must fit a comprehensive data collection (Horn and Lange, 1983; Hawkes et al., 1990, 1992; Vandenberg and Bezanilla, 1991; Hoshi et al., 1994; Zagotta et al., 1994a,b; Schoppa and Sigworth, 1998a,b; Rothberg and Magleby, 2000; Milescu et al., 2005), ideally generated by multiple experimental paradigms (Vandenberg and Bezanilla, 1991; Akk et al., 2005; Milescu et al., 2008). For example, we know that Nav and other channels have four voltage sensors that gate with different timing and voltage sensitivity (Bezanilla, 2000; Pantazis et al., 2014). These fundamental aspects cannot be easily resolved by single-channel or whole-cell recordings alone, but they can be addressed in combination with other types of experiments, such as patch-clamp fluorometry (Chanda and Bezanilla, 2002; Zaydman et al., 2013; Pantazis et al., 2014).

Optimizing a model against multiple types of data is difficult in itself. A further complication is that some results—quantitative or qualitative—cannot be added to the data collection that is used for fitting. The number of voltage sensors, the existence of open-state block, and numerical relationships between parameters due to allosterism, etc., are examples of such results. Instead, this prior knowledge about the channel must be encoded directly into the model. In this way, the model will explain the new data but will also remain consistent with what is already known.

How do we introduce prior knowledge into a model? We present here some strategies for addressing this issue. At the most basic level, structural assumptions about the kinetic mechanism can be stated implicitly by choosing a specific set of states and connectivity, as we explain with an example from the literature (Kuo and Bean, 1994). Further, quantitative or qualitative assumptions can be introduced by defining a set of constraints that the model has to satisfy, while also explaining the new data. These model constraints can be formulated as explicit mathematical relationships between rate constants or other model parameters, or they can specify the behavior of the model under certain conditions (Fig. 1).

To implement these ideas, we developed new computational tools with a focus on parameter estimation. First, we build upon an existing method for enforcing linear constraints between rate constants (Qin et al., 1996; Colquhoun et al., 2004; Milescu et al., 2005) and extend it to cover arbitrary linear constraints between model parameters, including allosteric factors. Furthermore, we provide a new formalism for handling both equality and inequality relationships. In the companion article (see Navarro et al. in this issue), we test a method for implementing behavioral constraints, as well as arbitrary parameter relationships, by adding a penalty term to the cost function of the fitting algorithm. The theory and computational procedures described here can be coupled, in principle, to any of the existing methods for solving molecular kinetics, for ion channels or other proteins. These concepts can be used not only to enforce existing knowledge but also to formulate and test new hypotheses.

## Materials and methods

All the mathematical and computational algorithms described in this study were implemented and tested with the freely available MLab edition of the QuB program.

### Theoretical background

#### Kinetic mechanisms

Ion channel kinetic mechanisms are well described by Markov models, which reduce the continuum of molecular conformations that can be assumed by the protein to a small set of discrete states that can be detected experimentally or inferred statistically (Colquhoun and Hawkes, 1995a,b; Colquhoun and Sigworth, 1995). These states correspond to various conformations of functional and structural elements, such as resting or activated voltage sensors, bound or unbound ligands, closed or open pore, and inactivated or noninactivated channel. Direct transitions are permitted between certain states, and the frequency of these transitions is quantified by rate constants, which can be functions of ligand concentration, membrane potential, tension, or other physical variables. The topology (or structure) of a kinetic mechanism is defined by the set of states and their transition connectivity, including information on which rates are ligand dependent, voltage dependent, etc.

Here, we assume that all microscopic rate constants follow the Eyring formalism (Eq. 1; Eyring, 1935), with the implication that complexity in kinetic behavior should be explained with more elaborate state models, rather than through over-parameterized and ad hoc rate constant expressions. Accordingly, voltage-dependent rate constants are simple exponential functions of voltage:

where *k*_{ij} is the rate constant of the transition from state *i* to state *j*, and *V* is the membrane potential. The $k ij 0 $ value is the rate constant at zero membrane potential, whereas the $k ij 1 $ value is the voltage-sensitivity factor, which can be expanded as follows:

where *z*_{ij} is the electrical charge moving over the fraction δ_{ij} of the electric field, *F* is Faraday’s constant, *R* is the gas constant, and *T* is the absolute temperature. For voltage-insensitive rates, $k ij 1 =0.$ Rate constants that depend on other physical variables, such as membrane tension, have similar exponential expressions (Gnanasambandam et al., 2017). For state transitions that represent the binding of a ligand, rate constants have the following expression:

where $k ij 0 $ is the rate constant at unitary ligand concentration [*L*]. When a model lumps several states together, some rates become macroscopic and contain a statistical factor in their expression (e.g., the transition between C_{1} and C_{2} in the model shown in Fig. 2 B).

The set of $k ij 0 $ and $k ij 1 $ values are the main parameters of the kinetic mechanism. Together, the kinetic parameters and the topology of the model fully specify the mechanism. In turn, the kinetic mechanism describes the operation of the channel within the membrane, under stationary conditions or in response to stimuli. Markov models, computational algorithms, and software have been adapted and developed to extract the kinetic mechanism from experimental data (Ball and Sansom, 1989; Hawkes et al., 1990; Qin et al., 1996, 2000a,b; Venkataramanan and Sigworth, 2002; Celentano and Hawkes, 2004; Qin and Li, 2004; Milescu et al., 2005; Csanády, 2006; Moffatt, 2007; Stepanyuk et al., 2011, 2014), with two interrelated aims: to find an appropriate topology and to estimate the kinetic parameters.

#### Formulating the topology of the model

The first step in building a kinetic model is to identify a particular topology that defines the structural and functional elements of the channel and their connecting pathways. The topology can be used to specify the number of voltage sensors, the identity of voltage-sensitive transitions, the number of inactivated states, the presence of multiple open states, the existence of allosteric relationships, etc. The model shown in Fig. 2 B illustrates how a topology can be formulated to capture the key features of a Nav channel kinetic mechanism, as detailed in the Results. This model was based on a large body of knowledge accumulated in the field and, not surprisingly, has provided a flexible enough framework that explained voltage-clamp recordings of sodium currents in several neuronal preparations (Kuo and Bean, 1994; Raman and Bean, 2001; Taddese and Bean, 2002; Milescu et al., 2010).

In contrast, when little is known about the channel, one must take a purely data-driven approach and build a parsimonious topology that explains the data reasonably well. Of course, if one wishes a realistic model, the Eyring theory and related concepts must still be obeyed. Some kinetic properties are intuitive enough and can be easily translated into model features (Salari et al., 2016). For example, whole-cell recordings in which the current first rises and then decays require a model with one conducting and at least two nonconducting states. In general, searching for the right topology can be approached as an iterative process, where one tests a series of models of increasing complexity, adding more and more states and connections. For each tested model, one must determine whether the topology is compatible with the data. If no parameter values can be found that result in a good fit, the topology must be reformulated and the parameters reestimated. Because larger models can inherently fit better, one should take into account the number of degrees of freedom when ranking models. Thus, unless a larger model improves the fit substantially, one should give preference to a model with fewer free parameters. The search across topologies can be terminated when the fit no longer improves.

This model search process is not trivial and relies heavily on the experience of the investigator. The number of nonequivalent (Kienker, 1989) connectivity schemes can be prohibitively large, even for models with a relatively small state count (Bruno et al., 2005). A possible solution is to use a smart optimization algorithm that not only estimates parameters for a given topology but also searches efficiently across topologies at the same time (Gurkiewicz and Korngreen, 2007; Menon et al., 2009). Furthermore, one may be able to reduce the searched state space by using some information contained in the data. For example, statistical analysis of single-channel electrical recordings can provide reasonable estimates on the number of conductance levels (through visual inspection or amplitude histogram analysis) and the minimum number of kinetic states in each conductance level (through dwell-time histogram analysis; Colquhoun and Hawkes, 1982; Hawkes et al., 1990). Other methods can provide more direct evidence about the structural conformations and transition pathways of the channel, such as the number of voltage sensors, the number of inactivated states, or the identity of voltage- or ligand-dependent transitions (Grosman et al., 2000; Ahern et al., 2016). In principle, an automated search across model topologies can incorporate this information.

#### Parameter estimation

A computational procedure for finding the “best” parameters for a proposed model topology combines an algorithm that measures how well a given model explains the data with an optimization engine that searches the parameter space for the “best” solution (Fletcher, 2013). This optimal solution minimizes the error between the data and the prediction of the model (e.g., the sum of square errors) or maximizes a probability function (e.g., the likelihood that the experimental data were generated by the model or the Bayesian posterior probability; Horn and Lange, 1983; Hawkes et al., 1990; Qin et al., 1996, 2000a; Celentano and Hawkes, 2004; Milescu et al., 2005; Csanády, 2006; Moffatt, 2007; Calderhead et al., 2013; Stepanyuk et al., 2014; Epstein et al., 2016). Intuitively, the first approach can be described as minimizing a “cost function,” whereas the second, as maximizing a “goodness of fit.” Throughout this study, we will use the "cost function" term, but with the understanding that it could refer to either minimizing the sum of square errors or, equivalently, minimizing the negative log-likelihood. When one also searches for an appropriate topology, the value of the cost function can be used as a score to rank the model. Even though the kinetic mechanism is fully characterized by the $k ij 0 $ and $k ij 1 $ parameters, the experimental data typically depend on some other parameters as well, such as the number of active ion channels, the single-channel conductance, and the ionic concentrations. To extract the kinetic mechanism from the data, these other parameters may need to be coestimated (Colquhoun et al., 1996; Qin et al., 2000b; Milescu et al., 2005).

## Results

Once a model topology is selected to appropriately express what is known or hypothesized about the channel, thus encoding prior knowledge, the next step is to find a set of parameters that explain the data well. However, these parameters can also contain prior knowledge. In fact, the parameter-estimation procedure itself can be designed to enforce prior knowledge, by making it generate parameter values that are in agreement with a set of model constraints. We classify these model constraints in two categories: (1) parameter constraints, discussed in this study, and (2) behavioral constraints, discussed in the companion article (Navarro et al., 2018). A parameter constraint is formulated as an explicit mathematical relationship that involves rate constants or other model parameters. An example is the scaling of one rate constant to another or restricting the range of a parameter to positive values. In contrast, a behavioral constraint quantifies the behavior of the model under certain conditions, without explicitly referring to rate constants or other model parameters. An example is enforcing the maximum open probability (*P*_{O}) reached by the channel during a specific voltage-clamp stimulation protocol.

The mathematical and computational procedures discussed here for enforcing parameter constraints are limited to linear relationships. However, nonlinear relationships can be enforced using the mechanism developed for behavioral constraints, presented in the second part of this study (Navarro et al., 2018). As illustrated in Fig. 1, linear parameter constraints that enforce an equality relationship reduce the dimensionality of the parameter space, eliminating one dimension for each relationship. In contrast, both inequality parameter constraints and behavioral constraints preserve dimensionality but reduce the size of the parameter space. To describe it intuitively, inequality parameter constraints present the optimizer with a reduced road map, whereas behavioral constraints guide the optimizer through toll-free roads only.

### Implementing prior knowledge with linear parameter constraints

In this section, we discuss the implementation of prior knowledge via linear parameter constraints. To illustrate this concept, we use the Nav channel kinetic mechanism shown in Fig. 2 A. We chose this model because it covers many of the parameter constraints that our formalism can enforce. The model was originally formulated (Kuo and Bean, 1994) with several mechanistic assumptions in mind, which are reflected in the number of states and connections, and in the mathematical relationships between various kinetic parameters (Fig. 2 B). These assumptions can be regarded and expressed as parameter constraints.

#### Model assumptions

The first assumption is that channel activation involves four identical and independent voltage sensors, and all four must be activated to open the pore. Thus, to simplify the kinetic mechanism, all closed states with the same number *n* of resting sensors are lumped into a single compound state. The result is the five-state activation pathway C_{1} … C_{5}. The frequency of activation transitions for any of the compound states C_{1} … C_{5} is equal to *n* times the frequency of the activation transition for a single sensor, where *n* is a statistical factor. The same rule applies to deactivation transitions. For example, when the channel resides in a closed state that has three resting voltage sensors (C_{2}, *n* = 3), the compound activation rate (*k*_{2,3}) is three times the activation rate of a single sensor (*k*_{4,5} or α_{m}). Thus, if *k*_{4,5} and *k*_{2,1} are the microscopic transition rates of a single sensor activating or deactivating, respectively, the assumption of identical and independent voltage sensors is expressed by the following mathematical relationships, where one rate is scaled to another by a constant factor:

Any deviation from the condition of identical and independent voltage sensors will require a model with a different number of states, connections, and statistical factors along the activation pathway. In fact, these constant multiplication factors (2, 3, and 4) could be replaced with unknown cooperativity factors, to be determined from the experimental data, similar to the inactivation allosteric factors introduced next. The same principles apply to ligand-gated ion channel mechanisms. In this case, the statistical factors can be used to describe the relationships between the ligand-binding sites.

Another assumption is that the channel can inactivate not only from the open state O_{6} but also from any of the C_{1} … C_{5} closed states into the I_{7} … I_{12} inactivated states. However, the transition into inactivation from the closed states depends on the degree of activation: as more voltage sensors are activated, the C to I transitions become faster, whereas the I to C transitions become slower. As envisioned in the original model, this property is implemented with the allosteric factors *a* and *b*. Thus, the rate of inactivation from a closed state C_{n} is equal to the rate of inactivation from the previous closed state C_{n−1}, multiplied by the *a* factor. For example, *k*_{2,8} = *a* × *k*_{1,7}, *k*_{3,9} = *a* × *k*_{2,8}, etc. The opposite is true for the return rates: *k*_{8,2} = *b*^{−1} × *k*_{7,1}, *k*_{9,3} = *b*^{−1} × *k*_{8,2}, etc. Taking *k*_{1,7} and *k*_{7,1} as the reference microscopic rates, this assumption is expressed by the following mathematical relationships, where one rate is scaled to another by an undetermined factor:

Furthermore, the voltage sensors can also activate when the channel is inactivated, along the I_{7} … I_{11} pathway, but with different kinetics. This is also encoded by the allosteric factors *a* and *b*, resulting in another set of similar mathematical relationships:

Overall, this allosteric coupling between activation and inactivation can explain the apparently contradictory findings that inactivation appears strongly voltage sensitive but only minimal electrical charge is detected to move within the channel during inactivation (Armstrong and Bezanilla, 1977). Generally, allosteric factors, such as the *a* and *b* quantities in Eqs. 5 and 6, are unknown and need to be determined from the data.

Finally, the last assumption is that the channel satisfies the condition of microscopic reversibility, i.e., no energy input is required for gating and opening. Under this condition, for any reaction loop in the model, the clockwise product of rates around the loop must equal the counterclockwise product (Song and Magleby, 1994; Rothberg and Magleby, 2001; Colquhoun et al., 2004). As the model in Fig. 2 A has five independent loops, the following mathematical relationships must hold true:

#### Voltage- and ligand-dependent rate constants

Some of the mathematical relationships used to express parameter constraints may involve rate constants that are functions of membrane potential. Unless otherwise specified, all these mathematical relationships must be true for any membrane potential value. For example, the scaling relationship $k 3,4 =2\xd7k 4,5 $ in Eq. 4 can be expanded as follows:

A logarithm transformation can be applied to convert products into sums:

Rearranging the terms gives the following:

This relationship must be true when *V* = 0, in which case it simplifies to

Using this result in Eq. 10, we obtain the following:

Thus, to enforce a scaling relationship between two voltage-dependent rate constants, the two mathematical relationships above (Eqs. 11 and 12) must be simultaneously satisfied. The same reasoning can be applied to other types of constraints. For example, after taking the logarithm and rearranging the terms, the first loop balance constraint in Eq. 7 becomes

For Eq. 13 to be true, two mathematical relationships must be simultaneously satisfied:

Some kinetic mechanisms involve state transitions associated with the binding of a ligand (Grosman et al., 2000; Burzomato et al., 2004; Akk et al., 2005). For example, a relationship where one ligand-dependent rate constant *k*_{ij} is scaled by a constant factor *c* to another ligand-dependent rate constant *k*_{kl} can be expanded as follows:

with the final solution:

Relationships involving voltage- or ligand-dependent rate constants have some special restrictions that have a simple mathematical provenance: (a) if a rate is scaled to another rate, their voltage sensitivities must be equal (or trivially zero); thus, a voltage-dependent rate cannot be scaled to a voltage-independent rate, except for a single voltage value; (b) if a loop involves only one voltage-dependent transition, then the forward and backward voltage-sensitivity factors for that transition must be equal (or zero); a more typical and useful scenario would require at least two voltage-dependent transitions in the loop; and (c) a mathematical relationship that involves ligand-dependent transitions cannot be satisfied for all ligand concentrations, unless the algebraic sum of all the ln([*L*]) terms is equal to zero. Thus, a ligand-dependent rate cannot be scaled to a ligand-independent rate, except for a single concentration value. In the case of microscopic reversibility, this condition requires that the clockwise and counterclockwise transitions around the loop involve the same number of ligand-dependent steps. When the channel binds multiple types of ligands, each type must satisfy these conditions. Models formulated without taking these precautions are, in principle, physically unrealistic.

#### Allosteric, statistical, and other multiplicative factors

Multiplicative factors can be introduced in the rate constant equation to formulate macroscopic rates and to express a variety of parameter constraints. One obvious application is to implement allosteric model behavior, as previously discussed, where the *a* and *b* factors multiply the rate constant pre-exponential term $k ij 0 .$ However, multiplicative factors can also be introduced within the exponential in Eq. 1. So far, we lumped the voltage sensitivity as a single factor, $k ij 1 ,$ but, in fact, we may need to consider the other quantities in Eq. 2. For example, one may want to introduce explicit temperature dependence for a rate. In this case, $k ij 1 $ can be factorized by the following constraint expression:

where *a*_{k} is a multiplicative factor that stands for δ_{ij} × *z*_{ij}, and *C*_{k} is a numerical constant equal to *F*/*RT*, as in Eq. 2. This approach would make it possible to mix data collected at different temperatures, in the same way as we can already account for different voltages or ligand concentrations.

As another example, one may want to enforce a relationship between the δ_{ij} and δ_{ji} values for a given transition, such as δ_{ij} = 1 − δ_{ji}. In this case, assuming that *z*_{ij} and *z*_{ji} are known quantities, we would write the following constraint expressions:

where *a*_{k1} and *a*_{k2} are multiplicative factors that stand for δ_{ij} and δ_{ji}, respectively, and *C*_{k} is a numerical constant equal to z*F*/*RT*, where *z* = *z*_{ij} = *z*_{ji}.

As the multiplicative factors are logarithmically transformed, they are subject to some restrictions. Thus, a pre-exponential parameter $k ij 0 $ can only be constrained to an unlimited product of multiplicative factors and other pre-exponential parameters, each raised to an arbitrary power:

where *C* is a positive numerical constant. Taking the logarithm from $k ij 0 $ will convert this product into a linear sum. In contrast, an exponential parameter $k ij 1 $ can only be constrained to an unlimited sum of multiplicative factors and other exponential parameters, each multiplied by an arbitrary numerical constant:

#### Inequality constraints

So far, we have only considered parameter constraints that are formulated as mathematical equalities. However, prior knowledge may also be expressed through inequality parameter constraints. First, there is a physical requirement that all pre-exponential rate parameters $k ij 0 $ must be greater than zero because transition frequencies are positive numbers. Likewise, quantities that multiply rate constants, such as the *a* and *b* allosteric factors in Eqs. 5 and 6, must also be restricted to positive values in order to keep rates positive. Both of these constraints are automatically handled by the logarithm transformation of variable, as explained further down. In contrast, the exponential factors $k ij 1 $ are in principle free to take any value in the –∞ to +∞ range, but they can also be restricted by a variety of inequality constraints. For example, we may want the voltage sensitivity parameter to be greater than zero for voltage-sensor activation, and less than zero for deactivation:

Applying any of these constraints could be a useful working hypothesis during the initial stages of formulating a model. Subsequently, these constraints could be relaxed. In reality, the forward and backward values can both have the same sign: as long as the activation value is more positive than the deactivation value ($k ij 1 \u2265k ji 1 $), the channel will be more activated at more positive membrane potentials, as is the case with Nav and other voltage-gated channels.

To give another hypothetical example, we may want the ratio between two rate constants at a certain membrane potential *V*_{0} to be smaller than a numerical constant *c*:

All of these “≤” or “≥” inequalities can be converted to equality relationships by subtracting or adding, respectively, a positive quantity to the right side of the inequality. For example, in Eq. 22, we can subtract *z*^{2}, a quantity that, by definition, is positive:

As long as the inequality condition in Eq. 23 is satisfied for *z* = 0, we can find a value for *z* that converts the inequality into an equality:

If we want the above ratio between two rate constants to be smaller than a numerical constant at any voltage *V*, not just at *V*_{0}, then we have

Because this is an equality, we follow the same logic as for equality constraints: the above relationship must also be true when *V* = 0, and thus, we obtain two simultaneous relationships:

For "≥" inequalities, we must add *z*^{2} to the right side, rather than subtract it.

In the jargon of optimization theory, *z* is called a “slack” variable (Fletcher, 2013). With equality constraints, one has to find a set of model parameters that satisfy a set of relationships. When inequalities are added to the model and transformed into equalities using slack variables, one has to find both a set of model parameters and a set of slack variables that together satisfy the constraint relationships. A slack variable is not a true parameter of the model, but merely a variable that is temporarily used to handle inequality constraints during the search for optimum parameters. Very importantly, the quantity added or subtracted via slack variables must take positive values, which is why we use *z*^{2} and not *z*. The reason for converting inequality relationships to equalities is to have all linear constraints handled by the same linear algebra mathematical formalism, as explained further.

#### Model parameters

As discussed in the previous section, the core parameters of a kinetic model are $k ij 0 $ and $k ij 1 $, together with some optional multiplicative factors *a*_{k} that describe allosteric coupling or other properties (e.g., the *a* and *b* allosteric factors in the model shown in Fig. 2 B) or help parameterizing the rate constants in more detail. However, other parameters *q*_{l} may also be added to the modeling framework, depending on the particular application. These external parameters are not necessarily present in any of the rate constant expressions. Instead, they may describe the data or experimental variables. For example, when fitting macroscopic currents, one may also need to estimate the number of channels in the record or the single channel conductance (Milescu et al., 2005). Thus, we define a set **K**, of size *N*_{K}, which contains all of these model parameters:

All of these quantities, which we term “rate constant parameters” (pre-exponential $k ij 0 $ and exponential $k ij 1 $), “multiplicative factors” (*a*_{k}), and “external parameters” (*q*_{l}), may be involved in the mathematical relationships that express parameter constraints, as discussed further.

#### A general equation for linear parameter constraints

All the mathematical relationships that were used to implement the assumptions made for the Nav model in Fig. 2 A have something in common: regardless of type (scaling, microscopic reversibility, etc.), each of these equality and inequality parameter constraints result in one or two equations involving $ln( k ij 0 )$, $k ij 1 $, *a*_{k}, and *z*^{2}, each multiplied by a constant. Although not shown in these examples, those relationships can also contain external parameters *q*_{l}. Thus, a general form that covers all these examples can be written as follows:

where *f*_{k} is an invertible function of the multiplicative factor *a*_{k} (e.g., $fk(ak)\u2261ln(ak)$ or $fk(ak)\u2261ak$), and *f*_{l} is an invertible function of the external model parameter *q*_{l}. The $Cij0,Cij1,$*C*_{k}, *C*_{l}, *C*, and *C*_{z} quantities are numerical constants, with *C*_{z} = 1 for a “≥” inequality, and *C*_{z} = −1 for a “≤” inequality.

Specific parameter constraints (e.g., scaling one rate constant to another) can be obtained from the general equation by selecting a subset of $ln( k ij 0 ),k ij 1 , $*f*_{k}(*a*_{k}), and *f*_{l}(*q*_{l}) via nonzero multiplication constants $Cij0,Cij1, $*C*_{k}, or *C*_{l}. As discussed throughout the article, a variety of useful constraints can be implemented using this mechanism, such as making a rate equal to a constant, scaling two rates by a constant factor, scaling two rates by a variable factor, constraining the total charge for a set of transitions, enforcing microscopic reversibility, constraining a reaction loop out of microscopic balance, restricting a model parameter to a range, expressing explicit temperature dependence, etc. Some of these constraints will require a single mathematical relationship, whereas others will require two. We note that using multiplicative factors in constraints generally makes sense when the same factor is used in multiple relationships. Otherwise, these factors can be simply calculated after the parameters are estimated.

#### Converting between model parameters and free parameters

One can easily verify that the model in Fig. 2 B was parameterized in such a way as to implicitly satisfy most assumptions: identical and independent voltage sensors, allosteric coupling of inactivation to activation, and microscopic reversibility. For example, the condition of identical and independent voltage sensors is enforced by the 4, 3, 2, 1 or 1, 2, 3, 4 statistical factors multiplying the α_{m} or β_{m} quantities, respectively. In other words, any values can be assigned to the model parameters, $\alpha m0,\alpha m1, $*a*, *b*, etc., and the assumptions will be automatically satisfied. There is only one exception: the C_{5}–O_{6}–I_{12}–I_{11}–C_{5} reaction loop is not implicitly balanced. In this case, the balance equation (i.e., α_{mo} × β_{ho} × β_{mh} × *b*^{−4} × α_{h} = β_{mo} × *a*^{4} × β_{h} × α_{mh} × α_{ho}) is not true by definition. Instead, it must be enforced by choosing an appropriate set of numerical values for all the parameters involved. In contrast, all the other loops are automatically balanced (e.g., 4α_{m} × β_{h} × *a* × β_{m} × *b*^{−1} × α_{h} = β_{m} × β_{h} × 4α_{m} × *a* × α_{h} × *b*^{−1}).

To formulate a parameterization that implicitly satisfies all parameter constraints, a commonly used strategy is to identify a subset of independent model parameters that can be estimated by the optimization engine and a subset of dependent parameters that can be simply derived from the independent ones. This is exactly how the model in Fig. 2 B was formulated. However, finding this parameterization is not trivial in some cases (Colquhoun et al., 2004). Moreover, it is not clear to us how constraints that are defined by inequality relationships would be handled by this type of parameterization. A potentially easier and certainly more flexible strategy is to define the constraints as an invertible transformation *f*_{c} between the set of interdependent model parameters **K** and a set of independent or “free” parameters **X**:

Thus, the model is defined by the **K** parameters, which are interdependent and thus cannot take arbitrary values but only those values that satisfy the user-defined parameter constraints. In contrast, the **X** parameters are independent of each other and are “free” to take any value in the –∞ to +∞ range. We emphasize that the **X** parameters are not simply a subset of **K**, as explained in the following paragraphs. These free parameters are passed to a model-blind optimizer that can search without any constraint in the parameter space defined by **X**, where it finds a solution that best explains the data. This optimal solution can be translated from the free parameter space back into the model parameter space, via the $f C \u22121 $ transformation.

If we want to implement the linear parameter constraints defined by Eq. 28 or 29, how do we define the *f*_{C} and $fC-1$ transformations that translate the model parameters **K** into the free parameters **X** and vice versa? In preparation for this, we need to recognize that the left side of the generalized Eq. 28 or 29 is nonlinear with respect to $k ij 0 $ and *a*_{k}, and perhaps to some external parameter *q*_{l}. However, we can make the following invertible transformations of variable:

If the multiplicative factor *a*_{k} is an allosteric factor or a similar quantity that multiplies a rate constant *k*_{ij}, then *f*_{k}(*a*_{k}) is ln(*a*_{k}), which is invertible. If *a*_{k} is a factor that multiplies a voltage-sensitivity parameter $k ij 1 $, then *f*_{k}(*a*_{k}) is *a*_{k}, which is obviously invertible as well. Similar logic applies to the external parameters *q*_{l}. For example, if *q*_{l} refers to the number of channels, we can also use the logarithm transformation (Milescu et al., 2005). In all cases, the logarithm has two desirable effects: it restricts the variables to positive values, and it scales the parameters to more similar values relative to each other, helping the optimization engine to find the solution.

The left side of these equations is now linear with respect to $\epsilon ij0,kij1,$ ϕ_{k}, and φ_{l}. Next, we define a vector **R**, of dimension *N*_{R}, with elements that correspond to $\epsilon ij0,kij1,$ ϕ_{k}, and φ_{l}. For a more intuitive notation, we refer to an element of **R** as *r*_{i}, when its type is unspecified, or as $r ij 0 ,r ij 1 ,$*r*_{k}, or *r*_{l}, when we emphasize its identity as a specific type of model parameter ($k ij 0 ,k ij 1 ,$*a*_{k}, or *q*_{l}, respectively). **R** has the same size as **K** (*N*_{R} = *N*_{K}). Thus, a parameter constraint is expressed as a linear relationship between the elements *r*_{i} of **R**, as follows:

where *C*_{i} stands for one of the numerical constants $C ij 0 ,C ij 1 ,$*C*_{k}, or *C*_{l}, respectively. Then, assuming that we have *N*_{C} constraint relationships, we can write the generalized constraint from Eqs. 34 or 35 in a more compact matrix form (Qin et al., 1996; Fletcher, 2013), as follows:

where **M** is a matrix of dimension *N*_{C} × *N*_{R}, and **V** is a vector of dimension *N*_{C}. Each row of **M** corresponds to the numerical constants on the left side of the generalized Eqs. 34 or 35, whereas each element of **V** represents the right side of Eqs. 34 or 35:

Thus, each linear relationship between the $\epsilon ij0,kij1,$ ϕ_{k}, and φ_{l} variables is encoded by row *c* of matrix **M** and element *c* of vector **V**. Eq. 36 encapsulates, in matrix form, all the linear parameter constraints imposed on the model, including both equality and inequality relationships.

Although they linearize the constraint relationships, the transformed model parameters **R** are still interdependent through the constraint relationships. How do we remove the interdependence and convert **R** into the set of free parameters **X**? Ignoring, for now, that **V** is not a constant vector because it depends (nonlinearly) on the optional slack variables *z*, we can take advantage of the linear form of the matrix equation **M** × **R** = **V** and express **R** as a linear function of **X** (Qin et al., 1996) and vice versa:

where the vector **X**, of dimension *N*_{X} = *N*_{R} − *N*_{C}, contains the independent parameters. Note that Eq. 39 is obtained from **X** = **A**^{−1} × (**R** − **B**), because **A ^{−}**

^{1}multiplied by

**B**is equal to a zero vector. The matrix

**A**, of dimension

*N*

_{R}×

*N*

_{X}, and the vector

**B**, of dimension

*N*

_{R}, can be determined from

**M**and

**V**using the singular value decomposition (Golub and Reinsch, 1970). First,

**M**is decomposed as follows:

where **U**_{M} is an orthogonal matrix of dimension *N*_{C} × *N*_{C}, **V**_{M} is an orthogonal matrix of dimension *N*_{R} × *N*_{R}, and **S**_{M} is a diagonal matrix of dimension *N*_{C} × *N*_{R} that contains the singular values of matrix **M**. Then, **A** can be extracted as a submatrix of **V**_{M}:

The inverse of the **A** matrix, **A ^{−}**

^{1}, is similarly obtained from $VM-1$. Because

**V**

_{M}is orthogonal, $VM-1$ is simply equal to

**V**

_{M}transposed ($VMT$). Then,

**B**can be calculated as follows:

where the matrix **M**^{+}, of dimension *N*_{R} × *N*_{C}, is the pseudoinverse of **M** and can be calculated as follows:

where $SM+$ is obtained from **S**_{M} by replacing all nonzero diagonal elements (the singular values) with their inverse. With the **A** and **B** matrices obtained as in Eqs. 41 and 42, we can now calculate **R** from **K**, and then **X** from **R**, and vice versa.

How do we deal with inequality constraints and slack variables? We found a solution that is actually quite simple, although, perhaps, not immediately obvious. First, we define a vector **Z**, of dimension *N*_{Z} equal to the number of inequality constraints. This vector contains all the slack variables, one for each inequality constraint. Then, we define another vector $X \xaf $, which is the union of **X** and **Z**:

The size of $X \xaf $ is equal to *N*_{X} + *N*_{Z}. The slack variables **Z** are arbitrary and thus independent of each other and of the free parameters **X**. Hence, the elements of $X \xaf $ are also independent of each other and represent the free parameters given to the optimizer. Each time the optimizer tries a new set of free-parameters $X \xaf $, the corresponding **Z** is used to recalculate **V** (Eq. 37), which, in turn, is used to recalculate **B** (Eq. 42). The **A** matrix remains the same because the coefficient matrix **M** contains only constants. Thus, we can calculate the transformed model parameters **R** as follows:

where **B**_{z} and **V**_{z} are the **B** and **V** quantities calculated for a given vector **Z**.

During optimization, the slack variables in **Z** are provided by the optimizer, together with **X**. However, **Z** must be initialized at the beginning of the optimization from a given set of transformed model parameters **R** and the appropriate relationships in Eq. 37. Let *z*_{c} be the slack variable introduced by the inequality relationship defined by row *c* of the constraint matrix **M**. Then, *z*_{c} can be calculated with the following equation:

where **M**_{c} is a vector corresponding to row *c* of **M**. This equation has the obvious solution:

The two-way conversion between the model parameters **K** and the free parameters $X \xaf $ is summarized in Fig. 3.

#### Redundant constraints

One should take care to prevent redundancy and use only the minimum number of mathematical relationships that are necessary to implement the assumptions of the model. Intuitively, a constraint relationship is redundant if its intended consequence is already enforced by other relationships. With our example model, one could use either the scaling *k*_{7,8} = *a* × *k*_{1,2}, or the scaling *k*_{7,8} = 4 × *a* × *k*_{4,5}, but not both because that would create an additional relationship between *k*_{1,2} and *k*_{4,5}. Similarly, the condition in which the algebraic sum of the voltage sensitivities around a reaction loop is equal to zero may already be enforced by some rate-scaling relationships and should not be duplicated. When in doubt, one could check the rank of the **M** matrix, which will be reduced by redundant constraints.

Redundant constraints may also arise from inequalities. Although tempting, using inequality relationships to enforce a range constraint on a model parameter is not possible, unfortunately, because it would result in two equality relationships that are redundant. However, this limitation can be overcome through the same mechanism that handles behavioral constraints (Navarro et al., 2018). Furthermore, although inequality constraints add slack variables to the overall set of free parameters, the total number of equality and inequality constraints must still be strictly smaller than the number of model parameters **K**.

#### Calculating the cost function and its gradients

In a typical scenario, the cost function *F* is an explicit function of the rate constants *k*_{ij} and of some external model parameters *q*_{l}:

Typically, *F* would not depend explicitly on the multiplicative factors *a*_{k}, which are generally used to establish relationships between other parameters. Because the model parameters **K** can be obtained from the free parameters $X \xaf $, *F* can also be written as a function of the free parameters $X \xaf $:

Thus, the optimization algorithm can be model blind. In other words, even though it eventually generates a set of optimum model parameters **K**^{*}, it actually searches for a solution in the space defined by the free parameters $X \xaf $. As it searches for the solution, the optimizer requires the cost function *F* to be calculated for each proposed $X \xaf $. The specific expression of the cost function *F* depends on the type of application; it could be a sum of square errors, a likelihood, or a Bayesian posterior probability or it could be a mixture of these expressions, when multiple types of data are bundled together (e.g., single-channel and whole-cell traces).

The optimizer may also require the gradients of *F* with respect to the free parameters $X \xaf $, as in the case of gradient descent optimization methods (Fletcher, 2013). These gradients can be calculated by numerical approximations, but analytical calculations are more accurate and may actually be faster in some instances. To calculate the gradient of *F* with respect to a free parameter $x \xaf k $, we have to consider that the rate constants *k*_{ij} are functions of $k ij 0 $ and $k ij 1 $. In turn, $k ij 0 $ is a function of $\epsilon ij 0 .$ We also have to consider that *q*_{l} is a function of φ_{l}. These $\epsilon ij0,kij1,$ and φ_{l} quantities are entries in the **R** vector and, thus, are explicit functions of a free parameter $x \xaf k .$ To calculate a gradient, we apply the chain differentiation rule, as follows:

In Eq. 51, *r*_{m}, *r*_{n}, and *r*_{p} are the elements of the **R** vector that correspond to $\epsilon ij0,kij1,$ and φ_{l}, respectively. The ∂*F*/∂*k*_{ij} and ∂*F*/∂*q*_{l} quantities depend on the specific application, e.g., the maximum interval likelihood (Qin et al., 1996) or the maximum point likelihood of single-channel data (Qin et al., 2000a) or the maximum likelihood of macroscopic currents (Milescu et al., 2005). The other partial derivatives can be calculated as follows:

The ∂*q*_{l}/∂*r*_{p} partial derivative depends on the specific transformation between *q*_{l} and φ_{l}, as illustrated in Eq. 53 for the logarithm and identity transformations:

Finally, the partial derivative of any *r*_{i} with respect to any $x \xaf k $ takes the following form:

where *a*_{i,k} and $m i,c + $ are elements in the **A** and **M**^{+} matrices, respectively. In the last two expressions of Eq. 54, the *c* subscript is the index of the inequality constraint relationship that uses $x \xaf k $ as the slack variable (the index in **V**; Eq. 45).

Using all these quantities, the overall analytical derivative of *F* with respect to $x \xaf k $ becomes

Eq. 55 is for the case of $\phi l=lnql$. The subscripts *m*, *n*, and *p* used for the *a* and *m*^{+} quantities are equal to the indices in the **R** vector that correspond to $\epsilon ij0,kij1,$ and φ_{l}, respectively.

#### Calculating the error of the estimates

When estimating the parameters of a model, it is important to have a measure of confidence in those estimates. The variance of a free parameter estimate measures the curvature of the cost function with respect to that parameter. Intuitively, the variance tells us how much the calculated prediction of the model will change when the value of a free parameter $x \xaf k $ is changed by a small amount. We emphasize that this change must be interpreted in the context of the specific data used in the analysis. Thus, parameters estimated with a large variance are generally poorly determined because of insufficient data, whereas a small variance denotes a well-defined parameter.

One could calculate the variance of a free parameter estimate, $Var(x\xafk),$ from the second-order partial derivative of the cost function or could use the variance provided by some optimization engines, as in the case of the Davidon-Fletcher-Powell optimizer (Fletcher, 2013). However, when using the parameter constraints described here, the free parameters $X \xaf $ must be converted back to model parameters **K**, and some transformation must be applied to the variance. Thus, the variance of a model parameter, *Var*(*k*_{i}), can be calculated using the following approximation (Qin et al., 2000a; Milescu et al., 2005):

where $x \xaf p $ is a free parameter in $X \xaf .$ To calculate the variance for each type of model parameter ($k ij 0 ,k ij 1 ,$*a*_{k}, and *q*_{l}), we use the chain differentiation rule. For rate constant parameters $k ij 0 andk ij 1 $, we obtain the following:

For pre-exponential and exponential multiplicative factors *a*_{k}, we have the following expressions:

Finally, for external parameters *q*_{l}, the expression depends on the transformation function. For the logarithm and the identity function, we have the following expressions:

## Discussion

Enforcing prior knowledge when fitting new data is not trivial, and one reason is that prior knowledge may take different forms. For example, it can be a linear mathematical relationship between two sequential, ligand-binding transitions or it can describe the dynamics of the channel during complex episodes of action-potential firing. The first example could be easily handled through model parameterization: the independent parameters are identified by the user and passed on to a search engine, whereas the remaining (dependent) parameters are simply derived from the first set, whenever necessary. However, a more elegant and flexible solution, in our opinion, is the method of reduction (Fletcher, 2013), first applied to kinetic modeling algorithms some 20 years ago (Qin et al., 1996, 2000a; Milescu et al., 2005). However, even the reduction method, despite its reach, is not the universal solution to enforcing prior knowledge. Although very powerful, this method is limited to constraints that can be formulated as explicit linear equality relationships between model parameters. Thus, it cannot handle inequalities, nonlinear relationships, and implicit constraints that describe a model property or behavior, such as the maximum open state occupancy during an action potential.

In this two-part study, we proposed a comprehensive set of mathematical and computational tools that address all these limitations and greatly expand the range of prior knowledge that can be enforced. First, as described in this article, we enhanced the reduction method to handle both equality and inequality linear parameter constraints. Furthermore, we expanded the range of parameters that can be constrained to include not only rate constant parameters but also allosteric and other similar factors and external parameters that describe the data or the experiment. Any relationship between these parameters can now be enforced, as long as it is linear. Second, as described in the companion article (Navarro et al., 2018), any other types of model constraints, such as range constraints, nonlinear parameter relationships, or model properties and behavior, are handled by applying a penalty to the cost function. Together, the reduction method and the penalty method can handle virtually any type of model constraint that is likely to be encountered in the field.

The constraining methods described here and in the companion article are available through the freely available QuB software, as maintained by our laboratory. These methods are also easy to implement by interested readers. The only high-level mathematical operation involved is the singular value decomposition, which is readily available from many free, linear algebra packages. As illustrated in Fig. 3, the code can be implemented as a pair of functions: one that converts a set of interdependent model parameters into a set of free parameters, and a second function that performs the reverse operation. The first function is called only once, when the optimization is started, to initialize the free parameters from the model parameters. Any optimization package has one user-customizable callback function that is called each time the search engine needs the cost function for a given set of parameters. The function that converts free parameters into model parameters can be inserted at the beginning of this callback function. For interested users, a step-by-step numerical example is given in the companion article.

## Acknowledgments

We thank the members of the Milescu laboratories for their constructive comments and suggestions. This work was supported by American Heart Association grants 13SDG16990083 to L.S. Milescu and 13SDG14570024 to M. Milescu and a training grant fellowship from the Graduate Assistance in Areas of National Need Initiative/Department of Education to M.A. Navarro.

The authors declare no competing financial interests.

Author contributions: L.S. Milescu developed the mathematical and computational algorithms and implemented them in software. All authors contributed to designing and testing the algorithms and software and to writing the manuscript.

Richard W. Aldrich served as editor.

## References

## Author notes

A. Salari and M.A. Navarro contributed equally to this paper.