## Abstract

To which extent can optimality principles describe the operation of metabolic networks? By explicitly considering experimental errors and *in silico* alternate optima in flux balance analysis, we systematically evaluate the capacity of 11 objective functions combined with eight adjustable constraints to predict ^{13}C‐determined *in vivo* fluxes in *Escherichia coli* under six environmental conditions. While no single objective describes the flux states under all conditions, we identified two sets of objectives for biologically meaningful predictions without the need for further, potentially artificial constraints. Unlimited growth on glucose in oxygen or nitrate respiring batch cultures is best described by nonlinear maximization of the ATP yield per flux unit. Under nutrient scarcity in continuous cultures, in contrast, linear maximization of the overall ATP or biomass yields achieved the highest predictive accuracy. Since these particular objectives predict the system behavior without preconditioning of the network structure, the identified optimality principles reflect, to some extent, the evolutionary selection of metabolic network regulation that realizes the various flux states.

## Visual Overview

### Synopsis

Based on a long history of biochemical and lately genomic research, metabolic networks, in particular microbial ones, are among the best characterized cellular networks. Most components (genes, proteins and metabolites) and their interactions are known. This topological knowledge of the reaction stoichiometry allows to construct metabolic models up to the level of genome scale (Price *et al*, 2004). Experimentally, sophisticated ^{13}C‐tracer‐based methodologies were developed that enable tracking of the intracellular flux traffic through the reaction network (Sauer, 2006). With the accumulation of such experimental flux data, the question arises why a particular distribution of flux within the network is realized and not one of many alternatives?

Here, we address the question whether the intracellular flux state can be predicted from optimality principles, with the underlying rational that evolution might have optimized metabolic operation toward particular objectives or combinations of multiple objectives. For this purpose, we performed a systematic and rigorous comparison between computational flux predictions and available experimental flux data (Emmerling *et al*, 2002; Perrenoud and Sauer, 2005; Nanchen *et al*, 2006) under six different environmental conditions for the model bacterium *E. coli*. For computational flux predictions, we used a constraint‐based modeling approach that requires a stoichiometric model of metabolism (Stelling, 2004). More specifically, we employed flux balance analysis (FBA) where objective functions are defined that represent optimality principles of network operation (Price *et al*, 2004). This approach has been applied successfully to predict gene deletion lethality (Edwards and Palsson, 2000a, 2000b; Forster *et al*, 2003; Kuepfer *et al*, 2005), network capacities and feasible network states (Edwards 2001, Ibarra 2002), but in only few cases to predict the intracellular flux state (Beard *et al*, 2002; Holzhütter, 2004).

While different objective functions were proposed for different biological systems (Holzhütter, 2004; Price *et al*, 2004; Knorr *et al*, 2006), by far the most common assumption is that microbial cells maximize their growth. To address this issue more generally, we evaluated the accuracy of FBA‐based flux predictions for 11 linear and nonlinear objective functions that were combined with eight adjustable constraints. For this purpose, we constructed a highly interconnected stoichiometric network model with 98 reactions and 60 metabolites of *E. coli* central carbon metabolism. Based on mathematical analyses, the overall model could be reduced to a set of 10 reactions that summarize the actual systemic degree of freedom.

As a quantitative measure of how accurate the experimental data are predicted, we defined predictive fidelity as a single value to quantify the overall deviation between *in silico* and *in vivo* fluxes. By comparing all *in silico* predictions to ^{13}C‐based *in vivo* fluxes, we show that prediction of intracellular steady‐state fluxes from network stoichiometry alone is, within limits, possible. An unexpected key result is that no further assumptions on network operation in the form of additional and potentially artificial constraints are necessary, provided the appropriate objective function is chosen for a given condition.

While no single objective was able to describe the flux states under all six conditions, we identified two sets of objectives for biologically meaningful predictions without the need for further constraints. For unlimited growth on glucose in aerobic or nitrate‐respiring batch cultures, we find that the most accurate and robust results are obtained with the nonlinear maximization of ATP yield per flux unit (Figure 1). Under nutrient scarcity in glucose‐ or ammonium‐limited continuous cultures, in contrast, linear maximization of the overall ATP or biomass yields achieved the highest predictive accuracy.

Since these identified optimality principles describe the system behavior without preconditioning of the network through further constraints, they reflect, to some extent, the evolutionary selection of metabolic network regulation that realizes the various flux states. For conditions of nutrient scarcity, the maximization of energy or biomass yield objective is consistent with the generally observed physiology (Russell and Cook, 1995). The meaning of the maximization of ATP yield per flux unit objective for unlimited growth, however, is less obvious. Generally, it selects for small networks with yet high, albeit suboptimal ATP formation, which has three biological consequences. Firstly, resources are economically allocated since expenditures for enzyme synthesis are, on average, greater for longer pathways. Secondly, suboptimal ATP yields dissipate more energy and thus enable higher catabolic rates. Thirdly, at a constant catabolic rate, a small network results in shorter residence times of substrate molecules until they generate ATP. The relative contribution of these consequences to the evolution of network regulation is unclear, but simultaneous optimization for ATP yield and catabolic rate under this optimality principle identifies a trade‐off between the contradicting objectives of maximum overall ATP yield and maximum rate of ATP formation (Pfeiffer *et al*, 2001).

The in vivo distribution of metabolic fluxes in Escherichia coli can be predicted from optimality principles

At least two different sets of optimality principles govern the operation of the metabolic network under different environmental conditions

Metabolism during unlimited growth on glucose in batch culture is best described by the nonlinear maximization of ATP yield per unit of flux

## Introduction

The acclaimed goal of systems biology is quantitative understanding of functional interactions between the multiple cellular components to eventually predict network, cell and organism behavior (Aebersold, 2005; Bork and Serrano, 2005). Beyond intuition, quantitative understanding inevitably requires computational models to capture the enormous numbers of molecular components that interact in a highly nonlinear manner within interlinked information and biochemical networks (Kitano, 2002; Cassman, 2005; Kholodenko, 2006). For most cellular networks, such as signaling or protein–protein interaction networks, however, we do not even know all involved components that need to be represented in a model. Hence, much of the current focus is on experimental (Aebersold and Mann, 2003; Sopko *et al*, 2006) and computational (Kholodenko *et al*, 2002; Stelling, 2004; Reed *et al*, 2006; Warner *et al*, 2006) identification of missing components and their interactions to establish the network topology as a prerequisite for mechanistic modeling.

Metabolic networks are a notable exception because their interaction topology is well established in several cases; that is, we know most reactions, the enzymes that catalyze them, the genes that encode the enzymes and how they interact stoichiometrically within a biochemical network. As incomplete as this knowledge may be, it is currently far beyond that of basically any other cellular network and allows to construct metabolic models that represent almost entire microbial genomes (Price *et al*, 2004). With up to 1000 biochemical reactions, these genome‐scale models allow to predict network capabilities, for example, by using flux balance analysis (FBA) (Fell and Small, 1986). Successful FBA applications include prediction of gene deletion lethality (Edwards and Palsson, 2000a; Forster *et al*, 2003; Kuepfer *et al*, 2005), end points of adaptive evolution (Ibarra *et al*, 2002) and optimal metabolic states (Edwards *et al*, 2001).

In contrast to dynamic models that require detailed, typically unavailable kinetic parameters, constraint‐based modeling with FBA permits steady‐state analysis of large‐scale networks without large fitted parameter sets (Bailey, 2001; Price *et al*, 2004; Stelling, 2004). To identify optimal solutions in the vast solution space, FBA objective functions are defined to solve the system of linear equations that represent the mass balance constraints. While different objectives were proposed for different biological systems (Kacser and Beeby, 1984; Heinrich *et al*, 1997; Ebenhoh and Heinrich, 2001; Holzhütter, 2004; Price *et al*, 2004; Knorr *et al*, 2007), by far the most common assumption is that microbial cells maximize their growth (see below for further explanation). Since the identified optimal solutions are often inconsistent with the biological reality, the solution space is further restricted through additional constraints that reflect thermodynamic, kinetic or biochemical knowledge. Another problem is that, depending on the shape of the solution space, multiple intracellular flux distributions (alternate optima) may underlie the exact same optimal value that is identified by the objective (e.g., the maximum biomass yield) (Lee *et al*, 2000; Mahadevan and Schilling, 2003). This space of steady‐state flux solutions has been explored for biological meaning (Mahadevan and Schilling, 2003; Reed and Palsson, 2004; Wiback *et al*, 2004; Bilu *et al*, 2006) and to identify candidate network states (Papin *et al*, 2002; Thiele *et al*, 2005), but was largely ignored in many FBA studies that examined arbitrary, single optimal solutions (Segre *et al*, 2002; Almaas *et al*, 2004; Papp *et al*, 2004).

In a fully complementary approach, ^{13}C‐experiments are used to determine intracellular flux states that reveal *in vivo* operation of metabolic networks (Hellerstein, 2003; Sauer, 2004; Wiechert and Nöh, 2005; Sauer, 2006). In some instances, such experimental flux data were used to further restrict the FBA‐computed flux solution space. For lack of experimental data, however, only one or two arbitrary flux distributions were considered (Burgard and Maranas, 2003; Wiback *et al*, 2004). Attempts to actually predict intracellular fluxes by FBA methods are few and either unverified (Papp *et al*, 2004) or tested for a single case (Beard *et al*, 2002; Segre *et al*, 2002; Holzhütter, 2004). With the recent availability of large‐scale experimental flux data from various microbes (Moreira dos Santos *et al*, 2003; Blank and Sauer, 2004; Fischer and Sauer, 2005; Perrenoud and Sauer, 2005; Blank *et al*, 2005b), a more systematic analysis of the correlation between the *in silico* feasible flux space and the *in vivo* realized fluxes is now possible.

Here we examine the predictive capacity of 11 linear and nonlinear network objectives, by evaluating the accuracy of FBA‐based flux predictions through rigorous comparison to ^{13}C‐based flux data from *Escherichia coli* grown under six environmental conditions. By systematically testing all permutations of 11 objective functions with or without eight additional constraints, we identify the most appropriate combination(s) to predict *in vivo* fluxes by FBA. More generally, we thus assess whether assumed optimality principles of evolved network operation are generally valid or whether specific objectives are necessary for environmental conditions that require different metabolic activity.

## Results

### Systematic testing of objective functions and constraints for FBA

To predict intracellular fluxes through the presently known reactions of *E. coli* central carbon metabolism, we constructed a highly interconnected stoichiometric network model with 98 reactions and 60 metabolites that supports the major carbon flows through the cell (Figure 1 and Supplementary Table I). FBA‐based fluxes are typically expressed as relative fluxes that are normalized to the specific glucose uptake rate. Typically, this reference flux is known, hence absolute fluxes can be calculated by re‐scaling. Due to linear dependencies in the network, the systemic degree of freedom is restricted to a limited number of reactions that define the entire flux solution. For our network, 10 reactions are sufficient to describe the actual systemic degree of freedom, as identified by calculability analysis (Van der Heijden *et al*, 1994; Klamt *et al*, 2002). These fluxes were expressed as split ratios at pivotal branch points in the network, where each of the 10 reactions that consume a cellular metabolite is divided by the sum of all producing reactions (Figure 1 and Table I). Qualitatively identical results were obtained when repeating all reported simulations directly with the 10 absolute fluxes instead of the 10 split ratios (data not shown).

Dividing a specific consumption flux by all producing fluxes scales to unity, an unbiased comparison of the 10 network fluxes with often‐different magnitudes is possible. Moreover, it enhances intuition and biological interpretation because, wherever possible, the ratios were chosen to represent metabolic flux ratios that are obtained from ^{13}C‐experiments (Fischer *et al*, 2004) (Table II and Supplementary Table II). For example, split ratio R1 represents the fraction of the intracellular glucose‐6‐phosphate (G6P) pool that is metabolized through phosphoglucoisomerase (Pgi), relative to the summed production via G6P‐dehydrogenase (Zwf), glucokinase (Glk) and the phosphotransferase system (Pts) (Figure 1 and Table I). The experimentally determined split ratios (Table II) can be subdivided into three groups: (i) R1, R4, R5, R6 are always active, (ii) R3 is inactive under all considered conditions and (iii) R2, R7–R10 are conditionally active.

Optimal solutions in this underdetermined system of linear equations were identified by FBA with 11 linear and nonlinear objective functions to identify optimal solutions, some of which are combinations of pairs of objectives (Table III). Depending on the shape of the solution space, linear optimization frequently leads to alternate optima; that is, alternate sets of feasible flux distributions with an identical optimal value (Lee *et al*, 2000; Mahadevan and Schilling, 2003). To quantify the overall variance of *in silico* fluxes, we first determined the absolute range of variation for the individual split ratios by maximizing and minimizing each flux separately. For example, maximization of biomass yield (which is synonymous to the frequently used term of maximization of growth rate (Price *et al*, 2004)) results in ranges of the split ratios R1, R4, R6 and R7, but unique values for the remaining six split ratios (Figure 2A). Maximization of ATP yield without further constraints, in contrast, is a much better defined example with unique values for all 10 split ratios (Figure 2B). Beyond objective functions, these flux variabilities are not only dependent on the chosen objective but also the network structure, and were also shown to exist in genome‐scale models (Mahadevan and Schilling, 2003; Reed and Palsson, 2004). To further constrain the solution space, we imposed eight additional constraints on network operation (Table IV). The choice of objective functions and constraints widely predefine the degree of freedom in terms of specific pathway usage (data not shown), hence appropriate objective/constraint combinations can potentially be used to approximate metabolic behavior.

We systematically assessed the predictive capability of FBA by comparing all objective/constraint permutations to ^{13}C‐detected *in vivo* flux distributions from six growth conditions, including glucose‐ and ammonium‐limited chemostat cultures and batch cultures with excess nutrient supply (Table II and Supplementary Table II). For each of the 99 different optimization problems, the maximum and minimum Euclidean distance between *in silico* and *in vivo* flux solutions was evaluated by simultaneously considering the 10 split ratios. Confidence intervals of the experimental flux ratios were considered by the standardized Euclidean distance, which weights the distance between prediction and data by the square of the corresponding standard deviation δ^{exp}. The resulting value describes the overall deviation of the *in silico* predicted flux distribution (or range of flux distributions) with respect to the corresponding experimental reference solution, and is henceforth referred to as predictive fidelity (Supplementary Figure 1). This predictive fidelity depends on two factors: (i) the minimal possible standardized Euclidean distance to the *in vivo* results and (ii) the potential variance of the *in silico* fluxes that arises from alternate FBA optima.

### FBA‐based flux prediction for batch cultures

First, we determined the predictive fidelity of the 99 objective/constraint combinations for unlimited batch growth on glucose under aerobic, anaerobic and nitrate‐respiring conditions (Sauer *et al*, 2004; Perrenoud and Sauer, 2005). Obviously, the agreement is specific for each case, since it depends (i) on the particular objective/constraint combination that defines the shape of the solution space and (ii) the experimental reference flux distribution (Figure 3, Supplementary Figure 1 and Supplementary Table III). Since minimization of glucose consumption and maximization of ATP yield per reaction step gave almost identical results as maximization of biomass yield and maximization of ATP yield, respectively, only the latter two are discussed (Supplementary Table III). The results obtained by minimization of reaction steps, minimization of the redox potential and minimization of ATP producing fluxes were considerably worse than those obtained with the other objectives, hence are not discussed in the following (Supplementary Table III).

Without requiring additional constraints, the highest predictive fidelity for aerobic batch cultures was obtained by maximizing ATP yield per flux unit, yielding a unique flux prediction that is closest to the experimental data (Figure 3). Since this nonlinear optimization function is non‐convex, it bears the danger of identifying only local optima. To confirm that indeed a global optimum was identified, we first reformulated the original objective function as a convex function that contains a linear and a nonlinear, but convex term (see Materials and methods for details). This new nonlinear but convex function cannot be optimized *per se*, since it needs *a priori* weighting of both function terms. Thus, in a second step, we performed a sensitivity analysis around the previously identified optimal solutions. Since no iteration resulted in a solution with a higher optimal value, we have strong indication that indeed global optima were identified (Supplementary Figure 7). Of the remaining objective functions, only the maximization of the ATP yield objective achieved similar predictive fidelities when combined with particular constraints. Maximization of biomass yield, in contrast, suffered from alternate optima over a wide range of constraints (Figure 3). Although unique results are feasible by invoking a P‐to‐O‐ratio (moles of ATP produced per mole of oxygen) of unity, the predictive fidelity is still inferior to the one obtained with the maximization of ATP objectives.

The predictive fidelity is a general criterion for the predictive accuracy. It cannot, however, identify the individual metabolic functions that are responsible for the deviations. To elucidate whether these are based on large errors in single ratios or on small errors in many ratios, we plotted *in silico* and *in vivo* ratios as scatter plots where perfect predictions fall on the bisecting diagonal (Figure 4). For aerobic batch cultures, acetate secretion (R9) in combination with a sound predictive fidelity is one main discriminating variable that was only captured by the maximization of ATP yield per flux unit (Figure 4A and C). In combination with oxygen constraints, the maximization of ATP yield mimics the maximization of ATP yield per flux unit objective (Figures 3, 4B and C). Minimizing the overall intracellular fluxes inherently leads to acetate secretion, however, at the cost of deviations in other ratios, in particular for Entner–Doudoroff activity (R2) (data not shown).

Akin to aerobic cultures, the maximization of ATP yield per flux unit was the only objective that achieved reasonable predictions without further constraints for anaerobic nitrate‐respiring batch cultures (Figures 3 and 4E). Improved predictions are possible, however, by setting a particular constraint on the nitrate respiration rate for the maximization of biomass yield objective (Figures 3 and 4D). Largely independent on the invoked constraints the anaerobic batch culture was well predicted by all considered objective functions (Figures 3 and 4F). This behavior is due to the low degree of freedom in the absence of an external electron acceptor and to the experimental uncertainty of some of the fluxes (see Supplementary Table II), which allow for a relatively high predictive fidelity even if the actual agreement is low.

### FBA‐based flux prediction for chemostat cultures

In contrast to unrestricted nutrient supply in batch cultures, a single, defined nutrient limits the rate of growth in continuous chemostat cultures (Russell and Cook, 1995; Hoskisson and Hobbs, 2005). To evaluate predictions for the rather different metabolic behavior under such conditions, we used experimental flux data from slowly (0.1 h^{−1}) and rapidly (0.4 h^{−1}) growing chemostat cultures under glucose‐ (Nanchen *et al*, 2006) and ammonium limitation (Emmerling *et al*, 2002) (Table II and Supplementary Table II).

Largely independent of additional constraints, the maximization of ATP or biomass yield objectives approximated all chemostat cultures best (Figures 5 and 6). Maximization of ATP producing fluxes resulted in similar predictive fidelities as maximization of ATP yield (Supplementary Table III). Although mathematically distinct, both objectives maximize ATP production and thus lead to similar *in silico* flux predictions. Alternate optima occurred only in one case for the biomass objective and can be overcome, as for aerobic batch cultures, by constraining the P‐to‐O‐ratio to unity. The relative independence of the predictive fidelity on constraints demonstrates that these objectives provide somewhat robust predictions for metabolism under nutrient limitation. Nevertheless, various specific objective/constraint combinations were also capable of describing the different conditions, in particular the maximization of ATP yield per flux unit, in combination with all eight constraints for both carbon‐limited chemostat cultures (Figures 5 and 6E). In sharp contrast to batch cultures, however, the maximization of ATP yield per flux unit objective was basically useless without further constraints (Figure 6B).

### Robustness of predicted flux solutions

As a main discriminating variable of good objectives for aerobic batch cultures, the well‐known phenomenon of acetate overflow was only captured when maximizing the ATP yield per flux unit (Figure 3). Maximizing the overall ATP yield mimicked this behavior only when combined with oxygen uptake constraints. Since particular combinations of oxygen uptake and P‐to‐O ratio constraints and the frequently used maximization of biomass yield objective should achieve the same effect (Varma *et al*, 1993; Varma and Palsson, 1994), we performed a sensitivity analysis by determining the predictive fidelity and acetate production for step‐wise increases of the oxygen uptake constraint for four P‐to‐O ratios (Figure 7A and B and Supplementary Figures 2 and 3).

As for maximization of the overall ATP yield, only fine‐tuning of the network by invoking particular combinations of P‐to‐O ratio and oxygen constraints resulted in reasonable flux predictions and acetate secretion rates for the maximization of biomass yield objective function. However, the predictive fidelity was very sensitive to changes in the parameters, such that only a narrow range of oxygen uptake constraints enforced a good fit for a given P‐to‐O ratio, for example unrealistically low oxygen uptake rates of 5–7 mmol/g h at a P‐to‐O ratio of 2 (Supplementary Figures 2 and 3). For maximization of the ATP yield objective, in contrast, the predictive fidelity was relatively insensitive to the exact value of the oxygen uptake and the P‐to‐O ratio constraint, with a critical threshold for the oxygen uptake constraint of around 15 mmol/g h; that is, the upper bound of experimentally observed values in glucose batch cultures (Varma *et al*, 1993; Varma and Palsson, 1994; Xu *et al*, 1999). Since such metabolic parameters often vary between strains or with small environmental differences, a certain robustness of predicted flux solutions with respect to the chosen constraints is a highly desirable property for constraint‐based modeling. Hence, both ATP objectives are clearly of superior robustness for the prediction of fluxes in aerobic batch cultures.

### Are all fluxes equally difficult to predict?

To systematically analyze the predictive capability for individual flux ratios, we defined the specific agreement ρ_{i} (Supplementary Figure 4). This value considers the deviation between each single pair of *in silico* and *in vivo* values for all 10 split ratios and is weighted by the ratio of the corresponding accuracies, that is the possible range of the computational split ratio divided by the experimental standard deviation.

To obtain an overview of the most difficult to predict fluxes, independent of the specific objective and constraints, we applied cluster analysis to the Euclidean distance among specific agreements ρ (Figure 8 and Supplementary Figures 5 and 6). The 2700 considered data points represent the 10 split ratios for all objective/constraint combinations considered in Figures 3 and 5 under each environmental condition. Most difficult to predict in four out of the six conditions were the fluxes into glycolysis (R1) and acetate secretion (R9), although sometimes in different combinations (Figure 8A–C and F). Exceptions were the C‐limited chemostats, were fluxes through the glyoxylate shunt (R7) and the TCA cycle (R6) were most difficult to predict (Figure 8D and E). Thus, some fluxes are clearly more difficult to predict than most others, but those problematic ones often change with the environmental conditions.

## Discussion

The key question addressed here is whether intracellular fluxes in metabolic steady state can be predicted from network stoichiometry alone by invoking optimality principles. Our systematic and statistically rigorous comparison of FBA‐based *in silico* flux predictions from 99 objective/constraint combinations to *in vivo* fluxes from ^{13}C‐experiments demonstrated that prediction of relative flux distributions is, within limits, possible. Since no single objective predicted the experimental data for wild‐type *E. coli* under all conditions, the pivotal element is to identify the most relevant objective for each condition.

For unlimited growth on glucose in oxygen or nitrate respiring batch cultures, by far the best objective function was nonlinear maximization of the ATP yield per unit of flux, which is a combination of the linear maximization of overall ATP yield and minimization of the overall flux. In some cases, similar predictions could be achieved by combining the overall maximization of ATP and biomass yield objectives with particular oxygen uptake constraints. Under nutrient scarcity in nutrient‐limited continuous cultures, in contrast, the linear maximization of ATP or biomass yield were clearly superior. As a result of the low degree of freedom in non‐respiring batch cultures, all objective functions lead to equally well flux predictions.

As an unexpected key result, model preconditioning through additional and potentially artificial constraints is not necessary if the appropriate objective function is chosen for a given condition. Invoking additional constraints for suitable objectives achieved only subtle improvements or avoided alternative optima in few cases. When combined with particular constraints, even suboptimal objectives could be forced to yield equally accurate predictions for some conditions; in the sole case of nitrate respiration even better predictions. Setting of these additional constraints, however, is condition‐ and objective specific, thus requires considerable *a priori* knowledge to be biologically meaningful. Except for subtle differences in predictive fidelity, all major conclusions are independent of using normalized ratios instead of absolute flux values (data not shown).

We explicitly considered the fundamental FBA problems of alternate optima and experimental accuracy by size reduction through calculability analysis and by including confidence regions for *in vivo* fluxes in the standardized Euclidean distance, respectively. An important question is whether or not our results are model dependent. To address this point, we verified the key conclusions with two genome‐scale models of *E. coli* metabolism (Edwards and Palsson, 2000b; Reed *et al*, 2003). Although specific properties such as flux variability clearly depend on the network structure of the particular stoichiometric network model (see below for details), the above‐identified objectives also achieved the best predictions for fluxes in the central carbon metabolism with either genome‐scale model (data not shown).

Clearly, alternate optima occurred also in genome‐scale network models (see Materials and methods for details). Independent of the model size, however, *in silico* variability can be avoided such that uniquely defined solutions are obtained when low P‐to‐O ratios are assumed or when all internal proton fluxes are balanced. With the proton‐balanced genome‐scale model of Reed *et al* (2003), for example, such unique flux solutions can be obtained. The solution space spanning all alternate optima has previously been scanned for biological meaning (Mahadevan and Schilling, 2003; Reed and Palsson, 2004; Wiback *et al*, 2004), and identified interesting correlations with levels of gene expression that point to evolutionary constraints on how tight certain reactions need to be regulated (Bilu *et al*, 2006). A potential complication is that the solution space itself is not, or not entirely, an inherent network feature, but also a function of the arbitrarily chosen objectives and constraints.

An important distinction must be made between FBA‐based prediction of the typically investigated general physiology (i.e., extracellular uptake and production rates and the growth rate) and the here attempted prediction of the underlying intracellular flux distribution, which has several‐fold more variables. Hence, there is no immediate contradiction between good prediction of growth physiology obtained by maximizing the growth yield (Price *et al*, 2004) or minimizing the redox production rate (Knorr *et al*, 2007), and their here demonstrated limited capacity to predict the underlying flux state. In some cases, alternate flux optima are responsible for the apparent discrepancy and in others it is primarily the specific combination of chosen constraints that explain why good physiology predictions were achieved with suboptimal objectives. By demonstrating that intracellular fluxes can be approximated with experimentally validated optimality assumptions, we go beyond flux prediction algorithms such as minimization of metabolic adjustment (Segre *et al*, 2002) or regulatory on/off minimization (Shlomi *et al*, 2005) that require a reference flux distribution in the wild type to predict fluxes in mutants. Carefully chosen objectives achieve intrinsically good prediction not only of growth physiology, but also of intracellular fluxes in wild‐type *E. coli* without preconditioning the system through additional constraints apart from the experimentally determined growth rate. Since the network model was identical in all cases, the identified optimality functions potentially reflect the evolved regulatory processes that realize the particular flux states under different environmental conditions.

Under nutrient scarcity in chemostat cultures, metabolism normally supports efficient biomass formation with respect to the limiting nutrient (Russell and Cook, 1995). Based on our results, this operational state appears to have evolved under the objective to maximize either the ATP or biomass yield (synonymous to the frequently used maximization of growth rate objective). Under conditions of unlimited growth in aerobic or nitrate‐respiring batch cultures, in contrast, energy production is clearly not optimized *per se* because cells secrete large amounts of acetate instead of using the more efficient respiratory chain.

What then is the biological interpretation of the more appropriate maximization of the ATP yield per flux unit? Optimization of this objective is realized by maximizing ATP production (the nominator) and by minimizing the overall intracellular flux (the denominator). Hence, small networks with yet high, albeit suboptimal catabolic ATP formation are identified, which has three potential biological consequences. Firstly, resources are economically allocated because expenditures for enzyme synthesis are, on average, greater for longer pathways. Secondly, suboptimal ATP yields dissipate more energy and thus enable higher catabolic rates because the difference between the free energies of substrates and products must be used for both, energy conservation by synthesizing ATP (increase the yield) and energy dissipation to drive the chemical reaction (increase the rate) (del Valle and Aledo, 2002; Pfeiffer and Schuster, 2005). Thirdly, at a constant catabolic rate, a small network results in shorter residence times of substrate molecules until they generate ATP and probably other cofactors. The relative contribution of these consequences to the evolution of network regulation is unclear, but simultaneous optimization for ATP yield and catabolic rate under this optimality principle identifies a trade‐off between the contradicting objectives of maximum overall ATP yield and maximum rate of ATP formation (Pfeiffer *et al*, 2001). Under nutrient scarcity, in contrast, the metabolic state is closer to an optimal yield of ATP (or biomass) at the cost of the rate of formation.

## Materials and methods

### Stoichiometric model and constraints

The constructed stoichiometric model of *E. coli* contains all presently known reactions in central carbon metabolism with 98 reactions and 60 metabolites (Supplementary Table I). To apply FBA, the reaction network was automatically translated into a stoichiometric matrix (Schilling and Palsson, 1998) by means of a parser program implemented in Matlab (MATLAB^{®}, version 7.0.0.19920 (R14), The MathWorks Inc., Natick, MA). Assuming steady‐state mass balances, the production and consumption of each of the *m* intracellular metabolites *M*_{i} is balanced to yield

with

*S* corresponds to the stoichiometric matrix (*m* × *n*) and ν (*n* × 1) to the array of *n* metabolic fluxes with ν_{i}^{lb} as lower and ν_{i}^{ub} as upper bounds, respectively. The above equations represent the conservation law of mass that is fundamental to constraint‐based modeling. For all herein presented stoichiometric analyses, maximization of biomass yield is synonymous to the frequently used maximization of growth rate objective (Price *et al*, 2004). This is because stoichiometric models are sets of linear balance equations that are inherently dimensionless, hence maximization of the biomass reaction optimizes the amount of product (i.e., the yield) rather than a time‐dependent rate of formation. The P‐to‐O ratio constraint was implemented by omitting the energy‐coupling NADH dehydrogenase I (Nuo), cytochrome oxidase *bo3* (Cyo) and/or cytochrome oxidase *bd* (Cyd) components of the respiratory chain. For a ratio of unity, Cyd and Nuo were set equal to zero. Under anaerobic conditions, electron flow is only possible via the NADH oxidases Nuo or NADH dehydrogenase II (Ndh) to fumarate reductase (Frd), hence coupled to succinate fermentation. For nitrate respiration, the terminal oxidase nitrate reductase (Nar) was used instead of Cyd or Cyo (Unden and Bongaerts, 1997).

For the genome‐scale analysis we used two recently reconstructed models of *E. coli* metabolism (Edwards and Palsson, 2000b; Reed *et al*, 2003). *In silico* growth was simulated on glucose minimal medium for all six environmental conditions. ADP remained unbalanced, since otherwise formation of adenosine would be carbon‐limited. For the proton‐balanced model of Reed *et al* (2003), severe alternate optima occurred in central carbon metabolism given an unlimited proton exchange flux between the cell and the medium and a P‐to‐O ratio of 2, that is the upper bound of the biologically feasible range of P‐to‐O ratios (Unden and Bongaerts, 1997). To prevent the unlimited production of ATP equivalents through the ATPS4r reaction under this condition, all external protons involved in the respiratory chain and the transhydrogenase reaction were balanced (specifically, we balanced the external protons around the reactions ATPS4r, TDH2, CYTBD, CYTBO3, NO3R1, NO3R2, NADH6, NADH7, NADH8). A P‐to‐O ratio of 2 was implemented by assuming both the transport of four protons through CYTBO3 and NADH6 across the membrane and the diffusion of four protons through ATPS4r for the formation of one ATP equivalent.

### Objective functions

Linear optimization was used to identify optimal solutions for the objectives maximization of biomass or ATP yield, minimization of glucose consumption, minimization of the redox potential and minimization as well as maximization of ATP producing fluxes. The mathematical definition for all 11 objective functions is given in Table III. While identification of a global optimal value is guaranteed, alternate optima occur frequently. Nonlinear optimization such as the minimization of the overall intracellular flux and maximization of biomass or ATP yield per flux unit do not produce alternate optima. Minimization of the overall intracellular flux always identifies a global optimum because the underlying optimization problem is quadratic and thus convex. Since such convexity cannot be assumed for maximization of biomass or ATP yield per flux unit, we used the general nonlinear solver of the programing package LINDO (Lindo Systems Inc., Chicago, IL) with 100 randomly chosen starting values to find global optima for these two non‐convex nonlinear optimization problems. We implemented two independent approaches to validate our results. In a first approach we randomly changed the value of 5% of the variables by 10% iteratively 100 times for all constraint permutations (data not shown). Flux distributions with a higher objective function value were not identified in any of the iterations. In a second approach we reformulated both non‐convex nonlinear objective functions

and

to a nonlinear, but convex form

respectively, corresponding to maximization of (2a) ATP and (2b) biomass yield per flux unit, respectively (Table III contains the mathematical definitions of all objective functions). ε represents a small value that characterizes the unique trade‐off between ATP (biomass) maximization and minimization of the flux norm. We assessed the objective function value of ATP (biomass) yield maximization per flux unit for different ε values according to ε=ε_{0} (1±0.5), where ε_{0} was set equal to the objective function value, which was identified previously with the non‐convex nonlinear objective function of max ATP (biomass) yield per flux unit, for the particular environmental condition (Supplementary Figures 7 and 8). Given ε=ε_{0}, the previously found flux distribution yielded the optimal solution for every environmental condition (Supplementary Figures 7 and 8). For all other ε values, independent optimizations only lead to suboptimal solutions. Hence, in the present case we have strong indication that global optima are actually identified.

For minimization of the number of reaction steps and for the maximization of the ATP yield per reaction step, we used the mixed‐integer solver of the programing package LINDO (Lindo Systems Inc.). All mixed‐integer optimizations were formalized as:

where for each flux *i*, *y*_{i}=1 stands for a non‐zero, that is active flux in ν_{i} and *y*_{i}=0 otherwise, and *w*_{i}^{ub} and *w*_{i}^{lb} are thresholds for determining non‐zero fluxes (equations 3d, 3e and f), with κ and ε specifying the relative and absolute ranges of tolerance, respectively (equations 3g and h). The definitions of the objective functions *f* of linear minimization of reaction steps and nonlinear maximization of ATP yield per reaction step (equation 3a) can be taken from Table III. The constraints of the original linear programing problem with respect to steady state of mass balances and enzyme reversibilities were maintained (equations 3b and c). For κ and ε, we chose minimal values that still resulted in reasonable running times of the mixed‐integer solver (specifically, we chose κ=0, ε=0 and κ=0, ε=1 for linear minimization of reaction steps and nonlinear maximization of ATP yield per reaction step, respectively). Optimality of the solution obtained by the mixed‐integer nonlinear optimization was verified by randomly changing 5% of the integer values 10 times iteratively for the six conditions without additional constraints (data not shown).

### Calculability analysis

Instead of comparing all computationally (ν_{i}) and experimentally identified fluxes (ν_{i}^{exp}) in the network, we focused on those that are sufficient to describe the complete systemic degree of freedom because most fluxes are linearly dependent. This minimal subset of fluxes was identified by calculability analysis (Van der Heijden *et al*, 1994; Klamt *et al*, 2002) from the null space of the stoichiometric matrix *S* and allowed the calculation of all unique reaction rates in the underdetermined network. To reduce the considerable difference in magnitude of different fluxes in the network, their rates were expressed as split ratios *R* of divergent fluxes (Figure 1 and Table I), hence they are scaled to values between zero and unity. Error propagation was used to take the standard deviation of each of the experimentally determined split ratios into account. A default error of 5% was assumed for inactive flux ratios. Secretion of succinate, pyruvate and formate was not considered for calculability analysis, since the corresponding rates are negligible. Non‐carbon fluxes such as respiration were also neglected.

### Predictive fidelity and alternate optima

Generally, there are two basic principles to quantify the agreement between data series; that is, correlation coefficients that measure linear dependencies and the geometric distance (McShane *et al*, 2002; Segre *et al*, 2002). Since we were interested in the similarity between multiple computational and experimental results rather than their linear dependency, we used the Euclidean distance to quantify the overall agreement. The Euclidean distance belongs to the group of L_{2} distance measures, which capture the deviation between two points in absolute terms (Diggle, 1983)

Our reduction of the overall solution space to a minimal set of 10 linear independent split ratios allowed the direct comparison of complete experimental and computational flux distributions. We defined the term predictive fidelity as the overall agreement between complete experimental and computational flux solutions relative to the specific experimental variance. This explicitly includes putative variability of *in silico* split ratios due to existence of alternate optima. The global optimal value *Z*^{obj} of different objective functions *f* (e.g., maximization of biomass or ATP yield) is determined in a preliminary optimization step, which defines the computational solution space. In case of linear objective functions, with potential alternate optima, the best and the worst possible agreement of the underlying range of flux vectors and experimental data is subsequently quantified by minimizing and maximizing the standardized Euclidean distance, *D*^{S}, respectively:

The global optimal value *Z*^{obj} has to hold (equation 5*b*) as well as the constraints of the original linear programing problem (equations 1, 5*c* and *d*). *D*^{S} marks the standardized Euclidean distance (equation 5*e*), where the deviation ε between *in silico* and *in vivo* ratios, *R*^{comp} and *R*^{exp}, respectively (equation 5*f*), is weighted by the experimental variance σ^{exp} (equation 5*g*). Finally, the split ratios *R* are a function of intracellular fluxes ν (equation 5h). Note that both computational and experimental split ratios determine the Euclidean distance, such that even small changes in the *in vivo* and *in silico* results can result in a completely different behavior.

Hence, the predictive fidelity explicitly considers both alternate optima and unique solutions. All optimizations for the calculation of the predictive fidelity were performed with linprog and fmincon (MATLAB^{®}, version 7.0.0.19920 (R14)). Iterative calculations with 100 different, randomly chosen starting points were performed when the nonlinear solver fmincon was used.

### Specific agreement for individual split ratios

Predictive fidelity ranks the overall agreement between FBA flux predictions and experimental data. For detailed analysis of the predictive agreement in individual split ratios, experimental confidence intervals and theoretical flexibility due to alternate optima had to be taken into account. The absolute distance between each experimental *R*^{exp} and the mean computational split ratio *R*^{comp,mean} was weighted by the experimental standard deviation δ^{exp} and the possible range Δ*R* of the computational split ratios (Supplementary Figure 4). The specific agreement ρ was quantified by a standardized variable:

By weighting the absolute distance with the experimental uncertainty, the predictive accuracy is taken into account, that is a large absolute deviation is considered less severe if it is associated with a large experimental uncertainty δ^{exp}. On the other hand, the absolute deviation is considered more severe if it is associated with a large computational uncertainty Δ*R*. A default value of 0.05 was chosen for δ^{exp} or Δ*R*, respectively, when the values were zero.

The hierarchical cluster trees were created with the linkage algorithm (MATLAB^{®}, version 7.0.0.19920 (R14)) using the Euclidean distances among all data points. The cophenetic correlation coefficients (i.e., the correlation coefficients of the distance values) (Legendre and Legendre, 1998) were calculated for all cluster trees to guarantee a faithful representation of the dissimilarities among the 10 ratios for every objective/constraint combination considered. Groups of nodes were assigned where the linkage among the nodes was less than 0.7, when the linkage was normalized to values between 0 and 1.

## Acknowledgements

We are grateful to Annik Nanchen, Eliane Fischer, Tobias Fuhrer, Nicola Zamboni, Matthias Heinemann and Joerg Stelling for fruitful discussions and critical comments on the manuscript. Support for Robert Schuetz through an ETH research grant is acknowledged.

## Supplementary Information

Supplementary Table I [msb4100162-sup-0001.xls]

Supplementary Table II [msb4100162-sup-0002.xls]

Supplementary Table III [msb4100162-sup-0003.xls]

Supplementary Figure 1 [msb4100162-sup-0004.pdf]

Supplementary Figure 2 [msb4100162-sup-0005.pdf]

Supplementary Figure 3 [msb4100162-sup-0006.pdf]

Supplementary Figure 4 [msb4100162-sup-0007.pdf]

Supplementary Figures 5 [msb4100162-sup-0008.pdf]

Supplementary Figures 6 [msb4100162-sup-0009.pdf]

Supplementary Figure 7 [msb4100162-sup-0010.pdf]

Supplementary Figure 8 [msb4100162-sup-0011.pdf]

Structured Data [msb4100162-sup-0012.zip]

## References

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

- Copyright © 2007 EMBO and Nature Publishing Group