The computational study of human metabolism has been advanced with the advent of the first generic (non‐tissue specific) stoichiometric model of human metabolism. In this study, we present a new algorithm for rapid reconstruction of tissue‐specific genome‐scale models of human metabolism. The algorithm generates a tissue‐specific model from the generic human model by integrating a variety of tissue‐specific molecular data sources, including literature‐based knowledge, transcriptomic, proteomic, metabolomic and phenotypic data. Applying the algorithm, we constructed the first genome‐scale stoichiometric model of hepatic metabolism. The model is verified using standard cross‐validation procedures, and through its ability to carry out hepatic metabolic functions. The model's flux predictions correlate with flux measurements across a variety of hormonal and dietary conditions, and improve upon the predictive performance obtained using the original, generic human model (prediction accuracy of 0.67 versus 0.46). Finally, the model better predicts biomarker changes in genetic metabolic disorders than the generic human model (accuracy of 0.67 versus 0.59). The approach presented can be used to construct other human tissue‐specific models, and be applied to other organisms.
The study of normal human metabolism and its alterations is central to the understanding and treatment of a variety of human diseases, including diabetes, metabolic syndrome, neurodegenerative disorders, and cancer. A promising systems biology approach for studying human metabolism is through the development and analysis of large‐scale stoichiometric network models of human metabolism. The reconstruction of these network models has followed two main paths: the former being the reconstruction of generic (non‐tissue specific) models, characterizing the complete metabolic potential of human cells, based mostly on genomic data to trace enzyme‐coding genes (Duarte et al, 2007; Ma et al, 2007), and the latter is the reconstruction of cell type‐ and tissue‐specific models (Wiback and Palsson, 2002; Chatziioannou et al, 2003; Vo et al, 2004), based on a similar methodology to that described above, with the extra complexity of manual curation of literature evidence for the cell/system specificity of metabolic enzymes and pathways.
On this background, we present in this study, to the best of our knowledge, the first computational approach for a rapid generation of genome‐scale tissue‐specific models. The method relies on integrating the previously reconstructed generic human models with a variety of high‐throughput molecular ‘omics’ data, including transcriptomic, proteomic, metabolomic, and phenotypic data, as well as literature‐based knowledge, characterizing the tissue in hand (Figure 1). Hence, it can be readily used to quite rapidly build and use a large array of human tissue‐specific models. The resulting model satisfies stoichiometric, mass‐balance, and thermodynamic constraints. It serves as a functional metabolic network that can then be used to explore the metabolic state of a tissue under various genetic and physiological conditions, simulating enzymatic inhibition or drug applications through standard constraint‐based modeling methods, without requiring additional context‐specific molecular data.
We applied this approach to build a genome scale model of liver metabolism, which is then comprehensively tested and validated. The model is shown to be able to simulate complex hepatic metabolic functions, as well as depicting the pathological alterations caused by urea cycle deficiencies. The liver model was applied to predict measured intra‐cellular metabolic fluxes given measured metabolite uptake and secretion rates at different hepatic metabolic conditions. The predictions were tested using a comprehensive set of flux measurements performed by (Chan et al, 2003), showing that the liver model obtained more accurate predictions compared to those obtained by the original, generic human model (an overall prediction accuracy of 0.67 versus 0.46). Furthermore, it was applied to identify metabolic biomarkers for liver in‐born errors of metabolism—once again, displaying superiority vs. the predictions generated by the generic human model (accuracy of 0.67 versus 0.59).
From a biotechnological standpoint, the liver model generated here can serve as a basis for future studies aiming to optimize the functioning of bio artificial liver devices. The application of the method to rapidly construct metabolic models of other human tissues can obviously lead to many other important clinical insights, e.g., concerning means for metabolic salvage of ischemic heart and brain tissues. Last but not least, the application of the new method is not limited to the realm of human modeling; it can be used to generate tissue models for any multi‐tissue organism for which a generic model exists, such as the Mus musculus (Quek and Nielsen, 2008; Sheikh et al, 2005) and the model plant Arabidopsis thaliana (Poolman et al, 2009).
The first computational approach for the rapid generation of genome‐scale tissue‐specific models from a generic species model.
A genome scale model of human liver metabolism, which is comprehensively tested and validated using cross‐validation and the ability to carry out complex hepatic metabolic functions.
The model's flux predictions are shown to correlate with flux measurements across a variety of hormonal and dietary conditions, and are successfully used to predict biomarker changes in genetic metabolic disorders, both with higher accuracy than the generic human model.
The understanding of human metabolism is crucial for the study and treatment of a diverse and wide range of clinical conditions, ranging from common metabolic disorders, such as obesity and diabetes, to rare inborn errors of metabolism (IEMs). Carcinogenesis is also known to involve abnormal metabolic phenotypes, and metabolic targets have long been used in cancer chemotherapy (Serkova et al, 2007; Galmarini et al, 2008). Recently, malfunctions in energy metabolism were shown to be involved in various brain disorders, from schizophrenia to neurodegenerative disorders (Holmes et al, 2006; Huang et al, 2007). The computational study of human metabolism could complement experimental investigations of these central medical problems providing insights into their pathophysiology and advancing their treatment.
Mathematical modeling of cellular metabolism has traditionally been performed through kinetic modeling techniques that require detailed information on kinetic constants and on enzyme and metabolite levels (Garfinkel and Hess, 1964). However, the lack of accurate cellular information of enzymes' kinetics and levels currently limits the applicability of such methods to small‐scale systems. An alternative computational approach, constraint‐based modeling (CBM), bypasses these hurdles as it does not depend on such detailed information. Constraint‐based modeling assumes a metabolic steady state under which feasible flux distributions satisfy a stoichiometric mass‐balance requirement, thermodynamic constraints and constraints on enzymes' capacities that are based on experimental observations of flux rates. This modeling paradigm has been extensively applied with considerable success to study microbial physiology (Edwards et al, 2001; Segre et al, 2002; Shlomi et al, 2005; Feist and Palsson, 2008; AbuOun et al, 2009; Durot et al, 2009; Kumar and Maranas, 2009; Oberhardt et al, 2009). Though still less developed, large‐scale modeling of human metabolism is constantly progressing (Goodacre et al, 2004); earlier study has focused on characterizing distinct human metabolic pathways (Kanehisa and Goto, 2000; Romero et al, 2004), and modeling specific cell types and organelles (Wiback and Palsson, 2002; Chatziioannou et al, 2003; Vo et al, 2004).
In 2007, two generic human metabolic models were presented on the basis of an extensive evaluation of genomic and bibliomic data (Duarte et al, 2007; Ma et al, 2007). These network models consist of a collection of biochemical reactions that may take place under different tissues and cell types, depending on physiological conditions. The potential clinical utility of the generic model was previously demonstrated by its ability to identify functionally related sets of reactions that are causally related to hemolytic anemia, and potential drug targets for treating hypercholesterolemia (Duarte et al, 2007). In a recent study, the utility of this generic human metabolic model was further demonstrated in predicting metabolic biomarkers whose concentration is altered due to genomic mutations in IEMs (Shlomi et al, 2009). Addressing the challenge of using a generic human model to predict tissue‐specific metabolism, a computational method for integrating a generic model with tissue‐specific gene and protein expression data was recently presented (Shlomi et al, 2008). This study successfully predicted a variety of metabolic behaviors of different human tissues, including the brain, liver, kidney and more. The predicted metabolic behavior characterizes, for each tissue, a single, normal physiological condition under which the expression data (used as input) was measured.
In this study, we develop a new computational method that uses a variety of different tissue‐specific molecular data sources to reconstruct functional metabolic network models of human tissues, rather than predicting a single metabolic state of a tissue as shown in the study by Shlomi et al (2008). Such models can then be used to explore the metabolic state of a tissue under various genetic and physiological conditions through standard CBM methods, without requiring additional context‐specific molecular data. More specifically, although in the previous approach of Shlomi et al (2008) one could not simulate and study the effects of genetic perturbations or drug applications, this can now be performed in a straightforward manner given the model constructed in this study. Notably, harnessing the mixed‐integer linear programming approach used in the study by Shlomi et al (2008) to our current goal of constructing a full‐fledged tissue model does not work—this is due to the high computational demands put forward by the many tissue‐specific data sources required in this study. Hence, alternatively, our new method is based on heuristically pruning the generic human metabolic model to derive a sub‐network that is as consistent as possible with the pertaining tissue‐specific molecular data sources, including literature‐based knowledge, transcriptomic, proteomic, metabolomic, and phenotypic data. Applying the new method to generate the first genome‐scale metabolic model of the liver, the resulting model is shown to be consistent with a variety of hepatic data sources, and to successfully permit the simulation of various metabolic states under different physiological conditions, in a manner superior to that obtained using the generic model.
The model‐building algorithm (MBA)
Our algorithm derives a tissue‐specific metabolic model from a generic model based on network integration with various molecular data sources. We begin by first inferring from the tissue‐specific data a set of reactions denoted as the core reactions, that is, reactions that are included in the generic model and should be included in the tissue model we aim to build (see Materials and methods section). We differentiate between core reactions that have a high versus moderate probability to be carried out in the specific tissue and divide the core into two sets, CH and CM, for high and moderate likelihood reactions, respectively. In general, the CH set includes human‐curated tissue‐specific pathways and the CM set includes reactions testified by molecular data. Both the CH and the CM are subsets of the generic model. The input to our algorithm is a generic model that is composed of a union of reactions that may exist in different, individual tissues (i.e., in our case, the generic human model), and the core, tissue‐specific reactions (Figure 1). Our goal is then to derive the most parsimonious tissue‐specific consistent model, which includes all the tissue‐specific high‐probability reactions (CH), a maximal number of moderate probability reactions (CM), and a set of additional reactions from the generic model that is required for gap filling—that is, for enabling the activation of all core reactions, resulting in a fully viable and consistent tissue‐specific model. The latter reactions are observed by searching for a minimal set of reactions that should be added to obtain a consistent model—that is, a model in which for each reaction there is an overall feasible flux distribution in which it is active. Aiming to find the most parsimonious model on one hand, and maximizing the number of moderate probability reactions that are included in the model on the other hand, induces a tradeoff. The latter is tuned through a parameter that weighs both optimization criteria to obtain a single score that evaluates the quality of a model (see Materials and methods section). Sensitivity analysis was preformed to examine the reliance of the resulting model on this optimization parameter (Supplementary information, pp 5–6), showing that MBA's performance is fairly robust and does not hinge upon a choice of a narrow range of optimization parameter values.
To find an optimal model with a maximal score, we use a greedy heuristic search that is based on iteratively pruning reactions from the generic model, in a random order, while maintaining the consistency of the pruned model (starting from a generic model that is consistent). In each pruning step, a reaction is removed only if its removal does not prevent the activation of reactions in CH, and its removal increases the model's score. As the reactions’ scanning order may affect the resulting model, the algorithm is executed with different, random pruning orders (1000 in our implementation) to construct multiple candidate models (see Supplementary Figure 1 for further clarification). The fraction of models that contains a certain reaction reflects the confidence that it should be included in the final tissue model. Hence, to construct the final tissue model, the candidate models are aggregated, starting from CH and iteratively adding reactions ordered by their scores, until a final, minimal but consistent model is obtained.
Generating a metabolic model of the liver
We applied our method to automatically generate a genome‐scale metabolic network model of the liver. Our starting point is the generic human metabolic model proposed by Duarte et al (2007), accounting for 2766 metabolites, 3742 reactions, 1905 genes, and 100 metabolic pathways. The essential core, CH, was extracted from literature‐based curation of tissue‐specific metabolic pathways that are known to be active in the liver (Bock et al, 1991; Chisari and Fausto, 2001; Gropper and Smith, 2008; Rosenthal and Glew, 2009). It consists of 37 intact metabolic pathways that are involved in central metabolism, carbohydrate, lipid, and amino acids metabolism, as well as specific hepatic metabolism (e.g., drug metabolism, bile acid biosynthesis). These pathways involve in sum 779 reactions and 873 metabolites. The more permissive core, CM, consists of a set of 304 reactions, and 484 metabolites. It was assembled from tissue‐specific data sources, including metabolomic (Wishart et al, 2007) transcriptomic (Shmueli et al, 2003; Yanai et al, 2005), proteomic (Yan and Sadee, 2000; He, 2005; Saier et al, 2006), and phenotypic data (McKusick, 2007) of the liver (Supplementary information, data set I). Each reaction was included in CM only if it was supported by at least two of the data sources listed above, or if it was necessary for the inclusion of a liver metabolite that appeared in the metabolomics data (Wishart et al, 2007). The total core hence comprises of 1083 reactions and 1187 metabolites (description of reactions comprising the CH, CM, and final liver model is given in Supplementary information, data set I).
After applying our MBA method, the resulting liver model consists of 1827 reactions and 1360 metabolites. The network visualization is accessible through a supplemental website (http://www.cs.technion.ac.il/~tomersh/methods.html) and can be explored interactively using the freely available Cytoscape software (Cline et al, 2007). Out of the reactions in the derived liver model that are not included in either reaction cores, 50% are transport reactions that transfer metabolites across compartments (most of them are not associated with genes in the model and hence could not be included in the core reaction sets). The compartmentalization pattern of the liver and the generic models is given in Supplementary Figure 2. Notably, despite the heuristic nature of the MBA algorithm, the participation of many of the reactions in the resulting liver model is consistently predicted across the different random reaction elimination orders. For 70.65% of the non‐CH reactions the algorithm provides completely unanimous predictions, that is, the reaction is either present in all models or absent in all of them (Figure 2).
We first tested the liver model by simulating various known hepatic metabolic functions under different physiological conditions. Evidently, even though the relevant metabolic pathways were already used as input to the model reconstruction algorithm (within the set of high‐reliability reactions, CH), the resulting model is not guaranteed to have the capacity to perform complex functions involving the integration of several metabolic pathways. In the absence of available carbohydrates, hepatocytes have the unique ability to perform gluconeogenesis, that is, to synthesize glucose through glucogenic amino acids, glycerol, and lactate. We simulated gluconeogenesis by limiting the uptake of metabolites that can be used as carbon sources, apart from glucogenic amino acids, glycerol, and lactate, and maximizing the glucose production and secretion rate. A feasible flux distribution in which glucose is secreted was identified, with its secretion depending on the uptake rate of the various uptake nutrients (Supplementary Figure 3), demonstrating the ability of the model to perform gluconeogenesis (KIDA et al, 1980; Chan et al, 2003).
One of the main functions that take place in the liver is the conversion of ammonia to urea. Urea cycle deficiencies are inborn errors of hepatic metabolism that may cause severe hyperammonemia and hyperglutaminemia. In the study by Lee et al (2000), metabolic fluxes of urea secretion and glutamine uptake were measured in vivo in subjects with urea disorders, including argininosuccinate synthetase (ASS) deficiency, argininosuccinate lyase (ASL) deficiency, and ornithine transcarbamylase (OTC) deficiency, as well as in control healthy subjects. As the secretion and synthesis of urea are correlated with the consumption of dietary amino acids, the ratio of urea secretion versus glutamine uptake can better depict the functionality of the urea cycle than the rate of each of the fluxes by itself, as depicted in the results obtained by Lee et al (2000). For each of the three disorders, we simulated the metabolic phenotype of the healthy homozygote genotype (enforcing a non‐zero flux through the corresponding reaction), the mutated heterozygote genotype (limiting the flux through the corresponding reaction to 50% of its maximal rate found by flux variability analysis (FVA; Mahadevan and Schilling, 2003), and the full knockout genotype (i.e., enforcing a zero flux through the reaction), both in the generic human model and in the liver model. Specifically, in each case, we computed the urea:glutamine ratio by sampling the space of feasible flux distributions, satisfying stoichiometric mass‐balance and reaction directionality constraints (Price et al, 2004), deriving a mean and s.d. of the ratio across 1000 sampled solutions. We find that the results obtained by the liver model better represent the experimentally observed metabolic profiles than the generic model. Primarily, in the liver model, the urea:glutamine ratio decreases monotonically with the severity of the disease, with the exception of the full knockout in the OTC simulation (Figure 3; Supplementary Table I). In the liver model, the simulation of the healthy case obtains a significantly higher urea:glutamine ratio in comparison to the ratio obtained in the full knockout of ASS and ASL simulations (t‐test P‐values of 6 × 10−2 and 1 × 10−4, respectively), unlike the generic model (t‐test P‐values of 0.8469 and 0.2235, respectively). In addition, the variance of the ratio across the solution spaces that are defined by the liver model is fairly smaller than the variance of the one defined by the generic model (Supplementary Table I). These latter results are in line with the phenomenon described by Lee et al (2000), that the ratio is kept rather stable for each of the genotypes.
To evaluate MBA's performance in correctly identifying liver reactions, it was applied to construct a liver model in a standard five‐fold cross‐validation process. Specifically, MBA was given various subsets of the full core reaction sets, and the reaction‐content of the predicted MBA model was then compared with the left‐out core reactions data, that is, the ability of MBA to correctly add the missing, left‐out, liver reactions was tested. The predictions of the algorithm are significantly enriched with the left‐out test data (hypergeometric P‐value=5.24 × 10−15), obtaining accuracy of 0.5086, recall of 0.5308, and precision of 0.2732. Another cross‐validation was performed in a different manner, such that each time one of the molecular omics data sets was omitted from the construction stage. Furthermore, the predictions are significantly enriched with the missing data set (mean hypergeometric P‐value=4.33 × 10−04), obtaining on average an accuracy of 0.6174, recall of 0.7, and precision of 0.3844 (Figure 4).
Predicting hepatic flux measurements
The liver model was applied to predict changes of flux rates at different hepatic metabolic conditions, and its prediction accuracy was compared with that obtained by using the original, generic human model. The predictions were tested using a comprehensive set of flux measurements performed by Chan et al (2003). The experiments reported in the study by Chan et al (2003) were performed in primary rat hepatocytes. Hepatocytes form the main component of bioartificial liver assist devices, which are destined to provide short‐term support to patients with acute liver failure (Busse et al, 1999; Stockmann et al, 2000; Tzanakakis et al, 2000; Hui et al, 2001). Hepatocytes were cultured in a standard hepatocyte culture medium and were preconditioned with high or low levels of insulin, and exposed to human plasma with or without amino‐acid supplementation. Metabolic fluxes were measured in these four conditions.
Given this relatively rich liver flux data, we set out to perform a comprehensive comparison between four different classifiers: (a) the liver model, (b) the generic model, (c) a liver model built using expression data only, and (d) a random background model. The model built on tissue‐specific expression data solely (model (c) is built and tested here too, since such data are readily available for many tissues (while other kinds of molecular data are yet more sporadic), and hence of interest. The prediction test is conducted by providing the models with the experimental flux data of the exchange reactions measured and predicting the resulting internal fluxes. To perform a rigorous comparison between the classifiers, we track their performance using a standard receiver–operator curve (ROC), which portrays the tradeoff between the true positive (TPR) and false positive rates (FPRs) of prediction as the decision threshold of a classifier is varied. The overall performance is then provided by the area under the curve (AUC), which varies between zero and one, the latter denoting the performance of a perfect classifier.
As evident from Figure 5, the liver model, obtaining an AUC of 0.6745 and 0.6044 for predicting increasing and decreasing fluxes, respectively, markedly outperforms the generic model, with predictions that obtain an AUC of 0.46073 and 0.5103 for increasing and decreasing fluxes, respectively. The model built based on liver‐specific gene expression obtains an intermediate level of accuracy (an AUC of 0.57331 for increasing fluxes and 0.4806 for decreasing ones). Thus, even though it is built based on a partial information core founded on gene expression solely, it still outperforms the generic human model. The ability of the different models to predict extracellular fluxes given the intracellular fluxes, and in a global five‐fold cross‐validation test (in which 4/5 of all reactions are used to predict the remaining 1/5, irrespective of their external/internal identity), is depicted in Supplementary Figure 4. Overall, these results show that the liver model can serve as a much better basis for predicting hepatic alterations than the generic model, relying on just a fairly limited set of flux measurements.
Predicting metabolic biomarkers
Recently, Shlomi et al (2009) have shown that the generic model of Duarte et al (2007) can be used to successfully predict changes in metabolite concentrations due to IEMs, representing diagnostic biomarkers that can be detected through biofluid metabolomics. As a further validation of the reconstructed liver model, we repeated this analysis here focusing on metabolic disorders that arise from mutations in hepatic metabolic genes. The method is based on comparing the concentrations of boundary metabolites in the disease simulation versus the normal simulation. Each boundary metabolite has an exchange reaction that transports it from the extracellular compartment to the intracellular compartment and vice versa. Therefore, changes in the concentration of each of these metabolites can be inferred from comparing the flux intervals of its exchange reaction (representing uptake versus secretion), by using FVA (see Materials and methods section; Mahadevan and Schilling, 2003) in the simulated normal versus disease states.
The initial set of metabolic disorders (documented in the OMIM database) consists of 137 IEMs, the causative gene of which is included in the generic human metabolic model (Duarte et al, 2007), and are associated with one or more known metabolic diagnostic biomarkers (Shlomi et al, 2009). From this set, we extracted a subset of 84 disorders whosecausative gene is included in the reconstructed liver model (Supplementary information, data set II). As evident from Figure 6, the liver model improves upon the generic model, with an AUC of 0.6691 for the liver model; versus an AUC of 0.5868 achieved by the generic model (description of the liver model's biomarkers predictions is given in Supplementary information, data set II). This time the model built based solely on liver‐specific gene expression performs disappointingly, obtaining a lower level of accuracy than the generic one, with an AUC of 0.5221. These results again show the ability of the liver model to improve the identification of hepatic flux alterations caused by dysfunctional enzymes beyond the level yielded by the generic model, but also highlight the importance of integrating multiple data sources for the construction process if/when the latter are available.
A rigorous, rapid computational method for the generation of in silico tissue‐specific metabolic models is presented. MBA carves out a tissue‐specific model out of a generic species model, based on existing literature and molecular data characterizing the tissue's metabolism. The resulting model satisfies stoichiometric, mass‐balance and thermodynamic constraints. The algorithm is structured according to the accuracy level of the input data by the division of the tissue‐specific core into the more reliable human‐curated (CH) and the omics data (CM), which are treated with different levels of confidence. Despite the heuristic nature of the algorithm the model construction is consistent, such that most reactions selected for the final model from the candidate solutions appear in the large majority of the solutions. The algorithm is applied to construct the first stoichiometric genome‐scale liver metabolic model. The resulting liver model has the ability to perform a range of hepatic metabolic functions, as well as correctly depict the metabolic profile of the liver at different physiological and genetic conditions, in which it outperforms the generic, human model.
The above approach is akin to a recently published method by Christian et al (2009) in that it defines a set of reactions as a core and attempts to carve out the minimal model that consists of this core, but it differs in that it is based on optimization rather than on a network expansion procedure that does not consider the stochiometric constraints. MBA has its pros and cons: on the pros side, it is a generic and fast approach to generate tissue‐specific models, which are fairly accurate and useful. It is practically unlimited in its scalability, and can process a large variety of data sources. Importantly, using cross‐validation, one can readily get a quantitative assessment of the model consistency and reliability, and know where things stand. On the cons side, several limitations should be noted. First, the starting point—as the approach hinges upon a generic species model, its accuracy depends on the quality of the latter. This dependency may be alleviated in the future with an extended computational approach that includes the possibility of adding reactions from a universal pool during model construction. Second, the accuracy of different molecular omics data that are used to determine the tissue core is also an obvious limiting factor—to mitigate this effect our approach treats human‐curated pathways data in a preferential manner, and requires evidence from multiple molecular sources, and yet, it is likely that not all inaccuracies in the molecular data are filtered out. Bearing these potential caveats in mind, the accuracy of the liver model generated here is quite remarkable.
As the final liver model consists of more than 95% of the CM core (Table I), we further tested the added value of having a moderate reliability core over simply including all of its reactions in the resulting liver model (similarly to the high reliability core CH). To this end, we constructed a model that has a fixed predetermined core composed of both the original CH and CM reaction sets (i.e., now all defined as CH reactions), and studied its ability to predict flux alterations. The resulting model is denoted as a strict model, for its construction does not tolerate a removal of any core reaction. The strict model's performance is rather similar to the performances of the liver model in predicting inner fluxes, obtaining an AUC of 0.6061 and 0.6433 for increasing and decreasing fluxes. However, comparison of the models by their ability to predict extracellular fluxes given the intracellular fluxes, in a global five‐fold cross‐validation test (Supplementary Figure 4), as well as in biomarker prediction (Supplementary Figure 5), shows the advantage of the liver model built by the standard, flexible MBA version. Although the liver model contains most of the CM reactions, a considerable number of 100 reactions differentiate between it and the strict model. This set of reactions is involved in 22 metabolic pathways (see Supplementary information, data set II, for full description), and effects the performances of the models.
The MBA approach opens up opportunities for many promising future applications. First and foremost, the liver model developed in this study can help in the rational design of bioengineering artificial devices that emulate hepatic metabolism (e.g., BAL; Yang et al, 2009). Hepatocytes, the main components of the BAL (Strain and Neuberger, 2002), have a tendency to rapidly lose their functionality due to metabolic transformations (e.g., lipid accumulation and reduced ammonia removal). Applying the liver metabolic model to optimize the functionality of these cells and predict potential ways for inhibiting the metabolic processes that contribute to this unwanted transformation can assist in bypassing this hurdle. Second, MBA can serve for the rapid development of an array of metabolic models of a variety of human tissues, providing a computational opportunity to probe the metabolism of such tissues as the kidney, heart, and brain on a genomic scale. Third, MBA can be used to generate tissue models for any organism for which a generic model exists, such as Mus musculus (Sheikh et al, 2005; Quek and Nielsen, 2008) and the model plant Arabidopsis thaliana (Poolman et al, 2009).
Materials and methods
The model‐building algorithm (MBA)
A metabolic network model, amenable to CBM, can be represented by a four‐tuple, (M, R, S, L), in which M denotes a set of metabolites, R denotes a set of biochemical reactions, denotes reactions’ stoichiometry, and denotes constraints on reactions’ directionality (i.e., reflecting a lower bound on reactions’ flux rate). Reactions’ stoichiometry is represented by a stoichiometric matrix, S, in which Si,j, represents the stoichiometric coefficient of metabolite i in reaction j.
A feasible flux distribution within a metabolic network model, is a vector , satisfying mass‐balance (Sv=0), and directionality constraints (L⩽v). A metabolic network model is considered consistent if it enables to activate all of its reactions—that is, for each reaction there exists a feasible flux distribution v, such that ∣vi∣>0.
Given a metabolic network model, GM=(MG, RG, SG, LG), referred to as the generic model, a partial model, PM=(MP, RP, SP, LP), including only a subset of the generic model's reactions (RP⊂RG) can be defined on the basis of the corresponding subsets of metabolites, stoichiometry, and directionality constraints. Notably, a partial model of a consistent generic model is not necessarily consistent by itself (i.e., it may contain dead‐end reactions that cannot be activated considering all possible feasible flux distributions).
Given a generic model, GM, and two sets of core reactions, CH and CM⊂RG, known to have a high and moderate probability to be included in some partial model, respectively, our goal is to derive the most parsimonious, consistent, partial model PM, consisting of all reaction in CH and a maximal number of reactions from CM. Specifically, we would like to identify PM by solving the following optimization problem:
where ε is a parameter reflecting a tradeoff between obtaining the most parsimonious model (for a high value of ε) and including a maximal number of moderate probability reactions in the partial model (for a low value of ε). For the construction of the liver model ε was set to 0.5.
To solve the above optimization problem, we use the following simple search heuristic:
Choose a random permutation, P, of reactions from the set RG/(CH∪CM)
For each reaction r∈P
inactiveReactions=CheckModelConsistency (RP, r)
If (∣eH∣=0) AND (∣eM∣<ε*∣eX∣) then RP=RP/(eM∪eX)
First, the current set of reactions in the partial model, RP, is initialized to include all reactions in the generic model RG (1). Then, a random permutation of all reactions that are neither in the high‐reliability core, CH, nor in the moderate‐reliability core, CM (and hence their potential removal from the partial model can increase the optimization objective function) is generated. In (3), the potential removal of each reaction in turn (based on the random order P) is evaluated. The procedure CheckModelConsistency computes the set inactiveReactions that consists of reactions in RP that cannot be activated due to the removal of reaction r from RP. eH, eM, and eX are then computed to consist of inactive reactions that belong to CH, CM or to none of the cores, respectively. In case the removal of r from RP would affect the ability to activate a high‐reliability reaction, r would not be removed from RP (3.b). Otherwise, if removal of r would increase the optimization objective value, r will be removed from RP, along with the additional reactions eM, eX that cannot be activated following its removal.
A naive implementation of CheckModelConsistency can simply iterate through all reactions in RP applying FVA (Mahadevan and Schilling, 2003) to check whether they can carry non‐zero flux within a feasible flux distribution when r is removed. The naive algorithm is computationally prohibitive, with an overall time complexity of O(nkt), where n denotes the number of reactions in CH∪CM, k denotes the number of non‐core reactions (∣RP/CH∪CL∣), and t denotes the (polynomial) complexity of each LP problem. Instead, a simple speed‐up technique was implemented that aims to concurrently activate multiple core reactions in the same LP problem, by trying to maximize their total sum of flux. These optimizations are repeated several times on a monotonically decreasing set of core reactions. Each iteration is performed on a smaller subset of the core reactions, which have received zero flux in all previous iterations (see Supplementary information, p 7 for pseudo‐code and further explanation). These speed‐ups reduce the running time by 97%, in comparison to the naive approach. The total running time of the algorithm is ∼10–20 min depending on the cores’ size.
As the resulting model depends on the chosen reaction scanning order, the algorithm is executed repeatedly for a number of times (1000 in the results presented here) with different, random scanning orders. Each run results in a candidate model. All 1000 candidate models are then processed to assign the CM and non‐core reactions with scores, representing the fraction of candidate models in which they appear. An aggregative model is built by considering the scores across all runs, starting with the CH and incrementally adding reactions according to their confidence score until a consistent, viable model is obtained.
Predicting flux measurements
The liver model was applied to predict changes in metabolic flux rates under different hepatic conditions. The predictions were compared with a set of flux measurements, including 23 exchange fluxes (uptake of metabolites) and 22 inner fluxes (Chan et al, 2003; a complete description of the experimental flux data is given in Supplementary information, data set II). The measurements were obtained from primary rat hepatocytes in four metabolic conditions: high/low‐insulin preconditioning with (HPA/LPA) or without (HP/LP) amino‐acid supplementations.
Quadratic programming (QP) was applied to first find a feasible flux distribution, satisfying stoichiometric mass‐balance and reaction directionality constraints, that is as close as possible to the measured flux through exchange reactions. This process was executed both for the generic and the liver models in each of the four growth conditions. To explore the metabolic profile that is deduced by this optimal match with the measured exchange fluxes, FVA (Mahadevan and Schilling, 2003) was used to predict a range of possible fluxes through all reactions in the model, while fixing the flux through the measured exchange reactions to the values computed using QP (the maximal uptake rate of other exchange reactions in the model was set to 10% of glutamine uptake rate (Chan et al, 2003).
Changes in metabolic flux were predicted in comparison with the reference condition HP. For each reaction, we compute the difference between its interval in the HP condition to its interval in the other three conditions, that is, change(vi)=(max(vi)‐ref_max(vi))+(min(vi)‐ref_min(vi)). A reaction is predicted to increase or decrease under the conditions for which either (1) or (2) listed below holds, respectively, where f is a threshold value. Otherwise, it is considered to be unchanged (Supplementary Figure 6). In addition, the change in the interval is normalized by its size (3), to account the significance of the flux alteration. The ROC curves of the predictions were plotted with increasing thresholds values (Bock et al, 1991).
Mi=∣mean(min(vi), max(vi), min(ref_vi), max(ref_vi))∣
We applied the computational method described by Shlomi et al (2009) to systematically predict metabolic biomarkers characterizing 84 hepatic metabolic disorders (documented in the OMIM database), that is, disorders causative gene of which is included in the liver model (Supplementary information, data set II). The method relies on comparing the concentrations of boundary metabolites in the disease simulation versus the normal simulation. A boundary metabolite is defined as a metabolite that can be transported from the extracellular compartment to the intracellular compartment, and vice versa, through an exchange reaction. Changes in the concentration of a boundary metabolite can be inferred from shifts in the interval of the minimum/maximum flux values of its exchange reaction. For each metabolic reaction r, we computed the interval of exchange reactions e, via FVA, when r is forced to be either active or inactive, simulating the normal and disease case, respectively. If m is a boundary metabolite with an exchange reaction interval of A=(mina, maxa) in the normal simulation and B=(minb, maxb) in the disease simulation, then m is a biomarker if either of the following holds:
In adddition, based on the definition of Shlomi et al (2009), we define:
where f is the decision threshold value according to which we plot the ROC curve.
This study was supported by grants from the Israeli Science Foundation (ISF) to ER and TS. LJ is supported by a fellowship from the Edmond J Safra Bioinformatics Program at TAU.
Conflict of Interest
The authors declare that they have no conflict of interest.
Data Set I
Data Set II
Supplementary figures S1–8, Supplementary table S1
The liver model in excel format.
The liver model in SBML format.The version of this supplementary file originally posted online on 7 September 2010 contained unreadable lines due to a problem with the COBRA toolbox used to generate the SBML file.These errors have been corrected and the file replaced on 21 September 2010.
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2010 EMBO and Macmillan Publishers Limited