Open Access

How to deal with large models?

William S Hlavacek

Author Affiliations

  • William S Hlavacek, 1 Theoretical Biology and Biophysics Group, Theoretical Division and Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA

The regulatory systems that allow cells to adapt to their environments are exceedingly complex, and although we know a great deal about the intricate mechanistic details of many of these systems, our ability to make accurate predictions about their system‐level behaviors is severely limited. We would like to make such predictions for a number of reasons. How can we reverse dysfunctional molecular changes of these systems that cause disease? More generally, how can we harness and direct cellular activities for beneficial purposes? Our ability to make accurate predictions about a system is also a measure of our fundamental understanding of that system. As evidenced by our mastery of technological systems, a useful understanding of a complex system can often be obtained through the development and analysis of a mathematical model, but predictive modeling of cellular regulatory systems, which necessarily relies on quantitative experimentation, is still in its infancy. There is much that we need to learn before modeling for practical applications becomes routine. In particular, we need to address a number of issues surrounding the large number of parameters that are typically found in a model for a cellular regulatory system.

In a recent article published in Molecular Systems Biology, Peter Sorger et al report a significant contribution not only to our system‐level understanding of an important signal‐transduction system but also to our understanding of the methodology needed for developing and testing a large‐scale mathematical model for this type of system (Chen et al, 2009). The power of sensitivity analysis is demonstrated. In this study, to which William Chen, Birgit Schoeberl and Paul Jasper contributed equally, an ordinary differential equation (ODE) model for immediate‐early events in signaling by the epidermal growth factor (EGF) receptor (EGFR, which is also called ErbB1) (Schoeberl et al, 2002) was extended to incorporate other members of the ErbB family of receptors (ErbB2, ErbB3 and ErbB4) and stimulation by heregulin (HRG), a ligand of ErbB3 and ErbB4. The scope of the model encompasses stimulation by EGF, a ligand of ErbB1, stimulation by HRG, seven homo‐ and heterodimers that the four ErbB receptors form and downstream events leading to activation of the MAP kinase ERK and the serine/threonine kinase Akt. A total of 28 proteins are included in the model. The article reports quantitative data used to identify parameters in the model (e.g. time course measurements of phosphorylation events by in‐cell western blot assay), a careful and exhaustive sensitivity analysis and tests of predictions of the model.

The model of Chen et al is comprised of 499 ODEs, which track the dynamics of 828 reactions, and 229 parameters (rate constants and copy numbers). How were these parameters determined? Chen et al used a computationally expensive method, simulated annealing, capable in principle of finding a global minimum in a rugged landscape. This method was applied after the results of an initial sensitivity analysis, which was based on estimates of nominal parameter values, focused attention on 75 of the 229 model parameters, which reduced the size of the parameter space searched in fitting. Best‐fit parameter values that varied and that remained relatively constant across multiple runs of the non‐deterministic fitting procedure were identified, resulting in a partially calibrated model. As seen in studies of other models (Gutenkunst et al, 2007), various combinations of different parameter values were found to yield essentially the same model predictions, which are consistent with the data used in fitting (120 time course measurements of responses to EGF and HRG stimulation collected over a period of 2 h after ligand addition). This finding supports the notion advanced by James Sethna et al that models of biochemical systems tend to be ‘sloppy,’ with the implication that we should be primarily concerned with the quality of the predictions of a model and not the estimates of the parameters in a model. To test the predictions of their model, Chen et al examined the dose‐dependent effects of pharmacological inhibitors (gefitinib and lapatinib), which attenuate EGFR kinase activity, on phosphorylation of ERK and Akt. Sensitivity analysis of the partially calibrated model indicated that phosphorylation of Akt should be more sensitive to inhibition of EGFR kinase activity than phosphorylation of ERK, and this prediction was confirmed in experiments. Other findings of the sensitivity analysis of Chen et al are that the relative importance of parameters for a given model prediction can be determined robustly, despite parameter uncertainty, and that the subset of model parameters that influences a given prediction changes depending on what is being predicted. In other words, parameter sensitivity is context dependent. Chen et al have made an impressive attempt to address the issues of estimating parameters in a large‐scale model and understanding how these parameters impact model predictions. This attempt, which goes beyond what is usually considered acceptable, raises the bar for this type of modeling and should serve as a useful guide for future work.

What insights into ErbB receptor signaling have been gained from analysis of the model of Chen et al? One finding is that ERK or Akt phosphorylation at 5 min after addition of ligand (EGF), for a particular cell line (A431), is a log‐linear function of the ligand dose over a broad range of ligand concentrations, including at concentrations that yield low receptor occupancy. The finding that significant ERK and Akt activation occur at ligand doses well below the KD for ligand–receptor binding may have implications for our understanding of the clinical efficacy of EGFR kinase inhibitors. Sensitivity analysis of the model suggested that the dose–response behavior of A431 cells arises because ERK and Akt phosphorylation are more sensitive to ligand at low doses than at high doses. This prediction was confirmed experimentally. Interestingly, when the MAP kinase cascade in the model was simulated in isolation from the rest of the system, the cascade exhibited a switch‐like response to stimulation rather than the more graded response to stimulation that is characteristic of the whole system. Sensitivity analysis also revealed that the dose–response behavior of the A431 cell line depends on parameters (e.g. the amount of Ras) that could potentially vary from one cell line to other. Consistent with this result, diversity in ligand dose–response behavior was observed when four other cell lines were examined. These results suggest that, in general, the function of a signaling pathway ‘motif,’ such as the MAP kinase cascade, is likely to depend on its context. These insights into the system‐level behavior of ErbB receptor signaling are intriguing and there is probably much more to learn from analysis of the model of Chen et al about the effects of receptor–receptor interactions and differential signaling by the two ligands considered in the model (EGF and HRG).

The work of Chen et al provides guidance for those who contemplate building and studying large‐scale mechanistic models for cellular regulatory systems. It also challenges us to ask questions about these types of models. The process of specifying a mechanistic model is sometimes more enlightening than formal analysis of the model because the precision required of a model specification forces a modeler to confront gaps in our knowledge and to ask questions about mechanism that might otherwise go unasked. Unfortunately, these benefits of constructing a model are often enjoyed only by those intimately familiar with the model development effort (or those who are willing to essentially repeat it), especially in the case of a large model, such as that of Chen et al This model is specified as a list of 828 reactions (or equivalently, 499 ODEs for the mass action kinetics of these reactions) in a standardized electronic format that allows one to simulate the model and reproduce the results of Chen et al but not to transparently evaluate the basis for the model specification. Models of this kind may be made more accessible by the adoption of proposed standards for model annotation, such as the MIRIAM guidelines (Le Novère et al, 2005), and by the development and use of software tools that facilitate the documentation of model assumptions and metadata (Laibe and Le Novère, 2007). However, models for large reaction systems are inherently difficult to understand, and the accessibility of these models is unlikely to improve dramatically through efforts to simply improve their annotation. How are we to have confidence in such models then?

One way to improve accessibility is to specify models at a higher level of abstraction, at the level of molecular interactions using the formalism of ‘rules’ (Hlavacek et al, 2006), rather than at the level of the reactions generated by these interactions using the conventional formalism of mass action kinetics. Models specified using rules should be easier to grasp because there are usually far fewer interactions than chemical reactions. For example, the model of Chen et al accounts for the interactions of only 28 proteins but over 800 reactions arising from these interactions. Another advantage of a rule‐based approach is the ability to account, in principle, for the ‘combinatorial complexity’ of molecular interactions. Chen et al rightly avoided an attempt to account for the full spectrum of chemical species and reactions implied by the molecular interactions considered in their model because of the computational cost of doing so. Fortunately, recently developed simulation methods and software implementations of these methods should enable the analysis of a rule‐based model regardless of the size of the reaction network implied by rules (Danos et al, 2007; Yang et al, 2008).

Another possible way to improve confidence in large models is to apply ideas of model checking from computer science (Calzone et al, 2006; Clarke et al, 2008). The essence of these ideas is to use temporal logic to specify qualitative and/or quantitative system behaviors as formal model‐checking queries, which can be processed automatically using efficient procedures to determine whether a model reproduces the specified behaviors. With further development of model‐checking methodology for biological applications, one should be able to systematically formalize observations about system behavior and establish to what extent a given model is consistent with observations of interest. From the study of Chen et al, we know that their model is consistent with the data collected as part of that study. We also know that the model is consistent with various other data reported in the literature, such as the data used by Chen et al to estimate parameter values. However, not all of the possible predictions of the model have been tested, and the extent to which the model of Chen et al can be trusted to make accurate predictions is uncertain. In fact, aspects of the model are clearly inconsistent with what we know about ErbB receptor signaling. For example, the reaction scheme for EGF–EGFR binding and EGF‐induced homodimerization of EGFR incorporated into the model is inconsistent with a model for these processes that has been experimentally tested and found to be consistent with an array of binding measurements (Macdonald and Pike, 2008). Imagine if researchers interested in modeling of ErbB receptor signaling could take advantage of a library of model‐checking queries that encode key behavioral features of ErbB receptor signaling. One could then test any new model against these queries to assess its predictive power. A model consistent with all or most of the behavioral features in the library would be viewed as reliable.

Model checking promises to provide useful quality control measures for the development of large‐scale complicated models, but other such measures are needed as well. For example, the parameters in models for reaction kinetics often must satisfy certain thermodynamic constraints, even if the system being modeled is never at thermal equilibrium (Ederer and Gilles, 2007; Yang et al, 2007). Use of available methods for ensuring these constraints are satisfied would provide additional quality control and also improve the efficiency of parameter fitting by reducing the number of parameters that are specified independently.

In summary, big sloppy models are needed to understand the system‐level dynamics of cellular regulatory systems. In the future, modelers should give careful attention to how these models can be made more accessible and how to assess the confidence that should be placed in a model given all of the available knowledge about system behavior. The study of Chen et al demonstrates the power of sensitivity analysis and should serve as a useful guide for others struggling to obtain a predictive understanding of the complex systems that cells use to make decisions. Recently reported related studies of Flaherty et al (2008) and Del Vecchio et al (2008) provide additional guidance and also deserve mention. Flaherty et al used Bayesian methods to estimate parameters in a model for a G‐protein‐coupled signal‐transduction system. These methods, which can potentially provide insights beyond those that can be obtained using the methods of Chen et al, allow one to characterize the precision of parameter estimates in terms of probability distributions that reveal qualitatively and quantitatively the extent to which data about system behavior inform and constrain parameter estimates. Del Vecchio et al investigated modularity in cellular regulatory systems through the analysis of simple idealized models, finding that post‐translational covalent modifications can provide ‘insulation’ that makes the input–output behavior of a system insensitive to connections to other systems. It should be interesting to explore the connections between the results of this study and the empirical observations of Chen et al about the modularity of the MAP kinase cascade in their model.

Conflict of Interest

The author declares that he has no conflict of interest.


Creative Commons logo

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.