Open Access

A framework for mapping, visualisation and automatic model creation of signal‐transduction networks

Carl‐Fredrik Tiger, Falko Krause, Gunnar Cedersund, Robert Palmér, Edda Klipp, Stefan Hohmann, Hiroaki Kitano, Marcus Krantz

Author Affiliations

  1. Carl‐Fredrik Tiger (1,2,†)
  2. Falko Krause (2,†)
  3. Gunnar Cedersund (1,3,4)
  4. Robert Palmér (3)
  5. Edda Klipp (2)
  6. Stefan Hohmann (1)
  7. Hiroaki Kitano (3,5,6,7)
  8. Marcus Krantz (*,1,2,5)

  1. Department of Cell and Molecular Biology, University of Gothenburg, Göteborg, Sweden
  2. Theoretical Biophysics, Humboldt‐Universität zu Berlin, Berlin, Germany
  3. Department of Clinical and Experimental Medicine, Diabetes and Integrative Systems Biology, Linköping University, Linköping, Sweden
  4. Freiburg Institute of Advanced Sciences, School of Life Sciences, Freiburg, Germany
  5. The Systems Biology Institute, Tokyo, Japan
  6. Sony Computer Science Laboratories, Inc., Tokyo, Japan
  7. Okinawa Institute of Science and Technology, Okinawa, Japan

  * Corresponding author. Theoretical Biophysics, Humboldt‐Universität zu Berlin, Invalidenstr. 42, Berlin 10115, Germany. Tel.: +49 30 2093 8389; Fax: +49 30 2093 8813; E-mail: marcus.krantz{at}biologie.hu-berlin.de
  † These authors contributed equally to this work

Abstract

Intracellular signalling systems are highly complex. This complexity makes handling, analysis and visualisation of available knowledge a major challenge in current signalling research. Here, we present a novel framework for mapping signal‐transduction networks that avoids the combinatorial explosion by breaking down the network into reaction and contingency information. It provides two new visualisation methods and automatic export to mathematical models. We use this framework to compile the presently most comprehensive map of the yeast MAP kinase network. Our method improves previous strategies by combining (I) more concise mapping adapted to empirical data, (II) individual referencing for each piece of information, (III) visualisation without simplifications or added uncertainty, (IV) automatic visualisation in multiple formats, (V) automatic export to mathematical models and (VI) compatibility with established formats. The framework is supported by an open source software tool that facilitates integration of the three levels of network analysis: definition, visualisation and mathematical modelling. The framework is species independent and we expect that it will have wider impact in signalling research on any system.

Synopsis

An intuitive formalism for reconstructing cellular networks from empirical data is presented, and used to build a comprehensive yeast MAP kinase network. The accompanying rxncon software tool can convert networks to a range of standard graphical formats and mathematical models.

  • Network mapping at the granularity of empirical data that largely avoids combinatorial complexity

  • Automatic visualisation and model generation with the rxncon open source software tool

  • Visualisation in a range of formats, including all three SBGN formats, as well as contingency matrix or regulatory graph

  • Comprehensive and completely referenced map of the yeast MAP kinase network in the rxncon format

Introduction

All living cells interact with and respond to their environment via the cellular signal‐transduction network. This network encompasses all cellular components and processes that are required to receive, transmit and interpret information. Due to its key role in cellular physiology, the signalling network, and several of its subnetworks, have been intensely studied in a range of organisms. However, such networks are highly complex and difficult to analyse due to the so‐called combinatorial explosion (Hlavacek et al, 2003). This explosion refers to the fact that the specific state of each component is determined by multiple covalent modifications or interaction partners, and that these possibilities rapidly combine to a very large number of possible specific states. Experimental data do not generally distinguish between all these specific states, but instead focus mostly on reactions between pairs of components, usually giving no or limited information on other modifications or interaction partners of the reactants. Hence, there is a discrepancy between the granularity of the empirical data and the highly defined specific states used in most mathematical models. This makes the interpretation and use of empirical data in the context of such model states ambiguous and often arbitrary. These problems pose major challenges for systems biology, as they prevent us from (i) unambiguously describing a network, (ii) visualising it without simplifications or unsupported assumptions and (iii) automatically generating mathematical models from knowledge in data repositories.

Large efforts have been invested in addressing these issues. Signalling systems are commonly visualised through the informal ‘biologist's graph’ that is simple and intuitive, but lacks the stringent formalism and precision required to meet the three criteria above (exemplified by Thorner et al, 2005). The lack of standardised glyphs (defining e.g., mechanism of information transfer and how edges combine to regulate target nodes) makes the information in the ‘biologist's graph’ ambiguous and difficult to reuse. To address this, the community has developed the Systems Biology Graphical Notation, SBGN (Le Novere et al, 2009). This includes three visual formats: the activity flow diagram, the entity relationship diagram and the process description (or process diagram). The activity flow diagram shares many properties with the ‘biologist's graph’, but the entity relationship diagram and process description allow precise representations. The process description corresponds to the state transition reaction format used in most models developed by the systems biology community, which has been standardised in the Systems Biology Markup Language (SBML; Hucka et al, 2003). The process description could meet each of the three criteria above but its utility is severely affected by the combinatorial explosion. It is based on a specific state description, which means that, for each component, each possible combination of modifications and interaction partners must be accounted for explicitly. Hence, only very simple systems can be described completely and only very few models include the entire state space (Kiselyov et al, 2009) while the vast majority include simplifying omissions. While simplifications are often necessary, the lack of discrimination between arbitrary omissions and exclusions based on experimental evidence is a significant shortcoming.
These issues are partially addressed in the entity relationship diagram, or molecular interaction map, which comes in two flavours: explicit and implicit (called heuristic and combinatorial by the authors; Kohn et al, 2006). The explicit version requires all specific states to be displayed and hence shares the limitations of the process description. In contrast, the implicit version displays only the possible reaction types (or elemental reactions, as we will call them below) and hence largely avoids the combinatorial explosion. The entity relationship diagram represents each component as a single node and reactions in a condensed format. While not as intuitive as the other SBGN formats, it has the advantage of concentrating all information on a given protein and works especially well for simple regulatory circuits, as the concentrated information makes it difficult to trace the order of events in more complex networks. The three SBGN formats have complementary strengths, but there is currently no software available for conversion between the three different visualisation formats. However, the SBGN standards are under continuous development and these issues will likely be addressed in the future through the SBGN markup language, SBGN‐ML.

Similar efforts on the modelling side have resulted in rule‐based modelling and associated visualisation formats (Faeder et al, 2005). Briefly, rules are defined as reactions that are valid under a particular set of contingencies, and each reaction is specified for each such contingency set. This means that when a reaction's rate is increased by phosphorylation of one component it will be defined by two rules; one where that component is phosphorylated and one where it is not. While these rules define the entire state space and the system stays subject to the full combinatorial explosion, the rule description has alleviated the combinatorial problem in two respects: (1) the system has been described more compactly and (2) the actualised state space might be significantly reduced by introducing only those states that are actually populated (Lok and Brent, 2005), or by using agent‐based stochastic modelling (Sneddon et al, 2011). The rule definition format is also a significant step towards the granularity of empirical data, as compared with the abstract‐specific states. These advantages are mirrored on the visualisation side by graphical reaction rules, which use the process description format to display individual rules (Blinov et al, 2006). Network level visualisation has used either topological contact maps (Danos, 2007) or entity relationship diagrams (Le Novere et al, 2009), and these complementary visualisation formats have recently been combined in the extended contact map (Chylek et al, 2011). Contact maps have software support, but neither entity relationship diagrams nor extended contact maps can be generated automatically from the rule‐based models. Hence, the rule‐based format partially addresses the automatic creation of models from data repositories (iii), as it provides the tools to generate mathematical models automatically once the knowledge has been reformulated as rules. 
However, the rule‐based system provides a cumbersome format for (i) unambiguous network description and is not developed for (ii) comprehensive visualisations. Taken together, this raises the question whether graphical‐ and model‐based formats are the most appropriate for stringent network definition, or whether there are more suitable network definition formats that allow both visualisation and automatic model generation.

Here, we present a new framework to describe cellular signal‐transduction networks. Our network definition has the same granularity as experimental data, avoids the combinatorial complexity, can be automatically visualised in complementary graphical formats including all three SBGN formats and unambiguously defines mathematical models. The rxncon software tool complements the framework by automating visualisation and model creation. The key feature of our framework is the strict separation of elemental reactions (and their corresponding states), which define the possible signalling events in the network, from contingencies, which describe the contextual constraints on these reactions. Importantly, each elemental reaction corresponds directly to a single empirical observation, such as a protein–protein interaction or a specific phosphorylation. The contingencies define the constraints on these elemental reactions in terms of one or more elemental states, for example, by defining the active state of a protein kinase or the composition of a functional protein complex. Hence, the format directly links model states to empirical observations at the same level of granularity, which pre‐empts the need for additional assumptions or extrapolations. Moreover, the separation between reactions and contingencies largely avoids the combinatorial explosion as only combinatorial states with known functional influence are considered. The rxncon tool provides automatic export to established visual formats and to two new visualisation methods, which allow compact comprehensive representation. Finally, the framework is stringent and unambiguously defines a mathematical model, and the rxncon tool supports export to SBML and rule‐ or agent‐based models. This allows coding of models in a format that mirrors empirical data, which can be automatically visualised and which is highly suitable for iterative model building.
We illustrate our new approach by conducting the most comprehensive literature survey to date of the complete MAP kinase signalling network of Saccharomyces cerevisiae. Taken together, we provide a framework that integrates the three levels of network analysis (definition, visualisation and mathematical modelling) and a supporting software tool for automatic visualisation and export to mathematical models. We expect this to be highly useful for the community and envision a common framework to bridge different standards as well as experimental and theoretical systems biology efforts.

Results

This section describes the architecture of the framework, including its data structure, the different methods of visualisation and how it relates to a mathematical model (Figure 1A). In the first part, we present the results of the methods development and describe the system in detail. In the second part, we present our results using the MAP kinase network. The framework has been implemented in the rxncon software tool that is distributed freely under the open source LGPL licence and can be downloaded from www.rxncon.org.

Figure 1.

Schematic representation of the data structure. (A) The input data are the reaction and contingency lists, which contain the ‘what‐aspects’ and ‘when‐aspects’ of the reaction network, respectively. The rxncon software uses these lists to create a range of visualisations as well as computational models. These conversions require no additional information and are fully automated. (B) A simplified version of the Sho1 branch of the Hog pathway in S. cerevisiae will be used to illustrate the data structure. This ‘biologist's graph’ shows the activating phosphorylation cascade (arrows) from Ste20 to Hot1. Scaffolding and membrane recruitment by Sho1 facilitate the first two phosphorylation events (grey lines). (C) The (simplified) reaction list defines the elemental reactions between pairs of components. It includes the two components (columns I and III), reaction type (column II; ‘ppi’=protein–protein interaction, ‘P+’=phosphorylation; see Table I for complete list of reactions), reaction (column IV, a concatenation of the components and the reaction type) and resultant state (column V; protein dimers or phosphorylated states). Note that each elemental state only defines a single aspect of each component's specific state. (D) The (simplified) contingency list defines the relationship between states and reactions. It contains the affected reaction (Target, column I), the influencing state (Effector, column III), and the effect this particular state has on that reaction (contingency, column II). (E) The reaction and contingency information is summarised in the contingency matrix. The matrix is defined by elemental reactions (rows) and states (columns). The cells define how (and if) each reaction (row) is affected by each state (column); that is, the reactions’ contingencies on different states. Note that only direct contingencies are considered; reaction/state intersections which do not share components are blacked out.
The grey fields (‘x’) are automatic as states are binary and hence a reaction cannot occur if the state is already true. The green fields (‘!’/‘K+’) are imported from the contingency list, and all other open fields are defined as unknown effect (‘?’). This information can also be visualised in a number of graphical forms: The reaction graph (F) displays network topology with either components or their domains as functional units. The regulatory graph (G) combines the reaction and contingency information to display the causal relationship between the reactions in the network and provides a complete graphical representation of the knowledge compiled in the contingency matrix. The limited process description (H) displays the catalytic modifications in the signal‐transduction network as state transitions with catalysts but without complex formation (compare Supplementary Figure S1). The interaction and distance matrices (I) provide a compact description of network topology and allow calculation of distances between nodes. Finally, the reaction and contingency data can be visualised as an entity relationship diagram (J). These visualisations and the equation system for this system, subsystem or your own favourite network defined in the same format can be automatically generated using the rxncon software.

The data structure

The events in a signal‐transduction network can be categorised in four types: (1) catalytic modifications, (2) bindings and interactions, (3) degradation and synthesis and (4) changes in localisation. Due to the limited information on spatial (re)distribution of components, we have focused on types 1–3 here (Table I). However, the framework is fully capable of including localisation reactions and the rxncon tool will be upgraded to encompass these in the future. The first step of the network definition is to distil the available knowledge into two distinct categories of information: what can happen, and when it can happen. The what‐aspect (referred to as C1, or elemental reactions) specifies the possible events, including the event type (1–3 above), and which components and sites are involved. The when‐aspect (referred to as C2, or contingencies) specifies how the reaction rate is affected by the state of the involved components. For instance, the MAP kinase Hog1 phosphorylates its target Hot1 (C1, ‘what’; Figure 1B), and this reaction only occurs when Hog1 is phosphorylated on both Thr174 and Tyr176 (C2, ‘when’). This second category of knowledge therefore represents the causal relationships, or contingencies, between the reactions characterised in the first class of knowledge. The separation of C1 from C2 allows us to define even large complex networks stringently in a concise format, as exemplified with the yeast MAP kinase network below.

Table 1. Thirteen reaction types were used to map the MAP kinase network

The what‐aspects of the knowledge are represented in the reaction list (Figure 1C; simplified example). Importantly, we have broken down the reaction network into elemental reactions, which change elemental states. An elemental state is similar to an empirical observation, such as an interaction between two proteins or a specific modification at a specific site on a specific protein. If a protein has been phosphorylated on two sites, this corresponds to two different elemental states. In other words, the elemental states correspond to overlapping (non‐disjoint) sets. This is different from the specific states in ordinary state transition models, but analogous to the macroscopic states used in the works by Conzelmann et al (2008) and Borisov et al (2008). An elemental reaction is similarly defined as a two‐component reaction that modifies a single elemental state. Note that this precludes lumped reactions and that, for example, a kinase–substrate interaction and phosphorylation must be described by two different elemental reactions. Hence, the reaction list has the same granularity as typical empirical data, which pre‐empts the need for assumptions in the mapping process. It also allows us to use the established format for high‐throughput data (Stark et al, 2006), including specific referencing of each reaction with PubMed identifiers and complemented with additional details such as active domains, subdomains and residues (Supplementary Tables S1 and S2).

The when‐aspect of the knowledge is described in the contingency list (Figure 1D; simplified example). This list defines the contextual constraints on all elemental reactions. Most contingencies will correspond to the direct effect of single elemental states of the components involved in the particular elemental reaction, but Boolean states allow for combinatorial effects and indirect effects in, for example, scaffolds that cannot be directly attributed to a single elemental state in one of the reactants. There are six distinct reaction contingencies: the Effector can be absolutely required (!), positive (K+), completely neutral (0), negative (K−), absolutely inhibitory (x) or of unknown effect (?). These overlap partially with the influences of entity relationship diagrams (Le Novere et al, 2011), but distinguish between no effect (0) and no known effect (?). The Boolean states provide a middle layer between reaction contingencies and a combination of elemental states and/or inputs, using either ‘AND’ or ‘OR’ to define, for example, large complexes or alternative mechanisms. In addition, inputs and outputs function as elemental states and reactions, respectively, at the interface between the network and the external environment. Each row in the contingency list contains a Target (elemental reaction, output or Boolean state), an Effector (elemental state, input or Boolean state) and a symbol describing how the Effector influences the Target (Contingency). This symbol is a contingency symbol (!, K+, 0, K−, x, ?) when the Target is an elemental reaction or an output, and a Boolean operator (AND, OR) when the Target is a Boolean state. The data structure is illustrated with a simplified version of the Sho1 branch of the HOG pathway (Figure 1B).
The reaction list states that, for example, Hog1 phosphorylates (‘P+’) Hot1 (Figure 1C; eighth reaction; on the last row), and the contingency list states that this reaction requires (‘!’) that Hog1 is phosphorylated on both Thr174 and Tyr176 (Figure 1D, last two rows). These states in turn correspond to the reactions six and seven, respectively (Figure 1C). Hence, the reaction and contingency information suffice to describe the network and their separation keeps the description concise and at the granularity of empirical data. Consequently, the data structure addresses the first issue: unambiguous network definition.
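The reaction and contingency lists described above can be sketched as two small record types. This is a minimal illustration in Python, not the rxncon implementation: the class names, the `site` field and the state naming for site‐specific phosphorylation are hypothetical, and the assignment of Pbs2 as the kinase for the two Hog1 sites is an assumption of this sketch, since the text names only the residues.

```python
from dataclasses import dataclass

# The six contingency symbols listed in the text (ASCII "K-" stands in for K−).
CONTINGENCY_SYMBOLS = {"!", "K+", "0", "K-", "x", "?"}


@dataclass(frozen=True)
class Reaction:
    component_a: str    # column I
    reaction_type: str  # column II: "ppi" or "P+" in this simplified example
    component_b: str    # column III
    site: str = ""      # optional target residue (hypothetical field)

    @property
    def name(self) -> str:
        # Column IV: concatenation of the components and the reaction type.
        target = f"{self.component_b}({self.site})" if self.site else self.component_b
        return f"{self.component_a}_{self.reaction_type}_{target}"

    @property
    def state(self) -> str:
        # Column V: a dimer for "ppi", a phosphorylated state for "P+".
        if self.reaction_type == "ppi":
            return f"{self.component_a}--{self.component_b}"
        if self.reaction_type == "P+":
            target = f"{self.component_b}({self.site})" if self.site else self.component_b
            return f"{target}-{{P}}"
        raise ValueError(f"unsupported reaction type: {self.reaction_type}")


@dataclass(frozen=True)
class Contingency:
    target: str    # affected reaction
    symbol: str    # how the effector influences the target
    effector: str  # influencing state

    def __post_init__(self):
        if self.symbol not in CONTINGENCY_SYMBOLS:
            raise ValueError(f"unknown contingency symbol: {self.symbol}")


# Simplified Sho1 branch; the kinase for reactions six and seven is assumed.
reactions = [
    Reaction("Sho1", "ppi", "Ste11"),
    Reaction("Sho1", "ppi", "Pbs2"),
    Reaction("Ste20", "P+", "Ste11"),
    Reaction("Ste11", "P+", "Pbs2", "Ser514"),
    Reaction("Ste11", "P+", "Pbs2", "Thr518"),
    Reaction("Pbs2", "P+", "Hog1", "Thr174"),
    Reaction("Pbs2", "P+", "Hog1", "Tyr176"),
    Reaction("Hog1", "P+", "Hot1"),
]

# The eighth reaction requires the states produced by reactions six and seven.
contingencies = [
    Contingency(reactions[7].name, "!", reactions[5].state),
    Contingency(reactions[7].name, "!", reactions[6].state),
]
```

Each `Reaction` carries exactly one resultant state, which mirrors the definition of an elemental reaction as a two‐component reaction that modifies a single elemental state.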

Visualising the signal‐transduction network

We address the second issue, comprehensive visualisation, with two novel forms of visualisation: the contingency matrix and the regulatory graph. These also keep reactions and contingencies separate and hence avoid the combinatorial explosion and implicit assumptions. Both include the complete information about reactions (C1) and contingencies (C2). This data structure is also well suited for visualisation in entity relationship diagrams or extended contact maps, and the rxncon software tool supports export to the entity relationship format (Chylek et al, 2011; Le Novere et al, 2011). We also provide export to the reaction graph/activity flow diagram and the process description, though neither of these can fully and accurately represent the network as discussed below. Nevertheless, they all provide their unique advantages and can be automatically generated with the rxncon tool and the information in the reaction and contingency lists.

The contingency matrix integrates the information in the reaction and contingency lists (Figure 1E). The matrix is spanned by the reactions and their corresponding states (C1) and populated by the contingencies of reactions on states (C2). Each row corresponds to one elemental reaction and each column corresponds to one elemental state. The symbol in each reaction–state intersection specifies how that specific reaction depends on that specific state. Together, one row contains the complete set of rules a reaction follows, and hence describes how it works in every specific state. This is related to a dependency matrix (Yang et al, 2010), although the entries in the contingency matrix are more detailed and unambiguous. In the example (Figure 1E), the first row shows that (a) the binding of Sho1 to Ste11 cannot occur if either of the components is already part of such a dimer (column 1), (b) that we do not know whether the prior binding of Sho1 to Pbs2 (column 2) or phosphorylation of Ste11 (column 3) affects the Sho1–Ste11 binding and (c) that the other states appearing in the row are irrelevant for this specific binding reaction, as they do not describe properties of Sho1 or Ste11. The primary advantages of the contingency matrix are that it (1) allows a comprehensive documentation/visualisation of all reactions and dependencies within the network, (2) that it does so without requiring assumptions, (3) that it explicitly defines unknowns and hence gaps in our knowledge and (4) that the matrix constitutes a template from which mathematical models can be derived automatically (see below).
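The way the matrix is filled in can be illustrated with a short Python sketch. It assumes a three‐reaction subset of the example and a hypothetical dictionary layout; the filling rules follow the description of Figure 1E: ‘x’ where a reaction meets its own product state, imported symbols from the contingency list, ‘?’ for remaining cells that share a component, and blacked‐out cells (here `None`) otherwise.

```python
# Reaction name -> (components involved, product state). Hypothetical layout.
reactions = {
    "Sho1_ppi_Ste11": ({"Sho1", "Ste11"}, "Sho1--Ste11"),
    "Sho1_ppi_Pbs2": ({"Sho1", "Pbs2"}, "Sho1--Pbs2"),
    "Ste20_P+_Ste11": ({"Ste20", "Ste11"}, "Ste11-{P}"),
}

# State -> components whose properties the state describes.
state_components = {
    "Sho1--Ste11": {"Sho1", "Ste11"},
    "Sho1--Pbs2": {"Sho1", "Pbs2"},
    "Ste11-{P}": {"Ste11"},
}

# (target reaction, contingency symbol, effector state)
contingency_list = [("Ste20_P+_Ste11", "K+", "Sho1--Ste11")]


def build_contingency_matrix(reactions, state_components, contingency_list):
    """Return {reaction: {state: symbol or None}}; None marks a blacked-out cell."""
    known = {(target, effector): symbol for target, symbol, effector in contingency_list}
    matrix = {}
    for reaction, (components, product) in reactions.items():
        row = {}
        for state, owners in state_components.items():
            if state == product:
                row[state] = "x"  # states are binary: cannot produce a state that is already true
            elif (reaction, state) in known:
                row[state] = known[(reaction, state)]  # imported from the contingency list
            elif components & owners:
                row[state] = "?"  # shares a component, effect unknown
            else:
                row[state] = None  # no shared component: not a direct contingency
        matrix[reaction] = row
    return matrix


matrix = build_contingency_matrix(reactions, state_components, contingency_list)
for reaction, row in matrix.items():
    print(reaction, row)
```

In this subset the first row reproduces the example in the text: ‘x’ for the Sho1‐‐Ste11 dimer and ‘?’ for both Sho1‐‐Pbs2 and phosphorylated Ste11.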

The reaction graph displays a topological, directed reaction network (Figure 1F). It represents each entity as a single node and each relationship between a pair of entities as a single edge. Edges can be non‐directional (e.g., protein−protein interaction), unidirectional (e.g., phosphorylation) or bidirectional (e.g., phosphotransfer). The full reaction graph displays the domains and residues involved in each reaction. The protein parts are independent nodes and defined as neighbours (proteins can have domains or residues, domains can have subdomains or residues, subdomains can have residues). The inclusion of domain information makes the reaction graph similar to the (extended) contact maps (Danos, 2007; Chylek et al, 2011). The reaction graph and contact maps are both purely topological and do not include any contextual information, in contrast to the extended contact map which, for example, may show that binding only occurs to phosphorylated residues. We also use a condensed variant that displays only the central node for each component and collapses multiple reactions of the same kind between a pair of components to a single edge, and hence corresponds closely to the activity flow diagram of SBGN (Supplementary Figure S1B; Le Novere et al, 2009). The advantages of the reaction graph are (1) the relative simplicity that makes it useful for visualisation of even large networks and (2) that it is suited for visualisation of large‐scale data sets within the context of that network (see below).

The regulatory graph shows how information is conveyed through the network (Figure 1G). It improves on the reaction graph by including information on causality between the reactions in the network (C2 data). The regulatory graph shows the network's regulatory structure; that is, which reactions (via states) actually influence the rate of other reactions. It is a bipartite graph with the elemental reactions (red) and elemental states (blue) as nodes. Reaction‐to‐state edges simply show which reactions produce or consume which states. The state‐to‐reaction edges show which states (products of upstream reactions) affect the dynamics of which (downstream) reactions. These state‐to‐reaction edges correspond to the symbols in the contingency list, i.e., ‘!’, ‘K+’, ‘K−’ or ‘x’. The regulatory graph can easily be translated into an influence graph, which can be used for structural analysis of the network (Kaltenbach et al, 2011). In contrast to the influence graph or ‘story’ (Danos, 2007), the regulatory graph strictly separates the effects of reactions (production or destruction of states) and the modifiers (increase or decrease in reaction rates) via distinct edge types. Furthermore, only the (modified) elemental states are displayed and the (unmodified) complementary source/target state is implicit. Hence, like in the ‘stories’, cyclic motifs only appear when there is a true feedback in the system. This visualises both the (possible) sequence of events and the feedbacks clearly. However, in contrast to the ‘story’, the regulatory graph is comprehensive and simultaneously visualises all possible paths or ‘stories’. In this example (Figure 1G), the uppermost node pair corresponds to the reaction where Sho1 binds Ste11 (Sho1_ppi_Ste11) and the resulting state Sho1‐‐Ste11. The reaction‐to‐state edge linking these two nodes identifies Sho1‐‐Ste11 as the product of this binding reaction.
Note that the source states for this reaction are omitted (i.e., Sho1 not bound to Ste11 and Ste11 not bound to Sho1). The state‐to‐reaction edge from Sho1‐‐Ste11 to Ste20_P+_Ste11 shows that the phosphorylation of Ste11 by Ste20 is enhanced in the Sho1‐‐Ste11 complex. This reaction in turn produces the state Ste11‐{P}, which is required for phosphorylation of Pbs2 on both Ser514 and Thr518. Hence, the information flow can be followed throughout the network as all edges are unidirectional. The main advantages of the regulatory graph are that it (1) allows a comprehensive documentation/visualisation of all reactions and contingencies within the network, (2) that it does so in a very compact format (3) without forcing non‐supported assumptions, (4) that it can be used for structural analysis of the network and (5) that it clearly shows the information flow through the network.
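The construction of the regulatory graph from the two lists can be sketched in Python as follows. This is an illustration only: the dictionary layout, the function names and the site‐specific state names are hypothetical, and only the upper part of the example pathway described above is included.

```python
from collections import defaultdict

# Reaction -> state it produces (reaction-to-state edges).
produces = {
    "Sho1_ppi_Ste11": "Sho1--Ste11",
    "Ste20_P+_Ste11": "Ste11-{P}",
    "Ste11_P+_Pbs2(Ser514)": "Pbs2(Ser514)-{P}",
    "Ste11_P+_Pbs2(Thr518)": "Pbs2(Thr518)-{P}",
}

# (target reaction, contingency symbol, effector state)
contingencies = [
    ("Ste20_P+_Ste11", "K+", "Sho1--Ste11"),
    ("Ste11_P+_Pbs2(Ser514)", "!", "Ste11-{P}"),
    ("Ste11_P+_Pbs2(Thr518)", "!", "Ste11-{P}"),
]

# Only these symbols give state-to-reaction edges; '0' and '?' do not.
EDGE_SYMBOLS = {"!", "K+", "K-", "x"}


def regulatory_graph(produces, contingencies):
    """Directed bipartite graph: reaction -> state and state -> reaction edges."""
    graph = defaultdict(set)
    for reaction, state in produces.items():
        graph[reaction].add(state)
    for target, symbol, effector in contingencies:
        if symbol in EDGE_SYMBOLS:
            graph[effector].add(target)
    return graph


def downstream(graph, start):
    """All nodes reachable from start by following the unidirectional edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def has_feedback(graph):
    """True if any node can reach itself, i.e. the graph contains a cycle."""
    return any(node in downstream(graph, node) for node in list(graph))


graph = regulatory_graph(produces, contingencies)
print(sorted(downstream(graph, "Sho1_ppi_Ste11")))
print(has_feedback(graph))
```

Because every edge is unidirectional, following the information flow reduces to a reachability query, and a cycle found by `has_feedback` would correspond to a feedback in the system.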

Process descriptions are well established and allow visualisation of the information flow and mechanistic detail simultaneously (Kitano et al, 2005). They are excellent for representation of small networks which are completely known, but lack of data (of the right granularity) invariably leads to unsupported assumptions. In addition, these diagrams rapidly become very complex, generally forcing ad hoc reduction and additional implicit and unsupported assumptions. Therefore, process descriptions do not allow a complete description of the network with the stringency we require. However, the process description can be clear and easy to read, and we generate a limited version which excludes complex formation and hence avoids most of the combinatorial complexity. The difference is highlighted by the upper three nodes in the example (Figure 1H), where Ste20 phosphorylates Ste11. In contrast to the full process description, the binding of Ste11 to Sho1, and how this binding would affect the phosphorylation, is not included (compare Supplementary Figure S1). The (limited) process description has several advantages: It (1) is intuitive to read and (2) defines in which internal state(s) an enzyme is active, its substrate and the exact target residue, which (3) conveys the information flow through the pathway, the enzyme–substrate relationships as well as the gaps in our understanding of these aspects.

The information can also be used to generate interaction matrices that specify which components react with which components. These can be rendered at several levels of detail ranging from a complete interaction matrix including protein domains and target residues that defines each interaction type, via condensed interaction matrices with only one row and column per protein that still contains reaction type information (Figure 1I, upper matrix), to numerical matrices that only include information on connection and directionality. We used the latter to calculate the distances within the network to generate a distance matrix (Figure 1I, lower matrix).
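As an illustration of the last step, the following Python sketch derives a distance matrix from a numerical interaction matrix by breadth‐first search. The text does not specify how distances were computed, so the definition used here (number of edges on the shortest directed path, with protein–protein interactions traversable in both directions) is an assumption, as is Pbs2 being the direct upstream partner of Hog1.

```python
from collections import deque

components = ["Sho1", "Ste20", "Ste11", "Pbs2", "Hog1", "Hot1"]

# Non-directional edges (protein-protein interactions) and directed edges
# (phosphorylations) of the simplified example pathway.
undirected = [("Sho1", "Ste11"), ("Sho1", "Pbs2")]
directed = [("Ste20", "Ste11"), ("Ste11", "Pbs2"), ("Pbs2", "Hog1"), ("Hog1", "Hot1")]

index = {name: i for i, name in enumerate(components)}
size = len(components)

# Numerical interaction matrix: 1 if row component acts on column component.
adjacency = [[0] * size for _ in range(size)]
for a, b in undirected:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1
for a, b in directed:
    adjacency[index[a]][index[b]] = 1


def distance_matrix(adjacency):
    """Shortest path length between all node pairs; None if unreachable."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for successor in range(n):
                if adjacency[node][successor] and dist[source][successor] is None:
                    dist[source][successor] = dist[source][node] + 1
                    queue.append(successor)
    return dist


dist = distance_matrix(adjacency)
print(dist[index["Ste20"]][index["Hot1"]])
```

Unreachable pairs are kept as `None` rather than a large number, so that missing paths stay distinguishable from long ones.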

Finally, the rxncon tool provides export to entity relationship diagrams (Figure 1J). Like the regulatory graph, the entity relationship diagram displays reactions and contingencies separately and hence largely avoids the combinatorial complexity. The entity relationship diagram has the advantage of concentrating all information on a given protein around a central node, which works especially well for simple regulatory circuits. This emphasises the role of each component within the network, in contrast to the regulatory graph which emphasises the information flow through the network. The entity relationship diagram is generated automatically by the rxncon software and visualised via Biographer (Biographer). In the same way, the rxncon software can be used to generate the contingency matrix, the reaction graphs, the regulatory graph, and, via BioNetGen (Blinov et al, 2004), the SBML file that constitutes the basis for the process description. These generations are fully automated and hence the framework addresses the issue of (ii) automatic network visualisation without further assumptions and, in the case of the contingency matrix and regulatory graph, without any simplifications.

Generation of mathematical models

The contingency matrix is a template for automatic generation of mathematical models. Each elemental reaction corresponds to a basic (context‐free) rule in a rule‐ or agent‐based model (Table II), or, in other words, a set of rules that share a reaction centre (Chylek et al, 2011). All contextual constraints on an elemental reaction are defined in a single row in the contingency matrix, and this row defines the elemental reaction's implementation in the rule‐based format. The basic rule suffices if there are no known modifiers of a particular elemental reaction (i.e., only ‘0’ and ‘?’ apart from the intersection with its own state(s) (which is always ‘x’ for a product state and ‘!’ for a source state)). Every other contingency splits the expression into two rules: one when that elemental state is true and one when it is false. The number of rules needed only increases with the number of quantitative modifiers (‘K+’ and ‘K−’), as the qualitative modifiers set the rate constant to zero in either the ‘true’ (for ‘x’) or the ‘false’ (for ‘!’) case (see Supplementary information for details). The expansion to rules is fully defined in our data format and the rxncon software tool automatically generates the input file for the computational tool BioNetGen (Blinov et al, 2004). This file can be used for rule‐based modelling, network‐free simulation and creation of SBML files. The translation to and from the rule‐based format is unambiguous in both directions, and we illustrate this with translation of a rule‐based model of the pheromone response pathway (yeastpheromonemodel.org). This model contains lumped reactions which we translate to combinations of elemental reactions, resulting in a different equation structure but the same functionality given appropriate choice of rate constants (Supplementary Table S3). Furthermore, we cannot distinguish between identical proteins in, for example, homodimers, and therefore cannot define strict trans‐reactions within such dimers. Apart from these issues, we can reproduce the same model with only cosmetic/nomenclature differences (see Supplementary information for details). Hence, the framework addresses the issue of (iii) automatic model generation from the database of biological information.
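The expansion of one contingency matrix row into rules, as described above, can be sketched as follows. This is a simplified illustration and not the rxncon export itself: the function `expand_rules`, the dictionary layout, the rate labels and the example row (including its state names and contingency symbols) are all hypothetical, and the ASCII ‘K-’ stands in for ‘K−’. Qualitative contingencies (‘!’ and ‘x’) fix the required context, whereas each quantitative contingency doubles the number of rules.

```python
from itertools import product


def expand_rules(reaction, row):
    """Expand one contingency-matrix row into context-specific rules."""
    fixed = {}          # states pinned by '!' (True) or 'x' (False)
    quantitative = []   # states with 'K+' or 'K-', each splits the rule
    for state, symbol in row.items():
        if symbol == "!":
            fixed[state] = True
        elif symbol == "x":
            fixed[state] = False
        elif symbol in ("K+", "K-"):
            quantitative.append(state)
        elif symbol not in ("0", "?"):
            raise ValueError(f"unknown contingency symbol: {symbol!r}")

    rules = []
    combinations = product([True, False], repeat=len(quantitative))
    for index, values in enumerate(combinations, start=1):
        context = dict(fixed)
        context.update(zip(quantitative, values))
        # Each context gets its own (placeholder) rate constant label.
        rules.append({"reaction": reaction, "context": context, "rate": f"k{index}"})
    return rules


# Hypothetical row: the symbols below are invented for illustration.
row = {
    "state_1": "!",   # absolutely required
    "state_2": "x",   # absolutely inhibitory
    "state_3": "K+",  # positive effector, splits the rule in two
    "state_4": "0",   # no effect
    "state_5": "?",   # unknown, treated as no effect
}
rules = expand_rules("example_reaction", row)
for rule in rules:
    print(rule)
```

With no quantitative modifiers the function returns a single rule, which corresponds to the basic rule restricted by any qualitative requirements; with n quantitative modifiers it returns 2^n rules.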

Table 2. Implementation of elemental reactions in the rule‐based format

Mapping the MAP kinase network

As a benchmark, we have used the presented framework and an extensive literature search to create a comprehensive map for the yeast MAP kinase network (Supplementary Table S1). Reactions have been defined with specific residues and domains whenever experimental support was sufficient. The degree of experimental evidence has been evaluated manually and individually for each entry, and references to primary research papers supporting each interaction have been included in the reaction and contingency lists (column ‘PubMedIdentifier(s)’). We have used mechanistic data on reactions (C1) and a combination of mechanistic and genetic data on contingencies (C2) between reactions and reactants’ states from primary research literature. The mapping is based solely on primary research papers and de facto shown data to ensure a high‐quality network reconstruction. We chose to exclude almost all genetic data as indirect effects cannot be ruled out even in well‐performed genetic screens. Finally, we decided not to include spatial data, as we found information especially on regulation of (re)localisation too sparse. To the best of our knowledge, we have eliminated all questionable information from the compiled data set, and convincing reactions lacking solid mechanistic evidence have been included but clearly and distinctly labelled.

The MAP kinase network contains 84 components, 181 elementary states and 222 elementary reactions, corresponding to many hundreds of thousands of specific states. This network is large enough to be a severe challenge to the established visualisation and analysis methods. We did in fact fail to generate the complete state space and terminated the BioNetGen expansion after the first three iterations which generated 207, 1524 and 372 097 specific states, respectively. We use a range of graphical formats to visualise different aspects of this highly complex network. First, we display the network topology in the reaction graphs (Figure 2). These figures show that the number of characterised phosphorylation reactions vastly outnumbers that of characterised dephosphorylation reactions (68 to 16; Figure 2A), and that several well‐established processes are only supported by genetic data (including the entire MAP kinase cascade below Pkc1; Figure 2B, dashed lines). The reaction graph also allows comparison between the established pathway architecture and the unbiased global protein−protein interaction studies and synthetic lethal networks (Figure 3A and B, respectively).
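The gap between the number of elemental states and the number of specific states can be illustrated with a small counting sketch. It assumes, purely for illustration, that a component carries n independent binary elemental states (for example, modification sites that are either modified or not), in which case it can adopt up to 2^n specific states. The function name and the site counts are hypothetical and are not taken from the MAP kinase data set.

```python
def max_specific_states(n_elemental_states):
    """Upper bound on specific states for n independent binary elemental states."""
    return 2 ** n_elemental_states


# Elemental description grows linearly, specific-state description exponentially.
for n in (1, 2, 5, 10, 20):
    print(f"{n:2d} elemental states -> up to {max_specific_states(n)} specific states")
```

Real networks are constrained by contingencies, so the reachable number of specific states is usually smaller than this bound, but the exponential trend is what makes exhaustive state-space generation impractical for large networks.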

Figure 2.

The reaction graph compactly displays the topology of the S. cerevisiae MAP kinase network. (A) The reaction graph of the MAPK network displays the components as nodes and the reactions as edges. Each component is defined by a central major node and peripheral minor nodes indicating domains, subdomains and specific residues (blue). When interacting domains and target residues are known, reactions are displayed as edges between these minor nodes. In contrast, the condensed reaction graph (B) displays each component as a single node, and each type of reaction between two nodes as a single edge. Nodes are either proteins (circles), small molecules (diamonds) or DNA (square). Edge colours indicate reaction type (co‐substrates and co‐products): Grey; protein–protein interaction (N/A), red; phosphorylation (−ATP, +ADP), orange; guanine nucleotide exchange (−GTP, +GDP), blue; dephosphorylation or GTPase activation (+Pi), gold; ubiquitination (−ubiquitin, −ATP, +ADP, +Pi), black; phosphotransfer or proteolytic cleavage (N/A). The domain layout in (A) prioritises readability and domain organisation does not reflect linear sequence or protein structure. Arrowheads indicate directionality for unidirectional or reciprocal catalytic modifications. Reactions for which we found no direct evidence but which are supported by convincing genetic data have been included as dashed lines. Note the much higher frequency of reported phosphorylation reactions as compared with dephosphorylation reactions; in total the network includes 68 phosphorylation reactions but only 16 dephosphorylation reactions (A).

Figure 3.

The condensed reaction graph is an excellent tool for visualisation of high‐throughput data. (A) Physical interactions within the MAPK network. The global protein–protein interaction network was retrieved from Biogrid (Stark et al, 2006), filtered for physical interactions excluding two hybrid, and visualised on the condensed reaction graph (Figure 2A). Purple edges indicate protein–protein interactions and their thickness indicates the number of times they were picked up, ranging from a single time (dashed line) to 19 times. Nodes that appear faded have no interactions with any other component in the MAPK network reported in this data set. Note that the nodes that do not correspond to single ORFs would be excluded automatically (e.g., the SCF complex, DNA, lipids). The smaller, boxed network displays the corresponding two‐hybrid interaction network. (B) Genetic interactions within the MAPK network. Synthetic lethal interactions were retrieved from Biogrid and visualised as per (A). Quantitative data, such as mutant phenotypes and gene expression levels, can also be directly visualised on the network.

In the contingency matrix (Figure 4), we visualise the combined knowledge we have about the MAP kinase system (C1 and C2). The core matrix (red block of rows and blue block of columns) describes all the elemental reactions, elemental states and the (possible) contingencies of reactions on states. The black fields here show when there is no overlap between the components in the reactions and those defined in the states. Therefore, the matrix will always be sparsely populated. However, we also see that most of the remaining fields are grey; that is, effect not known (‘?’). This means that our knowledge of reactions (C1; which defines rows and columns) is much stronger than our knowledge of the causality between these reactions (C2; the cells). We only have data on a minority of all possible contingencies, and these gaps are explicitly shown in the contingency matrix. It should also be noted that not all effects can be ascribed to single elemental states. We have added an outer layer of Boolean states (purple rows and columns) to account for these cases. The Boolean states describe complex mechanisms such as scaffolding and can in principle correspond to the specific states of, for example, process descriptions. However, they are only added when needed to describe empirical results. Note that only a small fraction of the states are Boolean, which reflects the low abundance of empirical data on the combinatorial effect of elemental states (i.e., specific states). Therefore, we believe it to be better to use mapping strategies which do not require such data. Finally, the matrix contains a layer of inputs and outputs (grey; columns and rows, respectively). These constitute the system's interface with the outside.

Figure 4.

The contingency matrix provides a complete description of the network or network module. The core contingency matrix is spanned by the elemental reactions (rows, in red) and the elemental states (columns, in blue). The additional blocks are derived from the contingency list and contain the formation rules (rows) and effects (columns) of Boolean states (both purple) as well as the output of (rows) and input to (columns) the network (both grey). The cells in the matrix define how each reaction (row) depends on each state (column). The effects range from being absolutely required (‘!’), via positive effector (‘K+’), no effect (‘0’) and negative effector (‘K–’) to absolutely inhibitory (‘x’), or it can be unknown or undefined (‘?’). Each Boolean state is defined by a single operator (‘AND’ or ‘OR’) for the elemental states, other Booleans and/or inputs that define it. The contingency matrix displayed here contains the complete MAPK network. Note that the contingency matrix is sparsely populated. This is both because most combinations of reactions and states lack overlap in components (black squares) and because we have very limited knowledge of the possible contingencies (grey squares). Overall, the information on what reactions can occur is much more abundant than on how they are regulated.

The regulatory graph (Figure 5) displays the information in the contingency matrix graphically, by showing how reactions produce or consume states, and how states influence reactions. This graph contains the full C1 and C2 information, and would fall apart without either. In fact, the isolated reaction–state pairs that fall outside the graph do so because they have no known incoming or outgoing contingencies. The graph shows that the MAP kinase network is rather well connected, as most reactions are indeed linked in a single graph by contingencies. However, there are relatively few input and output points; many reactions do not have known regulators and many states do not have defined regulatory effects. Only reaction–state pairs that appear between the system's input and output would be able to transmit information. This means either that all other pairs are irrelevant for the dynamics of the signal‐transduction process, or that we are lacking information about their role in this process. In fact, such loose ends might be excellent candidates for targeted empirical analysis. One example would be Msb2's binding to Cdc42, which is reported to be important for the pseudohyphal differentiation pathway, raising the question of whether this binding is regulated in response to the stimuli that activate this part of the MAP kinase network. Another point that stands out is the almost complete lack of (documented) information exchange between pathways. The exception is the Sho branch of the Hog pathway, which is closely intertwined with the mating pathway, as both are activated by the shared MAP kinase kinase kinase Ste11 and parts of the cell polarity machinery.
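The idea that only reaction–state pairs lying between an input and an output can transmit information can be sketched as a reachability check on a directed graph. The sketch below is an assumption-laden illustration rather than part of the rxncon tool: the helper names, the edge list and all node labels are hypothetical. A node is considered signal-carrying if it is reachable from some input and some output is reachable from it; everything else is a loose end.

```python
from collections import defaultdict


def reachable(edges, starts):
    """Return all nodes reachable from `starts` along directed edges."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    seen = set(starts)
    stack = list(starts)
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def signal_carrying_nodes(edges, inputs, outputs):
    """Nodes downstream of an input and upstream of an output."""
    downstream = reachable(edges, inputs)
    upstream = reachable([(t, s) for s, t in edges], outputs)
    return downstream & upstream


# Hypothetical bipartite regulatory graph (reactions and states alternate).
edges = [
    ("[Input]", "reaction_A"),   # input regulates reaction A
    ("reaction_A", "state_A"),   # reaction A produces state A
    ("state_A", "reaction_B"),   # state A is a contingency of reaction B
    ("reaction_B", "state_B"),
    ("state_B", "[Output]"),
    ("reaction_C", "state_C"),   # isolated reaction-state pair
]
all_nodes = {node for edge in edges for node in edge}
carrying = signal_carrying_nodes(edges, ["[Input]"], ["[Output]"])
loose_ends = all_nodes - carrying
print(sorted(loose_ends))
```

In this toy graph the isolated pair is reported as a loose end, which is the kind of node the text suggests as a candidate for targeted empirical analysis.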

Figure 5.

The regulatory graph visualises the causality between reactions and reveals the regulatory structure of the network. This bipartite graph illustrates the relationships between the reactions (red nodes) and states (blue nodes) within the network. Edges from reactions to states define how states are produced (blue) or consumed (purple), and each such edge corresponds to a single elemental reaction. Edges from states to reactions define how states regulate other reactions, and each such edge corresponds to a single contingency (Green; absolute requirement (‘!’) or positive effector (‘K+’), red; negative effector (‘K–’) or absolutely inhibitory (‘x’)). Booleans are used when the effect on a reaction cannot be attributed to single elemental states (white diamonds (OR) or triangles (AND) connected to the states/Booleans/inputs that define them with black lines). Inputs are displayed in grey and connected to the elemental reaction(s) they influence. Likewise, outputs are displayed in grey and connected to the states they are influenced by. Signals can be followed through the network from external cues (grey; top) to transcriptional response (grey; bottom) as all edges are directional. Reactions without input are not (known to be) regulated and would therefore be expected to have constant rates; likewise states without output have no (defined) impact on the system. We have also included likely but undocumented requirements for enzyme–substrate bindings before catalysis as dashed lines. The regulatory graph is the only graphical representation using the complete information in the contingency matrix, and hence the only complete and completely graphical visualisation of the network. It is also the most potent visualisation to evaluate the degree of knowledge about the network. For example, visualisation of high‐throughput data would result in disconnected reaction–state pairs only, due to the lack of regulatory information (no C2 data).

We have also generated a network map in the established process description format, but without complex formations (Figure 6). This decision eliminated most of the combinatorial explosion and the need for implicit assumptions. However, there is still uncertainty in the specific phosphorylation state of the active state of certain catalysts, such as Ssk2, Ste11 and Ste7. Likewise, we do not know if phosphorylation order is an issue for proteins with multiple phosphorylation sites. In contrast to the regulatory graph (Figure 5), the process description becomes more complicated the more unknowns we have and Figure 6 is simplified (compare Supplementary Figure S2). However, the limited process description in Figure 6 clearly shows the catalyst–target relationships, and reinforces the impression that very few of the known phosphorylation reactions are balanced by known dephosphorylation reactions.

Figure 6.

The limited process description displays all posttranslational modifications and their catalysts, but excludes complex formation. Each specific internal state is represented as a distinct node, although some intermediate phosphorylation states have been excluded. Phosphorylations are indicated with red arrows (ATP as co‐substrate and ADP as co‐product), GEF reactions as orange arrows (−GTP, +GDP), and dephosphorylation or GAP reactions as blue arrows (+Pi). Only a fraction of the catalytic modifications have a known catalyst for both forward and reverse reactions, and the required state of the catalyst is known in even fewer cases. Therefore, even this highly simplified process description includes uncertainty in the required states of both catalysts and substrates. In this visualisation, this uncertainty has been shown by using a single catalysis arrow from a box including all potentially active states of the catalyst to the basic state of the substrate (completely unphosphorylated for kinase reactions, or completely phosphorylated for phosphatase reactions). While these simplifications are unsupported, including additional catalytic arrows would be equally arbitrary with the added drawback of making the figure more complex (see Supplementary Figure S2). Despite the need for implicit assumptions, the process description is useful as it is very explicit and intuitive to read.

Finally, we automatically generated a mathematical description of the entire network as a proof of principle. The rxncon software used the contingency matrix to generate the input file for BioNetGen (Blinov et al, 2004). The corresponding network is too large to create but could be simulated with the network‐free simulator NFSim (Sneddon et al, 2011). Further analysis of this system falls outside the scope of this paper, but the input file to BioNetGen and/or NFSim with trivial parameters is included as a supplement. Hence, a complete mathematical model can be automatically generated from the reaction and contingency data, and to our knowledge this is the first framework that integrates network definition at the granularity of empirical data with automatic visualisation and automatic model creation.

Discussion

It is clear that the complexity of signal‐transduction networks is one of the major challenges in systems biology, impeding our ability to visualise, simulate and ultimately understand these networks. This issue has been widely recognised and substantial efforts have been committed to improve and standardise our tools for visualisation and modelling of cellular networks (Hucka et al, 2003; Le Novere et al, 2009). These standardisation efforts are essential for data exchange and reusability, but many of the existing tools are unsuitable for definition, visualisation and mathematical modelling of large networks. The arguably most important problems are the combinatorial complexity, the granularity difference between empirical and theoretical data, and the lack of exchange formats between different theoretical descriptions. Here, we have introduced a new framework for network definition at the same granularity as most empirical data. This format was already available for C1 (reaction) information, as our list of elemental reactions uses the same format as high‐throughput data (PSICQUIC). We describe contextual information at the same granularity in our contingency list (C2), which not only allows an intuitive and accurate translation of empirical data but also largely avoids the combinatorial complexity. Contrary to state transition based descriptions but like the related rule‐based format, the reaction and contingency based description becomes smaller the less knowledge we have, as only known reactions and contingencies are considered. This format also provides for highly detailed referencing as each elemental reaction and contingency can and should be tied to empirical evidence (i.e., research paper(s)). Furthermore, we show that this format is stringent and unambiguously defines both rule‐based models and graphical formats, such as the activity flow diagram (condensed reaction graph), entity relationship diagram and process description formats of SBGN. Our framework also supports two new visualisation formats that we introduce here and that can display our complete knowledge database (the complete reaction and contingency lists). Finally, our framework provides very high reusability and extendibility, as the underlying network definition (in list format) is very easy to extend, merge and reuse in other contexts, which is not the case for most graphically or mathematically defined systems. Of course, this level of definition still leaves the issues of parameter estimation and graphical layout, but these would typically need to be repeated even when merging graphical and mathematical network definitions. Hence, we advocate a more fundamental level of network definition than graphical or mathematical formalism. We envisage this or a similar framework as a standard to greatly facilitate model/network construction, exchange and reusability.

We have applied this method to map out the MAP kinase network of S. cerevisiae. This network was chosen as a benchmark since it is both well characterised and representative of signal transduction in general. It consists of three clear subgraphs, which have traditionally been considered more or less insulated pathways: the High Osmolarity Glycerol (Hog) pathway, the Protein Kinase C (PKC) pathway and the MATing (MAT) pathway, which almost completely overlaps with the PseudoHyphal Differentiation (PHD) pathway. These pathways have also been mapped or documented in several other efforts. KEGG presents a combined map of the traditional MAP kinase pathways in a format similar to its metabolic pathways (Kanehisa et al, 2006, 2010). However, the stringent edge definitions used for the metabolic networks have been abandoned and this is a ‘biologist's graph’. The picture is similar with the maps of yeast MAP kinase pathways at Science STKE (e.g., Thorner et al, 2005). For example, these maps display Ste11 with four upstream regulators, but it is unclear how they regulate Ste11 and how their contributions combine (e.g., AND or OR?). Therefore, these network maps may provide an excellent introduction to the networks by providing a components list and a rough idea of the components’ roles in the network, but they define neither reactions (C1) nor contingencies (C2) unambiguously. On the opposite end, we have the recently published process description of the cell cycle and its surrounding signalling network (Kaizu et al, 2010). This contains explicit definition of both C1 and C2 information. However, the tremendous number of specific states in such a network forces simplifications, which not only leads to a loss of knowledge, but also mixes up known contingencies (C2) with arbitrary assumptions made to simplify the network. One example in this particular case would be the separation of the upstream activation of Ste11 and its downstream effect on the Hog and Mating pathways. The output of this module is defined by the context of its activation, and this information is lost due to these arguably necessary simplifications. In addition, the granularity difference between the highly specific map states and the underlying biological data makes the mapping ambiguous, leading to further unsupported assumptions. Despite these shortcomings, the process description is useful for visualisation of certain network properties due to the explicit representation of highly detailed knowledge such as target residues. However, we stress that neither of these established and widely used methods is sufficient to accurately capture the entire signal‐transduction network. Instead, we introduce the contingency matrix and the bipartite regulatory graph as alternative methods, which are able to fully capture the entire knowledge database without simplifications or assumptions. Together with the established methods, these visualisations provide an unprecedented view on the chosen benchmark system, and we trust that this completely referenced and comprehensive map of the MAP kinase signalling network in S. cerevisiae will be a useful reference material for the research community.

These results have direct bearing on the many efforts to create large data repositories. Pure reaction (C1) data, such as protein−protein interaction networks, can be retrieved using the standardised Molecular Interaction Query Language (MIQL; which our reaction list is designed to be compatible with) and PSICQUIC (PSICQUIC). PSICQUIC accesses, for example, ChEMBL (Overington, 2009), BioGrid (Breitkreutz et al, 2010), IntAct (Aranda et al, 2010), DIP (Xenarios et al, 2002), MatrixDB (Chautard et al, 2009) and Reactome (Croft et al, 2010). Several of these databases have additional information including contingency (C2) information and a standardised (non‐graphical) format for definition and retrieval would further improve the usefulness of these resources and facilitate further analysis of the stored information. The framework we propose here provides such a format with the key advantage of including export to mathematical models. Since mathematical modelling is the most central and natural step to bring the knowledge in these databases into a useful form, where quantitative systems properties can most exhaustively be analysed, the introduction of such an export is an important step forward. This framework is still not as flexible as direct model definition but it provides distinct advantages. Formulating models directly using classical state transition reactions is either subjective or very cumbersome in practice due to the combinatorial explosion, and state transition based models for the networks of the size we consider here are too large to be simulated. The closest related modelling framework is rule‐based modelling, in which models can be formulated without these combinatorial explosion problems, and it is also to a rule‐based format that we export our models. However, the classical rule‐based modelling frameworks lack all the database properties of our framework, such as the contingency matrix and its export to various novel visualisation formats. 
In short, one could therefore say that our framework combines the best of existing knowledge databases with new visualisation tools and rule‐based modelling.

In conclusion, we present a method to document and visualise signal‐transduction networks that improves on previous strategies in the following respects: (I) it allows concise mapping at the same granularity as biological data, hence pre‐empting the need for implicit, unsupported assumptions, (II) it allows referencing of each elemental reaction and contingency separately and handles unknowns explicitly, (III) the network can be visualised without any simplifications or assumptions that increase the uncertainty, (IV) the visualisations can be automatically generated from the data files, (V) the network definition is a template from which a mathematical model can be automatically generated and (VI) exported to SBML, and (VII) the supplied template and rxncon tool make the method immediately useful for anyone with an interest in signal transduction. Hence, our framework bridges three critical levels of signal‐transduction network analysis (definition, visualisation and mathematical modelling), as well as empirical data and theoretical analysis.

Materials and methods

The MAP kinase network map is based on the papers listed below. The specific reference(s) are listed for each reaction and contingency individually in the reaction and contingency lists in the ‘PubMedIdentifier(s)’ column with their PMID number.

(Ai et al, 2002; Alepuz et al, 2003; Alepuz et al, 2001; Andrews and Herskowitz, 1989; Andrews and Moore, 1992; Apanovitch et al, 1998; Baetz and Andrews, 1999; Baetz et al, 2001; Ballon et al, 2006; Bao et al, 2004; Bao et al, 2010; Bar et al, 2003; Bardwell et al, 1996; Bardwell et al, 1998a; Bardwell et al, 1998b; Bender and Sprague, 1986; Bilsland‐Marchesan et al, 2000; Blumer et al, 1988; Breitkreutz et al, 2001; Bruckner et al, 2004; Butty et al, 1998; Chou et al, 2004; Chou et al, 2006; Cismowski et al, 2001; Clark et al, 1993; Collister et al, 2002; Cook et al, 1996; Crosby et al, 2000; Cullen et al, 2004; Davenport et al, 1999; de Nadal et al, 2003; Dodou and Treisman, 1997; Doi et al, 1994; Dolan et al, 1989; Dowell et al, 1998; Drogen et al, 2000; Elion et al, 1993; Errede et al, 1993; Escote et al, 2004; Feng et al, 1998; Fitch et al, 2004; Flandez et al, 2004; Flotho et al, 2004; Friant et al, 2001; Garcia‐Gimeno and Struhl, 2000; Garrison et al, 1999; Gartner et al, 1998; Gartner et al, 1992; Good et al, 2009; Green et al, 2003; Guo et al, 2009; Hagen et al, 1986; Hagen et al, 1991; Hahn and Thiele, 2002; Heenan et al, 2009; Heise et al, 2010; Ho et al, 2002; Horie et al, 2008; Inagaki et al, 1999; Inouye et al, 1997a; Inouye et al, 1997b; Irie et al, 1993; Jacoby et al, 1997; Jung et al, 2002; Kamada et al, 1995; Kamada et al, 1996; Ketela et al, 1999; Kim et al, 2010; Kim et al, 2008; Kranz et al, 1994; Kusari et al, 2004; Lamson et al, 2002; Lee and Levin, 1992; Leeuw et al, 1995; Leeuw et al, 1998; Li et al, 1998; Liu et al, 2005; MacKay et al, 1991; MacKay et al, 1988; Madden et al, 1997; Madhani and Fink, 1997; Madhani et al, 1997; Maeda et al, 1995; Maeda et al, 1994; Maleri et al, 2004; Mapes and Ota, 2004; Martin et al, 2000; Mattison and Ota, 2000; Mattison et al, 1999; Medici et al, 1997; Melcher and Thorner, 1996; Metodiev et al, 2002; Miyajima et al, 1987; Murakami et al, 2008; Nasmyth and Dirick, 1991; Nehlin et al, 1992; Neiman and 
Herskowitz, 1994; Nern and Arkowitz, 1998; Nern and Arkowitz, 1999; Nonaka et al, 1995; Olson et al, 2000; Ostrander and Gorman, 1999; Ozaki et al, 1996; Paravicini and Friedli, 1996; Parnell et al, 2005; Pascual‐Ahuir et al, 2001; Peter et al, 1996; Peterson et al, 1994; Philip and Levin, 2001; Posas and Saito, 1997; Posas and Saito, 1998; Posas et al, 1998; Posas et al, 1996; Proft et al, 2005; Proft et al, 2001; Proft and Serrano, 1999; Proft and Struhl, 2002; Raicu et al, 2005; Raitt et al, 2000; Rajavel et al, 1999; Reiser et al, 2000; Remenyi et al, 2005; Rep et al, 2000; Rep et al, 1999; Roberts and Fink, 1994; Schmelzle et al, 2002; Schmidt et al, 1997; Schmidt et al, 2002; Schmitz et al, 2002; Shi et al, 2005; Shimada et al, 2004; Sidorova and Breeden, 1993; Siegmund and Nasmyth, 1996; Siekhaus and Drubin, 2003; Simon et al, 1995; Skowyra et al, 1997; Smith et al, 2002; Soler et al, 1995; Song et al, 1996; Taba et al, 1991; Takahashi and Pryciak, 2007; Tao et al, 2002; Tarassov et al, 2008; Tatebayashi et al, 2003; Tatebayashi et al, 2007; Tatebayashi et al, 2006; Tedford et al, 1997; Truckses et al, 2006; Truman et al, 2009; Vadaie et al, 2008; Valtz et al, 1995; Varanasi et al, 1996; Verna et al, 1997; Vilella et al, 2005; Wang and Konopka, 2009; Wang et al, 2005; Warmka et al, 2001; Wassmann and Ammerer, 1997; Watanabe et al, 1994; Watanabe et al, 1995; Watanabe et al, 1997; Winters and Pryciak, 2005; Wu et al, 2006; Wu et al, 1999; Wu et al, 1995; Wu et al, 2004; Wurgler‐Murphy et al, 1997; Yablonski et al, 1996; Yamamoto et al, 2010; Yesilaltay and Jenness, 2000; Young et al, 2002; Yuan and Fields, 1991; Zarrinpar et al, 2004; Zarrinpar et al, 2003; Zarzov et al, 1996; Zeitlinger et al, 2003; Zhan et al, 1997; Zhan and Guan, 1999; Zhao et al, 1995; Zheng and Guan, 1994; Zheng et al, 1994; Zhou et al, 1993).

The methods used are an integral part of the results and are outlined in that section. For additional details, please see Supplementary information.

Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary Information

Supplementary Information [msb201212-sup-0001.pdf]

Supplementary Table S1 [msb201212-sup-0002.xls]

Supplementary Table S2 [msb201212-sup-0003.xls]

Supplementary Table S3 [msb201212-sup-0004.xls]

Supplementary File 1 [msb201212-sup-0005.txt]

Supplementary File 2 [msb201212-sup-0006.pdf]

Supplementary Figure 1 [msb201212-sup-0007.pdf]

Supplementary Figure 2 [msb201212-sup-0008.pdf]

Acknowledgements

We thank past and present colleagues for helpful discussions; in particular Akira Funahashi, Noriko Hiroi and Douglas Murray for suggestions in the conception phase, Hans‐Michael Kaltenbach for introduction to the bipartite graph, Clemens Kühn for introduction to NFsim, Jens Nielsen for the suggestion to use matrix multiplication to calculate network distance and Nina Arens for proofreading. We acknowledge support from JSPS and SSF (Japan‐Sweden collaborative postdoc grant to MK), Lions and the Swedish Research Council (to GC), the German Ministry for Education and Research (BMBF, SysMO2 project Translucent 2 to EK), the European Commission (UNICELLSYS, Grant 201142, AQUAGLYCEROPORIN, Grant 35995, CELLCOMPUT, Grant 043310 and SYSTEMSBIOLOGY, Grant 514169, all to SH and EK) and from the MULTIDISCIPLINARY BIO Sweden‐Japan initiative (Sweden: Foundation for Strategic Research SSF and Vinnova, Japan: Japan Science and Technology Agency JST) to SH and HK. Work in the laboratory of SH was also supported by a grant from the Swedish Research Council (Grant 2007‐4905).

Author contributions: HK initiated the mapping project. MK conceived the framework. GC and MK developed the framework with input from all the authors. CFT and MK mapped the MAP kinase network. FK and RP implemented the framework with guidance from GC and MK. FK created the rxncon software tool. MK drafted the first manuscript with help from GC. SH and EK contributed biological and theoretical background knowledge, respectively. SH, EK and HK provided the research environments and contributed to completion of the manuscript. All authors read, edited and approved the final manuscript.

References

This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.