## Abstract

The formation and regulation of macromolecular complexes provides the backbone of most cellular processes, including gene regulation and signal transduction. The inherent complexity of assembling macromolecular structures makes current computational methods strongly limited for understanding how the physical interactions between cellular components give rise to systemic properties of cells. Here, we present a stochastic approach to study the dynamics of networks formed by macromolecular complexes in terms of the molecular interactions of their components. Exploiting key thermodynamic concepts, this approach makes it possible to both estimate reaction rates and incorporate the resulting assembly dynamics into the stochastic kinetics of cellular networks. As prototype systems, we consider the *lac* operon and phage λ induction switches, which rely on the formation of DNA loops by proteins and on the integration of these protein–DNA complexes into intracellular networks. This cross‐scale approach offers an effective starting point to move forward from network diagrams, such as those of protein–protein and DNA–protein interaction networks, to the actual dynamics of cellular processes.

## Introduction

Cells consist of thousands of different molecular species that orchestrate their interactions to form extremely reliable functional units (Hartwell *et al*, 1999). Such molecular diversity and the pervasive ability of cellular components to establish multiple simultaneous interactions typically lead to the formation of large heterogeneous macromolecular assemblies, also known as complexes (Pawson and Nash, 2003). These complexes form the backbone of the most fundamental cellular processes, including gene regulation and signal transduction. Important examples are the assembly of the eukaryotic transcriptional machinery (Roeder, 1998), with about hundred components, and the formation of signaling complexes (Bray, 1998), with tens of different molecular species.

One of the main challenges facing modern biology is to move forward from the reductionist approach into the systemic properties of biological systems (Alon, 2003). A major goal is to understand how the dynamics of cellular processes emerges from the interactions among the different molecular components. Typical computational approaches approximate cellular processes by networks of chemical reactions between different molecular species (Endy and Brent, 2001). A strong barrier to this type of approaches is the inherent complexity of macromolecular complex formation. Complexes are typically made of smaller building blocks with a modular organization that can be combined in a number of different ways (Pawson and Nash, 2003). The result of each combination is a specific molecular species and should be considered explicitly in a chemical reaction description. Therefore, there are potentially as many reactions as the number of possible ways of arranging the different elements, which grows exponentially with the number of the constituent elements. Twenty components, for instance, already give rise to over a million of possible species.

Two main types of avenues have been followed to tackle the exponential growth of the number of molecular species that arise during macromolecular assembly. One is based on computer programs that generate reaction rate equations for all of the macromolecular species (Bray and Lay, 1997). The other generates the different species dynamically (Morton‐Firth and Bray, 1998; Lok and Brent, 2005). Yet, none of the existing methods provides a consistent way to estimate the reaction rates. This obstacle is remarkable because the potential number of rates is even higher than the number of possible complexes. As a result, those methods often lead to unrealistic situations, such as the formation of polymeric complexes that do not exist under physiological conditions, which has been noted explicitly as intriguing caveats of the existing methodologies (Bray and Lay, 1997; Lok and Brent, 2005).

Here, we review the thermodynamic concepts underlying macromolecular complex assembly and examine how they can be used to both derive the dynamics of complex formation and estimate the transition rates needed to build a faithful computational model. The resulting stochastic approach does not give rise to the formation of unrealistic complexes and addresses the exponential growth of the number of species by stochastically exploring the set of representative complexes. In this way, it brings the properties of macromolecular interactions across scales up to the dynamics of cellular networks. To illustrate the applicability of this approach, we focus on DNA–protein complexes and their integration in gene regulatory networks. We consider as prototype systems the induction switches in the *lac* operon (Müller‐Hill, 1996) and phage λ (Ptashne, 2004), the two systems that led to the discovery of gene regulation (Jacob and Monod, 1961).

## Representation of macromolecular complexes

A crucial aspect is to use a description for macromolecular complexes that can capture the underlying complexity in simple terms. It is possible to take advantage of the fact that macromolecular complexes have typically unambiguous structures, where only certain molecular species can occupy a given position within the complex. In such cases, the specific configuration or state of the macromolecular complex can be described by a set of *M* variables, denoted by *s*=(*s*_{1},…*s*_{i},…*s*_{M}), whose values indicate whether a particular molecular component is present or not at a specific position. We chose *s*_{i}=1 to indicate that the component is present and *s*_{i}=0 to indicate that it is not present (Figure 1A). With this description, the potential number of specific complexes is 2^{M} and the number of reactions ½2^{M}(2^{M}−1)≈2^{2M−1}.

The use of binary variables provides a concise method to describe all the potential complexes without explicitly enumerating them. This type of approach has been used in a wide range of interesting biological situations, such as diverse allosteric processes (Bray and Duke, 2004), binding of molecules to a substrate (Keating and Di Cera, 1993; Di Cera, 1995), binding of multistate proteins to receptor docking sites (Borisov *et al*, 2005), and signaling through clusters of receptors (Bray *et al*, 1998; Duke and Bray, 1999). In practice, current binary‐variable approaches have strong limitations to tackle the assembly of macromolecular complexes. On the one hand, there are combinations of variables that do not have a physical existence. Explicitly, if a component that bridges two disconnected parts of the complex is missing, then the complex does not exist and if two components occupy the same position, they cannot be present within the complex simultaneously. On the other hand, the structure of the complex does not have to remain fixed. A complex can have different conformations and the components can be present in several states with different properties. None of the existing approaches based on binary variables incorporates all these key features needed to study macromolecular complex formation. In the following section, we exploit the underlying thermodynamics to put forward a binary description applicable to macromolecular assembly.

## Thermodynamics of assembly

Thermodynamics allows for a straightforward connection of the binary description with the molecular properties of the system. Each configuration of the macromolecular complex has a corresponding free energy, which is a quantity that indicates the tendency of the system to change its state. Transformations that decrease the free energy of the system are favored over those that increase it; that is, the tendency of the system is to evolve towards the lowest free energy minimum.

The statistical interpretation of thermodynamics (Gibbs, 1902; Hill, 1960) connects the equilibrium probability *P*_{s} of state *s* with its free energy Δ*G*(*s*) through the expression

where *Z*=∑_{s} e^{−ΔG(s)/RT} is the normalization factor and RT is the gas constant times the absolute temperature (RT≈0.6 kcal/mol for typical experimental conditions).

One crucial aspect is that the free energy of any given configuration, *s*=(*s*_{1},…,*s*_{i}…,*s*_{M}), can be obtained to any degree of accuracy by expanding the free energy in powers of the binary variables:

The first term on the right‐hand side of the equation is the free energy of the empty complex, when none of their components is present, and it serves as a reference free energy; the second term takes into account the energetic cost of placing each element in the complex; the third term accounts for the pairwise interactions between elements; the fourth term accounts for interactions that need a third element to take place; and higher order terms, not shown in the equation, account for interactions that take place only when multiple elements are present. This expansion can be stopped at any degree of complexity. Typical network diagrams, such as those of protein–protein interactions, represent only up to pairwise interactions (the first three terms). The main advantage of this expansion is that the 2^{M} free energies needed to characterize all the possible configurations can be obtained from a much smaller subset.

In order to connect the lower terms of this expansion directly with experimental data, the most important piece of information is that the free energy of binding, Δ*G*_{bind}, between two elements can be decomposed into two main contributions (Vilar and Saiz, 2005):

One of them, the interaction free energy Δ*G*_{int}, arises from the interactions between the two molecules, such as electrostatic, hydrophobic and Van der Waals interactions (Honig and Nicholls, 1995). The other, the positional free energy Δ*G*_{pos}, results from positioning the molecules in the right place and orientation so that they can interact and it accounts, among other potential contributions, for the loss of translational and rotational entropy upon binding (Page and Jencks, 1971).

The expression for the positional free energy can be written as

where the quantity Δ*G*_{pos}^{o} is the molar positional free energy and [*N*] is the concentration expressed in moles (Δ*G*_{pos}=Δ*G*_{pos}^{o}, for [*N*]=1 M). In general, the free energy of binding depends on the concentrations of the different components through the positional free energy, Δ*G*_{pos}.

Typical values of the molar positional free energy are Δ*G*_{pos}^{o}≈15 kcal/mol (Finkelstein and Janin, 1989). This value has been computed theoretically and measured experimentally for simple molecules (Page and Jencks, 1971; Finkelstein and Janin, 1989). For complex proteins, it might differ slightly. For instance, for the binding of AChE‐Fas2, it has been estimated to be 9 kcal/mol (Minh *et al*, 2005). Such high values of the positional free energy indicate that if the free energy of interaction is zero, the state in which two molecules are as close as if they were bound is extremely unlikely. Even small values of binding free energies, such as Δ*G*_{bind}^{o}≈−2 kcal/mol, would imply considerably high interaction free energies, such as Δ*G*_{int}≈−17 kcal/mol (assuming Δ*G*_{pos}^{o}≈15 kcal/mol). This is a crucial point in order to properly account for disconnected complexes. If the bridging element is missing, there is a missing positional free energy term and two missing interaction free energy terms (Figure 1B and C). Thus, the free energy of the disconnected complex (Figure 1B) is much higher than that of the connected one (Figure 1C), which indicates that the disconnected complex is extremely less abundant than the connected one under physiological conditions. The contribution of the positional free energy is also essential to avoid the presence of spurious polymeric complexes. A compact complex (Figure 1D) will have additional free energies of interaction compared to the chain‐like one (Figure 1C), which will render it much more stable than the polymeric chain. Finally, when two elements cannot be present in the same position at the same time, with this approach their interaction energy is infinitely large and the complex does not exist in practice.

The term ∑_{i=1}^{M−1}∑_{j=i+1}^{M}Δ*G*_{ij}*s*_{i}*s*_{j} accounts, among others, for interactions between components of the complex that have a multidomain structure, where domains interact in a pairwise manner with each other. In general, as we show in detail in the examples below, one should also consider the conformational free energy that accounts for the structural changes needed to accommodate multiple simultaneous interactions. This type of interactions requires higher order terms in the free energy expansion, such as ∑_{i=1}^{M−2}∑_{j=i+1}^{M−1}∑_{k=j+1}^{M}Δ*G*_{ijk}*s*_{i}*s*_{j}*s*_{k}.

All these thermodynamic concepts blend naturally into a binary‐variable description to provide an approach that incorporates the key elements needed for studying macromolecular complex assembly, such as avoiding the formation of unrealistic polymeric complexes, taking into account disconnected complexes, considering conformational changes, and incorporating mediated or multicomponent interactions.

The explicit way in which the thermodynamic approach is applied depends on the degree of characterization of the system. In the best‐case scenario, all the required free energies are known and the behavior of the system can be predicted straightforwardly in full detail. If a few free energies are missing, it is possible to build a model based on the known interactions and obtain the unknown free energies by ‘reversing the model’ (Vilar and Leibler, 2003). The resulting free energies can then be used in new situations. This type of approach has been used, for instance, to infer for the first time the *in vivo* free energies of looping DNA by the *lac* repressor as a function of the length of the loop (Saiz *et al*, 2005). It is also possible to use available sophisticated software packages to compute free energies from the known structures (Honig and Nicholls, 1995; Baker *et al*, 2001). Such sophistication has not been achieved yet for computing reaction rates. On the other hand, when a substantial amount of information is missing, one can postulate interactions and use a thermodynamic approach to explore the potential types of behavior. In general, the thermodynamic approach needs quantitative inputs, but the requirements are much less stringent than those of other quantitative approaches based on chemical reactions.

## Applications: making networks out of complexes and *vice versa*

The formation of DNA loops by proteins and protein complexes in the regulation of the *lac* operon and phage λ provides challenging examples to illustrate the applicability of this approach. Full understating of these two genetic systems requires the use of thermodynamic concepts not considered by current methods to study macromolecular assembly. These concepts are essential to tackle more complex situations, such as gene regulation in eukaryotes, which relies widely on DNA looping to implement action at a distance from regulatory elements that are far away from the promoter region (Roeder, 2003; Yasmin *et al*, 2004). In particular, decomposition of the free energy into interaction, positional, and conformational contributions is crucial for understanding how weak distal DNA binding sites can strongly affect transcription (Vilar and Leibler, 2003) and how proteins that do not oligomerize in solution do so on DNA to form DNA loops (Revet *et al*, 1999; Dodd *et al*, 2001; Vilar and Saiz, 2005). Hereafter, to simplify the notation, we will refer to positional, interaction, and conformational free energies by *p*, *e*, and *c*, respectively.

One remarkable aspect of our approach is that the binary‐variable method we propose has an equivalent representation in terms of network diagrams, in which nodes indicate whether or not a component or molecular conformation is present and links represent the interactions among components and molecular conformations. These networks can be directly mapped to the underlying macromolecular structure of the complexes and their properties, as illustrated in detail below for the cases of DNA looping in the *lac* operon and phage λ.

The resulting networks strikingly resemble the underlying molecular structures because nodes are associated with properties of the components, which have a determined arrangement in space. In this type of interaction networks, the whole network specifies a single state of the system, in the same way as a state of the macromolecular assembly is specified by indicating where each component is within the complex. This component‐oriented description allows for a representation that is not affected by the exponential explosion in the number of states as the number of components increases.

In other widely used quantitative methods with network representations, like Markov Chains, each node represents a state of the system and therefore there are as many nodes in the network as potential states. In those state‐oriented networks, only one node is occupied at a time and the behavior of the system is represented by a series of jumps from one node to another. In our case, several nodes can be occupied simultaneously and the behavior of the system is given by the sequence of changes in occupancy. In Figure 2 , we compare a general interaction network representation for a three‐component complex with the graphical representation of a Markov Chain for the same system. The number of nodes in the interaction network is the same as the number of components. In the Markov Chain network, in contrast, the number of nodes is 2 to the number of components.

There are also qualitative graphical representations, like ‘Molecular Interaction Maps’ (Kohn *et al*, 2006) and ‘process diagrams’ (Kitano *et al*, 2005), which aim at describing many types of cellular and biochemical processes that extend beyond macromolecular assembly. They can be translated into computational models, but how to perform that translation is not given explicitly by the representation and typically depends on the details of the processes involved. As a result, although the representation itself might not be affected by an exponential increase in the number of states (Kohn *et al*, 2006), the resulting mathematical description might be susceptible to this problem. A key difference with our approach is therefore that our representation gives straightforwardly the precise quantitative behavior of the system from its equivalence with the equations that describe the macromolecular assembly.

This avenue for incorporating binary variables with thermodynamics also presents important differences with the way in which thermodynamic concepts have been used so far in macromolecular assembly. Thermodynamic concepts applied to gene regulation were pioneered by Shea and Ackers (Ackers *et al*, 1982; Shea and Ackers, 1985) and subsequently used in a variety of situations (Vilar and Leibler, 2003; Bintu *et al*, 2005; Vilar and Saiz, 2005). In the traditional framework, the probability for the system to be in a state *k* is given by

where Δ*G*_{k}^{o} is the standard (molar) free energy of the state *k*, *j*_{k} is the number of molecules bound in the state *k*, and is the normalization factor (Hill, 1960). The summation is taken over all the states. Thus, to describe the system with the traditional approach one has to give the standard free energy for each state as well as the number of molecules bound. Typically, this is done in the form of a table, which has as many entries as the number of states of the system. Thus, for a system with three components, there is a table with eight entries for the standard free energy and another eight for the number of molecules.

In the approach we have proposed, there is a simple formula that accounts for the free energy of all the states, and the resulting expressions have a compact form amenable to computational and mathematical manipulations. As we show in the following sections, this approach brings forward explicitly the connection between macromolecular structure and function and its integration into the biochemical dynamics of cellular networks.

## The *lac* operon

The *lac* operon consists of a regulatory domain and three genes required for the uptake and catabolism of lactose (Müller‐Hill, 1996). Its main regulator is the *lac* repressor, which can bind to DNA at the main operator site *O*1, thus preventing transcription of the genes, and to an auxiliary operator (*O*2 or *O*3) by looping the intervening DNA.

For a system with two operators (*O*1 and *O*2), the *lac* repressor–DNA complex (Lewis *et al*, 1996) can be in five representative states (Vilar and Leibler, 2003): (i) none of the operators is occupied, (ii) a repressor is bound to just the auxiliary operator, (iii) a repressor is bound to just the main operator, (iv) a repressor is bound to both the main and the auxiliary operators by looping the intervening DNA, and (v) one repressor is bound to the main operator and another repressor, to the auxiliary operator.

We can express the free energy of all these states in a compact form in terms of binary variables:

where *s*_{1} and *s*_{2} are the binary variables that indicate whether (*s*_{i}=1, for *i*=1, 2) or not (*s*_{i}=0, for *i*=1, 2) the repressor is bound to *O*1 and *O*2, respectively; *s*_{DNA} indicates the presence (*s*_{DNA}=1) or absence (*s*_{DNA}=0) of DNA; and *s*_{L} is a variable that indicates the molecular conformation of DNA, either looped (*s*_{L}=1) or unlooped (*s*_{L}=0). The quantities *p* and *p*_{DNA} are the positional free energies of the repressor and DNA, respectively; *e*_{1} and *e*_{2}, the interaction free energy between the repressor and *O*1 and *O*2, respectively; and *c*_{L}, the conformational free energy of looping DNA. Δ*G*(0) is the free energy of the reference state in which there is no repressor, DNA, or DNA looping (all the binary variables are zero).

We are interested in the binding of the repressor to DNA and therefore DNA is always present (*s*_{DNA}=1), which simplifies the description to just three binary variables:

where we have chosen Δ*G*(0)=−*p*_{DNA} so that the reference free energy is equal to zero when no repressor is bound and there is no DNA looping. The network representation of the *lac* repressor–DNA assembly with the three binary variables (Figure 3A) has a close connection with the underlying molecular structural properties (Figure 3B), in such a way that, given the structural arrangement of a complex, our approach provides a straightforward avenue to obtain a network representation with an associated thermodynamic description. Explicitly, each node in the interaction network corresponds to a binary variable in the equation for the free energy.

The decomposition of the free energy in its different contributions (positional, interaction, and conformational) becomes crucial to perform an expansion in terms of binary variables. From the statistical thermodynamics point of view, the binding of the repressor to two operators has only five relevant states (*i*–*v*). With the binary description of the state of the complex, there are three variables and therefore 2^{3}=8 states. These two descriptions are in fact equivalent because three of the eight states in the binary description have significantly high free energies, that is, extremely low probabilities, which for practical purposes makes them irrelevant (see Figure 3C). The high free energies arise because for these states positional and conformational free energies are not balanced by interaction free energies.

This approach can naturally be used to study the consequences that DNA looping has in gene regulation. In the *lac* operon, transcription takes place only when the main operator *O*1 is free; that is, when the binary variable *s*_{1} is zero. Thus, the transcription rate, τ, is proportional to the average value of 1 minus *s*_{1}:

where τ_{max} is the maximum transcription rate. This model shows a precise agreement with experiments (Oehler *et al*, 1994) over the three orders of magnitude of the measured repression levels (Figure 3D). The specific form that DNA looping confers to the repression level as a function of the repressor concentration has two notable characteristics. The repression level has both a significantly high value and a relatively flat profile around physiological *lac* repressor concentrations (∼15 nM). Both properties have important general consequences for the underlying microbiochemistry of the cell. If concentrations of the different molecular species are kept low to prevent nonspecific interactions, not only is the binding to the specific sites decreased but also fluctuations are expected to become important (Elowitz *et al*, 2002; Paulsson, 2004; Rosenfeld *et al*, 2005). In the case of the *lac* operon, the average number of repressors per cell is very low, around 10, and, because of this low value, is expected to fluctuate strongly from cell to cell. The effects of DNA looping, as exemplified by the repression level, not only increase specificity and affinity of the *lac* repressor for the main operator but, at the same time, also make transcription fairly insensitive to fluctuations in the number of repressors.

We can address a more general situation, which includes the binding of different molecular species at identical positions within the complex. To illustrate this possibility, we consider the case of mutant *lac* repressors that do not tetramerize (Oehler *et al*, 1990). In its dimeric form, the *lac* repressor has just a single DNA‐binding domain and thus cannot loop DNA. In this case, the free energy is given by Δ*G*(*s*)=(*p*_{d}+*e*_{1})*s*_{1d}+(*p*_{d}+*e*_{2})*s*_{2d}, where *p*_{d} is the positional free energy of the dimeric *lac* repressor and *s*_{1d} and *s*_{2d} are the two binary variables that account for its binding to *O*1 and *O*2, respectively. If this mutant repressor is expressed together with the wild‐type (WT) *lac* repressor, it will compete for the binding to the operator sites and the free energy will be given by the sum of the free energies for WT and mutant repressors plus a contribution accounting for the interaction between the two types of repressors:

The term ∞(*s*_{1}*s*_{1d}+*s*_{2}*s*_{2d}) introduces an infinite free energy of interaction when the two types of repressors are bound to the same operator simultaneously, thus making the probability of such states zero. The five‐binary‐variable description of the WT and mutant *lac* repressor–DNA complex formation is a straightforward extension of the one for just WT *lac* repressor. This example clearly illustrates how it is possible to use our approach to add complexity without escalating into an exponentially growing description.

## Phage λ

The genetic regulation of phage λ provides an explicit example in which the DNA loop is formed not by a single protein, as in the *lac* operon, but by a protein complex that is assembled on DNA as the loop forms. The lysogenic to lytic switch in phage λ‐infected *Escherichia coli* lysogens is controlled at two operators in the phage DNA. These two operators, known as the left, *O*_{L}, and right, *O*_{R}, operators, are located 2.4 kb away from one another. Each of them has a tandem of three DNA sites where phage λ cI repressors can bind as dimers (cI_{2}): r1, r2, and r3 for the right operator, and l1, l2, and l3 for the left operator. Two cI dimers bound to r1 and r2 on the right operator can form an octamer with two cI dimers bound to l1 and l2 on the left operator by looping the intervening DNA.

Stability of the *E. coli* lysogens is accomplished by repression of transcription by the phage λ cI repressor of the *cro* gene at the *P*_{r} promoter and regulation of its own transcription at the *P*_{rm} promoter. Explicitly, binding of cI repressor dimers to r2, when r3 is vacant, activates its own transcription. When r3 is occupied, cI transcription is turned off. For a long time, one of the main puzzles in the regulation of phage λ was that the strength of r3 was too weak for it to be occupied by cI_{2} at the observed physiological concentrations (Ptashne, 2004). The missing element was that the formation of the DNA loop by the octamerization of the cI repressor dimers bound at r1, r2, l1, and l2 can bring l3 close to r3 so that the cI repressor can bind cooperatively as a tetramer to these two sites even though they are ∼2.4 kb apart.

Modeling of this system already gets close to the limits of the traditional thermodynamic approach. The number of states is 128, which accounts for all the combinations of occupancies of the six binding sites in either the looped or unlooped conformations of DNA. In terms of binary variables, however, the free energy of all the possible states of the assembly of the cI repressor–DNA complex is concisely described by

Here, *p* is the positional free energy of the cI repressor dimers; and *e*_{ri} and *e*_{li}, with *i*=1, 2, 3, are the interaction free energy of the cI dimer with each of its three DNA binding sites at the right and left operators, respectively. The terms of the type *e*_{r23}*s*_{r2}*s*_{r3} account for the pairwise interactions of cI dimers bound at neighboring DNA sites. The terms with three binary variables, such as *e*_{r123}*s*_{r1}*s*_{r2}*s*_{r3}, are introduced because these pairwise interactions are affected by the binding of cI repressors to the other neighboring site. The term (*c*_{L}+*e*_{T}*s*_{r3}*s*_{l3}+*e*_{O}*s*_{r2}*s*_{l2}*s*_{r1}*s*_{l1})*s*_{L} accounts for the effects of DNA looping. It includes looping (*c*_{L}), cI tetramerization (*e*_{T}), and cI octamerization (*e*_{O}) contributions to the free energy. This system has a network representation (Figure 4A) with seven binary variables that closely follows from its molecular organization (Figure 4B). As shown here, the seven‐binary‐variable description is equivalent to previous modeling of this system using the traditional thermodynamic approach (Dodd *et al*, 2004).

It is important to note that the expression, *e*_{r23}*s*_{r2}*s*_{r3}+*e*_{r12}*s*_{r1}*s*_{r2}+*e*_{r123}*s*_{r1}*s*_{r2}*s*_{r3}, can be rewritten in the more familiar form: *e*_{r23}*s*_{r2}*s*_{r3}(1−*s*_{r1})+*e*_{r12}*s*_{r1}*s*_{r2}(1−*s*_{r3})+(*e*_{r123}+*e*_{r12}+*e*_{r23})*s*_{r1}*s*_{r2}*s*_{r3}, which indicates that the free energy of the cooperative interactions between all the cI dimers bound to the right operator is *e*_{r123}+*e*_{r12}+*e*_{r23}. This expression explicitly reveals *e*_{r123} as the correction to the free energy arising from the effects of a third cI dimer on the intraoperator interactions between pairs of cI dimers and illustrates how it is possible to expand the free energy of a given configuration, *s*=(*s*_{1},…*s*_{i},…*s*_{M}), in powers of the binary variables.

In general, there is the potential for establishing multiple loops. In the traditional approach, considering two additional loops will increase the number of states from 128 to 256. In our case, this extension involves adding just three terms, (*c*_{L2}+*e*_{O}*s*_{r2}*s*_{l3}*s*_{r1}*s*_{l2})*s*_{L2}+(*c*_{L3}+*e*_{O}*s*_{r3}*s*_{l2}*s*_{r2}*s*_{l1})*s*_{L3}+∞(*s*_{L}*s*_{L2}+*s*_{L}*s*_{L3}+*s*_{L2}*s*_{L3}), to the free energy Δ*G*(*s*). Here, *s*_{L2} and *s*_{L3} are the binary variables for the two additional loops and *c*_{L2} and *c*_{L3} are the corresponding free energies of looping. For the WT situation, the probability of having these extra loops is very small and they do not significantly affect the behavior of the system (Dodd *et al*, 2004). However, they might become important in mutants with altered binding.

The effects of the left operator and DNA looping in the induction switch of phage λ are apparent in the transcription rate at the *P*_{rm} promoter, which depends strongly on the occupancy of r2 and r3. There is transcription at a basal level τ_{bas} when neither r2 nor r3 are occupied and at an activated level τ_{act} when r2 is occupied and r3 is free. The activated transcription rate, in turn, depends on whether (τ_{act}=τ_{actl}) or not (τ_{act}=τ_{actnl}) DNA is looped (Hochschild and Ptashne, 1988). These dependences are expressed mathematically through

The activity of *P*_{rm} as a function of the cI repressor concentration shows a sharp maximum for WT phage DNA and a plateau‐like maximum for two mutants in which the r3 site is not occupied at WT concentrations (Figure 4C). One of these mutants has only the right operator and the other has a weak l3 site. The narrower maximum of WT allows for tighter control of the cI_{2} concentration, whose production sharply decreases for high concentrations. This marked decrease is the result of the extra layer of cooperativity of binding to r3 introduced by DNA looping and has important consequences for the kinetics of the system.

## Stochastic dynamics and macromolecular assembly networks

A relationship between the kinetics and the thermodynamic properties of the system can be exploited to infer transition rates. It is known as the principle of detailed balance and results from the fact that at equilibrium the rates of going from state *s* to state *s*′ and its inverse, from *s*′ to *s*, are the same. Mathematically, it implies *P*_{s}*k*_{s → s′}=*P*_{s′}*k*_{s′ → s}, where *P* is the probability of the state denoted by its subscript and *k* is the transition rate of the processes denoted by its subscript. This expression, together with the equilibrium values of the probabilities, *P*_{s}/*P*_{s′}=e^{−(ΔG(s)−ΔG(s′))/RT}, leads to the following relationship between the probability transition rate constants between two states:

The remarkable property of this expression is that reactions with known rates can be used to infer the rates of more complex reactions from the equilibrium properties. For instance, the association rate of many regulatory molecules to different DNA sites is practically independent of the particular DNA sequence. The dissociation rate, in contrast, strongly depends on the sequence. In this case, knowing one association rate can be used to obtain the dissociation rates for different binding sites through the principle of detailed balance. The inferred rates can then be used to study the dynamics of the system.

The dynamics can be simplified further by following a procedure similar to that considered previously for the free energy: it is also possible to perform an expansion for the kinetics of the system, but now in terms of the number of components that can change simultaneously in a transition. We discuss in detail the case in which only one component can change at a given time: either the component gets into or out of the complex. For each component *i*, we can define *on* (*k*_{on}^{i}) and *off* (*k*_{off}^{i}) rates for the ‘association’ and ‘dissociation’ rates, respectively, which in principle will depend on the pre‐ and post‐transition states of the complex.

The explicit dynamics can be obtained by considering the change in binary variables as reactions

with rates

The reaction changes the variable *s*_{i} to 1 when it is 0 and to 0 when it is 1, representing that the element gets into or out of the complex. The mathematical expression of the transition rate reduces to *k*_{on}^{i} when the element is outside the complex (*s*_{i}=0) and to *k*_{off}^{i} when the element is inside the complex (*s*_{i}=1). Typically, the *on* rate does not depend as strongly on the state of the complex as the *off* rate. The *on* rate is essentially the rate of transferring the component from solution to the complex. The *off* rate, in contrast, depends exponentially on the free energy. The principle of detailed balance can be used to obtain the *off* rates from the *on* rates:

These remarkably compact expressions for the transition rates between different states of the complex can be considered together with other reactions that affect or depend on the state of the complex. In this way, it is possible to integrate the stochastic dynamics of macromolecular assembly into networks of chemical reactions and move the effects of macromolecular assembly up to the properties of cellular processes. The stochastic dynamics of the resulting networks of reactions and transitions can then be obtained with well‐established Monte‐Carlo algorithms (Bortz *et al*, 1975; Gillespie, 1976).

## Phage λ cI repressor self‐regulation

We illustrate the integration of macromolecular assembly into the dynamics of cellular processes through the kinetics of phage λ cI repressor's self‐regulation feedback. The traditional approach to simulate the dynamics of this system would have to consider all the 128 states of the cI–DNA complex as chemical species. Therefore, considering just the kinetics of binding, unbinding, and looping would involve 128 rate equations, one for each state. In addition to writing down these equations, the traditional approach would need as inputs the values of the 8128 rates that connect the 128 states with each other. The onerous complexity of the resulting procedure has prevented so far the simulation of the stochastic induction switch with DNA looping. There are kinetic studies only for the induction without DNA looping (Arkin *et al*, 1998). With the approach we have developed, the simulation of the stochastic kinetics of phage λ cI repressor's self‐regulation feedback follows straightforwardly.

Our graphical representation, which accounts in full quantitative detail for the assembly of macromolecular complexes, can be seamlessly integrated with the canonical textbook representation of chemical reactions (Figure 5A). In principle, this type of integration is also possible with qualitative representations like ‘Molecular Interaction Maps’ (Kohn *et al*, 2006) and ‘process diagrams’ (Kitano *et al*, 2005) to provide them with a precise stochastic dynamics.

We have considered the cI–DNA complex together with the different stages of protein production from transcription at the promoter to protein dimerization for active repressors to study the stochastic dynamics of the phage λ repressor self‐regulation. The *on* rates of the different transitions are *k*_{on}^{i}=*a*[*N*] for *i*=r1, r2, r3, l1, l2, and l3 (describing binding to the DNA operators by cI repressor dimers, cI_{2}) and *k*_{on}^{i}=*b* for *i*=L (DNA looping), with *a* and *b* constants. The *off* rates follow from the detailed balance principle, as delineated previously. In addition, we consider the following chemical reactions (see Figure 5A legend for details): cI mRNA production and degradation; cI repressor production and degradation; cI repressor dimerization and cI_{2} dissociation; and cI_{2} nonspecific binding and degradation. The rate of cI mRNA production, *k*_{t}, is a function of the state of the macromolecular complex, as described previously: *k*_{t}=(*k*_{act}*s*_{r2}+*k*_{bas})(1−*s*_{r3}), where *k*_{act} and *k*_{bas} are the activated and basal cI mRNA production rates.

This self‐regulatory network incorporates the macromolecular complexity at the promoter region as a module (Figure 5A). Only a few of the elements of the DNA–protein complex are directly coupled to the cellular dynamics. Specifically, there is an input, the cI dimer concentration ([*N*]), and two main outputs, the occupancy of the r3 and r2 sites, which control the production of cI mRNA (*k*_{t}). Depending on the free energy of DNA looping, this module has different behaviors (see Figure 5B–D). For a high free energy of DNA looping (green curves), so that it is very difficult to form the loop, the steady state cI_{2} concentration (Figure 5B) is relatively high and noisy in the lysogenic state, when the degradation rate of cI is low (Figure 5C). In contrast, for free energies of DNA looping close to WT levels (red curves), cI_{2} concentration in the lysogenic state is tightly regulated and remains narrowly constrained at low values, exhibiting small fluctuations. Quantitatively, the fluctuations of concentration, usually referred to as noise, are characterized by the variance divided by the mean of the number of molecules: η=(〈*N*^{2}〉−〈*N*〉^{2})/〈*N*〉 (Elowitz *et al*, 2002). In the lysogenic state, DNA looping substantially lowers the strength of the intrinsic noise, from values of η=7.9, when there is no DNA looping, to η=2.7. Switching from the lysogenic to the lytic states happens as the degradation rate of cI is sharply increased. After the switch, cI_{2} concentration goes to almost zero in both cases.

The concentration of cI_{2} also affects another important output of the cI_{2}–DNA assembly module: the occupancy of r1 and r2. This output controls the *P*_{r} promoter, which leads to transcription at a given rate when r1 and r2 are free and to no transcription when r1 or r2 are occupied. The activity of a reporter gene controlled by this promoter does not show a marked dependence on the free energy of looping (Figure 5D). Therefore, DNA looping allows for tight control of cI_{2} concentration at low levels without substantially affecting the turning on and off of genes at the *P*_{rm} and *P*_{r} promoters. Remarkably, it has recently been found experimentally that turning on transcription of the *P*_{r} promoter by increasing the degradation of cI_{2} is not affected by the presence of DNA looping (Svenningsen *et al*, 2005). As in the case of the *lac* operon, here DNA looping makes the system function reliably with low numbers of molecules.

## Conclusions: from molecules to networks

The study of cellular processes requires a balance of scope and detail. Whereas single molecular interactions can be modeled in full atomic detail with current technologies (Saiz and Klein, 2002; Karplus and Kuriyan, 2005), approximations in terms of chemical reactions are needed when turning to processes that reach the cellular scale and involve networks of interacting molecular species (Endy and Brent, 2001; Hasty *et al*, 2002). There are, however, many cellular processes, such as macromolecular assembly, that cannot naturally be described in terms of chemical reactions. One of the major challenges of current biology is therefore to incorporate the molecular details into the description of the dynamics of cellular processes.

We have presented here an approach for bridging the gap between molecular properties and the dynamics of networks of interactions. It can be applied in general to compute the stochastic dynamics of macromolecular assembly networks and their integration into cellular networks. This method is based on a binary description of the potential states of the system and a decomposition of the free energy into a combination of a small subset of elementary contributions of the different components. Such decomposition not only brings forward an extra level of regulation but also provides a starting point to characterize and predict the collective properties of macromolecular complexes, such as looped DNA–protein complexes, in terms of the properties of their constituent elements. The thermodynamic grounds of the method allow for the use of the principle of detailed balance to obtain rate constants, which prevents the appearance of the unrealistic situations noted in existing approaches (Bray and Lay, 1997; Lok and Brent, 2005).

The two examples explored here, the induction switches in the *lac* operon and in phage λ, represent perhaps the most elementary gene regulatory networks of the most basic organisms. And yet, to fully understand the transcriptional regulation in both systems one has to consider thermodynamic quantities that extend beyond standard theory of chemical reactions: macromolecular assembly networks have mechanisms built in that can be used to increase specificity and affinity simultaneously and, at the same time, to control the inherent stochasticity of cellular processes. In particular, exploiting the flexibility of DNA (Cloutier and Widom, 2004; Saiz *et al*, 2005), protein–DNA complexes can lead to the suppression of cell‐to‐cell variability, the control of transcriptional noise, and the activation of cooperative interactions on demand (Vilar and Leibler, 2003; Vilar and Saiz, 2005). DNA–protein complexes take full advantage of the conformational properties of DNA by introducing long‐range interactions, thus making DNA an active participant in the delivery of the information it encodes.

The mathematical part of the method we have presented has a direct correspondence with a graphical network description, where nodes represent whether or not an element or a property of the component is present, and where the strength of the links between nodes carries information about the effects of these elements on the stability of the complex. Therefore, our approach accounts for the precise stochastic biochemical dynamics, as shown by the prototype systems considered here, while keeping the simplicity of qualitative methods based on network diagrams. This methodology thus offers a solid starting point to move from qualitative to quantitative understanding of protein–protein (Jansen *et al*, 2003), protein–DNA (Edwards *et al*, 2002), and other interaction networks (Tavazoie *et al*, 1999; Shen‐Orr *et al*, 2002) on a genomic scale.

## References

- Copyright © 2006 EMBO and Nature Publishing Group