Although the instances where substantive theory provides more than a metaphor for biology are rare, one fruitful approach has been computing physical limits on biological processes. In the best of cases, such as Berg and Purcell (1977), a simplified model of bacterial chemotaxis illuminated the problems bacteria have solved to operate at the limits imposed by molecular noise. In a recent article published in the *Proceedings of the National Academy of Sciences*, Tkačik *et al* (2008) borrow an approach from computational neuroscience (Laughlin, 1981) to show that the performance of the first step in the gene network that defines anterior–posterior (AP) position in the fly approaches the limits set by information theory. The result is appealing because it hints that aspects of the tangled networks that govern animal development may be quantitatively understood without having to consider the details of all the underlying interactions.

Position in an embryo is defined generally by proteins called morphogens, and arguably the best understood of these is the transcription factor Bicoid (Bcd) that regulates the expression of gap genes such as *hunchback* (*hb*). Bicoid is expressed from a maternal anterior‐localized message and assumes an exponential profile. The question is, therefore, does this one protein define anterior position, and if so how accurately, or are other sources of information used (e.g., distance from the posterior end of the embryo)? The physical mechanism through which a single cell, an appendage, or an embryo measures its size is still obscure despite many genetic screens (Jorgensen and Tyers, 2004; Gregor *et al*, 2005; Hufnagel *et al*, 2007).

Information theory is the creation of Claude Shannon, and its enduring value stems from its abstract formulation that frees it from any particular embodiment. An unbiased coin toss supplies one bit of information: *N* tosses, *N* bits. To define information requires knowing the probability distribution of events; the result of a biased coin toss is less informative than a fair one.
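The coin-toss statement can be checked numerically. The following is a minimal sketch, not taken from the article: the function name `coin_entropy_bits` and the bias value 0.9 are illustrative choices. It computes the Shannon entropy of a single toss and shows that a biased coin carries less than one bit.

```python
import math

def coin_entropy_bits(p_heads: float) -> float:
    """Shannon entropy (in bits) of one coin toss with P(heads) = p_heads."""
    entropy = 0.0
    for p in (p_heads, 1.0 - p_heads):
        if p > 0.0:  # p * log2(p) tends to 0 as p -> 0, so skip zero terms
            entropy -= p * math.log2(p)
    return entropy

fair = coin_entropy_bits(0.5)      # unbiased coin
biased = coin_entropy_bits(0.9)    # illustrative bias, chosen arbitrarily
n_tosses = 10                      # independent tosses add their entropies

print(f"fair coin: {fair:.3f} bits per toss")
print(f"biased coin (p = 0.9): {biased:.3f} bits per toss")
print(f"{n_tosses} independent fair tosses: {n_tosses * fair:.1f} bits")
```

The entropy depends only on the probabilities, not on what the two outcomes physically are, which is the sense in which the formulation is free of any particular embodiment.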

Tkačik *et al* (2008) view morphogen patterning as a problem in information transfer: how to tailor the inputs so as to transmit maximum information down a ‘channel’ with defined error characteristics. In this case, the ‘channel’ represents the transcriptional and translational machinery that reads the information provided by the Bcd concentration gradient (the input) and transforms it into a profile of Hunchback (Hb) protein level (the output). The AP spatial profiles of the morphogen and its targets in the embryo are each represented by a probability distribution, *P*. The scatter in protein levels among nuclei at the same AP position defines a standard deviation σ. Inferring a position is equivalent to inferring the value of the morphogen. The noise in the Hb response therefore limits the precision with which the value of the Bcd gradient can be inferred, and so reduces the amount of transmissible information. The abstraction of information theory then allows a direct calculation of the distribution of inputs (the morphogen) that maximizes information transmission down a ‘noisy channel’. Importantly, the genetics and biophysics of protein synthesis that define the ‘channel noise’ are irrelevant for the calculations; the noise is taken from experimental data. The optimality problem mentioned above can be rephrased as an alternative question: how would the output look if the channel were close to optimal? In the limit of small noise, a simple but still illustrative case, the optimal solution can be formulated as a relation between the probability distribution of the output, *P*(O), and its standard deviation, σ(O): *P*(O)∼1/σ(O). To be optimal, the system has to be tuned such that the probability distribution of the output is largest in the regions of small variance (*P*(O) is large when σ(O) is small), which translates into the intuitive idea that one should preferentially use input values that are least corrupted by channel noise. The result is nevertheless surprising: a quantitative constraint has been placed on experimental data using only general qualitative features of the problem.
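To make the relation *P*(O)∼1/σ(O) concrete, here is a minimal numerical sketch. The noise profile `noise_sd` is purely hypothetical (it is not the measured Hb noise), and the output is assumed to be normalized to the interval [0, 1]. The capacity formula log2(Z/√(2πe)), with Z the integral of 1/σ(O), is what the same small-noise, Gaussian-noise approximation gives when the optimal distribution is substituted back; it should be read as an assumption of this sketch rather than as the authors' exact calculation.

```python
import math

def noise_sd(g: float) -> float:
    """Hypothetical output noise sigma(O) at normalized output level g in [0, 1].

    Chosen for illustration only: small near g = 0 and g = 1, largest midway.
    """
    return 0.02 + 0.3 * g * (1.0 - g)

def optimal_output_distribution(n_bins: int = 1000):
    """Return bin centers, the density P*(g) = 1 / (Z * sigma(g)), and Z."""
    dg = 1.0 / n_bins
    centers = [(i + 0.5) * dg for i in range(n_bins)]
    inv_sigma = [1.0 / noise_sd(g) for g in centers]
    z = sum(inv_sigma) * dg  # midpoint-rule estimate of the integral of 1/sigma
    density = [v / z for v in inv_sigma]
    return centers, density, z

centers, density, z = optimal_output_distribution()

# Small-noise estimate of the information carried by the optimal distribution.
capacity_bits = math.log2(z / math.sqrt(2.0 * math.pi * math.e))

print(f"Z = {z:.2f}, small-noise capacity = {capacity_bits:.2f} bits")
print(f"P* near g = 0: {density[0]:.2f}; P* near g = 0.5: {density[len(density) // 2]:.2f}")
```

With a noise profile of this shape the optimal density piles up at the two low-noise extremes of the output range, which is the qualitative behavior the relation predicts.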

Information theory is notoriously data intensive, as probability distributions are required and rare events matter. The authors took advantage of a unique data set where the nuclear Bcd was measured in 1300 nuclei per embryo, along with the protein levels of its target Hunchback (Hb) (Gregor *et al*, 2007). At the time point chosen, Hb is present only in the anterior half of the embryo. The probability distribution of Hb (defined by sampling the embryo) thus has a peak at 1 (the anterior region), a second peak at 0 (the posterior) and a little weight in between representing the transition region. The experiments also furnished a standard deviation, so comparison was possible with the optimal solution, and the two were very close, consistent with the idea that mutual information between Bcd and Hb concentration profiles is maximized.
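The two-peaked shape of the Hb distribution follows directly from sampling a step-like spatial profile. The sketch below is a toy illustration under stated assumptions: the sigmoidal profile `hb_level`, its boundary at mid-embryo, and its width are hypothetical, and the nuclei are simply spaced evenly along a one-dimensional AP axis. No measured data are used.

```python
import math

def hb_level(x: float, boundary: float = 0.5, width: float = 0.03) -> float:
    """Hypothetical normalized Hb level at fractional AP position x (0 = anterior)."""
    return 1.0 / (1.0 + math.exp((x - boundary) / width))

def output_histogram(n_nuclei: int = 1300, n_bins: int = 10):
    """Fraction of evenly spaced nuclei whose Hb level falls in each bin of [0, 1]."""
    counts = [0] * n_bins
    for i in range(n_nuclei):
        x = (i + 0.5) / n_nuclei                      # position of nucleus i
        level = hb_level(x)
        b = min(int(level * n_bins), n_bins - 1)      # clamp level == 1 into last bin
        counts[b] += 1
    return [c / n_nuclei for c in counts]

hist = output_histogram()
for b, frac in enumerate(hist):
    print(f"Hb in [{b / 10:.1f}, {(b + 1) / 10:.1f}): {frac:.3f}")
```

Most nuclei land in the lowest or highest bin, and only the narrow transition region populates the intermediate levels.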

The expression of *hb* is very dynamic, whereas the protein lingers. Anterior expression occurs first and is replaced by a stripe overlapping the posterior edge of the anterior domain. Expression of *hb* depends strongly on feedback from Hb protein (derived in part from maternally supplied message). With no feedback, Hb is limited to the anterior 20% of the embryo, similar to a synthetic reporter driven by three Bcd‐binding sites (Simpson‐Brose *et al*, 1994; Crauk and Dostatni, 2005). So mechanistically, the Hb pattern is not a passive readout of the contemporaneous Bcd profile, as the authors also note.

A skeptic would ask whether the more variable distribution of *hb* message is also optimal. The authors may counter that selection should optimize the transfer of information from protein to protein (it is the profile of Hb protein that matters for fitness) and that intermediate steps do not matter. Would their formulation work when the anterior pattern resolves into a stripe a short time later (where the distribution of values has distinctly less information than the spatial profile)? Their novel but debatable step replaces a space‐and‐time history by the distribution of one variable. The embryo is free to integrate information from earlier times and multiple locations; this is less feasible in the theory because the data are limited. There are other genes that are activated by Bcd (and localized by repression from other gap genes). Should their products obey the relation *P*(O)∼1/σ(O)?

The alternative approach to understanding developmental regulation is to model all the transcriptional interactions in the AP system, for which we have the extensive data compiled by the Reinitz lab (http://flyex.ams.sunysb.edu/FlyEx/). How literal a model is most informative for this type of data is still unclear. The robustness of development against all manner of insults has been a source of marvel for as long as embryology has been a science. Within the *bcd–hb* system, threefold overexpression of *bcd* enlarges the *hb* expression domain by 50%, yet viable adults emerge (Lawrence, 1992). Experiments such as these do not require sophisticated biophysics to perform; can information theory aid our understanding of these larger problems?

## Conflict of Interest

The author declares that he has no conflict of interest.

## References

This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.

- Copyright © 2008 EMBO and Nature Publishing Group