The binding of transcription factors (TFs) to specific sites in the genome is a crucial step in the molecular process controlling gene expression. The in vitro sequence specificity of these regulatory proteins can generally be well represented by consensus DNA motifs or slightly more sophisticated sequence profiles called position‐specific scoring matrices. These are widely used to scan genome sequences in order to find novel transcriptional target genes. Unfortunately, usually only a small fraction of the ‘hits’ thus obtained are functional in vivo, where local chromatin structure and TF–TF interactions come into play. Taking into account the context provided by the surrounding noncoding DNA is therefore essential. In a study recently published in Molecular Systems Biology, Nguyen and D'haeseleer (2006) present a promising strategy for determining which context features are most important for a given TF binding motif. Their approach belongs to a growing class of methods that fit simple mathematical models of transcription regulation to DNA microarray data to map gene regulation networks.
Many of the molecular players that govern gene expression are known, but our knowledge about their interactions with the DNA and with each other is very incomplete. Information about the gene regulatory network is only implicitly represented in the large volume of functional genomics data now available to us. The strengths of the ‘arrows’ between TFs and their target genes and the condition‐specific activities of the regulatory ‘nodes’ need to be inferred by computational means. A detailed mathematical model that accurately describes the molecular computations performed by the cell would greatly deepen our understanding of cellular physiology, and provide a framework for analyzing regulatory pathways or predicting the effects of genetic variation between individuals.
While the activity of a TF is often represented by its mRNA expression level (Segal et al, 2003), regulatory control is more often than not exerted at the level of subcellular localization or covalent modification of the protein, or the presence/absence of ligands. These variables really define the regulatory state of the cell, but they are much harder to measure experimentally than mRNA expression levels and therefore usually remain ‘hidden’. Nguyen and D'haeseleer use multivariate linear modeling to computationally infer the hidden post‐translational activity of each TF from the mRNA expression levels of its target genes, ignoring the mRNA expression level of the TF itself. This model‐based approach was previously introduced (Bussemaker et al, 2001) as an alternative to clustering‐based analysis of microarray data (Eisen et al, 1998; Beer and Tavazoie, 2004), and has been extended to include TF deletion data (Wang et al, 2002), position‐specific scoring matrices (Conlon et al, 2003; Foat et al, 2005), and TF–TF interactions (Das et al, 2004). Since each individual microarray experiment is analyzed by itself, TF activities can be inferred in a condition‐specific manner.
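The core of this model-based idea can be illustrated with a minimal sketch, assuming that each gene's expression change in a single condition is a weighted sum of TF activities, with weights given by motif counts in its promoter. The function name, the toy data, and the omission of an intercept term are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np

def infer_tf_activities(motif_matrix, expression):
    """Least-squares estimate of TF activities for ONE condition.

    motif_matrix: (n_genes, n_tfs) motif counts or scores per promoter.
    expression:   (n_genes,) expression change of each gene in this condition.
    Returns the (n_tfs,) activity vector a minimizing
    ||expression - motif_matrix @ a||^2.
    """
    activities, *_ = np.linalg.lstsq(motif_matrix, expression, rcond=None)
    return activities

# Toy data (hypothetical): 200 genes, 3 TFs, known activities plus noise.
rng = np.random.default_rng(0)
n_genes, n_tfs = 200, 3
motif_matrix = rng.integers(0, 3, size=(n_genes, n_tfs)).astype(float)
true_activities = np.array([1.5, -0.8, 0.0])
expression = motif_matrix @ true_activities + 0.1 * rng.normal(size=n_genes)

# The fit uses only target-gene expression; the TF's own mRNA level is never used.
estimated = infer_tf_activities(motif_matrix, expression)
print(estimated)
```

Repeating the fit separately for each microarray experiment yields one activity vector per condition, which is what makes the inferred activities condition-specific.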
The ability to infer condition‐specific TF activities makes it possible to estimate the regulatory coupling strength between a TF and a putative target gene, by comparing the mRNA expression profile of the gene with the inferred TF activity profile across a large number of microarray experiments. This approach has previously been used (Liao et al, 2003; Gao et al, 2004) to refine the gene regulatory network structure derived from genome‐wide TF occupancy data (Harbison et al, 2004). Nguyen and D'haeseleer derive their initial guess of the network connectivity from matches to TF binding motifs in noncoding sequence, and subsequently use a modified version of the method of Liao et al (2003) to self‐consistently infer both a matrix of the activities of every TF in every condition and a matrix of regulatory coupling strengths between every TF and every gene. Their approach provides an alternative to the use of evolutionary conservation to distinguish functional DNA motifs from nonfunctional ones (Kellis et al, 2003). While this is already interesting per se, the unique insight of the authors is that the inferred regulatory couplings can in turn be analyzed to determine which aspects of the promoter context cause the same motif to be functional in one gene and nonfunctional in another. They use this approach to gain insight into the role of promoter geometry and the interplay between two elusive motifs called PAC and rRPE.
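The self-consistent inference of both matrices can be sketched as a simple alternating least-squares loop. This is a simplified stand-in, not the modified method of Liao et al (2003) that the authors use: the function name, the boolean connectivity mask built from hypothetical motif matches, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def alternating_fit(expression, mask, n_iter=50, seed=0):
    """Fit expression ≈ coupling @ activity by alternating least squares.

    expression: (n_genes, n_conditions) expression changes.
    mask:       (n_genes, n_tfs) boolean; True where a motif match allows
                a TF-gene coupling. Couplings elsewhere stay exactly zero.
    """
    rng = np.random.default_rng(seed)
    n_genes, _ = mask.shape
    coupling = mask * rng.normal(size=mask.shape)  # random start on allowed edges
    for _ in range(n_iter):
        # Step 1: hold couplings fixed, solve for all TF activities.
        activity, *_ = np.linalg.lstsq(coupling, expression, rcond=None)
        # Step 2: hold activities fixed, refit each gene's allowed couplings.
        for g in range(n_genes):
            allowed = np.flatnonzero(mask[g])
            if allowed.size == 0:
                continue  # gene has no motif match, so no coupling to fit
            sol, *_ = np.linalg.lstsq(activity[allowed].T, expression[g], rcond=None)
            coupling[g, allowed] = sol
    return coupling, activity

# Toy data (hypothetical): 60 genes, 3 TFs, 20 conditions.
rng = np.random.default_rng(1)
mask = rng.random((60, 3)) < 0.4
true_coupling = mask * rng.normal(size=mask.shape)
true_activity = rng.normal(size=(3, 20))
expression = true_coupling @ true_activity + 0.05 * rng.normal(size=(60, 20))

coupling, activity = alternating_fit(expression, mask)
residual = np.linalg.norm(expression - coupling @ activity)
print(residual / np.linalg.norm(expression))
```

A decomposition of this kind is only determined up to a rescaling of each TF (a coupling column can be multiplied by a constant if the matching activity row is divided by it), so in practice some normalization convention is needed before coupling strengths are compared across TFs.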
An appealing analogy exists between the linear model for transcription regulation used by Nguyen and D'haeseleer and the well‐known linear equation called Ohm's Law, I=GV, which states that the electrical current (I) through a resistor is proportional to the voltage (V) across it. In the cell, TF activities play the role of the voltage and transcription rates that of the current, while the regulatory coupling between a TF and a target gene corresponds to the conductance (G) of the resistor (see Figure 1). Changes in the mRNA expression level of all genes (often called the ‘transcriptome’) are interpreted as a response to changes in the regulatory activity of all TFs (which we might call the ‘transfactome’), and this relationship is modeled by a linear equation one might refer to as ‘Omes Law’. Nguyen and D'haeseleer show that Omes Law allows them to predict held‐out condition‐specific expression levels, which were excluded from the data set used to fit their model parameters, more accurately than the method of Beer and Tavazoie (2004).
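Written out, the analogy might look as follows. The index notation is chosen here for illustration and is not taken from the paper: E_{gc} stands for the change in mRNA expression of gene g in condition c, A_{fc} for the activity of TF f in condition c, and G_{gf} for the regulatory coupling between TF f and gene g.

```latex
\underbrace{I = G\,V}_{\text{Ohm's Law}}
\qquad\longleftrightarrow\qquad
\underbrace{E_{gc} = \sum_{f} G_{gf}\,A_{fc}}_{\text{`Omes Law'}}
```

In matrix form this reads E = GA, with the transcriptome E in the role of the current, the ‘transfactome’ A in the role of the voltage, and the coupling matrix G in the role of G in Ohm's Law.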
Electrical engineers will be surprised to learn that, in biology, the observed conductance of a resistor strongly depends on where it gets inserted into the electronic circuit. With the work of Nguyen and D'haeseleer, we now have a computational strategy to systematically analyze how genomic context influences the in vivo responsiveness of TF binding sites.
Copyright © 2006 EMBO and Nature Publishing Group