MicroRNAs (miRNAs) are small, non‐coding RNAs that play critical roles in post‐transcriptional gene regulation. In plants, mature miRNAs pair with complementary sites on mRNAs and subsequently lead to cleavage and degradation of the mRNAs. Many miRNAs target mRNAs that encode transcription factors; therefore, they regulate the expression of many downstream genes. In this study, we carry out a survey of Arabidopsis microRNA genes in response to UV‐B radiation, an important adverse abiotic stress. We develop a novel computational approach to identify microRNA genes induced by UV‐B radiation and characterize their functions in regulating gene expression. We report that in A. thaliana, 21 microRNA genes in 11 microRNA families are upregulated under UV‐B stress condition. We also discuss putative transcriptional downregulation pathways triggered by the induction of these microRNA genes. Moreover, our approach can be directly applied to miRNAs responding to other abiotic and biotic stresses and extended to miRNAs in other plants and metazoans.
MicroRNAs (miRNAs) are approximately 22‐nucleotide‐long, non‐coding RNAs that play critical roles in regulating gene expression at the post‐transcriptional level (Bartel, 2004; He and Hannon, 2004). The discovery of miRNAs has broadened our perspectives on the mechanisms of repression of gene expression, which is an important regulatory mechanism mediating many biological processes such as development, cell proliferation and differentiation. In plants, mature miRNAs base‐pair with complementary sites on target mRNAs and subsequently direct the mRNAs to be cleaved or degraded. Plant miRNAs regulate many genes that are involved in developmental control, for example, auxin signaling (Bonnet et al, 2004; Jones‐Rhoades and Bartel, 2004), organ polarity (Eshed et al, 2001; McConnell et al, 2001; Kidner and Martienssen, 2004), development transitions (Aukerman and Sakai, 2003; Chen, 2004), leaf growth (Palatnik et al, 2003) and RNA metabolism (Xie et al, 2003; Vaucheret et al, 2004). Several recent studies showed important functions of miRNAs in response to adverse abiotic stresses (Jones‐Rhoades and Bartel, 2004; Sunkar and Zhu, 2004; Lu et al, 2005; Bari et al, 2006). In Arabidopsis, miR399 was identified to be highly expressed under phosphate starvation (Fujii et al, 2005; Bari et al, 2006; Chiou et al, 2006) and miR395 was identified to be induced under sulfate starvation (Jones‐Rhoades and Bartel, 2004). Furthermore, quantitative experimental analysis proved that miR393 was strongly induced under cold stress (Sunkar and Zhu, 2004). In Populus, moreover, some miRNAs can be induced by mechanical stress and may function in critical defense systems for structural and mechanical fitness (Lu et al, 2005).
Many target genes of miRNAs encode transcription factors (Bartel, 2004), each of which further regulates a set of downstream genes. Thus, activation of miRNA genes under abiotic stresses will lead to the repression of many downstream protein‐coding genes and affect physiological responses. Among various environmental factors, light plays a particularly important role. Sunlight is not only the energy source for plant photosynthesis, but also regulates several plant developmental and physiological processes, such as photosynthesis and seasonal and diurnal time sensing (Chory et al, 1996; Chattopadhyay et al, 1998; Baroli et al, 2004; Jiao et al, 2005). Like adverse environmental factors such as drought and salinity, light can also have a stress effect on plants. It interacts with endogenous developmental programs and hence affects plant growth and development. To acclimate to such conditions, plants have evolved specific photoreceptor systems that monitor changes in light composition (Dunaeva and Adamska, 2001; Harvaux and Kloppstech, 2001; Shao et al, 2006). With complex photoreceptors, plants can register UV‐B radiation and transduce the information to the nucleus, thereby affecting gene expression (Chattopadhyay et al, 1998; Kimura et al, 2003b; Jiao et al, 2005). Changes in gene expression in response to UV‐B radiation include reduced expression and synthesis of key photosynthetic proteins as well as perturbed expression of genes involved in defense mechanisms (Chattopadhyay et al, 1998; Kimura et al, 2003b; Jiao et al, 2005).
Regulation of gene expression plays an important role in a variety of biological processes such as development and responses to environmental stimuli. In plants, transcriptional regulation is mediated by a large number of transcription factors (TFs) controlling the expression of tens or hundreds of target genes in various, sometimes intertwined, signal transduction cascades (Venter and Botha, 2004; Wellmer and Riechmann, 2005). Transcription factor binding sites (TFBSs) are short functional DNA sequences (cis‐elements) that determine the timing and location of transcriptional activities. Many computational methods have been developed to reveal relationships between gene expression patterns and TFBSs in the proximal upstream regulatory regions of the genes of interest. In yeast, motifs with known functions have been related to transcriptional pathways by statistical analysis of their occurrence in the promoters of coregulated genes (Bussemaker et al, 2001). However, the presence of an individual motif is only marginally indicative of a gene's expression pattern. Extended strategies therefore aim to predict gene expression patterns from promoter cis‐elements and their combinations. With such a systematic strategy, the expression of a large proportion of genes in Saccharomyces cerevisiae was accurately predicted from promoter sequences (Beer and Tavazoie, 2004). Although distal regulatory elements outside the proximal upstream promoter regions can modulate gene expression, a recent study emphasized that the sequences in the 5′‐upstream regions of genes are of primary importance in Arabidopsis gene regulation (Lee et al, 2006). Specifically, promoter sequences were sufficient to recapture mRNA expression levels for 80% of the TFs studied, confirming the important role of promoter regions in Arabidopsis gene expression.
In light of this transcriptome‐based perspective and by taking advantage of the vast available data of genome‐scale microarray expression profile of protein‐coding genes, we develop an innovative computational approach to explore the expression activity of miRNA genes under certain conditions. We focus on identifying and annotating miRNAs in A. thaliana, which are responsive to UV‐B radiation, and further consider the regulatory pathways that are probably affected by the putative UV‐B‐inducible miRNA genes. Our approach is based on the following two observations. First, plant miRNAs generally direct endonucleolytic cleavage of target mRNAs (Llave et al, 2002; Schwab et al, 2005), hence enable rapid clearance of target mRNAs when they are expressed (Bartel, 2004; Axtell and Bartel, 2005). Under a particular condition, if an miRNA is upregulated, its targets are most likely to be coherently downregulated. Second, miRNA genes are transcribed by RNA polymerase II (Lee et al, 2004; Houbaviy et al, 2005; Xie et al, 2005; Zhou et al, 2007). Hence, the 5′ proximal promoters of miRNA genes are the most important regulatory regions, and significant cis‐elements in these regions are important in determining the spatial and temporal expression patterns of the miRNA genes. Therefore, miRNA and protein‐coding genes carrying the same or similar cis‐elements in their promoters are very likely to be coregulated under the same condition and consequently very likely to be coexpressed.
Although we focus on Arabidopsis UV‐B responding miRNA genes in this study, our approach can be directly applied to plant miRNA genes functioning under other abiotic or biotic stress conditions.
Results and discussions
UV‐B responsive miRNAs
One of the bases of our method for finding stress‐responsive miRNA genes is that protein‐coding genes targeted by the same miRNA are likely to have coherently downregulated expression patterns. We consider an miRNA to be putatively stress inducible if the expressions of its target genes are coherently repressed and the coherence is statistically significant. In this study, we only considered bona fide target genes reported in the literature. For each miRNA, pairwise cosine similarities of the expressions of its target genes were computed, and the coherence of the expressions of its target genes was measured by the average pairwise similarity. Statistical significance of the coherence was assessed with a P‐value from a Monte Carlo simulation. Briefly, for each miRNA with n target genes, we first calculated the average pairwise cosine similarity of the expressions of the target genes. We then randomly sampled n genes from the whole set of genes that were profiled and calculated their average pairwise cosine similarity. We repeated the sampling a large number of times (a million times in our study) and took as an empirical P‐value the frequency of observing a similarity value larger than that of the target genes. For each miRNA, we repeated this simulation 100 times and calculated the average P‐value and standard deviation.
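The coherence test described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names (`cosine`, `coherence`, `coherence_p_value`), the dictionary layout of the expression data and the default number of samples are all hypothetical choices made for the example.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coherence(profiles):
    """Average pairwise cosine similarity of a list of expression vectors."""
    pairs = list(combinations(profiles, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def coherence_p_value(target_ids, expression, n_samples=10000, rng=None):
    """Empirical P-value: fraction of random gene sets of the same size
    whose coherence is larger than that of the miRNA's target genes.

    expression: {gene id: [expression value per time point]} (assumed layout).
    """
    rng = rng or random.Random()
    observed = coherence([expression[g] for g in target_ids])
    all_ids = list(expression)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(all_ids, len(target_ids))
        if coherence([expression[g] for g in sample]) > observed:
            hits += 1
    return hits / n_samples
```

Repeating `coherence_p_value` with different random seeds and averaging the results would correspond to the repeated simulations from which the mean P‐value and standard deviation are reported.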
Table I shows putative UV‐B responsive miRNAs. For each of these miRNAs, its target genes are coherently downregulated, and the coherence of their expression patterns is statistically significant. Except for miR167, whose P‐value is less than 0.07, all candidates have P‐values less than 0.05.
For miR158, miR162, miR163, miR168, miR395, miR402, miR403, miR404, miR405 and miR406, we found only one bona fide target gene in the microarray data set and therefore could not test coherence; these miRNAs were excluded from our study.
UV‐B responsive miRNA genes
We applied our computational approach, described in the section 'Outline of the computational approach', to the microarray gene expression data under UV‐B radiation treatment from the Arabidopsis AtGenExpress project (www.arabidopsis.org/info/expression/ATGenExpress.jsp). We predicted 21 miRNA genes in 11 miRNA families to be upregulated under UV‐B radiation. Table II lists these UV‐B‐inducible miRNA genes. A putative UV‐B responsive miRNA gene must satisfy two criteria. First, the set of protein‐coding genes with the same array of motifs in their proximal promoter regions is enriched with UV‐B upregulated genes. Second, its inferred expression (discussed below) is anticorrelated with the expressions of its target genes.
For each miRNA gene, we analyzed whether the combination of significant motifs in its promoter was statistically relevant to UV‐B stress. First, we examined all protein‐coding genes in the whole set of genes profiled in the microarray experiments, and found those genes that contain the same or very similar motifs in their proximal promoter regions. We then tested whether these protein‐coding genes were enriched with upregulated genes (see the sections 'Outline of the computational approach' and 'Hypergeometric distribution of motifs or motif combinations').
We further imposed on miRNA genes a criterion of anticorrelation between the inferred expression of an miRNA gene and the expressions of its mRNA targets, in order to filter out possible false predictions. As we did not rely on any direct information of miRNA expression, we used the inferred expression of an miRNA gene and the expressions of its targets to compute their anticorrelation (see section inference of expression patterns of miRNA genes). In our study, we chose the five best protein‐coding genes whose 5′ proximal promoters contain arrays of cis‐elements that most resemble those of the corresponding miRNA genes. These five genes are most likely to be coregulated with the corresponding miRNA gene, and thus their expression patterns are most likely to be similar to the expression pattern of the miRNA gene. In the rest of our discussion, we refer to the average expression pattern of the top five coregulated protein‐coding genes of an miRNA as its inferred expression pattern or expression pattern for short.
Before we inferred expression patterns of miRNA genes, we applied the inferring procedure to 100 randomly selected protein‐coding genes with known expression patterns, and then assessed similarities between their inferred and actual expression patterns. For these 100 genes, the cosine similarity values of their inferred and actual expression patterns are between 0.3 and 0.89, with an average of 0.51. Figure 1 shows the inferred and actual expression patterns of the protein‐coding gene At1g19770, giving a pictorial view of their similarity. The cosine similarity value of these two expression patterns of At1g19770 is 0.76.
For each putative UV‐B responsive miRNA gene, we calculated the average cosine similarity between its inferred expression and the expressions of its targets. We assessed the statistical significance of the similarity with a P‐value. As in the analysis of expression coherence of target genes in the section 'UV‐B responsive miRNAs', the P‐value was obtained by a Monte Carlo simulation. We took as an empirical P‐value the frequency of observing a cosine similarity value smaller than that in the real data. For each miRNA, the simulation was again repeated 100 times to obtain an average P‐value and a standard deviation.
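The anticorrelation test can be sketched along the same lines. The text does not spell out what is resampled in this simulation, so the sketch below assumes, as a labeled guess, that random gene sets of the same size as the target set are compared with the inferred miRNA profile. All function names and the data layout are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_similarity(mirna_profile, gene_ids, expression):
    """Average cosine similarity between the inferred miRNA profile and a gene set."""
    return sum(cosine(mirna_profile, expression[g]) for g in gene_ids) / len(gene_ids)


def anticorrelation_p_value(mirna_profile, target_ids, expression,
                            n_samples=10000, rng=None):
    """Return (observed similarity, empirical lower-tail P-value).

    Assumption: the null distribution is built from random gene sets of the
    same size as the target set, drawn from all profiled genes.
    """
    rng = rng or random.Random()
    observed = mean_similarity(mirna_profile, target_ids, expression)
    all_ids = list(expression)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(all_ids, len(target_ids))
        if mean_similarity(mirna_profile, sample, expression) < observed:
            hits += 1
    return observed, hits / n_samples
```

A candidate would pass this filter when the observed similarity is negative and the lower‐tail P‐value is small.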
Forty miRNA genes satisfy the first criterion. However, as shown in Table II, the inferred expressions of only 21 miRNA genes are anticorrelated with the expressions of their target genes (cosine similarity less than 0), and these anticorrelations, measured by the average cosine similarities of inferred expressions and expressions of target genes, are statistically significant. These 21 genes are our predicted UV‐B responsive miRNA genes.
In all putative UV‐B responsive miRNA families shown in Table I, at least one member gene from each family was predicted to be upregulated under UV‐B radiation. However, none of the members in other miRNA families was predicted to be UV‐B responsive. Three miRNA genes, miR168a, miR395c and miR395e, might also be UV‐B responsive. The arrays of motifs in their proximal promoter regions are significantly relevant to UV‐B stress, as shown by small P‐values obtained from a cumulative hypergeometric test. Protein‐coding genes sharing the same array of motifs with them have enriched GO (gene ontology) terms that are related to stress response (see the section 'Functions of protein‐coding genes that share the same arrays of motifs as putative UV‐B responsive miRNA genes'). However, as these miRNAs have fewer than two experimentally validated target genes, the coherence of their target gene expressions and the anticorrelations between their expressions and expressions of their targets could not be analyzed. Hence, these genes will not be discussed further.
Functions of protein‐coding genes that share the same arrays of motifs as putative UV‐B responsive miRNA genes
For each putative UV‐B responsive miRNA gene shown in Table II, there are some protein‐coding genes containing in their proximal promoter regions the same array of motifs. These protein‐coding genes are very likely to share the same regulatory program, hence coexpress with the miRNA gene. In order to further interpret their relevance to UV‐B stress, and hence to confirm the relevance of the miRNA gene to UV‐B stress, we calculated the enrichment of GO functional terms in the annotations of these protein‐coding genes. As shown in Table III, for 13 out of 21 putative UV‐B responsive miRNA genes, we identified significantly enriched stress‐related GO terms, by using the webserver for gene annotation analysis, FuncAssociate (llama.med.harvard.edu/cgi/func/funcassociate). In the table, the P‐values represent statistical significance of the GO terms.
These enriched GO terms can be grouped into three major categories. The first category is related to transcription regulation. Protein‐coding genes sharing the same regulatory regions as four miRNA genes, miR156b, miR165a, miR169j and miR172c, fall into this category. The second category is related to direct response to stress or external stimuli. Protein‐coding genes corresponding to miR156h and miR166f are in this category. The last category includes hydrolase activity and oxidoreductase activity. Protein‐coding genes that are likely regulated by the same regulatory programs as the remaining seven miRNA genes are in this category. Hydrolases and oxidoreductases are well known to be involved in responses to many stresses, including light stress (Kimura et al, 2001, 2003a; Apel and Hirt, 2004). The analysis of GO term enrichment provides additional evidence that these 13 miRNA genes are very likely to be involved in the responses to UV‐B stress. Three miRNA genes, miR168a, miR395c and miR395e, were excluded from our prediction owing to the lack of reported target genes. However, protein‐coding genes containing the same arrays of motifs as these three miRNA genes, especially miR168a and miR395c, are enriched with stress‐related GO terms.
Presence of known light‐relevant cis‐elements in the promoters of miRNA genes
Using the WordSpy genome‐wide motif finding algorithm (Wang et al, 2005; Wang and Zhang, 2006), we identified many significant cis‐elements that are characteristic of UV‐B responsiveness of the 21 miRNA genes listed in Table II. Some of them are well characterized in plant motif databases such as PLACE (Higo et al, 1999) and discussed in the literature (Terzaghi and Cashmore, 1995; Narusaka et al, 2004; Zhang et al, 2004; Yamaguchi‐Shinozaki and Shinozaki, 2005). The cis‐elements, which have been experimentally characterized in light‐regulated genes, include the G‐box (CACGTG), the GT‐1 site (GGTTAA), I‐boxes (GATAAGA), TGA‐box (TGACGT), GATA‐box (GATATTT), H‐box (CCTACC) and CCAAT (CCAAT) (Terzaghi and Cashmore, 1995). As shown in Table IV, these motifs appear in the promoters of some of miRNA genes that are upregulated by UV‐B stimuli.
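As an illustration of how a promoter can be checked for the light‐related cis‐elements listed above, the following sketch scans a sequence for exact matches to the consensus strings given in the text. It is not WordSpy and performs no statistical assessment. The function names are hypothetical, and scanning the reverse complement as well as the given strand is an assumption made for the example rather than a detail taken from the study.

```python
# Consensus sequences as quoted in the text (Terzaghi and Cashmore, 1995).
LIGHT_MOTIFS = {
    "G-box": "CACGTG",
    "GT-1 site": "GGTTAA",
    "I-box": "GATAAGA",
    "TGA-box": "TGACGT",
    "GATA-box": "GATATTT",
    "H-box": "CCTACC",
    "CCAAT-box": "CCAAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_motifs(promoter, motifs=LIGHT_MOTIFS, both_strands=True):
    """Return {motif name: sorted 0-based start positions} for motifs present.

    both_strands=True also reports matches to the reverse complement
    (an assumption of this sketch).
    """
    promoter = promoter.upper()
    hits = {}
    for name, site in motifs.items():
        patterns = {site}
        if both_strands:
            patterns.add(reverse_complement(site))
        positions = set()
        for pattern in patterns:
            start = promoter.find(pattern)
            while start != -1:
                positions.add(start)
                start = promoter.find(pattern, start + 1)
        if positions:
            hits[name] = sorted(positions)
    return hits
```

The set of keys returned for a promoter plays the role of its array of motifs in the comparisons between miRNA genes and protein‐coding genes.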
The presence of the well‐studied light‐related motifs sheds light on the possible mechanisms activating the miRNA genes. For these 21 miRNAs, the most prevalent cis‐elements are the GT‐1 site, the I‐box core and the CCAAT‐box. These miRNA genes all have a GT‐1 site in their promoters, all except one contain an I‐box core and 17 of them contain a CCAAT‐box. The involvement of the GT‐1 site, I‐box and CCAAT‐box in abiotic stress regulation has been well studied (Teakle and Kay, 1995; Arguello‐Astorga and Herrera‐Estrella, 1998; Shinozaki et al, 2003; Zhang et al, 2004; Yamaguchi‐Shinozaki and Shinozaki, 2005); therefore, it is not surprising to find them in almost all of these UV‐B‐induced miRNA genes. Among the 21 miRNA genes, seven contain the GATA‐box in their promoters, which has been shown to regulate light‐responsive genes (Teakle and Kay, 1995; Arguello‐Astorga and Herrera‐Estrella, 1998).
These motifs were previously analyzed in protein‐coding genes. Their presence suggests that these miRNA genes are regulated similarly to light‐responsive protein‐coding genes. As a specific example, we list in Table V the known motifs that miR167d shares with its possibly coregulated protein‐coding genes. These shared motifs provide additional evidence that these miRNA genes and the corresponding protein‐coding genes are regulated by similar mechanisms under UV‐B stimuli.
Expression–repression pathways that UV‐B responsive miRNAs may be involved in
Light can trigger the transcription of a set of miRNA genes, which direct their target protein‐coding mRNAs to be degraded quickly. Remarkably, eight of the 11 putative light‐inducible miRNAs (except miR393, miR398 and miR401) have targets that encode transcription factors. These targeted transcription factors can subsequently affect the expressions of their downstream genes. Hence, as in developmental stages, under light stress conditions there are downregulation pathways that are initiated by miRNAs, which cascade to the targets of these miRNAs, the targets of the targets of miRNAs, and so on. Table VI shows the targets of the putative UV‐B responsive miRNAs.
A striking observation is that auxin signaling pathways can be affected by several light‐induced miRNA genes. Auxin (principally indole‐3‐acetic acid) is an important hormone in plants. It affects many aspects of plant growth and development by influencing auxin response factors (ARF), a plant‐specific family of DNA binding proteins. ARFs regulate the expression of auxin‐inducible genes such as GH3 and auxin/indole‐3‐acetic acid (Aux/IAA) by binding auxin response elements (AREs). Figure 2 illustrates possible auxin signaling pathways that four miRNAs, miR160, miR165/166, miR167 and miR393 may be involved in. As shown, these miRNAs will affect auxin signaling pathways by regulating different transcription factors under UV‐B stimuli.
Materials and methods
We studied all miRNA genes curated in the miRNA Registry (microrna.sanger.ac.uk/sequences/) as of January 1, 2006, except miR408, miR413, miR414, miR419 and miR420, which were not reported to be bona fide miRNAs (Xie et al, 2005). Three pairs of polycistronic miRNA genes, miR169i and miR169j, miR169k and miR169l, and miR169m and miR169n, are referred to as miR169j, miR169l and miR169n, respectively, in this paper. Unlike many animal miRNA genes, all these A. thaliana miRNAs have been annotated as intergenic genes, except miR402 (Sunkar and Zhu, 2004). We studied a total of 109 miRNA genes.
The upstream sequences of protein‐coding genes were downloaded from TAIR (ftp.arabidopsis.org/home/tair/seq_analysis_updates/). Known plant motifs were obtained from the motif database PLACE (www.dna.affrc.go.jp/PLACE/).
Microarray data for UV‐B radiation from the international joint effort of the Arabidopsis Gene Expression Project (AtGenExpress) were used in our study. The data were retrieved directly from the TAIR site (www.arabidopsis.org/info/expression/ATGenExpress.jsp). In these expression profiling experiments, 18‐day‐old Arabidopsis seedlings of the Columbia‐0 ecotype were harvested at the following time points: 0.25, 0.5, 1, 3, 6, 12 and 24 h after UV‐B treatment. RNAs from root and shoot were analyzed separately using the Affymetrix (www.affymetrix.com) ATH1 gene chip, which contains more than 22K genes. Control samples were collected at the respective time points from plants grown under normal conditions. To identify differentially expressed genes, we computed, for each gene at each time point profiled, the ratio of the expression level under UV‐B treatment to the expression level under the control condition. A gene is considered upregulated if its expression ratio at any time point is at least 5, that is, we selected genes upregulated at least five‐fold. This gave a total of 1280 genes upregulated in the root or shoot.
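The five‐fold selection rule can be written down compactly. The sketch below assumes expression values are stored per gene as lists aligned by time point; the function name, the data layout and the decision to skip time points with a non‐positive control value are illustrative assumptions, not details from the study.

```python
def upregulated_genes(treated, control, min_fold=5.0):
    """Select genes whose treated/control ratio reaches min_fold at any time point.

    treated, control: {gene id: [expression value per time point]}, with the
    same time points in the same order (assumed layout).
    """
    selected = set()
    for gene, treated_values in treated.items():
        for t, c in zip(treated_values, control[gene]):
            # Time points with a non-positive control value are skipped
            # (an assumption of this sketch, to avoid division by zero).
            if c > 0 and t / c >= min_fold:
                selected.add(gene)
                break
    return selected
```

Because root and shoot were profiled separately, the final gene list would be the union of the sets obtained for the two tissues, for example `upregulated_genes(root_uvb, root_ctrl) | upregulated_genes(shoot_uvb, shoot_ctrl)`, where the four dictionaries are hypothetical inputs.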
Outline of the computational approach
Our approach is designed to analyze the expression activities of plant miRNAs under a specified condition, by integrating a variety of DNA sequence data and gene expression data. As shown in Figure 3, the approach consists of four major steps:
Statistical analysis of target gene expression coherence: Under a particular condition, if an miRNA is upregulated, all its targets are very likely to be coherently downregulated if they are expressed. Thus, these target genes will have more similar expression profiles than a set of arbitrarily chosen genes. In this study, we used the average pairwise cosine similarity of gene expression (see section similarity of gene expression patterns) to measure the coherence of expression profiles of target genes for all reported miRNAs, and identified putative UV‐B responsive miRNA families. For an miRNA family that has multiple member genes, we used the following steps to identify individual miRNA genes upregulated by UV‐B stress.
Discovering cis‐elements that are functionally relevant to UV‐B stress: First, we identified a comprehensive set of motifs from the 1280 protein‐coding genes significantly upregulated under the UV‐B stress condition, using our WordSpy algorithm (Wang et al, 2005; Wang and Zhang, 2006) (see the section 'Motif identification'). We then applied a motif selection step to discover motifs that are functionally relevant to the UV‐B stress condition. If a motif is involved in gene regulation under UV‐B, most genes containing this motif should be upregulated under the UV‐B stress condition. For each motif reported by WordSpy, we assessed whether it was significantly relevant to UV‐B stress, with a P‐value obtained from a cumulative hypergeometric test (see the section 'Hypergeometric distribution of motifs or motif combinations'). Motifs with P‐values less than 0.1 were selected for identifying miRNA genes.
Locating the 5′ proximal promoter regions of miRNA genes: To locate proximal promoters of miRNA genes, we first identified the transcription start sites or core promoter regions of the miRNA genes. In our study, we used the core promoters of 52 Arabidopsis miRNA genes experimentally identified by Carrington's laboratory (Xie et al, 2005) and, for the remaining miRNA genes, the core promoters predicted by our newly developed de novo core promoter prediction method, CoVote (Zhou et al, 2007). Using the core promoter information, we retrieved the 1000 bp 5′ proximal promoter regions of the corresponding miRNA genes. For each miRNA gene, we scanned its proximal promoter region with the motifs discovered in the previous step and obtained the array of motifs present in its promoter.
Identifying individual miRNA genes expressed under UV‐B stress condition: Coexpressed genes are often coregulated, and coregulated genes contain common cis‐elements or motif modules in their upstream regulatory regions. Therefore, the promoter regions of the genes responding to particular environmental stimuli are expected to have characteristic cis‐elements that are responsible for the upregulation of their expressions. If an miRNA gene is upregulated under the UV‐B stress condition, the array of motifs present in its proximal promoter region is very likely to be involved in the upregulation of its expression. Most protein‐coding genes that contain the same array of motifs should then also be upregulated under the UV‐B stress condition. For each miRNA gene, we assessed the statistical significance of the relevance to UV‐B stress of the array of motifs contained in its promoter, using a cumulative hypergeometric test (see the section 'Hypergeometric distribution of motifs or motif combinations'). We selected miRNA genes with P‐values less than 0.1 as candidate UV‐B responsive genes. To increase accuracy, we further exploited the anticorrelation between the expression of an miRNA gene and the expressions of its targets to filter out possible false candidates. As we did not rely on any direct information of miRNA expression, we used the inferred expression of an miRNA gene and the expressions of its targets to compute their anticorrelation (see the section 'Inference of expression patterns of miRNA genes').
Motif identification
In the AtGenExpress microarray data, 1280 genes are upregulated at least five‐fold in either shoot or root. The 1000 bp upstream promoters of these genes were analyzed in our study. We extracted statistically significant motifs, ranging from 5‐ to 9‐mers, from the promoters of these genes using the WordSpy motif‐finding algorithm (Wang et al, 2005; Wang and Zhang, 2006). WordSpy integrates statistical modeling and word counting methods, so that it is able to build a dictionary of a large number of statistically significant motifs. WordSpy adopts a strategy of steganalysis, a technique for discovering hidden patterns and information in a medium such as strings, so that it does not have to rely on additional background sequences and is still able to find motifs of nearly exact lengths. The details of the algorithm are available in Wang et al (2005) and Wang and Zhang (2006) and at http://cic.cs.wustl.edu/wordspy.
Inference of expression patterns of miRNA genes
The rationale behind the inference was discussed in the section 'Outline of the computational approach'. To reiterate, the method is transcriptome based; the main idea is to estimate the expected expression pattern of an miRNA using the expressions of putative coregulated protein‐coding genes, which are predicted based on the similarity of their upstream promoter regions with that of the miRNA. We used the significant cis‐elements to measure the similarity of the promoters of an miRNA and a protein‐coding gene. For two promoters, we first collected a set M of cis‐elements that appeared in one or both of the promoters. We then represented a promoter as a vector, where each entry corresponds to a cis‐element in M, with its value being 1 if it appears in the promoter, or 0 otherwise. The similarity of two promoters was then measured by the cosine similarity of their corresponding vectors (Menczer, 2004). Specifically, the similarity of the promoters of genes $g_1$ and $g_2$ is defined as

$$S(g_1, g_2) = \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\lVert \mathbf{v}_1 \rVert \, \lVert \mathbf{v}_2 \rVert} \qquad (1)$$

where $\mathbf{v}_1 \cdot \mathbf{v}_2$ is the inner product of the motif vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ of the two genes.
Using this cosine similarity, for each miRNA gene, we selected the five protein‐coding genes whose promoters most closely resemble the promoter of the miRNA. We then used their average expression pattern as the inferred expression pattern of the miRNA. In our study, we also tested the Pearson correlation coefficient as the similarity measure, and obtained similar results.
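The inference step can be illustrated with a short sketch. For binary motif vectors, the cosine similarity reduces to the number of shared motifs divided by the square root of the product of the two motif counts, which is what `promoter_similarity` computes on sets. The function names, the set and dictionary data layout and the tie‐breaking by input order are assumptions made for this example.

```python
import math


def promoter_similarity(motifs_a, motifs_b):
    """Cosine similarity of binary motif vectors, computed on motif sets:
    |A & B| / sqrt(|A| * |B|)."""
    if not motifs_a or not motifs_b:
        return 0.0
    return len(motifs_a & motifs_b) / math.sqrt(len(motifs_a) * len(motifs_b))


def inferred_expression(mirna_motifs, gene_motifs, expression, top_k=5):
    """Average expression profile of the top_k protein-coding genes whose
    promoter motif sets most resemble that of the miRNA gene.

    gene_motifs: {gene id: set of motifs in its promoter} (assumed layout).
    expression:  {gene id: [expression value per time point]}.
    Ties are broken by the order of gene_motifs (an arbitrary choice here).
    """
    ranked = sorted(
        gene_motifs,
        key=lambda g: promoter_similarity(mirna_motifs, gene_motifs[g]),
        reverse=True,
    )
    top = ranked[:top_k]
    n_points = len(expression[top[0]])
    profile = [sum(expression[g][t] for g in top) / len(top) for t in range(n_points)]
    return top, profile
```

Working on sets avoids building the explicit 0/1 vectors over the union of motifs, since motifs absent from both promoters contribute nothing to either the inner product or the norms.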
Hypergeometric distribution of motifs or motif combinations
The statistical significance of a motif, motif combination or motif module c was measured by a cumulative hypergeometric test (Altman, 1991). Given M genes on a microarray chip, assume that N of these M genes have a particular property (e.g. UV‐B upregulated), m of the M genes on the chip contain the motif or motif combination c, and n of the N genes (which are UV‐B upregulated) have the motif or motif module c. We calculated a P‐value for the statistical significance of the motif or motif module c as the probability that at least n of the N genes with the property would contain c if the m genes containing c were selected at random from the M genes on the microarray chip. Specifically, the P‐value is computed as follows:

$$P = \sum_{i=n}^{\min(m,N)} \frac{\binom{N}{i}\binom{M-N}{m-i}}{\binom{M}{m}}$$
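A cumulative hypergeometric tail of this kind can be computed exactly with integer binomial coefficients. In the sketch below, M is the number of genes on the chip, N the number with the property (e.g. UV‐B upregulated), m the number containing the motif or motif combination, and n the number of genes that have both. The function name is hypothetical and the code is an illustration rather than the implementation used in the study.

```python
from math import comb


def hypergeom_p_value(M, N, m, n):
    """Upper-tail hypergeometric probability P(X >= n).

    X is the number of genes with the property among m genes drawn at random,
    without replacement, from M genes of which N have the property.
    """
    upper = min(m, N)
    total = comb(M, m)
    # comb(a, b) is 0 when b > a, so impossible terms vanish automatically.
    tail = sum(comb(N, i) * comb(M - N, m - i) for i in range(n, upper + 1))
    return tail / total
```

The hypergeometric distribution is symmetric in the roles of N and m, so drawing N genes and counting those that contain the motif gives the same P‐value.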
Similarity of gene expression patterns
The similarity of expression patterns of two genes was measured by cosine similarity defined as in Equation (1), where $\mathbf{v}_1 \cdot \mathbf{v}_2$ is now the inner product of the expression profile vectors $\mathbf{v}_1$ and $\mathbf{v}_2$ of the two genes. In our study, we also used the Pearson correlation coefficient to measure the similarity of expression patterns of two genes, and obtained similar results (data not shown).
We thank Jianhua Ruan for stimulating discussions. This work was supported in part by NSF grants ITR/EIA‐0113618 and IIS‐0535257.
- Copyright © 2007 EMBO and Nature Publishing Group