This report provides a global view of how gene expression is affected by DNA replication. We analyzed synchronized cultures of Saccharomyces cerevisiae under conditions that prevent DNA replication initiation without delaying cell cycle progression. We use a higher‐order singular value decomposition to integrate the global mRNA expression measured in the multiple time courses, detect and remove experimental artifacts and identify significant combinations of patterns of expression variation across the genes, time points and conditions. We find that, first, ∼88% of the global mRNA expression is independent of DNA replication. Second, the requirement of DNA replication for efficient histone gene expression is independent of conditions that elicit DNA damage checkpoint responses. Third, origin licensing decreases the expression of genes with origins near their 3′ ends, revealing that downstream origins can regulate the expression of upstream genes. This confirms previous predictions from mathematical modeling of a global causal coordination between DNA replication origin activity and mRNA expression, and shows that mathematical modeling of DNA microarray data can be used to correctly predict previously unknown biological modes of regulation.
In this first global view of how DNA replication affects transcription, ∼88% of global gene expression is found to be independent of DNA replication, significantly lowering the recently measured upper bound to replication‐dependent mRNA expression.
The requirement of DNA replication for efficient histone gene expression is found to be independent of conditions that elicit DNA damage checkpoint responses.
Origin licensing is found to decrease expression of genes with origins near their 3′ ends, experimentally revealing for the first time that downstream origins can regulate the expression of upstream genes, and demonstrating that mathematical modeling of DNA microarray data can be used to correctly predict previously unknown biological modes of regulation.
The recent analogy of the tensor higher‐order singular value decomposition (HOSVD) with the matrix singular value decomposition (SVD), which already enables the interpretation of the HOSVD in terms of the cellular states, biological processess and experimental artifacts that compose the data tensor, is extended by mathematically defining the operation of HOSVD data reconstruction, and by computationally using this operation to remove experimental artifacts from the data.
DNA replication and transcription occur on a common template, and there are many ways in which these activities may influence each other. First, the passage of DNA replication forks offers an opportunity to change gene expression patterns. The transcription of capsid genes generally occurs late in most viral infections and is often dependent upon prior replication of the viral genome (Rosenthal and Brown, 1977; Thomas and Mathews, 1980; Toth et al, 1992). In bacteriophage T4, there is a direct coupling of replication and transcription because the sliding clamp processivity factor gp45 acts as a mobile transcriptional enhancer (Herendeen et al, 1989). Such coupling may serve a regulatory role: DNA replication differentially affects the transcription of the embryonic and somatic 5S rRNA genes in the frog Xenopus laevis (Wolffe and Brown, 1986). Second, juxtaposed genes and replication origins can influence each other's activity. The origin of the virus SV40 replication overlaps promoter and enhancer elements for both early and late gene expression (Cowan et al, 1973; Alwine et al, 1977), and there are many examples of transcription factors influencing replication origin function (DePamphilis, 1988). Moreover, induced transcription into a yeast Saccharomyces cerevisiae replication origin can inactivate it (Snyder et al, 1988). Finally, clashes between replication and transcription machinery are potentially important causes of genome instability. Machinery exists to minimize this instability (Liu and Alberts, 1995; Vilette et al, 1995; Wellinger et al, 2006) and the direction of replication and transcription can be coordinated to avoid clashes (Brewer, 1988).
We measured global gene expression in synchronized Saccharomyces cerevisiae cultures under two conditions that prevent DNA replication initiation without delaying cell‐cycle progression. In the Cdc6− cells, depletion of the essential licensing factor Cdc6 prevents the replication origin licensing by preventing Mcm2–7, proteins necessary for the formation and maintenance of the prereplicative complex, from binding to origins during the cell cycle phase G1 (Piatti et al, 1995). In the Cdc45− cells, inactivation of the essential initiation factor Cdc45 prevents origin firing at a step after Mcm2–7 loading, and the Mcm2–7 proteins remain bound to origins even as the cells progress through the S, S/G2 and G2/M phases (Tercero et al, 2000). The duplicate time courses of a Cdc6 shutoff strain at the depleted condition of Cdc6− and its parental strain at the control condition of Cdc6+, and of a Cdc45 shutoff strain at the inactivated condition of Cdc45− and its parental strain at the control condition of Cdc45+, were sampled at approximately similar cell‐cycle phases in both Cdc6+/45+ control conditions, starting at the exit from the pheromone‐induced arrest and entry into the G1 phase through the S and S/G2 phases to the beginning of the G2/M phase just before nuclear division (Supplementary information Section 1, and Datasets 1 and 2). The experimental variation in sample batch, hybridization batch, DNA microarray platform and protocols (Gerke et al, 2006; Hu et al, 2007) was designed to be orthogonal to the biological variation in the condition of Mcm2–7 origin binding. This enables computational detection of experimental artifacts by using a higher‐order SVD (Box 1 and Supplementary Figures 3–6, Sections 2 and 3, and Mathematica Notebook) as described earlier (Golub and Van Loan, 1996; Alter et al, 2000; Nielsen et al, 2002; Alter and Golub, 2004; Alter, 2006; Li and Klevecz, 2006; Omberg et al, 2007). The HOSVD reconstruction of the data cuboid is mathematically defined (Supplementary information Section 2.4), and used to computationally remove the experimental artifacts (Supplementary information Section 3.3). The two Cdc6− time courses are then averaged, and also separately the two Cdc45− and the four Cdc6+ and Cdc45+ control time courses (Supplementary information Dataset 3). The probabilistic significance of the enrichment of the time points in the averaged Cdc6+/45+ control in overexpressed or underexpressed cell cycle‐regulated genes (Supplementary Figure 7 and Supplementary information Datasets 4 and 5), calculated (Supplementary information Section 2.5) as described (Tavazoie et al, 1999), is consistent with the flow cytometry measurements of cell synchrony in the Cdc6+ and Cdc45+ time courses (Supplementary Figures 1 and 2), as well as with previous analyses of α‐factor synchronized cultures (Spellman et al, 1998).
Box 1 Higher‐order singular value decomposition (HOSVD)
The structure of the data in this study is of an order higher than that of a matrix. Genes, time points and conditions of Mcm2–7 origin binding, each represent a degree of freedom in a cuboid, that is, a third‐order tensor. Unfolded into a matrix, these degrees of freedom are lost and much of the information in the data tensor might also be lost. We integrate these data by using a tensor HOSVD (Supplementary information Sections 2 and 3, and Mathematica Notebook). This HOSVD uncovers in the data tensor (Supplementary information Dataset 3) patterns of mRNA expression variation across the genes, time points and conditions, as depicted in the raster display of Supplementary information Equation (1) (left) with overexpression (red), no change in expression (black) and underexpression (green). This HOSVD was recently reformulated in analogy with the matrix singular value decomposition (SVD) (Golub and Van Loan, 1996; Alter et al, 2000; Nielsen et al, 2002; Alter and Golub, 2004; Alter, 2006; Li and Klevecz, 2006) such that it separates the data tensor into a weighted sum of combinations of three patterns each, that is, ‘subtensors’, as depicted in the raster display of Supplementary information Equation (7) (right), where the second and third HOSVD combinations and their corresponding weights are shown explicitly. In these raster displays, each expression pattern across the time points is centered at its time‐invariant level. The genes are sorted by their ‘angular distances’ between the second and third HOSVD combinations (Supplementary information Section 2.6), which represent the unperturbed cell cycle expression oscillations (Figure 2 and Supplementary Figure 12). The white lines separate the even and odd hybridization batches, indicated by black arrows. This reformulation of the HOSVD was shown to enable its interpretation in terms of the cellular states, biological processes and experimental artifacts that compose the data tensor by defining the significance of each combination of patterns, and the operation of rotation in a subspace of these combinations (Omberg et al, 2007). In this study, we extend this analogy to mathematically define the operation of reconstruction in a subspace of combinations (Supplementary information Section 2.4), and use it to computationally remove experimental artifacts from the global mRNA expression measured in multiple time courses (Supplementary information Section 3.3).
To uncover the cell cycle phase‐dependent effects of Mcm2–7 origin binding, we use this HOSVD as described (Omberg et al, 2007) to separate the averaged data cuboid into a weighted sum of all possible combinations of three patterns of expression variation each: one across the 4270 genes, one across the 12 time points and one across the three conditions of Mcm2–7 origin binding (Figure 1, and Supplementary Figures 8–11 and Supplementary information Dataset 6). The significance of each combination of patterns, in terms of the fraction of the overall mRNA expression that this combination captures in the data cuboid, is proportional to its weight in this sum. We find that the seven most significant and unique combinations capture ∼94% of the mRNA expression of the 4270 genes (Table I).
First, we find that ∼88% of the overall expression information in this data cuboid is independent of DNA replication and Mcm2–7 origin binding. This unperturbed expression is represented by the three most significant combinations, all of which correlate with condition‐invariant overexpression. The first combination also correlates with time‐invariant underexpression and represents steady‐state expression. The second and third combinations describe oscillations consistent within the hybridization batches, which peak at the M/G1 and G1/S phases and trough at the S/G2 and G2/M phases in the averaged Cdc6+/45+ control time course, respectively. Consistently, the second and third combinations correlate with patterns of expression variation across the genes that are enriched in overexpressed M/G1 and, separately, G1 and S genes and underexpressed S/G2 and G2/M genes, respectively, with P‐values <6.4 × 10−16. Taken together, the second and third combinations represent unperturbed cell cycle expression oscillations (Supplementary Figure 12). Upon sorting the genes by their levels of expression in the second and third HOSVD combinations (Supplementary Section 2.6), the picture that emerges is that of unperturbed global expression oscillations that are dominant in the Cdc6−, Cdc45− as well as in the Cdc6+/45+ time courses (Figure 2 and Supplementary information Dataset 3).
It was recently shown that the cell cycle phase of the peak expression of ∼70% of 1271 cell cycle‐regulated genes is conserved in cells that do not express the S phase and mitotic cyclins‐encoding genes CLB1‐6 and are, therefore, unable to replicate DNA (Orlando et al, 2008). Our analysis of the 4270 genes that were selected on the basis of data quality alone significantly lowers this upper bound to replication‐dependent mRNA expression. Our results are consistent with the idea that the program of cell cycle‐regulated transcription may be largely independent of underlying cell cycle events, such as DNA replication (Simon et al, 2001).
Second, we find that ∼3.5% of the overall expression of the 4270 genes depends on DNA replication but is independent of the method of origin inactivation. These replication‐dependent perturbations in mRNA expression are represented by the fourth and sixth combinations, which correlate with overexpression in the averaged control and underexpression in both Cdc45− and Cdc6− time courses. The fourth combination also correlates with time‐invariant underexpression and with expression variation across the genes that is enriched in underexpressed histone genes with the P‐value <9.2 × 10−13. The sixth combination correlates with overexpression of histone genes at the G1/S phase with the corresponding P‐value <4.9 × 10−4. Taken together, the time‐averaged and G1/S expression of histone genes is reduced in both situations where DNA replication is prevented, indicating that DNA replication is required for efficient histone gene expression.
To examine the joint effects of DNA replication and origin binding on global mRNA expression, we classified the 4270 genes into intersections of the fourth through seventh combinations of expression patterns (Supplementary information Dataset 6). Enrichment in overexpressed histone genes was also observed for the fifth combination with the P‐value <1.5 × 10−8. Among the 1294 genes that are underexpressed in the fourth and overexpressed in the fifth and sixth combinations, the four most significant genes, in terms of the fraction of mRNA expression that they capture in these combinations, are the histone genes HTA1, HTA2, HTB1 and HTB2. Six of the nine histone genes are among the ten most significant genes, an enrichment that corresponds to a P‐value ∼2.1 × 10−15. Overall, the histone genes are overexpressed in the control relative to the Cdc6− condition, in which the Mcm2–7 licensing of origins and subsequent DNA replication are prevented, and to a lesser extent also relative to the Cdc45− condition, in which DNA replication is prevented but only after the origins are licensed (Figure 3a).
Previous work has shown that the coupling of histone mRNA levels to DNA replication is primarily due to transcriptional regulatory mechanisms (Lycan et al, 1987). Because in our study the Rad53 checkpoint kinase is not activated in either the Cdc6− or Cdc45− conditions as previously described (Piatti et al, 1995; Tercero et al, 2000), and because we did not observe any significant enrichment in DNA damage‐induced genes (Jelinsky and Samson, 1999) in the fourth through seventh combinations of expression patterns, in which the corresponding P‐values >9.4 × 10−2, we suggest that these effects on histone gene expression are directly dependent on DNA replication status, independent of DNA damage checkpoint responses.
Third, we find that ∼2.6% of the mRNA expression is affected by Mcm2–7 origin binding. The origin binding‐dependent perturbations are represented by the fifth and seventh combinations, which correlate with overexpression in the Cdc6− and underexpression in the Cdc45− cultures. The fifth combination correlates with time‐invariant underexpression that is enriched in underexpressed genes with autonomously replicating sequences (ARSs) adjacent to their 3′ ends, defined as genes with at least one confirmed ARS at a distance of less than 100 nucleotides from their respective 3′ ends (Cherry et al, 1997; Nieduszynski et al, 2007) with the P‐value <1.9 × 10−3. The seventh combination correlates with G2/M overexpression of genes with ARSs near their 3′ ends, with the P‐value <6.9 × 10−4. Taken together, origin licensing decreases time‐averaged and G2/M expression of genes with origins near their 3′ ends. We did not observe any significant enrichment in genes with ARSs near their 5′ ends nor did we observe any significant enrichment in genes that overlap ARSs, where all the corresponding P‐values were >1.2 × 10−1. We suggest that origin licensing may affect the expression of adjacent genes by interfering with transcription elongation and/or pre‐mRNA 3′‐end processing (Proudfoot, 2004; Gilmartin, 2005; Weiner, 2005; Rosonina et al, 2006), thus destabilizing the mRNA transcripts. The accumulation of mRNA transcripts of this class of genes throughout the Cdc6− relative to the Cdc45− time courses is consistent with the observed peak in their differential expression late in the time courses at the G2/M phase (Figure 3b).
Of the 153 genes with ARSs near their 3′ ends, 16 are among the 100 most significant of the 1412 genes that are overexpressed in the fourth and underexpressed in the fifth and seventh combinations, an enrichment that corresponds to a P‐value <3.4 × 10−7. No other significant enrichment was observed among these 100 genes, nor was an enrichment in gene ontology (GO) (Cherry et al, 1997) annotations observed among the 16 genes with ARSs near their 3′ ends. Of these 153 and 16 genes, only 24 and 5, respectively, are cell cycle regulated. These 16 genes are overexpressed in the Cdc6− condition, in which the origins are unlicensed, relative to the Cdc45− condition, in which Mcm2–7 bind origins throughout the time course, and to a lesser extent also relative to the control, in which Mcm2–7 bind origins only during G1. The expression of these genes is not as highly correlated as that of the nine genes for histones, consistent with this class of genes being co‐degraded but not necessarily co‐transcribed. Previous studies have shown that transcription can interfere with the function of downstream origins (Snyder et al, 1988; Nieduszynski et al, 2005; Donato et al, 2006). Our results reveal that downstream origins can also interfere with the expression of upstream genes. This interference requires origin licensing but does not require origin firing. We suggest that cells may exploit this complex reciprocal arrangement in different contexts to regulate gene expression or origin activation.
A global pattern of correlation between DNA binding of Mcm2–7 and reduced expression of adjacent genes, most of which are not cell cycle regulated, during the cell cycle phase G1 was discovered from previous mathematical modeling of DNA microarray data (Alter and Golub, 2004; Omberg et al, 2007), in which the mathematical variables and operations were shown to represent biological reality (Alter, 2006). The mathematical variables, patterns uncovered in the data, were shown to correlate with activities of cellular elements, such as regulators or transcription factors. The operations, such as classification, rotation or reconstruction in subspaces of these patterns, were shown to simulate experimental observation of the correlations and possibly even the causal coordination of these activities (Supplementary information Section 2). Of the 153 genes with ARSs near their 3′ ends, the ARSs near 151 were identified in Mcm2–7 high‐throughput binding assays, and for the ARSs near 139 of those, including all of the 16 significant genes, consensus sequence elements were identified (Wyrick et al, 2001; Xu et al, 2006). Our results, therefore, suggest that a causal relation underlies this correlation, that is, the binding of Mcm2–7 is responsible for diminished expression of adjacent genes, and show that mathematical modeling of DNA microarray data can be used to correctly predict previously unknown biological modes of regulation.
We thank BA Cohen and VR Iyer and his laboratory members, Z Hu, PJ Killion and S Shivaswamy, for technical assistance with DNA microarrays from the Washington University Microarray Core and at the University of Texas, respectively, and AW Murray for insightful comments. We also thank GH Golub for introducing us to matrix and tensor computations, and the American Institute of Mathematics in Palo Alto for hosting the 2004 Workshop on Tensor Decompositions where some of this study was carried out. This study was supported by Cancer Research UK (JFXD) and National Human Genome Research Institute R01 Grant HG004302 and National Science Foundation CAREER Award DMS‐0847173 (OA).
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary Text; Supplementary Figures 1–12; Supplementary Tables I–II
Mathematica Notebook (a)
A Mathematica 5.2 code file, executable by Mathematica 5.2 and readable by Mathematica Player, freely available at http://www.wolfram.com/products/player/.
Mathematica Notebook (b)
A PDF format file, readable by Adobe Acrobat Reader.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, tabulating relative mRNA expression levels of 4771 probes of the UT DNA microarrays that correspond to the K=4270 genes across 24 samples.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, tabulating relative mRNA expression levels of 8540 probes of the WU DNA microarrays that correspond to the K=4270 genes across 72 samples.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, tabulating the averaged log2 of the relative mRNA expression of the K=4270 genes across the L=12 time points and across the M=3 conditions of Mcm2‐7 origin binding. The genes are sorted by their angular distances between the second and third HOSVD combinations (Supplementary information Section 2.6 and Dataset 6), which represent the unperturbed cell cycle expression oscillations (Figure 2 and Supplementary Figure 12). The angular distance of each gene is also listed.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing descriptions and genomic coordinates [Nieduszynski et al, 2007] of the 325 confirmed ARSs in Saccharomyces cerevisiae.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, reproducing cell cycle annotations [Spellman et al, 1998], DNA damage responses [Jelinsky and Samson et al, 1999], descriptions and genomic coordinates [Cherry et al, 1997] and of the 4270 Saccharomyces cerevisiae genes.
A tab‐delimited text format file, readable by both Mathematica and Microsoft Excel, tabulating the eigenarrays and superpositions of eigenarrays that define the global gene expression patterns of the seven significant and unique subtensors of the averaged data cuboid. The expression levels of the genes in the intersections of the fourth through seventh HOSVD combinations, as computed by using the corresponding eigenarrays, are also tabulated. The ten significant among the 1294 genes that are underexpressed in the fourth and overexpressed in the fifth and sixth combinations are enriched in histone genes. The 100 significant among the 1412 genes that are overexpressed in the fourth and underexpressed in the fifth and seventh combinations are enriched in genes with ARSs near their 3′ ends.
This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.
- Copyright © 2009 EMBO and Nature Publishing Group