Figure 1. Isolation of transcription elongation complexes from the 5′ and 3′ regions of a single gene
Schematic demonstrating the sequential immunoprecipitation (IP) approach used to isolate 5′ and 3′ elongation complexes (ECs). GALp, GAL1 promoter; 2× PP7, two repeats of the PP7 stem‐loop encoding sequence; 2× MS2, two repeats of the MS2 stem‐loop encoding sequence; PCP‐GFP, PP7 coat protein fused to GFP; MCP‐RFP, MS2 coat protein fused to RFP.
Western blots for input and unbound samples for Rpb3 (Pol II), GFP (5′ complexes), and RFP (3′ complexes) immunoprecipitations. The input samples for the GFP and RFP IPs correspond to the bound fraction of the Rpb3 IP.
Silver stain of ECs isolated by single GFP IP (GFP), Pol II‐GFP sequential IP (Rpb3‐GFP), and single Pol II (Rpb3) IPs. M, molecular weight marker; kD, kilodalton.
Top: depiction of RT–qPCR primer location for 5′ region and 3′ region primers in the stem‐loop‐TDH3 construct. Bottom: RT–qPCR demonstrating the fold enrichment of 5′ ECs over 3′ ECs after various RNase A fragmentation times. The input sample represents whole‐cell lysate while the 0 min sample is taken after the Rpb3 IP with no RNase A fragmentation.
Silver stain of 5′ (Rpb3‐GFP) and 3′ (Rpb3‐RFP) ECs. M, molecular weight marker; kD, kilodalton.
Bar graphs displaying the average number of reads per gene from NET‐seq analysis on sequentially isolated 5′ and 3′ ECs. Construct indicates reads from the stem‐loop‐TDH3 construct; Pol II genes indicates reads from all other Pol II genes. Error bars for Pol II genes represent standard deviation. 186 Pol II genes were detected in the 5′ (GFP) IPs, and 1,118 Pol II genes were detected in the 3′ (RFP) IPs.
Top: depiction of the stem‐loop‐TDH3 construct. Bottom: cumulative fraction of NET‐seq reads plotted as a function of distance from the transcription start site of the stem‐loop‐TDH3 construct for 5′ (green) and 3′ (red) ECs.
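The cumulative fraction described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name and the input layout (one read count per base, ordered by distance from the transcription start site) are assumptions made for the example.

```python
from itertools import accumulate


def cumulative_fraction(read_counts):
    """Return the cumulative fraction of reads at each position.

    read_counts[i] is the number of reads mapping i bases downstream of
    the transcription start site (hypothetical input layout).
    """
    total = sum(read_counts)
    if total == 0:
        raise ValueError("no reads in the analyzed region")
    # Running sum of reads divided by the total number of reads
    return [running / total for running in accumulate(read_counts)]


# Toy example: reads concentrated near the start of the construct
toy_counts = [4, 3, 2, 1, 0]
print(cumulative_fraction(toy_counts))
```

A curve that rises early, as in this toy input, corresponds to reads concentrated near the 5′ end of the construct.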
Figure 2. 5′ and 3′ elongation complexes are distinct
Scatter plots comparing the normalized log2 MS1 intensities for each protein from triplicate GFP (5′), RFP (3′), and mock IPs. Plots are colored based on the Pearson correlation value between the samples. The last plot in each row depicts a histogram of MS1 intensities for all proteins in each sample. MS1 intensities for each peptide from a given protein were summed to obtain MS1 intensity levels for each protein.
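The per-protein summation mentioned in this legend can be sketched as below. The function name and the (protein, intensity) input layout are illustrative assumptions and do not reflect the authors' actual pipeline or file formats.

```python
from collections import defaultdict


def sum_peptide_intensities(peptides):
    """Sum peptide-level MS1 intensities into one value per protein.

    peptides is an iterable of (protein_name, ms1_intensity) pairs
    (hypothetical layout for this sketch).
    """
    totals = defaultdict(float)
    for protein, intensity in peptides:
        totals[protein] += intensity
    return dict(totals)


# Toy example with made-up intensities
toy_peptides = [("Rpb3", 2.0), ("Rpb3", 3.0), ("Bye1", 1.5)]
print(sum_peptide_intensities(toy_peptides))
```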
Principal component analysis of triplicate GFP (5′), RFP (3′), and mock IPs. PC, principal component.
Complete linkage hierarchical clustering of normalized MS1 intensities for triplicate GFP (5′), RFP (3′), and mock samples.
A–D The distribution of summed MS1 intensities from all peptides corresponding to each protein identified in each mass spectrometry run is plotted for all nine samples (triplicate 5′ IPs, triplicate 3′ IPs, and triplicate mock IPs). Also shown is a table displaying the mean and standard deviation for the number of unique proteins identified from the triplicate IPs. The distributions are plotted for (A) raw MS1 intensities, (B) mean normalized (see Materials and Methods) intensities, (C) mean normalized intensities filtered for proteins present in all three samples from at least one condition, and (D) mean normalized intensities filtered for proteins present in all three samples from at least one condition with missing values (i.e. proteins present in one condition but not the other) imputed (see Materials and Methods). The mean and variance shown in the table next to the distributions represent the mean and variance obtained by averaging the triplicate datasets. t‐tests for differences in the mean and F‐tests for differences in the variance between datasets are also shown. The variances of the raw and normalized datasets do not differ significantly, as the F‐tests between the datasets are not significant. The 5′ and 3′ IPs show highly similar mean MS1 intensity values and distributions in the raw data, whereas the mock IPs display a mean shifted toward larger MS1 intensities. Thus, to allow comparison between the 5′, 3′, and mock IPs, the data were mean normalized (as described in the Materials and Methods) to center the distributions of the datasets around zero. As the variance was not significantly different between any of the datasets, no adjustment of the variance was made.
After mean normalization, the MS1 intensity distributions and variances are all similar, allowing direct comparison of protein enrichment within samples as well as t‐tests for each protein between samples. Note that dataset mean and variance values do not differ between samples until missing values are imputed to allow statistical analysis between samples.
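One way to center a sample's intensity distribution around zero is sketched below. The exact procedure is given in the Materials and Methods, which are not reproduced here, so this example assumes the simplest reading: subtract the sample mean of the log2 intensities from every protein in that sample. The function name and input layout are hypothetical.

```python
import math
from statistics import mean


def mean_normalize(intensities):
    """Center one sample's log2 MS1 intensities around zero.

    intensities maps protein name -> summed MS1 intensity for one sample.
    Missing or non-positive values are skipped (an assumption of this
    sketch, since log2 is undefined for them).
    """
    log_values = {
        protein: math.log2(value)
        for protein, value in intensities.items()
        if value is not None and value > 0
    }
    if not log_values:
        raise ValueError("no usable intensities in this sample")
    center = mean(log_values.values())
    # Subtracting the sample mean shifts the distribution to mean zero
    return {protein: x - center for protein, x in log_values.items()}


# Toy example with made-up intensities
print(mean_normalize({"A": 2.0, "B": 8.0, "C": None}))
```

Because only the mean is shifted, the variance of each sample is left unchanged, in line with the statement that no variance adjustment was made.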
Figure 3. Identifying factors enriched in 5′ and 3′ elongation complexes
A, B Volcano plot analysis comparing GFP (5′) IPs (A) or RFP (3′) IPs (B) to mock IPs. High‐confidence, specifically enriched factors were determined using an FDR of 0.05, and less stringent interactors were determined using an FDR of 0.1. Factors mentioned in the text are labeled.
C Volcano plot comparing GFP (5′) and RFP (3′) IPs. Factors enriched in the 5′ complexes are colored in green, and those enriched in the 3′ complexes are colored in red. Components of the 26S proteasome are colored in maroon; factors mentioned in the text are labeled.
Figure 4. Bye1 and Rai1 function during early transcription elongation
Normalized NET‐seq reads for WT (black) and bye1Δ (green) cells at the RPL26B gene.
Normalized average NET‐seq profiles for WT and bye1Δ cells around the transcription start site (TSS) and polyadenylation site (polyA) of protein‐coding genes.
Box plot comparing the ratio of Pol II density at the 5′ region of genes to the Pol II density at the 3′ region of genes for WT and bye1Δ cells at protein‐coding genes.
Normalized NET‐seq reads for WT (black), rtt103Δ (gray), and rai1Δ (purple) cells at the ARO8 gene.
Normalized average NET‐seq profiles for WT, rtt103Δ, and rai1Δ cells around the TSS and polyA site of protein‐coding genes.
Box plot comparing the ratio of Pol II density at the 5′ region of genes to the Pol II density at the 3′ region of genes for WT, rtt103Δ, and rai1Δ cells at protein‐coding genes.
Data information: In (B) and (E), NET‐seq reads for each gene are normalized by total reads for each gene in the analyzed region, and shaded areas represent the 95% confidence interval, n = 2,738 genes. A.U., arbitrary units. In (C) and (F), P‐values were determined using a two‐sided t‐test, n = 2,734 genes. WT, rtt103Δ, and rai1Δ NET‐seq data were obtained from Harlen et al (2016) and re‐analyzed in (D–F). In (C) and (F), horizontal bars indicate samples being compared by t‐test. Dashed lines mark the median 5′ to 3′ Pol II ratio for WT cells. Solid lines in box plots represent the sample median. Filled regions in box plots range from the 25th to 75th percentiles of the data, while vertical lines extend to 1.5 times the inter‐quartile range.
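The per-gene normalization used for the average profiles can be sketched as follows. This is a minimal illustration under stated assumptions: each gene is represented by a list of read counts over an analyzed window of fixed length, genes without reads in the window are skipped, and the function names are hypothetical.

```python
def normalize_gene(counts):
    """Divide each position's read count by the gene's total in the window."""
    total = sum(counts)
    if total == 0:
        return None  # gene has no reads in the analyzed region
    return [c / total for c in counts]


def average_profile(genes):
    """Average the per-gene normalized profiles position by position.

    genes is a list of equal-length read-count lists, one per gene.
    """
    normalized = [p for p in (normalize_gene(g) for g in genes) if p is not None]
    if not normalized:
        raise ValueError("no genes with reads in the analyzed region")
    n = len(normalized)
    return [sum(column) / n for column in zip(*normalized)]


# Toy example: two genes with reads and one empty gene
print(average_profile([[1, 1], [3, 1], [0, 0]]))
```

Normalizing each gene by its own total gives every gene equal weight in the average, so highly expressed genes do not dominate the profile.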
Figure EV2. MPA growth assay for wild‐type cells and bye1Δ, rai1Δ, and dst1Δ mutants
WT, bye1Δ, rai1Δ, and dst1Δ mutants were plated on YPD media containing 0, 45, or 100 μg/ml of the transcription elongation inhibitor mycophenolic acid (MPA). Deletion of the transcription elongation factor DST1 confers increased sensitivity to MPA at both low (45 μg/ml) and high (100 μg/ml) doses, consistent with the role of Dst1 as a positive transcription elongation factor. Conversely, bye1Δ and rai1Δ mutants display decreased sensitivity to MPA, consistent with both Bye1 and Rai1 acting as negative regulators of transcription elongation.
Figure 5. Bre1 functions during the latter stages of transcription elongation
Normalized NET‐seq reads for WT (black) and bre1Δ (red) cells at the NUP157 gene. The panel on the right emphasizes the window around the polyA site where Pol II pausing is induced in bre1Δ cells.
Normalized average NET‐seq profiles for WT, bre1Δ, paf1Δ, and rtf1Δ cells around the transcription start site (TSS) and polyA site of protein‐coding genes. NET‐seq reads for each gene are normalized by total reads for each gene in the analyzed region; shaded areas represent the 95% confidence interval, n = 2,738 genes. A.U., arbitrary units.
Top: box plot comparing the log2 ratio of NET‐seq reads in the window from the polyA site to 100 base pairs downstream of the polyA site over reads in the window from 50 base pairs upstream of the polyA site to the polyA site. P‐values were determined using a two‐sided t‐test, n = 2,674 genes. Horizontal bars indicate samples being compared by t‐test. The dashed line marks the median polyA pausing ratio for WT cells. Solid lines in box plots represent the sample median. Filled regions in box plots range from the 25th to 75th percentiles of the data, while vertical lines extend to 1.5 times the inter‐quartile range. Bottom: cumulative distribution of the fold change in polyA pausing ratio between bre1Δ and WT cells; 60% of genes display increased polyA pausing in bre1Δ cells compared to WT cells.
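The polyA pausing ratio defined above can be sketched as follows. The legend does not state whether the windows are compared as raw read sums or as per-base densities, nor how window boundaries or empty windows are handled; this sketch assumes raw sums, half-open windows, and returns None when either window is empty. The function name and input layout are hypothetical.

```python
import math


def polya_pausing_ratio(counts, polya_index, upstream=50, downstream=100):
    """Return log2(downstream reads / upstream reads) around a polyA site.

    counts is a per-base read-count list for one gene region, and
    polya_index is the position of the polyA site within that list.
    """
    if polya_index - upstream < 0 or polya_index + downstream > len(counts):
        raise ValueError("windows extend past the supplied region")
    # 50 bp upstream of the polyA site, up to (not including) the site
    up = sum(counts[polya_index - upstream:polya_index])
    # From the polyA site to 100 bp downstream
    down = sum(counts[polya_index:polya_index + downstream])
    if up == 0 or down == 0:
        return None  # ratio undefined; such genes would need to be filtered
    return math.log2(down / up)


# Toy example: 1 read per base upstream, 2 reads per base downstream
toy_region = [1] * 50 + [2] * 100
print(polya_pausing_ratio(toy_region, polya_index=50))
```

A positive value indicates more reads downstream of the polyA site than in the upstream window; the fold change between a mutant and WT is then the difference of the two log2 ratios for the same gene.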
Average normalized MNase‐seq (gray) or ChIP‐exo for H2B (green), H2B ubiquitylation (H2Bub, gold), and H2Bub/H2B (purple) around the TSS and polyA sites of protein‐coding genes. MNase‐seq and ChIP‐exo reads are normalized by total reads for each gene in the analyzed region, n = 2,738. MNase‐seq data were obtained from van Bakel et al (2013), and ChIP‐exo data were obtained from Rhee et al (2014).
Figure EV3. Bre1 and H2Bub are present on nucleosomes near the polyA site and are not the result of overlap with the plus one nucleosome from neighboring genes
Top: normalized average NET‐seq profiles for WT and bre1Δ cells around the transcription start site (TSS) and polyA site of protein‐coding genes that do not overlap with another protein‐coding gene within 350 base pairs, n = 1,120. NET‐seq reads for each gene are normalized by total reads for each gene in the analyzed region; shaded areas represent the 95% confidence interval. Bottom: average normalized MNase‐seq (gray) or ChIP‐exo for H2B (green) and H2B ubiquitylation (H2Bub, gold) around the TSS and polyA sites of the same genes used for the top NET‐seq analysis. MNase‐seq data were obtained from van Bakel et al (2013), and ChIP‐exo data were obtained from Rhee et al (2014).
Figure 6. Summary diagram of factors identified to regulate subgenic stages of transcription
Diagram depicting Pol II in the 5′/early elongation phase of transcription, shaded in green, and Pol II in the 3′/late elongation and termination phase of transcription, shaded in red. Factors identified by subgenic isolation of Pol II complexes and shown to regulate specific stages of transcription are displayed. Also depicted is the H2Bub‐modified nucleosome, which is enriched near the polyA site. Below the diagram is a table indicating the genic regions from which complexes were isolated, the transcriptional stage from which the subgenic Pol II complexes were isolated, the likely state of CTD phosphorylation, a sampling of factors enriched at subgenic regions during transcription, and the total number of factors enriched in each IP using the stringent cutoff.