Open Access

Transparent Process

The essential genome of a bacterium

Beat Christen, Eduardo Abeliuk, John M Collier, Virginia S Kalogeraki, Ben Passarelli, John A Coller, Michael J Fero, Harley H McAdams, Lucy Shapiro

Author Affiliations

  1. Beat Christen (1, †)
  2. Eduardo Abeliuk (1, 2, †)
  3. John M Collier (3)
  4. Virginia S Kalogeraki (1)
  5. Ben Passarelli (3, 4)
  6. John A Coller (3)
  7. Michael J Fero (1)
  8. Harley H McAdams (1)
  9. Lucy Shapiro (1, *)

  (1) Department of Developmental Biology, Stanford University, Stanford, CA, USA
  (2) Department of Electrical Engineering, Stanford University, Stanford, CA, USA
  (3) Functional Genomics Facility, Stanford University, Stanford, CA, USA
  (4) Stem Cell Institute Genome Center, Stanford University, Stanford, CA, USA

  * Corresponding author. Department of Developmental Biology, Stanford University, B300 Beckman Center, Stanford, CA 94305, USA. Tel.: +1 650 725 7678; Fax: +1 650 725 7739; E-mail: shapiro@stanford.edu
  † These authors contributed equally to this work

Abstract

Caulobacter crescentus is a model organism for the integrated circuitry that runs a bacterial cell cycle. Full discovery of its essential genome, including non‐coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of a bacterial cell. Using hyper‐saturated transposon mutagenesis coupled with high‐throughput sequencing, we determined the essential Caulobacter genome at 8 bp resolution, including 1012 essential genome features: 480 ORFs, 402 regulatory sequences and 130 non‐coding elements, including 90 intergenic segments of unknown function. The essential transcriptional circuitry for growth on rich media includes 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti‐sigma factor. We identified all essential promoter elements for the cell cycle‐regulated genes. The essential elements are preferentially positioned near the origin and terminus of the chromosome. The high‐resolution strategy used here is applicable to high‐throughput, full genome essentiality studies and large‐scale genetic perturbation experiments in a broad class of bacterial species.

Synopsis

This study reports the essential Caulobacter genome at 8 bp resolution determined by saturated transposon mutagenesis and high‐throughput sequencing. This strategy is applicable to full genome essentiality studies in a broad class of bacterial species.

The regulatory events that control polar differentiation and cell‐cycle progression in the bacterium Caulobacter crescentus are highly integrated, and they have to occur in the proper order (McAdams and Shapiro, 2011). Components of the core regulatory circuit are largely known. Full discovery of its essential genome, including non‐coding, regulatory and coding elements, is a prerequisite for understanding the complete regulatory network of this bacterial cell. We have identified all the essential coding and non‐coding elements of the Caulobacter chromosome using a hyper‐saturated transposon mutagenesis strategy that is scalable and can be readily extended to obtain rapid and accurate identification of the essential genome elements of any sequenced bacterial species at a resolution of a few base pairs.

We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward‐pointing Pxyl promoter (Christen et al, 2010). We showed that this transposon construct inserts randomly into the genome, where it can activate or disrupt transcription at the site of integration, depending on the insertion orientation. DNA from hundreds of thousands of transposon insertion sites reading outward into flanking genomic regions was parallel PCR amplified and sequenced by Illumina paired‐end sequencing to locate the insertion site in each mutant strain (Figure 1). A single sequencing run on DNA from a mutagenized cell population yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions and the insertion site could be mapped with single nucleotide resolution. This yielded the location and orientation of 428 735 independent transposon insertions in the 4‐Mbp Caulobacter genome.

Within non‐coding sequences of the Caulobacter genome, we detected 130 non‐disruptable DNA segments between 90 and 393 bp long in addition to all essential promoter elements. Among 27 previously identified and validated sRNAs (Landt et al, 2008), three were contained within non‐disruptable DNA segments and another three were partially disruptable, that is, insertions caused a notable growth defect. Two additional small RNAs found to be essential are the transfer‐messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non‐disruptable sRNAs, 29 out of the 130 intergenic essential non‐coding sequences contained non‐redundant tRNA genes; duplicated tRNA genes were non‐essential. We also identified two non‐disruptable DNA segments within the chromosomal origin of replication. Thus, we resolved essential non‐coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. An additional 90 non‐disruptable small genome elements of currently unknown function were identified. Eighteen of these are conserved in at least one closely related species. Only 2 could encode a protein of over 50 amino acids.

For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation, and genetic context of transposon insertions. There are 480 essential ORFs and 3240 non‐essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated. The 8‐bp resolution allowed a dissection of the essential and non‐essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non‐essential protein segments. For example, transposon insertions in the essential cell‐cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C‐terminal amino acids did not impact viability, confirming previous reports that the C‐terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). In addition, we found that 30 out of 480 (6.3%) of the essential ORFs appear to be shorter than the annotated ORF, suggesting that these are probably mis‐annotated.

Among the 480 ORFs essential for growth on rich media, there were 10 essential transcriptional regulatory proteins, including 5 previously identified cell‐cycle regulators (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010) and 5 uncharacterized predicted transcription factors. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti‐sigma factor ChrR, which mitigates rpoE‐dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti‐sigma factor are the core essential transcriptional regulators for growth on rich media. To further characterize the core components of the Caulobacter cell‐cycle control network, we identified all essential regulatory sequences and operon transcripts. Altogether, the 480 essential protein‐coding and 37 essential RNA‐coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression. Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).

The essential genome features are non‐uniformly distributed on the Caulobacter genome and enriched near the origin and the terminus regions. In contrast, the chromosomal positions of the published E. coli essential coding sequences (Rocha, 2004) are preferentially located at either side of the origin (Figure 4A). This indicates that there are selective pressures on chromosomal positioning of some essential elements (Figure 4A).

The strategy described in this report could be readily extended to quickly determine the essential genome for a large class of bacterial species.

  • The essential Caulobacter genome was determined at 8 bp resolution using hyper‐saturated transposon mutagenesis coupled with high‐throughput sequencing.

  • Essential protein‐coding sequences comprise 90% of the essential genome; the remaining 10% consists of essential non‐coding RNA sequences, gene regulatory elements and essential genome replication features.

  • Of the 3876 annotated open reading frames (ORFs), 480 (12.4%) were essential ORFs, 3240 (83.6%) were non‐essential ORFs and 156 (4.0%) were ORFs that severely impacted fitness when mutated.

  • The essential elements are preferentially positioned near the origin and terminus of the Caulobacter chromosome.

  • This high‐resolution strategy is applicable to high‐throughput, full genome essentiality studies and large‐scale genetic perturbation experiments in a broad class of bacterial species.

Introduction

In addition to protein‐coding sequences, the essential genome of any organism contains essential structural elements, non‐coding RNAs and regulatory sequences. We have identified the Caulobacter crescentus essential genome to 8 bp resolution by performing ultrahigh‐resolution transposon mutagenesis followed by high‐throughput DNA sequencing to determine the transposon insertion sites. A notable feature of C. crescentus is that the regulatory events that control polar differentiation and cell‐cycle progression are highly integrated, and they occur in a temporally restricted order (McAdams and Shapiro, 2011). Many components of the core regulatory circuit have been identified and simulation of the circuitry has been reported (Shen et al, 2008). Identifying all essential DNA elements is a prerequisite for a complete understanding of the regulatory networks that run a bacterial cell.

Essential protein‐coding sequences have been reported for several bacterial species using relatively low‐throughput transposon mutagenesis (Hutchison et al, 1999; Jacobs et al, 2003; Glass et al, 2006) and in‐frame deletion libraries (Kobayashi et al, 2003; Baba et al, 2006). Two recent studies used high‐throughput transposon mutagenesis for fitness and genetic interaction analysis (Langridge et al, 2009; van Opijnen et al, 2009). Here, we have reliably identified all essential coding and non‐coding chromosomal elements, using a hyper‐saturated transposon mutagenesis strategy that is scalable and can be extended to obtain rapid and highly accurate identification of the entire essential genome of any bacterial species at a resolution of a few base pairs.

Results and discussion

We engineered a Tn5 derivative transposon (Tn5Pxyl) that carries at one end an inducible outward‐pointing Pxyl promoter (Christen et al, 2010; Supplementary Figure 1A; Materials and methods). Thus, the Tn5Pxyl element can activate or disrupt transcription at any site of integration, depending on the insertion orientation. About 8 × 10⁵ viable Tn5Pxyl transposon insertion mutants capable of colony formation on rich media (PYE) plates were pooled. Next, DNA from hundreds of thousands of transposon insertion sites reading outwards into flanking genomic regions was parallel PCR amplified and sequenced by Illumina paired‐end sequencing (Figure 1; Supplementary Figure 1B; Materials and methods). A single sequencing run yielded 118 million raw sequencing reads. Of these, >90 million (>80%) read outward from the transposon element into adjacent genomic DNA regions (Supplementary Figure 1C) and were subsequently mapped to the 4‐Mbp genome, allowing us to determine the location and orientation of 428 735 independent transposon insertions with base‐pair accuracy (Figure 2A; Materials and methods).

Figure 1.

Genomic high‐resolution transposon scanning strategy. Insertion mutants are pooled to generate a hyper‐saturated Tn5 mutant library. Subsequent parallel amplification of individual transposon junctions by a nested arbitrary PCR yields DNA fragments reading out of transposon elements into adjacent genomic DNA sequences. DNA fragments carry terminal adapters (orange, blue) compatible to the Illumina flow‐cell and are sequenced in parallel by standard paired‐end sequencing.

Figure 2.

Identification of essential genome features. (A) 428 735 unique Tn5 insertion sites (red) were mapped onto the 4‐Mbp Caulobacter genome that encodes 3876 annotated ORFs shown in the inner (minus strand) and middle (plus strand) tracks by black lines. (B) A 192‐bp essential genome segment (no Tn5 insertions) contains a stationary phase expressed non‐coding sRNA (Landt et al, 2008). The rectangular heat map above shows the micro‐array probe cross‐correlation pattern of the sRNA (Landt et al, 2008). The locations of transposon insertions (red marks) are shown above the genome track. P‐values for essentiality for the different gap sizes observed are below. (C) A small non‐disruptable segment containing an essential tRNA. (D) Two non‐disruptable genome regions include two regulatory sequences required for chromosome replication. (E) Locations of mapped transposon insertion sites (red marks) on a segment of the Caulobacter genome. Non‐essential ORFs (blue) have dense Tn5 transposon insertions, while large non‐disruptable genome regions contain essential ORFs (light red). For every non‐disruptable genome region, a P‐value for gene essentiality is calculated assuming a uniformly distributed Tn5 insertion frequency and neutral fitness costs. (F) For each of the 3876 Caulobacter ORFs, the number of Tn5 insertions is plotted against the corresponding ORF length. Non‐essential ORFs (blue), fitness relevant ORFs (Supplementary information) (green) and essential ORFs (red) have different transposon insertion frequencies. (G) The essential cell‐cycle gene divL had multiple transposon insertions within the 3′ tail. This dispensable region encodes parts of the histidine kinase domain as well as an ATPase domain that is non‐essential. (H) One of the ORFs with a mis‐annotated start site. The essential cell‐cycle gene chpT tolerates disruptive Tn5 insertions in the 5′ region of the mis‐annotated ORF. The native promoter element and TSS are located downstream of the mis‐annotated start codon as confirmed by lacZ promoter activity assays.

Eighty percent of the genome sequence showed an ultrahigh density of transposon hits; an average of one insertion event every 7.65 bp. The largest gap detectable between consecutive insertions was <50 bp (Supplementary Figure 2). Within the remaining 20% of the genome, chromosomal regions of up to 6 kb in length tolerated no transposon insertions.
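The search for insertion-free stretches described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `find_gaps`, the 1-based coordinate convention and the toy coordinates in the usage note are all assumptions introduced here.

```python
from typing import List, Tuple


def find_gaps(positions: List[int], genome_length: int,
              min_gap: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, length) of insertion-free stretches >= min_gap bp.

    positions are 1-based insertion coordinates on a circular chromosome,
    so the stretch between the last and the first insertion wraps around.
    """
    sites = sorted(set(positions))
    if not sites:
        # No insertions at all: the whole chromosome is one gap.
        return [(1, genome_length, genome_length)]

    gaps = []
    # Gaps between consecutive insertion sites.
    for left, right in zip(sites, sites[1:]):
        length = right - left - 1
        if length >= min_gap:
            gaps.append((left + 1, right - 1, length))

    # Gap that wraps around the end of the circular chromosome.
    wrap_length = (genome_length - sites[-1]) + (sites[0] - 1)
    if wrap_length >= min_gap:
        start = sites[-1] % genome_length + 1
        end = (sites[0] - 2) % genome_length + 1
        gaps.append((start, end, wrap_length))
    return gaps
```

For a toy 100-bp circular genome with insertions at 5, 10, 12, 60 and 95, `find_gaps([5, 10, 12, 60, 95], 100, 20)` returns the two stretches 13 to 59 and 61 to 94. Applied to real data, candidate non-disruptable regions would still need a statistical test, since short gaps also arise by chance.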

Essential non‐coding sequences

Within non‐coding sequences of the Caulobacter genome, we detected 130 small non‐disruptable DNA segments between 90 and 393 bp long (Materials and methods; Supplementary Data‐DT1). (Tables in the Excel file of Supplementary Data are designated DT1, DT2 and so on.) Owing to the uniform distribution of transposition across the genome (Materials and methods), such non‐disruptable DNA regions are highly unlikely (Supplementary Figure 2). Among 27 previously identified and validated sRNAs (Landt et al, 2008), three (annotated as R0014, R0018 and R0074 in Landt et al, 2008) were contained within non‐disruptable DNA segments while another three (R0005, R0019 and R0025) were partially disruptable. Figure 2B shows one of the three (Supplementary Data‐DT1) non‐disruptable sRNA elements, R0014, that is upregulated upon entry into stationary phase (Landt et al, 2008). Two additional small RNAs found to be essential are the transfer‐messenger RNA (tmRNA) and the ribozyme RNAseP (Landt et al, 2008). In addition to the 8 non‐disruptable sRNAs, 29 out of the 130 essential non‐coding sequences contained non‐redundant tRNA genes (Figure 2C); duplicated tRNA genes were found to be non‐essential. We identified two non‐disruptable DNA segments within the chromosomal origin of replication (Figure 2D). A 173‐bp long essential region contains three binding sites for the replication repressor CtrA, as well as additional sequences that are essential for chromosome replication and initiation control (Marczynski et al, 1995). A second 125 bp long essential DNA segment contains a binding motif for the replication initiator protein DnaA. Surprisingly, between these non‐disruptable origin segments there were multiple transposon hits suggesting that the Caulobacter origin is modular with possible DNA looping compensating for large insertion sequences. Thus, we resolved essential non‐coding RNAs, tRNAs and essential replication elements within the origin region of the chromosome. Although 90 additional non‐disruptable small genome elements were identified (Supplementary Data‐DT1), they cannot be explained within the context of the current genome annotation. Eighteen of these are conserved in at least one closely related species. Only two could encode a protein of over 50 amino acids.
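The reasoning that an insertion-free segment is improbable under uniform transposition can be illustrated with a simple calculation. The sketch below is a deliberately simplified model and not the statistic defined in the Materials and methods: it assumes that every insertion lands independently and uniformly on the chromosome, and the function name `gap_probability` is introduced here for illustration only.

```python
import math


def gap_probability(gap_bp: int, n_insertions: int, genome_bp: int) -> float:
    """Probability that one pre-specified window of gap_bp receives none of
    n_insertions, assuming independent, uniformly distributed insertions."""
    return (1.0 - gap_bp / genome_bp) ** n_insertions


# Figures quoted in the text: 428 735 insertions, a roughly 4-Mbp genome,
# and non-disruptable segments between 90 and 393 bp long.
N_INSERTIONS = 428_735
GENOME_BP = 4_000_000  # rounded value; an assumption of this sketch

for gap in (90, 393):
    p = gap_probability(gap, N_INSERTIONS, GENOME_BP)
    # For small gap/genome ratios, p is close to exp(-n * gap / genome).
    approx = math.exp(-N_INSERTIONS * gap / GENOME_BP)
    print(f"{gap} bp window: p = {p:.3e} (exponential approximation {approx:.3e})")
```

This gives the probability for one window chosen in advance. A genome-wide analysis scans many windows, so it needs a correction for multiple testing, and a fuller model would exclude essential regions when estimating the insertion density.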

Essential protein‐coding sequences

For each of the 3876 annotated open reading frames (ORFs), we analyzed the distribution, orientation and genetic context of transposon insertions. We identified the boundaries of the essential protein‐coding sequences and calculated a statistically robust metric for ORF essentiality (Materials and methods; Supplementary Data‐DT2). There are 480 essential ORFs and 3240 non‐essential ORFs. In addition, there were 156 ORFs that severely impacted fitness when mutated, as evidenced by a low number of disruptive transposon insertions (Supplementary methods). Figure 2E shows the distribution of transposon hits for a subregion of the genome encoding essential and non‐essential ORFs. Genome‐wide transposon insertion frequencies for the annotated Caulobacter ORFs are shown in Figure 2F. In all, 145/480 essential ORFs lacked transposon insertions across the entire coding region, suggesting that the full length of the encoded protein up to the last amino acid is essential. The 8‐bp resolution allowed a dissection of the essential and non‐essential regions of the coding sequences. Sixty ORFs had transposon insertions within a significant portion of their 3′ region but lacked insertions in the essential 5′ coding region, allowing the identification of non‐essential protein segments. For example, transposon insertions in the essential cell‐cycle regulatory gene divL, a tyrosine kinase, showed that the last 204 C‐terminal amino acids did not impact viability (Figure 2G), confirming previous reports that the C‐terminal ATPase domain of DivL is dispensable for viability (Reisinger et al, 2007; Iniesta et al, 2010). Our results show that the entire C‐terminal ATPase domain, as well as the majority of the adjacent kinase domain, is non‐essential while the N‐terminal region including the first 25 amino acids of the kinase domain contains essential DivL functions.

Conversely, we found 30 essential ORFs that tolerated disruptive transposon insertions within the 5′ region while no insertion events were tolerated further downstream (Supplementary Table 1). One such example, the essential histidine phosphotransferase gene chpT (Biondi et al, 2006), had 12 transposon insertions near the beginning of the annotated ORF (Figure 2H). These transposon insertions would prevent the production of a functional protein and should not be detectable within chpT or any essential ORF unless the translational start site is mis‐annotated. Using LacZ‐reporter assays (Supplementary methods), we found that the promoter element as well as the translational start site of chpT was located downstream of the annotated start codon (Figure 2H). Cumulatively, >6% of all essential ORFs (30 out of 480) appear to be shorter than the annotated ORF (Supplementary Table 1), suggesting that these are probably mis‐annotated as well. Thus, 145 ORFs showed all regions were essential, 60 ORFs showed non‐essential C‐termini and the start sites of 30 ORFs were mis‐annotated. The remaining 245 ORFs tolerated occasional insertions within a few amino acids of the ORF boundaries (Supplementary Figure 3; Materials and methods).
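As a quick consistency check, the four insertion patterns reported above should partition the 480 essential ORFs. The short tally below uses only counts stated in the text; the dictionary labels are paraphrases introduced here.

```python
# Counts of essential ORFs by insertion pattern, as stated in the text.
essential_orf_patterns = {
    "no insertions anywhere in the ORF": 145,
    "insertions tolerated in the 3' region": 60,
    "insertions tolerated in the 5' region (start likely mis-annotated)": 30,
    "occasional insertions near the ORF boundaries": 245,
}

total = sum(essential_orf_patterns.values())

# Share of each pattern among all essential ORFs, in percent.
shares = {
    label: round(100 * count / total, 2)
    for label, count in essential_orf_patterns.items()
}

for label, count in essential_orf_patterns.items():
    print(f"{count:4d}  {shares[label]:6.2f}%  {label}")
print(f"{total:4d}  total essential ORFs")
```

The total comes to 480, and the 30 ORFs with tolerated 5′ insertions make up 6.25% of the essential ORFs, in line with the ">6%" quoted above.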

The majority of the essential ORFs have annotated functions. They participate in diverse core cellular processes such as ribosome biogenesis, energy conversion, metabolism, cell division and cell‐cycle control. Forty‐nine of the essential proteins are of unknown function (Table I; Supplementary Table 2). We attempted to delete 11 of the genes encoding essential hypothetical proteins and recovered no in‐frame deletions, confirming that these proteins are indeed essential (Supplementary Table 3).

Table I. The essential Caulobacter genome

Among the 480 essential ORFs, there were 10 essential transcriptional regulatory proteins (Supplementary Table 4), including the cell‐cycle regulators ctrA, gcrA, ccrM, sciP and dnaA (McAdams and Shapiro, 2003; Holtzendorff et al, 2004; Collier and Shapiro, 2007; Gora et al, 2010; Tan et al, 2010), plus 5 uncharacterized putative transcription factors. We surmise that these five uncharacterized transcription factors are either transcriptional activators of essential genes or repressors of genes that would move the cell out of its replicative state. In addition, two RNA polymerase sigma factors RpoH and RpoD, as well as the anti‐sigma factor ChrR, which mitigates rpoE‐dependent stress response under physiological growth conditions (Lourenco and Gomes, 2009), were also found to be essential. Thus, a set of 10 transcription factors, 2 RNA polymerase sigma factors and 1 anti‐sigma factor constitutes the essential core transcriptional regulators for growth on rich media.

Essential promoter elements

To characterize the core components of the Caulobacter cell‐cycle control network, we identified essential regulatory sequences and operon transcripts (Supplementary Data‐DT3 and DT4). Figure 3A illustrates the transposon scanning strategy used to locate essential promoter sequences. The promoter regions of 210 essential genes were fully contained within the upstream intergenic sequences, and promoter regions of 101 essential genes extended upstream into flanking ORFs (Table I). We also identified 206 essential genes that are co‐transcribed with the corresponding flanking gene(s) and experimentally mapped 91 essential operon transcripts (Table I; Supplementary Data‐DT4). One example of an essential operon is the transcript encoding ATP synthase components (Figure 3B). Altogether, the 480 essential protein‐coding and 37 essential RNA‐coding Caulobacter genes are organized into operons such that 402 individual promoter regions are sufficient to regulate their expression (Table I). Of these 402 essential promoters, the transcription start sites (TSSs) of 105 were previously identified (McGrath et al, 2007).

Figure 3.

Genome‐wide identification of essential gene regulatory sequences. (A) Transposon insertions within the promoter region of an essential gene are only viable if the transposon‐specific promoter points toward the gene (sense insertion); insertions in the opposite orientation (anti‐sense insertions) are lethal. The distance between the annotated start codon and the first detected occurrence of an anti‐sense insertion within the upstream region of an essential gene defines its essential promoter region. (B) Anti‐sense insertions (red marks below genome track) are absent throughout the entire ATP synthase operon while sense insertions (red marks above genome track) are only tolerated within the non‐essential lead gene and within non‐coding regions of co‐transcribed downstream genes. (C) Lengths of promoter regions that extend upstream of predicted transcriptional start sites. (D) Size distribution of essential promoter regions. The cell‐cycle master regulators ctrA, dnaA and gcrA, which are subjected to complex cell cycle‐dependent regulation, ranked among the longest essential promoter regions identified. (E) Essential cell cycle‐regulated genes are clustered according to their temporal expression profile. Essential genes with long essential promoter regions indicate cell‐cycle hub nodes subjected to complex transcriptional regulation. (F) Only insertions in the sense strand (red lines above the genome track) are tolerated within the 171 bp long essential promoter region of the cell‐cycle master regulator ctrA. Insertions in the anti‐sense strand (red lines below the genome track) are absent from the essential ctrA promoter. Both transcription start sites (arrows P1, P2), two DNA binding sites for CtrA (gray boxes) and one of two reported binding sites for the SciP transcription factor (yellow boxes) are contained within the identified essential promoter region.
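The rule in panel (A), where the essential promoter region ends at the first anti‐sense insertion upstream of the start codon, can be written as a short function. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the `(position, strand)` representation of an insertion, the `max_upstream` search limit and the toy coordinates in the usage note are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def essential_promoter_length(
    gene_start: int,
    gene_strand: str,
    insertions: Iterable[Tuple[int, str]],
    max_upstream: int = 1000,
) -> Optional[int]:
    """Distance (bp) from the start codon to the nearest upstream
    anti-sense insertion, or None if none is found within max_upstream.

    gene_start:  coordinate of the annotated start codon
    gene_strand: '+' or '-', the strand the gene is transcribed from
    insertions:  (position, strand) pairs, where strand is the direction
                 in which the transposon's outward promoter reads
    """
    distances = []
    for position, promoter_strand in insertions:
        # Upstream means lower coordinates for '+' genes, higher for '-'.
        if gene_strand == "+":
            distance = gene_start - position
        else:
            distance = position - gene_start
        is_upstream = 0 < distance <= max_upstream
        is_antisense = promoter_strand != gene_strand
        if is_upstream and is_antisense:
            distances.append(distance)
    return min(distances) if distances else None
```

With toy coordinates, a plus‐strand gene starting at position 1000 and insertions `[(990, '+'), (950, '+'), (880, '-'), (700, '-'), (1010, '-')]` gives an essential promoter length of 120 bp: the sense insertions at 990 and 950 are tolerated, the insertion at 1010 lies inside the gene, and 880 is the nearest upstream anti‐sense insertion.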

We found that 79/105 essential promoter regions extended on average 53 bp upstream beyond previously identified TSSs (Figure 3C; McGrath et al, 2007). These essential control elements accommodate binding sites for transcription factors and RNA polymerase sigma factors (Supplementary Table 5). Of the 402 essential promoter regions, 26 mapped downstream of the predicted TSS. To determine if these contained an additional TSS, we fused the newly identified promoter regions with lacZ and found that 24 contained an additional TSS (Supplementary Table 6). Therefore, 24 genes contain at least 2 TSSs and only the downstream site was found to be essential during growth on rich media. The upstream TSS may be required under alternative growth conditions.

Cell cycle‐regulated essential genes

Of the essential ORFs, 84 have a cell cycle‐dependent transcription pattern (McGrath et al, 2007; Supplementary Data‐DT5). The cell cycle‐regulated essential genes had significantly longer promoter regions than non‐cell cycle‐regulated genes (median length 87 versus 41 bp, Mann–Whitney test, P‐value 0.0018). The genes with longer promoter regions generally have more complex transcriptional control. Among these are key genes that are critical for the commitment to energy requirements and regulatory controls for cell‐cycle progression. For example, the cell‐cycle master regulators ctrA, dnaA and gcrA (Collier et al, 2006) ranked among the genes with the longest essential promoter regions (Figure 3D and E; Supplementary Data‐DT5). Other essential cell cycle‐regulated genes with exceptionally long essential promoters included ribosomal genes, gyrB encoding DNA gyrase and the ftsZ cell‐division gene (Figure 3E). The essential promoter region of ctrA extended 171 bp upstream of the start codon (Figure 3F) and included two previously characterized promoters that control its transcription by both positive and negative feedback regulation (Domian et al, 1999; Tan et al, 2010). Only one of the two upstream SciP binding sites in the ctrA promoter (Tan et al, 2010) was contained within the essential promoter region (Figure 3F), suggesting that the regulatory function of the second SciP binding site upstream is non‐essential for growth on rich media.

Altogether, the essential Caulobacter genome contains at least 492 941 bp. Essential protein‐coding sequences comprise 90% of the essential genome. The remaining 10% consists of essential non‐coding RNA sequences, gene regulatory elements and essential genome replication features (Table I). Essential genome features are non‐uniformly distributed along the Caulobacter genome and enriched near the origin and the terminus regions, indicating that there are constraints on the chromosomal positioning of essential elements (Figure 4A). The chromosomal positions of the published E. coli essential coding sequences are preferentially located at either side of the origin (Figure 4A; Rocha, 2004).

Figure 4.

Chromosomal distribution of essential transcripts and phylogenetic conservation of essential Caulobacter ORFs. (A) Chromosome distribution of essential transcripts and ORFs found within a 500‐kb window. Top panel: Essential Caulobacter transcripts are non‐uniformly distributed and are enriched near the replication origin (ori) and terminus (ter) regions of the chromosome. The P‐values for enrichment of essential Caulobacter ORFs (middle panel) and E. coli ORFs (bottom panel) are graphed as a function of chromosomal position. Inserts show a schematic representation of the corresponding circular chromosomes indicating regions of enrichment in red. (B) A heat map showing the sequence conservation of essential Caulobacter proteins across the α, β, γ, δ and ε classes of proteobacteria. Highly conserved essential proteins are in red while poorly conserved proteins are in black. (C) Venn diagram of overlap between Caulobacter and E. coli ORFs (outer circles) as well as their subsets of essential ORFs (inner circles). Less than 38% of essential Caulobacter ORFs are conserved and essential in E. coli. Only essential Caulobacter ORFs present in the STING database were considered, leading to a small disparity in the total number of essential Caulobacter ORFs.

The question of what genes constitute the minimum set required for prokaryotic life has been generally estimated by comparative essentiality analysis (Carbone, 2006) and for a few species experimentally via large‐scale gene perturbation studies (Akerley et al, 1998; Hutchison et al, 1999; Kobayashi et al, 2003; Salama et al, 2004). Of the 480 essential Caulobacter ORFs, 38% are absent in most species outside the α‐proteobacteria and 10% are unique to Caulobacter (Figure 4B). Interestingly, among 320 essential Caulobacter proteins that are conserved in E. coli, more than one third are non‐essential in E. coli (Figure 4C). The variations in essential gene complements relate to differences in bacterial physiology and lifestyle. For example, ATP synthase components are essential for Caulobacter, but not for E. coli, since Caulobacter cannot produce ATP through fermentation. Thus, the essentiality of a gene is also defined by non‐local properties that depend not only on its own function but also on the functions of all other essential elements in the genome. The strategy described here provides a direct experimental approach that, because of its simplicity and general applicability, can be used to quickly determine the essential genome for a large class of bacterial species.

Materials and methods

Supplementary information includes descriptions of (i) transposon construction and mutagenesis, (ii) DNA library preparation and sequencing, (iii) sequence processing, (iv) essentiality analysis and (v) statistical data analysis.

Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary Information

Supplementary Figures S1–3, Supplementary Tables S1–7 [msb201158-sup-0001.pdf]

Dataset 1

Excel file containing several Supplemental data tables in different worksheets [msb201158-sup-0002.xls]

Acknowledgements

This research was supported by DOE Office of Science grant DE‐FG02‐05ER64136 to HM; NIH grants K25 GM070972‐01A2 to MF, R01, GM51426k, R01 GM32506 and GM073011‐04 to LS; Swiss National Foundation grant PA00P3‐126243 to BC and the L&Th. La Roche Foundation Fellowship to BC.

Author contributions: BC designed the research. BC, EA, VSK and MJF performed the experiments and analysis. BC, JMC, BP and JAC performed the sequencing and related analysis. BC, EA, HHM and LS wrote the manuscript.

References

This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.