Genetic screens for phenotypic similarity have made key contributions to associating genes with biological processes. With RNA interference (RNAi), highly parallel phenotyping of loss‐of‐function effects in cells has become feasible. One of the current challenges however is the computational categorization of visual phenotypes and the prediction of biological function and processes. In this study, we describe a combined computational and experimental approach to discover novel gene functions and explore functional relationships. We performed a genome‐wide RNAi screen in human cells and used quantitative descriptors derived from high‐throughput imaging to generate multiparametric phenotypic profiles. We show that profiles predicted functions of genes by phenotypic similarity. Specifically, we examined several candidates including the largely uncharacterized gene DONSON, which shared phenotype similarity with known factors of DNA damage response (DDR) and genomic integrity. Experimental evidence supports that DONSON is a novel centrosomal protein required for DDR signalling and genomic integrity. Multiparametric phenotyping by automated imaging and computational annotation is a powerful method for functional discovery and mapping the landscape of phenotypic responses to cellular perturbations.
Genetic screens for phenotypic similarity have made key contributions for associating genes with biological processes. Aggregating genes by similarity of their loss‐of‐function phenotype has provided insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein‐Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships. With RNA interference (RNAi), highly parallel phenotyping of loss‐of‐function effects in cultured cells has become feasible in many organisms whose genome have been sequenced (Boutros and Ahringer, 2008). One of the current challenges is the computational categorization of visual phenotypes and the prediction of gene function and associated biological processes. With large parts of the genome still being in unchartered territory, deriving functional information from large‐scale phenotype analysis promises to uncover novel gene–gene relationships and to generate functional maps to explore cellular processes.
In this study, we developed an automated approach using RNAi‐mediated cell phenotypes, multiparametric imaging and computational modelling to obtain functional information on previously uncharacterized genes. To generate broad, computer‐readable phenotypic signatures, we measured the effect of RNAi‐mediated knockdowns on changes of cell morphology in human cells on a genome‐wide scale. First, the several million cells were stained for nuclear and cytoskeletal markers and then imaged using automated microscopy. On the basis of fluorescent markers, we established an automated image analysis to classify individual cells (Figure 1A). After cell segmentation for determining nuclei and cell boundaries (Figure 1C), we computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F). Individual cells were categorized into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E). Each siRNA knockdown was summarized by a phenotypic profile and differences between RNAi knockdowns were quantified by the similarity between phenotypic profiles. We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints.
To visualize the distribution of all phenoprints, we plotted them in a genome‐wide map as a two‐dimensional representation of the phenotypic similarity relationships (Figure 3A). The complete data set and an interactive version of the phenotypic map are available at http://www.cellmorph.org. The map identified phenotypic ‘neighbourhoods’, which are characterized by cells with lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1; Figure 3B).
To test whether phenotypic similarity might serve as a predictor of gene function, we focused our further analysis on two clusters that contained genes associated with the DNA damage response (DDR) and genomic integrity (Figure 3A and C). The first phenotypic cluster included proteins with kinetochore‐associated functions such as NUF2 (Figure 3B) and SGOL1. It also contained the centrosomal protein CEP164 that has been described as an important mediator of the DNA damage‐activated signalling cascade (Sivasubramaniam et al, 2008) and the largely uncharacterized genes DONSON and SON. A second phenotypically distinct cluster included previously described components of the DDR pathway such as RRM1 (Figure 3A–C), CLSPN, PRIM2 and SETD8. Furthermore, this cluster contained the poorly characterized genes CADM1 and CD3EAP.
Cells activate a signalling cascade in response to DNA damage induced by exogenous and endogenous factors. Central are the kinases ATM and ATR as they serve as sensors of DNA damage and activators of further downstream kinases (Harper and Elledge, 2007; Cimprich and Cortez, 2008). To investigate whether DONSON, SON, CADM1 and CD3EAP, which were found in phenotypic ‘neighbourhoods’ to known DDR components, have a role in the DNA damage signalling pathway, we tested the effect of their depletion on the DDR on γ irradiation. As indicated by reduced CHEK1 phosphorylation, siRNA knock down of DONSON, SON, CD3EAP or CADM1 resulted in impaired DDR signalling on γ irradiation. Furthermore, knock down of DONSON or SON reduced phosphorylation of downstream effectors such as NBS1, CHEK1 and the histone variant H2AX on UVC irradiation. DONSON depletion also impaired recruitment of RPA2 onto chromatin and SON knockdown reduced RPA2 phosphorylation indicating that DONSON and SON presumably act downstream of the activation of ATM. In agreement to their phenotypic profile, these results suggest that DONSON, SON, CADM1 and CD3EAP are important mediators of the DDR. Further experiments demonstrated that they are also required for the maintenance of genomic integrity.
In summary, we show that genes with similar phenotypic profiles tend to share similar functions. The power of our computational and experimental approach is demonstrated by the identification of novel signalling regulators whose phenotypic profiles were found in proximity to known biological modules. Therefore, we believe that such phenotypic maps can serve as a resource for functional discovery and characterization of unknown genes. Furthermore, such approaches are also applicable for other perturbation reagents, such as small molecules in drug discovery and development. One could also envision combined maps that contain both siRNAs and small molecules to predict target–small molecule relationships and potential side effects.
How to predict gene function from phenotypic cues is a longstanding question in biology.
Using quantitative multiparametric imaging, RNAi‐mediated cell phenotypes were measured on a genome‐wide scale.
On the basis of phenotypic ‘neighbourhoods’, we identified previously uncharacterized human genes as mediators of the DNA damage response pathway and the maintenance of genomic integrity.
The phenotypic map is provided as an online resource at http://www.cellmorph.org for discovering further functional relationships for a broad spectrum of biological module
Forward genetic screens for visual phenotypes in model organisms have proven a powerful method for associating phenotypes with genes and for the prediction of functional relationships (Jorgensen and Mango, 2002; St Johnston, 2002). Aggregating genes by similarity of their loss‐of‐function phenotype has provided key insights into signalling pathways that have a conserved function from Drosophila to human (Nusslein‐Volhard and Wieschaus, 1980; Bier, 2005). Complex visual phenotypes, such as defects in pattern formation during development, greatly facilitated the classification of genes into pathways, and phenotypic similarities in many cases predicted molecular relationships.
The possibility of depleting specific transcripts by RNA interference (RNAi) in cells has resulted in new screening techniques for associating genes with biological processes (Dorsett and Tuschl, 2004; Echeverri et al, 2006; Moffat et al, 2006; Boutros and Ahringer, 2008). Simple, univariate phenotypes, such as measurements of cell viability or specific reporter gene activity, have been applied to screen for modifiers on a genome‐wide scale; however, the information content in finding different genes sharing the same univariate phenotype is limited (Boutros et al, 2004; DasGupta et al, 2005; Whitehurst et al, 2007). More complex multiparametric readouts from microscopy and automated image analysis promise broad coverage of distinct phenotypes. Such visual phenotypes could predict specific functional relationships between genes, while not being biased by the particular choice of phenotypic assay.
As the technology for carrying out large‐scale screens by imaging of cells has been established in recent years, a main challenge is the automated analysis of images and subsequent interpretation of cellular phenotypes. Extracting functional relationships on a genome‐wide scale is an unresolved problem. Cells respond to many extrinsic and intrinsic stimuli and change their shape, cell‐cycle status and proliferation rate. Similar to patterning decisions in whole organisms, such changes are broad reflectors of cellular functions and have the potential to simultaneously provide a multitude of views on biological processes. Annotation of phenotypes by visual inspection has been effective (Kiger et al, 2003; Eggert et al, 2004; Sonnichsen et al, 2005), but creates a bottleneck for larger experiments. Several studies used automated extraction of multiparametric computational descriptors of cellular morphology to analyse the effects of small molecules or siRNAs on cultured cells (Perlman et al, 2004; Neumann et al, 2006; Bakal et al, 2007; Loo et al, 2007; Jones et al, 2009). Owing to the complexity of the multivariate data analysis, these studies focused on small sets of reagents. Genome‐wide studies have been performed for simple, univariate phenotypes (Mukherji et al, 2006; Sims et al, 2009). Genome‐wide analysis of high‐dimensional, multiparametric descriptors, and hence the inference of more specific, multidimensional phenotypes, has remained a challenge.
In this study, we describe an experimental and computational approach to predict gene function genome‐wide based on changes in the morphology of individual cells within cell populations. We establish a computational method to separate multivariate variation of interest from noise induced by the environment or intrinsic stochastic behaviour of the cells, and to generate multivariate phenotypic profiles that summarize similarity and dissimilarity of phenotypes at the level of treated cell populations. We applied this methodology to a genome‐wide RNAi screen and produced a phenotypic map of 1820 siRNA‐mediated perturbations that led to significant phenotypic variation. We investigated the genes DONSON, SON, CADM1 and CD3EAP that were found in phenotypic proximity to known components of cell‐cycle regulation and DNA damage response (DDR). Further experiments characterized their roles in the maintenance of genomic integrity.
Automated analysis for high‐throughput imaging
We established a method for generating phenotypic profiles by automated microscopy, measuring the effects of RNAi‐mediated knockdowns on the morphology of HeLa cells (Figure 1). For each perturbation, 1000 cells were reverse transfected with siRNAs spotted in 384‐well plates. Each plate also contained negative controls and positive control siRNAs with known phenotypes. Forty‐eight hours after siRNA transfection, cells were fixed, stained for DNA and the cytoskeletal proteins actin and tubulin, and imaged using an automated microscope (Figure 1A and B).
To identify distinct phenotypes, we used automated image analysis to classify all individual cells based on nuclear and cytoskeletal fluorescent markers. Nuclei and cytoplasm boundaries of cells (Figure 1C) were determined by a segmentation algorithm. We then computed 51 cell descriptors that quantified intensities, shape characteristics and texture (Figure 1F; Supplementary information). The descriptors were rotationally invariant, robust to pixel noise and served as input for classification of cells by a support vector machine (SVM) (Boser et al, 1992). Cells were classified into 1 of 10 classes, which included cells showing protrusion/elongation, cells in metaphase, large cells, condensed cells, cells with lamellipodia and cellular debris (Figure 1D and E; Supplementary Table II). Feature selection and sensitivity analysis showed that the classification performance was not driven by single descriptors, but rather depended on the joint behaviour of multiple descriptors (Supplementary Figure 2).
Under wild type and each perturbed condition, imaged cell populations always consisted of multiple phenotypic classes. We reasoned that phenotypic information could be inferred from changes in frequencies of the different classes and from descriptor summaries over the cell populations. For each siRNA experiment, we computed a phenotypic profile, which is a vector of 13 numbers representing cell count, median cellular descriptors computed on the population and proportions of each cell class (Supplementary Table VII).
To assess the performance of the approach, we first carried out a small siRNA screen by targeting approximately 800 transcripts in HeLa cells, including most kinases and controls with known phenotypes (Figure 2A). Quantitative descriptors of cells after perturbations (Figure 2B) were evaluated to assess reproducibility. The results showed that median values of the cellular descriptors, computed over cell populations, were highly reproducible across replicate experiments, with a median correlation coefficient of 0.80 (Figure 2C and D). Cell classification accuracy, based on cellular descriptors, was estimated by cross‐validation to be 96–100%, depending on cell class (Table I). Median descriptors for the defined siRNA controls UBC, CLSPN and TRAPPC3 were well separated from those of the Renilla luciferase (Rluc) controls, showing multiparametric Z′ factors of 0.79, 0.62 and 0.60 (Figure 2D and E; Supplementary information).
Source data for Figure 2B
Source data for Figure 2C
Source data for Figure 2D
Source data for Figure 2E
The experiments also demonstrated that changes in individual descriptors contained phenotypically relevant information. For example, though Rluc siRNA‐treated cells showed a median size of 910 μm2, targeting the mitotic inhibitor kinase WEE1 led to significant smaller cells, with a median size of 453 μm2 (Wilcoxon rank sum test, P<10−15). Another example is PPP4C, whose depletion led to significantly elongated cells compared to Rluc treatment (median cell eccentricity of 0.66 compared to 0.59, Wilcoxon rank sum test, P<10−15). These results indicate high reproducibility of the experimental and computational approach, and its ability to quantitatively discern biologically relevant phenotypic variation.
Genome‐wide RNAi screen to cluster genes based on cellular descriptors
To cluster genes and predict function on a genome‐wide scale, we measured the effects of 22 839 siRNA‐mediated knockdowns on HeLa cells (see Materials and methods). We acquired four images per perturbation, from which ∼6.5 million cells were automatically segmented, quantified and classified (a comprehensive track of the analysis is provided at http://www.cellmorph.org). Each siRNA effect was summarized by a phenotypic profile as described above. On average, 324 cells were recorded in Rluc siRNA‐negative control treatments. siRNAs against PLK1, an essential cell‐cycle regulator, reduced the number of cells eight‐fold. siRNA‐mediated depletions resulted in a cell population comprising a variety of cell classes. For example, negative control Rluc wells contained on average 57.3% cells classified as normal, 15.0% condensed cells and 2.0% metaphase cells, with the remainder distributed among other classes.
To quantify the differences between perturbation effects, we measured the similarity between phenotypic profiles. As profiles are multiparametric descriptors whose components have different dynamic ranges and signal‐to‐noise ratios, because of cell intrinsic stochastic behaviour and experimental factors, we transformed the profile values of each siRNA perturbation into a set of scores ranging between 0 and 1, using for this a parametric family of sigmoidal functions (see Supplementary information). We termed the vector of scores a phenoprint (Figure 3C) and defined the phenotypic distance between a pair of perturbations as the distance between their corresponding phenoprints. We reasoned that pairs of related proteins should be enriched for smaller phenotypic distances, when compared with random pairs, and used distance metric learning (Xing et al, 2003) to determine the sigmoid transformation parameters that led to maximal enrichment. We used the STRING 7.1 dataset (von Mering et al, 2007) as a set of related protein pairs, considering all interactions with a confidence score higher than 0.4, including pairs predicted with the STRING neighbourhood, gene fusion, co‐occurence, co‐expression, experiments, databases and text mining methods. Out of the genome‐wide RNAi screen, 1820 siRNA perturbations produced non‐zero phenoprints. To test whether individual genes in STRING influenced the computation of the phenoprints, we removed all information about the genes RRM1, TMEM61, CLSPN, CADM1, CD3EAP, CEP164, NUF2, DONSON and SON (as shown in Figure 3C), recomputed their phenoprints and found that they were unaltered (Supplementary Figure 8). This indicates that the distance learning method is based on agglomerative properties of the training data, not on presence of individual gene pairs. We estimate that the false‐positive rate of the computational analysis is low because negative control siRNAs Rluc showed in 97.1% (132/136) of all cases a zero phenoprint; 0.8% (1/136) of positive controls (PLK1) scored as false negatives.
Visualization of phenotypic relationships and independent confirmation
To visualize the distribution of the 13‐dimensional phenoprints, we projected them into a 2‐dimensional map that is representative of the phenotypic similarity relationships (Figure 3A). The map shows the phenotypic landscape spanned by the genome‐wide perturbations. Seventeen clusters of similar phenoprints were defined and added to the map, to highlight different phenotypic regions (see Supplementary information). These clusters included phenotypes characterized by an over‐representation of cells with prominent lamellipodia (WNK3, ANXA4), cells with prominent actin fibres (ODF2, SOD3), abundance of large cells (CA14), many elongated cells (SH2B2, ELMO2), decrease in cell number (TPX2, COPB1, COPA), increase in number of cells in metaphase (BLR1, CIB2) and combinations of phenotypes such as presence of large cells with protrusions and bright nuclei (PTPRZ1, RRM1). Consistent with its phenoprint, ELMO2 has been described as an upstream regulator of the Rac1 GTPase (Gumienny et al, 2001), which controls the actin cytoskeleton organization. Other examples include COPB1 and COPA, which are essential for the maintenance of the Golgi architecture and are required for cell viability (Pepperkok et al, 1993; Guo et al, 1994; Neumann et al, 2006).
To test for biological reproducibility with independent siRNAs, we selected 604 out of 1820 siRNAs for retesting and confirmed that the phenotypes were reproduced for 310 (51.3%) genes in HeLa cells (Supplementary Table X), which is a rate similar to previously published reports (Tang et al, 2008). Furthermore, we assessed their phenotypes in a different cell type, osteosarcoma U2OS cells, in which 280 of the 310 siRNAs reproduced their phenotypes. Overall, 44% (122 of 280) were functionally uncharacterized; 31% (86 of 280) were implicated in cytoskeletal organization, intracellular transport, cell‐cycle regulation or DDR (Figure 3D). We also monitored knockdown efficiency in a representative set of genes by real‐time quantitative PCR (qPCR) and found that mRNA levels were reduced to at least 65% (Supplementary Figure 10). These experiments demonstrated that the screen identified functionally uncharacterized genes and reproducibly associated them with phenotypic clusters based on their cell population phenotypes.
Phenotypic proximity to predict gene function
Annotation of the 17 clusters, comprising 943 siRNA perturbations, indicated that genes having similar phenoprints tended to have similar biological functions (Supplementary Table XIII). For example, the cluster containing phenoprints characterized by bright nuclei includes several genes that are implicated in cell‐cycle regulation or DNA replication such as CDCA8, CENPI, DTL and MCM10. Interestingly, the three subunits RPA1, RPA2 and RPA3 of the replication factor A, required for DNA replication and DDR, are also present in this cluster. Furthermore, the subunits of the Golgi coatomer complex COPB2, COPZ and COPA, and the essential cell‐cycle kinase WEE1, all required for cell viability, had similar phenoprints. Comparisons of the phenotypic graph with functional networks including MouseNet (Kim et al, 2008) show significant enrichment of phenotypic similarities within network gene pairs (see Supplementary information).
To associate uncharacterized genes with biological processes, we selected candidate genes in proximity to known pathway components, and focused on phenotypic clusters that contained genes associated with DDR signalling and genomic integrity. We first examined a phenotypic cluster in proximity to CEP164 (Figure 3A and C). CEP164 is localized to the distal appendages of mature centrioles (Graser et al, 2007) and has been described as an important component in the DNA damage‐activated signalling cascade (Sivasubramaniam et al, 2008). The phenotypic cluster included other proteins implicated in kinetochore‐associated functions, such as NUF2 (DeLuca et al, 2002) (Figure 3B) and SGOL1 (Pouwels et al, 2007). In addition, we selected two largely uncharacterized factors: SON, a DNA binding protein localized at the mitotic spindles (Nousiainen et al, 2006), and DONSON (Figure 3B), which has not been studied earlier. The two genes are co‐located in the human genome at 21q22. We observed that, among other quantitative descriptors, cell populations of this cluster were characterized by a relative decrease in cell number and an increase in metaphase cells.
A second, phenotypically distinct cluster included RRM1 (Figure 3A–C), which is part of the ribonucleoside–diphosphate reductase complex required for replication and DNA repair (Xue et al, 2003), and CLSPN (Figure 3B), a protein required for the activation of the checkpoint kinase CHEK1, a key mediator of DDR (Kumagai and Dunphy, 2000; Liu et al, 2006). This cluster also contained other DDR genes such as PRIM2 (Weiner et al, 2007) and SETD8 (Jorgensen et al, 2007). From this cluster, we selected the poorly characterized genes CADM1 and CD3EAP and hypothesized that they might be components of the DDR pathway.
To confirm target specificity, we compared the effects of four single siRNAs targeting a gene to that of the siRNA pool used in the screen and assessed their knockdown efficiency by qPCR (Supplementary Figure 11). These experiments showed that for each candidate, at least two out of the four independent siRNAs were effective in target knockdown and reproduced the phenotypes observed in the screen (Supplementary Table XI).
DONSON is required for cell‐cycle progression and localizes to the centrosome
To further investigate the role of DONSON in cell‐cycle progression, we performed time‐course experiments with HeLa cells depleted for DONSON and measured the DNA content. As shown in Figure 4A, we observed a slower S‐phase progression compared to controls, suggesting a DNA replication defect, followed by accumulation of cells in G2/M at 48 h. We then assessed whether DONSON is required for DNA replication during S‐phase progression. U2OS cells were arrested with hydroxyurea (HU) in G1/S, and incorporation of bromodeoxyuridine (BrdU) was measured after release. As shown in Figure 4B, depletion of DONSON resulted in significant decrease in BrdU incorporation, indicating that DONSON is required for S‐phase transition. We next asked whether the level of DONSON protein is regulated during the cell cycle. U2OS cells were arrested either in G1/S and G2/M by HU or nocodazole, respectively. After release from HU, we assessed DONSON protein levels for up to 24 h by immunoblot analysis. As shown in Figure 4C, DONSON levels peaked at 6 h after release and preceded the cyclin A expression as a marker for G2. Furthermore, when cells were released from G2/M‐phase arrest, peak DONSON levels were observed later than phospho‐H3 as a marker for M phase (Figure 4C). Taken together, these experiments indicate that DONSON levels are regulated during cell division, peaking during S phase and that DONSON is required for cell‐cycle progression.
To examine the subcellular localization of DONSON protein during the cell cycle, we expressed a DONSON‐HA transgene, as we could not detect endogenous levels of DONSON in immunofluorescence experiments. We observed that DONSON localized in two perinuclear foci, which co‐localized with the centrosomal markers γ‐tubulin and centrin (Figure 4E and F; Supplementary Figure 15). In DONSON‐depleted U2OS and HeLa cells, we observed a 10‐fold increase of multipolar spindles in metaphase cells compared to control treatments (Figure 4D; Supplementary Figure 14). These results showed that DONSON protein is localized at the centrosome and is required for proper formation of bipolar spindles during cell‐cycle progression.
DONSON, SON, CADM1 and CD3EAP are required for genomic integrity
The bi‐orientation attachment of each chromosome to both poles of the mitotic spindle is essential for genomic integrity (Loncarek et al, 2007). To test the involvement of DONSON in genomic integrity, we used quantitative immunofluorescence to measure the formation of γH2AX foci as a marker of DDR signalling (Figure 5A and B; Supplementary Figures 12 and 13). We observed that DONSON depletion led to the formation of γH2AX foci similar to the depletion of the known DDR effectors RRM1 and CHEK1. In addition, we checked other candidate genes regarding their involvement in genomic integrity. Similar to DONSON knockdown, we observed increased γH2AX formation in cells depleted of SON, CADM1 and CD3EAP (Figure 5A and B). One possible explanation of elevated γH2AX foci formation is the accumulation of cells in S phase (Tanaka et al, 2006). To test this hypothesis, we monitored cyclin levels in DONSON‐depleted U2OS cells. As shown in Figure 5C, knock down of DONSON in U2OS cells led to increased cyclin D1 and decreased cyclin A and cyclin B1 levels, indicating that cells were arrested in G1 and that the elevated γH2AX foci formation observed in DONSON‐depleted cells is not caused by an accumulation of cells in S phase. In contrast to DONSON, SON‐depleted cells showed an enrichment of cyclin B1 protein, whereas cyclin A was decreased compared to the control treatment; this indicated that cells were arrested in G2/M (Figure 5C). In summary, these results suggest a role of DONSON, SON, CADM1 and CD3EAP in the maintenance of genomic integrity.
DONSON, SON, CADM1 and CD3EAP are mediators of the DDR
The maintenance of genomic integrity is dependent on a functional DDR, a process comprising multiple signal transduction pathways that coordinate cell‐cycle transitions, DNA replication, DNA repair and apoptosis (Cimprich and Cortez, 2008). The phosphorylation of CHEK1 is an early event in the DDR. To analyse the function of DONSON in DDR, we used immunoblot analysis to assess the phosphorylation of CHEK1 on γ irradiation in U2OS cells 48 h after transfection with DONSON. We observed that the phosphorylation of CHEK1 was strongly reduced in DONSON‐depleted cells compared to negative control (Figure 6A). Moreover, DONSON‐depleted cells showed an impaired phosphorylation of CHEK1 (Figure 6B and C) after UV exposure, similar to the effect of γ irradiation. Although ATM phosphorylation was not affected in DONSON‐depleted cells, we observed an attenuated phosphorylation of RPA2, H2AX and NBS1 (Figure 6C). These results indicate that DONSON acts downstream of ATM and upstream of CHEK1 in the DDR signalling cascade.
To test whether DONSON is required for the recruitment of DDR‐associated proteins onto chromatin, we extracted chromatin from UV‐irradiated cells and analysed RPA2 levels by immunoblot (Liu et al, 2007). As shown in Figure 6C, we observed that the amount of RPA2 on the chromatin was significantly decreased on DONSON knockdown. As DONSON depletion leads to G1 arrest, we cannot fully exclude the possibility that cell‐cycle arrest may in part contribute to the loss of RPA2, as shown previously for other factors (Szuts et al, 2003). However, these results suggest that DONSON is required for the recruitment of RPA2 to DNA lesions and for the activation of the DDR on γ and UV irradiation.
In addition to DONSON, we also tested several other phenotypically similar factors for a role in DDR signalling. We observed that knock down of SON, CADM1 and CD3EAP resulted in decreased phosphorylation of CHEK1 on γ irradiation (Figure 6A). Furthermore, depletion of SON impaired phosphorylation of RPA2, H2AX and NBS1 on UV irradiation. We also observed that RPA2 hyperphosphorylation is strongly attenuated after SON depletion in the chromatin fraction 2 h after UV exposure (Figure 6C).
In summary, our results suggest that SON, CADM1 and CD3EAP are components required for functional DDR response. Moreover, as ATM/ATR phosphorylation were not affected by the knock down of DONSON or SON, we integrate DONSON and SON in DDR signalling downstream the activation of ATM/ATR but upstream of RPA2 and NBS1.
We present an experimental and computational approach to create a phenotypic map of a genome‐wide set of RNAi‐mediated perturbations, using automated phenotyping of cell populations by high‐throughput imaging and multiparametric computational analysis. For ∼10% of targeted transcripts, we detected phenotypic changes on depletion of transcripts by RNAi. Similarity or dissimilarity of phenotypes was quantified from multiparametric descriptors by a computational method termed distance metric learning, which is able to learn a measure of (dis)similarity from a set of instances and to generalize this to unseen relationships. The derived measures for phenotypic similarity for each RNAi perturbation were visualized in a map and used to generate hypotheses about the function of genes. As the approach is unbiased, unexpected relationships can be uncovered. In this study, we investigated four previously poorly characterized genes, and described their roles in the DDR.
Cell morphology provides a broad reflection of many cellular processes, including cytoskeleton rearrangement, signalling, cell division and cell survival. Similar to forward genetic screens, the approach gives us the power to detect components in a wide range of processes. We believe that this approach ensures a coverage of many phenotypes and allows the analysis of multiple functions of the same gene, although it might miss modulators that could be detected with focused but more sensitive assays. However, it facilitates analysis of multiple functions of the same gene, and discovery of unanticipated functional relationships.
Quantifying similarity or dissimilarity between multiparametric phenotypic profiles is challenging, because of the different weights that can be given to the different parameters. For analysis on a genome‐wide scale, an automated and objective approach is needed. We used distance metric learning to transform raw phenotypic profiles, which are a combination of genetic, environmental and stochastic variation, into biologically relevant phenoprints that capture the significant effects of the genetic perturbations. In this machine learning approach, the parameters used for the transformation are learned by presenting the algorithm with training sets of phenotypic profile pairs that are considered similar, as well as with pairs that are unrelated. The learned metric is then applied to unseen data. An advantage of this approach is that it does not rely on arbitrary cut‐offs or ad hoc scoring functions. We note that distance metric learning has already been useful in other fields of biology. For instance, to compute the alignment score matrices used in protein sequence comparisons, such as PAM matrices (Dayhoff et al, 1978), their parameters are learned from a training set of naturally diverged sequences and the matrices are then used for inferring the degree of homology of protein sequences in general.
To test whether the prediction of gene function is accurate, we examined several candidates that were in phenotypic proximity to known components of the DDR and genomic integrity. We characterized the role of DONSON in DNA replication, proper spindle formation and the DDR. In agreement with our observation that knock down of DONSON led to the formation of multipolar spindles and the induction of G1 arrest, a loss of centrosomal integrity has been described earlier to result in checkpoint activation and inhibition of G1/S progression (Mikule et al, 2007). Moreover, several studies linked the loss of DDR components such as CHEK1 (Kramer et al, 2004) to impaired genomic integrity. We suggest that DONSON is part of a centrosomal network that integrates cell‐cycle arrest and repairs in response to genotoxic stress (Tang et al, 2006; Loffler et al, 2007). This model is supported by the finding that the centrosomal protein CEP164 mediates the DDR by interacting with ATM and ATR, thus forming a link between the DDR and the centrosome (Sivasubramaniam et al, 2008). Furthermore, we showed that SON, CADM1 and CD3EAP are required for the DDR. Recently, the loss of SON has also been implicated in leukaemogenesis (Ahn et al, 2008). The reduced pCHEK1 response caused by CD3EAP knockdown is consistent with CD3EAP being a downstream target of ATR on induced DNA damage (Matsuoka et al, 2007). Interestingly, CD3EAP is encoded by a gene locus that overlaps with ERCC1, a DNA excision repair gene that is often mutated in colorectal cancer (Skjelbred et al, 2006). Loss of DDR components results in increased basal DNA damage and ultimately in a loss of genomic integrity. Consistently, knock down of the candidate genes DONSON, SON, CADM1 and CD3EAP led to increased γH2AX formation, supporting a function of these genes not only in the DDR but also in the maintenance of genomic integrity.
The results show that the calculated phenoprints generate useful hypotheses. The complete data set and computational methods are available at http://www.cellmorph.org and can serve as a resource for further functional discovery. Cell population‐based phenotypic readouts are a sensitive method for deriving functional relationships and predict molecular functions.
Materials and methods
HeLa and U2OS cell lines were maintained in DMEM (Invitrogen) with 10% foetal bovine serum (FBS) (Invitrogen) and supplemented with penicillin (100 U/ml)/streptomycin (100 μg/ml) (Invitrogen). Cell lines were cultured at 37°C, 5% CO2 in a humidied incubator according to standard procedures.
An HA‐tagged version of human DONSON was generated by modification of a DONSON cDNA clone (SC111799, OriGene). In brief, Don‐HA forward and reverse primers (Supplementary Table I), and Phusion polymerase (Finnzymes) were used to amplify the DONSON coding sequence, cloned in frame with a C‐terminal HA tag. The resulting PCR product was inserted into the pcDNA3.1(+) (Invitrogen) using NheI and EcoRI restriction sites. Cloning was confirmed by sequencing.
Generation of DONSON antibody
A rabbit anti‐DONSON antibody was raised against a PGFRKPPEVVRLRRKRAR peptide (Charles River Laboratories). In western blots, the antibody specifically recognized a band corresponding to a size of 67 kDa. Specificity of the antibody was confirmed by siRNA knock down of DONSON.
Human siRNA library
The genome‐wide siRNA library siGENOME (Dharmacon) was re‐annotated against the NCBI RefSeq database (release 27). siRNA sequences were mapped to RefSeq transcripts and assigned to the corresponding gene. Out of the 21 061 siRNA pools, 17 145 were predicted to specifically target a single gene by perfect sequence identity.
High‐throughput RNAi screening
For cell‐based screening, the siRNA library was arrayed in black 384‐well, clear bottom plates (Corning #3712 or BD Flacon #353962) using a Biomek FX200 liquid handling system (Beckman Coulter). Each well contained 5 μl of a 500 nM pool of four synthetic siRNA duplexes (dissolved in 1 × siRNA solution buffer, Dharmacon). Library siRNAs were spotted in columns 5–24; the remaining columns were used for controls. Positions A04 and B04 contained an siRNA pool targeting Rluc as negative control. Positions I04 and J04 contained an siRNA pool targeting PLK1. Reverse transfection of cells with siRNA pools was performed by delivering 15 μl of RPMI (Invitrogen) containing 0.05 μl of Dharmafect1 (Dharmacon). After 30 min of incubation at room temperature, 1000 HeLa cells in 30 μl of DMEM medium (Invitrogen) supplemented with 10% FBS (Invitrogen) were added to the siRNA transfection mix. All dispensing steps were performed with a Multidrop Combi dispensing system (Thermo). Plates were incubated for 48 h at 37°C/5% CO2.
For fluorescence microscopy, cells were fixed, permeabilized and immunostained for DNA, tubulin and actin using a Beckman FX200 liquid handling robot. First, cells were fixed with 5% paraformaldehyde (PFA) in PBS for 20 min at room temperature and then permeabilized with 0.2% Triton X‐100 in PBS. Subsequently, cells were washed with 0.05% TX‐100 in PBS and incubated with blocking buffer (0.05% TX‐100/3% BSA in PBS). Cells were incubated overnight at 4°C with primary antibody anti‐α‐tubulin (Sigma, DM1A/T9026) diluted in PBS/0.05% TX‐100/3% BSA. Then, cells were washed three times with PBS and incubated with Alexa 488 secondary antibody (1:500 in PBS, Invitrogen) for 45 min at room temperature. Actin was stained with TRITC‐phalloidin (Sigma) in the secondary antibody solution, and DNA with Hoechst 33242 (Invitrogen). Cells were washed with PBS and stored therein. Batches of 10 384‐well plates were processed per batch.
Images were acquired on an automated BD Pathway 855 Bioimaging System (Becton Dickinson) with a 20 × objective (Olympus, NA=0.75) and a Hamamatsu monochrome digital black and white camera (Orca‐ER). For high‐throughput screening, plates were loaded onto the microscope with a Twister II Microplate robotic arm. Images of four different positions in each well were acquired, each containing channels for Hoechst (DNA), Alexa 488 (tubulin) and TRITC (actin). Each image had a resolution of 670 × 510 pixels with each channel at 12‐bit intensity resolution. The total number of cells measured in a well was typically around 300. For the primary screen, a total of 22 839 wells were imaged in four spots each, using three channels, resulting in 274 068 grey scale images. The combination of the three channels leads to 91 356 three‐colour images.
Automated phenotyping of cells
We used the EBImage package to perform the image processing operations (Pau et al, 2010). Images consisted of three channels (actin, tubulin and DNA). First, nuclei were segmented by adaptive thresholding of the DNA channel, distance map computation and watershed segmentation. Next, nuclei were regularized by morphological opening, and internal holes were removed by inverse flood fill (see Supplementary information). Cell boundaries were identified by Voronoi segmentation (Jones et al, 2009). Each cell was characterized by a set of 51 morphological descriptors, computed on different channels, which were grouped into 4 categories: 9 geometric features, 26 Haralick textural features, 11 Zernike moments and 5 miscellaneous features (see Supplementary information). All descriptors are translation and rotation invariant and are commonly used in cell imaging for classification purposes (Loo et al, 2007). Cells were classified into 1 of 10 classes (Supplementary Table II), using the ‘one‐against‐one’ multi‐class strategy of the SVM algorithm, with a training set of 2545 manually annotated cells (Supplementary Tables III, IV, V and VI). Each well subject to siRNA perturbation was summarized by a vector of 13 numerical descriptors (Supplementary Table VII). Each value of a phenotypic profile was transformed to a [0,1] score using a set of parameterized monotonic functions, leading to a vector of 19 scores, which we termed a phenoprint. A phenoprint was considered a hit if at least one of its elements was larger than 0.5. The phenotypic distance between two perturbations was defined as the distance between two phenoprints, according to a modified L1 metric (see Supplementary information).
Assessment of the siRNA target specificity
To confirm siRNA‐mediated phenotypic changes observed in the genome‐wide screen, we used independent siRNA pools (Qiagen) for 604 gene transcripts (Supplementary Table X). The phenotype of the siRNA perturbation was monitored by microscopy as described above in HeLa and U2OS cells. Phenotypic changes were visually confirmed for 51% of target genes. For 60 of the 604 target genes, we additionally used individual siRNA duplexes (Dharmacon). In this study, 73% of phenotypic changes were visually confirmed with at least two out of four individual siRNAs (Supplementary Table XI). siRNA sequences are listed in Supplementary Table XII and can also be accessed at http://rnai.dkfz.de.
DNA content profiling using high‐throughput immunocytometry
After transfection with siRNAs, cells were fixed by adding ice‐cold 80% EtOH for 30 min. Cells were rehydrated by washing two times with PBS. After RNaseA digest (100 μg/ml) for 1 h at 37°C, cells were stained for 15 min in PBS containing 10 μg/ml propidium iodide (Molecular Probes). The stained cells were scanned with an Acumen Explorer eX3 microplate cytometer (TTP LabTech). The resulting DNA content histograms were manually gated.
Assessment of γH2AX foci formation
For qualitative analysis of γH2AX foci formation, cells were reverse transfected on glass coverslips. Cells were fixed, permeabilized and incubated with the primary antibody to phospho‐H2AX (Upstate Biotechnology, 1:300) and subsequently with Alexa 488‐conjugated secondary antibody (Invitrogen, 1:500). DNA was counterstained with Hoechst 33342 (Sigma, 1:1000 in PBS). Coverslips were mounted onto glass slides with Fluoromount G (Southern Biotech) mounting media. An Axioimager Z1 Microscope (Zeiss) equipped with a × 63 oil objective (Zeiss, NA=1.4) and an apotome were used for image acquisition.
Quantitative assessment of γH2AX foci formation was performed by reverse transfection of cells with siRNAs in 384‐well plates. After incubation for 72 h, γH2AX foci formation was monitored by immunofluorescence staining of phospho‐H2AX as described above and acquired using an Acumen Explorer microplate reader.
Forty‐eight hours after siRNA transfection, cells were either γ irradiated (10 Gy) or UVC irradiated (20 J/m2) and collected 1 and 2 h later. Whole‐cell lysates were prepared in urea buffer (8 M urea, 0.1 M NaH2PO4, 10 mM TRIS–HCl, pH 7.5–8, protease inhibitors). Samples were subjected to SDS–PAGE (NuPage, 4–12% Bis–Tris; Invitrogen), transferred to PVDF membrane (Immobilon, Millipore) and subsequently incubated with antibodies. The following primary antibodies were used: CHEK1 (#2345), pCHEK1 (#2344), pNBS1 (#3001), NBS1 (#3000), cyclin A (#4656), pATM (#4526), ATM (#2873) and MCM2 (#4007) from Cell Signaling Technology, cyclin D1 (SC‐8396) from Santa Cruz, RPA2 (NA19L) and PARP (AM30) from Calbiochem, phospho‐H2AX (#07‐164) and cyclin B1 (#05‐373) from Millipore, actin (ab6276) from Abcam and SON (HPA023535) from Sigma. Secondary HRP‐conjugated anti‐mouse or rabbit antibodies were obtained from Amersham.
For chromatin isolation, cells were washed with PBS, resuspended in CSK buffer (10 mM Pipes, pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 1 mM EGTA, 50 mM NaF, 0.1 mM Na‐orthovanadate (Sigma‐Aldrich), 0.1% Triton X‐100 (Sigma‐Aldrich) and protease inhibitors (Roche)), and incubated on ice for 10 min. Cytoplasmic proteins were separated from nucleic proteins by low‐speed centrifugation at 1300 g for 5 min. Isolated nuclei proteins were washed once in CSK buffer and lysed in solution B (3 mM EDTA, 0.2 mM EGTA, 1 mM DTT, and protease inhibitors). After centrifugation (1700 g for 5 min), pellets were resuspended in urea buffer, 2 × SDS loading buffer was added and samples were boiled for 10 min.
Subcellular localization of DONSON
Cells were transfected with hDonson‐HA plasmid using Trans‐IT LT (Mirus) transfection reagent; 24 h after plasmid transfection, cells were either fixed in −20°C methanol/acetone (1:1) for 7 min or PFA after permeabilization with PBS supplemented with 0.05% Triton X‐100. Immunofluorescence microscopy was performed with rabbit anti‐HA polyclonal antibodies (Sigma #H6908) and mouse anti‐α‐tubulin (Sigma #T6557) or rabbit anti‐centrin (Sigma #C7736) and mouse anti‐HA antibodies (CST #2367). Alexa Fluor 488 goat anti‐rabbit (Invitrogen #A‐11008), Alexa Fluor 488 goat anti‐mouse (Invitrogen #A‐11001), Alexa Fluor 594 goat anti‐mouse (Invitrogen #A‐11005) and Alexa Fluor 594 goat anti‐rabbit (Invitrogen #A‐11012) were used as secondary antibodies. Images were acquired using an Axio CellObserver (Zeiss) equipped with a Colibri LED light source, standard fluorescence filters and a 63 × oil objective (Zeiss, NA=1.4).
Quantitative real‐time PCR
Forty‐eight hours after siRNA transfection, RNA was isolated with the RNeasy Mini kit (Qiagen) and 1 μg RNA was used as a template for cDNA synthesis using oligo‐dT primer and the RevertAid H Minus First Stand cDNA Synthesis kit (Fermentas #K1632). Expression levels were normalized against GAPDH expression. The level of target gene knockdown was quantified in three independent experiments. Quantifications were performed using the Universal Probe Library (Roche) on a Light Cycler 480 (Roche). Primers for real‐time PCR were designed using ProbeFinder (Roche) and are provided in Supplementary Table I.
We are grateful to Moritz Gilsdorf and Zeynep Arziman for bioinformatic support, Aennas Abbas for technical support and Richard Bourgon for helpful comments on the paper. TH is supported by a fellowship of the Studienstiftung. The research was supported in part by a Marie Curie Excellence Grant from the European Commission, the Helmholtz Association Alliance for Systems Biology, the BMBF and a Research Grant from the Human Frontiers Sciences Program Organization.
Conflict of Interest
The authors declare that they have no conflict of interest.
Supplementary text, Supplementary figures S1–15, Supplementary tables SI–XII
Supplementary Table XIII
This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
- Copyright © 2010 EMBO and Macmillan Publishers Limited