Open Access

Patterns of basal signaling heterogeneity can distinguish cellular populations with different drug sensitivities

Dinesh Kumar Singh, Chin‐Jen Ku, Chonlarat Wichaidit, Robert J Steininger, Lani F Wu, Steven J Altschuler

Author Affiliations

  1. Dinesh Kumar Singh1,,
  2. Chin‐Jen Ku1,,
  3. Chonlarat Wichaidit1,,
  4. Robert J Steininger III1,
  5. Lani F Wu1 and
  6. Steven J Altschuler*,1
  1. 1 Department of Pharmacology, Green Center for Systems Biology, Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
  1. *Corresponding author. Department of Pharmacology and Systems Biology, University of Texas Southwestern Medical Center, 6001 Forest Park Road, Dallas, TX 75390‐9041, USA. Tel.: +1 214 645 6183; Fax: +1 214 645 5965; E-mail: steven.altschuler{at}
  1. These authors contributed equally to this work


Phenotypic heterogeneity has been widely observed in cellular populations. However, the extent to which heterogeneity contains biologically or clinically important information is not well understood. Here, we investigated whether patterns of basal signaling heterogeneity, in untreated cancer cell populations, could distinguish cellular populations with different drug sensitivities. We modeled cellular heterogeneity as a mixture of stereotyped signaling states, identified based on colocalization patterns of activated signaling molecules from microscopy images. We found that patterns of heterogeneity could be used to separate the most sensitive and resistant populations to paclitaxel within a set of H460 lung cancer clones and within the NCI‐60 panel of cancer cell lines, but not for a set of less heterogeneous, immortalized noncancer human bronchial epithelial cell (HBEC) clones. Our results suggest that patterns of signaling heterogeneity, characterized as ensembles of a small number of distinct phenotypic states, can reveal functional differences among cellular populations.

Visual Overview


A high degree of phenotypic diversity has been classically observed among cancer cells, even within a single tumor (Heppner, 1984; Anderson et al, 2006; Ichim and Wells, 2006; Campbell and Polyak, 2007). Importantly, not all cancer cells contribute equally to disease progression or respond equally to therapeutic intervention (Campbell and Polyak, 2007). This heterogeneity has traditionally been viewed as an impediment to efficient diagnosis and treatment. Understanding the relevance of cellular diversity to cancer requires methods for relating patterns of phenotypic heterogeneity to functional outcomes, such as drug sensitivity. Recent advances in fluorescence microscopy image‐based analysis have enabled quantitative single‐cell measurements of the activation and (co‐)localization of signaling molecules within large cellular populations (Boland and Murphy, 2001; Perlman et al, 2004). Here, we apply this technology to explore the extent to which patterns of basal signaling heterogeneity, present within cancer populations before treatment, reveal information about population‐level response to drug perturbation.

To investigate basal cell signaling heterogeneity among a collection of cancer populations having minimal exogenous differences, such as those due to environment, cell type, and genetic background, we generated a collection of 49 low‐passage clonal populations from the highly metastatic nonsmall cell lung cancer cell line H460 (Kozaki et al, 2000). We chose to observe patterns of spatial organization and activation for multiple components from diverse signaling pathways associated with cancer (marker sets 1–4: DNA/pSTAT3/pPTEN; DNA/pERK/pP38; DNA/E‐cadherin/β‐catenin/pGSK3; DNA/pAkt/H3K9‐Ac).

We identified an objective set of signaling stereotypes from each marker set based on a probabilistic description of the distribution of cells in the feature space. For each marker set, a ‘reference’ set of representative cells was sampled from all 50 H460 cancer populations. Then, each reference set was represented as a mixture of subpopulations modeled as Gaussian distributions with means centered on distinct, ‘stereotyped’ signaling states (Slack et al, 2008). Our quantitative analysis suggested that a small collection of signaling stereotypes was sufficient to characterize the complexity of observed cellular phenotypes among all clones. For simplicity, we chose to use five subpopulations to model cellular heterogeneity in each marker set.

For each clone, we computed the fraction of cells in each of the identified subpopulations (Figure 2, scatter plots). Estimation of these fractions allowed us to represent each clone as a probabilistic ensemble of subpopulations. Visual differences among the clones (Figure 2, thumbnail images) were reflected by clear differences in subpopulation mixtures (Figure 2, scatter plots). To compare the subpopulation mixtures of each clone to the parent, a ‘subpopulation enrichment’ profile vector was computed. The vector measured the log‐fold change between the clone and the H460 parent population for each subpopulation (Figure 2, heat map).

We applied hierarchical clustering to group clones based on the similarity of their subpopulation enrichment profiles (Figure 2). Clustering by subpopulation enrichment profiles revealed only a small number of distinct patterns (or ‘signatures’) of subpopulation mixtures (Figure 2, dendrogram and heat map). Thus, parameterization of observed cellular heterogeneity using subpopulation enrichment profiles succinctly encapsulated the apparent complexity of cancer cell phenotypes, and further allowed comparison of clonal populations at a resolution greater than provided by population means.

We next assessed the degree to which clones with distinct patterns of heterogeneity had distinct responses to the drug paclitaxel. We used a multidimensional scaling (Borg and Groenen, 1997) plot to visualize similarity among the clones and annotated each clone with the index of drug sensitivity. This visualization revealed striking geometric separation in ‘profile space’ of paclitaxel‐sensitive from paclitaxel‐nonsensitive clones for each marker set (Figure 3A, green versus red and black circles). The significance of separation was further confirmed by machine learning‐based classification studies. Thus heterogeneity of basal cellular signaling states contained information that could be used to predict sensitivity to drug treatment.

Our approach is general, and makes heterogeneity a computable property of cellular populations. Interrogation at subpopulation‐resolution facilitated a dramatic reduction in the observed phenotypic complexity of cancer populations, yet retained sufficient biological information to identify drug responses. Our work suggests that rigorous analysis of cancer heterogeneity can provide a new resolution at which to match disease to more effective therapies.

  • Non small cell lung cancer H460 clones exhibit a high degree of heterogeneity in signaling states.

  • Clones with similar patterns of basal signaling heterogeneity have similar paclitaxel sensitivities.

  • Models of signaling heterogeneity among the clones can be used to classify sensitivity to paclitaxel for other cancer populations.


Phenotypic heterogeneity is a commonly observed phenomenon in biology (Elsasser, 1984; Rubin, 1990). The physiological importance of phenotypic heterogeneity within cellular populations has been poorly understood. However, a growing body of evidence suggests that heterogeneity—even within clonal populations—may have functional consequences, such as effects on survival odds or homeostatic responses in response to fluctuating environments, pathogen invasion, or drug treatment (Luria and Delbruck, 1943; Balaban et al, 2004; Anderson et al, 2006; Suel et al, 2007; Chang et al, 2008; Cohen et al, 2008; Feinerman et al, 2008; Gascoigne and Taylor, 2008; Wilson et al, 2008). Many studies have focused on identifying a molecular basis for the origins of observed heterogeneity (Snijder et al, 2009; Spencer et al, 2009). However, regardless of its origins, there are many intriguing questions regarding whether heterogeneity contains biological information. Is heterogeneity a reproducible property of cellular populations? At what resolution should heterogeneity be examined? Do different patterns of heterogeneity reflect functional differences among cellular populations? And, does heterogeneity, observed with different readouts, contain similar information?

We choose cancer as a biological context to investigate whether information is contained in cellular heterogeneity. Classically, cancer cells have been shown to exhibit a high degree of heterogeneity in phenotypes, such as signaling and drug response (Heppner, 1984; Rubin, 1990; Anderson et al, 2006; Ichim and Wells, 2006; Campbell and Polyak, 2007; Cohen et al, 2008; Gascoigne and Taylor, 2008). In practice, this phenotypic heterogeneity is often ignored as ‘noise’ or viewed as an impediment to understanding the response of cancer cells to drugs. Determining the response of cancer cell populations to drug perturbations is an important challenge in basic and clinical research. Promising results based on population‐averaged methods have come from large‐scale profiling of genomes (van 't Veer et al, 2002), mRNAs, and miRNAs across different cancer populations (van 't Veer et al, 2002; Lu et al, 2005; Nagrath et al, 2007; Schlabach et al, 2008). When specific drug‐response pathways are known, directed studies of mutational heterogeneity among cancer populations can also be effective in searching for signatures of resistance (Choi et al, 2007; Engelman et al, 2007; Liegl et al, 2008). These approaches require pooling analytes from many cancer cells, which obscures information that might be encoded as cellular heterogeneity within a cancer population. Recent studies have begun quantifying cellular variability within single cancer populations after perturbation with drug treatment (Cohen et al, 2008; Gascoigne and Taylor, 2008; Slack et al, 2008; Brock et al, 2009; Spencer et al, 2009). However, it is unknown what information can be revealed through characterization of heterogeneity before treatment, and further, whether such measures can be reliably related to the drug sensitivities of cancer populations.

Understanding the relevance—if any—of cellular diversity to cancer requires quantitative approaches for relating patterns of heterogeneity to functional outcomes, such as drug sensitivity. In practice, close examination of any cellular population will reveal heterogeneity, and it is a challenge to identify which components of phenotypic variability contain functionally important information. Developments in high‐content imaging and flow cytometry have enabled the comparison of heterogeneity across multiple populations and conditions (Singh et al, 2003; Kiel et al, 2005; Wang et al, 2007; Kotecha et al, 2008; Slack et al, 2008). Image‐based methods can capture phenotypic heterogeneity arising from the spatial distribution of signaling molecules within individual cells and, ultimately, be extended to account for other, higher‐order determinants of in vivo heterogeneity, including spatial organization and microenvironment within healthy and diseased tissues.

Earlier, we developed a quantitative, image‐based approach to characterize heterogeneity observed within and among cellular populations, based on patterns of signaling marker colocalization (Slack et al, 2008). The heterogeneous responses of drug‐treated cancer populations were characterized as mixtures of phenotypically distinct subpopulations, each modeled around a ‘stereotyped’ cellular phenotype. Patterns of heterogeneous responses were shown to be reproducible, and models of heterogeneity—based on a limited, but nontrivial number of subpopulations—were shown to be sufficient to distinguish different classes of drugs based on their mechanism of action. Here, in complement to our previous study, we investigated the extent to which patterns of basal signaling heterogeneity, present within cancer populations before treatment, revealed information about population‐level response to drug perturbation. In this case, we used prediction of population drug sensitivity as an objective measure of the degree to which our decomposition of heterogeneity contained biological information.


Experimental approach for capturing heterogeneity of basal signaling states

Determining which aspects of heterogeneity contain information requires a collection of populations with diverse outcomes for a specific functional readout. We initiated our studies by generating a collection of 49 low‐passage clonal populations from the highly metastatic non‐small cell lung cancer cell line H460 (Supplementary Figure 1A) (Ichim and Wells, 2006). Consistent with earlier studies of clonal populations, variability among the H460 clones was observed for functional readouts such as growth rate, total cell count, local cell density, and cell morphology (Supplementary Figure 2) (Heppner, 1984; Carney et al, 1985). This collection of cancer populations, with similar genetics and cell type, therefore, provided an ideal test bed for our investigations.

Figure 1.

Non‐small cell lung cancer H460 clones exhibit a high degree of phenotypic heterogeneity. (A) (Top) Cellular heterogeneity can be characterized as a mixture of phenotypically distinct subpopulations using a Gaussian mixture model (GMM). Shown is the result of computing a ‘reference’ GMM of five subpopulations. Points in GMM scatter plots correspond to individual cells, visualized through feature representation and PCA reduction to two dimensions. Colored ellipses represent covariance 1 s.d. from the mean for each Gaussian cluster (see Supplementary information); cells in this (and all subsequent) scatter plot are colored by the subpopulation of maximum probability. (Bottom) Images of four representative cells from each computed subpopulation are shown. (B) Clones display phenotypically diverse signaling states as measured by activation and colocalization patterns of pSTAT3 and pPTEN immunostaining. Although some clones are phenotypically similar to the parent (e.g. clone 65), others are dramatically dissimilar to the parent (e.g. clone 100). (C) Heterogeneity observed in each clonal population is summarized with a subpopulation profile: a vector estimating the proportion of cells in each subpopulation. (D) Cell populations with similar overall distributions of marker intensities may have dramatically different proportions of subpopulations. Stacked histograms of subpopulation intensities are shown for the parent and two clones. Colors correspond to subpopulations identified in (A). Black outlines correspond to overall histogram; vertical lines indicate population medians. Pseudocolors for images in (A) are specified above the scatter plots. Pseudocolors for images in (B) are: DNA‐blue, pSTAT3‐green, pPTEN‐red. Scale bars: 20 μm. MS1 refers to marker set 1.

Which cellular readouts should be selected to capture heterogeneity? One approach is to select specific biomarkers that target conjectured or known links between cellular mechanism and functional outcome (Snijder et al, 2009). However, the focus of our study was to identify signatures of heterogeneity that may be informative in the context of diverse cancer types. Therefore, we took an alternative approach and selected combinations of general signaling readouts to capture the heterogeneity of cellular populations in ‘basal’ (untreated) conditions. Four multiplexed immunofluorescent marker sets (MS) were chosen and studied independently (Supplementary Table 1; MS1: DNA/pSTAT3/pPTEN; MS2: DNA/pERK/pP38; MS3: DNA/E‐cadherin/pGSK3‐β/β‐catenin; and MS4: DNA/pAkt/H3K9‐Ac). These biomarkers, selected to monitor the activity levels of key signal transduction components associated with diverse areas of cancer biology (Bremnes et al, 2002; Pandolfi, 2004; Zhou et al, 2004; Haura et al, 2005; Normanno et al, 2006; Stewart et al, 2006; Barre et al, 2007; Rocques et al, 2007) enabled us to obtain a snapshot of the ensemble of cellular signaling states present within our clonal cancer populations.

Identification of common cellular signaling stereotypes

A wide range of signaling phenotypes was observed within and across untreated clonal populations based on immunofluorescent microscopy images of MS1. Although some clones appeared by eye to be phenotypically similar to the parent, other clones appeared quite different (Figure 1B; Supplementary Figure 1B). In addition, within each clone we observed cells with diverse signaling patterns as defined by marker intensity and colocalization (Supplementary Figures 1C and D). However, closer inspection of all 50 cancer populations suggested that most cell phenotypes fell into a relatively small number of signaling ‘stereotypes’; that is, each stereotype was present, to varying degrees of proportion, within all clones (Supplementary Figures 1B–D and 3). These observations suggested that each clonal population could be characterized as a mixture of a small number of common signaling stereotypes.

Figure 2.

Distinct patterns of signaling heterogeneity can be compared across H460 clones. Shown are results computed using marker set 1 (DNA/pSTAT3/pPTEN); pseudocolors for the thumbnail images are as in Figure 1B. At the top are the representative GMM scatter plots from eight clones and the parent (P) culture. Below are thumbnail images of each clone. Yellow/blue heat map shows enrichment/de‐enrichment of subpopulations (rows) for all clones (columns). Profiles are computed as the log ratio of clone subpopulation proportions relative to the parent. Clone clustering is determined by hierarchical clustering (dendrogram at bottom). The dendrogram is plotted to produce decreasing average sensitivity to paclitaxel (colored squares above the heat map; Supplementary information). Paclitaxel sensitivity is scored relative to the parent and displayed in red (resistant) and green (sensitive) color scale (gray: paclitaxel‐sensitivity scores of clones 33 and 35 are unreliable due to an image‐focus problem.) (Supplementary information). Similar results using marker sets 2–4 are shown in Supplementary Figure 6.

To capture common signaling stereotypes among the clones, we applied an earlier developed approach for approximating cellular distributions as mixtures of subpopulations, which is unbiased by prior knowledge of cell‐ or marker‐specific phenotypes (Supplementary information) (Slack et al, 2008). In summary, we analyzed each MS independently as follows. We applied automated cell segmentation to our image data (Loo et al, 2007), extracted cellular features from ratios of marker intensities at every pixel within a cellular region, and identified a small number (∼⩽10) of ‘maximally informative’ signaling features by principal component analysis (PCA) (Supplementary information) (Turk and Pentland, 1991; Slack et al, 2008). These PCA‐based features were used in all subsequent analysis (though only the first two dimensions, PC1 and PC2, are used for visualization).

Approximately 4000 cells were analyzed per MS and per clone (∼200 000 cells per MS). For each MS, a ‘reference’ set of cells was generated by randomly subsampling ∼10% of cells from all the 50 H460 populations. Finally, each reference set was represented as a mixture of subpopulations, modeled as Gaussian distributions with means centered on distinct, ‘stereotyped’ signaling states (Figure 1A, top panel; Supplementary information). (Other choices of distributions for approximating local, high‐density regions of cellular feature space, such as skew t‐distributions (Pyne et al, 2009), could be made in future studies. Such choices may provide better approximations when the distributions are not normally distributed or may better model specific biological phenotypes.) We then used our mixture model to assign to each cell a probability of belonging to each subpopulation. These probabilities were used for all subsequent analysis, though for visualization purposes cells were assigned to the subpopulation of highest probability (Figure 1A, scatter plot).

The heterogeneity of each cell population was estimated by using our computed reference subpopulation model (Slack et al, 2008). In brief, the (posterior) probability of each cell belonging to the identified subpopulations was computed using Bayes’ rule and represented as a probability vector whose entries summed to one. An expected overall proportion of each subpopulation was computed by averaging these probability vectors over the cell population to obtain a subpopulation profile. Replicates were averaged to obtain a single final profile of subpopulation fractions per condition. In essence, these profiles of probability vectors (p1,…,pk) yielded a decomposition of each population, D, as a weighted mixture, D≈∑psDs, of the k reference subpopulation distributions, Ds. These profiles provided interpretable summarizations of heterogeneity present within the clones, and captured differences in subpopulation fractions, such as due to enrichment of cells into different phenotypic states and/or general population shifts.

To evaluate the optimal number of subpopulations, we applied two standard model fit criteria: Bayesian information‐theoretical criterion and the Gap statistics. These standard performance metrics evaluate models by rewarding fit to data, but penalize over fitting due to increased model complexity. Our results suggested that cellular heterogeneity among all 50 H460 populations in our four MS could be reasonably modeled by a low number (3–7) of signaling stereotypes (Supplementary Figure 4). For convenience, in subsequent analysis we chose to use reference models of five subpopulations for all MS; this choice is in line with the estimates of model fit, and allowed us to test whether a small number of subpopulations could capture information contained in cellular heterogeneity.

Examination of representative cells from the five identified subpopulations (chosen near the mean of each Gaussian distribution) revealed consistent and significant differences in the activation levels of key signaling proteins (Figure 1A, bottom panel). (We noted that the subpopulations were not particularly enriched for specific cell‐cycle states; Supplementary Figure 5.) Importantly, identification of these subpopulations revealed dramatic differences in heterogeneity among clones that were not easily distinguished on the basis of population‐level statistics of average cellular marker expression alone. For example, clone #65, clone #100, and the parent have essentially indistinguishable means and relatively similar distributions of intensities for pSTAT3 and pPTEN in MS1 (Figure 1B–D). However, the mixture of subpopulations for clones #100, 65 and the parent were distinct (though #65 appears closer to the parent mixture than #100) (Figure 1C and D). These small collections of subpopulation phenotypes provided an intermediate (less complex than single cell but more informative than population average) resolution for examining and comparing heterogeneity observed among our H460 clones.

Comparison of heterogeneity across clonal cancer populations

We next compared heterogeneity observed across our entire collection of H460 clones. We began by studying cellular heterogeneity observed with MS1, and then made use of the other marker sets (MS2–4) to test the dependence of our findings on our initial choices of readouts. Differences in heterogeneity among the clones could be seen as differences in fractions of cells in each of the five subpopulations (Figure 2, thumbnail images and scatter plots). To assess the variation of signaling heterogeneity among the clones, we transformed the subpopulation profiles of the clones to reflect their log‐fold enrichment of subpopulations compared with the parent, and grouped the profiles by hierarchical clustering based on their Euclidean distances (Supplementary information). Interestingly, clustering of the enrichment profiles revealed a relatively small number of distinct patterns (or ‘signatures’) of signaling heterogeneity (MS1 is shown in Figure 2; MS1–4 are shown in Supplementary Figure 6). In addition, subpopulation profiles from replicates of the same clone were much more similar to each other on average than replicates of clones selected from different clusters, indicating that our proposed measures of heterogeneity were experimentally reproducible (Supplementary Figure 8). Thus, cell‐to‐cell variation was captured by a few signaling stereotypes common to all the clonal populations and, further, only a few distinct patterns of heterogeneity were observed within our collection of clonal populations. Our decomposition of observed cell signaling heterogeneity provided an approach to visualize the diversity of heterogeneity among our clones, succinctly encapsulate the apparent complexity of cancer phenotypes, and compare clones at a resolution greater than provided by population means.

Classification of drug sensitivity from patterns of signaling heterogeneity

Do patterns of subpopulation mixtures reflect functional differences among the clones? It is known that not all cancer subpopulations respond equally to drugs (Tang et al, 2007; Cohen et al, 2008; Gascoigne and Taylor, 2008). Hence, we wondered whether clones with similar patterns of pre‐existing heterogeneity would have similar drug sensitivities. The H460 cancer populations were given identical 48 h treatments of the chemotherapeutic drugs paclitaxel (10 nM) and doxorubicin (1 μM). Cells were then fixed and stained with standard markers for apoptosis, and an index of relative drug sensitivity for each clone to the parent was computed based on the log ratios of remaining nonapoptotic cell counts; negative (or positive) values indicated greater drug resistance (or sensitivity) than the parent (Supplementary information). We observed that clones with similar patterns of heterogeneity tended to have similar drug sensitivities (Supplementary Figure 3). As most clones had similar sensitivities to paclitaxel and doxorubicin, we carried our analysis forward using only paclitaxel. Hierarchical clustering (MS1 is shown in Figure 2, MS1–4 are shown in Supplementary Figures 6 and 7) and multidimensional scaling (Figure 3A) (Borg and Groenen, 1997) of subpopulation profiles revealed striking separation of paclitaxel‐sensitive from paclitaxel‐nonsensitive clones. (As expected, cells stained without primary antibodies, but with secondary antibodies plus Hoechst alone, yielded no separation; Supplementary Figure 9.) Thus, heterogeneity of cellular signaling states observed in our untreated H460 clones contained information that captured sensitivity to drug treatment.

Figure 3.

Clones with similar patterns of subpopulation profiles have similar drug sensitivities. Clone IDs and relative sensitivities to paclitaxel are as in Figure 2. (A) Paclitaxel‐sensitive clones can be separated from nonsensitive clones based on patterns of subpopulation profiles. Multidimensional scaling (MDS) is used to visualize the subpopulation vectors of the H460 populations with respect to the Kullback–Leibler divergence measure (Supplementary information). Solid black squares indicate replicates of parental clone from the seven imaging plates; filled circles indicate clones; gray open circles (clones 33 and 35) indicate unreliable sensitivity scores. (B) The closest three neighbors of a clone tend to have similar drug sensitivities across all markers. Clones are sorted from least to greatest sensitivity to paclitaxel. Heat map indicates the number of nearest neighbors to each clone that are among the top 10‐most sensitive (top panel) or resistant (bottom panel) to paclitaxel. (C) Clones of similar paclitaxel sensitivity tend to be phenotypically similar across all four marker sets. Thumbnails of clones (columns) labelled in (A) are shown for all four marker sets (rows). Columns are grouped from left to right by decreasing sensitivity to paclitaxel. Scale bar: 20 μm.

To what extent does the separation of drug sensitivities based on patterns of pre‐existing heterogeneity depend on MS choice? We observed that the nearest neighbors of a clone in one MS were often close neighbors in the other MS (Supplementary Figure 10); there were ∼20 clones whose three‐nearest neighbors in MS1 remained close in MS2 (>6‐fold more than expected by chance). Further, the sets of nearest neighbors of a clone across marker sets tended to have similar average drug sensitivities, independent of our choice of MS (Figure 3B). Conversely, clones of similar drug sensitivities tended to have similar phenotypes across all marker sets (Figure 3C; Supplementary Figures 3 and 10). The consistency of information across signaling markers and clones suggested the possibility that similar patterns of cellular heterogeneity were reflective of ‘deeper’ similarities of underlying regulatory networks.

How separable are the collections of ‘sensitive’ and ‘resistant’ subpopulation profiles? We computed the accuracy of separating these two classes of profiles using a linear support vector machine (Supplementary information). Our complete set of H460 clones had separation accuracies between ∼70 and 76% for our MS (Figure 4A‐I, MS1–4). However, separation accuracies between sets of clones with ‘extreme’ sensitivities were much higher (∼80–100% for the 10 or 20 most sensitive and resistant clones) (Figure 4A‐I, MS1–4; Supplementary Figure 11A). A repeat experiment gave similar results. However, as may be expected from other studies of clones (Chang et al, 2008), we observed that separation accuracy of our low‐passage H460 clones decreased over the period of a month (Supplementary Table 5). Finally, to assess the predictive value of our model of H460 heterogeneity, we recomputed separation accuracies using a leave‐one‐out strategy (Supplementary information). Prediction accuracies for the complete and ‘extreme’ sets of clones were similar, though slightly reduced, to the full separation accuracies (66–73% and 80–90%, respectively) (Supplementary Figure 12) across MS1–4. Thus, clones with extreme opposite sensitivities had distinct and separable patterns of heterogeneity: distinct patterns of heterogeneity reflected functional divergence.

Figure 4.

Models of H460 lung cancer heterogeneity can be used to classify sensitivity to paclitaxel for other cancer populations. (A) Accuracies of separating paclitaxel‐resistant and ‐sensitive collections of cell populations based on their subpopulation profiles by a linear SVM (random separation: 50%; perfect separation: 100%). Columns correspond to marker sets; rows correspond to different pairs of sensitive and resistant groups of cell populations. ‘All:’ all populations grouped into either resistant or sensitive classes; ‘Extreme 2N:’ populations only included when in the N‐most sensitive or resistant populations. All subpopulations are computed based on H460 reference model. Accuracy not statistically significant (P>0.05). Accuracy not 1 s.d. above the average accuracy over all possible permutations of resistant/sensitive assignments (Supplementary Figure 11). *The least resistant cell line was not used for SVM analysis to create a balanced (four resistant, four sensitive) data set (Supplementary information). (B) Noncancerous HBEC clones display less diversity than the panel of H460 clones. (Top panel) HBEC clones show reduced ranges of drug sensitivities (bottom panel) and dissimilarity among phenotypic profiles compared with H460 clones. Reference model for bottom panel is built by sampling both HBEC and H460 clones; the number of subpopulations is varied from 3 to 14. Error bars are 90% confidence intervals based on bootstrapping (Supplementary information). (C) Drug sensitivity among diverse cancer populations can be separated by subpopulation profiles. The H460 reference model was used to compute subpopulation profiles for nine adherent cell lines with the most extreme GI50 values for paclitaxel within the NCI‐60 panel.

Classification of drug sensitivity in diverse cell populations

We wondered whether the phenotypic diversification and separation of drug sensitivity by cellular heterogeneity would also hold for a collection of noncancer clone populations. We applied our H460 subpopulation model to a panel of 75 noncancerous, immortalized human bronchial epithelial cell (HBEC) clones (Vaughan et al, 2006) stained with MS1 and MS4. The HBEC clones showed reduced ranges of overall heterogeneity and drug sensitivities compared with the H460 cancer clones (Figure 4B), as monitored by our assay, and showed no significant separation, even when tested on the subset of clones with extreme paclitaxel sensitivities (Figure 4A‐II; Supplementary Figure 11B). In addition, separation was poor even after building an HBEC reference model of heterogeneity (data not shown). These results were consistent with the expectation that cancer is associated with increased phenotypic heterogeneity compared with normal cells (Heppner, 1984; Rubin, 1990; Campbell and Polyak, 2007; Gascoigne and Taylor, 2008). Thus, in contrast with the H460 cancer clones, among these noncancer HBEC clones, heterogeneity provided no additional information for separating functional differences, presumably due to greater similarity among founder cells and/or more tightly regulated ranges of signaling states.

We next tested whether models of cellular heterogeneity developed on the H460 clones could reveal information about the drug sensitivity of cellular populations of diverse cancer types. We applied our H460 model of heterogeneity to nine cell lines selected from the NCI‐60 panel (Shankavaram et al, 2009) with extreme GI50 values for paclitaxel (NCI‐9—five sensitive and four resistant) (Supplementary information). These selected cell lines were derived from breast, colon, lung, ovarian, and renal cancers (Supplementary Table 2). Remarkably, subpopulation profiles for these populations were well separated by paclitaxel sensitivities using MS4, and to a lesser degree MS1 (Figures 4A‐II and C; Supplementary Figure 11B). Here, similar separation accuracies could also be obtained using a reference model of heterogeneity built entirely from subsampled cells within the NCI‐9 cell lines (Supplementary Figure 13). As with the clones, repeat experiments gave similar separation accuracies. However, in this case, separation accuracies remained similar (and high) even after 2 months of additional time in culture (Supplementary Table 5). As might be expected, the observed relationship between heterogeneity and drug sensitivity was more stable for these well‐established cell lines than for the low‐passage clones. These results suggested that diverse cancer types may share an overlapping repertoire of signaling states (Jones et al, 2008; Parsons et al, 2008; Valle et al, 2008), whose heterogeneous ensembles have similar relationships to function.

To what extent did the identification of information contained in cellular heterogeneity depend on the choices made in our study? Clearly, not every marker set, feature, or model parameter will be equally informative. For example, paclitaxel sensitivity among the H460 and NCI panels could neither be predicted by a panel of markers including its drug target microtubules MS5: DNA/actin/β‐tubulin, nor by a panel of ‘neutral’ markers MS6: DNA/GAPDH/Pericentrin (Figure 4A‐I, MS5–6; Supplementary Figures 11A, 12, and 14). Alternatively, for the sole purpose of developing functional predictions, it may be possible to identify specific markers and features whose population‐averaged measurements can provide accurate classification. For example, the average intensity of β‐catenin in MS3 provided exceptional classification accuracy among all markers (accuracy=78.72%, P<0.05 for the complete set of H460 clones). Population‐averaged measurements also lend themselves to multiplexed assays, such as those performed with array‐based technology; features based on population‐averaged measurements can be easily combined from parallel assays, thereby allowing greater numbers of markers to be explored than can be studied at present on individual cells. However, information may be lost; classification of paclitaxel sensitivity based on population‐averaged expression of any three random randomly chosen readouts from MS1–4 performed on average 5% poorer (or 10% if β‐catenin was dropped) than our heterogeneity profiles based on three readouts. Furthermore, ensemble‐averaged measurements may be predictive of function (e.g. drug response), yet poorly represent individual cellular behaviors and lead to inaccurate models of cell function (Ferrell and Machleder, 1998). Finally, a critical parameter for decomposing heterogeneity is the coarseness of the approximation (Yin et al, 2008; Pyne et al, 2009). In cross‐validation studies, we found that the range of subpopulation numbers suggested by model fit criteria (i.e. 3–7 subpopulations) coincided well with the range that provided highest separation accuracies of the H460 clones by drug sensitivities (Supplementary Figure 15). In future, refinement of model parameters may be improved by incorporating additional biological knowledge to determine when subpopulations should be merged or further split.


Cellular heterogeneity has been classically described within cellular populations, both in the settings of cell culture and in vivo. Heterogeneity, as an absolute property of a cellular population and collection of molecular readouts, can be difficult to interpret. However, relative differences in heterogeneity, such as may be due to epigenetics, genetics, or environmental conditions, may be more interpretable, in particular when tested for correlation with functional differences.

In the context of differences due to pharmacological perturbations, heterogeneity may be observed before or after treatment. In earlier work (Slack et al, 2008), the ability to distinguish mechanistic classes of perturbations based on heterogeneous cancer cell responses was studied. In contrast, here we investigated whether patterns of basal signaling heterogeneity contained information predictive of subsequent population response to perturbation. We used drug sensitivity classification to provide an objective test of whether our decomposition of heterogeneity contained biologically relevant information. (It was not the goal of this study to develop or optimize predictors for drug sensitivities that outperform other methods.) We modeled the (quasi‐equilibrium) distributions of cell signaling phenotypes present within populations from snapshots of large numbers of cells (Chang et al, 2008; Huang et al, 2009), and found that measures of these distributions served as informative, predictive readouts of population‐level responses to perturbation. Our approach allowed us to decompose heterogeneous cellular distributions into a small number of more phenotypically homogenous states (Figure 1), compare and group populations based on their patterns of heterogeneity (Figure 2), identify a consistent relationship between heterogeneity and function across multiple sets of general signaling markers (Figure 3) and, finally, test whether a common model of basal signaling heterogeneity could be used to predict drug sensitivities across different cancer populations (Figure 4). In general, characterization of the ensemble of subpopulation mixture may be required to distinguish functional differences among populations. However, in certain cases, (de‐) enrichment for specific subpopulations may be sufficient to account for overall functional differences. For example, in MS1, enrichment for subpopulation pairs (S1, S4) or (S2, S3) separated paclitaxel‐sensitive from ‐nonsensitive clones (Supplementary Figure 6). Future studies are required to investigate the deeper molecular states of specific subpopulations (Loo et al, 2009a, 2009b) and their relationship to drug response. We note that in this study, cellular phenotypes were captured on the basis of the spatial colocalization patterns of signaling activity readouts from fixed cells. The physical sorting and subsequent investigation of our identified subpopulations remain challenging.

Important questions remain, such as the origins and evolution of the phenotypic diversification, why our decomposition of heterogeneity predicts drug responsiveness in our defined culture conditions, and why classification is possible on the basis of a limited number of biomarkers that were not chosen based on a prior knowledge of the biology of drug responsiveness, but rather on a general survey of pathways implicated in cancer. The observed heterogeneity among the H460 clones could be due to several factors, including differences in epigenetic states and genetic diversity that may have been present within the parent population or evolved within the clones during their short time of establishment. Regardless, we found that a simple description of the observed heterogeneity contained functional information. One possibility for our success using a limited number of biomarkers may be that our subpopulations reveal ‘deeper’ underlying states that broadly reflect signaling in multiple pathways, and thus may be distinguishable by a small number of ‘general’ signaling markers. Another possibility is that our approach has connected the characteristic behaviors of regulatory networks in two operating regimes: namely, networks operating within each cancer clone shape the stochastic distributions of cell signaling states in unchallenged conditions (Huang et al, 2009) as well as determine an overall population response to an acute challenge (i.e. drug treatment). It is also interesting to speculate whether patterns of heterogeneity observed in primary cancer samples can be interpreted to reveal clinically important information. Importantly, the answer to this question is independent of whether profiles of clinical and cell line samples directly share common signatures. Nevertheless, the potential to study the physiological states of cell populations at a resolution greater than population averages, yet more summarized than individual cells, is highly compelling and our approach may help to interpret heterogeneity observed in healthy and diseased tissues.

Materials and methods

All clones were seeded in triplicate on the same day onto seven 96‐well plates (each plate contained seven clones and the parent), grown under identical conditions for 16 h, fixed, stained with the four MS, and imaged at × 20 magnification. Image intensities were scaled for each plate to normalize parental replicates among all plates. Subpopulation profiles were performed as described earlier (Slack et al, 2008) (Supplementary information).

Conflict of Interest

The authors declare that they have no conflict of interest.

Supplementary Information

Supplementary Material

Supplementary information, Supplementary Tables S1–S5, Supplementary Figures S1–S15 [msb201022-sup-0001.pdf]


We thank Nam Bui, Luc Grard, Lit‐Hsin Loo, Kathy Lyons, Benjamin Pavie, Michael Slack, James Sullivan, and Denise Tria for technical help, John Minna, Tim Mitchison, Rama Ranganathan, Jerry Shay, Gürol Süel, and Michael White for helpful discussions and readings of the paper, and BD Transduction Laboratories for reagents. This research was supported by the National Institute of Health grants NIH GM007062 (RJS), R01 GM081549 (LFW), and R01 GM085442 (SJA), the Endowed Scholars program at UT Southwestern Medical Center (LFW and SJA), a UT Southwestern SPORE project grant (LFW), the Welch Foundation I‐1619 (SJA) and I‐1644 (LFW), the Rita Allen Foundation (SJA), and the Mary Kay Ash Charitable Foundation (SJA).


This is an open‐access article distributed under the terms of the Creative Commons Attribution License, which permits distribution, and reproduction in any medium, provided the original author and source are credited. This license does not permit commercial exploitation without specific permission.