Mapping protein–protein interactions is an invaluable tool for understanding protein function. Here, we report the first large‐scale study of protein–protein interactions in human cells using a mass spectrometry‐based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large‐scale immunoprecipitation of Flag‐tagged versions of these proteins followed by LC‐ESI‐MS/MS analysis resulted in the identification of 24 540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross‐validated using previously published and predicted human protein interactions. In‐depth mining of the data set shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.
Understanding the roles and consequences of protein–protein interactions is a fundamental goal in cellular biology and a prerequisite for the development of molecular systems biology. The endeavor of cataloging protein interactions is primarily hindered by the limited throughput and reproducibility of existing technologies. Several techniques for mapping protein interactions, such as the two‐hybrid approach (Chien et al, 1991) and the LUMIER approach (Barrios‐Rodiles et al, 2005), assay whether two proteins interact in a pair‐wise fashion. We have developed a high‐throughput platform combining immunoprecipitation and high‐throughput mass spectrometry (IP‐HTMS) to rapidly identify potentially novel protein interactions for a bait protein of interest. We (Ho et al, 2002) and others (Gavin et al, 2002) previously used this approach to map protein–protein interactions in yeast, creating invaluable data sets for yeast biology and extrapolation into mammalian biology.
Mapping protein interactions in human cells has its own set of challenges owing to the number of potentially expressed genes, the number of different cell types, and the number of internal and external factors that impact the cellular system. Although a complete mapping of the human interactome is still beyond current capabilities, more focused studies are possible. Here we report the first large‐scale application of IP‐HTMS to the mapping of protein–protein interactions in human cells using 338 human bait proteins of significant biomedical interest. The complete data set is available from the IntAct database (http://www.ebi.ac.uk/intact/site/) (accession EBI‐1059370) or as a table of bait–prey pairs with associated confidence values (Supplementary Table II).
There has been much focus and discussion over the last few years on the quality and reproducibility of interactions in high‐throughput protein–protein interaction data sets (e.g. von Mering et al, 2002). A guiding principle in our study has therefore been to implement stringent quality controls. The final data set includes protein interactions for 338 human bait proteins (Supplementary Table I). For over half of these baits, two or more replicate immunoprecipitation experiments were performed, requiring a total of 1034 individual immunoprecipitation experiments with associated SDS–PAGE. These experiments yielded over 16 000 gel bands for which over 400 000 MS/MS spectra were assigned peptide sequences. Approximately one‐fifth of our immunoprecipitation experiments were control (no‐bait) experiments, allowing us to build a comprehensive list of spurious and ubiquitously binding proteins that could then be filtered out of the interaction network. Another one‐fifth of the experiments were directed towards a study of the reproducibility of prey protein identification using our platform. These 202 immunoprecipitation experiments, derived from 18 baits, were used to train a statistical model that associates interaction reproducibility with various observed experimental parameters, such as the number of peptides identified for the given prey protein. This model was used to assign a confidence value (between 0 and 1) to each of the 6463 interactions in the data set.
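To illustrate how a model of this kind can turn an experimental parameter into a confidence value between 0 and 1, the following minimal Python sketch fits a logistic regression to toy reproducibility data. The choice of logistic regression, the single feature (log of the peptide count), the function names and the training values are all assumptions made for illustration. The text does not specify the form of the model, and this is not the authors' implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def confidence(x, weights, bias):
    """Predicted probability that an interaction is reproducible."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training data (hypothetical): (peptides identified, reproduced in replicate?)
training = [(1, 0), (1, 0), (2, 0), (2, 1), (3, 0),
            (4, 1), (6, 1), (9, 1), (15, 1)]
features = [[math.log(n_peptides)] for n_peptides, _ in training]
labels = [reproduced for _, reproduced in training]

weights, bias = fit_logistic(features, labels)
low = confidence([math.log(1)], weights, bias)    # prey seen with 1 peptide
high = confidence([math.log(12)], weights, bias)  # prey seen with 12 peptides
print(f"confidence with 1 peptide: {low:.2f}; with 12 peptides: {high:.2f}")
```

In this toy setting a prey identified from many peptides receives a higher confidence value than one identified from a single peptide. A real model would combine several experimental parameters rather than one.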
As the interaction confidence score is calculated solely from IP‐HTMS experimental parameters, an initial focus was to confirm that the confidence score was an accurate means of ranking the interactions for further study. We observed that known interactions in the data set have, on average, significantly higher interaction confidence scores. For example, the set of baits corresponding to core and regulatory components of the proteasome enabled reconstruction of a proteasome interaction network (Figure 6C), comprising many known proteasome components and enriched for high‐scoring interactions.
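One simple way to check whether a score ranks known interactions above the rest is to estimate the probability that a randomly chosen known interaction outscores a randomly chosen other interaction. The Python sketch below computes this quantity; the function name and the score lists are hypothetical toy values, and this particular statistic is an assumption rather than the test used in the study.

```python
def rank_separation(known_scores, other_scores):
    """Probability that a known interaction outscores another one (ties count 0.5)."""
    wins = 0.0
    for k in known_scores:
        for o in other_scores:
            if k > o:
                wins += 1.0
            elif k == o:
                wins += 0.5
    return wins / (len(known_scores) * len(other_scores))


# Toy confidence scores (hypothetical values, not taken from the data set)
known = [0.9, 0.8, 0.6]
other = [0.7, 0.3, 0.2, 0.1]

separation = rank_separation(known, other)
print(f"rank separation: {separation:.3f}")
```

A value near 0.5 would mean the score does not distinguish known interactions from the others, whereas a value near 1 would mean known interactions are consistently ranked higher.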
We also integrated the IP‐HTMS data set with several other genome‐scale data sets, including other protein–protein interaction data sets, gene co‐expression data, and annotations from the gene ontology project. In the latter case, we analyzed how frequently bait and prey proteins co‐occur in the same biological process or cellular component category (Figure 3). We find that there is significant enrichment of bait–prey pairs sharing the same annotation category, indicating a strong tendency for bait proteins to bind prey proteins with related functions. Integration with gene co‐expression data showed that interaction data sets, this one included, are enriched for gene pairs that are co‐expressed. This enabled identification of tightly clustered sets of protein interactors that are also co‐expressed at the mRNA level. For example, the LYAR bait protein (Ly1 antibody reactive clone) is a nucleolar protein of unknown function (Su et al, 1993). This bait identified a set of nucleolar‐localized prey proteins that are also very tightly co‐expressed (Figure 5). These results, along with the other protein–protein interaction data sources, provided a powerful means of cross‐validating the human IP‐HTMS data set and associated methodology.
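The co‐annotation analysis can be illustrated with a small permutation test: count the fraction of bait–prey pairs that share an annotation category, then compare it with the fraction obtained when preys are randomly reassigned to baits. The Python sketch below does this on toy data. The protein identifiers, the annotation assignments and the use of a permutation test are assumptions for illustration only; the text does not state which enrichment statistic was used.

```python
import random


def shared_fraction(pairs, annotations):
    """Fraction of (bait, prey) pairs sharing at least one annotation category."""
    shared = sum(
        1 for bait, prey in pairs
        if annotations.get(bait, set()) & annotations.get(prey, set())
    )
    return shared / len(pairs)


def permutation_enrichment(pairs, annotations, n_permutations=1000, seed=0):
    """Compare the observed shared fraction with shuffled bait-prey pairings."""
    rng = random.Random(seed)
    observed = shared_fraction(pairs, annotations)
    baits = [bait for bait, _ in pairs]
    preys = [prey for _, prey in pairs]
    at_least = 0
    for _ in range(n_permutations):
        shuffled = preys[:]
        rng.shuffle(shuffled)
        if shared_fraction(list(zip(baits, shuffled)), annotations) >= observed:
            at_least += 1
    # Add-one correction keeps the empirical p-value above zero
    p_value = (at_least + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data (hypothetical identifiers and categories)
categories = ["proteolysis", "rRNA processing", "translation",
              "DNA repair", "transport", "splicing"]
annotations = {}
pairs = []
for i, category in enumerate(categories, start=1):
    annotations[f"bait_{i}"] = {category}
    annotations[f"prey_{i}"] = {category}
    pairs.append((f"bait_{i}", f"prey_{i}"))

observed, p_value = permutation_enrichment(pairs, annotations)
print(f"shared fraction: {observed:.2f}, empirical p-value: {p_value:.4f}")
```

Shuffling preys among baits keeps the number of pairs and the annotation frequencies unchanged, so the comparison isolates the effect of the pairing itself.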
Our focus in this paper has been to prepare a quality‐controlled, large‐scale human protein interaction data set that will add significantly to our knowledge of the human protein interactome. Given the focus on baits of significant biomedical interest (through functional or disease associations), we anticipate that this data set, alongside other sources of human protein–protein interactions, will be an important starting point for functional characterization of disease‐related interactions and complexes. The IP‐HTMS platform utilized here shows great promise as an effective means of protein interaction discovery, and we anticipate that future applications will include broadening to a larger set of disease‐associated proteins, extension to other cell lines, and coupling with drug treatments.
We present a dataset of 6463 interactions between 2235 distinct proteins from a large‐scale application of immunoprecipitation and high‐throughput mass spectrometry (IP‐HTMS) on 338 human bait proteins expressed in human cells.
The dataset is cross‐validated using previously published and predicted human protein interactions. In‐depth mining of the dataset shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, our analysis reveals many novel protein interactions and pathway associations.
Protein interactions in the dataset are accompanied by a confidence score which is derived by combining several experimental and protein identification analysis metrics.
Mol Syst Biol. 3: 89
- Received September 22, 2006.
- Accepted January 26, 2007.
- Copyright © 2007 EMBO and Nature Publishing Group