Large‐scale mapping of human protein–protein interactions by mass spectrometry

Rob M Ewing, Peter Chu, Fred Elisma, Hongyan Li, Paul Taylor, Shane Climie, Linda McBroom‐Cerajewski, Mark D Robinson, Liam O'Connor, Michael Li, Rod Taylor, Moyez Dharsee, Yuen Ho, Adrian Heilbut, Lynda Moore, Shudong Zhang, Olga Ornatsky, Yury V Bukhman, Martin Ethier, Yinglun Sheng, Julian Vasilescu, Mohamed Abu‐Farha, Jean‐Philippe Lambert, Henry S Duewel, Ian I Stewart, Bonnie Kuehl, Kelly Hogue, Karen Colwill, Katharine Gladwish, Brenda Muskat, Robert Kinach, Sally‐Lin Adams, Michael F Moran, Gregg B Morin, Thodoros Topaloglou, Daniel Figeys

Author Affiliations

  1. Rob M Ewing1,2,
  2. Peter Chu1,†,
  3. Fred Elisma3,
  4. Hongyan Li1,‡,
  5. Paul Taylor1,§,
  6. Shane Climie1,||,
  7. Linda McBroom‐Cerajewski1,¶,
  8. Mark D Robinson1,††,
  9. Liam O'Connor1,‡‡,
  10. Michael Li1,§§,
  11. Rod Taylor1,
  12. Moyez Dharsee1,2,
  13. Yuen Ho1,||||,
  14. Adrian Heilbut1,¶¶,
  15. Lynda Moore1,†††,
  16. Shudong Zhang1,
  17. Olga Ornatsky1,‡‡‡,
  18. Yury V Bukhman1,§§§,
  19. Martin Ethier3,
  20. Yinglun Sheng3,
  21. Julian Vasilescu3,
  22. Mohamed Abu‐Farha3,
  23. Jean‐Philippe Lambert3,
  24. Henry S Duewel1,||||||,
  25. Ian I Stewart1,2,
  26. Bonnie Kuehl1,¶¶¶,
  27. Kelly Hogue1,16,
  28. Karen Colwill1,17,
  29. Katharine Gladwish1,
  30. Brenda Muskat1,18,
  31. Robert Kinach1,‡‡‡,
  32. Sally‐Lin Adams1,19,
  33. Michael F Moran1,§,
  34. Gregg B Morin1,†††,
  35. Thodoros Topaloglou1,4 and
  36. Daniel Figeys*,1,3
  1. 1 Protana (now Transition Therapeutics), Toronto, Ontario, Canada
  2. 2 Infochromics, MaRS Discovery District, Toronto, Ontario, Canada
  3. 3 Faculty of Medicine, The Ottawa Institute of Systems Biology, University of Ottawa, BMI, Ottawa, Ontario, Canada
  4. 4 Information Engineering Center, Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, Ontario, Canada
  1. *Corresponding author. The Ottawa Institute of Systems Biology, University of Ottawa, BMI, 451 Smyth Road, Ottawa, Ontario, Canada K1H 8M5. Tel.: +1 613 562 5800 ext 8674; Fax: +1 613 562 5655; E-mail: dfigeys{at}
  • † Present address: Faculty of Health Sciences, McMaster University, Hamilton, Ontario, Canada

  • ‡ Present address: Department of Biology, York University, Toronto, Ontario, Canada

  • § Present address: Hospital for Sick Children and McLaughlin Centre for Molecular Medicine, and Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario, Canada

  • || Present address: Popper and Company LLC, Sarasota, FL, USA

  • ¶ Present address: Structural Genomics Consortium, University of Toronto, Toronto, Ontario, Canada

  • †† Present address: Genetics and Bioinformatics, Walter and Eliza Hall Institute of Medical Research (WEHI), Parkville, Victoria, Australia

  • ‡‡ Present address: Novartis Institutes for Biomedical Research, Cambridge, MA, USA

  • §§ Present address: Platform Computing, Markham, Ontario, Canada

  • |||| Present address: Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada

  • ¶¶ Present address: CombinatoRx Inc, Cambridge, MA, USA

  • ††† Present address: Michael Smith Genome Sciences Centre, BC Cancer Agency Genome Sciences Centre, Vancouver, British Columbia, Canada

  • ‡‡‡ Present address: Institute of Biomaterials and Biomedical Engineering, University of Toronto, Toronto, Ontario, Canada

  • §§§ Present address: Campbell Family Institute for Breast Cancer Research, University Health Network, Toronto, Ontario, Canada

  • |||||| Present address: Sigma‐Aldrich Corporation, St Louis, MO, USA

  • ¶¶¶ Present address: Scientific Insights Consulting Group Inc., Mississauga, Ontario, Canada

  • 16 Present address: Advanced Protein Technology Centre, Hospital for Sick Children, Toronto, Ontario, Canada

  • 17 Present address: Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada

  • 18 Present address: MDS Pharma Services, Mississauga, Ontario, Canada

  • 19 Present address: Division of Haematology/Oncology, Hospital for Sick Children, Toronto, Ontario, Canada



Mapping protein–protein interactions is an invaluable tool for understanding protein function. Here, we report the first large‐scale study of protein–protein interactions in human cells using a mass spectrometry‐based approach. The study maps protein interactions for 338 bait proteins that were selected based on known or suspected disease and functional associations. Large‐scale immunoprecipitation of Flag‐tagged versions of these proteins followed by LC‐ESI‐MS/MS analysis resulted in the identification of 24 540 potential protein interactions. False positives and redundant hits were filtered out using empirical criteria and a calculated interaction confidence score, producing a data set of 6463 interactions between 2235 distinct proteins. This data set was further cross‐validated using previously published and predicted human protein interactions. In‐depth mining of the data set shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, via our preliminary analysis, we report many novel protein interactions and pathway associations.



Understanding the roles and consequences of protein–protein interactions is a fundamental goal in cellular biology and a prerequisite for the development of molecular systems biology. The endeavor of cataloging protein interactions is primarily hindered by the throughput and reproducibility of existing technologies. Several techniques for mapping protein interactions are available, such as the two‐hybrid approach (Chien et al, 1991) and the LUMIER approach (Barrios‐Rodiles et al, 2005), both of which assay whether two proteins interact in a pair‐wise fashion. We have developed a high‐throughput platform combining immunoprecipitation and high‐throughput mass spectrometry (IP‐HTMS) to rapidly identify potentially novel protein interactions for a bait protein of interest. We (Ho et al, 2002) and others (Gavin et al, 2002) previously used this approach to map protein–protein interactions in yeast, creating invaluable data sets for yeast biology and extrapolation into mammalian biology.

Mapping protein interactions in human cells has its own set of challenges owing to the number of potentially expressed genes, the number of different cell types, and the numbers of internal and external factors that impact the cellular system. Although a complete mapping of the human interactome is still beyond current capabilities, more focused studies are possible. Here we report the first large‐scale application of IP‐HTMS to the mapping of protein–protein interactions in human cells using 338 human bait proteins of significant biomedical interest. The complete data set is available from the IntAct database (accession EBI‐1059370) or as a table of bait–prey pairs with associated confidence values (Supplementary Table II).

There has been much focus and discussion over the last few years on the quality and reproducibility of interactions in high‐throughput protein–protein interaction datasets (e.g. von Mering et al, 2002). A guiding principle in our study has therefore been to implement stringent quality controls. The final data set includes protein interactions for 338 human bait proteins (Supplementary Table I). For over half of these baits, two or more replicate immunoprecipitation experiments were performed, requiring a total of 1034 individual immunoprecipitation experiments with associated SDS–PAGE. These experiments yielded over 16 000 gel bands for which over 400 000 MS/MS spectra were assigned peptide sequences. Approximately 1/5 of our immunoprecipitation experiments were control (no‐bait) experiments allowing us to build a comprehensive list of spurious and ubiquitously binding proteins that could then be filtered out of the interaction network. Another 1/5 of the experiments were directed towards a study of the reproducibility of prey protein identification using our platform. These 202 immunoprecipitation experiments, derived from 18 baits, were used to train a statistical model that associates interaction reproducibility with various observed experimental parameters, such as the number of peptides identified for the given prey protein. This model was used to assign confidence values (taking a value between 0 and 1) to each of the 6486 interactions in the data set.
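The idea of a reproducibility-trained confidence score can be illustrated with a minimal sketch. It assumes a logistic regression on a single feature (the number of peptides identified for a prey); the text above does not name the model form, so this choice, the toy training data, and the function names are all assumptions made for illustration only.

```python
import math

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(reproduced | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction error for this example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Invented toy data (NOT from the study): peptide count for a prey, and
# whether that prey was identified again in a replicate immunoprecipitation.
peptides = [1, 1, 2, 2, 3, 4, 5, 6, 8, 10]
reproduced = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

w, b = fit_logistic(peptides, reproduced)

def confidence(n_peptides):
    """Hypothetical confidence value between 0 and 1 for a bait-prey pair."""
    return sigmoid(w * n_peptides + b)

print(round(confidence(1), 3), round(confidence(10), 3))
```

In this sketch a prey supported by more peptides receives a higher score. A model of the kind described in the text would combine several experimental parameters rather than one.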

As the interaction confidence score is calculated solely from IP‐HTMS experimental parameters, an initial focus was to confirm that the confidence score was an accurate means of ranking the interactions for further study. We observed, for example, that known interactions in the data set have, on average, significantly higher interaction confidence scores. Similarly, the set of baits corresponding to core and regulatory components of the proteasome enabled reconstruction of a proteasome interaction network (Figure 6C), comprising many known proteasome components and enriched for high‐scoring interactions.

We also integrated the IP‐HTMS data set with several other genomic‐scale data sets, including other protein–protein interaction data sets, gene co‐expression data, and annotations from the Gene Ontology project. For the Gene Ontology annotations, we analyzed the frequency of co‐occurrence of both bait and prey protein in the same biological process or cellular component category (Figure 3). We find that there is significant enrichment of bait–prey pairs sharing the same annotation category, indicating a strong tendency for bait proteins to bind prey proteins with related functions. Integration with gene co‐expression data showed that interaction data sets, this one included, are enriched for gene pairs that are co‐expressed. This enabled identification of tightly clustered sets of protein interactors that are also co‐expressed at the mRNA level. For example, the LYAR bait protein (Ly1 antibody reactive clone) is a nucleolar protein of unknown function (Su et al, 1993). This bait identified a set of nucleolar‐localized prey proteins that are also very tightly co‐expressed (Figure 5). These results, along with the other protein–protein interaction data sources, provided a powerful means of cross‐validating the human IP‐HTMS data set and associated methodology.
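The annotation co‐occurrence analysis can be sketched as a simple permutation test. The choice of a permutation test, the protein labels, and the toy annotations below are assumptions for illustration; the text above does not state which statistical test was used.

```python
import random

def shared_fraction(pairs, annotation):
    """Fraction of (bait, prey) pairs that share at least one annotation category."""
    hits = sum(1 for bait, prey in pairs if annotation[bait] & annotation[prey])
    return hits / len(pairs)

def permutation_p_value(pairs, annotation, n_perm=2000, seed=0):
    """Compare the observed shared fraction with prey labels shuffled among baits."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = shared_fraction(pairs, annotation)
    baits = [bait for bait, _ in pairs]
    preys = [prey for _, prey in pairs]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(preys)
        if shared_fraction(list(zip(baits, preys)), annotation) >= observed:
            at_least += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (at_least + 1) / (n_perm + 1)

# Invented toy data (NOT from the study): hypothetical protein labels,
# each annotated with a single cellular component category.
toy_annotation = {}
for i in range(1, 6):
    toy_annotation[f"baitP{i}"] = {"proteasome complex"}
    toy_annotation[f"preyP{i}"] = {"proteasome complex"}
    toy_annotation[f"baitN{i}"] = {"nucleolus"}
    toy_annotation[f"preyN{i}"] = {"nucleolus"}

toy_pairs = [(f"baitP{i}", f"preyP{i}") for i in range(1, 6)]
toy_pairs += [(f"baitN{i}", f"preyN{i}") for i in range(1, 6)]

observed, p_value = permutation_p_value(toy_pairs, toy_annotation)
print(observed, p_value)
```

A small p-value here means that randomly re‐paired baits and preys rarely share categories as often as the observed pairs do, which is the sense of "enrichment" used in the paragraph above.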

Our focus in this paper has been to prepare a quality‐controlled, large‐scale human protein interaction data set that will add significantly to our knowledge of the human protein interactome. Given the focus on baits of significant biomedical interest (through functional or disease associations), we anticipate that this data set, alongside other sources of human protein–protein interactions, will be an important starting point for functional characterization of disease‐related interactions and complexes. The IP‐HTMS platform utilized here shows great promise as an effective means of protein interaction discovery, and we anticipate that future applications will include extension to a larger set of disease‐associated proteins and to other cell lines, as well as coupling with drug treatments.

  • We present a dataset of 6486 interactions between 2371 distinct proteins from a large‐scale application of immunoprecipitation and high‐throughput mass‐spectrometry (IP‐HTMS) on 338 human bait proteins expressed in human cells.

  • The dataset is cross‐validated using previously published and predicted human protein interactions. In‐depth mining of the dataset shows that it represents a valuable source of novel protein–protein interactions with relevance to human diseases. In addition, our analysis reveals many novel protein interactions and pathway associations.

  • Protein interactions in the dataset are accompanied by a confidence score which is derived by combining several experimental and protein identification analysis metrics.

Mol Syst Biol. 3: 89

  • Received September 22, 2006.
  • Accepted January 26, 2007.