- Letter to the Editor
- Open Access
The cancer proteomic landscape and the HUPO Cancer Proteome Project
Clinical Proteomicsvolume 15, Article number: 4 (2018)
The Human Cancer Proteome Project (Cancer-HPP) is an international initiative organized by HUPO whose key objective is to decipher the human cancer proteome through a coordinated effort by cancer proteome researchers around the world. The ultimate goal is to map the entire human cancer proteome to disclose tumor biology and drive improved diagnostics, treatment and management of cancer. Here we report the progress in the cancer proteomics field to date, and discuss future proteomic developments that will be needed to optimally delineate cancer phenotypes and advance the molecular characterization of this significant disease that is one of the leading causes of death worldwide.
Cancer development is associated with deregulated signal transduction and aberrant protein activity and/or function as a result of genomic aberrations . Current large-scale genomics efforts are fuelled by the hope that cancer genomic data may facilitate rational therapeutic decisions, guiding the selection of treatments tailored to the individual patient (personalized, precision or P4 (personalized, predictive, preventative, and participatory) medicine . To date, cancer genomics has revealed hundreds of recurrently mutated or otherwise frequently aberrant genes, which may “drive” tumorigenesis [3, 4]. This effort has already led to numerous anticancer therapeutics that more precisely target cancer cells than do treatments that have been the mainstay of cancer care for decades, such as cytotoxic chemotherapy and radiotherapy. Currently there is a bias in the cancer research/oncology community toward using genetic testing, in particular DNA sequencing, to infer cancer biology and determine cancer patient management. However, we are still confronted with multiple challenges: (1) Only a small fraction of tumors harbor actionable mutations (~ 10%); (2) Resistance to treatment is frequent and invariably develops within a few months to a year; (3) The functional consequences of genomics alterations are often unknown or at best inferred; (4) Complicating matters further, clinical responses are tumor-context dependent. To overcome these challenges, complementary approaches in addition to genomics are needed to fully enable development of improved diagnostics and treatments as well as better informed individual patient care. To this end, comprehensive proteome analysis offers a means to measure the biochemical impact of cancer-related genomic abnormalities, including expression of variant proteins encoded by mutated genes, changed protein levels driven by altered DNA copy number, chromosomal amplification and deletion events, epigenetic regulation, and changes in microRNA expression . Furthermore, analysis of post-translational protein modifications, in particular reversible protein phosphorylation, enables the detection of signaling network adaptations driven by genomic as well as micro-environmental changes .
To date, there are not many examples of cancer proteomics that have already resulted in improved routine care for patients as most discoveries are still in clinical development. One exception is the FDA-cleared multivariate index assay (IVDMIA) for OVA1 cleared by the FDA in 2009 (reviewed by Zhang and Chan , Cancer Epidemiology Biomarkers & Prevention 2010). The intended clinical use of OVA1 is to assess the likelihood an ovarian mass is malignant prior to a planned surgery. The clinical utility of OVA1 is to provide additional information for referral of patients with higher risk of malignancy to gynecologic oncologists. Ovarian cancer patients operated on by gynecologic oncologists are more likely to receive optimal cytoreductive surgery and treatment, and have been shown to have a better outcome. The discovery of biomarkers in the panel (other than CA125) using a proteomic approach and the development of the OVA1 IVDMIA algorithm played an important role in the design of the intended use for OVA1 and the clinical studies that led to the OVA1’s clearance by FDA [8, 9].
In addition, we would also like to highlight one recent study published in Annals of Internal Medicine  that has the potential to change clinical practice in the near future. In this very large-scale clinical cancer proteomics study in which 325 stool samples were profiled by label-free mass spectrometry, the authors report on new stool-based protein biomarkers for improved colorectal cancer screening. The identified markers yielded improved cancer detection over the gold standard hemoglobin marker and a five-fold higher detection rate of advanced adenomas . As these lesions can be endoscopically removed, surgery then can be prevented. This has the potential to dramatically increase the impact of stool based colorectal cancer screening programs. Moreover, the new biomarkers can be detected with the same sampling method as used for the current fecal immunochemical (FIT) test. Upon further clinical validation, it will be easy to implement the new biomarker test in the FIT based screening programs.
This letter highlights current progress in applying high-resolution mass spectrometry-based proteomics approaches to bridge the gap between cancer genome information on the one hand and observed cancer phenotype on the other, as well as the steps forward required to unravel the cancer proteome, elucidate tumor-specific features, and identify protein targets for clinical application.
Cancer proteomics and international collaboration to empower the precision medicine pipeline
Cancer is not a single disease, and each cancer type is heterogeneous. Therefore, to get useful insights into its pathogenesis, there is a need to profile in depth many tumors from individual patients and combine cancer proteomics data and genomic data sets for meta-analysis. In addition, big-data strategies that identify statistical associations are required to discover biological relationships. International collaboration is a vital component of this effort. A multidisciplinary approach is clearly needed , and to achieve progress, collaborative teams of researchers in the fields of cancer proteomics and genomics, computational biology, and bioinformatics are needed. Together, they can translate the enormous amounts of ever-increasing genomic and proteomic information into novel clinical knowledge and tools with a favorable impact for cancer patients around the world.
Key examples of outstanding international collaboration include the large, cancer-type specific studies carried out by the International Cancer Genome Consortium within the framework of The Cancer Genome Atlas project (TCGA), spearheading the description of the genomic landscapes of several thousands of tumors of over 20 tumor types [3, 4]. Moreover in the past 5 years, the Clinical Proteomic Tumor Analysis Consortium (CPTAC) of the National Cancer Institute has performed in-depth proteomic studies of genomically characterized tumors representing three major tumor types, i.e., colorectal, breast and ovarian cancer [5, 12]. These pioneering proteogenomic studies were recently published [13,14,15] and the data are freely available for analysis by the cancer proteomics community. Key insights were obtained on variant proteins arising from DNA/RNA variation, and on aberrant gene copy numbers and expression coupled to altered protein expression levels resulting from focally amplified chromosomal segments, pinpointing various cancer driver genes [13,14,15]. Another important observation made based on these three systematic cancer proteome studies was that proteome data outperform transcriptome data for coexpression-based gene function prediction .
Collaboration and data sharing are also key to the project. The Genomics Evidence Neoplasia Information, Exchange (GENIE) project is a recent endeavor of the American Association for Cancer Research (AACR) that aims to build an international cancer registry by sharing clinical cancer sequencing data from eight international institutions . GENIE collects, catalogs, and links tumor genetic data with data on patient outcomes from all participating institutions and then makes the data publicly available. Finally, a new international initiative called APOLLO (Applied Proteogenomics OrganizationaL Learning and Outcomes) was launched as a Cancer Moonshot collaboration, that will utilize advanced genomic and proteomic expression platforms on high-quality human biospecimens in near real-time in order to identify potentially actionable therapeutic molecular targets, study the relationship of molecular findings to cancer treatment outcomes, and accelerate novel clinical trials with biomarkers of prognostic and predictive value . The above efforts, together with on-going profiling studies in individual labs around the world, will hugely expand the description of cancer proteomes in the coming years.
The Human Cancer Proteome Project
The framework for the Biology/Disease-driven Human Proteome Project (B/D-HPP) was established at HUPO 2010 in Sydney, Australia. Nine B/D-HPP workshops were then held at HUPO 2012 in Boston, at which it was decided to establish working groups focussing on specific biological processes and disease areas. Following these initial discussions, a number of additional working groups were established including the Ca-HPP which was co-chaired by the late Juan Pablo Albar (Centro Nacional de Biotecnología, Madrid, Spain) and Hui Zhang (Johns Hopkins University, Baltimore, USA)  and which is currently co-chaired by the authors of this letter.
By stimulating networking of cancer proteome scientists around the world and by organizing specific sessions at the annual HUPO meetings to share best practices and data, the Cancer-HPP aims to: (1) Delineate human cancer proteomes versus matched normal/premalignant samples; (2) Identify tumor-type specific signatures by comparison of multiple tumor types. (3) Develop standard operating procedures (SOPs) for the detection and measurement of these disease signatures. Importantly, it focuses on the analysis of human tumors, rather than experimental model systems, to help accelerate the translation of data into clinical practice.
To achieve our aims, we depend on the availability of high-quality specimens obtained according to strict SOPs to avoid protein degradation and minimize pre-analytical variability, while systematic clinical annotation is important for proper data analysis to correlate protein changes to clinical outcome. This will require multi-disciplinary collaboration between proteome scientists, (local) clinicians including pathologists. Pathologists are critical to the proteomics effort as they are in some cases collecting the sample, and in most cases processing and characterizing tissues, ensuring sufficient sample for clinical purposes, while also making significant efforts to bank as much tissue as is reasonable for research . To underline specific recognition of this discipline in the cancer proteomics effort, recently a new pathology initiative proposed to support the HPP. Protocols for tissue collection and processing are available from the National Cancer Institute  as well as from links to partner labs on the Cancer-HPP website .
Furthermore, to ensure high-quality cancer proteome profiling and productive meta-analysis, we would like to emphasize the importance of assessing inter-laboratory reproducibility of workflow and intra-laboratory reproducibility with data collected in different time for data-dependent mass spectrometry (MS) for discovery, and the application of performance metrics to benchmark system performance on a regular basis using both simple and complex reference samples. Previous studies have confirmed the ability of targeted protein quantification by multiple reaction monitoring (MRM) to achieve reproducible, precise quantification of protein concentrations in tissues and biofluids across multiple laboratories throughout the CPTAC network . Importantly, in our experience, when the whole proteomics workflow has been optimized and appropriate SOPs and reference materials are in place, there is also a good inter-laboratory and long-term correlation between label-free shotgun proteomics data that allow for relative protein quantitation, even when using different workflows (Fig. 1). Figure 1 shows an example of the overlap of identified proteins of the TCGA/CPTAC colorectal cancer proteome generated by 2D-LC–MS/MS of 95 tumors  and the proteome obtained for 40 colorectal tumors from patients around Amsterdam, generated by GeLC–MS/MS in the Jimenez laboratory. Not only does a large fraction of both proteomes overlap, but the spectral counts for the overlapping proteins are also highly correlated (Fig. 1). These encouraging inter-laboratory shotgun results indicate that meta-analysis of cancer proteomes is also possible in terms of relative quantitation when performing label-free proteomics. We expect that these results will further improve when using data-independent acquisition approaches.
We strongly encourage all proteome scientists who report on human cancer proteomics studies to adhere to the clinical proteomics reporting guidelines as formulated in 2008 by an expert team for the Molecular and Cellular Proteomics journal  and the HPP Guidelines v2.1 (hupo.org/hpp/guidelines). This guideline mandates a protein-level FDR < 1%, careful scrutiny of the spectra, use of thresholds of 9 amino acids in length and 2 uniquely mapping peptides for peptide-to-protein matches, along with careful consideration of alternative protein matches, especially to sequence variants or isobaric PTMs of abundant proteins . Finally, to achieve Cancer-HPP aims, post-publication deposition of raw data of cancer proteomics data sets in the public domain is crucial to enable meta- and pan-cancer analyses. Our recent literature survey revealed that raw data are only available for a quarter of the published cancer proteome studies (see below, Fig. 2). Therefore, this is an urgent call to make cancer proteome studies available in the public domain once they have been published.
Cancer proteomes, where do we stand?
Three years ago, two international teams independently produced the first draft of the human proteome, largely using non-diseased human tissues and biofluids [26, 27]. Data sets were mainly collected from the public domain by the Kuster lab  and generated de novo by the Pandey lab . These catalogs together represent ~ 80% of the human proteome (15,721 of 19,629 proteins; numbers reported on 27 July 2017 on the proteomicsdb website) that is available in a queriable database  and provides a baseline to better understand changes that occur in disease states. Upon the release of these two large-scale studies the authors received some criticism for overestimating the number of protein coding genes in their datasets . The numbers currently reported on the proteomicsdb website contain the adjusted numbers.
To determine the current status of the human cancer proteome landscape, the Jimenez lab performed a PubMed analysis focusing on high-resolution mass spectrometry-based studies of the past 5 years, analyzing human cancers using the search terms tumor, human, cancer, proteome, proteomics, mass spectrometry in various combinations, together with dedicated searches using also the names of the tumor types (Fig. 2, Additional file 1: Table S1). This search returned hits for 18 tumor types and revealed that breast cancer proteome profiling stands out with the largest number of profiled samples (~ 600) in 12 different studies, followed by colorectal cancer and ovarian cancer studies with about 250 samples analyzed in 12 and 6 different studies, respectively (Fig. 2). For leukemias, lung, brain, liver, and pancreatic cancer, analyses were reported for more than 100 cancers, and for other tumor types, including gastric and kidney cancer, melanoma, bladder, prostate, and cervical cancer, osteosarcoma, esophageal cancer, and retinoblastoma, the numbers quickly drop. Therefore, this analysis highlights which tumor proteomes need to be profiled to have a comprehensive view of the cancer proteome. The analysis also revealed that many studies only analyzed 10–20 samples at a depth of 1000–2000 proteins per sample. Crucially, there is a need for studies with a more substantial numbers of samples (> 50), analyzed at a substantial depth (ideally > 3000–5000 proteins per sample). In addition, data deposition in the public domain will be key to allow for other researchers to re-use data and cross-validate biomarkers. Unfortunately, currently only a subset of the published tumor proteome data is publicly available (Fig. 2 and Additional file 1: Table S1). With MS instruments becoming ever faster and more sensitive, we expect that in the coming years the number of ‘better-powered’ studies, like those performed in the CPTAC context, will grow significantly. The results of the literature survey along with a meta-analysis of public domain and own data will be reported elsewhere.
Together, the world-wide cancer proteome profiling effort by the Cancer-HPP and allied initiatives will make it possible to build comprehensive and quantitative catalogs of proteins encountered in different tumor types and clinical conditions. Integration with functional genomics will address the basic question of how genotypic variability is mechanistically translated into phenotypic variability while integration with clinical data will enable application in a precision oncology pipeline.
The ability to interrogate cancer at the proteome level and integrate acquired knowledge with genome data will further improve clinical decision-making and catalyze new clinical and translational cancer research. The Cancer-HPP will support efforts that generate, analyze, and integrate cancer proteome data by disseminating best practices and aiming for development of a queryable data resource of published human cancer proteomes. We call upon all cancer proteome researchers, clinicians and pathologists to join us in this effort. Moreover, development of closer ties between proteome scientists and those that routinely develop, implement and oversee use of in vitro diagnostics (i.e. pathologists, clinicians, clinical chemists, IVD industry) is expected to provide another potential opportunity to accelerate progress in the cancer proteomics realm. We believe the future is bright, especially in view of the advent of novel mass spectrometry approaches that combine the best of discovery and targeted mass spectrometry and the development of emerging techniques like top down proteomics , which will facilitate the analysis of disease related post-translational modifications. Ultimately, cancer proteogenomics powered by precise measurements and high-quality diagnostic methods using the lowest “–plex” possible will realize the full potential of multi-parameter diagnostics and personalized medicine.
two-dimensional liquid chromatography
American Association for Cancer Research
Applied Proteogenomics Organizational Learning and Outcomes
Biology/Disease-driven Human Proteome Project
The Human Cancer Proteome Project
Clinical Proteomic Tumor Analysis Consortium
gel electrophoresis followed by liquid chromatography
Genomics Evidence Neoplasia Information Exchange
Human Proteome Organization
tandem mass spectrometry
standard operating procedure
The Cancer Genome Atlas
Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144:646–74. https://doi.org/10.1016/j.cell.2011.02.013.
Nice EC. From proteomics to personalized medicine: the road ahead. Expert Rev Proteomics. 2016;13:341–3. https://doi.org/10.1586/14789450.2016.1158107.
Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA, Kinzler KW. Cancer genome landscapes. Science. 2013;339:1546–58. https://doi.org/10.1126/science.1235122.
Rubio-Perez C, Tamborero D, Schroeder MP, Antolín AA, Deu-Pons J, Perez-Llamas C, et al. In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities. Cancer Cell. 2015;27:382–96. https://doi.org/10.1016/j.ccell.2015.02.007.
Ellis MJ, Gillette M, Carr SA, Paulovich AG, Smith RD, Rodland KK, et al. Connecting genomic alterations to cancer biology with proteomics: the NCI Clinical Proteomic Tumor Analysis Consortium. Cancer Discov. 2013;3:1108–12. https://doi.org/10.1158/2159-8290.CD-13-0219.
Roux PP, Thibault P. The coming of age of phosphoproteomics—from large data sets to inference of protein functions. Mol Cell Proteomics. 2013;12:3453–64. https://doi.org/10.1074/mcp.R113.032862.
Zhang Z, Chan DW. The road from discovery to clinical diagnostics: lessons learned from the first FDA-cleared in vitro diagnostic multivariate index assay of proteomic biomarkers. Cancer Epidemiol Biomark Prev. 2010;19:2995–9. https://doi.org/10.1158/1055-9965.EPI-10-0580.
Rai AJ, Zhang Z, Rosenzweig J, Shih I-M, Pham T, Fung ET, et al. Proteomic approaches to tumor marker discovery. Arch Pathol Lab Med. 2002;126:1518–26. https://doi.org/10.1043/0003-9985(2002)126<1518:PATTMD>2.0.CO;2.
Zhang Z, Bast RC, Yu Y, Li J, Sokoll LJ, Rai AJ, et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res. 2004;64:5882–90. https://doi.org/10.1158/0008-5472.CAN-04-0746.
Bosch LJW, de Wit M, Pham TV, Coupé VMH, Hiemstra AC, Piersma SR, et al. Novel stool-based protein biomarkers for improved colorectal cancer screening. Ann Intern Med. 2017;167:855. https://doi.org/10.7326/M17-1068.
Hood L. Systems biology and p4 medicine: past, present, and future. Rambam Maimonides Med J. 2013;4:e0012. https://doi.org/10.5041/RMMJ.10112.
Rivers RC, Kinsinger C, Boja ES, Hiltke T, Mesri M, Rodriguez H. Linking cancer genome to proteome: NCI’s investment into proteogenomics. Proteomics. 2014;14:2633–6. https://doi.org/10.1002/pmic.201400193.
Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, et al. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014;513:382–7. https://doi.org/10.1038/nature13438.
Zhang H, Liu T, Zhang Z, Payne SH, Zhang B, McDermott JE, et al. Integrated proteogenomic characterization of human high-grade serous ovarian cancer. Cell. 2016;166:755–65. https://doi.org/10.1016/j.cell.2016.05.069.
Mertins P, Mani DR, Ruggles KV, Gillette MA, Clauser KR, Wang P, et al. Proteogenomics connects somatic mutations to signalling in breast cancer. Nature. 2016;534:55–62. https://doi.org/10.1038/nature18003.
Wang J, Ma Z, Carr SA, Mertins P, Zhang H, Zhang Z, et al. Proteome profiling outperforms transcriptome profiling for coexpression based gene function prediction. Mol Cell Proteomics. 2017;16:121–34. https://doi.org/10.1074/mcp.M116.060301.
Rose S. Huge data-sharing project launched. Cancer Discov. 2016;6:4–5. https://doi.org/10.1158/2159-8290.CD-NB2015-159.
Fiore LD, Rodriguez H, Shriver CD. Collaboration to accelerate proteogenomics cancer care: The Department of Veterans Affairs, Department of Defense, and the National Cancer Institute’s Applied Proteogenomics OrganizationaL Learning and Outcomes (APOLLO) Network. Clin Pharmacol Ther. 2017;101:619–21. https://doi.org/10.1002/cpt.658.
Aebersold R, Bader GD, Edwards AM, van Eyk JE, Kussmann M, Qin J, et al. The biology/disease-driven human proteome project (B/D-HPP): enabling protein research for the life sciences community. J Proteome Res. 2013;12:23–7. https://doi.org/10.1021/pr301151m.
Jin P, Lan J, Wang K, Baker MS, Huang C, Nice EC. Pathology, proteomics and the pathway to personalised medicine. Expert Rev Proteomics. 2018;2018(14789450):1425618. https://doi.org/10.1080/14789450.2018.1425618.
Biospecimen Collection, Processing, Storage, Retrieval, and Dissemination. National Cancer Institute, Biorepositories and Biospecimen Research Branch. 2017. https://biospecimens.cancer.gov/bestpractices/to/bcpsrd.asp. Accessed 17 July 2017.
Human Cancer Proteome Project (Cancer-HPP). Human Proteome Organization. 2017. https://www.hupo.org/Human-Cancer-Proteome-Project. Accessed 17 July 2017.
Addona TA, Abbatiello SE, Schilling B, Skates SJ, Mani DR, Bunk DM, et al. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring-based measurements of proteins in plasma. Nat Biotechnol. 2009;27:633–41. https://doi.org/10.1038/nbt.1546.
Guidelines for Preparing Manuscripts Describing Research in Clinical Proteomics. Molecular & Cellular Proteomics. 2017. http://www.mcponline.org/site/cpmeeting/ProGuideIntro2.xhtml. Accessed 17 July 2017.
Omenn GS. Advances of the HUPO Human Proteome Project with broad applications for life sciences research. Expert Rev Proteomics. 2017;14:109–11. https://doi.org/10.1080/14789450.2017.1270763.
Wilhelm M, Schlegl J, Hahne H, Gholami AM, Lieberenz M, Savitski MM, et al. Mass-spectrometry-based draft of the human proteome. Nature. 2014;509:582–7. https://doi.org/10.1038/nature13319.
Kim M-S, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, et al. A draft map of the human proteome. Nature. 2014;509:575–81. https://doi.org/10.1038/nature13302.
ProteomicsDB. 2017. https://www.proteomicsdb.org. Accessed 17 July 2017.
Omenn GS, Lane L, Lundberg EK, Beavis RC, Nesvizhskii AI, Deutsch EW. Metrics for the Human Proteome Project 2015: progress on the human proteome and guidelines for high-confidence protein identification. J Proteome Res. 2015;14:3452–60. https://doi.org/10.1021/acs.jproteome.5b00499.
Cleland TP, DeHart CJ, Fellers RT, VanNispen AJ, Greer JB, LeDuc RD, et al. High-throughput analysis of intact human proteins using UVPD and HCD on an orbitrap mass spectrometer. J Proteome Res. 2017;16:2072–9. https://doi.org/10.1021/acs.jproteome.7b00043.
CRJ was responsible for the collection and meta-analysis of publicly available cancer tissue proteomics data sets, and for preparation of figures. CRJ, HZ, CRK, and ECN wrote the paper. All authors read and approved the final manuscript.
Dr. Jaco Knol, Dr. Thang Pham of the VU University Medical Center in Amsterdam are thanked for the analyses underlying Figs. 1 and 2 and members of the OncoProteomics Laboratory for performing the literature searches that are at the basis of Fig. 2 and included in Additional file 1: Table S1.
The authors declare that they have no competing interests.
Availability of data and materials
This publication is supported by multiple datasets, which are openly available at locations cited in the reference section. The metadata used and analysed in this publication and the accession numbers of publicly available datasets are available in Additional file 1: Table S1.
Consent for publication
Ethics approval and consent to participate
Contingent on studies providing data sets analyzed in this publication.
No funding was obtained for the work in this publication.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.