

Mass Spectrometric Identification of Proteotypic Peptides from Clinically Used Tumor Markers



With the rapid development of mass spectrometry-based technologies such as multiple reaction monitoring and heavy-isotope-labeled-peptide standards, quantitative analysis of biomarker proteins using mass spectrometry is rapidly progressing toward detection of target proteins/peptides from clinical samples. Proteotypic peptides are a few peptides that are repeatedly and consistently identified from a protein in a mixture and are used for quantitative analysis of the protein in a complex biological sample by mass spectrometry.

Materials and Methods

Using mass spectrometry, we identified peptide sequences and provided a list of tryptic peptides and glycopeptides as proteotypic peptides from five clinically used tumor markers, including prostate-specific antigen, carcinoembryonic antigen, Her-2, human chorionic gonadotropin, and CA125.


These proteotypic peptides have potential for targeted detection as well as for use as heavy-isotope-peptide standards for quantitative analysis of marker proteins in clinical specimens using a highly specific, sensitive, and high-throughput mass spectrometry-based analysis method.


Currently, there are 19 proteins approved by the US Food and Drug Administration [1] as tumor markers in serum, urine, and tissue. With the exception of prostate-specific antigen (PSA), most of these tumor markers are intended for monitoring response to therapy. There is an urgent need to identify tumor markers for the detection of cancer at an early stage when treatment is much more effective.

With the advances in proteomic technologies, protein biomarker discovery has shown increasing promise. The critical aspect in the development and validation of biomarkers is to obtain precise and consistent analytical results performed over time, with different methods, or among different laboratories. The most popular method for the measurement of tumor markers is immunoassay, which provides a rapid, sensitive, and high-throughput detection platform in the clinical setting. However, standardization is often a challenge for tumor markers where assay component and design including antibodies, assay parameters, and calibration with known antigen standards are important issues [2]. First, tumor antigen standards have to be pure and consistent. Tumor antigens from commercial sources are generally isolated from clinical specimens using conventional protein purification methods, such as antibody immunoprecipitation, electrophoresis, and chromatography. However, they often are contaminated with other proteins that could potentially interfere with the assays. Second, antibodies for tumor markers need to be specific. Generation of well-defined antibodies to targeted antigens has been a continuous and expensive effort for clinical assay development. In addition, conditions for tumor marker analysis need to be properly controlled. This is illustrated by the development of appropriate reference materials for PSA measurement [3]. Survey results among 25 methods for PSA determination showed 300% variability due to different reactivities of affinity reagents in assays to the free and complexed forms of PSA. New survey materials were created and tested in multiple PSA assays with inter-assay differences shown to be reduced to 15–20%. Based on these studies, the new survey material was adopted for use by the College of American Pathologists [3].

With the rapid development of mass spectrometry (MS) technologies, especially with the recent introduction of targeted analysis of proteins using multiple reaction monitoring (MRM), sensitivity and reproducibility have increased dramatically [4–6], increasing the potential for MS-based methods for tumor marker detection. MRM has several advantages over previous methods. First, the marker proteins/peptides are identified using MS based on their amino acid sequence. This gene-related information is unique for each protein/peptide. Thus, the target protein/peptide can be distinguished easily from other proteins in a complex biological mixture. This reduces the interference of other proteins. Second, multiple analyses can be developed against the same protein using different peptides from that protein, which together will produce higher specificity and accuracy for the specific target. Third, the sensitivity of MS detection coupled with multi-dimensional chromatography and MRM is comparable to that of immunoassays. The specificity of an MRM scan greatly reduces chemical background. This enables extremely low limits of detection and quantitation, and the ability to quantify the concentration over a wider dynamic range. Recently, it has been demonstrated that this approach can quantify proteins in plasma at the sub-nanogram/milliliter level with linearity over two orders of magnitude [4]. Fourth, accurate quantification can be achieved using MRM and heavy-isotope-labeled-peptide standards. Unlike the traditional antigen standards for tumor markers, the heavy-isotope-labeled-peptide standards can be chemically synthesized using the specific amino acid sequences from the tumor markers and specifically purified. Their purity and content can be defined to known levels of accuracy. Finally, the availability of highly specific antibodies will no longer be an obstacle for quantitative protein measurement using an MS-based analytical approach. The information-dependent acquisition or monitoring of multiple fragment ions from previously identified peptide standards could be used to confirm the sequences of the peptides. Developing assays for the currently used tumor markers using MRM and heavy-isotope-labeled-peptide standards can provide opportunities to determine the performance of the proteomic method by comparing MRM to the traditional immunoassays.

Recently, the Early Detection Research Network of the National Cancer Institute developed a reference material based on a pooled serum standard for benchmarking serum proteomics as a source of early cancer detection biomarkers. This pilot reference material was spiked with several proteins (corresponding to clinically used tumor markers) to simulate the cancer disease state. We report here the mass spectrometric analysis of five of the spiked proteins: PSA, carcinoembryonic antigen (CEA), Her-2, human chorionic gonadotropin (hCG), and CA125. The five tumor markers were first digested using trypsin and identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS). In addition, since all five tumor markers are glycoproteins, N-linked glycopeptides of each protein were isolated using solid-phase extraction of N-linked glycopeptides (SPEG) [7, 8]. The glycopeptide isolation greatly reduced the sample complexity by removing most of the high-abundance serum peptides and significantly improved the analytical sensitivity [7, 9]. The extracted glycopeptides from each protein were analyzed by LC–MS/MS, and N-linked glycopeptides were identified from all five tumor markers. Together, the tryptic peptide and glycopeptide results provided a list of proteotypic peptides for targeted detection of these tumor markers using MS-based methods [10]. Heavy-isotope-labeled-peptide standards could be synthesized based upon the peptide sequences identified here and used for the development of assays for these tumor markers using MS-based methods.

Experimental Methods

Materials and Reagents

Five tumor marker proteins were purchased from commercial sources, as listed in Table 1. Sequencing grade trypsin was purchased from Promega (Madison, WI), sequencing grade endoproteinase Arg-C was from Roche (Penzberg, Germany), peptide-N-glycosidase F (PNGase F) was from New England BioLabs (Ipswich, MA), sodium periodate and hydrazide resin were from Bio-Rad (Hercules, CA), C18 and MCX desalting columns were from Waters (Milford, MA), and the nano-high-performance liquid chromatography (HPLC) system was purchased from Eksigent (Dublin, CA, USA). The linear trap quadrupole (LTQ) mass spectrometer was from Thermo Fisher (Waltham, MA, USA). All trap columns and separation columns were home packed. Other chemicals were purchased from Sigma (St. Louis, MO, USA).

Table 1 Clinically used tumor markers for mass spectrometric analysis

Generation of Tryptic Peptides from Clinically Used Tumor Markers

Each tumor marker protein was digested to obtain tryptic peptides. Proteins (10 μl from original package) were first denatured in 45 μl of 8 M urea, 0.4 M NH4HCO3, and 0.1% sodium dodecyl sulfate for 1 h at 60°C, then reduced by adding 5 μl of 120 mM Tris (2-carboxyethyl) phosphine at room temperature for 30 min and alkylated by mixing with 5 μl of 160 mM iodoacetamide at room temperature for 30 min in the dark. Each sample was diluted with 95 μl of trypsin digestion buffer (100 mM KH2PO4, pH 8.0) with 10 μg of trypsin, and the proteins were digested at 37°C overnight with gentle shaking. For each protein, 0.5 μl of the diluted protein solution before adding trypsin and 0.5 μl of the digested protein solution were analyzed by one-dimensional polyacrylamide gel electrophoresis (1D-PAGE) and silver staining to monitor the protein purity and the tryptic digestion. The digested peptides were cleaned with C18 columns, dried, and resuspended in 10 μl of 0.4% acetic acid solution. Then, 5 μl of the tryptic peptide mixture of each protein was used in each LC–MS/MS analysis.
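The reagent concentrations reached at each step follow directly from the volumes listed above. The short Python sketch below works through that arithmetic. It assumes that volumes are simply additive (an assumption, not something stated in the protocol), and the helper name `diluted` is hypothetical.

```python
def diluted(stock_conc, stock_vol_ul, total_vol_ul):
    """Concentration after a stock of volume stock_vol_ul ends up in total_vol_ul."""
    return stock_conc * stock_vol_ul / total_vol_ul

# Volumes in microliters, taken from the protocol text.
protein, denaturant, tcep, iaa, digest_buffer = 10, 45, 5, 5, 95

# Running total volume after each addition (assumes additive volumes).
vol_denature = protein + denaturant
vol_reduce = vol_denature + tcep
vol_alkylate = vol_reduce + iaa
vol_digest = vol_alkylate + digest_buffer

urea_denature_M = diluted(8.0, denaturant, vol_denature)  # urea during denaturation
tcep_mM = diluted(120.0, tcep, vol_reduce)                # reductant during reduction
iaa_mM = diluted(160.0, iaa, vol_alkylate)                # iodoacetamide during alkylation
urea_digest_M = diluted(8.0, denaturant, vol_digest)      # urea during digestion

print(f"urea at denaturation: {urea_denature_M:.2f} M")
print(f"TCEP at reduction: {tcep_mM:.1f} mM")
print(f"iodoacetamide at alkylation: {iaa_mM:.1f} mM")
print(f"urea at digestion: {urea_digest_M:.2f} M")
```

Under these assumptions, the dilution with digestion buffer lowers the urea concentration to roughly a quarter of the stock value before trypsin is added.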

N-Linked Glycopeptide Isolation

Formerly N-linked glycosylated peptides were isolated from the tumor markers using the N-linked glycopeptide capture procedure as described previously [8, 11]. Briefly, 10 μl of individual protein from the original source was used for glycopeptide isolation. The five proteins were first denatured, reduced, and alkylated using the method described above. Proteins, except PSA, were diluted with 95 μl of trypsin digestion buffer and digested using 10 μg of trypsin. The PSA protein was diluted with 95 μl of Arg-C digestion buffer (100 mM Tris–HCl, 20 mM CaCl2, 10 mM dithiothreitol, 1 mM ethylenediaminetetraacetic acid, 40 mM methylamine, pH 7.6). One microgram of protease Arg-C was added to the digestion buffer, and the protein was digested at 37°C overnight with gentle shaking. After digestion of proteins to peptides, the peptides were cleaned with C18 columns and oxidized by adding 25 μl of 100 mM sodium periodate in 50% acetonitrile at 4°C for 1 h in the dark. After removal of the oxidant using C18 columns, the sample was conjugated to hydrazide resin at room temperature for 4 h in 80% acetonitrile. Non-glycosylated peptides were then removed by washing the resin three times each with 800 μl of 1.5 M NaCl, H2O, and 100 mM NH4HCO3. N-linked glycopeptides were then released from the resin by addition of 1 mU of PNGase F in 100 mM NH4HCO3 and incubation at 37°C overnight. After the final cleanup by MCX columns, the peptides were dried and resuspended in 10 μl of 0.4% acetic acid solution. Then, 5 μl of the glycopeptide mixture was used in each LC–MS/MS analysis.

Mass Spectrometric Analysis of Peptides

The tryptic peptides and N-linked glycopeptides from tumor markers were analyzed using an LC–MS/MS platform. Five microliters of tryptic peptides and 5 μl of N-linked glycopeptides were injected into an Eksigent nano-HPLC system. Peptides were then separated by a nano-scale C-18 reverse phase column (75-μm inner diameter × 10 cm long and packed with YMC ODS-AQ, 5 μm particle size, 120 Å pore size) at a flow rate of 300 nl/min. The HPLC mobile phases A and B were 0.1% formic acid in HPLC grade water and 0.1% formic acid in 90% HPLC grade acetonitrile, respectively. The mobile phase B was increased from 10% to 60% in 33 min and then increased to 100% in 22 min. A 2.1-kV spray voltage was applied to transfer the separated peptides from HPLC to an LTQ ion trap mass spectrometer. Precursor scans covered 350–1,800 m/z, and the eight most intense ions were selected for MS/MS scans. The resulting MS/MS spectra were used for peptide identification.
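The selection of the eight most intense precursor ions within the 350–1,800 m/z scan range can be illustrated with a minimal Python sketch. It only mimics the ranking step. The function name `pick_precursors` and the example peak list are hypothetical, and any further instrument-side criteria (for example, exclusion of previously fragmented ions) are not described in this paper and are not modeled.

```python
def pick_precursors(peaks, mz_min=350.0, mz_max=1800.0, top_n=8):
    """Return the top_n most intense (mz, intensity) peaks within the scan range.

    peaks: iterable of (mz, intensity) tuples from one precursor scan.
    The result is ordered from most to least intense.
    """
    in_range = [(mz, inten) for mz, inten in peaks if mz_min <= mz <= mz_max]
    return sorted(in_range, key=lambda peak: peak[1], reverse=True)[:top_n]

# Hypothetical precursor scan; two peaks fall outside the scan range.
scan = [(300.0, 9e5), (400.2, 5e4), (512.3, 8e5), (1900.0, 7e5), (650.4, 3e5)]
selected = pick_precursors(scan, top_n=2)
print(selected)
```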

Identification of Peptide Sequences

Peptides were identified using a database search. The MS/MS spectra were searched against a public database (version 2.28 of the International Protein Index human protein database containing 40,110 entries) using Sequest [12]. The parameters for the database search were as follows: (1) the mass tolerance of the precursor was set at ±2 m/z units, (2) protein modifications allowed for tryptic peptides included carboxymethylation of cysteines and oxidation of methionines, and (3) protein modifications allowed for glycopeptides included carboxymethylation of cysteines, oxidation of methionines, and an enzyme-catalyzed conversion of asparagine to aspartic acid at the glycosylated site.
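To make the precursor tolerance and the asparagine-to-aspartic-acid conversion concrete, the following Python sketch computes theoretical m/z values for the PSA glycopeptide N*KSVILLGR before and after conversion and checks them against a ±2 m/z window. It is an illustration only, not the Sequest scoring procedure. The residue masses are standard monoisotopic values, the function names are hypothetical, and the choice of charge state 2 is an assumption made for the example.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the peptide termini
PROTON = 1.00728   # mass of a proton

def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge

def within_tolerance(observed_mz, theoretical_mz, tol=2.0):
    """True if the observed precursor lies within +/- tol m/z units."""
    return abs(observed_mz - theoretical_mz) <= tol

# PNGase F release converts the glycosylated N to D (written here as a D).
native = peptide_mz("NKSVILLGR", 2)
deglycosylated = peptide_mz("DKSVILLGR", 2)
print(f"N form: {native:.3f}, D form: {deglycosylated:.3f}")
print(within_tolerance(native, deglycosylated))
```

With a window this wide, the N and D forms of a doubly charged peptide both fall inside the precursor tolerance, so distinguishing them depends on the fragment ions in the MS/MS spectrum rather than on the precursor m/z alone.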

Database search results were statistically analyzed using PeptideProphet, which effectively computes a probability for the likelihood of each identification being correct (on a scale of 0 to 1) in a data-dependent fashion [13]. A minimum PeptideProphet probability score of 0.8 was used to remove low probability peptides. For N-linked glycopeptides, an N-linked glycosylation consensus motif (N-X-S/T, where X is any amino acid except proline) [14] and the conversion of asparagine to aspartic acid were used to identify the glycosylation sites. A maximum of two tryptic peptides and two N-linked glycopeptides were selected based on the rank of PeptideProphet score to represent each tumor marker.
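The filtering and selection rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PeptideProphet software itself: the function names and record layout are hypothetical, and the peptide sequences and probabilities in the example are invented placeholders, except for NKSVILLGR, which is the PSA glycopeptide discussed in this paper.

```python
import re
from collections import defaultdict

# N-X-S/T sequon, where X is any residue except proline.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_sites(sequence):
    """0-based positions of asparagines that sit in an N-X-S/T motif."""
    return [m.start() for m in SEQUON.finditer(sequence)]

def select_peptides(identifications, min_prob=0.8, max_per_protein=2):
    """Keep identifications at or above min_prob, then the best few per protein.

    identifications: iterable of (protein, peptide, probability) tuples.
    Returns {protein: [peptide, ...]} ordered by decreasing probability.
    """
    by_protein = defaultdict(list)
    for protein, peptide, prob in identifications:
        if prob >= min_prob:
            by_protein[protein].append((prob, peptide))
    return {
        protein: [pep for _, pep in sorted(hits, reverse=True)[:max_per_protein]]
        for protein, hits in by_protein.items()
    }

# Invented example records (placeholder sequences and scores).
hits = [
    ("PSA", "AAAK", 0.99), ("PSA", "CCCK", 0.95),
    ("PSA", "DDDK", 0.85), ("PSA", "EEEK", 0.40),
    ("CEA", "FFFR", 0.70),
]
print(select_peptides(hits))
print(sequon_sites("NKSVILLGR"))
```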


The Purities of Tumor Marker Proteins

The five tumor marker proteins were purchased from commercial sources. Based on the information provided by the vendors, two proteins, PSA and CEA, were purified by chromatographic methods with purities higher than 98%. hCG was precipitated from pregnant women’s urine without additional purification. The purification methods of the other two proteins were not provided by the vendors. Each of the proteins was tested by 1D-PAGE (4–20% gradient gel) and silver staining to verify the protein purity (see Fig. 1). The silver staining showed that all tumor marker proteins purchased for this study contained protein bands other than the expected tumor antigens. This result clearly showed that the purification methods for these tumor antigen standards were not sufficient to produce highly purified proteins.

Fig. 1

1D-PAGE gel and silver staining to verify the protein purity of five tumor markers

Tryptic Peptide Identification

To increase the accuracy of biomarker protein detection in clinical samples, mass spectrometry was used to identify tryptic peptides for each tumor marker protein. All five marker proteins were digested by trypsin, and the digested peptides were analyzed by LC–MS/MS and identified using database search. The tryptic peptides for each protein are listed in Table 2. Two peptide sequences from each protein were selected based on peptide identification probabilities, and their representative MS/MS spectra and fragment ions are shown in Fig. 2.

Fig. 2

MS/MS spectra and fragment ion lists of selected tryptic peptides from five tumor markers

Table 2 Sequences of mass spectrometric identified tryptic peptides from tumor markers

In addition to identifying the peptides from the known clinically used tumor markers, we also identified other proteins from the same samples, indicating that the tumor antigens from commercial sources are generally contaminated with other proteins. For example, PSA was purified by chromatography, and the purity reported by the vendor was greater than 98%. Nevertheless, mass spectrometry detected 15 other proteins in the solution (Table 3). These protein contaminants (even at less than 2%) might potentially interfere with clinical assays when these proteins are used as antigen standards.

Table 3 Mass spectrometric identified proteins and tryptic peptides from PSA antigens purchased from commercial source

Glycopeptide Identification

Since all of the tumor markers studied are glycoproteins, N-linked glycopeptides for all five proteins were isolated using the SPEG method [8]. We chose SPEG as the additional peptide purification method because (1) all five proteins were glycoproteins, (2) glycopeptide isolation efficiently removed most of the high-abundance serum peptides and significantly reduced the sample complexity, and (3) the glycopeptides could be concentrated using this method, which largely increased the chance of peptide identification by LC–MS/MS. Based on the MS/MS database searching results, all five proteins were identified by at least one glycopeptide. These glycopeptides are listed in Table 4. Selected MS/MS spectra of the glycopeptides and their fragment ions are shown in Fig. 3.

Fig. 3

MS/MS spectra and fragment ion lists of selected N-linked glycopeptides from five tumor markers. N* denotes N-linked glycosylation sites, which are converted to D after deglycosylation

Table 4 Sequences of mass spectrometric identified N-linked glycopeptides from tumor markers


In this study, we analyzed five clinically used tumor marker proteins using mass spectrometry and identified at least two of their tryptic peptides and N-linked glycopeptides that can be detected by mass spectrometry as proteotypic peptides [10]. These proteotypic peptides for established clinically important serum tumor markers are presented here as important background data required for development of heavy-isotope-labeled-peptide standards. The MS/MS spectra and fragment ions from this study are presented here for selection of fragment ion transitions for the MRM approach. For clinical quantitative analysis, heavy peptide standards will be spiked into clinical samples to allow the mass spectrometers to detect both native and heavy peptide peaks for target peptides [4–6]. The native to heavy ratio of target peptides will provide the quantitative marker protein level in each sample. When coupled with chromatography and the MRM detection method, MS detection can be sensitive and quantitative for target marker proteins for each sample. Since multiple peptides can be monitored simultaneously, this method can be used to detect multiple marker proteins from a single assay. It provides an alternative approach for the clinical laboratory to measure marker protein levels in patient specimens in a precise, sensitive, and high-throughput way.
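The native-to-heavy ratio calculation described above can be sketched as follows. This is a minimal illustration, assuming that the native and heavy peptides give an equal detector response and that the spiked amount of heavy standard is known. The function names, the use of a median across transitions, and the peak areas in the example are hypothetical and were not part of this study.

```python
from statistics import median

def native_amount(native_area, heavy_area, heavy_spiked_fmol):
    """Estimate the native peptide amount from one MRM transition.

    Assumes equal response of native and heavy peptide, so that
    amount_native = (area_native / area_heavy) * amount_heavy.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard peak area must be positive")
    return native_area / heavy_area * heavy_spiked_fmol

def peptide_amount(transitions, heavy_spiked_fmol):
    """Median estimate over several (native_area, heavy_area) transition pairs."""
    return median(native_amount(n, h, heavy_spiked_fmol) for n, h in transitions)

# Hypothetical peak areas for three transitions of one target peptide.
example = [(2.0e5, 4.0e5), (1.1e5, 2.0e5), (0.9e5, 2.0e5)]
estimate = peptide_amount(example, heavy_spiked_fmol=100.0)
print(f"estimated native peptide: {estimate:.1f} fmol")
```

Taking the median over transitions is one simple way to limit the effect of a single interfered transition; other summaries could be used instead.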

Using the mass spectrometry-based approach for marker detection, the accuracy of analytical results can be high. MS identification is a powerful method to detect specific peptides from a complex sample mixture. It provides significant advantages for the analysis of clinical samples that are often complex. Compared to immunoassays, no protein–protein interaction can adversely affect MS analysis. It significantly increases the analytical accuracy and reliability. Also, when multiple peptides are selected from one protein, the identification and quantification of these peptides can be used to measure marker concentration. Moreover, MS detection does not depend on the availability of highly specific, well-characterized antibodies, which is often the most significant technical obstacle in achieving a precise and sensitive immunoassay.

The peptides identified in this study were based on the widely available LTQ ion trap instrument. It has been shown that intrinsic peptide sequence is the major factor to determine whether a peptide from a protein can be detected by mass spectrometry methods [10]. However, mass spectrometers based upon different mechanisms for ionization, ion selection, ion separation, collision-induced dissociation, and detection may affect the generation of fragment ion patterns and peptide identification. The mass spectrometer parameter settings may also affect the MS identification result.

Digestion of proteins with proteases is another factor that might affect peptide detection. For example, the glycopeptide N*KSVILLGR (N* represents the N-linked glycosylation site) from Arg-C digestion of PSA would not be identified by MS if trypsin were used, because trypsin cleaves the peptide after K. In that case, the two-amino-acid peptide (N*K) generated by tryptic digestion would be too short for mass spectrometric detection.
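The effect of protease choice on this peptide can be reproduced with a simple in-silico digestion. The Python sketch below is a simplified model, assuming that trypsin cleaves C-terminal to K and R, that Arg-C cleaves C-terminal to R only, and that neither cleaves before proline (a commonly used rule of thumb). The function name `digest` is hypothetical.

```python
def digest(sequence, cleave_after):
    """Split a sequence C-terminal to any residue in cleave_after.

    Cleavage is suppressed when the next residue is proline.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cleave_after and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # trailing piece without a cleavage site
    return peptides

TRYPSIN = "KR"  # cleaves after lysine and arginine
ARG_C = "R"     # cleaves after arginine only

glycopeptide = "NKSVILLGR"  # PSA glycopeptide from the text (N is the glycosite)
print(digest(glycopeptide, ARG_C))
print(digest(glycopeptide, TRYPSIN))
```

With Arg-C the glycosylation site stays within a nine-residue peptide, whereas trypsin leaves it on a two-residue fragment.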


We identified tryptic peptides and glycopeptides from five clinically established tumor marker proteins, including PSA, CEA, Her-2, hCG, and CA125. The identified peptide list provided candidate peptide targets for quantitative analysis of marker proteins in clinical specimens. Heavy-isotope-labeled peptides can be synthesized as standards based on the identified peptide sequences from this study for the development of assays of these five tumor markers. In addition, these peptides have potential utility for many laboratories for cancer proteomics research and may provide a basis for standardization across a broad proteomics community. This will provide valuable information for further refinement of standard reference materials and assays that are critically needed for cancer biomarker discovery and validation.

We expect that by coupling MRM detection with heavy-isotope-labeled-peptide standards, the MS-based approach has great potential as a clinical tool for specific, precise, and sensitive measurement of marker protein levels from a large number of clinical specimens.


  1. Sokoll LJ, Chan DW. In: Abeloff MD, Armitage JO, Niederhuber JE, Kastan MB, McKenna WG, editors. Abeloff’s clinical oncology. 4th ed. Philadelphia, PA: Elsevier Inc.; 2008.

  2. Meany DL, Chan DW. Comparability of tumor marker immunoassays: still an important issue for clinical diagnostics? Clin Chem Lab Med. 2008;46:575–6.

  3. Sokoll LJ, Witte DL, Klee GG, Chan DW. Redesigned proficiency testing materials improve survey outcomes for prostate-specific antigen. A College of American Pathologists Ligand Assay Survey tool. Arch Pathol Lab Med. 2000;124:1608–13.

  4. Keshishian H, Addona T, Burgess M, Kuhn E, Carr SA. Quantitative, multiplexed assays for low abundance proteins in plasma by targeted mass spectrometry and stable isotope dilution. Mol Cell Proteomics. 2007;6:2212–29.

  5. Stahl-Zeng J, et al. High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites. Mol Cell Proteomics. 2007;6:1809–17.

  6. Anderson L, Hunter CL. Quantitative mass spectrometric multiple reaction monitoring assays for major plasma proteins. Mol Cell Proteomics. 2006;5:573–88.

  7. Zhang H, Li XJ, Martin DB, Aebersold R. Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry. Nat Biotechnol. 2003;21:660–6.

  8. Tian Y, Zhou Y, Elliott S, Aebersold R, Zhang H. Solid-phase extraction of N-linked glycopeptides. Nat Protocols. 2007;2:334–9.

  9. Zhang H, et al. High throughput quantitative analysis of serum proteins using glycopeptide capture and liquid chromatography mass spectrometry. Mol Cell Proteomics. 2005;4:144–55.

  10. Mallick P, et al. Computational prediction of proteotypic peptides for quantitative proteomics. Nat Biotechnol. 2007;25:125–31.

  11. Zhou Y, Aebersold R, Zhang H. Isolation of N-linked glycopeptides from plasma. Anal Chem. 2007;79:5826–37.

  12. Eng J, McCormack AL, Yates JR 3rd. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J Am Soc Mass Spectrom. 1994;5:976–89.

  13. Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74:5383–92.

  14. Bause E. Structural requirements of N-glycosylation of proteins. Studies with proline peptides as conformational probes. Biochem J. 1983;209:331–6.



This work was supported by federal funds from the National Cancer Institute, National Institutes of Health, by grants R21-CA-114852 (to H.Z.) and U24 CA115102 (Early Detection Research Network, EDRN; to D.W.C.). We gratefully acknowledge the support from Dr. Robert Cole and Dr. Marjan Gucek for protein identification in the mass spectrometry/proteomics facility at Johns Hopkins School of Medicine and the support of the Trans-Proteomic Pipeline software tools available from the Aebersold group at the Institute for Systems Biology. We also acknowledge the contributions of Dr. David Bunk from the National Institute of Standards and Technology in the development of the EDRN standard reference material. Certain commercial equipment or materials are identified in this paper in order to specify adequately the experimental procedures. Such identification implies neither recommendation nor endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

Author information

Correspondence to Hui Zhang.


Keywords


  • Tumor marker
  • Cancer diagnostics
  • Mass spectrometry
  • Proteotypic peptide
  • Glycopeptide