Spectral counting assessment of protein dynamic range in cerebrospinal fluid following depletion with plasma-designed immunoaffinity columns
© Borg et al; licensee BioMed Central Ltd. 2011
Received: 6 May 2011
Accepted: 3 June 2011
Published: 3 June 2011
In cerebrospinal fluid (CSF), which is a rich source of biomarkers for neurological diseases, identification of biomarkers requires methods that allow reproducible detection of low abundance proteins. It is therefore crucial to decrease dynamic range and improve assessment of protein abundance.
We applied LC-MS/MS to compare the performance of two CSF enrichment techniques that immunodeplete either albumin alone (IgYHSA) or 14 high-abundance proteins (IgY14). To estimate the dynamic range of the proteins identified, we measured protein abundance with the APEX spectral counting method.
Both immunodepletion methods increased the number of low-abundance proteins detected (3-fold for IgYHSA, 4-fold for IgY14). The 10 most abundant proteins following immunodepletion accounted for 41% (IgY14) and 46% (IgYHSA) of CSF protein content, whereas they accounted for 64% in non-depleted samples, thus demonstrating significant enrichment of low-abundance proteins. Defined proteomics experiment metrics showed good overall reproducibility of the two immunodepletion methods and of the MS analysis. Moreover, offline peptide fractionation of the IgYHSA sample allowed a 4-fold increase in the number of proteins identified (520 vs. 131 without fractionation), without hindering reproducibility.
The novelty of this study was to show the advantages and drawbacks of these methods side by side. Taking into account the improved detection and the potential loss of non-target proteins following extensive immunodepletion, we conclude that both depletion methods combined with spectral counting may be of interest before further fractionation when searching for CSF biomarkers. Given the reliable identification and quantitation obtained with the APEX algorithm, APEX may be considered a cheap and quick alternative for studying the proteomic content of a sample.
Biomarkers are key tools for detecting and monitoring neurodegenerative processes. Clinical proteomics is especially well-suited to the discovery and implementation of biomarkers derived from biofluids. A major limiting factor for in-depth proteomics profiling is the immense dynamic range of biofluid proteins, which spans 10 to 12 orders of magnitude. In human plasma, the 22 most abundant proteins account for ~99% of the total protein mass, leaving several hundred or thousands of proteins in the remaining 1%. Many biomarkers of interest are expected to be present at low concentrations, and their detection is therefore hindered by highly abundant proteins. To overcome this problem, enrichment techniques and orthogonal fractionation strategies are routinely applied in proteomics studies prior to mass spectrometry (MS) analysis. Recent studies have demonstrated a substantial impact of multidimensional fractionation on the overall number of proteins identified and on sequence coverage [2–6]. Despite its benefits, extensive fractionation contributes to experimental variability and limits sample throughput.
Cerebrospinal fluid (CSF) in particular is directly related to the extracellular space of the brain and is therefore a valuable reporter of processes that occur in the central nervous system (CNS). In the last few years, a number of proteomics strategies have been adopted to achieve in-depth coverage of the human CSF proteome. SCX fractionation and LC-MALDI were used to identify 1,583 CSF proteins. A GeLC-MS/MS approach allowed the identification of 798 proteins from albumin-depleted CSF. Recently, a combinatorial peptide ligand library was employed to decrease the CSF dynamic range and identify 1,212 proteins. In an attempt to generate a comprehensive CSF database, Pan et al. combined and re-analyzed the results of various CSF proteomics studies and reported 2,594 unique proteins with high confidence.
A number of commercial depletion systems are available for highly selective removal of 1, 14, 20, or over 60 of the most abundant proteins present in human plasma. Although these systems were initially designed to deplete plasma/serum samples, they have been widely used for other biofluids such as CSF. A number of reports have evaluated the efficiency and reproducibility of these systems [9–15]. They have also pointed out the potential loss of non-target proteins as a result of non-specific binding to immunodepletion columns [10, 12].
Here we evaluated the advantages afforded by immunodepletion and pre-fractionation of CSF samples. For this purpose, human CSF samples were analyzed after the removal of albumin or of 14 high-abundance proteins (HAP) and were compared with non-depleted CSF samples without further offline fractionation. Notably, the commercial depletion system used to remove the 14 HAP was designed to stoichiometrically remove the 14 most abundant proteins in normal plasma/serum samples. Depleted samples were then analyzed by LC-MS/MS and further profiled using a modified spectral counting approach. In addition to proteome depth, we evaluated the performance of CSF enrichment and fractionation strategies in terms of reproducibility and experimental bias.
Protein recovery after immunodepletion
Total protein quantitation upon immunodepletion procedure.
Before depletion (μg) | Flow-through fraction (μg) | Bound fraction (μg)
248 ± 40 | 301 ± 25 | 106 ± 2 | 425 ± 6
Reproducibility of MS1 or MS2 spectral counts following various depletion methods.
Pattern similarity following various depletion methods.
Number of detected features | Number of common features in 3 replicates | Number of common features in 2 replicates | Number of features in only 1 replicate
Peptide and protein identification
Summary of peptide and protein identification after application of depletion methods and peptide prefractionation.
Number of spectra identified | Number of unique peptides identified | Number of proteins identified
Peptide fractionation techniques are expected to increase the depth of analysis while possibly deteriorating experimental reproducibility. We set out to evaluate: (1) the gain in proteome coverage attained after peptide fractionation using offline reversed-phase; (2) the overall improvement of sample dynamic range; (3) experimental reproducibility in terms of peptide and protein identification.
We compared the protein list generated with a Mascot search alone using a target-decoy strategy with the list generated by a Mascot search combined with PeptideProphet and ProteinProphet validation analyses. CSF immunodepletion with the IgYHSA column followed by 2DLC-MS/MS analysis of one of the replicates led to the identification of 913 proteins with Mascot alone (FDR < 0.001). In contrast, with the Mascot-TPP (PeptideProphet and ProteinProphet) strategy, a total of 947 proteins were identified, 402 of which were identified with high confidence; the remaining 545 identifications were grouped into 187 protein groups whose members could not be distinguished on the basis of the peptides observed. The other replicates followed a similar trend.
The increased depth of analysis achieved with fractionation was also evident in the number of low-abundance proteins (LAP) detected in the sample. The number of proteins with an abundance more than 2 orders of magnitude below that of the most abundant protein, as determined by APEX, was used as a parameter to evaluate sample dynamic range following peptide pre-fractionation. Immunodepletion alone increased the number of LAP from 5 to 18 (Figure 2), whereas immunodepletion coupled with reversed-phase pre-fractionation further increased it to 53 proteins (Figure 2D).
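The LAP criterion described above can be illustrated with a minimal sketch. The function names and the abundance values are hypothetical and chosen only to show the arithmetic; they are not data from this study.

```python
import math

def count_low_abundance(abundances, orders_below=2.0):
    """Count proteins whose abundance lies more than `orders_below`
    orders of magnitude below the most abundant protein."""
    top = max(abundances.values())
    threshold = top / (10 ** orders_below)
    return sum(1 for value in abundances.values() if value < threshold)

def dynamic_range_log10(abundances):
    """Observed dynamic range of the sample in log10 units."""
    return math.log10(max(abundances.values()) / min(abundances.values()))

# Invented relative abundances for four hypothetical proteins.
example = {"protA": 1000.0, "protB": 150.0, "protC": 9.0, "protD": 0.5}
n_lap = count_low_abundance(example)  # protC and protD fall below 1000 / 100
```

With these invented values, two proteins fall below the threshold, while the overall range covers a little more than 3 orders of magnitude.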
Here we demonstrate that reducing sample complexity prior to analysis improves proteome coverage and the resolution of LAP. The combination of HAP immunodepletion and peptide fractionation is particularly attractive for mining the CSF proteome. The objective of the study was to compare two immunodepletion methods using a simple and efficient procedure rather than to identify the largest possible number of proteins.
Protein inference following shotgun LC-MS/MS experiments is particularly complicated in biofluids, such as blood plasma or CSF, because of the frequent occurrence of protein families, multiple protein isoforms, and homologous proteins. The presence of peptides common to multiple proteins may lead to erroneous results at the qualitative and quantitative levels. In the present study, we used ProteinProphet software with Occam's razor rules to reduce the protein list to the minimal set that can explain the peptides observed. To illustrate the effects of this strategy on our dataset, we compared the protein list generated with the Mascot search alone using a target-decoy strategy with that generated by a Mascot search combined with PeptideProphet and ProteinProphet validation analyses. It should be noted that more than 86% of proteins were identified with more than one peptide and that all peptide-spectrum matches (PSM) passed the 0.95 PeptideProphet score threshold. The enhancement of protein identification observed following CSF immunodepletion is in accordance with previous reports [11–14]. It should be noted that albumin depletion significantly improved protein identification in the present study. Moreover, 25 additional proteins were identified following depletion of 14 proteins compared with albumin alone, whereas a previous study did not report increased identification with depletion of 6 proteins compared with albumin alone. Another study compared two brands of 14 HAP depletion columns. A large number of proteins were identified with both methods, but no quantitation was performed in the flow-through. Furthermore, in serum, improved protein identification appears to be related, but only to a certain extent, to the number of proteins depleted.
One of the most remarkable aspects of this study was the use of a spectral counting approach, namely APEX, to calculate protein abundance in the sample. Of note, the global dynamic range calculated with APEX was similar in the immunodepleted and the non-depleted samples. This finding was expected since the experimental dynamic range observed is a function of the MS dynamic range. It is in accordance with previous reports [13, 14]. Nevertheless, we observed a significant improvement not only in the overall number of proteins and peptides identified, but also in the number of proteins with an abundance at least two orders of magnitude below that of the most concentrated protein in the sample. These improvements were observed regardless of the immunodepletion system used, as only 4 additional LAP were identified following depletion of 14 proteins compared with albumin only. These results suggest that the ideal workflow should be designed individually for each study, taking into account the number of identified proteins as well as the loss of non-target proteins. The dynamic range might extend to 3 logs below that of HAP if depletion methods were designed specifically for CSF and targeted CSF-specific HAP such as Prostaglandin-D-isomerase or Cystatin-C. Combinatorial peptide ligand library technology is another technique that was recently used to decrease dynamic range and thus increase LAP identification. Several hundred new proteins were identified. However, this method requires large sample volumes and extensive fractionation. When this method was adapted to small volumes, the total number of identified proteins was reduced to 530, which is quite similar to the number reported in the present study following fractionation (n = 520).
Here we compared various methods for enriching low-abundance proteins in CSF. This approach may be particularly useful in efforts to identify biomarkers for neurological diseases. The novelty of this study was to show the advantages and drawbacks of these methods side by side. We named and ranked proteins following two depletion strategies. Immunodepletion of high-abundance proteins was shown to improve the detection of low-abundance proteins at least 3-fold, with good reproducibility. We compared dynamic range following immunodepletion alone or combined with peptide prefractionation. Offline fractionation using reversed-phase LC further increased the overall number of proteins identified 3- to 4-fold. Given the reliable identification and quantitation obtained with the APEX algorithm, APEX may be considered a cheap and quick alternative for studying the proteomic content of a sample, helping proteomics researchers to design more suitable analytical strategies. The optimal method should allow enhanced detection of LAP and prevent non-specific protein losses. These data also stress the urgent need for immunodepletion columns that specifically target the most abundant CSF proteins.
Materials and methods
Using an atraumatic needle, CSF was obtained by lumbar puncture (2-4 mL per patient) from subjects attending the Department of Neurology at University Hospital Saint Etienne. CSF was collected in 12-mL polypropylene tubes (VWR), transferred on ice to the laboratory and centrifuged (3,000 × g, 10 min, 4°C). Fluid was aliquoted into 0.5-mL polypropylene cryotubes (VWR) and stored at -80°C. The study was approved by the local ethics committee of the University of Saint Etienne. CSF samples from 5 patients with amyotrophic lateral sclerosis (ALS), aged 50-76 and with clinically diagnosed or probable ALS according to the El Escorial diagnostic criteria, were pooled and used for further analysis.
The present study was devised using a single pooled CSF sample that was further divided into 12 aliquots. Each aliquot contained 780 μg total protein (Table 1). Nine of these aliquots were used for immunodepletion evaluation as follows: 3 were depleted of the 14 most abundant proteins (IgY14), 3 were depleted of albumin (IgYHSA), and the remaining 3 were not immunodepleted. The remaining 3 aliquots were depleted of albumin and further offline-fractionated using reversed-phase liquid chromatography under basic pH after protein digestion.
Immunoaffinity depletion of highly abundant proteins
CSF immunodepletion of highly abundant proteins was performed using pre-packed liquid chromatography Seppro® columns (GenWay Biotech Inc.). The term IgYHSA refers to the column used for immunodepletion of albumin alone, while IgY14 refers to that used for immunodepletion of albumin, IgG, α1-antitrypsin, IgA, IgM, transferrin, haptoglobin, α1-acid glycoprotein, α2-macroglobulin, fibrinogen, complement C3, and apolipoproteins A-I, A-II and B. Prior to injection on the column, each CSF sample was passed through a 0.45 μm pore size filter to remove particulates. Because of the loading capacity of IgY14 columns, CSF aliquots subjected to these columns were further concentrated using a 3 kDa molecular weight cutoff (MWCO) filter (Millipore). The chromatographic column was set up on an ÄKTA Ettan system (GE Healthcare) and run following the manufacturer's instructions. Finally, the flow-through was desalted and proteins were concentrated using a 3 kDa MWCO filter.
Sample preparation for LC and LC-MS/MS
The final protein concentration of the depleted samples was determined by a bicinchoninic acid colorimetric assay (Pierce Biotechnology) using BSA as standard. Seventy-five μg of protein from each sample in dissolution buffer (0.1 M triethylammonium bicarbonate, 0.1% SDS) was reduced with 5 mM tris-(2-carboxy-ethyl)-phosphine for 60 min at 60°C. Free sulfhydryl groups of cysteine residues were then blocked with 15 mM iodoacetamide for 20 min at room temperature. Digestion with trypsin (Promega) was performed overnight at 37°C at a 1:50 enzyme-substrate ratio.
To evaluate the impact of peptide fractionation following IgYHSA immunodepletion, trypsin-digested peptides were pre-fractionated offline by reversed-phase liquid chromatography under basic pH conditions (RPb) on an ÄKTA system (GE Healthcare) using a 300 Extend C18 column (150 mm length × 2.1 mm ID, 5 μm particles, 300 Å pore size; Agilent). CSF peptides were fractionated into 30 fractions. The peptide mixture dissolved in buffer A (25 mM NH4OH, pH 9.5) was loaded onto the column and eluted with a gradient of 0 to 10% buffer B (25 mM NH4OH in acetonitrile, pH 9.5) over 3 min, then 10% to 28% buffer B over 8 min, and 28% to 45% buffer B over 4 min at a column flow rate of 0.5 mL/min. Fractions were collected at intervals of 30 seconds. Finally, acetonitrile was removed by evaporation and fractions were stored at -20°C until further use.
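The gradient and collection scheme above can be checked with a short calculation. The sketch below assumes that the three gradient segments are linear and that fraction collection spans exactly the 15 min gradient, which is consistent with 30 fractions at 30 s intervals but is not stated explicitly; all names are illustrative.

```python
# Gradient segments taken from the text: (duration in min, start %B, end %B).
SEGMENTS = [(3.0, 0.0, 10.0), (8.0, 10.0, 28.0), (4.0, 28.0, 45.0)]
FLOW_ML_PER_MIN = 0.5
FRACTION_INTERVAL_MIN = 0.5  # 30 s collection interval

def percent_b(t_min):
    """Percentage of buffer B at time t_min, assuming linear segments."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    start = 0.0
    for duration, b_start, b_end in SEGMENTS:
        end = start + duration
        if t_min <= end:
            return b_start + (b_end - b_start) * (t_min - start) / duration
        start = end
    raise ValueError("time is beyond the end of the gradient")

gradient_length_min = sum(duration for duration, _, _ in SEGMENTS)
n_fractions = int(round(gradient_length_min / FRACTION_INTERVAL_MIN))
fraction_volume_ml = FLOW_ML_PER_MIN * FRACTION_INTERVAL_MIN
```

Under these assumptions the gradient lasts 15 min, yields 30 fractions, and each fraction holds 0.25 mL of eluate.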
Prior to LC-MS/MS analysis, dried peptide samples were reconstituted with 0.1% aqueous formic acid. Peptide concentration estimates were extrapolated either from protein concentrations (non-fractionated samples) or from peptide absorbance at 215 nm during fractionation (RPb-fractionated samples). Approximately 200 ng of each sample was then loaded onto a 0.180 mm × 20 mm C18 Symmetry® precolumn (Waters Corp., Milford, MA) coupled to an analytical C18 column (BEH130™ 75 μm × 10 cm, 1.7 μm, Waters Corp.) at a flow rate of 15 μL/min using a nanoACQUITY Ultra Performance LC™ system (Waters Corp., Milford, MA). Peptides were separated in a 70 min gradient of 1-35% buffer B, followed by 15 min of 35-50% B (A = 0.1% formic acid in water, B = 0.1% formic acid in acetonitrile), at a flow rate of 250 nL/min. The column outlet was directly connected to an Advion Triversa Nanomate (Advion) fitted on an LTQ-FT Ultra mass spectrometer (Thermo). The mass spectrometer was operated in data-dependent mode. Survey full-scan MS spectra (m/z 400-1800) were acquired in the FT with R = 100,000 at m/z 400 (after accumulation of a target value of 1e6). The five most intense ions were sequentially isolated for fragmentation and detection in the linear ion trap using collision-induced dissociation at a target value of 50,000, 1 microscan averaging and a normalized collision energy of 35%. Target ions already selected for MS/MS were dynamically excluded for 30 s. Spray voltage and delivery pressure in the Nanomate source were set to 1.75 kV and 0.3 psi, respectively. Capillary voltage and tube lens on the LTQ-FT were tuned to 35 V and 109 V. The minimal signal required to trigger the switch from MS to MS/MS was set to 100 and activation Q was 0.250. The spectrometer was operated in positive polarity mode and singly charged precursors were rejected for fragmentation. We performed at least one blank run before each analysis to ensure the absence of cross-contamination from previous samples.
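The data-dependent selection rules described above (five most intense ions, minimal signal of 100, rejection of singly charged precursors, 30 s dynamic exclusion) can be sketched as a simplified model. This is not the instrument's acquisition software; the function name, the data layout and the 10 ppm exclusion matching tolerance are assumptions made for illustration.

```python
TOP_N = 5                 # five most intense ions per survey scan
EXCLUSION_S = 30.0        # dynamic exclusion duration
MIN_SIGNAL = 100.0        # minimal signal to trigger MS/MS
EXCLUSION_TOL_PPM = 10.0  # assumption: not stated in the text

def select_precursors(scan_time_s, peaks, excluded):
    """Pick precursors for MS/MS from one survey scan.

    peaks: list of (mz, intensity, charge) tuples.
    excluded: dict mapping excluded m/z to expiry time; updated in place.
    """
    # Release entries whose exclusion period has elapsed.
    for mz in [m for m, expiry in excluded.items() if expiry <= scan_time_s]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= EXCLUSION_TOL_PPM for ex in excluded)

    candidates = [
        p for p in peaks
        if p[2] != 1 and p[1] >= MIN_SIGNAL and not is_excluded(p[0])
    ]
    candidates.sort(key=lambda p: p[1], reverse=True)
    chosen = candidates[:TOP_N]
    for mz, _, _ in chosen:
        excluded[mz] = scan_time_s + EXCLUSION_S
    return [mz for mz, _, _ in chosen]

# Invented survey scan: one singly charged ion and one ion below threshold.
demo_peaks = [
    (500.25, 900.0, 2), (610.30, 5000.0, 1), (720.40, 800.0, 3),
    (450.20, 50.0, 2), (833.10, 700.0, 2), (901.50, 600.0, 2),
    (955.60, 500.0, 2), (1010.70, 400.0, 2),
]
```

Calling the function on the same invented scan at successive times shows how dynamic exclusion lets a weaker precursor be sampled once the five strongest have been fragmented.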
MS raw data files were processed with Mascot Distiller (Version 2.3.2, Matrix Science, London). The resulting peak lists were searched with Mascot (Version 2.1) against the human International Protein Index (IPI) database (Version 3.71) concatenated with reversed IPI sequences. Search criteria were as follows: full tryptic specificity was required with up to 2 missed cleavage sites allowed; the precursor ion m/z tolerance was set at 20 ppm; the product ion m/z tolerance at 0.6 Da; carbamidomethylation (Cys) was set as fixed modification and oxidation (Met) as variable modification.
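As an illustration of the target-decoy strategy mentioned above, the sketch below builds a reversed decoy sequence and estimates the false discovery rate (FDR) at a score threshold. The decoys-over-targets estimator is one common convention and is an assumption here, since the text does not state which formula was applied; the scores and the sequence are invented.

```python
def make_decoy(sequence):
    """Reversed protein sequence, as used in a concatenated target-decoy database."""
    return sequence[::-1]

def decoy_fdr(psms, score_threshold):
    """Estimate FDR as (decoy hits) / (target hits) above a score threshold.

    psms: list of (score, is_decoy) tuples from a concatenated search.
    """
    accepted = [is_decoy for score, is_decoy in psms if score >= score_threshold]
    decoys = sum(accepted)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0

# Invented peptide-spectrum matches: (search score, matched a decoy entry?).
demo_psms = [
    (80.0, False), (75.0, False), (60.0, False), (55.0, True),
    (50.0, False), (30.0, True), (25.0, False),
]
```

Raising the score threshold until the estimated FDR falls below the desired level is the usual way such an estimate is used.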
Peptide-spectrum matches (PSMs) were subjected to statistical validation with the PeptideProphet algorithm (Trans-Proteomic Pipeline, TPP v4.3) using the accurate mass model option and the semi-supervised approach. In brief, the expectation-maximization (EM) algorithm used by PeptideProphet to construct a Bayes classifier incorporates decoy peptide hit information from a target-decoy database search. All PSMs with a PeptideProphet probability ≥ 0.95 were kept for further analyses. Finally, Occam's razor logic as implemented in the ProteinProphet algorithm was applied to generate the most coherent list of proteins identified. Redundant protein entries were thus removed: multiple members of a protein family matched by the same peptides were assigned to a single protein group and counted as a single identification. Degenerate peptides were discarded before downstream quantitative analysis.
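The idea of reducing a protein list to a minimal set that explains the observed peptides can be illustrated with a greedy set-cover sketch. This is a deliberately simplified stand-in: ProteinProphet uses a probabilistic model rather than this greedy procedure, and the function names, identifiers and values below are hypothetical.

```python
def filter_psms(psms, min_probability=0.95):
    """Keep PSMs at or above the probability cutoff.

    psms: list of (peptide, protein, probability) tuples.
    Returns a dict mapping each protein to its set of retained peptides.
    """
    protein_to_peptides = {}
    for peptide, protein, probability in psms:
        if probability >= min_probability:
            protein_to_peptides.setdefault(protein, set()).add(peptide)
    return protein_to_peptides

def minimal_protein_set(protein_to_peptides):
    """Greedy approximation of the smallest protein set covering all peptides."""
    remaining = set().union(*protein_to_peptides.values())
    chosen = []
    while remaining:
        # Sorting first makes ties resolve alphabetically and reproducibly.
        best = max(
            sorted(protein_to_peptides),
            key=lambda name: len(protein_to_peptides[name] & remaining),
        )
        chosen.append(best)
        remaining -= protein_to_peptides[best]
    return chosen

# Invented PSMs: protB is fully explained by protA, protC by protD.
demo = [
    ("pep1", "protA", 0.99), ("pep2", "protA", 0.98), ("pep3", "protA", 0.97),
    ("pep2", "protB", 0.98), ("pep3", "protB", 0.97),
    ("pep4", "protC", 0.96), ("pep4", "protD", 0.96), ("pep5", "protD", 0.99),
    ("pep6", "protE", 0.50),  # below the cutoff, discarded
]
```

In this invented example, two proteins suffice to explain all five retained peptides, so the two redundant entries drop out of the list.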
To gain insight into the distinctive protein profiles produced by the three protein depletion strategies, we used the modified spectral counting technique APEX (v1.2). This approach makes use of a machine-learning classification algorithm to predict peptide detectability. The program generates a correction factor for each protein (Oi value), which is then used to predict the number of tryptic peptides expected to be detected for a given amount of a particular protein. Finally, spectral counts for each protein observed in a given run are corrected with their respective predicted Oi value. The APEX abundance is therefore a modified spectral count in which the total observed spectral count for a given protein is normalized by the expected (predicted) count (Oi) for one molecule of protein. In this regard, the APEX abundance is considered the relative abundance of a particular protein with respect to all other proteins in the same sample.
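The normalization described above can be written out as a minimal sketch: each observed spectral count is divided by the predicted Oi value and the results are scaled so that they sum to a chosen total. The optional identification-probability weight and the scaling constant follow our reading of the published APEX formulation and should be treated as assumptions; the counts and Oi values are invented.

```python
def apex_abundance(spectral_counts, expected_counts, id_probabilities=None, total=1.0):
    """Relative abundance per protein from Oi-corrected spectral counts.

    spectral_counts: observed spectral count per protein.
    expected_counts: predicted Oi value per protein.
    id_probabilities: optional identification probability per protein
        (assumption: defaults to 1.0 for every protein).
    total: value the abundances are scaled to sum to.
    """
    corrected = {}
    for protein, count in spectral_counts.items():
        probability = 1.0 if id_probabilities is None else id_probabilities[protein]
        corrected[protein] = count * probability / expected_counts[protein]
    norm = sum(corrected.values())
    return {protein: total * value / norm for protein, value in corrected.items()}

# Invented values: protA has the most spectra but also the highest Oi.
counts = {"protA": 120, "protB": 30, "protC": 10}
oi_values = {"protA": 40.0, "protB": 5.0, "protC": 2.0}
relative = apex_abundance(counts, oi_values)
```

With these invented numbers the corrected values are 3, 6 and 5, so the protein with the most spectra is not the most abundant once peptide detectability is taken into account.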
Pattern similarity and quantitative analyses were performed using the SuperHirn algorithm. Briefly, SuperHirn performs peak detection and deisotoping followed by peak integration on each LC-MS run in order to build a peptide feature map. Multiple peptide feature maps are then aligned using a 10 ppm precursor tolerance within a 60-second retention time window.
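The matching criteria used for alignment (10 ppm precursor tolerance, 60 s retention time window) can be illustrated with a naive pairwise matcher. This is not SuperHirn's alignment procedure; it only shows how the two tolerances act together. The window is interpreted here as a maximum absolute retention time difference of 60 s, which is an assumption, and the feature values are invented.

```python
def ppm_error(mz_a, mz_b):
    """Mass difference between two m/z values in parts per million."""
    return abs(mz_a - mz_b) / mz_b * 1e6

def match_features(run_a, run_b, ppm_tol=10.0, rt_window_s=60.0):
    """Pair features from two runs that agree in m/z and retention time.

    run_a, run_b: lists of (mz, retention_time_s) tuples.
    Returns a list of (index_in_a, index_in_b) pairs; each feature in
    run_b is used at most once, preferring the closest retention time.
    """
    matches = []
    used = set()
    for i, (mz_a, rt_a) in enumerate(run_a):
        best = None
        for j, (mz_b, rt_b) in enumerate(run_b):
            if j in used:
                continue
            if ppm_error(mz_a, mz_b) <= ppm_tol and abs(rt_a - rt_b) <= rt_window_s:
                if best is None or abs(rt_a - rt_b) < abs(rt_a - run_b[best][1]):
                    best = j
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches

# Invented feature maps: only the first pair satisfies both tolerances.
run_1 = [(500.2500, 1200.0), (650.3300, 1800.0), (720.4000, 2400.0)]
run_2 = [(500.2520, 1230.0), (650.3400, 1805.0), (720.4010, 2500.0)]
```

In the invented example, the second pair fails the mass tolerance and the third fails the retention time window, so only one feature is counted as common to both runs.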
We thank Barcelona Science Park for the fellowship awarded to Alexandre Campos, and the Spanish Proteomics Network, ProteoRed-ISCIII. Tanya Yates (Institute for Research in Biomedicine, Barcelona) is gratefully acknowledged for excellent assistance in writing the manuscript. We also deeply appreciate those who donated their CSF for this study.
- Roche S, Gabelle A, Lehmann S: Clinical proteomics of CSF: towards discovery of new biomarkers. Proteomics Clin Appl. 2008, 2: 428-436. 10.1002/prca.200780040
- Abdi F, Quinn JF, Jankovic J: Detection of biomarkers with a multiplex quantitative proteomic platform in CSF of patients with neurodegenerative disorders. J Alzheimers Dis. 2006, 9: 293-348.
- Faca V, Pitteri SJ, Newcomb L, Glukhova V: Contribution of protein fractionation to depth of analysis of serum and plasma proteomes. J Proteome Res. 2007, 6: 3558-65. 10.1021/pr070233q
- Yuan X, Desiderio DM: Proteomics analysis of human CSF. J Chromatogr. 2005, 815: 179-89. 10.1016/j.jchromb.2004.06.044
- Zhang J, Goodlett DR, Peskind ER, Quinn JF: Quantitative proteomic analysis of age-related changes in human CSF. Neurobio Aging. 2005, 26: 207-227. 10.1016/j.neurobiolaging.2004.03.012
- Zougman A, Pilch B, Podtelejnikov A, Kiehntopf M: Integrated analysis of CSF peptidome and proteome. J Proteome Res. 2008, 7: 386-399. 10.1021/pr070501k
- Mouton Barbosa E, Roux-Dalvai F, Bouyssie D, Berger F: In-depth exploration of CSF by combining peptide ligand library treatment and label-free protein quantification. Mol Cell Proteomics. 2010, 9: 1006-21. 10.1074/mcp.M900513-MCP200
- Pan S, Zhu D, Quinn JF, Peskind ER: A combined dataset of human CSF proteins identified by multi-dimensional chromatography and tandem mass spectrometry. Proteomics. 2007, 7: 469-473. 10.1002/pmic.200600756
- Schutzer SE, Liu T, Natelson BH, Angel TE: Establishing the proteome of normal human CSF. PLoS One. 2010, 5: e10980.
- Liu T, Qian WJ, Mottaz HM, Gritsenko MA: Evaluation of Multiprotein Immunoaffinity Subtraction for Plasma Proteomics and Candidate Biomarker Discovery Using Mass Spectrometry. Molecular & Cellular Proteomics. 2006, 5: 2167-2174. 10.1074/mcp.T600039-MCP200
- Ramstrom M, Hagman C, Mitchell JK, Derrick PJ, Hakansson P, Bergquist J: Depletion of high-abundant proteins in body fluids prior to liquid chromatography fourier transform ion cyclotron resonance mass spectrometry. J Proteome Res. 2005, 4: 410-6. 10.1021/pr049812a
- Wetterhall M, Zuberovic A, Hanrieder J, Bergquist J: Assessment of the partitioning capacity of high abundant proteins in human CSF using affinity and immunoaffinity subtraction spin columns. J Chromatography. 2010, 878: 1519-1530. 10.1016/j.jchromb.2010.04.003
- Shores KS, Knapp DR: Assessment approach for evaluating high abundance protein depletion methods for CSF proteomic analysis. J Proteome Res. 2007, 6: 3739-51. 10.1021/pr070293w
- Thouvenot E, Urbach S, Dantec C, Poncet J, Seveno M, Demettre E: Enhanced detection of CNS cell secretome in plasma protein-depleted CSF. J Proteome Res. 2008, 7: 4409-21. 10.1021/pr8003858
- Xu J, Chen J, Peskind ER, Jin J: Characterization of proteome of human CSF. Int Rev Neurobiol. 2006, 73: 29-98.
- Lu P, Vogel C, Wang R, Yao X, Marcotte EM: Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat Biotechnol. 2007, 25: 117-24. 10.1038/nbt1270
- Vogel C, Marcotte EM: Calculating absolute and relative protein abundance from mass spectrometry-based protein expression data. Nat Protoc. 2008, 3: 1444-51.
- Jin S, Daly DS, Springer DL, Miller JH: The effects of shared peptides on protein quantitation in label-free proteomics by LC/MS/MS. J Proteome Res. 2008, 7: 164-9. 10.1021/pr0704175
- Fratantoni SA, Piersma SR, Jimenez CR: Comparison of the performance of two affinity depletion spin filters for quantitative proteomics of CSF: Evaluation of sensitivity and reproducibility of CSF analysis using GeLC-MS/MS and spectral counting. Proteomics Clin Appl. 2010, 4: 613-617. 10.1002/prca.200900179
- Roche S, Tiers L, Provansal M, Sevenod M: Depletion of one, six, twelve or twenty major blood proteins before proteomic analysis: The more the better?. J Proteomics. 2009, 72: 945-951. 10.1016/j.jprot.2009.03.008
- Choi H, Nesvizhskii AI: Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics. J Proteome Res. 2008, 7: 254-65. 10.1021/pr070542g
- Mueller LN, Rinner O, Schmidt A, Letarte S: SuperHirn - a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics. 2007, 7: 3470-80. 10.1002/pmic.200700057
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.