A feasibility study to identify proteins in the residual Pap test fluid of women with normal cytology by mass spectrometry-based proteomics

Background The proteomic analysis of body fluids is a growing technology for the identification of protein biomarkers of disease. Given that Papanicolaou tests (Pap tests) are routinely performed on over 30 million women annually in the U.S. to screen for cervical cancer, we examined the residual Pap test fluid as a source of protein for analysis by mass spectrometry (MS). In the liquid-based Pap test, cervical cells are collected from the ectocervix and placed into an alcohol-based fixative prior to staining and pathologic examination. We hypothesized that proteins shed by cells of the female genital tract can be detected in the Pap test fixative by MS-based proteomic techniques. We examined the feasibility of using residual fluid from discarded Pap tests with cytologically “normal” results to optimize sample preparation for MS analysis. The protein composition of the cell-free Pap test fluid was determined by silver staining of sodium dodecyl sulfate-polyacrylamide gels, and the abundance of serum proteins was examined by Western immunoblot using an antibody against human serum albumin. Both pooled and individual samples were trypsin digested and analyzed by two-dimensional MS/MS. Proteins were identified by searching against the Human Uniprot database, and characterized for localization, function and relative abundance.
Results The average volume of the residual Pap test fluid was 1.5 ml and the average protein concentration was 0.14 mg/ml. By Western immunoblot we showed that the amount of albumin in each sample was significantly reduced compared to normal serum. By MS/MS, we identified 714 unique proteins in pooled Pap test samples and an average of 431 proteins in individual samples. About 40% of the proteins identified were extracellular or localized to the plasma membrane. Almost 20% of the proteins identified were involved in immunity and defense, characteristic of the healthy cervical-vaginal proteome. By merging the protein sets from the individual and pooled Pap test samples, we created a “Normal Pap test Core Proteome” consisting of 153 proteins.
Conclusions Residual Pap test fluid contains a sufficient amount of protein for analysis by MS and represents a valuable biospecimen source for the identification of protein biomarkers for gynecological diseases.


Background
Screening for cervical cancer by Papanicolaou tests (Pap tests) has been routinely performed for over 50 years [1]. The liquid-based Pap test consists of collecting cervical cells from the ectocervix and placing them into a vial containing a fluid transport medium to preserve the cells [2,3].
Two FDA-approved liquid-based Pap tests are widely used for the screening and detection of cervical cancer, pre-cancerous lesions, and atypical cells [4]. One Pap test, which we used in this study, is the SurePath™ Pap test [Becton-Dickinson (BD Diagnostics, Burlington, NC)], which has an alcohol-based fixative consisting of 21.7% ethanol, 1.2% methanol, 1.1% isopropanol, and formaldehyde [5]. The second Pap test, the ThinPrep Pap test (Hologic, Inc., Bedford, MA), contains 30-60% methanol as the fixative [6]. In each case, fixative is removed from the vials and undergoes automated processing so that the cells are stained on a slide, and then examined by a pathologist to identify the presence of premalignant and malignant cells. The liquid fixative solution in which the cells are collected for Pap tests is routinely discarded after examination of the cells. Over 30 million Pap tests are analyzed annually by cytopathologists [4,7-9], making this an abundant source of samples for experimentation and for the potential detection of a variety of gynecological diseases in the future. To our knowledge, no one has analyzed the residual Pap test fluid by the latest mass spectrometry (MS)-based proteomic techniques to identify proteins or potential biomarkers of disease.
Several groups have performed mass spectrometry-based proteomic analysis of cervical-vaginal fluid obtained using swabs, gauze, or Dacron-tipped plastic applicators (reviewed in [10]). Cervical-vaginal fluid is a complex biological fluid that protects and lubricates the endometrial, cervical and vaginal lining. This fluid contains proteins predominantly synthesized by the endocervix and vaginal cells, but it has been shown to also contain proteins from amniotic fluid leakage during pregnancy, from endometrial and tubal secretions, and the peritoneal fluid [11-15]. Studies have attempted to define the proteome of healthy women as well as identify potential markers for preterm birth, pregnancy, and intra-amniotic infection [10,11,13-21]. However, to date, the use of residual Pap test fluid as a source for proteomics and biomarker discovery has not been reported.
The primary objective of this study was to determine whether residual Pap test fixative is a suitable source of protein for mass spectrometry-based proteomic techniques. We have quantified the concentration of protein present in the residual SurePath™ fixative of Pap tests taken from over 100 women with normal cytology. We developed a protocol for processing the residual Pap test fluid so that peptides can be analyzed by MS/MS and proteins identified from the Human Uniprot database. Finally, we found extensive overlap between the proteins that we define as our "Normal Pap test Core Proteome" and lists of cervical-vaginal fluid proteins identified by others using different sampling methods [10,11,13-15,17-22].

Cell-free residual Pap test fluid contains protein
To determine whether the cell-free fluid remaining after the examination of cervical cells from the SurePath™ liquid-based Pap test preparation contained sufficient protein for mass spectrometry analysis, we measured the volume and protein content of over 100 residual SurePath™ samples. On average, these samples contained 1.5 ml of SurePath™ fixative. The protein concentration in 72 of the samples was determined using the bicinchoninic acid (BCA) protein assay (Pierce Protein Research Products, Rockford, IL) on duplicate samples and ranged from undetectable to more than 0.7 mg/ml, with an average protein concentration of 0.14 mg/ml (Figure 1A). Sixteen of these 72 Pap test fixative samples were randomly selected to be examined by sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis (PAGE). We found many protein bands visible by silver stain, indicating the presence of both high and low abundance proteins in residual Pap test fluid (Figure 1B). Overall, the protein patterns appeared relatively similar in number, size, and intensity among the individual samples. Several major protein bands of 50-250 kD were detected in almost all of the samples, as well as proteins of ~25 kD and 10-15 kD.
To determine whether the variation in protein concentration of the residual Pap test fluid was due to contamination of the samples with blood proteins, we separated the proteins from the residual Pap test fluid of five individuals by size on SDS-PAGE (Figure 1C) and then performed Western immunoblot analysis of the protein using an antibody to human serum albumin (Figure 1D). Comparison of an equal amount of serum (lane 6; S) to the protein in the residual Pap test fluid (lanes 1-5) showed the variable presence of albumin in each of the residual Pap test samples, but at a substantially lower level than was found in serum (lane 6). The results of the Western immunoblot analysis also demonstrated that the protein concentration of the residual Pap test fluid did not directly correlate with the level of serum albumin present. For example, the sample with the highest protein concentration of 0.5 mg/ml (Figure 1C and D, lane 2, large arrow) did not contain more serum albumin than the other samples. Similarly, the sample in which the least amount of serum albumin was detected (Figure 1C and D, lane 4, small arrow) had the second highest protein concentration of 0.4 mg/ml.

Mass spectrometry of pooled Pap test samples
In order to get an overview of the proteins present in the SurePath™ fluid, we pooled residual Pap test fluid from 40 women with normal cervical cytology for analysis by 2D tandem mass spectrometry. These 40 samples were selected from the 56 samples that remained from the original 72 samples (Figure 1A), after 16 samples were used for SDS-PAGE analysis (Figure 1B). The selection of these 40 samples was based solely on the fact that they contained >50 µg of protein. A total of 714 unique proteins were identified when the pooled samples were run in two separate experiments (see Additional file 1). Only proteins from UniProtKB/Swiss-Prot (reviewed) are reported in Additional file 1. The cellular localization of the 714 proteins was determined using Gene Ontology (GO) classifications (Figure 2A) [23]. Over 40% of the proteins identified in the pooled Pap test samples were extracellular proteins or plasma membrane proteins. The remaining proteins were cytoplasmic or nuclear proteins, suggesting the occurrence of cell lysis in situ. The proteins identified in the pooled Pap test samples were also classified according to several general functional terms by the PANTHER classification system (Figure 2B) [24] and grouped into over a dozen categories. The major functional groups contained proteins involved in immunity and defense (19%), protein metabolism and modification (15%), the cytoskeleton (10%), and other cellular processes such as cell signaling (10%) and cell adhesion (5%). Minor groups of proteins were involved in transport (4%), cell cycle (3%), and reproduction (2%).
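The selection criterion above (samples containing more than 50 µg of protein) amounts to multiplying each sample's volume by its BCA-measured concentration. The following is a minimal sketch of that arithmetic, not the authors' procedure; the function names and the example sample values are hypothetical, apart from the reported averages of 1.5 ml and 0.14 mg/ml.

```python
def protein_mass_ug(volume_ml: float, conc_mg_per_ml: float) -> float:
    """Total protein in a sample, in micrograms (ml * mg/ml * 1000)."""
    return volume_ml * conc_mg_per_ml * 1000.0


def select_for_pooling(samples, min_ug=50.0):
    """Return IDs of samples whose total protein exceeds min_ug.

    samples maps a sample ID to a (volume_ml, conc_mg_per_ml) pair.
    """
    return [
        sample_id
        for sample_id, (volume_ml, conc) in samples.items()
        if protein_mass_ug(volume_ml, conc) > min_ug
    ]


# Hypothetical sample values for illustration only.
samples = {
    "A": (1.5, 0.14),  # the reported average volume and concentration
    "B": (1.5, 0.02),  # a dilute sample that falls below the threshold
    "C": (1.0, 0.50),
}
eligible = select_for_pooling(samples)
```

With the reported averages, a typical sample holds roughly 210 µg of protein, well above the 50 µg threshold.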

Sources of variability in mass spectrometry analysis
In LC/MS proteomic studies, several sources of variability exist, including biological, technical, and experimental [25,26]. To address the technical variability that arises during sample preparation (including trypsin digestion and solid-phase extraction clean-up), we randomly selected a Pap test sample from a healthy individual and precipitated the protein with acetone. The protein was then divided into two identical aliquots. These two samples were digested in parallel by the filter-aided sample preparation (FASP) technique, and the replicates were then analyzed by LC/MS. Averaged over all proteins, the standard deviations calculated for each protein in the replicates corresponded to a variance of 1.23×10⁻³ with a CV of 19.23%. We then performed independent injections of one aliquot in three different MS runs; the short-term run-to-run instrumental variance was estimated to be 5.69×10⁻⁴. These results are comparable to values obtained in the literature [25,26].
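The text does not spell out exactly how the replicate variance and CV were computed, so the following is only one plausible sketch: compute a sample standard deviation per protein across replicates, then average the per-protein variances and coefficients of variation. The function name, the input layout, and the example values are assumptions made for illustration.

```python
from statistics import mean, stdev


def replicate_variability(abundances):
    """Summarize replicate-to-replicate variability.

    abundances maps a protein ID to a list of abundance values,
    one per replicate. Returns (mean per-protein variance,
    mean per-protein CV in percent). Proteins with fewer than two
    values or a zero mean are skipped.
    """
    variances, cvs = [], []
    for values in abundances.values():
        if len(values) < 2:
            continue
        m = mean(values)
        if m == 0:
            continue
        sd = stdev(values)  # sample standard deviation (n - 1)
        variances.append(sd ** 2)
        cvs.append(100.0 * sd / m)
    return mean(variances), mean(cvs)


# Hypothetical abundance values for two proteins in two replicates.
example = {"P1": [1.0, 3.0], "P2": [2.0, 2.0]}
mean_variance, mean_cv = replicate_variability(example)
```

Averaging per-protein statistics, rather than pooling all values, keeps highly abundant proteins from dominating the summary; whether the study did this is not stated.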

Mass spectrometry of individual Pap test samples
Five residual Pap test samples were randomly selected from a second cohort of 20 individuals with normal cytology and were prepared for mass spectrometry using the FASP technique (see Methods). On average, 431 proteins were identified in the individual samples (ranging from 317 to 539 proteins) (Table 1). Only proteins from UniProtKB/Swiss-Prot (reviewed) are reported in Additional file 2. Approximately 70% (60-85%) of the proteins identified in the individual samples were also found in the pooled samples (Table 1). The lists of proteins that were identified in the Pap test fluid from each of five individuals (Additional file 2) were then analyzed for their frequency of occurrence. The 153 proteins that were present in 4 of the 5 individuals are hereafter designated the "Normal Pap test Core Proteome" and are listed in Table 2 with their protein name, gene name, and Swiss-Prot accession number. Classification of all 153 proteins in the "Normal Pap test Core Proteome" based on cellular localization (Figure 3A) shows that most of the proteins were derived from the cytoplasm (59%), and over one third of the proteins were extracellular (29%) or in the plasma membrane (9%), which is in agreement with the pooled sample cellular localization categories (Figure 2A). Functional classification of the 153 proteins in the "Normal Pap test Core Proteome" (Figure 3B) is also similar to the pooled samples and shows a great diversity of biological roles, in which immunity and defense (20%), cytoskeletal proteins (15%), and protein metabolism and modification (12%) are the largest categories (Figure 2B). One difference between the functional categories of proteins present in the pooled Pap test samples and the "Normal Pap test Core Proteome" is the percentage of proteins involved in blood circulation and coagulation that were identified. In the "Normal Pap test Core Proteome", 18/153 (12%) were categorized as functioning in blood circulation and coagulation. In contrast, in the pooled Pap test samples, only 5% (36 of 685) of the proteins were in this category.
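The frequency-of-occurrence analysis used to define the core proteome can be sketched as a simple count of how many individual samples contain each protein. This sketch assumes that "present in 4 of the 5 individuals" means at least four; the function name and the placeholder protein identifiers are hypothetical and are not entries from Additional file 2.

```python
from collections import Counter


def core_proteome(protein_lists, min_samples):
    """Return the proteins found in at least min_samples of the lists.

    protein_lists is an iterable of per-individual collections of
    protein identifiers; duplicates within one individual count once.
    """
    counts = Counter()
    for proteins in protein_lists:
        counts.update(set(proteins))
    return {protein for protein, n in counts.items() if n >= min_samples}


# Placeholder identifiers for five hypothetical individuals.
individuals = [
    ["PROT_A", "PROT_B", "PROT_C"],
    ["PROT_A", "PROT_B"],
    ["PROT_A", "PROT_B", "PROT_D"],
    ["PROT_A", "PROT_C"],
    ["PROT_A", "PROT_B", "PROT_C"],
]
core = core_proteome(individuals, min_samples=4)
```

Here PROT_A (5 of 5) and PROT_B (4 of 5) qualify, while PROT_C (3 of 5) and PROT_D (1 of 5) do not.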

Overlap of "Normal Pap Test Core Proteome" with other CVF proteomic studies
In a comprehensive proteomic analysis of cervical-vaginal fluid (CVF), Zegels et al. [13] determined a set of 136 "CVF Core Proteins" which were present in at least three of the four most comprehensive analyses of the CVF proteome [11,13,15,20]. We compared the lists of proteins that we had identified in the residual Pap test fluid of the 5 individuals (Additional file 2) to the list of "CVF Core Proteins", and found 132 of the 136 "CVF Core Proteins" were present in at least one of the individual Pap test samples. Furthermore, about half (64) of the 153 proteins listed in our "Normal Pap test Core Proteome" were also present in the "CVF Core Proteome" (Table 2, column 5). An additional 61 of the proteins in our "Normal Pap test Core Proteome" were also found in at least one of ten analyses of CVF proteins enumerated in a recent review [10] (Table 2, last column). These data demonstrate that the use of residual Pap test fluid for the identification of CVF proteins is similar to other sampling and detection methodologies.

Estimation of protein abundance
We estimated the relative abundance of the proteins identified in the individual and pooled Pap test samples by calculating a normalized spectral abundance factor (NSAF) for each protein (Additional files 1 and 2, last column) that takes into account both the spectral counts for each protein as well as the protein size [13,27,28]. Ten of the "Normal Pap test Core" proteins were among the thirty most abundant proteins in at least five experiments. These proteins include neutrophil gelatinase-associated lipocalin, serotransferrin, lactotransferrin, S100A8 and S100A9, which all play a role in immune response. Albumin, hemoglobin alpha, and hemoglobin beta were also among the ten proteins found in at least five experiments.

Discussion
This study represents the first publication in which the cell-free residual Pap test fluid has been examined as a source for proteomic profiling of CVF in women with normal cervical cytology. Using pooled samples, we identified more than 700 unique proteins, while in individual Pap test samples more than 300 proteins were identified. By merging proteins identified in the pooled samples with proteins identified in 4 of 5 individual Pap tests analyzed by MS, we determined a "Normal Pap test Core Proteome" of 153 proteins that is similar in composition to that of other proteomic analyses of CVF [10,11,13-15,17,18,20-22].
Previous characterization of the CVF proteome has relied on sampling methods such as Dacron-tipped swabs [15,17,19,20], sponges or gauze [11,16], or direct collection of CVF [29,30] or cervical washings [13,14]. Only in the analysis by Zegels et al. [13], who used cervical washings collected during colposcopy, were routine clinical samples utilized for proteomics. In addition, our MS/MS proteomic technique using the individual residual Pap test samples still yielded as many or more protein identifications than previously reported proteomic analyses of CVF, which at most found 685 proteins [10,13]. The use of the FASP protocol for trypsin digestion combined with sensitive instrumentation for the mass spectrometry analysis made the analysis of individual specimens possible. We used Gene Ontology databases to classify the proteins identified in residual Pap test fluid by cellular localization and biological processes [23,24]. In both the pooled and individual samples, approximately 40% of the proteins identified were localized to the plasma membrane or extracellular compartments. This is similar to other studies of CVF which found approximately 30% of the proteins identified were extracellular or membranous in origin [10,11,13]. Similarly, we also identified many proteins involved in immunity and defense, proteolysis, cell adhesion and numerous cytoskeletal proteins. Among the cytoskeletal proteins, we report several keratin proteins as part of our "Normal Pap test Core Proteome". While keratins are commonly considered a contaminant in mass spectrometry, cytokeratin intermediate filaments are components of the cornified envelope (CE), a highly crosslinked structure formed beneath the plasma membrane of epithelial cells that serves a barrier function [31]. Additional structural CE proteins, such as involucrin and periplakin, were identified in our study and in other proteomic analyses of CVF [11,13,15,20]. Indeed, Zegels et al. [13] reported that a "large portion" of the proteins identified in their study were CE components, although the identification of cytokeratins was apparently excluded from their analysis. The presence of these and other intracellular proteins in the cell-free residual Pap test fluid is likely due to in situ cytolysis, through mechanical disruption, bacterial lysis or autolysis. The cytokeratins identified in the CVF are therefore a reflection of the cellular composition of the female genital tract, which expresses a distinctive cytokeratin profile [32].
Table note: The total number of proteins identified in each individual Pap test was counted and the proteins listed in Additional file 2. False positive rates were < 1.0% for all experiments. The lists of proteins that were identified for each individual were compared to the list of 153 proteins identified in our newly defined "Normal Pap test Core Proteome" (listed in Table 2).
We believe that the majority of cytoplasmic and nuclear proteins that we identified by MS were most likely due to proteolysis that occurred in situ, rather than during collection of the clinical sample per se. The BD SurePath™ preservative fluid contains ethanol, methanol, isopropanol, and formaldehyde; it was developed to serve as a fixative for cervical cells collected during a liquid-based Pap test. The SurePath™ fixative should diminish (if not eliminate) proteolytic degradation. Fixative solutions may crosslink proteins and nucleic acids, so as to interfere with proteolytic enzymes and potentially inhibit cellular lysis [33,34]. For our purposes of MS-based proteomics, the "fixative" attribute of the SurePath™ preservative fluid proved to be advantageous. Studies have shown that DNA in cervical specimens was stable for human papillomavirus testing when stored in SurePath™ fixative for up to 10 weeks at ambient temperature [35]. The Material Safety Data Sheet for the ThinPrep® PreservCyt Solution states that the cytologic sample can be stored for up to six weeks at 39-99°F [6]. Additional studies have shown that DNA could be extracted and PCR amplified from either SurePath™ or ThinPrep® Pap test samples stored for more than 2.5 years [36]. However, there is a paucity of information to document the stability of proteins in these liquid-based Pap test fixatives. Thus, the formulation of Pap test fixatives that are currently on the market may need to be improved upon to ensure that proteins are not degraded if they are to be analyzed in MS-based proteomic studies.
The relative abundance of proteins in the residual Pap test samples was estimated by NSAF, and revealed that neutrophil gelatinase-associated lipocalin, S100A8 and S100A9 were among the most abundant proteins identified. All three proteins function in innate immunity, a common function of CVF proteins [37,38], and have been previously identified in the CVF proteome [11,13,15,20]. In one study of CVF, a similar NSAF calculation determined that S100A9 was the most abundant CVF protein [13]; however, although S100A9 was identified in every sample we examined, it was among the 30 most abundant proteins in only six of seven samples analyzed.
One potential advantage of using residual Pap test fluid as a source for biomarker discovery is that CVF may not contain the high abundance proteins that impede the identification of low abundance proteins in similar proteomic analyses of serum and plasma. We examined the residual Pap test samples for the presence of serum albumin using Western immunoblot, and found the level of albumin to be substantially lower than in serum. However, when we examined the Pap test samples by mass spectrometry, we identified a large number of peptides specific for albumin in the residual Pap test samples despite having excluded samples with visible blood contamination. In this study, we specifically chose not to deplete the highly abundant proteins from the Pap test samples prior to MS, since our goal was to see whether it would be feasible to perform a limited number of steps of sample manipulation and still identify hundreds of proteins. In addition, when we designed these studies, we were concerned that by depleting the highly abundant proteins, we may also deplete some of the low abundance proteins that bind to albumin or hemoglobin.
While the presence of serum proteins is not directly addressed in other proteomic studies of CVF, serum albumin and several hemoglobin subunits were among the 10 most abundant proteins identified in CVF by Zegels et al. [13], and serum albumin was identified in all ten proteomic studies of CVF compared in the Zegels et al. review [10]. In future studies, the depletion of serum albumin and hemoglobin (as well as other highly abundant serum proteins) may improve the identification of low-abundance CVF proteins. Furthermore, Pap test samples from women with gynecological conditions may warrant depletion of the highly abundant proteins in order to identify proteins that are differentially expressed.
Importantly, our study demonstrates the feasibility of using residual Pap test samples as a protein source for proteomic analysis of CVF. The ability to use a commonly collected clinical specimen for proteomic studies could pave the way for biomarker discovery for any number of gynecological disorders, as well as the FDA approved use in the screening and detection of cervical cancer, pre-cancerous lesions, atypical cells and other cytologic categories [4]. In addition to cytological examination of cells collected for identification of cervical cancer, Pap test samples are now routinely used to test for the presence of human papilloma virus DNA [39], but could potentially be used for diagnosis of other gynecological diseases.
The long-term goal of the research in our laboratory has been to develop a diagnostic test for the early detection of ovarian cancer. The median age of women who are diagnosed with ovarian cancer is 63 years, with almost 90% of those diagnosed over the age of 45 [40]. In this feasibility study, we chose to use Pap test samples from women who were at least 50 years old, so that we could define the "Normal Pap test Core Proteome" for this population of women who had normal cytology reports. In ongoing studies, we are using Pap test samples from women who are diagnosed with ovarian cancer (all of whom are over 50 years old), with the intent of comparing their proteome to this "Normal Pap test Core Proteome". For other gynecological conditions, it may be necessary to select a cohort of women with a lower median age to serve as the "normal" healthy control group.
Using an approach similar to ours, two studies examined cervical cytology specimens by MS in order to stratify them according to cervical cancer risk [41] or for the identification of biomarkers of cervical disease [22]. In another study, Kinde et al. [42] reported a technique (termed Safe-SeqS assay) to detect somatic mutations in the DNA of rare tumor cells present in the liquid fixative solution of Pap tests for the identification of gynecological cancers. All three studies used the liquid Pap test sample; however, they examined the cellular component of the Pap test for either DNA mutations that were known to be present in tumors from the same patient [42], or for MS profiles of cytospins [41] or laser capture microdissected cells from ThinPrep slides [22]. In our case, we used sensitive MS methods to examine "cell-free" Pap test fluid to detect proteins that are shed or secreted by cells in the female genital tract, and showed that the Pap test fluid could be used as a source for biomarker discovery. We are very optimistic that state-of-the-art technology for DNA mutations [42] coupled with our MS technology for proteins will one day be used routinely in the clinic for cancer detection, including cervical neoplasms, endometrial endometrioid and serous carcinomas, and serous tubal intraepithelial carcinomas ("STIC"), the putative precursor of ovarian cancers [12]. It will be necessary to more fully explore the sources of biological, technical, and experimental variations in order to define the feasibility of using residual Pap test fixatives for clinical diagnostics.

Conclusions
We determined that the "cell-free" component of residual Pap test fixative contains a sufficient amount of protein for analysis by MS, and have used it to define the "Normal Pap test Core Proteome". Since residual Pap test fluid is readily available from millions of patients, it represents a valuable biospecimen source for the identification of protein biomarkers for gynecological diseases and has the potential to change the way that women are routinely tested for gynecological cancers.

Clinical specimens
Clinical specimens were collected per routine procedures using the BD SurePath™ liquid-based Pap test. In the clinic, cervical cells were collected from the ectocervix of healthy women by a physician using a BD broom-like device specifically designed for this purpose. The detachable head of the sampling device was immediately placed into a BD SurePath™ vial, which contains 10 ml of a mixture consisting of 21.7% ethanol, 1.2% methanol, 1.1% isopropanol, and formaldehyde [5]. In the clinical laboratory, the BD SurePath™ vials were shaken to remove cells from the head of the broom-like device, and then 8 ml of the SurePath™ solution underwent automated processing to eliminate debris and distribute a representative portion of cells on a slide in a uniform, even layer. Cells were then stained and examined by a pathologist.
For this study, we obtained deidentified residual (waste) Pap test samples in SurePath™ vials from the University of Minnesota BioNet Tissue Procurement Facility with approval from the IRB (Protocol 1101E94895). At our institution, the SurePath™ vials are stored for one month at room temperature after the Pap test sample has been processed, after which they were made available for our use in this study. Samples selected for this feasibility study were from women at least 50 years old (median age of 58 years; ranging from 50-76 years) with normal cytology and without visible blood contamination.

Sample processing
The workflow of Pap test samples from processing to MS/MS analysis is depicted in Figure 4. SurePath™ vials were vortexed to resuspend proteins that may have settled during the one month of storage at room temperature in the cytology laboratory, as well as to release cells/proteins from the cervical sampling device that remained in each vial. The residual fluid was centrifuged for 5 min at 800 × g to pellet the cells. Protein concentration in the cell-free SurePath™ fluid was determined using the bicinchoninic acid (BCA) protein assay in microplates (Pierce Protein Research Products, Rockford, IL) according to the manufacturer's instructions.

Filter aided sample preparation
Equal volumes of SurePath™ fixative from 40 randomly selected normal Pap test samples were pooled and acetone precipitated as above, yielding ~250 µg of protein. Precipitated proteins for pooled and individual samples were resuspended in 10 mM Tris, pH 7.6, 4% sodium dodecyl sulfate (SDS). Pooled and individual samples (~50-100 µg protein) were prepared for mass spectrometry by Filter Aided Sample Preparation (FASP) using Nanosep Omega centrifugal devices with a 10 kDa molecular weight cut-off (Pall Corp., Port Washington, NY) as a reaction vessel [45,46]. Samples were reduced by the addition of 10 mM Tris(2-carboxyethyl)phosphine (TCEP) at room temperature. Proteins were alkylated with 50 mM iodoacetamide (Sigma-Aldrich, St. Louis, MO) and digested with trypsin (enzyme:protein ratio 1:100) overnight at 37°C. Peptides were desalted with C18 stage tips (Thermo Scientific, West Palm Beach, FL) and dried under vacuum.

High pressure liquid chromatography fractionation
Trypsin digested samples were fractionated offline by high pH reverse phase chromatography [47] using a MAGIC 2002 high pressure liquid chromatography (HPLC) instrument (Michrom BioResources, Inc., Auburn, CA) and C18 Gemini-NX column [150 mm × 2 mm i.d., 5 µm particle, 110 Å pore size (Phenomenex, Torrance, CA)]. The flow rate was maintained at 100 μL/min using Buffer A (10 mM ammonium formate pH 10) and Buffer B (10% Buffer A: 90% acetonitrile) at 5-35% gradient for 60 minutes, followed by 35-60% gradient for 5 minutes. Absorbance was monitored at 215 and 280 nm wavelengths. Thirty-two fractions were collected at 2-minute intervals and vacuum-dried. Fractions containing peptides were resuspended in loading solvent (98% water: 2.0% acetonitrile: 0.01% formic acid) prior to analyzing by mass spectrometry.
Figure 4 Diagrammatic representation of the workflow involved in Pap test sample preparation for MS analysis. Following a routine Pap test, the SurePath™ vials were sent to cytopathology for a diagnosis. Excess residual SurePath™ fluid from women with normal cytology was sent to the research laboratory. Protein concentration was determined by the BCA protein assay, proteins were precipitated with acetone, and visualized with silver stain by SDS-PAGE. Precipitated proteins were also trypsin digested and processed by FASP, and peptides were run on HPLC followed by MS. Data were analyzed by Sequest database searching and Scaffold analysis.

Mass spectrometry and database searching
Scientific, Inc., Waltham, MA) as described previously [48] with the exception that the higher-energy collisional dissociation (HCD) activation time was 0.1 ms. Sequest (version 27, rev 12) was used for peptide matching and protein identification. MS/MS data were searched against a human Uniprot database (version_042012) plus common contaminants (thegpm.org/crap/index, 109 proteins), and a concatenated reversed sequence database for a total of 293,452 proteins. The search parameters were Fragment Tolerance: 0.80 Da (monoisotopic), Parent Tolerance: 0.073 Da (monoisotopic), carbamidomethyl as the fixed modification, methionine oxidation as the variable modification, trypsin digestion, two missed cleavages allowed, and 95% confidence for the detected protein threshold.
The dta/out files generated by Bioworks were analyzed in Scaffold (version 3.6.2, Proteome Software Inc., Portland, OR) to validate MS/MS based peptide and protein identifications and for relative protein quantitation. Peptide identifications were accepted if they could be established at >95.0% probability as specified by the Peptide Prophet algorithm [49]. Protein identifications were accepted if they could be established at >99.0% probability by the Protein Prophet algorithm [50], and contained at least 2 identified peptides. Rates of false positive identifications were estimated using the target-decoy method [51]. False positive rates were < 1.0% for all experiments.
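The search settings "trypsin digestion, two missed cleavages allowed" describe which candidate peptides the search engine considers for each protein. The following is a minimal sketch of an in-silico tryptic digest, not Sequest's own implementation. It assumes the commonly used rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P); the function name and the toy sequences are hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2):
    """List tryptic peptides of a protein sequence.

    Cleavage is assumed after K or R, except when followed by P.
    Up to max_missed consecutive cleavage sites may be skipped.
    """
    # Step 1: split the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])

    # Step 2: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + max_missed, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


# Toy sequences for illustration; not real protein sequences.
fully_cleaved = tryptic_peptides("AKCRDK", max_missed=0)
with_missed = tryptic_peptides("AKCRDK", max_missed=2)
```

Allowing missed cleavages enlarges the search space, which is one reason the number of permitted missed cleavages is reported alongside the mass tolerances.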
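The acceptance thresholds and the target-decoy estimate described above can be illustrated with a short sketch. This is not how Scaffold computes its values internally. The estimator shown (decoy hits divided by target hits) is one common convention and is an assumption here, as are the function names and the example counts.

```python
def accept_protein(protein_probability, n_peptides,
                   min_probability=0.99, min_peptides=2):
    """Apply the stated protein-level criteria: probability above
    99.0% and at least two identified peptides."""
    return protein_probability > min_probability and n_peptides >= min_peptides


def decoy_fdr(n_target_hits, n_decoy_hits):
    """Estimate the false discovery rate as decoy hits / target hits.

    Assumed estimator; other conventions exist for concatenated
    target-decoy searches.
    """
    if n_target_hits == 0:
        return 0.0
    return n_decoy_hits / n_target_hits


# Hypothetical counts, not results from this study.
estimated_fdr = decoy_fdr(n_target_hits=700, n_decoy_hits=5)
passes = accept_protein(protein_probability=0.995, n_peptides=3)
```

Because reversed (decoy) sequences cannot be true matches, the number of accepted decoy hits gives an estimate of how many accepted target hits are false.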

Calculations of the relative abundance of proteins
For semi-quantitative estimation of the abundance of proteins, we determined the total count of MS/MS spectra for each protein. To correct the spectral count for differences in protein size, we normalized by dividing the number of counted spectra by the length of the protein (number of observable peptides) in an in-silico trypsin digestion [13,27,28]. We then calculated the Normalized Spectral Abundance Factor (NSAF) as follows:
$$\mathrm{NSAF}_k = \frac{(S/L)_k}{\sum_{i=1}^{N} (S/L)_i}$$
where S is the number of spectral counts for protein k, L is the length of protein k and N is the total number of proteins identified. We multiplied by 1000 for convenience in presentation of small numbers.
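As a worked illustration of the NSAF calculation described above, the sketch below divides each protein's spectral count S by its length L, normalizes by the sum of S/L over all N identified proteins, and multiplies by 1000. The function name and the example counts and lengths are hypothetical; the length measure is whatever was used for L in the analysis.

```python
def nsaf(spectral_counts, lengths, scale=1000.0):
    """Normalized Spectral Abundance Factor for each protein.

    spectral_counts and lengths map protein IDs to S and L.
    Returns scale * (S/L) / sum over all proteins of (S/L).
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: scale * value / total for p, value in saf.items()}


# Hypothetical counts and lengths for two proteins.
counts = {"PROT_A": 10, "PROT_B": 30}
lengths = {"PROT_A": 100, "PROT_B": 600}
values = nsaf(counts, lengths)
```

In this example PROT_B has three times the spectral counts of PROT_A but a lower NSAF, because it is six times longer; the scaled values always sum to 1000 within a sample.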

Classification of proteins by cellular localization and biological function
The proteins identified by MS were classified by cellular localization and biological function using PANTHER database (version 8.1) [24] and Ingenuity IPA (version 2013, 17199142, Ingenuity® Systems, www.ingenuity.com) and the UniProtKB Protein Knowledge database.