Identification of pathogens from native urine samples by MALDI-TOF/TOF tandem mass spectrometry

Background Reliable high-throughput identification of microbial pathogens in human urine samples is crucial for patients with cystitis symptoms. Currently employed methods are time-consuming and can lead to unnecessary or inadequate antibiotic treatment. The purpose of this study was to assess the potential of mass spectrometry for uropathogen identification from native urine samples. Methods In total, 16 urine samples with more than 10^5 CFU/mL were collected from clinical outpatients. These samples were analysed using standard urine culture methods, followed by 16S rRNA gene sequencing, which served as a control, and by the culture-independent MALDI-TOF/TOF MS method described and tested here. Results Here we present the advantages and disadvantages of bottom-up proteomics, using MALDI-TOF/TOF tandem mass spectrometry, for culture-independent identification of uropathogens (i.e. directly from urine samples). The direct approach provided reliable identification of bacteria at the genus level in monobacterial samples. Taxonomic identifications obtained by proteomics were compared both to the standard urine culture test used in clinics and to a genomic test based on 16S rRNA sequencing. Conclusions Our findings indicate that mass spectrometry has great potential as a reliable high-throughput tool for microbial pathogen identification in human urine samples. In this case, MALDI-TOF/TOF was used as the analytical tool for the determination of bacteria in urine samples, and the results emphasize how strongly storage conditions and the sample preparation method affect the reliability of MS2 data analysis. The proposed method is simple enough to be used in existing clinical settings and is highly suitable for suspected single-organism infectious etiologies. Further research is required in order to identify pathogens in polymicrobial urine samples.


Background
Urinary tract infections (UTIs) are the most common form of bacterial infection both in the general population and in hospital patients, accounting for nearly 25% of all infections [1]. UTIs are much more common and fungal pathogen Candida spp. [3][4][5][6]. Approximately 60-80% of all uncomplicated bacterial UTIs are caused by E. coli. Researchers have recognized that urine is not sterile and have confirmed the importance of resident bacterial flora (urinary microbiota) in the lower urinary tract. Resident urinary microbiota is mostly composed of Lactobacillus gasseri, Corynebacterium coyleae, Actinobaculum schaalii, Aerococcus urinae, Gardnerella vaginalis, Streptococcus anginosus, Streptococcus epidermis, Actinomyces neuii and Bifidobacterium spp. [7,8].
To identify microorganisms, clinical microbiology laboratories mostly rely on microbiological techniques that are still based on cultivation on different culture media [9]. Despite advances in genomics and proteomics, the urine culture method is still the gold standard for the diagnosis of UTIs. Urine samples containing more than 10^5 CFU/mL of a single microbial species usually indicate clinical relevance. However, these cultivation-oriented methods have significant shortcomings. The first limitation is the time required for the cultivation of microorganisms and their subsequent identification [10]. Standard incubation times range from 12 to 24 h in order to enable reliable detection of uropathogens [11]. The second limitation is the requirement for fresh urine samples. Some of these limitations may result in overall negative urine cultures in up to 80% of cases in many microbiology laboratories [12]. Unfortunately, the wide variety of sampling methods and inappropriate specimen transport are a major cause of preanalytical errors [13].
Various methods have been used for detection of microorganisms in clinical microbiology [14][15][16]. For fast screening of urine samples, flow cytometry (such as the Sysmex analyser) has been used. However, a urine flow cytometer cannot identify the bacteria it detects [17,18]. Genomic methods relying on DNA analysis, such as SeptiFast, FilmArray or GeneXpert, are in use, but they are still not approved by the FDA for UTI identification [14]. The use of real-time PCR methods for the identification of uropathogens has been shown to be feasible [19], but it is limited in scope. Techniques using DNA sequencing regularly show higher sensitivity than the standard urine culture test. For this reason, bacterial identification relying on sequencing of the 16S rRNA genes is becoming a method of choice for detection of uropathogens in urine samples [20,21].
The field of proteomics also offers methods for microbial identification, mass spectrometry (MS) being the most prominent one. MS platforms in use include matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), which produces a characteristic spectrum called a peptide mass fingerprint (PMF), and the less frequently used liquid chromatography tandem mass spectrometry (LC-MS/MS), which is based on peptide sequencing. LC-MS/MS depends on initial isolation of bacterial colonies from urine and their subsequent cultivation [22,23], while MS-based analysers claim the ability to process samples or swabs directly. Today, MS-based analysers such as the Bruker BioTyper (Bruker Daltonics) and VITEK MS Plus (bioMérieux) are in routine use; both detect an MS1 spectral fingerprint consisting of the most abundant proteins present in a wide array of microorganisms [24][25][26]. The US Food and Drug Administration (FDA) has issued regulatory approval for using a MALDI-TOF mass spectrometry-based platform for routine identification of pathogenic microbes from human specimens in clinical microbiology laboratories [23,27]. This instrument is coupled with dedicated software and a database, so it can compare the recorded MS1 spectra with the mass spectra of known microorganisms stored in the database. However, MALDI-TOF MS has its limitations: it does not allow identification of microorganisms at the species level, nor does it perform well when more than one species or strain is present in the sample [28][29][30]. Furthermore, in order to obtain reliable results, samples have to be cultured on selective agar, and a single microbial colony is then used to identify an organism. To bypass the time-consuming and selective cultivation stage, culture-independent methods have been developed [17][31][32][33][34]. More recently, there has been growing interest in mass spectrometry-based proteomic analyses performed directly on urine samples, thus skipping the cultivation stage [35,36].
Ideally, the metaproteomic analysis should be able to provide sufficient numbers of strain-specific peptides useful for microbial identification at the genus, species and even strain-level, and it could also be applied to urine samples containing more than one species, including even potential biomarkers used for non-invasive monitoring of human diseases [37][38][39][40].

Urine samples collection and storage
Urine specimens were collected from the Centre for Clinical Microbiology and Hospital Infections, University Hospital Dubrava, with antimicrobial therapy as the only exclusion criterion. From October to December 2016, a total of 2993 urine specimens were received from patients for whom a urinary culture analysis was requested (Additional file 2: Table S1). The samples were collected from patients according to the instructions for collecting urine by the midstream clean-catch technique [41].

Urine culture test
The microorganisms were identified by routine microbiology methods [42]. Aliquots of the urine specimens were inoculated onto MacConkey agar and blood agar plates using a 1 µl calibrated loop and incubated aerobically at 37 °C for 18 to 48 h, according to the standard operating procedure at the Centre for Clinical Microbiology and Hospital Infections, University Hospital Dubrava. Single colonies were counted to determine the bacterial concentration. Infections with more than 10^5 CFU/mL were considered clinically significant.
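The conversion from a colony count to a concentration can be illustrated with a minimal Python sketch. It assumes only what is stated above (a 1 µl calibrated loop and a significance cut-off of more than 10^5 CFU/mL); the function names are hypothetical and are not part of the laboratory's standard operating procedure.

```python
SIGNIFICANCE_THRESHOLD = 1e5  # CFU/mL, the cut-off stated in the text


def cfu_per_ml(colony_count: int, loop_volume_ul: float = 1.0) -> float:
    """Convert a plate colony count to CFU/mL for a calibrated-loop inoculum."""
    if loop_volume_ul <= 0:
        raise ValueError("loop volume must be positive")
    # 1 mL = 1000 µL, so each colony from a 1 µL loop represents 1000 CFU/mL.
    return colony_count * (1000.0 / loop_volume_ul)


def is_clinically_significant(colony_count: int, loop_volume_ul: float = 1.0) -> bool:
    """True when the estimated concentration exceeds 10^5 CFU/mL."""
    return cfu_per_ml(colony_count, loop_volume_ul) > SIGNIFICANCE_THRESHOLD
```

With a 1 µl loop, more than 100 colonies on the plate therefore corresponds to a clinically significant concentration.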

Samples for genomics and proteomics analysis
From samples that tested positive (total of 1571) on the urine culture test, 16 samples were randomly selected, matching the following criteria: a.) more than 10^5 CFU/mL and b.) more than 30 ml of urine. All sixteen urine samples (associated with corresponding laboratory reports) were stored at −20 °C and used for further genomic and proteomic analyses.

DNA extraction
Frozen samples were thawed at room temperature and homogenised. Bacterial genomic DNA was extracted using the Maxwell 16 Cell DNA Purification Kit on the Maxwell 16 research instrument (Promega, Madison) according to the manufacturer's instructions. The concentration of DNA was determined using a Nano-Drop spectrophotometer (Shimadzu Biotech).

16S rRNA sequencing and bioinformatics analysis
Extracted DNA was sent to a next-generation sequencing service provider (MR DNA, Texas, USA). Sequencing was performed on an Illumina MiSeq platform using a paired-end sequencing protocol. Amplicons of the 16S rRNA gene were generated using primers targeting the V3 and V4 variable regions of the ribosomal RNA. A 30-cycle PCR reaction was performed using the HotStarTaq Plus Master Mix Kit (Qiagen, USA). Microbiome bioinformatic analysis was performed using the QIIME 2 (Quantitative Insights Into Microbial Ecology) software package version 2018.4 [43]. Paired-end raw sequences were demultiplexed and quality filtered using the q2-demux plugin, followed by denoising with DADA2 [44]. The first 7 bases of forward and reverse reads were trimmed, forward reads were truncated to 290 bases, and reverse reads to 240 bases. Taxonomy was assigned to the resulting amplicon sequence variants using the q2-feature-classifier [45], which relies on the classify-sklearn naive Bayes taxonomy classifier and Greengenes v. 13_8, from which 99% OTUs reference sequences were trimmed to variable regions 3 and 4 [46]. Amplicons were analysed using the QIIME 2 (version 2017.4).
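The trimming and truncation parameters above can be made concrete with a short Python sketch. It is not the DADA2 implementation; it only mimics the read-level effect of the stated settings. It assumes that the truncation position is counted from the start of the original read and that reads shorter than the truncation length are discarded; the function and constant names are hypothetical.

```python
from typing import Optional

# Parameters stated in the text; the dictionary names are illustrative only.
FORWARD = {"trim_left": 7, "trunc_len": 290}
REVERSE = {"trim_left": 7, "trunc_len": 240}


def trim_and_truncate(read: str, trim_left: int, trunc_len: int) -> Optional[str]:
    """Drop the first `trim_left` bases and cut the read at `trunc_len`.

    Assumption: `trunc_len` is a position in the original read, so the
    retained read has length trunc_len - trim_left. Reads shorter than
    `trunc_len` are discarded (returned as None).
    """
    if len(read) < trunc_len:
        return None
    return read[trim_left:trunc_len]
```

Under these assumptions a 300-base forward read is reduced to 283 bases and a 250-base reverse read to 233 bases.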

Sample preparation
For each sample, a homogenized 10 ml aliquot of urine was centrifuged at 1000 g at room temperature for 1 min (Additional file 1: Figure S1). Insoluble sediment was discarded, and the supernatant was transferred to a new tube and centrifuged at 16,000 g at 4 °C for 5 min. The supernatant was discarded, and the bacterial pellet was re-suspended in a buffer (25 mM NH4HCO3, pH 7.8). The pellet was homogenized on a vortex and centrifuged at 16,000 g at 4 °C for 5 min. This procedure was designed mainly to "wash out" excess human cells, and it was repeated three times. Proteins were extracted from the bacterial pellet using 100 µL of bacterial protein extraction reagent B-PER (Thermo-Pierce, USA). Following the manufacturer's protocol, the sample was incubated at room temperature for 15 min and subsequently heated at 100 °C in a water bath for 2 min. Insoluble cellular debris was removed by centrifugation at 16,000 g at 4 °C for 5 min. Finally, the supernatant with soluble proteins contained in B-PER solution was ready for the next step in proteomics sample preparation.

In-solution digestion
Protein sample contained in B-PER (70 µL) was mixed with 2 µL of trypsin solution (1 mg/mL, Merck, Germany). The in-solution digestion was carried out at 37 °C on a thermoshaker (500 rpm) for 18 h (overnight).

MALDI-TOF/TOF mass spectrometry analysis
For sample analysis, 1 µl of 5 mg/mL α-CHCA (α-cyano-4-hydroxycinnamic acid) matrix solution was mixed with 1 µl of each sample fraction (six fractions per sample). From the resulting solution, 1 µl was spotted onto the Opti-TOF MALDI 384 target plate (AB Sciex). After drying at room temperature, spotted samples were analysed using a 4800 Plus MALDI-TOF/TOF mass spectrometer (Applied Biosystems Inc., Foster City, USA) equipped with a 200 Hz, 355 nm Nd:YAG laser. MS spectra were acquired over a mass range of 800-4500 m/z. Peptide fragmentation was performed at a collision energy (CID) of 1 kV in positive ion reflector mode, using nitrogen as the collision gas. For each sample, up to 20 of the most intense peaks in the MS spectra were selected for MS/MS analysis. Approximately 1000 single shots were accumulated from different positions for MS analysis, and 2000 shots were recorded for the subsequent fragment ion spectra. Internal calibration was performed using trypsin autolysis fragments. MS and MS/MS spectra were acquired using the 4000 Series Explorer software v 3.5.3 (AB Sciex).
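The precursor selection rule described above (up to 20 of the most intense MS peaks within 800-4500 m/z) can be sketched in Python as follows. This is an illustration of the rule only, not the acquisition software's algorithm; the function name and the peak-list representation as (m/z, intensity) pairs are assumptions.

```python
from typing import Iterable, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_precursors(
    peaks: Iterable[Peak],
    max_precursors: int = 20,
    mz_min: float = 800.0,
    mz_max: float = 4500.0,
) -> List[Peak]:
    """Return up to `max_precursors` peaks inside the m/z window,
    ordered from most to least intense."""
    in_range = [(mz, inten) for mz, inten in peaks if mz_min <= mz <= mz_max]
    in_range.sort(key=lambda peak: peak[1], reverse=True)
    return in_range[:max_precursors]
```

Peaks outside the acquisition window are ignored regardless of intensity, and spectra with fewer than 20 in-range peaks simply yield fewer precursors.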

Analysis of proteomics data
Mascot (version 2.1, Matrix Science, UK) analysis was carried out to identify peptides and to search for matching proteins in the NCBI "nr" database (20140312), with the taxonomy filter set to Proteobacteria (11838333 sequences), Firmicutes (5487348 sequences) and Homo sapiens (276468 sequences). Search parameters for the MS and MS/MS database searches were as follows: parent ion mass tolerance of 0.3 Da, fragment ion mass tolerance of 0.5 Da, trypsin digestion with a maximum of one missed cleavage per peptide, and methionine oxidation as a variable modification. Trypsin specificity was set to cleavage C-terminal to lysine and arginine unless the next residue is proline. Qualitative data analysis was performed with MASCOT at a 95% confidence level, with the significance threshold adjusted to keep the false discovery rate below 5%. In Mascot reports a minimum score of 48 was used.
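The trypsin specificity and missed-cleavage settings above determine which theoretical peptides a search engine considers. A minimal Python sketch of such an in silico digestion is given below. It is not Mascot's implementation; the function names and the example sequence are hypothetical.

```python
from typing import List


def cleavage_sites(sequence: str) -> List[int]:
    """Positions after K or R where trypsin cleaves, skipping sites
    followed by proline."""
    sites = []
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    return sites


def tryptic_peptides(sequence: str, max_missed_cleavages: int = 1) -> List[str]:
    """All peptides obtainable with at most `max_missed_cleavages`
    internal cleavage sites left uncut."""
    if not sequence:
        return []
    bounds = [0] + cleavage_sites(sequence) + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + 1 + max_missed_cleavages, len(bounds) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides
```

For the made-up sequence "AKRPGKLLR", the R-P bond is not cleaved, so the fully cleaved peptides are AK, RPGK and LLR, and allowing one missed cleavage adds AKRPGK and RPGKLLR.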

Urine culture test
All samples that underwent proteomics and genomics analyses were benchmarked against the standard urine culture test that accompanied each sample (Additional file 2: Table S2). Among the 16 clinical samples analysed, 13 were classified as monobacterial infections and 3 as polymicrobial (at least two identified uropathogens). Thirteen samples showed the presence of Gram-negative bacteria and only three the presence of Gram-positive bacteria. Regarding the taxonomic diversity of the samples analysed, according to standard tests, there were 7 different bacterial species in total, belonging to 4 respective genera (Additional file 2: Table S3).

Effect of storage time and temperature on bacteria in urine samples
Guidelines for the collection and storage of urine specimens differ between diagnostic purposes, so urine samples should be collected and stored with the intended diagnostic procedure in mind. In our study, short-term storage (up to 4 weeks) of urine at −20 °C proved to be a good choice for preserving bacteria in the collected samples. Long-term storage (more than 3 months) at −80 °C led to biomass loss, most likely because prolonged freezing increased bacterial cell fragility and thus the extent of cell disruption during centrifugation (unpublished observations).

Identification of bacterial taxa is shown in Table 1. The lowest taxonomic level for which assignment was possible is shown as the result of genomic identification. Table 1 provides the following information: sample number, conventional urine culture result, DNA concentration and 16S rRNA gene sequencing result.
What stands out in this table is a disparity in the taxonomic identification obtained through 16S rRNA gene sequencing: in the majority of cases bacteria were identified at the genus level (44%) or family level (56%), while identification at the species level is usually lacking.
It is apparent that the Klebsiella spp. (UR1-UR3) and Enterobacter spp. (UR13-UR14) identifications are difficult to compare because of the different levels of taxonomy assignment by the method [47], while there is a significant positive correlation among the other results for the conventional and genomics methods. A possible explanation for this difficulty might be related to bacterial nomenclature, taxonomy and very high sequence identity. Furthermore, genomic 16S rRNA-based analysis was not informative at the genus and/or species level in the family Enterobacteriaceae [48]. There was a surprising difference between the standard test and genomics results in sample UR5. The standard urine culture test indicated Enterococcus faecalis as the single uropathogen in this sample, while 16S rRNA sequencing indicated a polymicrobial mixture without the Enterococcus genus listed. There are two possible explanations for this disparity: contamination during urine sample collection [49], which would likely cause an error in the genomics test, or a false-positive Enterococcus result from the standard culture-based urine test.

Method for proteomics-based identification of uropathogens
The present study was undertaken to assess the potential of bottom-up proteomics for identification of pathogens directly from the urine samples of patients with UTIs. The results were benchmarked against the reference results (standard urine tests), and 16S rRNA gene sequencing was used for arbitration in cases where proteomics gave results that differed from the standard urine test.

Sample preparation
For the proteomic analysis, a minimum concentration of 10^5 CFU/mL and a volume of 5 mL of fresh urine sample or urine stored in the refrigerator up to 4 weeks were used. In this preliminary study, we investigated and compared the preparation of samples stored at −20 °C and −80 °C. We based our decision on the optimal storage temperature on visual inspection of pellets during centrifugation. In urine samples stored at −80 °C bacterial cells were lost, and the pellet was deemed insufficient for downstream analysis. Samples stored at −20 °C, on the other hand, showed abundant biomass, which nevertheless proved challenging to wash. The reason for this could be cell aggregation, probably auto-aggregation, especially since blood was present in the tested samples [32]. Furthermore, good separation of bacterial cells from other materials that can be present in urine, such as yeast cells, epithelial cells, leukocytes, erythrocytes, mucus, urinary casts and different types of crystals, depends on centrifugation speed [32,49]. Moreover, at high speed the pellet will likely be abundant with cell debris. Consequently, damaged cells will be washed off during the sample preparation process. Pellet volume was identified as an important element that influenced the success of positive protein identification. Microbial biomass had to be visible to the naked eye after the washing steps. The obtained pellet biomass can be seen in Additional file 1: Figure S2.
Previous studies have considered the impact of ultrasonication on microorganisms to improve sample preparation [32,50,51]. In our research, protein extraction using B-PER worked for both Gram-negative and Gram-positive bacteria, so there was no need for additional mechanical methods of cell rupture. In reviewed

Peptide fractionation
During a preliminary study, we found that the amount of data we could get from one sample spot was insufficient. Thus, to overcome this obstacle we used peptide fractionation. We hypothesised that peptide fractionation would help to enrich the low-abundance peptides (Additional file 1: Figure S4).

Protein identifications and data analysis
While BioTyper and Vitek use reference databases to identify and classify microorganisms according to their mass spectra fingerprint, we relied on peptide ion fragments from MS/MS scans and MASCOT protein search results, which were translated into MASCOT-based uropathogen identification ranks. For this purpose, we combined the MASCOT score with a peptide count and wrote a simple Python script that ranks the organisms suspected to be in the sample based on the probability of their proteins being detected. The first step was protein identification from tryptic peptides using the MASCOT search engine [52]. This provided us with both the score and the number of queries matched for proteins belonging to one or more organisms. The Mascot Score is a statistical score for how well the generated spectra match the database protein sequence [52,53]. Plainly, a higher score indicates a more confident protein match, while the number of queries matched indicates the number of spectra that were matched to this protein. Although it is not unusual for a portion of peptides to be scanned multiple times, overall, the greater the score and the greater the number of queries matched, the greater the probability of a true positive match. Therefore, we combined these two measures into a "summa score", simply by summing all individual peptide scores for a given protein match. Proteins and their respective taxa were ordered by this "summa score" in descending order, and the highest-scoring taxon was taken as the most likely uropathogen identification. Table 2 compares the results of this analysis with the standard urine culture test. A summarized report on MASCOT-identified bacterial proteins is given in Additional file 2: Table S4. The proteins ordered by summa score are listed in Additional file 3: Table S1. The minimum significant MASCOT summa score obtained across all samples was 53, while the maximum reported score was 830. A total of 382 peptides were reported for all 16 samples.
Most of these peptides belong to bacterial proteins (71%).
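The "summa score" ranking described above can be sketched in a few lines of Python. This is an illustrative reconstruction from the description, not the script used in the study. It assumes the peptide hits have already been filtered for significance and restricted to bacterial proteins; the input layout, function names and example values are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

Hit = Tuple[str, str, float]  # (taxon, protein, peptide score)


def summa_scores(peptide_hits: Iterable[Hit]) -> List[Tuple[Tuple[str, str], float]]:
    """Sum peptide scores per (taxon, protein) match and sort the
    matches by that sum in descending order."""
    totals = defaultdict(float)
    for taxon, protein, score in peptide_hits:
        totals[(taxon, protein)] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def most_likely_taxon(peptide_hits: Iterable[Hit]) -> Optional[str]:
    """Taxon of the protein match with the highest summa score."""
    ranked = summa_scores(peptide_hits)
    return ranked[0][0][0] if ranked else None


# Made-up example values, for illustration only.
example_hits = [
    ("Klebsiella", "outer membrane porin C", 60.0),
    ("Klebsiella", "outer membrane porin C", 55.0),
    ("Proteus", "peptidoglycan-associated lipoprotein", 70.0),
]
```

In this made-up example a protein matched by two moderately scoring peptides outranks a protein matched by one higher-scoring peptide, which is how the sum rewards both score and number of matched queries.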
Although we expected the majority of proteins to be ribosomal, we identified a rather small percentage of ribosomal proteins (8%). In our case, the proteins with the highest scores were membrane proteins, including outer membrane porin protein C, peptidoglycan-associated lipoprotein (PAL) and murein lipoprotein (MLP). This interesting result might be associated with the use of B-PER [54]. Considering all monobacterial samples, the direct approach provided reliable identification for the genera Klebsiella (3 samples), Proteus (4 samples), Enterococcus (2 samples), Enterobacter (1 sample) and Citrobacter (1 sample). Overall, an 87% correlation with the standard urine test was obtained with this simple proteomics approach for monobacterial samples.
These results are very encouraging since pathogenic species were correctly identified at the genus level using a relatively small number of identified bacterial proteins per sample, and in the absence of unique peptides. Although our results indicate that proteomics-based identification with a small number of proteins is feasible, high-throughput setup yielding more spectra and retrieving larger fractions of proteomes would be more favourable.

Microbial identification in polymicrobial cultures
To investigate polymicrobial cultures (UR11, UR15 and UR16), we compared the results obtained from the conventional urine culture, 16S rRNA gene sequencing and proteomics (Additional file 2: Table S5). Our previous experience with the MALDI-TOF/TOF mass spectrometer indicated that bacterial identification in polymicrobial urine samples using this platform for proteomics has some limitations. As reported previously by other authors, MALDI-TOF MS identification of polymicrobial cultures directly from urine samples did not provide reliable results [17,49]. Therefore, bacterial identification at the strain level is still regarded as a challenge. Some of the underlying factors that compromise the sensitivity of this method in bacterial identification are: impurities in the sample (human proteins), low abundance of bacterial proteins in the sample [55], insufficient coverage of urinary bacterial species in the databases, shared peptide sequences among proteins from different taxa [38], and the possibility that a single MS injection per sample generates insufficient data [39]. Bottom-up tandem MS, accompanied by ever-growing proteomics and genomics databases and data processing through a wide range of bioinformatics tools, has made polymicrobial identification feasible [30,36], but it still remains in the domain of experimental research and far from clinical practice.

Human proteins versus contamination
Normal human urine of a healthy individual contains over 2000 proteins [56,57], while over 5000 proteins can be found when the urinary tract is under inflammation [33]. Due to low protein concentration, urine is a difficult proteomic sample to work with [58]. We recorded 29% of human proteins in our samples, of which 33% were found to be repetitive (Additional file 2: Table S6). The most abundant of these repeated human proteins were classified as haemoglobin subunits (alpha and beta-globin), apolipoprotein and uromodulin. We did not find any evidence of epithelial cells from the urinary or vaginal tract, or any biomarkers.
As can be seen from Additional file 1: Figure S3, first two fractions cover more than 50% of the total number of proteins. Furthermore, Additional file 1: Figure S4 shows a quantitative overview of bacterial and human proteins of each sample. In terms of future work, it would be interesting to consider two-dimensional fractionation to increase bacterial proteome coverage and enhance the ratio of bacterial vs human proteins.

Limitations and future direction
With regard to the research method, the major limitation identified by this study is the small number of identified proteins per sample. Many proteomic analyses for bacterial identification were limited to monomicrobial specimens with a high CFU/mL concentration because of our need to compare results with those of standard urine culture tests, which have their own inherent drawbacks. This study lays the groundwork for future research. A possible future direction is to target lower-abundance proteins to improve proteome identification. Switching to a high-throughput platform such as ESI could solve this issue. Furthermore, the number of identified proteins could be increased by using double peptide fractionation or the FASP (filter-aided sample preparation) method. To improve bacterial identification, we are developing bioinformatics software based on natural language processing. Urine is clinically underutilized and has much greater potential for the development of non-invasive tests and techniques. The proteomics approach and direct sample analysis have the potential to provide a broader clinical picture that could bring us closer to precision medicine.

Conclusion
The main goal of the current study was to establish a procedure for the analysis of uropathogens by proteomics; the procedure was tested using MALDI-TOF/TOF mass spectrometry directly on urine specimens. This study has shown that identification of bacteria from a native urine sample, without a prior culturing step, depends on storage conditions, the sample preparation method and data analysis. Overall, the results of this study demonstrate that mass spectrometry-based proteomics can effectively identify different uropathogens directly from fresh or cold-stored human urine samples, without a cultivation step. The direct approach was able to provide reliable identification of bacteria at the genus level in monobacterial samples, despite the inherent limitations of the mass spectrometry platform used. In the case of polymicrobial urine samples, the direct approach using the methods described here did not allow unambiguous identification.
Additional file 1: Figure S1. Experimental workflow for the identification of uropathogen from a native urine sample. Figure S2. Images of 16 urine specimens. Figure S3. Protein content of each fraction as a percentage of the total protein. Figure S4. Cumulative number of bacterial and human proteins for each sample per fraction.
Additional file 2: Table S1. General information about patients. Table S2. Results of conventional urine culture and urine dipstick analysis for 16 urine samples. Table S3. Uropathogenic bacteria in urine samples. Table S4. Summary reports of identified bacterial proteins for each urine sample sorted by "MASCOT summa score". Table S5. The comparative view of urine culture, proteomics and genomic results. Table S6. Identified human proteins ranked by MASCOT score for each urine sample.