Identification of SARS-CoV-2 biomarkers in saliva by transcriptomic and proteomics analysis

The detection of SARS-CoV-2 biomarkers by real-time RT-PCR (rRT-PCR) has shown that the sensitivity of the test is negatively affected by low viral loads and the severity of the disease. This limitation can be overcome by the use of more sensitive approaches such as mass spectrometry (MS), which has not been explored for the detection of SARS-CoV-2 proteins in saliva. Thus, this study aimed to assess the translational applicability of mass spectrometry-based proteomics approaches to identify viral proteins in saliva from people diagnosed with COVID-19 within fourteen days after the initial diagnosis, and to compare its performance with rRT-PCR. After ethics approval, saliva samples were self-collected by 42 COVID-19 positive and 16 healthy individuals. Samples from people positive for COVID-19 were collected on average on the sixth day (± 4 days) after initial diagnosis. Viable viral particles in saliva were heat-inactivated, followed by the extraction of total proteins and viral RNA. Proteins were digested and then subjected to tandem MS analysis (LC-QTOF-MS/MS) using a data-dependent MS/MS acquisition qualitative shotgun proteomics approach. The acquired spectra were queried against a combined SARS-CoV-2 and human database. The qualitative detection of SARS-CoV-2 specific RNA was done by rRT-PCR. SARS-CoV-2 proteins were identified in all COVID-19 samples (100%), while viral RNA was detected in only 24 out of 42 COVID-19 samples (57.1%). Seven out of 18 SARS-CoV-2 proteins were identified in saliva from COVID-19 positive individuals, of which the most frequent were replicase polyproteins 1ab (100%) and 1a (91.3%), and nucleocapsid (45.2%). Neither viral proteins nor RNA were detected in healthy individuals. Our mass spectrometry approach appears to be more sensitive than rRT-PCR for the detection of SARS-CoV-2 biomarkers in saliva collected from COVID-19 positive individuals up to 14 days after the initial diagnostic test.
Based on the novel data presented here, our MS technology can be used as an effective diagnostic test of COVID-19 for initial diagnosis or follow-up of symptomatic cases, especially in patients with reduced viral load.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12014-023-09417-w.


Introduction
The COVID-19 pandemic, caused by the SARS-CoV-2 coronavirus, has lasted for more than two years now. Public health measures imposed around the world to reduce the transmission rate of SARS-CoV-2 and to prevent health systems from being overwhelmed have not been sufficient to ease the pandemic. Mass-scale vaccination is a promising addition to public health efforts; however, it has not been enough to eradicate the disease, due in part to the surge of more transmissible variants of the virus and to COVID-19 outbreaks, especially in unvaccinated people [1]. Moreover, the true extent of the pandemic may have been underestimated due to the limited availability of testing and to protocols that often missed asymptomatic infected people.
Exploration of the SARS-CoV-2 viral genome [2] allowed the detection of SARS-CoV-2 to confirm suspected cases of COVID-19 by molecular biology assays [3]. Real-time RT-PCR (rRT-PCR), considered the gold standard method, is routinely used in samples obtained from naso/oropharyngeal swabs, sputum, or bronchoalveolar lavage fluid [3]. Apart from being an expensive technique, in most cases it requires a qualified healthcare worker to carry out the nasal swabbing process and a long waiting time to obtain results, making mass testing impossible [4,5]. Moreover, the accuracy of the rRT-PCR method can be jeopardized by inadequate sample collection, handling, and analysis [6], leading to false-negative results. To overcome these disadvantages, low-cost and rapid diagnostic methods based on the detection of SARS-CoV-2 antigens [7] or human antibodies against SARS-CoV-2 [8] have been developed. However, their inability to detect infected individuals at early stages or with low viral loads may also be a matter of concern [9,10]. Therefore, there is an urgent need to develop simple and effective diagnostic platforms for COVID-19 that allow the large-scale screening of symptomatic and asymptomatic individuals, leading to the formulation of measures to control the transmission of COVID-19 and provide timely treatments at individual and population levels.
The detection of SARS-CoV-2 in the saliva of COVID-19 patients [11,12] has provided a strong rationale to propose saliva as the most reliable tool to detect SARS-CoV-2 [13,14]. In fact, saliva testing is attractive for disease diagnostics and monitoring of health conditions not only because of its multiple contributors, but also because its collection is non-invasive and painless [15,16], and because it can be self-collected, reducing the risk of transmission to health workers [14]. Despite advances in diagnostic salivary methods [5], to date the sensitivity of rRT-PCR in detecting SARS-CoV-2 biomarkers in saliva is negatively affected by low viral loads and the severity of the disease [17]. The limitations of the rRT-PCR method can be overcome by using more sensitive approaches such as mass spectrometry (MS) to detect viral biomarkers other than RNA, such as proteins and peptides. Identification of SARS-CoV-2 proteins in saliva by MS-based clinical proteomics has not been extensively explored [18], and the scarce proteomics research done has focused on the identification of viral proteins in naso/oropharyngeal swabs [19–26], gargle solutions [18,27], and plasma [28]. The potential use of salivary proteomics for diagnostic purposes was previously demonstrated by our group [29] using an MS-based clinical proteomics approach to identify viral proteins and peptides in saliva from individuals with Zika fever, a disease also caused by an RNA virus.
Taking into account recent advances in clinical proteomics for the identification of multiple protein biomarkers for viral infections [29] and the shedding of SARS-CoV-2 through saliva [11,12], it is expected that SARS-CoV-2 proteins can also be detected in saliva by means of MS-based proteomics. Thus, in this study we explored a new and uncharted area of COVID-19 diagnosis, utilizing a proteomics approach to identify SARS-CoV-2 proteins in saliva from people diagnosed with COVID-19 days after the initial diagnosis, when the viral load was expected to decrease [17], and compared its performance with the conventional rRT-PCR method (Table 1).

Selection of participants
This study was approved by the University of Saskatchewan Research Ethics Board (IRB#1911) and received Operational Approval from the Saskatchewan Health Authority (OA-UofS-1911). Male and female adults were invited to participate in this study and informed consent was obtained. The healthy control group was composed of two sets of samples. The first set consisted of samples collected from healthy individuals who were not experiencing COVID-19-related symptoms, had not travelled outside of Canada in the last 14 days, and had not had any contact with people diagnosed with COVID-19 (C1-C6, Table 1). The second set consisted of saliva samples that were collected before the COVID-19 era and stored in the saliva biobank at the Salivary Proteomics Research Laboratory, University of Saskatchewan (C7-C16, Table 1), following the saliva collection protocol described elsewhere [29]. Individuals assigned to the COVID-19 positive group were eligible to participate in this study only if they had received a positive result from the rRT-PCR test done by the Saskatchewan Health Authority (SHA) through a nasopharyngeal swab (NPS). [...] and surrounding area were informed by staff members of SHA about this study and were instructed to contact the research team if they were interested in participating in the study. Those interested in participating were screened by a member of the research team for exclusion criteria that included: any history of chronic lung or heart disease (COPD, asthma, heart failure); any symptoms of COVID-19 respiratory syndrome without a confirmatory rRT-PCR test performed by the SHA; use of prescription medications, other than antibiotics at the time of enrollment or in the last three months; and presence of physical or mental illness with motor and/or cognitive impairment(s) that, in the opinion of the researcher, could interfere with compliance or outcomes.

Saliva collection and processing
The first set of saliva samples from healthy volunteers (C1-C6, Table 1) was collected at the Salivary Proteomics Research Laboratory, University of Saskatchewan. Saliva samples from people positive for COVID-19 were collected at each participant's home within 14 days after receiving a positive test result from the COVID-19 test done by SHA. Stimulated whole saliva was self-collected using a collection kit (SimplOFy™, Oasis Diagnostics® Corporation, USA) without the addition of DNA stabilizers, a protocol that did not affect the stability of RNA or proteins (Additional file 1: Materials and Methods, and Figures S1 and S2). Saliva was stimulated by chewing a piece of parafilm [30]. Immediately after collection, saliva samples were sealed, labeled, and placed on ice to be transported to the Salivary Proteomics Research Laboratory, following all guidelines related to transportation of biohazardous materials. Upon arrival at the laboratory, viable SARS-CoV-2 particles in saliva were inactivated at 60 °C for 30 min [31], a protocol that did not affect protein stability (Additional file 1: Figure S2). Heat-inactivation also reduces the biological risk of infection, allowing samples to be handled in a Biosafety Level 2 laboratory [24]. To maintain consistency in sample handling and processing, saliva from healthy individuals was also heat-treated. After heat-inactivation, 1 mL of whole saliva was transferred to a centrifuge tube and centrifuged at 14,000 × g for 20 min at 4 °C to separate the pellet from the whole saliva supernatant (WSS) [32]. WSS was transferred to a new centrifuge tube and kept on ice for further protein and RNA extraction.

Protein extraction from WSS
WSS proteins were purified by ice-cold acetone precipitation. Extracted proteins were reconstituted in 100 mM ammonium bicarbonate and mixed to obtain a whole saliva protein extract. Protein concentration in the whole saliva protein extract was measured using the BCA assay kit (Pierce™). The equivalent of 40 µg of protein from each sample was dried in a SpeedVac (Labconco, USA) and stored at −80 °C.
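As a small worked example of the aliquoting step, the volume of extract that contains 40 µg of protein follows directly from the concentration measured by the BCA assay. The sketch below is illustrative only: the function name and the concentration value are hypothetical and do not come from the study.

```python
def volume_for_mass_ul(target_ug: float, conc_ug_per_ul: float) -> float:
    """Return the volume (in µL) of extract that contains target_ug of protein."""
    if conc_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug / conc_ug_per_ul


# Hypothetical BCA reading of 2.0 µg/µL (an assumed value, not a study result).
aliquot_ul = volume_for_mass_ul(40.0, 2.0)
print(f"Aliquot volume for 40 µg: {aliquot_ul:.1f} µL")
```

Samples with lower protein concentration need proportionally larger volumes, which is one reason the aliquots are dried before storage.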

MS-based proteomics workflow
Dried proteins from the whole saliva protein extract were reduced, alkylated, and digested with trypsin in-solution

Bioinformatics analyses
Tandem mass spectra were extracted from raw data, converted to a mass/charge data format using Agilent MassHunter Qualitative Analysis Software (Agilent Technologies Canada Ltd., Mississauga, ON, CA), and queried against a combined SARS-CoV-2 and human database (UniProt, both downloaded on June 25, 2021) consisting of 20,403 reviewed proteins (SwissProt), using Spectrum Mill (Agilent Technologies Canada Ltd., Mississauga, ON, CA) as the database search engine.
Search parameters included a fragment mass error of 50 parts per million (ppm), a parent mass error of 20 ppm, trypsin cleavage specificity (two missed cleavages per peptide), and carbamidomethylation as a fixed modification of cysteine. Oxidized methionine, carbamylated lysine, pyroglutamic acid, deamidated asparagine, phosphorylated serine, threonine, and tyrosine, and acetyl lysine were set as variable modifications. Data were also searched using semi-trypsin non-specific C- and N-terminus to increase protein identification. Spectrum Mill results were validated at peptide and protein levels (1% false discovery rate) and by manually inspecting the MS/MS spectra to confirm the identity of signature b- and y-fragment ions. The sequences of the tryptic peptides derived from replicase polyprotein isoform 1a and isoform 1ab were queried against the non-redundant protein sequences database using the blastp (protein-protein Basic Local Alignment Search Tool) algorithm [34]. The blastp approach allowed the identification of the non-structural proteins (nsps) in saliva.
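To make two of the search parameters concrete, the following sketch shows how a ppm mass tolerance is evaluated and how an in-silico tryptic digest with up to two missed cleavages enumerates candidate peptides. It is a simplified illustration, not the Spectrum Mill implementation: the function names are hypothetical, the cleavage rule (after K or R, but not before P) is a commonly used convention assumed here, and the flanking residues of the toy sequence are invented for the example.

```python
PARENT_TOL_PPM = 20.0    # parent mass error stated in the search parameters
FRAGMENT_TOL_PPM = 50.0  # fragment mass error stated in the search parameters


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float, tol_ppm: float) -> bool:
    """True if the observed value lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def tryptic_peptides(sequence: str, max_missed: int = 2):
    """Enumerate (peptide, missed_cleavages) pairs for a fully tryptic digest.

    Assumed rule: cleave C-terminal to K or R unless the next residue is P.
    """
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(cuts):
                break
            peptides.append((sequence[cuts[start]:cuts[end]], missed))
    return peptides


# Toy sequence: the unmodified form of a peptide reported in this study,
# embedded between invented flanking residues ("AAK" and "KGG").
toy_sequence = "AAK" + "SHNIALIWNVKDFMSLSEQLR" + "KGG"
for peptide, missed in tryptic_peptides(toy_sequence):
    print(missed, peptide)

print(within_tolerance(1000.01, 1000.0, PARENT_TOL_PPM))  # about 10 ppm off
print(within_tolerance(1000.03, 1000.0, PARENT_TOL_PPM))  # about 30 ppm off
```

In this toy digest the peptide SHNIALIWNVKDFMSLSEQLR contains an internal lysine, so it is only generated when at least one missed cleavage is allowed.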

RNA extraction from WSS and rRT-PCR
Concurrently with the proteomics analysis, saliva samples were tested using rRT-PCR. rRT-PCR tests were done in all saliva samples to compare the sensitivity and specificity of this test with our proteomics approach at the time of saliva sample collection, which differs from the time of initial COVID-19 diagnosis. For this, total RNA was extracted from WSS using the QIAamp Viral RNA Mini Kit (Qiagen). Extracted RNA was used for the qualitative detection of SARS-CoV-2 (target S gene coding for the membrane fusion subunit, domain 2, of the spike protein) specific RNA in saliva with the RealStar® SARS-CoV-2 RT-PCR Kit 1.0 reagent system (Altona Diagnostics GmbH), based on rRT-PCR technology, using a CFX96

Results
Stimulated whole saliva was obtained from forty-two COVID-19 positive individuals, 16 female and 26 male, and from sixteen healthy individuals, 8 female and 8 male. The mean age of the individuals in the COVID-19 group was 40.8 (± 13.4) and 39.9 (± 17.3) years for female and male participants, respectively. The mean age in the healthy control group was 37.9 (± 13.0) years for female and 34.1 (± 12.1) years for male participants. On average, saliva samples from COVID-19 positive individuals were collected on the 6th day (± 4 days) after the confirmatory NPS by rRT-PCR done by the Saskatchewan Health Authority (SHA).
The data obtained from the proteomics analysis were compared with the rRT-PCR analysis, both done on saliva samples. The results showed that SARS-CoV-2 specific RNA was detected in only 24 out of 42 people positive for COVID-19, while viral proteins were identified in all samples analyzed. Neither SARS-CoV-2 proteins nor RNA were detected in saliva samples from healthy individuals (Tables 1 and 3), while 1362 proteins of human origin were identified in these samples by MS. The Cq values obtained for the detection of SARS-CoV-2 (target S gene) specific RNA are provided in Table 4.
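The sensitivity and specificity implied by these counts can be reproduced with a short calculation. The sketch below takes group membership (COVID-19 positive by the initial NPS rRT-PCR versus healthy control) as the reference standard, which is an assumption about how the comparison is framed; the function and variable names are illustrative.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of reference-positive samples that the test detects."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of reference-negative samples that the test reports as negative."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the text: 42 COVID-19 positive and 16 healthy individuals.
counts = {
    "rRT-PCR (saliva)": {"tp": 24, "fn": 18, "tn": 16, "fp": 0},
    "MS-based proteomics": {"tp": 42, "fn": 0, "tn": 16, "fp": 0},
}

for method, c in counts.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{method}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

Because the saliva samples were collected days after the initial diagnosis, these figures describe performance at the time of saliva collection rather than at first presentation.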
Considering that both isoforms of SARS-CoV-2 replicase polyprotein are cleaved into several non-structural proteins (nsps) [35], we used the blastp algorithm to determine which nsps were detected in saliva of COVID-19 positive individuals. Our approach allowed identification of 5 out of 10 nsps common to both isoforms of replicase polyprotein (1a/1ab), and 4 out of 7 nsps specific to isoform 1ab (Table 6).

Discussion
In this study, we characterized the SARS-CoV-2 proteome in saliva from individuals diagnosed with COVID-19, exploring a novel area of COVID-19 diagnosis through the combined use of saliva and MS-based proteomics. Based on the data presented herein, we provide proof of concept that SARS-CoV-2 proteins can be detected in saliva from COVID-19 positive individuals up to 14 days after the initial diagnosis, when viral loads are expected to decrease.
Our protocol allowed us to identify 7 out of 17 SARS-CoV-2 proteins in saliva of COVID-19 positive individuals (Table 2), covering 41% of the viral proteome (75% structural proteins, 100% non-structural proteins, 18.2% accessory proteins). SARS-CoV-2 proteins present in saliva might originate from free viral particles produced in the lower and upper respiratory tracts, salivary glands, and surrounding tissues, or delivered from the blood to the oral cavity via the gingival crevicular fluid [14]. Moreover, the identified structural viral proteins might be derived from free virions in saliva, while non-structural proteins and accessory proteins might represent active viral replication or release from lysed infected cells [29]. Based on our results, we speculate that our MS-based proteomics approach may allow for the identification of COVID-19 cases at different stages of the disease, since all three kinds of viral proteins (structural, non-structural, and accessory) were detected in the saliva samples analysed (Table 1).
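The coverage figures quoted above can be checked with simple arithmetic. The per-category counts used below (3 of 4 structural, 2 of 2 non-structural, 2 of 11 accessory) are not stated in the text; they are an assumption inferred from the reported percentages and the 7-of-17 total, and the variable names are illustrative.

```python
# (identified, total) per protein category; counts are inferred, not reported.
categories = {
    "structural": (3, 4),
    "non-structural": (2, 2),
    "accessory": (2, 11),
}

identified = sum(found for found, _ in categories.values())
total = sum(size for _, size in categories.values())

for name, (found, size) in categories.items():
    print(f"{name}: {found}/{size} = {100 * found / size:.1f}%")
print(f"overall: {identified}/{total} = {100 * identified / total:.1f}%")
```

Under these assumed counts the category percentages and the overall coverage agree with the values reported in the paragraph above.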
All 51 tryptic peptides identified in the 42 COVID-19 positive individuals were unique to SARS-CoV-2 (Fig. 1), one of which was consistently found in most of the samples: (K)-SHnIALIWnVKDFmSLSEQLR-(K), from replicase polyproteins 1a and 1ab (73.8%; Fig. 2). Although there are no reports on the identification of replicase polyproteins 1a/1ab tryptic peptides in human biofluids, our results suggest that these proteins can be considered as target compounds for diagnosing COVID-19 by MS-based proteomics. Apart from (K)-SHnIALIWnVKDFmSLSEQLR-(K) being the tryptic peptide most frequently identified, we were also able to detect most nsps resulting from the cleavage of both replicase polyprotein isoforms, 1a and 1ab, in saliva from all COVID-19 positive samples (Table S2).
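For readers who wish to relate the peptide notation to the masses searched by MS, the sketch below computes a monoisotopic peptide mass and the m/z of its protonated ions. It assumes that the lowercase letters in SHnIALIWnVKDFmSLSEQLR denote modified residues (n for deamidated asparagine, m for oxidized methionine), consistent with the variable modifications listed in the search parameters; that reading of the notation, the function names, and the restriction to these two modifications are assumptions made for illustration.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide (free N- and C-termini)
PROTON = 1.007276   # mass of the charge carrier

# Assumed notation: lowercase letter -> (unmodified residue, mass shift in Da).
MODIFIED = {
    "n": ("N", 0.984016),    # deamidation of asparagine
    "m": ("M", 15.994915),   # oxidation of methionine
}


def peptide_mass(peptide: str) -> float:
    """Monoisotopic neutral mass of a peptide, honouring lowercase modifications."""
    total = WATER
    for residue in peptide:
        if residue in MODIFIED:
            base, shift = MODIFIED[residue]
            total += RESIDUE_MASS[base] + shift
        else:
            total += RESIDUE_MASS[residue]
    return total


def mz(peptide: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


peptide = "SHnIALIWnVKDFmSLSEQLR"
for z in (2, 3, 4):
    print(f"z = {z}: m/z = {mz(peptide, z):.4f}")
```

Comparing such theoretical values with observed precursor and fragment ions, within the stated ppm tolerances, is the basis for the manual spectrum inspection described in the methods.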
The nsps detected participate in essential processes of COVID-19 pathogenesis such as viral RNA replication and transcription, and immune evasion [35], supporting the applicability of our proteomics approach to detect COVID-19 cases at different stages of the disease. Although saliva-based sampling for SARS-CoV-2 detection via rRT-PCR has been shown to be reliable for the initial diagnosis of COVID-19 [4], our proteomics-based approach demonstrated better sensitivity than rRT-PCR to detect viral biomarkers in saliva, since SARS-CoV-2 proteins were identified in all samples, even in those which were rRT-PCR negative at the time of sample collection (Table 3). Since saliva samples used in this study were collected approximately six days after the confirmatory NPS test by rRT-PCR done by SHA, our results are potentially explained by the reduced viral load at the time of sample collection [5,17], and by the higher rate of RNA degradation compared to that of the proteins [29], confirming the limitations of the rRT-PCR test for the detection of SARS-CoV-2 biomarkers in saliva days after symptom onset [17]. The increased lifetime of SARS-CoV-2 proteins in saliva might be due to protein-protein interactions, since viral proteins self-associate forming dimers or oligomers, or interact with other proteins of human or viral origin [36]. These physiological interactions may have protected the viral proteins from the proteolytic degradation that occurs in the oral cavity by salivary proteases of human and bacterial origin [37], allowing them to remain for longer periods of time as intact proteins in saliva. These mechanisms enable the detection of SARS-CoV-2 proteins by state-of-the-art proteomics approaches and help to explain the higher sensitivity obtained with this method compared to that of rRT-PCR (Table 3). Considering that there is a direct relationship among viral load, disease severity, and the detection of SARS-CoV-2 specific RNA in saliva [17], further studies should be done to compare the sensitivity and specificity of our MS-based clinical proteomics approach with rRT-PCR in the detection of COVID-19 cases in saliva samples collected at the time of initial diagnosis, when the viral load is highest [17].
In addition to being highly sensitive, a diagnostic method must provide results in a timely manner, facilitating the early detection of COVID-19 cases or their accurate diagnosis [5]. To this end, sample analysis by MS may be expedited if the method is adapted to identify target tryptic viral peptides [20,23,24,38] or to identify naturally occurring SARS-CoV-2 peptides in saliva. The main advantage of the latter approach is that the sample can be analysed directly, eliminating the preparation process required for the bottom-up proteomics strategy used in this study [29]. This method allows the identification of natively cleaved SARS-CoV-2 peptides in saliva, a methodology previously reported by our group that proved useful in the identification of Zika virus peptides in saliva [29]. In this regard, further studies will test the applicability of our high-throughput MS-based peptidomics method [29] for the identification of native SARS-CoV-2 peptides in saliva of COVID-19 positive individuals. The natively cleaved SARS-CoV-2 peptides identified in saliva will facilitate the development of COVID-19 point-of-care diagnostic assays, which could easily be scaled up in numerous locations, adding much-needed testing capacity.
Regarding the methodological aspects, the protocol of saliva self-collection used in this study demonstrates that this biofluid can be self-collected anywhere using the non-invasive technique reported herein. In terms of sample stability, saliva proved to be well suited to proteomics analyses for COVID-19 diagnosis. As mentioned in the methods section, saliva samples were kept on ice immediately after collection, a protocol that prevents proteolytic degradation without interfering with the chemistry of the proteome [39]. Moreover, the method of saliva collection and processing used in this study did not lead to RNA degradation, as demonstrated by the similar rRT-PCR results of saliva samples collected using three different methods (Additional file 1: Figure S1). Although further confirmation is required with saliva samples collected from a larger cohort and compared with the gold-standard rRT-PCR naso/oropharyngeal swab method at the time of the initial diagnostic test [3], here we demonstrated the applicability of our MS-based proteomics technique for the identification of SARS-CoV-2 proteins in saliva from COVID-19 positive individuals. Our findings reinforce the advantages of using MS over rRT-PCR for the detection and follow-up of COVID-19 cases [23,40], especially in cases with reduced viral load that cannot be detected by rRT-PCR [17].

Table 1
Results from the proteomics and rRT-PCR analyses in saliva samples collected from COVID-19 positive and healthy individuals

Table 2
List of SARS-CoV-2 proteins identified in individuals positive for COVID-19 and frequency of identification

Table 3
Sensitivity and specificity of proteomics and rRT-PCR analyses done in saliva samples collected from COVID-19 positive and healthy individuals

Table 4
Cq values obtained during the detection of SARS-CoV-2 (target S gene) specific RNA by rRT-PCR in saliva samples collected from COVID-19 positive and healthy individuals, respectively

Table 5
Fragment b- and y-ions attributed to the tryptic peptide SHnIALIWnVKDFmSLSEQLR as identified by tandem MS. The table also includes complementary fragment ions that aided in the assignment of the peptide

Table 6
List of replicase polyproteins 1a and 1ab tryptic peptides identified in saliva samples collected from COVID-19 positive individuals, grouped by the polyprotein fragment to which they belong and their functions. R1A: replicase polyprotein 1a (accession number: P0DTC1); R1AB: replicase polyprotein 1ab (accession number: P0DTD1); Nsp: non-structural protein; NS: Non-