A novel non-invasive method allowing for discovery of pathologically relevant proteins from small airways

There is a lack of early and precise biomarkers for personalized respiratory medicine. Breath contains an aerosol of droplet particles, which are formed from the epithelial lining fluid when the small airways close and re-open during inhalation following a full expiration. These particles can be collected by impaction using the PExA method (Particles in Exhaled Air), and are derived from an area of high clinical interest that has previously been difficult to access, making them a potential source of biomarkers reflecting pathological processes in the small airways. Our aim was to investigate whether the PExA method is useful for discovery of biomarkers that reflect pathology of small airways. Ten healthy controls and 20 subjects with asthma, of whom 10 had small airway involvement as indicated by a high lung clearance index (LCI ≥ 2.9 z-score), were examined in a cross-sectional design, using the PExA instrument. The samples were analysed with the SOMAscan proteomics platform (SomaLogic Inc.). Two hundred-seven proteins were detected in up to 80% of the samples. Nine proteins showed differential abundance in subjects with asthma and high LCI as compared to healthy controls. Two of these were less abundant (ALDOA4, C4), and seven more abundant (FIGF, SERPINA1, CD93, CCL18, F10, IgM, IL1RAP). sRAGE levels were lower in ex-smokers (n = 14) than in never smokers (n = 16). Gene Ontology (GO) annotation database analyses revealed that the PEx proteome is enriched in extracellular proteins associated with extracellular exosome-vesicles and innate immunity. The applied analytical method was reproducible and allowed identification of pathologically interesting proteins in PEx samples from asthmatic subjects with high LCI. The results suggest that PEx based proteomics is a novel and promising approach to study respiratory diseases with small airway involvement.
Can the PExA method identify individual protein profiles that reflect pathology of small airways, using the SOMAscan platform?
Two hundred-seven proteins were detected in up to 80% of the PEx samples, with a strong overrepresentation of proteins related to innate immune responses, including nine proteins that discriminated subjects with asthma and high LCI from healthy controls. The results support that PEx based proteomics is a novel and promising approach to study respiratory diseases with small airway involvement.


Introduction
There is a growing interest in the role of small airways (inner diameter < 2 mm) in asthma and other lung diseases [1]. In asthma, involvement of small airways is associated with more severe disease and loss of control [2][3][4][5][6], but has also been demonstrated in moderate and mild asthma [7]. Small airway involvement is also a recognised feature of chronic obstructive lung disease (COPD) [8], and of other severe lung diseases, including viral bronchiolitis (as observed in e.g., , lung-fibrosis and hypersensitivity pneumonitis.
In the small airways, surfactant plays a crucial role for airway patency and innate immune responses [9]. Surfactant is a complex mixture of proteins and lipids that keeps small airways open by reducing surface tension, but it also plays an important role in innate immunity by enhancing phagocytosis of inhaled pathogens and particulate matter by special surfactant proteins [10] and by modulating immune responses [11][12][13][14]. Given its crucial role for airway patency and host defence, knowledge of the protein and lipid composition of surfactant is surprisingly limited.
Although the small airways are a key compartment for the onset and progression of respiratory diseases such as COPD and lung-fibrosis [15], early detection of pathological processes in the small airways remains difficult, mainly due to their inaccessibility. One option for retrieving biological material from the small airways is bronchoscopy with sampling of biopsies or bronchoalveolar lavage fluid (BALF), but this method is invasive and not suited for point-of-care situations or clinical trials. Non-invasive physiological measurements reflecting small airway function exist (e.g., inert-gas washout techniques and impulse oscillometry), but these methods do not provide the molecular information about pathways needed for the further development of precision medicine. In particular, the introduction of biologics targeting specific molecular pathways has highlighted the need for biomarkers that reflect disease endotypes, to enable patient stratification.
Particles in Exhaled Air (PExA) is a novel sampling method allowing non-invasive retrieval of biological material from the small airways. In short, the method is based on impaction of an aerosol consisting of ultrafine droplets of respiratory tract lining fluid (RTLF) that are formed and exhaled after a breathing manoeuvre that promotes closure and reopening of the small airways [16]. The PExA method has been thoroughly described by Larsson et al. [17].
The molecular composition of PEx samples has been explored in previous studies, and 120 different proteins could be detected by LC/MS in PEx samples pooled from several individuals [20]. The protein composition of these samples showed up to 80% similarity to BALF.
Highly abundant proteins, such as surfactant protein A (SP-A), have been successfully quantified by ELISA with low intra-individual variability in PEx samples from single individuals [17,18] and show good correlation with SP-A levels in BALF [19]. The small airway origin of the PEx sample is supported both by its composition, which resembles BALF but not bronchial wash (BW) [19], and by the fact that no amylase is detected by LC/MS [20]. It is also indirectly supported by the 1000-10,000-fold increase in the number of exhaled and sampled particles when using a breathing manoeuvre that promotes airway closure and re-opening [16,21].
In the present study we sought to evaluate whether PEx samples convey information on pathophysiological processes useful in biomarker discovery. SOMAscan (SomaLogic Inc.) was identified as a potentially suitable proteomics platform for the study. As PEx samples mainly originate from the small airways, we hypothesised that differences in protein composition of PEx samples would be easiest to observe in patients with increased lung clearance index (LCI), a standard measure of global ventilation inhomogeneity, i.e., an indirect measure of small airway involvement that is also considered a sensitive indicator of early lung damage. Based on this reasoning we chose to study the protein composition of samples from asthmatic subjects with high LCI compared to that of asthmatic subjects and healthy controls with normal LCI.

Methods and analysis
First, we evaluated the performance and reproducibility of the SOMAscan platform with PEx samples. The second step was a clinical evaluation in a cross-sectional design, in which the pathological relevance of the detected proteins was assessed and the protein profiles in PEx of non-asthmatic subjects were compared with those of subjects with asthma, with or without high LCI.

Subjects
Twenty subjects with asthma and ten healthy controls, all of Caucasian origin, were included in the clinical evaluation. All were recruited from our earlier studies on asthma, or by an advertisement in a daily paper. To identify subjects with small airway involvement, all subjects were screened with the multiple breath nitrogen washout test (MBNW), giving an index of heterogeneity of ventilation (LCI). Asthma subjects were stratified into two groups, one with normal LCI (z-score < 2), herein referred to as A-nLCI (n = 10), and one with high LCI (z-score ≥ 2.9), herein referred to as A-hLCI (n = 10). All subjects with asthma reported a physician's diagnosis of asthma and were taking asthma medication regularly. We also included a control group (non-asthma) of subjects who did not report respiratory symptoms, were not taking medication for respiratory disease and had a normal LCI (z-score < 2), herein referred to as NA (n = 10) [22].
Exclusion criteria were current smoking or smoking within the last 10 years or > 10 pack-years, diagnosis of systemic inflammatory disease, cardiovascular disease or pregnancy. Demographic and clinical data including LCI z-scores are presented in Table 1. All participants gave their written informed consent and the study was approved by the Ethical Committee at Gothenburg University in Sweden.

Clinical characterization
Spirometry was performed according to ERS guidelines, using a Spirare spirometer (Spirare, Stockholm, Sweden) [23]. Multiple Breath Nitrogen Wash-out tests were performed using the Exhalyzer® D device (Eco Medics AG, Duernten, Switzerland) and software (Spiroware 3.1) in accordance with current guidelines [24]. Z-scores were calculated as described by Kjellberg et al. [25].
Table 1 Demographic and clinical characteristics of the three study groups, including results from statistical tests. Data are presented as means, with standard error given in parentheses and range given in brackets. Incomplete data are indicated by n numbers given in parentheses. Kruskal-Wallis and Chi-square tests were used to analyse differences in continuous and categorical data, respectively. "ns" indicates a statistical test with a p value of 0.05 or above. Dash (-) indicates "not applicable".
Fraction of exhaled nitric oxide (FENO) was measured once by a NIOX Mino (Aerocrine AB, Stockholm, Sweden) before spirometry following the ATS-ERS guidelines [26], except for only performing one exhalation.
A skin-prick test (SPT) to common allergens in Sweden was performed with positive result defined as a wheal diameter ≥ 3 mm and negative control < 3 mm. Atopy was defined as the occurrence of at least one positive SPT wheal.
Serum samples were analysed for hsCRP and differential cell counts, using standard clinical methods.
All subjects filled out a questionnaire on medical history, smoking habits, symptoms and medication, and subjects with asthma filled out the Asthma Control Questionnaire (ACQ), reflecting asthma control over the last week [27]. The use of medication was translated to a GINA step for each subject according to the GINA guidelines 2016.

PEx sample collection
The PExA method and the PExA 1.0 instrument were used to collect PEx samples (described in the supplement). For assessment of reproducibility, 120 ng of PEx was collected from each subject, and for all other samples at least 240 ng of PEx was collected, involving two consecutive sampling sessions with a short break in between. After collection the sample holder was transferred to a clean-air room, and the substrate was excised from the sample holder with a scalpel, placed in a Millipore Ultrafree-MC LH Centrifugal Filter insert (FC30LH25) and stored at −80 °C for further analysis. True blank samples were generated by applying the same procedure as for real samples but without a human breathing into the PExA instrument.

SOMAscan analysis and data processing
SOMAscan is a proprietary, highly multiplexed, sensitive proteomic platform (SomaLogic Inc., Boulder, USA). As the SOMAscan platform evolved during the study period, two different versions were used: SOMAscan 1.1 K for the assessment of SOMAscan performance with PEx samples and SOMAscan 1.3 K for the other experiments. The platform and sample preparation are described in the supplement.
Intra-run normalization and inter-run calibration were performed by SomaLogic according to their SOMAscan assay GLP data quality-control procedures. Data from SomaLogic were reported in relative fluorescent units (RFU) after hybridization control normalization, which removes individual sample variance on the basis of signalling differences between scans (herein referred to as RFU values). Data from all samples passed quality-control criteria and were considered eligible for further analysis. Limit of detection (LOD) was calculated as 3 standard deviations above the mean RFU signal measured from 3 blank samples. Proteins with RFU values below LOD were not considered for further analyses. To account for systematic differences due to possible variability in final PEx concentration, the set of detected proteins was subjected to group median based normalization and log2 transformation before statistical analysis was performed. Mean and median values for establishment of LOD and normalization, respectively, were calculated based on RFU values in all samples.
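The detection limit and normalization steps described above can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the study: the LOD is read as the blank mean plus 3 standard deviations, the median normalization is implemented as scaling each sample to a common median, and the function names and RFU values are hypothetical. It is not the SomaLogic or Qlucore implementation.

```python
import math
import statistics

def limit_of_detection(blank_rfu):
    """Assumed LOD: mean of the blank RFU values plus 3 sample standard deviations."""
    return statistics.mean(blank_rfu) + 3 * statistics.stdev(blank_rfu)

def detected(rfu_values, lod):
    """True if every RFU value for this protein lies above its LOD."""
    return all(v > lod for v in rfu_values)

def median_normalize_log2(samples):
    """Scale each sample to a shared median, then log2-transform.

    `samples` is a list of per-sample lists of RFU values (same protein order).
    The shared target is the median of the per-sample medians (an assumption).
    """
    sample_medians = [statistics.median(s) for s in samples]
    target = statistics.median(sample_medians)
    return [
        [math.log2(v * target / m) for v in s]
        for s, m in zip(samples, sample_medians)
    ]

# Hypothetical RFU values for one protein in three blank samples.
lod = limit_of_detection([100.0, 110.0, 90.0])

# Two hypothetical samples that differ only by a two-fold concentration factor.
normalized = median_normalize_log2([[200.0, 400.0, 800.0], [100.0, 200.0, 400.0]])
```

In this toy case the two samples become identical after normalization, which is the intended effect when samples differ only in total PEx concentration.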

Gene Ontology enrichment analysis
To improve our understanding of the origin and functions of the proteins seen in PEx samples, a protein annotation enrichment analysis was performed using the publicly available "Gene Ontology enrichment analysis and visualization tool" (GOrilla) [28], matching a list of 199 uniquely mapped PEx proteins to either the Cellular Component (CC) or the Biological Process (BP) GO sub-domain (database updated on Feb 15, 2020). A list of 1291 uniquely mapped SOMAscan protein identities was used as reference/background.

Statistical analysis
The significance level for the Gene Ontology enrichment analysis was calculated using the right-tailed Fisher exact test, provided by the GOrilla web-based service [28]. Results from the GO annotation enrichment analysis were considered significant at a Benjamini-Hochberg corrected p value below 0.05. PEx protein composition was compared to that of BALF, and the enrichment was tested with the Fisher exact test.
SOMAscan data were mainly analysed using Qlucore Omics Explorer 3.6 software (Qlucore, Lund, Sweden). RFU values for the 207 proteins were found not to meet requirements for normality and were therefore log2 transformed before statistical analysis. One-way analysis of variance (ANOVA) tests were used to determine intra-individual differences in the reproducibility experiment. General linear model statistics, with each variable normalized to a mean of 0 and a variance of 1 and with adjustment for imbalance in age and BMI, were used to test differences between the NA, A-nLCI and A-hLCI groups.
Benjamini-Hochberg correction for multiple testing was used to control the rate of false-positive results (the corrected p value is herein referred to as the q value). Statistical analysis of clinical and demographic variables was performed with Kruskal-Wallis or Chi-square tests using Spotfire 7.0.2 software (TIBCO Spotfire).
Group comparisons of SOMAscan data were considered hypothesis free, and proteins with a p value below 0.05 and a q value below 0.2 were considered to be of interest in this explorative study.
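The selection rule above (p below 0.05 and q below 0.2) can be sketched in a few lines of Python. This is a generic Benjamini-Hochberg step-up calculation, not the Qlucore implementation; the function names, protein labels and p values are hypothetical and serve only as illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (q values) in input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q

def proteins_of_interest(p_by_protein, p_cut=0.05, q_cut=0.2):
    """Names of proteins with p < p_cut and q < q_cut."""
    names = list(p_by_protein)
    q = benjamini_hochberg([p_by_protein[name] for name in names])
    return [
        name for name, q_value in zip(names, q)
        if p_by_protein[name] < p_cut and q_value < q_cut
    ]

# Hypothetical p values for four placeholder proteins.
example = {"protein_1": 0.01, "protein_2": 0.04, "protein_3": 0.03, "protein_4": 0.5}
selected = proteins_of_interest(example)
```

Requiring both thresholds means a protein is kept only if it is nominally significant and the expected share of false positives among the selected proteins stays below 20%.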

Assessment of SOMAscan assay performance for PEx samples
SOMAscan technical variability was evaluated by measuring a pooled PEx sample (1 µg PEx per ml) 5 times on the SOMAscan 1.1 K platform. The mean CV value was 10% for the set of 174 proteins detected in all five samples, and below 20% for 156 of the 174 detected proteins (Fig. 1). Intra-individual repeatability related to the PEx sampling procedure and the SOMAscan 1.1 K platform combined was evaluated by repeat measurements of three consecutive 120 ng PEx samples collected from 6 subjects with asthma. The intra-individual CV values ranged from 6.1 to 24.8% with a mean of 13.8%, for the set of 114 proteins detected in all 18 samples. To assess whether the observed intra-individual variability is low enough for the method to be useful for biomarker discovery, we analysed to what degree it was possible to separate the 6 subjects from each other, solely based on the proteomics data. Defining each of the 6 sets of triplicate samples as a group, the between-groups ANOVA test revealed 102 proteins with statistically significant differences between at least two of the group means (q < 0.05). Filtering the list of protein variables further down to a q value cut-off of 5.5 × 10⁻⁵ yielded 42 proteins that completely separated all 6 subjects from each other, as judged by visual inspection of a Principal Component Analysis (PCA) plot (Fig. 2). This means that the intra-individual variation was very low compared to the

Assessment of pathological relevance of proteins detected in PEx samples
Of the 1310 proteins represented on the 1.3 K SOMAscan panel, 134 proteins showed RFU values larger than LOD in the complete set of 30 samples (2 µg PEx/ml). To increase the chance of finding differentially abundant proteins, a set of 207 SOMAscan protein IDs detected over LOD in 80% of the 30 samples (Additional file 1: Table S1) was used for the comparative data analyses.

Comparison of the protein composition of PEx with that of BALF by enrichment analysis
Of 207 proteins detected with the SOMAscan 1.3 K platform, 81 (41%) have previously been detected in BALF [29]. Using 1323 uniquely mapped SOMAscan protein identities as reference/background, the 207 proteins detected in PEx samples were enriched 5.9-fold in proteins previously detected in supernatant from BALF samples (p < 0.0001).

Gene Ontology (GO) enrichment analysis
Gene Ontology enrichment analysis of 199 uniquely mapped PEx/SOMAscan protein IDs (Fig. 3) revealed an over-representation of several Cellular Component (CC) GO terms, for example "extracellular exosome" [enrichment factor (EF) = 1.79, q = 6.30E−11], "blood microparticle" (EF = 3.43, q = 8.28E−10) and "platelet alpha granule lumen" (EF = 3.15, q = 1.78E−04) (Table 2). Biological Process (BP) GO domain analysis revealed an over-representation of BP terms, for example "regulation of complement activation" (EF = 4.4, q = 5.17E−08), "platelet degranulation" (EF = 2.8, q = 2.88E−04), "regulation of coagulation" (EF = 2.6, q = 2.72E−02), "acute inflammatory response" (EF = 3.21, q = 8.08E−03), and "neutrophil activation involved in immune response" (EF = 1.69, q = 2.9E−02) (Table 2B).
Table 2 Gene ontology annotation enrichment analysis. This table displays results from the Gene Ontology enrichment analysis using the publicly available "Gene Ontology enrichment analysis and visualization tool" (GOrilla) [28]. A list of 199 uniquely mapped PEx proteins detected with SOMAscan 1.3 K was searched against the Cellular Component sub-domain database (section A) and the Biological Process sub-domain database (section B). A list of 1291 uniquely mapped SOMAscan 1.3 K protein identities was used as reference/background. Enrichment factor was calculated as (b/n)/(B/N), where n is the total number of PEx protein IDs identified by SOMAscan and used as input, and b is the number of PEx/SOMAscan protein IDs associated with the GO term. p values for enrichment analysis were computed according to the mHG or HG model. FDR q value is the p value corrected for multiple testing using the Benjamini and Hochberg (1995) method.
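The enrichment factor (b/n)/(B/N) and a hypergeometric (HG) right-tail p value can be sketched as follows. The sketch assumes that N is the size of the background list and B the number of background proteins annotated with the GO term, which the table legend does not state explicitly. The counts for b and B in the example are hypothetical, and the code is not the GOrilla implementation.

```python
from math import comb

def enrichment_factor(b, n, B, N):
    """Enrichment factor (b/n)/(B/N).

    b: input proteins annotated with the term, n: input list size,
    B: background proteins annotated with the term (assumed meaning),
    N: background list size (assumed meaning).
    """
    return (b / n) / (B / N)

def hypergeom_right_tail(b, n, B, N):
    """P(X >= b) when n proteins are drawn without replacement from N,
    of which B are annotated with the term."""
    upper = min(n, B)
    total = comb(N, n)
    favourable = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return favourable / total

# n = 199 and N = 1291 follow the text; b = 10 and B = 20 are hypothetical counts.
ef = enrichment_factor(b=10, n=199, B=20, N=1291)
p = hypergeom_right_tail(b=10, n=199, B=20, N=1291)
```

An enrichment factor above 1 means the term is more frequent among the PEx proteins than in the background, and the right-tail probability quantifies how unlikely such an excess would be by chance.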

Differential abundance analysis, asthma vs. non-asthma
To identify confounding demographic factors we investigated the impact of gender, BMI and age, and found a clear effect of age and to some extent of BMI, independent of disease status. The relative abundance of each of the 207 detected proteins was then compared between the various pairwise combinations of the A-hLCI (n = 10), A-nLCI (n = 10) and NA (n = 10) groups. Adjusting for imbalance in age, 9 proteins were found to be differentially abundant in A-hLCI as compared to the NA group, of which 2 were less abundant (ALDOA4, C4) and 7 more abundant in A-hLCI (FIGF, SERPINA1, CD93, CCL18, F10, IgM, IL1RAP) (Additional file 1: Table S2), as exemplified in Fig. 4. A review of the scientific literature revealed that all of the 9 differentially abundant proteins are known to play a role in immune response and respiratory disease (Additional file 1: Table S3).

Differential abundance analysis, ex-smokers vs. never smokers
To explore the effect of smoking in a post-hoc analysis, the 207 SOMAscan/PEx protein data set was screened for proteins showing differential abundance in ex-smokers (n = 14) vs. never smokers (n = 16) (Additional file 1: Table S4). Only one protein, sRAGE (soluble Receptor for Advanced Glycation End products), a pattern-recognition receptor involved in host response to injury, infection and inflammation, fulfilled the significance criteria after adjusting for age and BMI, with decreased abundance in ex-smokers as compared to never smokers (Fig. 4). By contrast, sRAGE did not show any clear difference between any of the asthma groups and healthy controls.

Discussion
Exhalation after breath-holding at residual volume gives rise to the release of high numbers of tiny droplets/particles formed from the respiratory tract lining fluid covering the small airways. Some of these particles are small enough to follow the airstream of the exhalation and can be collected by impaction technology (PExA). Due to the small size of the particles and their specific origin, the total amount of respiratory tract lining fluid that can be collected in this way is minute. In the present study we addressed the feasibility of proteomic profiling of PEx samples and could demonstrate that the SOMAscan proteomics platform is sensitive enough to detect and accurately quantify over 150 proteins in PEx samples from single individuals. Given that the SOMAscan panel covers close to 1300 unique protein IDs, 150 may sound like a relatively small number. However, since the SOMAscan platform has been developed primarily for analysis of blood samples, and a PEx sample originates only from exhaled air and therefore contains only 100-200 ng of an undiluted body fluid with an unknown dynamic range, we believe that detection of more than 150 proteins in the complete set of 30 samples is a surprisingly good result. Although a limited number of proteins was detected, one should bear in mind that all these proteins originate directly from the small airways, a highly relevant region for respiratory research, which otherwise would have been very difficult to sample in a non-invasive way. Analysis of three consecutive samples indicated that intra-individual variability is substantially smaller than the inter-individual variability.
Moreover, protein enrichment analysis showed that the protein composition of the PEx matrix resembles that of BALF supernatant to a large extent. This finding provides further confidence in, and confirms, previous findings that PEx samples originate from the small airways [19,20] and hold the potential to be developed into a non-invasive substitute for bronchoscopy-based diagnosis.
The finding from the protein enrichment analysis that the PEx matrix is enriched in extracellular proteins associated with "exosome" (Fig. 3, Table 2) is of particular interest due to the emerging role of exosomes as mediators of biomarkers for several chronic lung diseases [30,31]. In addition, the PEx proteome seems highly relevant for studies on the role of the innate immune response in the development of respiratory diseases and in host defence.
To explore the pathological relevance of the PEx proteome in studies of respiratory disease we analysed PEx/SOMAscan data from 20 asthma patients and 10 healthy control subjects. Despite the low number of subjects, we found several highly interesting proteins to be differentially abundant in samples from subjects with asthma compared to the non-asthma group. Alpha-1-antitrypsin (SERPINA1) and IL1RAP were elevated only in asthma patients with high LCI, as opposed to IgM, CD93 and CCL18, which were elevated also in asthma patients without small airway involvement (Fig. 4A). The two different profiles suggest that SERPINA1 and IL1RAP may be specifically involved in small airway dysfunction, whereas IgM, CD93 and CCL18 may reflect disease processes less specific for small airway pathology. The post-hoc analysis showed that the level of sRAGE, a protein suggested to be a blood-based biomarker of smoking-induced pathology [32,33], was lower in PEx from ex-smokers, suggesting that PEx samples are capable of reflecting long-term effects of environmental challenges, an important feature for sub-phenotyping of disease.
Since PEx is known to originate to a large extent from the small airway region, we chose to include a group of asthmatic subjects with high LCI, an indirect measure of small airway involvement and of poor asthma control [22]. Interestingly, we found a higher number of proteins to be differentially abundant when comparing the non-asthma group with asthmatics with small airway involvement than with those without small airway involvement, indicating that PEx samples may reflect pathology that drives a more severe type of asthma, which is also supported by the higher ACQ in that group.
The present pilot study was small and primarily dimensioned to highlight the potential of PExA as a non-invasive method for collecting small airway samples compatible with protein biomarker analysis. The fact that as many as 9 of 207 proteins were found to be differentially abundant, and that all of these 9 proteins have previously been associated with pathways or mechanisms that play a crucial role in pulmonary disease, indicates that the proximal sampling method we used has the potential to generate a higher share of highly relevant data than is usually expected from biomarker discovery based on blood samples.

Conclusion
Our data illustrate for the first time how non-invasively retrieved respiratory tract lining fluid, originating specifically from the small airways, can be analyzed with regard to the relative quantity of over 150 individual proteins. The data reveal that proteins present in PEx to a large extent seem to originate from extracellular vesicles, and that many of them are associated with innate immunity, including the complement and coagulation systems. The pathological relevance of PEx samples was further demonstrated by showing that all of the proteins found to be differentially abundant in asthmatic subjects with small airway involvement have previously been described as involved in lung disease pathways. Collectively, the results indicate that the PExA method provides a novel and non-invasive route to identify novel biomarkers and drug targets, contributing to further development of precision medicine in the field of respiratory medicine.