Skip to main content

A single-sample workflow for joint metabolomic and proteomic analysis of clinical specimens

Abstract

Understanding the interplay of the proteome and the metabolome helps to understand cellular regulation and response. To enable robust inferences from such multi-omics analyses, we introduced and evaluated a workflow for combined proteome and metabolome analysis starting from a single sample. Specifically, we integrated established and individually optimized protocols for metabolomic and proteomic profiling (EtOH/MTBE and autoSP3, respectively) into a unified workflow (termed MTBE-SP3), and took advantage of the fact that the protein residue of the metabolomic sample can be used as a direct input for proteome analysis. We particularly evaluated the performance of proteome analysis in MTBE-SP3, and demonstrated equivalence of proteome profiles irrespective of prior metabolite extraction. In addition, MTBE-SP3 combines the advantages of EtOH/MTBE and autoSP3 for semi-automated metabolite extraction and fully automated proteome sample preparation, respectively, thus advancing standardization and scalability for large-scale studies. We showed that MTBE-SP3 can be applied to various biological matrices (FFPE tissue, fresh-frozen tissue, plasma, serum and cells) to enable implementation in a variety of clinical settings. To demonstrate applicability, we applied MTBE-SP3 and autoSP3 to a lung adenocarcinoma cohort showing consistent proteomic alterations between tumour and non-tumour adjacent tissue independent of the method used. Integration with metabolomic data obtained from the same samples revealed mitochondrial dysfunction in tumour tissue through deregulation of OGDH, SDH family enzymes and PKM. In summary, MTBE-SP3 enables the facile and reliable parallel measurement of proteins and metabolites obtained from the same sample, benefiting from reduced sample variation and input amount. This workflow is particularly applicable for studies with limited sample availability and offers the potential to enhance the integration of metabolomic and proteomic datasets.

Graphical Abstract

Introduction

Since proteins and metabolites constitute a rich representation of the cell’s phenotype, their collective analysis has contributed to elucidate cellular mechanisms in multiple scenarios. In a clinical settings, integrating proteomic and metabolomic data with genomic and transcriptomic profiles has the potential to significantly enhance personalised medicine strategies and to diagnose and stratify patients [1]. Integrative strategies that combine various omics techniques can significantly improve our understanding of the interactions between regulatory layers, offering valuable insights into complex and multifactorial pathologies, such as cancer [2].

Traditionally, metabolomic and proteomic sample preparation workflows have been developed independently to optimise extraction conditions for either metabolites or proteins [3,4,5,6]. For metabolomics, biphasic extractions utilising either ethanol or methanol combined with methyl-tert-butylether (MTBE), combined into a workflow termed EtOH/MTBE, showed advantages over chloroform or monophasic extractions by exhibiting higher coverage, increased extracted metabolite concentration and robustness [6, 7]. EtOH/MTBE can be performed in a semi-automated fashion for the profiling of polar and non-polar metabolites [6, 7]. In proteomics, single-pot solid-phase-enhanced sample preparation (SP3) has become a broadly used method for proteomic sample preparation because of its wide applicability, high sensitivity, ease of use, and low cost [3, 5, 8]. The concept of SP3 entails the aggregation of proteins on magnetic beads in the presence of an organic solvent, which allows removal of contaminants including salts and detergents, followed by protein release and digestion in aqueous conditions [3]. Taking advantage of the magnetic properties of these beads, SP3 has been implemented on a liquid handling robot for automated processing (autoSP3), to enhance standardized sample handling in large sample cohorts [9, 10], even for low-input samples [9].

In conventional approaches for combined proteomic and metabolomic studies, samples are often prepared separately from different specimens. This is not an ideal approach as discrepancies between proteomic and metabolomic data may be mistakenly attributed to regulatory interactions between these two layers, while in fact this might arise from sample variability. For instance, differences in pre-analytical sample handling (e.g. time and temperature of storage) may be likely to occur if proteomic and metabolomic sample preparation is conducted separately or even in different labs [11]. Therefore, consistency between proteomic and metabolomic data may be significantly enhanced if they are generated from physically the same sample, thus benefiting clinical or mechanistic interpretation of the combined data [12,13,14,15]. In addition, single-sample workflows offer several other advantages, such as minimising pre-analytical variability, reduction of sample heterogeneity related to factors such as tumour content, and limiting the required total sample amount.

These benefits have prompted several studies to develop single-sample workflows for combined proteomic, metabolomic and in some cases lipidomic analysis [14,15,16]. Yet, with few exceptions these studies focused on the analysis of one sample type (e.g. cells, tissue or plasma), and thereby the universal applicability to all biological matrices remains unclear [12]. In addition, these approaches largely employ manual sample handling procedures, although it has been noted that several steps are amenable for automation (e.g. cell lysis, protein digestion) to enhance reproducibility [17].

Here we aimed to assess the performance of an one-sample strategy that combines autoSP3 with an optimised approach for metabolomics [6]. In particular, the method entails bi-phasic extraction of metabolites with MTBE [6], resulting in a precipitated protein pellet that was subsequently used as a direct input for the SP3-based proteomic workflow. We show that prior metabolite extraction does not bias subsequent proteome analysis by demonstrating that the proteomic data generated via MTBE-SP3 is highly consistent with the original autoSP3 method. MTBE-SP3 can be applied to different clinically relevant sample types, including formalin-fixed paraffin-embedded (FFPE) tissue, fresh-frozen tissue, plasma, serum, and cells, indicating universal applicability across a broad range of application areas. This is further facilitated by automated and parallelized sonication and protein clean-up by autoSP3 to enhance standardisation of proteo-metabolomic studies. We applied the combined workflow on a lung adenocarcinoma patient cohort and used a novel network approach to determine that consistent metabolic and proteomic alterations were observed between tumour and non-tumour adjacent tissue, independent of the method that was used for proteomics (autoSP3 or MTBE-SP3). Hence, MTBE-SP3 is a powerful and robust method for integrated metabolomic and proteomic studies performed on the same sample that can be employed for universal applications in diverse biological matrices.

Material and methodes

Tissue samples

All FFPE samples were collected from a biopsy punch of archival Ewing sarcoma xenografts derived from human Ewing sarcoma cell lines. Tumour purity and tissue integrity was assessed by a pathologist before sample processing. For the fresh-frozen samples, mouse liver tissue was used. Tissues were cut into small pieces, pooled by sample type, and aliquoted for further processing. One part was directly used for the autoSP3 workflow, while the second part was subjected to biphasic 75EtOH/MTBE extraction followed by autoSP3 (MTBE-SP3). The third part was cryo-pulverised and further processed (Powder-SP3).

Cell culture

Human U2OS osteosarcoma cells were purchased from the American Type Culture Collection (ATCC) and tested for mycoplasma. Cells cultured in Dulbecco’s modified Eagle’s medium (DMEM) high glucose supplemented with 10% fetal bovine serum (FBS), 100 U/ml penicillin, and 100 µg/ml streptomycin at 37 °C with 5% CO2. Cells were harvested using 0.05% Trypsin/EDTA and centrifuged at 400xg for 3 min. Cells were suspended and washed twice with 1x PBS, counted, and aliquoted into 10 Eppendorf tubes (1.6 million cells each). Next, cells were centrifuged for 5 min at 1000xg to remove the excess of PBS. Cell pellets were always kept on ice and subsequently stored at -20 °C until further processing.

Plasma and serum

Plasma and serum samples were generated by pooling EDTA-plasma and serum samples acquired from the German Red Cross. These pooled blood samples were mixed at 4 °C and aliquots of 100 µl generated. All aliquots were snap-frozen in liquid N2 and stored at -80 °C until processing.

Lung adenocarcinoma cohort

Tissue samples were provided by the Lung Biobank Heidelberg, a member of the accredited Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, the Biomaterial Bank Heidelberg, and the Biobank platform of the German Center for Lung Research (DZL). The local ethics committees of the Medical Faculty Heidelberg (S-270/2001 (biobank vote) and S-699/2020 (study vote)) approved the use of specimens and data. All patients (cohort overview see Table 1) included in the study signed an informed consent and the study was performed according to the principles set out in the WMA Declaration of Helsinki.

Tumour and matched distant (> 5 cm) tumour-free lung tissue samples from patients with non-small cell lung cancer (NSCLC), who underwent therapy-naive resection for primary lung cancer at Thoraxklinik at University Hospital Heidelberg, Germany were collected between 2016 and 2017. Tissues were snap-frozen within 30 min after resection and stored at -80 °C until the time of analysis. All diagnoses were made according to the 2015 WHO classification for lung cancer by at least two experienced pathologists.

For further processing, cryosections (10–15 μm each) were prepared for each patient. The first and the last sections in each series were stained with hematoxylin and eosin (H&E) and were reviewed by an experienced lung pathologist to determine the proportions of viable tumour cells, stromal cells, normal lung cell cells, infiltrating lymphocytes and necrotic areas. Only samples with a viable tumour content of ≥ 50% were used for subsequent analyses.

Table 1 Information on lung adenocarcinoma patients

Metabolite extraction by 75EtOH/MTBE

Tissue pieces were pulverised using a Retsch mm400 ball mill without defrosting, and extracted using an optimised protocol, specifically evaluated to produce broad coverage, high concentration and repeated values for tissue samples [6, 18]. The biphasic 75EtOH/MTBE extraction generates two phases (containing polar metabolites and lipids) and additionally a protein pellet that was further analysed here (Fig. 1). Briefly, samples were extracted using 300 µl ice-cold 75% ethanol, vortexed and sonicated for 5 min on ice or in the case of tissue, disrupted using a ball mill at 25 Hz for 30s. The resulting extract was mixed with 750 µl MTBE (tert-Butyl methyl ether) and kept at room temperature on a shaker (850 rpm) for 30 min. Next, 190 µl of H2O were added to separate the phases. The samples were vortexed and kept at 4 °C for 10 min. Afterwards, the samples were centrifuged for 15 min at 13,000 g at 4 °C. After the combination of both phases in the metabolite extraction, all samples were dried using an Eppendorf Concentrator Plus (at room temperature), stored at -80 °C, and dissolved in 60 µl isopropanol (30 µl of 100% isopropanol, followed by 30 µl of 30% isopropanol in water) before the measurement. The remaining protein pellet was kept at -80 °C until further processing using the autoSP3 proteomics workflow.

Standardised targeted metabolic profiling

Tissue extracts were processed following the manufacturer’s protocol of the MxP® Quant 500 kit (Biocrates). 10 µl of the samples or blanks were pipetted on the 96 well-plate-based kit containing calibrators and internal standards using an automated liquid handling station (epMotion 5075, Eppendorf) and subsequently dried under a nitrogen stream using a positive pressure manifold (Waters). Afterwards, 50 µl phenyl isothiocyanate 5% (PITC) was added to each well to derivatize amino acids and biogenic amines. After 1 h incubation time at room temperature, the plate was dried again. To resolve all extracted metabolites, 300 µl ammonium acetate (5 mM, in MeOH) were pipetted to each filter and incubated for 30 min. The extract was eluted into a new 96-well plate using positive pressure. For the LC-MS/MS analyses 150 µl of the extract was diluted with an equal volume of water. Similarly, for the FIA-MS/MS analyses 10 µl extract was diluted with 490 µl of FIA solvent (provided by Biocrates). After dilution, LC-MS/MS and FIA-MS/MS measurements were performed in positive and negative mode. For chromatographic separation an UPLC I-class PLUS (Waters) system was used coupled to a SCIEX QTRAP 6500 + mass spectrometry system in electrospray ionisation (ESI) mode. LC gradient composition and specific 50 × 2.1 mm column are provided by Biocrates. Data was recorded using the Analyst (Version 1.7.2 Sciex) software suite and further processed via MetIDQ software (Oxygen-DB110-3005). All metabolites were identified using compound-specific multiple reaction monitoring (MRM) using optimised MS conditions and isotopically labelled internal standards for selected metabolites as provided by Biocrates. For quantification either a seven-point calibration curve or one-point calibration was used depending on the metabolite class.

Proteomic sample preparation

The sample preparation for proteome profiling was the same procedure for all sample types unless stated otherwise. A single cell suspension of U2OS cell aliquot was used as direct input into the standard autoSP3 method or the biphasic MTBE/EtOH extraction. The latter resulted in a protein pellet which was resuspended in 1% SDS, 100 mM ammonium bicarbonate for further downstream processing using the autoSP3 method. Plasma and serum pools were aliquoted for the sample purpose to provide identical samples for both workflows, autoSP3 and MTBE-SP3. For fresh-frozen tissue, chunks were manually cut-off in the range of 1 to 3 mg as direct input into the standard autoSP3 method (Bulk-SP3). The remaining tissue (~ 20–30 mg) was cryo pulverised and further aliquoted into equal proportions of powder. The powder was then either resuspended in 1% SDS, 100 mM ammonium bicarbonate and processed through autoSP3 (Powder-SP3) or subjected to the 75EtOH/MTBE extraction followed by autoSP3 (MTBE-SP3). Formalin-fixed and paraffin-embedded (FFPE) biopsy pillars (1 mm diameter and 8 mm length) were cut into cubes of roughly 1 mm3. Individual FFPE cubes were used as direct input into the standard autoSP3 method (Bulk-SP3) or a pool of cubes was used for cryo pulverisation. The resulting powder was aliquoted and resuspended in 1% SDS and 100 mM ammonium bicarbonate. The suspension was further processed through autoSP3 (Powder-SP3) or subjected to the 75EtOH/MTBE extraction followed by autoSP3 (MTBE-SP3). In summary, all sample types and formats (bulk, powder, or MTBE-pellet) were resuspended in 1% SDS and 100 mM ammonium bicarbonate and subjected to AFA-ultrasonication in a Covaris LE220plus instrument at the following settings: Duration 300 [seconds], PIP 450, DF 50, CPB 600, AIP 225 and dithering in Y +/- 1 mm, Z +/- 3 mm direction with 20 mm/second. Subsequently, the extracted amount of protein per sample was quantified using a BCA assay (Pierce) except for FFPE samples containing paraffin. FFPE samples were subjected twice to the sonication step interspaced by 2 cycles of heating at 95 °C for 1 h. Finally, all samples were processed through the autoSP3 protocol [9]. For FFPE, additional wash steps (2 × 200 µl 100% Isopropanol) and intermediate heating cycles of 10 min at 50 °C were applied. Upon overnight proteolytic digestion, the resulting peptide samples were ready for injection into the mass spectrometer. Samples were stored at -20 °C until measurement. The lung cancer fresh-frozen tissue cohort was processed via the bulk-SP3 and the (powder) MTBE-SP3 workflow.

Proteomic data acquisition

An equivalent of 200 ng peptides per sample were injected into a timsTOF Pro mass spectrometer (Bruker Daltonics) equipped with an Easy nLC 1200 system (Thermo) using the following method: peptides were separated using the Easy nLC 1200 system fitted with an analytical column (Aurora Series Emitter Column with CSI fitting, C18, 1.6 μm, 75 μm x 25 cm) (Ion Optics). The outlet of the analytical column with a captive spray fitting was directly coupled to a timsTOF Pro (Bruker) mass spectrometer using a captive spray source. Solvent A was ddH2O (Biosolve Chimie), 0.1% (v/v) FA (Biosolve Chimie), and solvent B was 80% ACN in dH2O, 0.1% (v/v) FA. The samples were loaded at a constant pressure. Peptides were eluted via the analytical column at a constant flow rate of 0.25 µL/min at 50 °C followed by 10 min at 0.4 µL/min. During the elution, the percentage of solvent B was increased in a linear fashion from 4 to 17% in 15 min, then from 17 to 25% in 8 min, then from 25 to 35% in 5 min. Finally, the column was washed for 5 min at 100% solvent A. Peptides were introduced into the mass spectrometer via the standard Bruker captive spray source at default settings. The glass capillary was operated at 1600 V and 3 L/minute dry gas at 180 °C. Full scan MS spectra with mass range m/z 100 to 1700 and a 1/k0 range from 0.85 to 1.3 V*s/cm2 with 100 ms ramp time were acquired with a rolling average switched on (10x). The duty cycle was locked at 100%, the ion polarity was set to positive, and the TIMS mode was enabled. The active exclusion window was set to 0.015 m/z, 1/k0 0.015 V*s/cm2. The isolation width was set to mass 700–800 m/z, width 2–3 m/z and the collision energy to 1/k0 0.85–1.3 V*s/ cm2, energy 27–45 eV. The resulting raw files were processed via MaxQuant (version 2.0.3.0) using the default settings unless otherwise stated. The MaxQuant search was performed using the H.sapiens Uniprot database (reviewed only, downloaded on 30.03.2021). Label-free quantification (LFQ) and intensity-based absolute quantification (iBAQ) were applied using the default settings. Matching between runs was switched on. As a part of our standard operating measures, QC (20ng of Pierce™ HeLa Protein Digest Standard, Thermo Fisher Scientific) and blank samples were routinely acquired between analyses of actual samples, to verify general operational performance of the LCMS platform (e.g. contamination, sensitivity, carry-over). The QC and blank runs were evaluated separately.

Data processing for proteomics and metabolomics datasets

Data quality of protein and metabolite datasets was checked by MatrixQCvis (version 1.3.6) [19], utilizing multiple metrics in a concerted manner, including dimension reduction plots, assessment of missing values, intensity distribution, sample distance matrix, trend/drift plot, and ECDF plot. Low-quality samples, displaying a higher number of missing values and elevated median intensity levels compared to other samples, were excluded from further analysis. For the proteomics datasets (peptides for tissue comparison, proteins for tissue comparison, and proteins for lung adenocarcinoma cohort), LFQ intensities were log-transformed. For the lung cancer dataset, proteins with > 50% data completeness (i.e. with a reported intensity in at least 18 out of 35 samples) were retained for downstream analysis. For the metabolite dataset (lung adenocarcinoma cohort), the MetIDQ-derived dataset containing raw values was filtered according to the MetIDQ-derived quality scores such that metabolites that had at least 2/3 of valid values (i.e., 10x limit of detection and/or between the lower/upper limit of quantification).

Differential expression and overlap analysis for tissue dataset (peptides and proteins)

Differentially expression peptides and proteins were determined using limma (version 3.50.1) using lmFit (method = “ls”). Contrasts were specified via makeContrasts and fitted via contrasts.fit. The contrasts were defined as following: autoSP3 - MTBE-SP3 (cells), autoSP3 - MTBE-SP3 (Powder fresh-frozen tissue, contrast 1), autoSP3 - MTBE-SP3 (Bulk fresh-frozen tissue, contrast 2), autoSP3 - MTBE-SP3 (Powder FFPE tissue, contrast 1), autoSP3 - MTBE-SP3 (Bulk FFPE tissue, contrast 2), autoSP3 - MTBE-SP3 (plasma), and autoSP3 - MTBE-SP3 (serum). Moderated t-statistics of differential expression were determined by empirical Bayes moderation of the standard errors towards a global value using the eBayes function (using default values). Corresponding p-values were adjusted using FDR using the Benjamini-Hochberg method. α was set to 0.05.

The overlap between the different contrasts were analysed using functionality from the MatrixQCvis package and visualised via functions from the upSetR package [20]. Coefficient of variation (CV) was calculated via cv from MatrixQCvis [19] using the formula \(\frac{\sqrt{\frac{1}{N}{\sum }_{i=1}^{N}({x}_{i}-\mu {)}^{2}}\cdot 100}{\mu }\), where µ is the mean of x.

Association of differential expressed peptides with physico-chemical properties for tissue dataset (peptides)

To calculate physico-chemical properties (isoelectric point and GRAVY scores of amino acids) we created the R package PhysicoChemicalPropertiesProtein that is available via https://github.com/tnaake/PhysicoChemicalPropertiesProtein. In brief, the ionizable groups of a protein/peptide sequence (N terminal, C terminal, δ-carboxyl group of glutamate, β-carboxyl group of aspartates, thiol group of cysteine, phenol group of tyrosine, imidazole side chains of histidine, ε-ammonium group of lysine, and guanidinium group of arginine) determine the isoelectric point of a given sequence. The pKA values are taken from (Kozlowski, 2016) and the implemented algorithm (bisection algorithm) is as in (Kozlowski, 2016) [21]. To calculate the isoelectric point the method IPC_protein was used. To calculate the GRAVY score, the hydropathy value for each residue is added and divided by the length of the sequence. The hydropathy values are taken from (Kyte & Doolittle, 1982) [22]. To test for association between physico-chemical properties and the extraction method (MTBE-SP3 vs. Bulk-SP3/Powder-SP3, autoSP3), Spearman’s rank correlation coefficient between GRAVY scores/isoelectric point and t-values from differential expression analysis of peptides were determined.

GO analysis for tissue dataset (proteins)

Protein ids were translated from UNIPROT to ENTREZ via AnnotationDbi (version 1.56.2). To this end, the following AnnotationDb objects were used: org.Hs.eg.db for cells, fresh-frozen tissue, FFPE tissue, plasma, and serum and org.Mm.eg.db for fresh-frozen tissue. Proteins that could not be translated to ENTREZ ids were removed from the downstream analysis. Over-representation of gene ontology (GO) terms was tested using the goana function from limma (version 3.50.1) where differential expressed proteins were proteins with adjusted p-values < 0.05 from differential expression analysis and the universe were all proteins present in the set.

Data analysis for adenocarcinoma lung cancer dataset (proteomics)

Protein IDs were translated from UNIPROT to SYMBOL via AnnotationDbi (version 1.56.2). Proteins with no corresponding SYMBOL IDs (n = 90 out of 6326) were removed from downstream analysis. To test for differential expression, a mixed linear model was created via limma (version 3.50.1) using duplicateCorrelation and lmFit. The blocking variable was set to individual. Contrasts were specified via makeContrasts and fitted via contrasts.fit. The contrasts were defined as follows: to test for differences between the autoSP3 and MTBE-SP3 method (TT_autoSP3 - NAT_autoSP3)/2 - (TT_MTBE-SP3) - NAT_MTBE-SP3)/2; to test for differences between the autoSP3 and MTBE-SP3 method in NAT NAT_autoSP3 - NAT_MTBE-SP3; to test for differences between the autoSP3 and MTBE-SP3 method in TT TT_autoSP3 - NAT_MTBE-SP3; to test for differences between TT and NAT in autoSP3 TT_autoSP3 - NAT_autoSP3; to test for differences between TT and NAT in MTBE-SP3 TT_MTBE-SP3 - NAT_MTBE-SP3; TT: tumour tissue, NAT: non-tumorous adjacent tissue. Moderated t-statistics of differential expression were determined by empirical Bayes moderation of the standard errors towards a global value using the eBayes function (using default values). Corresponding p-values were adjusted using FDR using the Benjamini-Hochberg (BH) method. α was set to 0.05. To test for tissue heterogeneity, the dataset was split into autoSP3 and MTBE-SP3 samples. For each subset, we randomly split the subsets into equal partitions. The blocking variable was set to tissue type (encoding information on NAT and TT origin). The contrast was defined as random_group1 - random_group2 to test for differences between the two random groups. Moderated t-statistics of differential expression and adjusted p-values were determined as described above.

GO analysis for adenocarcinoma lung cancer dataset (proteomics)

Protein ids were translated from SYMBOL to ENTREZ via AnnotationDbi (version 1.56.2). To this end, the org.Hs.eg.db AnnotationDb object was used. One protein could not be translated to an ENTREZ id and was removed from the downstream analysis. Over-representation of gene ontology (GO) terms was tested using the goana function from limma (version 3.50.1) where differential expressed proteins were proteins with adjusted p-values < 0.05 from differential expression analysis and the universe were all proteins present in the set.

Data analysis for adenocarcinoma lung cancer dataset (metabolomics)

To test for differential expression, a mixed linear model was created via limma (version 3.50.1) using duplicateCorrelation and lmFit. The blocking variable was set to individual. The contrast was specified via makeContrasts and fitted via contrasts.fit. The contrast was set to TT - NAT to test for differences between tumour tissue (TT) and non-tumorous adjacent tissue (NAT). Moderated t-statistics of differential expression were determined by empirical Bayes moderation of the standard errors towards a global value using the eBayes function (using default values). Corresponding p-values were adjusted using FDR using the BH method. α was set to 0.05.

Integrated analysis of proteomic and metabolomic datasets for adenocarcinoma lung cancer dataset

In order to perform a pathway enrichment analysis with proteomic and metabolomic data, the first step was to connect metabolites to their corresponding enzymes, and embed the metabolites and enzymes in their respective pathways. A ready-to-use reaction network based on recon3D was extracted from the cosmosR package [23]. As a pathway ontology, we used the cancer hallmark pathway collection from MSigDB (https://www.gsea-msigdb.org/gsea/msigdb). The identified metabolites of the metabolomic dataset were associated with their corresponding enzymes according to the reaction network. The hallmarks of the enzymes were transferred to the corresponding metabolite. This resulted in a hallmark pathway ontology containing both genes and metabolites annotated with their corresponding pathway hallmarks. The pathway enrichment analysis was performed with data from 4 patients, which had full overlap of metabolomic data and proteomic data generated with the autoSP3 and MTBE-SP3 pipelines. Using decoupleR [24], we ran pathway enrichment analyses with the run_wmean function of decoupleR, from which the norm_wmean enrichment score was extracted. The enrichment scores represent the number of standard deviations away from the mean of an empirical null distribution of scores for a given hallmark. The enrichment scores were calculated from the data presented in three different configurations: (1) from the proteomic data alone, (2) from the integrated metabolomic and proteomic dataset and (3) from the proteomic and metabolomic data separately, and subsequent averaging of the proteomic and metabolomic enrichment scores. This procedure was performed twice, once with the autoSP3 proteomic dataset, and once with the MTBE-SP3 proteomic dataset. For each dataset, the log2 fold change of protein and metabolic abundance were estimated individually for each of the 4 considered patients between the healthy and tumour samples. The fold changes of each protein and metabolite were then converted to z-scores across the 4 patients. Those z-scores were used as input for the decoupleR run_wmean function to estimate hallmark enrichment scores at the level of each patient. The enrichment scores obtained across the MSigDB hallmarks with the three data configurations were then correlated between the results of the autoSP3 and MTBE-SP3 datasets using Pearson correlation. All the scripts corresponding to this part of the analysis can be found at.

https://github.com/saezlab/prot_met_workflow/blob/main/scripts/create_combined_metab_gene_hallmarks.R, https://github.com/saezlab/prot_met_workflow/blob/main/scripts/SMARTCARE_decoupleR_sample_preparation.R and https://github.com/saezlab/prot_met_workflow/blob/main/scripts/SMARTCARE_decoupleR_pathway_enrichment_analysis.Rmd.

The ocEAn R package [25] was used following the tutorial available at https://github.com/saezlab/ocean/blob/master/vignettes/tutorial_ocEAn.R.

The t-values of the metabolomic differential expression result (see Data analysis for adenocarcinoma lung cancer dataset (metabolomics)) were used as input for ocEAn. ocEAn distance penalty was set to 8, minimum branch length to 1 upstream and 1 downstream, and the ratio of upstream and downstream branch length for enzymes was left unbounded. The scores of reactions annotated as “reverse” were inverted. In order to compare the resulting metabolic imbalance scores of ocEAn with the proteomic data, multiple scores for the same enzyme (participating in different reactions) were averaged. For simplification purpose, we specifically restrained the interpretation of the results to enzymes of the canonical Kreb’s cycle (citrate -> isocitrate -> α-keto-glutarate -> succinyl-CoA -> succinate -> fumarate -> malate -> oxaloacetate -> citrate) with its incoming branch from glycolysis (phospho-enol pyruvate -> pyruvate -> acetyl-CoA) and its outgoing branch to acetyl-carnitine (acetyl-CoA + carnitine -> acetyl-carnitine). The averaged ocEAn metabolic imbalance score was then compared to the t-values from proteomic differential expression analysis (see Data analysis for adenocarcinoma lung cancer dataset (proteomics)), by computing the respective Pearson correlation coefficient of the averaged ocEAn scores with MTBE-SP3 and autoSP3 proteomic t-values, respectively. The script corresponding to this part of the analysis can be found here: https://github.com/saezlab/prot_met_workflow/blob/main/scripts/comparison_proteomic_ocEAn.R

Results

The single-sample workflow yields similar results compared to autoSP3

Here, we aimed to establish a strategy that combines two methods that had been individually optimised for proteome and metabolome analysis, i.e. SP3 and EtOH/MTBE, respectively, for integrated proteo-metabolomic analysis of physically the same sample. In particular, we used an organic solvents-based extraction to release metabolites, leaving a protein-containing pellet that we used as input material for SP3. In more detail, we applied a bi-phasic extraction with MTBE and 75% ethanol (EtOH) that precipitates proteins as a pellet and generates an upper organic phase containing lipids, and a lower aqueous phase containing polar metabolites (Fig. 1A). The liquid extract, containing the upper and lower phase (Fig. 1A), was transferred to a new reaction tube, dried, and resuspended for downstream targeted metabolomics via the Biocrates MxP Quant 500 kit while the pellet, containing the precipitated proteins (Fig. 1A), was used as direct input for the standard autoSP3 workflow, followed by a DDA approach on a timsTOF Pro mass spectrometer for proteome analysis [6, 9].

Fig. 1
figure 1

Overview of experimental setup. (A) Proteins were extracted using two different methods: the established autoSP3 method and the single-sample workflow via 75EtOH/MTBE extraction followed by autoSP3 (MTBE-SP3). (B) The two extraction methods were tested and compared for several biological matrices (FFPE tissue, fresh-frozen tissue, cells, plasma, and serum). For FFPE and fresh-frozen tissue samples, tissue pieces (bulk) were either used as a direct input for autoSP3 or were cryo-pulverised and homogenised (powder). The powder was then used either as a direct input for autoSP3 (Powder-SP3) or subjected to the 75EtOH/MTBE extraction followed by autoSP3 (Powder MTBE-SP3). For serum, plasma and cells, samples were used either as direct input for autoSP3 or the biphasic 75EtOH/MTBE extraction followed by autoSP3 (MTBE-SP3). (C) To test the concordance between biological interpretations, both extraction methods were tested on a lung adenocarcinoma cohort and the resulting proteomes were compared

An important question is if proteome analysis of a sample that has undergone metabolite extraction and protein precipitation results in a bias in proteome composition or coverage, compared to the original autoSP3 approach that has been optimized for proteome analysis. This is relevant since autoSP3 extracts proteins using an SDS-containing buffer and does not include a protein precipitation step [9]. To test this, we performed comparative proteome analyses of samples prepared via MTBE-SP3 and autoSP3. This was evaluated for five different biological matrices: FFPE tissue, fresh-frozen tissue, plasma, serum, and cells (see methods for sample origin and further details on the biological matrices). Although FFPE tissue is not amenable for metabolome analysis, it was included here to assess potential bias of MTBE-based protein extraction in a wider diversity of samples. For each biological matrix we acquired three samples per extraction method (autoSP3, MTBE-SP3; Fig. 1) that were analysed for proteomics. For FFPE and fresh-frozen tissues, proteins were either extracted from bulk as a direct input to autoSP3 (Bulk-SP3) or from cryo-pulverised and homogenised tissue (Powder-SP3), or from the protein pellet obtained after 75EtOH/MTBE metabolite extraction (Powder-MTBE-SP3). Bulk FFPE and fresh-frozen samples were physically distinct tissue pieces, while samples from homogenised samples were taken from the same homogenate. For plasma, serum, and cell samples, proteins were extracted from the bulk (autoSP3) or from the pellet after 75EtOH/MTBE extraction (MTBE-SP3).

In a first analysis, we assessed the recovery of proteins based on the MaxQuant identification to check whether autoSP3 and MTBE-SP3 methods obtain similar sets of proteins (Supplementary Table S3). In terms of recovery of proteins in the two extraction methods, both protocols showed a high overlap of detected proteins (Fig. 2B, see also Fig. 2C for fresh-frozen tissue and Supplementary Figure S1 for data in other sample types). Looking at the shared protein identifiers after MaxQuant identification, the MTBE-SP3 method showed high overlap of detected proteins compared to the Powder-autoSP3/Bulk-autoSP3 in the FFPE (85%) and fresh-frozen samples (89.4%) and high overlap compared to autoSP3 in cells (97.6%), serum (90%) and plasma (91%). This indicates very similar efficiency of the extraction methods, which was also confirmed by the highly comparable LFQ intensity range in the respective proteomic datasets (Fig. 2A).

Fig. 2
figure 2

Intensities and overlap across all sample types. (A) Densities of log-transformed LFQ intensities for the replicates in all sample types. (B) Bar chart illustrating the percentage of shared (common) and unique quantified proteins and peptides in MTBE-SP3 and autoSP3. (C) Joint and disjoint proteins and peptide sets in fresh-frozen samples. While some of the proteins and peptides were uniquely detected in one of the extraction methods (MTBE-SP3, autoSP3), the majority of proteins and peptides were detected in both methods. The numbers (in %) indicate the proportion of the largest set relative to the total number of proteins and peptides.(D) GRAVY and isoelectric point scores for proteins for the sets autoSP3/MTBE-SP3

Next, we evaluated whether MTBE-SP3 yields concordant proteomic results to the established autoSP3 protocol by the following measures: i) the number of differentially expressed proteins between the two extraction methods, ii) the correlation of log-transformed intensities of technical replicates, iii) and the precision of measurements expressed by the coefficient of variation (CV) of the technical replicates. i) We found a variable, but generally low number of proteins that differed in abundance (fresh-frozen tissue, powder: 0%; FFPE tissue, powder: 0%; cells: 1.1%; serum: 4.6%; plasma: 14.4%; FFPE tissue, bulk: 15.1%; fresh-frozen tissue, bulk: 19.3%). Especially the homogenised tissues showed no abundance differences between the two extraction methods, indicating their equivalent performance. In contrast, these numbers were higher for bulk samples, indicating that, as expected, non-homogenized samples exhibit higher variability in their protein content (Supplementary Table S1). ii) For FFPE and fresh-frozen samples, the correlation analysis between technical replicates revealed high CVs between MTBE-SP3 and (homogenised) auto-SP3 (average R2 = 0.80, SD = 0.05 for FFPE, and R2 = 0.91, SD = 0.02 for fresh frozen), and to a lesser extent between MTBE-SP3 and Bulk-SP3 (average R2 = 0.73, SD = 0.06 for FFPE and R2 = 0.82, SD = 0.04 for fresh frozen). For plasma, serum and cells high coefficients were obtained between MTBE-SP3 and auto-SP3 with an average R2 = 0.89, SD = 0.03 for plasma, R2 = 0.92, SD = 0.01 for serum, and R2 = 0.92, SD = 0.01 for cells. (Supplementary Figure S2). iii) Similar to autoSP3, MTBE-SP3 showed low CVs for liquid (plasma, serum), pulverised (fresh-frozen and FFPE tissue), and other matrices (cells, bulk fresh-frozen, and bulk FFPE tissue). While the differences in CV were significantly different between MTBE-SP3 and autoSP3 for most of the sample types (except serum, α < 0.05, no FDR correction), the effect size was generally low in absolute terms (Supplementary Table S1).

Moreover, we devised an R package (PhysicoChemicalPropertiesProtein, available via www.github.com/tnaake/PhysicoChemicalPropertiesProtein) to calculate two important parameters, the isoelectric point and GRAVY (grand average of hydropathy) scores, to scrutinise potential differences in extraction efficiencies regarding physico-chemical properties (Fig. 2D, Supplementary Figure S3). To that end, we correlated the values of the GRAVY/isoelectric point scores for proteins with the t-values from differential expression analysis. The t-values were regarded as a measure of how differently abundant proteins are for a given extraction method. The homogenous samples (FFPE (powder), cells, plasma, and serum), showed no clear association between the GRAVY/isoelectric point scores and t-values (Spearman ρ correlation coefficients close to 0). These small correlation coefficients were not statistically significantly different from 0, indicating that there is no bias in physico-chemical properties of proteins in the tissues FFPE (powder), cells, plasma, and serum. FFPE (bulk) and fresh-frozen tissue (powder and bulk) showed a moderate positive correlation between GRAVY scores and t-values (Supplementary Table S2). This suggests that more hydrophobic proteins were detected in higher abundance in these matrices in autoSP3 compared to the MTBE-SP3 extraction. Accordingly, GO terms related to the membrane system were differentially expressed between autoSP3 and MTBE-SP3 extraction in fresh-frozen tissue (bulk), while FFPE (bulk) showed enrichment of terms related to the cytoskeleton and DNA/RNA-related processes (Supplementary Figure S4). These differences may be explained from the fact that, by necessity, bulk samples were prepared from disparate tissue pieces which may have differed in composition. Therefore, in conclusion, our data show that depending on the tissue type MTBE-SP3 is equivalent to autoSP3 with regard to the proteome coverage that is obtained across a variety of sample types, with no noticeable (e.g. for fresh-frozen tissue, powder; FFPE tissue, powder; or cells) or moderate selectivity (e.g. FFPE tissue, bulk, fresh-frozen tissue, bulk) in protein extraction.

Applying MTBE-SP3 on a lung adenocarcinoma cohort yields similar results compared to autoSP3

To demonstrate the advantages of the MTBE-SP3 workflow, we applied it in a combined proteome and metabolome analysis in a lung adenocarcinoma cohort. The cohort consisted of fresh-frozen samples from ten patients of paired tumorous tissue (TT) and non-tumorous adjacent tissue (NAT). A particular aim was to assess if similar biological conclusions can be reached in the comparison of these tissue regions when using autoSP3 or MTBE-SP3 for proteome analysis, despite minor differences that may exist between these methods. In addition, using MTBE-SP3, we performed broad-scale targeted metabolomics via MxP Quant 500 (Biocrates) (Supplementary Table S5). MatrixQCvis identified two low-quality samples, displaying a higher number of missing values and elevated median intensity levels compared to other samples, which were excluded from further analysis. In total, across all samples we quantified 6326 proteins in a single-shot DDA approach using a timsTOF Pro mass spectrometer (Supplementary Table S4). After filtering the data, proteomic data was available for 3010 protein features with quantitative information in > 50% of the samples, which were included for further analysis. The metabolomic dataset contained concentrations for 405 metabolites after applying the filtering steps based on the MetIDQ-derived quality scores (see Materials & Methods for further details).

To address if autoSP3 and MTBE-SP3 yield similar quantification results we determined if protein abundances differ when using them for protein extraction from either NAT or TT samples. Analysis of 10 vs. 10 NAT tissue pieces processed by autoSP3 and MTBE-SP3, respectively, identified 3010 proteins of which 809 showed a difference in abundance (α < 0.05 after FDR correction; 1113 significantly different features with α < 0.05 prior to FDR correction). For TT samples, 553 out of 3010 proteins showed an abundance difference (948 significantly different features with α < 0.05 prior to FDR correction). To test whether this difference may be explained by tissue heterogeneity, we run linear models for the two extraction methods separately on random, equally split partitions of samples. This analysis did not show any differentially expressed proteins for either autoSP3 or MTBE-SP3 (α < 0.05 after FDR correction, 130 and 235 significantly different features with α < 0.05 prior to FDR correction for MTBE-SP3 and autoSP3, respectively), indicating that tissue heterogeneity is not governing the observed differences. This suggests that slight differences exist between both methods for this type of samples, although fold changes were mostly modest. This is not necessarily problematic as long as no bias is introduced that skews biological differences between samples that are analysed with either method. To test this, we assessed if autoSP3 and MTBE-SP3 yield the same sets of differentially expressed proteins between NAT and TT samples. When looking at the NAT vs. TT differences adjusting for the autoSP3 and MTBE-SP3 methods (i.e., considering the differences between NATautoSP3vs. TTautoSP3 and NATMTBE−SP3vs. TTMTBE−SP3), only two proteins were significantly different (PDLIM2 and PRPF40A, α < 0.05 after FDR correction, Figs. 3A and 244 significantly different features with α < 0.05 prior to FDR correction), indicating the equivalence of both sample preparation methods.

Fig. 3
figure 3

Differential expression analysis for lung adenocarcinoma cohort (proteomics). (A) UpSet plot of significant protein features for contrast autoSP3 vs. MTBE-SP3 (α < 0.05 after FDR correction). The DE analysis was performed on the sets corresponding to autoSP3 vs. MTBE-SP3 for NAT samples, autoSP3 vs. MTBE-SP3 for TT samples, and autoSP3 vs. MTBE-SP3 for the entire sample set. (B) UpSet plot for contrast TT vs. NAT. The DE analysis was performed on the sets derived from autoSP3 and MTBE-SP3 extraction. (C) Beeswarm plot of log fold changes. The sets correspond to the protein sets from panel B: ‘shared autoSP3’ corresponds to the log fold changes of the 1004 proteins in the autoSP3 dataset, ‘shared MTBE-SP3’ to the log fold changes of the 1004 proteins in the MTBE-SP3 dataset, ‘unique autoSP3’ corresponds to the log fold changes of the 382 proteins in the autoSP3 dataset, and ‘unique MTBE-SP3’ corresponds to the log fold changes of the 378 proteins in the MTBE-SP3 dataset. The absolute log fold changes in the shared sets are higher compared to the unique sets (autoSP3: W = 239,420, p-value < 4.2e-13; MTBE-SP3: W = 230,510, p-value < 3.6e-10; Wilcoxon rank sum test with continuity correction, no adjustment for multiple testing). (D) Scatter plot between t-values from MTBE-SP3 and t-values from autoSP3. The Spearman’s rank correlation ρ between the two sets of t-values is 0.83 (p-value < 2.2e-16, no FDR correction). DE: differential expression/differentially expressed. NAT: non-tumorous adjacent tissue. TT: tumorous tissue

We next determined the overlap among the proteins that were differentially expressed between NAT vs. TT, as obtained by autoSP3 and MBTE-SP3. The extraction methods detected 1386 (autoSP3) and 1382 proteins (MTBE-SP3) to be differentially expressed between NAT and TT (α < 0.05 after FDR correction; 1593 and 1568 significantly different features with α < 0.05 prior to FDR correction). Of these, 1004 proteins were shared among autoSP3 and MTBE-SP3, while 382 (autoSP3) and 378 (MTBE-SP3) were uniquely differentially expressed in each method (Fig. 3B). The considerably lower number of statistically differentially expressed proteins above (NAT vs. TT adjusting for the autoSP3 and MTBE-SP3 methods) compared to the high number of unique proteins for each method tested individually can be explained by the further introduction of variation and higher number of levels of fitted cofactors when adjusting for the two extraction methods. The magnitude of the fold-change among the 1004 shared proteins was higher compared to the 382 and 378 proteins that were unique to autoSP3 and MTBE-SP3, respectively (autoSP3: Wilcoxon’s W = 239,420, p-value < 4.2e-13; MTBE-SP3: Wilcoxon’s W = 230,510, p-value < 3.6e-10; Wilcoxon rank sum test with continuity correction, no adjustment for multiple testing, Fig. 3C), indicating that main differences were captured by both methods. The t-values of the contrast NAT vs. TT for autoSP3 and MTBE-SP3 showed a high correlation (Fig. 3D, ρ = 0.83, p-value < 2.2e-16, no FDR correction) indicating that both autoSP3 and MTBE-SP3 detected the same differential expression patterns between NAT vs. TT. Thus, although autoSP3 and MTBE-SP3 show slight differences in sampling proteomes from these tissues, they yield similar results when comparing differences between samples (here NAT vs. TT) adjusting for the extraction method. Taken together, the results indicate that autoSP3 and MTBE-SP3 perform similarly in quantifying proteome differences in complex clinical tissues.

Integration of metabolomic and proteomic data

For the ten patients of the lung adenocarcinoma cohort, we additionally acquired metabolomic information using the Biocrates MxP Quant 500 assay. After performing quality control, the dataset contained information on the levels of 405 metabolites in the NAT and TT samples. Subsequently, we analysed the metabolomics dataset in conjunction with the MTBE-SP3 proteomics dataset, acquired from physically the same aliquot of the samples, and the autoSP3 proteomics dataset, acquired from a different aliquot of the samples (Fig. 1C). To characterise the coherence of the proteomic and metabolomic data at the level of biological processes, we determined if MSigDB hallmark enrichment scores computed from proteomic and metabolomic data were correlated and checked if this correlation differed when proteomic data were obtained by MTBE-SP3 or autoSP3. This showed notably that the hallmark scores were highly correlated (0.83 to 0.94 Pearson’s R) when considering only proteins, and that the inclusion of metabolites did not affect the hallmark scores much (Fig. 4A). Indeed, the number of measured metabolite features that could be mapped to metabolic pathways was not large enough to affect the correlation based on proteins. Nonetheless, we compared the hallmark scores that could be obtained specifically from proteomic or metabolic data, showing an average Pearson correlation of only 0.2 and 0.15 for MTBE-SP3 and autoSP3 proteomic data, respectively (Fig. 4B). This low correlation is consistent with the notion that metabolic abundance usually correlates poorly with the abundance of metabolic enzymes, even in the same pathways, further supporting that metabolomic data allows to generate complementary insights in combination with proteomic data. Furthermore, we observed no significant difference between the correlation coefficients of the MTBE-SP3 and the autoSP3 datasets (Fig. 4B, Student t-test p-value = 0.53, df = 3), indicating that both datasets are similar.

We then looked for more specific connections between enzymes and the overall metabolic deregulation profiles of tumours, and we assessed if they differ between MTBE-SP3 and autoSP3 datasets. The ocEAn package allows to explore connections between metabolites and metabolic enzymes beyond their direct interactions: ocEAn provides weighted interactions for all possible metabolites and enzymes of a reduced functional genome-scale metabolic network, where weights represent relative distances between metabolites and enzymes in the reaction network [25]. ocEAn was used to systematically explore metabolites upstream and downstream of metabolic enzymes, in order to determine which of those showed the most imbalanced metabolic abundance signatures between TT and NAT samples, i.e. enzymes that show very different metabolic abundance profile changes upstream and downstream of their respective reactions (Fig. 4C). Such imbalance can help to pinpoint metabolic bottlenecks in the metabolic reaction network, which can be more easily interpreted functionally than single metabolite abundance changes can. This notably showed that the succinate dehydrogenase (SDH) metabolic enzyme complex (composed of SDHA, SDHB, SDHC and SDHD), which converts succinate to fumarate as part of the Krebs cycle, was the most significantly imbalanced metabolic reaction according to metabolic deregulation in TT samples (Fig. 4D and E). Indeed, Fig. 4E shows that the abundance of proline and succinate, which are consumed upstream of the SDH complex, are also significantly down-regulated (thus located in the lower left quadrant), while the abundance of spermine, propionylcarnitine and acetylcarnitine, which are produced downstream of the SDH complex, is significantly increased (thus located in the upper right quadrant). Interestingly, the MTBE-SP3 and autoSP3 datasets showed a significant up-regulation of the SDHA complex subunit in TT, albeit more significant in the MTBE-SP3 dataset (MTBE-SP3: t-value = 3.80, p-value = 0.001 after FDR correction; autoSP3: t-value = 2.31, p-value = 0.04 after FDR correction). The marginal accumulation of carnitine conjugates, such as propionyl-carnitine and acetyl-carnitine (p-value = 0.06 and 0.27 respectively, after FDR correction, Fig. 4E) in TT, as well as the up-regulation of the SDH complex, can indicate a strong mitochondrial dysfunction, which is well captured by both proteomic datasets in combination with the metabolomic data. Furthermore, both MTBE-SP3 and autoSP3 datasets agreed on a significant down-regulation of the abundance of OGDH in TT compared to NAT (MTBE-SP3: t-value = 4.06, p-value = 0.005, after FDR correction; autoSP3: t-value = 5.7, p-value < 0.0001, after FDR correction), an enzyme of the TCA cycle converting α-keto-glutarate to succinyl-CoA, upstream of the SDHA complex in the TCA cycle (Fig. 4C), confirming a mitochondrial dysfunction. The integrated analysis of the proteomics and metabolomics datasets by ocEAn gives an additional perspective that is not directly recapitulated by a GO analysis of the proteomics dataset: The GO analysis mainly resulted in enriched terms related to RNA processing, gene expression, and translation (Supplementary Fig. 5). In the GO analysis of the autoSP3 dataset, seven terms in the category ‘Biological Process’ were related to mitochondrial processes linked to mitochondrial gene expression or translation, but no terms were linked to mitochondrial metabolism. For the ocEAn results, both datasets also agreed on the up-regulation of the PKM enzyme in TT, which is the final rate-limiting step of glycolysis (MTBE-SP3: t-value = 5.76, p-value < 0.0001, after FDR correction; autoSP3: t-value = 4.63, p-value < 0.0001, after FDR correction). Finally, the ocEAn scores estimated from the metabolomic data showed slightly higher correlation coefficients with the proteomic data of the MTBE-SP3 dataset than the autoSP3 dataset (MTBE-SP3/ocEAn Pearson correlation: r = 0.45, p-value = 0.05; autoSP3/ocEAn Pearson correlation: r = 0.36, p-value = 0.12). Thus, despite some sparse differences between autoSP3 and MTBE-SP3, the two methods performed equally well, leading to the same biological insight in an integrated proteomic and metabolomic analysis of clinical samples (Fig. 4A).

Fig. 4
figure 4

Comparison of proteomic and metabolomic integration between MTBE-SP3 and autoSP3. (A) Pearson correlation coefficients between MTBE-SP3 and autoSP3 (i) proteomic MSigDB hallmark enrichment scores, (ii) integrated proteomic + metabolomic MSigDB hallmark enrichment scores, and (iii) averaged proteomic and metabolomic MSigDB hallmark enrichment scores. Hallmark enrichment scores were calculated using the decoupler package and represent the number of standard deviations away from the mean of an empirical null distribution of scores for a given hallmark. The colour gradient represents the correlation coefficient. (B) Pearson correlation coefficients between MTBE-SP3 proteomic and metabolomic MSigDB hallmark enrichment scores (left column), and Pearson correlation coefficients between autoSP3 proteomic and metabolomic MSigDB hallmark enrichment scores (right column). Hallmark enrichment scores were calculated using the decoupler package and represent the number of standard deviations away from the mean of an empirical null distribution of scores for a given hallmark. (C) Representation of the TCA cycle main enzymes and metabolites in ocEAn. Arrows represent consumptions (reactant to enzyme) and productions (enzymes to product) of metabolites. Colours represent positive (red, over-production and consumption) and negative (blue, under-production and consumption) metabolic ocEAn signature imbalance (signatures are defined as the sets of metabolites that are found upstream and downstream of a given enzyme in the whole metabolic reaction network). (D) Heatmap displaying the t-values of TCA enzyme abundance changes between lung TT and NAT for the autoSP3 and MTBE-SP3 dataset, and ocEAn metabolic imbalance scores estimated from the differential metabolomic abundances between lung tumour and healthy tissue. (E) Scatter plots representing the differential metabolomic abundances upstream (consumption) and downstream (production) of the SDH enzyme complex. The x-axis represents the ocEAn score, while the y-axis represents the corresponding t-value for a given enzyme (TT vs. NAT)

Discussion

In general, the choice of extraction and processing method can highly influence downstream metabolomics and proteomics analysis of samples [18]. Depending on the composition and combination of solvents, the position of phase shifts, e.g., chloroform extraction results in a lower phase containing lipids, an interphase containing proteins and an upper phase containing polar metabolites [26]. Here, we applied a metabolite extraction suitable for broad metabolic profiling that also contains lipids by combining both polar and apolar phases. Following an adjusted biphasic extraction using 75% ethanol as organic solvent and MTBE as a substitute for chloroform, proteins will be precipitated as a pellet while the two resulting phases can be transferred, combined and dried for the metabolic profiling. We expect that a protein pellet instead of a protein interphase will produce a more discrete entity that can be collected to produce more consistent data in a downstream proteomic analysis. Similarly, an adjacent metabolite and lipid phase without an interfering protein-containing interphase can be handled more easily to produce more reliable results. Ultimately, this will allow to automate the metabolite extraction as no protein interphase is present. We previously showed that the usage of 75% EtOH and MTBE as extraction solvents results in high-coverage, robust, and reproducible measurements of the metabolome compared to monophasic and other biphasic extractions [6]. Besides the broad extraction range of polar metabolites and lipids, we here showed that the protein pellet obtained from the 75EtOH/MTBE extraction protocol can be readily integrated in already established down-stream processing steps for proteome profiling. In particular, no systematic bias in proteome coverage was observed when comparing MTBE-SP3 and autoSP3. This can be explained by that the fact that proteins, in contrast to most metabolites, are amphipathic molecules, causing them to aggregate rather than dissolve in an organic solvent.

To assess the performance of MTBE-SP3 workflow in comparison to autoSP3, we extracted bulk and/or cryo-pulverised and homogenised (powder) tissues and quantified their proteomes subsequently. The bulk samples come from physically distinct tissue pieces, while homogenised samples were taken from the same homogenate. We queried the proteomics datasets resulting from the two extraction methods (autoSP3, MTBE-SP3) and analysed the datasets to check for differences introduced by the preceding 75EtOH/MTBE extraction procedure. Both methods showed similarly low CV values for the different biological matrices (Supplementary Table S1), indicating that MTBE-SP3 can be applied to a broad range of samples, and do not exhibit higher variability when measuring technical replicates. This result generally underlines the conclusion that autoSP3 and MTBE-SP3 quantify robustly the proteome of biological samples. We also scrutinised if MTBE-SP3 discriminates differently against physico-chemical properties of proteins looking at GRAVY and isoelectric point scores calculated from amino acid sequences. High similarity of physico-chemical properties indicated that MTBE-SP3 and autoSP3 exhibit very similar extraction characteristics for most of the sample types. For homogenised tissue types (fresh-frozen powder or FFPE powder), serum, plasma and cells MTBE-SP3 showed a low number of significantly abundant protein features, while this was slightly higher for bulk tissue types (bulk fresh-frozen tissue, bulk FFPE tissue, lung cancer). The underlying difference in the number of significantly abundant protein features between bulk and homogenised tissues is possibly caused by the variability in tissue sample content when probing from adjacent tissue neighbourhoods, given that bulk samples represent physically distinct tissue pieces, while homogenised samples were pooled, cryo-pulverised and taken from the same homogenate.

While FFPE samples have advantages in terms of availability and storage, they present major challenges in metabolomic analysis and are not considered ideal for metabolomics analyses compared to fresh frozen tissue. Previous studies have shown via LC-MS or GC-MS based approaches that FFPE samples exhibit differences in metabolite concentrations and distribution compared to fresh frozen tissues which is considered as gold standard in terms of sample quality [27, 28]. Metabolite classes are differentially preserved in FFPE tissues, ranging from good conservation of fatty acids to complete loss or at least significant alteration of compound concentrations of amino acids, nucleotides and peptides [27]. This discrepancy in metabolite preservation can significantly impact the accuracy and reliability of metabolomic analyses. Therefore, we focused the metabolomics analysis on the lung adenocarcinoma cohort, which consists of fresh frozen tissue samples. AutoSP3 and MTBE-SP3 picked up equivalent differences between TT and NAT indicating that MTBE-SP3 assesses to a similar extent the proteome compared to the established autoSP3 method. The integration of proteomic and metabolomic data from NAT and TT using ocEAn, showed that both proteomic datasets are coherent with a tumour tissue displaying mitochondrial dysfunction, notably with deregulations of OGDH, SDH family enzymes and PKM. The SDH up-regulation in combination with the depletion of OGDH can well explain the depletion of succinate observed in tumours compared to healthy tissue, as illustrated by the joint up-regulation of both the abundance and ocEAn score of SDHA in TT vs. NAT. Furthermore, depletion of OGDH has been shown to lead to the stabilisation of HIF1A, which notably controls the expression of PKM [29]. In the case of OGDH, only the protein abundance is down-regulated in TT vs. NAT, while the ocEAn score does not indicate any apparent global metabolic imbalance around OGDH. This can indicate that in the comparison between TT and NAT, OGDH is not acting as a strong metabolic bottleneck as the SDH complex. Thus, the integration of the metabolomic and proteomic datasets paint the picture of a mitochondrial dysfunction in tumour samples with an up-regulation of SDH enzymes and down-regulation of OGDH, leading to the depletion of succinate and up-regulation of the glycolysis metabolic pathway through the up-regulation of the PKM enzyme. This result was not recapitulated in the global interpretation of the proteomics data using GO analysis, which powerfully illustrates the complementarity of mono and multi-omics analyses. Finally, we showed that the ocEAn scores calculated from the metabolomic data had a better correlation with the differential expression analysis results of the proteomic data of the MTBE-SP3 dataset than the autoSP3. This can be explained by the fact that for MTBE-SP3 the proteome and metabolome measurements originate from the same sample, while they come from a different sample for autoSP3.

Taken together, we have devised a new single-sample workflow MTBE-SP3 by combining autoSP3 together with the 75EtOH/MTBE extraction workflow for proteomics and metabolomics sample processing, respectively. The MTBE-SP3 workflow enables the simultaneous processing of a single sample of all biological matrices for both metabolomic and proteomic analyses, thereby bypassing the problem of inter-sample variability and enabling more robust interpretation from the combined analysis of these modalities. As continuation of the autoSP3 workflow, the combined workflow is particularly relevant to perform multi-omics profiling of rare and limited sample amounts. We expect that robust single-sample workflows, such as MTBE-SP3, will advance the combined analysis of multi-omics experiments including proteomics and metabolomics.

Data availability

The raw files, the search output files, as well as the utilised species databases have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository under the following identifier: PXD046035. The analysis scripts are available via https://www.github.com/tnaake/MTBESP3_extraction_method.

Abbreviations

SP3:

Automated single-pot solid-phase-enhanced sample preparation

MTBE:

Methyl-tert-butylether

LC-MS/MS:

Liquid chromatography-tandem mass spectrometry

FFPE:

Formalin-fixed paraffin-embedded

LFQ:

Label-free quantification

DDA:

Data-dependent acquisition

TT:

Tumorous tissue

NAT:

Non-tumorous adjacent tissue

References

  1. Kowalczyk T, Ciborowski M, Kisluk J, Kretowski A, Barbas C. Mass Spectrometry based proteomics and metabolomics in Personalized Oncology. Biochim Biophys Acta - Mol Basis Dis. 2020;1866(5):165690. https://doi.org/10.1016/j.bbadis.2020.165690.

    Article  CAS  PubMed  Google Scholar 

  2. Yoo BC, Kim KH, Woo SM, Myung JK. Clinical multi-omics strategies for the Effective Cancer Management. J Proteom. 2018;188:97–106. https://doi.org/10.1016/j.jprot.2017.08.010.

    Article  CAS  Google Scholar 

  3. Hughes CS, Foehr S, Garfield DA, Furlong EE, Steinmetz LM, Krijgsveld J. Ultrasensitive Proteome Analysis using Paramagnetic Bead Technology. Mol Syst Biol. 2014;10(10):757. https://doi.org/10.15252/msb.20145625.

    Article  CAS  PubMed  Google Scholar 

  4. Cai X, Xue Z, Wu C, Sun R, Qian L, Yue L, Ge W, Yi X, Liu W, Chen C, Gao H, Yu J, Xu L, Zhu Y, Guo T. High-throughput Proteomic Sample Preparation using pressure Cycling Technology. Nat Protoc. 2022;17(10):2307. https://doi.org/10.1038/S41596-022-00727-1.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  5. Varnavides G, Madern M, Anrather D, Hartl N, Reiter W, Hartl M. In search of a Universal Method: a comparative survey of Bottom-Up Proteomics Sample Preparation methods. J Proteome Res. 2022;21(10):2397–411. https://doi.org/10.1021/ACS.JPROTEOME.2C00265.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  6. Gegner HM, Mechtel N, Heidenreich E, Wirth A, Cortizo FG, Bennewitz K, Fleming T, Andresen C, Freichel M, Teleman AA, Kroll J, Hell R, Poschet G. Deep Metabolic Profiling Assessment of tissue extraction protocols for three model organisms. Front Chem. 2022;10:869732. https://doi.org/10.3389/FCHEM.2022.869732.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Erben V, Poschet G, Schrotz-King P, Brenner H. Evaluation of different stool extraction methods for metabolomics measurements in human faecal samples. BMJ Nutr Prev Heal. 2021;4(2):374. https://doi.org/10.1136/BMJNPH-2020-000202.

    Article  Google Scholar 

  8. Sielaff M, Kuharev J, Bohn T, Hahlbrock J, Bopp T, Tenzer S, Distler U. Evaluation of FASP, SP3, and IST protocols for Proteomic Sample Preparation in the low Microgram Range. J Proteome Res. 2017;16(11):4060–72. https://doi.org/10.1021/ACS.JPROTEOME.7B00433.

    Article  CAS  PubMed  Google Scholar 

  9. Müller T, Kalxdorf M, Longuespée R, Kazdal DN, Stenzinger A, Krijgsveld J. Automated Sample Preparation with SP3 for low-input clinical proteomics. Mol Syst Biol. 2020;16(1):e9111. https://doi.org/10.15252/MSB.20199111.

    Article  PubMed  PubMed Central  Google Scholar 

  10. Leutert M, Rodríguez-Mias RA, Fukuda NK, Villén J. R2-P2 Rapid-Robotic Phosphoproteomics enables Multidimensional Cell Signaling studies. Mol Syst Biol. 2019;15(12):e9021. https://doi.org/10.15252/MSB.20199021.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Gegner HM, Naake T, Dugourd A, Müller T, Czernilofsky F, Kliewer G, Jäger E, Helm B, Kunze-Rohrbach N, Klingmüller U, Hopf C, Müller-Tidow C, Dietrich S, Saez-Rodriguez J, Huber W, Hell R, Poschet G, Krijgsveld J. Pre-analytical Processing of plasma and Serum Samples for Combined Proteome and Metabolome Analysis. Front Mol Biosci. 2022;9:961448. https://doi.org/10.3389/FMOLB.2022.961448.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Garikapati V, Colasante C, Baumgart-Vogt E, Spengler B. Sequential lipidomic, metabolomic, and Proteomic Analyses of Serum, liver, and Heart tissue specimens from Peroxisomal Biogenesis Factor 11α knockout mice. Anal Bioanal Chem. 2022;414(6):2235–50. https://doi.org/10.1007/S00216-021-03860-0.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Bayne EF, Simmons AD, Roberts DS, Zhu Y, Aballo TJ, Wancewicz B, Palecek SP, Ge Y. Multiomics Method enabled by sequential metabolomics and proteomics for human pluripotent stem-cell-derived cardiomyocytes. J Proteome Res. 2021;20(10):4646–54. https://doi.org/10.1021/ACS.JPROTEOME.1C00611.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Zougman A, Wilson JP, Roberts LD, Banks RE. Detergent-free simultaneous Sample Preparation Method for Proteomics and Metabolomics. J Proteome Res. 2020;19(7):2838–44. https://doi.org/10.1021/ACS.JPROTEOME.9B00662.

    Article  CAS  PubMed  Google Scholar 

  15. Nakayasu ES, Nicora CD, Sims AC, Burnum-Johnson KE, Kim Y-M, Kyle JE, Matzke MM, Shukla AK, Chu RK, Schepmoes AA, Jacobs JM, Baric RS, Webb-Robertson B-J, Smith RD, Metz TO. MPLEx: a robust and universal protocol for single-sample integrative proteomic, metabolomic, and lipidomic analyses. mSystems. 2016;1(3):43–59. https://doi.org/10.1128/MSYSTEMS.00043-16.

    Article  Google Scholar 

  16. Coman C, Solari FA, Hentschel A, Sickmann A, Zahedi RP, Ahrends R. Simultaneous metabolite, protein, lipid extraction (SIMPLEX): a combinatorial Multimolecular Omics Approach for systems Biology. Mol Cell Proteom. 2016;15(4):1453. https://doi.org/10.1074/MCP.M115.053702.

    Article  CAS  Google Scholar 

  17. Gutierrez DB, Gant-Branum RL, Romer CE, Farrow MA, Allen JL, Dahal N, Nei YW, Codreanu SG, Jordan AT, Palmer LD, Sherrod SD, McLean JA, Skaar EP, Norris JL, Caprioli RM. An Integrated, High-Throughput Strategy for Multiomic Systems Level Analysis. J Proteome Res. 2018;17(10):3396–408. https://doi.org/10.1021/ACS.JPROTEOME.8B00302.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Andresen C, Boch T, Gegner HM, Mechtel N, Narr A, Birgin E, Rasbach E, Rahbari N, Trumpp A, Poschet G, Hübschmann D. Comparison of extraction methods for intracellular metabolomics of human tissues. Front Mol Biosci. 2022;9. https://doi.org/10.3389/FMOLB.2022.932261.

  19. Naake T, Huber W, MatrixQCvis. Bioinformatics. 2022;38(4):1181–2. https://doi.org/10.1093/BIOINFORMATICS/BTAB748. Shiny-Based Interactive Data Quality Exploration for Omics Data.

  20. Conway JR, Lex A, Gehlenborg N, UpSetR. An R Package for the visualization of intersecting sets and their Properties. Bioinformatics. 2017;33(18):2938–40. https://doi.org/10.1093/BIOINFORMATICS/BTX364.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  21. Kozlowski LP. IPC - isoelectric point calculator. Biol Direct. 2016;11(1):1–16. https://doi.org/10.1186/S13062-016-0159-9.

    Article  Google Scholar 

  22. Kyte J, Doolittle RF. A simple method for displaying the Hydropathic Character of a protein. J Mol Biol. 1982;157(1):105–32. https://doi.org/10.1016/0022-2836(82)90515-0.

    Article  CAS  PubMed  Google Scholar 

  23. Dugourd A, Kuppe C, Sciacovelli M, Gjerga E, Gabor A, Emdal KB, Vieira V, Bekker-Jensen DB, Kranz J, Bindels EMJ, Costa ASH, Sousa A, Beltrao P, Rocha M, Olsen JV, Frezza C, Kramann R, Saez‐Rodriguez J. Causal integration of Multi‐Omics Data with prior knowledge to Generate mechanistic hypotheses. Mol Syst Biol. 2021;17(1):9730. https://doi.org/10.15252/MSB.20209730.

    Article  Google Scholar 

  24. Badia-I-Mompel P, Vélez Santiago J, Braunger J, Geiss C, Dimitrov D, Müller-Dott S, Taus P, Dugourd A, Holland CH, Ramirez Flores RO, Saez-Rodriguez J. DecoupleR: ensemble of computational methods to Infer Biological activities from Omics Data. Bioinforma Adv. 2022;2(1). https://doi.org/10.1093/BIOADV/VBAC016.

  25. Sciacovelli M, Dugourd A, Jimenez LV, Yang M, Nikitopoulou E, Costa ASH, Tronci L, Caraffini V, Rodrigues P, Schmidt C, Ryan DG, Young T, Zecchini VR, Rossi SH, Massie C, Lohoff C, Masid M, Hatzimanikatis V, Kuppe C, Von Kriegsheim A, Kramann R, Gnanapragasam V, Warren AY, Stewart GD, Erez A, Vanharanta S, Saez-Rodriguez J, Frezza C. Dynamic Partitioning of Branched-Chain Amino Acids-Derived Nitrogen Supports Renal Cancer Progression. Nat. Commun 2022 131 2022, 13 (1), 1–20. https://doi.org/10.1038/s41467-022-35036-4.

  26. BLIGH EG, DYER WJ. A Rapid Method of Total Lipid Extraction and purification. Can J Biochem Physiol. 1959;37(8):911–7. https://doi.org/10.1139/O59-099.

    Article  CAS  PubMed  Google Scholar 

  27. Cacciatore S, Zadra G, Bango C, Penney KL, Tyekucheva S, Yanes O, Loda M. Metabolic profiling in Formalin-fixed and paraffin-embedded prostate Cancer tissues. Mol Cancer Res. 2017;15(4):439–47. https://doi.org/10.1158/1541-7786.MCR-16-0262.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Wojakowska A, Marczak Ł, Jelonek K, Polanski K, Widlak P, Pietrowska M. An optimized method of metabolite extraction from Formalin-fixed paraffin-embedded tissue for GC/MS analysis. PLoS ONE. 2015;10(9):e0136902. https://doi.org/10.1371/JOURNAL.PONE.0136902.

    Article  PubMed  PubMed Central  Google Scholar 

  29. Burr SP, Costa ASH, Grice GL, Timms RT, Lobb IT, Freisinger P, Dodd RB, Dougan G, Lehner PJ, Frezza C, Nathan JA. Mitochondrial protein lipoylation and the 2-Oxoglutarate dehydrogenase Complex Controls HIF1α Stability in Aerobic conditions. Cell Metab. 2016;24(5):740–52. https://doi.org/10.1016/j.cmet.2016.09.015.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

Download references

Acknowledgements

We would like to thank Andrea Bopp and Martin Fallenbüchel from Lung Biobank Heidelberg as well as Felina Zahnow at the Division of Translational Pediatric Sarcoma Research (DKFZ) for their collection and processing of patient and xenograft specimens, respectively. We would like to thank David Robinson and Ben Warwick for providing the geom_flat_violin function. The study was supported by the German Federal Ministry of Education and Research (BMBF) within the SMART-CARE consortium (fund no. 161L0212) (to T.G.P.G, R.H., J.S.-R., W.H., G.P. and J.K.). The laboratory of T.G.P.G. is supported by the Barbara and Wilfried Mohr Foundation. The authors gratefully acknowledge the data storage service SDS@hd supported by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) and the German Research Foundation (DFG) through grant INST 35/1314-1 FUGG and INST 35/1503-1 FUGG. For the publication fee we acknowledge financial support by Heidelberg University. Figure 1 was created with biorender.com.

Funding

This work was funded by the German Federal Ministry of Education and Research (BMBF) within the SMART-CARE consortium under project number 161L0212 (to T.G.P.G., R.H., J.S.-R., W.H., G.P. and J.K.), and the Barbara and Wilfried Mohr Foundation (to T.G.P.G.). Data storage service SDS@hd was supported by the Ministry of Science, Research and the Arts Baden-Württemberg (MWK) and the German Research Foundation (DFG) through grant INST 35/1314-1 FUGG and INST 35/1503-1 FUGG.

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: HMG, TM, GP, JK; Methodology: HMG, TM, GK; Validation: TN, AD; Investigation: HMG, KA, GK, TM; Formal analysis: HMG, TN, KA, AD, GK, TM, DS, NK-R; Resources: MAS, TGPG, JS-R, WH; Data curation: TN, AD, DS; Supervision: TGPG, RH, JS-R, WH, GP, JK; Writing- original draft: HMG, KA, TM, GP, JK. Writing-review and editing: all authors. Funding acquisition: TGPG, RH, JS-R, WH, GP, JK.

Corresponding authors

Correspondence to Gernot Poschet or Jeroen Krijgsveld.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Gegner, H.M., Naake, T., Aljakouch, K. et al. A single-sample workflow for joint metabolomic and proteomic analysis of clinical specimens. Clin Proteom 21, 49 (2024). https://doi.org/10.1186/s12014-024-09501-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s12014-024-09501-9