Acquisition of experimental samples and data
The discovery cohort involved urine samples from 38 healthy individuals, including children, young and senior groups. Children (n=22) and senior (n=6) RAW data were download from published studies [21, 22], and data from the young group (n=10) were obtained from existing data in our laboratory. Validation cohorts were combined with human urine samples (n=28) and male rat samples (n=21). Human cohort included young (n=8), middle-aged (n=8) and old (n=12) groups. All 29 healthy individuals conformed to periodic physical examination with passing medical tests, and their detailed parameters and information are available in Additional file 1: Table S1. These samples were collected from the clinical laboratory of China-Japan Friendship Hospital during a fasting physical examination. This study’s ethics approval was approved by the China-Japan Friendship Hospital review boards, and each participant signed informed consent. These urine samples were collected and stored in the same environment. We tried to avoid samples being exposed to air and reduced the time the sample stayed at room temperature. Finally, samples were frozen in a −80 ℃ fridge refrigerator, and then processed together.
The rat cohort includeed seven male rats of which the mother and father were born from the same brood of the same parents. We collected data from their three developmental periods including childhood (27 days), entering adulthood (240 days), and reproductive senescence (600 days) [23,24,25]. All rats were bred from birth to the indicated day, with the same fodders, and they lived in the same environment. The animal experiments were approved by the Ethics Review Committee of the Institute of College of Life Science, Beijing Normal University, China. Male rats’ parents were purchased from Beijing Charles River Laboratory. The rats were acclimated to the environment for one week before the experiment. All experimental animals were utilized following the “Guidelines for the Care and Use of Laboratory Animals” issued by the Beijing Office of Laboratory Animal Management (Animal Welfare Assurance Number: ACUC-A02-2015-004). These urine samples were collected in the same environment, frozen in −80℃ fridge refrigerators, and then processed together.
Sample preparation for label-free analysis
The urine samples were reacted with 20 mmol/L dithiothreitol (DTT) at 37 ℃ for 1 h to denature the disulfide bonds in the protein structure, followed by the addition of 55 mmol/L iodoacetamide (IAA) in the dark for 30 min to alkylate the disulfide bond site. Precipitated the supernatant with three-fold volumes of pre-cooled acetone at −20 °C for 2–4 h, and then centrifuged at 12,000 ×g for 30 min at 4 °C to obtain protein precipitate. The pellet was then resuspended in an appropriate amount of protein solubilization solution (8 mol/L urea, 2 mol/L thiourea, 25 mmol/L DTT, and 50 mmol/L Tris). The protein-concentrated solution was measured using Bradford analysis. By using the filter-assisted sample preparation (FASP) method, 100 µg of each sample was digested with trypsin (Trypsin Gold, Mass Spec Grade, Promega, Fitchburg, WI, USA) at a ratio of 50:1. After digestion with trypsin at 37°C for 14 h, 10% formic acid solution was added to the solution to terminate the enzymolysis, and the peptide solution was obtained after centrifugation through a 10 kDa ultrafiltration tube. The concentration of the peptide was determined using the BCA method and passed through a vacuum centrifugal concentrator (Thermo Fisher, USA), and the dried peptides were sealed and stored at −80 °C. Additional file 1: Table S7 shows the urine sample processing methods of the published studies in the literature, and the comparison with this method.
Liquid chromatography and mass spectrometry
Before analysis of urine samples of healthy young individuals, the dried peptide samples should be dissolved in 0.1% FA (formic acid) for liquid chromatography-mass spectrometry analysis, the final concentration should be controlled at 0.1 μg/μL, and each sample should be analyzed with 1 μg peptide. For DDA experiments, iRT (indexed retention time; Biogenesis, Switzerland) calibration peptides were spiked into the sample. Thermo EASY-nLC1200 chromatography system was loaded to Pre-column and the analytical column. Proteome data was collected by the Thermo Orbitrap Fusion Lumos mass spectrometry system (Thermo Fisher Scientific, Bremen, Germany). Liquid chromatography analysis method: pre-column: 75 μm×2 cm, nanoViper C18, 2 μm, 100 Å; analytical column: 50 μm×15 cm, nanoViper C18, 2 μm, 100 Å; injection volume: 10 μL, flow rate: 250 nL/min. The mobile phase configuration is as follows, phase A: 100% mass spectrometric grade water (Fisher Scientific, Spain)/1% formic acid (Fisher Scientific), phase B: 80% acetonitrile (Fisher Scientific, USA)/20% water/1‰ formic acid, 120 min gradient elution: 0 min, 3% phase B; 0–3 min, 8% phase B; 3–93 min, 22% phase B; 93–113 min, 35% phase B; 113–120 min, 90% phase B; mass spectrometry method, ion source: nanoESI, spray voltage: 2.0 kV, capillary temperature: 320 ℃, S-lens RF Level: 30, resolution setting: level 1 (Orbitrap) 120,000 @m/z 200, Level 2 30,000 (Orbitrap) @m/z 200, precursor ion scan range: m/z 350-1350; product ion scan range: from m/z 110, MS1 AGC: 4e5, charge range: 2–7, Ion implantation time: 50 ms, MS2 AGC: 1e5, ion implantation time: 50 ms, ion screening window: 2.0 m/z, fragmentation mode: high energy collision dissociation (HCD), energy: NCE 32, Data-dependent MS/MS : Top 20, dynamic exclusion time: 15s, internal calibration mass: 445.12003. Additional file 1: Table S8 shows the urine sample data collection methods of published studies in the literature and compares them with our method.
Analysis of urinary proteomes with the MaxQuant and Perseus software tool
The validation cohorts included 28 healthy individuals and 7 rats, allowing for robust statistics when performing label-free quantitative comparisons. Each sample was run in technical triplicates for more reliable generation of three RAW files that contained all acquired full MS and MS2 spectra. Base peak chromatograms were inspected visually in Xcalibur Qual Brower version 4.0.27.19 (Thermo Fisher Scientific). RAW files were processed by MaxQuant version 1.6.17.0 (http://www.maxquant.org) using default parameters unless otherwise specified [21, 26,27,28,29]. All RAW files of one species were analyzed together in a single MaxQuant run. Database searches were performed using the Andromeda search engine included with the MaxQuant release [30] with the Uniprot human and rat sequence database (November 27, 2020; 196,211 sequences; April 17, 2021; 36,181 sequences). Precursor mass tolerance was set to 4.5 ppm in the main search, and fragment mass tolerance was set to 20 ppm. Digestion enzyme specificity was set to Trypsin/P with a maximum of two missed cleavages. A minimum peptide length of seven residues was required for identification. Up to five modifications per peptide were allowed; acetylation (protein N-terminal) and oxidation (Met) were set as variable modifications, and carbamidomethyl (Cys) was set as fixed modification. No Andromeda score threshold was set for unmodified peptides. A minimum Andromeda score of 40 was required for modified peptides. Peptide and protein false discovery rates (FDR) were both set to 1% based on a target-decoy reverse database. Proteins that shared all identified peptides were combined into a single protein group. If all identified peptides from one protein were a subset of identified peptides from another protein, these proteins were combined into that group. Peptides that matched multiple protein groups (“razor” peptides) were assigned to the protein group with the most unique peptides. “Match between run” based on accurate m/z and retention time was enabled with a 0.7 min match time window and 20 min alignment time window. Label-free quantitation (LFQ) was performed using the MaxLFQ algorithm built into MaxQuant [31]. Peaks were detected in Full MS, and a three-dimensional peak was constructed as a function of peak centroid m/z (7.5 ppm threshold) and peak area over time. Following de-isotoping, peptide intensities were determined by extracted ion chromatograms based on the peak area at the retention time with the maximum peak height. And peptide intensities were normalized to minimize overall proteome differences based on the assumption that most peptides do not change in intensity between samples. Protein LFQ intensity was calculated from the median of pairwise intensity ratios of peptides identified in two or more samples and adjusted to the cumulative intensity across samples. Quantification was performed using razor and unique peptides, including those modified by acetylation (protein N-terminal) and oxidation (Met). A minimum peptide ratio of one was required for protein intensity normalization, and “Fast LFQ” was enabled. Only proteins that were quantified by at least two unique peptides were used for analysis. RAW data of mass spectrometry are available in iProX Datasets under the Project ID: IPX0002313003 (https://www.iprox.org/page/HMV006.html).
Data processing was performed Perseus version 1.6.14.0 (http://www.perseus-framework.org) [32, 33]. Contaminants, reverse, and protein groups identified by a single peptide were filtered from the dataset. FDR was calculated as the percentage of reverse database matches out of total forward and reverse matches. Protein group LFQ intensities were log2 transformed to reduce the effect of outliers. Protein groups missing LFQ values were assigned values using imputation. Missing values were assumed to be biased towards low abundance proteins that were below the MS detection limit, referred to as “missing not at random”, an assumption that is frequently made in proteomics studies [27, 33, 34]. Imputation was performed separately for each group from a distribution with a width of 0.3 and a downshift of 1.8.
Open search to uncover and identify global modifications and refined search for verification in pFind software tool
RAW data files were searched against Homo sapiens Uniprot canonical database. Database searches were performed with pFind studio (Version 3.1.5) [17], using default parameters unless otherwise specified [35,36,37]. Precursor ion mass and fragmentation tolerance were set as 10 ppm and 20 ppm, respectively. The maximum number of modifications allowed per peptide was three, as was the maximum number of missed cleavages allowed. The minimum peptide length was set to six amino acids. To discover global modifications, the Open Search was selected (Additional file 1: Fig. S6 shows Open-search and refined-search detailed parameters). For protein-level analysis, mass shifts of +15.9949 Da (methionine oxidation) and +28.0313 Da (dimethylation, Light, N-term/K) were searched as variable modifications; mass shifts of +57.0214 Da (Carbamidomethyl cysteine) was searched as fixed modifications. The FDRs were estimated by the program from the number and quality of peptide-spectrum-match (PSM) to the decoy database. The FDRs at spectrum, peptide, and protein levels were < 1%, and the Q-value at the protein level was less than 1%. Data are analyzed using both forward and reverse database retrieval strategies.
Refined-search did not select Open Search option. And for fixed modifications, mass shifts of +28.0313 Da (dimethylation, Light, N-term/K), +57.0214 Da (Carbamidomethyl cysteine), +15.9949 Da (methionine oxidation), −17.0265 Da (Pyro-glu from Q) and +114.0429 Da (cystine glycineglycine/double carbamidomethylation) were searched, which are top five in modifications proportion rank of open-search results. Meanwhile, selecting them as fixed modifications in the next refined-search can reduce the false-positive rate of validation results as a quantity control. All oxidations were searched as variable modifications, including mass shifts of +15.9949 Da (oxidations), +31.9898Da (Dioxidations), and +47.9847 Da (Trioxidations).
Quantification of heavy to light ratios (RH/L) was performed using pQuant as previously described [36, 38], which directly uses the RAW files as the input. pQuant calculates RH/L values based on each identified MS scan with a 15 ppm-level m/z tolerance window and assigns an interference score (Int. Score, also known as confidence score) to each value from zero to one. In principle, the lower the calculated Int. Score, the less co-elution interference signal was observed in the extracted ion chromatograms. In this regard, the median values of oxidatively modified peptide ratios with σ less than or equal to 0.5 were considered to calculate site-level ratios. For each independent experiment, only proteins identified by two or more distinct peptides with quantified PSM RH/L values were retained for further analysis. In this regard, the RH/L value of each identified protein was calculated as the median of all corresponding PSM RH/L values. For site-level analysis, a differential modification of 15.9949 Da on probe-derived modification was used for quantifying PSM RH/L values. For each independent experiment, The RH/L value of each oxidatively modified site was calculated as the median of all corresponding PSM RH/L values.
Statistical analysis
We required the proteins to be removed if the CVs of the protein intensity in QC samples were more than 30 %. Log ratios were calculated as the difference in log2 LFQ intensity averages between different age groups. Two-tailed, unpaired, heteroscedastic Student’s t-test calculations were used in statistical tests as histograms of LFQ intensities showed that all datasets approximated normal distributions. P-value<0.05 was considered statistically significant. Base 2-fold-change values for ratios <1 are represented as negative reciprocals of the ratios.
For quantify modifications, after open-search and refined-search, we obtain “pd.all_result” document, which obtains the number of identified peptides and the protein groups to which these modifications belong. As well-known, multiple kinds of modifications can occur on a single peptide, and the same kind of modifications can occur on different sites of the same peptide (Additional file 1: Fig. S7a). However, information on the total number of sites where a modification occurs or on how many types of peptides it occurs on is not provided directly. Therefore, we wrote a program to extract this information from the “pd.all_result” document. The program codes are available (Supplementary site calculate method code). Most of the significantly differential modifications in “Species” are basically in line with “Sites” (Additional file 1: Fig. S7b). Comparisons of modifications proportions (one modification’s sites/total modifications sites in this sample) between different groups were conducted using two-sided paired t-tests. The differential modifications were selected according to a P-value < 0.05 and a fold change >2 or <0.5.
Meanwhile, we also used a mini-program (designed and developed by our lab) for random grouping testing to eliminate interruptions between groups. Briefly, the program is based on “permutation and combination”. For example, for a bunch of m pieces of data divided into n groups, there are \({C}_{m}^{n}\) kinds of combinations, our program will calculate the differential modifications or proteins probability of occurring in these random combinations, and the amount if they are still differential (if they satisfy the above conditions).
Bioinformatics analysis
Unsupervised clustering and hierarchical clustering analysis were performed using the ‘Wu Kong’ platform (https://www.omicsolution.org/wkomics/main/) [39]. The distance algorithms of both rows and columns were performed by correlation linkage algorithm. ROC curves and PCA analysis were performed using Hiplot platform (https://hiplot.com.cn/basic/roc). The differential proteins were analyzed by Gene Ontology (GO) based on the biological process, cellular component, molecular function, and KEGG using DAVID (https://david.ncifcrf.gov/) [40]. Canonical pathways and diseases or functions annotation for significantly differentially expressed proteins were generated by g:Profiler toolkit[41] (https://biit.cs.ut.ee/gprofiler/gost), functional profiling of each protein set was conducted using the GO [42], KEGG [43] and WikiPathways databases [44], the full set of GO molecular function, GO biological process, KEGG and WikiPathway terms are provided in Additional Figs. Protein interaction network analysis was performed using STRING (https://string-db.org/cgi/input.pl) based on the STRING database [45]. T-test, analysis of variance, Mann-Whitney U-test, and Kruskal-Wallis test were used to evaluate the statistical comparison between groups. Statistical analysis was performed using GraphPad Prism v9.0. p-value <0.05 was considered significant.
Based on the huge amount of calculation required to limit the algorithm search, this article by virtue of the supercomputing service provided by the platform of the Supercomputing Center of Beijing Normal University. The hardware information was as follows: Windows Computing Platform (8480-4 Windows VM1), configure 4 Xeon Platinum 8160 processors, each processor has 24 cores, each CPU main frequency is 2.1GHz, memory speed is 2666 MHz, and each node is configured with 1TB DDR4 2666 ECC REG memory (32*32GB). Total hard disk capacity ≥ 9 TB, virtual machine configuration: Windows 10, number of cores: 48, memory: 512 GB, total hard disk capacity ≥ 2 TB (expandable).