Glycated lysine-141 in haptoglobin improves the diagnostic accuracy for type 2 diabetes mellitus in combination with glycated hemoglobin HbA1c and fasting plasma glucose

Background Recent epidemiological studies indicate that only 30–50% of undiagnosed type 2 diabetes mellitus (T2DM) patients are identified using glycated hemoglobin (HbA1c) and elevated fasting plasma glucose (FPG) levels. Thus, novel biomarkers for early diagnosis and prognosis are urgently needed for providing early and personalized treatment. Methods Here, we studied the glycation degrees of 27 glycation sites representing nine plasma proteins in 48 newly diagnosed male T2DM patients and 48 non-diabetic men matched for age (range 35–65 years). Samples were digested with trypsin and enriched for glycated peptides using boronic acid affinity chromatography. Quantification relied on mass spectrometry (multiple reaction monitoring) using isotope-labelled peptides as internal standard. Results The combination of glycated lysine-141 of haptoglobin (HP K141) and HbA1c provided a sensitivity of 94%, a specificity of 98%, and an accuracy of 96% to identify T2DM. A set of 15 features considering three glycation sites in human serum albumin, HP K141, and 11 routine laboratory measures of T2DM, metabolic syndrome, obesity, inflammation, and insulin resistance provided a sensitivity of 98%, a specificity of 100%, and an accuracy of 99% for newly diagnosed T2DM patients. Conclusions Our studies demonstrated the great potential of glycation sites in plasma proteins providing an additional diagnostic tool for T2DM and elucidating that the combination of these sites with HbA1c and FPG could improve the diagnosis of T2DM. Electronic supplementary material The online version of this article (doi:10.1186/s12014-017-9145-1) contains supplementary material, which is available to authorized users.


Background
Diabetes mellitus (DM) refers to a group of metabolic disorders characterized by elevated blood glucose concentrations. The majority (90-95%) of DM patients have type 2 DM (T2DM), which is characterized by peripheral insulin resistance and an inability of pancreatic beta cells to compensate for that by increasing insulin secretion [1].

Open Access
Clinical Proteomics *Correspondence: bioanaly@rz.uni-leipzig.de 2 Center for Biotechnology and Biomedicine (BBZ), Universität Leipzig, Deutscher Platz 5, 04103 Leipzig, Germany Full list of author information is available at the end of the article random plasma glucose of ≥200 mg/dL (≥11.1 mmol/L) as equally valid diagnostic criteria for type 2 diabetes [1]. However, epidemiological studies revealed that compared to OGTT glucose values the use of either HbA 1c or FPG only identifies ~30-50% of previously undiagnosed T2DM patients [4][5][6][7][8]. Moreover, categories of increased risk for diabetes (prediabetes) include a HbA 1c level of 5.7-6.4% or a FPG level between 100 and 125 mg/dL (5.6-6.9 mmol/L) or 2-h plasma glucose level between 140 and 199 mg/dL (7.8-11.0 mmol/L) in the OGTT [1]. For the distinction of patients with prediabetes, HbA 1c values have also been shown to be less sensitive compared to FPG and 2 h OGTT glucose [7]. HbA 1c also plays a critical role in the management of diabetic patients, as it reflects the average plasma glucose level for the preceding 3-month period and correlates well with micro-and macrovascular complications [4][5][6]. FPG is the preferred diagnostic parameter, because it is simple, inexpensive, relatively risk-free, and correlates with diabetic complications like retinopathy [7]. However, even using longitudinally measured HbA 1c levels, the prediction or retrospective attribution of individual T2DM associated complications or mortality are not possible with this parameter of chronic hyperglycemia [8][9][10][11][12]. Therefore, there is an unmet need to better predict the individual chronic hyperglycemia-related diabetes outcomes.
Glycated proteins, such as hemoglobin in erythrocytes and serum albumin as major plasma protein, are recognized as markers of hyperglycemia due to the high sensitivity towards even slightly elevated plasma glucose levels [13,14]. However, only methods determining global protein glycation degrees have been established, whereas recent evidence suggests specific glycation sites in plasma proteins as potential biomarkers for early diagnoses of DM [14][15][16][17]. In contrast to the long half-life time of hemoglobin in erythrocytes, plasma proteins vary in halflife times from hours to several weeks, which might allow selecting a small set of protein glycation sites resembling more closely short to medium term fluctuations of plasma glucose levels.
Here we quantified 27 glycation sites in nine plasma proteins after tryptic digestion by mass spectrometry and evaluated their diagnostic value alone and in combination with current WHO criteria of HbA 1c and FPG levels. Selection of glycation sites relied on a list of 52 candidates we previously identified and studied [17], by judging each for the biomarker potential and ease of peptide synthesis. A single-blind study using plasma samples from 48 newly diagnosed T2DM patients and 48 non-diabetic individuals clearly indicated different glycation degrees characteristic of each glycation site with glycated Lys141 of haptoglobin providing the best sensitivities (~94 and ~78%) and specificity (~98%) when combined with HbA 1c and FPG levels, respectively.

Blood samples
Blood samples were obtained from 48 newly diagnosed male T2DM patients and 48 non-diabetic men matched for age (range 35-65 years) in the context of a study of parameters of insulin resistance (Additional file 1: Table  S1). Anthropometric and laboratory chemistry parameters were measured or calculated as previously described [18,19]. The study was approved by the Ethics Committee of Universität Leipzig (approval no: 159-12-21052012), and performed in accordance to the declaration of Helsinki. All subjects gave written informed consent before taking part in this study. In our study, we defined individuals with T2DM according to the criteria of the American Diabetes Association [1]: • A HbA 1c level of 6.5% or higher, or • A fasting plasma glucose (FPG) level of 126 mg/dL (7 mmol/L) or higher (fasting was defined as no caloric intake for at least 8 h), or • A 2-h plasma glucose level of 200 mg/dL (11.1 mmol/L) or higher during a 75-g oral glucose tolerance test (OGTT), or • A random plasma glucose of 200 mg/dL (11.1 mmol/L) or higher in a patient with classic symptoms of hyperglycemia (i.e., polyuria, polydipsia, polyphagia, weight loss) or hyperglycemic crisis The group of T2DM patients has been further categorized into those with HbA 1c levels patients below (n = 23) and equal or higher than 6.5% (48 mmol/mol) (n = 25). For those patients with HbA 1c < 6.5% (48 mmol/ mol), the diagnosis of T2DM has been established on the basis of repeated measurements of fasting plasma glucose (>7.0 mmol/L) or 2 h oral glucose tolerance test glucose concentrations >11.1 mmol/L [1]. Since patients were newly diagnosed, none of the individuals in the T2DM receive anti-hyperglycemic medication. All men had a body mass index (BMI) larger than 25.0 kg/m 2 and were grouped in overweight (BMI 25-30 kg/m 2 ; 15 without and 10 with T2DM) and obese individuals (BMI ≥ 30 kg/ m 2 ). EDTA blood samples were collected between 8 a.m. and 9 a.m. after a 12-h fasting period, centrifuged (500×g, 5 min) and analyzed within 1 h after blood drawing for routine laboratory parameters or stored at −80 °C after the cell debris was removed by filtration (Rotilabo ® syringe filter) for the analyses of the glycation sites. Plasma insulin and proinsulin were measured with an enzyme immunometric assay for the IMMULITE automated analyzer (Diagnostic Products Corporation, Los Angeles, CA, USA). Serum high-sensitive CRP (C-reactive protein) was measured by immunonephelometry (Dade-Behring, Milan, Italy). HbA 1c , plasma glucose, serum total-high-density lipoprotein (HDL)-, low-density lipoprotein (LDL)-cholesterol, triglycerides, and free fatty acids were measured as previously described [18].

Peptide quantification
Glycation sites previously identified in plasma samples of patients with diabetes [17] were quantified by electrospray ionization mass spectrometry (ESI-MS) on a QTRAP 4000 (AB Sciex, Darmstadt, Germany) coupled on-line to reversed-phase high-performance liquid chromatography (RP-HPLC) using timed multiple reaction monitoring (MRM) ( Table 1 and Additional file 1: Table  S2). Briefly, small molecules and peptides were removed from plasma by ultrafiltration (5 kDa cut-off ). The concentrated sample was digested with trypsin (37 °C, 18 h, 5% w/w), spiked with a concentration-balanced mixture of 13 C, 15 N-labelled glycated peptides as internal standard, enriched for glycated peptides by boronic acid affinity chromatography (BAC), and desalted by solid phase extraction (SPE) using optimized protocols [15,[20][21][22]. The internal standard was added after tryptic digestion to provide the highest accuracy by compensating variations, such as nonspecific degradation, during sample preparation [23], especially as trypsin could cleave some standard peptides at Lys/Arg-Pro-motifs or glycated residues.

Table 1 Glycated peptides of different plasma proteins quantified in tryptic digests of type 2 diabetes and control plasma samples
The timed multiple reaction monitoring relied on the retention time of each peptide in RP-HPLC and a specific precursor/fragment ion pair (Q1/Q3 mass range) t R retention time, C, M, and K carbamidomethylated cysteine, methionine sulfoxide, and fructosamine lysine, HSA human serum albumin, IGKC Ig kappa chain c region, FGB fibrinogen beta chain, A2M alpha-2-macroglobulin, TF serotransferrin, IGLC Ig lambda chain C region, APOA1 apolipoprotein A-I precursor, HP haptoglobin, FGA fibrinogen alpha chain # Sequence Protein symbol (Accession number, glycation site) ESI-MS. Eluents A and B were water and acetonitrile, respectively, containing both formic acid (0.1%, v/v). Elution was achieved by a linear gradients starting 3 min after sample injection from 3 to 10% eluent B within 1 min, to 20% eluent B within 10 min and to 95% eluent B in 7 min. The flow rate was 0.3 mL/min and the column temperature was set to 60 °C. Quantification relied on timed MRM using specific transitions of each targeted peptide and isotope-labelled internal peptide standards synthesized on solid phase in-house. Quantification was performed by integrating individual peaks in extracted ion chromatograms (XICs) using Analyst 1.6 software (AB Sciex) relative to the coeluting isotope-labelled peptides.

Statistics and bioinformatics
The samples from individuals with or without T2DM were evaluated by different statistical tests (Kolmogorow-Smirnow, Mann-Whitney, and t test) and calculation of Spearman rank correlation coefficients using Prism 6 (GraphPad software; La Jolla, USA). Receiver operating characteristic (ROC) analysis and screening for variable combination relied on the Excel-add-in Multibase 2015 (Numerical Dynamics) and Prism 6 software, respectively. The 48 diabetic and 48 samples were classified by a decision tree algorithm using HbA 1c in combination with each glycated peptide (Additional file 1: Table S6). The same technique was also applied to FPG (Additional file 1: Table S7). The decision tree algorithm was implemented using Scikit-Learn [24]. Accuracies were evaluated using nested tenfold cross validation [25]. To find the best feature set for classification, a support vector machine-recursive feature elimination (SVM-RFE) method [26] was applied on all glycated peptides and clinical parameters, including HbA 1c , FPG, BMI, etc. Feature normalization and missing value imputation were performed using WEKA toolkit [27]. The support vector machine was implemented using Scikit-Learn [24]. Accuracies and area under the curve (AUC) values were evaluated using nested tenfold cross validation [25]. Hierarchical clustering was performed on 48 diabetic samples using the "hclust" function in R software version 3.2.1 [28]. To find the subclasses in diabetic samples, the expectation-maximization algorithm in Scikit-Learn [24] was applied. The clustering stability score [29] and elbow criterion [30] were used to find the optimal number of subclasses.

Results
The timed MRM method optimized for quantification of the targeted 27 glycated peptides (Table 1) obtained by tryptic digestion from plasma provided limits of detection (LODs) and quantification (LOQs) in the low to high nanomolar range (Additional file 1: Table S3). Intraand interday precisions showed typically coefficients of variation (CVs) below 20% (Additional file 1: Table S3). The quantities of all 27 glycated peptides normalized to the total protein content of each plasma sample were significantly higher in T2DM than in the control group (P < 0.05, Additional file 1: Figure S1). However, the normalized quantities of all glycated peptides showed a notable overlap. Interestingly, diabetic groups subdivided by an HbA 1c threshold of 6.5% (48 mmol/mol) showed very similar average glycation degrees for all peptides, indicating that the glycation degree of hemoglobin in erythrocytes showed only a weak correlation with the glycation levels of serum proteins providing a rationale that they may have different diagnostic and prognostic values.
As the diagnostic accuracies of HbA 1c (76% for cut-off 6.5% (48 mmol/mol), 86% for cut-off 6.0%), FPG (70% for cut-off 7.00 mmol/L, 78% for cut-off 5.69 mmol/L), and the evaluated glycated peptides (60 to 73%) were insufficient, the data sets were screened by decision tree algorithm for variable combinations of HbA 1c or FPG with any glycated peptide for optimal sensitivity, specificity, and accuracy using different cut-off points. HbA 1c in combination with glycated peptides yielded mostly sensitivities from 75 to 81%, specificities from 96 to 100%, and accuracies from 87 to 90% (Additional file 1: Table  S6). Most interestingly, analyses revealed a haptoglobin peptide glycated at Lys141 (HP K141; sequence No. 25) which provided in combination with HbA 1c a sensitivity of 94%, a specificity of 98%, and an accuracy of 96% for cut-off points 6.0% (42 mmol/mol, HbA 1c ) and 30 fmol/mg (HP K141) (Fig. 1a). When HP K141 was combined with FPG (6.0 mmol/L), sensitivity to detect T2DM was 78%, specificity 98%, and accuracy 88% (Fig. 1b), whereas the other glycated peptides typically provided slightly lower accuracies (Additional file 1: Table S7). Since the decision tree algorithm did not deliver all cut-off values for all combinations, manual confirmation was carried out. Combining manually HbA 1c or FPG with glycated peptides provided slightly enhanced diagnostic accuracies for all combinations and could confirm all cut points provided by the algorithm with only small variations (Additional file 1: Tables S8 and S9). Noteworthy, the combination of HP K141 and FPG (6.0 mmol/L) was improved to a sensitivity of 83%, a specificity of 98%, and an accuracy of 91%.
The SVM-RFE method was applied to find a set of diagnostic parameters and peptide amounts for maximizing the classification of T2DM patients and controls. It revealed a set of 15 features providing a sensitivity of 98%, a specificity of 100%, and an accuracy of 99% (Fig. 2).
The cluster analysis of the 48 T2DM plasma samples was applied to an expectation-maximation (EM) algorithm considering all glycated peptides and clinical parameters available (27 glycated peptides + 38 clinical parameters). The result showed two or maximal three clusters (Fig. 3). Glycated peptide levels were distinct among different groups, however, clinical parameters, including HbA 1c and FPG, were similar. This suggests that glycated peptides can be more useful to detect subtle differences among diabetic patients. A principle    (Fig. 3). A cluster stability test considering the elbow criterion identified three clusters as optimal number. This was further confirmed by hierarchical clustering that also suggested three groups of T2DM patients (Additional file 1: Figure S2).

Discussion
Protein glycation as diagnostic criterion has been addressed in many studies for decades [13][14][15][16][17]. Surprisingly, only glycation of intracellular hemoglobin is currently considered as biomarker to monitor patients assuming that HbA 1c levels closely reflect average glucose blood levels over the previous 3 months. Plasma proteins have been mostly analyzed for their global glycation degrees [especially human serum albumin (HSA)], which can be attributed to the lack of specific immunoassays, i.e. antibodies recognizing individual glycation sites. As an example, measurement of fructosamine determines the fraction of total serum proteins (mainly serum albumin) that have undergone glycation. Because albumin has a half-life of approximately 3 weeks, the plasma fructosamine concentration reflects relatively recent changes in blood glucose and is therefore not commonly used to monitor diabetes treatment [31].
Additionally, mass spectrometry has been typically applied for mapping glycation sites in serum proteins, whereas quantitative studies on distinct glycation sites in larger patient cohorts are missing. Besides the analytical challenges, this is probably attributed to the general assumption that glycation as a non-enzymatic reaction depends only on the glucose concentration and thus the glycation levels of individual sites will change at similar degrees and thus can be judged from global glycation degrees. Here, we could show that the glycation levels of 27 sites in nine plasma proteins provide significantly different sensitivities, specificities and accuracies for classifying T2DM patients and controls. Most importantly, these glycation degrees correlated only slightly to HbA 1c and other clinical parameters like FPG. Additionally, sensitivities of the newly identified glycation sites were mostly better than for HbA 1c , while specificities were lower demonstrating that T2DM diagnosis might benefit from combining currently applied clinical parameters with plasma protein glycation sites analyzed in the context of this study. Despite studying 27 glycation sites, it was our aim to identify the diagnostically most relevant modification to keep the number of biomarkers and thus the costs of the envisaged diagnostic tools low. Different statistical analyses identified glycation of Lys141 in haptoglobin as the best parameter in combination with HbA 1c and FPG. The obtained sensitivity (94%), specificity (98%), and accuracy (96%) of HP K141 in combination with HbA 1c exceeded the corresponding values of HbA 1c significantly. This suggests that in addition to HbA 1c measurements, glycation of Lys141 in haptoglobin could be used as a diagnostic tool in patients with T2DM and those with a high risk to develop T2DM (Fig. 1a). Further studies are necessary to test whether combination of these two parameters of chronic hyperglycemia may better predict individual risk for the development of diabetes complications. However, an earlier diagnosis of T2DM may be important for individual patients' outcomes, as it has been demonstrated that a "bad glycemic memory" may contribute to a higher risk of diabetes complications even after periods of well-controlled hyperglycemia [32]. The negative correlation between HP K141 and both BMI and C-peptide appears interesting, as it might indicate higher glycation levels in T2DM patients are caused by both diabetes-specific, but also obesity-associated metabolic alterations, which subsequently may cause chronic hyperglycemia. Negative correlations between glycation levels and BMI suggest that adipose tissue in patients with obesity may have a higher capacity to take up excess glucose before it contributes to glycation of circulating proteins. However, this hypothesis has to be formally proven by glucose distribution studies in lean versus obese animal models under hyperglycemic conditions. Although our initial intention was to identify a single biomarker, statistical evaluation of the data revealed a promising features selection matrix using three glycation sites in HSA (K93, K262, and K414) besides HP K141 representing short-to-medium term glucose fluctuations in combination with twelve routine parameters typically used to characterize T2DM (FPG, HbA 1c , fasting insulin), metabolic syndrome (triglycerides and blood pressure), obesity (BMI, waist circumference, waist-to-hip ratio), inflammation (leukocytes, C-reactive protein), and insulin resistance (HOMA-IR), and age [33]. This feature set provided an extremely high accuracy of 99% for the sample cohort providing the best diagnostic value that has been reported to the best of our knowledge. For example, a feature set reported in 2009 relied on 160 individuals (Danish Inter99 prospective study), who progressed from initially non-diabetic to diabetes during the following five years. The predictive values of 58 biomarkers (selected from presumed diabetes-associated pathways) together with six routine clinical parameters were evaluated by statistical learning methods [34]. The best features, i.e. six biomarkers (adiponectin, C-reactive protein, ferritin, interleukin-2 receptor A, glucose, and insulin), were used in a PreDx diabetes risk score (DRS) model providing an AUC of 0.76 that increased to 0.78 when family history, age, BMI, and waist circumference were added. A later report indicated an AUC of 0.838 [35]. The PreDx DRS model outperformed single variables like FPG (~0.73) and HbA 1c (~0.66) significantly. In this respect the 15 feature set selected here considering only four new biomarkers besides routine laboratory measures, provides a better diagnostic accuracy with on the AUC of 0.99375 (accuracy 99%).
Another intriguing result of statistical evaluation and hierarchical clustering was the separation of diabetes patients in three subgroups based on all patient information including all 27 glycation sites ( Fig. 3; Additional file 1: Figure S2). At this stage, the clinical relevance remains open, but it is compelling to speculate that the subgroups might represent distinct subgroups of T2DM. This hypothesis is supported by a recent report that three T2DM subgroups have been dissected through topological analysis of patient similarity [36]. In this analysis, one T2DM subtype was characterized by diabetic nephropathy and retinopathy, another showed enriched risk for malignancies and cardiovascular diseases, whereas the third subtype correlated most strongly with cardiovascular diseases, neurological diseases, allergies, and specific infections. Based on this data, future longitudinal studies, which assess the risk of these disease entities in addition to the glycation sites described in our study should be performed to test the hypothesis that these three subtypes could be distinguished by protein glycation markers.
These subgroups may also respond differently to multifactorial T2DM treatment strategies or may predict the progression of the disease itself and the individual risk for developing diabetes complications. Noteworthy, our study has some limitations with regard to the small size of our discovery cohort and the limited transferability into clinical practice. Although we only included newly diagnosed T2DM patients at the earliest possible time point and without antidiabetic treatment, we cannot relate the time point of diagnosis to a defined metabolic state. In addition, we could not exclude individual differences in the duration and severity of the prediabetic phase including chronic effects of intermittent glucose and lipid toxicity. Moreover, the suggestion that glycated Lys141 in haptoglobin may improve diagnostic accuracy for T2DM needs to be tested in prospective studies and cohorts representing the common population.
In conclusion, the 27 glycation sites quantified here provided sensitivities up to 79% and specificities of up to 88% to distinguish T2DM samples relative to age-and BMI-matched control samples using specific cut-offs for each glycation site. The cut-off values of HbA 1c (6.5% (48 mmol/mol)) and FPG (7.0 mmol/L) recommended by the WHO showed better specificities of 100%, but lower sensitivities of only 52 and 40%, respectively. Interestingly, lowering the cut points to 6.0% (42 mmol/mol) and 5.69 mmol/L improved the sensitivity significantly to 77 and 75%, respectively, while the selectivity was reduced to 94 and 81%, respectively. This is in good agreement with recent reports suggesting that lower HbA 1c and FPG cut points will dramatically increase diagnostic sensitivity [37][38][39][40][41]. Remarkably, plasma proteins appeared to follow other glycation kinetics than hemoglobin, which might be related to their different environment (i.e. blood versus erythrocytes). Thus, glycation sites in plasma proteins may provide an additional diagnostic tool, as confirmed here by combining the glycation levels HbA 1c and HP K141 providing high sensitivity (94%) and specificity (98%). The advantage of combining these two markers are the different half-life times of the corresponding proteins, i.e. 3-4 months for intracellular hemoglobin and two to four days for haptoglobin (t 1/2 = 2-4) [42] being sensitive to long-and short-term fluctuations of blood glucose concentrations. Furthermore, the combination of 15 features consisting of established clinical parameters and several glycated plasma proteins allowed dividing the newly diagnosed patients with T2DM into three different subgroups supporting the hypothesis that heterogeneity of T2DM phenotypes maybe due differentially affected pathways including significant differences in protein glycation.  Table S1. Characterization of type 2 diabetes patients and matched non-diabetic persons enrolled in this study. Table S2. Parameters and settings used for the RP-HPLC-ESI-QqLIT-MS operating in scheduled multiple reaction monitoring (MRM) mode. Table S3. Precision, sensitivity and linearity parameters for glycated peptides. Table S4. Spearman rank correlation coefficients (rS) and corresponding P values (P) of the statistical relation between glycated peptides and several diagnostic parameters. Table S5. Receiver operating characteristic (ROC) parameters for peptide levels of all 27 glycated peptides quantified in tryptic digests of plasma samples obtained from 48 type 2 diabetes patients and 48 controls. For comparison, ROC parameters of HbA1C and fasting plasma glucose (FPG) are listed. Table S6. Evaluation metrics for classification of type 2 diabetes patients and controls by combining the levels of 27 glycated peptides in tryptic plasma digests and corresponding HbA1c levels calculated by Decision Tree classifier from Scikit-learn package. Table S7. Evaluation metrics for classification of type 2 diabetes patients and controls by combining the levels of 27 glycated peptides in tryptic plasma digests and corresponding FPG levels. Table S8. Evaluation metrics for classification of type 2 diabetes patients and controls by combining the levels of 27 glycated peptides in tryptic plasma digests and corresponding HbA1c levels. Variable cut points were optimized manually for best classification. Table S9. Evaluation metrics for classification of type 2 diabetes patients and controls by combining the levels of 27 glycated peptides in tryptic plasma digests and corresponding fasting plasma glucose (FPG) levels. Variable cut points were optimized manually for best classification. Figure S1. Quantification of glycated peptides in tryptic plasma digests obtained from type 2 diabetes patients and non-diabetic controls using internal calibration. Figure S2. Dendrogram resulting from a hierarchical clustering of 48 type 2 diabetes patient samples considering all available patient information, i.e. diagnostic parameters and glycated peptide levels.