Quality assessment and interference detection in targeted mass spectrometry data using machine learning

Advances in targeted proteomics and mass spectrometry have significantly improved assay sensitivity and multiplexing capacity. The high-throughput nature of targeted proteomics experiments has increased the rate of data production, which requires the development of novel analytical tools to keep up with data processing demand. Currently, development and validation of targeted mass spectrometry assays require manual inspection of chromatographic peaks from large datasets to ensure quality, a process that is time-consuming and prone to inter- and intra-operator variability, and that limits the efficiency of data interpretation from targeted proteomics analyses. To address this challenge, we have developed TargetedMSQC, an R package that facilitates quality control and verification of chromatographic peaks from targeted proteomics datasets. This tool calculates metrics that quantify several quality aspects of a chromatographic peak, e.g. symmetry, jaggedness and modality, co-elution and shape similarity of monitored transitions in a peak group, as well as the consistency of transition ratios between endogenous analytes and isotopically labeled internal standards and the consistency of retention time across multiple runs. The algorithm takes advantage of supervised machine learning to identify peaks with interference or poor chromatography, based on a set of peaks that have been annotated by an expert analyst. Using TargetedMSQC to analyze targeted proteomics data reduces the time spent on manual inspection of peaks and improves both the speed and accuracy of interference detection. Additionally, by allowing analysts to customize the tool for different datasets, TargetedMSQC gives users the flexibility to define acceptable quality for specific datasets.
Furthermore, automated and quantitative assessment of peak quality offers a more objective and systematic framework for high-throughput analysis of targeted mass spectrometry assay datasets and is a step towards more robust and faster assay implementation.

Electronic supplementary material: The online version of this article (10.1186/s12014-018-9209-x) contains supplementary material, which is available to authorized users.

Description of QC features in TargetedMSQC. The engineered QC metrics are classified into nine general categories, indicated in the 'QC Group' column. Depending on which attribute of peak quality they represent, the QC features are calculated at one or more levels, as shown in the 'Level' column. For example, jaggedness is reported at the transition, isotope and peak group levels, whereas max intensity is reported only for each transition. The 'Description' column provides a definition of each QC feature.

QC Group QC Metric Name Level Description

Area Ratio Area2SumRatioCV Transition
Coefficient of variation, across samples, of the ratio of each transition's peak area to the total peak area of its peak group.

Jaggedness TransitionJaggedness Transition
Jaggedness score for each transition for each sample. Jaggedness score is defined as the fraction of time points across a peak where the signal changes direction, excluding the peak apex.
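The jaggedness definition above can be sketched in a few lines. The following is an illustrative re-implementation in Python (not the TargetedMSQC source, which is written in R; edge-case handling such as plateaus may differ), counting direction changes in the first differences of the intensity trace:

```python
import numpy as np

def jaggedness(intensity: np.ndarray) -> float:
    """Fraction of interior time points where the signal changes
    direction, excluding the expected direction change at the apex."""
    diffs = np.sign(np.diff(intensity))
    # a direction change is a pair of consecutive slopes with opposite signs
    flips = int(np.sum(diffs[:-1] * diffs[1:] < 0))
    # a clean unimodal peak has exactly one flip (at the apex); exclude it
    flips = max(flips - 1, 0)
    return flips / max(len(diffs) - 1, 1)
```

For a smooth triangular peak this returns 0, while a noisy trace that repeatedly reverses direction approaches 1.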

Jaggedness IsotopeJaggedness Isotope
Mean of jaggedness scores of peaks with similar isotope label in a peak group for each sample.

Jaggedness PeakGroupJaggedness.m Peak group
Mean of jaggedness scores of all peaks in a peak group for each sample.

Symmetry TransitionSymmetry Transition
Symmetry score for each transition for each sample. Symmetry score is defined as the Pearson correlation coefficient between left and right half of the peak intensities.
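A minimal Python sketch of this symmetry score (again an illustrative re-implementation, not the package's R code; boundary handling for odd versus even peak lengths and flat signals is simplified) correlates the first half of the peak with the mirrored second half:

```python
import numpy as np

def symmetry(intensity: np.ndarray) -> float:
    """Pearson correlation between the left half of the peak and the
    mirrored right half; a perfectly symmetric peak scores 1.0."""
    n = len(intensity) // 2          # half-width; apex excluded for odd lengths
    left = intensity[:n]
    right = intensity[:-n - 1:-1]    # last n points in reverse order
    return float(np.corrcoef(left, right)[0, 1])
```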

Symmetry IsotopeSymmetry Isotope
Mean of symmetry scores of peaks with similar isotope label in a peak group for each sample.

Symmetry PeakGroupSymmetry.m Peak group
Mean of symmetry scores of all peaks in a peak group for each sample.

Similarity PairSimilarity Transition pair
Similarity score for each transition pair for each sample. Pair similarity score is defined as the Pearson correlation coefficient between peak intensities of two peaks.
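Because the pair similarity score is a plain Pearson correlation between two transition traces sampled over the same retention-time window, it is insensitive to absolute intensity scale; a sketch (illustrative Python, not the package's R code):

```python
import numpy as np

def pair_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between the intensity profiles of two
    co-eluting transitions sampled on the same time grid."""
    return float(np.corrcoef(a, b)[0, 1])
```

Two transitions with identical shape but different intensities score 1.0, which is the desired behavior, since transition intensities legitimately differ while their shapes should agree.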

Similarity IsotopeSimilarity Isotope
Mean of similarity scores of peak pairs with similar isotope label in a peak group for each sample.

Similarity PeakGroupSimilarity.m Peak group
Mean of similarity scores of all peak pairs in a peak group for each sample.

Modality TransitionModality Transition
Modality score for each transition for each sample. Modality score is defined as the largest unexpected dip in the peak, normalized by peak height.
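One way to compute such a score (a simplified sketch in Python; the package's exact dip definition may differ, e.g. in how multi-point dips are measured) is to assume an ideal peak rises monotonically to the apex and falls monotonically after it, and to take the largest single-step violation of that shape:

```python
import numpy as np

def modality(intensity: np.ndarray) -> float:
    """Largest unexpected dip in the peak, normalized by peak height."""
    apex = int(np.argmax(intensity))
    diffs = np.diff(intensity)
    # rises before the apex should be positive and falls after it negative;
    # flip signs so every violation of that shape becomes a positive depth
    violations = np.concatenate([-diffs[:apex], diffs[apex:]])
    largest_dip = float(violations.max(initial=0.0))
    height = float(intensity.max() - intensity.min())
    return largest_dip / height if height > 0 else 0.0
```

A clean unimodal peak scores 0, whereas a bimodal trace scores in proportion to the depth of the valley between its two apexes.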

Modality IsotopeModality Isotope
Mean of modality scores of peaks with similar isotope label in a peak group for each sample.

Modality PeakGroupModality.m Peak group
Mean of modality scores of all peaks in a peak group for each sample.

Shift TransitionShift Transition
Elution shift score for each transition for each sample. Elution shift score is defined as the difference between time at max intensity of each peak and the median of time at max intensities of all the peaks in the peak group normalized by the peak base width.
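As a sketch (illustrative Python with hypothetical argument names, not the package's R implementation), the transition shift score reduces to:

```python
import numpy as np

def elution_shift(apex_time: float, group_apex_times: np.ndarray,
                  peak_start: float, peak_end: float) -> float:
    """Difference between a transition's apex time and the median apex
    time of its peak group, normalized by the peak base width."""
    base_width = peak_end - peak_start
    return abs(apex_time - float(np.median(group_apex_times))) / base_width
```

Normalizing by the base width makes the score comparable across peptides with very different peak widths.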

Shift PairShift Transition pair
Elution pair shift score for each transition pair for each sample. Shift score between two peaks is defined as the difference between the time at max intensities of the two peaks normalized by the peak base width.

Shift IsotopeShift Isotope
Mean of transition shift scores of peaks with similar isotope label in a peak group for each sample.

Retention Time MeanIsotopeRTConsistency Transition
Consistency of the peak retention time of the same isotope label across all samples for each transition for each sample.

Intensity TransitionMaxIntensity Transition
Maximum peak intensity of each transition for each sample.

Intensity TransitionMaxBoundaryIntensity Transition
Maximum peak intensity at the peak boundary of each transition for each sample.

Intensity TransitionMaxBoundaryIntensityNormalized Transition
Maximum peak intensity at the peak boundary normalized by maximum peak intensity of each transition for each sample.

Figure S1. Learning curve for the CSF biomarker in artificial matrix dataset. Cross-validation and resampling were used to estimate the performance of the RRF model as a function of the training set size. This curve is used to determine the minimum training set size required to achieve acceptable performance. Here, to achieve a classification accuracy of 90%, the training set should contain at least 400 transition pairs.

Figure S2. Distribution of QC features in the training dataset for the CSF biomarker in artificial matrix dataset. The violin plots illustrate the distribution of peak quality metrics in a training dataset of 576 transition pairs in artificial CSF matrix. Red and green dots indicate transitions that were marked as 'flag' and 'ok' by manual inspection, respectively.

Table S2. Summary of performance of peak quality models for the CSF biomarker in artificial matrix dataset. Five supervised machine learning methods were tested for building a predictive peak quality model. The accuracy of the cross-validated model on the training subset (a random selection of 80% of the training set) and the corresponding standard deviation are shown in the 'Accuracy' and 'Accuracy Standard Deviation' columns. The models were further tested on the validation subset (the remaining 20% of the training set), which was excluded from the training process; the results of this evaluation are shown in the 'Accuracy', 'Sensitivity' and 'Specificity' columns for the validation subset. The performance of the regularized random forest model surpassed that of the others.

Figure S3. Comparing the accuracy of peak quality models for the CSF biomarker in artificial matrix study. The accuracy of each model in predicting peak quality was estimated and compared by collecting resampling results. Error bars indicate 95% confidence intervals of the accuracy. The performance of the RRF model surpasses that of the others.

Figure S4. Relative importance of QC features in determining the output of the predictive model for the CSF biomarker in artificial matrix dataset. Every defined feature represents certain aspects of peak quality; however, some features play a more defining role in determining the output of the model. In this example, features related to FWHM, intensity, peak similarity, and the correlation and consistency of isotope pair ratios are relatively more important to the outcome of this model. In contrast, features related to jaggedness and shift appear to have less influence on the output.

Figure S5. Examples of misclassification by the predictive peak quality model for the CSF biomarker in artificial matrix dataset. The outcome of the predictive model disagreed with the manual annotation of the training dataset in only the two cases shown here. Transition y8 (A), which was flagged in the training set due to high background at the peak boundary, was marked as 'ok' by the model, resulting in a false negative. In contrast, transition y3 (B), which had passed QC through manual inspection, was flagged by the model, resulting in a false positive. It should be noted that both of these examples are marginal cases that do not considerably impact the quantitative results or the outcome of the experiment.

Figure S6. Learning curve for the longitudinal CSF biomarkers of AD dataset. Cross-validation and resampling were used to estimate the performance of the RRF model as a function of the training set size. This curve is used to determine the minimum training set size required to achieve acceptable performance. Here, the curve plateaus at ~900 transition pairs in the training set; increasing the training set size beyond that may not provide much additional benefit.
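The 'Accuracy', 'Sensitivity' and 'Specificity' columns of Table S2 follow the standard confusion-matrix definitions, with a flagged (poor-quality) peak treated as the positive class. A generic sketch of how these are derived from model predictions (illustrative Python, not the package's R code):

```python
import numpy as np

def classification_metrics(y_true: np.ndarray, y_pred: np.ndarray,
                           positive: str = "flag") -> dict:
    """Accuracy, sensitivity and specificity, treating a flagged
    (poor-quality) peak as the positive class."""
    tp = np.sum((y_true == positive) & (y_pred == positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

Sensitivity here is the fraction of truly poor peaks the model catches, and specificity the fraction of good peaks it leaves unflagged, which is why both are reported alongside overall accuracy.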

Figure S7. Distribution of QC features in the training dataset for the longitudinal CSF biomarkers of AD dataset. The violin plots illustrate the distribution of peak quality metrics in a training dataset of 1128 transition pairs measured in CSF matrix. Red and green dots indicate transitions that were marked as 'flag' and 'ok' by manual inspection, respectively.

Figure S8. Comparison of accuracy of peak quality models for the longitudinal CSF biomarkers of AD dataset. The accuracy of each model in predicting peak quality was estimated and compared by collecting resampling results. Error bars indicate 95% confidence intervals of the accuracy. The performance of the RRF model surpasses that of the others.

ROC analysis for the longitudinal CSF biomarkers of AD dataset
One question to address is whether the engineered QC features in this study provide an advantage over using the raw chromatograms as inputs to the model. To answer this question, we compared the outputs of two RRF models built on identical training datasets but different sets of features. The first model took the engineered QC features as input, whereas the second used the dimensionality-reduced raw chromatograms as features to classify peaks into high- and low-quality groups. The features for the second model were calculated by applying principal component analysis to the resampled peak intensities of the raw chromatograms in the training set. The first 5 principal components of the endogenous and standard isotopes, which captured about 99.7% of the data variability, were used as input features for building the RRF model. The performance of these two models was compared using ROC analysis. The model based on engineered QC features achieved an AUC of 0.975, a considerable improvement over the AUC of 0.876 achieved by the model based on principal components of raw chromatograms. This analysis shows that using the engineered QC features, instead of relying merely on raw chromatograms, substantially improves the ability of the model to differentiate low-quality peaks.

Figure S9. ROC analysis for the longitudinal CSF biomarkers of AD dataset. ROC analysis was performed for the final RRF model developed for predicting peak quality of candidate biomarkers in CSF matrix. The ROC curve was generated from the model output on the validation subset, returning an AUC of 0.975. Comparing the AUCs of the models developed from the engineered QC metrics (AUC of 0.975) and from the principal components of peak intensities (AUC of 0.876) shows that the engineered QC metrics provide a considerable advantage over relying on peak intensities alone for distinguishing high- and low-quality peaks.
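AUC values like those compared here can be computed without tracing the full ROC curve, via the Mann-Whitney interpretation of AUC: the probability that a randomly chosen positive example (a flagged peak) receives a higher score than a randomly chosen negative one. A self-contained sketch (illustrative Python; the authors' analysis was done in R):

```python
import numpy as np

def roc_auc(scores: np.ndarray, labels: np.ndarray) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a random positive example scores higher
    than a random negative one, with ties given half credit."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))
```

An AUC of 1.0 means the two classes are perfectly separable by score, while 0.5 corresponds to random ranking, which is the useful frame for reading the 0.975 versus 0.876 comparison above.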

Figure S10. Relative importance of QC features in determining the output of the predictive model for the longitudinal CSF biomarkers of AD dataset. For this dataset, features that quantify peak intensity, modality, correlation of transition ratios, peak area CV and FWHM play a relatively more important role in determining the output of the model. In contrast, features related to jaggedness and shift appear to be of less importance.

Figure S11. Examples of misclassification by the predictive peak quality model for the longitudinal CSF biomarkers of AD dataset. The outcome of the predictive model disagreed with the manual annotation of the training dataset for the transitions highlighted in red. Many of the misclassified transitions are below or close to the limit of quantitation and are therefore difficult to annotate even through manual inspection.