Using phosphoproteomics data to understand cellular signaling: a comprehensive guide to bioinformatics resources

Mass spectrometry-based phosphoproteomics is becoming an essential methodology for the study of global cellular signaling. Numerous bioinformatics resources are available to facilitate the translation of phosphopeptide identification and quantification results into novel biological and clinical insights, a critical step in phosphoproteomics data analysis. These resources include knowledge bases of kinases and phosphatases, phosphorylation sites, kinase inhibitors, and sequence variants affecting kinase function, and bioinformatics tools that can predict phosphorylation sites in addition to the kinase that phosphorylates them, infer kinase activity, and predict the effect of mutations on kinase signaling. However, these resources exist in silos and it is challenging to select among multiple resources with similar functions. Therefore, we put together a comprehensive collection of resources related to phosphoproteomics data interpretation, compared the use of tools with similar functions, and assessed the usability from the standpoint of typical biologists or clinicians. Overall, tools could be improved by standardization of enzyme names, flexibility of data input and output format, consistent maintenance, and detailed manuals.

and can benefit from active involvement of biologists and clinicians.
A vast array of resources and tools are available to facilitate the interpretation of phosphopeptide identification and quantification results.However, each of these tools exists as a silo without connection to tools with complementary functions.In addition, many tools have overlapping functions but differ in underlying knowledge bases, algorithms, input and output format of data, accessibility, advantages, limitations, and maintenance.Although newly developed tools are usually compared to similar, previously published tools, comparisons often do not include real-world, biological use-cases.For example, inference of kinase activity based on the observed phosphorylation of its substrates is a powerful application of phosphoproteomics profiling, and multiple methods have been developed to address this need [3,4].However, there has been little validation of the methods and only one benchmarking study comparing a few of the methods has been published [3].
Biological and clinical scientists are in the best position to extract biologically and clinically relevant findings from phosphoproteomics data, however, they are rarely consulted for tool design input or requested to test the final product.Furthermore, there is no comprehensive list of tools to aid those using phosphoproteomic data in their research.Therefore, this article aims to provide a comprehensive collection of resources that can be used to gain insights from phosphoproteomic data, including knowledge bases of kinases and phosphatases, phosphorylation sites, kinase inhibitors, and sequence variants affecting kinase function, and bioinformatics tools that can predict phosphorylation sites in addition to the kinase that phosphorylates them, infer kinase activity, and predict the effect of mutations on kinase signaling.We perform some benchmarking comparisons to determine the best tool available and assess usability of the tools from the standpoint of typical biologists or clinicians.

Collection of knowledge bases and tools
The OMICtools resource (https:// omict ools.com) is a manually curated collection of bioinformatics tools [5].This site was searched in July 2019 for tools using the words 'kinase' , 'phosphorylation' , 'phospho' , or 'phosphatase' .In addition, several more tools were collected from the literature.Only tools that were freely available, still accessible, and non-obsolete were included, and tools specific for organisms other than human were discarded.The year of last update was assumed to be the year of publication unless otherwise noted on the website.These tools may be accessed by a downloadable, locally-run tool (Tool) or by a website (Web) that may have downloadable (DL) results or database information.The website URLs for all resources can be found in Additional file 1: Table S1.Each website was accessed in July 2019 and data statistics were collected for human proteins from downloadable files where possible and from websites or manuscripts for online-only resources.

Knowledge bases of kinases and phosphatases
General information about the components involved in kinase signaling is required throughout the analysis and interpretation of phosphoproteomics data.Knowledge bases for kinase signaling can be separated into those collecting information on the enzymes, and those collecting experimentally validated phosphorylation sites.Of the 16 different resources that collect information specifically on protein kinases and phosphatases, 13 provide data on kinases, while 5 provide data on phosphatases (Table 1).Only two resources, the Eukaryotic Protein Kinase & Protein Phosphatase Database (EKPD) and its updated version iEKPD contain information on both types of enzymes [6].Most databases are only available as online websites, but some provide an option for downloading data (Table 1).
The kinase knowledge bases can be further separated into two different types: those that include comprehensive data on all known protein kinases, and those that were developed for a specific purpose, such as collecting driver mutations in kinases (Kin-Driver).Notably, no kinase resource collects data on non-protein kinases.KinBase, which was developed by Gerard Manning, contains 538 protein kinases and is considered the primary source of human protein kinases and their classification [7].Many other resources base their kinase list on KinBase.
Kinomer and KinG are general kinase sequence databases that provide very little other information [8,9].KinMutBase, a collection of disease-causing mutations in protein kinase domains, is outdated, contains data on only 31 kinases, and primarily consists of broken links [10].KinWeb and EKPD provide gene and protein identifiers, classification, description, and sequence information, but these data can also be found in other resources.However, KinWeb does have prediction of the disulfide bonding state of cysteines in the protein, as well as prediction of alpha helices, and EKPD presents data in an easy-to-read format [6,11].
Use of the remaining general resources depends on which data one wants to access.KinaseNET, ProKinO, and iEKPD contain the most comprehensive data on protein kinases, but KinaseNET and ProKinO are only available as online resources [12,13].They include protein sequences, links to the kinases in other databases (e.g., UniProt, Ensembl, Entrez), information on the kinase domains, expression in tissue, and disease associations.ProKinO specifically contains pathway information, mutations and their disease associations, chromosomal location of the kinase, and links to published manuscripts.KinaseNET includes PTMs, known binding partners, inhibitors, upstream kinases, downstream substrates, and information about regulation.KinaseNET provides all data on a single page, ProKinO requires more than 10 clicks on separate tabs and pages to obtain all information on a kinase, and iEKPD contains links for 13 additional annotations.
For disease studies, MOKCa and Kin-Driver specifically have data on protein kinase mutations [14,15].MOKCa has tissue specificity of mutations while Kin-Driver focuses on driver mutations and reports whether the mutation is activating or inactivating.KLIFS provides structural information for approximately half of the protein kinases bound to various ligands [16].Finally, KID-FamMap combines structural data with known kinase inhibitors and diseases [17].
Because phosphatases are less well studied than kinases, there are fewer resources dedicated to their collection.EKPD and iEKPD provide the same information for phosphatases as they do for kinases.HuPho, however, was the first comprehensive collection of phosphatases and the database includes pathway and substrate data, as well as siRNA phenotype data and links to orthologs in other species [18].DEPOD also includes pathways, substrates, and links to orthologs in addition to interacting partners and upstream kinases [19].Finally, Phosphatome.Net is the phosphatase version of Kin-Base [20].The website contains basic classification and sequence information.

Knowledge bases of phosphorylation sites
Besides information about specific kinases and phosphatases, data on phosphorylation sites are important for studying the signaling process.Phosphorylation site databases collect information on the location of phosphorylated residues in proteins from experimental data.These experiments can be low-throughput or high-throughput.High-throughput phosphorylation site identifications are assigned by probability unlike the more stringent experimental validation in low-throughput experiments, but some databases combine sites from both types of experiments without identifying the source experiment type.
In addition to phosphorylation site information, 16 of the 27 (60%) resources collect interactions between kinases or phosphatases and their substrates (Table 2).These often do not include the exact phosphorylation site, but instead provide interactions between an enzyme and its substrate at the gene level.
The four main resources for phosphorylation sites curated data manually from the literature (Fig. 1).HPRD and Swiss-Prot are general databases of all proteins [21,22].The remaining two, PhosphoSitePlus and Phospho.ELM, specifically contain phosphorylation site information [23,24].Both PhosphoSitePlus and Swiss-Prot are frequently updated, while HPRD and Phospho.ELM were last updated in 2010.All four of these databases also include kinase information for sites if known.Other smaller databases were generated through manual curation or publication of a laboratory's own phosphorylation site data.KANPHOS collects phosphorylation sites in neural signaling identified by highthroughput experiments [25].LymPHOS, PhosphoDB, Phosphopedia, and PHOSIDA are collections of data that were primarily produced in cell lines [26][27][28][29].PhosphoPEP integrates mass spectrometry experiments from Cell Signaling Technology and their own laboratory [30,31].PTMfunc and qPhos both collect mass spectrometry experiments and add functional predictions and kinase activity from various tools [32,33].Signor extracts high quality signaling interactions from the literature [34].Finally, ANIA, PTMD, and Phospho-Networks curate the literature for a specific purpose.ANIA collects phosphorylation sites that serve as binding sites for 14-3-3 proteins, while PhosphoNetworks creates a kinase-substrate network curated from the literature and a protein microarray experiment, and PTMD collects disease-related phosphorylation sites [35][36][37].
The remaining resources integrate phosphorylation sites and kinase information from other databases (Fig. 1).The database dbPAF collects phosphorylation sites from several databases [38].ProteomeScout also collects phosphorylation sites from other databases along with literature-curated experiments and provides a tool for analyzing a user's data [39].The database dbPTM collects all PTMs and the responsible enzyme from several sources [40].Kinome NetworkX, RegPhos, and Phos-phoAtlas curate and integrate data specifically to create kinase-substrate networks [1,41,42].PhosphoNET is an online-only tool that includes predicted phosphorylation sites in addition to those with experimental evidence [43].Finally, Phospho3D specifically collects phosphorylation sites with 3D structures [44].

Table 2 Databases of phosphorylation sites
The number of unique kinases and phosphatases reported to phosphorylate sites in the database is included.For some databases, these numbers include enzyme groups in addition to individual enzymes.Data type indicates whether the data are from mass spectrometry (MS) experiments, separated high-throughput (HT) and low-throughput (LT) experiments, or whether the database combines data from both HT and LT experiments without specifying (UNSP) a Indicates inclusion of predicted phosphorylation sites (pred)

Name
Last Five databases collect information on phosphatasesubstrate interactions.As mentioned, DEPOD, HuPho, and Phosphatome.Net all curate enzyme interactions from the literature.HPRD and Signor also collect some site-specific phosphatase information.
Each database contains a different number of phosphorylation sites and enzyme-substrate relationships depending on the source and method of collection (Table 2).ProteomeScout, PhosphoSitePlus, dbPTM, and dbPAF contain the most experimentally validated, downloadable sites.The site numbers for these four databases include specific protein isoforms, as do several other resources.PhosphoAtlas contains substrates for the largest number of individual kinases.Signor, Swiss-Prot, RegPhos, Phospho3D, dbPTM, and Phospho.ELM have substrates for individual kinases and kinase families.Finally, PhosphoSitePlus has substrates for some specific kinase isoforms.

Errors in substrate databases
Based on our examination, PhosphoSitePlus is the preferred resource for experimentally-identified phosphorylation sites and kinases for phosphorylation sites.PhosphoSitePlus is frequently updated, well-curated, and distinguishes between low and high-throughput identified sites.The downstream integrating databases suffer from ID mapping errors.For example, in PhosphoAtlas there is an entry for PEG (paternally expressed gene 3) phosphorylating CDC25B.PEG is not a known kinase, but pEg3 kinase (also known as maternal embryonic leucine zipper kinase, MELK) is known to phosphorylate CDC25B [45].Many of the downstream databases also have issues with PDPK1 and PDK1.The gene PDPK1, 3-phosphoinositide-dependent protein kinase 1, produces a protein known to the biological community as PDK1.However, there is an additional kinase, pyruvate dehydrogenase kinase, that is produced by the gene PDK1.Databases that try to integrate sites frequently attribute the substrates of PDPK1 to PDK1.Finally, integrating databases propagate errors from the original databases.For example, HPRD contains an entry for PTPN11 phosphorylating PTK2B although PTPN11 is a known phosphatase and not a kinase.The original manuscript connected to this entry confirmed that PTPN11 is a phosphatase and that it just binds to PTK2B at that particular site [46].Databases that collect information from HPRD, such as RegPhos and PhosphoAtlas, include this incorrect entry for PTPN11.

Known substrates of kinases and phosphatases
The four main databases of kinases together produce 485 substrate sets of individual kinases and kinase families (Fig. 2a).PhosphoSitePlus contains the most unique sites, while other databases contribute only a few additional sites per kinase.CSNK2A1 has the most substrates (596), while over half of the sets contain fewer than 10 substrates.
For substrates of phosphatases, DEPOD, HPRD, and Phosphatome.Net combined produce sets for 83 phosphatases.The most unique information comes from DEPOD and Phosphatome.Net.The number of known sites for each phosphatase is far fewer than that for kinases.PPP2CA has the most substrates (167), while 70% of the phosphatases have fewer than 10 substrates (Fig. 2b).

Phosphorylation site prediction tools
Despite decades of research, very few phosphorylation sites have known kinases or phosphatases.Of the sites in PhosphoSitePlus, only about 3% have an experimentally validated human kinase.Therefore, numerous tools have been developed to predict which sites in a protein can be phosphorylated and which kinases phosphorylate that given site.
These prediction tools were developed using a variety of features and methods and have been reviewed elsewhere [47,48].The early versions of phosphorylation site predictors were motif-based.They generated the frequency of amino acids surrounding a site and searched for that pattern in protein sequences.Later tools used more sophisticated methods such as support vector machines (SVM), random forest, Bayesian probability, position specific scoring matrices (PSSM), and deep neural networks [49][50][51][52][53].Besides amino acid sequence, tools included a vast array of features such as the 3D structure of the phosphorylation site, disorder score, cell cycle data, and co-expression of kinases and substrates [54][55][56].Others, like NetworKIN and iGPS, used protein-protein interaction data to filter predictions [57,58].Table 3 provides   all currently available tools to predict phosphorylation sites or kinases for phosphorylation sites.While a few tools have been developed to predict sites for phosphatases, only Ptpset, NetPhorest, and NetworKIN are still accessible [49,58].
Figure 3 shows phosphorylation site predictor tools and the resources they used to make predictions.Almost all phosphorylation site predictors were trained using data from Phospho.ELM.Swiss-Prot and Phos-phoSitePlus were also heavily used resources.Notably, almost all tools were developed using experimentally verified substrate data as the training set.Therefore, the tools are only able to predict the responsible kinase if there is sufficient data for substrates of that kinase.
A researcher may utilize these prediction tools to identify kinases phosphorylating single substrates of interest, for which web-based tools would suffice.However, the limit on the number of sequences submitted for prediction and the lack of downloadable results prevent these same tools for being useful in large-scale phosphoproteomic studies.Unfortunately, many tools appropriate for large-scale studies have multiple issues limiting their use.First, tools can be difficult to install, platform-specific, and lack manuals on use.For example, NetPhos [59] is downloadable but can only be run on Linux, whereas PhoScan [60] can only be run on Windows machines.Other tools require commercial software such as MATLAB or even require understanding a programming language to modify hard-coded variables.Finally, tools like GPS [61] and phos_pred [49] provide pre-defined cutoffs for prediction, while others like musite [62] and KSP-PUEL [63] allow users to define their own thresholds or to train the models using their own data.

Testing kinase-substrate relationship prediction tools
For large-scale kinase-substrate prediction, 14 pretrained tools were available that provide downloadable results.The best, unbiased way to test these tools is to use validated sites that were not used for the training of any tool.Unfortunately, most tools do not report the actual sites used for training and finding a set of sites to fit these criteria is nearly impossible.Therefore, we evaluated all 14 tools using gold-standard positive and negative human phosphorylation sites downloaded from dbPTM [64] for four serine/threonine kinases (CDK1, CK2, MAPK1, and PKA).Positive sites were serines and threonines experimentally validated to be phosphorylated by a particular kinase.Negative sites were serines and threonines not known to be phosphorylated on the same proteins.The outcomes might be biased in favor of newer tools and those that used some of these sites in their training.
Tools predicting kinases for phosphorylation sites (Table 3) were accessed through local tool installation or through the tool's website.PhoScan [60] and phos_ pred [49] were run locally on a Windows laptop, while NetPhorest [65], NetworKIN [66], iGPS [57], GPS [61], DeepPhos [67], jEcho [68], and MusiteDeep [50] were run locally on a Mac laptop.AKID [69], PhosphoPICK [70], NetPhos [71], Musite [62], and pkaPS [72] were accessed via their websites.Tools were set with the lowest threshold if they did not have an option to return scores for all a Indicates number of trained kinases, but tool can be trained with others SVM support vector machine, PSSM position specific scoring matrix, GSEA gene set enrichment analysis, ANN artificial neural network, DNN deep neural network, HMM hidden Markov model, PPI protein-protein interaction, KNN K-nearest neighbor a Indicates tool is not available for all three main operating systems (Linux, Mac, Windows) sites.For each site, the maximum score was retained if the tool predicted for more than kinase isoform (e.g., the maximum score of PKCalpha and PKCbeta on the same site).If a tool did not return a score for a site, the lowest possible score was given to the site.The receiver operating characteristic (ROC) curve and area under the ROC curve (AUROC) were calculated for the results from each tool using the R package ROCR [73].ROC curves for four kinases (CDK1, CK2, MAPK1, and PKA) are shown in Fig. 4. Notably, musite was unable to predict for a few random protein sequences in each submission.DeepPhos and phos_pred both required manual edits of hard-coded variables.MusiteDeep and GPS had the highest area under the curve (AUC) for all kinases tested.The PKA-specific tool pkaPS also performed well.Performance for most tools, however, varied across kinases.

Comparison of kinase activity tools
The known or predicted kinases for phosphorylation sites can be used to infer kinase activity from global phosphoproteomic data.Tools and methods have been developed to predict kinase activity, but there has been little effort spent towards comparing these tools or determining the most biologically-relevant set of parameters.The available tools (PHOSIDA, KEA2, KSEA App, PHOXTRACK, INKA, and IKAP) each use a different algorithm to infer activity (Table 4).The PHOSIDA de novo motif finder uses a simple method of bootstrapping to determine enrichment of sequence motifs in a set of phosphorylated peptides and then matches those to known kinase motifs [26].Kinase Enrichment Analysis 2 (KEA2) uses over-representation analysis to determine enrichment of kinase substrates in a condition [74].Similarly, the KSEA App uses mean phosphorylation of substrates of kinases Fig. 3 Network of phosphorylation site predictor tools and the resources used to make predictions.Tools are colored purple while the databases used by the tools are colored blue as a proxy for activity [4].PHOXTRACK modified preranked gene set enrichment analysis (GSEA) to determine enrichment of known kinase targets [75].IKAP extended these methods using a cost function to infer the relative contributions of multiple kinases acting on the same site [76].Finally, INKA combines the GSEA method with activating phosphorylation on kinases [77].
We used a phosphoproteomic dataset from a cell line experiment with 20 kinase inhibitors [78] to test four kinase activity prediction tools.Because PHOSIDA is only available online without downloadable results, we excluded this tool from further analysis.INKA was also excluded as it requires MaxQuant search result files.The R programming environment was used to create files in the input format for each tool.Significantly downregulated sites for each inhibitor were submitted to KEA2 and significantly inhibited kinases were defined as those with false discovery rate (FDR) < 0.05 and at least 3 overlapping substrates [74].The log2 fold change for each thirteenmer phosphorylation site (± 6 amino acids surrounding the phosphorylated site) was submitted to PHOXTRACK (1000 permutations, minimum number of substrates = 3, weighted statistics) [75].Significantly inhibited kinases were defined as those with FDR < 0.05 and normalized enrichment value < 0. The fold change for each site with each inhibitor was submitted to the KSEA app website and significantly inhibited kinases were defined as those with FDR < 0.05, at least 3 substrates in the dataset, and a z score < 0 [4].The substrates of kinases from Phospho-SitePlus (version July 2017) and Signor (version October 2017) were used for IKAP [23,34,76].IKAP was run locally on a Mac laptop with the bounds between -11 and 11 and 50 iterations.The 5 kinases with the lowest activity scores for each experiment were chosen.The positive set were kinases known to be inhibited by each drug (as reported in supplementary table in Ref. [78]); all other kinases predicted by the tools were considered to be negative.The significant kinases for each tool were counted for presence in the positive and negative sets.
Comparison of these tools is challenging because they use different input and underlying databases.KEA2 requires a set of sites in the format of HGNC symbol and phosphorylated amino acid residue position separated by an underscore.It contains sets for 250 different kinases.KSEA App requires a strictly formatted comma-delimited file with the HGNC symbol, phosphorylated position, and non-log-transformed fold change.Users can choose between known sets from the July 2016 release of PhosphoSitePlus or the known + predicted site sets from PhosphoSitePlus and NetworKIN.PHOXTRACK requires a two-column file with a thirteenmer peptide and log-transformed fold change.It can use substrate sets from the four main databases or a user-supplied database.Finally, IKAP required tabular data entered into MATLAB, manual modification of MATLAB code to change parameters, and allowed a user to upload their own set of substrates.Because one thirteenmer might match multiple proteins and phosphorylated positions, the actual substrate list presented to each tool may differ slightly.
To determine how well each tool covered the known targets of kinases, we counted the number of significantly downregulated known kinases of each inhibitor and the significantly downregulated kinases of each inhibitor that were not known targets of that inhibitor.The KSEA App made the most true positive predictions across all experiments, while IKAP made the fewest true positive predictions (Fig. 5a).PHOXTRACK made the fewest false positive predictions (Fig. 5b).
Besides upstream kinase activity, phosphoproteomics data could additionally be used to explore altered downstream pathways.While standard tools and methods such as GSEA are typically used for this analysis, all are limited to using overall gene-level phosphorylation [79].Unfortunately the functional contribution of individual sites to pathway signaling is poorly annotated in gene set databases, although PTMsigDB has some limited pathway sets [80].Until new tools are built to handle individual sites in pathway analysis, a user might combine the results from kinase activity prediction to assemble altered kinases into pathways using tools such as String, RegPhos2, or Wikipathways [42,81,82].

Differential and clustering analysis of phosphoproteomics data
Besides activity prediction, phosphoproteomic data can be used for other analyses.SELPHI is a good tool to first explore the data as it allows biologists to quickly and easily analyze phosphoproteomic data with clustering analyses, kinase-substrate correlation, and pathway enrichment [83].PhosFox then compares phosphorylated peptides between conditions [84].Finally, a set of tools (CellNOpt, Sorad, CLUE, DynaPho, and KinasePA) were developed specifically for phosphoproteomic time-course or multiple condition analyses (Table 4) [41,[85][86][87][88].

Prediction of mutation effect
Analysis and interpretation of phosphoproteomic data can be enhanced with other multi-omics data types.For example, sequence variants can affect kinase function or presence of a phosphorylation site.The databases PhosSNP [89] and ActiveDriverDB [90] collect gene polymorphisms and somatic mutations, respectively, near phosphorylation sites and categorize them based on suspected effect (Table 5).ActiveDriverDB also includes predictions from Mutations Impact on Phosphorylation (MIMP), which uses Bayesian statistics to predict whether mutations around a phosphorylation site will change which kinase binds to that site [91].It can predict rewiring for 124 kinases using experimentally validated data, or it can be extended to predict for 322 kinases using predicted kinase-substrate relationships.ReKINect also predicts rewiring from mutations, but it further predicts the destruction or creation of phosphorylation sites and inactivation or constitutive activation of kinases [92].PhosphoPICK-SNP is also similar to MIMP.It predicts the kinase responsible for phosphorylating a site, and whether a mutation affects its ability to phosphorylate the site [93].While all of the tools are easy to use, the databases are better for individual searches and the three prediction tools are better for analysis of a user's mutation data.

Resources for kinase inhibitors
After discovering altered kinases from phosphoproteomic data to use as therapeutic targets, identifying inhibitors is essential.Most available resources connect known drugs to their known kinase targets (Table 6).DrugKiNET shows the known inhibitors for kinases, and the kinases that a compound inhibits.It also predicts which kinases a drug can inhibit.K-Map extends these interactions to suggest the best compound to inhibit a set of kinases [94].Finally, KinomeSelector groups kinases by sequence similarity and similarity of drug response.It a b Fig. 5 True and false positive predictions for kinase activity prediction tools.a For all 20 inhibitors, the number of known targets predicted to be significantly downregulated by each tool.b For all inhibitors, the number of all significantly downregulated kinases that do not match known inhibitor targets then allows a user to choose a subset of kinases to target that cover the kinome [58].

Other kinase signaling tools
The final set of bioinformatics tools, summarized in Table 7, enhance phosphoproteomic analysis and cover visualization, data retrieval, and prediction tools.Additional kinases from a genome can be predicted by Kinannote [95] and KinConform can predict whether those kinases are active in structure files [96].KinMap [97] is used to visualize the entire kinome tree and Phospho-Logo [98] is used to generate sequence logos of kinases.
On the other side, RLIMS-P and eFIP are both tools that extract data on phosphorylation interactions from the literature [99,100].Then CPhos identifies phosphorylation sites of interest that are conserved across species [101].PyTMs [102] is a tool to visualize 3D structures of phosphorylation sites and ultimately RegPhos2.0[42] can be used to visualize signaling networks.RegPhos2.0also provides heatmaps for kinase and substrate mRNA expression in cancer.Finally, 14-3-3-Pred predicts phosphorylation sites in protein sequences that might bind to 14-3-3 proteins, further adding to the phosphorylationrelated signaling network [103].

Discussion
The available databases and tools for studying kinase signaling cover diverse functions and include information on enzymes and their substrates, inhibitors, activity, and mutations.Together these knowledge bases, prediction tools, and analysis tools comprise the current best standard for studying kinase signaling and many can be used without extensive computational knowledge.Overall, these tools allow a researcher to discover vast amounts of information from their phosphoproteomic data and some tools can even perform entire sets of analyses with a single button click [83].Despite the work that has been done, there is room for advancement to fully utilize phosphoproteomic data for use in the clinic.First, the majority of tools focus almost exclusively on the study of protein kinases.However, phosphatases are critical components of the kinase signaling cascade and are frequently dysregulated in cancer.Understanding the role of the interplay between kinases and phosphatases on the net phosphorylation seen in global phosphoproteomic data is essential to identifying abnormal cell signaling in disease.Furthermore, while the current tools and research are aimed at studying dysregulated protein phosphorylation, non-protein phosphorylation is also often altered in disease.For example, hexokinases, which phosphorylate glucose, drive glucose metabolism and contribute to tumor initiation in mouse models of lung and breast cancers [104].The development of resources and tools to study non-protein kinases and phosphatases could advance research in a variety of fields.
While the current tools provide critical functions, their error rate and accuracy could be improved.Errors are frequently propagated or amplified when tools collect data from a variety of resources.However, the impact of these errors on downstream analyses and biological inferences remains to be determined.
For all tools, usability can be an issue, both for bioinformaticians and biologists with no computational experience.Tools are frequently platform-dependent, do not allow downloadable results, and are not well annotated.Furthermore, tools are difficult to compare or to use more than one during analysis.The input and output formats are not standardized and use a variety of protein naming conventions.
The largest challenge was deciphering input limitations and understanding results.For example, submitting a sequence with a large number of phosphorylatable residues to GPS caused the software to stall without an error message and no documentation mentioned a size limit.Musite did not provide results for a sequence or two each run without explanation.Furthermore, downloadable result files for many tools had no column headers so the column contents were unknown.For example, the downloadable file from musite has no column titles, so you have to check the table on the website to understand the results.Additionally, scores are usually presented without explanation.Only careful reading of the manuscript or the manual elucidates what value signifies a "good" response.For example, in Scansite, the score 0 is the best, with scores closest to 0 indicating the best match.But in PhosphoPICK, the score indicates the probability of being phosphorylated by a kinase at that site so a score closer to 1 is better.Experts in machine learning might understand the score without explanation, but naïve users likely will not.
One way to fix this challenge is to have a detailed, easyto-find manual.The manual should include ways to run the tool, the underlying mechanism of the method, and detailed description of the results.The description of the results should also be available where results are visualized.Furthermore, sample input is helpful for a new user to test the tool and determine whether the results will be useful for their experiment before preparing their own data files.

Conclusions
There are many tools and resources that can be used to study kinase signaling and these tools will become even more essential with the continued production of phosphoproteomic data.It is essential for the biological community to research under-studied enzymes and to validate specific substrates of kinases and phosphatases.Furthermore, bioinformaticians should consider creating tools that utilize information from both sides of the enzymatic phosphorylation reaction.Finally, resources should be carefully planned, easy to use, and well maintained and the community should work to standardize the use of enzyme IDs and phosphorylation site location.

Fig. 1
Fig.1Network of phosphorylation site and kinase-substrate interaction databases.Gray nodes indicate databases that are no longer accessible.Arrows point from the knowledge source to the collecting database.Arrows originating from the four most highly used databases are colored by source (green = HPRD, blue = Swiss-Prot, red = PhosphoSitePlus, pink = Phospho.ELM)

Fig. 2
Fig. 2 Number of substrates per kinase and phosphatase.a Number of substrates for the top 100 kinases in four databases.Substrates present in more than one database are colored black while the remaining sites are unique to each database.b Number of substrates for each phosphatase in DEPOD (yellow), HPRD (green), Phosphatome.Net (blue), or in more than one database (black)

Fig. 4
Fig. 4 ROC curves for substrate prediction of four kinases.The false positive and true positive rates of substrate prediction for a CDK1, b CK2, c MAPK1, and d PKA.The AUC for each tool is listed next to the tool name

Table 1
Knowledge bases of human kinases and phosphatases

Table 3
Available phosphorylation site and kinase-substrate prediction tools

Table 4
Kinase activity prediction and phosphoproteomic dataset analysis tools

Table 5
Resources for studying the effect of mutations on kinases and phosphorylation sites SNV single nucleotide variation, PSSM position specific scoring matrix

Table 6
Kinase-inhibitor relationship resourcesK-Map has two different databases-one with 178 drugs inhibiting 300 kinases and one with 72 drugs inhibiting 442 kinases

Table 7
Visualization, data retrieval, and prediction tools