pValid: Validation Beyond the Target-Decoy Approach for Peptide Identification in Shotgun Proteomics

Introduction

pValid is an automatic tool for validating the credibility of search results after target-decoy approach (TDA) filtering. It is based on an SVM classifier and uses four features derived from open search and theoretical spectrum prediction. The first feature is the score of the PSM calculated by pFind. The second feature is the score of the PSM reported by Open-pFind. The third feature is the cosine similarity between the original spectrum and the spectrum predicted by pDeep (referred to as pDeep similarity) for the best candidate reported by pFind. The fourth feature is the highest pDeep similarity among up to six peptide candidates, i.e., the top-3 peptide candidates reported by pFind and the top-3 peptide candidates reported by Open-pFind.
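The third and fourth features can be illustrated with a minimal sketch. It assumes that the original spectrum and each pDeep-predicted spectrum have already been reduced to intensity vectors aligned on the same fragment ions; that alignment step, and the function names cosine_similarity and best_candidate_similarity, are hypothetical and are not taken from the pValid source.

```python
import math


def cosine_similarity(observed, predicted):
    """Cosine similarity between two aligned intensity vectors."""
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    # Assumption: an all-zero vector is treated as having no similarity.
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0
    return dot / (norm_o * norm_p)


def best_candidate_similarity(observed, candidate_predictions):
    """Highest similarity over the candidate predictions (feature 4).

    candidate_predictions holds one predicted intensity vector per
    peptide candidate, e.g. up to three from pFind and three from
    Open-pFind.
    """
    if not candidate_predictions:
        return 0.0
    return max(cosine_similarity(observed, p) for p in candidate_predictions)
```

In this sketch the third feature would be cosine_similarity applied to the prediction for the top pFind candidate, and the fourth feature would be best_candidate_similarity applied to all available candidates.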

LibSVM is used in this study to train the SVM classifier, with the radial basis function as the kernel. All feature values are first normalized to [0,1] and then used for training. All other parameters are left at their default values. In the testing workflow, features are extracted in the same way as in the training workflow, and the classifier obtained from the training workflow is used to predict whether a PSM is a suspicious identification.
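The normalization step and the kernel can be sketched as follows. This is an illustration only, not pValid's code: the helper names are hypothetical, and it assumes that the scaling ranges are fitted on the training features and then reused unchanged on the testing features. The value gamma = 1/4 in the usage note is LibSVM's documented default (1/num_features) applied to four features.

```python
import math


def fit_min_max(rows):
    """Return the (min, max) of every feature column in the training rows."""
    columns = list(zip(*rows))
    return [(min(c), max(c)) for c in columns]


def apply_min_max(row, ranges):
    """Scale one feature row with previously fitted (min, max) ranges."""
    scaled = []
    for value, (lo, hi) in zip(row, ranges):
        if hi == lo:
            # Assumption: a constant feature is mapped to 0.
            scaled.append(0.0)
        else:
            scaled.append((value - lo) / (hi - lo))
    return scaled


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)
```

For example, ranges = fit_min_max(training_rows) would be computed once, after which apply_min_max(row, ranges) is applied to every training and testing row, and rbf_kernel(a, b, 0.25) gives the kernel value between two scaled four-feature rows. Testing values outside the training range fall outside [0,1] in this sketch.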

pValid is written in Python 2.7, so a Python 2.7 environment is required to run it. pDeep is used to predict theoretical spectra, from which the third and fourth features are computed, so tensorflow (version 0.12.1) and keras (version 1.2.1) must also be installed.

Figure 1. Training and testing workflow of pValid. Green arrows mark the steps used in the training workflow. Orange arrows mark the steps used in the testing workflow. Purple arrows mark the steps used in the practical usage workflow.




Supplementary Files

pValid is currently free to use. Download pValid.