pFind Studio: a computational solution for mass spectrometry-based proteomics
2019
Molecular & Cellular Proteomics 2019. Zhang, S et al.
Fudan Univ, Zhongshan Hosp, Liver Canc Inst, Shanghai 200032, Peoples R China.
ABSTRACT: N-glycosylation alteration has been reported in liver diseases. Characterizing N-glycopeptides that correspond to N-glycan structure with specific site information enables better understanding of the molecular pathogenesis of liver damage and cancer. Here, unbiased quantification of N-glycopeptides of a cluster of serum glycoproteins with 40-55 kDa molecular weight (40-kDa band) was investigated in hepatitis B virus (HBV)-related liver diseases. We used an N-glycopeptide method based on 18O/16O C-terminal labeling to obtain 82 comparisons of serum from patients with HBV-related hepatocellular carcinoma (HCC) and liver cirrhosis (LC). Then, multiple reaction monitoring (MRM) was performed to quantify N-glycopeptides relative to the protein content, especially in the healthy donor-HBV-LC-HCC cascade. TPLTAN(205)ITK (H5N5S1F1) and (H5N4S2F1), corresponding to glycopeptides of IgA2, were significantly elevated in serum from patients with HBV infection and even higher in HBV-related LC patients, compared with healthy donors. In contrast, the two glycopeptides of IgA2 fell back down in HBV-related HCC patients. In addition, the variation in the abundance of the two glycopeptides was not caused by changes in the protein concentration. The altered N-glycopeptides might be part of a unique glycan signature indicating an IgA-mediated mechanism and providing potential diagnostic clues in HBV-related liver diseases.
Use: pGlyco; pQuant
Scientific Reports 2019. Basharat, Abdul Rehman et al.
Lahore Univ Management Sci, Dept Biol, Biomed Informat Res Lab, Lahore, Pakistan
ABSTRACT: Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, limited software support is currently available in the form of search algorithms and pipelines. To address this need, we propose 'SPECTRUM', an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces to access the aforementioned functionalities and visualization of results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRLSPECTRUM/.
Use: pTop
Nature Methods 2019. Gessulat, S et al.
Tech Univ Munich, Chair Prote & Bioanalyt, Freising Weihenstephan, Germany.
ABSTRACT: In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10x lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.
Use: pDeep
Molecular & Cellular Proteomics 2019. Guan, SH et al.
Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada.
ABSTRACT: Deep learning models for prediction of three key LC-MS/MS properties from peptide sequences were developed. The LC-MS/MS properties or behaviors are indexed retention times (iRT), MS1 or survey scan charge state distributions, and sequence ion intensities of HCD spectra. A common core deep supervised learning architecture, the bidirectional long short-term memory (LSTM) recurrent neural network, was used to construct the three prediction models. Two featurization schemes were proposed and demonstrated to allow for efficient encoding of modifications. The iRT and charge state distribution models were each trained with on the order of 10^5 data points. An HCD sequence ion prediction model was trained with 2 x 10^6 experimental spectra. The iRT prediction model and HCD sequence ion prediction model provide improved accuracies over the state-of-the-art models available in the literature. The MS1 charge state distribution prediction model offers excellent performance. The prediction models can be used to enhance peptide identification and quantification in data-dependent acquisition and data-independent acquisition (DIA) experiments as well as to assist MRM (multiple reaction monitoring) and PRM (parallel reaction monitoring) experiment design.
Use: pDeep
BMC Genomics 2019. Lin, YM et al.
Natl Chengchi Univ, Dept Comp Sci, Taipei 11605, Taiwan.
ABSTRACT: Background: Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search. Results: We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep. Conclusions: We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data into the deep learning model may improve performance.
Use: pDeep
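NOTE: The abstract above compares predicted and experimental MS2 spectra by cosine similarity and Pearson correlation coefficient. A minimal sketch of how these two metrics might be computed is given below. It assumes each spectrum has already been reduced to a vector of intensities for the same ordered set of fragment ions; the function names and example intensities are hypothetical and are not taken from MS2CNN, MS2PIP, or pDeep.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation coefficient (0.0 if either vector has zero variance)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [x - mean_a for x in a]
    dev_b = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical relative intensities for the same five fragment ions.
predicted = [0.10, 0.80, 0.00, 0.45, 1.00]
observed = [0.15, 0.70, 0.05, 0.50, 1.00]

print("cosine similarity:", cosine_similarity(predicted, observed))
print("Pearson correlation:", pearson_correlation(predicted, observed))
```

Both metrics are insensitive to the overall intensity scale. Pearson correlation additionally subtracts the mean intensity, so the two can rank the same pair of spectra differently.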