pFind Studio: a computational solution for mass spectrometry-based proteomics
2023
Nature Communications 2023. Sun, Weiping et al.
Bioinformatics Solutions Inc., Waterloo, Ontario, Canada
ABSTRACT: Here we present GlycanFinder, a database search and de novo sequencing tool for the analysis of intact glycopeptides from mass spectrometry data. GlycanFinder integrates peptide-based and glycan-based search strategies to address the challenge of complex fragmentation of glycopeptides. A deep learning model is designed to capture glycan tree structures and their fragment ions for de novo sequencing of glycans that do not exist in the database. We performed extensive analyses to validate the false discovery rates (FDRs) at both peptide and glycan levels and to evaluate GlycanFinder based on comprehensive benchmarks from previous community-based studies. Our results show that GlycanFinder achieved comparable performance to other leading glycoproteomics software tools in terms of both FDR control and the number of identifications. Moreover, GlycanFinder was also able to identify glycopeptides not found in existing databases. Finally, we conducted a mass spectrometry experiment for antibody N-linked glycosylation profiling that could distinguish isomeric peptides and glycans in four immunoglobulin G subclasses, which had been a challenging problem for previous studies.
Use: pGlyco; pDeep
Scientific Reports 2023. Yoondam Seo et al.
Doping Control Center, Korea Institute of Science and Technology, Hwarang-ro 14-gil, Seongbuk-gu, Seoul, 02792, Republic of Korea
ABSTRACT: Erythropoietin (EPO) is a glycoprotein hormone that stimulates red blood cell production. It is produced naturally in the body and is used to treat patients with anemia. Recombinant EPO (rEPO) is used illicitly in sports to improve performance by increasing the blood's capacity to carry oxygen. The World Anti-Doping Agency has therefore prohibited the use of rEPO. In this study, we developed a bottom-up mass spectrometric method for profiling the site-specific N-glycosylation of rEPO. We revealed that intact glycopeptides have a site-specific tetra-sialic glycan structure. Using this structure as an exogenous marker, we developed a method for use in doping studies. The profiling of rEPO N-glycopeptides revealed the presence of tri- and tetra-sialylated N-glycopeptides. By selecting a peptide with a tetra-sialic acid structure as the target, its limit of detection (LOD) was estimated to be <500 pg/mL. Furthermore, we confirmed the detection of the target rEPO glycopeptide using three other rEPO products. We additionally validated the linearity, carryover, selectivity, matrix effect, LOD, and intraday precision of this method. To the best of our knowledge, this is the first report of a doping analysis using liquid chromatography/mass spectrometry-based detection of the rEPO glycopeptide with a tetra-sialic acid structure in human urine samples.
Use: pGlyco
Analytical Chemistry 2023. Moran Chen et al.
Wuhan Univ, Inst Adv Studies, Wuhan 430072, Hubei, Peoples R China
ABSTRACT: Four-dimensional (4D) data-independent acquisition (DIA)-based proteomics is a promising technology. However, its full performance is restricted by the time-consuming building and limited coverage of a project-specific experimental library. Herein, we developed a versatile multifunctional deep learning model, Deep4D, based on self-attention that could predict the collisional cross section, retention time, fragment ion intensity, and charge state with high accuracies for both the unmodified and phosphorylated peptides and thus established the complete workflows for high-coverage 4D DIA proteomics and phosphoproteomics based on multidimensional predictions. A 4D predicted library containing approximately 2 million peptides was established that could realize experimental library-free DIA analysis, and 33% more proteins were identified than using an experimental library of single-shot measurement in the example of HeLa cells. These results show the great value of the convenient high-coverage 4D DIA proteomics methods.
Use: pDeep
Journal of Proteome Research 2023. Jianbai Ye et al.
MoE Key Laboratory of Brain-inspired Intelligent Perception and Cognition, University of Science and Technology of China, Hefei, Anhui 230026, China
ABSTRACT:
Use: pDeep
Journal of Proteome Research 2023. Lewis Y. Geer et al.
Natl Inst Stand & Technol, Mass Spectrometry Data Ctr, Biomol Measurement Div, Gaithersburg, MD 20899 USA
ABSTRACT: The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
Use: pDeep
2023. Charlotte Adams et al.
Laboratory of Protein Science, Proteomics and Epigenetic Signaling (PPES), Department of Biomedical Sciences, University of Antwerp, Antwerp,
ABSTRACT:
Use: pDeep
eLife 2023. Ballmer, Daniel et al.
Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, United Kingdom, The Wellcome Centre for Cell Biology, Institute of Cell Biology, School of Biological Sciences, Edinburgh, EH9 3BF, United Kingdom
ABSTRACT:
Use: pFind; pDeep; pLink