pFind Studio: a computational solution for mass spectrometry-based proteomics



2022




Highly robust de novo full-length protein sequencing
Analytical Chemistry, 2022. Mai, NB et al. Jinan Univ, Key Lab Funct Prot Res Guangdong Higher Educ Inst, Guangzhou 510632, Peoples R China; Jinan Univ, MOE Key Lab Tumor Mol Biol, Inst Life & Hlth Engn, Guangzhou 510632, Peoples R China
ABSTRACT:Accurate full-length sequencing of a purified unknown protein is still challenging nowadays due to the error-prone mass-spectrometry (MS)-based methods. De novo identified peptide sequence largely contain errors, undermining the accuracy of assembly. Bias on the detectability of the peptides also makes low-coverage regions, resulting in gaps. Although recent advances on multi-enzyme hydrolysis and algorithms showed complete assembly of full-length protein sequences in a few examples, the robustness in practical application is still to be improved. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy to assemble protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize the bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep, i.e., contig-scaffold manner, with error correction in each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. We demonstrated that our strategy on three proteins and three replications all reached 100% coverage (except one with 98.85%) and 98.69-100% accuracy. It can also efficiently deal with the membrane protein, although the transmembrane region was missing due to the limitation of the MS. The three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provided a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
Use: pNovo; pParse
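
The contig-extension idea mentioned in the abstract above can be illustrated with a toy greedy overlap merge. This is only a minimal sketch and not the MuCS algorithm: it assumes error-free peptide strings, performs no error correction or scaffolding, and the names overlap, greedy_contigs, and min_len are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len residues exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_contigs(peptides, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap.

    Assumption: peptides are error-free strings of residue letters.
    """
    # Remove exact duplicates while keeping the input order.
    contigs = list(dict.fromkeys(peptides))
    # Drop peptides fully contained in another peptide.
    contigs = [p for p in contigs
               if not any(p != q and p in q for q in contigs)]
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_contigs(["ACDEFG", "EFGHIK", "HIKLMN", "DEF"]))
```

Peptides that share no sufficiently long overlap remain separate contigs, which is the kind of gap a scaffolding step would then need to bridge.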



Diversity matters: optimal collision energies for tandem mass spectrometric analysis of a large set of N-glycopeptides
Journal of Proteome Research, 2022. Hever, H et al. Res Ctr Nat Sci, Eotvos Lorand Res Network, MS Prote Res Grp, H-1117 Budapest, Hungary
ABSTRACT:Identification and characterization of N-glycopeptides from complex samples are usually based on tandem mass spectrometric measurements. Experimental settings, especially the collision energy selection method, fundamentally influence the obtained fragmentation pattern and hence the confidence of the database search results ("score"). Using standards of naturally occurring glycoproteins, we mapped the Byonic and pGlyco search engine scores of almost 200 individual N-glycopeptides as a function of collision energy settings on a quadrupole time of flight instrument. The resulting unprecedented amount of peptide-level information on such a large and diverse set of N-glycopeptides revealed that the peptide sequence heavily influences the energy for the highest score on top of an expected general linear trend with m/z. Search engine dependence may also be noteworthy. Based on the trends, we designed an experimental method and tested it on HeLa, blood plasma, and monoclonal antibody samples. As compared to the literature, these notably lower collision energies in our workflow led to 10-50% more identified N-glycopeptides, with higher scores. We recommend a simple approach based on a small set of reference N-glycopeptides easily accessible from glycoprotein standards to ease the precise determination of optimal methods on other instruments. Data sets can be accessed via the MassIVE repository (MSV000089657 and MSV000090218).
Use: pGlyco
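
The "general linear trend with m/z" described in the abstract above suggests calibrating collision energy from a few reference glycopeptides with a straight-line fit. The following is a minimal sketch under that assumption; the function names are hypothetical and the reference values are invented placeholders, not measurements from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_energy(mz, slope, intercept):
    """Collision energy suggested by the fitted line for a precursor m/z."""
    return slope * mz + intercept


if __name__ == "__main__":
    # Hypothetical reference points (precursor m/z, best-scoring energy in eV).
    ref_mz = [600.0, 800.0, 1000.0, 1200.0]
    ref_energy = [20.0, 26.0, 32.0, 38.0]
    slope, intercept = fit_line(ref_mz, ref_energy)
    print(predict_energy(900.0, slope, intercept))
```

Because the abstract reports that peptide sequence shifts the optimum on top of this trend, a fit like this gives only a starting point; per-peptide residuals from the line would show how large that sequence effect is.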



Directed evolution of adeno-associated virus 5 capsid enables specific liver tropism
Molecular Therapy-Nucleic Acids, 2022. Wang, YQ et al. East China Univ Sci & Technol, Sch Bioengn, Shanghai 200237, Peoples R China; East China Univ Sci & Technol, Sch Pharm, Shanghai 200237, Peoples R China
ABSTRACT:Impressive achievements in clinical trials to treat hemophilia establish a milestone in the development of gene therapy. It highlights the significance of AAV-mediated gene delivery to liver. AAV5 is a unique serotype featured by low neutralizing antibody prevalence. Nevertheless, its liver infectivity is relatively weak. Consequently, it is vital to exploit novel AAV5 capsid mutants with robust liver tropism. To this aim, we performed AAV5-NNK library and barcode screening in mice, from which we identified one capsid variant, called AAVzk2. AAVzk2 displayed a similar yield but divergent post-translational modification sites compared with wild-type serotypes. Mice intravenously injected with AAVzk2 demonstrated stronger liver transduction than AAV5, roughly comparable with AAV8 and AAV9, with undetectable transduction of other tissues or organs such as heart, lung, spleen, kidney, brain, and skeletal muscle, indicating a liver-specific tropism. Further studies showed a superior human hepatocellular transduction of AAVzk2 to AAV5, AAV8 and AAV9, whereas the seroreactivity of AAVzk2 was as low as AAV5. Overall, we provide novel AAV serotype that facilitates a robust and specific liver gene delivery to a large population, especially those unable to be treated by AAV8 and AAV9.
Use: pGlyco



TMT-based multiplexed quantitation of N-glycopeptides reveals glycoproteome remodeling induced by oncogenic mutations
ACS Omega, 2022. Saraswat, M et al. Mayo Clin, Dept Lab Med & Pathol, Rochester, MN 55905 USA; Inst Bioinformat, Bangalore 560066, Karnataka, India; Manipal Acad Higher Educ MAHE, Manipal 576104, Karnataka, India; Natl Inst Mental Hlth & Neurosci NIMHANS, Ctr Mol Med, Bangalore 560029, Karnataka, India; Mayo Clin, Ctr Individualized Med, Rochester, MN 55905 USA
ABSTRACT:Glycoproteomics, or the simultaneous characterization of glycans and their attached peptides, is increasingly being employed to generate catalogs of glycopeptides on a large scale. Nevertheless, quantitative glycoproteomics remains challenging even though isobaric tagging reagents such as tandem mass tags (TMT) are routinely used for quantitative proteomics. Here, we present a workflow that combines the enrichment or fractionation of TMT-labeled glycopeptides with size-exclusion chromatography (SEC) for an in-depth and quantitative analysis of the glycoproteome. We applied this workflow to study the cellular glycoproteome of an isogenic mammary epithelial cell system that recapitulated oncogenic mutations in the PIK3CA gene, which codes for the phosphatidylinositol-3-kinase catalytic subunit. As compared to the parental cells, cells with mutations in exon 9 (E545K) or exon 20 (H1047R) of the PIK3CA gene exhibited site-specific glycosylation alterations in 464 of the 1999 glycopeptides quantified. Our strategy led to the discovery of site-specific glycosylation changes in PIK3CA mutant cells in several important receptors, including cell adhesion proteins such as integrin beta-6 and CD166. This study demonstrates that the SEC-based enrichment of glycopeptides is a simple and robust method with minimal sample processing that can easily be coupled with TMT-labeling for the global quantitation of glycopeptides.
Use: pGlyco



GlycAP, a glycoproteomic analysis platform for site-specific N-glycosylation research
International Journal of Mass Spectrometry, 2022. Wu, Mengxi et al. Fudan Univ, Inst Biomed Sci, Shanghai, Peoples R China; Fudan Univ, Shanghai Peoples Hosp 5, Dept Chem, Shanghai, Peoples R China
ABSTRACT:Protein glycosylation is of great importance for its strong association with various diseases. Mass spectrometry-based site-specific glycoproteome methods with efficient interpretation software tools have become powerful strategies for glycosylation research. However, the lack of bioinformatics tools for automatic analysis of the interpretation data hinders further exploration. Here, we developed a comprehensive N-glycoproteomic analysis platform called GlycAP, which is embedded with different analytical modules, including qualitative analysis, quantitative analysis, functional analysis, and clinical analysis. The qualitative analysis module was designed for the qualitative statistical analysis and heterogeneity analysis. The quantitative analysis module supports the discovery of differential glycopeptides between different groups with quantitative results. The functional analysis and clinical analysis modules could help users go a step further and find potential biomarkers from differential glycopeptides. GlycAP, which is freely available online (https://project.omicsolution.com/GlycAP/), is a useful tool for facilitating glycoproteomics in site-specific glycosylation exploration.
Use: pGlyco



N-glycoproteomics reveals distinct glycosylation alterations in NGLY1-deficient patient-derived dermal fibroblasts
Journal of Inherited Metabolic Disease, 2022. Budhraja, R et al. Mayo Clin, Dept Clin Genom, 200 First St SW, Rochester, MN 55905 USA; Mayo Clin, Dept Lab Med & Pathol, 200 First St SW, Rochester, MN 55905 USA
ABSTRACT:Congenital disorders of glycosylation are genetic disorders that occur due to defects in protein and lipid glycosylation pathways. A deficiency of N-glycanase 1, encoded by the NGLY1 gene, results in a congenital disorder of deglycosylation. The NGLY1 enzyme is mainly involved in cleaving N-glycans from misfolded, retro-translocated glycoproteins in the cytosol from the endoplasmic reticulum before their proteasomal degradation or activation. Despite the essential role of NGLY1 in deglycosylation pathways, the exact consequences of NGLY1 deficiency on global cellular protein glycosylation have not yet been investigated. We undertook a multiplexed tandem mass tags-labeling-based quantitative glycoproteomics and proteomics analysis of fibroblasts from NGLY1-deficient individuals carrying different biallelic pathogenic variants in NGLY1. This quantitative mass spectrometric analysis detected 8041 proteins and defined a proteomic signature of differential expression across affected individuals and controls. Proteins that showed significant differential expression included phospholipid phosphatase 3, stromal cell-derived factor 1, collagen alpha-1 (IV) chain, hyaluronan and proteoglycan link protein 1, and thrombospondin-1. We further detected a total of 3255 N-glycopeptides derived from 550 glycosylation sites of 407 glycoproteins by multiplexed N-glycoproteomics. Several extracellular matrix glycoproteins and adhesion molecules showed altered abundance of N-glycopeptides. Overall, we observed distinct alterations in specific glycoproteins, but our data revealed no global accumulation of glycopeptides in the patient-derived fibroblasts, despite the genetic defect in NGLY1. Our findings highlight new molecular and system-level insights for understanding NGLY1-CDDG.
Use: pGlyco



Multiattribute glycan identification and FDR control for glycoproteomics
Molecular & Cellular Proteomics, 2022. Polasky, Daniel A. et al. Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA; Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA
ABSTRACT:Rapidly improving methods for glycoproteomics have enabled increasingly large-scale analyses of complex glycopeptide samples, but annotating the resulting mass spectrometry data with high confidence remains a major bottleneck. We recently introduced a fast and sensitive glycoproteomics search method in our MSFragger search engine, which reports glycopeptides as a combination of a peptide sequence and the mass of the attached glycan. In samples with complex glycosylation patterns, converting this mass to a specific glycan composition is not straightforward, however, as many glycans have similar or identical masses. Here, we have developed a new method for determining the glycan composition of N-linked glycopeptides fragmented by collisional or hybrid activation that uses multiple sources of information from the spectrum, including observed glycan B-type (oxonium) and Y-type ions and mass and precursor monoisotopic selection errors to discriminate between possible glycan candidates. Combined with false discovery rate estimation for the glycan assignment, we show that this method is capable of specifically and sensitively identifying glycans in complex glycopeptide analyses and effectively controls the rate of false glycan assignments. The new method has been incorporated into the PTM-Shepherd modification analysis tool to work directly with the MSFragger glyco search in the FragPipe graphical user interface, providing a complete computational pipeline for annotation of N-glycopeptide spectra with false discovery rate control of both peptide and glycan components that is both sensitive and robust against false identifications.
Use: pGlyco



Exploration of quantitative site-specific serum O-glycoproteomics with isobaric labeling for the discovery of putative O-glycoprotein biomarkers
PROTEOMICS - Clinical Applications, 2022. Zhang, Zihan et al. Shanghai Jiao Tong Univ, Sch Med, Shanghai Peoples Hosp 9, Dept Plast & Reconstruct Surg, Shanghai 200011, Peoples R China; Tongji Univ, Shanghai Key Lab Chem Assessment & Sustainabil, Sch Chem Sci & Engn, Shanghai 200092, Peoples R China
ABSTRACT:Purpose: Exploration study of site-specific isobaric-TMT-labeling quantitative serum O-glycoproteomics for the discovery of putative O-glycoprotein cancer biomarkers. Experimental design: Sera of 10 breast cancer patients was used as the exploration cohort. More abundant N-glycosylation was first removed with PNGase F. After tryptic digestion of de-N-glycosylated serum proteome, the TMT-labeled O-glycopeptides mixture was prepared and analyzed with RPLC-MS/MS. Site-specific qualitative and quantitative database search of O-glycopeptides was carried out with pGlyco 3.0. The same raw datasets were also searched with intact N-glycopeptide search engine GPSeeker to exclude possible interference of N-glycosylation. The final IDs were checked manually with GlcNAc-containing glycosite-determining fragment ions for confirmation. Results: With the control of spectrum-level FDR <= 1% and manual validation, 299 O-glycopeptides corresponding to 83 O-glycosites and 66 O-glycoproteins were identified, and 13 O-glycopeptides were found differentially expressed. Most interestingly, differential O-glycosylation was observed for IgG1 and IgG3, which is an interesting putative biomarker panel. Conclusion and clinical relevance: Isobaric-labeling site-specific quantitative O-glycoproteomics is currently a state-of-the-art instrumental platform for discovery of putative seral cancer biomarkers. Differential seral O-glycosylation was observed in the IgG1 and IgG3.
Use: pGlyco
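
The "spectrum-level FDR <= 1%" filter mentioned in the abstract above is commonly implemented with a target-decoy estimate. The sketch below assumes that approach (the abstract does not say how the FDR was computed), estimates FDR as decoys divided by targets above a score cutoff, and ignores score ties; the function name and input layout are hypothetical and do not reflect pGlyco's internals.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    matches: iterable of (score, is_decoy) pairs, one per spectrum match.
    FDR at a cutoff is estimated as (#decoys / #targets) among matches
    scoring at or above it. Returns None if no cutoff qualifies.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays in bounds.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


if __name__ == "__main__":
    demo = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
    print(score_cutoff_at_fdr(demo, max_fdr=0.01))
```

Matches scoring at or above the returned cutoff would be kept; in the study above, the surviving identifications were additionally validated manually.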