pFind Studio: a computational solution for mass spectrometry-based proteomics



2021




Proteogenomics Study of Blastobotrys adeninivorans TMCC 70007: A Dominant Yeast in the Fermentation Process of Pu-erh Tea
Journal of Proteome Research, 2021. Tian, F et al. Yunnan Univ, Sch Life Sci, Yunnan Inst Microbiol, Minist Educ, Key Lab Microbial Divers Southwest Ch, Kunming 650091, Yunnan, Peoples R China.
ABSTRACT: Blastobotrys adeninivorans plays an essential role in pile-fermenting of Pu-erh tea. Its ability to assimilate various carbon and nitrogen sources makes it available for application in a wide range of industry sectors. The genome of B. adeninivorans TMCC 70007 isolated from pile-fermented Pu-erh tea was sequenced and assembled. Proteomics analysis indicated that 4900 proteins in TMCC 70007 were expressed under various culture conditions. Proteogenomics mapping revealed 48 previously unknown genes and corrected 118 gene models predicted by GeneMark-ES. Ortho-proteogenomics analysis also identified 17 previously unidentified genes in B. adeninivorans LS3, the first strain with a sequenced genome in the genus Blastobotrys. More importantly, five species-specific genes were identified from TMCC 70007, which could serve as a barcode for strain typing and were applicable for fermentation process protection of this industrial species. The datasets generated from tea aqueous extract culture not only increased the proteome coverage and accuracy but also contributed to the identification of proteins related to polyphenols and caffeine, which were considered to change greatly during the microbial fermentation of Pu-erh tea. This study provides a proteome perspective on TMCC 70007, which was considered to be an important strain in the production of Pu-erh tea. The systematic proteogenomics analysis not only improved the annotation of the genome of B. adeninivorans TMCC 70007, as in previous proteogenomics studies, but also provided a solution for fermentation process protection of valuable industrial species using species-specific genes uniquely identified from the proteogenomics study.
Use: pFind



Mirror-Cutting-Based Digestion Strategy Enables the In-Depth and Accuracy Characterization of N-Linked Protein Glycosylation
Journal of Proteome Research, 2021. Chen, Y et al. Chinese Acad Sci, Dalian Inst Chem Phys, CAS Key Lab Separat Sci Analyt Chem, Dalian 116023, Peoples R China.
ABSTRACT: N-linked glycosylation plays important roles in multiple physiological and pathological processes, while the analysis coverage is still limited due to the insufficient digestion of glycoproteins, as well as incomplete ion fragments for intact glycopeptide determination. Herein, a mirror-cutting-based digestion strategy was proposed by combining two orthogonal proteases, LysargiNase and trypsin, to characterize the macro- and micro-heterogeneity of protein glycosylation. Using the above two proteases, the b- or y-ion series of peptide sequences were, respectively, enhanced in MS/MS, generating complementary spectra for peptide sequence identification. More than 27% (489/1778) of the site-specific glycoforms identified by LysargiNase digestion were not covered by trypsin digestion, suggesting the elevated coverage of protein sequences and site-specific glycoforms by the mirror-cutting method. In total, 10,935 site-specific glycoforms were identified from mouse brain tissues in the 18 h MS analysis, which significantly enhanced the coverage of protein glycosylation. Intriguingly, 27 mannose-6-phosphate (M6P) glycoforms were determined with core fucosylation, and 23 of them were found with the "Y-HexNAc-Fuc" ions after manual checking. This is hitherto the first report of M6P and fucosylation co-modifications of glycopeptides, whose mechanism and function still need further exploration. The mirror-cutting digestion strategy also has great application potential in the exploration of missing glycoproteins from other complex samples to provide rich resources for glycobiology research.
Use: pFind; pGlyco



DIA-based proteomics identifies IDH2 as a targetable regulator of acquired drug resistance in chronic myeloid leukemia
Molecular & Cellular Proteomics, 2021. Liu, W et al. Dalian Med Univ, Coll Pharm, Dept Clin Pharmacol, Dalian, Liaoning, Peoples R China; Westlake Univ, Sch Life Sci, Key Lab Struct Biol Zhejiang Prov, Hangzhou, Zhejiang, Peoples R China; Westlake Lab Life Sci & Biomed, Ctr Infect Dis Res, Hangzhou, Zhejiang, Peoples R China; Westlake Inst Adv Study, Inst Basic Med Sci, Hangzhou, Zhejiang, Peoples R China
ABSTRACT: Drug resistance is a critical obstacle to effective treatment in patients with chronic myeloid leukemia. To understand the underlying resistance mechanisms in response to imatinib mesylate (IMA) and adriamycin (ADR), the parental K562 cells were treated with low doses of IMA or ADR for 2 months to generate derivative cells with mild, intermediate, and severe resistance to the drugs, defined by their increasing resistance index. PulseDIA-based (DIA [data-independent acquisition]) quantitative proteomics was then employed to reveal the proteome changes in these resistant cells. In total, 7082 proteins from 98,232 peptides were identified and quantified from the dataset using four DIA software tools including OpenSWATH, Spectronaut, DIA-NN, and EncyclopeDIA. The sirtuin signaling pathway was found to be significantly enriched in both ADR-resistant and IMA-resistant K562 cells. In particular, isocitrate dehydrogenase (NADP(+)) 2 was identified as a potential drug target correlated with the drug resistance phenotype, and inhibition by the antagonist AGI-6780 reversed the acquired resistance in K562 cells to either ADR or IMA. Together, our study has implicated isocitrate dehydrogenase (NADP(+)) 2 as a potential target that can be therapeutically leveraged to alleviate the drug resistance.
Use: pFind



Proteogenomics Integrating Novel Junction Peptide Identification Strategy Discovers Three Novel Protein Isoforms of Human NHSL1 and EEF1B2
Journal of Proteome Research, 2021. He, Cuitong et al. Peking-Tsinghua Centre for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, 100871 Beijing, China
ABSTRACT: In eukaryotes, alternative pre-mRNA splicing allows a single gene to encode different protein isoforms that function in many biological processes, and they are used as biomarkers or therapeutic targets for diseases. Although protein isoforms in the human genome are well annotated, we speculate that some low-abundance protein isoforms may still be under-annotated because most genes have a primary coding product and alternative protein isoforms tend to be under-expressed. A peptide coencoded by a novel exon and an annotated exon separated by an intron is known as a novel junction peptide. In the absence of known transcripts and homologous proteins, traditional whole-genome six-frame translation-based proteogenomics cannot identify novel junction peptides, and it cannot capture novel alternative splice sites. In this article, we first propose a strategy and tool for identifying novel junction peptides, called CJunction, which we then integrate into a proteogenomics process specifically designed for novel protein isoform discovery and apply to the analysis of a deep-coverage HeLa mass spectrometry data set with identifier PXD004452 in ProteomeXchange. We succeeded in identifying and validating three novel protein isoforms of two functionally important genes, NHSL1 (causative gene of Nance-Horan syndrome) and EEF1B2 (translation elongation factor), which validates our hypothesis. These novel protein isoforms have significant sequence differences from the annotated gene-coding products, introduced by the novel N-terminus, suggesting that they may play importantly different roles.
Use: pFind



An Integrated Strategy Reveals Complex Glycosylation of Erythropoietin Using Mass Spectrometry
Journal of Proteome Research, 2021. Guan, YD et al. Univ Med Ctr Hamburg Eppendorf, Inst Clin Chem & Lab Med, Sect Mass Spectrometr Prote, D-20246 Hamburg, Germany.
ABSTRACT: The characterization of therapeutic glycoproteins is challenging due to the structural heterogeneity of the therapeutic protein glycosylation. This study presents an in-depth analytical strategy for glycosylation of first-generation erythropoietin (epoetin beta), including a developed mass spectrometric workflow for N-glycan analysis, bottom-up mass spectrometric methods for site-specific N-glycosylation, and an LC-MS approach for O-glycan identification. Permethylated N-glycans, peptides, and enriched glycopeptides of erythropoietin were analyzed by nanoLC-MS/MS, and de-N-glycosylated erythropoietin was measured by LC-MS, enabling the qualitative and quantitative analysis of glycosylation and different glycan modifications (e.g., phosphorylation and O-acetylation). The newly developed Python scripts enabled the identification of 140 N-glycan compositions (237 N-glycan structures) from erythropoietin, notably including 8 phosphorylated N-glycan species. The site-specificity of N-glycans was revealed at the glycopeptide level by pGlyco software using different proteases. In total, 114 N-glycan compositions were identified from glycopeptide analysis. Moreover, LC-MS analysis of de-N-glycosylated erythropoietin species identified two O-glycan compositions based on the mass shifts between non-O-glycosylated and O-glycosylated species. Finally, this integrated strategy was shown to achieve in-depth glycosylation analysis of a therapeutic glycoprotein, which helps to understand its pharmacological properties and improve the manufacturing processes.
Use: pFind; pGlyco



Potential Use of Serum Proteomics for Monitoring COVID-19 Progression to Complement RT-PCR Detection
Journal of Proteome Research, 2021. Zhang, Y et al. Wenzhou Med Univ, Taizhou Hosp, Linhai 317000, Zhejiang, Peoples R China; Westlake Univ, Sch Life Sci, Key Lab Struct Biol Zhejiang Prov, Hangzhou 310000, Zhejiang, Peoples R China; Westlake Lab Life Sci & Biomed, Ctr Infect Dis Res, Hangzhou 310000, Zhejiang, Peoples R China; Westlake Inst Adv Study, Inst Basic Med Sci, Hangzhou 310000, Zhejiang, Peoples R China
ABSTRACT: RT-PCR is the primary method to diagnose COVID-19 and is also used to monitor the disease course. This approach, however, suffers from false negatives due to RNA instability and poses a high risk to medical practitioners. Here, we investigated the potential of using serum proteomics to predict viral nucleic acid positivity during COVID-19. We analyzed the proteome of 275 inactivated serum samples from 54 out of 144 COVID-19 patients and shortlisted 42 regulated proteins in the severe group and 12 in the non-severe group. Using these regulated proteins and several key clinical indexes, including days after symptoms onset, platelet counts, and magnesium, we developed two machine learning models to predict nucleic acid positivity, with an AUC of 0.94 in severe cases and 0.89 in non-severe cases, respectively. Our data suggest the potential of using a serum protein-based machine learning model to monitor COVID-19 progression, thus complementing swab RT-PCR tests. More efforts are required to promote this approach into clinical practice since mass spectrometry-based protein measurement is not currently widely accessible in the clinic.
Use: pFind; pDeep



Exploring the Microbiome-Wide Lysine Acetylation, Succinylation, and Propionylation in Human Gut Microbiota
Analytical Chemistry, 2021. Zhang, X et al. Univ Ottawa, Fac Med, Ottawa Inst Syst Biol, Ottawa, ON K1H 8M5, Canada.
ABSTRACT: Lysine acylations are important post-translational modifications that are present in both eukaryotes and prokaryotes and regulate diverse cellular functions. Our knowledge of the microbiome lysine acylation remains limited due to the lack of efficient analytical and bioinformatics methods for complex microbial communities. Here, we show that the serial enrichment using motif antibodies successfully captures peptides containing lysine acetylation, propionylation, and succinylation from human gut microbiome samples. A new bioinformatic workflow consisting of an unrestricted database search confidently identified >60,000 acetylated, and ~20,000 propionylated and succinylated gut microbial peptides. The characterization of these identified modification-specific metaproteomes, i.e., meta-PTMomes, demonstrates that lysine acylations are differentially distributed in microbial species with different metabolic capabilities. This study provides an analytical framework for the study of lysine acylations in the microbiome, which enables functional microbiome studies at the post-translational level.
Use: pFind



Discovery and visualization of uncharacterized drug-protein adducts using mass spectrometry
Analytical Chemistry, 2021. Riffle, M et al. Univ Washington, Dept Biochem, Seattle, WA 98195 USA
ABSTRACT: Drugs are often metabolized to reactive intermediates that form protein adducts. Adducts can inhibit protein activity, elicit immune responses, and cause life-threatening adverse drug reactions. The masses of reactive metabolites are frequently unknown, rendering traditional mass spectrometry-based proteomics approaches incapable of adduct identification. Here, we present Magnum, an open-mass search algorithm optimized for adduct identification, and Limelight, a web-based data processing package for analysis and visualization of data from all existing algorithms. Limelight incorporates tools for sample comparisons and xenobiotic-adduct discovery. We validate our tools with three drug/protein combinations and apply our label-free workflow to identify novel xenobiotic-protein adducts in CYP3A4. Our new methods and software enable accurate identification of xenobiotic-protein adducts with no prior knowledge of adduct masses or protein targets. Magnum outperforms existing label-free tools in xenobiotic-protein adduct discovery, while Limelight fulfills a major need in the rapidly developing field of open-mass searching, which until now lacked comprehensive data visualization tools.
Use: pFind



Site-Specific N- and O-Glycosylation Analysis of Human Plasma Fibronectin
Frontiers in Chemistry, 2021. Liu, D et al. Georgia State Univ, Dept Chem, Atlanta, GA 30303 USA.
ABSTRACT: Human plasma fibronectin is an adhesive protein that plays a crucial role in wound healing. Many studies have indicated that glycans might mediate the expression and functions of fibronectin, yet a comprehensive understanding of its glycosylation is still missing. Here, we performed a comprehensive N- and O-glycosylation mapping of human plasma fibronectin and quantified the occurrence of each glycoform in a site-specific manner. Intact N-glycopeptides were enriched by zwitterionic hydrophilic interaction chromatography, and N-glycosites were localized by the O-18-labeling method. O-glycopeptide enrichment and O-glycosite identification were achieved by an enzyme-assisted site-specific extraction method. An RP-LC-MS/MS system functionalized with collision-induced dissociation and stepped normalized collision energy (sNCE)-HCD tandem mass spectrometry was applied to analyze the glycoforms of fibronectin. A total of 6 N-glycosites and 53 O-glycosites were identified, which were occupied by 38 N-glycoforms and 16 O-glycoforms, respectively. Furthermore, 77.31% of N-glycans were sialylated, and O-glycosylation was dominated by the sialyl-T antigen. These site-specific glycosylation patterns on human fibronectin can facilitate functional analyses of fibronectin and therapeutics development.
Use: pFind; pGlyco