pFind Studio: a computational solution for mass spectrometry-based proteomics
2019
Nature Structural & Molecular Biology 2019. Liu, Haijun et al.
Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands
ABSTRACT:
Use: pFind; pDeep; pLink
Journal of Proteomics 2019. Ma, WT et al.
Hainan Univ, Inst Trop Agr & Forestry, Haikou 570228, Hainan, Peoples R China.
ABSTRACT:Peptide-spectrum match (PSM) scoring between the experimental and theoretical spectrum is a key step in the identification of proteins using mass spectrometry (MS)-based proteomics analyses. Efficient protein identification using MS/MS data remains a challenge. The strategy of using RNA-seq data increases the number of proteins identified by re-constructing the custom search database and integrating mRNA abundance into false discovery rate estimation after PSM scoring. However, this process lacks an algorithm that allows mRNA abundance to be incorporated into the key scoring model of PSM. Therefore, we developed a novel PSM scoring model, which incorporates mRNA abundance for improved peptide and protein identification. In the new algorithm, mRNA abundance information was transformed into the prior probability of protein identification and integrated into PSM re-scoring using the binomial probability distribution model. Compared with other algorithms on five MS/MS datasets (human, rat, zebrafish, yeast, and Arabidopsis thaliana), the minimum improvement ratios for peptides and protein groups were 3.39%-9.79% and 0.48%-8.16%, respectively, depending on the dataset. The new strategy offers an effective solution for MS-based identification of peptides and proteins. Significance: The new algorithm identifies proteins by quantifying mRNA abundance (FPKM) and incorporating it into a scoring model for peptide-spectrum matches. It is important to improve peptide and protein identification from MS/MS datasets in proteomics research.
Use: pFind
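The abstract above does not give the scoring formulas, so the following Python sketch only illustrates the general idea of combining a binomial match probability with an mRNA-derived prior. The function names, the log-scale FPKM mapping, the `floor` parameter, and the Bayes combination are all assumptions made for illustration, not the authors' published model.

```python
import math

def binomial_tail(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more random peak matches."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def fpkm_to_prior(fpkm: float, max_fpkm: float, floor: float = 0.05) -> float:
    """Hypothetical mapping of FPKM to a prior in [floor, 1 - floor] on a log scale.

    Assumes max_fpkm > 0.
    """
    scaled = math.log1p(fpkm) / math.log1p(max_fpkm)
    scaled = min(max(scaled, 0.0), 1.0)
    return floor + (1 - 2 * floor) * scaled

def rescored_psm(n: int, k: int, p: float, fpkm: float, max_fpkm: float) -> float:
    """Posterior probability that a PSM is correct (illustrative assumption).

    Simplification: the observed match count has likelihood 1 under a correct
    match and likelihood binomial_tail(n, k, p) under a random match.
    """
    p_random = binomial_tail(n, k, p)
    prior = fpkm_to_prior(fpkm, max_fpkm)
    return prior / (prior + (1 - prior) * p_random)

# Same spectrum evidence (8 of 20 theoretical peaks matched), different mRNA abundance.
low = rescored_psm(n=20, k=8, p=0.1, fpkm=0.5, max_fpkm=1000.0)
high = rescored_psm(n=20, k=8, p=0.1, fpkm=200.0, max_fpkm=1000.0)
print(f"low-abundance transcript:  {low:.6f}")
print(f"high-abundance transcript: {high:.6f}")
```

With identical spectral evidence, the PSM whose protein has the more abundant transcript receives the higher posterior in this sketch, which is the qualitative behaviour the abstract describes.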
Genome Research 2019. Shao, Yi et al.
Chinese Acad Sci, State Key Lab Integrated Management Pest Insects, Inst Zool, Beijing 100101, Peoples R China; Chinese Acad Sci, CAS Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China; Univ Chinese Acad Sci, Beijing 100049, Peoples R China; Chinese Acad Sci, Inst Zool, Key Lab Zool Systemat & Evolut, Beijing 100101, Peoples R China
ABSTRACT:The origination of new genes contributes to phenotypic evolution in humans. Two major challenges in the study of new genes are the inference of gene ages and annotation of their protein-coding potential. To tackle these challenges, we created GenTree, an integrated online database that compiles age inferences from three major methods together with functional genomic data for new genes. Genome-wide comparison of the age inference methods revealed that the synteny-based pipeline (SBP) is most suited for recently duplicated genes, whereas the protein-family-based methods are useful for ancient genes. For SBP-dated primate-specific protein-coding genes (PSGs), we performed manual evaluation based on published PSG lists and showed that SBP generated a conservative data set of PSGs by masking less reliable syntenic regions. After assessing the coding potential based on evolutionary constraint and peptide evidence from proteomic data, we curated a list of 254 PSGs with different levels of protein evidence. This list also includes 41 candidate misannotated pseudogenes that encode primate-specific short proteins. Coexpression analysis showed that PSGs are preferentially recruited into organs with rapidly evolving pathways such as spermatogenesis, immune response, mother-fetus interaction, and brain development. For brain development, primate-specific KRAB zinc-finger proteins (KZNFs) are specifically up-regulated in the mid-fetal stage, which may have contributed to the evolution of this critical stage. Altogether, hundreds of PSGs are either recruited to processes under strong selection pressure or to processes supporting an evolving novel organ.
Use: pFind
Molecular & Cellular Proteomics 2019. An, ZW et al.
Chinese Acad Sci, Acad Math & Syst Sci, Natl Ctr Math & Interdisciplinary Sci, Key Lab Random Complex Struct & Data Sci, Beijing 100190, Peoples R China.
ABSTRACT:The open (mass tolerant) search of tandem mass spectra of peptides shows great potential in the comprehensive detection of post-translational modifications (PTMs) in shotgun proteomics. However, this search strategy has not been widely used by the community, and one bottleneck is the lack of appropriate algorithms for automated and reliable post-processing of the coarse and error-prone search results. Here we present PTMiner, a software tool for confident filtering and localization of modifications (mass shifts) detected in an open search. After mass-shift-grouped false discovery rate (FDR) control of peptide-spectrum matches (PSMs), PTMiner uses an empirical Bayesian method to localize modifications through iterative learning of the prior probabilities of each type of modification occurring on different amino acids. The performance of PTMiner was evaluated on three data sets, including simulated data, chemically synthesized peptide library data and modified-peptide spiked-in proteome data. The results showed that PTMiner can effectively control the PSM FDR and accurately localize the modification sites. At 1% real false localization rate (FLR), PTMiner localized 93%, 84% and 83% of the modification sites in the three data sets, respectively, far higher than the two open search engines we used and an extended version of the Ascore localization algorithm. We then used PTMiner to analyze a draft map of the human proteome containing 25 million spectra from 30 tissues, and confidently identified over 1.7 million modified PSMs at 1% FDR and 1% FLR, which provided a system-wide view of both known and unknown PTMs in the human proteome.
Use: pParse; pFind
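The "iterative learning of the prior probabilities" mentioned in the abstract above can be illustrated with a small expectation-maximization loop. This is a minimal Python sketch under stated assumptions, not PTMiner's implementation: the function name `localize`, the input layout, and the use of a single per-site likelihood are hypothetical, and all likelihoods are assumed to be positive.

```python
from collections import defaultdict

def localize(psms, n_iter=20):
    """Iteratively learn per-residue priors for one mass shift (illustrative only).

    psms: list of PSMs; each PSM is a list of (amino_acid, likelihood) tuples,
          one per candidate site, with likelihood > 0 (assumed input format).
    Returns (prior, posteriors): the learned prior per amino acid and, per PSM,
    the posterior probability of each candidate site.
    """
    residues = {aa for sites in psms for aa, _ in sites}
    prior = {aa: 1.0 / len(residues) for aa in residues}  # start uniform
    posteriors = []
    for _ in range(n_iter):
        counts = defaultdict(float)
        posteriors = []
        for sites in psms:
            # Posterior of each site is proportional to prior[aa] * likelihood.
            weights = [prior[aa] * lik for aa, lik in sites]
            total = sum(weights)
            post = [w / total for w in weights]
            posteriors.append(post)
            for (aa, _), q in zip(sites, post):
                counts[aa] += q
        # Re-estimate the prior from the expected site assignments.
        norm = sum(counts.values())
        prior = {aa: counts[aa] / norm for aa in residues}
    return prior, posteriors

# Two PSMs place the shift unambiguously on S; the third cannot tell S from T.
example = [[("S", 1.0)], [("S", 1.0)], [("S", 0.5), ("T", 0.5)]]
prior, posteriors = localize(example)
print(prior)
print(posteriors[2])
```

In this toy input the unambiguous PSMs pull the learned prior toward serine, so the ambiguous third PSM is also assigned to serine with high posterior probability, even though its spectrum alone cannot distinguish the two sites.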
Journal of Biological Chemistry 2019. Liu, L et al.
Nankai Univ, Coll Pharm, Tianjin 300353, Peoples R China.
ABSTRACT:O-GlcNAcylation is a ubiquitous protein glycosylation that plays different roles on different proteins. O-GlcNAc transferase (OGT) is the unique enzyme responsible for the sugar addition to nucleocytoplasmic proteins. Recently, multiple O-GlcNAc sites have been observed on short-form OGT (sOGT) and nucleocytoplasmic OGT (ncOGT), both of which are located in the nucleus and cytoplasm of the cell. Moreover, O-GlcNAcylation of Ser(389) in ncOGT (1036 amino acids) affects its nuclear translocation in HeLa cells. To date, the major O-GlcNAcylation sites and their roles in sOGT remain unknown. Here, we performed LC-MS/MS and mutational analyses to seek the major O-GlcNAcylation site on sOGT. We identified six O-GlcNAc sites in the tetratricopeptide repeat domain in sOGT, with Thr(12) and Ser(56) being two "key" sites. Thr(12) is a dominant O-GlcNAcylation site, whereas the modification of Ser(56) plays a role in regulating sOGT O-GlcNAcylation, partly through Thr(12). In vitro activity and pulldown assays demonstrated that O-GlcNAcylation does not affect sOGT activity but does affect sOGT-interacting proteins. In HEK293T cells, S56A bound to and hence glycosylated more proteins in contrast to T12A and WT sOGT. By proteomic and bioinformatics analyses, we found that T12A and S56A differed in substrate proteins (e.g. HNRNPU and PDCD6IP), which eventually affected cell cycle progression and/or cell proliferation. These findings demonstrate that O-GlcNAcylation modulates sOGT substrate selectivity and affects its role in the cell. The data also highlight the regulatory role of O-GlcNAcylation at Thr(12) and Ser(56).
Use: pFind
Biochimie 2019. Wei, Shuangshuang et al.
Hainan Univ, 817 Nongke Lou,58 Peoples Rd, Haikou 570228, Hainan, Peoples R China
ABSTRACT:Keap1 is regarded as a suppressor of Nrf2 in the cytoplasm, sequestering Nrf2 for proteolysis as an adapter of the Cul3-Rbx1 E3 ubiquitin ligase complex. In this study, it was proposed that post-translational modification might affect the interaction between Nrf2 and Keap1, and the phosphorylation profiles of amino acid residues of Keap1 and their effects on the binding of Keap1 to Nrf2 were investigated. A mass spectrometry analysis revealed that S53 and S293 were phosphorylated upon oxidative stress. Using Keap1 proteins with amino acid residues mutated to glutamate to simulate the introduction of a negative charge by phosphorylation, it was found that a potential phosphorylation of S53 affected Keap1-Nrf2 binding in the pull-down assay, and induced nuclear translocation of Nrf2 in the electrophoretic mobility shift assay. Sequence homology analysis showed that S53 was highly conserved. Structural modeling around the BTB domain of wild-type and S53E-mutant Keap1 showed that the negative charge introduced by the S53E mutation generates a salt bridge between E53 and the ionized guanidine group of Arg50. Real-time qRT-PCR for transcription levels of antioxidant genes that are modulated by Nrf2 further confirmed the effects of the potential phosphorylation of S53 under oxidative stress. In summary, S53 is a potential phosphorylation site of Keap1, and the phosphorylation could enhance the antioxidative capacity of cells in response to oxidative stress.
Use: pFind
Mass Spectrometry of Proteins: Methods and Protocols 2019. Lund, Peder J et al.
Epigenetics Institute, Department of Biochemistry and Molecular Biophysics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
ABSTRACT:Lysine acetylation is an important posttranslational modification (PTM) that regulates the function of proteins by affecting their localization, stability, binding, and enzymatic activity. Aberrant acetylation patterns have been observed in numerous diseases, most notably cancer, which has spurred the development of potential therapeutics that target acetylation pathways. Mass spectrometry (MS) has become the most adopted tool not only for the qualitative identification of acetylation sites but also for their large-scale quantification. By using heavy isotope labeling in cell culture combined with MS, it is now possible to accurately quantify newly synthesized acetyl groups and other PTMs, allowing differentiation between dynamically regulated and steady-state modifications. Here, we describe MS-based protocols to identify acetylation sites and quantify acetylation rates on both proteins in general and in the special case of histones. In the experimental approach for the former, 13C-glucose and D3-acetate are used to metabolically label protein acetylation in cells with stable isotopes, thus allowing isotope incorporation to be tracked over time. After protein extraction and digestion, acetylated peptides are enriched via immunoprecipitation and then analyzed by MS. For histones, a similar metabolic labeling approach is performed, followed by acid extraction, derivatization with propionic anhydride, and trypsin digestion prior to MS analysis. The procedures presented may be adapted to investigate acetylation dynamics in a broad range of experimental contexts, including different cell types and stimulation conditions.
Use: pFind; pDeep
Molecular Biology Reports 2019. Wen, WT et al.
Sichuan Univ, Coll Life Sci, Minist Educ, Key Lab Bioresources & Ecoenvironm, Chengdu 610064, Sichuan, Peoples R China.
ABSTRACT:The differences in proteome profile of longissimus thoracis (LT) muscles of yak (Bos grunniens) and cattle (Bos taurus) were investigated employing isobaric tag for relative and absolute quantification (iTRAQ) approach to identify differentially expressed proteins and to understand the cellular level adaptations of yaks to high altitudes. Fifty-two proteins were differentially expressed in the two species, among which 20 were up-regulated and 32 were down-regulated in yaks. Gene ontology (GO) annotation revealed that most of the differentially expressed proteins were involved in the molecular function of protein binding, catalytic activity, and structural activity. Protein-protein interaction analysis recognized 24 proteins (involved in structural integrity, calcium ion regulation, and energy metabolism), as key nodes in biological interaction networks. These findings indicated that mammals living at high altitudes could possibly generate energy by pronounced protein catabolism and glycolysis compared with those living in the plains. The key differentially expressed proteins included calsequestrin 1, prostaglandin reductase 1 and ATP synthase subunit O, which were possibly associated with the cellular and biochemical adaptation of yaks to high altitude. These key proteins may be exploited as candidate proteins for mammalian adaptation to high altitudes.
Use: pFind
Analytical and Bioanalytical Chemistry 2019. Li, X et al.
Georgia State Univ, Ctr Diagnost & Therapeut, 50 Decatur St SE, Atlanta, GA 30303 USA.
ABSTRACT:Rheumatoid arthritis (RA) is an autoimmune disease in which certain immune cells are dysfunctional and attack the body's own healthy tissues. It has been very difficult to find an accurate and efficient method for the diagnosis of early-stage RA. The present shortage of diagnostic methods leads to crude treatments for patients in the late stages, such as joint removal. There is now an increasing focus on MS-based discovery of glyco-biomarkers for serious diseases. In this study, we present an integrated proteomics and glycoproteomics approach to uncover the pathological changes of some RA-related glyco-biomarkers and glyco-checkpoints involved in RA onset. Among 39 differentially expressed N-glycoproteins, 27 showed more than twofold changes in expression. In addition, 13 of the 53 differentially expressed proteins identified in this study showed significant differences. Such an integrated approach will provide a comprehensive strategy for the discovery of new potential glyco-biomarkers and checkpoints in rheumatoid arthritis.
Use: pBuild; pFind