pFind Studio: a computational solution for mass spectrometry-based proteomics
2014
Journal of Proteomics 2014. Ma, C et al.
Georgia State Univ, Ctr Diagnost & Therapeut, Atlanta, GA 30302 USA.
ABSTRACT:The core fucosylation (CF) of N-glycoproteins plays important roles in regulating protein functions during biological development, and it has also been shown to be up-regulated in several high metastasis cancer cell lines. Therefore, global profiling and quantitative characterization of CF-glycoproteins may reveal potent biomarkers for clinical applications. However, due to the complex fragmentation pattern of CF-glycopeptides, accurately identifying CF-glycosylation sites via mass spectrometry with high throughput remains a formidable challenge. In this study, we established a precise CF-glycosylation site identification strategy with UHPLC LTQ-Orbitrap Elite under low- and high-normalized collision energy (LHNCE) conditions. To demonstrate the feasibility of LHNCE, the CF-glycopeptides of target proteins in clinical plasma samples were applied and compared as a preliminary demonstration and resulted in the assignment of 357 unique CF-glycosylation sites from 209 CF-glycoproteins. In this study, the largest human plasma CF-glycosylation site database was constructed, and at least three-fold more CF-sites were identified compared to previously published studies. The results further demonstrated that LHNCE provides an important approach for CF-glycoprotein function studies and biomarker screening in cancer research. Biological significance: Core-fucosylation (CF) is a kind of N-linked glycosylation in which an alpha 1,6-linked fucose is added to the innermost N-acetylglucosamine (GlcNAc) residue. It has been proved that core-fucosylation is involved in regulating biological processes in mammals. Abnormal core-fucosylation has been demonstrated in human pathological processes, such as metastasis. For example, the CF-glycosylation of an alpha-fetoprotein isoform (AFP-L3) was approved as a biomarker of hepatocellular carcinoma (HCC). In addition, GP73 is also a well-known biomarker and its CF-glycosylation level will increase in liver cancer patients. Therefore, it is crucial to develop a strategy for mapping human CF-glycosylation. (C) 2014 Elsevier B.V. All rights reserved.
Use: pFind; pBuild
Journal of Proteome Research 2014. Beck, DB et al.
Univ Penn, Perelman Sch Med, Epigenet Program, Dept Cell & Dev Biol, Philadelphia, PA 19103 USA.
ABSTRACT:Accurate and sensitive detection of protein-protein and protein-RNA interactions is key to understanding their biological functions. Traditional methods to identify these interactions require cell lysis and biochemical manipulations that exclude cellular compartments that cannot be solubilized under mild conditions. Here, we introduce an in vivo proximity labeling (IPL) technology that employs an affinity tag combined with a photoactivatable probe to label polypeptides and RNAs in the vicinity of a protein of interest in vivo. Using quantitative mass spectrometry and deep sequencing, we show that IPL correctly identifies known protein-protein and protein-RNA interactions in the nucleus of mammalian cells. Thus, IPL provides additional temporal and spatial information for the characterization of biological interactions in vivo.
Use: pFind; pParse
Proteomics 2014. Sun, Han et al.
Shanghai Acad Sci & Technol, Shanghai Ctr Bioinformat Technol, Shanghai 201203, Peoples R China
ABSTRACT:MS/MS has been used to improve genome annotation in various organisms. The classical approach is to construct a comprehensive theoretical peptide database with a six-frame translation model from the whole ORF of a genome and search against this database with real MS/MS spectra. In this work we took a more focused approach: we constructed a database containing only peptides from the ab initio predicted genes from the current human genome annotation, and all theoretical peptides from currently annotated lncRNAs, and searched this database with MS/MS data from the human HeLa cell line. The purpose of this design is to find translation evidence for ab initio predicted genes and to rule out possibly wrongly defined lncRNAs in a systematic proteogenomics effort. To validate the proteogenomics results, we integrated RNA-Seq data analysis for the same HeLa cell line that generated the MS/MS data, and performed an MRM experiment on self-cultured HeLa cell line samples. Six peptides were found to support ab initio predicted genes with both RNA-Seq and MRM validations, while none was found to support a translated lncRNA. This workflow could be flexibly applied to other human samples and datasets to help further improve human gene annotation.
Use: pFind
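The six-frame translation mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and codons containing unexpected characters are translated as "X".

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands at offsets 0, 1 and 2 (frames +1..+3, -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

In a proteogenomics setting each translated frame would then be split at stop codons and digested in silico before searching; those steps are omitted here.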
BMC Bioinformatics 2014. Li, Y et al.
Hong Kong Baptist Univ, Dept Comp Sci, Kowloon Tong, Hong Kong, Peoples R China.
ABSTRACT:Background: Tandem mass spectrometry-based database searching is currently the main method for protein identification in shotgun proteomics. The explosive growth of protein and peptide databases, which is a result of genome translations, enzymatic digestions, and post-translational modifications (PTMs), is making computational efficiency in database searching a serious challenge. Profile analysis shows that most search engines spend 50%-90% of their total time on the scoring module, and that the spectrum dot product (SDP) based scoring module is the most widely used. As a general purpose and high performance parallel hardware, graphics processing units (GPUs) are promising platforms for speeding up database searches in the protein identification process. Results: We designed and implemented a parallel SDP-based scoring module on GPUs that exploits the efficient use of GPU registers, constant memory and shared memory. Compared with the CPU-based version, we achieved a 30 to 60 times speedup using a single GPU. We also implemented our algorithm on a GPU cluster and achieved an approximately favorable speedup. Conclusions: Our GPU-based SDP algorithm can significantly improve the speed of the scoring module in mass spectrometry-based protein identification. The algorithm can be easily implemented in many database search engines such as X!Tandem, SEQUEST, and pFind. A software tool implementing this algorithm is available at http://www.comp.hkbu.edu.hk/~youli/ProteinByGPU.html
Use: pFind
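For readers unfamiliar with spectrum dot product (SDP) scoring, the following is a minimal CPU-side Python sketch of the general idea, not the GPU implementation described in the abstract above. The fixed-width binning, the `bin_width` parameter, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks is an iterable of (mz, intensity) pairs; bin_width is an
    assumed tolerance parameter, not a value taken from the paper.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def spectrum_dot_product(experimental, theoretical, bin_width=1.0):
    """Dot product of two binned spectra over the bins they share."""
    exp_bins = bin_spectrum(experimental, bin_width)
    theo_bins = bin_spectrum(theoretical, bin_width)
    return sum(
        intensity * theo_bins[b]
        for b, intensity in exp_bins.items()
        if b in theo_bins
    )


if __name__ == "__main__":
    experimental = [(100.2, 2.0), (200.7, 3.0), (300.1, 1.0)]
    theoretical = [(100.5, 1.0), (200.1, 1.0), (400.0, 1.0)]
    print(spectrum_dot_product(experimental, theoretical))
```

A search engine evaluates a score of this kind for every candidate peptide of every spectrum, and each evaluation is independent of the others, which is why the scoring module lends itself to parallel hardware.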
International Conference on Intelligent Information Processing 2014. Chao Pan et al.
College of Biomedical Engineering and Instrument Science, Key Laboratory of Biomedical Engineering of Ministry of Education of China, Zhejiang University
ABSTRACT:Label-free quantitative proteomics based on mass spectrometry plays an essential role in large-scale analysis of complex proteomes. Meanwhile, quantitative proteomics is not only a way for data processing, but also an important approach for exploring protein functions and interactions in a large-scale manner. An effective method combining quantitation and qualification should be built. To systematically overcome this challenge, we proposed a new label-free quantitative method using spectral counting. In the proposed method, the count of shared peptides was considered as an optimized factor to accurately appraise the abundance of isoforms for complex proteomes. Large-scale functional annotations for complex proteomes were extracted by g:Profiler and were assigned to functional clusters. To test the effect of the method, three groups of mitochondrial proteins, including a mouse heart mitochondrial dataset, a mouse liver mitochondrial dataset and a human heart mitochondrial dataset, were selected for analysis. According to the biochemical properties of mitochondrial proteins, all functional annotations were assigned to various signalling pathways or functional clusters. We concluded that the strategy with shared peptides overcame inaccurate and overestimated results for low-abundant isoforms to improve accuracy, and that quantitative proteomics coupled with biomedical knowledge can thoroughly comprehend functions and relationships for complex proteomes and contribute a new method for large-scale comparative or disease proteomics.
Use: pFind
Journal of Proteome Research 2014. Cao, Liwei et al.
Chinese Acad Sci, Dalian Inst Chem Phys, Key Lab Separat Sci Analyt Chem, Dalian 116023, Peoples R China
ABSTRACT:N-Glycosylation site analysis of baker's yeast Saccharomyces cerevisiae is of fundamental significance to elucidate the molecular mechanism of human congenital disorders of glycosylation (CDG). Here we present a mass spectrometry (MS)-based workflow for the profiling of N-glycosylated sites in S. cerevisiae proteins. In this workflow, proteolytic glycopeptides were enriched by using a hydrophilic material named Click TE-Cys to improve the glycopeptide selectivity and coverage. To enhance the reliability of the identified results, the enriched glycopeptides were subjected to parallel deglycosylation by using two endoglycosidases (i.e., PNGase F and Endo H-f), respectively, prior to LC-MS/MS analysis. On the basis of the workflow, a total of 135 N-glycosylated sites including 6 known, 93 potential, and 36 novel sites were identified and mapped to 79 proteins. Among the novel-type sites, nine sites from eight proteins, which were simultaneously identified via PNGase F and Endo H-f deglycosylation, are believed to possess high confidence. The established workflow, together with the profile of N-glycosylated sites, will contribute to the improvement of S. cerevisiae model for revealing the pathogenesis of CDG.
Use: pFind; pXtract; pBuild
Molecular & Cellular Proteomics 2014. Chalkley, RJ et al.
600 16th St, Genentech Hall, Room N474A, San Francisco, CA 94158 USA.
ABSTRACT:The proteome informatics research group of the Association of Biomolecular Resource Facilities conducted a study to assess the community's ability to detect and characterize peptides bearing a range of biologically occurring post-translational modifications when present in a complex peptide background. A data set derived from a mixture of synthetic peptides with biologically occurring modifications combined with a yeast whole cell lysate as background was distributed to a large group of researchers and their results were collectively analyzed. The results from the twenty-four participants, who represented a broad spectrum of experience levels with this type of data analysis, produced several important observations. First, there is significantly more variability in the ability to assess whether a result is significant than there is to determine the correct answer. Second, labile post-translational modifications, particularly tyrosine sulfation, present a challenge for most researchers. Finally, for modification site localization there are many tools being employed, but researchers are currently unsure of the reliability of the results these programs are producing.
Use: pFind
Proceedings of the National Academy of Sciences of the United States of America 2014. Yang, MK et al.
Chinese Acad Sci, Inst Hydrobiol, Key Lab Algal Biol, Wuhan 430072, Peoples R China.
ABSTRACT:We describe an integrated workflow for proteogenomic analysis and global profiling of posttranslational modifications (PTMs) in prokaryotes and use the model cyanobacterium Synechococcus sp. PCC 7002 (hereafter Synechococcus 7002) as a test case. We found more than 20 different kinds of PTMs, and a holistic view of PTM events in this organism grown under different conditions was obtained without specific enrichment strategies. Among 3,186 predicted protein-coding genes, 2,938 gene products (> 92%) were identified. We also identified 118 previously unidentified proteins and corrected 38 predicted gene-coding regions in the Synechococcus 7002 genome. This systematic analysis not only provides comprehensive information on protein profiles and the diversity of PTMs in Synechococcus 7002 but also provides some insights into photosynthetic pathways in cyanobacteria. The entire proteogenomics pipeline is applicable to any sequenced prokaryotic organism, and we suggest that it should become a standard part of genome annotation projects.
Use: pFind
Journal of Proteome Research 2014. Chen, Z et al.
Chinese Acad Sci, Wuhan Inst Virol, State Key Lab Virol, Wuhan 430071, Peoples R China.
ABSTRACT:Protein phosphorylation on serine, threonine, and tyrosine (Ser/Thr/Tyr) is well established as a key regulatory posttranslational modification used in signal transduction to control cell growth, proliferation, and stress responses. However, little is known about its extent and function in diatoms. Phaeodactylum tricornutum is a unicellular marine diatom that has been used as a model organism for research on diatom molecular biology. Although more than 1000 protein kinases and phosphatases with specificity for Ser/Thr/Tyr residues have been predicted in P. tricornutum, no phosphorylation event has so far been revealed by classical biochemical approaches. Here, we performed a global phosphoproteomic analysis combining protein/peptide fractionation, TiO2 enrichment, and LC-MS/MS analyses. In total, we identified 264 unique phosphopeptides, including 434 in vivo phosphorylated sites on 245 phosphoproteins. The phosphorylated proteins were implicated in the regulation of diverse biological processes, including signaling, metabolic pathways, and stress responses. Six identified phosphoproteins were further validated by Western blotting using phospho-specific antibodies. The functions of these proteins are discussed in the context of signal transduction networks in P. tricornutum. Our results advance the current understanding of diatom biology and will be useful for elucidating the phosphor-relay signaling networks in this model diatom.
Use: pFind; pBuild
Disease Markers 2014. Zhang, Y et al.
Peking Union Med Coll, Canc Inst Hosp, Beijing Key Lab Carcinogenesis & Canc Prevent, State Key Lab Mol Oncol, POB 2258, Beijing 100021, Peoples R China.
ABSTRACT:This study aimed to create a large-scale laryngeal cancer relevant secretory/releasing protein database and further discover candidate biomarkers. Methods. Primary tissue cultures were established using tumor tissues and matched normal mucosal tissues collected from four laryngeal cancer patients. Serum-free conditioned medium (CM) samples were collected. These samples were then sequentially processed by SDS-PAGE separation, trypsin digestion, and LC-MS/MS analysis. The candidates in the database were validated by ELISA using plasma samples from laryngeal cancer patients, benign patients, and healthy individuals. Results. Combining MS data from the tumor tissues and normal tissues, 982 proteins were identified in total; extracellular proteins and cell surface proteins accounted for 15.0% and 4.3%, respectively. According to stringent criteria, 49 proteins were selected as candidates worthy of further validation. Of these, human tissue kallikrein 6 (KLK6) was verified. The level of KLK6 was significantly increased in the plasma samples from the cancer cohort compared to the benign and healthy cohorts and moreover showed a slight decrease in the postoperative plasma samples in comparison to the preoperative plasma samples. Conclusions. This laryngeal cancer-derived protein database provides a promising repository of candidate blood biomarkers for laryngeal cancer. The diagnostic potential of KLK6 deserves further investigation.
Use: pFind