pFind Studio: a computational solution for mass spectrometry-based proteomics



2020




Protocol for Proximity-Dependent Proteomic Profiling in Yeast Cells by APEX and Alk-Ph Probe
STAR Protocols, 2020. Li, Yi et al. Peking University
ABSTRACT:Alk-Ph is a clickable APEX2 substrate developed for spatially restricted protein/RNA labeling in intact yeast cells. Alk-Ph is more water-soluble and cell-wall permeable than the biotin-phenol substrate, allowing more efficient profiling of the subcellular proteome in microorganisms. We describe the protocol for Alk-Ph probe synthesis, APEX2 expression, and protein/RNA labeling in yeast, as well as the workflow for quantitative proteomic experiments and data analysis. Using the yeast mitochondria as an example, we provide guidelines to achieve high-resolution mapping of the subcellular yeast proteome and transcriptome. For complete details on the use and execution of this protocol, please refer to Li et al. (2020).
Use: pQuant; pFind



Open-pFind verified four missing proteins from multi-tissues
Journal of Proteome Research, 2020. Wu, SJ et al. Wuhan Univ, Sch Basic Med Sci, Minist Educ, Key Lab Combinatl Biosynth & Drug Discovery, Wuhan 430072, Peoples R China.
ABSTRACT:The Chromosome-Centric Human Proteome Project (C-HPP) was launched in 2012 to perfect the annotation of human protein existence by identifying stronger protein-level evidence for the expression of missing proteins (MPs). After an 8-year worldwide effort, the number of MPs in the neXtProt database decreased significantly, from 5511 (2012-02-24) to 1899 (2020-01-17). It is now more difficult to provide confident evidence for the remaining MPs because of their specific characteristics, including low abundance, low molecular weight, unexpected modifications, transmembrane structure, and tissue-specific expression. A higher-resolution mass spectrometry (MS) interpretation engine, combined with multi-tissue large-scale proteomics, might provide an opportunity to identify these buried MPs in complex samples. In this study, open-pFind was used to mine MPs from the data of 20 pairs of healthy human tissues published by Wang et al. (Mol. Syst. Biol. 2019, 15 (2), e8503), combined with our large-scale testis data set digested by three enzymes with specificity for different amino acid residues (Glu-C, Lys-C, and trypsin; J. Proteome Res. 2019, 18 (12), 4189-4196). A total of 1 535 536 peptides with 17 283 477 peptide-spectrum matches (PSMs) were mapped to 14 279 protein entries at a false discovery rate (FDR) of <1% at the PSM, peptide, and protein levels. A total of 103 MP candidates were identified, 86 of which had more unique peptides than in our single-tissue testis data set. After rigorous screening, manual checks, peptide synthesis, and matching with documented peptides from PeptideAtlas, we validated four MPs at the protein level: P0C7T8 (duodenum and small intestine), Q8WWZ4 (stomach and rectum), Q8IV35 (fallopian tube), and O14921 (tonsil). All MS raw files have been deposited to ProteomeXchange with the identifier PXD021391.
Use: pFind
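A note on the FDR figure in the Open-pFind entry above: the abstract does not say how the <1% FDR was estimated. The sketch below assumes the widely used target-decoy approach, in which the FDR at a score threshold is estimated as the number of decoy hits divided by the number of target hits scoring at or above it. The function name and the (score, is_decoy) input format are hypothetical; this is not pFind's implementation.

```python
from typing import List, Tuple


def filter_at_fdr(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> List[Tuple[float, bool]]:
    """Return target PSMs accepted at the given target-decoy FDR (hypothetical helper).

    psms: (score, is_decoy) pairs, where a higher score means a better match.
    """
    # Rank all PSMs from best to worst score.
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    cutoff = -1  # index of the deepest position whose estimated FDR is acceptable
    for i, (_, is_decoy) in enumerate(ranked):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among PSMs scoring at or above this position.
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            cutoff = i
    # Keep only target hits up to the cutoff; decoys are never reported.
    return [p for p in ranked[: cutoff + 1] if not p[1]]
```

Taking the deepest acceptable position, rather than stopping at the first violation, mirrors the usual q-value style of thresholding.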



A protein identification algorithm optimization for mass spectrometry data using deep learning
2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE), 2020. Xu, Rui et al. State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing
ABSTRACT:Protein sequence database search is one of the most commonly used methods for protein identification in shotgun proteomics. Traditionally, searching a protein sequence database first requires constructing a theoretical spectrum for each peptide, and at present this theoretical spectrum considers only mass-to-charge ratio information, while information related to isotope peak intensity is neglected. Thanks to the rapid development of artificial intelligence techniques in recent years, deep learning-based MS/MS spectrum prediction tools have shown high accuracy and great potential to improve the sensitivity and accuracy of protein sequence database searching. In this study, we used a deep learning model (pDeep2) to predict the theoretical mass spectra of all peptides and applied them in a database searching tool (DeepNovo), thus improving the sensitivity and accuracy of peptide identification.
Use: pFind; pDeep
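The entry above feeds pDeep2-predicted spectra into database search but does not state how predicted and observed spectra are compared. A common choice, assumed here only for illustration, is the cosine similarity of fragment-ion intensity vectors. The ion labels and the dictionary input format are hypothetical.

```python
import math
from typing import Dict


def cosine_similarity(predicted: Dict[str, float], observed: Dict[str, float]) -> float:
    """Cosine similarity between two fragment-intensity vectors keyed by ion label (e.g. 'b2', 'y3').

    Ions missing from one spectrum are treated as zero intensity.
    """
    keys = set(predicted) | set(observed)
    dot = sum(predicted.get(k, 0.0) * observed.get(k, 0.0) for k in keys)
    norm_p = math.sqrt(sum(v * v for v in predicted.values()))
    norm_o = math.sqrt(sum(v * v for v in observed.values()))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_p * norm_o)
```

A score of 1 means the predicted and observed relative intensities agree exactly; 0 means no shared ions.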



?
2020. et al., 100730, 100875
ABSTRACT:Proteins are essential nutrients for humans, so why is a small amount of protein excreted in urine? Although healthy people have low levels of protein in their urine, these proteins are rich in variety. They may be unnecessary waste of the body or may even have toxic effects and must therefore be excreted. Compared with small-molecule metabolites, proteins have complex structures and diverse functions, and even a small change at the molecular level can affect their subsequent biological functions. A comprehensive comparison of molecular-level modifications between the plasma and urine proteomes may therefore provide clues to why proteins are present in urine. In this study, urine samples from 9 healthy people and plasma samples from 9 healthy people were collected. The samples were analyzed by high-resolution tandem mass spectrometry and label-free quantitative proteomics, and an unrestricted (open) modification identification algorithm was used to compare molecular modifications between the two sample types as a whole. The results showed that the amount of cysteine (Cys) to dehydroalanine (Dha) modification in the urine proteome was higher than that in plasma. The Cys→Dha modification destroys the original disulfide bonds of a protein, thereby affecting or even changing its structure and biological function. This study therefore revealed differences in the overall modification of the plasma and urine proteomes and pointed out that differences in modification may irreversibly change protein structure, which in turn causes proteins that have lost their functions, and even toxic proteins, to be excreted from the blood into the urine.
Use: pFind
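For scale, the Cys→Dha conversion in the urine/plasma entry above corresponds to the loss of H2S from the cysteine residue. The sketch below computes that monoisotopic mass shift from elemental compositions, assuming an unmodified (not carbamidomethylated) cysteine; the atomic masses are standard monoisotopic values, and the helper and constant names are hypothetical.

```python
# Standard monoisotopic atomic masses in Da.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "S": 31.97207100,
}


def monoisotopic_mass(composition: dict) -> float:
    """Sum of monoisotopic masses for an elemental composition such as {"C": 3, "H": 5}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


# Residue compositions (free amino acid minus the water lost on peptide bond formation).
CYS_RESIDUE = {"C": 3, "H": 5, "N": 1, "O": 1, "S": 1}  # assumes unmodified Cys
DHA_RESIDUE = {"C": 3, "H": 3, "N": 1, "O": 1}          # Cys residue minus H2S

CYS_TO_DHA_SHIFT = monoisotopic_mass(DHA_RESIDUE) - monoisotopic_mass(CYS_RESIDUE)

if __name__ == "__main__":
    print(f"Cys -> Dha mass shift: {CYS_TO_DHA_SHIFT:.4f} Da")
```

Run as is, it reports a shift of about -33.988 Da, the mass difference an open search has to match to assign this modification on an unmodified cysteine.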



Improvements on the quantitative analysis of Trypanosoma cruzi histone post translational modifications: Study of changes in epigenetic marks through the parasite's metacyclogenesis and life cycle
Journal of Proteomics, 2020. de Lima, LP et al. Inst Butantan, Lab Especial Ciclo Celular, Ave Vital Brasil 1500, BR-05503900 Sao Paulo, Brazil.
ABSTRACT:Trypanosome histone N-terminal sequences are very divergent from those of other eukaryotes, although they are still decorated by post-translational modifications (PTMs). Here, we used a highly robust workflow to analyze histone PTMs in the parasite Trypanosoma cruzi using mass spectrometry-based (MS-based) data-independent acquisition (DIA). We adapted the workflow to the parasite's histone sequences by modifying the software EpiProfile 2.0, improving peptide and PTM quantification accuracy. This workflow could then be applied to the study of 141 T. cruzi modified histone peptides, which we used to investigate the dynamics of histone PTMs during metacyclogenesis and along the life cycle of T. cruzi. Global levels of histone acetylation and methylation fluctuate during metacyclogenesis; however, the most critical differences were observed between parasite life forms. More than 66 histone PTM changes were detected. Strikingly, the histone PTM pattern of metacyclic trypomastigotes is more similar to that of epimastigotes than to that of cellular trypomastigotes. Finally, we highlighted changes at the H4 N-terminus and at H3K76 and discussed their impact on trypanosome biology. Altogether, we have optimized a workflow that is easily applicable to the analysis of histone PTMs in T. cruzi and generated a data set that may shed light on the role of chromatin modifications in this parasite. Significance: Trypanosomes are unicellular parasites that have divergent histone sequences, no chromosome condensation, and peculiar genome and gene regulation. Genes are transcribed from divergent polycistronic regions, and post-transcriptional gene regulation plays a major role in establishing transcript and protein levels. In this regard, the fact that their histones are decorated with multiple PTMs raises interesting questions about their role. Moreover, this digenetic organism must adapt to different environments, changing its metabolism accordingly.
As metabolism and epigenetics are closely related, the study of histone PTMs in trypanosomes may shed light on this striking and not yet fully understood interplay. From a biomedical perspective, a comprehensive study of the molecular mechanisms associated with metacyclogenesis is essential for creating better strategies to control Chagas disease.
Use: pFind
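EpiProfile-style histone quantification, as used in the T. cruzi entry above, typically reports each modified form of a histone peptide as a share of the summed signal of all forms of that peptide. The sketch below assumes that definition and takes integrated peak areas as input; the function name and any peptide or form labels are hypothetical, and this is not EpiProfile 2.0's code.

```python
from collections import defaultdict
from typing import Dict, Tuple


def relative_abundance(areas: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Share of each modified form within its peptide backbone (hypothetical helper).

    areas maps (peptide_backbone, modification_form) -> integrated peak area.
    """
    # Total signal of all forms observed for each peptide backbone.
    totals: Dict[str, float] = defaultdict(float)
    for (backbone, _form), area in areas.items():
        totals[backbone] += area
    # Divide each form's area by its backbone total; guard against an all-zero backbone.
    return {
        key: (area / totals[key[0]] if totals[key[0]] else 0.0)
        for key, area in areas.items()
    }
```

Normalizing within each backbone keeps forms comparable across samples even when overall histone loading differs.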



Characterization of Lysine Monomethylome and Methyltransferase in Model Cyanobacterium Synechocystis sp. PCC 6803
Genomics, Proteomics & Bioinformatics, 2020. Lin, XH et al. Chinese Acad Sci, Inst Hydrobiol, State Key Lab Freshwater Ecol & Biotechnol, Wuhan 430072, Peoples R China.
ABSTRACT:Protein lysine methylation is a prevalent post-translational modification (PTM) and plays critical roles in all domains of life. However, its extent and function in photosynthetic organisms are still largely unknown. Cyanobacteria are a large group of prokaryotes that carry out oxygenic photosynthesis and are used extensively in studies of photosynthetic mechanisms and environmental adaptation. Here we integrated propionylation of monomethylated proteins, enrichment of the modified peptides, and mass spectrometry (MS) analysis to identify monomethylated proteins in Synechocystis sp. PCC 6803 (Synechocystis). Overall, we identified 376 monomethylation sites in 270 proteins, with numerous monomethylated proteins participating in photosynthesis and carbon metabolism. We subsequently demonstrated that CpcM, a previously identified asparagine methyltransferase in Synechocystis, could catalyze lysine monomethylation of the potential aspartate aminotransferase Sll0480 both in vivo and in vitro and regulate the enzyme activity of Sll0480. The loss of CpcM led to decreases in the maximum quantum yield of primary photosystem II (PSII) and in the efficiency of energy transfer during the photosynthetic reaction in Synechocystis. We report the first lysine monomethylome in a photosynthetic organism and present a critical database for functional analyses of monomethylation in cyanobacteria. The large number of monomethylated proteins and the identification of CpcM as the lysine methyltransferase in cyanobacteria suggest that reversible methylation may influence metabolic processes and photosynthesis in both cyanobacteria and plants.
Use: pFind



Unambiguous Phosphosite Localization through the Combination of Trypsin and LysargiNase Mirror Spectra in a Large-Scale Phosphoproteome Study
Journal of Proteome Research, 2020. Xu, F et al. Beijing Inst Life, Natl Ctr Prot Sci Beijing, Beijing Proteome Res Ctr, State Key Lab Prote, Beijing 102206, Peoples R China.
ABSTRACT:Understanding kinase-guided signaling pathways requires the identification and analysis of phosphosites. Mass spectrometry (MS)-based phosphoproteomics is a rapid and highly sensitive approach for high-throughput identification of phosphosites. However, phosphosite determination from MS data obtained with a single protease is more likely to be ambiguous, regardless of the strategy used for phosphopeptide detection. Here, we explored the application of LysargiNase, which was recently reported to mirror trypsin in specificity, cleaving exclusively at the N-terminal side of arginine and lysine residues. We found that the combination of trypsin and LysargiNase mirror spectra resulted in higher ion coverage in MS2 spectra. The median b-ion coverage values in tryptic spectra, LysargiNase spectra, and combined spectra are 8.3, 20.5, and 25.0%, respectively; the corresponding median y-ion coverage values are 27.8, 10.0, and 32.3%. Higher ion coverage helped pinpoint the precise phosphosites. Compared with trypsin alone, the combined use of trypsin and LysargiNase mirror spectra enabled 67.1% of mirror spectra with unreliable scores (confidence score <0.75) to become reliable (confidence score ≥0.75). Meanwhile, all of the mirror peptide-spectrum matches (PSMs) with multiple potential phosphosites from trypsin and LysargiNase digests could be assigned one precise phosphosite after applying the combination strategy. In addition, the combination strategy identified more novel phosphosites than the union strategy did. We synthesized three phosphopeptides corresponding to the three novel phosphosites and validated the reliability of the identification. Taken together, our data demonstrate the distinctive potential of the combination strategy presented here for unambiguous phosphosite localization (Project accession PXD011178).
Use: pFind
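The ion coverage figures in the LysargiNase entry above can be made concrete with a simple definition, assumed here because the abstract does not give one: the fraction of the n - 1 possible ions of one series (b or y) that were matched in a spectrum of a peptide of length n. Combining mirror spectra then amounts to taking the union of matched ions before computing coverage. The sketch assumes the ions from both spectra have already been mapped onto one shared index (the mass bookkeeping between trypsin and LysargiNase peptides is omitted), and the function names are hypothetical.

```python
from typing import Set


def ion_coverage(matched: Set[int], peptide_length: int) -> float:
    """Fraction of the peptide_length - 1 possible ions of one series that were matched.

    matched holds ion indices, e.g. {2, 3, 5} for b2, b3 and b5.
    """
    possible = peptide_length - 1
    # Ignore indices outside the valid range 1..n-1.
    valid = {i for i in matched if 1 <= i <= possible}
    return len(valid) / possible if possible > 0 else 0.0


def combined_coverage(*matched_sets: Set[int], peptide_length: int) -> float:
    """Coverage of the union of ions matched across several spectra (shared indexing assumed)."""
    union: Set[int] = set().union(*matched_sets)
    return ion_coverage(union, peptide_length)
```

Because the union can only grow, combined coverage is never lower than the better of the two single-spectrum values, which matches the direction of the medians reported above.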



UPEFinder: A Bioinformatic Tool for the Study of Uncharacterized Proteins Based on Gene Expression Correlation and the PageRank Algorithm
Journal of Proteome Research, 2020. Gonzalez-Gomariz, J et al. Navarra Inst Hlth Res, IdiSNA, E-31008 Pamplona, Spain.
ABSTRACT:The Human Proteome Project (HPP) is leading the international effort to characterize the human proteome. Although the main goal of this project was initially the detection of missing proteins, a new challenge arose from the need to assign biological functions to uncharacterized human proteins and to describe their implications in human diseases. Not only the uncharacterized proteins with experimental evidence (uPE1 proteins) but also the uncharacterized missing proteins (uMPs) were the objects of study in this challenge, neXt-CP50. In this work, we developed a new bioinformatic approach to infer biological annotations for uPE1 proteins and uMPs based on a "guilt-by-association" analysis of public RNA-Seq data sets. We used the correlation of these proteins with well-characterized PE1 proteins to construct a network. We then applied the PageRank algorithm to this network to identify the most relevant nodes, which were the biological annotations of the uncharacterized proteins. All of the generated information was stored in a database. In addition, we implemented the web application UPEFinder (https://upefinder.proteored.org) to facilitate access to this new resource. This information is especially relevant for HPP researchers who are interested in generating and validating new hypotheses about the functions of these proteins. Both the database and the web application are publicly available (https://github.com/tibioinformat/UPEfinder).
Use: pFind
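UPEFinder, in the entry above, ranks nodes of a correlation network with PageRank. The sketch below is plain weighted PageRank by power iteration, assuming a directed graph given as adjacency dictionaries with non-negative weights (negative correlations would have to be dropped or transformed first). How UPEFinder actually builds and weights its network, and whether it uses a personalized variant, is not reproduced here; the function name is hypothetical.

```python
from typing import Dict


def pagerank(edges: Dict[str, Dict[str, float]], damping: float = 0.85,
             max_iter: int = 200, tol: float = 1e-12) -> Dict[str, float]:
    """Weighted PageRank by power iteration.

    edges[u][v] is the non-negative weight of the directed edge u -> v.
    Returns a score per node; the scores sum to 1.
    """
    nodes = set(edges)
    for neighbours in edges.values():
        nodes.update(neighbours)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(edges.get(v, {}).values()) for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing weight is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each node passes rank to its neighbours in proportion to edge weight.
        for u, neighbours in edges.items():
            if out_weight[u] == 0:
                continue
            for v, weight in neighbours.items():
                new_rank[v] += damping * rank[u] * weight / out_weight[u]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

In a small star-shaped graph where two nodes each point only to a hub that points back to both, the hub receives the highest score.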



Quantitative Perspective on Online Flow Reaction Profiling Using a Miniature Mass Spectrometer
Organic Process Research & Development, 2020. Sheng, HM et al. Merck & Co Inc, Analyt Sci, Rahway, NJ 07065 USA.
ABSTRACT:Online mass spectrometry has proven to be a useful tool for characterizing many aspects of chemical reactions. However, to the best of the authors' knowledge, no reference standard (RS) quantitation approach has been applied in online MS profiling work to date. In this study, we present an RS approach for online quantitation of an aerobic oxidation reaction in flow using a miniature mass spectrometer, evaluating both internal RS and external RS quantitation approaches. Quinoline, a structurally similar compound that is chemically inert under these reaction conditions, was chosen as the RS to quantify the pyridine aldehyde product. To investigate the optimal RS concentration and instrument attenuation, calibration curves were established by plotting the product/RS peak intensity ratio against the theoretical product yield at different attenuation (dilution factor) values. The MS quantitation results for the actual flow reactions were validated by conventional offline ¹H NMR analysis.
Use: pFind
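The RS quantitation in the miniature-MS entry above relies on calibration curves of the product/RS peak intensity ratio against theoretical product yield. A minimal sketch, assuming the response is linear over the calibrated range: fit the line by ordinary least squares, then invert it to estimate the yield from a measured ratio. The function names are hypothetical, and the paper's actual fitting procedure is not stated in the abstract.

```python
from typing import List, Tuple


def fit_calibration(yields: List[float], ratios: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares line: ratio = slope * yield + intercept."""
    if len(yields) != len(ratios) or len(yields) < 2:
        raise ValueError("need at least two paired calibration points")
    n = len(yields)
    mean_x = sum(yields) / n
    mean_y = sum(ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in yields)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(yields, ratios))
    if sxx == 0:
        raise ValueError("calibration yields must not all be equal")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_yield(ratio: float, slope: float, intercept: float) -> float:
    """Invert the calibration line: yield = (ratio - intercept) / slope."""
    return (ratio - intercept) / slope
```

Fitting one curve per attenuation setting, as the abstract describes, would simply mean calling fit_calibration once for each dilution factor.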


