pFind Studio: a computational solution for mass spectrometry-based proteomics



2020




Identification of modified peptides using localization-aware open search
Nature communications 2020. Yu, FC et al. Univ Michigan, Dept Pathol, Ann Arbor, MI 48109 USA.
ABSTRACT:Identification of post-translationally or chemically modified peptides in mass spectrometry-based proteomics experiments is a crucial yet challenging task. We have recently introduced a fragment ion indexing method and the MSFragger search engine to empower an open search strategy for comprehensive analysis of modified peptides. However, this strategy does not consider fragment ions shifted by unknown modifications, preventing modification localization and limiting the sensitivity of the search. Here we present a localization-aware open search method, in which both modification-containing (shifted) and regular fragment ions are indexed and used in scoring. We also implement a fast mass calibration and optimization method, allowing optimization of the mass tolerances and other key search parameters. We demonstrate that MSFragger with mass calibration and localization-aware open search identifies modified peptides with significantly higher sensitivity and accuracy. Comparing MSFragger to other modification-focused tools (pFind3, MetaMorpheus, and TagGraph) shows that MSFragger remains an excellent option for fast, comprehensive, and sensitive searches for modified peptides in shotgun proteomics data. Mass spectrometry-based proteomics is the method of choice for the global mapping of post-translational modifications, but matching and scoring peaks with unknown masses remains challenging. Here, the authors present a refined open search strategy to score all peaks with higher sensitivity and accuracy.
Use: pFind; pParse
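The distinction between regular and shifted fragment ions in the abstract above can be illustrated with a minimal sketch. This is not MSFragger's implementation: the function names `b_ions` and `y_ions`, the `site` argument, and the example peptide are hypothetical, and only a handful of standard monoisotopic residue masses are included. It assumes a single mass shift placed at one candidate residue, so every fragment containing that residue moves by the same amount.

```python
# Standard monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
PROTON = 1.007276
WATER = 18.010565


def b_ions(peptide, shift=0.0, site=None):
    """Singly charged b-ion m/z values b1..b(n-1).

    Ions that contain the residue at 0-based index `site` carry `shift`.
    """
    ions, total = [], 0.0
    for i, aa in enumerate(peptide[:-1]):
        total += RESIDUE_MASS[aa]
        delta = shift if site is not None and i >= site else 0.0
        ions.append(total + PROTON + delta)
    return ions


def y_ions(peptide, shift=0.0, site=None):
    """Singly charged y-ion m/z values y1..y(n-1).

    y_k spans residues n-k .. n-1, so it is shifted when n-k <= site.
    """
    ions, total = [], WATER
    n = len(peptide)
    for k in range(1, n):
        total += RESIDUE_MASS[peptide[n - k]]
        delta = shift if site is not None and (n - k) <= site else 0.0
        ions.append(total + PROTON + delta)
    return ions


# Hypothetical example: a +15.9949 Da shift tried at the second residue.
regular = b_ions("GASK")
shifted = b_ions("GASK", shift=15.9949, site=1)
```

Scoring a spectrum against both the regular and the shifted series for each candidate site is what makes localization possible in principle; a search that indexes only the regular series can match only the fragments that do not contain the modified residue.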



The proteome landscape of the kingdoms of life
Nature 2020. Muller, JB et al. Max Planck Inst Biochem, Dept Prote & Signal Transduct, Martinsried, Germany.
ABSTRACT:Proteins carry out the vast majority of functions in all biological domains, but for technological reasons their large-scale investigation has lagged behind the study of genomes. Since the first essentially complete eukaryotic proteome was reported (1), advances in mass-spectrometry-based proteomics (2) have enabled increasingly comprehensive identification and quantification of the human proteome (3-6). However, there have been few comparisons across species (7,8), in stark contrast with genomics initiatives (9). Here we use an advanced proteomics workflow (in which the peptide separation step is performed by a microstructured and extremely reproducible chromatographic system) for the in-depth study of 100 taxonomically diverse organisms. With two million peptide and 340,000 stringent protein identifications obtained in a standardized manner, we double the number of proteins with solid experimental evidence known to the scientific community. The data also provide a large-scale case study for sequence-based machine learning, as we demonstrate by experimentally confirming the predicted properties of peptides from Bacteroides uniformis. Our results offer a comparative view of the functional organization of organisms across the entire evolutionary range. A remarkably high fraction of the total proteome mass in all kingdoms is dedicated to protein homeostasis and folding, highlighting the biological challenge of maintaining protein structure in all branches of life. Likewise, a universally high fraction is involved in supplying energy resources, although these pathways range from photosynthesis through iron sulfur metabolism to carbohydrate metabolism. Generally, however, proteins and proteomes are remarkably diverse between organisms, and they can readily be explored and functionally compared at www.proteomesoflife.org.
Use: pFind; pDeep



Accurate annotation of human protein-coding small open reading frames
Nature chemical biology 2020. Martinez, TF et al. Salk Inst Biol Studies, Clayton Fdn Labs Peptide Biol, 10010 N Torrey Pines Rd, La Jolla, CA 92037 USA.
ABSTRACT:Functional protein-coding small open reading frames (smORFs) are emerging as an important class of genes. However, the number of translated smORFs in the human genome is unclear because proteogenomic methods are not sensitive enough, and, as we show, Ribo-seq strategies require additional measures to ensure comprehensive and accurate smORF annotation. Here, we integrate de novo transcriptome assembly and Ribo-seq into an improved workflow that overcomes obstacles with previous methods, to more confidently annotate thousands of smORFs. Evolutionary conservation analyses suggest that hundreds of smORF-encoded microproteins are likely functional. Additionally, many smORFs are regulated during fundamental biological processes, such as cell stress. Peptides derived from smORFs are also detectable on human leukocyte antigen complexes, revealing smORFs as a source of antigens. Thus, by including additional validation into our smORF annotation workflow, we accurately identify thousands of unannotated translated smORFs that will provide a rich pool of unexplored, functional human genes. An improved workflow combining de novo transcriptome assembly and Ribo-seq validated by cellular antigen display is developed to maximize small peptide discovery, leading to identification of thousands of unannotated protein-coding smORFs.
Use: pFind



Persulfidation-based modification of cysteine desulfhydrase and the NADPH oxidase RBOHD controls guard cell abscisic acid signaling
The Plant Cell 2020. Shen, J et al. Nanjing Agr Univ, Coll Life Sci, Lab Ctr Life Sci, Nanjing 210095, Peoples R China.
ABSTRACT:A persulfidation-based reversible post-translational modification of Cys desulfhydrase and NADPH oxidase RBOHD fine-tunes guard cell ABA signaling. Hydrogen sulfide (H2S) is a gaseous signaling molecule that regulates diverse cellular signaling pathways through persulfidation, which involves the post-translational modification of specific Cys residues to form persulfides. However, the mechanisms that underlie this important redox-based modification remain poorly understood in higher plants. We have, therefore, analyzed how protein persulfidation acts as a specific and reversible signaling mechanism during the abscisic acid (ABA) response in Arabidopsis (Arabidopsis thaliana). Here we show that ABA stimulates the persulfidation of L-CYSTEINE DESULFHYDRASE1, an important endogenous H2S enzyme, at Cys44 and Cys205 in a redox-dependent manner. Moreover, sustainable H2S accumulation drives persulfidation of the NADPH oxidase RESPIRATORY BURST OXIDASE HOMOLOG PROTEIN D (RBOHD) at Cys825 and Cys890, enhancing its ability to produce reactive oxygen species. Physiologically, S-persulfidation-induced RBOHD activity is relevant to ABA-induced stomatal closure. Together, these processes form a negative feedback loop that fine-tunes guard cell redox homeostasis and ABA signaling. These findings not only expand our current knowledge of H2S function in the context of guard cell ABA signaling, but also demonstrate the presence of a rapid signal integration mechanism involving specific and reversible redox-based post-translational modifications that occur in response to changing environmental conditions.
Use: pFind



Pre-termination transcription complex: structure and function
Molecular cell 2020. Hao, ZT et al. NYU, Sch Med, Dept Biochem & Mol Pharmacol, New York, NY 10016 USA.
ABSTRACT:Rho is a general transcription termination factor playing essential roles in RNA polymerase (RNAP) recycling, gene regulation, and genomic stability in most bacteria. Traditional models of transcription termination postulate that hexameric Rho loads onto RNA prior to contacting RNAP and then translocates along the transcript in pursuit of the moving RNAP to pull RNA from it. Here, we report the cryoelectron microscopy (cryo-EM) structures of two termination process intermediates. Prior to interacting with RNA, Rho forms a specific "pre-termination complex" (PTC) with RNAP and elongation factors NusA and NusG, which stabilize the PTC. RNA exiting RNAP interacts with NusA before entering the central channel of Rho from the distal C-terminal side of the ring. We map the principal interactions in the PTC and demonstrate their critical role in termination. Our results support a mechanism in which the formation of a persistent PTC is a prerequisite for termination.
Use: pFind; pLink



A streamlined mass spectrometry-based proteomics workflow for large-scale FFPE tissue analysis
The Journal of pathology 2020. Coscia, F et al. Max Planck Inst Biochem, D-82152 Martinsried, Germany.
ABSTRACT:Formalin fixation and paraffin-embedding (FFPE) is the most common method to preserve human tissue for clinical diagnosis, and FFPE archives represent an invaluable resource for biomedical research. Proteins in FFPE material are stable over decades but their efficient extraction and streamlined analysis by mass spectrometry (MS)-based proteomics has so far proven challenging. Herein we describe an MS-based proteomic workflow for quantitative profiling of large FFPE tissue cohorts directly from histopathology glass slides. We demonstrate broad applicability of the workflow to clinical pathology specimens and variable sample amounts, including low-input cancer tissue isolated by laser microdissection. Using state-of-the-art data dependent acquisition (DDA) and data independent acquisition (DIA) MS workflows, we consistently quantify a large part of the proteome in 100 min single-run analyses. In an adenoma cohort comprising more than 100 samples, total workup took less than a day. We observed a moderate trend towards lower protein identification in long-term stored samples (>15 years), but clustering into distinct proteomic subtypes was independent of archival time. Our results underscore the great promise of FFPE tissues for patient phenotyping using unbiased proteomics and they prove the feasibility of analyzing large tissue cohorts in a robust, timely, and streamlined manner. © 2020 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of Pathological Society of Great Britain and Ireland.
Use: pFind



Cancer neoantigen prioritization through sensitive and reliable proteogenomics analysis
Nature communications 2020. Wen, B et al. Baylor Coll Med, Lester & Sue Smith Breast Ctr, Houston, TX 77030 USA.
ABSTRACT:Genomics-based neoantigen discovery can be enhanced by proteomic evidence, but there remains a lack of consensus on the performance of different quality control methods for variant peptide identification in proteogenomics. We propose to use the difference between accurately predicted and observed retention times for each peptide as a metric to evaluate different quality control methods. To this end, we develop AutoRT, a deep learning algorithm with high accuracy in retention time prediction. Analysis of three cancer data sets with a total of 287 tumor samples using different quality control strategies results in substantially different numbers of identified variant peptides and putative neoantigens. Our systematic evaluation, using the proposed retention time metric, provides insights and practical guidance on the selection of quality control strategies. We implement the recommended strategy in a computational workflow named NeoFlow to support proteogenomics-based neoantigen prioritization, enabling more sensitive discovery of putative neoantigens. Identifying mutation-derived neoantigens by proteogenomics requires robust strategies for quality control. Here, the authors propose peptide retention time as an evaluation metric for proteogenomics quality control methods, and develop a deep learning algorithm for accurate retention time prediction.
Use: pFind
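The retention time metric described in the abstract above can be sketched as follows. This is an illustration only: the abstract does not say how the per-peptide differences are summarized, so the median absolute difference used here is an assumption, and the function name `median_abs_rt_error` and the sample numbers are hypothetical. The predicted values would come from a retention time predictor such as AutoRT.

```python
from statistics import median


def median_abs_rt_error(predicted, observed):
    """Median absolute difference between paired predicted and observed
    retention times (result is in the same units as the inputs)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return median(abs(p - o) for p, o in zip(predicted, observed))


# Hypothetical retention times in minutes for three identified peptides.
error = median_abs_rt_error([10.0, 20.0, 30.0], [11.0, 18.0, 30.0])
```

Under this reading, a quality control method whose accepted variant peptides show a smaller error is retaining identifications that agree better with the predicted chromatographic behavior.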



Identification of novel DPP-IV inhibitory peptides from Atlantic salmon (Salmo salar) skin
Food Research International 2020. Jin, RT et al. Northeast Agr Univ, Coll Food Sci, 600 Changjiang Rd, Harbin 150030, Peoples R China.
ABSTRACT:The aim of this study was to identify dipeptidyl peptidase IV (DPP-IV) inhibitory peptides from salmon skin collagen hydrolysate, and to evaluate the possible mechanism of DPP-IV inhibition by the peptide. Salmon skin collagen was hydrolyzed by pepsin, trypsin, papain, or Alcalase 2.4 L, separately. Trypsin hydrolysate (10 mg/mL) showed the highest inhibitory activity of 66.12 ± 0.68%. The hydrolysate was separated into three fractions by ultrafiltration, and the inhibitory IC50 of M1 (molecular weight < 3 kDa) was 1.54 ± 0.06 mg/mL. M1 was separated by gel chromatography and RP-HPLC; A10 was the most inhibitory of the 12 fractions, with an IC50 of 0.79 ± 0.13 mg/mL. A novel peptide, LDKVFR, with an IC50 value of 0.1 ± 0.03 mg/mL (128.71 μM), was identified from A10. Molecular docking revealed that six hydrogen bonds and eight hydrophobic interactions between LDKVFR and DPP-IV contributed to DPP-IV inhibition.
Use: pFind
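The two IC50 figures quoted for LDKVFR in the abstract above (0.1 mg/mL and 128.71 μM) are related through the peptide's molecular weight. The sketch below checks that conversion; it assumes the unmodified peptide with free termini and uses standard average residue masses, and the function names are hypothetical.

```python
# Standard average residue masses (Da) for the residues in LDKVFR.
AVG_RESIDUE_MASS = {
    "L": 113.1594, "D": 115.0886, "K": 128.1741,
    "V": 99.1326, "F": 147.1766, "R": 156.1875,
}
AVG_WATER = 18.01528  # added once for the free N- and C-termini


def average_mass(peptide):
    """Average molecular weight (g/mol) of an unmodified linear peptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in peptide) + AVG_WATER


def mg_per_ml_to_micromolar(conc_mg_per_ml, mol_weight):
    """Convert mg/mL (numerically equal to g/L) to micromol/L."""
    return conc_mg_per_ml / mol_weight * 1e6


mw = average_mass("LDKVFR")                   # about 776.93 g/mol
ic50_um = mg_per_ml_to_micromolar(0.1, mw)    # about 128.71 micromolar
```

The result agrees with the value given in the abstract to two decimal places.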



Characterization of urinary exosomes purified with size exclusion chromatography and ultracentrifugation
Journal of Proteome Research 2020. Guan, S et al. Fudan Univ, Dept Chem, Shanghai 200438, Peoples R China.
ABSTRACT:Exosomes, a subtype of extracellular vesicles secreted by mammalian cells with a typical size range of 30-150 nm, have been implicated in many biological processes as intercellular communication carriers. The isolation of exosomes is an essential and challenging step before subsequent analysis and functional studies, due to the complexity of body fluids, as well as the small size and low density of exosomes. Ultracentrifugation (UC) and size exclusion chromatography (SEC) are two methods that have been extensively used for exosome isolation in biological studies in recent years. In this work, we compared the characteristics of urinary exosomes extracted with SEC and UC methods in detail. Results showed that the SEC isolation method was superior to UC in the recovery of exosomal particles and proteins. The results of proteomics analysis showed that more purified exosomes were extracted with the SEC method. We also observed that some exosomes were ruptured and insufficiently precipitated during UC isolation. This not only led to a low recovery of exosome proteins but also resulted in a considerable loss of exosomal particles. Moreover, the exosomal rupture and particle loss in UC could not be avoided by resuspension of the exosomal particles. Our results also showed that exosomes from SEC purifications possessed a high internalization capability from 4 to 6 h when incubated with EA.hy926 and HCV29 cell lines.
Use: pFind



TransCirc: an interactive database for translatable circular RNAs based on multi-omics evidence
Nucleic Acids Research 2020. Huang, WD et al. Chinese Acad Sci, Biomed Big Data Ctr, CAS MPG Partner Inst Computat Biol, CAS Key Lab Computat Biol, Shanghai Inst Nutr & Hl, Shanghai 200031, Peoples R China.
ABSTRACT:TransCirc (https://www.biosino.org/transcirc/) is a specialized database that provides comprehensive evidence supporting the translation potential of circular RNAs (circRNAs). This database was generated by integrating various lines of direct and indirect evidence to predict the coding potential of each human circRNA and the putative translation products. Seven types of evidence for circRNA translation were included: (i) ribosome/polysome binding evidence supporting the occupancy of ribosomes onto circRNAs; (ii) experimentally mapped translation initiation sites on circRNAs; (iii) internal ribosome entry sites on circRNAs; (iv) published N-6-methyladenosine modification data in circRNAs that promote translation initiation; (v) lengths of the circRNA-specific open reading frames; (vi) sequence composition scores from a machine learning prediction of all potential open reading frames; (vii) mass spectrometry data that directly support the circRNA-encoded peptides across back-splice junctions. TransCirc provides a user-friendly searching/browsing interface and independent lines of evidence to predict how likely a circRNA is to be translated. In addition, several flexible tools have been developed to aid retrieval and analysis of the data. TransCirc can serve as an important resource for investigating the translation capacity of circRNAs and the potential circRNA-encoded peptides, and can be expanded to include new evidence or additional species in the future.
Use: pFind