pFind Studio: a computational solution for mass spectrometry-based proteomics
2020
Proteomics 2020. Xu, R et al.
Beijing Inst Life, Natl Ctr Prot Sci Beijing, Beijing Proteome Res Ctr, State Key Lab Prote, Beijing 102206, Peoples R China.
ABSTRACT:Spectrum prediction using machine learning or deep learning models is an emerging method in computational proteomics. Several deep learning-based MS/MS spectrum prediction tools have been developed and showed their potentials not only for increasing the sensitivity and accuracy of data-dependent acquisition search engines, but also for building spectral libraries for data-independent acquisition analysis. Different tools with their unique algorithms and implementations may result in different performances. Hence, it is necessary to systematically evaluate these tools to find out their preferences and intrinsic differences. In this study, multiple datasets with different collision energies, enzymes, instruments, and species, are used to evaluate the performances of the deep learning-based MS/MS spectrum prediction tools, as well as, the machine learning-based tool MS2PIP. The evaluations may provide helpful insights and guidelines of spectrum prediction tools for the corresponding researchers.
Use: pFind; pDeep
Proteomics 2020. Liu, JF et al.
Chinese Acad Med Sci & Peking Union Med Coll, Inst Basic Med Sci, Dept Biochem & Mol Biol, State Key Lab Med Mol Biol, Beijing 100005, Peoples R China.
ABSTRACT:Lysine crotonylation (Kcr) is a recently discovered post-translational modification that potentially regulates multiple biological processes. With an objective to expand the available crotonylation datasets, LC-MS/MS is performed using mouse liver samples under normal physiological conditions to obtain in vivo crotonylome. A label-free strategy is used and 10 034 Class I (localization probabilities > 0.75) crotonylated sites are identified in 2245 proteins. The KcrE, KcrD, and EKcr motifs are significantly enriched in the crotonylated peptides. The identified crotonylated proteins are mostly enzymes and primarily located in the cytoplasm and nucleus. Functional enrichment analysis based on Gene Ontology and Kyoto Encyclopedia of Genes and Genomes shows that the crotonylated proteins are closely related to the purine-containing compound metabolic process, ribose phosphate metabolic process, carbon metabolism pathway, ribosome pathway, and a series of metabolism-associated biological processes. To the best of the authors' knowledge, this research provides the first report on the mouse liver crotonylome. Furthermore, it offers additional evidence that crotonylation exists in non-histone proteins, and is likely involved in various biological processes. The mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium with the dataset identifiers PXD019145.
Use: pFind
Journal of Mass Spectrometry 2020. Huang, PW et al.
Southern Univ Sci & Technol, Dept Chem, Shenzhen 518055, Peoples R China.
ABSTRACT:Steady improvement in Orbitrap-based mass spectrometry (MS) technologies has greatly advanced the peptide sequencing speed and depth. In-depth analysis of the performance of state-of-the-art MS and optimization of key parameters can improve sequencing efficiency. In this study, we first systematically compared the performance of two popular data-dependent acquisition approaches, with Orbitrap as the first-stage (MS1) mass analyzer and the same Orbitrap (high-high approach) or ion trap (high-low approach) as the second-stage (MS2) mass analyzer, on the Orbitrap Fusion mass spectrometer. High-high approach outperformed high-low approach in terms of better saturation of the scan cycle and higher MS2 identification rate. However, regardless of the acquisition method, there are still more than 60% of peptide features untargeted for MS2 scan. We then systematically optimized the MS parameters using the high-high approach. Increasing the isolation window in the high-high approach could facilitate faster scan speed, but decreased MS2 identification rate. On the contrary, increasing the injection time of MS2 scan could increase identification rate but decrease scan speed and the number of identified MS2 spectra. Dynamic exclusion time should be set properly according to the chromatography peak width. Furthermore, we found that the Orbitrap analyzer, rather than the analytical column, was easily saturated with higher loading amount, thus limited the dynamic range of MS1-based quantification. By using optimized parameters, 10 000 proteins and 110 000 unique peptides were identified by using 20 h of effective liquid chromatography (LC) gradient time. The study therefore illustrated the importance of synchronizing LC-MS precursor ion targeting, fragment ion detection, and chromatographic separation for high efficient data-dependent proteomics.
Use: pFind; pParse
Analytical Chemistry 2020. Du, XX et al.
Fudan Univ, Shanghai Canc Ctr, Shanghai 200032, Peoples R China.
ABSTRACT:Protein N-terminal acetylation (N-alpha-acetylation) is one of the most common modifications in both eukaryotes and prokaryotes. Although studies have shown that N-alpha-acetylation plays important roles in protein assembly, stability, and location, the physiological role has not been fully elucidated. Therefore, a robust and large-scale analytical method is important for a better understanding of N-alpha-acetylation. Here, an enrichment strategy was presented based on LysN digestion and amine-reactive resin capture to study naturally acetylated protein N termini. Since LysN protease cleaves at the amino-terminus of the lysine residue, all resulting peptides except naturally acetylated N-terminal peptides contain free amino groups and can be removed by coupling with AminoLink Resin. Therefore, the naturally acetylated N-terminal peptides were left in solution and enriched for further liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis. The method was very simple and fast, which contained no additional chemical derivatization except protein reduction and alkylation necessarily needed in bottom-up proteomics. It could be used to study acetylated N termini from complex biological samples without bias toward different peptides with various physicochemical properties. The enrichment specificity was above 99% when it was applied in HeLa cell lysates. Neo-N termini generated by endogenous degradation could be directly distinguished without the use of stable-isotope labeling because no chemical derivatization was introduced in this method. Furthermore, this method was highly complementary to the traditional analytical methods for protein N termini based on trypsin only with ArgC-like activity. Therefore, the described method was beneficial to naturally acetylated protein N termini profiling.
Use: pFind
Journal of Proteome Research 2020. Xu, F et al.
Beijing Inst Life, Natl Ctr Prot Sci Beijing, Beijing Proteome Res Ctr, State Key Lab Prote, Beijing 102206, Peoples R China.
ABSTRACT:Understanding of the kinase-guided signaling pathways requires the identification and analysis of phosphosites. Mass spectrometry (MS)-based phosphoproteomics is a rapid and highly sensitive approach for high-throughput identification of phosphosites. However, phosphosite determination from MS data with a single protease is more likely to be ambiguous, regardless of the strategy used for phosphopeptide detection. Here, we explored the application of LysargiNase, which was recently reported to mirror trypsin in specificity to cleave arginine and lysine residues exclusively at the N-terminal side. We found that the combination of trypsin and LysargiNase mirror spectra resulted in higher ion coverage in MS2 spectra. The median ion coverage values of b ions in tryptic spectra, LysargiNase spectra, and combined spectra are 8.3, 20.5, and 25.0%, respectively. As for the median ion coverage of y ions, these values are 27.8, 10.0, and 32.3%. Higher ion coverage was helpful to pinpoint the precise phosphosites. Compared to trypsin alone, the combined use of trypsin and LysargiNase mirror spectra enabled 67.1% of mirror spectra with unreliable scores (confidence score < 0.75) to become reliable (confidence score >= 0.75). Meanwhile, all of the mirror peptide-spectrum matches (PSMs) with multiple potential phosphosites from trypsin and LysargiNase digests could be assigned one precise phosphosite after applying the combination strategy. Besides, the combination strategy could identify more novel phosphosites than the union strategy did. We synthesized three phosphopeptides corresponding to the three novel phosphosites and validated the reliability of the identification. Taken together, our data demonstrated the distinctive potential of the combination strategy presented here for unambiguous phosphosite localization (Project accession PXD011178).
Use: pFind
Current Bioinformatics 2020. Rolfs, Z et al.
Univ Wisconsin, Dept Chem, 1101 Univ Ave, Madison, WI 53706 USA.
ABSTRACT:Background: The identification of non-specifically cleaved peptides in proteomics and peptidomics poses a significant computational challenge. Current strategies for the identification of such peptides are typically time-consuming and hinder routine data analysis. Objective: We aimed to design an algorithm that would improve the speed of semi- and nonspecific enzyme searches and could be applied to existing search programs. Methods: We developed a novel search algorithm that leverages fragment-ion redundancy to simultaneously search multiple non-specifically cleaved peptides at once. Briefly, a theoretical peptide tandem mass spectrum is generated using only the fragment-ion series from a single terminus. This spectrum serves as a proxy for several shorter theoretical peptides sharing the same terminus. After database searching, amino acids are removed from the opposing terminus until the observed and theoretical precursor masses match within a given mass tolerance. Results: The algorithm was implemented in the search program MetaMorpheus and found to perform an order of magnitude faster than the traditional MetaMorpheus search and produce superior results. Conclusion: We report a speedy non-specific enzyme search algorithm that is open-source and enables search programs to utilize fragment-ion redundancy to achieve a notable increase in search speed.
Use: pFind
Genomics, Proteomics & Bioinformatics 2020. Lin, XH et al.
Chinese Acad Sci, Inst Hydrobiol, State Key Lab Freshwater Ecol & Biotechnol, Wuhan 430072, Peoples R China.
ABSTRACT:Protein lysine methylation is a prevalent post-translational modification (PTM) and plays critical roles in all domains of life. However, its extent and function in photosynthetic organisms are still largely unknown. Cyanobacteria are a large group of prokaryotes that carry out oxygenic photosynthesis and are applied extensively in studies of photosynthetic mechanisms and environmental adaptation. Here we integrated propionylation of monomethylated proteins, enrichment of the modified peptides, and mass spectrometry (MS) analysis to identify monomethylated proteins in Synechocystis sp. PCC 6803 (Synechocystis). Overall, we identified 376 monomethylation sites in 270 proteins, with numerous monomethylated proteins participating in photosynthesis and carbon metabolism. We subsequently demonstrated that CpcM, a previously identified asparagine methyltransferase in Synechocystis, could catalyze lysine monomethylation of the potential aspartate aminotransferase Sll0480 both in vivo and in vitro and regulate the enzyme activity of Sll0480. The loss of CpcM led to decreases in the maximum quantum yield in primary photosystem II (PSII) and the efficiency of energy transfer during the photosynthetic reaction in Synechocystis. We report the first lysine monomethylome in a photosynthetic organism and present a critical database for functional analyses of monomethylation in cyanobacteria. The large number of monomethylated proteins and the identification of CpcM as the lysine methyltransferase in cyanobacteria suggest that reversible methylation may influence the metabolic process and photosynthesis in both cyanobacteria and plants.
Use: pFind
Analytical Chemistry 2020. Slavin, M et al.
Hebrew Univ Jerusalem, Inst Life Sci, IL-9190401 Jerusalem, Israel.
ABSTRACT:Development of new reagents for protein cross-linking is constantly ongoing. The chemical formulas for the linker adducts formed by these reagents are usually deduced from expert knowledge and then validated by mass spectrometry. Clearly, it would be more rigorous to infer the chemical compositions of the adducts directly from the data without any prior assumptions on their chemistries. Unfortunately, the analysis tools that are currently available to detect chemical modifications on linear peptides are not applicable to the case of two cross-linked peptides. Here, we show that an adaptation of the open search strategy that works on linear peptides can be used to characterize cross-link modifications in pairs of peptides. We benchmark our approach by correctly inferring the linker masses of two well-known reagents, DSS and formaldehyde, to accuracies of a few parts per million. We then investigate the cross-linking chemistries of two poorly characterized reagents: EMCS and glutaraldehyde. In the case of EMCS, we find that the expected cross-linking chemistry is accompanied by a competing chemistry that targets other amino acid types. In the case of glutaraldehyde, we find that the chemical formula of the dominant linker is C5H4, which indicates a ringed aromatic structure. These results demonstrate how, with very little effort, our approach can yield nontrivial insights to better characterize new cross-linkers.
Use: pFind
Journal of Pharmaceutical Analysis 2020. Cui, XL et al.
State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing
ABSTRACT:Posttranslational modifications of antibody products affect their stability, charge distribution, and drug activity and are thus a critical quality attribute. The comprehensive mapping of antibody modifications and different charge isomers (CIs) is of utmost importance but is challenging. We intended to quantitatively characterize the posttranslational modification status of CIs of antibody drugs and explore the impact of posttranslational modifications on charge heterogeneity. The CIs of antibodies were fractionated by strong cation exchange chromatography and verified by capillary isoelectric focusing-whole column imaging detection, followed by stepwise structural characterization at three levels. First, the differences between CIs were explored at the intact protein level using a top-down mass spectrometry approach; this showed differences in glycoforms and deamidation status. Second, at the peptide level, common modifications of oxidation, deamidation, and glycosylation were identified. Peptide mapping showed nonuniform deamidation and glycoform distribution among CIs. In total, 10 N-glycoforms were detected by peptide mapping. Finally, an in-depth analysis of glycan variants of CIs was performed through the detection of enriched glycopeptides. Qualitative and quantitative analyses demonstrated the dynamics of 24 N-glycoforms. The results revealed that sialic acid modification is a critical factor accounting for charge heterogeneity, which is otherwise missed in peptide mapping and intact molecular weight analyses. This study demonstrated the importance of the comprehensive analyses of antibody CIs and provides a reference method for the quality control of biopharmaceutical analysis.
Use: pFind
Journal of Separation Science 2020. Yang, C et al.
Chinese Acad Sci, Natl Chromatog Res & Anal Ctr, Dalian Inst Chem Phys, CAS Key Lab Separat Sci Analyt Chem, 457 Zhongshan Rd, Dalian 116023, Peoples R China.
ABSTRACT:Peptide sequencing is critical to the quality control of peptide drugs and functional studies of active peptides. A combination of peptidase digestion and mass spectrometry technology is common for peptide sequencing. However, such methods often cannot obtain the complete sequence of a peptide due to insufficient amino acid sequence information. Here, we developed a method of generating full peptide ladders and comparing their MS2 spectral similarities. The peptide ladders, of which each component was different from the next component with one residue, were generated by continuous digestion by peptidase (carboxypeptidase Y and aminopeptidase). Then, based on the characteristics of peptide ladders, complete sequencing was realized by comparing MS2 spectral similarity of the generated peptide ladders. The complete amino acid sequences of bivalirudin, adrenocorticotropic hormone, and oxytocin were determined with high accuracy. This approach is beneficial to the quality control of drug peptides as well as the identification of novel bioactive peptides.
Use: pFind