pFind Studio: a computational solution for mass spectrometry-based proteomics



2017




Different Proteome Profiles between Male and Female Populus cathayana Exposed to UV-B Radiation
Frontiers in Plant Science, 2017. Zhang, Yunxiang et al. Chinese Acad Sci, Inst Mt Hazards & Environm, Key Lab Mt Surface Proc & Ecol Regulat, Chengdu, Peoples R China
ABSTRACT: With increasing altitude, solar UV-B radiation is enhanced. Based on the male-biased sex ratio of Populus cathayana Rehder in high-altitude alpine areas, we hypothesized that males have a faster and more sophisticated response mechanism to high UV-B radiation than females. Our previous studies have shown that sexually different responses to high UV-B radiation exist in P. cathayana at the morphological, physiological, and transcriptomic levels. However, the responses at the proteomic level remain unclear. In this study, an isobaric tag for relative and absolute quantification (iTRAQ)-based quantitative proteome analysis was performed in P. cathayana females and males. A total of 2,405 proteins were identified, with 331 proteins defined as differentially expressed proteins (DEPs). Among these, 79 and 138 DEPs were decreased and 47 and 107 DEPs were increased under high solar UV-B radiation in females and males, respectively. A bioinformatics analysis categorized the common responsive proteins in the sexes as related to carbohydrate and energy metabolism, translation/transcription/post-transcriptional modification, photosynthesis, and redox reactions. The responsive proteins that differed between the sexes were mainly those involved in amino acid metabolism, stress response, and translation/transcription/post-transcriptional modification. This study provides proteomic profiles of poplars responding to solar UV-B radiation, and it also provides new insights into sex-related differences in responses to UV-B radiation.
Use: pFind



Proteomics investigations into serum proteins adsorbed by high-flux and low-flux dialysis membranes
Proteomics Clinical Applications, 2017. Han, S et al. Chinese Acad Sci, Natl Chromatog Res & Anal Ctr, Dalian Inst Chem Phys, Key Lab Separat Sci Analyt Chem, Dalian, Peoples R China.
ABSTRACT: Hemodialysis is one of the most important therapies for patients with uremia, and the dialysis membrane is the predominant factor that impacts the efficiency of dialysis. Here, protein adsorption on two different membranes is investigated to provide a basis for improving dialysis materials. Two cases treated with the Polyflux 14L low-flux dialyzer and the Polyflux 140H high-flux dialyzer during two continuous therapies are selected. Four used dialyzers from the selected patients are infused with C12Im-Cl to elute the adsorbed proteins. The digested proteins adsorbed by Polyflux 140H and Polyflux 14L are then labeled with ¹³CD2O and NaCNBD3 (light labeling, L) and with CD2O and NaCNBH3 (heavy labeling, H), respectively. Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is used to identify the proteins. According to the ratio (light labeling/heavy labeling), the eluted proteins are divided into three groups: significantly higher, significantly lower, and no significant difference, with ratios of > 2, < 0.5, and 0.5-2, respectively. A total of 668 proteins are identified by LC-MS/MS, among which 177 proteins are retained more by the Polyflux 140H membrane (ratio > 2), 320 proteins are retained more by the Polyflux 14L membrane (ratio < 0.5), and 171 proteins show no significant difference (ratio 0.5-2) between the two types of membranes. The percentage of adsorbed proteins with an isoelectric point (pI) ranging from 9 to 10 differs significantly between the two membranes (19.08 versus 7.69%; χ² = 11.87, p = 0.0006). Proteins with a molecular weight (MW) of 10-15 kDa tend to deposit on Polyflux 140H compared with Polyflux 14L (25 versus 9.23%; χ² = 18.66, p = 0.0000), and proteins with a MW of 30-60 kDa tend to deposit on Polyflux 14L compared with Polyflux 140H (36.54 versus 22.37%; χ² = 8.96, p = 0.0028).
According to gene ontology analysis, the proteins adsorbed by dialysis membranes are closely related to activation of the complement system and the coagulation cascade. The proteins adsorbed by Polyflux 140H and Polyflux 14L show significant differences in pI, MW, and protein class. Proteomic techniques are an effective approach for studying hemodialysis membranes.
Use: pFind
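
The three-way grouping by light/heavy ratio described in this abstract can be illustrated with a minimal Python sketch. The function name, group labels, and example ratios are hypothetical and chosen only for illustration; only the thresholds (> 2, < 0.5, 0.5-2) and the direction of each group come from the abstract.

```python
def classify_ratio(ratio, upper=2.0, lower=0.5):
    """Assign a light/heavy (L/H) ratio to one of the three groups in the abstract."""
    if ratio > upper:
        return "retained more by Polyflux 140H"
    if ratio < lower:
        return "retained more by Polyflux 14L"
    # Ratios from 0.5 to 2 (inclusive) count as no significant difference.
    return "no significant difference"


# Hypothetical L/H ratios, for illustration only (not data from the study).
ratios = {"protein_A": 3.1, "protein_B": 0.2, "protein_C": 1.0}
groups = {name: classify_ratio(value) for name, value in ratios.items()}

for name, group in groups.items():
    print(f"{name}: {group}")
```

The boundary values 0.5 and 2 fall into the middle group here, which is one reading of the "0.5-2" range; the abstract does not state how exact boundary values were handled.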



A rapid and easy protein N-terminal profiling strategy using (N-Succinimidyloxycarbonylmethyl)tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP) labeling
Proteomics, 2017. Li, YC et al. Beijing Proteome Res Ctr, 38 Sci Pk Rd, Beijing 102206, Peoples R China.
ABSTRACT: Protein N-terminal profiling is crucial when characterizing biological functions and provides proteomic evidence for genome reannotation. However, most current N-terminal enrichment approaches involve multiple chemical derivatizations and chromatographic separation processes, which are time consuming and can contribute to N-terminal peptide losses. In this study, a fast, one-step approach utilizing (N-Succinimidyloxycarbonylmethyl)tris(2,4,6-trimethoxyphenyl)phosphonium bromide (TMPP) derivatization and StageTip separation was developed to enhance N-terminal peptide enrichment and analysis. Based on the characteristics of TMPP-derivatized samples, such as higher hydrophobicity and an increased likelihood of producing a and b ions in collision-induced dissociation or HCD fragmentation modes, the SDS-PAGE was first optimized to increase protein loading and gel entry and to remove unbound TMPP. Then, this process was combined with a simplified StageTip separation and a new scoring criterion (considering a, b and y ions) to identify more TMPP-modified N-terminal spectra. When utilizing a low amount of starting material (~20 μg protein), a total of 581 yeast N-terminal peptides were identified, with 485 of them being TMPP modified, in only about one third of the general experimental time. It is hoped that the workflow constructed herein will provide a fast and practical strategy for N-terminomic studies.
Use: pFind



Protein-level integration strategy of multiengine MS spectra search results for higher confidence and sequence coverage
Journal of Proteome Research, 2017. Zhao, PP et al. Jinan Univ, Key Lab Funct Prot Res Guangdong Higher Educ Inst, Inst Life & Hlth Engn, Coll Life Sci & Technol, Guangzhou 510632, Guangdong, Peoples R China.
ABSTRACT: Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, providing different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question due to the lack of an independent, reliable, and highly sensitive standard. We took advantage of translating mRNA sequencing (RNC-seq) results as a standard to evaluate strategies for integrating the protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, the union of the seven engines boosted false identifications, whereas the intersection of the seven engines remarkably decreased the identification power. We found that accepting identifications made by at least two of the seven engines maximized the protein identification power while minimizing the ratio of suspicious/translation-supported identifications (STR), as monitored by our STR index, based on RNC-seq. Furthermore, this strategy also significantly improves the peptide coverage of the protein amino acid sequence. In summary, we demonstrated a simple strategy to significantly improve the performance of shotgun mass spectrometry by integrating multiple search engines at the protein level, maximizing the utilization of the current MS spectra without additional experimental work.
Use: pFind
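
The "at least two of seven engines" rule from this abstract, together with the union and intersection it is compared against, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name, the dictionary layout, and the protein accessions are hypothetical, and the paper's STR index and RNC-seq validation are not modeled.

```python
from collections import Counter


def integrate_identifications(engine_results, min_engines=2):
    """Return the proteins reported by at least `min_engines` search engines."""
    votes = Counter()
    for proteins in engine_results.values():
        votes.update(set(proteins))  # each engine votes at most once per protein
    return {protein for protein, count in votes.items() if count >= min_engines}


# Hypothetical protein accessions, for illustration only.
engine_results = {
    "Mascot": ["P1", "P2", "P3"],
    "pFind": ["P2", "P3", "P4"],
    "X!Tandem": ["P3", "P5"],
}

consensus = integrate_identifications(engine_results, min_engines=2)
union = set().union(*engine_results.values())
intersection = set.intersection(*(set(p) for p in engine_results.values()))

print(sorted(consensus))     # proteins supported by two or more engines
print(sorted(union))         # most permissive: any engine
print(sorted(intersection))  # most restrictive: every engine
```

Setting `min_engines` to 1 reproduces the union and setting it to the number of engines reproduces the intersection, so the single parameter spans the range of strategies the abstract compares.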



Sequential fragment ion filtering and endoglycosidase-assisted identification of intact glycopeptides
Analytical and Bioanalytical Chemistry, 2017. Yu, ZX et al. Anhui Med Univ, 81 Meishan Rd, Hefei 230032, Anhui, Peoples R China.
ABSTRACT: Detailed characterization of glycoprotein structures requires determining both the sites of glycosylation and the glycan structures associated with each site. In this work, we developed an analytical strategy for characterization of intact N-glycopeptides in complex proteome samples. In the first step, tryptic glycopeptides were enriched using ZIC-HILIC. Secondly, a portion of the glycopeptides was treated with endoglycosidase H (Endo H) to remove high-mannose (Man) and hybrid N-linked glycans. Thirdly, a fraction of the Endo H-treated glycopeptides was further subjected to PNGase F treatment in O-18 water to remove the remaining complex glycans. The intact glycopeptides and deglycosylated peptides were analyzed by nano-RPLC-MS/MS, and the glycan structures and the peptide sequences were identified by using the Byonic or pFind tools. Sequential digestion by endoglycosidase provided candidate glycosite information and an indication of the glycoforms on each glycopeptide, thus helping to confine the database search space and improve the confidence of intact glycopeptide identification. We demonstrated the effectiveness of this approach using RNase B and IgG and applied this sequential digestion strategy to the identification of glycopeptides from the HepG2 cell line. We identified 4514 intact glycopeptides coming from 947 glycosites and 1011 unique peptide sequences from HepG2 cells. The intensity of different glycoforms at a specific glycosite was obtained to derive the occupancy ratios of site-specific glycoforms. These results indicate that our method can be used for characterizing site-specific protein glycosylation in complex samples.
Use: pFind; pGlyco



Chemoproteomics reveals unexpected lysine/arginine-specific cleavage of peptide chains as a potential protein degradation machinery
Analytical Chemistry, 2017. Tian, CP et al. Beijing Inst Life, Beijing Proteome Res Ctr, Natl Ctr Prot Sci, State Key Lab Prote, Beijing 102206, Peoples R China.
ABSTRACT: Proteins can undergo oxidative cleavage by in vitro metal-catalyzed oxidation (MCO) in either the α-amidation or the diamide pathway. However, whether oxidative cleavage of polypeptide chains occurs in biological systems remains unexplored. We describe a chemoproteomic approach to globally and site-specifically profile electrophilic protein degradants formed from peptide backbone cleavages in human proteomes, including the known N-terminal alpha-ketoacyl products and >1000 unexpected N-terminal formyl products. Strikingly, such cleavages predominantly occur at the carboxyl side of lysine (K) and arginine (R) residues across native proteomes in situ, while MCO-induced oxidative cleavages are randomly distributed on peptide/protein sequences in vitro. Furthermore, ionizing radiation-induced reactive oxygen species (ROS) also generate random oxidative cleavages in situ. These findings suggest that the endogenous formation of N-formyl and N-alpha-ketoacyl degradants in biological systems is more likely regulated by a previously unknown mechanism with a trypsin-like specificity, rather than by random oxidative damage as previously thought. More generally, our study highlights the utility of quantitative chemoproteomics in combination with unrestricted search tools as a viable strategy to discover unexpected chemical modifications of proteins labeled with activity-based probes.
Use: pFind; pQuant



A chemoproteomic platform to assess bioactivation potential of drugs
Chemical Research in Toxicology, 2017. Sun, R et al. Beijing Inst Radiat Med, Beijing Proteome Res Ctr, Natl Ctr Prot Sci, State Key Lab Prote, Beijing 102206, Peoples R China.
ABSTRACT: Reactive metabolites (RM) formed from bioactivation of drugs can covalently modify liver proteins and cause mechanism-based inactivation of major cytochrome P450 (CYP450) enzymes. Risk of bioactivation of a test compound is routinely examined as part of lead optimization efforts in drug discovery. Here we describe a chemoproteomic platform to assess the in vitro and in vivo bioactivation potential of drugs. This platform enabled us to determine the reactivity of thousands of proteomic cysteines toward RMs of diclofenac formed in human liver microsomes and living animals. We pinpointed numerous reactive cysteines as the targets of RMs of diclofenac, including the active (heme-binding) sites on several key CYP450 isoforms (1A2, 2E1 and 3A4 for human, 2C39 and 3A11 for mouse). This general platform should be applicable to other drugs, drug candidates, and xenobiotics with potential hepatotoxicity, including environmental organic substances, bioactive natural products, and traditional Chinese medicine.
Use: pFind; pQuant



The Null-Test for peptide identification algorithm in Shotgun proteomics
Journal of Proteomics, 2017. Zhang, SR et al. Chinese Acad Sci, Dalian Inst Chem Phys, Key Lab Separat Sci Analyt Chem, Natl Chromatog Res & Anal Ctr, Dalian, Peoples R China.
ABSTRACT: The present research proposes a general evaluation strategy named the Null-Test for peptide identification algorithms in shotgun proteomics. The Null-Test method, based on random matching, can be used to check whether an algorithm has a tendency to make mistakes or has potential bugs, faults, or errors, and to validate the reliability of the identification algorithm. Unfortunately, none of the five well-known identification software packages could pass the most stringent Null-Test. PatternLab performed well in both the Null-Test and routine searches by controlling overfitting well through sound design. The fuzzy logic-based method, presented as another candidate strategy, could pass the Null-Test and has competitive efficiency in peptide identification. Filtering the results by an appropriate FDR would increase the number of discoveries in an experiment, at the cost of losing control of Type I errors. Thus, it is necessary to use more stringent criteria when designing or analyzing an algorithm or software package. The more stringent criteria will facilitate the discovery of latent bugs, faults, or errors in the algorithm or software. It is recommended to use an independent search combining a random database with statistical theory to estimate the accurate FDR of the identified results. Biological significance: In the past decades, considerable effort has been devoted to developing sensitive algorithms for peptide identification in shotgun proteomics. However, little attention has been paid to controlling the reliability of the identification algorithm at the design stage. The Null-Test based on random matching can be used to check whether the algorithm has a tendency to make mistakes or has potential bugs, faults, or errors. However, it turns out that none of the five well-known identification software packages could pass the most stringent Null-Test in the present study, which should be taken seriously.
Accordingly, a candidate strategy based on fuzzy logic demonstrates that it is possible for an identification algorithm to pass the Null-Test. PatternLab shows that earlier control of overfitting is valuable for designing an efficient algorithm.
Use: pFind



Towards Centralized MS/MS Spectra Preprocessing: An Empirical Evaluation of Peptides Search Engines using Ground Truth Datasets
2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), 2017. Maabreh, Majdi et al. Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
ABSTRACT: Several peptide search engines have been developed in recent decades. For the same inputs, different search engines often identify different peptides, which can confuse stakeholders in the field of proteomics. The massive amount of spectra generated by high-throughput spectrometers adds another challenge that handicaps current search engines. This motivates researchers to evaluate combinations of several search engines. Several studies have provided ensemble solutions over shared and distributed computing environments for reliable results. However, the massive amount of MS/MS spectra creates cumbersome traffic over the systems' networks. This issue directly impacts the searching performance and also adds unnecessary extra costs (computing, storage, network traffic) if a cloud cluster is being used. The main question of this paper is: Can we build a central MS/MS spectra preprocessing step for semantically different protein search engines? We evaluate different statistical reduction techniques using four popular protein search engines. In order to evaluate the results fairly, we build unanimity-based ground truth datasets for two different species: yeast and human. Our techniques result in significant peak reduction, where only around 30% of the spectral peaks are enough to report reliable identifications from the search engines used in this study.
Use: pFind
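
The abstract above does not specify which statistical reduction techniques were evaluated. As one simple stand-in, the following Python sketch keeps only the most intense 30% of peaks in a spectrum. The function name, the choice of intensity ranking as the criterion, and the example spectrum are all assumptions made for illustration, not the authors' method.

```python
def keep_top_peaks(peaks, fraction=0.3):
    """Keep the most intense `fraction` of (m/z, intensity) peaks, ordered by m/z."""
    if not peaks:
        return []
    n_keep = max(1, round(len(peaks) * fraction))
    # Rank by intensity (assumed criterion), keep the strongest peaks.
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:n_keep]
    # Restore m/z order, which search engines generally expect.
    return sorted(strongest)


# Hypothetical spectrum: ten peaks with increasing intensity.
spectrum = [(100.0 + i, float(i + 1)) for i in range(10)]
reduced = keep_top_peaks(spectrum, fraction=0.3)
print(reduced)
```

A centralized preprocessing step of this kind would run once per spectrum before the reduced peak list is sent to each search engine, which is where the network and storage savings mentioned in the abstract would come from.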



Deep vs. shallow learning-based filters of MS/MS spectra in support of protein search engines
2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2017. Maabreh, Majdi et al. Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
ABSTRACT: Despite the linear relation between the number of observed spectra and the searching time, current protein search engines, even the parallel versions, can take several hours to search a large amount of MS/MS spectra, which can be generated in a short time. After a laborious searching process, some (and at times, the majority) of the observed spectra are labeled as non-identifiable. We evaluate the role of machine learning in building an efficient MS/MS filter to remove non-identifiable spectra. We compare and evaluate a deep learning algorithm against 9 shallow learning algorithms with different configurations. Using 10 different datasets generated from two different search engines, different instruments, different sizes, and different species, we experimentally show that deep learning models are powerful in filtering MS/MS spectra. We also show that our simple feature list is significant, as other shallow learning algorithms showed encouraging results in filtering the MS/MS spectra. Our deep learning model can exclude around 50% of the non-identifiable spectra while losing, on average, only 9% of the identifiable ones. As for shallow learning, the Random Forest, Support Vector Machine, and Neural Network algorithms showed encouraging results, eliminating, on average, 70% of the non-identifiable spectra while losing around 25% of the identifiable ones. The deep learning algorithm may be especially useful in instances where the protein(s) of interest are in lower cellular or tissue concentration, while the other algorithms may be more useful for concentrated or more highly expressed proteins.
Use: pFind
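
The two figures this abstract reports for each filter (the share of non-identifiable spectra removed and the share of identifiable spectra lost) can be computed from per-spectrum labels as in the Python sketch below. The function name and the example labels are hypothetical; the sketch only shows how the two percentages are defined, not how the authors' models were trained or evaluated.

```python
def filter_tradeoff(identifiable, removed):
    """Return (share of non-identifiable spectra removed,
    share of identifiable spectra lost) for a spectrum filter."""
    pairs = list(zip(identifiable, removed))
    n_identifiable = sum(1 for ident, _ in pairs if ident)
    n_non_identifiable = len(pairs) - n_identifiable
    removed_non = sum(1 for ident, rem in pairs if rem and not ident)
    lost_ident = sum(1 for ident, rem in pairs if rem and ident)
    share_removed = removed_non / n_non_identifiable if n_non_identifiable else 0.0
    share_lost = lost_ident / n_identifiable if n_identifiable else 0.0
    return share_removed, share_lost


# Hypothetical labels: True means the search engine identified the spectrum.
identifiable = [True, True, True, True, False, False, False, False]
# Hypothetical filter decisions: True means the filter discarded the spectrum.
removed = [False, False, False, True, True, True, False, False]

share_removed, share_lost = filter_tradeoff(identifiable, removed)
print(f"non-identifiable removed: {share_removed:.0%}, identifiable lost: {share_lost:.0%}")
```

Reporting both numbers together matters because a filter can always remove more non-identifiable spectra by also discarding more identifiable ones.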