A Reference Library of Peptide Ion Fragmentation Spectra

Stephen Stein ; Lisa Kilpatrick ; Pedatsur Neta ; Jeri Roth ; Xiaoyu Yang
National Institute of Standards and Technology, Gaithersburg, MD/ Charleston, SC

Overview

Purpose
Create comprehensive, annotated mass spectral libraries from
various organisms and selected proteins to identify peptides by
matching their MS/MS spectra to reference spectra.
Methods
- Acquire shotgun proteomics data files from diverse sources.
- Identify peptides with available sequence search engines.
- For each peptide ion, create a consensus spectrum from
replicate spectra; also find best single spectrum.
- Derive reliability measures and remove ambiguities.
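The consensus step in the list above can be illustrated with a minimal sketch. It assumes replicate spectra are lists of (m/z, intensity) pairs, aligns peaks by fixed-width m/z binning, and keeps a peak only when more than half of the replicates contain it. The function name `consensus_spectrum`, the parameters `mz_tol` and `min_fraction`, and the binning scheme are illustrative assumptions, not the procedure used to build the libraries, which is a multi-step process that also rejects outliers and counts only spectra that might have generated the peak.

```python
from collections import defaultdict


def consensus_spectrum(replicates, mz_tol=0.5, min_fraction=0.5):
    """Build a simple consensus from replicate spectra.

    replicates: list of spectra, each a list of (mz, intensity) pairs.
    mz_tol: hypothetical bin width used to align peaks across replicates.
    min_fraction: a peak is kept only if more than this fraction of
        replicates contain it (simple majority by default).
    """
    n = len(replicates)
    # bin index -> list of (replicate index, mz, intensity)
    bins = defaultdict(list)
    for i, spectrum in enumerate(replicates):
        for mz, intensity in spectrum:
            bins[round(mz / mz_tol)].append((i, mz, intensity))

    consensus = []
    for key in sorted(bins):
        peaks = bins[key]
        voters = {i for i, _, _ in peaks}  # replicates that show this peak
        if len(voters) / n > min_fraction:
            mean_mz = sum(mz for _, mz, _ in peaks) / len(peaks)
            # Total intensity in the bin, averaged over contributing replicates.
            mean_int = sum(inten for _, _, inten in peaks) / len(voters)
            consensus.append((round(mean_mz, 3), round(mean_int, 1)))
    return consensus


replicates = [
    [(100.0, 10), (200.1, 50)],
    [(100.1, 20), (200.0, 70)],
    [(100.05, 30), (350.0, 5)],
]
result = consensus_spectrum(replicates)
```

Here the peak near m/z 350 appears in only one of three replicates and is dropped, while the peaks near m/z 100 and 200 are kept. Fixed-width binning can split a peak that straddles a bin boundary; a production implementation would need a more careful alignment.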
Results
- Spectrum libraries were built by matching both m/z values and
intensities of MS/MS peaks.
- Libraries derived from widely studied organisms such as human,
yeast, M. smegmatis, D. radiodurans, and standard proteins.
- Consensus spectra derived from reliable peptide identifications.
- Library indexing leads to very fast identification (<< 1 sec) even for very large libraries.
- Sequence identification by spectrum library searching identifies far more spectra of known peptides than sequence library searching, can be 100 times faster, and yields more robust and understandable results.

Introduction

High-throughput proteomics requires automated, fast, and accurate library search engines to identify peptide sequences from acquired MS/MS spectra. Current peptide identification methods match each measured MS/MS spectrum against a coarse theoretical spectrum of each possible peptide sequence. Since relative abundances, neutral losses from parent and product ions, and ratios of products having different charge states are not predictable, this rich, peptide-specific information is not effectively used for establishing identity. Also, prior occurrence information is ignored: each search identifies the peptide as if for the first time.

A spectrum library search matches not only the m/z but also the relative intensities of the MS/MS peaks, and can make use of other prior information. However, spectrum libraries can propagate errors, so reliable searching requires high-quality reference libraries, the development of which is described here. We find that identifying peptides by matching their MS/MS spectra to reference spectra can be faster, more reliable, and more informative than current sequence-based methods.

Methods

1. Acquire and organize shotgun proteomics data files from diverse sources.

H: 5347 LC-MS/MS data files from 11 labs and repositories
- Boston U. (Steffen/Ahmad)
- GPM (Beavis)
- HUPO/Plasma Proteome Project/Omenn
- HUPO/Brain Proteome Project/Meyer (not yet published)
- ISB/PeptideAtlas (Deutsch/King/Aebersold)
- NCI-SAIC (Veenstra)
- PNNL/NCRR (Smith)
- UC Davis (Rice/Lee)
- Q-tof data from USB (Pannell) and Mayo Clinic (Muddiman)

Yeast: 2503 LC-MS/MS data files from 12 laboratories
Online repositories:
- PeptideAtlas
- Open Proteomics Database
Collaborators/Contributors:
- Blueprint Initiative (Hogue)
- Harvard University (Gygi)
- ISB PeptideAtlas (Deutsch/King/Aebersold)
- NIH/LNT (Markey/Maynard/Geer/Kowalak)
- University of Arizona (Haynes)
- University of San Francisco (Burlingame/Baker)
- NIST Test Measurements

Mycobacterium smegmatis: 253 LC-MS/MS data files from the Open Proteomics Database online repository.

Deinococcus radiodurans: 495 LC-MS/MS data files from the PNNL/NCRR Repository.

Standard Proteins: 19 proteins were analyzed in NIST laboratories by LC-MS/MS.

2. Identify peptides with available sequence search engines.

Different search engines often give very different scores for matching a given peptide ion with a single spectrum (figure, bottom left panel). Current sequence search methods yield divergent scores for the same spectrum due to use of incomplete spectrum information. To capture the largest number of identifications, the highest score of up to four different search engines was used. This increased the number of reliable identifications by over 25% compared to any single method.

3. Create consensus spectrum and find best replicate spectrum.

For all spectra matching a given peptide ion, a multi-step process aligns m/z peaks, rejects outliers, and creates a consensus spectrum. It also finds the best replicate spectrum based on search engine scores and spectrum quality. A peak in a consensus spectrum must be present in a majority of the spectra that might have generated the peak.

4. Derive reliability measures for each spectrum.

A) Spectrum/Sequence Consistency
- Match theoretical spectrum, based on relative dissociation rates of adjacent amino acids (from statistical analysis of reliable spectra). Discrimination shown at right.
- Fraction of unassigned abundance (peaks not originating from a known fragmentation path)
- Y/B ion continuity and Y/B correlation

[Figure: Similarity of Measured vs. Theoretical Spectra (Dot Product x 100); regions labeled "Probably wrong" and "Probably right"]

B) Peptide Sequence Confirmation
- Other peptide ions with same sequence (different charge state or modification)
- Sequence contained in (or contains) another peptide
- Number of peptides per protein / protein length

C) Peptide Class (for setting acceptance threshold)
- Tryptic or semitryptic
- Semitryptic: in source or unexpected
- Semitryptic: confirmed or unconfirmed
- Missed cleavages: none or explained, or unexplained
- Missed cleavages: confirmed (found contained peptide) or unconfirmed

5. Remove ambiguities and build library.

- Create annotated spectra for consensus and best matching single spectra.
- Resolve problems of similar spectra that appear to generate different peptide ions.

[Figure: Reference spectrum and annotation; labels: Consensus thresholds, Confirmed]

Results

Libraries were built from different organisms.

Peptide Class              Human    Yeast   D. radiodurans   M. smegmatis   Standard Proteins
Consensus                 43,601   35,807            3,562          8,809               4,095
Singular (one ID)          1,864    2,458              126            284                  15
Simple Tryptic            36,447   24,205            3,252          6,050               2,097
Tryptic Missed Cleavage    7,127    5,620              254          2,486               1,555
Semi Tryptic                        5,982               56            273                 443
1+                         3,677    3,658              111          1,816                 663
2+                        30,194   22,327            2,130          5,168               1,832
3+                         9,730    9,822            1,287          1,799               1,320
4+                                                                                        245
5+                                                                                         35
ICAT                       6,640   15,061

Three library formats:
- Simple ASCII msp format (derived from EI MS Library)
- NIST Search Software (Windows, see figure below)
- Dynamic Link Library (Source & Binary)

[Figure: NIST Search Software; panels: Input list, Results, Query MS/MS, Head-to-tail sample and reference spectra comparison]

Spectrum searching identifies peptides fast and reliably.

Algorithms: Spectrum similarity scores have been adapted from algorithms used for electron ionization spectra. Peaks are weighted by their significance:
- Reduce significance of common impurity ions (e.g., neutral loss from parent ion)
- Reduce weight for uncertain and isotopic peaks
- Use library spectrum reliability
- Fold in sequence score for instrument dependence (uses OMSSA scoring)

Speed: Straightforward indexing leads to very fast identification (<< 1 sec) even for very large libraries.

Robustness: Spectrum match scores are less sensitive to spectral details than sequence scores (see figure below, left).

[Figure: Small Missing Peaks Can Have a Big Effect on Sequence Scores; axis: Sequence Search Score]

Spectrum library performance:

Several times as many spectra were identified by searching against spectra than against sequences (left panel, bottom right).

[Figure: Matching peptide and probability scores]

Test Set: Yeast analysis files from the Open Proteomics Database (OPD40, 12 LC-MS/MS runs).
Spectrum Library: Consensus spectra in current yeast library. D. radiodurans library for false IDs.
Sequence Library: Search forward and reverse yeast library using relative homology scores or expectation values.
Search Speed: Spectrum searching was about 100 times faster than sequence searching. May be accelerated by more peak indexing.

Conclusion

A reference spectrum library provides a sensitive, reliable, fast, and comprehensive resource for peptide identification.

A peptide mass spectrum library can be used for:
- Direct peptide identification
- Validating peptides identified by sequence search programs
- Organizing and identifying recurring, unidentified spectra
- Sensitive, high-reliability detection of internal standards, biomarkers, and target proteins
- Subtracting a component from a mixture spectrum

Collaborators

These libraries depend on contributors for their success. Please contribute. All spectra cite contributors.
- N. King et al., ISB: Annotation of the Yeast Proteome with PeptideAtlas (Poster WP27/522)
- H. Lam et al., ISB: SpectraST: An Open-Source MS/MS Spectra-Matching Library Search Tool for Targeted Proteomics (Poster WP27/530)
- L. Geer et al., NIH: Reducing false positive rates in MS/MS sequence searching and incorporating intensity into match based statistics (Poster TP34/638)
- HUPO PPP and BPP Projects
- Repositories and dozens of labs who directly and indirectly provided MS/MS data for public use

Contact: Steve Stein, Director, Mass Spectrometry Data Center, National Institute of Standards and Technology
[email protected]
301-975-2505
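The weighted spectrum similarity score described under Algorithms can be sketched as a normalized dot product scaled to 0-100, matching the "Dot Product x 100" axis used in the poster. This is only an illustration under stated assumptions: the function `similarity`, the `peak_weight` callable, the fixed-width m/z binning via `mz_tol`, and the square-root intensity scaling are all hypothetical choices, not the NIST scoring function.

```python
import math


def similarity(query, library, peak_weight=lambda mz: 1.0, mz_tol=0.5):
    """Return a 0-100 weighted dot-product score between two peak lists.

    query, library: lists of (mz, intensity) pairs.
    peak_weight: hypothetical callable giving a significance weight per
        m/z, e.g. below 1.0 for a parent-ion neutral-loss peak.
    mz_tol: hypothetical bin width used to match peaks.
    """
    def binned(spectrum):
        vec = {}
        for mz, intensity in spectrum:
            key = round(mz / mz_tol)
            # Square-root scaling (an assumption) damps dominant peaks.
            vec[key] = vec.get(key, 0.0) + peak_weight(mz) * math.sqrt(intensity)
        return vec

    q, ref = binned(query), binned(library)
    dot = sum(q[k] * ref[k] for k in q.keys() & ref.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in ref.values())
    )
    return 100.0 * dot / norm if norm else 0.0


spectrum_a = [(175.1, 40.0), (304.2, 100.0), (417.3, 65.0)]
spectrum_b = [(500.0, 10.0), (620.4, 80.0)]

same = similarity(spectrum_a, spectrum_a)
different = similarity(spectrum_a, spectrum_b)
```

Because both m/z positions and intensities enter the score, identical spectra score at the top of the scale and spectra with no shared peaks score zero. Passing a `peak_weight` that returns a value below 1.0 for selected m/z values reduces the influence of those peaks on the score.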
