Description

PeptideAtlas collects raw mass spectrometry proteomics datasets from laboratories around the world and reprocesses them in a uniform bioinformatics workflow using the Trans-Proteomic Pipeline. This track displays peptide identifications from the PeptideAtlas August 2014 (Build 433) Human build. This build, based on 971 samples containing 420,607,360 spectra, identified 1,021,823 distinct peptides, covering 15,136 canonical proteins.

Each PeptideAtlas build comprises a set of reprocessed experiments from a single species or from a subset of samples (such as human plasma) from a species. Processed results are filtered to a quality level such that there is a 1% false discovery rate at the protein level. All peptide identifications of sufficient quality to enter a build are mapped to the Ensembl genome (v75) using the Ensembl toolkit. The genomic coordinates of every identified peptide, for each of its Ensembl protein, transcript, and gene mappings and including intron spans, are calculated by the Ensembl toolkit and stored in the PeptideAtlas database.
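To illustrate how a peptide's position within a protein can translate into genomic coordinates that include intron spans, here is a minimal sketch in Python. It is not the Ensembl toolkit's implementation. The function name peptide_to_genomic_blocks, the 0-based half-open interval convention, and the example exon coordinates are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed convention: 0-based, half-open genomic interval of coding sequence.
Exon = Tuple[int, int]


def peptide_to_genomic_blocks(
    cds_exons: List[Exon], strand: str, aa_start: int, aa_len: int
) -> List[Exon]:
    """Map a peptide (0-based residue offset, length) to genomic blocks.

    cds_exons holds the coding portion of each exon in ascending genomic
    order. Gaps between the returned blocks correspond to intron spans.
    """
    # Peptide extent in CDS nucleotide coordinates (3 nt per residue).
    cds_from = aa_start * 3
    cds_to = cds_from + aa_len * 3

    # Walk exons in transcription order: reversed for the minus strand.
    ordered = cds_exons if strand == "+" else list(reversed(cds_exons))

    blocks = []
    consumed = 0  # CDS nucleotides covered by the exons visited so far
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start
        lo = max(cds_from, consumed)
        hi = min(cds_to, consumed + ex_len)
        if lo < hi:  # the peptide overlaps this exon
            off_lo, off_hi = lo - consumed, hi - consumed
            if strand == "+":
                blocks.append((ex_start + off_lo, ex_start + off_hi))
            else:
                blocks.append((ex_end - off_hi, ex_end - off_lo))
        consumed += ex_len
    return sorted(blocks)


# Hypothetical two-exon CDS; residues 2..4 straddle the intron.
print(peptide_to_genomic_blocks([(100, 109), (200, 212)], "+", 2, 3))
# prints [(106, 109), (200, 206)]
```

The gap between the two returned blocks (109 to 200 in this made-up example) is the intron span that a browser would draw as a thin connecting line.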

All peptide sequences in the August 2014 human build (including unmapped sequences) are available for download in FASTA format.
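Once downloaded, the FASTA file can be read with a few lines of Python. The sketch below is a generic FASTA reader, not a PeptideAtlas tool. The function name read_fasta and the sample headers and sequences are invented for illustration, and the actual header layout of the download may differ.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines.

    The header is the full text after '>', and sequence lines that
    belong to the same record are concatenated.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


# Made-up records, used only to show the call.
sample = [">pep1 demo", "ACDE", "FGH", "", ">pep2", "KLMN"]
peptides = read_fasta(sample)
print(len(peptides))  # prints 2
```

For a real file, pass an open file handle instead of the list, for example read_fasta(open(path)), where path is wherever the download was saved.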

Methods

Mass spectrometer spectra are compared to theoretical spectra (SEQUEST, X!Tandem) or actual spectra (SpectraST) to identify possible peptides. These peptide identifications are scored and filtered (using PeptideProphet) to retain only the highest scoring identifications. The filtered sequences are compared to protein sequence databases (for human, Ensembl, IPI, and Swiss-Prot). The CDS coordinates of the matched sequences, relative to the protein start, are then used to calculate genomic coordinates. The protein identifications are then clustered and annotated using ProteinProphet, and stored in the SBEAMS database, where they are assigned a unique identifier of the form PAp[8-digit number], e.g. PAp00000001. The processing pipeline is summarized in the graphic below.
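The identifier format described above (the prefix PAp followed by an eight-digit number) can be generated and checked as in the following sketch. The helper names format_pap_id and is_pap_id are hypothetical and are not part of SBEAMS or PeptideAtlas. The sketch also assumes the number is zero-padded, as in the example PAp00000001.

```python
import re

# "PAp" followed by exactly eight ASCII digits, per the description above.
PAP_PATTERN = re.compile(r"PAp[0-9]{8}")


def format_pap_id(number: int) -> str:
    """Build an identifier such as PAp00000001 from an integer."""
    if not 0 <= number <= 99_999_999:
        raise ValueError("number must fit in eight digits")
    return f"PAp{number:08d}"


def is_pap_id(text: str) -> bool:
    """Return True if text has the PAp[8-digit number] form."""
    return PAP_PATTERN.fullmatch(text) is not None


print(format_pap_id(1))          # prints PAp00000001
print(is_pap_id("PAp00000001"))  # prints True
print(is_pap_id("PAp123"))       # prints False
```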

Credits

Eric Deutsch, Zhi Sun, and the PeptideAtlas team at the Institute for Systems Biology, Seattle.

References

Desiere F, Deutsch EW, King NL, Nesvizhskii AI, Mallick P, Eng J, Chen S, Eddes J, Loevenich SN, Aebersold R. The PeptideAtlas project. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D655-8. PMID: 16381952; PMC: PMC1347403

Farrah T, Deutsch EW, Omenn GS, Sun Z, Watts JD, Yamamoto T, Shteynberg D, Harris MM, Moritz RL. State of the human proteome in 2013 as viewed through PeptideAtlas: comparing the kidney, urine, and plasma proteomes for the biology- and disease-driven Human Proteome Project. J Proteome Res. 2014 Jan 3;13(1):60-75. PMID: 24261998; PMC: PMC3951210

Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002 Oct 15;74(20):5383-92. PMID: 12403597

Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem. 2003 Sep 1;75(17):4646-58. PMID: 14632076