Project details

This repository contains the data and code used in the preparation of the manuscript "Combining isobaric tags and peptidomics enables the detection of single amino acids and small peptides in human cerebrospinal fluid".

It contains a small proof-of-principle experiment, where isobaric tags are combined with a sample of cerebrospinal fluid to identify singly charged molecules obtained by tandem mass spectrometry. Identification is done with a simple mass-based strategy with 10 ppm tolerance.

The goal of the project is to show that this strategy can identify a reasonable number of single amino acids, very small peptides, and metabolites. While there is still considerable room for improvement, collecting singly charged molecules and analysing their spectra during a peptidomics experiment with isobaric tags adds little extra work and can potentially give a wealth of additional information.

Identification of spectra

The precursor mass of each MS2 spectrum with at least three out of six TMT tags is matched against a purpose-built database of masses. Only the precursor mass is used for identification, with a tolerance window of 10 ppm (±5 ppm relative to the theoretical mass in the database).
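As an illustration of the matching rule, here is a minimal sketch in base R. It is not the code from getIdentifications.R; the function name `match_by_mass`, its arguments, and the example masses are assumptions made for this example only.

```r
# Hypothetical helper, not taken from the repository.
# For each observed precursor m/z, return the indices of the database
# entries whose theoretical m/z lies within +/- half_width_ppm of it.
match_by_mass <- function(observed, theoretical, half_width_ppm = 5) {
  lapply(observed, function(mz) {
    # Error in ppm, relative to the theoretical mass in the database.
    ppm_error <- (mz - theoretical) / theoretical * 1e6
    which(abs(ppm_error) <= half_width_ppm)
  })
}

# Made-up masses, for illustration only.
theoretical <- c(300.0000, 305.2022, 420.2500)
observed    <- c(305.2030, 305.2100)

hits <- match_by_mass(observed, theoretical)
```

The first observed mass is about 2.6 ppm away from the second database entry and is matched to it; the second observed mass is about 25 ppm away from the nearest entry and gets no match.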

Molecules considered

Single amino acids, di- and tripeptides, and a list of metabolites are used for the identifications. The single amino acids, di- and tripeptides can also have up to one post-translational modification.

Single amino acids, di- and tripeptides

The masses of amino acids, elements (hydrogen, oxygen, and charge), and modifications are taken from Unimod. From www.unimod.org/downloads.html, we downloaded the XML file reflecting the logical structure of the database (unimod.xml).

Given the ubiquitous nature of oxidised methionine and carbamidomethylated cysteine, these two modified amino acids are treated as standard single amino acids. In addition, as identification is based solely on mass, we cannot distinguish leucine from isoleucine. Isoleucine is therefore removed from the database.

Dipeptides are generated by making all possible combinations of two amino acids. As only the precursor mass is used for identification, we will not distinguish between, for example, alanine + leucine vs leucine + alanine. Hence, only one variant of each combination of amino acids is included.

Tripeptides are produced in a similar fashion as the dipeptides.
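The construction of order-independent combinations can be sketched in base R as follows. This is not the code from makeAAdb.R; the helper `unordered_combinations` and the residue labels `M(ox)` and `C(cam)` are assumptions used for illustration.

```r
# Hypothetical helper, not taken from the repository.
# Returns all combinations of r items where order does not matter and
# repetition is allowed (so A+L is kept and L+A is dropped).
unordered_combinations <- function(items, r) {
  grid <- expand.grid(rep(list(seq_along(items)), r))
  # Keep only index tuples that are non-decreasing from left to right.
  keep <- rep(TRUE, nrow(grid))
  for (k in seq_len(r - 1)) {
    keep <- keep & grid[[k]] <= grid[[k + 1]]
  }
  grid <- grid[keep, , drop = FALSE]
  unname(apply(grid, 1, function(idx) paste(items[idx], collapse = "+")))
}

# 19 standard residues (isoleucine removed) plus the two modified residues
# that are treated as standard; the labels are illustrative.
residues <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "L", "K", "M",
              "F", "P", "S", "T", "W", "Y", "V", "M(ox)", "C(cam)")

dipeptides  <- unordered_combinations(residues, 2)
tripeptides <- unordered_combinations(residues, 3)
```

Enumerating the full grid and filtering is wasteful for long peptides, but for two or three residues out of 21 the grid stays small (at most 9261 rows).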

This leads to 2023 entries:

21 'single amino acids' (20 amino acids, minus isoleucine, plus oxidised methionine and carbamidomethylated cysteine)
231 dipeptides ((r+n-1)!/(r!(n-1)!) with r=2, n=21)
1771 tripeptides (as above, but with r=3)
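Written out, the counts above are combinations with repetition of r residues drawn from n = 21 building blocks:

```latex
\binom{n+r-1}{r} = \frac{(n+r-1)!}{r!\,(n-1)!}
\qquad
\binom{22}{2} = \frac{22 \cdot 21}{2} = 231
\qquad
\binom{23}{3} = \frac{23 \cdot 22 \cdot 21}{3 \cdot 2 \cdot 1} = 1771
```

Together with the 21 single entries this gives 21 + 231 + 1771 = 2023.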

We also added common post-translational modifications that are not on the N-term or the protein C-term.

This led to the following 11 modifications:

Name	Abbreviation	Amino Acid
Biotinylation	Biotin	K
Phosphorylation	Phospho	Y
Phosphorylation	Phospho	T
Phosphorylation	Phospho	S
Methylation	Methyl	E
Methylation	Methyl	D
O-Sulfonation	Sulfo	S
O-Sulfonation	Sulfo	T
O-Sulfonation	Sulfo	Y
dihydroxy	Dioxidation	M
Crotonylation	Crotonyl	K

This results in 2783 extra molecules:

11 modified single amino acids
231 modified dipeptides
2541 modified tripeptides

We do not add more than one post-translational modification (in addition to the TMT-tag) to any molecule.
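One reading that is consistent with these counts is that each of the 11 modification and residue pairs acts as one extra building block, which is then combined with an unordered selection of zero, one, or two of the 21 unmodified entries. The sketch below checks the arithmetic under that assumption; it is not taken from the repository code.

```r
# Modification table as listed above (abbreviation and target residue).
mods <- data.frame(
  abbreviation = c("Biotin", "Phospho", "Phospho", "Phospho", "Methyl",
                   "Methyl", "Sulfo", "Sulfo", "Sulfo", "Dioxidation",
                   "Crotonyl"),
  residue = c("K", "Y", "T", "S", "E", "D", "S", "T", "Y", "M", "K")
)

n_residues <- 21  # unmodified entries, as in the list of single amino acids

# Assumption: exactly one modified residue per molecule, the remaining
# positions are filled with unordered selections of unmodified residues.
n_modified_single <- nrow(mods)
n_modified_di     <- nrow(mods) * n_residues
n_modified_tri    <- nrow(mods) * choose(n_residues + 2 - 1, 2)

n_modified_total <- n_modified_single + n_modified_di + n_modified_tri
```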

Metabolites

The metabolites were taken from the Human Metabolome Database (HMDB). From the downloads site, we took the Metabolite and Protein Data in XML format for CSF metabolites (version 3.6, the most recent version at the time). We only included the subclasses "Amines" and "Amino acids, peptides, and analogues", and removed different versions of single amino acids.

Adding TMT, water, and charge

The masses of the amino acids taken from Unimod are residue masses, i.e. the mass of the amino acid minus one water. These masses are also used when combining single amino acids into di- and tripeptides. To get to the masses we expect to see in the experiment, we add two hydrogens and one oxygen (one water) from the elemental masses part of Unimod. The molecules in HMDB already have the expected mass.

As we expect a single charge and a TMT-tag, we also add one hydrogen minus an electron (the mass of a charge) from the elemental masses part of Unimod, and a TMT6 tag from the modifications part of Unimod, to each of the molecules in our database.
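As a worked example of this mass calculation, the sketch below computes the expected m/z of TMT-labelled, singly charged glycine. The numeric values are approximate monoisotopic masses typed in by hand for illustration; in the pipeline they are read from unimod.xml. The function names are assumptions and do not come from makeAAdb.R.

```r
# Approximate monoisotopic masses in Da (illustrative, typed in by hand).
hydrogen        <- 1.007825
oxygen          <- 15.994915
electron        <- 0.000549
tmt6plex        <- 229.162932
glycine_residue <- 57.021464

water  <- 2 * hydrogen + oxygen   # added once per amino acid or peptide
charge <- hydrogen - electron     # one hydrogen minus one electron

# Hypothetical helper: residue masses -> expected singly charged, tagged m/z.
expected_mz_peptide <- function(residue_masses) {
  sum(residue_masses) + water + charge + tmt6plex
}

# Hypothetical helper: HMDB masses already include the water.
expected_mz_metabolite <- function(mass) {
  mass + charge + tmt6plex
}

gly_mz <- expected_mz_peptide(glycine_residue)
```

With these values `gly_mz` comes out at roughly 305.2022. For a di- or tripeptide, pass a vector of residue masses; the water is still added only once.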

Available files and instructions for running them

Dependencies (R packages)

This project uses the following R packages:

stringr
XML
data.table
Rcpp (you'll probably also need Rtools to compile the cpp code)

For the graphics:

ggplot2
gridExtra
lattice
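To check which of these packages are still missing from your R library, something like the following can be used. This snippet is a convenience sketch and is not part of the repository.

```r
# Packages listed above.
required <- c("stringr", "XML", "data.table", "Rcpp",
              "ggplot2", "gridExtra", "lattice")

# Packages from the list that are not installed yet.
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```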

Data

In the top level of the repository, the file 20171024_EndoCSF_TMT_Rest_Charge1.mgf contains the MS2 spectra for the singly charged features. This file was generated using ProteoWizard MSConvert. To run the R files from the Rscript folder, you first need to download the XML files mentioned above from HMDB and Unimod.

Identification pipeline

To run the whole identification pipeline, run the file main.R. This first runs getMGF.R to read the mgf file and build a data table of the spectra, then makeAAdb.R, which constructs the database of theoretical masses described above, and finally getIdentifications.R, which maps the theoretical masses to the experimental masses.
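To give an idea of the first step, the sketch below extracts precursor m/z values from text in MGF layout using base R. It is not the code in getMGF.R; the function `read_precursor_mz` and the example spectrum are made up for illustration, and it assumes each spectrum has a single PEPMASS line.

```r
# Hypothetical helper, not taken from the repository.
# Extracts the precursor m/z from each PEPMASS line of an MGF file.
read_precursor_mz <- function(lines) {
  pepmass <- grep("^PEPMASS=", lines, value = TRUE)
  # PEPMASS may be followed by an intensity; keep only the first field.
  first_field <- sub("^PEPMASS=([^ \t]+).*$", "\\1", pepmass)
  as.numeric(first_field)
}

# A tiny made-up spectrum in MGF layout, for illustration only.
example <- c(
  "BEGIN IONS",
  "TITLE=example spectrum",
  "PEPMASS=305.2030 12345.6",
  "CHARGE=1+",
  "126.1277 1000",
  "127.1248 900",
  "END IONS"
)

precursors <- read_precursor_mz(example)

# For the real data one could pass the file contents instead:
# read_precursor_mz(readLines("20171024_EndoCSF_TMT_Rest_Charge1.mgf"))
```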

Graphics

The files barplots.R, heatplots.R, and scatterplot.R contain the code to build the graphics used in the publication. First run main.R, then run the appropriate file with the code for the graphics to reproduce these.

Contact

For questions please use the issue tracker.
