-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
2 changed files
with
151 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
# compareMS2-cmd | ||
|
||
compareMS2-cmd is the command line software of compareMS2. compareMS2 calculates the global similarity between tandem mass spectrometry datasets. | ||
A GUI for this software is available at [compareMS2](https://github.com/524D/compareMS2). | ||
|
||
[1. Introduction](#1-Introduction) | ||
[1.1 What is compareMS2?](#11-What-is-compareMS2) | ||
[1.2 How does compareMS2 differ from other tools?](#12-How-does-compareMS2-differ-from-other-tools) | ||
[1.3 What can compareMS2 be used for?](#13-What-can-compareMS2-be-used-for) | ||
[2. Installing compareMS2](#2-Installing-compareMS2) | ||
[2.1 Running compareMS2 in development mode](#21-Running-compareMS2-in-development-mode) | ||
[2.2 Building compareMS2](#22-Building-compareMS2) | ||
[3. Using compareMS2](#3-Using-compareMS2) | ||
[3.1 Configuring compareMS2](#31-Configuring-compareMS2) | ||
[3.2 Calculating distance matrices](#32-Calculating-distance-matrices) | ||
[3.3 Running compareMS2](#33-Running-compareMS2) | ||
[3.4 Molecular phylogenetics](#34-Molecular-phylogenetics) | ||
[3.5 Data quality control](#35-Data-quality-control) | ||
[3.6 Experimental features](#36-Experimental-features) | ||
[4. Acknowledgements ](#4-Acknowledgements) | ||
[5. Further reading](#5-Further-reading) | ||
|
||
## 1. Introduction | ||
|
||
### 1.1 What is compareMS2? | ||
|
||
compareMS2 is a tool for direct comparison of tandem mass spectrometry datasets, typically from liquid chromatography-tandem mass spectrometry (LC-MS/MS), defining similarity as a function of shared (similar) spectra and distance as the inverse of this similarity. Data with identical spectral content thus have similarity 1 and distance 0. The similarity of datasets with no similar spectra tend to 0 (distance +∞) as the size of the sets go to infinity. The extremes of none or all spectra being similar between two LC-MS/MS datasets are unlikely to occur in reality. | ||
|
||
### 1.2 How does compareMS2 differ from other tools? | ||
|
||
Though compareMS2 is not limited to tandem mass spectra of peptides, it has seen most application to this type of data. There are four broad categories of tools for the analysis of peptide tandem mass spectra in mass spectrometry-based proteomics based on what prior information they utilize. compareMS2 belongs to a class of tools that do not use existing sequence data or libraries of spectra assigned to a specific peptide sequence, but compare tandem mass spectra directly with other tandem mass spectra: | ||
|
||
<table> | ||
<tbody> | ||
<tr> | ||
<th rowspan=2 colspan=2></th> | ||
<th colspan=2>(translated) genome sequences available</th> | ||
</tr> | ||
<tr> | ||
<td>+</td> | ||
<td>-</td> | ||
</tr> | ||
<tr> | ||
<th rowspan=2>prior/other tandem <br>mass spectra available</th> | ||
<td>+</td> | ||
<td>spectral libraries (<a href="https://doi.org/10.1021/ac060279n">BiblioSpec</a>, | ||
<a href="https://doi.org/10.1002/pmic.200600625">SpectraST</a>, ...)</td> | ||
<td>direct comparison (<a href="https://doi.org/10.1002/rcm.6162"><b>compareMS2</b></a>, | ||
<a href="https://doi.org/10.1186/s12859-017-1514-2">DISMS2</a>, ...)</td> | ||
</tr> | ||
<tr> | ||
<td>-</td> | ||
<td>database search (<a href="https://doi.org/10.1002/(SICI)1522-2683(19991201)20:18<3551::AID-ELPS3551>3.0.CO;2-2">Mascot</a>, | ||
<a href="https://doi.org/10.1038/msb4100024">Comet</a>, ...)</td> | ||
<td><i>de novo</i> sequencing (<a href="https://doi.org/10.1002/pro.5560010902">LUTEFISK</a>, | ||
<a href="https://doi.org/10.1021/ac048788h">PepNovo</a>, ...)</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
|
||
### 1.3 What can compareMS2 be used for? | ||
|
||
compareMS2 (and similar tools) have extremely broad utility, but have so far seen most utility in data quality control, food/feed species identification and molecular phylogenetics. Molecular phylogenetics is the study of evolution and relatedness of organisms, genes or proteins. The field dates back to [1960](https://doi.org/10.1073/pnas.46.10.1349) using patterns of tryptic peptides separated by paper chromatography. compareMS2 is a [21st-century analogue](https://doi.org/10.1021/acs.jproteome.1c00528), comparing patterns of tryptic peptides as analyzed by tandem mass spectrometry, with the difference that it can use thousands of peptides and that the tandem mass spectra are highly peptide-specific. | ||
|
||
However, not only the amino acid sequences of the peptides affect the distance metric in compareMS2, but also the abundance (or coverage) of the proteins. compareMS2 can also be used to quantify the similarity of proteomes from different cell lines or tissues from the same species, before and independently of any protein identification by database or spectral library search. | ||
|
||
## 2. Installing the compareMS2 command line software | ||
|
||
|
||
## 3. Using compareMS2 | ||
|
||
This section describes how to run the compareMS2 command line software. Every compareMS2 analysis consists of two phases: (1) pairwise comparison of all LC-MS/MS datasets and (2) calculating a distance matrix from all pairwise comparisons. The pairwise comparison of the datasets provides a measure of their similarity. The distance matrix can then be used for molecular phylogenetics, data quality control or other purposes. | ||
|
||
### 3.1 Configuring compareMS2 | ||
|
||
The compareMS2 CLI has a small number of parameters, which are: | ||
|
||
-A *first dataset filename* | ||
-B *second dataset filename* | ||
-W *first scan number*, *last scan number* | ||
-R *first retention time*, *last retention time* | ||
-c *cutoff for spectral similarity* | ||
-o *output filename* | ||
-m *minimum base peak intensity*, *minimum total MS/MS intensity* | ||
-w *maximum scan number difference* | ||
-r *maximum retention time difference* | ||
-p *maximum difference in precursor mass* | ||
-e *maximum precursor mass measurement error* | ||
-s *intensity scaling before dot product* | ||
-n *noise threshold for dot product* | ||
-d *version of set distance metric* | ||
-q *version of QC metric* | ||
-N *include only N most intense spectra in comparison* | ||
-b *bin size for dot product* | ||
-I *minimum number of peaks for dot product* | ||
-L *lower m/z for dot product* | ||
-U *upper m/z for dot product* | ||
-x *experimental features* | ||
|
||
The compareMS2 GUI exposes some of these, and determine others automatically, e.g. the dataset filenames from a specified directory. | ||
|
||
### 3.2 Calculating distance matrices | ||
|
||
Distance matrices are calculated using a separate executable, compareMS2_to_distance_matrices. This can also average the distances for multiple replicates per species for more accurate molecular phylogenetic analysis. For this, a tab-delimited file with filenames and species names are required. If no such file is provided, one is created automatically, using the filenames as sample "species". The distance matrix can currently be saved in the MEGA or Nexus formats. [MEGA](https://www.megasoftware.net/) is recommended for creating trees from compareMS2 results. | ||
|
||
### 3.3 Running compareMS2 | ||
|
||
### 3.4 Molecular phylogenetics | ||
|
||
We recommend [MEGA](https://www.megasoftware.net/) creating phylogenetic trees from compareMS2 results. However, most phylogenetic software can take distance matrices as input for UPGMA analysis. This was the original use for which compareMS2 was developed, see the [2012 paper](https://doi.org/10.1002/rcm.6162). | ||
|
||
### 3.6 Experimental features | ||
|
||
Starting in version 2.0, we have begun to include experimental features in compareMS2. These are only available on the command line, but allow extraction of additional information from the comparisons, such as the distribution of similarity between tandem mass spectra as function of precursor mass measurement error, allowing identification of isotope errors and charge state distributions *before* any database search: | ||
|
||
![Experimental feature](./pictures/experimental_features.png) | ||
Figure 2. Similarity (spectral angle from 0 to 1) of tandem mass spectra plotted against precursor *m*/*z* difference, revealing isotope errors up to at least 2 (corresponding to bands at *m*/*z* difference 2/3 and 2/5) and charge states up to 6 (corresponding to the band at *m*/*z* difference 1/6). | ||
|
||
## 4. Acknowledgements | ||
|
||
The developers wish to thank Dr. Michael Dondrup at the University of Bergen for providing changes and additions to make compareMS2 work under macOS. All users and beta testers are also acknowledged for their valuable feedback that helped to improve compareMS2. | ||
|
||
## 5. Further reading | ||
|
||
compareMS2 and related applications have been described or used in a number of papers: | ||
|
||
compareMS2 2.0: An Improved Software for Comparing Tandem Mass Spectrometry Datasets, Marissen M, Varunjikar MS, Laros JFJ, Rasinger JD, Neely BA and Palmblad M, *J. Proteome Res.*, *in press*, 2022, [doi.org/10.1021/acs.jproteome.2c00457](https://doi.org/10.1021/acs.jproteome.2c00457) | ||
|
||
Shotgun proteomics approaches for authentication, biological analyses, and allergen detection in feed and food-grade insect species, Varunjikar MS, Belghit I, Gjerde J, Palmblad M, Oveland E and Rasinger JD, *Food Control* **131**, 2022, [doi.org/10.1016/j.foodcont.2022.108888](https://doi.org/10.1016/j.foodcont.2022.108888) | ||
|
||
Comparing novel shotgun DNA sequencing and state-of-the-art proteomics approaches for authentication of fish species in mixed samples, Varunjikar MS, Moreno-Ibarguen C, Andrade-Martinez JS, Tung HS, Belghit I, Palmblad M, Olsvik PA, Reyes A, Rasinger JD and Lie KK, *Food Control* **131**:108417, 2022, [doi.org/10.1016/j.foodcont.2021.108417](https://doi.org/10.1016/j.foodcont.2021.108417) | ||
|
||
Rewinding the molecular clock: looking at pioneering molecular phylogenetics experiments in the light of proteomics, Neely B and Palmblad M, *J. Proteome Res.* **20(10)**:4640-4645, 2021, [doi.org/10.1021/acs.jproteome.1c00528](https://doi.org/10.1021/acs.jproteome.1c00528) | ||
|
||
Future feed control – Tracing banned bovine material in insect meal. Belghit I, Varunjikar M, Lecrenier MC, Steinhilber A, Niedzwiecka A, Wang YV, Dieu M, Azzollini D, Lie K, Lock EJ, Berntssen MHG, Renard P, Zagon J, Fumière O, van Loon JJA, Larsen T, Poetz O, Braeuning A, Palmblad M and Rasinger JD, *Food Control* **128**:108183, 2021, [doi.org/10.1016/j.foodcont.2021.108183](https://doi.org/10.1016/j.foodcont.2021.108183) | ||
|
||
Species-Specific Discrimination of Insect Meals for Aquafeeds by Direct Comparison of Tandem Mass Spectra. Belghit I, Lock EJ, Fumière O, Lecrenier MC, Renard P, Dieu M, Berntssen MHG, Palmblad M and Rasinger JD, *Animals* **9(5)**:222, 2019 [doi.org/10.3390/ani9050222](https://doi.org/10.3390/ani9050222) | ||
|
||
Palaeoproteomics of bird bones for taxonomic classification. Horn IR, Kenens Y, Palmblad M, van der Plas-Duivesteijn SJ, Langeveld BW, Meijer HJM, Dalebout H, Marissen RJ, Fischer A, Vincent Florens FB, Niemann J, Rijsdijk KF, Schulp AS, Laros JFJ and Gravendeel B, *Zoological Journal of the Linnean Society* **186(3)**:650–665, 2019, [doi.org/10.1093/zoolinnean/zlz012](https://doi.org/10.1093/zoolinnean/zlz012) | ||
|
||
Species and tissues specific differentiation of processed animal proteins in aquafeeds using proteomics tools. Rasinger JD, Marbaix H, Dieu M, Fumière O, Mauro S, Palmblad M, Raes M and Berntssen MHG, *J. Proteomics* **147**:125-131, 2016, [doi.org/10.1016/j.jprot.2016.05.036](https://doi.org/10.1016/j.jprot.2016.05.036) | ||
|
||
Authentication of closely related fish and derived fish products using tandem mass spectrometry and spectral library matching. Nessen M, van der Zwaan D, Greevers S, Dalebout H, Staats M, Kok E and Palmblad M, *J. Agric. Food Chem.* **64(18)**:3669-3677, 2016, [doi.org/10.1021/acs.jafc.5b05322](https://doi.org/10.1021/acs.jafc.5b05322) | ||
|
||
Identification of meat products by shotgun spectral matching. Ohana D, Dalebout H, Marissen RJ, Wulff J, Bergquist J, Deelder AM and Palmblad M, *Food Chem.* **203**:28-34, 2016, [doi.org/10.1016/j.foodchem.2016.01.138](https://doi.org/10.1016/j.foodchem.2016.01.138) | ||
|
||
Differentiating samples and experimental protocols by direct comparison of tandem mass spectra. van der Plas-Duivesteijn SJ, Wulff T, Klychnikov O, Ohana D, Dalebout H, van Veelen PA, de Keijzer J, Nessen MA, van der Burgt YEM, Deelder AM and Palmblad M, *Rapid Commun. Mass Spectrom.* **30**:731-738, 2016, [doi.org/10.1002/rcm.7494](https://doi.org/10.1002/rcm.7494) | ||
|
||
Authentication of Fish Products by Large-Scale Comparison of Tandem Mass Spectra. Wulff T, Nielsen ME, Deelder AM, Jessen F and Palmblad M, *J. Proteome Res.* **12(11)**:5253-5259, 2013, [doi.org/10.1021/pr4006525](https://doi.org/10.1021/pr4006525) | ||
|
||
Molecular phylogenetics by direct comparison of tandem mass spectra. Palmblad M and Deelder AM, *Rapid Commun. Mass Spectrom.* **26(7)**:728-732, 2012, [doi.org/10.1002/rcm.6162](https://doi.org/10.1002/rcm.6162) |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.