trishorts edited this page Jul 14, 2017 · 32 revisions

MetaMorpheus: Why not see all your peptides?

The concept behind MetaMorpheus is simple. A significant percentage of peptides analyzed in bottom-up experiments contain posttranslational modifications or sequence variants. Many search programs ignore all of these peptides and look only for unmodified peptides. We discovered a way to identify these peptides with just as much confidence as unmodified peptides. Now you can have your cake and eat it too!

How does it work?

In a standard search, you only need to use a database that already contains the locations of known PTMs. Where do I get such a database, you ask? Well, you can get it from the same place you always get your database: UniProt. Go ahead and find the correct database for your organism BUT... before you download it, change the format to XML.

What about all the PTMs that aren't already in the database?

MetaMorpheus is the solution to all your woes. It comes prepackaged with the PTM discovery strategy known as G-PTM-D, which you can read about here. MetaMorpheus finds high-scoring matches between an experimental MS/MS spectrum and a theoretical MS/MS spectrum where the difference in mass corresponds to a known PTM (e.g., 79.97 Da for phosphorylation). MetaMorpheus then amends the protein database so that the peptide can carry a phosphorylation at all possible locations (e.g., S, T and Y). A second search with the new database uses these modified peptides as theoreticals, and if a match scores well enough to pass the specified FDR cut-off, it is reported in the results. Let us emphasize that G-PTM-D is not the equivalent of doing a variable modification search. It is much, much more accurate. PTM FDRs truly are at or below 1%.
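The first G-PTM-D step described above can be sketched in a few lines of Python. This is a minimal illustration and not the MetaMorpheus implementation: the `KNOWN_MODS` table, the residue lists, the `candidate_sites` function, the 0.01 Da tolerance and the example peptide are all assumptions made up for this sketch.

```python
# Illustrative only: a tiny table of monoisotopic mass shifts (Da) and the
# residues each modification is allowed on. These entries are assumptions
# for the sketch, not MetaMorpheus settings.
KNOWN_MODS = {
    "phosphorylation": (79.966331, "STY"),
    "acetylation": (42.010565, "K"),
}


def candidate_sites(peptide, observed_mass, theoretical_mass, tol_da=0.01):
    """Return (modification, 0-based residue index) pairs that could explain the mass gap."""
    delta = observed_mass - theoretical_mass
    sites = []
    for name, (shift, residues) in KNOWN_MODS.items():
        # Keep the modification only if its mass shift matches the observed gap.
        if abs(delta - shift) <= tol_da:
            sites += [(name, i) for i, aa in enumerate(peptide) if aa in residues]
    return sites


# A hypothetical peptide whose precursor is about 79.97 Da heavier than expected.
print(candidate_sites("ASTPEYK", 1079.966, 1000.000))
```

Each returned site is a place where the amended database would allow the modification, so the second search can test it as an ordinary theoretical peptide.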

Does G-PTM-D really work? I don't believe it.

It does. MetaMorpheus uses very high stringency standards. The narrowest possible tolerances for parent and fragment mass are used. A posterior error probability is calculated for each match and a 1% local FDR is required. Moreover, localization of each PTM is attempted and reported. Diagnostic ions are also detected, which aids confident assignment.

What else does MetaMorpheus do?

One of the key features that users will immediately take note of is the calibration. Several PTMs have very similar masses. Sulfonation (79.956815 Da) and phosphorylation (79.966331 Da) are only 0.009516 Da apart. Acetylation (42.010565 Da) and trimethylation (42.046950 Da) are only 0.036385 Da apart. High-quality calibration can make all the difference in the world in accurately discovering and identifying PTMs.
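To see why calibration matters, it helps to express those gaps relative to the mass of the whole peptide. The sketch below does only that arithmetic. The function name `ppm_gap` and the peptide masses are hypothetical; the modification masses are the ones quoted above.

```python
def ppm_gap(mass_a, mass_b, peptide_mass):
    """Mass gap between two modifications, in parts per million of the peptide mass."""
    return abs(mass_a - mass_b) / peptide_mass * 1e6


# Monoisotopic mass shifts (Da) quoted in the text.
SULFONATION, PHOSPHORYLATION = 79.956815, 79.966331
ACETYLATION, TRIMETHYLATION = 42.010565, 42.046950

for peptide_mass in (1000.0, 2000.0, 4000.0):  # hypothetical peptide masses in Da
    print(
        peptide_mass,
        round(ppm_gap(SULFONATION, PHOSPHORYLATION, peptide_mass), 2),
        round(ppm_gap(ACETYLATION, TRIMETHYLATION, peptide_mass), 2),
    )
```

For a 2000 Da peptide the sulfonation/phosphorylation gap is under 5 ppm, so a systematic mass error of a few ppm is enough to blur the two.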

How do I get started?

  • Download a protein database in .XML format from UniProt and drag it onto the MetaMorpheus application. It will go where it needs to go. By the way, there is no need to unzip the database and waste all that hard-drive space. MetaMorpheus reads zipped files.
  • Next, drag a couple .raw or .mzML files onto the MetaMorpheus application. Again, they'll go where they need to go.
  • Click on the "New Search Task" tab. Open up "Some Search Properties" and make the appropriate settings adjustments.
  • Click on "Post-Search Analysis" and decide if you want to aggregate proteins and quantify peptides. The choice is yours.
  • Click on "Modifications" and choose those you want to keep.
  • Click "Add the Search Task"
  • Finally, click "Run all tasks!"

Don't go far. The search will be over before you know it.

Search results for each file are generated automatically in the folder that contains the original files. Aggregate total PSMs and aggregate unique PSMs are also generated automatically. If you selected "Aggregate Proteins" in the "Post-Search Analysis" tab, then you will also have that result to look at.

Shouldn't I calibrate my files first?

Probably.

  • Click on "New Calibrate Task" and adjust the settings appropriately.
  • Click on "Add the Calibration Task"
  • Click on "Run all tasks!"

This one takes a little longer. So, go get a cup of coffee.

You said I could discover PTMs. How do I do that?

  • Simple. Click on "New G-PTM-D Task"
  • Select the modification that you want to discover.
  • Click on "Add the G-PTM-D Task"
  • Click on "Run all tasks!"

But.....That only makes the database of PTMs. What you want to do now is:

  • Use that new database in a regular search. If you did things right, this database will appear in the Protein Databases field and be already selected.
  • Follow the directions for a regular search, which are described above.

I want to do everything.

Good news. You can. Just add all the individual tasks before you click on "Run all tasks!" and MetaMorpheus will take care of everything, in order.

When you come in to work in the morning, your data processing will be complete.

How does MetaMorpheus deal with contaminants?

With your help. A nice feature of MetaMorpheus is the ability to use multiple database files simultaneously in a single search. These can be any combination of .fasta and .xml. We recommend that you download an existing database of contaminants or create your own based on the type of sample you are analyzing and the probable contaminants in your lab. Once you've done that and dragged it into the MetaMorpheus GUI, you simply CHECK the box marked CONTAMINANT. That way MetaMorpheus knows that the database you created contains the contaminant proteins. During the search, any peptide matching a contaminant peptide gets assigned to a contaminant even if there is an exact duplicate in the target protein database. During protein parsimony, contaminant peptides and proteins get assigned first and are not included in target protein parsimony.
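The precedence rule in that last part (contaminant wins over target when a peptide appears in both) can be written as a toy function. This is only a sketch of the rule as stated above; `assign_peptide` and the example sequences are hypothetical and say nothing about how MetaMorpheus stores its databases.

```python
def assign_peptide(peptide, target_peptides, contaminant_peptides):
    """Label a peptide, giving contaminants precedence over targets."""
    if peptide in contaminant_peptides:
        return "contaminant"  # wins even if the target set has an exact duplicate
    if peptide in target_peptides:
        return "target"
    return "unassigned"


# Hypothetical peptide sets; the first sequence occurs in both databases.
targets = {"LVNELTEFAK", "AEFVEVTK"}
contaminants = {"LVNELTEFAK"}

for pep in ("LVNELTEFAK", "AEFVEVTK", "PEPTIDEK"):
    print(pep, assign_peptide(pep, targets, contaminants))
```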

I'm a nerd. I really want to know what's under the hood.

Look no further. Below you will find all the technical details that make MetaMorpheus hum. If you have an issue or a question, please click on the "Issues" tab of this GitHub repository and let us know. We'll respond quickly.

Scoring Peptide Matches

FDR

PEP

Checking the "Aggregate Proteins" button in the MetaMorpheus "Add Search Task" window constructs the most concise possible list of proteins that could account for all observed peptides ("maximum parsimony"). Peptides are assigned to proteins by the following rules:

  • All peptides that could be assigned to a decoy protein are removed from any target protein associations (i.e., they are only assigned to decoy protein(s)).
  • A peptide that can only be assigned to one protein is a "unique" peptide; this protein is added and all peptides that could be assigned to that protein are assigned to that protein.
  • The remaining unaccounted-for peptides are assigned by the "greedy algorithm", which iteratively chooses a protein by how many peptides it can account for. For instance, if a protein can account for 4 unaccounted-for peptides, this is superior to a protein that would only account for 2 peptides. If two proteins have the same number of unaccounted-for peptides in the given iteration, the protein with the most total peptides is added. The loop continues until all peptides are accounted for.
  • Any protein that is indistinguishable (i.e., has the same set of peptides) from a protein in the resulting parsimonious list is added to that protein's group.
  • Protein groups are scored by summing the scores of all peptides below 1% FDR belonging to that group (ignoring duplicate base sequences and PTMs). Peptides above 1% FDR are not displayed in the protein groups list.
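The unique-peptide, greedy and grouping rules above can be sketched as follows. This is a simplified illustration, not the MetaMorpheus code: it leaves out the decoy rule and the scoring, the function name `parsimony` and the example proteins are hypothetical, and ties that survive both criteria are broken alphabetically purely so the output is reproducible.

```python
def parsimony(protein_to_peptides):
    """Greedy parsimony sketch. Takes {protein: set of peptides}, returns protein groups."""
    # Invert the map: which proteins could explain each peptide?
    peptide_to_proteins = {}
    for protein, peptides in protein_to_peptides.items():
        for pep in peptides:
            peptide_to_proteins.setdefault(pep, set()).add(protein)

    chosen = []
    unexplained = set(peptide_to_proteins)

    # Proteins with a unique peptide are always kept, and they account for
    # every peptide they could explain.
    for pep, proteins in sorted(peptide_to_proteins.items()):
        if len(proteins) == 1:
            (protein,) = proteins
            if protein not in chosen:
                chosen.append(protein)
    for protein in chosen:
        unexplained -= protein_to_peptides[protein]

    # Greedy loop: pick the protein explaining the most unexplained peptides;
    # on a tie, prefer the protein with the most peptides overall.
    while unexplained:
        best = max(
            (p for p in sorted(protein_to_peptides) if p not in chosen),
            key=lambda p: (
                len(protein_to_peptides[p] & unexplained),
                len(protein_to_peptides[p]),
            ),
        )
        chosen.append(best)
        unexplained -= protein_to_peptides[best]

    # Indistinguishable proteins (identical peptide sets) join the chosen protein's group.
    return [
        sorted(p for p, peps in protein_to_peptides.items()
               if peps == protein_to_peptides[protein])
        for protein in chosen
    ]


# Hypothetical example: P1 owns the unique peptide "a"; P3 and P4 are indistinguishable.
proteins = {
    "P1": {"a", "b", "c"},
    "P2": {"c", "d"},
    "P3": {"d", "e"},
    "P4": {"d", "e"},
    "P5": {"b"},
}
print(parsimony(proteins))
```

In this example P1 is added first because of its unique peptide, which leaves only "d" and "e" unexplained. P3 covers both, so it beats P2, and P4 joins P3's group because the two have the same peptides.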

Open Modification Searches

In addition to the small set of UniProt modifications, MetaMorpheus allows using an expanded set of arbitrary user-defined modifications throughout.

Chain and Signal peptides

Some proteins are present in biological samples as subsequences of the complete sequence specified in the database. Since they are common, and UniProt lists these protein fragments, we expanded the search functionality to look for those as well.

Histogram Peak Analysis

The modification discovery component is enhanced by the automated peak analysis heuristic. Every database search result is analyzed, and for every frequently occurring mass shift (determined by a peak-finding algorithm), an analysis is conducted. The results of the analysis are written to a separate file, and they include the total number of unique peptides associated with the mass shift, the fraction of decoys, mass match with any known entry in the Unimod or UniProt database, mass match to an amino acid addition/removal combination, mass match to a combination of higher frequency peaks, fraction of localizable targets, localization residues and/or termini, and presence of any modifications in the matched peptides. All of this data can then be used to determine the nature of the peak and the characteristics of the corresponding modification.
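The page does not say which peak-finding algorithm is used, so the sketch below substitutes a very simple stand-in: bin the observed mass shifts and keep bins that are local maxima with enough members. The function name `find_peaks`, the 0.01 Da bin width, the minimum count and the example shifts are all assumptions chosen for illustration.

```python
from collections import Counter


def find_peaks(mass_shifts, bin_width=0.01, min_count=3):
    """Bin mass shifts and return (bin centre, count) for bins that are local maxima."""
    counts = Counter(round(shift / bin_width) for shift in mass_shifts)
    peaks = []
    for bin_index, count in sorted(counts.items()):
        if count < min_count:
            continue  # too few matches to call this a frequently occurring shift
        left = counts.get(bin_index - 1, 0)
        right = counts.get(bin_index + 1, 0)
        if count >= left and count >= right:
            peaks.append((round(bin_index * bin_width, 4), count))
    return peaks


# Hypothetical mass shifts (Da) from an open search: two clusters plus noise.
shifts = [79.966, 79.967, 79.968, 79.966, 15.994, 15.993, 15.992, 0.984, 42.3]
print(find_peaks(shifts))
```

Each reported peak would then be compared against known modification masses and checked for decoy fraction and localization, as described above.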