Home
The concept behind MetaMorpheus is simple. A significant percentage of peptides analyzed in bottom-up experiments contain post-translational modifications (PTMs) or sequence variants. Many search programs ignore all of these peptides and look only for unmodified peptides. We discovered a way to identify these peptides with just as much confidence as for unmodified peptides. Now you can have your cake and eat it too!
You can search with a database that already contains the location of known PTMs. Where does one find such a database? Well, you can get it from the same place you always get your database, UniProt. Go ahead and find the correct database for your organism BUT before you download it, change the format to XML instead of FASTA.
MetaMorpheus is the solution to all your woes. It comes prepackaged with a PTM-discovery strategy known as G-PTM-D, which you can read about here. MetaMorpheus finds high-scoring matches between an experimental MS/MS spectrum and a theoretical MS/MS spectrum where the difference in mass corresponds to a known PTM (e.g., 79.97 Da for phosphorylation). MetaMorpheus then annotates the PTM in the protein database so that the peptide can carry a phosphorylation at all possible locations (e.g., S, T and Y). A second search with the new database uses these modified peptides as theoretical peptides, and any match whose score passes the specified FDR cut-off is reported in the results. G-PTM-D is not the equivalent of doing a variable modification search! It is much, much faster and more accurate. PTM false discovery rates truly are at or below 1%.
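As a rough illustration of the mass-difference idea described above, the following Python sketch checks whether the gap between an observed precursor mass and an unmodified theoretical peptide mass matches a known PTM, then lists the residues where that PTM could be annotated. The modification table, the 0.005 Da tolerance, and the function names are assumptions made for this example; this is not MetaMorpheus's own code.

```python
# Hypothetical sketch of mass-difference matching; names, table contents,
# and tolerance are illustrative assumptions, not MetaMorpheus internals.

KNOWN_PTMS = {
    # name: (monoisotopic mass shift in Da, residues it may be placed on)
    "Phosphorylation": (79.966331, ("S", "T", "Y")),
    "Acetylation": (42.010565, ("K",)),
}


def candidate_ptms(observed_mass, theoretical_mass, tolerance_da=0.005):
    """Return (name, residues) for each PTM whose mass explains the difference."""
    delta = observed_mass - theoretical_mass
    return [
        (name, residues)
        for name, (ptm_mass, residues) in KNOWN_PTMS.items()
        if abs(delta - ptm_mass) <= tolerance_da
    ]


def annotate_sites(sequence, residues):
    """List 1-based positions in the sequence where the PTM could be placed."""
    return [i + 1 for i, aa in enumerate(sequence) if aa in residues]


if __name__ == "__main__":
    for name, residues in candidate_ptms(1079.966, 1000.0):
        print(name, annotate_sites("PEPSTYK", residues))
```

In this sketch, every candidate site is annotated and the decision about which site is real is left to the second search, which mirrors the two-pass workflow described in the text.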
Label-free quantification in MetaMorpheus is really fast. It's really sensitive and accurate too. You can read about it here and here. We recently enabled the software to perform normalization across conditions, biological replicates, fractions and technical replicates. It can also do advanced protein quantification à la Diffacto. Instructions for performing peptide and protein quantification can be found by following this link.
One of the key features that users will immediately take note of is the calibration. Several PTMs have very similar masses. Sulfonation (79.956815 Da) and phosphorylation (79.966331 Da) are only 0.009516 Da apart. Acetylation (42.010565 Da) and trimethylation (42.046950 Da) are only 0.036385 Da apart. High-quality calibration can make all the difference in the world in accurately discovering and identifying PTMs.
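To put those gaps in perspective, the short Python sketch below expresses each one in parts per million of a peptide's mass. The 2000 Da peptide mass and the function name are assumptions chosen for illustration; the modification masses are the ones quoted above.

```python
# Monoisotopic mass shifts quoted in the text above, in Da.
SULFONATION = 79.956815
PHOSPHORYLATION = 79.966331
ACETYLATION = 42.010565
TRIMETHYLATION = 42.046950


def required_ppm(mass_a, mass_b, peptide_mass):
    """Gap between two modification masses, in ppm of the peptide mass."""
    return abs(mass_a - mass_b) / peptide_mass * 1e6


if __name__ == "__main__":
    peptide_mass = 2000.0  # hypothetical peptide mass in Da
    pairs = (
        ("sulfonation vs. phosphorylation", SULFONATION, PHOSPHORYLATION),
        ("acetylation vs. trimethylation", ACETYLATION, TRIMETHYLATION),
    )
    for label, a, b in pairs:
        print(f"{label}: {required_ppm(a, b, peptide_mass):.2f} ppm")
```

For the assumed 2000 Da peptide, the sulfonation/phosphorylation gap is roughly 4.8 ppm and the acetylation/trimethylation gap is roughly 18.2 ppm, so a systematic mass error of only a few ppm is enough to blur the first pair.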
- Download a protein database in .XML or .fasta format from UniProt and drag it onto the MetaMorpheus application. It will go where it needs to go. By the way, there is no need to unzip the database and waste all that hard-drive space. MetaMorpheus reads .gz compressed databases.
- Next, drag a couple .raw or .mzML files onto the MetaMorpheus application. Again, they'll go where they need to go.
- Click on the "New Search Task" tab. Open up "Some Search Properties" and make the appropriate settings adjustments.
- Click on "Post-Search Analysis" and decide if you want to aggregate proteins and quantify peptides. The choice is yours.
- Click on "Modifications" and choose the variable/fixed mods you want to keep. NOTE: This is not G-PTM-D! Use the G-PTM-D task to discover low-abundance PTMs.
- Click "Add the Search Task"
- Finally, click "Run all tasks!"
Search results for each file are generated automatically in the folder that contains the original files. PSMs and aggregate unique PSMs are automatically generated. If you selected "Construct protein groups" in the "Post Search Analysis" tab, then you will also have that result to look at.
Shouldn't I calibrate my files first?
Probably.
- Click on "New Calibrate Task" and adjust the settings appropriately.
- Click on "Add the Calibration Task"
- Click on "Run all tasks!"
This one takes a little longer. So, go get a cup of coffee.
- Simple. Click on "Discover PTMs"
- Select the modification that you want to discover.
- Click on "Add the GPTMD Task"
- Click on "Run All Tasks!"
But that only makes the database annotated with possible PTMs. What you want to do now is:
- Use that new database in a regular search. If you did things right, this database will appear in the Protein Databases in the upper left and be already selected.
- Follow the directions for a regular search, which are described above.
Good news. You can. Just add all the individual tasks before you click on "Run all tasks!" and MetaMorpheus will take care of everything. In order:
- Calibrate
- G-PTM-D
- Search - NOTE: This search will include all modifications discovered in G-PTM-D automatically. HURRAY!
When you come in to work in the morning, your data processing will be complete.
How does MetaMorpheus deal with contaminants?
With your help. A nice feature of MetaMorpheus is the ability to simultaneously use multiple database files in a single search. These can be any combination of .fasta and .xml. We recommend that you download an existing database of contaminants or create your own based on the type of sample you are analyzing and the probable contaminants in your lab. Once you've done that and dragged it into the MetaMorpheus GUI, you simply CHECK the box marked CONTAMINANT. That way MetaMorpheus knows that the database you created contains the contaminant proteins. During the search, any peptide matching a contaminant peptide gets assigned to a contaminant even if there is an exact duplicate in the target protein database. During protein parsimony, contaminant peptides and proteins get assigned first and are not included in target protein parsimony.
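The contaminant-first rule can be pictured with a minimal Python sketch. The two lookup tables (peptide sequence to protein names) and the function name are hypothetical stand-ins for whatever indexes a search engine would build; they are not part of MetaMorpheus.

```python
def assign_peptide(peptide, target_index, contaminant_index):
    """Return (protein_names, is_contaminant) for a peptide sequence.

    Both indexes are assumed to map a peptide sequence to a set of
    protein names. A contaminant match always wins, even when the same
    sequence also occurs in the target database.
    """
    if peptide in contaminant_index:
        return sorted(contaminant_index[peptide]), True
    return sorted(target_index.get(peptide, ())), False


if __name__ == "__main__":
    targets = {"LVNELTEFAK": {"TARGET_1"}, "AEFVEVTK": {"TARGET_2"}}
    contaminants = {"LVNELTEFAK": {"CONTAMINANT_1"}}
    print(assign_peptide("LVNELTEFAK", targets, contaminants))
    print(assign_peptide("AEFVEVTK", targets, contaminants))
```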
Look no further. Below you will find all the technical details that make MetaMorpheus hum. If you have an issue or a question, please click on the "Issues" tab of this GitHub repository and let us know. We'll respond quickly.
Checking the "Aggregate Proteins" button in the MetaMorpheus "Add Search Task" window constructs the most concise possible list of proteins that could account for all observed peptides ("maximum parsimony"). Peptides are assigned to proteins by the following rules:
- All peptides that could be assigned to a decoy protein are removed from any target protein associations (i.e., they are only assigned to decoy protein(s)).
- A peptide that can only be assigned to one protein is a "unique" peptide; this protein is added and all peptides that could be assigned to that protein are assigned to that protein.
- The remaining unaccounted-for peptides are assigned by the "greedy algorithm", which iteratively chooses a protein by how many peptides it can account for. For instance, if a protein can account for 4 unaccounted-for peptides, this is superior to a protein that would only account for 2 peptides. If two proteins have the same number of unaccounted-for peptides in the given iteration, the protein with the most total peptides is added. The loop continues until all peptides are accounted for.
- Any protein that is indistinguishable (i.e., has the same set of peptides) from a protein in the resulting parsimonious list is added to that protein's group.
- Protein groups are scored by using the highest-scoring PSM below 1% FDR belonging to that group. Peptides that do not pass the 1% FDR threshold are not displayed in the protein groups list.
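The assignment rules above (decoy handling, unique peptides, the greedy loop, and grouping of indistinguishable proteins) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the MetaMorpheus implementation: the input is assumed to be a plain mapping from protein name to its set of peptide sequences, scoring is left out, and when two proteins remain tied after both tie-break criteria the sketch picks the alphabetically first name, a choice the text does not specify.

```python
def parsimony(protein_to_peptides, decoys=frozenset()):
    """Return protein groups (lists of names) under the rules sketched above.

    protein_to_peptides: dict of protein name -> set of peptide sequences.
    decoys: names of the decoy proteins in that dict.
    """
    # Peptides that map to any decoy are kept only on decoy proteins.
    decoy_peptides = set()
    for name in decoys:
        decoy_peptides |= protein_to_peptides.get(name, set())
    peptides_of = {}
    for name, peptides in protein_to_peptides.items():
        kept = set(peptides) if name in decoys else set(peptides) - decoy_peptides
        if kept:
            peptides_of[name] = kept

    # Invert the map: peptide -> proteins it could come from.
    proteins_of = {}
    for name, peptides in peptides_of.items():
        for peptide in peptides:
            proteins_of.setdefault(peptide, set()).add(name)

    unaccounted = set(proteins_of)
    chosen = []

    # A protein with a unique peptide is added and takes all of its peptides.
    for peptide in sorted(proteins_of):
        owners = proteins_of[peptide]
        if len(owners) == 1:
            name = next(iter(owners))
            if name not in chosen:
                chosen.append(name)
                unaccounted -= peptides_of[name]

    # Greedy loop: most unaccounted-for peptides first, then most total peptides.
    while unaccounted:
        best = max(
            (name for name in sorted(peptides_of) if name not in chosen),
            key=lambda n: (len(peptides_of[n] & unaccounted), len(peptides_of[n])),
        )
        chosen.append(best)
        unaccounted -= peptides_of[best]

    # Proteins with an identical peptide set join the chosen protein's group.
    return [
        sorted(n for n, peps in peptides_of.items() if peps == peptides_of[name])
        for name in chosen
    ]


if __name__ == "__main__":
    example = {
        "A": {"p1", "p2"},
        "B": {"p2", "p3"},
        "C": {"p3", "p4"},
        "D": {"p3", "p4"},
    }
    print(parsimony(example))
```

In the example, protein A is added because peptide p1 is unique to it, protein C is then picked by the greedy loop because it explains both remaining peptides, and protein D joins C's group because the two share an identical peptide set. Protein B is never needed.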
In addition to the set of included modifications, MetaMorpheus allows adding user-defined modifications.
Some proteins are present in biological samples as subsequences of the complete sequence specified in the database. Since they are common, and UniProt lists these protein fragments, we expanded the search functionality to look for those as well.
The open-mass search is enhanced by automatic mass-difference histogram generation. The mass-difference of each PSM below 1% FDR is used for this analysis. The results of the analysis are written in a separate file, and they include the total number of unique peptides associated with the mass shift, the fraction of decoys, mass match with any known entry in the UniMod or UniProt databases, amino acid addition/removal combination, combination of higher frequency peaks, fraction of localizable targets, localization residues and/or termini, and presence of any modifications in the matched peptides. All of this data can then be used to determine the nature of the mass-difference, and the characteristics of the corresponding modification.
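A minimal Python sketch of the histogram step is shown below. It bins the mass difference of each PSM that passes the FDR cut-off and reports, per bin, the number of unique peptides and the fraction of decoys. The tuple layout of a PSM, the 0.01 Da bin width, and the use of a q-value of 0.01 as the "below 1% FDR" filter are all assumptions for illustration; the other columns described above (database mass matches, localization, and so on) are omitted.

```python
from collections import defaultdict


def mass_difference_histogram(psms, bin_width=0.01, max_q_value=0.01):
    """Summarize mass shifts of confident PSMs.

    psms: iterable of (sequence, observed_mass, theoretical_mass,
          q_value, is_decoy) tuples; this layout is hypothetical.
    Returns a list of (mass_shift, unique_peptides, decoy_fraction)
    rows sorted by mass shift.
    """
    bins = defaultdict(lambda: {"peptides": set(), "psms": 0, "decoys": 0})
    for sequence, observed, theoretical, q_value, is_decoy in psms:
        if q_value > max_q_value:
            continue  # keep only PSMs passing the FDR cut-off
        entry = bins[round((observed - theoretical) / bin_width)]
        entry["peptides"].add(sequence)
        entry["psms"] += 1
        entry["decoys"] += int(is_decoy)
    return [
        (round(index * bin_width, 6), len(e["peptides"]), e["decoys"] / e["psms"])
        for index, e in sorted(bins.items())
    ]


if __name__ == "__main__":
    example = [
        ("PEPSTYK", 1079.966, 1000.0, 0.001, False),
        ("ASTLYK", 1579.967, 1500.0, 0.002, False),
        ("KYTSPEP", 1079.968, 1000.0, 0.005, True),
        ("PEPTIDE", 1200.0, 1200.0, 0.0, False),
        ("LOWCONF", 1042.01, 1000.0, 0.2, False),
    ]
    for row in mass_difference_histogram(example):
        print(row)
```

Bins with many unique peptides and a low decoy fraction are the ones worth inspecting further when deciding what a mass shift represents.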