Protein Parsimony & Grouping (Protein Inference)
rmmiller22 edited this page Oct 25, 2019
- "Apply Protein Parsimony & Construct Protein Groups" checkbox - Constructs protein groups according to the rule of maximum parsimony (Occam's razor): MetaMorpheus builds the minimum number of protein groups that can account for all unambiguous peptides with a Q-Value less than or equal to 0.01. PSMs and peptides that are not used for parsimony still have their protein associations listed.
- "Require at least two peptides to identify protein" checkbox - A protein group is constructed only if it has at least 2 peptides with a Q-Value under 0.01. PSMs still have a protein association listed in the PSM file even if their protein has only one peptide and therefore does not form a protein group (there are no orphan PSMs).
- "Treat modified peptides as different peptides" checkbox - Modified forms of a peptide base sequence are treated as different for the purposes of parsimony, protein group displays, peptide counts, etc. You should check this if two proteins in an XML database are distinguished by a modification.
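The parsimony step can be illustrated with a minimal Python sketch. It assumes a greedy set-cover heuristic, which is a common way to approximate parsimony but is not necessarily the exact algorithm MetaMorpheus implements, and it is not guaranteed to find the true minimum. The helper `build_protein_groups`, its arguments, and the peptide and accession names in the example are hypothetical. Indistinguishable proteins (identical peptide sets) are merged into one group, and `min_peptides` stands in for the two-peptide option.

```python
from collections import defaultdict


def build_protein_groups(peptide_to_proteins, min_peptides=1):
    """Greedy parsimony sketch (hypothetical helper, not MetaMorpheus code).

    peptide_to_proteins maps each peptide that passed the Q-value cutoff to
    the non-empty set of protein accessions it could come from. Returns a
    list of (accessions, peptides) tuples, one per protein group.
    """
    # Invert the mapping: protein accession -> peptides it can explain.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Indistinguishable proteins (identical peptide sets) share one group.
    proteins_by_peptide_set = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        proteins_by_peptide_set[frozenset(peptides)].append(protein)

    # Candidate groups sorted by accession so that ties break deterministically.
    candidates = sorted(
        (tuple(sorted(proteins)), peptides)
        for peptides, proteins in proteins_by_peptide_set.items()
    )

    unexplained = set(peptide_to_proteins)
    groups = []
    while unexplained:
        # Greedy step: take the candidate explaining the most unexplained
        # peptides, preferring larger peptide sets on ties; max() keeps the
        # first (lowest-accession) candidate if still tied.
        best = max(candidates, key=lambda c: (len(c[1] & unexplained), len(c[1])))
        groups.append((best[0], set(best[1])))
        unexplained -= best[1]
        candidates.remove(best)

    # Subset proteins never beat their superset, so they are left out;
    # subsumable proteins are usually, but not always, left out because
    # greedy set cover is only an approximation.
    # min_peptides mimics the "at least two peptides" option.
    return [group for group in groups if len(group[1]) >= min_peptides]
```

For the mapping PEPA→P1, PEPB→P1/P2, PEPC→P2/P3, PEPD→P3, PEPE→P4/P5, the sketch returns the groups P1, P3 and P4|P5. P2 is dropped because all of its peptides are already explained by P1 and P3, and with `min_peptides=2` the one-peptide group P4|P5 is removed as well.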
Output is written to the ProteinGroups .tsv file, which can be opened with Excel, Notepad, or any other text editor. Each column header in the output is described here:
- Protein Accession - Protein accession numbers (from the input protein database) for all proteins in the group are listed here with the "|" character as the delimiter.
- Gene - Gene names (from the input protein database) for all proteins in the group are listed here with the "|" character as the delimiter.
- Protein Full Name - Protein names (from the input protein database) for all proteins in the group are listed here with the "|" character as the delimiter.
- Number of proteins in group - The number of proteins in the protein group.
- Unique peptides - Peptides that are unique to the listed protein (based on the in silico digestion of the database, they can come only from that one protein). Currently, peptides that are unique to the group as a whole are not listed here, so a protein group with more than one protein will always have 0 unique peptides: every one of its peptides is shared among the proteins in the group.
- Shared peptides - Peptides that are shared between multiple proteins or protein groups are listed here.
- Number of peptides - Sum of unique + shared peptides.
- Number of unique peptides - Number of unique peptides for the group.
- Sequence coverage % - Number of residues observed (in the group's peptides) divided by the total number of residues in the protein, as a percent.
- Sequence coverage - Displays the sequence coverage for each protein in the group with the "|" character as the delimiter. Lowercase residues were not observed. Uppercase residues were observed.
- Number of PSMs - Number of PSMs with Q-Values <0.01 corresponding to the peptides observed for the group.
- Summed MetaMorpheus Score - The score of the highest-scoring PSM for each peptide base sequence, summed over all observed peptide base sequences. The protein groups are ordered by this score.
- Decoy/Contaminant/Target - "D" means decoy protein group, "C" means contaminant, "T" means target.
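- Cumulative Target - Used for calculating the Q-Value. Running count of target and contaminant proteins up to this point in the list.
- Cumulative Decoy - Used for calculating the Q-Value. Running count of decoy proteins up to this point in the list.
- Q-Value (%) - The Q-Value of the protein group, calculated from the Cumulative Target and Cumulative Decoy counts.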
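- Indistinguishable proteins (proteins identified by exactly the same set of peptides) are listed together in the protein group; subset proteins (whose peptides are all contained in another protein's peptides) and subsumable proteins (whose peptides are all accounted for by other protein groups) are not listed.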
- If multiple raw files are searched with "Apply Protein Parsimony & Construct Protein Groups" checked, MetaMorpheus waits until all raw files have been searched, then applies parsimony and constructs protein groups from the aggregate results. As a result, protein groups are consistent from run to run, which makes them directly comparable.
- Protein groups are constructed based on a global peptide Q-Value: all PSMs from all raw files are aggregated, and the Q-Value cutoff is applied to these aggregated results. Only peptides below the global Q-Value cutoff of 0.01 are used for protein group construction.
- If multiple raw files from different proteolytic digestions are searched using file-specific parameters and the "Apply Protein Parsimony & Construct Protein Groups" box is checked, protein groups are constructed using peptides aggregated from all of the proteolytic digestions.
- The peptides used for protein group construction are determined by protease-specific Q-Values: all PSMs from a given protease are aggregated before the Q-Value cutoff of 0.01 is applied.
- The classification of a peptide as shared or unique is dependent on its protease of origin.
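The summed score and Q-value columns, together with the global and protease-specific aggregation described in the notes, can be sketched in Python. This assumes the usual target-decoy approach: entries sorted by descending score, contaminants counted with targets (as the Cumulative Target column describes), FDR taken as cumulative decoys divided by cumulative targets, and each Q-value set to the lowest FDR at that row or any lower-scoring row. The function names, dictionary keys, and protease labels are hypothetical, and MetaMorpheus's own implementation may differ in details such as tie handling or whether Q-values are stored as fractions or percentages.

```python
from collections import defaultdict


def summed_score(psms):
    """psms: iterable of (base_sequence, score) pairs for one protein group.
    Keeps the best score per base sequence and sums those maxima."""
    best = {}
    for base_sequence, score in psms:
        best[base_sequence] = max(score, best.get(base_sequence, float("-inf")))
    return sum(best.values())


def add_q_values(entries):
    """entries: dicts with a 'score' and a 'kind' of 'T', 'C' or 'D'.
    Returns them sorted by descending score, annotated with cumulative
    counts and target-decoy q-values (as fractions, not percentages)."""
    ranked = sorted(entries, key=lambda e: e["score"], reverse=True)
    targets = decoys = 0
    for entry in ranked:
        if entry["kind"] == "D":
            decoys += 1
        else:  # targets and contaminants are counted together
            targets += 1
        entry["cumulative_target"] = targets
        entry["cumulative_decoy"] = decoys
        entry["fdr"] = min(1.0, decoys / targets) if targets else 1.0
    # A q-value is the lowest FDR at this row or any lower-scoring row.
    running_min = 1.0
    for entry in reversed(ranked):
        running_min = min(running_min, entry["fdr"])
        entry["q_value"] = running_min
    return ranked


def q_values_by_protease(psms):
    """Pools PSMs per protease (hypothetical 'protease' key) and computes
    q-values separately within each pool."""
    pools = defaultdict(list)
    for psm in psms:
        pools[psm["protease"]].append(psm)
    return {protease: add_q_values(pool) for protease, pool in pools.items()}
```

Calling `add_q_values` on PSMs pooled from all raw files corresponds to the global cutoff, while `q_values_by_protease` corresponds to the protease-specific case; in either case, entries whose `q_value` is at most 0.01 would pass the cutoff.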
For an excellent and complete description of parsimony and the protein inference problem, consult: