MetaMorpheus Task Parameters Defined
trishorts edited this page Feb 14, 2022
·
1 revision
This section provides definitions for all MetaMorpheus task parameters. Parameters are organized in alphabetical order, by their name as displayed in the GUI. Following the GUI name, the parameter name, as displayed in .toml setting files, is provided in parentheses for command-line users. The default values provided for all parameters in MetaMorpheus are designed to facilitate the analysis of most high-resolution MS2 data without requiring alteration.
- Apply Protein Parsimony and Construct Protein Groups (DoParsimony): This Search Task parameter indicates if protein parsimony will be performed on the identified peptides (1% PSM-level FDR). Selection of protein parsimony is required for match between runs, protein quantification and multi-protease protein inference.
- Child Scan Dissociation (MS2ChildScanDissociationType): This Glyco Search Task parameter specifies the dissociation type used for generating MS3 scans or second dissociation MS2 scans.
- Compress Individual File Results (CompressIndividualFiles): This Search Task parameter determines if MetaMorpheus’ individual results files are compressed in order to minimize memory requirements.
- Construct Mass-Difference Histogram (DoHistogramAnalysis): This postsearch analysis parameter within the Search Task allows for the creation of a histogram displaying the observed mass-shifts for all peptide identifications (1% FDR). The mass shifts observed for PSMs are clustered into bins and analyzed for peaks corresponding to the molecular weight of a PTM or amino acid substitution. This analysis is primarily useful for interpreting open-mass search results.
- Crosslink at Cleavage Sites (CrosslinkAtCleavageSite): This XL Search Task parameter dictates whether or not a crosslink can be identified at a proteolytic cleavage site.
- Crosslinker Type (all parameters under the XlSearchParameters.Crosslinker header): This XL Search Task parameter specifies the crosslinker used in the experiment. The crosslinker type can be selected from a list of crosslinkers or a custom crosslinker can be added (see Note 19.).
- C-Terminal Ions (FragmentationTerminus): This Search Task parameter specifies the generation of fragment ions from the C-terminus (e.g. 𝑥-, 𝑦- and 𝑧-ions) of all theoretical peptides.
- Deconvolute Precursor (DoPrecursorDeconvolution): Present in the GPTMD, Search, XL Search and Glyco Search Tasks, this parameter enables the identification of multiple peptides from a single MS2 scan. For each MS2 scan, the MS1 isolation window is investigated for precursors that could have been cofragmented to yield the observed fragmentation pattern.
- Deconvolution Max Assumed Charge State (DeconvolutionMaxAssumedChargeState): Present in the GPTMD and Search Tasks, this parameter dictates the maximum expected charge state for a peptide. Any isotopic envelopes with charge states larger than this value are discarded or are incorrectly identified as harmonics.
- Dissociation Type (DissociationType): Present in all task types (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter specifies the dissociation type used for the acquisition of MS2 spectra. MetaMorpheus was originally designed for analysis of high-resolution MS2 data; because of this, all dissociation types are assumed to be high-resolution, with the exception of the LowCID option (see Note 3.).
- Filter Results to q-Value (QValueOutputFilter): This Search Task parameter dictates the maximum q-value of peptide identifications in the output files. The filtering of identifications makes the exported result files more manageable for large datasets.
- Fixed Modifications (ListOfModsFixed): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter dictates which PTMs are “fixed” and should be applied to every possible location in the specified database. Typically, the only fixed modification necessary is carbamidomethylation of cysteine, which results when reduced samples have been alkylated with iodoacetamide. Other fixed modifications, such as TandemMassTag (TMT) labels, can be selected when appropriate.
- Generate Complementary Ions (AddCompIons): Present in GPTMD and Search Tasks, this parameter adds artificial complementary masses to the experimental MS2 spectrum. Artificial fragment masses are inferred by subtracting the deconvoluted mass of each observed MS2 fragment ion from the observed precursor mass and adding a dissociation type-specific mass shift. This strategy can be helpful in identifying peptides with modifications near a terminus (e.g. C-terminal modifications of tryptic peptides) [12].
- Generate Decoy Proteins (DecoyType): Present in all search tasks (Search, XL Search and Glyco Search), this parameter indicates if MetaMorpheus automatically generates decoy protein sequences from the provided protein database(s). Decoy proteins provide known false-positive sequences which can be used to determine q-values. In the Search and Glyco Search Tasks, decoy proteins can be generated by using either the reversed or slided methods. Reversed decoys are generated by reversing the protein sequence provided in the target database. Slided decoys are generated by non-random shuffling of the amino acids within each provided protein sequence. If the protein database supplied already contains decoy protein sequences, uncheck this feature.
- Generate Target Proteins (SearchTarget): This Search Task parameter indicates that MetaMorpheus will search for target peptides generated by the in silico digestion of the provided database(s). This parameter can be disabled for decoy-only searches, which are useful in analyses where target and decoy databases are searched separately.
- Glyco Search (GlycanSearchType): This Glyco Search Task parameter determines whether O-glycopeptides or N-glycopeptides are discovered by the Glyco Search algorithm. Only one class of glycans can be investigated at a time using the Glyco Search Task.
- Handle Overlap Between Target and Contaminant Databases (TCAmbiguity): This Search Task parameter specifies the classification of protein entries that are shared between target and contaminant databases as a contaminant entry, target entry or both.
- Initiator Methionine (InitiatorMethionineBehavior): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter specifies how MetaMorpheus addresses the potential cleavage of initiator methionine residues in the protein database. The initiator methionine for protein entries can always be cleaved, always be retained, or variable (both cleaved and retained versions are created) in the generation of theoretical peptides. It is recommended to treat the initiator methionine as variable.
- Keep Top N Candidates (CrosslinkSearchTopNum or GlycoSearchTopNum): This parameter in the XL Search and Glyco Search Tasks specifies the maximum number of candidate peptides considered per MS2 scan to reduce computational complexity.
- LFQ: Quantify peptides/proteins with FlashLFQ (DoQuantification): Selection of this quantification option within the Search Task establishes that FlashLFQ will be used to perform label-free peptide- and protein-level quantification. An experimental plan in MetaMorpheus is required for label-free quantification (see Note 26). For additional information on FlashLFQ see chapter X.
- Mass Difference Acceptor Criterion (MassDiffAcceptorType): This Search Task parameter determines the acceptable mass notch(es) for the difference between a peptide’s observed and theoretical precursor mass. Selections can be made from the provided options (“Exact”, “1 Missed Monoisotopic Peak”, “1 or 2 Missed Monoisotopic Peaks”, “1,2 or 3 Missed Monoisotopic Peaks”, “+-3 Missed Monoisotopic peaks”, “-187 and Up”, and “Accept all”). Additionally, MetaMorpheus supports the addition of a custom mass difference acceptor (see Note 27).
- Match Between Runs (MatchBetweenRuns): This Search Task parameter indicates that match between runs will be utilized as part of the quantification process. Match between runs allows peptides that were fragmented in at least one spectra file to be quantified across all other spectra files. Any peptide identified in one spectra file is searched for in all other files within a small mass-to-charge and retention time window. To learn more about match between runs see the FlashLFQ chapter (chapter X).
- Max Fragment Mass (MaxFragmentSize): This Search Task parameter imposes an upper limit for the mass of theoretical fragment ions.
- Max Heterozygous variants for Combinatorics (MaxHeterozygousVariants): Present in the Calibration, GPTMD and Search Tasks, this parameter is only relevant when one of the databases provided is generated via Spritz [13] and contains annotated sequence variants. It dictates the maximum number of variants that can be applied to a single protein sequence, thus determining the number of theoretical variant-containing proteins generated.
- Max Missed Cleavages (MaxMissedCleavages): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter specifies the maximum number of missed cleavages allowed during in silico digestion of the protein database(s). The protease utilized affects this parameter because certain proteases, such as Chymotrypsin, are more prone to missed cleavages [14].
- Max Modification Isoforms (MaxModificationIsoforms): This parameter is present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search) and specifies the maximum number of different peptide forms (peptidoforms) possible for a single theoretical peptide sequence. A large number of variable and/or annotated modifications for a peptide can drastically increase the number of peptidoforms present in the database, making this parameter crucial for controlling database size.
- Max Mods Per Peptide (MaxModsForPeptide): This parameter, present in the Calibration, GPTMD, Search, and Glyco Search Tasks, defines the maximum number of PTMs allowed on an individual peptide. As this value increases, so does the number of PTM combinations, search space and computational time.
- Max Peptide Length (MaxPeptideLength): Present in all task types (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter establishes the maximum length of theoretical peptides generated by in silico database digestion. Any peptides present in the sample longer than the specified value will not be correctly identified.
- Max Threads (MaxThreadsToUsePerFile): This parameter specifies the maximum number of threads MetaMorpheus can utilize. The default value is determined based on the CPU running MetaMorpheus and is set to one less than the total number of threads.
- MaximumOGlycansAllowed (MaximumOGlycanAllowed): This Glyco Search parameter specifies the maximum number of O-glycosylation sites possible for a single theoretical peptide. This parameter should be adjusted depending on prior knowledge of the sample being analyzed. For example, mucins are a class of proteins known for heavy O-glycosylation [15]. A sample of primarily mucin proteins should have a higher value set for this parameter than non-mucin samples.
- Min Peptide Length (MinPeptideLength): Present in all task types (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter establishes the minimum length of theoretical peptides generated by in silico database digestion. The default value for this parameter is seven, because peptides shorter than this length are difficult to confidently identify.
- Min Read Depth for Variants (MinVariantDepth): Found in the Calibration, GPTMD and Search Tasks, this parameter is only relevant when one or more of the databases provided are generated by Spritz [13] and contain annotated sequence variants. This parameter specifies the read depth, or coverage, that a specific variant must have in the RNA sequencing data in order to be included into theoretical protein sequences. This prevents variants without sufficient transcriptomic support from expanding the search space.
- Minimum Intensity Ratio (MinimumAllowedIntensityRatioToBasePeak): This parameter, present in the GPTMD, Search, XL Search and Glyco Search tasks, establishes the minimum intensity ratio required for each experimental fragment ion. The intensity ratio for each fragment ion is calculated by dividing its intensity by that of the highest intensity peak in the scan. If the minimum intensity threshold is not met, that fragment ion cannot be compared to the theoretical peptide spectra.
- Minimum Score Allowed (ScoreCutoff): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter defines the minimum score required to report a PSM. The score, for high-resolution MS2 data, is determined by summing the number of matched fragment ions with the fraction of the total ion current (TIC) accounted for by these matched ions.
- MS2 Child Scan Dissociation (MS2ChildScanDissociationType): This XL Search Task parameter specifies the dissociation type used to generate MS3 scans or second dissociation MS2 scans. If this level of fragmentation is not relevant for the spectra being analyzed, the parameter can be set to Null.
- MS2 Scan Dissociation Type (DissociationType2): This XL Search Task parameter specifies the dissociation type used to generate MS2 fragmentation spectra.
- MS3 Child Scan Dissociation (MS3ChildScanDissociationType): This XL Search Task parameter specifies the dissociation type used to generate MS4 scans or second dissociation MS3 scans. If this level of fragmentation is not relevant for the spectra being analyzed, the parameter can be set to Null.
- N-Glycan Database (NGlycanDatabasefile): This Glyco Search Task parameter determines which N-glycan database will be utilized for the identification of N-linked glycopeptides. Custom N-glycan databases can be added if necessary (see Note 21.).
- No Quantification (DoQuantification): Selection of this quantification option within the Search Task dictates that neither label-free nor SILAC quantification will be performed.
- Nominal Window Width Thomsons (WindowWidthThomsons): This Search Task parameter specifies the width of the MS1 and MS2 filtering windows in Thomsons (m/z units). Dividing MS1 and MS2 scans into windows helps prevent filtering bias that may result from prevalence of high intensity peaks in the center of the spectrum and lower intensity peaks at low and high m/z ranges.
- Normalize Peaks in Each Window (NormalizePeaksAcrossAllWindows): This Search Task parameter enables the normalization of peak intensity values to the most intense peak within the defined window.
- Normalize Quantification Results (Normalize): When label free quantification with FlashLFQ is enabled, this Search Task parameter dictates the normalization of peptide intensity values. This normalization is based on the assumption that the majority of peptides do not change in abundance between conditions (To learn more see Rob’s chapter). The normalization algorithm requires the information provided in the experimental design (see Note 26.).
- N-Terminal Ions (FragmentationTerminus): This Search Task parameter specifies the generation of fragment ions from the N-terminus (e.g. 𝑎-, 𝑏- and 𝑐-ions) of all theoretical peptides.
- Number of Database partitions (TotalPartitions): The modern, semi-specific and non-specific search algorithms generate an index of theoretical peptide spectra from the supplied database(s). This index can become prohibitively large and exceed the RAM capacity of the computer. This parameter allows for the search space to be divided into partitions, or sections, before the search is performed to avoid such complications. The theoretical peptides in each partition are searched separately and then aggregated to provide the same results as if the partitioning method was not applied.
- Number of Windows (NumberOfWindows): This Search Task parameter defines the number of windows, or sections, the MS1 and MS2 scans are to be divided into for peak filtering. Often, peaks are most intense in the center of the spectrum and less intense on the edges. When filtering is applied to the entire spectrum there is a risk of removing quality peaks in the low and high m/z regions of the spectra and retaining noise peaks in the center. Division of the scans into filtering windows prevents this bias.
- O-Glycan Database (OGlycanDatabasefile): This Glyco Search Task parameter determines which O-glycan database will be utilized for the identification of O-linked glycopeptides. Custom O-glycan databases can be added if necessary (see Note 21.).
- OxoniumIonFilt (OxoniumIonFilt): This Glyco Search Task parameter specifies that only MS2 scans containing an oxonium ion at 204 m/z will be investigated as potential glycopeptides.
- Peak-Finding Tolerance (QuantifyPpmTol): This Search Task parameter defines the parent mass tolerance (in ppm) used for label-free quantification.
- Precursor Mass Tolerance (PrecursorMassTolerance): This parameter, found in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), establishes the maximum mass difference between the observed and theoretical precursor masses permitted for a PSM. This value is typically specified in ppm but can also be represented in daltons.
- Product Mass Tolerance (ProductMassTolerance): Found in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter establishes the maximum mass difference permitted between a theoretical and an experimental fragment ion for them to be considered a match. This value can also be set in either ppm or daltons (see Note 28).
- Protease (Protease): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter establishes the protease used for in silico database digestion. This protease should be the same as was used experimentally to digest the sample. Selection can be made from a provided list of common proteases, or a custom protease can be specified (see Note 29).
- Quench Method (XLQuench): This XL Search Task parameter specifies the method(s) utilized to quench the crosslinker.
- Report PSM Ambiguity (ReportAllAmbiguity): This Search Task parameter defines how ambiguous peptide spectral matches are reported. When multiple theoretical peptide sequences match the same MS2 spectrum, and these PSMs all have the same score, the identification is ambiguous. If this box is unchecked, a random peptide from the multiple ambiguous matches will be reported. Otherwise, all possible sequences are reported.
- Require at least Two Peptides to Identify Protein (NoOneHitWonders): This Search Task parameter requires the identification of two unique peptides for the establishment of a protein group in protein parsimony. Historically, this parameter was developed to eliminate one-hit wonders, but it has since been considered overly stringent and detrimental to protein parsimony overall (cite).
- Search Mode (SearchType): This Search Task parameter defines which search algorithm will be used for the task. MetaMorpheus includes 4 different search algorithms (or modes): a) Classic Search (see Note 30), b) Modern Search (see Note 31), c) Semi-Specific Search (see Note 32) and d) Non-Specific Search (see Note 33).
- Separation Type (SeparationType): This parameter in the Search and XL Search Tasks specifies the online separation method utilized prior to mass spectrometric analysis. This determines whether predicted hydrophobicity or electrophoretic mobility values are calculated for the peptides.
- SILAC/SILAM: Quantify peptides/proteins with stable isotope labels (DoQuantification and GenerateUnlabeledProteinsForSilac): Selection of this quantification option within the Search Task indicates a portion of the peptides and proteins within the sample have been isotopically labeled enabling relative quantification. Upon selection, additional parameters for SILAC-based quantification appear including a checkbox to quantify unlabeled peptides and a table in which to specify the amino acid labels being used (see Note 34).
- Top N Peaks per m/z window (NumberOfPeaksToKeepPerWindow): Present in the GPTMD, Search, XL Search and Glyco Search Tasks, this parameter indicates the maximum number of peaks allowed in a window with a specified m/z width. This parameter applies to the peak filtering process of MS1 and MS2 scans. The peaks within the window are ordered by intensity prior to the cutoff being applied.
- Treat Modified Peptides as Different Peptides (ModPeptidesAreDifferent): This Search Task parameter requires the protein parsimony algorithm to consider modified peptides distinct from their unmodified form. This can potentially disambiguate protein groups by the presence of annotated PTMs.
- Trim MS1 Peaks (TrimMs1Peaks): This parameter is present in all search task types (Search, XL Search and Glyco Search) and enables the filtering of MS1 peaks as part of spectra pre-processing.
- Trim MS2 Peaks (TrimMsMsPeaks): This parameter is present in all search task types (Search, XL Search and Glyco Search) and enables the filtering of MS2 peaks as part of spectra pre-processing.
- Use Delta Scores for FDR (UseDeltaScore): This Search Task parameter specifies whether the Delta Score, instead of the Score, should be used for ranking PSMs prior to statistical analysis. The Delta Score is the difference between the scores of the two best matching peptides for the same MS2 spectrum. If the Delta Score produces fewer PSMs at a 1% FDR, then the Score will be automatically used instead.
- Use Provided Precursor (UseProvidedPrecursorInfo): Present in the GPTMD, Search, XL Search and Glyco Search Tasks, this parameter indicates that the precursor mass reported in the spectra should be used as the observed precursor mass for the search. This can be used in addition to deconvoluted precursor masses.
- Variable Modifications (ListOfModsVariable): Present in all tasks (Calibration, GPTMD, Search, XL Search and Glyco Search), this parameter dictates which PTMs are “variable”, meaning that both modified and unmodified forms of all peptides should be generated. Variable modifications should be used with caution because they massively increase the search space and typically lead to high false-positive rates. With the exception of variable oxidation of methionine, all other potentially present variable modifications should be searched for using the GPTMD approach (see Subheading 3.6).
- Write .mzID (WriteMzId): This Search Task parameter requires additional search result files to be written in .mzID format. This is the output file type defined by the Human Proteome Organization (HUPO) and was designed to be a standardized format for reporting search results across different searching platforms.
- Write .pep.XML (WritePepXml): This XL Search Task parameter requires additional result files to be written in .pep.XML format. This file format is widely accepted for the output of proteomics search engines. This result file format can be used as input for ProXL (see Note 20.) for visualization of crosslinking results.
- Write Contaminants (WriteContaminants): This Search Task parameter specifies the inclusion of contaminant peptide identifications in the result files. Contaminant identifications are clearly annotated as contaminants.
- Write Decoys (WriteDecoys): This Search Task parameter specifies the inclusion of decoy peptide identifications in the result files. Decoy identifications are clearly annotated as decoys.
- Write Individual File Results (WriteIndividualFiles): This output option within the Search Task specifies that result files for each individual spectra file be written in addition to the cumulative result files.
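
The peak-filtering parameters defined above (Minimum Intensity Ratio, Number of Windows and Top N Peaks per m/z window) can be illustrated with a minimal Python sketch. It is not MetaMorpheus code: the function name `filter_peaks`, its arguments, and the choice of equal-width windows spanning the observed m/z range are assumptions made for illustration only.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def filter_peaks(
    peaks: List[Peak],
    number_of_windows: int,
    top_n_per_window: int,
    min_intensity_ratio: float,
) -> List[Peak]:
    """Hypothetical windowed peak filter (illustration, not MetaMorpheus code).

    1. Drop peaks whose intensity / base-peak intensity is below the ratio.
    2. Split the m/z range into equal-width windows (an assumption).
    3. Keep only the top-N most intense peaks in each window.
    """
    if not peaks:
        return []

    base_intensity = max(intensity for _, intensity in peaks)
    kept = [p for p in peaks if p[1] / base_intensity >= min_intensity_ratio]

    low = min(mz for mz, _ in peaks)
    high = max(mz for mz, _ in peaks)
    width = (high - low) / number_of_windows or 1.0  # guard against zero width

    windows: List[List[Peak]] = [[] for _ in range(number_of_windows)]
    for mz, intensity in kept:
        # The highest m/z value falls into the last window.
        index = min(int((mz - low) / width), number_of_windows - 1)
        windows[index].append((mz, intensity))

    result: List[Peak] = []
    for window in windows:
        window.sort(key=lambda p: p[1], reverse=True)  # order by intensity
        result.extend(window[:top_n_per_window])
    return sorted(result)  # return in m/z order
```

With two windows and two peaks per window, moderately intense peaks in the low m/z window survive even when the other window contains far more intense peaks, which is the bias the windowing is meant to prevent.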
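
The score described under Minimum Score Allowed (number of matched fragment ions plus the fraction of the total ion current they account for), the ppm form of Product Mass Tolerance, and the Delta Score can be sketched as follows. This is a simplified illustration under stated assumptions, not the MetaMorpheus implementation: the function names are hypothetical, masses are assumed to be already deconvoluted, and the most intense peak within tolerance is taken as the match.

```python
from typing import List, Sequence, Tuple

Peak = Tuple[float, float]  # (mass, intensity)


def within_ppm(observed: float, theoretical: float, tolerance_ppm: float) -> bool:
    """True if the relative mass error is within the tolerance in ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm


def psm_score(
    peaks: Sequence[Peak],
    theoretical_masses: Sequence[float],
    tolerance_ppm: float,
) -> float:
    """Matched-ion count plus matched fraction of total ion current (TIC)."""
    total_intensity = sum(intensity for _, intensity in peaks)
    matched_count = 0
    matched_intensity = 0.0
    for theoretical in theoretical_masses:
        hits = [i for m, i in peaks if within_ppm(m, theoretical, tolerance_ppm)]
        if hits:
            matched_count += 1
            matched_intensity += max(hits)  # assumption: most intense hit wins
    fraction = matched_intensity / total_intensity if total_intensity else 0.0
    return matched_count + fraction


def delta_score(candidate_scores: List[float]) -> float:
    """Difference between the two best candidate scores for one spectrum."""
    if len(candidate_scores) < 2:
        raise ValueError("delta score needs at least two candidate scores")
    best, second = sorted(candidate_scores, reverse=True)[:2]
    return best - second
```

Because the TIC fraction is always between 0 and 1, the integer part of the score in this sketch equals the number of matched fragment ions.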
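
The arithmetic behind Generate Complementary Ions (precursor mass minus each observed fragment mass, plus a dissociation type-specific shift) can be written as a short Python sketch. The function name and the `mass_shift` argument are hypothetical; the actual shift values used by MetaMorpheus for each dissociation type are not given here, so the caller must supply one.

```python
from typing import List, Sequence


def complementary_masses(
    precursor_mass: float,
    fragment_masses: Sequence[float],
    mass_shift: float,
) -> List[float]:
    """Infer artificial complementary fragment masses.

    Each complementary mass is the observed precursor mass minus the
    deconvoluted fragment mass, plus a dissociation-specific shift
    (the shift value is an input here, not a MetaMorpheus constant).
    """
    return [precursor_mass - mass + mass_shift for mass in fragment_masses]
```

For example, `complementary_masses(1000.0, [200.0, 350.5], 0.0)` yields the two masses 800.0 and 649.5, which would be added to the experimental spectrum alongside the observed fragments.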
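
The role of decoys described under Generate Decoy Proteins and Filter Results to q-Value can be illustrated with a generic target-decoy sketch. The reversal below is the simplest possible form (the whole sequence is reversed), and the q-value estimate uses the common decoys/targets ratio; both are assumptions for illustration and may differ from what MetaMorpheus does internally. All names are hypothetical.

```python
from typing import List, Tuple


def reverse_decoy(sequence: str) -> str:
    """Build a reversed decoy by reversing the full target sequence."""
    return sequence[::-1]


def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """Estimate q-values for PSMs given as (score, is_decoy).

    Returns one q-value per PSM, in descending-score order.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # Running FDR estimate: cumulative decoys / cumulative targets.
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the lowest FDR at this rank or at any lower-scoring rank.
    result: List[float] = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        result.append(running_min)
    result.reverse()
    return result
```

A q-value output filter then amounts to keeping only those PSMs whose q-value is at or below the chosen threshold (for example 0.01 for a 1% FDR).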