diff --git a/notes/20241001_FI.md b/notes/20241001_FI.md
index cd0149e7..cfa85b73 100644
--- a/notes/20241001_FI.md
+++ b/notes/20241001_FI.md
@@ -3,11 +3,11 @@
 Fragment ion indexing was first introduced by [MSFragger in 2017](https://pubmed.ncbi.nlm.nih.gov/28394336/)
 and this strategy has since been adopted in search tools like [MetaMorpheus](https://pubmed.ncbi.nlm.nih.gov/29578715/)
 and [Sage](https://pubmed.ncbi.nlm.nih.gov/37819886/). And yes, you are encouraged
-to go use MSFragger, MetaMorpheus, Sage and all of the other great search tools out
-there.
+to go use MSFragger, MetaMorpheus, Sage and all of the other great peptide identification
+tools out there.
 
-Fragment ion indexing (abbreviated as "FI" going forward) is supported in Comet as of
-[version 2024.02 rev. 0](https://uwpr.github.io/Comet/releases/release_202402.html).
+Fragment ion indexing (abbreviated as "FI" or "Comet-FI" going forward) is supported
+in Comet as of [version 2024.02 rev. 0](https://uwpr.github.io/Comet/releases/release_202402.html).
 Given this is the first Comet release with FI functionality, we expect to improve on
 features, performance, and functionality going forward.
 
@@ -66,17 +66,24 @@ can be avoided for all subsequent files being searched.
 
 ### Current limitations and known issues with Comet-FI:
 - MSFragger's database slicing has not yet been implemented so you must have
-  enough RAM to stored the entire FI in memory. Note that for real-time
-  search application, database slicing is not feasible.
+  enough RAM to store the entire FI in memory. Note that for the real-time
+  search application for intelligent instrument control, database slicing is
+  not feasible.
 - Protein n-term and c-term variable modifications are not supported in this
   initial FI release. This fuctionality is expected to be added soon. This means
   that variable modifications are limited to residues and peptide termini.
 - Only [variable_mod01 through variable_mod05](https://uwpr.github.io/Comet/parameters/parameters_202402/variable_modXX.html) are supported with FI.
+  This limit is imposed to keep the FI at a reasonable size.
 - For each variable_modXX, a maximum of 5 modified residues will be considered in a
   peptide. This might further be limited by the total allowed number of modified residues
   in a peptide controlled by the [max_variable_mods_in_peptide](https://uwpr.github.io/Comet/parameters/parameters_202402/max_variable_mods_in_peptide.html) parameter.
+- Comet's internal decoy search via the
+  [decoy_search](https://uwpr.github.io/Comet/parameters/parameters_202402/decoy_search.html)
+  parameter is not supported. For FDR analysis, you should supply Comet a FASTA
+  file containing target and decoy entries.
 
 ### Fragment ion index specific search parameters
+
 - [fragindex_min_fragmentmass](/Comet/parameters/parameters_202402/fragindex_min_fragmentmass.html)
 - [fragindex_max_fragmentmasss](/Comet/parameters/parameters_202402/fragindex_max_fragmentmass.html)
 - [fragindex_min_ions_report](/Comet/parameters/parameters_202402/fragindex_min_ions_report.html)
@@ -102,21 +109,23 @@ user who wants to analyze MHC peptides requiring non-specific enzyme constraint
 make sure you have a 128GB box before attempting this analysis with this version of Comet.
 
 The following searches were run using 8-cores of an AMD Epyc 7443P processor with
-256GB RAM running on Ubuntu linux version 22.04 Search times and memory use are
-noted:
-
-- Yeast forward + reverse (XXXX sequence entries), tryptic, 1 allowed
-  missed cleavage, variable mods 16M, peptide length 5 to 50 uses XX GB of RAM
-  and completes in XXX.
-- Human forward + reverse (1XX,XXX sequence entries), tryptic, 1 allowed
-  missed cleavage, variable mods 16M, peptide length 5 to 50 uses XX GB of RAM
-  and completes in XXX.
+256GB RAM running on Ubuntu Linux version 22.04. Up to two of each specified variable
+modification are allowed in a peptide. The query file is a two-hour Orbitrap Lumos
+run with MS/MS spectra acquired in the Orbitrap. Searches used a peptide length
+range of 7 to 50 and a digest mass range of 600.0 to 5000.0.
+
+- Yeast forward + reverse (12,488 sequence entries), tryptic, 1 allowed
+  missed cleavage, variable mods 16M and 80STY uses 5.6 GB of RAM and
+  completes in 31 seconds.
+- Human forward + reverse (193,864 sequence entries), tryptic, 1 allowed
+  missed cleavage, variable mods 16M uses 5.2 GB of RAM and completes in
+  31 seconds.
 - Human forward + reverse (1XX,XXX sequence entries), tryptic, 1 allowed
-  missed cleavage, variable mods 16M, 80STY, peptide length 5 to 50 uses XX GB of RAM
-  and completes in XXX.
-- Human forward + reverse, no enzyme constraint, no variable mods,
-  peptide length range 7 to 15 uses XX GB of RAM
-  and completes in XXX.
+  missed cleavage, variable mods 16M, 80STY uses 11.3 GB of RAM and
+  completes in 68 seconds. The corresponding standard Comet search
+  took 4 minutes and 10 seconds.
 - Human forward + reverse sequences, no enzyme constraint, 16M variable mod,
-  peptide length range 7 to 15 uses XX GB of RAM
-  and completes in XXX.
+  peptide length range 7 to 15 uses 49 GB of RAM and completes in 5 minutes
+  and 30 seconds. However, just creating the plain peptide .idx file takes
+  over 100 GB of RAM and 12 minutes. The corresponding standard Comet search
+  took 20 minutes and 5 seconds.
diff --git a/releases/release_202402.md b/releases/release_202402.md
index db6e6f61..c85ffdd6 100644
--- a/releases/release_202402.md
+++ b/releases/release_202402.md
@@ -8,11 +8,12 @@ Download release [here](https://github.com/UWPR/Comet/releases).
 
 - Add fragment ion indexing support. While fragment ion indexing code was
 present in the 2024.01 rev. 0 release,
-this is the first release to official support fragment ion indexing, which
-is a method that was originally implemented by [MSFragger](https://www.nature.com/articles/nmeth.4256).
+this is the first Comet release to officially support fragment ion indexing,
+which is a method that was originally implemented by
+[MSFragger](https://www.nature.com/articles/nmeth.4256).
 In Comet's implementation, the fragment ion index is applied as a candidate
 peptide filter prior to performing full cross-correlation analysis.
-[Please see this note](https://uwpr.github.io/Comet/notes/20241001_FI.html)
+[Please see this page](https://uwpr.github.io/Comet/notes/20241001_FI.html)
 for more details on Comet's fragment ion index. Thanks to V. Sharma for
 implementing the modifications permutation code and to E. Bergstrom, C. McGann,
 and D. Schweppe for driving the development and testing.
@@ -25,9 +26,10 @@ The following are new search parameters specific to this feature.
 - [fragindex_skipreadprecursors](https://uwpr.github.io/Comet/parameters/parameters_202402/fragindex_skipreadprecursors.html)
 
 - Allow variable modifications to apply to a subet of proteins.
-For example, one can now apply mono-, di-, and tri-methylation
-as variable modifications to only histone proteins and not all
-proteins in the human database. This functionality is controlled by the
+For example, one can now limit mono-, di-, and tri-methylation
+as variable modifications to only histone proteins and not have
+to apply those modifications to all proteins in the human database.
+This functionality is controlled by the
 [protein_modlist_file](https://uwpr.github.io/Comet/parameters/parameters_202402/protein_modlist_file.html)
 parameter. Note there will be issues for post processing analysis, such as FDR,
 when applying this feature. Thanks to C. McGann for the feature request.
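The release note says Comet applies the fragment ion index as a candidate peptide filter before running full cross-correlation. The Python sketch below shows that general idea. It is not Comet's code.

- It builds an inverted index that maps binned b/y fragment masses to peptide IDs.
- It keeps only the peptides that share a minimum number of fragment bins with a query spectrum. Those survivors are what would go on to xcorr scoring.
- The function names and the 0.02 Da fixed-width bins are illustrative assumptions.
- The `min_fragment_mass`, `max_fragment_mass`, and `min_matched_ions` arguments are hypothetical stand-ins. They are only loosely modeled on `fragindex_min_fragmentmass`, `fragindex_max_fragmentmass`, and `fragindex_min_ions_report`, whose exact semantics are defined on their Comet parameter pages.
- Precursor-mass windowing, modifications, and neighboring-bin tolerance are left out.

```python
from collections import defaultdict

# Monoisotopic residue masses (Da) for unmodified amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # Da
WATER = 18.010565  # Da


def fragment_masses(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    residues = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(residues)
    ions = []
    prefix = 0.0
    for mass in residues[:-1]:
        prefix += mass
        ions.append(prefix + PROTON)                  # b ion
        ions.append(total - prefix + WATER + PROTON)  # complementary y ion
    return ions


def to_bin(mz, bin_width):
    """Map an m/z value to an integer bin (assumed fixed-width bins)."""
    return int(mz / bin_width)


def build_fragment_index(peptides, bin_width=0.02,
                         min_fragment_mass=200.0, max_fragment_mass=2000.0):
    """Inverted index: fragment bin -> ids of peptides with an ion in that bin."""
    index = defaultdict(set)
    for pep_id, peptide in enumerate(peptides):
        for mz in fragment_masses(peptide):
            # Hypothetical analogue of a min/max fragment mass window.
            if min_fragment_mass <= mz <= max_fragment_mass:
                index[to_bin(mz, bin_width)].add(pep_id)
    return index


def candidate_peptides(index, peaks, bin_width=0.02, min_matched_ions=3):
    """Ids of peptides sharing >= min_matched_ions fragment bins with the spectrum.

    In the filter-then-score scheme, only these would get full xcorr scoring.
    """
    matches = defaultdict(int)
    for b in {to_bin(mz, bin_width) for mz in peaks}:  # count each query bin once
        for pep_id in index.get(b, ()):
            matches[pep_id] += 1
    return sorted(p for p, n in matches.items() if n >= min_matched_ions)
```

Here is a worked example:

1. Index the peptides `["PEPTIDEK", "SAMPLER", "LVNELTEFAK"]`.
2. Query with the first eight fragment m/z values of `LVNELTEFAK`.
3. The result is `[2]`. Only the third peptide survives as a candidate.

Storing a set of peptide IDs per bin means a peptide is counted at most once per matched bin. Deduplicating the query bins keeps several peaks in one bin from inflating the count.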
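With FI, Comet's internal `decoy_search` is not supported, so the FASTA file itself has to contain the decoy entries. The Python sketch below builds such a file. It rests on two assumptions:

- Decoys are whole-protein sequence reversals. This is one common strategy, matching the "forward + reverse" databases in the benchmarks.
- Decoys carry a `DECOY_` accession prefix. The prefix must match whatever your Comet settings (for example, the `decoy_prefix` parameter) and downstream FDR tools treat as decoys.

The script name `make_decoys.py` is hypothetical.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def target_decoy_fasta(lines, prefix="DECOY_", width=60):
    """Return FASTA text with each target entry followed by its reversed decoy.

    The "DECOY_" prefix is an assumption; match it to your search/FDR settings.
    """
    out = []
    for header, seq in read_fasta(lines):
        for h, s in ((header, seq), (prefix + header, seq[::-1])):
            out.append(">" + h)
            # Wrap sequence lines at a fixed width.
            out.extend(s[i:i + width] for i in range(0, len(s), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python make_decoys.py target.fasta > target_decoy.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.write(target_decoy_fasta(handle))
```

Each decoy is written right after its target, so the output doubles the entry count. The searched database then matches the "forward + reverse" setup used in the timing runs.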
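The note lists three FI limits on variable modifications:

- Only variable_mod01 through variable_mod05 are supported.
- At most 5 modified residues are considered per modification.
- That is further capped by `max_variable_mods_in_peptide`.

These limits matter because every modified form of a peptide adds its own fragment ions to the index. The Python sketch below roughly counts those forms for a single peptide. It is not Comet's permutation code. It makes these assumptions:

- Each modification targets a disjoint set of residues, which holds for a pair like 16M and 80STY.
- Terminal modifications are ignored.
- The function name and default cap values are illustrative.

```python
from itertools import product
from math import comb


def count_modified_forms(site_counts, per_mod_cap=5, max_mods_in_peptide=5):
    """Count modified forms (including unmodified) of one peptide.

    site_counts[i] is the number of residues modification i could occupy.
    Assumes no residue is a candidate site for more than one modification.
    """
    # For each modification, the allowed number of modified residues.
    ranges = [range(min(n, per_mod_cap) + 1) for n in site_counts]
    total = 0
    for ks in product(*ranges):
        if sum(ks) > max_mods_in_peptide:  # peptide-wide cap
            continue
        forms = 1
        for n, k in zip(site_counts, ks):
            forms *= comb(n, k)  # ways to place k mods on n candidate sites
        total += forms
    return total
```

Take a peptide with 2 methionines and 4 S/T/Y sites:

- With each modification capped at two residues, `count_modified_forms([2, 4], per_mod_cap=2)` returns 44.
- With 5 per modification and 5 in total, it returns 63.

The count grows combinatorially with the number of sites. That is why capping modifications keeps the in-memory index bounded.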