Releases: UWPR/Comet
v2024.02.0
- While fragment ion indexing code was present in release 2024.01 rev. 0, this is the first release that officially supports that functionality (and there's still much more to do). Fragment ion indexing is a method that was originally implemented by MSFragger. Documentation on using Comet's fragment ion indexing can be found here. Thanks to V. Sharma for implementing the modifications permutation code and to E. Bergstrom, C. McGann, and D. Schweppe for driving the development and testing. The new parameters below are associated with this functionality:
- Allow variable modifications to apply to a subset of proteins. For example, one can now limit mono-, di-, and tri-methylation as variable modifications to only histone proteins and not have to apply those modifications on all proteins in the human database. This functionality is controlled by the protein_modlist_file parameter. Note there will be issues for post-processing analysis, such as FDR, when applying this feature. Thanks to C. McGann for the feature request.
Comet v2024.01.1
This maintenance release addresses the following issues:
- Report the previous and next amino acid residues in the fragment ion index search output. Previously the index search did not track this information and simply returned '-' for the preceding and trailing residues. Thanks to E. Bergstrom for tracking this and the other issues in the fragment ion index project.
- Remove the lower limit to allow smaller than 0.01 fragment_bin_tol values. Thanks to I. Smith for reporting the presence of this lower limit in the code.
- Add support for the new parameter "pinfile_protein_delimiter", which changes the Percolator pin file protein field delimiter from a tab to the specified character or string. If this parameter entry is left blank, the protein field delimiter remains a tab. This is a hidden parameter in that it doesn't appear in the example comet.params files that can be downloaded from the Comet website, nor is it present in the abbreviated "comet.params.new" file generated with "comet -p". It will be present in the full "comet.params.new" file generated with "comet -q". Thanks to S. Paez for requesting this feature in issue #66. (I should note that I also deprecated a previously undocumented "pin_mod_proteindelim" parameter that, when set, changed the pin file protein delimiter from a tab to a comma.)
- Fix backwards compatibility with the old/retired peptide_mass_tolerance parameter. Although this parameter has been replaced by peptide_mass_tolerance_lower and peptide_mass_tolerance_upper, Comet code was intended to continue supporting the old peptide_mass_tolerance parameter. During a late change to support the new parameters, I broke support for the old parameter. This is addressed by commit 23a3901. Thanks to C. Bielow for posting issue #60.
- Fix searches stalling when a FASTA sequence loading threshold has been hit; addressed by commit e5cf236. Thanks to C. Bielow for posting issue #62.
- Fix a logic error in StorePeptide() that killed the search; I did not properly account for the strcmp() string comparison returning true when one of the strings is empty. This was also addressed by commit e5cf236. Thanks to C. Bielow for posting issue #63.
- Fix the mzIdentML output regular expression for the digestion enzyme, removing an extra space in the regular expression. Also added the more complete EnzymeName element which references the PSI-MS ontology for common enzymes. These were addressed by commit 754514f. Thanks to J. Uszkoreit for posting the issue #64.
- Asp_N, Asp-N_ambic, and PepsinA enzyme definitions in comet.params.new are now updated as part of commit 754514f.
- Change the ppm tolerances to be applied to m/z instead of the deconvoluted mass. The two differ slightly, and the difference only becomes apparent with extremely large ppm tolerances; in practice this change has essentially no effect on results. This change was prompted by issue #689 on the crux-toolkit repository.
- macOS builds now use the macos-13 runner for compiling binaries as the macos-11 runner has been deprecated by GitHub.
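
To illustrate the ppm tolerance change noted above, here is a rough sketch (not Comet's actual code) comparing a ppm window applied to the mass with one applied to the m/z and then converted back to mass. It assumes the "deconvoluted mass" is the neutral peptide mass; the function names are hypothetical and exist only for this example.

```python
PROTON_MASS = 1.00727646688  # mass of a proton in Da


def mz_from_neutral_mass(neutral_mass: float, charge: int) -> float:
    """Convert a neutral peptide mass to the m/z of the given charge state."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_window_from_mass(neutral_mass: float, ppm: float) -> float:
    """Half-width of the tolerance window (Da) when ppm is applied to the mass."""
    return neutral_mass * ppm / 1e6


def mass_window_from_mz(neutral_mass: float, charge: int, ppm: float) -> float:
    """Half-width of the tolerance window (Da) when ppm is applied to the m/z.

    The window is computed in m/z units and multiplied by the charge to
    express it on the mass scale so the two approaches can be compared.
    """
    mz_window = mz_from_neutral_mass(neutral_mass, charge) * ppm / 1e6
    return mz_window * charge


if __name__ == "__main__":
    mass, charge = 2000.0, 2
    for ppm in (20.0, 20000.0):
        by_mass = mass_window_from_mass(mass, ppm)
        by_mz = mass_window_from_mz(mass, charge, ppm)
        print(f"{ppm:>8.0f} ppm: mass-based {by_mass:.6f} Da, "
              f"m/z-based {by_mz:.6f} Da, difference {by_mz - by_mass:.6f} Da")
```

Under these assumptions the two windows differ by charge × proton mass × ppm / 1e6, which is why the difference only matters at very large ppm values.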
Comet v2024.01.0
What's Changed
- Add the parameters “peptide_mass_tolerance_lower” and “peptide_mass_tolerance_upper” to allow the specification of non-symmetric precursor mass tolerances. This means that “peptide_mass_tolerance” is retired and you should start with a fresh comet.params with this release and not re-use an old parameters file.
- Add support for up to 15 variable modifications with the addition of “variable_mod10” through “variable_mod15”. Please do not attempt to search with 15 (or even 9) variable mods without using some serious constraints unless you are the most patient person in the world.
- Add support for what I will term an “exclusive” modification where only one from the set of exclusive variable modifications can appear in a peptide. You would want to apply this option to rare modifications that are unlikely to co-exist and be identified along with another rare modification in the same peptide. Denoting which variable modifications are an “exclusive” modification is accomplished by setting field 7 in the “variable_mod##” parameters to “-1”. The exclusive modification can still apply to multiple residues (controlled by the 4th field) and can exist in conjunction with other variable modifications that are not denoted as being exclusive. This reduces the complexity and search times when analyzing many modifications by not requiring all permutations/combinations of modifications to be analyzed. Requested by E. Deutsch.
- Change “isotope_error” options 4 thru 7. Those options now correspond to 4 = -1/0/1/2/3, 5 = -1/0/1, 6 = -3/-2/-1/0/1/2/3, 7 = -8/-4/0/4/8.
- Add the parameter “resolve_fullpaths” to allow the control of whether or not to resolve the full path base_names in the pepXML output. Default behavior is to resolve those full paths. This parameter allows the user to control leaving the paths as-is. Requested by M. Riffle.
- Fix the assignment of spuriously good E-value scores to matches against extremely sparse spectra. This occurs because any match to any single peak looks like an outlier compared with the majority of peptides that match no peaks. This is handled by putting a constraint on the linear regression step of the E-value calculation.
- Simplify the spectral processing for Sp scoring (preliminary score) by just taking the raw binned spectra and normalizing the max intensity to 100.
- Change the convention for the dCn (delta Cn) score for single hit results i.e. those results where only a single peptide is scored/reported. In the past, these single peptide hits received a dCn score of 1.0 but now these single peptide hits will receive a dCn score of 0.0.
- Update to the MSToolkit library to fix a scan numbering bug when spectra are not numbered. Implemented by the talented M. Hoopmann.
- Update the index search, including the CometWrapper.dll interface used for real time search (RTS), to use fragment ion indexing. It is still a work in progress and not all functionality has been implemented (so do not use it unless you want to be a beta tester). Documentation will be added when it is ready for general use. The fragment ion indexing is used as a pre-filter to the full cross-correlation scoring and is not fast compared to other search tools. Thanks to V. Sharma for implementing the modifications permutation code and to E. Bergstrom, C. McGann, and D. Schweppe for development/testing feedback.
- Added “set_X_residue” parameters, which allow the user to redefine the base mass of each amino acid residue, e.g. set_A_residue to modify the base mass of alanine. Making use of static modifications can effectively accomplish the same thing, so there is a very limited use case for this new feature. Feature requested by m.f.abdollahnia via the Comet google group.
- Implemented returning multiple results, instead of just the top hit peptide, for each RTS spectrum query through the CometWrapper.dll interface. Code was contributed by our Thermo collaborators J. Canterbury and W. Barshop and integrated by C. McGann.
- “comet -p” now generates a slightly simplified comet.params.new file. Some lesser used parameters are left out of that file. “comet -q” will generate a comet.params.new file with a more complete list of supported search parameters.
- Fixed issues with the mzIdentML output as reported by R. Marissen in issue #45.
- Fixed the inconsistent Sp rank numbers between runs, reported by keesh0 in issue #46.
- Fixed bug with counting the number of missed cleavages for enzymes that cut before (N-terminal of) the residue, reported by cpaul32015 in issue #47.
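
As a rough illustration of why the “exclusive” modification option described above reduces search complexity, the sketch below enumerates which sets of variable modification types may co-occur in a single peptide when at most one exclusive modification is allowed. This is not Comet's implementation; the function and the modification names are hypothetical and only demonstrate the combinatorics.

```python
from itertools import combinations


def allowed_mod_sets(regular, exclusive):
    """Yield every set of modification types that may co-occur in one peptide.

    Regular modifications may combine freely; at most one modification from
    the exclusive group may be present in any given set.
    """
    all_mods = list(regular) + list(exclusive)
    exclusive_group = set(exclusive)
    for size in range(len(all_mods) + 1):
        for combo in combinations(all_mods, size):
            if sum(mod in exclusive_group for mod in combo) <= 1:
                yield combo


if __name__ == "__main__":
    regular = ["oxidation"]                 # hypothetical ordinary variable mod
    exclusive = ["modA", "modB", "modC"]    # hypothetical rare, exclusive mods
    allowed = list(allowed_mod_sets(regular, exclusive))
    total = 2 ** (len(regular) + len(exclusive))
    print(f"{len(allowed)} of {total} possible modification sets remain")
    for combo in allowed:
        print(combo)
```

With r regular and e exclusive modifications, the number of allowed sets is 2^r × (1 + e) rather than 2^(r + e), so the savings grow quickly as more rare modifications are marked exclusive.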
New Contributors
- @ChrisMcGann made their first contribution in #49
Full Changelog: v2023.01.2...v2024.01.0
Comet v2023.01.2
This release addresses these two issues:
- mzML/mzXML files without the optional scan index would not be searched because their spectra could not be read. Support for this functionality was implemented in v2019.01.4 thru v2022.01.1 but was lost as of v2022.01.2 when MSToolkit code was updated from that library's repository. This functionality is re-implemented in this release. Thanks to J. Wang for reporting the issue.
- A parameter "export_additional_pepxml_scores" has been implemented. This is an optional/hidden parameter in that it is not written by default in the comet.params files; it needs to be added manually. When this parameter is present and its value set to "1", additional search scores (lnrSp, deltLCn, lnExpect, and IonFrac) are reported in the pep.xml output. Feature request by J. Scheid, OpenMS group.
Comet v2023.01.1
This release addresses two minor issues:
- For pep.xml output when "num_output_lines = 1", the deltaCn scores were always reported as "1.0". This occurred when only the top hit was reported and has been the behavior since the original release of Comet. This update correctly reports the deltaCn value for this case. Thanks to J. Scheid for reporting this issue.
- Comet and MSToolkit had input file name/path limits of 512 and 256 characters, respectively. Any input file strings longer than 256 would cause an error in reading the input spectra data. The file name buffer has been expanded to 4096 to mitigate this issue. Thanks to B. Connolly for reporting this issue.
- The Linux/Ubuntu and Mac runners for compiling the release binaries via GitHub Actions have been changed from "ubuntu-latest" and "macos-latest" to "ubuntu-20.04" and "macos-11". This is to give these binaries wider, backwards compatibility with older OS's. Thanks to T. Sachsenberg for requesting this change.
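
For the deltaCn fix noted above, the following sketch shows the general idea: the score is derived from the top two scored candidates even when only one output line is reported. It assumes the conventional definition of deltaCn as the normalized difference between the top and next-best xcorr; the function name is hypothetical and this is not Comet's actual code.

```python
def top_hit_with_delta_cn(xcorrs):
    """Return (top_xcorr, delta_cn) using all scored candidates.

    delta_cn is taken here as (top - runner_up) / top, so it reflects the
    separation between the best and second-best match regardless of how many
    result lines are eventually written to the output.
    """
    ranked = sorted(xcorrs, reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two scored candidates")
    top, runner_up = ranked[0], ranked[1]
    if top <= 0:
        raise ValueError("top xcorr must be positive")
    return top, (top - runner_up) / top


if __name__ == "__main__":
    scores = [2.4, 3.0, 1.1]  # xcorr values for all candidates of one spectrum
    top, dcn = top_hit_with_delta_cn(scores)
    # Only the top hit is reported (as with num_output_lines = 1), but
    # deltaCn still comes from the runner-up rather than defaulting to 1.0.
    print(f"xcorr={top:.3f} deltaCn={dcn:.3f}")
```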
Comet v2023.01.0
What's Changed
- Address issue where a peptide with a low xcorr identified in a very poor/sparse spectrum would be assigned a good E-value. This is due to the majority of matched xcorr scores being "0" so any poor scoring match looks like a good outlier. To address this issue, Comet now randomly assigns roughly half of these "0" value xcorr scores a score associated with a single peak match within the xcorr histogram used for the E-value calculation. Thanks to D. Shteynberg for reporting the issue.
- Add "scale_fragmentNL" parameter entry which scales (multiplies) the neutral loss mass value by the number of modified residues in the fragment. Feature requested by A. Keller.
- Add contributions of fragment neutral loss peaks in preliminary (Sp) score; previously they only applied to the cross-correlation score.
- Correct bug where the fragment neutral loss peak was not analyzed if the primary fragment peak was not matched.
- Fix minor typo in command line help. by @mriffle in #30
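
The "scale_fragmentNL" behavior described above can be sketched as follows. This is a simplified illustration working on fragment masses rather than charged m/z values, with a hypothetical function name; it is not taken from the Comet source.

```python
from typing import Optional

PHOSPHO_NEUTRAL_LOSS = 97.976896  # monoisotopic mass of H3PO4 in Da


def neutral_loss_fragment_mass(fragment_mass: float,
                               neutral_loss_mass: float,
                               num_modified_residues: int,
                               scale_fragment_nl: bool) -> Optional[float]:
    """Mass of the fragment's neutral loss peak, or None if none applies.

    With scaling on, the neutral loss mass is multiplied by the number of
    modified residues in the fragment; with scaling off, a single neutral
    loss is subtracted regardless of how many modified residues are present.
    """
    if num_modified_residues < 1:
        return None  # no modified residue in this fragment, so no loss peak
    multiplier = num_modified_residues if scale_fragment_nl else 1
    return fragment_mass - multiplier * neutral_loss_mass


if __name__ == "__main__":
    fragment = 1000.0  # hypothetical fragment mass with two phospho sites
    print(neutral_loss_fragment_mass(fragment, PHOSPHO_NEUTRAL_LOSS, 2, False))
    print(neutral_loss_fragment_mass(fragment, PHOSPHO_NEUTRAL_LOSS, 2, True))
```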
New Contributors
Full Changelog: v2022.01.2...v2023.01.0
Comet v2022.01.2
This is a minor release update to address these issues:
- MSToolkit update to 20e99ce. Thanks to M. Hoopmann for addressing the MGF issue in the MSToolkit repo. Addresses issue 23.
- Add user message/warning when "spectrum_batch_size = 0" is set. Addresses issue 27.
- Add/return expectation value scores within real-time search interface.
Full Changelog: v2022.01.1...v2022.01.2
Comet v2022.01.1
This is a minor release update to address these issues:
- Fix mzid output; all known errors were addressed so that files now validate. Thanks to J. Uszkoreit and M. Riffle for assisting with this process.
- The parameter entry "output_mzidentml" has been extended to allow control of whether the protein sequences in the <Seq> element are reported in the output mzid file.
- When running a regular search not using Comet's internal decoy peptides, any protein sequence that begins with the parameter string set in "decoy_prefix" will be annotated as a decoy entry in the mzid output. This is the same behavior as the Percolator pin output for annotating decoy matches when searching against a user supplied target-decoy database.
- A search with a single variable modification entry "79.99 STY" will run faster than if the modifications were specified separately, e.g. "79.99 S", "79.99 T", and "79.99 Y" as three separate variable modification entries. Comet will now automatically reduce/combine separate variable modifications to a single entry if possible. Thanks to C. Bielow/OpenMS for suggesting the optimization.
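
The variable modification merging described above might look something like the sketch below. It is a simplification: each entry is reduced to a (mass, residues) pair, whereas real variable_mod entries carry additional fields that would presumably also need to match before entries can be combined. The function name is hypothetical.

```python
def combine_variable_mods(mods):
    """Merge entries that share the same mass into a single entry.

    `mods` is a list of (mass, residues) pairs. Entries with identical mass
    are combined by concatenating their residues (without duplicates), and
    the order in which masses first appear is preserved.
    """
    merged = {}
    for mass, residues in mods:
        combined = merged.get(mass, "")
        for residue in residues:
            if residue not in combined:
                combined += residue
        merged[mass] = combined
    return list(merged.items())


if __name__ == "__main__":
    separate = [(79.99, "S"), (79.99, "T"), (79.99, "Y"), (15.9949, "M")]
    for mass, residues in combine_variable_mods(separate):
        print(mass, residues)
```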
Full Changelog: v2022.01.0...v2022.01.1
Comet v2022.01.0
Documentation for parameters for release 2022.01 can be found here.
What's Changed
- Add support for the VariantComplex entries in PEFF databases. These are annotated as “sequence_substitution” elements in the pep.xml output. This functionality was implemented by M. Hoopmann and was actually present in the 2021.02 release.
- For the .pin output, decoy entries are now annotated with the “-1” decoy label under the “Label” column. Previously, the decoy annotations were supported only with Comet’s internal decoy searches. With this change, for “decoy_search = 0” searches (aka a user supplied target-decoy database), any database entry that matches the “decoy_prefix” text will be annotated with the “-1” decoy label.
- Add hidden parameter entry "clip_nterm_aa" which skips the N-term residue of every peptide. For example, with trypsin digestion, tryptic peptides are generated and then the N-term residue is removed before analysis. Feature requested by J. Luo. "Hidden" parameters are those that do not appear in the example params file downloadable from the website or generated by running "comet -p".
- Add hidden parameter entry "minimum_xcorr" which sets the minimum xcorr cutoff. By default, this cutoff is set to 0.0 (specifically 1e-8); for Crux compiled Comet, the default minimum xcorr is -999. Any peptide must score higher than this cutoff to be reported in the output. Feature requested by I. Smith.
- Bug fix: address memory leak in the ThreadPool code during search cleanup that caused segfaults under Linux, present since 2021.01.0. Thanks to D. Shteynberg and M. Hoopmann for the fix.
- Bug fix: correctly support the clip_nterm_methionine parameter in conjunction with PEFF searches. Previous versions (including 2021.02) did not correctly handle this combination.
- Bug fix: address memory pool issue present since 2021.01.0 for database indexing with num_threads set to "1". Thanks to A. Kertesz-Farkas for reporting the bug.
- Bug fix: move static "aminoacid_modification" elements before "terminal_modification" elements in the pep.xml output to conform to the schema. Thanks to D. Shteynberg for reporting the bug.
- There are no parameter changes, so this version will work with comet.params files annotated as being for versions 2022.01, 2021.01 and 2020.01.
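
The .pin decoy labeling change described above can be pictured with the short sketch below. It assumes a prefix match at the start of the database entry name and the usual Percolator convention of "1" for targets and "-1" for decoys; the function name and the "DECOY_" prefix used in the example are illustrative only.

```python
def pin_label(protein_id: str, decoy_prefix: str) -> int:
    """Return the Percolator 'Label' value for a database entry.

    Entries whose identifier starts with the configured decoy prefix are
    labeled -1 (decoy); all other entries are labeled 1 (target).
    """
    return -1 if protein_id.startswith(decoy_prefix) else 1


if __name__ == "__main__":
    prefix = "DECOY_"  # example value for the decoy_prefix parameter
    for entry in ("sp|P12345|EXAMPLE_HUMAN", "DECOY_sp|P12345|EXAMPLE_HUMAN"):
        print(entry, pin_label(entry, prefix))
```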
New Contributors
- @jesse-canterbury made their first contribution in #13
- @mhoopmann made their first contribution in #16
Full Changelog: v2021.02.0...v2022.01.0
Comet v2021.02.0
This release is effectively the same as v2021.01.0 with the exception of (i) being our first GitHub release, (ii) returning matched fragment neutral loss peaks for real-time search, and (iii) addressing some Crux integration issues. Huge thanks to D. Schweppe and W. Fondrie for getting Comet migrated to and set up on GitHub, C. Grant for debugging a Comet-Crux Linux segfault, and D. Shteynberg for looking into ThreadPool questions. Release notes can be found here.
What's Changed
- Add support for MacOS by @wfondrie in #3
- Add Action'd builds by @wfondrie in #4
- Add GitHub Actions for Releases and a Docker container by @wfondrie in #6
- Fix Windows Builds by @wfondrie in #7
- simplify the win build for comet.exe and wrapper only by @mammmals in #8
- Add artifact uploads by @wfondrie in #9
New Contributors
- @wfondrie made their first contribution in #3
- @mammmals made their first contribution in #8
- @jke000 made their first contribution in #11
Full Changelog: https://github.com/UWPR/Comet/commits/v2021.02.0