Sage spends a significant portion of its runtime sorting (peptides, fragment ions, preliminary scoring candidates, etc.).
After preliminary scoring of all potential candidates (how many fragments matched?), Sage selects the top k (~50 or more, depending on settings) candidates for hyperscore calculation. Currently, the entire list of candidates (100s to 100,000s) is completely sorted, and then the best k elements are selected.
This PR merges in an implementation of the heapselect algorithm. A size-bounded min heap is constructed in-place to select the top k elements. This speeds up candidate selection, and can improve total runtime by 10-15% depending on the search settings.
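As a rough illustration of the idea (not the code merged in this PR), a bounded min-heap keeps only the k best items seen so far, so selection costs O(n log k) instead of the O(n log n) of a full sort. The sketch below uses the standard library's `BinaryHeap` rather than an in-place heap, and the function name `top_k` is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` largest items, in descending order.
/// Hypothetical helper for illustration; Sage's implementation works in-place.
pub fn top_k<T: Ord>(items: impl IntoIterator<Item = T>, k: usize) -> Vec<T> {
    if k == 0 {
        return Vec::new();
    }
    // `Reverse` turns the std max-heap into a min-heap, so the root is
    // always the smallest of the candidates currently retained.
    let mut heap: BinaryHeap<Reverse<T>> = BinaryHeap::with_capacity(k);
    for item in items {
        if heap.len() < k {
            // Heap not yet full: keep everything.
            heap.push(Reverse(item));
        } else if let Some(mut smallest) = heap.peek_mut() {
            // Heap full: replace the root only if the new item beats it.
            // Dropping `smallest` restores the heap property.
            if item > smallest.0 {
                *smallest = Reverse(item);
            }
        }
    }
    // Ascending order of `Reverse<T>` is descending order of `T`.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(x)| x)
        .collect()
}
```

Calling `top_k(scores, 50)` on a list of preliminary scores would return the 50 highest in descending order without sorting the rest of the list.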
`quickcheck` is used to perform property-based testing of the heapselect algorithm, and one of the integration tests has been changed to use `quickcheck` as well, to increase testing surface area. In addition, crate internals have been reworked a bit to provide initial support for compiling for wasm (#77); there is still some awkwardness here that warrants more cleanup. Further restructuring/refactoring should enable support for additional file formats as well.

Of note: