Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement heapselect algorithm #80

Merged
merged 3 commits into from
Aug 1, 2023
Merged

Implement heapselect algorithm #80

merged 3 commits into from
Aug 1, 2023

Conversation

lazear
Copy link
Owner

@lazear lazear commented Aug 1, 2023

Sage spends a significant portion of its runtime sorting (peptides, fragment ions, preliminary scoring candidates, etc.).

After preliminary scoring of all potential candidates (how many fragments matched?), Sage selects the top k (~50 or more, depending on settings) candidates for hyperscore calculation. Currently, the entire list of candidates (100s to 100,000s) is completely sorted, and then the best k elements are selected.

This PR merges in an implementation of the heapselect algorithm. A size-bounded min heap is constructed in-place to select the top k elements. This speeds up candidate selection, and can improve total runtime by 10-15% depending on the search settings.

quickcheck is used to perform property-based testing of the heapselect algorithm, and one of the integration tests has been changed to use quickcheck as well, to increase testing surface area. In addition, crate internals have been reworked a bit, to provide initial support for compiling for wasm (#77) - there is still some awkwardness here that warrants more clean up. Further restructuring/refactoring should enable support for additional file formats as well.

Of note:

  • Results may differ between this version of Sage and prior versions, due to a different stable sorting order, and heapselect being applied to results of sub searches (e.g. z=2, z=3, z=4)

@@ -8,16 +8,17 @@ use sage_core::spectrum::SpectrumProcessor;
#[test]
fn integration() -> anyhow::Result<()> {
let mut builder = Builder::default();
builder.update_fasta("../../tests/Q99536.fasta".into());
builder.update_fasta("foo".into());
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should try to remove the fasta field from parameters while maintaining backwards compatibility

crates/sage-cloudpath/src/util.rs Outdated Show resolved Hide resolved
- Remove unnecessary Serialize
- Additional property-based tests
- Clean up internal code
- Use thiserror
@lazear lazear merged commit 3ae96df into master Aug 1, 2023
1 check passed
@lazear lazear deleted the heap branch August 1, 2023 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant