Skip to content

Fix fp16 overflow#1084

Open
hhy3 wants to merge 1 commit intorapidsai:branch-25.08from
hhy3:fix_fp16_overflow
Open

Fix fp16 overflow#1084
hhy3 wants to merge 1 commit intorapidsai:branch-25.08from
hhy3:fix_fp16_overflow

Conversation

@hhy3
Copy link

@hhy3 hhy3 commented Jul 4, 2025

This PR fixes issue #914 that accumulation using fp16 causes overflow

Signed-off-by: zh Wang <rekind133@outlook.com>
@hhy3 hhy3 requested a review from a team as a code owner July 4, 2025 05:51
@copy-pr-bot
Copy link

copy-pr-bot bot commented Jul 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the cpp label Jul 4, 2025
Copy link
Contributor

@achirkin achirkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution. I see the PR changes the accumulation type from fp16 to fp32 to avoid fp overflow. However I'm not sure if this is desirable in general and whether the speed drop is acceptable for the cases when the overflow doesn't happen.
Maybe we'd better just advise the user to switch to fp32 variant of the algorithm?
Please support the PR with the benchmark results (using cuvs ann-bench) before and after the PR for fp16 and fp32 if you decide to proceed with this approach.

template <>
struct config<half> {
using value_t = half;
using value_t = float;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check whether this is used outside the IVF-Flat. Changing the accumulation type like this can have a drastic impact on performance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did simple benchmark and it showed no significant differences. I'll use cuvs ann-bench to get a more detailed benchmark results later

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hhy3 any updates here? We're about to begin burndown for 25.08 release. Should we consider this for 25.08 or push to 25.10 (October)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjnolet hi, push it to 25.10, thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cpp non-breaking Introduces a non-breaking change

Projects

Development

Successfully merging this pull request may close these issues.

3 participants