Signed-off-by: zh Wang <rekind133@outlook.com>
achirkin left a comment
Thank you for the contribution. I see the PR changes the accumulation type from fp16 to fp32 to avoid floating-point overflow. However, I'm not sure whether this is desirable in general, or whether the speed drop is acceptable in the cases where the overflow doesn't happen.
Maybe we'd better just advise the user to switch to the fp32 variant of the algorithm?
If you decide to proceed with this approach, please support the PR with benchmark results (using cuvs ann-bench) before and after the change, for both fp16 and fp32.
  template <>
  struct config<half> {
-   using value_t = half;
+   using value_t = float;
Please check whether this is used outside the IVF-Flat. Changing the accumulation type like this can have a drastic impact on performance.
I ran a simple benchmark and it showed no significant difference. I'll use cuvs ann-bench to get more detailed benchmark results later.
@hhy3 any updates here? We're about to begin burndown for the 25.08 release. Should we consider this for 25.08 or push it to 25.10 (October)?
This PR fixes issue #914, in which accumulating in fp16 causes overflow.
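To illustrate the kind of overflow being discussed, here is a minimal host-side C++ sketch. It is not the cuVS kernel. It accumulates a squared L2 distance in fp32 and checks whether the result would still fit in an fp16 accumulator, whose largest finite value is 65504. The names `squared_l2`, `exceeds_half_range` and `kHalfMax` are hypothetical, and so are the dimension and values in the usage note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Largest finite value representable in IEEE 754 binary16 (fp16).
constexpr float kHalfMax = 65504.0f;

// Squared L2 distance between two equally sized vectors, accumulated in fp32.
// (Hypothetical helper for illustration; not part of cuVS.)
float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
  assert(a.size() == b.size());
  float acc = 0.0f;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const float d = a[i] - b[i];
    acc += d * d;  // with an fp16 accumulator, this sum is what can overflow
  }
  return acc;
}

// True if the value is larger than the largest finite fp16 value,
// i.e. an fp16 accumulator could not hold it as a finite number.
bool exceeds_half_range(float v) { return v > kHalfMax; }
```

For example, a 128-dimensional vector with every component equal to 30 is at squared distance 128 * 30 * 30 = 115200 from the origin. Each input is exactly representable in fp16, yet `exceeds_half_range(squared_l2(query, origin))` is true. So the overflow depends on the accumulated sum, not on the individual stored values.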