Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use fma in decoding #148

Merged
merged 4 commits into from
Jan 26, 2024
Merged

use fma in decoding #148

merged 4 commits into from
Jan 26, 2024

Conversation

ericphanson
Copy link
Member

closes #146

Here the main question is whether to use muladd or fma. Quoting the muladd doctsring,

help?> Base.muladd
muladd(x, y, z)

Combined multiply-add: computes x*y+z, but allowing the add and multiply to be merged with each other or with surrounding operations for
performance. For example, this may be implemented as an fma if the hardware supports it efficiently. The result can be different on different
machines and can also be different on the same machine due to constant propagation or other optimizations. See fma.

The result being different on different machines sounds like the kind of thing we don't want to have to worry about with encode/decode when trying to track down reproducibility issues etc. Moreover, fma is widely available now. So I think it makes sense to stick with fma for an accuracy boost and speedup on most hardware, at the potential cost of a big slowdown for software emulation on unsupported hardware. I think all the hardware used at Beacon supports fma.

Note the existing tests pass as-is, including some floating point == tests, which I guess means we were already rounding correctly on the test data (?).

@ericphanson ericphanson requested a review from jrevels December 4, 2023 11:50
@ericphanson ericphanson changed the title use fma use fma in decoding Dec 4, 2023
Copy link
Member

@jrevels jrevels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

project.toml bump before merging?

Copy link
Member

@palday palday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any results change, they're being changed to be more accurate, which I'm fine with. Moreover, if your signal is that sensitive to rounding errors in decoding (as opposed to in some iterative algorithm), you probably have bigger problems.

Apple M1 silicon has FMA support; on x86-64, both AMD and Intel chips have had good FMA support since 2012/2013, according to Wikipedia. So I'm not worried about a performance hit from software emulation.

@ericphanson ericphanson merged commit ba275fc into main Jan 26, 2024
8 checks passed
@ericphanson ericphanson deleted the eph/use-fma branch January 26, 2024 18:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use muladd / fma in encoding/decoding?
3 participants