gh-100239: specialize long tail of binary operations #128722
Conversation
Nice! The arithmetic benchmarks show a good speedup in the provided link:
This already looks quite good. To get a feeling for the interfaces I added a few more specializations. See iritkatriel/cpython@gh-100239...eendebakpt:cpython:gh-100239-list-tuple-add
Adding more specializations is quite easy, but if we end up adding more we will need some more macros or tooling (for example the TRY_BINARY_SPECIALIZATION macro in the branch above). Fine to leave that to a follow-up PR though.
Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
Looks promising. We might want to add some filtering based on the class before calling the guard function when specializing. Calling a long chain of guard functions could be expensive.
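To make the trade-off concrete, here is a hypothetical Python sketch (all names invented; this is not CPython's actual specialization machinery): a flat chain of guard functions is scanned linearly on every specialization attempt, whereas pre-filtering on the class pair reduces candidate selection to a single dict lookup.

```python
# Hypothetical model of guard-chain vs. class-pair-filtered specialization.
# None of these names come from CPython; this only illustrates the cost trade-off.

def guard_int_float(l, r):
    return type(l) is int and type(r) is float

def guard_float_int(l, r):
    return type(l) is float and type(r) is int

def action_int_float(l, r):
    return float(l) + r

def action_float_int(l, r):
    return l + float(r)

# Flat chain: every specialization attempt scans all guards in order,
# so cost grows with the number of registered specializations.
GUARD_CHAIN = [
    (guard_int_float, action_int_float),
    (guard_float_int, action_float_int),
]

def specialize_by_chain(l, r):
    for guard, action in GUARD_CHAIN:
        if guard(l, r):
            return action
    return None  # fall back to the generic BINARY_OP path

# Class-pair filter: one dict lookup on (type(l), type(r)) narrows
# the candidates before any guard logic runs.
BY_CLASS_PAIR = {
    (int, float): action_int_float,
    (float, int): action_float_int,
}

def specialize_by_class_pair(l, r):
    return BY_CLASS_PAIR.get((type(l), type(r)))  # None -> generic path
```

In the real interpreter the filter would live in the specializing code and the actions would be specialized instructions, but the shape of the lookup is the same.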
Looking at the stats, it seems that this doesn't make that much difference to the number of `BINARY_OP`s that are specialized.
Unfortunately the stats don't tell us which class pairs to add, but I think `str % str` and `str % tuple` would be worth a look.
Or we could enhance the stats to give us cls/cls/operator triples, at least for those classes with a small version number?
…naryOpSpecializationDescr
This reverts commit ede9e8c.
A couple of suggestions, but nothing blocking.
OOI what was causing the earlier test failures?
This was missing: #128892. The test assumed some valid opcode was invalid.
Co-authored-by: Mark Shannon <mark@hotpy.org>
I repeated the benchmarks with the multiply bug (that prevented specialisation) fixed:
This implements part of #100239: the four arithmetic ops between `int, float` and `float, int`.

Microbenchmarks:
Old:
New:
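The benchmark output above did not survive extraction; a minimal reconstruction of this kind of microbenchmark with `timeit` (illustrative only, not the PR's actual harness, and absolute numbers will vary by machine) could look like:

```python
import timeit

# Illustrative microbenchmark for mixed int/float arithmetic.
# The PR's actual harness is not shown here; this is a plausible stand-in
# that times each operand order separately, since the old code was slower
# for int+float than for float+int.
setup = "x = 3; y = 4.0"
for expr in ("x + y", "y + x", "x * y", "y * x"):
    t = timeit.timeit(expr, setup=setup, number=1_000_000)
    print(f"{expr}: {t:.3f}s")
```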
So performance is 20-30% better, and also more uniform (old is 10% slower for `int+float` compared to `float+int`).

Full benchmarks don't show an overall speedup, but they do show better specialisation stats for BINARY_OP:
https://github.com/faster-cpython/benchmarking-public/tree/main/results/bm-20250110-3.14.0a3+-7264e37#readme