Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-117581: Specialize binary operators by refcount as well as type. #117627

Closed

Conversation

markshannon
Copy link
Member

@markshannon markshannon commented Apr 8, 2024

This PR changes the specialization of binary operators for the current, fairly ad-hoc approach to a bit more principled approach of specializing by refcount and using table lookup to specialize by type.

Tier 1 performance is in the noise, maybe showing a slight slowdown.

I expect the advantages of this approach to show up when applied to BINARY_SUBSCR, COMPARE_OP and with the JIT.

It also opens the possibility of specializing binary operators for numpy arrays and the Decimal class.

unused/1 +
_GUARD_NOS_REFCNT1 +
_GUARD_TOS_IMMORTAL +
_BINARY_OP_TABLE_NN;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please document in the bytecodes.c file what these suffixes mean? Some of them are easy to see, but I don't know what's NN vs ND immediately.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@markshannon markshannon marked this pull request as ready for review May 21, 2024 19:54
@rhettinger rhettinger removed their request for review May 21, 2024 21:12
@markshannon
Copy link
Member Author

Performance on both tier 1 and tier 2 is about 1% slower.
With deferred reference counting, there will be no need for specializing on refcount, so I'm deferring this until deferred reference counting is done.

@markshannon
Copy link
Member Author

Earlier results had an issue with the telco benchmark, probably due to the decimal module falling back to the pure Python version.

Benchmarking on tier 1 shows the slowdown to be in the noise.
The significant results are a speedup of 7% for spectral_norm and a slowdown of 12% for nbody.
This is presumably due to more complete specialization for spectral_norm and additional overhead for the already well specialized nbody code.
We expect the advantage of better specialization to be important in tier 2, but the additional tier 1 overhead to be largely irrelevant there.

@markshannon markshannon reopened this May 29, 2024
@brandtbucher brandtbucher self-requested a review June 6, 2024 16:48
@brettcannon brettcannon removed their request for review June 11, 2024 20:34
@markshannon
Copy link
Member Author

Abandoning this as #128722 is handling the same problem and we have a better ideas for reducing refcounting overhead faster-cpython/ideas#700

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants