Poor optimization of switch statement in Clang 19.1.0 compared to Clang 18.1.0 #127365
Comments
I added a more minimal reproduction (without the benchmarking boilerplate) in the issue description.
https://godbolt.org/z/9n4hGc1cs
Ideally we'd prevent unfolding for cases which can't be SROAd anyway, but I think the most straightforward fix for this would probably be to insert the GEPs into the phi predecessor blocks instead of the entry block. In that case we should be able to sink them down again in SimplifyCFG.
I'd like to work on this, but I'm not sure where the best place to put the sinking part is. And I'm not sure if transforming
Done in #127652, but this didn't work out because LICM still ends up hoisting the GEPs, before SimplifyCFG can sink them.
SimplifyCFG already does sinking, but it expects the instructions to be in the predecessor blocks. It won't try to sink instructions from other blocks. You could try allowing that for instructions that aren't used anywhere except the phi node.
The other way to fix this is to prevent SROA from making the transform. The select/phi unfolding is something of a hack, because it's done unconditionally. SROA also has handling to speculate accesses of selects/phis (see e.g. speculatePHINodeLoads). I believe the gep/select unfolding is only ever actually useful in conjunction with those, so it would make sense to extend them to also support an intermediate gep. This would ensure we only perform the transform when it enables SROA. Doing this is probably fairly tricky though.
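For readers less familiar with the IR shapes involved, here is a rough source-level analogy of "gep of phi" versus the unfolded "phi of geps". This is only a sketch in C++: the function names and the index values are made up for illustration, and the optimizer may normalize both forms, so it does not show what SROA literally emits.

```cpp
#include <cstddef>

// "gep of phi": each branch only picks a constant index (the phi),
// and a single address computation (the gep) follows the merge point.
// A phi over constants is the shape that can become a lookup table.
inline int load_index_then_address(const int* base, unsigned sel) {
    std::size_t idx;
    switch (sel) {
        case 0:  idx = 3; break;
        case 1:  idx = 0; break;
        case 2:  idx = 2; break;
        default: idx = 1; break;
    }
    return base[idx];
}

// "phi of geps": every branch computes its own address, and the merge
// point selects between pointers instead of between constants.
inline int load_address_per_branch(const int* base, unsigned sel) {
    const int* p;
    switch (sel) {
        case 0:  p = base + 3; break;
        case 1:  p = base + 0; break;
        case 2:  p = base + 2; break;
        default: p = base + 1; break;
    }
    return *p;
}
```

Both functions return the same value for every input; the difference is only in where the address arithmetic sits relative to the merge point, which is the part the sinking and unfolding discussion is about.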
@nikic What if you just changed the currently unconditional select/phi unfolding code in SROA so that it doesn't unfold a gep of a phi that has more than some arbitrarily chosen number of (unique) incoming values originating from the (same) switch instruction? That, or prevent the code from unfolding geps of phis with more than a certain number of incoming values entirely.
I was doing some tests/benchmarks regarding switch vs array look-ups and found this change in behavior from Clang 18.1.0 to Clang 19.1.0 (and current trunk):
Clang 18.1.0 optimizes that big switch as a constant lookup table:
On the other hand, Clang 19.1.0 generates a separate label for each switch case, and every label feeds into a main one:
This can tank the performance, for example if the branch predictor can't accurately predict which label you're going to access on the current iteration. In my example I'm generating random indexes and with perf stat I'm seeing almost 300 million branch misses (one for each increment() invocation).
Assuming that this change isn't an intentional trade-off made for a benefit in some other use cases, then this is a regression.
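As a rough illustration of the kind of code being compared, here is a minimal sketch of a switch-based slot selection next to the equivalent explicit table, plus a driver that feeds both random keys. Everything in it (the names `slot_switch`, `slot_table`, `increment_switch`, `run_random`, the slot values, the table size) is hypothetical; it is not the source from the Godbolt link, and it has not been checked to reproduce the regression.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>

constexpr std::size_t kSlots = 8;
using Counters = std::array<std::uint64_t, kSlots>;

// Hypothetical key-to-slot mapping written as a switch. Every case
// yields a constant, which is what allows a lookup-table lowering.
inline std::size_t slot_switch(unsigned key) {
    switch (key) {
        case 0: return 3;
        case 1: return 0;
        case 2: return 5;
        case 3: return 1;
        case 4: return 7;
        case 5: return 2;
        case 6: return 6;
        case 7: return 4;
        default: return 0;
    }
}

// The same mapping spelled out as an explicit constant table.
inline std::size_t slot_table(unsigned key) {
    static constexpr std::array<std::uint8_t, kSlots> table = {3, 0, 5, 1, 7, 2, 6, 4};
    return key < table.size() ? table[key] : 0;
}

inline void increment_switch(Counters& counters, unsigned key) {
    ++counters[slot_switch(key)];
}

inline void increment_table(Counters& counters, unsigned key) {
    ++counters[slot_table(key)];
}

// Calls `fn` once per iteration with a uniformly random key, so a
// branch-per-case lowering gives the branch predictor nothing to learn.
template <typename Fn>
Counters run_random(Fn fn, std::size_t iterations, std::uint32_t seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<unsigned> key_dist(0u, static_cast<unsigned>(kSlots - 1));
    Counters counters{};
    for (std::size_t i = 0; i < iterations; ++i) {
        fn(counters, key_dist(rng));
    }
    // Every call increments exactly one slot, so the totals must add up.
    assert(std::accumulate(counters.begin(), counters.end(), std::uint64_t{0}) == iterations);
    return counters;
}
```

Calling `run_random(increment_switch, n, seed)` and `run_random(increment_table, n, seed)` with the same seed should produce identical counters, so any timing difference between the two would come from how the compiler lowers the switch rather than from the work being done.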
On my machine, the results of running that binary (same source code as the one in Godbolt) compiled with Clang 18 vs Clang 19 are as follows:
So the binary generated by Clang 19 is about 11 times slower.
NOTE: The issue seems to be related to inlining, because if I add __attribute__((noinline)) to the increment() function, then Clang 19 optimizes it with a lookup table, just like Clang 18, and the result is much faster than what I get with inlining allowed:
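For reference, applying the attribute looks roughly like the sketch below. The function body, the `Counters` type and the `count_keys` helper are placeholders invented for this example, not the code from the report; only the `__attribute__((noinline))` spelling is taken from the note above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Counters = std::array<std::uint64_t, 4>;

// noinline keeps this function out of its callers, so the switch is
// optimized on its own rather than together with the calling loop.
__attribute__((noinline)) void increment_noinline(Counters& counters, unsigned key) {
    std::size_t slot;
    switch (key) {
        case 0:  slot = 2; break;
        case 1:  slot = 0; break;
        case 2:  slot = 3; break;
        default: slot = 1; break;
    }
    ++counters[slot];
}

// Hypothetical caller: without the attribute, the call in this loop
// would be a candidate for inlining.
inline Counters count_keys(const unsigned* keys, std::size_t n) {
    Counters counters{};
    for (std::size_t i = 0; i < n; ++i) {
        increment_noinline(counters, keys[i]);
    }
    return counters;
}
```

This is a workaround that trades away inlining for the whole function, so it is mainly useful for confirming that the difference in codegen depends on inlining.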