Conversation
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
I wonder if this PR's approach will preserve some of the benefits of static flags via inlining. It's certainly nicer than having FLAGS everywhere. |
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (744269b): comparison URL. Overall result: ❌ regressions - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 2.4%, secondary -0.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary 3.0%, secondary 3.3%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 480.396s -> 481.316s (0.19%) |
This comment has been minimized.
This comment has been minimized.
|
Just enough of a regression that it's hard to justify, alas. I see now that Relatedly, I found the following functions take a
They can be changed to take a |
cd94110 to
940ba86
Compare
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (06d3be4): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (secondary -5.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.8%, secondary -3.2%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 487.064s -> 481.364s (-1.17%) |
`SemiDynamicQueryDispatcher` is just a `QueryVTable` wrapper with an additional `const FLAGS: QueryFlags` generic parameter that contains three booleans. This arrangement exists as a performance optimization. But the performance effects are very small and it adds quite a bit of complexity to an already overly-complex part of the codebase. If it didn't exist and somebody proposed adding it and asked me to review, I almost certainly wouldn't approve it. This commit removes it. The three booleans in `QueryFlags` are moved into `QueryVTable` The non-trivial methods of `SemiDynamicQueryDispatcher` become methods of `QueryVTable`.
It's now `query_vtable` because its return type changed. And thanks to the previous commit it can be manually inlined in several places. (The only remaining calls to it are in `make_dep_kind_vtable_for_query`, which are more challenging to remove.)
940ba86 to
e7fb201
Compare
|
|
|
Looking at icounts for primary benchmarks, there are only 14 runs where the regression is considered significant, and the worst case is 0.5%. 8 of the 14 are The story for secondary benchmark is pretty similar. Overall, the perf effects of this optimization is just really underwhelming. I don't think it's worth it, especially in part of the code that is already very complex and there is lots of ongoing work to reduce complexity. |
|
☔ The latest upstream changes (presumably #152747) made this pull request unmergeable. Please resolve the merge conflicts. |
When I first tried to get rid of FLAGS, I felt discouraged by the small-but-measurable perf regression. But every time I run into how annoying it is to deal with having FLAGS everywhere, that regression starts to look more and more like something we should accept for the big simplicity win. |
|
View all comments
The performance wins provided by these types are meagre, and I don't think they justify the code complexity they introduce.
r? @Zalathar