
optimization benefit/cost threshold #215

Open
pca006132 opened this issue Dec 27, 2024 · 4 comments

@pca006132

If I understand it correctly, the current interval tracing evaluator performs optimization and generates a new tape whenever anything can be optimized. However, this optimization is itself costly: it is linear in the length of the tape, while in some (extreme) cases it may only remove a few instructions. For long tapes, the optimization pass may not be worth it.

I'm wondering whether it would make sense to track the use count of each value, essentially doing dead-code elimination at the end of an execution. We collect all the instructions marked dead, decrement the use counts of their dependencies, and repeat. This is linear only in the number of instructions being removed. When this count reaches a certain threshold (say, a percentage of the tape length), we do the backward pass, or continue this DCE and use it to copy instructions from the front of the tape to the back.

The issue with this reference-counting approach is that we need to store a use count for each instruction in the tape, and maintaining those counts is pure overhead in the cases where the full optimization pass would have been worthwhile anyway. So this is probably only a win when the tape is long, say over several hundred instructions, and dependencies are shallow.
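The worklist-style DCE described above could be sketched roughly as follows. This is a minimal illustration under assumptions of my own: the `Op` enum, the flat index-based tape layout, and `eliminate_dead` are hypothetical names, not fidget's actual types or API.

```rust
// Hypothetical sketch of reference-counted dead-code elimination on a
// flat tape. Each slot may reference earlier slots by index.
#[derive(Clone, Copy)]
enum Op {
    Const(f64),
    Add(usize, usize),
    Mul(usize, usize),
}

fn operands(op: Op) -> Vec<usize> {
    match op {
        Op::Const(_) => vec![],
        Op::Add(a, b) | Op::Mul(a, b) => vec![a, b],
    }
}

/// Removes instructions whose use count has dropped to zero, propagating
/// to their dependencies. Work is proportional to the number of removed
/// instructions, not to the full tape length. Returns the removal count,
/// which a caller could compare against a threshold before deciding to
/// pay for a full tape-rewriting pass.
fn eliminate_dead(tape: &[Op], use_count: &mut [u32], live: &mut [bool]) -> usize {
    let mut removed = 0;
    // Seed the worklist with dead instructions (the last slot is assumed
    // to be the output, so it is never collected).
    let mut worklist: Vec<usize> = (0..tape.len() - 1)
        .filter(|&i| use_count[i] == 0 && live[i])
        .collect();
    while let Some(i) = worklist.pop() {
        live[i] = false;
        removed += 1;
        for dep in operands(tape[i]) {
            use_count[dep] -= 1;
            if use_count[dep] == 0 {
                worklist.push(dep);
            }
        }
    }
    removed
}

fn main() {
    // Tape: v0 = 2.0, v1 = 3.0, v2 = v0 + v1, v3 = v0 * v1 (output).
    let tape = vec![Op::Const(2.0), Op::Const(3.0), Op::Add(0, 1), Op::Mul(0, 1)];
    let mut use_count = vec![2, 2, 0, 1]; // v2 is unused; v3 is the output
    let mut live = vec![true; 4];
    let removed = eliminate_dead(&tape, &mut use_count, &mut live);
    // Only v2 is removed; v0 and v1 stay live because v3 still uses them.
    assert_eq!(removed, 1);
    assert!(!live[2] && live[0] && live[1] && live[3]);
    println!("removed: {removed}");
}
```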

@mkeeter
Owner

mkeeter commented Dec 28, 2024

Generating simplified expressions is definitely expensive (especially for the JIT); see RenderHints::simplify_tree_during_meshing as one strategy for doing it less often.

As for the reference-counting strategy, I think it's an interesting idea! It's probably low on my list of priorities, but I wanted to acknowledge that it's neat 😄

@pca006132
Author

For the JIT, I'm curious about what makes generating it slow. Is it due to mmap? Can we allocate a large region and put the tapes consecutively? I'm thinking about this in my C++ implementation (https://github.com/elalish/manifold/pull/1122/files#diff-c331839d070dbcdee94043f16c77d222313b5c69c71c56884b349ed8e6fc513aR42-R75).

@mkeeter
Owner

mkeeter commented Dec 28, 2024

Evaluation already recycles mmap regions, so I don't see that anywhere in the flamegraph. The main source of extra delay for the JIT backend is sys_icache_invalidate (on macOS); not sure if the equivalent flush_cache implementation on other OSes is similarly slow.

[Screenshot: flamegraph, Dec 28 2024]

@pca006132
Author

I see, that is probably a price we must pay :(
