
optimization benefit/cost threshold #215

Open
pca006132 opened this issue Dec 27, 2024 · 4 comments

@pca006132

If I understand it correctly, the current interval tracing evaluator performs optimization and generates a new tape whenever anything can be optimized. However, this optimization is itself costly: it is linear in the length of the tape, while in some (extreme) cases it may only remove a few instructions. For long tapes, the optimization pass may not be worth it.

I'm wondering whether it would make sense to track the use count of each value, essentially doing dead-code elimination at the end of an execution. We collect all the instructions marked dead, decrement the use counts of their dependencies, and repeat. This is linear only in the number of instructions being removed. When this count reaches a certain threshold (say, a percentage of the tape length), we do the backward pass, or continue this DCE and use it to copy instructions from the front of the tape to the back.

The issue with this reference-counting approach is that we need to store a use count for each instruction in the tape, and maintaining those counts is pure overhead in the cases where the full optimization pass would have been worthwhile anyway. So this is probably only a win when the tape is long, say over several hundred instructions, and dependencies are shallow.
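The worklist-style DCE described above could be sketched roughly as follows. This is a minimal illustration under assumptions of my own: the `Op` enum, the flat index-based tape layout, and `eliminate_dead` are hypothetical names, not fidget's actual types or API.

```rust
// Hypothetical sketch of reference-counted dead-code elimination on a
// flat tape. Each slot may reference earlier slots by index.
#[derive(Clone, Copy)]
enum Op {
    Const(f64),
    Add(usize, usize),
    Mul(usize, usize),
}

fn operands(op: Op) -> Vec<usize> {
    match op {
        Op::Const(_) => vec![],
        Op::Add(a, b) | Op::Mul(a, b) => vec![a, b],
    }
}

/// Removes instructions whose use count has dropped to zero, propagating
/// to their dependencies. Work is proportional to the number of removed
/// instructions, not to the full tape length. Returns the removal count,
/// which a caller could compare against a threshold before deciding to
/// pay for a full tape-rewriting pass.
fn eliminate_dead(tape: &[Op], use_count: &mut [u32], live: &mut [bool]) -> usize {
    let mut removed = 0;
    // Seed the worklist with dead instructions (the last slot is assumed
    // to be the output, so it is never collected).
    let mut worklist: Vec<usize> = (0..tape.len() - 1)
        .filter(|&i| use_count[i] == 0 && live[i])
        .collect();
    while let Some(i) = worklist.pop() {
        live[i] = false;
        removed += 1;
        for dep in operands(tape[i]) {
            use_count[dep] -= 1;
            if use_count[dep] == 0 {
                worklist.push(dep);
            }
        }
    }
    removed
}

fn main() {
    // Tape: v0 = 2.0, v1 = 3.0, v2 = v0 + v1, v3 = v0 * v1 (output).
    let tape = vec![Op::Const(2.0), Op::Const(3.0), Op::Add(0, 1), Op::Mul(0, 1)];
    let mut use_count = vec![2, 2, 0, 1]; // v2 is unused; v3 is the output
    let mut live = vec![true; 4];
    let removed = eliminate_dead(&tape, &mut use_count, &mut live);
    // Only v2 is removed; v0 and v1 stay live because v3 still uses them.
    assert_eq!(removed, 1);
    assert!(!live[2] && live[0] && live[1] && live[3]);
    println!("removed: {removed}");
}
```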

@mkeeter
Owner

mkeeter commented Dec 28, 2024

Generating simplified expressions is definitely expensive (especially for the JIT); see RenderHints::simplify_tree_during_meshing as one strategy for doing it less often.

As for the reference-counting strategy, I think it's an interesting idea! It's probably low on my list of priorities, but I wanted to acknowledge that it's neat 😄

@pca006132
Author

For the JIT, I'm curious about what makes generating it slow. Is it due to mmap? Can we allocate a large region and put the tapes consecutively? I'm thinking about this in my C++ implementation (https://github.com/elalish/manifold/pull/1122/files#diff-c331839d070dbcdee94043f16c77d222313b5c69c71c56884b349ed8e6fc513aR42-R75).

@mkeeter
Owner

mkeeter commented Dec 28, 2024

Evaluation already recycles mmap regions, so I don't see that anywhere in the flamegraph. The main source of extra delay for the JIT backend is sys_icache_invalidate (on macOS); not sure if the equivalent flush_cache implementation on other OSes is similarly slow.

[Screenshot: flamegraph, Dec 28 2024]

@pca006132
Author

I see, that is probably a price we must pay :(
