Possible Performance Improvements

bprail edited this page Feb 4, 2017 · 7 revisions

TBB supposedly has a fast, parallel memory allocator. Can a quick test be run to see if it would help the front-end runtime?

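Most benchmarks have low overhead from the instrumentation. Fluidanimate is an exception. Further tracing shows that most of its instrumentation overhead stems from taking the ticket, i.e. the atomic increment of __ctGlobalOrderNumber. The cost presumably shows up on the instruction following the lock xadd because of sampling skid: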

0.01 x lock xadd %rsi,__ctGlobalOrderNumber
7.60 x shl x20,%rdx

Could try hashing the address into a limited set of ticket counters, so that threads touching unrelated addresses no longer contend on one global counter?
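The hashing idea could look roughly like the sketch below. It is not the runtime's actual code: `kNumTicketCounters`, `PaddedCounter`, `ticketIndex`, `takeTicket` and the `Ticket` struct are hypothetical names, and the counter count (64), the 64-byte cache-line size and the multiplicative hash are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: a striped set of ticket counters replacing one
// global order number. All names and constants here are assumptions.
constexpr unsigned kLog2TicketCounters = 6;
constexpr std::size_t kNumTicketCounters = std::size_t{1} << kLog2TicketCounters;

// Pad each counter to its own (assumed 64-byte) cache line so that
// increments on different counters do not false-share.
struct alignas(64) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static PaddedCounter g_ticketCounters[kNumTicketCounters];

// Map an address to a counter index. The low 6 bits are dropped so that
// all addresses in one 64-byte line map to the same counter, then a
// multiplicative (Fibonacci) hash spreads the remaining bits.
inline std::size_t ticketIndex(const void* addr) {
    std::uint64_t a = static_cast<std::uint64_t>(
        reinterpret_cast<std::uintptr_t>(addr));
    a >>= 6;
    a *= 0x9E3779B97F4A7C15ull;
    return static_cast<std::size_t>(a >> (64 - kLog2TicketCounters));
}

// A ticket now identifies both the counter it came from and the
// sequence number within that counter.
struct Ticket {
    std::size_t counter;
    std::uint64_t number;
};

// Atomically take the next ticket for the counter that addr hashes to.
// fetch_add uses the default sequentially consistent ordering.
inline Ticket takeTicket(const void* addr) {
    std::size_t idx = ticketIndex(addr);
    std::uint64_t n = g_ticketCounters[idx].value.fetch_add(1);
    return Ticket{idx, n};
}
```

With this layout, two operations on the same address always draw from the same counter, so their relative order is still recorded. What is lost is a single total order across all addresses, so whatever consumes the tickets would need to handle per-counter sequences. Whether that trade-off is acceptable, and how many counters are enough for fluidanimate, would have to be measured.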

Streaming Compression