Reduce memory usage for instruction block #232
Conversation
This changes how the instructions are accessed, so careful testing is important to make sure everything still behaves correctly. But as #228 reports, we currently have some issues with the compliance test.
I have implemented this without modifying a large amount of code in this branch. After merging branch …
We should prioritize the implementation of the interpreter as the foundation for further optimizations, such as a memory pool. The …
It would be better if there were a less invasive solution, but I suspect that whether … On the other hand, this PR tries to give "just enough" space for each block. Although we can still allocate too much memory, by expanding the memory according to the need of … Of course, I may have to do more experiments to make sure I am correct.
force-pushed from fe5c2a1 to babc35f
Rebase onto the latest `master` branch and utilize the `FORCE_INLINE` macro.
The original strategy for allocating instruction blocks wastes memory. For every single block, we always create a space that can hold `(1 << 10)` `rv_insn_t`, yet most blocks contain far fewer instructions than that. We can check the heap usage with `valgrind`: 20,306,989 bytes are allocated when running `puzzle.elf` with the old design.
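For context, the old scheme looks roughly like the sketch below. Names such as `block_t`, `block_alloc_old`, and the stand-in `rv_insn_t` fields are illustrative, not the emulator's exact definitions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for the emulator's decoded-instruction type; the real
 * rv_insn_t carries many more fields. */
typedef struct {
    uint32_t opcode;
} rv_insn_t;

typedef struct {
    uint32_t n_insn; /* instructions actually decoded into this block */
    rv_insn_t *ir;   /* decoded instruction storage */
} block_t;

/* Old strategy: every block reserves space for (1 << 10) rv_insn_t up front,
 * even though most blocks end after only a handful of instructions. */
static block_t *block_alloc_old(void)
{
    block_t *block = calloc(1, sizeof(block_t));
    if (!block)
        return NULL;
    block->ir = calloc(1 << 10, sizeof(rv_insn_t));
    return block;
}
```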
To address the issue, we can maintain a pool of `rv_insn_t` and take only the required number of `rv_insn_t` slots from it. Heap allocation then happens only when the pool runs out of `rv_insn_t`. The parameter `BLOCK_POOL_SIZE` gives us the flexibility to balance the number of `calloc` calls against the memory usage.
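A minimal sketch of the pool idea, assuming a chunked bump allocator; `insn_chunk_t` and `pool_alloc_insn` are hypothetical names, and the actual code in this PR may carve out instructions differently.

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal stand-in for the emulator's decoded-instruction type. */
typedef struct {
    uint32_t opcode;
} rv_insn_t;

#ifndef BLOCK_POOL_SIZE
#define BLOCK_POOL_SIZE 4096 /* rv_insn_t slots obtained per calloc call */
#endif

typedef struct insn_chunk {
    struct insn_chunk *prev;          /* previously filled chunk */
    uint32_t used;                    /* slots already handed out */
    rv_insn_t slots[BLOCK_POOL_SIZE]; /* backing storage */
} insn_chunk_t;

static insn_chunk_t *pool = NULL;

/* Hand out one rv_insn_t slot; calloc a fresh chunk only when the pool runs
 * dry, so a larger BLOCK_POOL_SIZE means fewer calloc calls but potentially
 * more unused slots. */
static rv_insn_t *pool_alloc_insn(void)
{
    if (!pool || pool->used == BLOCK_POOL_SIZE) {
        insn_chunk_t *chunk = calloc(1, sizeof(insn_chunk_t));
        if (!chunk)
            return NULL;
        chunk->prev = pool;
        pool = chunk;
    }
    return &pool->slots[pool->used++];
}
```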
In this way, we greatly reduce the heap memory allocation: as the result below shows, only 313,461 bytes are allocated for the `puzzle.elf` example.

Because two instructions in sequence may now end up in two discontinuous memory regions, a drawback of this design is the cost of random access to instructions. Since random access only seems to be needed for some instruction-fusion operations, I think this is not a big problem: we can still reach such an instruction through a linear walk with the `GET_NEXT_N_INSN` macro, albeit in a relatively inefficient way. The design could also introduce some cache-locality issues because of the discontinuous memory, but we may be able to adjust `BLOCK_POOL_SIZE` to trade this off.
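The linear walk might look like the following sketch. It assumes each `rv_insn_t` carries a `next` pointer to the following instruction of the same block; the helper only illustrates what `GET_NEXT_N_INSN` is described to do, not its actual definition in this PR.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in decoded-instruction type with an assumed link to the next
 * instruction of the same block. */
typedef struct rv_insn {
    uint32_t opcode;
    struct rv_insn *next;
} rv_insn_t;

/* Reach the n-th instruction after `ir` by chasing the chain: O(n) per
 * lookup instead of the O(1) indexing a contiguous array would allow,
 * which is the random-access cost mentioned above. */
static inline rv_insn_t *get_next_n_insn(rv_insn_t *ir, uint32_t n)
{
    while (ir && n--)
        ir = ir->next;
    return ir;
}
```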