Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory usage for instruction block #232

Merged
merged 1 commit into from
Oct 2, 2023

Conversation

RinHizakura
Copy link
Collaborator

The original strategy for the allocation of instruction blocks is waste of memory. For every single block, we always create a space that can contain (1 << 10) rv_insn_t. However, we usually don't have that much instruction in one block in most of the case.

We can simply know the heap usage by using valgrind. We can see that 20,306,989 bytes are allocated on the run of puzzle.elf for the old design.

$ valgrind ./build/rv32emu ./build/puzzle.elf
==62803== Memcheck, a memory error detector
==62803== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==62803== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==62803== Command: ./build/rv32emu ./build/puzzle.elf
==62803== 
==62803== Warning: set address range perms: large range [0x59c87000, 0x159c87000) (defined)
success in 2005 trials
inferior exit code 0
==62803== Warning: set address range perms: large range [0x59c87000, 0x159c87000) (noaccess)
==62803== 
==62803== HEAP SUMMARY:
==62803==     in use at exit: 39,124 bytes in 259 blocks
==62803==   total heap usage: 1,281 allocs, 1,022 frees, 20,306,989 bytes allocated
==62803== 
==62803== LEAK SUMMARY:
==62803==    definitely lost: 0 bytes in 0 blocks
==62803==    indirectly lost: 0 bytes in 0 blocks
==62803==      possibly lost: 0 bytes in 0 blocks
==62803==    still reachable: 37,108 bytes in 238 blocks
==62803==         suppressed: 0 bytes in 0 blocks
==62803== Rerun with --leak-check=full to see details of leaked memory
==62803== 
==62803== For lists of detected and suppressed errors, rerun with: -s
==62803== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

To address the issue, we can simply maintain a pool of rv_insn_t, and take only the required numbers of rv_insn_t space from it. This ensures heap allocation only happens when the pool is out of rv_insn_t. By using the parameter BLOCK_POOL_SIZE, we have the flexibility to get the balance between the numbers of calloc calls and the memory usage.

In this way, we have great improvement for the heap memory allocation. As the following result, only 313,461 bytes are allocated on the puzzle.elf example.

$ valgrind ./build/rv32emu ./build/puzzle.elf
==62927== Memcheck, a memory error detector
==62927== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==62927== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==62927== Command: ./build/rv32emu ./build/puzzle.elf
==62927== 
==62927== Warning: set address range perms: large range [0x59c87000, 0x159c87000) (defined)
success in 2005 trials
inferior exit code 0
==62927== Warning: set address range perms: large range [0x59c87000, 0x159c87000) (noaccess)
==62927== 
==62927== HEAP SUMMARY:
==62927==     in use at exit: 39,124 bytes in 259 blocks
==62927==   total heap usage: 940 allocs, 681 frees, 313,461 bytes allocated
==62927== 
==62927== LEAK SUMMARY:
==62927==    definitely lost: 0 bytes in 0 blocks
==62927==    indirectly lost: 0 bytes in 0 blocks
==62927==      possibly lost: 0 bytes in 0 blocks
==62927==    still reachable: 37,108 bytes in 238 blocks
==62927==         suppressed: 0 bytes in 0 blocks
==62927== Rerun with --leak-check=full to see details of leaked memory
==62927== 
==62927== For lists of detected and suppressed errors, rerun with: -s
==62927== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Because two instructions in sequence may now be separated into two discontinuous memory spaces, a drawback of this design is the cost of random access to the instruction. It seems that we only need random access to the instructions for some fuse operations, I think this might not be a big problem. We can still linear search that instruction with GET_NEXT_N_INSN macro in a relatively inefficient manner. The design could also introduce some cache locality issues two for the discontinuous memory spaces, but we may be able to adjust BLOCK_POOL_SIZE to trade-off for this.

src/emulate.c Dismissed Show dismissed Hide dismissed
@RinHizakura
Copy link
Collaborator Author

This changes the method of accessing the instructions, so carefully testing to make sure everything has changed correctly is important. But as #228 reports, we have some issues with the compliance test now.

@jserv jserv requested a review from qwe661234 September 26, 2023 04:52
@qwe661234
Copy link
Collaborator

I have implemented this without modifing the large number of code in this branch. After merging branch wip/jit back to master, the interpreter's memory usage will be reduced.

@jserv
Copy link
Contributor

jserv commented Sep 26, 2023

I have implemented this without modifing the large number of code in this branch. After merging branch wip/jit back to master, the interpreter's memory usage will be reduced.

We should prioritize the implementation of the interpreter as the foundation for further optimizations, such as a memory pool. The wip/jit branch is for development purposes, and we should merge our efforts back into the master branch when possible.

@RinHizakura
Copy link
Collaborator Author

RinHizakura commented Sep 26, 2023

I have implemented this without modifing the large number of code in this branch. After merging branch wip/jit back to master, the interpreter's memory usage will be reduced.

It'll be better if there's a less-modified solution, but I suspect that whether mpool can address the issue properly. As far as I know, mpool is useful for the chunk of memory which will be allocated and free frequently. However, for rv_insn_t in each block, they will keep existing after being allocated until closing the emulator. Also, it looks like mpool will still allocate a fixed large chunk for each block, which still introduces unused memory for each block.

On the other hand, this PR is tried to give "just enough" size for each block. Although we can still allocate too much memory, by expanding the memory according to the need of rv_insn_t instead of block_t, I believe this solution is possible to save more space.

Of course, I may have to do more experiments to make sure I am correct.

src/emulate.c Outdated Show resolved Hide resolved
@RinHizakura RinHizakura force-pushed the master branch 3 times, most recently from fe5c2a1 to babc35f Compare September 30, 2023 05:49
src/emulate.c Outdated Show resolved Hide resolved
src/emulate.c Outdated Show resolved Hide resolved
src/emulate.c Outdated Show resolved Hide resolved
src/riscv.c Outdated Show resolved Hide resolved
src/riscv_private.h Outdated Show resolved Hide resolved
Copy link
Contributor

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase the latest master branch and utilize FORCE_INLINE macro.

@jserv jserv merged commit 0031224 into sysprog21:master Oct 2, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants