Remove useless IR when fusing instructions #234
Compared to PR #232, introducing a memory pool can effectively reduce memory usage, as valgrind only records the memory you actually use. For example, we allocate 2^32 bytes of memory using Here, I compare the performance based on Benchmark / Performance regression check and memory usage based on host machine.
I believe that the benefit of memory reduction mostly comes from If we think supporting a platform without |
Good point. In the context of utilizing the WebAssembly System Interface (WASI), it is apparent that there is no equivalent functionality to @qwe661234, you should take into consideration the scenario where mmap may not be feasibly usable on certain platforms. |
Even if we cannot use I believe that using a linked list to implement IR is still a more flexible approach if we want to remove any IR nodes from this linked list during instruction fusion. However, it consumes more memory than |
I'm not sure it is safe to say something like "Although we are not good, at least we aren't that bad compared to others". In other words, the 2^32-byte data memory requirement may well be something we can improve on, so I am not sure we have to be restricted by it.
If |
Another idea here. What if we try to combine both solutions' advantages? Specifically, we still use the memory pool but allocate the chunk of memory in a single The possible issue for this solution could be the cost for |
To stay compact, our memory usage must be lower than QEMU's. If this approach achieves that while keeping performance acceptable, it would be preferable for this PR.
Sure, 196cffbe shows the implementation of my new idea. 196cffbe with
#234 with
196cffbe w/o
#234 w/o
I haven't carefully measured performance or run the other benchmarks yet, but this new solution appears to offer some advantages:
@jserv, I think this idea is better than using an array to store IRs. With this modification, we can remove a useless IR directly instead of skipping it.
Based on the feedback, I would like to merge pull request #232 first and then request improvements in IR manipulation by effectively removing unneeded instructions.
The new implementation has now been merged into #232! We can continue the discussion there if we reach a consensus on using that solution.
38d663d to 754006b
Squash the git commits and refine the commit message so that it explains the benefit through a use case.
Originally, we needed to keep and carefully handle useless IR when fusing instructions. Now we can discard these useless IRs, because the IR data structure has changed from an array to a singly-linked list.
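For readers following the thread, here is a minimal sketch of why a singly-linked list makes this easy. It is not the code from this PR: the node layout, the opcode values, and the names `ir_node_t`, `ir_new`, and `fuse_lui_runs` are all hypothetical, and the sketch uses plain `malloc`/`free` for brevity. It collapses each run of consecutive `OP_LUI` nodes into one fused node and unlinks the absorbed nodes, where an array would have to keep them around and skip them.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical IR node; the fields are illustrative, not the project's. */
typedef struct ir_node {
    int opcode;
    int fused_count;      /* number of original instructions represented */
    struct ir_node *next; /* singly-linked: each node only knows its successor */
} ir_node_t;

/* Hypothetical opcode values used only by this sketch. */
enum { OP_LUI = 1, OP_ADD = 2, OP_FUSED_LUI = 100 };

/* Allocate a node and link it in front of `next`; aborts on allocation failure. */
static ir_node_t *ir_new(int opcode, ir_node_t *next)
{
    ir_node_t *n = malloc(sizeof(*n));
    if (!n)
        abort();
    n->opcode = opcode;
    n->fused_count = 1;
    n->next = next;
    return n;
}

/* Fuse every run of consecutive OP_LUI nodes into a single OP_FUSED_LUI node.
 * Absorbed nodes are unlinked and freed, so later passes never see them.
 * Returns the number of nodes removed from the list.
 */
static int fuse_lui_runs(ir_node_t *head)
{
    int removed = 0;
    for (ir_node_t *ir = head; ir; ir = ir->next) {
        if (ir->opcode != OP_LUI)
            continue;
        while (ir->next && ir->next->opcode == OP_LUI) {
            ir_node_t *dead = ir->next;
            ir->next = dead->next; /* unlink in O(1), nothing is shifted */
            free(dead);
            ir->fused_count++;
            removed++;
        }
        if (ir->fused_count > 1)
            ir->opcode = OP_FUSED_LUI;
    }
    return removed;
}
```

Because the fused node is always the first node of its run, only `next` pointers after it change, so a singly-linked list is sufficient and no back pointer is needed.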
06ed289 to ac05003
I defer to @RinHizakura for confirmation.
@jserv Now it looks great to me!
Originally, memory usage increased sharply because basic blocks and IR arrays were allocated and freed frequently. To reduce memory usage, we introduced a block memory pool and an IR array memory pool.
As shown in the analysis below, we pay about a 10% performance loss in exchange for the reduction in memory usage. However, memory usage is now measured in KiB instead of MiB.
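To illustrate the memory pool idea from the commit message, here is a minimal fixed-size pool sketch. It is an assumption-laden illustration, not this project's implementation: the names `mpool_t`, `mpool_init`, `mpool_alloc`, `mpool_free`, and `mpool_destroy` are hypothetical, and the backing chunk is obtained with a single `malloc` call so that the sketch stays in portable standard C. Free slots are threaded into an intrusive free list, so allocating and releasing a block or IR object becomes a pointer pop or push instead of a trip to the general-purpose allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* A free slot stores the link to the next free slot inside itself. */
typedef struct pool_slot {
    struct pool_slot *next;
} pool_slot_t;

/* Hypothetical fixed-size object pool backed by one contiguous chunk. */
typedef struct {
    void *chunk;            /* the single backing allocation */
    pool_slot_t *free_list; /* head of the intrusive free list */
    size_t slot_size;       /* object size rounded up to pointer size */
} mpool_t;

/* Reserve room for `count` (> 0) objects of `obj_size` bytes in one allocation.
 * Slots are only pointer-aligned; stricter alignment is not handled here.
 * Returns 0 on success and -1 if the chunk cannot be allocated.
 */
static int mpool_init(mpool_t *mp, size_t obj_size, size_t count)
{
    size_t align = sizeof(pool_slot_t);
    size_t slot = obj_size < align ? align : obj_size;
    slot = (slot + align - 1) / align * align;

    mp->chunk = malloc(slot * count);
    if (!mp->chunk)
        return -1;
    mp->slot_size = slot;
    mp->free_list = NULL;

    /* Push slots in reverse so the first allocation returns the lowest address. */
    for (size_t i = count; i-- > 0;) {
        pool_slot_t *s = (pool_slot_t *) ((char *) mp->chunk + i * slot);
        s->next = mp->free_list;
        mp->free_list = s;
    }
    return 0;
}

/* Pop one slot from the free list; returns NULL when the pool is exhausted. */
static void *mpool_alloc(mpool_t *mp)
{
    pool_slot_t *s = mp->free_list;
    if (s)
        mp->free_list = s->next;
    return s;
}

/* Return a slot previously obtained from mpool_alloc() to the pool. */
static void mpool_free(mpool_t *mp, void *ptr)
{
    pool_slot_t *s = ptr;
    s->next = mp->free_list;
    mp->free_list = s;
}

/* Release the backing chunk; every object from the pool becomes invalid. */
static void mpool_destroy(mpool_t *mp)
{
    free(mp->chunk);
    mp->chunk = NULL;
    mp->free_list = NULL;
}
```

A real pool would also need a growth policy for when the free list runs dry; this sketch simply returns `NULL`, which keeps the trade-off between a bounded footprint and allocation failure visible.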