-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jemalloc defrag situation #364
Comments
Regarding the defrag and the need to defrag, I think we can try to think of ways to avoid the fragmentation. Few other project seem to need defrag. Am I wrong in assuming we get more fragmentation when I/O threads are used? A request is parsed into string objects which are later stored in the database as keys and values. When this is done in different threads, they get allocated in different arenas. Then they are all stored in the database. Maybe we can avoid it if we let the main thread allocate the keys and values that will be stored. Then, all the allocations done by I/O threads would be short-lived and not cause fragmentation. An idea I saw somewhere is that the request parser could return pointers into the query buffer, rather than duplicating the data as sds strings. A RESP request is always an array of strings (AKA multibulk) so we'd just need to store an offset and a length for each argument temporarily until the command returns. |
At AWS, we very rarely see places where defragmentation is actually useful. Most of the time, new data is getting added and removed at about the same rate and it naturally defragments itself. The only time it's really useful is when there is a fundamental shift in the workload and the data becomes permanently fragmented, and a one-time defrag will help free up a bunch of memory.
Reminds me a bit of the segcache architecture, which allocates in large segments with a lot of key/value pairs that are all freed as a bunch.
@hpatro ^ |
This ^ This is why I don't think it justifies the overhead/complexity to vendor jemalloc. linking #15 (comment) |
Then I think should gradually get rid of the The first thing we can do is to make it optional, I mean add a dumb variant of defrag. If we're running unpatched jemalloc, we can just realloc everything.
So to avoid excessive defrag work, we can measure fragmentation but wait an hour or so before we start the defrag to see if it resolves itself? The third thing would be to try to avoid creating fragmentation in the first place (if my speculation about I/O threads right). If we these works well in practice (live, not just release candidate) then in another release we can make it permanent and unvendor jemalloc. How about this plan? |
I came across this idea on a rust rewrite of Redis https://github.com/seppo0010/rsedis/blob/master/parser/src/lib.rs#L13-L28 |
@hpatro Yes, we still need to allocate those objects, but my point is that if it's done in the main thread (rather than in the I/O threads), they will be in the same jemalloc arena as the other allocations in the main thread. Jemalloc allocates from one arena per thread and if we have objects in lots of different arenas, that would cause more fragmentation. Does it make sense? We can still get fragmentation though, just a bit less, if this is true. |
That makes sense. I was trying to highlight the delayed robj creation for the storage layer can get really messy with the overall code structure in place. |
I see. The easiest solution is probably that the main thread allocates robj for all arguments immediately when it starts executing the command. It's not optimal but maybe it fixes the fragmentation issue. |
I don't think this is true for most cases. By default we do not use the per-cpu or per-thread arena, jemalloc does create by default multiple arenas(depend on # cpus), but usually all allocations are from arena 0. But allocations are going through the thread cache. It amortizes the access to arena to reduce the threads contention, by allocating batches of entries. But if system is not fragmented already, close in time allocations, even from different threads, will likely be allocated from close slabs/extents/mmaps. Thread cache also avoids going down to arena in case of short lived alloc/free. In any case, as the number of threads is not huge, I expect they have very little affect on fragmentation either way, the customer workload, as @madolson mentioned, is bigger factor in fragmentation(but not the only one). |
@madolson I agree the current defrag process is not so useful (I'm talking purely about the logic in jemalloc, it was improved with jemalloc 5.3 but still it is not guaranteed to even converge at all), but I disagree with the statement that "naturally defragment itself". I think that the current (pre-serverless) AWS experience hides the issue of memory being "locked" and not freed back to the OS. Another factor for evaluating the issues with defrag is the utilization, thanks to jemalloc bucket allocator, if you are at low utilization to begin with, fragmented memory will not cause major customer pain (besides visibility). Forward looking, when considering things like multi-tenancy, I believe fragmented memory will become a much bigger issue. |
Thanks for the insight, @zvi-code! Appreciate it.
I understand that you are providing a counter argument to the statement of "naturally defragment itself", which makes sense to me. I just wanted to clarify that my main rationale for de-vendoring jemalloc is that the overhead of vendoring any allocator for the sake of |
I didn't think about this case, you are right about the case where we want to defragment to release memory to the OS when running in a mixed environment (either multiple valkey processes or other processes running). |
defrag is enabled explicitly only when jemalloc is used so this should be fine. in other words, you get defrag with jemalloc only.
I am aligned with adding mimalloc as long as we don't vendor it. On a related note, I think we should move forward and remove the one-off
je_get_defrag_hint
patch and instead use more generic heuristics for defrag so that other allocators could benefit from the defrag feature as well. jemalloc should be devendored too.Originally posted by @PingXie in #346 (comment)
The text was updated successfully, but these errors were encountered: