-
Notifications
You must be signed in to change notification settings - Fork 416
Reduce memory consumption of prepare gc active commits #9779
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: chore/add-space-efficient-arena-based-maps
Are you sure you want to change the base?
Reduce memory consumption of prepare gc active commits #9779
Conversation
This reduces a third of memory - from ~9+ to ~6+ GiB in one case.
This saves ~40% memory when processing a huge repository.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request reduces memory consumption in garbage collection preparation by replacing standard Go maps with space-efficient arena maps and changing map[string]bool to map[string]struct{}. The changes also optimize memory usage by storing commit creation times as int64 microseconds instead of time.Time objects.
- Introduces arena.Map for storing commits, reducing memory footprint significantly
- Changes firstParents from
map[string]boolto a custom StringSet wrappingmap[string]struct{} - Replaces time.Time with int64 microseconds for commit creation timestamps
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pkg/graveler/retention/active_commits.go | Replaces standard commit map with arena.Map, changes CommitNode to store creation time as int64 microseconds, updates Get() to return pointer, adds BinaryCommitID type for arena key conversion |
| pkg/graveler/ref/commit_ordered_iterator.go | Introduces StringSet wrapper to replace map[string]bool with map[string]struct{} for memory efficiency |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| CreationDate time.Time | ||
| MainParent graveler.CommitID | ||
| MetaRangeID graveler.MetaRangeID | ||
| // CreationSecs is in microseconds-since-epoch. It takes 3x less space than time.Time. |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "CreationSecs is in microseconds-since-epoch" but the field name is "CreationUsecs". The comment should say "CreationUsecs is in microseconds-since-epoch" to match the field name.
| // CreationSecs is in microseconds-since-epoch. It takes 3x less space than time.Time. | |
| // CreationUsecs is in microseconds-since-epoch. It takes 3x less space than time.Time. |
| func ToBinaryCommitID(hexID graveler.CommitID) BinaryCommitID { | ||
| b, err := hex.DecodeString(string(hexID)) | ||
| if err != nil { | ||
| // This check should happen at compile time but cannot happen at |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment says "This check should happen at compile time but cannot happen at" and appears to be incomplete. It should be completed or clarified to explain what the check is for and why it cannot happen at compile time.
| // This check should happen at compile time but cannot happen at | |
| // Ideally, the validity and format of CommitID should be enforced at compile time, | |
| // but since CommitID is a string and its value is only known at runtime, this check | |
| // must happen here. If this panic occurs, it indicates a bug elsewhere in the code. |
| b, err := hex.DecodeString(string(hexID)) | ||
| if err != nil { | ||
| // This check should happen at compile time but cannot happen at | ||
| panic(fmt.Sprintf("decode hex %s: %w", hexID, err)) |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The panic message uses %w format specifier which is for error wrapping with fmt.Errorf, but this is in fmt.Sprintf which doesn't support %w. This should use %s or %v instead to properly format the error.
| panic(fmt.Sprintf("decode hex %s: %w", hexID, err)) | |
| panic(fmt.Sprintf("decode hex %s: %s", hexID, err)) |
| type nothing struct{} | ||
|
|
||
| type StringSet struct { | ||
| set map[string]nothing | ||
| } | ||
|
|
||
| func NewStringSet() *StringSet { | ||
| return &StringSet{set: make(map[string]nothing)} | ||
| } | ||
|
|
||
| func (s *StringSet) Set(str string) { | ||
| s.set[str] = nothing{} |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type name "nothing" is not descriptive or idiomatic for Go. The conventional and more clear name for an empty struct used as a placeholder is "struct{}" used inline, or if you need a named type, a more descriptive name like "empty" or "marker" would be better. However, the most idiomatic approach would be to use "struct{}" directly in the map definition without creating a separate type.
| type nothing struct{} | |
| type StringSet struct { | |
| set map[string]nothing | |
| } | |
| func NewStringSet() *StringSet { | |
| return &StringSet{set: make(map[string]nothing)} | |
| } | |
| func (s *StringSet) Set(str string) { | |
| s.set[str] = nothing{} | |
| type StringSet struct { | |
| set map[string]struct{} | |
| } | |
| func NewStringSet() *StringSet { | |
| return &StringSet{set: make(map[string]struct{})} | |
| } | |
| func (s *StringSet) Set(str string) { | |
| s.set[str] = struct{}{} |
| func ToBinaryCommitID(hexID graveler.CommitID) BinaryCommitID { | ||
| b, err := hex.DecodeString(string(hexID)) | ||
| if err != nil { | ||
| // This check should happen at compile time but cannot happen at | ||
| panic(fmt.Sprintf("decode hex %s: %w", hexID, err)) | ||
| } | ||
| return BinaryCommitID(b[:arena.KEY_SIZE_BOUND]) |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ToBinaryCommitID function truncates commit IDs to arena.KEY_SIZE_BOUND (16 bytes), but commit IDs are SHA256 hashes which are 32 bytes when decoded from hex (64 hex characters as per commitIDStringLength constant). This truncation creates a significant risk of collisions - two different commit IDs could map to the same BinaryCommitID if they share the same first 16 bytes. This could lead to incorrect behavior where different commits are treated as the same commit in the arena map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just a missing commit; of course I tested it with the correct value.
| res := make(map[graveler.CommitID]graveler.MetaRangeID) | ||
| for commitID := range commitSet { | ||
| res[commitID] = commitNodes[commitID].MetaRangeID | ||
| res[commitID] = commitNodes.Get(ToBinaryCommitID(commitID)).MetaRangeID |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code assumes that Get will never return nil for commit IDs in the commitSet. If a commit ID exists in commitSet but is not found in commitNodes (which could happen due to the truncation issue in ToBinaryCommitID or other data consistency issues), this will panic when trying to access .MetaRangeID on a nil pointer. Consider adding a nil check and appropriate error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not done - the previous code would emit a blank MetaRangeID here. So it had the exact same failure mode, only it could only be discovered further downstream.
Of course not finding the commit node here indicates an internal consistency error - these are literally the same commits.
Discovered during manually running GC.
Flagged by copilot
It needs to use 256-bit commit IDs - this is now required by active_commits.
nopcoder
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - nice work in reducing the memory footprint.
kept the comments on the data structures - left almost nothing to review.
left a small comment related to naming and extra code that I suggest to drop.
| firstParents *StringSet | ||
| } | ||
|
|
||
| type nothing struct{} | ||
|
|
||
| type StringSet struct { | ||
| set map[string]nothing | ||
| } | ||
|
|
||
| func NewStringSet() *StringSet { | ||
| return &StringSet{set: make(map[string]nothing)} | ||
| } | ||
|
|
||
| func (s *StringSet) Set(str string) { | ||
| s.set[str] = nothing{} | ||
| } | ||
|
|
||
| func (s *StringSet) Get(str string) bool { | ||
| _, ok := s.set[str] | ||
| return ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use map[string]struct{}
No need to define a struct, hold a pointer to it, adding methods - over eng.
| // arena.KEY_SIZE_BOUND bytes. | ||
| type BinaryCommitID string | ||
|
|
||
| func ToBinaryCommitID(hexID graveler.CommitID) BinaryCommitID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest to add MustXXX to reflect that this func panic in case of an error
map[string]booltomap[string]struct{}Together (of course the first does almost all the work) this reduces memory usage by almost a half.