feat(tiering): Serialize hashes #6015
base: main
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
1839c60 to a6f53a6
a6f53a6 to 4205592
9dce7a9 to c178a09
ecdfad3 to 40d054b
romange left a comment
It is a big PR, I wish it could be split into multiple parts.
src/server/tiered_storage.cc
Outdated
if (pv.Encoding() == kEncodingListPack) {
  auto* lp = static_cast<uint8_t*>(pv.RObjPtr());
  size_t bytes = lpBytes(lp);
  bytes += lpLength(lp) * 2 * 4;
Can you add a comment explaining the formula?
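One way to address this request is to replace the bare `2 * 4` with named constants. The sketch below is only an illustration: the thread does not say what the two factors stand for, so reading them as two 4-byte fields per listpack element is an assumption, and `EstimateListpackSerializedSize`, `kFieldsPerElement`, and `kFieldBytes` are hypothetical names. The two parameters stand in for the results of `lpBytes(lp)` and `lpLength(lp)`.

```cpp
#include <cstddef>

// Hypothetical names; the meaning of "2 * 4" is an assumption, not taken from the PR.
constexpr std::size_t kFieldsPerElement = 2;  // assumed: two extra fields per element
constexpr std::size_t kFieldBytes = 4;        // assumed: each field is 4 bytes wide

// lp_bytes stands in for lpBytes(lp), lp_len for lpLength(lp).
constexpr std::size_t EstimateListpackSerializedSize(std::size_t lp_bytes,
                                                     std::size_t lp_len) {
  // Same arithmetic as the reviewed line: bytes + len * 2 * 4.
  return lp_bytes + lp_len * kFieldsPerElement * kFieldBytes;
}
```

With names like these, the per-element overhead can be documented once next to the constants instead of at every call site.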
src/server/tiered_storage.cc
Outdated
// TODO(vlad): Maybe split into different accessors?
// Do NOT enforce rules depending on dynamic runtime values as this is called
// when scheduling stash and just before succeeding and is expected to return the same results
optional<pair<size_t /*size*/, CompactObj::ExternalRep>> EstimateSerializedSize(
It does more than estimate a size, and combining an optional with a precise ExternalRep makes the behavior confusing. Maybe call it GetSerializationDescriptor, which would return
struct SerializationDescriptor {
  size_t estimated_size;
  CompactObj::ExternalRep repr;
};
and add NONE to the ExternalRep enum? Alternatively, add is_valid() { return size > 0; } and return size=0 for unfit objects.
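The two alternatives in this suggestion could be combined roughly as follows. This is a minimal sketch, not the code that was merged: the `ExternalRep` enum here is a hypothetical stand-in for `CompactObj::ExternalRep` (only `NONE` is named in the thread, `STRING` and `SERIALIZED_MAP` are invented for illustration), and `DescribeString` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CompactObj::ExternalRep. Only NONE comes from the
// review comment; the other members are assumptions.
enum class ExternalRep : std::uint8_t { NONE, STRING, SERIALIZED_MAP };

struct SerializationDescriptor {
  std::size_t estimated_size = 0;
  ExternalRep repr = ExternalRep::NONE;

  // An unfit object is described by the default state (NONE, size 0),
  // so no surrounding std::optional is needed.
  constexpr bool is_valid() const {
    return repr != ExternalRep::NONE && estimated_size > 0;
  }
};

// Hypothetical helper: empty strings are treated as unfit for offloading.
constexpr SerializationDescriptor DescribeString(std::size_t length) {
  if (length == 0) return {};
  return {length, ExternalRep::STRING};
}
```

A caller would check `is_valid()` before using `estimated_size`, so the "nothing to serialize" state lives inside the type itself.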
Personally, I am not a big fan of chaining optionals (or other types like std::expected) one within another, especially when we control the wrapped class and it can describe the "undef" state itself. For example, in our codebase we have using Result = std::optional<ResultType>; where ResultType is another optional, or std::optional<facade::ErrorReply> where ErrorReply can hold an empty state. These levels of indirection decrease readability, imho.
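The readability concern can be seen in a small, generic example. The aliases below only mirror the shape described in the comment; `ResultType` being `std::optional<int>` and the helper `HasUsableValue` are assumptions made for illustration, not code from the project.

```cpp
#include <optional>
#include <utility>

// Outer optional: "did the operation produce a result at all?"
// Inner optional: "does that result hold a value?"
using ResultType = std::optional<int>;
using Result = std::optional<ResultType>;

// Two distinct "empty" states that every caller has to tell apart.
constexpr Result kNoResult = std::nullopt;                  // outer is empty
constexpr Result kEmptyResult{std::in_place, std::nullopt};  // outer engaged, inner empty

// A value is only usable when both layers are engaged.
constexpr bool HasUsableValue(const Result& r) {
  return r.has_value() && r->has_value();
}
```

Both `kNoResult` and `kEmptyResult` carry no value, yet only one of them reports `has_value()`; that double check is the indirection the comment objects to.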
I agree here, but in a properly structured code base type composition is almost always preferable.
Removed the optional and just kept a pair.
romange left a comment
LGTM + a minor comment.
Pull request overview
This PR adds support for serializing and offloading hash data types to tiered storage. It introduces a generic serialization infrastructure that determines serialization parameters based on object type, serializes values to disk, and handles uploading them back to memory. The feature is gated behind the tiered_experimental_hash_support flag.
Key changes:
- Generic serialization functions DetermineSerializationParams() and Serialize() that handle both strings and hashes
- Propagation of ExternalRep through the cooling and stashing pipeline to track serialization type
- Implementation of hash serialization via SerializedMap encoding and decoding
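To make the idea of serializing a hash to a flat buffer and reconstructing it concrete, here is a minimal round-trip sketch. It is not the actual SerializedMap format, which is not shown in this thread: the 4-byte host-order length prefix per key and value is an assumption, and `SerializeMap`, `DeserializeMap`, `AppendField`, and `ReadField` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Entries = std::vector<std::pair<std::string, std::string>>;

// Appends a 4-byte length (host byte order, assumed) followed by the bytes of s.
void AppendField(std::string_view s, std::string* out) {
  assert(s.size() <= UINT32_MAX);
  const auto len = static_cast<std::uint32_t>(s.size());
  char buf[sizeof(len)];
  std::memcpy(buf, &len, sizeof(len));
  out->append(buf, sizeof(len));
  out->append(s.data(), s.size());
}

// Encodes every entry as key field followed by value field.
std::string SerializeMap(const Entries& entries) {
  std::string out;
  for (const auto& [key, value] : entries) {
    AppendField(key, &out);
    AppendField(value, &out);
  }
  return out;
}

// Reads one length-prefixed field starting at *pos and advances *pos past it.
std::string ReadField(std::string_view data, std::size_t* pos) {
  std::uint32_t len = 0;
  assert(*pos + sizeof(len) <= data.size());
  std::memcpy(&len, data.data() + *pos, sizeof(len));
  *pos += sizeof(len);
  assert(*pos + len <= data.size());
  std::string field(data.substr(*pos, len));
  *pos += len;
  return field;
}

// Rebuilds the entries from a buffer produced by SerializeMap.
Entries DeserializeMap(std::string_view data) {
  Entries entries;
  std::size_t pos = 0;
  while (pos < data.size()) {
    std::string key = ReadField(data, &pos);
    std::string value = ReadField(data, &pos);
    entries.emplace_back(std::move(key), std::move(value));
  }
  return entries;
}
```

Under this assumed layout the serialized size is the sum of all key and value lengths plus 8 bytes per entry, which is the kind of quantity a size estimate has to predict before the value is written out.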
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/server/tiering/small_bins.h/cc | Added current_entries_cnt to track entries in the current bin |
| src/server/tiering/serialized_map_test.cc | Updated test to use std::pair<string, string> instead of string_view pairs |
| src/server/tiering/serialized_map.h | Changed Input type from string_view pairs to string pairs |
| src/server/tiering/decoders.cc | Implemented SerializedMapDecoder::Upload() to reconstruct hash objects |
| src/server/tiered_storage_test.cc | Added test for hash offloading with experimental flag |
| src/server/tiered_storage.h/cc | Main implementation of generic serialization and hash support |
| src/server/hset_family.cc | Integrated tiered storage into hash set operations |
| src/server/common.h/cc | Added small_bins_filling_entries_cnt to stats |
| src/core/detail/listpack_wrap.h/cc | Added WithCapacity() factory method |
| src/core/compact_object.h/cc | Updated SetCool() to accept ExternalRep parameter and added Freeze() method |
src/server/tiered_storage.cc
Outdated
id = bin->first;
// TODO(vlad): Write bin to prepared buffer instead of allocating one
stash_string(bin->second);
if (auto prepared = op_manager_->PrepareStash(est_size); prepared) {
Copilot AI, Nov 24, 2025
Size mismatch: PrepareStash(est_size) allocates space for the estimated serialized size of a single value, but bin->second.size() is the size of a full bin, which may include multiple values and metadata. This will either waste space or cause a buffer overflow. Use bin->second.size() for PrepareStash instead of est_size.
Suggested change:
- if (auto prepared = op_manager_->PrepareStash(est_size); prepared) {
+ if (auto prepared = op_manager_->PrepareStash(bin->second.size()); prepared) {
@dranikpg wdyt? What is the relationship between est_size and bin->second.size()?
The issue found is not testable because we always provide at least 4 KB, which is all that's needed for the bin.
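The reasoning in this reply can be modeled with a few lines of arithmetic. This is a sketch under stated assumptions, not the behavior of the real `PrepareStash`: the only facts taken from the thread are that at least 4 KB is always provided and that a bin needs no more than that. The constants and the functions `ProvidedStashBytes` and `BinFits` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Assumed constants, derived from "we always provide at least 4kb, which is
// all that's needed for the bin".
constexpr std::size_t kMinStashBytes = 4096;
constexpr std::size_t kMaxBinBytes = 4096;

// Hypothetical model of how much space a PrepareStash-like call hands back.
constexpr std::size_t ProvidedStashBytes(std::size_t requested) {
  return std::max(requested, kMinStashBytes);
}

// True when a bin of bin_size bytes fits into the space provided for est_size.
constexpr bool BinFits(std::size_t est_size, std::size_t bin_size) {
  return bin_size <= ProvidedStashBytes(est_size);
}
```

Under these assumptions a small `est_size` still yields a 4096-byte buffer, so a full bin always fits. The mismatch Copilot flagged would only become observable if a bin could exceed the minimum.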
Ok, so this changed a lot. Add active path for serializing hashes:
- EstimateSerializedSize
- Serialize