@dranikpg
Contributor

@dranikpg dranikpg commented Nov 5, 2025

Ok, so this changed a lot. Add active path for serializing hashes:

  • Generic EstimateSerializedSize
  • Generic Serialize
  • Propagating ExternalRep in CompactObj, including cooling
  • Handle upload

Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg requested a review from romange November 18, 2025 14:46
@dranikpg dranikpg marked this pull request as ready for review November 18, 2025 14:46
romange
romange previously approved these changes Nov 19, 2025
Signed-off-by: Vladislav Oleshko <vlad@dragonflydb.io>
@dranikpg dranikpg requested a review from romange November 20, 2025 11:47
Collaborator

@romange romange left a comment

It is a big PR, I wish it could be split into multiple parts.

if (pv.Encoding() == kEncodingListPack) {
auto* lp = static_cast<uint8_t*>(pv.RObjPtr());
size_t bytes = lpBytes(lp);
bytes += lpLength(lp) * 2 * 4;
Collaborator

can you add a comment explaining the formula?

// TODO(vlad): Maybe split into different accessors?
// Do NOT enforce rules depending on dynamic runtime values, as this is called
// when scheduling a stash and again just before succeeding, and it is expected to return the same results
optional<pair<size_t /*size*/, CompactObj::ExternalRep>> EstimateSerializedSize(
Collaborator

It does more than estimate the size, and combining an optional with a precise ExternalRep makes the behavior confusing. Maybe call it GetSerializationDescriptor, which would return

struct SerializationDescriptor {
  size_t estimated_size;
  CompactObj::ExternalRep repr;
};

and add NONE to the ExternalRep enum? Alternatively, add is_valid() { return estimated_size > 0; } and return estimated_size = 0 for unfit objects.
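
For illustration only, the descriptor idea could look roughly like the sketch below. The enum here is a hypothetical stand-in for CompactObj::ExternalRep (its member names are assumptions), and Describe() with its min_size threshold is a made-up decision rule, not the logic in this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CompactObj::ExternalRep. NONE is the member
// proposed in the review comment; the other names are assumptions.
enum class ExternalRep : uint8_t { NONE, STRING, SERIALIZED_MAP };

// Describes how a value would be serialized, or that it cannot be.
struct SerializationDescriptor {
  std::size_t estimated_size = 0;
  ExternalRep repr = ExternalRep::NONE;

  // An unfit object is reported as size 0 instead of an empty optional.
  bool is_valid() const { return estimated_size > 0; }
};

// Toy decision rule (an assumption for this sketch): only values of at
// least min_size bytes are worth offloading.
SerializationDescriptor Describe(std::size_t value_size, std::size_t min_size) {
  if (value_size < min_size) return {};  // default state: invalid, NONE
  return {value_size, ExternalRep::STRING};
}
```

A caller would then branch on `Describe(size, threshold).is_valid()` instead of unwrapping an optional pair.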

Collaborator

Personally, I am not a big fan of chaining optionals (or other types like std::expected) one within another, especially when we control the wrapped class and it can describe the "undef" state itself. For example, in our codebase we have using Result = std::optional<ResultType>; where ResultType is another optional, or std::optional<facade::ErrorReply> where ErrorReply can hold an empty state. These levels of indirection decrease readability, imho.
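
To make the readability point concrete, here is a small sketch contrasting the two styles. ErrorReply below is a hypothetical simplified type, not facade::ErrorReply, and both Validate functions are invented for the example.

```cpp
#include <optional>
#include <string>

// Hypothetical error type that can already represent "no error" by itself.
struct ErrorReply {
  std::string message;
  bool empty() const { return message.empty(); }
};

// Nested style: "nothing" can be spelled two ways, either nullopt or an
// engaged optional holding an empty ErrorReply, so callers must check both.
std::optional<ErrorReply> ValidateNested(int arg) {
  if (arg < 0) return ErrorReply{"negative argument"};
  return std::nullopt;
}

// Flat style: the wrapped type carries its own empty state, one check suffices.
ErrorReply ValidateFlat(int arg) {
  if (arg < 0) return {"negative argument"};
  return {};
}
```

With the flat variant a call site reads `if (auto err = ValidateFlat(x); !err.empty())`, with no second layer to unwrap.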

Contributor Author

I agree here, but in a properly structured code base type composition is almost always preferable.

@dranikpg dranikpg requested a review from romange November 21, 2025 09:41
@dranikpg
Contributor Author

Removed the optional and just kept a pair.

Collaborator

@romange romange left a comment

LGTM + a minor comment.

@romange romange requested a review from Copilot November 24, 2025 18:56
Copilot finished reviewing on behalf of romange November 24, 2025 18:58
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for serializing and offloading hash data types to tiered storage. It introduces a generic serialization infrastructure that determines serialization parameters based on object type, serializes values to disk, and handles uploading them back to memory. The feature is gated behind the tiered_experimental_hash_support flag.

Key changes:

  • Generic serialization functions DetermineSerializationParams() and Serialize() that handle both strings and hashes
  • Propagation of ExternalRep through the cooling and stashing pipeline to track serialization type
  • Implementation of hash serialization via SerializedMap encoding and decoding
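
As a rough illustration of what flattening a hash into one buffer and reading it back can look like, here is a length-prefixed round trip. The layout (a 4-byte host-order length before every field and value) is an assumption made for this sketch; the actual SerializedMap format is not shown in this conversation, and EncodeMap/DecodeMap are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

using Entries = std::vector<std::pair<std::string, std::string>>;

// Appends a 4-byte length (host byte order) followed by the raw bytes.
void AppendField(std::string* out, const std::string& s) {
  uint32_t len = static_cast<uint32_t>(s.size());
  char buf[sizeof(len)];
  std::memcpy(buf, &len, sizeof(len));
  out->append(buf, sizeof(len));
  out->append(s);
}

// Flattens all field/value pairs into one contiguous buffer.
std::string EncodeMap(const Entries& entries) {
  std::string out;
  for (const auto& kv : entries) {
    AppendField(&out, kv.first);
    AppendField(&out, kv.second);
  }
  return out;
}

// Reads one length-prefixed field at *pos and advances *pos past it.
// Assumes well-formed input; a real decoder would validate bounds.
std::string ReadField(const std::string& in, std::size_t* pos) {
  uint32_t len = 0;
  std::memcpy(&len, in.data() + *pos, sizeof(len));
  *pos += sizeof(len);
  std::string s = in.substr(*pos, len);
  *pos += len;
  return s;
}

// Rebuilds the field/value pairs from the flat buffer.
Entries DecodeMap(const std::string& in) {
  Entries out;
  std::size_t pos = 0;
  while (pos < in.size()) {
    std::string key = ReadField(in, &pos);
    std::string value = ReadField(in, &pos);
    out.emplace_back(std::move(key), std::move(value));
  }
  return out;
}
```

In this toy layout the encoded size is the sum of all field and value lengths plus 8 bytes of prefixes per pair, which is the kind of quantity a size estimate has to account for up front.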

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/server/tiering/small_bins.h/cc | Added current_entries_cnt to track entries in the current bin |
| src/server/tiering/serialized_map_test.cc | Updated test to use std::pair<string, string> instead of string_view pairs |
| src/server/tiering/serialized_map.h | Changed Input type from string_view pairs to string pairs |
| src/server/tiering/decoders.cc | Implemented SerializedMapDecoder::Upload() to reconstruct hash objects |
| src/server/tiered_storage_test.cc | Added test for hash offloading with experimental flag |
| src/server/tiered_storage.h/cc | Main implementation of generic serialization and hash support |
| src/server/hset_family.cc | Integrated tiered storage into hash set operations |
| src/server/common.h/cc | Added small_bins_filling_entries_cnt to stats |
| src/core/detail/listpack_wrap.h/cc | Added WithCapacity() factory method |
| src/core/compact_object.h/cc | Updated SetCool() to accept ExternalRep parameter and added Freeze() method |

id = bin->first;
// TODO(vlad): Write bin to prepared buffer instead of allocating one
stash_string(bin->second);
if (auto prepared = op_manager_->PrepareStash(est_size); prepared) {
Copilot AI Nov 24, 2025

Size mismatch: PrepareStash(est_size) allocates space for the estimated serialized size of a single value, but bin->second.size() contains the size of a full bin which may include multiple values and metadata. This will either waste space or cause buffer overflow. Use bin->second.size() for PrepareStash instead of est_size.

Suggested change
- if (auto prepared = op_manager_->PrepareStash(est_size); prepared) {
+ if (auto prepared = op_manager_->PrepareStash(bin->second.size()); prepared) {

Collaborator

@dranikpg what do you think? What is the relationship between est_size and bin->second.size()?

@dragonflydb dragonflydb deleted a comment from Copilot AI Nov 24, 2025
@dranikpg dranikpg requested a review from romange November 27, 2025 11:25
@dranikpg
Contributor Author

The issue found is not testable because we always provide at least 4 KB, which is all that's needed for the bin.
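
A minimal sketch of why the mismatch would not surface, assuming the prepared buffer is rounded up to whole 4 KB pages with at least one page. AllocatedBytes and BinFits are hypothetical helpers for this example and do not reproduce PrepareStash.

```cpp
#include <cstddef>

constexpr std::size_t kPageSize = 4096;  // assumed allocation granularity

// Hypothetical rounding rule: whole pages, never fewer than one.
constexpr std::size_t AllocatedBytes(std::size_t requested) {
  std::size_t pages = (requested + kPageSize - 1) / kPageSize;
  return (pages == 0 ? 1 : pages) * kPageSize;
}

// A bin of at most one page fits no matter how small est_size was.
constexpr bool BinFits(std::size_t est_size, std::size_t bin_size) {
  return bin_size <= AllocatedBytes(est_size);
}
```

Under these assumptions `BinFits(100, 4096)` holds even though 100 is far below the bin size, while a bin larger than one page would expose the mismatch.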
