Add some per key optimization for UDT in memtable only feature #13031

jowlyzhang · 2024-09-24T20:37:59Z

This PR adds some optimizations to the per-key handling of SST files for the user-defined timestamps (UDT) in Memtable only feature. CPU profiling shows that this part is a major contributor to the regression. The optimization saves some string construction/destruction/appending/copying, as well as vector operations like reserve/emplace_back.

When iterating keys in a block, we need to copy the shared bytes from the previous key, combine them with the non-shared bytes, and find the right location to pad the min timestamp. Previously, we created a temporary local string buffer to first construct the key from its pieces, and then copied this local string's content into IterKey's buffer. To avoid this local string and the extra copy, instead of piecing together the key in a local string first, we now just track all the pieces that make up the key in a reused Slice array, and then copy the pieces in order into IterKey's buffer. Since the previous key must be kept intact while we are copying shared bytes from it, we added a secondary buffer to IterKey and alternate between the primary buffer and the secondary buffer.
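For readers who want a concrete picture of the scheme described above, here is a minimal sketch. It is not the code in this PR: `Piece` and `TwoBufferKey` are hypothetical stand-ins for `Slice` and `IterKey`, and `std::string` is used in place of the inline/heap buffers that the real class manages by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for rocksdb::Slice: a non-owning (pointer, length) pair.
struct Piece {
  const char* data;
  size_t size;
  Piece() : data(nullptr), size(0) {}
  Piece(const char* d, size_t n) : data(d), size(n) {}
};

constexpr size_t kMaxPieces = 5;                       // assumed upper bound
const char kMinTs[8] = {0, 0, 0, 0, 0, 0, 0, 0};       // 64-bit min timestamp

// Hypothetical stand-in for IterKey with a primary and a secondary buffer.
class TwoBufferKey {
 public:
  // Copies pieces[0..n) in order into the buffer that does NOT hold the
  // current key, then makes that buffer current. Pieces may therefore point
  // into the current key without being overwritten during the copy.
  void SetFromPieces(const std::array<Piece, kMaxPieces>& pieces, size_t n) {
    assert(n <= kMaxPieces);
    std::string& target = current_is_primary_ ? secondary_ : primary_;
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) total += pieces[i].size;
    target.resize(total);
    size_t offset = 0;
    for (size_t i = 0; i < n; ++i) {
      if (pieces[i].size > 0) {
        std::memcpy(&target[offset], pieces[i].data, pieces[i].size);
        offset += pieces[i].size;
      }
    }
    current_is_primary_ = !current_is_primary_;
  }

  const std::string& Key() const {
    return current_is_primary_ ? primary_ : secondary_;
  }

 private:
  std::string primary_;
  std::string secondary_;
  bool current_is_primary_ = true;
};
```

In this sketch, the second key would be built from a piece that points at the shared prefix of the first key, a piece for the non-shared bytes, and a piece for `kMinTs`; no intermediate string is constructed.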

Test plan:
Existing tests.

facebook-github-bot · 2024-09-25T21:27:48Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ltamasi

Thanks a lot for improving this @jowlyzhang !

ltamasi · 2024-10-03T18:24:23Z

db/dbformat.cc

+
+void IterKey::EnlargeSecondaryBufferIfNeeded(size_t key_size) {
+  // If size is smaller than buffer size, continue using current buffer,
+  // or the static allocated one, as default


Very minor but seems to me the buffers are not actually statically allocated; maybe call them something like "fixed-size" or "inline"

ltamasi · 2024-10-03T18:25:52Z

db/dbformat.h

@@ -562,18 +562,25 @@ inline uint64_t GetInternalKeySeqno(const Slice& internal_key) {
 //    allocation for smaller keys.
 // 3. It tracks user key or internal key, and allow conversion between them.
 class IterKey {
+  static constexpr char kTsMin[] = "\x00\x00\x00\x00\x00\x00\x00\x00";


I think it would be nice to add the usual comment here about only 64-bit timestamps being supported currently.
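One possible shape for such a comment, paired with a compile-time check, is sketched below. `kTsMinSize` and the `static_assert` are illustrative additions and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Only 64-bit (8-byte) user-defined timestamps are assumed here.
// sizeof() counts the implicit trailing '\0' of the string literal, so the
// usable timestamp payload is sizeof(kTsMin) - 1 bytes.
static constexpr char kTsMin[] = "\x00\x00\x00\x00\x00\x00\x00\x00";
constexpr size_t kTsMinSize = sizeof(kTsMin) - 1;  // hypothetical helper name

static_assert(kTsMinSize == sizeof(uint64_t),
              "min timestamp must be exactly 8 bytes");
```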

ltamasi · 2024-10-03T18:46:25Z

db/dbformat.h

+  char* secondary_buf_;
+  char space_for_secondary_buf_[39];  // Avoid allocation for short keys


This probably wouldn't cause any issues in practice but since secondary_buf_ can potentially point to space_for_secondary_buf_, it would be nice to have these two ordered the other way around. (Technically, secondary_buf_ currently gets constructed before and destroyed after space_for_secondary_buf_.) Also, we could introduce a named constant for the size of the inline buffers (39).

That's a good point, thank you for the suggestion!
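A minimal sketch of the suggested ordering, with a named constant for the inline size, could look like this. `InlineBufferHolder` and `kInlineBufferSize` are hypothetical names and this is not the merged code. Non-static members are initialized in declaration order and destroyed in reverse order, so declaring the storage first means its lifetime encloses that of the pointer into it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of the member ordering discussed above.
class InlineBufferHolder {
 public:
  static constexpr size_t kInlineBufferSize = 39;  // assumed name for "39"

  InlineBufferHolder() : buf_(space_), buf_size_(sizeof(space_)) {}

  // The object holds a pointer into itself, so copying is disabled.
  InlineBufferHolder(const InlineBufferHolder&) = delete;
  InlineBufferHolder& operator=(const InlineBufferHolder&) = delete;

  bool UsesInlineStorage() const { return buf_ == space_; }
  size_t Capacity() const { return buf_size_; }

 private:
  // Declared before buf_: initialized first, destroyed last.
  char space_[kInlineBufferSize];  // Avoid allocation for short keys
  char* buf_;
  size_t buf_size_;
};
```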

ltamasi · 2024-10-03T18:47:28Z

db/dbformat.h

+  // Use to track the pieces that together make the whole key. We then copy
+  // these pieces in order either into buf_ or secondary_buf_ depending on where
+  // the previous key is held.
+  Slice key_slices_[5];


We could consider using std::array instead of a C-style array

ltamasi · 2024-10-03T18:49:30Z

db/dbformat.h

+      secondary_buf_ = space_for_secondary_buf_;
+    }
+    secondary_buf_size_ = sizeof(space_for_secondary_buf_);
+    key_size_ = 0;


Would it make sense to clear key_size_ iff key_ points to the secondary buffer?

Good catch! This is only supposed to be called when key_ points to secondary buffer, or during destruction. It's good to make a check for this.

ltamasi · 2024-10-03T18:52:22Z

db/dbformat.h

+    size_t actual_total_bytes = 0;
+#endif  // NDEBUG
+    for (size_t i = 0; i < num_key_slices; i++) {
+      size_t key_size = key_slices_[i].size();


key_size might not be the best name for this variable; how about something like key_slice_size or slice_size?

Good catch, the name is indeed confusing.

ltamasi · 2024-10-03T18:59:13Z

db/dbformat.h

-      key_parts.emplace_back(slice_data, left_sz);
-      key_parts.emplace_back(min_timestamp);
-      key_parts.emplace_back(slice_data + left_sz, slice_sz - left_sz);
+      key_slices_[(*next_key_slice_idx)++] = Slice(slice_data, left_sz);


We could assert that next_key_slice_idx is not null and that we don't overrun the key_slices_ buffer (i.e. that we don't end up with more than 5 parts)
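The two assertions could be sketched roughly as follows. `Piece`, `AppendPiece`, `AppendWithMinTimestamp`, and `kMaxKeySlices` are hypothetical names used only for illustration; the real code operates on `Slice` and on `IterKey`'s `key_slices_` member.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for rocksdb::Slice.
struct Piece {
  const char* data;
  size_t size;
  Piece() : data(nullptr), size(0) {}
  Piece(const char* d, size_t n) : data(d), size(n) {}
};

constexpr size_t kMaxKeySlices = 5;  // mirrors the array size in the diff

// Appends one piece, checking the index pointer and the array bound first.
inline void AppendPiece(std::array<Piece, kMaxKeySlices>& slices,
                        size_t* next_idx, const char* data, size_t size) {
  assert(next_idx != nullptr);
  assert(*next_idx < slices.size());
  slices[(*next_idx)++] = Piece(data, size);
}

// Splits [data, data + size) at left_size and records three pieces:
// the left part, the min timestamp, and the right part.
inline void AppendWithMinTimestamp(std::array<Piece, kMaxKeySlices>& slices,
                                   size_t* next_idx, const char* data,
                                   size_t size, size_t left_size,
                                   const char* ts, size_t ts_size) {
  assert(left_size <= size);
  AppendPiece(slices, next_idx, data, left_size);
  AppendPiece(slices, next_idx, ts, ts_size);
  AppendPiece(slices, next_idx, data + left_size, size - left_size);
}
```

Because the checks are plain `assert`s, they cost nothing in release builds where `NDEBUG` is defined.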

facebook-github-bot · 2024-10-03T21:19:49Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-10-03T21:37:14Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-10-03T21:48:27Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

ltamasi

LGTM, thanks @jowlyzhang !

ltamasi · 2024-10-03T22:00:49Z

db/dbformat.h

+    if (key_ == secondary_buf_) {
+      key_size_ = 0;
+    }


Should we have a similar check in ResetBuffer too (with buf_)?

facebook-github-bot · 2024-10-03T22:06:56Z

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2024-10-03T22:14:48Z

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-10-04T01:01:11Z

@jowlyzhang merged this pull request in 32dd657.

Summary: This PR adds some optimizations to the per-key handling of SST files for the user-defined timestamps (UDT) in Memtable only feature. CPU profiling shows that this part is a major contributor to the regression. The optimization saves some string construction/destruction/appending/copying, as well as vector operations like reserve/emplace_back.

When iterating keys in a block, we need to copy the shared bytes from the previous key, combine them with the non-shared bytes, and find the right location to pad the min timestamp. Previously, we created a temporary local string buffer to first construct the key from its pieces, and then copied this local string's content into `IterKey`'s buffer. To avoid this local string and the extra copy, instead of piecing together the key in a local string first, we now just track all the pieces that make up the key in a reused Slice array, and then copy the pieces in order into `IterKey`'s buffer. Since the previous key must be kept intact while we are copying shared bytes from it, we added a secondary buffer to `IterKey` and alternate between the primary buffer and the secondary buffer.

Pull Request resolved: #13031

Test Plan: Existing tests.

Reviewed By: ltamasi

Differential Revision: D63416531

Pulled By: jowlyzhang

fbshipit-source-id: 9819b0e02301a2dbc90621b2fe4f651bc912113c

facebook-github-bot added the CLA Signed label Sep 24, 2024

jowlyzhang marked this pull request as draft September 24, 2024 21:10

jowlyzhang force-pushed the per_key_optimization branch from 14e33e3 to 34a90b9 Compare September 25, 2024 02:09

Add some per key optimization for UDT in memtable only feature

321991a

jowlyzhang force-pushed the per_key_optimization branch from 34a90b9 to 321991a Compare September 25, 2024 18:02

jowlyzhang marked this pull request as ready for review September 25, 2024 21:27

jowlyzhang requested a review from ltamasi September 25, 2024 21:27

ltamasi reviewed Oct 3, 2024

jowlyzhang force-pushed the per_key_optimization branch from 0313896 to a174a68 Compare October 3, 2024 21:48

ltamasi approved these changes Oct 3, 2024

Address review comments

2d47f74

jowlyzhang force-pushed the per_key_optimization branch from a174a68 to 2d47f74 Compare October 3, 2024 22:06

facebook-github-bot closed this in 32dd657 Oct 4, 2024

facebook-github-bot added the Merged label Oct 4, 2024

jowlyzhang deleted the per_key_optimization branch October 4, 2024 16:51
