Help understanding blob file size #120
-
I'm experimenting with KV separation, using 2048-byte values. I create a partition with KV separation enabled, using the default `KvSeparationOptions`, and then batch insert a bunch of data -- roughly 140G. Insertion is multithreaded over 16 cores.

Looking at the files in my partition, under segments/ I see a bunch of files that seem to max out at ~75M. There aren't many of them, and they total ~5G. OK so far. Under blobs/segments, I have a very large number of files (~13k), mostly around 10M in size, up to ~13M. This surprises me -- given the default value of `file_target_size`, I'm guessing I don't really understand what that option actually controls.
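For reference, the setup looks roughly like this (simplified to a single thread and a placeholder path; builder names are as I understand them from the fjall docs, so e.g. `with_kv_separation` may be spelled slightly differently in the version I'm on):

```rust
use fjall::{Config, KvSeparationOptions, PartitionCreateOptions};

fn main() -> Result<(), fjall::Error> {
    let keyspace = Config::new("/data/my-keyspace").open()?;

    // Partition with KV separation enabled, default KvSeparationOptions
    let items = keyspace.open_partition(
        "items",
        PartitionCreateOptions::default().with_kv_separation(KvSeparationOptions::default()),
    )?;

    // ~140G of 2048-byte values (single-threaded here for brevity;
    // the real insert loop is spread over 16 threads)
    let value = vec![0u8; 2048];
    for i in 0u64..70_000_000 {
        items.insert(i.to_be_bytes(), &value)?;
    }

    Ok(())
}
```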
-
The initial size of blob files is governed by the memtable size, which is 16M by default. So it makes sense that the blob files are ~10-13M initially. When working with large values I would recommend a larger memtable, around 64M maybe (which is the default for RocksDB), so there are less frequent, larger flushes. That should also make the blob files larger.

`KvSeparationOptions::file_target_size` is the target size at which a blob file rewrite (GC) will rotate to a new file, so it only matters when actually performing a GC strategy. But currently there is no strategy to just rewrite some blob files that are not fragmented to reduce the number of blob files -- a strategy to rewrite a limited set of the oldest blob files would probably be nice to have.
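Roughly, the tuning described above would look like this (untested sketch; the exact builder method names, in particular `max_memtable_size`, may differ between versions, so check the current docs):

```rust
use fjall::{Config, KvSeparationOptions, PartitionCreateOptions};

fn main() -> Result<(), fjall::Error> {
    let keyspace = Config::new("/data/my-keyspace").open()?;

    // Larger memtable => less frequent, larger flushes, which also means
    // larger blob files are written out initially.
    let opts = PartitionCreateOptions::default()
        .max_memtable_size(64 * 1_024 * 1_024)
        .with_kv_separation(
            // file_target_size only matters during GC: a blob file rewrite
            // rotates to a new file once it reaches this size.
            KvSeparationOptions::default().file_target_size(256 * 1_024 * 1_024),
        );

    let items = keyspace.open_partition("items", opts)?;

    // ... write workload as before ...
    let _ = items;

    Ok(())
}
```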