Help understanding blob file size #120
-
I'm experimenting with KV separation, using 2048-byte values. I create a partition with KV separation enabled, using the default `KvSeparationOptions`, and then batch insert a bunch of data -- roughly 140G. Insertion is multithreaded over 16 cores.

Looking at the files in my partition, under segments/ I see a bunch of files that seem to max out at ~75M. There aren't many of them, and they total ~5G. OK so far. Under blobs/segments, I have a very large number of files (~13k), mostly around 10M in size, up to ~13M. This surprises me -- given the default value of `file_target_size`, I'm guessing I don't really understand what that option actually controls.
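For reference, the setup looks roughly like this (simplified to a single thread and a placeholder path; builder names are as I understand them from the fjall docs, so e.g. `with_kv_separation` may be spelled slightly differently in the version I'm on):

```rust
use fjall::{Config, KvSeparationOptions, PartitionCreateOptions};

fn main() -> Result<(), fjall::Error> {
    let keyspace = Config::new("/data/my-keyspace").open()?;

    // Partition with KV separation enabled, default KvSeparationOptions
    let items = keyspace.open_partition(
        "items",
        PartitionCreateOptions::default().with_kv_separation(KvSeparationOptions::default()),
    )?;

    // ~140G of 2048-byte values (single-threaded here for brevity;
    // the real insert loop is spread over 16 threads)
    let value = vec![0u8; 2048];
    for i in 0u64..70_000_000 {
        items.insert(i.to_be_bytes(), &value)?;
    }

    Ok(())
}
```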
-
The initial size of blob files is governed by the memtable size, which is 16M by default. So it makes sense that the blob files are ~10-13M initially. When working with large values I would recommend a larger memtable, around 64M maybe (which is the default for RocksDB), so there are less frequent, larger flushes. That should also make the blob files larger.

`KvSeparationOptions::file_target_size` is the target size at which a blob file rewrite (GC) will rotate to a new file, so it only matters when actually performing a GC strategy. But currently there is no strategy to just rewrite some blob files that are not fragmented to reduce the number of blob files -- a strategy to rewrite a limited set of the oldest blob files would probably be nice to have.
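Roughly, the tuning described above would look like this (untested sketch; the exact builder method names, in particular `max_memtable_size`, may differ between versions, so check the current docs):

```rust
use fjall::{Config, KvSeparationOptions, PartitionCreateOptions};

fn main() -> Result<(), fjall::Error> {
    let keyspace = Config::new("/data/my-keyspace").open()?;

    // Larger memtable => less frequent, larger flushes, which also means
    // larger blob files are written out initially.
    let opts = PartitionCreateOptions::default()
        .max_memtable_size(64 * 1_024 * 1_024)
        .with_kv_separation(
            // file_target_size only matters during GC: a blob file rewrite
            // rotates to a new file once it reaches this size.
            KvSeparationOptions::default().file_target_size(256 * 1_024 * 1_024),
        );

    let items = keyspace.open_partition("items", opts)?;

    // ... write workload as before ...
    let _ = items;

    Ok(())
}
```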