Skip to content

Supporting keys > 127 bytes long #42

@scottcarey

Description

@scottcarey

HaloDB encodes key length as a signed byte on disk, and in memory. It rejects keys longer than 127 bytes.

Although most of my string keys in different databases are between 10 and 80 bytes, I have some rare outliers as large as 780 bytes. Every other K/V store I work with (I have an API abstraction that wraps about 10) can support large keys; only one other breaks at 512 bytes.

There are some options for longer keys:

Easy: read the byte as an unsigned value, and then sizes from 0 to 255 are supported. However, this will make it even harder to support larger keys.

Harder: Allow for larger sizes, perhaps up to 2KB.

  • Index/Record/Tombstone files: Steal the top 3 bits from the version byte. Since the version byte is currently 0, new code versions would interpret existing data files the same ( top 3 bits of existing version byte | key size byte ). Old code would interpret any 'extended' keys as a version mismatch and thus still be safe. Therefore I think this can remain version 0. If versions were to get up to 32, a different format would be needed at that time.
  • SegmentWithMemoryPool: No change for now, it will not support key sizes larger than its configured fixedKeySize which would be 127 or less. It could be extended to support key overflow when fixedKeySize is set to 8 or larger. In this case, when the key length is larger than fixedKeySize, then the slot holds a pointer to extended key data, plus whatever prefix of the key fits in the remaining slot (fixedKeySize - 8). An alternative when fixedKeySize is large enough is to keep a portion of the key hash in this area as well, so that the pointer to the extended key data does not need to be followed for most lookups. Even just one byte of the hash that was not used for accessing the hash bucket would decrease the chance that the pointer is followed by a factor of 256 on a miss.
  • SegmentNonMemoryPool: Since all hash entries are individually allocated, (it appears to be closed addressing with a linked chain of entries), the allocated entry in memory can either use a variable length integer encoding for the key/value lengths, or a constant two bytes for the key.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions