-
Notifications
You must be signed in to change notification settings - Fork 100
Open
Labels
enhancementNew feature or requestNew feature or request
Description
HaloDB encodes key length as a signed byte on disk, and in memory. It rejects keys longer than 127 bytes.
Although most of my string keys in different databases are between 10 and 80 bytes, I have some rare outliers as large as 780 bytes. Every other K/V store I work with (I have an API abstraction that wraps about 10) can support large keys; only one other breaks at 512 bytes.
There are some options for longer keys:
Easy: read the byte as an unsigned value, and then sizes from 0 to 255 are supported. However, this will make it even harder to support larger keys.
Harder: Allow for larger sizes, perhaps up to 2KB.
- Index/Record/Tombstone files: Steal the top 3 bits from the version byte. Since the version byte is currently 0, new code versions would interpret existing data files the same ( top 3 bits of existing version byte | key size byte ). Old code would interpret any 'extended' keys as a version mismatch and thus still be safe. Therefore I think this can remain version 0. If versions were to get up to 32, a different format would be needed at that time.
- SegmentWithMemoryPool: No change for now, it will not support key sizes larger than its configured
fixedKeySizewhich would be 127 or less. It could be extended to supportkey overflowwhenfixedKeySizeis set to 8 or larger. In this case, when the key length is larger thanfixedKeySize, then the slot holds a pointer toextendedkey data, plus whatever prefix of the key fits in the remaining slot (fixedKeySize - 8). An alternative whenfixedKeySizeis large enough is to keep a portion of the key hash in this area as well, so that the pointer to theextendedkey data does not need to be followed for most lookups. Even just one byte of the hash that was not used for accessing the hash bucket would decrease the chance that the pointer is followed by a factor of 256 on a miss. - SegmentNonMemoryPool: Since all hash entries are individually allocated, (it appears to be closed addressing with a linked chain of entries), the allocated entry in memory can either use a variable length integer encoding for the key/value lengths, or a constant two bytes for the key.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request