Commit
Improve filters, key, protobuf and query documentation.
Showing 4 changed files with 276 additions and 219 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,45 +1,55 @@ | ||
# Keys in DataObjects | ||
|
||
All DataObjects must be stored under a unique key, which acts as an index to the object. | ||
The key serves as a permanent identifier for the object and can be used to retrieve it. | ||
If you store new data under an existing key, the existing data will be overwritten. | ||
All DataObjects must be stored under a unique key, which acts as an index to the object. The key serves as a permanent | ||
identifier for the object and can be used to retrieve it. If you store new data under an existing key, the existing data | ||
will be overwritten. | ||
|
||
## Default UUID Keys | ||
|
||
If you don't specify a key definition, the model will automatically generate 128-bit v4 UUID keys. | ||
These keys are guaranteed to be unique, but they won't provide any benefits for scanning the data. | ||
If you don't specify a key definition, the model will automatically generate 128-bit v4 UUID keys. Because these keys | ||
are randomly generated, a collision is so improbable that each DataObject effectively gets its own identifier. Note, | ||
however, that random keys provide no inherent benefit for scanning or ordering the data. | ||
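
As a minimal sketch of what a 128-bit v4 UUID key looks like, the snippet below uses Python's standard `uuid` module. It does not show how Maryk generates its keys; the function name `new_default_key` is hypothetical and only illustrates the size and randomness of such a key.

```python
import uuid


def new_default_key() -> bytes:
    """Return the 16 raw bytes (128 bits) of a randomly generated v4 UUID."""
    return uuid.uuid4().bytes


key = new_default_key()
print(len(key))                      # 16 bytes = 128 bits
print(uuid.UUID(bytes=key).version)  # 4
```

Since the bytes are random, two keys created one after the other have no meaningful order relative to each other, which is why such keys do not help with ordered scans.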
|
||
## Choosing the Right Key from the Start | ||
|
||
It's essential to be mindful when designing the key for a DataModel, as the | ||
key structure cannot be changed after data has been stored (without complex migrations). | ||
Consider the primary use case and ordering of the data and make sure the key is optimized | ||
for this purpose. Secondary use cases can be addressed by adding an index. | ||
It's essential to be mindful when designing the key for a DataModel since the key structure cannot be changed after data | ||
has been stored (without complex migrations). Consider the primary use case and the desired ordering of the data | ||
carefully. Make sure the key is optimized for this purpose upfront. If secondary use cases arise, they can often be | ||
addressed by adding an index later. | ||
|
||
## Properties for Key Structure | ||
|
||
Key structures must have fixed byte lengths. This way the location of key elements are | ||
predictable which are beneficial for scans over keys. | ||
Key structures must have fixed byte lengths. Because every key element then sits at a predictable position, keys can be | ||
indexed and scanned efficiently. | ||
|
||
Properties that can be used for key elements include numbers, dates and times, fixed bytes, references, enums, booleans, | ||
multi-type objects, and ValueDataModels containing similar values. | ||
Properties that can be utilized for key elements include numbers, dates and times, fixed bytes, references, enums, | ||
booleans, multi-type objects, and ValueDataModels containing similar values. | ||
|
||
Properties that cannot be used in keys include strings, flexible bytes, sets, lists, maps, and | ||
embedded models, as they have varying byte lengths. | ||
Properties that cannot be used in keys include strings, flexible bytes, sets, lists, maps, and embedded models. Their | ||
byte lengths vary, which would break the fixed-length requirement for key structures. | ||
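
To illustrate why fixed byte lengths matter, here is a small Python sketch of a fixed-length key. The layout (a 16-byte reference, a 1-byte enum value, and an 8-byte timestamp) and the function names are assumptions made for this example, not Maryk's actual key format.

```python
import struct

# Hypothetical layout: 16-byte owner reference, 1-byte enum, 8-byte timestamp.
# ">" selects big-endian encoding without padding between the fields.
KEY_FORMAT = ">16sBQ"
KEY_SIZE = struct.calcsize(KEY_FORMAT)  # 16 + 1 + 8 = 25 bytes
TIMESTAMP_OFFSET = 17                   # the timestamp always starts here


def pack_key(owner_ref: bytes, kind: int, timestamp: int) -> bytes:
    """Pack the three elements into one fixed-length key."""
    if len(owner_ref) != 16:
        raise ValueError("owner_ref must be exactly 16 bytes")
    return struct.pack(KEY_FORMAT, owner_ref, kind, timestamp)


def timestamp_of(key: bytes) -> int:
    """Read the timestamp directly from its fixed offset."""
    return struct.unpack_from(">Q", key, TIMESTAMP_OFFSET)[0]


key = pack_key(b"\x01" * 16, 2, 1234)
print(len(key), timestamp_of(key))  # 25 1234
```

Because every element has a known offset, a single element can be read without decoding the rest of the key. A string or list in the key would shift the offsets of everything after it.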
|
||
## The order of keys | ||
Keys are stored in order, which means that data scans will traverse or skip | ||
the data in the same order. If the key starts with a reference, the data scan | ||
can start at the exact location for that reference. | ||
## The Order of Keys | ||
|
||
If the data is often requested in the newest-first order, it is recommended to | ||
reverse the date of creation so that new data is retrieved first. | ||
Keys are stored in sorted order, meaning that data scans traverse or skip the data in that same order. If the key | ||
begins with a reference, a scan can start at the exact location corresponding to that reference. | ||
|
||
## Tips on designing a key | ||
If the data is often requested in a newest-first order, it is advisable to reverse the date of creation in the key | ||
structure. This way, scans return the newest data first. | ||
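
The effect of reversing the time can be sketched in Python as follows. Subtracting the timestamp from the maximum 64-bit value is one common way to reverse it; this is an assumption for illustration, and the encoding Maryk actually uses may differ.

```python
import struct

MAX_U64 = 2**64 - 1


def reversed_time(timestamp: int) -> bytes:
    """Encode a timestamp so that later times sort before earlier ones."""
    return struct.pack(">Q", MAX_U64 - timestamp)


timestamps = [100, 300, 200]

# Sorting the encoded bytes mimics the order in which keys would be stored.
stored = sorted(reversed_time(t) for t in timestamps)

# Decode again to see the order a scan would return.
scan_order = [MAX_U64 - struct.unpack(">Q", k)[0] for k in stored]
print(scan_order)  # [300, 200, 100]
```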
|
||
- Consider performance: The key structure should be optimized for the most common use cases, as it will affect the performance of data retrieval and scans. | ||
- If data "belongs" to a particular entity or person, start the key with a reference to that entity or person. | ||
- If data needs to be ordered by time, include the time in the key. If newer data is frequently requested first, reverse the time. | ||
- If the data has a primary multi-type property, include the type ID in the key so you can quickly retrieve data objects of a specific type. If time is also included in the key, make sure to place it after the type ID so that data is still ordered by time. | ||
- If date precision in nanoseconds is somehow not enough, consider adding a random number to the key. | ||
- Use indexing: If you have a key structure that is not optimized for your use case, you can use an index to improve performance. This is especially useful if you have large datasets. | ||
## Tips on Designing a Key | ||
|
||
- **Consider performance:** The key structure should be optimized for the most common use cases, as this will directly | ||
impact the performance of data retrieval and scans. | ||
- **Entity association:** If data "belongs" to a particular entity or person, start the key with a reference to that | ||
entity or person. This associativity can streamline searches. | ||
- **Time ordering:** If data needs to be ordered by time, include the time in the key. If newer data is frequently | ||
requested first, reverse the time to prioritize its retrieval. | ||
- **Type identification:** If the data has a primary multi-type property, include the type ID in the key to enable quick | ||
retrieval of data objects of a specific type. If time is also part of the key, place it after the type ID so that | ||
data within each type is still ordered by time. | ||
- **Date precision:** If date precision in nanoseconds is insufficient for your application's needs, consider appending | ||
a random number to the key for additional uniqueness. | ||
- **Use indexing:** If your key structure is not optimized for a particular use case, add an index to improve | ||
performance for it. This is especially beneficial when dealing with large datasets. |
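
Several of the tips above can be combined in one sketch. The Python example below assumes a hypothetical key made of a 4-byte owner reference, a 2-byte type ID, and a reversed 8-byte timestamp, and it models the store as a sorted list. None of these names or sizes come from Maryk itself.

```python
import bisect
import struct

MAX_U64 = 2**64 - 1


def make_key(owner: bytes, type_id: int, timestamp: int) -> bytes:
    """Owner reference first, then type ID, then reversed time."""
    return struct.pack(">4sHQ", owner, type_id, MAX_U64 - timestamp)


def scan_prefix(sorted_keys, prefix):
    """Jump to the first key with this prefix and stop when it no longer matches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    hits = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        hits.append(key)
    return hits


keys = sorted([
    make_key(b"ann\x00", 1, 10),
    make_key(b"ann\x00", 2, 30),
    make_key(b"ann\x00", 1, 20),
    make_key(b"bob\x00", 1, 40),
])

# All objects of type 1 that belong to "ann", newest first.
prefix = struct.pack(">4sH", b"ann\x00", 1)
hits = scan_prefix(keys, prefix)
times = [MAX_U64 - struct.unpack(">4sHQ", k)[2] for k in hits]
print(times)  # [20, 10]
```

The scan never touches the keys of other owners or other types, and the results come out ordered by time because the time follows the type ID in the key.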
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,34 @@ | ||
# Protobuf Transportation | ||
|
||
The encoding standard for [ProtoBuf V3](https://developers.google.com/protocol-buffers/) has been adopted for efficient | ||
and compact transportation of data. Developed by Google, ProtoBuf is a widely adopted standard. Currently, only the | ||
encoding standard has been adopted, and schema generation is yet to be implemented. | ||
and compact transportation of data. Developed by Google, ProtoBuf is a widely adopted standard used across various | ||
platforms and languages for serializing structured data. Currently, only the encoding standard has been implemented, | ||
while schema generation is yet to be developed. | ||
|
||
For a more in-depth understanding of how values are encoded, refer to the [ProtoBuf encoding documentation](https://developers.google.com/protocol-buffers/docs/encoding) | ||
For a more in-depth understanding of how values are encoded, refer to | ||
the [ProtoBuf encoding documentation](https://developers.google.com/protocol-buffers/docs/encoding), which provides | ||
detailed insights on the encoding mechanisms and examples. | ||
|
||
## Key Value pairs | ||
## Key Value Pairs | ||
|
||
A ProtoBuf message is built using key-value pairs, where the key contains a tag that identifies the encoded property and | ||
a wire_type that indicates the type of value that was encoded. The value is encoded in the byte format for transport, | ||
and the encoding format for each property type is documented in the [properties documentation](properties/properties.md). | ||
A ProtoBuf message is constructed using key-value pairs. In this structure, the key consists of a tag that uniquely | ||
identifies the encoded property, along with a wire type that specifies the type of value being encoded. The actual value | ||
is then represented in byte format for efficient transport. The encoding format for each property type is thoroughly | ||
documented in the [properties documentation](properties/properties.md), which serves as a reference for developers | ||
looking to implement and understand the specifics of encoding different data types. | ||
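
As a rough illustration of the key part of such a pair, the ProtoBuf encoding packs the tag and the wire type into a single varint as `(tag << 3) | wire_type`. The Python sketch below follows the public ProtoBuf encoding rules; the function names are chosen for this example and are not part of Maryk's API.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a ProtoBuf varint."""
    if value < 0:
        raise ValueError("only non-negative values are supported here")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte
            return bytes(out)


def encode_key(tag: int, wire_type: int) -> bytes:
    """Combine tag and wire type: the low 3 bits hold the wire type."""
    return encode_varint((tag << 3) | wire_type)


def decode_key(data: bytes):
    """Return (tag, wire_type) read from the start of data."""
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return value >> 3, value & 0x07


print(encode_key(1, 0).hex())   # 08
print(encode_key(2, 2).hex())   # 12
print(decode_key(b"\x80\x01"))  # (16, 0)
```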
|
||
## Wire Types | ||
|
||
Maryk supports all wire types supported by ProtoBuf, including: | ||
Maryk supports all wire types defined by ProtoBuf, including: | ||
|
||
* VarInt: A variable integer used for numeric values that grow in size with the value. | ||
* Length Delimited: Used for variable length values. The length of the bytes is preceded by the value. | ||
It can also contain key-value pairs of embedded messages. | ||
* 32 Bit: Used for values of 4 bytes. | ||
* 64 Bit: Used for values of 8 bytes. | ||
* Start Group / End Group: Not currently used and also deprecated in ProtoBuf. | ||
* **VarInt**: A variable-length integer used for numeric values, allowing for efficient storage of small values while | ||
accommodating larger integers without wasting space. | ||
* **Length Delimited**: This type is utilized for values that can vary in length. The actual bytes of the value are | ||
prefixed by a length field, making it versatile for use with strings and byte arrays. Additionally, it can encapsulate | ||
key-value pairs of embedded messages. | ||
* **32 Bit**: Specifically employed for values that are exactly 4 bytes in size, commonly used for fixed-width numerical | ||
types. | ||
* **64 Bit**: Designed for values that are exactly 8 bytes, typically used for large numerical types or higher-precision | ||
floating-point numbers. | ||
* **Start Group / End Group**: These wire types are currently not in use and are also deprecated in the ProtoBuf | ||
specification. It is advisable to avoid using them in new implementations. |
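
The wire types listed above can be sketched in Python as follows. The numeric IDs and the little-endian fixed-width encodings follow the public ProtoBuf encoding specification; the class and function names are assumptions for this example and do not reflect Maryk's own code.

```python
import struct
from enum import IntEnum


class WireType(IntEnum):
    """Wire type IDs as defined by the ProtoBuf encoding."""
    VAR_INT = 0
    BIT_64 = 1
    LENGTH_DELIMITED = 2
    START_GROUP = 3  # deprecated
    END_GROUP = 4    # deprecated
    BIT_32 = 5


def encode_varint(value: int) -> bytes:
    """7 bits per byte, lowest group first; the high bit marks continuation."""
    if value < 0:
        raise ValueError("only non-negative values are supported here")
    out = bytearray()
    while value > 0x7F:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def encode_length_delimited(payload: bytes) -> bytes:
    """Varint length prefix followed by the payload bytes."""
    return encode_varint(len(payload)) + payload


def encode_fixed32(value: int) -> bytes:
    """Exactly 4 bytes, little-endian."""
    return struct.pack("<I", value)


def encode_fixed64(value: int) -> bytes:
    """Exactly 8 bytes, little-endian."""
    return struct.pack("<Q", value)


print(encode_varint(1).hex())                      # 01
print(encode_varint(150).hex())                    # 9601
print(encode_length_delimited(b"testing").hex())   # 0774657374696e67
print(encode_fixed32(1).hex())                     # 01000000
```

Small numbers take a single byte as a varint, while the fixed-width types always take 4 or 8 bytes regardless of the value.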