Replies: 3 comments
Some Ideas

Normal
TopK
Time-series format

Release the power unique to time-series data

Semantic type

Repeat one more time here.

Storage Format

In our current model, data is stored in the Parquet file plainly, i.e., row by row and column by column:
Thanks to the powerful Parquet format, this method works fine in terms of overall disk size and has good write performance. However, it falls short on read performance. Queries like
The proposed format collapses ts (time index) and fields into lists, while keeping tags unchanged. Reconsider the queries above,

Memory Format

The next part is the memory format, which is related to query execution.
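To make the proposed storage layout (not the memory format) more concrete, here is a minimal sketch in plain Python. The column names `host`, `region`, and `cpu`, the sample rows, and the `collapse` helper are all hypothetical and only illustrate the idea of keeping tags as-is while collapsing ts and fields into lists; this is not the actual implementation.

```python
# Hypothetical flat rows: host/region are tags, ts is the time index, cpu is a field.
rows = [
    {"host": "a", "region": "us", "ts": 1, "cpu": 0.5},
    {"host": "a", "region": "us", "ts": 2, "cpu": 0.7},
    {"host": "b", "region": "eu", "ts": 1, "cpu": 0.1},
]


def collapse(rows, tag_names, ts_name, field_names):
    """Group rows by tag values and collapse ts and each field into parallel lists."""
    series = {}
    for row in rows:
        key = tuple(row[t] for t in tag_names)
        if key not in series:
            # First row of this time-series: store the tags once, start empty lists.
            record = dict(zip(tag_names, key))
            record[ts_name] = []
            for f in field_names:
                record[f] = []
            series[key] = record
        record = series[key]
        record[ts_name].append(row[ts_name])
        for f in field_names:
            record[f].append(row[f])
    return list(series.values())


collapsed = collapse(rows, ["host", "region"], "ts", ["cpu"])
```

Each resulting record holds one time-series: the tag values appear once, and `ts` and `cpu` are lists of equal length, so the repeated tag values of the flat layout disappear.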
Both of them can alleviate the repeated tags. But both of them require new features in the existing query engine, as these nested types are always second-class citizens. And (b)
Other notes

What to include in a single record batch?

I prefer to include only one time-series per record batch if the execution plan is based on time-series logic. This may increase the number of record batches processed and the number of function calls. But it is also a great place to apply pipeline execution (the data domain is very clear, and a pipeline breaker can help concatenate batches), though we don't have one for now.

Is it okay to change our persist format?

I think it's acceptable. We don't change too much, it's easy to convert back, and the improvement it brings is worth it.
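As a rough illustration of why converting back should be easy, the sketch below expands one collapsed record (tags plus list-valued ts and fields) into flat rows again. The `expand` helper and the column names `host` and `cpu` are hypothetical, and the sketch assumes the ts list and every field list have the same length.

```python
def expand(record, ts_name, field_names):
    """Turn one collapsed time-series record back into flat rows."""
    # Everything that is neither the time index nor a field is treated as a tag.
    tags = {
        k: v for k, v in record.items()
        if k != ts_name and k not in field_names
    }
    rows = []
    for i, ts in enumerate(record[ts_name]):
        row = dict(tags)          # repeat the tag values on every row
        row[ts_name] = ts
        for f in field_names:
            row[f] = record[f][i]
        rows.append(row)
    return rows


flat = expand({"host": "a", "ts": [1, 2], "cpu": [0.5, 0.7]}, "ts", ["cpu"])
```

Here `flat` contains one row per timestamp, with the `host` tag repeated on each row, which matches the plain row-by-row layout.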
What type of enhancement is this?
Performance
What does the enhancement do?
Optimization rules
Data statistics & index
Storage format
Implementation challenges
No response