-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for storing non-CLP-encodable values in a separate column; Replace CLP row value that are too large to store in FixedByteMVMutableForwardIndex with an error message. #14365
base: master
Are you sure you want to change the base?
Add support for storing non-CLP-encodable values in a separate column; Replace CLP row value that are too large to store in FixedByteMVMutableForwardIndex with an error message. #14365
Conversation
…; Replace CLP row value that are too large to store in FixedByteMVMutableForwardIndex with an error message.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14365 +/- ##
============================================
+ Coverage 61.75% 64.02% +2.27%
- Complexity 207 1565 +1358
============================================
Files 2436 2663 +227
Lines 133233 146219 +12986
Branches 20636 22405 +1769
============================================
+ Hits 82274 93619 +11345
- Misses 44911 45706 +795
- Partials 6048 6894 +846
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -525,6 +526,9 @@ public boolean index(GenericRow row, @Nullable RowMetadata rowMetadata) | |||
} | |||
} | |||
|
|||
// NOTE: we msut do this before we index a single column to avoid partially indexing the row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change seems affecting non-CLP logic, can you please eloborate why this change is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an edge case that wasn't handled in the original Pinot code. CLP triggered this edge case in very extreme case. Specifically, if a MV column's content is larger than a chunk, Pinot will error with or without CLP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jackie-Jiang can you help review this section. is this introduce backward incompatible behavior?
The dictVar and encodedVar columns in CLP are stored in multi-value forward indexes, which require users to specify the maximum number of values per row at index creation time. However, this is impractical because the number of values per row is only determined during ingestion. To prevent ingestion errors when the number of values in a row exceeds the allowed limit for a multi-value row, the encoded value is replaced with an error message. To preserve the original content, the raw value is stored in a separate field.