Add support for storing non-CLP-encodable values in a separate column; replace CLP row values that are too large to store in FixedByteMVMutableForwardIndex with an error message. #14365

Open
wants to merge 6 commits into
base: master

jackluo923
Contributor

The dictVar and encodedVar columns in CLP are stored in multi-value forward indexes, which require users to specify the maximum number of values per row at index creation time. This is impractical because the number of values per row is known only at ingestion time. To prevent ingestion errors when a row has more values than the multi-value limit allows, this PR replaces the encoded value with an error message and stores the raw value in a separate field so that the original content is preserved.
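
The fallback described above could look roughly like the following. This is a minimal sketch and not the code in this PR: the class, the column names (`message_dictionaryVars`, `message_rawFallback`), the limit, and the placeholder text are all hypothetical, and no Pinot or CLP APIs are used.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the overflow fallback; none of these names come from Pinot or CLP. */
final class ClpOverflowFallbackSketch {
  // Assumed per-row limit, standing in for the value fixed at index creation time.
  static final int MAX_VALUES_PER_ROW = 4;
  // Assumed placeholder stored instead of the encoded values.
  static final String[] ERROR_PLACEHOLDER = {"<error: too many values to CLP-encode>"};

  /**
   * Builds the columns for one row. dictVars stands in for the already
   * CLP-encoded dictionary variables of rawMessage.
   */
  static Map<String, Object> buildRow(String rawMessage, String[] dictVars) {
    Map<String, Object> row = new HashMap<>();
    if (dictVars.length > MAX_VALUES_PER_ROW) {
      // Too many values for the multi-value forward index:
      // store an error message there and keep the raw text elsewhere.
      row.put("message_dictionaryVars", ERROR_PLACEHOLDER);
      row.put("message_rawFallback", rawMessage);
    } else {
      // Normal case: the encoded values fit, no fallback needed.
      row.put("message_dictionaryVars", dictVars);
      row.put("message_rawFallback", null);
    }
    return row;
  }
}
```

With this shape, a query over the encoded column sees the placeholder for oversized rows, while the original text remains recoverable from the fallback column.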


codecov-commenter commented Nov 2, 2024

Codecov Report

Attention: Patch coverage is 72.22222% with 5 lines in your changes missing coverage. Please review.

Project coverage is 64.02%. Comparing base (59551e4) to head (5f3f321).
Report is 1343 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...local/indexsegment/mutable/MutableSegmentImpl.java | 64.28% | 3 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #14365      +/-   ##
============================================
+ Coverage     61.75%   64.02%   +2.27%     
- Complexity      207     1565    +1358     
============================================
  Files          2436     2663     +227     
  Lines        133233   146219   +12986     
  Branches      20636    22405    +1769     
============================================
+ Hits          82274    93619   +11345     
- Misses        44911    45706     +795     
- Partials       6048     6894     +846     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration | 100.00% <ø> (+99.99%) ⬆️ |
| integration1 | 100.00% <ø> (+99.99%) ⬆️ |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.98% <72.22%> (+2.27%) ⬆️ |
| java-21 | 63.85% <72.22%> (+2.23%) ⬆️ |
| skip-bytebuffers-false | 64.02% <72.22%> (+2.27%) ⬆️ |
| skip-bytebuffers-true | 63.81% <72.22%> (+36.08%) ⬆️ |
| temurin | 64.02% <72.22%> (+2.27%) ⬆️ |
| unittests | 64.02% <72.22%> (+2.27%) ⬆️ |
| unittests1 | 55.60% <50.00%> (+8.71%) ⬆️ |
| unittests2 | 34.70% <72.22%> (+6.97%) ⬆️ |


@@ -525,6 +526,9 @@ public boolean index(GenericRow row, @Nullable RowMetadata rowMetadata)
}
}

// NOTE: we must do this before we index a single column to avoid partially indexing the row
Contributor

This change seems to affect non-CLP logic. Can you please elaborate on why it is needed?

Contributor Author

It's an edge case that wasn't handled in the original Pinot code. CLP triggers it only in very extreme cases. Specifically, if an MV column's content is larger than a chunk, Pinot will throw an error with or without CLP.
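
To illustrate why the check has to run before any column is written, here is a minimal sketch of a validate-then-write `index` method. It is not `MutableSegmentImpl`: the class, the in-memory column store, and the limit are hypothetical, and the sketch simply rejects an oversized row, which is an assumption and not necessarily what the PR does with it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; this is not Pinot's MutableSegmentImpl. */
final class ValidateBeforeIndexSketch {
  // Assumed per-row multi-value limit, standing in for the chunk capacity.
  static final int MAX_VALUES_PER_ROW = 3;

  // Column name -> one entry per indexed document.
  private final Map<String, List<Object[]>> columns = new LinkedHashMap<>();

  /** Returns true if the row was indexed, false if it was rejected untouched. */
  boolean index(Map<String, Object[]> row) {
    // Pass 1: validate every column before writing anything.
    for (Object[] values : row.values()) {
      if (values.length > MAX_VALUES_PER_ROW) {
        return false; // nothing has been written yet, so no partial row exists
      }
    }
    // Pass 2: all columns fit, so writing cannot fail halfway through the row.
    for (Map.Entry<String, Object[]> entry : row.entrySet()) {
      columns.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(entry.getValue());
    }
    return true;
  }

  /** Number of documents stored for a column (0 if the column is unknown). */
  int numDocs(String column) {
    List<Object[]> docs = columns.get(column);
    return docs == null ? 0 : docs.size();
  }
}
```

If the size check ran inside the write loop instead, a row whose second column overflows would leave its first column already indexed, and the columns would disagree on the document count.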

Contributor

@Jackie-Jiang, can you help review this section? Does this introduce backward-incompatible behavior?


3 participants