Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optimizeDictionaryType config to automatically choose dictionary type #14444

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

itschrispeck
Copy link
Collaborator

We see some mild improvements from using fixed length dictionaries when all values of a column in a segment are the same length. Since we programmatically manage our columns and use other optimizeDictionary functionality, we'd like to programmatically choose the dictionary type as well.

Previously the creation info is ignored during segment conversion. When the new config optimizeDictionaryType is true, then the creation info is used to determine the dictionary type (fixed width or var width). Tested in our internal cluster, inspected the log added and verified fixed length cols such as hash/uuid only cols use fixed dict, and variable length data uses var dict.

This PR also contains some minor fixes in TableConfigBuilder for missing configs.

@codecov-commenter
Copy link

codecov-commenter commented Nov 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (59551e4) to head (bf2a59d).
Report is 1324 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (59551e4) and HEAD (bf2a59d). Click for more details.

HEAD has 53 uploads less than BASE
Flag BASE (59551e4) HEAD (bf2a59d)
integration 7 1
integration2 3 1
temurin 12 1
java-21 7 1
skip-bytebuffers-true 3 1
skip-bytebuffers-false 7 0
unittests 5 0
unittests1 2 0
java-11 5 0
unittests2 3 0
integration1 2 0
custom-integration1 2 0
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #14444       +/-   ##
=============================================
- Coverage     61.75%    0.00%   -61.76%     
=============================================
  Files          2436        3     -2433     
  Lines        133233        6   -133227     
  Branches      20636        0    -20636     
=============================================
- Hits          82274        0    -82274     
+ Misses        44911        6    -44905     
+ Partials       6048        0     -6048     
Flag Coverage Δ
custom-integration1 ?
integration 0.00% <ø> (-0.01%) ⬇️
integration1 ?
integration2 0.00% <ø> (ø)
java-11 ?
java-21 0.00% <ø> (-61.63%) ⬇️
skip-bytebuffers-false ?
skip-bytebuffers-true 0.00% <ø> (-27.73%) ⬇️
temurin 0.00% <ø> (-61.76%) ⬇️
unittests ?
unittests1 ?
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if (_config.isOptimizeDictionaryType()) {
LOGGER.info("Overriding dictionary type for column: {} using var-length dictionary: {}", columnName,
columnIndexCreationInfo.isUseVarLengthDictionary());
dictConfig = new DictionaryIndexConfig(dictConfig, columnIndexCreationInfo.isUseVarLengthDictionary());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the default dictionary type? shall we name it OptimizeWithVarLengthDictionaryType?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is whichever is specified in the table config, the addition option will ignore the static config and use column index creation info - it does not necessary go var -> fixed or fixed -> var, so I think naming should stay generic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants