Add optimizeDictionaryType config to automatically choose dictionary type #14444
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #14444 +/- ##
=============================================
- Coverage 61.75% 0.00% -61.76%
=============================================
Files 2436 3 -2433
Lines 133233 6 -133227
Branches 20636 0 -20636
=============================================
- Hits 82274 0 -82274
+ Misses 44911 6 -44905
+ Partials 6048 0 -6048
Flags with carried forward coverage won't be shown.
if (_config.isOptimizeDictionaryType()) {
  LOGGER.info("Overriding dictionary type for column: {} using var-length dictionary: {}", columnName,
      columnIndexCreationInfo.isUseVarLengthDictionary());
  dictConfig = new DictionaryIndexConfig(dictConfig, columnIndexCreationInfo.isUseVarLengthDictionary());
What is the default dictionary type? Shall we name it OptimizeWithVarLengthDictionaryType?
It is whichever is specified in the table config. The additional option ignores the static config and uses the column index creation info instead. It does not necessarily go var -> fixed or fixed -> var, so I think the naming should stay generic.
We see some mild improvements from using fixed length dictionaries when all values of a column in a segment are the same length. Since we programmatically manage our columns and use other optimizeDictionary functionality, we'd like to programmatically choose the dictionary type as well.
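To make the "all values are the same length" condition concrete, here is a minimal sketch of such a check. The class and method names (`DictionaryTypeSketch`, `needsVarLengthDictionary`) are hypothetical and are not part of Pinot; how Pinot actually populates `isUseVarLengthDictionary()` during stats collection may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; not the Pinot implementation.
class DictionaryTypeSketch {

  /**
   * Returns true when the values do not all share the same UTF-8 byte length,
   * i.e. when a var-length dictionary would be needed. Returns false for an
   * empty list or when every value has the same byte length.
   */
  static boolean needsVarLengthDictionary(List<String> values) {
    int expectedLength = -1; // -1 means "no value seen yet"
    for (String value : values) {
      int length = value.getBytes(StandardCharsets.UTF_8).length;
      if (expectedLength == -1) {
        expectedLength = length;
      } else if (length != expectedLength) {
        return true; // found two values with different byte lengths
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Hash-like values of equal length: fixed-length dictionary is enough.
    System.out.println(needsVarLengthDictionary(List.of("a1b2", "c3d4", "e5f6")));
    // Values of differing length: var-length dictionary is needed.
    System.out.println(needsVarLengthDictionary(List.of("apple", "fig")));
  }
}
```

The check compares byte lengths rather than character counts, on the assumption that the dictionary stores encoded bytes.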
Previously, the creation info was ignored during segment conversion. When the new config optimizeDictionaryType is true, the creation info is used to determine the dictionary type (fixed width or var width). Tested in our internal cluster: inspected the added log and verified that fixed-length columns, such as hash/UUID-only columns, use a fixed dictionary and that variable-length data uses a var dictionary.
This PR also contains some minor fixes in TableConfigBuilder for missing configs.
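The selection rule described above can be summarized as a small decision function. This is a sketch under stated assumptions: `DictionaryTypeOverrideSketch` and `resolveUseVarLength` are hypothetical names, and the three booleans stand in for the new flag, the table config value, and the value from the column index creation info.

```java
// Hypothetical illustration only; not the Pinot implementation.
class DictionaryTypeOverrideSketch {

  /**
   * Picks the dictionary type for a column.
   *
   * @param optimizeDictionaryType stand-in for the new config flag
   * @param configuredUseVarLength value taken from the static table config
   * @param observedUseVarLength   value taken from the column index creation info
   * @return true to use a var-length dictionary, false for fixed-length
   */
  static boolean resolveUseVarLength(boolean optimizeDictionaryType, boolean configuredUseVarLength,
      boolean observedUseVarLength) {
    // With the flag on, the observed value replaces the static config in either
    // direction (var -> fixed or fixed -> var); with it off, the config stands.
    return optimizeDictionaryType ? observedUseVarLength : configuredUseVarLength;
  }

  public static void main(String[] args) {
    // Flag on: the static config says var-length, but the data is fixed-length.
    System.out.println(resolveUseVarLength(true, true, false));
    // Flag off: the static config is kept regardless of the data.
    System.out.println(resolveUseVarLength(false, true, false));
  }
}
```

Because the override can go in either direction, the sketch is consistent with keeping the config name generic rather than tying it to var-length dictionaries.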