-
Notifications
You must be signed in to change notification settings - Fork 5.5k
fix(plugin-iceberg): Iceberg time column type #26523
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Reviewer's guide (collapsed on small PRs)Reviewer's GuideIntroduces a dedicated mapping for Iceberg TIME types by adding an icebergTypeToHiveType helper and refactoring toHiveColumns to use this helper, ensuring TIME columns are mapped to Hive LONG instead of STRING. Class diagram for updated IcebergUtil type mappingclassDiagram
class IcebergUtil {
+List<Column> toHiveColumns(List<NestedField> columns)
+FileFormat getFileFormat(Table table)
-HiveType icebergTypeToHiveType(org.apache.iceberg.types.Type icebergType)
}
IcebergUtil --> Column
IcebergUtil --> HiveType
IcebergUtil --> HiveSchemaUtil
IcebergUtil --> org.apache.iceberg.types.Type
IcebergUtil --> FileFormat
IcebergUtil --> Table
class Column {
+String name
+HiveType type
+Optional comment
+Optional extra
}
class HiveType {
+static HiveType toHiveType(Type type)
+static HiveType HIVE_LONG
}
class HiveSchemaUtil {
+static Type convert(Type type)
}
class org.apache.iceberg.types.Type {
+TypeID typeId()
}
class TypeID {
TIME
// ...
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hantangwangd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PingLiuPing Thanks for this fix. Just to confirm, have you tested it manually? Curious but I still got an error when querying a table including a column of Time type in prestissimo. I'm sure my prestissimo is the latest version, and the type is already parsed as bigint instead of varchar.
Besides, is it possible to add a test case for this fix?
aditi-pandit
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PingLiuPing : Thanks for this code.
Prestissimo also has TIME logical type now. Since that is backed by the BIGINT physical type this code would be fine mostly, but why not use the logical type like TIMESTAMP here ?
Also, please can you add some tests.
|
Thanks for the comment. Yes, the error is expected (I guess the error should be reported from ParquetReader "!requestedType || isCompatible(requestedType, isRepeated"). This PR only address the time column data type issue. There should be few PR required in Velox to adapt to the new time data type which is been added few weeks ago. I have submitted few PRs in Velox for this but not all of them get merged, only one relate to writing time data type been merged. Yes, I will try to find a place to add a test case. |
|
@aditi-pandit thank you for comment.
Can you point me to the PR or code in Prestissimo? Thank you.
I think TIME data type is a different logical type than TIMESTAMP but their physical type can be same such as bigint (int64). And if we convert the type to TIMESTAMP here, when this type passed to Velox, it will be treated as TIMESTAMP. And the semantics is different with TIME. |
My bad... I meant Velox. https://github.com/facebookincubator/velox/blob/main/velox/type/Type.h#L1546
There is some confusion here... I meant TIMESTAMP in Iceberg uses logical type TIMESTAMP in Velox. So similarly TIME in Iceberg should use Velox TIME type. |
Description
Resolve issue #26214
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.