-
Notifications
You must be signed in to change notification settings - Fork 356
rfc: Modularize iceberg Implementations
#1854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Xuanwo <github@xuanwo.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has checked 374 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 306 | 1 | 67 | 0 |
Click to see the invalid file list
- docs/rfcs/0001_kernel.md
Use this command to fix any missing license headers
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
</details>
Signed-off-by: Xuanwo <github@xuanwo.io>
|
I'm thinking is reasonable to split out |
I think that's an interesting idea. It’s fine to just expose a spec crate, but how useful would it be? For reading snapshots, manifest lists, and manifests, users still need a FileIO. Is there a use case that users just want to ser/de a manifest file? |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks like a great plan personally -- though I am not a direct user of the crate so I don't have a whole lot of gravitas.
I will also try and hype this PR up some more to get some more commetns
|
btw #1857 should resolve the CI issue |
|
merged main to fix ci |
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Xuanwo <github@xuanwo.io>
docs/rfcs/0001_kernel.md
Outdated
|
|
||
| - **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.). | ||
| - **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol. | ||
| - **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Storage trait rather than FileIO trait?
From the code example below it seems like FileIO is a struct container for Storage and registry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that we have a iceberg-storage crate, which contains all fileio related things, including FileIO/InputFile/OutputFile struct, Storage trait, some builtin storage implemention(enabled independently by feature).
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Xuanwo for this proposal. I think the overall direction is correct, but I'm not sure we need a new kernel crate. What we need to do is move something like fileio, runtime api, local table execution out of iceberg crate, and this way it serves as the kernel crate, WDYT?
docs/rfcs/0001_kernel.md
Outdated
|
|
||
| - **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.). | ||
| - **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol. | ||
| - **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that we have a iceberg-storage crate, which contains all fileio related things, including FileIO/InputFile/OutputFile struct, Storage trait, some builtin storage implemention(enabled independently by feature).
docs/rfcs/0001_kernel.md
Outdated
| - **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.). | ||
| - **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol. | ||
| - **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates. | ||
| - **Stable facade for existing users**: the top-level `iceberg` crate continues to expose the familiar API by re-exporting the kernel plus a default engine feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that we should keep public api as stable as possible, but it maybe not practical before we reach 1.0. Currently we reply on review to ensure it, but after reaching 1.0 we should use some autonomous checks to ensure that.
docs/rfcs/0001_kernel.md
Outdated
| - Introduce standalone crates (`iceberg-runtime-tokio`, `iceberg-fileio-opendal`, etc.) that implement the new traits. | ||
|
|
||
| 3. **Phase 3 – Detach Arrow/execution** | ||
| - Move `arrow` helpers and `ArrowReaderBuilder` into a reference executor crate (e.g. `iceberg-engine-arrow`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, there are several parts in the to_arrow method:
ArrowReaderBuilderwhich is used to read one split of planned task. This is in fact compute engine independent, and we should leave it inicebergcrate.- An executor to use runtime to scan table in parallel. I think this part should be move to the new default engine crate.
iceberg Implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license-eye has checked 374 files.
| Valid | Invalid | Ignored | Fixed |
|---|---|---|---|
| 306 | 1 | 67 | 0 |
Click to see the invalid file list
- docs/rfcs/0001_modularize_iceberg_implementations.md
Use this command to fix any missing license headers
```bash
docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix
</details>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
|
I had an offline discussion with @liurenjie1024 and we’ve agreed to refactor our efforts to modularize the In short, we’ll expose all our core traits in In the past, we thought it was better to split into more crates, but we realized it just added to our maintenance burden keeping those tiny crates up to date and most of the time we need to use them all at once. |
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Xuanwo for raising this. I general I think the direction is right, we just need to refine the action items.
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
|
cc @liurenjie1024 and @kevinjqliu please review again. |
liurenjie1024
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Xuanwo for this rfc, LGTM!
|
Let's keep this for a while to have more eyes on it. |
Which issue does this PR close?
What changes are included in this PR?
Add RFC for iceberg-kernel
Are these changes tested?