Skip to content

Conversation

@Xuanwo
Copy link
Member

@Xuanwo Xuanwo commented Nov 13, 2025

Which issue does this PR close?

What changes are included in this PR?

Add RFC for iceberg-kernel

Are these changes tested?

Signed-off-by: Xuanwo <github@xuanwo.io>
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has checked 374 files.

Valid Invalid Ignored Fixed
306 1 67 0
Click to see the invalid file list
  • docs/rfcs/0001_kernel.md
Use this command to fix any missing license headers
```bash

docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix

</details>

Signed-off-by: Xuanwo <github@xuanwo.io>
@ZENOTME
Copy link
Contributor

ZENOTME commented Nov 13, 2025

I'm thinking is reasonable to split out spec part as a single minimum module? It just provide the memory representation of iceberg spec and de/serialize way. It can be evolve quickly and as a minimum module reuse by external user or inner module.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 14, 2025

I'm thinking is reasonable to split out spec part as a single minimum module? It just provide the memory representation of iceberg spec and de/serialize way. It can be evolve quickly and as a minimum module reuse by external user or inner module.

I think that's an interesting idea. It’s fine to just expose a spec crate, but how useful would it be? For reading snapshots, manifest lists, and manifests, users still need a FileIO. Is there a use case that users just want to ser/de a manifest file?

Copy link

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks like a great plan personally -- though I am not a direct user of the crate so I don't have a whole lot of gravitas.

I will also try and hype this PR up some more to get some more commetns

@kevinjqliu
Copy link
Contributor

btw #1857 should resolve the CI issue

@kevinjqliu
Copy link
Contributor

merged main to fix ci

Xuanwo and others added 3 commits November 17, 2025 17:32
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Xuanwo <github@xuanwo.io>

- **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.).
- **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol.
- **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Storage trait rather than FileIO trait?

From the code example below it seems like FileIO is a struct container for Storage and registry

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that we have a iceberg-storage crate, which contains all fileio related things, including FileIO/InputFile/OutputFile struct, Storage trait, some builtin storage implemention(enabled independently by feature).

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Xuanwo for this proposal. I think the overall direction is correct, but I'm not sure we need a new kernel crate. What we need to do is move something like fileio, runtime api, local table execution out of iceberg crate, and this way it serves as the kernel crate, WDYT?


- **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.).
- **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol.
- **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that we have a iceberg-storage crate, which contains all fileio related things, including FileIO/InputFile/OutputFile struct, Storage trait, some builtin storage implemention(enabled independently by feature).

- **Full read & write coverage**: the kernel must contain every protocol component required for both scan planning and transactional writes (append, rewrite, commit, etc.).
- **No default runtime dependency**: the kernel defines a `Runtime` trait instead of depending on Tokio or Smol.
- **No default storage dependency**: the kernel defines `FileIO` traits only; concrete implementations (for example `iceberg-fileio-opendal`) live in dedicated crates.
- **Stable facade for existing users**: the top-level `iceberg` crate continues to expose the familiar API by re-exporting the kernel plus a default engine feature.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should keep public api as stable as possible, but it maybe not practical before we reach 1.0. Currently we reply on review to ensure it, but after reaching 1.0 we should use some autonomous checks to ensure that.

- Introduce standalone crates (`iceberg-runtime-tokio`, `iceberg-fileio-opendal`, etc.) that implement the new traits.

3. **Phase 3 – Detach Arrow/execution**
- Move `arrow` helpers and `ArrowReaderBuilder` into a reference executor crate (e.g. `iceberg-engine-arrow`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, there are several parts in the to_arrow method:

  1. ArrowReaderBuilder which is used to read one split of planned task. This is in fact compute engine independent, and we should leave it in iceberg crate.
  2. An executor to use runtime to scan table in parallel. I think this part should be move to the new default engine crate.

Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo changed the title docs: Add RFC for iceberg-kernel rfc: Modularize iceberg Implementations Nov 19, 2025
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license-eye has checked 374 files.

Valid Invalid Ignored Fixed
306 1 67 0
Click to see the invalid file list
  • docs/rfcs/0001_modularize_iceberg_implementations.md
Use this command to fix any missing license headers
```bash

docker run -it --rm -v $(pwd):/github/workspace apache/skywalking-eyes header fix

</details>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 19, 2025

I had an offline discussion with @liurenjie1024 and we’ve agreed to refactor our efforts to modularize the iceberg implementations instead of adding a new iceberg-kernel crate.

In short, we’ll expose all our core traits in iceberg and make it modular so users can plug in whatever they need. At the same time, we’ll provide a pre-built stack to make it easier for users to start scanning an iceberg table.

In the past, we thought it was better to split into more crates, but we realized it just added to our maintenance burden keeping those tiny crates up to date and most of the time we need to use them all at once.

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Xuanwo for raising this. I general I think the direction is right, we just need to refine the action items.

@Xuanwo
Copy link
Member Author

Xuanwo commented Nov 24, 2025

cc @liurenjie1024 and @kevinjqliu please review again.

Signed-off-by: Xuanwo <github@xuanwo.io>
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Xuanwo for this rfc, LGTM!

@liurenjie1024
Copy link
Contributor

Let's keep this for a while to have more eyes on it.

@liurenjie1024 liurenjie1024 requested a review from Fokko November 26, 2025 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants