feat: CometNativeWriteExec support with native scan as a child #2839
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #2839 +/- ##
============================================
+ Coverage 56.12% 59.17% +3.05%
- Complexity 976 1490 +514
============================================
Files 119 167 +48
Lines 11743 15274 +3531
Branches 2251 2524 +273
============================================
+ Hits 6591 9039 +2448
- Misses 4012 4945 +933
- Partials 1140 1290 +150
comphead left a comment
wondering if there could be a test for this? 🤔
Added.
// Perform native write
df.write.parquet(outputPath)

// Wait for listener to be called with timeout
nit: use sparkContext.listenerBus.waitUntilEmpty() or org.scalatest.concurrent.Eventually#eventually
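A minimal sketch of the suggestion, assuming `df`, `spark`, and `outputPath` as in the quoted test, with a hypothetical `listenerCalled` flag set by the test's SparkListener; both waiting styles are shown:

```scala
import java.util.concurrent.atomic.AtomicBoolean

import org.scalatest.concurrent.Eventually.eventually
import org.scalatest.concurrent.PatienceConfiguration.{Interval, Timeout}
import org.scalatest.time.{Millis, Seconds, Span}

// Hypothetical flag the test's SparkListener sets when the write event arrives.
val listenerCalled = new AtomicBoolean(false)

df.write.parquet(outputPath)

// Option 1: drain every queued listener event before asserting
// (listenerBus is private[spark], so this only compiles from test code
// living in a Spark package).
spark.sparkContext.listenerBus.waitUntilEmpty()

// Option 2: poll the assertion instead of sleeping for a fixed timeout.
eventually(Timeout(Span(10, Seconds)), Interval(Span(100, Millis))) {
  assert(listenerCalled.get(), "Expected the listener to observe the native write")
}
```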
Thanks, I'll update the tests soon to use this approach.
wForget left a comment
Thanks @mbutrovich, LGTM
val outputDir = new File(outputPath)
val partFiles = outputDir.listFiles().filter(_.getName.startsWith("part-"))
// With 1000 rows and default parallelism, we should get multiple partitions
assert(partFiles.length > 1, "Expected multiple part files to be created")
should we check the exact number of partitions? example: if you write a df hash-partitioned by 50, we should have 50 files
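For reference, a sketch of that stricter check (assuming `spark` and `outputPath` as in the test above, not the PR's actual test):

```scala
import java.io.File

import org.apache.spark.sql.functions.col

// Hash-partition into a known number of partitions before writing.
val df = spark.range(1000).repartition(50, col("id"))
df.write.parquet(outputPath)

val partFiles = new File(outputPath)
  .listFiles()
  .filter(_.getName.startsWith("part-"))

// Each non-empty partition produces exactly one part file; with 1000 rows
// hashed across 50 partitions, all 50 should be non-empty.
assert(partFiles.length == 50, s"Expected 50 part files, got ${partFiles.length}")
```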
I just moved that logic. Since this is a pretty early proof-of-concept feature from @andygrove, I'm not too inclined to change test behavior in this PR.
andygrove left a comment
Thanks @mbutrovich @wForget
Which issue does this PR close?
Closes #.
Rationale for this change
Support native scan (tested with COMET_PARQUET_SCAN_IMPL=native_datafusion) as a child. Previously, the native scan child was never converted.
What changes are included in this PR?
One change to reset the firstNativeOp flag, and a lot of documentation to explain why.
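Roughly, the fix follows this pattern (a hypothetical sketch with assumed names such as `toNativeWrite` and `toNativeScan`, not Comet's actual code): the write marks the start of a new native section, so the flag must be reset before its child is visited.

```scala
import org.apache.spark.sql.execution.{FileSourceScanExec, SparkPlan}
import org.apache.spark.sql.execution.command.DataWritingCommandExec

var firstNativeOp: Boolean = true

// Hypothetical converters, stubbed purely for illustration.
def toNativeWrite(w: DataWritingCommandExec, children: Seq[SparkPlan]): SparkPlan = ???
def toNativeScan(s: FileSourceScanExec): SparkPlan = ???

def convert(plan: SparkPlan): SparkPlan = plan match {
  case write: DataWritingCommandExec =>
    // The native write begins a fresh native section. Without resetting the
    // flag here, the scan below would never be considered for conversion.
    firstNativeOp = true
    toNativeWrite(write, write.children.map(convert))
  case scan: FileSourceScanExec if firstNativeOp =>
    firstNativeOp = false
    toNativeScan(scan)
  case other =>
    other.withNewChildren(other.children.map(convert))
}
```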
Existing test, but with COMET_PARQUET_SCAN_IMPL=native_datafusion.
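For example, the existing suite can be re-run under the native scan. A sketch, assuming the suite mixes in Spark's SQLHelper for `withSQLConf` and that the config key is `spark.comet.scan.impl` (check CometConf for the exact key):

```scala
withSQLConf("spark.comet.scan.impl" -> "native_datafusion") {
  // Run the existing write test body; the scan child of the native write
  // should now be planned as a native DataFusion scan.
  spark.range(1000).write.parquet(outputPath)
}
```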