@zhuqi-lucas
Collaborator

This PR improves the performance of JSON array support in DataFusion.

Upstream PR:

apache#19924

Copilot AI left a comment

Pull request overview

This PR improves performance for JSON array format support in DataFusion by introducing a streaming converter that transforms JSON array format to NDJSON on-the-fly, avoiding the need to load entire files into memory. It also renames format_array to newline_delimited with inverted semantics for better clarity, and renames NdJsonReadOptions to JsonReadOptions to reflect that it now supports both formats.

Changes:

  • Implements JsonArrayToNdjsonReader - a streaming converter that processes JSON array format in chunks without loading entire files
  • Renames format_array option to newline_delimited (with inverted boolean semantics) across protobuf, config, and API
  • Renames NdJsonReadOptions to JsonReadOptions with deprecation for backward compatibility
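
The streaming conversion described in the first bullet can be sketched with the standard library alone. This is a minimal illustration, not the PR's actual `JsonArrayToNdjsonReader`: the type `ArrayToNdjson` and the helper `array_to_ndjson_string` are hypothetical names, and the sketch assumes well-formed input whose top-level value is a JSON array. It wraps any `Read`, tracks nesting depth and string state byte by byte, drops the outer brackets, and turns each top-level comma into a newline, so memory use is bounded by the caller's buffer rather than the file size.

```rust
use std::io::{self, Read};

/// Hypothetical adapter (not the PR's type): reads a JSON array from `inner`
/// and yields the same elements as newline-delimited JSON.
pub struct ArrayToNdjson<R: Read> {
    inner: R,
    depth: usize,    // 1 means "directly inside the top-level array"
    in_string: bool, // currently inside a JSON string literal
    escaped: bool,   // previous byte was a backslash inside a string
}

impl<R: Read> ArrayToNdjson<R> {
    pub fn new(inner: R) -> Self {
        Self { inner, depth: 0, in_string: false, escaped: false }
    }

    /// Maps one input byte to at most one output byte.
    fn step(&mut self, b: u8) -> Option<u8> {
        if self.in_string {
            // Inside a string every byte is copied verbatim.
            if self.escaped {
                self.escaped = false;
            } else if b == b'\\' {
                self.escaped = true;
            } else if b == b'"' {
                self.in_string = false;
            }
            return Some(b);
        }
        match b {
            b'"' => { self.in_string = true; Some(b) }
            // Outer brackets of the array are dropped.
            b'[' if self.depth == 0 => { self.depth = 1; None }
            b']' if self.depth == 1 => { self.depth = 0; None }
            b'[' | b'{' => { self.depth += 1; Some(b) }
            b']' | b'}' => { self.depth = self.depth.saturating_sub(1); Some(b) }
            // A comma between top-level elements becomes a record separator.
            b',' if self.depth == 1 => Some(b'\n'),
            // Whitespace outside strings is insignificant; dropping it keeps
            // pretty-printed elements on a single line.
            b' ' | b'\t' | b'\r' | b'\n' => None,
            _ => Some(b),
        }
    }
}

impl<R: Read> Read for ArrayToNdjson<R> {
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if out.is_empty() {
            return Ok(0);
        }
        loop {
            let n = self.inner.read(out)?;
            if n == 0 {
                return Ok(0); // underlying reader is exhausted
            }
            // Compact in place: the write index never passes the read index.
            let mut written = 0;
            for i in 0..n {
                if let Some(c) = self.step(out[i]) {
                    out[written] = c;
                    written += 1;
                }
            }
            if written > 0 {
                return Ok(written);
            }
            // The whole chunk was dropped; read again instead of signaling EOF.
        }
    }
}

/// Hypothetical convenience wrapper used to exercise the adapter.
pub fn array_to_ndjson_string(input: &str) -> io::Result<String> {
    let mut s = String::new();
    ArrayToNdjson::new(input.as_bytes()).read_to_string(&mut s)?;
    Ok(s)
}
```

Because the adapter itself implements `Read`, it could in principle be placed in front of any NDJSON decoder that accepts a reader. The sketch does not validate its input and emits no trailing newline; a production converter would need to handle both, as well as async streams.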

Reviewed changes

Copilot reviewed 22 out of 26 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| datafusion/datasource-json/src/utils.rs | New streaming JSON array to NDJSON converter implementation |
| datafusion/datasource-json/src/source.rs | Updated to use streaming converter for JSON array format in both file and stream cases |
| datafusion/datasource-json/src/file_format.rs | Updated schema inference to use streaming converter |
| datafusion/proto-common/proto/datafusion_common.proto | Protobuf definition changes: removed compression_level, replaced format_array with newline_delimited |
| datafusion/proto-common/src/generated/*.rs | Generated code from protobuf changes |
| datafusion/core/src/datasource/file_format/options.rs | Renamed NdJsonReadOptions to JsonReadOptions with deprecation alias |
| datafusion/core/src/prelude.rs | Export JsonReadOptions instead of NdJsonReadOptions |
| datafusion/common/src/config.rs | Updated config option from format_array to newline_delimited |
| datafusion/sqllogictest/test_files/json.slt | Updated test to use newline_delimited option |
| datafusion/core/src/datasource/file_format/json.rs | Added comprehensive tests for JSON array format |
| Other test files | Updated to use JsonReadOptions instead of NdJsonReadOptions |
| Cargo.toml, Cargo.lock | Added tokio-stream and tokio-util dependencies |
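
The rename with inverted semantics and the deprecation alias noted for options.rs and config.rs can be illustrated roughly as follows. This is a simplified stand-in, not the real DataFusion definition: the struct below has a single illustrative field, the default of `true` is an assumption, and `newline_delimited_from_format_array` is a hypothetical helper showing how a legacy `format_array` value would map onto the new flag.

```rust
/// Simplified stand-in for the renamed options struct (the real one has
/// more fields; only the flag discussed in this PR is modeled here).
#[derive(Debug, Clone, PartialEq)]
pub struct JsonReadOptions {
    /// `true`: one JSON value per line (NDJSON); `false`: a single JSON array.
    pub newline_delimited: bool,
}

impl Default for JsonReadOptions {
    fn default() -> Self {
        // Assumption: NDJSON remains the default, matching the old behavior
        // where `format_array` defaulted to `false`.
        Self { newline_delimited: true }
    }
}

/// The old name keeps compiling but warns callers to migrate.
#[deprecated(note = "use JsonReadOptions instead")]
pub type NdJsonReadOptions = JsonReadOptions;

/// Hypothetical migration helper: the new flag is the negation of the old one.
pub fn newline_delimited_from_format_array(format_array: bool) -> bool {
    !format_array
}
```

A type alias marked `#[deprecated]` lets existing code that names `NdJsonReadOptions` continue to build while emitting a warning, which is one common way to stage a rename like this.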

@zhuqi-lucas zhuqi-lucas merged commit bb8195e into branch-51 Jan 30, 2026
64 of 65 checks passed