Test: configuration fuzzer for (external) sort queries #15501

2010YOUY01 · 2025-03-31T07:42:26Z

Which issue does this PR close?

Closes #.

Rationale for this change

We have recently found several bugs in out-of-core sorting, and many of the potential fixes and improvements will change the sort execution code, so more test coverage is needed.

The existing sort fuzzer https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fuzz_cases/sort_fuzz.rs only tests SortExec rather than end-to-end sort queries, and it only covers int and string types.
This new fuzzer runs end-to-end sort queries with multiple partitions, extends coverage to more data types, includes TopK executors, and tests all related configuration options.

The new sort fuzzer is able to detect the known bugs in #14748, so the random query generation avoids those known issues (e.g. it doesn't generate utf8 sort keys; more details can be found in the code comments).
It has also detected several new bugs, for example #15469 and #15355.
Other failures were detected whose cause is not obvious, so for now the fuzzer only runs with an unbounded memory limit by default. Perhaps we should retry after known issues like #14748 are cleared. An example of such a failing query is in the test case test_sort_query_fuzzer_reproduce in sort_query_fuzz.rs.
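As a rough illustration of "avoid known issues during query generation", here is a minimal sketch that filters utf8 columns out of the candidate sort keys before building an ORDER BY query. The names ColType, Column, eligible_sort_keys and build_query are hypothetical and are not taken from sort_query_fuzz.rs; the actual generator may be structured quite differently.

```rust
/// Simplified column types for this sketch; the real generator covers many more.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ColType {
    UInt32,
    TimestampMs,
    Utf8,
}

/// Hypothetical column description (name plus type).
struct Column {
    name: &'static str,
    ty: ColType,
}

/// Columns the generator may sort on. Utf8 is skipped to sidestep the known issue.
fn eligible_sort_keys(columns: &[Column]) -> Vec<&Column> {
    columns.iter().filter(|c| c.ty != ColType::Utf8).collect()
}

/// Builds an ORDER BY query from (column, descending) pairs.
fn build_query(table: &str, keys: &[(&Column, bool)]) -> String {
    let order_by: Vec<String> = keys
        .iter()
        .map(|(col, desc)| format!("{} {}", col.name, if *desc { "DESC" } else { "ASC" }))
        .collect();
    format!("SELECT * FROM {} ORDER BY {}", table, order_by.join(", "))
}
```

Once the underlying bugs are fixed, the filter in eligible_sort_keys is the only place that would need to change to bring utf8 keys back.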

Implementation

There are two key structs in the sort query fuzzer:

SortQueryFuzzer: controls the runner config like how many rounds to run, and how many queries to test inside each round.
SortFuzzerTestGenerator: Generates random datasets, queries, and configs.
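To make the split between runner and generator concrete, here is a hedged sketch of the runner side only. It assumes the fuzzer iterates rounds, then queries, then configs (the log line below mentions all three); RunnerConfig, Case and plan are hypothetical names, not the PR's API.

```rust
/// Hypothetical runner settings, mirroring the "rounds" and "queries per round"
/// knobs described above plus the config index that appears in the log line.
struct RunnerConfig {
    rounds: usize,
    queries_per_round: usize,
    configs_per_query: usize,
}

/// One unit of work: which round, which query, and which config to run it with.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Case {
    round: usize,
    query: usize,
    config: usize,
}

/// Enumerates every (round, query, config) combination in execution order.
fn plan(cfg: &RunnerConfig) -> Vec<Case> {
    let mut cases = Vec::new();
    for round in 0..cfg.rounds {
        for query in 0..cfg.queries_per_round {
            for config in 0..cfg.configs_per_query {
                cases.push(Case { round, query, config });
            }
        }
    }
    cases
}
```

In this sketch the generator would be asked for a dataset, a query and a config for each Case, which keeps all randomness out of the runner.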

The log looks like

[SortQueryFuzzer] Round 0, Query 0 (Config 9)
  Seeds:
    init_seed   = 5007153919587973719
    query_seed  = 4047642740781589262
    config_seed = 13112349042886494469
  Dataset schema:
    [timestamp_ns:Timestamp(Nanosecond, None);N, u32:UInt32;N, time32_ms:Time32(Millisecond);N, i16:Int16;N, timestamp_ms:Timestamp(Millisecond, None);N, u8_low:UInt8;N]
  Query:
    SELECT * FROM sort_fuzz_table ORDER BY u32 DESC
  Config:
    Dataset size: 93.2 KB
    Number of partitions: 3
    Batch size: 6
    Memory limit: Unbounded
    Per partition memory limit: Unbounded
    Sort spill reservation bytes: 10.3 KB
    Sort in place threshold bytes: 540.0 B

There is a utility function to deterministically reproduce a failed execution from the seeds above; see the test case test_sort_query_fuzzer_reproduce in sort_query_fuzz.rs for an example.
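The idea behind seed-based reproduction can be sketched as follows. This is not the code from the PR: it uses a small SplitMix64 generator so the example needs no external crates (the fuzzer itself presumably relies on the rand crate), and SortConfig, config_from_seed and the value ranges are made up for illustration.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only to avoid dependencies.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in `lo..=hi` (modulo bias is ignored for this illustration).
    fn gen_range(&mut self, lo: u64, hi: u64) -> u64 {
        lo + self.next_u64() % (hi - lo + 1)
    }
}

/// Hypothetical subset of the options shown in the log above.
#[derive(Debug, PartialEq)]
struct SortConfig {
    partitions: u64,
    batch_size: u64,
    sort_spill_reservation_bytes: u64,
}

/// Same seed in, same config out: this is what makes a logged failure replayable.
fn config_from_seed(config_seed: u64) -> SortConfig {
    let mut rng = SplitMix64::new(config_seed);
    SortConfig {
        partitions: rng.gen_range(1, 8),
        batch_size: rng.gen_range(1, 1024),
        sort_spill_reservation_bytes: rng.gen_range(0, 64 * 1024),
    }
}
```

Because every random choice flows from the seed, pasting the three logged seeds into a reproduce helper is enough to rebuild the same dataset, query and config.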

What changes are included in this PR?

Refactor: move RecordBatchGenerator and related structs from datafusion/core/tests/fuzz_cases/aggregation_fuzzer/data_generator.rs to datafusion/core/tests/fuzz_cases/record_batch_generator.rs, so the sort fuzzer can reuse the aggregation fuzzer's dataset generation utilities and generate datasets with more types.
Implement SortQueryFuzzer and SortFuzzerTestGenerator in sort_query_fuzz.rs.

The entry point is now in sort_query_fuzz.rs, and the fuzzer is limited to run for up to 20 seconds (around 75 queries on my machine) to keep the overall test runtime reasonable. Once the known issues are resolved and the fuzzer runs stably without uncovering new bugs, we can move it to the extended test CI job and allow it to run for a longer duration.
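A time-bounded fuzz loop of this kind can be sketched with the standard library alone. The function name run_with_time_budget and the max_queries cap are assumptions made for this example, not names from the PR.

```rust
use std::time::{Duration, Instant};

/// Runs `run_one_query` repeatedly until `budget` has elapsed or `max_queries`
/// have run, whichever comes first. Returns how many queries were executed.
fn run_with_time_budget<F>(budget: Duration, max_queries: usize, mut run_one_query: F) -> usize
where
    F: FnMut(usize),
{
    let start = Instant::now();
    let mut executed = 0;
    // The clock is checked between queries, so a single slow query can
    // still overshoot the budget.
    while executed < max_queries && start.elapsed() < budget {
        run_one_query(executed);
        executed += 1;
    }
    executed
}
```

With a 20 second budget, a call such as run_with_time_budget(Duration::from_secs(20), usize::MAX, |i| { /* run query i */ }) stops at the first iteration boundary after the deadline.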

Are these changes tested?

NA

Are there any user-facing changes?

No.

Copilot

Pull Request Overview

This PR introduces a new configuration fuzzer for end-to-end sort queries to improve test coverage for sorting, including TopK executors and multiple dataset types. Key changes include:

Adding the new module "sort_query_fuzz" and its associated entry point.
Refactoring dataset generation utilities by moving RecordBatchGenerator-related code into a shared utility module.
Modifying the AggregationFuzzer API to require a mutable self reference during async execution.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

File	Description
datafusion/core/tests/fuzz_cases/mod.rs	Registers the new sort_query_fuzz module and adds the record_batch_generator utility module.
datafusion/core/tests/fuzz_cases/aggregation_fuzzer/mod.rs	Updates the import of ColumnDescr to use the shared record_batch_generator module.
datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs	Changes async functions to accept a mutable self reference to support internal state modifications.
datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs	Refactors inline column definitions to use the shared get_supported_types_columns function for dataset setup.

Comments suppressed due to low confidence (2)

datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs:167

[nitpick] Changing the function signature to require a mutable self reference indicates that the fuzzer's internal state is modified during execution; consider updating the function's documentation to clearly reflect this behavior for future maintainers.

pub async fn run(&mut self) {

datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs:203

[nitpick] Ensure that get_supported_types_columns returns a comprehensive set of columns covering all the data types previously specified explicitly to avoid any unintended gap in test coverage.

let columns = get_supported_types_columns(rng.gen());

alamb

This is awesome -- thank you @2010YOUY01

I think we should consider the random seed along with the runtime of this test, but otherwise it looks really nice

There are other bugs detected and it's not obvious why it fails, so now by default, it will only run with unbounded memory limit,

Could you make a list of known issues that are preventing this fuzz test from being able to run with bounded memory limits (or a placeholder if we don't know why it fails)?

I think it would be valuable to track the progress towards "enable sort fuzzing with limited memory" as I think others would be interested in helping and we would avoid leaving a half-completed test in here.

alamb · 2025-04-01T00:11:45Z

datafusion/core/tests/fuzz_cases/record_batch_generator.rs

+
+/// Columns that are supported by the record batch generator
+/// The RNG is used to generate the precision and scale for the decimal columns, thread
+/// RNG is not used because this is used in fuzzing and deterministic results are preferred


I think when fuzzing it probably would be preferred to use the thread_rng to add additional coverage.

It does make the test failures unpredictable, however.
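One possible middle ground, sketched under assumptions: pick a fresh seed on every run for coverage, print it, and accept an override so a failure can be replayed. The environment variable SORT_FUZZ_SEED and both function names are invented for this example, and the clock-derived seed is only a dependency-free stand-in for something like thread_rng.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Returns the seed to use for this run.
/// `override_value` is the content of a hypothetical override (for example an
/// environment variable); when it parses as a u64 it wins, so a failure can be replayed.
fn choose_seed(override_value: Option<&str>) -> u64 {
    if let Some(seed) = override_value.and_then(|s| s.trim().parse::<u64>().ok()) {
        return seed;
    }
    // No usable override: derive a fresh seed from the clock so each run
    // explores new inputs.
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos() as u64)
        .unwrap_or(0)
}

/// Picks the seed and prints it, so a failing run can be reproduced later.
fn seed_for_run() -> u64 {
    // `SORT_FUZZ_SEED` is a made-up variable name for this sketch.
    let from_env = std::env::var("SORT_FUZZ_SEED").ok();
    let seed = choose_seed(from_env.as_deref());
    println!("init_seed = {seed}");
    seed
}
```

This keeps failures reproducible as long as the printed seed is captured in the test log, at the cost of runs no longer being identical by default.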