
[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed #12199

Open · pxLi opened this issue Feb 24, 2025 · 4 comments
Labels: bug (Something isn't working)

pxLi (Member) commented Feb 24, 2025

Describe the bug
rapids_it-non-utc-pre_release, run:183

Note: this also failed in branch-25.02, so the case may fail only with a specific TZ and DATAGEN_SEED.

Failed case with Spark 3.4.1:

src.main.python.window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(ProjectExec,FilterExec,FileSourceScanExec,BatchScanExec,CollectLimitExec,DeserializeToObjectExec,DataWritingCommandExec,WriteFilesExec,ShuffleExchangeExec,ExecutedCommandExec)]

assert failure

AssertionError: CPU and GPU list have different lengths at [4, 'sort_array(cc_float, true)'] CPU: 76 GPU: 75
@ignore_order(local=True)
    @allow_non_gpu(*non_utc_allow)
    def test_window_aggs_for_rows_collect_set():
>       assert_gpu_and_cpu_are_equal_sql(
            lambda spark: gen_df(spark, _gen_data_for_collect_set),
            "window_collect_table",
            '''
            select a, b,
                sort_array(cc_bool),
                sort_array(cc_int),
                sort_array(cc_long),
                sort_array(cc_short),
                sort_array(cc_date),
                sort_array(cc_ts),
                sort_array(cc_byte),
                sort_array(cc_str),
                sort_array(cc_float),
                sort_array(cc_double),
                sort_array(cc_decimal_32),
                sort_array(cc_decimal_64),
                sort_array(cc_decimal_128),
                sort_array(cc_fp_nan)
            from (
                select a, b,
                  collect_set(c_bool) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_bool,
                  collect_set(c_int) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_int,
                  collect_set(c_long) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_long,
                  collect_set(c_short) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_short,
                  collect_set(c_date) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_date,
                  collect_set(c_timestamp) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_ts,
                  collect_set(c_byte) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_byte,
                  collect_set(c_string) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_str,
                  collect_set(c_float) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_float,
                  collect_set(c_double) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_double,
                  collect_set(c_decimal_32) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_32,
                  collect_set(c_decimal_64) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_64,
                  collect_set(c_decimal_128) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_128,
                  collect_set(c_fp_nan) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_fp_nan
                from window_collect_table
            ) t
            ''',
            conf={'spark.rapids.sql.window.collectSet.enabled': True})

../../src/main/python/window_function_test.py:1457: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:641: in assert_gpu_and_cpu_are_equal_sql
    assert_gpu_and_cpu_are_equal_collect(do_it_all, conf, is_cpu_first=is_cpu_first)
../../src/main/python/asserts.py:599: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:521: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:111: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cpu = [-960209551360.0, -265976643584.0, -285.2320861816406, -1.8941408514047178e-20, -0.0, 0.0, ...]
gpu = [-960209551360.0, -265976643584.0, -285.2320861816406, -1.8941408514047178e-20, -0.0, 3.651951356399018e+32, ...]
float_check = <function get_float_check.<locals>.<lambda> at 0x7efdf3ff13f0>
path = [4, 'sort_array(cc_float, true)']

    def _assert_equal(cpu, gpu, float_check, path):
        t = type(cpu)
        if (t is Row):
            assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
                assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
                for field in cpu.__fields__:
                    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
            else:
                for index in range(len(cpu)):
                    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is list):
>           assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
E           AssertionError: CPU and GPU list have different lengths at [4, 'sort_array(cc_float, true)'] CPU: 76 GPU: 75

../../src/main/python/asserts.py:41: AssertionError


pxLi added the "? - Needs Triage" and "bug" labels on Feb 24, 2025
pxLi (Member, Author) commented Feb 24, 2025

cc @sperlingxx @res-life to help check, thanks

pxLi changed the title from "[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed non-UTC case" to "[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed" on Feb 24, 2025
res-life (Collaborator) commented Feb 24, 2025

Reproduced on v24.12.1, branch-25.02, and branch-25.04 with DATAGEN_SEED=1740159968.
Without this seed the test passes, so the seed likely triggers a corner case.
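For reference, a repro sketch pinning the seed and timezone from the failing run. The script path (`integration_tests/run_pyspark_from_build.sh`) and the use of the `DATAGEN_SEED`/`TZ` environment variables are assumptions based on the spark-rapids integration-test conventions; adjust to your checkout.

```shell
# Pin the exact data-generation seed and timezone from the failing CI run.
export DATAGEN_SEED=1740159968
export TZ=America/Punta_Arenas

# Run only the failing test (path/flags are assumptions; guard so this is a
# no-op outside a spark-rapids checkout).
if [ -x ./integration_tests/run_pyspark_from_build.sh ]; then
  ./integration_tests/run_pyspark_from_build.sh -k test_window_aggs_for_rows_collect_set
else
  echo "run from a spark-rapids checkout"
fi
```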

res-life (Collaborator) commented

@mythrocks could you please help check this? The number of result rows differs:
CPU: 76, GPU: 75

revans2 (Collaborator) commented Feb 24, 2025

It appears to be a 0.0 vs. -0.0 issue. If you look at the CPU output, the last values shown are 0.0 and -0.0, whereas the GPU only shows 0.0. I would personally consider this a bug in Spark, and we might want to file an issue against them for this.
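The ±0.0 ambiguity can be illustrated with a small Python sketch (this mirrors the IEEE-754 signed-zero semantics only, not the actual Spark or RAPIDS `collect_set` code): numeric equality says the two zeros are equal, but their bit patterns differ, so a set that deduplicates by value keeps one element while bitwise dedup keeps both — exactly a 76-vs-75 style discrepancy.

```python
# Sketch: why two set implementations can disagree on 0.0 vs -0.0.
import math
import struct

pos, neg = 0.0, -0.0

# IEEE-754 numeric equality treats the two zeros as equal...
assert pos == neg

# ...so a hash/equality-based set collapses them into one element
# (analogous to the GPU result having one fewer row).
assert len({pos, neg}) == 1

# But the raw bit patterns differ, so bitwise dedup keeps both
# (analogous to the CPU result keeping 0.0 and -0.0).
assert struct.pack('<d', pos) != struct.pack('<d', neg)

# The sign of negative zero is still observable:
assert math.copysign(1.0, neg) == -1.0
```

In other words, the discrepancy comes down to whether `collect_set` deduplicates by numeric equality or by distinguishable representation.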

mattahrens removed the "? - Needs Triage" label on Feb 25, 2025