
[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed #12199

Open · pxLi opened this issue Feb 24, 2025 · 4 comments
Labels: bug (Something isn't working)

pxLi (Member) commented Feb 24, 2025

Describe the bug
rapids_it-non-utc-pre_release, run:183

Note: this also failed in branch-25.02, so the case may fail only with a specific TZ and DATAGEN_SEED.

Failed case with Spark 3.4.1:

src.main.python.window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}), ALLOW_NON_GPU(ProjectExec,FilterExec,FileSourceScanExec,BatchScanExec,CollectLimitExec,DeserializeToObjectExec,DataWritingCommandExec,WriteFilesExec,ShuffleExchangeExec,ExecutedCommandExec)]

assert failure

AssertionError: CPU and GPU list have different lengths at [4, 'sort_array(cc_float, true)'] CPU: 76 GPU: 75
@ignore_order(local=True)
    @allow_non_gpu(*non_utc_allow)
    def test_window_aggs_for_rows_collect_set():
>       assert_gpu_and_cpu_are_equal_sql(
            lambda spark: gen_df(spark, _gen_data_for_collect_set),
            "window_collect_table",
            '''
            select a, b,
                sort_array(cc_bool),
                sort_array(cc_int),
                sort_array(cc_long),
                sort_array(cc_short),
                sort_array(cc_date),
                sort_array(cc_ts),
                sort_array(cc_byte),
                sort_array(cc_str),
                sort_array(cc_float),
                sort_array(cc_double),
                sort_array(cc_decimal_32),
                sort_array(cc_decimal_64),
                sort_array(cc_decimal_128),
                sort_array(cc_fp_nan)
            from (
                select a, b,
                  collect_set(c_bool) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_bool,
                  collect_set(c_int) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_int,
                  collect_set(c_long) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_long,
                  collect_set(c_short) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_short,
                  collect_set(c_date) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_date,
                  collect_set(c_timestamp) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_ts,
                  collect_set(c_byte) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_byte,
                  collect_set(c_string) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_str,
                  collect_set(c_float) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_float,
                  collect_set(c_double) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_double,
                  collect_set(c_decimal_32) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_32,
                  collect_set(c_decimal_64) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_64,
                  collect_set(c_decimal_128) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_decimal_128,
                  collect_set(c_fp_nan) over
                    (partition by a order by b,c_int rows between CURRENT ROW and UNBOUNDED FOLLOWING) as cc_fp_nan
                from window_collect_table
            ) t
            ''',
            conf={'spark.rapids.sql.window.collectSet.enabled': True})

../../src/main/python/window_function_test.py:1457: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:641: in assert_gpu_and_cpu_are_equal_sql
    assert_gpu_and_cpu_are_equal_collect(do_it_all, conf, is_cpu_first=is_cpu_first)
../../src/main/python/asserts.py:599: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:521: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:111: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cpu = [-960209551360.0, -265976643584.0, -285.2320861816406, -1.8941408514047178e-20, -0.0, 0.0, ...]
gpu = [-960209551360.0, -265976643584.0, -285.2320861816406, -1.8941408514047178e-20, -0.0, 3.651951356399018e+32, ...]
float_check = <function get_float_check.<locals>.<lambda> at 0x7efdf3ff13f0>
path = [4, 'sort_array(cc_float, true)']

    def _assert_equal(cpu, gpu, float_check, path):
        t = type(cpu)
        if (t is Row):
            assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
                assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
                for field in cpu.__fields__:
                    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
            else:
                for index in range(len(cpu)):
                    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is list):
>           assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
E           AssertionError: CPU and GPU list have different lengths at [4, 'sort_array(cc_float, true)'] CPU: 76 GPU: 75

../../src/main/python/asserts.py:41: AssertionError


pxLi added the "? - Needs Triage" and "bug" labels on Feb 24, 2025
pxLi (Member, Author) commented Feb 24, 2025

cc @sperlingxx @res-life to help check, thanks

pxLi changed the title from "[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed non-UTC case" to "[BUG] window_function_test.test_window_aggs_for_rows_collect_set[DATAGEN_SEED=1740159968, TZ=America/Punta_Arenas, IGNORE_ORDER({'local': True}) failed" on Feb 24, 2025
res-life (Collaborator) commented Feb 24, 2025

Reproduced on v24.12.1, branch-25.02, and branch-25.04 with DATAGEN_SEED=1740159968.
Without this seed the test passes, so the seed likely triggers a corner case.
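For reference, a repro sketch pinning the seed and timezone from the failing run. The script path (`integration_tests/run_pyspark_from_build.sh`) and the use of the `DATAGEN_SEED`/`TZ` environment variables are assumptions based on the spark-rapids integration-test conventions; adjust to your checkout.

```shell
# Pin the exact data-generation seed and timezone from the failing CI run.
export DATAGEN_SEED=1740159968
export TZ=America/Punta_Arenas

# Run only the failing test (path/flags are assumptions; guard so this is a
# no-op outside a spark-rapids checkout).
if [ -x ./integration_tests/run_pyspark_from_build.sh ]; then
  ./integration_tests/run_pyspark_from_build.sh -k test_window_aggs_for_rows_collect_set
else
  echo "run from a spark-rapids checkout"
fi
```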

res-life (Collaborator) commented

@mythrocks could you please help check this? The number of result rows differs:
CPU: 76, GPU: 75

revans2 (Collaborator) commented Feb 24, 2025

It appears to be a 0.0 vs. -0.0 issue. If you look at the CPU output, the last values shown are 0.0 and -0.0, whereas the GPU only shows 0.0. I would personally consider this a bug in Spark, and we might want to file an issue against them for this.
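The ±0.0 ambiguity can be illustrated with a small Python sketch (this mirrors the IEEE-754 signed-zero semantics only, not the actual Spark or RAPIDS `collect_set` code): numeric equality says the two zeros are equal, but their bit patterns differ, so a set that deduplicates by value keeps one element while bitwise dedup keeps both — exactly a 76-vs-75 style discrepancy.

```python
# Sketch: why two set implementations can disagree on 0.0 vs -0.0.
import math
import struct

pos, neg = 0.0, -0.0

# IEEE-754 numeric equality treats the two zeros as equal...
assert pos == neg

# ...so a hash/equality-based set collapses them into one element
# (analogous to the GPU result having one fewer row).
assert len({pos, neg}) == 1

# But the raw bit patterns differ, so bitwise dedup keeps both
# (analogous to the CPU result keeping 0.0 and -0.0).
assert struct.pack('<d', pos) != struct.pack('<d', neg)

# The sign of negative zero is still observable:
assert math.copysign(1.0, neg) == -1.0
```

In other words, the discrepancy comes down to whether `collect_set` deduplicates by numeric equality or by distinguishable representation.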

mattahrens removed the "? - Needs Triage" label on Feb 25, 2025