Fix uninitialized reads in parquet chunked reader #17810

pmattione-nvidia · 2025-01-24T16:12:40Z

This PR fixes a couple of uninitialized reads in parquet chunked reader.

We are trying to go from an array of N entries to a list of N+1 offsets to the data those entries refer to, where the last offset is the total offset to the end. The exclusive_scan() calls compute the offsets, but in doing so they perform an uninitialized read of array[N]. This PR introduces Thrust counting transform iterators to perform an indirection so that we don't read array[N]. Note that exclusive_scan() computes a prefix sum, so the uninitialized data in array[N] shouldn't have been used to compute anything anyway.
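As a rough illustration of the indirection described above, here is a minimal host-side sketch in standard C++. It is not the Thrust code in this PR: `sizes_to_offsets` and `size_at` are hypothetical names, and the sequential loop only stands in for a scan that reads N+1 inputs. On the device, the same idea would presumably be expressed with `thrust::make_transform_iterator` over `thrust::make_counting_iterator`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not libcudf code: builds n + 1 offsets from the first
// n entries of `sizes`, reading sizes[i] only for i < n.
std::vector<std::size_t> sizes_to_offsets(std::vector<std::size_t> const& sizes, std::size_t n)
{
  assert(n <= sizes.size());

  // Indirection: index n maps to 0 instead of touching sizes[n].
  auto size_at = [&](std::size_t i) { return i < n ? sizes[i] : std::size_t{0}; };

  std::vector<std::size_t> offsets(n + 1);
  std::size_t running = 0;
  for (std::size_t i = 0; i <= n; ++i) {
    // The scan asks for input i (including i == n), but size_at never
    // dereferences sizes[n].
    std::size_t const value = size_at(i);
    offsets[i] = running;  // exclusive scan: sum of all entries before i
    running += value;
  }
  return offsets;
}
```

With sizes {3, 5, 2} and n = 3, the result is the four offsets 0, 3, 8, 10, where the last one is the total size.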

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

davidwendt · 2025-01-24T16:22:24Z

I would be concerned about any performance impact this may incur.
Overall, I don't think this extra code is necessary since the read does not actually contribute to the result.
We have many places in libcudf that work this way with exclusive-scan.
See the discussion here #12667 (comment) and the proposed resolution to be fixed in CCCL here: NVIDIA/cccl#876
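The claim that the extra read does not contribute to the result can be illustrated with a small host-side sketch in standard C++17. It uses `std::exclusive_scan` as a stand-in for the Thrust/CCCL scan, and `exclusive_prefix_sum` and `last_input_is_discarded` are hypothetical helper names. It shows only that the output values are independent of the final input element. It says nothing about whether the device-side read is well defined.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: exclusive prefix sum over every element of `in`.
std::vector<std::size_t> exclusive_prefix_sum(std::vector<std::size_t> const& in)
{
  std::vector<std::size_t> out(in.size());
  std::exclusive_scan(in.begin(), in.end(), out.begin(), std::size_t{0});
  return out;
}

// Returns true when overwriting the final input element leaves every output
// unchanged. `replacement` stands in for arbitrary leftover memory contents.
bool last_input_is_discarded(std::vector<std::size_t> in, std::size_t replacement)
{
  assert(!in.empty());
  auto const before = exclusive_prefix_sum(in);
  in.back() = replacement;
  return exclusive_prefix_sum(in) == before;
}
```

Output i of an exclusive scan sums inputs 0 through i-1, so the final input never appears in any output.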

pmattione-nvidia · 2025-01-24T17:53:16Z

That CCCL resolution has been open for 3.5 years now; I doubt it is getting fixed. And UB is bad. And any overhead this adds is negligible compared to everything else needed for parquet reads (decompression, reading global device memory, etc.)

davidwendt · 2025-01-24T17:58:07Z

> That CCCL resolution has been open for 3.5 years now; I doubt it is getting fixed. And UB is bad.

You are welcome to note your comments in the CCCL issue.
But I don't think we should try to work around this in general.

pmattione-nvidia · 2025-02-03T17:20:05Z

Also, this overhead isn't being applied to processing of the data itself, it's just overhead on calculating a few page offsets. The perf cost of this is negligible, UB should be avoided, and CCCL is not going to fix this (it may even require a branch in exclusive_scan(), so they may not even want to change anything there).

nvdbaranec · 2025-02-03T17:43:51Z

> I would be concerned about any performance impact this may incur. Overall, I don't think this extra code is necessary since the read does not actually contribute to the result. We have many places in libcudf that work this way with exclusive-scan. See the discussion here #12667 (comment) and the proposed resolution to be fixed in CCCL here: NVIDIA/cccl#876

The counterargument here is that this isn't being applied to bulk data. It's being applied to metadata - dozens/hundreds of things (page structs) during the setup phase.

davidwendt · 2025-02-03T18:51:29Z

> Also, this overhead isn't being applied to processing of the data itself, it's just overhead on calculating a few page offsets. The perf cost of this is negligible, UB should be avoided, and CCCL is not going to fix this (it may even require a branch in exclusive_scan(), so they may not even want to change anything there).

I don't want to work around issues like this; I would prefer that CCCL fix (or not fix) the issue in their library.
I think the risk here is not worth this change.
Although the read is UB, the value is discarded by CCCL and there is no risk of data corruption.
Perhaps this could even be a feature request for compute-sanitizer to detect that the read value is not used.
Regardless, this has been covered in great detail in #12667 and you are welcome to add more comments to that closed issue.

davidwendt

Recommend closing this PR.

fix uninit reads

899c804

pmattione-nvidia added the improvement and non-breaking labels Jan 24, 2025

pmattione-nvidia self-assigned this Jan 24, 2025

pmattione-nvidia requested review from a team as code owners January 24, 2025 16:12

pmattione-nvidia requested review from AyodeAwe, vyasr, galipremsagar and PointKernel January 24, 2025 16:12

pmattione-nvidia changed the base branch from branch-25.02 to branch-25.04 January 24, 2025 16:13

pmattione-nvidia requested a review from nvdbaranec January 24, 2025 16:15

pmattione-nvidia mentioned this pull request Jan 24, 2025

[BUG] Intermittent result discrepancy for NDS SF3K query86 on L40S NVIDIA/spark-rapids#11835

Open

Merge branch 'branch-25.04' into fix_uninit_parquet

219c003

github-actions bot removed the Python label Feb 3, 2025

github-actions bot removed the CMake, Java, cudf.pandas, cudf.polars, and pylibcudf labels Feb 3, 2025

nvdbaranec approved these changes Feb 3, 2025

davidwendt requested changes Feb 3, 2025
