#8806 Consolidate bitwise operations and make nullif respect ArrayData bitmap layout contract
#8869
Which issue does this PR close?
Rationale for this change
The bitwise operations used across `arrow-buffer`, `arrow-arith`, and `arrow-select` were spread across several overlapping helpers, which made it difficult to: … `nullif`.
While working on bitwise kernels, we also found that `nullif` in `arrow-select` was fragile around sliced arrays, offsets, and null-count handling. This PR centralizes the bitwise logic and makes `nullif` explicitly respect the `ArrayData` bitmap layout invariants (length, offset, validity mapping).
What changes are included in this PR?
High-level:
Documented the bitmap layout contract
Added a documentation file (`docs/arraydata_bitmap_layout_contract.md`) that spells out:
- … `len` and `offset` relate to validity bits,
- … `offset = 0`),
- … `nullif` validity rule: `result_valid(i) = V(i) & !C(i)`.
Strengthened core bitwise testing in `arrow-buffer`/`arrow-arith`
Added/extended tests that:
- … `Buffer`/`MutableBuffer` bitwise APIs against existing helpers across random data,
- … `(offset + len)` boundary cases),
- … `MutableBuffer`) operations produce the same bits as allocating versions.
Added `rand` as a dev-dependency of `arrow-arith` to support reproducible randomized tests.
Refactored `nullif` to follow the layout contract
Introduced a helper:
which implements the core `V(i) & !C(i)` validity logic over buffers and bit offsets.
Updated `nullif` to:
- Derive correct bit offsets from `ArrayData` for the left validity and condition mask,
- Always build new result `ArrayData` with `offset = 0`,
- Slice value/offset buffers so element 0 in the result corresponds to logical index 0 for:
Nullif-focused tests
Added a focused unit test for `compute_nullif_validity` that exercises:
- … `(offset + len) % 8 == 0` boundaries,
Verified and fixed behavior so that the existing `nullif` tests all pass, including:
- … `null_count` regression tests,
- … `nullif_fuzz` test.
Net effect: `nullif` is now a thin consumer of the documented bitmap layout contract and the consolidated bitwise APIs.
Are these changes tested?
Yes.
The following were run locally:
All of the above pass, including the previously failing `nullif` regression and fuzz tests.
Are there any user-facing changes?
There are no breaking changes to public APIs.
User-visible effects:
`nullif` behavior is now strictly aligned with the documented bitmap layout contract for:
This should only be observable as bug fixes in corner cases (e.g., nulls at specific offsets, struct/string slices), not API changes.
No new configuration or feature flags are introduced; this is primarily a correctness and maintainability improvement around existing behavior.
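
For reviewers who want a concrete picture of the "nulls at specific offsets" corner case, here is a minimal, dependency-free sketch of the validity rule `result_valid(i) = V(i) & !C(i)`. It is not the `compute_nullif_validity` helper from this PR: the names `get_bit`, `set_bit`, and `nullif_validity_sketch` are hypothetical, the loop is bit-at-a-time rather than word-at-a-time, and it assumes the condition bitmap already has null conditions folded to `false`. Bits are packed LSB-first, as in Arrow validity bitmaps, and the output always starts at bit offset 0.

```rust
/// Read bit `i` from an LSB-first packed bitmap.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] >> (i % 8)) & 1 == 1
}

/// Set bit `i` in an LSB-first packed bitmap.
fn set_bit(bytes: &mut [u8], i: usize) {
    bytes[i / 8] |= 1 << (i % 8);
}

/// Hypothetical illustration only (not the PR's helper).
/// Computes result_valid(i) = V(i) & !C(i) for `len` logical elements.
/// Each input is a (bitmap, bit offset) pair; `validity = None` means
/// every input element is valid. Returns the result bitmap, written
/// from bit offset 0, together with the resulting null count.
fn nullif_validity_sketch(
    validity: Option<(&[u8], usize)>,
    condition: (&[u8], usize),
    len: usize,
) -> (Vec<u8>, usize) {
    let mut out = vec![0u8; (len + 7) / 8];
    let mut valid_count = 0;
    for i in 0..len {
        // Logical index i maps to physical bit (offset + i) in each input.
        let v = validity.map_or(true, |(bits, off)| get_bit(bits, off + i));
        let c = get_bit(condition.0, condition.1 + i);
        if v && !c {
            set_bit(&mut out, i);
            valid_count += 1;
        }
    }
    (out, len - valid_count)
}

fn main() {
    // Validity is sliced at bit offset 3 and the condition at bit offset 5,
    // so both inputs cross a byte boundary for len = 6.
    let validity: &[u8] = &[0b1011_0111, 0b0000_0001];
    let condition: &[u8] = &[0b0100_0000, 0b0000_0010];
    let (bits, nulls) = nullif_validity_sketch(Some((validity, 3)), (condition, 5), 6);
    // Logical elements 2 and 5 stay valid; the other four become null.
    println!("{:08b} nulls={}", bits[0], nulls); // prints "00100100 nulls=4"
}
```

The point of the example is that the two inputs use different bit offsets while the result is indexed from 0, which is the situation the sliced-array regression tests exercise.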