Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
apacheGH-37515: [C++] Remove memory address optimization from `Chunke…
…dArray::Equals(const std::shared_ptr<arrow::ChunkedArray>& other)` if the `ChunkedArray` can have `NaN` values (apache#37579) ### Rationale for this change `ChunkedArray::Equals(const std::shared_ptr<ChunkedArray>& other)` assumes that if the two `ChunkedArray`s share the same memory address, then they must be equal. However, this optimization doesn't take into account that `NaN` values are not considered equal by default. Consequently, this can lead to surprising, inconsistent results from a user's perspective. For example, `ChunkedArray::Equals(const std::shared_ptr<ChunkedArray>& other)` and `ChunkedArray::Equals(const ChunkedArray& other)` may return different results. The program below illustrates this inconsistency: ```c++ #include <arrow/api.h> #include <arrow/type.h> #include <iostream> #include <math.h> #include <sstream> arrow::Result<std::shared_ptr<arrow::ChunkedArray>> make_chunked_array() { arrow::NumericBuilder<arrow::DoubleType> builder; std::shared_ptr<arrow::Array> array; ARROW_RETURN_NOT_OK(builder.AppendValues({0, 1, NAN, 2, 4})); ARROW_RETURN_NOT_OK(builder.Finish(&array)); return arrow::ChunkedArray::Make({array}); } int main(int argc, char *argv[]) { auto maybe_chunked_array = make_chunked_array(); if (!maybe_chunked_array.ok()) { return -1; } auto chunked_array = std::move(maybe_chunked_array).ValueUnsafe(); auto array = chunked_array->chunk(0); std::stringstream stream; stream << "chunked_array contents: "; stream << "\n\n"; stream << chunked_array->ToString(); stream << "\n\n"; stream << "chunked_array->Equals(chunked_array): "; stream << chunked_array->Equals(chunked_array); stream << "chunked_array->Equals(*chunked_array): "; stream << chunked_array->Equals(*chunked_array); std::cout << stream.str() << std::endl; } ``` Here is the output of this program: ```shell chunked_array contents: [ [ 0, 1, nan, 2, 4 ] ] chunked_array->Equals(chunked_array): 1 chunked_array->Equals(*chunked_array): 0 ``` ### What changes are included in this PR? Updated `ChunkedArray::Equals(const std::shared_ptr<ChunkedArray>& other)` to only return `true` early IF: - The two share the same address AND - They cannot have `NaN` values If both of those conditions are not satisfied, `ChunkedArray::Equals(const std::shared_ptr<ChunkedArray>& other)` will do the element-by-element comparison. ### Are these changes tested? Yes. I added a new test case called `EqualsSameAddressWithNaNs` to `chunked_array_test.cc`. ### Are there any user-facing changes? Yes. `ChunkedArray::Equals(const std::shared_ptr<arrow::ChunkedArray>& other)` may return `false` even if the two `ChunkedArray`s have the same memory address. This will only occur if the `ChunkedArray`'s contain `NaN` values. * Closes: apache#37515 Authored-by: Sarah Gilmore <sgilmore@mathworks.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
- Loading branch information