[C++] Memory allocation in Arrow 19.x-21.x conflicts with AddressSanitizer (ASan) shadow memory #47828

@lesterfan

Describe the bug, including details regarding any error messages, version, and platform.

There is an issue in recent Arrow releases where memory allocations made by Arrow conflict with the virtual memory region reserved for the AddressSanitizer (ASan) shadow memory on x86_64 Linux. This causes any subsequent attempt to initialize ASan (e.g. by loading a sanitized library) to fail.

This bug is currently fixed on main but is present in major releases 19.x - 21.x (and possibly earlier, though I haven't verified this).

Minimal Reproduction

The following test currently reliably fails on recent Arrow releases on x86_64 Linux:

#include <sys/mman.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <gtest/gtest.h>
#include "arrow/api.h"

TEST(Lester, LesterTest) {
  arrow::Int64Builder a, b;
  ASSERT_TRUE(a.AppendValues({1, 2}).ok());
  ASSERT_TRUE(b.AppendValues({3, 4}).ok());
  std::shared_ptr<arrow::Array> aa, bb;
  ASSERT_TRUE(a.Finish(&aa).ok());
  ASSERT_TRUE(b.Finish(&bb).ok());
  auto schema = arrow::schema(
      {arrow::field("a", arrow::int64()), arrow::field("b", arrow::int64())});
  auto table = arrow::Table::Make(schema, {aa, bb});
  (void)table;

  constexpr int PROT = PROT_READ | PROT_WRITE;
  constexpr int FLAGS = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE | MAP_NORESERVE;

  // https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
  // This region of virtual memory is reserved for the address sanitizer's shadow memory and must be free for it to work.
  // Note that `MAP_FIXED_NOREPLACE` tells `mmap` to fail (EEXIST) if the requested range overlaps any existing mapping.
  constexpr uintptr_t BEGIN = 0x02008fff7000ULL;
  constexpr uintptr_t END = 0x10007fff7fffULL;
  constexpr size_t LEN = static_cast<size_t>(END - BEGIN + 1);

  // Anonymous mappings should pass fd = -1 for portability; Linux ignores the value.
  void* shadow_region = mmap(reinterpret_cast<void*>(BEGIN), LEN, PROT, FLAGS, -1, 0);
  ASSERT_EQ(shadow_region, reinterpret_cast<void*>(BEGIN))
      << "mmap failed (" << errno << "): " << std::strerror(errno);
}

The test does the following:

  1. Performs an arrow::Table allocation.
  2. Calls mmap to reserve the address range that the address sanitizer uses for its shadow memory. The MAP_FIXED_NOREPLACE flag makes mmap fail if the requested range overlaps any existing mapping, which it does in this test.

On affected versions, the assertion fails because the initial Arrow allocation has already claimed a portion of the shadow memory, causing the second mmap call to fail with EEXIST.
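To see which existing mapping is in the way, one option is to scan /proc/self/maps for entries that overlap the shadow range just before the test's mmap call. The following is a minimal sketch, assuming the usual Linux `start-end perms ...` line format; the names `AddressRange`, `ParseMapsLine`, `Overlaps`, and `FindOverlappingMappings` are hypothetical helpers for illustration and are not part of Arrow or ASan.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Half-open address range [begin, end), matching how /proc/<pid>/maps reports mappings.
struct AddressRange {
  std::uint64_t begin;
  std::uint64_t end;
};

// The range reserved by the test above, written as a half-open interval
// (the inclusive END 0x10007fff7fff plus one).
constexpr AddressRange kAsanHighShadow{0x02008fff7000ULL, 0x10007fff8000ULL};

// Parses the leading "start-end" hex pair of one maps line.
// Returns false if the line does not start with a valid, non-empty range.
bool ParseMapsLine(const std::string& line, AddressRange* out) {
  std::uint64_t begin = 0;
  std::uint64_t end = 0;
  if (std::sscanf(line.c_str(), "%" SCNx64 "-%" SCNx64, &begin, &end) != 2) {
    return false;
  }
  if (begin >= end) {
    return false;
  }
  *out = AddressRange{begin, end};
  return true;
}

// True if the two half-open ranges share at least one address.
bool Overlaps(const AddressRange& a, const AddressRange& b) {
  return a.begin < b.end && b.begin < a.end;
}

// Returns every line of the maps file whose range overlaps `target`.
// `maps_path` defaults to the current process; it is a parameter only for testing.
std::vector<std::string> FindOverlappingMappings(
    const AddressRange& target, const char* maps_path = "/proc/self/maps") {
  std::vector<std::string> hits;
  std::ifstream maps(maps_path);
  std::string line;
  while (std::getline(maps, line)) {
    AddressRange range{};
    if (ParseMapsLine(line, &range) && Overlaps(range, target)) {
      hits.push_back(line);
    }
  }
  return hits;
}
```

Calling `FindOverlappingMappings(kAsanHighShadow)` right after the `arrow::Table` is built and printing the returned lines should list the offending mapping on an affected build, and nothing on a build where the region is still free.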

Diagnosis

Running strace on the test confirms that libarrow.so makes an mmap call that allocates memory inside the ASan shadow range. For example:

mmap(0x3e780000000, 67108864, ...) = 0x3e780000000

The address 0x3e780000000 is within the ASan 64-bit shadow memory region [0x02008fff7000, 0x10007fff7fff]. The address that the allocator passes to mmap appears to be a bad hint. It also has no obvious origin, which makes us wonder whether it comes from an uninitialized variable read.
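As a sanity check on the range itself, the bounds can be recomputed from the mapping given on the AddressSanitizerAlgorithm wiki page for 64-bit Linux, Shadow = (Mem >> 3) + 0x7fff8000, applied to the HighMem bounds listed there. The sketch below does this at compile time and confirms that the 64 MiB mapping seen in strace lies entirely inside the region. The identifier names are my own and do not come from ASan or Arrow.

```cpp
#include <cstdint>

// Shadow offset from the ASan wiki for 64-bit Linux: Shadow = (Mem >> 3) + 0x7fff8000.
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

constexpr std::uint64_t MemToShadow(std::uint64_t mem) {
  return (mem >> 3) + kShadowOffset;
}

// HighMem bounds as listed on the same wiki page.
constexpr std::uint64_t kHighMemBegin = 0x10007fff8000ULL;
constexpr std::uint64_t kHighMemEnd = 0x7fffffffffffULL;

// Shadow of HighMem; these are the BEGIN and END constants used in the test (inclusive).
constexpr std::uint64_t kHighShadowBegin = MemToShadow(kHighMemBegin);
constexpr std::uint64_t kHighShadowEnd = MemToShadow(kHighMemEnd);

static_assert(kHighShadowBegin == 0x02008fff7000ULL, "unexpected HighShadow begin");
static_assert(kHighShadowEnd == 0x10007fff7fffULL, "unexpected HighShadow end");

// True if `addr` falls inside the inclusive HighShadow range.
constexpr bool InHighShadow(std::uint64_t addr) {
  return addr >= kHighShadowBegin && addr <= kHighShadowEnd;
}

// Address and length taken from the strace line above.
constexpr std::uint64_t kObservedAddr = 0x3e780000000ULL;
constexpr std::uint64_t kObservedLen = 67108864ULL;  // 64 MiB

static_assert(InHighShadow(kObservedAddr), "first byte is outside HighShadow");
static_assert(InHighShadow(kObservedAddr + kObservedLen - 1),
              "last byte is outside HighShadow");
```

If the file compiles, both ends of the mapping are inside the shadow region, which is consistent with the EEXIST failure in the test.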

Root Cause and Resolution

Since this bug was reproducible on recent releases but not on main, I ran a git bisect using this script and minimal test: main...lesterfan:arrow:20251014-cpp-asan-error. The bisect identified #47589 as the first commit that fixes this test. I verified this by cherry-picking that commit onto our release and confirming that our ASan issues were resolved. This suggests that the bug is somewhere within mimalloc v2 and has since been fixed in v3.

Given that this bug prevents using Arrow with ASan-instrumented code, would it be possible to cherry-pick this commit into a maintenance release for the affected Arrow versions?

Even if not, I wanted to file this issue upstream for visibility in case other people run into similar issues. Thank you!

Component(s)

C++
