ucx_perftest binary missing linking information #12
Alright, I don't have a full explanation or a suggested fix yet, but I'm stopping to put up some notes. Working interactively, I can see that the linking information looks correct at the end of the wheel build, before `auditwheel repair` runs. Full code to build and unpack the wheel:
# get auditwheel source (used later in debugging)
git clone \
git@github.com:pypa/auditwheel.git \
./auditwheel-src
docker run \
--rm \
-v $(pwd):/opt/work \
-w /opt/work \
-it rapidsai/ci-wheel:cuda12.2.2-rockylinux8-py3.11 \
bash
# (the following commands run inside the container)
rm -rf ./dist
rm -rf ./final_dist
rm -rf ./unzipped-contents
rm -rf ./unzipped-post-auditwheel
pip uninstall --yes auditwheel
pip install -e ./auditwheel-src
# move to a different directory not mounted in, to avoid those annoying docker 'permission denied'
# issues when files are changed by the build process
cp -R $(pwd) /tmp/ucx-wheels
cd /tmp/ucx-wheels/python/libucx
python -m pip wheel \
-w dist \
-v \
--no-deps \
--disable-pip-version-check \
.
mkdir -p ./unzipped-contents
unzip \
./dist/libucx*.whl \
-d ./unzipped-contents
ldd ./unzipped-contents/libucx/bin/ucx_perftest
objdump -x ./unzipped-contents/libucx/bin/ucx_perftest | grep PATH
# RUNPATH /tmp/ucx-wheels/python/libucx/build/lib/libucx/lib
And running the unpacked binary directly works (it starts up and waits for a connection):
./unzipped-contents/libucx/bin/ucx_perftest
# [1729282637.710255] [ea8c4a832eab:23962:0] perftest.c:793 UCX WARN CPU affinity is not set (bound to 80 cpus).
# Performance may be impacted.
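# Waiting for connection...
To see exactly which commands `auditwheel repair` runs, I installed this patch into the local auditwheel checkout. It prints each command before running it:
diff --git a/src/auditwheel/patcher.py b/src/auditwheel/patcher.py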
index 67367c9..1baca3c 100644
--- a/src/auditwheel/patcher.py
+++ b/src/auditwheel/patcher.py
@@ -3,7 +3,13 @@ from __future__ import annotations
import re
from itertools import chain
from shutil import which
-from subprocess import CalledProcessError, check_call, check_output
+from subprocess import CalledProcessError, check_call as subpr_check_call, check_output
+
+
+def check_call(args: list):
+    arg_str = " ".join(args)
+    print(f"(command) '{arg_str}'")
+    subpr_check_call(args)
class ElfPatcher:
diff --git a/src/auditwheel/repair.py b/src/auditwheel/repair.py
index 85e3ca3..0723c6b 100644
--- a/src/auditwheel/repair.py
+++ b/src/auditwheel/repair.py
@@ -10,7 +10,7 @@ import stat
from os.path import abspath, basename, dirname, exists, isabs
from os.path import join as pjoin
from pathlib import Path
-from subprocess import check_call
+from subprocess import check_call as subpr_check_call
from typing import Iterable
from auditwheel.patcher import ElfPatcher
@@ -23,6 +23,10 @@ from .wheeltools import InWheelCtx, add_platforms
logger = logging.getLogger(__name__)
+def check_call(args: list):
+    arg_str = " ".join(args)
+    print(f"(command) '{arg_str}'")
+    subpr_check_call(args)
# Copied from wheel 0.31.1
WHEEL_INFO_RE = re.compile(
Then I ran it just as it's run in CI, but redirected the output to a file:
python -m auditwheel -vvv repair \
-w final_dist \
--exclude "libcuda.so.1" \
--exclude "libnvidia-ml.so.1" \
--exclude "libucm.so.0" \
--exclude "libuct.so.0" \
--exclude "libucs.so.0" \
--exclude "libucp.so.0" \
dist/* \
> /opt/work/auditwheel.txt 2>&1
From that log, I can see that auditwheel vendored `libgomp` for `ucx_perftest` and then replaced the binary's RPATH:
patchelf --set-soname libgomp-24e2ab19.so.1.0.0 libucx_cu12.libs/libgomp-24e2ab19.so.1.0.0
patchelf --replace-needed libgomp.so.1 libgomp-24e2ab19.so.1.0.0 libucx/bin/ucx_perftest
patchelf --remove-rpath /tmp/tmp8v5ujsmi/libucx/bin/ucx_perftest
patchelf --force-rpath --set-rpath $ORIGIN/../../libucx_cu12.libs /tmp/tmp8v5ujsmi/libucx/bin/ucx_perftest
Which leaves this after installing the repaired wheel:
pip install ./final_dist/*.whl
SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
objdump -x "${SITE_PACKAGES}/libucx/bin/ucx_perftest" | grep PATH
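# RPATH $ORIGIN/../../libucx_cu12.libs
Notice that the RPATH now points only at `libucx_cu12.libs`. The RUNPATH that pointed at the build's `libucx/lib` directory was removed. Since `libucm.so.0`, `libuct.so.0`, `libucs.so.0` and `libucp.so.0` were passed to `--exclude`, they are not copied into `libucx_cu12.libs`, so the new RPATH gives the dynamic loader no way to find them.
That `libucx_cu12.libs` location is a default from auditwheel. It's possible to change the suffix (`--lib-sdir`, default `.libs`), but the prefix is always the distribution name parsed from the wheel file name, like this:
match = WHEEL_INFO_RE(wheel_fname)
dest_dir = match.group("name") + lib_sdir
We want the libraries found in `libucx/lib` instead, the directory used by:
import libucx
libucx.load_library()
That location is customized here: ucx-wheels/python/libucx/setup.py Line 33 in ff19461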
ucx-wheels/python/libucx/setup.py Lines 48 to 52 in ff19461
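The `dest_dir` logic above can be reproduced in a few lines to show why no `--lib-sdir` value yields `libucx/lib`. This is a minimal sketch: `WHEEL_NAME_RE` is a simplified, hypothetical stand-in for auditwheel's `WHEEL_INFO_RE`, `vendored_libs_dir` is a hypothetical helper, and the example wheel file name is assumed for illustration.

```python
import re

# Simplified, hypothetical stand-in for auditwheel's WHEEL_INFO_RE; it only
# handles the "{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl" layout.
WHEEL_NAME_RE = re.compile(
    r"^(?P<name>.+?)-(?P<ver>\d.*?)(-(?P<build>\d.*?))?"
    r"-(?P<pyver>[a-z].+?)-(?P<abi>.+?)-(?P<plat>.+?)\.whl$"
)


def vendored_libs_dir(wheel_fname: str, lib_sdir: str = ".libs") -> str:
    """Return the top-level directory name for vendored libraries."""
    match = WHEEL_NAME_RE.match(wheel_fname)
    if not match:
        raise ValueError(f"Failed to parse wheel file name: {wheel_fname}")
    # Mirrors `dest_dir = match.group("name") + lib_sdir`: only the suffix
    # varies; the distribution name (not the import package) is the prefix.
    return match.group("name") + lib_sdir


if __name__ == "__main__":
    # Hypothetical file name; the real version and platform tags may differ.
    fname = "libucx_cu12-1.15.0-py3-none-manylinux_2_28_x86_64.whl"
    print(vendored_libs_dir(fname))          # libucx_cu12.libs
    print(vendored_libs_dir(fname, "/lib"))  # libucx_cu12/lib
```

Even a different suffix still starts with `libucx_cu12`, never with the import package name `libucx`.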
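We have, for example, wheels here with the distribution name `libucx_cu12` (hence `libucx_cu12.libs`), while the package they install is `libucx`.
I tried patching the installed CLI after the fact... that did not work:
patchelf --print-rpath "${SITE_PACKAGES}/libucx/bin/ucx_perftest"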
# $ORIGIN/../../libucx_cu12.libs
patchelf --remove-rpath "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
patchelf --print-rpath "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
# (empty)
patchelf --force-rpath --set-rpath '$ORIGIN/../lib' "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
# Assertion failed: splitIndex != -1 (patchelf.cc: shiftFile: 504)
# Aborted (core dumped)
patchelf --set-rpath '$ORIGIN/../lib' "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
# Assertion failed: splitIndex != -1 (patchelf.cc: shiftFile: 504)
# Aborted (core dumped)
patchelf --add-rpath '$ORIGIN/../../lib' "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
# Assertion failed: splitIndex != -1 (patchelf.cc: shiftFile: 504)
# Aborted (core dumped)
And that's where I'm stuck right now.
file "${SITE_PACKAGES}/libucx/bin/ucx_perftest"
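For reference, both RPATH values in play follow from the install layout. `$ORIGIN` is the directory that contains the binary, so an entry is just the relative path from there to the library directory. This minimal sketch assumes both paths are given relative to the same root (e.g. site-packages); `origin_rpath` is a hypothetical helper. It only shows which value to aim for and does not explain the patchelf assertion failure. In a shell, `$ORIGIN` must stay single-quoted, as in the commands above.

```python
import posixpath


def origin_rpath(binary_path: str, lib_dir: str) -> str:
    """Return an $ORIGIN-relative RPATH entry pointing from the directory
    containing binary_path to lib_dir (both relative to the same root)."""
    rel = posixpath.relpath(lib_dir, posixpath.dirname(binary_path))
    return "$ORIGIN" if rel == "." else "$ORIGIN/" + rel


if __name__ == "__main__":
    exe = "libucx/bin/ucx_perftest"
    # What auditwheel wrote: the vendored-library directory.
    print(origin_rpath(exe, "libucx_cu12.libs"))  # $ORIGIN/../../libucx_cu12.libs
    # What we want: the package's own lib directory.
    print(origin_rpath(exe, "libucx/lib"))        # $ORIGIN/../lib
```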
I've attached the full auditwheel logs here (as a file attachment, because they're large).
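One caveat when reading those logs: the debugging patch joins arguments with `" ".join(args)`, so a logged line like `--set-rpath $ORIGIN/../../libucx_cu12.libs` is not shell-quoted. Pasted into bash as-is, `$ORIGIN` would be expanded by the shell. The following is a minimal sketch of a variant wrapper that logs a paste-safe command line using `shlex.join` (Python 3.8+). It is a hypothetical alternative, not a proposed change to auditwheel.

```python
import shlex
import subprocess
from typing import Sequence


def format_command(args: Sequence[str]) -> str:
    """Render args as a shell-quoted command line (requires Python 3.8+)."""
    return shlex.join(str(arg) for arg in args)


def check_call(args: Sequence[str]) -> None:
    """Print the command, then run it, like the debugging patch.

    Unlike " ".join(args), shlex.join() single-quotes arguments such as
    "$ORIGIN/../lib", so a line copied from the log can be pasted into a
    shell without $ORIGIN being expanded (normally to an empty string).
    """
    print(f"(command) {format_command(args)}", flush=True)
    subprocess.check_call(list(args))


if __name__ == "__main__":
    # Only formats the command; nothing is executed here.
    print(format_command(["patchelf", "--force-rpath", "--set-rpath",
                          "$ORIGIN/../lib", "libucx/bin/ucx_perftest"]))
    # patchelf --force-rpath --set-rpath '$ORIGIN/../lib' libucx/bin/ucx_perftest
```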
Description
UCX provides a CLI, `ucx_perftest`, for running performance tests (example from UCX docs). While investigating rapidsai/ucx-py#1072, @pentschev attempted to use that tool bundled in the wheels produced here and found that it segfaulted immediately. The root cause looked to be missing linking information.
In #11, removing this invocation of `auditwheel repair` appeared to leave that linking information in place:
ucx-wheels/ci/build_wheel.sh Line 15 in ff19461
And that change alone allowed `ucx_perftest` to execute successfully 🎉
That should be investigated, and changes to the build here might be required.
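One way to confirm what a build step keeps or drops is to compare the dynamic-section entries of `ucx_perftest` before and after `auditwheel repair`. `objdump -x | grep PATH` does this. As a dependency-free alternative, here is a minimal Python sketch that reads `DT_NEEDED`, `DT_RPATH` and `DT_RUNPATH` from an ELF file's `.dynamic` section. It assumes a 64-bit little-endian ELF (e.g. x86_64) that still has section headers. `read_dynamic_paths` is a hypothetical helper, not part of auditwheel or patchelf.

```python
import struct
import sys

# Constants from the ELF specification (<elf.h>).
SHT_DYNAMIC = 6
DT_NULL, DT_NEEDED, DT_RPATH, DT_RUNPATH = 0, 1, 15, 29


def read_dynamic_paths(path: str) -> dict:
    """Return the DT_NEEDED, DT_RPATH and DT_RUNPATH strings of a
    64-bit little-endian ELF file, found via its section headers."""
    with open(path, "rb") as f:
        data = f.read()
    # e_ident: magic, EI_CLASS == 2 (ELFCLASS64), EI_DATA == 1 (little-endian).
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError(f"{path}: not a 64-bit little-endian ELF file")

    # ELF64 header: e_shoff at 0x28, e_shentsize and e_shnum at 0x3A and 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)

    def section_header(index: int) -> tuple:
        # Elf64_Shdr: name, type, flags, addr, offset, size,
        # link, info, addralign, entsize.
        return struct.unpack_from("<IIQQQQIIQQ", data, e_shoff + index * e_shentsize)

    result = {"needed": [], "rpath": None, "runpath": None}
    for index in range(e_shnum):
        _, sh_type, _, _, sh_offset, sh_size, sh_link, _, _, _ = section_header(index)
        if sh_type != SHT_DYNAMIC:
            continue
        # For .dynamic, sh_link is the index of its string table (.dynstr).
        strtab = section_header(sh_link)[4]

        def string_at(offset: int) -> str:
            start = strtab + offset
            return data[start:data.index(b"\0", start)].decode()

        # Elf64_Dyn entries: signed 64-bit d_tag, unsigned 64-bit d_val.
        for pos in range(sh_offset, sh_offset + sh_size, 16):
            d_tag, d_val = struct.unpack_from("<qQ", data, pos)
            if d_tag == DT_NULL:
                break
            if d_tag == DT_NEEDED:
                result["needed"].append(string_at(d_val))
            elif d_tag == DT_RPATH:
                result["rpath"] = string_at(d_val)
            elif d_tag == DT_RUNPATH:
                result["runpath"] = string_at(d_val)
    return result


if __name__ == "__main__":
    for binary in sys.argv[1:]:
        print(binary, read_dynamic_paths(binary))
```

Run on the unrepaired binary, this should report a `runpath`. After repair, it should report an `rpath`, because auditwheel calls `patchelf --force-rpath`, which writes `DT_RPATH`. The distinction matters because glibc searches `DT_RPATH` before `LD_LIBRARY_PATH`, and ignores `DT_RPATH` entirely when `DT_RUNPATH` is present.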
Reproducible Example
On an x86_64 system with CUDA 12.2
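Here is a minimal Python sketch of one way to check an installed wheel, assuming a `libucx` wheel from this repo is installed in the current environment. The script locates `libucx/bin/ucx_perftest` next to the installed package. It then starts the binary without arguments, which is server mode and waits for a client, as in the working run in the comment. Finally, it reports whether the process survived a short timeout or was killed by a signal such as `SIGSEGV`. `find_perftest`, `describe_exit` and `run_server_briefly` are hypothetical helpers.

```python
import importlib.util
import os
import signal
import subprocess
from typing import Optional


def find_perftest() -> Optional[str]:
    """Locate libucx/bin/ucx_perftest inside the installed `libucx` package."""
    spec = importlib.util.find_spec("libucx")
    if spec is None or not spec.submodule_search_locations:
        return None
    pkg_dir = list(spec.submodule_search_locations)[0]
    exe = os.path.join(pkg_dir, "bin", "ucx_perftest")
    return exe if os.path.exists(exe) else None


def describe_exit(returncode: Optional[int]) -> str:
    """Classify how the run ended; None means it was still running."""
    if returncode is None:
        return "started (waiting for a connection)"
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_server_briefly(exe: str, seconds: float = 5.0) -> str:
    # With no arguments ucx_perftest waits for a client, so still running
    # at the timeout means it got past dynamic linking and startup.
    try:
        proc = subprocess.run([exe], capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:  # run() kills the child before raising
        return describe_exit(None)
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    exe = find_perftest()
    print("ucx_perftest not found" if exe is None else run_server_briefly(exe))
```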
Notes
Some relevant notes in the OpenUCX docs: