Open
Description
Running ghexbench with UCX_DC_MLX5_TM_ENABLE=y causes a UCX error (SRQ creation fails) and a segfault. The same setting works with the MPI backend when using OpenMPI on IB networks. Could this be related to how we create the worker / UCX context?
[1615309041.680074] [b2237:256544:0] rc_mlx5_common.c:827 UCX ERROR ibv_exp_create_srq(device=mlx5_0) failed: Cannot allocate memory
==== backtrace (tid: 110170) ====
0 0x0000000000052e95 ucs_debug_print_backtrace() /build-result/src/hpcx-v2.7.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.9.x/src/ucs/debug/debug.c:656
1 0x000000000003e54c ucp_address_pack() /build-result/src/hpcx-v2.7.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.9.x/src/ucp/wireup/address.c:832
2 0x000000000003e54c ucp_address_pack() /build-result/src/hpcx-v2.7.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.9.x/src/ucp/wireup/address.c:844
3 0x00000000000246bd ucp_worker_get_address() /build-result/src/hpcx-v2.7.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.9.x/src/ucp/core/ucp_worker.c:2241
4 0x00000000004327a8 gridtools::ghex::tl::ucx::worker_t::worker_t() ???:0
5 0x000000000042b646 cartex::runtime::impl::init() ???:0
6 0x000000000041da99 cartex::runtime::exchange() ???:0
7 0x000000000040afe5 main() ???:0
8 0x0000000000022545 __libc_start_main() ???:0
9 0x000000000040ca8d _start() ???:0
=================================