Releases: openucx/ucx

v1.13.0-rc1

27 May 15:05
43f710a
v1.13.0-rc1 Pre-release

1.13.0-rc1 (May 27, 2022)

Features

Core
  • Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
  • Added support for UCX static libraries
  • Added profiling for rkey management routines
  • Enabled PCIe relaxed ordering by default for AMD CPUs

UCP

  • Added API to pass pre-registered memory handle to UCP operations
  • Added implementation of AM rendezvous protocol
  • Added 2-stage pipeline rendezvous protocol for GPU
  • Added support for fragment mem_type for v1 pipeline proto, disabled by default
  • Added active message support for proto v2
  • Added UCP memory registration cache
  • Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
  • Added support for user memh in proto_v1
  • Added support for selecting local address when creating a client endpoint
  • Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
  • Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter

UCT

  • Introduced API uct_md_mkey_pack_v2
  • Introduced UCT iface features API
  • Introduced max_inflight_eps parameter in perf_attr API
  • Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
  • Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking

RDMA CORE (IB, ROCE, etc.)

  • Introduced NDR autorecognition
  • Introduced CQE zipping support
  • Set the default MAX_RD_ATOMIC to maximum value supported by the hardware

ROCM

  • Increased maximum number of HSA agents

UCS

  • Added topo module infrastructure
  • Added memtrack and rcache information to VFS

Tools

  • Added support for pre-registered memory in ucx_perftest
  • Added loopback transport support for UCT perf tests

Bugfixes

Core

  • Fixed memory not being deallocated by ucp_mem_unmap when there is no rcache
  • Fixed versioning infrastructure
  • Multiple code improvements: refactoring, debug prints and assertions, etc.
  • Multiple improvements in build, test and docs infrastructure

UCP

  • Disabled by default resolving the remote EP ID when creating a local EP
  • Multiple fixes in keepalive protocol
  • Fixed initialization of request send state when software RMA/AMO is in use
  • Fixed error handling in RMA and BW lanes selection logic
  • Fixed CM wireup fallback
  • Fixed occasional crash in finalize
  • Fixed AM proto flags
  • Fixed single zcopy proto initialization for AM
  • Fixed proto v2 selection to take the user header length into account
  • Fixed selecting auxiliary transports when creating EP for sending EP_REMOVED
  • Fixed printing invalid configuration
  • Fixed allocation of indirect remote ID for internal EP if connected EP supports PEER_FAILURE
  • Fixed memh allocation when no rcache
  • Fixed protocol selection logic for UCP AM send
  • Fixed error handling flow for EP discard requests from pending queue
  • Fixed EP destroy flow
  • Fixed rsc_index for prereg_md_map
  • Fixed wireup error handling flow: create EP which sends WIREUP_MSG/EP_REMOVED with AM lane only
  • Fixed probe for multi-fragment eager
  • Fixed alignment for AM rdesc init
  • Fixed perf estimation for proto v2
  • Fixed CM wireup with proto v2
  • Fixed EP discard flow during fast-forward
  • Fixed datatype issue in TAG send
  • Fixed EP refcount overflow
  • Fixed EP error handling flow
  • Fixed wire compatibility in address unpacking
  • Fixed ucp_ep_close_nb for failed endpoint when related requests have registered memory that should be invalidated
  • Fixed fragmented proto v2
  • Fixed UCP address v2 packing/unpacking and usage of seg_size
  • Fixed purge requests on failed endpoint
  • Fixed error handling of connecting p2p lanes during WIREUP phase
  • Fixed UCP endpoint use after free

UCT

  • Fixed ABI break of uct_ep_params_t
  • Fixed common intra-node keepalive protocol
  • Fixed a typo UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEIVCE -> UCT_PERF_ATTR_FIELD_REMOTE_SYS_DEVICE
  • Fixed potential crash on MD mem alloc
  • Disabled PEER_FAILURE capability for XPMEM

RDMA CORE (IB, ROCE, etc.)

  • Fixed 2G aligned MR registration
  • Fixed FC_HARD_REQ resending
  • Fixed remote access to invalidated MR
  • Fixed max_rd_atomic_dc value for DV
  • Fixed DC handshake logic
  • Fixed error handling flows
  • Fixed flush(CANCEL) with UD and DC transports
  • Fixed multi-path handling for passive endpoint with UD transport
  • Fixed attributes for DV QP creation
  • Fixed device query
  • Fixed memory leak in case of disabling RDMA transport
  • Fixed dci->pool_index initialization
  • Fixed fallback if port speed not detected
  • Fixed tag offload recv for inlined data
  • Fixed PKEY index initialization
  • Disabled mlx5 ifaces on verbs MD

TCP

  • Fixed flush(CANCEL)
  • Fixed close protocol when UCT EP pairs have only RX capability
  • Fixed query local/remote saddr

GPU (CUDA, ROCM)

  • Fixed a bug in invalidating address range in CUDA_IPC
  • Fixed CUDA context caching and cleanup
  • Fixed ROCM initialization
  • Fixed ROCM components compilation
  • Fixed IPC tls reachability check
  • Fixed ROCM memory type detection
  • Use ROCM remote_agent if available

KNEM

  • Fixed memory registration cost

UCM

  • Fixed potential hang on init

UCS

  • Fixed name shadow problem in CentOS6.x

Tools

  • Print stream API limits and handle stream feature in ucx_info
  • Replaced ucp_ep_close_nb by ucp_ep_close_nbx in examples
  • Replaced completed field by checking UCS status in io_demo

JAVA

  • Throw exception if ucp_mem_query failed

GO

  • Disabled go bindings in rpmbuild
  • Fixed configure behavior when the Go compiler cannot be found
  • Standalone performance benchmark
  • Increased port range and made it dependent on agent_id
  • Check compiler minimum version
  • Set GOCACHE to a local directory that is cleared for each job in CI
  • Disabled module for goperftest
  • Fixed OOS build

v1.12.1

21 Mar 17:20
dc92435

1.12.1 (March 21, 2022)

Bugfixes

  • Fixed memory hooks for Cuda 11.5
  • Fixed memory type cache merge
  • Fixed continuously triggering wakeup fd when keepalive is used
  • Fixed memtype cache fallback when memory hooks are not installed
  • Fixed parsing header flags of worker address
  • Fixed pipeline protocol when sending from host memory to GPU memory
  • Fixed transport progress not deactivated when all its connections are closed
  • Fixed progress loop in io_demo application
  • Fixed ROCm segfault when using internal_ops functions
  • Fixed ROCm memory hooks
  • Fixed performance regression on A64FX
  • Fixed DCT create failure with rdma-core v22
  • Fixed golang bindings build
  • Fixed .deb package build on Ubuntu 22.04
  • Fixed build on archlinux

Important changes

  • If Cuda memory hooks on driver API cannot be installed, memory type cache and
    memory registration cache will be disabled. This may lead to lower performance
    of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
    being used. Prior to this change, failing to install driver API hooks could
    lead to runtime errors or data corruption when Cuda memory is used and the
    application is linked statically with the Cuda runtime.
    To revert to the previous behavior (when the application is linked
    dynamically with the Cuda runtime), set UCX_MEM_CUDA_HOOK_MODE=reloc.
    See more info in #7865.
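
    As an illustration of the setting above, here is a minimal sketch of launching
    a program with UCX_MEM_CUDA_HOOK_MODE=reloc in its environment. Only the
    variable name and value come from these notes. The helper name
    run_with_reloc_hooks and the stand-in child command are assumptions made for
    the example; a real deployment would start the actual UCX application instead.

```python
import os
import subprocess
import sys


def run_with_reloc_hooks(command):
    """Run `command` with UCX_MEM_CUDA_HOOK_MODE=reloc added to its environment.

    Hypothetical helper for illustration; it is not part of UCX.
    """
    env = dict(os.environ, UCX_MEM_CUDA_HOOK_MODE="reloc")
    return subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)


# Stand-in for a real UCX application: a child process that only prints
# the value of the variable it inherited.
child = [sys.executable, "-c",
         "import os; print(os.environ['UCX_MEM_CUDA_HOOK_MODE'])"]
result = run_with_reloc_hooks(child)
print(result.stdout.strip())
```

    Passing the variable through the child's environment, rather than changing it
    inside an already running process, ensures the value is in place before the
    library loads.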

v1.12.1-rc4

16 Mar 19:30
f8c35b8
v1.12.1-rc4 Pre-release

1.12.1-rc4 (March 16, 2022)

Bugfixes

  • Fixed memory hooks for Cuda 11.5
  • Fixed memory type cache merge
  • Fixed continuously triggering wakeup fd when keepalive is used
  • Fixed memtype cache fallback when memory hooks are not installed
  • Fixed parsing header flags of worker address
  • Fixed pipeline protocol when sending from host memory to GPU memory
  • Fixed transport progress not deactivated when all its connections are closed
  • Fixed progress loop in io_demo application
  • Fixed ROCm segfault when using internal_ops functions
  • Fixed ROCm memory hooks
  • Fixed performance regression on A64FX
  • Fixed DCT create failure with rdma-core v22
  • Fixed golang bindings build
  • Fixed .deb package build on Ubuntu 22.04
  • Fixed build on archlinux

Important changes

  • If Cuda memory hooks on driver API cannot be installed, memory type cache and
    memory registration cache will be disabled. This may lead to lower performance
    of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
    being used. Prior to this change, failing to install driver API hooks could
    lead to runtime errors or data corruption when Cuda memory is used and the
    application is linked statically with the Cuda runtime.
    To revert to the previous behavior (when the application is linked
    dynamically with the Cuda runtime), set UCX_MEM_CUDA_HOOK_MODE=reloc.
    See more info in #7865.

v1.12.1-rc3

05 Mar 02:05
b8dfe5b
v1.12.1-rc3 Pre-release

1.12.1-rc3 (March 4, 2022)

Bugfixes

  • Fixed memory hooks for Cuda 11.5
  • Fixed memory type cache merge
  • Fixed continuously triggering wakeup fd when keepalive is used
  • Fixed memtype cache fallback when memory hooks are not installed
  • Fixed parsing header flags of worker address
  • Fixed pipeline protocol when sending from host memory to GPU memory
  • Fixed transport progress not deactivated when all its connections are closed
  • Fixed progress loop in io_demo application
  • Fixed ROCm segfault when using internal_ops functions
  • Fixed ROCm memory hooks

Important changes

  • If Cuda memory hooks on driver API cannot be installed, memory type cache and
    memory registration cache will be disabled. This may lead to lower performance
    of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
    being used. Prior to this change, failing to install driver API hooks could
    lead to runtime errors or data corruption when Cuda memory is used and the
    application is linked statically with the Cuda runtime.
    To revert to the previous behavior (when the application is linked
    dynamically with the Cuda runtime), set UCX_MEM_CUDA_HOOK_MODE=reloc.
    See more info in #7865.

v1.12.1-rc2

14 Feb 20:43
c5a9d4e
v1.12.1-rc2 Pre-release

1.12.1-rc2 (February 14, 2022)

Bugfixes

  • Fixed memory hooks for Cuda 11.5
  • Fixed memory type cache merge
  • Fixed continuously triggering wakeup fd when keepalive is used
  • Fixed memtype cache fallback when memory hooks are not installed
  • Fixed parsing header flags of worker address
  • Fixed pipeline protocol when sending from host memory to GPU memory

Important changes

  • If Cuda memory hooks on driver API cannot be installed, memory type cache and
    memory registration cache will be disabled. This may lead to lower performance
    of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
    being used. Prior to this change, failing to install driver API hooks could
    lead to runtime errors or data corruption when Cuda memory is used and the
    application is linked statically with the Cuda runtime.
    To revert to the previous behavior (when the application is linked
    dynamically with the Cuda runtime), set UCX_MEM_CUDA_HOOK_MODE=reloc.
    See more info in #7865.

v1.12.1-rc1

09 Feb 11:51
47f786e
v1.12.1-rc1 Pre-release

1.12.1-rc1 (February 9, 2022)

Bugfixes

  • Fixed memory hooks for Cuda 11.5
  • Fixed memory type cache merge
  • Fixed continuously triggering wakeup fd when keepalive is used
  • Fixed memtype cache fallback when memory hooks are not installed

Important changes

  • If Cuda memory hooks on driver API cannot be installed, memory type cache and
    memory registration cache will be disabled. This may lead to lower performance
    of some applications on setups with NVIDIA GPUs, even if Cuda memory is not
    being used. Prior to this change, failing to install driver API hooks could
    lead to runtime errors or data corruption when Cuda memory is used and the
    application is linked statically with the Cuda runtime.
    To revert to the previous behavior (when the application is linked
    dynamically with the Cuda runtime), set UCX_MEM_CUDA_HOOK_MODE=reloc.
    See more info in #7865.

v1.12.0

12 Jan 15:00
d367332

1.12.0 (January 12, 2022)

Features:

Core

  • Added beta-level support for Go language bindings
  • Added new objects to VFS (md, component, log_level, etc.)
  • Added configuration variable to specify which loadable modules are allowed
  • Added build-time configuration to disable sigaction overriding

UCP

  • Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
  • Added ucp_worker_address_query() API
  • Updated ucp_ep_query() API for getting local and remote addresses
  • Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
  • Added new client/server connection establishment packet header format
  • Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
  • Added iov zcopy support to RMA operations
  • Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
  • Added support for modifying UCT and UCS configs by ucp_config_modify() API
  • Optimized unpacked rkeys memory consumption
  • Added request flag to influence latency vs. bandwidth protocol
  • Reduced memory management overhead with new protocols
  • Improved performance calculations for new protocols
  • Added AMO support with GPU memory target using new protocols
  • Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
  • Added support for user-defined alignment in Active Messages
  • Added support for offload tag sync in new protocols
  • Updated ucp_atomic_post() to use NBX flow

UCT

  • Added API - uct_iface_is_reachable_v2()
  • Added IPv6 address support in TCP
  • Added latency estimation to uct_iface_estimate_perf()
  • Adjusted knem and cma overhead cost
  • Increased built-in TCP keep-alive interval to 2 seconds

RDMA CORE (IB, ROCE, etc.)

  • Added detection of IB NDR devices
  • Added check for CQ overrun in assert mode
  • Added bitmap usage for releasing detached DCIs
  • Added configuration for requests ack frequency with DevX
  • Added remote QP info to tx error CQE traces

UCS

  • Added API for a per-process aggregate-sum statistics report
  • Added memory pool set data structure
  • Added new ptr_array API for bulk allocation
  • Added ucs_string_buffer_append_flags() for string buffer
  • Added ucs_ffs32()
  • Added ucs_vsnprintf_safe() which always adds '\0'
  • Added thread-safe put to ptr_map
  • Improved accuracy of the topology distance estimation
  • Added prints of leaked callbacks from the callback queue
  • Removed a diagnostic message when fuse thread is stopped
  • Added configurable limit for the memory consumed by rcache
  • Added configuration for VFS(FUSE) thread affinity
  • Added memory limit support to memtrack

CUDA

  • Added global memtype cache to allow UCT transports to query memory attributes
  • Auto-register CUDA whole allocations to avoid repeated registration costs
  • Added capability to select CUDA stream based on source and destination memory type
    (required for device memory based pipelining)
  • Added selection of CUDA-IPC capabilities based on NVLINK topology
    (to prefer writes vs. reads for specific platforms using NVML)
  • Added option to set cuda_copy bandwidth
  • Added profiling of CUDA runtime function calls
  • Added option to limit GPUDirectRDMA size in rendezvous protocol

Java

  • Added ucp_listener_reject functionality
  • Added support for setting worker id and querying it from the connection request
  • Added support to bind on a free port in UcpListener

Packaging

  • Added cmake config files for better integration with external cmake based projects

Tests

  • Removed memcpy from AM eager flow in io_demo
  • Added check_qps.sh script to detect stuck QPs
  • Improved diagnostic in test_init_mt
  • Added iov support in ucp_client_server
  • Added option to use epoll in io_demo
  • Added registration of memory allocated by io_demo in memtrack
  • Extended statistics in io_demo
  • Improved logging in io_demo
  • Replaced rand by urand in io_demo
  • More improvements in io_demo
  • Generalized median calculation to support any percentile in ucx_perftest

Tools

  • Added loop-back transport support in ucx_perftest
  • Split ucx_perftest into separate modules
  • Added process placement option for ucx_info
  • Extended parameters correctness check in ucx_perftest
  • Added support for GPU memory RMA and atomics in ucx_perftest

CI

  • Updated gtest 1.7 to 1.10
  • Increased uptime in network corrupter (used for io_demo)
  • Enabled set of gtests for new protocols
  • Added running CI in docker containers
  • Increased thresholds for test_ucp_wait_mem
  • Added test for ucx binary compatibility between OS versions
  • Increased test job timeout to 6 hours
  • Reduced testing time under valgrind
  • Added suppressions for glibc and libnl leaks
  • Relaxed performance requirements in perf test

Bugfixes

Core

  • Fixed invalid remote memory access after connection error
  • Fixed creating more than 64K endpoints between the same peers
  • Fixed simultaneous endpoint close with ucp_hello_world

UCP

  • Fixes and improvements in new protocols infrastructure
  • Fixes in AM flows
  • Fixed tag short threshold selection
  • Multiple fixes in keep-alive protocol
  • Multiple fixes in wire-up protocol
  • Fixes in error flow during rendezvous protocol
  • Multiple fixes in general error flow
  • Fixed fallback to PUT pipeline in rendezvous protocol
  • Reduced default value of keep-alive interval to 20 seconds
  • Fixes in tag_send datatype processing

UCT

  • Fixed keep-alive protocol for intra-node transports (sm, cuda)
  • Fixed deadlock in TCP
  • Suppressed EHOSTUNREACH error in TCP sockcm
  • Restricted connecting loop-back to other devices in TCP

RDMA CORE (IB, ROCE, etc.)

  • Fixed pkey_index initialization when creating RC QP with DEVX
  • Disabled MP_SRQ by default
  • Fixed TX WQ overflow check
  • Fixed dci->pool_index initialization when HAVE_DC_DV is false
  • Fixed syndrome value for creating rdmacm reserved qpn
  • Fixed error code on rdma_establish failure
  • Fixed uct_ep_am_short_iov for UD verbs
  • Fixed handling of error CQE after rc_ep is destroyed
  • Fixes in flow control when error CQE is polled
  • Multiple fixes in RC and DC error flows
  • Fixed deadlock between DCIs and RDMA_READ credits
  • Removed AM handler invocation for PURE_GRANT messages
  • Fixed endpoint arbiter_group leak in DC
  • Fixed resource check in flush for DC

UCS

  • Fixed segmentation fault for ucs_stats_parser
  • Fixed potential crash on cleanup when using UCX profiling
  • Fixed read_profile print of new request
  • Fixed uninitialized variable access in VFS
  • Changed log level of inotify_init failure to diag
  • Fixed integer overflow in mpool chunk allocation

Packaging

  • Fixed with-fuse arg for RPM build

Documentation

  • Fixes in UCP, UCT, UCS, FAQ and README documentation

Tests

  • Multiple fixes in io_demo

CI

  • Fixed snapshot docker name
  • Fixed hipMallocManaged hook gtest
  • Fixes in Azure release pipeline
  • Fixes in Coverity CI
  • Fixed test_uct_query gtest for ROCm
  • Fixes in jenkins test script
  • Fixed release commit title check

v1.12.0-rc3

11 Jan 15:47
d74fd54
v1.12.0-rc3 Pre-release

1.12.0 RC3 (January 11, 2022)

Bugfixes

  • Fixes in tag_send datatype processing
  • Fixed keep-alive protocol for intra-node transports (sm, cuda)

v1.12.0-rc2

08 Jan 20:30
9fe66a5
v1.12.0-rc2 Pre-release

1.12.0 RC2 (January 8, 2022)

Features:

  • Added detection of IB NDR

v1.12.0-rc1

14 Dec 16:07
b98911f
v1.12.0-rc1 Pre-release

1.12.0 RC1 (December 14, 2021)

Features:

Core

  • Added beta-level support for Go language bindings
  • Added new objects to VFS (md, component, log_level, etc.)
  • Added configuration variable to specify which loadable modules are allowed
  • Added build-time configuration to disable sigaction overriding

UCP

  • Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
  • Added ucp_worker_address_query() API
  • Updated ucp_ep_query() API for getting local and remote addresses
  • Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
  • Added new client/server connection establishment packet header format
  • Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
  • Added iov zcopy support to RMA operations
  • Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
  • Added support for modifying UCT and UCS configs by ucp_config_modify() API
  • Optimized unpacked rkeys memory consumption
  • Added request flag to influence latency vs. bandwidth protocol
  • Reduced memory management overhead with new protocols
  • Improved performance calculations for new protocols
  • Added AMO support with GPU memory target using new protocols
  • Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
  • Added support for user-defined alignment in Active Messages
  • Added support for offload tag sync in new protocols
  • Updated ucp_atomic_post() to use NBX flow

UCT

  • Added API - uct_iface_is_reachable_v2()
  • Added IPv6 address support in TCP
  • Added latency estimation to uct_iface_estimate_perf()
  • Adjusted knem and cma overhead cost
  • Increased built-in TCP keep-alive interval to 2 seconds

RDMA CORE (IB, ROCE, etc.)

  • Added check for CQ overrun in assert mode
  • Added bitmap usage for releasing detached DCIs
  • Added configuration for requests ack frequency with DevX
  • Added remote QP info to tx error CQE traces

UCS

  • Added API for a per-process aggregate-sum statistics report
  • Added memory pool set data structure
  • Added new ptr_array API for bulk allocation
  • Added ucs_string_buffer_append_flags() for string buffer
  • Added ucs_ffs32()
  • Added ucs_vsnprintf_safe() which always adds '\0'
  • Added thread-safe put to ptr_map
  • Improved accuracy of the topology distance estimation
  • Added prints of leaked callbacks from the callback queue
  • Removed a diagnostic message when fuse thread is stopped
  • Added configurable limit for the memory consumed by rcache
  • Added configuration for VFS(FUSE) thread affinity
  • Added memory limit support to memtrack

CUDA

  • Added global memtype cache to allow UCT transports to query memory attributes
  • Auto-register CUDA whole allocations to avoid repeated registration costs
  • Added capability to select CUDA stream based on source and destination memory type
    (required for device memory based pipelining)
  • Added selection of CUDA-IPC capabilities based on NVLINK topology
    (to prefer writes vs. reads for specific platforms using NVML)
  • Added option to set cuda_copy bandwidth
  • Added profiling of CUDA runtime function calls
  • Added option to limit GPUDirectRDMA size in rendezvous protocol

Java

  • Added ucp_listener_reject functionality
  • Added support for setting worker id and querying it from the connection request
  • Added support to bind on a free port in UcpListener

Packaging

  • Added cmake config files for better integration with external cmake based projects

Tests

  • Removed memcpy from AM eager flow in io_demo
  • Added check_qps.sh script to detect stuck QPs
  • Improved diagnostic in test_init_mt
  • Added iov support in ucp_client_server
  • Added option to use epoll in io_demo
  • Added registration of memory allocated by io_demo in memtrack
  • Extended statistics in io_demo
  • Improved logging in io_demo
  • Replaced rand by urand in io_demo
  • More improvements in io_demo
  • Generalized median calculation to support any percentile in ucx_perftest

Tools

  • Added loop-back transport support in ucx_perftest
  • Split ucx_perftest into separate modules
  • Added process placement option for ucx_info
  • Extended parameters correctness check in ucx_perftest
  • Added support for GPU memory RMA and atomics in ucx_perftest

CI

  • Updated gtest 1.7 to 1.10
  • Increased uptime in network corrupter (used for io_demo)
  • Enabled set of gtests for new protocols
  • Added running CI in docker containers
  • Increased thresholds for test_ucp_wait_mem
  • Added test for ucx binary compatibility between OS versions
  • Increased test job timeout to 6 hours
  • Reduced testing time under valgrind
  • Added suppressions for glibc and libnl leaks
  • Relaxed performance requirements in perf test

Bugfixes

Core

  • Fixed invalid remote memory access after connection error
  • Fixed creating more than 64K endpoints between the same peers
  • Fixed simultaneous endpoint close with ucp_hello_world

UCP

  • Fixes and improvements in new protocols infrastructure
  • Fixes in AM flows
  • Fixed tag short threshold selection
  • Multiple fixes in keep-alive protocol
  • Multiple fixes in wire-up protocol
  • Fixes in error flow during rendezvous protocol
  • Multiple fixes in general error flow
  • Fixed fallback to PUT pipeline in rendezvous protocol
  • Reduced default value of keep-alive interval to 20 seconds

UCT

  • Fixed deadlock in TCP
  • Suppressed EHOSTUNREACH error in TCP sockcm
  • Restricted connecting loop-back to other devices in TCP

RDMA CORE (IB, ROCE, etc.)

  • Fixed pkey_index initialization when creating RC QP with DEVX
  • Disabled MP_SRQ by default
  • Fixed TX WQ overflow check
  • Fixed dci->pool_index initialization when HAVE_DC_DV is false
  • Fixed syndrome value for creating rdmacm reserved qpn
  • Fixed error code on rdma_establish failure
  • Fixed uct_ep_am_short_iov for UD verbs
  • Fixed handling of error CQE after rc_ep is destroyed
  • Fixes in flow control when error CQE is polled
  • Multiple fixes in RC and DC error flows
  • Fixed deadlock between DCIs and RDMA_READ credits
  • Removed AM handler invocation for PURE_GRANT messages
  • Fixed endpoint arbiter_group leak in DC
  • Fixed resource check in flush for DC

UCS

  • Fixed segmentation fault for ucs_stats_parser
  • Fixed potential crash on cleanup when using UCX profiling
  • Fixed read_profile print of new request
  • Fixed uninitialized variable access in VFS
  • Changed log level of inotify_init failure to diag
  • Fixed integer overflow in mpool chunk allocation

Packaging

  • Fixed with-fuse arg for RPM build

Documentation

  • Fixes in UCP, UCT, UCS, FAQ and README documentation

Tests

  • Multiple fixes in io_demo

CI

  • Fixed snapshot docker name
  • Fixed hipMallocManaged hook gtest
  • Fixes in Azure release pipeline
  • Fixes in Coverity CI
  • Fixed test_uct_query gtest for ROCm
  • Fixes in jenkins test script
  • Fixed release commit title check