Releases: openucx/ucx

v1.10.1

13 May 02:01
6a5856e

1.10.1 (May 12, 2021)

Bugfixes:

  • Fixes in InfiniBand port speed detection for HDR100
  • Fixes in building gtest-all.cc and sock.c with GCC11
  • Fixes addressing performance degradation with CUDA memory on a self endpoint
  • Fixes in JUCX listener connection handler
  • Fixes in configuration of loopback TCP transport (disabled by default)
  • Fixes in RPM dependency on libibverbs
  • Fixes in ABI backward compatibility for active message protocol
  • Fixes in the DC transport - adding support for full-handshake mode (off by default)
  • Fixes in Active Messages short reply protocol
  • Fixes for segmentation fault while listening for connections

v1.10.1-rc2

10 May 21:41
f633e85
Pre-release

1.10.1 RC2 (May 10, 2021)

Bugfixes:

  • Fixes in InfiniBand port speed detection for HDR100
  • Fixes in building gtest-all.cc and sock.c with GCC11
  • Fixes addressing performance degradation with CUDA memory on a self endpoint
  • Fixes in JUCX listener connection handler
  • Fixes in configuration of loopback TCP transport (disabled by default)
  • Fixes in RPM dependency on libibverbs
  • Fixes in ABI backward compatibility for active message protocol
  • Add support for DC full-handshake mode (off by default)
  • Fixes in Active Messages short reply protocol
  • Fixes for segmentation fault while listening for connections

v1.10.1-rc1

22 Apr 23:57
cbcc551
Pre-release

1.10.1-rc1

Bugfixes:

  • Fix InfiniBand port speed detection for HDR100
  • Fix build issues in gtest-all.cc and sock.c with GCC11
  • Fix performance degradation with CUDA memory on self endpoint
  • Fix bug in JUCX listener connection handler

v1.10.0

09 Mar 23:04
20697e5

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype
  • Removed libjucx from packages

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format string like PRIx64, etc.
  • Fixes for Arm v8 cross compilation support
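
The format-string fix listed above refers to the standard C99 `<inttypes.h>` macros, which select the correct `printf` conversion for fixed-width integers on every platform. A minimal sketch of the pattern follows; the helper name `format_hex64` is hypothetical and is not part of UCX:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a UCX function): writes a 64-bit value as
 * lowercase hex into buf. PRIx64 expands to the conversion specifier
 * that matches uint64_t, so no cast to unsigned long is needed.
 * Returns the number of characters snprintf would have written. */
static int format_hex64(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL);
    return snprintf(buf, size, "0x%" PRIx64, value);
}
```

Calling `format_hex64(buf, sizeof(buf), UINT64_C(0xdeadbeefcafe))` with a 32-byte buffer fills it with `0xdeadbeefcafe`, whether `uint64_t` is `unsigned long` or `unsigned long long` on the target.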

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Fixes and updates in author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in a release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV
  • Fixes in short active message reply protocol

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc5

27 Feb 15:00
b54f3b2
Pre-release

1.10.0-rc5 (February 26, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format string like PRIx64, etc.
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Fixes and updates in author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in a release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV
  • Fixes in short active message reply protocol

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc4

21 Feb 17:35
96422ce
Pre-release

1.10.0-rc4 (February 20, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest
  • Added new OS for release CI

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used
  • Added missing async locks

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

CUDA

  • Added support for global IPC cache

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format string like PRIx64, etc.
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Fixes and updates in author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in a release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV

CUDA

  • Fixes in managed memory support
  • Fixes in topology detection

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc3

15 Feb 16:27
c334359
Pre-release

1.10.0-rc3 (February 15, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)
  • Added support for GCC11

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering
  • Added better support for non-4K MTU values

Java (preview)

  • Added support for a different javadoc executable path for different java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format string like PRIx64, etc.
  • Fixes for Arm v8 cross compilation support

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions
  • Fixes and updates in author list

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow
  • Fixes in RNDV RTS flow
  • Fix in length check condition for RMA PUT short
  • Fixes in handling failures from AM Bcopy
  • Fix in a release flow of deferred data
  • Fixes for invalid ID and handling of status in RNDV

CUDA

  • Fixes in managed memory support

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events
  • Fixes in QP destroy
  • Fixes for CQ creation failure using old Verbs API

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc2

27 Feb 03:28
f609817
Pre-release

1.10.0-rc2 (February 2, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering

Java (preview)

  • Added support for a different javadoc executable path for different java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format string like PRIx64, etc.

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow

CUDA

  • Fixes in managed memory support

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes in compilation warning
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions

v1.10.0-rc1

05 Jan 22:24
a212a09
Pre-release

Features: TBD

Bugfixes: TBD

v1.9.0

20 Sep 09:41
cd9efd3

Features:

UCX Core

  • Added a new class of communication APIs '*_nbx' that enable API extendability while
    preserving ABI backward compatibility
  • Added asynchronous event support to UCT/IB/DEVX
  • Added support for latest CUDA library version
  • Added NAK-based reliability protocol for UCT/IB/UD to optimize resends
  • Added new tests for ROCm
  • Added new configuration parameters for protocol selection
  • Added performance optimization for Fujitsu A64FX with InfiniBand
  • Added performance optimization for clear cache code on aarch64
  • Added support for relaxed-order PCIe access in IB RDMA transports
  • Added new TCP connection manager
  • Added support for UCT/IB PKey with partial membership in IB transports
  • Added support for RoCE LAG
  • Added support for ROCm 3.7 and above
  • Added flow control for RDMA read operations
  • Improved endpoint flush implementation for UCT/IB
  • Improved UD timer to avoid interrupting the main thread when not in use
  • Improved latency estimation for network path with CUDA
  • Improved error reporting messages
  • Improved performance in active message flow (removed malloc call)
  • Improved performance in ptr_array flow
  • Improved performance in UCT/SM progress engine flow
  • Improved I/O demo code
  • Improved rendezvous protocol for CUDA
  • Updated examples code
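
The '*_nbx' entry in the list above mentions extending an API while preserving ABI backward compatibility. One common way to get that property is to pass optional arguments in a structure guarded by a field mask, so the callee reads only the fields the caller marked as valid. The sketch below illustrates that general pattern only; every name in it (demo_param_t, DEMO_PARAM_FIELD_*, demo_resolve) is hypothetical and is not the UCX API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the UCX API. */
enum {
    DEMO_PARAM_FIELD_FLAGS     = 1u << 0, /* 'flags' field is valid */
    DEMO_PARAM_FIELD_USER_DATA = 1u << 1  /* 'user_data' field is valid */
};

/* Optional arguments. New fields can be appended later with new mask
 * bits; callers built against the older layout never set those bits. */
typedef struct {
    uint32_t field_mask;
    uint32_t flags;
    void     *user_data;
} demo_param_t;

/* Values the callee actually uses after applying defaults. */
typedef struct {
    uint32_t flags;
    void     *user_data;
} demo_resolved_t;

/* Read only the fields flagged in field_mask; default the rest. */
static demo_resolved_t demo_resolve(const demo_param_t *param)
{
    demo_resolved_t resolved = {0, NULL};

    if (param->field_mask & DEMO_PARAM_FIELD_FLAGS) {
        resolved.flags = param->flags;
    }
    if (param->field_mask & DEMO_PARAM_FIELD_USER_DATA) {
        resolved.user_data = param->user_data;
    }
    return resolved;
}
```

Because the callee never touches a field whose mask bit is unset, a caller compiled against an older, shorter version of the structure keeps working when the library adds fields at the end.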

UCX Java (API Preview)

  • Added support for UCX shared library loading from both classpath and LD_LIBRARY_PATH
  • Added configuration map to ucp_params to be able to set UCX properties programmatically

Bugfixes:

  • Fixes for most recent versions of GCC, CLANG, ARMCLANG, PGI
  • Fixes in UCT/IB for strict order keys
  • Fixes in memory barrier code for aarch64
  • Fixes in UCT/IB/DEVX for fork system call
  • Fixes in UCT/IB for rand() call in rdma-core
  • Fixes in group rescheduling for UCT/IB/DC
  • Fixes in UCT/CUDA bandwidth reporting
  • Fixes in rkey_ptr protocol
  • Fixes in lane selection for rendezvous protocol based on get-zero-copy flow
  • Fixes for ROCm build
  • Fixes for XPMEM transport
  • Fixes in closing endpoint code
  • Fixes in RDMACM code
  • Fixes in memcpy selection for AMD
  • Fixes in UCT/UD endpoint flush functionality
  • Fixes in XPMEM detection
  • Fixes in rendezvous staging protocol
  • Fixes in ROCEv1 mlx5 UDP source port configuration
  • Multiple fixes in RPM spec file
  • Multiple fixes in UCP documentation
  • Multiple fixes in socket connection manager
  • Multiple fixes in gtest
  • Multiple fixes in JAVA API implementation