v1.10.0-rc2

Pre-release
@shamisp shamisp released this 27 Feb 03:28
f609817

1.10.0-rc2 (February 2, 2021)

Features:

Core

  • Added support for Nvidia HPC SDK
  • Added support for latest PGI and Clang
  • Added support for ROCM-3.7+ (warning generated if older version detected)

Architecture

  • Added Arm SVE memcpy()
  • Redesigned Arm WFE support
  • Improved clear_cache performance for Arm
  • Added architecture detection for Zhaoxin CPU

CI

  • Added release builds on CUDA 11
  • Enabled performance validation in gtest

UCP

  • Added locality awareness to the transport selection logic for GPU devices
  • Added put/offload/short and put/offload/zcopy protocols
  • Added receive message nbx routine
  • Reworked AM implementation and API, which adds support for RNDV semantics
  • Added support for multi-lane connection manager over TCP
  • Added support for printing AM tls with info log level
  • Implemented flush and destroy for UCT EPs on UCP worker
  • Reduced UCP request size
  • Added support for keepalive protocol
  • Added support for multi-fragment protocol
  • Added implementation for protocol progress for eager, bcopy, and multicopy
  • Improved protocol selection logic
  • Added new protocols for UCP get operation
  • Added bcopy protocols with support for GPU memory
  • Added RNDV protocol implementation for GPU devices (CUDA, ROCm)
  • Set SOCKADDR_CM_ENABLE=y by default
  • Added support for fast-path short with new tag protocols
  • Added a new parameter to control the CM listener's backlog
  • Added support for sending AM RTS over short message protocol
  • Added support for shared memory multi-lane when CM is used

UCT

  • Added API for keepalive_timeout value
  • Added uct_completion.status
  • Allowed transports to access multiple mem_types
  • Removed status arg from uct_completion_callback_t
  • Restructured uct_mem_alloc/uct_md_mem_alloc to use mem_type
  • Updated documentation for uct_listener_params
  • Lowered the log level for certain network errors
  • Added cuda_copy wakeup feature
  • Added wakeup support for shared memory
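
The two completion-related items above (a status field added to the completion object, and the status argument removed from the callback) can be pictured with a small sketch. This is not the UCT API: every type and function prefixed with sketch_ is a hypothetical stand-in defined here for illustration only, and the "first error wins" rule is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real UCT definitions. */
typedef enum { SKETCH_OK = 0, SKETCH_ERR_IO = -1 } sketch_status_t;

typedef struct sketch_completion sketch_completion_t;

/* The callback receives only the completion object, no status argument. */
typedef void (*sketch_completion_callback_t)(sketch_completion_t *self);

struct sketch_completion {
    sketch_completion_callback_t func;   /* invoked when count reaches zero */
    int                          count;  /* outstanding operations */
    sketch_status_t              status; /* result carried by the object */
};

/* A user request that embeds the completion object. */
typedef struct {
    sketch_completion_t comp;
    sketch_status_t     result;
    int                 done;
} sketch_request_t;

/* The callback reads the status from the completion object itself. */
static void sketch_request_done(sketch_completion_t *self)
{
    sketch_request_t *req =
        (sketch_request_t *)((char *)self - offsetof(sketch_request_t, comp));

    req->result = self->status;
    req->done   = 1;
}

/* What a transport might do when one of the tracked operations finishes. */
static void sketch_complete_one(sketch_completion_t *comp,
                                sketch_status_t status)
{
    assert(comp->count > 0);

    /* Assumption of this sketch: the first error is the one that is kept. */
    if ((status != SKETCH_OK) && (comp->status == SKETCH_OK)) {
        comp->status = status;
    }

    if (--comp->count == 0) {
        comp->func(comp);
    }
}
```

Keeping the status inside the object lets a single callback signature serve both successful and failed completions, at the cost of the caller having to initialize the field before posting operations.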

UCS

  • Added "inf" and "auto" values to time units
  • Added on-stack constructors for array and string buffer
  • Added ucs_ptr_map_t data structure
  • Added bool CSWAP
  • Improved logging
  • Added optimization for namespace processing
  • Fixes for connection matching functionality
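
As a rough illustration of what accepting "inf" and "auto" alongside numeric time values can look like, here is a minimal parser sketch. It is not the UCS implementation: the function name, the sentinel constants, and the accepted suffixes (s, ms, us) are all assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sentinels; the real library may represent these differently. */
#define SKETCH_TIME_INFINITY (-1.0)
#define SKETCH_TIME_AUTO     (-2.0)

/*
 * Parse a time string into seconds.
 * Accepts "inf", "auto", or a non-negative number with an optional
 * suffix "s", "ms" or "us". Returns 0 on success, -1 on a parse error.
 */
static int sketch_parse_time(const char *str, double *seconds)
{
    char *end;
    double value;

    assert((str != NULL) && (seconds != NULL));

    /* Special values are matched before any numeric parsing. */
    if (strcmp(str, "inf") == 0) {
        *seconds = SKETCH_TIME_INFINITY;
        return 0;
    }
    if (strcmp(str, "auto") == 0) {
        *seconds = SKETCH_TIME_AUTO;
        return 0;
    }

    /* Require a leading digit so strtod() cannot accept "infinity" or "nan". */
    if (!isdigit((unsigned char)str[0])) {
        return -1;
    }

    value = strtod(str, &end);
    if (end == str) {
        return -1;
    }

    if ((*end == '\0') || (strcmp(end, "s") == 0)) {
        *seconds = value;
    } else if (strcmp(end, "ms") == 0) {
        *seconds = value / 1e3;
    } else if (strcmp(end, "us") == 0) {
        *seconds = value / 1e6;
    } else {
        return -1;
    }
    return 0;
}
```

Callers would compare the result against the two sentinels before treating it as a duration.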

RDMA CORE (IB, ROCE, etc.)

  • Added support for auto detection of adaptive routing settings
  • Added an option to poll TX CQ every progress iteration
  • Added local and remote addresses to the reject error message
  • Added support for UAR allocation with non-cacheable memory type
  • Added support for multiple flush cancel without completion
  • Added async events callback support
  • Added detection for ConnectX-6, ConnectX-7 and BlueField-1/2 devices
  • Added support for connection matching for UD
  • Added a check for AM ordering

Java (preview)

  • Added support for a different javadoc executable path for different Java versions
  • Added UCS memory type constants
  • Added support for building on Java 10+
  • Added support for io-vector datatype

Tests

  • Added CI for CUDA 11
  • Added test_ucp_sockaddr_protocols.stream_short
  • Reimplemented tests using NBX API
  • Added flush(cancel) test
  • Added memory_wait mode to perftest
  • Added support for clang 10
  • Refactored RMA and atomic tests, added memtype support
  • Added test for uct_md_mem_query()
  • Added request interrupt support
  • Added support for connection manager fallbacks
  • Added new ucp request test checking for leaks from the ptr_map

Documentation

  • Added glossaries

Bugfixes:

Portability

  • Fixes in print functions to use format macros such as PRIx64
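
For readers unfamiliar with the macro named above: PRIx64 comes from the standard C header <inttypes.h> and expands to the correct printf conversion for a 64-bit hexadecimal value on the target platform. The helper below is a hypothetical example written for these notes, not code from the project.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a 64-bit value as hexadecimal.
 * Using "%lx" or "%llx" directly is only correct where uint64_t happens
 * to be unsigned long or unsigned long long; PRIx64 is correct everywhere.
 */
static int sketch_format_u64_hex(char *buf, size_t len, uint64_t value)
{
    assert(buf != NULL);
    return snprintf(buf, len, "0x%" PRIx64, value);
}
```

The macro is concatenated with the rest of the format string at compile time, so it adds no runtime cost.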

Continuous Integration

  • Fixes in Github release flow
  • Fixes in docker image

Packaging

  • Removed deb package dependencies
  • Fixes in SPEC to make the RPM relocatable

Documentation

  • Fixes in documentation for ucp_am_recv_data_nbx
  • Fixes in quick start example
  • Fixes in installation instructions

Tests

  • Fixes for failures under valgrind runtime
  • Fixes in mmap tests for 0-length RMA
  • Fixes in definition of LAST_WQE wait timeout
  • Fixes in ROCm for mem_buffer test
  • Fixes in test name printing format
  • Fixes in tcp_sockcm test

UCP

  • Fixes in worker cleanup flow

CUDA

  • Fixes in managed memory support

RDMA CORE (IB, ROCE, etc.)

  • Fixes in assert definitions
  • Fixes in printing an error about invalid AM Bcopy length for UD
  • Fixes for thread safety support
  • Fixes to get ROCE device name according to GID
  • Fixes for SL selection
  • Fixes in creating STRICT_ORDER key
  • Fixes addressing performance degradation in UD transport due to excess async events

UGNI

  • Fixes in disable logic in config
  • Fixes for clang 11 warnings

Java

  • Fixes in build dependencies
  • Fixes in constructing UcpRequest object on error
  • Fixes in exception handling on endpoint closure request
  • Fixes for segfault in UcpErrorHandler

UCP

  • Fixes in datatype support for get_zcopy RNDV
  • Fixes in connection manager disconnect
  • Fixes in assert definitions
  • Fixes in completion flow for failed EP
  • Fixes in flush error handling flow
  • Fixes in latency calculations for wireup protocol
  • Fixes in offload completion with inlined data
  • Fixes in unpacking flow
  • Fixes in error handling for various protocols

UCT

  • Fixes in flush TX
  • Fixes in checks for enabling GPU Direct RDMA

UCS

  • Fixes for crashes on incorrect value set in config
  • Fixes in ptr_array
  • Fixes in maximal size for ucs_snprintf_safe()
  • Fixes for compilation warnings
  • Fixes in ucs_aarch64_dsb(_op) definition

TCP

  • Fixes in default route interface confirmation flow
  • Fixes in PUT protocol
  • Fixes in max connection limit and improved error reporting

UCM

  • Fixes for crash on prevent unload
  • Fixes in libucm_rocm
  • Fixes for a few race conditions