v3.0.2012

@abouteiller abouteiller released this 03 Mar 22:47
· 1012 commits to master since this release
d2ae417

PaRSEC 20.12 (December 2020)

  • PaRSEC API 3.0

  • PaRSEC now requires CMake 3.16.

  • New configure system to ease the installation of PaRSEC. See
    INSTALL for details. This system automates installation on most DOE
    leadership systems.

  • Split DPLASMA and PaRSEC into separate repositories. PaRSEC moves from
    cmake-2.0 to cmake-3.12, using targets. Targets are exported for
    third-party integration.

  • Add visualization tools to extract user-defined properties from the
    application (see: PR 229 visualization-tools)

  • Automate expression of required data transfers from host-to-device and
    device-to-host to satisfy dependencies (and anti-dependencies). PaRSEC tracks
    multiple versions of the same data as data copies, with a coherency algorithm
    that initiates data transfers as needed. The heuristic for the eviction policy
    used when the GPU runs out of memory has been optimized to allow efficient
    operation on problems larger than GPU memory.
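
    As an illustration of version-tracked data copies, here is a minimal
    single-process sketch. It is not PaRSEC code: data_t, data_read,
    data_write, and NB_DEVICES are hypothetical names, the payload is a
    single int, and eviction is not modeled. It only shows the idea that a
    transfer is triggered when the local copy is older than the newest one.

```c
#include <assert.h>

#define NB_DEVICES 3   /* hypothetical: device 0 is the host, 1 and 2 are GPUs */

/* One logical piece of data with one copy slot per device (illustrative only). */
typedef struct {
    int version[NB_DEVICES];  /* -1 means "no copy on this device" */
    int value[NB_DEVICES];    /* stand-in for the actual payload */
    int transfers;            /* number of simulated copies between devices */
} data_t;

/* Return the device holding the most recent version. */
static int newest_device(const data_t *d)
{
    int best = 0;
    for (int i = 1; i < NB_DEVICES; i++)
        if (d->version[i] > d->version[best]) best = i;
    return best;
}

/* Read on `dev`: transfer from the newest copy only if the local one is stale. */
static int data_read(data_t *d, int dev)
{
    assert(dev >= 0 && dev < NB_DEVICES);
    int src = newest_device(d);
    if (d->version[dev] < d->version[src]) {
        d->value[dev] = d->value[src];
        d->version[dev] = d->version[src];
        d->transfers++;
    }
    return d->value[dev];
}

/* Write on `dev`: bring the copy up to date, modify it, and bump its version. */
static void data_write(data_t *d, int dev, int value)
{
    (void)data_read(d, dev);
    d->value[dev] = value;
    d->version[dev]++;
}
```

    In this model, a second read on the same device costs no transfer, and
    a write on one device makes every other copy stale until it is next
    read.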

  • Add support for MPI out-of-order matching capabilities. Add the capability
    for compute threads to send control messages directly to remote nodes to
    indicate task completion (without delegation to the communication thread).

  • Remove the EAGER communication mode from the runtime. It had a rare,
    hard-to-fix bug that could cause a deadlock, and its performance
    benefit was small.

  • Add a Map operator on the Block Cyclic matrix data collection that
    performs in-place data transformation on the collection with a user provided
    operator.
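
    The idea of an in-place map over a block-cyclic collection can be
    sketched as follows. This is a single-process model under stated
    assumptions, not the PaRSEC operator: bc_matrix_t, bc_owner, bc_map,
    tile_op_t, and scale_op are hypothetical names, tiles are stored
    contiguously in row-major tile order, and ownership follows the usual
    2D block-cyclic rule on a P x Q process grid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user operator: transforms one tile in place. */
typedef void (*tile_op_t)(double *tile, int mb, int nb, int m, int n, void *args);

/* Hypothetical tiled matrix distributed over a P x Q process grid. */
typedef struct {
    int P, Q;       /* process grid dimensions */
    int mt, nt;     /* number of tile rows and tile columns */
    int mb, nb;     /* tile dimensions in elements */
    double *tiles;  /* all tiles, contiguous, row-major by tile index */
} bc_matrix_t;

/* 2D block-cyclic ownership: tile (m, n) belongs to one rank of the grid. */
static int bc_owner(const bc_matrix_t *A, int m, int n)
{
    return (m % A->P) * A->Q + (n % A->Q);
}

/* Apply `op` in place to every tile owned by `rank`; return the tile count. */
static int bc_map(bc_matrix_t *A, int rank, tile_op_t op, void *args)
{
    int touched = 0;
    assert(op != NULL);
    for (int m = 0; m < A->mt; m++) {
        for (int n = 0; n < A->nt; n++) {
            if (bc_owner(A, m, n) != rank) continue;
            size_t offset = ((size_t)m * A->nt + n) * (size_t)A->mb * A->nb;
            op(A->tiles + offset, A->mb, A->nb, m, n, args);
            touched++;
        }
    }
    return touched;
}

/* Example operator: scale every element of the tile by *args. */
static void scale_op(double *tile, int mb, int nb, int m, int n, void *args)
{
    double alpha = *(double *)args;
    (void)m; (void)n;
    for (int i = 0; i < mb * nb; i++) tile[i] *= alpha;
}
```

    Calling bc_map(&A, rank, scale_op, &alpha) on each rank would scale the
    whole matrix, with every rank touching only its own tiles.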

  • Add support in the runtime for user-defined properties evaluated at
    runtime and easy to export through a shared memory region (see: PR
    229 visualization-tools)

  • Add a PAPI-SDE interface to the parsec library, to expose internal
    counters via the PAPI-Software Defined Events interface.

  • Add backend support for OTF2 in the profiling mechanism. OTF2 is
    used automatically if an OTF2 installation is found.

  • Add an MCA parameter to control the number of ejected blocks from GPU
    memory (device_cuda_max_number_of_ejected_data). Add an MCA parameter
    to control whether or not the GPU engine will take some time to sort
    the first N tasks of the pending queue (device_cuda_sort_pending_list).

  • Simplify the user's view of PaRSEC: for most usages, users only have
    to include a single header (parsec.h) and link with a single library
    (-lparsec).

  • Update the PaRSEC DSL handling of initial tasks. We now rely on 2
    pieces of information: the number of DSL tasks, and the number of
    tasks imposed by the system (all types of data transfer).

  • Add a purely local scheduler (ll) that uses a single LIFO per
    thread. Each schedule operation does 1 atomic (push in local queue);
    each select operation does up to t atomics, where t is the number of
    threads (pop in local queue, then try every other thread's queue until
    they have all been tested empty).
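
    The policy described for the ll scheduler can be modeled as below.
    This is a single-threaded sketch for counting queue operations, not
    the PaRSEC implementation: the LIFOs are plain arrays rather than
    lock-free structures, and lifo_t, ll_schedule, ll_select, queue_ops,
    NB_THREADS, and LIFO_CAPACITY are hypothetical names. Each counted
    queue operation stands in for one atomic.

```c
#include <assert.h>
#include <stddef.h>

#define NB_THREADS 4       /* hypothetical thread count (the "t" of the note) */
#define LIFO_CAPACITY 64   /* hypothetical fixed capacity of each LIFO */

typedef struct {
    int tasks[LIFO_CAPACITY];
    size_t top;            /* number of tasks currently stored */
} lifo_t;

static lifo_t queues[NB_THREADS];
static unsigned long queue_ops;  /* counts pushes and pop attempts */

/* schedule: exactly one operation, a push on the calling thread's LIFO. */
static int ll_schedule(int tid, int task)
{
    assert(tid >= 0 && tid < NB_THREADS);
    lifo_t *q = &queues[tid];
    queue_ops++;
    if (q->top == LIFO_CAPACITY) return -1;  /* queue full */
    q->tasks[q->top++] = task;
    return 0;
}

/* One pop attempt on a given LIFO; returns 1 and fills *task on success. */
static int lifo_pop(lifo_t *q, int *task)
{
    queue_ops++;
    if (q->top == 0) return 0;
    *task = q->tasks[--q->top];
    return 1;
}

/* select: try the local LIFO first, then each other thread's LIFO in turn.
 * At most NB_THREADS pop attempts are made before reporting "no task". */
static int ll_select(int tid, int *task)
{
    assert(tid >= 0 && tid < NB_THREADS);
    for (int i = 0; i < NB_THREADS; i++) {
        if (lifo_pop(&queues[(tid + i) % NB_THREADS], task)) return 1;
    }
    return 0;
}
```

    The visiting order of the other queues (round-robin from the caller's
    index) is a choice made for this sketch; the note does not specify it.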

  • Add a --ignore-properties=... option to parsec_ptgpp.

  • Change the API of hash tables to allow keys of arbitrary size. The API
    defines how to build a key from a task; how to hash a key into
    1 <= N <= 64 bits; and how to compare two keys (plus a printing
    function for debugging).
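
    The four roles listed above (build, hash, compare, print) can be
    sketched as a table of function pointers. This is an illustration
    only and does not reproduce the PaRSEC hash table API: demo_task_t,
    demo_key_t, demo_key_fn_t, and the demo_* functions are hypothetical
    names, and the FNV-1a hash is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { int32_t m, n, k; } demo_task_t;  /* hypothetical task */
typedef struct { int32_t m, n, k; } demo_key_t;   /* 96-bit key: wider than 64 bits */

/* Hypothetical key-function table: build, hash, compare, print. */
typedef struct {
    void     (*make_key)(const demo_task_t *task, demo_key_t *key);
    uint64_t (*hash)(const demo_key_t *key, int nb_bits);
    int      (*equal)(const demo_key_t *a, const demo_key_t *b);
    int      (*print)(char *buf, size_t len, const demo_key_t *key);
} demo_key_fn_t;

static void demo_make_key(const demo_task_t *task, demo_key_t *key)
{
    key->m = task->m; key->n = task->n; key->k = task->k;
}

/* Mix the 4 bytes of v into an FNV-1a 64-bit hash state. */
static uint64_t fnv1a_mix(uint64_t h, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Hash the key into nb_bits bits (1 <= nb_bits <= 64), keeping the top bits. */
static uint64_t demo_hash(const demo_key_t *key, int nb_bits)
{
    assert(nb_bits >= 1 && nb_bits <= 64);
    uint64_t h = 14695981039346656037ULL;
    h = fnv1a_mix(h, (uint32_t)key->m);
    h = fnv1a_mix(h, (uint32_t)key->n);
    h = fnv1a_mix(h, (uint32_t)key->k);
    return h >> (64 - nb_bits);
}

static int demo_equal(const demo_key_t *a, const demo_key_t *b)
{
    return a->m == b->m && a->n == b->n && a->k == b->k;
}

/* Debug helper: writes "(m, n, k)" and returns snprintf's result. */
static int demo_print(char *buf, size_t len, const demo_key_t *key)
{
    return snprintf(buf, len, "(%d, %d, %d)", (int)key->m, (int)key->n, (int)key->k);
}

static const demo_key_fn_t demo_key_fns = {
    demo_make_key, demo_hash, demo_equal, demo_print
};
```

    A hash table built on such a table would never need to know the key
    layout: it asks for N bits to pick a bucket and calls equal to resolve
    collisions.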

  • Change behavior of DEBUG_HISTORY: log all information inside
    a per-thread buffer of fixed size (MCA parameter), do not allocate
    memory during logging, and use timestamps to re-order the output
    when the user calls dump().
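
    A minimal sketch of this logging scheme, under stated assumptions:
    hist_entry_t, hist_ring_t, hist_log, hist_dump, NB_THREADS,
    HISTORY_LEN, and MSG_LEN are hypothetical names, the buffer size is a
    compile-time constant rather than an MCA parameter, and timestamps
    are passed in by the caller. Logging overwrites the oldest entry in
    place and never allocates; ordering is only established at dump time.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NB_THREADS 2    /* hypothetical thread count */
#define HISTORY_LEN 4   /* hypothetical fixed number of entries per thread */
#define MSG_LEN 32      /* hypothetical fixed message size */

typedef struct { uint64_t ts; int tid; char msg[MSG_LEN]; } hist_entry_t;
typedef struct { hist_entry_t entries[HISTORY_LEN]; uint64_t count; } hist_ring_t;

static hist_ring_t rings[NB_THREADS];

/* Log into the calling thread's ring: no allocation, oldest entry overwritten. */
static void hist_log(int tid, uint64_t ts, const char *msg)
{
    assert(tid >= 0 && tid < NB_THREADS);
    hist_ring_t *r = &rings[tid];
    hist_entry_t *e = &r->entries[r->count % HISTORY_LEN];
    e->ts = ts;
    e->tid = tid;
    snprintf(e->msg, MSG_LEN, "%s", msg);  /* truncates long messages */
    r->count++;
}

static int by_timestamp(const void *a, const void *b)
{
    uint64_t ta = ((const hist_entry_t *)a)->ts;
    uint64_t tb = ((const hist_entry_t *)b)->ts;
    return (ta > tb) - (ta < tb);
}

/* Gather the surviving entries of all threads and sort them by timestamp.
 * `out` must have room for NB_THREADS * HISTORY_LEN entries. */
static size_t hist_dump(hist_entry_t *out)
{
    size_t n = 0;
    for (int t = 0; t < NB_THREADS; t++) {
        uint64_t kept = rings[t].count < HISTORY_LEN ? rings[t].count : HISTORY_LEN;
        for (uint64_t i = 0; i < kept; i++) out[n++] = rings[t].entries[i];
    }
    qsort(out, n, sizeof *out, by_timestamp);
    return n;
}
```

    Because each thread writes only to its own ring, the logging path
    needs no synchronization in this model; the cost of merging is paid
    once, when the history is dumped.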

  • Update the DTD interface (new flag to send a pointer as a parameter,
    simpler unpacking of parameters, etc.).

  • DTD provides an MCA parameter (dtd_debug_verbose) to print information
    about the traversal of the DAG in an output stream separate from the
    default.