v3.0.2012
PaRSEC 20.12 (December 2020)
 - PaRSEC API 3.0
 - PaRSEC now requires CMake 3.16.
 - New configure system to ease the installation of PaRSEC. See
   INSTALL for details. This system automates installation on most DOE
   leadership systems.
 - Split DPLASMA and PaRSEC into separate repositories. PaRSEC moves from
   cmake-2.0 to cmake-3.12 conventions, using targets. Targets are exported
   for third-party integration.
 - Add visualization tools to extract user-defined properties from the
   application (see: PR 229 visualization-tools).
 - Automate the expression of the host-to-device and device-to-host data
   transfers required to satisfy dependencies (and anti-dependencies). PaRSEC
   tracks multiple versions of the same data as data copies with a coherency
   algorithm that initiates data transfers as needed. The heuristic for the
   eviction policy on out-of-memory events on the GPU has been optimized to
   allow for efficient operation on problems larger than GPU memory.
 - Add support for MPI out-of-order matching capabilities; add the capability
   for compute threads to send direct control messages signaling task completion
   to remote nodes (without delegating to the communication thread).
 - Remove the EAGER communication mode from the runtime. It had a rare but
   hard-to-fix bug that could cause deadlocks, and the performance benefit
   was small.
 - Add a Map operator on the block-cyclic matrix data collection that
   performs in-place data transformation on the collection with a
   user-provided operator.
 - Add support in the runtime for user-defined properties evaluated at
   runtime and easy to export through a shared memory region (see: PR
   229 visualization-tools).
 - Add a PAPI-SDE interface to the parsec library, to expose internal
   counters via the PAPI Software Defined Events (SDE) interface. A reader
   sketch follows this list.
 - Add backend support for OTF2 in the profiling mechanism. OTF2 is
   used automatically if an OTF2 installation is found.
 - Add an MCA parameter to control the number of ejected blocks from GPU
   memory (device_cuda_max_number_of_ejected_data). Add an MCA parameter
   to control whether or not the GPU engine will take some time to sort
   the first N tasks of the pending queue (device_cuda_sort_pending_list).
   An example of setting MCA parameters follows this list.
 - Reshape the users' view of PaRSEC: they only need to include a single
   header (parsec.h) for most usages, and link with a single library
   (-lparsec). A minimal example follows this list.
 - Update the PaRSEC DSL handling of initial tasks. We now rely on two
   pieces of information: the number of DSL tasks, and the number of
   tasks imposed by the system (all types of data transfer).
 - Add a purely local scheduler (ll) that uses a single LIFO per thread.
   Each schedule operation does 1 atomic operation (push into the local
   queue); each select operation does up to t atomic operations (pop from
   the local queue, then try every other thread's queue until all have been
   found empty).
 - Add a --ignore-properties=... option to parsec_ptgpp.
 - Change the API of hash tables: allow keys of arbitrary size. The API
   specifies how to build a key from a task, how to hash a key into
   1 <= N <= 64 bits, and how to compare two keys (plus a printing
   function for debugging). A sketch of the key functions follows this list.
 - Change the behavior of DEBUG_HISTORY: log all information inside a
   per-thread buffer of fixed size (MCA parameter), do not allocate
   memory during logging, and use timestamps to re-order the output
   when the user calls dump().
 - Update the DTD interface (new flag to send a pointer as a parameter,
   simpler unpacking of parameters, etc.).
 - DTD provides an MCA parameter (dtd_debug_verbose) to print information
   about the traversal of the DAG in a separate output stream from the
   default one.
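
The following is a minimal sketch of reading one of the new PAPI-SDE counters
from an application, using only standard PAPI calls. The event name
"sde:::PARSEC::SCHEDULED_TASKS" is a placeholder for illustration; the
counters actually exported by a given build can be listed with the
papi_native_avail utility.

    /* Sketch: read a PaRSEC software-defined event through standard PAPI.
     * The event name is a placeholder; use papi_native_avail to list the
     * SDE counters actually exported by your build. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <papi.h>

    int main(void)
    {
        int evset = PAPI_NULL;
        long long value;

        if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
            fprintf(stderr, "PAPI initialization failed\n");
            return EXIT_FAILURE;
        }
        if (PAPI_create_eventset(&evset) != PAPI_OK)
            return EXIT_FAILURE;
        /* hypothetical event name, for illustration only */
        if (PAPI_add_named_event(evset, "sde:::PARSEC::SCHEDULED_TASKS") != PAPI_OK) {
            fprintf(stderr, "SDE counter not available in this build\n");
            return EXIT_FAILURE;
        }
        PAPI_start(evset);
        /* ... run some PaRSEC work here ... */
        PAPI_stop(evset, &value);
        printf("counter value: %lld\n", value);
        return EXIT_SUCCESS;
    }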
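
The sketch below shows one way the new CUDA device MCA parameters could be
set before initializing PaRSEC. The PARSEC_MCA_<name> environment-variable
convention and the values used here are assumptions for illustration only.

    /* Sketch: set the new CUDA device MCA parameters before parsec_init().
     * The PARSEC_MCA_<name> environment-variable convention and the values
     * used here are assumptions for illustration. */
    #include <stdlib.h>
    #include <parsec.h>

    int main(int argc, char *argv[])
    {
        setenv("PARSEC_MCA_device_cuda_max_number_of_ejected_data", "4", 1);
        setenv("PARSEC_MCA_device_cuda_sort_pending_list", "1", 1);

        parsec_context_t *parsec = parsec_init(-1, &argc, &argv);  /* -1: use all cores */
        /* ... add and run taskpools ... */
        parsec_fini(&parsec);
        return 0;
    }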
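
As an illustration of the single-header usage mentioned above, here is a
minimal sketch of a program that only includes parsec.h and links against
-lparsec (include and library paths adjusted to the installation prefix).

    /* Sketch: the runtime is reachable through a single header and a
     * single library.  Build with something like:
     *     cc -I<prefix>/include hello_parsec.c -L<prefix>/lib -lparsec */
    #include <stdio.h>
    #include <parsec.h>

    int main(int argc, char *argv[])
    {
        parsec_context_t *parsec = parsec_init(-1, &argc, &argv);
        if (NULL == parsec) {
            fprintf(stderr, "parsec_init failed\n");
            return 1;
        }
        /* taskpools (PTG, DTD, ...) would be added and executed here */
        parsec_fini(&parsec);
        return 0;
    }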
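
The sketch below illustrates the three key operations the reworked hash-table
API expects from the user: comparing two keys, hashing a key into
1 <= N <= 64 bits, and printing a key for debugging. The type and function
names are illustrative only and are not the PaRSEC declarations; see
parsec/class/parsec_hash_table.h for the actual interface.

    /* Illustrative only: these names are NOT the PaRSEC declarations; they
     * mirror the three operations the new hash-table API asks the user to
     * provide (compare, hash into N bits, print). */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>
    #include <inttypes.h>

    typedef struct { uint32_t m, n; } tile_key_t;  /* an arbitrary, fixed-size key */

    /* Compare two keys for equality. */
    static int tile_key_equal(const void *a, const void *b, void *user_data)
    {
        (void)user_data;
        return 0 == memcmp(a, b, sizeof(tile_key_t));
    }

    /* Hash a key into nb_bits bits, 1 <= nb_bits <= 64. */
    static uint64_t tile_key_hash(const void *k, int nb_bits, void *user_data)
    {
        const tile_key_t *key = (const tile_key_t*)k;
        uint64_t h = (((uint64_t)key->m << 32) ^ key->n) * UINT64_C(0x9E3779B97F4A7C15);
        (void)user_data;
        return (nb_bits >= 64) ? h : (h >> (64 - nb_bits));
    }

    /* Render a key into a caller-provided buffer, for debug output. */
    static char *tile_key_print(char *buf, size_t len, const void *k, void *user_data)
    {
        const tile_key_t *key = (const tile_key_t*)k;
        (void)user_data;
        snprintf(buf, len, "tile(%" PRIu32 ",%" PRIu32 ")", key->m, key->n);
        return buf;
    }

    int main(void)
    {
        char buf[32];
        tile_key_t a = { 3, 7 }, b = { 3, 7 };
        printf("%s hashes to %" PRIu64 " (16 bits); equal: %d\n",
               tile_key_print(buf, sizeof(buf), &a, NULL),
               tile_key_hash(&a, 16, NULL),
               tile_key_equal(&a, &b, NULL));
        return 0;
    }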