@@ -3,7 +3,7 @@
 
 ## DASH Template Library
 
-### Features:
+### New Features:
 
 - Added meta-type traits and helpers
 - Added range types and range expressions
@@ -17,29 +17,19 @@
   space.
 - Global dynamic memory allocation: concepts and reference implementations
   (`dash::GlobHeapMem`, `dash::GlobStaticMem`)
-- Supporting `dash::Atomic<T>` as container element type
-- Well-defined atomic operation semantics for `dash::Shared`
-- Added load balance patterns and automatic data distribution based on
-  locality information to aid in load balancing
-- Improved pattern implementations, rewriting pattern methods as single
-  arithmetic expressions
+- Added `dash::Atomic<T>` as container element type to support atomic access
 - Introduced parallel IO concepts for DASH containers (`dash::io`),
   currently implemented based on HDF5
-- Introduced stencil iterator and halo block concepts
-- Using strict unit ID types to distinguish global and team scope
-- Using new DASH locality domain concept to provide automatic configuration
-  of OpenMP for node-level parallelization
-- New algorithms, including `dash::fill`, `dash::generate`, `dash::find`.
-- Drastic performance improvements in algorithms, e.g. `dash::min_element`,
+- Introduced Halo matrix supporting arbitrary stencils
+- New algorithms, including `dash::fill`, `dash::generate`, `dash::find`,
+  `dash::reduce`, and `dash::sort`
+- Performance improvements in algorithms, e.g. `dash::min_element`,
   `dash::transform`
-- Additional benchmark applications
-- Additional example applications, including histogram sort and radix sort
-  implementations
 - Runtime configuration interface (`dash::util::Config`)
-- Improved output format and log targets in unit tests
-- Added support for HDF5 groups
-- Relaxed restrictions on container element types
-- Support patterns with underfilled blocks in `dash::io::hdf5`
+- Restricted container element type check to `std::is_trivially_copyable`
+- Added CoArray implementation (`dash::CoArray`)
+- Made global pointers (`dash::GlobPtr`) copyable across units
+- Additional benchmark applications
 
 ### Bugfixes:
 
@@ -49,17 +39,11 @@
 - Conversions of `GlobPtr<T>`, `GlobRef<T>`, `GlobIter<T>`, ... now
   const-correct (e.g., to `GlobIter<const T>` instead of `const GlobIter<T>`)
 - Consistent usage of index- and size types
-- Numerous stability fixes and performance improvements
 - Move-semantics of allocators
+- Numerous stability fixes and performance improvements
 
 ### Known limitations:
 
-- Type trait `dash::is_container_compatible` does not check
-  `std::is_trivially_copyable` for Cray compilers and GCC <= 4.8.0
-  (issue #241)
-
-
-
 ## DART Interface and Base Library
 
 ### Features:
@@ -68,49 +52,20 @@
   IDs (`dart_global_unit_t`) and IDs that are relative to a team
   (`dart_team_unit_t`).
 - Added function `dart_allreduce` and `dart_reduce`
-- Made global memory allocation and communication operations aware of the
+- Made global memory allocation and communication operations aware of the
   underlying data type to improve stability and performance
-- Made DART global pointer globally unique to allow copying of global pointer
-  between members of the team that allocated the global memory. Note that a
-  global pointer now contains unit IDs relative to the team that allocated
-  the memory instead of global unit IDs.
-- Extended use of `const` specifier in DART communication interface
+- Made DART global pointer globally unique to allow copying of global pointer
+  between members of the team that allocated the global memory.
+- `const`-correctness in DART communication interface
 - Added interface component `dart_locality` implementing topology discovery
   and hierarchical locality description
 
-  - New types:
-    - `dart_locality_scope_t`: enum of locality scopes (global, node,
-      module, NUMA, ...)
-    - `dart_hwinfo_t`: hardware information such as number of NUMA
-      domains and cores, CPU clock frequencies, CPU pinning, cache sizes,
-      etc.
-    - `dart_domain_locality_t`: node in a locality domain hierarchy
-      providing locality information such as the number of units in the
-      domain and their ids, sub-domains, level in topology, etc.
-    - `dart_unit_localiy_t`: locality information for a specific unit
-
-  - New functions:
-    - `dart_domain_locality`: Access hierarchical locality description of
-      a specified locality domain
-    - `dart_team_locality`: Access hierarchical locality description of a
-      specified team.
-    - `dart_unit_locality`: Access locality description of a specified
-      unit
-
-  - New base implementations: \
-    Implementations of the locality components to be usable by any DART
-    backend:
-    - `dart__base__locality__init`
-    - `dart__base__locality__finalize`
-    - `dart__base__locality__domain`
-    - `dart__base__locality__unit`
-
 ### Bugfixes:
 
-- Added clarification which DART functionality provides thread-safe access.
+- Added clarification which DART functionality provides thread-safe access.
   DART functions can be considered thread-safe as long as they do not operate
-  on the same data structures. In particular, thread-concurrent (collective)
-  operations on the same team are not guaranteed to be safe.
+  on the same data structures. In particular, thread-concurrent (collective)
+  operations on the same team are not guaranteed to be safe.
 
 ### Known limitations:
 
@@ -123,16 +78,16 @@
 
 ### Bugfixes:
 
-- Fixed numerous memory leaks in dart-mpi
+- Added support for `put`/`get` operations on data `>2GB`
+- Added support for custom data-types and reduction operations
+- Fixed numerous stability issues and memory leaks in dart-mpi
 
 ### Known limitations:
 
-- Elements allocated in shared windows are not properly aligned for some
-  versions of OpenMPI (issue #280, fixed since OpenMPI 2.0.2)
-- Thread-concurrent access may lead to failures with OpenMPI even if
-  thread support is enabled in DART (build option `ENABLE_THREADSUPPORT`,
-  issue #292)
-
+- Elements allocated in shared windows are not properly aligned for some
+  versions of Open MPI (issue #280)
+- Potential NUMA performance issue caused by shared memory allocation in the
+  underlying MPI windows
 
 
 ## Build System
@@ -163,9 +118,9 @@
 - `DASH__ARCH__HAS_RDTSC`: Whether the target architecture provides
   an RDTSC micro-instruction.
 
-- Added compiler wrapper dashc++ (and aliases dashcxx and dashCC) that includes
-  DASH-specific flags compiler and linker flags. To use the wrapper, simply
-  replace mpicxx with dashcxx when building your application.
+- Added compiler wrapper dashc++ (and aliases dashcxx and dashCC) that includes
+  DASH-specific compiler and linker flags. To use the wrapper, simply
+  replace mpicxx with dash-mpicxx when building your application.
 
 ### Bugfixes:
 