Commit c0827a7

Update with changes from 0.3.4
1 parent 86cff4e commit c0827a7

1 file changed: +73 -0 lines changed

Changelog.txt

Lines changed: 73 additions & 0 deletions
@@ -1,4 +1,77 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.4
+02-Dec-2018
+
+common:
+ * the new, experimental thread-local memory allocation had
+   inadvertently been left enabled for gmake builds in 0.3.3
+   despite the announcement. It is now disabled by default, and
+   single-threaded builds will keep using the old allocator even
+   if the USE_TLS option is turned on.
+ * OpenBLAS will now provide enough buffer space for at least 50
+   threads by default.
+ * The output of openblas_get_config() now contains the version
+   number.
+ * A serious thread safety bug in GEMV operation with small M and
+   large N size has been fixed.
+ * The code will now automatically call blas_thread_init after a
+   fork if needed before handling a call to openblas_set_num_threads
+ * Accesses to parallelized level3 functions from multiple callers
+   are now serialized to avoid thread races (unless using OpenMP).
+   This should provide better performance than the known-threadsafe
+   (but non-default) USE_SIMPLE_THREADED_LEVEL3 option.
+ * When building LAPACK with gfortran, -frecursive is now (again)
+   enabled by default to ensure correct behaviour.
+ * The OpenBLAS version of cblas.h now supports both CBLAS_ORDER and
+   CBLAS_LAYOUT as the name of the matrix row/column order option.
+ * Externally set LDFLAGS are now passed through to the final compile/link
+   steps to facilitate setting platform-specific linker flags.
+ * A potential race condition during the build of LAPACK (that would
+   usually manifest itself as a failure to build TESTING/MATGEN) has been
+   fixed.
+ * xHEMV has been changed to stay single-threaded for small input sizes
+   where the overhead of multithreading exceeds any possible gains
+ * CSWAP and ZSWAP have been limited to a single thread except on ARMV8 or
+   ThunderX hardware with sizable input.
+ * Linker flags for the PGI compiler have been updated
+ * Behaviour of AXPY with zero increments is now handled in the C interface,
+   correcting the result on at least Intel Atom.
+ * The result matrix from calling SGELSS with an all-zero input matrix is
+   now zeroed completely.
+
+x86_64:
+ * Autodetection of AMD Ryzen2 has been fixed (again).
+ * CMAKE builds now support labeling of an INTERFACE64=1 build of
+   the library with the _64 suffix.
+ * AVX512 version of DGEMM has been added and the AVX512 SGEMM kernel
+   has been sped up by rewriting with C intrinsics
+ * Fixed compilation on RHEL5/CENTOS5 (issue with typename __WAIT_STATUS)
+
+POWER:
+ * added support for building on AIX (with gcc and GNU tools from AIX Toolbox).
+ * CPU type detection has been implemented for AIX.
+ * CPU type detection has been fixed for NETBSD.
+
+MIPS64:
+ * AXPY on LOONGSON3A has been corrected to pass "zero increment" utest.
+ * DSDOT on LOONGSON3A has been fixed.
+ * the SGEMM microkernel has been hardened against potential data loss.
+
+ARMV8:
+ * DYNAMIC_ARCH support is now available for 64bit ARM
+ * cross-compiling for ARMV8 under iOS now works.
+ * cpu-specific code has been rearranged to make better use of both
+   hardware commonalities and model-specific compiler optimizations.
+ * XGENE1 has been removed as a TARGET, superseded by the improved generic
+   ARMV8 support.
+
+ARMV7:
+ * Older assembly mnemonics have been converted to UAL form to allow
+   building with clang 7.0
+ * Cross compiling LAPACKE for Android has been fixed again (broken by
+   update to LAPACK 3.7.0 some while ago).
+
 ====================================================================
 Version 0.3.3
 31-Aug-2018
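
The AXPY entry under "common:" mentions zero increments without saying what the corrected result looks like. As a rough illustration only, the sketch below follows the plain strided loop of the reference BLAS AXPY (y := alpha*x + y), on the assumption that this is the behaviour the fix restores. The function name `axpy_reference` is hypothetical and this is not OpenBLAS code.

```python
def axpy_reference(n, alpha, x, incx, y, incy):
    """Strided AXPY in the style of the reference BLAS loop (illustrative only).

    Updates y in place and returns it. Zero increments are not special-cased:
    incx == 0 reuses x[0] for every step, and incy == 0 accumulates every
    step into y[0].
    """
    if n <= 0 or alpha == 0:
        return y
    # For negative increments the reference loop starts from the far end.
    ix = (1 - n) * incx if incx < 0 else 0
    iy = (1 - n) * incy if incy < 0 else 0
    for _ in range(n):
        y[iy] += alpha * x[ix]
        ix += incx
        iy += incy
    return y


# incy == 0: all three products are added to y[0], the rest is untouched.
y = [1.0, 2.0, 3.0]
axpy_reference(3, 2.0, [1.0, 1.0, 1.0], 1, y, 0)

# incx == 0: x[0] is broadcast across y.
z = [0.0, 0.0, 0.0]
axpy_reference(3, 1.0, [5.0], 0, z, 1)
```

Under this assumption, an optimized kernel that processes several elements per step would have to agree with the loop above in the zero-increment cases, which is presumably why the changelog moves that handling into the C interface.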
