Releases: ridiculousfish/libdivide

libdivide-5.1

30 Jul 17:59

This is a maintenance release.

This release mainly fixes a C++ compilation failure with the upcoming GCC 15 compiler (#113).

Full Changelog: 5.0...v5.1

v5.0.0

17 Jul 18:56
  • Reference code for narrowing division has been added.
  • The C and C++ APIs have been extended to support 16-bit scalar integer division.
  • Multiple enhancements add support for 8-bit microcontrollers:
    • Compiles cleanly with avr-gcc, the compiler used for the Atmel AVR microcontroller family (popular on Arduino boards).
      • The code base includes ATmega2560 test and benchmarking programs.
    • Adds predefined macros to speed up division by 16-bit constants, since avr-gcc does not optimize division by a 16-bit constant on 8-bit systems.
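
The last item above concerns replacing a 16-bit division by a constant with a multiply and a shift. The following is a minimal sketch of that general idea in plain C++, assuming a divisor of 10. The names `div10_u16` and `check_div10_u16` are hypothetical and are not libdivide's macros; the library's actual macros may be built differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of libdivide: computes n / 10 for any
// 16-bit n with one 32-bit multiply and one shift instead of a division.
// 52429 is ceil(2^19 / 10); the rounding error it introduces is too small
// to change the quotient for any n below 2^16, and the product always
// fits in 32 bits.
inline uint16_t div10_u16(uint16_t n) {
    return static_cast<uint16_t>((static_cast<uint32_t>(n) * 52429u) >> 19);
}

// Exhaustively compares the multiply-shift result with ordinary division
// over the whole 16-bit range.
inline bool check_div10_u16() {
    for (uint32_t n = 0; n <= 0xFFFFu; ++n) {
        if (div10_u16(static_cast<uint16_t>(n)) != n / 10) {
            return false;
        }
    }
    return true;
}
```

Because the input range is only 65,536 values, an exhaustive check like `check_div10_u16()` is cheap and is a reasonable way to validate any such constant before relying on it.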

v4.0.0

09 Mar 07:20
  • All SIMD types may now be used simultaneously, instead of selecting one at compile time. For example, you may define all of LIBDIVIDE_SSE2, LIBDIVIDE_AVX2, and LIBDIVIDE_AVX512 and use them together.
  • ARM NEON types are now supported. New functions take uint32x4_t, int32x4_t, uint64x2_t, and int64x2_t. Note: while libdivide is tested on both ARM32 and AArch64, NEON intrinsics have only been tested on AArch64.
  • Breaking: To support multiple vector types, vector functions have been renamed according to their width (#52). Instead of libdivide_u32_do_vector, now use libdivide_u32_do_vec128 for SSE2 or NEON, libdivide_u32_do_vec256 for AVX2, and libdivide_u32_do_vec512 for AVX512.
  • On non-x86 CPUs, generating 64-bit dividers is now faster than before. Previously libdivide used __uint128_t when available; however, libdivide's fallback code was shown to be several times faster, so the __uint128_t path has been removed. x86 and x86-64 CPUs are unaffected.
  • Certain code sourced from StackOverflow has been reimplemented; this code had an ambiguous license. All code in libdivide is now covered under the zlib or boost license (at your option).
  • libdivide.h no longer requires C++11 or later. The minimum language standards are C99 or C++98.

libdivide-3.0

16 Oct 09:32

This release adds C++ support for all 32-bit and 64-bit integer types (#58). Unfortunately, this change requires C++11 instead of C++98, so the major version had to be increased even though this is a small release. This version also improves libdivide's CMake build system, which should make libdivide easier to package.

  • BREAKING
    • libdivide.h now requires C++11 or later
  • BUG FIXES
    • Support all 32-bit and 64-bit integer types in C++ (#58)
    • Fix cross compilation (#59)
  • ENHANCEMENT
    • Add support for CMake find_package(libdivide)

libdivide-2.0

04 Jul 14:45

I am happy to announce the release of libdivide-2.0 🎉

Libdivide finally supports AVX2 and AVX512 vector division on x86 CPUs. Libdivide now also works with the clang-cl compiler and the Intel C++ compiler on Windows. There have been many small incremental improvements which should provide minor speedups for many use cases.

Since libdivide is now nearly 10 years old and many features have been added over the years, it has become necessary to remove some rarely used functionality. I have removed the unswitch functionality, since it was a large amount of code that, as far as I am aware, nobody has ever used. Overall, even with the added support for AVX2 and AVX512, libdivide.h now contains fewer lines of code than the previous release and compiles faster in both C and C++.

  • BREAKING
    • Removed unswitch functionality (#46)
    • Renamed macro LIBDIVIDE_USE_SSE2 to LIBDIVIDE_SSE2
    • Renamed divider::recover_divisor() to divider::recover()
  • BUG FIXES
    • Remove _udiv128() as not yet supported by clang-cl and icl compilers
    • Fix C++ linker issue caused by anonymous namespace (#54)
    • Fix clang-cl (Windows) linker issue (#56)
  • ENHANCEMENT
    • Add AVX2 & AVX512 vector division
    • Speed up SSE2 libdivide_mullhi_u64_vector()
    • Support +1 & -1 signed branchfree dividers (4a1d5a7)
    • Speed up unsigned branchfull power of 2 dividers (2422199)
    • Simplify C++ templates
    • Simplify more bit flags of the libdivide_*_t structs
    • Get rid of MAYBE_VECTOR() hack
  • TESTING
    • tester.cpp: Convert to modern C++
    • tester.cpp: Add more test cases
    • benchmark_branchfree.cpp: Convert to modern C++
    • benchmark.c: Prevent compilers from optimizing too much
  • BUILD
    • Automatically detect SSE2/AVX2/AVX512
  • DOCS

libdivide-1.1

29 May 16:51

This release fixes two non-critical bugs and silences a few compiler warnings. The generation of libdivide divisors has been sped up for MSVC on x64 and for GCC/Clang on 64-bit CPU architectures other than x64. I have also done some general code cleanup. Below is the complete changelog:

  • BUG FIXES
    • Fix bug in libdivide_128_div_64_to_64() (#45)
    • Fix MSVC ARM 64-bit bug (07931e9)
    • Fix -Wshift-count-overflow warning on avr CPU architecture (#41)
    • Fix -Wshadow warning in libdivide_s32_do()
    • Fix -Wignored-attributes warnings when compiling SSE2 code using GCC 9
  • ENHANCEMENT
    • libdivide_128_div_64_to_64(): optimize using _udiv128() for MSVC 2019 or later
    • libdivide_128_div_64_to_64(): optimize using __uint128_t for GCC/Clang on 64-bit CPU architectures
    • Add LIBDIVIDE_VERSION macro to libdivide.h
    • Clean up SSE2 code in libdivide.h
    • Increase runtime of test cases in primes_benchmark.cpp
  • BUILD
    • Remove windows directory with legacy Visual Studio project files
    • Move test programs to test directory
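
As its name suggests, libdivide_128_div_64_to_64() divides a 128-bit dividend by a 64-bit divisor and produces a 64-bit quotient. The sketch below shows what such a narrowing division computes, using plain shift-and-subtract long division. This is not the algorithm libdivide uses (the notes above mention _udiv128() and __uint128_t fast paths); the name `div_128_by_64_sketch` and its signature are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration, not libdivide's implementation.
// Divides the 128-bit value (hi * 2^64 + lo) by d, one bit at a time.
// Precondition: hi < d, so the quotient fits in 64 bits (this also
// rules out d == 0). Returns the quotient and stores the remainder
// in *rem.
inline uint64_t div_128_by_64_sketch(uint64_t hi, uint64_t lo, uint64_t d,
                                     uint64_t* rem) {
    assert(hi < d);
    uint64_t q = 0;
    uint64_t r = hi;  // running remainder, always kept below d
    for (int i = 63; i >= 0; --i) {
        // Shifting r left may push a bit out of the 64-bit register;
        // remember it, because it means the true value exceeds d.
        const bool carry = (r >> 63) != 0;
        r = (r << 1) | ((lo >> i) & 1u);
        q <<= 1;
        if (carry || r >= d) {
            r -= d;  // unsigned wrap-around yields the correct remainder
            q |= 1u;
        }
    }
    *rem = r;
    return q;
}
```

For example, calling it with hi = 1, lo = 0 and d = 3 divides 2^64 by 3. A loop of 64 iterations is far slower than a hardware divide instruction or a word-based algorithm, so this form is useful mainly as a readable reference to test faster versions against.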

libdivide-1.0

21 Jan 16:38

I am happy to announce the 1.0 release of libdivide 🎉

A lot of effort has been spent to polish libdivide for the 1.0 release. It has also been tested extensively using a plethora of different compilers (GCC, Clang, MSVC, ICC, MinGW, Cygwin), OSes and CPU architectures (i386, x86-64, ARM, ARM64, PowerPC, PPC64) to ensure it passes all tests and compiles without warnings at a high warning level.

Have a look at the ChangeLog to see what's new.