einops-cpp is a C++17-compatible, header-only library that implements the einops Python project developed by Alex Rogozhnikov, elegantly summarized as: "flexible and powerful tensor operations for readable and reliable code".
No installation is needed: this is a zero-dependency library (except for libtorch). Just put the 'include' directory on your compiler path and add the following line somewhere in your code:
#include <einops.hpp>
Note that the project bundles, in its thirdparty directory, ordered-map, Tessil's header-only C++ implementation of Python's OrderedDict.
No build is needed for the library itself. However, a basic CMake file is provided to build the test project, which partially follows the tests of the Python project. It builds successfully with MSVC 17.7.4 and LLVM-Clang (VS2022) on Windows, and with GCC 11.3 on Linux Ubuntu 22.04 (WSL2).
#include <einops.hpp>
using namespace einops;
auto x = torch::arange(10 * 20 * 30 * 40).reshape({ 10, 20, 30, 40 });
// here is an example of max-pooling with einops
auto y = reduce(x, "b c (h h1) (w w1) -> b c h w", "max", axis("h1", 2), axis("w1", 2));
Important: axis(key, value) is a helper that simulates the Python keyword-argument syntax for axes lengths, e.g. (..., h1=2, w1=2).
All the methods in the public C++ API are documented. For a better understanding, take a look at the test project, which contains examples for each public method. You can also check the documentation of the original Python einops project.
- Follow the code of the Python v0.7.0 release
- Implements the reduce(), rearrange(), repeat(), einsum() and parse_shape() methods
- Implements the pack() and unpack() methods
- Finalize the code of the Rearrange, Reduce and EinMix layers (aka torch::Module)
- Benchmark the LRU cache in a few internal methods
- Optimize the code where possible (limit potential overhead)
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
The current code is a direct port of the Python code; libtorch is relatively easy to adapt to C++. A future version will not be directly based on the Python code, but rewritten to support other backends (e.g. TensorFlow, XTensor and more).
The TensorFlow C++ API flow is not suited to the current implementation (it requires a scope, a graph flow and a session), and would need the operations to be concatenated into a session at the '_apply_recipe' level for better optimization of the code.
XTensor requires type specialization by template for dynamic tensor instantiation (aka xt::xarray), so the current abstraction of the Backend base class needs to be rewritten.
The compile-time limitations of some potential future backends, such as Fastor (here), force a rethink of the whole architecture to handle both runtime and compile-time processing.