A carry-save adder (CSA) reduction-based microarchitectural modification of the Forwarding Adder Network (FAN) in the SIGMA DNN accelerator. It supports multi-vector, multi-operand addition within a single adder instantiation, targeting sparse and irregular GEMM workloads.
- Testbench(es)
- Detailed report in README
- Link to the SIGMA arXiv paper (with citation)
- CarrySaveFAN architecture diagram
- RedTreeFAN architecture diagram
- Detailed README -- tradeoffs/caveats
- Signed Integer Support
- Reverse 3:2 compressors
- FP32 support
- CMake/Verilator environment setup
- Sparsity support (gating?)
- SIGMA integration
- Benchmark/synthesis analysis sweep over N and Ws
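
For readers unfamiliar with carry-save reduction, the idea named in the description can be sketched as a small behavioural model. This is a minimal Python sketch, not the RTL in this repository: the function names (`compress_3_2`, `csa_reduce`, `reduce_vectors`), the `width` parameter, and the per-element `vec_ids` labelling are all assumptions made for illustration, and arithmetic is treated as unsigned modulo `2**width`.

```python
from itertools import groupby


def compress_3_2(a, b, c, width):
    """3:2 compressor (a row of full adders): three operands in, (sum, carry) out.

    The invariant is a + b + c == sum + carry (mod 2**width).
    """
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask                                # bitwise sum, no carry propagation
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask   # majority bits, shifted one position left
    return s, carry


def csa_reduce(operands, width):
    """Reduce any number of operands to two with 3:2 compressors, then add once."""
    mask = (1 << width) - 1
    ops = [x & mask for x in operands]
    while len(ops) > 2:
        nxt = []
        i = 0
        while i + 3 <= len(ops):
            s, c = compress_3_2(ops[i], ops[i + 1], ops[i + 2], width)
            nxt += [s, c]
            i += 3
        nxt += ops[i:]          # leftover operands pass through to the next layer
        ops = nxt
    # Single carry-propagate addition at the very end.
    return sum(ops) & mask


def reduce_vectors(values, vec_ids, width):
    """Hypothetical multi-vector model: sum each contiguous run of equal vec_ids."""
    out = {}
    pairs = zip(vec_ids, values)
    for vid, run in groupby(pairs, key=lambda p: p[0]):
        out[vid] = csa_reduce([v for _, v in run], width)
    return out


if __name__ == "__main__":
    print(csa_reduce([1, 2, 3, 4, 5], 8))                             # 15
    print(reduce_vectors([1, 2, 3, 10, 20], [0, 0, 0, 1, 1], 16))     # {0: 6, 1: 30}
```

The point of the sketch is that carries are only propagated once, in the final addition; every earlier layer is a bitwise operation with constant depth regardless of operand width. How the actual design shares compressors between vectors is not modelled here.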