v0.4-pre-apache-incubation
NOTE: This release predates Apache incubation.
This release features several major improvements. The high-level graph optimizer is now part of the TVM repo. Highlights include initial support for AutoTVM for automated optimization, and VTA, a customized accelerator backend. Please also check out tvm.ai for the latest blog posts.
The community welcomes new reviewers @kazum @alex-weaver @masahi @zhreshold @PariksheetPinjari909 @srkreddy1238 @eqy, new code owner @merrymercy, and new committer @yzhliu.
Change List
Tensor Expression and Optimization
- Tensor operator primitives
- Introduce attrs field to operator primitives (e.g. compute) to store additional metadata; the attrs can be used as hints for scheduling
- Enable embedding of asm micro-kernels
- Hybrid python programming model
- python AST based IR builder interface
- support GPU programs
- AutoTVM: automated tuning and scheduling
- basic autotvm infra
- GPU IR verifier
- basic autotuning tutorial
- topi integration
- ARM support
- winograd support
- initial support of ARM autotuning records
- TOPI Vision
- Generic GPU sort support (useful for vision)
- SSD operator support
- TOPI numpy consistency
- Rename all binary operators for numpy consistency: broadcast_add -> add, broadcast_sub -> subtract, broadcast_mul -> multiply, broadcast_div -> divide
- New operators: slice, LRN, equal, not_equal, less, greater
- tutorials on topi
- Initial low-bit operator support
- Optimized popcount generation on ARM
- general bit-serial convolution and GEMM
- optimized low bit kernels
- parallel optimization
- New topi backend optimization for intel graphics
- Adapt AVX schedules for SSE target
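The TOPI operator renames listed above can be applied to existing scripts mechanically. The following is a minimal sketch of such a migration helper, not part of TVM itself: `RENAMED_OPS` and `migrate_source` are hypothetical names introduced here for illustration, and the mapping is taken only from the rename list in this change log.

```python
import re

# Old TOPI broadcast operator names and their numpy-style replacements,
# as listed in this change log. (Hypothetical helper, not a TVM API.)
RENAMED_OPS = {
    "broadcast_add": "add",
    "broadcast_sub": "subtract",
    "broadcast_mul": "multiply",
    "broadcast_div": "divide",
}

# Match only whole identifiers so that names such as "my_broadcast_add"
# or "broadcast_add_v2" are left untouched.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMED_OPS) + r")\b")


def migrate_source(text):
    """Return `text` with the old operator names replaced by the new ones."""
    return _PATTERN.sub(lambda match: RENAMED_OPS[match.group(1)], text)
```

For example, `migrate_source("c = topi.broadcast_add(a, b)")` returns `"c = topi.add(a, b)"`. A plain textual rewrite like this does not understand Python scoping, so the result should still be reviewed by hand.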
Backend
- VTA: customized accelerator backend
- custom hardware backend example
- tutorials on how to use customized accelerator
- Initial experimental support for HLS backend
- Bugfix in SPIRV code generator for vulkan
- libdevice support, enable NVPTX backend
Runtime
- Introduce NDArrayContainer for managed NDArray
- RPC and Device API
- Support communication between big/small endian machines.
- RPC and device API protocol upgrade to support big-small endian communication. This is a non-backward compatible change: the latest version of the TVM runtime must be used with the RPC
- graduate rpc from contrib, tvm.contrib.rpc->tvm.rpc
- Support tracker in Android RPC, add fault tolerance for AutoTVM
- BIG.LITTLE aware threadpool
- tvm4j graph runtime that runs end to end workload in java
- DLPack support
- Support from_dlpack and to_dlpack
- Enables bridges to pytorch
- Enable link of stackvm in runtime
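The big/small endian RPC support listed above depends on both ends agreeing on one byte order for data on the wire. The sketch below illustrates that general idea with Python's standard `struct` module. It is not TVM's actual RPC wire format: the layout (a 64-bit count followed by 64-bit signed integers, all little-endian) and the function names are assumptions made for illustration.

```python
import struct


def encode_int64_list(values):
    """Serialize a list of ints with a fixed little-endian layout.

    The "<" prefix forces little-endian byte order regardless of the
    host's native order, so big- and little-endian machines produce
    identical bytes. (Hypothetical layout, not the TVM protocol.)
    """
    header = struct.pack("<Q", len(values))            # element count, uint64
    body = struct.pack("<%dq" % len(values), *values)  # elements, int64 each
    return header + body


def decode_int64_list(payload):
    """Inverse of encode_int64_list; works on any host byte order."""
    (count,) = struct.unpack_from("<Q", payload, 0)
    return list(struct.unpack_from("<%dq" % count, payload, 8))
```

Because the byte order is stated explicitly on both sides, `decode_int64_list(encode_int64_list([1, -2, 3]))` returns `[1, -2, 3]` whether the sender and receiver are big-endian or little-endian.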
NNVM
- Tensorflow graphdef frontend
- Keras frontend
- improved to support reused layers, added activations
- ONNX
- gather, LRN
- CoreML frontend
- Support C-RNN and activation functions
- Fix grads for sum and expand_like
- Enhanced operator fusion for multiple elemwise branches
- Separate nnvm fusion and compilation pass
Misc
- Unified the build system under cmake, with customizable cmake paths for vulkan, rocm, cuda
Contributors
See the complete list here. Thanks to all the contributors who contributed to this release.
Code reviewers
- @yzhliu topi, tvm4j, nnvm
- @kevinthesun nnvm
- @Huyuwei topi operators
- @tmoreau89 hardware backends
- @comaniac fpga backends
- @kazum nnvm, opencl backend, fpga
- @nishi-t nnvm, opencl backend
- @merrymercy topi, arm
- @vinx13 gpu backend
- @masahi nnvm, topi
- @eqy autotvm
- @jroesch runtime
- @PariksheetPinjari909 frontends, topi
- @srkreddy1238 frontends, topi
- @FrozenGene autotvm
Compiler
- @alex-weaver vulkan
- @were hybrid script mode
- @nishi-t CUDA, fp16, int8 support
- @Ktabata intel FPGA support
- @kazum xilinx fpga support
- @cowanmeg arm optimized popcount
- @tmoreau89 VTA customized accelerator
TOPI, graph optimization
- @merrymercy AutoTVM
- @yzhliu tvm4j graph runtime, x86
- @Laurawly intel graphics
- @abergeron conda build fix
- @nhynes sgx random
- @masahi topi, more robust op fusion
- @kevinthesun vision ops
- @grwlf argmax/min ops
- @cowanmeg bit-serial operator
- @ehsanmok topi tutorial
- @zhiics refactor fusion and compilation into separate pass
- @liangfu binary logical operators
Frontends
- @srkreddy1238 tutorials for deployment, tensorflow frontend
- @siju-samuel coreml, tf frontend
- @PariksheetPinjari909 nnvm, slice
- @kazum keras
- @nishi-t mxnet, nnvm
Deploy
- @eqy rpc, thread runtime
- @dayanandasiet android tutorials