
Apache TVM v0.13.0

Released by @Hzfengsy on 14 Jul 02:37 (commit 97c5de6)

Introduction

The TVM community has worked since the v0.12.0 release to deliver the following exciting new improvements! The main tags are listed below (bold text marks areas with significant progress):

  • Community, RFC;
  • Frontend: TensorFlow/TFLite, PyTorch/Torch, Paddle, Keras;
  • Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, other runtime changes;
  • Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
  • microTVM, AOT, TVMC, LLVM;
  • CI, BugFix, Docs, Docker, Misc;

Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.

Community

  • #15086 - Aleksei-grovety -> Reviewer
  • #14676 - Jiajun Jiang -> Reviewer
  • #14677 - Qiang Zhang -> Reviewer
  • #14622 - Sunghyun Park -> Reviewer
  • #14578 - Zihao Ye -> Committer
  • #14853 - Anirudh Sundar Subramaniam -> Committer
  • #14772 - Add new key for release signing

RFC


Frontend

  • #14830 - Use f-strings for string formatting, NFC
  • Keras
    • #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
    • #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
    • #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
    • #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
    • #15060 - [Relay][Keras] Fix the bug about the attribute 'output_padding' in Deconv
    • #14707 - [Keras] Fix a bug about the alpha attribute in LeakyReLU which led to pass conflicts
    • #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
  • Paddle
    • #14801 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for gaussian_random/softplus/Conv3d/Conv2d
    • #14973 - [Paddle] [PaddlePaddle Hackathon 4] Add conversion support for tanhshrink/pool3d/set_value ops in the Paddle frontend
    • #14826 - [Paddle] [PaddlePaddle Hackathon 4] Add conversion support for p_norm/roi_align/softmax_with_cross_entropy
    • #14575 - [Paddle] [PaddlePaddle Hackathon 4] Add attribute support for dropout/hard_sigmoid/pixel_shuffle
  • TFLite
    • #14667 - [TFLite] Support for quantized squared difference
    • #14819 - [TFLite] Generate name when tensor name is missing
    • #15173 - [FRONTEND][TFLITE]Fix int16 transpose conv loading
  • TensorFlow
    • #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
  • PyTorch
    • #14747 - [PyTorch] Add aten::new_zeros
    • #14699 - [Torch] fix typo in new_full
    • #14963 - [PyTorch] Support use_input_stats in instance_norm
    • #14930 - Fix PyTorch axis
  • ONNX
    • #15017 - [ONNX] Fix bug in scatter_elements

Runtime

  • #15182 - Add weak symbol to builtin fp16
  • #15161 - Clean TVM stacktrace in error messages
  • #15162 - Support void as dtype in FFI
  • #14902 - Update Module and Registry to use String Container
  • #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
  • #14887 - Make systemlib unique per prefix
  • #14775 - Added str for tvm._ffi.runtime_ctypes.TVMArray
  • #14656 - Fix Can't "query_imports" Bug of VM Executable

Adreno

  • #15061 - [TOPI] Fix problem with ceil_log2
  • #14996 - [OpenCL] Fix conv2d when output channels < 4

CMSIS-NN

  • #15059 - Update CMSIS-NN release to v4.1.0

OpenCL & CLML

  • #14972 - [OPENCL] Always use convert_T for type conversion
  • #14995 - [OpenCL] Improve diagnostic message
  • #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
  • #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
  • #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
  • #14949 - [CodegenC] Updated unit test for sorted CodegenC output
  • #14767 - [OpenCLML] Transposed convolution support and other fixes

CUDA & CUTLASS & TensorRT

  • #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
  • #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
  • #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM

Metal

  • #14962 - Fix int8 vectorized cast
  • #14846 - Fix vectorized select
  • #14727 - Update metal runtime to directly store kernel map
  • #14671 - Fix flaky memory issue due to racing

Vulkan

  • #15035 - [Vulkan] Allow DeclBuffer in CodeGenSPIRV
  • #14817 - [Vulkan] Add cooperative matrix support

Hexagon

  • #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
  • #14948 - Update instructions to compile hexagon runtime
  • #14965 - Add support for v73, make v68 default
  • #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
  • #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit

ROCm

  • #15106 - [TensorIR] AMD Matrix Core Support
  • #15088 - [Target] Replace rocm arch parsing from int to string

microTVM

  • #14872 - Use self.close_transport() on error

AOT

  • #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
  • #15032 - Remove duplication in tvm.testing.aot.compile_models
  • #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName

microNPU

  • #15159 - [microNPU][ETHOSU] Fix compiler attributes types
  • #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
  • #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
  • #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
  • #15114 - [microNPU] Upgrade Vela to v3.8.0
  • #15104 - [microNPU][ETHOSU] Fix minimum buffer size
  • #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
  • #14861 - [microNPU][ETHOSU] Add offloading of the nn.avg_pool2d operator with a stride > 3 to the NPU
  • #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
  • #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
  • #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
  • #14353 - [microNPU] Add support for MEAN with uint8 ifm
  • #14587 - [microNPU] Fix skip tests when Vela is not present
  • #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass

BYOC

  • #15046 - Add GEMM kernel from FasterTransformer as submodule
  • #15029 - Hide internal cutlass symbols

Relay

  • #15068 - Improve the "clip" op optimization in simplify expr pass
  • #14925 - Add a dimension check to reject invalid input
  • #14858 - [simplify_expr]: Add pass to remove trivial transpose ops
  • #14838 - Use f-strings for string formatting, NFC
  • #14831 - [Relay/Op] Use f-strings for string formatting, NFC
  • #14580 - Simplify the square of a binomial
  • #14735 - Handle pad value coming from Tensor instead of scalar
  • #14601 - Enhance type infer for dynamic shape
  • #14885 - [Relay] Fix broadcast in PyTorch frontend
  • #15090 - [Relay] Insertion of "device_copy" CallNode to Resolve Device Conflict on Unconstrained Nodes
  • #14845 - [Relay] Fix softplus in paddlepaddle frontend
  • #14837 - [Relay] Fix AdaptiveAvgPool2d about wrong dtype parsing
  • #14821 - [Relay] Fix softplus about the wrong calculation formula in Relay PyTorch frontend
  • #14820 - [Relay] Fix threshold calculation logic in PyTorch frontend
  • #14824 - [Relay] Fix a bug about ReLU in the threshold attribute which causes different results from Keras
  • #14796 - [Relay] Fix wrong calculation logic in CELU
  • #14773 - [Relay] Fix scatter_nd type relation
  • #14742 - [Relay] Fix alpha attribute with None in ELU
  • #14740 - [Relay] Fix stride in LpPool for default
  • #14556 - [Relay] Fix a bug caused by IncompleteTypeNode in EinsumRel while doing MergeComposite
  • #15057 - [QNN] Implement quantized avg_pool2d
  • #14536 - [QNN] Implement 'qnn.softmax'
  • #14875 - [Quantization]: Update simulated_quantize to infer correct layout

TOPI

  • #15018 - Fix dynamic dimensions support for Dense on TOPI side
  • #14856 - Fix in interpretation of empty axis parameter in reduction fun…
  • #14483 - [Target] Add SVE specific convolution
  • #14839 - Use f-strings for string formatting, NFC
  • #14822 - Use f-strings for string formatting, NFC
  • #14519 - Vectorize depthwise conv2d output operator
  • #14549 - Remove the i32 cast for output shape of pool
  • #14566 - [Topi] Output strides in pack_buffer() utility

Arith

  • #15131 - Hotfix flaky test in padded matmul
  • #15120 - NormalizeToIterSum
  • #15081 - Improve arith simplify to handle symbolic reshape pattern
  • #14532 - Implement statistics counters for RewriteSimplifier
  • #14704 - [cherry-pick][BUGFIX] Fix a bug of iter map floormod(x,2) simplify
  • #14849 - [TVMScript] Capture fails if var appears only in annotation
  • #14596 - [TensorIR] Improve CompactBufferRegion for symbolic shape
  • #15129 - [TIR] Recognize empty extents
  • #14982 - [TIR][VTA] Update host-side target, even without device func
  • #14547 - Enhance IterMapSimplify for symbolic
  • #14571 - [BUGFIX] Fix a bug of iter map floormod(x,2) simplify
  • #14582 - Fix solve inequality of unbound var ranges
  • #14538 - Enhance CanonicalSimplify to Simplify ProdDiv

MetaSchedule

  • #14781 - [MetaSchedule] RPC port needs to be an integer
  • #14673 - Introduce MMA Tensor Core Multilevel Tiling
  • #14784 - Enhance tune_tir to tune IRModule of TIR Collections
  • #14783 - Add an API to dump a pruned database
  • #14785 - Clear screen only when specified
  • #14654 - Handle output cases for InlineConstantScalars
  • #14642 - PostProc not rewriting unroll for purely spatial block
  • #14591 - Handle cases when no features found by FeatureExtractor
  • #14584 - [ARM] Beautification of the function names

TIR

  • #15153 - [TensorIR][Visitor] Visit buffer members in match_buffer's in block visitor functions
  • #15168 - [Schedule] Support padding-by-factor in PadEinsum
  • #15165 - Expose UndefinedVars to Python
  • #15163 - Fix RenewDef for symbolic input shapes
  • #15142 - [Schedule] Enhance compute-inline for fusion
  • #15150 - Fix typo in code example
  • #15144 - [TensorIR][Schedule] New schedule primitive unsafe_hide_buffer_access
  • #15146 - Block dependence analysis without schedules
  • #15119 - Avoid duplicate GlobalVar names in SplitHostDevice
  • #15037 - Handle DeclBuffer in CacheReadWrite schedule primitive
  • #15098 - [Ethos-U] Handle DeclBuffer in Ethos-U inputs
  • #15044 - [USMP] Preserve DeclBuffer in PoolAllocationToOffsetConverter
  • #15078 - Handle DeclBuffer in LowerThreadAllreduce
  • #15094 - Handle DeclBuffer in MergeDynamicSharedMemoryAllocations
  • #15093 - Handle DeclBuffer in StorageAccessInfoLower
  • #15045 - Handle DeclBuffer in InjectDoubleBuffer
  • #15096 - Handle DeclBuffer in RemoveNoOp
  • #15076 - [CodeGen] Define PackedFunc error code in MakePackedAPI
  • #15102 - Update primfunc host attachment to include host
  • #14854 - [Compute-at] Enable complex floordiv/floormod expressions in compute_at
  • #15041 - Handle DeclBuffer in LowerCustomDatatypes
  • #15038 - Handle DeclBuffer in Inline/ComputeAt/ReverseComputeAt
  • #15052 - [Analysis] Handle DeclBuffer in FlopEstimator
  • #15051 - Handle DeclBuffer in StorageRewrite
  • #15050 - [Schedule] Fix decompose_padding bug with dtypes
  • #15034 - Refactor BlockScope outside schedule
  • #15054 - Handle DeclBuffer in IRSubstitute
  • #14986 - Move SplitHostDevice to before MakePackedAPI
  • #15042 - Handle DeclBuffer in StorageFlatten's input
  • #15040 - Preserve object equality in Buffer::GetFlattenedBuffer
  • #14693 - Enhance TVMScript Buffer Slice Access
  • #14988 - Handle callees on same target, different codegen
  • #14951 - Keep trivial LetStmt in tir.Simplify when used in buffer decl
  • #14944 - Restrict tir.transform.LowerTVMBuiltin to host functions
  • #14990 - [IR,TE,TIR] Use f-strings for string formatting, NFC
  • #14993 - Fix incorrect construction of block frames
  • #14952 - Avoid re-defining var = arg_var in ArgBinder
  • #14918 - SplitHostDevice, handle subroutines
  • #14943 - Restrict tir.transform.InstallDebugSpans to host functions
  • #14942 - Preserve existing kTarget function attribute in BindTarget
  • #14945 - Restrict tir.transform.CombineContextCall to host functions
  • #14914 - Handle subroutine calls in MakeUnpackedAPI
  • #14913 - Handle subroutine calls in MakePackedAPI
  • #14892 - Expand unit tests for ConvertSSA
  • #14866 - Avoid too complex predicate in compaction
  • #14766 - [Schedule] Improve blockize to support blockizing multiple blocks
  • #14776 - Improved parameter name in DLTensor unpacking error messages
  • #14562 - [Driver] Move ShouldAnnotateEntryFunc logic into transform
  • #14741 - Keep block annotations from tensorization
  • #14021 - More flexible buffer compaction
  • #14711 - [Analysis] Calculate allocated memory at module level
  • #14492 - Flatten SeqStmt on construction
  • #14598 - Add CUDA int4 tensor core intrinsics
  • #14593 - [Schedule] Method returning the function being worked on
  • #14592 - [TensorIR] Fix ComputeAt with perfect symbolic bound
  • #14491 - Use String instead of StringImm for AttrStmtNode::node
  • #14626 - [TensorIR] reindex_cache_write does not mutate init statement
  • #14588 - [Fix][TIR] UnifyThreadBinding creating unit loop with annotation
  • #14589 - [Fix][TIR][Analysis] Reduction block checking alloc_buffers

TVMScript

  • #15083 - Avoid visiting repetition tensor in SetCommonPrefix Visitor
  • #15091 - [TIR] Convert tir.op operands to PrimExpr
  • #14919 - [TIR] Parse subroutine calls with no arguments
  • #14941 - Prevent bool to int conversion in T.Assert condition
  • #14915 - Allow T.target("device", host="host") to specify host
  • #14900 - Round-trip DeclBuffer with undefined data pointer
  • #14889 - [TIR] Added format/parsing of subroutine calls
  • #14874 - Use default fallback for un-registered type
  • #14840 - Print Executor, Runtime, and FunctionInfo as metadata
  • #14812 - Handle AllocatedPoolInfo, ConstantPoolInfo, ConstantInfo
  • #14786 - Add __name__ attr for parsed PrimFunc and IRModule
  • #14531 - Preserve LetStmt of constants
  • #14488 - Distinguish between void* and handle

TVMC

  • #14994 - [Bugfix] Fix tvmc option for printing which operators are offloaded to the Ethos-U

LLVM

  • #15127 - Remove the "ret_void" argument of AddFunction
  • #15139 - Minor refactor to LLVMModuleNode::SaveToFile
  • #14958 - [Codegen] Allow void return type from PackedFunc
  • #14946 - Expose Host CPU Feature Detection
  • #14901 - Codegen subroutine call when CallNode::op is GlobalVar
  • #14570 - Use Var annotation in LetStmt for pointer type
  • #14843 - [RUNTIME] Enable multi systemlib with device code
  • #14564 - Validate generated LLVM module before optimization
  • #14568 - Expand tvm::Type to DWARF conversion
  • #14563 - [Codegen] Remove cast to i8* in builtin::address_of

BugFix

  • #14960 - [Bug] Add typing_extensions requirement again
  • #15015 - [Hotfix] Remove LOG(INFO) from unsupported dtype legalization pass
  • #14991 - Make ThreadAllReduce pass compatible with int64
  • #14950 - Avoid symbol conflicts in MakePackedAPI/MakeUnpackedAPI
  • #14903 - [Test Cases] Add some version checks to make test cases run in all PyTorch versions
  • #14890 - [Fix] Fix typo in error message
  • #14879 - Fix the undeclared identifier 'f'
  • #14857 - Fix batch_norm
  • #14787 - [FIX] fix typo in comment

CI

  • #15179 - [Testing] Utility method to run TVM on remote device
  • #15138 - [Test] Improve check for TVMError exception in test_cast
  • #15062 - Clone submodule recursively
  • #15065 - Revert "Make Graviton3 default AArch64 job runner node (#14983)"
  • #14983 - Make Graviton3 default AArch64 job runner node
  • #15056 - [Bugfix] Fix CacheControl version constraint violation
  • #14908 - Update the expected CI jobs list in the update_branch script
  • #14847 - Update CPU image to install PyTorch
  • #14808 - [Testing] Use TVMScript's "name" argument for error messages
  • #14780 - Fix doc deploy issue
  • #14651 - Modify test cases to accommodate the CI upgrades
  • #14666 - sccache support while using ci.py under multi user environments
  • #14635 - Upgrade CI
  • #14713 - Add PLATFORM env var to builds
  • #14680 - Downgrade ci_cpu llvm version back to 11
  • #14653 - [tests][scripts][release] Optimize release note script about categories etc
  • #14646 - [test][script] Fix release gather_pr.py of script about ghost users or blank PR nodes
  • #14550 - Add JAX deps in Dockerfiles
  • #14466 - Update ci_cpu image and build with llvm-15

Docker

  • #15149 - Fix build.sh environment variables
  • #15105 - Update docker images for llvm-16
  • #15092 - Update ci-cortexm docker image to contain CMSIS-NN release v…
  • #15095 - Add build.sh environment variables
  • #15067 - Migrate arm docker image to use llvm packages
  • #15031 - Update ci_cpu docker image to one containing polly package f…
  • #15003 - [ADRENO] Docker setup changes for multi user environments
  • #14912 - Add polly package
  • #14842 - Install PyTorch on cpu image
  • #14590 - Support rootless docker when using docker/bash.sh

Docs

  • #15126 - [DOC] Add RPC System Setup Document
  • #15071 - Updated the copyright year from 2020 to 2023
  • #15055 - [DOC][TUTORIAL] Fix typo for the 'Making your Hardware Accelerator TVM-ready with UMA'
  • #14504 - [TensorIR][Doc] Docstring of reorder_block_iter_var
  • #14611 - [TIR] Fix unsafe_set_dtype docstring
  • #14585 - Fix typo in the Vitis AI Integration docs

Misc

  • #15267 - [release] Disable git merge to avoid conflict
  • #15187 - [RPC] Report RPC Session Timeout to Client Instead of "kShutdown"
  • #15185 - Update tvm_runtime.h
  • #15164 - [CMake] Support LLVM-16 static linking
  • #15167 - [Python] Enhance Wheel Packaging
  • #15166 - [Target] Add MetaSchedule-compatible attributes to OpenCL
  • #15154 - [Minor] Fix Compilation Warnings
  • #15132 - [NDArray] Allow creating a view from a strided array
  • #15116 - [RPC] Add Missing Option "port_end" to RPC Proxy
  • #15073 - [CodeGenC] Use PrimFuncNode::ret_type in function signature
  • #15036 - [StackVM] Updated CodeGenStackVM to handle DeclBuffer
  • #15022 - [Build] Fix missing virtual destructor in SIBuilder
  • #15016 - Fix type parse error about AdaptiveMaxPool
  • #15007 - [Minor] Fix compilation warnings
  • #15000 - [CMAKE] Introduce dummy build as an option
  • #14863 - [DataType] Initial support of fp8 (e4m3/e5m2)
  • #14975 - [CMAKE] Add a dummy target to defer libtvm dep
  • #14574 - [IR][SIBuilder]
  • #14939 - [Target] Add target to all TVM callbacks
  • #14937 - [BUILD] Enable log before throw message in windows
  • #14934 - [TestCases] Fix unreachable test cases that were outside the for-loop
  • #14916 - [TypoFix] Fix some typos in the Keras frontend
  • #14893 - [Contrib] Use f-strings for string formatting, NFC
  • #14884 - [AutoTVM] Use f-strings for string formatting, NFC
  • #14876 - [CONTRIB] Enable create_staticlib to take in tar files
  • #14867 - Fix f-string typo
  • #14851 - Add v0.12.0 docs
  • #14813 - [BUILD] Removed the duplicated MACROs in config.cmake
  • #14743 - [SUPPORT] Fix RingBuffer ReadWithCallback
  • #14799 - [LINT] Fix clang-format script for newest clang-format
  • #14797 - [NDArray] Allow arbitrary stride when the corresponding shape is 1
  • #14790 - More clear ref of thirdparty license
  • #14779 - fix: use arm on demand instead of spot
  • #14762 - [Target][Minor] Add A6000 Target Tag
  • #14683 - [AutoTVM] Added Droplet algorithm in TVM
  • #14694 - Unify search path approach to various libs
  • #14686 - [CMAKE] Update search pattern of config
  • #14636 - Fix bug about wrong attribute name
  • #14628 - [CODEGEN] Fix metal codegen when with only single working dim
  • #14607 - fix: deploy ci
  • #14569 - [Node] Allow alternative root names in ObjectPath::Root()
  • #14522 - [Object] Implemented .as for ObjectRef param, returns Optional
  • #14477 - feat: use spot instances for ci with on demand as a backup
  • #14468 - [AutoTVM] New rank-binary loss_type for the new xgboost >= 2.0.0 behaviour
  • #14544 - Update to v0.13.dev0
  • #14539 - [Target] Add Apple M1 GPU tag with 256-thread restriction