Releases: sophgo/tpu-mlir

v1.11-beta.0

18 Sep 09:02
[soc_dump] add doc

Change-Id: Icaf313113415a9bf0ad9c75abdcb609d661c815b

TPU-MLIR v1.10 Release

15 Aug 05:02

Release Note

Enhancements:

  • Added CUDA support for various operations like conv2d, MatMul, dwconv, pool2d, and more.
  • Improved performance for operations like MeanStdScale and softmax.
  • Enhanced multi-core batch mm and added support for bm168x with CUDA.
  • Refined CUDA code style and adjusted interfaces for various operations.
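The MeanStdScale preprocessing mentioned above can be illustrated in plain NumPy. This is a minimal sketch of the op's per-channel semantics only, not tpu-mlir's implementation; the function name and signature are hypothetical:

```python
import numpy as np

def mean_std_scale(x, mean, std, scale=1.0):
    """Per-channel normalization on NCHW input: (x - mean) / std, then scale."""
    mean = np.asarray(mean).reshape(1, -1, 1, 1)  # broadcast over N, H, W
    std = np.asarray(std).reshape(1, -1, 1, 1)
    return (x - mean) / std * scale

# typical ImageNet mean/std values, applied to a dummy NCHW tensor
x = np.full((1, 3, 2, 2), 128.0)
y = mean_std_scale(x, mean=[123.675, 116.28, 103.53],
                   std=[58.395, 57.12, 57.375])
```

Fusing this normalization into one op (rather than separate sub, div, and mul ops) is what makes it a candidate for the performance work noted above.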

Bug Fixes:

  • Fixed matmul issues, calibration failures, conv padding problems, and various performance regressions.
  • Addressed bugs in model transformations, calibration, and various pattern issues.
  • Resolved bugs in different model backends like ssd, vit, detr, and yolov5.

New Features:

  • Added support for new models like resnet50, mobilenet_v2, shufflenet_v2, and yolox_s/alphapose_res50.
  • Introduced new operations like RequantIntAxisOp and Depth2Space with CUDA support.
  • Implemented new functionalities for better model inference and compilation.
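The Depth2Space op added above rearranges channel blocks into spatial positions; a NumPy sketch of the standard DCR layout (illustrative only, independent of the CUDA implementation):

```python
import numpy as np

def depth2space(x, block):
    """Rearrange (N, C*block^2, H, W) -> (N, C, H*block, W*block), DCR order."""
    n, c, h, w = x.shape
    assert c % (block * block) == 0
    x = x.reshape(n, block, block, c // (block * block), h, w)
    x = x.transpose(0, 3, 4, 1, 5, 2)  # -> N, C', H, block_h, W, block_w
    return x.reshape(n, c // (block * block), h * block, w * block)

x = np.arange(16, dtype=np.float32).reshape(1, 4, 2, 2)
y = depth2space(x, 2)  # shape (1, 1, 4, 4)
```

Each output 2x2 tile interleaves one pixel from each of the four input channels, which is the inverse of the Space2Depth transform.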

Documentation Updates:

  • Updated weight.md, calibration sections, and user interface details.
  • Improved documentation for quick start, developer manual, and various tpulang interfaces.
  • Enhanced documentation for model transformation parameters and tensor data arrangements.

Miscellaneous:

  • Added new npz tools, modelzoo regression, and support for bmodel encryption.
  • Fixed issues with model performance, shape inference, and CUDA backend optimizations.
  • Restored performance for models such as yolov5s-6 and bm1690 swin multicore.
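The npz tools mentioned above revolve around diffing .npz tensor dumps between reference and deployed models. A minimal comparison sketch in NumPy, to illustrate the idea only; this is not the actual npz tool interface, and `compare_npz` is a hypothetical name:

```python
import numpy as np
import tempfile, os

def compare_npz(ref_path, got_path, tol=1e-5):
    """Report (max_abs_diff, passed) for each tensor name shared by two .npz files."""
    ref, got = np.load(ref_path), np.load(got_path)
    report = {}
    for name in sorted(set(ref.files) & set(got.files)):
        diff = float(np.max(np.abs(ref[name].astype(np.float64)
                                   - got[name].astype(np.float64))))
        report[name] = (diff, diff <= tol)
    return report

# demo: two dumps that agree on "a" but differ on "b"
d = tempfile.mkdtemp()
ref_f, got_f = os.path.join(d, "ref.npz"), os.path.join(d, "got.npz")
a = np.ones((2, 2))
np.savez(ref_f, a=a, b=np.zeros(3))
np.savez(got_f, a=a, b=np.full(3, 0.5))
report = compare_npz(ref_f, got_f)
```

Per-tensor tolerances like this are what make it possible to localize which layer of a quantized model diverges from the float reference.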

TPU-MLIR v1.9 Release

15 Jul 14:40

Release Note

Enhancements:

  • Implemented output order preservation in converters like ONNX, Caffe, Torch, and TFLite.
  • Added support for resnet50-v2 bm1690 f8 regression.
  • Improved ILP group mlir file sequences for resnet50 training.
  • Updated chip libraries and the PerfAI profiling tool for A2 profiling.
  • Added a new dump mode "COMB" and refined abs/relu conversions.

Bug Fixes:

  • Fixed issues with preprocess when source layout differs from target layout.
  • Addressed bugs in various operations like softmax, concat, and weight reorder in conv2d.
  • Resolved bugs in model training, model transformation, and various pattern issues.
  • Fixed bugs related to CUDA inference, matmul with bias, and multi-output calibration.

New Features:

  • Added support for multi-graph in TPULang.
  • Introduced new options in TPULang for inference and model deployment.
  • Implemented various optimizations and enhancements for dynamic operations and model transformations.

Documentation Updates:

  • Refined documentation for quick start quantization and user interface sections.
  • Updated backend information, docker image download methods, and model deployment details in the documentation.

Miscellaneous:

  • Improved performance for models such as vit and yolov5s, and on the bm1690 chip.
  • Introduced new functionalities like embedding multi-device slice and groupnorm train operations.
  • Added support for adaptive_avgpool inference and multiple Einsum modes.
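The multiple Einsum modes mentioned above correspond to different contraction patterns expressed in one notation; NumPy's einsum illustrates two common ones (batched MatMul and contraction against a transposed weight), independent of how the compiler lowers them:

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)
bmm = np.einsum("bik,bkj->bij", a, b)   # batched MatMul mode
assert np.allclose(bmm, a @ b)

w = np.random.rand(5, 4)
proj = np.einsum("bik,jk->bij", a, w)   # contract with transposed weight
assert np.allclose(proj, a @ w.T)
```

Recognizing which mode an einsum equation falls into is what lets a compiler map it onto an existing MatMul or transpose+MatMul kernel instead of a generic loop.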

TPU-MLIR v1.8.1

12 Jul 09:27

Full Changelog: v1.8...v1.8.1

TPU-MLIR v1.8 Release

29 May 11:15

Highlights:

  • Enhancements:

    • Added support for dynamic shape inference in various operations.
    • Optimized core operations for better performance on specific models.
    • Improved backend support for multiple models like BM1684X, BM1688, BM1690, SG2380, etc.
    • Introduced new operations and patterns for more efficient model processing.
    • Updated documentation for better clarity and user guidance.
  • Bug Fixes:

    • Resolved issues related to input/output handling, kernel configurations, and model-specific bugs.
    • Fixed bugs in dynamic compilation, core parallel processing, and various backend operations.
    • Addressed errors in specific model post-processing steps like YOLOv5, EfficientNet, etc.
  • Performance Improvements:

    • Optimized cycle calculations for multi-core models.
    • Enhanced bandwidth usage statistics for better resource management.
    • Accelerated compilation processes for training models using a new layer-group scheme.
  • New Features:

    • Introduced new operations like attention quant block, prelu op, and various dynamic compile features.
    • Added support for additional operations, weight location, and dynamic compile enhancements.

Documentation Updates:

  • Updated developer manuals, quick start guides, and model-specific documentation for better understanding.

Miscellaneous:

  • Streamlined workflows for faster commit checks and improved debugging processes.
  • Added new test cases for regression testing and script-based model evaluations.
  • Fine-tuned backend operations for improved model performance and accuracy.

TPU-MLIR v1.7 Release

19 Apr 09:58

Change Log

New Features

  • Added support for new operations including flash attention, custom op dynamic compile, and tpulang ops.
  • Enabled AttnReorder and added support for dynamic indices in ops like onehot, scatterelements, and cumsum.
  • Added --dump_dataframe option for bmodel_checker and support for transpose with order [1, 2, 3, 0].
  • Introduced Watchpoint feature to TDB and added support for mixed-precision networks.
  • Implemented optimizations for dma efficiency of flash attention and optimized backend for various models.
  • Added support for local memory dump in pcie mode and added various quantization features like eva quant, swin quant, and detr quant.
  • Enhanced multi-core support including support for LayerNorm and GroupNorm in coreParallel, and multi-core data slice in tensorLocation.
  • Added new patterns for Cswin and Einsum operations.
  • Improved support for LLM (Large Language Models) in bm1688.
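For reference, these are the LayerNorm semantics that coreParallel now splits across cores. This NumPy sketch shows the math of the op only; the multi-core slicing itself is handled by the compiler and is not shown here:

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5, axis=-1):
    """Normalize over the given axis, then apply the affine gamma/beta."""
    mu = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mu) / np.sqrt(var + eps) * gamma + beta

x = np.random.rand(2, 8).astype(np.float32)
y = layer_norm(x, gamma=np.ones(8), beta=np.zeros(8))
```

Because the reduction runs over the normalized axis only, rows are independent of one another, which is what makes splitting the batch across cores straightforward.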

Bug Fixes

  • Fixed various bugs including kernel_module msg_id, SAM-VIT-encoder regression, and attention accuracy problems.
  • Addressed logical issues in AddToScale pattern and issues in fp_forward.
  • Resolved bugs in model info core dump, op's liveRange in coreParallel, and DevParallel bugs.
  • Fixed issues in model combine with io alone and bugs in various ops like interp, RotaryPosEmbPattern, and efficient-lite4 permute.

Performance Improvements

  • Improved the performance of TDB and the bmodel_checker for 1684x pcie.
  • Optimized facenet and fixed performance issues of 1688 multicore.
  • Enabled single-core mode optimizations where necessary.

Documentation and Testing

  • Updated documentation, refined custom chapters, and ensured consistency in quick start docs.
  • Added test cases for custom tpulang, multi-core with subnets, and custom cpuop.
  • Fixed various documentation errors and updated the release note.

Other Changes

  • Added restrictions to tpulang ops and net test cases.
  • Adjusted descriptions and refined interfaces for better user experience.
  • Updated backend .so files and addressed sensitive words in the codebase.
  • Added support for int4 dtype in tpu_profile and ensured tool/scripts work in Python virtual environments.

Technical Preview

01 Apr 09:16
Pre-release

Features

  • Added support for LLM Decoding by utilizing multi-cores to enhance processing efficiency.
  • Introduced fx2mlir, a new functionality for enhanced MLIR conversion.
  • Implemented nnvlc2.0 and nnvlc1.0 compression for local activations and weights, respectively, for improved neural network performance.
  • Enabled TPULANG support for operations like sort, argsort, and additional ops, enhancing the language's functionality and flexibility.
  • Added cv186x support in run_sensitive_layer.py and for the TDB, expanding compatibility and debugging capabilities.
  • Introduced new ops and features like Watchpoint in TDB and activation ops support for scale & zero_point, broadening the range of functionalities available in the tpu-mlir project.
  • Added support for BM1690.
  • Enabled L2 memory (L2mem) for intermediate data exchange of active tensors.

Bug Fixes

  • Resolved a variety of bugs affecting backend processes, including issues with the 1684x backend, permutefuse2, permutemulconstswap, and more, improving overall stability and performance.
  • Fixed several critical issues across tpulang, including errors in sort_by_key operation, reshape operations, where operation, and more, enhancing the language's reliability for developers.
  • Addressed bugs in model processing, including fixes for concat logic, scale2conv, scale2conv3d, instance norm, and several more, ensuring smoother model optimization and execution.
  • Corrected errors in the documentation, providing clearer and more accurate information for users and developers.

Documentation Updates

  • Updated tpulang documentation to include new functionalities and optimizations, making it easier for users to understand and utilize the language effectively.

Performance Improvements

  • Optimized TDB and bmodel_checker for 1684x pcie mode, significantly reducing processing times and enhancing efficiency for model analysis.
  • Improved the efficiency of DMA in flash attention operations, ensuring faster data handling and processing.
  • Enabled IO tag mode and refined address mode for better memory management and operational flexibility.

TPU-MLIR v1.6.1

27 Mar 12:25

TPU-MLIR v1.6 release

23 Feb 16:56

Change Log

Bug Fixes

  • Fixed documentation errors and added checks for documentation errors during build.
  • Set workaround for ar.copy cycle issue to 0, avoiding potential data overwriting in inplacing operations.
  • Addressed a bug in Caffe DetectionOutput and fixed a hang in cv186x.
  • Corrected Mul buffer size alignment issues and various other buffer size corrections.
  • Fixed issues with attention accuracy, RotaryPosEmbPattern, and op status validation before the matching process.
  • Addressed a series of backend bugs, including daily build errors, performance declines, and incorrect return values.
  • Fixed data_checker issues, api_conv bug, and a local slice calculation bug.
  • Resolved incorrect affineMap for Pooling buffer and fixed reshape bug for inner products.
  • Corrected Mul&Div dynamic support for local operations and fixed issues with Conv2d buffer size calculations.
  • Addressed various matmul bugs, including fp8 support issues and quantization inconsistencies.

Features

  • Enabled multicore optimizations and added support for multi-core model tests.
  • Updated libbackend_1688.so and various backend updates for better performance and compatibility.
  • Introduced groupParallel operation, support for dynamic input data generation.
  • Added support for new patterns such as Permute fuse pattern and splitQuantizedMLP pattern.
  • Implemented npz compare visualizer tool and added support for bm1688 backend.
  • Added MatMul weight split case and improved permute performance.
  • Added support for img2col pattern, attention interface, and several dialects for SG2260 operations.
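The img2col pattern added above lowers convolution to matrix multiplication by unfolding input patches into columns. A minimal NumPy sketch (stride 1, no padding; illustrative of the technique, not the compiler pass):

```python
import numpy as np

def img2col(x, kh, kw):
    """Unfold (C, H, W) into (C*kh*kw, out_h*out_w) patch columns."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # each row holds one (channel, kernel-offset) slice, flattened
                cols[idx] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                idx += 1
    return cols

# convolution as GEMM: (F, C*kh*kw) @ (C*kh*kw, out_h*out_w)
x = np.random.rand(3, 5, 5).astype(np.float32)
w = np.random.rand(2, 3, 3, 3).astype(np.float32)
out = w.reshape(2, -1) @ img2col(x, 3, 3)  # shape (2, 9)
```

The payoff is that a hardware MatMul unit can then execute the convolution directly, at the cost of duplicating overlapping patch data.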

Documentation Updates

  • Updated release notes and resolved issues with document formatting.
  • Standardized expression terminology and replaced sensitive words in documentation.

Performance Improvements

  • Improved local softmax performance and optimized dataFlow checking in coreMatch.
  • Enhanced performance for Vit L i8 4 batch operations and refined conv multi-core handling.
  • Optimized VIT-B concurrency and addressed performance issues with MaxPool buffer sizes.

v1.6-beta.0

29 Jan 13:39
Pre-release

New Features

  • Implemented the SG2260 structureOp interface and structured transform, including a solver for finding transforms.
  • Added a OneHot converter and fp8 support in the debugger.
  • Supported MatMulOp broadcast in batch dims for special cases and added an interface for attention.
  • Provided "decompose linalg op" and "tile+fuse" passes so that parallel MatMul supports more batch patterns.
  • Added a UNet single-block test.
  • Implemented fp8 support for MatMul and other ops, including addconst, subconst, mul, add, sub, and abs.
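The "tile+fuse" idea above splits a large MatMul into independent tiles that can be scheduled in parallel. A NumPy sketch of batch-dimension tiling, to illustrate the decomposition only; the actual pass operates on linalg ops, and `tiled_batch_matmul` is a hypothetical name:

```python
import numpy as np

def tiled_batch_matmul(a, b, tile=2):
    """Compute the batched a @ b by processing `tile` batches at a time."""
    out = np.empty((a.shape[0], a.shape[1], b.shape[2]), dtype=a.dtype)
    for start in range(0, a.shape[0], tile):
        # each tile is an independent sub-MatMul; tiles could run on
        # separate cores since they touch disjoint slices of `out`
        out[start:start + tile] = a[start:start + tile] @ b[start:start + tile]
    return out

a = np.random.rand(6, 3, 4)
b = np.random.rand(6, 4, 5)
assert np.allclose(tiled_batch_matmul(a, b), a @ b)
```

Fusing a consumer op into each tile's loop body (the "fuse" half) then keeps the tile's intermediate result local instead of round-tripping it through memory.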

Performance Improvements

  • Improved MatMul fp8 performance with new backend support.
  • Enabled distributed MLP and attention with improved performance for cascade_net input/output names and order.
  • Refactored TDB to improve disassembler serialization and resolve a BM1688 decoding issue.
  • Improved weight reorder for ConvOp and optimized the permute of attention MatMul.

Bug Fixes

  • Resolved various bugs in MatMul, Conv, and other ops across multiple chipsets, including SG2260, BM1688, and CV18xx.
  • Fixed bugs related to ReduceOp, ArgOp, SliceOp, and others for better operation and tensor handling.
  • Addressed issues in SAM, the daily test, and TDB related to core operations and functionality.
  • Fixed memory and data handling bugs for more accurate and stable model execution.

Documentation Updates

  • Updated documentation to remove sensitive words and improve clarity and comprehensiveness.

Miscellaneous

  • Enhanced various backend libraries and supported new ops and patterns for more efficient and versatile model handling.
  • Improved scatterE and reduce dynamic shape_value handling for better model optimization.
  • Refined graph optimization, permute parallel indexMapping, and related areas for improved model processing.