@@ -12,7 +14,7 @@ In MMYOLO, the `Visualizer` provided by MMEngine is used for feature
- Supports basic drawing interfaces and feature map visualization.
- Supports selecting different layers of the model to obtain feature maps, with three display modes: `squeeze_mean`, `select_max`, and `topk`. You can also customize the layout of the displayed feature maps with `arrangement`.
-## Feature Map Drawing
+### Feature Map Drawing
You can call `demo/featmap_vis_demo.py` to obtain visualization results quickly and easily. To make it easier to understand, its main arguments and their functions are summarized as follows:
@@ -50,7 +52,7 @@ In MMYOLO, the `Visualizer` provided by MMEngine is used for feature
**Note: when the image and the feature map have different scales, the `draw_featmap` function automatically upsamples the feature map for alignment. If your image was preprocessed with a Pad-like operation during inference, the resulting feature map is padded as well, and direct upsampling may then cause misalignment.**
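+
+If you call the underlying interface yourself, a minimal sketch looks like the following (assuming you already hold a single-level feature map tensor `feat` and an RGB image `img`; both names are hypothetical):
+
+```python
+import numpy as np
+import torch
+from mmengine.visualization import Visualizer
+
+# hypothetical inputs: an RGB image and one level of features, shape (C, H, W)
+img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
+feat = torch.rand(256, 28, 28)
+
+# draw_featmap upsamples `feat` to the image size when the scales differ
+drawn_img = Visualizer.draw_featmap(feat, img, channel_reduction='select_max')
+```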
-## Usage Examples
+### Usage Examples
Take the pretrained YOLOv5-s model as an example:
@@ -167,7 +169,7 @@ python demo/featmap_vis_demo.py demo/dog.jpg \
```
-
+
(5) Save the rendered image. After drawing finishes, you can either display the result in a local window or save it locally; to save, just add the argument `--out-file xxx.jpg`:
@@ -180,3 +182,113 @@ python demo/featmap_vis_demo.py demo/dog.jpg \
--channel-reduction select_max \
--out-file featmap_backbone.jpg
```
+
+## Grad-Based and Grad-Free CAM Visualization
+
+CAM visualization for object detection is much more complex than, and quite different from, CAM for classification. This document only briefly describes the usage; a separate document detailing the implementation principles and caveats will follow.
+
+You can call `demo/boxam_vis_demo.py` to obtain box-level AM visualization results quickly and easily. `YOLOv5/YOLOv6/YOLOX/RTMDet` are currently supported.
+
+Take YOLOv5 as an example. Just as for feature map visualization, you need to modify the `test_pipeline` first; otherwise the feature map and the original image will be misaligned (the default `LetterResize` pads the image, while `mmdet.Resize` does not).
+
+The old `test_pipeline` is:
+
+```python
+test_pipeline = [
+ dict(
+ type='LoadImageFromFile',
+ file_client_args=_base_.file_client_args),
+ dict(type='YOLOv5KeepRatioResize', scale=img_scale),
+ dict(
+ type='LetterResize',
+ scale=img_scale,
+ allow_scale_up=False,
+ pad_val=dict(img=114)),
+ dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor', 'pad_param'))
+]
+```
+
+Modify it to the following configuration:
+
+```python
+test_pipeline = [
+ dict(
+ type='LoadImageFromFile',
+ file_client_args=_base_.file_client_args),
+    dict(type='mmdet.Resize', scale=img_scale, keep_ratio=False),  # replace LetterResize with mmdet.Resize here
+ dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
+ dict(
+ type='mmdet.PackDetInputs',
+ meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
+ 'scale_factor'))
+]
+```
+
+(1) Use the `GradCAM` method to visualize the AM of the last output layer of the neck module
+
+```shell
+python demo/boxam_vis_demo.py \
+ demo/dog.jpg \
+ configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py \
+ yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
+```
+
+The corresponding feature map AM is shown below:
+
+As can be seen, `GradCAM` highlights box-level AM information well.
+
+You can use the `--topk` argument to visualize only the prediction boxes with the highest scores:
+
+```shell
+python demo/boxam_vis_demo.py \
+ demo/dog.jpg \
+ configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py \
+ yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth \
+ --topk 2
+```
+
+(2) Use the `AblationCAM` method to visualize the AM of the last output layer of the neck module
+
+```shell
+python demo/boxam_vis_demo.py \
+ demo/dog.jpg \
+ configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py \
+ yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth \
+ --method ablationcam
+```
+
+Since `AblationCAM` weights each channel by its contribution to the score, it cannot restrict the visualization to box-level AM information the way `GradCAM` does. However, you can use `--norm-in-bbox` to display only the AM inside each bbox:
+
+```shell
+python demo/boxam_vis_demo.py \
+ demo/dog.jpg \
+ configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py \
+ yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth \
+ --method ablationcam \
+ --norm-in-bbox
+```
+
diff --git a/docs/zh_cn/user_guides/yolov5_tutorial.md b/docs/zh_cn/user_guides/yolov5_tutorial.md
index 2cd7ccf68..20a24cbd9 100644
--- a/docs/zh_cn/user_guides/yolov5_tutorial.md
+++ b/docs/zh_cn/user_guides/yolov5_tutorial.md
@@ -30,7 +30,7 @@ mim install -v -e .
This tutorial uses the balloon dataset, which is under 40 MB in size, as the learning dataset for MMYOLO.
```shell
-python tools/misc/download_dataset.py --dataset-name balloon --save-dir data --unzip
+python tools/misc/download_dataset.py --dataset-name balloon --save-dir data --unzip
python tools/dataset_converters/balloon2coco.py
```
diff --git a/mmyolo/datasets/transforms/__init__.py b/mmyolo/datasets/transforms/__init__.py
index 2ff6ad7b0..842ad641a 100644
--- a/mmyolo/datasets/transforms/__init__.py
+++ b/mmyolo/datasets/transforms/__init__.py
@@ -1,10 +1,10 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from .mix_img_transforms import Mosaic, YOLOv5MixUp, YOLOXMixUp
+from .mix_img_transforms import Mosaic, Mosaic9, YOLOv5MixUp, YOLOXMixUp
from .transforms import (LetterResize, LoadAnnotations, YOLOv5HSVRandomAug,
YOLOv5KeepRatioResize, YOLOv5RandomAffine)
__all__ = [
'YOLOv5KeepRatioResize', 'LetterResize', 'Mosaic', 'YOLOXMixUp',
'YOLOv5MixUp', 'YOLOv5HSVRandomAug', 'LoadAnnotations',
- 'YOLOv5RandomAffine'
+ 'YOLOv5RandomAffine', 'Mosaic9'
]
diff --git a/mmyolo/datasets/transforms/mix_img_transforms.py b/mmyolo/datasets/transforms/mix_img_transforms.py
index 42b82318e..1b85ab2a5 100644
--- a/mmyolo/datasets/transforms/mix_img_transforms.py
+++ b/mmyolo/datasets/transforms/mix_img_transforms.py
@@ -195,15 +195,15 @@ class Mosaic(BaseMixImageTransform):
mosaic transform
center_x
+------------------------------+
- | pad | pad |
- | +-----------+ |
+ | pad | |
+ | +-----------+ pad |
| | | |
- | | image1 |--------+ |
- | | | | |
- | | | image2 | |
- center_y |----+-------------+-----------|
+ | | image1 +-----------+
+ | | | |
+ | | | image2 |
+ center_y |----+-+-----------+-----------+
| | cropped | |
- |pad | image3 | image4 |
+ |pad | image3 | image4 |
| | | |
+----|-------------+-----------+
| |
@@ -465,13 +465,306 @@ def __repr__(self) -> str:
return repr_str
+@TRANSFORMS.register_module()
+class Mosaic9(BaseMixImageTransform):
+ """Mosaic9 augmentation.
+
+    Given 9 images, the mosaic transform combines them into one output
+    image, which is composed of parts from each sub-image.
+
+ +-------------------------------+------------+
+ | pad | pad | |
+ | +----------+ | |
+ | | +---------------+ top_right |
+ | | | top | image2 |
+ | | top_left | image1 | |
+ | | image8 o--------+------+--------+---+
+ | | | | | |
+ +----+----------+ | right |pad|
+ | | center | image3 | |
+ | left | image0 +---------------+---|
+ | image7 | | | |
+ +---+-----------+---+--------+ | |
+ | | cropped | | bottom_right |pad|
+ | |bottom_left| | image4 | |
+ | | image6 | bottom | | |
+ +---|-----------+ image5 +---------------+---|
+ | pad | | pad |
+ +-----------+------------+-------------------+
+
+ The mosaic transform steps are as follows:
+
+    1. Get the center image according to the index, and randomly
+       sample 8 more images from the custom dataset.
+    2. Randomly offset the combined image after the mosaic is built.
+
+ Required Keys:
+
+ - img
+ - gt_bboxes (BaseBoxes[torch.float32]) (optional)
+ - gt_bboxes_labels (np.int64) (optional)
+ - gt_ignore_flags (np.bool) (optional)
+ - mix_results (List[dict])
+
+ Modified Keys:
+
+ - img
+ - img_shape
+ - gt_bboxes (optional)
+ - gt_bboxes_labels (optional)
+ - gt_ignore_flags (optional)
+
+ Args:
+ img_scale (Sequence[int]): Image size after mosaic pipeline of single
+ image. The shape order should be (height, width).
+ Defaults to (640, 640).
+ bbox_clip_border (bool, optional): Whether to clip the objects outside
+ the border of the image. In some dataset like MOT17, the gt bboxes
+ are allowed to cross the border of images. Therefore, we don't
+ need to clip the gt bboxes in these cases. Defaults to True.
+ pad_val (int): Pad value. Defaults to 114.
+ pre_transform(Sequence[dict]): Sequence of transform object or
+ config dict to be composed.
+ prob (float): Probability of applying this transformation.
+ Defaults to 1.0.
+ use_cached (bool): Whether to use cache. Defaults to False.
+ max_cached_images (int): The maximum length of the cache. The larger
+ the cache, the stronger the randomness of this transform. As a
+ rule of thumb, providing 5 caches for each image suffices for
+ randomness. Defaults to 50.
+ random_pop (bool): Whether to randomly pop a result from the cache
+ when the cache is full. If set to False, use FIFO popping method.
+ Defaults to True.
+    max_refetch (int): The maximum number of retry iterations for getting
+        valid results from the pipeline. If the number of iterations is
+        greater than `max_refetch` but the results are still None, the
+        iteration is terminated and an error is raised. Defaults to 15.
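+
+    Example:
+        >>> # a minimal construction sketch; in a config this transform is
+        >>> # typically written as dict(type='Mosaic9', img_scale=(640, 640))
+        >>> from mmyolo.datasets.transforms import Mosaic9
+        >>> transform = Mosaic9(img_scale=(640, 640), pad_val=114.0)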
+ """
+
+ def __init__(self,
+ img_scale: Tuple[int, int] = (640, 640),
+ bbox_clip_border: bool = True,
+ pad_val: Union[float, int] = 114.0,
+ pre_transform: Sequence[dict] = None,
+ prob: float = 1.0,
+ use_cached: bool = False,
+ max_cached_images: int = 50,
+ random_pop: bool = True,
+ max_refetch: int = 15):
+ assert isinstance(img_scale, tuple)
+ assert 0 <= prob <= 1.0, 'The probability should be in range [0,1]. ' \
+ f'got {prob}.'
+ if use_cached:
+ assert max_cached_images >= 9, 'The length of cache must >= 9, ' \
+ f'but got {max_cached_images}.'
+
+ super().__init__(
+ pre_transform=pre_transform,
+ prob=prob,
+ use_cached=use_cached,
+ max_cached_images=max_cached_images,
+ random_pop=random_pop,
+ max_refetch=max_refetch)
+
+ self.img_scale = img_scale
+ self.bbox_clip_border = bbox_clip_border
+ self.pad_val = pad_val
+
+ # intermediate variables
+ self._current_img_shape = [0, 0]
+ self._center_img_shape = [0, 0]
+ self._previous_img_shape = [0, 0]
+
+ def get_indexes(self, dataset: Union[BaseDataset, list]) -> list:
+ """Call function to collect indexes.
+
+ Args:
+ dataset (:obj:`Dataset` or list): The dataset or cached list.
+
+ Returns:
+ list: indexes.
+ """
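+        # sample indexes for the other 8 sub-images; the center image comes
+        # from the current data sample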
+ indexes = [random.randint(0, len(dataset)) for _ in range(8)]
+ return indexes
+
+ def mix_img_transform(self, results: dict) -> dict:
+ """Mixed image data transformation.
+
+ Args:
+ results (dict): Result dict.
+
+ Returns:
+ results (dict): Updated result dict.
+ """
+ assert 'mix_results' in results
+
+ mosaic_bboxes = []
+ mosaic_bboxes_labels = []
+ mosaic_ignore_flags = []
+
+ img_scale_h, img_scale_w = self.img_scale
+
+ if len(results['img'].shape) == 3:
+ mosaic_img = np.full(
+ (int(img_scale_h * 3), int(img_scale_w * 3), 3),
+ self.pad_val,
+ dtype=results['img'].dtype)
+ else:
+ mosaic_img = np.full((int(img_scale_h * 3), int(img_scale_w * 3)),
+ self.pad_val,
+ dtype=results['img'].dtype)
+
+        # index = 0 means the original image
+ # len(results['mix_results']) = 8
+ loc_strs = ('center', 'top', 'top_right', 'right', 'bottom_right',
+ 'bottom', 'bottom_left', 'left', 'top_left')
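+        # sub-images are pasted starting from the center and proceeding
+        # clockwise; several locations reuse the shape of the previously
+        # pasted sub-image (see ``_mosaic_combine``)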
+
+ results_all = [results, *results['mix_results']]
+ for index, results_patch in enumerate(results_all):
+ img_i = results_patch['img']
+ # keep_ratio resize
+ img_i_h, img_i_w = img_i.shape[:2]
+ scale_ratio_i = min(img_scale_h / img_i_h, img_scale_w / img_i_w)
+ img_i = mmcv.imresize(
+ img_i,
+ (int(img_i_w * scale_ratio_i), int(img_i_h * scale_ratio_i)))
+
+ paste_coord = self._mosaic_combine(loc_strs[index],
+ img_i.shape[:2])
+
+ padw, padh = paste_coord[:2]
+ x1, y1, x2, y2 = (max(x, 0) for x in paste_coord)
+ mosaic_img[y1:y2, x1:x2] = img_i[y1 - padh:, x1 - padw:]
+
+ gt_bboxes_i = results_patch['gt_bboxes']
+ gt_bboxes_labels_i = results_patch['gt_bboxes_labels']
+ gt_ignore_flags_i = results_patch['gt_ignore_flags']
+ gt_bboxes_i.rescale_([scale_ratio_i, scale_ratio_i])
+ gt_bboxes_i.translate_([padw, padh])
+
+ mosaic_bboxes.append(gt_bboxes_i)
+ mosaic_bboxes_labels.append(gt_bboxes_labels_i)
+ mosaic_ignore_flags.append(gt_ignore_flags_i)
+
+ # Offset
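+        # the mosaic canvas is 3x the target scale; randomly crop a
+        # (2 * img_scale_h, 2 * img_scale_w) window out of it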
+ offset_x = int(random.uniform(0, img_scale_w))
+ offset_y = int(random.uniform(0, img_scale_h))
+ mosaic_img = mosaic_img[offset_y:offset_y + 2 * img_scale_h,
+ offset_x:offset_x + 2 * img_scale_w]
+
+ mosaic_bboxes = mosaic_bboxes[0].cat(mosaic_bboxes, 0)
+ mosaic_bboxes.translate_([-offset_x, -offset_y])
+ mosaic_bboxes_labels = np.concatenate(mosaic_bboxes_labels, 0)
+ mosaic_ignore_flags = np.concatenate(mosaic_ignore_flags, 0)
+
+ if self.bbox_clip_border:
+ mosaic_bboxes.clip_([2 * img_scale_h, 2 * img_scale_w])
+ else:
+ # remove outside bboxes
+ inside_inds = mosaic_bboxes.is_inside(
+ [2 * img_scale_h, 2 * img_scale_w]).numpy()
+ mosaic_bboxes = mosaic_bboxes[inside_inds]
+ mosaic_bboxes_labels = mosaic_bboxes_labels[inside_inds]
+ mosaic_ignore_flags = mosaic_ignore_flags[inside_inds]
+
+ results['img'] = mosaic_img
+ results['img_shape'] = mosaic_img.shape
+ results['gt_bboxes'] = mosaic_bboxes
+ results['gt_bboxes_labels'] = mosaic_bboxes_labels
+ results['gt_ignore_flags'] = mosaic_ignore_flags
+ return results
+
+ def _mosaic_combine(self, loc: str,
+ img_shape_hw: Tuple[int, int]) -> Tuple[int, ...]:
+ """Calculate global coordinate of mosaic image.
+
+ Args:
+ loc (str): Index for the sub-image.
+ img_shape_hw (Sequence[int]): Height and width of sub-image
+
+ Returns:
+ paste_coord (tuple): paste corner coordinate in mosaic image.
+ """
+ assert loc in ('center', 'top', 'top_right', 'right', 'bottom_right',
+ 'bottom', 'bottom_left', 'left', 'top_left')
+
+ img_scale_h, img_scale_w = self.img_scale
+
+ self._current_img_shape = img_shape_hw
+ current_img_h, current_img_w = self._current_img_shape
+ previous_img_h, previous_img_w = self._previous_img_shape
+ center_img_h, center_img_w = self._center_img_shape
+
+ if loc == 'center':
+ self._center_img_shape = self._current_img_shape
+ # xmin, ymin, xmax, ymax
+ paste_coord = img_scale_w, \
+ img_scale_h, \
+ img_scale_w + current_img_w, \
+ img_scale_h + current_img_h
+ elif loc == 'top':
+ paste_coord = img_scale_w, \
+ img_scale_h - current_img_h, \
+ img_scale_w + current_img_w, \
+ img_scale_h
+ elif loc == 'top_right':
+ paste_coord = img_scale_w + previous_img_w, \
+ img_scale_h - current_img_h, \
+ img_scale_w + previous_img_w + current_img_w, \
+ img_scale_h
+ elif loc == 'right':
+ paste_coord = img_scale_w + center_img_w, \
+ img_scale_h, \
+ img_scale_w + center_img_w + current_img_w, \
+ img_scale_h + current_img_h
+ elif loc == 'bottom_right':
+ paste_coord = img_scale_w + center_img_w, \
+ img_scale_h + previous_img_h, \
+ img_scale_w + center_img_w + current_img_w, \
+ img_scale_h + previous_img_h + current_img_h
+ elif loc == 'bottom':
+ paste_coord = img_scale_w + center_img_w - current_img_w, \
+ img_scale_h + center_img_h, \
+ img_scale_w + center_img_w, \
+ img_scale_h + center_img_h + current_img_h
+ elif loc == 'bottom_left':
+ paste_coord = img_scale_w + center_img_w - \
+ previous_img_w - current_img_w, \
+ img_scale_h + center_img_h, \
+ img_scale_w + center_img_w - previous_img_w, \
+ img_scale_h + center_img_h + current_img_h
+ elif loc == 'left':
+ paste_coord = img_scale_w - current_img_w, \
+ img_scale_h + center_img_h - current_img_h, \
+ img_scale_w, \
+ img_scale_h + center_img_h
+ elif loc == 'top_left':
+ paste_coord = img_scale_w - current_img_w, \
+ img_scale_h + center_img_h - \
+ previous_img_h - current_img_h, \
+ img_scale_w, \
+ img_scale_h + center_img_h - previous_img_h
+
+ self._previous_img_shape = self._current_img_shape
+ # xmin, ymin, xmax, ymax
+ return paste_coord
+
+ def __repr__(self) -> str:
+ repr_str = self.__class__.__name__
+ repr_str += f'(img_scale={self.img_scale}, '
+ repr_str += f'pad_val={self.pad_val}, '
+ repr_str += f'prob={self.prob})'
+ return repr_str
+
+
@TRANSFORMS.register_module()
class YOLOv5MixUp(BaseMixImageTransform):
"""MixUp data augmentation for YOLOv5.
.. code:: text
- The mixup transform steps are as follows:
+ The mixup transform steps are as follows:
1. Another random image is picked by dataset.
2. Randomly obtain the fusion ratio from the beta distribution,
@@ -514,7 +807,7 @@ class YOLOv5MixUp(BaseMixImageTransform):
when the cache is full. If set to False, use FIFO popping method.
Defaults to True.
max_refetch (int): The maximum number of iterations. If the number of
- iterations is greater than `max_iters`, but gt_bbox is still
+ iterations is greater than `max_refetch`, but gt_bbox is still
empty, then the iteration is terminated. Defaults to 15.
"""
@@ -599,20 +892,20 @@ class YOLOXMixUp(BaseMixImageTransform):
.. code:: text
mixup transform
- +------------------------------+
+ +---------------+--------------+
| mixup image | |
| +--------|--------+ |
| | | | |
- |---------------+ | |
+ +---------------+ | |
| | | |
| | image | |
| | | |
| | | |
- | |-----------------+ |
+ | +-----------------+ |
| pad |
+------------------------------+
- The mixup transform steps are as follows:
+ The mixup transform steps are as follows:
1. Another random image is picked by dataset and embedded in
the top left patch(after padding and resizing)
@@ -662,7 +955,7 @@ class YOLOXMixUp(BaseMixImageTransform):
when the cache is full. If set to False, use FIFO popping method.
Defaults to True.
max_refetch (int): The maximum number of iterations. If the number of
- iterations is greater than `max_iters`, but gt_bbox is still
+ iterations is greater than `max_refetch`, but gt_bbox is still
empty, then the iteration is terminated. Defaults to 15.
"""
@@ -759,9 +1052,9 @@ def mix_img_transform(self, results: dict) -> dict:
ori_img = results['img']
origin_h, origin_w = out_img.shape[:2]
target_h, target_w = ori_img.shape[:2]
- padded_img = np.zeros(
- (max(origin_h, target_h), max(origin_w,
- target_w), 3)).astype(np.uint8)
+ padded_img = np.ones((max(origin_h, target_h), max(
+ origin_w, target_w), 3)) * self.pad_val
+ padded_img = padded_img.astype(np.uint8)
padded_img[:origin_h, :origin_w] = out_img
x_offset, y_offset = 0, 0
@@ -823,6 +1116,6 @@ def __repr__(self) -> str:
repr_str += f'ratio_range={self.ratio_range}, '
repr_str += f'flip_ratio={self.flip_ratio}, '
repr_str += f'pad_val={self.pad_val}, '
- repr_str += f'max_iters={self.max_iters}, '
+ repr_str += f'max_refetch={self.max_refetch}, '
repr_str += f'bbox_clip_border={self.bbox_clip_border})'
return repr_str
diff --git a/mmyolo/datasets/transforms/transforms.py b/mmyolo/datasets/transforms/transforms.py
index 17dc961db..890df8ac2 100644
--- a/mmyolo/datasets/transforms/transforms.py
+++ b/mmyolo/datasets/transforms/transforms.py
@@ -104,8 +104,7 @@ def _resize_img(self, results: dict):
resized_h, resized_w = image.shape[:2]
scale_ratio = resized_h / original_h
- scale_factor = np.array([scale_ratio, scale_ratio],
- dtype=np.float32)
+ scale_factor = (scale_ratio, scale_ratio)
results['img'] = image
results['img_shape'] = image.shape[:2]
@@ -208,10 +207,13 @@ def _resize_img(self, results: dict):
interpolation=self.interpolation,
backend=self.backend)
- scale_factor = np.array([ratio[0], ratio[1]], dtype=np.float32)
+ scale_factor = (ratio[1], ratio[0]) # mmcv scale factor is (w, h)
if 'scale_factor' in results:
- results['scale_factor'] = results['scale_factor'] * scale_factor
+ results['scale_factor'] = (results['scale_factor'][0] *
+ scale_factor[0],
+ results['scale_factor'][1] *
+ scale_factor[1])
else:
results['scale_factor'] = scale_factor
diff --git a/mmyolo/datasets/yolov5_coco.py b/mmyolo/datasets/yolov5_coco.py
index 048571186..55bc899ab 100644
--- a/mmyolo/datasets/yolov5_coco.py
+++ b/mmyolo/datasets/yolov5_coco.py
@@ -7,6 +7,9 @@
class BatchShapePolicyDataset(BaseDetDataset):
+    """Dataset with the batch shape policy, which pads each batch with the
+    fewest pixels possible during batch inference and thus does not require
+    all batches to share the same image scale during validation."""
def __init__(self,
*args,
@@ -17,7 +20,7 @@ def __init__(self,
def full_init(self):
"""rewrite full_init() to be compatible with serialize_data in
- BatchShapesPolicy."""
+ BatchShapePolicy."""
if self._fully_initialized:
return
# load data information
diff --git a/mmyolo/deploy/models/dense_heads/yolov5_head.py b/mmyolo/deploy/models/dense_heads/yolov5_head.py
index cf61fb3ca..ecbe24437 100644
--- a/mmyolo/deploy/models/dense_heads/yolov5_head.py
+++ b/mmyolo/deploy/models/dense_heads/yolov5_head.py
@@ -146,3 +146,34 @@ def yolov5_head__predict_by_feat(ctx,
return nms_func(bboxes, scores, max_output_boxes_per_class, iou_threshold,
score_threshold, pre_top_k, keep_top_k)
+
+
+@FUNCTION_REWRITER.register_rewriter(
+ func_name='mmyolo.models.dense_heads.yolov5_head.'
+ 'YOLOv5Head.predict',
+ backend='rknn')
+def yolov5_head__predict__rknn(ctx, self, x: Tuple[Tensor], *args,
+ **kwargs) -> Tuple[Tensor, Tensor, Tensor]:
+ """Perform forward propagation of the detection head and predict detection
+ results on the features of the upstream network.
+
+ Args:
+ x (tuple[Tensor]): Multi-level features from the
+ upstream network, each is a 4D-tensor.
+ """
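+    # the rknn backend keeps the head outputs raw here; bbox decoding and
+    # NMS are assumed to run outside the exported model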
+ outs = self(x)
+ return outs
+
+
+@FUNCTION_REWRITER.register_rewriter(
+ func_name='mmyolo.models.dense_heads.yolov5_head.'
+ 'YOLOv5HeadModule.forward',
+ backend='rknn')
+def yolov5_head_module__forward__rknn(
+        ctx, self, x: Tuple[Tensor], *args,
+ **kwargs) -> Tuple[Tensor, Tensor, Tensor]:
+ """Forward feature of a single scale level."""
+ out = []
+ for i, feat in enumerate(x):
+ out.append(self.convs_pred[i](feat))
+ return out
diff --git a/mmyolo/deploy/object_detection.py b/mmyolo/deploy/object_detection.py
index 2317ec915..ba8c69ea8 100644
--- a/mmyolo/deploy/object_detection.py
+++ b/mmyolo/deploy/object_detection.py
@@ -1,6 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from typing import Callable
+from typing import Callable, Dict, Optional
+import torch
from mmdeploy.codebase.base import CODEBASE, MMCodebase
from mmdeploy.codebase.mmdet.deploy import ObjectDetection
from mmdeploy.utils import Codebase, Task
@@ -16,13 +17,23 @@ class MMYOLO(MMCodebase):
task_registry = MMYOLO_TASK
+ @classmethod
+ def register_deploy_modules(cls):
+ """register all rewriters for mmdet."""
+ import mmdeploy.codebase.mmdet.models # noqa: F401
+ import mmdeploy.codebase.mmdet.ops # noqa: F401
+ import mmdeploy.codebase.mmdet.structures # noqa: F401
+
@classmethod
def register_all_modules(cls):
+ """register all modules."""
from mmdet.utils.setup_env import \
register_all_modules as register_all_modules_mmdet
from mmyolo.utils.setup_env import \
register_all_modules as register_all_modules_mmyolo
+
+ cls.register_deploy_modules()
register_all_modules_mmyolo(True)
register_all_modules_mmdet(False)
@@ -72,3 +83,40 @@ def get_visualizer(self, name: str, save_dir: str):
if metainfo is not None:
visualizer.dataset_meta = metainfo
return visualizer
+
+ def build_pytorch_model(self,
+ model_checkpoint: Optional[str] = None,
+ cfg_options: Optional[Dict] = None,
+ **kwargs) -> torch.nn.Module:
+ """Initialize torch model.
+
+ Args:
+ model_checkpoint (str): The checkpoint file of torch model,
+ defaults to `None`.
+ cfg_options (dict): Optional config key-pair parameters.
+ Returns:
+ nn.Module: An initialized torch model generated by other OpenMMLab
+ codebases.
+ """
+ from copy import deepcopy
+
+ from mmengine.model import revert_sync_batchnorm
+ from mmengine.registry import MODELS
+
+ from mmyolo.utils import switch_to_deploy
+
+ model = deepcopy(self.model_cfg.model)
+ preprocess_cfg = deepcopy(self.model_cfg.get('preprocess_cfg', {}))
+ preprocess_cfg.update(
+ deepcopy(self.model_cfg.get('data_preprocessor', {})))
+ model.setdefault('data_preprocessor', preprocess_cfg)
+ model = MODELS.build(model)
+ if model_checkpoint is not None:
+ from mmengine.runner.checkpoint import load_checkpoint
+ load_checkpoint(model, model_checkpoint, map_location=self.device)
+
+ model = revert_sync_batchnorm(model)
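+        # switch re-parameterizable blocks (e.g. RepVGGBlock) to deploy mode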
+ switch_to_deploy(model)
+ model = model.to(self.device)
+ model.eval()
+ return model
diff --git a/mmyolo/engine/hooks/switch_to_deploy_hook.py b/mmyolo/engine/hooks/switch_to_deploy_hook.py
index e597eb22b..28ac345f4 100644
--- a/mmyolo/engine/hooks/switch_to_deploy_hook.py
+++ b/mmyolo/engine/hooks/switch_to_deploy_hook.py
@@ -17,4 +17,5 @@ class SwitchToDeployHook(Hook):
"""
def before_test_epoch(self, runner: Runner):
+ """Switch to deploy mode before testing."""
switch_to_deploy(runner.model)
diff --git a/mmyolo/engine/optimizers/__init__.py b/mmyolo/engine/optimizers/__init__.py
index 3ad91894a..b598020d0 100644
--- a/mmyolo/engine/optimizers/__init__.py
+++ b/mmyolo/engine/optimizers/__init__.py
@@ -1,4 +1,5 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .yolov5_optim_constructor import YOLOv5OptimizerConstructor
+from .yolov7_optim_wrapper_constructor import YOLOv7OptimWrapperConstructor
-__all__ = ['YOLOv5OptimizerConstructor']
+__all__ = ['YOLOv5OptimizerConstructor', 'YOLOv7OptimWrapperConstructor']
diff --git a/mmyolo/engine/optimizers/yolov5_optim_constructor.py b/mmyolo/engine/optimizers/yolov5_optim_constructor.py
index 8abe5db89..5e5f42cb5 100644
--- a/mmyolo/engine/optimizers/yolov5_optim_constructor.py
+++ b/mmyolo/engine/optimizers/yolov5_optim_constructor.py
@@ -120,6 +120,10 @@ def __call__(self, model: nn.Module) -> OptimWrapper:
# bias
optimizer_cfg['params'].append({'params': params_groups[2]})
+ print_log(
+ 'Optimizer groups: %g .bias, %g conv.weight, %g other' %
+ (len(params_groups[2]), len(params_groups[0]), len(
+ params_groups[1])), 'current')
del params_groups
optimizer = OPTIMIZERS.build(optimizer_cfg)
diff --git a/mmyolo/engine/optimizers/yolov7_optim_wrapper_constructor.py b/mmyolo/engine/optimizers/yolov7_optim_wrapper_constructor.py
new file mode 100644
index 000000000..79ea8b699
--- /dev/null
+++ b/mmyolo/engine/optimizers/yolov7_optim_wrapper_constructor.py
@@ -0,0 +1,139 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Optional
+
+import torch.nn as nn
+from mmengine.dist import get_world_size
+from mmengine.logging import print_log
+from mmengine.model import is_model_wrapper
+from mmengine.optim import OptimWrapper
+
+from mmyolo.models.dense_heads.yolov7_head import ImplicitA, ImplicitM
+from mmyolo.registry import (OPTIM_WRAPPER_CONSTRUCTORS, OPTIM_WRAPPERS,
+ OPTIMIZERS)
+
+
+# TODO: Consider merging into YOLOv5OptimizerConstructor
+@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
+class YOLOv7OptimWrapperConstructor:
+ """YOLOv7 constructor for optimizer wrappers.
+
+ It has the following functions:
+
+ - divides the optimizer parameters into 3 groups:
+ Conv, Bias and BN/ImplicitA/ImplicitM
+
+ - support `weight_decay` parameter adaption based on
+ `batch_size_per_gpu`
+
+ Args:
+ optim_wrapper_cfg (dict): The config dict of the optimizer wrapper.
+ Positional fields are
+
+ - ``type``: class name of the OptimizerWrapper
+ - ``optimizer``: The configuration of optimizer.
+
+ Optional fields are
+
+ - any arguments of the corresponding optimizer wrapper type,
+ e.g., accumulative_counts, clip_grad, etc.
+
+ The positional fields of ``optimizer`` are
+
+ - `type`: class name of the optimizer.
+
+ Optional fields are
+
+ - any arguments of the corresponding optimizer type, e.g.,
+ lr, weight_decay, momentum, etc.
+
+ paramwise_cfg (dict, optional): Parameter-wise options. Must include
+ `base_total_batch_size` if not None. If the total input batch
+ is smaller than `base_total_batch_size`, the `weight_decay`
+ parameter will be kept unchanged, otherwise linear scaling.
+
+ Example:
+ >>> model = torch.nn.modules.Conv1d(1, 1, 1)
+ >>> optim_wrapper_cfg = dict(
+ >>> dict(type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01,
+ >>> momentum=0.9, weight_decay=0.0001, batch_size_per_gpu=16))
+ >>> paramwise_cfg = dict(base_total_batch_size=64)
+ >>> optim_wrapper_builder = YOLOv7OptimWrapperConstructor(
+ >>> optim_wrapper_cfg, paramwise_cfg)
+ >>> optim_wrapper = optim_wrapper_builder(model)
+ """
+
+ def __init__(self,
+ optim_wrapper_cfg: dict,
+ paramwise_cfg: Optional[dict] = None):
+ if paramwise_cfg is None:
+ paramwise_cfg = {'base_total_batch_size': 64}
+ assert 'base_total_batch_size' in paramwise_cfg
+
+ if not isinstance(optim_wrapper_cfg, dict):
+            raise TypeError('optim_wrapper_cfg should be a dict, '
+                            f'but got {type(optim_wrapper_cfg)}')
+ assert 'optimizer' in optim_wrapper_cfg, (
+ '`optim_wrapper_cfg` must contain "optimizer" config')
+
+ self.optim_wrapper_cfg = optim_wrapper_cfg
+ self.optimizer_cfg = self.optim_wrapper_cfg.pop('optimizer')
+ self.base_total_batch_size = paramwise_cfg['base_total_batch_size']
+
+ def __call__(self, model: nn.Module) -> OptimWrapper:
+ if is_model_wrapper(model):
+ model = model.module
+ optimizer_cfg = self.optimizer_cfg.copy()
+ weight_decay = optimizer_cfg.pop('weight_decay', 0)
+
+ if 'batch_size_per_gpu' in optimizer_cfg:
+ batch_size_per_gpu = optimizer_cfg.pop('batch_size_per_gpu')
+ # No scaling if total_batch_size is less than
+ # base_total_batch_size, otherwise linear scaling.
+ total_batch_size = get_world_size() * batch_size_per_gpu
+ accumulate = max(
+ round(self.base_total_batch_size / total_batch_size), 1)
+ scale_factor = total_batch_size * \
+ accumulate / self.base_total_batch_size
+
+ if scale_factor != 1:
+ weight_decay *= scale_factor
+ print_log(f'Scaled weight_decay to {weight_decay}', 'current')
+
+ params_groups = [], [], []
+ for v in model.modules():
+ # no decay
+ # Caution: Coupling with model
+ if isinstance(v, (ImplicitA, ImplicitM)):
+ params_groups[0].append(v.implicit)
+ elif isinstance(v, nn.modules.batchnorm._NormBase):
+ params_groups[0].append(v.weight)
+ # apply decay
+ elif hasattr(v, 'weight') and isinstance(v.weight, nn.Parameter):
+ params_groups[1].append(v.weight) # apply decay
+
+ # biases, no decay
+ if hasattr(v, 'bias') and isinstance(v.bias, nn.Parameter):
+ params_groups[2].append(v.bias)
+
+ # Note: Make sure bias is in the last parameter group
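+        # (e.g. so a YOLOv5-style warmup scheduler can address the bias
+        # group by its fixed position)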
+ optimizer_cfg['params'] = []
+ # conv
+ optimizer_cfg['params'].append({
+ 'params': params_groups[1],
+ 'weight_decay': weight_decay
+ })
+ # bn ...
+ optimizer_cfg['params'].append({'params': params_groups[0]})
+ # bias
+ optimizer_cfg['params'].append({'params': params_groups[2]})
+
+ print_log(
+ 'Optimizer groups: %g .bias, %g conv.weight, %g other' %
+ (len(params_groups[2]), len(params_groups[1]), len(
+ params_groups[0])), 'current')
+ del params_groups
+
+ optimizer = OPTIMIZERS.build(optimizer_cfg)
+ optim_wrapper = OPTIM_WRAPPERS.build(
+ self.optim_wrapper_cfg, default_args=dict(optimizer=optimizer))
+ return optim_wrapper
diff --git a/mmyolo/models/backbones/__init__.py b/mmyolo/models/backbones/__init__.py
index 851e8917c..0c5015376 100644
--- a/mmyolo/models/backbones/__init__.py
+++ b/mmyolo/models/backbones/__init__.py
@@ -3,10 +3,10 @@
from .csp_darknet import YOLOv5CSPDarknet, YOLOXCSPDarknet
from .csp_resnet import PPYOLOECSPResNet
from .cspnext import CSPNeXt
-from .efficient_rep import YOLOv6EfficientRep
+from .efficient_rep import YOLOv6CSPBep, YOLOv6EfficientRep
from .yolov7_backbone import YOLOv7Backbone
__all__ = [
- 'YOLOv5CSPDarknet', 'BaseBackbone', 'YOLOv6EfficientRep',
+ 'YOLOv5CSPDarknet', 'BaseBackbone', 'YOLOv6EfficientRep', 'YOLOv6CSPBep',
'YOLOXCSPDarknet', 'CSPNeXt', 'YOLOv7Backbone', 'PPYOLOECSPResNet'
]
diff --git a/mmyolo/models/backbones/base_backbone.py b/mmyolo/models/backbones/base_backbone.py
index 57a00eae0..730c7095e 100644
--- a/mmyolo/models/backbones/base_backbone.py
+++ b/mmyolo/models/backbones/base_backbone.py
@@ -48,7 +48,7 @@ class BaseBackbone(BaseModule, metaclass=ABCMeta):
In P6 model, n=5
Args:
- arch_setting (dict): Architecture of BaseBackbone.
+ arch_setting (list): Architecture of BaseBackbone.
plugins (list[dict]): List of plugins for stages, each dict contains:
- cfg (dict, required): Cfg dict to build plugin.
@@ -75,7 +75,7 @@ class BaseBackbone(BaseModule, metaclass=ABCMeta):
"""
def __init__(self,
- arch_setting: dict,
+ arch_setting: list,
deepen_factor: float = 1.0,
widen_factor: float = 1.0,
input_channels: int = 3,
@@ -87,7 +87,6 @@ def __init__(self,
norm_eval: bool = False,
init_cfg: OptMultiConfig = None):
super().__init__(init_cfg)
-
self.num_stages = len(arch_setting)
self.arch_setting = arch_setting
@@ -135,7 +134,7 @@ def build_stage_layer(self, stage_idx: int, setting: list):
"""
pass
- def make_stage_plugins(self, plugins, idx, setting):
+ def make_stage_plugins(self, plugins, stage_idx, setting):
"""Make plugins for backbone ``stage_idx`` th stage.
Currently we support to insert ``context_block``,
@@ -154,7 +153,7 @@ def make_stage_plugins(self, plugins, idx, setting):
... ]
>>> model = YOLOv5CSPDarknet()
>>> stage_plugins = model.make_stage_plugins(plugins, 0, setting)
- >>> assert len(stage_plugins) == 3
+ >>> assert len(stage_plugins) == 1
Suppose ``stage_idx=0``, the structure of blocks in the stage would be:
@@ -162,7 +161,7 @@ def make_stage_plugins(self, plugins, idx, setting):
conv1 -> conv2 -> conv3 -> yyy
- Suppose 'stage_idx=1', the structure of blocks in the stage would be:
+ Suppose ``stage_idx=1``, the structure of blocks in the stage would be:
.. code-block:: none
@@ -188,7 +187,7 @@ def make_stage_plugins(self, plugins, idx, setting):
plugin = plugin.copy()
stages = plugin.pop('stages', None)
assert stages is None or len(stages) == self.num_stages
- if stages is None or stages[idx]:
+ if stages is None or stages[stage_idx]:
name, layer = build_plugin_layer(
plugin['cfg'], in_channels=in_channels)
plugin_layers.append(layer)
diff --git a/mmyolo/models/backbones/csp_darknet.py b/mmyolo/models/backbones/csp_darknet.py
index 88d99c79d..2ce0fb669 100644
--- a/mmyolo/models/backbones/csp_darknet.py
+++ b/mmyolo/models/backbones/csp_darknet.py
@@ -3,7 +3,7 @@
import torch
import torch.nn as nn
-from mmcv.cnn import ConvModule
+from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule
from mmdet.models.backbones.csp_darknet import CSPLayer, Focus
from mmdet.utils import ConfigType, OptMultiConfig
@@ -146,8 +146,8 @@ def build_stage_layer(self, stage_idx: int, setting: list) -> list:
return stage
def init_weights(self):
+ """Initialize the parameters."""
if self.init_cfg is None:
- """Initialize the parameters."""
for m in self.modules():
if isinstance(m, torch.nn.Conv2d):
# In order to be consistent with the source code,
@@ -178,6 +178,8 @@ class YOLOXCSPDarknet(BaseBackbone):
Defaults to (2, 3, 4).
frozen_stages (int): Stages to be frozen (stop grad and set eval
mode). -1 means not freezing any parameters. Defaults to -1.
+ use_depthwise (bool): Whether to use depthwise separable convolution.
+ Defaults to False.
spp_kernal_sizes: (tuple[int]): Sequential of kernel sizes of SPP
layers. Defaults to (5, 9, 13).
norm_cfg (dict): Dictionary to construct and config norm layer.
@@ -218,12 +220,14 @@ def __init__(self,
input_channels: int = 3,
out_indices: Tuple[int] = (2, 3, 4),
frozen_stages: int = -1,
+ use_depthwise: bool = False,
spp_kernal_sizes: Tuple[int] = (5, 9, 13),
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
norm_eval: bool = False,
init_cfg: OptMultiConfig = None):
+ self.use_depthwise = use_depthwise
self.spp_kernal_sizes = spp_kernal_sizes
super().__init__(self.arch_settings[arch], deepen_factor, widen_factor,
input_channels, out_indices, frozen_stages, plugins,
@@ -251,7 +255,9 @@ def build_stage_layer(self, stage_idx: int, setting: list) -> list:
out_channels = make_divisible(out_channels, self.widen_factor)
num_blocks = make_round(num_blocks, self.deepen_factor)
stage = []
- conv_layer = ConvModule(
+ conv = DepthwiseSeparableConvModule \
+ if self.use_depthwise else ConvModule
+ conv_layer = conv(
in_channels,
out_channels,
kernel_size=3,
diff --git a/mmyolo/models/backbones/efficient_rep.py b/mmyolo/models/backbones/efficient_rep.py
index 9ac1b81be..691c5b846 100644
--- a/mmyolo/models/backbones/efficient_rep.py
+++ b/mmyolo/models/backbones/efficient_rep.py
@@ -8,20 +8,18 @@
from mmyolo.models.layers.yolo_bricks import SPPFBottleneck
from mmyolo.registry import MODELS
-from ..layers import RepStageBlock, RepVGGBlock
-from ..utils import make_divisible, make_round
+from ..layers import BepC3StageBlock, RepStageBlock
+from ..utils import make_round
from .base_backbone import BaseBackbone
@MODELS.register_module()
class YOLOv6EfficientRep(BaseBackbone):
"""EfficientRep backbone used in YOLOv6.
-
Args:
arch (str): Architecture of BaseDarknet, from {P5, P6}.
Defaults to P5.
plugins (list[dict]): List of plugins for stages, each dict contains:
-
- cfg (dict, required): Cfg dict to build plugin.
- stages (tuple[bool], optional): Stages to apply plugin, length
should be same as 'num_stages'.
@@ -41,10 +39,10 @@ class YOLOv6EfficientRep(BaseBackbone):
norm_eval (bool): Whether to set norm layers to eval mode, namely,
freeze running stats (mean and var). Note: Effect on Batch Norm
and its variants only. Defaults to False.
- block (nn.Module): block used to build each stage.
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
init_cfg (Union[dict, list[dict]], optional): Initialization config
dict. Defaults to None.
-
Example:
>>> from mmyolo.models import YOLOv6EfficientRep
>>> import torch
@@ -78,9 +76,9 @@ def __init__(self,
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='ReLU', inplace=True),
norm_eval: bool = False,
- block: nn.Module = RepVGGBlock,
+ block_cfg: ConfigType = dict(type='RepVGGBlock'),
init_cfg: OptMultiConfig = None):
- self.block = block
+ self.block_cfg = block_cfg
super().__init__(
self.arch_settings[arch],
deepen_factor,
@@ -96,12 +94,16 @@ def __init__(self,
def build_stem_layer(self) -> nn.Module:
"""Build a stem layer."""
- return self.block(
- in_channels=self.input_channels,
- out_channels=make_divisible(self.arch_setting[0][0],
- self.widen_factor),
- kernel_size=3,
- stride=2)
+
+ block_cfg = self.block_cfg.copy()
+ block_cfg.update(
+ dict(
+ in_channels=self.input_channels,
+ out_channels=int(self.arch_setting[0][0] * self.widen_factor),
+ kernel_size=3,
+ stride=2,
+ ))
+ return MODELS.build(block_cfg)
def build_stage_layer(self, stage_idx: int, setting: list) -> list:
"""Build a stage layer.
@@ -112,24 +114,28 @@ def build_stage_layer(self, stage_idx: int, setting: list) -> list:
"""
in_channels, out_channels, num_blocks, use_spp = setting
- in_channels = make_divisible(in_channels, self.widen_factor)
- out_channels = make_divisible(out_channels, self.widen_factor)
+ in_channels = int(in_channels * self.widen_factor)
+ out_channels = int(out_channels * self.widen_factor)
num_blocks = make_round(num_blocks, self.deepen_factor)
- stage = []
+ rep_stage_block = RepStageBlock(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ num_blocks=num_blocks,
+ block_cfg=self.block_cfg,
+ )
- ef_block = nn.Sequential(
- self.block(
+ block_cfg = self.block_cfg.copy()
+ block_cfg.update(
+ dict(
in_channels=in_channels,
out_channels=out_channels,
kernel_size=3,
- stride=2),
- RepStageBlock(
- in_channels=out_channels,
- out_channels=out_channels,
- n=num_blocks,
- block=self.block,
- ))
+ stride=2))
+ stage = []
+
+ ef_block = nn.Sequential(MODELS.build(block_cfg), rep_stage_block)
+
stage.append(ef_block)
if use_spp:
@@ -152,3 +158,130 @@ def init_weights(self):
m.reset_parameters()
else:
super().init_weights()
+
+
+@MODELS.register_module()
+class YOLOv6CSPBep(YOLOv6EfficientRep):
+ """CSPBep backbone used in YOLOv6.
+ Args:
+ arch (str): Architecture of BaseDarknet, from {P5, P6}.
+ Defaults to P5.
+ plugins (list[dict]): List of plugins for stages, each dict contains:
+ - cfg (dict, required): Cfg dict to build plugin.
+ - stages (tuple[bool], optional): Stages to apply plugin, length
+ should be same as 'num_stages'.
+ deepen_factor (float): Depth multiplier, multiply number of
+ blocks in CSP layer by this amount. Defaults to 1.0.
+ widen_factor (float): Width multiplier, multiply number of
+ channels in each layer by this amount. Defaults to 1.0.
+ input_channels (int): Number of input image channels. Defaults to 3.
+ out_indices (Tuple[int]): Output from which stages.
+ Defaults to (2, 3, 4).
+ frozen_stages (int): Stages to be frozen (stop grad and set eval
+ mode). -1 means not freezing any parameters. Defaults to -1.
+ norm_cfg (dict): Dictionary to construct and config norm layer.
+ Defaults to dict(type='BN', requires_grad=True).
+ act_cfg (dict): Config dict for activation layer.
+ Defaults to dict(type='LeakyReLU', negative_slope=0.1).
+ norm_eval (bool): Whether to set norm layers to eval mode, namely,
+ freeze running stats (mean and var). Note: Effect on Batch Norm
+ and its variants only. Defaults to False.
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
+ block_act_cfg (dict): Config dict for activation layer used in each
+ stage. Defaults to dict(type='SiLU', inplace=True).
+ init_cfg (Union[dict, list[dict]], optional): Initialization config
+ dict. Defaults to None.
+ Example:
+ >>> from mmyolo.models import YOLOv6CSPBep
+ >>> import torch
+ >>> model = YOLOv6CSPBep()
+ >>> model.eval()
+ >>> inputs = torch.rand(1, 3, 416, 416)
+ >>> level_outputs = model(inputs)
+ >>> for level_out in level_outputs:
+ ... print(tuple(level_out.shape))
+ ...
+ (1, 256, 52, 52)
+ (1, 512, 26, 26)
+ (1, 1024, 13, 13)
+ """
+ # From left to right:
+ # in_channels, out_channels, num_blocks, use_spp
+ arch_settings = {
+ 'P5': [[64, 128, 6, False], [128, 256, 12, False],
+ [256, 512, 18, False], [512, 1024, 6, True]]
+ }
+
+ def __init__(self,
+ arch: str = 'P5',
+ plugins: Union[dict, List[dict]] = None,
+ deepen_factor: float = 1.0,
+ widen_factor: float = 1.0,
+ input_channels: int = 3,
+ hidden_ratio: float = 0.5,
+ out_indices: Tuple[int] = (2, 3, 4),
+ frozen_stages: int = -1,
+ norm_cfg: ConfigType = dict(
+ type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='SiLU', inplace=True),
+ norm_eval: bool = False,
+ block_cfg: ConfigType = dict(type='ConvWrapper'),
+ init_cfg: OptMultiConfig = None):
+ self.hidden_ratio = hidden_ratio
+ super().__init__(
+ arch=arch,
+ deepen_factor=deepen_factor,
+ widen_factor=widen_factor,
+ input_channels=input_channels,
+ out_indices=out_indices,
+ plugins=plugins,
+ frozen_stages=frozen_stages,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg,
+ norm_eval=norm_eval,
+ block_cfg=block_cfg,
+ init_cfg=init_cfg)
+
+ def build_stage_layer(self, stage_idx: int, setting: list) -> list:
+ """Build a stage layer.
+
+ Args:
+ stage_idx (int): The index of a stage layer.
+ setting (list): The architecture setting of a stage layer.
+ """
+ in_channels, out_channels, num_blocks, use_spp = setting
+ in_channels = int(in_channels * self.widen_factor)
+ out_channels = int(out_channels * self.widen_factor)
+ num_blocks = make_round(num_blocks, self.deepen_factor)
+
+ rep_stage_block = BepC3StageBlock(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ num_blocks=num_blocks,
+ hidden_ratio=self.hidden_ratio,
+ block_cfg=self.block_cfg,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ block_cfg = self.block_cfg.copy()
+ block_cfg.update(
+ dict(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ kernel_size=3,
+ stride=2))
+ stage = []
+
+ ef_block = nn.Sequential(MODELS.build(block_cfg), rep_stage_block)
+
+ stage.append(ef_block)
+
+ if use_spp:
+ spp = SPPFBottleneck(
+ in_channels=out_channels,
+ out_channels=out_channels,
+ kernel_sizes=5,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ stage.append(spp)
+ return stage
diff --git a/mmyolo/models/backbones/yolov7_backbone.py b/mmyolo/models/backbones/yolov7_backbone.py
index c016e277d..bb9a5eed8 100644
--- a/mmyolo/models/backbones/yolov7_backbone.py
+++ b/mmyolo/models/backbones/yolov7_backbone.py
@@ -1,12 +1,13 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from typing import List, Tuple, Union
+from typing import List, Optional, Tuple, Union
import torch.nn as nn
from mmcv.cnn import ConvModule
+from mmdet.models.backbones.csp_darknet import Focus
from mmdet.utils import ConfigType, OptMultiConfig
from mmyolo.registry import MODELS
-from ..layers import ELANBlock, MaxPoolAndStrideConvBlock
+from ..layers import MaxPoolAndStrideConvBlock
from .base_backbone import BaseBackbone
@@ -15,8 +16,7 @@ class YOLOv7Backbone(BaseBackbone):
"""Backbone used in YOLOv7.
Args:
- arch (str): Architecture of YOLOv7, from {P5, P6}.
- Defaults to P5.
+    arch (str): Architecture of YOLOv7, from {Tiny, L, X, W, E, D, E2E}.
+        Defaults to L.
deepen_factor (float): Depth multiplier, multiply number of
blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float): Width multiplier, multiply number of
@@ -40,28 +40,107 @@ class YOLOv7Backbone(BaseBackbone):
init_cfg (:obj:`ConfigDict` or dict or list[dict] or
list[:obj:`ConfigDict`]): Initialization config dict.
"""
+ _tiny_stage1_cfg = dict(type='TinyDownSampleBlock', middle_ratio=0.5)
+ _tiny_stage2_4_cfg = dict(type='TinyDownSampleBlock', middle_ratio=1.0)
+ _l_expand_channel_2x = dict(
+ type='ELANBlock',
+ middle_ratio=0.5,
+ block_ratio=0.5,
+ num_blocks=2,
+ num_convs_in_block=2)
+ _l_no_change_channel = dict(
+ type='ELANBlock',
+ middle_ratio=0.25,
+ block_ratio=0.25,
+ num_blocks=2,
+ num_convs_in_block=2)
+ _x_expand_channel_2x = dict(
+ type='ELANBlock',
+ middle_ratio=0.4,
+ block_ratio=0.4,
+ num_blocks=3,
+ num_convs_in_block=2)
+ _x_no_change_channel = dict(
+ type='ELANBlock',
+ middle_ratio=0.2,
+ block_ratio=0.2,
+ num_blocks=3,
+ num_convs_in_block=2)
+ _w_no_change_channel = dict(
+ type='ELANBlock',
+ middle_ratio=0.5,
+ block_ratio=0.5,
+ num_blocks=2,
+ num_convs_in_block=2)
+ _e_no_change_channel = dict(
+ type='ELANBlock',
+ middle_ratio=0.4,
+ block_ratio=0.4,
+ num_blocks=3,
+ num_convs_in_block=2)
+ _d_no_change_channel = dict(
+ type='ELANBlock',
+ middle_ratio=1 / 3,
+ block_ratio=1 / 3,
+ num_blocks=4,
+ num_convs_in_block=2)
+ _e2e_no_change_channel = dict(
+ type='EELANBlock',
+ num_elan_block=2,
+ middle_ratio=0.4,
+ block_ratio=0.4,
+ num_blocks=3,
+ num_convs_in_block=2)
# From left to right:
- # in_channels, out_channels, ELAN mode
+ # in_channels, out_channels, Block_params
arch_settings = {
- 'P5': [[64, 128, 'expand_channel_2x'], [256, 512, 'expand_channel_2x'],
- [512, 1024, 'expand_channel_2x'],
- [1024, 1024, 'no_change_channel']]
+ 'Tiny': [[64, 64, _tiny_stage1_cfg], [64, 128, _tiny_stage2_4_cfg],
+ [128, 256, _tiny_stage2_4_cfg],
+ [256, 512, _tiny_stage2_4_cfg]],
+ 'L': [[64, 256, _l_expand_channel_2x],
+ [256, 512, _l_expand_channel_2x],
+ [512, 1024, _l_expand_channel_2x],
+ [1024, 1024, _l_no_change_channel]],
+ 'X': [[80, 320, _x_expand_channel_2x],
+ [320, 640, _x_expand_channel_2x],
+ [640, 1280, _x_expand_channel_2x],
+ [1280, 1280, _x_no_change_channel]],
+ 'W':
+ [[64, 128, _w_no_change_channel], [128, 256, _w_no_change_channel],
+ [256, 512, _w_no_change_channel], [512, 768, _w_no_change_channel],
+ [768, 1024, _w_no_change_channel]],
+ 'E':
+ [[80, 160, _e_no_change_channel], [160, 320, _e_no_change_channel],
+ [320, 640, _e_no_change_channel], [640, 960, _e_no_change_channel],
+ [960, 1280, _e_no_change_channel]],
+ 'D': [[96, 192,
+ _d_no_change_channel], [192, 384, _d_no_change_channel],
+ [384, 768, _d_no_change_channel],
+ [768, 1152, _d_no_change_channel],
+ [1152, 1536, _d_no_change_channel]],
+ 'E2E': [[80, 160, _e2e_no_change_channel],
+ [160, 320, _e2e_no_change_channel],
+ [320, 640, _e2e_no_change_channel],
+ [640, 960, _e2e_no_change_channel],
+ [960, 1280, _e2e_no_change_channel]],
}
def __init__(self,
- arch: str = 'P5',
- plugins: Union[dict, List[dict]] = None,
+ arch: str = 'L',
deepen_factor: float = 1.0,
widen_factor: float = 1.0,
input_channels: int = 3,
out_indices: Tuple[int] = (2, 3, 4),
frozen_stages: int = -1,
+ plugins: Union[dict, List[dict]] = None,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
norm_eval: bool = False,
init_cfg: OptMultiConfig = None):
+ assert arch in self.arch_settings.keys()
+ self.arch = arch
super().__init__(
self.arch_settings[arch],
deepen_factor,
@@ -77,31 +156,57 @@ def __init__(self,
def build_stem_layer(self) -> nn.Module:
"""Build a stem layer."""
- stem = nn.Sequential(
- ConvModule(
- 3,
- int(self.arch_setting[0][0] * self.widen_factor // 2),
+ if self.arch in ['L', 'X']:
+ stem = nn.Sequential(
+ ConvModule(
+ 3,
+ int(self.arch_setting[0][0] * self.widen_factor // 2),
+ 3,
+ padding=1,
+ stride=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg),
+ ConvModule(
+ int(self.arch_setting[0][0] * self.widen_factor // 2),
+ int(self.arch_setting[0][0] * self.widen_factor),
+ 3,
+ padding=1,
+ stride=2,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg),
+ ConvModule(
+ int(self.arch_setting[0][0] * self.widen_factor),
+ int(self.arch_setting[0][0] * self.widen_factor),
+ 3,
+ padding=1,
+ stride=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg))
+ elif self.arch == 'Tiny':
+ stem = nn.Sequential(
+ ConvModule(
+ 3,
+ int(self.arch_setting[0][0] * self.widen_factor // 2),
+ 3,
+ padding=1,
+ stride=2,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg),
+ ConvModule(
+ int(self.arch_setting[0][0] * self.widen_factor // 2),
+ int(self.arch_setting[0][0] * self.widen_factor),
+ 3,
+ padding=1,
+ stride=2,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg))
+ elif self.arch in ['W', 'E', 'D', 'E2E']:
+ stem = Focus(
3,
- padding=1,
- stride=1,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg),
- ConvModule(
- int(self.arch_setting[0][0] * self.widen_factor // 2),
int(self.arch_setting[0][0] * self.widen_factor),
- 3,
- padding=1,
- stride=2,
+ kernel_size=3,
norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg),
- ConvModule(
- int(self.arch_setting[0][0] * self.widen_factor),
- int(self.arch_setting[0][0] * self.widen_factor),
- 3,
- padding=1,
- stride=1,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg))
+ act_cfg=self.act_cfg)
return stem
def build_stage_layer(self, stage_idx: int, setting: list) -> list:
@@ -111,39 +216,70 @@ def build_stage_layer(self, stage_idx: int, setting: list) -> list:
stage_idx (int): The index of a stage layer.
setting (list): The architecture setting of a stage layer.
"""
- in_channels, out_channels, elan_mode = setting
-
+ in_channels, out_channels, stage_block_cfg = setting
in_channels = int(in_channels * self.widen_factor)
out_channels = int(out_channels * self.widen_factor)
+ stage_block_cfg = stage_block_cfg.copy()
+ stage_block_cfg.setdefault('norm_cfg', self.norm_cfg)
+ stage_block_cfg.setdefault('act_cfg', self.act_cfg)
+
+ stage_block_cfg['in_channels'] = in_channels
+ stage_block_cfg['out_channels'] = out_channels
+
stage = []
- if stage_idx == 0:
- pre_layer = ConvModule(
+ if self.arch in ['W', 'E', 'D', 'E2E']:
+ stage_block_cfg['in_channels'] = out_channels
+ elif self.arch in ['L', 'X']:
+ if stage_idx == 0:
+ stage_block_cfg['in_channels'] = out_channels // 2
+
+ downsample_layer = self._build_downsample_layer(
+ stage_idx, in_channels, out_channels)
+ stage.append(MODELS.build(stage_block_cfg))
+ if downsample_layer is not None:
+ stage.insert(0, downsample_layer)
+ return stage
+
+ def _build_downsample_layer(self, stage_idx: int, in_channels: int,
+ out_channels: int) -> Optional[nn.Module]:
+        """Build the downsample layer for each stage."""
+ if self.arch in ['E', 'D', 'E2E']:
+ downsample_layer = MaxPoolAndStrideConvBlock(
in_channels,
out_channels,
- 3,
- stride=2,
- padding=1,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
- elan_layer = ELANBlock(
- out_channels,
- mode=elan_mode,
- num_blocks=2,
+ use_in_channels_of_middle=True,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg)
- stage.extend([pre_layer, elan_layer])
- else:
- pre_layer = MaxPoolAndStrideConvBlock(
+ elif self.arch == 'W':
+ downsample_layer = ConvModule(
in_channels,
- mode='reduce_channel_2x',
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
- elan_layer = ELANBlock(
- in_channels,
- mode=elan_mode,
- num_blocks=2,
+ out_channels,
+ 3,
+ stride=2,
+ padding=1,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg)
- stage.extend([pre_layer, elan_layer])
- return stage
+ elif self.arch == 'Tiny':
+ if stage_idx != 0:
+ downsample_layer = nn.MaxPool2d(2, 2)
+ else:
+ downsample_layer = None
+ elif self.arch in ['L', 'X']:
+ if stage_idx == 0:
+ downsample_layer = ConvModule(
+ in_channels,
+ out_channels // 2,
+ 3,
+ stride=2,
+ padding=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ else:
+ downsample_layer = MaxPoolAndStrideConvBlock(
+ in_channels,
+ in_channels,
+ use_in_channels_of_middle=False,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ return downsample_layer
diff --git a/mmyolo/models/dense_heads/__init__.py b/mmyolo/models/dense_heads/__init__.py
index 469880688..57fd668c0 100644
--- a/mmyolo/models/dense_heads/__init__.py
+++ b/mmyolo/models/dense_heads/__init__.py
@@ -3,11 +3,12 @@
from .rtmdet_head import RTMDetHead, RTMDetSepBNHeadModule
from .yolov5_head import YOLOv5Head, YOLOv5HeadModule
from .yolov6_head import YOLOv6Head, YOLOv6HeadModule
-from .yolov7_head import YOLOv7Head
+from .yolov7_head import YOLOv7Head, YOLOv7HeadModule, YOLOv7p6HeadModule
from .yolox_head import YOLOXHead, YOLOXHeadModule
__all__ = [
'YOLOv5Head', 'YOLOv6Head', 'YOLOXHead', 'YOLOv5HeadModule',
'YOLOv6HeadModule', 'YOLOXHeadModule', 'RTMDetHead',
- 'RTMDetSepBNHeadModule', 'YOLOv7Head', 'PPYOLOEHead', 'PPYOLOEHeadModule'
+ 'RTMDetSepBNHeadModule', 'YOLOv7Head', 'PPYOLOEHead', 'PPYOLOEHeadModule',
+ 'YOLOv7HeadModule', 'YOLOv7p6HeadModule'
]
diff --git a/mmyolo/models/dense_heads/yolov5_head.py b/mmyolo/models/dense_heads/yolov5_head.py
index 50115bbab..57913ca6e 100644
--- a/mmyolo/models/dense_heads/yolov5_head.py
+++ b/mmyolo/models/dense_heads/yolov5_head.py
@@ -167,6 +167,7 @@ def __init__(self,
reduction='mean',
loss_weight=1.0),
prior_match_thr: float = 4.0,
+ near_neighbor_thr: float = 0.5,
obj_level_weights: List[float] = [4.0, 1.0, 0.4],
train_cfg: OptConfigType = None,
test_cfg: OptConfigType = None,
@@ -192,6 +193,7 @@ def __init__(self,
self.featmap_sizes = [torch.empty(1)] * self.num_levels
self.prior_match_thr = prior_match_thr
+ self.near_neighbor_thr = near_neighbor_thr
self.obj_level_weights = obj_level_weights
self.special_init()
@@ -231,7 +233,7 @@ def special_init(self):
[0, 1], # up
[-1, 0], # right
[0, -1], # bottom
- ]).float() * 0.5
+ ]).float()
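+        # the 0.5 scale formerly baked into these offsets is now applied
+        # at use time via ``near_neighbor_thr`` in ``loss_by_feat``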
self.register_buffer(
'grid_offset', grid_offset[:, None], persistent=False)
@@ -534,9 +536,10 @@ def loss_by_feat(
# them as positive samples as well.
batch_targets_cxcy = batch_targets_scaled[:, 2:4]
grid_xy = scaled_factor[[2, 3]] - batch_targets_cxcy
- left, up = ((batch_targets_cxcy % 1 < 0.5) &
+ left, up = ((batch_targets_cxcy % 1 < self.near_neighbor_thr) &
(batch_targets_cxcy > 1)).T
- right, bottom = ((grid_xy % 1 < 0.5) & (grid_xy > 1)).T
+ right, bottom = ((grid_xy % 1 < self.near_neighbor_thr) &
+ (grid_xy > 1)).T
offset_inds = torch.stack(
(torch.ones_like(left), left, up, right, bottom))
@@ -552,7 +555,8 @@ def loss_by_feat(
priors_inds, (img_inds, class_inds) = priors_inds.long().view(
-1), img_class_inds.long().T
- grid_xy_long = (grid_xy - retained_offsets).long()
+ grid_xy_long = (grid_xy -
+ retained_offsets * self.near_neighbor_thr).long()
grid_x_inds, grid_y_inds = grid_xy_long.T
bboxes_targets = torch.cat((grid_xy - grid_xy_long, grid_wh), 1)
diff --git a/mmyolo/models/dense_heads/yolov6_head.py b/mmyolo/models/dense_heads/yolov6_head.py
index cf56ea405..b2581ef5f 100644
--- a/mmyolo/models/dense_heads/yolov6_head.py
+++ b/mmyolo/models/dense_heads/yolov6_head.py
@@ -14,7 +14,6 @@
from torch import Tensor
from mmyolo.registry import MODELS, TASK_UTILS
-from ..utils import make_divisible
from .yolov5_head import YOLOv5Head
@@ -31,7 +30,7 @@ class YOLOv6HeadModule(BaseModule):
feature map.
widen_factor (float): Width multiplier, multiply number of
channels in each layer by this amount. Default: 1.0.
- num_base_priors:int: The number of priors (points) at a point
+    num_base_priors (int): The number of priors (points) at a point
on the feature grid.
featmap_strides (Sequence[int]): Downsample factor of each feature map.
Defaults to [8, 16, 32].
@@ -65,12 +64,10 @@ def __init__(self,
self.act_cfg = act_cfg
if isinstance(in_channels, int):
- self.in_channels = [make_divisible(in_channels, widen_factor)
+ self.in_channels = [int(in_channels * widen_factor)
] * self.num_levels
else:
- self.in_channels = [
- make_divisible(i, widen_factor) for i in in_channels
- ]
+ self.in_channels = [int(i * widen_factor) for i in in_channels]
self._init_layers()
@@ -380,7 +377,7 @@ def loss_by_feat(
loss_cls=loss_cls * world_size, loss_bbox=loss_bbox * world_size)
@staticmethod
- def gt_instances_preprocess(batch_gt_instances: Tensor,
+ def gt_instances_preprocess(batch_gt_instances: Union[Tensor, Sequence],
batch_size: int) -> Tensor:
"""Split batch_gt_instances with batch size, from [all_gt_bboxes, 6]
to.
@@ -396,28 +393,51 @@ def gt_instances_preprocess(batch_gt_instances: Tensor,
Returns:
Tensor: batch gt instances data, shape [batch_size, number_gt, 5]
"""
-
- # sqlit batch gt instance [all_gt_bboxes, 6] ->
- # [batch_size, number_gt_each_batch, 5]
- batch_instance_list = []
- max_gt_bbox_len = 0
- for i in range(batch_size):
- single_batch_instance = \
- batch_gt_instances[batch_gt_instances[:, 0] == i, :]
- single_batch_instance = single_batch_instance[:, 1:]
- batch_instance_list.append(single_batch_instance)
- if len(single_batch_instance) > max_gt_bbox_len:
- max_gt_bbox_len = len(single_batch_instance)
-
- # fill [-1., 0., 0., 0., 0.] if some shape of
- # single batch not equal max_gt_bbox_len
- for index, gt_instance in enumerate(batch_instance_list):
- if gt_instance.shape[0] >= max_gt_bbox_len:
- continue
- fill_tensor = batch_gt_instances.new_full(
- [max_gt_bbox_len - gt_instance.shape[0], 5], 0)
- fill_tensor[:, 0] = -1.
- batch_instance_list[index] = torch.cat(
- (batch_instance_list[index], fill_tensor), dim=0)
-
- return torch.stack(batch_instance_list)
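+        # two input formats are supported: a list of ``InstanceData``
+        # (padded per image below) and the pre-flattened
+        # [all_gt_bboxes, 6] tensor handled by the faster branch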
+ if isinstance(batch_gt_instances, Sequence):
+ max_gt_bbox_len = max(
+ [len(gt_instances) for gt_instances in batch_gt_instances])
+ # fill [-1., 0., 0., 0., 0.] if some shape of
+ # single batch not equal max_gt_bbox_len
+ batch_instance_list = []
+ for index, gt_instance in enumerate(batch_gt_instances):
+ bboxes = gt_instance.bboxes
+ labels = gt_instance.labels
+ batch_instance_list.append(
+ torch.cat((labels[:, None], bboxes), dim=-1))
+
+ if bboxes.shape[0] >= max_gt_bbox_len:
+ continue
+
+ fill_tensor = bboxes.new_full(
+ [max_gt_bbox_len - bboxes.shape[0], 5], 0)
+ fill_tensor[:, 0] = -1.
+                batch_instance_list[index] = torch.cat(
+                    (batch_instance_list[index], fill_tensor), dim=0)
+
+ return torch.stack(batch_instance_list)
+ else:
+ # faster version
+            # split batch gt instance [all_gt_bboxes, 6] ->
+ # [batch_size, number_gt_each_batch, 5]
+ batch_instance_list = []
+ max_gt_bbox_len = 0
+ for i in range(batch_size):
+ single_batch_instance = \
+ batch_gt_instances[batch_gt_instances[:, 0] == i, :]
+ single_batch_instance = single_batch_instance[:, 1:]
+ batch_instance_list.append(single_batch_instance)
+ if len(single_batch_instance) > max_gt_bbox_len:
+ max_gt_bbox_len = len(single_batch_instance)
+
+ # fill [-1., 0., 0., 0., 0.] if some shape of
+ # single batch not equal max_gt_bbox_len
+ for index, gt_instance in enumerate(batch_instance_list):
+ if gt_instance.shape[0] >= max_gt_bbox_len:
+ continue
+ fill_tensor = batch_gt_instances.new_full(
+ [max_gt_bbox_len - gt_instance.shape[0], 5], 0)
+ fill_tensor[:, 0] = -1.
+ batch_instance_list[index] = torch.cat(
+ (batch_instance_list[index], fill_tensor), dim=0)
+
+ return torch.stack(batch_instance_list)
diff --git a/mmyolo/models/dense_heads/yolov7_head.py b/mmyolo/models/dense_heads/yolov7_head.py
index 532c86434..80e6aadd2 100644
--- a/mmyolo/models/dense_heads/yolov7_head.py
+++ b/mmyolo/models/dense_heads/yolov7_head.py
@@ -1,84 +1,210 @@
# Copyright (c) OpenMMLab. All rights reserved.
-from typing import Sequence
+import math
+from typing import List, Optional, Sequence, Tuple, Union
+import torch
import torch.nn as nn
-from mmdet.utils import (ConfigType, OptConfigType, OptInstanceList,
- OptMultiConfig)
+from mmcv.cnn import ConvModule
+from mmdet.models.utils import multi_apply
+from mmdet.utils import ConfigType, OptInstanceList
+from mmengine.dist import get_dist_info
from mmengine.structures import InstanceData
from torch import Tensor
from mmyolo.registry import MODELS
-from .yolov5_head import YOLOv5Head
+from ..layers import ImplicitA, ImplicitM
+from ..task_modules.assigners.batch_yolov7_assigner import BatchYOLOv7Assigner
+from .yolov5_head import YOLOv5Head, YOLOv5HeadModule
+
+
+@MODELS.register_module()
+class YOLOv7HeadModule(YOLOv5HeadModule):
+ """YOLOv7Head head module used in YOLOv7."""
+
+ def _init_layers(self):
+ """initialize conv layers in YOLOv7 head."""
+ self.convs_pred = nn.ModuleList()
+ for i in range(self.num_levels):
+ conv_pred = nn.Sequential(
+ ImplicitA(self.in_channels[i]),
+ nn.Conv2d(self.in_channels[i],
+ self.num_base_priors * self.num_out_attrib, 1),
+ ImplicitM(self.num_base_priors * self.num_out_attrib),
+ )
+ self.convs_pred.append(conv_pred)
+
+ def init_weights(self):
+ """Initialize the bias of YOLOv7 head."""
+ super(YOLOv5HeadModule, self).init_weights()
+ for mi, s in zip(self.convs_pred, self.featmap_strides): # from
+ mi = mi[1] # nn.Conv2d
+
+ b = mi.bias.data.view(3, -1)
+ # obj (8 objects per 640 image)
+ b.data[:, 4] += math.log(8 / (640 / s)**2)
+ b.data[:, 5:] += math.log(0.6 / (self.num_classes - 0.99))
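+            # bias priors follow the YOLOv5-style initialisation trick:
+            # the obj bias assumes roughly 8 objects per 640x640 image at
+            # stride s, the cls bias a ~0.6 prior spread over the classes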
+
+ mi.bias.data = b.view(-1)
+
+
+@MODELS.register_module()
+class YOLOv7p6HeadModule(YOLOv5HeadModule):
+ """YOLOv7Head head module used in YOLOv7."""
+
+ def __init__(self,
+ *args,
+ main_out_channels: Sequence[int] = [256, 512, 768, 1024],
+ aux_out_channels: Sequence[int] = [320, 640, 960, 1280],
+ use_aux: bool = True,
+ norm_cfg: ConfigType = dict(
+ type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='SiLU', inplace=True),
+ **kwargs):
+ self.main_out_channels = main_out_channels
+ self.aux_out_channels = aux_out_channels
+ self.use_aux = use_aux
+ self.norm_cfg = norm_cfg
+ self.act_cfg = act_cfg
+ super().__init__(*args, **kwargs)
+
+ def _init_layers(self):
+ """initialize conv layers in YOLOv7 head."""
+ self.main_convs_pred = nn.ModuleList()
+ for i in range(self.num_levels):
+ conv_pred = nn.Sequential(
+ ConvModule(
+ self.in_channels[i],
+ self.main_out_channels[i],
+ 3,
+ padding=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg),
+ ImplicitA(self.main_out_channels[i]),
+ nn.Conv2d(self.main_out_channels[i],
+ self.num_base_priors * self.num_out_attrib, 1),
+ ImplicitM(self.num_base_priors * self.num_out_attrib),
+ )
+ self.main_convs_pred.append(conv_pred)
+
+ if self.use_aux:
+ self.aux_convs_pred = nn.ModuleList()
+ for i in range(self.num_levels):
+ aux_pred = nn.Sequential(
+ ConvModule(
+ self.in_channels[i],
+ self.aux_out_channels[i],
+ 3,
+ padding=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg),
+ nn.Conv2d(self.aux_out_channels[i],
+ self.num_base_priors * self.num_out_attrib, 1))
+ self.aux_convs_pred.append(aux_pred)
+ else:
+ self.aux_convs_pred = [None] * len(self.main_convs_pred)
+
+ def init_weights(self):
+ """Initialize the bias of YOLOv5 head."""
+ super(YOLOv5HeadModule, self).init_weights()
+ for mi, aux, s in zip(self.main_convs_pred, self.aux_convs_pred,
+ self.featmap_strides): # from
+ mi = mi[2] # nn.Conv2d
+ b = mi.bias.data.view(3, -1)
+ # obj (8 objects per 640 image)
+ b.data[:, 4] += math.log(8 / (640 / s)**2)
+ b.data[:, 5:] += math.log(0.6 / (self.num_classes - 0.99))
+ mi.bias.data = b.view(-1)
+
+ if self.use_aux:
+ aux = aux[1] # nn.Conv2d
+ b = aux.bias.data.view(3, -1)
+ # obj (8 objects per 640 image)
+ b.data[:, 4] += math.log(8 / (640 / s)**2)
+ b.data[:, 5:] += math.log(0.6 / (self.num_classes - 0.99))
+                aux.bias.data = b.view(-1)  # write back to the aux conv
+
+ def forward(self, x: Tuple[Tensor]) -> Tuple[List]:
+ """Forward features from the upstream network.
+
+ Args:
+ x (Tuple[Tensor]): Features from the upstream network, each is
+ a 4D-tensor.
+ Returns:
+ Tuple[List]: A tuple of multi-level classification scores, bbox
+ predictions, and objectnesses.
+ """
+ assert len(x) == self.num_levels
+ return multi_apply(self.forward_single, x, self.main_convs_pred,
+ self.aux_convs_pred)
+
+ def forward_single(self, x: Tensor, convs: nn.Module,
+ aux_convs: Optional[nn.Module]) \
+ -> Tuple[Union[Tensor, List], Union[Tensor, List],
+ Union[Tensor, List]]:
+ """Forward feature of a single scale level."""
+
+ pred_map = convs(x)
+ bs, _, ny, nx = pred_map.shape
+ pred_map = pred_map.view(bs, self.num_base_priors, self.num_out_attrib,
+ ny, nx)
+
+ cls_score = pred_map[:, :, 5:, ...].reshape(bs, -1, ny, nx)
+ bbox_pred = pred_map[:, :, :4, ...].reshape(bs, -1, ny, nx)
+ objectness = pred_map[:, :, 4:5, ...].reshape(bs, -1, ny, nx)
+
+ if not self.training or not self.use_aux:
+ return cls_score, bbox_pred, objectness
+ else:
+ aux_pred_map = aux_convs(x)
+ aux_pred_map = aux_pred_map.view(bs, self.num_base_priors,
+ self.num_out_attrib, ny, nx)
+ aux_cls_score = aux_pred_map[:, :, 5:, ...].reshape(bs, -1, ny, nx)
+ aux_bbox_pred = aux_pred_map[:, :, :4, ...].reshape(bs, -1, ny, nx)
+ aux_objectness = aux_pred_map[:, :, 4:5,
+ ...].reshape(bs, -1, ny, nx)
+
+            return ([cls_score, aux_cls_score],
+                    [bbox_pred, aux_bbox_pred],
+                    [objectness, aux_objectness])
-# Training mode is currently not supported
@MODELS.register_module()
class YOLOv7Head(YOLOv5Head):
"""YOLOv7Head head used in `YOLOv7
`_.
Args:
- head_module(nn.Module): Base module used for YOLOv6Head
- prior_generator(dict): Points generator feature maps
- in 2D points-based detectors.
- loss_cls (:obj:`ConfigDict` or dict): Config of classification loss.
- loss_bbox (:obj:`ConfigDict` or dict): Config of localization loss.
- loss_obj (:obj:`ConfigDict` or dict): Config of objectness loss.
- train_cfg (:obj:`ConfigDict` or dict, optional): Training config of
- anchor head. Defaults to None.
- test_cfg (:obj:`ConfigDict` or dict, optional): Testing config of
- anchor head. Defaults to None.
- init_cfg (:obj:`ConfigDict` or list[:obj:`ConfigDict`] or dict or
- list[dict], optional): Initialization config dict.
- Defaults to None.
+        simota_candidate_topk (int): The candidate top-k used to gather
+            the top-k ious when calculating dynamic-k in
+            BatchYOLOv7Assigner. Defaults to 20.
+        simota_iou_weight (float): The scale factor for regression
+            iou cost in BatchYOLOv7Assigner. Defaults to 3.0.
+        simota_cls_weight (float): The scale factor for classification
+            cost in BatchYOLOv7Assigner. Defaults to 1.0.
+        aux_loss_weights (float): Scale factor for the auxiliary head
+            loss. Defaults to 0.25.
"""
def __init__(self,
- head_module: nn.Module,
- prior_generator: ConfigType = dict(
- type='mmdet.YOLOAnchorGenerator',
- base_sizes=[[(10, 13), (16, 30), (33, 23)],
- [(30, 61), (62, 45), (59, 119)],
- [(116, 90), (156, 198), (373, 326)]],
- strides=[8, 16, 32]),
- bbox_coder: ConfigType = dict(type='YOLOv5BBoxCoder'),
- loss_cls: ConfigType = dict(
- type='mmdet.CrossEntropyLoss',
- use_sigmoid=True,
- reduction='sum',
- loss_weight=1.0),
- loss_bbox: ConfigType = dict(
- type='mmdet.GIoULoss', reduction='sum', loss_weight=5.0),
- loss_obj: ConfigType = dict(
- type='mmdet.CrossEntropyLoss',
- use_sigmoid=True,
- reduction='sum',
- loss_weight=1.0),
- train_cfg: OptConfigType = None,
- test_cfg: OptConfigType = None,
- init_cfg: OptMultiConfig = None):
- super().__init__(
- head_module=head_module,
- prior_generator=prior_generator,
- bbox_coder=bbox_coder,
- loss_cls=loss_cls,
- loss_bbox=loss_bbox,
- loss_obj=loss_obj,
- train_cfg=train_cfg,
- test_cfg=test_cfg,
- init_cfg=init_cfg)
-
- def special_init(self):
- """Since YOLO series algorithms will inherit from YOLOv5Head, but
- different algorithms have special initialization process.
-
- The special_init function is designed to deal with this situation.
- """
- pass
+ *args,
+ simota_candidate_topk: int = 20,
+ simota_iou_weight: float = 3.0,
+ simota_cls_weight: float = 1.0,
+ aux_loss_weights: float = 0.25,
+ **kwargs):
+ super().__init__(*args, **kwargs)
+ self.aux_loss_weights = aux_loss_weights
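+        # the assigner couples YOLOv5-style prior matching with a
+        # batch-level SimOTA step (dynamic top-k weighted by iou/cls cost)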
+ self.assigner = BatchYOLOv7Assigner(
+ num_classes=self.num_classes,
+ num_base_priors=self.num_base_priors,
+ featmap_strides=self.featmap_strides,
+ prior_match_thr=self.prior_match_thr,
+ candidate_topk=simota_candidate_topk,
+ iou_weight=simota_iou_weight,
+ cls_weight=simota_cls_weight)
def loss_by_feat(
self,
- cls_scores: Sequence[Tensor],
- bbox_preds: Sequence[Tensor],
+ cls_scores: Sequence[Union[Tensor, List]],
+ bbox_preds: Sequence[Union[Tensor, List]],
+ objectnesses: Sequence[Union[Tensor, List]],
batch_gt_instances: Sequence[InstanceData],
batch_img_metas: Sequence[dict],
batch_gt_instances_ignore: OptInstanceList = None) -> dict:
@@ -92,6 +218,9 @@ def loss_by_feat(
bbox_preds (Sequence[Tensor]): Box energies / deltas for each scale
level, each is a 4D-tensor, the channel number is
num_priors * 4.
+ objectnesses (Sequence[Tensor]): Score factor for
+ all scale level, each is a 4D-tensor, has shape
+ (batch_size, 1, H, W).
batch_gt_instances (list[:obj:`InstanceData`]): Batch of
gt_instance. It usually includes ``bboxes`` and ``labels``
attributes.
@@ -104,4 +233,172 @@ def loss_by_feat(
Returns:
dict[str, Tensor]: A dictionary of losses.
"""
- raise NotImplementedError('Not implemented yet!')
+
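+        # when the aux head is used, each per-level entry is a
+        # [main, aux] pair (see ``YOLOv7p6HeadModule.forward_single``)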
+ if isinstance(cls_scores[0], Sequence):
+ with_aux = True
+ batch_size = cls_scores[0][0].shape[0]
+ device = cls_scores[0][0].device
+
+ bbox_preds_main, bbox_preds_aux = zip(*bbox_preds)
+ objectnesses_main, objectnesses_aux = zip(*objectnesses)
+ cls_scores_main, cls_scores_aux = zip(*cls_scores)
+
+ head_preds = self._merge_predict_results(bbox_preds_main,
+ objectnesses_main,
+ cls_scores_main)
+ head_preds_aux = self._merge_predict_results(
+ bbox_preds_aux, objectnesses_aux, cls_scores_aux)
+ else:
+ with_aux = False
+ batch_size = cls_scores[0].shape[0]
+ device = cls_scores[0].device
+
+ head_preds = self._merge_predict_results(bbox_preds, objectnesses,
+ cls_scores)
+
+ # Convert gt to norm xywh format
+ # (num_base_priors, num_batch_gt, 7)
+        # 7 means (batch_idx, cls_id, x_norm, y_norm,
+ # w_norm, h_norm, prior_idx)
+ batch_targets_normed = self._convert_gt_to_norm_format(
+ batch_gt_instances, batch_img_metas)
+
+ scaled_factors = [
+ torch.tensor(head_pred.shape, device=device)[[3, 2, 3, 2]]
+ for head_pred in head_preds
+ ]
+
+ loss_cls, loss_obj, loss_box = self._calc_loss(
+ head_preds=head_preds,
+ head_preds_aux=None,
+ batch_targets_normed=batch_targets_normed,
+ near_neighbor_thr=self.near_neighbor_thr,
+ scaled_factors=scaled_factors,
+ batch_img_metas=batch_img_metas,
+ device=device)
+
+ if with_aux:
+ loss_cls_aux, loss_obj_aux, loss_box_aux = self._calc_loss(
+ head_preds=head_preds,
+ head_preds_aux=head_preds_aux,
+ batch_targets_normed=batch_targets_normed,
+ near_neighbor_thr=self.near_neighbor_thr * 2,
+ scaled_factors=scaled_factors,
+ batch_img_metas=batch_img_metas,
+ device=device)
+ loss_cls += self.aux_loss_weights * loss_cls_aux
+ loss_obj += self.aux_loss_weights * loss_obj_aux
+ loss_box += self.aux_loss_weights * loss_box_aux
+
+ _, world_size = get_dist_info()
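+        # the sum-reduced losses are scaled by batch size and world size,
+        # matching the convention used by the other YOLO heads in MMYOLO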
+ return dict(
+ loss_cls=loss_cls * batch_size * world_size,
+ loss_obj=loss_obj * batch_size * world_size,
+ loss_bbox=loss_box * batch_size * world_size)
+
+ def _calc_loss(self, head_preds, head_preds_aux, batch_targets_normed,
+ near_neighbor_thr, scaled_factors, batch_img_metas, device):
+ loss_cls = torch.zeros(1, device=device)
+ loss_box = torch.zeros(1, device=device)
+ loss_obj = torch.zeros(1, device=device)
+
+ assigner_results = self.assigner(
+ head_preds,
+ batch_targets_normed,
+ batch_img_metas[0]['batch_input_shape'],
+ self.priors_base_sizes,
+ self.grid_offset,
+ near_neighbor_thr=near_neighbor_thr)
+        # mlvl means multi-level
+ mlvl_positive_infos = assigner_results['mlvl_positive_infos']
+ mlvl_priors = assigner_results['mlvl_priors']
+ mlvl_targets_normed = assigner_results['mlvl_targets_normed']
+
+ if head_preds_aux is not None:
+            # compute the aux-branch loss, reusing the assignment
+            # results obtained from the main-branch predictions
+ head_preds = head_preds_aux
+
+ for i, head_pred in enumerate(head_preds):
+            batch_inds, prior_idx, grid_x, grid_y = mlvl_positive_infos[i].T
+            num_pred_positive = batch_inds.shape[0]
+            target_obj = torch.zeros_like(head_pred[..., 0])
+            # no positive samples at this level
+ if num_pred_positive == 0:
+ loss_box += head_pred[..., :4].sum() * 0
+ loss_cls += head_pred[..., 5:].sum() * 0
+ loss_obj += self.loss_obj(
+ head_pred[..., 4], target_obj) * self.obj_level_weights[i]
+ continue
+
+ priors = mlvl_priors[i]
+ targets_normed = mlvl_targets_normed[i]
+
+            head_pred_positive = head_pred[batch_inds, prior_idx, grid_y,
+ grid_x]
+
+ # calc bbox loss
+ grid_xy = torch.stack([grid_x, grid_y], dim=1)
+ decoded_pred_bbox = self._decode_bbox_to_xywh(
+ head_pred_positive[:, :4], priors, grid_xy)
+ target_bbox_scaled = targets_normed[:, 2:6] * scaled_factors[i]
+
+ loss_box_i, iou = self.loss_bbox(decoded_pred_bbox,
+ target_bbox_scaled)
+ loss_box += loss_box_i
+
+ # calc obj loss
+            target_obj[batch_inds, prior_idx, grid_y,
+ grid_x] = iou.detach().clamp(0).type(target_obj.dtype)
+ loss_obj += self.loss_obj(head_pred[..., 4],
+ target_obj) * self.obj_level_weights[i]
+
+ # calc cls loss
+ if self.num_classes > 1:
+                target_class_inds = targets_normed[:, 1].long()
+                target_class = torch.full_like(
+                    head_pred_positive[:, 5:], 0., device=device)
+                target_class[range(num_pred_positive), target_class_inds] = 1.
+ loss_cls += self.loss_cls(head_pred_positive[:, 5:],
+ target_class)
+ else:
+ loss_cls += head_pred_positive[:, 5:].sum() * 0
+ return loss_cls, loss_obj, loss_box
+
+ def _merge_predict_results(self, bbox_preds: Sequence[Tensor],
+ objectnesses: Sequence[Tensor],
+ cls_scores: Sequence[Tensor]) -> List[Tensor]:
+ """Merge predict output from 3 heads.
+
+ Args:
+ cls_scores (Sequence[Tensor]): Box scores for each scale level,
+ each is a 4D-tensor, the channel number is
+ num_priors * num_classes.
+ bbox_preds (Sequence[Tensor]): Box energies / deltas for each scale
+ level, each is a 4D-tensor, the channel number is
+ num_priors * 4.
+ objectnesses (Sequence[Tensor]): Score factor for
+ all scale level, each is a 4D-tensor, has shape
+ (batch_size, 1, H, W).
+
+ Returns:
+ List[Tensor]: Merged output.
+ """
+ head_preds = []
+ for bbox_pred, objectness, cls_score in zip(bbox_preds, objectnesses,
+ cls_scores):
+ b, _, h, w = bbox_pred.shape
+ bbox_pred = bbox_pred.reshape(b, self.num_base_priors, -1, h, w)
+ objectness = objectness.reshape(b, self.num_base_priors, -1, h, w)
+ cls_score = cls_score.reshape(b, self.num_base_priors, -1, h, w)
+ head_pred = torch.cat([bbox_pred, objectness, cls_score],
+ dim=2).permute(0, 1, 3, 4, 2).contiguous()
+ head_preds.append(head_pred)
+ return head_preds
+
+ def _decode_bbox_to_xywh(self, bbox_pred, priors_base_sizes,
+ grid_xy) -> Tensor:
+ bbox_pred = bbox_pred.sigmoid()
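+        # YOLOv5-style decoding: xy offsets lie in (-0.5, 1.5) around the
+        # grid cell and wh scales in (0, 4) times the matched prior size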
+ pred_xy = bbox_pred[:, :2] * 2 - 0.5 + grid_xy
+ pred_wh = (bbox_pred[:, 2:] * 2)**2 * priors_base_sizes
+ decoded_bbox_pred = torch.cat((pred_xy, pred_wh), dim=-1)
+ return decoded_bbox_pred
diff --git a/mmyolo/models/layers/__init__.py b/mmyolo/models/layers/__init__.py
index 3c8a543bd..d8ef15154 100644
--- a/mmyolo/models/layers/__init__.py
+++ b/mmyolo/models/layers/__init__.py
@@ -1,12 +1,14 @@
# Copyright (c) OpenMMLab. All rights reserved.
from .ema import ExpMomentumEMA
-from .yolo_bricks import (EffectiveSELayer, ELANBlock,
+from .yolo_bricks import (BepC3StageBlock, EELANBlock, EffectiveSELayer,
+ ELANBlock, ImplicitA, ImplicitM,
MaxPoolAndStrideConvBlock, PPYOLOEBasicBlock,
RepStageBlock, RepVGGBlock, SPPFBottleneck,
- SPPFCSPBlock)
+ SPPFCSPBlock, TinyDownSampleBlock)
__all__ = [
'SPPFBottleneck', 'RepVGGBlock', 'RepStageBlock', 'ExpMomentumEMA',
'ELANBlock', 'MaxPoolAndStrideConvBlock', 'SPPFCSPBlock',
- 'PPYOLOEBasicBlock', 'EffectiveSELayer'
+ 'PPYOLOEBasicBlock', 'EffectiveSELayer', 'TinyDownSampleBlock',
+ 'EELANBlock', 'ImplicitA', 'ImplicitM', 'BepC3StageBlock'
]
diff --git a/mmyolo/models/layers/yolo_bricks.py b/mmyolo/models/layers/yolo_bricks.py
index c720c1a40..f284acfa3 100644
--- a/mmyolo/models/layers/yolo_bricks.py
+++ b/mmyolo/models/layers/yolo_bricks.py
@@ -22,7 +22,7 @@ class SiLU(nn.Module):
def __init__(self, inplace=True):
super().__init__()
- def forward(self, inputs) -> torch.Tensor:
+ def forward(self, inputs) -> Tensor:
return inputs * torch.sigmoid(inputs)
MODELS.register_module(module=SiLU, name='SiLU')
@@ -100,7 +100,7 @@ def __init__(self,
norm_cfg=norm_cfg,
act_cfg=act_cfg)
- def forward(self, x: torch.Tensor) -> torch.Tensor:
+ def forward(self, x: Tensor) -> Tensor:
"""Forward process
Args:
x (Tensor): The input tensor.
@@ -118,6 +118,7 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
return x
+@MODELS.register_module()
class RepVGGBlock(nn.Module):
"""RepVGGBlock is a basic rep-style block, including training and deploy
status This code is based on
@@ -227,7 +228,7 @@ def __init__(self,
norm_cfg=norm_cfg,
act_cfg=None)
- def forward(self, inputs: torch.Tensor) -> torch.Tensor:
+ def forward(self, inputs: Tensor) -> Tensor:
"""Forward process.
Args:
inputs (Tensor): The input tensor.
@@ -281,8 +282,7 @@ def _pad_1x1_to_3x3_tensor(self, kernel1x1):
else:
return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
- def _fuse_bn_tensor(self,
- branch: nn.Module) -> Tuple[np.ndarray, torch.Tensor]:
+ def _fuse_bn_tensor(self, branch: nn.Module) -> Tuple[np.ndarray, Tensor]:
"""Derives the equivalent kernel and bias of a specific branch layer.
Args:
@@ -348,38 +348,177 @@ def switch_to_deploy(self):
self.deploy = True
-class RepStageBlock(nn.Module):
- """RepStageBlock is a stage block with rep-style basic block.
+@MODELS.register_module()
+class BepC3StageBlock(nn.Module):
+ """Beer-mug RepC3 Block.
Args:
- in_channels (int): The input channels of this Module.
- out_channels (int): The output channels of this Module.
- n (int, tuple[int]): Number of blocks. Defaults to 1.
- block (nn.Module): Basic unit of RepStage. Defaults to RepVGGBlock.
+ in_channels (int): Number of channels in the input image
+ out_channels (int): Number of channels produced by the convolution
+ num_blocks (int): Number of blocks. Defaults to 1
+ hidden_ratio (float): Hidden channel expansion.
+ Default: 0.5
+        concat_all_layer (bool): Whether to concat the outputs of all
+            layers in forward. Default: True
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
+ norm_cfg (ConfigType): Config dict for normalization layer.
+ Defaults to dict(type='BN', momentum=0.03, eps=0.001).
+ act_cfg (ConfigType): Config dict for activation layer.
+ Defaults to dict(type='ReLU', inplace=True).
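+
+    Example:
+        >>> # minimal sketch; channel sizes here are illustrative
+        >>> import torch
+        >>> block = BepC3StageBlock(64, 128, num_blocks=2)
+        >>> block(torch.rand(1, 64, 8, 8)).shape
+        torch.Size([1, 128, 8, 8])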
"""
def __init__(self,
in_channels: int,
out_channels: int,
- n: int = 1,
- block: nn.Module = RepVGGBlock):
+ num_blocks: int = 1,
+ hidden_ratio: float = 0.5,
+ concat_all_layer: bool = True,
+ block_cfg: ConfigType = dict(type='RepVGGBlock'),
+ norm_cfg: ConfigType = dict(
+ type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='ReLU', inplace=True)):
super().__init__()
- self.conv1 = block(in_channels, out_channels)
- self.block = nn.Sequential(*(block(out_channels, out_channels)
- for _ in range(n - 1))) if n > 1 else None
+ hidden_channels = int(out_channels * hidden_ratio)
- def forward(self, x: torch.Tensor) -> torch.Tensor:
- """Forward process.
- Args:
- inputs (Tensor): The input tensor.
+ self.conv1 = ConvModule(
+ in_channels,
+ hidden_channels,
+ kernel_size=1,
+ stride=1,
+ groups=1,
+ bias=False,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+ self.conv2 = ConvModule(
+ in_channels,
+ hidden_channels,
+ kernel_size=1,
+ stride=1,
+ groups=1,
+ bias=False,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+ self.conv3 = ConvModule(
+ 2 * hidden_channels,
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ groups=1,
+ bias=False,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+ self.block = RepStageBlock(
+ in_channels=hidden_channels,
+ out_channels=hidden_channels,
+ num_blocks=num_blocks,
+ block_cfg=block_cfg,
+ bottle_block=BottleRep)
+ self.concat_all_layer = concat_all_layer
+ if not concat_all_layer:
+ self.conv3 = ConvModule(
+ hidden_channels,
+ out_channels,
+ kernel_size=1,
+ stride=1,
+ groups=1,
+ bias=False,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
- Returns:
- Tensor: The output tensor.
- """
- x = self.conv1(x)
- if self.block is not None:
- x = self.block(x)
- return x
+ def forward(self, x):
+        if self.concat_all_layer:
+ return self.conv3(
+ torch.cat((self.block(self.conv1(x)), self.conv2(x)), dim=1))
+ else:
+ return self.conv3(self.block(self.conv1(x)))
+
+
+class BottleRep(nn.Module):
+ """Bottle Rep Block.
+
+ Args:
+ in_channels (int): Number of channels in the input image
+ out_channels (int): Number of channels produced by the convolution
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
+        adaptive_weight (bool): Whether to add a learnable scale weight
+            on the shortcut. Defaults to False.
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ block_cfg: ConfigType = dict(type='RepVGGBlock'),
+ adaptive_weight: bool = False):
+ super().__init__()
+ conv1_cfg = block_cfg.copy()
+ conv2_cfg = block_cfg.copy()
+
+ conv1_cfg.update(
+ dict(in_channels=in_channels, out_channels=out_channels))
+ conv2_cfg.update(
+ dict(in_channels=out_channels, out_channels=out_channels))
+
+ self.conv1 = MODELS.build(conv1_cfg)
+ self.conv2 = MODELS.build(conv2_cfg)
+
+ if in_channels != out_channels:
+ self.shortcut = False
+ else:
+ self.shortcut = True
+ if adaptive_weight:
+ self.alpha = nn.Parameter(torch.ones(1))
+ else:
+ self.alpha = 1.0
+
+ def forward(self, x: Tensor) -> Tensor:
+ outputs = self.conv1(x)
+ outputs = self.conv2(outputs)
+ return outputs + self.alpha * x if self.shortcut else outputs
+
+
+@MODELS.register_module()
+class ConvWrapper(nn.Module):
+ """Wrapper for normal Conv with SiLU activation.
+
+ Args:
+ in_channels (int): Number of channels in the input image
+ out_channels (int): Number of channels produced by the convolution
+ kernel_size (int or tuple): Size of the convolving kernel
+ stride (int or tuple): Stride of the convolution. Default: 1
+ groups (int, optional): Number of blocked connections from input
+ channels to output channels. Default: 1
+ bias (bool, optional): Conv bias. Default: True.
+        norm_cfg (ConfigType): Config dict for normalization layer.
+            Defaults to None.
+        act_cfg (ConfigType): Config dict for activation layer.
+            Defaults to dict(type='SiLU').
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ kernel_size: int = 3,
+ stride: int = 1,
+ groups: int = 1,
+ bias: bool = True,
+ norm_cfg: ConfigType = None,
+ act_cfg: ConfigType = dict(type='SiLU')):
+ super().__init__()
+ self.block = ConvModule(
+ in_channels,
+ out_channels,
+ kernel_size,
+ stride,
+ padding=kernel_size // 2,
+ groups=groups,
+ bias=bias,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+
+ def forward(self, x: Tensor) -> Tensor:
+ return self.block(x)
@MODELS.register_module()
@@ -465,20 +604,21 @@ def forward(self, feat: Tensor, avg_feat: Tensor) -> Tensor:
return self.conv(feat * weight)
+@MODELS.register_module()
class ELANBlock(BaseModule):
"""Efficient layer aggregation networks for YOLOv7.
- - if mode is `reduce_channel_2x`, the output channel will be
- reduced by a factor of 2
- - if mode is `no_change_channel`, the output channel does not change.
- - if mode is `expand_channel_2x`, the output channel will be
- expanded by a factor of 2
-
Args:
in_channels (int): The input channels of this Module.
- mode (str): Output channel mode. Defaults to `expand_channel_2x`.
+ out_channels (int): The out channels of this Module.
+ middle_ratio (float): The scaling ratio of the middle layer
+ based on the in_channels.
+ block_ratio (float): The scaling ratio of the block layer
+ based on the in_channels.
num_blocks (int): The number of blocks in the main branch.
Defaults to 2.
+        num_convs_in_block (int): The number of convs per block.
+ Defaults to 1.
        conv_cfg (dict): Config dict for convolution layer. Defaults to None,
            which means using conv2d.
norm_cfg (dict): Config dict for normalization layer.
@@ -491,37 +631,28 @@ class ELANBlock(BaseModule):
def __init__(self,
in_channels: int,
- mode: str = 'expand_channel_2x',
+ out_channels: int,
+ middle_ratio: float,
+ block_ratio: float,
num_blocks: int = 2,
+ num_convs_in_block: int = 1,
conv_cfg: OptConfigType = None,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
init_cfg: OptMultiConfig = None):
super().__init__(init_cfg=init_cfg)
+ assert num_blocks >= 1
+ assert num_convs_in_block >= 1
- assert mode in ('expand_channel_2x', 'no_change_channel',
- 'reduce_channel_2x')
-
- if mode == 'expand_channel_2x':
- mid_channels = in_channels // 2
- block_channels = mid_channels
- final_conv_in_channels = 2 * in_channels
- final_conv_out_channels = 2 * in_channels
- elif mode == 'no_change_channel':
- mid_channels = in_channels // 4
- block_channels = mid_channels
- final_conv_in_channels = in_channels
- final_conv_out_channels = in_channels
- else:
- mid_channels = in_channels // 2
- block_channels = mid_channels // 2
- final_conv_in_channels = in_channels * 2
- final_conv_out_channels = in_channels // 2
+ middle_channels = int(in_channels * middle_ratio)
+ block_channels = int(in_channels * block_ratio)
+ final_conv_in_channels = int(
+ num_blocks * block_channels) + 2 * middle_channels
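+        # every block contributes ``block_channels`` feature maps to the
+        # final concat, plus the two 1x1 branches at ``middle_channels``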
self.main_conv = ConvModule(
in_channels,
- mid_channels,
+ middle_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
@@ -529,7 +660,7 @@ def __init__(self,
self.short_conv = ConvModule(
in_channels,
- mid_channels,
+ middle_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
@@ -537,9 +668,9 @@ def __init__(self,
self.blocks = nn.ModuleList()
for _ in range(num_blocks):
- if mode == 'reduce_channel_2x':
+ if num_convs_in_block == 1:
internal_block = ConvModule(
- mid_channels,
+ middle_channels,
block_channels,
3,
padding=1,
@@ -547,29 +678,26 @@ def __init__(self,
norm_cfg=norm_cfg,
act_cfg=act_cfg)
else:
- internal_block = nn.Sequential(
- ConvModule(
- mid_channels,
- block_channels,
- 3,
- padding=1,
- conv_cfg=conv_cfg,
- norm_cfg=norm_cfg,
- act_cfg=act_cfg),
- ConvModule(
- block_channels,
- block_channels,
- 3,
- padding=1,
- conv_cfg=conv_cfg,
- norm_cfg=norm_cfg,
- act_cfg=act_cfg))
- mid_channels = block_channels
+ internal_block = []
+ for _ in range(num_convs_in_block):
+ internal_block.append(
+ ConvModule(
+ middle_channels,
+ block_channels,
+ 3,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg))
+ middle_channels = block_channels
+ internal_block = nn.Sequential(*internal_block)
+
+ middle_channels = block_channels
self.blocks.append(internal_block)
self.final_conv = ConvModule(
final_conv_in_channels,
- final_conv_out_channels,
+ out_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
@@ -591,17 +719,38 @@ def forward(self, x: Tensor) -> Tensor:
return self.final_conv(x_final)
+@MODELS.register_module()
+class EELANBlock(BaseModule):
+ """Expand efficient layer aggregation networks for YOLOv7.
+
+ Args:
+ num_elan_block (int): The number of ELANBlock.
+ """
+
+ def __init__(self, num_elan_block: int, **kwargs):
+ super().__init__()
+ assert num_elan_block >= 1
+ self.e_elan_blocks = nn.ModuleList()
+ for _ in range(num_elan_block):
+ self.e_elan_blocks.append(ELANBlock(**kwargs))
+
+ def forward(self, x: Tensor) -> Tensor:
+ outs = []
+        for elan_block in self.e_elan_blocks:
+            outs.append(elan_block(x))
+ return sum(outs)
+
+
class MaxPoolAndStrideConvBlock(BaseModule):
"""Max pooling and stride conv layer for YOLOv7.
- - if mode is `reduce_channel_2x`, the output channel will
- be reduced by a factor of 2
- - if mode is `no_change_channel`, the output channel does not change.
-
Args:
in_channels (int): The input channels of this Module.
- mode (str): Output channel mode. `reduce_channel_2x` or
- `no_change_channel`. Defaults to `reduce_channel_2x`
+ out_channels (int): The out channels of this Module.
+ maxpool_kernel_sizes (int): kernel sizes of pooling layers.
+ Defaults to 2.
+ use_in_channels_of_middle (bool): Whether to calculate middle channels
+ based on in_channels. Defaults to False.
        conv_cfg (dict): Config dict for convolution layer. Defaults to None,
            which means using conv2d.
norm_cfg (dict): Config dict for normalization layer.
@@ -614,7 +763,9 @@ class MaxPoolAndStrideConvBlock(BaseModule):
def __init__(self,
in_channels: int,
- mode: str = 'reduce_channel_2x',
+ out_channels: int,
+ maxpool_kernel_sizes: int = 2,
+ use_in_channels_of_middle: bool = False,
conv_cfg: OptConfigType = None,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
@@ -622,33 +773,31 @@ def __init__(self,
init_cfg: OptMultiConfig = None):
super().__init__(init_cfg=init_cfg)
- assert mode in ('no_change_channel', 'reduce_channel_2x')
-
- if mode == 'reduce_channel_2x':
- out_channels = in_channels // 2
- else:
- out_channels = in_channels
+ middle_channels = in_channels if use_in_channels_of_middle \
+ else out_channels // 2
self.maxpool_branches = nn.Sequential(
- MaxPool2d(2, 2),
+ MaxPool2d(
+ kernel_size=maxpool_kernel_sizes, stride=maxpool_kernel_sizes),
ConvModule(
in_channels,
- out_channels,
+ out_channels // 2,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
act_cfg=act_cfg))
+
self.stride_conv_branches = nn.Sequential(
ConvModule(
in_channels,
- out_channels,
+ middle_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
act_cfg=act_cfg),
ConvModule(
- out_channels,
- out_channels,
+ middle_channels,
+ out_channels // 2,
3,
stride=2,
padding=1,
@@ -666,6 +815,92 @@ def forward(self, x: Tensor) -> Tensor:
return torch.cat([stride_conv_out, maxpool_out], dim=1)
+@MODELS.register_module()
+class TinyDownSampleBlock(BaseModule):
+ """Down sample layer for YOLOv7-tiny.
+
+ Args:
+ in_channels (int): The input channels of this Module.
+ out_channels (int): The out channels of this Module.
+ middle_ratio (float): The scaling ratio of the middle layer
+ based on the in_channels. Defaults to 1.0.
+        kernel_sizes (int): Kernel size of the conv layers in the main
+            branch. Defaults to 3.
+        conv_cfg (dict): Config dict for convolution layer. Defaults to None,
+            which means using conv2d.
+ norm_cfg (dict): Config dict for normalization layer.
+ Defaults to dict(type='BN', momentum=0.03, eps=0.001).
+ act_cfg (dict): Config dict for activation layer.
+ Defaults to dict(type='LeakyReLU', negative_slope=0.1).
+ init_cfg (dict or list[dict], optional): Initialization config dict.
+ Defaults to None.
+ """
+
+ def __init__(
+ self,
+ in_channels: int,
+ out_channels: int,
+ middle_ratio: float = 1.0,
+ kernel_sizes: Union[int, Sequence[int]] = 3,
+ conv_cfg: OptConfigType = None,
+ norm_cfg: ConfigType = dict(type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='LeakyReLU', negative_slope=0.1),
+ init_cfg: OptMultiConfig = None):
+ super().__init__(init_cfg)
+
+ middle_channels = int(in_channels * middle_ratio)
+
+ self.short_conv = ConvModule(
+ in_channels,
+ middle_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+
+ self.main_convs = nn.ModuleList()
+ for i in range(3):
+ if i == 0:
+ self.main_convs.append(
+ ConvModule(
+ in_channels,
+ middle_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg))
+ else:
+ self.main_convs.append(
+ ConvModule(
+ middle_channels,
+ middle_channels,
+ kernel_sizes,
+ padding=(kernel_sizes - 1) // 2,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg))
+
+ self.final_conv = ConvModule(
+ middle_channels * 4,
+ out_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg)
+
+ def forward(self, x) -> Tensor:
+ short_out = self.short_conv(x)
+
+ main_outs = []
+ for main_conv in self.main_convs:
+ main_out = main_conv(x)
+ main_outs.append(main_out)
+ x = main_out
+
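+        # concat the three main-branch outputs (deepest first) with the
+        # shortcut, i.e. 4 * middle_channels channels into final_conv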
+ return self.final_conv(torch.cat([*main_outs[::-1], short_out], dim=1))
+
+
+@MODELS.register_module()
class SPPFCSPBlock(BaseModule):
"""Spatial pyramid pooling - Fast (SPPF) layer with CSP for
YOLOv7
@@ -677,6 +912,8 @@ class SPPFCSPBlock(BaseModule):
Defaults to 0.5.
kernel_sizes (int, tuple[int]): Sequential or number of kernel
sizes of pooling layers. Defaults to 5.
+ is_tiny_version (bool): Is tiny version of SPPFCSPBlock. If True,
+ it means it is a yolov7 tiny model. Defaults to False.
        conv_cfg (dict): Config dict for convolution layer. Defaults to None,
            which means using conv2d.
norm_cfg (dict): Config dict for normalization layer.
@@ -692,38 +929,50 @@ def __init__(self,
out_channels: int,
expand_ratio: float = 0.5,
kernel_sizes: Union[int, Sequence[int]] = 5,
+ is_tiny_version: bool = False,
conv_cfg: OptConfigType = None,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
init_cfg: OptMultiConfig = None):
super().__init__(init_cfg=init_cfg)
+ self.is_tiny_version = is_tiny_version
+
mid_channels = int(2 * out_channels * expand_ratio)
- self.main_layers = nn.Sequential(
- ConvModule(
+ if is_tiny_version:
+ self.main_layers = ConvModule(
in_channels,
mid_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
- act_cfg=act_cfg),
- ConvModule(
- mid_channels,
- mid_channels,
- 3,
- padding=1,
- conv_cfg=conv_cfg,
- norm_cfg=norm_cfg,
- act_cfg=act_cfg),
- ConvModule(
- mid_channels,
- mid_channels,
- 1,
- conv_cfg=conv_cfg,
- norm_cfg=norm_cfg,
- act_cfg=act_cfg),
- )
+ act_cfg=act_cfg)
+ else:
+ self.main_layers = nn.Sequential(
+ ConvModule(
+ in_channels,
+ mid_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg),
+ ConvModule(
+ mid_channels,
+ mid_channels,
+ 3,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg),
+ ConvModule(
+ mid_channels,
+ mid_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg),
+ )
self.kernel_sizes = kernel_sizes
if isinstance(kernel_sizes, int):
@@ -735,24 +984,33 @@ def __init__(self,
for ks in kernel_sizes
])
- self.fuse_layers = nn.Sequential(
- ConvModule(
+ if is_tiny_version:
+ self.fuse_layers = ConvModule(
4 * mid_channels,
mid_channels,
1,
conv_cfg=conv_cfg,
norm_cfg=norm_cfg,
- act_cfg=act_cfg),
- ConvModule(
- mid_channels,
- mid_channels,
- 3,
- padding=1,
- conv_cfg=conv_cfg,
- norm_cfg=norm_cfg,
- act_cfg=act_cfg))
+ act_cfg=act_cfg)
+ else:
+ self.fuse_layers = nn.Sequential(
+ ConvModule(
+ 4 * mid_channels,
+ mid_channels,
+ 1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg),
+ ConvModule(
+ mid_channels,
+ mid_channels,
+ 3,
+ padding=1,
+ conv_cfg=conv_cfg,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg))
- self.short_layers = ConvModule(
+ self.short_layer = ConvModule(
in_channels,
mid_channels,
1,
@@ -777,15 +1035,66 @@ def forward(self, x) -> Tensor:
if isinstance(self.kernel_sizes, int):
y1 = self.poolings(x1)
y2 = self.poolings(y1)
- x1 = self.fuse_layers(
- torch.cat([x1] + [y1, y2, self.poolings(y2)], 1))
+ concat_list = [x1] + [y1, y2, self.poolings(y2)]
+ if self.is_tiny_version:
+ x1 = self.fuse_layers(torch.cat(concat_list[::-1], 1))
+ else:
+ x1 = self.fuse_layers(torch.cat(concat_list, 1))
else:
- x1 = self.fuse_layers(
- torch.cat([x1] + [m(x1) for m in self.poolings], 1))
- x2 = self.short_layers(x)
+ concat_list = [x1] + [m(x1) for m in self.poolings]
+ if self.is_tiny_version:
+ x1 = self.fuse_layers(torch.cat(concat_list[::-1], 1))
+ else:
+ x1 = self.fuse_layers(torch.cat(concat_list, 1))
+
+ x2 = self.short_layer(x)
return self.final_conv(torch.cat((x1, x2), dim=1))
+class ImplicitA(nn.Module):
+ """Implicit add layer in YOLOv7.
+
+ Args:
+ in_channels (int): The input channels of this Module.
+ mean (float): Mean value of implicit module. Defaults to 0.
+        std (float): Std value of implicit module. Defaults to 0.02.
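+
+    Example:
+        >>> # minimal sketch; the input shape is only illustrative
+        >>> import torch
+        >>> layer = ImplicitA(in_channels=8)
+        >>> layer(torch.rand(1, 8, 4, 4)).shape
+        torch.Size([1, 8, 4, 4])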
+ """
+
+ def __init__(self, in_channels: int, mean: float = 0., std: float = .02):
+ super().__init__()
+ self.implicit = nn.Parameter(torch.zeros(1, in_channels, 1, 1))
+ nn.init.normal_(self.implicit, mean=mean, std=std)
+
+ def forward(self, x):
+ """Forward process
+ Args:
+ x (Tensor): The input tensor.
+ """
+ return self.implicit + x
+
+
+class ImplicitM(nn.Module):
+ """Implicit multiplier layer in YOLOv7.
+
+ Args:
+ in_channels (int): The input channels of this Module.
+ mean (float): Mean value of implicit module. Defaults to 1.
+ std (float): Std value of implicit module. Defaults to 0.02.
+ """
+
+ def __init__(self, in_channels: int, mean: float = 1., std: float = .02):
+ super().__init__()
+ self.implicit = nn.Parameter(torch.ones(1, in_channels, 1, 1))
+ nn.init.normal_(self.implicit, mean=mean, std=std)
+
+ def forward(self, x):
+ """Forward process
+ Args:
+ x (Tensor): The input tensor.
+ """
+ return self.implicit * x
+
+
@MODELS.register_module()
class PPYOLOEBasicBlock(nn.Module):
"""PPYOLOE Backbone BasicBlock.
@@ -986,3 +1295,69 @@ def forward(self, x: Tensor) -> Tensor:
y = self.attn(y)
y = self.conv3(y)
return y
+
+
+@MODELS.register_module()
+class RepStageBlock(nn.Module):
+ """RepStageBlock is a stage block with rep-style basic block.
+
+ Args:
+ in_channels (int): The input channels of this Module.
+ out_channels (int): The output channels of this Module.
+ num_blocks (int, tuple[int]): Number of blocks. Defaults to 1.
+ bottle_block (nn.Module): Basic unit of RepStage.
+ Defaults to RepVGGBlock.
+        block_cfg (ConfigType): Config dict of the block used to build each
+            stage. Defaults to dict(type='RepVGGBlock').
+ """
+
+ def __init__(self,
+ in_channels: int,
+ out_channels: int,
+ num_blocks: int = 1,
+ bottle_block: nn.Module = RepVGGBlock,
+ block_cfg: ConfigType = dict(type='RepVGGBlock')):
+ super().__init__()
+ block_cfg = block_cfg.copy()
+
+ block_cfg.update(
+ dict(in_channels=in_channels, out_channels=out_channels))
+
+ self.conv1 = MODELS.build(block_cfg)
+
+ block_cfg.update(
+ dict(in_channels=out_channels, out_channels=out_channels))
+
+ self.block = None
+ if num_blocks > 1:
+ self.block = nn.Sequential(*(MODELS.build(block_cfg)
+ for _ in range(num_blocks - 1)))
+
+ if bottle_block == BottleRep:
+ self.conv1 = BottleRep(
+ in_channels,
+ out_channels,
+ block_cfg=block_cfg,
+ adaptive_weight=True)
+ num_blocks = num_blocks // 2
+ self.block = None
+ if num_blocks > 1:
+ self.block = nn.Sequential(*(BottleRep(
+ out_channels,
+ out_channels,
+ block_cfg=block_cfg,
+ adaptive_weight=True) for _ in range(num_blocks - 1)))
+
+ def forward(self, x: Tensor) -> Tensor:
+ """Forward process.
+
+ Args:
+            x (Tensor): The input tensor.
+
+ Returns:
+ Tensor: The output tensor.
+ """
+ x = self.conv1(x)
+ if self.block is not None:
+ x = self.block(x)
+ return x
diff --git a/mmyolo/models/losses/iou_loss.py b/mmyolo/models/losses/iou_loss.py
index 579f26190..0e9ccc263 100644
--- a/mmyolo/models/losses/iou_loss.py
+++ b/mmyolo/models/losses/iou_loss.py
@@ -20,27 +20,31 @@ def bbox_overlaps(pred: torch.Tensor,
`Implementation of paper `Enhancing Geometric Factors into
Model Learning and Inference for Object Detection and Instance
Segmentation `_.
+
In the CIoU implementation of YOLOv5 and MMDetection, there is a slight
difference in the way the alpha parameter is computed.
+
mmdet version:
alpha = (ious > 0.5).float() * v / (1 - ious + v)
YOLOv5 version:
        alpha = v / (v - ious + (1 + eps))
+
Args:
pred (Tensor): Predicted bboxes of format (x1, y1, x2, y2)
or (x, y, w, h),shape (n, 4).
target (Tensor): Corresponding gt bboxes, shape (n, 4).
- iou_mode (str): Options are "ciou".
+ iou_mode (str): Options are ('iou', 'ciou', 'giou', 'siou').
Defaults to "ciou".
bbox_format (str): Options are "xywh" and "xyxy".
Defaults to "xywh".
siou_theta (float): siou_theta for SIoU when calculate shape cost.
Defaults to 4.0.
eps (float): Eps to avoid log(0).
+
Returns:
- Tensor: shape (n,).
+ Tensor: shape (n, ).
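+
+    Example:
+        >>> # illustrative sketch: identical boxes give an IoU close to 1
+        >>> pred = torch.tensor([[0., 0., 10., 10.]])
+        >>> target = torch.tensor([[0., 0., 10., 10.]])
+        >>> iou = bbox_overlaps(pred, target, iou_mode='iou',
+        ...                     bbox_format='xyxy')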
"""
- assert iou_mode in ('ciou', 'giou', 'siou')
+ assert iou_mode in ('iou', 'ciou', 'giou', 'siou')
assert bbox_format in ('xyxy', 'xywh')
if bbox_format == 'xywh':
pred = HorizontalBoxes.cxcywh_to_xyxy(pred)
diff --git a/mmyolo/models/necks/__init__.py b/mmyolo/models/necks/__init__.py
index c6dd09554..7165327d4 100644
--- a/mmyolo/models/necks/__init__.py
+++ b/mmyolo/models/necks/__init__.py
@@ -3,11 +3,11 @@
from .cspnext_pafpn import CSPNeXtPAFPN
from .ppyoloe_csppan import PPYOLOECSPPAFPN
from .yolov5_pafpn import YOLOv5PAFPN
-from .yolov6_pafpn import YOLOv6RepPAFPN
+from .yolov6_pafpn import YOLOv6CSPRepPAFPN, YOLOv6RepPAFPN
from .yolov7_pafpn import YOLOv7PAFPN
from .yolox_pafpn import YOLOXPAFPN
__all__ = [
'YOLOv5PAFPN', 'BaseYOLONeck', 'YOLOv6RepPAFPN', 'YOLOXPAFPN',
- 'CSPNeXtPAFPN', 'YOLOv7PAFPN', 'PPYOLOECSPPAFPN'
+ 'CSPNeXtPAFPN', 'YOLOv7PAFPN', 'PPYOLOECSPPAFPN', 'YOLOv6CSPRepPAFPN'
]
diff --git a/mmyolo/models/necks/yolov5_pafpn.py b/mmyolo/models/necks/yolov5_pafpn.py
index cc7487e78..b95147fc5 100644
--- a/mmyolo/models/necks/yolov5_pafpn.py
+++ b/mmyolo/models/necks/yolov5_pafpn.py
@@ -56,12 +56,15 @@ def __init__(self,
init_cfg=init_cfg)
def init_weights(self):
- """Initialize the parameters."""
- for m in self.modules():
- if isinstance(m, torch.nn.Conv2d):
- # In order to be consistent with the source code,
- # reset the Conv2d initialization parameters
- m.reset_parameters()
+        """Initialize the parameters."""
+        if self.init_cfg is None:
+ for m in self.modules():
+ if isinstance(m, torch.nn.Conv2d):
+ # In order to be consistent with the source code,
+ # reset the Conv2d initialization parameters
+ m.reset_parameters()
+ else:
+ super().init_weights()
def build_reduce_layer(self, idx: int) -> nn.Module:
"""build reduce layer.
diff --git a/mmyolo/models/necks/yolov6_pafpn.py b/mmyolo/models/necks/yolov6_pafpn.py
index 54f22d0ab..74b7ce932 100644
--- a/mmyolo/models/necks/yolov6_pafpn.py
+++ b/mmyolo/models/necks/yolov6_pafpn.py
@@ -7,8 +7,8 @@
from mmdet.utils import ConfigType, OptMultiConfig
from mmyolo.registry import MODELS
-from ..layers import RepStageBlock, RepVGGBlock
-from ..utils import make_divisible, make_round
+from ..layers import BepC3StageBlock, RepStageBlock
+from ..utils import make_round
from .base_yolo_neck import BaseYOLONeck
@@ -29,8 +29,8 @@ class YOLOv6RepPAFPN(BaseYOLONeck):
Defaults to dict(type='BN', momentum=0.03, eps=0.001).
act_cfg (dict): Config dict for activation layer.
Defaults to dict(type='ReLU', inplace=True).
- block (nn.Module): block used to build each layer.
- Defaults to RepVGGBlock.
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
init_cfg (dict or list[dict], optional): Initialization config dict.
Defaults to None.
"""
@@ -45,10 +45,10 @@ def __init__(self,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='ReLU', inplace=True),
- block: nn.Module = RepVGGBlock,
+ block_cfg: ConfigType = dict(type='RepVGGBlock'),
init_cfg: OptMultiConfig = None):
self.num_csp_blocks = num_csp_blocks
- self.block = block
+ self.block_cfg = block_cfg
super().__init__(
in_channels=in_channels,
out_channels=out_channels,
@@ -64,16 +64,14 @@ def build_reduce_layer(self, idx: int) -> nn.Module:
Args:
idx (int): layer idx.
-
Returns:
nn.Module: The reduce layer.
"""
if idx == 2:
layer = ConvModule(
- in_channels=make_divisible(self.in_channels[idx],
- self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx - 1],
- self.widen_factor),
+ in_channels=int(self.in_channels[idx] * self.widen_factor),
+ out_channels=int(self.out_channels[idx - 1] *
+ self.widen_factor),
kernel_size=1,
stride=1,
norm_cfg=self.norm_cfg,
@@ -88,15 +86,12 @@ def build_upsample_layer(self, idx: int) -> nn.Module:
Args:
idx (int): layer idx.
-
Returns:
nn.Module: The upsample layer.
"""
return nn.ConvTranspose2d(
- in_channels=make_divisible(self.out_channels[idx - 1],
- self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx - 1],
- self.widen_factor),
+ in_channels=int(self.out_channels[idx - 1] * self.widen_factor),
+ out_channels=int(self.out_channels[idx - 1] * self.widen_factor),
kernel_size=2,
stride=2,
bias=True)
@@ -106,26 +101,27 @@ def build_top_down_layer(self, idx: int) -> nn.Module:
Args:
idx (int): layer idx.
-
Returns:
nn.Module: The top down layer.
"""
+ block_cfg = self.block_cfg.copy()
+
layer0 = RepStageBlock(
- in_channels=make_divisible(
- self.out_channels[idx - 1] + self.in_channels[idx - 1],
+ in_channels=int(
+ (self.out_channels[idx - 1] + self.in_channels[idx - 1]) *
self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx - 1],
- self.widen_factor),
- n=make_round(self.num_csp_blocks, self.deepen_factor),
- block=self.block)
+ out_channels=int(self.out_channels[idx - 1] * self.widen_factor),
+ num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
+ block_cfg=block_cfg)
+
if idx == 1:
return layer0
elif idx == 2:
layer1 = ConvModule(
- in_channels=make_divisible(self.out_channels[idx - 1],
- self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx - 2],
- self.widen_factor),
+ in_channels=int(self.out_channels[idx - 1] *
+ self.widen_factor),
+ out_channels=int(self.out_channels[idx - 2] *
+ self.widen_factor),
kernel_size=1,
stride=1,
norm_cfg=self.norm_cfg,
@@ -137,15 +133,12 @@ def build_downsample_layer(self, idx: int) -> nn.Module:
Args:
idx (int): layer idx.
-
Returns:
nn.Module: The downsample layer.
"""
return ConvModule(
- in_channels=make_divisible(self.out_channels[idx],
- self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx],
- self.widen_factor),
+ in_channels=int(self.out_channels[idx] * self.widen_factor),
+ out_channels=int(self.out_channels[idx] * self.widen_factor),
kernel_size=3,
stride=2,
padding=3 // 2,
@@ -157,26 +150,136 @@ def build_bottom_up_layer(self, idx: int) -> nn.Module:
Args:
idx (int): layer idx.
-
Returns:
nn.Module: The bottom up layer.
"""
+ block_cfg = self.block_cfg.copy()
+
return RepStageBlock(
- in_channels=make_divisible(self.out_channels[idx] * 2,
- self.widen_factor),
- out_channels=make_divisible(self.out_channels[idx + 1],
- self.widen_factor),
- n=make_round(self.num_csp_blocks, self.deepen_factor),
- block=self.block)
+ in_channels=int(self.out_channels[idx] * 2 * self.widen_factor),
+ out_channels=int(self.out_channels[idx + 1] * self.widen_factor),
+ num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
+ block_cfg=block_cfg)
def build_out_layer(self, *args, **kwargs) -> nn.Module:
"""build out layer."""
return nn.Identity()
def init_weights(self):
- """Initialize the parameters."""
- for m in self.modules():
- if isinstance(m, torch.nn.Conv2d):
- # In order to be consistent with the source code,
- # reset the Conv2d initialization parameters
- m.reset_parameters()
+        """Initialize the parameters."""
+        if self.init_cfg is None:
+ for m in self.modules():
+ if isinstance(m, torch.nn.Conv2d):
+ # In order to be consistent with the source code,
+ # reset the Conv2d initialization parameters
+ m.reset_parameters()
+ else:
+ super().init_weights()
+
+
+@MODELS.register_module()
+class YOLOv6CSPRepPAFPN(YOLOv6RepPAFPN):
+ """Path Aggregation Network used in YOLOv6.
+
+ Args:
+ in_channels (List[int]): Number of input channels per scale.
+ out_channels (int): Number of output channels (used at each scale)
+ deepen_factor (float): Depth multiplier, multiply number of
+ blocks in CSP layer by this amount. Defaults to 1.0.
+ widen_factor (float): Width multiplier, multiply number of
+ channels in each layer by this amount. Defaults to 1.0.
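+        hidden_ratio (float): Hidden channel expansion ratio of the
+            BepC3StageBlock. Defaults to 0.5.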
+        num_csp_blocks (int): Number of bottlenecks in CSPLayer.
+            Defaults to 12.
+        freeze_all (bool): Whether to freeze the model.
+ norm_cfg (dict): Config dict for normalization layer.
+ Defaults to dict(type='BN', momentum=0.03, eps=0.001).
+ act_cfg (dict): Config dict for activation layer.
+ Defaults to dict(type='ReLU', inplace=True).
+ block_cfg (dict): Config dict for the block used to build each
+ layer. Defaults to dict(type='RepVGGBlock').
+ block_act_cfg (dict): Config dict for activation layer used in each
+ stage. Defaults to dict(type='SiLU', inplace=True).
+ init_cfg (dict or list[dict], optional): Initialization config dict.
+ Defaults to None.
+ """
+
+ def __init__(self,
+ in_channels: List[int],
+ out_channels: int,
+ deepen_factor: float = 1.0,
+ widen_factor: float = 1.0,
+ hidden_ratio: float = 0.5,
+ num_csp_blocks: int = 12,
+ freeze_all: bool = False,
+ norm_cfg: ConfigType = dict(
+ type='BN', momentum=0.03, eps=0.001),
+ act_cfg: ConfigType = dict(type='ReLU', inplace=True),
+ block_act_cfg: ConfigType = dict(type='SiLU', inplace=True),
+ block_cfg: ConfigType = dict(type='RepVGGBlock'),
+ init_cfg: OptMultiConfig = None):
+ self.hidden_ratio = hidden_ratio
+ self.block_act_cfg = block_act_cfg
+ super().__init__(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ deepen_factor=deepen_factor,
+ widen_factor=widen_factor,
+ num_csp_blocks=num_csp_blocks,
+ freeze_all=freeze_all,
+ norm_cfg=norm_cfg,
+ act_cfg=act_cfg,
+ block_cfg=block_cfg,
+ init_cfg=init_cfg)
+
+ def build_top_down_layer(self, idx: int) -> nn.Module:
+ """build top down layer.
+
+ Args:
+ idx (int): layer idx.
+ Returns:
+ nn.Module: The top down layer.
+ """
+ block_cfg = self.block_cfg.copy()
+
+ layer0 = BepC3StageBlock(
+ in_channels=int(
+ (self.out_channels[idx - 1] + self.in_channels[idx - 1]) *
+ self.widen_factor),
+ out_channels=int(self.out_channels[idx - 1] * self.widen_factor),
+ num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
+ block_cfg=block_cfg,
+ hidden_ratio=self.hidden_ratio,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.block_act_cfg)
+
+ if idx == 1:
+ return layer0
+ elif idx == 2:
+ layer1 = ConvModule(
+ in_channels=int(self.out_channels[idx - 1] *
+ self.widen_factor),
+ out_channels=int(self.out_channels[idx - 2] *
+ self.widen_factor),
+ kernel_size=1,
+ stride=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ return nn.Sequential(layer0, layer1)
+
+ def build_bottom_up_layer(self, idx: int) -> nn.Module:
+ """build bottom up layer.
+
+ Args:
+ idx (int): layer idx.
+ Returns:
+ nn.Module: The bottom up layer.
+ """
+ block_cfg = self.block_cfg.copy()
+
+ return BepC3StageBlock(
+ in_channels=int(self.out_channels[idx] * 2 * self.widen_factor),
+ out_channels=int(self.out_channels[idx + 1] * self.widen_factor),
+ num_blocks=make_round(self.num_csp_blocks, self.deepen_factor),
+ block_cfg=block_cfg,
+ hidden_ratio=self.hidden_ratio,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.block_act_cfg)
diff --git a/mmyolo/models/necks/yolov7_pafpn.py b/mmyolo/models/necks/yolov7_pafpn.py
index ec48663db..1d31f4623 100644
--- a/mmyolo/models/necks/yolov7_pafpn.py
+++ b/mmyolo/models/necks/yolov7_pafpn.py
@@ -6,8 +6,7 @@
from mmdet.utils import ConfigType, OptMultiConfig
from mmyolo.registry import MODELS
-from ..layers import (ELANBlock, MaxPoolAndStrideConvBlock, RepVGGBlock,
- SPPFCSPBlock)
+from ..layers import MaxPoolAndStrideConvBlock, RepVGGBlock, SPPFCSPBlock
from .base_yolo_neck import BaseYOLONeck
@@ -18,12 +17,21 @@ class YOLOv7PAFPN(BaseYOLONeck):
Args:
in_channels (List[int]): Number of input channels per scale.
        out_channels (List[int]): Number of output channels per scale.
+ block_cfg (dict): Config dict for block.
deepen_factor (float): Depth multiplier, multiply number of
blocks in CSP layer by this amount. Defaults to 1.0.
widen_factor (float): Width multiplier, multiply number of
channels in each layer by this amount. Defaults to 1.0.
spp_expand_ratio (float): Expand ratio of SPPCSPBlock.
Defaults to 0.5.
+ is_tiny_version (bool): Is tiny version of neck. If True,
+ it means it is a yolov7 tiny model. Defaults to False.
+ use_maxpool_in_downsample (bool): Whether maxpooling is
+ used in downsample layers. Defaults to True.
+        use_in_channels_in_downsample (bool): Whether
+            MaxPoolAndStrideConvBlock computes its middle channels from
+            ``in_channels`` instead of ``out_channels``. Defaults to False.
+ use_repconv_outs (bool): Whether to use `repconv` in the output
+ layer. Defaults to True.
upsample_feats_cat_first (bool): Whether the output features are
concat first after upsampling in the topdown module.
Defaults to True. Currently only YOLOv7 is false.
@@ -39,9 +47,19 @@ class YOLOv7PAFPN(BaseYOLONeck):
def __init__(self,
in_channels: List[int],
out_channels: List[int],
+ block_cfg: dict = dict(
+ type='ELANBlock',
+ middle_ratio=0.5,
+ block_ratio=0.25,
+ num_blocks=4,
+ num_convs_in_block=1),
deepen_factor: float = 1.0,
widen_factor: float = 1.0,
spp_expand_ratio: float = 0.5,
+ is_tiny_version: bool = False,
+ use_maxpool_in_downsample: bool = True,
+ use_in_channels_in_downsample: bool = False,
+ use_repconv_outs: bool = True,
upsample_feats_cat_first: bool = False,
freeze_all: bool = False,
norm_cfg: ConfigType = dict(
@@ -49,7 +67,15 @@ def __init__(self,
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
init_cfg: OptMultiConfig = None):
+ self.is_tiny_version = is_tiny_version
+ self.use_maxpool_in_downsample = use_maxpool_in_downsample
+ self.use_in_channels_in_downsample = use_in_channels_in_downsample
self.spp_expand_ratio = spp_expand_ratio
+ self.use_repconv_outs = use_repconv_outs
+        # copy so the mutable default ``block_cfg`` dict is not mutated
+        # across instances by the ``setdefault`` calls below
+        self.block_cfg = block_cfg.copy()
+ self.block_cfg.setdefault('norm_cfg', norm_cfg)
+ self.block_cfg.setdefault('act_cfg', act_cfg)
+
super().__init__(
in_channels=[
int(channel * widen_factor) for channel in in_channels
@@ -74,11 +100,12 @@ def build_reduce_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The reduce layer.
"""
- if idx == 2:
+ if idx == len(self.in_channels) - 1:
layer = SPPFCSPBlock(
self.in_channels[idx],
self.out_channels[idx],
expand_ratio=self.spp_expand_ratio,
+ is_tiny_version=self.is_tiny_version,
kernel_sizes=5,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg)
@@ -112,12 +139,10 @@ def build_top_down_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The top down layer.
"""
- return ELANBlock(
- self.out_channels[idx - 1] * 2,
- mode='reduce_channel_2x',
- num_blocks=4,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
+ block_cfg = self.block_cfg.copy()
+ block_cfg['in_channels'] = self.out_channels[idx - 1] * 2
+ block_cfg['out_channels'] = self.out_channels[idx - 1]
+ return MODELS.build(block_cfg)
def build_downsample_layer(self, idx: int) -> nn.Module:
"""build downsample layer.
@@ -128,11 +153,22 @@ def build_downsample_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The downsample layer.
"""
- return MaxPoolAndStrideConvBlock(
- self.out_channels[idx],
- mode='no_change_channel',
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
+ if self.use_maxpool_in_downsample and not self.is_tiny_version:
+ return MaxPoolAndStrideConvBlock(
+ self.out_channels[idx],
+ self.out_channels[idx + 1],
+ use_in_channels_of_middle=self.use_in_channels_in_downsample,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ else:
+ return ConvModule(
+ self.out_channels[idx],
+ self.out_channels[idx + 1],
+ 3,
+ stride=2,
+ padding=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
def build_bottom_up_layer(self, idx: int) -> nn.Module:
"""build bottom up layer.
@@ -143,12 +179,10 @@ def build_bottom_up_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The bottom up layer.
"""
- return ELANBlock(
- self.out_channels[idx + 1] * 2,
- mode='reduce_channel_2x',
- num_blocks=4,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
+ block_cfg = self.block_cfg.copy()
+ block_cfg['in_channels'] = self.out_channels[idx + 1] * 2
+ block_cfg['out_channels'] = self.out_channels[idx + 1]
+ return MODELS.build(block_cfg)
def build_out_layer(self, idx: int) -> nn.Module:
"""build out layer.
@@ -159,9 +193,24 @@ def build_out_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The out layer.
"""
- return RepVGGBlock(
- self.out_channels[idx],
- self.out_channels[idx] * 2,
- 3,
- norm_cfg=self.norm_cfg,
- act_cfg=self.act_cfg)
+ if len(self.in_channels) == 4:
+ # P6
+ return nn.Identity()
+
+ out_channels = self.out_channels[idx] * 2
+
+ if self.use_repconv_outs:
+ return RepVGGBlock(
+ self.out_channels[idx],
+ out_channels,
+ 3,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
+ else:
+ return ConvModule(
+ self.out_channels[idx],
+ out_channels,
+ 3,
+ padding=1,
+ norm_cfg=self.norm_cfg,
+ act_cfg=self.act_cfg)
diff --git a/mmyolo/models/necks/yolox_pafpn.py b/mmyolo/models/necks/yolox_pafpn.py
index 765a1ba47..bd2595e70 100644
--- a/mmyolo/models/necks/yolox_pafpn.py
+++ b/mmyolo/models/necks/yolox_pafpn.py
@@ -2,7 +2,7 @@
from typing import List
import torch.nn as nn
-from mmcv.cnn import ConvModule
+from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule
from mmdet.models.backbones.csp_darknet import CSPLayer
from mmdet.utils import ConfigType, OptMultiConfig
@@ -22,6 +22,8 @@ class YOLOXPAFPN(BaseYOLONeck):
widen_factor (float): Width multiplier, multiply number of
channels in each layer by this amount. Defaults to 1.0.
num_csp_blocks (int): Number of bottlenecks in CSPLayer. Defaults to 1.
+ use_depthwise (bool): Whether to use depthwise separable convolution.
+ Defaults to False.
freeze_all(bool): Whether to freeze the model. Defaults to False.
norm_cfg (dict): Config dict for normalization layer.
Defaults to dict(type='BN', momentum=0.03, eps=0.001).
@@ -37,12 +39,14 @@ def __init__(self,
deepen_factor: float = 1.0,
widen_factor: float = 1.0,
num_csp_blocks: int = 3,
+ use_depthwise: bool = False,
freeze_all: bool = False,
norm_cfg: ConfigType = dict(
type='BN', momentum=0.03, eps=0.001),
act_cfg: ConfigType = dict(type='SiLU', inplace=True),
init_cfg: OptMultiConfig = None):
self.num_csp_blocks = round(num_csp_blocks * deepen_factor)
+ self.use_depthwise = use_depthwise
super().__init__(
in_channels=[
@@ -123,7 +127,9 @@ def build_downsample_layer(self, idx: int) -> nn.Module:
Returns:
nn.Module: The downsample layer.
"""
- return ConvModule(
+ conv = DepthwiseSeparableConvModule \
+ if self.use_depthwise else ConvModule
+ return conv(
self.in_channels[idx],
self.in_channels[idx],
kernel_size=3,
diff --git a/mmyolo/models/plugins/cbam.py b/mmyolo/models/plugins/cbam.py
index 0741fe9f2..e9559f2e2 100644
--- a/mmyolo/models/plugins/cbam.py
+++ b/mmyolo/models/plugins/cbam.py
@@ -48,6 +48,7 @@ def __init__(self,
self.sigmoid = nn.Sigmoid()
def forward(self, x: torch.Tensor) -> torch.Tensor:
+ """Forward function."""
avgpool_out = self.fc(self.avg_pool(x))
maxpool_out = self.fc(self.max_pool(x))
out = self.sigmoid(avgpool_out + maxpool_out)
@@ -74,6 +75,7 @@ def __init__(self, kernel_size: int = 7):
act_cfg=dict(type='Sigmoid'))
def forward(self, x: torch.Tensor) -> torch.Tensor:
+ """Forward function."""
avg_out = torch.mean(x, dim=1, keepdim=True)
max_out, _ = torch.max(x, dim=1, keepdim=True)
out = torch.cat([avg_out, max_out], dim=1)
@@ -111,6 +113,7 @@ def __init__(self,
self.spatial_attention = SpatialAttention(kernel_size)
def forward(self, x: torch.Tensor) -> torch.Tensor:
+ """Forward function."""
out = self.channel_attention(x) * x
out = self.spatial_attention(out) * out
return out
diff --git a/mmyolo/models/task_modules/assigners/batch_yolov7_assigner.py b/mmyolo/models/task_modules/assigners/batch_yolov7_assigner.py
new file mode 100644
index 000000000..7d59239ec
--- /dev/null
+++ b/mmyolo/models/task_modules/assigners/batch_yolov7_assigner.py
@@ -0,0 +1,325 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Sequence
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from mmdet.structures.bbox import bbox_cxcywh_to_xyxy, bbox_overlaps
+
+
+def _cat_multi_level_tensor_in_place(*multi_level_tensor, place_hold_var):
+ """concat multi-level tensor in place."""
+ for level_tensor in multi_level_tensor:
+ for i, var in enumerate(level_tensor):
+ if len(var) > 0:
+ level_tensor[i] = torch.cat(var, dim=0)
+ else:
+ level_tensor[i] = place_hold_var
+
+
+class BatchYOLOv7Assigner(nn.Module):
+ """Batch YOLOv7 Assigner.
+
+ It consists of two assigning steps:
+
+ 1. YOLOv5 cross-grid sample assigning
+ 2. SimOTA assigning
+
+ This code is referenced from
+ https://github.com/WongKinYiu/yolov7/blob/main/utils/loss.py.
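+
+ A runnable sketch of the expected inputs (the shapes follow the
+ comments in ``forward`` and are illustrative assumptions):
+
+ Example:
+ >>> import torch
+ >>> assigner = BatchYOLOv7Assigner(
+ ... num_classes=2, num_base_priors=3,
+ ... featmap_strides=[8, 16, 32])
+ >>> # one prediction map per level:
+ >>> # (batch, num_base_priors, h, w, 5 + num_classes)
+ >>> pred_results = [
+ ... torch.rand(1, 3, 640 // s, 640 // s, 7) for s in (8, 16, 32)]
+ >>> # (num_base_priors, num_gt, 7); an empty gt tensor exercises
+ >>> # the early-return branch, so priors/offsets may be None here
+ >>> batch_targets_normed = torch.zeros(3, 0, 7)
+ >>> out = assigner(pred_results, batch_targets_normed,
+ ... batch_input_shape=(640, 640),
+ ... priors_base_sizes=None, grid_offset=None)
+ >>> len(out['mlvl_positive_infos'])
+ 3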
+ """
+
+ def __init__(self,
+ num_classes: int,
+ num_base_priors: int,
+ featmap_strides: Sequence[int],
+ prior_match_thr: float = 4.0,
+ candidate_topk: int = 10,
+ iou_weight: float = 3.0,
+ cls_weight: float = 1.0):
+ super().__init__()
+ self.num_classes = num_classes
+ self.num_base_priors = num_base_priors
+ self.featmap_strides = featmap_strides
+ # yolov5 param
+ self.prior_match_thr = prior_match_thr
+ # simota param
+ self.candidate_topk = candidate_topk
+ self.iou_weight = iou_weight
+ self.cls_weight = cls_weight
+
+ @torch.no_grad()
+ def forward(self,
+ pred_results,
+ batch_targets_normed,
+ batch_input_shape,
+ priors_base_sizes,
+ grid_offset,
+ near_neighbor_thr=0.5) -> dict:
+ # (num_base_priors, num_batch_gt, 7)
+ # 7 means (batch_idx, cls_id, x_norm, y_norm,
+ # w_norm, h_norm, prior_idx)
+
+ # mlvl means multi_level
+ if batch_targets_normed.shape[1] == 0:
+ # empty gt of batch
+ num_levels = len(pred_results)
+ return dict(
+ mlvl_positive_infos=[pred_results[0].new_empty(
+ (0, 4))] * num_levels,
+ # `[] * num_levels` would just be `[]`; build one
+ # placeholder entry per level instead
+ mlvl_priors=[[] for _ in range(num_levels)],
+ mlvl_targets_normed=[[] for _ in range(num_levels)])
+
+ # near_neighbor_thr = 0.5 means the nearest
+ # 3 neighbors are also considered positive samples;
+ # near_neighbor_thr = 1.0 means the nearest
+ # 5 neighbors are also considered positive samples.
+ mlvl_positive_infos, mlvl_priors = self.yolov5_assigner(
+ pred_results,
+ batch_targets_normed,
+ priors_base_sizes,
+ grid_offset,
+ near_neighbor_thr=near_neighbor_thr)
+
+ mlvl_positive_infos, mlvl_priors, \
+ mlvl_targets_normed = self.simota_assigner(
+ pred_results, batch_targets_normed, mlvl_positive_infos,
+ mlvl_priors, batch_input_shape)
+
+ place_hold_var = batch_targets_normed.new_empty((0, 4))
+ _cat_multi_level_tensor_in_place(
+ mlvl_positive_infos,
+ mlvl_priors,
+ mlvl_targets_normed,
+ place_hold_var=place_hold_var)
+
+ return dict(
+ mlvl_positive_infos=mlvl_positive_infos,
+ mlvl_priors=mlvl_priors,
+ mlvl_targets_normed=mlvl_targets_normed)
+
+ def yolov5_assigner(self,
+ pred_results,
+ batch_targets_normed,
+ priors_base_sizes,
+ grid_offset,
+ near_neighbor_thr=0.5):
+ num_batch_gts = batch_targets_normed.shape[1]
+ assert num_batch_gts > 0
+
+ mlvl_positive_infos, mlvl_priors = [], []
+
+ scaled_factor = torch.ones(7, device=pred_results[0].device)
+ for i in range(len(pred_results)): # level
+ priors_base_sizes_i = priors_base_sizes[i]
+ # (1, 1, feat_shape_w, feat_shape_h, feat_shape_w, feat_shape_h, 1)
+ scaled_factor[2:6] = torch.tensor(
+ pred_results[i].shape)[[3, 2, 3, 2]]
+
+ # Scale batch_targets from the 0-1 range to the feature map size.
+ # (num_base_priors, num_batch_gts, 7)
+ batch_targets_scaled = batch_targets_normed * scaled_factor
+
+ # Shape match
+ wh_ratio = batch_targets_scaled[...,
+ 4:6] / priors_base_sizes_i[:, None]
+ match_inds = torch.max(
+ wh_ratio, 1. / wh_ratio).max(2)[0] < self.prior_match_thr
+ batch_targets_scaled = batch_targets_scaled[
+ match_inds] # (num_matched_target, 7)
+
+ # no gt bbox matches anchor
+ if batch_targets_scaled.shape[0] == 0:
+ mlvl_positive_infos.append(
+ batch_targets_scaled.new_empty((0, 4)))
+ mlvl_priors.append([])
+ continue
+
+ # Positive samples with additional neighbors
+ batch_targets_cxcy = batch_targets_scaled[:, 2:4]
+ grid_xy = scaled_factor[[2, 3]] - batch_targets_cxcy
+ left, up = ((batch_targets_cxcy % 1 < near_neighbor_thr) &
+ (batch_targets_cxcy > 1)).T
+ right, bottom = ((grid_xy % 1 < near_neighbor_thr) &
+ (grid_xy > 1)).T
+ offset_inds = torch.stack(
+ (torch.ones_like(left), left, up, right, bottom))
+ batch_targets_scaled = batch_targets_scaled.repeat(
+ (5, 1, 1))[offset_inds] # (num_matched_target, 7)
+ retained_offsets = grid_offset.repeat(1, offset_inds.shape[1],
+ 1)[offset_inds]
+
+ # batch_targets_scaled: (num_matched_target, 7)
+ # 7 means (batch_idx, cls_id, x_scaled,
+ # y_scaled, w_scaled, h_scaled, prior_idx)
+
+ # mlvl_positive_info: (num_matched_target, 4)
+ # 4 means (batch_idx, prior_idx, x_scaled, y_scaled)
+ mlvl_positive_info = batch_targets_scaled[:, [0, 6, 2, 3]]
+ retained_offsets = retained_offsets * near_neighbor_thr
+ mlvl_positive_info[:,
+ 2:] = mlvl_positive_info[:,
+ 2:] - retained_offsets
+ mlvl_positive_info[:, 2].clamp_(0, scaled_factor[2] - 1)
+ mlvl_positive_info[:, 3].clamp_(0, scaled_factor[3] - 1)
+ mlvl_positive_info = mlvl_positive_info.long()
+ priors_inds = mlvl_positive_info[:, 1]
+
+ mlvl_positive_infos.append(mlvl_positive_info)
+ mlvl_priors.append(priors_base_sizes_i[priors_inds])
+
+ return mlvl_positive_infos, mlvl_priors
+
+ def simota_assigner(self, pred_results, batch_targets_normed,
+ mlvl_positive_infos, mlvl_priors, batch_input_shape):
+ num_batch_gts = batch_targets_normed.shape[1]
+ assert num_batch_gts > 0
+ num_levels = len(mlvl_positive_infos)
+
+ mlvl_positive_infos_matched = [[] for _ in range(num_levels)]
+ mlvl_priors_matched = [[] for _ in range(num_levels)]
+ mlvl_targets_normed_matched = [[] for _ in range(num_levels)]
+
+ for batch_idx in range(pred_results[0].shape[0]):
+ # (num_batch_gt, 7)
+ # 7 means (batch_idx, cls_id, x_norm, y_norm,
+ # w_norm, h_norm, prior_idx)
+ targets_normed = batch_targets_normed[0]
+ # (num_gt, 7)
+ targets_normed = targets_normed[targets_normed[:, 0] == batch_idx]
+ num_gts = targets_normed.shape[0]
+
+ if num_gts == 0:
+ continue
+
+ _mlvl_decoded_bboxes = []
+ _mlvl_obj_cls = []
+ _mlvl_priors = []
+ _mlvl_positive_infos = []
+ _from_which_layer = []
+
+ for i, head_pred in enumerate(pred_results):
+ # (num_matched_target, 4)
+ # 4 means (batch_idx, prior_idx, grid_x, grid_y)
+ _mlvl_positive_info = mlvl_positive_infos[i]
+ if _mlvl_positive_info.shape[0] == 0:
+ continue
+
+ idx = (_mlvl_positive_info[:, 0] == batch_idx)
+ _mlvl_positive_info = _mlvl_positive_info[idx]
+ _mlvl_positive_infos.append(_mlvl_positive_info)
+
+ priors = mlvl_priors[i][idx]
+ _mlvl_priors.append(priors)
+
+ _from_which_layer.append(
+ torch.ones(size=(_mlvl_positive_info.shape[0], )) * i)
+
+ # (n,85)
+ level_batch_idx, prior_ind, \
+ grid_x, grid_y = _mlvl_positive_info.T
+ pred_positive = head_pred[level_batch_idx, prior_ind, grid_y,
+ grid_x]
+ _mlvl_obj_cls.append(pred_positive[:, 4:])
+
+ # decoded
+ grid = torch.stack([grid_x, grid_y], dim=1)
+ pred_positive_cxcy = (pred_positive[:, :2].sigmoid() * 2. -
+ 0.5 + grid) * self.featmap_strides[i]
+ pred_positive_wh = (pred_positive[:, 2:4].sigmoid() * 2) ** 2 \
+ * priors * self.featmap_strides[i]
+ pred_positive_xywh = torch.cat(
+ [pred_positive_cxcy, pred_positive_wh], dim=-1)
+ _mlvl_decoded_bboxes.append(pred_positive_xywh)
+
+ # 1 calc pair_wise_iou_loss
+ _mlvl_decoded_bboxes = torch.cat(_mlvl_decoded_bboxes, dim=0)
+ num_pred_positive = _mlvl_decoded_bboxes.shape[0]
+ if num_pred_positive == 0:
+ continue
+
+ # scaled xywh
+ batch_input_shape_wh = pred_results[0].new_tensor(
+ batch_input_shape[::-1]).repeat((1, 2))
+ targets_scaled_bbox = targets_normed[:, 2:6] * batch_input_shape_wh
+
+ targets_scaled_bbox = bbox_cxcywh_to_xyxy(targets_scaled_bbox)
+ _mlvl_decoded_bboxes = bbox_cxcywh_to_xyxy(_mlvl_decoded_bboxes)
+ pair_wise_iou = bbox_overlaps(targets_scaled_bbox,
+ _mlvl_decoded_bboxes)
+ pair_wise_iou_loss = -torch.log(pair_wise_iou + 1e-8)
+
+ # 2 calc pair_wise_cls_loss
+ _mlvl_obj_cls = torch.cat(_mlvl_obj_cls, dim=0).float().sigmoid()
+ _mlvl_positive_infos = torch.cat(_mlvl_positive_infos, dim=0)
+ _from_which_layer = torch.cat(_from_which_layer, dim=0)
+ _mlvl_priors = torch.cat(_mlvl_priors, dim=0)
+
+ gt_cls_per_image = (
+ F.one_hot(targets_normed[:, 1].to(torch.int64),
+ self.num_classes).float().unsqueeze(1).repeat(
+ 1, num_pred_positive, 1))
+ # cls_score * obj
+ cls_preds_ = _mlvl_obj_cls[:, 1:]\
+ .unsqueeze(0)\
+ .repeat(num_gts, 1, 1) \
+ * _mlvl_obj_cls[:, 0:1]\
+ .unsqueeze(0).repeat(num_gts, 1, 1)
+ y = cls_preds_.sqrt_()
+ pair_wise_cls_loss = F.binary_cross_entropy_with_logits(
+ torch.log(y / (1 - y)), gt_cls_per_image,
+ reduction='none').sum(-1)
+ del cls_preds_
+
+ # calc cost
+ cost = (
+ self.cls_weight * pair_wise_cls_loss +
+ self.iou_weight * pair_wise_iou_loss)
+
+ # num_gt, num_match_pred
+ matching_matrix = torch.zeros_like(cost)
+
+ top_k, _ = torch.topk(
+ pair_wise_iou,
+ min(self.candidate_topk, pair_wise_iou.shape[1]),
+ dim=1)
+ dynamic_ks = torch.clamp(top_k.sum(1).int(), min=1)
+
+ # Select only topk matches per gt
+ for gt_idx in range(num_gts):
+ _, pos_idx = torch.topk(
+ cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False)
+ matching_matrix[gt_idx][pos_idx] = 1.0
+ del top_k, dynamic_ks
+
+ # Each prediction box can match at most one gt box,
+ # and if there are more than one,
+ # only the least costly one can be taken
+ anchor_matching_gt = matching_matrix.sum(0)
+ if (anchor_matching_gt > 1).sum() > 0:
+ _, cost_argmin = torch.min(
+ cost[:, anchor_matching_gt > 1], dim=0)
+ matching_matrix[:, anchor_matching_gt > 1] *= 0.0
+ matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0
+ fg_mask_inboxes = matching_matrix.sum(0) > 0.0
+ matched_gt_inds = matching_matrix[:, fg_mask_inboxes].argmax(0)
+
+ targets_normed = targets_normed[matched_gt_inds]
+ _mlvl_positive_infos = _mlvl_positive_infos[fg_mask_inboxes]
+ _from_which_layer = _from_which_layer[fg_mask_inboxes]
+ _mlvl_priors = _mlvl_priors[fg_mask_inboxes]
+
+ # Rearrange in the order of the prediction layers
+ # to facilitate loss computation
+ for i in range(num_levels):
+ layer_idx = _from_which_layer == i
+ mlvl_positive_infos_matched[i].append(
+ _mlvl_positive_infos[layer_idx])
+ mlvl_priors_matched[i].append(_mlvl_priors[layer_idx])
+ mlvl_targets_normed_matched[i].append(
+ targets_normed[layer_idx])
+
+ results = mlvl_positive_infos_matched, \
+ mlvl_priors_matched, \
+ mlvl_targets_normed_matched
+ return results
diff --git a/mmyolo/utils/boxam_utils.py b/mmyolo/utils/boxam_utils.py
new file mode 100644
index 000000000..5e1ec9134
--- /dev/null
+++ b/mmyolo/utils/boxam_utils.py
@@ -0,0 +1,504 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import bisect
+import copy
+import warnings
+from pathlib import Path
+from typing import Callable, List, Optional, Tuple, Union
+
+import cv2
+import numpy as np
+import torch
+import torch.nn as nn
+import torchvision
+from mmcv.transforms import Compose
+from mmdet.evaluation import get_classes
+from mmdet.models import build_detector
+from mmdet.utils import ConfigType
+from mmengine.config import Config
+from mmengine.runner import load_checkpoint
+from mmengine.structures import InstanceData
+from torch import Tensor
+
+try:
+ from pytorch_grad_cam import (AblationCAM, AblationLayer,
+ ActivationsAndGradients)
+ from pytorch_grad_cam import GradCAM as Base_GradCAM
+ from pytorch_grad_cam import GradCAMPlusPlus as Base_GradCAMPlusPlus
+ from pytorch_grad_cam.base_cam import BaseCAM
+ from pytorch_grad_cam.utils.image import scale_cam_image, show_cam_on_image
+ from pytorch_grad_cam.utils.svd_on_activations import get_2d_projection
+except ImportError:
+ pass
+
+
+def init_detector(
+ config: Union[str, Path, Config],
+ checkpoint: Optional[str] = None,
+ palette: str = 'coco',
+ device: str = 'cuda:0',
+ cfg_options: Optional[dict] = None,
+) -> nn.Module:
+ """Initialize a detector from config file.
+
+ Args:
+ config (str, :obj:`Path`, or :obj:`mmengine.Config`): Config file path,
+ :obj:`Path`, or the config object.
+ checkpoint (str, optional): Checkpoint path. If left as None, the model
+ will not load any weights.
+ palette (str): Color palette used for visualization. If palette
+ is stored in checkpoint, use checkpoint's palette first, otherwise
+ use externally passed palette. Currently, supports 'coco', 'voc',
+ 'citys' and 'random'. Defaults to coco.
+ device (str): The device where the anchors will be put on.
+ Defaults to cuda:0.
+ cfg_options (dict, optional): Options to override some settings in
+ the used config.
+
+ Returns:
+ nn.Module: The constructed detector.
+ """
+ if isinstance(config, (str, Path)):
+ config = Config.fromfile(config)
+ elif not isinstance(config, Config):
+ raise TypeError('config must be a filename or Config object, '
+ f'but got {type(config)}')
+ if cfg_options is not None:
+ config.merge_from_dict(cfg_options)
+ elif 'init_cfg' in config.model.backbone:
+ config.model.backbone.init_cfg = None
+
+ # The only change from mmdet's `init_detector`: keep `train_cfg`,
+ # because grad-based methods need the loss (training) forward path.
+ # config.model.train_cfg = None
+
+ model = build_detector(config.model)
+ if checkpoint is not None:
+ checkpoint = load_checkpoint(model, checkpoint, map_location='cpu')
+ # Weights converted from elsewhere may not have meta fields.
+ checkpoint_meta = checkpoint.get('meta', {})
+ # save the dataset_meta in the model for convenience
+ if 'dataset_meta' in checkpoint_meta:
+ # mmdet 3.x
+ model.dataset_meta = checkpoint_meta['dataset_meta']
+ elif 'CLASSES' in checkpoint_meta:
+ # < mmdet 3.x
+ classes = checkpoint_meta['CLASSES']
+ model.dataset_meta = {'CLASSES': classes, 'PALETTE': palette}
+ else:
+ warnings.simplefilter('once')
+ warnings.warn(
+ 'dataset_meta or class names are not saved in the '
+ 'checkpoint\'s meta data, use COCO classes by default.')
+ model.dataset_meta = {
+ 'CLASSES': get_classes('coco'),
+ 'PALETTE': palette
+ }
+
+ model.cfg = config # save the config in the model for convenience
+ model.to(device)
+ model.eval()
+ return model
+
+
+def reshape_transform(feats: Union[Tensor, List[Tensor]],
+ max_shape: Tuple[int, int] = (20, 20),
+ is_need_grad: bool = False):
+ """Reshape and aggregate feature maps when the input is a multi-layer
+ feature map.
+
+ Takes these tensors with different sizes, resizes them to a common shape,
+ and concatenates them.
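+
+ A runnable sketch (the sizes are illustrative assumptions):
+
+ Example:
+ >>> import torch
+ >>> feats = [torch.rand(1, 8, 40, 40), torch.rand(1, 8, 20, 20)]
+ >>> reshape_transform(feats, max_shape=(20, 20)).shape
+ torch.Size([1, 16, 20, 20])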
+ """
+ if len(max_shape) == 1:
+ max_shape = max_shape * 2
+
+ if isinstance(feats, torch.Tensor):
+ feats = [feats]
+ else:
+ if is_need_grad:
+ raise NotImplementedError('The `grad_base` method does not '
+ 'support outputting multiple activation layers')
+
+ max_h = max([im.shape[-2] for im in feats])
+ max_w = max([im.shape[-1] for im in feats])
+ if -1 in max_shape:
+ max_shape = (max_h, max_w)
+ else:
+ max_shape = (min(max_h, max_shape[0]), min(max_w, max_shape[1]))
+
+ activations = []
+ for feat in feats:
+ activations.append(
+ torch.nn.functional.interpolate(
+ torch.abs(feat), max_shape, mode='bilinear'))
+
+ activations = torch.cat(activations, axis=1)
+ return activations
+
+
+class BoxAMDetectorWrapper(nn.Module):
+ """Wrap the mmdet model class to facilitate handling of non-tensor
+ situations during inference."""
+
+ def __init__(self,
+ cfg: ConfigType,
+ checkpoint: str,
+ score_thr: float,
+ device: str = 'cuda:0'):
+ super().__init__()
+ self.cfg = cfg
+ self.device = device
+ self.score_thr = score_thr
+ self.checkpoint = checkpoint
+ self.detector = init_detector(self.cfg, self.checkpoint, device=device)
+
+ pipeline_cfg = copy.deepcopy(self.cfg.test_dataloader.dataset.pipeline)
+ pipeline_cfg[0].type = 'mmdet.LoadImageFromNDArray'
+
+ new_test_pipeline = []
+ for pipeline in pipeline_cfg:
+ if not pipeline['type'].endswith('LoadAnnotations'):
+ new_test_pipeline.append(pipeline)
+ self.test_pipeline = Compose(new_test_pipeline)
+
+ self.is_need_loss = False
+ self.input_data = None
+ self.image = None
+
+ def need_loss(self, is_need_loss: bool):
+ """Grad-based methods require loss."""
+ self.is_need_loss = is_need_loss
+
+ def set_input_data(self,
+ image: np.ndarray,
+ pred_instances: Optional[InstanceData] = None):
+ """Set the input data to be used in the next step."""
+ self.image = image
+
+ if self.is_need_loss:
+ assert pred_instances is not None
+ pred_instances = pred_instances.numpy()
+ data = dict(
+ img=self.image,
+ img_id=0,
+ gt_bboxes=pred_instances.bboxes,
+ gt_bboxes_labels=pred_instances.labels)
+ data = self.test_pipeline(data)
+ else:
+ data = dict(img=self.image, img_id=0)
+ data = self.test_pipeline(data)
+ data['inputs'] = [data['inputs']]
+ data['data_samples'] = [data['data_samples']]
+ self.input_data = data
+
+ def __call__(self, *args, **kwargs):
+ assert self.input_data is not None
+ if self.is_need_loss:
+ # Maybe this is a direction that can be optimized
+ # self.detector.init_weights()
+
+ if hasattr(self.detector.bbox_head, 'featmap_sizes'):
+ # Prevent the model algorithm error when calculating loss
+ self.detector.bbox_head.featmap_sizes = None
+
+ data_ = {}
+ data_['inputs'] = [self.input_data['inputs']]
+ data_['data_samples'] = [self.input_data['data_samples']]
+ data = self.detector.data_preprocessor(data_, training=False)
+ loss = self.detector._run_forward(data, mode='loss')
+
+ if hasattr(self.detector.bbox_head, 'featmap_sizes'):
+ self.detector.bbox_head.featmap_sizes = None
+
+ return [loss]
+ else:
+ with torch.no_grad():
+ results = self.detector.test_step(self.input_data)
+ return results
+
+
+class BoxAMDetectorVisualizer:
+ """Box AM visualization class."""
+
+ def __init__(self,
+ method_class,
+ model: nn.Module,
+ target_layers: List,
+ reshape_transform: Optional[Callable] = None,
+ is_need_grad: bool = False,
+ extra_params: Optional[dict] = None):
+ self.target_layers = target_layers
+ self.reshape_transform = reshape_transform
+ self.is_need_grad = is_need_grad
+
+ if method_class.__name__ == 'AblationCAM':
+ batch_size = extra_params.get('batch_size', 1)
+ ratio_channels_to_ablate = extra_params.get(
+ 'ratio_channels_to_ablate', 1.)
+ self.cam = AblationCAM(
+ model,
+ target_layers,
+ use_cuda=True if 'cuda' in model.device else False,
+ reshape_transform=reshape_transform,
+ batch_size=batch_size,
+ ablation_layer=extra_params['ablation_layer'],
+ ratio_channels_to_ablate=ratio_channels_to_ablate)
+ else:
+ self.cam = method_class(
+ model,
+ target_layers,
+ use_cuda=True if 'cuda' in model.device else False,
+ reshape_transform=reshape_transform,
+ )
+ if self.is_need_grad:
+ self.cam.activations_and_grads.release()
+
+ self.classes = model.detector.dataset_meta['CLASSES']
+ self.COLORS = np.random.uniform(0, 255, size=(len(self.classes), 3))
+
+ def switch_activations_and_grads(self, model) -> None:
+ """In the grad-based method, we need to switch
+ ``ActivationsAndGradients`` layer, otherwise an error will occur."""
+ self.cam.model = model
+
+ if self.is_need_grad is True:
+ self.cam.activations_and_grads = ActivationsAndGradients(
+ model, self.target_layers, self.reshape_transform)
+ self.is_need_grad = False
+ else:
+ self.cam.activations_and_grads.release()
+ self.is_need_grad = True
+
+ def __call__(self, img, targets, aug_smooth=False, eigen_smooth=False):
+ img = torch.from_numpy(img)[None].permute(0, 3, 1, 2)
+ return self.cam(img, targets, aug_smooth, eigen_smooth)[0, :]
+
+ def show_am(self,
+ image: np.ndarray,
+ pred_instance: InstanceData,
+ grayscale_am: np.ndarray,
+ with_norm_in_bboxes: bool = False):
+ """Normalize the AM to be in the range [0, 1] inside every bounding
+ boxes, and zero outside of the bounding boxes."""
+
+ boxes = pred_instance.bboxes
+ labels = pred_instance.labels
+
+ if with_norm_in_bboxes is True:
+ boxes = boxes.astype(np.int32)
+ renormalized_am = np.zeros(grayscale_am.shape, dtype=np.float32)
+ images = []
+ for x1, y1, x2, y2 in boxes:
+ img = renormalized_am * 0
+ img[y1:y2, x1:x2] = scale_cam_image(
+ [grayscale_am[y1:y2, x1:x2].copy()])[0]
+ images.append(img)
+
+ renormalized_am = np.max(np.float32(images), axis=0)
+ renormalized_am = scale_cam_image([renormalized_am])[0]
+ else:
+ renormalized_am = grayscale_am
+
+ am_image_renormalized = show_cam_on_image(
+ image / 255, renormalized_am, use_rgb=False)
+
+ image_with_bounding_boxes = self._draw_boxes(
+ boxes, labels, am_image_renormalized, pred_instance.get('scores'))
+ return image_with_bounding_boxes
+
+ def _draw_boxes(self,
+ boxes: List,
+ labels: List,
+ image: np.ndarray,
+ scores: Optional[List] = None):
+ """draw boxes on image."""
+ for i, box in enumerate(boxes):
+ label = labels[i]
+ color = self.COLORS[label]
+ cv2.rectangle(image, (int(box[0]), int(box[1])),
+ (int(box[2]), int(box[3])), color, 2)
+ if scores is not None:
+ score = scores[i]
+ text = str(self.classes[label]) + ': ' + str(
+ round(score * 100, 1))
+ else:
+ text = self.classes[label]
+
+ cv2.putText(
+ image,
+ text, (int(box[0]), int(box[1] - 5)),
+ cv2.FONT_HERSHEY_SIMPLEX,
+ 0.5,
+ color,
+ 1,
+ lineType=cv2.LINE_AA)
+ return image
+
+
+class DetAblationLayer(AblationLayer):
+ """Det AblationLayer."""
+
+ def __init__(self):
+ super().__init__()
+ self.activations = None
+
+ def set_next_batch(self, input_batch_index, activations,
+ num_channels_to_ablate):
+ """Extract the next batch member from activations, and repeat it
+ num_channels_to_ablate times."""
+ if isinstance(activations, torch.Tensor):
+ return super().set_next_batch(input_batch_index, activations,
+ num_channels_to_ablate)
+
+ self.activations = []
+ for activation in activations:
+ activation = activation[
+ input_batch_index, :, :, :].clone().unsqueeze(0)
+ self.activations.append(
+ activation.repeat(num_channels_to_ablate, 1, 1, 1))
+
+ def __call__(self, x):
+ """Go over the activation indices to be ablated, stored in
+ self.indices."""
+ result = self.activations
+
+ if isinstance(result, torch.Tensor):
+ return super().__call__(x)
+
+ channel_cumsum = np.cumsum([r.shape[1] for r in result])
+ num_channels_to_ablate = result[0].size(0) # batch
+ for i in range(num_channels_to_ablate):
+ pyramid_layer = bisect.bisect_right(channel_cumsum,
+ self.indices[i])
+ if pyramid_layer > 0:
+ index_in_pyramid_layer = self.indices[i] - channel_cumsum[
+ pyramid_layer - 1]
+ else:
+ index_in_pyramid_layer = self.indices[i]
+ result[pyramid_layer][i, index_in_pyramid_layer, :, :] = -1000
+ return result
+
+
+class DetBoxScoreTarget:
+ """Det Score calculation class.
+
+ In the case of the grad-free method, for every original detected
+ bounding box specified in "bboxes", assign a score on how well the
+ current bounding boxes match it, based on:
+
+ 1. the bbox IoU;
+ 2. the classification score;
+ 3. the mask IoU, if ``segms`` exist.
+
+ If there is not a large enough overlap, or the category changed,
+ assign a score of 0. The total score is the sum of all the box scores.
+
+ In the case of the grad-based method, the calculation method is
+ the sum of losses after excluding a specific key.
+ """
+
+ def __init__(self,
+ pred_instance: InstanceData,
+ match_iou_thr: float = 0.5,
+ device: str = 'cuda:0',
+ ignore_loss_params: Optional[List] = None):
+ self.focal_bboxes = pred_instance.bboxes
+ self.focal_labels = pred_instance.labels
+ self.match_iou_thr = match_iou_thr
+ self.device = device
+ self.ignore_loss_params = ignore_loss_params
+ if ignore_loss_params is not None:
+ assert isinstance(self.ignore_loss_params, list)
+
+ def __call__(self, results):
+ output = torch.tensor([0.], device=self.device)
+
+ if 'loss_cls' in results:
+ # grad-based method
+ # results is dict
+ for loss_key, loss_value in results.items():
+ # guard against `ignore_loss_params=None`
+ if 'loss' not in loss_key or \
+ (self.ignore_loss_params is not None
+ and loss_key in self.ignore_loss_params):
+ continue
+ if isinstance(loss_value, list):
+ output += sum(loss_value)
+ else:
+ output += loss_value
+ return output
+ else:
+ # grad-free method
+ # results is DetDataSample
+ pred_instances = results.pred_instances
+ if len(pred_instances) == 0:
+ return output
+
+ pred_bboxes = pred_instances.bboxes
+ pred_scores = pred_instances.scores
+ pred_labels = pred_instances.labels
+
+ for focal_box, focal_label in zip(self.focal_bboxes,
+ self.focal_labels):
+ ious = torchvision.ops.box_iou(focal_box[None],
+ pred_bboxes[..., :4])
+ index = ious.argmax()
+ if ious[0, index] > self.match_iou_thr and pred_labels[
+ index] == focal_label:
+ # TODO: Adaptive adjustment of weights based on algorithms
+ score = ious[0, index] + pred_scores[index]
+ output = output + score
+ return output
+
+
+class SpatialBaseCAM(BaseCAM):
+ """CAM that maintains spatial information.
+
+ Gradients are often averaged over the spatial dimension in CAM
+ visualization for classification, but this is unreasonable in detection
+ tasks. There is no need to average the gradients in the detection task.
+ """
+
+ def get_cam_image(self,
+ input_tensor: torch.Tensor,
+ target_layer: torch.nn.Module,
+ targets: List[torch.nn.Module],
+ activations: torch.Tensor,
+ grads: torch.Tensor,
+ eigen_smooth: bool = False) -> np.ndarray:
+
+ weights = self.get_cam_weights(input_tensor, target_layer, targets,
+ activations, grads)
+ weighted_activations = weights * activations
+ if eigen_smooth:
+ cam = get_2d_projection(weighted_activations)
+ else:
+ cam = weighted_activations.sum(axis=1)
+ return cam
+
+
+class GradCAM(SpatialBaseCAM, Base_GradCAM):
+ """Gradients are no longer averaged over the spatial dimension."""
+
+ def get_cam_weights(self, input_tensor, target_layer, target_category,
+ activations, grads):
+ return grads
+
+
+class GradCAMPlusPlus(SpatialBaseCAM, Base_GradCAMPlusPlus):
+ """Gradients are no longer averaged over the spatial dimension."""
+
+ def get_cam_weights(self, input_tensor, target_layers, target_category,
+ activations, grads):
+ grads_power_2 = grads**2
+ grads_power_3 = grads_power_2 * grads
+ # Equation 19 in https://arxiv.org/abs/1710.11063
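+ # i.e. alpha = g**2 / (2 * g**2 + (sum_ab A_ab) * g**3 + eps)
+ # with g = dY/dA; eps keeps the denominator non-zero.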
+ sum_activations = np.sum(activations, axis=(2, 3))
+ eps = 0.000001
+ aij = grads_power_2 / (
+ 2 * grads_power_2 +
+ sum_activations[:, :, None, None] * grads_power_3 + eps)
+ # Now bring back the ReLU from eq.7 in the paper,
+ # And zero out aijs where the activations are 0
+ aij = np.where(grads != 0, aij, 0)
+
+ weights = np.maximum(grads, 0) * aij
+ return weights
diff --git a/mmyolo/utils/labelme_utils.py b/mmyolo/utils/labelme_utils.py
new file mode 100644
index 000000000..3bfc65029
--- /dev/null
+++ b/mmyolo/utils/labelme_utils.py
@@ -0,0 +1,91 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import json
+
+from mmengine.structures import InstanceData
+
+
+class LabelmeFormat:
+ """Predict results save into labelme file.
+
+ Based on https://github.com/wkentaro/labelme/blob/main/labelme/label_file.py
+
+ Args:
+ classes (tuple): Model class names.
+ """
+
+ def __init__(self, classes: tuple):
+ super().__init__()
+ self.classes = classes
+
+ def __call__(self, pred_instances: InstanceData, metainfo: dict,
+ output_path: str, selected_classes: list):
+ """Get image data field for labelme.
+
+ Args:
+ pred_instances (InstanceData): Candidate prediction info.
+ metainfo (dict): Meta info of prediction.
+ output_path (str): Output labelme json file path.
+ selected_classes (list): Selected class name.
+
+ Example labelme file:
+ {
+ "version": "5.0.5",
+ "flags": {},
+ "imagePath": "/data/cat/1.jpg",
+ "imageData": null,
+ "imageHeight": 3000,
+ "imageWidth": 4000,
+ "shapes": [
+ {
+ "label": "cat",
+ "points": [
+ [
+ 1148.076923076923,
+ 1188.4615384615383
+ ],
+ [
+ 2471.1538461538457,
+ 2176.923076923077
+ ]
+ ],
+ "group_id": null,
+ "shape_type": "rectangle",
+ "flags": {}
+ },
+ {...}
+ ]
+ }
+ """
+
+ image_path = metainfo['img_path']
+
+ json_info = {
+ 'version': '5.0.5',
+ 'flags': {},
+ 'imagePath': image_path,
+ 'imageData': None,
+ 'imageHeight': metainfo['ori_shape'][0],
+ 'imageWidth': metainfo['ori_shape'][1],
+ 'shapes': []
+ }
+
+ for pred_instance in pred_instances:
+ pred_bbox = pred_instance.bboxes.cpu().numpy().tolist()[0]
+ pred_label = self.classes[pred_instance.labels]
+
+ if selected_classes is not None and \
+ pred_label not in selected_classes:
+ # filter class name
+ continue
+
+ sub_dict = {
+ 'label': pred_label,
+ 'points': [pred_bbox[:2], pred_bbox[2:]],
+ 'group_id': None,
+ 'shape_type': 'rectangle',
+ 'flags': {}
+ }
+ json_info['shapes'].append(sub_dict)
+
+ with open(output_path, 'w', encoding='utf-8') as f_json:
+ json.dump(json_info, f_json, ensure_ascii=False, indent=2)
diff --git a/mmyolo/utils/large_image.py b/mmyolo/utils/large_image.py
new file mode 100644
index 000000000..68c6938e5
--- /dev/null
+++ b/mmyolo/utils/large_image.py
@@ -0,0 +1,76 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Sequence, Tuple
+
+from mmcv.ops import batched_nms
+from mmdet.structures import DetDataSample, SampleList
+from mmengine.structures import InstanceData
+
+
+def shift_predictions(det_data_samples: SampleList,
+ offsets: Sequence[Tuple[int, int]],
+ src_image_shape: Tuple[int, int]) -> SampleList:
+ """Shift predictions to the original image.
+
+ Args:
+ det_data_samples (List[:obj:`DetDataSample`]): A list of patch results.
+ offsets (Sequence[Tuple[int, int]]): Positions of the left top points
+ of patches.
+ src_image_shape (Tuple[int, int]): The (height, width) of the
+ large image.
+ Returns:
+ (List[:obj:`DetDataSample`]): shifted results.
+ """
+ try:
+ from sahi.slicing import shift_bboxes, shift_masks
+ except ImportError:
+ raise ImportError('Please run "pip install -U sahi" '
+ 'to install sahi first for large image inference.')
+
+ assert len(det_data_samples) == len(
+ offsets), 'The `det_data_samples` should have the same length as `offsets`.'
+ shifted_predictions = []
+ for det_data_sample, offset in zip(det_data_samples, offsets):
+ pred_inst = det_data_sample.pred_instances.clone()
+
+ # shift bboxes and masks
+ pred_inst.bboxes = shift_bboxes(pred_inst.bboxes, offset)
+ if 'masks' in det_data_sample:
+ pred_inst.masks = shift_masks(pred_inst.masks, offset,
+ src_image_shape)
+
+ shifted_predictions.append(pred_inst.clone())
+
+ shifted_predictions = InstanceData.cat(shifted_predictions)
+
+ return shifted_predictions
+
+
+def merge_results_by_nms(results: SampleList, offsets: Sequence[Tuple[int,
+ int]],
+ src_image_shape: Tuple[int, int],
+ nms_cfg: dict) -> DetDataSample:
+ """Merge patch results by nms.
+
+ Args:
+ results (List[:obj:`DetDataSample`]): A list of patch results.
+ offsets (Sequence[Tuple[int, int]]): Positions of the left top points
+ of patches.
+ src_image_shape (Tuple[int, int]): The (height, width) of the
+ large image.
+ nms_cfg (dict): It should specify the nms type and other
+ parameters such as `iou_threshold`.
+ Returns:
+ :obj:`DetDataSample`: merged results.
+ """
+ shifted_instances = shift_predictions(results, offsets, src_image_shape)
+
+ _, keeps = batched_nms(
+ boxes=shifted_instances.bboxes,
+ scores=shifted_instances.scores,
+ idxs=shifted_instances.labels,
+ nms_cfg=nms_cfg)
+ merged_instances = shifted_instances[keeps]
+
+ merged_result = results[0].clone()
+ merged_result.pred_instances = merged_instances
+ return merged_result
diff --git a/mmyolo/utils/misc.py b/mmyolo/utils/misc.py
index dbc2a62e7..5b5dd5d20 100644
--- a/mmyolo/utils/misc.py
+++ b/mmyolo/utils/misc.py
@@ -5,6 +5,7 @@
import numpy as np
import torch
from mmengine.utils import scandir
+from prettytable import PrettyTable
from mmyolo.models import RepVGGBlock
@@ -90,3 +91,26 @@ def get_file_list(source_root: str) -> [list, dict]:
source_type = dict(is_dir=is_dir, is_url=is_url, is_file=is_file)
return source_file_path_list, source_type
+
+
+def show_data_classes(data_classes):
+ """When printing an error, all class names of the dataset."""
+ print('\n\nThe name of the class contained in the dataset:')
+ data_classes_info = PrettyTable()
+ data_classes_info.title = 'Information of dataset class'
+ # List print settings
+ # If there are many classes, display at most 25 rows per column
+ if len(data_classes) < 25:
+ data_classes_info.add_column('Class name', data_classes)
+ else:
+ # ceil division, so class counts that are exact multiples of 25
+ # (e.g. 50) are not silently skipped
+ col_num = (len(data_classes) + 24) // 25
+ data_name_list = list(data_classes)
+ for i in range(0, (col_num * 25) - len(data_classes)):
+ data_name_list.append('')
+ for i in range(0, len(data_name_list), 25):
+ data_classes_info.add_column('Class name',
+ data_name_list[i:i + 25])
+
+ # Align display data to the left
+ data_classes_info.align['Class name'] = 'l'
+ print(data_classes_info)
diff --git a/mmyolo/version.py b/mmyolo/version.py
index 3d43f2dfe..f823adabf 100644
--- a/mmyolo/version.py
+++ b/mmyolo/version.py
@@ -1,6 +1,6 @@
# Copyright (c) OpenMMLab. All rights reserved.
-__version__ = '0.1.3'
+__version__ = '0.2.0'
from typing import Tuple
diff --git a/model-index.yml b/model-index.yml
index 40ad558b3..de8794ca9 100644
--- a/model-index.yml
+++ b/model-index.yml
@@ -3,3 +3,4 @@ Import:
- configs/yolov6/metafile.yml
- configs/yolox/metafile.yml
- configs/rtmdet/metafile.yml
+ - configs/yolov7/metafile.yml
diff --git a/projects/easydeploy/README.md b/projects/easydeploy/README.md
new file mode 100644
index 000000000..1816e7ed9
--- /dev/null
+++ b/projects/easydeploy/README.md
@@ -0,0 +1,11 @@
+# MMYOLO Model Easy-Deployment
+
+## Introduction
+
+This project is developed for easily converting your MMYOLO models to other inference backends without the need for MMDeploy, saving the time and effort of getting familiar with it.
+
+Currently we support converting to `ONNX` and `TensorRT` formats; other inference backends such as `ncnn` will be added to this project as well.
+
+## Supported Backends
+
+- [Model Convert](docs/model_convert.md)
diff --git a/projects/easydeploy/README_zh-CN.md b/projects/easydeploy/README_zh-CN.md
new file mode 100644
index 000000000..4c6bc0cf4
--- /dev/null
+++ b/projects/easydeploy/README_zh-CN.md
@@ -0,0 +1,11 @@
+# MMYOLO Model Conversion
+
+## Introduction
+
+This project exists as a standalone deployment project for MMYOLO. It is deliberately decoupled from the current MMDeploy ecosystem and supports model conversion and deployment after training on its own, lowering users' learning and engineering costs.
+
+Conversion to the ONNX and TensorRT formats is currently supported; other inference platforms will be supported later.
+
+## Conversion Tutorial
+
+- [Model Convert](docs/model_convert.md)
diff --git a/projects/easydeploy/backbone/__init__.py b/projects/easydeploy/backbone/__init__.py
new file mode 100644
index 000000000..46776f9b1
--- /dev/null
+++ b/projects/easydeploy/backbone/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .focus import DeployFocus, GConvFocus, NcnnFocus
+
+__all__ = ['DeployFocus', 'NcnnFocus', 'GConvFocus']
diff --git a/projects/easydeploy/backbone/focus.py b/projects/easydeploy/backbone/focus.py
new file mode 100644
index 000000000..2a19afcca
--- /dev/null
+++ b/projects/easydeploy/backbone/focus.py
@@ -0,0 +1,79 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch import Tensor
+
+
+class DeployFocus(nn.Module):
+
+ def __init__(self, orin_Focus: nn.Module):
+ super().__init__()
+ self.__dict__.update(orin_Focus.__dict__)
+
+ def forward(self, x: Tensor) -> Tensor:
+ batch_size, channel, height, width = x.shape
+ x = x.reshape(batch_size, channel, -1, 2, width)
+ x = x.reshape(batch_size, channel, x.shape[2], 2, -1, 2)
+ half_h = x.shape[2]
+ half_w = x.shape[4]
+ x = x.permute(0, 5, 3, 1, 2, 4)
+ x = x.reshape(batch_size, channel * 4, half_h, half_w)
+
+ return self.conv(x)
+
+
+class NcnnFocus(nn.Module):
+
+ def __init__(self, orin_Focus: nn.Module):
+ super().__init__()
+ self.__dict__.update(orin_Focus.__dict__)
+
+ def forward(self, x: Tensor) -> Tensor:
+ batch_size, c, h, w = x.shape
+ assert h % 2 == 0 and w % 2 == 0, f'focus for yolox needs even feature\
+ height and width, got {(h, w)}.'
+
+ x = x.reshape(batch_size, c * h, 1, w)
+ _b, _c, _h, _w = x.shape
+ g = _c // 2
+ # fuse to ncnn's shufflechannel
+ x = x.view(_b, g, 2, _h, _w)
+ x = torch.transpose(x, 1, 2).contiguous()
+ x = x.view(_b, -1, _h, _w)
+
+ x = x.reshape(_b, c * h * w, 1, 1)
+
+ _b, _c, _h, _w = x.shape
+ g = _c // 2
+ # fuse to ncnn's shufflechannel
+ x = x.view(_b, g, 2, _h, _w)
+ x = torch.transpose(x, 1, 2).contiguous()
+ x = x.view(_b, -1, _h, _w)
+
+ x = x.reshape(_b, c * 4, h // 2, w // 2)
+
+ return self.conv(x)
+
+
+class GConvFocus(nn.Module):
+
+ def __init__(self, orin_Focus: nn.Module):
+ super().__init__()
+ device = next(orin_Focus.parameters()).device
+ self.weight1 = torch.tensor([[1., 0], [0, 0]]).expand(3, 1, 2,
+ 2).to(device)
+ self.weight2 = torch.tensor([[0, 0], [1., 0]]).expand(3, 1, 2,
+ 2).to(device)
+ self.weight3 = torch.tensor([[0, 1.], [0, 0]]).expand(3, 1, 2,
+ 2).to(device)
+ self.weight4 = torch.tensor([[0, 0], [0, 1.]]).expand(3, 1, 2,
+ 2).to(device)
+ self.__dict__.update(orin_Focus.__dict__)
+
+ def forward(self, x: Tensor) -> Tensor:
+ conv1 = F.conv2d(x, self.weight1, stride=2, groups=3)
+ conv2 = F.conv2d(x, self.weight2, stride=2, groups=3)
+ conv3 = F.conv2d(x, self.weight3, stride=2, groups=3)
+ conv4 = F.conv2d(x, self.weight4, stride=2, groups=3)
+ return self.conv(torch.cat([conv1, conv2, conv3, conv4], dim=1))
diff --git a/projects/easydeploy/bbox_code/__init__.py b/projects/easydeploy/bbox_code/__init__.py
new file mode 100644
index 000000000..2a5c41da7
--- /dev/null
+++ b/projects/easydeploy/bbox_code/__init__.py
@@ -0,0 +1,4 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .bbox_coder import rtmdet_bbox_decoder, yolov5_bbox_decoder
+
+__all__ = ['yolov5_bbox_decoder', 'rtmdet_bbox_decoder']
diff --git a/projects/easydeploy/bbox_code/bbox_coder.py b/projects/easydeploy/bbox_code/bbox_coder.py
new file mode 100644
index 000000000..153d7888e
--- /dev/null
+++ b/projects/easydeploy/bbox_code/bbox_coder.py
@@ -0,0 +1,35 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from typing import Optional
+
+import torch
+from torch import Tensor
+
+
+def yolov5_bbox_decoder(priors: Tensor, bbox_preds: Tensor,
+ stride: Tensor) -> Tensor:
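+ """Decode YOLOv5 box predictions against xyxy priors:
+
+ cxcy = (2 * sigmoid(t_xy) - 1) * stride + prior_center
+ wh = (2 * sigmoid(t_wh)) ** 2 * prior_wh
+
+ Boxes are returned in cxcywh format.
+ """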
+ bbox_preds = bbox_preds.sigmoid()
+
+ x_center = (priors[..., 0] + priors[..., 2]) * 0.5
+ y_center = (priors[..., 1] + priors[..., 3]) * 0.5
+ w = priors[..., 2] - priors[..., 0]
+ h = priors[..., 3] - priors[..., 1]
+
+ x_center_pred = (bbox_preds[..., 0] - 0.5) * 2 * stride + x_center
+ y_center_pred = (bbox_preds[..., 1] - 0.5) * 2 * stride + y_center
+ w_pred = (bbox_preds[..., 2] * 2)**2 * w
+ h_pred = (bbox_preds[..., 3] * 2)**2 * h
+
+ decoded_bboxes = torch.stack(
+ [x_center_pred, y_center_pred, w_pred, h_pred], dim=-1)
+
+ return decoded_bboxes
+
+
+def rtmdet_bbox_decoder(priors: Tensor, bbox_preds: Tensor,
+ stride: Optional[Tensor]) -> Tensor:
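+ """Decode RTMDet distance predictions (left, top, right, bottom)
+ relative to the prior points into xyxy boxes."""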
+ tl_x = (priors[..., 0] - bbox_preds[..., 0])
+ tl_y = (priors[..., 1] - bbox_preds[..., 1])
+ br_x = (priors[..., 0] + bbox_preds[..., 2])
+ br_y = (priors[..., 1] + bbox_preds[..., 3])
+ decoded_bboxes = torch.stack([tl_x, tl_y, br_x, br_y], -1)
+ return decoded_bboxes
diff --git a/projects/easydeploy/docs/model_convert.md b/projects/easydeploy/docs/model_convert.md
new file mode 100644
index 000000000..062247fc4
--- /dev/null
+++ b/projects/easydeploy/docs/model_convert.md
@@ -0,0 +1,56 @@
+# MMYOLO Model ONNX Conversion
+
+## Dependencies
+
+- [onnx](https://github.com/onnx/onnx)
+
+ ```shell
+ pip install onnx
+ ```
+
+- [onnx-simplifier](https://github.com/daquexian/onnx-simplifier) (optional, used to simplify the model)
+
+ ```shell
+ pip install onnx-simplifier
+ ```
+
+## Usage
+
+The [model export script](./projects/easydeploy/tools/export.py) converts an `MMYOLO` model to `onnx`.
+
+### Arguments
+
+- `config`: The config file used to build the model, e.g. [`yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py`](./configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py).
+- `checkpoint`: The trained checkpoint file, e.g. `yolov5s.pth`.
+- `--work-dir`: The directory to save the converted model.
+- `--img-size`: The input size used when converting the model, e.g. `640 640`.
+- `--batch-size`: The input `batch size` of the converted model.
+- `--device`: The device used for conversion. Defaults to `cuda:0`.
+- `--simplify`: Whether to simplify the exported `onnx` model; requires [onnx-simplifier](https://github.com/daquexian/onnx-simplifier). Disabled by default.
+- `--opset`: The `opset` version of the exported `onnx`. Defaults to `11`.
+- `--backend`: The backend id the exported `onnx` targets: `ONNXRuntime`: `1`, `TensorRT8`: `2`, `TensorRT7`: `3`. Defaults to `1`, i.e. `ONNXRuntime`.
+- `--pre-topk`: The number of candidate boxes kept before non-maximum suppression in post-processing. Defaults to `1000`.
+- `--keep-topk`: The number of candidate boxes output by non-maximum suppression. Defaults to `100`.
+- `--iou-threshold`: The `iou` threshold used by non-maximum suppression to filter duplicate candidate boxes. Defaults to `0.65`.
+- `--score-threshold`: The score threshold used by non-maximum suppression to filter candidate boxes. Defaults to `0.25`.
+
+Example:
+
+```shell
+python ./projects/easydeploy/tools/export.py \
+ configs/yolov5/yolov5_s-v61_syncbn_fast_8xb16-300e_coco.py \
+ yolov5s.pth \
+ --work-dir work_dir \
+ --img-size 640 640 \
+ --batch-size 1 \
+ --device cpu \
+ --simplify \
+ --opset 11 \
+ --backend 1 \
+ --pre-topk 1000 \
+ --keep-topk 100 \
+ --iou-threshold 0.65 \
+ --score-threshold 0.25
+```
+
+Then use a tool supported by the backend, such as `TensorRT`, to read the `onnx` file and convert it again into the backend's model format, e.g. `.engine`/`.plan`.
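+
+As a sketch, the `EngineBuilder` class shipped in this project can perform that conversion in Python. The ONNX path below is a placeholder, a CUDA device is assumed, and the import assumes the repository root is importable (adjust it to how you expose the `easydeploy` package):
+
+```python
+from projects.easydeploy.model import EngineBuilder
+
+# reads work_dir/yolov5s.onnx and writes work_dir/yolov5s.engine
+builder = EngineBuilder(
+ 'work_dir/yolov5s.onnx',
+ opt_shape=(1, 3, 640, 640),
+ device='cuda:0')
+# fp16 is used when the GPU supports fast fp16
+builder.build(fp16=True)
+```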
diff --git a/projects/easydeploy/model/__init__.py b/projects/easydeploy/model/__init__.py
new file mode 100644
index 000000000..5ab73a82a
--- /dev/null
+++ b/projects/easydeploy/model/__init__.py
@@ -0,0 +1,5 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .backendwrapper import BackendWrapper, EngineBuilder
+from .model import DeployModel
+
+__all__ = ['DeployModel', 'BackendWrapper', 'EngineBuilder']
diff --git a/projects/easydeploy/model/backendwrapper.py b/projects/easydeploy/model/backendwrapper.py
new file mode 100644
index 000000000..ddc10e90f
--- /dev/null
+++ b/projects/easydeploy/model/backendwrapper.py
@@ -0,0 +1,256 @@
+import warnings
+from collections import OrderedDict, namedtuple
+from functools import partial
+from pathlib import Path
+from typing import List, Optional, Tuple, Union
+
+import numpy as np
+import onnxruntime
+import tensorrt as trt
+import torch
+from numpy import ndarray
+from torch import Tensor
+
+warnings.filterwarnings(action='ignore', category=DeprecationWarning)
+
+
+class BackendWrapper:
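+ """Unified inference wrapper around ONNXRuntime and TensorRT.
+
+ A usage sketch (the engine path is a placeholder and a built
+ `.engine`/`.onnx` file is assumed to exist; warm-up runs in
+ the constructor):
+
+ >>> # wrapper = BackendWrapper('yolov5s.engine', device=0)
+ >>> # outputs = wrapper(torch.rand(1, 3, 640, 640).cuda())
+ """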
+
+ def __init__(
+ self,
+ weight: Union[str, Path],
+ device: Optional[Union[str, int, torch.device]] = None) -> None:
+ weight = Path(weight) if isinstance(weight, str) else weight
+ assert weight.exists() and weight.suffix in ('.onnx', '.engine',
+ '.plan')
+ if isinstance(device, str):
+ device = torch.device(device)
+ elif isinstance(device, int):
+ device = torch.device(f'cuda:{device}')
+ self.weight = weight
+ self.device = device
+ self.__build_model()
+ self.__init_runtime()
+ self.__warm_up(10)
+
+ def __build_model(self) -> None:
+ model_info = dict()
+ num_input = num_output = 0
+ names = []
+ is_dynamic = False
+ if self.weight.suffix == '.onnx':
+ model_info['backend'] = 'ONNXRuntime'
+ providers = ['CPUExecutionProvider']
+ if 'cuda' in self.device.type:
+ providers.insert(0, 'CUDAExecutionProvider')
+ model = onnxruntime.InferenceSession(
+ str(self.weight), providers=providers)
+ for i, tensor in enumerate(model.get_inputs()):
+ model_info[tensor.name] = dict(
+ shape=tensor.shape, dtype=tensor.type)
+ num_input += 1
+ names.append(tensor.name)
+ is_dynamic |= any(
+ map(lambda x: isinstance(x, str), tensor.shape))
+ for i, tensor in enumerate(model.get_outputs()):
+ model_info[tensor.name] = dict(
+ shape=tensor.shape, dtype=tensor.type)
+ num_output += 1
+ names.append(tensor.name)
+ else:
+ model_info['backend'] = 'TensorRT'
+ logger = trt.Logger(trt.Logger.ERROR)
+ trt.init_libnvinfer_plugins(logger, namespace='')
+ with trt.Runtime(logger) as runtime:
+ model = runtime.deserialize_cuda_engine(
+ self.weight.read_bytes())
+ profile_shape = []
+ for i in range(model.num_bindings):
+ name = model.get_binding_name(i)
+ shape = tuple(model.get_binding_shape(i))
+ dtype = trt.nptype(model.get_binding_dtype(i))
+ is_dynamic |= (-1 in shape)
+ if model.binding_is_input(i):
+ num_input += 1
+ profile_shape.append(model.get_profile_shape(i, 0))
+ else:
+ num_output += 1
+ model_info[name] = dict(shape=shape, dtype=dtype)
+ names.append(name)
+ model_info['profile_shape'] = profile_shape
+
+ self.num_input = num_input
+ self.num_output = num_output
+ self.names = names
+ self.is_dynamic = is_dynamic
+ self.model = model
+ self.model_info = model_info
+
+ def __init_runtime(self) -> None:
+ bindings = OrderedDict()
+ Binding = namedtuple('Binding',
+ ('name', 'dtype', 'shape', 'data', 'ptr'))
+ if self.model_info['backend'] == 'TensorRT':
+ context = self.model.create_execution_context()
+ for name in self.names:
+ shape, dtype = self.model_info[name].values()
+ if self.is_dynamic:
+ cpu_tensor, gpu_tensor, ptr = None, None, None
+ else:
+ cpu_tensor = np.empty(shape, dtype=np.dtype(dtype))
+ gpu_tensor = torch.from_numpy(cpu_tensor).to(self.device)
+ ptr = int(gpu_tensor.data_ptr())
+ bindings[name] = Binding(name, dtype, shape, gpu_tensor, ptr)
+ else:
+ output_names = []
+ for i, name in enumerate(self.names):
+ if i >= self.num_input:
+ output_names.append(name)
+ shape, dtype = self.model_info[name].values()
+ bindings[name] = Binding(name, dtype, shape, None, None)
+ context = partial(self.model.run, output_names)
+ self.addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
+ self.bindings = bindings
+ self.context = context
+
+ def __infer(
+ self, inputs: List[Union[ndarray,
+ Tensor]]) -> List[Union[ndarray, Tensor]]:
+ assert len(inputs) == self.num_input
+ if self.model_info['backend'] == 'TensorRT':
+ outputs = []
+ for i, (name, gpu_input) in enumerate(
+ zip(self.names[:self.num_input], inputs)):
+ if self.is_dynamic:
+ self.context.set_binding_shape(i, gpu_input.shape)
+ self.addrs[name] = gpu_input.data_ptr()
+
+ for i, name in enumerate(self.names[self.num_input:]):
+ i += self.num_input
+ if self.is_dynamic:
+ shape = tuple(self.context.get_binding_shape(i))
+ dtype = self.bindings[name].dtype
+ cpu_tensor = np.empty(shape, dtype=np.dtype(dtype))
+ out = torch.from_numpy(cpu_tensor).to(self.device)
+ self.addrs[name] = out.data_ptr()
+ else:
+ out = self.bindings[name].data
+ outputs.append(out)
+ assert self.context.execute_v2(list(
+ self.addrs.values())), 'Inference failed'
+ else:
+ input_feed = {
+ name: inputs[i]
+ for i, name in enumerate(self.names[:self.num_input])
+ }
+ outputs = self.context(input_feed)
+ return outputs
+
+ def __warm_up(self, n=10) -> None:
+ for _ in range(n):
+ _tmp = []
+ if self.model_info['backend'] == 'TensorRT':
+ for i, name in enumerate(self.names[:self.num_input]):
+ if self.is_dynamic:
+ shape = self.model_info['profile_shape'][i][1]
+ dtype = self.bindings[name].dtype
+ cpu_tensor = np.empty(shape, dtype=np.dtype(dtype))
+ _tmp.append(
+ torch.from_numpy(cpu_tensor).to(self.device))
+ else:
+ _tmp.append(self.bindings[name].data)
+ else:
+ print('Please warm up the ONNXRuntime model by yourself.')
+ print('This model will not be warmed up automatically.')
+ return
+ _ = self.__infer(_tmp)
+
+ def __call__(
+ self, inputs: Union[List, Tensor,
+ ndarray]) -> List[Union[Tensor, ndarray]]:
+ if not isinstance(inputs, list):
+ inputs = [inputs]
+ outputs = self.__infer(inputs)
+ return outputs
+
+
+class EngineBuilder:
+
+ def __init__(
+ self,
+ checkpoint: Union[str, Path],
+ opt_shape: Union[Tuple, List] = (1, 3, 640, 640),
+ device: Optional[Union[str, int, torch.device]] = None) -> None:
+ checkpoint = Path(checkpoint) if isinstance(checkpoint,
+ str) else checkpoint
+ assert checkpoint.exists() and checkpoint.suffix == '.onnx'
+ if isinstance(device, str):
+ device = torch.device(device)
+ elif isinstance(device, int):
+ device = torch.device(f'cuda:{device}')
+
+ self.checkpoint = checkpoint
+ self.opt_shape = np.array(opt_shape, dtype=np.float32)
+ self.device = device
+
+ def __build_engine(self,
+ scale: Optional[List[List]] = None,
+ fp16: bool = True,
+ with_profiling: bool = True) -> None:
+ logger = trt.Logger(trt.Logger.WARNING)
+ trt.init_libnvinfer_plugins(logger, namespace='')
+ builder = trt.Builder(logger)
+ config = builder.create_builder_config()
+ config.max_workspace_size = torch.cuda.get_device_properties(
+ self.device).total_memory
+ flag = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
+ network = builder.create_network(flag)
+ parser = trt.OnnxParser(network, logger)
+ if not parser.parse_from_file(str(self.checkpoint)):
+ raise RuntimeError(
+ f'failed to load ONNX file: {str(self.checkpoint)}')
+ inputs = [network.get_input(i) for i in range(network.num_inputs)]
+ outputs = [network.get_output(i) for i in range(network.num_outputs)]
+ profile = None
+ dshape = -1 in network.get_input(0).shape
+ if dshape:
+ profile = builder.create_optimization_profile()
+ if scale is None:
+ scale = np.array(
+ [[1, 1, 0.5, 0.5], [1, 1, 1, 1], [4, 1, 1.5, 1.5]],
+ dtype=np.float32)
+ scale = (self.opt_shape * scale).astype(np.int32)
+ elif isinstance(scale, List):
+ scale = np.array(scale, dtype=np.int32)
+ assert scale.shape[0] == 3, \
+ 'The scale list must contain 3 shapes: (min, opt, max)'
+ else:
+ raise NotImplementedError
+
+ for inp in inputs:
+ logger.log(
+ trt.Logger.WARNING,
+ f'input "{inp.name}" with shape{inp.shape} {inp.dtype}')
+ if dshape:
+ profile.set_shape(inp.name, *scale)
+ for out in outputs:
+ logger.log(
+ trt.Logger.WARNING,
+ f'output "{out.name}" with shape{out.shape} {out.dtype}')
+ if fp16 and builder.platform_has_fast_fp16:
+ config.set_flag(trt.BuilderFlag.FP16)
+ self.weight = self.checkpoint.with_suffix('.engine')
+ if dshape:
+ config.add_optimization_profile(profile)
+ if with_profiling:
+ config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED
+ with builder.build_engine(network, config) as engine:
+ self.weight.write_bytes(engine.serialize())
+ logger.log(
+ trt.Logger.WARNING, 'Building TensorRT engine finished.\n'
+ f'Saved to {str(self.weight.absolute())}')
+
+ def build(self,
+ scale: Optional[List[List]] = None,
+ fp16: bool = True,
+ with_profiling=True):
+ self.__build_engine(scale, fp16, with_profiling)
diff --git a/projects/easydeploy/model/model.py b/projects/easydeploy/model/model.py
new file mode 100644
index 000000000..c274dd831
--- /dev/null
+++ b/projects/easydeploy/model/model.py
@@ -0,0 +1,144 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from functools import partial
+from typing import List, Optional
+
+import torch
+import torch.nn as nn
+from mmdet.models.backbones.csp_darknet import Focus
+from mmengine.config import ConfigDict
+from torch import Tensor
+
+from mmyolo.models import RepVGGBlock
+from mmyolo.models.dense_heads import RTMDetHead, YOLOv5Head
+from ..backbone import DeployFocus, GConvFocus, NcnnFocus
+from ..bbox_code import rtmdet_bbox_decoder, yolov5_bbox_decoder
+from ..nms import batched_nms, efficient_nms, onnx_nms
+
+
+class DeployModel(nn.Module):
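+ """Export-friendly wrapper around a MMYOLO detector.
+
+ It switches RepVGG blocks to deploy mode, replaces the `Focus` stem
+ with a backend-friendly implementation, and appends box decoding and
+ NMS to the forward graph, so that the exported ONNX model outputs
+ final detections directly.
+ """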
+
+ def __init__(self,
+ baseModel: nn.Module,
+ postprocess_cfg: Optional[ConfigDict] = None):
+ super().__init__()
+ self.baseModel = baseModel
+ self.baseHead = baseModel.bbox_head
+ self.__init_sub_attributes()
+ detector_type = type(self.baseHead)
+ if postprocess_cfg is None:
+ pre_top_k = 1000
+ keep_top_k = 100
+ iou_threshold = 0.65
+ score_threshold = 0.25
+ backend = 1
+ else:
+ pre_top_k = postprocess_cfg.get('pre_top_k', 1000)
+ keep_top_k = postprocess_cfg.get('keep_top_k', 100)
+ iou_threshold = postprocess_cfg.get('iou_threshold', 0.65)
+ score_threshold = postprocess_cfg.get('score_threshold', 0.25)
+ backend = postprocess_cfg.get('backend', 1)
+ # update attributes (e.g. `backend`) before switching to deploy,
+ # since `__switch_deploy` reads `self.backend`
+ self.__dict__.update(locals())
+ self.__switch_deploy()
+
+ def __init_sub_attributes(self):
+ self.bbox_decoder = self.baseHead.bbox_coder.decode
+ self.prior_generate = self.baseHead.prior_generator.grid_priors
+ self.num_base_priors = self.baseHead.num_base_priors
+ self.featmap_strides = self.baseHead.featmap_strides
+ self.num_classes = self.baseHead.num_classes
+
+ def __switch_deploy(self):
+ for layer in self.baseModel.modules():
+ if isinstance(layer, RepVGGBlock):
+ layer.switch_to_deploy()
+ if isinstance(layer, Focus):
+ # onnxruntime tensorrt8 tensorrt7
+ if self.backend in (1, 2, 3):
+ self.baseModel.backbone.stem = DeployFocus(layer)
+ # ncnn
+ elif self.backend == 4:
+ self.baseModel.backbone.stem = NcnnFocus(layer)
+ # switch focus to group conv
+ else:
+ self.baseModel.backbone.stem = GConvFocus(layer)
+
+ def pred_by_feat(self,
+ cls_scores: List[Tensor],
+ bbox_preds: List[Tensor],
+ objectnesses: Optional[List[Tensor]] = None,
+ **kwargs):
+ assert len(cls_scores) == len(bbox_preds)
+ dtype = cls_scores[0].dtype
+ device = cls_scores[0].device
+
+ nms_func = self.select_nms()
+ if self.detector_type is YOLOv5Head:
+ bbox_decoder = yolov5_bbox_decoder
+ elif self.detector_type is RTMDetHead:
+ bbox_decoder = rtmdet_bbox_decoder
+ else:
+ bbox_decoder = self.bbox_decoder
+
+ num_imgs = cls_scores[0].shape[0]
+ featmap_sizes = [cls_score.shape[2:] for cls_score in cls_scores]
+
+ mlvl_priors = self.prior_generate(
+ featmap_sizes, dtype=dtype, device=device)
+
+ flatten_priors = torch.cat(mlvl_priors)
+
+ mlvl_strides = [
+ flatten_priors.new_full(
+ (featmap_size[0] * featmap_size[1] * self.num_base_priors, ),
+ stride) for featmap_size, stride in zip(
+ featmap_sizes, self.featmap_strides)
+ ]
+ flatten_stride = torch.cat(mlvl_strides)
+
+ # flatten cls_scores, bbox_preds and objectness
+ flatten_cls_scores = [
+ cls_score.permute(0, 2, 3, 1).reshape(num_imgs, -1,
+ self.num_classes)
+ for cls_score in cls_scores
+ ]
+ cls_scores = torch.cat(flatten_cls_scores, dim=1).sigmoid()
+
+ flatten_bbox_preds = [
+ bbox_pred.permute(0, 2, 3, 1).reshape(num_imgs, -1, 4)
+ for bbox_pred in bbox_preds
+ ]
+ flatten_bbox_preds = torch.cat(flatten_bbox_preds, dim=1)
+
+ if objectnesses is not None:
+ flatten_objectness = [
+ objectness.permute(0, 2, 3, 1).reshape(num_imgs, -1)
+ for objectness in objectnesses
+ ]
+ flatten_objectness = torch.cat(flatten_objectness, dim=1).sigmoid()
+ cls_scores = cls_scores * (flatten_objectness.unsqueeze(-1))
+
+ scores = cls_scores
+
+ bboxes = bbox_decoder(flatten_priors[None], flatten_bbox_preds,
+ flatten_stride)
+
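+        # Note: keep_top_k also serves as max_output_boxes_per_class for
+        # the selected NMS op.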
+ return nms_func(bboxes, scores, self.keep_top_k, self.iou_threshold,
+ self.score_threshold, self.pre_top_k, self.keep_top_k)
+
+ def select_nms(self):
+ if self.backend == 1:
+ nms_func = onnx_nms
+ elif self.backend == 2:
+ nms_func = efficient_nms
+ elif self.backend == 3:
+ nms_func = batched_nms
+ else:
+ raise NotImplementedError
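+        # YOLOv5 decodes boxes as (cx, cy, w, h); box_coding=1 tells the
+        # NMS op to convert them to (x1, y1, x2, y2) first.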
+ if type(self.baseHead) is YOLOv5Head:
+ nms_func = partial(nms_func, box_coding=1)
+ return nms_func
+
+ def forward(self, inputs: Tensor):
+ neck_outputs = self.baseModel(inputs)
+ outputs = self.pred_by_feat(*neck_outputs)
+ return outputs
diff --git a/projects/easydeploy/nms/__init__.py b/projects/easydeploy/nms/__init__.py
new file mode 100644
index 000000000..59c5cdbd2
--- /dev/null
+++ b/projects/easydeploy/nms/__init__.py
@@ -0,0 +1,5 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from .ort_nms import onnx_nms
+from .trt_nms import batched_nms, efficient_nms
+
+__all__ = ['efficient_nms', 'batched_nms', 'onnx_nms']
diff --git a/projects/easydeploy/nms/ort_nms.py b/projects/easydeploy/nms/ort_nms.py
new file mode 100644
index 000000000..aad93cf05
--- /dev/null
+++ b/projects/easydeploy/nms/ort_nms.py
@@ -0,0 +1,122 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch
+from torch import Tensor
+
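+# Right-multiplying (cx, cy, w, h) boxes by this matrix yields
+# (x1, y1, x2, y2), e.g. x1 = cx - 0.5 * w.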
+_XYWH2XYXY = torch.tensor([[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0],
+ [-0.5, 0.0, 0.5, 0.0], [0.0, -0.5, 0.0, 0.5]],
+ dtype=torch.float32)
+
+
+def select_nms_index(scores: Tensor,
+ boxes: Tensor,
+ nms_index: Tensor,
+ batch_size: int,
+ keep_top_k: int = -1):
+ batch_inds, cls_inds = nms_index[:, 0], nms_index[:, 1]
+ box_inds = nms_index[:, 2]
+
+ scores = scores[batch_inds, cls_inds, box_inds].unsqueeze(1)
+ boxes = boxes[batch_inds, box_inds, ...]
+ dets = torch.cat([boxes, scores], dim=1)
+
+ batched_dets = dets.unsqueeze(0).repeat(batch_size, 1, 1)
+ batch_template = torch.arange(
+ 0, batch_size, dtype=batch_inds.dtype, device=batch_inds.device)
+ batched_dets = batched_dets.where(
+ (batch_inds == batch_template.unsqueeze(1)).unsqueeze(-1),
+ batched_dets.new_zeros(1))
+
+ batched_labels = cls_inds.unsqueeze(0).repeat(batch_size, 1)
+ batched_labels = batched_labels.where(
+ (batch_inds == batch_template.unsqueeze(1)),
+ batched_labels.new_ones(1) * -1)
+
+ N = batched_dets.shape[0]
+
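+    # Pad one all-zero detection (score 0) and a -1 label per image so
+    # that the following sort and split stay safe even when an image has
+    # no kept detections.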
+ batched_dets = torch.cat((batched_dets, batched_dets.new_zeros((N, 1, 5))),
+ 1)
+ batched_labels = torch.cat((batched_labels, -batched_labels.new_ones(
+ (N, 1))), 1)
+
+ _, topk_inds = batched_dets[:, :, -1].sort(dim=1, descending=True)
+ topk_batch_inds = torch.arange(
+ batch_size, dtype=topk_inds.dtype,
+ device=topk_inds.device).view(-1, 1)
+ batched_dets = batched_dets[topk_batch_inds, topk_inds, ...]
+ batched_labels = batched_labels[topk_batch_inds, topk_inds, ...]
+ batched_dets, batched_scores = batched_dets.split([4, 1], 2)
+ batched_scores = batched_scores.squeeze(-1)
+
+ num_dets = (batched_scores > 0).sum(1, keepdim=True)
+ return num_dets, batched_dets, batched_scores, batched_labels
+
+
+class ONNXNMSop(torch.autograd.Function):
+
+ @staticmethod
+ def forward(
+ ctx,
+ boxes: Tensor,
+ scores: Tensor,
+ max_output_boxes_per_class: Tensor = torch.tensor([100]),
+ iou_threshold: Tensor = torch.tensor([0.5]),
+ score_threshold: Tensor = torch.tensor([0.05])
+ ) -> Tensor:
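+        # The eager forward only fabricates indices with the right shape
+        # and dtype for tracing; at inference the exported ONNX
+        # NonMaxSuppression op computes the real result.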
+ device = boxes.device
+ batch = scores.shape[0]
+ num_det = 20
+ batches = torch.randint(0, batch, (num_det, )).sort()[0].to(device)
+ idxs = torch.arange(100, 100 + num_det).to(device)
+ zeros = torch.zeros((num_det, ), dtype=torch.int64).to(device)
+ selected_indices = torch.cat([batches[None], zeros[None], idxs[None]],
+ 0).T.contiguous()
+ selected_indices = selected_indices.to(torch.int64)
+
+ return selected_indices
+
+ @staticmethod
+ def symbolic(
+ g,
+ boxes: Tensor,
+ scores: Tensor,
+ max_output_boxes_per_class: Tensor = torch.tensor([100]),
+ iou_threshold: Tensor = torch.tensor([0.5]),
+ score_threshold: Tensor = torch.tensor([0.05]),
+ ):
+ return g.op(
+ 'NonMaxSuppression',
+ boxes,
+ scores,
+ max_output_boxes_per_class,
+ iou_threshold,
+ score_threshold,
+ outputs=1)
+
+
+def onnx_nms(
+ boxes: torch.Tensor,
+ scores: torch.Tensor,
+ max_output_boxes_per_class: int = 100,
+ iou_threshold: float = 0.5,
+ score_threshold: float = 0.05,
+ pre_top_k: int = -1,
+ keep_top_k: int = 100,
+ box_coding: int = 0,
+):
+ max_output_boxes_per_class = torch.tensor([max_output_boxes_per_class])
+ iou_threshold = torch.tensor([iou_threshold])
+ score_threshold = torch.tensor([score_threshold])
+
+ batch_size, _, _ = scores.shape
+ if box_coding == 1:
+ boxes = boxes @ (_XYWH2XYXY.to(boxes.device))
+ scores = scores.transpose(1, 2).contiguous()
+ selected_indices = ONNXNMSop.apply(boxes, scores,
+ max_output_boxes_per_class,
+ iou_threshold, score_threshold)
+
+ num_dets, batched_dets, batched_scores, batched_labels = select_nms_index(
+ scores, boxes, selected_indices, batch_size, keep_top_k=keep_top_k)
+
+ return num_dets, batched_dets, batched_scores, batched_labels.to(
+ torch.int32)
diff --git a/projects/easydeploy/nms/trt_nms.py b/projects/easydeploy/nms/trt_nms.py
new file mode 100644
index 000000000..5c837b406
--- /dev/null
+++ b/projects/easydeploy/nms/trt_nms.py
@@ -0,0 +1,220 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import torch
+from torch import Tensor
+
+
+class TRTEfficientNMSop(torch.autograd.Function):
+
+ @staticmethod
+ def forward(
+ ctx,
+ boxes: Tensor,
+ scores: Tensor,
+ background_class: int = -1,
+ box_coding: int = 0,
+ iou_threshold: float = 0.45,
+ max_output_boxes: int = 100,
+ plugin_version: str = '1',
+ score_activation: int = 0,
+ score_threshold: float = 0.25,
+ ):
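+        # Dummy outputs with the plugin's shapes/dtypes, used only while
+        # tracing; the EfficientNMS_TRT plugin produces the real results.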
+ batch_size, _, num_classes = scores.shape
+ num_det = torch.randint(
+ 0, max_output_boxes, (batch_size, 1), dtype=torch.int32)
+ det_boxes = torch.randn(batch_size, max_output_boxes, 4)
+ det_scores = torch.randn(batch_size, max_output_boxes)
+ det_classes = torch.randint(
+ 0, num_classes, (batch_size, max_output_boxes), dtype=torch.int32)
+ return num_det, det_boxes, det_scores, det_classes
+
+ @staticmethod
+ def symbolic(g,
+ boxes: Tensor,
+ scores: Tensor,
+ background_class: int = -1,
+ box_coding: int = 0,
+ iou_threshold: float = 0.45,
+ max_output_boxes: int = 100,
+ plugin_version: str = '1',
+ score_activation: int = 0,
+ score_threshold: float = 0.25):
+ out = g.op(
+ 'TRT::EfficientNMS_TRT',
+ boxes,
+ scores,
+ background_class_i=background_class,
+ box_coding_i=box_coding,
+ iou_threshold_f=iou_threshold,
+ max_output_boxes_i=max_output_boxes,
+ plugin_version_s=plugin_version,
+ score_activation_i=score_activation,
+ score_threshold_f=score_threshold,
+ outputs=4)
+ num_det, det_boxes, det_scores, det_classes = out
+ return num_det, det_boxes, det_scores, det_classes
+
+
+class TRTbatchedNMSop(torch.autograd.Function):
+ """TensorRT NMS operation."""
+
+ @staticmethod
+ def forward(
+ ctx,
+ boxes: Tensor,
+ scores: Tensor,
+ plugin_version: str = '1',
+ shareLocation: int = 1,
+ backgroundLabelId: int = -1,
+ numClasses: int = 80,
+ topK: int = 1000,
+ keepTopK: int = 100,
+ scoreThreshold: float = 0.25,
+ iouThreshold: float = 0.45,
+ isNormalized: int = 0,
+ clipBoxes: int = 0,
+ scoreBits: int = 16,
+ caffeSemantics: int = 1,
+ ):
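+        # Dummy outputs for tracing; the BatchedNMSDynamic_TRT plugin
+        # computes the real results at runtime.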
+ batch_size, _, numClasses = scores.shape
+ num_det = torch.randint(
+ 0, keepTopK, (batch_size, 1), dtype=torch.int32)
+ det_boxes = torch.randn(batch_size, keepTopK, 4)
+ det_scores = torch.randn(batch_size, keepTopK)
+ det_classes = torch.randint(0, numClasses,
+ (batch_size, keepTopK)).float()
+ return num_det, det_boxes, det_scores, det_classes
+
+ @staticmethod
+ def symbolic(
+ g,
+ boxes: Tensor,
+ scores: Tensor,
+ plugin_version: str = '1',
+ shareLocation: int = 1,
+ backgroundLabelId: int = -1,
+ numClasses: int = 80,
+ topK: int = 1000,
+ keepTopK: int = 100,
+ scoreThreshold: float = 0.25,
+ iouThreshold: float = 0.45,
+ isNormalized: int = 0,
+ clipBoxes: int = 0,
+ scoreBits: int = 16,
+ caffeSemantics: int = 1,
+ ):
+ out = g.op(
+ 'TRT::BatchedNMSDynamic_TRT',
+ boxes,
+ scores,
+ shareLocation_i=shareLocation,
+ plugin_version_s=plugin_version,
+ backgroundLabelId_i=backgroundLabelId,
+ numClasses_i=numClasses,
+ topK_i=topK,
+ keepTopK_i=keepTopK,
+ scoreThreshold_f=scoreThreshold,
+ iouThreshold_f=iouThreshold,
+ isNormalized_i=isNormalized,
+ clipBoxes_i=clipBoxes,
+ scoreBits_i=scoreBits,
+ caffeSemantics_i=caffeSemantics,
+ outputs=4)
+ num_det, det_boxes, det_scores, det_classes = out
+ return num_det, det_boxes, det_scores, det_classes
+
+
+def _efficient_nms(
+ boxes: Tensor,
+ scores: Tensor,
+ max_output_boxes_per_class: int = 1000,
+ iou_threshold: float = 0.5,
+ score_threshold: float = 0.05,
+ pre_top_k: int = -1,
+ keep_top_k: int = 100,
+ box_coding: int = 0,
+):
+    """Wrapper for `efficient_nms` with TensorRT.
+
+    Args:
+ boxes (Tensor): The bounding boxes of shape [N, num_boxes, 4].
+ scores (Tensor): The detection scores of shape
+ [N, num_boxes, num_classes].
+ max_output_boxes_per_class (int): Maximum number of output
+ boxes per class of nms. Defaults to 1000.
+ iou_threshold (float): IOU threshold of nms. Defaults to 0.5.
+ score_threshold (float): score threshold of nms.
+ Defaults to 0.05.
+ pre_top_k (int): Number of top K boxes to keep before nms.
+ Defaults to -1.
+ keep_top_k (int): Number of top K boxes to keep after nms.
+            Defaults to 100.
+ box_coding (int): Bounding boxes format for nms.
+ Defaults to 0 means [x1, y1 ,x2, y2].
+ Set to 1 means [x, y, w, h].
+ Returns:
+ tuple[Tensor, Tensor, Tensor, Tensor]:
+ (num_det, det_boxes, det_scores, det_classes),
+ `num_det` of shape [N, 1]
+ `det_boxes` of shape [N, num_det, 4]
+ `det_scores` of shape [N, num_det]
+ `det_classes` of shape [N, num_det]
+ """
+ num_det, det_boxes, det_scores, det_classes = TRTEfficientNMSop.apply(
+ boxes, scores, -1, box_coding, iou_threshold, keep_top_k, '1', 0,
+ score_threshold)
+ return num_det, det_boxes, det_scores, det_classes
+
+
+def _batched_nms(
+ boxes: Tensor,
+ scores: Tensor,
+ max_output_boxes_per_class: int = 1000,
+ iou_threshold: float = 0.5,
+ score_threshold: float = 0.05,
+ pre_top_k: int = -1,
+ keep_top_k: int = 100,
+ box_coding: int = 0,
+):
+    """Wrapper for `batched_nms` with TensorRT.
+
+    Args:
+ boxes (Tensor): The bounding boxes of shape [N, num_boxes, 4].
+ scores (Tensor): The detection scores of shape
+ [N, num_boxes, num_classes].
+ max_output_boxes_per_class (int): Maximum number of output
+ boxes per class of nms. Defaults to 1000.
+ iou_threshold (float): IOU threshold of nms. Defaults to 0.5.
+ score_threshold (float): score threshold of nms.
+ Defaults to 0.05.
+ pre_top_k (int): Number of top K boxes to keep before nms.
+ Defaults to -1.
+ keep_top_k (int): Number of top K boxes to keep after nms.
+            Defaults to 100.
+ box_coding (int): Bounding boxes format for nms.
+ Defaults to 0 means [x1, y1 ,x2, y2].
+ Set to 1 means [x, y, w, h].
+ Returns:
+ tuple[Tensor, Tensor, Tensor, Tensor]:
+ (num_det, det_boxes, det_scores, det_classes),
+ `num_det` of shape [N, 1]
+ `det_boxes` of shape [N, num_det, 4]
+ `det_scores` of shape [N, num_det]
+ `det_classes` of shape [N, num_det]
+ """
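+    # The batched NMS plugin expects boxes of shape
+    # [batch, num_boxes, num_classes or 1, 4]; with shareLocation=1 a
+    # single shared class dim is inserted here.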
+ boxes = boxes if boxes.dim() == 4 else boxes.unsqueeze(2)
+ _, _, numClasses = scores.shape
+
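+    # The plugin caps topK at 4096, hence min(pre_top_k, 4096);
+    # pre_top_k is expected to be positive here.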
+ num_det, det_boxes, det_scores, det_classes = TRTbatchedNMSop.apply(
+ boxes, scores, '1', 1, -1, int(numClasses), min(pre_top_k, 4096),
+ keep_top_k, score_threshold, iou_threshold, 0, 0, 16, 1)
+
+ det_classes = det_classes.int()
+ return num_det, det_boxes, det_scores, det_classes
+
+
+def efficient_nms(*args, **kwargs):
+ """Wrapper function for `_efficient_nms`."""
+ return _efficient_nms(*args, **kwargs)
+
+
+def batched_nms(*args, **kwargs):
+ """Wrapper function for `_batched_nms`."""
+ return _batched_nms(*args, **kwargs)
diff --git a/projects/easydeploy/tools/build_engine.py b/projects/easydeploy/tools/build_engine.py
new file mode 100644
index 000000000..7b02e97b5
--- /dev/null
+++ b/projects/easydeploy/tools/build_engine.py
@@ -0,0 +1,43 @@
+import argparse
+import ast
+
+from ..model import EngineBuilder
+
+
+def parse_args():
+ parser = argparse.ArgumentParser()
+ parser.add_argument('checkpoint', help='Checkpoint file')
+ parser.add_argument(
+ '--img-size',
+ nargs='+',
+ type=int,
+ default=[640, 640],
+ help='Image size of height and width')
+ parser.add_argument(
+ '--device', type=str, default='cuda:0', help='TensorRT builder device')
+ parser.add_argument(
+ '--scales',
+ type=str,
+ default='[[1,3,640,640],[1,3,640,640],[1,3,640,640]]',
+ help='Input scales for build dynamic input shape engine')
+ parser.add_argument(
+ '--fp16', action='store_true', help='Build model with fp16 mode')
+ args = parser.parse_args()
+ args.img_size *= 2 if len(args.img_size) == 1 else 1
+ return args
+
+
+def main(args):
+ img_size = (1, 3, *args.img_size)
+ try:
+        scales = ast.literal_eval(args.scales)
+    except Exception:
+        print('Input scales is not a valid python literal, '
+              'falling back to scales=None')
+        scales = None
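+    # `scales` should contain three shapes (min/opt/max) for the
+    # dynamic-shape optimization profile; EngineBuilder asserts this
+    # when building the engine.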
+ builder = EngineBuilder(args.checkpoint, img_size, args.device)
+ builder.build(scales, fp16=args.fp16)
+
+
+if __name__ == '__main__':
+ args = parse_args()
+ main(args)
diff --git a/projects/easydeploy/tools/export.py b/projects/easydeploy/tools/export.py
new file mode 100644
index 000000000..e1a33c381
--- /dev/null
+++ b/projects/easydeploy/tools/export.py
@@ -0,0 +1,135 @@
+import argparse
+import os
+import warnings
+from io import BytesIO
+
+import onnx
+import torch
+from mmdet.apis import init_detector
+from mmengine.config import ConfigDict
+
+from mmyolo.utils import register_all_modules
+from projects.easydeploy.model import DeployModel
+
+warnings.filterwarnings(action='ignore', category=torch.jit.TracerWarning)
+warnings.filterwarnings(action='ignore', category=torch.jit.ScriptWarning)
+warnings.filterwarnings(action='ignore', category=UserWarning)
+warnings.filterwarnings(action='ignore', category=FutureWarning)
+
+
+def parse_args():
+ parser = argparse.ArgumentParser()
+ parser.add_argument('config', help='Config file')
+ parser.add_argument('checkpoint', help='Checkpoint file')
+ parser.add_argument(
+ '--work-dir', default='./work_dir', help='Path to save export model')
+ parser.add_argument(
+ '--img-size',
+ nargs='+',
+ type=int,
+ default=[640, 640],
+ help='Image size of height and width')
+ parser.add_argument('--batch-size', type=int, default=1, help='Batch size')
+ parser.add_argument(
+ '--device', default='cuda:0', help='Device used for inference')
+ parser.add_argument(
+ '--simplify',
+ action='store_true',
+ help='Simplify onnx model by onnx-sim')
+ parser.add_argument(
+ '--opset', type=int, default=11, help='ONNX opset version')
+ parser.add_argument(
+ '--backend', type=int, default=1, help='Backend for export onnx')
+ parser.add_argument(
+ '--pre-topk',
+ type=int,
+ default=1000,
+ help='Postprocess pre topk bboxes feed into NMS')
+ parser.add_argument(
+ '--keep-topk',
+ type=int,
+ default=100,
+ help='Postprocess keep topk bboxes out of NMS')
+ parser.add_argument(
+ '--iou-threshold',
+ type=float,
+ default=0.65,
+ help='IoU threshold for NMS')
+ parser.add_argument(
+ '--score-threshold',
+ type=float,
+ default=0.25,
+ help='Score threshold for NMS')
+ args = parser.parse_args()
+ args.img_size *= 2 if len(args.img_size) == 1 else 1
+ return args
+
+
+def build_model_from_cfg(config_path, checkpoint_path, device):
+ model = init_detector(config_path, checkpoint_path, device=device)
+ model.eval()
+ return model
+
+
+def main():
+ args = parse_args()
+ register_all_modules()
+
+ if not os.path.exists(args.work_dir):
+ os.mkdir(args.work_dir)
+
+ postprocess_cfg = ConfigDict(
+ pre_top_k=args.pre_topk,
+ keep_top_k=args.keep_topk,
+ iou_threshold=args.iou_threshold,
+ score_threshold=args.score_threshold,
+ backend=args.backend)
+
+ baseModel = build_model_from_cfg(args.config, args.checkpoint, args.device)
+
+ deploy_model = DeployModel(
+ baseModel=baseModel, postprocess_cfg=postprocess_cfg)
+ deploy_model.eval()
+
+ fake_input = torch.randn(args.batch_size, 3,
+ *args.img_size).to(args.device)
+ # dry run
+ deploy_model(fake_input)
+
+ save_onnx_path = os.path.join(args.work_dir, 'end2end.onnx')
+ # export onnx
+ with BytesIO() as f:
+ torch.onnx.export(
+ deploy_model,
+ fake_input,
+ f,
+ input_names=['images'],
+ output_names=['num_det', 'det_boxes', 'det_scores', 'det_classes'],
+ opset_version=args.opset)
+ f.seek(0)
+ onnx_model = onnx.load(f)
+ onnx.checker.check_model(onnx_model)
+
+    # Overwrite the TensorRT backends' dynamic output dims with static
+    # shapes; this is cosmetic, only affecting graph viewers like Netron.
+ if args.backend in (2, 3):
+ shapes = [
+ args.batch_size, 1, args.batch_size, args.keep_topk, 4,
+ args.batch_size, args.keep_topk, args.batch_size,
+ args.keep_topk
+ ]
+ for i in onnx_model.graph.output:
+ for j in i.type.tensor_type.shape.dim:
+ j.dim_param = str(shapes.pop(0))
+ if args.simplify:
+ try:
+ import onnxsim
+ onnx_model, check = onnxsim.simplify(onnx_model)
+ assert check, 'assert check failed'
+ except Exception as e:
+ print(f'Simplify failure: {e}')
+ onnx.save(onnx_model, save_onnx_path)
+    print(f'ONNX export success, saved to {save_onnx_path}')
+
+
+if __name__ == '__main__':
+ main()
diff --git a/requirements/runtime.txt b/requirements/runtime.txt
index 24ce15ab7..794a9cab5 100644
--- a/requirements/runtime.txt
+++ b/requirements/runtime.txt
@@ -1 +1,2 @@
numpy
+prettytable
diff --git a/requirements/sahi.txt b/requirements/sahi.txt
new file mode 100644
index 000000000..0e7b7b842
--- /dev/null
+++ b/requirements/sahi.txt
@@ -0,0 +1 @@
+sahi>=0.11.4
diff --git a/tests/test_datasets/test_transforms/test_mix_img_transforms.py b/tests/test_datasets/test_transforms/test_mix_img_transforms.py
index d2855fb13..fa6ef7e58 100644
--- a/tests/test_datasets/test_transforms/test_mix_img_transforms.py
+++ b/tests/test_datasets/test_transforms/test_mix_img_transforms.py
@@ -9,7 +9,7 @@
from mmdet.structures.mask import BitmapMasks
from mmyolo.datasets import YOLOv5CocoDataset
-from mmyolo.datasets.transforms import Mosaic, YOLOv5MixUp, YOLOXMixUp
+from mmyolo.datasets.transforms import Mosaic, Mosaic9, YOLOv5MixUp, YOLOXMixUp
from mmyolo.utils import register_all_modules
register_all_modules()
@@ -108,6 +108,99 @@ def test_transform_with_box_list(self):
self.assertTrue(results['gt_ignore_flags'].dtype == bool)
+class TestMosaic9(unittest.TestCase):
+
+ def setUp(self):
+        """Set up the data info used in every test method.
+
+ TestCase calls functions in this order: setUp() -> testMethod() ->
+ tearDown() -> cleanUp()
+ """
+ rng = np.random.RandomState(0)
+ self.pre_transform = [
+ dict(
+ type='LoadImageFromFile',
+ file_client_args=dict(backend='disk')),
+ dict(type='LoadAnnotations', with_bbox=True)
+ ]
+
+ self.dataset = YOLOv5CocoDataset(
+ data_prefix=dict(
+ img=osp.join(osp.dirname(__file__), '../../data')),
+ ann_file=osp.join(
+ osp.dirname(__file__), '../../data/coco_sample_color.json'),
+ filter_cfg=dict(filter_empty_gt=False, min_size=32),
+ pipeline=[])
+ self.results = {
+ 'img':
+ np.random.random((224, 224, 3)),
+ 'img_shape': (224, 224),
+ 'gt_bboxes_labels':
+ np.array([1, 2, 3], dtype=np.int64),
+ 'gt_bboxes':
+ np.array([[10, 10, 20, 20], [20, 20, 40, 40], [40, 40, 80, 80]],
+ dtype=np.float32),
+ 'gt_ignore_flags':
+ np.array([0, 0, 1], dtype=bool),
+ 'gt_masks':
+ BitmapMasks(rng.rand(3, 224, 224), height=224, width=224),
+ 'dataset':
+ self.dataset
+ }
+
+ def test_transform(self):
+ # test assertion for invalid img_scale
+ with self.assertRaises(AssertionError):
+ transform = Mosaic9(img_scale=640)
+
+ # test assertion for invalid probability
+ with self.assertRaises(AssertionError):
+ transform = Mosaic9(prob=1.5)
+
+ # test assertion for invalid max_cached_images
+ with self.assertRaises(AssertionError):
+ transform = Mosaic9(use_cached=True, max_cached_images=1)
+
+ transform = Mosaic9(
+ img_scale=(10, 12), pre_transform=self.pre_transform)
+ results = transform(copy.deepcopy(self.results))
+ self.assertTrue(results['img'].shape[:2] == (20, 24))
+ self.assertTrue(results['gt_bboxes_labels'].shape[0] ==
+ results['gt_bboxes'].shape[0])
+ self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
+ self.assertTrue(results['gt_bboxes'].dtype == np.float32)
+ self.assertTrue(results['gt_ignore_flags'].dtype == bool)
+
+ def test_transform_with_no_gt(self):
+ self.results['gt_bboxes'] = np.empty((0, 4), dtype=np.float32)
+ self.results['gt_bboxes_labels'] = np.empty((0, ), dtype=np.int64)
+ self.results['gt_ignore_flags'] = np.empty((0, ), dtype=bool)
+ transform = Mosaic9(
+ img_scale=(10, 12), pre_transform=self.pre_transform)
+ results = transform(copy.deepcopy(self.results))
+ self.assertIsInstance(results, dict)
+ self.assertTrue(results['img'].shape[:2] == (20, 24))
+ self.assertTrue(
+ results['gt_bboxes_labels'].shape[0] == results['gt_bboxes'].
+ shape[0] == results['gt_ignore_flags'].shape[0])
+ self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
+ self.assertTrue(results['gt_bboxes'].dtype == np.float32)
+ self.assertTrue(results['gt_ignore_flags'].dtype == bool)
+
+ def test_transform_with_box_list(self):
+ transform = Mosaic9(
+ img_scale=(10, 12), pre_transform=self.pre_transform)
+ results = copy.deepcopy(self.results)
+ results['gt_bboxes'] = HorizontalBoxes(results['gt_bboxes'])
+ results = transform(results)
+ self.assertTrue(results['img'].shape[:2] == (20, 24))
+ self.assertTrue(results['gt_bboxes_labels'].shape[0] ==
+ results['gt_bboxes'].shape[0])
+ self.assertTrue(results['gt_bboxes_labels'].dtype == np.int64)
+ self.assertTrue(results['gt_bboxes'].dtype == torch.float32)
+ self.assertTrue(results['gt_ignore_flags'].dtype == bool)
+
+
class TestYOLOv5MixUp(unittest.TestCase):
def setUp(self):
diff --git a/tests/test_datasets/test_transforms/test_transforms.py b/tests/test_datasets/test_transforms/test_transforms.py
index 43012bcae..610c084ae 100644
--- a/tests/test_datasets/test_transforms/test_transforms.py
+++ b/tests/test_datasets/test_transforms/test_transforms.py
@@ -27,59 +27,62 @@ def setUp(self):
self.data_info1 = dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 150, 150]], dtype=np.float32),
- batch_shape=np.array([460, 672], dtype=np.int64),
+ batch_shape=np.array([192, 672], dtype=np.int64),
gt_masks=BitmapMasks(rng.rand(1, 300, 400), height=300, width=400))
self.data_info2 = dict(
img=np.random.random((300, 400, 3)),
gt_bboxes=np.array([[0, 0, 150, 150]], dtype=np.float32))
self.data_info3 = dict(
img=np.random.random((300, 400, 3)),
- batch_shape=np.array([460, 672], dtype=np.int64))
+ batch_shape=np.array([192, 672], dtype=np.int64))
self.data_info4 = dict(img=np.random.random((300, 400, 3)))
def test_letter_resize(self):
# Test allow_scale_up
transform = LetterResize(scale=(640, 640), allow_scale_up=False)
results = transform(copy.deepcopy(self.data_info1))
- self.assertEqual(results['img_shape'], (460, 672, 3))
+ self.assertEqual(results['img_shape'], (192, 672, 3))
self.assertTrue(
- (results['gt_bboxes'] == np.array([[136., 80., 286.,
- 230.]])).all())
- self.assertTrue((results['batch_shape'] == np.array([460, 672])).all())
+ (results['gt_bboxes'] == np.array([[208., 0., 304., 96.]])).all())
+ self.assertTrue((results['batch_shape'] == np.array([192, 672])).all())
+ self.assertTrue((results['pad_param'] == np.array([0., 0., 208.,
+ 208.])).all())
self.assertTrue(
- (results['pad_param'] == np.array([80., 80., 136., 136.])).all())
- self.assertTrue((results['scale_factor'] <= 1.).all())
+ (np.array(results['scale_factor'], dtype=np.float32) <= 1.).all())
# Test pad_val
transform = LetterResize(scale=(640, 640), pad_val=dict(img=144))
results = transform(copy.deepcopy(self.data_info1))
- self.assertEqual(results['img_shape'], (460, 672, 3))
+ self.assertEqual(results['img_shape'], (192, 672, 3))
self.assertTrue(
- (results['gt_bboxes'] == np.array([[29., 0., 259., 230.]])).all())
- self.assertTrue((results['batch_shape'] == np.array([460, 672])).all())
- self.assertTrue((results['pad_param'] == np.array([0., 0., 29.,
- 30.])).all())
- self.assertTrue((results['scale_factor'] > 1.).all())
+ (results['gt_bboxes'] == np.array([[208., 0., 304., 96.]])).all())
+ self.assertTrue((results['batch_shape'] == np.array([192, 672])).all())
+ self.assertTrue((results['pad_param'] == np.array([0., 0., 208.,
+ 208.])).all())
+ self.assertTrue(
+ (np.array(results['scale_factor'], dtype=np.float32) <= 1.).all())
# Test use_mini_pad
transform = LetterResize(scale=(640, 640), use_mini_pad=True)
results = transform(copy.deepcopy(self.data_info1))
- self.assertEqual(results['img_shape'], (460, 640, 3))
+ self.assertEqual(results['img_shape'], (192, 256, 3))
+ self.assertTrue((results['gt_bboxes'] == np.array([[0., 0., 96.,
+ 96.]])).all())
+ self.assertTrue((results['batch_shape'] == np.array([192, 672])).all())
+ self.assertTrue((results['pad_param'] == np.array([0., 0., 0.,
+ 0.])).all())
self.assertTrue(
- (results['gt_bboxes'] == np.array([[13., 0., 243., 230.]])).all())
- self.assertTrue((results['batch_shape'] == np.array([460, 672])).all())
- self.assertTrue((results['pad_param'] == np.array([0., 0., 13.,
- 14.])).all())
- self.assertTrue((results['scale_factor'] > 1.).all())
+ (np.array(results['scale_factor'], dtype=np.float32) <= 1.).all())
# Test stretch_only
transform = LetterResize(scale=(640, 640), stretch_only=True)
results = transform(copy.deepcopy(self.data_info1))
- self.assertEqual(results['img_shape'], (460, 672, 3))
+ self.assertEqual(results['img_shape'], (192, 672, 3))
self.assertTrue((results['gt_bboxes'] == np.array(
- [[0., 0., 230., 251.99998474121094]])).all())
- self.assertTrue((results['batch_shape'] == np.array([460, 672])).all())
- self.assertTrue((results['pad_param'] == np.array([0, 0, 0, 0])).all())
+ [[0., 0., 251.99998474121094, 96.]])).all())
+ self.assertTrue((results['batch_shape'] == np.array([192, 672])).all())
+ self.assertTrue((results['pad_param'] == np.array([0., 0., 0.,
+ 0.])).all())
# Test
transform = LetterResize(scale=(640, 640), pad_val=dict(img=144))
@@ -150,13 +153,15 @@ def test_yolov5_keep_ratio_resize(self):
self.assertEqual(results['img_shape'], (480, 640))
self.assertTrue(
(results['gt_bboxes'] == np.array([[0., 0., 240., 240.]])).all())
- self.assertTrue((results['scale_factor'] == 1.6).all())
+ self.assertTrue((np.array(results['scale_factor'],
+ dtype=np.float32) == 1.6).all())
# Test only img
transform = YOLOv5KeepRatioResize(scale=(640, 640))
results = transform(copy.deepcopy(self.data_info2))
self.assertEqual(results['img_shape'], (480, 640))
- self.assertTrue((results['scale_factor'] == 1.6).all())
+ self.assertTrue((np.array(results['scale_factor'],
+ dtype=np.float32) == 1.6).all())
class TestYOLOv5HSVRandomAug(unittest.TestCase):
diff --git a/tests/test_engine/test_optimizers/test_yolov7_optim_wrapper_constructor.py b/tests/test_engine/test_optimizers/test_yolov7_optim_wrapper_constructor.py
new file mode 100644
index 000000000..a2f445bed
--- /dev/null
+++ b/tests/test_engine/test_optimizers/test_yolov7_optim_wrapper_constructor.py
@@ -0,0 +1,81 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+
+import copy
+from unittest import TestCase
+
+import torch
+import torch.nn as nn
+from mmengine.optim import build_optim_wrapper
+
+from mmyolo.engine import YOLOv7OptimWrapperConstructor
+from mmyolo.utils import register_all_modules
+
+register_all_modules()
+
+
+class ExampleModel(nn.Module):
+
+ def __init__(self):
+ super().__init__()
+ self.param1 = nn.Parameter(torch.ones(1))
+ self.conv1 = nn.Conv2d(3, 4, kernel_size=1, bias=False)
+ self.conv2 = nn.Conv2d(4, 2, kernel_size=1)
+ self.bn = nn.BatchNorm2d(2)
+
+
+class TestYOLOv7OptimWrapperConstructor(TestCase):
+
+ def setUp(self):
+ self.model = ExampleModel()
+ self.base_lr = 0.01
+ self.weight_decay = 0.0001
+ self.optim_wrapper_cfg = dict(
+ type='OptimWrapper',
+ optimizer=dict(
+ type='SGD',
+ lr=self.base_lr,
+ momentum=0.9,
+ weight_decay=self.weight_decay,
+ batch_size_per_gpu=16))
+
+ def test_init(self):
+ YOLOv7OptimWrapperConstructor(copy.deepcopy(self.optim_wrapper_cfg))
+ YOLOv7OptimWrapperConstructor(
+ copy.deepcopy(self.optim_wrapper_cfg),
+ paramwise_cfg={'base_total_batch_size': 64})
+
+ # `paramwise_cfg` must include `base_total_batch_size` if not None.
+ with self.assertRaises(AssertionError):
+ YOLOv7OptimWrapperConstructor(
+ copy.deepcopy(self.optim_wrapper_cfg), paramwise_cfg={'a': 64})
+
+ def test_build(self):
+ optim_wrapper = YOLOv7OptimWrapperConstructor(
+ copy.deepcopy(self.optim_wrapper_cfg))(
+ self.model)
+ # test param_groups
+ assert len(optim_wrapper.optimizer.param_groups) == 3
+ for i in range(3):
+ param_groups_i = optim_wrapper.optimizer.param_groups[i]
+ assert param_groups_i['lr'] == self.base_lr
+ if i == 0:
+ assert param_groups_i['weight_decay'] == self.weight_decay
+ else:
+ assert param_groups_i['weight_decay'] == 0
+
+ # test weight_decay linear scaling
+ optim_wrapper_cfg = copy.deepcopy(self.optim_wrapper_cfg)
+ optim_wrapper_cfg['optimizer']['batch_size_per_gpu'] = 128
+ optim_wrapper = YOLOv7OptimWrapperConstructor(optim_wrapper_cfg)(
+ self.model)
+ assert optim_wrapper.optimizer.param_groups[0][
+ 'weight_decay'] == self.weight_decay * 2
+
+ # test without batch_size_per_gpu
+ optim_wrapper_cfg = copy.deepcopy(self.optim_wrapper_cfg)
+ optim_wrapper_cfg['optimizer'].pop('batch_size_per_gpu')
+ optim_wrapper = dict(
+ optim_wrapper_cfg, constructor='YOLOv7OptimWrapperConstructor')
+ optim_wrapper = build_optim_wrapper(self.model, optim_wrapper)
+ assert optim_wrapper.optimizer.param_groups[0][
+ 'weight_decay'] == self.weight_decay
diff --git a/tests/test_models/test_backbone/test_efficient_rep.py b/tests/test_models/test_backbone/test_efficient_rep.py
index 836ee739d..53af20294 100644
--- a/tests/test_models/test_backbone/test_efficient_rep.py
+++ b/tests/test_models/test_backbone/test_efficient_rep.py
@@ -5,7 +5,7 @@
import torch
from torch.nn.modules.batchnorm import _BatchNorm
-from mmyolo.models.backbones import YOLOv6EfficientRep
+from mmyolo.models.backbones import YOLOv6CSPBep, YOLOv6EfficientRep
from mmyolo.utils import register_all_modules
from .utils import check_norm_state, is_norm
@@ -23,7 +23,7 @@ def test_init(self):
# frozen_stages must in range(-1, len(arch_setting) + 1)
YOLOv6EfficientRep(frozen_stages=6)
- def test_forward(self):
+ def test_YOLOv6EfficientRep_forward(self):
# Test YOLOv6EfficientRep with first stage frozen
frozen_stages = 1
model = YOLOv6EfficientRep(frozen_stages=frozen_stages)
@@ -111,3 +111,92 @@ def test_forward(self):
assert feat[0].shape == torch.Size((1, 256, 32, 32))
assert feat[1].shape == torch.Size((1, 512, 16, 16))
assert feat[2].shape == torch.Size((1, 1024, 8, 8))
+
+ def test_YOLOv6CSPBep_forward(self):
+ # Test YOLOv6CSPBep with first stage frozen
+ frozen_stages = 1
+ model = YOLOv6CSPBep(frozen_stages=frozen_stages)
+ model.init_weights()
+ model.train()
+
+ for mod in model.stem.modules():
+ for param in mod.parameters():
+ assert param.requires_grad is False
+ for i in range(1, frozen_stages + 1):
+ layer = getattr(model, f'stage{i}')
+ for mod in layer.modules():
+ if isinstance(mod, _BatchNorm):
+ assert mod.training is False
+ for param in layer.parameters():
+ assert param.requires_grad is False
+
+ # Test YOLOv6CSPBep with norm_eval=True
+ model = YOLOv6CSPBep(norm_eval=True)
+ model.train()
+
+ assert check_norm_state(model.modules(), False)
+
+ # Test YOLOv6CSPBep forward with widen_factor=0.25
+ model = YOLOv6CSPBep(
+ arch='P5', widen_factor=0.25, out_indices=range(0, 5))
+ model.train()
+
+ imgs = torch.randn(1, 3, 64, 64)
+ feat = model(imgs)
+ assert len(feat) == 5
+ assert feat[0].shape == torch.Size((1, 16, 32, 32))
+ assert feat[1].shape == torch.Size((1, 32, 16, 16))
+ assert feat[2].shape == torch.Size((1, 64, 8, 8))
+ assert feat[3].shape == torch.Size((1, 128, 4, 4))
+ assert feat[4].shape == torch.Size((1, 256, 2, 2))
+
+ # Test YOLOv6CSPBep forward with dict(type='ReLU')
+ model = YOLOv6CSPBep(
+ widen_factor=0.125,
+ act_cfg=dict(type='ReLU'),
+ out_indices=range(0, 5))
+ model.train()
+
+ imgs = torch.randn(1, 3, 64, 64)
+ feat = model(imgs)
+ assert len(feat) == 5
+ assert feat[0].shape == torch.Size((1, 8, 32, 32))
+ assert feat[1].shape == torch.Size((1, 16, 16, 16))
+ assert feat[2].shape == torch.Size((1, 32, 8, 8))
+ assert feat[3].shape == torch.Size((1, 64, 4, 4))
+ assert feat[4].shape == torch.Size((1, 128, 2, 2))
+
+ # Test YOLOv6CSPBep with BatchNorm forward
+ model = YOLOv6CSPBep(widen_factor=0.125, out_indices=range(0, 5))
+ for m in model.modules():
+ if is_norm(m):
+ assert isinstance(m, _BatchNorm)
+ model.train()
+
+ imgs = torch.randn(1, 3, 64, 64)
+ feat = model(imgs)
+ assert len(feat) == 5
+ assert feat[0].shape == torch.Size((1, 8, 32, 32))
+ assert feat[1].shape == torch.Size((1, 16, 16, 16))
+ assert feat[2].shape == torch.Size((1, 32, 8, 8))
+ assert feat[3].shape == torch.Size((1, 64, 4, 4))
+ assert feat[4].shape == torch.Size((1, 128, 2, 2))
+
+ # Test YOLOv6CSPBep with BatchNorm forward
+ model = YOLOv6CSPBep(plugins=[
+ dict(
+ cfg=dict(type='mmdet.DropBlock', drop_prob=0.1, block_size=3),
+ stages=(False, False, True, True)),
+ ])
+
+ assert len(model.stage1) == 1
+ assert len(model.stage2) == 1
+ assert len(model.stage3) == 2 # +DropBlock
+ assert len(model.stage4) == 3 # +SPPF+DropBlock
+ model.train()
+ imgs = torch.randn(1, 3, 256, 256)
+ feat = model(imgs)
+ assert len(feat) == 3
+ assert feat[0].shape == torch.Size((1, 256, 32, 32))
+ assert feat[1].shape == torch.Size((1, 512, 16, 16))
+ assert feat[2].shape == torch.Size((1, 1024, 8, 8))
diff --git a/tests/test_models/test_backbone/test_yolov7_backbone.py b/tests/test_models/test_backbone/test_yolov7_backbone.py
new file mode 100644
index 000000000..76b40aa44
--- /dev/null
+++ b/tests/test_models/test_backbone/test_yolov7_backbone.py
@@ -0,0 +1,154 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from unittest import TestCase
+
+import pytest
+import torch
+from torch.nn.modules.batchnorm import _BatchNorm
+
+from mmyolo.models.backbones import YOLOv7Backbone
+from mmyolo.utils import register_all_modules
+from .utils import check_norm_state
+
+register_all_modules()
+
+
+class TestYOLOv7Backbone(TestCase):
+
+ def test_init(self):
+ # out_indices in range(len(arch_setting) + 1)
+ with pytest.raises(AssertionError):
+ YOLOv7Backbone(out_indices=(6, ))
+
+ with pytest.raises(ValueError):
+ # frozen_stages must in range(-1, len(arch_setting) + 1)
+ YOLOv7Backbone(frozen_stages=6)
+
+ def test_forward(self):
+ # Test YOLOv7Backbone-L with first stage frozen
+ frozen_stages = 1
+ model = YOLOv7Backbone(frozen_stages=frozen_stages)
+ model.init_weights()
+ model.train()
+
+ for mod in model.stem.modules():
+ for param in mod.parameters():
+ assert param.requires_grad is False
+ for i in range(1, frozen_stages + 1):
+ layer = getattr(model, f'stage{i}')
+ for mod in layer.modules():
+ if isinstance(mod, _BatchNorm):
+ assert mod.training is False
+ for param in layer.parameters():
+ assert param.requires_grad is False
+
+ # Test YOLOv7Backbone-L with norm_eval=True
+ model = YOLOv7Backbone(norm_eval=True)
+ model.train()
+
+ assert check_norm_state(model.modules(), False)
+
+ # Test YOLOv7Backbone-L forward with widen_factor=0.25
+ model = YOLOv7Backbone(
+ widen_factor=0.25, out_indices=tuple(range(0, 5)))
+ model.train()
+
+ imgs = torch.randn(1, 3, 64, 64)
+ feat = model(imgs)
+ assert len(feat) == 5
+ assert feat[0].shape == torch.Size((1, 16, 32, 32))
+ assert feat[1].shape == torch.Size((1, 64, 16, 16))
+ assert feat[2].shape == torch.Size((1, 128, 8, 8))
+ assert feat[3].shape == torch.Size((1, 256, 4, 4))
+ assert feat[4].shape == torch.Size((1, 256, 2, 2))
+
+ # Test YOLOv7Backbone-L with plugins
+ model = YOLOv7Backbone(
+ widen_factor=0.25,
+ plugins=[
+ dict(
+ cfg=dict(
+ type='mmdet.DropBlock', drop_prob=0.1, block_size=3),
+ stages=(False, False, True, True)),
+ ])
+
+ assert len(model.stage1) == 2
+ assert len(model.stage2) == 2
+ assert len(model.stage3) == 3 # +DropBlock
+ assert len(model.stage4) == 3 # +DropBlock
+ model.train()
+ imgs = torch.randn(1, 3, 128, 128)
+ feat = model(imgs)
+ assert len(feat) == 3
+ assert feat[0].shape == torch.Size((1, 128, 16, 16))
+ assert feat[1].shape == torch.Size((1, 256, 8, 8))
+ assert feat[2].shape == torch.Size((1, 256, 4, 4))
+
+ # Test YOLOv7Backbone-X forward with widen_factor=0.25
+ model = YOLOv7Backbone(arch='X', widen_factor=0.25)
+ model.train()
+
+ imgs = torch.randn(1, 3, 64, 64)
+ feat = model(imgs)
+ assert len(feat) == 3
+ assert feat[0].shape == torch.Size((1, 160, 8, 8))
+ assert feat[1].shape == torch.Size((1, 320, 4, 4))
+ assert feat[2].shape == torch.Size((1, 320, 2, 2))
+
+ # Test YOLOv7Backbone-tiny forward with widen_factor=0.25
+ model = YOLOv7Backbone(arch='Tiny', widen_factor=0.25)
+ model.train()
+
+ feat = model(imgs)
+ assert len(feat) == 3
+ assert feat[0].shape == torch.Size((1, 32, 8, 8))
+ assert feat[1].shape == torch.Size((1, 64, 4, 4))
+ assert feat[2].shape == torch.Size((1, 128, 2, 2))
+
+ # Test YOLOv7Backbone-w forward with widen_factor=0.25
+ model = YOLOv7Backbone(
+ arch='W', widen_factor=0.25, out_indices=(2, 3, 4, 5))
+ model.train()
+
+ imgs = torch.randn(1, 3, 128, 128)
+ feat = model(imgs)
+ assert len(feat) == 4
+ assert feat[0].shape == torch.Size((1, 64, 16, 16))
+ assert feat[1].shape == torch.Size((1, 128, 8, 8))
+ assert feat[2].shape == torch.Size((1, 192, 4, 4))
+ assert feat[3].shape == torch.Size((1, 256, 2, 2))
+
+        # Test YOLOv7Backbone-D forward with widen_factor=0.25
+ model = YOLOv7Backbone(
+ arch='D', widen_factor=0.25, out_indices=(2, 3, 4, 5))
+ model.train()
+
+ feat = model(imgs)
+ assert len(feat) == 4
+ assert feat[0].shape == torch.Size((1, 96, 16, 16))
+ assert feat[1].shape == torch.Size((1, 192, 8, 8))
+ assert feat[2].shape == torch.Size((1, 288, 4, 4))
+ assert feat[3].shape == torch.Size((1, 384, 2, 2))
+
+        # Test YOLOv7Backbone-E forward with widen_factor=0.25
+ model = YOLOv7Backbone(
+ arch='E', widen_factor=0.25, out_indices=(2, 3, 4, 5))
+ model.train()
+
+ feat = model(imgs)
+ assert len(feat) == 4
+ assert feat[0].shape == torch.Size((1, 80, 16, 16))
+ assert feat[1].shape == torch.Size((1, 160, 8, 8))
+ assert feat[2].shape == torch.Size((1, 240, 4, 4))
+ assert feat[3].shape == torch.Size((1, 320, 2, 2))
+
+        # Test YOLOv7Backbone-E2E forward with widen_factor=0.25
+ model = YOLOv7Backbone(
+ arch='E2E', widen_factor=0.25, out_indices=(2, 3, 4, 5))
+ model.train()
+
+ feat = model(imgs)
+ assert len(feat) == 4
+ assert feat[0].shape == torch.Size((1, 80, 16, 16))
+ assert feat[1].shape == torch.Size((1, 160, 8, 8))
+ assert feat[2].shape == torch.Size((1, 240, 4, 4))
+ assert feat[3].shape == torch.Size((1, 320, 2, 2))
diff --git a/tests/test_models/test_dense_heads/test_yolov5_head.py b/tests/test_models/test_dense_heads/test_yolov5_head.py
index de31c1f31..18299e09b 100644
--- a/tests/test_models/test_dense_heads/test_yolov5_head.py
+++ b/tests/test_models/test_dense_heads/test_yolov5_head.py
@@ -127,7 +127,7 @@ def test_loss_by_feat(self):
head = YOLOv5Head(head_module=self.head_module)
gt_instances = InstanceData(
bboxes=torch.Tensor([[23.6667, 23.8757, 238.6326, 151.8874]]),
- labels=torch.LongTensor([1]))
+ labels=torch.LongTensor([0]))
one_gt_losses = head.loss_by_feat(cls_scores, bbox_preds, objectnesses,
[gt_instances], img_metas)
diff --git a/tests/test_models/test_dense_heads/test_yolov7_head.py b/tests/test_models/test_dense_heads/test_yolov7_head.py
new file mode 100644
index 000000000..5033f97e1
--- /dev/null
+++ b/tests/test_models/test_dense_heads/test_yolov7_head.py
@@ -0,0 +1,145 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from unittest import TestCase
+
+import torch
+from mmengine.config import Config
+from mmengine.structures import InstanceData
+
+from mmyolo.models.dense_heads import YOLOv7Head
+from mmyolo.utils import register_all_modules
+
+register_all_modules()
+
+
+# TODO: Test YOLOv7p6HeadModule
+class TestYOLOv7Head(TestCase):
+
+ def setUp(self):
+ self.head_module = dict(
+ type='YOLOv7HeadModule',
+ num_classes=2,
+ in_channels=[32, 64, 128],
+ featmap_strides=[8, 16, 32],
+ num_base_priors=3)
+
+ def test_predict_by_feat(self):
+ s = 256
+ img_metas = [{
+ 'img_shape': (s, s, 3),
+ 'ori_shape': (s, s, 3),
+ 'scale_factor': (1.0, 1.0),
+ }]
+ test_cfg = Config(
+ dict(
+ multi_label=True,
+ max_per_img=300,
+ score_thr=0.01,
+ nms=dict(type='nms', iou_threshold=0.65)))
+
+ head = YOLOv7Head(head_module=self.head_module, test_cfg=test_cfg)
+
+ feat = []
+ for i in range(len(self.head_module['in_channels'])):
+ in_channel = self.head_module['in_channels'][i]
+ feat_size = self.head_module['featmap_strides'][i]
+ feat.append(
+ torch.rand(1, in_channel, s // feat_size, s // feat_size))
+
+ cls_scores, bbox_preds, objectnesses = head.forward(feat)
+ head.predict_by_feat(
+ cls_scores,
+ bbox_preds,
+ objectnesses,
+ img_metas,
+ cfg=test_cfg,
+ rescale=True,
+ with_nms=True)
+ head.predict_by_feat(
+ cls_scores,
+ bbox_preds,
+ objectnesses,
+ img_metas,
+ cfg=test_cfg,
+ rescale=False,
+ with_nms=False)
+
+ def test_loss_by_feat(self):
+ s = 256
+ img_metas = [{
+ 'img_shape': (s, s, 3),
+ 'batch_input_shape': (s, s),
+ 'scale_factor': 1,
+ }]
+
+ head = YOLOv7Head(head_module=self.head_module)
+
+ feat = []
+ for i in range(len(self.head_module['in_channels'])):
+ in_channel = self.head_module['in_channels'][i]
+ feat_size = self.head_module['featmap_strides'][i]
+ feat.append(
+ torch.rand(1, in_channel, s // feat_size, s // feat_size))
+
+ cls_scores, bbox_preds, objectnesses = head.forward(feat)
+
+ # Test that empty ground truth encourages the network to predict
+ # background
+ gt_instances = InstanceData(
+ bboxes=torch.empty((0, 4)), labels=torch.LongTensor([]))
+
+ empty_gt_losses = head.loss_by_feat(cls_scores, bbox_preds,
+ objectnesses, [gt_instances],
+ img_metas)
+ # When there is no truth, the cls loss should be nonzero but there
+ # should be no box loss.
+ empty_cls_loss = empty_gt_losses['loss_cls'].sum()
+ empty_box_loss = empty_gt_losses['loss_bbox'].sum()
+ empty_obj_loss = empty_gt_losses['loss_obj'].sum()
+ self.assertEqual(
+ empty_cls_loss.item(), 0,
+ 'there should be no cls loss when there are no true boxes')
+ self.assertEqual(
+ empty_box_loss.item(), 0,
+ 'there should be no box loss when there are no true boxes')
+ self.assertGreater(empty_obj_loss.item(), 0,
+ 'objectness loss should be non-zero')
+
+ # When truth is non-empty then both cls and box loss should be nonzero
+ # for random inputs
+ head = YOLOv7Head(head_module=self.head_module)
+ gt_instances = InstanceData(
+ bboxes=torch.Tensor([[23.6667, 23.8757, 238.6326, 151.8874]]),
+ labels=torch.LongTensor([1]))
+
+ one_gt_losses = head.loss_by_feat(cls_scores, bbox_preds, objectnesses,
+ [gt_instances], img_metas)
+ onegt_cls_loss = one_gt_losses['loss_cls'].sum()
+ onegt_box_loss = one_gt_losses['loss_bbox'].sum()
+ onegt_obj_loss = one_gt_losses['loss_obj'].sum()
+ self.assertGreater(onegt_cls_loss.item(), 0,
+ 'cls loss should be non-zero')
+ self.assertGreater(onegt_box_loss.item(), 0,
+ 'box loss should be non-zero')
+ self.assertGreater(onegt_obj_loss.item(), 0,
+ 'obj loss should be non-zero')
+
+ # test num_class = 1
+ self.head_module['num_classes'] = 1
+ head = YOLOv7Head(head_module=self.head_module)
+ gt_instances = InstanceData(
+ bboxes=torch.Tensor([[23.6667, 23.8757, 238.6326, 151.8874]]),
+ labels=torch.LongTensor([0]))
+
+ cls_scores, bbox_preds, objectnesses = head.forward(feat)
+
+ one_gt_losses = head.loss_by_feat(cls_scores, bbox_preds, objectnesses,
+ [gt_instances], img_metas)
+ onegt_cls_loss = one_gt_losses['loss_cls'].sum()
+ onegt_box_loss = one_gt_losses['loss_bbox'].sum()
+ onegt_obj_loss = one_gt_losses['loss_obj'].sum()
+        self.assertEqual(onegt_cls_loss.item(), 0,
+                         'cls loss should be zero when num_classes == 1')
+ self.assertGreater(onegt_box_loss.item(), 0,
+ 'box loss should be non-zero')
+ self.assertGreater(onegt_obj_loss.item(), 0,
+ 'obj loss should be non-zero')
diff --git a/tests/test_models/test_detectors/test_yolo_detector.py b/tests/test_models/test_detectors/test_yolo_detector.py
index d8df3289d..906b2324b 100644
--- a/tests/test_models/test_detectors/test_yolo_detector.py
+++ b/tests/test_models/test_detectors/test_yolo_detector.py
@@ -22,7 +22,8 @@ def setUp(self):
'yolov5/yolov5_n-v61_syncbn_fast_8xb16-300e_coco.py',
'yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py',
'yolox/yolox_tiny_8xb8-300e_coco.py',
- 'rtmdet/rtmdet_tiny_syncbn_8xb32-300e_coco.py'
+ 'rtmdet/rtmdet_tiny_syncbn_8xb32-300e_coco.py',
+ 'yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py'
])
def test_init(self, cfg_file):
model = get_detector_cfg(cfg_file)
@@ -37,6 +38,7 @@ def test_init(self, cfg_file):
@parameterized.expand([
('yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py', ('cuda', 'cpu')),
('yolox/yolox_s_8xb8-300e_coco.py', ('cuda', 'cpu')),
+ ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
('rtmdet/rtmdet_tiny_syncbn_8xb32-300e_coco.py', ('cuda', 'cpu'))
])
def test_forward_loss_mode(self, cfg_file, devices):
@@ -47,6 +49,13 @@ def test_forward_loss_mode(self, cfg_file, devices):
model = get_detector_cfg(cfg_file)
model.backbone.init_cfg = None
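+        # The `fast` configs pair a YOLOv5-specific data preprocessor with
+        # yolov5_collate; swap in the plain mmdet preprocessor so the
+        # detector can be fed directly in this test.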
+ if 'fast' in cfg_file:
+ model.data_preprocessor = dict(
+ type='mmdet.DetDataPreprocessor',
+ mean=[0., 0., 0.],
+ std=[255., 255., 255.],
+ bgr_to_rgb=True)
+
from mmdet.models import build_detector
assert all([device in ['cpu', 'cuda'] for device in devices])
@@ -69,6 +78,7 @@ def test_forward_loss_mode(self, cfg_file, devices):
'cpu')),
('yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py', ('cuda', 'cpu')),
('yolox/yolox_tiny_8xb8-300e_coco.py', ('cuda', 'cpu')),
+ ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
('rtmdet/rtmdet_tiny_syncbn_8xb32-300e_coco.py', ('cuda', 'cpu'))
])
def test_forward_predict_mode(self, cfg_file, devices):
@@ -100,6 +110,7 @@ def test_forward_predict_mode(self, cfg_file, devices):
'cpu')),
('yolov6/yolov6_s_syncbn_fast_8xb32-400e_coco.py', ('cuda', 'cpu')),
('yolox/yolox_tiny_8xb8-300e_coco.py', ('cuda', 'cpu')),
+ ('yolov7/yolov7_tiny_syncbn_fast_8x16b-300e_coco.py', ('cuda', 'cpu')),
('rtmdet/rtmdet_tiny_syncbn_8xb32-300e_coco.py', ('cuda', 'cpu'))
])
def test_forward_tensor_mode(self, cfg_file, devices):
diff --git a/tests/test_models/test_necks/test_yolov6_pafpn.py b/tests/test_models/test_necks/test_yolov6_pafpn.py
index ae09f6ac1..bea49febe 100644
--- a/tests/test_models/test_necks/test_yolov6_pafpn.py
+++ b/tests/test_models/test_necks/test_yolov6_pafpn.py
@@ -3,15 +3,15 @@
import torch
-from mmyolo.models.necks import YOLOv6RepPAFPN
+from mmyolo.models.necks import YOLOv6CSPRepPAFPN, YOLOv6RepPAFPN
from mmyolo.utils import register_all_modules
register_all_modules()
-class TestYOLOv6RepPAFPN(TestCase):
+class TestYOLOv6PAFPN(TestCase):
- def test_forward(self):
+ def test_YOLOv6RepPAFP_forward(self):
s = 64
in_channels = [8, 16, 32]
feat_sizes = [s // 2**i for i in range(4)] # [32, 16, 8]
@@ -27,3 +27,20 @@ def test_forward(self):
for i in range(len(feats)):
assert outs[i].shape[1] == out_channels[i]
assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
+
+ def test_YOLOv6CSPRepPAFPN_forward(self):
+ s = 64
+ in_channels = [8, 16, 32]
+ feat_sizes = [s // 2**i for i in range(4)] # [32, 16, 8]
+ out_channels = [8, 16, 32]
+ feats = [
+ torch.rand(1, in_channels[i], feat_sizes[i], feat_sizes[i])
+ for i in range(len(in_channels))
+ ]
+ neck = YOLOv6CSPRepPAFPN(
+ in_channels=in_channels, out_channels=out_channels)
+ outs = neck(feats)
+ assert len(outs) == len(feats)
+ for i in range(len(feats)):
+ assert outs[i].shape[1] == out_channels[i]
+ assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
diff --git a/tests/test_models/test_necks/test_yolov7_pafpn.py b/tests/test_models/test_necks/test_yolov7_pafpn.py
new file mode 100644
index 000000000..17bf455c1
--- /dev/null
+++ b/tests/test_models/test_necks/test_yolov7_pafpn.py
@@ -0,0 +1,79 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+from unittest import TestCase
+
+import torch
+from mmcv.cnn import ConvModule
+
+from mmyolo.models.necks import YOLOv7PAFPN
+from mmyolo.utils import register_all_modules
+
+register_all_modules()
+
+
+class TestYOLOv7PAFPN(TestCase):
+
+ def test_forward(self):
+ # test P5
+ s = 64
+ in_channels = [8, 16, 32]
+ feat_sizes = [s // 2**i for i in range(4)] # [32, 16, 8]
+ out_channels = [8, 16, 32]
+ feats = [
+ torch.rand(1, in_channels[i], feat_sizes[i], feat_sizes[i])
+ for i in range(len(in_channels))
+ ]
+ neck = YOLOv7PAFPN(in_channels=in_channels, out_channels=out_channels)
+ outs = neck(feats)
+ assert len(outs) == len(feats)
+ for i in range(len(feats)):
+ assert outs[i].shape[1] == out_channels[i] * 2
+ assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
+
+ # test is_tiny_version
+ neck = YOLOv7PAFPN(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ is_tiny_version=True)
+ outs = neck(feats)
+ assert len(outs) == len(feats)
+ for i in range(len(feats)):
+ assert outs[i].shape[1] == out_channels[i] * 2
+ assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
+
+ # test use_in_channels_in_downsample
+ neck = YOLOv7PAFPN(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ use_in_channels_in_downsample=True)
+        outs = neck(feats)
+ assert len(outs) == len(feats)
+ for i in range(len(feats)):
+ assert outs[i].shape[1] == out_channels[i] * 2
+ assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
+
+ # test use_repconv_outs is False
+ neck = YOLOv7PAFPN(
+ in_channels=in_channels,
+ out_channels=out_channels,
+ use_repconv_outs=False)
+ self.assertIsInstance(neck.out_layers[0], ConvModule)
+
+ # test P6
+ s = 64
+ in_channels = [8, 16, 32, 64]
+ feat_sizes = [s // 2**i for i in range(4)]
+ out_channels = [8, 16, 32, 64]
+ feats = [
+ torch.rand(1, in_channels[i], feat_sizes[i], feat_sizes[i])
+ for i in range(len(in_channels))
+ ]
+ neck = YOLOv7PAFPN(in_channels=in_channels, out_channels=out_channels)
+ outs = neck(feats)
+ assert len(outs) == len(feats)
+ for i in range(len(feats)):
+ assert outs[i].shape[1] == out_channels[i]
+ assert outs[i].shape[2] == outs[i].shape[3] == s // (2**i)
diff --git a/tools/analysis_tools/browse_coco_json.py b/tools/analysis_tools/browse_coco_json.py
index 4f16774ca..71a2fc2a9 100644
--- a/tools/analysis_tools/browse_coco_json.py
+++ b/tools/analysis_tools/browse_coco_json.py
@@ -10,7 +10,10 @@
def show_coco_json(args):
- coco = COCO(osp.join(args.data_root, args.ann_file))
+ if args.data_root is not None:
+ coco = COCO(osp.join(args.data_root, args.ann_file))
+ else:
+ coco = COCO(args.ann_file)
print(f'Total number of images:{len(coco.getImgIds())}')
categories = coco.loadCats(coco.getCatIds())
category_names = [category['name'] for category in categories]
@@ -30,8 +33,11 @@ def show_coco_json(args):
for i in range(len(image_ids)):
image_data = coco.loadImgs(image_ids[i])[0]
- image_path = osp.join(args.data_root, args.img_dir,
- image_data['file_name'])
+ if args.data_root is not None:
+ image_path = osp.join(args.data_root, args.img_dir,
+ image_data['file_name'])
+ else:
+ image_path = osp.join(args.img_dir, image_data['file_name'])
annotation_ids = coco.getAnnIds(
imgIds=image_data['id'], catIds=category_ids, iscrowd=0)
@@ -103,14 +109,13 @@ def show_bbox_only(coco, anns, show_label_bbox=True, is_filling=True):
def parse_args():
parser = argparse.ArgumentParser(description='Show coco json file')
+ parser.add_argument('--data-root', default=None, help='dataset root')
parser.add_argument(
- 'data_root', default='data/coco/', help='data root path')
+ '--img-dir', default='data/coco/train2017', help='image folder path')
parser.add_argument(
- '--ann_file',
- default='annotations/instances_train2017.json',
+ '--ann-file',
+ default='data/coco/annotations/instances_train2017.json',
help='ann file path')
- parser.add_argument(
- '--img_dir', default='train2017', help='image folder path')
parser.add_argument(
'--wait-time', type=float, default=2, help='the interval of show (s)')
parser.add_argument(
@@ -133,6 +138,10 @@ def parse_args():
return args
-if __name__ == '__main__':
+def main():
args = parse_args()
show_coco_json(args)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/tools/analysis_tools/browse_dataset.py b/tools/analysis_tools/browse_dataset.py
index ee5e37929..5b45c25d3 100644
--- a/tools/analysis_tools/browse_dataset.py
+++ b/tools/analysis_tools/browse_dataset.py
@@ -1,28 +1,64 @@
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
import os.path as osp
+import sys
+from typing import Tuple
+import cv2
+import mmcv
import numpy as np
from mmdet.models.utils import mask2ndarray
from mmdet.structures.bbox import BaseBoxes
from mmengine.config import Config, DictAction
+from mmengine.dataset import Compose
from mmengine.utils import ProgressBar
+from mmengine.visualization import Visualizer
from mmyolo.registry import DATASETS, VISUALIZERS
from mmyolo.utils import register_all_modules
+# TODO: Support for printing the change in key of results
def parse_args():
parser = argparse.ArgumentParser(description='Browse a dataset')
parser.add_argument('config', help='train config file path')
+ parser.add_argument(
+ '--phase',
+ '-p',
+ default='train',
+ type=str,
+ choices=['train', 'test', 'val'],
+ help='phase of dataset to visualize, accept "train" "test" and "val".'
+ ' Defaults to "train".')
+ parser.add_argument(
+ '--mode',
+ '-m',
+ default='transformed',
+ type=str,
+ choices=['original', 'transformed', 'pipeline'],
+ help='display mode; display original pictures or '
+ 'transformed pictures or comparison pictures. "original" '
+ 'means show images load from disk; "transformed" means '
+ 'to show images after transformed; "pipeline" means show all '
+ 'the intermediate images. Defaults to "transformed".')
parser.add_argument(
'--output-dir',
default=None,
type=str,
- help='If there is no display interface, you can save it')
+ help='If there is no display interface, you can save it.')
parser.add_argument('--not-show', default=False, action='store_true')
+ parser.add_argument(
+ '--show-number',
+ '-n',
+ type=int,
+ default=sys.maxsize,
+ help='number of images selected to visualize, '
+ 'must bigger than 0. if the number is bigger than length '
+ 'of dataset, show all the images in dataset; '
+ 'default "sys.maxsize", show all images in dataset')
parser.add_argument(
'--show-interval',
+ '-i',
type=float,
default=3,
help='the interval of show (s)')
@@ -40,49 +76,180 @@ def parse_args():
return args
+def _get_adaptive_scale(img_shape: Tuple[int, int],
+ min_scale: float = 0.3,
+ max_scale: float = 3.0) -> float:
+    """Get an adaptive scale according to the image shape.
+
+    The target scale depends on the short edge length of the image: a
+    short edge of 224 maps to a scale of 1.0, and the scale grows
+    linearly with the short edge length. ``min_scale`` and ``max_scale``
+    clamp the result.
+
+    Args:
+        img_shape (Tuple[int, int]): The shape of the canvas image.
+        min_scale (float): The minimum scale. Defaults to 0.3.
+        max_scale (float): The maximum scale. Defaults to 3.0.
+
+    Returns:
+        float: The adaptive scale.
+    """
+ short_edge_length = min(img_shape)
+ scale = short_edge_length / 224.
+ return min(max(scale, min_scale), max_scale)
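+# e.g. _get_adaptive_scale((448, 672)) -> 2.0 (448 / 224), while
+# _get_adaptive_scale((64, 64)) -> 0.3 (64 / 224, clamped to min_scale)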
+
+
+def make_grid(imgs, names):
+ """Concat list of pictures into a single big picture, align height here."""
+ visualizer = Visualizer.get_current_instance()
+ ori_shapes = [img.shape[:2] for img in imgs]
+ max_height = int(max(img.shape[0] for img in imgs) * 1.1)
+ min_width = min(img.shape[1] for img in imgs)
+ horizontal_gap = min_width // 10
+ img_scale = _get_adaptive_scale((max_height, min_width))
+
+ texts = []
+ text_positions = []
+ start_x = 0
+ for i, img in enumerate(imgs):
+ pad_height = (max_height - img.shape[0]) // 2
+ pad_width = horizontal_gap // 2
+ # make border
+ imgs[i] = cv2.copyMakeBorder(
+ img,
+ pad_height,
+ max_height - img.shape[0] - pad_height + int(img_scale * 30 * 2),
+ pad_width,
+ pad_width,
+ cv2.BORDER_CONSTANT,
+ value=(255, 255, 255))
+ texts.append(f'{"execution: "}{i}\n{names[i]}\n{ori_shapes[i]}')
+ text_positions.append(
+ [start_x + img.shape[1] // 2 + pad_width, max_height])
+ start_x += img.shape[1] + horizontal_gap
+
+ display_img = np.concatenate(imgs, axis=1)
+ visualizer.set_image(display_img)
+ img_scale = _get_adaptive_scale(display_img.shape[:2])
+ visualizer.draw_texts(
+ texts,
+ positions=np.array(text_positions),
+ font_sizes=img_scale * 7,
+ colors='black',
+ horizontal_alignments='center',
+ font_families='monospace')
+ return visualizer.get_image()
+
+
+class InspectCompose(Compose):
+ """Compose multiple transforms sequentially.
+
+    It also records the "img" field of all intermediate results in one list.
+ """
+
+ def __init__(self, transforms, intermediate_imgs):
+ super().__init__(transforms=transforms)
+ self.intermediate_imgs = intermediate_imgs
+
+ def __call__(self, data):
+ if 'img' in data:
+ self.intermediate_imgs.append({
+ 'name': 'original',
+ 'img': data['img'].copy()
+ })
+ self.ptransforms = [
+ self.transforms[i] for i in range(len(self.transforms) - 1)
+ ]
+ for t in self.ptransforms:
+ data = t(data)
+ # Keep the same meta_keys in the PackDetInputs
+ self.transforms[-1].meta_keys = [key for key in data]
+ data_sample = self.transforms[-1](data)
+ if data is None:
+ return None
+ if 'img' in data:
+ self.intermediate_imgs.append({
+ 'name':
+ t.__class__.__name__,
+ 'dataset_sample':
+ data_sample['data_samples']
+ })
+ return data
+
+
def main():
args = parse_args()
cfg = Config.fromfile(args.config)
if args.cfg_options is not None:
cfg.merge_from_dict(args.cfg_options)
- # register all modules in mmdet into the registries
+ # register all modules in mmyolo into the registries
register_all_modules()
- dataset = DATASETS.build(cfg.train_dataloader.dataset)
+ dataset_cfg = cfg.get(args.phase + '_dataloader').get('dataset')
+ dataset = DATASETS.build(dataset_cfg)
visualizer = VISUALIZERS.build(cfg.visualizer)
visualizer.dataset_meta = dataset.metainfo
- progress_bar = ProgressBar(len(dataset))
- for item in dataset:
- img = item['inputs'].permute(1, 2, 0).numpy()
- data_samples = item['data_samples'].numpy()
- gt_instances = data_samples.gt_instances
- img_path = osp.basename(item['data_samples'].img_path)
-
- out_file = osp.join(
- args.output_dir,
- osp.basename(img_path)) if args.output_dir is not None else None
-
- img = img[..., [2, 1, 0]] # bgr to rgb
- gt_bboxes = gt_instances.get('bboxes', None)
- if gt_bboxes is not None and isinstance(gt_bboxes, BaseBoxes):
- gt_instances.bboxes = gt_bboxes.tensor
- gt_masks = gt_instances.get('masks', None)
- if gt_masks is not None:
- masks = mask2ndarray(gt_masks)
- gt_instances.masks = masks.astype(np.bool)
- data_samples.gt_instances = gt_instances
-
- visualizer.add_datasample(
- osp.basename(img_path),
- img,
- data_samples,
- draw_pred=False,
- show=not args.not_show,
- wait_time=args.show_interval,
- out_file=out_file)
+ intermediate_imgs = []
+    # TODO: The dataset wrapper case is not considered here
+ dataset.pipeline = InspectCompose(dataset.pipeline.transforms,
+ intermediate_imgs)
+
+ # init visualization image number
+ assert args.show_number > 0
+ display_number = min(args.show_number, len(dataset))
+
+ progress_bar = ProgressBar(display_number)
+ for i, item in zip(range(display_number), dataset):
+ image_i = []
+ result_i = [result['dataset_sample'] for result in intermediate_imgs]
+ for k, datasample in enumerate(result_i):
+ image = datasample.img
+ gt_instances = datasample.gt_instances
+ image = image[..., [2, 1, 0]] # bgr to rgb
+ gt_bboxes = gt_instances.get('bboxes', None)
+ if gt_bboxes is not None and isinstance(gt_bboxes, BaseBoxes):
+ gt_instances.bboxes = gt_bboxes.tensor
+ gt_masks = gt_instances.get('masks', None)
+ if gt_masks is not None:
+ masks = mask2ndarray(gt_masks)
+                gt_instances.masks = masks.astype(bool)
+ datasample.gt_instances = gt_instances
+            # draw the gt instances of this intermediate result
+ visualizer.add_datasample(
+ 'result',
+ image,
+ datasample,
+ draw_pred=False,
+ draw_gt=True,
+ show=False)
+ image_show = visualizer.get_image()
+ image_i.append(image_show)
+
+ if args.mode == 'original':
+ image = image_i[0]
+ elif args.mode == 'transformed':
+ image = image_i[-1]
+ else:
+ image = make_grid([result for result in image_i],
+ [result['name'] for result in intermediate_imgs])
+
+ if hasattr(datasample, 'img_path'):
+ filename = osp.basename(datasample.img_path)
+ else:
+            # some datasets do not have an image path
+ filename = f'{i}.jpg'
+ out_file = osp.join(args.output_dir,
+ filename) if args.output_dir is not None else None
+
+ if out_file is not None:
+ mmcv.imwrite(image[..., ::-1], out_file)
+
+ if not args.not_show:
+ visualizer.show(
+ image, win_name=filename, wait_time=args.show_interval)
+ intermediate_imgs.clear()
progress_bar.update()
diff --git a/tools/analysis_tools/dataset_analysis.py b/tools/analysis_tools/dataset_analysis.py
index ae0bd1144..6e494677f 100644
--- a/tools/analysis_tools/dataset_analysis.py
+++ b/tools/analysis_tools/dataset_analysis.py
@@ -7,12 +7,12 @@
import matplotlib.pyplot as plt
import numpy as np
from mmengine.config import Config
-from mmengine.dataset.dataset_wrapper import ConcatDataset
from mmengine.utils import ProgressBar
from prettytable import PrettyTable
from mmyolo.registry import DATASETS
from mmyolo.utils import register_all_modules
+from mmyolo.utils.misc import show_data_classes
def parse_args():
@@ -348,29 +348,6 @@ def show_data_list(args, area_rule):
print(data_info)
-def show_data_classes(data_classes):
- """When printing an error, all class names of the dataset."""
- print('\n\nThe name of the class contained in the dataset:')
- data_classes_info = PrettyTable()
- data_classes_info.title = 'Information of dataset class'
- # List Print Settings
- # If the quantity is too large, 25 rows will be displayed in each column
- if len(data_classes) < 25:
- data_classes_info.add_column('Class name', data_classes)
- elif len(data_classes) % 25 != 0 and len(data_classes) > 25:
- col_num = int(len(data_classes) / 25) + 1
- data_name_list = list(data_classes)
- for i in range(0, (col_num * 25) - len(data_classes)):
- data_name_list.append('')
- for i in range(0, len(data_name_list), 25):
- data_classes_info.add_column('Class name',
- data_name_list[i:i + 25])
-
- # Align display data to the left
- data_classes_info.align['Class name'] = 'l'
- print(data_classes_info)
-
-
def main():
args = parse_args()
cfg = Config.fromfile(args.config)
@@ -378,21 +355,36 @@ def main():
    # register all modules in mmyolo into the registries
register_all_modules()
+ def replace_pipeline_to_none(cfg):
+ """Recursively iterate over all dataset(or datasets) and set their
+ pipelines to none.Datasets are mean ConcatDataset.
+
+ Recursively terminates only when all dataset(or datasets) have been
+ traversed
+ """
+
+ if cfg.get('dataset', None) is None and cfg.get('datasets',
+ None) is None:
+ return
+ dataset = cfg.dataset if cfg.get('dataset', None) else cfg.datasets
+ if isinstance(dataset, list):
+ for item in dataset:
+ item.pipeline = None
+ elif dataset.get('pipeline', None):
+ dataset.pipeline = None
+ else:
+ replace_pipeline_to_none(dataset)
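+        # e.g. for a RepeatDataset config, the recursive call clears
+        # cfg.train_dataloader.dataset.dataset.pipeline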
+
# 1.Build Dataset
if args.val_dataset is False:
+ replace_pipeline_to_none(cfg.train_dataloader)
dataset = DATASETS.build(cfg.train_dataloader.dataset)
- elif args.val_dataset is True:
+ else:
+ replace_pipeline_to_none(cfg.val_dataloader)
dataset = DATASETS.build(cfg.val_dataloader.dataset)
- # Determine whether the dataset is ConcatDataset
- if isinstance(dataset, ConcatDataset):
- datasets = dataset.datasets
- data_list = []
- for idx in range(len(datasets)):
- datasets_list = datasets[idx].load_data_list()
- data_list += datasets_list
- else:
- data_list = dataset.load_data_list()
+    # With the pipelines removed, the dataset itself serves as the raw
+    # data list
+ data_list = dataset
# 2.Prepare data
# Drawing settings
diff --git a/tools/dataset_converters/labelme2coco.py b/tools/dataset_converters/labelme2coco.py
new file mode 100644
index 000000000..94e46e166
--- /dev/null
+++ b/tools/dataset_converters/labelme2coco.py
@@ -0,0 +1,310 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+"""This script helps to convert labelme-style dataset to the coco format.
+
+Usage:
+ $ python labelme2coco.py \
+ --img-dir /path/to/images \
+ --labels-dir /path/to/labels \
+ --out /path/to/coco_instances.json \
+ [--class-id-txt /path/to/class_with_id.txt]
+
+Note:
+ Labels dir file structure:
+ .
+ └── PATH_TO_LABELS
+ ├── image1.json
+ ├── image2.json
+ └── ...
+
+ Images dir file structure:
+ .
+ └── PATH_TO_IMAGES
+ ├── image1.jpg
+ ├── image2.png
+ └── ...
+
+    If the user sets `--class-id-txt`, it is used for the `categories`
+    field; if not set, the categories are generated automatically from all
+    the labelme label files and saved to `class_with_id.txt`.
+
+ class_with_id.txt example, each line is "id class_name":
+ ```text
+ 1 cat
+ 2 dog
+ 3 bicycle
+ 4 motorcycle
+
+ ```
+"""
+import argparse
+import json
+from pathlib import Path
+from typing import Optional, Tuple
+
+import numpy as np
+from mmengine import track_iter_progress
+
+from mmyolo.utils.misc import IMG_EXTENSIONS
+
+
+def parse_args():
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--img-dir', type=str, help='Dataset image directory')
+ parser.add_argument(
+ '--labels-dir', type=str, help='Dataset labels directory')
+ parser.add_argument('--out', type=str, help='COCO label json output path')
+ parser.add_argument(
+ '--class-id-txt', default=None, type=str, help='All class id txt path')
+ args = parser.parse_args()
+ return args
+
+
+def format_coco_annotations(points: list, image_id: int, annotations_id: int,
+ category_id: int) -> dict:
+ """Gen COCO annotations format label from labelme format label.
+
+ Args:
+ points (list): Coordinates of four vertices of rectangle bbox.
+ image_id (int): Image id.
+ annotations_id (int): Annotations id.
+        category_id (int): Category id.
+
+ Return:
+ annotation_info (dict): COCO annotation data.
+ """
+ annotation_info = dict()
+ annotation_info['iscrowd'] = 0
+ annotation_info['category_id'] = category_id
+ annotation_info['id'] = annotations_id
+ annotation_info['image_id'] = image_id
+
+ # bbox is [x1, y1, w, h]
+ annotation_info['bbox'] = [
+ points[0][0], points[0][1], points[1][0] - points[0][0],
+ points[1][1] - points[0][1]
+ ]
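+    # e.g. points[0] = [10, 20], points[1] = [110, 220]
+    # -> bbox = [10, 20, 100, 200]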
+
+ annotation_info['area'] = annotation_info['bbox'][2] * annotation_info[
+ 'bbox'][3] # bbox w * h
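+    # build the polygon for the `segmentation` field: swapping rows 1 and 2
+    # turns [[x1, y1], [x2, y2], [x1, y2], [x2, y1]] into
+    # [[x1, y1], [x1, y2], [x2, y2], [x2, y1]], the corners in polygon order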
+ segmentation_points = np.asarray(points).copy()
+ segmentation_points[1, :] = np.asarray(points)[2, :]
+ segmentation_points[2, :] = np.asarray(points)[1, :]
+ annotation_info['segmentation'] = [list(segmentation_points.flatten())]
+
+ return annotation_info
+
+
+def parse_labelme_to_coco(
+ image_dir: str,
+ labels_root: str,
+        all_classes_id: Optional[dict] = None) -> Tuple[dict, dict]:
+ """Gen COCO json format label from labelme format label.
+
+ Args:
+ image_dir (str): Image dir path.
+ labels_root (str): Image label root path.
+ all_classes_id (Optional[dict]): All class with id. Default None.
+
+ Return:
+ coco_json (dict): COCO json data.
+ category_to_id (dict): category id and name.
+
+ COCO json example:
+
+ {
+ "images": [
+ {
+ "height": 3000,
+ "width": 4000,
+ "id": 1,
+ "file_name": "IMG_20210627_225110.jpg"
+ },
+ ...
+ ],
+ "categories": [
+ {
+ "id": 1,
+ "name": "cat"
+ },
+ ...
+ ],
+ "annotations": [
+ {
+ "iscrowd": 0,
+ "category_id": 1,
+ "id": 1,
+ "image_id": 1,
+ "bbox": [
+ 1183.7313232421875,
+ 1230.0509033203125,
+ 1270.9998779296875,
+ 927.0848388671875
+ ],
+ "area": 1178324.7170306593,
+ "segmentation": [
+ [
+ 1183.7313232421875,
+ 1230.0509033203125,
+ 1183.7313232421875,
+ 2157.1357421875,
+ 2454.731201171875,
+ 2157.1357421875,
+ 2454.731201171875,
+ 1230.0509033203125
+ ]
+ ]
+ },
+ ...
+ ]
+ }
+ """
+
+ # init coco json field
+ coco_json = {'images': [], 'categories': [], 'annotations': []}
+
+ image_id = 0
+ annotations_id = 0
+ if all_classes_id is None:
+ category_to_id = dict()
+ categories_labels = []
+ else:
+ category_to_id = all_classes_id
+ categories_labels = list(all_classes_id.keys())
+
+ # filter incorrect image file
+ img_file_list = [
+ img_file for img_file in Path(image_dir).iterdir()
+ if img_file.suffix.lower() in IMG_EXTENSIONS
+ ]
+
+ for img_file in track_iter_progress(img_file_list):
+
+ # get label file according to the image file name
+ label_path = Path(labels_root).joinpath(
+ img_file.stem).with_suffix('.json')
+ if not label_path.exists():
+            print(f'Cannot find label file: {label_path}, skip...')
+ continue
+
+ # load labelme label
+ with open(label_path, encoding='utf-8') as f:
+ labelme_data = json.load(f)
+
+ image_id = image_id + 1 # coco id begin from 1
+
+ # update coco 'images' field
+ coco_json['images'].append({
+ 'height':
+ labelme_data['imageHeight'],
+ 'width':
+ labelme_data['imageWidth'],
+ 'id':
+ image_id,
+ 'file_name':
+ Path(labelme_data['imagePath']).name
+ })
+
+ for label_shapes in labelme_data['shapes']:
+
+ # Update coco 'categories' field
+ class_name = label_shapes['label']
+
+ if (all_classes_id is None) and (class_name
+ not in categories_labels):
+ # only update when not been added before
+ coco_json['categories'].append({
+ 'id':
+ len(categories_labels) + 1, # categories id start with 1
+ 'name': class_name
+ })
+ categories_labels.append(class_name)
+ category_to_id[class_name] = len(categories_labels)
+
+ elif (all_classes_id is not None) and (class_name
+ not in categories_labels):
+ # check class name
+ raise ValueError(f'Got unexpected class name {class_name}, '
+ 'which is not in your `--class-id-txt`.')
+
+ # get shape type and convert it to coco format
+ shape_type = label_shapes['shape_type']
+ if shape_type != 'rectangle':
+            print(f'`{shape_type}` is not supported yet, skip...')
+ continue
+
+ annotations_id = annotations_id + 1
+        # build the four rectangle corners from the two labelme points;
+        # format_coco_annotations converts them to [x, y, w, h] later
+ (x1, y1), (x2, y2) = label_shapes['points']
+ x1, x2 = sorted([x1, x2]) # xmin, xmax
+ y1, y2 = sorted([y1, y2]) # ymin, ymax
+ points = [[x1, y1], [x2, y2], [x1, y2], [x2, y1]]
+ coco_annotations = format_coco_annotations(
+ points, image_id, annotations_id, category_to_id[class_name])
+ coco_json['annotations'].append(coco_annotations)
+
+ print(f'Total image = {image_id}')
+ print(f'Total annotations = {annotations_id}')
+ print(f'Number of categories = {len(categories_labels)}, '
+ f'which is {categories_labels}')
+
+ return coco_json, category_to_id
+
+
+def convert_labelme_to_coco(image_dir: str,
+ labels_dir: str,
+ out_path: str,
+ class_id_txt: Optional[str] = None):
+ """Convert labelme format label to COCO json format label.
+
+ Args:
+ image_dir (str): Image dir path.
+ labels_dir (str): Image label path.
+ out_path (str): COCO json file save path.
+ class_id_txt (Optional[str]): All class id txt file path.
+ Default None.
+ """
+ assert Path(out_path).suffix == '.json'
+
+ if class_id_txt is not None:
+ assert Path(class_id_txt).suffix == '.txt'
+
+ all_classes_id = dict()
+ with open(class_id_txt, encoding='utf-8') as f:
+ txt_lines = f.read().splitlines()
+ assert len(txt_lines) > 0
+
+ for txt_line in txt_lines:
+ v, k = txt_line.split(' ')
+                # store ids as int so `category_id` stays an int in the json
+                all_classes_id.update({k: int(v)})
+ else:
+ all_classes_id = None
+
+ # convert to coco json
+ coco_json_data, category_to_id = parse_labelme_to_coco(
+ image_dir, labels_dir, all_classes_id)
+
+ # save json result
+ Path(out_path).parent.mkdir(exist_ok=True, parents=True)
+ print(f'Saving json to {out_path}')
+    with open(out_path, 'w', encoding='utf-8') as f:
+        json.dump(coco_json_data, f, indent=2)
+
+ if class_id_txt is None:
+ category_to_id_path = Path(out_path).with_name('class_with_id.txt')
+ print(f'Saving class id txt to {category_to_id_path}')
+ with open(category_to_id_path, 'w', encoding='utf-8') as f:
+ for k, v in category_to_id.items():
+ f.write(f'{v} {k}\n')
+ else:
+        print('Not saving a new class id txt; the user should use '
+              f'{class_id_txt} for the training config')
+
+
+def main():
+ args = parse_args()
+ convert_labelme_to_coco(args.img_dir, args.labels_dir, args.out,
+ args.class_id_txt)
+ print('All done!')
+
+
+if __name__ == '__main__':
+ main()
diff --git a/tools/misc/coco_split.py b/tools/misc/coco_split.py
new file mode 100644
index 000000000..8ce70349b
--- /dev/null
+++ b/tools/misc/coco_split.py
@@ -0,0 +1,122 @@
+# Copyright (c) OpenMMLab. All rights reserved.
+import argparse
+import json
+import random
+from pathlib import Path
+
+import numpy as np
+from pycocotools.coco import COCO
+
+
+def parse_args():
+ parser = argparse.ArgumentParser()
+ parser.add_argument(
+ '--json', type=str, required=True, help='COCO json label path')
+ parser.add_argument(
+ '--out-dir', type=str, required=True, help='output path')
+ parser.add_argument(
+ '--ratios',
+ nargs='+',
+ type=float,
+        help='ratios for the sub datasets; if 2 numbers are set, trainval '
+        '+ test are generated (e.g. "0.85 0.15" or "2 1"); if 3 numbers '
+        'are set, train + val + test are generated (e.g. "0.8 0.1 0.1" '
+        'or "2 1 1")')
+ parser.add_argument(
+ '--shuffle',
+ action='store_true',
+        help='Whether to shuffle the image ids before splitting')
+ parser.add_argument('--seed', default=-1, type=int, help='seed')
+ args = parser.parse_args()
+ return args
+
+
+def split_coco_dataset(coco_json_path: str, save_dir: str, ratios: list,
+ shuffle: bool, seed: int):
+ if not Path(coco_json_path).exists():
+        raise FileNotFoundError(f'Cannot find {coco_json_path}')
+
+ if not Path(save_dir).exists():
+ Path(save_dir).mkdir(parents=True)
+
+ # ratio normalize
+ ratios = np.array(ratios) / np.array(ratios).sum()
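+    # e.g. ratios [2, 1, 1] are normalized to [0.5, 0.25, 0.25]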
+
+ if len(ratios) == 2:
+ ratio_train, ratio_test = ratios
+ ratio_val = 0
+ train_type = 'trainval'
+ elif len(ratios) == 3:
+ ratio_train, ratio_val, ratio_test = ratios
+ train_type = 'train'
+ else:
+ raise ValueError('ratios must set 2 or 3 group!')
+
+ # Read coco info
+ coco = COCO(coco_json_path)
+ coco_image_ids = coco.getImgIds()
+
+ # gen image number of each dataset
+ val_image_num = int(len(coco_image_ids) * ratio_val)
+ test_image_num = int(len(coco_image_ids) * ratio_test)
+ train_image_num = len(coco_image_ids) - val_image_num - test_image_num
+ print('Split info: ====== \n'
+ f'Train ratio = {ratio_train}, number = {train_image_num}\n'
+ f'Val ratio = {ratio_val}, number = {val_image_num}\n'
+ f'Test ratio = {ratio_test}, number = {test_image_num}')
+
+ seed = int(seed)
+ if seed != -1:
+ print(f'Set the global seed: {seed}')
+        np.random.seed(seed)
+        # also seed the `random` module, since random.shuffle is used below
+        random.seed(seed)
+
+ if shuffle:
+ print('shuffle dataset.')
+ random.shuffle(coco_image_ids)
+
+ # split each dataset
+ train_image_ids = coco_image_ids[:train_image_num]
+ if val_image_num != 0:
+ val_image_ids = coco_image_ids[train_image_num:train_image_num +
+ val_image_num]
+ else:
+ val_image_ids = None
+ test_image_ids = coco_image_ids[train_image_num + val_image_num:]
+
+ # Save new json
+ categories = coco.loadCats(coco.getCatIds())
+ for img_id_list in [train_image_ids, val_image_ids, test_image_ids]:
+ if img_id_list is None:
+ continue
+
+ # Gen new json
+ img_dict = {
+ 'images': coco.loadImgs(ids=img_id_list),
+ 'categories': categories,
+ 'annotations': coco.loadAnns(coco.getAnnIds(imgIds=img_id_list))
+ }
+
+ # save json
+ if img_id_list == train_image_ids:
+ json_file_path = Path(save_dir, f'{train_type}.json')
+ elif img_id_list == val_image_ids:
+ json_file_path = Path(save_dir, 'val.json')
+ elif img_id_list == test_image_ids:
+ json_file_path = Path(save_dir, 'test.json')
+ else:
+ raise ValueError('img_id_list ERROR!')
+
+ print(f'Saving json to {json_file_path}')
+ with open(json_file_path, 'w') as f_json:
+ json.dump(img_dict, f_json, ensure_ascii=False, indent=2)
+
+ print('All done!')
+
+
+def main():
+ args = parse_args()
+ split_coco_dataset(args.json, args.out_dir, args.ratios, args.shuffle,
+ args.seed)
+
+
+if __name__ == '__main__':
+ main()
diff --git a/tools/misc/download_dataset.py b/tools/misc/download_dataset.py
index 5d4776b09..7d1c64d82 100644
--- a/tools/misc/download_dataset.py
+++ b/tools/misc/download_dataset.py
@@ -91,10 +91,14 @@ def main():
balloon=[
# src link: https://github.com/matterport/Mask_RCNN/releases/download/v2.1/balloon_dataset.zip # noqa
'https://download.openmmlab.com/mmyolo/data/balloon_dataset.zip'
- ])
+ ],
+ cat=[
+ 'https://download.openmmlab.com/mmyolo/data/cat_dataset.zip' # noqa
+ ],
+ )
url = data2url.get(args.dataset_name, None)
if url is None:
- print('Only support COCO, VOC, balloon,and LVIS now!')
+        print('Only COCO, VOC, balloon, cat and LVIS are supported now!')
return
download(
url,
diff --git a/tools/misc/extract_subcoco.py b/tools/misc/extract_subcoco.py
index a797b580c..31528e0b3 100644
--- a/tools/misc/extract_subcoco.py
+++ b/tools/misc/extract_subcoco.py
@@ -49,23 +49,47 @@ def _process_data(args,
'annotations': []
}
- images = json_data['images']
+ area_dict = {
+ 'small': [0., 32 * 32],
+ 'medium': [32 * 32, 96 * 96],
+ 'large': [96 * 96, float('inf')]
+ }
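+    # these ranges follow the COCO small/medium/large area convention
+    # (thresholds at 32*32 and 96*96 pixels of annotation area)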
+
coco = COCO(ann_path)
+ # filter annotations by category ids and area range
+ areaRng = area_dict[args.area_size] if args.area_size else []
+ catIds = coco.getCatIds(args.classes) if args.classes else []
+ ann_ids = coco.getAnnIds(catIds=catIds, areaRng=areaRng)
+ ann_info = coco.loadAnns(ann_ids)
+
+ # get image ids by anns set
+ filter_img_ids = {ann['image_id'] for ann in ann_info}
+ filter_img = coco.loadImgs(filter_img_ids)
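+    # e.g. `--classes cat dog --area-size small` keeps only small cat/dog
+    # annotations and the images containing at least one of them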
+
# shuffle
- np.random.shuffle(images)
+ np.random.shuffle(filter_img)
- progress_bar = mmengine.ProgressBar(args.num_img)
+ num_img = args.num_img if args.num_img > 0 else len(filter_img)
+ if num_img > len(filter_img):
+        print(
+            f'num_img is too big and will be set to {len(filter_img)}, '
+            'because there are not enough images left after filtering '
+            'by classes and area_size')
+ num_img = len(filter_img)
- for i in range(args.num_img):
- file_name = images[i]['file_name']
+ progress_bar = mmengine.ProgressBar(num_img)
+
+ for i in range(num_img):
+ file_name = filter_img[i]['file_name']
image_path = osp.join(args.root, in_dataset_type + year, file_name)
- ann_ids = coco.getAnnIds(imgIds=[images[i]['id']])
- ann_info = coco.loadAnns(ann_ids)
+ ann_ids = coco.getAnnIds(
+ imgIds=[filter_img[i]['id']], catIds=catIds, areaRng=areaRng)
+ img_ann_info = coco.loadAnns(ann_ids)
- new_json_data['images'].append(images[i])
- new_json_data['annotations'].extend(ann_info)
+ new_json_data['images'].append(filter_img[i])
+ new_json_data['annotations'].extend(img_ann_info)
shutil.copy(image_path, osp.join(args.out_dir,
out_dataset_type + year))
@@ -88,7 +112,16 @@ def parse_args():
parser.add_argument(
'out_dir', type=str, help='directory where subset coco will be saved.')
parser.add_argument(
- '--num-img', default=50, type=int, help='num of extract image')
+ '--num-img',
+ default=50,
+ type=int,
+        help='number of images to extract; -1 means all images')
+ parser.add_argument(
+ '--area-size',
+ choices=['small', 'medium', 'large'],
+ help='filter ground-truth info by area size')
+ parser.add_argument(
+ '--classes', nargs='+', help='filter ground-truth by class name')
parser.add_argument(
'--use-training-set',
action='store_true',
diff --git a/tools/model_converters/yolov6_to_mmyolo.py b/tools/model_converters/yolov6_to_mmyolo.py
index c5385803a..e9e86ab46 100644
--- a/tools/model_converters/yolov6_to_mmyolo.py
+++ b/tools/model_converters/yolov6_to_mmyolo.py
@@ -28,12 +28,28 @@ def convert(src, dst):
if 'ERBlock_2' in k:
name = k.replace('ERBlock_2', 'stage1.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'ERBlock_3' in k:
name = k.replace('ERBlock_3', 'stage2.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'ERBlock_4' in k:
name = k.replace('ERBlock_4', 'stage3.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'ERBlock_5' in k:
name = k.replace('ERBlock_5', 'stage4.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
if 'stage4.0.2' in name:
name = name.replace('stage4.0.2', 'stage4.1')
name = name.replace('cv', 'conv')
@@ -41,10 +57,22 @@ def convert(src, dst):
name = k.replace('reduce_layer0', 'reduce_layers.2')
elif 'Rep_p4' in k:
name = k.replace('Rep_p4', 'top_down_layers.0.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'reduce_layer1' in k:
name = k.replace('reduce_layer1', 'top_down_layers.0.1')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'Rep_p3' in k:
name = k.replace('Rep_p3', 'top_down_layers.1')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'upsample0' in k:
name = k.replace('upsample0.upsample_transpose',
'upsample_layers.0')
@@ -53,8 +81,16 @@ def convert(src, dst):
'upsample_layers.1')
elif 'Rep_n3' in k:
name = k.replace('Rep_n3', 'bottom_up_layers.0')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'Rep_n4' in k:
name = k.replace('Rep_n4', 'bottom_up_layers.1')
+ if '.cv' in k:
+ name = name.replace('.cv', '.conv')
+ if '.m.' in k:
+ name = name.replace('.m.', '.block.')
elif 'downsample2' in k:
name = k.replace('downsample2', 'downsample_layers.0')
elif 'downsample1' in k:
diff --git a/tools/model_converters/yolov7_to_mmyolo.py b/tools/model_converters/yolov7_to_mmyolo.py
index ced4157b5..f8bff9472 100644
--- a/tools/model_converters/yolov7_to_mmyolo.py
+++ b/tools/model_converters/yolov7_to_mmyolo.py
@@ -1,10 +1,85 @@
# Copyright (c) OpenMMLab. All rights reserved.
import argparse
+import os.path as osp
from collections import OrderedDict
import torch
-convert_dict = {
+convert_dict_tiny = {
+ # stem
+ 'model.0': 'backbone.stem.0',
+ 'model.1': 'backbone.stem.1',
+
+ # stage1 TinyDownSampleBlock
+ 'model.2': 'backbone.stage1.0.short_conv',
+ 'model.3': 'backbone.stage1.0.main_convs.0',
+ 'model.4': 'backbone.stage1.0.main_convs.1',
+ 'model.5': 'backbone.stage1.0.main_convs.2',
+ 'model.7': 'backbone.stage1.0.final_conv',
+
+ # stage2 TinyDownSampleBlock
+ 'model.9': 'backbone.stage2.1.short_conv',
+ 'model.10': 'backbone.stage2.1.main_convs.0',
+ 'model.11': 'backbone.stage2.1.main_convs.1',
+ 'model.12': 'backbone.stage2.1.main_convs.2',
+ 'model.14': 'backbone.stage2.1.final_conv',
+
+ # stage3 TinyDownSampleBlock
+ 'model.16': 'backbone.stage3.1.short_conv',
+ 'model.17': 'backbone.stage3.1.main_convs.0',
+ 'model.18': 'backbone.stage3.1.main_convs.1',
+ 'model.19': 'backbone.stage3.1.main_convs.2',
+ 'model.21': 'backbone.stage3.1.final_conv',
+
+ # stage4 TinyDownSampleBlock
+ 'model.23': 'backbone.stage4.1.short_conv',
+ 'model.24': 'backbone.stage4.1.main_convs.0',
+ 'model.25': 'backbone.stage4.1.main_convs.1',
+ 'model.26': 'backbone.stage4.1.main_convs.2',
+ 'model.28': 'backbone.stage4.1.final_conv',
+
+ # neck SPPCSPBlock
+ 'model.29': 'neck.reduce_layers.2.short_layer',
+ 'model.30': 'neck.reduce_layers.2.main_layers',
+ 'model.35': 'neck.reduce_layers.2.fuse_layers',
+ 'model.37': 'neck.reduce_layers.2.final_conv',
+ 'model.38': 'neck.upsample_layers.0.0',
+ 'model.40': 'neck.reduce_layers.1',
+ 'model.42': 'neck.top_down_layers.0.short_conv',
+ 'model.43': 'neck.top_down_layers.0.main_convs.0',
+ 'model.44': 'neck.top_down_layers.0.main_convs.1',
+ 'model.45': 'neck.top_down_layers.0.main_convs.2',
+ 'model.47': 'neck.top_down_layers.0.final_conv',
+ 'model.48': 'neck.upsample_layers.1.0',
+ 'model.50': 'neck.reduce_layers.0',
+ 'model.52': 'neck.top_down_layers.1.short_conv',
+ 'model.53': 'neck.top_down_layers.1.main_convs.0',
+ 'model.54': 'neck.top_down_layers.1.main_convs.1',
+ 'model.55': 'neck.top_down_layers.1.main_convs.2',
+ 'model.57': 'neck.top_down_layers.1.final_conv',
+ 'model.58': 'neck.downsample_layers.0',
+ 'model.60': 'neck.bottom_up_layers.0.short_conv',
+ 'model.61': 'neck.bottom_up_layers.0.main_convs.0',
+ 'model.62': 'neck.bottom_up_layers.0.main_convs.1',
+ 'model.63': 'neck.bottom_up_layers.0.main_convs.2',
+ 'model.65': 'neck.bottom_up_layers.0.final_conv',
+ 'model.66': 'neck.downsample_layers.1',
+ 'model.68': 'neck.bottom_up_layers.1.short_conv',
+ 'model.69': 'neck.bottom_up_layers.1.main_convs.0',
+ 'model.70': 'neck.bottom_up_layers.1.main_convs.1',
+ 'model.71': 'neck.bottom_up_layers.1.main_convs.2',
+ 'model.73': 'neck.bottom_up_layers.1.final_conv',
+ 'model.74': 'neck.out_layers.0',
+ 'model.75': 'neck.out_layers.1',
+ 'model.76': 'neck.out_layers.2',
+
+ # head
+ 'model.77.m.0': 'bbox_head.head_module.convs_pred.0.1',
+ 'model.77.m.1': 'bbox_head.head_module.convs_pred.1.1',
+ 'model.77.m.2': 'bbox_head.head_module.convs_pred.2.1'
+}
+
+convert_dict_l = {
# stem
'model.0': 'backbone.stem.0',
'model.1': 'backbone.stem.1',
@@ -70,7 +145,7 @@
'model.51.cv4': 'neck.reduce_layers.2.main_layers.2',
'model.51.cv5': 'neck.reduce_layers.2.fuse_layers.0',
'model.51.cv6': 'neck.reduce_layers.2.fuse_layers.1',
- 'model.51.cv2': 'neck.reduce_layers.2.short_layers',
+ 'model.51.cv2': 'neck.reduce_layers.2.short_layer',
'model.51.cv7': 'neck.reduce_layers.2.final_conv',
# neck
@@ -140,11 +215,522 @@
'model.104.rbr_1x1.1': 'neck.out_layers.2.rbr_1x1.bn',
# head
- 'model.105.m': 'bbox_head.head_module.convs_pred'
+ 'model.105.m.0': 'bbox_head.head_module.convs_pred.0.1',
+ 'model.105.m.1': 'bbox_head.head_module.convs_pred.1.1',
+ 'model.105.m.2': 'bbox_head.head_module.convs_pred.2.1'
+}
+
+convert_dict_x = {
+ # stem
+ 'model.0': 'backbone.stem.0',
+ 'model.1': 'backbone.stem.1',
+ 'model.2': 'backbone.stem.2',
+
+ # stage1
+ # ConvModule
+ 'model.3': 'backbone.stage1.0',
+ # ELANBlock expand_channel_2x
+ 'model.4': 'backbone.stage1.1.short_conv',
+ 'model.5': 'backbone.stage1.1.main_conv',
+ 'model.6': 'backbone.stage1.1.blocks.0.0',
+ 'model.7': 'backbone.stage1.1.blocks.0.1',
+ 'model.8': 'backbone.stage1.1.blocks.1.0',
+ 'model.9': 'backbone.stage1.1.blocks.1.1',
+ 'model.10': 'backbone.stage1.1.blocks.2.0',
+ 'model.11': 'backbone.stage1.1.blocks.2.1',
+ 'model.13': 'backbone.stage1.1.final_conv',
+
+ # stage2
+ # MaxPoolBlock reduce_channel_2x
+ 'model.15': 'backbone.stage2.0.maxpool_branches.1',
+ 'model.16': 'backbone.stage2.0.stride_conv_branches.0',
+ 'model.17': 'backbone.stage2.0.stride_conv_branches.1',
+
+ # ELANBlock expand_channel_2x
+ 'model.19': 'backbone.stage2.1.short_conv',
+ 'model.20': 'backbone.stage2.1.main_conv',
+ 'model.21': 'backbone.stage2.1.blocks.0.0',
+ 'model.22': 'backbone.stage2.1.blocks.0.1',
+ 'model.23': 'backbone.stage2.1.blocks.1.0',
+ 'model.24': 'backbone.stage2.1.blocks.1.1',
+ 'model.25': 'backbone.stage2.1.blocks.2.0',
+ 'model.26': 'backbone.stage2.1.blocks.2.1',
+ 'model.28': 'backbone.stage2.1.final_conv',
+
+ # stage3
+ # MaxPoolBlock reduce_channel_2x
+ 'model.30': 'backbone.stage3.0.maxpool_branches.1',
+ 'model.31': 'backbone.stage3.0.stride_conv_branches.0',
+ 'model.32': 'backbone.stage3.0.stride_conv_branches.1',
+ # ELANBlock expand_channel_2x
+ 'model.34': 'backbone.stage3.1.short_conv',
+ 'model.35': 'backbone.stage3.1.main_conv',
+ 'model.36': 'backbone.stage3.1.blocks.0.0',
+ 'model.37': 'backbone.stage3.1.blocks.0.1',
+ 'model.38': 'backbone.stage3.1.blocks.1.0',
+ 'model.39': 'backbone.stage3.1.blocks.1.1',
+ 'model.40': 'backbone.stage3.1.blocks.2.0',
+ 'model.41': 'backbone.stage3.1.blocks.2.1',
+ 'model.43': 'backbone.stage3.1.final_conv',
+
+ # stage4
+ # MaxPoolBlock reduce_channel_2x
+ 'model.45': 'backbone.stage4.0.maxpool_branches.1',
+ 'model.46': 'backbone.stage4.0.stride_conv_branches.0',
+ 'model.47': 'backbone.stage4.0.stride_conv_branches.1',
+ # ELANBlock no_change_channel
+ 'model.49': 'backbone.stage4.1.short_conv',
+ 'model.50': 'backbone.stage4.1.main_conv',
+ 'model.51': 'backbone.stage4.1.blocks.0.0',
+ 'model.52': 'backbone.stage4.1.blocks.0.1',
+ 'model.53': 'backbone.stage4.1.blocks.1.0',
+ 'model.54': 'backbone.stage4.1.blocks.1.1',
+ 'model.55': 'backbone.stage4.1.blocks.2.0',
+ 'model.56': 'backbone.stage4.1.blocks.2.1',
+ 'model.58': 'backbone.stage4.1.final_conv',
+
+ # neck SPPCSPBlock
+ 'model.59.cv1': 'neck.reduce_layers.2.main_layers.0',
+ 'model.59.cv3': 'neck.reduce_layers.2.main_layers.1',
+ 'model.59.cv4': 'neck.reduce_layers.2.main_layers.2',
+ 'model.59.cv5': 'neck.reduce_layers.2.fuse_layers.0',
+ 'model.59.cv6': 'neck.reduce_layers.2.fuse_layers.1',
+ 'model.59.cv2': 'neck.reduce_layers.2.short_layer',
+ 'model.59.cv7': 'neck.reduce_layers.2.final_conv',
+
+ # neck
+ 'model.60': 'neck.upsample_layers.0.0',
+ 'model.62': 'neck.reduce_layers.1',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.64': 'neck.top_down_layers.0.short_conv',
+ 'model.65': 'neck.top_down_layers.0.main_conv',
+ 'model.66': 'neck.top_down_layers.0.blocks.0.0',
+ 'model.67': 'neck.top_down_layers.0.blocks.0.1',
+ 'model.68': 'neck.top_down_layers.0.blocks.1.0',
+ 'model.69': 'neck.top_down_layers.0.blocks.1.1',
+ 'model.70': 'neck.top_down_layers.0.blocks.2.0',
+ 'model.71': 'neck.top_down_layers.0.blocks.2.1',
+ 'model.73': 'neck.top_down_layers.0.final_conv',
+ 'model.74': 'neck.upsample_layers.1.0',
+ 'model.76': 'neck.reduce_layers.0',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.78': 'neck.top_down_layers.1.short_conv',
+ 'model.79': 'neck.top_down_layers.1.main_conv',
+ 'model.80': 'neck.top_down_layers.1.blocks.0.0',
+ 'model.81': 'neck.top_down_layers.1.blocks.0.1',
+ 'model.82': 'neck.top_down_layers.1.blocks.1.0',
+ 'model.83': 'neck.top_down_layers.1.blocks.1.1',
+ 'model.84': 'neck.top_down_layers.1.blocks.2.0',
+ 'model.85': 'neck.top_down_layers.1.blocks.2.1',
+ 'model.87': 'neck.top_down_layers.1.final_conv',
+
+ # neck MaxPoolBlock no_change_channel
+ 'model.89': 'neck.downsample_layers.0.maxpool_branches.1',
+ 'model.90': 'neck.downsample_layers.0.stride_conv_branches.0',
+ 'model.91': 'neck.downsample_layers.0.stride_conv_branches.1',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.93': 'neck.bottom_up_layers.0.short_conv',
+ 'model.94': 'neck.bottom_up_layers.0.main_conv',
+ 'model.95': 'neck.bottom_up_layers.0.blocks.0.0',
+ 'model.96': 'neck.bottom_up_layers.0.blocks.0.1',
+ 'model.97': 'neck.bottom_up_layers.0.blocks.1.0',
+ 'model.98': 'neck.bottom_up_layers.0.blocks.1.1',
+ 'model.99': 'neck.bottom_up_layers.0.blocks.2.0',
+ 'model.100': 'neck.bottom_up_layers.0.blocks.2.1',
+ 'model.102': 'neck.bottom_up_layers.0.final_conv',
+
+ # neck MaxPoolBlock no_change_channel
+ 'model.104': 'neck.downsample_layers.1.maxpool_branches.1',
+ 'model.105': 'neck.downsample_layers.1.stride_conv_branches.0',
+ 'model.106': 'neck.downsample_layers.1.stride_conv_branches.1',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.108': 'neck.bottom_up_layers.1.short_conv',
+ 'model.109': 'neck.bottom_up_layers.1.main_conv',
+ 'model.110': 'neck.bottom_up_layers.1.blocks.0.0',
+ 'model.111': 'neck.bottom_up_layers.1.blocks.0.1',
+ 'model.112': 'neck.bottom_up_layers.1.blocks.1.0',
+ 'model.113': 'neck.bottom_up_layers.1.blocks.1.1',
+ 'model.114': 'neck.bottom_up_layers.1.blocks.2.0',
+ 'model.115': 'neck.bottom_up_layers.1.blocks.2.1',
+ 'model.117': 'neck.bottom_up_layers.1.final_conv',
+
+ # Conv
+ 'model.118': 'neck.out_layers.0',
+ 'model.119': 'neck.out_layers.1',
+ 'model.120': 'neck.out_layers.2',
+
+ # head
+ 'model.121.m.0': 'bbox_head.head_module.convs_pred.0.1',
+ 'model.121.m.1': 'bbox_head.head_module.convs_pred.1.1',
+ 'model.121.m.2': 'bbox_head.head_module.convs_pred.2.1'
+}
+
+convert_dict_w = {
+ # stem
+ 'model.1': 'backbone.stem.conv',
+
+ # stage1
+ # ConvModule
+ 'model.2': 'backbone.stage1.0',
+ # ELANBlock
+ 'model.3': 'backbone.stage1.1.short_conv',
+ 'model.4': 'backbone.stage1.1.main_conv',
+ 'model.5': 'backbone.stage1.1.blocks.0.0',
+ 'model.6': 'backbone.stage1.1.blocks.0.1',
+ 'model.7': 'backbone.stage1.1.blocks.1.0',
+ 'model.8': 'backbone.stage1.1.blocks.1.1',
+ 'model.10': 'backbone.stage1.1.final_conv',
+
+ # stage2
+ 'model.11': 'backbone.stage2.0',
+ # ELANBlock
+ 'model.12': 'backbone.stage2.1.short_conv',
+ 'model.13': 'backbone.stage2.1.main_conv',
+ 'model.14': 'backbone.stage2.1.blocks.0.0',
+ 'model.15': 'backbone.stage2.1.blocks.0.1',
+ 'model.16': 'backbone.stage2.1.blocks.1.0',
+ 'model.17': 'backbone.stage2.1.blocks.1.1',
+ 'model.19': 'backbone.stage2.1.final_conv',
+
+ # stage3
+ 'model.20': 'backbone.stage3.0',
+ # ELANBlock
+ 'model.21': 'backbone.stage3.1.short_conv',
+ 'model.22': 'backbone.stage3.1.main_conv',
+ 'model.23': 'backbone.stage3.1.blocks.0.0',
+ 'model.24': 'backbone.stage3.1.blocks.0.1',
+ 'model.25': 'backbone.stage3.1.blocks.1.0',
+ 'model.26': 'backbone.stage3.1.blocks.1.1',
+ 'model.28': 'backbone.stage3.1.final_conv',
+
+ # stage4
+ 'model.29': 'backbone.stage4.0',
+ # ELANBlock
+ 'model.30': 'backbone.stage4.1.short_conv',
+ 'model.31': 'backbone.stage4.1.main_conv',
+ 'model.32': 'backbone.stage4.1.blocks.0.0',
+ 'model.33': 'backbone.stage4.1.blocks.0.1',
+ 'model.34': 'backbone.stage4.1.blocks.1.0',
+ 'model.35': 'backbone.stage4.1.blocks.1.1',
+ 'model.37': 'backbone.stage4.1.final_conv',
+
+ # stage5
+ 'model.38': 'backbone.stage5.0',
+ # ELANBlock
+ 'model.39': 'backbone.stage5.1.short_conv',
+ 'model.40': 'backbone.stage5.1.main_conv',
+ 'model.41': 'backbone.stage5.1.blocks.0.0',
+ 'model.42': 'backbone.stage5.1.blocks.0.1',
+ 'model.43': 'backbone.stage5.1.blocks.1.0',
+ 'model.44': 'backbone.stage5.1.blocks.1.1',
+ 'model.46': 'backbone.stage5.1.final_conv',
+
+ # neck SPPCSPBlock
+ 'model.47.cv1': 'neck.reduce_layers.3.main_layers.0',
+ 'model.47.cv3': 'neck.reduce_layers.3.main_layers.1',
+ 'model.47.cv4': 'neck.reduce_layers.3.main_layers.2',
+ 'model.47.cv5': 'neck.reduce_layers.3.fuse_layers.0',
+ 'model.47.cv6': 'neck.reduce_layers.3.fuse_layers.1',
+ 'model.47.cv2': 'neck.reduce_layers.3.short_layer',
+ 'model.47.cv7': 'neck.reduce_layers.3.final_conv',
+
+ # neck
+ 'model.48': 'neck.upsample_layers.0.0',
+ 'model.50': 'neck.reduce_layers.2',
+
+ # neck ELANBlock
+ 'model.52': 'neck.top_down_layers.0.short_conv',
+ 'model.53': 'neck.top_down_layers.0.main_conv',
+ 'model.54': 'neck.top_down_layers.0.blocks.0',
+ 'model.55': 'neck.top_down_layers.0.blocks.1',
+ 'model.56': 'neck.top_down_layers.0.blocks.2',
+ 'model.57': 'neck.top_down_layers.0.blocks.3',
+ 'model.59': 'neck.top_down_layers.0.final_conv',
+ 'model.60': 'neck.upsample_layers.1.0',
+ 'model.62': 'neck.reduce_layers.1',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.64': 'neck.top_down_layers.1.short_conv',
+ 'model.65': 'neck.top_down_layers.1.main_conv',
+ 'model.66': 'neck.top_down_layers.1.blocks.0',
+ 'model.67': 'neck.top_down_layers.1.blocks.1',
+ 'model.68': 'neck.top_down_layers.1.blocks.2',
+ 'model.69': 'neck.top_down_layers.1.blocks.3',
+ 'model.71': 'neck.top_down_layers.1.final_conv',
+ 'model.72': 'neck.upsample_layers.2.0',
+ 'model.74': 'neck.reduce_layers.0',
+ 'model.76': 'neck.top_down_layers.2.short_conv',
+ 'model.77': 'neck.top_down_layers.2.main_conv',
+ 'model.78': 'neck.top_down_layers.2.blocks.0',
+ 'model.79': 'neck.top_down_layers.2.blocks.1',
+ 'model.80': 'neck.top_down_layers.2.blocks.2',
+ 'model.81': 'neck.top_down_layers.2.blocks.3',
+ 'model.83': 'neck.top_down_layers.2.final_conv',
+ 'model.84': 'neck.downsample_layers.0',
+
+ # neck ELANBlock
+ 'model.86': 'neck.bottom_up_layers.0.short_conv',
+ 'model.87': 'neck.bottom_up_layers.0.main_conv',
+ 'model.88': 'neck.bottom_up_layers.0.blocks.0',
+ 'model.89': 'neck.bottom_up_layers.0.blocks.1',
+ 'model.90': 'neck.bottom_up_layers.0.blocks.2',
+ 'model.91': 'neck.bottom_up_layers.0.blocks.3',
+ 'model.93': 'neck.bottom_up_layers.0.final_conv',
+ 'model.94': 'neck.downsample_layers.1',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.96': 'neck.bottom_up_layers.1.short_conv',
+ 'model.97': 'neck.bottom_up_layers.1.main_conv',
+ 'model.98': 'neck.bottom_up_layers.1.blocks.0',
+ 'model.99': 'neck.bottom_up_layers.1.blocks.1',
+ 'model.100': 'neck.bottom_up_layers.1.blocks.2',
+ 'model.101': 'neck.bottom_up_layers.1.blocks.3',
+ 'model.103': 'neck.bottom_up_layers.1.final_conv',
+ 'model.104': 'neck.downsample_layers.2',
+
+ # neck ELANBlock reduce_channel_2x
+ 'model.106': 'neck.bottom_up_layers.2.short_conv',
+ 'model.107': 'neck.bottom_up_layers.2.main_conv',
+ 'model.108': 'neck.bottom_up_layers.2.blocks.0',
+ 'model.109': 'neck.bottom_up_layers.2.blocks.1',
+ 'model.110': 'neck.bottom_up_layers.2.blocks.2',
+ 'model.111': 'neck.bottom_up_layers.2.blocks.3',
+ 'model.113': 'neck.bottom_up_layers.2.final_conv',
+ 'model.114': 'bbox_head.head_module.main_convs_pred.0.0',
+ 'model.115': 'bbox_head.head_module.main_convs_pred.1.0',
+ 'model.116': 'bbox_head.head_module.main_convs_pred.2.0',
+ 'model.117': 'bbox_head.head_module.main_convs_pred.3.0',
+
+ # head
+ 'model.118.m.0': 'bbox_head.head_module.main_convs_pred.0.2',
+ 'model.118.m.1': 'bbox_head.head_module.main_convs_pred.1.2',
+ 'model.118.m.2': 'bbox_head.head_module.main_convs_pred.2.2',
+ 'model.118.m.3': 'bbox_head.head_module.main_convs_pred.3.2'
+}
+
+convert_dict_e = {
+ # stem
+ 'model.1': 'backbone.stem.conv',
+
+ # stage1
+ 'model.2.cv1': 'backbone.stage1.0.stride_conv_branches.0',
+ 'model.2.cv2': 'backbone.stage1.0.stride_conv_branches.1',
+ 'model.2.cv3': 'backbone.stage1.0.maxpool_branches.1',
+
+ # ELANBlock
+ 'model.3': 'backbone.stage1.1.short_conv',
+ 'model.4': 'backbone.stage1.1.main_conv',
+ 'model.5': 'backbone.stage1.1.blocks.0.0',
+ 'model.6': 'backbone.stage1.1.blocks.0.1',
+ 'model.7': 'backbone.stage1.1.blocks.1.0',
+ 'model.8': 'backbone.stage1.1.blocks.1.1',
+ 'model.9': 'backbone.stage1.1.blocks.2.0',
+ 'model.10': 'backbone.stage1.1.blocks.2.1',
+ 'model.12': 'backbone.stage1.1.final_conv',
+
+ # stage2
+ 'model.13.cv1': 'backbone.stage2.0.stride_conv_branches.0',
+ 'model.13.cv2': 'backbone.stage2.0.stride_conv_branches.1',
+ 'model.13.cv3': 'backbone.stage2.0.maxpool_branches.1',
+
+ # ELANBlock
+ 'model.14': 'backbone.stage2.1.short_conv',
+ 'model.15': 'backbone.stage2.1.main_conv',
+ 'model.16': 'backbone.stage2.1.blocks.0.0',
+ 'model.17': 'backbone.stage2.1.blocks.0.1',
+ 'model.18': 'backbone.stage2.1.blocks.1.0',
+ 'model.19': 'backbone.stage2.1.blocks.1.1',
+ 'model.20': 'backbone.stage2.1.blocks.2.0',
+ 'model.21': 'backbone.stage2.1.blocks.2.1',
+ 'model.23': 'backbone.stage2.1.final_conv',
+
+ # stage3
+ 'model.24.cv1': 'backbone.stage3.0.stride_conv_branches.0',
+ 'model.24.cv2': 'backbone.stage3.0.stride_conv_branches.1',
+ 'model.24.cv3': 'backbone.stage3.0.maxpool_branches.1',
+
+ # ELANBlock
+ 'model.25': 'backbone.stage3.1.short_conv',
+ 'model.26': 'backbone.stage3.1.main_conv',
+ 'model.27': 'backbone.stage3.1.blocks.0.0',
+ 'model.28': 'backbone.stage3.1.blocks.0.1',
+ 'model.29': 'backbone.stage3.1.blocks.1.0',
+ 'model.30': 'backbone.stage3.1.blocks.1.1',
+ 'model.31': 'backbone.stage3.1.blocks.2.0',
+ 'model.32': 'backbone.stage3.1.blocks.2.1',
+ 'model.34': 'backbone.stage3.1.final_conv',
+
+ # stage4
+ 'model.35.cv1': 'backbone.stage4.0.stride_conv_branches.0',
+ 'model.35.cv2': 'backbone.stage4.0.stride_conv_branches.1',
+ 'model.35.cv3': 'backbone.stage4.0.maxpool_branches.1',
+
+ # ELANBlock
+ 'model.36': 'backbone.stage4.1.short_conv',
+ 'model.37': 'backbone.stage4.1.main_conv',
+ 'model.38': 'backbone.stage4.1.blocks.0.0',
+ 'model.39': 'backbone.stage4.1.blocks.0.1',
+ 'model.40': 'backbone.stage4.1.blocks.1.0',
+ 'model.41': 'backbone.stage4.1.blocks.1.1',
+ 'model.42': 'backbone.stage4.1.blocks.2.0',
+ 'model.43': 'backbone.stage4.1.blocks.2.1',
+ 'model.45': 'backbone.stage4.1.final_conv',
+
+ # stage5
+ 'model.46.cv1': 'backbone.stage5.0.stride_conv_branches.0',
+ 'model.46.cv2': 'backbone.stage5.0.stride_conv_branches.1',
+ 'model.46.cv3': 'backbone.stage5.0.maxpool_branches.1',
+
+ # ELANBlock
+ 'model.47': 'backbone.stage5.1.short_conv',
+ 'model.48': 'backbone.stage5.1.main_conv',
+ 'model.49': 'backbone.stage5.1.blocks.0.0',
+ 'model.50': 'backbone.stage5.1.blocks.0.1',
+ 'model.51': 'backbone.stage5.1.blocks.1.0',
+ 'model.52': 'backbone.stage5.1.blocks.1.1',
+ 'model.53': 'backbone.stage5.1.blocks.2.0',
+ 'model.54': 'backbone.stage5.1.blocks.2.1',
+ 'model.56': 'backbone.stage5.1.final_conv',
+
+ # neck SPPCSPBlock
+ 'model.57.cv1': 'neck.reduce_layers.3.main_layers.0',
+ 'model.57.cv3': 'neck.reduce_layers.3.main_layers.1',
+ 'model.57.cv4': 'neck.reduce_layers.3.main_layers.2',
+ 'model.57.cv5': 'neck.reduce_layers.3.fuse_layers.0',
+ 'model.57.cv6': 'neck.reduce_layers.3.fuse_layers.1',
+ 'model.57.cv2': 'neck.reduce_layers.3.short_layer',
+ 'model.57.cv7': 'neck.reduce_layers.3.final_conv',
+
+ # neck
+ 'model.58': 'neck.upsample_layers.0.0',
+ 'model.60': 'neck.reduce_layers.2',
+
+ # neck ELANBlock
+ 'model.62': 'neck.top_down_layers.0.short_conv',
+ 'model.63': 'neck.top_down_layers.0.main_conv',
+ 'model.64': 'neck.top_down_layers.0.blocks.0',
+ 'model.65': 'neck.top_down_layers.0.blocks.1',
+ 'model.66': 'neck.top_down_layers.0.blocks.2',
+ 'model.67': 'neck.top_down_layers.0.blocks.3',
+ 'model.68': 'neck.top_down_layers.0.blocks.4',
+ 'model.69': 'neck.top_down_layers.0.blocks.5',
+ 'model.71': 'neck.top_down_layers.0.final_conv',
+ 'model.72': 'neck.upsample_layers.1.0',
+ 'model.74': 'neck.reduce_layers.1',
+
+ # neck ELANBlock
+ 'model.76': 'neck.top_down_layers.1.short_conv',
+ 'model.77': 'neck.top_down_layers.1.main_conv',
+ 'model.78': 'neck.top_down_layers.1.blocks.0',
+ 'model.79': 'neck.top_down_layers.1.blocks.1',
+ 'model.80': 'neck.top_down_layers.1.blocks.2',
+ 'model.81': 'neck.top_down_layers.1.blocks.3',
+ 'model.82': 'neck.top_down_layers.1.blocks.4',
+ 'model.83': 'neck.top_down_layers.1.blocks.5',
+ 'model.85': 'neck.top_down_layers.1.final_conv',
+ 'model.86': 'neck.upsample_layers.2.0',
+ 'model.88': 'neck.reduce_layers.0',
+ 'model.90': 'neck.top_down_layers.2.short_conv',
+ 'model.91': 'neck.top_down_layers.2.main_conv',
+ 'model.92': 'neck.top_down_layers.2.blocks.0',
+ 'model.93': 'neck.top_down_layers.2.blocks.1',
+ 'model.94': 'neck.top_down_layers.2.blocks.2',
+ 'model.95': 'neck.top_down_layers.2.blocks.3',
+ 'model.96': 'neck.top_down_layers.2.blocks.4',
+ 'model.97': 'neck.top_down_layers.2.blocks.5',
+ 'model.99': 'neck.top_down_layers.2.final_conv',
+ 'model.100.cv1': 'neck.downsample_layers.0.stride_conv_branches.0',
+ 'model.100.cv2': 'neck.downsample_layers.0.stride_conv_branches.1',
+ 'model.100.cv3': 'neck.downsample_layers.0.maxpool_branches.1',
+
+ # neck ELANBlock
+ 'model.102': 'neck.bottom_up_layers.0.short_conv',
+ 'model.103': 'neck.bottom_up_layers.0.main_conv',
+ 'model.104': 'neck.bottom_up_layers.0.blocks.0',
+ 'model.105': 'neck.bottom_up_layers.0.blocks.1',
+ 'model.106': 'neck.bottom_up_layers.0.blocks.2',
+ 'model.107': 'neck.bottom_up_layers.0.blocks.3',
+ 'model.108': 'neck.bottom_up_layers.0.blocks.4',
+ 'model.109': 'neck.bottom_up_layers.0.blocks.5',
+ 'model.111': 'neck.bottom_up_layers.0.final_conv',
+ 'model.112.cv1': 'neck.downsample_layers.1.stride_conv_branches.0',
+ 'model.112.cv2': 'neck.downsample_layers.1.stride_conv_branches.1',
+ 'model.112.cv3': 'neck.downsample_layers.1.maxpool_branches.1',
+
+ # neck ELANBlock
+ 'model.114': 'neck.bottom_up_layers.1.short_conv',
+ 'model.115': 'neck.bottom_up_layers.1.main_conv',
+ 'model.116': 'neck.bottom_up_layers.1.blocks.0',
+ 'model.117': 'neck.bottom_up_layers.1.blocks.1',
+ 'model.118': 'neck.bottom_up_layers.1.blocks.2',
+ 'model.119': 'neck.bottom_up_layers.1.blocks.3',
+ 'model.120': 'neck.bottom_up_layers.1.blocks.4',
+ 'model.121': 'neck.bottom_up_layers.1.blocks.5',
+ 'model.123': 'neck.bottom_up_layers.1.final_conv',
+ 'model.124.cv1': 'neck.downsample_layers.2.stride_conv_branches.0',
+ 'model.124.cv2': 'neck.downsample_layers.2.stride_conv_branches.1',
+ 'model.124.cv3': 'neck.downsample_layers.2.maxpool_branches.1',
+
+ # neck ELANBlock
+ 'model.126': 'neck.bottom_up_layers.2.short_conv',
+ 'model.127': 'neck.bottom_up_layers.2.main_conv',
+ 'model.128': 'neck.bottom_up_layers.2.blocks.0',
+ 'model.129': 'neck.bottom_up_layers.2.blocks.1',
+ 'model.130': 'neck.bottom_up_layers.2.blocks.2',
+ 'model.131': 'neck.bottom_up_layers.2.blocks.3',
+ 'model.132': 'neck.bottom_up_layers.2.blocks.4',
+ 'model.133': 'neck.bottom_up_layers.2.blocks.5',
+ 'model.135': 'neck.bottom_up_layers.2.final_conv',
+ 'model.136': 'bbox_head.head_module.main_convs_pred.0.0',
+ 'model.137': 'bbox_head.head_module.main_convs_pred.1.0',
+ 'model.138': 'bbox_head.head_module.main_convs_pred.2.0',
+ 'model.139': 'bbox_head.head_module.main_convs_pred.3.0',
+
+ # head
+ 'model.140.m.0': 'bbox_head.head_module.main_convs_pred.0.2',
+ 'model.140.m.1': 'bbox_head.head_module.main_convs_pred.1.2',
+ 'model.140.m.2': 'bbox_head.head_module.main_convs_pred.2.2',
+ 'model.140.m.3': 'bbox_head.head_module.main_convs_pred.3.2'
+}
+
+convert_dicts = {
+ 'yolov7-tiny.pt': convert_dict_tiny,
+ 'yolov7-w6.pt': convert_dict_w,
+ 'yolov7-e6.pt': convert_dict_e,
+ 'yolov7.pt': convert_dict_l,
+ 'yolov7x.pt': convert_dict_x
}
def convert(src, dst):
+ src_key = osp.basename(src)
+ convert_dict = convert_dicts[osp.basename(src)]
+
+ num_levels = 3
+ if src_key == 'yolov7.pt':
+ indexes = [102, 51]
+ in_channels = [256, 512, 1024]
+ elif src_key == 'yolov7x.pt':
+ indexes = [121, 59]
+ in_channels = [320, 640, 1280]
+ elif src_key == 'yolov7-tiny.pt':
+ indexes = [77, 1000]
+ in_channels = [128, 256, 512]
+ elif src_key == 'yolov7-w6.pt':
+ indexes = [118, 47]
+ in_channels = [256, 512, 768, 1024]
+ num_levels = 4
+ elif src_key == 'yolov7-e6.pt':
+ indexes = [140, [2, 13, 24, 35, 46, 57, 100, 112, 124]]
+ in_channels = 320, 640, 960, 1280
+ num_levels = 4
+
+ if isinstance(indexes[1], int):
+ indexes[1] = [indexes[1]]
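+    # indexes[0] is the threshold module number: modules >= indexes[0] are
+    # converted with a 4-part key prefix (e.g. 'model.105.m.0'); modules in
+    # indexes[1] use a 3-part prefix (e.g. 'model.51.cv1'); all other modules
+    # use the 2-part 'model.<num>' prefix. 1000 acts as a sentinel meaning
+    # "no 3-part modules" (yolov7-tiny)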
"""Convert keys in detectron pretrained YOLOv7 models to mmyolo style."""
try:
yolov7_model = torch.load(src)['model'].float()
@@ -161,24 +747,41 @@ def convert(src, dst):
continue
num, module = key.split('.')[1:3]
- if int(num) < 102 and int(num) != 51:
+ if int(num) < indexes[0] and int(num) not in indexes[1]:
prefix = f'model.{num}'
new_key = key.replace(prefix, convert_dict[prefix])
state_dict[new_key] = weight
print(f'Convert {key} to {new_key}')
- elif int(num) < 105 and int(num) != 51:
- strs_key = key.split('.')[:4]
+ elif int(num) in indexes[1]:
+ strs_key = key.split('.')[:3]
new_key = key.replace('.'.join(strs_key),
convert_dict['.'.join(strs_key)])
state_dict[new_key] = weight
print(f'Convert {key} to {new_key}')
else:
- strs_key = key.split('.')[:3]
+ strs_key = key.split('.')[:4]
new_key = key.replace('.'.join(strs_key),
convert_dict['.'.join(strs_key)])
state_dict[new_key] = weight
print(f'Convert {key} to {new_key}')
+ # Add ImplicitA and ImplicitM
+ for i in range(num_levels):
+ if num_levels == 3:
+ implicit_a = f'bbox_head.head_module.' \
+ f'convs_pred.{i}.0.implicit'
+ state_dict[implicit_a] = torch.zeros((1, in_channels[i], 1, 1))
+ implicit_m = f'bbox_head.head_module.' \
+ f'convs_pred.{i}.2.implicit'
+ state_dict[implicit_m] = torch.ones((1, 3 * 85, 1, 1))
+ else:
+ implicit_a = f'bbox_head.head_module.' \
+ f'main_convs_pred.{i}.1.implicit'
+ state_dict[implicit_a] = torch.zeros((1, in_channels[i], 1, 1))
+ implicit_m = f'bbox_head.head_module.' \
+ f'main_convs_pred.{i}.3.implicit'
+ state_dict[implicit_m] = torch.ones((1, 3 * 85, 1, 1))
+
# save checkpoint
checkpoint = dict()
checkpoint['state_dict'] = state_dict
@@ -189,8 +792,8 @@ def convert(src, dst):
def main():
parser = argparse.ArgumentParser(description='Convert model keys')
parser.add_argument(
- '--src', default='yolov7.pt', help='src yolov7 model path')
- parser.add_argument('--dst', default='mm_yolov7l.pt', help='save path')
+ 'src', default='yolov7.pt', help='src yolov7 model path')
+ parser.add_argument('dst', default='mm_yolov7l.pt', help='save path')
args = parser.parse_args()
convert(args.src, args.dst)
diff --git a/tools/test.py b/tools/test.py
index fc80c887a..0c5b89b89 100644
--- a/tools/test.py
+++ b/tools/test.py
@@ -12,7 +12,7 @@
from mmyolo.utils import register_all_modules
-# TODO: support fuse_conv_bn and format_only
+# TODO: support fuse_conv_bn
def parse_args():
parser = argparse.ArgumentParser(
description='MMYOLO test (and eval) a model')
@@ -24,7 +24,13 @@ def parse_args():
parser.add_argument(
'--out',
type=str,
- help='dump predictions to a pickle file for offline evaluation')
+ help='output result file (must be a .pkl file) in pickle format')
+ parser.add_argument(
+ '--json-prefix',
+ type=str,
+        help='the prefix of the output json file without performing '
+        'evaluation, which is useful when you want to format the results '
+        'into a specific format and submit them to the test server')
parser.add_argument(
'--show', action='store_true', help='show prediction results')
parser.add_argument(
@@ -92,6 +98,14 @@ def main():
if args.deploy:
cfg.custom_hooks.append(dict(type='SwitchToDeployHook'))
+ # add `format_only` and `outfile_prefix` into cfg
+ if args.json_prefix is not None:
+ cfg_json = {
+ 'test_evaluator.format_only': True,
+ 'test_evaluator.outfile_prefix': args.json_prefix
+ }
+ cfg.merge_from_dict(cfg_json)
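+        # e.g. `--json-prefix work_dirs/coco_test` should make the evaluator
+        # dump work_dirs/coco_test.bbox.json instead of computing metrics
+        # (assuming a CocoMetric-style test_evaluator)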
+
# build the runner from config
if 'runner_type' not in cfg:
# build the default runner