[Feature] Support Simplebaseline3D (#2500)

open-mmlab · Jul 4, 2023 · bb0e1e9 · bb0e1e9
1 parent 07104c9
commit bb0e1e9
Show file tree

Hide file tree

Showing 18 changed files with 315 additions and 48 deletions.
diff --git a/configs/body_3d_keypoint/pose_lift/README.md b/configs/body_3d_keypoint/pose_lift/README.md
@@ -0,0 +1,51 @@
+# Single-view 3D Human Body Pose Estimation
+
+## Video-based Single-view 3D Human Body Pose Estimation
+
+Video-based 3D pose estimation is the detection and analysis of X, Y, Z coordinates of human body joints from a sequence of RGB images.
+
+For single-person 3D pose estimation from a monocular camera, existing works can be classified into three categories:
+
+(1) from 2D poses to 3D poses (2D-to-3D pose lifting)
+
+(2) jointly learning 2D and 3D poses, and
+
+(3) directly regressing 3D poses from images.
+
+### Results and Models
+
+#### Human3.6m Dataset
+
+| Arch | Receptive Field | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
+
+| :------------------------------------------------------ | :-------------: | :---: | :-----: | :-----: | :------------------------------------------------------: | :-----------------------------------------------------: |
+
+| [VideoPose3D-supervised](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-27frm-supv_8xb128-80e_h36m.py) | 27 | 40.1 | 30.1 | / | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_supervised-fe8fbba9_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_supervised_20210527.log.json) |
+
+| [VideoPose3D-supervised](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-81frm-supv_8xb128-80e_h36m.py) | 81 | 39.1 | 29.3 | / | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_81frames_fullconv_supervised-1f2d1104_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_81frames_fullconv_supervised_20210527.log.json) |
+
+| [VideoPose3D-supervised](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-243frm-supv_8xb128-80e_h36m.py) | 243 | | | / | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_243frames_fullconv_supervised-880bea25_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_243frames_fullconv_supervised_20210527.log.json) |
+
+| [VideoPose3D-supervised-CPN](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-1frm-supv-cpn-ft_8xb128-80e_h36m.py) | 1 | 53.0 | 41.3 | / | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_1frame_fullconv_supervised_cpn_ft-5c3afaed_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_1frame_fullconv_supervised_cpn_ft_20210527.log.json) |
+
+| [VideoPose3D-supervised-CPN](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-243frm-supv-cpn-ft_8xb128-200e_h36m.py) | 243 | | | / | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_243frames_fullconv_supervised_cpn_ft-88f5abbb_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_243frames_fullconv_supervised_cpn_ft_20210527.log.json) |
+
+| [VideoPose3D-semi-supervised](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-27frm-semi-supv_8xb64-200e_h36m.py) | 27 | 57.2 | 42.4 | 54.2 | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_semi-supervised-54aef83b_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_semi-supervised_20210527.log.json) |
+
+| [VideoPose3D-semi-supervised-CPN](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_videopose3d-27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py) | 27 | 67.3 | 50.4 | 63.6 | [ckpt](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_semi-supervised_cpn_ft-71be9cde_20210527.pth) | [log](https://download.openmmlab.com/mmpose/body3d/videopose/videopose_h36m_27frames_fullconv_semi-supervised_cpn_ft_20210527.log.json) |
+
+## Image-based Single-view 3D Human Body Pose Estimation
+
+3D pose estimation is the detection and analysis of X, Y, Z coordinates of human body joints from an RGB image.
+For single-person 3D pose estimation from a monocular camera, existing works can be classified into three categories:
+(1) from 2D poses to 3D poses (2D-to-3D pose lifting)
+(2) jointly learning 2D and 3D poses, and
+(3) directly regressing 3D poses from images.
+
+### Results and Models
+
+#### Human3.6m Dataset
+
+| Arch | MPJPE | P-MPJPE | N-MPJPE | ckpt | log |
+| :------------------------------------------------------ | :-------------: | :---: | :-----: | :-----: | :------------------------------------------------------: | :-----------------------------------------------------: |
+| [SimpleBaseline3D-tcn](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_simplebaseline3d_8xb64-200e_h36m.py) | 43.4 | 34.3 | /|[ckpt](https://download.openmmlab.com/mmpose/body3d/simple_baseline/simple3Dbaseline_h36m-f0ad73a4_20210419.pth) | [log](https://download.openmmlab.com/mmpose/body3d/simple_baseline/20210415_065056.log.json) |
diff --git a/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_simplebaseline3d_8xb64-200e_h36m.py b/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_simplebaseline3d_8xb64-200e_h36m.py
@@ -0,0 +1,168 @@
+_base_ = ['../../../_base_/default_runtime.py']
+
+vis_backends = [
+    dict(type='LocalVisBackend'),
+]
+visualizer = dict(
+    type='Pose3dLocalVisualizer', vis_backends=vis_backends, name='visualizer')
+
+# runtime
+train_cfg = dict(max_epochs=200, val_interval=10)
+
+# optimizer
+optim_wrapper = dict(optimizer=dict(type='Adam', lr=1e-3))
+
+# learning policy
+param_scheduler = [
+    dict(type='StepLR', step_size=100000, gamma=0.96, end=80, by_epoch=False)
+]
+
+auto_scale_lr = dict(base_batch_size=512)
+
+# hooks
+default_hooks = dict(
+    checkpoint=dict(
+        type='CheckpointHook',
+        save_best='MPJPE',
+        rule='less',
+        max_keep_ckpts=1))
+
+# codec settings
+# 3D keypoint normalization parameters
+# From file: '{data_root}/annotation_body3d/fps50/joint3d_rel_stats.pkl'
+target_mean = [[-2.55652589e-04, -7.11960570e-03, -9.81433052e-04],
+               [-5.65463051e-03, 3.19636009e-01, 7.19329269e-02],
+               [-1.01705840e-02, 6.91147892e-01, 1.55352986e-01],
+               [2.55651315e-04, 7.11954606e-03, 9.81423866e-04],
+               [-5.09729780e-03, 3.27040413e-01, 7.22258095e-02],
+               [-9.99656606e-03, 7.08277383e-01, 1.58016408e-01],
+               [2.90583676e-03, -2.11363307e-01, -4.74210915e-02],
+               [5.67537804e-03, -4.35088906e-01, -9.76974016e-02],
+               [5.93884964e-03, -4.91891970e-01, -1.10666618e-01],
+               [7.37352083e-03, -5.83948619e-01, -1.31171400e-01],
+               [5.41920653e-03, -3.83931702e-01, -8.68145417e-02],
+               [2.95964662e-03, -1.87567488e-01, -4.34536934e-02],
+               [1.26585822e-03, -1.20170579e-01, -2.82526049e-02],
+               [4.67186639e-03, -3.83644089e-01, -8.55125784e-02],
+               [1.67648571e-03, -1.97007177e-01, -4.31368364e-02],
+               [8.70569015e-04, -1.68664569e-01, -3.73902498e-02]],
+target_std = [[0.11072244, 0.02238818, 0.07246294],
+              [0.15856311, 0.18933832, 0.20880479],
+              [0.19179935, 0.24320062, 0.24756193],
+              [0.11072181, 0.02238805, 0.07246253],
+              [0.15880454, 0.19977188, 0.2147063],
+              [0.18001944, 0.25052739, 0.24853247],
+              [0.05210694, 0.05211406, 0.06908241],
+              [0.09515367, 0.10133032, 0.12899733],
+              [0.11742458, 0.12648469, 0.16465091],
+              [0.12360297, 0.13085539, 0.16433336],
+              [0.14602232, 0.09707956, 0.13952731],
+              [0.24347532, 0.12982249, 0.20230181],
+              [0.2446877, 0.21501816, 0.23938235],
+              [0.13876084, 0.1008926, 0.1424411],
+              [0.23687529, 0.14491219, 0.20980829],
+              [0.24400695, 0.23975028, 0.25520584]]
+# 2D keypoint normalization parameters
+# From file: '{data_root}/annotation_body3d/fps50/joint2d_stats.pkl'
+keypoints_mean = [[532.08351635, 419.74137558], [531.80953144, 418.2607141],
+                  [530.68456967, 493.54259285], [529.36968722, 575.96448516],
+                  [532.29767646, 421.28483336], [531.93946631, 494.72186795],
+                  [529.71984447, 578.96110365], [532.93699382, 370.65225054],
+                  [534.1101856, 317.90342311], [534.55416813, 304.24143901],
+                  [534.86955004, 282.31030885], [534.11308566, 330.11296796],
+                  [533.53637525, 376.2742511], [533.49380107, 391.72324565],
+                  [533.52579142, 330.09494668], [532.50804964, 374.190479],
+                  [532.72786934, 380.61615716]],
+keypoints_std = [[107.73640054, 63.35908715], [119.00836213, 64.1215443],
+                 [119.12412107, 50.53806215], [120.61688045, 56.38444891],
+                 [101.95735275, 62.89636486], [106.24832897, 48.41178119],
+                 [108.46734966, 54.58177071], [109.07369806, 68.70443672],
+                 [111.20130351, 74.87287863], [111.63203838, 77.80542514],
+                 [113.22330788, 79.90670556], [105.7145833, 73.27049436],
+                 [107.05804267, 73.93175781], [107.97449418, 83.30391802],
+                 [121.60675105, 74.25691526], [134.34378973, 77.48125087],
+                 [131.79990652, 89.86721124]]
+codec = dict(
+    type='ImagePoseLifting',
+    num_keypoints=17,
+    root_index=0,
+    remove_root=True,
+    target_mean=target_mean,
+    target_std=target_std,
+    keypoints_mean=keypoints_mean,
+    keypoints_std=keypoints_std)
+
+# model settings
+model = dict(
+    type='PoseLifter',
+    backbone=dict(
+        type='TCN',
+        in_channels=2 * 17,
+        stem_channels=1024,
+        num_blocks=2,
+        kernel_sizes=(1, 1, 1),
+        dropout=0.5,
+    ),
+    head=dict(
+        type='TemporalRegressionHead',
+        in_channels=1024,
+        num_joints=16,
+        loss=dict(type='MSELoss'),
+        decoder=codec,
+    ))
+
+# base dataset settings
+dataset_type = 'Human36mDataset'
+data_root = 'data/h36m/'
+
+# pipelines
+train_pipeline = [
+    dict(type='GenerateTarget', encoder=codec),
+    dict(
+        type='PackPoseInputs',
+        meta_keys=('id', 'category_id', 'target_img_path', 'flip_indices',
+                   'target_root', 'target_root_index', 'target_mean',
+                   'target_std'))
+]
+val_pipeline = train_pipeline
+
+# data loaders
+train_dataloader = dict(
+    batch_size=64,
+    num_workers=2,
+    persistent_workers=True,
+    sampler=dict(type='DefaultSampler', shuffle=True),
+    dataset=dict(
+        type=dataset_type,
+        ann_file='annotation_body3d/fps50/h36m_train.npz',
+        seq_len=1,
+        causal=True,
+        keypoint_2d_src='gt',
+        data_root=data_root,
+        data_prefix=dict(img='images/'),
+        pipeline=train_pipeline,
+    ))
+val_dataloader = dict(
+    batch_size=64,
+    num_workers=2,
+    persistent_workers=True,
+    drop_last=False,
+    sampler=dict(type='DefaultSampler', shuffle=False, round_up=False),
+    dataset=dict(
+        type=dataset_type,
+        ann_file='annotation_body3d/fps50/h36m_test.npz',
+        seq_len=1,
+        causal=True,
+        keypoint_2d_src='gt',
+        data_root=data_root,
+        data_prefix=dict(img='images/'),
+        pipeline=train_pipeline,
+    ))
+test_dataloader = val_dataloader
+
+# evaluators
+val_evaluator = [
+    dict(type='MPJPE', mode='mpjpe'),
+    dict(type='MPJPE', mode='p-mpjpe')
+]
+test_evaluator = val_evaluator
diff --git a/...ose3d-1frm-supv-cpn-ft_8xb128-80e_h36m.py → ...ose3d-1frm-supv-cpn-ft_8xb128-80e_h36m.py b/...ose3d-1frm-supv-cpn-ft_8xb128-80e_h36m.py → ...ose3d-1frm-supv-cpn-ft_8xb128-80e_h36m.py
diff --git a/...3d-243frm-supv-cpn-ft_8xb128-200e_h36m.py → ...3d-243frm-supv-cpn-ft_8xb128-200e_h36m.py b/...3d-243frm-supv-cpn-ft_8xb128-200e_h36m.py → ...3d-243frm-supv-cpn-ft_8xb128-200e_h36m.py
diff --git a/...ideopose3d-243frm-supv_8xb128-80e_h36m.py → ...ideopose3d-243frm-supv_8xb128-80e_h36m.py b/...ideopose3d-243frm-supv_8xb128-80e_h36m.py → ...ideopose3d-243frm-supv_8xb128-80e_h36m.py
diff --git a/...27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py → ...27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py b/...27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py → ...27frm-semi-supv-cpn-ft_8xb64-200e_h36m.py
diff --git a/...pose3d-27frm-semi-supv_8xb64-200e_h36m.py → ...pose3d-27frm-semi-supv_8xb64-200e_h36m.py b/...pose3d-27frm-semi-supv_8xb64-200e_h36m.py → ...pose3d-27frm-semi-supv_8xb64-200e_h36m.py
diff --git a/...videopose3d-27frm-supv_8xb128-80e_h36m.py → ...videopose3d-27frm-supv_8xb128-80e_h36m.py b/...videopose3d-27frm-supv_8xb128-80e_h36m.py → ...videopose3d-27frm-supv_8xb128-80e_h36m.py
diff --git a/...videopose3d-81frm-supv_8xb128-80e_h36m.py → ...videopose3d-81frm-supv_8xb128-80e_h36m.py b/...videopose3d-81frm-supv_8xb128-80e_h36m.py → ...videopose3d-81frm-supv_8xb128-80e_h36m.py
diff --git a/configs/body_3d_keypoint/pose_lift/h36m/simplebaseline3d_h36m.md b/configs/body_3d_keypoint/pose_lift/h36m/simplebaseline3d_h36m.md
@@ -0,0 +1,44 @@
+<!-- [BACKBONE] -->
+
+<details>
+<summary align="right"><a href="http://openaccess.thecvf.com/content_iccv_2017/html/Martinez_A_Simple_yet_ICCV_2017_paper.html">SimpleBaseline3D (ICCV'2017)</a></summary>
+
+```bibtex
+@inproceedings{martinez_2017_3dbaseline,
+  title={A simple yet effective baseline for 3d human pose estimation},
+  author={Martinez, Julieta and Hossain, Rayat and Romero, Javier and Little, James J.},
+  booktitle={ICCV},
+  year={2017}
+}
+```
+
+</details>
+
+<!-- [DATASET] -->
+
+<details>
+<summary align="right"><a href="https://ieeexplore.ieee.org/abstract/document/6682899/">Human3.6M (TPAMI'2014)</a></summary>
+
+```bibtex
+@article{h36m_pami,
+  author = {Ionescu, Catalin and Papava, Dragos and Olaru, Vlad and Sminchisescu,  Cristian},
+  title = {Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments},
+  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
+  publisher = {IEEE Computer Society},
+  volume = {36},
+  number = {7},
+  pages = {1325-1339},
+  month = {jul},
+  year = {2014}
+}
+```
+
+</details>
+
+Results on Human3.6M dataset with ground truth 2D detections
+
+| Arch                                                            | MPJPE | P-MPJPE |                              ckpt                               |                              log                               |
+| :-------------------------------------------------------------- | :---: | :-----: | :-------------------------------------------------------------: | :------------------------------------------------------------: |
+| [SimpleBaseline3D-tcn<sup>1</sup>](/configs/body_3d_keypoint/pose_lift/h36m/pose-lift_simplebaseline3d_8xb64-200e_h36m.py) | 43.4  |  34.3   | [ckpt](https://download.openmmlab.com/mmpose/body3d/simple_baseline/simple3Dbaseline_h36m-f0ad73a4_20210419.pth) | [log](https://download.openmmlab.com/mmpose/body3d/simple_baseline/20210415_065056.log.json) |
+
+<sup>1</sup> Differing from the original paper, we didn't apply the `max-norm constraint` because we found this led to a better convergence and performance.
diff --git a/configs/body_3d_keypoint/pose_lift/h36m/simplebaseline3d_h36m.yml b/configs/body_3d_keypoint/pose_lift/h36m/simplebaseline3d_h36m.yml
@@ -0,0 +1,21 @@
+Collections:
+- Name: SimpleBaseline3D
+  Paper:
+    Title: A simple yet effective baseline for 3d human pose estimation
+    URL: http://openaccess.thecvf.com/content_iccv_2017/html/Martinez_A_Simple_yet_ICCV_2017_paper.html
+  README: https://github.com/open-mmlab/mmpose/blob/main/docs/en/papers/algorithms/simplebaseline3d.md
+Models:
+- Config: configs/body_3d_keypoint/pose_lift/h36m/pose-lift_simplebaseline3d_8xb64-200e_h36m.py
+  In Collection: SimpleBaseline3D
+  Metadata:
+    Architecture: &id001
+    - SimpleBaseline3D
+    Training Data: Human3.6M
+  Name: pose-lift_simplebaseline3d_8xb64-200e_h36m
+  Results:
+  - Dataset: Human3.6M
+    Metrics:
+      MPJPE: 43.4
+      P-MPJPE: 34.3
+    Task: Body 3D Keypoint
+  Weights: https://download.openmmlab.com/mmpose/body3d/simple_baseline/simple3Dbaseline_h36m-f0ad73a4_20210419.pth