Commit

Any person detection and/or pose estimation model can be used + many others
davidpagnon committed Jan 13, 2025
1 parent 7428e7d commit e6d440f
Showing 9 changed files with 1,065 additions and 256 deletions.
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,7 +1,9 @@
**/__pycache__/
**/build/
*.pyc
logs.txt*
**/*.log
*.egg-info/
dist/
logs.txt
demo_Sports2D/**
**/Demo/*.jpg
18 changes: 15 additions & 3 deletions README.md
@@ -121,12 +121,13 @@ The Demo video is deliberately challenging, in order to demonstrate the robustness of the p
- One person walking in the sagittal plane
- One person doing jumping jacks in the frontal plane. This person then performs a flip while being backlit, both of which are challenging for the pose detection algorithm
- One tiny person flickering in the background who needs to be ignored
- The first person starts high and ends low in the image, which disrupts the automatic floor angle calculation. You can set it manually with the parameter `--floor_angle 0`

<br>

### Play with the parameters

For a full list of the available parameters, see [this section](#all-the-parameters) of the documentation, check the [Config_Demo.toml](https://github.com/davidpagnon/Sports2D/blob/main/Sports2D/Demo/Config_demo.toml) file, or type:
``` cmd
sports2d --help
```
@@ -195,7 +196,16 @@ Note that it does not take distortions into account, and that it will be less ac

**Quick fixes:**
- Use ` --save_vid false --save_img false --show_realtime_results false`: Will not save images or videos, and will not display the results in real time.
- Use `--mode lightweight`: Will use a lighter version of RTMPose, which is faster but less accurate.\
Note that any detection and/or pose estimation model can be used (first [deploy them with MMPose](https://mmpose.readthedocs.io/en/latest/user_guides/how_to_deploy.html#onnx) if you do not have their .onnx or .zip files), with the following syntax:
```
--mode """{'det_class':'YOLOX',
'det_model':'https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/onnx_sdk/yolox_nano_8xb8-300e_humanart-40f6f0d0.zip',
'det_input_size':[416,416],
'pose_class':'RTMPose',
'pose_model':'https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/onnx_sdk/rtmpose-t_simcc-body7_pt-body7_420e-256x192-026a1439_20230504.zip',
'pose_input_size':[192,256]}"""
```
- Use `--det_frequency 50`: Will detect poses only every 50 frames, and track keypoints in between, which is faster.
- Use `--multiperson false`: Can be used if only one person is present in the video. Otherwise, people's IDs may get mixed up.
- Use `--load_trc <path_to_file_px.trc>`: Will use pose estimation results from a file. Useful if you want to use different parameters for pixel to meter conversion or angle calculation without running detection and pose estimation all over.
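The triple-quoted dictionary accepted by `--mode` is a Python-literal string. A minimal sketch of how such a value could be distinguished from a preset name and parsed — `parse_mode` is a hypothetical helper for illustration (using `ast.literal_eval`), not necessarily what Sports2D does internally:

```python
import ast

def parse_mode(mode_arg: str):
    """Interpret --mode as either a preset name or a Python-literal dict.

    Hypothetical helper for illustration; Sports2D's actual parsing may differ.
    """
    mode_arg = mode_arg.strip()
    if mode_arg.startswith('{'):
        # literal_eval only evaluates literals: no arbitrary code execution
        config = ast.literal_eval(mode_arg)
        if not isinstance(config, dict):
            raise ValueError("--mode dictionary must be a dict literal")
        return config
    return mode_arg  # preset name such as 'lightweight', 'balanced', 'performance'

custom = parse_mode("""{'det_class':'YOLOX',
 'det_input_size':[416,416],
 'pose_class':'RTMPose',
 'pose_input_size':[192,256]}""")
print(custom['pose_class'])    # RTMPose
print(parse_mode('balanced'))  # balanced
```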
@@ -375,6 +385,7 @@ sports2d --help
'trimmed_extrema_percent': ["", "Proportion of the most extreme segment values to remove before calculating their mean. Defaults to 50"],
'fontSize': ["", "font size for angle values. 0.3 if not specified"],
'flip_left_right': ["", "true or false. true to get consistent angles with people facing both left and right sides. Set it to false if you want timeseries to be continuous even when the participant switches their stance. true if not specified"],
'fix_segment_angles_with_floor_angle': ["", "true or false. If the camera is tilted, corrects segment angles relative to the floor angle. Set it to false if the floor itself is tilted. true if not specified"],
'interpolate': ["", "interpolate missing data. true if not specified"],
'interp_gap_smaller_than': ["", "interpolate sequences of missing data if they are less than N frames long. 10 if not specified"],
'fill_large_gaps_with': ["", "last_value, nan, or zeros. last_value if not specified"],
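The `trimmed_extrema_percent` behaviour can be illustrated with a plain trimmed mean — a sketch assuming the percentage is split evenly between the low and high ends of the sorted values (the exact convention Sports2D uses may differ):

```python
def trimmed_mean(values, trimmed_extrema_percent=50):
    """Mean after discarding the given proportion of extreme values,
    split evenly between the low and high ends. Illustrative only."""
    values = sorted(values)
    n_drop = int(len(values) * trimmed_extrema_percent / 100 / 2)  # per side
    kept = values[n_drop:len(values) - n_drop] or values  # never empty
    return sum(kept) / len(kept)

# The outlier 100 is dropped along with the low extreme 0:
print(trimmed_mean([0, 1, 2, 3, 100], 50))  # 2.0
```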
@@ -491,9 +502,10 @@ If you want to contribute to Sports2D, please follow [this guide](https://docs.g
- [x] Option to only save one person (with the highest average score, or with the most frames and fastest speed)
- [x] Run again without pose estimation with the option `--load_trc` for px .trc file.
- [x] **Convert positions to meters** by providing the person height, a calibration file, or 3D points [to click on the image](https://stackoverflow.com/questions/74248955/how-to-display-the-coordinates-of-the-points-clicked-on-the-image-in-google-cola)
- [x] Support any detection and/or pose estimation model.

- [ ] Perform **Inverse kinematics and dynamics** with OpenSim (cf. [Pose2Sim](https://github.com/perfanalytics/pose2sim), but in 2D). Update [this model](https://github.com/davidpagnon/Sports2D/blob/main/Sports2D/Utilities/2D_gait.osim) (add arms, markers, remove muscles and contact spheres). Add pipeline example.
- [ ] Optionally let user select the person of interest in single_person mode:\
`multiperson = true # true, or 'single_auto', or 'single_click'. 'single_auto' selects the person with highest average likelihood, and 'single_click' lets the user manually select the person of interest.`
- [ ] Run with the option `--compare_to` to visually compare motion with a trc file. If run with a webcam input, the user can follow the motion of the trc file. Further calculation can then be done to compare specific variables.
- [ ] **Colab version**: more user-friendly, usable on a smartphone.
113 changes: 107 additions & 6 deletions Sports2D/Demo/Config_demo.toml
@@ -13,14 +13,15 @@

[project]
video_input = 'demo.mp4' # 'webcam' or '<video_path.ext>', or ['video1_path.mp4', 'video2_path.avi', ...]
# Time ranges can be different for each video. All other processing arguments will be identical.
# On Windows, replace '\' with '/'
# Beware that images won't be saved if paths contain non-ASCII characters.
person_height = 1.70 # Height of the person in meters (for pixels -> meters conversion)
load_trc = '' # If you do not want to recalculate pose, load it from a trc file (in px, not in m)
compare = false # Not implemented yet

# Video parameters
time_range = [] # [] for the whole video, or [start_time, end_time] (in seconds), or [[start_time1, end_time1], [start_time2, end_time2], ...]
video_dir = '' # If empty, video dir is current dir

# Webcam parameters
@@ -48,13 +49,29 @@ result_dir = '' # If empty, project dir is current dir
slowmo_factor = 1 # 1 for normal speed. For a video recorded at 240 fps and exported to 30 fps, it would be 240/30 = 8

# Pose detection parameters
pose_model = 'Body_with_feet' # With RTMLib: Body_with_feet (default HALPE_26 model), Whole_body (COCO_133: body + feet + hands), Body (COCO_17), CUSTOM (see example at the end of the file), or any from skeletons.py
mode = 'balanced' # 'lightweight', 'balanced', 'performance', or """{dictionary}""" (see below)

# A dictionary (WITHIN THREE DOUBLE QUOTES) allows you to manually select the person detection (for top-down approaches) and/or pose estimation models (see https://github.com/Tau-J/rtmlib).
# Make sure the input_sizes are within square brackets, and that they are in the opposite order from the one in the model path (for example, it would be [192,256] for rtmpose-m_simcc-body7_pt-body7-halpe26_700e-256x192-4d3e73dd_20230605.zip).
# If your pose_model is not provided in skeletons.py, you may have to create your own (see example at the end of the file).
# Example, equivalent to mode='balanced':
# mode = """{'det_class':'YOLOX',
# 'det_model':'https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/onnx_sdk/yolox_m_8xb8-300e_humanart-c2c7a14a.zip',
# 'det_input_size':[640, 640],
# 'pose_class':'RTMPose',
# 'pose_model':'https://download.openmmlab.com/mmpose/v1/projects/rtmposev1/onnx_sdk/rtmpose-m_simcc-body7_pt-body7-halpe26_700e-256x192-4d3e73dd_20230605.zip',
# 'pose_input_size':[192,256]}"""
# Example with one-stage RTMO model (Requires pose_model = 'Body'):
# mode = """{'pose_class':'RTMO',
# 'pose_model':'https://download.openmmlab.com/mmpose/v1/projects/rtmo/onnx_sdk/rtmo-m_16xb16-600e_body7-640x640-39e78cc4_20231211.zip',
# 'pose_input_size':[640, 640]}"""
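As a sanity check for the "opposite order" rule above, the HEIGHTxWIDTH token embedded in a model file name can be parsed and reversed into the [width, height] form expected by the `*_input_size` keys. This is an illustrative helper, not part of Sports2D:

```python
import re

def input_size_from_model_path(model_path):
    """Extract the HEIGHTxWIDTH token from a model file name and return it
    reversed as [width, height]. Illustrative helper; not part of Sports2D."""
    match = re.search(r'(\d+)x(\d+)', model_path)
    if match is None:
        raise ValueError('no HxW token found in model path')
    height, width = int(match.group(1)), int(match.group(2))
    return [width, height]

# '256x192' in the file name becomes pose_input_size = [192, 256]:
print(input_size_from_model_path(
    'rtmpose-m_simcc-body7_pt-body7-halpe26_700e-256x192-4d3e73dd_20230605.zip'))
```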

det_frequency = 1 # Run person detection only every N frames, and in between track previously detected bounding boxes (keypoint detection is still run on all frames).
# Equal to or greater than 1, can be as high as you want in simple uncrowded cases. Much faster, but might be less accurate.
device = 'auto' # 'auto', 'CPU', 'CUDA', 'MPS', 'ROCM'
backend = 'auto' # 'auto', 'openvino', 'onnxruntime', 'opencv'
tracking_mode = 'sports2d' # 'rtmlib' or 'sports2d'. 'sports2d' is generally much more accurate and comparable in speed


# Processing parameters
@@ -86,13 +103,14 @@ fontSize = 0.3

# Select joint angles among
# ['Right ankle', 'Left ankle', 'Right knee', 'Left knee', 'Right hip', 'Left hip', 'Right shoulder', 'Left shoulder', 'Right elbow', 'Left elbow', 'Right wrist', 'Left wrist']
joint_angles = ['Right ankle', 'Left ankle', 'Right knee', 'Left knee', 'Right hip', 'Left hip', 'Right shoulder', 'Left shoulder', 'Right elbow', 'Left elbow', 'Right wrist', 'Left wrist']
# Select segment angles among
# ['Right foot', 'Left foot', 'Right shank', 'Left shank', 'Right thigh', 'Left thigh', 'Pelvis', 'Trunk', 'Shoulders', 'Head', 'Right arm', 'Left arm', 'Right forearm', 'Left forearm']
segment_angles = ['Right foot', 'Left foot', 'Right shank', 'Left shank', 'Right thigh', 'Left thigh', 'Pelvis', 'Trunk', 'Shoulders', 'Head', 'Right arm', 'Left arm', 'Right forearm', 'Left forearm']

# Processing parameters
flip_left_right = true # Same angles whether the participant faces left/right. Set it to false if you want timeseries to be continuous even when the participant switches their stance.
correct_segment_angles_with_floor_angle = true # If the camera is tilted, corrects segment angles relative to the floor angle. Set it to false if the floor itself is tilted


[post-processing]
@@ -124,5 +142,88 @@ person_orientation = ['front', 'none', 'left'] # Choose among 'auto', 'none', 'f
osim_setup_path = '../OpenSim_setup' # Path to the OpenSim setup folder
close_to_zero_speed_m = 0.2 # Sum for all keypoints: about 50 px/frame or 0.2 m/frame


[logging]
use_custom_logging = false # if integrated in an API that already has logging



# CUSTOM skeleton
# If you use a model with different keypoints and/or different ordering
# Useful if you trained your own model, from DeepLabCut or MMPose for example.
# Make sure the ids are set in the right order and start from zero.
#
# If you want to perform inverse kinematics, you will also need to create an OpenSim model
# and add to its markerset the location where you expect the triangulated keypoints to be detected.
#
# In this example, CUSTOM reproduces the HALPE_26 skeleton (default skeletons are stored in skeletons.py).
# You can create as many custom skeletons as you want, just add them further down and rename them.
#
# Check your model hierarchy with: for pre, _, node in RenderTree(model):
# print(f'{pre}{node.name} id={node.id}')
[pose.CUSTOM]
name = "Hip"
id = 19
[[pose.CUSTOM.children]]
name = "RHip"
id = 12
[[pose.CUSTOM.children.children]]
name = "RKnee"
id = 14
[[pose.CUSTOM.children.children.children]]
name = "RAnkle"
id = 16
[[pose.CUSTOM.children.children.children.children]]
name = "RBigToe"
id = 21
[[pose.CUSTOM.children.children.children.children.children]]
name = "RSmallToe"
id = 23
[[pose.CUSTOM.children.children.children.children]]
name = "RHeel"
id = 25
[[pose.CUSTOM.children]]
name = "LHip"
id = 11
[[pose.CUSTOM.children.children]]
name = "LKnee"
id = 13
[[pose.CUSTOM.children.children.children]]
name = "LAnkle"
id = 15
[[pose.CUSTOM.children.children.children.children]]
name = "LBigToe"
id = 20
[[pose.CUSTOM.children.children.children.children.children]]
name = "LSmallToe"
id = 22
[[pose.CUSTOM.children.children.children.children]]
name = "LHeel"
id = 24
[[pose.CUSTOM.children]]
name = "Neck"
id = 18
[[pose.CUSTOM.children.children]]
name = "Head"
id = 17
[[pose.CUSTOM.children.children.children]]
name = "Nose"
id = 0
[[pose.CUSTOM.children.children]]
name = "RShoulder"
id = 6
[[pose.CUSTOM.children.children.children]]
name = "RElbow"
id = 8
[[pose.CUSTOM.children.children.children.children]]
name = "RWrist"
id = 10
[[pose.CUSTOM.children.children]]
name = "LShoulder"
id = 5
[[pose.CUSTOM.children.children.children]]
name = "LElbow"
id = 7
[[pose.CUSTOM.children.children.children.children]]
name = "LWrist"
id = 9
19 changes: 12 additions & 7 deletions Sports2D/Sports2D.py
@@ -173,7 +173,9 @@
'Right shoulder',
'Left shoulder',
'Right elbow',
'Left elbow',
'Right wrist',
'Left wrist'],
'segment_angles': [ 'Right foot',
'Left foot',
'Right shank',
@@ -188,7 +190,8 @@
'Left arm',
'Right forearm',
'Left forearm'],
'flip_left_right': True,
'fix_segment_angles_with_floor_angle': True
},
'post-processing': {'interpolate': True,
'interp_gap_smaller_than': 10,
@@ -230,7 +233,7 @@
'save_angles': ["A", "save angles as mot files. true if not specified"],
'slowmo_factor': ["", "slow-motion factor. For a video recorded at 240 fps and exported to 30 fps, it would be 240/30 = 8. 1 if not specified"],
'pose_model': ["p", "only body_with_feet is available for now. body_with_feet if not specified"],
'mode': ["m", "light, balanced, or performance. balanced if not specified"],
'mode': ["m", 'light, balanced, performance, or a """{dictionary within triple quote}""". balanced if not specified. Use a dictionary to specify your own detection and/or pose estimation models (more about in the documentation).'],
'det_frequency': ["f", "run person detection only every N frames, and in between track previously detected bounding boxes. keypoint detection is still run on all frames.\n\
Equal to or greater than 1, can be as high as you want in simple uncrowded cases. Much faster, but might be less accurate. 1 if not specified: detection runs on all frames"],
'backend': ["", "Backend for pose estimation can be 'auto', 'openvino', 'onnxruntime', or 'opencv'"],
@@ -256,6 +259,7 @@
'trimmed_extrema_percent': ["", "Proportion of the most extreme segment values to remove before calculating their mean. Defaults to 50"],
'fontSize': ["", "font size for angle values. 0.3 if not specified"],
'flip_left_right': ["", "true or false. true to get consistent angles with people facing both left and right sides. Set it to false if you want timeseries to be continuous even when the participant switches their stance. true if not specified"],
'fix_segment_angles_with_floor_angle': ["", "true or false. If the camera is tilted, corrects segment angles relative to the floor angle. Set it to false if the floor itself is tilted. true if not specified"],
'interpolate': ["", "interpolate missing data. true if not specified"],
'interp_gap_smaller_than': ["", "interpolate sequences of missing data if they are less than N frames long. 10 if not specified"],
'fill_large_gaps_with': ["", "last_value, nan, or zeros. last_value if not specified"],
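The `interp_gap_smaller_than` behaviour can be sketched in plain Python: linearly fill runs of missing samples (`None` here) shorter than the threshold, and leave longer or unbounded gaps untouched. This illustrates the idea only, not Sports2D's actual implementation:

```python
def fill_small_gaps(series, interp_gap_smaller_than=10):
    """Linearly interpolate runs of None shorter than the threshold,
    provided they are bounded by valid values on both sides."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            start = i
            while i < len(out) and out[i] is None:
                i += 1  # advance to the end of the gap
            gap = i - start
            bounded = start > 0 and i < len(out)
            if bounded and gap < interp_gap_smaller_than:
                lo, hi = out[start - 1], out[i]
                for k in range(gap):
                    out[start + k] = lo + (hi - lo) * (k + 1) / (gap + 1)
        else:
            i += 1
    return out

print(fill_small_gaps([0.0, None, None, 3.0], 10))  # [0.0, 1.0, 2.0, 3.0]
print(fill_small_gaps([0.0, None, None, 3.0], 2))   # gap too long: unchanged
```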
@@ -505,10 +509,11 @@ def main():
# Override dictionary with command-line arguments if provided
leaf_keys = get_leaf_keys(new_config)
for leaf_key, default_value in leaf_keys.items():
if 'CUSTOM' not in leaf_key:
leaf_name = leaf_key.split('.')[-1]
cli_value = getattr(args, leaf_name)
if cli_value is not None:
set_nested_value(new_config, leaf_key, cli_value)

# Run process with the new configuration dictionary
Sports2D.process(new_config)
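The override loop above relies on helpers that flatten and update a nested config dict. A minimal sketch of what `get_leaf_keys` and `set_nested_value` could look like — hypothetical reimplementations for illustration; the real ones live in Sports2D's utilities:

```python
def get_leaf_keys(config, prefix=''):
    """Flatten a nested dict into {'a.b.c': value} leaf entries."""
    leaves = {}
    for key, value in config.items():
        dotted = f'{prefix}{key}'
        if isinstance(value, dict):
            leaves.update(get_leaf_keys(value, dotted + '.'))
        else:
            leaves[dotted] = value
    return leaves

def set_nested_value(config, dotted_key, value):
    """Walk the dotted path and overwrite the leaf in place."""
    *path, leaf = dotted_key.split('.')
    node = config
    for part in path:
        node = node[part]
    node[leaf] = value

config = {'pose': {'mode': 'balanced', 'CUSTOM': {'name': 'Hip'}}}
for key in get_leaf_keys(config):
    if 'CUSTOM' not in key:          # custom skeletons are skipped, as in main()
        set_nested_value(config, key, 'performance')
print(config['pose']['mode'])        # performance
print(config['pose']['CUSTOM'])      # {'name': 'Hip'} -- untouched
```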