In this project we explore the performance of human reconstruction models. We focus on creating an accurate method and therefore need human reconstruction datasets with reliable ground truth. Since current real human reconstruction datasets do not provide such high accuracy, we use self-generated synthetic datasets. As human reconstruction model we use DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras (ICCV 2021). We use the authors' code (https://github.com/DSaurus/DeepMultiCap) as our base. Since their code base is incomplete, we implement the missing parts and adapt their code to our synthetic dataset.
- torch
- torchvision
- trimesh
- numpy
- matplotlib
- PIL (Pillow)
- skimage (scikit-image)
- tqdm
- cv2 (opencv-python)
- json (standard library)
- taichi==0.6.39 or 0.7.15
- taichi_three
- taichi_glsl==0.0.10
- configargparse
- tensorboardX
- open3d
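After installing, a quick import check along the following lines can help verify the environment; the module names simply mirror the list above, and the snippet is only a convenience sketch, not part of the repository:

```python
# Convenience sketch: verify that the listed dependencies can be imported.
import importlib

modules = ["torch", "torchvision", "trimesh", "numpy", "matplotlib", "PIL",
           "skimage", "tqdm", "cv2", "taichi", "taichi_three", "taichi_glsl",
           "configargparse", "tensorboardX", "open3d"]

for name in modules:
    try:
        importlib.import_module(name)
        print(f"{name}: ok")
    except ImportError as err:
        print(f"{name}: missing ({err})")
```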
Result on our Squat dataset. Green depicts the ground truth, red the result of the coarse module, and blue the result of the fine module.
Result on our Jumping Jack dataset. Again, green depicts the ground truth, red the result of the coarse module, and blue the result of the fine module.
Results of our method trained on Squat and evaluated on Squat.
Train Set | Module | # Cam. | Chamfer | P2S | Norm. |
---|---|---|---|---|---|
Squat | Coarse | 5 | 0.022 | 0.033 | 0.033 |
Squat | Fine | 5 | 0.015 | 0.019 | 0.040 |
Results of our method trained on Squat and evaluated on Jumping Jack.
Train Set | Module | # Cam. | Chamfer | P2S | Norm. |
---|---|---|---|---|---|
Squat | Fine | 5 | 0.026 | 0.039 | 0.038 |
Results of our method trained on Squat and evaluated on Jumping Jack, considering different parts of the ground truth mesh (i.e., cloth and human).
Part | Train Set | Module | # Cam. | Chamfer | P2S | Norm. |
---|---|---|---|---|---|---|
Human | Squat | Fine | 5 | - | 0.025 | - |
Cloth | Squat | Fine | 5 | - | 0.016 | - |
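For reference, the Chamfer and point-to-surface (P2S) errors in the tables measure average surface distances between the reconstruction and the ground truth. A minimal sketch using trimesh is shown below; it illustrates the metrics generically and is not necessarily the exact implementation used by `apps/eval_3d.py` (mesh paths are placeholders):

```python
# Generic sketch of Chamfer and point-to-surface (P2S) distances between two meshes.
import trimesh

def chamfer_and_p2s(gt_mesh, pred_mesh, n_samples=10000):
    # Sample points uniformly on both surfaces.
    gt_pts, _ = trimesh.sample.sample_surface(gt_mesh, n_samples)
    pred_pts, _ = trimesh.sample.sample_surface(pred_mesh, n_samples)

    # Nearest-surface distances in both directions.
    _, d_pred_to_gt, _ = trimesh.proximity.closest_point(gt_mesh, pred_pts)
    _, d_gt_to_pred, _ = trimesh.proximity.closest_point(pred_mesh, gt_pts)

    chamfer = 0.5 * (d_pred_to_gt.mean() + d_gt_to_pred.mean())
    p2s = d_pred_to_gt.mean()  # prediction-to-ground-truth surface distance only
    return chamfer, p2s

gt = trimesh.load("path/to/ground_truth.obj")       # placeholder paths
pred = trimesh.load("path/to/reconstruction.obj")
print(chamfer_and_p2s(gt, pred))
```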
Our pre-trained models and synthetic data can be found here under `reproducing`. Also download the pre-trained weights for the normal net; they are found under `checkpoints`. These are then also put into `checkpoints` in the workspace folder. The password is fcanys2333.
Unzip the synthetic.zip folder into `data/` and the outputs folder directly into the workspace folder.
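A short check like the following can confirm that the archives were unpacked into the expected locations; the folder names follow the instructions above, and `outputs` is assumed to be the name of the unzipped outputs folder:

```python
# Check that the unpacked archives are where the training/evaluation scripts expect them.
from pathlib import Path

for folder in ["data/Synthetic", "checkpoints", "outputs"]:  # "outputs" is an assumed name
    print(f"{folder}: {'found' if Path(folder).is_dir() else 'missing'}")
```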
All datasets (i.e., arm, jumping_jack, squat) follow the same data structure.
data/Synthetic
├── arm                      # arm dataset
├── jumping_jack             # jumping jack dataset
├── old_val                  # a dataset similar to jumping jack, not used
├── old_val_easymocap        # contains data used for EasyMocap
├── squat
│   ├── Depth
│   ├── Normal
│   ├── Obj
│   ├── person_0
│   │   ├── cloth            # contains the cloth
│   │   ├── combined         # contains the cloth and human data
│   │   ├── merged_2         # another variation we used for merging (not used)
│   │   ├── smplx            # the ground truth smplx files
│   │   ├── smplx_no_cloth   # the smplx files with only visible human shapes (only used for eval)
│   │   └── voxel_grid       # the voxel grid produced by binvox for the smplx input (used since binvox gets stuck otherwise)
│   ├── output_data.npz
│   ├── RGB
│   ├── scene_camera.json
│   ├── Segmentation
│   ├── smpl_pos             # the smplx global normal maps
│   └── smpl_pred            # the SMPL human mesh predicted using EasyMocap (see later sections of this readme)
└── squat_easymocap          # contains data used for EasyMocap
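As a quick orientation, the sketch below peeks into the Squat dataset; the paths follow the tree above, while the exact schema of `scene_camera.json` is specific to our data generation:

```python
# Peek into the Squat dataset: camera file and per-modality file counts.
import json
from pathlib import Path

root = Path("data/Synthetic/squat")

with open(root / "scene_camera.json") as f:
    cameras = json.load(f)
print("camera entries:", list(cameras)[:5])

for sub in ["RGB", "Depth", "Normal", "Segmentation", "smpl_pos", "smpl_pred"]:
    print(sub, "->", len(list((root / sub).glob("*"))), "files")
```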
Evaluating the pre-trained models from the given outputs.zip folder.
The output will be found in the folder specified by the `folder` flag, under `folder/<your dataset>`.
Evaluating the pre-trained coarse module on Squat dataset:
python apps/eval_3d.py --config configs/squat_coarse.yaml --val_size -1 --folder 07_19-01_22_15_SQUAT_COARSE
Evaluating the pre-trained fine module on Squat dataset:
python apps/eval_3d.py --config configs/squat_fine.yaml --val_size -1 --folder 07_19-15_26_36_SQUAT_FINE
Evaluating the pre-trained fine module on Squat dataset with only 2 cameras:
python apps/eval_3d.py --config configs/squat_fine.yaml --val_size -1 --folder 07_23-23_01_20_SQUAT_FINE_CAM_2 --cameras 6 28
Evaluating the pre-trained fine module on Squat dataset using the SMPL models predicted by EasyMocap:
python apps/eval_3d.py --config configs/squat_fine.yaml --val_size -1 --folder 07_20-22_49_16_SQUAT_FINE_PRED --smpl_path smpl_pred
Evaluating the pre-trained fine module on Jumping Jack dataset:
python apps/eval_3d.py --config configs/squat_fine.yaml --val_size -1 --folder 07_20-22_49_16_SQUAT_FINE_PRED --val_frames 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 --val_dataroot data/Synthetic/jumping_jack
Evaluating PIFuHD on camera 0:
python apps/evaluator.py 0
Evaluating PIFuHD on camera 6:
python apps/evaluator.py 6
In general, evaluation follows this structure:
Evaluating the coarse module on Squat dataset:
python apps/eval_3d.py --config configs/squat_coarse.yaml --val_size -1 --folder <the_output_folder_of_that_experiment>
For other datasets proceed analogously. Other settings such as validation frames, validation cameras, and resolution can be adjusted via flags or in the configuration file.
The output will be found in the output folder.
Training the coarse module on the Squat dataset:
python apps/train.py --config configs/squat_coarse.yaml
Training the fine module on the Squat dataset:
python apps/train.py --config configs/squat_fine.yaml --load_netG_checkpoint_path <insert model from coarse module>
For the arm and Jumping Jack datasets proceed analogously.
We use the Self-Correction-Human-Parsing repository (in `external/Self-Correction-Human-Parsing`) for generating the masks.
- Follow the installation steps from the GitHub repository, including detectron2 for the multiple human parsing framework.
- Make sure that the synthetic data is under `data/Synthetic/<first_trial>`.
- Make sure that all images to be parsed are in the same folder:
cd external/Self-Correction-Human-Parsing
python process.py -src_img ../../data/Synthetic/first_trial -dst_img mhp_extension/data/synthetic_first_trial/global_pic
- Create COCO-style annotations for the images you just copied:
cd mhp_extension
python ./coco_style_annotation_creator/test_human2coco_format.py --dataset "synthetic_first_trial" --json_save_dir "./data/synthetic_first_trial/annotations" --test_img_dir "./data/synthetic_first_trial/global_pic"
- Generate instance predictions for the images:
python finetune_net.py --num-gpus 1 --config-file configs/Misc/synthetic_first_trial.yaml --eval-only MODEL.WEIGHTS pretrain_model/detectron2_maskrcnn_cihp_finetune.pth TEST.AUG.ENABLED False DATALOADER.NUM_WORKERS 0
- Crop the images using the predicted bounding boxes:
python make_crop_and_mask_w_mask_nms.py --img_dir "./data/synthetic_first_trial/global_pic" --save_dir "./data/synthetic_first_trial" --img_list "./data/synthetic_first_trial/annotations/synthetic_first_trial.json" --det_res "./data/synthetic_first_trial/detectron_2_prediction/inference/instances_predictions.pth"
- Generate txt files for the images in `global_pic` and `crop_pic`:
python generate_txt_file.py --folder_path "data/synthetic_first_trial/global_pic" --txt_file_name "global_pic.txt"
python generate_txt_file.py --folder_path "data/synthetic_first_trial/crop_pic" --txt_file_name "crop_pic.txt"
- Generate parsed images for the cropped and global images:
cd ..
python ./mhp_extension/global_local_parsing/global_local_evaluate.py --data-dir "./mhp_extension/data/synthetic_first_trial" --split-name "crop_pic" --model-restore "./mhp_extension/pretrain_model/exp_schp_multi_cihp_local.pth" --log-dir "./mhp_extension/data/synthetic_first_trial" --save-results
python ./mhp_extension/global_local_parsing/global_local_evaluate.py --data-dir "./mhp_extension/data/synthetic_first_trial" --split-name "global_pic" --model-restore "./mhp_extension/pretrain_model/exp_schp_multi_cihp_global.pth" --log-dir "./mhp_extension/data/synthetic_first_trial" --save-results
- Install `joblib` (`pip install joblib`) if necessary.
- Fuse the inputs and get your results!
python mhp_extension/logits_fusion.py --test_json_path "./mhp_extension/data/synthetic_first_trial/crop.json" --global_output_dir "./mhp_extension/data/synthetic_first_trial/global_pic_parsing" --gt_output_dir "./mhp_extension/data/synthetic_first_trial/crop_pic_parsing" --mask_output_dir "./mhp_extension/data/synthetic_first_trial/crop_mask" --save_dir "./mhp_extension/data/synthetic_first_trial/mhp_fusion_parsing"
- The results are now in `./mhp_extension/data/synthetic_first_trial/mhp_fusion_parsing/global_tag`.
- Copy the results into the data folder:
python process.py --depth_folder ../../data/Synthetic/first_trial/Depth --depth_target ../../data/Synthetic/first_trial/depth_npz --img_folder ../../data/Synthetic/first_trial --img_target ../../data/Synthetic/first_trial/img --normal_folder ../../data/Synthetic/first_trial/Normal --normal_target ../../data/Synthetic/first_trial/normal_post_process --mask_folder ./mhp_extension/data/synthetic_first_trial/mhp_fusion_parsing/global_tag --mask_target ../../data/Synthetic/first_trial/masks
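To sanity-check the copied masks, a small overlay script along these lines can help; the frame and mask file names are assumptions and should be adapted to the actual data:

```python
# Overlay one generated mask on its RGB frame to verify the copy step worked.
import cv2

img = cv2.imread("data/Synthetic/first_trial/img/0000.png")  # assumed file names
mask = cv2.imread("data/Synthetic/first_trial/masks/0000.png", cv2.IMREAD_GRAYSCALE)

overlay = img.copy()
overlay[mask == 0] = 0  # black out everything outside the predicted mask
cv2.imwrite("mask_check.png", overlay)
```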
Download SMPL models:
pip install gdown wget
# You might need to rename the HrNet weight file
python scripts/download.py
Prepare your Conda environment (if necessary):
conda create -n easymocap python=3.9 -y
conda activate easymocap
Install remaining requirements:
cd external/EasyMocap-master
python -m pip install -r requirements.txt
python3 -m pip install pyrender
python setup.py develop
Convert the dataset into EasyMocap format:
python scripts/convert_params.py -i data/Synthetic/first_trial/camera_info.json -o data/Synthetic/first_trial_easymocap -d data/Synthetic/first_trial -f 30
Extract the images from videos:
data=/path/to/data
python scripts/preprocess/extract_video.py ${data} --no2d
Create 2D keypoints:
python apps/preprocess/extract_keypoints.py ${data} --mode yolo-hrnet
Create 3D keypoints:
python3 apps/demo/mvmp.py ${data} --out ${data}/output --annot annots --cfg config/exp/mvmp1f.yml --undis --vis_det --vis_repro
Track 3D keypoints:
python3 apps/demo/auto_track.py ${data}/output ${data}/output-track --track3d
Fit SMPL model:
python3 apps/demo/smpl_from_keypoints.py ${data} --skel ${data}/output-track/keypoints3d --out ${data}/output-track/smpl --verbose --opts smooth_poses 1e1
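After fitting, the SMPL parameters should appear under `${data}/output-track/smpl` (as specified by the `--out` flag above); a quick listing can confirm this, keeping in mind that the exact output layout may vary between EasyMocap versions:

```python
# List a few of the SMPL parameter files written by EasyMocap.
from pathlib import Path

smpl_dir = Path("data/Synthetic/first_trial_easymocap/output-track/smpl")  # ${data}/output-track/smpl
for f in sorted(smpl_dir.rglob("*.json"))[:5]:
    print(f)
```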
We evaluate the original pre-trained DeepMultiCap as `baseline`. Make sure to include the dataset under `data`, namely `MultiHuman` and `multihuman_single_raw` (see file structure). Also make sure to have the pre-trained DeepMultiCap checkpoints downloaded into `checkpoints/demo/` from here, together with the MultiHuman dataset.
- Generate images, normals, masks, and depth maps from the object files:
cd taichi_render_gpu
python render_multi.py --data_root ../data/MultiHuman/single/obj --texture_root ../data/multihuman_single_raw/multihuman_single --save_path ../data/multihuman_single_inputs --num_angles 4
- You should now have `depth`, `img`, `mask`, `normal`, and `parameter` in your `data/multihuman_single_inputs` folder.
- These images should look like this:
- These images do not contain any colors, because DeepMultiCap has a weird file structure and no documentation at all :(
- Generate the SMPL global normal maps:
python render_smpl.py --dataroot ../data/multihuman_single_inputs --obj_path ../data/MultiHuman/single/smplx --faces_path ../lib/data/smplx_multi.obj --yaw_list 0 90 180 270
- This should now generate a folder called `smpl_pos`.
- Copy the smplx folder from `data/MultiHuman/single/smplx` into `data/multihuman_single_inputs`. "Normally", these should include the SMPL models estimated by another method.
- Generate reconstructions and visualization:
# go back to project root folder
python apps/eval_3d.py --config configs/multihuman_single.yaml --dataroot data/multihuman_single_inputs
- The reconstructions should now be in `results/multihuman_single`.
- The results look similar to:
- Evaluation of baseline (Not published by the authors)
- Tony Wang
- Yushan Zheng
- Michael Pabst