Using GPT2 to build a human motion model!
Kanazawa et al.'s End-to-end Recovery of Human Shape and Pose (hmr) lets us create pseudo-ground-truth labels for sequences of human shape and pose:
We can train GPT2 on such sequences! Trained on 90 of the walks from the 100 Ways to Walk youtube video, we can generate plausible motions on the test set (the remaining 10 ways to walk).
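Each frame in these sequences is hmr's 85-dimensional parameter vector: 3 weak-perspective camera parameters, 72 SMPL pose parameters (24 joints × 3 axis-angle values) and 10 SMPL shape coefficients. Here is a minimal sketch of splitting one frame vector (the helper name below is ours, not part of the repo):

```python
import numpy as np

CAM_DIM, POSE_DIM, SHAPE_DIM = 3, 72, 10  # 3 + 72 + 10 = 85

def split_hmr_frame(theta):
    """Split one 85-d hmr frame vector into camera, pose and shape parts."""
    theta = np.asarray(theta)
    assert theta.shape[-1] == CAM_DIM + POSE_DIM + SHAPE_DIM
    cam = theta[..., :CAM_DIM]                      # scale, x-translation, y-translation
    pose = theta[..., CAM_DIM:CAM_DIM + POSE_DIM]   # 24 joints x 3 axis-angle values
    shape = theta[..., -SHAPE_DIM:]                 # 10 SMPL shape betas
    return cam, pose, shape
```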
Each of the following gifs was generated by giving the model a starting context of 30 frames ("Given context"); the next 90 frames are then generated (a sketch of this autoregressive setup follows the examples below). The ground-truth clip ends first (30 frames before the generated sequence does).
(The minister for silly walks takes up) Scuba diving
Moonwalk
In a wind storm
Although these examples may not match the ground truth, they still look quite plausible. Sometimes, however, the entire point is missed:
Royal guard has given up marching and just wants to stroll
Stepped-in-something guy knows that his feet are sticky but he wants to maintain his "tough" image
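For anyone curious how the continuation works: generation is autoregressive. The model is seeded with the 30 context frames and then repeatedly fed its own output to predict the next 85-dimensional frame, 90 times. Below is a minimal sketch of that loop with a stand-in predictor (a single linear layer); the real predictor is the modified GPT2 transformer in this repo, and the dimensions mirror the defaults mentioned in the instructions below.

```python
import torch
import torch.nn as nn

n_ctx, n_pred, frame_dim = 30, 90, 85  # context frames, generated frames, hmr vector size

class NextFramePredictor(nn.Module):
    """Stand-in for the trained model: maps the last n_ctx frames to the next frame."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(n_ctx * frame_dim, frame_dim)

    def forward(self, window):                  # window: (n_ctx, frame_dim)
        return self.proj(window.reshape(-1))    # next frame: (frame_dim,)

model = NextFramePredictor()
context = torch.randn(n_ctx, frame_dim)         # the 30 "given context" frames

frames = list(context)                          # list of (frame_dim,) tensors
for _ in range(n_pred):                         # autoregressively generate 90 frames
    window = torch.stack(frames[-n_ctx:])       # condition on the most recent n_ctx frames
    with torch.no_grad():
        frames.append(model(window))

motion = torch.stack(frames)                    # (120, 85): context + generated sequence
```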
The modified hmr and GPT2 folders are included in this repo, but please follow their respective requirements/installation instructions.
Separately, you'll need an installation of openpose for hmr.
- Download the 100 Ways to Walk video. This is the "videoplayback.webm" referenced in data_prep.sh:
ffmpeg -i videoplayback.webm -vsync 0 raw/out%d.png
- Modify the path to openpose in data_prep.sh.
- Run data_prep.sh. This should give you the following folders in data/:
  - preproc: the frames of each walk as .png images, sorted into one folder per walk
  - json: the output from openpose, which hmr relies on; it gives the bounding box of the person (see the bounding-box sketch after this list)
  - hmr: the pickled 85-dimensional per-frame representation of the person, extracted with hmr (see the loading sketch after this list)
- Run train_gpt2.sh. This will give you the checkpoints and logs folders in GPT2/.
- Run infer_gpt2.sh. This will give you:
  - GPT2/logs/train and GPT2/logs/test: the predicted per-frame representations for 10 examples each from the training and test sets. Note: the first n_ctx frames (default: 30, set in GPT2/small.json) are the given context and are not predicted.
  - data/gpt_vis/train and data/gpt_vis/test: the rendered frames. You should find 101 images per walk: 30 context frames plus 90 generated frames (n_pred) give 120 in total, and visualisation starts from frame 19 (to save some runtime), leaving 101 frames to render.
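For reference, hmr only needs a rough bounding box for the person, which the openpose JSON provides via the 2-D keypoints. A minimal sketch of deriving a box from one keypoint file (the standard openpose field names are used; the example path and confidence threshold are just illustrative):

```python
import json
import numpy as np

def bbox_from_openpose(json_path, conf_thresh=0.1):
    """Return (x_min, y_min, x_max, y_max) for the first person in an openpose JSON file."""
    with open(json_path) as f:
        data = json.load(f)
    if not data["people"]:
        return None                                    # no person detected in this frame
    kps = np.array(data["people"][0]["pose_keypoints_2d"]).reshape(-1, 3)  # x, y, confidence
    valid = kps[kps[:, 2] > conf_thresh]                # keep confident keypoints only
    x_min, y_min = valid[:, :2].min(axis=0)
    x_max, y_max = valid[:, :2].max(axis=0)
    return x_min, y_min, x_max, y_max

# e.g. bbox_from_openpose("data/json/out1_keypoints.json")  # hypothetical path
```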
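Similarly, a hedged sketch of inspecting one of the pickled hmr sequences in data/hmr; the file name and the exact pickled layout depend on the scripts in this repo, so treat both as assumptions:

```python
import pickle
import numpy as np

with open("data/hmr/walk01.pkl", "rb") as f:  # hypothetical file name
    seq = np.asarray(pickle.load(f))

print(seq.shape)  # expected: (num_frames, 85) - one hmr parameter vector per frame
```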