EgoCast

Project page | arXiv

Maria Escobar¹, Juanita Puentes¹, Cristhian Forigua¹, Jordi Pont-Tuset², Kevis-Kokitsi Maninis², Pablo Arbeláez¹. EgoCast: Forecasting Egocentric Human Pose in the Wild. arXiv, 2025.
¹Universidad de Los Andes, ²Google DeepMind

Overview

This repository is the official implementation of EgoCast, a novel framework for full-body pose forecasting that uses visual and proprioceptive cues to accurately predict body motion.

Overview: Our method leverages proprioception and visual streams to estimate 3D human pose. (Top) For forecasting, we input previous camera poses and 3D full-body pose predictions through a forecasting head to estimate future 3D poses from t+1 to t+n. (Bottom) Since ground-truth 3D full-body poses are not available in real-world scenarios, we implement a current-frame estimation module that integrates camera poses and visual cues to estimate the 3D pose at time t.
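
In code, the data flow above looks roughly like the sketch below. All names here (estimator, forecaster, horizon) are hypothetical illustrations for clarity, not the repository's actual API.

    # Illustrative sketch of the EgoCast data flow described above; the function and
    # variable names are hypothetical, not the repository's actual API.
    def egocast_step(camera_poses, frames, estimator, forecaster, horizon):
        """camera_poses: past headset poses, frames: past egocentric frames (both up to time t)."""
        # Current-frame estimation: fuse proprioception (camera poses) with visual
        # cues to estimate the 3D full-body pose at time t (no ground truth needed).
        body_poses = estimator(camera_poses, frames)
        # Forecasting: previous camera poses and the estimated body poses go through
        # the forecasting head to predict 3D poses from t+1 to t+n.
        future_poses = forecaster(camera_poses, body_poses, horizon=horizon)
        return body_poses, future_poses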


Getting started

  1. Clone the repository.

    git clone https://github.com/BCV-Uniandes/EgoCast.git
  2. Install general dependencies.

    To set up the environment and install the necessary dependencies, run the following commands:

    cd EgoCast
    conda create -n egocast python=3.11 -y
    conda activate egocast
    pip install .
  3. Download model checkpoint.

    We use the pretrained EgoVPL model. Please download the checkpoint from the EgoVPL implementation and place it under model_zoo/ (an optional sanity check follows below).
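
As an optional sanity check (not part of the repository), the snippet below simply confirms that something has been placed under model_zoo/; the exact checkpoint filename depends on the EgoVPL release, so none is assumed here.

    # Optional sanity check: confirm model_zoo/ is populated (run from the repo root).
    from pathlib import Path

    model_zoo = Path("model_zoo")
    checkpoints = sorted(model_zoo.glob("*")) if model_zoo.is_dir() else []
    if not checkpoints:
        print("model_zoo/ is empty -- download the EgoVPL checkpoint first.")
    for ckpt in checkpoints:
        print(f"found: {ckpt} ({ckpt.stat().st_size / 1e6:.1f} MB)")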

Dataset & Preparation

We utilize EgoExo-4D, a large-scale, multi-modal, multi-view video dataset collected across 13 cities worldwide. This dataset serves as a benchmark for egocentric and exocentric human motion analysis.

For training, our model leverages camera poses and egocentric video data.

  1. Data Download

    To download the dataset, follow the instructions provided in the EgoExo-4D documentation.

    To obtain metadata and body pose annotations, run the following command:

     egoexo -o dataset --parts annotations --benchmarks egopose --release v2

    To download the downscaled takes (448p resolution) of the egocentric videos, run the following command (the part name for the downscaled takes is documented in the EgoExo-4D downloader CLI):

     egoexo -o dataset --parts downscaled_takes/448 --benchmarks egopose --release v2
  2. Data Preparation

    To train our model, the downloaded egocentric video takes must be converted into individual frames. This step extracts frames from the videos and saves them as images for further processing.

    python video2image.py
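
video2image.py performs this extraction; the sketch below only illustrates the general idea with OpenCV, using hypothetical input and output paths. See the script itself for the actual directory layout and naming scheme.

    # Rough sketch of video-to-frame extraction (hypothetical paths and naming;
    # the real logic lives in video2image.py).
    import cv2
    from pathlib import Path

    def extract_frames(video_path, out_dir):
        out = Path(out_dir)
        out.mkdir(parents=True, exist_ok=True)
        cap = cv2.VideoCapture(str(video_path))
        count = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            cv2.imwrite(str(out / f"{count:06d}.jpg"), frame)
            count += 1
        cap.release()
        return count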

Current-Frame Estimation Module

The Current-Frame Estimation Module predicts the full-body pose at the current timestamp using camera poses and, optionally, egocentric video. This eliminates the reliance on ground-truth body poses at test time, enabling real-world applicability. We offer two training approaches:

Training

  1. IMU-Based Approach (Uses only camera poses) Train using only IMU (headset pose) data:

    python main_train_egocast.py -opt options/train_egocast_imu.json
  2. EgoCast Approach (Uses camera poses and egocentric video) Train using both camera pose and visual data:

    python main_train_egocast.py -opt options/train_egocast_video.json
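
The two configurations differ only in the inputs the model consumes. The sketch below is a hypothetical illustration of that difference; the 7-D camera-pose parameterization and tensor shapes are assumptions for illustration, not the repository's exact data format.

    # Hypothetical illustration of the two training configurations; shapes and the
    # pose parameterization are assumptions, not EgoCast's exact data format.
    import torch

    T = 40                                        # frames in the input window
    camera_poses = torch.randn(T, 7)              # headset translation (3) + rotation quaternion (4)
    video_frames = torch.randn(T, 3, 448, 448)    # downscaled egocentric RGB frames (448p)

    imu_only_inputs = {"camera_poses": camera_poses}                            # train_egocast_imu.json
    egocast_inputs = {"camera_poses": camera_poses, "frames": video_frames}     # train_egocast_video.json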

Test

  1. IMU-Based Testing (Uses only camera poses) Run the following command to evaluate the IMU-based model:

    python main_test_egocast.py -opt options/test_egocast_imu.json
  2. EgoCast Testing (Uses camera poses and egocentric video) Run the following command to test the model using both IMU data and video:

    python main_test_multiprocessing.py -opt options/test_egocast_multiprocessing.json
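
The test scripts above report the evaluation metrics. Purely for reference, a generic MPJPE (mean per-joint position error) computation looks like the snippet below; it is not the repository's evaluation code.

    # Generic MPJPE illustration in millimetres; not EgoCast's evaluation code.
    import torch

    def mpjpe_mm(pred, gt):
        """pred, gt: [..., J, 3] joint positions in metres."""
        return (pred - gt).norm(dim=-1).mean() * 1000.0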

Forecasting Module

Make sure you are on the forecasting branch before running the following command:

python main_train_egocast.py -opt options/train_egocast_forecasting.json
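
At test time, forecasting runs on top of current-frame estimates rather than ground-truth body poses. The sketch below is a hypothetical illustration of such a sliding-window rollout; the function names and arguments are assumptions, not the repository's API.

    # Hypothetical sliding-window rollout for forecasting (names and arguments are
    # illustrative assumptions, not the repository's API).
    def forecast_take(camera_poses, frames, estimator, forecaster, window, horizon):
        predictions = []
        for t in range(window, len(camera_poses) - horizon):
            past_cams = camera_poses[t - window:t]
            # Ground-truth body poses are unavailable at test time, so the
            # current-frame estimation module supplies them instead.
            past_bodies = estimator(past_cams, frames[t - window:t])
            predictions.append(forecaster(past_cams, past_bodies, horizon=horizon))
        return predictions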

Citations

If you find EgoCast useful for your work, please cite:

@inproceedings{escobar2025egocast,
  author    = {Escobar, Maria and Puentes, Juanita and Forigua, Cristhian and Pont-Tuset, Jordi and Maninis, Kevis-Kokitsi and Arbeláez, Pablo},
  title     = {EgoCast: Forecasting Egocentric Human Pose in the Wild},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2025},
}

License and Acknowledgement

This project borrows heavily from AvatarPoser; we thank the authors for their contributions to the community.

Website License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
