Maria Escobar, Juanita Puentes, Cristhian Forigua, Jordi Pont-Tuset, Kevis-Kokitsi Maninis, Pablo Arbeláez. EgoCast: Forecasting Egocentric Human Pose in the Wild. arXiv, 2025.
EgoCast is a novel framework for full-body pose forecasting. We use visual and proprioceptive cues to accurately predict body motion.
Our method leverages proprioceptive and visual streams to estimate 3D human pose. (Top) For forecasting, we feed previous camera poses and 3D full-body pose predictions through a forecasting head to estimate future 3D poses from t+1 to t+n. (Bottom) Since ground-truth 3D full-body poses are not available in real-world scenarios, we implement a current-frame estimation module that integrates camera poses and visual cues to estimate the 3D pose at time t.
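For intuition, below is a minimal sketch of how this two-stage design could be wired at inference time: a current-frame estimation module produces pose estimates from camera poses and visual features, and a forecasting head consumes past camera poses together with those estimates to predict future poses. The module names (CurrentFrameEstimator, ForecastingHead), tensor shapes, joint count, and window lengths are assumptions for illustration and do not mirror the actual EgoCast implementation.

# Hypothetical sketch of the two-stage EgoCast pipeline (illustrative only, not the official code).
import torch
import torch.nn as nn

N_JOINTS = 17          # assumed number of body joints
POSE_DIM = N_JOINTS * 3
CAM_DIM = 7            # assumed camera pose: translation (3) + quaternion (4)
PAST, FUTURE = 20, 10  # hypothetical window lengths

class CurrentFrameEstimator(nn.Module):
    """Estimates the full-body pose at time t from camera poses and visual features."""
    def __init__(self, visual_dim=256):
        super().__init__()
        self.fuse = nn.Linear(CAM_DIM + visual_dim, POSE_DIM)

    def forward(self, cam_pose, visual_feat):
        return self.fuse(torch.cat([cam_pose, visual_feat], dim=-1))

class ForecastingHead(nn.Module):
    """Predicts poses for t+1 .. t+n from past camera poses and past pose estimates."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(PAST * (CAM_DIM + POSE_DIM), FUTURE * POSE_DIM)

    def forward(self, past_cam, past_pose):
        x = torch.cat([past_cam, past_pose], dim=-1).flatten(1)
        return self.proj(x).view(-1, FUTURE, POSE_DIM)

# Toy inference: estimate poses over the past window, then forecast the future window.
estimator, forecaster = CurrentFrameEstimator(), ForecastingHead()
past_cam = torch.randn(1, PAST, CAM_DIM)
visual_feat = torch.randn(1, PAST, 256)
past_pose = estimator(past_cam, visual_feat)    # (1, PAST, POSE_DIM)
future_pose = forecaster(past_cam, past_pose)   # (1, FUTURE, POSE_DIM)
print(future_pose.shape)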
-
Clone the repository.
git clone https://github.com/BCV-Uniandes/EgoCast.git
-
Install general dependencies.
To set up the environment and install the necessary dependencies, run the following commands:
cd EgoCast
conda create -n egocast python=3.11 -y
conda activate egocast
pip install .
-
Download model checkpoint.
We use the EgoVPL model from the official EgoVPL implementation. Please download the checkpoint and place it under
model_zoo/
We utilize EgoExo-4D, a large-scale, multi-modal, multi-view video dataset collected across 13 cities worldwide. This dataset serves as a benchmark for egocentric and exocentric human motion analysis.
For training, our model leverages camera poses and egocentric video data.
-
Data Download
To download the dataset, follow the instructions provided in the EgoExo-4D documentation.
To obtain metadata and body pose annotations, run the following command:
egoexo -o dataset --parts annotations --benchmarks egopose --release v2
To download the downscaled takes (448p resolution) of the egocentric videos, run the following command:
egoexo -o dataset --parts downscaled_takes/448 --release v2
-
Data Preparation
To train our model, the downloaded egocentric video takes must be converted into individual frames. This step extracts frames from the videos and saves them as images for further processing.
python video2image.py
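As a point of reference, frame extraction of this kind typically looks like the sketch below, using OpenCV. This only illustrates the general approach, not the contents of video2image.py; the input path, output directory, and file-naming pattern are assumptions.

# Hypothetical frame-extraction sketch (illustrative only; not video2image.py itself).
import os
import cv2

def extract_frames(video_path, out_dir):
    """Decode a video and save each frame as a numbered JPEG image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or decoding error
            break
        cv2.imwrite(os.path.join(out_dir, f"{idx:06d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx

# Example usage with assumed paths:
# n = extract_frames("dataset/takes/example_take.mp4", "dataset/frames/example_take")
# print(f"Saved {n} frames")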
The Current-Frame Estimation Module predicts the full-body pose at the current timestamp using camera poses and, optionally, egocentric video. This eliminates the reliance on ground-truth body poses at test time, enabling real-world applicability. We offer two training approaches:
-
IMU-Based Approach (Uses only camera poses)
Train using only IMU (headset pose) data:
python main_train_egocast.py -opt options/train_egocast_imu.json
-
EgoCast Approach (Uses camera poses and egocentric video)
Train using both camera pose and visual data:
python main_train_egocast.py -opt options/train_egocast_video.json
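Both training commands (and the test commands below) take a JSON options file via -opt. The sketch below shows one generic way such a file could be parsed; the option keys shown in the comment are hypothetical, and the actual schema is defined by the files under options/.

# Minimal sketch of loading a JSON options file passed with -opt (keys are hypothetical).
import argparse
import json

def parse_options():
    parser = argparse.ArgumentParser()
    parser.add_argument("-opt", type=str, required=True, help="path to the JSON options file")
    args = parser.parse_args()
    with open(args.opt, "r") as f:
        opt = json.load(f)  # e.g. {"task": "egocast_imu", "train": {"lr": 1e-4}}
    return opt

if __name__ == "__main__":
    opt = parse_options()
    print(json.dumps(opt, indent=2))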
-
IMU-Based Testing (Uses only camera poses)
Run the following command to evaluate the IMU-based model:
python main_test_egocast.py -opt options/test_egocast_imu.json
-
EgoCast Testing (Uses camera poses and egocentric video)
Run the following command to test the model using both IMU data and video:
python main_test_multiprocessing.py -opt options/test_egocast_multiprocessing.json
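Evaluation of full-body pose estimation is commonly reported as a mean per-joint position error. The test scripts compute their own metrics, but the sketch below illustrates the basic idea; the joint count, shapes, and units are assumptions.

# Illustrative mean per-joint position error (MPJPE) computation; not the repo's metric code.
import numpy as np

def mpjpe(pred, gt):
    """pred, gt: arrays of shape (frames, joints, 3) in meters. Returns error in centimeters."""
    per_joint = np.linalg.norm(pred - gt, axis=-1)  # Euclidean distance per joint
    return per_joint.mean() * 100.0

# Toy example with random poses (assumed 17 joints):
pred = np.random.rand(50, 17, 3)
gt = np.random.rand(50, 17, 3)
print(f"MPJPE: {mpjpe(pred, gt):.2f} cm")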
Make sure you are on the forecasting branch before running the following command:
python main_train_egocast.py -opt options/train_egocast_forecasting.json
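Forecasting training consumes windows of past observations paired with future target poses. The sketch below shows one generic way to build such (past, future) pairs from a pose sequence; the window lengths and array shapes are assumptions, not the repository's data pipeline.

# Hypothetical construction of (past, future) windows for forecasting training.
import numpy as np

def make_windows(poses, past=20, future=10):
    """poses: (T, pose_dim) sequence. Returns inputs (N, past, pose_dim) and targets (N, future, pose_dim)."""
    inputs, targets = [], []
    for t in range(past, len(poses) - future + 1):
        inputs.append(poses[t - past:t])
        targets.append(poses[t:t + future])
    return np.stack(inputs), np.stack(targets)

# Toy example: a 200-frame sequence of 51-D poses (assumed 17 joints x 3 coordinates).
poses = np.random.rand(200, 51)
x, y = make_windows(poses)
print(x.shape, y.shape)  # (171, 20, 51) (171, 10, 51)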
If you find EgoCast useful for your work, please cite:
@inproceedings{escobar2025egocast,
author = {Escobar, Maria and Puentes, Juanita and Forigua, Cristhian and Pont-Tuset, Jordi and Maninis, Kevis-Kokitsi and Arbeláez, Pablo},
title = {EgoCast: Forecasting Egocentric Human Pose in the Wild},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2025},
}
This project borrows heavily from AvatarPoser; we thank the authors for their contributions to the community.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.