Embodied-Manipulation-Foundation-Model-Paper-List

A curated collection of influential research papers in robotics, computer vision, and machine learning.

author: Congsheng (ACondaway) Xu, Organization: VapourX

OpenVLA Series

OpenVLA: An Open-Source Vision-Language-Action Model - Embodied manipulation VLA foundation model (2024-06)
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success - Embodied manipulation VLA foundation model (2025-02-01)

RDT Series

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation - Bimanual manipulation foundation model (2024-10)
H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation - Data-efficient bimanual manipulation foundation model (2025-07)

TikTok GR Series

GR-1: Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation - Large-scale video pre-training model proposed by ByteDance (2023-12)
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation - Large-scale video pre-training model proposed by ByteDance (2024-10)
GR-3 Technical Report - Large-scale video pre-training model proposed by ByteDance (2025-07-01)

Google-Research RT Series

RT-1: Robotics Transformer for Real-World Control at Scale - Key work in RT series VLA (2022-12)
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - Key work in RT series VLA (2023-07)

PaLM-E Series

PaLM-E: An Embodied Multimodal Language Model - Key work in PaLM-E series (2023-03)

Meta-AI Series

R3M: A Universal Visual Representation for Robot Manipulation - Key work in Meta-AI series (2022-03)

π Series

π0: A Vision-Language-Action Flow Model for General Robot Control - Key work in PI series VLA (2024-10)
π0.5: a Vision-Language-Action Model with Open-World Generalization - Key work in PI series VLA (2025-04)

Being-Beyond Series

Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos - Building embodied manipulation foundation model using existing large-scale data (2025-07)
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills - Building embodied manipulation foundation model using existing large-scale data (2025-03)

Agibot Series

AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems - Key work in Agibot series (2025-03)
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation - Key work in Agibot series (2025-08)

Embodied-R1 Series

Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation - Uses an R1-based method (2025-08)

Galaxea Series

Galaxea Open-World Dataset & G0 Dual-System VLA Model - Key works for Galaxea (2025-08)

NVIDIA GR00T

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots - Key works for NVIDIA Robotics (2025-03)

Octo Series

Octo: An Open-Source Generalist Robot Policy (2024-05)

Last updated: Aug. 10, 2025

VapourX-ScaleLab/Embodied-Manipulation-Foundation-Model-Paper-List
