A curated collection of influential research papers in robotics, computer vision, and machine learning.
Author: Congsheng (ACondaway) Xu. Organization: VapourX
- OpenVLA: An Open-Source Vision-Language-Action Model - Embodied manipulation VLA foundation model (2024-06)
- Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success - Embodied manipulation VLA foundation model (2025-02)
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation - Bimanual manipulation foundation model (2024-10)
- H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation - Data-efficient bimanual manipulation foundation model (2025-07)
- Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation - Large-scale video pre-training model proposed by ByteDance (2023-12)
- GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation - Large-scale video pre-training model proposed by ByteDance (2024-10)
- GR-3 Technical Report - Large-scale video pre-training model proposed by ByteDance (2025-07)
- RT-1: Robotics Transformer for Real-World Control at Scale - Key work in RT series VLA (2022-12)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control - Key work in RT series VLA (2023-07)
- PaLM-E: An Embodied Multimodal Language Model - Key work in PaLM-E series (2023-03)
- R3M: A Universal Visual Representation for Robot Manipulation - Key work in Meta-AI series (2022-03)
- π0: A Vision-Language-Action Flow Model for General Robot Control - Key work in PI series VLA (2024-10)
- π0.5: a Vision-Language-Action Model with Open-World Generalization - Key work in PI series VLA (2025-04)
- Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos - Builds an embodied manipulation foundation model from existing large-scale data (2025-07)
- Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills - Builds an embodied manipulation foundation model from existing large-scale data (2025-03)
- AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems - Key work in AgiBot series (2025-03)
- Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation - Key work in AgiBot series (2025-08)
- Embodied-R1: Reinforced Embodied Reasoning for General Robotic Manipulation - Uses an R1-based method (2025-08)
- Galaxea Open-World Dataset & G0 Dual-System VLA Model - Key work from Galaxea (2025-08)
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots - Key work from NVIDIA Robotics (2025-03)
- Octo: An Open-Source Generalist Robot Policy (2024-05)

Last updated: Aug. 10, 2025