
Awesome-LLM-3D: a curated list of Multi-modal Large Language Model in 3D world Resources


meng-f21/Awesome-LLM-3D

 
 


Awesome-LLM-3D

Curated by Xianzheng Ma and Yash Bhalgat


🔥 Here is a curated list of papers about 3D-related tasks empowered by Large Language Models (LLMs). It covers a variety of tasks, including 3D understanding, reasoning, generation, and embodied agents.

Table of Contents

3D Understanding

| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-LLM | UCLA | 3D-LLM: Injecting the 3D World into Large Language Models | NeurIPS'2023 | github |
| 2 | LL3DA | Fudan University | LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning | arXiv | github |
| 3 | LLM-Grounder | U-Mich | LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent | arXiv | github |
| 4 | Point-Bind | CUHK | Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following | arXiv | github |
| 5 | 3D-VisTA | BIGAI | 3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment | ICCV'2023 | github |
| 6 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | arXiv | github |
| 7 | OpenScene | ETHz | OpenScene: 3D Scene Understanding with Open Vocabularies | CVPR'2023 | github |
| 8 | LERF | UC Berkeley | LERF: Language Embedded Radiance Fields | ICCV'2023 | github |
| 9 | ViewRefer | CUHK | ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding | ICCV'2023 | github |
| 10 | Contrastive Lift | Oxford-VGG | Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion | NeurIPS'2023 | github |
| 11 | CLIP2Scene | HKU | CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP | CVPR'2023 | github |
| 12 | PointLLM | CUHK | PointLLM: Empowering Large Language Models to Understand Point Clouds | arXiv | github |
| 13 | - | MIT | Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding | arXiv | github |
| 14 | Chat-3D | ZJU | Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | arXiv | github |
| 15 | PLA | HKU | PLA: Language-Driven Open-Vocabulary 3D Scene Understanding | CVPR'2023 | github |
| 16 | UniT3D | TUM | UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding | ICCV'2023 | github |
| 17 | CG3D | JHU | CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition | arXiv | github |
| 18 | JM3D-LLM | Xiamen University | JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues | ACM MM'2023 | github |
| 19 | Open-Fusion | - | Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation | arXiv | github |
| 20 | - | - | From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception | OpenReview | - |
| 21 | OpenNerf | - | OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | OpenReview | github |
| 22 | - | KAUST & LIX | Zero-Shot 3D Shape Correspondence | SIGGRAPH Asia'2023 | - |
| 23 | LiDAR-LLM | PKU | LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | arXiv | project |
| 24 | ScanNet200 | TUM & NVIDIA | Language-Grounded Indoor 3D Semantic Segmentation in the Wild | ECCV'2022 | project |
| 25 | Semantic Abstraction | Columbia | Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models | CoRL'2022 | project |
| 26 | CLIP-Fields | NYU & Meta AI | CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | arXiv | project |
| 27 | ConceptFusion | MIT et al. | ConceptFusion: Open-set Multimodal 3D Mapping | RSS'2023 | project |
| 28 | CLIP-FO3D | Tsinghua & Xi'an Jiaotong University | CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP | ICCVW'2023 | - |
| 29 | VL-Fields | University of Edinburgh | VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations | ICRA'2023 | project |
| 30 | 3D-VQA | ETH | CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes | CVPRW'2023 | code |
| 31 | Multi-CLIP | ETH | Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes | arXiv | - |
| 32 | OpenMask3D | ETH & Microsoft & Google | OpenMask3D: Open-Vocabulary 3D Instance Segmentation | NeurIPS'2023 | project |
| 33 | 3D-OVS | NTU et al. | Weakly Supervised 3D Open-vocabulary Segmentation | NeurIPS'2023 | github |
| 34 | RegionPLC | HKU & SenseTime | RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding | arXiv | project |
| 35 | OVIR-3D | Rutgers University | OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data | CoRL'2023 | github |
| 36 | OpenIns3D | Cambridge & HKU & HKUST | OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation | arXiv | project |
| 37 | Open3DIS | VinAI et al. | Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance | arXiv | project |
| 38 | SAI3D | PKU et al. | SAI3D: Segment Any Instance in 3D Scenes | arXiv | project |

3D Reasoning

| ID | Keywords | Institute (first) | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
| 2 | Transcribe3D | TTI, Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |

3D Generation

| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | arXiv | github |
| 2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | arXiv | project |
| 3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | arXiv | github |
| 4 | DreamLLM | MEGVII & Tsinghua | DreamLLM: Synergistic Multimodal Comprehension and Creation | arXiv | github |
| 5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | arXiv | github |
| 6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |

3D Embodied Agent

| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | RT-1 | Google | RT-1: Robotics Transformer for Real-World Control at Scale | arXiv | github |
| 2 | RT-2 | Google-DeepMind | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | arXiv | github |
| 3 | SayPlan | QUT Centre for Robotics | SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning | CoRL'2023 | github |
| 4 | UniHSI | Shanghai AI Lab | Unified Human-Scene Interaction via Prompted Chain-of-Contacts | arXiv | github |
| 5 | LLM-Planner | The Ohio State University | LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models | ICCV'2023 | github |
| 6 | STEVE | ZJU & UW | See and Think: Embodied Agent in Virtual Environment | arXiv | github |
| 7 | SceneDiffuser | BIGAI | Diffusion-based Generation, Optimization, and Planning in 3D Scenes | arXiv | github |
| 8 | LEO | BIGAI | An Embodied Generalist Agent in 3D World | arXiv | github |
| 9 | CLIP-Fields | NYU, Meta | CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory | RSS'2023 | github |
| 10 | Dobb-E | NYU, Meta | On Bringing Robots Home | arXiv | github |
| 11 | VoxPoser | Stanford | VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | arXiv | github |

3D Benchmarks

| ID | Keywords | Institute | Paper | Publication | Others |
| --- | --- | --- | --- | --- | --- |
| 1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2022 | github |
| 2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
| 3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
| 4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
| 5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | arXiv | github |
| 6 | M3DBench | Fudan University | M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts | arXiv | github |

Contributing

This is an active repository and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are awesome for 3D LLMs. You can vote for them by adding a 👍.

If you have any questions about this opinionated list, please get in touch at xianzheng@robots.ox.ac.uk.

Acknowledgement

This repo is inspired by Awesome-LLM.
