Curated by Xianzheng Ma and Yash Bhalgat
🔥 Here is a curated list of papers on 3D-related tasks empowered by Large Language Models (LLMs). It covers a range of tasks, including 3D understanding, reasoning, generation, and embodied agents.

## 3D Reasoning

ID | keywords | Institute (first) | Paper | Publication | Others |
---|---|---|---|---|---|
1 | 3D-CLR | UCLA | 3D Concept Learning and Reasoning from Multi-View Images | CVPR'2023 | github |
2 | Transcribe3D | TTI-Chicago | Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning | CoRL'2023 | github |

## 3D Generation

ID | keywords | Institute (first) | Paper | Publication | Others |
---|---|---|---|---|---|
1 | 3D-GPT | ANU | 3D-GPT: Procedural 3D Modeling with Large Language Models | arXiv | github |
2 | MeshGPT | TUM | MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers | arXiv | project |
3 | ShapeGPT | Fudan University | ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model | arXiv | github |
4 | DreamLLM | MEGVII & Tsinghua | DreamLLM: Synergistic Multimodal Comprehension and Creation | arXiv | github |
5 | LLMR | MIT, RPI & Microsoft | LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | arXiv | github |
6 | ChatAvatar | Deemos Tech | DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance | ACM TOG | website |

## 3D Benchmarks

ID | keywords | Institute (first) | Paper | Publication | Others |
---|---|---|---|---|---|
1 | ScanQA | RIKEN AIP | ScanQA: 3D Question Answering for Spatial Scene Understanding | CVPR'2022 | github |
2 | ScanRefer | TUM | ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language | ECCV'2020 | github |
3 | Scan2Cap | TUM | Scan2Cap: Context-aware Dense Captioning in RGB-D Scans | CVPR'2021 | github |
4 | SQA3D | BIGAI | SQA3D: Situated Question Answering in 3D Scenes | ICLR'2023 | github |
5 | - | DeepMind & UCL | Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects | arXiv | github |
6 | M3DBench | Fudan University | M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts | arXiv | github |
This is an active repository, and your contributions are always welcome!
I will keep some pull requests open if I'm not sure whether they belong in an awesome list of 3D LLM work; you can vote for them by adding a 👍 reaction.
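If you'd like to add a paper, the sketch below shows the pipe-separated row format used by the tables above; every value (ID, keyword, institute, title, venue, link) is a placeholder rather than a real entry:

```markdown
<!-- Placeholder entry: replace each field with the paper's actual details. -->
7 | YourKeyword | Your Institute | Your Paper Title | Venue'20XX | [github](https://github.com/your-org/your-repo) |
```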
If you have any questions about this opinionated list, please get in touch at xianzheng@robots.ox.ac.uk.
This repo is inspired by Awesome-LLM.