Awesome-LLM-3D

Curated by Xianzheng Ma and Yash Bhalgat

🔥 Here is a curated list of papers about 3D-related tasks empowered by Large Language Models (LLMs). It covers a variety of tasks, including 3D understanding, reasoning, generation, and embodied agents.

Table of Contents

3D Understanding
3D Reasoning
3D Generation
3D Embodied Agent
3D Benchmarks
Contributing
Acknowledgement

3D Understanding

ID	Keywords	Institute	Paper	Publication	Others
1	3D-LLM	UCLA	3D-LLM: Injecting the 3D World into Large Language Models	NeurIPS'2023	github
2	LL3DA	Fudan University	LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning	Arxiv	github
3	LLM-Grounder	U-Mich	LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent	Arxiv	github
4	Point-Bind	CUHK	Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following	Arxiv	github
5	3D-VisTA	BIGAI	3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	ICCV'2023	github
6	LEO	BIGAI	An Embodied Generalist Agent in 3D World	Arxiv	github
7	OpenScene	ETHz	OpenScene: 3D Scene Understanding with Open Vocabularies	CVPR'2023	github
8	LERF	UC Berkeley	LERF: Language Embedded Radiance Fields	ICCV'2023	github
9	ViewRefer	CUHK	ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding	ICCV'2023	github
10	Contrastive Lift	Oxford-VGG	Contrastive Lift: 3D Object Instance Segmentation by Slow-Fast Contrastive Fusion	NeurIPS'2023	github
11	CLIP2Scene	HKU	CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP	CVPR'2023	github
12	PointLLM	CUHK	PointLLM: Empowering Large Language Models to Understand Point Clouds	Arxiv	github
13	-	MIT	Leveraging Large (Visual) Language Models for Robot 3D Scene Understanding	Arxiv	github
14	Chat-3D	ZJU	Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes	Arxiv	github
15	PLA	HKU	PLA: Language-Driven Open-Vocabulary 3D Scene Understanding	CVPR'2023	github
16	UniT3D	TUM	UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding	ICCV'2023	github
17	CG3D	JHU	CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition	Arxiv	github
18	JM3D-LLM	Xiamen University	JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues	ACM MM'2023	github
19	Open-Fusion	-	Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation	Arxiv	github
20	-	-	From Language to 3D Worlds: Adapting Language Model for Point Cloud Perception	OpenReview	-
21	OpenNerf	-	OpenNerf: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views	OpenReview	github
22	-	KAUST & LIX	Zero-Shot 3D Shape Correspondence	SIGGRAPH Asia'2023	-
23	LiDAR-LLM	PKU	LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding	Arxiv	project
24	ScanNet200	TUM & NVIDIA	Language-Grounded Indoor 3D Semantic Segmentation in the Wild	ECCV'2022	project
25	Semantic Abstraction	Columbia	Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models	CoRL'2022	project
26	CLIP-Fields	NYU & Meta AI	CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory	RSS'2023	project
27	ConceptFusion	MIT et al.	ConceptFusion: Open-set Multimodal 3D Mapping	RSS'2023	project
28	CLIP-FO3D	Tsinghua & Xi'an Jiaotong University	CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP	ICCVW'2023	-
29	VL-Fields	University of Edinburgh	VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations	ICRA'2023	project
30	3D-VQA	ETH	CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes	CVPRW'2023	code
31	Multi-CLIP	ETH	Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes	Arxiv	-
32	OpenMask3D	ETH & Microsoft & Google	OpenMask3D: Open-Vocabulary 3D Instance Segmentation	NeurIPS'2023	project
33	3D-OVS	NTU et al.	Weakly Supervised 3D Open-vocabulary Segmentation	NeurIPS'2023	github
34	RegionPLC	HKU & SenseTime	RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding	Arxiv	project
35	OVIR-3D	Rutgers University	OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data	CoRL'2023	github
36	OpenIns3D	Cambridge & HKU & HKUST	OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation	Arxiv	project
37	Open3DIS	VinAI et al.	Open3DIS: Open-vocabulary 3D Instance Segmentation with 2D Mask Guidance	Arxiv	project
38	SAI3D	PKU et al.	SAI3D: Segment Any Instance in 3D Scenes	Arxiv	project

3D Reasoning

ID	Keywords	Institute (first)	Paper	Publication	Others
1	3D-CLR	UCLA	3D Concept Learning and Reasoning from Multi-View Images	CVPR'2023	github
2	Transcribe3D	TTI, Chicago	Transcribe3D: Grounding LLMs Using Transcribed Information for 3D Referential Reasoning with Self-Corrected Finetuning	CoRL'2023	github

3D Generation

ID	Keywords	Institute	Paper	Publication	Others
1	3D-GPT	ANU	3D-GPT: Procedural 3D Modeling with Large Language Models	Arxiv	github
2	MeshGPT	TUM	MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers	Arxiv	project
3	ShapeGPT	Fudan University	ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model	Arxiv	github
4	DreamLLM	MEGVII & Tsinghua	DreamLLM: Synergistic Multimodal Comprehension and Creation	Arxiv	github
5	LLMR	MIT, RPI & Microsoft	LLMR: Real-time Prompting of Interactive Worlds using Large Language Models	Arxiv	github
6	ChatAvatar	Deemos Tech	DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance	ACM TOG	website

3D Embodied Agent

ID	Keywords	Institute	Paper	Publication	Others
1	RT-1	Google	RT-1: Robotics Transformer for Real-World Control at Scale	Arxiv	github
2	RT-2	Google-DeepMind	RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control	Arxiv	github
3	SayPlan	QUT Centre for Robotics	SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning	CoRL'2023	github
4	UniHSI	Shanghai AI Lab	Unified Human-Scene Interaction via Prompted Chain-of-Contacts	Arxiv	github
5	LLM-Planner	The Ohio State University	LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models	ICCV'2023	github
6	STEVE	ZJU & UW	See and Think: Embodied Agent in Virtual Environment	Arxiv	github
7	SceneDiffuser	BIGAI	Diffusion-based Generation, Optimization, and Planning in 3D Scenes	Arxiv	github
8	LEO	BIGAI	An Embodied Generalist Agent in 3D World	Arxiv	github
9	CLIP-Fields	NYU, Meta	CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory	RSS'2023	github
10	Dobb-E	NYU, Meta	On Bringing Robots Home	Arxiv	github
11	VoxPoser	Stanford	VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models	Arxiv	github

3D Benchmarks

ID	Keywords	Institute	Paper	Publication	Others
1	ScanQA	RIKEN AIP	ScanQA: 3D Question Answering for Spatial Scene Understanding	CVPR'2022	github
2	ScanRefer	TUM	ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language	ECCV'2020	github
3	Scan2Cap	TUM	Scan2Cap: Context-aware Dense Captioning in RGB-D Scans	CVPR'2021	github
4	SQA3D	BIGAI	SQA3D: Situated Question Answering in 3D Scenes	ICLR'2023	github
5	-	DeepMind & UCL	Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects	Arxiv	github
6	M3DBench	Fudan University	M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts	Arxiv	github

Contributing

This is an active repository and your contributions are always welcome!

I will keep some pull requests open if I'm not sure whether they are a good fit for a list about 3D LLMs. You can vote for them by adding a 👍.

If you have any questions about this opinionated list, please get in touch at xianzheng@robots.ox.ac.uk.

Acknowledgement

This repo is inspired by Awesome-LLM.
