A PyTorch implementation of “X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation”
- System requirement: Ubuntu 20.04
- Tested GPUs: RTX 3090
- Environment installation:

```shell
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
```
```shell
# Geometry modeling
python -m torch.distributed.launch --nproc_per_node=4 \
  train_x_dreamer.py \
  --config configs/cupcake_geometry.json \
  --out-dir 'results/result_XDreamer/cupcake_geometry'

# Appearance modeling
python -m torch.distributed.launch --nproc_per_node=4 \
  train_x_dreamer.py \
  --config configs/cupcake_appearance.json \
  --out-dir 'results/result_XDreamer/cupcake_appearance' \
  --base-mesh 'results/result_XDreamer/cupcake_geometry/dmtet_mesh/mesh.obj'
```
```shell
# Geometry modeling
python -m torch.distributed.launch --nproc_per_node=4 \
  train_x_dreamer.py \
  --config configs/Batman_geometry.json \
  --out-dir 'results/result_XDreamer/Batman_geometry'

# Appearance modeling
python -m torch.distributed.launch --nproc_per_node=4 \
  train_x_dreamer.py \
  --config configs/Batman_appearance.json \
  --out-dir 'results/result_XDreamer/Batman_appearance' \
  --base-mesh 'results/result_XDreamer/Batman_geometry/dmtet_mesh/mesh.obj'
```
Overview of the proposed X-Dreamer, which consists of two main stages: geometry learning and appearance learning. In the geometry learning stage, we employ DMTET as the 3D representation and initialize it with a 3D ellipsoid using a mean squared error (MSE) loss. We then optimize DMTET and CG-LoRA using the score distillation sampling (SDS) loss and our proposed attention-mask alignment (AMA) loss to align the 3D representation with the input text prompt. In the appearance learning stage, we leverage bidirectional reflectance distribution function (BRDF) modeling: an MLP with trainable parameters predicts surface materials. As in the geometry learning stage, we optimize the MLP and CG-LoRA using the SDS loss and the AMA loss to align the 3D representation with the input text prompt.
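To make the optimization loop concrete, below is a minimal sketch of the SDS loss in the form commonly used in text-to-3D work. The `noise_pred_fn` stands in for the frozen Stable Diffusion UNet (with CG-LoRA attached), and the `(1 - alpha_bar_t)` weighting is one common choice; both are illustrative assumptions, not this repo's exact implementation.

```python
import torch

def sds_loss(rendered, noise_pred_fn, alphas_cumprod, t):
    """Sketch of score distillation sampling (SDS).

    rendered:        differentiable rendering of the 3D representation
    noise_pred_fn:   stand-in for a frozen diffusion UNet (hypothetical)
    alphas_cumprod:  diffusion noise schedule, shape [num_timesteps]
    t:               sampled timestep index
    """
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t]
    # Forward-diffuse the rendering to timestep t
    noisy = a_t.sqrt() * rendered + (1 - a_t).sqrt() * noise
    eps_hat = noise_pred_fn(noisy, t)
    w = 1 - a_t  # a common timestep weighting choice
    grad = w * (eps_hat - noise)
    # Surrogate loss whose gradient w.r.t. `rendered` equals `grad`
    return (grad.detach() * rendered).sum()
```

In practice the gradient flows through the renderer back into the DMTET parameters (geometry stage) or the material MLP (appearance stage), while the diffusion model stays frozen.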
- 2023.11.27: Create Repository
- 2023.12.28: Release Code
We conduct the experiments using four NVIDIA RTX 3090 GPUs and the PyTorch library. To compute the SDS loss, we use the Stable Diffusion implementation from Hugging Face Diffusers. We implement the DMTET network and the material encoder as a two-layer MLP and a single-layer MLP, respectively, each with a hidden dimension of 32. We optimize X-Dreamer for 2000 iterations in geometry learning and 1000 iterations in appearance learning.
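The two small networks described above might look like the following sketch. The class names and the input/output dimensions (3D point in, SDF value plus vertex offset out; material parameters such as diffuse/specular terms out) are illustrative assumptions, not the repo's exact signatures.

```python
import torch
import torch.nn as nn

class DMTetMLP(nn.Module):
    """Two-layer MLP (hidden dim 32) predicting an SDF value and a
    3D vertex offset per query point, as in the geometry stage sketch."""
    def __init__(self, in_dim=3, hidden=32, out_dim=1 + 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class MaterialMLP(nn.Module):
    """Single-hidden-layer MLP (hidden dim 32) predicting BRDF material
    parameters (here 9 channels, e.g. diffuse, specular, normal offset)."""
    def __init__(self, in_dim=3, hidden=32, out_dim=9):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)
```

Both networks are small enough that the optimization cost is dominated by the diffusion model forward passes, not the 3D representation itself.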
We present representative results of X-Dreamer for text-to-3D generation, utilizing an ellipsoid as the initial geometry.
X-Dreamer also supports text-based mesh geometry editing and is capable of delivering excellent results.
| Prompt | Coarse-grained Mesh | Image | Normal |
|---|---|---|---|
| A beautifully carved wooden queen chess piece. | | | |
| Barack Obama's head. | | | |
We demonstrate how swapping the HDR environment map results in diverse lighting, thereby creating various reflective effects on the generated 3D assets in X-Dreamer.
We demonstrate the editing process of the geometry and appearance of 3D assets in X-Dreamer using an ellipsoid and coarse-grained guided meshes as geometric shapes for initialization, respectively.
| From an ellipsoid | From coarse-grained guided meshes |
|---|---|
| A DSLR photo of a blue and white porcelain vase, highly detailed, 8K, HD. | A marble bust of an angel, 3D model, high resolution. |
| A stack of pancakes covered in maple syrup. | A DSLR photo of the Terracotta Army, 3D model, high resolution. |
We compare X-Dreamer with four state-of-the-art (SOTA) methods: DreamFusion, Magic3D, Fantasia3D, and ProlificDreamer.
```bibtex
@article{ma2023xdreamer,
  title={X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation},
  author={Ma, Yiwei and Fan, Yijun and Ji, Jiayi and Wang, Haowei and Sun, Xiaoshuai and Jiang, Guannan and Shu, Annan and Ji, Rongrong},
  journal={arXiv preprint arXiv:2312.00085},
  year={2023}
}
```