🔥 🔥 On Manipulating Scene Text in the Wild with Diffusion Models (DBEST) WACV 2024 🔥 🔥

News

2024.11.08 Pretrained weights can be downloaded here

2024.02.05 Pre-Release code 🥳 🥳

Requirements

Our implementation uses the pre-trained text2image weights from the Latent Diffusion Model (LDM). Please download the pre-trained weights from the official LDM GitHub repository. Alternatively, you can load them directly through the Diffusers library (version 0.3.0).
Please note that our method requires a cropped text image as input. You can crop the text with EAST or crop it manually.
To train the LDM with SynText, please download the SynText dataset.
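
The cropped input mentioned above can be produced with a few lines of NumPy once a bounding box is known, whether it comes from a detector such as EAST or from manual annotation. The following is a minimal sketch and not part of the DBEST code: the function name `crop_text_region`, the `(x_min, y_min, x_max, y_max)` box format, and the `pad` parameter are all assumptions made for illustration.

```python
import numpy as np


def crop_text_region(image, box, pad=0):
    """Crop an axis-aligned box from an H x W x C image array.

    `box` is (x_min, y_min, x_max, y_max) in pixels. This format is an
    assumption for this sketch, not something DBEST prescribes.
    """
    height, width = image.shape[:2]
    x_min, y_min, x_max, y_max = box
    # Grow the box by `pad` pixels, then clamp it to the image bounds.
    x0 = max(0, x_min - pad)
    y0 = max(0, y_min - pad)
    x1 = min(width, x_max + pad)
    y1 = min(height, y_max + pad)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the image")
    return image[y0:y1, x0:x1].copy()


# Example: a dummy 100 x 200 RGB image and a hypothetical text box.
image = np.zeros((100, 200, 3), dtype=np.uint8)
crop = crop_text_region(image, (50, 20, 120, 60), pad=4)
print(crop.shape)
```

A small padding margin keeps some background around the text, which may or may not suit your images; adjust or drop it as needed.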

Run DBEST

Step 1: Generate Synthesized Text Scene Dataset

You can download our SynText dataset from this link. Alternatively, you can generate it yourself with the srnet data generator. We only slightly changed the original code; our version is in the generate-syntext directory.

Go to the generate-syntext/ directory and run:

python datagen.py

Step 2: Training Diffusion model with Syntext dataset

Before training the noise model, initialize its weights from the pretrained text2img Latent Diffusion Model. Then go to the outer-loop/ directory and run:

python finetune.py

Step 3: Finetuning per sample image

For the text recognition model, please use the original weights from ABINet. Then run:

python gradio_dbest.py

FAQs & Discussion

Q: What is the sampling method used in this paper?

A: We use Denoising Diffusion Implicit Models (DDIM) and Pseudo Numerical Methods for Diffusion Models on Manifolds (PNDM), as implemented in diffusers, with a guidance scale (gs) normalized to the range 0-1.
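
For readers unfamiliar with DDIM, the sketch below shows a single deterministic DDIM update (eta = 0) in plain NumPy. It only illustrates the standard update rule; it is not the diffusers scheduler and not the DBEST sampling code, and it does not cover PNDM or the guidance scale. The function name `ddim_step` and its arguments are assumptions for this example.

```python
import numpy as np


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) from step t to the previous step.

    alpha_bar_t and alpha_bar_prev are the cumulative products of the
    noise schedule at the current and previous timesteps.
    """
    # Clean-sample estimate implied by the predicted noise.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)
    # Move the estimate to the previous noise level, reusing the same noise.
    x_prev = np.sqrt(alpha_bar_prev) * x0_pred + np.sqrt(1.0 - alpha_bar_prev) * eps_pred
    return x_prev, x0_pred


# Sanity check with a perfect noise prediction: the clean sample is recovered.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 4))
eps = rng.normal(size=(4, 4))
alpha_bar_t = 0.5
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
x_prev, x0_pred = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev=1.0)
```

In a real sampler `eps_pred` comes from the trained noise model, so the recovered `x0_pred` is only an estimate that improves over successive steps.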

Q: Is there any post-processing step?

A: Yes. We apply a color transfer method to improve the quality of the final result.
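
This README does not say which color transfer method is used. As a rough illustration of the idea only, the sketch below matches the per-channel mean and standard deviation of one image to those of a reference image. The function name `match_color_statistics`, the choice of per-channel statistics, and the use of the arrays' own color space are all assumptions; the actual post-processing in DBEST may differ.

```python
import numpy as np


def match_color_statistics(source, reference, eps=1e-6):
    """Shift and scale each channel of `source` to the mean/std of `reference`.

    Both inputs are H x W x C uint8 arrays. This is a simplified
    statistics-matching transfer, assumed here for illustration.
    """
    src = source.astype(np.float64)
    ref = reference.astype(np.float64)
    src_mean, src_std = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
    ref_mean, ref_std = ref.mean(axis=(0, 1)), ref.std(axis=(0, 1))
    # Normalize the source channels, then rescale to the reference statistics.
    out = (src - src_mean) / (src_std + eps) * ref_std + ref_mean
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


# Example with random stand-in images (hypothetical data).
rng = np.random.default_rng(0)
edited = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
original = rng.integers(100, 151, size=(32, 32, 3), dtype=np.uint8)
result = match_color_statistics(edited, original)
```

Here `edited` stands in for the generated crop and `original` for the source crop whose colors should be preserved.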

Q: How do I generate the SynText dataset?

A: Please refer to the original srnet repository. We relied heavily on their code and changed only the parameters.

Q: Why is the result broken?

A: Check your diffusers version. I tried to re-implement my code with the newest version of diffusers and found that some of it needs to be adjusted. My code was written against an earlier diffusers release, and I have overridden some of its functions myself. If you have any questions, please send me an email or open an issue.
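
A quick way to check the installed version is through the standard library's `importlib.metadata`. The snippet below is a small sketch: the helper names `release_tuple` and `check_diffusers` are made up for this example, and the expected version `0.3.0` is taken from the Requirements section above.

```python
from importlib import metadata

EXPECTED_DIFFUSERS = "0.3.0"  # version named in the Requirements section


def release_tuple(version):
    """Turn '0.3.0' or '0.3.0+local' into (0, 3, 0); non-numeric suffixes are ignored."""
    core = version.split("+")[0]
    numbers = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        numbers.append(int(digits))
    return tuple(numbers)


def check_diffusers(expected=EXPECTED_DIFFUSERS):
    """Return a short message describing the installed diffusers version."""
    try:
        installed = metadata.version("diffusers")
    except metadata.PackageNotFoundError:
        return "diffusers is not installed"
    if release_tuple(installed) == release_tuple(expected):
        return f"diffusers {installed} matches the expected release {expected}"
    return f"diffusers {installed} found; this code was written against {expected}"


print(check_diffusers())
```

If the versions differ, installing the version named in the Requirements section into a fresh virtual environment is likely the simplest fix.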

Q: Where can I send questions or feedback?

A: I am open to any discussion and input. Please email me at janojoshua_at_gmail_dot_com.

