InPainTor🎨: Context-Aware Segmentation and Inpainting in Real-Time

License: GPL v3 Python 3.8+ Conda


InPainTor🎨 is a deep learning model for context-aware segmentation and inpainting in real time. It recognizes objects of interest and inpaints only the selected object classes, preserving the surrounding context.


🚀 Features

  • Real-time object recognition and inpainting
  • Selective removal and filling of missing or unwanted objects
  • Context preservation during inpainting
  • Two-stage training process: segmentation and inpainting
  • Support for COCO and RORD datasets

🚧 WIP (Work In Progress)

This project is currently under development. Use with caution and expect changes.

🛠️ Installation

  1. Clone the repository:

    git clone
    cd InPainTor
  2. Create and activate the Conda environment:

    conda env create -f environment.yml
    conda activate inpaintor

🖥️ Usage


To train the InPainTor model:

python src/ --coco_data_dir "path/to/COCO" --rord_data_dir "path/to/RORD" --seg_epochs <num_epochs> --inpaint_epochs <num_epochs>

Training arguments:

  • --coco_data_dir: Path to the COCO 2017 dataset directory
  • --rord_data_dir: Path to the RORD dataset directory
  • --seg_epochs: Number of epochs for segmentation training (default: 10)
  • --inpaint_epochs: Number of epochs for inpainting training (default: 10)
  • --batch_size: Batch size for training (default: 2)
  • --learning_rate: Learning rate for the optimizer (default: 0.1)
  • --image_size: Size of the input images, assumed to be square (default: 512)
  • --mask_size: Size of the masks, assumed to be square (default: 256)
  • --model_name: Name of the model (default: 'InPainTor')
  • --log_interval: Log interval for training (default: 1000)
  • --resume_checkpoint: Path to the checkpoint to resume training from (default: None)
  • --selected_classes: List of class IDs for inpainting (default: [1, 72, 73, 77])
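
As a rough illustration of how the arguments above fit together, here is a minimal `argparse` sketch that reproduces the documented names and defaults. It is not the project's actual parser: the function name `build_train_parser`, the choice to make the two dataset paths required, and passing `--selected_classes` as space-separated integers are all assumptions made for this example.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented training arguments."""
    parser = argparse.ArgumentParser(description="InPainTor training (illustrative sketch)")
    # Assumption: both dataset paths are mandatory.
    parser.add_argument("--coco_data_dir", type=str, required=True,
                        help="Path to the COCO 2017 dataset directory")
    parser.add_argument("--rord_data_dir", type=str, required=True,
                        help="Path to the RORD dataset directory")
    # Defaults below are the ones listed in the argument table.
    parser.add_argument("--seg_epochs", type=int, default=10)
    parser.add_argument("--inpaint_epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=2)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--image_size", type=int, default=512)
    parser.add_argument("--mask_size", type=int, default=256)
    parser.add_argument("--model_name", type=str, default="InPainTor")
    parser.add_argument("--log_interval", type=int, default=1000)
    parser.add_argument("--resume_checkpoint", type=str, default=None)
    # Assumption: class IDs are given as space-separated integers.
    parser.add_argument("--selected_classes", type=int, nargs="+",
                        default=[1, 72, 73, 77])
    return parser


# Example: override only the classes to inpaint, keep every other default.
args = build_train_parser().parse_args([
    "--coco_data_dir", "path/to/COCO",
    "--rord_data_dir", "path/to/RORD",
    "--selected_classes", "1", "77",
])
print(args.seg_epochs, args.selected_classes)
```

In the standard COCO category numbering, the default IDs 1, 72, 73 and 77 correspond to person, tv, laptop and cell phone.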


To perform inference using the trained InPainTor model:

python src/ --model_path "path/to/model.pth" --data_dir "path/to/data" --image_size 512 --mask_size 256 --batch_size <num_examples_per_batch> --output_dir "path/to/outputs"

Inference arguments:

  • --model_path: Path to the trained model checkpoint
  • --data_dir: Path to the directory containing images for inference
  • --image_size: Size of the input images, assumed to be square (default: 512)
  • --mask_size: Size of the masks, assumed to be square (default: 256)
  • --batch_size: Batch size for inference (default: 1)
  • --output_dir: Path to the directory to save the inpainted images

📁 Project Structure

Repository structure:

├── assets/                   📂: Repository assets (images, logos, etc.)
├── checkpoints/              💾: Model checkpoints
├── logs/                     📃: Log files
├── notebooks/                📓: Jupyter notebooks
├── outputs/                  📺: Output files generated during inference, training and debugging
├── src/                      📜: Source code files
│   ├──           📊: Initialization file
│   ├──  📑: Data augmentation operations
│   ├──            📊: Dataset loading and preprocessing
│   ├──        📊: Model debugging
│   ├──          📊: Inference script
│   ├──             📊: Model layers
│   ├──             📊: Loss functions
│   ├──              📑: InpainTor model implementation
│   ├──              📊: Training script
│   └──     📊: Visualization functions
├── .gitignore                🚫: Files to ignore in Git
├── environment.yml           🎛️: Conda environment configuration
└──                 📖: Project README file

🧠 Model Architecture

The InPainTor model consists of three main components:

  1. SharedEncoder: Encodes input images into a series of feature maps.
  2. SegmentorDecoder: Decodes encoded features into segmentation masks.
  3. GenerativeDecoder: Uses segmentation information to generate inpainted images.
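
To make the link between segmentation and inpainting concrete, the following sketch shows one plausible way a per-pixel class map could be turned into a binary mask for the selected classes and then used to blank out those pixels before generation. This is an assumption about the masking step, not the model's actual code. The helper names `selected_class_mask` and `mask_out` are hypothetical, and plain nested lists stand in for tensors.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


def selected_class_mask(class_map: List[List[int]],
                        selected_classes: Sequence[int]) -> List[List[int]]:
    """Return 1 where the predicted class is selected for inpainting, else 0."""
    selected = set(selected_classes)
    return [[1 if cls in selected else 0 for cls in row] for row in class_map]


def mask_out(image: List[List[Pixel]], mask: List[List[int]],
             fill: Pixel = (0, 0, 0)) -> List[List[Pixel]]:
    """Replace masked pixels with `fill`; unmasked context pixels stay untouched."""
    return [
        [fill if m else px for px, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2x2 example: classes 1 and 72 are selected, classes 0 and 3 are context.
class_map = [[0, 1], [72, 3]]
image = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
mask = selected_class_mask(class_map, [1, 72, 73, 77])
masked_image = mask_out(image, mask)
```

Under this reading, the generator only has to fill the blanked regions, which is how the surrounding context is preserved.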

Overview of InPainTor Model Architecture


Model Components in Detail


Model Concept


Model Training Process

  1. Train the SharedEncoder and SegmentorDecoder for accurate segmentation.
  2. Freeze the SharedEncoder and SegmentorDecoder, then train the GenerativeDecoder.
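
The two stages can be summarised as a small, framework-agnostic schedule. The sketch below only records which components receive updates in each epoch. The function names and stage labels are hypothetical, and it assumes the GenerativeDecoder is not updated during the segmentation stage.

```python
from typing import Dict, List, Tuple

COMPONENTS = ("SharedEncoder", "SegmentorDecoder", "GenerativeDecoder")


def trainable_components(stage: str) -> Dict[str, bool]:
    """Map each component to True if it is updated during `stage`."""
    if stage == "segmentation":
        trainable = {"SharedEncoder", "SegmentorDecoder"}
    elif stage == "inpainting":
        # Encoder and segmentor are frozen; only the generator learns.
        trainable = {"GenerativeDecoder"}
    else:
        raise ValueError(f"Unknown stage: {stage!r}")
    return {name: name in trainable for name in COMPONENTS}


def training_schedule(seg_epochs: int, inpaint_epochs: int) -> List[Tuple[int, str]]:
    """List (epoch number, stage) pairs: all segmentation epochs come first."""
    stages = ["segmentation"] * seg_epochs + ["inpainting"] * inpaint_epochs
    return list(enumerate(stages, start=1))


for epoch, stage in training_schedule(seg_epochs=2, inpaint_epochs=1):
    updated = [name for name, on in trainable_components(stage).items() if on]
    print(epoch, stage, updated)
```

In a PyTorch code base, freezing a component typically means setting `requires_grad = False` on its parameters so the optimizer leaves them unchanged.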


Example of Loss During Training Stages


📊 Dataset Requirements

RORD Inpainting Dataset Structure

The RORD dataset should be organized as follows:

├── train/
│   ├── img/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── gt/
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── val/
    ├── img/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── gt/
        ├── image1.jpg
        ├── image2.jpg
        └── ...

COCO Segmentation Dataset Structure

The COCO dataset (2017 version with 91 classes) should be organized as follows:

├── train/
│   ├── img/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── gt/
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── val/
    ├── img/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── gt/
        ├── image1.jpg
        ├── image2.jpg
        └── ...
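
Both datasets share the same `split/img` and `split/gt` layout, so a quick sanity check before training can catch missing files. The sketch below pairs each input image with its ground-truth file. It assumes the two files share a filename, as the trees above suggest, and that all files use the `.jpg` extension. `list_pairs` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List, Tuple, Union


def list_pairs(root: Union[str, Path], split: str) -> List[Tuple[Path, Path]]:
    """Return (image, ground truth) path pairs for one split ('train' or 'val')."""
    img_dir = Path(root) / split / "img"
    gt_dir = Path(root) / split / "gt"
    pairs = []
    for img_path in sorted(img_dir.glob("*.jpg")):
        # Assumption: the ground truth has the same filename as the input image.
        gt_path = gt_dir / img_path.name
        if not gt_path.is_file():
            raise FileNotFoundError(f"Missing ground truth for {img_path.name}: {gt_path}")
        pairs.append((img_path, gt_path))
    return pairs
```

For example, `list_pairs("path/to/RORD", "train")` would return one pair per training image, or raise as soon as an image has no matching file under `gt/`.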

For more information on COCO dataset classes, refer to this link.

🔮 Current Limitations and Future Work

  1. Segmentation Performance:

    • The current segmentation model works relatively well for small datasets with limited variety.
    • It struggles with larger, more diverse datasets like COCO 2017.
  2. Generator Performance:

    • The current generator architecture may be too simplistic, particularly in the layers following the masking process.
    • The frozen encoder in the generator could be limiting the model's learning capacity.
  3. Hardware Constraints:

    • Memory limitations restrict model size and batch processing capabilities.
    • These limits in turn constrain the choice of architectures and training strategies.
  4. No Data Augmentation:

    • Data augmentation is not yet integrated into the training pipeline, although the implementation is roughly 90% complete.

Future Work

  1. Improve Segmentation Section:

    • Investigate and implement more sophisticated segmentation architectures such as ENet or BiSeNet.
    • Check whether it's possible to adapt pre-trained models to this architecture.
  2. Enhance Generator Architecture:

    • Increase the number of parameters and layers after the masking process in the generator.
    • Experiment with more sophisticated generator designs, potentially allowing (limited) parts of the encoder to be trainable.
  3. Experiment with Cost Functions:

    • Test and evaluate alternative loss functions.
    • Consider multi-objective loss functions that balance different aspects of the inpainting task.
  4. Incorporate Data Augmentation:

    • Integrate the already implemented data augmentation techniques into the training pipeline.
  5. Evaluation Metrics:

    • Implement evaluation metrics to better assess the quality of inpainted images.
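
As one possible starting point for the evaluation item above, peak signal-to-noise ratio (PSNR) is a common pixel-level metric for comparing a reconstructed image with its ground truth. The sketch below is a dependency-free illustration over flattened pixel values. The project has not committed to this metric, and the function name and signature are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], inpainted: Sequence[float],
         max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened images."""
    if len(reference) != len(inpainted) or len(reference) == 0:
        raise ValueError("Inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((r - p) ** 2 for r, p in zip(reference, inpainted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate a closer match to the ground truth. PSNR does not capture perceptual quality, so it would likely need to be paired with other metrics.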

🤝 Contributing

Contributions to the InPainTor project are welcome! Please follow these steps to contribute:

  1. Fork the repository
  2. Create a new branch for your feature or bug fix
  3. Commit your changes
  4. Push to your fork and submit a pull request

We appreciate your contributions to improve InPainTor!

🙏 Acknowledgements

This work is funded by FCT - Fundação para a Ciência e a Tecnologia, I.P., through project with reference 2022.09235.PTDC.

📄 License

This project is licensed under GPLv3.

For more information or support, please open an issue in the GitHub repository.