InPainTor🎨 is a deep learning model for real-time, context-aware segmentation and inpainting. It recognizes objects of interest and inpaints specific classes while preserving the surrounding context.
- Real-time object recognition and inpainting
- Selective removal and filling of missing or unwanted objects
- Context preservation during inpainting
- Two-stage training process: segmentation and inpainting
- Support for COCO and RORD datasets
This project is currently under development. Use with caution and expect changes.
- Clone the repository:

  ```shell
  git clone https://github.com/your-username/InPainTor.git
  cd InPainTor
  ```

- Create and activate the Conda environment:

  ```shell
  conda env create -f environment.yml
  conda activate inpaintor
  ```
To train the InPainTor model:
```shell
python src/train.py --coco_data_dir "path/to/COCO" --rord_data_dir "path/to/RORD" --seg_epochs <num_epochs> --inpaint_epochs <num_epochs>
```
Training arguments:

- `--coco_data_dir`: Path to the COCO 2017 dataset directory
- `--rord_data_dir`: Path to the RORD dataset directory
- `--seg_epochs`: Number of epochs for segmentation training (default: 10)
- `--inpaint_epochs`: Number of epochs for inpainting training (default: 10)
- `--batch_size`: Batch size for training (default: 2)
- `--learning_rate`: Learning rate for the optimizer (default: 0.1)
- `--image_size`: Size of the input images, assumed to be square (default: 512)
- `--mask_size`: Size of the masks, assumed to be square (default: 256)
- `--model_name`: Name of the model (default: 'InPainTor')
- `--log_interval`: Log interval for training (default: 1000)
- `--resume_checkpoint`: Path to the checkpoint to resume training from (default: None)
- `--selected_classes`: List of class IDs for inpainting (default: [1, 72, 73, 77])
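As a rough illustration of how these flags fit together, the sketch below builds an `argparse` parser mirroring the documented options and defaults. It is a hypothetical stand-in, not the actual parser in `src/train.py`, which may differ.

```python
import argparse


def build_train_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags and defaults.
    p = argparse.ArgumentParser(description="Train InPainTor (sketch)")
    p.add_argument("--coco_data_dir", type=str, required=True)
    p.add_argument("--rord_data_dir", type=str, required=True)
    p.add_argument("--seg_epochs", type=int, default=10)
    p.add_argument("--inpaint_epochs", type=int, default=10)
    p.add_argument("--batch_size", type=int, default=2)
    p.add_argument("--learning_rate", type=float, default=0.1)
    p.add_argument("--image_size", type=int, default=512)
    p.add_argument("--mask_size", type=int, default=256)
    p.add_argument("--model_name", type=str, default="InPainTor")
    p.add_argument("--log_interval", type=int, default=1000)
    p.add_argument("--resume_checkpoint", type=str, default=None)
    # nargs="+" lets the flag accept a space-separated list of class IDs.
    p.add_argument("--selected_classes", type=int, nargs="+",
                   default=[1, 72, 73, 77])
    return p


# Example: override only the segmentation epoch count.
args = build_train_parser().parse_args(
    ["--coco_data_dir", "data/coco", "--rord_data_dir", "data/rord",
     "--seg_epochs", "5"]
)
print(args.seg_epochs, args.batch_size, args.selected_classes)
```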
To perform inference using the trained InPainTor model:
```shell
python src/inference.py --model_path "path/to/model.pth" --data_dir "path/to/data" --image_size 512 --mask_size 256 --batch_size <num_examples_per_batch> --output_dir "path/to/outputs"
```
Inference arguments:

- `--model_path`: Path to the trained model checkpoint
- `--data_dir`: Path to the directory containing images for inference
- `--image_size`: Size of the input images, assumed to be square (default: 512)
- `--mask_size`: Size of the masks, assumed to be square (default: 256)
- `--batch_size`: Batch size for inference (default: 1)
- `--output_dir`: Path to the directory to save the inpainted images
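Inference consumes a directory of images in batches. The helper below is a minimal, hypothetical sketch of how that batching might look; the real `src/inference.py` may organize this differently.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def batched_image_paths(data_dir: str, batch_size: int = 1):
    """Yield lists of image paths of length <= batch_size.

    Paths are sorted so batches are deterministic across runs.
    """
    paths = sorted(p for p in Path(data_dir).iterdir()
                   if p.suffix.lower() in IMAGE_EXTS)
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]
```

Each yielded batch can then be loaded, resized to `--image_size`, and passed through the model.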
Repository structure:

```
InpainTor/
├── assets/                  📂: Repository assets (images, logos, etc.)
├── checkpoints/             💾: Model checkpoints
├── logs/                    📃: Log files
├── notebooks/               📓: Jupyter notebooks
├── outputs/                 📺: Output files generated during inference, training and debugging
├── src/                     📜: Source code files
│   ├── __init__.py          📊: Initialization file
│   ├── data_augmentation.py 📑: Data augmentation operations
│   ├── dataset.py           📊: Dataset loading and preprocessing
│   ├── debug_model.py       📊: Model debugging
│   ├── inference.py         📊: Inference script
│   ├── layers.py            📊: Model layers
│   ├── losses.py            📊: Loss functions
│   ├── model.py             📑: InpainTor model implementation
│   ├── train.py             📊: Training script
│   └── visualizations.py    📊: Visualization functions
├── .gitignore               🚫: Files to ignore in Git
├── environment.yml          🎛️: Conda environment configuration
└── README.md                📖: Project README file
```
The InPainTor model consists of three main components:
- SharedEncoder: Encodes input images into a series of feature maps.
- SegmentorDecoder: Decodes encoded features into segmentation masks.
- GenerativeDecoder: Uses segmentation information to generate inpainted images.
Training proceeds in two stages:

- Stage 1: Train the SharedEncoder and SegmentorDecoder for accurate segmentation.
- Stage 2: Freeze the SharedEncoder and SegmentorDecoder, then train the GenerativeDecoder.
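The two-stage schedule can be sketched without any deep learning framework: each component below carries a `trainable` flag standing in for PyTorch's `requires_grad`. This is an illustration of the schedule only, not the project's actual training code.

```python
class Component:
    """Stand-in for a model sub-network; `trainable` mimics requires_grad."""
    def __init__(self, name: str):
        self.name = name
        self.trainable = True


def set_trainable(components, flag: bool) -> None:
    for c in components:
        c.trainable = flag


encoder = Component("SharedEncoder")
seg_decoder = Component("SegmentorDecoder")
gen_decoder = Component("GenerativeDecoder")

# Stage 1: segmentation — train the encoder and segmentor decoder.
set_trainable([encoder, seg_decoder], True)
set_trainable([gen_decoder], False)

# Stage 2: inpainting — freeze stage-1 weights, train only the generator.
set_trainable([encoder, seg_decoder], False)
set_trainable([gen_decoder], True)
```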
RORD Inpainting Dataset Structure
The RORD dataset should be organized as follows:
```
root_dir/
├── train/
│   ├── img/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── gt/
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── val/
    ├── img/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── gt/
        ├── image1.jpg
        ├── image2.jpg
        └── ...
```
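Loading this layout amounts to pairing files with the same name under `img/` and `gt/`. Below is a small, hypothetical helper sketching that pairing; the project's `dataset.py` may implement it differently.

```python
from pathlib import Path


def paired_samples(root_dir: str, split: str = "train"):
    """Return (image, ground-truth) path pairs for one split.

    Assumes the RORD layout shown above; images without a matching
    ground-truth file are skipped.
    """
    img_dir = Path(root_dir) / split / "img"
    gt_dir = Path(root_dir) / split / "gt"
    pairs = []
    for img_path in sorted(img_dir.glob("*.jpg")):
        gt_path = gt_dir / img_path.name
        if gt_path.exists():
            pairs.append((img_path, gt_path))
    return pairs
```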
COCO Segmentation Dataset Structure
The COCO dataset (2017 version with 91 classes) should be organized as follows:
```
root_dir/
├── train/
│   ├── img/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── gt/
│       ├── image1.jpg
│       ├── image2.jpg
│       └── ...
└── val/
    ├── img/
    │   ├── image1.jpg
    │   ├── image2.jpg
    │   └── ...
    └── gt/
        ├── image1.jpg
        ├── image2.jpg
        └── ...
```
For more information on COCO dataset classes, refer to the official COCO dataset documentation.
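The default `--selected_classes` IDs follow the COCO 2017 labelling (1 = person, 72 = tv, 73 = laptop, 77 = cell phone). Turning those IDs into a binary inpainting mask from a per-pixel class-ID map can be sketched as follows; this is a pure-Python illustration, not the project's implementation.

```python
def class_mask(label_map, selected_classes=(1, 72, 73, 77)):
    """Binary mask: 1 where the pixel's class ID is in selected_classes.

    `label_map` is a 2-D list of COCO class IDs; the default IDs match
    the --selected_classes training flag.
    """
    sel = set(selected_classes)
    return [[1 if v in sel else 0 for v in row] for row in label_map]


# A 2x2 label map: only the "person" (1) and "tv" (72) pixels are selected.
print(class_mask([[0, 1], [72, 5]]))  # → [[0, 1], [1, 0]]
```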
Limitations

- Segmentation Performance:
    - The current segmentation model works reasonably well on small datasets with limited variety.
    - It struggles with larger, more diverse datasets such as COCO 2017.
- Generator Performance:
    - The current generator architecture may be too simplistic, particularly in the layers following the masking process.
    - The frozen encoder in the generator could be limiting the model's learning capacity.
- Hardware Constraints:
    - Memory limitations restrict model size and batch processing capabilities.
    - This impacts the choice of architectures and training strategies.
- No Data Augmentation:
    - Data augmentation is not yet integrated into the training pipeline, although the implementation is nearly complete.
Future Work

- Improve Segmentation:
    - Investigate and implement more sophisticated segmentation architectures such as ENet or BiSeNet.
    - Check whether pre-trained models can be adapted to this architecture.
- Enhance Generator Architecture:
    - Increase the number of parameters and layers after the masking process in the generator.
    - Experiment with more sophisticated generator designs, potentially allowing (limited) parts of the encoder to be trainable.
- Experiment with Cost Functions:
    - Test and evaluate alternative loss functions.
    - Consider multi-objective losses that balance different aspects of the inpainting task.
- Incorporate Data Augmentation:
    - Integrate the already implemented data augmentation techniques into the training pipeline.
- Evaluation Metrics:
    - Implement evaluation metrics to better assess the quality of inpainted images.
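For the evaluation-metrics item, PSNR is one common starting point for judging inpainting quality. The sketch below operates on flat lists of pixel values for simplicity; a real evaluation would work on image arrays.

```python
import math


def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two equally sized images.

    Images are flat lists of pixel values; identical images yield inf.
    """
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")
    return 10 * math.log10(max_val ** 2 / mse)
```

Higher values indicate a closer match between the inpainted output and the reference image.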
Contributions to the InPainTor project are welcome! Please follow these steps to contribute:
- Fork the repository
- Create a new branch for your feature or bug fix
- Commit your changes
- Push to your fork and submit a pull request
We appreciate your contributions to improve InPainTor!
This work is funded by FCT - Fundação para a Ciência e a Tecnologia, I.P., through project with reference 2022.09235.PTDC.
This project is licensed under GPLv3.
For more information or support, please open an issue in the GitHub repository.