
Multi-Modal Image Generation using Grounding DINO, Segment Anything Model, and Stable Diffusion

Results

Overview

This project integrates multiple image-processing functionalities: zero-shot object detection with Grounding DINO, semantic segmentation of the detected regions with Meta AI's Segment Anything Model (SAM), and finally inpainting of the masked regions with Stability AI's Stable Diffusion v2. Inpainting quality is assessed with Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). The goal of the project was to integrate these techniques and deploy the model on Hugging Face with a Gradio interface, letting users detect regions, segment them, and inpaint them in the provided images.
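
The end-to-end flow can be summarized in code. The snippet below is a minimal sketch, not the deployed implementation: it assumes the official GroundingDINO and segment-anything packages, placeholder paths for the DINO and SAM checkpoints, and the stabilityai/stable-diffusion-2-inpainting weights from Hugging Face.

```python
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import sam_model_registry, SamPredictor

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Zero-shot detection with Grounding DINO (config/checkpoint paths are placeholders).
dino = load_model("groundingdino_config.py", "groundingdino_weights.pth", device=DEVICE)
image_source, image_tensor = load_image("input.jpg")  # image_source: HxWx3 uint8 RGB
boxes, logits, phrases = predict(
    model=dino, image=image_tensor, caption="dog",
    box_threshold=0.35, text_threshold=0.25, device=DEVICE,
)

# Grounding DINO returns normalized (cx, cy, w, h); SAM expects pixel (x1, y1, x2, y2).
h, w, _ = image_source.shape
cx, cy, bw, bh = boxes[0].numpy() * np.array([w, h, w, h])  # assumes at least one detection
box_xyxy = np.array([cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2])

# 2) Segmentation with SAM, prompted by the detected box.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h.pth").to(DEVICE)
predictor = SamPredictor(sam)
predictor.set_image(image_source)
masks, _, _ = predictor.predict(box=box_xyxy, multimask_output=False)

# 3) Inpainting with Stable Diffusion v2; image and mask are resized to 512x512.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32,
).to(DEVICE)
init_image = Image.fromarray(image_source).resize((512, 512))
mask_image = Image.fromarray(masks[0].astype(np.uint8) * 255).resize((512, 512))
result = pipe(prompt="a golden retriever", image=init_image, mask_image=mask_image).images[0]
result.save("output.png")
```

The deployed app replaces the hard-coded prompts and the box-only SAM prompt with user input and clicked points, but the three stages are wired together in this order.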

Key Features

  • Object Detection: Utilizes Grounding DINO for zero-shot object detection within images.
  • Object Segmentation: Utilizes SAM to segment the object inside the detected bounding box.
  • Image Inpainting: Implements Stable Diffusion v2 for inpainting the masked regions in images.
  • Quality Assessment: Computes PSNR and SSIM metrics to assess the quality of the inpainted images (see the sketch below).
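
Both metrics are standard full-reference comparisons available in scikit-image. A minimal sketch (the function and argument names are from skimage.metrics; the helper itself is hypothetical):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def assess_quality(original: np.ndarray, inpainted: np.ndarray):
    """Compare two same-sized RGB uint8 images; higher values indicate a closer match."""
    psnr = peak_signal_noise_ratio(original, inpainted)
    # channel_axis=-1 treats the last axis as color channels (scikit-image >= 0.19).
    ssim = structural_similarity(original, inpainted, channel_axis=-1)
    return psnr, ssim
```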

Usage

  1. Input Image: Upload an image, or drag and drop it into the first window for processing.
  2. Detection Prompt: Enter a prompt in the first text box describing the object/accessory you want to detect.
  3. Mask Generation: Click on the region of interest inside the detected box in the second window to generate a mask with SAM. (Clicking on multiple areas of the ROI is recommended for a finer mask.)
  4. Inpainting Prompt: Once you are satisfied with the mask, enter a prompt in the second text box describing the object/accessory to inpaint into the region.
  5. Submit: Click Submit to start processing and view the output. (Submitting the same prompt multiple times produces different variations; see the interface sketch below.)
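
The layout described above maps onto a small Gradio app. The sketch below is illustrative only: process_fn is a hypothetical placeholder for the detection/segmentation/inpainting pipeline, and the click-to-refine mask interaction from step 3 is omitted for brevity.

```python
import gradio as gr

def process_fn(image, detect_prompt, inpaint_prompt):
    """Hypothetical handler: run detection, segmentation, and inpainting in sequence."""
    ...  # wire in the pipeline sketched in the Overview section
    return image  # placeholder return value

with gr.Blocks() as demo:
    with gr.Row():
        input_image = gr.Image(label="Input Image")             # step 1: upload / drag and drop
        mask_view = gr.Image(label="Detected Region and Mask")  # steps 2-3
        output_image = gr.Image(label="Inpainted Output")       # step 5
    detect_prompt = gr.Textbox(label="Detection Prompt")        # step 2
    inpaint_prompt = gr.Textbox(label="Inpainting Prompt")      # step 4
    submit = gr.Button("Submit")
    submit.click(process_fn, [input_image, detect_prompt, inpaint_prompt], output_image)

demo.launch()
```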

Testing Results

Gradio Demo

Click to watch the video

NOTE: This application works best on images with a resolution of 512x512. For higher-resolution images, the inpainting quality drops substantially.
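
If your input is larger, one simple mitigation is to downscale it before processing; a sketch with Pillow:

```python
from PIL import Image

image = Image.open("input.jpg").convert("RGB")
image = image.resize((512, 512))  # match the resolution the pipeline handles best
```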

Dependencies

  • gradio
  • numpy
  • torch
  • diffusers
  • PIL (Pillow)
  • cv2 (opencv-python)
  • skimage (scikit-image)
  • huggingface_hub
  • GroundingDINO
  • torchvision
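
Note that some PyPI package names differ from the import names above, and GroundingDINO and SAM are typically installed from their GitHub repositories. A possible install, as a sketch:

```bash
pip install gradio numpy torch torchvision diffusers Pillow opencv-python scikit-image huggingface_hub
pip install git+https://github.com/IDEA-Research/GroundingDINO.git
# SAM is not listed above but is required by the segmentation stage.
pip install git+https://github.com/facebookresearch/segment-anything.git
```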
