We implement a simple module that detects objects in an image from text prompts. The module uses the pretrained OWLv2 model provided by Hugging Face.
- Install Conda, if not already installed.
- Clone the repository:
git clone https://github.com/byrkbrk/prompting-for-object-detection.git
- Change the directory:
cd prompting-for-object-detection
- For macOS, run:
conda env create -f prompting-for-od_macos.yaml
- For Linux or Windows, run:
conda env create -f prompting-for-od_linux.yaml
- Activate the environment:
conda activate prompting-for-od
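The choice between the two environment files can also be scripted. The following is a minimal sketch, assuming a system Python is available before the Conda environment exists; the helper name `env_file_for` is hypothetical and not part of the repository.

```python
import platform


def env_file_for(system: str) -> str:
    """Return the Conda environment file for an OS name as reported by platform.system()."""
    # platform.system() reports "Darwin" on macOS; Linux and Windows share one file.
    if system == "Darwin":
        return "prompting-for-od_macos.yaml"
    return "prompting-for-od_linux.yaml"


if __name__ == "__main__":
    # Print the command to run on the current machine.
    print(f"conda env create -f {env_file_for(platform.system())}")
```

Running the script prints the `conda env create` command that matches the current operating system.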
To see how to use the module, run:
python3 detect.py -h
Output:
Detects bounding boxes for given image and text prompts
positional arguments:
image_name Name of the image file that be processed. Note image
file must be in 'segmentation-images' directory
text_prompts Text prompts for the model
options:
-h, --help show this help message and exit
--image_size IMAGE_SIZE [IMAGE_SIZE ...]
Size (height, width) to which the image be transformed
--device DEVICE Device that be used during inference
For example, run:
python3 detect.py dogs.jpg "jacket" "small nose" --image_size 1024 1024
The output image with bounding boxes (see below, on the right) will be saved into the detected-images folder.
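The command-line interface described by the help output can be approximated with `argparse`. This is a sketch under stated assumptions, not the actual contents of `detect.py`: the use of `nargs="+"` for `text_prompts`, the `int` type for `--image_size`, and the `None` defaults are guesses, since the help text does not show them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the arguments listed in the help output."""
    parser = argparse.ArgumentParser(
        description="Detects bounding boxes for given image and text prompts"
    )
    parser.add_argument("image_name", type=str, help="Name of the image file to process")
    # Assumption: one or more prompts are accepted, as in the two-prompt example command.
    parser.add_argument("text_prompts", nargs="+", type=str, help="Text prompts for the model")
    # Assumption: sizes are integers and default to None when the flag is omitted.
    parser.add_argument("--image_size", nargs="+", type=int, default=None,
                        help="Size (height, width) to which the image is transformed")
    # Assumption: no device is chosen here when the flag is omitted.
    parser.add_argument("--device", type=str, default=None,
                        help="Device used during inference")
    return parser


# Parse the same arguments as the example command.
args = build_parser().parse_args(
    ["dogs.jpg", "jacket", "small nose", "--image_size", "1024", "1024"]
)
print(args.image_name, args.text_prompts, args.image_size, args.device)
```

A quoted prompt such as "small nose" reaches the parser as a single argument, so each prompt may contain spaces.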
To run the Gradio app on your local computer, execute:
python3 app.py
Then, visit the URL http://127.0.0.1:7860 to open the interface shown below.
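If the page does not load, it can help to check whether anything is answering at that address. The following is a minimal sketch using only the standard library; the function name `app_is_up` is hypothetical, and the check only confirms that an HTTP server responds, not that it is this particular app.

```python
import http.client
import urllib.error
import urllib.request


def app_is_up(url: str = "http://127.0.0.1:7860", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `url` with status 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed response: treat as not reachable.
        return False


if __name__ == "__main__":
    print("reachable" if app_is_up() else "not reachable")
```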