Image captioning using deep learning models in Keras. The models were trained on the Flickr8k dataset using Google Colab.
- Prepare photo and text data for training a deep learning model.
- Design and train a deep learning model.
- Evaluate the model.
- Use the model to generate captions for new pictures.
- Data collection
- Understanding the data
- Data Cleaning
- Loading the training set
- Data Preprocessing — Images
- Data Preprocessing — Captions
- Data Preparation using Generator Function
- Word Embeddings
- Model Architecture
- Inference
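The "Data Cleaning" and "Data Preprocessing — Captions" steps above can be sketched as follows. This is a minimal sketch assuming the usual recipe from the Brownlee tutorial (lowercasing, punctuation removal, dropping single-character and non-alphabetic tokens, and wrapping each caption in `startseq`/`endseq` markers); the exact filters used in this project are an assumption:

```python
import string

def clean_caption(caption):
    """Normalise one raw caption: lowercase, strip punctuation,
    drop single-character and non-alphabetic tokens, then wrap the
    result in start/end tokens that delimit generated sequences later."""
    table = str.maketrans("", "", string.punctuation)
    words = caption.lower().translate(table).split()
    words = [w for w in words if len(w) > 1 and w.isalpha()]
    return "startseq " + " ".join(words) + " endseq"

# Example raw caption in the Flickr8k style
cleaned = clean_caption("A child in a pink dress is climbing up a set of stairs .")
# -> "startseq child in pink dress is climbing up set of stairs endseq"
```

The `startseq`/`endseq` tokens let the decoder know where a caption begins and when to stop generating.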
After requesting the dataset from the author's website, I received these two files:
- Flickr8k_Dataset: Contains 8092 photographs in JPEG format.
- Flickr8k_text: Contains a number of files containing different sources of descriptions for the photographs.
The dataset has a pre-defined training dataset (6,000 images), development dataset (1,000 images), and test dataset (1,000 images).
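The predefined splits ship as plain-text files in Flickr8k_text (e.g. `Flickr_8k.trainImages.txt`), with one photograph filename per line. A small sketch of loading one split into a set of image identifiers; the sample file written here is a stand-in since the real split file is not bundled with this README:

```python
from pathlib import Path

def load_image_ids(split_file):
    """Return the set of image identifiers (filenames minus the .jpg
    extension) listed in a Flickr8k split file, one filename per line."""
    text = Path(split_file).read_text()
    return {line.split(".")[0] for line in text.strip().split("\n")}

# Tiny stand-in for Flickr_8k.trainImages.txt, for illustration only
sample = Path("sample_split.txt")
sample.write_text("1000268201_693b08cb0e.jpg\n1001773457_577c3a7d70.jpg\n")
train_ids = load_image_ids(sample)
```

The same function covers the development and test splits by pointing it at the corresponding file.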
Built a basic web app using Flask. It takes an image as input and generates a caption for it.
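The Flask app can be sketched roughly as below. The `generate_caption` hook is a placeholder for loading the trained Keras model and tokenizer and running decoding, not the project's actual implementation:

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical captioning hook: in the real app this would load the
# trained Keras model and tokenizer and decode a caption for the image.
def generate_caption(image_bytes):
    return "startseq placeholder caption endseq"

PAGE = """
<form method="post" enctype="multipart/form-data">
  <input type="file" name="image">
  <input type="submit" value="Caption">
</form>
<p>{{ caption }}</p>
"""

@app.route("/", methods=["GET", "POST"])
def index():
    caption = ""
    # On POST, read the uploaded image and run it through the model hook
    if request.method == "POST" and "image" in request.files:
        caption = generate_caption(request.files["image"].read())
    return render_template_string(PAGE, caption=caption)

if __name__ == "__main__":
    app.run(debug=True)
```

Uploading an image through the form posts it back to `/`, where the caption is rendered under the form.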
- As the results show, the captions are not very accurate because the model was trained for only 5 epochs due to limited GPU time on Google Colab.
- Resuming training from saved checkpoints should make a difference; this will be updated.
- Other techniques remain to be tried, such as different pretrained models for feature extraction and word2vec for token embeddings.
- Implemented by following the tutorial by Jason Brownlee (Machine Learning Mastery).
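The "Inference" step in the outline above is typically a greedy decoding loop: feed the model the image features plus the words generated so far, take the most likely next word, and repeat until `endseq` or a length limit. A minimal sketch where the predictor is a stub standing in for the real Keras model call and tokenizer lookup:

```python
def greedy_decode(predict_next, max_len=20):
    """Greedy caption decoding: repeatedly ask the model for the most
    likely next word given the words so far, stopping at 'endseq' or
    after max_len steps."""
    words = ["startseq"]
    for _ in range(max_len):
        word = predict_next(words)  # stand-in for model.predict + argmax
        if word == "endseq":
            break
        words.append(word)
    return " ".join(words[1:])  # drop the startseq marker

# Stub predictor that replays a fixed caption, for illustration only
script = iter(["dog", "runs", "endseq"])
caption = greedy_decode(lambda words: next(script))
# -> "dog runs"
```

In the real app, `predict_next` would encode the current word sequence with the tokenizer, call the trained model with the image features, and map the argmax index back to a word.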