Image Captioning Generator Keras
Dataset - Flickr 8k Dataset
Flicker8k_Dataset
Flickr8k_text
Flicker8k_Dataset - Contains 8092 images in JPEG format.
Flickr8k_text - Contains 5 descriptions (captions) for each image.
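As an illustration of how the five descriptions per image can be grouped, here is a minimal sketch. It assumes the caption file stores one caption per line in the form `<image name>#<index><TAB><caption>`. That layout, the function name `group_captions`, and the sample lines are assumptions made for this example and are not taken from this repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each image file name to the list of its captions (lower-cased)."""
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: "<image name>#<index>\t<caption>"
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#", 1)[0]
        captions[image_name].append(caption.strip().lower())
    return dict(captions)


# Hypothetical sample lines, only to show the expected shape of the input.
sample = [
    "img_001.jpg#0\tA dog runs on the grass .",
    "img_001.jpg#1\tA brown dog is running .",
    "img_002.jpg#0\tTwo children play in the sand .",
]
grouped = group_captions(sample)
```

With the real file, the same function could be called on an open file handle, since it only needs an iterable of lines.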
Built and trained a deep learning model for captioning real-world images.
- Used a pre-trained InceptionV3 network to extract features from each image.
- Used pre-trained fastText embeddings for the caption words, which were fed into a stacked bidirectional GRU layer.
- Combined the image features and the text features to predict the next word, repeating until the end of the caption, using greedy search during testing.
- Add attention
- Use beam search instead of greedy search
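The difference between the greedy search used at test time and beam search can be sketched as follows. This is not the repository's code. The `step` callable is a hypothetical stand-in for the trained model: it takes the partial caption and returns a probability for each candidate next word (in the real model it would also depend on the image features). The `<start>` and `<end>` tokens and the toy probability table are likewise assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

START, END = "<start>", "<end>"
StepFn = Callable[[List[str]], Dict[str, float]]


def greedy_decode(step: StepFn, max_len: int = 20) -> List[str]:
    """Pick the single most probable next word at every step."""
    caption = [START]
    for _ in range(max_len):
        probs = step(caption)
        word = max(probs, key=probs.get)
        caption.append(word)
        if word == END:
            break
    return caption


def beam_decode(step: StepFn, beam_width: int = 3, max_len: int = 20) -> List[str]:
    """Keep the `beam_width` best partial captions by summed log-probability."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [START])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, caption in beams:
            if caption[-1] == END:
                candidates.append((score, caption))  # finished, carry over
                continue
            for word, p in step(caption).items():
                if p > 0:
                    candidates.append((score + math.log(p), caption + [word]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(caption[-1] == END for _, caption in beams):
            break
    return beams[0][1]


# Toy stand-in for the model: the next word depends only on the previous word.
TOY = {
    START: {"a": 0.6, "two": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "two": {"dogs": 0.9, "cats": 0.1},
    "dog": {END: 1.0}, "cat": {END: 1.0},
    "dogs": {END: 1.0}, "cats": {END: 1.0},
}


def toy_step(caption: List[str]) -> Dict[str, float]:
    return TOY[caption[-1]]


greedy_caption = greedy_decode(toy_step)
beam_caption = beam_decode(toy_step, beam_width=2)
```

On this toy table, greedy search commits to "a" first and ends with "a dog" (probability 0.6 × 0.55 = 0.33), while beam search with width 2 keeps "two" alive and finds "two dogs" (0.4 × 0.9 = 0.36). The sketch applies no length normalization, so on real captions it would tend to favor shorter outputs.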