This is the repository for my final project in the Deep Learning School NLP course, spring 2021. Here I try to solve the image captioning problem.
The goal of image captioning is to generate a textual description of a given picture. In both architectures below, the image is first encoded by the model, and the caption is then generated token by token, conditioned on the image features and the previously decoded state.
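A minimal sketch of this encoder-decoder scheme, assuming a PyTorch setup; the class names and sizes here are illustrative assumptions, not the exact code from this repository:

```python
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """CNN encoder: turns an image into a single feature vector."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet50(pretrained=True)
        # drop the classification head, keep the convolutional backbone + pooling
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images):                       # (B, 3, H, W)
        features = self.backbone(images).flatten(1)  # (B, 2048)
        return self.fc(features)                     # (B, embed_size)


class Decoder(nn.Module):
    """LSTM decoder: generates the caption conditioned on the image
    features and the previously decoded tokens."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # the image features act as the "zeroth" input token of the sequence
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                       # (B, T+1, vocab_size)
```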
While creating this project I've relied on three articles:
- Overview of image captioning models
- Show, Attend and Tell
- Metrics for image captioning
In a few scripts you can find:
- the model, with and without an attention mechanism (a minimal attention sketch follows this list)
- the training pipeline
- metrics calculation (see the BLEU sketch below)
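For the attention variant, the decoder attends over spatial CNN features at every decoding step, in the spirit of Show, Attend and Tell. A minimal sketch of additive (Bahdanau-style) attention; the names and dimensions are assumptions, not this repository's exact implementation:

```python
import torch
import torch.nn as nn


class AdditiveAttention(nn.Module):
    """Scores each spatial image feature against the current decoder state
    and returns their weighted sum (the context vector)."""
    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feature_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (B, num_pixels, feature_dim), hidden: (B, hidden_dim)
        energy = torch.tanh(self.feat_proj(features)
                            + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(energy).squeeze(-1), dim=1)   # (B, num_pixels)
        context = (features * alpha.unsqueeze(-1)).sum(dim=1)          # (B, feature_dim)
        return context, alpha
```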
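Caption quality is typically reported with n-gram metrics such as BLEU; a minimal sketch of how it can be computed with nltk (the tokenization and data layout below are assumptions for illustration):

```python
from nltk.translate.bleu_score import corpus_bleu

# each hypothesis is one tokenized generated caption;
# each image may have several tokenized reference captions
references = [[["a", "dog", "runs", "on", "the", "grass"],
               ["a", "dog", "is", "running", "outside"]]]
hypotheses = [["a", "dog", "runs", "on", "grass"]]

bleu4 = corpus_bleu(references, hypotheses)  # BLEU-4 by default
print(f"BLEU-4: {bleu4:.3f}")
```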
All the parts of the models, together with their demonstration, can be found in image_captioning_project.ipynb.
All of the training reports with metrics can be found here: