The goal of this project is to build an image representation based on bags of visual words and to use spatial pyramid matching to classify scene categories.
The dataset used in this project is a subset of the SUN database. It contains 1600 images from various scene categories such as garden, library and ocean.
In this project, the visual words, i.e. the dictionary, were built from the training set images (section 2). Using this dictionary, each image is represented as a visual-word vector, and images are then compared in the visual-word vector space. Finally, a scene recognition system based on the bag-of-visual-words approach was built to classify a given image into one of 8 types of scenes.
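The three steps above (building the dictionary, representing an image as a visual-word vector, and comparing images in that space) can be illustrated with a minimal sketch. It is not the project's actual implementation: it assumes local descriptors have already been extracted as NumPy arrays, uses plain k-means for the dictionary, and uses histogram intersection as the similarity measure. The function names `build_dictionary`, `word_histogram` and `histogram_intersection` are hypothetical and chosen only for this example.

```python
import numpy as np


def build_dictionary(descriptors, k, n_iter=20, seed=0):
    """Cluster local descriptors with plain k-means (an assumption of this
    sketch); the k centroids play the role of the visual words."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct descriptors picked at random.
    idx = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[idx].astype(float)
    for _ in range(n_iter):
        # Distance from every descriptor to every centroid, shape (N, k).
        dists = np.linalg.norm(
            descriptors[:, None, :] - centers[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            # Leave a centroid unchanged if no descriptor was assigned to it.
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def word_histogram(descriptors, dictionary):
    """Represent one image as an L1-normalised histogram of visual words."""
    dists = np.linalg.norm(
        descriptors[:, None, :] - dictionary[None, :, :], axis=2
    )
    words = dists.argmin(axis=1)  # index of the nearest visual word
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / hist.sum()


def histogram_intersection(h1, h2):
    """Similarity of two normalised histograms; 1.0 means identical."""
    return float(np.minimum(h1, h2).sum())
```

With a dictionary of `k` words, every image maps to a vector of length `k`, so images with different numbers of local descriptors become directly comparable. A classifier could then, for example, assign a test image the label of its most similar training image.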
An overview of the project is shown in the figure below.