This project builds a classification system that identifies dog breeds from images using vision transformer models. The models used are ViT, Swin, BEiT, DeiT, and LeViT.
This project uses several transformer-based models known for their strong performance on image recognition tasks:
- ViT (Vision Transformer): A pioneering model that applies a standard transformer directly to a sequence of fixed-size image patches.
- Swin Transformer: A hierarchical vision transformer that computes self-attention within local windows and shifts those windows between successive layers, so information can flow across window boundaries.
- BEiT (Bidirectional Encoder representation from Image Transformers): Pre-trained with a BERT-style self-supervised objective (masked image modeling) before being fine-tuned on image tasks.
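- DeiT (Data-efficient Image Transformer): A ViT variant that can be trained on ImageNet-1k alone, aided by knowledge distillation from a teacher network through a dedicated distillation token.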
- LeViT: A hybrid convolution/transformer architecture optimized for fast, low-latency inference, which makes it well suited to resource-constrained devices.
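To make the ViT and Swin descriptions above concrete, here is a minimal pure-Python sketch of the two grid operations involved: ViT cuts an image into non-overlapping patches, and Swin partitions its token grid into windows, then cyclically shifts the grid so the next set of windows straddles the old boundaries. This is an illustration only, not code from this project. The function names are hypothetical, the image is assumed to be a single-channel grid of integers, and Swin's attention mask for wrapped-around tokens is omitted.

```python
from typing import List

Grid = List[List[int]]  # rows of values: pixels for ViT, token ids for Swin


def partition(grid: Grid, size: int) -> List[List[int]]:
    """Split `grid` into non-overlapping size x size blocks in row-major order
    and flatten each block. ViT uses this to turn pixels into patches; Swin
    uses the same operation to group patch tokens into attention windows."""
    height, width = len(grid), len(grid[0])
    if height % size or width % size:
        raise ValueError("grid dimensions must be divisible by size")
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            blocks.append([
                grid[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size)
            ])
    return blocks


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` positions (wrapping around), as
    Swin does before its shifted-window layers. The real model also masks
    attention between tokens that were not neighbours before the roll; that
    mask is left out of this sketch."""
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


# A 4x4 grid of token ids 0..15, split into 2x2 windows.
tokens = [[4 * r + c for c in range(4)] for r in range(4)]
regular_windows = partition(tokens, 2)
shifted_windows = partition(cyclic_shift(tokens, 1), 2)
print(regular_windows)     # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print(shifted_windows[0])  # [5, 6, 9, 10]
```

The first shifted window contains tokens 5, 6, 9, and 10, which came from four different regular windows. That mixing is how Swin passes information between neighbouring windows while keeping attention local. For ViT's common setting, a 224x224 image split into 16x16 patches gives 14 x 14 = 196 patch tokens.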
The dataset used for this project was collected through web scraping to compile a comprehensive set of dog breed images.
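Images scraped from the web often contain exact duplicates, for example the same photo appearing on several sites. The sketch below shows one way to check a scraped collection before training. It is a hypothetical helper, not part of this project. It assumes a layout of `root/<breed_name>/<image files>` and an assumed set of image extensions. It finds byte-identical files by SHA-256 hash and counts images per breed. Near-duplicates, such as resized or re-encoded copies, will not be detected by this approach.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Assumed set of image extensions; adjust to whatever the scraper saved.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def is_image(path: Path) -> bool:
    """True for regular files whose extension looks like an image."""
    return path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES


def find_duplicate_images(root: Path) -> Dict[str, List[Path]]:
    """Group images under `root` by the SHA-256 hash of their bytes and keep
    only the groups with more than one file (exact duplicates)."""
    groups: Dict[str, List[Path]] = {}
    for path in sorted(root.rglob("*")):
        if is_image(path):
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups.setdefault(digest, []).append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def count_per_breed(root: Path) -> Dict[str, int]:
    """Count images in each breed folder, assuming one folder per breed."""
    return {
        breed_dir.name: sum(1 for p in breed_dir.iterdir() if is_image(p))
        for breed_dir in sorted(root.iterdir())
        if breed_dir.is_dir()
    }
```

Running `count_per_breed` on the scraped folder is a quick way to spot heavily imbalanced breeds. The groups returned by `find_duplicate_images` show which files to drop, so the same photo does not end up in both the training and validation splits.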
Contributions are welcome! If you have suggestions, improvements, or bug fixes, feel free to create a pull request or open an issue.