
This repository offers a straightforward implementation of Vision Transformers (ViT), specifically designed for computer vision tasks using PyTorch. Dive into efficient and practical transformer applications for image recognition.


TINY-ViT: Vision Transformer from Scratch

Implementing a Vision Transformer model from scratch in PyTorch.

A little demo
Explanatory Video

About The Project

TINY-ViT offers a minimalist yet complete implementation of the Vision Transformer (ViT) architecture for computer vision tasks. The project aims to provide a clear, structured approach to building Vision Transformers, making it accessible for education and practical use alike.


Features

  • Modular Design: Clear separation of components like data processing, model architecture, and training routines.

  • Customizable: Easy to adapt the architecture and data pipeline for various datasets and applications.

  • Poetry Dependency Management: Utilizes Poetry for simple and reliable package management.

  • Advanced Embedding Techniques: Implements three distinct strategies for image embedding in Vision Transformers:

    • ViTConv2dEmbedding: Uses a Conv2d layer to transform input images into a sequence of flattened 2D patches, with a learnable class token prepended.
    • ViTLNEmbedding: Applies layer normalization to flattened input patches before projecting them into the embedding space, improving training stability.
    • ViTPyCon2DEmbedding: Uses a pure tensor-reshaping strategy to turn input images into a sequence of embedded patches, also with a learnable class token.
  • Custom Activation Function: The ViTGELUActFun class implements the Gaussian Error Linear Unit (GELU), which provides smoother gating behavior than traditional nonlinearities such as ReLU.
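The Conv2d-based embedding and the GELU activation can be sketched roughly as follows. The constructor arguments (img_size, patch_size, embed_dim) and internals here are illustrative assumptions, not the repository's exact code:

```python
import math
import torch
import torch.nn as nn

class ViTGELUActFun(nn.Module):
    """Tanh approximation of the Gaussian Error Linear Unit (GELU)."""
    def forward(self, x):
        return 0.5 * x * (1.0 + torch.tanh(
            math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

class ViTConv2dEmbedding(nn.Module):
    """Split an image into patches with a strided Conv2d, prepend a class token."""
    def __init__(self, img_size=224, patch_size=16, in_ch=3, embed_dim=192):
        super().__init__()
        # A Conv2d with kernel == stride == patch_size cuts the image into
        # non-overlapping patches and projects each to embed_dim at once.
        self.proj = nn.Conv2d(in_ch, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        n_patches = (img_size // patch_size) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, embed_dim))

    def forward(self, x):                     # x: (B, C, H, W)
        x = self.proj(x)                      # (B, D, H/P, W/P)
        x = x.flatten(2).transpose(1, 2)      # (B, N, D)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)        # (B, N+1, D)
        return x + self.pos_embed
```

For a 224x224 RGB image with 16x16 patches, the output is a sequence of 197 tokens (196 patches plus the class token), each of dimension embed_dim.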

Project Structure

TINY-VIT-TRANSFORMER-FROM-SCRATCH
β”‚
β”œβ”€β”€ dataset                   # Dataset directory
β”œβ”€β”€ tests                     # Test scripts
β”œβ”€β”€ tiny_vit_transformer_from_scratch
β”‚   β”œβ”€β”€ core                  # Core configurations and caching
β”‚   β”œβ”€β”€ data                  # Data processing modules
β”‚   └── model                 # Transformer model components
β”œβ”€β”€ train.py                  # Script to train the model
β”œβ”€β”€ finetune.py               # Script for fine-tuning the model
β”œβ”€β”€ README.md                 # Project README file
β”œβ”€β”€ poetry.lock               # Poetry lock file for consistent builds
└── pyproject.toml            # Poetry project file with dependency descriptions

Built With

This project is built with:

  • PyTorch – deep learning framework used to implement and train the model
  • Poetry – dependency management and packaging


Getting Started

To get a local copy up and running, follow these steps.

Installation

  1. Clone the repo
    git clone https://github.com/benisalla/tiny-vit-transformer-from-scratch.git
  2. Install Poetry packages
    poetry install

Usage

The sections below show how to train, fine-tune, and evaluate the model.

Training

To train the model using the default configuration:

poetry run python train.py
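A training script like this typically wires a model, an optimizer, and a cross-entropy loss into a standard supervised loop. The sketch below shows that generic pattern, not the script's actual internals (the function name and arguments are illustrative):

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the training data with cross-entropy loss."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
    return total_loss / len(loader.dataset)
```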

Fine-Tuning

To fine-tune a pre-trained model:

poetry run python finetune.py
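Fine-tuning a pre-trained ViT commonly means freezing the backbone and swapping in a fresh classification head for the new dataset. The helper below sketches that common pattern; the attribute name "head" and the function itself are hypothetical, not this repository's API:

```python
import torch
import torch.nn as nn

def prepare_for_finetuning(model, head_attr="head", num_classes=10, freeze=True):
    """Freeze backbone parameters and swap in a fresh classification head."""
    if freeze:
        for p in model.parameters():
            p.requires_grad = False
    old_head = getattr(model, head_attr)
    # The new head is created after freezing, so its parameters stay trainable.
    new_head = nn.Linear(old_head.in_features, num_classes)
    setattr(model, head_attr, new_head)
    return model
```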

Model Performance

The TINY-ViT model was evaluated on a held-out set of test images to measure its accuracy. Here are the results:

  • Accuracy on test images: 81.60%

These results demonstrate the effectiveness of the tiny-vit model in handling complex image recognition tasks. We continuously seek to improve the model and update the metrics as new test results become available.
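The figure above is a top-1 accuracy: the fraction of test images whose highest-scoring prediction matches the label. A minimal sketch of how such a number can be computed over a test loader (generic pattern, assumed classification setup):

```python
import torch

@torch.no_grad()
def top1_accuracy(model, loader, device="cpu"):
    """Fraction of test images whose argmax prediction matches the label."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    return correct / total
```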



Roadmap

See the open issues for a list of proposed features (and known issues).


Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Contributions are welcome! For major changes, please open an issue first to discuss what you would like to change.


Contributors

  • Asmae El-Ghezzaz - Data Scientist/ML

    • GitHub
    • Contributions: Provided expertise in machine learning and data science methodologies.
  • Idriss El Houari - Data Scientist

    • GitHub
    • Kaggle
    • Contributions: Curated the plant disease dataset and developed the initial analysis notebook.
  • Farheen Akhter - Graduate Student at California State University

    • GitHub
    • Kaggle
    • Contributions: Worked on improving crop yield and pest/disease detection through data analytics. Provided dataset and analytical insights on local farms in Ghana.
  • Aicha Dessa - Data Scientist Intern

    • GitHub
    • Kaggle
    • Contributions: Analyzed plant disease recognition data and contributed to model training and testing processes.
  • Zeroual Salma - Student at Agronomic and Veterinary Institute Hassan II

    • GitHub
    • Contributions: Focused on plant pathology and contributed to dataset analysis and insights into Botrytis disease.
  • El Fakir Chaimae - Master's Student in Artificial Intelligence

    • GitHub
    • Contributions: Provided datasets on pathogen detection and collaborated on developing AI models for disease prediction.
  • Laghbissi Salma - Master's Student in Software Engineering for Cloud Computing

    • GitHub
    • Contributions: Researched plant pathologies and contributed significantly to dataset understanding and processing.

Authors

  • Ismail Ben Alla (Me πŸ˜‰) - View My GitHub Profile
  • Asmae El-Ghezzaz (a friend of mine) - deserves special thanks for her help and advice

Acknowledgements

This project owes its success to the invaluable support and resources provided by several individuals and organizations. A heartfelt thank you to:

  • Asmae El-Ghezzaz - For inviting me to be a member of Moroccan Data Scientists (MDS), where I had the opportunity to develop this project.
  • Moroccan Data Scientists (MDS) - Although I am no longer a member, I hold great admiration for the community and wish it continued success.
  • Pests and Vegetables Disease Detection Team in MDS - Aicha, Hiba, Idriss, Farheen, Asmae, ...
  • PyTorch - For the powerful and flexible deep learning platform that has made implementing models a smoother process.
  • Kaggle - For providing the datasets used in training our models and hosting competitions that inspire our approaches.
  • Google Colab - For the computational resources that have been instrumental in training and testing our models efficiently.

License

This project is made available under fair use guidelines. While there is no formal license associated with the repository, users are encouraged to credit the source if they utilize or adapt the code in their work. This approach promotes ethical practices and contributions to the open-source community. For citation purposes, please use the following:

@misc{tiny_vit_2024,
  title={TINY-ViT: Vision Transformer from Scratch},
  author={Ben Alla Ismail},
  year={2024},
  url={https://github.com/benisalla/tiny-vit-transformer-from-scratch}
}

About Me

πŸŽ“ Ismail Ben Alla - Neural Network Enthusiast

I am deeply passionate about exploring artificial intelligence and its potential to solve complex problems and unravel the mysteries of our universe. My academic and professional journey is characterized by a commitment to learning and innovation in AI, deep learning, and machine learning.

What Drives Me

  • Passion for AI: Eager to push the boundaries of technology and discover new possibilities.
  • Continuous Learning: Committed to staying informed and skilled in the latest advancements.
  • Optimism and Dedication: Motivated by the challenges and opportunities that the future of AI holds.

I thoroughly enjoy what I do and am excited about the future of AI and machine learning. Let's connect and explore the endless possibilities of artificial intelligence together!


Get ready to see pixels transform into insights πŸŒŸπŸ”βœ¨
