- Add patch embeddings
- Add transformer encoder layer
- Add transformer encoder (multiple layers)
- Why repeat the class token across the batch?
- Attention dropout
- Embedding dropout
- MLP dropout (in encoder)
- Add classification head
- Complete ViT-Base
- Use named layers so the model is torchvision-compatible
- Add training scripts
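The patch-embedding and class-token items above can be sketched in PyTorch roughly as follows. This is a minimal sketch, not this repository's implementation: the class name `PatchEmbedding` is hypothetical, the defaults are ViT-Base/16 sizes (224x224 input, 16x16 patches, width 768), and the dropout rate is illustrative. It also shows why the class token is repeated. It is stored once as a `(1, 1, dim)` parameter, so it has to be broadcast to batch size `B` before it can be concatenated with each image's patch tokens.

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Sketch (hypothetical name, ViT-Base/16 defaults): image -> patch tokens,
    prepend a class token, add learned position embeddings, apply dropout."""

    def __init__(self, image_size=224, patch_size=16, in_channels=3, dim=768, emb_dropout=0.1):
        super().__init__()
        assert image_size % patch_size == 0, "image_size must be divisible by patch_size"
        num_patches = (image_size // patch_size) ** 2
        # A conv with kernel_size == stride == patch_size is equivalent to flattening
        # each non-overlapping patch and applying one shared linear projection.
        self.proj = nn.Conv2d(in_channels, dim, kernel_size=patch_size, stride=patch_size)
        # A single learned class token shared by every image: shape (1, 1, dim).
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        # One position embedding per patch plus one for the class token.
        self.pos_embedding = nn.Parameter(torch.randn(1, num_patches + 1, dim) * 0.02)
        # Embedding dropout, applied after adding the position embeddings.
        self.dropout = nn.Dropout(emb_dropout)

    def forward(self, images):
        # (B, C, H, W) -> (B, dim, H/P, W/P) -> (B, num_patches, dim)
        x = self.proj(images).flatten(2).transpose(1, 2)
        # Repeat the single class token along the batch dimension so it can be
        # concatenated in front of every image's patch sequence.
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        x = torch.cat([cls, x], dim=1)  # (B, num_patches + 1, dim)
        return self.dropout(x + self.pos_embedding)


if __name__ == "__main__":
    embed = PatchEmbedding()
    tokens = embed(torch.randn(2, 3, 224, 224))
    print(tokens.shape)  # torch.Size([2, 197, 768])
```

`expand` returns a view without copying memory, while `repeat` would allocate a copy. Both produce the same tokens, and gradients from every batch element accumulate into the one shared `cls_token` parameter either way.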
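The encoder-layer, stacked-encoder, classification-head, and dropout items can be sketched the same way. The class names are hypothetical, and this is not presented as this repository's code. It follows the pre-norm layout from the ViT paper (LayerNorm before each sub-block, residual connection after it) and uses `torch.nn.MultiheadAttention`, whose `dropout` argument drops attention weights (attention dropout). Placing the MLP dropout after both linear layers is one common choice, assumed here.

```python
import torch
from torch import nn


class EncoderLayer(nn.Module):
    """Sketch (hypothetical name): pre-norm block, x + MHSA(LN(x)), then x + MLP(LN(x))."""

    def __init__(self, dim=768, heads=12, mlp_dim=3072, attn_dropout=0.0, mlp_dropout=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # `dropout` here is attention dropout: it is applied to the attention weights.
        self.attn = nn.MultiheadAttention(dim, heads, dropout=attn_dropout, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim),
            nn.GELU(),
            nn.Dropout(mlp_dropout),  # MLP dropout after the hidden layer
            nn.Linear(mlp_dim, dim),
            nn.Dropout(mlp_dropout),  # and after the output projection
        )

    def forward(self, x):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        return x + self.mlp(self.norm2(x))


class Encoder(nn.Module):
    """Sketch: `depth` encoder layers followed by a final LayerNorm."""

    def __init__(self, depth=12, dim=768, heads=12, mlp_dim=3072, attn_dropout=0.0, mlp_dropout=0.1):
        super().__init__()
        self.layers = nn.ModuleList(
            [EncoderLayer(dim, heads, mlp_dim, attn_dropout, mlp_dropout) for _ in range(depth)]
        )
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return self.norm(x)


class ClassificationHead(nn.Module):
    """Sketch: classify from the class token, which sits at sequence position 0."""

    def __init__(self, dim=768, num_classes=1000):
        super().__init__()
        self.fc = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.fc(x[:, 0])
```

The defaults (`depth=12, dim=768, heads=12, mlp_dim=3072`) match ViT-Base's layer count, hidden size, head count, and MLP size. The encoder expects tokens shaped `(batch, num_patches + 1, dim)`, i.e. patch embeddings with the class token already prepended.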
```shell
pip install vit_pytorch
```
Load a `config.yml` file and pass it to the `ViT` module to modify architecture parameters.
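The README does not show the config schema, so the following is only a sketch of one way to validate a YAML config before building the model. The key names, the `ViTConfig` dataclass, and the `parse_vit_config` / `load_vit_config` helpers are all hypothetical; rename the keys to match the actual `ViT` constructor arguments. It uses PyYAML's `yaml.safe_load`, and the dropout defaults are illustrative.

```python
from dataclasses import asdict, dataclass

import yaml  # PyYAML


@dataclass
class ViTConfig:
    # Hypothetical key names with ViT-Base/16 sizes as defaults; dropout values are illustrative.
    image_size: int = 224
    patch_size: int = 16
    num_classes: int = 1000
    dim: int = 768
    depth: int = 12
    heads: int = 12
    mlp_dim: int = 3072
    dropout: float = 0.0
    emb_dropout: float = 0.0

    def __post_init__(self):
        # Catch inconsistent architectures before any weights are allocated.
        if self.image_size % self.patch_size:
            raise ValueError("image_size must be divisible by patch_size")
        if self.dim % self.heads:
            raise ValueError("dim must be divisible by heads")


def parse_vit_config(text):
    """Parse YAML text into keyword arguments; omitted keys keep their defaults."""
    overrides = yaml.safe_load(text) or {}
    if not isinstance(overrides, dict):
        raise ValueError("config must be a YAML mapping")
    # Unknown keys raise TypeError here instead of being silently ignored.
    return asdict(ViTConfig(**overrides))


def load_vit_config(path):
    """Read a YAML file such as config.yml and return validated keyword arguments."""
    with open(path) as f:
        return parse_vit_config(f.read())
```

Assuming the constructor accepts these keyword names, the result would be used as `ViT(**load_vit_config("config.yml"))`. A file containing only `depth: 6` then yields a 6-layer model with every other setting at its default.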