Ultimate-Awesome-Transformer-Attention

This repo contains a comprehensive list of papers on Vision Transformers and attention, with links to papers, code, and related websites.
This list is maintained by Min-Hung Chen and is actively updated.

If you find papers that are missing from this list, feel free to open an issue or create a pull request.
Contributions of any kind that make this list more comprehensive are welcome.

If you find this repository useful, please consider citing and starring it.
Feel free to share this list with others!

Overview

Image Classification / Backbone
Detection
Segmentation
Video (High-level)
Multi-Modality
Other High-level Vision Tasks
Transfer / X-Supervised / X-Shot / Continual Learning
Low-level Vision Tasks
Reinforcement Learning
- Navigation
- Other RL Tasks
Medical
Other Tasks
Attention Mechanisms in Vision/NLP
- Attention for Vision
- NLP
- Both
- Others
Citation
References

Image Classification / Backbone

Replace Conv w/ Attention

Pure Attention

LR-Net: "Local Relation Networks for Image Recognition", ICCV, 2019 (Microsoft). [Paper][PyTorch (gan3sh500)]
SASA: "Stand-Alone Self-Attention in Vision Models", NeurIPS, 2019 (Google). [Paper][PyTorch-1 (leaderj1001)][PyTorch-2 (MerHS)]
Axial-Transformer: "Axial Attention in Multidimensional Transformers", arXiv, 2019 (Google). [Paper][PyTorch (lucidrains)]
SAN: "Exploring Self-attention for Image Recognition", CVPR, 2020 (CUHK + Intel). [Paper][PyTorch]
Axial-DeepLab: "Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation", ECCV, 2020 (Google). [Paper][PyTorch]

Conv-stem + Attention

GSA: "Global Self-Attention Networks for Image Recognition", arXiv, 2020 (Google). [Paper][PyTorch (lucidrains)]
HaloNet: "Scaling Local Self-Attention For Parameter Efficient Visual Backbones", CVPR, 2021 (Google). [Paper][PyTorch (lucidrains)]
CoTNet: "Contextual Transformer Networks for Visual Recognition", CVPRW, 2021 (JD). [Paper][PyTorch]
TransCNN: "Transformer in Convolutional Neural Networks", arXiv, 2021 (ETHZ). [Paper]

Conv + Attention

AA: "Attention Augmented Convolutional Networks", ICCV, 2019 (Google). [Paper][PyTorch (leaderj1001)][TensorFlow (titu1994)]
GCNet: "Global Context Networks", ICCVW, 2019 (& TPAMI 2020) (Microsoft). [Paper][PyTorch]
LambdaNetworks: "LambdaNetworks: Modeling Long-Range Interactions Without Attention", ICLR, 2021 (Google). [Paper][PyTorch-1 (lucidrains)][PyTorch-2 (leaderj1001)]
BoTNet: "Bottleneck Transformers for Visual Recognition", CVPR, 2021 (Google). [Paper][PyTorch-1 (lucidrains)][PyTorch-2 (leaderj1001)]
GCT: "Gaussian Context Transformer", CVPR, 2021 (Zhejiang University). [Paper]
CoAtNet: "CoAtNet: Marrying Convolution and Attention for All Data Sizes", NeurIPS, 2021 (Google). [Paper]
