Samples: CIFAR | MNIST | AUDIO (wav_aggressor.mp4)
The simplest possible implementation of Autoregressive Image Generation without Vector Quantization.
- Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
- Minimal Dependencies: Built from scratch using only basic MLX operations.
- Single-File Implementation: The entire model is in one Python file, aggressor.py.
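For reference, the paper "Autoregressive Image Generation without Vector Quantization" trains the small denoiser with a per-token diffusion loss. Here z is the condition vector the transformer produces for one token, x is that token's continuous value, and ε_θ is the MLP denoiser. Whether aggressor.py uses exactly this noise-prediction form is an assumption, not something this README states.

```latex
\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon, t}\left[\,\lVert \varepsilon - \varepsilon_\theta(x_t \mid t, z) \rVert^2\,\right],
\qquad x_t = \sqrt{\bar\alpha_t}\, x + \sqrt{1 - \bar\alpha_t}\, \varepsilon,
\qquad \varepsilon \sim \mathcal{N}(0, I)
```

The transformer only has to produce a good z for each position. The expensive part of modeling the token's distribution is left to the small MLP, which is why no vector-quantized codebook is needed.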
Main components:
- Aggressor: Main model class combining the transformer and the diffusion head.
- Transformer: Multi-layer transformer with attention and MLP blocks.
- Denoiser: MLP-based denoiser for the diffusion process, with a time embedding.
- Scheduler: Handles the forward (noising) and backward (denoising) processes of the diffusion.
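The forward and backward processes a scheduler like this handles can be sketched as below. This is a hypothetical toy class, not the repository's Scheduler API. It assumes a DDPM-style linear beta schedule with noise prediction, and it uses plain Python lists instead of MLX arrays so that it runs anywhere. An "oracle" function that knows the clean sample stands in for the trained Denoiser.

```python
import math
import random

# Hypothetical DDPM-style scheduler sketch. The linear beta schedule, step count,
# and noise-prediction parameterization are assumptions, not taken from aggressor.py.


class ToyScheduler:
    def __init__(self, num_steps=100, beta_start=1e-4, beta_end=0.02):
        self.num_steps = num_steps
        # Linear schedule of per-step noise variances beta_t.
        self.betas = [
            beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)
        ]
        self.alphas = [1.0 - b for b in self.betas]
        # Cumulative products: alpha_bar_t = alpha_0 * alpha_1 * ... * alpha_t.
        self.alpha_bars = []
        prod = 1.0
        for a in self.alphas:
            prod *= a
            self.alpha_bars.append(prod)

    def forward(self, x0, t, noise):
        """Forward (noising) process: sample x_t ~ q(x_t | x_0) in closed form."""
        ab = self.alpha_bars[t]
        return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]

    def backward(self, xt, t, eps_pred, rng):
        """Backward (denoising) step: sample x_{t-1} from x_t and predicted noise."""
        a, ab, b = self.alphas[t], self.alpha_bars[t], self.betas[t]
        coef = b / math.sqrt(1.0 - ab)
        mean = [(x - coef * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]
        if t == 0:
            return mean  # no noise is added at the final step
        sigma = math.sqrt(b)
        return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


# Usage: noise a sample to the last step, then denoise it back.
rng = random.Random(0)
sched = ToyScheduler()
x0 = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
xt = sched.forward(x0, sched.num_steps - 1, noise)


def oracle_eps(x, t):
    # Stand-in for the Denoiser: the exact noise, computed from the known x0.
    # A trained MLP would instead predict it from x_t, t, and the transformer's condition.
    ab = sched.alpha_bars[t]
    return [(xi - math.sqrt(ab) * x0i) / math.sqrt(1.0 - ab) for xi, x0i in zip(x, x0)]


x = xt
for step in reversed(range(sched.num_steps)):
    x = sched.backward(x, step, oracle_eps(x, step), rng)
print([round(v, 6) for v in x])  # → [0.5, -1.0, 2.0]
```

With a perfect noise prediction, the final backward step maps exactly back to the clean sample. A trained denoiser only approximates this, so its samples vary from run to run.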
To train the model, run:

python aggressor.py

Training on 60,000 images for 20 epochs takes approximately 7 to 8 minutes on an 8GB M2 MacBook.
Thanks to lucidrains' fantastic code that inspired this project. The official implementation is available here.