Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
deep-learning
optimizer
pytorch
artificial-intelligence
moe
resnet
vit
diffusion
mae
fairseq
cuda-programming
bert-model
gpt2
transformer-xl
timm
convnext
adan
llms
dreamfusion
llm-training
Updated Jul 2, 2024 - Python