-
@rwightman I've also stumbled upon the same question. How can I find out which hyperparameters I'm able to tune? I agree it varies depending on the model, but I'm looking for some documentation or a place to dig into. In particular, I'm interested in customizing the activation layer, attention layer, and anti-aliasing layer.
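For example, I'd guess something like the snippet below might work, though I'm not sure the keyword arguments (act_layer, aa_layer, block_args / attn_layer) are supported by every model family:

```python
# My guess at overriding these layers via timm.create_model kwargs; support
# for act_layer / aa_layer / block_args varies by model family and version.
import torch.nn as nn
import timm
from timm.layers import BlurPool2d  # anti-aliased downsampling

model = timm.create_model(
    'resnet50',
    act_layer=nn.SiLU,                 # swap the activation layer
    aa_layer=BlurPool2d,               # enable an anti-aliasing layer
    block_args=dict(attn_layer='se'),  # add an attention (SE) layer to blocks
)
```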
-
@liqi17thu @Kshitij09 This reply has been a long time coming, but I'm going to move it to Discussions as it seems a good thread to leave open.

There's no magic to hparam tuning, just persistence and spending enough time with your models and datasets to get a feel for what works and what doesn't. With more experience you'll get better at finding starting points, but you will still have to search for the optimal values. If you've got enough compute, doing exhaustive sweeps can be a good strategy, especially if your dataset isn't large.

The starting points, and the large improvement jumps in hparams, often come from ideas put forward in various papers and their combinations. My current 'best' ImageNet hparams are based on a blend of ideas from the EfficientNet, RandAugment, and 'Bag-of-Tricks' papers, plus experience-based additions that include turning up the augmentation and regularization significantly. This won't necessarily work as well on a different dataset, especially a smaller one, or for fine-tuning instead of training from scratch.

For fine-tuning, I often find a simple combo of Adam + a plateau LR schedule, or Momentum/Nesterov (SGD) w/ cosine decay, with a lower LR than you'd use for training from scratch, is a good starting point. Less augmentation is also usually a good starting point.
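In code, those two fine-tuning starting points might look roughly like the following plain PyTorch sketch; the model name, LRs, and schedule lengths are placeholder assumptions to tune, not recommendations:

```python
# Hedged sketch of the two fine-tuning starting points above (plain PyTorch).
# 'resnet50', the LRs, and the epoch counts are placeholders, not tuned values.
import torch
import timm

model = timm.create_model('resnet50', pretrained=True, num_classes=10)

# Option 1: Adam + plateau schedule (LR drops when the val metric stalls).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.1, patience=3)
# ... train an epoch, then: scheduler.step(val_accuracy)

# Option 2: Nesterov-momentum SGD + cosine decay, lower LR than from scratch.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.01, momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
# ... train an epoch, then: scheduler.step()
```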
-
Thanks for your great work! This repo includes many state-of-the-art methods and is easy to reproduce. So I would like to ask for some advice and insight on tuning hyperparameters. There are many hyperparameters given in the training script, and I'm wondering how you arrived at them. How should I adjust them if I am training a model of my own?