This is a TensorFlow implementation of "Super-Convergence: Very Fast Training of Neural Networks Using Large Learning Rates", designed to be compatible with the TensorFlow-Slim image classification model library.
The paper proposes a learning rate policy called the one-cycle policy, which allows networks to be trained significantly faster; the authors name this phenomenon super-convergence.
Cyclical Learning Rates (CLR) is earlier work by the same author, which suggests training a network with a cyclical learning rate to obtain better classification performance.
The super-convergence paper proposes the one-cycle policy, a small modification of CLR. It always uses a single cycle that is smaller than the total number of iterations/epochs and, for the remaining iterations, lets the learning rate decay to several orders of magnitude below the initial learning rate. When using a one-cycle learning rate schedule, it is better to also use cyclical momentum (CM), which starts at the maximum momentum value and decreases as the learning rate increases.
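As a rough illustration, here is a minimal sketch of such a schedule in plain Python. The function, its argument names, and the final annealing factor are illustrative assumptions, not the exact code used in this repository:

```python
def one_cycle(step, step_size, total_steps,
              min_lr=0.1, max_lr=1.0,
              min_momentum=0.85, max_momentum=0.95,
              final_lr_scale=1e-4):
    """Sketch of a one-cycle schedule (illustrative, not this repo's code):
    the LR rises then falls linearly over one cycle of 2 * step_size steps,
    then anneals several orders of magnitude below min_lr for the remaining
    steps; momentum mirrors the LR in the opposite direction (CM)."""
    if step <= step_size:  # first half of the cycle: LR up, momentum down
        frac = step / step_size
        return (min_lr + (max_lr - min_lr) * frac,
                max_momentum - (max_momentum - min_momentum) * frac)
    if step <= 2 * step_size:  # second half: LR down, momentum up
        frac = (step - step_size) / step_size
        return (max_lr - (max_lr - min_lr) * frac,
                min_momentum + (max_momentum - min_momentum) * frac)
    # tail: decay the LR far below min_lr for the remaining iterations
    frac = (step - 2 * step_size) / max(1, total_steps - 2 * step_size)
    return min_lr * (1.0 - frac * (1.0 - final_lr_scale)), max_momentum
```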
To find a proper learning rate range for CLR and the one-cycle policy, you can use the learning rate range test. As the learning rate increases during the test, it eventually becomes too large and causes the test/validation loss to increase and the accuracy to decrease. This point, or a slightly smaller value, can be used as the maximum bound. The minimum bound can then be found as one of the following (see the sketch after this list):
- a factor of 3 or 4 less than the maximum bound
- a factor of 10 or 20 less than the maximum bound if only one cycle is used
- by running short tests of a few hundred iterations with a few initial learning rates and picking the largest one that allows convergence to begin without signs of overfitting
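For reference, the range test itself is just a linear ramp of the learning rate over a short run. Here is a minimal sketch in plain Python; the name and defaults are illustrative, chosen to match the example script further below:

```python
def lr_range_test(step, num_steps, min_lr=1e-5, max_lr=3.0):
    """Increase the LR linearly from min_lr to max_lr over num_steps.
    Afterwards, plot loss/accuracy against the LR and pick the
    maximum/minimum bounds as described above."""
    frac = min(step, num_steps) / num_steps
    return min_lr + (max_lr - min_lr) * frac
```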
- Python 3.x
- TensorFlow 1.x
- TF-slim
You should prepare your own dataset or an open dataset (CIFAR-10, Flowers, MNIST, ImageNet). To prepare a dataset, you can follow the 'Preparing the datasets' part of the TF-Slim image models README.
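For example, TF-Slim ships a conversion script, so something like `python download_and_convert_data.py --dataset_name=flowers --dataset_dir="${DATASET_DIR}"` should download the Flowers dataset and convert it to the TFRecord format the training script expects.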
The script below gives an example of the learning rate range test. To run it, set the `--learning_rate_decay_type` argument to `lr_range_test`, and set the minimum and maximum learning rates with the `--learning_rate` and `--max_learning_rate` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN
CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type lr_range_test \
    --optimizer momentum \
    --momentum 0.9 \
    --weight_decay 0.00001 \
    --learning_rate 0.00001 \
    --max_learning_rate 3.0 \
    --step_size 50000 \
    --max_number_of_steps 100000 \
    --train_image_size 224 \
    --batch_size 64
```
The script below gives an example of training a model with the one-cycle policy. To use it, set the `--learning_rate_decay_type` argument to `one_cycle`. For the cyclical learning rate, set the minimum and maximum learning rates with the `--learning_rate` and `--max_learning_rate` arguments; for cyclical momentum, set the minimum and maximum momentum with the `--min_momentum` and `--momentum` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN
CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type one_cycle \
    --optimizer momentum \
    --momentum 0.95 \
    --min_momentum 0.85 \
    --weight_decay 0.00001 \
    --learning_rate 0.5 \
    --max_learning_rate 1.0 \
    --step_size 50000 \
    --max_number_of_steps 150000 \
    --train_image_size 224 \
    --batch_size 64
```
The script below gives an example of training a model with a cyclical learning rate. To use it, set the `--learning_rate_decay_type` argument to `CLR`, and set the minimum and maximum learning rates with the `--learning_rate` and `--max_learning_rate` arguments.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
TRAIN_DIR=/DIRECTORY/TO/TRAIN
CUDA_VISIBLE_DEVICES=0 python ../train_image_classifier.py \
    --train_dir $TRAIN_DIR \
    --dataset_dir $DATASET_DIR \
    --dataset_name imagenet \
    --dataset_split_name train \
    --model_name resnet_v1_50 \
    --learning_rate_decay_type CLR \
    --optimizer momentum \
    --momentum 0.95 \
    --min_momentum 0.85 \
    --weight_decay 0.00001 \
    --learning_rate 0.1 \
    --max_learning_rate 0.5 \
    --step_size 1000 \
    --max_number_of_steps 10000 \
    --train_image_size 224 \
    --batch_size 64
```
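For reference, CLR's default triangular policy (which the `CLR` decay type presumably follows) can be written in a few lines of plain Python. This is a sketch of the formula from the CLR paper, not this repository's exact code; note how `step_size` sets the half-cycle length:

```python
import math

def triangular_clr(step, step_size, min_lr=0.1, max_lr=0.5):
    """Triangular cyclical learning rate (CLR paper): the LR ramps
    linearly from min_lr to max_lr and back over each cycle of
    2 * step_size steps."""
    cycle = math.floor(1 + step / (2.0 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return min_lr + (max_lr - min_lr) * max(0.0, 1.0 - x)
```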
To keep track of validation accuracy while training, you can use `eval_image_classifier_loop.py`, which evaluates performance at multiple checkpoints during training. If you just want to evaluate a model once, you can use `eval_image_classifier.py`. The script below gives an example of evaluating a model repeatedly while it trains.
```bash
DATASET_DIR=/DIRECTORY/TO/DATASET
CHECKPOINT_FILE=/DIRECTORY/TO/CHECKPOINT
EVAL_DIR=/DIRECTORY/TO/EVAL
CUDA_VISIBLE_DEVICES=0 python eval_image_classifier_loop.py \
    --alsologtostderr \
    --checkpoint_path=${CHECKPOINT_FILE} \
    --dataset_dir=${DATASET_DIR} \
    --eval_dir=${EVAL_DIR} \
    --dataset_name=imagenet \
    --dataset_split_name=validation \
    --model_name=resnet_v1_50 \
    --batch_size=100
```
LR schedule | model | dataset | max_steps | step_size | batch_size | optimizer | lr | weight_decay | momentum | acc | training time |
---|---|---|---|---|---|---|---|---|---|---|---|
constant LR | resnet50_v1 | imagenet | 1000k | - | 128 | rmsprop | 0.01 | 0.00004 | 0.9 | 0.6923 | 7d 19h |
one-cycle policy | resnet50_v1 | imagenet | 250k | 100k | 128 | momentum | 0.05-1.0 | 0.00001 | 0.95-0.85 | 0.7075 | 2d 16h |
- Blog: Super-Convergence: very fast training of neural networks using large learning rates
- Repository: CBAM-TensorFlow-Slim
- Repository: SENet-TensorFlow-Slim
- Paper: Cyclical learning rates for training neural networks
- Paper: Super-Convergence: very fast training of neural networks using large learning rates
- Paper: A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
- GitHub: Caffe files of NRL Technical Report
- GitHub: keras-one-cycle
Byung Soo Ko / kobiso62@gmail.com