- Create a simple 3-layer CNN and train it on the MNIST dataset (see the CNN sketch after this list)
- Extend this to the full teacher network and train it
- Create the student network and implement the KD loss (loss sketch below)
- Train the student on MNIST using KD (training-loop sketch below)
- Experiment with the effect of temperature on model predictions (temperature demo below)
- Implement a ResNet-18 teacher network (CIFAR-10 adaptation sketched below)
- Train it on the CIFAR-10 dataset
- Create a smaller student network
- Train it in the same way as on MNIST, using KD from the teacher
- Gather results on the effect of both temperature and alpha on test accuracy (sweep sketch below)
- Implement an assistant network (1-step assistant)
- Implement the training loop for Teacher->Assistant->Student distillation (TAKD sketch below)
- Compare the test accuracy of TAKD (teacher-assistant KD) against BLKD (baseline KD)
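
The sketches below flesh out the steps above. They are illustrative PyTorch, not the project's actual code; layer sizes and hyperparameters throughout are assumptions. First, one plausible shape for the simple 3-layer CNN: two conv layers plus a linear classifier.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal 3-layer CNN for 1x28x28 MNIST inputs: conv, conv, linear."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))
```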
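
The KD loss in the standard Hinton et al. formulation: a KL-divergence term between temperature-softened teacher and student distributions, blended via alpha with the ordinary cross-entropy on hard labels. The T^2 factor keeps the soft-target gradients on the same scale as the hard-label term; the default values here are placeholders.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            temperature: float = 4.0,
            alpha: float = 0.9) -> torch.Tensor:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, y)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```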
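
One possible training loop for the student on MNIST, reusing `SimpleCNN` and `kd_loss` from the sketches above; batch size, optimiser, and epoch count are assumptions.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_student_kd(student, teacher, epochs: int = 5,
                     temperature: float = 4.0, alpha: float = 0.9,
                     device: str = "cuda" if torch.cuda.is_available() else "cpu"):
    """Distil `teacher` into `student` on MNIST with the kd_loss sketch above."""
    loader = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=128, shuffle=True,
    )
    student, teacher = student.to(device), teacher.to(device).eval()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():               # teacher gives fixed soft targets
                teacher_logits = teacher(x)
            loss = kd_loss(student(x), teacher_logits, y, temperature, alpha)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```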
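
A small demo of the temperature experiment: raising T flattens the softmax, so the relative probabilities of the wrong classes (the "dark knowledge" the student learns from) become visible. The logits are arbitrary example values.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([5.0, 2.0, 0.5])          # arbitrary example logits
for T in (1.0, 2.0, 5.0, 10.0):
    probs = F.softmax(logits / T, dim=0)
    print(f"T={T:>4}: {probs.numpy().round(3)}")
# Higher T -> flatter distribution: at T=1 the top class dominates,
# while at T=10 the three classes are nearly uniform.
```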
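
For the ResNet-18 teacher, one option is torchvision's implementation adapted to 32x32 CIFAR-10 images. The 3x3 stem and removed max-pool are a common CIFAR adaptation assumed here, not necessarily what the project uses.

```python
import torch.nn as nn
from torchvision.models import resnet18

def cifar_resnet18(num_classes: int = 10) -> nn.Module:
    """torchvision ResNet-18 with a stem adapted to 32x32 CIFAR-10 inputs."""
    model = resnet18(num_classes=num_classes)
    # ImageNet's 7x7/stride-2 stem plus max-pool downsamples too aggressively
    # for 32x32 images, so swap in a 3x3 stem and drop the pooling.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model
```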
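
The temperature/alpha study could be a simple grid sweep. This sketch reuses `train_student_kd` for brevity; for the CIFAR-10 runs the loader and models would be swapped accordingly. The grid values, the `make_student` factory, and the `test_accuracy` helper are all hypothetical.

```python
import itertools

temperatures = [1.0, 2.0, 4.0, 8.0]             # hypothetical grid values
alphas = [0.25, 0.5, 0.75, 0.9]

results = {}
for T, a in itertools.product(temperatures, alphas):
    student = make_student()                    # hypothetical factory: fresh student
    student = train_student_kd(student, teacher, temperature=T, alpha=a)
    results[(T, a)] = test_accuracy(student)    # hypothetical held-out accuracy helper
print("best (T, alpha):", max(results, key=results.get))
```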
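
Finally, 1-step TAKD reduces to two successive rounds of ordinary KD, so it can reuse the same loop: the teacher distils into the assistant, then the assistant distils into the student. The BLKD baseline for the last item is just `train_student_kd(student, teacher)` directly, which makes the comparison straightforward.

```python
def takd(teacher, assistant, student, **kd_kwargs):
    """Teacher -> Assistant -> Student distillation via two KD stages.

    `assistant` is assumed to be a network of capacity between the
    teacher's and the student's.
    """
    assistant = train_student_kd(assistant, teacher, **kd_kwargs)   # stage 1
    return train_student_kd(student, assistant, **kd_kwargs)        # stage 2
```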