- Create a simple 3-layer CNN and train it on the MNIST dataset (see the CNN sketch after this list)
- Extend this to the full teacher network and train it
- Create the student network and implement the KD loss (loss sketch below)
- Train the student on MNIST using KD (training-loop sketch below)
- Experiment with the effect of temperature on model predictions (temperature demo below)
- Implement a ResNet-18 teacher network (CIFAR-10 adaptation sketched below)
- Train it on the CIFAR-10 dataset
- Create a smaller student network
- Train it in the same way as on MNIST, using KD from the teacher
- Gather results on the effect of both temperature and alpha on test accuracy (sweep sketch below)
- Implement an assistant network (1-step assistant)
- Implement the training loop for Teacher->Assistant->Student distillation (TAKD sketch below)
- Compare the test accuracy of TAKD (teacher-assistant KD) against BLKD (baseline KD)
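
The sketches below flesh out the steps above. They are illustrative PyTorch, not the project's actual code; layer sizes and hyperparameters throughout are assumptions. First, one plausible shape for the simple 3-layer CNN: two conv layers plus a linear classifier.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal 3-layer CNN for 1x28x28 MNIST inputs: conv, conv, linear."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                               # -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))
```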
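
The KD loss in the standard Hinton et al. formulation: a KL-divergence term between temperature-softened teacher and student distributions, blended via alpha with the ordinary cross-entropy on hard labels. The T^2 factor keeps the soft-target gradients on the same scale as the hard-label term; the default values here are placeholders.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            targets: torch.Tensor,
            temperature: float = 4.0,
            alpha: float = 0.9) -> torch.Tensor:
    """alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, y)."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```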
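
One possible training loop for the student on MNIST, reusing `SimpleCNN` and `kd_loss` from the sketches above; batch size, optimiser, and epoch count are assumptions.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def train_student_kd(student, teacher, epochs: int = 5,
                     temperature: float = 4.0, alpha: float = 0.9,
                     device: str = "cuda" if torch.cuda.is_available() else "cpu"):
    """Distil `teacher` into `student` on MNIST with the kd_loss sketch above."""
    loader = DataLoader(
        datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor()),
        batch_size=128, shuffle=True,
    )
    student, teacher = student.to(device), teacher.to(device).eval()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():               # teacher gives fixed soft targets
                teacher_logits = teacher(x)
            loss = kd_loss(student(x), teacher_logits, y, temperature, alpha)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```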
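
A small demo of the temperature experiment: raising T flattens the softmax, so the relative probabilities of the wrong classes (the "dark knowledge" the student learns from) become visible. The logits are arbitrary example values.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([5.0, 2.0, 0.5])          # arbitrary example logits
for T in (1.0, 2.0, 5.0, 10.0):
    probs = F.softmax(logits / T, dim=0)
    print(f"T={T:>4}: {probs.numpy().round(3)}")
# Higher T -> flatter distribution: at T=1 the top class dominates,
# while at T=10 the three classes are nearly uniform.
```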
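
For the ResNet-18 teacher, one option is torchvision's implementation adapted to 32x32 CIFAR-10 images. The 3x3 stem and removed max-pool are a common CIFAR adaptation assumed here, not necessarily what the project uses.

```python
import torch.nn as nn
from torchvision.models import resnet18

def cifar_resnet18(num_classes: int = 10) -> nn.Module:
    """torchvision ResNet-18 with a stem adapted to 32x32 CIFAR-10 inputs."""
    model = resnet18(num_classes=num_classes)
    # ImageNet's 7x7/stride-2 stem plus max-pool downsamples too aggressively
    # for 32x32 images, so swap in a 3x3 stem and drop the pooling.
    model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.maxpool = nn.Identity()
    return model
```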
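
The temperature/alpha study could be a simple grid sweep. This sketch reuses `train_student_kd` for brevity; for the CIFAR-10 runs the loader and models would be swapped accordingly. The grid values, the `make_student` factory, and the `test_accuracy` helper are all hypothetical.

```python
import itertools

temperatures = [1.0, 2.0, 4.0, 8.0]             # hypothetical grid values
alphas = [0.25, 0.5, 0.75, 0.9]

results = {}
for T, a in itertools.product(temperatures, alphas):
    student = make_student()                    # hypothetical factory: fresh student
    student = train_student_kd(student, teacher, temperature=T, alpha=a)
    results[(T, a)] = test_accuracy(student)    # hypothetical held-out accuracy helper
print("best (T, alpha):", max(results, key=results.get))
```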
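
Finally, 1-step TAKD reduces to two successive rounds of ordinary KD, so it can reuse the same loop: the teacher distils into the assistant, then the assistant distils into the student. The BLKD baseline for the last item is just `train_student_kd(student, teacher)` directly, which makes the comparison straightforward.

```python
def takd(teacher, assistant, student, **kd_kwargs):
    """Teacher -> Assistant -> Student distillation via two KD stages.

    `assistant` is assumed to be a network of capacity between the
    teacher's and the student's.
    """
    assistant = train_student_kd(assistant, teacher, **kd_kwargs)   # stage 1
    return train_student_kd(student, assistant, **kd_kwargs)        # stage 2
```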