The MNIST database, available at http://yann.lecun.com/exdb/mnist/, is a dataset of handwritten digits. It has 60,000 training samples and 10,000 test samples. Each image is 28x28 pixels, and each pixel holds a grayscale value between 0 and 255.
It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
Thanks to Yann LeCun, Corinna Cortes, Christopher J.C. Burges.
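As a minimal sketch of loading the data (assuming PyTorch and torchvision; the batch size is an illustrative choice, not a value from the original write-up):

```python
import torch
from torchvision import datasets, transforms

# Convert images to tensors with pixel values in [0, 1]
transform = transforms.ToTensor()

# Download the MNIST training set and wrap it in a DataLoader
train_data = datasets.MNIST(root='data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
```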
- The discriminator is going to be a typical linear classifier; a minimal sketch of it is given below.
- The activation function we will be using is Leaky ReLU.
- We should use a leaky ReLU to allow gradients to flow backward through the layer unhindered. A leaky ReLU is like a normal ReLU, except that there is a small non-zero output for negative input values.
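Here is a hedged sketch of such a discriminator (the layer sizes, dropout probability, and negative slope are illustrative assumptions, not values from the original write-up):

```python
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    def __init__(self, input_size=784, hidden_dim=128, output_size=1):
        super().__init__()
        # Fully connected (linear) layers shrinking toward a single logit
        self.fc1 = nn.Linear(input_size, hidden_dim * 4)
        self.fc2 = nn.Linear(hidden_dim * 4, hidden_dim * 2)
        self.fc3 = nn.Linear(hidden_dim * 2, hidden_dim)
        self.fc4 = nn.Linear(hidden_dim, output_size)
        self.dropout = nn.Dropout(0.3)  # dropout to avoid overfitting

    def forward(self, x):
        # Flatten the 28x28 image into a 784-dimensional vector
        x = x.view(-1, 28 * 28)
        x = self.dropout(F.leaky_relu(self.fc1(x), 0.2))
        x = self.dropout(F.leaky_relu(self.fc2(x), 0.2))
        x = self.dropout(F.leaky_relu(self.fc3(x), 0.2))
        # Return a raw logit; the sigmoid is applied inside BCEWithLogitsLoss
        return self.fc4(x)
```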
- The generator uses latent samples to make fake images. These latent samples are vectors which are mapped to the fake images.
- The activation function for all the layers remains the same, except that we will be using Tanh at the output.
- The generator has been found to perform best with tanh for the generator output, which scales the output to be between -1 and 1, instead of 0 and 1.
- We want the output of the generator to be comparable to the real images' pixel values, which are normalized values between 0 and 1. Thus, we'll also have to scale our real input images to have pixel values between -1 and 1 when we train the discriminator. This will be done during the training phase. A sketch of the generator is given below.
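Continuing the sketch above (the latent size of 100 and the hidden dimensions are assumptions), a generator along these lines could look like:

```python
class Generator(nn.Module):
    def __init__(self, input_size=100, hidden_dim=32, output_size=784):
        super().__init__()
        # Fully connected layers growing from the latent vector to an image
        self.fc1 = nn.Linear(input_size, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim * 2)
        self.fc3 = nn.Linear(hidden_dim * 2, hidden_dim * 4)
        self.fc4 = nn.Linear(hidden_dim * 4, output_size)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        x = self.dropout(F.leaky_relu(self.fc1(x), 0.2))
        x = self.dropout(F.leaky_relu(self.fc2(x), 0.2))
        x = self.dropout(F.leaky_relu(self.fc3(x), 0.2))
        # tanh squashes the output to [-1, 1] to match the rescaled real images
        return torch.tanh(self.fc4(x))
```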
- To help the discriminator generalize better, the labels are reduced a bit from 1.0 to 0.9. For this, we'll use the parameter smooth; if True, then we should smooth our labels. In PyTorch, this looks like:
labels = torch.ones(size) * 0.9
- We also made use of dropout layers to avoid overfitting.
- The discriminator's goal is to output a 1 for real and 0 for fake images. On the other hand, the generator wants to make fake images that closely resemble the real ones.
- Thus, if "D" represents the discriminator's output for an image, the two goals can be stated as follows:
The goal of the discriminator: D(real_images) = 1 & D(fake_images) = 0
The goal of the generator: D(fake_images) = 1, i.e., fool the discriminator into classifying its fake images as real
- We will use BCEWithLogitsLoss, which combines a sigmoid activation function (we want the discriminator to output a value 0–1 indicating whether an image is real or fake) and binary cross-entropy loss.
- As mentioned earlier, Adam is a suitable optimizer.
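A minimal sketch of the loss helpers and optimizers (the learning rate is an illustrative assumption; `D` and `G` are instances of the networks sketched above):

```python
import torch.optim as optim

criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one op

def real_loss(d_out, smooth=False):
    # Labels are 1 for real images, optionally smoothed to 0.9
    labels = torch.ones(d_out.size(0)) * (0.9 if smooth else 1.0)
    return criterion(d_out.squeeze(), labels)

def fake_loss(d_out):
    # Labels are 0 for fake images
    labels = torch.zeros(d_out.size(0))
    return criterion(d_out.squeeze(), labels)

D = Discriminator()
G = Generator()
d_optimizer = optim.Adam(D.parameters(), lr=0.002)
g_optimizer = optim.Adam(G.parameters(), lr=0.002)
```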
- The generator takes in a vector z and outputs fake images. The discriminator alternates between training on the real images and on the fake images produced by the generator.
- Steps involved in discriminator training (a minimal sketch follows this list):
- We first compute the loss on real images
- Generate fake images
- Compute loss on fake images
- Add the loss of the real and fake images
- Perform backpropagation and update weights of the discriminator
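Under the assumptions above (the latent size `z_size = 100` is illustrative), one discriminator update might look like:

```python
z_size = 100

def train_discriminator(real_images, batch_size):
    d_optimizer.zero_grad()

    # 1. Loss on real images, rescaled from [0, 1] to [-1, 1]
    real_images = real_images * 2 - 1
    r_loss = real_loss(D(real_images), smooth=True)

    # 2. Generate fake images; detach so gradients do not flow into G here
    z = torch.randn(batch_size, z_size)
    fake_images = G(z).detach()

    # 3. Loss on fake images
    f_loss = fake_loss(D(fake_images))

    # 4. Add the real and fake losses, then 5. backpropagate and update D
    d_loss = r_loss + f_loss
    d_loss.backward()
    d_optimizer.step()
    return d_loss
```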
- Steps involved in generator training (a minimal sketch follows this list):
- Generate fake images
- Compute loss on fake images with flipped labels (i.e., label them as real)
- Perform backpropagation and update the weights of the generator.
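A matching sketch of one generator update, under the same assumptions:

```python
def train_generator(batch_size):
    g_optimizer.zero_grad()

    # 1. Generate fake images
    z = torch.randn(batch_size, z_size)
    fake_images = G(z)

    # 2. Loss with flipped labels: the generator wants D to call its fakes real
    g_loss = real_loss(D(fake_images))

    # 3. Backpropagate and update G's weights
    g_loss.backward()
    g_optimizer.step()
    return g_loss
```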
- We shall plot generator and discriminator losses against the number of epochs.
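A small sketch of such a plot, assuming the per-epoch losses were collected during training in a list of `(d_loss, g_loss)` pairs named `losses` (the variable name is an assumption):

```python
import numpy as np
import matplotlib.pyplot as plt

losses = np.array(losses)  # shape: (num_epochs, 2)
plt.plot(losses[:, 0], label='Discriminator')
plt.plot(losses[:, 1], label='Generator')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```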
(Generated sample images: at the start of training vs. over time.)
- This way the generator starts out producing noisy images and learns to produce realistic digits over time.
- Since Ian Goodfellow and his colleagues at the University of Montreal introduced GANs, they have exploded in popularity. The number of applications is remarkable, and GANs have since been improved by many variants, such as CycleGAN, Conditional GAN, and Progressive GAN.