Autoencoders

Brenda Huppenthal edited this page Apr 11, 2024 · 7 revisions

📓Notebook for today's session: autoencoder.ipynb

Structure

source: Matthew Stewart in TowardsDataScience

Autoencoders are composed of two neural networks, the encoder and the decoder. The encoder compresses the input into a lower-dimensional code. This code is a compact representation of the input, also known as the latent-space representation. The decoder reconstructs the original input from the code.
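
The encoder/decoder split can be sketched in a few lines of NumPy. This is only an illustration of the shapes involved, not the notebook's implementation: the sizes (784 inputs, a 32-dimensional code), the `tanh` activation, and the function names `encode` and `decode` are all assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: a flattened 28x28 image and a 32-dimensional code.
input_dim, code_dim = 784, 32

# Randomly initialised (untrained) weights, used here only to show shapes.
W_enc = rng.normal(scale=0.01, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.01, size=(code_dim, input_dim))

def encode(x):
    """Compress a batch of inputs into lower-dimensional codes."""
    return np.tanh(x @ W_enc)

def decode(z):
    """Map codes back up to the dimensionality of the input."""
    return z @ W_dec

x = rng.random((5, input_dim))   # a batch of 5 placeholder inputs
z = encode(x)                    # latent-space representation
x_hat = decode(z)                # reconstruction of the input

print("code shape:", z.shape)
print("reconstruction shape:", x_hat.shape)
```

The code has far fewer entries per sample than the input, while the reconstruction has exactly the same shape as the input.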

An autoencoder is a self-supervised learning algorithm. Rather than training on labeled samples, an autoencoder is trained by feeding an input to the encoder, obtaining a reconstruction of that input from the decoder, and comparing the reconstruction to the original input. The loss function is the reconstruction error; one example is the L2 loss between an input image and the reconstructed image. Because the information is forced through a bottleneck of smaller dimensionality than the input, the encoder must learn a transformation to a code in the latent space that preserves informative features of the input while discarding uninformative ones. The decoder must then learn a function from the code back up to the dimensionality of the input space.
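
A minimal sketch of this training procedure, assuming a purely linear encoder and decoder, plain gradient descent, and synthetic data, is shown below. The data, sizes, learning rate, and step count are illustrative assumptions; a real model would normally use a deep learning framework and nonlinear layers. The point is that the training target is the input itself, so no labels appear anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (an assumption for illustration): 200 samples in 8 dimensions
# that lie close to a 2-dimensional subspace.
n, input_dim, code_dim = 200, 8, 2
X = rng.normal(size=(n, code_dim)) @ rng.normal(size=(code_dim, input_dim))
X += 0.05 * rng.normal(size=X.shape)

# Small random initial weights for a linear encoder and decoder.
W_enc = 0.1 * rng.normal(size=(input_dim, code_dim))
W_dec = 0.1 * rng.normal(size=(code_dim, input_dim))

learning_rate, steps = 0.02, 1000
losses = []
for _ in range(steps):
    Z = X @ W_enc            # encode: input -> code
    X_hat = Z @ W_dec        # decode: code -> reconstruction
    error = X_hat - X        # compared against the input itself, not a label
    losses.append(np.mean(error ** 2))   # L2 reconstruction loss

    # Gradients of the mean squared error with respect to both weight matrices.
    grad_out = 2.0 * error / error.size
    grad_W_dec = Z.T @ grad_out
    grad_W_enc = X.T @ (grad_out @ W_dec.T)

    W_enc -= learning_rate * grad_W_enc
    W_dec -= learning_rate * grad_W_dec

print(f"loss before training: {losses[0]:.4f}")
print(f"loss after training:  {losses[-1]:.4f}")
```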

If the bottleneck is smaller than the input, the autoencoder is undercomplete. If the bottleneck is larger than the input, the autoencoder is overcomplete and can learn to simply copy the input to the output, even if it is restricted to linear functions. The code size therefore involves a trade-off. If the latent code is too small, the autoencoder cannot capture enough information about the input to reconstruct it properly. If the latent code is too large, the representation is not as compressed as it could be, and the network can retain information that is irrelevant to the task of encoding. The code size is thus a hyperparameter to be tuned, and part of training is finding a useful compact representation.
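
The copying behaviour can be seen without any training. The sketch below builds linear encoder and decoder weights by hand (the helper `copy_weights` is hypothetical, written only for this example) so that each code unit copies one input coordinate. Once the code is at least as large as the input, the reconstruction is exact; with a smaller code, some coordinates are necessarily dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))   # 100 samples with 4 input features

def copy_weights(input_dim, code_dim):
    """Hand-built linear weights that copy as many input coordinates
    as fit into the code (hypothetical helper, not a trained model)."""
    k = min(input_dim, code_dim)
    W_enc = np.zeros((input_dim, code_dim))
    W_enc[:k, :k] = np.eye(k)
    W_dec = W_enc.T.copy()
    return W_enc, W_dec

errors = {}
for code_dim in (2, 4, 6):
    W_enc, W_dec = copy_weights(X.shape[1], code_dim)
    X_hat = X @ W_enc @ W_dec
    errors[code_dim] = np.mean((X_hat - X) ** 2)
    print(f"code size {code_dim}: reconstruction error {errors[code_dim]:.4f}")
```

A zero reconstruction error at code sizes 4 and 6 says nothing about whether the code is useful: the network has learned nothing about the structure of the data, which is why the bottleneck matters.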

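A standard autoencoder maps each training input to a single point in the latent space, and nothing in the training objective forces the regions between those points to be meaningful. As a result, there may be gaps in the latent space where latent codes do not decode to meaningful images.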

Properties

Autoencoders are:

  • data specific. They can only compress data similar to what they saw in the training set, because they have learned to represent features specific to that set. If you train an autoencoder on MNIST and test it on a picture of a cat, it will not perform well because it has not learned any features that capture what is important about a cat.
  • lossy. Because the input is compressed into a smaller dimension that approximates its important features, the decoder cannot in general recover the input perfectly, though a network with sufficient capacity can recover something close.
  • self-supervised. The data does not need to be labeled in order for the autoencoder to learn useful features for that data.
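
The first two properties can be illustrated with a small experiment on synthetic data. As a shortcut, the sketch below does not train by gradient descent: it uses the SVD of the training data to obtain a linear encoder and decoder, relying on the fact that the best linear autoencoder under squared error spans the same subspace as PCA. The data generator `make_data` and all sizes are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(basis, n=300, noise=0.05):
    """Sample points near the subspace spanned by the rows of `basis`
    (hypothetical generator for this illustration)."""
    coeffs = rng.normal(size=(n, basis.shape[0]))
    return coeffs @ basis + noise * rng.normal(size=(n, basis.shape[1]))

dim, code_dim = 10, 2
basis_a = rng.normal(size=(code_dim, dim))   # the kind of data seen in training
basis_b = rng.normal(size=(code_dim, dim))   # a different kind of data

train = make_data(basis_a)
test_similar = make_data(basis_a)
test_different = make_data(basis_b)

# Linear autoencoder from the top principal directions of the training data.
# The data is zero-mean by construction, so no centring step is included.
_, _, Vt = np.linalg.svd(train, full_matrices=False)
W_enc = Vt[:code_dim].T   # (dim, code_dim)
W_dec = Vt[:code_dim]     # (code_dim, dim)

def reconstruction_error(x):
    return np.mean((x @ W_enc @ W_dec - x) ** 2)

err_similar = reconstruction_error(test_similar)
err_different = reconstruction_error(test_different)
print(f"error on data like the training set: {err_similar:.4f}")
print(f"error on a different kind of data:   {err_different:.4f}")
```

The error on data resembling the training set is small but not zero (lossy), while the error on the other data is much larger (data specific).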

Uses

  • data denoising: we can train an autoencoder to denoise images by adding noise to the input images and then measuring the reconstruction loss against the original image.
  • image reconstruction: similar to denoising, we can train autoencoders to reconstruct images or even upscale or sharpen images by downscaling/blurring the training examples and comparing reconstruction loss to the original image.
  • image colorization: as above, but we convert the training examples to greyscale and compute the reconstruction loss against the original color image.
  • dimensionality reduction for data visualization: the network learns which features are important from the data, rather than extracting them deterministically through a method like PCA.
  • data compression: learning a useful latent representation which is smaller, but can be inflated back to the original size with the required fidelity.
  • feature extraction: because autoencoders learn useful representations of the input, the encoder can be trained in a self-supervised manner and then reused as a feature extractor for other tasks. For example, a second network could be trained to take the encoder's features and use them for image classification.
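
Denoising, upscaling, and colorization share one recipe: corrupt the clean image to form the network input, and keep the clean image as the reconstruction target. The sketch below shows only how such training pairs might be built. The function names, the noise level, the 2x average-pool downscale, and the random placeholder images are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(images, std=0.1):
    """Denoising input: add Gaussian noise and clip back to [0, 1]."""
    noisy = images + rng.normal(scale=std, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

def downscale(images, factor=2):
    """Upscaling input: average each factor x factor block of pixels."""
    n, h, w, c = images.shape
    blocks = images.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))

def to_greyscale(images):
    """Colorization input: average the colour channels into one channel."""
    return images.mean(axis=-1, keepdims=True)

# Placeholder batch of 8 RGB images with values in [0, 1].
clean = rng.random((8, 32, 32, 3))

tasks = {"denoise": add_noise, "upscale": downscale, "colorize": to_greyscale}

# Each pair is (corrupted input, clean target); the loss is always
# computed between the network output and the clean target.
pairs = {name: (corrupt(clean), clean) for name, corrupt in tasks.items()}

for name, (inputs, targets) in pairs.items():
    print(name, inputs.shape, "->", targets.shape)
```

For upscaling and colorization the input and target shapes differ, so the decoder's output layer would need to produce the target shape rather than mirror the input.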

Sources and Additional Resources