A four-layer autoencoder was developed for the TinyImageNet dataset, processing 600x600 color images. In this Jupyter Notebook (Task_1.ipynb), the data was preprocessed through class selection, resizing, and normalization. The model was trained for 10 epochs with the Adam optimizer, using mean squared error as the loss function. Its effectiveness was validated by the loss metrics and by visual comparison of original and reconstructed test images, demonstrating its capability in image reconstruction and feature extraction.
- Dataset: TinyImageNet, containing natural color images.
- Classes Selected: A subset of 10 classes, each providing 500 training and 50 validation samples.
- Specific Classes: 'mashed potato', 'bell pepper', 'alp', 'sewing machine', 'lemon', 'banana', 'umbrella', 'volleyball', 'torch', 'mushroom'.
- Allocation: 400 samples for training (40 from each class) and 100 samples for testing (10 from each class).
- Approach: Generator-based for efficient memory usage.
- Image Resizing: Adjusted to 600x600 pixels for the autoencoder.
- Normalization: Pixel values scaled to the [0, 1] range.
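The generator-based preprocessing above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names are made up, and a simple nearest-neighbour index mapping stands in for whatever library resizer the notebook actually uses.

```python
import numpy as np

def resize_nearest(img, size=(600, 600)):
    """Nearest-neighbour resize via index mapping (an illustrative
    stand-in for a library resizer such as PIL's Image.resize)."""
    h, w = img.shape[:2]
    rows = (np.arange(size[0]) * h // size[0]).clip(0, h - 1)
    cols = (np.arange(size[1]) * w // size[1]).clip(0, w - 1)
    return img[rows][:, cols]

def image_batch_generator(images, batch_size=32, size=(600, 600)):
    """Yield (x, x) batches of resized, [0, 1]-normalized images;
    an autoencoder's target equals its input."""
    n = len(images)
    while True:
        for start in range(0, n, batch_size):
            batch = np.stack([resize_nearest(im, size)
                              for im in images[start:start + batch_size]])
            batch = batch.astype("float32") / 255.0  # scale pixels to [0, 1]
            yield batch, batch
```

Yielding batches on demand, rather than materializing 400 resized 600x600 images at once, is what keeps memory usage low.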
- Layers: Four convolutional layers with 'relu' activation.
- Filters: Filter counts increase from 32 to 64, 128, and 256 across the layers.
- Pooling: Max-pooling layers follow each convolutional layer.
- Design: Mirrors the encoder with convolutional and upsampling layers.
- Activation: 'sigmoid' function in the final layer for image reconstruction.
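The encoder and decoder bullets above can be sketched in Keras (an assumption; the framework is inferred from the layer and optimizer names). Note that 600 is not divisible by 2^4, so an exactly mirrored decoder cannot restore 600x600 without padding or cropping; this sketch uses a 64x64 input so the shapes mirror exactly.

```python
from tensorflow.keras import layers, models

def build_autoencoder(input_shape=(64, 64, 3)):
    inp = layers.Input(shape=input_shape)
    x = inp
    # Encoder: four conv blocks, filters 32 -> 64 -> 128 -> 256,
    # each followed by max-pooling.
    for filters in (32, 64, 128, 256):
        x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = layers.MaxPooling2D(2)(x)
    # Decoder: mirrors the encoder with conv + upsampling blocks.
    for filters in (256, 128, 64, 32):
        x = layers.Conv2D(filters, 3, activation="relu", padding="same")(x)
        x = layers.UpSampling2D(2)(x)
    # Sigmoid output keeps reconstructed pixels in [0, 1].
    out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    return models.Model(inp, out)

autoencoder = build_autoencoder()
autoencoder.compile(optimizer="adam", loss="mse")
```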
- Optimizer: Adam.
- Loss Function: Mean squared error.
- Epochs: 10.
- Batch Size: 32.
- Validation: Used to monitor performance during training.
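The training configuration above amounts to a single compile-and-fit call. A minimal sketch, with a tiny stand-in model and random arrays in place of the real generators so it runs quickly; only the optimizer, loss, batch size, and validation monitoring reflect the configuration listed:

```python
import numpy as np
from tensorflow.keras import layers, models

# Tiny stand-in model; the notebook uses the four-layer autoencoder.
inp = layers.Input(shape=(8, 8, 3))
x = layers.Conv2D(4, 3, activation="relu", padding="same")(inp)
out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
model = models.Model(inp, out)
model.compile(optimizer="adam", loss="mse")

# Random arrays stand in for the training and validation generators.
x_train = np.random.rand(32, 8, 8, 3).astype("float32")
x_val = np.random.rand(8, 8, 8, 3).astype("float32")
history = model.fit(x_train, x_train,
                    epochs=1,            # the notebook trains for 10
                    batch_size=32,
                    validation_data=(x_val, x_val),
                    verbose=0)
```

Passing the validation set to `fit` is what produces the per-epoch `val_loss` used to monitor performance during training.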
This task (Task_2.ipynb) addresses the unique challenge of reconstructing two distinct images from their average. A parallel autoencoder architecture was implemented so that each output image is trained independently. The model was trained on the CIFAR-10 dataset: 2000 images were selected for training and 800 for testing, forming 1000 and 400 pairs respectively. The input to the model was the average of each image pair. Performance was evaluated using a Structural Similarity Index (SSIM)-based loss over 40 epochs.
- Dataset: CIFAR-10.
- Training Images: 2000 images from the training set to create 1000 pairs.
- Testing Images: 800 images from the test set to create 400 pairs.
- Input Generation: Averaging each pair of images to serve as the model input.
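The input generation step above can be sketched with numpy. The function name and the convention of pairing consecutive images are assumptions for illustration:

```python
import numpy as np

def make_averaged_pairs(images):
    """Pair consecutive images and average each pair.

    Returns (inputs, targets_a, targets_b): the averaged image is the
    model input, and the two originals are the reconstruction targets.
    """
    images = np.asarray(images, dtype=np.float32)
    a, b = images[0::2], images[1::2]   # split into pairs
    return (a + b) / 2.0, a, b

# e.g. 2000 CIFAR-10 training images -> 1000 averaged inputs
imgs = np.random.rand(2000, 32, 32, 3)
avg, a, b = make_averaged_pairs(imgs)
```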
- Encoder: Consists of convolutional layers with 'relu' activation, followed by max-pooling.
- Decoder: Includes convolutional layers with 'relu' activation, followed by upsampling, and a final 'sigmoid' activation layer for output.
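The parallel architecture can be sketched as two independent branches, each a conv/max-pool encoder mirrored by a conv/upsample decoder with a sigmoid output, both consuming the averaged image. The exact layer counts and filter sizes here are assumptions; each branch gets its own Adam optimizer, matching the two-optimizer setup described below.

```python
from tensorflow.keras import layers, models, optimizers

def build_branch(input_shape=(32, 32, 3)):
    """One autoencoder branch: conv/max-pool encoder, conv/upsample decoder."""
    inp = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)
    return models.Model(inp, out)

# Two branches take the same averaged image as input but are trained
# independently, one per target image, each with its own Adam optimizer.
branch_a = build_branch()
branch_b = build_branch()
branch_a.compile(optimizer=optimizers.Adam(), loss="mse")  # placeholder loss;
branch_b.compile(optimizer=optimizers.Adam(), loss="mse")  # the notebook uses SSIM
```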
- Optimizers: Two Adam optimizers, one per branch of the parallel autoencoder.
- Loss Function: Structural Similarity Index (SSIM)-based loss.
- Epochs: 40.
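SSIM by itself is a similarity score (1 for identical images), so to use it as a training loss it must be turned into something to minimize. The notebook's exact formulation is not shown; a common choice, sketched here under that assumption, is one minus the mean SSIM via TensorFlow's `tf.image.ssim`:

```python
import tensorflow as tf

def ssim_loss(y_true, y_pred):
    """SSIM-based reconstruction loss: SSIM is 1 for identical images,
    so 1 - mean SSIM is minimized toward 0."""
    return 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
```

`max_val=1.0` assumes pixels are already scaled to [0, 1], consistent with the sigmoid output layers.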