This project contains from-scratch implementations of 11 variants of the gradient descent algorithm, applied to the Boston Housing dataset. It follows the guidelines and explanations given in Sebastian Ruder's (2017) paper, *An overview of gradient descent optimization algorithms*. The repository includes a detailed mathematical explanation of every algorithm, as well as some of the reasons to prefer gradient descent over the closed-form least-squares (LSE) solution for linear regression problems.
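As a rough illustration of the kind of implementation covered in the notebook, below is a minimal sketch of batch gradient descent for linear regression on a mean squared error loss, next to the closed-form least-squares solution it can be compared against. Function names and hyperparameters here are illustrative and are not the repository's actual code.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, n_iters=1000):
    """Vanilla (batch) gradient descent for linear regression with an MSE loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        y_pred = X @ w + b
        error = y_pred - y
        # Gradients of the mean squared error with respect to w and b
        grad_w = (2.0 / n_samples) * (X.T @ error)
        grad_b = (2.0 / n_samples) * error.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def lse_solution(X, y):
    """Closed-form least-squares solution (normal equations) for comparison."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta[:-1], theta[-1]
```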
The implemented algorithms are the following (a short sketch of two of their update rules is given after the list):
- Vanilla Gradient Descent (aka Batch Gradient Descent).
- Stochastic Gradient Descent (SGD).
- Mini-Batch Gradient Descent (MBGD).
- SGD with Momentum.
- Nesterov Accelerated Gradient (NAG).
- Adagrad (Adaptive gradient).
- Adadelta.
- RMSprop.
- Adam (Adaptive moment estimation).
- Adamax.
- Nadam (Nesterov-accelerated Adaptive moment estimation).
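For intuition, the sketch below shows the parameter-update rules for two of these optimizers, SGD with Momentum and Adam, following the formulations in Ruder (2017). Variable names and default hyperparameters are illustrative, not the repository's actual code.

```python
import numpy as np

def momentum_update(w, grad, velocity, lr=0.01, gamma=0.9):
    """SGD with Momentum: accumulate an exponentially decaying velocity of past gradients."""
    velocity = gamma * velocity + lr * grad
    return w - velocity, velocity

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: bias-corrected first and second moment estimates of the gradient (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad       # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2  # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```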
The repository contains the following files & directories:
- Notebooks/Optimization_For_Machine_Learning.ipynb: This notebook contains the full implementation details for each algorithm, as well as a brief graphical comparison of their convergence speeds.
For any information, feedback, or questions, please contact me.