- Weight initialization methods
  - Random initialization
  - Xavier initialization
  - He initialization
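A minimal NumPy sketch of the three initializers (function names are illustrative, not necessarily the repo's actual API):

```python
import numpy as np

def random_init(fan_in, fan_out, scale=0.01):
    """Plain random initialization: small Gaussian weights."""
    return np.random.randn(fan_in, fan_out) * scale

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid layers."""
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))

def he_init(fan_in, fan_out):
    """He: variance 2 / fan_in, suited to ReLU layers."""
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```

The scaled variances keep activation magnitudes roughly constant across layers, which avoids vanishing/exploding signals at the start of training.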
- Activation functions and their derivatives
  - Identity
  - Sigmoid
  - Softmax
  - Tanh
  - ReLU
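A sketch of the listed activations and their derivatives in NumPy (names are illustrative; softmax's derivative is omitted here because it is usually folded into the loss gradient):

```python
import numpy as np

def identity(z):
    return z

def identity_prime(z):
    return np.ones_like(z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    return (z > 0).astype(z.dtype)

def softmax(z):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```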
- Loss functions and their derivatives
  - Mean Squared Error
  - Log-likelihood
  - Cross Entropy
- Gradient descent optimizers
  - Stochastic mini-batch gradient descent
  - Momentum-based gradient descent
  - Nesterov accelerated gradient descent
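The three update rules can be sketched as single-step functions (a minimal illustration; `grad_fn` and the returned `(w, v)` pairs are assumptions about the interface, not the repo's actual one):

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Vanilla step: move against the mini-batch gradient."""
    return w - lr * grad

def momentum_step(w, v, grad, lr, mu=0.9):
    """Classical momentum: accumulate a velocity, then move along it."""
    v = mu * v - lr * grad
    return w + v, v

def nesterov_step(w, v, grad_fn, lr, mu=0.9):
    """Nesterov: evaluate the gradient at the look-ahead point w + mu*v."""
    lookahead = w + mu * v
    v = mu * v - lr * grad_fn(lookahead)
    return w + v, v
```

The only difference between momentum and Nesterov is where the gradient is evaluated: at the current weights versus at the point the velocity is about to carry them to.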
- README
- ...
- Use validation data for hyper-parameter tuning
  - Hyper-parameters: epochs, mini-batch size, learning rate, momentum
- Plots for monitoring loss and accuracy over epochs
  - Takes the dataset as an argument (options: training_data, validation_data, test_data)
- Regularization techniques: L1, L2, dropout
- Add optimizers: Adam, RMSProp
- CNN (convolutional neural network)
- RBF NN (radial basis function network)
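For the planned Adam optimizer, one update step could look like the following sketch (the function name and the `(m, v, t)` state tuple are assumptions, not an existing interface):

```python
import numpy as np

def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; state holds the moment estimates and step count."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for the zero-initialized moments
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, (m, v, t)
```

RMSProp is the same update without the first-moment accumulator and bias corrections, so it drops out of this code as a special case.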