- List of Symbols
- Neuron Equations
- Training Algorithm
- Activation Functions
- Cost Functions
- Normalization
- Regularization
- Optimization Algorithms
- Neural Network Implementation
- Q-Learning
- References
In order to reduce the error of the network, the weights and biases are adjusted to minimize a given cost function
The chain rule allows the derivatives of the cost function to be separated into components:
The terms
where typically:
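As a concrete sketch of this decomposition (using a common backpropagation notation that may not match the symbols defined above), the gradient of the cost $C$ with respect to a weight $w$ feeding a neuron with weighted input $z$ and activation $a$ can be written as:

$$
\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}
$$

Each factor is then evaluated layer by layer, starting from the output layer and moving backwards through the network.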
Normalization is the process of rescaling the input layer into a dimensionless, comparable range, which addresses a problem known as Internal Covariate Shift.
With:
Where
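A common normalization choice, shown here only as an illustration (the symbols are assumed and may differ from the ones defined above), is standardization of each input feature:

$$
\hat{x} = \frac{x - \mu}{\sigma}
$$

where $\mu$ and $\sigma$ are the mean and standard deviation of that feature over the training set.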
Extra terms are added to the cost function in order to address overfitting.
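As an illustration of such a term (L2 regularization, also called weight decay, with symbols assumed here), the squared magnitudes of the weights are added to the original cost $C_0$:

$$
C = C_0 + \frac{\lambda}{2n} \sum_{w} w^2
$$

where $\lambda$ controls the strength of the penalty and $n$ is the number of training samples, so that large weights are discouraged.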
Network parameters are updated after every training batch
It is a gradient descent performed after every training sample
Network parameters are updated after every training batch
where:
where typically:
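As a minimal sketch of the basic update rule (plain gradient descent, with symbols assumed here rather than taken from the list above), every weight is moved a small step against its gradient:

$$
w \leftarrow w - \eta \, \frac{\partial C}{\partial w}
$$

where $\eta$ is the learning rate. Methods such as Adam refine this step using running averages of the gradient and of its square.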
%%{init: {"class": {"hideEmptyMembersBox": true}}}%%
classDiagram
ActivationFunction <|-- StepActivationFunction
ActivationFunction <|-- LinearActivationFunction
ActivationFunction <|-- Etc
<<Interface>> ActivationFunction
ActivationFunction : +virtual computeOutput(double intermediateQuantity) double
ActivationFunction : +virtual computeOutputDerivative(double intermediateQuantity) double
StepActivationFunction : +computeOutput(double intermediateQuantity) double
StepActivationFunction : +computeOutputDerivative(double intermediateQuantity) double
LinearActivationFunction : +computeOutput(double intermediateQuantity) double
LinearActivationFunction : +computeOutputDerivative(double intermediateQuantity) double
Etc : ...
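A minimal C++ sketch of what the interface above could look like, together with one illustrative implementation (the sigmoid class is an assumption added here, not part of the diagram):

```cpp
#include <cmath>

// Interface mirroring the ActivationFunction diagram above.
class ActivationFunction {
public:
    virtual ~ActivationFunction() = default;
    virtual double computeOutput(double intermediateQuantity) const = 0;
    virtual double computeOutputDerivative(double intermediateQuantity) const = 0;
};

// Illustrative concrete implementation: the logistic sigmoid.
class SigmoidActivationFunction : public ActivationFunction {
public:
    double computeOutput(double intermediateQuantity) const override {
        return 1.0 / (1.0 + std::exp(-intermediateQuantity));
    }
    double computeOutputDerivative(double intermediateQuantity) const override {
        const double s = computeOutput(intermediateQuantity);
        return s * (1.0 - s); // sigma'(z) = sigma(z) * (1 - sigma(z))
    }
};
```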
classDiagram
CostFunction <|-- QuadraticCostFunction
CostFunction <|-- EntropyCostFunction
CostFunction <|-- Etc
<<Interface>> CostFunction
CostFunction : +virtual computeCost(double output, double target) double
CostFunction : +virtual computeCostDerivative(double output, double target) double
EntropyCostFunction : +computeCost(double output, double target) double
EntropyCostFunction : +computeCostDerivative(double output, double target) double
QuadraticCostFunction : +computeCost(double output, double target) double
QuadraticCostFunction : +computeCostDerivative(double output, double target) double
Etc : ...
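Along the same lines, a sketch of the quadratic cost for a single output value, following the signatures in the diagram (the 1/2 factor is a common convention assumed here so that the derivative stays simple):

```cpp
// Interface mirroring the CostFunction diagram above.
class CostFunction {
public:
    virtual ~CostFunction() = default;
    virtual double computeCost(double output, double target) const = 0;
    virtual double computeCostDerivative(double output, double target) const = 0;
};

// Quadratic cost for one output neuron.
class QuadraticCostFunction : public CostFunction {
public:
    double computeCost(double output, double target) const override {
        const double diff = output - target;
        return 0.5 * diff * diff;
    }
    double computeCostDerivative(double output, double target) const override {
        return output - target; // derivative of 0.5 * (output - target)^2 w.r.t. output
    }
};
```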
classDiagram
OptimizationAlgorithm <|-- GradientDescendOptimizationAlgorithm
OptimizationAlgorithm <|-- AdamOptimizationAlgorithm
OptimizationAlgorithm <|-- Etc
<<Interface>> OptimizationAlgorithm
OptimizationAlgorithm : +virtual computeWeightCorrection(vector~double~ batchOutputs, vector~double~ batchTargets) double
GradientDescendOptimizationAlgorithm : +computeWeightCorrection(vector~double~ batchOutputs, vector~double~ batchTargets) double
AdamOptimizationAlgorithm : +computeWeightCorrection(vector~double~ batchOutputs, vector~double~ batchTargets) double
Etc : ...
classDiagram
class Neuron
Neuron: +vector~double~ weights
Neuron: +shared_ptr~ActivationFunction~ activationFunction
class Layer
Layer: +vector~Neuron~ neurons
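A sketch of how these two classes could be laid out in C++, with an illustrative `computeOutput` helper added here (the helper and the bias handling are assumptions; the example reuses the `ActivationFunction` interface sketched above):

```cpp
#include <memory>
#include <numeric>
#include <vector>

// Mirrors the Neuron / Layer diagram above.
struct Neuron {
    std::vector<double> weights;                            // one weight per input
    std::shared_ptr<ActivationFunction> activationFunction; // see the interface sketch above

    // Illustrative helper: weighted sum of the inputs followed by the activation.
    double computeOutput(const std::vector<double>& inputs) const {
        const double z = std::inner_product(weights.begin(), weights.end(), inputs.begin(), 0.0);
        return activationFunction->computeOutput(z);
    }
};

struct Layer {
    std::vector<Neuron> neurons;
};
```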
Given a model state
Starting with the Bellman equation:
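A common discrete form of it, with symbols assumed here for illustration, is:

$$
Q(s, a) = r + \gamma \, \max_{a'} Q(s', a')
$$

where $r$ is the immediate reward, $\gamma$ the discount factor and $s'$ the state reached after taking action $a$ in state $s$.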
Which states that the
Every time the model is run, each step or transition is stored as part of the experience, or replay buffer:
classDiagram
class Transition
Transition: +vector~double~ state
Transition: +optional~vector~double~~ nextState
Transition: +double reward
Transition: +int actionId
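A direct C++ translation of this diagram might look as follows (the `ReplayBuffer` alias is an assumption added for illustration):

```cpp
#include <optional>
#include <vector>

// One stored step of experience, mirroring the Transition diagram above.
struct Transition {
    std::vector<double> state;                    // state observed before taking the action
    std::optional<std::vector<double>> nextState; // empty when the episode terminated
    double reward;
    int actionId;
};

// Hypothetical container for the experience / replay buffer.
using ReplayBuffer = std::vector<Transition>;
```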
After every transition, we use our network
- http://neuralnetworksanddeeplearning.com/
- https://www.analyticsvidhya.com/blog/2020/04/feature-scaling-machine-learning-normalization-standardization/
- https://comsm0045-applied-deep-learning.github.io/Slides/COMSM0045_05.pdf
- https://towardsdatascience.com/optimizers-for-training-neural-network-59450d71caf6
- https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
- https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions
- https://arxiv.org/abs/1502.03167
- https://www.samyzaf.com/ML/rl/qmaze.html