Optimizer.py
SGD optimizer to update the parameters \(W=W-\eta\nabla_WE(L)\), where \(\eta\) is the learning rate
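A minimal sketch of the SGD update above, not the actual contents of Optimizer.py. The function name `sgd_step`, the list-of-floats parameter layout, and the toy loss \(L=\sum_i w_i^2\) are assumptions made for illustration.

```python
def sgd_step(W, grad, lr=0.1):
    """One SGD update, W <- W - lr * grad, applied element-wise.

    W and grad are flat lists of floats (a hypothetical layout);
    lr plays the role of eta in the formula.
    """
    return [w - lr * g for w, g in zip(W, grad)]


# Toy problem (assumed for illustration): L(W) = sum(w^2), so grad = 2 * W.
W = [1.0, -2.0]
for _ in range(100):
    grad = [2.0 * w for w in W]
    W = sgd_step(W, grad, lr=0.1)
# Each step scales W by (1 - 2 * lr) = 0.8, so W shrinks toward the minimum at 0.
```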
Momentum \(v=\beta_1{v}+\left(1-\beta_1\right)\nabla_WL\)
With momentum (coefficient \(\beta_1\)), the SGD update becomes \(W=W-\eta{v}\)
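The two momentum formulas can be combined into a single step, sketched below. The name `momentum_step`, the default values, and the convention of starting with `v` at zero are assumptions, not taken from Optimizer.py.

```python
def momentum_step(W, v, grad, lr=0.1, beta1=0.9):
    """One SGD-with-momentum update on flat lists of floats.

    v <- beta1 * v + (1 - beta1) * grad   (running average of gradients)
    W <- W - lr * v
    """
    v = [beta1 * vi + (1.0 - beta1) * g for vi, g in zip(v, grad)]
    W = [w - lr * vi for w, vi in zip(W, v)]
    return W, v


# Single step from a zero-initialised velocity (an assumed starting point).
W, v = momentum_step(W=[1.0], v=[0.0], grad=[2.0])
```

Because `v` is an exponential average, early steps are damped by the factor \(1-\beta_1\); the state `v` has to be carried from one call to the next.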
Adam optimizer to update the parameters \(W=W-\eta\frac{v}{\sqrt{G+\epsilon}}\), where \(v\) is the momentum term above and \(G\) is a running average of the squared gradients
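A hedged sketch of the Adam update as written above. The notes do not give the update rule for \(G\); the sketch assumes the usual exponential average of squared gradients with a second coefficient `beta2`. It follows the formula literally (with \(\epsilon\) inside the square root) and leaves out bias correction, which the formula does not include. The name `adam_step` and the defaults are illustrative.

```python
import math


def adam_step(W, v, G, grad, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style update on flat lists of floats.

    v <- beta1 * v + (1 - beta1) * grad        (first moment, as in the momentum rule)
    G <- beta2 * G + (1 - beta2) * grad ** 2   (assumed rule for G)
    W <- W - lr * v / sqrt(G + eps)
    """
    v = [beta1 * vi + (1.0 - beta1) * g for vi, g in zip(v, grad)]
    G = [beta2 * Gi + (1.0 - beta2) * g * g for Gi, g in zip(G, grad)]
    W = [w - lr * vi / math.sqrt(Gi + eps) for w, vi, Gi in zip(W, v, G)]
    return W, v, G


# Single step from zero-initialised state (an assumed starting point).
W, v, G = adam_step(W=[1.0], v=[0.0], G=[0.0], grad=[2.0])
```

Dividing by \(\sqrt{G+\epsilon}\) rescales each parameter's step by its recent gradient magnitude, so parameters with large gradients take proportionally smaller steps.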