PyTorch Optimizer: Optimizer.py

SGD optimizer to update the parameters with the gradient of the loss \(L\): \(W=W-\eta\nabla_W L\), where \(\eta\) is the learning rate
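A minimal sketch of this update in PyTorch, applying \(W=W-\eta\nabla_W L\) by hand to a single weight matrix (the tensors and learning rate below are illustrative, not taken from Optimizer.py):

```python
import torch

torch.manual_seed(0)
W = torch.randn(3, 2, requires_grad=True)  # parameters to learn
x = torch.randn(5, 3)                      # dummy inputs
y = torch.randn(5, 2)                      # dummy targets
eta = 0.1                                  # learning rate

loss = ((x @ W - y) ** 2).mean()  # loss L
loss.backward()                   # fills W.grad with grad_W L
with torch.no_grad():
    W -= eta * W.grad             # W = W - eta * grad_W L
W.grad.zero_()                    # clear the gradient for the next step
```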

Momentum keeps an exponentially weighted moving average of the gradient: \(v=\beta_1{v}+\left(1-\beta_1\right)\nabla_WL\)

With the momentum term \(\beta_1\) included, the update becomes \(W=W-\eta{v}\), as sketched below.
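A minimal sketch of SGD with this EMA-style momentum (this is the formulation written above, not the default momentum form of torch.optim.SGD; all names are illustrative):

```python
import torch

torch.manual_seed(0)
W = torch.randn(3, 2, requires_grad=True)
v = torch.zeros_like(W)        # momentum buffer
x = torch.randn(5, 3)
y = torch.randn(5, 2)
eta, beta1 = 0.1, 0.9

for _ in range(10):
    loss = ((x @ W - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        v = beta1 * v + (1 - beta1) * W.grad  # v = beta1*v + (1-beta1)*grad
        W -= eta * v                          # W = W - eta*v
    W.grad.zero_()
```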

The Adam optimizer also tracks a moving average of the squared gradient, \(G=\beta_2{G}+\left(1-\beta_2\right)\left(\nabla_WL\right)^2\), and updates the parameters as \(W=W-\eta\frac{v}{\sqrt{G+\epsilon}}\) (bias correction of \(v\) and \(G\) is omitted here).
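A minimal sketch of this simplified Adam step (no bias correction, \(\epsilon\) inside the square root, exactly as written above; the built-in torch.optim.Adam additionally bias-corrects both moments):

```python
import torch

torch.manual_seed(0)
W = torch.randn(3, 2, requires_grad=True)
v = torch.zeros_like(W)        # first moment: EMA of gradients
G = torch.zeros_like(W)        # second moment: EMA of squared gradients
x = torch.randn(5, 3)
y = torch.randn(5, 2)
eta, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8

for _ in range(10):
    loss = ((x @ W - y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        g = W.grad
        v = beta1 * v + (1 - beta1) * g       # v = beta1*v + (1-beta1)*grad
        G = beta2 * G + (1 - beta2) * g**2    # G = beta2*G + (1-beta2)*grad^2
        W -= eta * v / torch.sqrt(G + eps)    # W = W - eta * v / sqrt(G+eps)
    W.grad.zero_()
```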