optimizer

The optimizer config group selects the torch.optim optimizer that updates the model parameters during training. For now, only a single optimizer with a single parameter group is supported.

adam

The default optimizer in mml.

_target_
default: Adam
  • the Adam optimizer

  • see the torch.optim.Adam documentation

betas
default: [ 0.9, 0.999 ]
  • coefficients used for computing running averages of the gradient and its square

lr
default: 0.0005
  • the initial learning rate

eps
default: 1e-08
  • term added to the denominator to improve numerical stability

weight_decay
default: 0
  • weight decay coefficient (L2 penalty)
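
To make the role of each parameter concrete, the following is a minimal sketch of one Adam update for a single scalar parameter, written in plain Python. It follows the update rule described in the torch.optim.Adam documentation and is not mml's or PyTorch's actual implementation; the function name adam_step and the scalar setup are assumptions made for illustration only.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.0005, betas=(0.9, 0.999),
              eps=1e-08, weight_decay=0.0):
    """One Adam update for a scalar parameter (hypothetical helper).

    m and v are the running averages of the gradient and its square,
    t is the 1-based step counter.
    """
    beta1, beta2 = betas
    # weight_decay adds an L2 penalty term to the gradient.
    if weight_decay != 0:
        grad = grad + weight_decay * param
    # betas control the running averages of the gradient and its square.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # eps is added to the denominator for numerical stability.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step with the defaults listed above.
param, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's magnitude.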

sgd

_target_
default: SGD
  • stochastic gradient descent optimizer

  • see the torch.optim.SGD documentation

lr
default: 0.0005
  • the initial learning rate

momentum
default: 0
  • momentum factor

weight_decay
default: 0
  • weight decay coefficient (L2 penalty)

dampening
default: 0
  • dampening for momentum
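
The interaction of momentum, dampening and weight_decay can be sketched in plain Python for a single scalar parameter. The sketch follows the update rule described in the torch.optim.SGD documentation (without Nesterov momentum) and is not mml's or PyTorch's actual implementation; the function name sgd_step and the scalar setup are assumptions made for illustration only.

```python
def sgd_step(param, grad, buf=None, lr=0.0005, momentum=0.0,
             weight_decay=0.0, dampening=0.0):
    """One SGD update for a scalar parameter (hypothetical helper).

    buf is the momentum buffer; it is None before the first step.
    """
    # weight_decay adds an L2 penalty term to the gradient.
    if weight_decay != 0:
        grad = grad + weight_decay * param
    if momentum != 0:
        if buf is None:
            # The first step initializes the buffer with the raw gradient.
            buf = grad
        else:
            # dampening scales down the contribution of the new gradient.
            buf = momentum * buf + (1 - dampening) * grad
        grad = buf
    return param - lr * grad, buf


# With the defaults listed above this reduces to plain gradient descent.
param, buf = sgd_step(param=1.0, grad=2.0)

# Two steps with momentum: the second step reuses the buffer of the first.
param, buf = sgd_step(param=1.0, grad=2.0, momentum=0.9)
param, buf = sgd_step(param=param, grad=2.0, buf=buf, momentum=0.9)
```

With momentum left at its default of 0, the dampening value has no effect, since the momentum buffer is never used.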