optimizer

The optimizer config group determines which torch.optim optimizer updates the model parameters during training. For now, only a single optimizer with a single parameter group is supported.

adam

The default optimizer in mml.

_target_

default: Adam

the Adam optimizer
see torch.optim.Adam

betas

default: [ 0.9, 0.999 ]

coefficients used for computing running averages of the gradient and its square

lr

default: 0.0005

the initial learning rate

eps

default: 1e-08

term added to the denominator to improve numerical stability

weight_decay

default: 0

weight decay coefficient (L2 penalty)
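
To illustrate how the options above interact, here is a minimal pure-Python sketch of a single Adam update on one scalar parameter, following the update rule described in the PyTorch documentation. The function name adam_step and the example values for param and grad are hypothetical and not part of mml; the keyword defaults mirror the defaults listed above.

```python
import math


def adam_step(param, grad, m, v, step,
              lr=0.0005, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0):
    """One Adam update for a single scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    # weight_decay adds an L2 penalty term to the gradient
    grad = grad + weight_decay * param
    # betas control the running averages of the gradient and its square
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # bias correction compensates for the zero initialization of m and v
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    # eps keeps the denominator away from zero
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# hypothetical starting point: parameter 1.0 with gradient 0.5, first step
new_param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(new_param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's magnitude.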

sgd

_target_

default: SGD

stochastic gradient descent optimizer
see torch.optim.SGD

lr

default: 0.0005

the initial learning rate

momentum

default: 0

momentum factor

weight_decay

default: 0

weight decay coefficient (L2 penalty)

dampening

default: 0

dampening for momentum
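
As a rough illustration of how momentum, dampening and weight_decay enter the update, the following pure-Python sketch applies SGD steps to one scalar parameter, following the update rule described in the PyTorch documentation. The function name sgd_step and the example values are hypothetical and not part of mml; the keyword defaults mirror the defaults listed above.

```python
def sgd_step(param, grad, buf=None,
             lr=0.0005, momentum=0.0, weight_decay=0.0, dampening=0.0):
    """One SGD update for a single scalar parameter (illustrative only)."""
    # weight_decay adds an L2 penalty term to the gradient
    grad = grad + weight_decay * param
    if momentum != 0:
        if buf is None:
            # first step: the momentum buffer starts as the plain gradient
            buf = grad
        else:
            # dampening scales down the contribution of the new gradient
            buf = momentum * buf + (1 - dampening) * grad
        grad = buf
    param = param - lr * grad
    return param, buf


# with the defaults (momentum=0) this is plain gradient descent
plain_param, _ = sgd_step(param=1.0, grad=0.5)

# hypothetical non-default setting: two steps with momentum=0.9
p1, buf1 = sgd_step(param=1.0, grad=0.5, momentum=0.9)
p2, buf2 = sgd_step(param=p1, grad=0.5, buf=buf1, momentum=0.9)
print(plain_param, p2)
```

With the default momentum of 0, the dampening option has no effect, since no momentum buffer is kept.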