CLI

This help page gives a detailed overview of the MML CLI. The basic call pattern is

mml [mode] [overrides] [hydra.overrides] [hydra-flags]

In addition, mml-core provides the following CLIs, which take no arguments:

mml-env-setup - sets up an mml.env file at your current location

mml-copy-conf - sets up mml configs outside the mml-core package

mode

Available modes include:

create - Installs datasets and tasks on the workstation.

pp - Preprocesses tasks with the given “preprocessing”.

train - Trains, tests and/or predicts (single or multi-task).

post - Postprocessing via calibration and ensembling.

info - Provides information on tasks, trained models, etc.

clean - May be used to remove artefacts from mml.

upgrade - Used to migrate mml results and data upwards.

downgrade - Used to migrate mml results and data downwards.

Note that mml plugins may add further modes. You can find more example usages for modes at Modes and specific details on mode configuration at mode.

overrides

MML offers a flexible system to override experiment configuration from the command line. It is powered by Hydra, and more details on the syntax can be found in the respective documentation. In a nutshell, configuration options are grouped, and one can either override a whole group of options with an existing config file (e.g. lr_scheduler=cosine) or set individual values inside a config group (e.g. lr_scheduler.verbose=false).

Note

Hydra configuration is presented in a simplified manner above. There are special cases of combining config files (e.g. callbacks=[early,mixup]), accessing nested config files (e.g. loss/mlcls=ce) or adding new keys to a configuration (e.g. +lr_scheduler.eta_min=0.01).
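As a rough illustration of the call pattern and the override forms mentioned above, the following Python sketch assembles an mml command line and quotes it for a POSIX shell. The helper build_mml_command is hypothetical and not part of mml. The override strings are copied from the examples in the text, and whether a particular combination is valid depends on the configs available in your installation. Quoting matters because square brackets are glob characters in many shells.

```python
import shlex


def build_mml_command(mode, overrides=(), hydra_flags=()):
    """Assemble 'mml [mode] [overrides] [hydra-flags]' as a shell-safe string.

    Hypothetical helper for illustration only; it is not part of mml.
    """
    args = ["mml", mode, *overrides, *hydra_flags]
    # shlex.join quotes every argument that contains shell-special characters
    return shlex.join(args)


if __name__ == "__main__":
    command = build_mml_command(
        "train",
        overrides=[
            "lr_scheduler=cosine",         # select a config file for a whole group
            "lr_scheduler.verbose=false",  # set one value inside a group
            "callbacks=[early,mixup]",     # combine several config files
            "loss/mlcls=ce",               # access a nested config group
        ],
        hydra_flags=["--cfg=job"],         # print the compiled config, do not run
    )
    print(command)
```

In the printed command only the callbacks override is wrapped in single quotes, since it is the only argument containing characters that a shell would otherwise interpret.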

The following is a list of mml config groups. To see all currently available options for a group, call mml --help.

arch - determines model architecture

augmentations - sets the pipeline for image augmentations during training and general normalization strategy

callbacks - determines lightning callbacks during training

compile - BETA controls pytorch 2.0 torch.compile behaviour

hpo - hyperparameter optimization methodology

hydra - hydra internal configs, see hydra.overrides below

logging - experiment logging and other notification settings

loss - training loss

lr_scheduler - learning rate schedulers

metrics - metrics to measure model performance

mode - central component defining the scheduler and other corresponding runtime settings

optimizer - network training optimizer

preprocessing - image preprocessing pipeline (applied while training and predicting)

reuse - determines the reuse of previous results as well as clean up of intermediates

sampling - sets sampling strategy

search_space - defines the search space during hyperparameter optimization

sys - system properties (manages to run on different hardware)

tasks - pre-compiled task lists, pivot tasks and task tagging options

trainer - lightning trainer options

tta - BETA test time augmentation

tune - tuning options with lightning tuner

The configuration groups and overrides will be compiled to a final single job configuration (or multiple in --multirun mode as described below). The final configuration can be displayed with the help of hydra-flags (see below) and is also stored in the run folder inside the .hydra subdir.

main config file

Furthermore, config_mml.yaml (the main config file) specifies the defaults for each of these groups as well as some other default values. The few top-level options are listed here:

proj

default: default

the project name, this will be used as a top-level folder name in the results directory
it is recommended to separate independent experiments into different projects
the reuse functionality allows cross-project reusability

seed

default: 42

integer to seed the builtin random module, numpy and torch randomness through lightning.seed_everything
will be applied before every scheduler step (so potentially multiple times per mml call)
seeding of dataloader workers is taken care of by lightning
set to False or 0 if random seeding is desired; note that this reduces reproducibility
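To make the effect of re-seeding before every scheduler step more concrete, here is a minimal sketch that uses only Python's builtin random module. It is a stand-in for what lightning.seed_everything does across random, numpy and torch. The function run_step and the way it treats a falsy seed are assumptions made for illustration and are not taken from mml.

```python
import random


def run_step(seed):
    """Simulate one scheduler step that is re-seeded before it starts.

    Hypothetical stand-in: mml seeds random, numpy and torch through
    lightning.seed_everything; only the builtin random module is used here.
    """
    if seed:  # assumption: a falsy seed (False or 0) means "do not seed"
        random.seed(seed)
    return [random.randint(0, 99) for _ in range(3)]


if __name__ == "__main__":
    # With a fixed seed every step starts from the same random state,
    # so two consecutive steps draw identical numbers.
    print(run_step(42) == run_step(42))
    # Without seeding, consecutive steps continue the global random stream.
    print(run_step(0), run_step(0))
```

One consequence worth keeping in mind: with a fixed seed, two steps of the same mml call do not receive independent random streams; each starts from the same state.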

allow_gpu

default: True

whether to allow gpu usage outside lightning training, e.g. for task creation or feature extraction
for all lightning related accelerator settings see trainer

continue

default: False

the continue flag allows resuming aborted or interrupted mml experiments
it will skip already completed commands in the scheduler and load the latest checkpoint of any model training
either set to ‘latest’ or specify a run directory by date and time
note that activating this will ignore all currently given CLI options (except the proj) and load the original config

use_best_params

default: False

automatically load the best parameters of a previous hpo study, overwriting the currently specified values
supports two kinds of usage; the first is a minimal but restricted way without persistent storage
for this, provide the hpo identifier within the project, in the format %Y-%m-%d_%H-%M-%S_%f (e.g. 2024-12-03_12-28-46_362374)
this only works for studies that did not fail, since the required summary is only generated at the end of the sweep
it also requires setting the current proj to the one that conducted the hpo search
alternatively, set it to the study_name; this requires a persistent hpo.storage (e.g. see mml-sql) to load the optuna.Study
this also works for partly failed or interrupted studies and for cross-project studies
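The identifier format quoted above is a standard strftime/strptime pattern, so a value can be checked against it with Python's datetime module. The sketch below is only an illustration of that format; parse_hpo_identifier is a hypothetical helper, and how mml itself distinguishes an identifier from a study name is not stated here.

```python
from datetime import datetime

HPO_ID_FORMAT = "%Y-%m-%d_%H-%M-%S_%f"  # format string quoted in the text above


def parse_hpo_identifier(identifier):
    """Return the timestamp encoded in an identifier, or None if it does not match.

    Hypothetical helper for illustration only; it is not part of mml.
    """
    try:
        return datetime.strptime(identifier, HPO_ID_FORMAT)
    except ValueError:
        return None


if __name__ == "__main__":
    # The example identifier from the text parses to a timestamp.
    print(parse_hpo_identifier("2024-12-03_12-28-46_362374"))
    # An arbitrary study name (hypothetical) does not match the format.
    print(parse_hpo_identifier("my_study"))
```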

hydra.overrides

The same override style also lets you alter configurations that directly influence the internal behaviour of hydra. The most common use case is probably hydra.verbose=true, which enters verbose mode and prints all logged debug messages. You can find more config groups via mml --help or in the hydra documentation.

hydra-flags

hydra offers some functionality that is inherited by mml. All existing options are displayed once more if you call mml --help, but here are some noteworthy ones:

--cfg=job - print the compiled config (without running mml)

--multirun - used for hyperparameter search, starts multiple jobs

--info - information on the defaults tree, config search paths, etc

More info in the hydra docs.