Preprocess mode

Preprocessing tasks is optional. More precisely, the preprocess.pipeline config attribute determines the exact steps applied to images (and masks, …) before any further data augmentation. Preprocessing steps should be deterministic (no randomness involved). When any data processing mode (e.g. train) is called with a preprocessing option (default: default), mml checks whether the task has already been preprocessed with this pipeline and, if so, loads the samples directly from there. If not, mml simply preprocesses the raw images (and masks, …) on the fly. To sum up:

  • during data exploration one can simply rely on on-the-fly preprocessing

  • during number crunching, calling pp beforehand reduces the computation needed at training time, at the price of more occupied disk space
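
The trade-off above can be illustrated with a small Python sketch. This is not mml's actual implementation: the function names (pipeline_id, apply_pipeline, load_sample, preprocess_task), the on-disk layout, and the toy pipeline steps are all hypothetical, and plain numbers stand in for images. It only shows the general idea of looking up a stored result keyed by the pipeline configuration and falling back to on-the-fly processing otherwise.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def pipeline_id(steps):
    """Derive a stable identifier from the pipeline configuration (hypothetical scheme)."""
    canonical = json.dumps(steps, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def apply_pipeline(value, steps):
    """Toy deterministic pipeline: each step scales or shifts a number."""
    for step in steps:
        if step["name"] == "scale":
            value = value * step["factor"]
        elif step["name"] == "shift":
            value = value + step["offset"]
        else:
            raise ValueError(f"unknown step: {step['name']}")
    return value


def load_sample(raw_value, sample_name, steps, cache_root):
    """Return (processed_value, source); source is 'cache' or 'on-the-fly'."""
    cached = Path(cache_root) / pipeline_id(steps) / f"{sample_name}.json"
    if cached.exists():
        # Stored result found for exactly this pipeline: skip the computation.
        return json.loads(cached.read_text()), "cache"
    # No stored result: run the pipeline now, without writing anything to disk.
    return apply_pipeline(raw_value, steps), "on-the-fly"


def preprocess_task(raw_samples, steps, cache_root):
    """Persist every processed sample so later runs can skip the pipeline."""
    target = Path(cache_root) / pipeline_id(steps)
    target.mkdir(parents=True, exist_ok=True)
    for name, raw_value in raw_samples.items():
        processed = apply_pipeline(raw_value, steps)
        (target / f"{name}.json").write_text(json.dumps(processed))


steps = [{"name": "scale", "factor": 2}, {"name": "shift", "offset": 1}]
raw = {"a": 3, "b": 5}
with tempfile.TemporaryDirectory() as root:
    before = load_sample(raw["a"], "a", steps, root)  # nothing stored yet
    preprocess_task(raw, steps, root)                 # pay the disk cost once
    after = load_sample(raw["a"], "a", steps, root)   # served from disk
print(before, after)
```

Both calls return the same value, which is only guaranteed because the pipeline is deterministic: a stored result from a randomized step would freeze one random outcome for every later run.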

To see this behaviour in action, train on a task that has not been preprocessed yet and note the warning mml emits: “Task mml_fake_task not yet preprocessed. Pipeline contains 3 transforms. If you want to speed up training, preprocess this task beforehand.”

!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false

Now let’s preprocess the task.

!mml pp tasks=fake preprocessing=default

The warning disappears when we repeat the same training run!

!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false