Preprocess mode
Preprocessing a task is optional. More precisely, the preprocess.pipeline config attribute determines the exact steps used to preprocess any images (and masks, …) before any other data augmentation is applied. Preprocessing steps should be deterministic (no randomness involved). When any data processing mode (e.g. train) is called with a preprocessing option (default: default), mml checks whether the task has already been preprocessed with this pipeline and, if so, loads the samples directly from there. If not, mml simply preprocesses the raw images (and masks, …) on the fly. To sum up:
during data exploration one can easily rely on on-the-fly preprocessing
during number crunching, calling pp beforehand reduces the computation needed during training at the price of more occupied disk space
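The trade-off above can be illustrated with a minimal Python sketch. It is not mml's implementation: the function names (`pipeline_id`, `apply_pipeline`, `preprocess_beforehand`, `load_sample`), the toy transforms, and the on-disk layout are all hypothetical and only mirror the described logic, i.e. look for samples already preprocessed with the same pipeline and fall back to on-the-fly preprocessing otherwise.

```python
import hashlib
import json
from pathlib import Path


def pipeline_id(steps):
    """Hypothetical: derive a stable identifier from the ordered step configs."""
    blob = json.dumps(steps, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]


def apply_pipeline(sample, steps):
    """Apply deterministic toy transforms to a list of pixel values."""
    for step in steps:
        if step["name"] == "scale":
            sample = [v * step["factor"] for v in sample]
        elif step["name"] == "clip":
            sample = [min(max(v, step["low"]), step["high"]) for v in sample]
        else:
            raise ValueError(f"unknown step: {step['name']}")
    return sample


def _cache_path(cache_root, steps, sample_key):
    # Hypothetical layout: one folder per pipeline, one file per sample.
    return Path(cache_root) / pipeline_id(steps) / f"{sample_key}.json"


def preprocess_beforehand(raw, sample_key, steps, cache_root):
    """Store the preprocessed sample on disk (costs disk space, saves compute later)."""
    target = _cache_path(cache_root, steps, sample_key)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(apply_pipeline(raw, steps)))


def load_sample(raw, sample_key, steps, cache_root):
    """Return (sample, source): read from disk if available, else preprocess now."""
    cached = _cache_path(cache_root, steps, sample_key)
    if cached.exists():
        return json.loads(cached.read_text()), "cache"
    return apply_pipeline(raw, steps), "on-the-fly"
```

Because the steps are deterministic, both paths return the same sample; only the place where the computation happens differs. Keying the stored data by the pipeline configuration is one way to make sure a changed pipeline does not silently reuse stale samples.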
To demonstrate this behaviour, note the warning that mml emits when training on a task that has not been preprocessed yet:
“Task mml_fake_task not yet preprocessed. Pipeline contains 3 transforms. If you want to speed up training, preprocess this task beforehand.”
!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false
Now let’s preprocess the task.
!mml pp tasks=fake preprocessing=default
The warning disappears when we repeat the experiment, because mml now loads the preprocessed samples directly!
!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false