{ "cells": [ { "cell_type": "markdown", "source": [ "# Preprocess mode\n", "\n", "The preprocessing of tasks is optional. More precisely the `preprocess.pipeline` config attribute determines the exact steps to preprocess any images (and masks, ...) before\n", "potentially do any other data augmentation. Preprocessing steps should be deterministic (no randomness involved). If calling any data processing mode (e.g. `train`) with any preprocessing option (default: `default`) `mml` will check if the task has already been preprocessed with this pipeline and if so load samples directly from there. If not `mml` will simply preprocess the raw images (and masks, ...) on the fly. So to sum up:\n", "\n", " * while data exploration one can easily rely on \"on-the-flight\" preprocessing\n", " * during number crunching calling `pp` beforehand causes less training computations at the price of more occupied disk memory\n" ], "metadata": { "collapsed": false } }, { "cell_type": "markdown", "source": [ "To demonstrate this behaviour notice the warning of `mml`:\n", "\"Task mml_fake_task not yet preprocessed. Pipeline contains 3 transforms. If you want to speed up training, preprocess this task beforehand.\"" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 4, "source": "!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false", "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2024-01-17T13:57:13.556088Z", "end_time": "2024-01-17T13:57:25.914250Z" } }, "outputs": [] }, { "cell_type": "markdown", "source": [ "Now let's preprocess the task." ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 5, "source": "!mml pp tasks=fake preprocessing=default", "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2024-01-17T13:57:34.673626Z", "end_time": "2024-01-17T13:57:42.232901Z" } }, "outputs": [] }, { "cell_type": "markdown", "source": [ "The warning disappears when we repeat the experiment!" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 6, "source": "!mml train tasks=fake preprocessing=default trainer.max_epochs=1 mode.cv=false mode.nested=false tune.lr=false", "metadata": { "collapsed": false, "ExecuteTime": { "start_time": "2024-01-17T13:58:13.857613Z", "end_time": "2024-01-17T13:58:26.347457Z" } }, "outputs": [] }, { "cell_type": "code", "execution_count": null, "source": [], "metadata": { "collapsed": false }, "outputs": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }