API components overview
=======================

The internals of ``mml`` may appear deeply intertwined at first. The following figure shows the main components
that interact with each other during a standard experiment.

.. image:: ../_static/mml_overview.png

For full ``mml-core`` internals see :doc:`overview`. For an overview of plugins see :doc:`plugins/overview`.
The following sections briefly introduce the core components shown in the figure.

scheduler
---------

The scheduler determines the individual steps each experiment runs through. Which scheduler is used is determined by
the ``mode.scheduler._target_`` entry of the compiled ``hydra`` config. A sample entry would be
``mml.core.scripts.create_scheduler.CreateScheduler``. The main loop instantiates the referenced scheduler and hands
over the config. Internally the scheduler creates all other required objects (more precisely, this is handled by
:class:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler`). Schedulers inheriting from this class usually
implement the following:

* an ``__init__``, calling ``super().__init__()`` providing ``cfg`` and ``available_subroutines``
* a ``create_routine``, where based on ``self.subroutines`` both ``self.commands`` and ``self.params`` are extended
* optionally the ``after_preparation_hook`` and ``before_finishing_hook`` may be overridden
* methods that reflect the actual data processing and are added to ``self.commands`` within ``create_routine``

These methods should reflect "atomic" steps in the processing (e.g. training of a single neural network), and should
not mix multiple processing steps (e.g. training and prediction). The underlying idea is that
:class:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler` keeps close track of the progress and may, if
interrupted during runtime, restart at the very atomic processing step in which the interruption happened. This
``continue`` functionality is described in more detail in :doc:`../usage`.
The :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.run` method iterates over all entries of
:attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.commands` and calls them with the corresponding
parameters listed in :attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.params`. Within those steps various
convenience methods and attributes of :class:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler` can be used:

* :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.get_struct` returns a ``TaskStruct``
* :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.create_trainer` returns a ``Trainer``
* :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.create_model` returns a ``model``
* :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.create_datamodule` returns a ``datamodule``
* :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.lightning_tune` can tune learning rate and batch
  size of a model
* :attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.cfg` allows access to the compiled ``config``
* :attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.fm` allows access to the current ``MMLFileManager``
* :attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.pivot` is an (optional) prominent task within the
  current task list
* :attr:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.return_value` represents the value that will be
  returned once the scheduler has run all commands

A consequence of the "atomic" character of scheduler methods is the necessity to store intermediate results as files,
so that they persist after a crash and can be reused with the ``continue`` flag. The paths to these files should be
attached to the :class:`~mml.core.data_loading.task_struct.TaskStruct` via the ``paths`` attribute (except for
``models``). See below for more details.
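
The command / parameter pattern described above can be illustrated with a minimal, self-contained sketch. It does
**not** import ``mml``: ``ToyBaseScheduler`` is a hypothetical stand-in for
:class:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler`, and the ``start_idx`` argument, the ``log`` attribute
and the dictionary-based config are assumptions made purely for illustration. The real base class handles far more
(file manager, hooks, progress tracking) and its signatures may differ.

.. code-block:: python

    from typing import Any, Callable, Dict, List


    class ToyBaseScheduler:
        """Hypothetical stand-in for AbstractBaseScheduler (not the mml implementation)."""

        def __init__(self, cfg: Dict[str, Any], available_subroutines: List[str]) -> None:
            self.cfg = cfg
            # Keep only subroutines that are requested in the config and supported here.
            self.subroutines = [s for s in cfg["subroutines"] if s in available_subroutines]
            self.commands: List[Callable[..., None]] = []
            self.params: List[List[Any]] = []
            self.return_value: Any = None
            self.create_routine()

        def create_routine(self) -> None:
            raise NotImplementedError

        def run(self, start_idx: int = 0) -> Any:
            # Call every command with its matching parameters.
            # A start_idx > 0 mimics resuming after an interruption (assumed mechanism).
            steps = list(zip(self.commands, self.params))
            for command, args in steps[start_idx:]:
                command(*args)
            return self.return_value


    class ToyTrainScheduler(ToyBaseScheduler):
        def __init__(self, cfg: Dict[str, Any]) -> None:
            self.log: List[str] = []
            super().__init__(cfg=cfg, available_subroutines=["train", "predict"])

        def create_routine(self) -> None:
            # One atomic command per task and subroutine; steps are never mixed.
            for task in self.cfg["task_list"]:
                if "train" in self.subroutines:
                    self.commands.append(self.train_task)
                    self.params.append([task])
                if "predict" in self.subroutines:
                    self.commands.append(self.predict_task)
                    self.params.append([task])

        def train_task(self, task: str) -> None:
            self.log.append(f"train:{task}")

        def predict_task(self, task: str) -> None:
            self.log.append(f"predict:{task}")


    toy_cfg = {"subroutines": ["train", "predict"], "task_list": ["task_a", "task_b"]}
    scheduler = ToyTrainScheduler(toy_cfg)
    scheduler.run()

Because every entry of ``commands`` is a single atomic step, resuming only requires knowing the index of the last
unfinished step; in this sketch ``ToyTrainScheduler(toy_cfg).run(start_idx=2)`` would skip the two steps of
``task_a``.
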

file manager
------------

The :class:`~mml.core.data_loading.file_manager.MMLFileManager` is a :class:`~mml.core.scripts.utils.Singleton` and
may be accessed at any time via :meth:`~mml.core.data_loading.file_manager.MMLFileManager.instance` once it has been
initialized during :class:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler`'s
:meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.__init__`. The file manager is the main interface for
reading and writing files within ``mml``. It is responsible for detecting all installed tasks of ``mml`` and reading
the respective ``.json`` task descriptions to create the corresponding
:class:`~mml.core.data_loading.task_struct.TaskStruct`.

Its main use within a scheduler's custom methods is via
:meth:`~mml.core.data_loading.file_manager.MMLFileManager.construct_saving_path`, which should **ALWAYS** be used to
generate saving paths for objects. Templates for the construction of such paths are provided via
:meth:`~mml.core.data_loading.file_manager.MMLFileManager.add_assignment_path`, which is a class method and can and
should be called before the file manager is initialized. If declared accordingly, these paths become reusable and can
be shared with or loaded from other projects via ``mml``. For details of this ``reuse`` functionality see
:doc:`../usage`.

The :class:`~mml.core.data_loading.file_manager.MMLFileManager` does a lot more under the hood of ``mml``. One example
is the ``clean_up`` functionality. Assume that a specific kind of intermediate file is no longer necessary after a
full ``run`` of the scheduler. Setting e.g. ``reuse.clean_up.parameters=true`` automatically deletes all files of type
``parameters`` that have been created during the experiment once it has finished successfully. Note that
``lightning`` checkpoints of model training and all files of type ``temp`` are deleted automatically.

task struct
-----------

A :class:`~mml.core.data_loading.task_struct.TaskStruct` is a lightweight representation of a task.
It stores high-level information such as :attr:`~mml.core.data_loading.task_struct.TaskStruct.task_type` and
:attr:`~mml.core.data_loading.task_struct.TaskStruct.num_classes`. Furthermore, it is used to attach intermediate
results of scheduler methods via :attr:`~mml.core.data_loading.task_struct.TaskStruct.paths` and
:attr:`~mml.core.data_loading.task_struct.TaskStruct.models`. The former is a dictionary holding flexible
string-to-path associations and the latter is a list of all trained
:class:`~mml.core.scripts.model_storage.ModelStorage` instances for this task. The latest
:class:`~mml.core.data_loading.task_struct.TaskStruct` for a task from the scheduler's ``cfg.task_list`` is obtained
via :meth:`~mml.core.scripts.base_scheduler.AbstractBaseScheduler.get_struct`.
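
How intermediate results travel between atomic steps via ``paths`` can be sketched as follows. This is a toy
illustration only: ``ToyTaskStruct`` is a hypothetical stand-in that mirrors just the attributes named above, the
helper functions and the ``"stats"`` key are invented for this example, and a temporary directory replaces the saving
path that would come from the file manager's ``construct_saving_path`` in a real scheduler.

.. code-block:: python

    import json
    import tempfile
    from dataclasses import dataclass, field
    from pathlib import Path
    from typing import Any, Dict, List


    @dataclass
    class ToyTaskStruct:
        """Hypothetical stand-in for TaskStruct (not the mml implementation)."""

        name: str
        task_type: str
        num_classes: int
        # Flexible string-to-path associations for intermediate results.
        paths: Dict[str, Path] = field(default_factory=dict)
        # Would hold ModelStorage instances in mml; left untyped in this sketch.
        models: List[Any] = field(default_factory=list)


    def compute_statistics(struct: ToyTaskStruct, out_dir: Path) -> None:
        # First atomic step: write the result to disk, then attach the path.
        out_file = out_dir / f"{struct.name}_stats.json"
        out_file.write_text(json.dumps({"num_classes": struct.num_classes}))
        struct.paths["stats"] = out_file


    def report_statistics(struct: ToyTaskStruct) -> Dict[str, int]:
        # Later atomic step: rely only on the file referenced by the struct.
        return json.loads(struct.paths["stats"].read_text())


    with tempfile.TemporaryDirectory() as tmp:
        struct = ToyTaskStruct(name="task_a", task_type="classification", num_classes=3)
        compute_statistics(struct, Path(tmp))
        stats = report_statistics(struct)

Since the second step reads its input exclusively from the file referenced in ``paths``, it does not depend on any
in-memory state of the first step, which is what makes restarting between the two steps possible.
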