mml.core.scripts.schedulers.base_scheduler

class AbstractBaseScheduler[source]

Bases: object

Base class of a scheduler for a series of experiments. A derived scheduler class implements its own setup as a specific order of routines. The scheduler itself keeps track of the status and datasets, manages saving and loading of files, and provides routines for creating dataloaders and models.

__init__(cfg: DictConfig, available_subroutines: List[str])[source]

Creates the schedule. Can be started afterward with the .run() method.

Parameters:
  • cfg – configs of the current run

  • available_subroutines – subroutines available in the derived scheduler

after_preparation_hook() None[source]

This hook will be called at the end of the prepare_exp() step. That step prepares the task structs. Once this is done, this hook can be used for any remaining setup that relies on task structs (e.g. compatibility checks).

In contrast to the other steps in the schedule, prepare_exp() is also performed in CONTINUE mode. This hook can therefore be used to ensure that the scheduler can continue, e.g., by loading additional resources from previous runs. More on CONTINUE mode: continue

Example usages:
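
As a hypothetical illustration (not taken from the mml sources), the sketch below uses a stand-in base class so that it runs without mml installed. The subclass checks that all prepared tasks share one task type. The names _StandInScheduler, task_structs (as a plain dict) and task_type are assumptions made for this example only.

```python
class _StandInScheduler:
    """Stand-in for AbstractBaseScheduler (NOT the mml implementation)."""

    def __init__(self, task_structs):
        # hypothetical layout: task name -> dict of task properties
        self.task_structs = task_structs

    def prepare_exp(self):
        # ... the real step would load task structs and seed the run ...
        self.after_preparation_hook()  # hook runs at the end of the step

    def after_preparation_hook(self):
        pass  # default: nothing to do


class SameTypeScheduler(_StandInScheduler):
    def after_preparation_hook(self):
        # compatibility check that relies on the prepared task structs
        types = {props["task_type"] for props in self.task_structs.values()}
        if len(types) > 1:
            raise ValueError(f"Incompatible task types: {sorted(types)}")
```

Because the check lives in the hook, it would also run when an experiment is continued, since prepare_exp() is performed in CONTINUE mode as well.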

before_finishing_hook()[source]

This hook will be called at the beginning of the finish_exp() step. That step performs dumping and cleanup of intermediates. As such, this hook is the right place to aggregate results or other intermediates from the previous computing steps (e.g., plotting some results).

It is also ideally placed to set the return_value of the scheduler, which will be returned by the run() method and can be used for experiment evaluation and in hyperparameter optimization (see Hyperparameter optimization).

Example usages:

static compare_schedule_entries(entry_1: str, entry_2: str) bool[source]

Helper function for comparing schedules.

Parameters:
  • entry_1 – line of a schedule (command and args)

  • entry_2 – line of a schedule (command and args)

Returns:

True if the lines are compatible, else False

create_datamodule(task_structs: TaskStruct | List[TaskStruct], fold: int = 0) MultiTaskDataModule[source]

Creates a PyTorch Lightning datamodule.

Parameters:
  • task_structs (Union[TaskStruct, List[TaskStruct]]) – task struct(s) to create datamodule from

  • fold (int) – fold to be used

Returns:

datamodule instance

create_model(task_structs: List[TaskStruct], task_weights: List[float] | None = None, load_parameters: Path | None = None) LightningModule[source]

Creates a PyTorch Lightning module.

Parameters:
  • task_structs (List[TaskStruct]) – list of task structs to construct lightning module

  • task_weights (Optional[List[float]]) – (optional) list of task weights to weigh loss

  • load_parameters (Optional[Path]) – (optional) path to load model weights

Returns:

LightningModule instance

abstract create_routine() None[source]

Adds commands and parameters to the schedule. May e.g. be in the form of:

if 'xyz' in self.subroutines:
    for task in self.cfg.task_list:
        self.commands.append(self.MY_IMPLEMENTED_ROUTINE)
        self.params.append([task])

Returns:

None
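
To show how the two parallel lists fit together, here is a self-contained toy version of the pattern. ToyScheduler is not mml's AbstractBaseScheduler; in particular, the way run() consumes commands and params below is an assumed execution model, and train_task is a hypothetical routine.

```python
class ToyScheduler:
    """Toy stand-in, NOT mml's AbstractBaseScheduler."""

    def __init__(self, task_list, subroutines):
        self.task_list = task_list  # stands in for self.cfg.task_list
        self.subroutines = subroutines
        self.commands, self.params = [], []
        self.log = []
        self.create_routine()

    def create_routine(self):
        # one schedule entry (command plus parameter list) per task
        if "train" in self.subroutines:
            for task in self.task_list:
                self.commands.append(self.train_task)
                self.params.append([task])

    def train_task(self, task):
        self.log.append(f"train:{task}")

    def run(self):
        # assumed execution model: call each command with its parameter list
        for command, args in zip(self.commands, self.params):
            command(*args)
        return self.log
```

With this toy class, ToyScheduler(["a", "b"], ["train"]).run() executes one entry per task in order, while an empty subroutine list leaves the schedule empty.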

create_trainer(monitor: Tuple[str, str] | None = None, metrics_callback: bool = False) Trainer[source]

Creates a trainer from cfg.trainer with callbacks from cfg.cbs. By default, uses two MMLModelCheckpoint callbacks that behave as follows:

  • at least every 30 minutes a checkpoint is stored to ensure resume compatibility,

  • if monitor is given, the best model according to that metric is kept, checked at the end of each epoch's validation,

  • if monitor is None, only the very last epoch is stored (besides the time-based checkpoint).

The non-time-based checkpoint may be accessed through checkpoint_callback.

Parameters:
  • monitor (Optional[Tuple[str, str]]) – (optional) a tuple of metric name and mode (min or max) to be monitored by model checkpoint (saves best model) and early stopping callback (if activated in cfg)

  • metrics_callback (bool) – (optional) if True, also creates a metrics callback

Returns:

trainer instance; the callbacks can be accessed through the scheduler attributes metrics_callback and checkpoint_callback

Return type:

Union[Tuple[pl.Trainer, ModelCheckpoint], Tuple[pl.Trainer, ModelCheckpoint, MetricsTrackerCallback]]
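
The semantics of the monitor tuple can be illustrated without Lightning. The helper below is a hypothetical, stdlib-only sketch (epoch_to_keep is not part of mml or Lightning): given per-epoch validation metrics, it returns the index of the epoch whose checkpoint would be kept under the rules listed above.

```python
from typing import Dict, List, Optional, Tuple


def epoch_to_keep(history: List[Dict[str, float]],
                  monitor: Optional[Tuple[str, str]] = None) -> int:
    """Return the index of the epoch whose checkpoint would be kept."""
    if monitor is None:
        return len(history) - 1  # no monitor: only the last epoch is kept
    name, mode = monitor
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    # best epoch according to the monitored metric
    return pick(range(len(history)), key=lambda i: history[i][name])


history = [
    {"val/loss": 0.9, "val/acc": 0.9},
    {"val/loss": 0.4, "val/acc": 0.7},
    {"val/loss": 0.5, "val/acc": 0.8},
]
best_by_loss = epoch_to_keep(history, ("val/loss", "min"))
best_by_acc = epoch_to_keep(history, ("val/acc", "max"))
last_epoch = epoch_to_keep(history)
```

The metric names val/loss and val/acc are placeholders; the names actually logged depend on the model and configuration.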

finish_exp() None[source]

Last command of any experiment; this is how every experiment finishes. Ensures dumping of the task factory, unlinks the planned schedule, removes intermediate results if specified in the config, and allows for subclass-specific instructions via the >additional_finishing_instructions< interface. USE THAT INSTEAD AND DO NOT OVERRIDE THIS FUNCTION UNLESS YOU KNOW WHAT YOU ARE DOING.

Returns:

None

get_checkpoints_dir()[source]

Returns the path where checkpoints are currently stored.

Returns:

Path to a folder to store training checkpoints

get_struct(task_name: str) TaskStruct[source]

Convenience function to access a task struct.

Parameters:

task_name (str) – name of the task

Returns:

the corresponding task struct

highlight_text(text: str) str[source]

Helper function for highlighting text within the terminal. May be turned off by the logging.highlight_task_names config option.

Parameters:

text – text to be highlighted

Returns:

modified text if highlighting is active, else plain input text
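
A common way to implement this kind of terminal highlighting is to wrap the text in ANSI escape codes. The sketch below is an assumption about the general technique, not mml's actual code: the colour chosen and the active parameter (standing in for the logging.highlight_task_names option) are hypothetical.

```python
def highlight_text(text: str, active: bool = True) -> str:
    """Wrap text in ANSI escape codes when highlighting is active."""
    if not active:
        return text  # plain input text when highlighting is turned off
    bold_yellow, reset = "\033[1;33m", "\033[0m"
    return f"{bold_yellow}{text}{reset}"
```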

lightning_tune(trainer: Trainer, model: LightningModule, datamodule: LightningDataModule | None, train_dataloaders=None) None[source]

Tune a model / datamodule based on configs.tune setting.

Parameters:
  • trainer – the lightning trainer

  • model – the lightning model

  • datamodule – the lightning datamodule

  • train_dataloaders – alternative way to provide the data; set datamodule to None in this case

Returns:

None; tuned values are stored inside the model / datamodule

prepare_exp() None[source]

First command of any experiment. Mainly handles loading of task structs and seeding of the experiment. Specific preparation can also be done via >additional_preparation_instructions<. USE THAT INSTEAD AND DO NOT OVERRIDE THIS FUNCTION UNLESS YOU KNOW WHAT YOU ARE DOING.

Returns:

None

run() float[source]

The run routine starts the schedule and logs the process (within a file at self.status_log).

Returns:

self.return_value (which might be set during runtime)

set_active_naming(command_ix) None[source]

Defines the active_step_naming attribute for the given command index.

Parameters:

command_ix – index of the command

Returns:

None