mml.core.scripts.model_storage

class EnsembleStorage

Bases: object

An EnsembleStorage represents a collection of models that are applied jointly on a task.

__init__(performance: float, weights: List[float] = <factory>, members: List[Path] = <factory>, predictions: Dict[str, Path] = <factory>, metrics: dict = <factory>, search_params: dict = <factory>, _stored: Path | None = None) → None
property folds: List[int]

Folds used by the members.

classmethod from_json(path: Path) → EnsembleStorage

Counterpart to saving the storage. Creates the storage object from a file.

Parameters:

path – path to load the storage from

Returns:

an ensemble storage dataclass

get_members() → List[ModelStorage]

Loads the actual ModelStorage members of the ensemble from disk.

Returns:

list of ModelStorage instances

members: List[Path]
metrics: dict
performance: float
predictions: Dict[str, Path]
search_params: dict
store(task_struct: TaskStruct | None = None, path: Path | None = None, file_name: str = 'ensemble.json') → Path

Saves the model ensemble. If task_struct is given, a new path is created and returned. Otherwise, if path is given, that path is used. If neither is given, the method tries to look up whether the storage has been loaded previously and updates that location.

Parameters:
  • task_struct (Optional[TaskStruct]) – task struct corresponding to the task the ensemble was optimised on, will be used to determine the path. Either task_struct or path must be provided.

  • path (Optional[Path]) – (optional) path of an existing file for this storage, which will be overwritten. Raises an error if the given path does not exist yet.

  • file_name (str) – only relevant for task_struct variant, determines the naming of the json file

Returns:

the path the storage was saved to
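The resolution order described for store can be sketched as a small standalone function. This is not the mml implementation: resolve_store_path, the new_path_for_task callback and the exception types are hypothetical stand-ins, and giving task_struct precedence when both arguments are set is an assumption.

```python
from pathlib import Path
from typing import Callable, Optional


def resolve_store_path(
    task_struct: Optional[object],
    path: Optional[Path],
    previously_stored: Optional[Path],
    new_path_for_task: Callable[[object], Path],
) -> Path:
    """Pick the target file following the order described for store().

    All names here are hypothetical; mml's internals may differ.
    """
    if task_struct is not None:
        # A task struct is given: derive a fresh location from it.
        return new_path_for_task(task_struct)
    if path is not None:
        # An explicit path must point at an existing file to overwrite.
        if not path.exists():
            raise FileNotFoundError(f"{path} does not exist yet")
        return path
    if previously_stored is not None:
        # Fall back to the location the storage was loaded from.
        return previously_stored
    raise ValueError("Provide task_struct or path, or load the storage first.")
```

Checking `is not None` rather than truthiness keeps the three cases distinct even if a task struct or path object happens to evaluate as false.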

property tasks: Set[str]

Tasks of the members.
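A minimal sketch of how the folds and tasks properties could be derived from the members, assuming each member exposes the optional fold and task attributes documented for ModelStorage. Member, member_folds and member_tasks are hypothetical names, and skipping members whose fold or task is None is an assumption rather than confirmed library behaviour.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Member:
    """Hypothetical stand-in carrying only the two relevant attributes."""
    task: Optional[str] = None
    fold: Optional[int] = None


def member_folds(members: List[Member]) -> List[int]:
    # One entry per member that has a fold set, in member order (assumption).
    return [m.fold for m in members if m.fold is not None]


def member_tasks(members: List[Member]) -> Set[str]:
    # A set, so each task appears once however many members share it.
    return {m.task for m in members if m.task is not None}


members = [Member("task_a", 0), Member("task_a", 1), Member("task_b", None)]
folds = member_folds(members)
tasks = member_tasks(members)
```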

weights: List[float]
class ModelStorage

Bases: object

Lightweight wrapper for everything needed to reproduce, load and compare trained models. It consists of a path to a saved pipeline, a path to saved parameters and a performance value indicating the validation metric after training.

Parameters:
  • pipeline (Path) – path to a stored mml.core.scripts.pipeline_configuration.PipelineCfg

  • parameters (Path) – path to stored model parameters

  • performance (float) – validation score of the model, usually best/last epoch loss value, might be used for model selection

  • training_time (float) – training time in seconds

  • task (Optional[str]) – in simple supervised settings this may indicate the target task trained for

  • fold (Optional[int]) – may indicate the fold number used

  • predictions (Dict[str, Path]) – (optional) predictions that have been made with this model

  • metrics (list) – (optional) detailed training and validation metrics
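The parameters above map onto dataclass fields, with defaults as given in the constructor signature. The stand-in below (ModelStorageSketch, a hypothetical name) mirrors those fields to show which arguments are required and which defaults apply. It omits the private _stored field and all methods, it is not the mml class itself, and the file names are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional


@dataclass
class ModelStorageSketch:
    # Required: where the pipeline and weights live, and the validation score.
    pipeline: Path
    parameters: Path
    performance: float
    # Optional bookkeeping; -1.0 marks an unknown training time.
    training_time: float = -1.0
    created: Optional[datetime] = None
    task: Optional[str] = None
    fold: Optional[int] = None
    # default_factory gives every instance its own empty container.
    predictions: Dict[str, Path] = field(default_factory=dict)
    metrics: list = field(default_factory=list)


storage = ModelStorageSketch(
    pipeline=Path("pipeline.yaml"),   # made-up file name
    parameters=Path("model.pth"),     # made-up file name
    performance=0.42,
    task="example_task",
    fold=0,
)
```

Only pipeline, parameters and performance have no default, so they are the minimum needed to describe a trained model.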

__init__(pipeline: Path, parameters: Path, performance: float, training_time: float = -1.0, created: datetime | None = None, task: str | None = None, fold: int | None = None, predictions: Dict[str, Path] = <factory>, metrics: list = <factory>, _stored: Path | None = None) → None
created: datetime | None = None
fold: int | None = None
classmethod from_json(path: Path, results_root: Path | None = None) → ModelStorage

Counterpart to saving the storage. Creates the storage object from a file.

Parameters:
  • path (Path) – path to load the storage from

  • results_root (Path) – the current system's results root; if not provided, the method tries to infer it

Returns:

a model storage dataclass

metrics: list
parameters: Path
performance: float
pipeline: Path
predictions: Dict[str, Path]
store(task_struct: TaskStruct | None = None, path: Path | None = None, fold: int | None = None) → Path

Saves the model storage. If task_struct is given, a new path is created and returned. Otherwise, if path is given, that path is used. If neither is given, the method tries to look up whether the storage has been loaded previously and updates that location.

Parameters:
  • task_struct (Optional[TaskStruct]) – task struct corresponding to the task the model was trained on, will be used to determine the path.

  • path (Optional[Path]) – (optional) path of an existing file for this storage, which will be overwritten. Raises an error if the given path does not exist yet.

  • fold (Optional[int]) – (optional) if a fold is specified and path is None, the file name will be fold_{fold}.json, otherwise the file name falls back to model_storage.json.

Returns:

the path the storage was saved to
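The file naming rule for the fold parameter can be written as a short helper. storage_file_name is a hypothetical name used for illustration; only the two file name patterns come from the description above.

```python
from typing import Optional


def storage_file_name(fold: Optional[int] = None) -> str:
    """Return the JSON file name for a model storage (illustrative only)."""
    # Compare against None explicitly: fold 0 is a valid fold number
    # and would be skipped by a plain truthiness check.
    if fold is not None:
        return f"fold_{fold}.json"
    return "model_storage.json"
```

With this rule, fold 0 is stored as fold_0.json, and a storage without a fold is stored as model_storage.json.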

task: str | None = None
training_time: float = -1.0