mml.core.data_loading.task_dataset

class TaskDataset

Bases: Dataset

The TaskDataset class represents a loadable dataset. It handles folds, data loading, the different modalities of a task, and non-batched transforms. Once initialized, it can be passed directly to a (multi-worker) data loader.
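A map-style dataset only has to provide `__len__` and an indexed `__getitem__` that returns one sample; a data loader then groups those samples into batches. The sketch below is a hypothetical, stdlib-only illustration of that contract: `ToyTaskDataset`, the `image`/`class` keys and the single-process `iterate_batches` loop are assumptions standing in for TaskDataset and a real multi-worker loader, not part of mml.

```python
from typing import Any, Dict, Iterator, List


class ToyTaskDataset:
    """Hypothetical stand-in mimicking the map-style interface of TaskDataset."""

    def __init__(self, samples: List[Dict[str, Any]]) -> None:
        self.samples = samples

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        # One sample per index, returned as a dict keyed by (assumed) modality names.
        return dict(self.samples[index])


def iterate_batches(dataset: ToyTaskDataset, batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Simplified single-process loader: yields lists of consecutive samples."""
    batch: List[Dict[str, Any]] = []
    for index in range(len(dataset)):
        batch.append(dataset[index])
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # last, possibly smaller batch
        yield batch


dataset = ToyTaskDataset([{"image": f"img_{i}.png", "class": i % 2} for i in range(5)])
batches = list(iterate_batches(dataset, batch_size=2))
```

A real loader additionally shuffles, collates and distributes indices across workers; since each worker only calls `__getitem__`, per-sample work such as non-batched transforms naturally lives in the dataset.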

__init__(root: Path | str, split: DataSplit = DataSplit.TRAIN, fold: int = 0, transform: AugmentationModule | AugmentationModuleContainer | None = None, caching_limit: int = 0, loaders: Dict[Modality, ModalityLoader] | None = None)

Initialization loads all meta information of the task and selects the active split and fold. This choice can later be changed via the select_samples method.

Parameters:
  • root (Path | str) – path to the TASKXXX_name.json file of the task to load

  • split (DataSplit) – one of ‘train’, ‘val’, ‘full_train’ or ‘test’

  • fold (int) – ignored for the ‘test’ split; in the ‘train’ split this fold is held out (excluded), in the ‘val’ split it is the only fold used

  • transform (AugmentationModule | AugmentationModuleContainer | None) – transform applied to individual (non-batched) samples

  • caching_limit (int) – maximum number of images held in the cache

  • loaders (Optional[Dict[Modality, ModalityLoader]]) – dict of ModalityLoaders for this task; if None, a default set of loaders is used

disable_cache() -> None

Deactivates the internal image cache.

enable_cache() -> None

Activates the internal image cache to speed up training. Call it only after the cache has been created and filled.

fill_cache(num_workers: int = 0) -> None
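Fills the internal image cache with at most caching_limit images. Call enable_cache afterwards to use it.

Parameters:
  • num_workers (int) – number of workers used to load the images while filling the cache

Returns:

None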
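The interplay of caching_limit, fill_cache, enable_cache and disable_cache can be sketched as a capped cache with an on/off switch. This is a hypothetical illustration, not mml's implementation: `CachedLoader`, `load_fn` and `load_calls` are invented names, the cache is assumed to hold the first caching_limit samples by index, and num_workers is left out for simplicity.

```python
from typing import Any, Callable, Dict


class CachedLoader:
    """Hypothetical sketch of a capped sample cache that can be switched on and off."""

    def __init__(self, load_fn: Callable[[int], Any], num_samples: int, caching_limit: int = 0) -> None:
        self.load_fn = load_fn
        self.num_samples = num_samples
        self.caching_limit = caching_limit
        self.cache: Dict[int, Any] = {}
        self.use_cache = False
        self.load_calls = 0  # counts actual (uncached) loads

    def _load(self, index: int) -> Any:
        self.load_calls += 1
        return self.load_fn(index)

    def fill_cache(self) -> None:
        # Store at most `caching_limit` samples (assumption: the first ones by index).
        for index in range(min(self.caching_limit, self.num_samples)):
            self.cache[index] = self._load(index)

    def enable_cache(self) -> None:
        self.use_cache = True

    def disable_cache(self) -> None:
        self.use_cache = False

    def get(self, index: int) -> Any:
        # Serve from the cache only if it is enabled and holds this index.
        if self.use_cache and index in self.cache:
            return self.cache[index]
        return self._load(index)


loader = CachedLoader(load_fn=lambda i: f"image_{i}", num_samples=10, caching_limit=3)
loader.fill_cache()    # three loads
loader.enable_cache()
first = loader.get(0)  # served from the cache, no new load
later = loader.get(7)  # not cached, loaded on demand
```

Keeping filling and enabling as separate steps lets the cache be populated once, for example before worker processes are started, without changing lookup behaviour until caching is explicitly switched on.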

static get_classes_from_idx_dict(idx_to_class: Dict[int, str]) -> List[str]

Converts the idx_to_class dict of a task into the corresponding list of classes.

Parameters:

idx_to_class – index-to-class mapping as provided in the task meta information

Returns:

class list, ordered by increasing idx
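The documented behaviour (a class list ordered by increasing index) can be sketched in a few lines. This is a minimal re-implementation under the assumption that the keys are unique integers; mml's own code may differ, for example in validation.

```python
from typing import Dict, List


def get_classes_from_idx_dict(idx_to_class: Dict[int, str]) -> List[str]:
    # Sort the (index, class) pairs by index and keep only the class names.
    return [name for _, name in sorted(idx_to_class.items())]


classes = get_classes_from_idx_dict({2: "tumor", 0: "healthy", 1: "benign"})
# classes == ["healthy", "benign", "tumor"]
```

Integer keys matter here: if the indices arrived as strings (as JSON object keys do), sorting would be lexicographic and "10" would come before "2".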

load_sample(index: int) -> Dict[str, Any]

Loads all components of a sample, based on the active modalities and the information they provide. Note that for preprocessing, the raw_index_mapping is removed by default (set to None) and must be handled separately.

Parameters:

index – int within range(len(self.samples))

Returns:

dict mapping each modality key (str) to its loaded object
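The per-modality dispatch behind load_sample, including the fallback to default loaders when none are passed at initialization, can be sketched as follows. Everything here is hypothetical: plain strings stand in for the Modality enum, simple callables stand in for ModalityLoader objects, and every entry of the sample's meta information is treated as an active modality.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical loaders: each turns a sample's raw meta entry into an object.
Loader = Callable[[Any], Any]
DEFAULT_LOADERS: Dict[str, Loader] = {
    "image": lambda path: f"<pixels of {path}>",  # stands in for image decoding
    "class": lambda label: int(label),
}


def load_sample(sample_meta: Dict[str, Any], loaders: Optional[Dict[str, Loader]] = None) -> Dict[str, Any]:
    """Load every modality of one sample with its matching loader."""
    loaders = DEFAULT_LOADERS if loaders is None else loaders
    return {modality: loaders[modality](raw) for modality, raw in sample_meta.items()}


sample = load_sample({"image": "scans/0001.png", "class": "3"})
# -> {"image": "<pixels of scans/0001.png>", "class": 3}
```

Keeping one loader per modality means a task with, say, an additional mask modality only needs one more entry in the loaders dict rather than changes to the loading logic.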

select_samples(split: DataSplit, fold: int) -> None

Selects the active samples from the task meta information, handling splits, folds and subsets.

Parameters:
  • split (DataSplit) – one of ‘train’, ‘val’, ‘full_train’, ‘unlabelled’ or ‘test’

  • fold (int) – ignored for the ‘test’ split; in the ‘train’ split this fold is held out (excluded), in the ‘val’ split it is the only fold used

Returns:

None
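The fold rules stated for the split and fold parameters can be expressed as a small selection function. This is a hypothetical sketch, not mml's code: the folds are assumed to be lists of sample ids, ‘full_train’ is assumed to mean the union of all folds (the text above does not define it), and ‘unlabelled’ as well as subsets are not modelled.

```python
from typing import List


def select_sample_ids(folds: List[List[str]], split: str, fold: int, test_ids: List[str]) -> List[str]:
    """Hypothetical sketch of the documented split/fold rules."""
    if split == "test":
        return list(test_ids)  # fold is ignored
    if split == "val":
        return list(folds[fold])  # only the active fold
    if split == "train":
        # every fold except the held-out one
        return [s for i, f in enumerate(folds) if i != fold for s in f]
    if split == "full_train":
        # assumption: all folds combined
        return [s for f in folds for s in f]
    raise ValueError(f"split not covered by this sketch: {split}")


folds = [["a", "b"], ["c"], ["d", "e"]]
train_ids = select_sample_ids(folds, "train", fold=1, test_ids=["t"])
val_ids = select_sample_ids(folds, "val", fold=1, test_ids=["t"])
```

For any fold, the ‘train’ and ‘val’ selections are disjoint and together cover all folds, which is what makes k-fold cross-validation over fold = 0, ..., k-1 work.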

class TupelizedTaskDataset

Bases: Dataset

__init__(task_dataset: TaskDataset, transform: Compose | None = None)

Converts the output of a TaskDataset (dicts by default) into tuples. Also allows overriding the dataset's transform.

Parameters:
  • task_dataset (TaskDataset) – TaskDataset instance

  • transform (Optional[Compose]) – if not None, overrides the transform of the wrapped dataset
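The wrapper idea (dict samples in, tuples out, optional transform override) can be sketched with two small classes. This is a hypothetical illustration: `DictDataset`, `TupleView`, the `image`/`class` keys and the choice to override by replacing the wrapped dataset's `transform` attribute are assumptions, not mml's implementation.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Transform = Callable[[Any], Any]


class DictDataset:
    """Hypothetical stand-in for TaskDataset: dict samples plus a per-sample transform."""

    def __init__(self, samples: List[Dict[str, Any]], transform: Optional[Transform] = None) -> None:
        self.samples = samples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Dict[str, Any]:
        sample = dict(self.samples[index])  # copy so stored samples stay untouched
        if self.transform is not None:
            sample["image"] = self.transform(sample["image"])
        return sample


class TupleView:
    """Hypothetical wrapper returning tuples instead of dicts."""

    def __init__(self, task_dataset: DictDataset, transform: Optional[Transform] = None,
                 keys: Tuple[str, ...] = ("image", "class")) -> None:
        self.task_dataset = task_dataset
        if transform is not None:
            # Override the wrapped dataset's transform (note: mutates task_dataset).
            self.task_dataset.transform = transform
        self.keys = keys

    def __len__(self) -> int:
        return len(self.task_dataset)

    def __getitem__(self, index: int) -> Tuple[Any, ...]:
        sample = self.task_dataset[index]
        return tuple(sample[key] for key in self.keys)


samples = [{"image": "abc", "class": 1}]
plain = TupleView(DictDataset(samples, transform=str.upper))
view = TupleView(DictDataset(samples, transform=str.upper), transform=lambda s: s[::-1])
```

Tuples such as (image, label) match what many training loops unpack directly; because this sketch overrides by mutation, sharing one dataset between several wrappers would let the last override win.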