mml.core.data_loading.task_dataset

class TaskDataset

Bases: Dataset

The TaskDataset class represents a loadable dataset. It handles folds, data loading, the different modalities of a task, and non-batched transforms. After initialization, it can be passed directly to a (multithreaded) dataloader.

__init__(root: Path | str, split: DataSplit = DataSplit.TRAIN, fold: int = 0, transform: AugmentationModule | AugmentationModuleContainer | None = None, caching_limit: int = 0, loaders: Dict[Modality, ModalityLoader] | None = None)

Initialization loads all meta information on the task and selects the active split and fold. This choice can be changed later with the ‘select_samples’ method.

Parameters:

root (Path | str) – path to the TASKXXX_name.json file of the task to load
split (DataSplit) – one of ‘train’, ‘val’, ‘full_train’ and ‘test’
fold (int) – ignored for the ‘test’ split; for the ‘train’ split this fold is excluded, and for the ‘val’ split it is the only fold used
transform (AugmentationModule | AugmentationModuleContainer | None) – transform to be applied to samples (documented as an albumentations Compose transform)
caching_limit (int) – maximum number of images to cache
loaders (Optional[Dict[Modality, ModalityLoader]]) – a dict of ModalityLoaders for this task; if None is given, a default set of loaders is used

disable_cache() → None: Deactivates the internal image cache.

enable_cache() → None: Enables the internal image cache to speed up training. Call this after the cache has been created and filled.

fill_cache(num_workers: int = 0) → None

Returns: None

static get_classes_from_idx_dict(idx_to_class: Dict[int, str]) → List[str]

Transforms the idx_to_class dict of a task to the actual list of classes.

Parameters: idx_to_class – index-to-class mapping as provided in the task meta information
Returns: class list, ordered by increasing idx
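
The ordering rule described above can be illustrated with a small standalone function. This is only a sketch of the documented behaviour (class names sorted by increasing index), not the library's implementation; the name `classes_from_idx_dict` and the sample mapping are made up for illustration, and how the real method treats gaps or duplicate class names is not covered here.

```python
from typing import Dict, List


def classes_from_idx_dict(idx_to_class: Dict[int, str]) -> List[str]:
    """Illustrative stand-in: return class names ordered by increasing index."""
    return [idx_to_class[idx] for idx in sorted(idx_to_class)]


# Hypothetical mapping, deliberately given out of order.
idx_to_class = {2: "bird", 0: "cat", 1: "dog"}
classes = classes_from_idx_dict(idx_to_class)
print(classes)
```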

load_sample(index: int) → Dict[str, Any]

Loads all necessary components of a sample. Which components are loaded depends on the active modalities and the information provided for them. Be aware that for preprocessing the raw_index_mapping is removed by default (set to None). Handle this separately.

Parameters: index – int within range(len(self.samples))
Returns: dict mapping each modality key (str) to the loaded object

select_samples(split: DataSplit, fold: int) → None

Chooses the actual samples from the task meta information. Handles splits, folds and subsets.

Parameters:

split (DataSplit) – either ‘train’, ‘val’, ‘full_train’, ‘unlabelled’ or ‘test’
fold (int) – ignored for the ‘test’ split; for the ‘train’ split this fold is excluded, and for the ‘val’ split it is the only fold used

Returns:

None
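
The interaction of split and fold can be sketched in plain Python. The function below is a hypothetical stand-in, not the code of `select_samples`: the `folds` and `test_ids` arguments are assumed data structures, the behaviour of ‘full_train’ (all folds combined) is an assumption, and the ‘unlabelled’ split and subsets are left out.

```python
from typing import List


def select_ids(folds: List[List[str]], test_ids: List[str],
               split: str, fold: int) -> List[str]:
    """Illustrative fold selection; not the library's implementation."""
    if split == "test":
        # The fold argument is ignored for the test split.
        return list(test_ids)
    if split == "val":
        # Only the chosen fold is used.
        return list(folds[fold])
    if split == "train":
        # Every fold except the chosen one is used.
        return [s for i, f in enumerate(folds) if i != fold for s in f]
    if split == "full_train":
        # Assumption: all folds together, regardless of the fold argument.
        return [s for f in folds for s in f]
    raise ValueError(f"unsupported split: {split}")


folds = [["a", "b"], ["c", "d"], ["e"]]
test_ids = ["x", "y"]
train_ids = select_ids(folds, test_ids, "train", fold=1)
val_ids = select_ids(folds, test_ids, "val", fold=1)
print(train_ids, val_ids)
```

With three folds and `fold=1`, training draws from folds 0 and 2 while validation uses fold 1 alone, so the two sets never overlap.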

class TupelizedTaskDataset

Bases: Dataset

__init__(task_dataset: TaskDataset, transform: Compose | None = None)

Converts the output of a TaskDataset, which is a dict by default, into tuples. Also allows overriding the transform.

Parameters:

task_dataset (TaskDataset) – TaskDataset instance
transform (Compose | None) – if not None, overrides the transform of the wrapped dataset
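
The wrapping idea can be sketched without any dependencies. Both classes below are hypothetical stand-ins (`DictDataset` for a dict-returning dataset, `TupleView` for the tuple wrapper) and are not the library's classes. In particular, the order of tuple elements here simply follows dict insertion order, which is an assumption, as is the way the transform is replaced on the wrapped dataset.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Sample = Dict[str, Any]


class DictDataset:
    """Stand-in for a dataset whose samples are dicts."""

    def __init__(self, samples: List[Sample],
                 transform: Optional[Callable[[Sample], Sample]] = None):
        self.samples = samples
        self.transform = transform

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Sample:
        sample = dict(self.samples[index])  # copy, so the source stays untouched
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


class TupleView:
    """Stand-in wrapper that returns each sample as a tuple."""

    def __init__(self, dataset: DictDataset,
                 transform: Optional[Callable[[Sample], Sample]] = None):
        self.dataset = dataset
        if transform is not None:
            # Replace the wrapped dataset's transform only when one is given.
            self.dataset.transform = transform

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[Any, ...]:
        # Assumption: tuple order follows the dict's insertion order.
        return tuple(self.dataset[index].values())


base = DictDataset([{"image": 1.0, "class": 0}, {"image": 2.0, "class": 1}])
view = TupleView(base, transform=lambda s: {**s, "image": s["image"] * 2})
first = view[0]
print(first)
```

Tuples are convenient for training loops that unpack batches positionally, at the cost of losing the modality keys.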