mml.core.data_loading.task_dataset
- class TaskDataset[source]
Bases: Dataset
The TaskDataset class represents a loadable dataset. It handles folds, data loading, the different modalities of a task, and non-batched transforms. After initialization, it can be passed directly to a (multithreaded) dataloader.
- __init__(root: Path | str, split: DataSplit = DataSplit.TRAIN, fold: int = 0, transform: AugmentationModule | AugmentationModuleContainer | None = None, caching_limit: int = 0, loaders: Dict[Modality, ModalityLoader] | None = None)[source]
The TaskDataset initialization loads all meta information on the task and selects the active split and fold. This choice can be changed later with the ‘select_samples’ method.
- Parameters:
root (Path) – Path to TASKXXX_name.json file of task to load.
split (DataSplit) – one of ‘train’, ‘val’, ‘full_train’ and ‘test’
fold (int) – ignored for the ‘test’ split; in the ‘train’ split this is the inactive fold, in the ‘val’ split it is the only active fold
transform (Optional[A.Compose]) – albumentations Compose transform to be applied to samples
caching_limit (int) – maximum number of images to cache
loaders (Optional[Dict[Modality, ModalityLoader]]) – a dict of ModalityLoaders for this task; if None is given, a default set of loaders is used
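The interplay of split and fold described above can be illustrated with a small standalone sketch. It does not use mml: the function name `active_folds` and the `n_folds` argument are hypothetical, and the handling of ‘full_train’ (all folds) and ‘test’ (no folds) is an assumption made for illustration only.

```python
from typing import List


def active_folds(split: str, fold: int, n_folds: int) -> List[int]:
    """Return the fold indices that contribute samples (illustrative only)."""
    if not 0 <= fold < n_folds:
        raise ValueError(f"fold must be in range({n_folds}), got {fold}")
    if split == "train":
        # The selected fold is inactive, all remaining folds are active.
        return [f for f in range(n_folds) if f != fold]
    if split == "val":
        # The selected fold is the only active one.
        return [fold]
    if split == "full_train":
        # Assumption: every fold is used, regardless of the fold argument.
        return list(range(n_folds))
    if split == "test":
        # The fold argument is irrelevant here.
        # Assumption: test samples are kept outside the folds.
        return []
    raise ValueError(f"unknown split: {split!r}")
```

With four folds, `active_folds("train", 1, 4)` and `active_folds("val", 1, 4)` together cover every fold exactly once, which is the usual cross-validation arrangement.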
- enable_cache() → None[source]
Enables caching to speed up training. Call this after the cache has been created and filled.
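The fill-then-enable pattern can be sketched without mml as follows. This is not the library's implementation: `CachingLoader`, `fill_cache`, and `load` are hypothetical names, and the assumption is that at most `caching_limit` items are stored and that cached items are only served once caching has been enabled.

```python
from typing import Any, Callable, Dict, Iterable


class CachingLoader:
    """Hypothetical loader with a bounded cache that must be enabled explicitly."""

    def __init__(self, load_fn: Callable[[int], Any], caching_limit: int = 0) -> None:
        self.load_fn = load_fn
        self.caching_limit = caching_limit
        self.cache: Dict[int, Any] = {}
        self.cache_enabled = False

    def fill_cache(self, indices: Iterable[int]) -> None:
        # Store items until the limit is reached, then stop.
        for index in indices:
            if len(self.cache) >= self.caching_limit:
                break
            self.cache[index] = self.load_fn(index)

    def enable_cache(self) -> None:
        # From now on, cached items are served without calling load_fn.
        self.cache_enabled = True

    def load(self, index: int) -> Any:
        if self.cache_enabled and index in self.cache:
            return self.cache[index]
        return self.load_fn(index)
```

Items beyond the limit are never cached in this sketch, so they are loaded from the source on every access even after `enable_cache()` has been called.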
- static get_classes_from_idx_dict(idx_to_class: Dict[int, str]) → List[str][source]
Transforms the idx_to_class dict of a task to the actual list of classes.
- Parameters:
idx_to_class – index to class mapping as provided in task meta information
- Returns:
class list, ordered by increasing idx
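A minimal sketch of the documented behaviour, written as a standalone function rather than the library's own code. The name `classes_from_idx_dict` is hypothetical, and the sketch assumes nothing about the keys beyond them being sortable integers.

```python
from typing import Dict, List


def classes_from_idx_dict(idx_to_class: Dict[int, str]) -> List[str]:
    """Return the class names ordered by increasing index."""
    return [idx_to_class[idx] for idx in sorted(idx_to_class)]


# The insertion order of the dict does not matter, only the index values.
print(classes_from_idx_dict({2: "polyp", 0: "background", 1: "instrument"}))
```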
- load_sample(index: int) → Dict[str, Any][source]
Loads all components required for a sample, based on the active modalities and the information they provide. Be aware that for preprocessing the raw_index_mapping is removed by default (set to None); handle this separately.
- Parameters:
index – int within range(len(self.samples))
- Returns:
dict mapping each modality key (str) to the loaded object
- class TupelizedTaskDataset[source]
Bases: Dataset
- __init__(task_dataset: TaskDataset, transform: Compose | None = None)[source]
Converts the output of a TaskDataset (dicts by default) to tuples. Also allows overriding the dataset transform.
- Parameters:
task_dataset (TaskDataset) – TaskDataset instance
transform – (optional) if not None, overwrites the dataset transform
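The idea of wrapping a dict-returning dataset so that it yields tuples can be sketched in plain Python. Everything here is hypothetical: `DictDataset` stands in for a TaskDataset, `TupelizedSketch` is not the library class, and the explicit `keys` argument is an assumption, since the ordering used by TupelizedTaskDataset is not documented here. For simplicity the sketch applies the optional transform to the dict sample instead of replacing a transform inside the wrapped dataset.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Tuple

Sample = Dict[str, Any]


class DictDataset:
    """Hypothetical stand-in for a dataset whose samples are dicts."""

    def __init__(self, samples: Sequence[Sample]) -> None:
        self.samples = list(samples)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Sample:
        # Return a copy so transforms cannot modify the stored sample.
        return dict(self.samples[index])


class TupelizedSketch:
    """Wraps a dict-returning dataset and yields tuples in a fixed key order."""

    def __init__(
        self,
        dataset: DictDataset,
        keys: Sequence[str],
        transform: Optional[Callable[[Sample], Sample]] = None,
    ) -> None:
        self.dataset = dataset
        self.keys = tuple(keys)
        self.transform = transform

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, index: int) -> Tuple[Any, ...]:
        sample = self.dataset[index]
        if self.transform is not None:
            sample = self.transform(sample)
        return tuple(sample[key] for key in self.keys)
```

Tuples are convenient for training loops that unpack `image, label = batch`, while dicts keep the modality names visible. A fixed key order is what makes the tuple form unambiguous.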