mml.core.data_preparation.utils
- class TaskCreatorActions
Bases: StrEnum
Abstract action that can be done on a task creator.
- FIND_DATA = 'find_data'
- FINISH = 'finish'
- LOAD = 'load'
- MODIFY = 'modify'
- NONE = 'none'
- SET_FOLDING = 'set_folds'
- SET_STATS = 'set_stats'
- __new__(value)
- class TaskCreatorState
Bases: IntEnum
Abstract states a task creator can be in. The default traversal path is: INIT –find_data–> DATA_FOUND –set_folding–> FOLDS_SPLIT –infer/set_stats–> STATS_SET –finish–> FINISHED
- DATA_FOUND = 1
- FINISHED = 4
- FOLDS_SPLIT = 2
- INIT = 0
- STATS_SET = 3
- __new__(value)
- traverse(action: TaskCreatorActions) → TaskCreatorState
Implements the legal traversals of states and actions within a task creator.
- Parameters:
action (TaskCreatorActions) – The action to apply to the current state.
- Returns:
The follow-up state of the task creator.
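The default traversal path described above can be pictured as a small lookup table. The following is a minimal sketch, not the mml implementation: `Action`, `State` and `_DEFAULT_PATH` are hypothetical stand-ins, only the default path is modeled, and raising `RuntimeError` on any other combination is an assumption (the handling of LOAD, MODIFY and NONE is not documented here).

```python
from enum import Enum, IntEnum


class Action(str, Enum):
    # Hypothetical stand-in for TaskCreatorActions (only the default-path actions).
    FIND_DATA = "find_data"
    SET_FOLDING = "set_folds"
    SET_STATS = "set_stats"
    FINISH = "finish"


class State(IntEnum):
    # Hypothetical stand-in for TaskCreatorState.
    INIT = 0
    DATA_FOUND = 1
    FOLDS_SPLIT = 2
    STATS_SET = 3
    FINISHED = 4

    def traverse(self, action: Action) -> "State":
        # Look up the (state, action) pair; anything not listed is rejected.
        try:
            return _DEFAULT_PATH[(self, action)]
        except KeyError:
            # Assumption: the real library may signal illegal traversals differently.
            raise RuntimeError(
                f"Action {action.value!r} is not allowed in state {self.name}"
            ) from None


# Default path only: INIT -> DATA_FOUND -> FOLDS_SPLIT -> STATS_SET -> FINISHED.
_DEFAULT_PATH = {
    (State.INIT, Action.FIND_DATA): State.DATA_FOUND,
    (State.DATA_FOUND, Action.SET_FOLDING): State.FOLDS_SPLIT,
    (State.FOLDS_SPLIT, Action.SET_STATS): State.STATS_SET,
    (State.STATS_SET, Action.FINISH): State.FINISHED,
}
```

Keeping the transitions in one table makes the legal path easy to audit and to extend with further actions.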
- class WIPBar
Bases: Singleton
A singleton class that shows a loading loop managed by a thread (e.g. while data is copied). The description can be updated during loading, and at the end either a success or a failure message is printed, depending on whether an exception was raised. Exception handling can be done either inside or outside the context. Does not interfere with itself if used in a nested fashion, but the user is responsible for updating the description after each inner loop closes.
Usage:
    with WIPBar() as bar:
        bar.desc = 'Copying'
        shutil.copytree(...)
        bar.desc = 'Extracting'
        zipfile ...
    # continue without WIPBar
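To illustrate the mechanism described above (a thread redraws the line while the body of the `with` block runs, and the exit handler prints success or failure), here is a minimal sketch. `Spinner` is a hypothetical stand-in and not the WIPBar implementation; the `stream` and `interval` parameters and the message wording are assumptions, and the singleton and nesting behavior are not modeled.

```python
import itertools
import sys
import threading


class Spinner:
    """Hypothetical stand-in for WIPBar; not the mml implementation."""

    def __init__(self, stream=None, interval=0.05):
        self.desc = "Working"
        self._stream = stream if stream is not None else sys.stdout
        self._interval = interval
        self._stop = threading.Event()
        self._thread = None

    def _spin(self):
        # Redraw the current description until the main thread signals completion.
        for frame in itertools.cycle("|/-\\"):
            self._stream.write(f"\r{self.desc} {frame}")
            self._stream.flush()
            if self._stop.wait(self._interval):
                break

    def __enter__(self):
        self._stop.clear()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Stop the worker first so the final message is not overwritten.
        self._stop.set()
        self._thread.join()
        status = "failed" if exc_type is not None else "done"
        self._stream.write(f"\r{self.desc} ... {status}\n")
        self._stream.flush()
        return False  # do not swallow the exception; callers may handle it outside
```

Returning `False` from `__exit__` is what allows exception handling either inside or outside the context, as the description above states for WIPBar.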
- calc_means_stds_sizes(task_path: Path, means: bool = True, stds: bool = True, sizes: bool = True, const_size: bool = False, device: device = device(type='cuda')) → Dict[str, Sizes | RGBInfo]
Calculates means, stds and/or sizes of a task. Requires at most 2 runs through the dataset. Might take some time.
- Parameters:
task_path – path to task .json file.
means – whether means should be calculated
stds – whether stds should be calculated
sizes – whether sizes should be calculated
const_size – whether all images have the same size, which allows for faster loading; if ‘sizes’ is True and constant sizes are found, this is detected internally
device – device to be used for computations
- Returns:
dict with possible keys ‘sizes’, ‘means’, ‘stds’ and Sizes / RGBInfo values
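Why two runs through the data can suffice is easiest to see in a plain two-pass computation: the first pass yields the per-channel means, the second pass the deviations from those means. The sketch below is an illustration under assumptions, not the function's actual algorithm: `channel_means_stds` is a hypothetical helper, images are modeled as sequences of RGB tuples, and the population standard deviation (division by N) is used.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def channel_means_stds(
    images: Sequence[Sequence[Pixel]],
) -> Tuple[List[float], List[float]]:
    # Pass 1: accumulate per-channel sums to obtain the means.
    sums = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for pixel in image:
            for c in range(3):
                sums[c] += pixel[c]
            count += 1
    means = [s / count for s in sums]

    # Pass 2: accumulate squared deviations from the now-known means.
    squared = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            for c in range(3):
                squared[c] += (pixel[c] - means[c]) ** 2
    # Assumption: population standard deviation (divide by N, not N - 1).
    stds = [math.sqrt(s / count) for s in squared]
    return means, stds
```

Image sizes could be collected during the first pass as well, which would be consistent with the "at most 2 runs" statement above.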
- download_file(path_to_store: Path, download_url: str, file_name: str) → None
Downloads the file from download_url and stores it as file_name in path_to_store. Skips downloads that have already finished.
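One common way to make "skip finished downloads" safe is to write to a temporary name and rename only once the transfer is complete. The sketch below shows that pattern with the standard library; `fetch_once`, the `.part` suffix and the boolean return value are assumptions for illustration and do not describe how download_file is implemented.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_once(path_to_store: Path, download_url: str, file_name: str) -> bool:
    """Hypothetical helper: return True if a download happened, False if skipped."""
    target = path_to_store / file_name
    if target.exists():
        # A file under the final name only exists after a completed download.
        return False
    path_to_store.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    # Stream to a temporary name so an interrupted transfer is never mistaken
    # for a finished one.
    with urllib.request.urlopen(download_url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)
    return True
```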
- get_iterator_and_mapping_from_image_dataset(root: Path, dup_id_flag: bool | None = False, classes: List[str] | None = None) → Tuple[List[Dict[Modality, int | List[int] | List[float] | str]], Dict[int, str]]
Utility function for the recurring case that classification datasets are organized as in ImageFolder. The iterator will store the file stem of the image as id.
- Parameters:
root (Path)
dup_id_flag (bool | None)
classes (List[str] | None)
- Returns:
data iterator and idx_to_class as used for TaskCreator.find_data
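The ImageFolder convention means one sub-directory per class, with the images of that class inside it. A minimal sketch of scanning such a layout follows; `scan_image_folder` is a hypothetical helper, the plain string keys `"id"`, `"image"` and `"class"` stand in for the library's Modality keys, and the `dup_id_flag` and `classes` options are not modeled.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def scan_image_folder(root: Path) -> Tuple[List[Dict[str, object]], Dict[int, str]]:
    # Sub-directory names are the class names; sort them for a stable index order.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    idx_to_class = dict(enumerate(class_names))

    items: List[Dict[str, object]] = []
    for idx, name in idx_to_class.items():
        for img in sorted((root / name).iterdir()):
            if img.is_file():
                # Assumption: string keys replace the Modality keys of the real iterator.
                items.append({"id": img.stem, "image": str(img), "class": idx})
    return items, idx_to_class
```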
- get_iterator_from_segmentation_dataset(images_root: Path, masks_root: Path, path_matcher: Callable[[Path], Path] = <function <lambda>>) → List[Dict[Modality, int | List[int] | List[float] | str]]
Utility function for the recurring case that segmentation datasets are organized as follows: there are two separate folders containing the images and the labels respectively, and the file names in both follow a similar structure or pattern. The iterator will store the file stem of the image as id.
- Parameters:
images_root – root path of image data
masks_root – root path of mask data
path_matcher – (optional) function that returns the (relative) mask path for a (relative) image path, where relative means relative to the provided root paths; the default is the identity function
- Returns:
data iterator as used for TaskCreator.find_data
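The role of path_matcher is to translate an image path, relative to images_root, into the matching mask path, relative to masks_root. The following sketch shows that pairing under assumptions: `pair_images_and_masks` is a hypothetical helper, the string keys stand in for the library's Modality keys, and raising `FileNotFoundError` for a missing mask is an illustrative choice.

```python
from pathlib import Path
from typing import Callable, Dict, List


def pair_images_and_masks(
    images_root: Path,
    masks_root: Path,
    path_matcher: Callable[[Path], Path] = lambda p: p,
) -> List[Dict[str, str]]:
    items: List[Dict[str, str]] = []
    for img in sorted(images_root.rglob("*")):
        if not img.is_file():
            continue
        # The matcher works on paths relative to the respective root folders.
        relative = img.relative_to(images_root)
        mask = masks_root / path_matcher(relative)
        if not mask.exists():
            # Assumption: a missing mask is treated as an error in this sketch.
            raise FileNotFoundError(f"No mask found for {relative}")
        items.append({"id": img.stem, "image": str(img), "mask": str(mask)})
    return items
```

With the identity default, masks are expected under the same relative path as their images; a matcher such as `lambda p: p.with_name(p.name.replace("_img", "_mask"))` would cover a naming scheme where only a suffix differs.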
- get_iterator_from_unlabeled_dataset(root: Path) → List[Dict[Modality, int | List[int] | List[float] | str]]
Utility function for the recurring case that unlabeled data is simply organised in a single folder. The iterator will store the file stem of the image as id.
- Parameters:
root (Path) – root path
- Returns:
data iterator as used for TaskCreator.find_data