mml.core.data_preparation.utils

class TaskCreatorActions

Bases: StrEnum

Abstract actions that can be performed on a task creator.

FIND_DATA = 'find_data'
FINISH = 'finish'
LOAD = 'load'
MODIFY = 'modify'
NONE = 'none'
SET_FOLDING = 'set_folds'
SET_STATS = 'set_stats'
__new__(value)
class TaskCreatorState

Bases: IntEnum

Abstract states a task creator can be in. The default traversal path is: INIT --find_data--> DATA_FOUND --set_folding--> FOLDS_SPLIT --infer/set_stats--> STATS_SET --finish--> FINISHED

DATA_FOUND = 1
FINISHED = 4
FOLDS_SPLIT = 2
INIT = 0
STATS_SET = 3
__new__(value)
traverse(action: TaskCreatorActions) → TaskCreatorState

Implements the legal traversals of states and actions within a task creator.

Parameters:

action (TaskCreatorActions) – The action to apply to the current state.

Returns:

The follow-up state of the task creator.
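The default traversal path above can be read as a transition table. The following is a minimal sketch of that idea, not the mml implementation: `Action`, `State`, `TRANSITIONS` and `traverse_sketch` are hypothetical names, `Action` uses `(str, Enum)` instead of `StrEnum` so it also runs before Python 3.11, and only the four transitions of the default path are encoded (the real `traverse` presumably also handles actions such as LOAD, MODIFY or NONE, whose rules are not documented here).

```python
from enum import Enum, IntEnum


class Action(str, Enum):
    # Hypothetical mirror of the TaskCreatorActions values on the default path.
    FIND_DATA = "find_data"
    SET_FOLDING = "set_folds"
    SET_STATS = "set_stats"
    FINISH = "finish"


class State(IntEnum):
    # Hypothetical mirror of TaskCreatorState.
    INIT = 0
    DATA_FOUND = 1
    FOLDS_SPLIT = 2
    STATS_SET = 3
    FINISHED = 4


# Legal (state, action) pairs of the default traversal path only.
TRANSITIONS = {
    (State.INIT, Action.FIND_DATA): State.DATA_FOUND,
    (State.DATA_FOUND, Action.SET_FOLDING): State.FOLDS_SPLIT,
    (State.FOLDS_SPLIT, Action.SET_STATS): State.STATS_SET,
    (State.STATS_SET, Action.FINISH): State.FINISHED,
}


def traverse_sketch(state: State, action: Action) -> State:
    """Return the follow-up state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"Action {action.value!r} is not allowed in state {state.name}"
        ) from None
```

A lookup table keeps the legal transitions in one place, so an illegal action (e.g. `finish` directly from `INIT`) fails loudly instead of silently leaving the creator in an inconsistent state.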

class WIPBar

Bases: Singleton

A singleton class that displays a loading animation driven by a background thread (e.g. while data is being copied). The description can be updated while loading; at the end, either a success or a failure message is printed, depending on whether an exception was raised. Exceptions can be handled either inside or outside the context. Nested use does not interfere with itself, but the user is responsible for updating the description after each inner context closes.

Usage:

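import shutil
import zipfile

from mml.core.data_preparation.utils import WIPBar

with WIPBar() as bar:
    bar.desc = 'Copying'
    shutil.copytree(...)
    bar.desc = 'Extracting'
    with zipfile.ZipFile(...) as archive:
        archive.extractall(...)
# continue without WIPBar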
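To illustrate the mechanism described above (a thread redrawing a description until the context exits, then a success or failure message), here is a simplified sketch. `SpinnerSketch` is a hypothetical class, not WIPBar itself: it omits the singleton behaviour and nesting support, and its glyphs and messages are assumptions.

```python
import itertools
import sys
import threading
import time
from typing import Optional


class SpinnerSketch:
    """Simplified, hypothetical stand-in for WIPBar (no singleton, no nesting)."""

    def __init__(self, interval: float = 0.1) -> None:
        self.desc = ""
        self._interval = interval
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def _spin(self) -> None:
        # Redraw the current description with a rotating glyph until stopped.
        for glyph in itertools.cycle("|/-\\"):
            if self._stop.is_set():
                break
            sys.stdout.write(f"\r{glyph} {self.desc}")
            sys.stdout.flush()
            time.sleep(self._interval)

    def __enter__(self) -> "SpinnerSketch":
        self._stop.clear()
        self._thread = threading.Thread(target=self._spin, daemon=True)
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Stop the background thread before printing the final message.
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        status = "failed" if exc_type is not None else "done"
        sys.stdout.write(f"\r{self.desc}: {status}\n")
        sys.stdout.flush()
        return False  # never swallow the exception; it propagates to the caller
```

Returning `False` from `__exit__` is what lets the exception be handled outside the context, matching the behaviour the description states for WIPBar.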
__init__()
reset_messages()
calc_means_stds_sizes(task_path: Path, means: bool = True, stds: bool = True, sizes: bool = True, const_size: bool = False, device: device = device(type='cuda')) → Dict[str, Sizes | RGBInfo]

Calculates the means, standard deviations and/or sizes for a task. Requires at most two passes over the dataset, so it may take some time.

Parameters:
  • task_path – path to the task .json file

  • means – whether the means should be calculated

  • stds – whether the standard deviations should be calculated

  • sizes – whether the sizes should be calculated

  • const_size – whether all images have the same size (allows faster loading); if sizes is True and all images turn out to share one size, this is detected internally

  • device – device used for the computations (defaults to CUDA)

Returns:

dict with (a subset of) the keys ‘sizes’, ‘means’ and ‘stds’, holding Sizes or RGBInfo values
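Why "at most two passes"? With the classic two-pass method, the first pass computes the per-channel means and the second sums squared deviations from those means to obtain the standard deviations. The sketch below shows that technique on plain RGB pixel tuples; it is an assumption about how such statistics can be computed, not the mml code, which works on a task's images and tensors on `device`. `two_pass_rgb_stats` is a hypothetical name, and it returns the population standard deviation.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def two_pass_rgb_stats(images: Sequence[Sequence[Pixel]]) -> Tuple[List[float], List[float]]:
    """Per-channel mean and population std over all pixels of all images."""
    # Pass 1: accumulate per-channel sums to obtain the means.
    n = 0
    sums = [0.0, 0.0, 0.0]
    for img in images:
        for px in img:
            for c in range(3):
                sums[c] += px[c]
            n += 1
    if n == 0:
        raise ValueError("no pixels to compute statistics from")
    means = [s / n for s in sums]
    # Pass 2: accumulate squared deviations from the now-known means.
    sq_dev = [0.0, 0.0, 0.0]
    for img in images:
        for px in img:
            for c in range(3):
                sq_dev[c] += (px[c] - means[c]) ** 2
    stds = [sqrt(s / n) for s in sq_dev]
    return means, stds
```

The second pass is only needed when stds are requested, which is consistent with the "at most" in the description.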

download_file(path_to_store: Path, download_url: str, file_name: str) → None

Downloads a file from download_url and stores it according to path_to_store and file_name. Skips downloads that have already finished.

Parameters:
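  • path_to_store (Path) – location where the downloaded file is stored

  • download_url (str) – URL to download the file from

  • file_name (str) – name under which the file is stored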

Returns:

None
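One common way to "skip finished downloads" is to write to a temporary `.part` file and rename it only once the transfer completes, so an existing final file always means a finished download. The sketch below assumes that path_to_store is a directory and uses only the standard library; `download_file_sketch` is a hypothetical function and the `.part` convention is an assumption, not necessarily what mml does.

```python
import os
import shutil
import urllib.request
from pathlib import Path


def download_file_sketch(path_to_store: Path, download_url: str, file_name: str) -> None:
    path_to_store.mkdir(parents=True, exist_ok=True)
    target = path_to_store / file_name
    if target.exists():
        return  # only completed downloads are renamed to the final name, so skip
    partial = target.with_name(target.name + ".part")
    # Stream the response into the temporary file (overwrites leftovers).
    with urllib.request.urlopen(download_url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)  # rename marks the download as finished
```

An interrupted run leaves only the `.part` file behind, which the next call overwrites instead of mistaking it for a complete download.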

get_iterator_and_mapping_from_image_dataset(root: Path, dup_id_flag: bool | None = False, classes: List[str] | None = None) → Tuple[List[Dict[Modality, int | List[int] | List[float] | str]], Dict[int, str]]

Utility function for the recurring case in which a classification dataset is organised as in torchvision's ImageFolder, i.e. with one subfolder per class. The iterator stores the file stem of each image as its id.

Parameters:
  • root (Path) – root path

  • dup_id_flag (Optional[bool]) – (optional) flag to set if identical file names occur in different classes

  • classes (Optional[List[str]]) – (optional) list of class names; if not given, every subdirectory of root is used as a class

Returns:

data iterator and idx_to_class mapping, as expected by TaskCreator.find_data
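The ImageFolder layout (root/<class_name>/<image>) maps naturally onto an iterator of per-sample dicts plus an index-to-class mapping. The sketch below illustrates that under stated assumptions: the string keys `sample_id`, `image` and `class` are hypothetical stand-ins for the real `Modality` keys, classes are sorted alphabetically when not given, and dup_id_flag is not modelled because its exact effect is not documented here.

```python
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def image_folder_iterator_sketch(
    root: Path, classes: Optional[List[str]] = None
) -> Tuple[List[Dict[str, object]], Dict[int, str]]:
    # Without an explicit class list, every subdirectory of root becomes a class.
    if classes is None:
        classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    idx_to_class = dict(enumerate(classes))
    class_to_idx = {name: idx for idx, name in idx_to_class.items()}
    iterator: List[Dict[str, object]] = []
    for name in classes:
        for img_path in sorted((root / name).iterdir()):
            if not img_path.is_file():
                continue
            iterator.append(
                {
                    "sample_id": img_path.stem,  # file stem used as id
                    "image": str(img_path),
                    "class": class_to_idx[name],
                }
            )
    return iterator, idx_to_class
```

Passing an explicit class list both restricts which folders are read and fixes the index assigned to each class.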

get_iterator_from_segmentation_dataset(images_root: Path, masks_root: Path, path_matcher: Callable[[Path], Path] = <function <lambda>>) → List[Dict[Modality, int | List[int] | List[float] | str]]

Utility function for the recurring case in which a segmentation dataset is organised as follows: images and masks are stored in two separate folders, and the file names in both folders follow a similar structure or pattern. The iterator stores the file stem of each image as its id.

Parameters:
  • images_root – root path of image data

  • masks_root – root path of mask data

  • path_matcher – (optional) function mapping the relative image path to the relative mask path, where relative means relative to images_root and masks_root respectively; defaults to the identity function

Returns:

data iterator, as expected by TaskCreator.find_data
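The role of path_matcher is easiest to see in code: each image path is made relative to images_root, mapped by path_matcher, and resolved against masks_root. A minimal sketch, assuming hypothetical string keys `sample_id`, `image` and `mask` in place of the real `Modality` keys, and assuming that a missing mask should raise an error (the actual behaviour is not documented here):

```python
from pathlib import Path
from typing import Callable, Dict, List


def segmentation_iterator_sketch(
    images_root: Path,
    masks_root: Path,
    path_matcher: Callable[[Path], Path] = lambda p: p,  # identity by default
) -> List[Dict[str, str]]:
    iterator: List[Dict[str, str]] = []
    for img_path in sorted(p for p in images_root.rglob("*") if p.is_file()):
        rel_img = img_path.relative_to(images_root)
        # Map the relative image path to the relative mask path.
        mask_path = masks_root / path_matcher(rel_img)
        if not mask_path.is_file():
            raise FileNotFoundError(f"No mask for {img_path} (expected {mask_path})")
        iterator.append(
            {"sample_id": img_path.stem, "image": str(img_path), "mask": str(mask_path)}
        )
    return iterator
```

For a dataset whose masks are named `x_mask.png` next to images named `x.png`, a matcher such as `lambda p: p.with_name(p.stem + "_mask.png")` would do.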

get_iterator_from_unlabeled_dataset(root: Path) → List[Dict[Modality, int | List[int] | List[float] | str]]

Utility function for the recurring case in which unlabelled data is simply stored in a single folder. The iterator stores the file stem of each image as its id.

Parameters:

root (Path) – root path

Returns:

data iterator, as expected by TaskCreator.find_data
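For the single-folder case, the iterator reduces to one entry per file. A minimal sketch, assuming hypothetical string keys `sample_id` and `image` in place of the real `Modality` keys and assuming that only regular files directly inside root count as samples:

```python
from pathlib import Path
from typing import Dict, List


def unlabeled_iterator_sketch(root: Path) -> List[Dict[str, str]]:
    # Every regular file directly inside root is one sample; its stem is the id.
    return [
        {"sample_id": p.stem, "image": str(p)}
        for p in sorted(root.iterdir())
        if p.is_file()
    ]
```

Sorting the directory listing makes the iterator order deterministic across file systems.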