mml.core.data_preparation

mml.core.data_preparation deals with the initial data integration. The more general runtime data loading is handled in mml.core.data_loading.

The two core components of integrating a task into mml are
  • DSetCreator to virtually arrange data at the right place

  • TaskCreator to aggregate full task description into a .json file

The splitting of these two concepts has the following advantages:
  • large files like images and masks are only stored once even if multiple tasks share the same files

  • support for multiple kinds of tasks on the same data

  • capabilities to easily modify task descriptions without touching underlying data
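
The separation described above can be illustrated with a minimal, self-contained sketch. It does not use the actual DSetCreator or TaskCreator classes; the functions arrange_dataset, describe_task and build_demo, the directory layout, and the JSON fields are all hypothetical and chosen only for illustration. The sketch stores the raw files once and then writes two task descriptions that merely reference them.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def arrange_dataset(root: Path, name: str, files: dict[str, bytes]) -> Path:
    """Hypothetical stand-in for the dataset step: store raw files exactly once."""
    dset_dir = root / "datasets" / name
    dset_dir.mkdir(parents=True, exist_ok=True)
    for rel_name, payload in files.items():
        (dset_dir / rel_name).write_bytes(payload)
    return dset_dir


def describe_task(root: Path, task_name: str, dset_dir: Path, labels: dict[str, int]) -> Path:
    """Hypothetical stand-in for the task step: write a .json that only references files."""
    samples = [
        {"image": str((dset_dir / rel_name).relative_to(root)), "label": label}
        for rel_name, label in sorted(labels.items())
    ]
    task_file = root / "tasks" / f"{task_name}.json"
    task_file.parent.mkdir(parents=True, exist_ok=True)
    task_file.write_text(json.dumps({"name": task_name, "samples": samples}, indent=2))
    return task_file


def build_demo(root: Path) -> tuple[Path, Path, Path]:
    """Arrange one dataset and describe two different tasks on top of it."""
    dset_dir = arrange_dataset(root, "demo", {"a.png": b"\x00", "b.png": b"\x01"})
    # Both task files point at the same two images; no image data is copied.
    task_a = describe_task(root, "demo_binary", dset_dir, {"a.png": 0, "b.png": 1})
    task_b = describe_task(root, "demo_quality", dset_dir, {"a.png": 2, "b.png": 2})
    return dset_dir, task_a, task_b


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dset_dir, task_a, task_b = build_demo(Path(tmp))
        print(sorted(p.name for p in dset_dir.iterdir()))
        print(task_a.name, task_b.name)
```

In this sketch, changing a label only means rewriting a small .json file, while the files under datasets/ stay untouched. This mirrors the third advantage listed above.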

The modules dset_creator and task_creator hold the respective classes. fake_task is a simple instantiation of those that can be used during testing. The registry is the central spot for administration. data_archive provides the capability to describe and arrange raw datasets, e.g. ones to be downloaded from the web. The archive_extractors add support for unpacking archives such as zip or rar. Finally, utils holds a set of convenience functions to be used while creating tasks.