Medical Meta Learner

mml is a research-oriented Python package that aims to provide an easy and scalable way of doing deep learning on multiple image tasks (see Meta-Learning).

It features:

a clear methodology to store, load, refer, modify and combine RGB image datasets across task types (classification, segmentation, …)

a highly configurable CLI for the full deep learning pipeline

a dedicated file management system, capable of continuing aborted experiments, reusing previous results and parallelizing runs

an API for interactive pre- and post-experiment exploration

smooth integration of latest deep learning libraries (lightning, hydra, optuna, …)

easy expandability via plugins or by hooking directly into runtime objects from scripts or notebooks

good documentation, broad testing and ambitious goals

Note

MML is still in beta, so any feedback is highly appreciated!

Quickstart

Set up mml as described in Installation and write a short script to load your data into mml as follows:

from pathlib import Path  # needed for the type annotation of create_task below

from mml.api import (DSetCreator, License, Keyword, TaskCreator, TaskType, register_dsetcreator,
                     register_taskcreator, get_iterator_and_mapping_from_image_dataset)
from mml.cli import main
# this example shows how to quickly include an existing pytorch image classification dataset
from my_code.data import MyExistingPyTorchDataSet

dset_name = 'my_dataset'
task_name = 'my_task'

@register_dsetcreator(dset_name=dset_name)
def create_dset():
    dset_creator = DSetCreator(dset_name=dset_name)
    # DSetCreator has various help functions to create datasets (e.g. from kaggle, pytorch datasets, ...)
    train_dset = MyExistingPyTorchDataSet(root=dset_creator.download_path, download=True, train=True)
    test_dset = MyExistingPyTorchDataSet(root=dset_creator.download_path, download=True, train=False)
    dset_path = dset_creator.extract_from_pytorch_datasets(datasets={'training': train_dset,
                                                                     'testing': test_dset},
                                                           task_type=TaskType.CLASSIFICATION,
                                                           class_names=train_dset.classes)
    return dset_path


@register_taskcreator(task_name=task_name, dset_name=dset_name)
def create_task(dset_path: Path):
    task = TaskCreator(dset_path=dset_path, name=task_name,
                       task_type=TaskType.CLASSIFICATION,
                       desc="(optional) My task description.",
                       ref="(optional) My bibtex entry.",
                       url='(optional) My data website.',
                       instr='(optional) Any instructions to access data.',
                       lic=License.UNKNOWN,  # the license of the task
                       release='(optional) Year of data release.',
                       keywords=[Keyword.NATURAL_OBJECTS])  # choose from a variety of keywords to describe data background
    # if classes are split by folders (which is the case if using extract_from_pytorch_datasets), one may simply
    # use the following helper to obtain the data iterators and the index-to-class mapping
    train_iterator, idx_to_class = get_iterator_and_mapping_from_image_dataset(
        root=dset_path / 'training_data', classes=None)
    test_iterator, _ = get_iterator_and_mapping_from_image_dataset(
        root=dset_path / 'testing_data', classes=None)
    task.find_data(train_iterator=train_iterator, test_iterator=test_iterator, idx_to_class=idx_to_class)
    task.auto_complete()

# start the MML cli from this script
if __name__ == "__main__":
    main()
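
The task creator above relies on the images being split into one folder per class. As a rough illustration of that idea (not mml's actual implementation), the following standalone sketch derives an index-to-class mapping and a list of labelled samples from such a layout. The helper names infer_classes_from_folders and list_samples are hypothetical, and the exact folder layout that mml writes is an assumption here.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def infer_classes_from_folders(root: Path) -> dict[int, str]:
    """Map consecutive indices to the sorted names of the sub-folders of root."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return dict(enumerate(class_names))


def list_samples(root: Path, idx_to_class: dict[int, str]) -> list[tuple[Path, int]]:
    """Return (file path, class index) pairs for every file inside a class folder."""
    samples = []
    for idx, name in idx_to_class.items():
        for file in sorted((root / name).iterdir()):
            if file.is_file():
                samples.append((file, idx))
    return samples


def demo() -> tuple[dict[int, str], int]:
    # build a tiny throwaway layout: <root>/<class name>/<image file>
    with TemporaryDirectory() as tmp:
        root = Path(tmp) / "training_data"
        for name in ("cat", "dog"):
            (root / name).mkdir(parents=True)
            (root / name / "0.png").touch()
        mapping = infer_classes_from_folders(root)
        return mapping, len(list_samples(root, mapping))


if __name__ == "__main__":
    print(demo())
```

Sorting the folder names keeps the index assignment reproducible across runs and machines, which matters once predictions have to be mapped back to class names.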

You can run your script with any mml CLI configuration and use the registered data along with it. The following commands install the data, preprocess it, train a model and infer predictions on the test split.

python script_name.py create task_list=[my_task]
python script_name.py pp task_list=[my_task]
python script_name.py train pivot.name=my_task mode.subroutines=[train,predict] mode.cv=false

See Getting started for more details on customizing the pipeline via CLI. Note that the registered creators are only needed for the create call; afterwards you may omit python script_name.py and start the mml pipeline by typing mml train ... instead.
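
If you prefer to drive the three steps from Python rather than typing them one by one, a minimal sketch could look like the following. It only reuses the commands shown above; the helper names build_commands and run_pipeline are hypothetical, and the dry-run default is a choice made for this example.

```python
import subprocess
import sys

SCRIPT = "script_name.py"  # the quickstart script that registers the creators
TASK = "my_task"           # the task name registered in that script


def build_commands(script: str = SCRIPT, task: str = TASK) -> list[list[str]]:
    """Return the three quickstart invocations as argument lists."""
    base = [sys.executable, script]
    return [
        base + ["create", f"task_list=[{task}]"],
        base + ["pp", f"task_list=[{task}]"],
        base + ["train", f"pivot.name={task}", "mode.subroutines=[train,predict]", "mode.cv=false"],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence as soon as one step fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands())
```

Passing the arguments as a list bypasses the shell, so the square brackets in task_list=[my_task] are not interpreted as glob patterns (which some shells, such as zsh, would otherwise attempt).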

Similar libraries

Here is a small comparison to python packages that are close to mml:

lightning-hydra-template is a template for deep learning projects that similarly relies on hydra and pytorch lightning. It offers much less functionality and fewer configuration options, as it is intended to be extended individually for each project; mml, on the other hand, tries to unify many tasks, datasets and models in one environment to ease cross-project reusability.
GaNDLF, the Generally Nuanced Deep Learning Framework for segmentation, regression and classification, has a similar scope to mml: no code is required to train robust models and little code to customize the framework. To name some differences, it relies on click instead of hydra, implements training routines itself instead of leveraging pytorch lightning, and focuses less on reusability of past experiments.
MONAI provides state-of-the-art, end-to-end training workflows for healthcare imaging. It implements many metrics, network architectures and transforms tailored to the needs of 3D medical image segmentation (but is not limited to this use case). Preserving meta information on model training and applicability is also part of the concept. Its training routine is based on ignite (in contrast to pytorch lightning in mml).
OpenMMLab provides an ecosystem of dozens of interoperable toolboxes for computer vision models (e.g. mmdetection for detection models, mmpose for pose estimation or mmpretrain for model pre-training). While expandability and interoperability are key features, it has minimal dependencies and implements most features within the ecosystem.

Author and Contributors

Feel free to leave bug reports or feature requests.

Main author (>99%):

Patrick Godau

Other contributors:

Licensing

This library is licensed under the permissive MIT license, which is fully compatible with both academic and commercial applications. This project is/was supported by

the German Federal Ministry of Health under the reference number 2520DAT0P1 as part of the pAItient (Protected Artificial Intelligence Innovation Environment for Patient Oriented Digital Health Solutions for developing, testing and evidence based evaluation of clinical value) project,

HELMHOLTZ IMAGING, a platform of the Helmholtz Information & Data Science Incubator and

the Helmholtz Association under the joint research school “HIDSS4Health – Helmholtz Information and Data Science School for Health”

If you use this code in a research paper, please cite:

@InProceedings{Godau2021TaskF,
    author="Godau, Patrick and Maier-Hein, Lena",
    editor="de Bruijne, Marleen and Cattin, Philippe C. and Cotin, St{\'e}phane and Padoy, Nicolas and Speidel, Stefanie and Zheng, Yefeng and Essert, Caroline",
    title="Task Fingerprinting for Meta Learning in Biomedical Image Analysis",
    booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2021",
    year="2021",
    publisher="Springer International Publishing",
    pages="436--446"
}
