sampling

The sampling config group determines some behaviour of the dataloader within MultiTaskDataModule as well as the underlying TaskDataset.

full

sample_num
default: 0
  • number of samples per epoch

  • if 0 will use len(dataset) samples (even if balanced sampling is active)

balanced
default: false
  • if true will try to sample equally from each (target) class

  • if false samples randomly over the split

  • unbalanced sampling might activate weights in loss criterion, see loss.auto_activate_weighing

batch_size
default: 300
  • number of samples used in one forward+backward pass

  • batch size will be overwritten if you activate tune.bs to automatically tune the batch size

drop_last
default: false
  • whether the final (incomplete) batch will be dropped at the end of the epoch

  • directly passed to DataLoader

enable_caching
default: false
  • if true activates a caching mechanism which trades in RAM usage for less disk access

  • only works for already preprocessed datasets!

cache_max_size
default: 10000
  • sets a max_size of cache (in terms of images), cache will be disabled if more images are in the datasets loaded

  • avoid exploding RAM by keeping low or downsize images during preprocess

  • from experience increasing above 10000 does not yield any more benefits, but this may be very case dependent

  • for full optimization consider experimenting with num_workers entry of the config