peft

The peft config group manages parameter-efficient fine-tuning (PEFT) strategies. Note that this feature is still in beta. By default (peft=none) no PEFT is performed. When activated, adapters are injected into the model; these adapters are trainable, while all other backbone parameters are frozen. Model heads always remain trainable. See BaseModel for implementation details and huggingface/peft for details of the library that MML leverages. Once PEFT has been activated on a model, the process is not reversible: any later loading of the model will always use the originally configured PEFT method. The following example gives a good overview of some of the configuration options:

lora

Low-Rank Adaptation (LoRA) is a PEFT method that, in some model layers, represents the update to a large weight matrix as the product of two much smaller low-rank matrices. This drastically reduces the number of parameters that need to be fine-tuned. Please refer to the docs for all LoRA config options.
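To make the parameter saving concrete, the following minimal sketch counts trainable parameters for a single linear layer with and without a low-rank decomposition. The layer size (768 x 768) and the helper names are illustrative assumptions, not part of MML or huggingface/peft; the rank of 8 matches the default of r documented below.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is tuned."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters for two low-rank factors: B (d_out x r) and A (r x d_in)."""
    return r * (d_in + d_out)


# Hypothetical layer size, chosen only for illustration.
d_out, d_in, r = 768, 768, 8

full = full_update_params(d_out, d_in)
lora = lora_update_params(d_out, d_in, r)

print(f"full fine-tuning: {full} parameters")
print(f"LoRA with r={r}: {lora} parameters")
print(f"reduction factor: {full / lora:.1f}x")
```

Under these assumptions the adapter trains 12288 parameters instead of 589824 for this one layer. The actual saving depends on the layer shapes of the chosen backbone and on which layers receive adapters.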

r

default: 8

LoRA attention dimension (the “rank”).

target_modules

default: auto

A list of module names, or a regular expression over module names, selecting the modules to replace with LoRA,
for example ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'.
If set to "auto", all compatible layers are selected automatically.
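The difference between the list form and the regex form can be sketched as follows. This is only an illustration under stated assumptions: the helper is_target and the module names are hypothetical, and the matching rules (full regex match for a string; exact name or dotted-suffix match for a list) are assumed to mirror the behaviour of the underlying library rather than taken from MML's code.

```python
import re
from typing import List, Union


def is_target(name: str, target_modules: Union[str, List[str]]) -> bool:
    """Hypothetical helper: decide whether a module name is selected."""
    if isinstance(target_modules, str):
        # A string is treated as a regex that must match the full module name.
        return re.fullmatch(target_modules, name) is not None
    # A list selects modules whose name equals an entry or ends with ".<entry>".
    return any(name == t or name.endswith("." + t) for t in target_modules)


# Made-up module names in the style of an encoder-decoder transformer.
module_names = [
    "encoder.block.0.layer.0.SelfAttention.q",
    "decoder.block.0.layer.0.SelfAttention.q",
    "decoder.block.0.layer.0.SelfAttention.k",
    "decoder.block.0.layer.1.EncDecAttention.v",
    "decoder.block.0.layer.2.DenseReluDense.wi",
]

pattern = r".*decoder.*(SelfAttention|EncDecAttention).*(q|v)$"
by_regex = [n for n in module_names if is_target(n, pattern)]
by_list = [n for n in module_names if is_target(n, ["q", "v"])]

print("regex:", by_regex)
print("list: ", by_list)
```

In this sketch the list ['q', 'v'] selects the q and v projections everywhere, including the encoder, while the regex restricts the selection to the decoder attention blocks.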

exclude_modules

default: null

The names of the modules to which the adapter should not be applied. When passing a string, a regex match is performed.
When passing a list of strings, a module is excluded if its name either matches one of the strings exactly
or ends with one of them.
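The exclusion rules described above can be sketched in a few lines. The helper is_excluded and the module names are hypothetical and exist only for illustration. Treating a string as a full-match regex is an assumption, and the library may apply a stricter suffix rule (for example requiring a "." boundary) than the plain "ends with" check used here.

```python
import re
from typing import List, Optional, Union


def is_excluded(name: str, exclude_modules: Optional[Union[str, List[str]]]) -> bool:
    """Hypothetical helper following the rules described for exclude_modules."""
    if exclude_modules is None:
        # Default (null): nothing is excluded.
        return False
    if isinstance(exclude_modules, str):
        # A string is matched as a regex (assumed here to be a full match).
        return re.fullmatch(exclude_modules, name) is not None
    # A list excludes on an exact match or when the name ends with an entry.
    return any(name == e or name.endswith(e) for e in exclude_modules)


# Made-up module names, chosen only for illustration.
module_names = [
    "backbone.layer1.attn.q",
    "backbone.layer1.attn.v",
    "backbone.layer2.attn.q",
    "head.fc",
]

kept_default = [n for n in module_names if not is_excluded(n, None)]
kept_list = [n for n in module_names if not is_excluded(n, ["layer2.attn.q", "head.fc"])]
kept_regex = [n for n in module_names if not is_excluded(n, r".*layer1.*")]

print("default:", kept_default)
print("list:   ", kept_list)
print("regex:  ", kept_regex)
```

With the list form, "head.fc" is removed by an exact match and "backbone.layer2.attn.q" by a suffix match; with the regex form, every module whose name contains "layer1" is removed.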