peft
The peft config group manages parameter-efficient fine-tuning (PEFT) strategies. Note that this feature is still in
beta. By default (peft=none) no PEFT is performed. When activated, adapters are injected into the model; these adapters
are trainable, while all other backbone parameters are frozen. Model heads always remain trainable. See
BaseModel for implementation details and
huggingface/peft for details of the library that MML leverages.
Note that activating PEFT on a model is not reversible: any later loading of the model will always use
the originally configured PEFT method. The following example gives a good overview of some of the configuration options:
lora
Low-Rank Adaptation (LoRA) is a PEFT method that decomposes a large matrix into two smaller low-rank matrices in some of the model layers. Only these small matrices are trained, which drastically reduces the number of parameters that need to be fine-tuned. Please refer to the docs for all config options of LoRA.
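To make the parameter reduction concrete, the following minimal sketch compares the number of weights in a full matrix with the number of weights in its two low-rank factors. The layer size of 768 x 768 is an assumed example value and does not refer to any particular MML model; the helper names are hypothetical and only serve the illustration.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Number of weights in a dense d_out x d_in matrix."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Number of weights in the two low-rank factors.

    One factor has shape d_out x r, the other r x d_in,
    so together they hold r * (d_out + d_in) weights.
    """
    return r * (d_out + d_in)


# Assumed example: a square 768 x 768 layer with the default rank r=8.
d_out, d_in, r = 768, 768, 8
full = full_params(d_out, d_in)
lora = lora_params(d_out, d_in, r)
print(f"full: {full}, lora: {lora}, reduction factor: {full // lora}")
# prints "full: 589824, lora: 12288, reduction factor: 48"
```

A larger r gives the adapter more capacity at the cost of more trainable parameters; the count grows linearly in r.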
- r
- default: 8
LoRA attention dimension (the "rank")
- target_modules
- default: auto
list of module names, or a regular expression matching the module names, to replace with LoRA
for example, ['q', 'v'] or '.*decoder.*(SelfAttention|EncDecAttention).*(q|v)$'
if "auto", all compatible layers are found automatically
- exclude_modules
- default: null
the names of the modules to which the adapter should not be applied. When passing a string, a regex match is performed.
When passing a list of strings, a module is excluded if its name matches one of the strings exactly
or if its name ends with any of the passed strings
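The matching rules for target_modules and exclude_modules described above can be sketched in plain Python. This is an illustration only, not the code of MML or of the peft library: the helper names `matches` and `is_adapted` are hypothetical, the "auto" setting is not modeled, and the sketch assumes that a string pattern must match the whole module name and that suffixes are compared at a dot boundary. The module names are made-up examples.

```python
import re
from typing import Sequence, Union

Pattern = Union[str, Sequence[str], None]


def matches(name: str, pattern: Pattern) -> bool:
    """Check a module name against a regex string or a list of names."""
    if pattern is None:
        return False
    if isinstance(pattern, str):
        # Assumption: a string is a regex that must match the full name.
        return re.fullmatch(pattern, name) is not None
    # A list matches on an exact name or on a dotted suffix of the name.
    return any(name == p or name.endswith("." + p) for p in pattern)


def is_adapted(name: str, target_modules: Pattern,
               exclude_modules: Pattern = None) -> bool:
    """A module gets an adapter if it is targeted and not excluded."""
    return matches(name, target_modules) and not matches(name, exclude_modules)


# Made-up module names for illustration.
dec_q = "decoder.block.0.layer.0.SelfAttention.q"
dec_k = "decoder.block.0.layer.0.SelfAttention.k"
enc_q = "encoder.block.0.layer.0.SelfAttention.q"
regex = r".*decoder.*(SelfAttention|EncDecAttention).*(q|v)$"

for name in (dec_q, dec_k, enc_q):
    print(name,
          is_adapted(name, ["q", "v"]),
          is_adapted(name, regex),
          is_adapted(name, ["q", "v"], exclude_modules="encoder.*"))
```

With these assumptions, the list ['q', 'v'] selects the q projections in both encoder and decoder, the regex selects only the decoder one, and the exclude pattern removes the encoder module from the list-based selection.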