LoRA and Adapter Fine-Tuning

Why LoRA

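Low-Rank Adaptation (LoRA) fine-tunes a large model by freezing its pretrained weights and training only small low-rank adapter matrices alongside them, rather than updating the full weight tensors. Because gradients and optimizer state are kept only for the adapters, both the trainable parameter count and optimizer memory drop sharply.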
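To make the saving concrete, the arithmetic for a single linear layer can be sketched as below. The helper names full_params and lora_params are hypothetical and written for this example only; they are not Grilly's nn.calculate_lora_params, whose signature is not shown here. Bias terms are ignored.

```python
# Parameter-count arithmetic for one linear layer (bias ignored).
# Both helpers are illustrative, not part of Grilly.
def full_params(in_features: int, out_features: int) -> int:
    """Weights updated when the whole layer is fine-tuned."""
    return in_features * out_features


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Weights in the two low-rank factors: A is (rank x in), B is (out x rank)."""
    return rank * in_features + out_features * rank


full = full_params(768, 768)        # 589824
adapter = lora_params(768, 768, 8)  # 12288
print(f"full: {full}, LoRA: {adapter}, ratio: {adapter / full:.2%}")
```

For a 768 x 768 layer at rank 8, the adapter holds roughly 2% of the weights that full fine-tuning would update.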

Core classes

Grilly provides:

nn.LoRAConfig
nn.LoRALinear
nn.LoRAEmbedding
nn.LoRAAttention
nn.LoRAModel

You can also use utility functions:

nn.apply_lora_to_linear(…)
nn.calculate_lora_params(…)

Basic LoRALinear flow

import numpy as np
from grilly.nn.lora import LoRALinear
from grilly.nn.autograd import Variable

# 768 -> 768 linear layer with a rank-8 adapter; alpha is the LoRA scaling hyperparameter
lora = LoRALinear(in_features=768, out_features=768, rank=8, alpha=16)

# Batch of 4 float32 input vectors
x = Variable(np.random.randn(4, 768).astype(np.float32))
y = lora(x)

print("trainable params:", lora.num_trainable_params())
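As a rough picture of what a LoRA linear layer computes, the following NumPy sketch follows the standard LoRA formulation: a frozen weight W plus a low-rank correction B A scaled by alpha / rank. It is an assumption that Grilly's LoRALinear uses the same scaling and the same initialization (small random A, zero B); the variable names here are illustrative and not taken from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, rank, alpha = 768, 768, 8, 16

# Frozen base weight (stands in for a pretrained layer).
W = rng.standard_normal((out_features, in_features)).astype(np.float32)
# Trainable low-rank factors: A gets a small random init, B starts at zero
# (assumed here, following the usual LoRA convention).
A = (0.01 * rng.standard_normal((rank, in_features))).astype(np.float32)
B = np.zeros((out_features, rank), dtype=np.float32)
scale = alpha / rank  # 16 / 8 = 2.0


def lora_forward(x):
    """Base projection plus the scaled low-rank correction."""
    return x @ W.T + scale * (x @ A.T) @ B.T


x = rng.standard_normal((4, in_features)).astype(np.float32)
y = lora_forward(x)

# With B at zero the adapter contributes nothing yet,
# so the output matches the base layer's output.
assert y.shape == (4, out_features)
assert np.allclose(y, x @ W.T)
```

Starting B at zero means fine-tuning begins from exactly the pretrained behavior, and only A and B would receive gradient updates.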

Managing inference overhead

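For deployment, merge the adapters into the base weights so the forward pass no longer pays for the separate adapter computation. Unmerge afterwards to restore the separate adapter, for example before resuming training:

lora.merge_weights()
# run inference here using the merged weights
lora.unmerge_weights()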
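Why merging is safe can be checked numerically. The sketch below assumes the standard LoRA merge, where the scaled product of the two factors is added to the base weight once; whether merge_weights() and unmerge_weights() are implemented exactly this way in Grilly is an assumption. Small dimensions and float64 are used to keep the comparison simple.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 12, 4, 8
scale = alpha / rank

W = rng.standard_normal((d_out, d_in))  # base weight
A = rng.standard_normal((rank, d_in))   # adapter factors (nonzero, as after training)
B = rng.standard_normal((d_out, rank))
x = rng.standard_normal((3, d_in))

# Unmerged path: base matmul plus two small adapter matmuls per call.
y_unmerged = x @ W.T + scale * (x @ A.T) @ B.T

# Merge: fold the adapter into the base weight once.
delta = scale * (B @ A)
W_merged = W + delta
y_merged = x @ W_merged.T  # single matmul at inference

# Unmerge: subtract the same delta to recover the base weight.
W_restored = W_merged - delta

assert np.allclose(y_unmerged, y_merged)
assert np.allclose(W_restored, W)
```

The merged layer has the same shape as the original one, so it adds no extra matmuls at inference time. In low precision, repeated merge and unmerge cycles can accumulate rounding error.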

Checkpointing adapters

LoRAModel can save and load adapter checkpoints together with their configuration and metadata, so a fine-tuning run can be stored and shared as a portable artifact.
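The idea of an adapter checkpoint that carries its own configuration and metadata can be illustrated with a small NumPy archive. Everything in this sketch is hypothetical: save_adapter, load_adapter, the field names, and the on-disk layout are invented for illustration and do not describe LoRAModel's actual methods or file format.

```python
import io
import json
import numpy as np

# Hypothetical adapter state: only the low-rank factors, not the base weights.
adapter = {
    "lora_A": np.ones((8, 768), dtype=np.float32),
    "lora_B": np.zeros((768, 8), dtype=np.float32),
}
config = {"rank": 8, "alpha": 16}                       # hypothetical config fields
metadata = {"base_model": "example-base", "step": 100}  # hypothetical metadata


def save_adapter(fileobj, adapter, config, metadata):
    """Write the adapter arrays plus a JSON header into one .npz archive."""
    header = json.dumps({"config": config, "metadata": metadata})
    np.savez(fileobj, header=np.array(header), **adapter)


def load_adapter(fileobj):
    """Read the archive back into (arrays, config, metadata)."""
    with np.load(fileobj) as data:
        header = json.loads(data["header"].item())
        arrays = {k: data[k] for k in data.files if k != "header"}
    return arrays, header["config"], header["metadata"]


buf = io.BytesIO()  # stands in for a file on disk
save_adapter(buf, adapter, config, metadata)
buf.seek(0)
loaded, loaded_config, loaded_metadata = load_adapter(buf)
```

Keeping the rank and alpha next to the weights lets a loader rebuild adapters of the right shape before restoring them.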