LoRA and Adapter Fine-Tuning
============================

Why LoRA
--------

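Low-Rank Adaptation (LoRA) fine-tunes large models by freezing the pretrained
weights and training only a pair of small low-rank adapter matrices per adapted
layer, instead of the full weight tensors. Because only the adapters receive
gradients, both the trainable-parameter count and the optimizer state memory
shrink accordingly.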
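
The idea can be sketched in plain NumPy, independent of Grilly. The names `W`,
`A`, `B`, `rank` and `alpha` follow common LoRA notation and are assumptions
made for illustration; the `alpha / rank` scaling and the zero initialisation
of `B` are the usual convention, not necessarily what Grilly does internally.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   in_features, out_features, rank, alpha = 768, 768, 8, 16

   # Frozen pretrained weight (never updated during fine-tuning).
   W = rng.standard_normal((out_features, in_features)).astype(np.float32)

   # Trainable low-rank factors: A is small random, B starts at zero,
   # so the adapted layer initially reproduces the base layer exactly.
   A = (0.01 * rng.standard_normal((rank, in_features))).astype(np.float32)
   B = np.zeros((out_features, rank), dtype=np.float32)
   scale = alpha / rank

   def lora_forward(x):
       base = x @ W.T
       update = (x @ A.T) @ B.T  # two thin matmuls; B @ A is never formed
       return base + scale * update

   x = rng.standard_normal((4, in_features)).astype(np.float32)
   y = lora_forward(x)

Only `A` and `B` would receive gradient updates in training; `W` stays fixed.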

Core classes
------------

Grilly provides:

- `nn.LoRAConfig`
- `nn.LoRALinear`
- `nn.LoRAEmbedding`
- `nn.LoRAAttention`
- `nn.LoRAModel`

You can also use utility functions:

- `nn.apply_lora_to_linear(...)`
- `nn.calculate_lora_params(...)`
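
The signatures of these helpers are not shown here, so the following is only a
rough, Grilly-independent sketch of the arithmetic a parameter-count helper
might perform. `lora_param_count` is a hypothetical name, and the count assumes
one `A` matrix (rank x in) and one `B` matrix (out x rank) per adapted layer,
with biases excluded.

.. code-block:: python

   def lora_param_count(in_features, out_features, rank):
       """Trainable parameters in one LoRA pair: A (rank x in) + B (out x rank)."""
       return rank * in_features + out_features * rank

   full = 768 * 768                          # dense weight, bias excluded
   adapter = lora_param_count(768, 768, 8)   # 8*768 + 768*8
   print(f"LoRA trains {adapter} of {full} weights ({adapter / full:.2%})")
   # prints: LoRA trains 12288 of 589824 weights (2.08%)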

Basic `LoRALinear` flow
-----------------------

.. code-block:: python

   import numpy as np
   from grilly.nn.lora import LoRALinear
   from grilly.nn.autograd import Variable

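   # 768-to-768 LoRA linear layer with rank 8 and alpha 16.
   lora = LoRALinear(in_features=768, out_features=768, rank=8, alpha=16)
   # Batch of 4 float32 input vectors wrapped as an autograd Variable.
   x = Variable(np.random.randn(4, 768).astype(np.float32))
   y = lora(x)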

   print("trainable params:", lora.num_trainable_params())

Managing inference overhead
---------------------------

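For deployment, merge the adapters into the base weights so that inference
runs a single dense layer without the extra adapter branch, then unmerge to
resume training: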

.. code-block:: python

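   # Fold the low-rank update into the base weights before serving.
   lora.merge_weights()
   # ... run inference with `lora` here ...
   # Separate the adapter from the base weights again.
   lora.unmerge_weights()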
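
Why merging removes the overhead can be checked numerically. The sketch below
uses plain NumPy and assumes the standard LoRA formulation (update
`scale * B @ A` with `scale = alpha / rank`); it illustrates the algebra, not
Grilly's implementation of `merge_weights()`.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)
   rank, alpha = 8, 16
   scale = alpha / rank

   W = rng.standard_normal((64, 32))    # base weight (out x in)
   A = rng.standard_normal((rank, 32))  # low-rank factors
   B = rng.standard_normal((64, rank))
   x = rng.standard_normal((4, 32))

   # Unmerged path: base matmul plus the adapter branch.
   y_adapter = x @ W.T + scale * (x @ A.T) @ B.T

   # Merge: fold the update into a single dense weight.
   delta = scale * (B @ A)
   W_merged = W + delta
   y_merged = x @ W_merged.T

   # Unmerge: subtract the same update to recover the base weight.
   W_restored = W_merged - delta

   print(np.allclose(y_adapter, y_merged), np.allclose(W_restored, W))

The merged layer costs one matmul per call, the same as the original layer,
while producing the same outputs as the adapter path up to floating-point error.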

Checkpointing adapters
----------------------

`LoRAModel` can save and load adapter checkpoints together with their
configuration and metadata, so a fine-tune can be stored and shared as a
portable artifact.
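
The on-disk format and the `LoRAModel` method names are not described here.
As a generic illustration only, an adapter checkpoint can be thought of as the
adapter tensors plus a small configuration record. The file layout, key names
and config fields below are hypothetical and are not Grilly's format.

.. code-block:: python

   import json
   import os
   import tempfile

   import numpy as np

   # Hypothetical config fields and tensor names, for illustration only.
   config = {"rank": 8, "alpha": 16, "target": "layer0"}
   adapters = {
       "layer0_A": np.zeros((8, 768), dtype=np.float32),
       "layer0_B": np.zeros((768, 8), dtype=np.float32),
   }

   path = os.path.join(tempfile.mkdtemp(), "adapter.npz")

   # Save only the adapter tensors plus a JSON-encoded config; the base
   # model weights are not part of the file.
   np.savez(path, config=np.array(json.dumps(config)), **adapters)

   # Load: decode the config, then collect the remaining arrays.
   with np.load(path) as ckpt:
       loaded_config = json.loads(ckpt["config"].item())
       loaded = {k: ckpt[k] for k in ckpt.files if k != "config"}

Keeping the configuration next to the tensors lets a loader rebuild adapters of
the right shape before copying the saved values in.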