Tensor Model and Shapes
=======================

Primary tensor type
-------------------

Grilly APIs are built around NumPy arrays.

- Most compute paths expect `np.float32`.
- Some indexing paths use integer dtypes (`np.int32`, `np.int64`).
- Several modules can accept tensor-like objects and convert internally.

Shape conventions
-----------------

Common conventions used by Grilly modules:

- Dense feedforward: `(batch, features)` or `(batch, seq, features)`
- Conv2d family: `(batch, channels, height, width)`
- Attention (module-level): `(batch, seq, embed_dim)`
- Flash attention backend path: often `(batch, heads, seq, head_dim)`
- Memory search: queries `(Q, D)`, database/codebook `(N, D)`

Why shape discipline matters
----------------------------

Many kernels dispatch with explicit shape-derived workgroups.
Wrong layout can silently hurt performance or break correctness.

Best practices:

1. Normalize dtype and layout before call boundaries.
2. Keep tensors contiguous when possible.
3. Explicitly print shapes in early pipeline debugging.

Example input guards
--------------------

.. code-block:: python

    import numpy as np

    def ensure_f32(x):
        x = np.asarray(x)
        if x.dtype != np.float32:
            x = x.astype(np.float32)
        return x

    def expect_2d(x):
        if x.ndim != 2:
            raise ValueError(f"expected 2D tensor, got shape {x.shape}")
        return x

Parameter and gradient storage
------------------------------

`grilly.nn` parameters are stored as parameter-like arrays with optional `.grad`.
Backward calls populate `.grad`, and optimizers consume those gradients.

For stable training loops:

1. Forward pass.
2. Build output gradient for your loss.
3. `model.zero_grad()`
4. `model.backward(grad_output)`
5. `optimizer.step()`
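
The five-step order above can be sketched in plain NumPy. This is a minimal illustration, not Grilly's implementation: `Param`, `TinyLinear`, `SGD`, and `train` are hypothetical stand-ins written only to mirror the `zero_grad()` / `backward(grad_output)` / `step()` call pattern, and the mean-squared-error loss is an assumption chosen because its output gradient is easy to write by hand.

.. code-block:: python

    import numpy as np


    class Param:
        """Hypothetical stand-in for a parameter-like array with optional .grad."""

        def __init__(self, data):
            # Store as contiguous float32, matching the dtype advice above.
            self.data = np.ascontiguousarray(data, dtype=np.float32)
            self.grad = None


    class TinyLinear:
        """Hypothetical model: y = x @ W for x of shape (batch, features)."""

        def __init__(self, in_features, out_features, seed=0):
            rng = np.random.default_rng(seed)
            init = rng.standard_normal((in_features, out_features)) * 0.1
            self.weight = Param(init)
            self._x = None

        def parameters(self):
            return [self.weight]

        def forward(self, x):
            self._x = x  # cache the input for the backward pass
            return x @ self.weight.data

        def zero_grad(self):
            for p in self.parameters():
                p.grad = None

        def backward(self, grad_output):
            # dL/dW = x^T @ dL/dy; the return value is dL/dx.
            self.weight.grad = self._x.T @ grad_output
            return grad_output @ self.weight.data.T


    class SGD:
        """Hypothetical optimizer that consumes .grad on each parameter."""

        def __init__(self, params, lr=0.1):
            self.params = list(params)
            self.lr = lr

        def step(self):
            for p in self.params:
                if p.grad is not None:
                    p.data -= self.lr * p.grad


    def train(steps=200):
        rng = np.random.default_rng(1)
        x = rng.standard_normal((32, 4)).astype(np.float32)
        true_w = np.array([[1.0], [-2.0], [0.5], [3.0]], dtype=np.float32)
        y = x @ true_w

        model = TinyLinear(4, 1)
        optimizer = SGD(model.parameters(), lr=0.1)
        losses = []
        for _ in range(steps):
            pred = model.forward(x)                 # 1. forward pass
            diff = pred - y
            losses.append(float(np.mean(diff ** 2)))
            grad_output = (2.0 / diff.size) * diff  # 2. dL/dpred for MSE
            model.zero_grad()                       # 3. clear stale gradients
            model.backward(grad_output)             # 4. populate .grad
            optimizer.step()                        # 5. consume .grad
        return losses, model

In this sketch `backward` overwrites `.grad`, so `zero_grad()` is not strictly required for correctness. It is kept in the loop because a backend that accumulates gradients across calls would otherwise carry stale values into `optimizer.step()`; whether a given Grilly module accumulates or overwrites is not stated here and should be checked against the module itself.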