grilly.optim

Optimizers Module (PyTorch-like)

GPU-accelerated optimizers using Vulkan compute shaders.

Classes

`Adam`(params[, lr, betas, eps, weight_decay, ...])	Adam optimizer using GPU-accelerated shaders.
`AdamW`(params[, lr, betas, eps, ...])	AdamW optimizer with decoupled weight decay.
`AffectAdam`(params[, lr, betas, eps, ...])	Affect-aware Adam optimizer.
`AutoHypergradientAdamW`(params[, lr, betas, ...])	AdamW with OSGM-style auto hypergradient adjustment.
`CosineAnnealingLR`(optimizer, T_max[, ...])	Set the learning rate using a cosine annealing schedule.
`HypergradientAdamW`(params[, lr, betas, eps, ...])	AdamW with hypergradient-based online learning rate adaptation.
`LRScheduler`(optimizer[, last_epoch])	Base class for learning rate schedulers.
`NLMS`(params[, lr, lr_decay, lr_min, eps, ...])	NLMS (Normalized Least Mean Squares) optimizer.
`NaturalGradient`(params[, lr, ...])	Natural Gradient optimizer using Fisher information matrix.
`OneCycleLR`(optimizer, max_lr[, total_steps, ...])	Sets the learning rate according to the 1cycle learning rate policy.
`Optimizer`(params, defaults)	Base class for all optimizers.
`ReduceLROnPlateau`(optimizer[, mode, factor, ...])	Reduce learning rate when a metric has stopped improving.
`SGD`(params[, lr, momentum, weight_decay, ...])	Stochastic Gradient Descent optimizer.
`StepLR`(optimizer, step_size[, gamma, last_epoch])	Decays the learning rate by gamma every step_size epochs.

class grilly.optim.Optimizer(params, defaults)[source]

Bases: object

Base class for all optimizers.

Similar to torch.optim.Optimizer, but works with numpy arrays and GPU-accelerated operations via Vulkan shaders.

Initialize optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
defaults (dict[str, Any]) – Dictionary of default hyperparameter values

__init__(params, defaults)[source]

Initialize optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
defaults (dict[str, Any]) – Dictionary of default hyperparameter values

Dependencies: numpy.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); defaults (dict[str, typing.Any], required).

Usage Example

import numpy as np
from grilly.optim.base import Optimizer

# params is typed as an iterator of arrays; defaults maps hyperparameter names to values.
params = [np.zeros(1, dtype=np.float32)]
optimizer = Optimizer(iter(params), defaults={"lr": 0.001})

zero_grad()[source]

Clear gradients for all parameters.

Note: In this implementation, gradients are expected to be stored in a separate structure (e.g., in the model’s backward pass). This method is provided for API compatibility.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.base import Optimizer

optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 0.001})
optimizer.zero_grad()

step(closure=None)[source]

Perform a single optimization step.

Parameters: closure – Optional closure that reevaluates the model and returns loss

Must be implemented by subclasses.

Dependencies: None detected from callable globals.

Variables: closure (Any, optional, default None).

Usage Example

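import numpy as np
from grilly.optim.sgd import SGD

# The base class leaves step() to subclasses, so call it on a concrete optimizer such as SGD.
optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]), lr=0.001)
result = optimizer.step(closure=None)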

state_dict()[source]

Return the state of the optimizer as a dict.

Returns: Dictionary containing optimizer state
Return type: dict[str, Any]

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.base import Optimizer

optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 0.001})
state = optimizer.state_dict()

load_state_dict(state_dict)[source]

Load optimizer state from state_dict.

Parameters: state_dict (dict[str, Any]) – Dictionary containing optimizer state

Dependencies: None detected from callable globals.

Variables: state_dict (dict[str, typing.Any], required).

Usage Example

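import numpy as np
from grilly.optim.base import Optimizer

optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 0.001})
# Round trip: restore the optimizer from a dict produced by state_dict().
optimizer.load_state_dict(optimizer.state_dict())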

class grilly.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]

Bases: Optimizer

Adam optimizer using GPU-accelerated shaders.

Uses: adam-update.glsl

Implements the Adam algorithm:

- m = beta1 * m + (1 - beta1) * grad
- v = beta2 * v + (1 - beta2) * grad^2
- m_hat = m / (1 - beta1^t)
- v_hat = v / (1 - beta2^t)
- param = param - lr * m_hat / (sqrt(v_hat) + eps)

Initialize Adam optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
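
The Adam update listed above can be sketched on the CPU for a single scalar parameter. This is a minimal illustration, not the adam-update.glsl shader: the function name `adam_step` is hypothetical, plain Python floats stand in for numpy arrays, and folding `weight_decay` into the gradient as an L2 penalty is an assumption based on the parameter description.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.0):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    beta1, beta2 = betas
    # Assumption: the L2 penalty is added to the gradient (coupled weight decay).
    grad = grad + weight_decay * param
    m = beta1 * m + (1 - beta1) * grad          # running average of the gradient
    v = beta2 * v + (1 - beta2) * grad * grad   # running average of the squared gradient
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# First step from zero-initialized moments.
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) has magnitude close to 1, so the parameter moves by roughly `lr` regardless of the gradient's scale.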

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]

Initialize Adam optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
use_gpu (bool) – Whether to use GPU acceleration (default: True)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.0); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.adam import Adam

params = [np.zeros(1, dtype=np.float32)]
optimizer = Adam(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)

_get_backend()[source]

Get or create backend instance

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.adam import Adam

optimizer = Adam(iter([np.zeros(1, dtype=np.float32)]))
backend = optimizer._get_backend()

step(closure=None, gradients=None)[source]

Perform a single optimization step.

Parameters

closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.

Dependencies: numpy.

Variables: closure (Any, optional, default None); gradients (Any, optional, default None).

Usage Example

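import numpy as np
from grilly.optim.adam import Adam

optimizer = Adam(iter([np.zeros(1, dtype=np.float32)]))
# With gradients=None, step() looks for a .grad attribute on each parameter.
result = optimizer.step(closure=None, gradients=None)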

_adam_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, beta1_t, beta2_t)[source]

GPU-accelerated Adam update using adam-update.glsl shader.

Dependencies: numpy.

Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); beta1_t (Any, required); beta2_t (Any, required).

Usage Example

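import numpy as np
from grilly.optim.adam import Adam

# Internal helper that step() normally calls; shown only to illustrate the argument order.
# The beta1_t / beta2_t values assume they denote beta1**t and beta2**t at t = 1.
optimizer = Adam(iter([np.zeros(4, dtype=np.float32)]))
state = [np.zeros(4, dtype=np.float32) for _ in range(4)]  # param, grad, exp_avg, exp_avg_sq
result = optimizer._adam_update_gpu(optimizer._get_backend(), *state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, beta1_t=0.9, beta2_t=0.999)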

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)[source]

Bases: Optimizer

AdamW optimizer with decoupled weight decay.

Implements the AdamW algorithm:

- m = beta1 * m + (1 - beta1) * grad
- v = beta2 * v + (1 - beta2) * grad^2
- m_hat = m / (1 - beta1^t)
- v_hat = v / (1 - beta2^t)
- param = param - lr * m_hat / (sqrt(v_hat) + eps)  # Adam step
- param = param - lr * weight_decay * param  # Decoupled weight decay

Applying the decay directly to the parameters, rather than adding it to the gradient, often improves generalization compared to Adam’s coupled (L2) weight decay.

Initialize AdamW optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Decoupled weight decay coefficient (default: 0.01)
amsgrad (bool) – Whether to use AMSGrad variant (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
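
The two-stage AdamW update above (Adam step, then decay) can be sketched for a scalar parameter as follows. The function name `adamw_step` is hypothetical, the AMSGrad variant is left out, and plain Python floats replace numpy arrays; this is an illustration of the listed formulas rather than the adamw-update.glsl shader.

```python
import math

def adamw_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter at step t (t starts at 1)."""
    beta1, beta2 = betas
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)  # Adam step
    param = param - lr * weight_decay * param              # decoupled weight decay
    return param, m, v

# With a zero gradient the Adam step is a no-op, so only the decay term shrinks the weight.
decayed, _, _ = adamw_step(param=2.0, grad=0.0, m=0.0, v=0.0, t=1)
```

Because the decay never enters `m` or `v`, its strength is not rescaled by the adaptive denominator, which is the practical difference from adding an L2 term to the gradient.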

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)[source]

Initialize AdamW optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Decoupled weight decay coefficient (default: 0.01)
amsgrad (bool) – Whether to use AMSGrad variant (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); amsgrad (bool, optional, default False); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.adamw import AdamW

params = [np.zeros(1, dtype=np.float32)]
optimizer = AdamW(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)

_get_backend()[source]

Get or create backend instance

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.adamw import AdamW

optimizer = AdamW(iter([np.zeros(1, dtype=np.float32)]))
backend = optimizer._get_backend()

step(closure=None, gradients=None)[source]

Perform a single optimization step.

Parameters

closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.

Dependencies: numpy.

Variables: closure (Any, optional, default None); gradients (Any, optional, default None).

Usage Example

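import numpy as np
from grilly.optim.adamw import AdamW

optimizer = AdamW(iter([np.zeros(1, dtype=np.float32)]))
# With gradients=None, step() looks for a .grad attribute on each parameter.
result = optimizer.step(closure=None, gradients=None)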

_adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad)[source]

GPU-accelerated AdamW update using adamw-update.glsl shader.

Dependencies: numpy.

Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); weight_decay (Any, required); beta1_t (Any, required); beta2_t (Any, required); amsgrad (Any, required).

Usage Example

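import numpy as np
from grilly.optim.adamw import AdamW

# Internal helper that step() normally calls; shown only to illustrate the argument order.
# The beta1_t / beta2_t values assume they denote beta1**t and beta2**t at t = 1.
optimizer = AdamW(iter([np.zeros(4, dtype=np.float32)]))
state = [np.zeros(4, dtype=np.float32) for _ in range(4)]  # param, grad, exp_avg, exp_avg_sq
result = optimizer._adamw_update_gpu(optimizer._get_backend(), *state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.01, beta1_t=0.9, beta2_t=0.999, amsgrad=False)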

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.AffectAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]

Bases: Adam

Affect-aware Adam optimizer.

Uses: affect-adam.glsl

Similar to Adam but optimized for affect/emotion processing.

Initialize AffectAdam optimizer.

Args are the same as Adam.

Parameters

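params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
use_gpu (bool) – Whether to use GPU acceleration (default: True)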

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]

Initialize AffectAdam optimizer.

Args are the same as Adam.

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.0); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.adam import AffectAdam

params = [np.zeros(1, dtype=np.float32)]
optimizer = AffectAdam(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)

_adam_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, beta1_t, beta2_t)[source]

GPU-accelerated AffectAdam update using affect-adam.glsl shader.

Dependencies: None detected from callable globals.

Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); beta1_t (Any, required); beta2_t (Any, required).

Usage Example

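import numpy as np
from grilly.optim.adam import AffectAdam

# Internal helper that step() normally calls; shown only to illustrate the argument order.
# The beta1_t / beta2_t values assume they denote beta1**t and beta2**t at t = 1.
optimizer = AffectAdam(iter([np.zeros(4, dtype=np.float32)]))
state = [np.zeros(4, dtype=np.float32) for _ in range(4)]  # param, grad, exp_avg, exp_avg_sq
result = optimizer._adam_update_gpu(optimizer._get_backend(), *state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, beta1_t=0.9, beta2_t=0.999)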

Inherited from Adam (documented above): _get_backend(), step(closure=None, gradients=None).

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.SGD(params, lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)[source]

Bases: Optimizer

Stochastic Gradient Descent optimizer.

Implements: param = param - lr * grad

Note: SGD is simple enough that the CPU implementation is efficient. GPU acceleration could be added in the future through a generic update shader.

Initialize SGD optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
momentum (float) – Momentum factor (default: 0.0)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
dampening (float) – Dampening for momentum (default: 0.0)
nesterov (bool) – Enable Nesterov momentum (default: False)
use_gpu (bool) – Whether to attempt GPU acceleration (default: False, CPU is efficient)
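
The documented rule covers only the plain case, param = param - lr * grad. How `momentum`, `dampening`, `nesterov`, and `weight_decay` enter the update is not spelled out here, so the sketch below assumes the same convention as torch.optim.SGD. The function name `sgd_step` and the `buf` momentum buffer are hypothetical, and scalars stand in for numpy arrays.

```python
def sgd_step(param, grad, buf=None, lr=1e-3, momentum=0.0,
             weight_decay=0.0, dampening=0.0, nesterov=False):
    """One SGD update for a scalar parameter; returns (new_param, momentum_buffer)."""
    # Assumption: weight decay is an L2 penalty added to the gradient.
    grad = grad + weight_decay * param
    if momentum != 0.0:
        # Assumption: the buffer is seeded with the first gradient, as in torch.optim.SGD.
        buf = grad if buf is None else momentum * buf + (1 - dampening) * grad
        grad = grad + momentum * buf if nesterov else buf
    return param - lr * grad, buf

# Plain SGD reduces to param - lr * grad.
plain, _ = sgd_step(1.0, 0.5, lr=0.1)

# Two momentum steps with the same gradient: the second step is larger.
first, buf = sgd_step(1.0, 0.5, lr=0.1, momentum=0.9)
second, buf = sgd_step(first, 0.5, buf, lr=0.1, momentum=0.9)
```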

__init__(params, lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)[source]

Initialize SGD optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
momentum (float) – Momentum factor (default: 0.0)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
dampening (float) – Dampening for momentum (default: 0.0)
nesterov (bool) – Enable Nesterov momentum (default: False)
use_gpu (bool) – Whether to attempt GPU acceleration (default: False, CPU is efficient)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); momentum (float, optional, default 0.0); weight_decay (float, optional, default 0.0); dampening (float, optional, default 0.0); nesterov (bool, optional, default False); use_gpu (bool, optional, default False).

Usage Example

import numpy as np
from grilly.optim.sgd import SGD

params = [np.zeros(1, dtype=np.float32)]
optimizer = SGD(iter(params), lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)

_get_backend()[source]

Get or create backend instance

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.sgd import SGD

optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]))
backend = optimizer._get_backend()

step(closure=None)[source]

Perform a single optimization step.

Parameters: closure – Optional closure that reevaluates the model and returns loss

Dependencies: numpy.

Variables: closure (Any, optional, default None).

Usage Example

import numpy as np
from grilly.optim.sgd import SGD

optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]))
result = optimizer.step(closure=None)

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.NLMS(params, lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)[source]

Bases: Optimizer

NLMS (Normalized Least Mean Squares) optimizer.

Uses: nlms-update.glsl

Implements adaptive filtering with a normalized learning rate:

- w = w + mu * error * x / (||x||^2 + eps)

Reference: ref/brain/specialist.py NLMSExpertHead

Initialize NLMS optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (mu) (default: 0.5)
lr_decay (float) – Learning rate decay factor (default: 0.99995)
lr_min (float) – Minimum learning rate (default: 0.1)
eps (float) – Small constant for numerical stability (default: 1e-6)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
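
The NLMS rule above can be sketched in plain Python for a single linear filter. Several details are assumptions rather than documented behavior: `error` is taken to be the target minus the current prediction (the standard NLMS definition), and the learning rate is assumed to decay multiplicatively by `lr_decay` with a floor at `lr_min`. The names `nlms_step` and `decay_lr` are hypothetical.

```python
def nlms_step(w, x, target, mu=0.5, eps=1e-6):
    """One NLMS update of weights w for input x and desired output target."""
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    error = target - prediction              # assumed definition of `error`
    norm_sq = sum(xi * xi for xi in x)       # ||x||^2
    scale = mu * error / (norm_sq + eps)
    return [wi + scale * xi for wi, xi in zip(w, x)], error

def decay_lr(mu, lr_decay=0.99995, lr_min=0.1):
    """Assumed schedule: shrink mu by lr_decay each step, never below lr_min."""
    return max(lr_min, mu * lr_decay)

weights, err = nlms_step([0.0, 0.0], [1.0, 2.0], target=5.0)
mu_next = decay_lr(0.5)
```

Dividing by ||x||^2 makes the step size independent of the input scale: with mu = 0.5 the update removes about half of the current prediction error for this input.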

__init__(params, lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)[source]

Initialize NLMS optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (mu) (default: 0.5)
lr_decay (float) – Learning rate decay factor (default: 0.99995)
lr_min (float) – Minimum learning rate (default: 0.1)
eps (float) – Small constant for numerical stability (default: 1e-6)
use_gpu (bool) – Whether to use GPU acceleration (default: True)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.5); lr_decay (float, optional, default 0.99995); lr_min (float, optional, default 0.1); eps (float, optional, default 1e-06); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.nlms import NLMS

params = [np.zeros(1, dtype=np.float32)]
optimizer = NLMS(iter(params), lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)

_get_backend()[source]

Get or create backend instance

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.nlms import NLMS

optimizer = NLMS(iter([np.zeros(1, dtype=np.float32)]))
backend = optimizer._get_backend()

step(closure=None)[source]

Perform a single optimization step.

Parameters: closure – Optional closure that reevaluates the model and returns loss

Dependencies: numpy.

Variables: closure (Any, optional, default None).

Usage Example

import numpy as np
from grilly.optim.nlms import NLMS

optimizer = NLMS(iter([np.zeros(1, dtype=np.float32)]))
result = optimizer.step(closure=None)

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.NaturalGradient(params, lr=0.001, fisher_momentum=0.9, use_gpu=True)[source]

Bases: Optimizer

Natural Gradient optimizer using Fisher information matrix.

Uses: fisher-natural-gradient.glsl

Implements natural gradient descent:

- F = Fisher information matrix
- param = param - lr * F^(-1) * grad

Reference: grilly/backend/learning.py natural_gradient

Initialize Natural Gradient optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
fisher_momentum (float) – Momentum for Fisher information estimate (default: 0.9)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
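
As a rough illustration of the rule above, the sketch below uses a diagonal Fisher estimate, kept as an exponential moving average of squared gradients weighted by `fisher_momentum`. The diagonal approximation, the `damping` term, and the name `natural_gradient_step` are all assumptions made for the example; the fisher-natural-gradient.glsl shader may estimate and invert F differently.

```python
def natural_gradient_step(params, grads, fisher, lr=1e-3, fisher_momentum=0.9, damping=1e-8):
    """One natural-gradient update with a diagonal Fisher estimate (lists of floats)."""
    # Assumed estimator: moving average of squared gradients, one entry per parameter.
    new_fisher = [fisher_momentum * f + (1 - fisher_momentum) * g * g
                  for f, g in zip(fisher, grads)]
    # With a diagonal F, applying F^(-1) is an element-wise division.
    new_params = [p - lr * g / (f + damping)
                  for p, g, f in zip(params, grads, new_fisher)]
    return new_params, new_fisher

updated, fisher = natural_gradient_step([1.0], [2.0], [0.0])
```

A full Fisher matrix needs memory quadratic in the parameter count, which is why diagonal or block-diagonal estimates are a common compromise.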

__init__(params, lr=0.001, fisher_momentum=0.9, use_gpu=True)[source]

Initialize Natural Gradient optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
fisher_momentum (float) – Momentum for Fisher information estimate (default: 0.9)
use_gpu (bool) – Whether to use GPU acceleration (default: True)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); fisher_momentum (float, optional, default 0.9); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.natural_gradient import NaturalGradient

params = [np.zeros(1, dtype=np.float32)]
optimizer = NaturalGradient(iter(params), lr=0.001, fisher_momentum=0.9, use_gpu=True)

_get_backend()[source]

Get or create backend instance

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

import numpy as np
from grilly.optim.natural_gradient import NaturalGradient

optimizer = NaturalGradient(iter([np.zeros(1, dtype=np.float32)]))
backend = optimizer._get_backend()

step(closure=None)[source]

Perform a single optimization step.

Parameters: closure – Optional closure that reevaluates the model and returns loss

Dependencies: numpy.

Variables: closure (Any, optional, default None).

Usage Example

import numpy as np
from grilly.optim.natural_gradient import NaturalGradient

optimizer = NaturalGradient(iter([np.zeros(1, dtype=np.float32)]))
result = optimizer.step(closure=None)

Inherited from Optimizer (documented above): load_state_dict(state_dict), state_dict(), zero_grad().

class grilly.optim.HypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)[source]

Bases: AdamW

AdamW with hypergradient-based online learning rate adaptation.

Basic version following Baydin et al. (2018). It uses a fixed hypergradient learning rate, beta_hyper, which is simple but requires manual tuning. For a self-tuning version, use AutoHypergradientAdamW.

Update rule: alpha_{t+1} = alpha_t + beta_hyper * sum(g_t * d_{t-1})

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for running averages (default: (0.9, 0.999))
eps (float) – Numerical stability term (default: 1e-8)
weight_decay (float) – Decoupled weight decay (default: 0.01)
beta_hyper (float) – Hypergradient learning rate (default: 1e-7)
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
log_scale (bool) – If True, adapt log(lr) instead of lr (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
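
The learning-rate update and its clamps can be sketched in isolation. Here g_t is the current gradient and d_{t-1} the previous update direction, both passed as flat lists. The function name `update_lr` is hypothetical, and the log_scale branch assumes the same additive step is applied to log(lr) instead of lr.

```python
import math

def update_lr(lr, grads, prev_directions, beta_hyper=1e-7,
              lr_min=1e-6, lr_max=1.0, log_scale=False):
    """Return the adapted learning rate alpha_{t+1}, clamped to [lr_min, lr_max]."""
    # Hypergradient: sum(g_t * d_{t-1}) over all parameter entries.
    hypergrad = sum(g * d for g, d in zip(grads, prev_directions))
    if log_scale:
        # Assumption: step in log space, which keeps lr positive before clamping.
        lr = math.exp(math.log(lr) + beta_hyper * hypergrad)
    else:
        lr = lr + beta_hyper * hypergrad
    return min(lr_max, max(lr_min, lr))

# Gradient aligned with the previous direction: the learning rate grows.
grown = update_lr(1e-3, [1.0, 2.0], [0.5, 0.5], beta_hyper=1e-4)
# An oversized step is clamped to lr_max.
clamped = update_lr(0.9, [10.0], [10.0], beta_hyper=0.01)
```

When successive gradients and directions agree in sign the rate increases, and when they disagree it shrinks, which is the behavior the fixed beta_hyper has to be tuned for.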

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)[source]

Initialize HypergradientAdamW optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for running averages (default: (0.9, 0.999))
eps (float) – Numerical stability term (default: 1e-8)
weight_decay (float) – Decoupled weight decay (default: 0.01)
beta_hyper (float) – Hypergradient learning rate (default: 1e-7)
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
log_scale (bool) – If True, adapt log(lr) instead of lr (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); beta_hyper (float, optional, default 1e-07); lr_min (float, optional, default 1e-06); lr_max (float, optional, default 1.0); log_scale (bool, optional, default False); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.hypergradient import HypergradientAdamW

# Construct the optimizer directly; __init__ does not need to be called by hand.
params = [np.zeros(1, dtype=np.float32)]
optimizer = HypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)

property current_lr

property lr_history

step(closure=None, gradients=None)[source]

Perform a single optimization step.

Parameters

closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.

Dependencies: numpy.

Variables: closure (Any, optional, default None); gradients (Any, optional, default None).

Usage Example

from grilly.optim.hypergradient import HypergradientAdamW

instance = HypergradientAdamW(...)
result = instance.step(closure=None, gradients=None)

_adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad)

GPU-accelerated AdamW update using adamw-update.glsl shader.

Dependencies: numpy.

Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); weight_decay (Any, required); beta1_t (Any, required); beta2_t (Any, required); amsgrad (Any, required).

Usage Example

from grilly.optim.adamw import AdamW

instance = AdamW(...)
result = instance._adamw_update_gpu(backend=None, param=None, grad=None, exp_avg=None, exp_avg_sq=None, lr=None, beta1=None, beta2=None, eps=None, weight_decay=None, beta1_t=None, beta2_t=None, amsgrad=None)

_get_backend()

Get or create the backend instance.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.adamw import AdamW

instance = AdamW(...)
result = instance._get_backend()

load_state_dict(state_dict)

Load optimizer state from state_dict.

Parameters: state_dict (dict[str, Any]) – Dictionary containing optimizer state

Dependencies: None detected from callable globals.

Variables: state_dict (dict[str, typing.Any], required).

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
instance.load_state_dict(state_dict=instance.state_dict())

state_dict()

Return the state of the optimizer as a dict.

Returns: Dictionary containing optimizer state
Return type: dict[str, Any]

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
result = instance.state_dict()

zero_grad()

Clear gradients for all parameters.

Note: In this implementation, gradients are expected to be stored in a separate structure (e.g., in the model’s backward pass). This method is provided for API compatibility.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
result = instance.zero_grad()

class grilly.optim.AutoHypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)[source]

Bases: AdamW

AdamW with OSGM-style auto hypergradient adjustment.

Self-tuning optimizer that automatically adapts the learning rate (and optionally momentum beta1) using online hypergradient descent with AdaGrad-stabilized updates. No manual hypergradient LR tuning needed — the AdaGrad accumulator self-adjusts the meta-learning rate.

Based on the OSGM/HDM algorithm:

Step size hypergradient (how lr should change):

h_lr = -g_k . d_{k-1} / (||g_{k-1}||^2 + eps)
G_lr += h_lr^2
lr -= hyper_lr * h_lr / (sqrt(G_lr) + eps)

Momentum hypergradient (how beta1 should change):

h_beta = g_k . m_{k-1} / (||g_{k-1}||^2 + eps)
G_beta += h_beta^2
beta1 -= hyper_lr_beta * h_beta / (sqrt(G_beta) + eps)

The gradient-norm normalization (/ ||g||^2) makes the algorithm scale-invariant, and the AdaGrad accumulator makes the meta-LR self-adjusting — larger past hypergradients automatically slow down future adaptation, preventing oscillation.
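
The step size rule can be sketched in plain Python. This is a minimal illustration, not grilly's implementation. The helper name adapt_lr is hypothetical, and lists of floats stand in for NumPy arrays. Two things are assumed: d_prev is the previous update direction (the vector that was scaled by lr and subtracted from the parameters), and the clamp to [lr_min, lr_max] is applied right after the update.

```python
import math

def adapt_lr(lr, g_k, d_prev, g_prev_sq_norm, accum,
             hyper_lr=0.01, eps=1e-8, lr_min=1e-6, lr_max=1.0):
    """One OSGM-style learning-rate adjustment (hypothetical helper)."""
    # Normalized hypergradient: h_lr = -g_k . d_{k-1} / (||g_{k-1}||^2 + eps)
    dot = sum(g * d for g, d in zip(g_k, d_prev))
    h_lr = -dot / (g_prev_sq_norm + eps)
    # AdaGrad accumulator: large past hypergradients shrink future adjustments.
    accum += h_lr ** 2
    lr -= hyper_lr * h_lr / (math.sqrt(accum) + eps)
    # Assumed clamp to the configured bounds.
    lr = min(max(lr, lr_min), lr_max)
    return lr, accum

# Gradient aligned with the previous direction: the learning rate grows.
lr_up, accum_up = adapt_lr(0.001, [1.0, 0.0], [1.0, 0.0], 1.0, 0.0)
# Gradient opposed to the previous direction: the learning rate shrinks to lr_min.
lr_down, accum_down = adapt_lr(0.001, [-1.0, 0.0], [1.0, 0.0], 1.0, 0.0)
```

In this sketch, the division by sqrt(accum) limits the first adjustment to about hyper_lr in magnitude, whatever the scale of the gradients.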

Particularly effective for SNN training where surrogate gradients are noisy and the optimal learning rate shifts during training.

Surprise signal (optional, input-level):

Tracks gradient prediction error as a “surprise” signal and exposes it for the model to use as input gain modulation. Unlike backprop-level momentum changes, this acts at the forward-pass level — amplifying input signals when the optimization landscape shifts unexpectedly.

Instant surprise (gradient prediction error):

S_instant = tanh(||g_k - EMA(g)||^2 / (EMA(||g||^2) + eps))

Accumulated surprise (biological momentum / S_bar):

S_bar = alpha * S_instant + (1-alpha) * S_bar_prev

Inverted-U gain (Yerkes-Dodson / trauma protection):

gain = S_bar * exp(-S_bar / trauma_threshold)

The inverted-U curve implements the biological stress response:

Low S_bar → low gain (nothing interesting)
Moderate S_bar → peak gain (optimal learning zone)
High S_bar → gain drops (trauma protection)

This prevents “unerasable events” — if surprise stays high for many consecutive steps (chronic stress), the gain suppresses instead of amplifying, protecting the model from fixating on a single extreme event. Mirrors the HPA axis: acute stress enhances encoding, chronic stress impairs plasticity.

The model reads current_surprise_gain for input scaling:

x_effective = x * (1 + scale * optimizer.current_surprise_gain)
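
The three surprise formulas can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code. The function names are hypothetical, the running averages EMA(g) and EMA(||g||^2) are passed in rather than maintained, and lists of floats stand in for NumPy arrays.

```python
import math

def update_surprise(g_k, ema_g, ema_g_sq, s_bar_prev, alpha=0.1, eps=1e-8):
    """Return (S_instant, S_bar) for the current gradient g_k."""
    # Squared distance between the gradient and its running average.
    err_sq = sum((g - m) ** 2 for g, m in zip(g_k, ema_g))
    s_instant = math.tanh(err_sq / (ema_g_sq + eps))
    s_bar = alpha * s_instant + (1.0 - alpha) * s_bar_prev
    return s_instant, s_bar

def surprise_gain(s_bar, trauma_threshold=0.5):
    """Inverted-U gain; it peaks where s_bar equals trauma_threshold."""
    return s_bar * math.exp(-s_bar / trauma_threshold)

# A gradient far from its running average gives a high instant surprise.
s_instant, s_bar = update_surprise([3.0, 0.0], [0.0, 0.0], ema_g_sq=1.0, s_bar_prev=0.0)
gain = surprise_gain(s_bar)
# Input scaling with x = 2.0 and scale = 1.0 (both values are hypothetical).
x_effective = 2.0 * (1.0 + 1.0 * gain)
```

At its peak the gain equals trauma_threshold / e, so with the default threshold of 0.5 it never exceeds roughly 0.18.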

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for running averages (default: (0.9, 0.999))
eps (float) – Numerical stability term (default: 1e-8)
weight_decay (float) – Decoupled weight decay (default: 0.01)
hyper_lr (float) – Meta-learning rate for step size adaptation (default: 0.01). This is automatically modulated by the AdaGrad accumulator, so it’s much less sensitive than HypergradientAdamW’s beta_hyper.
hyper_lr_beta (float) – Meta-learning rate for momentum adaptation (default: 1.0). Only used when adapt_momentum=True.
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
adapt_momentum (bool) – If True, also adapt beta1 via hypergradient (default: False)
track_surprise (bool) – If True, compute and expose gradient surprise signal via current_surprise_gain (default: False). The model’s forward pass should read this to modulate input gain.
surprise_gamma (float) – EMA decay for gradient tracking (default: 0.9). Higher = smoother baseline, slower to detect change.
surprise_alpha (float) – EMA decay for surprise accumulation S_bar (default: 0.1). Controls how fast accumulated surprise builds up and decays. Lower = longer memory of surprise.
trauma_threshold (float) – S_bar level where gain peaks before suppression (default: 0.5). The inverted-U gain = S_bar * exp(-S_bar/T) peaks at S_bar = T. Above this, gain decreases (protection).
beta_min (float) – Minimum beta1 clamp (default: 0.5)
beta_max (float) – Maximum beta1 clamp (default: 0.9995)
warmup_steps (int) – Steps before starting adaptation (default: 10). Lets Adam moments initialize before adapting LR.
use_gpu (bool) – Whether to use GPU acceleration (default: True)

__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)[source]

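Initialize AutoHypergradientAdamW optimizer.

Parameters

params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Decoupled weight decay coefficient (default: 0.01)
hyper_lr (float) – Meta-learning rate for step size adaptation (default: 0.01)
hyper_lr_beta (float) – Meta-learning rate for momentum adaptation; only used when adapt_momentum=True (default: 1.0)
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
adapt_momentum (bool) – If True, also adapt beta1 via hypergradient (default: False)
track_surprise (bool) – If True, compute and expose the gradient surprise signal via current_surprise_gain (default: False)
surprise_gamma (float) – EMA decay for gradient tracking (default: 0.9)
surprise_alpha (float) – EMA decay for surprise accumulation S_bar (default: 0.1)
trauma_threshold (float) – S_bar level where gain peaks before suppression (default: 0.5)
beta_min (float) – Minimum beta1 clamp (default: 0.5)
beta_max (float) – Maximum beta1 clamp (default: 0.9995)
warmup_steps (int) – Steps before starting adaptation (default: 10)
use_gpu (bool) – Whether to use GPU acceleration (default: True)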

Dependencies: None detected from callable globals.

Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); hyper_lr (float, optional, default 0.01); hyper_lr_beta (float, optional, default 1.0); lr_min (float, optional, default 1e-06); lr_max (float, optional, default 1.0); adapt_momentum (bool, optional, default False); track_surprise (bool, optional, default False); surprise_gamma (float, optional, default 0.9); surprise_alpha (float, optional, default 0.1); trauma_threshold (float, optional, default 0.5); beta_min (float, optional, default 0.5); beta_max (float, optional, default 0.9995); warmup_steps (int, optional, default 10); use_gpu (bool, optional, default True).

Usage Example

import numpy as np
from grilly.optim.hypergradient import AutoHypergradientAdamW

# Construct the optimizer directly; __init__ does not need to be called by hand.
params = [np.zeros(1, dtype=np.float32)]
optimizer = AutoHypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)

property current_lr

property current_surprise: Instant surprise signal [0, 1]. Raw gradient prediction error.

property accumulated_surprise: Accumulated surprise S_bar. Biological momentum of surprise.

property current_surprise_gain

Inverted-U gain signal for input-level modulation.

Implements the Yerkes-Dodson curve / trauma protection:

gain = S_bar * exp(-S_bar / trauma_threshold)

Low S_bar → low gain (nothing interesting happening)
Moderate S_bar → peak gain (optimal learning zone)
High S_bar → gain drops (trauma protection, don’t fixate)

Read this after each optimizer step and pass it to the model:

x_effective = x * (1 + scale * optimizer.current_surprise_gain)

Returns 0.0 when surprise tracking is off or during warmup.

property lr_history

property beta1_history

property surprise_history

property s_bar_history

step(closure=None, gradients=None)[source]

Perform optimization step with OSGM-style auto LR adaptation.

1. Collect current gradients g_k.
2. Compute the surprise signal (if track_surprise=True).
3. Compute normalized hypergradients (after warmup):
   h_lr = -g_k . d_{k-1} / (||g_{k-1}||^2 + eps)
   h_beta = g_k . m_{k-1} / (||g_{k-1}||^2 + eps)
4. Update the AdaGrad accumulators and adjust lr (and beta1 when adapt_momentum=True).
5. Run the standard AdamW step with the adapted hyperparameters.
6. Store d_k, ||g_k||^2, and m_k for the next step.

Dependencies: numpy.

Variables: closure (Any, optional, default None); gradients (Any, optional, default None).

Usage Example

from grilly.optim.hypergradient import AutoHypergradientAdamW

instance = AutoHypergradientAdamW(...)
result = instance.step(closure=None, gradients=None)

_adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad)

GPU-accelerated AdamW update using adamw-update.glsl shader.

Dependencies: numpy.

Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); weight_decay (Any, required); beta1_t (Any, required); beta2_t (Any, required); amsgrad (Any, required).

Usage Example

from grilly.optim.adamw import AdamW

instance = AdamW(...)
result = instance._adamw_update_gpu(backend=None, param=None, grad=None, exp_avg=None, exp_avg_sq=None, lr=None, beta1=None, beta2=None, eps=None, weight_decay=None, beta1_t=None, beta2_t=None, amsgrad=None)

_get_backend()

Get or create the backend instance.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.adamw import AdamW

instance = AdamW(...)
result = instance._get_backend()

load_state_dict(state_dict)

Load optimizer state from state_dict.

Parameters: state_dict (dict[str, Any]) – Dictionary containing optimizer state

Dependencies: None detected from callable globals.

Variables: state_dict (dict[str, typing.Any], required).

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
instance.load_state_dict(state_dict=instance.state_dict())

state_dict()

Return the state of the optimizer as a dict.

Returns: Dictionary containing optimizer state
Return type: dict[str, Any]

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
result = instance.state_dict()

zero_grad()

Clear gradients for all parameters.

Note: In this implementation, gradients are expected to be stored in a separate structure (e.g., in the model’s backward pass). This method is provided for API compatibility.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.base import Optimizer

instance = Optimizer(...)
result = instance.zero_grad()

class grilly.optim.LRScheduler(optimizer, last_epoch=-1)[source]

Bases: object

Base class for learning rate schedulers.

All schedulers should inherit from this class and implement the get_lr() method.

Initialize base scheduler.

Parameters

optimizer – Wrapped optimizer
last_epoch – The index of last epoch (default: -1)

__init__(optimizer, last_epoch=-1)[source]

Initialize base scheduler.

Parameters

optimizer – Wrapped optimizer
last_epoch – The index of last epoch (default: -1)

Dependencies: None detected from callable globals.

Variables: optimizer (Any, required); last_epoch (Any, optional, default -1).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.__init__(optimizer=None, last_epoch=-1)

state_dict()[source]

Returns the state of the scheduler as a dict.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.state_dict()

load_state_dict(state_dict)[source]

Loads the scheduler state.

Dependencies: None detected from callable globals.

Variables: state_dict (Any, required).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
instance.load_state_dict(state_dict=instance.state_dict())

get_last_lr()[source]

Return the last learning rate computed by the current scheduler.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.get_last_lr()

get_lr()[source]

Compute learning rate using chainable form of the scheduler.

This method should be implemented by subclasses.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.get_lr()

step(epoch=None)[source]

Perform a scheduler step.

Parameters: epoch – Optional epoch number to use instead of incrementing

Dependencies: None detected from callable globals.

Variables: epoch (Any, optional, default None).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.step(epoch=None)

class grilly.optim.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)[source]

Bases: LRScheduler

Decays the learning rate by gamma every step_size epochs.

Matches torch.optim.lr_scheduler.StepLR
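
The decay rule can be written in closed form. The sketch below follows the closed form documented for PyTorch's StepLR. The function name step_lr is hypothetical, and whether grilly computes the rate this way or recursively is not stated here.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate after `epoch` epochs, multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

# With step_size=30, the rate holds for epochs 0-29 and drops at epoch 30.
lr_start = step_lr(0.1, epoch=0, step_size=30)
lr_before_drop = step_lr(0.1, epoch=29, step_size=30)
lr_after_drop = step_lr(0.1, epoch=30, step_size=30)
```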

Initialize StepLR scheduler.

Parameters

optimizer – Wrapped optimizer
step_size – Period of learning rate decay
gamma – Multiplicative factor of learning rate decay (default: 0.1)
last_epoch – The index of last epoch (default: -1)

__init__(optimizer, step_size, gamma=0.1, last_epoch=-1)[source]

Initialize StepLR scheduler.

Parameters

optimizer – Wrapped optimizer
step_size – Period of learning rate decay
gamma – Multiplicative factor of learning rate decay (default: 0.1)
last_epoch – The index of last epoch (default: -1)

Dependencies: None detected from callable globals.

Variables: optimizer (Any, required); step_size (Any, required); gamma (Any, optional, default 0.1); last_epoch (Any, optional, default -1).

Usage Example

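import numpy as np
from grilly.optim import SGD
from grilly.optim.lr_scheduler import StepLR

# A scheduler wraps an existing optimizer; SGD is used here only as an example.
optimizer = SGD([np.zeros(1, dtype=np.float32)], lr=0.1)
scheduler = StepLR(optimizer, step_size=30, gamma=0.1, last_epoch=-1)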

get_lr()[source]

Compute learning rate for current epoch.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import StepLR

instance = StepLR(...)
result = instance.get_lr()

get_last_lr()

Return the last learning rate computed by the current scheduler.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.get_last_lr()

load_state_dict(state_dict)

Loads the scheduler state.

Dependencies: None detected from callable globals.

Variables: state_dict (Any, required).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
instance.load_state_dict(state_dict=instance.state_dict())

state_dict()

Returns the state of the scheduler as a dict.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.state_dict()

step(epoch=None)

Perform a scheduler step.

Parameters: epoch – Optional epoch number to use instead of incrementing

Dependencies: None detected from callable globals.

Variables: epoch (Any, optional, default None).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.step(epoch=None)

class grilly.optim.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)[source]

Bases: LRScheduler

Set the learning rate using a cosine annealing schedule.

Matches torch.optim.lr_scheduler.CosineAnnealingLR
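
The schedule can be illustrated with the closed-form expression given in the PyTorch documentation for CosineAnnealingLR. This is a sketch with a hypothetical function name. It assumes a single base learning rate and does not show how grilly's get_lr() is implemented.

```python
import math

def cosine_annealing_lr(base_lr, epoch, T_max, eta_min=0.0):
    """Anneal from base_lr (epoch 0) down to eta_min (epoch T_max) along a half cosine."""
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / T_max)) / 2.0

lr_first = cosine_annealing_lr(0.1, epoch=0, T_max=100)
lr_middle = cosine_annealing_lr(0.1, epoch=50, T_max=100)
lr_last = cosine_annealing_lr(0.1, epoch=100, T_max=100)
```

The rate starts at base_lr, passes through the midpoint between base_lr and eta_min halfway through, and reaches eta_min at T_max.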

Initialize CosineAnnealingLR scheduler.

Parameters

optimizer – Wrapped optimizer
T_max – Maximum number of iterations
eta_min – Minimum learning rate (default: 0)
last_epoch – The index of last epoch (default: -1)

__init__(optimizer, T_max, eta_min=0, last_epoch=-1)[source]

Initialize CosineAnnealingLR scheduler.

Parameters

optimizer – Wrapped optimizer
T_max – Maximum number of iterations
eta_min – Minimum learning rate (default: 0)
last_epoch – The index of last epoch (default: -1)

Dependencies: None detected from callable globals.

Variables: optimizer (Any, required); T_max (Any, required); eta_min (Any, optional, default 0); last_epoch (Any, optional, default -1).

Usage Example

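import numpy as np
from grilly.optim import SGD
from grilly.optim.lr_scheduler import CosineAnnealingLR

# A scheduler wraps an existing optimizer; SGD is used here only as an example.
optimizer = SGD([np.zeros(1, dtype=np.float32)], lr=0.1)
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=0, last_epoch=-1)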

get_lr()[source]

Compute learning rate using cosine annealing.

Dependencies: math.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import CosineAnnealingLR

instance = CosineAnnealingLR(...)
result = instance.get_lr()

get_last_lr()

Return the last learning rate computed by the current scheduler.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.get_last_lr()

load_state_dict(state_dict)

Loads the scheduler state.

Dependencies: None detected from callable globals.

Variables: state_dict (Any, required).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
instance.load_state_dict(state_dict=instance.state_dict())

state_dict()

Returns the state of the scheduler as a dict.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.state_dict()

step(epoch=None)

Perform a scheduler step.

Parameters: epoch – Optional epoch number to use instead of incrementing

Dependencies: None detected from callable globals.

Variables: epoch (Any, optional, default None).

Usage Example

from grilly.optim.lr_scheduler import LRScheduler

instance = LRScheduler(...)
result = instance.step(epoch=None)

class grilly.optim.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[source]

Bases: object

Reduce learning rate when a metric has stopped improving.

Matches torch.optim.lr_scheduler.ReduceLROnPlateau

Initialize ReduceLROnPlateau scheduler.

Parameters

optimizer – Wrapped optimizer
mode – One of ‘min’ or ‘max’. In ‘min’ mode, lr will be reduced when the quantity monitored has stopped decreasing (default: ‘min’)
factor – Factor by which the learning rate will be reduced (default: 0.1)
patience – Number of epochs with no improvement after which learning rate will be reduced (default: 10)
threshold – Threshold for measuring the new optimum (default: 1e-4)
threshold_mode – One of ‘rel’, ‘abs’ (default: ‘rel’)
cooldown – Number of epochs to wait before resuming normal operation after lr has been reduced (default: 0)
min_lr – A lower bound on the learning rate (default: 0)
eps – Minimal decay applied to lr (default: 1e-8)

__init__(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[source]

Initialize ReduceLROnPlateau scheduler.

Parameters

optimizer – Wrapped optimizer
mode – One of ‘min’ or ‘max’. In ‘min’ mode, lr will be reduced when the quantity monitored has stopped decreasing (default: ‘min’)
factor – Factor by which the learning rate will be reduced (default: 0.1)
patience – Number of epochs with no improvement after which learning rate will be reduced (default: 10)
threshold – Threshold for measuring the new optimum (default: 1e-4)
threshold_mode – One of ‘rel’, ‘abs’ (default: ‘rel’)
cooldown – Number of epochs to wait before resuming normal operation after lr has been reduced (default: 0)
min_lr – A lower bound on the learning rate (default: 0)
eps – Minimal decay applied to lr (default: 1e-8)

Dependencies: None detected from callable globals.

Variables: optimizer (Any, required); mode (Any, optional, default 'min'); factor (Any, optional, default 0.1); patience (Any, optional, default 10); threshold (Any, optional, default 0.0001); threshold_mode (Any, optional, default 'rel'); cooldown (Any, optional, default 0); min_lr (Any, optional, default 0); eps (Any, optional, default 1e-08).

Usage Example

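import numpy as np
from grilly.optim import SGD
from grilly.optim.lr_scheduler import ReduceLROnPlateau

# A scheduler wraps an existing optimizer; SGD is used here only as an example.
optimizer = SGD([np.zeros(1, dtype=np.float32)], lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)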

_reset()[source]

Reset num_bad_epochs counter and cooldown counter.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
result = instance._reset()

step(metrics, epoch=None)[source]

Perform a scheduler step based on metric.

Parameters

metrics – The metric to monitor
epoch – Optional epoch number

Dependencies: None detected from callable globals.

Variables: metrics (Any, required); epoch (Any, optional, default None).

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
# Pass the monitored value (for example, a validation loss) rather than None.
instance.step(metrics=0.25)

_reduce_lr(epoch)[source]

Reduce learning rate.
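
In PyTorch's ReduceLROnPlateau, which this class is documented to match, a reduction multiplies the rate by factor, floors it at min_lr, and skips the update when the change is not larger than eps. The sketch below shows that rule for a single learning rate. The function name is hypothetical, and grilly's own handling is assumed to be equivalent.

```python
def reduced_lr(old_lr, factor=0.1, min_lr=0.0, eps=1e-8):
    """Return the learning rate after one plateau-triggered reduction."""
    new_lr = max(old_lr * factor, min_lr)
    # Changes no larger than eps are ignored, so the old rate is kept.
    return new_lr if old_lr - new_lr > eps else old_lr

lr_reduced = reduced_lr(0.1)               # ordinary reduction by factor
lr_floored = reduced_lr(0.1, min_lr=0.05)  # floored at min_lr
lr_unchanged = reduced_lr(1e-9)            # change below eps, rate kept
```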

Dependencies: None detected from callable globals.

Variables: epoch (Any, required).

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
result = instance._reduce_lr(epoch=None)

property in_cooldown: Check if scheduler is in cooldown period.

is_better(a, best)[source]

Check if metric ‘a’ is better than ‘best’.
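
The comparison depends on mode and threshold_mode. The sketch below reproduces the four cases as they are defined in PyTorch's ReduceLROnPlateau. It is a standalone function rather than the method itself, and it assumes grilly follows the same definitions.

```python
def is_better(a, best, mode="min", threshold=1e-4, threshold_mode="rel"):
    """Return True when metric `a` improves on `best` by more than the threshold."""
    if mode == "min" and threshold_mode == "rel":
        return a < best * (1.0 - threshold)
    if mode == "min" and threshold_mode == "abs":
        return a < best - threshold
    if mode == "max" and threshold_mode == "rel":
        return a > best * (1.0 + threshold)
    if mode == "max" and threshold_mode == "abs":
        return a > best + threshold
    raise ValueError(f"unsupported mode/threshold_mode: {mode!r}/{threshold_mode!r}")

clear_improvement = is_better(0.99, 1.0)       # 1% lower loss counts
tiny_improvement = is_better(0.99995, 1.0)     # within the relative threshold
```

With the defaults, a loss must fall below 0.9999 times the best value seen so far to count as an improvement.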

Dependencies: None detected from callable globals.

Variables: a (Any, required); best (Any, required).

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
result = instance.is_better(a=0.9, best=1.0)

_init_is_better(mode, threshold, threshold_mode)[source]

Initialize comparison function.

Dependencies: None detected from callable globals.

Variables: mode (Any, required); threshold (Any, required); threshold_mode (Any, required).

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
result = instance._init_is_better(mode=None, threshold=None, threshold_mode=None)

state_dict()[source]

Returns the state of the scheduler as a dict.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
result = instance.state_dict()

load_state_dict(state_dict)[source]

Loads the scheduler state.

Dependencies: None detected from callable globals.

Variables: state_dict (Any, required).

Usage Example

from grilly.optim.lr_scheduler import ReduceLROnPlateau

instance = ReduceLROnPlateau(...)
instance.load_state_dict(state_dict=instance.state_dict())

class grilly.optim.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1)[source]

Bases: LRScheduler

Sets the learning rate according to the 1cycle learning rate policy.

Matches torch.optim.lr_scheduler.OneCycleLR

Initialize OneCycleLR scheduler.

Parameters

optimizer – Wrapped optimizer
max_lr – Upper learning rate boundary in the cycle
total_steps – Total number of steps in the cycle (optional)
epochs – Number of epochs to train for (optional)
steps_per_epoch – Number of steps per epoch (optional)
pct_start – Percentage of the cycle spent increasing the learning rate (default: 0.3)
anneal_strategy – Specifies the annealing strategy: ‘cos’ or ‘linear’ (default: ‘cos’)
cycle_momentum – If True, momentum is cycled inversely (default: True)
base_momentum – Lower momentum boundary in the cycle (default: 0.85)
max_momentum – Upper momentum boundary in the cycle (default: 0.95)
div_factor – Determines the initial learning rate via initial_lr = max_lr/div_factor (default: 25)
final_div_factor – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor (default: 1e4)
last_epoch – The index of last epoch (default: -1)
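
The two derived rates follow directly from the formulas given for div_factor and final_div_factor. A short worked example, with a hypothetical helper name:

```python
def one_cycle_bounds(max_lr, div_factor=25.0, final_div_factor=1e4):
    """Return (initial_lr, min_lr) implied by max_lr and the two divisors."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    return initial_lr, min_lr

# With max_lr=0.1 and the defaults, the cycle starts near 4e-3 and ends near 4e-7.
initial_lr, min_lr = one_cycle_bounds(max_lr=0.1)
```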

__init__(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1)[source]

Initialize OneCycleLR scheduler.

Parameters

optimizer – Wrapped optimizer
max_lr – Upper learning rate boundary in the cycle
total_steps – Total number of steps in the cycle (optional)
epochs – Number of epochs to train for (optional)
steps_per_epoch – Number of steps per epoch (optional)
pct_start – Percentage of the cycle spent increasing the learning rate (default: 0.3)
anneal_strategy – Specifies the annealing strategy: ‘cos’ or ‘linear’ (default: ‘cos’)
cycle_momentum – If True, momentum is cycled inversely (default: True)
base_momentum – Lower momentum boundary in the cycle (default: 0.85)
max_momentum – Upper momentum boundary in the cycle (default: 0.95)
div_factor – Determines the initial learning rate via initial_lr = max_lr/div_factor (default: 25)
final_div_factor – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor (default: 1e4)
last_epoch – The index of last epoch (default: -1)

Dependencies: None detected from callable globals.

Variables: optimizer (Any, required); max_lr (Any, required); total_steps (Any, optional, default None); epochs (Any, optional, default None); steps_per_epoch (Any, optional, default None); pct_start (Any, optional, default 0.3); anneal_strategy (Any, optional, default 'cos'); cycle_momentum (Any, optional, default True); base_momentum (Any, optional, default 0.85); max_momentum (Any, optional, default 0.95); div_factor (Any, optional, default 25.0); final_div_factor (Any, optional, default 10000.0); last_epoch (Any, optional, default -1).

Usage Example

import numpy as np
from grilly.optim import SGD
from grilly.optim.lr_scheduler import OneCycleLR

# Give either total_steps, or both epochs and steps_per_epoch.
optimizer = SGD([np.zeros(1, dtype=np.float32)], lr=0.1, momentum=0.9)
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100, pct_start=0.3, anneal_strategy='cos')

_format_param(name, optimizer, param)[source]

Format parameter to be a list per parameter group.

Dependencies: None detected from callable globals.

Variables: name (Any, required); optimizer (Any, required); param (Any, required).

Usage Example

from grilly.optim.lr_scheduler import OneCycleLR

instance = OneCycleLR(...)
result = instance._format_param(name=None, optimizer=None, param=None)

_annealing_cos(start, end, pct)[source]

Cosine annealing from start to end as pct goes from 0.0 to 1.0.
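
A minimal sketch of this interpolation, using the formula from PyTorch's OneCycleLR. It is written as a standalone function with a hypothetical name, and grilly's method is assumed to compute the same value.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation: returns start at pct=0.0 and end at pct=1.0."""
    cos_out = math.cos(math.pi * pct) + 1.0
    return end + (start - end) / 2.0 * cos_out

at_start = annealing_cos(1.0, 0.0, 0.0)
halfway = annealing_cos(1.0, 0.0, 0.5)
at_end = annealing_cos(1.0, 0.0, 1.0)
```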

Dependencies: math.

Variables: start (Any, required); end (Any, required); pct (Any, required).

Usage Example

from grilly.optim.lr_scheduler import OneCycleLR

instance = OneCycleLR(...)
result = instance._annealing_cos(start=1.0, end=0.0, pct=0.5)

get_last_lr()

Return the last learning rate computed by the current scheduler.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

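from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
last_lr = scheduler.get_last_lr()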

load_state_dict(state_dict)

Loads the scheduler state.

Dependencies: None detected from callable globals.

Variables: state_dict (Any, required).

Usage Example

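from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
saved_state = scheduler.state_dict()
scheduler.load_state_dict(saved_state)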

state_dict()

Returns the state of the scheduler as a dict.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

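from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
state = scheduler.state_dict()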

step(epoch=None)

Perform a scheduler step.

Parameters: epoch – Optional epoch number to set explicitly instead of incrementing the current one

Dependencies: None detected from callable globals.

Variables: epoch (Any, optional, default None).

Usage Example

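from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
scheduler.step()  # advance the schedule by one step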

_annealing_linear(start, end, pct)[source]

Linear annealing from start to end as pct goes from 0.0 to 1.0.

Dependencies: None detected from callable globals.

Variables: start (Any, required); end (Any, required); pct (Any, required).

Usage Example

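from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
lr = scheduler._annealing_linear(start=0.004, end=0.1, pct=0.5)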
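
Linear annealing is ordinary linear interpolation between the two endpoints. A minimal standalone sketch, with the function name annealing_linear chosen for illustration and the exact expression used inside grilly assumed rather than confirmed:

```python
def annealing_linear(start, end, pct):
    """Linearly interpolate from `start` (pct=0.0) to `end` (pct=1.0)."""
    return (end - start) * pct + start


# Each equal increment of pct moves the value by the same amount.
for pct in (0.0, 0.25, 0.5, 1.0):
    print(pct, annealing_linear(0.1, 0.0, pct))
```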

get_lr()[source]

Compute the learning rate at the current step.

Dependencies: None detected from callable globals.

Variables: This callable does not take explicit input variables.

Usage Example

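from grilly.optim.lr_scheduler import OneCycleLR

# `optimizer` is an existing grilly optimizer instance; the values below are illustrative.
scheduler = OneCycleLR(optimizer, max_lr=0.1, total_steps=100)
lr = scheduler.get_lr()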
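
To show how the pieces fit together, the following sketch computes a 1cycle learning rate for a given step in plain Python: a warm-up from initial_lr to max_lr over the first pct_start fraction of the steps, then annealing down to min_lr. The two-phase layout and the phase boundaries are modelled on the PyTorch scheduler of the same name; whether grilly uses exactly these boundaries is an assumption, and the function names are hypothetical.

```python
import math


def annealing_cos(start, end, pct):
    """Cosine-interpolate from `start` (pct=0.0) to `end` (pct=1.0)."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1.0)


def annealing_linear(start, end, pct):
    """Linearly interpolate from `start` (pct=0.0) to `end` (pct=1.0)."""
    return (end - start) * pct + start


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4, anneal_strategy="cos"):
    """Learning rate at `step` (0-based) under an assumed two-phase 1cycle policy."""
    if not 0 <= step < total_steps:
        raise ValueError(f"step must be in [0, {total_steps}), got {step}")
    anneal = annealing_cos if anneal_strategy == "cos" else annealing_linear

    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    last_step = total_steps - 1

    if step <= warmup_end:
        # Phase 1: rise from initial_lr to max_lr.
        pct = step / warmup_end if warmup_end > 0 else 1.0
        return anneal(initial_lr, max_lr, pct)
    # Phase 2: fall from max_lr to min_lr.
    pct = (step - warmup_end) / (last_step - warmup_end)
    return anneal(max_lr, min_lr, pct)


schedule = [one_cycle_lr(s, total_steps=100, max_lr=0.1) for s in range(100)]
print(schedule[0], max(schedule), schedule[-1])
```

Passing anneal_strategy="linear" swaps the cosine curve for straight-line segments while keeping the same endpoints.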

Modules

`grilly.optim.adam`	Adam Optimizer
`grilly.optim.adamw`	AdamW Optimizer
`grilly.optim.base`	Base Optimizer class (PyTorch-like)
`grilly.optim.hypergradient`	Hypergradient Descent Optimizers
`grilly.optim.lr_scheduler`	Learning Rate Schedulers
`grilly.optim.natural_gradient`	Natural Gradient Optimizer
`grilly.optim.nlms`	NLMS (Normalized Least Mean Squares) Optimizer
`grilly.optim.sgd`	SGD Optimizer