grilly.optim
Optimizers Module (PyTorch-like)
GPU-accelerated optimizers using Vulkan compute shaders.
Classes
Adam | Adam optimizer using GPU-accelerated shaders.
AdamW | AdamW optimizer with decoupled weight decay.
AffectAdam | Affect-aware Adam optimizer.
AutoHypergradientAdamW | AdamW with OSGM-style auto hypergradient adjustment.
| Set the learning rate using a cosine annealing schedule.
HypergradientAdamW | AdamW with hypergradient-based online learning rate adaptation.
| Base class for learning rate schedulers.
NLMS | NLMS (Normalized Least Mean Squares) optimizer.
NaturalGradient | Natural Gradient optimizer using Fisher information matrix.
| Sets the learning rate according to the 1cycle learning rate policy.
Optimizer | Base class for all optimizers.
| Reduce learning rate when a metric has stopped improving.
SGD | Stochastic Gradient Descent optimizer.
| Decays the learning rate by gamma every step_size epochs.
- class grilly.optim.Optimizer(params, defaults)[source]
Bases: object
Base class for all optimizers.
Similar to torch.optim.Optimizer, but works with numpy arrays and GPU-accelerated operations via Vulkan shaders.
Initialize optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
defaults (dict[str, Any]) – Dictionary of default hyperparameter values
- __init__(params, defaults)[source]
Initialize optimizer.
Dependencies: numpy.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); defaults (dict[str, typing.Any], required).
Usage Example
    import numpy as np
    from grilly.optim.base import Optimizer
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = Optimizer(iter(params), defaults={"lr": 1e-3})
- zero_grad()[source]
Clear gradients for all parameters.
Note: In this implementation, gradients are expected to be stored in a separate structure (e.g., in the model’s backward pass). This method is provided for API compatibility.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.base import Optimizer
    optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 1e-3})
    optimizer.zero_grad()
- step(closure=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
Must be implemented by subclasses.
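Dependencies: None detected from callable globals.
Variables: closure (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.sgd import SGD
    # The base class leaves step() to subclasses, so a concrete optimizer is used here.
    optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None)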
- state_dict()[source]
Return the state of the optimizer as a dict.
- Returns
Dictionary containing optimizer state
- Return type
dict[str, Any]
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.base import Optimizer
    optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 1e-3})
    state = optimizer.state_dict()
- load_state_dict(state_dict)[source]
Load optimizer state from state_dict.
- Parameters
state_dict (dict[str, Any]) – Dictionary containing optimizer state
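Dependencies: None detected from callable globals.
Variables: state_dict (dict[str, typing.Any], required).
Usage Example
    import numpy as np
    from grilly.optim.base import Optimizer
    optimizer = Optimizer(iter([np.zeros(1, dtype=np.float32)]), defaults={"lr": 1e-3})
    # Round-trip: restore the state that state_dict() just produced.
    optimizer.load_state_dict(optimizer.state_dict())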
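The base-class contract described above (parameters held as NumPy arrays, step() left to subclasses, state exposed through state_dict() and load_state_dict()) can be illustrated with a small stand-in. This is a sketch under stated assumptions, not grilly's implementation: SketchOptimizer, PlainGD, the attribute names params, defaults, and state, the layout of the returned dictionary, and the use of id(param) as the gradient key are all hypothetical.

```python
import numpy as np


class SketchOptimizer:
    """Hypothetical stand-in for the base-class contract; not grilly's code."""

    def __init__(self, params, defaults):
        self.params = list(params)      # materialize the iterator once
        self.defaults = dict(defaults)  # default hyperparameter values
        self.state = {}                 # optimizer bookkeeping

    def zero_grad(self):
        pass  # gradients are assumed to live outside the optimizer

    def step(self, closure=None):
        raise NotImplementedError("subclasses must implement step()")

    def state_dict(self):
        return {"defaults": dict(self.defaults), "state": dict(self.state)}

    def load_state_dict(self, state_dict):
        self.defaults = dict(state_dict["defaults"])
        self.state = dict(state_dict["state"])


class PlainGD(SketchOptimizer):
    """Plain gradient descent: param = param - lr * grad."""

    def step(self, closure=None, gradients=None):
        loss = closure() if closure is not None else None
        for param in self.params:
            grad = (gradients or {}).get(id(param))  # assumed key: id(param)
            if grad is None:
                continue
            param -= self.defaults["lr"] * grad      # in-place array update
        self.state["steps"] = self.state.get("steps", 0) + 1
        return loss


weights = np.array([1.0, 2.0])
opt = PlainGD(iter([weights]), {"lr": 0.1})
opt.step(gradients={id(weights): np.array([1.0, -1.0])})
snapshot = opt.state_dict()
```

Because the update is applied in place, the caller's weights array changes without being reassigned, which matches the idea of an optimizer that works directly on NumPy arrays.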
- class grilly.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]
Bases: Optimizer
Adam optimizer using GPU-accelerated shaders.
Uses: adam-update.glsl
Implements the Adam algorithm:
- m = beta1 * m + (1 - beta1) * grad
- v = beta2 * v + (1 - beta2) * grad^2
- m_hat = m / (1 - beta1^t)
- v_hat = v / (1 - beta2^t)
- param = param - lr * m_hat / (sqrt(v_hat) + eps)
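The update equations above can be sketched on the CPU with NumPy. This is a minimal illustration of the listed formulas, not the adam-update.glsl shader; the function name adam_step and the 1-based step counter t are assumptions made for the example.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update on NumPy arrays; t is the 1-based step count."""
    beta1, beta2 = betas
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment running average
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment running average
    m_hat = m / (1.0 - beta1 ** t)              # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


param = np.array([1.0, -2.0])
grad = np.array([0.5, -0.25])
m = np.zeros_like(param)
v = np.zeros_like(param)
param, m, v = adam_step(param, grad, m, v, t=1)
```

On the first step the bias-corrected ratio m_hat / sqrt(v_hat) is close to the sign of the gradient, so each parameter moves by roughly lr regardless of the gradient's magnitude.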
Initialize Adam optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]
Initialize Adam optimizer.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.0); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.adam import Adam
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = Adam(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)
- _get_backend()[source]
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.adam import Adam
    optimizer = Adam(iter([np.zeros(1, dtype=np.float32)]))
    backend = optimizer._get_backend()
- step(closure=None, gradients=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.
Dependencies: numpy.
Variables: closure (Any, optional, default None); gradients (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.adam import Adam
    optimizer = Adam(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None, gradients=None)
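The gradients lookup described for step() can be sketched as follows. This is an illustration under assumptions, not grilly's code: the helper resolve_gradient and the ParamArray subclass are hypothetical, and "parameter ID" is assumed to mean Python's id(param). A plain numpy.ndarray does not accept new attributes, so the param.grad fallback is shown with an ndarray subclass.

```python
import numpy as np


def resolve_gradient(param, gradients=None):
    """Return the gradient for param, or None if none is available."""
    if gradients is not None:
        return gradients.get(id(param))   # assumption: keys are id(param)
    return getattr(param, "grad", None)   # fallback: attribute on the array


class ParamArray(np.ndarray):
    """ndarray subclass, so instances can carry a .grad attribute."""


w = np.ones(3, dtype=np.float32)
b = np.zeros(3, dtype=np.float32).view(ParamArray)
b.grad = np.full(3, 0.5, dtype=np.float32)

explicit = {id(w): np.full(3, 2.0, dtype=np.float32)}
grad_w = resolve_gradient(w, explicit)   # taken from the dict
grad_b = resolve_gradient(b)             # taken from b.grad
grad_missing = resolve_gradient(w)       # plain ndarray has no .grad
```

Passing gradients explicitly keeps the parameters as ordinary arrays, which fits the note that gradients are expected to live in a separate structure.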
- _adam_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, beta1_t, beta2_t)[source]
GPU-accelerated Adam update using adam-update.glsl shader.
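Dependencies: numpy.
Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); beta1_t (Any, required); beta2_t (Any, required).
Usage Example
    import numpy as np
    from grilly.optim.adam import Adam
    param = np.zeros(4, dtype=np.float32)
    grad = np.ones(4, dtype=np.float32)
    exp_avg = np.zeros(4, dtype=np.float32)
    exp_avg_sq = np.zeros(4, dtype=np.float32)
    optimizer = Adam(iter([param]))
    backend = optimizer._get_backend()
    # Assumption: beta1_t and beta2_t are beta1**t and beta2**t, shown here for t = 1.
    result = optimizer._adam_update_gpu(backend=backend, param=param, grad=grad, exp_avg=exp_avg, exp_avg_sq=exp_avg_sq, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, beta1_t=0.9, beta2_t=0.999)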
- class grilly.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)[source]
Bases: Optimizer
AdamW optimizer with decoupled weight decay.
Implements the AdamW algorithm:
- m = beta1 * m + (1 - beta1) * grad
- v = beta2 * v + (1 - beta2) * grad^2
- m_hat = m / (1 - beta1^t)
- v_hat = v / (1 - beta2^t)
- param = param - lr * m_hat / (sqrt(v_hat) + eps)  # Adam step
- param = param - lr * weight_decay * param  # Decoupled weight decay
This decoupling improves generalization compared to Adam’s coupled weight decay.
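A NumPy sketch of the AdamW update listed above, applying the Adam step first and the decoupled decay second. It is an illustration of the formulas only, not the adamw-update.glsl shader; the function name adamw_step and the 1-based step counter t are assumptions, and the AMSGrad variant is not shown.

```python
import numpy as np


def adamw_step(param, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on NumPy arrays; t is the 1-based step count."""
    beta1, beta2 = betas
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam step
    param = param - lr * weight_decay * param            # decoupled decay
    return param, m, v


param0 = np.array([2.0, -4.0])
zeros = np.zeros_like(param0)
# With a zero gradient the Adam step vanishes and only the decay acts.
decayed, m, v = adamw_step(param0, zeros, zeros, zeros, t=1)
```

The zero-gradient call shows what "decoupled" means here: the weights still shrink by the factor (1 - lr * weight_decay), because the decay never passes through the moment estimates.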
Initialize AdamW optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Decoupled weight decay coefficient (default: 0.01)
amsgrad (bool) – Whether to use AMSGrad variant (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)[source]
Initialize AdamW optimizer.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); amsgrad (bool, optional, default False); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.adamw import AdamW
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = AdamW(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False, use_gpu=True)
- _get_backend()[source]
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.adamw import AdamW
    optimizer = AdamW(iter([np.zeros(1, dtype=np.float32)]))
    backend = optimizer._get_backend()
- step(closure=None, gradients=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.
Dependencies: numpy.
Variables: closure (Any, optional, default None); gradients (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.adamw import AdamW
    optimizer = AdamW(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None, gradients=None)
- _adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad)[source]
GPU-accelerated AdamW update using adamw-update.glsl shader.
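Dependencies: numpy.
Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); weight_decay (Any, required); beta1_t (Any, required); beta2_t (Any, required); amsgrad (Any, required).
Usage Example
    import numpy as np
    from grilly.optim.adamw import AdamW
    param = np.zeros(4, dtype=np.float32)
    grad = np.ones(4, dtype=np.float32)
    exp_avg = np.zeros(4, dtype=np.float32)
    exp_avg_sq = np.zeros(4, dtype=np.float32)
    optimizer = AdamW(iter([param]))
    backend = optimizer._get_backend()
    # Assumption: beta1_t and beta2_t are beta1**t and beta2**t, shown here for t = 1.
    result = optimizer._adamw_update_gpu(backend=backend, param=param, grad=grad, exp_avg=exp_avg, exp_avg_sq=exp_avg_sq, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, weight_decay=0.01, beta1_t=0.9, beta2_t=0.999, amsgrad=False)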
- class grilly.optim.AffectAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]
Bases: Adam
Affect-aware Adam optimizer.
Uses: affect-adam.glsl
Similar to Adam but optimized for affect/emotion processing.
Initialize AffectAdam optimizer.
Args are the same as Adam.
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)[source]
Initialize AffectAdam optimizer.
Args are the same as Adam.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.0); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.adam import AffectAdam
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = AffectAdam(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.0, use_gpu=True)
- _adam_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, beta1_t, beta2_t)[source]
GPU-accelerated AffectAdam update using affect-adam.glsl shader.
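Dependencies: None detected from callable globals.
Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); beta1_t (Any, required); beta2_t (Any, required).
Usage Example
    import numpy as np
    from grilly.optim.adam import AffectAdam
    param = np.zeros(4, dtype=np.float32)
    grad = np.ones(4, dtype=np.float32)
    exp_avg = np.zeros(4, dtype=np.float32)
    exp_avg_sq = np.zeros(4, dtype=np.float32)
    optimizer = AffectAdam(iter([param]))
    backend = optimizer._get_backend()
    # Assumption: beta1_t and beta2_t are beta1**t and beta2**t, shown here for t = 1.
    result = optimizer._adam_update_gpu(backend=backend, param=param, grad=grad, exp_avg=exp_avg, exp_avg_sq=exp_avg_sq, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-08, beta1_t=0.9, beta2_t=0.999)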
- class grilly.optim.SGD(params, lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)[source]
Bases: Optimizer
Stochastic Gradient Descent optimizer.
Implements: param = param - lr * grad
Note: SGD is simple enough that CPU implementation is efficient. For GPU acceleration, we could use a generic update shader in the future.
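The basic rule param = param - lr * grad, together with the momentum, dampening, weight_decay, and nesterov options listed below, can be sketched in NumPy. The source states only the basic rule; how the extra options combine here follows the convention used by torch.optim.SGD and is an assumption about grilly, as is the function name sgd_step.

```python
import numpy as np


def sgd_step(param, grad, buf=None, lr=1e-3, momentum=0.0,
             weight_decay=0.0, dampening=0.0, nesterov=False):
    """One SGD update; returns the new parameter and momentum buffer."""
    if weight_decay != 0.0:
        grad = grad + weight_decay * param    # L2 penalty folded into grad
    if momentum != 0.0:
        if buf is None:
            buf = grad.copy()                 # first step: buffer starts at grad
        else:
            buf = momentum * buf + (1.0 - dampening) * grad
        grad = grad + momentum * buf if nesterov else buf
    return param - lr * grad, buf


# Plain SGD: param = param - lr * grad
plain, _ = sgd_step(np.array([1.0, 2.0]), np.array([0.5, -0.5]), lr=0.1)

# Two momentum steps with a constant gradient
p, g, buf = np.array([0.0]), np.array([1.0]), None
p, buf = sgd_step(p, g, buf, lr=0.1, momentum=0.9)
p, buf = sgd_step(p, g, buf, lr=0.1, momentum=0.9)
```

With a constant gradient the momentum buffer grows from 1.0 to 1.9 on the second step, so the second update is almost twice as large as the first.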
Initialize SGD optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
momentum (float) – Momentum factor (default: 0.0)
weight_decay (float) – Weight decay (L2 penalty) (default: 0.0)
dampening (float) – Dampening for momentum (default: 0.0)
nesterov (bool) – Enable Nesterov momentum (default: False)
use_gpu (bool) – Whether to attempt GPU acceleration (default: False, CPU is efficient)
- __init__(params, lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)[source]
Initialize SGD optimizer.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); momentum (float, optional, default 0.0); weight_decay (float, optional, default 0.0); dampening (float, optional, default 0.0); nesterov (bool, optional, default False); use_gpu (bool, optional, default False).
Usage Example
    import numpy as np
    from grilly.optim.sgd import SGD
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = SGD(iter(params), lr=0.001, momentum=0.0, weight_decay=0.0, dampening=0.0, nesterov=False, use_gpu=False)
- _get_backend()[source]
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.sgd import SGD
    optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]))
    backend = optimizer._get_backend()
- step(closure=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
Dependencies: numpy.
Variables: closure (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.sgd import SGD
    optimizer = SGD(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None)
- class grilly.optim.NLMS(params, lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)[source]
Bases: Optimizer
NLMS (Normalized Least Mean Squares) optimizer.
Uses: nlms-update.glsl
Implements adaptive filtering with normalized learning rate:
- w = w + mu * error * x / (||x||^2 + eps)
Reference: ref/brain/specialist.py NLMSExpertHead
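The NLMS rule above can be sketched in NumPy for a single linear filter. This is an illustration under assumptions rather than the nlms-update.glsl shader: the error is taken to be target minus prediction, and the lr_decay and lr_min parameters are assumed to act as a multiplicative decay with a lower clamp. The names nlms_step and decay_lr are hypothetical.

```python
import numpy as np


def nlms_step(w, x, target, mu=0.5, eps=1e-6):
    """One NLMS update: w = w + mu * error * x / (||x||^2 + eps)."""
    error = target - np.dot(w, x)               # prediction error
    w = w + mu * error * x / (np.dot(x, x) + eps)
    return w, error


def decay_lr(mu, lr_decay=0.99995, lr_min=0.1):
    """Assumed schedule: multiplicative decay, clamped at lr_min."""
    return max(lr_min, mu * lr_decay)


rng = np.random.default_rng(0)
true_w = np.array([0.5, -1.0, 2.0])             # filter to be identified
w = np.zeros(3)
mu = 0.5
for _ in range(500):
    x = rng.normal(size=3)
    w, error = nlms_step(w, x, np.dot(true_w, x), mu=mu)
    mu = decay_lr(mu)
```

Dividing by the squared input norm makes the step size independent of the scale of x, which is why a comparatively large default such as 0.5 remains stable.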
Initialize NLMS optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (mu) (default: 0.5)
lr_decay (float) – Learning rate decay factor (default: 0.99995)
lr_min (float) – Minimum learning rate (default: 0.1)
eps (float) – Small constant for numerical stability (default: 1e-6)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
- __init__(params, lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)[source]
Initialize NLMS optimizer.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.5); lr_decay (float, optional, default 0.99995); lr_min (float, optional, default 0.1); eps (float, optional, default 1e-06); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.nlms import NLMS
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = NLMS(iter(params), lr=0.5, lr_decay=0.99995, lr_min=0.1, eps=1e-06, use_gpu=True)
- _get_backend()[source]
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.nlms import NLMS
    optimizer = NLMS(iter([np.zeros(1, dtype=np.float32)]))
    backend = optimizer._get_backend()
- step(closure=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
Dependencies: numpy.
Variables: closure (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.nlms import NLMS
    optimizer = NLMS(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None)
- class grilly.optim.NaturalGradient(params, lr=0.001, fisher_momentum=0.9, use_gpu=True)[source]
Bases: Optimizer
Natural Gradient optimizer using Fisher information matrix.
Uses: fisher-natural-gradient.glsl
Implements natural gradient descent:
- F = Fisher information matrix
- param = param - lr * F^(-1) * grad
Reference: grilly/backend/learning.py natural_gradient
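The rule param = param - lr * F^(-1) * grad can be sketched in NumPy for a flattened parameter vector. How grilly estimates F is not stated here, so the sketch makes several assumptions: F is a running average of the outer product grad * grad^T weighted by fisher_momentum, a small damping term (hypothetical) keeps the matrix invertible, and the function name natural_gradient_step is illustrative only.

```python
import numpy as np


def natural_gradient_step(param, grad, fisher, lr=1e-3,
                          fisher_momentum=0.9, damping=1e-3):
    """One natural-gradient update on 1-D arrays; returns (param, fisher)."""
    # Running estimate of the Fisher matrix from gradient outer products.
    fisher = (fisher_momentum * fisher
              + (1.0 - fisher_momentum) * np.outer(grad, grad))
    damped = fisher + damping * np.eye(param.size)
    # Solve damped @ direction = grad instead of forming the inverse.
    direction = np.linalg.solve(damped, grad)
    return param - lr * direction, fisher


param = np.array([1.0, -1.0])
grad = np.array([2.0, 0.5])
fisher = np.zeros((2, 2))
new_param, fisher = natural_gradient_step(param, grad, fisher)
direction = (param - new_param) / 1e-3
```

Using a linear solve rather than an explicit inverse is the usual numerical choice; for large parameter counts a diagonal or block approximation of F would be needed, which this sketch does not cover.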
Initialize Natural Gradient optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
fisher_momentum (float) – Momentum for Fisher information estimate (default: 0.9)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
- __init__(params, lr=0.001, fisher_momentum=0.9, use_gpu=True)[source]
Initialize Natural Gradient optimizer.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); fisher_momentum (float, optional, default 0.9); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.natural_gradient import NaturalGradient
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = NaturalGradient(iter(params), lr=0.001, fisher_momentum=0.9, use_gpu=True)
- _get_backend()[source]
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
    import numpy as np
    from grilly.optim.natural_gradient import NaturalGradient
    optimizer = NaturalGradient(iter([np.zeros(1, dtype=np.float32)]))
    backend = optimizer._get_backend()
- step(closure=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
Dependencies: numpy.
Variables: closure (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.natural_gradient import NaturalGradient
    optimizer = NaturalGradient(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None)
- class grilly.optim.HypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)[source]
Bases: AdamW
AdamW with hypergradient-based online learning rate adaptation.
Basic version from Baydin et al. (2018). Uses a fixed hypergradient learning rate beta_hyper. Simple but requires manual tuning of beta_hyper. For a self-tuning version, use AutoHypergradientAdamW.
- Update rule:
alpha_{t+1} = alpha_t + beta_hyper * sum(g_t * d_{t-1})
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for running averages (default: (0.9, 0.999))
eps (float) – Numerical stability term (default: 1e-8)
weight_decay (float) – Decoupled weight decay (default: 0.01)
beta_hyper (float) – Hypergradient learning rate (default: 1e-7)
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
log_scale (bool) – If True, adapt log(lr) instead of lr (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
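The learning-rate update rule and the lr_min, lr_max, and log_scale parameters above can be sketched as a small helper. This is an illustration under assumptions, not grilly's implementation: d_{t-1} is taken to be the previous step's direction with a sign convention under which agreement with the current gradient raises the learning rate, log_scale is assumed to apply the same increment to log(lr), and the name hypergradient_lr is hypothetical.

```python
import math

import numpy as np


def hypergradient_lr(lr, grad, prev_direction, beta_hyper=1e-7,
                     lr_min=1e-6, lr_max=1.0, log_scale=False):
    """Return the adapted learning rate alpha_{t+1}."""
    h = float(np.sum(grad * prev_direction))    # sum(g_t * d_{t-1})
    if log_scale:
        lr = math.exp(math.log(lr) + beta_hyper * h)   # adapt log(lr)
    else:
        lr = lr + beta_hyper * h                       # adapt lr directly
    return min(max(lr, lr_min), lr_max)                # clamp to [lr_min, lr_max]


g = np.array([1.0, 2.0])
raised = hypergradient_lr(1e-3, g, g)           # aligned directions
lowered = hypergradient_lr(1e-3, g, -g)         # opposing directions
clamped = hypergradient_lr(0.5, g, g, beta_hyper=1.0)
logged = hypergradient_lr(1e-3, g, g, beta_hyper=0.1, log_scale=True)
```

Successive gradients that keep agreeing push the learning rate up and oscillating gradients pull it down. The clamp bounds how far a poorly chosen beta_hyper can move it.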
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)[source]
Initialize HypergradientAdamW optimizer. Parameters are the same as listed for the class.
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); beta_hyper (float, optional, default 1e-07); lr_min (float, optional, default 1e-06); lr_max (float, optional, default 1.0); log_scale (bool, optional, default False); use_gpu (bool, optional, default True).
Usage Example
    import numpy as np
    from grilly.optim.hypergradient import HypergradientAdamW
    params = [np.zeros(1, dtype=np.float32)]
    optimizer = HypergradientAdamW(iter(params), lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, beta_hyper=1e-07, lr_min=1e-06, lr_max=1.0, log_scale=False, use_gpu=True)
- property current_lr
- property lr_history
- step(closure=None, gradients=None)[source]
Perform a single optimization step.
- Parameters
closure – Optional closure that reevaluates the model and returns loss
gradients – Optional dict mapping parameter IDs to gradients. If None, tries to get gradients from param.grad attribute.
Dependencies: numpy.
Variables: closure (Any, optional, default None); gradients (Any, optional, default None).
Usage Example
    import numpy as np
    from grilly.optim.hypergradient import HypergradientAdamW
    optimizer = HypergradientAdamW(iter([np.zeros(1, dtype=np.float32)]))
    result = optimizer.step(closure=None, gradients=None)
- _adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad)
GPU-accelerated AdamW update using adamw-update.glsl shader.
Dependencies: numpy.
Variables: backend (Any, required); param (Any, required); grad (Any, required); exp_avg (Any, required); exp_avg_sq (Any, required); lr (Any, required); beta1 (Any, required); beta2 (Any, required); eps (Any, required); weight_decay (Any, required); beta1_t (Any, required); beta2_t (Any, required); amsgrad (Any, required).
Usage Example
from grilly.optim.adamw import AdamW
instance = AdamW(...)
result = instance._adamw_update_gpu(backend=None, param=None, grad=None, exp_avg=None, exp_avg_sq=None, lr=None, beta1=None, beta2=None, eps=None, weight_decay=None, beta1_t=None, beta2_t=None, amsgrad=None)
- _get_backend()
Get or create backend instance
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.adamw import AdamW
instance = AdamW(...)
result = instance._get_backend()
- load_state_dict(state_dict)
Load optimizer state from state_dict.
- Parameters
state_dict (dict[str, Any]) – Dictionary containing optimizer state
Dependencies: None detected from callable globals.
Variables: state_dict (dict[str, typing.Any], required).
Usage Example
from grilly.optim.base import Optimizer
instance = Optimizer(...)
# state_dict must be a dict; here it is the one returned by state_dict().
result = instance.load_state_dict(state_dict=instance.state_dict())
- state_dict()
Return the state of the optimizer as a dict.
- Returns
Dictionary containing optimizer state
- Return type
dict[str, Any]
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.base import Optimizer
instance = Optimizer(...)
result = instance.state_dict()
- zero_grad()
Clear gradients for all parameters.
Note: In this implementation, gradients are expected to be stored in a separate structure (e.g., in the model’s backward pass). This method is provided for API compatibility.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.base import Optimizer
instance = Optimizer(...)
result = instance.zero_grad()
- class grilly.optim.AutoHypergradientAdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)[source]
Bases: AdamW
AdamW with OSGM-style auto hypergradient adjustment.
Self-tuning optimizer that automatically adapts the learning rate (and optionally momentum beta1) using online hypergradient descent with AdaGrad-stabilized updates. No manual hypergradient LR tuning needed — the AdaGrad accumulator self-adjusts the meta-learning rate.
Based on the OSGM/HDM algorithm:
- Step size hypergradient (how lr should change):
    h_lr = -g_k . d_{k-1} / (||g_{k-1}||^2 + eps)
    G_lr += h_lr^2
    lr -= hyper_lr * h_lr / (sqrt(G_lr) + eps)
- Momentum hypergradient (how beta1 should change):
    h_beta = g_k . m_{k-1} / (||g_{k-1}||^2 + eps)
    G_beta += h_beta^2
    beta1 -= hyper_lr_beta * h_beta / (sqrt(G_beta) + eps)
The gradient-norm normalization (/ ||g||^2) makes the algorithm scale-invariant, and the AdaGrad accumulator makes the meta-LR self-adjusting — larger past hypergradients automatically slow down future adaptation, preventing oscillation.
Particularly effective for SNN training where surrogate gradients are noisy and the optimal learning rate shifts during training.
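The step-size hypergradient update described above can be sketched in plain NumPy. This is a minimal illustration of the three formulas, not grilly's implementation: the function name `osgm_lr_update` and its argument layout are hypothetical, and `d_prev` stands for the stored previous update direction d_{k-1}, whose exact definition is not given in this section.

```python
import numpy as np

def osgm_lr_update(lr, g_k, d_prev, g_prev_sq_norm, G_lr,
                   hyper_lr=0.01, lr_min=1e-6, lr_max=1.0, eps=1e-8):
    """One AdaGrad-stabilized hypergradient update of the learning rate.

    Hypothetical helper: mirrors the documented formulas only.
    """
    # h_lr = -g_k . d_{k-1} / (||g_{k-1}||^2 + eps)
    h_lr = -float(np.dot(g_k.ravel(), d_prev.ravel())) / (g_prev_sq_norm + eps)
    # G_lr += h_lr^2  (AdaGrad accumulator)
    G_lr = G_lr + h_lr ** 2
    # lr -= hyper_lr * h_lr / (sqrt(G_lr) + eps), then clamp to [lr_min, lr_max]
    lr = lr - hyper_lr * h_lr / (np.sqrt(G_lr) + eps)
    lr = float(np.clip(lr, lr_min, lr_max))
    return lr, G_lr, h_lr

g = np.array([1.0, 0.0])
# Current gradient aligned with d_prev: h_lr is negative, so lr grows.
lr_up, G_up, h_up = osgm_lr_update(0.001, g, g, 1.0, 0.0)
# Current gradient opposed to d_prev: h_lr is positive, so lr shrinks (and is clamped).
lr_down, G_down, h_down = osgm_lr_update(0.001, g, -g, 1.0, 0.0)
```

Because the accumulator `G_lr` only grows, later hypergradients of the same size move `lr` by less, which is the self-adjusting behaviour the description refers to.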
- Surprise signal (optional, input-level):
Tracks gradient prediction error as a “surprise” signal and exposes it for the model to use as input gain modulation. Unlike backprop-level momentum changes, this acts at the forward-pass level — amplifying input signals when the optimization landscape shifts unexpectedly.
- Instant surprise (gradient prediction error):
S_instant = tanh(||g_k - EMA(g)||^2 / (EMA(||g||^2) + eps))
- Accumulated surprise (biological momentum / S_bar):
S_bar = alpha * S_instant + (1-alpha) * S_bar_prev
- Inverted-U gain (Yerkes-Dodson / trauma protection):
gain = S_bar * exp(-S_bar / trauma_threshold)
- The inverted-U curve implements the biological stress response:
Low S_bar → low gain (nothing interesting)
Moderate S_bar → peak gain (optimal learning zone)
High S_bar → gain drops (trauma protection)
This prevents “unerasable events” — if surprise stays high for many consecutive steps (chronic stress), the gain suppresses instead of amplifying, protecting the model from fixating on a single extreme event. Mirrors the HPA axis: acute stress enhances encoding, chronic stress impairs plasticity.
- The model reads current_surprise_gain for input scaling:
x_effective = x * (1 + scale * optimizer.current_surprise_gain)
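The three surprise formulas above can be combined into a small tracker. The sketch below is illustrative only: the class name `SurpriseTracker`, the choice to seed the EMAs from the first gradient, and the order in which the baselines are updated are all assumptions, since this section does not specify them.

```python
import numpy as np

class SurpriseTracker:
    """Hypothetical stand-alone version of the documented surprise signal."""

    def __init__(self, gamma=0.9, alpha=0.1, trauma_threshold=0.5, eps=1e-8):
        self.gamma = gamma                      # EMA decay for gradient tracking
        self.alpha = alpha                      # EMA decay for S_bar
        self.trauma_threshold = trauma_threshold
        self.eps = eps
        self.ema_g = None                       # EMA(g)
        self.ema_g_sq = 0.0                     # EMA(||g||^2)
        self.s_bar = 0.0                        # accumulated surprise

    def update(self, g):
        g = np.asarray(g, dtype=np.float64).ravel()
        if self.ema_g is None:
            # Assumption: initialise the baselines from the first gradient.
            self.ema_g = g.copy()
            self.ema_g_sq = float(g @ g)
        # S_instant = tanh(||g_k - EMA(g)||^2 / (EMA(||g||^2) + eps))
        err = float(np.sum((g - self.ema_g) ** 2))
        s_instant = float(np.tanh(err / (self.ema_g_sq + self.eps)))
        # S_bar = alpha * S_instant + (1 - alpha) * S_bar_prev
        self.s_bar = self.alpha * s_instant + (1.0 - self.alpha) * self.s_bar
        # gain = S_bar * exp(-S_bar / trauma_threshold)
        gain = self.s_bar * float(np.exp(-self.s_bar / self.trauma_threshold))
        # Update the baselines after the surprise has been measured.
        self.ema_g = self.gamma * self.ema_g + (1.0 - self.gamma) * g
        self.ema_g_sq = self.gamma * self.ema_g_sq + (1.0 - self.gamma) * float(g @ g)
        return s_instant, self.s_bar, gain

tracker = SurpriseTracker()
for _ in range(5):
    calm = tracker.update([1.0, 1.0])       # steady gradients: no surprise
shock = tracker.update([-1.0, -1.0])        # sign flip: large instant surprise
```

With steady gradients the instant surprise stays near zero; a sudden sign flip pushes it close to 1, while the accumulated value `s_bar` rises only by a fraction `alpha` of that.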
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Initial learning rate (default: 1e-3)
betas (tuple) – Coefficients for running averages (default: (0.9, 0.999))
eps (float) – Numerical stability term (default: 1e-8)
weight_decay (float) – Decoupled weight decay (default: 0.01)
hyper_lr (float) – Meta-learning rate for step size adaptation (default: 0.01). This is automatically modulated by the AdaGrad accumulator, so it’s much less sensitive than HypergradientAdamW’s beta_hyper.
hyper_lr_beta (float) – Meta-learning rate for momentum adaptation (default: 1.0). Only used when adapt_momentum=True.
lr_min (float) – Minimum learning rate clamp (default: 1e-6)
lr_max (float) – Maximum learning rate clamp (default: 1.0)
adapt_momentum (bool) – If True, also adapt beta1 via hypergradient (default: False)
track_surprise (bool) – If True, compute and expose gradient surprise signal via current_surprise_gain (default: False). The model’s forward pass should read this to modulate input gain.
surprise_gamma (float) – EMA decay for gradient tracking (default: 0.9). Higher = smoother baseline, slower to detect change.
surprise_alpha (float) – EMA decay for surprise accumulation S_bar (default: 0.1). Controls how fast accumulated surprise builds up and decays. Lower = longer memory of surprise.
trauma_threshold (float) – S_bar level where gain peaks before suppression (default: 0.5). The inverted-U gain = S_bar * exp(-S_bar/T) peaks at S_bar = T. Above this, gain decreases (protection).
beta_min (float) – Minimum beta1 clamp (default: 0.5)
beta_max (float) – Maximum beta1 clamp (default: 0.9995)
warmup_steps (int) – Steps before starting adaptation (default: 10). Lets Adam moments initialize before adapting LR.
use_gpu (bool) – Whether to use GPU acceleration (default: True)
- __init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)[source]
Initialize AdamW optimizer.
- Parameters
params (Iterator[numpy.ndarray]) – Iterator of parameter arrays to optimize
lr (float) – Learning rate (default: 1e-3)
betas (tuple) – Coefficients for computing running averages (default: (0.9, 0.999))
eps (float) – Term added to denominator for numerical stability (default: 1e-8)
weight_decay (float) – Decoupled weight decay coefficient (default: 0.01)
amsgrad – Whether to use AMSGrad variant (default: False)
use_gpu (bool) – Whether to use GPU acceleration (default: True)
hyper_lr (float) –
hyper_lr_beta (float) –
lr_min (float) –
lr_max (float) –
adapt_momentum (bool) –
track_surprise (bool) –
surprise_gamma (float) –
surprise_alpha (float) –
trauma_threshold (float) –
beta_min (float) –
beta_max (float) –
warmup_steps (int) –
Dependencies: None detected from callable globals.
Variables: params (collections.abc.Iterator[numpy.ndarray], required); lr (float, optional, default 0.001); betas (tuple, optional, default (0.9, 0.999)); eps (float, optional, default 1e-08); weight_decay (float, optional, default 0.01); hyper_lr (float, optional, default 0.01); hyper_lr_beta (float, optional, default 1.0); lr_min (float, optional, default 1e-06); lr_max (float, optional, default 1.0); adapt_momentum (bool, optional, default False); track_surprise (bool, optional, default False); surprise_gamma (float, optional, default 0.9); surprise_alpha (float, optional, default 0.1); trauma_threshold (float, optional, default 0.5); beta_min (float, optional, default 0.5); beta_max (float, optional, default 0.9995); warmup_steps (int, optional, default 10); use_gpu (bool, optional, default True).
Usage Example
import numpy as np
from grilly.optim.hypergradient import AutoHypergradientAdamW
# Wrap the array in an iterator to match the documented Iterator[numpy.ndarray] type.
params = iter([np.zeros(1, dtype=np.float32)])
instance = AutoHypergradientAdamW(params=params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, hyper_lr=0.01, hyper_lr_beta=1.0, lr_min=1e-06, lr_max=1.0, adapt_momentum=False, track_surprise=False, surprise_gamma=0.9, surprise_alpha=0.1, trauma_threshold=0.5, beta_min=0.5, beta_max=0.9995, warmup_steps=10, use_gpu=True)
- property current_lr
- property current_surprise
Instant surprise signal [0, 1]. Raw gradient prediction error.
- property accumulated_surprise
Accumulated surprise S_bar. Biological momentum of surprise.
- property current_surprise_gain
Inverted-U gain signal for input-level modulation.
- Implements the Yerkes-Dodson curve / trauma protection:
gain = S_bar * exp(-S_bar / trauma_threshold)
Low S_bar → low gain (nothing interesting happening)
Moderate S_bar → peak gain (optimal learning zone)
High S_bar → gain drops (trauma protection, don’t fixate)
- Read this after each optimizer step and pass to the model:
x_effective = x * (1 + scale * optimizer.current_surprise_gain)
Returns 0.0 when surprise tracking is off or during warmup.
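The shape of the inverted-U curve can be checked numerically. The sketch below evaluates the documented gain formula on a grid and locates its maximum; the helper name `surprise_gain` and the `scale` value are illustrative assumptions, not part of the grilly API.

```python
import numpy as np

def surprise_gain(s_bar, trauma_threshold=0.5):
    """gain = S_bar * exp(-S_bar / trauma_threshold), as documented above."""
    return s_bar * np.exp(-s_bar / trauma_threshold)

# Evaluate the curve on a fine grid of accumulated-surprise values.
s = np.linspace(0.0, 3.0, 3001)
gains = surprise_gain(s, trauma_threshold=0.5)
peak_s = float(s[np.argmax(gains)])     # location of the maximum
peak_gain = float(gains.max())          # value at the maximum

# Input scaling as the model would apply it; `scale` is a hypothetical model-side constant.
scale = 1.0
x = np.ones(4, dtype=np.float32)
x_effective = x * (1.0 + scale * peak_gain)
```

The maximum sits at S_bar = trauma_threshold with value trauma_threshold / e, so with the default threshold of 0.5 the gain never exceeds roughly 0.18.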
- property lr_history
- property beta1_history
- property surprise_history
- property s_bar_history
- step(closure=None, gradients=None)[source]
Perform optimization step with OSGM-style auto LR adaptation.
1. Collect current gradients g_k.
2. Compute surprise signal (if track_surprise=True).
3. Compute normalized hypergradients (after warmup):
    h_lr = -g_k . d_{k-1} / ||g_{k-1}||^2
    h_beta = g_k . m_{k-1} / ||g_{k-1}||^2
4. Update AdaGrad accumulators and adjust lr (and beta1).
5. Run standard AdamW step with adapted hyperparameters.
6. Store d_k, ||g_k||^2, m_k for next step.
Dependencies: numpy.
Variables: closure (Any, optional, default None); gradients (Any, optional, default None).
Usage Example
from grilly.optim.hypergradient import AutoHypergradientAdamW
instance = AutoHypergradientAdamW(...)
result = instance.step(closure=None, gradients=None)
- Inherited methods: _adamw_update_gpu(backend, param, grad, exp_avg, exp_avg_sq, lr, beta1, beta2, eps, weight_decay, beta1_t, beta2_t, amsgrad), _get_backend(), load_state_dict(state_dict), state_dict(), zero_grad(). These are inherited unchanged from AdamW and Optimizer and are documented under HypergradientAdamW above.
- class grilly.optim.LRScheduler(optimizer, last_epoch=-1)[source]
Bases: object
Base class for learning rate schedulers.
All schedulers should inherit from this class and implement the get_lr() method.
Initialize base scheduler.
- Parameters
optimizer – Wrapped optimizer
last_epoch – The index of last epoch (default: -1)
- __init__(optimizer, last_epoch=-1)[source]
Initialize base scheduler.
- Parameters
optimizer – Wrapped optimizer
last_epoch – The index of last epoch (default: -1)
Dependencies: None detected from callable globals.
Variables: optimizer (Any, required); last_epoch (Any, optional, default -1).
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
optimizer = ...  # placeholder: an existing grilly.optim optimizer instance
instance = LRScheduler(optimizer=optimizer, last_epoch=-1)
- state_dict()[source]
Returns the state of the scheduler as a dict.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
instance = LRScheduler(...)
result = instance.state_dict()
- load_state_dict(state_dict)[source]
Loads the scheduler state.
Dependencies: None detected from callable globals.
Variables: state_dict (Any, required).
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
instance = LRScheduler(...)
# Pass a dict previously produced by state_dict().
result = instance.load_state_dict(state_dict=instance.state_dict())
- get_last_lr()[source]
Return last computed learning rate by current scheduler.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
instance = LRScheduler(...)
result = instance.get_last_lr()
- get_lr()[source]
Compute learning rate using chainable form of the scheduler.
This method should be implemented by subclasses.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
instance = LRScheduler(...)
result = instance.get_lr()
- step(epoch=None)[source]
Perform a scheduler step.
- Parameters
epoch – Optional epoch number to use instead of incrementing
Dependencies: None detected from callable globals.
Variables: epoch (Any, optional, default None).
Usage Example
from grilly.optim.lr_scheduler import LRScheduler
instance = LRScheduler(...)
result = instance.step(epoch=None)
- class grilly.optim.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1)[source]
Bases: LRScheduler
Decays the learning rate by gamma every step_size epochs.
Matches torch.optim.lr_scheduler.StepLR
Initialize StepLR scheduler.
- Parameters
optimizer – Wrapped optimizer
step_size – Period of learning rate decay
gamma – Multiplicative factor of learning rate decay (default: 0.1)
last_epoch – The index of last epoch (default: -1)
- __init__(optimizer, step_size, gamma=0.1, last_epoch=-1)[source]
Initialize StepLR scheduler.
- Parameters
optimizer – Wrapped optimizer
step_size – Period of learning rate decay
gamma – Multiplicative factor of learning rate decay (default: 0.1)
last_epoch – The index of last epoch (default: -1)
Dependencies: None detected from callable globals.
Variables: optimizer (Any, required); step_size (Any, required); gamma (Any, optional, default 0.1); last_epoch (Any, optional, default -1).
Usage Example
from grilly.optim.lr_scheduler import StepLR
optimizer = ...  # placeholder: an existing grilly.optim optimizer instance
# step_size=10 is an arbitrary illustrative value; the parameter has no default.
instance = StepLR(optimizer=optimizer, step_size=10, gamma=0.1, last_epoch=-1)
- get_lr()[source]
Compute learning rate for current epoch.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import StepLR
instance = StepLR(...)
result = instance.get_lr()
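As a rough illustration of the schedule StepLR describes, the closed form "multiply by gamma once every step_size epochs" can be written as a stand-alone function. The name `step_lr` is hypothetical, and this sketch does not show how grilly's get_lr() is actually implemented.

```python
def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Closed-form step decay: base_lr * gamma ** floor(epoch / step_size)."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate over the first seven epochs with a decay every 3 epochs.
schedule = [step_lr(0.1, epoch, step_size=3, gamma=0.5) for epoch in range(7)]
```

The rate stays flat within each block of `step_size` epochs and drops by the factor `gamma` at each block boundary.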
- Inherited methods: get_last_lr(), load_state_dict(state_dict), state_dict(), step(epoch=None). These are inherited unchanged from LRScheduler and are documented under that class above.
- class grilly.optim.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1)[source]
Bases: LRScheduler
Set the learning rate using a cosine annealing schedule.
Matches torch.optim.lr_scheduler.CosineAnnealingLR
Initialize CosineAnnealingLR scheduler.
- Parameters
optimizer – Wrapped optimizer
T_max – Maximum number of iterations
eta_min – Minimum learning rate (default: 0)
last_epoch – The index of last epoch (default: -1)
- __init__(optimizer, T_max, eta_min=0, last_epoch=-1)[source]
Initialize CosineAnnealingLR scheduler.
- Parameters
optimizer – Wrapped optimizer
T_max – Maximum number of iterations
eta_min – Minimum learning rate (default: 0)
last_epoch – The index of last epoch (default: -1)
Dependencies: None detected from callable globals.
Variables: optimizer (Any, required); T_max (Any, required); eta_min (Any, optional, default 0); last_epoch (Any, optional, default -1).
Usage Example
from grilly.optim.lr_scheduler import CosineAnnealingLR
optimizer = ...  # placeholder: an existing grilly.optim optimizer instance
# T_max=50 is an arbitrary illustrative value; the parameter has no default.
instance = CosineAnnealingLR(optimizer=optimizer, T_max=50, eta_min=0, last_epoch=-1)
- get_lr()[source]
Compute learning rate using cosine annealing.
Dependencies: math.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import CosineAnnealingLR
instance = CosineAnnealingLR(...)
result = instance.get_lr()
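For orientation, the usual closed form of cosine annealing is sketched below. The function name `cosine_annealing_lr` is hypothetical, and whether grilly's get_lr() uses this closed form or a recursive update is not stated in this section.

```python
import math

def cosine_annealing_lr(base_lr, epoch, T_max, eta_min=0.0):
    """eta_min + (base_lr - eta_min) * (1 + cos(pi * epoch / T_max)) / 2"""
    return eta_min + (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / T_max)) / 2.0

lr_start = cosine_annealing_lr(0.1, epoch=0, T_max=50, eta_min=0.001)
lr_mid = cosine_annealing_lr(0.1, epoch=25, T_max=50, eta_min=0.001)
lr_end = cosine_annealing_lr(0.1, epoch=50, T_max=50, eta_min=0.001)
```

The rate starts at `base_lr`, passes through the midpoint of `base_lr` and `eta_min` at epoch T_max / 2, and reaches `eta_min` at epoch T_max.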
- Inherited methods: get_last_lr(), load_state_dict(state_dict), state_dict(), step(epoch=None). These are inherited unchanged from LRScheduler and are documented under that class above.
- class grilly.optim.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[source]
Bases: object
Reduce learning rate when a metric has stopped improving.
Matches torch.optim.lr_scheduler.ReduceLROnPlateau
Initialize ReduceLROnPlateau scheduler.
- Parameters
optimizer – Wrapped optimizer
mode – One of ‘min’ or ‘max’. In ‘min’ mode, lr will be reduced when the quantity monitored has stopped decreasing (default: ‘min’)
factor – Factor by which the learning rate will be reduced (default: 0.1)
patience – Number of epochs with no improvement after which learning rate will be reduced (default: 10)
threshold – Threshold for measuring the new optimum (default: 1e-4)
threshold_mode – One of ‘rel’, ‘abs’ (default: ‘rel’)
cooldown – Number of epochs to wait before resuming normal operation after lr has been reduced (default: 0)
min_lr – A lower bound on the learning rate (default: 0)
eps – Minimal decay applied to lr (default: 1e-8)
- __init__(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[source]
Initialize ReduceLROnPlateau scheduler.
- Parameters
optimizer – Wrapped optimizer
mode – One of ‘min’ or ‘max’. In ‘min’ mode, lr will be reduced when the quantity monitored has stopped decreasing (default: ‘min’)
factor – Factor by which the learning rate will be reduced (default: 0.1)
patience – Number of epochs with no improvement after which learning rate will be reduced (default: 10)
threshold – Threshold for measuring the new optimum (default: 1e-4)
threshold_mode – One of ‘rel’, ‘abs’ (default: ‘rel’)
cooldown – Number of epochs to wait before resuming normal operation after lr has been reduced (default: 0)
min_lr – A lower bound on the learning rate (default: 0)
eps – Minimal decay applied to lr (default: 1e-8)
Dependencies: None detected from callable globals.
Variables: optimizer (Any, required); mode (Any, optional, default 'min'); factor (Any, optional, default 0.1); patience (Any, optional, default 10); threshold (Any, optional, default 0.0001); threshold_mode (Any, optional, default 'rel'); cooldown (Any, optional, default 0); min_lr (Any, optional, default 0); eps (Any, optional, default 1e-08).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
optimizer = ...  # placeholder: an existing grilly.optim optimizer instance
instance = ReduceLROnPlateau(optimizer=optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
- _reset()[source]
Reset num_bad_epochs counter and cooldown counter.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
result = instance._reset()
- step(metrics, epoch=None)[source]
Perform a scheduler step based on metric.
- Parameters
metrics – The metric to monitor
epoch – Optional epoch number
Dependencies: None detected from callable globals.
Variables: metrics (Any, required); epoch (Any, optional, default None).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
# 0.5 is a placeholder for the monitored metric, e.g. a validation loss.
result = instance.step(metrics=0.5, epoch=None)
- _reduce_lr(epoch)[source]
Reduce learning rate.
Dependencies: None detected from callable globals.
Variables: epoch (Any, required).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
result = instance._reduce_lr(epoch=None)
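The roles of factor, min_lr and eps in a reduction can be sketched as follows. This follows the behaviour of the PyTorch scheduler that this class is said to match; the function name `reduce_lr` is hypothetical and grilly's own _reduce_lr may differ in detail.

```python
def reduce_lr(old_lr, factor=0.1, min_lr=0.0, eps=1e-8):
    """Scale the rate by `factor`, floor it at `min_lr`, skip negligible changes."""
    new_lr = max(old_lr * factor, min_lr)
    # Only apply the update when the decrease exceeds eps.
    if old_lr - new_lr > eps:
        return new_lr
    return old_lr

normal = reduce_lr(0.1)                      # ordinary reduction by factor
floored = reduce_lr(0.01, min_lr=0.005)      # reduction stopped at min_lr
ignored = reduce_lr(1e-9)                    # change smaller than eps is skipped
```

This is why eps is described as the minimal decay applied to lr: once the rate is so small that a reduction would change it by less than eps, it is left untouched.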
- property in_cooldown
Check if scheduler is in cooldown period.
- is_better(a, best)[source]
Check if metric ‘a’ is better than ‘best’.
Dependencies: None detected from callable globals.
Variables: a (Any, required); best (Any, required).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
# 0.9 and 1.0 are placeholder metric values.
result = instance.is_better(a=0.9, best=1.0)
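How mode, threshold and threshold_mode combine into the "is better" test can be sketched as below. The four rules are those of the PyTorch scheduler this class is said to match; the stand-alone function is an assumption about the behaviour, not grilly's code.

```python
def is_better(a, best, mode="min", threshold=1e-4, threshold_mode="rel"):
    """Return True if metric `a` improves on `best` by more than the threshold."""
    if mode == "min" and threshold_mode == "rel":
        return a < best * (1.0 - threshold)
    if mode == "min" and threshold_mode == "abs":
        return a < best - threshold
    if mode == "max" and threshold_mode == "rel":
        return a > best * (1.0 + threshold)
    if mode == "max" and threshold_mode == "abs":
        return a > best + threshold
    raise ValueError(f"unsupported mode/threshold_mode: {mode!r}, {threshold_mode!r}")

tiny_gain = is_better(0.99995, 1.0)     # improvement below the relative threshold
real_gain = is_better(0.9998, 1.0)      # improvement above the relative threshold
max_abs = is_better(1.001, 1.0, mode="max", threshold_mode="abs")
```

An epoch counts toward patience whenever this test returns False, so a metric that improves by less than the threshold is treated the same as one that does not improve at all.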
- _init_is_better(mode, threshold, threshold_mode)[source]
Initialize comparison function.
Dependencies: None detected from callable globals.
Variables: mode (Any, required); threshold (Any, required); threshold_mode (Any, required).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
# Values below are the documented constructor defaults.
result = instance._init_is_better(mode='min', threshold=0.0001, threshold_mode='rel')
- state_dict()[source]
Returns the state of the scheduler as a dict.
Dependencies: None detected from callable globals.
Variables: This callable does not take explicit input variables.
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
result = instance.state_dict()
- load_state_dict(state_dict)[source]
Loads the scheduler state.
Dependencies: None detected from callable globals.
Variables: state_dict (Any, required).
Usage Example
from grilly.optim.lr_scheduler import ReduceLROnPlateau
instance = ReduceLROnPlateau(...)
# Pass a dict previously produced by state_dict().
result = instance.load_state_dict(state_dict=instance.state_dict())
- class grilly.optim.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1)[source]
Bases: LRScheduler
Sets the learning rate according to the 1cycle learning rate policy.
Matches torch.optim.lr_scheduler.OneCycleLR
Initialize OneCycleLR scheduler.
- Parameters
optimizer – Wrapped optimizer
max_lr – Upper learning rate boundary in the cycle
total_steps – Total number of steps in the cycle (optional)
epochs – Number of epochs to train for (optional)
steps_per_epoch – Number of steps per epoch (optional)
pct_start – Percentage of the cycle spent increasing the learning rate (default: 0.3)
anneal_strategy – Specifies the annealing strategy: ‘cos’ or ‘linear’ (default: ‘cos’)
cycle_momentum – If True, momentum is cycled inversely (default: True)
base_momentum – Lower momentum boundary in the cycle (default: 0.85)
max_momentum – Upper momentum boundary in the cycle (default: 0.95)
div_factor – Determines the initial learning rate via initial_lr = max_lr/div_factor (default: 25)
final_div_factor – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor (default: 1e4)
last_epoch – The index of last epoch (default: -1)
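The two derived learning rates named in the parameter list follow directly from the stated formulas. The helper below is a hypothetical illustration of that arithmetic with the documented defaults.

```python
def one_cycle_bounds(max_lr, div_factor=25.0, final_div_factor=1e4):
    """Return (initial_lr, min_lr) as defined in the parameter descriptions."""
    initial_lr = max_lr / div_factor            # initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor      # min_lr = initial_lr / final_div_factor
    return initial_lr, min_lr

initial_lr, min_lr = one_cycle_bounds(max_lr=0.1)
```

With max_lr = 0.1 and the defaults, the cycle starts near 0.004, climbs to 0.1, and ends near 4e-7.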
- __init__(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1)[source]
Initialize OneCycleLR scheduler.
- Parameters
optimizer – Wrapped optimizer
max_lr – Upper learning rate boundary in the cycle
total_steps – Total number of steps in the cycle (optional)
epochs – Number of epochs to train for (optional)
steps_per_epoch – Number of steps per epoch (optional)
pct_start – Percentage of the cycle spent increasing the learning rate (default: 0.3)
anneal_strategy – Specifies the annealing strategy: ‘cos’ or ‘linear’ (default: ‘cos’)
cycle_momentum – If True, momentum is cycled inversely (default: True)
base_momentum – Lower momentum boundary in the cycle (default: 0.85)
max_momentum – Upper momentum boundary in the cycle (default: 0.95)
div_factor – Determines the initial learning rate via initial_lr = max_lr/div_factor (default: 25)
final_div_factor – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor (default: 1e4)
last_epoch – The index of last epoch (default: -1)
Dependencies: None detected from callable globals.
Variables: optimizer (Any, required); max_lr (Any, required); total_steps (Any, optional, default None); epochs (Any, optional, default None); steps_per_epoch (Any, optional, default None); pct_start (Any, optional, default 0.3); anneal_strategy (Any, optional, default 'cos'); cycle_momentum (Any, optional, default True); base_momentum (Any, optional, default 0.85); max_momentum (Any, optional, default 0.95); div_factor (Any, optional, default 25.0); final_div_factor (Any, optional, default 10000.0); last_epoch (Any, optional, default -1).
Usage Example
from grilly.optim.lr_scheduler import OneCycleLR
optimizer = ...  # placeholder: an existing grilly.optim optimizer instance
# max_lr=0.01 and total_steps=1000 are arbitrary illustrative values.
instance = OneCycleLR(optimizer=optimizer, max_lr=0.01, total_steps=1000, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, last_epoch=-1)
- _format_param(name, optimizer, param)[source]
Format parameter to be a list per parameter group.
Dependencies: None detected from callable globals.
Variables: name (Any, required); optimizer (Any, required); param (Any, required).
Usage Example
from grilly.optim.lr_scheduler import OneCycleLR
instance = OneCycleLR(...)
result = instance._format_param(name=None, optimizer=None, param=None)
- _annealing_cos(start, end, pct)[source]
Cosine annealing from start to end as pct goes from 0.0 to 1.0.
Dependencies: math.
Variables: start (Any, required); end (Any, required); pct (Any, required).
Usage Example
from grilly.optim.lr_scheduler import OneCycleLR
instance = OneCycleLR(...)
# Illustrative numeric values; pct runs from 0.0 to 1.0.
result = instance._annealing_cos(start=0.1, end=0.01, pct=0.5)
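A cosine interpolation with the endpoints described above can be sketched as a free function. This mirrors the formula used by the PyTorch scheduler this class is said to match; it is an assumption that grilly's _annealing_cos computes the same expression.

```python
import math

def annealing_cos(start, end, pct):
    """Cosine interpolation: returns `start` at pct=0.0 and `end` at pct=1.0."""
    cos_out = math.cos(math.pi * pct) + 1.0
    return end + (start - end) / 2.0 * cos_out

cos_begin = annealing_cos(0.1, 0.01, 0.0)
cos_half = annealing_cos(0.1, 0.01, 0.5)
cos_finish = annealing_cos(0.1, 0.01, 1.0)
```

The curve is flat near both endpoints and steepest in the middle, which gives a gentler start and finish than a straight line between the same values.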
- Inherited methods: get_last_lr(), load_state_dict(state_dict), state_dict(), step(epoch=None). These are inherited unchanged from LRScheduler and are documented under that class above.
- _annealing_linear(start, end, pct)[source]
Linear annealing from start to end as pct goes from 0.0 to 1.0.
Dependencies: None detected from callable globals.
Variables: start (Any, required); end (Any, required); pct (Any, required).
Usage Example
from grilly.optim.lr_scheduler import OneCycleLR
instance = OneCycleLR(...)
# Illustrative numeric values; pct runs from 0.0 to 1.0.
result = instance._annealing_linear(start=0.1, end=0.01, pct=0.5)
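Linear annealing is a plain interpolation between the two endpoints. The free function below is a hypothetical stand-in for the method, shown here on the warm-up leg of a cycle that rises from 0.004 to 0.1.

```python
def annealing_linear(start, end, pct):
    """Linear interpolation: returns `start` at pct=0.0 and `end` at pct=1.0."""
    return (end - start) * pct + start

# Warm-up from an initial rate of 0.004 to a peak of 0.1, sampled at five points.
warmup = [annealing_linear(0.004, 0.1, i / 4) for i in range(5)]
```

Each sample increases by the same amount, in contrast to the cosine variant, which changes slowly near the endpoints.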
Modules
- Adam Optimizer
- AdamW Optimizer
- Base Optimizer class (PyTorch-like)
- Hypergradient Descent Optimizers
- Learning Rate Schedulers
- Natural Gradient Optimizer
- NLMS (Normalized Least Mean Squares) Optimizer
- SGD Optimizer