Performance, Debugging, and Testing

Performance model

Grilly performance depends on:

shader availability for your code path
memory movement volume (host to device and back)
tensor shapes and batch sizing
operation fusion opportunities

Design choices

Performance and correctness tooling in Grilly favors explicitness:

Keep kernel boundaries visible so bottlenecks are measurable.
Preserve CPU fallback paths for differential testing and debugging.
Use strict docs and test builds (sphinx-build -W, which turns warnings into errors, plus targeted test suites) to catch regressions early, both in CI and in local workflows.
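
As an illustration of differential testing against a CPU fallback, here is a minimal sketch. It assumes NumPy is available; relu_gpu is a hypothetical stand-in for a GPU-backed operator (it is not a Grilly API) and should be replaced with the real call in your code path. The tolerances are also illustrative and may need loosening for operators that accumulate rounding error.

```python
import numpy as np


def relu_cpu(x: np.ndarray) -> np.ndarray:
    """Reference CPU implementation used as ground truth."""
    return np.maximum(x, 0.0)


def relu_gpu(x: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a GPU-backed op; swap in the real call."""
    return np.where(x > 0.0, x, 0.0).astype(x.dtype)


def assert_paths_agree(cpu_fn, gpu_fn, x, rtol=1e-5, atol=1e-6):
    """Run both paths on the same input and fail loudly if they diverge."""
    expected = cpu_fn(x)
    actual = gpu_fn(x)
    np.testing.assert_allclose(actual, expected, rtol=rtol, atol=atol)


# Small, seeded float32 input keeps failures easy to reproduce.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
assert_paths_agree(relu_cpu, relu_gpu, x)
```

Keeping the CPU reference as a plain function makes it easy to reuse the same comparison helper for every operator that has both paths.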

Profiling strategy

Use a layered profiling approach:

Measure end-to-end step time.
Isolate hotspot operators.
Verify whether the code path runs on the GPU or falls back to the CPU.
Reduce unnecessary downloads and host-side conversions.
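
The first two steps above can be sketched with a small wall-clock timer. This uses only the Python standard library; train_step is a hypothetical placeholder for whatever step or operator you want to measure. If your backend queues GPU work asynchronously (an assumption to verify for your setup), the timed callable should return only once results are actually available, otherwise the numbers will understate the real cost.

```python
import statistics
import time


def time_callable(fn, *args, warmup=2, repeats=10):
    """Return per-call wall-clock timings in seconds, excluding warm-up runs."""
    for _ in range(warmup):
        fn(*args)  # warm-up absorbs one-time costs such as caching or compilation
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce raw timings to a few robust statistics."""
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
    }


def train_step(n):
    """Hypothetical placeholder workload; replace with a real step or operator."""
    return sum(i * i for i in range(n))


print(summarize(time_callable(train_step, 10_000)))
```

Timing the full step first and then individual operators with the same helper keeps the measurements comparable across layers.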

Debugging checklist

Confirm Vulkan backend initialization.
Check tensor dtype (float32) and expected shape.
Verify that the required shader exists in the loaded shader map.
Reproduce the issue with the smallest possible tensor sizes.
Add finite checks (np.isfinite) at major boundaries.
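
The dtype, shape, and finite checks from this list can be combined into one guard placed at major boundaries. The helper below is a sketch that assumes tensors are available as NumPy arrays on the host; check_tensor is a hypothetical name, not part of Grilly.

```python
import numpy as np


def check_tensor(name, x, expected_shape=None, dtype=np.float32):
    """Raise ValueError with a descriptive message if x violates basic invariants."""
    arr = np.asarray(x)
    if arr.dtype != dtype:
        raise ValueError(f"{name}: dtype {arr.dtype}, expected {np.dtype(dtype)}")
    if expected_shape is not None and arr.shape != tuple(expected_shape):
        raise ValueError(f"{name}: shape {arr.shape}, expected {tuple(expected_shape)}")
    finite = np.isfinite(arr)
    if not finite.all():
        bad = int(arr.size - np.count_nonzero(finite))
        raise ValueError(f"{name}: {bad} non-finite value(s) found")
    return arr


# Example: validate an input before handing it to an operator.
batch = np.zeros((2, 3), dtype=np.float32)
check_tensor("batch", batch, expected_shape=(2, 3))
```

Including the tensor name in the message makes it easier to tell which boundary first produced a NaN or Inf.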

Testing workflow

Useful commands:

pytest -q
pytest tests/experimental -q
pytest tests/test_integration_vulkan.py -q

To build the docs with warnings treated as errors:

uv run --with-requirements docs/requirements.txt sphinx-build -b html docs docs/_build/html -W

Reproducibility tips

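Use stable hash utilities for deterministic seed derivation, so the same inputs yield the same seed across runs and machines. Python's built-in hash() is not suitable for strings because it is salted per process by default.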
Save checkpoint artifacts for long ingestion/training flows.
Keep environment variables and driver versions tracked in experiment logs.
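
As a sketch of the first and third tips, the snippet below derives a seed from SHA-256 and collects a small environment snapshot for an experiment log. It uses only the standard library. derive_seed and environment_snapshot are hypothetical helpers rather than Grilly utilities, and the GRILLY_ and VK_ prefixes are assumptions about which variables matter in your setup. Driver versions are not captured here and would need to be recorded separately, for example from the output of a tool such as vulkaninfo.

```python
import hashlib
import json
import os
import platform
import sys


def derive_seed(*parts, bits=32):
    """Derive a deterministic seed from string-able parts using SHA-256."""
    payload = "\x1f".join(str(p) for p in parts).encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def environment_snapshot(prefixes=("GRILLY_", "VK_")):
    """Collect interpreter, platform, and selected environment variables."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # The prefixes are assumed; adjust them to the variables you rely on.
        "env": {k: v for k, v in sorted(os.environ.items()) if k.startswith(prefixes)},
    }


seed = derive_seed("experiment-a", "fold", 3)
print(json.dumps({"seed": seed, "environment": environment_snapshot()}, indent=2))
```

Writing the snapshot next to each checkpoint keeps the seed, environment, and saved artifacts tied together when a run has to be reproduced later.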