grilly.scripts.ingest_svc
SVC ingestion (fast, streaming).
- The original ingestion flow made two full passes:
1. InstantLanguage.ingest_svc(…)
2. CognitiveController.ingest_svc(…) (which calls language.ingest_svc again)
That doubled the encoding work and made large JSONL files feel like they “hang”. This script ingests once through CognitiveController, streaming the JSONL in configurable chunks.
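The chunked streaming described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `iter_jsonl` and `iter_chunks` are hypothetical helper names, and the handoff of each chunk to CognitiveController is omitted because its call signature is not shown here.

```python
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Optional


def iter_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def iter_chunks(
    records: Iterable[Dict[str, Any]],
    chunk_size: int,
    max_records: Optional[int] = None,
) -> Iterator[List[Dict[str, Any]]]:
    """Group records into lists of at most chunk_size items.

    Only one chunk is held in memory at a time, so the file is never
    loaded whole. max_records, if given, caps the total number read.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(records)
    if max_records is not None:
        it = islice(it, max_records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```

Because both helpers are generators, an open file object can be passed straight to `iter_jsonl`, and each chunk can be ingested before the next one is read.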
- Usage:
python scripts/ingest_svc.py -f datasets/_data/svc_training_merged.jsonl
python scripts/ingest_svc.py -f … --max 50000 --chunk 4096 --no-templates
python scripts/ingest_svc.py -f … --no-ngrams  # much faster vocab build
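For orientation, the flags in the usage lines could be declared with argparse roughly as below. This is a sketch under assumptions, not the script's real parser: the `dest` names, the help strings, and the default chunk size are guesses (4096 is simply the value used in the example above).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags shown in the usage lines."""
    parser = argparse.ArgumentParser(description="SVC ingestion (fast, streaming).")
    # Path to the JSONL file; the dest name "file" is an assumption.
    parser.add_argument("-f", dest="file", required=True, help="input JSONL file")
    # Optional cap on the number of records; dest name is an assumption.
    parser.add_argument("--max", dest="max_records", type=int, default=None,
                        help="maximum number of records to ingest")
    # Records per streamed chunk; the default here is assumed, not documented.
    parser.add_argument("--chunk", type=int, default=4096,
                        help="records per chunk")
    parser.add_argument("--no-templates", action="store_true",
                        help="skip templates")
    parser.add_argument("--no-ngrams", action="store_true",
                        help="skip n-grams (much faster vocab build)")
    return parser
```

Calling `build_parser().parse_args()` on the second usage line would yield `max_records=50000`, `chunk=4096`, and `no_templates=True`.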
Functions
main() | Run main.
Classes
Path | PurePath subclass that can make system calls.
- grilly.scripts.ingest_svc._fmt_rate(n, dt)
Run fmt rate.
Dependencies: None detected from callable globals.
Variables: n (int, required); dt (float, required).
Usage Example
from grilly.scripts.ingest_svc import _fmt_rate
result = _fmt_rate(n=0, dt=0.0)
- Parameters
n (int) –
dt (float) –
- Return type
str
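The body of `_fmt_rate` is not shown here. Judging only from its name and its `(n: int, dt: float) -> str` signature, it likely formats a throughput figure for progress output. A hypothetical stand-in, named `fmt_rate_sketch` to make clear it is not the library function, might look like this; the output format and the zero-duration handling are both assumptions.

```python
def fmt_rate_sketch(n: int, dt: float) -> str:
    """Format n items processed over dt seconds as a short rate string.

    Hypothetical illustration only; the real _fmt_rate may differ.
    """
    if dt <= 0:
        # Avoid dividing by zero when no measurable time has elapsed.
        return f"{n} items in 0.00s"
    return f"{n} items in {dt:.2f}s ({n / dt:,.0f}/s)"
```

Guarding against `dt <= 0` matters because the documented usage example passes `dt=0.0`.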