grilly.scripts.ingest_svc

SVC ingestion (fast, streaming).

The original ingestion flow made two full passes:
  1. InstantLanguage.ingest_svc(…)
  2. CognitiveController.ingest_svc(…), which calls language.ingest_svc again

That doubles the encoding work and makes large JSONL files feel like they “hang”. This script ingests once through CognitiveController, streaming the JSONL in configurable chunks.
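The chunked streaming described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the helper name iter_jsonl_chunks and its parameters are hypothetical, and the call into CognitiveController is shown only as a comment because its signature is not documented here.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_jsonl_chunks(
    lines: Iterable[str],
    chunk_size: int = 4096,
    max_records: Optional[int] = None,
) -> Iterator[List[dict]]:
    """Yield lists of at most chunk_size parsed JSONL records (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    # Parse lazily, skipping blank lines, so the whole file is never held in memory.
    records = (json.loads(line) for line in lines if line.strip())
    if max_records is not None:
        records = islice(records, max_records)
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    sample = ['{"id": 1}\n', "\n", '{"id": 2}\n', '{"id": 3}\n']
    for chunk in iter_jsonl_chunks(sample, chunk_size=2):
        # A single ingestion call per chunk would go here (assumed, not shown).
        print(len(chunk))
```

An open file object can be passed as lines, since iterating a text file yields one line at a time. Only one chunk is held in memory at once, so progress can be reported after every chunk.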

Usage:

  python scripts/ingest_svc.py -f datasets/_data/svc_training_merged.jsonl
  python scripts/ingest_svc.py -f … --max 50000 --chunk 4096 --no-templates
  python scripts/ingest_svc.py -f … --no-ngrams  # much faster vocab build
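For orientation, the flags in the usage lines could be declared with argparse roughly as below. The flag names come from the usage lines. The destination names, types, defaults, and help strings are assumptions for illustration, and the script's real parser may differ.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage lines (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="SVC ingestion (fast, streaming).")
    parser.add_argument("-f", dest="file", type=Path, required=True,
                        help="path to the JSONL file to ingest")
    # Assumed meaning: stop after this many records; None means no limit.
    parser.add_argument("--max", dest="max_records", type=int, default=None)
    # Assumed default; 4096 is simply the value used in the usage example.
    parser.add_argument("--chunk", type=int, default=4096)
    parser.add_argument("--no-templates", action="store_true")
    parser.add_argument("--no-ngrams", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-f", "data.jsonl", "--max", "50000", "--chunk", "4096", "--no-templates"]
)
```

argparse converts the dashes in --no-templates and --no-ngrams to underscores, so the parsed values are read as args.no_templates and args.no_ngrams.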

Functions

main()

Command-line entry point: parse the arguments and run the single-pass streaming ingestion.

Classes

Path(*args, **kwargs)

PurePath subclass that can make system calls (pathlib.Path, imported by this module rather than defined in it).

grilly.scripts.ingest_svc._fmt_rate(n, dt)[source]

Format a throughput rate as a string from an item count n and an elapsed time dt.

Dependencies: None detected from callable globals.

Variables: n (int, required); dt (float, required).

Usage Example

from grilly.scripts.ingest_svc import _fmt_rate

result = _fmt_rate(n=1000, dt=2.0)

Parameters
  • n (int)
  • dt (float)

Return type

str
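The implementation of _fmt_rate is not shown on this page. As a rough illustration only, a helper with the same signature might look like the sketch below. The name fmt_rate_sketch, the output format, and the handling of a zero elapsed time are all assumptions.

```python
def fmt_rate_sketch(n: int, dt: float) -> str:
    """Return a human-readable rate string (hypothetical stand-in for _fmt_rate)."""
    if dt <= 0:
        # Avoid dividing by zero when no measurable time has elapsed.
        return f"{n} items"
    return f"{n} items in {dt:.1f}s ({n / dt:.0f}/s)"


print(fmt_rate_sketch(1000, 2.0))  # 1000 items in 2.0s (500/s)
```

Guarding against dt <= 0 matters in a progress reporter, because the first report can fire before the timer has advanced.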

grilly.scripts.ingest_svc.main()[source]

Command-line entry point: parse the arguments and run the single-pass streaming ingestion.

Dependencies: argparse, pathlib, time.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.scripts.ingest_svc import main

main()  # reads its options from the command line; returns None

Return type

None