grilly.datasets.validate_svc

SVC Dataset Validator

Validates instruct and conversation JSONL files for:
  • Schema completeness (10 fields, correct types)
  • SVC rule correctness (imperative promotion, normal subjects)
  • Field distributions (realm, complexity, POS/deps)
  • Data quality (text length, verb presence)
  • Cross-file consistency (source field matches file)

Functions

count_lines(filepath)

Count total lines in file.

main()

Command-line entry point for the validator.

sample_lines(filepath, n)

Reservoir sample n lines from a JSONL file.

validate_data_quality(entry)

Check data quality constraints.

validate_file(filepath, expected_source, ...)

Validate a single JSONL file and print report.

validate_schema(entry, expected_fields)

Check that all expected fields are present with correct types.

validate_source_consistency(entry, ...)

Check source field matches the file it came from.

validate_svc_rules(entry)

Check SVC extraction rules.

Classes

Counter([iterable])

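Dict subclass for counting hashable items. Imported from the standard library (collections), not defined in this module.

Path(*args, **kwargs)

PurePath subclass that can make system calls. Imported from the standard library (pathlib), not defined in this module.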

grilly.datasets.validate_svc.sample_lines(filepath, n)[source]

Reservoir sample n lines from a JSONL file.

Dependencies: json, random.

Variables: filepath (pathlib.Path, required); n (int, required).

Usage Example

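from pathlib import Path
from grilly.datasets.validate_svc import sample_lines

# "instruct.jsonl" is a placeholder path; substitute your own JSONL file.
samples = sample_lines(filepath=Path("instruct.jsonl"), n=5)

Parameters
  • filepath (Path) – Path to the JSONL file to sample from.

  • n (int) – Number of lines to sample.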

Return type

list[dict]
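Reservoir sampling draws a fixed-size uniform sample in a single pass, without knowing the line count in advance. The block below is a minimal sketch of the classic technique (Algorithm R), not the module's actual implementation; the name reservoir_sample_jsonl, the seed parameter, and the choice to skip blank lines are assumptions made for illustration.

```python
import json
import random
from pathlib import Path
from typing import Optional


def reservoir_sample_jsonl(filepath: Path, n: int, seed: Optional[int] = None) -> list[dict]:
    """Return up to n parsed JSON objects chosen uniformly at random (Algorithm R)."""
    rng = random.Random(seed)  # hypothetical seed parameter, for reproducible samples
    reservoir: list[dict] = []
    seen = 0  # number of non-blank lines read so far
    with filepath.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # assumption: blank lines are not records
            seen += 1
            if len(reservoir) < n:
                # Fill the reservoir with the first n records.
                reservoir.append(json.loads(line))
            else:
                # Keep the new record with probability n / seen.
                j = rng.randrange(seen)
                if j < n:
                    reservoir[j] = json.loads(line)
    return reservoir
```

Memory use is bounded by n parsed records, which is why this approach suits JSONL files too large to load whole.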

grilly.datasets.validate_svc.count_lines(filepath)[source]

Count total lines in file.

Dependencies: None detected from callable globals.

Variables: filepath (pathlib.Path, required).

Usage Example

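from pathlib import Path
from grilly.datasets.validate_svc import count_lines

# "instruct.jsonl" is a placeholder path; substitute your own file.
total = count_lines(filepath=Path("instruct.jsonl"))

Parameters

filepath (Path) – Path to the file whose lines are counted.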

Return type

int
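One common way to count lines without loading the file into memory is to iterate over the open file handle. The sketch below shows that approach; count_lines_sketch is a hypothetical name and this is not necessarily how the module does it.

```python
from pathlib import Path


def count_lines_sketch(filepath: Path) -> int:
    """Count lines by streaming the file, one line at a time."""
    # Binary mode avoids decoding errors and is enough for counting newlines.
    with filepath.open("rb") as handle:
        return sum(1 for _ in handle)
```

A final line without a trailing newline is still counted, because file iteration yields it as a line.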

grilly.datasets.validate_svc.validate_schema(entry, expected_fields)[source]

Check that all expected fields are present with correct types.

Dependencies: None detected from callable globals.

Variables: entry (dict, required); expected_fields (set, required).

Usage Example

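from grilly.datasets.validate_svc import validate_schema

# Placeholder entry and field names; pass the full set of fields your dataset requires.
issues = validate_schema(entry={"source": "instruct"}, expected_fields={"source", "text"})

Parameters
  • entry (dict) – A parsed JSONL record.

  • expected_fields (set) – Field names the entry must contain.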

Return type

list[str]
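A presence-and-type check of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: check_schema is a hypothetical helper, and it takes a mapping from field name to expected type, whereas validate_schema receives a set of field names and decides the expected types itself.

```python
def check_schema(entry: dict, expected_types: dict[str, type]) -> list[str]:
    """Return one message per missing or wrongly typed field."""
    errors: list[str] = []
    for name, expected in expected_types.items():
        if name not in entry:
            errors.append(f"missing field: {name}")
        elif not isinstance(entry[name], expected):
            actual = type(entry[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    return errors


# Field names here are placeholders, not the dataset's real schema.
print(check_schema({"text": "hi", "source": 3}, {"text": str, "source": str, "realm": str}))
```

Returning a list of messages instead of raising lets a caller collect every problem in an entry in one pass; an empty list means the entry passed.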

grilly.datasets.validate_svc.validate_svc_rules(entry)[source]

Check SVC extraction rules (imperative promotion, normal subjects).

Dependencies: None detected from callable globals.

Variables: entry (dict, required).

Usage Example

from grilly.datasets.validate_svc import validate_svc_rules

result = validate_svc_rules(entry={})
Parameters

entry (dict) – A parsed JSONL record to check.

Return type

list[str]

grilly.datasets.validate_svc.validate_data_quality(entry)[source]

Check data quality constraints (text length, verb presence).

Dependencies: None detected from callable globals.

Variables: entry (dict, required).

Usage Example

from grilly.datasets.validate_svc import validate_data_quality

result = validate_data_quality(entry={})
Parameters

entry (dict) – A parsed JSONL record to check.

Return type

list[str]

grilly.datasets.validate_svc.validate_source_consistency(entry, expected_source)[source]

Check source field matches the file it came from.

Dependencies: None detected from callable globals.

Variables: entry (dict, required); expected_source (str, required).

Usage Example

from grilly.datasets.validate_svc import validate_source_consistency

result = validate_source_consistency(entry={}, expected_source='example')
Parameters
  • entry (dict) – A parsed JSONL record.

  • expected_source (str) – Source value expected for entries in the file the record came from.

Return type

list[str]
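The cross-file consistency check reduces to comparing one field against the value expected for the file. A minimal sketch, assuming the field is named "source" and that labels such as "instruct" and "conversation" are used (both are assumptions based on the module summary); check_source is a hypothetical name:

```python
def check_source(entry: dict, expected_source: str) -> list[str]:
    """Return a one-item error list if the entry's source does not match."""
    actual = entry.get("source")  # None when the field is absent
    if actual != expected_source:
        return [f"source mismatch: expected {expected_source!r}, got {actual!r}"]
    return []


print(check_source({"source": "conversation"}, "instruct"))
```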

grilly.datasets.validate_svc.validate_file(filepath, expected_source, expected_fields)[source]

Validate a single JSONL file and print report.

Dependencies: collections, random.

Variables: filepath (pathlib.Path, required); expected_source (str, required); expected_fields (set, required).

Usage Example

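from pathlib import Path
from grilly.datasets.validate_svc import validate_file

# Placeholder path, source label, and field names; substitute your own.
result = validate_file(filepath=Path("instruct.jsonl"), expected_source="instruct", expected_fields={"source", "text"})

Parameters
  • filepath (Path) – Path to the JSONL file to validate.

  • expected_source (str) – Source value every entry in the file is expected to carry.

  • expected_fields (set) – Field names every entry must contain.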
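A file-level validator typically runs per-entry checks over every record and tallies the resulting messages before printing a report. The sketch below shows one way to structure that with collections.Counter; summarize_file, print_report, and the report layout are hypothetical and are not taken from the module.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Callable

Check = Callable[[dict], list[str]]


def summarize_file(filepath: Path, checks: list[Check]) -> Counter:
    """Run each check on every JSONL entry and tally the error messages."""
    tally: Counter = Counter()
    with filepath.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # assumption: blank lines are ignored
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                tally["invalid JSON"] += 1
                continue
            for check in checks:
                tally.update(check(entry))
    return tally


def print_report(filepath: Path, tally: Counter) -> None:
    """Print error messages, most frequent first."""
    print(f"Report for {filepath.name}")
    if not tally:
        print("  no issues found")
    for message, count in tally.most_common():
        print(f"  {count:>6}  {message}")
```

Passing the checks as a list of callables keeps the file loop independent of any particular rule, so schema, SVC, quality, and source checks can be combined freely.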

grilly.datasets.validate_svc.main()[source]

Command-line entry point for the validator.

Dependencies: pathlib, sys.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.datasets.validate_svc import main

result = main()