grilly.datasets.validate_svc
SVC Dataset Validator
Validates instruct and conversation JSONL files for:
- Schema completeness (10 fields, correct types)
- SVC rule correctness (imperative promotion, normal subjects)
- Field distributions (realm, complexity, POS/deps)
- Data quality (text length, verb presence)
- Cross-file consistency (source field matches file)
Functions
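- count_lines(filepath) – Count total lines in file.
- Autogenerated reference for
- sample_lines(filepath, n) – Reservoir sample n lines from a JSONL file.
- validate_data_quality(entry) – Check data quality constraints.
- validate_file(filepath, expected_source, expected_fields) – Validate a single JSONL file and print a report.
- validate_schema(entry, expected_fields) – Check that all expected fields are present with correct types.
- validate_source_consistency(entry, expected_source) – Check that the source field matches the file it came from.
- validate_svc_rules(entry) – Check SVC extraction rules.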
Classes
- Counter – Dict subclass for counting hashable items.
- Path – PurePath subclass that can make system calls.
- grilly.datasets.validate_svc.sample_lines(filepath, n)
Reservoir sample n lines from a JSONL file.
Dependencies: json, random.
Variables: filepath (pathlib.Path, required); n (int, required).
Usage Example
from pathlib import Path
from grilly.datasets.validate_svc import sample_lines
result = sample_lines(filepath=Path("instruct.jsonl"), n=5)  # "instruct.jsonl" is a placeholder path
- Parameters
filepath (Path) – JSONL file to sample from.
n (int) – Number of lines to sample.
- Return type
list[dict]
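Reservoir sampling draws a uniform sample of n records in a single pass, without loading the whole file or knowing its length in advance. The sketch below is a hypothetical stand-alone version (`reservoir_sample_jsonl` and its `seed` parameter are not part of grilly) using the classic Algorithm R; it assumes blank lines should be skipped.

```python
import json
import random
from pathlib import Path
from typing import Optional


def reservoir_sample_jsonl(filepath: Path, n: int, seed: Optional[int] = None) -> list[dict]:
    """Hypothetical sketch: uniformly sample up to n JSONL records in one pass (Algorithm R)."""
    rng = random.Random(seed)  # seeded RNG for reproducible samples (assumed extra parameter)
    reservoir: list[dict] = []
    seen = 0  # number of non-blank lines read so far
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            seen += 1
            if len(reservoir) < n:
                # Fill the reservoir with the first n records.
                reservoir.append(json.loads(line))
            else:
                # Keep the seen-th record with probability n / seen,
                # overwriting a uniformly chosen slot.
                j = rng.randrange(seen)
                if j < n:
                    reservoir[j] = json.loads(line)
    return reservoir
```

Only records that enter the reservoir are parsed, so lines that are skipped cost little more than a read. When the file holds fewer than n records, every record is returned in file order.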
- grilly.datasets.validate_svc.count_lines(filepath)
Count the total number of lines in the file.
Dependencies: None detected from the callable's globals.
Variables: filepath (pathlib.Path, required).
Usage Example
from pathlib import Path
from grilly.datasets.validate_svc import count_lines
result = count_lines(filepath=Path("instruct.jsonl"))  # "instruct.jsonl" is a placeholder path
- Parameters
filepath (Path) – File whose lines are counted.
- Return type
int
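Counting lines by iterating the file object keeps memory use flat even for large JSONL files. A minimal sketch, assuming a final line without a trailing newline still counts as a line; `count_lines_streaming` is a hypothetical name, not grilly's code.

```python
from pathlib import Path


def count_lines_streaming(filepath: Path) -> int:
    """Hypothetical sketch: count lines by iterating the file lazily in binary mode."""
    # Binary mode skips text decoding; iteration yields one line at a time,
    # including a final line that lacks a trailing newline.
    with open(filepath, "rb") as f:
        return sum(1 for _ in f)
```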
- grilly.datasets.validate_svc.validate_schema(entry, expected_fields)
Check that all expected fields are present with correct types.
Dependencies: None detected from the callable's globals.
Variables: entry (dict, required); expected_fields (set, required).
Usage Example
from grilly.datasets.validate_svc import validate_schema
result = validate_schema(entry={}, expected_fields={"source"})  # "source" is one of the expected fields
- Parameters
entry (dict) – A single record parsed from a JSONL line.
expected_fields (set) – Names of the fields the record must contain.
- Return type
list[str]
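A schema check of this kind compares the record's keys with the expected set, then verifies each present value's type. The sketch below assumes a hypothetical `FIELD_TYPES` mapping (the real ten field names and types are not listed on this page) and returns a list[str] that it treats as error messages; `check_schema` is a hypothetical name.

```python
from typing import Any

# Hypothetical field -> type mapping; the real ten-field schema is not listed on this page.
FIELD_TYPES: dict[str, type] = {"text": str, "source": str, "complexity": int}


def check_schema(entry: dict[str, Any], expected_fields: set[str]) -> list[str]:
    """Hypothetical sketch: report missing fields and fields with the wrong type."""
    errors: list[str] = []
    present = set(entry)
    for field in sorted(expected_fields - present):
        errors.append(f"missing field: {field}")
    for field in sorted(expected_fields & present):
        expected_type = FIELD_TYPES.get(field)
        if expected_type is None:
            continue  # no type constraint known for this field
        value = entry[field]
        # bool is a subclass of int, so reject it explicitly for int-typed fields.
        wrong = not isinstance(value, expected_type) or (
            expected_type is int and isinstance(value, bool)
        )
        if wrong:
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    return errors
```

Sorting the field names keeps the error list deterministic, which makes reports easy to diff between runs.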
- grilly.datasets.validate_svc.validate_svc_rules(entry)
Check SVC extraction rules.
Dependencies: None detected from the callable's globals.
Variables: entry (dict, required).
Usage Example
from grilly.datasets.validate_svc import validate_svc_rules
result = validate_svc_rules(entry={})
- Parameters
entry (dict) – A single record parsed from a JSONL line.
- Return type
list[str]
- grilly.datasets.validate_svc.validate_data_quality(entry)
Check data quality constraints.
Dependencies: None detected from the callable's globals.
Variables: entry (dict, required).
Usage Example
from grilly.datasets.validate_svc import validate_data_quality
result = validate_data_quality(entry={})
- Parameters
entry (dict) – A single record parsed from a JSONL line.
- Return type
list[str]
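The module description lists text length and verb presence as the data-quality constraints. A minimal sketch of such a check, assuming hypothetical field names `text` and `verb` and hypothetical length bounds; the real field names and thresholds used by grilly are not stated here, and `check_quality` is a hypothetical name.

```python
from typing import Any

# Hypothetical bounds on stripped text length, in characters.
MIN_TEXT_CHARS = 3
MAX_TEXT_CHARS = 2000


def check_quality(entry: dict[str, Any]) -> list[str]:
    """Hypothetical sketch: flag text outside the length bounds and a missing verb."""
    errors: list[str] = []
    text = entry.get("text")
    if not isinstance(text, str):
        text = ""  # treat a missing or non-string text field as empty
    length = len(text.strip())
    if length < MIN_TEXT_CHARS:
        errors.append(f"text too short ({length} chars)")
    elif length > MAX_TEXT_CHARS:
        errors.append(f"text too long ({length} chars)")
    verb = entry.get("verb")
    if not isinstance(verb, str) or not verb.strip():
        errors.append("no verb extracted")
    return errors
```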
- grilly.datasets.validate_svc.validate_source_consistency(entry, expected_source)
Check that the source field matches the file it came from.
Dependencies: None detected from the callable's globals.
Variables: entry (dict, required); expected_source (str, required).
Usage Example
from grilly.datasets.validate_svc import validate_source_consistency
result = validate_source_consistency(entry={}, expected_source='example')
- Parameters
entry (dict) – A single record parsed from a JSONL line.
expected_source (str) – Source value expected for records in this file.
- Return type
list[str]
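The cross-file consistency check amounts to comparing a record's `source` value with the source the caller expects for the file being read. A minimal sketch, assuming a missing field counts as a mismatch; `check_source` is a hypothetical name, not grilly's code.

```python
from typing import Any


def check_source(entry: dict[str, Any], expected_source: str) -> list[str]:
    """Hypothetical sketch: flag records whose source field disagrees with their file."""
    actual = entry.get("source")  # None when the field is absent
    if actual != expected_source:
        return [f"source mismatch: expected {expected_source!r}, got {actual!r}"]
    return []
```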
- grilly.datasets.validate_svc.validate_file(filepath, expected_source, expected_fields)
Validate a single JSONL file and print a report.
Dependencies: collections, random.
Variables: filepath (pathlib.Path, required); expected_source (str, required); expected_fields (set, required).
Usage Example
from pathlib import Path
from grilly.datasets.validate_svc import validate_file
result = validate_file(filepath=Path("instruct.jsonl"), expected_source='example', expected_fields={"source"})  # placeholder path and field set
- Parameters
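filepath (Path) – JSONL file to validate.
expected_source (str) – Value the source field of every record should have.
expected_fields (set) – Names of the fields every record must contain.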
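A file-level validator ties the per-entry checks together and summarizes the results. The documented dependencies (collections, random) suggest tallying with `Counter` and possibly sampling records; the sketch below instead scans every record. It assumes validators that each return a list of error strings, and it tracks the distribution of one field (`realm` comes from the module description). `summarize_file` and its parameters are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Callable

Validator = Callable[[dict[str, Any]], list[str]]


def summarize_file(filepath: Path, validators: list[Validator], dist_field: str = "realm") -> dict[str, Any]:
    """Hypothetical sketch: run validators over every record, tally errors and a field distribution."""
    error_counts: Counter = Counter()  # error message -> occurrences
    distribution: Counter = Counter()  # value of dist_field -> occurrences
    total = bad = 0
    with open(filepath, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            total += 1
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                error_counts["invalid JSON"] += 1
                bad += 1
                continue
            distribution[entry.get(dist_field)] += 1
            errors = [e for check in validators for e in check(entry)]
            error_counts.update(errors)
            if errors:
                bad += 1
    # Print a short report: totals, then the ten most frequent errors.
    print(f"{filepath.name}: {total} records, {bad} with errors")
    for message, count in error_counts.most_common(10):
        print(f"  {count:>6}  {message}")
    return {"total": total, "bad": bad, "errors": error_counts, "distribution": distribution}
```

Returning the counters alongside the printed report lets a caller compare distributions across the instruct and conversation files.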