grilly.datasets.clean_conversations

Conversations SVC Cleaner

Cleans the conversations_svc_semantic.jsonl file by removing:

- Leaked filenames (*.py, *.pt, *.txt, etc.)
- File paths (Unix/Windows)
- Code artifacts (&&, backticks, arrows, shell commands)
- Technical numbers (checkpoint IDs, dimension specs)
- Leaked project names (AURA, STDP, etc.)
- Entries that become too short or invalid after cleaning

Output: conversations_svc_cleaned.jsonl
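The removal rules listed above can be approximated with a handful of regular expressions. The following is a minimal sketch under stated assumptions, not the module's actual implementation: every pattern, the name sketch_clean_text, and the choice to replace matches with a space are hypothetical, and shell-command and checkpoint-ID rules are not covered.

```python
import re

# Hypothetical patterns for illustration; the real module's rules may differ.
WINDOWS_PATH_RE = re.compile(r"\b[A-Za-z]:\\(?:[\w.\-]+\\?)+")
UNIX_PATH_RE = re.compile(r"(?:/[\w.\-]+){2,}/?")
FILENAME_RE = re.compile(r"\b[\w\-]+\.(?:py|pt|txt|json|jsonl)\b")
DIMENSION_RE = re.compile(r"\b\d+\s*[xX]\s*\d+\b")
CODE_ARTIFACT_RE = re.compile(r"&&|`|->|=>")
PROJECT_NAME_RE = re.compile(r"\b(?:AURA|STDP)\b")
WHITESPACE_RE = re.compile(r"\s+")

# Paths come first so that filenames embedded in a path are removed with it.
RULES = (
    WINDOWS_PATH_RE,
    UNIX_PATH_RE,
    FILENAME_RE,
    DIMENSION_RE,
    CODE_ARTIFACT_RE,
    PROJECT_NAME_RE,
)


def sketch_clean_text(text: str) -> str:
    """Replace every rule match with a space, then collapse whitespace."""
    for pattern in RULES:
        text = pattern.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()
```

Replacing matches with a space rather than an empty string keeps neighbouring words from fusing together; the final whitespace pass then collapses the gaps.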

Functions

clean_svc(svc)

Clean SVC fields.

clean_text(text)

Apply all cleaning rules to a text string.

is_valid_after_cleaning(entry)

Check if an entry is still valid after cleaning.

main()

Script entry point. Judging from the module description, it presumably cleans conversations_svc_semantic.jsonl and writes conversations_svc_cleaned.jsonl.

Classes

Counter([iterable])

Dict subclass for counting hashable items.

Path(*args, **kwargs)

PurePath subclass that can make system calls.

grilly.datasets.clean_conversations.clean_text(text)[source]

Apply all cleaning rules to a text string.

Dependencies: re.

Variables: text (str, required).

Usage Example

from grilly.datasets.clean_conversations import clean_text

result = clean_text(text='example')

Parameters

text (str) – The text string to clean.

Return type

str

grilly.datasets.clean_conversations.clean_svc(svc)[source]

Clean SVC fields.

Dependencies: None detected from callable globals.

Variables: svc (dict, required).

Usage Example

from grilly.datasets.clean_conversations import clean_svc

result = clean_svc(svc={})

Parameters

svc (dict) – The SVC fields to clean.

Return type

dict
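The reference does not say which keys an SVC dict holds or how each field is cleaned. As a rough sketch only, assuming the fields are string values that each go through a text cleaner, the idea could look like the following. The name sketch_clean_svc and the cleaner parameter are hypothetical and are not part of the grilly API.

```python
from typing import Callable


def sketch_clean_svc(svc: dict, cleaner: Callable[[str], str]) -> dict:
    """Return a copy of svc with every string value passed through cleaner.

    Non-string values are copied unchanged; the input dict is not mutated.
    """
    return {
        key: cleaner(value) if isinstance(value, str) else value
        for key, value in svc.items()
    }
```

For instance, sketch_clean_svc({"s": " cat "}, str.strip) returns {"s": "cat"}; the key "s" is an illustrative placeholder.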

grilly.datasets.clean_conversations.is_valid_after_cleaning(entry)[source]

Check if an entry is still valid after cleaning.

Dependencies: None detected from callable globals.

Variables: entry (dict, required).

Usage Example

from grilly.datasets.clean_conversations import is_valid_after_cleaning

result = is_valid_after_cleaning(entry={})

Parameters

entry (dict) – The cleaned entry to validate.

Return type

bool
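The module description mentions dropping entries that become too short or invalid after cleaning, but the exact criteria are not documented. A minimal sketch of such a check, assuming a "text" field and length thresholds that are entirely hypothetical, might be:

```python
def sketch_is_valid(entry: dict, min_chars: int = 10, min_words: int = 3) -> bool:
    """Return True if entry["text"] is a string that meets both length thresholds.

    The "text" key and the default thresholds are illustrative assumptions.
    """
    text = entry.get("text")
    if not isinstance(text, str):
        return False
    stripped = text.strip()
    return len(stripped) >= min_chars and len(stripped.split()) >= min_words
```

Checking both character and word counts guards against entries that are long only because of a single leftover token.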

grilly.datasets.clean_conversations.main()[source]

Script entry point. Judging from the module description, it presumably cleans conversations_svc_semantic.jsonl and writes conversations_svc_cleaned.jsonl.

Dependencies: collections, json, pathlib, re, sys.

Variables: This callable does not take explicit input variables.

Usage Example

from grilly.datasets.clean_conversations import main

result = main()
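The body of main() is not documented here. Given the listed dependencies (collections, json, pathlib) and the input and output files named in the module description, a JSONL cleaning pass could be sketched as below. The function sketch_run, its parameters, and the counter keys are assumptions made for illustration, not the actual implementation.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Callable


def sketch_run(
    src: Path,
    dst: Path,
    clean: Callable[[dict], dict],
    is_valid: Callable[[dict], bool],
) -> Counter:
    """Stream src line by line, clean each JSON entry, and write survivors to dst."""
    stats = Counter()
    with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # blank lines are skipped and not counted
            stats["read"] += 1
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                stats["malformed"] += 1
                continue
            cleaned = clean(entry)
            if not is_valid(cleaned):
                stats["dropped"] += 1
                continue
            fout.write(json.dumps(cleaned, ensure_ascii=False) + "\n")
            stats["kept"] += 1
    return stats
```

Returning the Counter makes it easy to report how many entries were kept, dropped, or malformed after a run.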