Example Workflows
This page shows the two primary end-user workflows:
- Parse a
natural_languagerule, compile the DSL candidate, and preview it against one row. - Upload a dataset, generate a target dataset, poll the resulting
job, and download generated artifacts.
Start the local stack first with Quick Start, then set:
export BASE_URL=http://127.0.0.1:8000
Parse, Compile, Preview
Use this workflow when you want to inspect rule behavior before full dataset generation.
Parse a Natural-Language Rule
POST /rules/parse accepts either top-level rule fields or one rule embedded
in a schema row. For schema-embedded input, the target column is inferred from
the schema row name.
curl -s "$BASE_URL/rules/parse" \
-H "Content-Type: application/json" \
-d '{
"table_name": "employees",
"schema": [
{"name": "salary", "type": "FLOAT", "nullable": false, "source": "syngen"},
{"name": "job_level", "type": "INT", "nullable": false, "source": "syngen"},
{
"name": "bonus",
"type": "FLOAT",
"nullable": true,
"source": "rule",
"source_text": "If job_level is 5 or higher, set bonus to 10 percent of salary.",
"source_type": "natural_language"
}
]
}'
Important response fields include:
dsl_candidate: the translated DSL expression. Treat it as untrusted until compilation succeeds.diagnostics: structured feedback for parsing, translation, and validation.prompt_auditandprompt_audits: audit metadata for translation attempts.metrics: LLM request metrics when a real translation backend is used.explainability_trace: trace data connecting input, translation, and compiler behavior.
Supported schema row source_type values are natural_language, dsl, and
domain_specific_language.
Compile the DSL Candidate
Use the parse response dsl_candidate, or submit a DSL expression directly.
The compile step validates the expression and returns a persisted
compiled_rule artifact.
COMPILE_RESPONSE="$(
curl -s "$BASE_URL/rules/compile" \
-H "Content-Type: application/json" \
--data-binary @- <<'EOF'
{
"expression": "0.1 * col('salary') if col('job_level') >= 5 else 0",
"target_column": "bonus"
}
EOF
)"
export ARTIFACT_ID="$(echo "$COMPILE_RESPONSE" | jq -r '.artifact_id')"
echo "ARTIFACT_ID=$ARTIFACT_ID"
Save the returned artifact_id; the preview endpoint can use it without
resending the expression.
Preview Against One Row
POST /rules/preview runs the compiled rule with a sample row and seed. Local
preview supports row-phase helpers only; aggregate helpers such as group_sum
and group_count are for dataset generation.
curl -s "$BASE_URL/rules/preview" \
-H "Content-Type: application/json" \
--data-binary @- <<EOF
{
"artifact_id": "$ARTIFACT_ID",
"row": {
"salary": 120000,
"job_level": 6
},
"seed": 99
}
EOF
Key response fields are value, execution_mode, and diagnostics.
Upload, Generate, Poll, Download
Use this workflow when you want to apply rule-generated columns across a dataset.
Upload a Source File
POST /datasets/uploads stages a CSV or JSON file and returns a file_id.
UPLOAD_RESPONSE="$(
curl -s "$BASE_URL/datasets/uploads" \
-F "file=@samples/orders.csv;type=text/csv"
)"
export FILE_ID="$(echo "$UPLOAD_RESPONSE" | jq -r '.file_id')"
echo "FILE_ID=$FILE_ID"
The upload response includes file_id, format, row_count, and columns.
Submit a Generation Job
POST /datasets/generate creates a tracked generation job. Exactly one of
base_rows or file_id must be supplied. When file_id is used, the service
derives row_count from the uploaded file, so the request must not include
row_count.
GENERATE_RESPONSE="$(
curl -s "$BASE_URL/datasets/generate" \
-H "Content-Type: application/json" \
--data-binary @- <<EOF
{
"file_id": "$FILE_ID",
"schema": [
{"name": "order_id", "type": "STRING", "nullable": false, "source": "syngen"},
{"name": "line_amount", "type": "INT", "nullable": false, "source": "syngen"},
{
"name": "order_total",
"type": "INT",
"nullable": true,
"source": "rule",
"source_text": "group_sum(key=col(\"order_id\"), value=col(\"line_amount\"))",
"source_type": "domain_specific_language"
}
],
"seed": 17
}
EOF
)"
export JOB_ID="$(echo "$GENERATE_RESPONSE" | jq -r '.job_id')"
echo "JOB_ID=$JOB_ID"
The response is metadata-only. It includes job_id, status,
planned_column_sources, llm_metrics when natural-language translation is
used, and diagnostics.
Poll the Job
curl -s "$BASE_URL/jobs/$JOB_ID"
Poll until status is succeeded or failed. A succeeded job includes:
result.output_path: generated dataset path on therulesgenhost.artifacts: dataset, manifest, diagnostics, and execution-log metadata.diagnostics: execution-path diagnostics.llm_metrics: translation metrics when natural-language rules were used.
The job response remains metadata-only; download endpoints retrieve file contents.
Download Generated Output
Download the generated dataset:
curl -s "$BASE_URL/jobs/$JOB_ID/dataset" -o generated_rows.json
Download a specific stored artifact from the same job:
export ARTIFACT_ID="$(
curl -s "$BASE_URL/jobs/$JOB_ID" \
| jq -r '.artifacts[] | select(.kind == "input_manifest") | .artifact_id' \
| head -n 1
)"
echo "ARTIFACT_ID=$ARTIFACT_ID"
curl -s "$BASE_URL/jobs/$JOB_ID/artifacts/$ARTIFACT_ID" -o artifact.bin
By default, generated files are written under the configured local OSSFS root.
In the default local configuration that root is .rulesgen-data/ossfs/.
Backend Behavior
Dataset generation uses the backend configured by RULESGEN_SANDBOX_BACKEND:
subprocess: runs the shared dataset runner in a child Python process and stores manifests and outputs under the local OSSFS root.opensandbox: uploads the same manifest contract to an Alibaba OpenSandbox-managed container and downloads generated output back to the local OSSFS root.
See Run Modes for local and OpenSandbox deployment choices.