Example Workflows

This page shows the two primary end-user workflows:

Parse a natural-language rule, compile the DSL candidate, and preview it against one row.
Upload a dataset, generate a target dataset, poll the resulting job, and download generated artifacts.

Start the local stack first with Quick Start, then set:

export BASE_URL=http://127.0.0.1:8000

Parse, Compile, Preview

Use this workflow when you want to inspect rule behavior before full dataset generation.

Parse a Natural-Language Rule

POST /rules/parse accepts either top-level rule fields or one rule embedded in a schema row. For schema-embedded input, the target column is inferred from the schema row name.

curl -s "$BASE_URL/rules/parse" \
  -H "Content-Type: application/json" \
  -d '{
    "table_name": "employees",
    "schema": [
      {"name": "salary", "type": "FLOAT", "nullable": false, "source": "syngen"},
      {"name": "job_level", "type": "INT", "nullable": false, "source": "syngen"},
      {
        "name": "bonus",
        "type": "FLOAT",
        "nullable": true,
        "source": "rule",
        "source_text": "If job_level is 5 or higher, set bonus to 10 percent of salary.",
        "source_type": "natural_language"
      }
    ]
  }'

Important response fields include:

dsl_candidate: the translated DSL expression. Treat it as untrusted until compilation succeeds.
diagnostics: structured feedback for parsing, translation, and validation.
prompt_audit and prompt_audits: audit metadata for translation attempts.
metrics: LLM request metrics when a real translation backend is used.
explainability_trace: trace data connecting input, translation, and compiler behavior.

Supported schema row source_type values are natural_language, dsl, and domain_specific_language.
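To carry the parse result into the compile step, the candidate can be pulled out of the response with jq. The sketch below assumes dsl_candidate is a top-level string field of the parse response, which this page implies but does not spell out. extract_dsl_candidate is a hypothetical helper name, not part of the service.

```shell
# Hypothetical helper: read a /rules/parse response on stdin and print
# its dsl_candidate. Assumes dsl_candidate is a top-level field.
# With -e, jq exits non-zero when the field is null or missing, so a
# failed translation is not silently passed on to the compile step.
extract_dsl_candidate() {
  jq -er '.dsl_candidate // empty'
}

# Usage (requires the running service and a saved parse response):
#   DSL="$(echo "$PARSE_RESPONSE" | extract_dsl_candidate)" || echo "no candidate" >&2
```

The non-zero exit is a deliberate choice here: the candidate is untrusted until it compiles, so an absent candidate should stop a script rather than produce an empty expression.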

Compile the DSL Candidate

Use the parse response dsl_candidate, or submit a DSL expression directly. The compile step validates the expression and returns a persisted compiled_rule artifact.

COMPILE_RESPONSE="$(
  curl -s "$BASE_URL/rules/compile" \
    -H "Content-Type: application/json" \
    --data-binary @- <<'EOF'
{
  "expression": "0.1 * col('salary') if col('job_level') >= 5 else 0",
  "target_column": "bonus"
}
EOF
)"

export ARTIFACT_ID="$(echo "$COMPILE_RESPONSE" | jq -r '.artifact_id')"
echo "ARTIFACT_ID=$ARTIFACT_ID"

Save the returned artifact_id; the preview endpoint can use it without resending the expression.
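When the expression comes from a variable rather than a literal heredoc, quoting becomes fragile because DSL expressions contain both single and double quotes. One possible approach is to let jq build the request body. The field names expression and target_column come from the example above; build_compile_payload is a hypothetical helper name.

```shell
# Hypothetical helper: build a /rules/compile request body.
# $1 = DSL expression, $2 = target column.
# jq --arg passes both values as JSON strings, so quotes inside the
# expression are escaped correctly.
build_compile_payload() {
  jq -n --arg expression "$1" --arg target_column "$2" \
    '{expression: $expression, target_column: $target_column}'
}

# Usage (requires the running service):
#   build_compile_payload "$DSL" bonus \
#     | curl -s "$BASE_URL/rules/compile" \
#         -H "Content-Type: application/json" --data-binary @-
```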

Preview Against One Row

POST /rules/preview runs the compiled rule with a sample row and seed. Local preview supports row-phase helpers only; aggregate helpers such as group_sum and group_count are for dataset generation.

curl -s "$BASE_URL/rules/preview" \
  -H "Content-Type: application/json" \
  --data-binary @- <<EOF
{
  "artifact_id": "$ARTIFACT_ID",
  "row": {
    "salary": 120000,
    "job_level": 6
  },
  "seed": 99
}
EOF

Key response fields are value, execution_mode, and diagnostics.
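As a sanity check on the preview result, the example rule can be worked out by hand: job_level 6 meets the threshold of 5, so the bonus is 0.1 × 120000 = 12000. The sketch below re-implements that one rule locally with awk. It is not how the service evaluates rules, and expected_bonus is a hypothetical name. How the service formats the number in value is not stated on this page.

```shell
# Local re-implementation of the example rule, for comparison only:
#   0.1 * salary if job_level >= 5 else 0
# $1 = salary, $2 = job_level
expected_bonus() {
  awk -v salary="$1" -v level="$2" \
    'BEGIN { if (level + 0 >= 5) printf "%g\n", 0.1 * salary; else print 0 }'
}

# expected_bonus 120000 6   prints 12000
# expected_bonus 120000 4   prints 0
```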

Upload, Generate, Poll, Download

Use this workflow when you want to apply rule-generated columns across a dataset.

Upload a Source File

POST /datasets/uploads stages a CSV or JSON file and returns a file_id.

UPLOAD_RESPONSE="$(
  curl -s "$BASE_URL/datasets/uploads" \
    -F "file=@samples/orders.csv;type=text/csv"
)"

export FILE_ID="$(echo "$UPLOAD_RESPONSE" | jq -r '.file_id')"
echo "FILE_ID=$FILE_ID"

The upload response includes file_id, format, row_count, and columns.
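Because jq -r prints the literal string null when a field is absent, a failed upload can leave FILE_ID set to null without any visible error. A small guard such as the following may help. require_id is a hypothetical helper name, and the same check applies equally to ARTIFACT_ID and JOB_ID.

```shell
# Hypothetical guard: fail when an ID extracted with `jq -r` is empty
# or the literal string "null" (what jq prints for a missing field).
# $1 = variable name used in the message, $2 = extracted value
require_id() {
  if [ -z "$2" ] || [ "$2" = "null" ]; then
    echo "error: $1 is missing from the response" >&2
    return 1
  fi
}

# Usage:
#   require_id FILE_ID "$FILE_ID" || exit 1
```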

Submit a Generation Job

POST /datasets/generate creates a tracked generation job. Exactly one of base_rows or file_id must be supplied. When file_id is used, the service derives row_count from the uploaded file, so the request must not include row_count.

GENERATE_RESPONSE="$(
  curl -s "$BASE_URL/datasets/generate" \
    -H "Content-Type: application/json" \
    --data-binary @- <<EOF
{
  "file_id": "$FILE_ID",
  "schema": [
    {"name": "order_id", "type": "STRING", "nullable": false, "source": "syngen"},
    {"name": "line_amount", "type": "INT", "nullable": false, "source": "syngen"},
    {
      "name": "order_total",
      "type": "INT",
      "nullable": true,
      "source": "rule",
      "source_text": "group_sum(key=col(\"order_id\"), value=col(\"line_amount\"))",
      "source_type": "domain_specific_language"
    }
  ],
  "seed": 17
}
EOF
)"

export JOB_ID="$(echo "$GENERATE_RESPONSE" | jq -r '.job_id')"
echo "JOB_ID=$JOB_ID"

The response is metadata-only. It includes job_id, status, planned_column_sources, and diagnostics, plus llm_metrics when natural-language translation is used.
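The request constraints described above (exactly one of base_rows or file_id, and no row_count alongside file_id) can be checked locally before submitting. This is a client-side sketch only: validate_generate_payload is a hypothetical name, and the check looks at key presence without validating the values, so the service may still reject a payload that passes.

```shell
# Hypothetical pre-flight check for a /datasets/generate request body
# read from stdin. Exits 0 when:
#   - exactly one of base_rows / file_id is present, and
#   - row_count is not combined with file_id.
validate_generate_payload() {
  jq -e '
    ((has("base_rows") | if . then 1 else 0 end)
     + (has("file_id") | if . then 1 else 0 end)) == 1
    and ((has("file_id") and has("row_count")) | not)
  ' >/dev/null
}

# Usage:
#   echo "$PAYLOAD" | validate_generate_payload || echo "invalid request" >&2
```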

Poll the Job

curl -s "$BASE_URL/jobs/$JOB_ID"

Poll until status is succeeded or failed. A succeeded job includes:

result.output_path: generated dataset path on the rulesgen host.
artifacts: dataset, manifest, diagnostics, and execution-log metadata.
diagnostics: execution-path diagnostics.
llm_metrics: translation metrics when natural-language rules were used.

The job response remains metadata-only; download endpoints retrieve file contents.
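Polling by hand works for short jobs; for scripts, a bounded loop is one option. The sketch below stops on the two terminal statuses named above and gives up after a fixed number of attempts. The function names, the 2-second default interval, and the 30-attempt limit are all illustrative assumptions, as is reading status from the top level of the job response.

```shell
# Hypothetical helper: print the status string for a job.
# Assumes status is a top-level field of GET /jobs/<id>.
fetch_job_status() {
  curl -s "$BASE_URL/jobs/$1" | jq -r '.status'
}

# poll_job JOB_ID [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Prints the final status. Returns 0 on succeeded, 1 on failed,
# and 2 if no terminal status was seen within MAX_ATTEMPTS.
poll_job() {
  local job_id="$1" interval="${2:-2}" max_attempts="${3:-30}"
  local attempt=1 job_status
  while [ "$attempt" -le "$max_attempts" ]; do
    job_status="$(fetch_job_status "$job_id")"
    case "$job_status" in
      succeeded) echo "$job_status"; return 0 ;;
      failed)    echo "$job_status"; return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}

# Usage (requires the running service):
#   poll_job "$JOB_ID" 2 60 && echo "ready to download"
```

Keeping the HTTP call in its own function makes the loop easy to exercise without a running service, since fetch_job_status can be replaced with a stub.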

Download Generated Output

Download the generated dataset:

curl -s "$BASE_URL/jobs/$JOB_ID/dataset" -o generated_rows.json

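Download a specific stored artifact from the same job. Note that this reassigns ARTIFACT_ID, replacing the compiled-rule ID exported in the first workflow: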

export ARTIFACT_ID="$(
  curl -s "$BASE_URL/jobs/$JOB_ID" \
    | jq -r '.artifacts[] | select(.kind == "input_manifest") | .artifact_id' \
    | head -n 1
)"
echo "ARTIFACT_ID=$ARTIFACT_ID"

curl -s "$BASE_URL/jobs/$JOB_ID/artifacts/$ARTIFACT_ID" -o artifact.bin
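Before selecting an artifact by kind, it can help to see which kinds a job actually stored. The sketch below relies only on the kind and artifact_id fields used in the jq filter above; list_artifacts and artifact_id_for_kind are hypothetical helper names.

```shell
# Hypothetical helpers operating on a GET /jobs/<id> response on stdin.

# Print one "kind<TAB>artifact_id" line per stored artifact.
list_artifacts() {
  jq -r '.artifacts[] | [.kind, .artifact_id] | @tsv'
}

# Print the artifact_id of the first artifact with the given kind.
# Prints nothing when no artifact of that kind exists.
artifact_id_for_kind() {
  jq -r --arg kind "$1" \
    'first(.artifacts[] | select(.kind == $kind) | .artifact_id)'
}

# Usage (requires the running service):
#   curl -s "$BASE_URL/jobs/$JOB_ID" | list_artifacts
#   curl -s "$BASE_URL/jobs/$JOB_ID" | artifact_id_for_kind input_manifest
```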

By default, generated files are written under the configured local OSSFS root. In the default local configuration that root is .rulesgen-data/ossfs/.

Backend Behavior

Dataset generation uses the backend configured by RULESGEN_SANDBOX_BACKEND:

subprocess: runs the shared dataset runner in a child Python process and stores manifests and outputs under the local OSSFS root.
opensandbox: uploads the same manifest contract to an Alibaba OpenSandbox-managed container and downloads generated output back to the local OSSFS root.

See Run Modes for local and OpenSandbox deployment choices.
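A typo in the backend name is easy to miss, so a launch script might check the variable first. The sketch below accepts only the two values documented on this page. Whether the service applies a default when the variable is unset is not stated here, so the check treats an empty value as an error. check_backend is a hypothetical name.

```shell
# Hypothetical check: accept only the backend values documented above.
# $1 = value of RULESGEN_SANDBOX_BACKEND
check_backend() {
  case "$1" in
    subprocess|opensandbox) return 0 ;;
    *)
      echo "unrecognized RULESGEN_SANDBOX_BACKEND: '$1'" >&2
      return 1
      ;;
  esac
}

# Usage, before starting the service:
#   export RULESGEN_SANDBOX_BACKEND=subprocess
#   check_backend "$RULESGEN_SANDBOX_BACKEND" || exit 1
```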
