Configuration
rulesgen uses Pydantic settings with the RULESGEN_ environment-variable
prefix. In host-run mode, settings are loaded from .env and the shell
environment. In Docker Compose mode, settings come from compose.yaml,
compose.opensandbox.yaml, and the shell environment.
See the repository .env.example for a local template.
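The prefix convention can be illustrated with a minimal standard-library sketch. It is not the rulesgen settings class: it assumes the common convention that stripping the RULESGEN_ prefix and lowercasing the rest gives the field name. The helper name collect_prefixed and the sample values are hypothetical.

```python
from typing import Dict, Mapping

PREFIX = "RULESGEN_"


def collect_prefixed(env: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, str]:
    """Return only the prefixed variables, keyed by lowercase field name."""
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


# Illustrative environment; the values are assumptions, not documented defaults.
example_env = {
    "RULESGEN_LOG_LEVEL": "debug",
    "RULESGEN_DOCS_ENABLED": "true",
    "PATH": "/usr/bin",
}
settings = collect_prefixed(example_env)
print(settings)
```

Passing os.environ instead of example_env would show which RULESGEN_ variables the current shell exports.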
Application Settings
Core service settings:
- RULESGEN_APP_NAME
- RULESGEN_APP_VERSION
- RULESGEN_ENV
- RULESGEN_LOG_LEVEL
- RULESGEN_DOCS_ENABLED
- RULESGEN_PROBLEM_BASE_URL
RULESGEN_DOCS_ENABLED=true enables the built-in OpenAPI pages at /docs,
/redoc, and /openapi.json, plus the versioned docs endpoints under
/v1/docs and /v1/openapi.json.
Auth and HTTP Edge
Authentication is disabled by default for local evaluation:
RULESGEN_AUTH_ENABLED=false
RULESGEN_API_KEY=change-me
When RULESGEN_AUTH_ENABLED=true, callers provide the API key through the
X-API-Key header.
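A client might attach the key as in the sketch below, which only builds the request and does not send it. The host and port are assumptions for a local run, the build_request helper is hypothetical, and the key is the placeholder value shown above.

```python
from typing import Optional
from urllib.request import Request


def build_request(url: str, api_key: Optional[str] = None) -> Request:
    """Build a GET request, adding X-API-Key only when a key is given."""
    request = Request(url)
    if api_key is not None:
        request.add_header("X-API-Key", api_key)
    return request


# Assumed local address; /v1/openapi.json is served when docs are enabled.
request = build_request("http://localhost:8000/v1/openapi.json", api_key="change-me")

# urllib stores header names capitalized as "X-api-key"; HTTP header names
# are case-insensitive, so the server still sees the X-API-Key header.
print(request.header_items())
```

Sending the request would be a call to urllib.request.urlopen(request) against a running service.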
HTTP edge settings:
- RULESGEN_CORS_ALLOW_ORIGINS
- RULESGEN_TRUSTED_HOSTS
Both can be supplied as comma-separated values or JSON-style lists.
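The two accepted forms can be illustrated with a small parser. This is a sketch of the idea, not the parsing code rulesgen uses; the parse_list name and the sample hosts are hypothetical.

```python
import json
from typing import List


def parse_list(raw: str) -> List[str]:
    """Accept either a JSON-style list or comma-separated values."""
    text = raw.strip()
    if not text:
        return []
    if text.startswith("["):
        # JSON-style form, for example ["localhost", "127.0.0.1"]
        return [str(item) for item in json.loads(text)]
    # Comma-separated form; surrounding whitespace and empty items are dropped.
    return [part.strip() for part in text.split(",") if part.strip()]


print(parse_list("https://a.example, https://b.example"))
print(parse_list('["localhost", "127.0.0.1"]'))
```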
DSL Limits
The compiler validates DSL expressions against size and depth limits:
- RULESGEN_DSL_MAX_LENGTH
- RULESGEN_DSL_MAX_DEPTH
- RULESGEN_DSL_MAX_NODES
These limits bound the work the parser and validator do when rule input is untrusted.
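The sketch below shows how length, depth, and node limits might be checked together. The DSL's real syntax tree is not described here, so the nested-list tree, the DslLimits class, and the limit values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass(frozen=True)
class DslLimits:
    max_length: int
    max_depth: int
    max_nodes: int


def measure(node: Any, depth: int = 1) -> Tuple[int, int]:
    """Return (deepest level, node count) for a nested-list expression tree."""
    if not isinstance(node, list):
        return depth, 1
    deepest, count = depth, 1
    for child in node:
        child_depth, child_count = measure(child, depth + 1)
        deepest = max(deepest, child_depth)
        count += child_count
    return deepest, count


def check_limits(source: str, tree: Any, limits: DslLimits) -> List[str]:
    """List which limits the expression exceeds; an empty list means it passes."""
    errors = []
    if len(source) > limits.max_length:
        errors.append("length")
    depth, nodes = measure(tree)
    if depth > limits.max_depth:
        errors.append("depth")
    if nodes > limits.max_nodes:
        errors.append("nodes")
    return errors


# Hypothetical expression and its hand-built tree.
source = "age > 18 and country == 'DE'"
tree = ["and", ["gt", "age", 18], ["eq", "country", "DE"]]
print(check_limits(source, tree, DslLimits(max_length=100, max_depth=2, max_nodes=50)))
```

Checking the raw length first is cheap and avoids building a tree for oversized input at all.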
Local Storage
Generated files, uploads, rules, jobs, artifacts, audits, and semantic-cache data are local runtime outputs. Keep them out of source control.
Storage settings:
- RULESGEN_DATA_DIR
- RULESGEN_RULES_REPOSITORY_DIR
- RULESGEN_JOBS_REPOSITORY_DIR
- RULESGEN_ARTIFACTS_REPOSITORY_DIR
- RULESGEN_UPLOADS_REPOSITORY_DIR
- RULESGEN_AUDITS_REPOSITORY_DIR
- RULESGEN_OSSFS_ROOT_DIR
- RULESGEN_LLM_SEMANTIC_CACHE_DIR
The default local output tree is under .rulesgen-data/.
Execution Backend
Dataset generation uses RULESGEN_SANDBOX_BACKEND:
- subprocess: the default child-process dataset executor.
- opensandbox: the Alibaba OpenSandbox adapter.
Shared sandbox settings:
- RULESGEN_SANDBOX_BACKEND
- RULESGEN_SANDBOX_WORKSPACE_DIR
- RULESGEN_SANDBOX_TIMEOUT_SECONDS
- RULESGEN_SANDBOX_PYTHON_EXECUTABLE
OpenSandbox settings:
- RULESGEN_OPENSANDBOX_DOMAIN
- RULESGEN_OPENSANDBOX_PROTOCOL
- RULESGEN_OPENSANDBOX_API_KEY
- RULESGEN_OPENSANDBOX_REQUEST_TIMEOUT_SECONDS
- RULESGEN_OPENSANDBOX_USE_SERVER_PROXY
- RULESGEN_OPENSANDBOX_IMAGE
- RULESGEN_OPENSANDBOX_TTL_SECONDS
- RULESGEN_OPENSANDBOX_READY_TIMEOUT_SECONDS
- RULESGEN_OPENSANDBOX_WORKSPACE_DIR
See Run Modes for example combinations.
LLM Gateway
The LLM gateway translates natural_language input into a semantic_frame
and DSL candidate. Configure it with:
- RULESGEN_LLM_GATEWAY_BACKEND: stub, http, or litellm.
- RULESGEN_LLM_GATEWAY_URL: optional OpenAI-compatible gateway URL.
- RULESGEN_LLM_GATEWAY_TIMEOUT_SECONDS
- RULESGEN_LLM_PROMPT_TEMPLATE_VERSION
- RULESGEN_LLM_MODEL_NAME
- RULESGEN_LLM_TEMPERATURE
- RULESGEN_LLM_EXTRA_COMPLETION_PARAMS
- RULESGEN_LLM_FEEDBACK_MAX_ATTEMPTS
- RULESGEN_LLM_PROVIDER: auto, openai, anthropic, gemini, azure, or databricks.
Provider keys such as OPENAI_API_KEY, ANTHROPIC_API_KEY,
GEMINI_API_KEY, and AZURE_API_KEY are read by the provider SDKs or gateway
client. They are runtime credentials and must never be committed to source control.
RULESGEN_LLM_TEMPERATURE accepts null or an empty string to omit the
temperature parameter entirely. Use RULESGEN_LLM_EXTRA_COMPLETION_PARAMS for
model-specific JSON options such as maximum-token or reasoning controls.
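How these two settings might combine into a completion request is sketched below. This is not the gateway's actual code: the function names are hypothetical, "example-model" and "max_tokens" are illustrative values, and treating an unset variable like an empty one is an assumption.

```python
import json
from typing import Any, Dict, Optional


def parse_temperature(raw: Optional[str]) -> Optional[float]:
    """Map null or an empty string to None, meaning 'omit the parameter'."""
    if raw is None or raw.strip().lower() in ("", "null"):
        return None
    return float(raw)


def build_completion_params(
    model: str, temperature_raw: Optional[str], extra_raw: Optional[str]
) -> Dict[str, Any]:
    """Assemble request parameters, leaving temperature out when it is unset."""
    params: Dict[str, Any] = {"model": model}
    temperature = parse_temperature(temperature_raw)
    if temperature is not None:
        params["temperature"] = temperature
    if extra_raw:
        extra = json.loads(extra_raw)
        if not isinstance(extra, dict):
            raise ValueError("extra completion params must be a JSON object")
        params.update(extra)
    return params


print(build_completion_params("example-model", "", '{"max_tokens": 512}'))
print(build_completion_params("example-model", "0.2", None))
```

Omitting the key, rather than sending a null temperature, matters for models that reject the parameter outright.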
Databricks Gateway Settings
The Databricks gateway is selected by RULESGEN_LLM_PROVIDER=databricks, or
by provider auto-detection when Databricks runtime variables are present and
the Databricks extra is installed.
Databricks environment-variable name settings:
- RULESGEN_DATABRICKS_HOST_ENV_VAR
- RULESGEN_DATABRICKS_TOKEN_ENV_VAR
Their default values point to the standard Databricks SDK environment variable names. The credential values themselves are resolved by the Databricks SDK auth chain.
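These settings hold the name of another environment variable, not the credential itself. The sketch below illustrates only that indirection with a plain dictionary. In rulesgen the values are resolved by the Databricks SDK, not by code like this. The resolve_indirect helper, MY_WORKSPACE_HOST, and the example URL are hypothetical, and DATABRICKS_HOST is assumed as the default because it is the standard SDK variable name.

```python
from typing import Mapping, Optional


def resolve_indirect(
    env: Mapping[str, str], name_setting: str, default_name: str
) -> Optional[str]:
    """Look up the variable whose name is stored in another variable."""
    variable_name = env.get(name_setting, default_name)
    return env.get(variable_name)


# Hypothetical environment that redirects the host lookup to a custom name.
example_env = {
    "RULESGEN_DATABRICKS_HOST_ENV_VAR": "MY_WORKSPACE_HOST",
    "MY_WORKSPACE_HOST": "https://example.cloud.databricks.com",
}
host = resolve_indirect(example_env, "RULESGEN_DATABRICKS_HOST_ENV_VAR", "DATABRICKS_HOST")
print(host)
```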
See Databricks Models for setup examples.
Semantic Cache
Semantic-cache settings:
- RULESGEN_LLM_SEMANTIC_CACHE_ENABLED
- RULESGEN_LLM_SEMANTIC_CACHE_DIR
- RULESGEN_LLM_SEMANTIC_CACHE_SIMILARITY_THRESHOLD
- RULESGEN_LLM_SEMANTIC_CACHE_EMBEDDING_DIMENSION
Cache entries are scoped by prompt version, model, table, schema fingerprint, and requested targets.
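One way to picture this scoping is a key derived from all five fields, so that a change in any of them selects a different set of cache entries. The sketch below is an assumption about the idea, not the cache's actual key format. The cache_scope_key function, the sample values, and the choice to sort targets (making the key order-insensitive) are all hypothetical.

```python
import hashlib
import json
from typing import Iterable


def cache_scope_key(
    prompt_version: str,
    model: str,
    table: str,
    schema_fingerprint: str,
    targets: Iterable[str],
) -> str:
    """Hash the scope fields into one stable hexadecimal key."""
    payload = {
        "prompt_version": prompt_version,
        "model": model,
        "table": table,
        "schema_fingerprint": schema_fingerprint,
        "targets": sorted(targets),  # assumption: target order does not matter
    }
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


key_a = cache_scope_key("v1", "example-model", "orders", "abc123", ["dsl", "sql"])
key_b = cache_scope_key("v1", "example-model", "orders", "abc123", ["sql", "dsl"])
key_c = cache_scope_key("v2", "example-model", "orders", "abc123", ["dsl", "sql"])
print(key_a == key_b, key_a == key_c)
```

Under this scheme, bumping the prompt version or altering the table schema makes earlier entries unreachable without deleting them.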
Guardrails
Guardrails scan natural-language rule input before it reaches the LLM gateway.
Core settings:
- RULESGEN_GUARDRAILS_ENABLED
- RULESGEN_GUARDRAILS_BACKEND
- RULESGEN_GUARDRAILS_THRESHOLD
- RULESGEN_GUARDRAILS_MATCH_TYPE
- RULESGEN_GUARDRAILS_MODEL_CACHE_DIR
- RULESGEN_GUARDRAILS_MODEL_ID
- RULESGEN_GUARDRAILS_BLOCK_MESSAGE
HTTP scanner settings:
- RULESGEN_GUARDRAILS_HTTP_ENDPOINT
- RULESGEN_GUARDRAILS_HTTP_AUTH_MODE
- RULESGEN_GUARDRAILS_HTTP_AUTH_ENV_VAR
- RULESGEN_GUARDRAILS_HTTP_DATABRICKS_HOST_ENV_VAR
- RULESGEN_GUARDRAILS_HTTP_TIMEOUT_SECONDS
- RULESGEN_GUARDRAILS_HTTP_THRESHOLD
- RULESGEN_GUARDRAILS_HTTP_REQUEST_FIELD
- RULESGEN_GUARDRAILS_HTTP_RESPONSE_SCORE_PATH
See Safety Guardrails for backend behavior.
Local Example
This example runs local translation through the stub backend and uses the subprocess dataset executor.
export RULESGEN_LLM_GATEWAY_BACKEND=stub
export RULESGEN_SANDBOX_BACKEND=subprocess
export RULESGEN_DOCS_ENABLED=true