Configuration
Configure database connections, authentication, workers, clustering, and node roles in forge.toml.
The Code
[project]
name = "my-app"
[database]
url = "${DATABASE_URL}"
pool_size = 50
replica_urls = ["${DATABASE_REPLICA_URL}"]
[gateway]
port = 8080
[worker]
max_concurrent_jobs = 10
poll_interval_ms = 100
[auth]
jwt_algorithm = "RS256"
jwks_url = "https://www.googleapis.com/service_accounts/v1/jwk/securetoken@system.gserviceaccount.com"
jwt_issuer = "https://securetoken.google.com/my-project"
[node]
roles = ["gateway", "worker", "scheduler"]
worker_capabilities = ["general", "media"]
What Happens
Forge reads forge.toml at startup and substitutes environment variables. Each section configures a different subsystem. Sections you omit use sensible defaults.
Environment variables use ${VAR_NAME} syntax (uppercase letters, numbers, underscores). Default values are supported with ${VAR-default} or ${VAR:-default} syntax. Unset variables without defaults remain as literal strings.
Sections
[project]
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | "forge-app" | Project identifier |
| version | string | "0.1.0" | Project version |
[database]
| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | PostgreSQL connection URL |
| pool_size | u32 | 50 | Connection pool size |
| pool_timeout_secs | u64 | 30 | Pool checkout timeout |
| statement_timeout_secs | u64 | 30 | Query timeout |
| replica_urls | string[] | [] | Read replica URLs |
| read_from_replica | bool | false | Route reads to replicas |
[database]
url = "${DATABASE_URL}"
During development, forge dev runs PostgreSQL via Docker Compose. For production, provide a DATABASE_URL pointing to your PostgreSQL instance.
Read Replicas
[database]
url = "${DATABASE_URL}"
replica_urls = [
"${DATABASE_REPLICA_1}",
"${DATABASE_REPLICA_2}"
]
read_from_replica = true
Queries route to healthy replicas via round-robin. Mutations always use the primary. A background monitor pings each replica every 15 seconds and removes unhealthy ones from rotation. If all replicas fail, reads fall back to primary.
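The routing rule above can be sketched in a few lines. This is an illustrative model, not Forge's actual API: reads rotate round-robin over replicas the health monitor has marked healthy, and fall back to the primary when no replica qualifies.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Sketch only: type and method names are hypothetical.
struct ReadRouter {
    primary: String,
    replicas: Vec<String>,
    healthy: Vec<AtomicBool>, // flipped by the background health monitor
    cursor: AtomicUsize,
}

impl ReadRouter {
    fn new(primary: &str, replicas: &[&str]) -> Self {
        Self {
            primary: primary.to_string(),
            replicas: replicas.iter().map(|r| r.to_string()).collect(),
            healthy: replicas.iter().map(|_| AtomicBool::new(true)).collect(),
            cursor: AtomicUsize::new(0),
        }
    }

    fn pick_read_url(&self) -> &str {
        let n = self.replicas.len();
        // Try at most n slots, skipping replicas the monitor marked down.
        for _ in 0..n {
            let i = self.cursor.fetch_add(1, Ordering::Relaxed) % n;
            if self.healthy[i].load(Ordering::Relaxed) {
                return &self.replicas[i];
            }
        }
        &self.primary // no healthy replica (or none configured): use the primary
    }
}
```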
Queries that need read-after-write consistency can use #[forge::query(consistent)] to bypass replicas and read directly from the primary.
Pool Isolation (Bulkhead)
Separate connection pools prevent runaway workloads from starving others:
[database]
url = "${DATABASE_URL}"
pool_size = 50
[database.pools.default]
size = 30
timeout_secs = 30
[database.pools.jobs]
size = 15
timeout_secs = 60
statement_timeout_secs = 300
[database.pools.analytics]
size = 5
timeout_secs = 120
statement_timeout_secs = 600
[database.pools.observability]
size = 3
timeout_secs = 5
statement_timeout_secs = 10
Available pool names and what uses them:
| Pool | Used By |
|---|---|
| default | Queries, mutations, rate limiter, reactor, cluster coordination |
| jobs | Job workers, cron runners, daemon processes, workflow executors |
| analytics | Available via db.analytics_pool() for user code |
| observability | Internal metrics collection (pool utilization, slow query tracking) |
Without pool isolation configured, everything shares the primary pool. With it configured, a spike in background job processing cannot starve user-facing query connections. Each pool enforces independent connection limits, checkout timeouts, and statement timeouts.
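One way to picture the lookup is a map of named pools with a fallback. This is a sketch under assumptions, not Forge's API: the `usize` stands in for a real connection-pool handle, and the fall-back-to-default behavior for unconfigured names is an illustration of the "everything shares the primary pool" rule, not a documented guarantee.

```rust
use std::collections::HashMap;

// Hypothetical shape: named pools resolve by name; anything not
// explicitly configured falls back to the "default" pool.
struct Pools {
    by_name: HashMap<String, usize>, // usize = stand-in for a pool handle
}

impl Pools {
    fn get(&self, name: &str) -> Option<&usize> {
        self.by_name
            .get(name)
            .or_else(|| self.by_name.get("default"))
    }
}
```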
[gateway]
| Option | Type | Default | Description |
|---|---|---|---|
| port | u16 | 8080 | HTTP port |
| grpc_port | u16 | 9000 | Inter-node communication port |
| max_connections | usize | 4096 | Maximum concurrent connections |
| request_timeout_secs | u64 | 30 | Request timeout |
| cors_enabled | bool | false | Enable CORS handling |
| cors_origins | string[] | [] | Allowed CORS origins (use ["*"] for any) |
| quiet_routes | string[] | ["/_api/health", "/_api/ready"] | Routes excluded from traces, metrics, and logs |
[function]
Controls query and mutation execution limits.
| Option | Type | Default | Description |
|---|---|---|---|
| max_concurrent | usize | 1000 | Maximum concurrent function executions |
| timeout_secs | u64 | 30 | Function execution timeout |
| memory_limit | usize | 536870912 | Memory limit per function (bytes, 512 MiB) |
[function]
max_concurrent = 1000
timeout_secs = 30
memory_limit = 536870912 # 512 MiB
The memory limit is advisory. Functions exceeding this limit may be terminated. Set it appropriately for your workload.
[security]
Reserved security settings parsed from config for forward compatibility.
| Option | Type | Default | Description |
|---|---|---|---|
| secret_key | string | - | Reserved; currently not used by the runtime |
[security]
secret_key = "${FORGE_SECRET_KEY}"
Generate a secure key if you want to populate this value ahead of future runtime support:
openssl rand -base64 32
[auth]
| Option | Type | Default | Description |
|---|---|---|---|
| jwt_algorithm | string | "HS256" | Signing algorithm |
| jwt_secret | string | - | Secret for HMAC algorithms |
| jwks_url | string | - | JWKS endpoint for RSA algorithms |
| jwks_cache_ttl_secs | u64 | 3600 | Public key cache duration |
| jwt_issuer | string | - | Expected issuer (optional) |
| jwt_audience | string | - | Expected audience (optional) |
| token_expiry | string | - | Optional app-level convention; not applied automatically by ctx.issue_token() |
| session_ttl_secs | u64 | 604800 | Session TTL (7 days) |
HMAC (Symmetric)
[auth]
jwt_algorithm = "HS256" # or HS384, HS512
jwt_secret = "${JWT_SECRET}"
RSA with JWKS (Asymmetric)
[auth]
jwt_algorithm = "RS256" # or RS384, RS512
jwks_url = "https://your-provider.com/.well-known/jwks.json"
jwt_issuer = "https://your-provider.com"
jwt_audience = "your-app-id"
Common JWKS URLs:
| Provider | JWKS URL |
|---|---|
| Firebase | https://www.googleapis.com/service_accounts/v1/jwk/securetoken@system.gserviceaccount.com |
| Auth0 | https://YOUR_DOMAIN.auth0.com/.well-known/jwks.json |
| Clerk | https://YOUR_DOMAIN.clerk.accounts.dev/.well-known/jwks.json |
| Supabase | https://YOUR_PROJECT.supabase.co/auth/v1/jwks |
[mcp]
Controls Forge MCP server exposure on Streamable HTTP.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable MCP endpoint |
| path | string | "/mcp" | MCP endpoint path under /_api |
| session_ttl_secs | u64 | 3600 | MCP session lifetime |
| allowed_origins | string[] | [] | Allowed Origin values |
| require_protocol_version_header | bool | true | Require MCP-Protocol-Version header after initialize |
[mcp]
enabled = true
path = "/mcp"
session_ttl_secs = 3600
allowed_origins = ["https://your-app.example"]
require_protocol_version_header = true
With default API routing, path = "/mcp" resolves to /_api/mcp.
[worker]
| Option | Type | Default | Description |
|---|---|---|---|
| max_concurrent_jobs | usize | 50 | Concurrent job limit per worker |
| job_timeout_secs | u64 | 3600 | Default job timeout (1 hour) |
| poll_interval_ms | u64 | 100 | Queue polling interval |
Workers maintain a semaphore sized to max_concurrent_jobs. They only poll when permits are available. Backpressure propagates naturally.
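The permit accounting can be sketched as follows. This is illustrative, not Forge's code: the poll loop only fetches jobs while a permit is free, so a saturated worker simply stops asking the queue for more work.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical permit counter modeling the semaphore described above.
struct Permits {
    free: AtomicUsize,
}

impl Permits {
    fn new(max_concurrent_jobs: usize) -> Self {
        Self { free: AtomicUsize::new(max_concurrent_jobs) }
    }

    // Take a permit before polling; false means "skip this poll tick".
    fn try_acquire(&self) -> bool {
        self.free
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| n.checked_sub(1))
            .is_ok()
    }

    // Return the permit when the job finishes.
    fn release(&self) {
        self.free.fetch_add(1, Ordering::AcqRel);
    }
}
```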
[cluster]
| Option | Type | Default | Description |
|---|---|---|---|
| name | string | "default" | Cluster identifier |
| discovery | string | "postgres" | Discovery method; the current runtime only implements postgres |
| heartbeat_interval_secs | u64 | 5 | Heartbeat frequency |
| dead_threshold_secs | u64 | 15 | Missing heartbeats before dead |
| seed_nodes | string[] | [] | Static seed node addresses (for static discovery) |
| dns_name | string | - | DNS name for service discovery (for dns discovery) |
Discovery
Nodes register in the forge_nodes database table by default, so an external service is not required. The current runtime only implements Postgres-backed discovery; other configured discovery values are parsed but ignored with a warning.
[cluster]
discovery = "postgres"
[node]
| Option | Type | Default | Description |
|---|---|---|---|
| roles | string[] | all roles | Roles this node assumes |
| worker_capabilities | string[] | ["general"] | Job routing capabilities |
Node Roles
| Role | Responsibility |
|---|---|
| gateway | HTTP/gRPC endpoints, SSE subscriptions |
| function | Query and mutation execution |
| worker | Background job processing |
| scheduler | Cron scheduling, leader election |
Single-node deployment (default):
[node]
roles = ["gateway", "function", "worker", "scheduler"]
API-only node:
[node]
roles = ["gateway", "function"]
Worker-only node:
[node]
roles = ["worker"]
worker_capabilities = ["gpu", "ml"]
Scheduler node (singleton per cluster):
[node]
roles = ["scheduler"]
Multiple nodes can run the scheduler role. Advisory locks ensure only one is active; the others wait as standbys.
Worker Capabilities
Route jobs to specific workers:
# GPU worker
[node]
roles = ["worker"]
worker_capabilities = ["gpu"]
# General purpose worker
[node]
roles = ["worker"]
worker_capabilities = ["general", "media"]
Jobs requiring worker_capability = "gpu" only run on workers with that capability. Jobs without a capability requirement run on any worker.
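The routing rule above reduces to a small predicate. A sketch with illustrative names, not Forge's actual matching code: a job that declares a required capability only matches workers advertising it, and a job with no requirement matches any worker.

```rust
// Hypothetical predicate: does this worker qualify for this job?
fn can_run(job_capability: Option<&str>, worker_capabilities: &[&str]) -> bool {
    match job_capability {
        Some(required) => worker_capabilities.contains(&required),
        None => true, // no requirement: any worker may pick it up
    }
}
```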
[observability]
OTLP-based telemetry for traces, metrics, and logs. Disabled by default. When enabled, Forge auto-instruments HTTP requests, function calls, job execution, and database queries without any application code changes.
| Option | Type | Default | Description |
|---|---|---|---|
| enabled | bool | false | Enable OTLP telemetry export |
| otlp_endpoint | string | "http://localhost:4318" | OTLP collector endpoint (HTTP) |
| service_name | string | project name | Service name in telemetry data |
| enable_traces | bool | true | Export distributed traces |
| enable_metrics | bool | true | Export metrics |
| enable_logs | bool | true | Export logs via OTLP |
| sampling_ratio | f64 | 1.0 | Trace sampling ratio (0.0 to 1.0) |
| log_level | string | "info" | Log level for the tracing subscriber |
[observability]
enabled = true
otlp_endpoint = "http://localhost:4318"
sampling_ratio = 0.5
Requires an OTLP-compatible collector (Jaeger, Grafana Alloy, OpenTelemetry Collector, etc.).
What Gets Instrumented
With enabled = true, Forge automatically creates spans and records metrics for:
- HTTP requests (http.request span): method, route, status code, duration, trace ID, request ID
- Function calls (fn.execute span): function name, kind (query/mutation), duration
- Job execution (job.execute span): job ID, job type, duration, outcome (completed/retrying/failed/timeout)
- Database queries: operation, table, duration, connection pool utilization
Slow queries (over 500ms) emit a warning automatically. Database pool metrics (size, active, idle, waiting) are recorded every 15 seconds.
Routes listed in [gateway].quiet_routes are excluded from all telemetry. Health and readiness probes are excluded by default to avoid noise from Kubernetes liveness checks. Set quiet_routes = [] to monitor everything.
Console logs always work regardless of the enabled flag. The flag only controls OTLP export.
Patterns
Development
Development uses forge dev, which runs Docker Compose. The forge.toml in generated projects uses ${DATABASE_URL}, which is set by the Docker Compose environment:
[project]
name = "my-app"
[database]
url = "${DATABASE_URL}"
[gateway]
port = 8080
Production Single Node
[project]
name = "my-app"
[database]
url = "${DATABASE_URL}"
pool_size = 100
[gateway]
port = 8080
[auth]
jwt_algorithm = "RS256"
jwks_url = "${JWKS_URL}"
jwt_issuer = "${JWT_ISSUER}"
jwt_audience = "${JWT_AUDIENCE}"
[worker]
max_concurrent_jobs = 20
Production Multi-Node
API nodes:
[database]
url = "${DATABASE_URL}"
replica_urls = ["${DATABASE_REPLICA_URL}"]
read_from_replica = true
[database.pools.default]
size = 40
[node]
roles = ["gateway", "function"]
[cluster]
discovery = "postgres"
Worker nodes:
[database]
url = "${DATABASE_URL}"
[database.pools.jobs]
size = 30
statement_timeout_secs = 600
[node]
roles = ["worker"]
worker_capabilities = ["general"]
[worker]
max_concurrent_jobs = 25
[cluster]
discovery = "postgres"
Specialized Workers
GPU processing node:
[node]
roles = ["worker"]
worker_capabilities = ["gpu"]
[worker]
max_concurrent_jobs = 4 # GPU memory limits concurrency
job_timeout_secs = 7200 # 2 hours for training jobs
Under the Hood
Environment Variable Substitution
Variables match the pattern ${VAR_NAME}, where VAR_NAME starts with an uppercase letter or underscore followed by uppercase letters, digits, and underscores:
let re = Regex::new(r"\$\{([A-Z_][A-Z0-9_]*)\}")?;
Substitution happens at parse time. Unset variables remain as literal ${VAR_NAME} strings (useful for detecting misconfiguration).
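A std-only sketch approximating this pass, including the ${VAR-default} and ${VAR:-default} forms mentioned earlier (the real parser may differ in details): set variables substitute, unset variables with a default take the default, and unset variables without one stay literal.

```rust
use std::env;

// Illustrative implementation, not Forge's actual code.
fn substitute(input: &str) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let Some(end) = rest[start + 2..].find('}') else {
            out.push_str(&rest[start..]); // unterminated: keep literal
            return out;
        };
        let body = &rest[start + 2..start + 2 + end];
        // Split off a default after ":-" or "-", if present.
        let (name, default) = match (body.find(":-"), body.find('-')) {
            (Some(i), _) => (&body[..i], Some(&body[i + 2..])),
            (None, Some(i)) => (&body[..i], Some(&body[i + 1..])),
            (None, None) => (body, None),
        };
        let valid = name
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
            && name.starts_with(|c: char| c.is_ascii_uppercase() || c == '_');
        match (valid, env::var(name).ok(), default) {
            (true, Some(v), _) => out.push_str(&v),       // env value wins
            (true, None, Some(d)) => out.push_str(d),     // fall back to default
            _ => out.push_str(&rest[start..start + 2 + end + 1]), // keep literal
        }
        rest = &rest[start + 2 + end + 1..];
    }
    out.push_str(rest);
    out
}
```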
Bulkhead Isolation
Connection pools isolate workloads:
┌─────────────────────────────────────────────────┐
│ PostgreSQL │
└─────────────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ default │ │ jobs │ │analytics│
│ 30 conn │ │ 15 conn │ │ 5 conn │
│ 30s TO │ │ 300s TO │ │ 600s TO │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Queries │ │ Jobs │ │ Reports │
│Mutations│ │ │ │ │
└─────────┘ └─────────┘ └─────────┘
A runaway batch job cannot exhaust connections reserved for user requests.
Cluster Discovery
Nodes discover each other through PostgreSQL:
SELECT * FROM forge_nodes WHERE last_heartbeat > NOW() - INTERVAL '15s'
Nodes insert their address on startup, update it on each heartbeat, and are removed once dead_threshold_secs elapses without a heartbeat. Additional infrastructure is not required.
Node Role Enforcement
Roles determine which subsystems start:
if config.node.roles.contains(&NodeRole::Gateway) {
start_http_server(&config.gateway).await?;
}
if config.node.roles.contains(&NodeRole::Worker) {
start_job_worker(&config.worker).await?;
}
if config.node.roles.contains(&NodeRole::Scheduler) {
start_cron_scheduler().await?;
}
Omitted roles mean those subsystems never start. A Worker-only node never binds the HTTP port. A Gateway-only node never polls the job queue.