
Configuration

Control database connections, authentication, workers, clustering, and node roles from a single file.

The Code

[project]
name = "my-app"

[database]
url = "${DATABASE_URL}"
pool_size = 50
replica_urls = ["${DATABASE_REPLICA_URL}"]

[gateway]
port = 8080

[worker]
max_concurrent_jobs = 10
poll_interval_ms = 100

[auth]
jwt_algorithm = "RS256"
jwks_url = "https://www.googleapis.com/service_accounts/v1/jwk/securetoken@system.gserviceaccount.com"
jwt_issuer = "https://securetoken.google.com/my-project"

[node]
roles = ["Gateway", "Worker", "Scheduler"]
worker_capabilities = ["general", "media"]

What Happens

Forge reads forge.toml at startup and substitutes environment variables. Each section configures a different subsystem. Sections you omit use sensible defaults.

Environment variables use ${VAR_NAME} syntax (uppercase letters, numbers, underscores). Unset variables are left in place as the literal ${VAR_NAME} string.

Sections

[project]

| Option | Type | Default | Description |
|---|---|---|---|
| name | string | "forge-app" | Project identifier |
| version | string | "0.1.0" | Project version |

[database]

| Option | Type | Default | Description |
|---|---|---|---|
| url | string | - | PostgreSQL connection URL |
| embedded | bool | false | Use embedded PostgreSQL |
| data_dir | string | .forge/postgres | Data directory for embedded mode |
| pool_size | u32 | 50 | Connection pool size |
| pool_timeout_secs | u64 | 30 | Pool checkout timeout |
| statement_timeout_secs | u64 | 30 | Query timeout |
| replica_urls | string[] | [] | Read replica URLs |
| read_from_replica | bool | false | Route reads to replicas |

Embedded PostgreSQL

For development or small deployments, Forge bundles PostgreSQL:

[database]
embedded = true
data_dir = ".forge/data"

No Docker. No external database. Data persists in data_dir. Requires the embedded-db feature.

Read Replicas

[database]
url = "${DATABASE_URL}"
replica_urls = [
  "${DATABASE_REPLICA_1}",
  "${DATABASE_REPLICA_2}",
]
read_from_replica = true

Queries route to replicas via round-robin. Mutations always use the primary. If all replicas fail, reads fall back to primary.
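
The routing rule above can be sketched with little more than an atomic counter. This is an illustrative sketch only, not Forge's actual internals; the ReplicaSet type and its method names are invented for the example:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical replica router: reads rotate round-robin,
/// writes always hit the primary.
struct ReplicaSet {
    primary: String,
    replicas: Vec<String>,
    next: AtomicUsize,
}

impl ReplicaSet {
    /// Pick the next replica URL for a read; with no replicas
    /// configured, reads fall back to the primary.
    fn url_for_read(&self) -> &str {
        if self.replicas.is_empty() {
            return &self.primary;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % self.replicas.len();
        &self.replicas[i]
    }

    /// Mutations never touch replicas.
    fn url_for_write(&self) -> &str {
        &self.primary
    }
}
```

The same fallback path would also cover the failure case: if replica connections error out, the caller retries the read against url_for_write's target.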

Pool Isolation (Bulkhead)

Separate connection pools prevent runaway workloads from starving others:

[database]
url = "${DATABASE_URL}"
pool_size = 50

[database.pools.default]
size = 30
timeout_secs = 30

[database.pools.jobs]
size = 15
timeout_secs = 60
statement_timeout_secs = 300

[database.pools.analytics]
size = 5
timeout_secs = 120
statement_timeout_secs = 600

A slow analytics query exhausting 5 connections cannot touch the 30 connections reserved for user requests. Each pool has independent size limits and statement timeouts.
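
The isolation property reduces to independent per-pool counters. The hypothetical Bulkhead type below is a minimal sketch of that idea; a real pool would additionally wait up to timeout_secs before giving up on a checkout:

```rust
use std::sync::Mutex;

/// Minimal bulkhead sketch: each pool tracks its own in-use count,
/// so exhausting one pool never blocks checkouts from another.
struct Bulkhead {
    size: usize,
    in_use: Mutex<usize>,
}

impl Bulkhead {
    fn new(size: usize) -> Self {
        Bulkhead { size, in_use: Mutex::new(0) }
    }

    /// Try to check out a connection slot; false means the pool
    /// is at capacity (a real pool would queue behind timeout_secs).
    fn try_checkout(&self) -> bool {
        let mut n = self.in_use.lock().unwrap();
        if *n < self.size {
            *n += 1;
            true
        } else {
            false
        }
    }

    /// Return the slot when the query finishes.
    fn release(&self) {
        *self.in_use.lock().unwrap() -= 1;
    }
}
```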

[gateway]

| Option | Type | Default | Description |
|---|---|---|---|
| port | u16 | 8080 | HTTP port |
| grpc_port | u16 | 9000 | Inter-node communication port |
| max_connections | usize | 10000 | Maximum concurrent connections |
| request_timeout_secs | u64 | 30 | Request timeout |

[auth]

| Option | Type | Default | Description |
|---|---|---|---|
| jwt_algorithm | string | "HS256" | Signing algorithm |
| jwt_secret | string | - | Secret for HMAC algorithms |
| jwks_url | string | - | JWKS endpoint for RSA algorithms |
| jwks_cache_ttl_secs | u64 | 3600 | Public key cache duration |
| jwt_issuer | string | - | Expected issuer (optional) |
| jwt_audience | string | - | Expected audience (optional) |
| token_expiry | string | - | Token lifetime (e.g., "15m", "7d") |
| session_ttl_secs | u64 | 604800 | WebSocket session TTL (7 days) |

HMAC (Symmetric)

[auth]
jwt_algorithm = "HS256" # or HS384, HS512
jwt_secret = "${JWT_SECRET}"

RSA with JWKS (Asymmetric)

[auth]
jwt_algorithm = "RS256" # or RS384, RS512
jwks_url = "https://your-provider.com/.well-known/jwks.json"
jwt_issuer = "https://your-provider.com"
jwt_audience = "your-app-id"

Common JWKS URLs:

| Provider | JWKS URL |
|---|---|
| Firebase | https://www.googleapis.com/service_accounts/v1/jwk/securetoken@system.gserviceaccount.com |
| Auth0 | https://YOUR_DOMAIN.auth0.com/.well-known/jwks.json |
| Clerk | https://YOUR_DOMAIN.clerk.accounts.dev/.well-known/jwks.json |
| Supabase | https://YOUR_PROJECT.supabase.co/auth/v1/jwks |

[worker]

| Option | Type | Default | Description |
|---|---|---|---|
| max_concurrent_jobs | usize | 50 | Concurrent job limit per worker |
| job_timeout_secs | u64 | 3600 | Default job timeout (1 hour) |
| poll_interval_ms | u64 | 100 | Queue polling interval |

Workers maintain a semaphore sized to max_concurrent_jobs. They only poll when permits are available. Backpressure propagates naturally.
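
That poll gate can be sketched as a counting semaphore sized to max_concurrent_jobs. JobGate here is a simplified, invented stand-in; a production worker would park asynchronously on a permit rather than skip the poll tick:

```rust
use std::sync::Mutex;

/// Sketch of the worker's poll gate: the loop only asks the queue
/// for work when a permit is free, so a saturated worker stops
/// polling and backpressure propagates to the queue.
struct JobGate {
    permits: Mutex<usize>,
}

impl JobGate {
    fn new(max_concurrent_jobs: usize) -> Self {
        JobGate { permits: Mutex::new(max_concurrent_jobs) }
    }

    /// Take a permit before polling; None means "skip this tick".
    fn try_acquire(&self) -> Option<()> {
        let mut p = self.permits.lock().unwrap();
        if *p > 0 {
            *p -= 1;
            Some(())
        } else {
            None
        }
    }

    /// Return the permit when the job completes.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
    }
}
```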

[cluster]

| Option | Type | Default | Description |
|---|---|---|---|
| name | string | "default" | Cluster identifier |
| heartbeat_interval_secs | u64 | 5 | Heartbeat frequency |
| dead_threshold_secs | u64 | 15 | Seconds without a heartbeat before a node is considered dead |

Discovery

Nodes register in the forge_nodes database table. No external service required.

[cluster]
discovery = "postgres"

[node]

| Option | Type | Default | Description |
|---|---|---|---|
| roles | string[] | all roles | Roles this node assumes |
| worker_capabilities | string[] | ["general"] | Job routing capabilities |

Node Roles

| Role | Responsibility |
|---|---|
| Gateway | HTTP/gRPC endpoints, WebSocket connections |
| Function | Query and mutation execution |
| Worker | Background job processing |
| Scheduler | Cron scheduling, leader election |

Single-node deployment (default):

[node]
roles = ["Gateway", "Function", "Worker", "Scheduler"]

API-only node:

[node]
roles = ["Gateway", "Function"]

Worker-only node:

[node]
roles = ["Worker"]
worker_capabilities = ["gpu", "ml"]

Scheduler node (singleton per cluster):

[node]
roles = ["Scheduler"]

Multiple nodes can run Scheduler. Advisory locks ensure only one is active. Others wait as standbys.
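
Leader election of this shape is typically built on Postgres advisory locks. A hedged sketch of the idea; the lock key and column names are illustrative, not Forge's actual schema:

```sql
-- Each scheduler candidate tries to take a cluster-wide advisory lock.
-- Exactly one session wins; the others see 'false' and stay on standby.
SELECT pg_try_advisory_lock(42) AS is_leader;

-- The lock is session-scoped, so a crashed leader's lock is released
-- when its connection dies and a standby wins the next attempt.
```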

Worker Capabilities

Route jobs to specific workers:

# GPU worker
[node]
roles = ["Worker"]
worker_capabilities = ["gpu"]

# General purpose worker
[node]
roles = ["Worker"]
worker_capabilities = ["general", "media"]

Jobs requiring worker_capability = "gpu" only run on workers with that capability. Jobs without a capability requirement run on any worker.
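
The matching rule is small enough to state directly. This sketch assumes the rule as described above; the function name is illustrative:

```rust
/// A job with no required capability matches any worker; otherwise
/// the worker's capability list must contain the requirement.
fn worker_can_run(job_capability: Option<&str>, worker_capabilities: &[&str]) -> bool {
    match job_capability {
        None => true,
        Some(cap) => worker_capabilities.contains(&cap),
    }
}
```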

Patterns

Development

[project]
name = "my-app"

[database]
embedded = true

[gateway]
port = 3000

Production Single Node

[project]
name = "my-app"

[database]
url = "${DATABASE_URL}"
pool_size = 100

[gateway]
port = 8080

[auth]
jwt_algorithm = "RS256"
jwks_url = "${JWKS_URL}"
jwt_issuer = "${JWT_ISSUER}"
jwt_audience = "${JWT_AUDIENCE}"

[worker]
max_concurrent_jobs = 20

Production Multi-Node

API nodes:

[database]
url = "${DATABASE_URL}"
replica_urls = ["${DATABASE_REPLICA_URL}"]
read_from_replica = true

[database.pools.default]
size = 40

[node]
roles = ["Gateway", "Function"]

[cluster]
discovery = "postgres"

Worker nodes:

[database]
url = "${DATABASE_URL}"

[database.pools.jobs]
size = 30
statement_timeout_secs = 600

[node]
roles = ["Worker"]
worker_capabilities = ["general"]

[worker]
max_concurrent_jobs = 25

[cluster]
discovery = "postgres"

Specialized Workers

GPU processing node:

[node]
roles = ["Worker"]
worker_capabilities = ["gpu"]

[worker]
max_concurrent_jobs = 4 # GPU memory limits concurrency
job_timeout_secs = 7200 # 2 hours for training jobs

Under the Hood

Environment Variable Substitution

Variables match the pattern ${VAR_NAME} where VAR_NAME contains uppercase letters, numbers, and underscores:

let re = Regex::new(r"\$\{([A-Z_][A-Z0-9_]*)\}")?;

Substitution happens at parse time. Unset variables remain as literal ${VAR_NAME} strings (useful for detecting misconfiguration).
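
A dependency-free sketch of the same semantics (the real implementation uses the regex above; this hand-rolled scanner is only illustrative, and the lookup parameter stands in for std::env::var):

```rust
/// Replace ${VAR_NAME} placeholders using `lookup`; unset or
/// malformed placeholders are left in the output verbatim.
fn substitute(input: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(end) = after.find('}') else {
            // No closing brace: keep the tail as-is.
            out.push_str(&rest[start..]);
            return out;
        };
        let name = &after[..end];
        // Mirrors [A-Z_][A-Z0-9_]* from the regex above.
        let valid = name
            .chars()
            .next()
            .map_or(false, |c| c.is_ascii_uppercase() || c == '_')
            && name
                .chars()
                .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
        let value = if valid { lookup(name) } else { None };
        match value {
            Some(v) => out.push_str(&v),
            // Unset variable: the literal ${NAME} survives,
            // which makes misconfiguration visible downstream.
            None => out.push_str(&rest[start..start + end + 3]),
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```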

Bulkhead Isolation

Connection pools isolate workloads:

┌─────────────────────────────────────────┐
│               PostgreSQL                │
└─────────────────────────────────────────┘
     ▲               ▲               ▲
     │               │               │
┌────┴────┐     ┌────┴────┐     ┌────┴────┐
│ default │     │  jobs   │     │analytics│
│ 30 conn │     │ 15 conn │     │ 5 conn  │
│ 30s TO  │     │ 300s TO │     │ 600s TO │
└────┬────┘     └────┬────┘     └────┬────┘
     │               │               │
     ▼               ▼               ▼
┌─────────┐     ┌─────────┐     ┌─────────┐
│ Queries │     │  Jobs   │     │ Reports │
│Mutations│     │         │     │         │
└─────────┘     └─────────┘     └─────────┘

A runaway batch job cannot exhaust connections needed for user requests. Each pool enforces independent:

  • Connection count limits
  • Checkout timeouts
  • Statement timeouts

Cluster Discovery

Nodes discover each other through PostgreSQL:

SELECT * FROM forge_nodes WHERE last_heartbeat > NOW() - INTERVAL '15s'

Nodes insert their address on startup, update it on each heartbeat, and are removed once dead_threshold_secs elapses without a heartbeat. No additional infrastructure required.
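
The registration/heartbeat write is naturally an upsert. A hypothetical sketch; the actual forge_nodes columns may differ:

```sql
-- First write registers the node; subsequent writes refresh the
-- heartbeat. $1/$2/$3 are node id, address, and roles.
INSERT INTO forge_nodes (node_id, address, roles, last_heartbeat)
VALUES ($1, $2, $3, NOW())
ON CONFLICT (node_id)
DO UPDATE SET address = EXCLUDED.address, last_heartbeat = NOW();
```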

Node Role Enforcement

Roles determine which subsystems start:

if config.node.roles.contains(&NodeRole::Gateway) {
    start_http_server(&config.gateway).await?;
}
if config.node.roles.contains(&NodeRole::Worker) {
    start_job_worker(&config.worker).await?;
}
if config.node.roles.contains(&NodeRole::Scheduler) {
    start_cron_scheduler().await?;
}

Omitted roles mean those subsystems never start. A Worker-only node never binds the HTTP port. A Gateway-only node never polls the job queue.