Observability Catalog
This page is a watch-list reference. Every metric name, span name, and contract documented here is stable at GA — breaking changes require a major version bump. Items marked experimental may be renamed or removed in a minor release.
Metrics
Unless noted otherwise, metrics are emitted under the forge-runtime meter (OTel SDK) and exported via OTLP HTTP to the configured otlp_endpoint. Metrics are active only when the otel Cargo feature is enabled; without it, all record calls are no-ops.
HTTP
| Metric | Type | Dimensions | Description | Tier |
|---|---|---|---|---|
| http_requests_total | Counter | method, path, status | Total HTTP requests processed by the gateway | Stable |
| http_request_duration_seconds | Histogram | method, path, status | Request latency in seconds | Stable |
| active_connections | UpDownCounter | type | Currently open connections (type is sse, websocket, or similar) | Stable |
Functions
| Metric | Type | Dimensions | Description | Tier |
|---|---|---|---|---|
| fn.executions_total | Counter | function, kind, status | Total RPC function executions (status: ok or error) | Stable |
| fn.duration_seconds | Histogram | function, kind, status | Handler execution time in seconds | Stable |
kind is the handler type: query, mutation, webhook, mcp_tool.
Jobs
| Metric | Type | Dimensions | Description | Tier |
|---|---|---|---|---|
| job_executions_total | Counter | job_type, status | Total job executions (status: completed, retrying, failed, cancelled, dead_letter) | Stable |
| job_duration_seconds | Histogram | job_type, status | Job execution time in seconds | Stable |
Database
These follow OTel semantic conventions for database clients. Meter name: forge.db.
| Metric | Type | Dimensions | Description | Tier |
|---|---|---|---|---|
| db.client.operation.duration | Histogram | db.system, db.operation.name | Duration of individual DB operations | Stable |
| db.client.connection.count | Gauge | db.system | Active (non-idle) connections in the pool | Stable |
| db.client.connection.idle_count | Gauge | db.system | Idle connections in the pool | Stable |
| db.client.connection.max | Gauge | db.system | Pool max size from config | Stable |
db.system is always postgresql.
Spans
Spans are created via tracing::info_span! and bridged to OTel through the tracing-opentelemetry layer. Span names below match the string passed to info_span!.
Gateway
| Span | Attributes | Description |
|---|---|---|
| http.request | http.method, http.route, http.status_code, trace_id, request_id | One span per inbound HTTP request. http.status_code recorded on response. |
Functions
| Span | Attributes | Description |
|---|---|---|
| fn.execute | function, fn.kind | Wraps the full handler call including cache lookup and timeout. |
| db.transaction | db.system | Created when a mutation runs with transactional = true. |
| db.query | db.system, db.operation.name, db.collection.name (optional) | Per-query span from instrumented_query. db.collection.name omitted when the table cannot be derived. |
Jobs
| Span | Attributes | Description |
|---|---|---|
| job.execute | job_id, job_type | Wraps a single job execution attempt. |
Cron
| Span | Attributes | Description |
|---|---|---|
| cron.tick | cron.tick_id, cron.jobs_checked, cron.jobs_executed | One span per scheduler poll cycle. cron.jobs_checked and cron.jobs_executed are recorded after the tick. |
| cron.execute | cron.name, cron.run_id, cron.schedule, cron.timezone, cron.scheduled_time, cron.is_catch_up, cron.duration_ms, cron.status | Wraps a single cron handler invocation. otel.name overridden to cron <name> for readable traces. |
| cron.catch_up | cron.name, cron.missed_count, cron.executed_count | Covers the catch-up replay loop for a single cron. |
Daemons
| Span | Attributes | Description |
|---|---|---|
| daemon.runner | daemon.node_id, daemon.count, daemon.uptime_ms | Wraps the entire daemon runner lifetime. |
| daemon.lifecycle | daemon.name, daemon.node_id, daemon.leader_elected, daemon.restart_count, daemon.uptime_ms, daemon.final_status | Per-daemon span covering restarts. otel.name overridden to daemon <name>. |
| daemon.execute | daemon.instance_id, daemon.execution_duration_ms, daemon.status | One span per daemon execution instance (between restarts). |
Frozen Contracts
Workflow signature derivation
The workflow signature is a frozen contract. The set of fields fed into the hash and the algorithm will never change without a major version bump.
Algorithm: FNV-1a 64-bit, hex-encoded. Fields fed in order, each separated by a 0xff byte:
- `name` (workflow name string)
- `version` (version string)
- Step keys (sorted, from `ctx.step("key", ...)` call sites; string literals only)
- Wait keys (sorted, from `ctx.wait_for_event("key", ...)` call sites; string literals only)
- `timeout_secs` as little-endian `u64`
- Input type string (Rust type as token stream string, e.g. `Uuid` or `()`)
- Output type string
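The derivation above can be sketched in a few lines of standard-library Rust. This is an illustration, not Forge's implementation: the `WorkflowShape` struct and `signature` function are hypothetical names, and the sketch assumes that the 0xff separator is written after every field (including the last), that each step and wait key is fed as its own field, and that the hex output is 16 lowercase digits. Do not use it to predict real signatures.

```rust
/// Standard FNV-1a 64-bit offset basis and prime.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Incremental FNV-1a 64-bit hasher.
struct Fnv1a(u64);

impl Fnv1a {
    fn new() -> Self {
        Fnv1a(FNV_OFFSET)
    }

    /// XOR each byte into the state, then multiply by the prime.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(FNV_PRIME);
        }
    }

    /// Feed one field followed by the 0xff separator (assumed placement).
    fn field(&mut self, bytes: &[u8]) {
        self.write(bytes);
        self.write(&[0xff]);
    }
}

/// Hypothetical container for the inputs listed above.
struct WorkflowShape<'a> {
    name: &'a str,
    version: &'a str,
    step_keys: Vec<&'a str>,
    wait_keys: Vec<&'a str>,
    timeout_secs: u64,
    input_type: &'a str,
    output_type: &'a str,
}

/// Hash the fields in the documented order and hex-encode the result.
fn signature(w: &WorkflowShape<'_>) -> String {
    let mut h = Fnv1a::new();
    h.field(w.name.as_bytes());
    h.field(w.version.as_bytes());

    // Keys are sorted so that source order does not affect the hash.
    let mut steps = w.step_keys.clone();
    steps.sort_unstable();
    for key in &steps {
        h.field(key.as_bytes());
    }
    let mut waits = w.wait_keys.clone();
    waits.sort_unstable();
    for key in &waits {
        h.field(key.as_bytes());
    }

    h.field(&w.timeout_secs.to_le_bytes());
    h.field(w.input_type.as_bytes());
    h.field(w.output_type.as_bytes());
    format!("{:016x}", h.0)
}
```

Under these assumptions, reordering step keys leaves the signature unchanged, while renaming a key or changing the timeout produces a different value.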
Consequence: renaming any step, adding/removing a step or wait, changing the timeout, or changing the input/output type changes the signature. A run pinned to the old signature blocks resume and flips /_api/ready to 503 until resolved. Never add new fields to the derivation without bumping the workflow version.
Step name rules
Step and wait-event keys must follow these rules, enforced at runtime (not compile time):
- String literals only — no format strings, no runtime-computed values
- Maximum 64 characters
- Allowed characters: alphanumeric (`a-z`, `A-Z`, `0-9`), underscore (`_`), hyphen (`-`)
- Names are case-sensitive and must be stable across deploys within a version
Violating these rules can produce silently different signatures or failed lookups on resume.
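The length and character rules are easy to check ahead of time, for example in a unit test over your own key constants. The following is a minimal sketch, not the framework's validator: `validate_key` and `KeyError` are hypothetical names, and rejecting the empty string is an assumption that the rules above do not state. The "string literals only" rule concerns call sites and cannot be checked on a `&str`.

```rust
/// Maximum key length from the rules above.
const MAX_KEY_LEN: usize = 64;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum KeyError {
    /// Assumption: empty keys are rejected (not stated in the rules).
    Empty,
    /// Key exceeds 64 characters; carries the actual length.
    TooLong(usize),
    /// First character outside a-z, A-Z, 0-9, '_' and '-'.
    InvalidChar(char),
}

/// Check a step or wait-event key against the length and charset rules.
fn validate_key(key: &str) -> Result<(), KeyError> {
    if key.is_empty() {
        return Err(KeyError::Empty);
    }
    let len = key.chars().count();
    if len > MAX_KEY_LEN {
        return Err(KeyError::TooLong(len));
    }
    let allowed = |c: char| c.is_ascii_alphanumeric() || c == '_' || c == '-';
    match key.chars().find(|&c| !allowed(c)) {
        Some(c) => Err(KeyError::InvalidChar(c)),
        None => Ok(()),
    }
}
```

Because keys are case-sensitive, the sketch performs no normalization: `Charge` and `charge` both pass and remain distinct keys.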
Operational Constraints
Database primary failover requires restart
Forge's reactivity system (LISTEN/NOTIFY) holds a single dedicated long-lived PostgreSQL connection via ChangeListener. This connection is not automatically re-established if the primary fails over to a replica. After a failover, restart the process to reconnect. If you rely on in-process reconnection instead, change events silently stop arriving.
The job worker and migration runner use pool connections instead, and those reconnect automatically via SQLx's pool health checks. The LISTEN connection is the exception.
forge_* reserved schema namespace
All tables, sequences, functions, and indexes prefixed with forge_ are owned by the framework. Do not create application tables with this prefix. forge check will fail the build if it detects user-defined SQL that references previously-unknown forge_* table names in mutations without going through the framework dispatch APIs.
Current framework tables (incomplete — check your migration history for the full list):
- `forge_jobs`, `forge_job_runs`
- `forge_workflow_runs`, `forge_workflow_definitions`
- `forge_cron_runs`
- `forge_nodes`
- `forge_signals_events`, `forge_signals_sessions`, `forge_signals_daily_salt`
- `forge_migrations`
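Since the list is incomplete, checking the prefix is more reliable than checking against known names. A small sketch of such a check for application table names follows; `uses_reserved_prefix` is a hypothetical helper and is not what forge check runs. It compares case-insensitively because PostgreSQL folds unquoted identifiers to lowercase, and it conservatively treats quoted identifiers the same way.

```rust
/// Prefix owned by the framework.
const RESERVED_PREFIX: &str = "forge_";

/// Return true if a (possibly schema-qualified or quoted) identifier
/// falls in the reserved forge_ namespace.
fn uses_reserved_prefix(identifier: &str) -> bool {
    // Drop an optional schema qualifier such as `public.`.
    let name = identifier.rsplit('.').next().unwrap_or(identifier);
    // Drop surrounding double quotes, if any.
    let name = name.trim_matches('"');
    name.to_ascii_lowercase().starts_with(RESERVED_PREFIX)
}
```

A name that merely contains the prefix elsewhere, such as `my_forge_table`, is not flagged.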
Config Substitution Edge Cases
forge.toml supports ${VAR} and ${VAR-default} / ${VAR:-default} interpolation before TOML parsing. The rules:
| Syntax | Var set, non-empty | Var set, empty string | Var unset |
|---|---|---|---|
| ${VAR} | substitutes value | substitutes empty string | preserves literal ${VAR} |
| ${VAR-default} | substitutes value | substitutes empty string | substitutes default |
| ${VAR:-default} | substitutes value | substitutes empty string | substitutes default |
Key points:
- `VAR=""` (set to empty string) counts as set. The default is not used. `${VAR-fallback}` expands to an empty string, not `fallback`.
- `${VAR}` with no default and no env var leaves the literal `${VAR}` in the TOML, which then either gets parsed as a string (unusual) or causes a TOML parse error. It does not silently become an empty string.
- `:-` and `-` are equivalent: both trigger only on unset, not on empty.
- Var names must match `[A-Z_][A-Z0-9_]*`. Lowercase names are not substituted and are passed through as-is.
# Uses "http://localhost:4318" when FORGE_OTEL_ENDPOINT is unset
otlp_endpoint = "${FORGE_OTEL_ENDPOINT:-http://localhost:4318}"
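# Stays as the literal ${DATABASE_URL} if DATABASE_URL is unset (no default, intentional).
# Inside quotes the placeholder still parses as a TOML string, so any failure surfaces when the value is used.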
url = "${DATABASE_URL}"
# Expands to empty string if LOG_LEVEL="" — probably not what you want
log_level = "${LOG_LEVEL-info}"
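The substitution rules above can be modeled in standard-library Rust to test how a given line will expand. This is a sketch of the documented behavior, not Forge's parser: `substitute` and `is_var_name` are hypothetical names, the `env` map stands in for the process environment, and the sketch assumes that a default value cannot contain `}` and that an unterminated `${` is left untouched.

```rust
use std::collections::HashMap;

/// True if `s` matches [A-Z_][A-Z0-9_]*.
fn is_var_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_uppercase() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

/// Expand ${VAR}, ${VAR-default} and ${VAR:-default} using `env`.
fn substitute(input: &str, env: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let Some(end) = after.find('}') else {
            // Unterminated placeholder: keep the remainder verbatim.
            out.push_str(&rest[start..]);
            return out;
        };
        let body = &after[..end];
        // The full placeholder text, including "${" and "}".
        let literal = &rest[start..start + 2 + end + 1];

        // Split the name from the default at the first "-" (or ":-").
        let (name, default) = match body.find('-') {
            Some(i) => {
                let raw = &body[..i];
                (raw.strip_suffix(':').unwrap_or(raw), Some(&body[i + 1..]))
            }
            None => (body, None),
        };

        if !is_var_name(name) {
            // Lowercase or malformed names pass through as-is.
            out.push_str(literal);
        } else {
            match (env.get(name), default) {
                // Set counts as set, even when the value is empty.
                (Some(value), _) => out.push_str(value),
                (None, Some(d)) => out.push_str(d),
                // Unset with no default: preserve the literal.
                (None, None) => out.push_str(literal),
            }
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

In this model, `${LOG_LEVEL-info}` expands to an empty string when the map contains `LOG_LEVEL` with an empty value, and to `info` only when the key is absent. To try it against the real environment, the map could be built from `std::env::vars()`.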