Observability Catalog

This page is a watch-list reference. Every metric name, span name, and contract documented here is stable at GA — breaking changes require a major version bump. Items marked experimental may be renamed or removed in a minor release.

Metrics

Unless a section names a different meter, metrics are emitted under the forge-runtime meter (OTel SDK) and exported via OTLP HTTP to the configured otlp_endpoint. Metrics are active only when the otel Cargo feature is enabled; without it, all record calls are no-ops.

HTTP

| Metric | Type | Dimensions | Description | Tier |
| --- | --- | --- | --- | --- |
| http_requests_total | Counter | method, path, status | Total HTTP requests processed by the gateway | Stable |
| http_request_duration_seconds | Histogram | method, path, status | Request latency in seconds | Stable |
| active_connections | UpDownCounter | type | Current open connections (value of type: sse, websocket, or similar) | Stable |

Functions

| Metric | Type | Dimensions | Description | Tier |
| --- | --- | --- | --- | --- |
| fn.executions_total | Counter | function, kind, status | Total RPC function executions (status: ok or error) | Stable |
| fn.duration_seconds | Histogram | function, kind, status | Handler execution time in seconds | Stable |

kind is the handler type, one of: query, mutation, webhook, or mcp_tool.

Jobs

| Metric | Type | Dimensions | Description | Tier |
| --- | --- | --- | --- | --- |
| job_executions_total | Counter | job_type, status | Total job executions (status: completed, retrying, failed, cancelled, dead_letter) | Stable |
| job_duration_seconds | Histogram | job_type, status | Job execution time in seconds | Stable |

Database

These follow OTel semantic conventions for database clients. Meter name: forge.db.

| Metric | Type | Dimensions | Description | Tier |
| --- | --- | --- | --- | --- |
| db.client.operation.duration | Histogram | db.system, db.operation.name | Duration of individual DB operations | Stable |
| db.client.connection.count | Gauge | db.system | Active (non-idle) connections in the pool | Stable |
| db.client.connection.idle_count | Gauge | db.system | Idle connections in the pool | Stable |
| db.client.connection.max | Gauge | db.system | Pool max size from config | Stable |

db.system is always postgresql.


Spans

Spans are created via tracing::info_span! and bridged to OTel through the tracing-opentelemetry layer. Span names below match the string passed to info_span!.

Gateway

| Span | Attributes | Description |
| --- | --- | --- |
| http.request | http.method, http.route, http.status_code, trace_id, request_id | One span per inbound HTTP request. http.status_code recorded on response. |

Functions

| Span | Attributes | Description |
| --- | --- | --- |
| fn.execute | function, fn.kind | Wraps the full handler call including cache lookup and timeout. |
| db.transaction | db.system | Created when a mutation runs with transactional = true. |
| db.query | db.system, db.operation.name, db.collection.name (optional) | Per-query span from instrumented_query. db.collection.name omitted when the table cannot be derived. |

Jobs

| Span | Attributes | Description |
| --- | --- | --- |
| job.execute | job_id, job_type | Wraps a single job execution attempt. |

Cron

| Span | Attributes | Description |
| --- | --- | --- |
| cron.tick | cron.tick_id, cron.jobs_checked, cron.jobs_executed | One span per scheduler poll cycle. jobs_checked and jobs_executed recorded after the tick. |
| cron.execute | cron.name, cron.run_id, cron.schedule, cron.timezone, cron.scheduled_time, cron.is_catch_up, cron.duration_ms, cron.status | Wraps a single cron handler invocation. otel.name overridden to cron <name> for readable traces. |
| cron.catch_up | cron.name, cron.missed_count, cron.executed_count | Covers the catch-up replay loop for a single cron. |

Daemons

| Span | Attributes | Description |
| --- | --- | --- |
| daemon.runner | daemon.node_id, daemon.count, daemon.uptime_ms | Wraps the entire daemon runner lifetime. |
| daemon.lifecycle | daemon.name, daemon.node_id, daemon.leader_elected, daemon.restart_count, daemon.uptime_ms, daemon.final_status | Per-daemon span covering restarts. otel.name overridden to daemon <name>. |
| daemon.execute | daemon.instance_id, daemon.execution_duration_ms, daemon.status | One span per daemon execution instance (between restarts). |

Frozen Contracts

Workflow signature derivation

The workflow signature is a frozen contract. The set of fields fed into the hash and the algorithm will never change without a major version bump.

Algorithm: FNV-1a 64-bit, hex-encoded. Fields fed in order, each separated by a 0xff byte:

  1. name (workflow name string)
  2. version (version string)
  3. Step keys (sorted, from ctx.step("key", ...) call sites — string literals only)
  4. Wait keys (sorted, from ctx.wait_for_event("key", ...) call sites — string literals only)
  5. timeout_secs as little-endian u64
  6. Input type string (Rust type as token stream string, e.g. Uuid or ())
  7. Output type string

Consequence: renaming any step, adding/removing a step or wait, changing the timeout, or changing the input/output type changes the signature. A run pinned to the old signature blocks resume and flips /_api/ready to 503 until resolved. Never add new fields to the derivation without bumping the workflow version.
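
The derivation above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not Forge's implementation, and it should not be expected to reproduce real signatures. The WorkflowShape struct and the workflow_signature function are hypothetical names. The sketch also assumes that strings are fed as UTF-8 bytes, that every step key and wait key counts as its own field, that no separator follows the last field, and that the hex output is lowercase and zero-padded to 16 digits. Only the FNV-1a constants and the field order come from published sources and the list above.

```rust
/// Standard FNV-1a 64-bit offset basis and prime.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Folds `bytes` into an in-progress FNV-1a 64-bit hash state.
fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= u64::from(b); // XOR first, then multiply: that is the "1a" variant
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical container for the seven inputs listed above.
struct WorkflowShape<'a> {
    name: &'a str,
    version: &'a str,
    step_keys: Vec<&'a str>,
    wait_keys: Vec<&'a str>,
    timeout_secs: u64,
    input_type: &'a str,
    output_type: &'a str,
}

/// Sketch of the signature derivation (byte layout is an assumption).
fn workflow_signature(w: &WorkflowShape<'_>) -> String {
    // Keys are sorted so that call-site order does not affect the result.
    let mut steps = w.step_keys.clone();
    steps.sort_unstable();
    let mut waits = w.wait_keys.clone();
    waits.sort_unstable();
    let timeout = w.timeout_secs.to_le_bytes(); // little-endian u64

    // Assumption: each key is a separate field in the 0xff-delimited stream.
    let mut fields: Vec<&[u8]> = vec![w.name.as_bytes(), w.version.as_bytes()];
    fields.extend(steps.iter().map(|k| k.as_bytes()));
    fields.extend(waits.iter().map(|k| k.as_bytes()));
    fields.push(&timeout);
    fields.push(w.input_type.as_bytes());
    fields.push(w.output_type.as_bytes());

    let mut hash = FNV_OFFSET_BASIS;
    for (i, field) in fields.iter().enumerate() {
        if i > 0 {
            hash = fnv1a_update(hash, &[0xff]); // separator between fields
        }
        hash = fnv1a_update(hash, field);
    }
    format!("{hash:016x}")
}
```

Under these assumptions, reordering step call sites leaves the signature unchanged because the keys are sorted first, while renaming a single step produces a different hash, which is the situation that blocks resume.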

Step name rules

Step and wait-event keys must follow these rules, enforced at runtime (not compile time):

  • String literals only — no format strings, no runtime-computed values
  • Maximum 64 characters
  • Allowed characters: alphanumeric (a-z, A-Z, 0-9), underscore (_), hyphen (-)
  • Names are case-sensitive and must be stable across deploys within a version

Violating these rules can silently produce different signatures or cause lookups to fail on resume.
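
The length and character rules can be expressed as a small check. The sketch below is not the framework's validator; KeyError, MAX_KEY_LEN, and validate_key are hypothetical names. It covers only the two rules that can be tested on a string value. The "string literals only" rule concerns how the key is written at the call site and cannot be verified this way, and the source does not say whether an empty key is allowed, so the sketch does not reject one.

```rust
/// Hypothetical error type for a rejected step or wait-event key.
#[derive(Debug, PartialEq)]
enum KeyError {
    /// Key is longer than the limit; carries the observed length.
    TooLong(usize),
    /// Key contains a character outside a-z, A-Z, 0-9, '_' and '-'.
    InvalidChar(char),
}

/// Maximum key length stated in the rules above.
const MAX_KEY_LEN: usize = 64;

/// Checks a key against the character and length rules.
fn validate_key(key: &str) -> Result<(), KeyError> {
    // Report the first disallowed character, if any.
    if let Some(c) = key
        .chars()
        .find(|c| !(c.is_ascii_alphanumeric() || *c == '_' || *c == '-'))
    {
        return Err(KeyError::InvalidChar(c));
    }
    // All remaining characters are ASCII, so chars and bytes agree here.
    let len = key.chars().count();
    if len > MAX_KEY_LEN {
        return Err(KeyError::TooLong(len));
    }
    Ok(())
}
```

A key such as charge-card_v2 passes, while send email is rejected because of the space. The comparison is case-sensitive by construction, since no case folding is applied.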


Operational Constraints

Database primary failover requires restart

Forge's reactivity system (LISTEN/NOTIFY) holds a single dedicated long-lived PostgreSQL connection via ChangeListener. This connection is not automatically re-established if the primary fails over to a replica. After a failover, restart the process to reconnect. Do not rely on in-process reconnection: without a restart, change events silently stop arriving.

The job worker and migration runner use pool connections instead, and those reconnect automatically via SQLx's pool health checks. The LISTEN connection is the exception.

forge_* reserved schema namespace

All tables, sequences, functions, and indexes prefixed with forge_ are owned by the framework. Do not create application tables with this prefix. forge check will fail the build if it detects user-defined SQL that references previously-unknown forge_* table names in mutations without going through the framework dispatch APIs.

Current framework tables (incomplete — check your migration history for the full list):

  • forge_jobs, forge_job_runs
  • forge_workflow_runs, forge_workflow_definitions
  • forge_cron_runs
  • forge_nodes
  • forge_signals_events, forge_signals_sessions, forge_signals_daily_salt
  • forge_migrations
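
A project can guard against accidental use of the reserved prefix with a simple pre-merge check on the table names its own migrations create. The sketch below is only an illustration and is unrelated to how forge check works internally. reserved_names is a hypothetical helper, and the list of names is assumed to have been extracted from application migrations by some other means.

```rust
/// Prefix reserved for framework-owned objects, per the section above.
const RESERVED_PREFIX: &str = "forge_";

/// Returns the entries of `table_names` that start with the reserved prefix.
/// The comparison ignores ASCII case, which is deliberately conservative:
/// unquoted PostgreSQL identifiers fold to lowercase, so FORGE_tmp and
/// forge_tmp name the same table.
fn reserved_names<'a>(table_names: &[&'a str]) -> Vec<&'a str> {
    table_names
        .iter()
        .copied()
        .filter(|name| name.to_ascii_lowercase().starts_with(RESERVED_PREFIX))
        .collect()
}
```

Names that merely contain the prefix, such as reforge_log, are not flagged; only names that begin with forge_ are returned.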

Config Substitution Edge Cases

forge.toml supports ${VAR} and ${VAR-default} / ${VAR:-default} interpolation before TOML parsing. The rules:

| Syntax | Var set, non-empty | Var set, empty string | Var unset |
| --- | --- | --- | --- |
| ${VAR} | substitutes value | substitutes empty string | preserves literal ${VAR} |
| ${VAR-default} | substitutes value | substitutes empty string | substitutes default |
| ${VAR:-default} | substitutes value | substitutes empty string | substitutes default |

Key points:

  • VAR="" (set to empty string) counts as set. The default is not used. ${VAR-fallback} expands to an empty string, not fallback.
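  • ${VAR} with no default and no env var leaves the literal ${VAR} in the TOML. That literal is then either parsed as part of a string (when the placeholder sits inside quotes) or causes a TOML parse error (when it is a bare value). It does not silently become an empty string.
  • :- and - are equivalent: both trigger only when the variable is unset, not when it is empty. This differs from POSIX shell, where ${VAR:-default} also applies the default to an empty value.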
  • Var names must match [A-Z_][A-Z0-9_]*. Lowercase names are not substituted and are passed through as-is.

# Uses "http://localhost:4318" when FORGE_OTEL_ENDPOINT is unset
otlp_endpoint = "${FORGE_OTEL_ENDPOINT:-http://localhost:4318}"

# If DATABASE_URL is unset, the literal ${DATABASE_URL} is left in place (no default, intentional).
# Inside quotes that literal is still a valid TOML string, so it is not caught as a parse error.
url = "${DATABASE_URL}"

# Expands to empty string if LOG_LEVEL=""; probably not what you want
log_level = "${LOG_LEVEL-info}"
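
The substitution table can be turned into a small reference function, which is handy for checking what a given forge.toml line will expand to. This is a sketch of the documented rules, not Forge's code. substitute and is_var_name are hypothetical names, the environment is passed in as a map (an assumption made so the behavior is deterministic), and the handling of an unterminated ${ or of a } inside a default is a guess, since the source does not cover those cases.

```rust
use std::collections::HashMap;

/// True if `s` matches [A-Z_][A-Z0-9_]*.
fn is_var_name(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_uppercase() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

/// Expands ${VAR}, ${VAR-default} and ${VAR:-default} in raw config text.
/// `env` stands in for the process environment.
fn substitute(input: &str, env: &HashMap<&str, &str>) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        let end = match after.find('}') {
            Some(end) => end,
            None => {
                // Assumption: an unterminated placeholder is copied verbatim.
                out.push_str(&rest[start..]);
                return out;
            }
        };
        let body = &after[..end];
        let literal = &rest[start..start + 2 + end + 1];
        // Names cannot contain '-', so the first '-' starts the default.
        let (name, default) = match body.find('-') {
            Some(i) => {
                let name = body[..i].strip_suffix(':').unwrap_or(&body[..i]);
                (name, Some(&body[i + 1..]))
            }
            None => (body, None),
        };
        if !is_var_name(name) {
            out.push_str(literal); // e.g. lowercase names pass through as-is
        } else {
            match (env.get(name), default) {
                (Some(value), _) => out.push_str(value), // set, even if empty
                (None, Some(d)) => out.push_str(d),      // unset, default given
                (None, None) => out.push_str(literal),   // unset, no default
            }
        }
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    out
}
```

With LOG_LEVEL set to an empty string in the map, both ${LOG_LEVEL-info} and ${LOG_LEVEL:-info} expand to nothing, and ${DATABASE_URL} with no entry in the map is left exactly as written.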