## Expected Behavior
- Prometheus metric cardinality for Temporal matching should be roughly bounded by the set of live physical task queues and their labels (namespaces × queues × task types × partitions, etc.).
- When a `PhysicalTaskQueueManager` is unloaded or a task queue becomes idle/obsolete, its gauge time series should not accumulate indefinitely over the pod's lifetime.
- Long-lived clusters should not see `scrape_samples_scraped` increase monotonically per matching pod purely as a function of pod uptime.
## Actual Behavior

### Matching Service (Primary)

**Gauge values leak in `gaugeAdapter.values`: entries are never removed.**

The OpenTelemetry gauge implementation in Temporal uses a `gaugeAdapter` that stores gauge values in a map keyed by label set. Entries are only ever added, never deleted.
File: `common/metrics/otel_metrics_handler.go:31-41,146-151`

```go
type gaugeAdapter struct {
	lock   sync.Mutex
	values map[attribute.Distinct]gaugeValue // ONLY ADDS, NEVER REMOVES
}

func (g *gaugeAdapterGauge) Record(v float64, tags ...Tag) {
	set := g.omp.makeSet(tags)
	g.adapter.lock.Lock()
	defer g.adapter.lock.Unlock()
	g.adapter.values[set.Equivalent()] = gaugeValue{value: v, set: set}
}
```
On every Prometheus scrape, the callback iterates ALL entries (otel_metrics_handler.go:137-143), reporting every label combination that was ever recorded — including stale ones from unloaded task queues.
**What creates new label combinations over time**
Each `PhysicalTaskQueueManager` gets a uniquely-tagged metrics handler (`physical_task_queue_manager.go:132-134`):

```go
taggedMetricsHandler := partitionMgr.metricsHandler.WithTags(
	metrics.OperationTag(metrics.MatchingTaskQueueMgrScope),
	metrics.WorkerBuildIdTag(buildIdTagValue, config.BreakdownMetricsByBuildID()))
```
New physical task queues are created for:
- Each unique worker build ID (every deployment with a new build ID)
- Each deployment series + build ID pair
- Each version set (old versioning API)
Gauge metrics emitted per physical TQ include:
- `approximate_backlog_count` (`db.go:709`)
- `approximate_backlog_age_seconds` (`db.go:711-713`)
- `task_lag_per_tl` (`db.go:715`)
Full cardinality = namespaces × task_queues × build_IDs × partitions × task_types × ~3 gauge metrics
### History Service (Secondary)
History uses counters/histograms (not gauges) with dynamic labels, so the growth is slower but still present:
- Events cache: `NamespaceIDTag` on every cache operation (`events/cache.go:111,147,159,176`)
- Workflow cache: `NamespaceIDTag` on every access (`workflow/cache/cache.go:182-185`)
- Workflow completion: `WorkflowTypeTag` on every workflow close (`workflow/metrics.go:94`)
- Mutable state stats: `NamespaceTag` on every persist (`workflow/transaction_impl.go:684,699`)
Growth is proportional to unique namespace IDs × workflow types: lower cardinality than matching, but still unbounded over the pod's lifetime.
## Steps to Reproduce the Problem
- Deploy Temporal Server v1.29.4 via `temporal-operator` on Kubernetes.
- Enable Prometheus scraping for the matching and history services (standard `/metrics` endpoint).
- Run workloads over time that:
- Create many task queues and partitions.
- Roll out multiple worker build IDs / deployment versions and version sets.
- Observe over days/weeks: `scrape_samples_scraped{job="temporal-cluster-matching-headless", clusterName="ahp", ...}` increases steadily for long-lived matching pods.
- A similar, but slower, increasing pattern appears for the history service.
## Specifications
- Version: Temporal Server v1.29.4, using the OTel metrics implementation (`gaugeAdapter`).
- Platform: Kubernetes (deployed via `temporal-operator`), Prometheus for metrics scraping, cluster with many namespaces / task queues / worker build IDs.