OpenTelemetry Base Metrics Collection - Complete Reference

This document describes all metrics collected by your OpenTelemetry configuration and where they come from.

Metrics Collection Architecture

Collection Flow

┌─────────────────────────────────────────────────────────────────────┐
│                      METRICS SOURCES                                │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐                         ┌─────────────────┐    │
│  │ Applications    │                         │ Host Metrics    │    │
│  │ (OTLP)          │                         │ (node-level)    │    │
│  └────────┬────────┘                         └────────┬────────┘    │
│           │                                           │             │
│           └──────────────────┬────────────────────────┘             │
│                              │                                      │
│                              ▼                                      │
│               ┌──────────────────────────┐                          │
│               │ OTel Collector DaemonSet │                          │
│               │  (on each node)          │                          │
│               └──────────┬───────────────┘                          │
│                          │                                          │
│                          ▼                                          │
│               ┌──────────────────────────┐                          │
│               │ Prometheus Remote Write  │                          │
│               │ (:9090/api/v1/write)     │                          │
│               └──────────────────────────┘                          │
│                                                                     │
├─────────────────────────────────────────────────────────────────────┤
│              PROMETHEUS DIRECT SCRAPING (ServiceMonitors)           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
│  │ API      │  │ Scheduler│  │ Ctrl Mgr │  │ Kubelet  │             │
│  │ Server   │  │          │  │          │  │ cAdvisor │ ...         │
│  └─────┬────┘  └─────┬────┘  └─────┬────┘  └─────┬────┘             │
│        │             │             │             │                  │
│        └─────────────┼─────────────┼─────────────┘                  │
│                      │             │                                │
│                      ▼             ▼                                │
│           ┌──────────────────────────────┐                          │
│           │ Prometheus Server (scrape)   │                          │
│           └──────────────────────────────┘                          │
└─────────────────────────────────────────────────────────────────────┘

Key Components

OTel Collector DaemonSet: Runs on every node, collects node-local metrics
Prometheus Server: Scrapes Kubernetes components directly via ServiceMonitors
Remote Write: OTel collectors push metrics to Prometheus

OTel Collector Metrics

These metrics are collected by the OTel collector and sent to Prometheus via remote write.

1. Host Metrics (from hostmetrics receiver)

Source: hostmetrics receiver on each node
Collection Interval: 30s
Labels: k8s_node_name, host_name, service_instance_id

CPU Metrics

Metric Name	Type	Description	Labels
`system_cpu_time_seconds_total`	Counter	CPU time by state	`cpu`, `state` (user, system, idle, iowait, irq, softirq, steal, nice)
`system_cpu_utilization_ratio`	Gauge	CPU utilization (0-1)	`cpu`, `state`
`system_cpu_load_average_1m`	Gauge	1-minute load average	-
`system_cpu_load_average_5m`	Gauge	5-minute load average	-
`system_cpu_load_average_15m`	Gauge	15-minute load average	-

Memory Metrics

Metric Name	Type	Description	Labels
`system_memory_usage_bytes`	Gauge	Memory usage	`state` (used, free, buffered, cached, slab_reclaimable, slab_unreclaimable)
`system_memory_utilization_ratio`	Gauge	Memory utilization (0-1)	`state`

Disk Metrics

Metric Name	Type	Description	Labels
`system_disk_io_bytes_total`	Counter	Disk I/O bytes	`device`, `direction` (read, write)
`system_disk_operations_total`	Counter	Disk operations count	`device`, `direction`
`system_disk_io_time_seconds_total`	Counter	Disk I/O time	`device`
`system_disk_weighted_io_time_seconds_total`	Counter	Weighted I/O time	`device`
`system_disk_merged_total`	Counter	Merged operations	`device`, `direction`
`system_disk_pending_operations`	Gauge	Pending operations	`device`

Filesystem Metrics

Metric Name	Type	Description	Labels
`system_filesystem_usage_bytes`	Gauge	Filesystem usage	`device`, `type`, `mode`, `mountpoint`, `state` (used, free, reserved)
`system_filesystem_utilization_ratio`	Gauge	Filesystem utilization (0-1)	`device`, `type`, `mode`, `mountpoint`
`system_filesystem_inodes_usage`	Gauge	Inode usage	`device`, `type`, `mode`, `mountpoint`, `state`

Note: Excludes pseudo-filesystems: /dev/*, /proc/*, /sys/*, /run/*, overlay, tmpfs, etc.

Network Metrics

Metric Name	Type	Description	Labels
`system_network_io_bytes_total`	Counter	Network I/O bytes	`device`, `direction` (transmit, receive)
`system_network_packets_total`	Counter	Network packets	`device`, `direction`
`system_network_errors_total`	Counter	Network errors	`device`, `direction`
`system_network_dropped_total`	Counter	Dropped packets	`device`, `direction`
`system_network_connections`	Gauge	Network connections	`protocol`, `state`

Paging Metrics

Metric Name	Type	Description	Labels
`system_paging_usage_bytes`	Gauge	Swap usage	`device`, `state` (used, free)
`system_paging_operations_total`	Counter	Page in/out operations	`type` (major, minor), `direction`
`system_paging_faults_total`	Counter	Page faults	`type`

Process Metrics

Metric Name	Type	Description	Labels
`system_processes_count`	Gauge	Process count	`status` (running, blocked, etc.)
`system_processes_created_total`	Counter	Processes created	-

2. Pod and Container Metrics (from Prometheus cAdvisor scraping)

Source: Prometheus scraping kubelet's cAdvisor endpoint https://<node-ip>:10250/metrics/cadvisor
Scrape Interval: 30s (default)
Labels: Includes pod, container, namespace, node, image labels

Note: In your configuration, kubeletstats receiver is disabled to avoid duplicate sample errors. Instead, pod and container metrics come from Prometheus directly scraping kubelet/cAdvisor via ServiceMonitor. These metrics use the container_* prefix instead of k8s_pod_* or k8s_container_*.

Container CPU Metrics

Metric Name	Type	Description	Labels	kubeletstats Equivalent
`container_cpu_usage_seconds_total`	Counter	Total CPU time used	`container`, `pod`, `namespace`, `node`	`k8s_container_cpu_time_seconds_total`
`container_cpu_cfs_periods_total`	Counter	CFS scheduler periods	`container`, `pod`, `namespace`	-
`container_cpu_cfs_throttled_periods_total`	Counter	Throttled periods	`container`, `pod`, `namespace`	-
`container_cpu_cfs_throttled_seconds_total`	Counter	Time throttled	`container`, `pod`, `namespace`	-

Query Examples:

# CPU usage rate (cores)
rate(container_cpu_usage_seconds_total{container!=""}[5m])

# Per-pod CPU usage
sum by (pod, namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# CPU throttling percentage
rate(container_cpu_cfs_throttled_seconds_total[5m]) 
  / 
rate(container_cpu_cfs_periods_total[5m]) * 100

Container Memory Metrics

Metric Name	Type	Description	Labels	kubeletstats Equivalent
`container_memory_usage_bytes`	Gauge	Current memory usage	`container`, `pod`, `namespace`, `node`	`k8s_container_memory_usage_bytes`
`container_memory_working_set_bytes`	Gauge	Working set size (used for OOM)	`container`, `pod`, `namespace`, `node`	`k8s_container_memory_working_set_bytes`
`container_memory_rss`	Gauge	Resident set size	`container`, `pod`, `namespace`, `node`	`k8s_container_memory_rss_bytes`
`container_memory_cache`	Gauge	Page cache	`container`, `pod`, `namespace`	-
`container_memory_swap`	Gauge	Swap usage	`container`, `pod`, `namespace`	-
`container_memory_failcnt`	Counter	Memory allocation failures	`container`, `pod`, `namespace`	-
`container_memory_failures_total`	Counter	Memory limit hit count	`container`, `pod`, `namespace`, `failure_type`, `scope`	-

Query Examples:

# Current memory usage
container_memory_working_set_bytes{container!=""}

# Per-pod memory usage
sum by (pod, namespace) (container_memory_working_set_bytes{container!=""})

# Memory usage as % of limit
container_memory_working_set_bytes{container!=""}
  /
container_spec_memory_limit_bytes{container!=""} * 100

Container Network Metrics

Metric Name	Type	Description	Labels	kubeletstats Equivalent
`container_network_receive_bytes_total`	Counter	Network RX bytes	`pod`, `namespace`, `interface`	`k8s_pod_network_io_bytes_total{direction="receive"}`
`container_network_transmit_bytes_total`	Counter	Network TX bytes	`pod`, `namespace`, `interface`	`k8s_pod_network_io_bytes_total{direction="transmit"}`
`container_network_receive_packets_total`	Counter	Network RX packets	`pod`, `namespace`, `interface`	-
`container_network_transmit_packets_total`	Counter	Network TX packets	`pod`, `namespace`, `interface`	-
`container_network_receive_packets_dropped_total`	Counter	RX packets dropped	`pod`, `namespace`, `interface`	-
`container_network_transmit_packets_dropped_total`	Counter	TX packets dropped	`pod`, `namespace`, `interface`	-
`container_network_receive_errors_total`	Counter	RX errors	`pod`, `namespace`, `interface`	`k8s_pod_network_errors_total{direction="receive"}`
`container_network_transmit_errors_total`	Counter	TX errors	`pod`, `namespace`, `interface`	`k8s_pod_network_errors_total{direction="transmit"}`

Query Examples:

# Network receive rate (bytes/sec)
rate(container_network_receive_bytes_total[5m])

# Per-pod network bandwidth
sum by (pod, namespace) (
  rate(container_network_receive_bytes_total[5m]) +
  rate(container_network_transmit_bytes_total[5m])
)

Container Filesystem Metrics

Metric Name	Type	Description	Labels	kubeletstats Equivalent
`container_fs_usage_bytes`	Gauge	Filesystem usage	`container`, `pod`, `namespace`, `device`	`k8s_container_filesystem_usage_bytes`
`container_fs_limit_bytes`	Gauge	Filesystem limit	`container`, `pod`, `namespace`, `device`	`k8s_container_filesystem_capacity_bytes`
`container_fs_reads_total`	Counter	Filesystem reads	`container`, `pod`, `namespace`, `device`	-
`container_fs_writes_total`	Counter	Filesystem writes	`container`, `pod`, `namespace`, `device`	-
`container_fs_reads_bytes_total`	Counter	Bytes read	`container`, `pod`, `namespace`, `device`	-
`container_fs_writes_bytes_total`	Counter	Bytes written	`container`, `pod`, `namespace`, `device`	-
`container_fs_inodes_total`	Gauge	Total inodes	`container`, `pod`, `namespace`, `device`	-
`container_fs_inodes_free`	Gauge	Free inodes	`container`, `pod`, `namespace`, `device`	-

Query Examples:

# Filesystem usage
container_fs_usage_bytes{container!=""}

# Filesystem usage %
container_fs_usage_bytes{container!=""}
  /
container_fs_limit_bytes{container!=""} * 100

# I/O rate
rate(container_fs_reads_bytes_total[5m]) + 
rate(container_fs_writes_bytes_total[5m])

Container Limits and Requests

Metric Name	Type	Description	Labels
`container_spec_cpu_quota`	Gauge	CPU quota (microseconds per period)	`container`, `pod`, `namespace`
`container_spec_cpu_period`	Gauge	CPU period (microseconds)	`container`, `pod`, `namespace`
`container_spec_cpu_shares`	Gauge	CPU shares	`container`, `pod`, `namespace`
`container_spec_memory_limit_bytes`	Gauge	Memory limit	`container`, `pod`, `namespace`
`container_spec_memory_reservation_limit_bytes`	Gauge	Memory soft limit	`container`, `pod`, `namespace`
`container_spec_memory_swap_limit_bytes`	Gauge	Memory + swap limit	`container`, `pod`, `namespace`

Query Examples:

# CPU limit in cores
container_spec_cpu_quota / container_spec_cpu_period

# Memory limit
container_spec_memory_limit_bytes{container!=""}

Pod-Level Aggregations

Since cAdvisor reports per-container metrics, aggregate to pod level:

# Pod CPU usage (sum of all containers in pod)
sum by (pod, namespace, node) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)

# Pod memory usage (sum of all containers in pod)
sum by (pod, namespace, node) (
  container_memory_working_set_bytes{container!=""}
)

# Pod network I/O (already at pod level, not container level)
rate(container_network_receive_bytes_total[5m]) +
rate(container_network_transmit_bytes_total[5m])

Special Labels

Important label filters: - container!="" - Excludes pod-level aggregates (use for per-container metrics) - container="" - Shows only pod-level aggregates - container!="POD" - Excludes pause containers in older K8s versions - image!="" - Excludes special system containers

Label mappings from cAdvisor: - pod → Kubernetes pod name (cAdvisor) - k8s_pod_name → Kubernetes pod name (kubeletstats) - namespace → Kubernetes namespace (cAdvisor) - k8s_namespace_name → Kubernetes namespace (kubeletstats) - container → Container name (cAdvisor) - k8s_container_name → Container name (kubeletstats)

3. kubeletstats Metrics (DISABLED in Current Config)

Status: ⚠️ DISABLED - kubeletstats receiver is disabled to prevent duplicate sample errors.

The kubeletstats receiver would provide k8s_pod_* and k8s_container_* metrics, but these overlap significantly with the container_* metrics from cAdvisor (above).

Why disabled: Multiple OTel collectors were sending identical metrics with identical timestamps to Prometheus, causing "duplicate sample" and "out of order sample" errors. The issue persisted even with proper node-level filtering.

Alternative: Use the container_* metrics from Prometheus/cAdvisor instead (see section above for equivalent queries).

If you need kubeletstats metrics: They can be re-enabled via the kubeletMetrics preset, but you may experience duplicate sample warnings in Prometheus logs. Most metrics will still work despite the warnings.

3. Application Metrics (via OTLP)

Source: Applications sending metrics to OTel collector via OTLP
Endpoints: - gRPC: :4317 - HTTP: :4318

Applications instrumented with OpenTelemetry SDKs can send custom metrics. These vary by application but commonly include:

Request counts, durations, error rates
Business metrics (transactions, users, etc.)
Resource usage from application perspective
Custom application-specific metrics

Labels: Enriched with Kubernetes metadata via k8sattributes processor: - k8s_namespace_name - k8s_pod_name - k8s_node_name - k8s_deployment_name, k8s_statefulset_name, etc. - Pod labels: app, app_kubernetes_io_name, app_kubernetes_io_component, etc.

Kubernetes Component Metrics

These metrics are scraped directly by Prometheus via ServiceMonitors (NOT via OTel collectors).

1. API Server Metrics

Source: kube-apiserver (scraped by Prometheus)
Endpoint: https://kubernetes.default.svc:443/metrics
Labels: instance, job, endpoint, service, namespace

Common metrics: - apiserver_request_total - API request count - apiserver_request_duration_seconds - Request latency (histogram) - apiserver_request_sli_duration_seconds - SLI latency - apiserver_current_inflight_requests - In-flight requests - apiserver_longrunning_requests - Long-running requests - apiserver_response_sizes - Response sizes - apiserver_storage_objects - Object counts in etcd - apiserver_audit_event_total - Audit events - etcd_request_duration_seconds - etcd request latency - etcd_db_total_size_in_bytes - etcd database size

Note: Some histogram buckets are dropped via metricRelabelings to reduce cardinality.

2. Scheduler Metrics

Source: kube-scheduler (scraped by Prometheus)
Endpoint: https://<scheduler-pod-ip>:10259/metrics
Labels: instance, job, endpoint, service, namespace, pod

Common metrics: - scheduler_queue_incoming_pods_total - Incoming pods by event/queue - scheduler_scheduling_attempt_duration_seconds - Scheduling duration - scheduler_e2e_scheduling_duration_seconds - End-to-end scheduling time - scheduler_pod_scheduling_attempts - Pod scheduling attempts - scheduler_pending_pods - Pending pods count - scheduler_framework_extension_point_duration_seconds - Plugin execution time - scheduler_schedule_attempts_total - Schedule attempts - scheduler_preemption_attempts_total - Preemption attempts

3. Controller Manager Metrics

Source: kube-controller-manager (scraped by Prometheus)
Endpoint: https://<controller-pod-ip>:10257/metrics
Labels: instance, job, endpoint, service, namespace, pod

Common metrics: - workqueue_adds_total - Work queue additions - workqueue_depth - Work queue depth - workqueue_queue_duration_seconds - Time in queue - workqueue_work_duration_seconds - Work duration - workqueue_retries_total - Retries count - Controller-specific metrics for each controller (deployment, replicaset, etc.)

4. Kubelet Metrics (cAdvisor)

Source: kubelet cAdvisor endpoint (scraped by Prometheus)
Endpoint: https://<node-ip>:10250/metrics/cadvisor
Labels: instance, job, node, container, pod, namespace, image

Important: These are different from kubeletstats metrics. cAdvisor provides more detailed container metrics.

Common metrics: - container_cpu_usage_seconds_total - Container CPU usage - container_memory_usage_bytes - Container memory usage - container_memory_working_set_bytes - Container working set - container_network_receive_bytes_total - Network RX bytes - container_network_transmit_bytes_total - Network TX bytes - container_fs_usage_bytes - Filesystem usage - container_fs_reads_total - Filesystem reads - container_fs_writes_total - Filesystem writes - container_start_time_seconds - Container start time - container_last_seen - Last seen timestamp

5. Kubelet Metrics (kubelet /metrics)

Source: kubelet endpoint (scraped by Prometheus)
Endpoint: https://<node-ip>:10250/metrics
Labels: instance, job, node

Common metrics: - kubelet_running_pods - Running pods count - kubelet_running_containers - Running containers count - kubelet_volume_stats_* - Volume statistics - kubelet_pod_start_duration_seconds - Pod start duration - kubelet_pod_worker_duration_seconds - Pod worker duration - kubelet_runtime_operations_* - Container runtime operations - kubelet_pleg_* - Pod lifecycle event generator metrics

6. CoreDNS Metrics

Source: coredns (scraped by Prometheus)
Endpoint: http://<coredns-pod-ip>:9153/metrics
Labels: instance, job, pod, namespace

Common metrics: - coredns_dns_request_duration_seconds - DNS request duration - coredns_dns_requests_total - DNS requests count - coredns_dns_responses_total - DNS responses count - coredns_forward_requests_total - Forwarded requests - coredns_cache_hits_total - Cache hits - coredns_cache_misses_total - Cache misses

7. etcd Metrics

Source: etcd (scraped by Prometheus if configured)
Endpoint: http://<etcd-ip>:2381/metrics
Labels: instance, job

Common metrics: - etcd_server_proposals_committed_total - Committed proposals - etcd_server_proposals_applied_total - Applied proposals - etcd_server_proposals_pending - Pending proposals - etcd_server_proposals_failed_total - Failed proposals - etcd_disk_wal_fsync_duration_seconds - WAL fsync duration - etcd_disk_backend_commit_duration_seconds - Backend commit duration - etcd_network_peer_round_trip_time_seconds - Peer RTT - etcd_mvcc_db_total_size_in_bytes - Database size

8. Kube Proxy Metrics

Source: kube-proxy (scraped by Prometheus)
Endpoint: http://<proxy-pod-ip>:10249/metrics
Labels: instance, job, node

Common metrics: - kubeproxy_sync_proxy_rules_duration_seconds - Sync duration - kubeproxy_network_programming_duration_seconds - Network programming duration - rest_client_requests_total - REST client requests - rest_client_request_duration_seconds - REST client latency

Node and System Metrics

1. Node Exporter Metrics

Source: node-exporter DaemonSet (scraped by Prometheus)
Endpoint: http://<node-exporter-pod-ip>:9100/metrics
Labels: instance, job, node (added via relabeling)

Important: These provide MORE detailed node metrics than hostmetrics receiver.

CPU Metrics

node_cpu_seconds_total - CPU time by mode
node_load1, node_load5, node_load15 - Load averages
node_procs_running - Running processes
node_procs_blocked - Blocked processes

Memory Metrics

node_memory_MemTotal_bytes - Total memory
node_memory_MemFree_bytes - Free memory
node_memory_MemAvailable_bytes - Available memory
node_memory_Buffers_bytes - Buffers
node_memory_Cached_bytes - Cached
node_memory_SwapTotal_bytes - Swap total
node_memory_SwapFree_bytes - Swap free

Disk Metrics

node_disk_io_time_seconds_total - Disk I/O time
node_disk_read_bytes_total - Bytes read
node_disk_written_bytes_total - Bytes written
node_disk_reads_completed_total - Read operations
node_disk_writes_completed_total - Write operations

Filesystem Metrics

node_filesystem_size_bytes - Filesystem size
node_filesystem_free_bytes - Filesystem free
node_filesystem_avail_bytes - Filesystem available
node_filesystem_files - Total inodes
node_filesystem_files_free - Free inodes

System Metrics

node_time_seconds - System time
node_boot_time_seconds - Boot time
node_context_switches_total - Context switches
node_forks_total - Forks
node_intr_total - Interrupts

2. Kube State Metrics

Source: kube-state-metrics Deployment (scraped by Prometheus)
Endpoint: http://<ksm-pod-ip>:8080/metrics
Labels: Varies by resource type

Important: Provides Kubernetes object state metrics (NOT performance metrics).

Pod Metrics

kube_pod_info - Pod information
kube_pod_status_phase - Pod phase (Running, Pending, etc.)
kube_pod_status_ready - Pod ready status
kube_pod_container_status_ready - Container ready status
kube_pod_container_status_restarts_total - Container restarts
kube_pod_labels - Pod labels
kube_pod_owner - Pod owner

Deployment Metrics

kube_deployment_status_replicas - Deployment replicas
kube_deployment_status_replicas_available - Available replicas
kube_deployment_status_replicas_unavailable - Unavailable replicas
kube_deployment_spec_replicas - Desired replicas
kube_deployment_metadata_generation - Generation

Node Metrics

kube_node_info - Node information
kube_node_status_condition - Node conditions (Ready, MemoryPressure, etc.)
kube_node_status_allocatable - Allocatable resources
kube_node_status_capacity - Node capacity
kube_node_spec_unschedulable - Unschedulable status

StatefulSet Metrics

kube_statefulset_status_replicas - StatefulSet replicas
kube_statefulset_status_replicas_ready - Ready replicas
kube_statefulset_replicas - Desired replicas

DaemonSet Metrics

kube_daemonset_status_number_ready - Ready pods
kube_daemonset_status_desired_number_scheduled - Desired pods
kube_daemonset_status_number_available - Available pods

Other Resource Metrics

Services: kube_service_*
ConfigMaps: kube_configmap_*
Secrets: kube_secret_*
PersistentVolumes: kube_persistentvolume_*
PersistentVolumeClaims: kube_persistentvolumeclaim_*
Ingress: kube_ingress_*
Jobs: kube_job_*
CronJobs: kube_cronjob_*

Application Metrics

OpenStack Service Metrics (via OTLP)

If OpenStack services are instrumented to send OTLP metrics, they will include: - Custom application metrics - Auto-instrumented framework metrics (HTTP requests, DB queries, etc.) - Resource usage from application perspective

Labels: Enriched with: - service_name (nova, neutron, cinder, etc.) - Kubernetes labels from k8sattributes processor - Custom application labels

Metric Label Reference

Common Labels Added by OTel Collectors

All metrics from OTel collectors are enriched with these labels (where applicable):

Label	Source	Description
`k8s_node_name`	k8sattributes + attributes processor	Kubernetes node name
`host_name`	attributes processor	Same as k8s_node_name
`service_instance_id`	attributes processor	Node name (for uniqueness)
`k8s_namespace_name`	k8sattributes	Kubernetes namespace
`k8s_pod_name`	k8sattributes	Pod name
`k8s_pod_uid`	k8sattributes	Pod UID
`k8s_deployment_name`	k8sattributes	Deployment name
`k8s_statefulset_name`	k8sattributes	StatefulSet name
`k8s_daemonset_name`	k8sattributes	DaemonSet name
`k8s_job_name`	k8sattributes	Job name
`k8s_cronjob_name`	k8sattributes	CronJob name
`k8s_container_name`	k8sattributes	Container name
`app`	k8sattributes (from pod label)	App label
`app_kubernetes_io_name`	k8sattributes	Standard app name
`app_kubernetes_io_component`	k8sattributes	App component
`app_kubernetes_io_instance`	k8sattributes	App instance
`app_kubernetes_io_version`	k8sattributes	App version
`deployment_environment`	attributes processor	Fixed: "genestack"
`k8s_cluster_name`	attributes processor	Fixed: "openstack-genestack-k8s-cluster"

Labels from Prometheus ServiceMonitors

Metrics scraped directly by Prometheus include:

Label	Description
`instance`	Scrape target address (IP:port)
`job`	ServiceMonitor job name (e.g., "kube-scheduler")
`endpoint`	Endpoint name
`service`	Service name
`namespace`	Namespace
`pod`	Pod name (if available)
`node`	Node name (if added via relabeling)

Note: To add node labels to ServiceMonitor metrics, add relabeling rules (see section above).

Estimated Metric Cardinality

Based on your configuration:

From OTel Collectors

hostmetrics: ~200-300 unique series per node × N nodes
kubeletstats: ⚠️ DISABLED (would be ~50-100 series per pod × P pods if enabled)
OTLP applications: Varies widely (1000s to 100,000s depending on instrumentation)

From Prometheus ServiceMonitors

API Server: ~5,000 series
Scheduler: ~500 series per scheduler
Controller Manager: ~1,000 series per controller
Kubelet (cAdvisor): ~100-200 series per pod × P pods (primary source for container metrics)
Kubelet (kubelet): ~100 series per node × N nodes
CoreDNS: ~50 series per CoreDNS pod
etcd: ~500 series per etcd member
Kube Proxy: ~100 series per node × N nodes
Node Exporter: ~800-1000 series per node × N nodes
Kube State Metrics: ~5-10 series per Kubernetes object

Total Estimate

For a typical cluster with: - 5 nodes - 100 pods - Standard Kubernetes components

Total: ~50,000-100,000 unique time series

For larger clusters, this can grow to 500,000+ series.

Note: With kubeletstats disabled, you're primarily using Prometheus-scraped container_* metrics for pod/container monitoring, which reduces the risk of duplicate metrics while still providing comprehensive coverage.

Troubleshooting

Metrics Not Appearing

Check if OTel collector is running:
```
kubectl get pods -n opentelemetry
```

Check collector logs for errors:

kubectl logs -n opentelemetry daemonset/opentelemetry-kube-stack-daemon-collector -c otc-container

Verify metrics are being collected (debug exporter):

kubectl logs -n opentelemetry daemonset/opentelemetry-kube-stack-daemon-collector \
  -c otc-container | grep "k8s_pod_cpu"

Check Prometheus targets:
Go to Prometheus UI → Status → Targets
Verify all ServiceMonitors are discovered and UP

Missing Labels

For OTel-collected metrics: Check that k8sattributes and attributes/add-node-labels processors are in the pipeline
For Prometheus-scraped metrics: Add relabeling rules to ServiceMonitors

Verify labels in Prometheus:

# Check a metric and see all its labels
system_cpu_time_seconds_total{k8s_node_name!=""}

High Cardinality Issues

If Prometheus is struggling with too many metrics:

Drop unnecessary metrics via metricRelabelings in ServiceMonitors
Reduce collection intervals (30s → 60s)
Limit metric groups in kubeletstats receiver
Use recording rules to aggregate high-cardinality metrics

Duplicate Metrics

If you see the same metric with different prefixes: - system_* from hostmetrics (OTel) - node_* from node-exporter (Prometheus) - container_* from both kubeletstats and cAdvisor

This is normal - they provide different levels of detail. Choose which source you prefer and drop duplicates if needed.

Summary

Your configuration collects metrics from:

OTel Collectors (Remote Write to Prometheus):
Host metrics (CPU, memory, disk, network, filesystem, paging, processes) via hostmetrics receiver
Application OTLP metrics (traces and metrics from instrumented applications)
Note: kubeletstats receiver is disabled to avoid duplicate sample errors
Prometheus Direct Scraping:
Kubernetes components (API server, scheduler, controller manager, etc.)
Kubelet/cAdvisor (pod and container metrics - container_* prefix)
Node exporter (detailed node metrics - node_* prefix)
Kube-state-metrics (Kubernetes object state - kube_* prefix)

Key Metric Sources by Prefix

Metric Prefix	Source	Collection Method	What It Measures
`system_*`	hostmetrics receiver	OTel DaemonSet → Remote Write	Node-level system metrics (CPU, memory, disk, network)
`container_*`	kubelet/cAdvisor	Prometheus scraping	Pod and container resource usage
`node_*`	node-exporter	Prometheus scraping	Detailed node metrics (more comprehensive than system_*)
`kube_*`	kube-state-metrics	Prometheus scraping	Kubernetes object state and metadata
`apiserver_`, `scheduler_`, etc.	K8s components	Prometheus scraping	Control plane component metrics
Custom app metrics	OTLP instrumented apps	OTel DaemonSet → Remote Write	Application-specific metrics

Common Query Patterns

Pod CPU Usage

# Using cAdvisor metrics (container_*)
sum by (pod, namespace) (
  rate(container_cpu_usage_seconds_total{container!=""}[5m])
)

Pod Memory Usage

# Using cAdvisor metrics (container_*)
sum by (pod, namespace) (
  container_memory_working_set_bytes{container!=""}
)

Node CPU Usage

# Using hostmetrics (system_*)
sum by (k8s_node_name) (
  rate(system_cpu_time_seconds_total{state!="idle"}[5m])
)

# Or using node-exporter (node_*)
1 - avg by (instance) (
  rate(node_cpu_seconds_total{mode="idle"}[5m])
)

Node Memory Usage

# Using hostmetrics (system_*)
sum by (k8s_node_name) (
  system_memory_usage_bytes{state="used"}
)

# Or using node-exporter (node_*)
node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes

All metrics are enriched with Kubernetes metadata and stored in Prometheus for querying in Grafana.

OpenTelemetry Base Metrics Collection - Complete Reference

Table of Contents

Metrics Collection Architecture

Collection Flow

Key Components

OTel Collector Metrics

1. Host Metrics (from hostmetrics receiver)

CPU Metrics

Memory Metrics

Disk Metrics

Filesystem Metrics

Network Metrics

Paging Metrics

Process Metrics

2. Pod and Container Metrics (from Prometheus cAdvisor scraping)

Container CPU Metrics

Container Memory Metrics

Container Network Metrics

Container Filesystem Metrics

Container Limits and Requests

Pod-Level Aggregations

Special Labels

3. kubeletstats Metrics (DISABLED in Current Config)

3. Application Metrics (via OTLP)

Kubernetes Component Metrics

1. API Server Metrics

2. Scheduler Metrics

3. Controller Manager Metrics

4. Kubelet Metrics (cAdvisor)

5. Kubelet Metrics (kubelet /metrics)

6. CoreDNS Metrics

7. etcd Metrics

8. Kube Proxy Metrics

Node and System Metrics

1. Node Exporter Metrics

CPU Metrics

Memory Metrics

Disk Metrics

Filesystem Metrics

Network Metrics

System Metrics

2. Kube State Metrics

Pod Metrics

Deployment Metrics

Node Metrics

StatefulSet Metrics

DaemonSet Metrics

Other Resource Metrics

Application Metrics

OpenStack Service Metrics (via OTLP)

Metric Label Reference

Common Labels Added by OTel Collectors

Labels from Prometheus ServiceMonitors

Estimated Metric Cardinality

From OTel Collectors

From Prometheus ServiceMonitors

Total Estimate

Troubleshooting

Metrics Not Appearing

Missing Labels

High Cardinality Issues

Duplicate Metrics

Summary

Key Metric Sources by Prefix

Common Query Patterns

Pod CPU Usage

Pod Memory Usage

Node CPU Usage

Node Memory Usage