Amazon EKS and Kubernetes Container Insights metrics

Focus mode

Amazon EKS and Kubernetes Container Insights metrics - Amazon CloudWatch

The following tables list the metrics and dimensions that Container Insights collects for Amazon EKS and Kubernetes. These metrics are in the ContainerInsights namespace. For more information, see Metrics.

If you do not see any Container Insights metrics in your console, be sure that you have completed the setup of Container Insights. Metrics do not appear before Container Insights has been set up completely. For more information, see Setting up Container Insights.

Metric name	Dimensions	Description
`cluster_failed_node_count`	`ClusterName`	The number of failed worker nodes in the cluster. A node is considered failed if it is suffering from any node conditions. For more information, see Conditions in the Kubernetes documentation.
`cluster_node_count`	`ClusterName`	The total number of worker nodes in the cluster.
`namespace_number_of_running_pods`	`Namespace` `ClusterName` `ClusterName`	The number of pods running per namespace in the resource that is specified by the dimensions that you're using.
`node_cpu_limit`	`ClusterName`	The maximum number of CPU units that can be assigned to a single node in this cluster.
`node_cpu_reserved_capacity`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The percentage of CPU units that are reserved for node components, such as kubelet, kube-proxy, and Docker. Formula: `node_cpu_request / node_cpu_limit` Note `node_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`node_cpu_usage_total`	`ClusterName`	The number of CPU units being used on the nodes in the cluster.
`node_cpu_utilization`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The total percentage of CPU units being used on the nodes in the cluster. Formula: `node_cpu_usage_total / node_cpu_limit`
`node_gpu_limit`	`ClusterName` `ClusterName`, `InstanceId`, `NodeName`	The total number of GPU(s) available on the node.
`node_gpu_usage_total`	`ClusterName` `ClusterName`, `InstanceId`, `NodeName`	The number of GPU(s) being used by the running pods on the node.
`node_gpu_reserved_capacity`	`ClusterName` `ClusterName`, `InstanceId`, `NodeName`	The percentage of GPU currently being reserved on the node. The formula is, `node_gpu_request / node_gpu_limit`. Note `node_gpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`node_filesystem_utilization`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The total percentage of file system capacity being used on nodes in the cluster. Formula: `node_filesystem_usage / node_filesystem_capacity` Note `node_filesystem_usage` and `node_filesystem_capacity` are not reported directly as metrics, but are fields in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`node_memory_limit`	`ClusterName`	The maximum amount of memory, in bytes, that can be assigned to a single node in this cluster.
`node_memory_reserved_capacity`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The percentage of memory currently being used on the nodes in the cluster. Formula: `node_memory_request / node_memory_limit` Note `node_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`node_memory_utilization`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The percentage of memory currently being used by the node or nodes. It is the percentage of node memory usage divided by the node memory limitation. Formula: `node_memory_working_set / node_memory_limit`.
`node_memory_working_set`	`ClusterName`	The amount of memory, in bytes, being used in the working set of the nodes in the cluster.
`node_network_total_bytes`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The total number of bytes per second transmitted and received over the network per node in a cluster. Formula: `node_network_rx_bytes + node_network_tx_bytes` Note `node_network_rx_bytes` and `node_network_tx_bytes` are not reported directly as metrics, but are fields in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`node_number_of_running_containers`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The number of running containers per node in a cluster.
`node_number_of_running_pods`	`NodeName`, `ClusterName`, `InstanceId` `ClusterName`	The number of running pods per node in a cluster.
`pod_cpu_reserved_capacity`	`PodName`, `Namespace`, `ClusterName` `ClusterName`	The CPU capacity that is reserved per pod in a cluster. Formula: `pod_cpu_request / node_cpu_limit` Note `pod_cpu_request` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`pod_cpu_utilization`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The percentage of CPU units being used by pods. Formula: `pod_cpu_usage_total / node_cpu_limit`
`pod_cpu_utilization_over_pod_limit`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The percentage of CPU units being used by pods relative to the pod limit. Formula: `pod_cpu_usage_total / pod_cpu_limit`
`pod_gpu_request`	`ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`	The GPU requests for the pod. This value must always be equal to `pod_gpu_limit`.
`pod_gpu_limit`	`ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`	The maximum number of GPU(s) that can be assigned to the pod in a node.
`pod_gpu_usage_total`	`ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`	The number of GPU(s) being allocated on the pod.
`pod_gpu_reserved_capacity`	`ClusterName` `ClusterName`, `Namespace`, `PodName` `ClusterName`, `FullPodName`, `Namespace`, `PodName`	The percentage of GPU currently being reserved for the pod. The formula is - pod_gpu_request / node_gpu_reserved_capacity.
`pod_memory_reserved_capacity`	`PodName`, `Namespace`, `ClusterName` `ClusterName`	The percentage of memory that is reserved for pods. Formula: `pod_memory_request / node_memory_limit` Note `pod_memory_request` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`pod_memory_utilization`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The percentage of memory currently being used by the pod or pods. Formula: `pod_memory_working_set / node_memory_limit`
`pod_memory_utilization_over_pod_limit`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The percentage of memory that is being used by pods relative to the pod limit. If any containers in the pod don't have a memory limit defined, this metric doesn't appear. Formula: `pod_memory_working_set / pod_memory_limit`
`pod_network_rx_bytes`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The number of bytes per second being received over the network by the pod. Formula: `sum(pod_interface_network_rx_bytes)` Note `pod_interface_network_rx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`pod_network_tx_bytes`	`PodName`, `Namespace`, `ClusterName` `Namespace`, `ClusterName` `Service`, `Namespace`, `ClusterName` `ClusterName`	The number of bytes per second being transmitted over the network by the pod. Formula: `sum(pod_interface_network_tx_bytes)` Note `pod_interface_network_tx_bytes` is not reported directly as a metric, but is a field in performance log events. For more information, see Relevant fields in performance log events for Amazon EKS and Kubernetes.
`pod_number_of_container_restarts`	`PodName`, `Namespace`, `ClusterName`	The total number of container restarts in a pod.
`service_number_of_running_pods`	`Service`, `Namespace`, `ClusterName` `ClusterName`	The number of pods running the service or services in the cluster.

Kueue metrics

Beginning with version v2.4.0-eksbuild.1 of the the CloudWatch Observability EKS add-on, Container Insights for Amazon EKS supports collecting Kueue metrics from Amazon EKS clusters. For more information about the add-on, see Install the CloudWatch agent with the Amazon CloudWatch Observability EKS add-on or the Helm chart.

For information about enabling the metrics, see Enable Kueue metrics to enable them.

The Kueue metrics that are collected are listed in the following table. These metrics are published into the ContainerInsights/Prometheus namespace in CloudWatch. Some of these metrics use the following dimensions:

ClusterQueue is the name of the ClusterQueue
The possible values of Status are active and inadmissible
The possible values of Reason are Preempted, PodsReadyTimeout, AdmissionCheck, ClusterQueueStopped, and InactiveWorkload
Flavor is the referenced flavor.
Resource refers to cluster computer resources, such as cpu, memory, gpu, and so on.

Metric name	Dimensions	Description
`kueue_pending_workloads`	`ClusterName`, `ClusterQueue`, `Status` `ClusterName`, `ClusterQueue` `ClusterName`, `Status` `ClusterName`	The number of pending workloads.
`kueue_evicted_workloads_total`	`ClusterName`, `ClusterQueue`, `Reason` `ClusterName`, `ClusterQueue` `ClusterName`, `Reason` `ClusterName`	The total number of evicted workloads.
`kueue_admitted_active_workloads`	`ClusterName`, `ClusterQueue` `ClusterName`	The number of admitted workloads that are active (unsuspended and not finished).
`kueue_cluster_queue_resource_usage`	`ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`	Reports the total resource usage of the ClusterQueue.
`kueue_cluster_queue_nominal_quota`	`ClusterName`, `ClusterQueue`, `Resource`, `Flavor` `ClusterName`, `ClusterQueue`, `Resource` `ClusterName`, `ClusterQueue`, `Flavor` `ClusterName`, `ClusterQueue` `ClusterName`	Reports the resource quota of the ClusterQueue.

Warning Javascript is disabled or is unavailable in your browser.

To use the Amazon Web Services Documentation, Javascript must be enabled. Please refer to your browser's Help pages for instructions.

Document Conventions

Amazon EKS and Kubernetes Container Insights with enhanced observability metrics

Performance log reference

Next topic:

Performance log reference

Previous topic:

Amazon EKS and Kubernetes Container Insights with enhanced observability metrics

Select your cookie preferences

Customize cookie preferences

Essential

Performance

Functional

Advertising

Unable to save cookie preferences

Amazon EKS and Kubernetes Container Insights metrics

Note

Note

Note

Note

Note

Note

Note

Note

Note

Kueue metrics

Next topic:

Previous topic:

Need help?

On this page

Did this page help you?