Decoding Kubernetes: When HPA Can’t Fetch Metrics

The Horizontal Pod Autoscaler (HPA) is pivotal in Kubernetes. It’s like our trusty assistant, automatically adjusting the number of pods in a deployment according to observed metrics like CPU usage. However, there are moments when it encounters hurdles. One such instance is when you stumble upon error messages such as:

Name:                                                  widget-app-sun
Namespace:                                             development
...
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 55%
Min replicas:                                          1
Max replicas:                                          3
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: unable to get metrics for resource cpu: no metrics returned from resource metrics API
Events:
  Type     Reason                        Age                 From                       Message
  ----     ------                        ----                ----                       -------
  Warning  FailedComputeMetricsReplicas  20m (x20 over 9m)   horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: no metrics returned from resource metrics API
  Warning  FailedGetResourceMetric       12s (x29 over 9m)   horizontal-pod-autoscaler  unable to get metrics for resource cpu: no metrics returned from resource metrics API

What’s the Scoop?

This cryptic message is essentially HPA’s way of saying, “I’m having a hard time fetching those CPU metrics I need.” But why? Here are a few culprits:

Perhaps Metrics-server isn’t installed or isn’t operating correctly.
Maybe Metrics-server is present, but it’s struggling to fetch metrics from the nodes.
It could be a misconfiguration on HPA’s end.
Sometimes, network policies or RBAC restrictions come into play, obstructing access to the metrics API.

The Detective Work: Troubleshooting Steps

.- Is Metrics-server Onboard?

kubectl get deployments -n kube-system | grep metrics-server

metrics-server           1/1     1            1           221d

If it’s missing in action, it’s time to deploy it. Helm is a handy tool for this. https://artifacthub.io/packages/helm/metrics-server/metrics-server

.- How’s Metrics-server Feeling Today?

kubectl get pods -n kube-system -l k8s-app=metrics-server

NAME                              READY   STATUS    RESTARTS      AGE
metrics-server-5f9f776df5-zlg42   1/1     Running   6 (71d ago)   221d

Make sure it’s running smoothly. If it’s throwing a tantrum, dive into its logs:

kubectl logs metrics-server-5f9f776df5-zlg42 -n kube-system

I0730 17:07:50.422754       1 secure_serving.go:266] Serving securely on [::]:10250
I0730 17:07:50.425140       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0730 17:07:50.425155       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController

.- A Peek into Metrics-server’s Config

Sometimes, it needs some flags to communicate correctly, especially if your cluster has a unique CNI or is lounging on a special cloud provider. You might need to check and adjust flags like:

--kubelet-preferred-address-types or --kubelet-insecure-tls

.- The Network or RBAC Culprits

Are there any stringent network policies that are hindering the conversation between the metrics-server and the API server or the kubelets? Or maybe, metrics-server doesn’t have the right RBAC permissions to access metrics?

Peek into network policies in the kube-system namespace:

kubectl get networkpolicy -n kube-system

And don’t forget to inspect the ClusterRole:

kubectl describe clusterrole | grep metrics-server -A10

Name:         system:metrics-server
Labels:       objectset.rio.cattle.io/hash=9a6f488150c249811b9df07e116280789628963e
Annotations:  objectset.rio.cattle.io/applied:
                H4sIAAAAAAAA/4yRwY6bMBCGX6WasyEhSQkg9VD10ENvPfRScRjsSXABG80Yom7Eu69MotVKq93syRr/+j7711wBR/uHWKx3UAE3qFOcQuvZPmGw3qVdIan1mzkDBZ11Bir40U8SiH...

.- Version Harmony: HPA & Metrics-server

Compatibility matters! Ensure HPA and metrics-server are on the same page. Sometimes, a version mismatch might be the root cause.

Here’s how to check your Kubernetes version:

kubectl version

Client Version: v1.28.2
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3

And let’s not forget about the metrics-server:

kubectl describe deployment metrics-server -n kube-system | grep Image:

Image:      rancher/mirrored-metrics-server:v0.6.2

Troubleshooting in Kubernetes can be challenging, but with a systematic approach, many issues, like the HPA metrics problem, can be resolved. It’s essential to understand the components involved and to remain adaptable. As Kubernetes continues to evolve, so too should our methods for diagnosing and fixing problems.

Share