PowerProtect: Kubernetes backup failed with error 'controller pod is not running'
Summary: PPDM Kubernetes backup failed with error 'controller pod is not running'
Symptoms
In the instance where this was observed, all PPDM Kubernetes backups started to fail after PPDM was recovered from its server disaster recovery backup. The issue may also occur in other situations.
Kubernetes backups fail with the error 'controller pod is not running'.
The following errors can be observed in the logs:
2021-07-21T03:49:48.340Z ERROR [] [task-5011a057-340f-40fb-8cd8-12414685d058] [][][][TRACE_ID:a66ce529604914ad;JOB_ID:a9b8915af1637407][] [K8sHelperApi.isDone(90)] - Failed to wait on job com.emc.dpsg.ecdm.baseresourceservice.exception.ValidationServiceException: controller pod is not running
2021-07-21T03:50:14.065Z WARN [] [dsSource-plpd-testcluster] [][][][][] [c.e.b.c.s.p.K8sHealthMonitor.checkPodHealth(200)] - Controller Pod is down, cluster: <k8s_cluster_name>, age=PT153H49M43.065S
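The age value in the K8sHealthMonitor warning has the shape of an ISO 8601 duration (PT153H49M43.065S is 153 hours, 49 minutes, 43.065 seconds, a little over six days). As a minimal sketch for spotting this symptom in a collected log, the following Python function extracts the cluster name and the downtime from such a line. The function name and the regular expression are illustrative assumptions, not part of PPDM, and the pattern only assumes durations expressed in hours, minutes, and seconds as in the sample above.

```python
import re
from datetime import timedelta

# Pattern for the K8sHealthMonitor warning shown above. The age is assumed to
# be an ISO 8601 duration limited to hours, minutes, and seconds (PT..H..M..S).
DOWN_RE = re.compile(
    r"Controller Pod is down, cluster: (?P<cluster>[^,]+), "
    r"age=PT(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?"
)


def parse_controller_down(line):
    """Return (cluster, downtime) for a matching log line, else None."""
    match = DOWN_RE.search(line)
    if match is None:
        return None
    downtime = timedelta(
        hours=int(match.group("h") or 0),
        minutes=int(match.group("m") or 0),
        seconds=float(match.group("s") or 0),
    )
    return match.group("cluster"), downtime
```

Applied to each line of a log file, this returns None for unrelated lines and a (cluster, timedelta) pair for the warning, which makes it easy to see how long the controller pod has been reported down.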
Pod status for that Kubernetes cluster (the listing below has the column layout of kubectl get pods --all-namespaces: namespace, name, ready, status, restarts, age):
powerprotect powerprotect-controller-666ffccbbf-p5rwh 0/1 ImagePullBackOff 0 6d12h
velero-ppdm backup-driver-587cfcdf59-2mc8p 1/1 Running 0 49d
velero-ppdm velero-5df5fcd896-p68rw 1/1 Running 0 49d
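To pick out the failing pod from a longer listing, the check can be sketched in Python as below. This is an illustrative helper, not a PPDM or kubectl feature: the function name is hypothetical, and it assumes the whitespace-separated column order shown above (namespace, name, ready, status, restarts, age). Pods in any status other than Running, including completed jobs, are reported.

```python
def find_unhealthy_pods(listing):
    """Return (namespace, pod, status) for pods that are not fully ready and Running."""
    unhealthy = []
    for line in listing.splitlines():
        fields = line.split()
        # Expected columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
        # Skip blank or short lines and the header row, if present.
        if len(fields) < 6 or fields[0] == "NAMESPACE":
            continue
        namespace, name, ready, status = fields[:4]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            unhealthy.append((namespace, name, status))
    return unhealthy
```

With the three lines above as input, only the powerprotect-controller pod in ImagePullBackOff is returned; the two velero-ppdm pods are 1/1 Running and are skipped.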
Cause
The powerprotect-controller pod is unable to pull its required image from the internet, which leaves the pod in the ImagePullBackOff state.
Resolution
1. Check whether the Kubernetes cluster can access Docker Hub at https://hub.docker.com/ and Quay at https://quay.io/ to pull the required images.
2. If the Kubernetes cluster cannot access these sites because of a firewall or other restrictions, pull the images to a local registry that the cluster can access, then follow this procedure:
1). Create the file /usr/local/brs/lib/cndm/config/application.properties on the PowerProtect Data Manager appliance with the following contents:
k8s.docker.registry=fqdn:port
k8s.image.pullsecrets=secret resource name
For example, k8s.docker.registry=artifacts.example.com:8446. Specify the k8s.image.pullsecrets entry only if you require an image pull secret.
2). Run cndm restart to apply the properties.
Note: See the PPDM Administration and User Guide for more details.
3. Because the Kubernetes cluster has already been added as an asset source in the PPDM GUI, run a manual discovery of the Kubernetes cluster after completing step 1 or step 2.
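The resolution steps can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not a Dell-provided tool: can_reach and render_properties are hypothetical helper names, the reachability check only tells you whether the host it runs on gets an HTTP response from a URL (so it is only meaningful when run from a cluster node or a host behind the same firewall), and a successful response does not by itself guarantee that image pulls will work. The property keys are the two named in step 2.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=10):
    """Return True if a request to url receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path works.
        return True
    except OSError:
        # DNS failure, refused connection, timeout, or TLS error.
        return False


def render_properties(registry, pull_secret=None):
    """Build the application.properties contents described in step 2."""
    lines = [f"k8s.docker.registry={registry}"]
    if pull_secret:
        # Only needed when the local registry requires an image pull secret.
        lines.append(f"k8s.image.pullsecrets={pull_secret}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for site in ("https://hub.docker.com/", "https://quay.io/"):
        print(site, "reachable" if can_reach(site) else "NOT reachable")
    # Example registry value taken from the article; replace with your own.
    print(render_properties("artifacts.example.com:8446"), end="")
```

The rendered text is what would go into /usr/local/brs/lib/cndm/config/application.properties on the PowerProtect Data Manager appliance before running cndm restart and a manual discovery.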
Affected Products
PowerProtect Data Manager
Article Properties
Article Number: 000190024
Article Type: Solution
Last Modified: 27 Aug 2022
Version: 6