April 1st, 2021 05:00

Unity driver: After running csi-install.sh unity pods show "1/2 CrashLoopBackOff"

Problem:
When deploying the csi-unity driver, running the csi-install.sh script leaves the node pods in a CrashLoopBackOff state.

unity-controllers = 5/5
unity-nodes = 1/2

The error seems to be "Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0" already present on machine"

How can I troubleshoot this?

Environment:

- Master and nodes running as virtual machines on VMware 6.7
- Running Kubernetes 1.18.5
- Downloaded the latest csi-unity driver from GitHub, so I believe this is version 1.5?

Troubleshooting performed:

- Made sure ./verify.sh showed success for every check.
- Followed all instructions in this link

Commands used to verify and install the driver

./verify.sh --namespace unity --values ./values.yaml --node-verify-user nodeadmin
./csi-install.sh --namespace unity --values ./values.yaml --node-verify-user nodeadmin


My ./values.yaml file used for install

csiDebug: "false"
volumeNamePrefix: emc-csi-vol
snapNamePrefix: emc-csi-snap
imagePullPolicy: IfNotPresent
certSecretCount: 1
syncNodeInfoInterval: 15
controllerCount: 2
createStorageClassesWithTopology: true
allowRWOMultiPodAccess: "false"
defaultFsType: ext4
storageClassProtocols:
  - protocol: "iSCSI"
  - protocol: "NFS"
storageArrayList:
  - name: "APM00203025495"
    isDefaultArray: "true"
    storageClass:
      storagePool: "TieredStoragePool_01"
      FsType: "ext4"
      thinProvisioned: "true"
      isDataReductionEnabled: "false"
      tieringPolicy: "0"
      hostIOLimitName: ""
      nasServer: ""
      reclaimPolicy: Delete
    snapshotClass:
      retentionDuration: ""

get pods output

NAME                                READY   STATUS             RESTARTS   AGE
snapshot-controller-0               1/1     Running            0          20h
unity-controller-78f5d6bc9d-p9r9k   5/5     Running            0          3m23s
unity-controller-78f5d6bc9d-s5dgn   5/5     Running            0          3m23s
unity-node-89nvr                    1/2     CrashLoopBackOff   4          3m23s
unity-node-rkbdb                    1/2     CrashLoopBackOff   4          3m23s
unity-node-tktck                    1/2     CrashLoopBackOff   4          3m23s
unity-node-wkjpz                    1/2     CrashLoopBackOff   4          3m23s
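
To spot the stuck pods quickly, a listing like the one above can be filtered for pods whose READY count is short of the total. This is only a minimal sketch; the helper name `not_ready` is made up for illustration and is not part of the csi-unity tooling.

```shell
# Hypothetical helper (not part of csi-unity): reads `kubectl get pods` output
# on stdin and prints NAME, READY and STATUS for pods where ready != total.
not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")          # "1/2" -> ready[1]="1", ready[2]="2"
    if (ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

Typical use would be `kubectl get pods --namespace unity | not_ready`, assuming the driver was installed into the `unity` namespace as in the commands above.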

Output of kubectl describe on one of the pods (they all show the same output)

Name: unity-node-89nvr
Namespace: unity
Priority: 0
Node: taxmd-k8n03-v/10.104.8.143
Start Time: Thu, 01 Apr 2021 08:39:48 -0400
Labels: app=unity-node
controller-revision-hash=6d957c6896
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 10.104.8.143
IPs:
IP: 10.104.8.143
Controlled By: DaemonSet/unity-node
Containers:
driver:
Container ID: docker://f7cc1a40b15f2a5b8ea2503c88f9c072c76594052709a041281d1cd97ad6699c
Image: dellemc/csi-unity:v1.5.0
Image ID: docker-pullable://dellemc/csi-unity@sha256:9ab99020e1a5939d6348f488bda0b037ef8c584a44c941ceeedfcee250f15e0d
Port: <none>
Host Port: <none>
Args:
--driver-name=csi-unity.dellemc.com
--driver-config=/unity-config/config
State: Running
Started: Thu, 01 Apr 2021 08:39:59 -0400
Ready: True
Restart Count: 0
Environment:
CSI_ENDPOINT: unix:///var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock
X_CSI_MODE: node
X_CSI_UNITY_AUTOPROBE: true
X_CSI_UNITY_ALLOW_MULTI_POD_ACCESS: false
X_CSI_DEBUG: false
X_CSI_PRIVATE_MOUNT_DIR: /var/lib/kubelet/plugins/unity.emc.dell.com/disks
X_CSI_EPHEMERAL_STAGING_PATH: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/
X_CSI_ISCSI_CHROOT: /noderoot
X_CSI_UNITY_NODENAME: (v1:spec.nodeName)
X_CSI_UNITY_NODENAME_PREFIX:
GOUNITY_DEBUG: false
SSL_CERT_DIR: /certs
X_CSI_UNITY_SYNC_NODEINFO_INTERVAL: 15
Mounts:
/certs from certs (ro)
/dev from dev (rw)
/noderoot from noderoot (rw)
/unity-config from unity-config (rw)
/var/lib/kubelet/plugins/kubernetes.io/csi from volumedevices-path (rw)
/var/lib/kubelet/plugins/unity.emc.dell.com from driver-path (rw)
/var/lib/kubelet/pods from pods-path (rw)
/var/run/secrets/kubernetes.io/serviceaccount from unity-node-token-cglbm (ro)
registrar:
Container ID: docker://44914aba36dcd4aa0f63403d0b4d03eadd49f9babb9a1a17258811a9899ef02c
Image: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0
Image ID: docker-pullable://k8s.gcr.io/sig-storage/csi-node-driver-registrar@sha256:a61d309da54641db41fb8f35718f744e9f730d4d0384f8c4b186ddc9f06cbd5f
Port: <none>
Host Port: <none>
Args:
--v=5
--csi-address=$(ADDRESS)
--kubelet-registration-path=/var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock
State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 01 Apr 2021 08:41:44 -0400
Finished: Thu, 01 Apr 2021 08:41:49 -0400
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 01 Apr 2021 08:40:58 -0400
Finished: Thu, 01 Apr 2021 08:41:03 -0400
Ready: False
Restart Count: 4
Environment:
ADDRESS: /csi/csi_sock
KUBE_NODE_NAME: (v1:spec.nodeName)
Mounts:
/csi from driver-path (rw)
/registration from registration-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from unity-node-token-cglbm (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
registration-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins_registry/
HostPathType: DirectoryOrCreate
driver-path:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins/unity.emc.dell.com
HostPathType: DirectoryOrCreate
volumedevices-path:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/plugins/kubernetes.io/csi
HostPathType: DirectoryOrCreate
pods-path:
Type: HostPath (bare host directory volume)
Path: /var/lib/kubelet/pods
HostPathType: Directory
dev:
Type: HostPath (bare host directory volume)
Path: /dev
HostPathType: Directory
noderoot:
Type: HostPath (bare host directory volume)
Path: /
HostPathType: Directory
certs:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: unity-certs-0
SecretOptionalName: <nil>
unity-config:
Type: Secret (a volume populated by a Secret)
SecretName: unity-creds
Optional: false
unity-node-token-cglbm:
Type: Secret (a volume populated by a Secret)
SecretName: unity-node-token-cglbm
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/disk-pressure:NoExecute
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoExecute
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoExecute
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned unity/unity-node-89nvr to taxmd-k8n03-v
Normal Pulling 2m5s kubelet, taxmd-k8n03-v Pulling image "dellemc/csi-unity:v1.5.0"
Normal Pulled 119s kubelet, taxmd-k8n03-v Successfully pulled image "dellemc/csi-unity:v1.5.0"
Normal Created 116s kubelet, taxmd-k8n03-v Created container driver
Normal Started 116s kubelet, taxmd-k8n03-v Started container driver
Normal Pulling 116s kubelet, taxmd-k8n03-v Pulling image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0"
Normal Pulled 114s kubelet, taxmd-k8n03-v Successfully pulled image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0"
Normal Started 57s (x4 over 114s) kubelet, taxmd-k8n03-v Started container registrar
Warning BackOff 23s (x6 over 104s) kubelet, taxmd-k8n03-v Back-off restarting failed container
Normal Created 11s (x5 over 114s) kubelet, taxmd-k8n03-v Created container registrar
Normal Pulled 11s (x4 over 109s) kubelet, taxmd-k8n03-v Container image "k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0" already present on machine
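
The events above only show that the registrar container keeps exiting with code 1; the actual reason would normally be in that container's own log rather than in the pod events. A minimal sketch for pulling it, assuming kubectl is pointed at this cluster. The wrapper name `registrar_logs` is hypothetical; the flags are standard `kubectl logs` options.

```shell
# Hypothetical wrapper: show the log of the previous (crashed) run of the
# "registrar" container in a unity-node pod.
#   $1 = pod name, $2 = namespace (defaults to "unity", as used in this thread)
registrar_logs() {
  kubectl logs "$1" --namespace "${2:-unity}" --container registrar --previous
}
```

For the pod described above this would be `registrar_logs unity-node-89nvr`. Dropping `--previous` shows the current run instead, and replacing `registrar` with `driver` shows the log of the driver container in the same pod.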

April 2nd, 2021 13:00

I got it working!
The driver container was not failing at all; it was the registrar container.
The problem was that the Unity array still had old hosts registered on it from the failed attempts. I cleared those out and it finally worked. Other prerequisites were also not in place (I did not have the iSCSI initiators set up correctly on the nodes to begin with). There were other issues as well; I will make a new post detailing tips and tricks for others.
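
For the iSCSI prerequisite mentioned here, one quick per-node sanity check can be sketched as follows. It assumes the nodes use open-iscsi with its usual config file /etc/iscsi/initiatorname.iscsi; both that path and the helper name `has_initiator_name` are assumptions, not details from this thread.

```shell
# Hypothetical helper: succeeds if the given file contains a non-empty
# InitiatorName entry. The default path is where open-iscsi commonly keeps it
# (an assumption); pass another path if your distribution differs.
has_initiator_name() {
  file="${1:-/etc/iscsi/initiatorname.iscsi}"
  [ -r "$file" ] && grep -Eq '^InitiatorName=.+' "$file"
}
```

Run on each worker node, for example `has_initiator_name && echo "initiator name set"`. This only confirms that an initiator name is configured on the node; it does not check whether the host is registered on the array.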

March 13th, 2024 22:18

@storage-dude - Did you create the detailed post on this issue? Can you please explain what the old hosts left over from the failed attempts are, and where to clear them?
