Container Storage Interface (CSI) Drivers Family: When a Node Goes Down, Block Volumes Attached to the Node Cannot Be Attached to Another Node

Summary: When a node goes down (due to a crash, shutdown, or power-off), block volumes that are attached to the node cannot be attached to another node.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

When a node goes down (due to a crash, shutdown, or power-off), block volumes that are attached to the node cannot be attached to another node.

The issue is specific to block volumes; it is not seen for NFS volumes.

The issue affects the following drivers:

  • CSI Driver for PowerFlex
  • CSI Driver for PowerMax
  • CSI Driver for PowerScale
  • CSI Driver for Unity

This issue does not affect the CSI Driver for PowerStore.

The issue is reported in GitHub issue #282.

Steps to Reproduce: 

  1. Create PVC1 and create POD1 using it (see the example manifests after this list).
  2. Check which node POD1 is running on and power off that node from vSphere.
  3. When the node becomes NotReady, try to delete POD1. (It is stuck in the Terminating state because the node is not ready.)
  4. Try to create POD2 using the same PVC1. POD2 remains in the ContainerCreating state with the following error in its describe output:
Warning FailedAttachVolume 43s attachdetach-controller Multi-Attach error for volume "csivol-18eb3daee0" Volume is already used by pod(s) iscsipod1-p
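
For reference, the following is a minimal sketch of the objects from steps 1 and 4. The claim name, storage class, capacity, and access mode are taken from the outputs below; the container image, command, and mount path are illustrative assumptions.

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iscsipvc1-p
spec:
  accessModes:
    - ReadWriteOnce               # block volume, attachable to one node at a time
  storageClassName: powerstore-iscsi
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: iscsipod1-p
spec:
  containers:
    - name: app
      image: busybox              # illustrative image
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data        # illustrative mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: iscsipvc1-p
EOF

POD2 in step 4 is the same Pod definition with the name iscsipod2-p, referencing the same claim iscsipvc1-p.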

Expected Result: The pod should be deleted even when the node is not ready.

Actual Result: The pod is stuck in the Terminating state because the node is not ready.
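
The delete in step 3 can be issued as shown below; the pod name is taken from the output that follows. The command does not complete normally because the kubelet on the failed node can never confirm termination:

kubectl delete pod iscsipod1-p -n <namespace>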

The following output shows the original pod stuck in Terminating and the new pod stuck in ContainerCreating:

kubectl get pods -o wide

NAME        READY STATUS            RESTARTS AGE   IP     NODE    NOMINATED NODE READINESS GATES
iscsipod1-p 1/1   Terminating       0        9m43s <IP>   <Node3> <none>         <none>
iscsipod2-p 0/1   ContainerCreating 0        55s   <none> <Node2> <none>         <none>


The following command shows that the node is Not Ready:

kubectl get nodes

NAME  STATUS   ROLES                AGE  VERSION
Node1 Ready    control-plane,master 163d v1.23.0
Node2 Ready    <none>               162d v1.23.0
Node3 NotReady <none>               162d v1.23.0


The following command shows that the PVC is still bound to the PV:

kubectl get pvc -n <namespace>

NAME        STATUS VOLUME            CAPACITY ACCESS MODES STORAGECLASS     AGE
iscsipvc1-p Bound  csivol-18eb3daee0 5Gi      RWO          powerstore-iscsi 10m


The following command shows the Multi-Attach warning; because the volume's access mode is ReadWriteOnce, it cannot be attached to a second node while the stale attachment to the failed node remains:

kubectl describe pod -n <namespace>

...
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason             Age  From                    Message
----    ------             ---- ----                    -------
Normal  Scheduled          108s default-scheduler       Successfully assigned default/iscsipod2-p to lglw3178
Warning FailedAttachVolume 108s attachdetach-controller Multi-Attach error for volume "csivol-18eb3daee0" Volume is already used by pod(s) iscsipod1-p

Cause

The root cause is that the attacher sidecar cannot send ControllerUnpublishVolume() for the node that went down, so the VolumeAttachment object for the volume is never cleaned up. See GitHub issue #215 for details.
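
The stale attachment can be confirmed on the cluster: because ControllerUnpublishVolume() never completes, the VolumeAttachment object still shows the volume attached to the failed node. A sketch of the expected output (the VolumeAttachment name, attacher, and age are placeholders):

kubectl get volumeattachment

NAME       ATTACHER      PV                NODE    ATTACHED AGE
csi-<hash> <driver name> csivol-18eb3daee0 <Node3> true     12m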

Resolution

Workaround:
  1. Force delete the pod that was running on the node that went down:
kubectl delete po <pod name> --force --grace-period=0
  2. Delete the VolumeAttachment to the node that went down:
kubectl delete volumeattachment <volumeattachment>

The volume can now be attached to the new node.
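
As a concrete example using the names from the outputs above (the VolumeAttachment name is a generated identifier, so it must be looked up first):

# Force delete the stuck pod
kubectl delete po iscsipod1-p -n <namespace> --force --grace-period=0

# Find the VolumeAttachment that still references the failed node
kubectl get volumeattachment | grep csivol-18eb3daee0

# Delete it so the volume can be published to the new node
kubectl delete volumeattachment <volumeattachment>

After the VolumeAttachment is removed, the attach/detach controller can attach csivol-18eb3daee0 to the new node, and iscsipod2-p moves from ContainerCreating to Running.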

Resolution:
This article will be updated when a fix has been released.
Article Properties
Article Number: 000200778
Article Type: Solution
Last Modified: 07 Jul 2023
Version:  8