Container Storage Interface Drivers Family: When a Node Goes Down, Block Volumes that are Attached to the Node Cannot be Attached to Another Node
Summary: When a node goes down (for example, due to a node crash or a power-off), block volumes that are attached to the node cannot be attached to another node.
Symptoms
When a node goes down (for example, due to a node crash or a power-off), block volumes that are attached to the node cannot be attached to another node.
The issue is specific to block volumes only; NFS volumes are not affected.
The issue affects the following drivers:
- CSI Driver for PowerFlex
- CSI Driver for PowerMax
- CSI Driver for PowerScale
- CSI Driver for Unity
This issue does not affect the CSI Driver for PowerStore.
The issue is reported in GitHub issue #282.
Steps to Reproduce:
- Create PVC1 and create POD1 that uses it.
- Check which node POD1 was scheduled on and power off that node from vSphere.
- When the node becomes NotReady, try to delete POD1. (It is stuck in the Terminating state because the node is NotReady.)
- Try to create POD2 using the same PVC1. POD2 remains in the ContainerCreating state, with the following error in its describe output:
Warning FailedAttachVolume 43s attachdetach-controller Multi-Attach error for volume "csivol-18eb3daee0" Volume is already used by pod(s) iscsipod1-p
Expected Result: The pod should be deleted even when the node is NotReady.
Actual Result: The pod is stuck in the Terminating state because the node is NotReady.
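The reproduction steps above can be sketched as a manifest. This is a minimal illustration, not taken from the original report: the storage class name and pod/PVC names come from the command output later in this article, while the container image and mount path are placeholder assumptions.

```yaml
# Hypothetical PVC and pod used to reproduce the issue.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: iscsipvc1-p
spec:
  accessModes:
    - ReadWriteOnce            # block volume, single-node attach
  resources:
    requests:
      storage: 5Gi
  storageClassName: powerstore-iscsi   # replace with your CSI storage class
---
apiVersion: v1
kind: Pod
metadata:
  name: iscsipod1-p
spec:
  containers:
    - name: app
      image: busybox           # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data     # placeholder mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: iscsipvc1-p
```

Because the PVC access mode is ReadWriteOnce, the volume can be attached to only one node at a time, which is why the second pod hits the Multi-Attach error while the stale attachment persists.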
The below output shows the original pod terminating and the new pod stuck in Container Creating:
kubectl get pods -o wide
NAME          READY   STATUS              RESTARTS   AGE     IP       NODE      NOMINATED NODE   READINESS GATES
iscsipod1-p   1/1     Terminating         0          9m43s   <IP>     <Node3>   <none>           <none>
iscsipod2-p   0/1     ContainerCreating   0          55s     <none>   <Node2>   <none>           <none>
The following command shows that the node is Not Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
Node1 Ready control-plane,master 163d v1.23.0
Node2 Ready <none> 162d v1.23.0
Node3 NotReady <none> 162d v1.23.0
The following command shows that the PVC is still bound to the PV:
kubectl get pvc -n <namespace>
NAME          STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS       AGE
iscsipvc1-p   Bound    csivol-18eb3daee0   5Gi        RWO            powerstore-iscsi   10m
The following command shows the Multi-Attach warning:
kubectl describe pod -n <namespace>
...
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 108s default-scheduler Successfully assigned default/iscsipod2-p to lglw3178
Warning FailedAttachVolume 108s attachdetach-controller Multi-Attach error for volume "csivol-18eb3daee0" Volume is already used by pod(s) iscsipod1-p
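The Multi-Attach error occurs because the VolumeAttachment object tying the volume to the failed node still exists. A sketch of how to inspect it, using the volume name from the warning above:

```shell
# List VolumeAttachment objects and filter for the stuck volume.
# "csivol-18eb3daee0" is the PV name from the Multi-Attach warning.
kubectl get volumeattachment | grep csivol-18eb3daee0
# The NODE column of the matching entry still shows the NotReady node,
# which blocks attachment to the new node.
```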
Cause
Resolution
Workaround:
- Force delete the pod that was running on the node that went down.
kubectl delete po <pod name> --force --grace-period=0
- Delete the volume attachment to the node that went down.
kubectl delete volumeattachment <volumeattachment>
The volume can now be attached to the new node.
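The workaround steps above can be combined into a short recovery sequence. This is a sketch: the pod and volume names are taken from the example in this article, and the VolumeAttachment name placeholder must be filled in from your own cluster's output.

```shell
# 1. Force-delete the pod that was running on the node that went down.
#    Substitute your own pod name and namespace.
kubectl delete pod iscsipod1-p -n <namespace> --force --grace-period=0

# 2. Find the VolumeAttachment that still references the failed node.
kubectl get volumeattachment | grep csivol-18eb3daee0

# 3. Delete it by name, taken from the output of the previous command.
kubectl delete volumeattachment <volumeattachment>

# The replacement pod's volume attach should now succeed on the new node.
```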
Resolution:
This article will be updated when a fix has been released.