VNX: Pool internal FLUs go offline during a single SP reboot
Summary: During a single SP reboot, a pool's internal FLUs go offline, which takes the pool offline and causes data unavailability (DU).
Symptoms
In this example, SPA reboots first.
B 06/28/18 20:40:07 SP B 944 Hard Peer Bus Error 13 0 0
B 06/28/18 20:40:07 SP A a23 Peer SP Down. 3 0 0
The pool's FLUs are then trespassed to SPB, but they do not become ready on SPB, so the FLUs go offline.
B 06/28/18 20:40:09 MLU 712d85xx A problem was detected while accessing the Storage Pool. Please resolve any hardware problems. FLU 0x400000xxx went offline causing Pool 0xx00000011 to go offline.
B 06/28/18 20:40:09 MLU 712d8dxx LUN 6006016055113600:8192074aab21xxxx is unable to service IO due to a storage pool problem. LUN OID A000000xx. 00000400xxxx [ALU 148]
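When reviewing a long SP event log, it can help to pull out every FLU-offline event and the pool it affected. The following is a minimal sketch that assumes the log has been exported as plain text and that the event wording matches the MLU message quoted above. The function name `find_offline_flus` and the regular expression are illustrative and not part of any VNX tool.

```python
import re

# Pattern derived from the MLU event text quoted above (an assumption:
# other releases may word the message differently). Identifiers are
# captured as opaque strings because they may be masked with "x".
OFFLINE_RE = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+.*"
    r"FLU\s+(?P<flu>\S+)\s+went offline causing Pool\s+(?P<pool>\S+)\s+to go offline"
)

def find_offline_flus(lines):
    """Return (date, time, flu, pool) tuples for each FLU-offline event."""
    events = []
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            events.append(
                (match["date"], match["time"], match["flu"], match["pool"])
            )
    return events
```

Passing the log lines from this example would yield one event at 20:40:09, two seconds after the Peer SP Down message, which ties the pool outage to the SP reboot.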
Cause
The partition signature OK flag is missing on one SP because FLU rebuild was set on the SP that was rebuilding the FLU, while it was cleared on the peer SP.
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 457 is having bad fru signature errors
20:38:43.748 1 FFFFFA801C69Fxxx 0 std:NTFE: Lun 456 is having bad fru signature errors
20:38:43.748 1 FFFFFA801C69Fxxx 0 std:NTFE: Lun 459 is having bad fru signature errors
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 461 is having bad fru signature errors
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 462 is having bad fru signature errors
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 464 is having bad fru signature errors
20:38:43.749 1 FFFFFA801C69Fxxx 0 std:NTFE: Lun 503 is having bad fru signature errors <<<<<<
Because the FLU is found to have a FRU signature error, NotReady is returned as TRUE for that FLU.
20:38:43.733 2 FFFFFA801C69Fxxx 0 std:CM:_fru_physical_powerdown_needs_rebuild: fru 208, state 0x4, peer state 0xffffffff
20:38:43.733 3 FFFFFA801C69Fxxx 0 std:NTFE: ntfeLuConvertDeviceControlIrp: exit_idx:11: LU 503 NotReady=TRUE Attrib=0x4034, TE=FALSE
.......
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 457 is having bad fru signature errors
20:38:43.748 2 FFFFFA801C69Fxxx 0 std:NTFE: Lun 464 is having bad fru signature errors
20:38:43.749 1 FFFFFA801C69Fxxx 0 std:NTFE: Lun 503 is having bad fru signature errors <<<<<<
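To confirm the same signature on another system, the trace can be scanned for LUNs that report both conditions shown above. This is a rough sketch, assuming the trace is available as plain text and uses the exact message wording quoted in this article. The function name `summarize_trace` and the result keys are hypothetical.

```python
import re

# Message fragments taken from the trace lines quoted above. These are
# assumptions about the wording; adjust them if your trace differs.
BAD_SIG_RE = re.compile(r"Lun (\d+) is having bad fru signature errors")
NOT_READY_RE = re.compile(r"LU (\d+) NotReady=TRUE")

def summarize_trace(text):
    """Collect LUN numbers by condition from raw trace text.

    findall() is run over the whole text rather than line by line, so
    entries that were pasted onto a single line are still counted.
    """
    bad_sig = {int(n) for n in BAD_SIG_RE.findall(text)}
    not_ready = {int(n) for n in NOT_READY_RE.findall(text)}
    return {
        "bad_signature": sorted(bad_sig),
        "not_ready": sorted(not_ready),
        "both": sorted(bad_sig & not_ready),
    }
```

A LUN that appears under `both` matches the pattern described in this article: a bad FRU signature followed by NotReady=TRUE on the same LU.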