PowerStore: Unexpected node reboot due to FC port flap
Summary: A memory leak in the FC driver of the PowerStore node may occur while a registered state change notification (RSCN) is processed. As a result, the memory that is required to return a list of port and node names that are zoned to the system may not be freed as expected. ...
Symptoms
Symptoms may include:
- Node panic resulting in an unexpected reboot
- Kernel panic caused by an out-of-memory (OOM) condition due to FC port flapping
- Impacted host HBA cannot get stable connectivity
- Host side loss of access to data
Cause
When a device status changes (login/logout) in a SAN fabric, the switch sends out RSCN notifications to all connected devices.
The PowerStore FC driver sends commands to the switch to query the WWNs that are in the PowerStore's zone.
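Memory is allocated for each such query but may not be properly freed. Because every port flap triggers new RSCNs and therefore new queries, repeated flapping accumulates leaked memory until the node runs out of memory.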
Resolution
The issue is fixed in PowerStoreOS version 3.5.x.x.
The node recovers automatically after the unexpected reboot.
Additional Information
The faulty WWN should be identified and fixed or disconnected.
Review the switch and host logs to determine the cause of the issues.
Possible reasons for port flapping include a loose or dirty FC cable, a faulty switch port, a faulty SFP, or a faulty host HBA or host HBA driver/firmware.
Host HBA driver/firmware compatibility should be checked.
Examples of port flapping, where the port status changes between Online and Offline:
Brocade switch example of port 2 flapping:
fabriclog --show :
Time Stamp      Input and *Action                     S, P   Sn,Pn  Port  Xid
===================================================================================
Switch 0; Sat Mar 19 10:02:31 2022 GMT (GMT+0:00)
10:02:31.817858 SCN Port Offline;rsn=0x4,g=0x4fd58    D2,P0  D2,P0  2     NA
10:02:31.817865 *Removing all nodes from port         D2,P0  D2,P0  2     NA
10:02:31.831807 SCN LR_PORT(0);g=0x4fd58              D2,P0  D2,P0  2     NA
10:02:31.840928 SCN Port Online; g=0x4fd58,isolated=0 D2,P0  D2,P1  2     NA
10:02:31.841017 Port Elp engaged                      D2,P1  D2,P0  2     NA
10:02:31.841034 *Removing all nodes from port         D2,P0  D2,P0  2     NA
10:02:31.841093 SCN Port F_PORT                       D2,P1  D2,P0  2     NA
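When a fabric log is long, counting the Offline and Online transitions per port can make a flapping port easier to spot. The following is a minimal sketch only: it assumes the log has been saved as text with one event per line and that the last four columns are S,P / Sn,Pn / Port / Xid, as in the example above. The function name `count_port_transitions` is hypothetical and not part of any Brocade or PowerStore tooling.

```python
import re
from collections import defaultdict

# Sample taken from the Brocade example in this article, one event per line.
SAMPLE = """\
Switch 0; Sat Mar 19 10:02:31 2022 GMT (GMT+0:00)
10:02:31.817858 SCN Port Offline;rsn=0x4,g=0x4fd58 D2,P0 D2,P0 2 NA
10:02:31.817865 *Removing all nodes from port D2,P0 D2,P0 2 NA
10:02:31.831807 SCN LR_PORT(0);g=0x4fd58 D2,P0 D2,P0 2 NA
10:02:31.840928 SCN Port Online; g=0x4fd58,isolated=0 D2,P0 D2,P1 2 NA
10:02:31.841017 Port Elp engaged D2,P1 D2,P0 2 NA
10:02:31.841034 *Removing all nodes from port D2,P0 D2,P0 2 NA
10:02:31.841093 SCN Port F_PORT D2,P1 D2,P0 2 NA
"""

# Event lines start with a time stamp such as 10:02:31.817858.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{6}$")


def count_port_transitions(log_text):
    """Return {port: {"offline": n, "online": m}} for SCN Port events.

    Assumption: the last four whitespace-separated fields of each event
    line are S,P / Sn,Pn / Port / Xid, so the port is the second-to-last.
    """
    counts = defaultdict(lambda: {"offline": 0, "online": 0})
    for line in log_text.splitlines():
        tokens = line.split()
        if len(tokens) < 6 or not TIMESTAMP.match(tokens[0]):
            continue  # skip headers, separators, and the "Switch 0;" line
        port = tokens[-2]
        action = " ".join(tokens[1:-4])
        if action.startswith("SCN Port Offline"):
            counts[port]["offline"] += 1
        elif action.startswith("SCN Port Online"):
            counts[port]["online"] += 1
    return {port: dict(c) for port, c in counts.items()}


if __name__ == "__main__":
    for port, c in count_port_transitions(SAMPLE).items():
        print(f"port {port}: {c['offline']} offline, {c['online']} online")
```

A port that shows many Offline and Online pairs within a short window is a candidate for the cable, SFP, switch port, and HBA checks listed above.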
Cisco switch example of port fc1/22 flapping:
`show port-config internal all`
*************** Port Config Port Control Log ***************
----                           ------  -----------         -------
Time                           PortNo  Port Action         ErrCode
----                           ------  -----------         -------
Mar 19 12:27:53 2023 00986053  fc1/22  Enable              None
Mar 19 12:27:53 2023 00984797  fc1/22  Participating Mode  None
Mar 19 12:13:43 2023 00558421  fc1/22  Enable              None
Mar 19 12:13:43 2023 00557170  fc1/22  Participating Mode  None
Mar 19 12:02:21 2023 00738769  fc1/22  Enable              None
Mar 19 12:02:21 2023 00737461  fc1/22  Participating Mode  None
Mar 19 11:40:58 2023 00976928  fc1/22  Enable              None
Mar 19 11:40:58 2023 00975543  fc1/22  Participating Mode  None
Mar 19 11:39:01 2023 00195273  fc1/22  Enable              None
Mar 19 11:39:01 2023 00193893  fc1/22  Participating Mode  None
Mar 19 11:37:13 2023 00341497  fc1/22  Enable              None
Mar 19 11:37:13 2023 00340169  fc1/22  Participating Mode  None
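Repeated Enable entries for the same port in a short period, as in the Cisco example above, can be summarized with a small script. This is a sketch under assumptions: the output is saved as text with one entry per line, the numeric field after the year is treated as an opaque sub-second counter and ignored, and the helper name `enable_events` is hypothetical rather than part of any Cisco tooling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Sample taken from the Cisco example in this article, one entry per line.
SAMPLE = """\
Mar 19 12:27:53 2023 00986053 fc1/22 Enable None
Mar 19 12:27:53 2023 00984797 fc1/22 Participating Mode None
Mar 19 12:13:43 2023 00558421 fc1/22 Enable None
Mar 19 12:13:43 2023 00557170 fc1/22 Participating Mode None
Mar 19 12:02:21 2023 00738769 fc1/22 Enable None
Mar 19 12:02:21 2023 00737461 fc1/22 Participating Mode None
Mar 19 11:40:58 2023 00976928 fc1/22 Enable None
Mar 19 11:40:58 2023 00975543 fc1/22 Participating Mode None
Mar 19 11:39:01 2023 00195273 fc1/22 Enable None
Mar 19 11:39:01 2023 00193893 fc1/22 Participating Mode None
Mar 19 11:37:13 2023 00341497 fc1/22 Enable None
Mar 19 11:37:13 2023 00340169 fc1/22 Participating Mode None
"""

# Columns: time stamp, numeric counter (ignored), port, action, error code.
ENTRY = re.compile(
    r"^(\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) +\d+ +(\S+) +(.+?) +(\S+)$"
)


def enable_events(log_text):
    """Return {port: sorted list of datetimes at which the port was enabled}."""
    events = defaultdict(list)
    for line in log_text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue  # skip headers, separators, and blank lines
        stamp, port, action, _errcode = match.groups()
        if action == "Enable":
            events[port].append(datetime.strptime(stamp, "%b %d %H:%M:%S %Y"))
    return {port: sorted(times) for port, times in events.items()}


if __name__ == "__main__":
    for port, times in enable_events(SAMPLE).items():
        window = times[-1] - times[0]
        print(f"{port}: {len(times)} Enable events within {window}")
```

How many Enable events in a given window should count as flapping is a judgment call for the administrator; the script only reports the count and the time span.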