FPIN (Fabric Performance Impact Notification) messages, what are they and what do they mean
Summary: FPIN (Fabric Performance Impact Notification) messages, what are they and what do they mean.
Instructions
Fabric Performance Impact Notification (FPIN) messages are designed to proactively alert devices within a fabric network about specific conditions that may impact performance.
Fabric notification messages serve as an early warning system to relieve negative effects on fabric performance. They were developed to optimize I/O behavior and avoid impaired paths by notifying devices of current fabric conditions.
They include notifications regarding link integrity, delivery notification and congestion issues.
Fabric notifications are a mechanism that provides end devices with more information about events in the fabric and are intended to help users address the data flow issues experienced in modern FC SANs.
The notifications essentially tell an end device, "You are sending too much into the fabric" or "Beware, there is a problem ahead—slow down or switch paths."
The end device is made aware of a problem and can initiate remediation, because the fabric proactively notifies the sending or receiving devices of congestion before a fabric-wide event unfolds.
FC Data flow and impact:
- FC data flow can be impacted by three issues:
- Link Integrity: Questionable or malfunctioning components (SFPs, cables, patch panels) along a SAN path can have severe impacts and frequently lead to application degradation, crashes, and outages.
- Improper Multi-Path Input Output (MPIO) settings: Most default MPIO settings use "round robin." When there is a link integrity issue or congestion, MPIO set to round robin continues to use an impaired (sick but not dead) path, sending data into the fabric, which often results in congestion or worse.
- Congestion: Occurs when the rate of frames entering the fabric exceeds the rate of frames exiting the fabric. This is often seen as slow drain.
- Oversubscription, which occurs when more frames are arriving than can be processed (bandwidth mismatch)
- More of a problem today as higher speed 32 Gbps Storage Arrays are being mixed in with legacy 4 Gbps, 8 Gbps, and 16 Gbps fabrics
- Credit stall, which occurs when a device stops returning credits bringing the link to a standstill
- A credit-stalled device is seen as a "slow drain"
- Lost credits, which occur when physical errors damage frames or the credit response and reduce the capacity of the link
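The difference between a healthy link, a slow-draining device, and a credit stall can be illustrated with a toy model of buffer-to-buffer credits. This is only a sketch: the `Link` class, the `simulate` function, and the one-frame-per-tick timing are assumptions made for illustration and do not represent switch or HBA behavior in detail.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Toy model of one FC link using buffer-to-buffer credits (illustrative only)."""
    credits: int            # credits the sender currently holds
    sent: int = 0           # frames put on the link
    stalled_ticks: int = 0  # ticks where a frame was waiting but no credit was available

    def try_send(self) -> bool:
        # A frame can only leave if the sender holds at least one credit.
        if self.credits == 0:
            self.stalled_ticks += 1
            return False
        self.credits -= 1
        self.sent += 1
        return True

    def credit_returned(self) -> None:
        self.credits += 1


def simulate(ticks: int, initial_credits: int, return_every: int) -> Link:
    """Sender offers one frame per tick; the receiver returns one credit every
    `return_every` ticks (0 means it never returns credits, a credit stall)."""
    link = Link(credits=initial_credits)
    for tick in range(1, ticks + 1):
        link.try_send()
        if return_every and tick % return_every == 0:
            link.credit_returned()
    return link


healthy = simulate(ticks=100, initial_credits=8, return_every=1)
slow_drain = simulate(ticks=100, initial_credits=8, return_every=4)
credit_stall = simulate(ticks=100, initial_credits=8, return_every=0)
print(healthy.sent, slow_drain.sent, credit_stall.sent)
```

In this model the sender is throttled to the rate at which credits come back, which is why a single slow device can back traffic up into the fabric.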
Elements and requirements:
Two types of Fabric Notifications
- Software-based Fabric Performance Impact Notifications (FPINs) ELS messages (Extended Link Services)
- Hardware-based Congestion Signal Primitives (CSPs)
Fabric notifications require three core Extended Link Services (ELS) to be implemented in the fabric, operating on the Fabric Controller (switch):
- Exchange Diagnostic Capabilities (EDC)
- Register Diagnostic Functions (RDF)
- Fabric Performance Impact Notifications (FPIN)
- How devices register for notifications:
- The device must support the T11 notification standards.
- A wide range of support available from storage, switch, HBA, OS, and multi-pathing software vendors
- A forthcoming white paper on Fabric Notifications goes into detail about supported devices, firmware, and OSs
- The end devices interested in receiving Signals and FPINs register with the Fabric Controller (typically the switch) after login (FLOGI)
- In order to successfully receive notifications a device must:
- Be registered to receive a particular notification type.
- Be experiencing the notification condition.
- Be an in-zone peer device where the condition exists.
The end devices interested in receiving Signals and FPINs register using EDC (Exchange Diagnostic Capabilities) and RDF (Register Diagnostic Functions), respectively, with the Fabric Controller after login (FLOGI).
To receive notifications, the device must be registered to receive the particular notification, be experiencing the notification condition, and be an in-zone peer device where the condition exists.
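The registration and delivery conditions above can be sketched as a small model. Everything here is hypothetical: `FabricController`, its methods, and the port names are illustrative and are not a switch or HBA API. The sketch also assumes that a notification goes to the affected device and to its in-zone peers, provided each has registered for that notification type.

```python
from dataclasses import dataclass, field


@dataclass
class FabricController:
    """Toy stand-in for switch-side bookkeeping (not a vendor API)."""
    registrations: dict = field(default_factory=dict)  # port name -> set of notification types
    zones: list = field(default_factory=list)          # each zone is a set of port names

    def register(self, port: str, kinds: set) -> None:
        """Models the registration step performed after FLOGI."""
        self.registrations.setdefault(port, set()).update(kinds)

    def in_zone_peers(self, port: str) -> set:
        """All ports that share at least one zone with `port`."""
        peers = set()
        for zone in self.zones:
            if port in zone:
                peers |= zone
        peers.discard(port)
        return peers

    def recipients(self, affected_port: str, kind: str) -> set:
        """Ports that would be notified about `kind` occurring at `affected_port`."""
        candidates = {affected_port} | self.in_zone_peers(affected_port)
        return {p for p in candidates if kind in self.registrations.get(p, set())}


fc = FabricController(zones=[{"host_a", "array_1"}, {"host_b", "array_1"}, {"host_c", "array_2"}])
fc.register("host_a", {"link_integrity", "congestion"})
fc.register("host_b", {"congestion"})
fc.register("host_c", {"link_integrity"})
fc.register("array_1", {"link_integrity"})
print(sorted(fc.recipients("array_1", "link_integrity")))
```

Here `host_b` is an in-zone peer but did not register for link integrity, and `host_c` registered but is not zoned with `array_1`, so neither is notified.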
There are four types of events for which FPINs (software-based) can be generated:
- Link Integrity: MPIO drivers receive Link Integrity notifications and manage path selection. When MPIO is connected to an impaired path, those affected MPIO hosts get notified so they can act.
The information includes the reason (Link Failure, Loss of Signal, Invalid CRC, and so on) and the threshold value that was breached.
- Congestion: A congestion condition detected at a fabric F-port will be notified to the connected end device.
FPIN Congestion notifications are valuable information for end devices that can optimize I/O scheduling, for example, slowing transfer rates or issuing serial read I/Os.
In general, Congestion notifications indicate why long exchange completion times may be occurring.
- Peer Congestion: Peer Congestion notifications are sent to all the registered in-zone peers of end devices that are experiencing congestion.
There are various remedies that peers can leverage to relieve this type of congestion.
For example, if the peer's port has auto-negotiated a faster speed than the destination port, the peer could limit its data rate to match that of the destination.
- Delivery disruption: When a fabric has discarded a packet, Fabric Notifications notifies the initiator of the failure by sending an FPIN Delivery notification.
No matter if the command is dropped by an ISL or end device connection, the originator is notified.
The information includes the reason code (Timeout, No Route, and so on) and the header of the dropped packet, which is used to deduce the flows impacted by the drop.
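As a rough illustration of how a multipathing layer might react to the four FPIN types, the sketch below marks a path as marginal on a Link Integrity notification and skips it during round robin. The `FpinType` and `PathSelector` names and the reaction strings are assumptions for this example only. They are not taken from any MPIO or PowerPath implementation, and the Delivery reaction in particular is a guess at a plausible response.

```python
from enum import Enum


class FpinType(Enum):
    LINK_INTEGRITY = "link integrity"
    CONGESTION = "congestion"
    PEER_CONGESTION = "peer congestion"
    DELIVERY = "delivery"


class PathSelector:
    """Round robin over healthy paths; impaired paths are used only as a last resort."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.marginal = set()
        self._next = 0

    def on_fpin(self, fpin_type: FpinType, path: str) -> str:
        """Return a short description of the (hypothetical) reaction."""
        if fpin_type is FpinType.LINK_INTEGRITY:
            self.marginal.add(path)
            return f"marked {path} marginal"
        if fpin_type is FpinType.CONGESTION:
            return f"throttle I/O issued on {path}"
        if fpin_type is FpinType.PEER_CONGESTION:
            return f"limit data rate toward congested peer via {path}"
        return f"retry or fail over exchanges dropped on {path}"

    def next_path(self) -> str:
        # Prefer paths not flagged as marginal; fall back to all paths if none are healthy.
        usable = [p for p in self.paths if p not in self.marginal] or self.paths
        path = usable[self._next % len(usable)]
        self._next += 1
        return path


selector = PathSelector(["path_a", "path_b", "path_c"])
before = [selector.next_path() for _ in range(3)]
reaction = selector.on_fpin(FpinType.LINK_INTEGRITY, "path_b")
after = [selector.next_path() for _ in range(4)]
print(before, reaction, after)
```

Plain round robin would keep sending one third of the I/O down the impaired path. With the notification, the selector avoids it while healthy alternatives exist.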
Fabric Notification Types - CSPs (Hardware)
- Hardware-based Congestion Signal Primitives (CSPs)
- Typically sent from the FC-1 layer of detecting port (typically by low-level firmware) to registered devices
- Consist of optical codes (not frames) that are sent over the link between directly connected Fibre Channel devices. Not subject to fabric latency issues
- Provide fast (real-time) detection of sudden congestion situations and react instantly by signaling the physically connected port
- Not supported with PowerMax
- Congestion Signal:
This is typically sent from an FC-1 layer of the detecting port (typically by low-level firmware) to registered devices.
Signals are required in addition to FPIN because primitives can be transmitted on a congested port even when there are no credits available (an FPIN frame has to wait when there are no credits).
So unlike an FPIN, Signals can be considered as real time indicators of congestion with better delivery guarantee.
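The reason Signals can get through where an FPIN cannot can be shown with a toy port model in which frames consume a credit and primitives do not. The `CongestedPort` class and its method names are illustrative assumptions, not a description of actual FC-1 firmware.

```python
from collections import deque


class CongestedPort:
    """Toy model: frames need a credit to leave, primitives do not."""

    def __init__(self, credits: int):
        self.credits = credits
        self.pending_frames = deque()
        self.delivered = []

    def send_primitive(self, name: str) -> None:
        # Primitives are not frames, so they bypass credit accounting.
        self.delivered.append(name)

    def send_frame(self, name: str) -> None:
        self.pending_frames.append(name)
        self._drain()

    def credit_returned(self) -> None:
        self.credits += 1
        self._drain()

    def _drain(self) -> None:
        # Send queued frames for as long as credits are available.
        while self.credits > 0 and self.pending_frames:
            self.credits -= 1
            self.delivered.append(self.pending_frames.popleft())


port = CongestedPort(credits=0)
port.send_frame("FPIN")                    # queued: no credit available
port.send_primitive("congestion signal")   # delivered immediately
delivered_while_stalled = list(port.delivered)
port.credit_returned()                     # the FPIN can finally leave
print(delivered_while_stalled, port.delivered)
```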
Fabric Requirements:
- Brocade
- Connectrix DS-66xx switch, MP-7810/7850B and ED-DCX6B director hardware or above is supported
- Brocade supports Fabric Notifications with Fabric OS 9.0 or above
- PowerMax integration is supported with Fabric OS 9.2.0a or above
- Fabric OS 9.2.0a has more precise thresholds for Fabric Notifications
- Fabric Vision license required for FPIN-LI
- MAPS policy (Conservative, Moderate, or Aggressive) must be enabled
- Cisco MDS
- Cisco supports Fabric Notifications with NX-OS 9.3.1 or above
- PowerMax integration was tested with 9.3(2a)
- Any switch or director capable of running these versions is supported.
- Fabric Notifications are not enabled by default and require steps to enable them
- MPIO and PowerPath require OSs and HBAs that support Fabric Notifications.
- Not all components need to support Fabric Notifications
- For example, old 4G and 8G HBAs, where aging optics and congestion are significant problems.
For default FC port troubleshooting, always follow the self-help article:
Connectrix: How to troubleshoot Fibre Channel node to switch port or SFP communication problems by elimination, Self-Help.
Additional Information
SAN array and fabric administrators may be able to resolve these issues by properly cleaning the optical connectors on cables. For more information, see All products: Contaminants such as dust on fiber optic connector end face causes poor IO performance