PowerVault ME5: The host link PHY error count is greater than the error threshold
Summary: PowerVault ME5 operators may occasionally observe event 663 (host link PHY error count) messages in the event history log when fiber channel or SAS controller frontend ports are connected to hosts or switches. This event was introduced in ME5 firmware version ME5.1.2.1.0 and is present in later versions. ...
Instructions
Depending on PowerVault ME5 series controller module configuration, go to the appropriate section in this article.
- Controller modules using fiber channel (FC) frontend ports
- Controller modules using Serial Attached SCSI (SAS) frontend ports
Controllers using fiber channel (FC) frontend ports
PowerVault ME5 series array operators with controller modules connected using fiber channel (FC) SFP transceivers may need to take corrective action to resolve this symptom. Often this does not affect I/O processing. However, the message is an early indication of a configuration problem, a potential hardware problem, or a connection problem with either the SFP transceiver or the fiber optic cable.
In most instances, this is easily resolved by correctly cleaning the end face of the LC connector on the fiber optic cable to remove dust or other contaminants that impede light transmission. For instructions on how to inspect and clean the fiber optic cable connector end face, follow the guidance in this knowledge base article: Contaminants such as dust on fiber optic connector end face causes poor IO performance
A41844 2024-08-12 10:45:54 112 INFORMATIONAL Host link down. (port: 1)
A41853 2024-08-12 10:46:30 111 INFORMATIONAL Host link up. (port: 1, speed: 32 Gbps, point-to-point, fabric)
A42131 2024-10-13 18:44:37 663 ERROR The host link PHY error count is greater than the error threshold. (port: 1, type: )
A42132 2024-10-13 18:46:44 663 RESOLVED The host link PHY error count has been resolved. (port: 1, type: resolved)
PowerVault ME5 array firmware versions ME5.1.2.1.0 and later monitor the Invalid Transmission Word Count metric of the FC ports. An increment of this counter means that a word did not transmit successfully, resulting in an encoding error. The counter's value is not displayed in PowerVault Manager or the CLI. However, it is recorded in each storage controller's (SC) debug log. The SC debug logs are gathered within the PowerVault ME support bundle. See PowerVault ME5: How to collect PowerVault support logs
If this ERROR message is observed frequently, operators can take the following actions:
- Inspect the fiber optic cable installation, properly clean the fiber optic cable connector end face, and monitor the event history log for repeated occurrences of event 663. See Contaminants such as dust on fiber optic connector end face causes poor IO performance
- If cleaning the fiber optic cable connectors does not resolve the issue, substitute a known good transceiver and fiber optic cable. Ensure that the parts are not damaged and are being handled correctly.
- If replacing transceivers does not resolve the symptom, operators should use validated transceivers as listed in the Dell PowerVault ME5 Series Storage System Support Matrix. Operators who use other transceivers should contact their vendor for support.
- In rare cases, operators who use older generation 8 Gb FC switches may need to adjust their switch portCfgFillword setting. See the Additional Information section below.
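The article does not give a numeric definition of "frequently", so the following Python sketch only counts event 663 ERROR entries per controller and port so that repeated occurrences are easier to spot. It assumes the event history has been saved as plain text with one event per line in the layout of the sample log above. The regular expressions and the function name `count_phy_errors` are illustrative and are not part of any Dell tool.

```python
import re
from collections import Counter

# Assumed line layout, taken from the sample log in this article:
# <controller letter><sequence> <date> <time> <event code> <severity> <message>
EVENT_RE = re.compile(
    r"^(?P<ctrl>[AB])(?P<seq>\d+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<code>\d+)\s+(?P<severity>[A-Z]+)\s+(?P<message>.*)$"
)
PORT_RE = re.compile(r"\(port: (\d+)")


def count_phy_errors(lines):
    """Count event 663 ERROR entries per (controller, port)."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m or m["code"] != "663" or m["severity"] != "ERROR":
            continue  # skip other events and the RESOLVED entries
        port = PORT_RE.search(m["message"])
        counts[(m["ctrl"], int(port.group(1)) if port else None)] += 1
    return counts


sample = [
    "A41844 2024-08-12 10:45:54 112 INFORMATIONAL Host link down. (port: 1)",
    "A41853 2024-08-12 10:46:30 111 INFORMATIONAL Host link up. (port: 1, speed: 32 Gbps, point-to-point, fabric)",
    "A42131 2024-10-13 18:44:37 663 ERROR The host link PHY error count is greater than the error threshold. (port: 1, type: )",
    "A42132 2024-10-13 18:46:44 663 RESOLVED The host link PHY error count has been resolved. (port: 1, type: resolved)",
]
print(count_phy_errors(sample))
```

A port whose count keeps growing between checks is a candidate for the cleaning and substitution steps listed above.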
Controllers using Serial Attached SCSI (SAS) frontend ports
Each host-to-controller SAS cable connection forms a SAS wide port that consists of more than one physical link (PHY). Each PHY is a set of four wires used as two differential signal pairs, allowing data to be transmitted in both directions simultaneously.
Usually this event does not affect I/O processing. It is expected when SAS cables are inserted, as the SAS link is formed. On redundant controller configurations, event 663 is recorded on both controllers as the connected host server boots and loads its operating system SAS driver. In these cases, no further action is needed by the operator.
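As a rough illustration of the wide port concept, the Python sketch below reads the PHY counts from a "Host SAS topology was changed" (event 354) message and reports whether all PHYs of the port are up. The message wording is copied from the sample SAS event log in this article. The function name and the state labels `full`, `degraded`, and `down` are assumptions made for this example and are not terms reported by the array.

```python
import re

# Assumed wording, copied from the event 354 messages in this article's sample log.
PHY_RE = re.compile(r"host port: (\d+), (\d+) out of (\d+) PHYs are up")


def wide_port_state(message):
    """Return (port, phys_up, phys_total, state), or None if the text does not match."""
    m = PHY_RE.search(message)
    if m is None:
        return None
    port, up, total = (int(g) for g in m.groups())
    if up == total:
        state = "full"        # every PHY in the wide port is up
    elif up == 0:
        state = "down"        # no PHY is up, for example while the host reboots
    else:
        state = "degraded"    # only some PHYs are up
    return port, up, total, state


msg = "Host SAS topology was changed. (host port: 2, 4 out of 4 PHYs are up, link speed: 12 Gbps)"
print(wide_port_state(msg))
```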
More considerations:
Where PowerVault ME controllers are connected to a Dell HBA355e SAS HBA, the host server port connections can be distributed as described in PowerVault ME5 series report host port degraded when connected to Dell HBA355e SAS controller
After changing server SAS HBAs or the SAS HBA ports used, use PowerVault Manager to check that the port initiator WWN ID is mapped to the correct hosts and volumes.
Ensure that SAS cables are securely inserted by gently tugging at each end of the SAS cable. If the SAS cable comes loose from the port, reseat it correctly. You may hear a click when the connector latch is secured. If the SAS cable connector cannot be properly secured, check which PCI-e slot is being used in the host server, because the slot position of the SAS HBA within the chassis may obstruct connector insertion. For details, see PowerEdge 16G models: HBA355e PCI-e Slot Selection
Sample PowerVault ME5 event history log when a SAS connected host is rebooted. Both controllers record the event, and the last event in the sequence indicates that the issue is resolved.
B1473 2024-08-15 09:55:22 112 INFORMATIONAL Host link down. (port: 2)
A3538 2024-08-15 09:55:26 112 INFORMATIONAL Host link down. (port: 2)
B1483 2024-08-15 09:55:29 111 INFORMATIONAL Host link up. (port: 2, type: SAS)
A3547 2024-08-15 09:55:32 111 INFORMATIONAL Host link up. (port: 2, type: SAS)
A3911 2024-10-22 10:10:46 354 WARNING Host SAS topology was changed. (host port: 2, 0 out of 4 PHYs are up, link speed: Autonegotiated)
B1640 2024-10-22 10:10:46 354 WARNING Host SAS topology was changed. (host port: 2, 0 out of 4 PHYs are up, link speed: Autonegotiated)
A3912 2024-10-22 10:11:55 354 INFORMATIONAL Host SAS topology was changed. (host port: 2, 4 out of 4 PHYs are up, link speed: 12 Gbps)
B1641 2024-10-22 10:11:55 354 INFORMATIONAL Host SAS topology was changed. (host port: 2, 4 out of 4 PHYs are up, link speed: 12 Gbps)
A3913 2024-10-22 10:12:58 663 ERROR The host link PHY error count is greater than the error threshold. (port: 2, type: disparity errors, lost dword count, invalid dword count)
B1642 2024-10-22 10:13:33 663 ERROR The host link PHY error count is greater than the error threshold. (port: 2, type: disparity errors, lost dword count, invalid dword count)
A3914 2024-10-22 10:15:00 663 RESOLVED The host link PHY error count has been resolved. (port: 2, type: resolved)
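To check that each event 663 ERROR is followed by a RESOLVED entry, a sketch along the following lines can pair the two per controller and port and report how long each error condition lasted. It assumes a text export with one event per line in the layout shown in the sample above. The function name `phy_error_windows` and the regular expression are hypothetical and written only for this illustration.

```python
import re
from datetime import datetime

# Matches only event 663 lines in the assumed layout of the sample log.
LINE_RE = re.compile(
    r"^([AB])\d+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+663\s+"
    r"(ERROR|RESOLVED)\s+.*\(port: (\d+)"
)


def phy_error_windows(lines):
    """Pair each event 663 ERROR with the next RESOLVED for the same controller and port.

    Returns (closed, still_open): closed maps (controller, port) to a list of
    durations in seconds; still_open maps (controller, port) to the ERROR time.
    """
    closed, still_open = {}, {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ctrl, stamp, severity, port = m.groups()
        key = (ctrl, int(port))
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if severity == "ERROR":
            still_open.setdefault(key, when)  # keep the earliest unresolved ERROR
        elif key in still_open:
            start = still_open.pop(key)
            closed.setdefault(key, []).append((when - start).total_seconds())
    return closed, still_open


events = [
    "A3913 2024-10-22 10:12:58 663 ERROR The host link PHY error count is greater than the error threshold. (port: 2, type: disparity errors, lost dword count, invalid dword count)",
    "B1642 2024-10-22 10:13:33 663 ERROR The host link PHY error count is greater than the error threshold. (port: 2, type: disparity errors, lost dword count, invalid dword count)",
    "A3914 2024-10-22 10:15:00 663 RESOLVED The host link PHY error count has been resolved. (port: 2, type: resolved)",
]
closed, still_open = phy_error_windows(events)
print(closed, still_open)
```

With the three event 663 lines from the sample, controller A port 2 is paired and controller B port 2 is reported as still open, simply because the excerpt contains no RESOLVED entry for controller B.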
Additional Information
Operators using older generation Connectrix or Brocade FC switches that support 8 Gb can configure the fill word setting with the portCfgFillword command. When 8 Gb switches were introduced, ARBff was adopted as the fill word instead of IDLE, mostly because it could contribute to a lower bit error rate. The IDLE that was used for initialization was also changed to ARBff, along with the fill word change.
Operators of older generation FC switches who continue to use them in production may observe an increase in Invalid Transmission Word Count counters and may need to set the portcfgfillword value to always use ARBff. On a Brocade switch, this is the output of "portcfgfillword --help":
admin> portcfgfillword --help
Usage: portCfgFillWord [SlotNumber/]PortNumber Mode [Passive]
Mode: 0/-idle-idle   - IDLE in Link Init, IDLE as fill word (default)
      1/-arbff-arbff - ARBFF in Link Init, ARBFF as fill word
      2/-idle-arbff  - IDLE in Link Init, ARBFF as fill word (SW)
      3/-aa-then-ia  - If ARBFF/ARBFF failed, then do IDLE/ARBFF
Passive: 0/1
admin>
0: Use IDLE for initialization and as the fill word, as 4 Gb switches do.
1: Always use ARBff.
2: Use IDLE for initialization and use ARBff as the fill word between data frames.
3: Use ARBff for initialization first. If it fails, use IDLE. Use ARBff as the fill word between data frames.
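The four modes can be summarized as a small lookup table. The Python sketch below transcribes the table from the help output above and formats a command line that follows the printed Usage string. It does not connect to or change any switch, the function names are hypothetical, and the port number in the example is arbitrary. Check the exact syntax against the documentation for the switch firmware in use before running any command.

```python
# Mode table transcribed from the portcfgfillword --help output in this article:
# mode number -> (primitive used during link initialization, fill word)
FILLWORD_MODES = {
    0: ("IDLE", "IDLE"),
    1: ("ARBFF", "ARBFF"),
    2: ("IDLE", "ARBFF"),
    3: ("ARBFF, falling back to IDLE on failure", "ARBFF"),
}


def describe_mode(mode):
    """Return a one-line description of a fill word mode."""
    link_init, fill_word = FILLWORD_MODES[mode]
    return f"mode {mode}: link init uses {link_init}; fill word is {fill_word}"


def build_command(port, mode, slot=None):
    """Format a command line per the Usage string:
    portCfgFillWord [SlotNumber/]PortNumber Mode
    The optional Passive argument is left out of this sketch."""
    if mode not in FILLWORD_MODES:
        raise ValueError(f"unknown fill word mode: {mode}")
    target = f"{slot}/{port}" if slot is not None else str(port)
    return f"portCfgFillWord {target} {mode}"


for mode in FILLWORD_MODES:
    print(describe_mode(mode))
print(build_command(port=5, mode=1))  # port 5 is an arbitrary example value
```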