Article Number: 000028863
How to troubleshoot Fibre Channel node to switch port or SFP communication problems by elimination?
Too many pro-active SFP replacements
Link failure
G port
No light
Not Operational Sequence (NOS)
Off Line Sequence (OLS)
Loss of Signal
Faulty SFP
Troubleshoot FC port
Errors on FC port
Too many SFPs are replaced proactively when the problem lies outside the SFP or the switch.
To resolve this issue:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0      0      0      0      0      0   2.3m      0      4      6      0      0      0      0      0      0
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
The counters show link fail and loss sync errors together with enc out errors; loss sig errors can also be present.
This combination of errors generally indicates a host reboot or a link reset external to the switch. The enc out errors occur during speed negotiation as part of link initialization.
Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline or come online, for example a host reboot. If not, raise an SR.
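The counters in a porterrshow row are easier to compare once they are converted to numbers. The following is a minimal Python sketch, not a supported tool. It assumes the column order of the header shown above, and it assumes that the k, m, and g suffixes are decimal multipliers; the names COLUMNS, parse_count, and parse_row are illustrative only.

```python
# Column order assumed from the porterrshow header shown in this article.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof",
    "too_shrt", "too_long", "bad_eof", "enc_out", "disc_c3",
    "link_fail", "loss_sync", "loss_sig", "frjt", "fbsy",
    "c3timeout_tx", "c3timeout_rx", "pcs_err",
]

# Assumption: suffixes are decimal multipliers (k = thousand, m = million).
SUFFIXES = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def parse_count(text):
    """Convert a counter such as '849.1k' or '4' to an approximate integer."""
    text = text.strip().lower()
    if text and text[-1] in SUFFIXES:
        return round(float(text[:-1]) * SUFFIXES[text[-1]])
    return int(text)


def parse_row(line):
    """Split one 'port: v1 v2 ...' row into (port, {column: value})."""
    port, _, rest = line.partition(":")
    values = rest.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(values)}")
    counters = {name: parse_count(v) for name, v in zip(COLUMNS, values)}
    return port.strip(), counters


# Data row from Example 1 above.
row = "xx: 849.1k 493.2k 0 0 0 0 0 0 2.3m 0 4 6 0 0 0 0 0 0"
port, counters = parse_row(row)
print(port, counters["enc_out"], counters["link_fail"], counters["loss_sync"])
```

Because the switch abbreviates large values, the parsed numbers are approximations and are best used to compare orders of magnitude between two collections.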
Example 2 ENC OUT:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0      0      0      0      0      0   2.3m      0      0      0      0      0      0      0      0      0
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
Enc out errors without any associated errors indicate a dirty cable.
Expected Actions:
Inspect and clean all optic faces on cable and SFP connected to this port and attached devices.
Example 3 CRC and CRC G_EOF:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0   1.2k   1.2k      0      0      0      0      0      0      0      0      0      0      0      0      0
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
The frame is entering the switch port with a bad CRC but with the end of the frame still marked as good.
This indicates that this is the first port to register the bad frame, so the issue lies with the SFP, the cable, or the attached device interface on this specific port.
Expected Actions:
See default action in the resolution.
For an ISL port, clear the statistics with the statsclear and slotstatsclear commands, wait 4-6 hours, collect supportsaves from both switches, and open an SR for normal troubleshooting.
Example 4 CRC:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0   1.2k      0      0      0      0      0      0      0      0      0      0      0      0      0      0
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
The port is recording frames that enter the switch with a bad CRC and with the frame already marked as bad. This is normally seen on ISLs and NPIV F-ports.
Expected Actions:
If CRC errors are logged on an NPIV port, have the device investigated by the vendor that maintains it.
For an ISL port, check all ports in the fabric for any port logging crc g_eof and take the action described in Example 3.
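The fabric-wide check described in Examples 3 and 4 can be sketched in Python as follows. This is an illustration under assumptions: the port names and counter values are hypothetical, and the function name split_crc_ports is not part of any switch tooling. It separates ports that log crc g_eof (the first port to register the bad frame) from ports that log only crc err (frames already marked bad upstream).

```python
def split_crc_ports(ports):
    """Return (origin, downstream) port lists from {port: counters}.

    origin:     ports with crc_g_eof > 0 (bad CRC, good end of frame)
    downstream: ports with crc_err > 0 but crc_g_eof == 0
    """
    origin = sorted(p for p, c in ports.items() if c.get("crc_g_eof", 0) > 0)
    downstream = sorted(
        p for p, c in ports.items()
        if c.get("crc_err", 0) > 0 and c.get("crc_g_eof", 0) == 0
    )
    return origin, downstream


# Hypothetical fabric snapshot; port labels are made up for illustration.
fabric = {
    "switchA/12": {"crc_err": 1200, "crc_g_eof": 1200},
    "switchB/3": {"crc_err": 1200, "crc_g_eof": 0},
    "switchB/7": {"crc_err": 0, "crc_g_eof": 0},
}

origin, downstream = split_crc_ports(fabric)
print("investigate physical link on:", origin)
print("already-marked-bad frames seen on:", downstream)
```

Ports in the first list are the ones whose SFP, cable, or attached device deserve attention; ports in the second list are only passing on frames that were damaged elsewhere.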
Example 5 PCS ERR with LINK FAIL and LOSS SYNC:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0      0      0      0      0      0      0      0      4      4      0      0      0      0      0    466
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
This applies only to platforms that support 10 Gbps or 16 Gbps ports (6505/6510/6520/DCX-8510). The counter was introduced with the Condor3 ASIC, the GEN5 platform. ER_PCS_BLK shows the number of Physical Coding Sublayer (PCS) block errors. This counter is equivalent to enc_out on 8 Gbps and 4 Gbps links and is used only at 10 Gbps and 16 Gbps speeds.
The counters show link fail and loss sync errors together with pcs err errors; loss sig errors can also be present.
This combination of errors generally indicates a host reboot or a link reset external to the switch.
The pcs err errors occur during speed negotiation as part of link initialization.
Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline or come online, for example a host reboot. If not, raise an SR.
Example 6 PCS ERR:
porterrshow :
CURRENT CONTEXT -- 3 , 111
            frames    enc    crc    crc    too    too    bad    enc   disc   link   loss   loss   frjt   fbsy     c3timeout    pcs
         tx     rx     in    err  g_eof   shrt   long    eof    out     c3   fail   sync    sig                   tx     rx    err
xx:  849.1k 493.2k      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0    466
General Reason:
This analysis is valid only if port statistics have been cleared within the last 24 hours; otherwise, classify these counters as historical. Clear port statistics (https://support.emc.com/kb/304525) and retake data after 4-6 hours.
PCS ERR errors without any associated errors indicate a dirty cable.
Expected Actions:
Inspect and clean all optic faces on cable and SFP connected to this port and attached devices.
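The six porterrshow patterns above can be summarized as a small decision function. The Python sketch below is an illustration only and makes two assumptions that this article does not state: CRC patterns are checked before enc out and pcs err patterns when both are present, and any non-zero link fail, loss sync, or loss sig counter counts as an "associated error". The function name classify_port and the returned strings are hypothetical.

```python
def classify_port(c):
    """Map a dict of porterrshow counters to the matching pattern."""
    def n(name):
        return c.get(name, 0)

    # Assumption: any of these counters counts as a link event.
    link_events = n("link_fail") > 0 or n("loss_sync") > 0 or n("loss_sig") > 0
    # enc_out (4/8 Gbps) and pcs_err (10/16 Gbps) are treated alike.
    coding_errors = n("enc_out") > 0 or n("pcs_err") > 0

    if n("crc_g_eof") > 0:
        # Example 3: bad CRC, good end of frame, first port to see it.
        return "check SFP, cable and attached device on this port"
    if n("crc_err") > 0:
        # Example 4: frame already marked bad upstream.
        return "find the port logging crc g_eof"
    if coding_errors and link_events:
        # Examples 1 and 5: host reboot or external link reset.
        return "verify the device had a reason to go offline or online"
    if coding_errors:
        # Examples 2 and 6: coding errors with nothing else.
        return "inspect and clean optic faces"
    return "no pattern from this article matched"


# Counter sets taken from Examples 1, 2, 3, 4, 5 and 6.
examples = [
    {"enc_out": 2300000, "link_fail": 4, "loss_sync": 6},
    {"enc_out": 2300000},
    {"crc_err": 1200, "crc_g_eof": 1200},
    {"crc_err": 1200},
    {"link_fail": 4, "loss_sync": 4, "pcs_err": 466},
    {"pcs_err": 466},
]
for number, counters in enumerate(examples, start=1):
    print(f"Example {number}: {classify_port(counters)}")
```

The result is only meaningful for counters collected after a recent statistics clear, as noted in each example.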
CISCO EXAMPLES:
Example 1:
Errdisabled - no interface errors incrementing
fc1/1 is down (Error disabled - bit error rate too high)
    Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN).
    5 minutes input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    5 minutes output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
      179 frames input, 7668 bytes
        0 discards, 0 errors
        0 CRC, 0 unknown class
        0 too long, 0 too short
      23 frames output, 1320 bytes
        0 discards, 0 errors
      1 input OLS, 1 LRR, 0 NOS, 1 loop inits
      2 output OLS, 0 LRR, 0 NOS, 1 loop inits
    Interface last changed at Thu Jun 5 01:51:00 2014
General Reason:
The "Errdisabled" state of an interface can be misleading: the interface counters can be clean on the front end while the switch takes the port down with the "errdisabled" state and error counters increase on the back end (ASIC/internal/linecard).
Expected Actions:
See default action in the resolution. If the problem recurs, collect the tech support details output and open an SR.
Example 2:
CRCs incrementing
fc13/1 is down (Initializing)
    Port description is ***
    Hardware is Fibre Channel, SFP is long wave laser cost reduced.
    5 minutes input rate 32 bits/sec, 4 bytes/sec, 0 frames/sec
    5 minutes output rate 32 bits/sec, 4 bytes/sec, 0 frames/sec
      162 frames input, 6136 bytes
        0 discards, 17 errors
        17 CRC, 0 unknown class
        0 too long, 17 too short
      74 frames output, 6304 bytes
        2 discards, 0 errors
      108 input OLS, 54 LRR, 2 NOS, 0 loop inits
      83 output OLS, 26 LRR, 56 NOS, 0 loop inits
    Interface last changed at Tue May 27 08:37:20 2014
General Reason:
The port is recording a frame entering the switch with a bad CRC but a good end of frame. The CRC counter only increments on the specific ingress port logging the error and any investigations should be done on this physical link.
Expected Actions:
See default action in the resolution.
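When many interfaces need to be reviewed, the relevant counters can be pulled out of the interface output with a few regular expressions. The Python sketch below is a hedged illustration: it assumes the output layout shown in the two examples above, uses an excerpt of Example 2 as sample text, and the function name parse_fc_interface is hypothetical.

```python
import re

# Excerpt of the Example 2 output shown above.
SAMPLE = """\
fc13/1 is down (Initializing)
      162 frames input, 6136 bytes
        0 discards, 17 errors
        17 CRC, 0 unknown class
        0 too long, 17 too short
      74 frames output, 6304 bytes
        2 discards, 0 errors
      108 input OLS, 54 LRR, 2 NOS, 0 loop inits
      83 output OLS, 26 LRR, 56 NOS, 0 loop inits
"""


def parse_fc_interface(text):
    """Extract state, CRC count and primitive-sequence counts."""
    info = {}
    head = re.search(r"^(fc\d+/\d+) is (\S+)(?: \((.*)\))?", text, re.M)
    if head:
        info["interface"], info["state"], info["reason"] = head.groups()
    crc = re.search(r"(\d+) CRC", text)
    if crc:
        info["crc"] = int(crc.group(1))
    short = re.search(r"(\d+) too short", text)
    if short:
        info["too_short"] = int(short.group(1))
    for direction in ("input", "output"):
        prim = re.search(
            rf"(\d+) {direction} OLS,\s*(\d+) LRR,\s*(\d+) NOS", text
        )
        if prim:
            ols, lrr, nos = (int(v) for v in prim.groups())
            info[direction] = {"OLS": ols, "LRR": lrr, "NOS": nos}
    return info


info = parse_fc_interface(SAMPLE)
print(info["interface"], info["state"], info["reason"])
print("CRC:", info["crc"], "input:", info["input"], "output:", info["output"])
```

A non-zero CRC value on an interface points at that physical link, as described in the General Reason above.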
Clearing stats on Cisco switches:
Use the commands:
MDS-9509# clear counters interface all
MDS-9509# clear counters interface port-channel <1-256>
MDS-9509# attach module 1
Attaching to module 1 ...
To exit type 'exit', to abort type '$.'
Bad terminal type: "ansi". Will assume vt100.
module-1# clear asic-cnt all
Connectrix
Connectrix, Connectrix B-Series Hardware, Connectrix MDS-Series Hardware
02 Oct 2023
5
Solution