Connectrix: How to troubleshoot Fibre Channel node to switch port or SFP communication problems by elimination, Self-Help.

Summary: This article explains how to troubleshoot Fibre Channel node to switch port or SFP communication problems by elimination.

This article is not tied to any specific product. Not all product versions are identified in this article.

Symptoms

How to troubleshoot Fibre Channel node to switch port or SFP communication problems by elimination?

Too many pro-active SFP replacements
Link failure
G port
No light
Not Operational Sequence (NOS)
Offline Sequence (OLS)
Loss of Signal
Faulty SFP
Troubleshoot FC port
Errors on FC port

Cause

Too many SFPs are proactively replaced when the problem lies outside the SFP or switch.

Resolution

To resolve this issue:

  1. Identify the node and switch port involved in the communications failure.
  2. Verify that the switch port is administratively up (unblocked, no shut), or enabled.
  3. Make sure there are redundant paths available to the attached device before proceeding.

 

WARNING: Before proceeding any further, make sure you know how your node reacts if it gets a new FCID. Some operating system versions of AIX and HP-UX do not react well to such changes, because the FCID is built into the hardware path to the storage device. If you move the cable, data might become unavailable. If you have any doubts, consult an EMC Technical Support Engineer.

 

  4. To eliminate the SFP as the cause of the problem, do the following:
NOTE: If there is an issue with the SFP, this procedure is the quickest way of bringing the device back online.

  5. Check for a free port on the switch.
  6. Disable the identified free port on the switch.
  7. Move the cable from the port under investigation to the free port that you disabled in the previous step.
  8. Enable the port (set it administratively up) and bring the device back online.
  9. Clear or reset the statistics counters to zero on the switch.

For Brocade see KBA: 

Connectrix B-Series Brocade: How to clear interface and ASIC counters on an Connectrix Brocade B-Series switches and Directors 

For Cisco see KBA: 

Connectrix - MDS Series Cisco: How to clear interface and ASIC counters on an MDS

  10. Monitor the port with the respective commands for 4-6 hours.

 

RESULTS:

  • If the error counters increase, the problem lies outside the switch, and the customer / user / SAN admin must be advised of the following:

    • The SFP in the new port and the cable require cleaning. (To prevent a dirty cable from contaminating the SFP, consider using a professional cleaning kit.)
    • The attached device must be investigated further by whoever supports the device.
    • On a Cisco switch, if the "errdisabled" state comes back with no counter increase, an SR must be opened for further back-end investigation.

 

  • If the errors do not increase (or the Errdisabled state on the Cisco switch does not come back), the SFP on the previous port is defective. Raise an SR for SFP replacement and provide the above analysis results, including the log outputs and the SFP details (SM or MM, speed, and so forth).

 

NOTE: You can repeat the same procedure from Step 6 onwards, checking the counters, if you replaced the cable and/or the attached device.
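
The RESULTS decision above can be sketched in a few lines of Python. This is only an illustration of the logic, not a supported tool: the function names, the counter names, and the verdict strings are hypothetical, and the two snapshots are assumed to have been collected by hand (one right after clearing the counters, one after the 4-6 hour monitoring period).

```python
from typing import Dict


def counters_increased(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Return only the counters that grew between two snapshots, with their deltas."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] > before.get(name, 0)
    }


def verdict(before: Dict[str, int], after: Dict[str, int],
            errdisabled_returned: bool = False) -> str:
    """Map the monitoring result onto the outcomes listed under RESULTS."""
    if counters_increased(before, after):
        # Errors follow the cable/device to the new port: problem is outside the switch.
        return "outside-switch"
    if errdisabled_returned:
        # Cisco only: errdisabled came back without any counter increase.
        return "open-sr-backend"
    # No errors on the new port: the SFP in the previous port is the suspect.
    return "original-sfp-defective"


if __name__ == "__main__":
    print(verdict({"crc_err": 0}, {"crc_err": 12}))
```

The verdict strings are placeholders; in practice each one corresponds to the advice or SR action described in the bullets above.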

 

Additional Information

NOTE: Most of the time, if an SFP optical transceiver definitely fails, you see a clear optic failure in the event log.


Hardware failures can easily be isolated by applying a simple algorithm to the problem; if it is not this piece of hardware, then it is the other piece. Loop until you isolate the failure pointing to the problem hardware.
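
As a rough illustration of that elimination loop, the following Python sketch swaps one candidate at a time and stops at the first swap that clears the fault. The function name, the candidate labels, and the callback are hypothetical; in practice the "callback" is a person swapping the part and monitoring the counters for 4-6 hours.

```python
from typing import Callable, Iterable, Optional


def isolate_fault(candidates: Iterable[str],
                  fault_clears_when_swapped: Callable[[str], bool]) -> Optional[str]:
    """Swap one candidate at a time; the first swap that clears the fault names the culprit."""
    for component in candidates:
        if fault_clears_when_swapped(component):
            return component
    # Every candidate was eliminated: the fault lies somewhere not yet considered.
    return None


if __name__ == "__main__":
    # Pretend the cable is the faulty part.
    faulty = "cable"
    order = ["switch port / SFP", "cable", "attached device"]
    print(isolate_fault(order, lambda part: part == faulty))
```

Ordering the candidates from the cheapest swap to the most disruptive one keeps the loop short in the common case.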


BROCADE EXAMPLES:

NOTE: For an explanation of the counters in the porterrshow output (highlighted in the examples below), see the Self-Help Knowledge Base Article (KBA):
Connectrix B-Series: How to Interpret the Brocade porterrshow output, and what do the counters mean. Self-Help



Example 1: ENC OUT with LINK FAIL and LOSS SYNC

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    0    0      0     0     0    2.3m  0     4     6     0     0     0     0      0     0


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

The output shows link fail and loss sync errors together with enc out errors; loss sig errors can also be present.
This combination of errors generally indicates a host reboot or a link reset external to the switch. The enc out errors occur during speed negotiation as part of link initialization.

Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline and/or online (for example, a host reboot). If not, raise an SR.
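
To compare counters between two collections, it can help to turn a porterrshow data row into numbers. The Python sketch below is a minimal example under stated assumptions: the column names are hypothetical labels derived from the two header lines shown above, the k/m/g suffixes are assumed to mean thousand, million, and billion, and the row is assumed to carry exactly 18 values as in this example. Other Fabric OS versions may print different columns.

```python
from typing import Dict

# Hypothetical labels built from the two porterrshow header lines.
COLUMNS = [
    "frames_tx", "frames_rx", "enc_in", "crc_err", "crc_g_eof", "too_shrt",
    "too_long", "bad_eof", "enc_out", "disc_c3", "link_fail", "loss_sync",
    "loss_sig", "frjt", "fbsy", "c3timeout_tx", "c3timeout_rx", "pcs_err",
]

# Assumed meaning of the abbreviated values (for example 849.1k, 2.3m).
SUFFIX = {"k": 1_000, "m": 1_000_000, "g": 1_000_000_000}


def to_int(token: str) -> int:
    """Convert a counter such as '849.1k' or '4' to an integer."""
    token = token.strip().lower()
    if token and token[-1] in SUFFIX:
        return int(round(float(token[:-1]) * SUFFIX[token[-1]]))
    return int(token)


def parse_row(line: str) -> Dict[str, int]:
    """Split 'port: v1 v2 ...' into a dict keyed by the column labels above."""
    _port, _sep, rest = line.partition(":")
    values = rest.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} counters, got {len(values)}")
    return dict(zip(COLUMNS, (to_int(v) for v in values)))


if __name__ == "__main__":
    row = parse_row("xx:  849.1k 493.2k 0 0 0 0 0 0 2.3m 0 4 6 0 0 0 0 0 0")
    print(row["enc_out"], row["link_fail"], row["loss_sync"])
```

Because the displayed values are rounded, the parsed numbers are approximate; they are good enough to see whether a counter moved, not to compute exact deltas.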


Example 2: ENC OUT

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    0    0      0     0     0    2.3m  0     0     0     0     0     0     0      0     0


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

Enc out errors without any associated errors indicate a dirty cable.

Expected Actions:
Inspect and clean all optic faces on the cable and SFP connected to this port and on the attached devices.


Example 3: CRC and CRC G_EOF

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    1.2k 1.2k   0     0     0    0     0     0     0     0     0     0     0      0     0


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

The frame is entering the switch port with a bad CRC but with the end of frame still marked as good.
This indicates that this is the first port to register the bad frame, so the issue lies with the SFP, the cable, or the attached device interface on this specific port.

Expected Actions:
See default action in the resolution.

For an ISL port, clear the statistics (as listed above under point 9), check the counters and retake data after 4-6 hours, then collect supportsaves from both switches and open an SR for normal troubleshooting.

 

Example 4: CRC

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    1.2k 0      0     0     0    0     0     0     0     0     0     0     0      0     0


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

The port is recording a frame entering the switch with a bad CRC, but with the frame already marked as bad. This is normally seen on ISL and NPIV F-ports.

Expected Actions:
If CRC errors are being logged on an NPIV port, have the device investigated by the maintaining vendor.
For an ISL port, check all ports in the fabric for any port logging crc g_eof and act as in Example 3.


Example 5: PCS ERR with LINK FAIL and LOSS SYNC

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    0    0      0     0     0    0     0     4     4     0     0     0     0      0     466


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

This applies only to platforms that support 10 Gbps or 16 Gbps ports and higher (6505/6510/6520/DCX-8510). The counter was introduced with the Condor3 ASIC, the GEN5 platform.

ER_PCS_BLK shows the number of Physical Coding Sublayer (PCS) block errors. This counter is the equivalent of enc_out on 8 Gbps/4 Gbps links and is used only at 10 Gbps, 16 Gbps, and higher speeds.
The output shows link fail and loss sync errors together with pcs err errors; loss sig errors can also be present.
This combination of errors generally indicates a host reboot or a link reset external to the switch.
The pcs err errors occur during speed negotiation as part of link initialization.

Expected Actions:
Verify that the device attached to the port had a legitimate reason to go offline and/or online (for example, a host reboot). If not, raise an SR.


Example 6: PCS ERR

porterrshow:
CURRENT CONTEXT -- 3 , 111
     frames        enc  crc  crc    too   too   bad  enc   disc  link  loss  loss  frjt  fbsy  c3timeout    pcs
     tx     rx     in   err  g_eof  shrt  long  eof  out   c3    fail  sync  sig               tx    rx     err
xx:  849.1k 493.2k 0    0    0      0     0     0    0     0     0     0     0     0     0     0      0     466


General Reason:
Only valid if port statistics have been cleared within the last 24 hours. Otherwise, classify these counters as historical: clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

PCS ERR errors without any associated errors indicate a dirty cable.

Expected Actions:
Inspect and clean all optic faces on the cable and SFP connected to this port and on the attached devices.
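
The six Brocade examples above amount to a small decision table, which can be sketched in Python as follows. This is an illustration only: the counter names and the finding labels are hypothetical, and treating either link_fail or loss_sync as a sign of a link event is an assumption (the examples always show both together). It is only meaningful on counters cleared within the last 24 hours.

```python
from typing import Dict, List


def classify(counters: Dict[str, int]) -> List[str]:
    """Map a set of recently cleared porterrshow counters onto the example patterns."""
    findings: List[str] = []
    get = lambda name: counters.get(name, 0)

    coding_errors = get("enc_out") > 0 or get("pcs_err") > 0
    link_events = get("link_fail") > 0 or get("loss_sync") > 0

    if coding_errors and link_events:
        # Examples 1 and 5: host reboot or link reset external to the switch.
        findings.append("link-reset")
    elif coding_errors:
        # Examples 2 and 6: suspect a dirty cable, clean the optics.
        findings.append("dirty-cable")

    if get("crc_err") > 0 and get("crc_g_eof") > 0:
        # Example 3: this port is the first to see the bad frame.
        findings.append("crc-first-seen-here")
    elif get("crc_err") > 0:
        # Example 4: frame was already marked bad (ISL or NPIV F-port).
        findings.append("crc-marked-bad-upstream")

    return findings


if __name__ == "__main__":
    print(classify({"enc_out": 2_300_000, "link_fail": 4, "loss_sync": 6}))
    print(classify({"crc_err": 1200, "crc_g_eof": 1200}))
```

An empty list means none of the six patterns matched; it does not prove that the link is healthy.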

 


CISCO EXAMPLES:

Example 1: Errdisabled

Errdisabled - no interface errors incrementing
 

fc1/1 is down (Error disabled - bit error rate too high)
    Hardware is Fibre Channel, SFP is short wave laser w/o OFC (SN).
    5 minutes input rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
    5 minutes output rate 0 bits/sec, 0 bytes/sec, 0 frames/sec
      179 frames input, 7668 bytes
        0 discards, 0 errors
        0 CRC,  0 unknown class
        0 too long, 0 too short
      23 frames output, 1320 bytes
        0 discards, 0 errors
      1 input OLS, 1 LRR, 0 NOS, 1 loop inits
      2 output OLS, 0 LRR, 0 NOS, 1 loop inits
    Interface last changed at Thu Jun  5 01:51:00 2014

 

General Reason:
The "Errdisabled" state of an interface can be misleading: the interface counters can be clean on the front end, yet the switch takes the port down with the "errdisabled" state while error counters increase on the back end (ASIC/internal/linecard).

Expected Actions:
See default action in the resolution. If the problem recurs, collect the tech support details output and open an SR.
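
The situation described above (port error disabled, front-end counters clean) can be spotted in saved interface output with a short Python check. This is a sketch under assumptions: the function name is hypothetical, and the patterns are written only against the output layout shown in this example, which may differ between NX-OS versions.

```python
import re


def errdisabled_with_clean_counters(show_interface: str) -> bool:
    """True when the port is error disabled but every errors/CRC counter shown is zero."""
    disabled = re.search(r"is down \(Error disabled", show_interface) is not None
    # Collect the numbers printed in front of 'errors' and 'CRC'.
    error_counts = [int(n) for n in re.findall(r"(\d+) (?:errors|CRC)\b", show_interface)]
    return disabled and bool(error_counts) and all(n == 0 for n in error_counts)


if __name__ == "__main__":
    sample = """fc1/1 is down (Error disabled - bit error rate too high)
      179 frames input, 7668 bytes
        0 discards, 0 errors
        0 CRC,  0 unknown class
      23 frames output, 1320 bytes
        0 discards, 0 errors
    """
    print(errdisabled_with_clean_counters(sample))
```

A positive result matches the pattern in this example, where the back-end counters are the place to look and a recurring state warrants an SR.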
 

NOTE: Information on "Errdisabled" state from Cisco: The bit errors can occur for the following reasons:
  • Faulty or bad cable
  • Faulty or bad SFP
  • SFP is specified to operate at 1 Gbps but is used at 2 Gbps.
  • SFP is specified to operate at 2 Gbps but is used at 4 Gbps.
  • Short haul cable is used for long haul, or long haul cable is used for short haul.
  • Momentary sync loss
  • Loose cable connection at one or both ends
  • Improper SFP connection at one or both ends


A bit error rate threshold is detected when 15 error bursts occur in a 5-minute period. By default, the switch disables the interface when the threshold is reached. You can enter the commands below in sequence to reenable the interface.

shutdown
no shutdown
You can configure the switch to not disable an interface when the threshold is crossed.
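
To make the "15 error bursts in a 5-minute period" rule concrete, here is a small Python sketch. It assumes a sliding 5-minute window over burst timestamps given in seconds; how the switch actually delimits the period (sliding or fixed intervals) is not stated in this article, so treat the windowing, the function name, and the parameters as assumptions.

```python
from collections import deque
from typing import Deque, Iterable


def threshold_crossed(burst_times: Iterable[float],
                      limit: int = 15,
                      window_s: float = 300.0) -> bool:
    """Return True if any window of window_s seconds contains at least `limit` bursts."""
    recent: Deque[float] = deque()
    for t in sorted(burst_times):
        recent.append(t)
        # Drop bursts that fell out of the window ending at time t.
        while t - recent[0] >= window_s:
            recent.popleft()
        if len(recent) >= limit:
            return True
    return False


if __name__ == "__main__":
    # 15 bursts, one every 10 seconds: all inside one 5-minute window.
    print(threshold_crossed([i * 10 for i in range(15)]))
    # 15 bursts, one every 30 seconds: never 15 within any 5 minutes.
    print(threshold_crossed([i * 30 for i in range(15)]))
```

The point of the sketch is that a handful of isolated bursts does not disable the port; it is the rate within the period that matters.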


Example 2: CRC

CRCs incrementing
 

fc13/1 is down (Initializing)
    Port description is ***
    Hardware is Fibre Channel, SFP is long wave laser cost reduced.
    5 minutes input rate 32 bits/sec, 4 bytes/sec, 0 frames/sec
    5 minutes output rate 32 bits/sec, 4 bytes/sec, 0 frames/sec
      162 frames input, 6136 bytes
        0 discards, 17 errors
        17 CRC,  0 unknown class
        0 too long, 17 too short
      74 frames output, 6304 bytes
        2 discards, 0 errors
      108 input OLS, 54 LRR, 2 NOS, 0 loop inits
      83 output OLS, 26 LRR, 56 NOS, 0 loop inits
    Interface last changed at Tue May 27 08:37:20 2014


General Reason:

The port is recording a frame entering the switch with a bad CRC but a good end of frame. The CRC counter only increments on the specific ingress port logging the error and any investigations should be done on this physical link.

Expected Actions:
See default action in the resolution.

Clear the port statistics (as listed above under point 9), then check the counters and retake data after 4-6 hours.

 

Example 3: NOS

Not Operational Sequence (NOS)

show int fc1/1 counters
fc1/1
    5 minutes input rate 1753296 bits/sec, 219162 bytes/sec, 199 frames/sec
    5 minutes output rate 2310384 bits/sec, 288798 bytes/sec, 194 frames/sec
    2741512190 frames input, 2542476084276 bytes
      0 class-2 frames, 0 bytes
      2741512190 class-3 frames, 2542476084276 bytes
      0 class-f frames, 0 bytes
      0 discards, 0 errors, 0 CRC
      0 unknown class, 0 too long, 0 too short
    3410405365 frames output, 5164364339412 bytes
      0 class-2 frames, 0 bytes
      3410405365 class-3 frames, 5164364339412 bytes
      0 class-f frames, 0 bytes
      0 discards, 0 errors
    1 input OLS, 1 LRR, 0 NOS, 307 loop inits
    289 output OLS, 289 LRR, 289 NOS, 289 loop inits
    0 link failures, 0 sync losses, 0 signal losses
     48276 BB credit transitions from zero
      16 receive B2B credit remaining
      3 transmit B2B credit remaining
      3 low priority transmit B2B credit remaining

 

General Reason:

Loss of connection prior to link negotiations.


Expected Actions:
Check layer 1 (physical layer) and the source device.
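
When comparing several collections of this output, the OLS, LRR, NOS, and loop init counts can be pulled out with a regular expression. The Python sketch below is written only against the line layout shown in the Cisco examples above; the function name and the dictionary keys are hypothetical, and other NX-OS versions may format these lines differently.

```python
import re
from typing import Dict

# Matches lines such as: "289 output OLS, 289 LRR, 289 NOS, 289 loop inits"
PRIMITIVES = re.compile(
    r"(\d+)\s+(input|output)\s+OLS,\s*(\d+)\s+LRR,\s*(\d+)\s+NOS,\s*(\d+)\s+loop inits"
)


def primitive_counts(show_interface: str) -> Dict[str, Dict[str, int]]:
    """Return the primitive sequence counters per direction ('input' / 'output')."""
    counts: Dict[str, Dict[str, int]] = {}
    for ols, direction, lrr, nos, loops in PRIMITIVES.findall(show_interface):
        counts[direction] = {
            "OLS": int(ols),
            "LRR": int(lrr),
            "NOS": int(nos),
            "loop_inits": int(loops),
        }
    return counts


if __name__ == "__main__":
    sample = """
        1 input OLS, 1 LRR, 0 NOS, 307 loop inits
        289 output OLS, 289 LRR, 289 NOS, 289 loop inits
    """
    print(primitive_counts(sample)["output"]["NOS"])
```

Collecting the counts before and after the monitoring period shows whether NOS is still incrementing or whether the values are historical.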

Affected Products

Connectrix

Products

Connectrix, Connectrix B-Series Hardware, Connectrix MDS-Series Hardware
Article Properties
Article Number: 000028863
Article Type: Solution
Last Modified: 29 Jul 2025
Version:  9