ECN-APJ
2 Iron

How to troubleshoot a SAN fabric


Introduction

      Even though most administrators develop their own style of troubleshooting, it is strongly recommended to begin in the SAN. Switches sit between the hosts and the storage devices and have visibility into both sides of the storage network, so starting with them helps narrow the search path. Once the fabric has been checked, determine whether the problem is on the host side or the storage side, and continue a more detailed diagnosis from there. As an overview, when troubleshooting a fabric you might need to check the following items:

·         SAN issues

o    Missing devices

o    Marginal links

o    Incorrect switch and zoning configuration

·         Storage system issues

o    Physical issues between switch and storage

o    Incorrect storage configuration

·         Server configuration issues

o    Down-level HBA firmware

o    Incorrect device driver installation and configuration

Detailed Information

      Some basic data should always be gathered before you investigate problems and solutions. This data helps you quickly determine which network components have changed and may therefore be the cause of the problem. To identify a problem in a SAN environment, follow these steps:

1.     Gather information or logs

2.     Verify physical connectivity and registration to the fabric

3.     Verify storage system and server configuration

4.     Verify end-to-end connectivity and fabric configuration

      Next, we take Brocade B-series and Cisco MDS-series switches as examples to demonstrate how to troubleshoot a fabric from the switches.

Brocade B-series

1.       Use the supportsave/supportshow/supportftp commands to collect RASLOG, TRACE, supportshow, core file, FFDC data, and other support information.

2.       To verify physical connectivity, ask some basic questions:

·         Are you using the correct fiber type (SM or MM)?

·         Has it been checked for proper connection?

·         Is it broken in any way?

·         Is the LED on the connected module port green?

·         Do the LEDs on any HBA or storage system ports indicate normal functionality?

3.       Use the switchshow/nsshow commands to verify logical connections and make sure the devices are logged in to the switch.

4.       Use Web Tools - Switch View to verify port status:

·         Green: healthy

·         Yellow: marginal

·         Red: critical

·         Gray: unmonitored

·         Blue: buffer-limited

·         Dimmed: not licensed

san_1.bmp

5.       Use the portcfgshow/porterrshow commands to verify port configuration and error counters.

Note: In some cases, you may find that a port has been locked as an L_Port while the attached device is a fabric point-to-point device such as a host or an array target. This is an incorrect configuration for that device, so the device cannot log in to the switch. To correct this, use the portcfgdefault command to remove the L_Port lock (this command resets the port configuration to its defaults).
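For the error counters in step 5, a single reading says little; what usually points at a marginal link is a counter that keeps growing between two readings. The sketch below only shows that comparison. It assumes you have already extracted the counters into dictionaries by some means, and the counter names (crc_err, enc_out, link_fail, loss_sync) are illustrative labels, not an exact copy of the porterrshow column headers on any particular FOS release.

```python
# Hypothetical helper: compare two snapshots of per-port error counters.
# Counter names are illustrative labels chosen for this sketch.

WATCHED = ("crc_err", "enc_out", "link_fail", "loss_sync")


def incrementing_ports(before, after, watched=WATCHED):
    """Return {port: {counter: delta}} for watched counters that grew."""
    suspects = {}
    for port, new in after.items():
        old = before.get(port, {})
        deltas = {}
        for name in watched:
            delta = new.get(name, 0) - old.get(name, 0)
            if delta > 0:
                deltas[name] = delta
        if deltas:
            suspects[port] = deltas
    return suspects


if __name__ == "__main__":
    # Two made-up snapshots taken some minutes apart.
    before = {0: {"crc_err": 0, "enc_out": 12}, 1: {"crc_err": 3, "enc_out": 40}}
    after = {0: {"crc_err": 0, "enc_out": 12}, 1: {"crc_err": 9, "enc_out": 55}}
    print(incrementing_ports(before, after))
```

A port whose counters are non-zero but stable (port 0 in the made-up data) is left out on purpose, since old errors may simply predate a cable or SFP replacement.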

6.       Use portshow command to verify port status and specific configuration parameters.
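If you save the switchshow output from step 3 to a text file, a few lines of script can list the ports that are not online. This is only a rough sketch: the sample text is a simplified, made-up approximation of the port table on a fixed-port switch (directors add a Slot column, and the layout varies between FOS releases), and the WWNs are invented.

```python
# Sketch: list ports whose State column is not "Online".
# SAMPLE is an illustrative approximation of a switchshow port table,
# not captured output; adjust the column positions to your FOS release.

SAMPLE = """\
Index Port Address Media Speed State     Proto
==============================================
   0    0  010000   id    N8   Online      FC  F-Port  10:00:00:00:c9:00:00:01
   1    1  010100   id    N8   No_Light    FC
   2    2  010200   id    N8   Online      FC  F-Port  50:06:01:60:00:00:00:02
   3    3  010300   --    N8   No_Module   FC
"""


def parse_switchshow(text):
    """Return one dict per port row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7 or not parts[0].isdigit():
            continue
        rows.append({
            "index": int(parts[0]),
            "port": int(parts[1]),
            "state": parts[5],
            "detail": " ".join(parts[7:]),  # port type and WWN, if any
        })
    return rows


def ports_not_online(text):
    """Return (port, state) pairs for every port that is not Online."""
    return [(r["port"], r["state"]) for r in parse_switchshow(text)
            if r["state"] != "Online"]


if __name__ == "__main__":
    for port, state in ports_not_online(SAMPLE):
        print(f"port {port}: {state}")
```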

Cisco MDS-series

1.       Use the show tech-support command to collect all the switch configuration information:

# terminal length 0

# show tech-support details

# tac-pac bootflash://showtech.switch1

# copy bootflash://showtech.gz ftp://10.127.96.150/showtech_mds1.gz

2.       To verify physical connectivity, ask some basic questions:

·         Are you using the correct fiber type (SM or MM)?

·         Has it been checked for proper connection?

·         Is it broken in any way?

·         Is the LED on the connected module port green?

·         Do the LEDs on any HBA or storage system ports indicate normal functionality?

3.       Use the show flogi database/show fcns database commands to verify fabric registration.
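When many devices are involved, it can be easier to compare the show flogi database output against a list of pWWNs you expect to see. The sketch below assumes the output has been saved as text and that each entry is one row of interface, VSAN, FCID, port name and node name; the sample rows and WWNs are made up, and the function names are hypothetical.

```python
import re

# Sketch: find expected pWWNs that have not logged in to the fabric.
# SAMPLE is an illustrative approximation of "show flogi database"
# output, not captured from a real switch.

SAMPLE = """\
INTERFACE  VSAN    FCID            PORT NAME               NODE NAME
fc1/1      10      0x640000  21:00:00:e0:8b:00:00:01 20:00:00:e0:8b:00:00:01
fc1/2      10      0x640100  50:06:01:60:00:00:00:02 50:06:01:60:00:00:00:00
"""

WWN = r"(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}"
FLOGI_ROW = re.compile(
    r"^(?P<interface>\S+)\s+(?P<vsan>\d+)\s+(?P<fcid>0x[0-9a-fA-F]{6})\s+"
    rf"(?P<pwwn>{WWN})\s+(?P<nwwn>{WWN})"
)


def parse_flogi(text):
    """Return a dict per FLOGI entry; lines that do not match are ignored."""
    entries = []
    for line in text.splitlines():
        match = FLOGI_ROW.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def missing_pwwns(text, expected):
    """Return the expected pWWNs that do not appear in the FLOGI table."""
    logged_in = {entry["pwwn"].lower() for entry in parse_flogi(text)}
    return [wwn for wwn in expected if wwn.lower() not in logged_in]


if __name__ == "__main__":
    expected = ["21:00:00:E0:8B:00:00:01", "10:00:00:00:c9:00:00:09"]
    print(missing_pwwns(SAMPLE, expected))
```

The comparison is case-insensitive because WWNs are often written in upper case in host-side tools and lower case on the switch.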

4.       Use Device Manager to check port status

·         Green box: A successful fabric login has occurred; the connection is active.

·         Red X: An SFP is present but there is no connection. This could indicate a disconnected or faulty cable, or no active device connection.

·         Red box: An SFP is present but FLOGI has failed. This is typically a mismatch in port or fabric parameters with the neighboring device.

·         Yellow box: A port has been selected.

·         Gray box: This port is administratively disabled.

·         Black box: An SFP is not present.

san_2.bmp

5.       In Device Manager - Summary view, check the information available for port monitoring, which includes:

·         Speed

·         Frames transmitted and received

·         Percent utilization for the CPU, dynamic memory and flash memory

Additional tabs include: Rx BB Credit, Port Channel ID, WWN, MTU, FCID (Fibre Channel ID), Rx Buffer Size, pWWN, nWWN, TrunkConfig, Trunk Failure, Beaconing and SFP information.

6.       Use show interface brief command to monitor interfaces in an easily viewed tabular format.

7.       Use show interface [slot/port] status command to check the down state of a single interface.

Note: If the interface appears as down or offline, use the no shutdown command to bring the port online.
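For steps 6 and 7, the tabular output can also be filtered with a short script to list only the interfaces that need attention. This is a sketch under assumptions: the sample is a simplified, made-up approximation of the show interface brief table, the status strings are illustrative, and treating "up" and "trunking" as the healthy states is a choice made for this example.

```python
import re

# Sketch: list FC interfaces whose Status column is not a healthy state.
# SAMPLE approximates "show interface brief" output; it is not captured
# from a real switch and the column layout may differ on your release.

SAMPLE = """\
Interface  Vsan   Admin  Admin   Status          SFP    Oper  Oper   Port
                  Mode   Trunk                          Mode  Speed  Channel
fc1/1      10     auto   on      up              swl    F     4      --
fc1/2      10     auto   on      notConnected    swl    --           --
fc1/3      10     auto   on      down            swl    --           --
fc1/4      1      auto   on      sfpAbsent       --     --           --
"""

HEALTHY = {"up", "trunking"}  # assumption made for this sketch


def interfaces_needing_attention(text, healthy=HEALTHY):
    """Return (interface, status) pairs for FC ports not in a healthy state."""
    result = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or not re.match(r"fc\d+/\d+$", parts[0]):
            continue
        status = parts[4]
        if status not in healthy:
            result.append((parts[0], status))
    return result


if __name__ == "__main__":
    for name, status in interfaces_needing_attention(SAMPLE):
        print(f"{name}: {status}")
```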

8.       If the link is stuck in the initialization state or is in a point-to-point state, use the show port internal info interface fc [slot/port] command to check whether the port status is link-failure. If it is, you may have a cabling issue; if not, use the shutdown/no shutdown commands to disable and re-enable the port. If this does not clear the problem, try moving the connection to a different port on the same or another module.
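The decision path in step 8 can be written down as a small function, which may be handy if you want to put it in a runbook or a checklist tool. The function name, parameters and returned strings are hypothetical; the logic only restates the step above.

```python
# Sketch of the step 8 decision path; names and messages are hypothetical.

def next_action(link_failure, bounced, cleared_after_bounce=False):
    """Suggest the next step for a link stuck in initialization.

    link_failure: the port status shows link-failure
    bounced: shutdown / no shutdown has already been tried
    cleared_after_bounce: the problem went away after the bounce
    """
    if link_failure:
        return "check the cabling"
    if not bounced:
        return "shutdown / no shutdown the port"
    if cleared_after_bounce:
        return "resolved"
    return "move the connection to a different port or module"


if __name__ == "__main__":
    print(next_action(link_failure=False, bounced=True))
```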

                                              


Author: Roger


             

6 Replies
Martin2341
1 Nickel

Re: How to troubleshoot a SAN fabric

This is a good post. I would also like to point out that the FOS Troubleshooting and Diagnostics Guide can be helpful for Brocade B-series products. The following link is to the FOS v7.2.0 version; however, you should use the version that matches the code level you are running as closely as possible.

http://www.brocade.com/downloads/documents/html_product_manuals/FOS_TRBLSHOOT_720/wwhelp/wwhimpl/js/...

ECN-APJ
2 Iron

Re: How to troubleshoot a SAN fabric

Thanks for the supplement, Martin. It is very useful.

koppuru
1 Nickel

Re: How to troubleshoot a SAN fabric

Is there a command on Cisco that returns the fc-alias value for a given WWPN? On Brocade I can think of the "nodefind" command, and I am looking for a similar one on Cisco.

dynamox
6 Gallium

Re: How to troubleshoot a SAN fabric

sh fcalias | grep prev 2 50:05:07:63:0f:40:28:05

darekwj
1 Nickel

Re: How to troubleshoot a SAN fabric

Hi,

This is a good summary.
Could you post some guidelines about CMCNE/BNA dashboards?
They are also very helpful for troubleshooting as well as for daily administration.
However, I have not found any consistent guide describing how to get started and get the right data displayed.

Specifically, what I am interested in are the best practices for building dashboards.

Dariusz

ECN-APJ
2 Iron

Re: Re: How to troubleshoot a SAN fabric
