PowerScale: How to determine the subnet controller of a PowerScale InfiniBand fabric

Summary: How to determine the subnet controller of a PowerScale InfiniBand fabric.

Instructions

Introduction

OpenSM provides an implementation of an InfiniBand (IB) Subnet Manager and Administration and runs on top of OpenIB. OpenSM must be functioning properly for any IB traffic that relies on it to work. If an IB issue occurs, you might want to review the OpenSM logs, and to do that, you must know which log to review. The opensm service runs on all nodes, and each node has its own OpenSM logs. However, only the subnet master of the IB fabric performs topology discovery, so only its OpenSM log has complete and accurate information. You therefore need to be able to identify which device is acting as the subnet master of the fabric.

In a dual-switch configuration, you must correlate the OpenSM log to the interface that it is bound to. The opensm-1.topo and opensm-2.topo files do not always correlate directly to the internal-a (int-a) and internal-b (int-b) interfaces. You can use the IB interface link layer address (lladdr) value to determine which file is associated with which interface. The procedure below describes how to do this.

NOTE
A .topo file is generated when a connection to the IB switch is initiated and contains information gathered at that time. A .log file always accompanies the .topo file and contains messages and topology information about the InfiniBand connection. Once you know the master, you can review the appropriate .log file for information about a specific issue. Only the .topo file on the OpenSM master node can be relied upon for a correct topology of the fabric; .topo files from other nodes should not be used.

Procedure

1. Open an SSH connection to any node in the cluster and log in using the "root" account. Stay on the same node for the rest of the steps in this procedure.

2. Determine the OpenSM master for each switch:

In an environment with two switches, there should be two lines of output, one for each switch.

isi_for_array -XI 'ps auxw | grep opensm' | grep master


In the example below, for an environment with a single InfiniBand switch, 0xe41d2d0300bc8fc2 corresponds to the lladdr of the Isilon NIC on that node, and the IsilonX210-S19-1 prefix includes the node number of the master (node 1).

IsilonX210-S19-3# isi_for_array -XI 'ps auxw | grep opensm' | grep master
IsilonX210-S19-1: root    3757   0.0  0.0  28536   5036  -  S    23Feb17      3:56.20 opensm: 0xe41d2d0300bc8fc2 master (opensm)
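
The node name and address can also be pulled out of that output mechanically. The following is a minimal sketch, assuming the line layout matches the example above (a node prefix, then the opensm process title followed by the address and the word "master"). The helper name parse_master_line is hypothetical and not part of OneFS.

```shell
#!/bin/sh
# Sample line copied from the example output above. On a live cluster the
# input would come from the isi_for_array command in step 2.
sample='IsilonX210-S19-1: root    3757   0.0  0.0  28536   5036  -  S    23Feb17      3:56.20 opensm: 0xe41d2d0300bc8fc2 master (opensm)'

# parse_master_line (hypothetical helper): for each input line containing
# "opensm: <address> master", print "<node> <address>".
parse_master_line() {
    awk '{
        node = $1
        sub(/:$/, "", node)          # drop the trailing colon from the node prefix
        for (i = 2; i < NF; i++) {
            if ($i == "opensm:" && $(i + 2) == "master") {
                print node, $(i + 1)
            }
        }
    }'
}

printf '%s\n' "$sample" | parse_master_line
```

With the sample line, this prints the node name followed by 0xe41d2d0300bc8fc2, which is the value to compare against the lladdr in the next step.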


3. For each result in step 2, determine which interface on that node the output refers to by examining the lladdr in the ifconfig output.

Repeat this step for each node in the output from step 2, replacing <LNN> with the node number.

isi_for_array -n <LNN> 'ifconfig ib0 ; ifconfig ib1' | grep -E "ib[01]"\|lladdr\|status

In this example, the master's interface is ib1. The ifconfig output prints the lladdr as dot-separated bytes, and the ib1 value ends in bc.8f.c2, which matches the end of the address in the step 2 example output.

IsilonX210-S19-3# isi_for_array -n 1 'ifconfig ib0 ; ifconfig ib1' | grep -E "ib[01]"\|lladdr\|status
IsilonX210-S19-1: ib0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 4092
IsilonX210-S19-1:       lladdr 0.0.0.48.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.bc.8f.c1
IsilonX210-S19-1:       status: inactive
IsilonX210-S19-1: ib1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 2044
IsilonX210-S19-1:       lladdr 0.0.0.49.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.bc.8f.c2
IsilonX210-S19-1:       status: active
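
The comparison in step 3 can be sketched as a small script. This is an illustration only: it assumes that the last eight dot-separated bytes of the lladdr correspond to the address shown on the opensm process line, as the example suggests, and that ifconfig omits leading zeros in each byte. The helper names lladdr_to_guid and find_master_interface are hypothetical.

```shell
#!/bin/sh
# lladdr_to_guid (hypothetical helper): take a dotted lladdr as printed by
# ifconfig and print its last 8 bytes as a 0x-prefixed hex string, padding
# each byte to two digits.
lladdr_to_guid() {
    printf '%s\n' "$1" | awk -F. '{
        guid = "0x"
        for (i = NF - 7; i <= NF; i++) {
            byte = $i
            if (length(byte) == 1) byte = "0" byte   # restore the dropped leading zero
            guid = guid byte
        }
        print guid
    }'
}

# find_master_interface (hypothetical helper): read step 3 style output on
# stdin and print each interface whose lladdr maps to the address in $1.
find_master_interface() {
    want=$1
    iface=
    while read -r _node key value _rest; do
        case $key in
            ib[0-9]*:) iface=${key%:} ;;             # remember the current interface
            lladdr)
                if [ "$(lladdr_to_guid "$value")" = "$want" ]; then
                    printf '%s\n' "$iface"
                fi
                ;;
        esac
    done
}

# Sample input copied from the example output above.
find_master_interface 0xe41d2d0300bc8fc2 <<'EOF'
IsilonX210-S19-1: ib0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 4092
IsilonX210-S19-1:       lladdr 0.0.0.48.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.bc.8f.c1
IsilonX210-S19-1:       status: inactive
IsilonX210-S19-1: ib1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 2044
IsilonX210-S19-1:       lladdr 0.0.0.49.fe.80.0.0.0.0.0.0.e4.1d.2d.3.0.bc.8f.c2
IsilonX210-S19-1:       status: active
EOF
```

For the sample input, the script reports ib1. The status line still needs to be checked by eye, since the sketch does not look at whether the matching interface is active.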

Additional Information

If an interface shows as inactive in step 3, disregard any master entry reported for it; an inactive interface may still show as master.

If there are multiple OpenSM masters per switch, and the extra masters are not due to an inactive Network Interface Card (NIC), contact PowerScale Technical Support.

If there are no OpenSM masters, confirm that no devices outside the cluster are physically connected to the switch. This includes powered-on nodes that have not yet been added to the cluster or that have been removed from it. If there are no additional connections on the InfiniBand fabric, the switch may have taken on the role of master.

When optimally configured, the designated subnet master should be a cluster node, not an IB switch or unconfigured node.

In rare cases, an IB switch might be configured as its own subnet master. This can cause problems that are difficult to diagnose. For example, an IB interface might not come up, the switch might not route IB traffic correctly, or nodes might be prevented from joining the cluster.

If the IB switch is the master, contact PowerScale Technical Support.
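
The cases above can be turned into a rough first check on the step 2 output. The sketch below only counts lines containing "master" and compares that number with the expected number of switches. It cannot tell which switch a master belongs to or whether the interface is inactive, so it is a triage aid under those assumptions, not a diagnosis. The helper name check_master_count and its messages are hypothetical.

```shell
#!/bin/sh
# check_master_count (hypothetical helper): read step 2 style output on stdin
# and compare the number of "master" lines with the expected switch count ($1).
check_master_count() {
    expected=$1
    # grep -c prints 0 and returns non-zero when nothing matches, hence "|| true".
    found=$(grep -c 'master' || true)
    if [ "$found" -eq 0 ]; then
        echo "no master lines: check for non-cluster devices or a switch acting as master"
    elif [ "$found" -gt "$expected" ]; then
        echo "$found master line(s) for $expected switch(es): check for inactive interfaces"
    elif [ "$found" -lt "$expected" ]; then
        echo "$found master line(s) for $expected switch(es): a fabric may have no cluster master"
    else
        echo "ok: $found master line(s) for $expected switch(es)"
    fi
}

# Sample line copied from the step 2 example (single-switch environment).
# On a live cluster the input could instead be piped from:
#   isi_for_array -XI 'ps auxw | grep opensm'
printf '%s\n' 'IsilonX210-S19-1: root    3757   0.0  0.0  28536   5036  -  S    23Feb17      3:56.20 opensm: 0xe41d2d0300bc8fc2 master (opensm)' |
    check_master_count 1
```

Any result other than "ok" only indicates which of the cases above to look at next; the guidance on when to contact PowerScale Technical Support is unchanged.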

Affected Products

Isilon Switches

Products

PowerScale OneFS
Article Properties
Article Number: 000004114
Article Type: How To
Last Modified: 07 Jan 2026
Version: 8