December 18th, 2014 05:00
Isilon int-a switch replacement, failover/failback
Hi,
We recently upgraded the InfiniBand switches (firmware rev 4.8.930+205-0002-05_A) on an Isilon cluster (OneFS 7.0.1) from 8-port to 24-port switches. The procedure went smoothly, and the indications are that the cluster is now using int-b for data traffic and int-a for heartbeat/message traffic between cluster nodes. However, beyond visual inspection of the LED indicators, we don't know how to verify this. We have two questions:
1) How do we determine which internal network is being used for primary communication between cluster nodes? Does it matter? The docs only state that "int-a must be configured for cluster failover operation to occur"
2) Upon reboot, will primary communication revert back to int-a?
Thanks,
Don Pichette



Jeffey1
December 19th, 2014 01:00
Hi Piched,
You can configure a single internal network, or you can also specify a second internal network, which provides internal network failover if either the int-a or int-b port fails. Two commands give detailed internal networking information when you are troubleshooting an Isilon cluster:
1. Run "status advanced" from the "isi config" command shell.
This command shows the health status of all nodes in the cluster; check the internal networking information in its output.
2. Run "isi_eth_mixer_d showlayout" to show how each node selected an interface for connecting to each other node. For more information about this command, see the thread Peter shared.
Finally, if you reboot the int-a switch, internal communication between the nodes will fail over to the int-b switch.
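A hedged sketch, not something from the thread: to check whether internal traffic actually moves to int-b and back (question 2 above), one could save the "isi_eth_mixer_d showlayout" output before and after rebooting the int-a switch and compare the two snapshots. The Python below assumes Python 3.9+ and, for capture, a OneFS node where isi_eth_mixer_d is on the PATH and runnable by the current user; if the node's Python is older, the snapshots can instead be saved with shell redirection (e.g. "isi_eth_mixer_d showlayout > before.txt") and compared elsewhere. The helper names capture_layout and layout_changes are hypothetical, and the showlayout output is compared as plain text rather than parsed, since its format is not described here.

```python
import difflib
import shutil
import subprocess
from typing import Optional


def capture_layout() -> Optional[str]:
    """Return the text printed by `isi_eth_mixer_d showlayout`, or None if the
    tool is not on PATH (e.g. when run off-cluster).
    Assumption: run on a OneFS node with sufficient privileges."""
    if shutil.which("isi_eth_mixer_d") is None:
        return None
    result = subprocess.run(
        ["isi_eth_mixer_d", "showlayout"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )
    return result.stdout


def layout_changes(before: str, after: str) -> list[str]:
    """Return only the removed ('- ') and added ('+ ') lines between two
    showlayout snapshots. Lines are compared as plain text, not parsed."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    # ndiff also emits unchanged ('  ') and hint ('? ') lines; drop them.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Usage: call capture_layout() before the int-a switch reboot, again after it, and once more after the switch is back; an empty list from layout_changes() between the first and last snapshots would suggest the layout returned to its original state.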
Anonymous User
December 18th, 2014 13:00
In /var/log you'll find several files named opensm*, and they'll show you what's being used for what. For example, /var/log/opensm-*.topo will show you which ports are active and how each node communicates with the other nodes.
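A minimal Python sketch for locating these files, assuming Python 3.9+ and read access to /var/log on the node. opensm_files is a hypothetical helper: it only lists the opensm* files and separates the opensm-*.topo topology files from the rest by name; it does not interpret their contents.

```python
from pathlib import Path


def opensm_files(log_dir: str = "/var/log") -> dict[str, list[Path]]:
    """Group the opensm* files in log_dir into 'topo' (files named
    opensm-*.topo) and 'other' (every remaining opensm* file)."""
    files = sorted(p for p in Path(log_dir).glob("opensm*") if p.is_file())
    topo = [p for p in files if p.name.startswith("opensm-") and p.suffix == ".topo"]
    other = [p for p in files if p not in topo]
    return {"topo": topo, "other": other}
```

Usage: opensm_files()["topo"] gives the topology files to open with less or a text editor; passing a different directory lets you run it against copies of the logs taken off the cluster.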
Peter_Sero
December 18th, 2014 13:00
See also
https://community.emc.com/message/851390#851390
in the context of discussing cluster quorum.
-- Peter