PowerScale: InfiniBand to Ethernet conversion may result in incorrect link aggregation configuration
Summary: This article describes how to resolve an issue seen during a cluster's backend InfiniBand to Ethernet conversion on OneFS versions prior to 9.1.0.0.
Symptoms
On OneFS versions prior to 9.1, converting a cluster from an InfiniBand backend to an Ethernet backend can result in incorrectly configured aggregate ports. Rebooting the node then creates the bad aggregate and causes a node split.
The Mellanox interfaces (mlxen) are misconfigured for aggregation, which is likely to prevent the node from rejoining the cluster. In the ifconfig output from an affected node, the ISIINTERNAL interfaces (mlxen0 and mlxen1) are listed as laggports of lagg0.
Isilon-18# ifconfig
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:60:16:cc:bb:aa
        inet 192.168.60.10 netmask 0xffffff80 broadcast 192.168.60.127 zone 1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex,master>)
        status: active
mlxen0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,ISIINTERNAL> metric 0 mtu 1500
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 98:03:9b:cc:bb:aa
        inet 128.221.252.18 netmask 0xffffff00 broadcast 128.221.252.255 zone 1
        inet 128.221.254.18 netmask 0xffffff00 broadcast 128.221.254.255 zone 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
        status: active
mlxen1: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,ISIINTERNAL> metric 0 mtu 1500
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 98:03:9b:cc:bb:aa
        inet 128.221.253.18 netmask 0xffffff00 broadcast 128.221.253.255 zone 1
        inet 128.221.254.18 netmask 0xffffff00 broadcast 128.221.254.255 zone 1
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (40Gbase-CR4 <full-duplex,rxpause,txpause>)
        status: active
mlxen2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 98:03:9b:cc:bb:fa
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-CX4 <full-duplex,rxpause,txpause>)
        status: active
mlxen3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 98:03:9b:cc:bb:fb
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-CX4 <full-duplex,rxpause,txpause>)
        status: active
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=ed07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 98:03:9b:cc:bb:aa
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: mlxen0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: mlxen1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
vlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=303<RXCSUM,TXCSUM,TSO4,TSO6>
        ether 98:03:9b:cc:bb:aa
        inet 10.10.20.11 netmask 0xffffff00 broadcast 10.10.20.255 zone 18
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 100 vlanpcp: 0 parent interface: lagg0
        groups: vlan
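The check described above can be scripted. The following is a minimal sketch, not a supported tool: it assumes that ifconfig prints one attribute per line (its usual layout) and that the output is supplied on standard input. The function name find_internal_laggports is hypothetical. It prints each lagg interface together with every member port that carries the ISIINTERNAL flag; no output means that no backend interface is a lagg member.

```shell
# Sketch: report backend (ISIINTERNAL) interfaces that are members of a lagg.
# find_internal_laggports is a hypothetical helper name, not a OneFS command.
find_internal_laggports() {
    awk '
        # Interface header line, for example "mlxen0: flags=1008843<...,ISIINTERNAL> ..."
        $1 ~ /:$/ && $2 ~ /^flags=/ {
            iface = $1
            sub(/:$/, "", iface)
            if ($0 ~ /ISIINTERNAL/) internal[iface] = 1
            next
        }
        # Member line inside a lagg block, for example "laggport: mlxen0 flags=1c<...>"
        $1 == "laggport:" { n++; lagg[n] = iface; port[n] = $2 }
        # Evaluate at the end so the result does not depend on block order
        END {
            for (i = 1; i <= n; i++)
                if (port[i] in internal) print lagg[i], port[i]
        }
    '
}

# Usage on a node (read-only): ifconfig | find_internal_laggports
```

For the output shown in this article, the sketch would report mlxen0 and mlxen1 as members of lagg0. The script only reads ifconfig output and changes nothing on the node.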
Cause
The conversion from InfiniBand to Ethernet changes the backend interface names from ib0 to mlxen0 (int-a) and from ib1 to mlxen1 (int-b). The lagg creator references "mlxen0" and "mlxen1" as the external network interface ports, so the backend interfaces end up in the aggregate. If the issue occurs (that is, the preventive measures were not taken), the mapping must be corrected within Flexnet (the networking management daemon).
Resolution
Before Migration:
This issue has been corrected in OneFS 9.1 and later. If the cluster runs an affected version, perform the following steps before migrating from InfiniBand to Ethernet.
- Remove all aggregate interfaces from all network pools.
- Complete the migration.
- Re-add all aggregate interfaces to all necessary network pools.
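To keep the removal and the later re-add consistent, the pool and interface pairs can be recorded once and turned into commands for both phases. The sketch below is one possible approach under stated assumptions: the input format (one "<pool> <interface>" pair per line) and the function name emit_pool_cmds are hypothetical, and the function only prints the "isi network pools modify" commands used in this article so that they can be reviewed before anything is run.

```shell
# Sketch: print (do not run) the pool commands for the remove or re-add phase.
# emit_pool_cmds and its input format are hypothetical, for illustration only.
# Input on stdin: one "<groupnet.subnet.pool> <interface>" pair per line.
emit_pool_cmds() {
    action=$1   # "remove" before the migration, "add" after it
    case "$action" in
        remove|add) ;;
        *) echo "usage: emit_pool_cmds remove|add" >&2; return 2 ;;
    esac
    while read -r pool iface; do
        # Skip blank lines and comment lines
        case "$pool" in ''|'#'*) continue ;; esac
        printf 'isi network pools modify %s --%s-ifaces=%s\n' "$pool" "$action" "$iface"
    done
}

# Usage (agg_ifaces.txt is a hypothetical file holding the recorded pairs):
#   emit_pool_cmds remove < agg_ifaces.txt   # review, then run before migrating
#   emit_pool_cmds add    < agg_ifaces.txt   # review, then run after migrating
```

Printing the commands instead of executing them keeps the step reviewable, and the same recorded list drives both phases so no interface is forgotten when re-adding.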
After Migration:
If the issue has occurred and a node has split from the cluster, perform one of the following procedures (Auto or Manual) to work around the issue.
Auto Resolution (Workaround)
========================================================
- Create a backup of the "lni" file:
  mv /etc/mcp/sys/lni.xml /etc/mcp/sys/lni.xml.bak
- Remove the affected node interface from the network pool.
  isi network pools modify <groupnet.subnet.pool> --remove-ifaces=<interface example: 2:40gige-agg-1>
- Run the following command to rebuild the node's lni.xml file:
  isi_create_lni_xml
- Reboot the node.
- Verify that the interface is correct (in the ifconfig output, the ISIINTERNAL interfaces should no longer be listed as laggports).
- Proceed with the final step of configuring MTU 9000. After that is done, add the affected node interface back to the pool.
  isi network pools modify <groupnet.subnet.pool> --add-ifaces=<interface example: 2:40gige-agg-1>
Manual Resolution (Workaround)
========================================================
To resolve this issue, remove the laggports manually by performing the following actions.
- Connect to the affected node over a serial connection.
- Stop "mcp" on the affected node.
  killall -9 isi_mcp
- Stop "isi_flexnet_d" on the affected node.
  killall -9 isi_flexnet_d
- Create a backup of both "flx_config.xml" files in the local directory.
  mv /etc/ifs/flexnet/flx_config.xml /etc/ifs/flexnet/flx_config.xml.bak
  mv /etc/ifs/flexnet/flx_config.xml~ /etc/ifs/flexnet/flx_config.xml~.bak
- If any VLANs are associated with the aggregate port, bring them down.
  ifconfig <vlan interface> down
  EXAMPLE
  ifconfig vlan0 down
- Remove the "laggports" from the lag interface.
  ifconfig <lag interface> -laggport <mlx iface>
  EXAMPLES
  ifconfig lagg0 -laggport mlxen0
  ifconfig lagg0 -laggport mlxen1
- Bring down the lag interface.
  ifconfig <lag iface> down
  EXAMPLE
  ifconfig lagg0 down
- Now that the backend interfaces are disassociated from the lag port, ping any other node through both "int-a" and "int-b".
  ping <back-end IP [int-a]>
  ping <back-end IP [int-b]>
- Verify that the node is no longer down.
  isi status -q
- Reboot the node to refresh all processes.
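The ifconfig commands in the manual procedure can be derived from the node's own ifconfig output instead of being typed by hand. The following is a minimal sketch under stated assumptions: ifconfig prints one attribute per line, VLAN blocks contain a "parent interface:" entry as in the output shown in this article, and the function name plan_lagg_teardown is hypothetical. It prints the commands in the order used above (VLANs down, ISIINTERNAL laggports removed, lagg down) and does not execute them. It does not cover stopping isi_mcp and isi_flexnet_d or backing up the flx_config.xml files, which still have to be done first.

```shell
# Sketch: print (do not run) the ifconfig commands of the manual workaround.
# plan_lagg_teardown is a hypothetical helper name, not a OneFS command.
plan_lagg_teardown() {
    awk '
        # Interface header line: remember the name and whether it is a backend port
        $1 ~ /:$/ && $2 ~ /^flags=/ {
            iface = $1
            sub(/:$/, "", iface)
            if ($0 ~ /ISIINTERNAL/) internal[iface] = 1
            next
        }
        # lagg member line: "laggport: mlxen0 flags=1c<...>"
        $1 == "laggport:" { np++; plagg[np] = iface; pname[np] = $2; next }
        # VLAN detail line: "vlan: 100 vlanpcp: 0 parent interface: lagg0"
        $1 == "vlan:" {
            for (i = 1; i < NF; i++)
                if ($i == "interface:") { nv++; vname[nv] = iface; vparent[nv] = $(i + 1) }
        }
        END {
            # A lagg is affected if at least one member is a backend interface
            for (p = 1; p <= np; p++)
                if ((pname[p] in internal) && !(plagg[p] in bad)) {
                    bad[plagg[p]] = 1
                    nb++
                    border[nb] = plagg[p]
                }
            # 1. Bring down VLANs whose parent is an affected lagg
            for (v = 1; v <= nv; v++)
                if (vparent[v] in bad) print "ifconfig " vname[v] " down"
            # 2. Remove the backend ports from the lagg
            for (p = 1; p <= np; p++)
                if (pname[p] in internal) print "ifconfig " plagg[p] " -laggport " pname[p]
            # 3. Bring down each affected lagg
            for (b = 1; b <= nb; b++) print "ifconfig " border[b] " down"
        }
    '
}

# Usage on the affected node: ifconfig | plan_lagg_teardown
```

Review the printed commands against the manual steps before running any of them. An empty result means that no ISIINTERNAL interface is a lagg member, and the workaround does not apply.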