PowerScale: Isilon: Nodes running Ethernet Backend can split if both backend interfaces are brought down and back up within five minutes of each other
Summary: This article describes a condition where a node running an Ethernet backend can split when both backend interfaces go down within five minutes of each other, because the first interface is still in its prep timer state.
Symptoms
A node running only an Ethernet backend can split if one backend interface flaps and the other backend interface flaps within five minutes of it.
NOTE: This scenario does not affect InfiniBand architecture. If these symptoms appear on InfiniBand nodes, the cause is different and should be investigated separately.
For example, the messages log shows the link going down and then coming back up:
2021-01-01T16:27:15Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen0: *** HW EVENT: Link DOWN ***
2021-01-01T16:27:15Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen0: link state changed to DOWN
2021-01-01T16:27:27Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen0: *** HW EVENT: Link UP ***
2021-01-01T16:27:27Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen0: link state changed to UP
When the link is back up, a prep timer begins in the Load Balancing or Failover (LBFO) logs:
2021-01-01T16:27:27Z <26.5> ISILON-1(id1) /boot/kernel.amd64/kernel: [lbfo_kif.c:432](pid 12="intr")(tid=100042) mlxen0: prep state started after link UP
If the other link is brought down within five minutes:
2021-01-01T16:29:04Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen1: *** HW EVENT: Link DOWN ***
2021-01-01T16:29:04Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: mlxen1: link state changed to DOWN
The other path may not be ready yet:
2021-01-01T16:29:04Z <26.4> ISILON-8(id8) /boot/kernel.amd64/kernel: [lbfo_so.c:484](pid 12="intr")(tid=100176) * WARNING * FAILOVER DEFERRED (TX RE-TRANSMIT MONITOR) !!! Alternate path in prep state. Node: 128.221.254.1, CurPath: mlxen1, AltPath: mlxen0, Prep Secs Left: 10
NOTE: The "Prep Secs Left: 10" value in this message is misleading. The counter increments up to 300, so this output means that the counter is at 10 seconds and 290 seconds remain. There is a fix for this issue.
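The log pattern above can be checked mechanically. The following is a minimal Python sketch, not a Dell tool: it assumes log lines in exactly the format shown in this article, and the names `find_risky_flaps` and `prep_seconds_remaining` are hypothetical helpers introduced here for illustration. It reports a link going down while another interface on the same node is still inside its 300-second prep window, and it converts the misleading "Prep Secs Left" value into the seconds actually remaining.

```python
import re
from datetime import datetime

PREP_SECONDS = 300  # five-minute prep timer described in this article

# Matches the timestamp, node name, interface, and event of the log lines
# shown in this article (assumption: other releases may format lines differently).
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\s+\S+\s+"
    r"(?P<node>[^\s(]+)\(id\d+\).*?"
    r"(?P<ifname>\w+): "
    r"(?P<event>prep state started after link UP|link state changed to DOWN)"
)


def find_risky_flaps(lines):
    """Return (node, prep_if, down_if, seconds_into_prep) for each overlap."""
    prep_started = {}  # (node, ifname) -> time the prep state began
    risky = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S")
        node, ifname = m["node"], m["ifname"]
        if m["event"].startswith("prep state"):
            prep_started[(node, ifname)] = ts
            continue
        # A link went down: is another interface on this node still in prep?
        for (other_node, other_if), started in prep_started.items():
            elapsed = (ts - started).total_seconds()
            if other_node == node and other_if != ifname and 0 <= elapsed < PREP_SECONDS:
                risky.append((node, other_if, ifname, int(elapsed)))
    return risky


def prep_seconds_remaining(reported_value):
    """Interpret "Prep Secs Left" as elapsed seconds, per the NOTE above."""
    return PREP_SECONDS - reported_value


# Lines copied from the log excerpts in this article.
SAMPLE_LOG = [
    '2021-01-01T16:27:27Z <26.5> ISILON-1(id1) /boot/kernel.amd64/kernel: '
    '[lbfo_kif.c:432](pid 12="intr")(tid=100042) mlxen0: prep state started after link UP',
    '2021-01-01T16:29:04Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: '
    'mlxen1: *** HW EVENT: Link DOWN ***',
    '2021-01-01T16:29:04Z <0.5> ISILON-1(id1) /boot/kernel.amd64/kernel: '
    'mlxen1: link state changed to DOWN',
]

print(find_risky_flaps(SAMPLE_LOG))   # [('ISILON-1', 'mlxen0', 'mlxen1', 97)]
print(prep_seconds_remaining(10))     # 290
```

In the excerpt, mlxen1 went down 97 seconds after mlxen0 entered prep state, which is well inside the five-minute window.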
Cause
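This occurs because the LBFO daemon places the link into a "prep state". In prep state, a path whose link went down remains unavailable until a five-minute counter completes after the link comes back up. The path becomes unavailable as soon as the link goes down, but the prep state counter starts only when the link comes back up. If the other backend interface goes down before the counter completes, failover to the path in prep state is deferred, which can leave the node without a usable backend path and cause it to split.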
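The timing can be illustrated with a toy model. This Python sketch is not the LBFO implementation; the `BackendPath` class, its method names, and the use of plain seconds as timestamps are all assumptions made for illustration. Treating the path as usable at exactly 300 seconds is also an assumption, since the article does not specify the boundary.

```python
from dataclasses import dataclass
from typing import Optional

PREP_SECONDS = 300  # five-minute prep counter described in this article


@dataclass
class BackendPath:
    """Toy model of one backend path (hypothetical, for illustration only)."""
    name: str
    link_up: bool = True
    prep_started_at: Optional[float] = None  # None until the link has flapped

    def link_down(self):
        # The path is unavailable as soon as the link goes down.
        self.link_up = False
        self.prep_started_at = None

    def link_came_up(self, now):
        # The prep counter starts only when the link comes back up.
        self.link_up = True
        self.prep_started_at = now

    def usable(self, now):
        if not self.link_up:
            return False
        if self.prep_started_at is None:
            return True
        return now - self.prep_started_at >= PREP_SECONDS


int_a = BackendPath("int-a")
int_b = BackendPath("int-b")

int_a.link_down()            # first interface flaps
int_a.link_came_up(now=12)   # link is up again; prep counter starts
int_b.link_down()            # second interface goes down 97 seconds later

no_path_at_109 = not any(p.usable(109) for p in (int_a, int_b))
print(no_path_at_109)        # True: neither path is usable
print(int_a.usable(400))     # True: prep counter has completed
```

The model shows why the second flap matters: the first link is physically up, but it cannot carry traffic until its counter completes.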
Resolution
Steps to consider and perform:
1. Do not perform tests simulating backend failures without accounting for this timer.
2. Upgrading backend switches requires the switch to reload. A reload brings down the interfaces that are connected to the switch, and the node sees them as "no carrier".
From the command-line interface (CLI), run the following command to check the interface status.
# ifconfig -vvv
3. When the switch finishes reloading, the node interfaces come back up (showing a carrier status of "active"). Run the command above to verify the interface status. The link coming up starts a five-minute prep state timer, and the interface remains unusable until the timer completes.
4. During backend switch upgrades, ensure that enough time is allocated when moving from one fabric to the next.
For example, once the int-a upgrade is complete, wait 10-15 minutes before starting the upgrade for int-b.
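The waiting period in step 4 can be worked out from the link-up times. The following Python sketch assumes the link-up timestamps are collected by hand from the messages log; the function name `earliest_second_fabric_start` and the five-minute safety margin are assumptions, with the margin chosen so that the total wait matches the 10-minute lower bound suggested above. The second timestamp in the example is hypothetical.

```python
from datetime import datetime, timedelta

PREP_TIMER = timedelta(seconds=300)    # five-minute prep state from this article
SAFETY_MARGIN = timedelta(minutes=5)   # assumption: pads the total wait to 10 minutes


def earliest_second_fabric_start(link_up_times, margin=SAFETY_MARGIN):
    """Return the earliest time to begin work on the second fabric.

    link_up_times: when each node's first-fabric interface came back up.
    """
    if not link_up_times:
        raise ValueError("need at least one link-up time")
    # The last interface to come up is the last one to leave prep state.
    last_up = max(link_up_times)
    return last_up + PREP_TIMER + margin


int_a_link_up = [
    datetime(2021, 1, 1, 16, 27, 27),  # from the log excerpt in this article
    datetime(2021, 1, 1, 16, 27, 41),  # hypothetical second node
]

start_int_b = earliest_second_fabric_start(int_a_link_up)
print(start_int_b)  # 2021-01-01 16:37:41
```

Using the latest link-up time rather than the switch reload time accounts for nodes whose interfaces take longer to come back up.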
Affected Products
Isilon Switch Ethernet
Products
Isilon A200, Isilon A2000, Isilon F800, Isilon F810, Isilon Gen6.5, Isilon Gen6, Isilon H400, Isilon H500, Isilon H5600, Isilon H600, PowerScale F200, PowerScale F600, PowerScale F900
Article Properties
Article Number: 000189469
Article Type: Solution
Last Modified: 16 Sept 2022
Version: 3