PowerStore 500T: CMI self-protect mechanism may lead to a short service disruption

Summary: On PowerStore 500T, the CMI self-protect mechanism may lead to an auto-recovered Data Unavailable (DU) condition.


Symptoms

This issue only affects PowerStore 500T running PowerStoreOS version 3.2.1 (or later).
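
The applicability rule above can be expressed as a simple check. The following is a minimal sketch only: the function names, the model string format, and the optional "-build" suffix on the version string are assumptions made for illustration, not the output format of any PowerStore tool.

```python
def parse_version(text):
    """Turn a dotted version such as '3.2.1' into a tuple of ints.

    Assumption: any build suffix is separated by '-' and can be ignored.
    """
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(model, os_version):
    """Return True for a 500T model running PowerStoreOS 3.2.1 or later."""
    is_500t = model.strip().upper().endswith("500T")
    # Compare only major.minor.patch so that '3.2.1.0' counts as 3.2.1.
    return is_500t and parse_version(os_version)[:3] >= (3, 2, 1)


print(is_affected("PowerStore 500T", "3.2.1"))    # True
print(is_affected("PowerStore 500T", "3.2.0.1"))  # False
print(is_affected("PowerStore 1200T", "3.5.0"))   # False
```

Comparing tuples of integers rather than raw strings avoids misordering versions such as 3.10 and 3.2.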

If a PowerStore node stops responding, the CMI link is disabled. This leads to a reboot of that node, and in some rare cases the peer node may also panic unexpectedly, leading to a short, auto-recovered DU.

Cause

PowerStore 500T uses RDMA for internal communication between its nodes. This communication runs over the CMI link (PCIe NTB links).

On releases prior to version 3.2.1, a node that encountered a CPU IERR (CATERR) could propagate the CPU IERR to the peer node, potentially leading to data loss (DL). For more information, see KB# 000213516 PowerStore 500T: Hardware failure may propagate to peer node leading to service disruption.

To fix that issue, a Watchdog was implemented in the BMC. When the Watchdog expires, the BMC cuts the CMI links between the two nodes. This causes one of the nodes to be fenced and rebooted.

The other node may sometimes experience starvation (an inability to access resources) because the remote memory on the peer node has been "cut."
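
The Watchdog behavior described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the BMC implementation: the class name, the `kick` and `poll` methods, and the timeout value are hypothetical, and this article does not state what resets the Watchdog or how long its timeout is.

```python
class CmiWatchdog:
    """Toy model of a watchdog that guards the inter-node CMI link."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # hypothetical timeout, in seconds
        self.last_kick = 0.0        # time of the last sign of life
        self.link_up = True         # state of the CMI link

    def kick(self, now):
        """Record a sign of life from the monitored node."""
        if self.link_up:
            self.last_kick = now

    def poll(self, now):
        """Cut the link once the node has been silent longer than the timeout."""
        if self.link_up and now - self.last_kick > self.timeout_s:
            self.link_up = False  # models the BMC cutting the CMI links
        return self.link_up


wd = CmiWatchdog(timeout_s=5.0)
wd.kick(now=1.0)
print(wd.poll(now=4.0))  # True: the node responded 3 seconds ago
print(wd.poll(now=7.0))  # False: 6 seconds of silence, so the link is cut
```

In this model the link stays down once it has been cut, which mirrors the fence-and-reboot outcome described above. The starvation that the surviving node may experience is not modeled.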

Resolution

None. There is no workaround to avoid the service disruption.

Affected Products

PowerStore, PowerStore 500T

Article Properties

Article Number: 000302480
Article Type: Solution
Last Modified: 07 Apr 2025
Version: 3