PowerStore: System health check may report multiple failures after CP restart on PowerStoreOS 3.2 or 3.5
Summary: This article describes issues that may occur after a Control Path (CP) restart, including the restart performed during a non-disruptive upgrade (NDU), on a PowerStoreOS 3.2 or 3.5 system, and how to work around them. - Running a system health check may result in multiple failed checks. - Replication sessions may be in a system paused state. - It is not possible to add a remote system. ...
Symptoms
- A system health check may return the following failed checks:
    Pre-Upgrade Check           node_a  Failed  PUHC health check returned 3 failed checks.
        Checking SFP Alerts           Failed
        Checking Active Imports       Failed
        Checking Storage Migrations   Failed
    REST Configuration Service  node_a  Failed  Unable to perform health check due to communication error.
- Replication sessions may be in a system paused state.
    Alert 0x01302d03  Data transfer is paused for file system replication session
    Alert 0x01700403  A replication session associated with resource VolumeABC123 is in System_Paused state because of error.
- An attempt to add a remote system may fail immediately or after 60 seconds.
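To triage whether a system is hitting this issue, the symptoms above can be compared against the alerts and health check results actually observed. The following Python sketch assumes the alert IDs and the names of failed health checks have already been collected into plain lists (how they are collected is not covered here). The helper name `matching_symptoms` and the input shapes are hypothetical; only the alert IDs and check names come from this article.

```python
# Hypothetical triage helper: reports which of the symptoms listed in this
# article appear in already-collected alert IDs and failed health checks.
# Alert IDs and check names come from the article; input shapes are assumed.

KNOWN_ALERT_IDS = {
    "0x01302d03": "Data transfer is paused for file system replication session",
    "0x01700403": "Replication session is in System_Paused state because of error",
}

KNOWN_FAILED_CHECKS = {
    "Pre-Upgrade Check",
    "REST Configuration Service",
}


def matching_symptoms(alert_ids, failed_checks):
    """Return a description of each observed item that matches a known symptom."""
    matches = []
    for alert_id in alert_ids:
        key = alert_id.lower()  # compare hex IDs case-insensitively
        if key in KNOWN_ALERT_IDS:
            matches.append(f"alert {key}: {KNOWN_ALERT_IDS[key]}")
    for check in failed_checks:
        if check in KNOWN_FAILED_CHECKS:
            matches.append(f"failed check: {check}")
    return matches
```

An empty result does not rule the issue out; it only means none of the specific alerts or checks named here were observed.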
Cause
This issue is caused by a software defect present since PowerStoreOS 3.2.
The cluster uses multiple certificates, corresponding to the multiple IP addresses it uses.
Whenever CP restarts, a race condition means that CP does not always load all of the correct certificates. This can happen during an NDU (CP is restarted as part of every NDU), during a manual CP restart, or during any other CP restart on an affected version.
As a result, communications that rely on these certificates may be impacted, for example:
- The replication management connection is broken, and replication sessions are impacted.
- The system health check fails.
- A remote system cannot be added to configure replication.
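Because the issue is tied to CP restarts on affected releases, a quick version check helps decide whether this article applies. As a minimal sketch, assuming dotted numeric version strings such as "3.5.0.1", the following Python helper tests whether a PowerStoreOS version falls in the range this article describes: present since 3.2, fixed in 3.6. Treating every release at or above 3.6 as fixed is an assumption, and the helper names are hypothetical.

```python
# Hypothetical helper: is a PowerStoreOS version in the affected range?
# Range per this article: issue present since 3.2, fix available in 3.6.
# Assumes dotted numeric version strings such as "3.5.0.1".

AFFECTED_FROM = (3, 2)  # first release with the issue, per this article
FIXED_IN = (3, 6)       # release containing the fix, per this article


def parse_version(version):
    """Turn a string like '3.5.0.1' into a tuple like (3, 5, 0, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def is_affected(version):
    """True if the major.minor version is at least 3.2 and below 3.6."""
    major_minor = parse_version(version)[:2]
    return AFFECTED_FROM <= major_minor < FIXED_IN
```

Comparing tuples keeps the check correct for multi-digit components, which plain string comparison would not.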
Resolution
Workaround:
- Workaround for Pre-Upgrade Health Check (PUHC) failure: Restart the Control Path (CP) again, then run a system health check after five minutes.
svc_container_mgmt restart CP
NOTE: If another CP restart does not resolve the issue, wait for 15 minutes and restart CP again.
Restarting CP (management services) does not affect host access, and should only cause a short loss of access to the PowerStore Manager user interface (UI).
- Workaround for paused replication sessions:
1. Restart CP again on the affected appliance.
2. Resume the replication session from the source array user interface and check the result.
- Workaround for the "cannot add remote system" issue:
1. Restart CP on the affected appliance.
2. Retry adding the remote system from the source array user interface and check the result.
NOTE: If another CP restart does not resolve the issue, wait for 15 minutes and restart CP again.
Restarting CP (management services) does not affect host access, and should only cause a short loss of access to the PowerStore Manager user interface (UI).
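All three workarounds follow the same pattern: restart CP, give the management services time to come back, verify, and if the problem persists wait 15 minutes and restart CP again. The Python sketch below captures that sequence under stated assumptions. `restart_cp` and `verify` are hypothetical callables supplied by the operator: for example, `restart_cp` might wrap the `svc_container_mgmt restart CP` command, and `verify` might run a health check, resume the replication session, or retry adding the remote system, returning True on success. The five-minute settle time comes from the PUHC workaround and is applied to every case as an assumption. Stopping after two restarts is also an assumption, since this article does not say what to do if the second restart fails.

```python
import time
from typing import Callable


def restart_and_verify(
    restart_cp: Callable[[], None],       # hypothetical: triggers a CP restart
    verify: Callable[[], bool],           # hypothetical: True if the symptom is gone
    settle_seconds: float = 5 * 60,       # PUHC workaround: check after five minutes
    retry_wait_seconds: float = 15 * 60,  # NOTE: wait 15 minutes before restarting again
    max_restarts: int = 2,                # assumption: initial restart plus one retry
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart CP, wait, verify; retry after a longer wait if still failing.

    Returns True as soon as verify() succeeds, or False if every attempt fails.
    """
    for attempt in range(1, max_restarts + 1):
        if attempt > 1:
            # Previous restart did not help: wait before restarting CP again.
            sleep(retry_wait_seconds)
        restart_cp()
        # Give the management services time to come back before checking.
        sleep(settle_seconds)
        if verify():
            return True
    return False
```

Passing `sleep` as a parameter keeps the wait logic testable without real 15-minute delays; in production the default `time.sleep` is used.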
Fix:
The fix for this issue is available in PowerStoreOS 3.6.
Affected Products
PowerStore, PowerStore 1000T, PowerStore 1200T, PowerStore 3000T, PowerStore 3200T, PowerStore 5000T, PowerStore 500T, PowerStore 5200T, PowerStore 7000T, PowerStore 9000T, PowerStore 9200T
Article Properties
Article Number: 000216753
Article Type: Solution
Last Modified: 26 Feb 2024
Version: 8