PowerStore: Async Replication Sessions are out of sync, System_Paused, not progressing, or impacting node stability

Summary: PowerStore volume replication appears as not syncing after a recent node reboot, including a reboot during a non-disruptive upgrade (NDU). If left unresolved, this condition impacts system stability. A node reboot can happen shortly after a new replication session is created. ...


Symptoms

The system reports replications out of sync, and the system pauses the activity.

An alert is raised in the user interface:
A replication session in this state cannot synchronize data from the local cluster to the remote cluster until the issue is resolved. The RPO for this session might be out of compliance.

Journal or Alert indications:
May 11 22:53:26.629200 ::: B_ALERT volume Major A replication session associated with resource Test-Repl2 is in System_Paused state because of error: Transit object with Handle cbcf96ea-e8c8-4d88-8024-4bc507d7ac7c:34:761 : status not available.
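When scanning many journal lines for this condition, it can help to pull out the affected resource and the session state. The following is a minimal sketch that assumes the alert text follows the wording of the sample line above; the function name and the regular expression are illustrative and are not part of any PowerStore tool.

```python
import re

# Pattern based only on the sample alert line shown above (an assumption;
# other PowerStore releases may word the alert differently).
ALERT_PATTERN = re.compile(
    r"replication session associated with resource (?P<resource>\S+) "
    r"is in (?P<state>\S+) state because of error: (?P<error>.*)$"
)

def parse_replication_alert(line):
    """Return resource, state, and error text, or None if the line does not match."""
    match = ALERT_PATTERN.search(line.strip())
    if match is None:
        return None
    return match.groupdict()

sample = (
    "May 11 22:53:26.629200 ::: B_ALERT volume Major A replication session "
    "associated with resource Test-Repl2 is in System_Paused state because of "
    "error: Transit object with Handle "
    "cbcf96ea-e8c8-4d88-8024-4bc507d7ac7c:34:761 : status not available."
)
parsed = parse_replication_alert(sample)
```

Lines that do not match the pattern return None, so the helper can be applied to a whole journal export without special handling.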

After an attempt to delete and re-create the replication session, the session still showed no progress.
  • PowerStore node reboots occur as a result of this condition.

Cause

The issue is due to the storage network MTU being inconsistent with the user's network infrastructure.

In one situation, this occurred when the MTU of the storage network (the async replication transport layer) was changed from the standard 1500 to 9000.

Resolution

The MTU must be configured consistently across the user's entire network.
An inconsistent MTU breaks the replication relationship and impacts PowerStore stability, because the nodes eventually reboot as a result of this misconfiguration.

Revert the storage network configuration to MTU 1500.
Replication then resumes without any further action.

Note:
The validation tool ("Verify and Update" under Protection/Remote System in the PowerStore UI) may show a proper connection to the remote site even when jumbo frames do not pass end to end. Support therefore requires a ping test with a specific packet size, run from source to DR and in the opposite direction, to test and confirm that MTU 9000 transport is NOT working.
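The packet size for such a test follows from the header overhead: an IPv4 header takes 20 bytes and an ICMP header 8 bytes, so the largest ICMP payload that fits in one unfragmented frame is the MTU minus 28. The sketch below computes that value and assembles a Linux iputils-style ping command with fragmentation prohibited (`-M do`). The helper names and the target address are hypothetical, and the exact ping syntax available on a PowerStore node or on a host in the storage network may differ, so treat the command as an assumption to adapt.

```python
IPV4_HEADER_BYTES = 20   # IPv4 header without options
ICMP_HEADER_BYTES = 8    # ICMP echo header

def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one IPv4 packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def build_ping_command(target_ip, mtu, count=3):
    """Build a Linux iputils ping command that forbids fragmentation (-M do)."""
    payload = icmp_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-c", str(count), "-s", str(payload), target_ip]

# 192.0.2.10 is a documentation address standing in for the remote storage IP.
jumbo_cmd = build_ping_command("192.0.2.10", 9000)
standard_cmd = build_ping_command("192.0.2.10", 1500)
```

If the 8972-byte test fails in either direction while the 1472-byte test succeeds, some device in the path is not passing MTU 9000 frames.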

Additional Information

In one additional scenario, replication can also stop progressing when the storage networks on the source and remote sites are configured with the same IP address range. This is a rare situation that may result from human error during initial implementation.
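A quick way to rule this scenario out is to compare the two storage network ranges for overlap. The following is a minimal sketch using the Python standard library `ipaddress` module; the function name and the example ranges are hypothetical, and the real ranges have to be taken from each cluster's storage network configuration.

```python
import ipaddress

def storage_ranges_overlap(source_cidr, remote_cidr):
    """Return True if the two storage network ranges share any addresses."""
    # strict=False accepts a host address with a prefix, such as 10.0.0.5/24.
    source = ipaddress.ip_network(source_cidr, strict=False)
    remote = ipaddress.ip_network(remote_cidr, strict=False)
    return source.overlaps(remote)

# Example ranges only; substitute the values configured on each site.
same_range = storage_ranges_overlap("10.10.20.0/24", "10.10.20.0/24")
distinct_ranges = storage_ranges_overlap("10.10.20.0/24", "10.10.30.0/24")
```

A result of True means the source and remote storage networks share addresses and should be reviewed.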

The issue is not related to the following article:
KB 185143: PowerStore: After the successful NDU of PowerStore cluster, replication session's synchronization goes into System_Paused state.

Affected Products

PowerStore
Article Properties
Article Number: 000187268
Article Type: Solution
Last Modified: 26 Dec 2022
Version:  5