All Consistency Groups (CGs) replicating from a specific VMAX3/PowerMax array enter the Error or Paused by System state.
The following error appears in the CLI (get_system_status) output and in the GUI alerts:
Items: ERROR: One or more links of group CG1 are set to replicate snaps and an error occurred in the snap-based replication process. The following errors were received from the storage: Link = Site1 -> Site2, error = [An internal Snap or Clone error has occurred. Please see the symapi log file]
In the SYMAPI log on the Site Control RPA (/var/symapi/log/symapi-xxxxxx.log):
01/02/2020 18:20:26.504 4002 STARTING a SNAPVX 'ESTABLISH' operation for Device List, snapshot 'RPc786dd98_xxxx_000xxxxxxxxxxxxxxx'.
01/02/2020 18:20:26.505 4002 Symm 0001xxxxxxxx Number of Devices: xx
01/02/2020 18:20:26.505 4002 Source Devices: [ 0xxx-0xxx ]
01/02/2020 18:20:26.513 4002 6 EMC:RecoverPoint execute_9244_syscall syscall: 0x81, 00xxx(S), SYMAPI: SYMAPI_C_SNAP_INTERNAL_ERROR, Order17: SNAPVX_FAILURE(0x4D), Extended RC: Unknown(0x000800C5), flag: 0x00
01/02/2020 18:20:26.686 4002 STARTING a SNAPVX 'ESTABLISH' operation for Device List, snapshot 'RPc786dd98_xxxx_000xxxxxxxxxxxxxxx'.
01/02/2020 18:20:26.687 4002 Symm 0001xxxxxxxx Number of Devices: 1
01/02/2020 18:20:26.687 4002 Source Devices: [ 0xxx ]
01/02/2020 18:20:26.692 4002 7 EMC:RecoverPoint execute_9244_syscall syscall: 0x81, 00xxx(S), SYMAPI: SYMAPI_C_SNAP_INTERNAL_ERROR, Order17: SNAPVX_FAILURE(0x4D), Extended RC: Unknown(0x000800C5), flag: 0x00
01/02/2020 18:20:27.520 4002 6 EMC:RecoverPoint execute_9244_syscall syscall: 0x81, 00xxx(S), SYMAPI: SYMAPI_C_SNAP_INTERNAL_ERROR, Order17: SNAPVX_FAILURE(0x4D), Extended RC: Unknown(0x000800C5), flag: 0x00
...
01/02/2020 18:20:29.532 4002 The SNAPVX 'ESTABLISH' operation FAILED.
01/02/2020 18:20:29.532 4002 The Snapvx operation FAILED.
01/02/2020 18:20:29.702 4002 The SNAPVX 'ESTABLISH' operation FAILED.
01/02/2020 18:20:29.702 4002 The Snapvx operation FAILED.
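When the SYMAPI log is long, it can help to tally the SnapVX error lines rather than read them one by one. The following is a minimal Python sketch, assuming the error lines follow the same layout as the excerpt above. The function name `summarize_snapvx_errors` and the regular expression are illustrative and are not part of RecoverPoint or Solutions Enabler.

```python
import re
from collections import Counter

# Pattern derived from the error lines in the log excerpt above.
# It is an assumption that other SYMAPI error lines share this layout.
ERROR_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+.*?"
    r"SYMAPI:\s*(?P<symapi>\w+),\s*"
    r"Order17:\s*(?P<order17>\w+)\((?P<code>0x[0-9A-Fa-f]+)\),\s*"
    r"Extended RC:\s*(?P<ext_name>\w+)\((?P<ext_rc>0x[0-9A-Fa-f]+)\)"
)


def summarize_snapvx_errors(lines):
    """Count error lines grouped by (SYMAPI code, Order17 name, extended RC)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            key = (match["symapi"], match["order17"], match["ext_rc"])
            counts[key] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for the SYMAPI log can be passed directly. A large count for a single key, as in the excerpt above, suggests one underlying condition on the array rather than unrelated per-device failures.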
RecoverPoint uses SnapVX operations on VMAX3/PowerMax arrays. The Storage Resource Pool (SRP) on the array filled up, which caused all SnapVX snapshot operations to fail.
This is the same condition that is described in KBA VMAX3, VMAX All Flash, & PowerMax: Storage Resource Pool (SRP) threshold alert
Workaround:
As discussed in KBA VMAX3, VMAX All Flash, & PowerMax: Storage Resource Pool (SRP) threshold alert, the first step is to free up space in the SRP.
Because the SnapVX snapshots are in a failed state, the CGs must be disabled and later re-enabled before RecoverPoint replication can resume.
Disabling the CGs clears the local snapshots, which should free some space in the SRP. However, the same snapshots are recreated when the CGs are re-enabled, so the disable/enable cycle alone is not enough and further action is needed to free space in the SRP.
Resolution:
To prevent this issue, always keep free space available in the SRP of the local array.
Monitor the space used in the SRP with VMAX tools (Solutions Enabler/Unisphere for VMAX).
The steps in KBA VMAX3, VMAX All Flash, & PowerMax: Storage Resource Pool (SRP) threshold alert can be applied before the space is fully consumed to prevent the issue from occurring.
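The monitoring recommendation above can be expressed as a simple threshold check. The sketch below assumes the used and usable SRP capacity have already been read from Solutions Enabler or Unisphere; how they are obtained is not shown. The function names and the 80% and 95% thresholds are illustrative assumptions, not array defaults.

```python
def srp_used_percent(used_gb: float, usable_gb: float) -> float:
    """Return SRP utilization as a percentage of usable capacity."""
    if usable_gb <= 0:
        raise ValueError("usable capacity must be positive")
    return 100.0 * used_gb / usable_gb


def srp_alert_level(used_gb: float, usable_gb: float,
                    warn_pct: float = 80.0, crit_pct: float = 95.0) -> str:
    """Classify SRP utilization; thresholds are hypothetical examples."""
    pct = srp_used_percent(used_gb, usable_gb)
    if pct >= crit_pct:
        return "critical"
    if pct >= warn_pct:
        return "warning"
    return "ok"
```

Acting at the warning level leaves time to free space before the SRP fills and SnapVX operations begin to fail.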