When a SCSI UNMAP command sends an I/O larger than 1MB to the VPLEX splitter on a VPLEX that has RecoverPoint configured, the director firmware can crash. This could potentially lead to a Total Cluster Outage (TCO) or a Data Unavailable (DU) situation.
The firmware logs show the following ASSERT:
128.221.253.35/cpu0/log:5988:W/"006016ad681c5222-2":2337:<0>2022/02/24 05:11:00.94: utl/0 ASSERT: /export/local1/jenkins/clone_legacy/nsfw/snac/amf/splitter.c:splitterPrepareAlpsFromDva/572: not enough alps available for transfer
128.221.252.35/cpu0/log:5988:W/"006016ad681c5222-2":8879:<0>2022/02/24 05:43:45.72: utl/0 ASSERT: /export/local1/jenkins/clone_legacy/nsfw/snac/amf/splitter.c:splitterPrepareAlpsFromDva/572: not enough alps available for transfer
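To check a collected firmware log for this signature, a small script can filter on the function name and message from the ASSERT lines above. The following is a minimal sketch, not a supported tool: the function name `find_splitter_asserts` and the regular expression are illustrative assumptions based only on the two sample lines shown here, and other log layouts may not match.

```python
import re

# Function name and message copied from the ASSERT lines in this article.
ASSERT_FUNCTION = "splitterPrepareAlpsFromDva"
ASSERT_MESSAGE = "not enough alps available for transfer"

# Assumed line layout (inferred from the two sample lines only):
#   <ip>/cpuN/log:...:<0>YYYY/MM/DD HH:MM:SS.ss: ... ASSERT: ...
LINE_RE = re.compile(
    r"^(?P<source>\S+?/cpu\d+/log):.*?"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+):"
    r".*ASSERT:"
)


def find_splitter_asserts(lines):
    """Return (source, timestamp) pairs for lines carrying the splitter ASSERT."""
    hits = []
    for line in lines:
        # Cheap substring checks first; skip anything that is not this ASSERT.
        if ASSERT_FUNCTION not in line or ASSERT_MESSAGE not in line:
            continue
        match = LINE_RE.match(line)
        if match:
            hits.append((match.group("source"), match.group("timestamp")))
    return hits


if __name__ == "__main__":
    sample = [
        '128.221.253.35/cpu0/log:5988:W/"006016ad681c5222-2":2337:<0>'
        "2022/02/24 05:11:00.94: utl/0 ASSERT: /export/local1/jenkins/"
        "clone_legacy/nsfw/snac/amf/splitter.c:splitterPrepareAlpsFromDva/572: "
        "not enough alps available for transfer",
    ]
    for source, timestamp in find_splitter_asserts(sample):
        print(source, timestamp)
```

Passing an open file object instead of a list works the same way, since the function only iterates over lines.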
In GeoSynchrony 6.2 Patch 5, changes to how unmap commands are processed created an exposure to an unhandled condition: when a SCSI UNMAP command larger than 1MB is received for a volume involved in RecoverPoint replication, the director firmware crashes.
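As a rough illustration of the size threshold, the length of an unmap request in bytes is its block count multiplied by the block size. The sketch below assumes that "1MB" means 1,048,576 bytes and uses a 512-byte block size as a placeholder default; both are assumptions, not values confirmed by this article, and the helper name `unmap_exceeds_limit` is hypothetical.

```python
# Assumption: "1MB" in this article is taken to mean 1024 * 1024 bytes.
ONE_MB = 1024 * 1024


def unmap_exceeds_limit(num_blocks, block_size_bytes=512, limit_bytes=ONE_MB):
    """Return True when an unmap extent is strictly larger than the limit.

    block_size_bytes is a placeholder default; use the block size that
    applies to the volume in question.
    """
    return num_blocks * block_size_bytes > limit_bytes


if __name__ == "__main__":
    # 2048 blocks of 512 bytes is exactly 1,048,576 bytes: not larger.
    print(unmap_exceeds_limit(2048))
    # One more block pushes the request over the threshold.
    print(unmap_exceeds_limit(2049))
```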
Permanent Resolution:
This issue has been fixed in GeoSynchrony 6.2 Patch 6 and later.
Workaround:
- If this issue is being actively encountered, contact VPLEX Customer Support for assistance.
- To see which virtual-volumes are thin-enabled (unmap is used for thin-enabled virtual-volumes only), set PuTTY to log the session so that the output is captured, then run the command below:
VPlexcli:/> ll /clusters/cluster-*/virtual-volumes
- Once the command has completed, the PuTTY log holds a saved copy of the output to refer to.
- To disable unmap, shown as "Thin Enabled", for all virtual-volumes, access the VPlexcli and run the command below. The command is global, so it can be run at the main VPlexcli prompt; you do not need to drill down into the virtual-volume context level.
set /clusters/*/virtual-volumes/*::thin-enabled false
- At this point the system should stabilize and the firmware should stop crashing. Check 'cluster status' and 'director uptime' to make sure all directors are online. Use 'director run' to start any directors listed in a 'stopped' status (this will not restart directors that are already running).