SRA Test Failover failed: Failed to create snapshots of replica devices
Summary:
A Symmetrix Remote Data Facility (SRDF) Storage Replication Adapter (SRA) test failover failed for some Protection Groups in a Recovery Plan.
Error: Failed to create snapshots of replica devices. Failed to create a snapshot of replica consistency group xx. The SRA command 'testFailoverStart' failed for the consistency group xx. Failed to fetch replication information. Please check EmcSrdfSra logs for more information on the error.
Symptoms
While running an SRDF SRA test failover of a Recovery Plan containing multiple Protection Groups, the test failover for some of the Protection Groups failed with the following error:
Error: Failed to create snapshots of replica devices. Failed to create snapshot of replica consistency group xx. SRA command 'testFailoverStart' failed for consistency group xx. Failed to fetch replication information. Please check EmcSrdfSra logs for more information on the error.
The EmcSrdfSraGlobalOptions.xml and the EmcSrdfSraTestFailoverConfig.xml files are correctly configured.
Moving the affected Protection Groups into individual Recovery Plans and running a test failover on each of them produced the same error.
Cause
The Device Groups on the disaster recovery (DR) side had gone into an invalid state, as shown in the output of the symdg list command run on the Recovery Site SYMAPI server. These invalid Device Groups are the reason the SRA test failover failed.
The following error is seen in symapi.log:
EMC:SMBASE emcSymValidateGroup Group (DG_NAME) invalid; TGT dev (0x 200), Symm <array SN> ; reason; SYMAPI_C_DEV_IS_VVOL (The action cannot be performed because the specified device is a VVol device)
Some VVol devices had been added to the Device Groups (DGs) and were later deleted from the array. Because Device Groups are host-based and user-defined, deleting devices directly from the PowerMax array does not remove them from the Device Group.
The DG still referenced the deleted devices, which put it into an invalid state. When the SRA test failover was run against the invalid DG, it failed.
Resolution
- Make sure the DG is valid (for example, in the symdg list output) before running an SRA test failover on it.
- If this check was missed and the SRA test failover was run and failed, use one of the following options to recover the test failover operation.
- Delete the DG and re-create it from scratch. This is the quicker option.
  symdg delete DGNAME -force
  symdg create DGNAME -type RDF2
  symdg -g DGNAME -sid <SN> add dev <dev_name>
  Use -type RDF2 because the test failover always runs on the Recovery Site.
OR
- Export the group to a file, remove the offending device, delete the current DG, and import the DG again from the file.
  symdg export DGNAME -file <filename.txt>
  Edit the file so that it contains only the devices that should be in the DG.
  symdg delete DGNAME -force
  symdg import DGNAME -file <filename.txt>
- Perform the SRA Test Failover.
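The two recovery options can be wrapped in a small shell sketch so the commands are reviewed before anything is deleted. It uses only the symdg commands and flags exactly as written in the steps above; the function names rebuild_dg and rebuild_dg_from_export, the DRY_RUN switch, and the use of $EDITOR for the manual pruning step are hypothetical additions, not part of Solutions Enabler. Source it in a bash shell on the Recovery Site SYMAPI host and run each function with DRY_RUN=1 first.

```shell
# Hypothetical helper sketch wrapping the symdg steps from the resolution.
# rebuild_dg, rebuild_dg_from_export and DRY_RUN are illustrative names only.

# Print each command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Option 1: delete the invalid DG, re-create it as RDF2 (the test failover
# runs on the Recovery Site), then add back only the devices that still exist.
# Usage: rebuild_dg DGNAME <SN> <dev_name> [<dev_name> ...]
rebuild_dg() {
    local dg_name=$1 sid=$2
    shift 2
    if [ "$#" -eq 0 ]; then
        # Refuse to delete the group when there is nothing to add back.
        echo "no devices given; not deleting $dg_name" >&2
        return 1
    fi
    run symdg delete "$dg_name" -force || return 1
    run symdg create "$dg_name" -type RDF2 || return 1
    local dev
    for dev in "$@"; do
        run symdg -g "$dg_name" -sid "$sid" add dev "$dev" || return 1
    done
}

# Option 2: export the DG, remove the deleted (VVol) devices from the file by
# hand in $EDITOR, then delete the DG and import it from the edited file.
# Usage: rebuild_dg_from_export DGNAME <filename.txt>
rebuild_dg_from_export() {
    local dg_name=$1 file=$2
    run symdg export "$dg_name" -file "$file" || return 1
    run "${EDITOR:-vi}" "$file" || return 1
    run symdg delete "$dg_name" -force || return 1
    run symdg import "$dg_name" -file "$file"
}
```

For example, DRY_RUN=1 rebuild_dg DGNAME <SN> <dev_name> only prints the delete, create and add commands. Each step stops on the first failure so that, for instance, the group is not re-created if the delete did not succeed.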