RecoverPoint with VMware SRM: Failover operations fail "Create writable storage" step
Summary: VMware Site Recovery Manager(SRM) Failover or Test Failover operations fail after some time on "Create writable storage" step: Error - Failed to create snapshots of replica devices. SRA command 'testFailoverStart' failed. Failed opening session for user to site mgmt IP. ...
Symptoms
Due to Site Recovery Manager (SRM) test failures, users increase the SRM timeouts beyond 31 minutes and image access enable operations take more than the timeout setting.
SRM Test Failover or Failover operations may fail on Step 4: "Create writable storage snapshot" after a period of time on this step (31 minutes by default) with error in SRM Recovery plan steps:
Error - Failed to create snapshots of replica devices. SRA command 'testFailoverStart' failed. Failed opening session for user to site mgmt IP.
Errors from SRM logs (C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs):
--> Feb 20, 2019 3:12:52 PM com.emc.santorini.log.KLogger log --> INFO: Starting to run: TestFailoverStart command --> Feb 20, 2019 3:43:53 PM com.emc.santorini.log.KLogger logWithException --> WARNING: Caught SocketTimeoutException. Please check your network connection to the RPAs. --> javax.xml.ws.WebServiceException: java.net.SocketTimeoutException: Read timed out --> at com.sun.xml.internal.ws.transport.http.client.HttpClientTransport.readResponseCodeAndMessage(Unknown Source) --> at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.createResponsePacket(Unknown Source) --> at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.process(Unknown Source) --> at com.sun.xml.internal.ws.transport.http.client.HttpTransportPipe.processRequest(Unknown Source) --> at com.sun.xml.internal.ws.transport.DeferredTransportPipe.processRequest(Unknown Source) --> at com.sun.xml.internal.ws.api.pipe.Fiber.__doRun(Unknown Source) --> at com.sun.xml.internal.ws.api.pipe.Fiber._doRun(Unknown Source) --> at com.sun.xml.internal.ws.api.pipe.Fiber.doRun(Unknown Source) --> at com.sun.xml.internal.ws.api.pipe.Fiber.runSync(Unknown Source) --> at com.sun.xml.internal.ws.client.Stub.process(Unknown Source) --> at com.sun.xml.internal.ws.client.sei.SEIStub.doProcess(Unknown Source) --> at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source) --> at com.sun.xml.internal.ws.client.sei.SyncMethodHandler.invoke(Unknown Source) --> at com.sun.xml.internal.ws.client.sei.SEIStub.invoke(Unknown Source) --> at com.sun.proxy.$Proxy36.testFailoverStartWithOpaques(Unknown Source) --> at com.emc.santorini.handlers.SantoriniLogic.testFailoverStart(SantoriniLogic.java:278) --> at com.emc.santorini.commands.TestFailoverStartCommand.execute(TestFailoverStartCommand.java:40) --> at com.emc.santorini.handlers.SantoriniCommandDispatcher.handleCommandAction(SantoriniCommandDispatcher.java:105) --> at com.emc.santorini.main.SantoriniMain.main(SantoriniMain.java:57) --> Caused by: java.net.SocketTimeoutException: Read timed out ...
No errors on the RecoverPoint side
Cause
The default SRM timeouts are set to 5 minutes and can be increased, when increased beyond 31 minutes, a different timeout - SRA timeout may occur if the image access process takes more than 1860 seconds (31 minutes) - WEB_SERVICE_REQUEST_TIMEOUT which is set to 1860 seconds by default.
Resolution
Resolution:
Change SRA timeouts to match the requested timeout changes in SRM.
SRA timeouts are set on the SRM server, under: C:\Program Files\VMware\VMware vCenter Site Recovery Manager\storage\sra\array-type-recoverpoint\conf\cancun_run.properties file
The changes should be made on both SRM servers.
Change the following properties to match the SRM timeouts (Set in seconds), in this example it is set to 1 hour - 3600 seconds:
VERIFY_PAUSED_TSP_TIMEOUT=3600 VERIFY_REPLICATING_TIMEOUT=3600 VERIFY_TRANSFER_SNAP_SHIPPING_IDLE_TIMEOUT=3600 WEB_SERVICE_REQUEST_TIMEOUT=3600