PowerProtect Cyber Recovery: Recovery Fails at the Update PPDM Data Domain Address Task
Summary: PowerProtect Data Manager recovery initiated from Cyber Recovery fails at the "Update PPDM Data Domain Address" task.
Symptoms
The PowerProtect Data Manager recovery proceeds as far as the Update PPDM Data Domain Address task, as shown in /opt/dellemc/cr/var/log/apps/apps.log:
[2024-10-02 12:04:02.619] [DEBUG] [apps] [jobs.go:519 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] recovery Task Name: 95- Update PPDM DataDomain Address
[2024-10-02 12:04:02.619] [INFO] [apps] [jobs.go:517 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] Attempting to replace PPDM production data domain with vault data domain
[2024-10-02 12:05:02.850] [INFO] [apps] [jobs.go:517 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] Attempting to update PPDM production DataDomain prod_dd.cyber.local to point to vault DataDomain address 192.168.100.10
The next step logs in to the PowerProtect Data Manager host and calls the REST API to update the Data Domain address.
It obtains an authentication token successfully, but fails at the datadomain-replacement API endpoint:
[2024-10-02 12:05:03.058] [DEBUG] [apps] [jobs.go:519 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] PPDM Version: 19.17.0-10
[2024-10-02 12:05:03.058] [INFO] [apps] [restapi_client.go:310 CallRESTAPI()] : Methods is : POST URL is: https://192.168.254.174:8443/api/v2/token
[2024-10-02 12:05:03.200] [INFO] [apps] [restapi_client.go:310 CallRESTAPI()] : Methods is : POST URL is: https://192.168.254.174:8443/api/v2/datadomain-replacement
[2024-10-02 12:05:03.248] [INFO] [apps] [jobs.go:517 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] PPDM server returned service unavailable code: 503 Retrying in 30 seconds... 1 of 10
...
[2024-10-02 12:10:03.954] [ERROR] [apps] [jobs.go:521 Log()] : [JobID: 66fd12a530b8d015e3700469 JobName: recoverapp_7F74F1 Task: recoverapp phase2] Failed to initiate the production data domain replacement operation
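To check whether a given recovery hit this issue, apps.log can be scanned for the three markers shown above: the task name, the 503 retry messages, and the final failure message. The following is a minimal Python sketch, assuming the log lines keep the format of the excerpt; summarize_dd_replacement and the regular expressions are hypothetical helpers, not part of Cyber Recovery.

```python
import re
from typing import Iterable

# Hypothetical patterns, based only on the apps.log excerpt above.
TASK_RE = re.compile(r"recovery Task Name: (?P<task>.+)$")
RETRY_503_RE = re.compile(r"service unavailable code: 503 Retrying in \d+ seconds")
FAIL_MSG = "Failed to initiate the production data domain replacement operation"


def summarize_dd_replacement(lines: Iterable[str]) -> dict:
    """Report the last recovery task logged, how many 503 retries occurred,
    and whether the data domain replacement ultimately failed."""
    summary = {"last_task": None, "retries_503": 0, "failed": False}
    for line in lines:
        line = line.rstrip("\n")
        task = TASK_RE.search(line)
        if task:
            summary["last_task"] = task.group("task").strip()
        if RETRY_503_RE.search(line):
            summary["retries_503"] += 1
        if FAIL_MSG in line:
            summary["failed"] = True
    return summary
```

For example, `with open("/opt/dellemc/cr/var/log/apps/apps.log") as fh: print(summarize_dd_replacement(fh))`. A failed run matching this article shows the Update PPDM DataDomain Address task as the last task, one or more 503 retries, and `failed` set to True.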
The PowerProtect Data Manager /logs/brs/zuul/zuul.log shows a corresponding 503 error:
2024-10-02T10:06:03.896Z INFO [] [https-jsse-nio-8443-exec-4] [][][][][] [c.e.b.z.f.p.AuthenticationFilter.run(320)] - remoteHost 192.168.255.2, uri /api/v2/datadomain-replacement
2024-10-02T10:06:03.901Z INFO [] [https-jsse-nio-8443-exec-4] [][][][TRACE_ID:aab7bc711d8a0bbd][] [c.e.b.z.f.p.RibbonReadTimeoutFilter.run(62)] - replmgr ribbon ReadTimeout is :90000
2024-10-02T10:06:03.901Z INFO [] [https-jsse-nio-8443-exec-4] [][][][TRACE_ID:aab7bc711d8a0bbd][] [c.e.b.z.f.ApiMetricsBaseFilter.logFromPreFilter(35)] - Request begin : 72724cf4-19b3-4ebf-a7b7-ce4f3a9283a1 POST /api/v2/datadomain-replacement
2024-10-02T10:06:03.902Z ERROR [] [https-jsse-nio-8443-exec-4] [][][][TRACE_ID:aab7bc711d8a0bbd][] [c.e.b.z.f.e.ErrorFilter.run(82)] - Error during filtering, ctx={retryable=false, request=FirewalledRequest[ org.apache.catalina.connector.RequestFacade@29e9d44d], ignoredHeaders=[set-cookie, expires, x-content-type-options, x-xss-protection, cookie, x-frame-options, cache-control, pragma], originResponseHeaders=[com.netflix.util.Pair@96ed8774], zuulRequestHeaders={x-dell-log-trace=TRACE_ID:19493e27-470f-42d3-aab7-bc711d8a0bbd;, x-forwarded-host=192.168.254.174:8443, x-forwarded-proto=https, x-service-response-timeout=90000, x-forwarded-port=8443, xbrs_requestorigin=external_remote, x-forwarded-for=192.168.255.2, x-original-host=192.168.254.174}, throwable=com.netflix.zuul.exception.ZuulException: Filter threw Exception, zuulEngineRan=true, requestURI=/api/v2/datadomain-replacement, isDispatcherServletRequest=false, proxy=dataDomainReplacementSpinalCased, processStartTime=1727863563901, sendErrorFilter.ran=true, X-THROTTLING-REQUEST=Request(identify=192.168.255.2, uri=/api/v2/datadomain-replacement, method=POST, size=211, throttlingKey=192.168.255.2#-798204950, isInAllowList=false, reasonCode=0), response=com.netflix.zuul.http.HttpServletResponseWrapper@6396c44a, executedFilters=ServletDetectionFilter[SUCCESS][0ms], Servlet30WrapperFilter[SUCCESS][0ms], ThrottlingPreFilter[SUCCESS][0ms], AuthenticationFilter[SUCCESS][4ms], CorrelationIDFilter[SUCCESS][0ms], RequestOriginFilter[SUCCESS][0ms], PreDecorationFilter[SUCCESS][0ms], LegacyFilter[SUCCESS][0ms], ResourceValidationFilter[SUCCESS][1ms], RibbonReadTimeoutFilter[SUCCESS][0ms], FeatureDisabledFilter[SUCCESS][0ms], ApiMetricsPreFilter[SUCCESS][0ms], RibbonRoutingFilter[FAILED][1ms], RibbonRoutingFilter[FAILED][1ms], serviceId=replmgr}
2024-10-02T10:06:03.902Z ERROR [] [https-jsse-nio-8443-exec-4] [][][][TRACE_ID:aab7bc711d8a0bbd][] [c.e.b.z.f.e.ErrorFilter.run(83)] - cause :
org.springframework.cloud.netflix.zuul.util.ZuulRuntimeException: com.netflix.zuul.exception.ZuulException: Forwarding error
...
Caused by: com.netflix.zuul.exception.ZuulException: Forwarding error
...
Caused by: com.netflix.client.ClientException: Load balancer does not have available server for client: replmgr
2024-10-02T10:09:34.229Z INFO [] [https-jsse-nio-8443-exec-9] [][][][TRACE_ID:87bed23ea1942f67][] [c.e.b.z.f.ApiMetricsBaseFilter.logFromPostFilter(65)] - Request end : a8c69039-7c9d-420a-a1bc-3470e94def1b 503 /api/v2/datadomain-replacement
Cause
The serverdr service failed to start during the recovery process because a property placeholder in its configuration could not be resolved.
The following error appears in /logs/brs/serverdr/serverdr.out:
Server Disaster Recovery
ERROR SpringApplication Application run failed
java.lang.IllegalArgumentException: Could not resolve placeholder 'APP_NAME' in value "service.name=${APP_NAME}"
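The failure can be illustrated in miniature. The sketch below is a simplified Python model of ${...} placeholder substitution, not Spring's implementation: it ignores ${name:default} defaults, nested placeholders, and environment-variable lookup. The props dictionary and its app.name value are hypothetical stand-ins for whatever property sources the service actually loads. It only shows that a placeholder with no matching property aborts startup with an error like the one in serverdr.out, while a placeholder that does match resolves normally.

```python
import re
from typing import Mapping

# Simplified model of ${key} substitution; NOT Spring's PropertyPlaceholderHelper.
PLACEHOLDER_RE = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, props: Mapping[str, str]) -> str:
    """Replace each ${key} in value with props[key]; fail on any unknown key."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in props:
            # Same wording as the serverdr.out error above.
            raise ValueError(f"Could not resolve placeholder '{key}' in value \"{value}\"")
        return props[key]

    return PLACEHOLDER_RE.sub(substitute, value)
```

With `props = {"app.name": "serverdr"}`, `resolve("service.name=${app.name}", props)` returns `"service.name=serverdr"`, while `resolve("service.name=${APP_NAME}", props)` raises `ValueError` with the same message text that serverdr logs.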
Resolution
Use the following workaround until the issue is resolved in a future release:
- Delete any existing VM snapshot on the vault PowerProtect Data Manager.
- For the serverdr service, edit the file /usr/local/brs/lib/serverdr/config/application.properties and change the line:
metrics.export.otlp.resourceAttributes=service.name=${APP_NAME}
to:
metrics.export.otlp.resourceAttributes=service.name=${app.name}
- For each of the other services (vmdm, nasdm, replmgr, scheduler), complete the following two steps:
  - Create the file /usr/local/brs/lib/xxxx/config/application.properties, where xxxx is the service name.
  - Add the following line to the file:
metrics.export.otlp.resourceAttributes=service.name=${app.name}
If /usr/local/brs/lib/xxxx/config/application.properties already exists for any of these services, do not create a new file; edit the existing file in the same way as described for serverdr.
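The file changes above can be scripted. The following is a minimal Python sketch, not a Dell-supplied tool: apply_workaround, props_path and fix_existing are hypothetical names, and the root parameter is an assumption that lets the change be rehearsed on a copy of the directory tree before it is run against / on the vault PowerProtect Data Manager. It does not delete the VM snapshot, set ownership or permissions on newly created files, or restart any service (this article does not say whether a restart is needed). When an existing file has no ${APP_NAME} line, or a service's config directory is absent, it reports that instead of guessing.

```python
from pathlib import Path
from typing import Dict

# Values copied from the Resolution steps above.
OLD_LINE = "metrics.export.otlp.resourceAttributes=service.name=${APP_NAME}"
NEW_LINE = "metrics.export.otlp.resourceAttributes=service.name=${app.name}"
OTHER_SERVICES = ["vmdm", "nasdm", "replmgr", "scheduler"]


def props_path(root: str, service: str) -> Path:
    # <root>/usr/local/brs/lib/<service>/config/application.properties
    return Path(root, "usr/local/brs/lib", service, "config", "application.properties")


def fix_existing(path: Path) -> str:
    """Swap ${APP_NAME} for ${app.name} in an existing file, if the old line is present."""
    text = path.read_text()
    if OLD_LINE not in text:
        return "unchanged"  # nothing to replace; review the file by hand
    path.write_text(text.replace(OLD_LINE, NEW_LINE))
    return "edited"


def apply_workaround(root: str = "/") -> Dict[str, str]:
    """Apply the properties change to serverdr and the other services; return the action per service."""
    actions: Dict[str, str] = {}
    serverdr = props_path(root, "serverdr")
    # The article only describes editing an existing serverdr file.
    actions["serverdr"] = fix_existing(serverdr) if serverdr.exists() else "missing"
    for service in OTHER_SERVICES:
        path = props_path(root, service)
        if path.exists():
            actions[service] = fix_existing(path)
        elif not path.parent.is_dir():
            actions[service] = "missing"  # config directory absent; do not invent it
        else:
            path.write_text(NEW_LINE + "\n")
            actions[service] = "created"
    return actions
```

On a freshly affected system, a run is expected to report serverdr as "edited" and the other four services as "created"; running it a second time reports every service as "unchanged".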
The subsequent recovery attempt should be successful.