Data Protection Search: Restoring files from large NetWorker backups fails due to timeout

Summary: When trying to restore files using Data Protection Search, a timeout may occur when querying for the backup details in NetWorker for large backups.


Symptoms

When trying to restore files using Data Protection Search, a timeout may occur when querying for the backup details in NetWorker for large backups (hundreds of GBs to TBs).

The following is seen in the Search-NetWorker-Worker.logs after one hour (timeout):
|2021-03-30 11:32:34,482|BaseProvider::runWorkitem|INFO|[task-begin][AXiCsSKVxvpb3JQZu51A][YourFileToRestore.bat] |BaseProvider.java(916)|
|2021-03-30 11:32:34,482|NetworkerRestoreHandler::runTask|INFO|Start to run restore task. Task id: [AXiCsSKVxvpb3JQZu51A] |NetworkerRestoreHandler.java(43)|
|2021-03-30 11:32:34,485|RestoreTaskHelper::runTask|INFO|Start to run restore task [AXiCsSKVxvpb3JQZu51A]. |RestoreTaskHelper.java(45)|
|2021-03-30 11:32:34,520|ConnectorFactory::createConnector|INFO|create Connector for platform: Networker |ConnectorFactory.java(21)|
|2021-03-30 11:32:34,690|BaseProvider::updateWorkitem|INFO|Workitem id=AXiCsSKVxvpb3JQZu51A,type=restore,category=task,pid=AXiCsRoPxvpb3JQZu50_,ppid=AXiCsQjjxvpb3JQZu50-, updating updated. |BaseProvider.java(545)|
|2021-03-30 12:32:37,158|RestoreTaskHelper::runTask|WARNING|Restore task failed due to timeout |RestoreTaskHelper.java(95)|
|2021-03-30 12:32:37,159|RestoreTaskHelper::runTask|ERROR|Restore task [AXiCsSKVxvpb3JQZu51A] run failed. com.emc.zinc.dpsearch.ApplicationException: Restore task failed due to timeout
at com.emc.zinc.dpsearch.restore.RestoreTaskHelper.runTask(RestoreTaskHelper.java:96)
at com.emc.zinc.networker.restore.NetworkerRestoreTaskHandler.runTask(NetworkerRestoreTaskHandler.java:25)
at com.emc.zinc.networker.restore.NetworkerRestoreHandler.runTask(NetworkerRestoreHandler.java:48)
at com.emc.zinc.BaseProvider.runWorkitem(BaseProvider.java:923)
at com.emc.zinc.WorkitemHandler.run(WorkitemHandler.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) |RestoreTaskHelper.java(121)|
|2021-03-30 12:32:37,184|NetworkerRestoreHandler::runTask|ERROR|An exception was thrown during restore task running. Task id: [AXiCsSKVxvpb3JQZu51A] |NetworkerRestoreHandler.java(50)|
|2021-03-30 12:32:37,184|BaseProvider::runWorkitem|ERROR|Workitem id=AXiCsSKVxvpb3JQZu51A,category=task,type=restore run failed: An exception was thrown during restore task running: Restore task failed due to timeout com.emc.zinc.ZincException: An exception was thrown during restore task running: Restore task failed due to timeout
at com.emc.zinc.networker.restore.NetworkerRestoreHandler.runTask(NetworkerRestoreHandler.java:51)
at com.emc.zinc.BaseProvider.runWorkitem(BaseProvider.java:923)
at com.emc.zinc.WorkitemHandler.run(WorkitemHandler.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) |BaseProvider.java(990)|
|2021-03-30 12:32:37,211|BaseProvider::runWorkitem|INFO|[task-end][AXiCsSKVxvpb3JQZu51A][3602729ms][YourFileToRestore.bat] |BaseProvider.java(1016)|
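
To check a worker log for this signature, the excerpt above can be scanned for restore tasks that report "Restore task failed due to timeout" and the time between their [task-begin] and [task-end] records. The following is a minimal Python sketch, not a supported tool. It assumes the record layout shown in the excerpt (pipe-delimited timestamp, source, level, message), and the function name find_timed_out_restores is hypothetical.

```python
import re
from datetime import datetime

# Start of a log record: |timestamp|Class::method|LEVEL|message...
# Stack-trace lines ("at com.emc...") do not match and are skipped.
RECORD = re.compile(
    r"^\|(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\|"
    r"(?P<source>[^|]+)\|(?P<level>[A-Z]+)\|(?P<message>.*)$"
)
# [task-begin][<task id>] and [task-end][<task id>] markers.
TASK_MARKER = re.compile(r"\[task-(?P<phase>begin|end)\]\[(?P<task>[^\]]+)\]")
# "Restore task [<task id>] run failed" as logged by RestoreTaskHelper.
FAILED = re.compile(r"Restore task \[(?P<task>[^\]]+)\] run failed")
TIMEOUT_TEXT = "Restore task failed due to timeout"


def find_timed_out_restores(lines):
    """Return {task_id: elapsed_seconds} for restore tasks that timed out."""
    begins = {}      # task id -> timestamp of [task-begin]
    failed = set()   # task ids that logged the timeout failure
    elapsed = {}
    for line in lines:
        record = RECORD.match(line)
        if record is None:
            continue
        ts = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S,%f")
        message = record["message"]

        failure = FAILED.search(message)
        if failure and TIMEOUT_TEXT in message:
            failed.add(failure["task"])

        marker = TASK_MARKER.search(message)
        if marker is None:
            continue
        task = marker["task"]
        if marker["phase"] == "begin":
            begins[task] = ts
        elif task in failed and task in begins:
            elapsed[task] = (ts - begins[task]).total_seconds()
    return elapsed
```

Passing the lines of the log file to find_timed_out_restores returns one entry per affected task. For the excerpt above, the elapsed time for task AXiCsSKVxvpb3JQZu51A is about 3602.7 seconds, which agrees with the 3602729ms value in the [task-end] record.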

This was encountered in a distributed environment while recovering a NetApp NDMP backup from the cloud. This type of recovery takes time: the data is pulled from the cloud to DD, rehydrated on a remote storage node, and then finally sent to the NetApp filer. Only after this operation completes is the backup presented to NetWorker.

When the same restore is run from the NetWorker side, pulling the information from the cloud takes time, but the restore does finish and no timeout is reported.

Cause

This is due to a software limitation cited in escalation Jira Zinc-1250: a hard-coded one-hour limit on NetWorker restore operation updates (heartbeat).
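
The effect of a fixed limit on update intervals can be illustrated with a short Python sketch. This is a conceptual model only, not the product's implementation: the class UpdateWatchdog, its fields, and the use of seconds as the time unit are all assumptions made for illustration. It shows why an operation that stays silent for more than one hour is failed even though it would eventually complete.

```python
from dataclasses import dataclass

ONE_HOUR_SECONDS = 3600  # the one-hour limit described in this article


@dataclass
class UpdateWatchdog:
    """Hypothetical model: fail a task when no update arrives within a fixed window."""

    last_update: float                      # time of the most recent update, in seconds
    limit_seconds: float = ONE_HOUR_SECONDS

    def record_update(self, now: float) -> None:
        # Each update (heartbeat) restarts the window.
        self.last_update = now

    def is_timed_out(self, now: float) -> bool:
        # True once the gap since the last update exceeds the fixed limit.
        return now - self.last_update > self.limit_seconds


# A restore that sends no update while data is recalled from the cloud.
silent = UpdateWatchdog(last_update=0.0)
silent_result = silent.is_timed_out(3602.729)    # True: more than one hour without an update

# The same restore, if an update had arrived after 30 minutes.
updated = UpdateWatchdog(last_update=0.0)
updated.record_update(1800.0)
updated_result = updated.is_timed_out(3602.729)  # False: last update was about 30 minutes ago
```

In this model the limit cannot be adjusted by the operator because it is a constant, which mirrors the hard-coded nature of the limitation described above.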

Resolution

This issue is resolved in Data Protection Search 19.6 and later.

If this issue is encountered and a version of Search that resolves it is not available, contact support to determine whether a binary release is available for the version in use.

Products

Data Protection Search
Article Properties
Article Number: 000186163
Article Type: Solution
Last Modified: 07 Jun 2021
Version:  2